Archives 2005 - 2019

The Guardian on Sweden, Göran Lindberg and our crime writers

published Aug 01, 2010 11:02   by admin ( last modified Aug 01, 2010 11:02 )

Crime writers Henning Mankell, Stieg Larsson and others have painted a picture of Sweden that differs from the idealized image often held in, for example, Britain. After the verdict against former county police chief Göran Lindberg, The Guardian's Andrew Anthony wonders whether there might be something to this new picture:

Anthony writes a long and interesting article presenting views from various Swedish opinion leaders. He ends with a question: could it be that Swedes are overly pragmatic, that we identify problems to be solved with education, without taking in the bigger picture of what it is we are trying to solve?

Read more: Göran Lindberg and Sweden's dark side | feature | World news | The Observer


When clicking in folder_contents leads to downloads (Plone)

published Aug 01, 2010 02:58   by admin ( last modified Aug 01, 2010 02:58 )

Summary: Add the portal_type on a line in the property "typesUseViewActionInListings" in the site_properties property sheet in portal_properties, and you will get the view instead of a download.

 

Having created my own MP3 content type in Plone, based on the File content type, I noticed the strange behavior that clicking on any MP3 in a folder contents listing would lead to a download of the MP3. Awkward.

With the standard File content type this does not happen; you get the view of the content item instead.

So what's the difference? With File, "/view" is appended to the URL of each content item in the folder contents listing; with the MP3 file it is not. Time to look through the view configurations in XML for the two content types. There were differences; specifically, File tacks "/view" onto the specification of "View". Changing this in MP3 made no difference.

Ok, time to take on the diving suit and chase down the code for the folder_contents view. Thankfully it was reasonably easy to find, in

./plone/app/content/browser/foldercontents.py

The view decides to tack on "/view" if the content type of the item to list is in this list:

use_view_action = site_properties.getProperty('typesUseViewActionInListings', ())

So, I added "MP3" on a line in the property "typesUseViewActionInListings" in the site_properties property sheet in portal_properties. And it worked. Not so obvious.
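The decision the listing makes can be sketched like this (a minimal sketch with a stand-in property sheet; the property name is the real one, but the surrounding classes are simplified for illustration):

```python
def listing_url(site_properties, portal_type, base_url):
    """Return the URL a folder_contents row should link to.

    Mirrors the idea in plone/app/content/browser/foldercontents.py:
    types listed in typesUseViewActionInListings get "/view" appended,
    so clicking them opens the view instead of downloading the file.
    """
    use_view_action = site_properties.getProperty(
        'typesUseViewActionInListings', ())
    if portal_type in use_view_action:
        return base_url + '/view'
    return base_url


# A stand-in for the real site_properties sheet, for illustration only:
class FakeProperties:
    def __init__(self, **props):
        self._props = props

    def getProperty(self, name, default=None):
        return self._props.get(name, default)


props = FakeProperties(typesUseViewActionInListings=('File', 'Image', 'MP3'))
print(listing_url(props, 'MP3', 'http://site/music/song.mp3'))
# prints http://site/music/song.mp3/view
```

With "MP3" on its own line in the property, the listing links to the view; without it, the raw file URL triggers a download.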


Checking what would be updated before making an svn up

published Jul 30, 2010 11:59   by admin ( last modified Jul 30, 2010 11:59 )

If you have a working copy in subversion that is older than the latest checkin, you may want to check the differences you would get if you made an svn up, without actually doing an svn up.

svn diff --revision HEAD

...will give you that.


--revision N The client compares TARGET@N against working copy.


Read more: svn diff


Link - On software project development estimates

published Jul 30, 2010 06:05   by admin ( last modified Jul 30, 2010 06:05 )

 

How many estimates were accurate?



Read more: 10 Reasons Why Software Project Estimates Fail


Video editors on Linux

published Jul 28, 2010 06:23   by admin ( last modified Jul 28, 2010 06:23 )

Here is a list of video editors available on Linux. I have tried them all, save Lombard. I will try to summarize more experiences at a later time.

My use case was extracting individual three-second-long moves from salsa dance performances. I tried four Linux-based video editors for this: AviDemux, LiVES, Cinelerra and Pitivi. The best by far for this particular application were AviDemux and LiVES.

Updated 2012-02-23 with AviDemux.

  • AviDemux, with preprocessing in ffmpeg is the best for my uses. Clean interface and fast. You must preprocess with e.g. ffmpeg for scrubbing to work if you have keyframes in your video. Read more here on how to do that.
  • Cinelerra (Cinelerra for Grandma)
  • LiVES The second best one I have found for extracting short video clips from a video.
  • Kdenlive Has a lot of filters you can apply to video. Unsure of how cutting works.
  • Pitivi
  • Lombard Very recent contribution, supports only basic editing at this point in time, according to web site.

 

 


CLI ID3 editors - tagging a tree with find and xargs

published Jul 27, 2010 11:39   by admin ( last modified Jul 27, 2010 11:39 )

 

Summary: id3tool, id3v2 and eyeD3 worked fine with find and xargs; see below for the specific syntax for each.
I could not get id3ren to work. All taken from Ubuntu's repository for 9.10 Karmic Koala.

 

I needed to tag a whole directory tree of MP3 files with the genre tag "Salsa". This blog post is about whether the above mentioned command line tagging tools worked as expected with find and xargs to batch add a tag. Results were checked with the EasyTag editor.

Now, a standard way in the shell on unixish systems to dig up files and pipe them to a command is something like this:

find -name "*.mp3" -print0 |xargs -0 tag-command --flags
  • "find" finds all items and prints them out for you,
  • the '-name "*.mp3"' test limits it to items ending in .mp3,
  • the "-print0" prints them in such a way that the next command in the chain does not get confused by any weird characters in the file names, achieved by putting a NULL character between them. MP3 files often have weird file names. Do note that "-print0" needs to come after the other parameters to find!

"xargs" then executes tag-command for each item that find finds. The "-0" flag is to unpack the NULL delimited items xargs gets from find.
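If the shell quoting makes you nervous, the same batch run can be sketched in Python, where file names never pass through a shell at all (a sketch; the command list would be whichever tagger you use, e.g. the id3tool invocation below):

```python
import os
import subprocess


def tag_tree(root, command):
    """Run `command` once per .mp3 file under `root`, like piping
    find -name "*.mp3" -print0 into xargs -0 -n 1.

    File names go straight into the argv list, so spaces, quotes
    and other weird characters need no escaping at all.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith('.mp3'):
                paths.append(os.path.join(dirpath, name))
    for path in paths:
        subprocess.run(command + [path], check=True)
    return paths


# Hypothetical usage, assuming id3tool is installed:
# tag_tree('.', ['id3tool', '-G', 'Salsa'])
```

Unlike xargs this runs the tagger once per file rather than batching arguments, which is slower but avoids the -print0 placement pitfall described further down.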

 

Here are the results:

find -name "*.mp3" -print0|xargs -0 id3tool -G Salsa

(no output generated on command line) id3tool worked like a charm.

find -name "*.mp3" -print0|xargs -0 id3ren -genre Salsa
*** No ID3 tag found in ./music - music.mp3

===> Entering new tag info for ./music - music.mp3:

id3ren just hung there for at least ten minutes, at which point I terminated it. I copied the mp3 file it stopped on to another directory and ran id3tool on it as above, and it worked fine. Finally, I also ran it on files that had already been tagged; it still reported "No ID3 tag found", and hung.

find -name "*.mp3" -print0|xargs -0 id3v2 -g 143

(no output generated on command line) id3v2 worked like a charm.

 

eyeD3 also worked like a charm:

find -name "*.mp3" -print0|xargs -0 eyeD3 -G Salsa

a file.mp3 [ 7.85 MB ]
-------------------------------------------------------------------------------
Time: 04:51 MPEG1, Layer III [ ~225 kb/s @ 44100 Hz - Joint stereo ]
-------------------------------------------------------------------------------
No ID3 v1.x/v2.x tag found!
Setting track genre: Salsa
Writing tag...
ID3 v2.4:
title: artist:
album: year: None
track: genre: Salsa (id 143)

...and eyeD3 just chugged on like that, adding tags.

One difference between eyeD3 and the other working editors was that they stopped when being fed a faulty file path, while eyeD3 continued with the next path.

Update 2010-09-22

This blog posting has been completely rewritten after "jax" commented that the find command used had been wrong. I had treated "-print0" as just another flag to the find command. It is not: "-print0" is an action, and find evaluates its expression from left to right, so when find encounters a "-print" or "-print0" it prints right then and there, and any tests coming after it no longer restrict what gets printed (unless another -print or -print0 follows them). This meant that my impression was that none of the editors worked, when in fact all but one did work.

I first tested like this:

find -print0 -name "*.mp3"|xargs -0 tag-command --flags

But that is wrong!

This is the way it should be:

find -name "*.mp3" -print0 |xargs -0 tag-command --flags

Thanks to jax for pointing this out.


Link - Wrapping mplayer in python

published Jul 27, 2010 01:47   by admin ( last modified Jul 27, 2010 01:47 )

Update 2011-jan-09: I have now got around to it, but have had problems using the code examples below on Linux. They seem to be targeted at Windows. I have instead started using the python-mplayer library with GTK. It works fine, with the exception that I could not get sound through my USB headset when using PulseAudio. The solution was to switch to a sound card.

 

Sometimes I'd just like to make a video player/editor with the functions I need and nothing else: Slow motion, select a short piece and save it. If I ever get around to that, this may be a good starting point:

This article will just focus on creating a really simple Media Player that you can play movies with.



Read more: wxPython: Creating a Simple Media Player « The Mouse Vs. The Python


Yes please mama

published Jul 25, 2010 06:43   by admin ( last modified Jul 25, 2010 06:43 )

I asked a friend a question yesterday about how to remember rhythms. This, afaik, is for samba: "Yes please mama, I want cake, mama". I guess there are many rhythms in samba, though.

One of our patterns with a phrase to help you remember it: "Yes please mama, I want cake, mama!" When you play it, "mama" is played with open notes, all the rest are closed. See, easy!



Samba Ottawa Home Page


Mp3cat - gets only the data part of the mp3

published Jul 23, 2010 03:15   by admin ( last modified Jul 23, 2010 03:15 )

One's mp3 files have a tendency to get duplicated over different computers, hard disks and portable players. Some of the tags may also get edited on some copies but not on others. Since tags are stored in the file, there is no easy way to detect which mp3s are identical music-wise by just hashing the file with md5, sha-1 or another hashing algorithm. Instead you need to run the hashing on only the data part.

I have not found any Python modules that do this. I have found one Python module that will report the byte offset of where the data starts, but I guess that only works if the metadata are at the start of the file. The only command line tool I have found that can extract the data part is mp3cat:

This is the mp3cat home page. Download the latest release tarball mp3cat-0.4.tar.gz or (better) check out the current version from my subversion repository: http://svn.tomclegg.net/repos/trunk/mp3cat


My initial evaluation is favorable, i.e. two files with different metadata but the same music data get the same md5 hash value. Found mp3cat via this blog post:

The author wants the same thing I am searching for–the ability to generate a checksum of the audio stream and store it in the file header as a tag. Furthermore, he mentions his use of mp3cat! I pulled down a copy of mp3cat and compiled it on my archive box. Then the fun began


 Tim's Mind Organized » Checksum mp3 audio frames (the data and not the headers)

At least one Java program has existed that does the data extraction and hashing in one fell swoop. It is mentioned in this discussion, but the download link does not seem to work.
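To give a rough idea of what "hashing only the data part" involves, here is a sketch in pure Python that skips one leading ID3v2 block and one trailing ID3v1 block before hashing. It is a simplification (real files can carry padding, APE tags or several ID3v2 tags), not a replacement for mp3cat:

```python
import hashlib


def audio_md5(data):
    """MD5 of an MP3's audio bytes, skipping ID3 metadata.

    A rough sketch: strips one leading ID3v2 tag and one trailing
    ID3v1 tag, then hashes whatever remains.
    """
    # ID3v2: 10-byte header, payload size stored as a 28-bit
    # "syncsafe" integer (7 bits per byte) in bytes 6-9.
    if data[:3] == b'ID3' and len(data) >= 10:
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        data = data[10 + size:]
    # ID3v1: fixed 128-byte block at the very end, starting "TAG".
    if len(data) >= 128 and data[-128:-125] == b'TAG':
        data = data[:-128]
    return hashlib.md5(data).hexdigest()
```

Two copies of a song whose tags differ but whose audio frames are identical should then produce the same digest.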

 

Update 2010-07-25

I have now looked into the CPAN archive, and as usual there is a Perl module, Audio::Digest::MP3, that does exactly what you want :-)

 

Audio::Digest::MP3 - Get a message digest for the audio stream out of an MP3 file (skipping ID3 tags)
Audio::Digest::MP3 - search.cpan.org

Untested by me so far.


Counting bpm (beats per minute) on Linux

published Jul 23, 2010 01:10   by admin ( last modified Nov 09, 2012 11:53 )

 Last updated: November 25, 2010

Summary: Use the command line tool soundstretch:
soundstretch file.wav -bpm
It detects about 50% of my salsa music correctly (sample size: about 30 songs). I have written a wrapper around soundstretch for bpm assessment of mp3 files. It is here.

There is also bpmcount, part of BpmDj. Beware that it runs about 100 times slower than soundstretch (6 minutes compared to 3.6 seconds). I have only tried one song so far in bpmcount, and it got that one wrong, reporting 180 bpm when it is around 208 bpm.

For better performance than soundstretch (~80%) and automatic bpm detection with a GUI, use the free Abyssmedia BPM Counter, and run it under Wine. Piston bpm counter also runs under Wine, and writes BPM tags, but performs worse than Abyssmedia and Mixmeister.

For manual beat counting, use Salsa City Beat Counter, or the beat counter in Banshee, which will also give you a gui to the soundstretch code.

 

Bpm counting software on Linux

What software can you use to count beats per minute in your songs using Linux? Although there is a slew of software on Windows, the selection is smaller on Linux, and if you look closer at the Linux offerings you will find that they often use the same bpm counting code.

The two sources of bpm counting code that I have found are the soundtouch library by Olli Parviainen and the bpmcount from BpmDj by Werner Van Belle. Van Belle has also written a paper outlining the algorithm used in bpmcount.

Soundtouch is used by the command line application soundstretch, and by the GUI applications Banshee (in the shape of a gstreamer plugin) and Mixxx (as a library).

Bpmcount can be used as a standalone command line application and also from inside BpmDj.

I have used salsa music to test the counters, and there are many rhythms in salsa songs for a poor piece of software to latch on to. Salsa also lacks deep bass (a subwoofer is useless), which may make the task more difficult. Salsa typically ranges from 165 bpm to 210 bpm.

Bars and beats

If you have looked at the Salsa City Beat Counter you may have noticed that further down on the page it says "bar", not "beat". Let's sort out the difference between bars per minute and beats per minute, and why you would want to know either of them for a piece of music. For most music there will be 4 beats to a bar, so a song which has 160 bpm (beats per minute) runs at 40 bars per minute. In this text bpm will henceforth always mean beats per minute.

As for why you would want to know the bpm: since you are reading this text you probably already know precisely why, but generally speaking it is useful in connection with dancing. If you are DJ'ing or practicing dancing, it is useful to play songs that fit in a certain range of dance speed. If your software includes pitch shifting and time stretching, you can also use bpms to sync up songs.

Command line applications

Soundtouch is used by the command line application soundstretch, and bpmcount from BpmDj can be used as a standalone command line application. Both operate on WAV files rather than mp3 files. In my tests bpmcount is approximately 100 times slower than soundstretch, meaning a song that takes 3.6 seconds to classify with soundstretch takes a full six minutes with bpmcount. I have only tried one song so far in bpmcount, and it got that one wrong, reporting 180 bpm when it is around 208 bpm. Soundstretch I have tried on about 30 files.

Soundstretch has a tendency to output the bpm count as a quarter or half of the real count. It got the 208 bpm file as 52 bpm. I have written a wrapper around it to compensate for this and to make it operate on mp3 files. It is here.
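The compensation logic in such a wrapper can be sketched like this (a hypothetical helper, not the wrapper's actual code; the 150-210 bpm window is the salsa range used elsewhere in this post):

```python
def normalize_bpm(raw, low=150.0, high=210.0):
    """Fold a raw bpm estimate into a plausible window.

    soundstretch often reports a half or a quarter of the true rate,
    so try the raw value scaled by 1/4, 1/2, 1, 2 and 4 and keep the
    first candidate that lands inside the window.
    """
    for factor in (0.25, 0.5, 1.0, 2.0, 4.0):
        bpm = raw * factor
        if low <= bpm <= high:
            return bpm
    # Nothing fits; give the raw value back and let a human decide.
    return raw


print(normalize_bpm(52))   # 208.0 - the 208 bpm song reported as 52
print(normalize_bpm(96))   # 192.0
```

The window only works because salsa spans less than a factor of two in tempo; for a genre spanning more, the doubling would be ambiguous.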

Mixxx

Mixxx is a piece of DJ software that has beat counting built in. It has a reasonably simple interface for analyzing files, once you find it, but the BPM is not written to disk. Mixxx uses the soundtouch library.

There is an analyze library view where you can select and analyze files, including getting the bpm.

 

 

 

You can register profiles, giving sane max and min bpm values for your music genres.

 

 

 

But alas, you can't get Mixxx to write the bpm to disk, despite the very nice properties dialog, with doubling and halving, tapping the beat and a next button. The problem is that "Apply" does not write the bpm to disk.

 

 

BpmDj

There is a Linux native product called BpmDj, but I did not get it to compile in the time I was willing to spend on it (and the precompiled binary did not work for me on Ubuntu 9.10 or 10.10). However, it contains a command line app called bpmcount, and that worked just fine.

 

Banshee

Banshee is a music player quite similar to Rhythmbox (but written in the Mono framework). It is the only music player on Linux so far that I have seen take bpm seriously: bpm counting and tagging are well integrated in the GUI. It uses the soundtouch library in the shape of a gstreamer plugin, so performance can be expected to be identical to soundstretch, but with rounding errors: soundtouch has a tendency to output a bpm rate a quarter or a half of the correct rate. Multiplying by e.g. 4 rectifies this, but since Banshee reports the bpm count as an integer, you do lose a bit of precision.

 

Running Windows software under wine

One of the most recommended automatic beat counters for Windows is MixMeister, but according to the database at winehq it does not run well under Wine on Linux. I have tried it: it installs and runs, but no beat counting takes place.

However, Abyssmedia BPM Counter for Windows runs just fine under Wine, and does a pretty good job. With BPM Counter you will have to read the bpm counts off the screen; there is no facility for writing back to the mp3 files or to a separate file. It detected 80% of my salsa music correctly, and there are many rhythms in salsa songs for a poor piece of software to latch on to. It did, though, more or less consistently report half the bpm of each song: I got about 70% of the songs classified at half speed. A simple reality check fixed that, though: salsa music ranges from 150 bpm to 210 bpm, so a 96 bpm classification is actually 192 bpm. The same adjustments were made for soundstretch.

In the second column, "File", the correct verified (give or take a few beats) bpm count for the song is appended.

Abyssmedia BPM Counter does a pretty good job, as long as you double the bpm. A pity it does not write bpms to disk.

Trying to find something that writes to tags directly, I also tried BPM ProScan running in a win2k installation under vmware. It did not perform well on salsa songs. As an example, Lollobrigida by Ricardo Lemvo was classified as having 120 bpm. It has 180 bpm in the first half and a bit over 160 bpm in the second half. Other bpm counters sometimes get it at 90 bpm, which is understandable, but 120 bpm is right out.

PistonSoft BPM Detector

PistonSoft BPM Detector installs and runs under Wine. Below is a screenshot that compares the actual bpm count of files with the classification performance of PistonSoft BPM Detector and Mixmeister BPM Analyzer.

In the first column, "Filename", the correct verified (give or take a few beats) bpm count for the song is appended. The next column, "BPM", is the bpm count provided by PistonSoft BPM Detector running under Wine, and the third column, "Tag BPM", shows the bpm tag of the files as written by Mixmeister BPM Analyzer running under Windows Vista:

 

It is pretty clear that MixMeister BPM Analyzer outperforms PistonSoft BPM Detector on salsa music, although you often need to double the bpm classification for MixMeister BPM Analyzer.

However PistonSoft gets points for running under my Wine installation, which MixMeister does not.

 

Mixmeister BPM analyzer

Getting back to Mixmeister BPM Analyzer: I finally gave in and installed it on a Windows Vista machine. It dumps the bpm data to a tab-delimited data file, with the bpm count as the last item in each row. I ran it on the same files as I had classified with BPM Counter (and adjusted manually). In these I had encoded the bpm count right into the file name, and since the file names were included in the data file put out by Mixmeister, it made for easy comparison of the bpm classifications of Mixmeister and Abyssmedia; it turned out that they were often within one bpm of each other. Reading more on the Internet, it seems that Mixmeister does write to the bpm ID3v2 tag! It just doesn't tell you about it. Looking at one of the files I tested with in a hex editor, there is indeed a TBPM field. It may be worth firing up Windows just for Mixmeister then, as long as it does not overwrite an existing BPM tag.

I am right now classifying 300+ mp3s with it, and I notice that it almost always suggests half the bpm of the actual bpm of the song. Looking through the numbers, this is caused by it never assigning anything above 165 bpm to a song. While writing my wrapper around soundstretch I applied a window of sanity for bpm classifications, and it seems that Mixmeister does the same: it fits everything in between 165 bpm and probably 83 bpm at the low end. Since salsa pretty much starts at 165 bpm and goes upwards from there, this explains the consistent halving of the reported bpm compared to the actual bpm.

Manual beat counters

How do I know the counts were correct? I double-checked (well, not all songs, but many) with the JavaScript-based manual beat counter Beat Counter, which runs in a browser and hence works just fine under Linux. With Beat Counter, you start it with a mouse click and tap the beat with any key on the keyboard (do not use the space bar on Linux though, it will scroll the page).

Bear in mind that the mouse click is counted as one tap, and if you casually start it with a mouse click and then wait a few seconds before tapping, it will assume there was a super long beat between the mouse click and the first tap. This will result in the average bpm being wrong: it slowly rises but never reaches the true bpm of the song.
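The arithmetic behind that skew is easy to demonstrate (a sketch of how such a tap counter averages; the timestamps are made up):

```python
def average_bpm(tap_times):
    """Average bpm from a list of tap timestamps in seconds, the way
    a simple tap counter computes it: (taps - 1) beats spread over
    the span from the first tap to the last.
    """
    beats = len(tap_times) - 1
    span = tap_times[-1] - tap_times[0]
    return 60.0 * beats / span


# Ten taps at a true 180 bpm (one beat every 1/3 second)...
steady = [i / 3.0 for i in range(10)]
# ...versus the same taps preceded by a stray mouse click 2 s earlier:
with_click = [0.0] + [2.0 + i / 3.0 for i in range(10)]
print(average_bpm(steady))      # 180.0
print(average_bpm(with_click))  # 120.0 - dragged down by the pause
```

The initial pause stays in the average forever, which is why the reading climbs toward, but never reaches, the song's true tempo.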

Update: Here is a better beat counter: Salsa City bar counter, a bit easier to use.

 

BPM Counter is a fast and accurate beats per minute detector for MP3 music.



  BPM Counter - detect BPM of any song.

 

A manual Beats per Minute (BPM) counter for DJs. Counts the beats-per-minute of a song by tapping a key or the mouse to the beat of a song. Simply click on the page to start the time then tap any key to the beat.



 The JavaScript Source: Math Related: Beat Counter

 

 


Why mention that they were Boeing planes?

published Jul 21, 2010 07:37   by admin ( last modified Jul 21, 2010 07:37 )

Two airliners, one from SAS and one from Finnair, came fairly close to colliding above Östersund a few weeks ago. TT chooses to describe this as two Boeing planes nearly colliding. Mentioning the planes' make feels about as relevant as writing, after an incident between two passenger cars, something like "Two Renault cars nearly crash on Storgatan".

Surely it is not particularly likely that the fact that the planes are of the Boeing make is what put them close to each other in the airspace?

The Internet is full of conspiracy theories, and this TT article, which I saw in Svenskan, almost got me thinking along those lines. Competition between Airbus and Boeing is fierce, and certain expressions, such as "If it's not Boeing, I'm not going" and "Scarebus", have gained a foothold among some internet debaters. It becomes a kind of viral marketing about which manufacturer is best.

Two Boeings - a 737 from SAS and a 757 from Finnair - with passengers met on July 2 at 11,000 meters above Östersund. Their courses intersected, their altitude was the same. The risk of collision was obvious


Read more: Boeing nära krocka över Östersund | Inrikes | SvD


A stickler for order: time and space in Expressen and Svenskan

published Jul 19, 2010 01:40   by admin ( last modified Jul 19, 2010 01:40 )

Around vacation time - or is it the heat? - articles can contain a few extra errors. It gets especially fun when they somehow violate time and space.

Mixing up time

Expressen's online edition currently writes on its front page that Pete Doherty did well during his concert at the Arvika festival, despite an incident with the police after the concert.

 

 

Here I would argue that the most effective way to deal with the universe, practically speaking, is to assume that events occurring later in time cannot affect what has already happened. "Despite the incident afterwards"???

 

Mixing up space (geography)

In Svenskan you can now read that a tropical disease has been sighted in France, and:

The case was the first in Europe this year, but a few years ago 77 people died on the French island of Réunion

But Réunion lies ten thousand kilometers from Europe (in the square on the Wikipedia map below):

I suppose I should send an arrow of time to Expressen and a map to SvD?


Speaker switch that daisy chains speakers, to avoid low impedance

published Jul 17, 2010 05:51   by admin ( last modified Jul 17, 2010 05:51 )

At work, we have a stereo, four 4 ohm speakers and a subwoofer. The amplifier only has one set of speaker outputs and is rated for outputting at 4 ohms. How do we go about connecting the speakers without getting too low an impedance? The speaker switch boxes at the local shop all seem to expand the speaker connections by putting them in parallel, which would bring the load the amplifier sees down to 2 ohms, nominally.

The subwoofer has connectors both for line-in levels (250 mV or 775 mV, I presume) and for speaker cable. I assume the speaker cable input is high impedance (kilo-ohms or higher).

I thought about what the schematics could be for a simple speaker switch box, and below is what I came up with. It is untested:
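Whatever the switch ends up looking like, the impedance arithmetic that motivates daisy chaining is simple (per stereo channel, two of the 4 ohm speakers):

```python
def series(*impedances):
    """Impedances in series simply add."""
    return float(sum(impedances))


def parallel(*impedances):
    """Parallel impedance: reciprocal of the sum of reciprocals."""
    return 1.0 / sum(1.0 / z for z in impedances)


# Two 4 ohm speakers per channel in parallel: half the rated load.
print(parallel(4, 4))   # 2.0 - below the amplifier's 4 ohm rating
# The same two speakers daisy chained in series: a comfortable load.
print(series(4, 4))     # 8.0
```

So per channel, the series chain presents 8 ohms instead of 2, at the cost of each speaker seeing half the voltage.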


Suspend and resume an Amazon virtual computer?

published Jul 17, 2010 04:55   by admin ( last modified Jul 17, 2010 04:55 )

 

With the brand new possibility to use your own kernels, it ought to be possible for a virtual computer on the Amazon grid to be shut down with its state intact, saved to disk.

Virtual servers - where you can think of a cloud of services in which you pick and choose what you want and expand and contract your capacity in minutes - are becoming, I'd say, the premier choice for many outfits. There are a number of well-reputed providers, such as Amazon and Linode, but in the case of Amazon the smallest server is still rather big and pricey for use as an alternative desktop or for throw-away development servers.

However, Amazon's storage prices are low enough that having a number of standby servers, ready to recreate their state in seconds, could open up new applications, such as desktop machines, or a company keeping a slew of development, staging and testing servers.

With the advent of persistent storage of your operating system on Amazon EC2 (September 2009), and now the brand new possibility to use your own kernels, it ought to be possible for a virtual computer on the Amazon grid to be shut down with its state intact, saved to disk (i.e. hibernated), and then be woken up at a later time, so you can just continue working where you left off. There are such facilities in the Linux kernel, and there is also the Tux on Ice project. Tux on Ice allows you to save the RAM state to a file on the file system, so there is no need for a persistent swap partition.

 

This Feature Guide below is designed to teach System Administrators and other IT professionals how to utilize the User Provided Kernels in Amazon EC2. With this feature, Amazon allows you to load a para‐virtual Linux kernel within an Amazon Machine Image (AMI) or Amazon EBS volume. You can also now seamlessly upgrade the kernel on Amazon EBS‐backed instances.



Read more: Amazon Web Services Developer Community : Enabling User Provided Kernels in Amazon EC2


Many routers vulnerable to a DNS attack

published Jul 16, 2010 03:31   by admin ( last modified Jul 16, 2010 03:31 )

Many internet connections have a router sitting between the Internet and an internal network, often with several computers on it. A security researcher claims to have written a program that makes it possible to get inside the router on such a network, provided that someone on the network visits a website carrying attack code. Many routers appear to be vulnerable to this attack. Forbes (see below) has published a list of routers where you can check whether your own router is in the danger zone. The third-party firmwares DD-WRT and OpenWRT are listed as attackable. Tomato is not listed, but unless it has changed code in the affected area compared to the original Linksys code (on which Tomato is based), chances are it is attackable too. Quick countermeasures are to change the router password from the factory settings, and never to store the router password in the browser.

Heffner's trick is to create a site that lists a visitor's own IP address as one of those options. When a visitor comes to his booby-trapped site, a script runs that switches to its alternate IP address--in reality the user's own IP address--and accesses the visitor's home network, potentially hijacking their browser and gaining access to their router settings.


Read more:
“Millions” Of Home Routers Vulnerable To Web Hack « The Firewall - Forbes.com

 

 

This seems to be how the attack works:

As I understand it, it generally works like this: You set a ridiculously short TTL on the server hosting the exploit. When a victim connects you grab their IP address, add it and any other likely target IPs to the list of A records for the server and reload the zone. Your attack code just needs to wait for the TTL to expire, DNS to refresh and then try and connect to the target, which now appears to come from an attack on a trusted network.


Read more: Slashdot Comments | Millions of Home Routers Are Hackable


Offloading downloads from Plone, (relatively) transparently

published Jul 12, 2010 12:47   by admin ( last modified Jul 12, 2010 12:47 )

I've just built a first version of a system that offloads file downloads from the responsibility of Plone, while keeping authentication and authorization. With help from Apache and iw.fss, surprisingly few lines of code are needed to accomplish this.

First come two bulleted summaries of what is going on. Look further down in this post for a longer description:

 

Workflow

  • The user accesses a page in Plone, where he is offered a file to download
  • When he clicks the download link, a redirect is made to a subdomain, with the exact same path + filename
  • The subdomain "steals" a cookie from the user and uses it to authenticate an xml-rpc call that checks whether the user has the right to download the file
  • If so, mod_python lets the request through, and the file is downloaded through Apache

 

Technology

  • Have Apache in front
  • Use iw.fss to store files on the file system
  • Serve the files out on a subdomain with Apache
  • Use URL rewriting to make the authentication cookie or session cookie accessible to the subdomain
  • Use mod_python as the access handler for the download subdomain
  • The mod_python script has an xml-rpc client that can be configured with cookies
  • The xml-rpc client is configured with the user's cookie, so to Plone it is the user himself
  • xml-rpc asks Plone if the download is OK by accessing a proxy method on the object to be downloaded
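The cookie-bearing xml-rpc client in the bullets above can be sketched roughly like this. This is not the post's actual transport code, just the general idea, written for modern Python's xmlrpc.client (the original used the Python 2 xmlrpclib); the URL and method name in the usage comment are made up:

```python
import xmlrpc.client


class CookieTransport(xmlrpc.client.Transport):
    """Send a fixed Cookie header with every xml-rpc request, so
    Plone treats the call as coming from the logged-in user whose
    cookie was captured by the rewrite rule.
    """
    def __init__(self, cookie):
        super().__init__()
        self.cookie = cookie

    def send_headers(self, connection, headers):
        # Inject the cookie alongside the normal request headers.
        connection.putheader('Cookie', self.cookie)
        super().send_headers(connection, headers)


# Hypothetical usage: ask a proxy method on the object whether the
# download is allowed:
# proxy = xmlrpc.client.ServerProxy(
#     'http://server.topdomain/site/some/file/can_download',
#     transport=CookieTransport('__download__ac=' + cookie_value))
# allowed = proxy()
```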

Longer description

Plone is not very good at serving out big files. A couple of concurrent huge downloads would take up a lot of Zope processes and threads. Here is a way of offloading the file downloading to other server processes (in this case Apache) with authentication and authorization intact.

The stuff needed is

  • Apache with mod_python and mod_rewrite
  • A custom transport agent for xml-rpc (included in this post)
  • Plone

Let's look at how it is set up, going from the outside in:

Make the authentication or session cookie available to the download server

The user is browsing the web site. The first thing the user's browser hits is the Apache server, sending the request further to Plone. The user may or may not be logged in. The virtual server directive in Apache looks something like this:

<VirtualHost *:80>
ServerName server.topdomain
proxyPreserveHost on
RewriteEngine On
# Match the __ac cookie if present, and make a new cookie with same value, but that can be sent to sub domains
RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]
RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]
RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [L,P]

</VirtualHost>

The RewriteRule looks pretty standard: it rewrites the request to fit the Virtual Host Monster in Plone, and the Plone site with the id "site" is served out. However, there is some extra stuff, particularly the line:

RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]

This line is a condition that is triggered if the incoming request has a cookie by the name "__ac". The "__ac" cookie contains information that authenticates the user to the Plone site. If your site uses a cookie other than "__ac", put its name here instead. It does not matter if the cookie contains just a session id or the login information. The important thing is this part:

([^;]+)

...it matches the cookie value, and the brackets mean that Apache remembers this part of the pattern. Since this is the first (and in fact only) pair of brackets in the pattern, it will be remembered as %1 (Apache denotes back references from a RewriteCond, i.e. from the environment or http headers, with a percentage sign; inside a url pattern it is the standard dollar sign).
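The capture can be tried out with Python's re module. This is a toy demo only; the Cookie header value here is made up:

```python
import re

# A made-up Cookie header, mimicking what Apache's RewriteCond sees
header = "foo=bar; __ac=abc123XYZ; other=1"

# The same pattern as in the RewriteCond: capture everything after
# "__ac=" up to the next semicolon
match = re.search(r"__ac=([^;]+)", header)
print(match.group(1))  # → abc123XYZ
```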

Now, let's look at the rewrite rule:

RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]

The first part is standard. The square brackets at the end usually contain only control flow flags (such as "stop here" or somesuch), but the square brackets here also carry an instruction to send a cookie out with the response:

co=__download__ac:%1:.server.topdomain

We define here a new cookie called __download__ac (it can be called anything), but the important thing is the dot in front of the domain: it means that the cookie is available to the domain server.topdomain and all its sub domains. In this way the cookie can be shared with sub domains.
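The effect of the leading dot can be illustrated with Python's standard cookie classes. A toy demo only, with a made-up cookie value:

```python
try:
    from http.cookies import SimpleCookie   # Python 3
except ImportError:
    from Cookie import SimpleCookie         # Python 2, this post's vintage

# Rebuild the Set-Cookie header that the co= flag in the RewriteRule
# produces; the cookie value is a placeholder
cookie = SimpleCookie()
cookie['__download__ac'] = 'COOKIEVALUE'
cookie['__download__ac']['domain'] = '.server.topdomain'
header = cookie['__download__ac'].OutputString()
# The Domain attribute starts with a dot, so browsers will send the
# cookie to download.server.topdomain as well
print(header)
```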

The standard __ac cookie in Plone cannot be shared with sub domains. The __ac cookie is a bit insecure because it actually contains the user name and password in scrambled form, but for other reasons we wanted to keep it in this system.

Another way of making the cookie shared would be to write a PAS plugin for Plone that can be configured to send out the cookie with a dot-prefixed domain qualifier, but the above rewrite is hard to beat for brevity.

If one goes with the more secure session cookie, there seems to be a PAS plugin that already allows configuration with a dot-prefixed domain, in which case the rewrite above is not necessary. Either way you do it, at this point you should have a cookie with authentication information that can be read by sub domains.

Apache config of download server

When the user clicks on the download link he will be redirected to a sub domain. Let's look at the Apache configuration for that:

<VirtualHost *:80>
ServerName download.server.topdomain
DocumentRoot /var/www/html
<Location />
#Extend the Python path to locate your callable object
PythonPath "sys.path+['/var/www/mod_python']"
# Make Apache aware that we want to use mod_python
AddHandler mod_python .py
PythonAccessHandler downloadauth
</Location>
</VirtualHost>

This looks pretty much like a standard virtual server directive in Apache, for serving out static files from the file system, the only embellishment being that a Python script gets registered as access handler for all requests. This means that the script does not actually serve out any content; it just sits as a gatekeeper and decides whether Apache should serve out the requested content to the user.

As per the configuration above, the script has the name "downloadauth.py" and lives in the "/var/www/mod_python" directory on the server. Let's take a look at that script:

#!/usr/bin/env python
# -*-python-*-
#
from mod_python import apache, Cookie
import xmlrpclib
from cookiestransport import CookiesTransport

PLONE_URL = 'http://192.168.1.51:6080/site'


def accesshandler(req):
    """Return apache.OK if the authorization was successful,
    apache.HTTP_UNAUTHORIZED otherwise.
    """
    # Strip the file name; the parent object in Plone is what we ask about
    plone_local_uri = "/".join(req.uri.split("/")[:-1])
    xmlrpc_server_url = PLONE_URL + plone_local_uri

    # Get the __ac cookie
    cookies = Cookie.get_cookies(req)
    # The Apache rewrite adds this cookie as a sub domain friendly copy of "__ac"
    cookie = cookies.get('__download__ac', None)
    cookies_spec = []

    if cookie is not None:
        cookie_value = cookie.value
        # Cookie values are delivered wrapped in quotes
        cookie_value = cookie_value.replace('"', '')
        # We configure the xml-rpc client to send it as "__ac"
        cookie_spec = ['__ac', cookie_value]
        cookies_spec.append(cookie_spec)
    server = xmlrpclib.Server(xmlrpc_server_url,
                              transport=CookiesTransport(cookies=cookies_spec))
    if server.can_i_download():
        return apache.OK
    else:
        return apache.HTTP_UNAUTHORIZED

This script takes the cookie we defined, and sends an xml-rpc call to Plone. The xml-rpc client is configured with a special transport agent that can take cookies; in this way the xml-rpc client gets authenticated as the user doing the download. Plone runs on Zope, and Zope supports xml-rpc out of the box. All we need now is to access a method in Plone that has the same access rights as the file the user wants to download.

If we assume the object in Plone holding the download looks something like this:

object
method: download (permission: 'View')
method: can_i_download (permission: 'View')

I.e. it has two methods,

  • One that serves out the object from inside of Plone
  • One that just returns True

The first method is never going to be used; we do not want Plone to serve out the file from within Plone. However we make a note of what permission that method has (usually "View"). We then make a method "can_i_download" (or whatever name you fancy) with the same permission. This method just returns True. It can look like this:

security.declareProtected(permissions.View, 'can_i_download')
def can_i_download(self):
    """If the user has permission to run this method, he
    has permission to download the file"""
    return True

The cookie aware transport agent for xmlrpc

The code for the cookie aware transport class is largely taken from:

Roberto Rocco Angeloni » Blog Archive » Xmlrpclib with cookie aware transport

My version does away with the ability to receive cookies from the server, and adds the capability to configure the client with arbitrary cookies from code:

 

# A module with a class that allows the xmlrpc client to be configured with
# a list of cookies, for authentication with Plone xml-rpc methods.
# Based completely on code from Rocco Angeloni:
# http://www.roccoangeloni.it/wp/2008/06/13/xmlrpclib-with-cookie-aware-transport/

import xmlrpclib


class CookiesTransport(xmlrpclib.Transport):
    """A transport class for xmlrpclib, that can be configured with
    a list of cookies and uses them in the xml-rpc request"""

    def __init__(self, cookies=[]):
        """The cookies parameter should be a list of two item lists/tuples [id, value]"""
        if hasattr(xmlrpclib.Transport, '__init__'):
            xmlrpclib.Transport.__init__(self)
        self.cookies = cookies

    def request(self, host, handler, request_body, verbose=0):
        # issue XML-RPC request
        h = self.make_connection(host)
        if verbose:
            h.set_debuglevel(1)
        self.send_request(h, handler, request_body)
        self.send_host(h, host)
        # Send each configured cookie as a Cookie header
        for cookie_spec in self.cookies:
            h.putheader("Cookie", "%s=%s" % (cookie_spec[0], cookie_spec[1]))
        self.send_user_agent(h)
        self.send_content(h, request_body)
        errcode, errmsg, headers = h.getreply()
        if errcode != 200:
            raise xmlrpclib.ProtocolError(
                host + handler,
                errcode, errmsg,
                headers
                )
        self.verbose = verbose
        try:
            sock = h._conn.sock
        except AttributeError:
            sock = None
        return self._parse_response(h.getfile(), sock)

 

iw.fss - care and feeding

iw.fss is a very slick product for external storage from Ingeniweb. Deciding what fields should be stored externally can be done with a ZCML file, you do not need to touch the code of the content types.

Making migration work in iw.fss

iw.fss has a control panel that allows you to migrate the already stored data in the fields, to the external storage. In our case a good number of field migrations failed. It turned out that at least in our system (which was prepopulated from an old legacy system) the data was sometimes stored in an object type, "OFS.Image.Pdata", that iw.fss migration could not handle (a bug ticket has been submitted to the iw.fss people about this).

The following monkey patch (used with collective.monkeypatcher), stored in a "patches.py" file fixed that problem:

import cgi
import cStringIO
from ZPublisher.HTTPRequest import FileUpload
from iw.fss.FileSystemStorage import FileUploadIterator

old__init__ = FileUploadIterator.__init__


def new__init__(self, file, streamsize=1 << 16):
    """This is a file upload"""
    if not hasattr(file, 'read') and hasattr(file, 'data'):
        data = str(file)  # see OFS.Image.Pdata
        fs = cgi.FieldStorage()
        fs.file = cStringIO.StringIO(data)
        file = FileUpload(fs)
    return old__init__(self, file, streamsize)

 

...with the following ZCML:

<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:monkey="http://namespaces.plone.org/monkey"
    i18n_domain="my.application">

  <include package="collective.monkeypatcher" />

  <monkey:patch
      description="Patching FileUploadIterator to handle OFS.Image.Pdata objects"
      class="iw.fss.FileSystemStorage.FileUploadIterator"
      original="__init__"
      replacement=".patches.new__init__"
      />

</configure>

Selecting storage strategy in iw.fss and how to make the url redirect

iw.fss can store the external files in different layouts, called "strategies". I chose "site2", which mirrors the directory structure of Plone completely and then adds the file in the innermost sub directory. The problem is that the path inside of Plone only reaches that last sub directory; Plone does not tack the name of the file onto the end of the url. I.e. if the file object in Plone has the url:

http://server.topdomain/afolder/another/folder/file_object

iw.fss will store it (with site2 layout) as:

/afolder/another/folder/file_object/filename

That "filename" can be anything. Thankfully iw.fss also stores a file with the fixed name "fss.cfg" next to it, which records the name of the other file (the one we want to serve out).
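Judging only from what the redirect code reads out of it, fss.cfg contains at least a FILENAME section with a file entry, along the lines of this sketch (the file name is of course made up):

```ini
[FILENAME]
file = some_uploaded_document.pdf
```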

I wrote a tool that can be configured with info on where on the file system iw.fss stores its data, and what the url of the download server is. Let's say the tool is called "tool_that_stores_fss_info". If you do not want to write a tool, a config sheet in portal_properties will do too.

In your file object, write a method that goes something like this:

# Needs at module level: import ConfigParser
# and: from Products.CMFCore.utils import getToolByName

security.declarePublic('redirectToExternalFile')
def redirectToExternalFile(self):
    """Redirects to the external download url"""
    sc_tool = getToolByName(self, 'tool_that_stores_fss_info')
    portal_url = getToolByName(self, 'portal_url')()

    local_url = self.absolute_url()[len(portal_url):]  # Extract local part of url
    file_path_to_download = sc_tool.fs_path_to_download_root + local_url
    config = ConfigParser.ConfigParser()
    config.read(file_path_to_download + '/fss.cfg')
    file_name = config.get('FILENAME', 'file')
    url_to_download = sc_tool.url_to_download_root + local_url + "/" + file_name
    self.REQUEST.response.redirect(url_to_download)

 

Buildout configuration of iw.fss

Below is an fss configuration that works for a Plone site named "site" stored in the Zope root, using "site2" as storage layout, and the external files get stored inside the var directory of the buildout.

[fss]
recipe = iw.recipe.fss
zope-instances =
${instance:location}
storages =
# The first is always generic
global /
site /site site2 ${buildout:directory}/var/fss_storage_site

Other location of the external files

In the above case the files get stored in the var directory inside of the Plone buildout. It may be a better idea to store them on the server in a directory structure with more permanence, like "/var/www/html" as is suggested in the Apache configuration for the download server earlier in this post. The var directory in a buildout does not get overwritten on invocations of buildout, but it is still inside the buildout directory structure.

Make sure, however, that you have the iw.fss storage on the same volume as the Plone buildout:

Many unixish systems have different directories (/tmp, /home, /var and so on) on different mounted volumes. Python's os.rename cannot move files between volumes, and therefore the code in iw.fss that uses os.rename cannot handle the storage being on a different mounted volume (say, on "/var/www/html", when the Plone site is a buildout in the "/home" hierarchy). shutil.move may be an alternative to os.rename (a bug ticket has been submitted to the iw.fss people about this).
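A minimal sketch of the difference: os.rename fails with EXDEV when source and destination are on different mounted volumes, while shutil.move falls back to copy-and-delete. The temporary directories here happen to be on the same volume, so both calls would work; the point is that only shutil.move is safe across volumes:

```python
import os
import shutil
import tempfile

# Two directories standing in for "buildout var" and "/var/www/html".
# If they were on different volumes, os.rename would raise OSError
# (errno EXDEV), while shutil.move would still succeed by copying
# the file and then deleting the original.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, 'blob.bin')
with open(src, 'w') as f:
    f.write('data')

shutil.move(src, os.path.join(dst_dir, 'blob.bin'))
print(os.path.exists(os.path.join(dst_dir, 'blob.bin')))  # → True
```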

Some possible improvements and simplifications

What changes could be made to make this more of a solution ready out-of-the box, to be deployed on different servers?

Getting the cookie to the sub domain

  • The rewrite in Apache could be simplified, or
  • The cookie handler in Plone could check what domain it is serving the cookie in, and tack on a dot. This behavior could be switched on and off with a checkbox in the ZMI. In this way no Apache rewrite would be necessary at all.

mod_python contacting Plone

  • Assuming that Plone runs on port 8080 and on the same IP address as mod_python could be made the default in the mod_python script

Constructing the right url to redirect to

  • The redirect code could be factored out into a view (if views work with xml-rpc), or some other mechanism, so that a simple ZCML configuration would connect a downloadable field in a content type with the redirect code
  • It is probably possible to ask iw.fss where its file system storage is, in that way no separate setting would be needed for this in a tool or property sheet
  • The default domain to redirect to could be the one Plone is on, with "download." tacked on in front. Together with the preceding suggestion, no settings would need to be stored in Plone

Replacing xml-rpc

  • Instead of xml-rpc, maybe one could simply use an authenticated HEAD request to the file in Plone? Then no custom authentication method would be needed.
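A sketch of what that could look like. The host, port, object path and cookie value are all placeholders to be supplied by the access handler; instead of the can_i_download xml-rpc round trip, it sends a HEAD request for the object itself, carrying the __ac cookie, and checks the status code:

```python
try:
    from http.client import HTTPConnection   # Python 3
except ImportError:
    from httplib import HTTPConnection       # Python 2, this post's vintage

def may_download(host, port, path, ac_cookie):
    """Return True if an authenticated HEAD request for the Plone object
    comes back 200. Replaces the can_i_download xml-rpc call; all
    arguments are placeholders filled in by the caller."""
    conn = HTTPConnection(host, port)
    try:
        conn.request('HEAD', path, headers={'Cookie': '__ac=%s' % ac_cookie})
        return conn.getresponse().status == 200
    finally:
        conn.close()
```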

Use mod_xsendfile

Since I did this work, I have noticed that Jazkarta has commissioned the creation of a file upload/download optimization:
 
One of the parts of this is Apache's mod_xsendfile module. It seems that its purpose is to watch the response headers from a proxied server (e.g. Zope), and if an X-Sendfile header is found, it cancels the response and instead serves out a file via Apache, located on the file system according to the value of the X-Sendfile header.
 
Our current implementation is working, but this would simplify things a bit, not needing a separate domain, cookie rewrites or xml-rpc calls; on the other hand there would be a new module to install and test:
 
http://tn123.ath.cx/mod_xsendfile/
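Sketched with mod_xsendfile's documented directives, the Apache side might then look something like this (names and paths reuse the ones from earlier in this post; treat it as an untested outline):

```apache
<VirtualHost *:80>
ServerName server.topdomain
# Act on X-Sendfile headers coming back from the proxied Zope response
XSendFile on
# Only files under this path may be handed out this way
XSendFilePath /var/www/html
# ...plus the proxy/rewrite directives from the first virtual host...
</VirtualHost>
```

The Plone side would then set an X-Sendfile response header containing the file system path of the fss-stored file and return an empty body, instead of streaming the file itself.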

 

Use Plone 4

If I understand correctly, the new BLOB support in Plone 4 is supposed to be easy on the CPU thanks to file iterators. This ought to mean that one can have a good number of threads per ZEO client, and that these should be able to serve quite concurrently (i.e. non-blocking).

Change the architecture

Tramline (PloneTramline) uses Apache filters to intercept requests and responses in Apache. In this way it can monitor all requests and responses, insert a file on the way out, and take care of a file on the way in.

It would be cool if it were possible to configure Apache in such a way that you can choose proxying based on the response headers (from the first proxy you try). If those response headers match some criterion (such as there being an "x-download-from-somewhere-else" header), Apache would simply switch to another server to proxy. I do not think it can be done, but this page talks about a possible implementation:

Smart filter

In fact the above mentioned mod_xsendfile is a variation on this theme.


SPEAR - an attempt at finding experts and good knowledge artefacts

published Jul 11, 2010 02:08   by admin ( last modified Jul 11, 2010 02:08 )

One year ago or so I stumbled upon a guide on how you "can be an expert on almost anything" by manually identifying what knowledge is of most utility in a field. One part of the guide was about using del.icio.us as a help for identifying clusters of knowledge.

Apparently, there is some research into finding algorithms that can automatically sift through and rank both people and documents, at heart using heuristics such as that users who recommend or tag something early are more likely to be experts than later followers.

My co-worker Ching-man Au Yeung from University of Southampton and I presented the SPEAR algorithm in our joint paper "Telling Experts from Spammers: Expertise Ranking in Folksonomies" at the ACM SIGIR 2009 Conference



Läs mer: SPEAR Algorithm - Michael G. Noll


Länk - Discussion on WiFi interference sources

published Jun 29, 2010 12:01   by admin ( last modified Jun 29, 2010 12:01 )

 

Almost every evening, between 8:30 and 10:00, my Wi-Fi just dies.



Läs mer: Slashdot Ask Slashdot Story | Tracking Down Wi-Fi Interference?


Linux OCR for getting text from a screenshot

published Jun 15, 2010 06:52   by admin ( last modified Jun 15, 2010 06:52 )

 

Summary: For a 72dpi screenshot, gocr returned something intelligible, tesseract returned nothing and ocrad returned gibberish.

Multiplying the pixel count by 4 and interpolating helped tesseract and ocrad to output text at all, but they were still not superior to gocr.

 

These OCR programs are probably not calibrated for making text out of pixel-perfect low-resolution screen shots, but for high-resolution, somewhat noisy scans of different type faces on paper. Doing OCR from a screenshot ought to be quite easy: each letter is pixel perfect and looks exactly the same, and there are no problems with slanting text or other distortions. In fact, writing your own OCR program for this is a distinct possibility.
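To illustrate why, here is a toy sketch of template-matching "OCR" over pixel-perfect glyphs. The 3x3 bitmaps are invented for the example, not taken from any real font: since every character always renders to exactly the same pixels, an exact dictionary lookup of glyph tiles suffices.

```python
# Invented 3x3 glyph bitmaps (1 = dark pixel); a real screenshot OCR
# would harvest these templates from the actual font rendering.
GLYPHS = {
    ((1, 1, 1),
     (1, 0, 1),
     (1, 1, 1)): 'O',
    ((1, 0, 0),
     (1, 0, 0),
     (1, 1, 1)): 'L',
}

def recognize(tiles):
    """Map a sequence of glyph tiles to text, '?' for unknown tiles."""
    return ''.join(GLYPHS.get(tile, '?') for tile in tiles)

word = recognize([
    ((1, 0, 0), (1, 0, 0), (1, 1, 1)),   # the 'L' template
    ((1, 1, 1), (1, 0, 1), (1, 1, 1)),   # the 'O' template
])
print(word)  # → LO
```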

I had via mail received a 72dpi screenshot that I wanted to get the text from. The top part looked like this:

 salsatext.png

The top of the screenshot

Tesseract, which is a program that is highly recommended on the web, returned nothing when run on this screenshot. At first after reading this, I thought this had to do with my tif possibly having a layer of transparency, but ensuring it was not there did not change anything.

According to the same discussion, it seems like tesseract wants to have a high resolution image (see tests on that further down).

 

Now the ocrad program returned this:

Al Po_ Po_ _ Al _|_o _|_o _ M_|_o_hl_o
lollob_IOldo _|_o_do l_m_o
A_ _ol__lo _ _|_o_do l_m_o
__ _o _| Colmo_ _ ____ _ ___ T__o_
OIOo Ml__ __o C_o_o_o_ O_O____o
llo_o_do__ _ l_|_ __llO_ Co__ol__
_| Mo_|___o O_O____o A_oOo_
A_ Amo_ C_o_do Woblo_ lo_ Ml_odo_ _ Alb___o Bo__o_ _| Tl_o_ D
_o_ldo B___lol _ _|_hl_ _o_ b Bobb_ c___
lo_ Tomoll_o_ D_ OIOo O_O____o A_oOo_ _o__
A_ld _o_ Bo_____o
_o_ Wo_ Wo_ B___o _|__o C_bo_ Plo____

 ...and so on.

Gocr returned this:

y 9 999)        9       9     _J   ypp yyy    y
AI P a n P a n Y AI Vin o Vin o m eIc o c hita
L oIIo brigid a Ric ard o L e m v o
A V aIeria Ric ard o L e m v o
Se Va EICaiman fru ko Y Sus Tesos
Oiga mire Vea Guayacan Orquesta
LIora n d ote L uis F eIi e G o n z aIe z
EImanisero Orquesta Aragȯn
A Amor Cuando HabIan Las miradas AIberto Barros''EITitan D.
S o nid o B e stiaI Ric hie R a 6 B o b b C ru z
Los TamaIitos De OIga Orquesta Aragȯn,Josė
A cid R a y B arretto
Ran Kan Kan Buena Vista Cuban PIayers
Undanta Bo Kas ers Orkester

...and so on, which, given the non-outputting competition, must be deemed fantastic. Still, it cannot deal with any characters extending below the baseline (p, g and y for example), and all ls are interpreted as 1s.

Increasing the pixel density of the image

This turned out to be non-trivial with the tools I had at hand. I finally got resampling working with the program pnmenlarge, part of the netpbm suite of command line unixish image processing tools:

cat salsatext.pnm | pnmenlarge 4 > enlarged.pnm

This quadrupled each pixel, and now tesseract magically started working (after first converting to tif)!

Fil Pen Pen 'ii Fil '·.·'inp '·.·'inp et`} i···1el¤:p·:hite
Lpllplznrigide Riterdp Lem·-rp
.·!·.·-,# '·.·'elerie et`} Riterdp Lem·-rp
5e '·.·'e El Ceimen et`} Frulce 'ii 5us Tesps
Ciige i···1ire '·.·'ee Gueyeten Ordueste
Llprendpte et`} Luis Felipe Gpneelee
El i···1eniserp Ordueste ifiregdn
.·!·.·-,# Famer, Cuendp Hel:·len Les i···1iredes et`} .·!·.ll:¤ertp Eerrps "El Titen D. ..
Epnidp Eestiel et`} Richie Re·-,# El Epl:·l:·3r Crue
Lps Temelitps De Cilge Ordueste ifiregdn, _|pse
.·!·.¤:id Re·-,# Eerrettp
Ren Ken Ken Euene '·.·'iste ·Zul:·en F‘le3··ers
Llndenteg et`} Ep iiespers Orltester

Well, it does at least produce output, but the quality is at the point that you can barely guess which line it is trying to decode.

Let's try switching to Spanish as language:

.ü.| Pan Pan "x‛ .ü.| '·.·'ina '·.·'ina —l=*.`Š f'·'1a|·:·:··:|'•i|:a
La||a|:·ri·;|i·:|a F‘xi·:ar·:|a Lam'­.«a
.ü.'-,« '·.·'a|aria —l=*.`Š F‘xi·:ar·:|a Lam'­.«a
Sa '·.·'a El Caiman —l=*.`Š FrukJ:· "x‛ 5uS TaSaS
Diga Mira '·.·'aa Cuaşracan Dr·:]uaS|:a
Llarandata —l=*.`Š LuiS Falipa Ganzalaz
El f'·'1aniSara Dr·:]uaS|:a Aragón
.ü.'-,« Fumar, Cuanda Ha|:·|an LaS f'·'1ira·:|aS —l=*.`Š .ü.||:·ar|:·:· EarraS "EI Titan D. ..
Sanida EaS|:ia| —l=*.`Š F‘xi·:|'•ia Ra'-; Ex Ea|:·|:·ş» Cruz
LaS Tama|i|:aS Da Diga Dr·:]uaS|:a Aragón, _|aSa
.ü.·:i·:| Ra'-; Earratta
F‘xan Kan Kan Euana '·.·'iSta Cu|:·an F‘|aş·'arS
L|n·:|an|:a·; —l=*.`Š Ba kaS|:·arS DrkaStar

 

That was not good. Maybe the enlargement needs to be smoother?

pamstretch, also from the netpbm package, also increases pixel count but additionally smooths the output by interpolating pixels.

Like many unixish tools, pamstretch takes data from stdin and outputs it to stdout:

cat salsatext.pnm | pamstretch 4 > stretched.pnm

Tesseract needs tif format, handled here by ImageMagick's convert command:

convert  stretched.pnm  stretched.tif

Run tesseract on it, in this case with -l spa, which means Spanish language:

tesseract stretched.tif str -l spa

The result:

AI Pan Pan 'l" AI Vino Vino —.?•.`$ Molcochita
Lollobrigida Ricardo Lomyo
Ay 'o‘aloria —.?•.`$ Ricardo Lomyo
5o 'o‘a El Caiman —.?•.`$ Fruko 'l" Sus Tosos
Diga Miro 'o‘oa Guayacan ûrquosta
Llorandoto —.?•.`$ Luis Folipo Gonzalo:
El Manisoro ûrquosta Aragon
Ay Amor, Cuando Hablan Las Miradas —.?•.`$ Alborto Barros "El Titan D. ..
5onido Eostial —.?•.`$ Richio Ray En Bobby Cruz
Los Tamalitos Do Olga ûrquosta Aragon, José
Acid Ray Earrotto
Ran Kan Kan Euona 'liista Cuban Playors
Undantag —.?•.`$ Bo Iäaspors ûrkostor

...better. Let's try English:

AI Pan Pan 'i" AI 'a'inu 'a'inu 3} Ms|cuchita
Lu||ubrigic|a Ricarclu Lsmyu
Ay 'a'a|sria 3} Ricarclu Lsmyu
5s 'a'a El Caiman 3} Fruku 'i" 5us Tssus
Diga Mirs 'a'sa Guayacan Drqussta
L|uranduts 3} Luis Fs|ips Gun:a|s:
El Manissru Drqussta Aragun
Ay Amur, Cuanclu Hab|an Las Miradas 3} Albsrtu Earrus "El Titan D. ..
5unic|u Esstia| 3} Richis Ray E: Eubby Cru:
Lus Tama|itus Ds D|ga Drqussta Aragun, juss
Acid Ray Earrsttu
Ran Iian Iian Eusna 'a'ista Cuban Playsrs
Unclantag 3} Eu Iiaspsrs Drksstsr

That is worse.

How does ocrad perform?

Al Pan Pan Y Al Vino Vino __ Melcochila
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukD Y Sus Tesos
Oiga Mire Vea Guayacan Orquesla
Llorandole __ Luis Felipe Gonzalez
El Manisero Orquesla Arag�n
Ay Amor, Cuando Nablan Las Miradas __ Alberlo Barros "El Tilan D,,,
Sonido Beslial __ Richie Ray bBobby Cruz
Los Tamalilos De Olga Orquesla Arag�n, los� , , ,
Acid Ray Barrello
Ran Kan Kan Buena Visla Cuban Players
Undanlag __ Bo Kaspers OrkPsler

 

A lot better than the line noise seen before. With the enlarged but not interpolated version:

Al Pan Pan Y Al Vino Vino __ Melcochi_a
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukoYSusTesos
Oiga Mire Vea Cuayacan Orques_a
Llorando_e __ Luis Felipe Conzalez
El Manisero Orques_a Arag�n
Ay Amor, Cuando Hablan Las Miradas __ Alber_o Barros "El Ti_an D,,,
Sonido Bes_ial __ Richie Ray b Bobby Cruz
Los Tamali_os De Olga Orques_a Arag�n, _os� ,,,
Acid Ray Barre__o
Ran Kan Kan Buena Vis_a Cuban Players

That's worse.

So, tesseract and ocrad need the input to be "scannified" by multiplying the pixel count and interpolating to get a bit of smoothness, but they still do not clearly beat gocr.

For scanned-in documents the ranking seems reversed.

Peter Selinger: Review of Linux OCR software:

Of course, it must be stressed that the test results reported here are derived from only two scanned pages. It is possible that for other inputs, the programs rank differently. However, based on the tests reported on this page, here is a summary of my conclusions:
* Tesseract gives extremely good output at a reasonable speed. It is the clear overall winner of the test. The only caveat is that one absolutely must convert the input to bitonal.
* Ocrad gives reasonable output at extremely high speed. It can be useful in applications where speed is more important than accuracy.
* GOCR gives poor output at a slow speed.

 

 

 

 

 


Länk - Finding computers on a network

published Jun 14, 2010 05:46   by admin ( last modified Jun 14, 2010 05:46 )

 

Nmap is good for this - use the -O option for OS fingerprinting and -oX "filename.xml" for output as xml

nmap -sP 192.168.0.0/24

Läs mer: Get a list of all computers on a network w/o DNS - Stack Overflow