Information Technology
Getting the code for Facebook's "Like" button
Here is a link going directly to a page on Facebook with boilerplate code for making a Facebook "Like" button.
This evening I tried filling out a form on Facebook that should give me the code for a Facebook button to put on a page. Trying with different browsers, I could not get it to output any code; the submit did not seem to work. Looking at the submit button, I found that it submits to a URL that, if you access it directly (i.e. a GET without parameters), gives you some boilerplate code in which you can easily replace the relevant parts. I guess the form's submit button will start working again, but if not, accessing the target URL directly works well enough.
Many routers vulnerable to a DNS attack
Many Internet connections have a router sitting between the Internet and an internal network, often with several computers on it. A security researcher says he has written a program that makes it possible to get inside the router on such a network, provided that someone on the network visits a web site carrying the attack code. Many routers seem to be vulnerable to this attack. Forbes (see below) has published a list of routers where you can check whether your own router is at risk. The third-party firmwares DD-WRT and OpenWrt are listed as vulnerable. Tomato is not listed, but unless its code in the affected area differs from the original Linksys code (which Tomato is based on), the risk is probably high that it can be attacked too. Quick countermeasures are to change the router password from the factory default, and to never save the router password in the browser.
Heffner's trick is to create a site that lists a visitor's own IP address as one of those options. When a visitor comes to his booby-trapped site, a script runs that switches to its alternate IP address--in reality the user's own IP address--and accesses the visitor's home network, potentially hijacking their browser and gaining access to their router settings.
Read more: “Millions” Of Home Routers Vulnerable To Web Hack « The Firewall - Forbes.com
This seems to be how the attack works:
As I understand it, it generally works like this: You set a ridiculously short TTL on the server hosting the exploit. When a victim connects you grab their IP address, add it and any other likely target IPs to the list of A records for the server and reload the zone. Your attack code just needs to wait for the TTL to expire, DNS to refresh and then try and connect to the target, which now appears to come from an attack on a trusted network.
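To make the A-record trick concrete, here is a minimal Python 3 sketch of a check that a filtering resolver or proxy could apply to the answers for one host name. The function name and the sample addresses are made up for illustration; this is not how any particular router or DNS server implements its protection.

```python
import ipaddress


def suspicious_records(a_records, client_ip):
    """Return the A records that point back at the client or at a private network.

    a_records: iterable of IPv4 address strings returned for one host name.
    client_ip: the public address the client is connecting from.
    """
    client = ipaddress.ip_address(client_ip)
    flagged = []
    for record in a_records:
        address = ipaddress.ip_address(record)
        # A public web site has no reason to resolve to the visitor's own
        # address, or to a private or loopback address.
        if address == client or address.is_private or address.is_loopback:
            flagged.append(record)
    return flagged


# Made-up example: the attacker's server, the victim's public IP, a LAN address.
records = ["93.184.216.34", "81.2.69.142", "192.168.1.1"]
print(suspicious_records(records, client_ip="81.2.69.142"))
```

A record set that mixes the real server address with the visitor's own address is exactly the pattern described in the quotes above.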
Read more: Slashdot Comments | Millions of Home Routers Are Hackable
Offloading downloads from Plone, (relatively) transparently
I've just built a first version of a system that offloads file downloads from the responsibility of Plone, while keeping authentication and authorization. With help from Apache and iw.fss, surprisingly few lines of code are needed to accomplish this.
First, here are two bulleted summaries of what is going on. Look further down in this post for a longer description:
Workflow
- User accesses a page in Plone, where a file is offered for download
- When clicking the download link, a redirect is made to a sub domain, with the exact same path + filename
- The subdomain "steals" a cookie from the user and uses that to authenticate an xml-rpc call to check if user has the right to download the file
- If True, mod_python lets the request through, and file is downloaded through Apache
Technology
- Have Apache in front of Plone
- use iw.fss to store files on the file system
- Serve the files out on a subdomain with Apache
- Use url rewrite to make authentication cookie or session cookie accessible to the sub domain
- Use mod_python as access handler for download sub domain
- The mod_python script has an xmlrpc client that can be configured with cookies
- The xmlrpc client is configured with cookies, so to Plone it is the user himself
- The xmlrpc client asks Plone whether the download is OK by calling a proxy method on the object the user wants to download
Longer description
Plone is not very good at serving out big files. A couple of concurrent huge downloads would take up a lot of Zope processes and threads. Here is a way of offloading the file downloading to other server processes (in this case Apache) with authentication and authorization intact.
The stuff needed is
- Apache with mod_python and mod_rewrite
- A custom transport agent for xml-rpc (included in this post)
- Plone
Let's look at how it is set up, going from the outside in:
Make the authentication or session cookie available to the download server
The user is browsing the web site. The first thing the user's browser hits is the Apache server, sending the request further to Plone. The user may or may not be logged in. The virtual server directive in Apache looks something like this:
<VirtualHost *:80>
    ServerName server.topdomain
    ProxyPreserveHost On
    RewriteEngine On
    # Match the __ac cookie if present, and make a new cookie with same value, but that can be sent to sub domains
    RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]
    RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]
    # The RewriteCond only applies to the rule directly after it; requests without an __ac cookie end up here
    RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [L,P]
</VirtualHost>
The RewriteRule looks pretty standard: it rewrites the request to fit the virtual host monster in Plone, and the Plone site with the id "site" is served out. However, there is some extra stuff, particularly the line:
RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]
This line is a condition, that is triggered if the incoming request has a cookie by the name "__ac". The "__ac" cookie contains information that authenticates the user to the Plone site. If your site uses another cookie than "__ac" then put its name here instead. It does not matter if the cookie contains just a session id or the login information. The important thing is this part:
([^;]+)
...it matches the cookie value, and the parentheses mean that Apache remembers this part of the pattern. Since this is the first (and actually only) pair of parentheses in the pattern, it will be remembered as %1. (Apache uses a percent sign for back references to a RewriteCond pattern; back references to the RewriteRule pattern itself use the standard dollar sign.)
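The capturing can be tried outside Apache. Here is a small Python 3 sketch that applies the same pattern to a Cookie header; the header value is invented for illustration, and Python's `re` module only approximates the regex engine Apache uses.

```python
import re

# Same pattern as in the RewriteCond; re.IGNORECASE plays the role of [NC].
AC_COOKIE = re.compile(r"__ac=([^;]+)", re.IGNORECASE)


def extract_ac(cookie_header):
    """Return the value of the __ac cookie, or None if it is absent."""
    match = AC_COOKIE.search(cookie_header)
    # group(1) is the first parenthesised group, i.e. what Apache calls %1.
    return match.group(1) if match else None


header = 'I18N_LANGUAGE="en"; __ac="dXNlcjpzZWNyZXQ%3D"; other=1'
print(extract_ac(header))
print(extract_ac("other=1"))
```

Note that the captured value keeps any surrounding double quotes, since the pattern simply takes everything up to the next semicolon.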
Now, let's look at the rewrite rule:
RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]
The first part is standard. The square brackets at the end usually contain only control flow information (such as "stop here"), but here they also contain an instruction to send a cookie out with the response:
co=__download__ac:%1:.server.topdomain
We define here a new cookie called __download__ac (it can be called anything), but the important thing here is the dot in front of the domain. This means that the cookie is available to the domain server.topdomain and all sub domains. In this way this cookie can be shared with sub domains.
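As a rough illustration of what the leading dot buys us, here is a Python 3 sketch of the matching rule. It models the older convention this post relies on; browsers that follow RFC 6265 ignore the leading dot and let any explicitly set Domain attribute cover sub domains, so treat this as a simplification. The function name is invented.

```python
def cookie_is_sent_to(host, cookie_domain):
    """Approximate the browser rule for a cookie's Domain attribute.

    A domain with a leading dot, such as ".server.topdomain", covers the
    domain itself and every sub domain of it. A cookie set without a
    Domain attribute is host-only; that case is modelled here by passing
    the exact host name without a leading dot.
    """
    host = host.lower()
    cookie_domain = cookie_domain.lower()
    if cookie_domain.startswith("."):
        bare = cookie_domain[1:]
        return host == bare or host.endswith(cookie_domain)
    return host == cookie_domain


print(cookie_is_sent_to("download.server.topdomain", ".server.topdomain"))
print(cookie_is_sent_to("download.server.topdomain", "server.topdomain"))
```

The first call models the __download__ac cookie and the second one models the host-only __ac cookie.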
The standard __ac cookie in Plone does not have the capability to be shared with sub domains. The __ac cookie is a bit insecure because it actually contains the user name and password in scrambled form, but for other reasons we wanted to keep it with this system.
Another way of making the cookie shared would be to write a PAS plugin for Plone that can be configured to send out the cookie with a dot-prefixed domain qualifier, but the above rewrite is hard to beat for brevity.
If one goes with the more secure session cookie, there seems to be a PAS plugin that already allows configuration with a dot-prefixed domain, and in that case the rewrite above is not necessary. Either way, at this point you should have a cookie with authentication information that can be read by sub domains.
Apache config of download server
When the user clicks on the download link he will be redirected to a sub domain. Let's look at the Apache configuration for that:
<VirtualHost *:80>
    ServerName download.server.topdomain
    DocumentRoot /var/www/html
    <Location />
        #Extend the Python path to locate your callable object
        PythonPath "sys.path+['/var/www/mod_python']"
        # Make Apache aware that we want to use mod_python
        AddHandler mod_python .py
        PythonAccessHandler downloadauth
    </Location>
</VirtualHost>
This looks pretty much like a standard virtual server directive in Apache, for serving out static files from the file system. The only embellishment is that a Python script gets registered as access handler for all requests. This means that the script does not serve out any content itself; it just sits as a gatekeeper and says yea or nay to whether Apache should serve the requested content to the user.
As per the configuration above, the script is named "downloadauth.py" and should live in the "/var/www/mod_python" directory on the server. Let's take a look at that script:
#!/usr/bin/env python
# -*-python-*-
#
from mod_python import apache, Cookie
import xmlrpclib
from cookiestransport import CookiesTransport

PLONE_URL = 'http://192.168.1.51:6080/site'

def accesshandler(req):
    """Return apache.OK if the authentication was successful,
    apache.HTTP_UNAUTHORIZED otherwise.
    """
    # The Plone object lives at the request path minus the trailing file name
    plone_local_uri = "/".join(req.uri.split("/")[:-1])
    xmlrpc_server_url = PLONE_URL + plone_local_uri
    # Get the __ac cookie
    cookies = Cookie.get_cookies(req)
    # apache rewrite adds this cookie as a sub domain friendly copy of "__ac"
    cookie = cookies.get('__download__ac', None)
    cookies_spec = []
    if cookie is not None:
        cookie_value = cookie.value
        # Cookies values are delivered wrapped in quotes
        cookie_value = cookie_value.replace('"','')
        # We configure xmlrpc to use it as "__ac"
        cookie_spec = ['__ac', cookie_value]
        cookies_spec.append(cookie_spec)
    server = xmlrpclib.Server(xmlrpc_server_url, transport=CookiesTransport(cookies=cookies_spec))
    if server.can_i_download():
        return apache.OK
    else:
        return apache.HTTP_UNAUTHORIZED
This script takes the cookie we defined, and sends an xml-rpc call to Plone. The xml-rpc client is configured with a special transport agent that can take cookies. In this way the xml-rpc client gets authenticated as the user doing the download. Plone runs on Zope, and Zope supports xml-rpc out of the box. All we need now is to access a method in Plone that has the same access rights as the file the user wants to download.
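The only non-obvious step in the script is how the request path is turned into the URL of the Plone object. A stand-alone Python 3 sketch of just that step follows; the file name "report.pdf" is a made-up example.

```python
PLONE_URL = "http://192.168.1.51:6080/site"  # same value as in the script


def plone_object_url(request_uri, plone_url=PLONE_URL):
    """Drop the trailing file name from the request path and prefix the Plone URL."""
    parent_path = "/".join(request_uri.split("/")[:-1])
    return plone_url + parent_path


print(plone_object_url("/afolder/another/folder/file_object/report.pdf"))
```

The xml-rpc call is then made against that object, which is why the method called needs to live on the file object itself.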
If we assume the object in Plone holding the download looks something like this:
object
    method: download (permission: 'View')
    method: can_i_download (permission: 'View')
I.e. it has two methods,
- One that serves out the object from inside of Plone
- One that just returns True
The first method is never going to be used; we do not want Plone to serve out the file from within Plone. However we make a note of what permission that method has (usually "View"). We then make a method "can_i_download" (or whatever name you fancy) with the same permission. This method just returns True. It can look like this:
security.declareProtected(permissions.View, 'can_i_download')
def can_i_download(self):
    """If user has permission to run this method, he
    has permission to download the file"""
    return True
The cookie aware transport agent for xmlrpc
The code for the cookie-aware transport class is largely taken from:
Roberto Rocco Angeloni » Blog Archive » Xmlrpclib with cookie aware transport
My version does away with the ability to receive cookies from the server, and adds the capability to configure the client with arbitrary cookies from code.
# A module with a class that allows the xmlrpc client to be configured with a list of cookies
# For authentication with plone xml-rpc methods
# Based completely on code from Rocco Angeloni:
# http://www.roccoangeloni.it/wp/2008/06/13/xmlrpclib-with-cookie-aware-transport/
import xmlrpclib

class CookiesTransport(xmlrpclib.Transport):
    """A transport class for xmlrpclib, that can be configured with
    a list of cookies and uses them in the xml-rpc request"""

    def __init__(self, cookies=None):
        """ cookies parameter should be a list of two item lists/tuples [id,value]"""
        if hasattr(xmlrpclib.Transport, '__init__'):
            xmlrpclib.Transport.__init__(self)
        # None instead of [] as default avoids a list shared between instances
        self.cookies = cookies or []

    def request(self, host, handler, request_body, verbose=0):
        # issue XML-RPC request
        h = self.make_connection(host)
        if verbose:
            h.set_debuglevel(1)
        self.send_request(h, handler, request_body)
        self.send_host(h, host)
        # Debug logging: this writes the cookie values to a file in /tmp,
        # so remove it before using the module in production
        fo = open('/tmp/modpythonlogger.txt','a')
        fo.write('Value for cookiespec is:%s\n' % self.cookies)
        fo.close()
        # Send each configured cookie along with the request
        for cookie_spec in self.cookies:
            h.putheader("Cookie", "%s=%s" % (cookie_spec[0],cookie_spec[1]) )
        self.send_user_agent(h)
        self.send_content(h, request_body)
        errcode, errmsg, headers = h.getreply()
        if errcode != 200:
            raise xmlrpclib.ProtocolError(
                host + handler,
                errcode, errmsg,
                headers
            )
        self.verbose = verbose
        try:
            sock = h._conn.sock
        except AttributeError:
            sock = None
        return self._parse_response(h.getfile(), sock)
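The module above targets Python 2's xmlrpclib. For comparison, here is a minimal sketch of the same idea for Python 3's `xmlrpc.client`, where overriding `send_headers` is enough. The `RecordingConnection` class is a made-up stand-in used only to show which headers would be sent, and none of this has been tried against a Plone site.

```python
import xmlrpc.client


class CookiesTransport(xmlrpc.client.Transport):
    """Transport that adds a fixed set of cookies to every XML-RPC request."""

    def __init__(self, cookies=None):
        super().__init__()
        # List of (name, value) pairs; None avoids a shared mutable default.
        self.cookies = list(cookies or [])

    def send_headers(self, connection, headers):
        # Called by Transport.send_request() after the request line is sent.
        if self.cookies:
            cookie_value = "; ".join(
                "%s=%s" % (name, value) for name, value in self.cookies
            )
            headers = list(headers) + [("Cookie", cookie_value)]
        super().send_headers(connection, headers)


class RecordingConnection:
    """Stand-in for an HTTP connection, used only to show what gets sent."""

    def __init__(self):
        self.sent = []

    def putheader(self, key, value):
        self.sent.append((key, value))


connection = RecordingConnection()
transport = CookiesTransport(cookies=[("__ac", "dXNlcjpzZWNyZXQ=")])
transport.send_headers(connection, [("Content-Type", "text/xml")])
print(connection.sent)
```

From Python 3.8 on, `xmlrpc.client.ServerProxy` also accepts a `headers` argument, which might make a custom transport unnecessary for this use.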
iw.fss - care and feeding
iw.fss is a very slick product for external storage from Ingeniweb. Deciding what fields should be stored externally can be done with a ZCML file, you do not need to touch the code of the content types.
Making migration work in iw.fss
iw.fss has a control panel that allows you to migrate the already stored data in the fields, to the external storage. In our case a good number of field migrations failed. It turned out that at least in our system (which was prepopulated from an old legacy system) the data was sometimes stored in an object type, "OFS.Image.Pdata", that iw.fss migration could not handle (a bug ticket has been submitted to the iw.fss people about this).
The following monkey patch (used with collective.monkeypatcher), stored in a "patches.py" file fixed that problem:
import cgi
import cStringIO
from ZPublisher.HTTPRequest import FileUpload
from iw.fss.FileSystemStorage import FileUploadIterator

old__init__ = FileUploadIterator.__init__

def new__init__(self, file, streamsize=1<<16):
    """ this is a file upload """
    # Pdata objects have a data attribute but no read() method,
    # so wrap their content in a FileUpload that the iterator can handle
    if not hasattr(file, 'read') and hasattr(file,'data'):
        data = str(file) # see OFS.Image.Pdata
        fs = cgi.FieldStorage()
        fs.file = cStringIO.StringIO(data)
        file = FileUpload(fs)
    return old__init__(self, file, streamsize)
...with the following ZCML:
<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:monkey="http://namespaces.plone.org/monkey"
    i18n_domain="my.application">
  <include package="collective.monkeypatcher" />
  <monkey:patch
      description="Patching FileUploadIterator to handle OFS.Image.Pdata objects"
      class="iw.fss.FileSystemStorage.FileUploadIterator"
      original="__init__"
      replacement=".patches.new__init__"
      />
</configure>
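The idea behind the patch can be shown without Zope. In the Python 3 sketch below, `FakePdata` is a hypothetical stand-in that only imitates the two things the patch relies on (a `data` attribute and no `read()` method); it is not Zope's actual class.

```python
import io


class FakePdata:
    """Hypothetical stand-in for a chunked blob: has .data and .next but no .read()."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next

    def __bytes__(self):
        # Walk the chain and join every chunk.
        chunks = []
        node = self
        while node is not None:
            chunks.append(node.data)
            node = node.next
        return b"".join(chunks)


def as_file(obj):
    """Return a file-like object for obj, wrapping chunked data when needed."""
    if not hasattr(obj, "read") and hasattr(obj, "data"):
        return io.BytesIO(bytes(obj))
    return obj


blob = FakePdata(b"%PDF-", FakePdata(b"1.4"))
print(as_file(blob).read())
```

Objects that already have a `read()` method pass through untouched, which mirrors how the patch leaves ordinary uploads alone.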
Selecting storage strategy in iw.fss and how to make the url redirect
iw.fss can store the external files in different layouts, called "strategies". I chose "site2". "site2" mirrors the directory structure of Plone completely, and then adds the file in the innermost sub directory. Problem is, the path inside of Plone goes to the last sub directory, but Plone does not tack on the name of the file at the end of the url, i.e. if the file object in Plone has the url:
http://server.topdomain/afolder/another/folder/file_object
iw.fss will store it (with site2 layout) as:
/afolder/another/folder/file_object/filename
That "filename" can be anything. Thankfully iw.fss also stores a file next to the file with a fixed name "fss.cfg". This stores the name of the other file (the one we want to serve out).
I wrote a tool that can be configured with info on where on the file system iw.fss stores its data, and what the URL of the download server is. Let's say the tool is called "tool_that_stores_fss_info". If you do not want to write a tool, a config sheet in portal_properties will do too.
In your file object, write a method that goes something like this:
# Needs these imports at module level:
#   import ConfigParser
#   from Products.CMFCore.utils import getToolByName
security.declarePublic('redirectToExternalFile')
def redirectToExternalFile(self):
    """Redirects to external download url"""
    sc_tool = getToolByName(self, 'tool_that_stores_fss_info')
    portal_url = getToolByName(self, 'portal_url')()
    local_url = self.absolute_url()[len(portal_url):] # Extract local part of url
    file_path_to_download = sc_tool.fs_path_to_download_root + local_url
    # fss.cfg sits next to the stored file and holds its name
    config = ConfigParser.ConfigParser()
    config.read(file_path_to_download + '/fss.cfg')
    file_name = config.get('FILENAME', 'file')
    url_to_download = sc_tool.url_to_download_root + local_url + "/" + file_name
    #return url_to_download
    self.REQUEST.response.redirect(url_to_download)
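The lookup can be tried outside Plone. The Python 3 sketch below builds a throwaway directory tree and reads the file name from an fss.cfg. Only the section name "FILENAME" and the option "file" come from the method above; the rest of the file layout, the function name and the sample paths are assumptions.

```python
import configparser
import os
import tempfile


def external_download_url(fs_root, url_root, local_url):
    """Build the download URL for the object at local_url (e.g. "/a/b/file_object")."""
    config = configparser.ConfigParser()
    config.read(os.path.join(fs_root, local_url.lstrip("/"), "fss.cfg"))
    file_name = config.get("FILENAME", "file")
    return url_root + local_url + "/" + file_name


# Build a throwaway storage tree to try the function on.
with tempfile.TemporaryDirectory() as fs_root:
    object_dir = os.path.join(fs_root, "afolder", "file_object")
    os.makedirs(object_dir)
    with open(os.path.join(object_dir, "fss.cfg"), "w") as handle:
        handle.write("[FILENAME]\nfile = report.pdf\n")
    url = external_download_url(
        fs_root, "http://download.server.topdomain", "/afolder/file_object"
    )
    print(url)
```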
Buildout configuration of iw.fss
Below is an fss configuration that works for a Plone site named "site" stored in the Zope root, using "site2" as storage layout, and the external files get stored inside the var directory of the buildout.
[fss]
recipe = iw.recipe.fss
zope-instances =
    ${instance:location}
storages =
# The first is always generic
    global /
    site /site site2 ${buildout:directory}/var/fss_storage_site
Other location of the external files
In the above case the files get stored in the var directory inside the Plone buildout. It may be a better idea to store them in a directory structure with more permanence, like "/var/www/html", as suggested in the Apache configuration for the download server earlier in this post. The var directory in a buildout does not get overwritten on invocations of buildout, but it is still in the buildout directory structure.
Make sure however you have the iw.fss storage on the same volume as the Plone buildout:
Many Unix-like systems keep different directories (/tmp, /home, /var and so on) on different mounted volumes. Python's os.rename cannot move a file across volumes, so the code in iw.fss that uses os.rename fails when the storage is on a different mounted volume than the Plone site (say, storage on "/var/www/html" when the Plone site is a buildout in the "/home" hierarchy). shutil.move may be an alternative to os.rename (a bug ticket has been submitted to the iw.fss people about this).
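A minimal Python 3 sketch of the fallback idea follows. The function name is invented and this is not code from iw.fss. The demo at the bottom runs inside one temporary directory, so it only exercises the fast path.

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst):
    """Move src to dst, also when they live on different mounted volumes."""
    try:
        # Fast path: a rename, which only works within one file system.
        os.rename(src, dst)
    except OSError as error:
        if error.errno != errno.EXDEV:
            raise
        # EXDEV means "cross-device link": fall back to shutil.move,
        # which copies the file and then removes the original.
        shutil.move(src, dst)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "upload.tmp")
    target = os.path.join(workdir, "stored.bin")
    with open(source, "wb") as handle:
        handle.write(b"payload")
    move_file(source, target)
    moved = os.path.exists(target) and not os.path.exists(source)
    print(moved)
```

Since shutil.move itself tries a rename first, calling it directly would probably be enough; the explicit version just makes the two cases visible.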
Some possible improvements and simplifications
What changes could be made to make this more of a solution ready out-of-the box, to be deployed on different servers?
Getting the cookie to the sub domain
- The rewrite in Apache could be simplified, or
- The cookie handler in Plone could check what domain it is serving the cookie in, and tack on a dot. This behavior could be switched on and off with a checkbox in the ZMI. In this way no Apache rewrite would be necessary at all.
mod_python contacting Plone
- A default in the mod_python script could be to assume that Plone runs on port 8080 on the same IP address as mod_python
Constructing the right url to redirect to
- The redirect code could be factored out into a view (if they work with xml-rpc), or some other mechanism, so that a simple ZCML configuration would connect a downloadable field in a content type with the redirect code
- It is probably possible to ask iw.fss where its file system storage is, in that way no separate setting would be needed for this in a tool or property sheet
- The default domain to redirect to could be the one Plone is on, with "download." tacked on in front. Together with the preceding suggestion, no settings would need to be stored in Plone
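The last suggestion in the list above could look something like this Python 3 sketch. The function name and the "download." default are only illustrations of the idea.

```python
from urllib.parse import urlsplit, urlunsplit


def default_download_root(portal_url, prefix="download."):
    """Return portal_url with prefix put in front of its host name."""
    parts = urlsplit(portal_url)
    # netloc may carry a port ("host:8080"); the prefix goes before the host.
    return urlunsplit(parts._replace(netloc=prefix + parts.netloc))


print(default_download_root("http://server.topdomain"))
```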
Replacing xml-rpc
- Instead of xml-rpc, maybe one could simply use an authenticated HEAD request to the file in Plone? Then no custom authentication method would be needed.
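A rough Python 3 sketch of what such a HEAD check might look like is given below. It only builds the request without sending it. The function names are invented, and whether Plone answers a HEAD request on a file object with a plain 200 is an assumption that would need testing.

```python
import urllib.request


def build_head_check(object_url, ac_value):
    """Prepare (but do not send) a HEAD request that carries the user's cookie."""
    return urllib.request.Request(
        object_url,
        headers={"Cookie": "__ac=%s" % ac_value},
        method="HEAD",
    )


def download_allowed(status_code):
    """Treat only a plain 200 as permission; redirects to a login form do not count."""
    return status_code == 200


request = build_head_check(
    "http://192.168.1.51:6080/site/afolder/file_object", "dXNlcjpzZWNyZXQ="
)
print(request.get_method(), request.get_header("Cookie"))
```

Sending it with `urllib.request.urlopen(request)` follows redirects and raises `HTTPError` on a 401, so a real access handler would have to deal with both cases.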
Change the architecture
Tramline (PloneTramline) uses Apache filters to intercept requests and responses in Apache. In this way it can monitor all requests and responses, insert a file on the way out, and take care of a file on the way in.
It would be cool if it were possible to configure Apache so that it chooses where to proxy based on the response headers (from the first proxy it tries). If those response headers match some criterion (such as there being an "x-dowload-from-somewhere-else" header), Apache would simply switch to another server it proxies. I do not think it can be done, but this page talks about a possible implementation:
Linux OCR for getting text from a screenshot
Enlarging the image by a factor of 4 and interpolating was what it took for tesseract and ocrad to output usable text at all, but they were still not clearly superior to gocr.
These OCR programs are probably not calibrated for making text out of pixel-perfect low-resolution screen shots, but from high-resolution somewhat noisy scans of different type faces on paper. Doing OCR from a screenshot ought to be quite easy: Each letter is pixel perfect and looks exactly the same, and there are no problems with slanting text or other distortions. In fact, writing your own OCR program is a distinct possibility for this.
I had via mail received a 72dpi screenshot that I wanted to get the text from. The top part looked like this:

The top of the screenshot
Tesseract, which is a program that is highly recommended on the web, returned nothing when run on this screenshot. At first after reading this, I thought this had to do with my tif possibly having a layer of transparency, but ensuring it was not there did not change anything.
According to the same discussion, it seems like tesseract wants to have a high resolution image (see tests on that further down).
Now the ocrad program returned this:
Al Po_ Po_ _ Al _|_o _|_o _ M_|_o_hl_o
lollob_IOldo _|_o_do l_m_o
A_ _ol__lo _ _|_o_do l_m_o
__ _o _| Colmo_ _ ____ _ ___ T__o_
OIOo Ml__ __o C_o_o_o_ O_O____o
llo_o_do__ _ l_|_ __llO_ Co__ol__
_| Mo_|___o O_O____o A_oOo_
A_ Amo_ C_o_do Woblo_ lo_ Ml_odo_ _ Alb___o Bo__o_ _| Tl_o_ D
_o_ldo B___lol _ _|_hl_ _o_ b Bobb_ c___
lo_ Tomoll_o_ D_ OIOo O_O____o A_oOo_ _o__
A_ld _o_ Bo_____o
_o_ Wo_ Wo_ B___o _|__o C_bo_ Plo____
...and so on.
Gocr returned this:
y 9 999) 9 9 _J ypp yyy y
AI P a n P a n Y AI Vin o Vin o m eIc o c hita
L oIIo brigid a Ric ard o L e m v o
A V aIeria Ric ard o L e m v o
Se Va EICaiman fru ko Y Sus Tesos
Oiga mire Vea Guayacan Orquesta
LIora n d ote L uis F eIi e G o n z aIe z
EImanisero Orquesta Aragȯn
A Amor Cuando HabIan Las miradas AIberto Barros''EITitan D.
S o nid o B e stiaI Ric hie R a 6 B o b b C ru z
Los TamaIitos De OIga Orquesta Aragȯn,Josė
A cid R a y B arretto
Ran Kan Kan Buena Vista Cuban PIayers
Undanta Bo Kas ers Orkester
...and so on, which given the non-outputting competition, must be deemed fantastic. Still, it cannot deal with any characters extending below the baseline (p, g and y for example), and all ls are interpreted as 1s.
Increasing the pixel density of the image
This turned out to be non-trivial with the tools I had at hand. I finally got resampling working with the program pnmenlarge, part of the netpbm suite of command-line image processing tools for Unix-like systems:
cat salsatext.pnm | pnmenlarge 4 > enlarged.pnm
This turned each pixel into a 4x4 block of identical pixels, and now tesseract magically started working!
(convert to tif first)
Fil Pen Pen 'ii Fil '·.·'inp '·.·'inp et`} i···1el¤:p·:hite
Lpllplznrigide Riterdp Lem·-rp
.·!·.·-,# '·.·'elerie et`} Riterdp Lem·-rp
5e '·.·'e El Ceimen et`} Frulce 'ii 5us Tesps
Ciige i···1ire '·.·'ee Gueyeten Ordueste
Llprendpte et`} Luis Felipe Gpneelee
El i···1eniserp Ordueste ifiregdn
.·!·.·-,# Famer, Cuendp Hel:·len Les i···1iredes et`} .·!·.ll:¤ertp Eerrps "El Titen D. ..
Epnidp Eestiel et`} Richie Re·-,# El Epl:·l:·3r Crue
Lps Temelitps De Cilge Ordueste ifiregdn, _|pse
.·!·.¤:id Re·-,# Eerrettp
Ren Ken Ken Euene '·.·'iste ·Zul:·en F‘le3··ers
Llndenteg et`} Ep iiespers Orltester
Well, it does at least produce output, but the quality is at the point that you can barely guess which line it is trying to decode.
Let's try switching to Spanish as language:
.ü.| Pan Pan "x‛ .ü.| '·.·'ina '·.·'ina —l=*.`Š f'·'1a|·:·:··:|'•i|:a
La||a|:·ri·;|i·:|a F‘xi·:ar·:|a Lam'.«a
.ü.'-,« '·.·'a|aria —l=*.`Š F‘xi·:ar·:|a Lam'.«a
Sa '·.·'a El Caiman —l=*.`Š FrukJ:· "x‛ 5uS TaSaS
Diga Mira '·.·'aa Cuaşracan Dr·:]uaS|:a
Llarandata —l=*.`Š LuiS Falipa Ganzalaz
El f'·'1aniSara Dr·:]uaS|:a Aragón
.ü.'-,« Fumar, Cuanda Ha|:·|an LaS f'·'1ira·:|aS —l=*.`Š .ü.||:·ar|:·:· EarraS "EI Titan D. ..
Sanida EaS|:ia| —l=*.`Š F‘xi·:|'•ia Ra'-; Ex Ea|:·|:·ş» Cruz
LaS Tama|i|:aS Da Diga Dr·:]uaS|:a Aragón, _|aSa
.ü.·:i·:| Ra'-; Earratta
F‘xan Kan Kan Euana '·.·'iSta Cu|:·an F‘|aş·'arS
L|n·:|an|:a·; —l=*.`Š Ba kaS|:·arS DrkaStar
That was not good. Maybe the enlargement needs to be smoother?
pamstretch, also from the netpbm package, also increases pixel count but additionally smooths the output by interpolating pixels.
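The difference between plain pixel replication and interpolation can be sketched in a few lines of Python 3. This is only a toy illustration on grey values, with interpolation along one row; the actual algorithms and output sizes of pnmenlarge and pamstretch may differ.

```python
def enlarge(rows, factor):
    """Replicate every pixel into a factor x factor block (hard edges)."""
    out = []
    for row in rows:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def stretch_row(row, factor):
    """Linearly interpolate between neighbouring pixels of one row (soft edges)."""
    out = []
    for left, right in zip(row, row[1:]):
        for step in range(factor):
            out.append(left + (right - left) * step / factor)
    out.append(row[-1])
    return out


# One row of a black-to-white edge, grey values 0..255.
edge = [0, 0, 255, 255]
print(enlarge([edge], 2))
print(stretch_row(edge, 2))
```

Replication keeps the edge as an abrupt jump from 0 to 255, while interpolation inserts an intermediate grey value, which is closer to what a scanned page looks like.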
Like many Unix-style tools, pamstretch takes data from stdin and outputs it to stdout:
cat salsatext.pnm | pamstretch 4 > stretched.pnm
Tesseract needs tif format, handled here by ImageMagick's convert command:
convert stretched.pnm stretched.tif
Then run tesseract on it, in this case with -l spa, which selects Spanish as the language:
tesseract stretched.tif str -l spa
The result:
AI Pan Pan 'l" AI Vino Vino —.?•.`$ Molcochita
Lollobrigida Ricardo Lomyo
Ay 'o‘aloria —.?•.`$ Ricardo Lomyo
5o 'o‘a El Caiman —.?•.`$ Fruko 'l" Sus Tosos
Diga Miro 'o‘oa Guayacan ûrquosta
Llorandoto —.?•.`$ Luis Folipo Gonzalo:
El Manisoro ûrquosta Aragon
Ay Amor, Cuando Hablan Las Miradas —.?•.`$ Alborto Barros "El Titan D. ..
5onido Eostial —.?•.`$ Richio Ray En Bobby Cruz
Los Tamalitos Do Olga ûrquosta Aragon, José
Acid Ray Earrotto
Ran Kan Kan Euona 'liista Cuban Playors
Undantag —.?•.`$ Bo Iäaspors ûrkostor
...better. Let's try English:
AI Pan Pan 'i" AI 'a'inu 'a'inu 3} Ms|cuchita
Lu||ubrigic|a Ricarclu Lsmyu
Ay 'a'a|sria 3} Ricarclu Lsmyu
5s 'a'a El Caiman 3} Fruku 'i" 5us Tssus
Diga Mirs 'a'sa Guayacan Drqussta
L|uranduts 3} Luis Fs|ips Gun:a|s:
El Manissru Drqussta Aragun
Ay Amur, Cuanclu Hab|an Las Miradas 3} Albsrtu Earrus "El Titan D. ..
5unic|u Esstia| 3} Richis Ray E: Eubby Cru:
Lus Tama|itus Ds D|ga Drqussta Aragun, juss
Acid Ray Earrsttu
Ran Iian Iian Eusna 'a'ista Cuban Playsrs
Unclantag 3} Eu Iiaspsrs Drksstsr
That is worse.
How does ocrad perform?
Al Pan Pan Y Al Vino Vino __ Melcochila
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukD Y Sus Tesos
Oiga Mire Vea Guayacan Orquesla
Llorandole __ Luis Felipe Gonzalez
El Manisero Orquesla Arag�n
Ay Amor, Cuando Nablan Las Miradas __ Alberlo Barros "El Tilan D,,,
Sonido Beslial __ Richie Ray bBobby Cruz
Los Tamalilos De Olga Orquesla Arag�n, los� , , ,
Acid Ray Barrello
Ran Kan Kan Buena Visla Cuban Players
Undanlag __ Bo Kaspers OrkPsler
A lot better than the line noise seen before. With enlarged but not interpolated:
Al Pan Pan Y Al Vino Vino __ Melcochi_a
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukoYSusTesos
Oiga Mire Vea Cuayacan Orques_a
Llorando_e __ Luis Felipe Conzalez
El Manisero Orques_a Arag�n
Ay Amor, Cuando Hablan Las Miradas __ Alber_o Barros "El Ti_an D,,,
Sonido Bes_ial __ Richie Ray b Bobby Cruz
Los Tamali_os De Olga Orques_a Arag�n, _os� ,,,
Acid Ray Barre__o
Ran Kan Kan Buena Vis_a Cuban Players
That's worse.
So, tesseract and ocrad need the input to be "scannified" by enlarging the image and interpolating to get a bit of smoothness, but they still do not clearly beat gocr.
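Eyeballing the outputs is subjective, so here is a small Python 3 sketch of how one could put a number on a single line using the standard library's difflib. The "expected" text is my assumption of what the second line of the screenshot says, and one line says little about the overall ranking.

```python
from difflib import SequenceMatcher


def similarity(expected, actual):
    """Return a 0..1 similarity ratio between the expected text and OCR output."""
    return SequenceMatcher(None, expected, actual).ratio()


# Assumed ground truth for the second line of the screenshot.
expected = "Lollobrigida Ricardo Lemvo"

# Second line of three of the outputs shown above.
outputs = {
    "gocr, original image": "L oIIo brigid a Ric ard o L e m v o",
    "tesseract -l spa, pamstretch": "Lollobrigida Ricardo Lomyo",
    "ocrad, pamstretch": "Lollobrigida Ricardo Lemvo",
}

scores = {name: similarity(expected, text) for name, text in outputs.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print("%-30s %.2f" % (name, score))
```

Running this over every line of each output, rather than one, would give a fairer comparison.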
For scanned in documents the ranking seems reversed.
Peter Selinger: Review of Linux OCR software:
Of course, it must be stressed that the test results reported here are derived from only two scanned pages. It is possible that for other inputs, the programs rank differently. However, based on the tests reported on this page, here is a summary of my conclusions:
* Tesseract gives extremely good output at a reasonable speed. It is the clear overall winner of the test. The only caveat is that one absolutely must convert the input to bitonal.
* Ocrad gives reasonable output at extremely high speed. It can be useful in applications where speed is more important than accuracy.
* GOCR gives poor output at a slow speed.
Switching a single file to a branch in Subversion
1) Create a branch, for example by creating a directory somewhere on your hard disk and then importing it:
jorgen@computer:~$ mkdir jm20091111
jorgen@computer:~$ svn import jm20091111 https://svn.someserver/some.project/branches/
Committed revision 2683.
2) Go to the directory in your checked-out trunk that holds the file you want to work on in a branch, and copy the file to the branch:
svn cp trickyModule.py https://svn.someserver/some.
3) The file on your disk is still on trunk, but you can switch it over to the copy. Normally switch works on a whole directory, but you can add an optional second argument if you only want to switch one file:
svn switch https://svn.someserver/some.
4) Check the status:
svn st
S trickyModule.py
S presumably stands for "switched". All changes to this file are now sent to the new branch on commit (I have checked).
Run a plone zexp imported into a fresh Data.fs
Summary: You can't, at least not directly; the server will give an error. The trick is to create a bogus Plone site in the new Zope server. From what I have read on the Internet, this modifies Zope, more precisely the acl_users folder in the Zope root.
One of my hobby projects was not on our backup bandwagon, so when I accidentally corrupted Data.fs by overwriting it (the tar command is very picky it turns out about not mixing up source and destination file names) I was out of remedies. But luckily enough I had made a "site.zexp" a few days ago by exporting the site from within the Zope ZMI.
So, just delete the old Data.fs completely, start up the server and import the zexp. That works, but you cannot view any pages; you get AttributeError: getGroups. Googling, I found a posting by Andreas Jung. Jung does not spell out the fix, but writes:
Problem solved. The behavior is caused by the stupid expectation of Plone
that the root acl_users folder having been replaced with its own
implementation while creating a new site
...and from there it was possible to deduce that creating a new bogus site from within the ZMI should do the trick. And this can be done after the import.
Make rdiff-backup use a different port for ssh
Summary:
rdiff-backup --remote-schema "ssh -C -p9222 %s rdiff-backup --server" \
    username@remoteserver::/path_to/filestobackup \
    /path_to/backedupfiles
...worked for me to back up a remote server via a non-standard ssh port (9222 in the above example). Note the double quotes around the string following --remote-schema. All examples I could find on the Internet used single quotes, which did not work with rdiff-backup 1.2.8 between two CentOS 5 machines.
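If the backup is started from a script, the quoting question goes away when the command is passed as an argument list instead of a shell string. Here is a Python 3 sketch; the function name is invented, and the host and paths are the placeholders from the summary above.

```python
def rdiff_backup_command(port, remote, remote_path, local_path):
    """Return the argument list for backing up remote_path over a custom ssh port."""
    # rdiff-backup replaces %s in the schema with the remote host part.
    schema = "ssh -C -p%d %%s rdiff-backup --server" % port
    return [
        "rdiff-backup",
        "--remote-schema", schema,
        "%s::%s" % (remote, remote_path),
        local_path,
    ]


command = rdiff_backup_command(
    9222, "username@remoteserver", "/path_to/filestobackup", "/path_to/backedupfiles"
)
print(command)
```

The list can be handed to `subprocess.run(command, check=True)`; since no shell is involved, the schema string reaches rdiff-backup as one argument regardless of quote style.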
Get Spotify working with PulseAudio on Ubuntu Linux
I had problems getting Spotify to work under Ubuntu and Wine, with a Microsoft LifeChat LX-3000 headset. The sound chopped 2-3 times per second.
Using OSS and normal Wine worked fine with the internal sound card of the laptop, but not with the USB headset.
I found this discussion thread and tried different remedies. The one that worked was Neil Wilson's fork of Wine, WinePulse, with support for PulseAudio.
You can update your system with unsupported packages from this untrusted PPA by adding ppa:neil-aldur/ppa to your system's Software Sources.
Read more: Release Packages : Neil Wilson
Brian Eno on the end of the music industry era
"I think records were just a little bubble through time and those who made a living from them for a while were lucky. There is no reason why anyone should have made so much money from selling records except that everything was right for this period of time. I always knew it would run out sooner or later. It couldn't last, and now it's running out. I don't particularly care that it is and like the way things are going. The record age was just a blip. It was a bit like if you had a source of whale blubber in the 1840s and it could be used as fuel. Before gas came along, if you traded in whale blubber, you were the richest man on Earth. Then gas came along and you'd be stuck with your whale blubber. Sorry mate – history's moving along. Recorded music equals whale blubber. Eventually, something else will replace it."
Read more: On gospel, Abba and the death of the record: an audience with Brian Eno | Interview | Music | The Observer
Harvard: IT does not make 4,000 hospitals more efficient
When you introduce an IT system, or switch to a new one, it is easy to cement bad work habits and structures by codifying them in program code. In addition, side effects of computerization can increase costs. Computers are inflexible compared with people, and a lot of effort can be wasted on working around them. I believe these are the reasons why Harvard Medical School concluded in a report that IT systems have on average not been cost-effective in health care, looking across 4,000 American hospitals.
The recently released study evaluated data on 4,000 hospitals in the U.S over a four-year period and found that the immense cost of installing and running hospital IT systems is greater than any expected cost savings. And much of the software being written for use in clinics is aimed at administrators, not doctors, nurses and lab workers. The study comes as the federal government prepares to begin dispensing $19 billion in incentives for the health industry to roll out electronic health records systems. Beginning in 2011, the Health Information Technology for Economic and Clinical Health (HITECH) Act will provide incentive payments of up to $64,000 for each physician who deploys an electronic health records system and uses it effectively.
Read more: Harvard study: Computers don't save hospitals money