
English


Getting the code for Facebook's "Like" button


Here is a link going directly to a page on Facebook with boilerplate code for making a Facebook "Like" button.

This evening I tried filling out a form on Facebook that should give me the code for a "Like" button to put on a page. With several different browsers I could not get it to output any code; the submit button did not seem to work. Analyzing the submit button, I found that it submits to a URL which, if you access it directly (i.e. a GET without parameters), returns boilerplate code where you can easily replace the relevant parts. I guess the form's submit button will start working again, but until then, accessing the target URL directly works, more or less.

Video editors on Linux


Here is a list of Video editors available on Linux. I have tried them all, save Lombard. Will try to summarize more experiences at a later time.   

  • Cinelerra (Cinelerra for Grandma)
  • LiVES: The best one I have found for extracting short video clips from a video. My use case was extracting individual three-second moves from salsa dance performances. I tried three Linux-based video editors for this: LiVES, Cinelerra and Pitivi, and LiVES was by far the best for this particular application. It was easy to "scrub" with, which is essential for finding the right cut point.
  • Kdenlive: Has a lot of filters you can apply to video. Unsure how cutting works.
  • Pitivi
  • Lombard: A very recent contribution; according to its web site it supports only basic editing at this point.

 

 

Offloading downloads from Plone, (relatively) transparently


I've just built a first version of a system that offloads file downloads from the responsibility of Plone, while keeping authentication and authorization. With help from Apache and iw.fss, surprisingly few lines of code are needed to accomplish this.

First come two bulleted summaries of what is going on. Look further down in this post for a longer description:

Workflow

  • The user accesses a page in Plone where he is offered a file to download
  • When he clicks the download link, he is redirected to a subdomain, with exactly the same path plus the file name
  • The subdomain "steals" a cookie from the user and uses it to authenticate an xml-rpc call that checks whether the user has the right to download the file
  • If the answer is True, mod_python lets the request through, and the file is downloaded through Apache

 

Technology

  • Have Apache in front
  • Use iw.fss to store files on the file system
  • Serve the files out on a subdomain with Apache
  • Use URL rewriting to make the authentication cookie or session cookie accessible to the subdomain
  • Use mod_python as access handler for the download subdomain
  • The mod_python script has an xml-rpc client that can be configured with cookies
  • The xml-rpc client is configured with the user's cookies, so to Plone it is the user himself
  • Via xml-rpc, the script asks Plone whether the download is OK by calling a proxy method on the object the user wants to download

Longer description

Plone is not very good at serving out big files. A couple of concurrent huge downloads would take up a lot of Zope processes and threads. Here is a way of offloading the file downloading to other server processes (in this case Apache) with authentication and authorization intact.

What is needed:

  • Apache with mod_python and mod_rewrite
  • A custom transport agent for xml-rpc (included in this post)
  • Plone

Let's look at how it is set up, going from the outside in:

Make the authentication or session cookie available to the download server

The user is browsing the web site. The first thing the user's browser hits is the Apache server, which passes the request on to Plone. The user may or may not be logged in. The virtual host directive in Apache looks something like this:

```apache
<VirtualHost *:80>
    ServerName server.topdomain
    ProxyPreserveHost On
    RewriteEngine On
    # Match the __ac cookie if present, and make a new cookie with the same value, but one that can be sent to sub domains
    RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]
    RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]
    RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [L,P]
</VirtualHost>
```

The RewriteRule looks pretty standard: it rewrites the request to fit Zope's Virtual Host Monster, and the Plone site with the id "site" is served out. However, there is some extra stuff, particularly the line:

RewriteCond %{HTTP_COOKIE} __ac=([^;]+) [NC]

This line is a condition that is triggered if the incoming request has a cookie named "__ac". The "__ac" cookie contains information that authenticates the user to the Plone site. If your site uses a cookie other than "__ac", put its name here instead. It does not matter whether the cookie contains just a session id or the login information. The important part is this:

([^;]+)

...it matches the cookie value, and the parentheses mean that Apache remembers this part of the pattern. Since this is the first (and actually only) pair of parentheses in the pattern, it will be remembered as %1. In mod_rewrite, %N refers to a group captured by the last matched RewriteCond, while $N refers to a group captured by the RewriteRule's own pattern, like the $1 in the rule below. (The percent sign is also used for server variables such as %{HTTP_COOKIE}.)
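To see what %1 ends up holding, here is a small Python sketch that applies the same regular expression to a few made-up Cookie headers. mod_rewrite uses Perl-compatible regular expressions, so a simple pattern like this one should behave the same in Python's re module. One thing it shows: because the pattern is not anchored, it also matches the tail of any cookie whose name ends in "__ac", including the __download__ac cookie created below. As far as I can tell, that means the copy could keep being refreshed from itself after Plone has expired __ac at logout. The ANCHORED pattern is a hypothetical tightening, not something tested in Apache.

```python
import re

# Same pattern as in the RewriteCond; the [NC] flag corresponds to re.IGNORECASE
LOOSE = re.compile(r'__ac=([^;]+)', re.IGNORECASE)
# Hypothetical tightened variant: the name must start the header or follow "; "
ANCHORED = re.compile(r'(?:^|;\s*)__ac=([^;]+)', re.IGNORECASE)


def captured(pattern, cookie_header):
    """Return what mod_rewrite would expose as %1, or None if nothing matches."""
    match = pattern.search(cookie_header)
    return match.group(1) if match else None


if __name__ == '__main__':
    for header in ['__ac=abc123',
                   'lang=sv; __ac="abc123"; other=1',
                   '__download__ac=old; __ac=new']:
        print(repr(header), captured(LOOSE, header), captured(ANCHORED, header))
```

Note that the captured value keeps any surrounding quotes, which is why the mod_python script further down strips them.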

Now, let's look at the rewrite rule:

RewriteRule ^/(.*) http://localhost:6080/VirtualHostBase/http/%{HTTP_HOST}:80/site/VirtualHostRoot/$1 [co=__download__ac:%1:.server.topdomain,L,P]

The first part is standard. The square brackets at the end usually contain only flags controlling the flow (such as L for "stop here"), but here they also contain an instruction to send a cookie along with the response:

co=__download__ac:%1:.server.topdomain

Here we define a new cookie called __download__ac (it can be called anything), but the important thing is the dot in front of the domain. It means that the cookie is available to the domain server.topdomain and all its subdomains, so the cookie can be shared with them.
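As a rough illustration of what the browser receives, the sketch below builds an equivalent Set-Cookie header with Python's http.cookies module and checks which host names the cookie would be sent to. The domain check is a simplified version of the RFC 6265 rule that current browsers follow (a leading dot is ignored, and the cookie goes to the domain and every subdomain of it); the function names are made up for the example.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, domain, path='/'):
    """Build a Set-Cookie header comparable to the one the CO flag produces."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]['domain'] = domain
    cookie[name]['path'] = path
    return cookie.output()  # e.g. "Set-Cookie: name=value; Domain=...; Path=/"


def domain_matches(host, cookie_domain):
    """Simplified RFC 6265 domain matching: ignore a leading dot, then accept
    the domain itself or any host name ending in "." + domain."""
    domain = cookie_domain.lstrip('.').lower()
    host = host.lower()
    return host == domain or host.endswith('.' + domain)
```

For example, domain_matches('download.server.topdomain', '.server.topdomain') is True, while a look-alike such as 'evilserver.topdomain' does not match.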

The standard __ac cookie in Plone does not have the capability to be shared with sub domains. The __ac cookie is a bit insecure because it actually contains the user name and password in scrambled form, but for other reasons we wanted to keep it with this system.

Another way of making the cookie shared would be to write a PAS plugin for Plone that can be configured to send out the cookie with a dot-prefixed domain qualifier, but the above rewrite is hard to beat for brevity.

If one goes with the more secure session cookie, there seems to be a PAS plugin that already allows configuration with a dot-prefixed domain, and in that case the rewrite above is not necessary. Either way, at this point you should have a cookie with authentication information that can be read by subdomains.

Apache config of download server

When the user clicks the download link, he is redirected to a subdomain. Let's look at the Apache configuration for that:

```apache
<VirtualHost *:80>
    ServerName download.server.topdomain
    DocumentRoot /var/www/html
    <Location />
        # Extend the Python path to locate your callable object
        PythonPath "sys.path+['/var/www/mod_python']"
        # Make Apache aware that we want to use mod_python
        AddHandler mod_python .py
        PythonAccessHandler downloadauth
    </Location>
</VirtualHost>
```

This looks pretty much like a standard virtual host directive in Apache for serving static files from the file system. The only embellishment is that a Python script is registered as the access handler for all requests. This means that the script does not actually serve any content; it just sits as a gatekeeper and says yea or nay to whether Apache should serve the requested content to the user.

As per the configuration above, the script is named "downloadauth.py" and lives in the "/var/www/mod_python" directory on the server. Let's take a look at that script:

```python
#!/usr/bin/env python
# -*-python-*-
#
# mod_python access handler for the download subdomain (Python 2).
from mod_python import apache, Cookie
import xmlrpclib
from cookiestransport import CookiesTransport

PLONE_URL = 'http://192.168.1.51:6080/site'


def accesshandler(req):
    """Return apache.OK if the user may download the file,
    apache.HTTP_UNAUTHORIZED otherwise.
    """
    # Strip the file name; the rest of the path is the file object in Plone
    plone_local_uri = "/".join(req.uri.split("/")[:-1])
    xmlrpc_server_url = PLONE_URL + plone_local_uri

    # The Apache rewrite adds this cookie as a sub domain friendly copy of "__ac"
    cookies = Cookie.get_cookies(req)
    cookie = cookies.get('__download__ac', None)
    cookies_spec = []

    if cookie is not None:
        # Cookie values are delivered wrapped in quotes
        cookie_value = cookie.value.replace('"', '')
        # We configure xmlrpc to send it as "__ac"
        cookies_spec.append(['__ac', cookie_value])

    server = xmlrpclib.Server(xmlrpc_server_url,
                              transport=CookiesTransport(cookies=cookies_spec))
    try:
        if server.can_i_download():
            return apache.OK
    except (xmlrpclib.Fault, xmlrpclib.ProtocolError):
        # Zope answers a call the user may not make with an error
        # (an HTTP error or an xml-rpc fault) instead of returning False
        pass
    return apache.HTTP_UNAUTHORIZED
```

This script takes the cookie we defined and sends an xml-rpc call to Plone. The xml-rpc client is configured with a special transport agent that can send cookies, so the xml-rpc client gets authenticated as the user doing the download. Plone runs on Zope, and Zope supports xml-rpc out of the box. All we need now is a method in Plone that has the same access rights as the file the user wants to download.
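The two pieces of logic in the handler that do not depend on mod_python (mapping the requested file to its Plone object, and turning the copied cookie back into an "__ac" cookie) can be pulled out and checked on their own. A minimal Python 3 sketch; the function names are hypothetical and PLONE_URL is the example value from the script above.

```python
PLONE_URL = 'http://192.168.1.51:6080/site'  # example value from the handler


def plone_object_url(request_uri, plone_url=PLONE_URL):
    """Map /folder/file_object/filename to the Plone URL of file_object
    by dropping the last path segment (the stored file's name)."""
    object_path = "/".join(request_uri.split("/")[:-1])
    return plone_url + object_path


def cookie_spec(cookies):
    """Turn the __download__ac cookie into the [name, value] pairs that the
    xml-rpc client sends to Plone, renamed back to __ac."""
    value = cookies.get('__download__ac')
    if value is None:
        return []  # anonymous call: Plone decides what anonymous may view
    return [['__ac', value.replace('"', '')]]  # strip the wrapping quotes
```

For example, a request for /afolder/another/folder/file_object/report.pdf is checked against .../site/afolder/another/folder/file_object.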

If we assume the object in Plone holding the download looks something like this:

object
method: download (permission: 'View')
method: can_i_download (permission: 'View')

That is, it has two methods:

  • One that serves out the object from inside of Plone
  • One that just returns True

The first method is never going to be used; we do not want Plone to serve out the file from within Plone. However we make a note of what permission that method has (usually "View"). We then make a method "can_i_download" (or whatever name you fancy) with the same permission. This method just returns True. It can look like this:

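```python
# In the content type class (security = ClassSecurityInfo(),
# permissions imported from Products.CMFCore)
security.declareProtected(permissions.View, 'can_i_download')
def can_i_download(self):
    """If the user has permission to run this method, he
    has permission to download the file"""
    return True
```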

The cookie aware transport agent for xmlrpc

The code for the cookie-aware transport class is largely taken from:

Roberto Rocco Angeloni » Blog Archive » Xmlrpclib with cookie aware transport

  • My version does away with the ability to receive cookies from the server
  • It adds the capability to configure the client with arbitrary cookies from code

```python
# A module with a class that allows the xmlrpc client to be configured with a list of cookies,
# for authentication with Plone xml-rpc methods.
# Written against the pre-2.7 xmlrpclib Transport API (getreply/getfile).
# Based completely on code from Rocco Angeloni:
# http://www.roccoangeloni.it/wp/2008/06/13/xmlrpclib-with-cookie-aware-transport/

import xmlrpclib


class CookiesTransport(xmlrpclib.Transport):
    """A transport class for xmlrpclib that can be configured with
    a list of cookies and sends them with the xml-rpc request"""

    def __init__(self, cookies=None):
        """cookies should be a list of two-item lists/tuples [id, value]"""
        if hasattr(xmlrpclib.Transport, '__init__'):
            xmlrpclib.Transport.__init__(self)
        self.cookies = list(cookies or [])

    def request(self, host, handler, request_body, verbose=0):
        # issue XML-RPC request
        h = self.make_connection(host)
        if verbose:
            h.set_debuglevel(1)
        self.send_request(h, handler, request_body)
        self.send_host(h, host)
        # Send all configured cookies in a single Cookie header
        if self.cookies:
            h.putheader("Cookie", "; ".join(
                "%s=%s" % (name, value) for name, value in self.cookies))
        self.send_user_agent(h)
        self.send_content(h, request_body)
        errcode, errmsg, headers = h.getreply()
        if errcode != 200:
            raise xmlrpclib.ProtocolError(host + handler, errcode, errmsg, headers)
        self.verbose = verbose
        try:
            sock = h._conn.sock
        except AttributeError:
            sock = None
        return self._parse_response(h.getfile(), sock)
```
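The class above targets the xmlrpclib of older Python 2 versions. In Python 3 the module is xmlrpc.client, and Transport writes its request headers in a send_headers(connection, headers) method, so a comparable cookie transport can be much shorter. A sketch, not tested against Plone:

```python
import xmlrpc.client


class CookiesTransport(xmlrpc.client.Transport):
    """xmlrpc.client transport that sends a fixed list of cookies."""

    def __init__(self, cookies=()):
        super().__init__()
        # cookies: iterable of (name, value) pairs
        self.cookies = list(cookies)

    def send_headers(self, connection, headers):
        # Let the base class send its usual headers first ...
        super().send_headers(connection, headers)
        # ... then add all cookies as one combined Cookie header
        if self.cookies:
            connection.putheader(
                "Cookie",
                "; ".join("%s=%s" % (name, value) for name, value in self.cookies))
```

Usage mirrors the Python 2 version: xmlrpc.client.ServerProxy(url, transport=CookiesTransport([('__ac', value)])).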

 

iw.fss - care and feeding

iw.fss is a very slick product for external storage from Ingeniweb. Deciding which fields should be stored externally can be done with a ZCML file; you do not need to touch the code of the content types.

Making migration work in iw.fss

iw.fss has a control panel that allows you to migrate the data already stored in the fields to the external storage. In our case a good number of field migrations failed. It turned out that, at least in our system (which was prepopulated from an old legacy system), the data was sometimes stored in an object of type "OFS.Image.Pdata", which the iw.fss migration could not handle (a bug ticket has been submitted to the iw.fss people about this).

The following monkey patch (used with collective.monkeypatcher), stored in a "patches.py" file fixed that problem:

```python
import cgi
from ZPublisher.HTTPRequest import FileUpload
import cStringIO
from iw.fss.FileSystemStorage import FileUploadIterator

old__init__ = FileUploadIterator.__init__

def new__init__(self, file, streamsize=1<<16):
    """ this is a file upload """
    if not hasattr(file, 'read') and hasattr(file, 'data'):
        data = str(file)  # see OFS.Image.Pdata
        fs = cgi.FieldStorage()
        fs.file = cStringIO.StringIO(data)
        file = FileUpload(fs)
    return old__init__(self, file, streamsize)
```

 

...with the following ZCML:

```xml
<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:monkey="http://namespaces.plone.org/monkey"
    i18n_domain="my.application">

  <include package="collective.monkeypatcher" />

  <monkey:patch
      description="Patching FileUploadIterator to handle OFS.Image.Pdata objects"
      class="iw.fss.FileSystemStorage.FileUploadIterator"
      original="__init__"
      replacement=".patches.new__init__"
      />

</configure>
```

Selecting storage strategy in iw.fss and how to make the url redirect

iw.fss can store the external files in different layouts, called "strategies". I chose "site2". "site2" mirrors the directory structure of Plone completely and then adds the file in the innermost subdirectory. The problem is that the path inside Plone goes to the last subdirectory, but Plone does not tack the name of the file onto the end of the URL; i.e. if the file object in Plone has the URL:

http://server.topdomain/afolder/another/folder/file_object

iw.fss will store it (with site2 layout) as:

/afolder/another/folder/file_object/filename

That "filename" can be anything. Thankfully iw.fss also stores a file with the fixed name "fss.cfg" next to it. This file stores the name of the other file (the one we want to serve out).

I wrote a tool that can be configured with information on where on the file system iw.fss stores its data, and what the URL of the download server is. Let's say the tool is called "tool_that_stores_fss_info". If you do not want to write a tool, a property sheet in portal_properties will do too.

In your file object, write a method that goes something like this:

```python
import ConfigParser
from Products.CMFCore.utils import getToolByName

# Method on the file content type class:
security.declarePublic('redirectToExternalFile')
def redirectToExternalFile(self):
    """Redirects to external download url"""
    sc_tool = getToolByName(self, 'tool_that_stores_fss_info')
    portal_url = getToolByName(self, 'portal_url')()

    local_url = self.absolute_url()[len(portal_url):]  # Extract local part of url
    file_path_to_download = sc_tool.fs_path_to_download_root + local_url
    # fss.cfg, stored next to the file, holds the real file name
    config = ConfigParser.ConfigParser()
    config.read(file_path_to_download + '/fss.cfg')
    file_name = config.get('FILENAME', 'file')
    url_to_download = sc_tool.url_to_download_root + local_url + "/" + file_name
    self.REQUEST.response.redirect(url_to_download)
```
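The URL construction in that method can be separated from Plone and tried out on its own. The Python 3 sketch below takes the contents of fss.cfg as a string; like the method above, it assumes the file name is stored as option "file" in section "FILENAME" (I have not checked this against iw.fss documentation). The function name and example URLs are hypothetical.

```python
import configparser


def download_url(absolute_url, portal_url, download_root, fss_cfg_text):
    """Build the download-subdomain URL for a file stored with the site2 layout.

    fss_cfg_text is the content of the fss.cfg file next to the stored file.
    """
    if not absolute_url.startswith(portal_url):
        raise ValueError('object URL is not inside the portal')
    local_url = absolute_url[len(portal_url):]  # e.g. /afolder/.../file_object
    config = configparser.ConfigParser()
    config.read_string(fss_cfg_text)
    file_name = config.get('FILENAME', 'file')
    return download_root + local_url + '/' + file_name
```

Keeping this logic in a plain function makes it easy to test without a running Zope.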

 

Buildout configuration of iw.fss

Below is an fss configuration that works for a Plone site named "site" stored in the Zope root, using "site2" as storage layout, and the external files get stored inside the var directory of the buildout.

```ini
[fss]
recipe = iw.recipe.fss
zope-instances =
    ${instance:location}
storages =
# The first is always generic
    global /
    site /site site2 ${buildout:directory}/var/fss_storage_site
```

Other location of the external files

In the above case the files get stored in the var directory inside the Plone buildout. It may be a better idea to store them on the server in a directory structure with more permanence, like "/var/www/html", as suggested in the Apache configuration for the download server earlier in this post. The var directory in a buildout does not get overwritten when buildout is rerun, but it is still inside the buildout directory structure.

Make sure, however, that the iw.fss storage is on the same volume as the Plone buildout:

Many Unix-ish systems have different directories (/tmp, /home, /var and so on) on different mounted volumes. Python's os.rename cannot move a file from one mounted volume to another, so the code in iw.fss that uses os.rename cannot handle the storage being on a different volume (say, on "/var/www/html" when the Plone site is a buildout in the "/home" hierarchy). shutil.move may be an alternative to os.rename (a bug ticket has been submitted to the iw.fss people about this).
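shutil.move already tries os.rename first and, if that fails, copies the file and deletes the original, so calling it directly may well be enough. The sketch below (a hypothetical helper, not iw.fss code) spells out the same fallback explicitly, but only for the cross-device error EXDEV, so that other errors such as a missing source file still surface. Note that copy-and-delete, unlike a rename, is not atomic.

```python
import errno
import os
import shutil


def move_into_storage(src, dst):
    """Move src to dst, also when dst is on another mounted volume."""
    try:
        os.rename(src, dst)  # fast and atomic, but only within one file system
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV: "Invalid cross-device link"
            raise
        shutil.move(src, dst)  # falls back to copy + delete
```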

Some possible improvements and simplifications

What changes could be made to make this more of an out-of-the-box solution, ready to be deployed on different servers?

Getting the cookie to the sub domain

  • The rewrite in Apache could be simplified, or
  • The cookie handler in Plone could check what domain it is serving the cookie in, and tack on a dot. This behavior could be switched on and off with a checkbox in the ZMI. In this way no Apache rewrite would be necessary at all.

mod_python contacting Plone

  • The mod_python script could by default assume that Plone runs on port 8080 at the same IP address as mod_python

Constructing the right url to redirect to

  • The redirect code could be factored out into a view (if they work with xml-rpc), or some other mechanism, so that a simple ZCML configuration would connect a downloadable field in a content type with the redirect code
  • It is probably possible to ask iw.fss where its file system storage is, in that way no separate setting would be needed for this in a tool or property sheet
  • The default domain to redirect to could be the one Plone is on, with "download." tacked on in front. Together with the preceding suggestion, no settings would need to be stored in Plone

Replacing xml-rpc

  • Instead of xml-rpc, maybe one could simply use an authenticated HEAD request to the file in Plone? Then no custom authentication method would be needed.
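The HEAD request idea could look something like the following Python 3 sketch, which is hypothetical and untested against Plone. Plone may answer a request the user is not allowed to make with a redirect to its login form rather than a 401, so the sketch treats anything other than 200 as "no". Query strings are ignored for simplicity.

```python
import http.client
from urllib.parse import urlsplit


def may_download(object_url, cookie_header, timeout=10):
    """Send a HEAD request for object_url with the user's cookies and
    return True only if the server answers 200 OK."""
    parts = urlsplit(object_url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        conn.request('HEAD', parts.path or '/', headers={'Cookie': cookie_header})
        return conn.getresponse().status == 200
    finally:
        conn.close()
```

In the mod_python handler, this would replace the xml-rpc call and the can_i_download method.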

Change the architecture

Tramline (PloneTramline) uses Apache filters to intercept requests and responses in Apache. In this way it can monitor all requests and responses, insert a file on the way out, and take care of a file on the way in.

It would be cool if it were possible to configure Apache so that you can choose proxying based on the response headers (from the first backend you try). If those response headers matched some criterion (such as there being an "x-download-from-somewhere-else" header), Apache would simply switch to proxying another server. I do not think it can be done, but this page talks about a possible implementation:

Smart filter

 

Linux ocr for getting text from a screenshot


Summary: For a 72dpi screenshot, gocr returned something intelligible, tesseract returned nothing and ocrad returned gibberish.

Enlarging the image 4 times in each direction and interpolating helped tesseract and ocrad output usable text at all, but they were still not superior to gocr.

 

These OCR programs are probably calibrated not for making text out of pixel-perfect, low-resolution screenshots, but for high-resolution, somewhat noisy scans of various typefaces on paper. Doing OCR from a screenshot ought to be quite easy: each letter is pixel perfect and always looks exactly the same, and there are no problems with slanted text or other distortions. In fact, writing your own OCR program for this is a distinct possibility.
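To illustrate why pixel-perfect screenshots should be easy, here is a toy sketch of such a program: it cuts a one-line bitmap into glyphs at fully blank columns and looks each glyph up in a table of known glyph bitmaps. The bitmap format (strings of '#' and '.') and the two-letter font are made up for the example; a real version would need the actual font's glyphs and would have to handle spaces, the dots over i and j, and letters that touch.

```python
def split_glyphs(bitmap):
    """Split a text-line bitmap (equal-length strings of '#' and '.')
    into glyph bitmaps at columns that are entirely blank."""
    width = len(bitmap[0])
    blank = [all(row[x] == '.' for row in bitmap) for x in range(width)]
    glyphs, start = [], None
    for x in range(width + 1):
        if x < width and not blank[x]:
            if start is None:
                start = x  # a glyph begins
        elif start is not None:
            glyphs.append(tuple(row[start:x] for row in bitmap))  # glyph ends
            start = None
    return glyphs


def recognise(bitmap, font):
    """font maps a glyph bitmap (tuple of row strings) to its character;
    unknown glyphs become '?'."""
    return ''.join(font.get(glyph, '?') for glyph in split_glyphs(bitmap))
```

Because every occurrence of a letter in a screenshot is identical, an exact dictionary lookup is enough; no fuzzy matching is needed.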

I had received a 72dpi screenshot by mail that I wanted to get the text from. The top part looked like this:

[Image: salsatext.png, the top of the screenshot]

Tesseract, a program that is highly recommended on the web, returned nothing when run on this screenshot. At first, after reading this discussion, I thought it had to do with my TIFF possibly having a transparency layer, but making sure there was none did not change anything.

According to the same discussion, it seems like tesseract wants to have a high resolution image (see tests on that further down).

 

Now the ocrad program returned this:

Al Po_ Po_ _ Al _|_o _|_o _ M_|_o_hl_o
lollob_IOldo _|_o_do l_m_o
A_ _ol__lo _ _|_o_do l_m_o
__ _o _| Colmo_ _ ____ _ ___ T__o_
OIOo Ml__ __o C_o_o_o_ O_O____o
llo_o_do__ _ l_|_ __llO_ Co__ol__
_| Mo_|___o O_O____o A_oOo_
A_ Amo_ C_o_do Woblo_ lo_ Ml_odo_ _ Alb___o Bo__o_ _| Tl_o_ D
_o_ldo B___lol _ _|_hl_ _o_ b Bobb_ c___
lo_ Tomoll_o_ D_ OIOo O_O____o A_oOo_ _o__
A_ld _o_ Bo_____o
_o_ Wo_ Wo_ B___o _|__o C_bo_ Plo____

 ...and so on.

Gocr returned this:

y 9 999)        9       9     _J   ypp yyy    y
AI P a n P a n Y AI Vin o Vin o m eIc o c hita
L oIIo brigid a Ric ard o L e m v o
A V aIeria Ric ard o L e m v o
Se Va EICaiman fru ko Y Sus Tesos
Oiga mire Vea Guayacan Orquesta
LIora n d ote L uis F eIi e G o n z aIe z
EImanisero Orquesta Aragȯn
A Amor Cuando HabIan Las miradas AIberto Barros''EITitan D.
S o nid o B e stiaI Ric hie R a 6 B o b b C ru z
Los TamaIitos De OIga Orquesta Aragȯn,Josė
A cid R a y B arretto
Ran Kan Kan Buena Vista Cuban PIayers
Undanta Bo Kas ers Orkester

...and so on, which, given the non-outputting competition, must be deemed fantastic. Still, it cannot deal with characters extending below the baseline (p, g and y, for example), and every lowercase l is interpreted as a capital I.

Increasing the pixel density of the image

This turned out to be non-trivial with the tools I had at hand. I finally got enlargement working with the program pnmenlarge, part of the netpbm suite of command-line, Unix-ish image processing tools:

cat salsatext.pnm | pnmenlarge 4 > enlarged.pnm

This enlarged the image 4 times in each direction, turning each pixel into a 4x4 block, and now (after converting it to TIFF first) tesseract magically started working!

Fil Pen Pen 'ii Fil '·.·'inp '·.·'inp et`} i···1el¤:p·:hite
Lpllplznrigide Riterdp Lem·-rp
.·!·.·-,# '·.·'elerie et`} Riterdp Lem·-rp
5e '·.·'e El Ceimen et`} Frulce 'ii 5us Tesps
Ciige i···1ire '·.·'ee Gueyeten Ordueste
Llprendpte et`} Luis Felipe Gpneelee
El i···1eniserp Ordueste ifiregdn
.·!·.·-,# Famer, Cuendp Hel:·len Les i···1iredes et`} .·!·.ll:¤ertp Eerrps "El Titen D. ..
Epnidp Eestiel et`} Richie Re·-,# El Epl:·l:·3r Crue
Lps Temelitps De Cilge Ordueste ifiregdn, _|pse
.·!·.¤:id Re·-,# Eerrettp
Ren Ken Ken Euene '·.·'iste ·Zul:·en F‘le3··ers
Llndenteg et`} Ep iiespers Orltester

Well, it does at least produce output, but the quality is at the point that you can barely guess which line it is trying to decode.

Let's try switching to Spanish as language:

.ü.| Pan Pan "x‛ .ü.| '·.·'ina '·.·'ina —l=*.`Š f'·'1a|·:·:··:|'•i|:a
La||a|:·ri·;|i·:|a F‘xi·:ar·:|a Lam'­.«a
.ü.'-,« '·.·'a|aria —l=*.`Š F‘xi·:ar·:|a Lam'­.«a
Sa '·.·'a El Caiman —l=*.`Š FrukJ:· "x‛ 5uS TaSaS
Diga Mira '·.·'aa Cuaşracan Dr·:]uaS|:a
Llarandata —l=*.`Š LuiS Falipa Ganzalaz
El f'·'1aniSara Dr·:]uaS|:a Aragón
.ü.'-,« Fumar, Cuanda Ha|:·|an LaS f'·'1ira·:|aS —l=*.`Š .ü.||:·ar|:·:· EarraS "EI Titan D. ..
Sanida EaS|:ia| —l=*.`Š F‘xi·:|'•ia Ra'-; Ex Ea|:·|:·ş» Cruz
LaS Tama|i|:aS Da Diga Dr·:]uaS|:a Aragón, _|aSa
.ü.·:i·:| Ra'-; Earratta
F‘xan Kan Kan Euana '·.·'iSta Cu|:·an F‘|aş·'arS
L|n·:|an|:a·; —l=*.`Š Ba kaS|:·arS DrkaStar

 

That was not good. Maybe the enlargement needs to be smoother?

pamstretch, also from the netpbm package, also increases pixel count but additionally smooths the output by interpolating pixels.
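The difference between the two netpbm tools can be shown on a tiny grayscale image stored as a list of rows. enlarge does plain pixel replication, the same idea as pnmenlarge; stretch inserts linearly interpolated values between neighbouring pixels, first along rows and then along columns. This only illustrates the principle: pamstretch's exact edge handling and output size are not reproduced (the interpolated image here is (w-1)*n+1 pixels wide, not w*n).

```python
def enlarge(image, n):
    """Pixel replication: each pixel becomes an n x n block."""
    return [[pixel for pixel in row for _ in range(n)]
            for row in image for _ in range(n)]


def stretch_row(row, n):
    """Linear interpolation along one row: n-1 in-between values per gap."""
    out = []
    for a, b in zip(row, row[1:]):
        out.extend(a + (b - a) * k / n for k in range(n))
    out.append(row[-1])
    return out


def stretch(image, n):
    """Interpolate horizontally, then vertically (via transposition)."""
    rows = [stretch_row(row, n) for row in image]
    cols = [stretch_row(list(col), n) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Replication keeps hard, blocky edges, while interpolation produces the softer gradients that a scanner would, which is presumably what the OCR programs are tuned for.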

Like many Unix-ish tools, pamstretch reads data from stdin and writes it to stdout:

cat salsatext.pnm | pamstretch 4 > stretched.pnm

Tesseract needs TIFF format, handled here by ImageMagick's convert command:

convert  stretched.pnm  stretched.tif

Run tesseract on it, in this case with -l spa, which selects Spanish:

tesseract stretched.tif str -l spa

The result:

AI Pan Pan 'l" AI Vino Vino —.?•.`$ Molcochita
Lollobrigida Ricardo Lomyo
Ay 'o‘aloria —.?•.`$ Ricardo Lomyo
5o 'o‘a El Caiman —.?•.`$ Fruko 'l" Sus Tosos
Diga Miro 'o‘oa Guayacan ûrquosta
Llorandoto —.?•.`$ Luis Folipo Gonzalo:
El Manisoro ûrquosta Aragon
Ay Amor, Cuando Hablan Las Miradas —.?•.`$ Alborto Barros "El Titan D. ..
5onido Eostial —.?•.`$ Richio Ray En Bobby Cruz
Los Tamalitos Do Olga ûrquosta Aragon, José
Acid Ray Earrotto
Ran Kan Kan Euona 'liista Cuban Playors
Undantag —.?•.`$ Bo Iäaspors ûrkostor

...better. Let's try English:

AI Pan Pan 'i" AI 'a'inu 'a'inu 3} Ms|cuchita
Lu||ubrigic|a Ricarclu Lsmyu
Ay 'a'a|sria 3} Ricarclu Lsmyu
5s 'a'a El Caiman 3} Fruku 'i" 5us Tssus
Diga Mirs 'a'sa Guayacan Drqussta
L|uranduts 3} Luis Fs|ips Gun:a|s:
El Manissru Drqussta Aragun
Ay Amur, Cuanclu Hab|an Las Miradas 3} Albsrtu Earrus "El Titan D. ..
5unic|u Esstia| 3} Richis Ray E: Eubby Cru:
Lus Tama|itus Ds D|ga Drqussta Aragun, juss
Acid Ray Earrsttu
Ran Iian Iian Eusna 'a'ista Cuban Playsrs
Unclantag 3} Eu Iiaspsrs Drksstsr

That is worse.

How does ocrad perform?

Al Pan Pan Y Al Vino Vino __ Melcochila
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukD Y Sus Tesos
Oiga Mire Vea Guayacan Orquesla
Llorandole __ Luis Felipe Gonzalez
El Manisero Orquesla Arag�n
Ay Amor, Cuando Nablan Las Miradas __ Alberlo Barros "El Tilan D,,,
Sonido Beslial __ Richie Ray bBobby Cruz
Los Tamalilos De Olga Orquesla Arag�n, los� , , ,
Acid Ray Barrello
Ran Kan Kan Buena Visla Cuban Players
Undanlag __ Bo Kaspers OrkPsler

 

A lot better than the line noise seen before. With the enlarged but not interpolated image:

Al Pan Pan Y Al Vino Vino __ Melcochi_a
Lollobrigida Ricardo Lemvo
Ay Valeria __ Ricardo Lemvo
Se Va El Caiman __ FrukoYSusTesos
Oiga Mire Vea Cuayacan Orques_a
Llorando_e __ Luis Felipe Conzalez
El Manisero Orques_a Arag�n
Ay Amor, Cuando Hablan Las Miradas __ Alber_o Barros "El Ti_an D,,,
Sonido Bes_ial __ Richie Ray b Bobby Cruz
Los Tamali_os De Olga Orques_a Arag�n, _os� ,,,
Acid Ray Barre__o
Ran Kan Kan Buena Vis_a Cuban Players

That's worse.

So, tesseract and ocrad need the input to be "scannified" by enlarging it and interpolating to get a bit of smoothness, but even then they do not clearly beat gocr.

For scanned documents the ranking seems reversed.
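The winning pipeline above (pamstretch, then ImageMagick's convert, then tesseract) can be scripted. A sketch in Python that builds the same three commands and runs them in order; the helper names and intermediate file names are made up, and it assumes the three programs are installed and on the PATH.

```python
import shutil
import subprocess


def ocr_commands(pnm, factor=4, lang='spa'):
    """(argv, stdin file, stdout file) for each step of the pipeline."""
    base = pnm.rsplit('.', 1)[0]
    stretched = base + '_stretched.pnm'
    tif = base + '_stretched.tif'
    return [
        (['pamstretch', str(factor)], pnm, stretched),      # pamstretch N < in > out
        (['convert', stretched, tif], None, None),          # PNM -> TIFF
        (['tesseract', tif, base, '-l', lang], None, None), # writes base + '.txt'
    ]


def run_ocr(pnm, factor=4, lang='spa'):
    """Run the steps in order and return the name of tesseract's text file."""
    for argv, stdin_path, stdout_path in ocr_commands(pnm, factor, lang):
        if shutil.which(argv[0]) is None:
            raise RuntimeError('%s is not installed' % argv[0])
        stdin = open(stdin_path, 'rb') if stdin_path else None
        stdout = open(stdout_path, 'wb') if stdout_path else None
        try:
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()
    return pnm.rsplit('.', 1)[0] + '.txt'
```

Changing factor or lang makes it easy to repeat the comparisons above with other settings.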

 Peter Selinger: Review of Linux OCR software:

Of course, it must be stressed that the test results reported here are derived from only two scanned pages. It is possible that for other inputs, the programs rank differently. However, based on the tests reported on this page, here is a summary of my conclusions:
* Tesseract gives extremely good output at a reasonable speed. It is the clear overall winner of the test. The only caveat is that one absolutely must convert the input to bitonal.
* Ocrad gives reasonable output at extremely high speed. It can be useful in applications where speed is more important than accuracy.
* GOCR gives poor output at a slow speed.

"svnadmin load" loads into an existing repository; it does not create one


  • svnadmin dump repository > repository.svndump
  • svnadmin create repository
    svnadmin load repository < repository.svndump

I needed to downgrade a couple of repositories from format version 3 to 2 in order to work with RHEL, and it took me a while to realise that svnadmin load loads into an existing repository, it does not create a new one. So first create the repository with svnadmin create, then load into it.
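The order of operations, as a small Python sketch (a hypothetical helper; the run parameter can be swapped out for testing). The point is simply that create has to happen before load.

```python
import subprocess


def copy_repository(old_repo, new_repo, dump_file, run=subprocess.run):
    """Dump old_repo, create new_repo, then load the dump into it.

    svnadmin load needs an existing repository, hence the create step first.
    """
    with open(dump_file, 'wb') as out:
        run(['svnadmin', 'dump', old_repo], stdout=out, check=True)
    run(['svnadmin', 'create', new_repo], check=True)
    with open(dump_file, 'rb') as inp:
        run(['svnadmin', 'load', new_repo], stdin=inp, check=True)
```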

Get Spotify working with PulseAudio on Ubuntu Linux


I had problems getting Spotify to work under Ubuntu and Wine, with a Microsoft LifeChat LX-3000 headset. The sound chopped 2-3 times per second.

Using OSS and normal Wine worked fine with the internal sound card of the laptop, but not with the USB headset.

I found this discussion thread and tried different remedies. The one that worked was Neil Wilson's fork of Wine, WinePulse, with support for PulseAudio.

You can update your system with unsupported packages from this untrusted PPA by adding ppa:neil-aldur/ppa to your system's Software Sources.


Read more: Release Packages : Neil Wilson

Review: Burnt by the Sun at the National Theatre, London, UK


(This review was written in March 2009, but I forgot to post it, so here it comes.)

Summary: A reasonably well-executed performance, conveying a very important insight. Go see it if:

  • You have not seen the movie it is based on,
  • You prefer everything to be in English, or
  • You just strongly prefer the stage to the screen.

However, in my mind the movie surpasses it, though with some work the play could get closer. Some of Kotov's lines and lines of reasoning are great in the play.

This review contains spoilers.

The play "Burnt by the Sun" is based on the superb film by the same name, by Nikita Michalkov. Peter Flannery has adapted it to the stage. Before the performance I listened to Flannery talking about how he adapted the film.

 

Not enough shouting and pauses :-)

Act 1 (of 2) suffers from too little shouting, physical acting and meaningful pauses in conversation. Emotions need to be acted out more strongly, whether they stem from (true or feigned) joy or from anger and malice. In film you get away with small gestures thanks to the camera's focus, but on stage stronger stuff is needed, especially if you do not use a spotlight. Because of this, the play starts off emotionally unfocused. Another way to give act 1 more focus (besides shouting, then) would be to cut down the number of characters on stage. Two or three of the dacha's inhabitants could have been cut from the script, and lines possibly rewritten to have more punch. This is all rectified in act 2, where fewer people on stage lead to stronger focus, and the language gets more direct.

The play stays eerily true to the film: before seeing the play I hadn't realised that every scene and nearly every line of that film is etched in my memory. It was therefore easy to check off lines and plot devices as they came, and suffice it to say that Flannery has not touched anything he wasn't forced to by the different format.

Affability

In the last scene of the play  Rory Kinnear's Mitia comes off a bit too affable and emotionaly worked up, interspersing opera singing with his Russian roulette. Sitting still in a chair, reciting a children's rhyme and squeezing the trigger on numbers in the rhyme would be more consistent with the temperament of the character and his slavic devil-may-care attitude. He comes off as a bit too much of an affable tragic case now, like Jeremy Iron's Sebastian Flyte, or Klaus-Maria Brandauer's Hendrik Höfgen, while it is clear that Mitia knows exactly what game he is playing.  He could probaby still do the opera bit, but jump into some ridiculous pose before each pull to cut to the core of the character.

Kinnear does a good job otherwise in a demanding role and Ciaran Hinds as Kotov grows as the play progresses, but he needs to throw his weight around more in act 1. The actress playing the daughter does a perfectly good job. The play moves on quite swiftly and make it hard to evaluate the other performances in depth.

During the interview with Flannery that I attended I got interested in seing other things he has written. I like his ambition of presenting multiple views at the same time, and what seemed to be a humanistic approach, being mature enough not to cling to some extremist ideology. His work seems to be a quest for knowledge.

Compared to the movie the play more clearly depicts the main conflict as duty-egoism, but also subverts this by showing that also tough man Kotov can break, and just from one sentence. There is also a budding feminist angle present in the play that is missing in the film. Flannery actually lets the wife speak up a  couple of times about what she thinks of the situation. Flannery did mention that in the film the wife acts like a child, and the daughter more as a wife, and that he reversed this for the play, also to get some load off the actress playing the daughter.

I stumbled upon this play after meeting a friend, mixing up the BFI and the National Theatre (the serendipity of being a tourist) and getting last-minute tickets. One reason I went was curiosity: could it help me figure out how much of the film's strength comes from the technical and artistic performances, and how much from the drama itself?

When the play ended, I was as gripped by the drama of the story as ever. And the stage play ends the same way as another film of the 90s, La Haine.

HD Video from Canon HF200 on Ubuntu Linux - convert and play

Notes to self on how to play videos from the camera on my Linux computer.

The files that the Canon HD video camera outputs have the suffix MTS. These can be played by the VideoLAN client (VLC) on my Dell Celeron-equipped laptop. Well, kind of: it plays the first two frames or so, then chokes on the video and keeps playing the sound.

The MTS files can be converted to other formats with ffmpeg. The video from the Canon camera seems to be interlaced. If you use ffmpeg straight off the bat, like so:

ffmpeg -i canonvideo.mts -sameq video.mp4

the picture will be full of wavy lines, because the Canon format is interlaced. Use the -deinterlace option, like this:

ffmpeg -deinterlace  -i canonvideo.mts -sameq video.mp4

The mp4 file then plays effortlessly with vlc on the computer.
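The -sameq and -deinterlace options above come from the ffmpeg of the time. Newer ffmpeg releases have removed -sameq and deprecated -deinterlace in favour of deinterlacing filters such as yadif. A minimal sketch of the same conversion for a recent ffmpeg, assuming it was built with libx264; the function name convert_mts and the quality setting (CRF 18) are my own picks, not anything prescribed by the camera or by ffmpeg:

```shell
# convert_mts INPUT OUTPUT
# Hypothetical helper: deinterlaces Canon MTS footage and re-encodes it
# to H.264 video and AAC audio in an mp4 file. Assumes ffmpeg with libx264.
convert_mts() {
    local in=$1 out=$2
    # The yadif filter does the deinterlacing that -deinterlace used to do.
    # -crf 18 is a fairly high quality setting for libx264 (lower = better).
    ffmpeg -i "$in" -vf yadif \
        -c:v libx264 -crf 18 \
        -c:a aac \
        "$out"
}
```

Usage: `convert_mts canonvideo.mts video.mp4`.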

It should be possible to make the output interlaced as well, with the ilme flag.

However this:

ffmpeg  -i 00002.MTS -sameq -flags ilme video.mp4

still creates wavy lines. There are a number of idioms with ilme in them floating around the Internet, and I am not sure how to use it.

Put together a video split in parts

The Canon HF200 splits long videos into separate files, each about 2 GB in size. These files cannot be converted as individual videos! Well, the first one can, but each of the following ones depends on the one before. This is because, to save space, the camera writes mostly incomplete frames that only contain the changes compared to previous frames. Some frames, however, are complete on their own; these are usually called key frames. When the camera splits the video into files, it does not take care to split at key frames.

Before conversion, the video therefore needs to be put together into one large file. On Linux this can be done with the cat command:

cat 00001.MTS 00002.MTS 00003.MTS > whole-video.MTS
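With many parts, typing every file name is error-prone. A small sketch that lets the shell pick up the parts in order, assuming the directory holds only the parts of this one recording, named with five zero-padded digits like 00001.MTS as above. The shell sorts glob matches, and zero-padded names sort in recording order. The function name join_parts is hypothetical:

```shell
# join_parts OUTPUT [DIR]
# Hypothetical helper: concatenates the numbered parts DIR/00001.MTS,
# DIR/00002.MTS, ... (DIR defaults to the current directory) into OUTPUT.
join_parts() {
    local out=$1 dir=${2:-.}
    # Replace the positional parameters with the matching files, in sorted order.
    set -- "$dir"/[0-9][0-9][0-9][0-9][0-9].MTS
    # With no match the pattern is left unexpanded, so "$1" will not exist.
    if [ ! -e "$1" ]; then
        echo "join_parts: no numbered .MTS parts in $dir" >&2
        return 1
    fi
    # Do not clobber an earlier result.
    if [ -e "$out" ]; then
        echo "join_parts: $out already exists, not overwriting it" >&2
        return 1
    fi
    cat "$@" > "$out"
}
```

Usage: `join_parts whole-video.MTS`.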

Slow motion a video and save to file

This guide shows you how, on Linux, to make a slow-motion video from an mp4 video (in this case downloaded from YouTube), with the pitch of the sound intact. There is probably a one-line command to do this in mencoder, ffmpeg or vlc; if so, please enlighten me. The guide starts with a terse summary and then continues with a more verbose explanation.

Summary

You need:

  • mencoder (part of mplayer)
  • ffmpeg
  • sox

All are open source and freely downloadable.

Assume the video is called "normal.mp4" and should be made into a slow-motion video called "slomo.mp4", with the pitch intact so we do not get those deep, growling noises.

First to slow down the video to half speed, use mencoder, part of mplayer:

mencoder -ovc copy -oac mp3lame -speed 0.5  normal.mp4 -o slow.mp4

Extract the sound with ffmpeg:

ffmpeg -vn  -i slow.mp4 slow.wav

You may now discard the slow.mp4 file.

Pitch it up:

sox slow.wav slow_but_pitched_up.wav pitch 1200

Make a slow version of normal.mp4 with no sound:

mencoder -ovc copy -nosound -speed 0.5  normal.mp4 -o slow_no_sound.mp4

You can put sound and video together with mencoder:

mencoder -ovc copy -audiofile slow_but_pitched_up.wav -oac faac slow_no_sound.mp4 -o slomo.mp4

or use ffmpeg:

ffmpeg -i slow_no_sound.mp4 -i slow_but_pitched_up.wav -map 0.0 -map 1.0 slomo.asf

(the sound needs to be compressed above, methinks)

If you can make an mp4 file instead of an asf file, so much better. On my machine it complained about not having the correct codec for sound; I am still looking into that.
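As for the one-line command: newer ffmpeg versions have an atempo audio filter that changes the tempo while keeping the pitch. A sketch of a single-command alternative, assuming a reasonably recent ffmpeg, stretches the video timestamps with the setpts filter and the audio with atempo. Note that this is a different technique from the mencoder plus sox route in this guide (time-stretching instead of slowing down and pitch-shifting back), and it re-encodes the video. The function name slomo is hypothetical, and older ffmpeg builds only accept atempo factors between 0.5 and 2.0:

```shell
# slomo INPUT OUTPUT [FACTOR]
# Hypothetical helper: writes OUTPUT as INPUT played at FACTOR speed
# (default 0.5, i.e. half speed), keeping the audio at its original pitch.
slomo() {
    local in=$1 out=$2 factor=${3:-0.5}
    # setpts=PTS/FACTOR spreads the video frames out in time (0.5 -> twice as long);
    # atempo=FACTOR stretches the audio by the same amount without changing pitch.
    ffmpeg -i "$in" \
        -filter_complex "[0:v]setpts=PTS/${factor}[v];[0:a]atempo=${factor}[a]" \
        -map "[v]" -map "[a]" \
        "$out"
}
```

Usage: `slomo normal.mp4 slomo.mp4`.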

Slow motion an mp4 video on Ubuntu Linux 9.10

-- Longer explanation --

If you watch a video in vlc, you can slow it down: the sound gets slower but stays at the original pitch, which is neat. I was unable to find an "OK, good, now play through this and save it to a file" setting in vlc, so below is the road I took to finally convert a video file into a slow-motion video file with the pitch intact.

First to slow down the video to half speed, use mencoder, part of mplayer:

mencoder -ovc copy -oac mp3lame -speed 0.5  normal.mp4 -o slow.mp4

This slows the video down to half speed, but unfortunately it also slows down the sound, which lowers its pitch. Mplayer has a switch for affecting the pitch, but mencoder does not pick it up.

The -speed flag above indicates the speed, with 0.5 being half speed.

So far I have been unable to make mencoder preserve the pitch (i.e. pitch-shift it back), but sox can pitch-shift. However, sox only operates on sound files, so the sound of the mp4 file needs to be separated out, then sox can operate on it, and then we combine the sound and video again.
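The whole plan, slow down, separate, pitch-shift and recombine, can be chained into one function using the commands from the summary. A minimal sketch, assuming mencoder was built with mp3lame and faac support (which the commands in this guide need anyway). The steps are joined with && so it stops at the first failing one. The function name slomo_pipeline is hypothetical, and the intermediate files (slow.mp4, slow.wav, slow_but_pitched_up.wav, slow_no_sound.mp4) are written to, and overwritten in, the current directory:

```shell
# slomo_pipeline INPUT OUTPUT
# Hypothetical helper: runs this guide's steps in order for a half-speed
# video with the original pitch.
slomo_pipeline() {
    local in=$1 out=$2
    # 1. Half speed, video and sound (sound drops an octave).
    mencoder -ovc copy -oac mp3lame -speed 0.5 "$in" -o slow.mp4 &&
    # 2. Pull the slowed sound out as WAV for sox.
    ffmpeg -vn -i slow.mp4 slow.wav &&
    # 3. Shift it back up one octave (1200 cents).
    sox slow.wav slow_but_pitched_up.wav pitch 1200 &&
    # 4. Half speed again, this time without sound.
    mencoder -ovc copy -nosound -speed 0.5 "$in" -o slow_no_sound.mp4 &&
    # 5. Put the silent video and the pitched-up sound together.
    mencoder -ovc copy -audiofile slow_but_pitched_up.wav -oac faac \
        slow_no_sound.mp4 -o "$out"
}
```

Usage: `slomo_pipeline normal.mp4 slomo.mp4`.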

Separating out the sound

On my Ubuntu 9.10 I had to install "libavcodec-extra" to get this to work. The command is:

ffmpeg -vn  -i slow.mp4 slow.wav

Now sox can operate on it. AFAICT sox should be able to read mp3 too, but not on my machine, despite library installations and hand waving.

Pitching up the sound

Now sox can pitch it up:

sox slow.wav slow_but_pitched_up.wav pitch 1200

Sox is special in that it wants the input and output files first, and then, after them, the effects and their arguments. The effect we need is called "pitch", and it takes, among other things, a shift in cents, where 100 cents is one semitone and hence 1200 is an octave. We want a shift of one octave, since we slowed the video down to 50%, and an octave is a doubling of frequency (pitch).
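The same reasoning works for other speeds: playing sound at speed s (0.5 = half speed) lowers it by log2(1/s) octaves, so the shift needed to undo that is 1200 × log2(1/s) cents. A small sketch that does the arithmetic with awk; the function name pitch_cents is hypothetical:

```shell
# pitch_cents SPEED
# Hypothetical helper: prints the sox "pitch" shift in cents that undoes
# the pitch drop caused by playing sound at SPEED (e.g. 0.5 = half speed).
pitch_cents() {
    awk -v s="$1" 'BEGIN {
        if (s <= 0) { print "pitch_cents: speed must be positive" > "/dev/stderr"; exit 1 }
        # 1200 cents per octave; log(x)/log(2) is log2(x). Round to whole cents.
        printf "%.0f\n", 1200 * log(1 / s) / log(2)
    }'
}
```

For half speed it gives 1200, quarter speed gives 2400 and three-quarter speed gives 498, so it can be used directly in the sox command, e.g. `sox slow.wav slow_but_pitched_up.wav pitch "$(pitch_cents 0.5)"`.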

Combining slow motion video and pitched up sound

Now we need to combine the sound and the video.

You can put sound and video together with mencoder:

mencoder -ovc copy -audiofile slow_but_pitched_up.wav -oac faac slow_no_sound.mp4 -o slomo.mp4

There seems to be a problem with that file, though, since ffmpeg reports:

Seems stream 0 codec frame rate differs from container frame rate: 29.97 (30000/1001) -> 14.99 (15000/1001)

It plays fine in vlc, though.

You can use ffmpeg like so:

ffmpeg -i slow_no_sound.mp4 -i slow_but_pitched_up.wav -map 0.0 -map 1.0 slomo.asf

 

If you can make an mp4 file instead of an asf file, so much better. On my machine it complained about not having the correct codecs with ffmpeg.
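Newer ffmpeg versions ship a built-in AAC encoder, which may be the missing piece for mp4 output. A sketch, assuming such an ffmpeg (older versions needed -strict experimental to use the built-in aac encoder). It copies the video stream untouched and only encodes the pitched-up WAV. The function name mux_mp4 is hypothetical, and the -map arguments use the current input:type:index syntax rather than the older 0.0 form used above:

```shell
# mux_mp4 VIDEO AUDIO OUTPUT
# Hypothetical helper: takes the video stream from VIDEO and the audio
# stream from AUDIO and writes them together into one mp4 file.
mux_mp4() {
    local video=$1 audio=$2 out=$3
    # 0:v:0 = first video stream of input 0, 1:a:0 = first audio stream of input 1.
    # -c:v copy avoids re-encoding the video; -shortest ends at the shorter stream.
    ffmpeg -i "$video" -i "$audio" \
        -map 0:v:0 -map 1:a:0 \
        -c:v copy -c:a aac -shortest \
        "$out"
}
```

Usage: `mux_mp4 slow_no_sound.mp4 slow_but_pitched_up.wav slomo.mp4`.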

vlc has a GUI for combining sound and video from different files. For it to work, I first had to produce a slow version of the video with no sound, so rerun the command from the beginning of the guide, but make a video file with no sound:

mencoder -ovc copy -nosound -speed 0.5  normal.mp4 -o slow_no_sound.mp4

Then the GUI in vlc can combine them. Start vlc and choose "Convert/Save" from the "File" menu.

In the dialog, select your slow-motion file with no sound. Tick "Show more options" and "Play another media synchronously", click "Browse" to add extra media, and select the sound file there.

Click "Convert/Save" in that dialog to get to the next one.

Here you have to experiment a little to select a profile that uses codecs that

  • you have on your system
  • vlc realises you have on the system

Happy slow motioning!

There is this command-line option in vlc; I wonder if it could be used for something:

--audio-time-stretch, --no-audio-time-stretch
Enable time streching audio (default enabled)
This allows to play audio at lower or higher speed without affecting
the audio pitch (default enabled)

How to use the reverse-i-search in bash

A quicker alternative to hitting the up arrow to get back an old command that you typed some time ago is to hit Ctrl-r and then type a substring of the command you're looking for.

For example, if you want to type "ssh -p 1022 username@ahost.domain" and you have typed it before, you can just hit Ctrl-r followed by 1022.

If you do not get the command you are looking for, hit Ctrl-r again, and it will search further back in your command history for the next line that matches. This speeds up the whole process.
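Ctrl-r can only find commands that are still in the history. A sketch of ~/.bashrc settings that keep more of it around, assuming bash: HISTSIZE, HISTFILESIZE and the histappend option are bash's own, while the numbers are arbitrary picks of mine:

```shell
# Possible additions to ~/.bashrc (bash only).
# Number of commands kept in the in-memory history list, which Ctrl-r searches.
HISTSIZE=10000
# Number of lines kept in the history file (~/.bash_history by default).
HISTFILESIZE=20000
# Append to the history file on exit instead of overwriting it,
# so several open terminals do not clobber each other's history.
shopt -s histappend
```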


Read more: Just another Programmer: reverse i search for linux users
