Archives 2005 - 2019

A list of films I haven't seen

published Dec 07, 2012 04:29 by admin ( last modified Dec 07, 2012 04:29 )

...quite a lot of them, actually. Maybe it's time to do something about that?

As good as the original or maybe even better … the finest films taken from books

How to add filtering parameters to a Slickback request in Backbone.js

published Dec 05, 2012 01:05 by admin ( last modified Dec 10, 2012 11:48 )

Slickback has an object, Slickback.PaginatedCollection, that you can build a collection from. The collection will not only interface with SlickGrid, but will also paginate by adding parameters to the request to the back-end server. Slickback also contains a mixin object, Slickback.ScopedModelMixin, that allows you to send arbitrary parameters with the fetch request. The back end can then use these parameters to filter the result. Suppose we start with a Slickback.PaginatedCollection that looks like this:

  var employeesFactory = Slickback.PaginatedCollection.extend({
    model: employee,
    url: '/employees',
  });

Then we can add Slickback.ScopedModelMixin functionality by first mixing it in:

  var employeesFactory = Slickback.PaginatedCollection.extend({
    model: employee,
    url: '/employees',
  }, Slickback.ScopedModelMixin);

However, just mixing it in will not give us any exposed property where we can specify at run time which filter settings we want. In order to do that, an initialize method needs to be in place too:

  var employeesFactory = Slickback.PaginatedCollection.extend({
    initialize: function(){
      this.extendScope({});
    },
    model: employee,
    url: '/employees',
  }, Slickback.ScopedModelMixin);

extendScope will make an object available at

employees.defaultScope.dataOptions

Instead of {}, extendScope can also take an associative, hash-like object such as {foo:'bar'}.

But now Slickback.PaginatedCollection has stopped working, because it needs to run its own initialize too, and we just overrode that by writing our own. But help is on the way. From the Backbone.js documentation:

Brief aside on super: JavaScript does not provide a simple way to call super — the function of the same name defined higher on the prototype chain. If you override a core function like set, or save, and you want to invoke the parent object's implementation, you'll have to explicitly call it...

So our final employeesFactory will look like this:

  var employeesFactory = Slickback.PaginatedCollection.extend({
    initialize: function(){
      this.extendScope({});
      // Call the parent's initialize, passing on the models/options arguments
      Slickback.PaginatedCollection.prototype.initialize.apply(this, arguments);
    },
    model: employee,
    url: '/employees',
  }, Slickback.ScopedModelMixin);
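For comparison, Python does have the simple super call that the Backbone documentation says JavaScript lacks. The sketch below mirrors the pattern above in Python: a mixin supplies the scope, and the subclass's initializer sets up the scope and then explicitly runs the parent's initializer. ScopedMixin, PaginatedCollection, extend_scope and the page fields are hypothetical stand-ins for illustration, not Slickback's API.

```python
class ScopedMixin:
    """Hypothetical stand-in for a scoping mixin that stores filter options."""

    def extend_scope(self, options):
        # A per-instance dict that callers can change at run time
        self.default_scope = {"data_options": dict(options)}


class PaginatedCollection:
    """Hypothetical stand-in for a paginated base class with its own setup."""

    def __init__(self):
        self.page = 1
        self.per_page = 25


class EmployeesCollection(ScopedMixin, PaginatedCollection):
    def __init__(self):
        self.extend_scope({})
        # Python's equivalent of PaginatedCollection.prototype.initialize.apply(this, ...)
        super().__init__()


employees = EmployeesCollection()
employees.default_scope["data_options"]["fab"] = "flum"
print(employees.page, employees.default_scope)
```

Overriding the initializer without the super() call would leave page and per_page unset, which is the Python version of the breakage described above.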

Now, if we want to add the CGI parameter fab=flum at run time, we do it like this:

employees.defaultScope.dataOptions.fab='flum'

Slickback integrates Backbone and SlickGrid, extending Backbone collections to support pagination and filtering ("scoping"), and adapting them to serve as SlickGrid "DataView" objects

Sanitizing SQL input in python

published Dec 03, 2012 05:37 by admin ( last modified Dec 03, 2012 05:37 )

I'm toying with a fictional employees database containing 300'000 records as a back end for a course in Backbone.js. I use bottle.py to convert to and from JSON over HTTP, and I needed a way of sanitizing (untainting) what the user sends back to the database. I found this:

Note that the placeholder syntax depends on the database you are using.
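A minimal sketch of the technique, assuming Python's built-in sqlite3 module, whose placeholder is ? (MySQLdb and psycopg2 use %s instead). The values travel separately from the SQL text, so quotes in user input cannot change the query; column names, which cannot be placeholders, are checked against a whitelist. The employees table, its columns and find_employees are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: the only columns a user may filter on
ALLOWED_FILTERS = {"last_name", "dept"}


def find_employees(conn, filters):
    """Build a WHERE clause from user-supplied filters without pasting values into SQL."""
    clauses, values = [], []
    for column, value in sorted(filters.items()):
        # Column names cannot be bound as parameters, so whitelist them instead
        if column not in ALLOWED_FILTERS:
            raise ValueError("unknown filter: %s" % column)
        clauses.append("%s = ?" % column)  # sqlite3 placeholder
        values.append(value)               # bound by the driver, never interpolated
    sql = "SELECT emp_no, last_name FROM employees"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, values).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER, last_name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Smith", "d001"), (2, "O'Brien", "d002")])
print(find_employees(conn, {"last_name": "O'Brien"}))
print(find_employees(conn, {"last_name": "x' OR '1'='1"}))  # injection attempt matches nothing
```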

Pipelining in javascript

published Nov 29, 2012 04:29 by admin ( last modified Nov 29, 2012 04:29 )

By using $.flatMap() you also get error handling for free. If the request to fetch a post fails or the request to fetch the post’s author fails the promise that this version of authorForPost() returns will also fail with the appropriate failure values.
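The same idea can be sketched with Python's asyncio instead of $.flatMap(), as a swapped-in technique for illustration: two dependent asynchronous calls are chained, and a failure in either step makes the whole chain fail without any extra error-handling code. fetch_post, fetch_author and the in-memory data are hypothetical stand-ins for real HTTP requests.

```python
import asyncio

# Hypothetical data standing in for a remote service
POSTS = {1: {"title": "Hello", "author_id": 7}, 3: {"title": "Orphan", "author_id": 99}}
AUTHORS = {7: {"name": "Alice"}}


async def fetch_post(post_id):
    await asyncio.sleep(0)  # stand-in for network latency
    if post_id not in POSTS:
        raise LookupError("no post %d" % post_id)
    return POSTS[post_id]


async def fetch_author(author_id):
    await asyncio.sleep(0)
    if author_id not in AUTHORS:
        raise LookupError("no author %d" % author_id)
    return AUTHORS[author_id]


async def author_for_post(post_id):
    # The second request depends on the first; an exception from either
    # propagates out of author_for_post on its own
    post = await fetch_post(post_id)
    return await fetch_author(post["author_id"])


print(asyncio.run(author_for_post(1)))
for bad_id in (2, 3):
    try:
        asyncio.run(author_for_post(bad_id))
    except LookupError as err:
        print("failed:", err)
```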

Nassim Nicholas Taleb and vulnerability

published Nov 29, 2012 02:45 by admin ( last modified Nov 30, 2012 02:43 )

Nassim Nicholas Taleb's new book "Antifragile" is coming out these days. I have followed his website and read some of his books, and I think I have a grasp of what he considers to be the problem that needs fixing in many fields, for example finance and economics. First, some background:

The world is not physics

World War II ended with two atomic bombs. If not before, then at least from that point on it became obvious how much power physicists had. With their complex mathematical models they can destroy entire cities or, for that matter, supply entire cities with energy.

When a field is successful and gains high status, people tend to borrow its terminology to lend a sheen of success and high status to whatever they themselves are doing. Once upon a time, war was such a high-status field, and from that era we have expressions like "a person of that caliber", "protect my flank" and so on: expressions used well outside the artillery and the cavalry.

Since physicists and engineers have been so successful, economists have wanted to show off in their borrowed feathers. They have used complex mathematical models and have run these systems with the self-confidence of a process engineer.

But anyone who has worked as an engineer or in a similar role knows that most things in this world cannot be quantified and controlled. The success of physicists and engineers rests on exactly that insight, and on adapting to it. They are very careful about making claims, and the systems engineers build are tested with generous safety margins.

The problem with the financial world, and therefore our problem, is that it believes it can manage systems it does not understand.

The normal distribution

A normal distribution is a distribution around a mean. For example, if you measure the height of the women in a population, it turns out that most of them cluster around an average height, and the further you move from that average, the fewer women there are. Beyond certain values it becomes entirely obvious that there are no women at all: there are, for example, no five-meter-tall women and no twenty-centimeter-tall women either. But what happens if we don't know what it is we are measuring? Taleb's insight that all is not well rests partly on an analysis of how the financial world uses the normal distribution.

When Taleb analyzed what the financial world uses the normal distribution for, he discovered that the models did not match reality. Extreme events occurred far out on the scale, rather as if a 100-meter-tall woman suddenly materialized and wreaked havoc. The financial world dismissed this, but Taleb noted that these extreme events, when they did occur, were so large that they wiped out everything that had been built up before. As I interpret it, the financial world tries to measure and manage things it does not understand. One is naturally reminded of old folk tales in which someone awakens a monster that then destroys the world.

If you assume a normal distribution and then ignore the extreme values, that is a sure sign that you don't know what it is you are measuring and modeling. And Taleb's point is that you perhaps can't even fix the model; you should simply stop trying to manage things you don't understand.
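To put a number on "far out on the scale": under a normal distribution, the probability of an observation landing more than k standard deviations from the mean is erfc(k/√2). The small Python sketch below uses that textbook formula to show how quickly a normal model declares extreme events practically impossible, which is why such events, when they do happen, are so easy to dismiss as flukes.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z: the chance of being more than k std devs from the mean."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2, 3, 5, 10):
    print("%2d sigma: %.3g" % (k, two_sided_tail(k)))
```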

Suppose, for example, that you want to estimate the average remaining life expectancy of people in the year 2014. You might then expect some kind of normal curve that is quite tall and sharp. But, to come back to the atomic bomb, there is still a risk that all people will die in a global nuclear war in 2014, and then average life expectancy becomes 0. This is the kind of criticism Taleb directs at Steven Pinker and his book "The Better Angels of Our Nature", from which one could get the impression that we are becoming more and more peaceful as history progresses. In Taleb's eyes, we have rather shifted the risk from an everyday risk for the individual of being beaten to death, to the annihilation of all of humanity.

If we call those who think like Pinker "optimists" and those who think like Taleb "realists", we can imagine the following dialogue:

Realist: So fewer people are being killed?

Optimist: Yes, that's right.

Realist: But if I look at how many people were killed in certain places and during certain months over the last hundred years, it's a great many.

Optimist: Yes, but it happens so rarely.

Realist: But there seems to be a considerable capacity for killing.

Optimist: Yes, but we have become so gentle.

Realist: Did people have that kind of capacity to kill a few hundred years ago?

Optimist: No, why do you ask?

Realist: I rest my case.

"Not my problem, it's yours"

If an engineer were building a system and realized that extreme values could suddenly occur, say a house with a thermostat that every twenty years kills everyone inside by raising the temperature to 150 degrees, he would stop the work, test the system thoroughly and change it, or, even better, refrain from building something he does not understand.

In the financial world, and in politics too, that kind of self-criticism is not as common in every country and situation. Instead, people make sure to reap the rewards when things go well, in the financial world with utterly implausible bonuses, and then, when things go badly, they have vanished from the stage or they throw up their hands and say "Nobody could have foreseen this". Living means taking risks through what you undertake. What these people do is shift the risk from their own lives onto ours. It's a swindle.

-6 and the Stockholm stock exchange

Yesterday trading on the Stockholm stock exchange came to a halt when someone tried to trade amounts exceeding Sweden's gross national product many, many times over. The hundred-meter-tall woman appeared, one could say. Programmers around the world have discussed what happened and concluded that the number -6 was keyed into the system, which then translated it into the largest number the system can handle, minus six. Someone had connected a computer system that can produce extreme values directly to the stock exchange. The things Taleb warns about are bigger and more interconnected, with global consequences, but here in Sweden we got a small warning, now that the financial world could not even cope with a small computer it had programmed itself.

Having read this text, I think you will agree with my conclusion: The problem is not that someone forgot to range-check a value. The problem is that an automated system was built at all.

"The order was on the buy side of the order book and covered just over 4.2 billion futures contracts, at a unit price of almost 107,000 kronor. That gives a theoretical value of 459,561,500,030,000 kronor, i.e. just under 460 trillion kronor. Sweden's gross national product, by comparison, amounted to just over 3,500 billion kronor in 2011."

Read more: Monsterorder stoppade börsen | Sverige | SvD
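The figures in the quote fit the -6 explanation. Assuming (the article does not say so) that the order quantity was stored as an unsigned 32-bit integer, -6 in two's complement reads as 2^32 - 6 = 4,294,967,290, i.e. "just over 4.2 billion", and at a unit price of 107,000 kronor that gives exactly the theoretical value quoted. A minimal Python sketch of the reinterpretation:

```python
def as_unsigned(value, bits=32):
    """Reinterpret a signed integer as an unsigned integer of the given width (two's complement)."""
    return value & ((1 << bits) - 1)


quantity = as_unsigned(-6)
print(quantity)            # 4294967290, i.e. 2**32 - 6
print(quantity * 107_000)  # 459561500030000 kronor, the value in the quote
```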

How to make the wireless handle local connections and the mobile modem take care of the rest on Ubuntu Linux

published Nov 16, 2012 11:27 by admin ( last modified Nov 16, 2012 11:27 )

My Internet provider is a bit shaky today, so I switched over to a 3G modem. Unfortunately, when I switch the 3G modem on and the wireless off, I lose access to my local resources. And when I switch the wireless on, I lose access to the Internet. To let the wireless handle only local traffic, do this in "Network":

1) Select Wireless to see your wireless connections, select your wireless connection

2) Select "Options..."

3) Select the "IPv4 Settings" tab

4) Click the "Routes..." button

5) Check "Use this connection only for resources on this network" and click OK

You're done! Remember to uncheck it when you need the wireless to reach the Internet again.
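On newer NetworkManager releases than the one used here, the same checkbox corresponds, as far as I know, to the connection's ipv4.never-default property, which nmcli can set (an assumption about later versions; the change may only take effect after reconnecting). A small Python helper that builds that command; "MyHomeWifi" is a hypothetical connection name, and by default it only prints the command instead of running it.

```python
import subprocess


def never_default_command(connection_name, enabled=True):
    """Build the nmcli command that toggles 'use this connection only for resources on its network'."""
    return ["nmcli", "connection", "modify", connection_name,
            "ipv4.never-default", "yes" if enabled else "no"]


def set_never_default(connection_name, enabled=True, dry_run=True):
    cmd = never_default_command(connection_name, enabled)
    if dry_run:
        print(" ".join(cmd))             # show what would be run
    else:
        subprocess.run(cmd, check=True)  # needs nmcli and a NetworkManager-managed system
    return cmd


set_never_default("MyHomeWifi")                 # wireless for local resources only
set_never_default("MyHomeWifi", enabled=False)  # undo, when the wireless should reach the Internet again
```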

The economic crisis explained as a fairy tale

published Nov 13, 2012 12:59 by admin ( last modified Nov 13, 2012 12:59 )

Right now people are debating how to divide up who should pay for the crisis. What matters more, however, is how we get out of it.

Let us assume there is a small village with three inhabitants. One person runs the farm, one makes tools for the farming and one is a jack-of-all-trades. Next to the village lies an identical village.

In our little village everyone produces something of value. In the beginning everyone was half-starved, but then the toolmaker invented a better tool for the farmer, so now everyone is well fed. One could say that with the new tool, the economy grew.

One day a traveling salesman comes to the village. He has what he claims is a magic preparation that doubles the farm's yield at the next harvest. Everyone chooses to believe him. Then the three villagers suddenly realize something: they are well fed now, sure, but what they have always craved is licorice straps. So they go to the neighboring village (which also has three inhabitants) and say:

"We can now cover your food needs next year too. All you have to do is produce licorice straps instead, which we can then eat"

What has now happened is that the economy of the two villages is overheated. The thing is, the magic preparation doesn't work. But everyone is wildly optimistic about the future. From a psychological point of view, one could say that they are manic.

Autumn arrives, and the harvest is no larger than usual. Meanwhile, the neighboring village has produced lots of licorice straps, but nobody wants to buy them. They now have to find something else to do that quickly provides them with food.

And that was the end of the tale!

And what does this have to do with our reality?

Note that this tale and this model say nothing about bailout packages, devaluations, printing money or the other things politicians talk about today. That is because those things are not important. On the contrary, they are often harmful.

People have to stop producing things that are no longer valued as highly and start working on what is needed. That's all there is to it. Forget the rest.

The more obstacles are put in the way of people changing jobs, and the more obstacles are put in the way of organizing those jobs, the harder it gets. The countries with the most obstacles, the least creativity and the least organizational ability will fare worst, and conversely, those that have

Few obstacles
High inventiveness
High organizational ability

...will do well.

DJ with your Android phone/tablet and Mixxx on Linux

published Nov 12, 2012 01:00 by admin ( last modified Dec 25, 2012 01:08 )

Make sure the phone/tablet and the computer are on the same network
Install "DJ Control" from Google Play and start it, no configuration needed
Install http://code.google.com/p/dsmi/downloads/detail?name=dsmidiwifi-v1.01.tgz&can=2&q= on your Linux computer and start the executable in the unpacked archive
Install the DJ application Mixxx on the computer (from e.g. the Ubuntu repositories)
Start Mixxx, go to Options->Preferences->MIDI Controllers
- "DSMIDIWiFi" should be visible there; select it, and under "Load preset" in the upper right part of the window, select "Hercules Dj Console RMX"
- Now you should be able to control Mixxx from Android. DJ away!

Wireless DJ MIDI controller for your favorite computer DJing application! Exact emulation of Hercules's "DJ Console RMX" DJ MIDI controller functions by using MIDI over WiFi!

Jade - indentation based template language

published Nov 11, 2012 11:55 by admin ( last modified Nov 11, 2012 11:55 )

There is a whole group of these template languages

Bring daylight to the subway!

published Nov 10, 2012 10:15 by admin ( last modified Nov 11, 2012 12:06 )

During the darkest months of winter, people in Stockholm get far too little daylight. This contributes to depression and other ailments. By installing proper daylight-level lighting in the subway stations, this problem could be alleviated in winter. If you really want to go all out, you could also add narrow-band lighting that helps with vitamin D.

Humans never seem to have fully adapted to life in the far north: the lack of light during the darkest period affects us negatively in several ways, our circadian rhythm can be disrupted, and we know that a disrupted circadian rhythm is harmful. In Norway it has also been noted that the northern part of the country has more winter depression than the southern part.

In Stockholm, however, we have an opportunity to do something about all this that is unique in Sweden. We have a number of large spaces, the subway stations, where hundreds of thousands of people spend time every day. Not only that: these stations are well supplied with electricity that could easily handle more lighting.

First and foremost, one should make sure there is proper daylight-level light in the middle of the day during winter. Secondly, if one wishes, one can install lighting that emits UV light, but since UV light has harmful effects, it should in that case be limited to the 295 - 300 nm range, where the body benefits from it for producing vitamin D.

So how many lamps are needed?

A lot of lamps, it turns out. One is tempted to say "Hadn't thought of that". But it can probably work anyway; see "Adjustments" below. Here is a rough estimate, accurate to within an order of magnitude:

According to Wikipedia, an overcast day gives 10,000 lux. Websites about seasonal affective disorder also mention 10,000 lux as a therapeutic dose, so let us take that as the target. We further assume that all light is directed downward, either directly or via 100% efficient reflectors. A subway platform is roughly 100 m long and, say, 10 m wide. That gives 1,000 square meters.

A good fluorescent tube has an efficacy of 100 lumens per watt. LEDs on the market will get there, and they need no reflectors. One lux is simply 1 lumen per square meter, so we need 100 watts of fluorescent tube power per square meter, which gives 100,000 watts per subway station. Fluorescent tubes of 100 W are available in stores, so we need 1,000 tubes per station. If we group them in fixtures of 8 each, say 10 for simplicity, we get 100 fixtures per station. We could then have three rows of 30 fixtures, i.e. a fixture roughly every three meters. So at least there is physically room to install it.
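The estimate above can be restated as a small Python calculation with the same assumptions (10,000 lux target, a 100 m × 10 m platform, 100 lm/W, 100 W tubes, 10 tubes per fixture). It only reproduces the arithmetic in the text; it is not a lighting design.

```python
def estimate(target_lux=10_000, area_m2=100 * 10, lm_per_w=100, tube_w=100, tubes_per_fixture=10):
    """Back-of-the-envelope lighting estimate for one subway platform."""
    lumens = target_lux * area_m2  # 1 lux = 1 lumen per square meter
    watts = lumens / lm_per_w      # electrical power needed at the given efficacy
    tubes = watts / tube_w
    fixtures = tubes / tubes_per_fixture
    return {"watts": watts, "tubes": tubes, "fixtures": fixtures}


print(estimate())                  # the 10,000 lux case
print(estimate(target_lux=1_000))  # the "factor of 10 less" case from Adjustments
```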

Heat generation of 100 kW becomes a problem. With LEDs, one could perhaps use fiber optics combined with water cooling from the district heating network.

Adjustments

But fluorescent tubes in the ceiling are not the same thing as an open sky that reaches down to the horizon, and we probably cannot count on people walking around looking at the ceiling. So the lamps should be installed along an artificial horizon, i.e. along the walls, or the stations could be decorated with very light colors on the walls and the light directed there.

Then the question is this: you don't stare at the sky on an overcast summer day, and you don't spend whole days walking across open fields or sailing on the open sea either; you are often in an urban environment or in wooded terrain. So perhaps you don't need 10,000 lux along the walls to get a summer-like light environment? If we can go down by a factor of 10, the project becomes not only feasible but, it seems to me, rather simple. 10 fixtures for an entire subway station is nothing, and must be roughly in line with what is already there, which, admittedly, suggests that it is not enough.

According to Wikipedia, by the way, Icelanders have adapted to scarce sunlight in winter and do not show the same correlation between daylight, latitude and depression as the rest of the Nordic countries. One hypothesis is that they have had limited gene flow from the south.

apologies-to-the-readers-of-planet-plone

published Nov 02, 2012 11:17 by admin ( last modified Nov 02, 2012 11:17 )

I upgraded from Plone 2.5 to Plone 4 with quintagroup.transmogrifier and it all went very well, but the import script set the modification date of every document to the import date. Basesyndication then used that date for the Atom feeds instead of the effective date, so all my old Plone posts clogged up the planet. Additionally, it was impossible to edit any document on the site, probably because the import script used attribute storage that sent things into infinite recursion. So I had to rebuild the document tree, and I lost this post, which is now back. Sorry!

How to create a sequential workflow with error handling in Celery

published Nov 02, 2012 06:25 by admin ( last modified Dec 10, 2012 11:50 )

This is currently a proof of concept and not in production code.

Scenario

The scenario is this: A user wants a time-consuming undertaking performed. He enters the start parameters (e.g. the URL of a web site to be processed and an e-mail address to send the results to) into a form on a web page and clicks "submit". The web page immediately returns, informing the user that the undertaking has been accepted and that he will get an e-mail when it is completed.

Architecture

The web server that received the undertaking runs on a relatively expensive machine that we pay to host in a good datacenter with great uptime. We will call this machine the control machine. We do not want it to churn through any tasks, since its precious computing resources are needed for front-end work. Instead, the undertaking should be carried out on inexpensive back-end servers.

The back-end servers start churning, with the undertaking divided into two tasks. If all goes well, a report is sent to the user. If something goes wrong, the undertaking is set aside and staff are notified that something is wrong either in the code or in the data.

Implementation

Step 1 - Make a worker

For this example we will use two tasks that do the actual work. For simplicity, they just add and multiply numbers. We will also define an error handling task that handles errors from the buggy add task. This module, called "test1.py", should be available both on the worker machine and on the control machine. It won't actually run on the control machine, though; it just needs to be importable by other scripts there.

For simplicity, the error handling will just be a print statement with an apology, although it is unlikely that the user is looking at the worker machine's terminal output.

from celery import Celery

# Change these two to your back end. Here it is set to use a Redis server
# at 192.168.1.21, running on the Redis standard port, using database number 1
BROKER_URL = 'redis://192.168.1.21:6379/1'
CELERY_RESULT_BACKEND = 'redis://192.168.1.21:6379/1'

celery = Celery('test1', backend=CELERY_RESULT_BACKEND, broker=BROKER_URL)

@celery.task
def add(x, y):
    why = x / 0  # A deliberate error in the code, to trigger link_error!
    return x + y

@celery.task
def mul(x, y):
    return x * y

@celery.task
def onerror(uuid):
    # Errback: called with the id of the task that failed
    print("We apologize and sweat profusely over the substandard processing of job %s" % uuid)

Start the worker with:

celery -A test1 worker

Step 2 - Make the control script

This only needs to be installed on the control machine.

The control script has a client that puts stuff into the system. It also specifies what should happen if a worker throws an exception.

# Change these two to your back end. Here it is set to use a Redis server
# at 192.168.1.21, running on the Redis standard port, using database number 1
BROKER_URL = 'redis://192.168.1.21:6379/1'
CELERY_RESULT_BACKEND = 'redis://192.168.1.21:6379/1'

from test1 import add, mul, onerror

# Run add(2, 2); on success pass its result on to mul(result, 16),
# on failure call onerror with the id of the failed task
res = add.apply_async((2, 2), link=mul.s(16), link_error=onerror.s())
print(res.get(propagate=False))

So, that's it. The control script calls add first and links it to mul. This means that Celery will execute add first and, once that is done, execute mul with add's return value as its first argument (so mul(4, 16) if add had succeeded). However, if add throws an error, the task specified by link_error is executed instead, and you should see the apology printed in the terminal window of the worker. Normally you wouldn't wait for the result with get, since that is a blocking operation. Here get is called with propagate=False, which means it will not re-raise the worker's exception.
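To make the semantics of link and link_error concrete without a broker, here is a plain-Python imitation of the control flow described above. It is a simulation for illustration only: everything runs synchronously in one process, it is not how Celery implements linking, and run_with_links and the job id are hypothetical. The lambda plays the role of mul.s(16): the parent's result becomes mul's first argument.

```python
def add(x, y):
    return x + y


def buggy_add(x, y):
    return (x / 0) + y  # deliberate error, like add in test1.py


def mul(x, y):
    return x * y


apologies = []


def onerror(job_id):
    apologies.append("We apologize for job %s" % job_id)


def run_with_links(task, args, link=None, link_error=None, job_id="job-1"):
    """Run task(*args); on success pass the result to link, on failure call link_error(job_id)."""
    try:
        result = task(*args)
    except Exception:
        if link_error is not None:
            link_error(job_id)  # the errback only gets the id of the failed job
        return None
    return link(result) if link is not None else result


print(run_with_links(add, (2, 2), link=lambda r: mul(r, 16), link_error=onerror))
print(run_with_links(buggy_add, (2, 2), link=lambda r: mul(r, 16), link_error=onerror))
print(apologies)
```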

Womack: Push real-time events to the browser with python

published Nov 01, 2012 07:22 by admin ( last modified Nov 01, 2012 07:22 )

Untested by me

Womack is a service that you can use to push realtime events between your regular, plain-old, non-websockety web application and clients. It is built on top of gevent-socketio and redis.

The Celery work queue web monitor flower has a --broker option

published Oct 29, 2012 02:24 by admin ( last modified Oct 29, 2012 02:24 )

If you start flower - a web interface that lets you monitor, inspect and edit things in the Celery work queue system - it will assume you are running RabbitMQ or similar on localhost. Currently, running

celery flower --help

...will not reveal that there is a --broker option, which lets you point flower at another back end (such as Redis on another server). Use a Celery-style URL for that, e.g.:

celery flower --broker=redis://192.168.1.14:6379/1

This would use database 1 on a Redis server running on its standard port, 6379, on the machine at IP address 192.168.1.14.
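The parts of that broker URL can be pulled apart with Python's standard urllib.parse, which shows how scheme, host, port and database number map onto the URL. A small sketch; parse_broker_url is a hypothetical helper, and the fallbacks for a missing port or database number are assumptions matching Redis' own defaults (port 6379, database 0).

```python
from urllib.parse import urlparse


def parse_broker_url(url):
    """Split a redis://host:port/db URL into its components."""
    parts = urlparse(url)
    port = parts.port or 6379                # Redis' standard port if none is given
    db = int(parts.path.lstrip("/") or 0)    # Redis uses database 0 by default
    return parts.scheme, parts.hostname, port, db


print(parse_broker_url("redis://192.168.1.14:6379/1"))
```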

celery flower --broker option does not work

Make Firefox copy the page url formatted with link as Chrome does

published Oct 24, 2012 01:01 by admin ( last modified Oct 24, 2012 01:01 )

In Google Chrome, when you copy an address out of the URL field, you get it as rich text, which makes it a cinch to paste the URL into, for example, a blog like the one you are reading right now. In Firefox this is not possible. There are a couple of Firefox add-ons that manipulate copied text, but surprisingly none of them do what Chrome does. However, a gentleman posting on Stack Overflow has taken the time to extract the JavaScript from one of the add-ons and combine it with an add-on that lets you define keyboard shortcuts. I followed the instructions and it works: with a key combination I can now get the title of the current page inside an A element pointing to the page, all wrapped up as rich text so that it can be pasted into WYSIWYG editors.

I need to automate the copying of a HTML link to the current page that is viewed in the current Firefox Tab into other WYSIWYG editors.

Thoughts on message queue and work queue systems - overview & what's useful

published Oct 23, 2012 11:55 by admin ( last modified Jan 31, 2015 01:55 )

Jörgen Modin is an IT consultant and trainer who works with web based and mobile based systems. Jörgen can be reached at jorgen@webworks.se.

I have just finished writing a message-queue-based system, using Redis as the message queue. I have only run the system in production for a couple of days, but I already have some thoughts on how I would like a message queue system to work, and on what seems to be available. There seem to be two names floating around for these kinds of queues: message queue as the more general term and work queue as the more specific one.

A work queue system helps with distributing and deferring work. For example, a user of a web site can ask for some processing to be done without having to wait while the web page slowly loads the result. Instead, a quick confirmation is given, and later a message, usually an e-mail, is sent to the user. A message queue system can also help with scalability and reliability; see further down in the text for how.

Here is a list of some interesting systems I have found that are freely available. I will divide them into three categories: messaging toolkits, message queue systems and work queue systems. This is just a rough categorization. With messaging toolkits you have complete freedom in how you want your system to work and behave, message queue systems have the fundamentals for building work queue systems (and often form their core), and finally, work queue systems contain a lot of the functionality out of the box.

Messaging toolkits

zeromq - Toolkit targeted towards those who want to build their own high-speed systems.

Message queue systems

Lend themselves as a starting point for building work queue systems

Redis - Well documented and easy to get started with. Does not dabble in higher-level constructs, so you may need to write your own code at that level, or use one of the frameworks that build on Redis. Redis seems to have started as an improvement on memcached, but has since taken on more functions that also suit messaging systems. Much of the Redis use on the web, though, is as a cache and session store. The building blocks of Redis - keys, lists, sets, sorted sets, transactions and timeouts - seem well thought out. Redis is what I currently use. Huge user base: a Google search for pages mentioning "Redis" yields 7.5 million results. However, a large part of that base is probably using the caching features more than the queueing.

RabbitMQ - Advanced system that follows the AMQP standard (which exists in different revisions; RabbitMQ implements 0-9-1, but there is also a 1.0 version of the standard). Used by many large operations. Not a caching system. Has an acknowledgement function, which means that a job can be requeued if the worker that took it never acknowledges that it completed it. Also has the concept of exchanges, which route messages to queues according to a few predefined routing types. There are other AMQP systems available, such as ActiveMQ and HornetQ, and also commercial message queue systems that use the AMQP protocol. AMQP originated in the world of banking.

Work queue systems

Have features built in for managing jobs

Resque - GitHub's framework running on top of Redis, written in Ruby. It is used heavily on github.com, and GitHub is awesome. There is a Python clone on the net somewhere too, called pyres. (Resque does not seem to have a logo, so I just snatched an octocat from GitHub's front page.) There is a Java version called jesque, and a JavaScript/CoffeeScript version called coffee-resque. The latter seems to run server-side on node.js.

sidekiq is a work queue system written in Ruby, similar to Resque and in many ways API-compatible with it. Uses threads. Can run on the JVM with JRuby; that might even be the recommended way of running it.

beanstalkd - Seems fairly advanced on the work queue side. Does not have very extensive documentation, but there is a wiki.

Celery - Ambitious high-level messaging system in Python that can run on top of RabbitMQ, Redis, beanstalkd or pretty much anything. Celery is extensively documented, but it can still be a bit hard to find one's way through the documentation to what it really is and how it works. It's here: User Guide - Celery 3.0.11 documentation. This slideshow, Advanced task management with Celery, helps a lot after having read through the docs. Here's a little thing I've written after initial tests: How to create a sequential workflow with error handling in Celery.

Besides the standard CPython (both 2 and 3), Celery can also run on the JVM with Jython, and on PyPy.

kombu - factored out of Celery - has a very nice API. Runs on top of RabbitMQ, Redis, beanstalkd or pretty much anything. Kombu is used in OpenStack, an infrastructure-as-a-service project (basically automatic deployment of virtual servers).

Gearman has been around for a while. It was originally written in Perl, but there is now also a C version. It was mentioned in a Reddit discussion of this post. The documentation is a bit sketchy and I cannot say much about it, but it seems to have a user community and many client implementations. The main documentation does not give a good overview of Gearman as far as I can see, but these do:

Popularity of the different systems

I did an unscientific ranking by checking how many days back it took to get the 50 latest questions about each system on stackoverflow.com, and then calculating questions per week for each system. The results as of 2012-10-28, most popular first (questions per week), were:

Redis 35
RabbitMQ 13
Celery 9.5
Resque 5.8
ZeroMQ 5.4
Gearman 2.9
Kombu 1.6
sidekiq 1.3
Beanstalkd 1.1
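The rate calculation behind the list, as a small Python sketch: questions per week is the number of questions times 7 divided by the number of days it took to collect them. Working backwards, the Redis figure of 35 corresponds to 50 questions in 10 days; the day counts for the other systems are not given in the post, so no other inputs are shown.

```python
def questions_per_week(questions, days):
    """Stack Overflow questions per week, given how many days back it took to collect them."""
    return questions * 7 / days


def days_to_collect(questions, per_week):
    """Inverse: how many days back a given weekly rate reaches the given number of questions."""
    return questions * 7 / per_week


print(questions_per_week(50, 10))  # the Redis rate
print(days_to_collect(50, 35))     # days back for the Redis figure
```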

I would say though that at least half of the Redis questions on stackoverflow are fielded by people who are using Redis as a cache or session store rather than as a basis for a message queue or work queue.

I am surprised that beanstalkd ranked so low and Celery so high. It may well be that some of these systems have their communities do Q/A somewhere else than at stackoverflow or that some just do not generate that many questions, but still it is a rough estimate.

I checked the tag "beanstalk" too for beanstalkd, but that one was 98% about Amazon beanstalk, which is something else.

As for AMQP systems other than RabbitMQ: ActiveMQ would have ranked similarly to RabbitMQ, and HornetQ somewhere near Kombu.

Features and concepts

In this text:

task means the overarching thing you are trying to achieve with your application, e.g. do some searches, analyze the results and then email a report to the user
subtask is a part of the task, e.g. e-mailing out a report
worker is a program that does the subtask, e.g. a worker that is an emailer
job is an instance of a subtask, e.g. an emailer worker emailing out the report to a specific user
Control is some kind of central control function, e.g. a supervisor that checks for bad jobs

Scalability and reliability

Distributing and deferring work can help with scalability and reliability. With a work queue, several worker processes can feed from the same queue, and hence you get an automatically scalable system where each worker just needs to connect, snatch something from the queue and go to work. Reliability can be improved by revoking jobs from malfunctioning workers and rescheduling them to other workers. Just the fact that you have queues means that it is not a big deal if most of your complex web-based system temporarily goes down; as long as the web interface and the process that puts things into the queues are up and running, the rest of the system can display a bit of volatility without jeopardizing the entire application.
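A minimal sketch of that "several workers feeding from the same queue" idea, with Python threads standing in for worker processes and the standard library's queue.Queue standing in for Redis or RabbitMQ. The worker and run names are my own, just for illustration:

```python
import queue
import threading


def worker(name, jobs, results):
    """Connect to the shared queue, snatch a job, do it, repeat."""
    while True:
        job = jobs.get()
        if job is None:                  # sentinel: time to shut down
            break
        results.put((name, job * job))   # stand-in for the real work


def run(num_workers, items):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(f"worker-{i}", jobs, results))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    for _ in threads:                    # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]


print(sorted(r for _, r in run(3, range(10))))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Scaling out is then just a matter of starting more workers; none of them needs to know about the others.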

Caching

Message queue systems also seem to be used as a cache, similar to memcached. This is the use I am least interested in currently. For caching, the message queue system supplies a number of constructs, such as queues obviously, that can help serve fast-changing information faster than would be possible by handling it with slower back end services. It is basically a cache with a bit of intelligence that can sort, slice and dice. For this application speed is of the essence, with sub-millisecond replies being the order of the day.

Modularity

If you divide your application into separate workers, they need a way to communicate. Often JSON is used for this in message queues, and since there are JSON parsers for all major languages you can mix and match workers written in completely different languages.
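To make that concrete, here's a round trip of a hypothetical job message through JSON in Python. The field names are invented; the point is that a worker in Ruby, JavaScript or anything else would decode the same text into the same structure:

```python
import json

# A job message as one worker might put it on a queue (field names made up).
job = {"task_id": "a1b2", "subtask": "email_report",
       "args": {"user": "alice@example.com", "attempt": 1}}

wire = json.dumps(job)         # the text that goes onto the queue
received = json.loads(wire)    # what a worker in any language reconstructs
print(received == job)         # True

# JSON only knows objects, arrays, strings, numbers, booleans and null,
# so Python-specific types do not survive unchanged: a tuple comes back as a list.
roundtrip = json.loads(json.dumps({"pair": (1, 2)}))
print(roundtrip)               # {'pair': [1, 2]}
```

The second round trip shows the catch: keep messages to JSON's own types and every worker sees the same thing.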

Resilience and inspectability

Having the process divided into several steps becomes a bit like having breakpoints in your code. You can check the state of the data at the end and beginning of each step; the steps provide a snapshot of the state of your application at that point. You can correct data or code to get jobs unstuck in the processing chain, which gives the system a higher service level and helps when the requirement is that every task must get through.

Revoking and rescheduling

Sometimes a job does not finish, or it does not finish the way you like. It would be good if the message queue system could help in handling this. In Redis I simply rescheduled the job by putting it back last in the processing queue. However the processing in my system is fairly deterministic and the job is most likely not going to fare better on a second run. Hence jobs that fail are now taken out of the queue and go to human inspection.

I tried to work out the different reasons a job might not finish, and how each should be handled. It is assumed that a worker can at least catch its own exceptions. Here is what I came up with:

Bad job: Data makes worker throw an exception, detectable from worker, can reach control
Bad storage: Server makes worker throw an exception, detectable from worker, can reach control
Bad network: Server makes worker throw an exception, detectable from worker, cannot reach control
Power failure or other catastrophic failure: Server makes worker lock or crash, undetectable from worker

zeromq has some reasoning along the same lines. Since they have more experience than me I'll quote their take on it:

"So let's look at the possible causes of failure in a distributed ØMQ application, in roughly descending order of probability:

Application code is the worst offender. It can crash and exit, freeze and stop responding to input, run too slowly for its input, exhaust all memory, etc.

System code - like brokers we write using ØMQ - can die for the same reasons as application code. System code should be more reliable than application code but it can still crash and burn, and especially run out of memory if it tries to queue messages for slow clients.

Message queues can overflow, typically in system code that has learned to deal brutally with slow clients. When a queue overflows, it starts to discard messages. So we get "lost" messages.

Networks can fail (e.g. wifi gets switched off or goes out of range). ØMQ will automatically reconnect in such cases but in the meantime, messages may get lost.

Hardware can fail and take with it all the processes running on that box.

Networks can fail in exotic ways, e.g. some ports on a switch may die and those parts of the network become inaccessible.

Entire data centers can be struck by lightning, earthquakes, fire, or more mundane power or cooling failures."

Some ideas of my own on remedies for the failures on my list above:

Bad job: The job should be taken out of the job queue and a bad job mail be sent to a human

Bad storage: The job should be resubmitted to another worker and the malfunctioning worker should be taken out of commission, i.e. terminate itself.

Bad network: The job should time out and be resubmitted, and the worker should be taken out of commission, i.e. terminate itself.

Power failure: The job should time out and be resubmitted, and the worker should be taken out of commission by control, since the malfunctioning worker can't do it itself. If the worker comes back to life and sends in a job that has already been processed by another worker, that job is ignored.

The control server must make reasonable assumptions about the maximum time a job can take.
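Here's a minimal in-memory sketch of a control function applying those remedies. Everything in it (the Control class, the failure kind strings, the method names) is hypothetical, and deciding that a job has timed out is left to the caller, which is where the assumption about maximum job time comes in:

```python
from collections import deque


class Control:
    """Hypothetical central control applying the remedies above (in memory)."""

    def __init__(self):
        self.queue = deque()   # jobs waiting for a worker
        self.buried = []       # jobs parked for human inspection
        self.retired = set()   # workers taken out of commission
        self.results = {}      # job_id -> first accepted result

    def report_failure(self, job_id, worker, kind):
        """Called by a worker that caught its own exception and can reach control."""
        if kind == "bad_job":
            self.buried.append(job_id)   # and mail a human about it
        elif kind == "bad_storage":
            self.queue.append(job_id)    # resubmit to another worker
            self.retired.add(worker)     # the worker terminates itself
        else:
            raise ValueError(f"unknown failure kind: {kind}")

    def on_timeout(self, job_id, worker):
        """Bad network or power failure: control only notices that the job
        has exceeded its assumed maximum run time."""
        self.queue.append(job_id)        # resubmit
        self.retired.add(worker)         # decommission from the point of control

    def submit_result(self, job_id, worker, result):
        """Accept a result unless the job is already done or the worker was retired."""
        if worker in self.retired or job_id in self.results:
            return False                 # e.g. a worker back from the dead
        self.results[job_id] = result
        return True


control = Control()
control.report_failure("job-1", "worker-a", "bad_job")
control.on_timeout("job-2", "worker-b")
control.queue.popleft()                              # worker-c picks up job-2
control.submit_result("job-2", "worker-c", "done")   # True: accepted
control.submit_result("job-2", "worker-b", "late")   # False: ignored
```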

Work queue strategies

The three systems I have found that seem to be able to help out of the box with these kinds of things are Resque from GitHub, beanstalkd and Celery. After reading through the AMQP 0-9-1 standard as explained on the RabbitMQ site, it seems RabbitMQ should also be able to contribute out of the box on this level.

zeromq and Redis on the other hand are more like toolkits, especially zeromq.

Resque adds an abstraction layer with a parent worker process and a child worker process: the parent starts the child process for the actual work, watches it, and changes its own state depending on whether the child process finishes or something else happens. Some quotes from their pages:

"Resque assumes your background workers will lock up, run too long, or have unwanted memory growth."

"If you want to kill a stale or stuck child, use USR1. Processing will continue as normal unless the child was not found. In that case Resque assumes the parent process is in a bad state and shuts down."

And beanstalkd from its FAQ, on its buried state:

"For example, this [buried state] is useful in preventing the server from re-entering a timed-out task into the queue when large and unpredicatble run-times are involved. If a client reserves a job, then first buries it, then does its business logic work, then gets stuck in deadlock, the job will remain buried indefinitely, available for inspection by a human — it will not get rerun by another worker."

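To see why burying first helps, here's a toy in-memory model of beanstalkd's ready, reserved and buried states. The method names echo beanstalkd commands (put, reserve, bury, delete), but it is not a beanstalkd client and simplifies the real semantics; time is passed in explicitly so the example is deterministic:

```python
from collections import deque


class ToyBeanstalk:
    """In-memory imitation of beanstalkd job states (not the real protocol)."""

    def __init__(self):
        self.ready = deque()
        self.ttr = {}          # job -> time-to-run in seconds, set at put time
        self.reserved = {}     # job -> deadline after which it is released
        self.buried = []

    def put(self, job, ttr):
        self.ttr[job] = ttr
        self.ready.append(job)

    def reserve(self, now):
        job = self.ready.popleft()
        self.reserved[job] = now + self.ttr[job]
        return job

    def bury(self, job):
        """Park a reserved job: it no longer times out or gets handed out."""
        del self.reserved[job]
        self.buried.append(job)

    def delete(self, job):
        """Job finished: remove it from the reserved or buried state."""
        self.reserved.pop(job, None)
        if job in self.buried:
            self.buried.remove(job)
        self.ttr.pop(job, None)

    def tick(self, now):
        """Reserved jobs whose time-to-run has run out go back to ready."""
        for job, deadline in list(self.reserved.items()):
            if now >= deadline:
                del self.reserved[job]
                self.ready.append(job)


q = ToyBeanstalk()
q.put("bury-first job", ttr=60)
q.put("plain job", ttr=60)
q.bury(q.reserve(now=0))   # reserve, bury, then start the risky work...
q.reserve(now=0)           # ...while this one is only reserved
q.tick(now=3600)           # both workers deadlock for an hour
print(q.buried)            # ['bury-first job']: waits for a human
print(list(q.ready))       # ['plain job']: would be handed to another worker
```

A worker that does finish deletes its buried job, so only the stuck ones are left for inspection.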
sidekiq has an interesting take on the same theme. Instead of burying a job for human inspection, it retries it after delays that get longer and longer, so that you have time to fix the problem:

"Sidekiq will retry processing failures with an exponential backoff using the formula retry_count**4 + 15 (i.e. 15, 16, 31, 96, 271, ... seconds). It will perform 25 retries over approximately 20 days. Assuming you deploy a bug fix within that time, the message will get retried and successfully processed."

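The quoted schedule is easy to check. A Python sketch of just the formula as quoted; Sidekiq's actual code may add some randomness on top, which is left out here:

```python
def retry_delays(max_retries=25):
    """Seconds to wait before each retry: retry_count**4 + 15."""
    return [count ** 4 + 15 for count in range(max_retries)]


delays = retry_delays()
print(delays[:5])                      # [15, 16, 31, 96, 271]
print(sum(delays))                     # 1763395 seconds in total
print(round(sum(delays) / 86400, 1))   # 20.4 days
```

So the 25 retries do add up to roughly 20 days, with most of that time spent in the last few retries.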
beanstalkd also sports configurable time-outs that can signal to control that a job is probably hung or unreachable. RabbitMQ uses acknowledgements (acks) to track finished jobs. Redis has time-outs on its keys, but no built-in detection or event handler for time-outs, so you would have to construct something like that on a higher level than Redis itself. A pop/push time-out in Redis would have been helpful, methinks.
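Here's roughly what such a higher-level construct could look like: the common Redis pattern of atomically moving a job from the work queue to a processing list with (B)RPOPLPUSH, recording when it started, removing it with LREM when done, and having a reaper push back jobs that have been processing too long. To keep it self-contained, this Python sketch imitates the Redis lists with deques; the comments name the Redis command each line stands in for, and the class and method names are hypothetical. In real Redis the pop and the timestamp write are two separate commands, so a real reaper should also cope with a job that has no start time; the sketch ignores that race:

```python
from collections import deque


class ReliableQueue:
    """In-memory stand-in for a Redis work queue with a processing list."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.queue = deque()        # the "queue" list
        self.processing = deque()   # the "processing" list
        self.started = {}           # job -> start time (a hash in Redis)

    def push(self, job):
        self.queue.appendleft(job)                 # LPUSH queue job

    def take(self, now):
        job = self.queue.pop()                     # RPOPLPUSH queue processing
        self.processing.appendleft(job)
        self.started[job] = now                    # HSET started job now
        return job

    def ack(self, job):
        self.processing.remove(job)                # LREM processing 1 job
        del self.started[job]                      # HDEL started job

    def reap(self, now):
        """Push jobs that have been processing too long to the back of the queue."""
        expired = [j for j in self.processing if now - self.started[j] > self.timeout]
        for job in expired:
            self.ack(job)
            self.push(job)
        return expired


rq = ReliableQueue(timeout=30)
rq.push("a")
rq.push("b")
first = rq.take(now=0)     # "a": first in, first out
second = rq.take(now=0)    # "b"
rq.ack(second)             # "b" finished fine
print(rq.reap(now=10))     # []: "a" is still within its time limit
print(rq.reap(now=31))     # ['a']: requeued for another worker
```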

Celery has advanced concepts in this department. Some concepts from Celery that seem to be of interest: linked workers, revoke, inspect, chains, groups, chords, scheduling. Chains, groups and chords are ways of stringing subtasks together into units that can fan out into parallel processing and, for example, do map-reduce.
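For intuition, here's what chains, groups and chords amount to, written as plain Python functions. This is deliberately not Celery's API (in Celery you build chain, group and chord signatures and let the workers execute them); the thread pool here just stands in for parallel workers:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chain(*steps):
    """Run steps one after another, feeding each result into the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def group(func, items):
    """Fan out: apply func to every item in parallel, keeping input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, items))


def chord(func, items, callback):
    """A group whose collected results go to one callback: map-reduce."""
    return callback(group(func, items))


# Map: count the words in each document; reduce: sum the counts.
docs = ["a b c", "d e", "f"]
print(chord(lambda d: len(d.split()), docs, sum))   # 6
print(chain(str.strip, str.upper)("  celery "))     # CELERY
```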

Some concepts from kombu that seem to be of interest: message priority, retry_policy, recover, heartbeat_check.

There seem to be some systems that are pull-based when it comes to queues, such as Redis, while RabbitMQ seems to support both pull- and push-based interactions. With pull-based systems you need less intelligence centrally, but the downside is of course that you then have less intelligence centrally. I guess you will need a bit of both: workers know better how to distribute the load between them, but they cannot manage states where they no longer function. Then control needs to do that and be able to bury jobs.

In my current application, there is a different job queue for each kind of subtask, and it is each worker's responsibility to move the results of a job on to the next queue so that the next subtask can be executed.
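A stripped-down sketch of that design, with one in-memory queue per kind of subtask and each worker responsible for handing its result on to the next queue. The stage names and the work function are made up for illustration:

```python
from collections import deque

STAGES = ["search", "analyze", "email"]          # hypothetical subtask kinds
queues = {name: deque() for name in STAGES}      # one job queue per subtask


def work(stage, payload):
    """Stand-in for the real subtask: record that this stage ran."""
    return payload + [stage]


def run_worker_once(stage):
    """Take one job from this stage's queue, do it, and move the result on."""
    payload = queues[stage].popleft()
    result = work(stage, payload)
    position = STAGES.index(stage)
    if position + 1 < len(STAGES):
        queues[STAGES[position + 1]].append(result)   # the worker's own job
    return result


queues["search"].append([])
for stage in STAGES:
    last = run_worker_once(stage)
print(last)   # ['search', 'analyze', 'email']
```

Because every hand-off goes through a queue, each queue doubles as a snapshot of where jobs are in the chain, which is the inspectability point from earlier.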

I discovered that in Celery you can specify a mapping between task signatures and queue names, see: Routing Tasks. Still, it seems as if the responsibility for moving stuff onwards does not fall on the worker, although I will have to look into that.

Update: I have looked into how Celery can be used for handling work queues in a way that you can move a task from worker to worker and branch out on error conditions, see my blog post: How to create a sequential workflow with error handling in Celery.

Parts and names

From Zeromq again with my boldface:

"sinks (process the messages without any response), proxies (send the messages on to other nodes), or services (send back replies)"

Side effects

Furthermore, you want to avoid having the workers produce side effects that cannot be revoked, or at least gather such side effects in one place. I believe it is a good idea, if you are going to send out an email, to make that worker as simple and reliable as possible, and not to trigger it unless everything else has lined up right. And do make sure it does not get stuck in a loop. In fact it should probably have a memory of what it has sent out before and refuse to send again if it detects the same content and recipient. Unless you are running a huge system or a spam operation or something else nefarious, you don't need many e-mailing workers and can get by with just a singleton, which means you do not need to worry about spawning e-mail clients all over the place (and other places).

If a subtask doesn't have any side effects, it can be run several times without any ill consequences. It is then said to be idempotent, a word that pops up here and there in the documentation for the different systems. Idempotency does allow side effects too, as long as they only happen on the first run. So a singleton e-mailer that refuses to send an e-mail it has already sent would also be called idempotent.
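A minimal sketch of such an idempotent singleton e-mailer. The class name and the injected send function are hypothetical: the actual sending (SMTP or whatever) is passed in, and the memory of sent mails is an in-memory set here, whereas a real one would have to survive restarts:

```python
import hashlib


class SingletonMailer:
    """Refuses to send the same content to the same recipient twice."""

    def __init__(self, send):
        self.send = send      # the real side effect, injected
        self.sent = set()     # fingerprints of mails already sent

    def mail(self, recipient, body):
        key = hashlib.sha256(f"{recipient}\n{body}".encode("utf-8")).hexdigest()
        if key in self.sent:
            return False      # already sent: running the job again is harmless
        self.send(recipient, body)
        self.sent.add(key)    # recorded only after a successful send
        return True


outbox = []
mailer = SingletonMailer(lambda recipient, body: outbox.append((recipient, body)))
print(mailer.mail("alice@example.com", "Your report"))  # True
print(mailer.mail("alice@example.com", "Your report"))  # False
print(mailer.mail("bob@example.com", "Your report"))    # True
```

Note the order: the fingerprint is recorded after sending, so a crash in between can still cause one duplicate. Recording it first would instead risk a mail that is never sent; which failure is worse depends on the mail.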

Fallbacks & graceful degradation

One thing I think would be useful in a work queue system is fallbacks. If the biggest source of interruption of a job is a conflict between code and data (a nice way of saying that the code is buggy and/or bad data has been allowed in), then re-running the job will give the same result. Another way of handling that is to re-run the job with different code that may not give as good a result but is more robust.

I am right now in the process of replacing one bit of worker code in CheckMyCSS.com written in python, with code written in javascript including a headless browser called phantomjs running webkit, the web browser engine that 40% of all the world's web browsers are built on top of (Chrome, Safari, Android's web browser). Phantomjs throws exceptions just like the python code but what if it will just hang in some situations? Then it can be terminated with a time out but that won't help the end user. Now, if hypothetically some jobs would fail on the new worker, why not fall back on the old one and keep the task moving to completion? One could call the fallback worker a naive worker or something. A naive e-mailer might send an SMS over the GSM network telling someone to send an e-mail for example.
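A sketch of the fallback idea in Python: try the new worker code with a time limit, and on an exception or a timeout hand the same job to the naive worker. new_worker and naive_worker are hypothetical stand-ins. One caveat: a thread cannot be forcibly killed, so a truly hung primary keeps running in the background; for something like PhantomJS you would rather run it as a separate process that can be killed on timeout:

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def with_fallback(primary, fallback, job, timeout):
    """Run primary(job) with a time limit; on any failure, use fallback(job)."""
    future = _pool.submit(primary, job)
    try:
        return future.result(timeout=timeout)
    except Exception:
        # Covers exceptions raised by primary as well as the timeout error
        # from future.result(); the hung thread itself is not stopped.
        return fallback(job)


def new_worker(job):
    """Hypothetical new code: better results, less proven."""
    if job == "weird page":
        raise RuntimeError("new code chokes on this input")
    if job == "slow page":
        time.sleep(0.5)   # simulate a hang
    return "full report"


def naive_worker(job):
    """Hypothetical old code: simpler, more robust."""
    return "basic report"


print(with_fallback(new_worker, naive_worker, "normal page", timeout=1.0))  # full report
print(with_fallback(new_worker, naive_worker, "weird page", timeout=1.0))   # basic report
print(with_fallback(new_worker, naive_worker, "slow page", timeout=0.1))    # basic report
```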

Feature ratings

It could be time to start defining what would be interesting for me to have in a work queue system right now. One should note that many things may best be left to the coder. There is no use in having a framework whose concepts are just far enough from what you actually want to make the code a bit cumbersome.

I'll start with Redis, which I have used and I will also fill in some preliminary info on Celery, but I need to run more tests on that one. Please note that the rating will change as I find out more. Do also note that since for example Celery can use Redis as a back end, by definition Redis can "do" everything Celery can, as long as e.g. Celery is doing the work.

I've also started putting in info on RabbitMQ and its implementation of AMQP, which seems to have enough knobs and levers to express many of the concepts listed below. It can however be a blurry line between what is built in and merely a matter of configuration, and what starts looking like bona fide programming.

A solid authentication layer

Redis claims to not be overly secure if exposed directly to the Internet. One can put an SSL tunnel in front with certificates, which I have done. That may be a better solution than what could have been built in.

Celery has built in support for signing and certificates right into the messages themselves, see Security — Celery 3.0.11 documentation, however the Celery documentation says that Celery should be treated as an "unsafe component", so SSL tunneling might be a good idea anyway.

A work queue that workers can take jobs from

Redis has this with e.g. a blocking pop/push (BRPOPLPUSH), but the only way to find out if an item is in a list is to delete it (LREM)

Celery has this.

RabbitMQ has this. It can make it look like workers are pulling jobs from a queue by setting prefetch_count=1.

Possibility to bury jobs

Redis does not have this, you will have to build that on top

Celery has something called a persistent revoke, which if you specify a storage file, seems to do the trick, see: Workers Guide — Revoking tasks

Beanstalkd has this

Sidekiq retries the job with an exponentially rising delay, so that a fix can be applied, instead of burying it

According to the RabbitMQ docs here, there is something called a dead letter extension which, as far as I can tell, could be helpful in implementing this.

Time-outs after which jobs are rescheduled, revoked or buried

Redis does not have this, you will have to build that on top. There is a time-out option for keys though.

Sidekiq retries the job with an exponentially rising delay, so that a fix can be applied

Celery has this, with the acks_late option, see: Workers Guide.

Monitoring of processes

Redis does not have this, you will have to build that on top, but there are third-party modules, although it is unlikely that they work out of the box with how you have designed your system

Celery: see Monitoring and Management Guide — Celery 3.0.11 documentation

Group operations into atomic transactions

Redis has this

RabbitMQ has this

Persistence

Redis has this

Celery has this as long as the back end is configured to persist

RabbitMQ has this.

Flexible reporting and web interface, pubsub logging

For Redis there is a PHP web interface, among others. Pubsub exists and there is e.g. a Python logger that publishes to Redis.
Celery has very ambitious support for this, in the shape of flower and django plugins, see: Monitoring and Management Guide — Celery 3.0.11 documentation
There is even a limited curses interface for Celery.

Sequential workflow with error handling

Redis does not have this, you will have to build that on top

Celery has this, see How to create a sequential workflow with error handling in Celery

sidekiq can send exceptions to an exception notification service

Fallbacks

Redis does not have this concept, you will have to build that on top

Celery does not have this concept, you will have to build that on top

Language agnostic, easy to mix and use different languages

Redis has a defined wire protocol and support in a plethora of languages, and as long as you use a data format available on all platforms such as JSON, interoperability should not be a problem
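To show how little there is to the Redis wire protocol, here's a Python function that encodes a command the way clients send it: an array marker with the argument count, then each argument as a length-prefixed bulk string, every line ending in CRLF. It only encodes requests; parsing replies is left out:

```python
def encode_command(*args):
    """Encode a Redis command as an array of bulk strings, CRLF-terminated."""
    parts = [b"*%d\r\n" % len(args)]              # array header: argument count
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))      # bulk string header: byte length
        parts.append(data + b"\r\n")
    return b"".join(parts)


print(encode_command("SET", "key", "value"))
# b'*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n'
```

Note that lengths are in bytes, not characters, which matters for non-ASCII job payloads.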

Celery has a defined protocol as far as I can see, but it is not implemented in a plethora of clients. It uses Python's pickle format for serializing data by default, but it can be switched to e.g. JSON or YAML. Together with Django, Celery can be used with a simple HTTP protocol called Webhook, a protocol that according to Google searches seems to have caused some enthusiasm back in 2009, and which today forms a part of GitHub's API. Celery can also operate with HTTP POSTs and GETs, see: HTTP Callback Tasks (Webhooks), celery/examples/celery_http_gateway.
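For reference, switching the serializer is a configuration matter. A sketch of the relevant lines in a Celery 3.0-style configuration module (uppercase setting names; Celery 4 and later renamed them to lowercase, e.g. task_serializer), so check the docs for your version:

```python
# celeryconfig.py (Celery 3.0-style setting names)
# Use JSON instead of the default pickle, so that task messages and
# results can be read and written by code in other languages.
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
```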

Resque seems to run its child workers as system processes, and that could open up the possibility of running code in other languages, but that does not seem to be how it is used. I guess one could write a worker that used STDIN and STDOUT to communicate with the child process.
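Here's a minimal Python sketch of that idea: the parent starts one child process per job, writes the job as JSON to the child's stdin, reads a JSON result from its stdout, and kills the child if it runs too long. The child is Python too only to keep the example self-contained; the protocol (one JSON object in, one out) and the run_child name are my own assumptions, not anything Resque defines:

```python
import json
import subprocess
import sys

# Child worker: reads one JSON job on stdin, writes one JSON result on stdout.
# Any language that can read stdin and write JSON could take its place.
CHILD = r'''
import json, sys
job = json.load(sys.stdin)
json.dump({"id": job["id"], "length": len(job["text"])}, sys.stdout)
'''


def run_child(job, timeout=10.0):
    """Parent side: start a child process for the job and watch it."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            input=json.dumps(job), capture_output=True, text=True,
            timeout=timeout,   # on expiry the child is killed
        )
    except subprocess.TimeoutExpired:
        return {"id": job["id"], "error": "timeout"}
    if proc.returncode != 0:
        return {"id": job["id"], "error": proc.stderr.strip()}
    return json.loads(proc.stdout)


print(run_child({"id": 1, "text": "hello"}))   # {'id': 1, 'length': 5}
```

A crashing child (here, one given a job without "text") shows up as a non-zero exit code, and the parent reports it instead of going down with it.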

RabbitMQ works with AMQP and has standardization and interoperability built right in.

Documentation

Redis - well documented in a concise way. Redis itself is pretty concise, which helps. For every command there is information on how well that command scales, in "Big O" notation. There are also pages covering other aspects of Redis.

Celery is very well documented, from high level all the way down to the internals and message format.

beanstalkd - An FAQ that is one page on a five page wiki.

Resque

Zeromq - Lots of documentation and examples, given in parallel in several computer languages

CheckMyCSS.com - Checks what CSS on your site goes unused

published Oct 22, 2012 04:09 by admin ( last modified Oct 22, 2012 04:09 )

I've just made a site called CheckMyCSS.com . It checks your web site for what CSS selectors in your style sheets aren't actually used and mails a report to you. It's free.

CheckMyCSS.com . Take it for a spin!

Get a little more modern Redis for Your Ubuntu 10.04 LTS

published Oct 20, 2012 07:39 by admin ( last modified Oct 20, 2012 07:39 )

Ubuntu 10.04 is still supported, but that does not mean that you always want the software versions it ships with. In contrast to Debian 6, there is no official backports repository for Ubuntu 10.04 containing a newer version of Redis than the one the distribution shipped with.

However, David Murphy has made a newer Redis (2.2) available for 10.04. I have only tested redis-cli so far, but it did the job of connecting to newer servers, which the 2.1 in 10.04 did not.

PPA description Backport of redis to 10.04 LTS

How to get stunnel running on Ubuntu 12.04 precise

published Oct 20, 2012 03:20 by admin ( last modified Oct 20, 2012 03:20 )

Summary: Download the source package instead (stunnel-4.54.tar.gz) and build it locally, somewhere in your home directory, with:

  ./configure --prefix=/home/auser/where-you-want-stunnel
  make
  make install

I'm writing this in frustration, so I might come back and do some edits later. Anyway: I tried to use the stunnel packages supplied with the Precise Pangolin distribution. Setting up two stunnel daemons between two machines with those is like trying to play the violin with two cold, dead fish. They start, but they do not read any configuration files (sheet music in the case of the fish). Or they might, but I doubt it, since writing garbage in the config files does not make the stunnel daemons react. In fact I think the 12.04 stunnel daemons are some of the most stable things on earth in their complete insensitivity to anything happening around them, like config files in the right place or having any kind of arguments with them. And by that I mean command line arguments.

In contrast, the source tarballs are all rainbows and unicorns, printing sane and helpful stuff to stdout and stderr, and I got up and running with them in 20 mins. Not gonna dwell on how big a dollop of my life the debs disappeared with.

Redis 2.4 for Debian6 (Debian Squeeze)

published Oct 17, 2012 10:41 by admin ( last modified Oct 17, 2012 10:41 )

The included version for Redis in Debian6 (Debian Squeeze) is a bit long in the tooth (version 2.1) and lacks some of the commands you may want to use. A more modern version (currently 2.4) can be found in Debian backports for Debian 6.

Due to high demand I have prepared official Redis packages for Debian "squeeze":

Read more: Official Redis packages for Debian "squeeze" « lamby

http://packages.debian.org/squeeze-backports/redis-server