
jorgenmodin.net - Blog

Livestrong.com on microwaving frozen spinach

If you google "microwave frozen spinach" the two top results are currently from Livestrong.com. The top result says:

"When it comes time to eat, the easiest way to cook frozen spinach, and the best way to preserve its nutritional value, is in your microwave oven."

(my boldface)

However, the second result from Livestrong.com says:

"Don't microwave frozen spinach; the microwave can blanch the nutrients out of the vegetable."

Well someone is wrong on the Internet! The question is who...

Nov 23, 2014 04:46

A websockets to websockets gateway in 20 lines of javascript

I needed to connect a python websockets client to a password protected websockets server written in the go language and running over SSL.

However, I couldn't get connections from the python libraries (ws4py, autobahn, twisted) accepted by the server, so I wrote a websockets to websockets gateway in javascript, or rather I cobbled one together from example scripts. It connects and authenticates to the go server, tells that server what kind of channel it is interested in, and broadcasts what comes back to any clients connected to the javascript process. It runs under node.js. The server is called "wss" below and the client "ws":

var WebSocketServer = require('ws').Server;
var fs = require('fs');
var WebSocket = require('ws');

// Server side of the gateway: accepts plain websocket clients on port 8080
var wss = new WebSocketServer({
    port: 8080
});

// Relay a message to every connected client
wss.broadcast = function(data) {
    for (var i in this.clients)
        this.clients[i].send(data);
};
// Load the certificate for the TLS connection
var cert = fs.readFileSync('rpc.cert');
var user = "user";
var password = "a password";

// Initiate the websocket connection to the go server.
// The certificate is self signed, so it is also passed as its own CA.
var ws = new WebSocket('wss://localhost:18344/ws', {
    headers: {
        'Authorization': 'Basic ' + new Buffer(user + ':' + password).toString('base64')
    },
    cert: cert,
    ca: [cert]
});
ws.on('open', function() {
    console.log('CONNECTED');
    // Tell the go server which notifications we are interested in
    ws.send('{"jsonrpc":"1.0","id":"0","method":"notifynewtransactions","params":[true]}');
});
ws.on('message', function(data, flags) {
    // Forward everything from the go server to our own clients
    wss.broadcast(data);
});
ws.on('error', function(err) {
    console.log('ERROR: ' + err);
});
ws.on('close', function() {
    console.log('DISCONNECTED');
});
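On the python side, a minimal client talking to this gateway could then look something like this (a sketch using ws4py's threaded client; the port matches the gateway above, but I have not verified this exact snippet end to end):

from ws4py.client.threadedclient import WebSocketClient

class GatewayClient(WebSocketClient):

    def opened(self):
        print 'CONNECTED to the gateway'

    def received_message(self, message):
        # Everything the gateway broadcasts ends up here
        print message

client = GatewayClient('ws://localhost:8080/')
client.connect()
client.run_forever()

The whole point of the gateway is that the python client only has to speak plain, unauthenticated websockets.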

Maybe some more tweaking could have got the python code to talk directly to the go server, but sometimes you just have to go with the solution where you can actually estimate the time it will take to get it into working order.

Nov 11, 2014 09:55

The state of writing asynchronous code in python

Python is a language with the ambition that there should be one and only one obvious way of doing things. With regard to asynchronous programming, that ambition is currently not fulfilled. As an example, the ReactiveX/RxPY library lists no fewer than six different ways of doing asynchronous programming in python (from the ReactiveX/RxPY documentation):

RxPY also comes with batteries included, and has a number of Python specific mainloop schedulers to make it easier for you to use RxPY with your favorite Python framework.

  • AsyncIOScheduler for use with AsyncIO (Python 3.4 only).
  • IOLoopScheduler for use with Tornado IOLoop. See the autocomplete and konamicode examples for how to use RxPY with your Tornado application.
  • GEventScheduler for use with GEvent (Python 2.7 only).
  • TwistedScheduler for use with Twisted.
  • TkinterScheduler for use with Tkinter. See the timeflies example for how to use RxPY with your Tkinter application.
  • PyGameScheduler for use with PyGame. See the chess example for how to use RxPY with your PyGame application.

You could probably list more of them, e.g. Gtk.
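As a taste of one of these routes, here is a minimal AsyncIO sketch for Python 3.4 (standard library only, not tied to RxPY):

import asyncio

@asyncio.coroutine
def fetch(name, delay):
    # Pretend to wait on I/O without blocking the event loop
    yield from asyncio.sleep(delay)
    print(name, 'done')

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait([fetch('a', 1), fetch('b', 2)]))
loop.close()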

Javascript, on the other hand, has asynchronous operation built right in. Currently 1-0 for javascript versus python in this regard.

Nov 11, 2014 09:43

Make absolute symlinks for sites-enabled in Nginx

When symlinking virtual host files from sites-available to sites-enabled, it seems you need to be careful of how that symlink actually points to its target.

When I used relative paths, nginx did not seem to be able to pick up the target file. The default file in sites-enabled is a symlink with a full path.
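So create the link with an absolute path, for example (assuming the standard Debian/Ubuntu layout and a hypothetical site file called "mysite"):

ln -s /etc/nginx/sites-available/mysite /etc/nginx/sites-enabled/mysite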

Nov 11, 2014 09:40

Asynchronous code - why is it needed?

Asynchronous coding is becoming impossible to ignore. There are three main sources for the need for asynchronous programming:

The GUI - Having to deal with asynchronous events is nothing new in GUI programming; it has been a staple of it for decades. Basically, humans like to click, type and select in any order they want, and the user interface has to accommodate that.

The network - In the old days of working with a file on a GUI computer, you would work on the file until it was done, and then you delivered it to your boss. Today, what you work with is often an amalgamation of what you do, what your computer, well, computes, and what is sent back to you by people and services elsewhere. And since you cannot control when these people and services will get back to you with the results of your request, you just have to react to it when it happens, if it happens.

The CPU - Lastly, since we have hit a wall in CPU clock frequency, more performance is achieved by running things in parallel on multiple processing units. Even if these units are closely connected, you will still need to handle asynchronous behavior in order to keep the cores busy.

In a web application you may hit both the network and the CPU. I currently work on a project that needs to push out large amounts of large files, and it often needs to do some advanced maths to decide who gets what. For the file pushing, it is enough to multiplex a CPU core between different output streams without it breaking a sweat, but for the calculations all cores should be busy.

There are different ways of doing asynchronous programming to cater for these different use cases. Some of them are more suited to performance on CPUs, others to GUI or network applications.

One way is to use callbacks, which are used in a lot of GUI programming and are also the default in javascript. A problem with callbacks is that it can be hard to follow the execution of the code, since it jumps between different functions as the callbacks are executed. This is sometimes referred to as callback hell. The problem of execution jumping between different blocks of code is somewhat similar to what you can get with exception handling and with reverse dependency injection.
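As a contrived python illustration of the jumping-around (the functions and names are made up, and real code would be doing network I/O instead of returning directly):

def fetch_user(user_id, callback):
    # Stand-in for an asynchronous network call
    callback({'id': user_id, 'name': 'alice'})

def fetch_orders(user, callback):
    # Another stand-in; calls back with the user's orders
    callback(['book', 'lamp'])

def on_user(user):
    fetch_orders(user, on_orders)

def on_orders(orders):
    print orders

# Reading this top to bottom does not follow the execution order, which is:
# fetch_user -> on_user -> fetch_orders -> on_orders
fetch_user(1, on_user)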

Another way of handling concurrency is with futures and promises and, building on those, reactive programming. There is also something called software transactional memory. Clojure has software transactional memory as well as other constructs for dealing with concurrency on CPUs. Erlang uses the actor model with message passing and pattern matching to get things to work as one system while spread out over different processing units.


Nov 11, 2014 07:25

Process safe connection pool for psycopg2 (postgresql)

Summary: Gunicorn will fork its workers after the psycopg2 connection pool has started multiplying connections to postgresql. When gunicorn forks a new process, the process copies these connections, and because of this several web server processes will clobber each other's connections to postgresql. Use a proxy class that re-initialises the pool when it finds itself running under a new process, as shown in the code below.

One solution is to make the connection pool detect when it is running under a new process id and reset itself. The code below (which is in testing, use at your own risk) does this (Gist version here: Process safe pool manager for psycopg2):


import os
from psycopg2.pool import ThreadedConnectionPool


class ProcessSafePoolManager:

    def __init__(self, *args, **kwargs):
        # Remember which process created the pool
        self.last_seen_process_id = os.getpid()
        self.args = args
        self.kwargs = kwargs
        self._init()

    def _init(self):
        self._pool = ThreadedConnectionPool(*self.args, **self.kwargs)

    def getconn(self):
        # If we have been forked into a new process, the inherited pool
        # shares its sockets with the parent, so rebuild it from scratch
        current_pid = os.getpid()
        if current_pid != self.last_seen_process_id:
            self._init()
            print "New id is %s, old id was %s" % (current_pid, self.last_seen_process_id)
            self.last_seen_process_id = current_pid
        return self._pool.getconn()

    def putconn(self, conn):
        return self._pool.putconn(conn)

pool = ProcessSafePoolManager(1, 10, "host='127.0.0.1' port=12099")
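Usage is then the same as with the plain psycopg2 pool; a quick sketch (the query is just a placeholder):

conn = pool.getconn()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    print cursor.fetchone()
finally:
    pool.putconn(conn)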

Background

I'm working on a server (using the bottle micro framework, but that's optional) that needs to both serve lots of big files and do some CPU intensive work. It uses postgresql as a storage back end. An asynchronous server such as tornado or bjoern is good for serving big files, but for the CPU heavy work several processes are needed, since python threads don't spread themselves out over the processor cores.

Gunicorn is a python server framework where you can, in one line of code, tell it to use asynchronous servers and workers, each of which is a separate process. In bottle.py you can write something like this:

    run(host='localhost',  server='gunicorn', workers=15, worker_class="tornado", port=8080)

...and just like that you will have 15 parallel Tornado servers at your disposal! I may make an egg of the above code if it seems useful in the long run.

The problem is, as soon as I set workers above 1, strange errors started to happen. After a lot of testing I started to suspect that the different workers were clobbering each other's TCP/IP connections, that is, they were listening on the same ports for data from postgresql. Finally I found this web page that validated my suspicions: Celery jobs throw exceptions · Issue #3 · kennethreitz/django-postgrespool

One solution, purportedly used by uwsgi, is to fork before the connection pool is created. I thought of diving into the gunicorn code, or using exceptions and other tricks, but the quickest and cleanest solution (although maybe not the most performant) seemed to be to proxy the connection pool object from psycopg2. The proxy object checks whether it is still running under the same process id as the previous time it was used, and if not, it reinitialises the psycopg2 connection pool.

Optimisations

It seems that gunicorn always forks from a mother process, so once the process id has changed from the perspective of the pool manager, it will not change again. One could then do some live monkey patching to stop testing for a new process id after the first change, which may improve performance. Or just set an attribute: self._changed_once = True.

Caveats and bugs

There probably are such. What happens to the old pool object once it is replaced? Are there dangling references to it?

Nov 04, 2014 04:30

Proof of work to prevent DoS and DDoS attacks on web pages

If an attacker had to expend a bit of work to access a web page, it would get prohibitively expensive to DoS such a site. I suggested this at a lunch with programmers today, and one of them - Rene - suggested that one could use a TLS certificate with a short key, which the client would have to brute force in javascript to access the web site for which the certificate is required. One would need to have a stash of these certificates ready, though. I wonder how computationally expensive they would be to manufacture.

I had thought more along the lines of some kind of mathematical trap door where it is easy to check that work has been expended, such as giving the client a product of primes and requiring it to tell which primes were used in order to access a page. This verification of work should be done very cheaply at the edge of your server's system.

Hashcash seems to have triggered ideas in the direction of proof of work to mitigate DoS attacks: (A paper in ps format).

I wonder if one could make an implementation where any new visitor gets a cookie. The cookie is set to the product of a number of primes, and the browser sets another cookie for the same domain with the answer. At the next request the cookies are read server-side and a new cookie is presented as a challenge. The server keeps track of which challenges it has served and invalidates any request giving the answer to an already solved challenge. All of this should be done as early as possible in the processing of a request.

Furthermore, the system should only be enabled when a DoS attack is detected.
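To make the idea concrete, here is a rough python sketch of the challenge and the cheap server-side verification (toy-sized primes for illustration only; a real deployment would use larger numbers, and would also have to track issued challenges to stop replays):

import random

# Small primes just above 10000, purely for illustration
PRIMES = [10007, 10009, 10037, 10039, 10061, 10067, 10069, 10079]

def make_challenge():
    # Hand the client a semiprime to factor
    p, q = random.sample(PRIMES, 2)
    return p * q

def verify(challenge, p, q):
    # Checking the answer costs the server a single multiplication
    return p > 1 and q > 1 and p * q == challenge

challenge = make_challenge()
# The client brute forces the factors, e.g. by trial division
for p in xrange(2, challenge):
    if challenge % p == 0:
        break
q = challenge // p
print verify(challenge, p, q)  # prints True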

The following page, Final post on Javascript crypto | root labs rdist, points out that an attacker may use something other than javascript on a normal computer to do an attack (faster languages, other hardware). He attributes this insight to the paper below, I think, but it does not have the same author as given by the link:

Pricing via Processing or Combatting Junk Mail - Abstract

This paper also talks about favoring memory bound rather than CPU bound problems, to thwart custom hardware I suppose, as is also the idea behind scrypt (given the right parameters, which apparently was not the case for Litecoin et al).

Oct 30, 2014 12:25

SSH keys for two accounts on GitHub

You cannot use the same SSH key for two accounts on GitHub. So you need two separate keys. This is how I did it, roughly following the guide Multiple SSH keys for different github accounts.

Let's assume that you have created a second account on GitHub with the username "secondaccount" and the e-mail address "secondaccount@example.com".

You need to create a new set of SSH keys. Do that with:

ssh-keygen -t rsa -C "secondaccount@example.com"

Where the e-mail address is the one you use for your second GitHub account. ssh-keygen will ask you for the name of the file to store the key in. Tack "_secondaccount" onto the default name, so it becomes "id_rsa_secondaccount".

Then you need to edit the ~/.ssh/config file. If it is not there, create it. Put the following into it:

#secondaccount account
Host github.com-secondaccount
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_secondaccount

This makes SSH use the private key whose name ends the same way as the host alias after the dash sign (they could have different endings, but in my experience naming everything the same where possible saves a lot of searching).

Then, when checking out a repository from the second account, tack "-secondaccount" onto the Internet host, so if it looks like this initially:

git@github.com:secondaccount/my-git-repos.git

It will then look like this:

git@github.com-secondaccount:secondaccount/my-git-repos.git

Lastly, enter the repository and issue the following two commands:

git config user.name "secondaccount"

git config user.email "secondaccount@example.com"

You should now be able to push to your second account from that repository.
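You can also check that SSH picks up the right key, since GitHub should greet you with the second account's username when you run:

ssh -T git@github.com-secondaccount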

Your first account should continue to work as normal. At least mine does.

Oct 28, 2014 08:40

Python: Don't use class attributes as default values for object attributes

This bit me today. Whether using class attributes as default values for object attributes works or not depends on the data type used, so it is best to stay away from the habit altogether. If the value is mutable, a list for example, it will be shared among all the objects, since append and friends mutate the one list owned by the class. With an immutable value such as a string, += rebinds the name on the instance instead, so each object appears to get its own copy. Example code:

class Foo:
    messages = []  # class attribute: one list shared by every instance

    def append_message(self, m):
        self.messages.append(m)  # mutates the shared list


class Bletch:
    messages = ""  # class attribute, but immutable

    def append_message(self, m):
        self.messages += m  # rebinds messages as an instance attribute

foo = Foo()
foo.append_message('foo')
bar = Foo()
bar.append_message('bar')
baz = Foo()
baz.append_message('baz')
print baz.messages
#  prints ['foo', 'bar', 'baz']



foo = Bletch()
foo.append_message('foo')
bar = Bletch()
bar.append_message('bar')
baz = Bletch()
baz.append_message('baz')
print baz.messages
# prints 'baz'
print foo.messages
# prints 'foo'
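The standard way to get a fresh mutable default per object is to create it in __init__ instead:

class Foo:

    def __init__(self):
        # Each instance gets its own list
        self.messages = []

    def append_message(self, m):
        self.messages.append(m)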


Oct 27, 2014 04:27

Getting a networked printer hanging off an Ubuntu server to print

Notes to self:

In this case, a new PPD had to be used (on the client machine), since the old one magically and suddenly stopped working. Very non-obvious, and a reminder that Linux is still the land where just printing something sometimes confronts you with complexity on the level of IT consultancy work.

The printer is an HP LaserJet M1522n MFP hooked up to an old laptop that functions as a print server, with the printer shared on the network. On the client machine (not the Linux server, since that one printed fine) I changed to a driver whose full name has a different suffix somewhere at the end of it all.

Sep 12, 2014 03:10