The Manga Guide to Databases!

Thanks to Josh Berkus for the tip about The Manga Guide to Databases! Just placed my order for a copy of this must-have. :-D

A PostgreSQL friendly way to load customized SQL in Django

So I finally came to a reasonable solution to the problem of loading customized SQL in Django.

By creating an app manager that loads the customized SQL using the post_syncdb signal, as suggested in #3214. This makes my customized SQL run after syncdb is done. (Avoiding the problem of customized SQL getting run after table creation but before index creation and initial data loading.)
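For illustration, a minimal sketch of the pattern against Django 1.0 (the app name and SQL path are hypothetical): a module named `management.py` inside the app is imported by syncdb, and a receiver connected to the post_syncdb signal runs the SQL once syncdb is finished with the app's models.

```python
# myapp/management.py -- modules named "management" are imported by syncdb
# in Django 1.0. App name and SQL file path below are hypothetical.
from django.db import connection
from django.db.models.signals import post_syncdb

import myapp.models

def load_custom_sql(sender, **kwargs):
    # Run the hand-written SQL over the default connection.
    cursor = connection.cursor()
    cursor.execute(open('/path/to/custom.sql').read())

# Connect with sender=our models module so it only fires for this app.
post_syncdb.connect(load_custom_sql, sender=myapp.models)
```

Because the signal also fires when the test runner builds its database, the same SQL ends up in the test database as well.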

This also means my customized SQL is run as part of the test suite, avoiding the customized SQL being wiped as described in #8792.

I actually made a patch for the loading problem in #3214 before realizing the tests would fail because of #8792, which made me abandon it. It is interesting to see the footprint of the patch on Django 1.0.2 though:

$ svn diff | wc -l
$ svn st
M      django/db/backends/sqlite3/
M      django/db/backends/
M      django/db/backends/postgresql_psycopg2/
M      django/core/management/
M      django/contrib/contenttypes/
$ svn diff | wc -l

I won't bother anymore, but I made it available anyway: ticket-3214-patch-django-1.0.2.patch

Arbitrary SQL with Django and PostgreSQL caveat

Update: There's no workaround other than the one mentioned in #3214. My previous suggestion of a workaround was false and the result of an error on my part. I have therefore edited the post to reflect this.

We hadn't had any need for arbitrary SQL in our application, but this changed when I was setting up automatic build and testing. I ran into a caveat: #3214. The title is somewhat misleading; the bug is really about arbitrary SQL being mangled by Django without letting you know.

It's all there in the ticket: dedicated developers, users needing it, a patch, documentation and tests. And the reason for not accepting? Some backends don't handle multiple statements... I know SQLite and PostgreSQL do.

After spending some time both in the Django source and on their site, I eventually found out that running the SQL as database-backend-specific SQL data doesn't mangle it. Unfortunately this information was not included in the documentation.
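For reference, that hook looks for files in the app's sql/ directory named after the model and the database backend module; Django runs the generic file through its statement splitting, while the backend-specific one is left alone. App and model names below are hypothetical:

```
myapp/sql/category.sql                      # generic: parsed (and mangled) by Django
myapp/sql/category.postgresql_psycopg2.sql  # backend-specific: passed through untouched
```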

Visual callgraph in Python

Sometimes a picture tells more than... Not often in programming, but occasionally a picture does tell a little more than text and numbers can (especially when the text and numbers are included in the picture).

Recently I found myself in need of a visual representation of the callgraph of a webapp written in Django. This was a pretty simple exercise: create a middleware that makes use of pycallgraph. The only heads-up is to make use of filters, exclude and/or include, or you'll end up with a *huge* picture.

I wrote a snippet to share. Using Django or not, that small piece of code shows how easy it is to make use of pycallgraph.

Basically it's just:

import pycallgraph

As said, I do recommend making use of a filter though.
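As a minimal sketch against the old module-level pycallgraph API (circa 0.5; function names may differ in later releases, and `work()` stands in for whatever you want traced):

```python
import pycallgraph

# Keep the graph readable: trace only our own code. Patterns are examples.
filter_func = pycallgraph.GlobbingFilter(include=['myapp.*'],
                                         exclude=['pycallgraph.*'])

pycallgraph.start_trace(filter_func=filter_func)
work()  # hypothetical entry point to trace
pycallgraph.stop_trace()
pycallgraph.make_dot_graph('callgraph.png')  # needs Graphviz 'dot' installed
```

In a Django middleware the start/stop calls would simply go into process_request and process_response.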

Don't forget the EPOCH-1234567890-Party

This is just so geeky but I can't help myself. I think it's fun!

Epoch Time 1234567890 countdown to the big party on Friday.

Remember to bring your best party hat. :-)

Programmers need their zone

I just read a blog post about the benefits of working from home.

Apart from the obvious benefits, it concentrates on the focus span: the importance of being able to get into, and stay in, the zone when programming.

In addition to the benefits described in the post, I would like to add that working in a distributed environment forces you to write things down: write a mail, discuss things on IRC (of course everyone saves their logs!), describe a solution or workflow on a wiki. This way things don't get lost for lack of being written down.

Follow up: Vacuum Full progress

Ok, to follow up my recent post, I created a small example of how the progress indication could be done. The example is very simple and only a proof of concept, nothing more. There is no support for multiple file nodes, only the relation and its indexes are considered, etc.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import psycopg2
import os
import time
from threading import Thread

class Vacuum(Thread):
    """Runs VACUUM FULL on the given table in a thread of its own."""
    def __init__(self, dsn, name):
        super(Vacuum, self).__init__()
        self.connection = psycopg2.connect(dsn)
        # VACUUM cannot run inside a transaction block.
        self.connection.set_isolation_level(0)
        self.table = name

    def run(self):
        cursor = self.connection.cursor()
        cursor.execute('vacuum full %s' % (self.table))

class Progress(Thread):
    """Watches the modification times of the file nodes belonging to the
    table and its indexes, and prints how many have been rewritten."""
    def __init__(self, dsn, name, pgdata):
        super(Progress, self).__init__()
        self.connection = psycopg2.connect(dsn)
        self.table = name
        self.objects = {}
        cursor = self.connection.cursor()
        cursor.execute("select oid from pg_database where datname=current_database();")
        db_oid = cursor.fetchone()[0]
        self.path = '%s/base/%s' % (pgdata, str(db_oid))
        # The file on disk is named after relfilenode, not the class oid.
        cursor.execute("select oid, relfilenode from pg_class where relname='%s'" % (name))
        oid, filenode = cursor.fetchone()
        epoch = os.path.getmtime(self.path + '/' + str(filenode))
        orig = time.strftime('%H:%M', time.localtime(epoch))
        self.objects[name] = {'filenode': filenode, 'epoch': epoch, 'original_time': orig}
        # The table's indexes are rewritten as well, so track them too.
        index_query = "select c2.relname, c2.relfilenode from pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i where c.oid=%s::integer and c.oid = i.indrelid and i.indexrelid = c2.oid;"
        cursor.execute(index_query % (oid))
        for row in cursor.fetchall():
            name, filenode = row[0], row[1]
            epoch = os.path.getmtime(self.path + '/' + str(filenode))
            orig = time.strftime('%H:%M', time.localtime(epoch))
            self.objects[name] = {'filenode': filenode, 'epoch': epoch, 'original_time': orig}

    def _check(self):
        # Count the objects whose file node has been written since we started.
        done = 0
        for name, attrs in self.objects.items():
            epoch = os.path.getmtime(self.path + '/' + str(attrs['filenode']))
            if epoch > attrs['epoch']:
                self.objects[name]['done_epoch'] = epoch
                done += 1
        return done

    def run(self):
        done = 0
        print "done: %s of %s" % (done, len(self.objects))
        while len(self.objects) > done:
            tmp, done = done, self._check()
            if tmp != done:
                print "done: %s of %s" % (done, len(self.objects))
            time.sleep(1)  # no need to busy-wait

if __name__ == '__main__':
    dsn = 'dbname=testdb user=testuser password=secret host='
    vacuum = Vacuum(dsn, 'db_category')
    progress = Progress(dsn, 'db_category', '/path/to/my/pgdata')
    progress.start()
    vacuum.start()
    vacuum.join()
    progress.join()

# vi: set fileencoding=utf-8 :

To do something serious about this, the implementation would have to understand which objects have file nodes, support multiple file nodes (relations larger than 1 GB are split into segments), time each run and try to estimate progress, etc.

I would also need to understand exactly how VACUUM FULL behaves, when files are written for example.

Estimate full vacuum time

Tonight during an upgrade of our systems I was doing a full vacuum on a couple of tables and I started to wonder how long it would take.

After popping the question in #postgresql on freenode, Steve Atkins (I think it was) suggested I could get some feeling of progress by looking at the file nodes.

So I did, and it gave sort of a feeling of what was happening, but it was kind of a hassle to do. (I had one psql and one xterm open, looking at the relfilenode in one and the file node modification time in the other.)

This got me thinking that if I get a list of file nodes involved before I start the (full) vacuum I should be able to keep track of progress by using something like inotify. The idea is still a bit rough and I need to look into exactly what goes on during a full vacuum (like when are things written to disk) but I still think it's doable.
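A minimal sketch of that idea (assuming the standard layout under $PGDATA and ignoring relations split into 1 GB segments): derive the path to watch from the database OID and the relation's relfilenode, then either poll its mtime or hand the path to inotify.

```python
import os

def filenode_path(pgdata, db_oid, filenode):
    # First segment of the relation's data file; segments beyond 1 GB
    # are named <filenode>.1, <filenode>.2, ... and are ignored here.
    return os.path.join(pgdata, 'base', str(db_oid), str(filenode))

def written_since(path, epoch):
    # Poor man's inotify: has the file been modified since `epoch`?
    return os.path.getmtime(path) > epoch
```

The inputs would come from `select oid from pg_database where datname=current_database()` and `select relfilenode from pg_class where relname='mytable'`, captured before the vacuum starts.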

Would this kind of approach be useful for anything or am I just rambling? (After all it is 4:20am here now...)
