zachary.com

personal pages

All ad proceeds donated to charity.

sixtyPercent: Cochlear Implants, Aviation, Technlology, and Philosophy 2005/09

SQLObject and Memcached, part 1

Last week I attended the BayPIGgies monthly meeting and heard Ben Bangert's excellent introduction to SQLObject & FormEncode. I'm using both of these tools at work, so it's good to learn of other folk's experience with these tools.

First an apology -- some in the group asked why SQLObject reads back a row after inserting it (on object creation), since nobody seemed to have the answer, I speculated that it was to read back the ID. Wrong -- the id is fetched before the row is created (next item in the DB sequence). My next guess is that the row is read back in case the database executed a trigger, or had some default values that it applied to the row. In this case, the in-memory values would be incorrect, or not filled in if the row is not read back.

Another question came up on how to keep multiple processes synchronized with each other with SQLObject. The short answer is that it's not easy. SQLObject is aggressive about caching objects, even when told not too -- this is apparently to preserve object identity -- i.e. two fetches of the same row yield the exact same Python object. Makes sense to me.

Even so, I've been experimenting with approaches to solving the multi-process problem with SQLObject. One standard approach is to have database based triggers update a serial number associated with each row in tables that are shared between processes. Then before every use of the object, check to see if your serial number matches -- if not, re-sync and try again.

Below is some code I wrote to do that, but instead of adding a serial number column, it just caches the serial number in memcached. To use this class, update the memcached servers list, and then re-parent the shared class to use this class instead of SQLObject as the base.

The solution isn't perfect, as documented below. There is still a small but real race condition, and I love to hear solutions to reducing or eliminating that. However the results are nice for non-critical data. For example, multiple federated web applications can use this technique to keep seldom changing user data up to date.

This post is entitled part 1 because I have another, more invasive solution that I'll post in a little while. I've hacked up SQLObject's cache to always and ONLY use a memcached-based cache for object storage. This has some pluses and some minuses -- you would definitely not want to use this for a single process application, and object identity suffers (see above). I'll post that once the code works a bit better.

# SharedObject -- a SQLObject subclass where instances are shared by multiple
# processes. This implementation isn't perfect (see below), but generally works
# ok. It uses memcached to track object serial numbers, and patches some SQLObject
# methods to catch reads / writes. Any read or write checks the serial number of
# the object from memcached, and refreshes the object if it's out of date.
#
# THIS CODE HAS (at least one) RACE CONDITION. Since the read - update - write
# cycle does not happen within the scope of a "transaction", the object in question
# could have changed after we test to see if our copy if up to date.
# This could obviously lead to bad results. You have been warned.
#
# David Creemer -- 17-Sep-2005 -- Placed in the public domain. NO WARRANTY.
# 

import sqlobject
import memcache

# set the memcache client list -- customize this as appropriate
mc = memcache.Client( [ '127.0.0.1:11211' ] )

class UnfreshObject(Exception): pass
class DeletedObject(Exception): pass

class SharedObject( sqlobject.SQLObject ):

    def _key( self ):
        return self.__class__.__name__ + str(self.id)
    
    def _use( self ):
        # call this when using the object for reading. This will ensure
        # we have the latest data by checking the memcached serial number
        if not hasattr( self, '_serial'):
            self._serial = 0
        
        k = self._key()
        s = mc.get( k )
        if s:
            if s == "DEL":
                raise DeletedObject()
            if s > self._serial:
                # we're out of date -- sync to db
                self.sync()
                self._serial = s
        else:
            # first time we are called
            mc.set( k, 0 )

    def _change( self ):
        # call then when you have changed an object
        # NOTE THAT THIS METHOD HAS A RACE CONDITION. The object could have been changed
        # after the test to see if we're up to date.
        if not hasattr( self, '_serial'):
            self._serial = 0
        k = self._key()
        s = mc.get( k )
        if s and s == "DEL":
            raise DeletedObject()
        s += 1
        mc.set( k, s )
        self._serial += 1
        if self._serial != s:
            raise UnfreshObject()

    def __getattribute__( self, name ):
        if name in ['name','val']:
            self._use()
        return sqlobject.SQLObject.__getattribute__( self, name )
        
    def set(self, **kw):
        try:
            self._change()
        except UnfreshObject:
            self._use()
        
        return sqlobject.SQLObject.set( self, **kw )

    def _SO_setValue(self, name, value, from_python, to_python):
        try:
            self._change()
        except UnfreshObject:
            print "*"
            self._use()
        
        return sqlobject.SQLObject._SO_setValue( self, name, value, from_python, to_python )
    
    def destroySelf( self ):
        # if the object is destroyed, leave a marker in the cache for a day
        mc.set( self._key(), "DEL", 60*60*24 )
        return sqlobject.SQLObject.destroySelf( self )

***
highlight file error
***

by David Creemer : 2005/09/18 : Categories python (permalink)



All content Copyright 2003-2005, David Z Creemer