Monday, June 26, 2017

Implementing a “soft delete” system using sqlalchemy

Leave a Comment

We are creating a service for an app using tornado and sqlalchemy. The application is written in django and uses a "soft delete mechanism". What that means is that there was no deletion in the underlying mysql tables. To mark a row as deleted we simply set the attributed "delete" as True. However, in the service we are using sqlalchemy. Initially, we started to add check for delete in the queries made through sqlalchemy itself like:

customers = db.query(Customer).filter(not_(Customer.deleted)).all() 

However this leads to a lot of potential bugs because developers tend to miss the check for deleted in there queries. Hence we decided to override the default querying with our query class that does a "pre-filter":

class SafeDeleteMixin(Query):     def __iter__(self):         return Query.__iter__(self.deleted_filter())     def from_self(self, *ent):         # override from_self() to automatically apply         # the criterion too.   this works with count() and         # others.         return Query.from_self(self.deleted_filter(), *ent)     def deleted_filter(self):         mzero = self._mapper_zero()         if mzero is not None:             crit = mzero.class_.deleted == False             return self.enable_assertions(False).filter(crit)         else:             return self 

This inspired from a solution on sqlalchemy docs here:

https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery 

However, we are still facing issues, like in cases where we are doing filter and update together and using this query class as defined above the update does not respect the criterion of delete=False when applying the filter for update.

db = CustomSession(with_deleted=False)() result = db.query(Customer).filter(Customer.id == customer_id).update({Customer.last_active_time: last_active_time }) 

How can I implement the "soft-delete" feature in sqlalchemy

2 Answers

Answers 1

I've done something similar here. We did it a bit differently, we made a service layer that all database access goes through, kind of like a controller, but only for db access, we called it a ResourceManager, and it's heavily inspired by "Domain Driven Design" (great book, invaluable for using SQLAlchemy well). A derived ResourceManager exists for each aggregate root, ie. each resource class you want to get at things through. (Though sometimes for really simple ResourceManagers, the derived manager class itself is generated dynamically) It has a method that gives out your base query, and that base query gets filtered for your soft delete before it's handed out. From then on, you can add to that query generatively for filtering, and finally call it with query.one() or first() or all() or count(). Note, there is one gotcha I encountered for this kind of generative query handling, you can hang yourself if you join a table too many times. In some cases for filtering we had to keep track of which tables had already been joined. If your delete filter is off the primary table, just filter that first, and you can join willy nilly after that.

so something like this:

class ResourceManager(object):      # these will get filled in by the derived class      # you could use ABC tools if you want, we don't bother      model_class = None      serializer_class = None       # the resource manager gets instantiated once per request      # and passed the current requests SQAlchemy session       def __init__(self, dbsession):          self.dbs = dbsession        # hand out base query, assumes we have a boolean 'deleted' column      @property      def query(self):          return self.dbs(self.model_class).filter(             getattr(self.model_class, 'deleted')==False)   class UserManager(ResourceManager):      model_class = User   # some client code might look this  dbs = SomeSessionFactoryIHave()  user_manager = UserManager(dbs)     users = user_manager.query.filter_by(name_last="Duncan").first()         

Now as long as I always start off by going through a ResourceManager, which has other benefits too (see aforementioned book), I know my query is pre-filtered. This has worked very well for us on a current project that has soft-delete and quite an extensive and thorny db schema.

hth!

Answers 2

I would create a function

def customer_query():     return db.session.query(Customer).filter(Customer.deleted == False) 

I used query functions to not forget default flags, to set flags based on user permission, filter using joins etc, so that these things wont be copy-pasted and forgotten at various places.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment