I've been working on a data warehouse project lately, in Python, to support the different kinds of data analysis I am developing as part of my current work. I decided to use SQLAlchemy as the ORM, so that I can quickly move from my development version, which uses a SQLite database, to production, which uses MySQL or MSSQL databases.
SQLAlchemy is also one of those amazing ORMs that support sharding. Needless to say, this matters a lot when you develop a tool that will import, format, process and analyze gigabytes of data.
Also, because I work with a lot of data types, register them with my ORM instance and persist them into a database, I need my software to be able to quickly generate an object representing the data type: a particular instance of the object. Developers usually create factories in order to create instances of objects. The main idea is to delegate the instantiation of the object to a third-party object. In most factories, we specify the type of object that we want to create: "Give me an instance of a pizza with mushrooms, tomatoes and ham."
This last point, asking for a particular type (or sub-type) of object, was the main limitation for my use. In fact, most of my types are related in some ways, but without strong inheritance (Dish > Pie > Pizza). Another important point is the maintainability of code in which I would have to list all the different types of objects my factory needs to create... Well, I wanted something more generic: a data-driven factory.
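To make the limitation concrete, here is a minimal sketch of a conventional, type-keyed factory. All names in it (Pizza, Pie, DishFactory, create) are hypothetical and only serve the illustration; they are not part of my project.

```python
# Hypothetical classes, for illustration only.
class Pizza(object):
    def __init__(self, toppings):
        self.toppings = list(toppings)


class Pie(object):
    def __init__(self, filling):
        self.filling = filling


class DishFactory(object):
    """Classic factory: the caller has to name the type it wants."""

    # Every new type must be added to this table by hand.
    _types = {'pizza': Pizza, 'pie': Pie}

    def create(self, type_name, *args, **kwargs):
        try:
            cls = self._types[type_name]
        except KeyError:
            raise ValueError('unknown type: %r' % (type_name,))
        return cls(*args, **kwargs)


factory = DishFactory()
pizza = factory.create('pizza', ['mushroom', 'tomatoes', 'ham'])
```

The caller needs to know the name 'pizza' in advance, and the _types table grows with every new class. Those are the two points I wanted to avoid.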
The data-driven factory is a factory that, based on the data sent to the factory object constructor, will produce an instance. A simple example would be getting an instance of a Margherita pizza when giving certain ingredients (tomatoes, mozzarella and parmesan), or a Napoletana if I add anchovies.
This type of factory, which depends only on the data given as parameters, is possible in Python thanks to the class inspection capabilities of the language. The implementation I propose requires you to register each class to be constructed with the factory; its constructor arguments (and their default values) are analyzed and stored for matching later on. You then give the factory the "type" of each data field (basically, the argument names), and it gets the appropriate object for you.
Side note: the factory returns a class rather than an instance of an object, for performance reasons. In fact, I get the class from the factory, store it, and loop through the instantiation with millions of records...
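The mechanism described above can be sketched roughly as follows. This is not the distributed dd_factory.py, only an illustration of the idea using the standard inspect module. The matching rule (every field must be an accepted argument, every argument without a default must be supplied, and ties go to the class leaving the fewest arguments at their defaults) is an assumption made for this sketch, and the pizza classes are hypothetical.

```python
import inspect


class DDFactory(object):
    """Illustrative data-driven factory (a sketch, not dd_factory.py)."""

    def __init__(self):
        self.registrar = {}

    @staticmethod
    def defaults_values(cls):
        """Return (argument names, {name: default}) for the constructor of cls."""
        args, defaults = [], {}
        for name, param in inspect.signature(cls).parameters.items():
            # Skip *args / **kwargs: they carry no field name to match on.
            if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                              inspect.Parameter.VAR_KEYWORD):
                continue
            args.append(name)
            if param.default is not inspect.Parameter.empty:
                defaults[name] = param.default
        return args, defaults

    def register(self, cls):
        args, defaults = self.defaults_values(cls)
        self.registrar.setdefault(
            str(cls), {'class': cls, 'args': args, 'defaults': defaults})

    def get(self, fields):
        """Return the registered class whose constructor best fits `fields`."""
        fields = set(fields)
        best, best_unused = None, None
        for entry in self.registrar.values():
            args = set(entry['args'])
            required = args - set(entry['defaults'])
            # Assumed rule: all fields accepted, all required arguments given.
            if not (required <= fields <= args):
                continue
            unused = len(args - fields)  # arguments left to their defaults
            if best is None or unused < best_unused:
                best, best_unused = entry['class'], unused
        if best is None:
            raise LookupError('no registered class accepts %r' % sorted(fields))
        return best


# Hypothetical classes, echoing the pizza example.
class Margherita(object):
    def __init__(self, tomatoes, mozzarella, parmesan):
        self.toppings = (tomatoes, mozzarella, parmesan)


class Napoletana(object):
    def __init__(self, tomatoes, mozzarella, parmesan, anchovies):
        self.toppings = (tomatoes, mozzarella, parmesan, anchovies)


factory = DDFactory()
factory.register(Margherita)
factory.register(Napoletana)

# Look the class up once, then instantiate it for every record.
fields = ['tomatoes', 'mozzarella', 'parmesan', 'anchovies']
pizza_cls = factory.get(fields)
rows = [(2, 1, 1, 4), (3, 1, 2, 6)]
pizzas = [pizza_cls(**dict(zip(fields, row))) for row in rows]
```

Fetching the class once and calling it inside the loop keeps the matching cost out of the per-record path, which is the point of the side note.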
Example of use:
    # RAD_MAX and RAD_SMALL are constants defined elsewhere.
    class Shape(object):
        pass

    class Circle(Shape):
        def __init__(self, center, radius=RAD_MAX):
            ...

    class DiskHole(Shape):
        def __init__(self, center, radius, small_radius=RAD_SMALL):
            ...

    factory = DDFactory()
    factory.register(Shape)
    factory.register(Circle)
    factory.register(DiskHole)

    print(factory.get(['center', 'radius']))                  # returns the Circle constructor
    print(factory.get(['center', 'radius', 'small_radius']))  # returns the DiskHole constructor
You can access this factory here: dd_factory.py
In the distributed code, I assume that each object to create has a __tablename__ class member that tells which database table is the eventual target (which is my case, using SQLAlchemy declarative objects). This is easy to change by replacing the factory's register method with something like this:
    def register(self, cls):
        if hasattr(cls, '__init__'):
            s_cls = str(cls)
            args, defaults_dict = DDFactory.defaults_values(cls)
            if s_cls not in self.registrar:
                self.registrar[s_cls] = {
                    'class': cls,
                    'args': args,
                    'defaults': defaults_dict
                }