deep inside: security and tools

Data driven factory: I give you data, you give me an object...

I've been working on a data warehouse project lately, in Python, to support the different kinds of data analysis I am developing as part of my current work. I decided to use SQLAlchemy as the ORM; I can then quickly move from my development version, using a SQLite database, to production, using MySQL or MSSQL databases.

SQLAlchemy is also one of those amazing ORMs that support sharding. Needless to say, that matters a lot when you develop a tool that will import, format, process and analyze gigabytes of data.

Also, since I work with a lot of data types that I register into my ORM instance and persist into a database, I need my software to be able to quickly generate an object representing each data type: a particular instance of the object. Developers usually create factories in order to create instances of objects. The main idea is to delegate the instantiation of the object to a third-party object. In most factories, we specify the type of object that we want to create: give me an instance of a pizza with mushrooms, tomatoes and ham.

That last point, asking for a particular type (or sub-type) of object, was the main limitation for my use. In fact, most of my types are related in some ways, but without strong inheritance (Dish > Pie > Pizza); another important point is the maintainability of code in which I would have to list all the different types of object my factory needs to create... Well, I wanted something more generic: a data driven factory.

The data driven factory is a factory that, based on the data sent to it, will produce an instance. A simple example would be to get an instance of a Margherita pizza when giving certain ingredients (tomatoes, mozzarella and parmesan), or a Napoletana if I add anchovies.

This type of factory, which depends only on the data given as parameters, is possible in Python thanks to the class inspection capabilities of the language. The implementation I propose requires two things. First, each class to be constructed is registered in the factory; its constructor arguments (and default arguments) are analyzed and kept for a matcher later on. Second, you give as arguments the "type" of each data field (basically, the argument names). The factory will then get the appropriate object for you.
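To make the register-then-match idea concrete, here is a minimal sketch. It is not the distributed DDFactory: the class name SketchFactory, the pizza classes and the matching rule (all required arguments present, no unknown field, fewest unused defaults wins) are assumptions made for illustration, and it targets Python 3's inspect.signature.

```python
import inspect


class SketchFactory:
    """Illustrative data driven factory (hypothetical, not the author's DDFactory)."""

    def __init__(self):
        self.registrar = {}

    def register(self, cls):
        # Inspect the constructor and split its parameters into
        # required names and names that carry a default value.
        required, optional = set(), set()
        for name, param in inspect.signature(cls.__init__).parameters.items():
            if name == 'self' or param.kind in (param.VAR_POSITIONAL,
                                                param.VAR_KEYWORD):
                continue
            if param.default is param.empty:
                required.add(name)
            else:
                optional.add(name)
        self.registrar[cls.__name__] = {
            'class': cls, 'required': required, 'optional': optional}

    def get(self, fields):
        # Return the class (not an instance) whose constructor fits the fields.
        fields = set(fields)
        candidates = []
        for entry in self.registrar.values():
            accepted = entry['required'] | entry['optional']
            if entry['required'] <= fields <= accepted:
                # Fewer unused defaults means a closer match.
                candidates.append((len(accepted - fields), entry['class']))
        if not candidates:
            return None
        return min(candidates, key=lambda c: c[0])[1]


# Hypothetical classes echoing the pizza example.
class Margherita:
    def __init__(self, tomatoes, mozzarella, parmesan=0):
        self.toppings = (tomatoes, mozzarella, parmesan)


class Napoletana:
    def __init__(self, tomatoes, mozzarella, anchovies, parmesan=0):
        self.toppings = (tomatoes, mozzarella, anchovies, parmesan)


factory = SketchFactory()
factory.register(Margherita)
factory.register(Napoletana)

plain = factory.get(['tomatoes', 'mozzarella', 'parmesan'])
salty = factory.get(['tomatoes', 'mozzarella', 'parmesan', 'anchovies'])
```

With this rule, adding 'anchovies' to the field list is enough to switch the result from Margherita to Napoletana, and a field list that fits no registered constructor yields None.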

Side note: the factory returns a class rather than an instance of an object, for performance reasons. In fact, I get the class from the factory, store it and loop through the instantiation with millions of data records...
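The pattern in this side note can be sketched as follows. The lookup_class function is a hypothetical stand-in for the factory lookup and the rows are made-up sample data; the point is only that the class is resolved once, outside the loop.

```python
class Circle:
    def __init__(self, center, radius=1.0):
        self.center = center
        self.radius = radius


def lookup_class(fields):
    # Hypothetical stand-in for the factory lookup: maps field names to a class.
    return Circle if set(fields) == {'center', 'radius'} else None


# Made-up sample rows; in practice this would be millions of records.
rows = [((0, 0), 1.0), ((1, 2), 0.5), ((3, 3), 2.0)]

# Resolve the class once...
cls = lookup_class(['center', 'radius'])
# ...then instantiate in a tight loop without asking the factory again.
shapes = [cls(center, radius) for center, radius in rows]
```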

Example of use:

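# Placeholder values; the real constants are not shown in this post.
RAD_MAX = 10.0
RAD_SMALL = 1.0

class Shape(object):
  pass

class Circle(Shape):
  def __init__(self, center, radius=RAD_MAX):
    pass

class DiskHole(Shape):
  def __init__(self, center, radius, small_radius=RAD_SMALL):
    pass

factory = DDFactory()
# Registration step, as described above: each class must be known to the factory.
factory.register(Circle)
factory.register(DiskHole)

print(factory.get(['center', 'radius']))
#> return 'Circle' ctor
print(factory.get(['center', 'radius', 'small_radius']))
#> return 'DiskHole' ctor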

You can access this factory here:

In the distributed code, I assume that each object to create has a __tablename__ class member that tells which database table is the eventual target (which is my case, using SQLAlchemy / declarative objects). This is easy to change by replacing the factory register method with something like this:

def register(self, cls):
  # Skip anything that has no constructor to inspect.
  if hasattr(cls, '__init__'):
    s_cls = str(cls)
    # Constructor argument names and their default values.
    args, defaults_dict = DDFactory.defaults_values(cls)
    if s_cls not in self.registrar:
      self.registrar[s_cls] = {
        'class' : cls,
        'args' : args,
        'defaults' : defaults_dict
      }
