SQLAlchemy is a next-generation Python Object Relational
mapper. Learn how to use the new 0.5 API, work with
third-party components, and a build a basic Web
application.
Introduction
Object Relational Mappers, or ORM's, have developed a
substantial amount of buzz in the last few years. Most of
this buzz is because ORM's are most often talked about
within the confines of Web application frameworks, as they
are critical component in a Rapid Development stack. Some
Web frameworks like Django and Ruby on Rails have taken the
approach of designing a monolithic stack that tightly
integrates a homegrown ORM to the framework, and others like
Pylons, Turbogears, and Grok have decided on more
component-based architecture with swappable third-party
components. Each approach has its advantages as a tight
integration can allow for a very cohesive experience if the
problem maps to the framework, and a component-based
architecture allows for maximum flexibility in design. This
article is not about Web frameworks, though; it is about
SQLAlchemy.
While SQLAlchemy is used in many of the next-generation
Python Web frameworks built on top of the WSGI
specification, it is developed as a standalone project by
Mike Bayer, and a core team of developers. One of the
advantages to using a standalone ORM like SQLAlchemy is that
it allows a developer to think about the data model first,
and lets them decide about how they want to visualize the
data later, be it in a command-line tool, or a Web
framework, or a GUI framework. This is a very different
approach to development than deciding to first use a Web
framework, or GUI framework, and then deciding how to use
use the data model within the confines of what the
frameworks philosophy will allow you to do.
One of the goals of SQLAlchemy is to provide an
enterprise-level persistence pattern that works with a wide
range of databases such as SQLite, MySQL, Postgres, Oracle,
MS-SQL, SQLServer, and Firebird. SQLAlchemy is in very
active development and the most current API revolves around
version 0.5. Please the resources section for links to the
official API documentation, tutorials, and a book on
SQLAlchemy.
One of the testiments to the success of SQLAlchemy is the
rich community that has developed around it. There are
several extensions and plugins to SQLAlchemy including:
declarative, Migrate, Elixir, SQLSoup, django-sqlalchemy,
DBSprockets, FormAlchemy, and z3c.sqlalchemy. In this
article we go through a tutorial on the new 0.5 API, examine
some of the third-party libraries, and finally look at how
it can be used in Pylons.
Installation
This article assumes you will use Python 2.5 or greater,
and subversion installed. Python 2.5 includes the SQLite
database and as such serves as a great in memory tool for
experimenting with SQLAlchemy. If you have Python 2.5
installed, you will only need to install the sqlalchemy 0.5
beta through the use of setuptools. To get the setup tools
script, download and run these four commands in your
terminal:
wget http://peak.telecommunity.com/dist/ez_setup.py
python ez_setup.py
sudo easy_install http://svn.sqlalchemy.org/sqlalchemy/trunk
sudo easy_install ipython
|
The first three lines of code check out the latest
version of sqlalchemy and add it as a package to the Python
installation on your local system. The final code snippit
installs IPython, which is a useful interactive Python
interpretor that I will use throughout the article. The
first thing to do is to test the version of SQLAlchemy
installed. You can test that you have version 0.5.x by
issuing this command in the IPython, or regular Python,
interpreter.
In [1]: import sqlalchemy
In [2]: sqlalchemy.__version__
Out[2]: '0.5.0beta1'
|
SQLAlchemy 0.5
quick-start guide
The new release of 0.5 brought about a few significant
changes to SQLAlchemy. Here is a quick rundown of the
changes:
- Declarative extension is the recommended way to
start in most cases.
- session.query() can accept any combination of
class/column expressions.
- session.query() is also more or less an ORM-enabled
replacement for select().
- Query has some experimental update()/delete()
methods on it for criterion-based updates/deletes.
- The Session expires itself after rollback() and
commit(); so using the default sessionmaker() means you
usually don't have to clear() or close(); Objects
synchronize with the current transaction automatically.
- Things go into the Session using session.add(),
session.add_all() (save/update/save_or_update are
deprecated).
Although the declarative extension has been in SQLAlchemy
since 0.4, it has undergone some small changes that make it
a powerful shortcut for most SQLAlchemy projects. The new
declarative syntax allows for the creation of a table,
class, and mapping to the database in one step. Let's take a
look at how this new syntax works for a tool that I wrote to
keep track of filesystem changes.
Listing 1. New SQLAlchemy
declarative style
#/usr/bin/env python2.5
#Noah Gift
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
Base = declarative_base()
class Filesystem(Base):
__tablename__ = 'filesystem'
path = Column(String, primary_key=True)
name = Column(String)
def __init__(self, path,name):
self.path = path
self.name = name
def __repr__(self):
return "<Metadata('%s','%s')>" % (self.path,self.name)
|
With this new declarative style, SQLAlchemy is able to create a table
in a database, create a class, and the mapping between the
class and the table in one spot. If you are new to
SQLAlchemy, you should probably learn this way of setting up
the ORM, but it is also good to know there is a more
explicit way of controlling each of these steps, as well, if
your projects require that level of detail.
If you look at this snippit of code, it is important to
point out a few things that might trip up people who are new
to either SQLAlchemy or to the new declarative extension.
The first thing to point out is that the line
Base = declarative_base()
|
creates a class that you then inherit from in the Filesystem
Class. If you save and run the code in the article
declarative_style, and then import it in IPython, you will
see this output:
In [2]: declarative_style.Filesystem?
Type: DeclarativeMeta
Base Class: <class 'sqlalchemy.ext.declarative.DeclarativeMeta'>
String Form: <class 'declarative_style.Filesystem'>
|
This DeclarativeMeta type is the magic that allows all of
the actions to occur in one simple class definition.
Another thing to point out about this example is that it
doesn't actually do anything yet. The actual table will not
be created until you run code to create a table, and you
also need to define what database engine SQLAlchemy will
use. Those two lines of code look like this:
engine = create_engine('sqlite:///meta.db', echo=True)
Base.metadata.create_all(engine)
|
SQlite is an ideal choice for experimentation with
SQLAlchemy, and you can choose to either use an in-memory
database, in which your line would look like this instead:
engine = create_engine('sqlite:///:memory:', echo=True)
|
or just create a simple file, as the first example
demonstrates. If you choose to create a SQLite file-based
database, you can easily start over from scratch without
deleting the file by just dropping all of the tables in the
database. You can do that by issuing this line of code:
Base.metadata.drop_all(engine)
|
At this point, we know enough to create a SQLAlchemy
project and control the database from the SQLAlchemy API.
The only other major item to tackle before getting into a
real-life example is the concept of a session. The
SQLAlchemy "official" documentation describes the session as
the handle to the database. In practical use, it allows for
distinct, transaction-based, connections to occur from a
pool of connections that SQLAlchemy has waiting. Inside of a
session it is typical to add data to the database, perform
queries, or delete data.
In order to create a session, peform these sequential
steps:
#establish Session type, only need to be done once for all sessions
Session = sessionmaker(bind=engine)
#create record object
create_record = Filesystem("/tmp/foo.txt", "foo.txt")
#make a unique session
session = Session()
#do stuff in session. We are adding a record here
session.add(create_record)
#commit the transaction
session.commit()
|
This is really all that needs to be done to get up and
running with SQLAlchemy. While SQLAlchemy has a very
sophisticated API that does many complicated things, it is
really very simple to get started with. To end this section,
I should also point out that when you created the engine in
the example above, you did it with echo=True. This is a very
handy way to see the actual SQL that is created by
SQLAlchemy. It is highly recommended to use this if you are
new to SQLAlchemy, as it will take away any perceived magic
about what SQLAlchemy does. Now run some of the code you
created and see the SQL to create a table.
Listing 2.
SQLAlchemy SQL table-creation output
2008-06-22 05:33:46,403 INFO
sqlalchemy.engine.base.Engine.0x..ec PRAGMA
table_info("filesystem")
2008-06-22 05:33:46,404 INFO sqlalchemy.engine.base.Engine.0x..ec {}
2008-06-22 05:33:46,405 INFO sqlalchemy.engine.base.Engine.0x..ec
CREATE TABLE filesystem (
path VARCHAR NOT NULL,
name VARCHAR,
PRIMARY KEY (path)
)
|
Pylesystem: A realtime
filesystem metadata indexer similar to Spotlight or Beagle
Talking abstractly about how you could use a tool is very
hard to follow for many people, so I'll show you how to
create a metadata tool using SQLAlchemy. The goal of this
tool is to monitor the filesystem, create and delete events,
and keep these changes in a SQLAlchemy database. If you have
ever used Spotlight on OS X, or Beagle on Linux®, then you
have used a real-time filesystem indexing tool. To follow
along, you need to run the Linux kernel 2.6.13 or higher.
The next example is a bit big, coming in at a little
under 100 lines of code. Look at the whole example, run it,
and then I will walk through what each section does. In
order to run this script, you must perform these actions in
the terminal:
- wget http://peak.telecommunity.com/dist/ez_setup.py
- sudo python ez_setup.py
- sudo easy_install
"http://git.dbzteam.org/?p=pyinotify.git;a=snapshot;h=HEAD;sf=tgz"
- sudo easy_install
http://svn.sqlalchemy.org/sqlalchemy/trunk
Listing 3. Filesystem
event-monitoring database
#/usr/bin/env python2.5
#Noah Gift 06/21/08
#tweaks by Mike Bayer 06/22/08
import os
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm import scoped_session
from pyinotify import *
path = "/tmp"
#SQLAlchemy
engine = create_engine('sqlite:///meta.db', echo=True)
Base = declarative_base()
Session = scoped_session(sessionmaker(bind=engine))
class Filesystem(Base):
__tablename__ = 'filesystem'
path = Column(String, primary_key=True)
name = Column(String)
def __init__(self, path,name):
self.path = path
self.name = name
def __repr__(self):
return "<Metadata('%s','%s')>" % (self.path,self.name)
def transactional(fn):
"""add transactional semantics to a method."""
def transact(self, *args):
session = Session()
try:
fn(self, session, *args)
session.commit()
except:
session.rollback()
raise
transact.__name__ = fn.__name__
return transact
class ProcessDir(ProcessEvent):
"""Performs Actions based on mask values"""
@transactional
def process_IN_CREATE(self, session, event):
print "Creating File and File Record:", event.pathname
create_record = Filesystem(event.pathname, event.path)
session.add(create_record)
@transactional
def process_IN_DELETE(self, session, event):
print "Removing:", event.pathname
delete_record = session.query(Filesystem).\
filter_by(path=event.pathname).one()
session.delete(delete_record)
def init_repository():
#Drop the table, then create again with each run
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
session = Session()
#Initial Directory Walking Addition Brute Force
for dirpath, dirnames, filenames in os.walk(path):
for file in filenames:
fullpath = os.path.join(dirpath, file)
record = Filesystem(fullpath, file)
session.add(record)
session.flush()
for record in session.query(Filesystem):
print "Database Record Number: Path: %s , File: %s " \
% (record.path, record.name)
session.commit()
if __name__ == "__main__":
init_repository()
#Pyionotify
wm = WatchManager()
mask = IN_DELETE | IN_CREATE
notifier = ThreadedNotifier(wm, ProcessDir())
notifier.start()
wdd = wm.add_watch(path, mask, rec=True)
|
To see the results of this script in action, you must
have two terminal windows open. In the first window, run the
pylesystem.py script. You will see a bunch of output that
looks something like this (Please note that the following
version is partially supressed for space):
2008-06-22 07:18:08,707 INFO
sqlalchemy.engine.base.Engine.0x..ec ['/tmp/ba.txt', 'ba.txt']
2008-06-22 07:18:08,710 INFO
sqlalchemy.engine.base.Engine.0x..ec COMMIT
2008-06-22 07:18:08,715 INFO
sqlalchemy.engine.base.Engine.0x..ec BEGIN
2008-06-22 07:18:08,716 INFO
sqlalchemy.engine.base.Engine.0x..ec SELECT filesystem.path
AS filesystem_path, filesystem.name AS filesystem_name
FROM filesystem
2008-06-22 07:18:08,716 INFO sqlalchemy.engine.base.Engine.0x..ec []
Database Record Number: Path: /tmp/ba.txt , File: ba.txt
|
This first script runs a multi-threaded file system
event-monitoring engine that writes all create and delete
changes to the /tmp directory into the sqlalchemy database.
Just a note: because it is multi-threaded, you will have to
type Control + \ to stop the threaded appliction when
you finish the tutorial.
Now that it is running, you can create events in the
second terminal window, and the newly created files or
deleted files will be added or subtracted to the database in
real time. If you simply "touch" a file in /tmp, such as
touch foobar.txt , you will see this output in
the first window:
Creating File and File Record: /tmp/foobar.txt
2008-06-22 08:02:19,468 INFO
sqlalchemy.engine.base.Engine.0x..4c BEGIN
2008-06-22 08:02:19,471 INFO
sqlalchemy.engine.base.Engine.0x..4c INSERT INTO filesystem (path, name) VALUES (?, ?)
2008-06-22 08:02:19,472 INFO
sqlalchemy.engine.base.Engine.0x..4c ['/tmp/foobar.txt', '/tmp']
2008-06-22 08:02:19,473 INFO
sqlalchemy.engine.base.Engine.0x..4c COMMIT
|
Remember earlier you enabled the SQL to echo? Because of
this, you can see the SQL statements as the code adds this
new entry to the filesystem. If you now delete that file,
you can see the delete happen, as well. Here is what it
looks like when you type in a rm statement, rm
foobar.txt :
Removing: /tmp/foobar.txt
2008-06-22 08:06:01,727 INFO
sqlalchemy.engine.base.Engine.0x..4c BEGIN
2008-06-22 08:06:01,733 INFO
sqlalchemy.engine.base.Engine.0x..4c SELECT filesystem.path
AS filesystem_path, filesystem.name AS filesystem_name
FROM filesystem
WHERE filesystem.path = ?
LIMIT 2 OFFSET 0
2008-06-22 08:06:01,733 INFO
sqlalchemy.engine.base.Engine.0x..4c ['/tmp/foobar.txt']
2008-06-22 08:06:01,736 INFO
sqlalchemy.engine.base.Engine.0x..4c DELETE FROM filesystem WHERE filesystem.path = ?
2008-06-22 08:06:01,736 INFO
sqlalchemy.engine.base.Engine.0x..4c [u'/tmp/foobar.txt']
2008-06-22 08:06:01,737 INFO
sqlalchemy.engine.base.Engine.0x..4c COMMIT
|
In the Filesystem class, you added a transactional method
that you will use a decorator to handle the semantics of the
commits to the database for filesystem events. The actual
Filesystem I/O monitoring in Pyinotify is done by the
ProcessDir class, which inherits and overrides methods from
ProcessEvents. If you notice the respective methods,
process_IN_CREATE and process_IN_DELETE each have the fancy
transactional decorator attached to them. All they do then
is take the create or delete events and make the changes to
the database.
There is also a method called initial_repository that
populates the database each time the script is run by
destroying and then recreating the tables to the database.
At the very bottom of the script, tell the Pyinotify code to
run indefinitely, and this is ultimately meant to run as a
deamon.
Summary
This article covered some of the new features of
SQLAlchemy, demonstrated how easy it is to get started, and
how very simple the API is to use. You also built an
incredibly powerful tool in under 100 lines of Python
code,by using the elegance of SQLAlchemy and the hard work
of an open-source library, Pyinotify. This is one of the
features of a simple yet powerful ORM. It takes the
mind-numbing complexity of dealing with relational
databases, and makes it a joy, not a burden. That energy can
then be devoted to spending time solving an interesting
problem, as SQLAlchemy will be the easy part.
If you are interested in learning more about SQLAlchemy,
you should read some of the
resources
listed at the end of this article. Listed there are an
excellent book and a tremendous amount of outstanding online
documentation. Finally, when you get the hang of things, you
might consider exploring some of the other projects that use
SQLAlchemy and extend it. One of the more interesting recent
SQLAlchemy-related projects is the Website reddit.com. It
was built using the pure WSGI framework Pylons, which
incorporates SQLAlchemy as its default ORM. I have included
a link to the full source code for reddit. With your
newfound knowledge of SQLAlchemy, you should be able to get
your own reddit clone up and running quickly, and should be
able to dig right into some of database queries. Good luck
and have fun! |
No comments:
Post a Comment