2014-05-12

A Dictionary-like Python interface for OData Part III: a SQL-backed OData Server

This is the third and last part of a series of three posts that introduce my OData framework for Python. To recap:

  1. In Part I I introduced a new data access layer I've written for Python that is modelled on the conventions of OData. In that post I validated the API by writing a concrete implementation in the form of an OData client.
  2. In Part II I used the same API and wrote a concrete implementation using a simple in-memory storage model. I also introduced the OData server functionality to expose the API via the OData protocol.
  3. In this part, I conclude this mini-series with a quick look at another concrete implementation of the API which wraps Python's DB API allowing you to store data in a SQL environment.

As before, you can download the source code from the QTIMigration Tool & Pyslet home page. I wrote a brief tutorial on using the SQL backed classes to take care of some of the technical details.

Rain or Shine?

To make this project a little more interesting I went looking for a real data set to play with. I'm a bit of a weather watcher at home and for almost 20 years I've enjoyed using a local weather station run by a research group at the University of Cambridge. The group is currently part of the Cambridge Computer Laboratory and the station has moved to the William Gates building.

The Database

The SQL implementation comes in two halves. The base classes are as close to standard SQL as I could get and then a small 'shim' sits over the top which binds to a specific database implementation. The Python DB API takes you most of the way, including helping out with the correct form of parameterisation to use. For this example project I used SQLite because the driver is typically available in Python implementations straight out of the box.

I wrote the OData-style metadata document first and used it to automatically generate the CREATE TABLE commands but in most cases you'll probably have an existing database or want to edit the generated scripts and run them by hand. The main table in my schema got created from this SQL:

CREATE TABLE "DataPoints" (
    "TimePoint" TIMESTAMP NOT NULL,
    "Temperature" REAL,
    "Humidity" SMALLINT,
    "DewPoint" REAL,
    "Pressure" SMALLINT,
    "WindSpeed" REAL,
    "WindDirection" TEXT,
    "WindSpeedMax" REAL,
    "SunRainStart" REAL,
    "Sun" REAL,
    "Rain" REAL,
    "DataPointNotes_ID" INTEGER,
    PRIMARY KEY ("TimePoint"),
    CONSTRAINT "DataPointNotes" FOREIGN KEY ("DataPointNotes_ID") REFERENCES "Notes"("ID"))

To expose the database via my new data-access-layer API you just load the XML metadata, create a SQL container object containing the concrete implementation and then you can access the data in exactly the same way as I did in Part's I and II. The code that consumes the API doesn't need to know if the data source is an OData client, an in memory dummy source or a full-blown SQL database. Once I'd loaded the data, here is a simple session with the Python interpreter that shows you the API in action.

>>> import pyslet.odata2.metadata as edmx
>>> import pyslet.odata2.core as core
>>> doc=edmx.Document()
>>> with open('WeatherSchema.xml','rb') as f: doc.Read(f)
... 
>>> from pyslet.odata2.sqlds import SQLiteEntityContainer
>>> container=SQLiteEntityContainer(filePath='weather.db',containerDef=doc.root.DataServices['WeatherSchema.CambridgeWeather'])
>>> weatherData=doc.root.DataServices['WeatherSchema.CambridgeWeather.DataPoints']
>>> collection=weatherData.OpenCollection()
>>> collection.OrderBy(core.CommonExpression.OrderByFromString('WindSpeedMax desc'))
>>> collection.SetPage(5)
>>> for e in collection.iterpage(): print "%s: Max wind speed: %0.1f mph"%(unicode(e['TimePoint'].value),e['WindSpeedMax'].value*1.15078)
... 
2002-10-27T10:30:00: Max wind speed: 85.2 mph
2004-03-20T15:30:00: Max wind speed: 82.9 mph
2007-01-18T14:30:00: Max wind speed: 80.6 mph
2004-03-20T16:00:00: Max wind speed: 78.3 mph
2005-01-08T06:00:00: Max wind speed: 78.3 mph

Notice that the container itself isn't needed when accessing the data because the SQLiteEntityContainer __init__ method takes care of binding the appropriate collection classes to the model passed in. Unfortunately the dataset doesn't go all the way back to the great storm of 1987 which is a shame as at the time I was living in a 5th floor flat perched on top of what I was reliably informed was the highest building in Cambridge not to have some form of structural support. I woke up when the building shook so much my bed moved across the floor.

Setting up a Server

I used the same technique as I did in Part II to wrap the API with an OData server and then had some real fun getting it up and running on Amazon's EC2. Pyslet requires Python 2.7 but EC2 Linux comes with Python 2.6 out of the box. Thanks to this blog article for help with getting Python 2.7 installed. I also had to build mod_wsgi from scratch in order to get it to pick up the version I wanted. Essentially here's what I did:

# Python 2.7 install
sudo yum install make automake gcc gcc-c++ kernel-devel git-core -y
sudo yum install python27-devel -y
# Apache install
#  Thanks to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-LAMP.html
sudo yum groupinstall -y "Web Server"
sudo service httpd start
sudo chkconfig httpd on

And to get mod_wsgi working with Python2.7...

sudo bash
cd
yum install httpd-devel -y
mkdir downloads
cd downloads
wget http://modwsgi.googlecode.com/files/mod_wsgi-3.4.tar.gz
tar -xzvf mod_wsgi-3.4.tar.gz
cd mod_wsgi-3.4
./configure --with-python=/usr/bin/python2.7
make
make install
# Optional check to ensure that we've got the correct Python linked
# you should see the 2.7 library linked
ldd /etc/httpd/modules/mod_wsgi.so
service httpd restart

To drive the server with mod_wsgi I used a script like this:

#! /usr/bin/env python

import logging, os.path
import pyslet.odata2.metadata as edmx
from pyslet.odata2.sqlds import SQLiteEntityContainer
from pyslet.odata2.server import ReadOnlyServer

HOME_DIR=os.path.split(os.path.abspath(__file__))[0]
SERVICE_ROOT="http://odata.pyslet.org/weather"

logging.basicConfig(filename='/var/www/wsgi-log/python.log',level=logging.INFO)

doc=edmx.Document()
with open(os.path.join(HOME_DIR,'WeatherSchema.xml'),'rb') as f:
    doc.Read(f)

container=SQLiteEntityContainer(filePath=os.path.join(HOME_DIR,'weather.db'),
    containerDef=doc.root.DataServices['WeatherSchema.CambridgeWeather'])

server=ReadOnlyServer(serviceRoot=SERVICE_ROOT)
server.SetModel(doc)

def application(environ, start_response):
 return server(environ,start_response)

I'm relying on the fact that Apache is configured to run Python internally and that my server object persists between calls. I think by default mod_wsgi serialises calls to the application method but a smarter configuration with a multi-threaded daemon would be OK because the server and container objects are thread safe. There are limits to the underlying SQLite module of course so you may not gain a lot of performance this way but a proper database would help.

Try it out!

If you were watching carefully you'll see that the above script uses a public service root. So let's try the same query but this time using OData. Here it is in Firefox:

Notice that Firefox recognises that the OData feed is an Atom feed and displays the syndication title and updated date. I used the metadata document to map the temperature and the date of the observation to these (you can see they are the same data points as above by the matching dates). The windiest days are never particularly hot or cold in Cambridge because they are almost always associated with Atlantic storms and the sea temperature just doesn't change that much.

The server is hosted at http://odata.pyslet.org/weather