2014-02-24

A Dictionary-like Python interface for OData Part II: a Memory-backed OData Server

In my previous post, A Dictionary-like Python interface for OData, I introduced a new sub-package I've added to Pyslet to implement support for OData version 2. You can download the latest version of the Pyslet package from the QTI Migration Tool & Pyslet home page.

To recap, I've decided to set about writing my own data access layer for Python that is modelled on the conventions of OData. I've validated the API by writing a concrete implementation in the form of an OData client. In this post I'll introduce the next step in the process, which is a simple alternative implementation that uses a different underlying storage model; in other words, an implementation which uses something other than a remote OData server. I'll then expose this implementation as an OData server to validate that my data access layer API works from both perspectives.

Metadata

Unlike other frameworks for implementing OData services, Pyslet starts with the metadata model: it is not automatically generated from your code, you must write it yourself. This differs from the object-first approach taken by other frameworks, illustrated here:

This picture is typical of a project using something like Microsoft's WCF. Essentially, there's a two-step process. You use something like Microsoft's Entity Framework to generate classes from a database schema, customise the classes a little, and then the metadata model is auto-generated from your code model. Of course, you can go straight to code and implement your own code model that implements the appropriate queryable interface, but this would typically be done for a specific model.

Contrast this with the approach taken by Pyslet, where the entities are not model-specific classes. For example, when modelling the Northwind service there is no Python class called Product, as there would be in the approach taken by other frameworks. Instead there is a generalised implementation of Entity which behaves like a dictionary. The main difference is probably that you'll use supplier['Phone'] instead of simply supplier.phone or, if you'd gone down the getter/setter route, supplier.GetPhone(). In my opinion, this works better than a tighter binding for a number of reasons, but particularly because it makes the user more mindful of when data access is happening and when it isn't.

Using a looser binding also helps prevent the type of problems I had during the development of the QTI specification. Lots of people were using Java and JAXB to autogenerate classes from the XML specification (cf. autogenerating classes from a database schema), but the QTI model contained a class attribute on most elements to allow for stylesheet support. This class attribute prevented auto-generation because class is a reserved word in the Java language. Trying to fix this up after auto-generation would be madness, but fixing it up beforehand turned out to be a little tricky, and the glitch seriously damaged the specification's user experience. We got over it, but I'm wary now, and when modelling OData I stepped back from a tighter binding, in part, to prevent hard-to-fix glitches like the use of Python reserved words as property names.
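
To make that reserved-word point concrete, here's a throwaway sketch using a plain dictionary to stand in for a Pyslet entity (nothing below is Pyslet-specific):

# A sketch: property names that happen to be language keywords are perfectly
# good dictionary keys...
item = {'class': 'qti-item', 'identifier': 'Q1'}
print item['class']

# ...but they could never be attribute names: the line below would be a
# SyntaxError in Python if uncommented, which is the JAXB problem all over again.
# print item.class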

Allocating Storage

For this blog post I'm using a lightweight in-memory data storage implementation which can be automatically provisioned from the metadata document, and I'm going to cheat by making a copy of the metadata document used by the Northwind service. Exposing OData the Pyslet way is a little more work if you already have a SQL database containing your data, because I don't have a tool that auto-generates the metadata document from the SQL database schema. Automating the other direction is easy, but more on that in Part III.

I used my web browser to grab a copy of http://services.odata.org/V2/Northwind/Northwind.svc/$metadata and saved it to a file called Northwind.xml. I can then load the model from the interpreter:

>>> import pyslet.odata2.metadata as edmx
>>> doc=edmx.Document()
>>> f=open('Northwind.xml')
>>> doc.Read(f)
>>> f.close()

This special Document class ensures that the model is loaded with Pyslet's own element implementations. The Products entity set can be looked up directly, but at the moment it's empty!

>>> productSet=doc.root.DataServices['ODataWeb.Northwind.Model.NorthwindEntities.Products']
>>> products=productSet.OpenCollection()
>>> len(products)
0
>>> products.close()

This isn't surprising: there is nothing in the metadata model itself that binds it to the data service at services.odata.org. The model isn't linked to any actual storage for the data. By default, the model behaves as if it is bound to an empty, read-only data store.

To help me validate that my API can be used for something other than talking to real OData services, I've created an object that provisions storage for an EntityContainer (that's like a database in OData) using standard Python dictionaries. By passing the definition of an EntityContainer to the object's constructor, I create a binding between the model and this new data store.

>>> from pyslet.odata2.memds import InMemoryEntityContainer
>>> container=InMemoryEntityContainer(doc.root.DataServices['ODataWeb.Northwind.Model.NorthwindEntities'])
>>> collection=productSet.OpenCollection()
>>> len(collection)
0

The collection of products is still empty, but it is now writeable. I'm going to cheat again to illustrate this by borrowing some code from the previous blog post to open an OData client connected to the real Northwind service.

>>> from pyslet.odata2.client import Client
>>> c=Client("http://services.odata.org/V2/Northwind/Northwind.svc/")
>>> nwProducts=c.feeds['Products'].OpenCollection()

Here's a simple loop to copy the products from the real service into my own collection. It's a bit clumsy in the interpreter but careful typing pays off:

>>> for nwProduct in nwProducts.itervalues():
...   product=collection.CopyEntity(nwProduct)
...   product.SetKey(nwProduct.Key())
...   collection.InsertEntity(product)
... 
>>> len(collection)
77

To emphasise the difference between my in-memory collection and the live OData service, I'll add another record to my copy of this entity set. Fortunately, most of the fields are marked as Nullable in the model, so to save my fingers I'll just set those that aren't.

>>> product=collection.NewEntity()
>>> product.SetKey(100)
>>> product['ProductName'].SetFromValue("The one and only Pyslet")
>>> product['Discontinued'].SetFromValue(False)
>>> collection.InsertEntity(product)
>>> len(collection)
78

Now I can do everything with my copy of the service that I could do with the OData client. I'll filter the entities to make the output easier to see:

>>> import pyslet.odata2.core as core
>>> filter=core.CommonExpression.FromString("substringof('one',ProductName)")
>>> collection.Filter(filter)
>>> for p in collection.itervalues(): print p.Key(), p['ProductName'].value
... 
21 Sir Rodney's Scones
32 Mascarpone Fabioli
100 The one and only Pyslet

I can access my own data store using the same API that I used to access a remote OData service in the previous post. In that post, I also claimed that it was easy to wrap my own implementations of this API to expose it as an OData service.

Exposing an OData Server

My OData server class implements the WSGI protocol, so it is easy to link it up to a simple HTTP server and tell it to handle a single request.

>>> from pyslet.odata2.server import Server
>>> server=Server("http://localhost:8081/")
>>> server.SetModel(doc)
>>> from wsgiref.simple_server import make_server
>>> httpServer=make_server('',8081,server)
>>> httpServer.handle_request()

My interpreter session is hanging at this point, waiting for a single HTTP connection. The Northwind service doesn't have any feed customisations on the Products feed and, as we slavishly copied it, the Atom view in the browser is a bit boring, so I used the excellent JSONView plugin for Firefox and the following URL to hit my service:

http://localhost:8081/Products?$filter=substringof('one',ProductName)&$orderby=ProductID desc&$format=json

This is the same filter as I used in the interpreter before, but I've added an ordering and specified my preference for JSON format. Here's the result.
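
If you'd rather stay in Python than install a browser plugin, something along these lines from a second interpreter should fetch the same response. This is just a sketch of mine, not part of the original session; urllib.quote takes care of percent-encoding the quotes and the space:

import urllib, urllib2

# Build the query by hand so the $-prefixed option names stay as they are.
url = ('http://localhost:8081/Products'
       '?$filter=' + urllib.quote("substringof('one',ProductName)") +
       '&$orderby=' + urllib.quote('ProductID desc') +
       '&$format=json')
print urllib2.urlopen(url).read()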

As I did this, Python's simple server object logged the following output to my console:

127.0.0.1 - - [24/Feb/2014 11:17:05] "GET /Products?$filter=substringof(%27one%27,ProductName)&$orderby=ProductID%20desc&$format=json HTTP/1.1" 200 1701
>>>

The in-memory data store is a bit of a toy, though some more useful applications might be possible. In the OData documentation I go through a tutorial on how to create a lightweight memory cache of key-value pairs exposed as an OData service. I'm not really suggesting using it in a production environment to replace memcached. What this implementation is really useful for is developing and testing applications that consume the DAL API without needing to be connected to the real data source. Also, it can be wrapped in the OData Server class as shown above and used to provide a more realistic mock of an actual service, for testing that your consumer application still works when the data service is remote. I've used it in Pyslet's unit tests this way.
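
To give a flavour of that testing pattern, here's a sketch that pulls together the pieces from this post: the in-memory container served up on a background thread and consumed through the ordinary OData client. The class and model names are the ones used above; the threading arrangement is my own illustration rather than something lifted from Pyslet's unit tests.

import threading
from wsgiref.simple_server import make_server

import pyslet.odata2.metadata as edmx
from pyslet.odata2.memds import InMemoryEntityContainer
from pyslet.odata2.server import Server
from pyslet.odata2.client import Client

# Provision an empty, writable copy of the Northwind model in memory
doc = edmx.Document()
with open('Northwind.xml') as f:
    doc.Read(f)
container = InMemoryEntityContainer(
    doc.root.DataServices['ODataWeb.Northwind.Model.NorthwindEntities'])

# Wrap it in the OData server and serve it on a background thread
server = Server("http://localhost:8081/")
server.SetModel(doc)
httpd = make_server('', 8081, server)
t = threading.Thread(target=httpd.serve_forever)
t.setDaemon(True)
t.start()

# The code under test only ever sees an OData URL
c = Client("http://localhost:8081/")
products = c.feeds['Products'].OpenCollection()
print len(products)    # 0 until the test populates the container
products.close()

httpd.shutdown()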

In the third and final part of this Python and OData series I'll cover a more interesting implementation of the API using the SQLite database.

2014-02-12

A Dictionary-like Python interface for OData

Overview

This blog post introduces some new modules that I've added to the Pyslet package I wrote. Pyslet's purpose is providing support for Standards for Learning, Education and Training in Python. The new modules implement the OData protocol and expose it through a dictionary-like interface. You can download Pyslet from the QTI Migration Tool & Pyslet home page. There is some documentation linked from the main Pyslet wiki. This blog article is as good a way as any to get you started.

The Problem

Python has a database API which does a good job, but it is not the whole solution for data access. Embedding SQL statements in code, grappling with the complexities of parameterization and dealing with individual database quirks all make it useful to have some sort of layer between your web app and the database API, so that you can tweak your code as you move between data sources.
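
To make just one of those quirks concrete, here's a small sketch of my own using the standard sqlite3 module: the database API leaves the placeholder style up to the driver, so the "same" parameterized query has to change as you move between data sources.

import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE Products (ProductID INTEGER, ProductName TEXT)")
cur.execute("INSERT INTO Products VALUES (?, ?)", (21, "Sir Rodney's Scones"))

# sqlite3 uses 'qmark' style placeholders...
cur.execute("SELECT ProductName FROM Products WHERE ProductID=?", (21,))
print cur.fetchone()[0]

# ...whereas drivers like psycopg2 or MySQLdb expect 'format' style, so the
# same statement would need %s in place of each ?
conn.close()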

If SQL has failed to be a really interoperable standard then perhaps OData, the new kid on the block, can fill the vacuum. The standard is sometimes referred to as "ODBC over the web" so it is definitely in this space (after all, who runs their database on the same server as their web app these days?).

My Solution

To solve this problem I decided to set about writing my own data access layer that would be modelled on the conventions of OData but that used some simple concepts in Python. I decided to go down the dictionary-like route, rather than simulating objects with attributes, because I find the code more transparent that way. Implementing methods like __getitem__, __setitem__ and itervalues keeps the data layer abstraction at arm's length from the basic Python machinery. It is a matter of taste. See what you think.
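
As a taste of what that looks like, here's a minimal sketch of the idea using toy classes of my own (not Pyslet's actual code): the special methods make the collection feel like a dictionary while keeping every data access an explicit item lookup.

class ToyCollection(object):
    """A toy dictionary-like collection; a real implementation would run a
    query inside __getitem__ rather than reading a dict."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        return self.rows[key]

    def __setitem__(self, key, entity):
        self.rows[key] = entity

    def __len__(self):
        return len(self.rows)

    def itervalues(self):
        return self.rows.itervalues()


products = ToyCollection({21: {'ProductName': "Sir Rodney's Scones"}})
print products[21]['ProductName']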

The vision here is to write a single API (represented by a set of base classes) that can be implemented in different ways to access different data sources. There are three steps:

  1. An implementation that uses the OData protocol to talk to a remote OData service.
  2. An implementation that uses Python dictionaries to create a transient in-memory data service for testing.
  3. An implementation that uses the Python database API to access a real database.

This blog post is mainly about the first step, which should validate the API as being OData-like and set the groundwork for the others which I'll describe in subsequent blog posts. Incidentally, it turns out to be fairly easy to write an OData server that exposes a data service written to this API; more on that in future posts.

Quick Tutorial

The client implementation uses Python's logging module to provide logging. To make it easier to see what is going on during this walk-through, I'm going to turn logging up from the default "WARN" to "INFO":

>>> import logging
>>> logging.basicConfig(level=logging.INFO)

To create a new OData client you simply instantiate a Client object, passing the URL of the OData service root. Notice that, during construction, the Client object downloads the list of feeds followed by the metadata document. The metadata document is used extensively by this module and is loaded into a DOM-like representation.

>>> from pyslet.odata2.client import Client
>>> c=Client("http://services.odata.org/V2/Northwind/Northwind.svc/")
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/ HTTP/1.1
INFO:root:Finished Response, status 200
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/$metadata HTTP/1.1
INFO:root:Finished Response, status 200

Client objects have a feeds attribute that is a plain dictionary mapping the exposed feeds (by name) onto EntitySet objects. These objects are part of the metadata model but serve a special purpose in the API: they can be opened (a bit like files or directories) to gain access to the (collections of) entities themselves. Collection objects can be used in the with statement, and that's normally how you'd use them, but I'm sticking with the interactive terminal for now.

>>> products=c.feeds['Products'].OpenCollection()
>>> for p in products: print p
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1
2
3
... [and so on]
...
20
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21
22
23
... [and so on]
...
76
77

The products collection behaves like a dictionary: iterating through it iterates through the keys in the dictionary. In this case these are the keys of the entities in the collection of products in Microsoft's sample Northwind data service. Notice that the client logs several requests to the server interspersed with the printed output. That's because the server is limiting the maximum page size and the client is following the page links provided. These calls are made as you iterate through the collection, allowing you to iterate through very large collections without loading everything into memory.

The keys alone are of limited interest, so let's try a similar loop, but this time we'll print the product names as well:

>>> for k,p in products.iteritems(): print k,p['ProductName'].value
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1 Chai
2 Chang
3 Aniseed Syrup
...
...
20 Sir Rodney's Marmalade
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21 Sir Rodney's Scones
22 Gustaf's Knäckebröd
23 Tunnbröd
...
...
76 Lakkalikööri
77 Original Frankfurter grüne Soße

Sir Rodney's Scones sound interesting; we can grab an individual record just as we normally would from a dictionary, by using its key.

>>> scones=products[21]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21) HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in scones.DataItems(): print k,v.value
... 
ProductID 21
ProductName Sir Rodney's Scones
SupplierID 8
CategoryID 3
QuantityPerUnit 24 pkgs. x 4 pieces
UnitPrice 10.0000
UnitsInStock 3
UnitsOnOrder 40
ReorderLevel 5
Discontinued False

The scones object is an Entity object. It too behaves like a dictionary. The keys are the property names and the values are instances of SimpleValue, Complex or DeferredValue. In the snippet above I've used a variation of iteritems which iterates only through the data properties, excluding the navigation properties. In this model there are no complex properties. The simple values have a value attribute which contains a Python representation of the value.

Deferred values (navigation properties) can be used to navigate between entities. Although deferred values can be opened just like EntitySets, if the model dictates that at most one entity can be linked, a convenience method called GetEntity can be used to open the collection and read the entity in one call. In this case, a product can have at most one supplier.

>>> supplier=scones['Supplier'].GetEntity()
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21)/Supplier HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in supplier.DataItems(): print k,v.value
... 
SupplierID 8
CompanyName Specialty Biscuits, Ltd.
ContactName Peter Wilson
ContactTitle Sales Representative
Address 29 King's Way
City Manchester
Region None
PostalCode M14 GSD
Country UK
Phone (161) 555-4448
Fax None
HomePage None
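
Going the other way, a supplier links to many products, so the deferred value is opened as a collection rather than read with GetEntity. Here's a sketch continuing the session above; the Products navigation property name comes from the Northwind model and I'm assuming it opens just like an entity set, as described earlier:

suppliedProducts = supplier['Products'].OpenCollection()
for k, p in suppliedProducts.iteritems():
    print k, p['ProductName'].value
suppliedProducts.close()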

Continuing with the dictionary-like theme, attempting to load a non-existent entity results in a KeyError:

>>> p=products[211]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(211) HTTP/1.1
INFO:root:Finished Response, status 404
Traceback (most recent call last):
  File "", line 1, in 
  File "/Library/Python/2.7/site-packages/pyslet/odata2/client.py", line 165, in __getitem__
 raise KeyError(key)
KeyError: 211
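
In application code you'd guard the lookup just as you would for any other dictionary; a quick sketch:

try:
    p = products[211]
except KeyError:
    p = None    # not found: fall back, create it, or report your own 404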

Finally, when we're done, it is a good idea to close the open collection. If we'd used the with statement, this step would have been done automatically for us, of course.

>>> products.close()
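
For completeness, the with-statement form mentioned above would look roughly like this (a sketch for a script rather than the interactive prompt; the collection is closed automatically when the block exits):

with c.feeds['Products'].OpenCollection() as products:
    for k, p in products.iteritems():
        print k, p['ProductName'].value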

Limitations

Currently the client only supports OData version 2. Version 3 has now been published and I do intend to update the classes to speak version 3 at some point. If you try to connect to a version 3 service, the client will complain when it tries to load the metadata document. There are ways around this limitation; if you are interested, add a comment to this post and I'll add some documentation.

The client only speaks XML, so if your service only speaks JSON it won't work at the moment. Most of the JSON code is done and tested, so adding it shouldn't be a big issue if you are interested.

The client can be used to both read and write to a service, and there are even ways of passing basic authentication credentials. However, when calling an https URL it doesn't do certificate validation at the moment, so be warned that your security could be compromised. Python 2.7 does now support certificate validation using OpenSSL, so this could change quite easily I think.

Moving to Python 3 is non-trivial - let me know if you are interested. I have taken the first steps (running unit tests with "python -3Wd" to force warnings) and, as much as possible, the code is ready for migration. I haven't tried it yet though and I know that some of the older code (we're talking 10-15 years here) is a bit sensitive to the raw/unicode string distinction.

The documentation is currently about 80% accurate and only about 50% useful. Trending upwards though.

Downloading and Installing Pyslet

Pyslet is pure Python. If you are only interested in OData you don't need any other modules, just Python 2.7 and a reasonable setuptools to help you install it. I just upgraded my machine to Mavericks, which effectively reset my Python environment. Here's what I did to get Pyslet running.

  1. Installed setuptools
  2. Downloaded the pyslet package tgz and unpacked it (download from here)
  3. Ran python setup.py install

Why?

Some lessons are hard! Ten years or so ago I wrote a migration tool to convert QTI version 1 to QTI version 2 format. I wrote it as a Python script and used it to validate the work the project team were doing on the version 2 specification itself. Realising that most people holding QTI content weren't able to easily run a Python script (especially on Windows PCs), my co-chair Pierre Gorissen wrote a small Windows wrapper for the script using the excellent wxPython and published an installer via his website. From then on, everyone referred to it as "Pierre's migration tool". I'm not bitter; the lesson was clear. There's no point in writing the tool if you don't package it up in the way people want to use it.

This sentiment brings me to the latest developments with the tool. A few years back I wrote (and blogged about) a module for writing Basic LTI tools in Python. I did this partly to prove that LTI really was simple (I wrote the entire module on a single flight to the US) but also because I believed that the LTI specification was really on to something useful. LTI has been a huge success and offers a quick route for tool developers to gain access to users of learning management systems. It seems obvious that the next version of the QTI Migration Tool should be an LTI tool, but moving from a desktop app to a server-based web app means that I need a data access layer that can persist data and be smarter about things like multiple threads and processes.

2014-02-05

Deleting from iCalendar Without Notifying the Organizer - At Last!

Being a bit of a laggard, I only just upgraded my Mac to Mavericks a few days ago. If only I'd known they had fixed the number one most annoying thing about iCalendar in OS X, I'd have upgraded ages ago.



Yes, you can now delete an event from your calendar without notifying the organizer. This has caused me serious pain in the past. Events sometimes arrive at the wrong email address or get accidentally put in the wrong calendar, and previously you had to just leave them there for fear of sending a stupid "Steve declined your event" type email.

Thank you!