2012-05-22

Common Cartridge, Namespaces and Dependency Injection

This post is about coping with a significant change to the newer (public) versions of the IMS Common Cartridge specification.  This change won't affect everyone the same way; your implementation may just shrug it off.  However, I found I had to make an important change to the QTI migration tool code to make it possible to read QTI version 1 files from the newer form of cartridges.

There have been three versions of this specification now, versions 1.0, 1.1 and most recently version 1.2.  The significant change for me was between versions 1.0 (published October 2008) and 1.1 (revised May 2011).

Changing Namespaces

The key change between 1.0 and 1.1 was to the namespaces used in the XML files.  In version 1.0, the default namespace for content packaging elements is used in the manifest file: http://www.imsglobal.org/xsd/imscp_v1p1.

Content Packaging has also been through several revisions.  The v1p1 namespace (above) was defined in the widely used Content Packaging 1.1 (now on revision 4).  The same namespace was used for most of the elements in the public draft of the newer IMS Content Packaging version 1.2 specification too.  In this case, the decision was made to augment the revised specification with a new schema containing definitions of the new elements only.  The existing elements would stay in the 1.1 namespace to ensure that tools that recognise version 1.1 packages continue to work, ignoring the unrecognised extension elements.

Confusingly though, the schema definition provided with the content packaging specification is located here: http://www.imsglobal.org/xsd/imscp_v1p1.xsd whereas the schema definition provided with the common cartridge specification (1.0), for the same namespace, is located here: http://www.imsglobal.org/profile/cc/ccv1p0/derived_schema/imscp_v1p2.xsd.  That's two different definition files for the same namespace.  Given this discrepancy it is not surprising that newer revisions of common cartridge have chosen to use a new namespace entirely.  In the case of 1.1, the namespace used for the basic content packaging elements was changed to http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1.

But this decision is not without consequences.  The decision to retain a consistent namespace in the various revisions of the Content Packaging specification enabled existing tools to continue working.  Sure enough, the decision to change the namespace in Common Cartridge means that some tools will not continue working, including my Python libraries used in the QTI migration tool.

From Parser to Python Class

In the early days of XML, you could identify an element within a document by its name, scoped perhaps by the PUBLIC identifier given in the document type definition.  The disadvantage was that all elements had to be defined in the same scope.  Namespaces were introduced to help sort this mess out.  A namespace-aware parser splits off the namespace prefix (everything up to the colon) from the element name, resolves it to the namespace URI declared for that prefix, and identifies the element by a pair of strings: the namespace (a URI) and the remainder of the element name.
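To make the namespace/name pair concrete, here is a small sketch using the standard library's ElementTree parser rather than my own parser.  The helper name qualified_name is just something I've made up for illustration.  It shows that the prefix itself doesn't matter: a default namespace and an explicit prefix bound to the same URI yield the same pair.

```python
import xml.etree.ElementTree as ET

CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'

# The same element written with a default namespace and with a prefix.
DOC_DEFAULT = '<manifest xmlns="%s"/>' % CP_NS
DOC_PREFIXED = '<cp:manifest xmlns:cp="%s"/>' % CP_NS


def qualified_name(xml_text):
    """Return the (namespace URI, local name) pair of the root element.

    Hypothetical helper: ElementTree reports namespaced tags in the
    form '{uri}local', so we just split that apart.
    """
    tag = ET.fromstring(xml_text).tag
    if tag.startswith('{'):
        uri, local = tag[1:].split('}', 1)
        return (uri, local)
    return (None, tag)  # element is not in any namespace


print(qualified_name(DOC_DEFAULT))
print(qualified_name(DOC_PREFIXED))
```

Both calls print the same pair, which is exactly what makes the pair usable as a dictionary key.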

The XML parser at the heart of my Python libraries uses these namespace/name pairs as keys into a dictionary which it uses to look up the class object it should use to represent the element.  The advantage of this approach is that I can add behaviour to the XML elements when they are deserialized from their XML representations through the methods defined on the corresponding classes.  Furthermore, a rich class hierarchy can be defined allowing concepts such as XHTML's organization of elements into groups like 'inline elements' to be represented directly in the class hierarchy.

If I need two different XML definitions to map to the same class I can easily do this by adding multiple entries to the dictionary and mapping them to the same class.  So at first glance I seem to have avoided some of the problems inherent with tight-coupling of classes.  The following two elements could be mapped to the same Manifest class in my program:

('http://www.imsglobal.org/xsd/imscp_v1p1', 'manifest')
('http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1', 'manifest')
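A minimal sketch of that lookup table might look like the following.  This is a standalone illustration rather than the real pyslet code: CLASS_MAP, get_element_class and the stand-in classes are names I've assumed for the example.

```python
CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'
CC_NS = 'http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1'


class Element(object):
    """Generic fallback for elements with no special class."""


class Manifest(Element):
    """Stand-in for the real Manifest class."""


# Two different (namespace, name) keys map to the same class object.
CLASS_MAP = {
    (CP_NS, 'manifest'): Manifest,
    (CC_NS, 'manifest'): Manifest,
}


def get_element_class(namespace, name):
    """Look up the class used to represent an element when parsing."""
    return CLASS_MAP.get((namespace, name), Element)


print(get_element_class(CP_NS, 'manifest') is Manifest)
print(get_element_class(CC_NS, 'manifest') is Manifest)
```

Unknown elements fall back to the generic Element class in this sketch, which is one plausible way of ignoring unrecognised extension elements.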

This would work fine when reading the manifest from the XML stream, but what about writing manifests?  How does my Manifest class know which namespace to use when I'm creating a new manifest?  The following code snippet from the Python interpreter shows me creating an instance of a Manifest (I pass None as the element's parent).  The instance knows which namespace it should be in:

>>> import pyslet.imscpv1p2 as cp
>>> m=cp.Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1">
 <organizations/>
 <resources/>
</manifest>

This clearly won't work for the new common cartridges.  The Manifest class 'knows' the namespace it is supposed to be in because its canonical XML name is provided as a class attribute on its definition.  The obvious solution is to wrap the class with a special common cartridge Manifest that overrides this attribute.  That is relatively easy to do.  Here is the updated definition:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')

Unfortunately, this doesn't do enough.  Continuing in the Python interpreter:

>>> class Manifest(cp.Manifest):
...     XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
... 
>>> m=Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1">
    <organizations xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
    <resources xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
</manifest>

Now we've got the namespace correct on the manifest but the required organizations and resources elements are still created in the old namespace.

The Return of Tight Coupling

If I'm going to fix this issue I'm going to have to wrap the classes used for all the elements in the Content Packaging specification.  That sounds like a bit of a chore, but remember that the namespace has changed because Common Cartridge has added some additional constraints to the specification, so we're likely to have to override at least some of the behaviours too.

Unfortunately, wrapping the classes still isn't enough.  In the above example the organizations and resources elements are required children of the manifest.  So when I created my instance of the Manifest class, the Manifest's constructor needed to create instances of the related Organizations and Resources classes, and it did this using the default implementations, not the wrapped versions I've defined in my Common Cartridge module.  This is known as tight coupling, and the usual remedy is dependency injection.  For a more comprehensive primer on common solutions to this problem you could do worse than reading Martin Fowler's article Inversion of Control Containers and the Dependency Injection pattern.
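Stripped of everything else, the coupling looks something like the sketch below.  It is a standalone illustration, not the actual pyslet code, and the CC-prefixed class names are only there so that both 'modules' can live in one file.

```python
CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'
CC_NS = 'http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1'


class Element(object):
    XMLNAME = None

    def __init__(self, parent):
        self.parent = parent


class Organizations(Element):
    XMLNAME = (CP_NS, 'organizations')


class Resources(Element):
    XMLNAME = (CP_NS, 'resources')


class Manifest(Element):
    XMLNAME = (CP_NS, 'manifest')

    def __init__(self, parent):
        Element.__init__(self, parent)
        # Tight coupling: the child classes are named directly here,
        # so a subclass cannot change them without a new constructor.
        self.Organizations = Organizations(self)
        self.Resources = Resources(self)


# Wrapped versions, as they might appear in a Common Cartridge module.
class CCOrganizations(Organizations):
    XMLNAME = (CC_NS, 'organizations')


class CCManifest(Manifest):
    XMLNAME = (CC_NS, 'manifest')


m = CCManifest(None)
print(m.XMLNAME[0])                 # the new namespace
print(m.Organizations.XMLNAME[0])   # still the old namespace
```

Defining CCOrganizations makes no difference here because nothing ever tells the base constructor to use it.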

The important point here is that the logic inside my Manifest class, including the logic that takes place during construction, needs to be decoupled from the decision to use a particular class object to instantiate the Organizations and Resources elements.  These dependencies need to be injected into the code somehow.

I must admit, I find the example solutions in Java frameworks confusing because the additional coding required to satisfy the compiler makes it harder to see what is really going on.  There aren't many good examples of how to solve the problem in Python.  The Python wiki points straight to an article called Dependency Injection The Python Way, but this article describes a full-featured broker (like the service locator solution), which seems like overkill for my coupling problem.

A simpler solution is to pass the dependencies in (in my case to the constructor), following a pattern similar to the one in this blogpost.  In fact, this poster is trying to solve a related problem of module-level dependency but the basic idea is the same.  I could pass the wrapped class objects to the constructor.
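Passing the class objects to the constructor might look roughly like this.  Again this is only a sketch under my own assumptions: the keyword argument names organizations_class and resources_class are invented for the example and are not part of pyslet.

```python
CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'
CC_NS = 'http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1'


class Element(object):
    XMLNAME = None

    def __init__(self, parent):
        self.parent = parent


class Organizations(Element):
    XMLNAME = (CP_NS, 'organizations')


class Resources(Element):
    XMLNAME = (CP_NS, 'resources')


class CCOrganizations(Organizations):
    XMLNAME = (CC_NS, 'organizations')


class CCResources(Resources):
    XMLNAME = (CC_NS, 'resources')


class Manifest(Element):
    XMLNAME = (CP_NS, 'manifest')

    def __init__(self, parent, organizations_class=Organizations,
                 resources_class=Resources):
        Element.__init__(self, parent)
        # The caller decides which classes to instantiate (hypothetical
        # keyword arguments); the defaults keep the old behaviour.
        self.Organizations = organizations_class(self)
        self.Resources = resources_class(self)


default_manifest = Manifest(None)
cc_manifest = Manifest(None, organizations_class=CCOrganizations,
                       resources_class=CCResources)
print(default_manifest.Organizations.XMLNAME[0])
print(cc_manifest.Organizations.XMLNAME[0])
```

The catch with this form is that every piece of code that creates a Manifest, including the parser, has to know which classes to pass in.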

Dependency Injection using Class Attributes

The spirit of the Python language is certainly one of adopting the simplest solution that solves the problem.  So here is my dependency injection solution to this specific case of tight coupling.

I start by adding class attributes to set class dependencies.  My base Manifest class now looks something like this:

class Manifest:
    XMLNAME=("http://www.imsglobal.org/xsd/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

    # method definitions and other attributes follow...

And in my Common Cartridge module it is overridden like this:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

Although these look similar, in the first case the Metadata, Organizations and Resources names refer to classes in the base Content Packaging module, whereas in the second definition they refer to overrides in the Common Cartridge module (note the use of cp.Manifest to select the base class from the original Content Packaging module).

Now the original Manifest's constructor is modified to use these class attributes to create the required child elements:

    def __init__(self,parent):
        self.Metadata=None
        self.Organizations=self.OrganizationsClass(self)
        self.Resources=self.ResourcesClass(self)

The upshot is that when I create an instance of the Common Cartridge Manifest I don't need to override the constructor just to solve the dependency problem. The base class constructor will now create the correct Organizations and Resources members using the overridden class attributes.
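Putting the pieces together, a standalone version of the pattern might look like the sketch below.  It is a cut-down illustration rather than the pyslet source: both 'modules' are squeezed into one file, so the Common Cartridge overrides carry a CC prefix instead of reusing the same names.

```python
CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'
CC_NS = 'http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1'


class Element(object):
    XMLNAME = None

    def __init__(self, parent):
        self.parent = parent


# --- base Content Packaging classes ---
class Organizations(Element):
    XMLNAME = (CP_NS, 'organizations')


class Resources(Element):
    XMLNAME = (CP_NS, 'resources')


class Manifest(Element):
    XMLNAME = (CP_NS, 'manifest')
    OrganizationsClass = Organizations
    ResourcesClass = Resources

    def __init__(self, parent):
        Element.__init__(self, parent)
        # Look the classes up through self so that subclasses can
        # substitute their own by overriding the class attributes.
        self.Organizations = self.OrganizationsClass(self)
        self.Resources = self.ResourcesClass(self)


# --- Common Cartridge overrides (CC prefix is for this sketch only) ---
class CCOrganizations(Organizations):
    XMLNAME = (CC_NS, 'organizations')


class CCResources(Resources):
    XMLNAME = (CC_NS, 'resources')


class CCManifest(Manifest):
    XMLNAME = (CC_NS, 'manifest')
    OrganizationsClass = CCOrganizations
    ResourcesClass = CCResources


m = CCManifest(None)
for element in (m, m.Organizations, m.Resources):
    print(element.XMLNAME)
```

All three elements now report the Common Cartridge namespace, and CCManifest did not need a constructor of its own.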

I've abbreviated the code a bit.  If you want to see the full implementation you can find it in the trunk of the pyslet framework.

2012-05-04

IMS LTI and the length of oauth_consumer_key

I ran into an interesting problem today.  While playing around with the IMS LTI specification I hit the restriction, in MySQL, on keys being at most 1000 bytes.


ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes


OAuth uses the concept of a consumer key to identify the system from which a signed HTTP request has been generated.  The consumer key can, in theory, be any Unicode string of characters and the specification is silent on the issue of a maximum length.  The LTI specification uses examples in which the consumer key is derived from the DNS name of the originating system, perhaps prefixed with some additional identifier.  A DNS name can be a maximum of 255 characters, but the character set of a DNS name is restricted to a simple ASCII subset.  Internationalized domain names are now allowed, but these are transformed into the simpler ASCII form, so the effective maximum length for a domain name using characters outside the simple ASCII set is reduced.

It seems likely that an oauth_consumer_key is going to get used as a key in a database table at some point during your implementation.  The clue is in the name.

A field such as VARCHAR(255) seems reasonable as storage, provided the character set of the field can take arbitrary Unicode characters.  Unfortunately this is likely to reserve a large amount of space: MySQL reserves 3 bytes per character when the UTF-8 character set is used, to ensure that the worst-case encoding is accommodated.  That means that this key alone takes up 765 bytes of the 1000 byte limit, leaving only 235 bytes for any other columns in a compound key.  If the other column is also a VARCHAR, that's a maximum of VARCHAR(78), which seems short if it is something like LTI's context_id, which is also an arbitrary Unicode string of unrestricted size.  The context_id identifies the course within the Tool Consumer so a combined key of oauth_consumer_key and context_id looks like a reasonable choice.
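The arithmetic is easy to check with a few lines of Python.  The constants are the figures quoted above (the 1000 byte limit from the error message and 3 bytes reserved per character); the function name is made up for the example.

```python
MAX_KEY_BYTES = 1000   # limit reported in the MySQL error message
BYTES_PER_CHAR = 3     # space reserved per character, as described above


def reserved_bytes(varchar_length, bytes_per_char=BYTES_PER_CHAR):
    """Bytes of key space reserved for a VARCHAR(varchar_length) column."""
    return varchar_length * bytes_per_char


consumer_key_bytes = reserved_bytes(255)
remaining_bytes = MAX_KEY_BYTES - consumer_key_bytes
# Integer division: only whole characters fit in what is left.
max_second_column = remaining_bytes // BYTES_PER_CHAR

print(consumer_key_bytes)   # 765
print(remaining_bytes)      # 235
print(max_second_column)    # 78
```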

One possibility might be to collapse consumer key values onto ASCII using the same (or a similar) algorithm to the one used for international domain names (see RFC 3490).  This algorithm would then allow use of the ASCII character set for these keys with the benefit that keys based on domain names, even if expressed in the Unicode original form, would end up taking 255 bytes or less.  Doing the translation may add to the overhead of the look-up but the benefit of reducing the overall key size might pay off anyway.
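As a rough illustration of the idea, Python ships with an 'idna' codec that implements the RFC 3490 transformation for domain names.  The helper name to_ascii_key is hypothetical, and the codec is designed for domain names, so it applies DNS label rules and may reject other strings; a real consumer key with an arbitrary prefix might need a similar but more forgiving algorithm.

```python
def to_ascii_key(key):
    """Collapse a domain-name style key onto ASCII using IDNA.

    Hypothetical helper for illustration; it relies on the standard
    'idna' codec, which raises UnicodeError for names it cannot encode.
    """
    return key.encode('idna').decode('ascii')


original = u'b\u00fccher.example'     # a name with one non-ASCII letter
ascii_key = to_ascii_key(original)

print(ascii_key)                      # xn--bcher-kva.example
# The transformation is reversible, so nothing is lost.
print(ascii_key.encode('ascii').decode('idna') == original)
```

Note that the ASCII form can be longer in characters than the original, but every character now fits in a single byte, so a column using an ASCII character set need not reserve 3 bytes per character.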

2012-05-01

SOAP - how has it survived so long?

Pete Lacey’s Weblog : The S stands for Simple:

I just got handed this link by my development team and thought it was pretty funny.  Don't be put off by the long page; it is mainly comments, some of which are pretty funny in themselves.

I was still laughing when I saw that the post dates from 2006.  Suddenly it seems poignant instead.  How come we are still wrestling with such a hard to fathom and implement 'simple' protocol when the truly simple protocol, HTTP, has been staring us in the face all these years?
