2014-02-24

A Dictionary-like Python interface for OData Part II: a Memory-backed OData Server

In my previous post, A Dictionary-like Python interface for OData I introduced a new sub-package I've added to Pyslet to implement support for OData version 2. You can download the latest version of the Pyslet package from the QTI Migration Tool & Pyslet home page.

To recap, I've decided to set about writing my own data access layer for Python that is modelled on the conventions of OData. I've validated the API by writing a concrete implementation in the form of an OData client. In this post I'll introduce the next step in the process which is a simple alternative implementation that uses a different underlying storage model, in other words, an implementation which uses something other than a remote OData server. I'll then expose this implementation as an OData server to validate that my data access layer API works from both perspectives.

Metadata

Unlike other frameworks for implementing OData services, Pyslet starts with the metadata model: it is not generated automatically from your code; you must write it yourself. This differs from the object-first approach taken by other frameworks, illustrated here:

This picture is typical of a project using something like Microsoft's WCF. Essentially, there's a two-step process. You use something like Microsoft's Entity Framework to generate classes from a database schema, customise the classes a little, and then the metadata model is auto-generated from your code model. Of course, you can go straight to code and implement your own code model with the appropriate queryable interface, but this would typically be done for a specific model.

Contrast this with the approach taken by Pyslet, where the entities are not model-specific classes. For example, when modelling the Northwind service there is no Python class called Product, as there would be in the approach taken by other frameworks. Instead there is a generalised implementation of Entity which behaves like a dictionary. The main difference is probably that you'll use supplier['Phone'] instead of simply supplier.phone or, if you'd gone down the getter/setter route, supplier.GetPhone(). In my opinion, this works better than a tighter binding for a number of reasons, but particularly because it makes the user more mindful of when data access is happening and when it isn't.

Using a looser binding also helps prevent the type of problems I had during the development of the QTI specification. Lots of people were using Java and JAXB to autogenerate classes from the XML specification (cf autogenerating classes from a database schema) but the QTI model contained a class attribute on most elements to allow for stylesheet support. This class attribute prevented auto-generation because class is a reserved word in Java. Trying to fix this up after auto-generation would be madness, but fixing it up before turns out to be a little tricky, and this glitch seriously damaged the specification's user experience. We got over it, but I'm wary now, and when modelling OData I stepped back from a tighter binding, in part, to prevent hard-to-fix glitches like the use of Python reserved words as property names.
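To make the point concrete, here's a toy illustration (plain Python, nothing to do with Pyslet's actual classes) of why dictionary-style access sidesteps the reserved-word problem entirely:

```python
# With dictionary-style access a property may be called 'class' -- a
# reserved word in both Python and Java -- because it is just a string key.
item = {'identifier': 'choiceA', 'class': 'highlighted'}
print(item['class'])  # highlighted

# The attribute-style equivalent cannot even be written down:
#     item.class  <- SyntaxError in Python
# An auto-generated binding has to mangle the name (class_, klass, ...),
# which is exactly the kind of hard-to-fix glitch described above.
```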

Allocating Storage

For this blog post I'm using a lightweight in-memory data storage implementation which can be automatically provisioned from the metadata document and I'm going to cheat by making a copy of the metadata document used by the Northwind service. Exposing OData the Pyslet way is a little more work if you already have a SQL database containing your data because I don't have a tool that auto-generates the metadata document from the SQL database schema. Automating the other direction is easy, but more on that in Part III.

I used my web browser to grab a copy of http://services.odata.org/V2/Northwind/Northwind.svc/$metadata and saved it to a file called Northwind.xml. I can then load the model from the interpreter:

>>> import pyslet.odata2.metadata as edmx
>>> doc=edmx.Document()
>>> f=open('Northwind.xml')
>>> doc.Read(f)
>>> f.close()

This Document class ensures that the model is loaded with Pyslet's special element implementations. The Products entity set can be looked up directly, but at the moment it's empty!

>>> productSet=doc.root.DataServices['ODataWeb.Northwind.Model.NorthwindEntities.Products']
>>> products=productSet.OpenCollection()
>>> len(products)
0
>>> products.close()

This isn't surprising: there is nothing in the metadata model itself that binds it to the data service at services.odata.org, and the model isn't linked to any actual storage for the data. By default, the model behaves as if it is bound to an empty, read-only data store.

To help me validate that my API can be used for something other than talking to real OData services I've created an object that provisions storage for an EntityContainer (that's like a database in OData) using standard Python dictionaries. By passing the definition of an EntityContainer to the object's constructor I create a binding between the model and this new data store.

>>> from pyslet.odata2.memds import InMemoryEntityContainer
>>> container=InMemoryEntityContainer(doc.root.DataServices['ODataWeb.Northwind.Model.NorthwindEntities'])
>>> products=productSet.OpenCollection()
>>> len(products)
0

The collection of products is still empty but it is now writable. I'm going to cheat again to illustrate this by borrowing some code from the previous blog post to open an OData client connected to the real Northwind service.

>>> from pyslet.odata2.client import Client
>>> c=Client("http://services.odata.org/V2/Northwind/Northwind.svc/")
>>> nwProducts=c.feeds['Products'].OpenCollection()

Here's a simple loop to copy the products from the real service into my own collection. It's a bit clumsy in the interpreter but careful typing pays off:

>>> for nwProduct in nwProducts.itervalues():
...   product=products.CopyEntity(nwProduct)
...   product.SetKey(nwProduct.Key())
...   products.InsertEntity(product)
... 
>>> len(products)
77

To emphasise the difference between my in-memory collection and the live OData service I'll add another record to my copy of this entity set. Fortunately most of the fields are marked as Nullable in the model so to save my fingers I'll just set those that aren't.

>>> product=products.NewEntity()
>>> product.SetKey(100)
>>> product['ProductName'].SetFromValue("The one and only Pyslet")
>>> product['Discontinued'].SetFromValue(False)
>>> products.InsertEntity(product)
>>> len(products)
78

Now I can do everything with my copy of the service that I could do with the OData client in the previous post. I'll filter the entities to make the result easier to see:

>>> import pyslet.odata2.core as core
>>> filter=core.CommonExpression.FromString("substringof('one',ProductName)")
>>> products.Filter(filter)
>>> for p in products.itervalues(): print p.Key(), p['ProductName'].value
... 
21 Sir Rodney's Scones
32 Mascarpone Fabioli
100 The one and only Pyslet
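A substringof-style filter is easy to mimic over any mapping. Here's a minimal sketch (plain dicts stand in for Pyslet's Entity objects, and the dataset is just a tiny sample) that reproduces the result above:

```python
# Minimal sketch of substringof filtering over an in-memory mapping.
products = {
    1: {'ProductName': 'Chai'},
    21: {'ProductName': "Sir Rodney's Scones"},
    32: {'ProductName': 'Mascarpone Fabioli'},
    100: {'ProductName': 'The one and only Pyslet'},
}

def substringof(needle, field, collection):
    """Yield the keys of entities whose field contains needle."""
    for key in sorted(collection):
        if needle in collection[key][field]:
            yield key

matches = list(substringof('one', 'ProductName', products))
print(matches)  # [21, 32, 100]
```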

I can access my own data store using the same API that I used to access a remote OData service in the previous post. In that post, I also claimed that it was easy to wrap my own implementations of this API to expose it as an OData service.

Exposing an OData Server

My OData server class implements the WSGI protocol, so it is easy to hook it up to a simple HTTP server and tell it to handle a single request.

>>> from pyslet.odata2.server import Server
>>> server=Server("http://localhost:8081/")
>>> server.SetModel(doc)
>>> from wsgiref.simple_server import make_server
>>> httpServer=make_server('',8081,server)
>>> httpServer.handle_request()

My interpreter session is now hanging, waiting for a single HTTP connection. The Northwind service doesn't have any feed customisations on the Products feed and, as we slavishly copied it, the Atom view in the browser is a bit boring, so I used the excellent JSONView plugin for Firefox and the following URL to hit my service:

http://localhost:8081/Products?$filter=substringof('one',ProductName)&$orderby=ProductID desc&$format=json

This is the same filter as I used in the interpreter before but I've added an ordering and specified my preference for JSON format. Here's the result.

As I did this, Python's simple server object logged the following output to my console:

127.0.0.1 - - [24/Feb/2014 11:17:05] "GET /Products?$filter=substringof(%27one%27,ProductName)&$orderby=ProductID%20desc&$format=json HTTP/1.1" 200 1701
>>>
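The %27 and %20 sequences in the logged request line are just the percent-encoded quotes and space from the URL I typed. Here's a quick sketch using Python 3's urllib.parse (these posts use Python 2, where urllib.quote is the equivalent; the safe characters below are chosen to match this particular log line, not any client library's actual behaviour):

```python
from urllib.parse import quote

# The query string as typed into the browser's address bar.
query = "$filter=substringof('one',ProductName)&$orderby=ProductID desc&$format=json"

# Leave the OData punctuation alone; encode the quotes and the space.
encoded = quote(query, safe="=&$,()")
print(encoded)
# $filter=substringof(%27one%27,ProductName)&$orderby=ProductID%20desc&$format=json
```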

The in-memory data store is a bit of a toy, though some more useful applications might be possible. In the OData documentation I go through a tutorial on how to create a lightweight memory cache of key-value pairs exposed as an OData service. I'm not really suggesting using it in a production environment to replace memcached. What this implementation is really useful for is developing and testing applications that consume the DAL API without needing to be connected to the real data source. Also, it can be wrapped in the OData Server class as shown above and used to provide a more realistic mock of an actual service, for testing that your consumer application still works when the data service is remote. I've used it in Pyslet's unit tests this way.
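The testing idea is worth a tiny sketch. Because the DAL looks like a dictionary, application code written against it can be exercised with a plain dict standing in for the real (remote) collection; the function and field names here are hypothetical, not part of Pyslet:

```python
def count_discontinued(collection):
    # Application code: depends only on a dictionary-like interface.
    return sum(1 for product in collection.values() if product['Discontinued'])

# In a unit test, a plain dict is a perfectly good stand-in collection.
fake_products = {
    1: {'ProductName': 'Chai', 'Discontinued': False},
    2: {'ProductName': 'Mishi Kobe Niku', 'Discontinued': True},
}
print(count_discontinued(fake_products))  # 1
```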

In the third and final part of this Python and OData series I'll cover a more interesting implementation of the API using the SQLite database.

2014-02-12

A Dictionary-like Python interface for OData

Overview

This blog post introduces some new modules that I've added to the Pyslet package I wrote. Pyslet's purpose is to provide support for Standards for Learning, Education and Training in Python. The new modules implement the OData protocol by providing a dictionary-like interface. You can download Pyslet from the QTI Migration Tool & Pyslet home page. There is some documentation linked from the main Pyslet wiki. This blog article is as good a way as any to get you started.

The Problem

Python has a database API which does a good job, but it is not the whole solution for data access. Embedding SQL statements in code, grappling with the complexities of parameterization and dealing with individual database quirks make it useful to have some kind of layer between your web app and the database API, so that you can tweak your code as you move between data sources.

If SQL has failed to be a really interoperable standard then perhaps OData, the new kid on the block, can fill the vacuum. The standard is sometimes referred to as "ODBC over the web" so it is definitely in this space (after all, who runs their database on the same server as their web app these days?).

My Solution

To solve this problem I decided to set about writing my own data access layer, modelled on the conventions of OData but using some simple concepts in Python. I decided to go down the dictionary-like route, rather than simulating objects with attributes, because I find the code more transparent that way. Implementing methods like __getitem__, __setitem__ and itervalues keeps the data layer abstraction at arm's length from the basic Python machinery. It is a matter of taste. See what you think.
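For the avoidance of doubt, here's a minimal sketch of what "dictionary-like" means in this context. It is hypothetical and greatly simplified; Pyslet's real base classes are much richer:

```python
class MemoryCollection(object):
    """Hypothetical, minimal dictionary-like collection of entities."""

    def __init__(self):
        self._entities = {}

    def __getitem__(self, key):
        # Data access happens here, and the caller can see that it does.
        return self._entities[key]

    def __setitem__(self, key, entity):
        self._entities[key] = entity

    def __len__(self):
        return len(self._entities)

    def itervalues(self):
        # Python 2 spelling, matching these posts; yields entities lazily.
        for key in self._entities:
            yield self._entities[key]

collection = MemoryCollection()
collection[21] = {'ProductName': "Sir Rodney's Scones"}
print(len(collection))  # 1
```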

The vision here is to write a single API (represented by a set of base classes) that can be implemented in different ways to access different data sources. There are three steps:

  1. An implementation that uses the OData protocol to talk to a remote OData service.
  2. An implementation that uses Python dictionaries to create a transient in-memory data service for testing.
  3. An implementation that uses the Python database API to access a real database.

This blog post is mainly about the first step, which should validate the API as being OData-like and set the groundwork for the others which I'll describe in subsequent blog posts. Incidentally, it turns out to be fairly easy to write an OData server that exposes a data service written to this API, more on that in future posts.

Quick Tutorial

The client implementation uses Python's logging module. To make it easier to see what is going on during this walk-through, I'm going to turn logging up from the default "WARN" to "INFO":

>>> import logging
>>> logging.basicConfig(level=logging.INFO)

To create a new OData client you simply instantiate a Client object passing the URL of the OData service root. Notice that, during construction, the Client object downloads the list of feeds followed by the metadata document. The metadata document is used extensively by this module and is loaded into a DOM-like representation.

>>> from pyslet.odata2.client import Client
>>> c=Client("http://services.odata.org/V2/Northwind/Northwind.svc/")
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/ HTTP/1.1
INFO:root:Finished Response, status 200
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/$metadata HTTP/1.1
INFO:root:Finished Response, status 200

Client objects have a feeds attribute that is a plain dictionary mapping the exposed feeds (by name) onto EntitySet objects. These objects are part of the metadata model but serve a special purpose in the API: they can be opened (a bit like files or directories) to gain access to the (collections of) entities themselves. Collection objects can be used in the with statement, and that's normally how you'd use them, but I'm sticking with the interactive terminal for now.

>>> products=c.feeds['Products'].OpenCollection()
>>> for p in products: print p
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1
2
3
... [and so on]
...
20
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21
22
23
... [and so on]
...
76
77

The products collection behaves like a dictionary: iterating through it iterates through the keys. In this case these are the keys of the entities in the collection of products in Microsoft's sample Northwind data service. Notice that the client logs several requests to the server interspersed with the printed output. That's because the server is limiting the maximum page size and the client is following the page links provided. These calls are made as you iterate through the collection, allowing you to iterate through very large collections without loading everything into memory.
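The paging behaviour can be sketched with a generator. Here fetch_page is a hypothetical stand-in for the HTTP round-trip; the real client follows the $skiptoken links returned by the server, but the lazy shape of the iteration is the same:

```python
def fetch_page(skiptoken=None):
    # Pretend server: 77 keys, served in pages of 20, like Northwind above.
    start = skiptoken or 0
    keys = list(range(start + 1, min(start + 20, 77) + 1))
    next_token = keys[-1] if keys[-1] < 77 else None
    return keys, next_token

def iterkeys():
    # One 'request' per page; the next page is only fetched when
    # iteration actually reaches it.
    token = None
    while True:
        keys, token = fetch_page(token)
        for key in keys:
            yield key
        if token is None:
            break

all_keys = list(iterkeys())
print(len(all_keys))  # 77
```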

The keys alone are of limited interest, let's try a similar loop but this time we'll print the product names as well:

>>> for k,p in products.iteritems(): print k,p['ProductName'].value
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1 Chai
2 Chang
3 Aniseed Syrup
...
...
20 Sir Rodney's Marmalade
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21 Sir Rodney's Scones
22 Gustaf's Knäckebröd
23 Tunnbröd
...
...
76 Lakkalikööri
77 Original Frankfurter grüne Soße

Sir Rodney's Scones sound interesting; we can grab an individual record just as we normally would from a dictionary, by using its key.

>>> scones=products[21]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21) HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in scones.DataItems(): print k,v.value
... 
ProductID 21
ProductName Sir Rodney's Scones
SupplierID 8
CategoryID 3
QuantityPerUnit 24 pkgs. x 4 pieces
UnitPrice 10.0000
UnitsInStock 3
UnitsOnOrder 40
ReorderLevel 5
Discontinued False

The scones object is an Entity object. It too behaves like a dictionary. The keys are the property names and the values are one of SimpleValue, Complex or DeferredValue. In the snippet above I've used a variation of iteritems which iterates only through the data properties, excluding the navigation properties. In this model, there are no complex properties. The simple values have a value attribute which contains a python representation of the value.

Deferred values (navigation properties) can be used to navigate between entities. Although deferred values can be opened just like EntitySets, if the model dictates that at most one entity can be linked, a convenience method called GetEntity can be used to open the collection and read the entity in one call. In this case, a product can have at most one supplier.

>>> supplier=scones['Supplier'].GetEntity()
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21)/Supplier HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in supplier.DataItems(): print k,v.value
... 
SupplierID 8
CompanyName Specialty Biscuits, Ltd.
ContactName Peter Wilson
ContactTitle Sales Representative
Address 29 King's Way
City Manchester
Region None
PostalCode M14 GSD
Country UK
Phone (161) 555-4448
Fax None
HomePage None

Continuing with the dictionary-like theme, attempting to load a non-existent entity results in a KeyError:

>>> p=products[211]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(211) HTTP/1.1
INFO:root:Finished Response, status 404
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pyslet/odata2/client.py", line 165, in __getitem__
    raise KeyError(key)
KeyError: 211
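The translation from HTTP status to dictionary semantics can be sketched like this (hypothetical names throughout; the real client's __getitem__ issues the GET for Products(211) and raises KeyError when the server answers 404):

```python
def get_entity(key, http_get):
    # http_get is a stand-in for the network call; it returns
    # (status, body) like the requests logged above.
    status, body = http_get(key)
    if status == 404:
        raise KeyError(key)
    return body

def fake_http_get(key):
    # Pretend Northwind: 77 products, anything else is 404 Not Found.
    return (200, {'ProductID': key}) if 1 <= key <= 77 else (404, None)

try:
    get_entity(211, fake_http_get)
except KeyError as err:
    print('KeyError:', err)  # KeyError: 211
```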

Finally, when we're done, it is a good idea to close the open collection. If we'd used the with statement this step would have been done automatically for us of course.

>>> products.close()

Limitations

Currently the client only supports OData version 2. Version 3 has now been published and I do intend to update the classes to speak version 3 at some point. If you try to connect to a version 3 service the client will complain when it tries to load the metadata document. There are ways around this limitation; if you are interested, add a comment to this post and I'll add some documentation.

The client only speaks XML so if your service only speaks JSON it won't work at the moment. Most of the JSON code is done and tested so adding it shouldn't be a big issue if you are interested.

The client can be used to both read and write to a service, and there are even ways of passing basic authentication credentials. However, when calling an https URL it doesn't do certificate validation at the moment, so be warned that your security could be compromised. Python 2.7 does now support certificate validation using OpenSSL so this could change quite easily, I think.

Moving to Python 3 is non-trivial - let me know if you are interested. I have taken the first steps (running unit tests with "python -3Wd" to force warnings) and, as much as possible, the code is ready for migration. I haven't tried it yet though and I know that some of the older code (we're talking 10-15 years here) is a bit sensitive to the raw/unicode string distinction.

The documentation is currently about 80% accurate and only about 50% useful. Trending upwards though.

Downloading and Installing Pyslet

Pyslet is pure-python. If you are only interested in OData you don't need any other modules, just Python 2.7 and a reasonable setuptools to help you install it. I just upgraded my machine to Mavericks which effectively reset my Python environment. Here's what I did to get Pyslet running.

  1. Installed setuptools
  2. Downloaded the pyslet package tgz and unpacked it (download from here)
  3. Ran python setup.py install

Why?

Some lessons are hard! Ten years or so ago I wrote a migration tool to convert QTI version 1 to QTI version 2 format. I wrote it as a Python script and used it to validate the work the project team were doing on the version 2 specification itself. Realising that most people holding QTI content weren't able to easily run a Python script (especially on Windows PCs) my co-chair Pierre Gorissen wrote a small Windows-wrapper for the script using the excellent wxPython and published an installer via his website. From then on, everyone referred to it as "Pierre's migration tool". I'm not bitter, the lesson was clear. No point in writing the tool if you don't package it up in the way people want to use it.

This sentiment brings me to the latest developments with the tool. A few years back I wrote (and blogged about) a module for writing Basic LTI tools in Python. I did this partly to prove that LTI really was simple (I wrote the entire module on a single flight to the US) but also because I believed that the LTI specification was really on to something useful. LTI has been a huge success and offers a quick route for tool developers to gain access to users of learning management systems. It seems obvious that the next version of the QTI Migration Tool should be an LTI tool but moving from a desktop app to a server-based web-app means that I need a data access layer that can persist data and be smarter about things like multiple threads and processes.

2014-02-05

Deleting from iCalendar Without Notifying the Organizer - At Last!

Being a bit of a laggard I only just upgraded my Mac to Mavericks a few days ago. If only I'd known they had fixed the most annoying thing about iCalendar in OS X I'd have upgraded ages ago.



Yes, you can now delete an event from your calendar without notifying the organizer.  This has caused me serious pain in the past.  Events sometimes arrive to the wrong email address or get accidentally put in the wrong calendar and previously you had to just leave them there for fear of sending a stupid "Steve declined your event" type email.

Thank you!

2013-09-01

Transforming QTI v2 into (X)HTML 5

At a recent IMS meeting I again mentioned that transforming QTI v2 into HTML shouldn't be too difficult and that I'd already made a start on this project many years ago. I even mentioned it in an earlier blog post: Semantic Markup in HTML. To my shame, I got a comment on that post which called my bluff, and I haven't posted my work - until now! I won't bore you with the excuses (day job, etc.). I should also warn you that these files are in no way complete. However, they do solve most of the hard problems, in my opinion, and could be built out to cover the rest of the interaction types fairly easily.

If you want to sing along with this blog post you should look at the files in the following directory of the QTI migration source repository: https://code.google.com/p/qtimigration/source/browse/#svn%2Ftrunk%2Fqti2html. In there you'll find a collection of XSL files.

qti2html.xsl

The goal of this project was to see how easy it would be to transform QTI version 2 files into HTML 5 in such a way that the HTML 5 was an alternative representation of the complete QTI information model. The goal was not to create a rendering which would work in an assessment delivery engine but to create a rendering that would store all the information about an item and render it in a sensible way, perhaps in a way suitable for a reviewer to view. I was partly inspired by a comment from Dick Bacon, of SToMP fame. He said it would be nice to see everything from the question all in one go, including feedback that is initially hidden and so on. It sort of gave me the idea to do the XSL this way.

Let's see what this stylesheet does to the most basic QTI v2 example, here's the command I ran on my Mac:

xsltproc qti2html.xsl choice.xml > choice.xhtml

And here's how the resulting file looks in Firefox:

The first thing you'll notice is that there is no HTML form in sight. You can't interact with this page, it is static text. But remember the goal of this stylesheet is to represent the QTI information completely. Let's look at the generated HTML source:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:qti="http://www.imsglobal.org/xsd/imsqti_v2p1">
    <head>
        <title>Unattended Luggage</title>
        <meta name="qti.identifier" content="choice"/>
        <meta name="qti.adaptive" content="false"/>
        <meta name="qti.timeDependent" content="false"/>
        <style type="text/css">
        <!-- removed for brevity -->
        </style>
    </head>

To start with, the head element contains some useful meta tags with names starting "qti."; these allow us to capture basic information that would normally appear on the root element of the item.

<body>
        <h2>Unattended Luggage</h2>
        <div class="qti-itemBody">
            <p>Look at the text in the picture.</p>
            <p>
                <img src="images/sign.png" alt="NEVER LEAVE LUGGAGE UNATTENDED"/>
            </p>
            <div class="qti-choiceInteraction" id="RESPONSE" data-baseType="identifier"
                data-cardinality="single" data-shuffle="false" data-maxChoices="1">
                <p class="qti-prompt">What does it say?</p>
                <ul class="qti-choiceInteraction">
                    <li class="qti-simpleChoice" data-identifier="ChoiceA" data-correct="true">You
                        must stay with your luggage at all times.</li>
                    <li class="qti-simpleChoice" data-identifier="ChoiceB">Do not let someone else
                        look after your luggage.</li>
                    <li class="qti-simpleChoice" data-identifier="ChoiceC">Remember your luggage
                        when you leave.</li>
                </ul>
            </div>
        </div>

I've abbreviated the body here but you'll see that the item body maps into a div with an appropriate class name (this time prefixed with qti- to make styling easier). The HTML copies across pretty much unchanged, but the interesting part is the div with class "qti-choiceInteraction". Here we've mapped the choiceInteraction in the original XML onto a div and used HTML5-style data- attributes to add information about the behaviour: cardinality, maximum number of choices and so on. In essence, this div performs the role of both the interaction and the response variable declaration.

I chose to map the choices themselves on to an unordered list in HTML, again, using HTML5 data- attributes to provide the additional information required by QTI.
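One nice consequence of this representation (nothing to do with the XSL files themselves, just an illustration) is that the QTI information can be read back with ordinary XML tooling. A sketch using Python's standard library, with the XHTML namespace omitted to keep it short:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the fragment above (XHTML namespace omitted).
fragment = """
<div class="qti-choiceInteraction" id="RESPONSE" data-baseType="identifier"
     data-cardinality="single" data-shuffle="false" data-maxChoices="1">
  <p class="qti-prompt">What does it say?</p>
  <ul class="qti-choiceInteraction">
    <li class="qti-simpleChoice" data-identifier="ChoiceA"
        data-correct="true">You must stay with your luggage at all times.</li>
    <li class="qti-simpleChoice" data-identifier="ChoiceB">Do not let
        someone else look after your luggage.</li>
  </ul>
</div>
"""

interaction = ET.fromstring(fragment)
print(interaction.get('data-cardinality'))  # single

# The correct response is recoverable from the data-correct attributes.
correct = [li.get('data-identifier')
           for li in interaction.iter('li')
           if li.get('data-correct') == 'true']
print(correct)  # ['ChoiceA']
```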

html2qti.xsl

So can this format be converted back to QTI v2? Yes it can. The purpose of this stylesheet is to take the XHTML5 representation created by the previous stylesheet and turn it back in to valid QTIv2. This gives us the power to work in either representation. If we want to edit the HTML we can!

xsltproc html2qti.xsl choice.xhtml > choice-new.xml

The output file is not identical to the input file, there are some changes to comments, white space and the order in which attributes are expressed but it is valid QTI v2 and is the same in every important respect.

qti.xsl

Once we have an XHTML5 representation of the item it seems like it should be easy to make something more interactive. A pure HTML/JS delivery engine for QTIv2 is an attractive proposition, especially if you can use an XSL transform to get there from the original QTI file. One of the main objections to QTI I used to hear from the SCORM crowd back in the early days was that there was no player for QTI files that were put in SCORM packages. But given that most modern browsers can do XSL this objection could just disappear, at least for formative quizzes where you don't mind handing out the scoring rules embedded in the HTML. You could even tie up such an engine with calls to the SCORM API to report back scores.

Given this background, the qti.xsl file takes a first step to transforming the XHTML5 file produced with the first XSL into something that is more like an interactive question.

xsltproc qti.xsl choice.xhtml > choice_tryout.xhtml

This is what the output looks like in Firefox:

This page is interactive, if you change your choice and click OK it re-scores the question. Note that I currently have a score of 1 as I have the correct choice selected.

Future Work

These XSL files will handle some of the other variants of multi-choice, including multi-response (yes, even when interacting) but are in no way a complete method of using XSL to transform QTI. For me, the only compelling argument for using XSL is if you want to embed it in a package to force the client to do the transformation, or if you are using one of those Cocoon-like tools that run XSL pipelines at runtime. Realistically, writing XSL is a pig, and so much time has elapsed since I wrote these files that it would take an hour or two to familiarise myself with them again if I wanted to make changes.

But the key idea is in the first two transforms, which present an alternative binding for the QTI information model.

2013-08-29

In historic vote, New Zealand bans software patents | Ars Technica

In historic vote, New Zealand bans software patents | Ars Technica

In my personal opinion, this is great news. I have had to sift through patent applications from software vendors that have clearly been created by simply sending a bunch of interface files to a lawyer to be translated into patentese. You know the sort of thing: an API call like Student.GetName(id) becomes a 'claim' in which a string representation of a student's name is obtained from a system with a stored representation of a student's registration information, etc, etc. If we carry on as we are, someone will have to write an Eclipse plug-in that generates the patent application every time you build your software.

So this news is a ray of hope. It isn't a blanket "IP law is bad" bill but a measured way of enshrining the basic principle that software 'as such' is not an invention. There will always be the odd counterexample where something is no longer patentable that we might feel should be (and vice versa!) but ever since I've been following this debate it seems to me that the system has stayed the same not for lack of suggestions of how to make it better but because of FUD around change. Thank you NZ for being bolder.


2013-08-25

What's in a name? Tin Can, Experience and the Tea Party

On Friday I was at e-Assessment Scotland, getting ready for Fiona Leteney's closing keynote "Anyone need a Can Opener?" when I tweeted:

Getting ready for the #easc13 session on the Experience API adlnet.gov/tla/experience… - anyone got an experience opener?

I tweeted this because I wanted to ensure that the twitter-savvy audience at #easc13 understood that there is an issue with the name of the API and that, in my opinion, it is important. Fiona followed up on twitter and I tried to put this into 140 characters but it just wouldn't fit. Before I go on though, a couple of disclaimers...

This blog post represents my personal views, not the views of my employer or of any other organization with which I may be affiliated. In the interests of full disclosure, I work for Questionmark, who have presented work related to this API at previous conferences, including at DevLearn. Also, I'm not a lawyer! I am interested in the world of open source and in some of the legal issues that new technology throws up, particularly when they relate to intellectual property law.

So What's in a name?

The title of @fionaleteney's talk clearly makes reference to the "Tin Can API" but she did touch on the naming issue in the presentation and introduced the newer "Experience API" name with the observation that "Tin Can" is likely to remain the recognisable brand.

Many people in the community probably think the name doesn't matter and that arguing over a name is an unimportant distraction. Indeed, it all feels a bit like the Monty Python film, The Life of Brian, in which followers argue over whether the shoe or the gourd is the correct symbol to represent their community. I believe it is important, in short because I don't think an API like this can succeed if it gets this type of thing wrong. The argument goes like this:

  1. The purpose of the API is to make it easier for activity statements to flow between the tools that generate or initially record activities and tools that aggregate this information.
  2. Therefore, the API is essentially an interoperability specification that will be more successful if a wide range of tools adopt it.
  3. To get a wide range of tools you need a wide range of tool suppliers to invest in adding support.
  4. To get a wide range of suppliers to invest you need a level playing field and trust within the supplier community.
  5. To get trust, you need good stewardship of the API.

I've homed in on one particular prerequisite for success here. There are plenty of other challenges that an API like this faces, including other issues related to IP law. After the session one delegate expressed concern about the ownership of information communicated using the API, and I have heard privacy issues voiced too. These are important things to get right, even more reason not to distract the suppliers' legal teams with branding issues when they should be advising their clients on how to use the API to improve the learner's experience from the right side of the law.

You'll have had your tea

One aspect of good stewardship for an API is the branding. In practice, that means control of the trademarks associated with an API. To people in the technology world this often translates directly into domain names, but that is only one way in which trademarks are used. What about Google AdWords? Google has a policy for that. And don't forget logos.

Of course, the big guns of the IT world get this type of thing sorted out before they even embark on a joint project, but legal issues are often the last thing a small learning technology community thinks about. We're all well-meaning, what could possibly go wrong? Why doesn't everyone trust me? What do you mean, I'm not allowed to team up with a small group of my competitors to gain an advantage over the rest of the marketplace? And so on... I have to thank Ed Walker for drawing my attention to the importance of getting this type of thing right when he was in his role as CEO of the IMS Global Learning Consortium. He was particularly clear on the last point.

Thinking of Ed brings me (via Boston) to an interesting article about a similar problem in a very different domain. In Trademarking the Tea Party, an article from 2010, Beth Hutchens touches on an issue which resonates with the problem experienced by the Experience API community. In this case, during a dispute over the identity of a political movement one group seeks trademark protection for the Tea Party name and all the other groups cry foul!

The problem is that the Tea Party is more of a conglomerate of different groups of people based loosely around a set of common goals, and not a collective. This could be problematic in gaining trademark protection.

This could well describe the sort of informal grouping that tends to build up around an API. The author goes on to say...

It makes sense to have at least some sort of structure to keep squabbles like this from coming up in the future.

The suggested solution is to set yourselves up as a collective movement with a distinctive name so that you can register a collective mark. As with anything law related, it makes sense to put a little work in up-front to ensure you don't have to spend a lot of time later figuring out who (if anyone) has the right to say how the name can and can't be used. The rest of the article is worth a read by the way, if only for its use of the word antidisestablishmentarianism in a justifiable context. But more seriously, it is a great introduction for normal people on what a collective trademark is.

Kicking the Can

To understand why the naming thing has become an issue for this API it is best to read the discussion We Call It Tin Can. There are two issues being debated there, so it is a bit confusing.

  1. Which is the better name: Experience API or Tin Can API? - this blog post isn't about this question, for what it's worth I voted for Experience API but I could live with Tin Can if we got the second issue straight.
  2. What is the appropriate legal way for the community to manage and control use of the API's name?

It is clear that the community recognises the problem. The team that originally led the development of the API registered a trademark with a view to handing it over to the project sponsor, the ADL. ADL is part of the US government and it seems that handing a trademark to a government department has turned out to be trickier than expected. The US Government does have a succinct page on the issue of government owned IP though: Copyright and Other Rights Pertaining to U.S. Government Works. That page makes it clear that you can't use a government owned trademark without permission (no big surprise there) and clears up the confusion over which bits of government IP are in the public domain and which aren't. (An issue which has fascinating implications for the makers of model aircraft, but we digress.)

In the above thread, Mike Rustici goes on to say:

Since we're on the topic of trademarks, another significant issue to consider in this debate is the fact that "experience api" is not trademarked. If ADL is unable or unwilling to secure one, that is very problematic for the future of this spec.
...
Anybody could claim to support or own "experience api" rendering the spec (or at least the label) meaningless.

So who should own the mark? The ADL seems to be struggling to take ownership and, even if it did, how would it determine the rules under which permission should be granted to members of the API community? If one member abused the mark, would the US government pursue them on behalf of the other members? It isn't clear that the ADL has the desire to fill this role on their behalf.

Let's take the very particular issue of domain names, though Google AdWords raises similar concerns. Policies like ICANN's Uniform Domain-Name Dispute-Resolution Policy make it clear that trademarks are an important part of determining whether complaints will be upheld. So if the API's name was owned by a neutral party, would that group invoke the policy to ensure that names like experienceapi.com or tincanapi.com were used in a way that avoided confusion? You'd hope so. Right now, both tincanapi.com and experienceapi.com point to basically the same site, controlled by just one member of the community, and are used to promote their services over and above those of other community members. As far as this blog post is concerned, both names are problematic now.

With the benefit of Experience

It's no surprise that people have had this type of problem before and that there are legal patterns that a community can follow to help ensure that their IP, including any trademarks, has the sort of stewardship which helps attract members and build the community. Creating a new membership organization just for this API would seem onerous, but the problem with choosing an existing one is that they'll have already established a way of dealing with the IP and it is unlikely to be a perfect match. Still, this seems to be the solution being explored by ADL; quoting again from the main thread: "Plans are already being made to transfer ownership of the spec to an open standards body" - this can't come too soon.

One final word of caution here. One of the grim duties that falls on the owner of a trademark is to enforce it, ensuring it does not simply become a generic term for a bunch of similar products from a variety of suppliers (hoover, biro, etc.). I recall the magazine Private Eye getting letters from lawyers when they used the word Portakabin in one of their articles. If confusion takes over around this API then none of the marks will be enforceable and we'll have to start all over again.

2013-03-15

RSS Readers: in the dog house

So farewell Google Reader, I will miss you.

This week's announcement of the demise of Google Reader as part of the Second Spring of Cleaning seems to be an important milestone for the internet.

There are a lot of new blog articles lamenting its demise (to some extent, this is one of them) but we shouldn't be too shocked. The original concept behind RSS has been under threat for some time; in fact, if you Google "War on RSS" you'll see an established idea that companies with a powerful influence on the way we use the internet have been deprecating RSS for some time.

Perhaps the most interesting of these contributions comes from @vambenepe, who wrote The war on RSS in February last year. It's a good overview of the way RSS reading features are going missing in systems we use to access the internet and contains this worrying quote:

Google has done a lot for RSS, but as a result it has put itself in position to kill it, either accidentally or on purpose. [...snip...] [... If] Google closed Reader, would RSS survive? Doubtful.

This particular commentator is interesting because since writing this he has moved on to become "Product Manager on Google Cloud Platform". Don't expect a follow up article but he did tweet yesterday:

"1 year ago, I asked: "If Google closed Reader, would RSS survive?" http://stage.vambenepe.com/archives/1932 We'll now find out but I won't be able to comment."

One of the takeaways here is that we're not just talking about RSS specifically. When we say RSS we can include Atom, and readers of this blog will know that I'm a fan of Atom and the emerging OData standard that is based upon it. But let's not get carried away. This war is not on the protocol but on the use of RSS as a way for end users to discover content on the internet. The use of OData (based on the Atom Publishing Protocol, not the read-only RSS) as a protocol that sits between the web app and the data source is likely to get even stronger.
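To see why the protocol itself is in no danger, it helps to remember how simple it is: an Atom feed is just XML, and pulling the entries out takes only a few lines of Python's standard library. The feed below is invented for illustration; a real reader would fetch it over HTTP rather than use an inline string.

```python
# A minimal sketch of the core of an Atom feed reader: parse the
# feed document and list each entry's title and link.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# A made-up feed standing in for one fetched over HTTP.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <title>RSS Readers: in the dog house</title>
    <link href="http://example.com/rss-readers"/>
  </entry>
  <entry>
    <title>What's in a name?</title>
    <link href="http://example.com/whats-in-a-name"/>
  </entry>
</feed>"""

def list_entries(feed_xml):
    """Return (title, href) pairs for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.find(ATOM_NS + "title").text
        href = entry.find(ATOM_NS + "link").get("href")
        entries.append((title, href))
    return entries

for title, href in list_entries(FEED):
    print(title, "->", href)
```

A reader application is mostly everything around this core: fetching, syncing read state across devices and presenting the results, which is exactly the part that is disappearing.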

Even HTTP has changed. This blog post uses HTTP in an old fashioned way. I'm writing an article, inserting anchors that form hypertext links to other resources on the internet. I'm banking on the idea that these resources won't go away and that this article will join a persistent web of information. If you're reading this you're probably thinking, duh, that's what the internet is. In the early days this was true but the internet is no longer like this for the majority of users. HTTP sits as a protocol behind the web apps we use to check Twitter, Facebook and iTunes but the concept behind the way most people consume information on the internet bears no relation to the classic hypertext visions we used to cite when we were all researchers working in universities in the early 90s.

Go back and read the seminal As we may think or review the goals of Ted Nelson's Xanadu Project and you won't recognise the origins of iTunes, on-demand TV, micro-blogging or ad-supported social networks. From a UK point of view, we didn't even have commercial broadcast television until 1955 (when ITV was launched), which is 10 years after As we may think was published. The existence of these modern uses of the internet does not preclude the research use envisaged by these information scientists, it just relegates it to a niche.

The problem for people like you and me, who occupy this niche, is that the divergence of consumer internet technology from the original research oriented web is eventually going to make it more expensive. There's no law that says that Google has to provide an RSS reading tool for free (or a blogging service for that matter). In fact, the withdrawal of this service may actually provide a shot in the arm for the makers of RSS readers who have been starved by people like me who use the freebie Google Reader instead of their more tailored offerings. Yes, I would be prepared to pay to have something like Google Reader that stays in sync across my tablet, phone and laptop.

Ad, ad, ad...

While I'm on the subject of money, I do want to draw your attention to Xanadu's rule 9:

Every document can contain a royalty mechanism at any desired degree of granularity to ensure payment on any portion accessed, including virtual copies ("transclusions") of all or part of the document.

I really think it is time that technology providers started to look again at this goal. In the early days of the internet this was considered unrealistic. In fact, I remember sitting through meetings in which people responsible for creating the infrastructure that made the internet possible were highly doubtful that traffic accounting would ever be possible. The growth in internet traffic would always outpace the ability of switching gear and routers to count bits and report on usage. That prediction turned out to be wrong. I think they underestimated the strength of the business case behind bit-counting, which is routine on mobile platforms. My cheap router counts my own internet usage and I know my service provider has realtime stats too, if only to enforce their acceptable usage policy.

Charging based on consumption of bits has attracted a lot of haters and this, in my opinion, has distorted the business models available to service providers towards ad-based services and away from Xanadu-like micropayments.

Most of the rhetoric about the demise of Google Reader is taken from the point of view of the consumer, not the information publisher. Of course I want to consume content for free using free technology over an unlimited internet connection. But none of these things are really free. We've all heard the adage that if something is free then you're the product. As an RSS consumer, my costs just outstripped my marketable value to Google. I'm not a cash cow anymore, I'm a dog.

From Reader to Blogger

But as I type, I'm not just consuming the content I used to research this post; I'm also publishing content of my own. At the moment, for free. I don't want to enable ads on this blog, but the technology doesn't yet make it easy for me, or anyone between me and you, to collect revenue and experiment with pricing. It's more complicated than you might think.

Rule 8 of Xanadu reads "Permission to link to a document is explicitly granted by the act of publication." Early internet sites seriously considered violating this principle. Content providers considered themselves to be so valuable that someone creating a site that aggregated links to their gems was somehow cheating the system. This has been turned completely on its head: these days information providers are hungry for links and when those links result in product sales they are prepared to pay real money to the aggregator. This is the basis on which all the market comparison sites are run.

If content publishers got revenue from people viewing their materials (Xanadu style) then linking to someone's content becomes a valuable lead. How would payments trickle back to the owner of the <a> tag?

We know that the ad-model works. YouTube generates huge revenues for people like PSY. But for people outside the mainstream who occupy this niche, typified by users of Google Reader, we need another way to solve the money problem. Perhaps the new technology that emerges to take the place of Reader will come up with a creative way to address this issue. Especially if they start getting paid by their users.