
2015-10-11

Pyslet goes https

After months of being too busy to sort this out I have finally moved the Pyslet website to SSL. This is a quick post to explain how I've done this.

Firstly, I've wanted to do this for a while because I want to use the above website to host a web version of the QTI migration tool, and encouraging users to upload their precious assessment materials to a plain old HTTP URL would have been a hard sell. I saw an advert for free SSL certificates for open source projects from GlobalSign so in a rush of enthusiasm I applied and got my certificate. There's a checklist of rules that the site must comply with to be eligible (see previous link) which I'll summarise here:

  1. OSI license: Pyslet uses the BSD 3-Clause License: check!
  2. Actively maintained: well, Pyslet is a spare-time activity but I'm going to give myself a qualified tick here.
  3. Not used for commercial purposes: the Pyslet website is just a way of hosting demos of Pyslet in action, no adverts, no 'monetization' of any kind: check!
  4. Must get an A rating with GlobalSign's SSL Checker...

That last one is not quite as easy as you might think. Here's what I did to make it happen; I'll assume you have already done some openssl magic, applied for, and received your crt file.

  • Download the intermediate certificate chain file from GlobalSign here, the default one for SHA-256 Orders was the correct one for me.
  • Put the following files into /var/www/ssl (your location may vary):

    www.pyslet.org.key
    www.pyslet.org.crt
    globalsign-intermediate.crt

    The first one is the key I originally created with:

    openssl genrsa -des3 -out www.pyslet.org.key.encrypted 2048
    openssl req -new -key www.pyslet.org.key.encrypted -out www.pyslet.org.csr
    openssl rsa -in www.pyslet.org.key.encrypted -out www.pyslet.org.key

    The second file is the certificate I got from GlobalSign themselves. The third one is the intermediate certificate I downloaded above.

  • Set permissions (as root):
    chown -R root:root /var/www/ssl/*.key
    chmod 700 /var/www/ssl/*.key
  • Add a virtual host to Apache's httpd.conf (suitable for Apache/2.2.31):
    Listen 443
    
    <VirtualHost *:443>
        ServerName www.pyslet.org
        SSLEngine on
        
        SSLCertificateFile /var/www/ssl/www.pyslet.org.crt
        SSLCertificateKeyFile /var/www/ssl/www.pyslet.org.key
        SSLCertificateChainFile /var/www/ssl/globalsign-intermediate.crt
        
        SSLCompression off
        SSLProtocol all -SSLv3 -SSLv2
        SSLCipherSuite AES128+EECDH:AES128+EDH    
        SSLHonorCipherOrder on
        
    #   Rest of configuration goes here....
    
    </VirtualHost>

This is a relatively simple configuration designed to get an A rating while not worrying too much about compatibility with really old browsers.
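As an aside, the same hardening choices show up client-side in Python's ssl module. This sketch (nothing here is Pyslet-specific) checks that the default context refuses SSLv3 and TLS compression, mirroring the SSLProtocol and SSLCompression directives above:

```python
import ssl

# Python's default client context mirrors the Apache hardening above:
# SSLv2/SSLv3 are disabled and TLS compression is off.
ctx = ssl.create_default_context()
assert ctx.options & ssl.OP_NO_SSLv3         # cf. SSLProtocol all -SSLv3 -SSLv2
assert ctx.options & ssl.OP_NO_COMPRESSION   # cf. SSLCompression off
print("default context rejects SSLv3 and compression")
```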

2014-02-12

A Dictionary-like Python interface for OData

Overview

This blog post introduces some new modules that I've added to the Pyslet package I wrote. Pyslet's purpose is to provide support for Standards for Learning, Education and Training in Python. The new modules implement the OData protocol by providing a dictionary-like interface. You can download Pyslet from the QTIMigration Tool & Pyslet home page. There is some documentation linked from the main Pyslet wiki. This blog article is as good a way as any to get you started.

The Problem

Python has a database API which does a good job but it is not the whole solution for data access. Embedding SQL statements in code, grappling with the complexities of parameterization and dealing with individual database quirks makes it useful to have some type of layer between your web app and the database API so that you can tweak your code as you move between data sources.

If SQL has failed to be a really interoperable standard then perhaps OData, the new kid on the block, can fill the vacuum. The standard is sometimes referred to as "ODBC over the web" so it is definitely in this space (after all, who runs their database on the same server as their web app these days?).

My Solution

To solve this problem I decided to set about writing my own data access layer that would be modeled on the conventions of OData but that used some simple concepts in Python. I decided to go down the dictionary-like route, rather than simulating objects with attributes, because I find the code more transparent that way. Implementing methods like __getitem__, __setitem__ and itervalues keeps the data layer abstraction at arm's length from the basic Python machinery. It is a matter of taste. See what you think.
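To make the dictionary-like idea concrete, here's a toy sketch of the pattern (hypothetical classes, not Pyslet's actual API):

```python
# Toy in-memory collection exposing the mapping protocol used by the API:
# keys map to entities and iteration runs through the keys.
class MemoryCollection(object):

    def __init__(self):
        self._entities = {}

    def __getitem__(self, key):
        # missing keys raise KeyError, just like a plain dictionary
        return self._entities[key]

    def __setitem__(self, key, entity):
        self._entities[key] = entity

    def __iter__(self):
        return iter(self._entities)

    def itervalues(self):
        for key in self._entities:
            yield self._entities[key]

products = MemoryCollection()
products[1] = {'ProductName': 'Chai'}
print(products[1]['ProductName'])
```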

The vision here is to write a single API (represented by a set of base classes) that can be implemented in different ways to access different data sources. There are three steps:

  1. An implementation that uses the OData protocol to talk to a remote OData service.
  2. An implementation that uses python dictionaries to create a transient in-memory data service for testing.
  3. An implementation that uses the python database API to access a real database.

This blog post is mainly about the first step, which should validate the API as being OData-like and set the groundwork for the others which I'll describe in subsequent blog posts. Incidentally, it turns out to be fairly easy to write an OData server that exposes a data service written to this API, more on that in future posts.

Quick Tutorial

The client implementation uses Python's logging module to provide logging. To make it easier to see what is going on during this walk through I'm going to turn logging up from the default "WARN" to "INFO":

>>> import logging
>>> logging.basicConfig(level=logging.INFO)

To create a new OData client you simply instantiate a Client object passing the URL of the OData service root. Notice that, during construction, the Client object downloads the list of feeds followed by the metadata document. The metadata document is used extensively by this module and is loaded into a DOM-like representation.

>>> from pyslet.odata2.client import Client
>>> c=Client("http://services.odata.org/V2/Northwind/Northwind.svc/")
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/ HTTP/1.1
INFO:root:Finished Response, status 200
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/$metadata HTTP/1.1
INFO:root:Finished Response, status 200

Client objects have a feeds attribute that is a plain dictionary mapping the exposed feeds (by name) onto EntitySet objects. These objects are part of the metadata model but serve a special purpose in the API as they can be opened (a bit like files or directories) to gain access to the (collections of) entities themselves. Collection objects can be used in the with statement and that's normally how you'd use them but I'm sticking with the interactive terminal for now.

>>> products=c.feeds['Products'].OpenCollection()
>>> for p in products: print p
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1
2
3
... [and so on]
...
20
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21
22
23
... [and so on]
...
76
77

The products collection behaves like a dictionary: iterating through it iterates through the keys. In this case these are the keys of the entities in the collection of products in Microsoft's sample Northwind data service. Notice that the client logs several requests to the server interspersed with the printed output. That's because the server is limiting the maximum page size and the client is following the page links provided. These calls are made as you iterate through the collection, allowing you to iterate through very large collections without loading everything into memory.
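The lazy paging pattern is easy to sketch in isolation. Here fake_fetch_page stands in for the HTTP round trip (the real client follows the next-page links in each response); all names are hypothetical:

```python
def fake_fetch_page(skiptoken):
    """Stand-in for an HTTP GET: return one page of keys plus a next token."""
    data = list(range(1, 78))                 # keys 1..77, like Northwind
    start = skiptoken or 0
    page = data[start:start + 20]             # server-imposed page size of 20
    next_token = start + 20 if start + 20 < len(data) else None
    return page, next_token

def iter_keys(fetch_page):
    """Yield keys lazily, requesting the next page only when it is needed."""
    token = None
    while True:
        page, token = fetch_page(token)
        for key in page:
            yield key
        if token is None:
            break

print(len(list(iter_keys(fake_fetch_page))))   # 77 keys, fetched in 4 pages
```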

The keys alone are of limited interest, let's try a similar loop but this time we'll print the product names as well:

>>> for k,p in products.iteritems(): print k,p['ProductName'].value
... 
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products HTTP/1.1
INFO:root:Finished Response, status 200
1 Chai
2 Chang
3 Aniseed Syrup
...
...
20 Sir Rodney's Marmalade
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products?$skiptoken=20 HTTP/1.1
INFO:root:Finished Response, status 200
21 Sir Rodney's Scones
22 Gustaf's Knäckebröd
23 Tunnbröd
...
...
76 Lakkalikööri
77 Original Frankfurter grüne Soße

Sir Rodney's Scones sound interesting; we can grab an individual record just as we normally would from a dictionary, by using its key.

>>> scones=products[21]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21) HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in scones.DataItems(): print k,v.value
... 
ProductID 21
ProductName Sir Rodney's Scones
SupplierID 8
CategoryID 3
QuantityPerUnit 24 pkgs. x 4 pieces
UnitPrice 10.0000
UnitsInStock 3
UnitsOnOrder 40
ReorderLevel 5
Discontinued False

The scones object is an Entity object. It too behaves like a dictionary. The keys are the property names and the values are one of SimpleValue, Complex or DeferredValue. In the snippet above I've used a variation of iteritems which iterates only through the data properties, excluding the navigation properties. In this model, there are no complex properties. The simple values have a value attribute which contains a python representation of the value.

Deferred values (navigation properties) can be used to navigate between Entities. Although deferred values can be opened just like EntitySets, if the model dictates that at most one entity can be linked, a convenience method called GetEntity can be used to open the collection and read the entity in one call. In this case, a product can have at most one supplier.

>>> supplier=scones['Supplier'].GetEntity()
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(21)/Supplier HTTP/1.1
INFO:root:Finished Response, status 200
>>> for k,v in supplier.DataItems(): print k,v.value
... 
SupplierID 8
CompanyName Specialty Biscuits, Ltd.
ContactName Peter Wilson
ContactTitle Sales Representative
Address 29 King's Way
City Manchester
Region None
PostalCode M14 GSD
Country UK
Phone (161) 555-4448
Fax None
HomePage None

Continuing with the dictionary-like theme, attempting to load a non-existent entity results in a KeyError:

>>> p=products[211]
INFO:root:Sending request to services.odata.org
INFO:root:GET /V2/Northwind/Northwind.svc/Products(211) HTTP/1.1
INFO:root:Finished Response, status 404
Traceback (most recent call last):
  File "", line 1, in 
  File "/Library/Python/2.7/site-packages/pyslet/odata2/client.py", line 165, in __getitem__
 raise KeyError(key)
KeyError: 211

Finally, when we're done, it is a good idea to close the open collection. If we'd used the with statement this step would have been done automatically for us of course.

>>> products.close()
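For illustration, here's a toy version of the pattern (not Pyslet's actual class) showing why the with statement makes the explicit close() unnecessary:

```python
class ClosingCollection(object):
    """Toy collection that releases its resources on exit."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False          # don't suppress exceptions

with ClosingCollection() as collection:
    pass                      # iterate, read entities, ...
print(collection.closed)      # True: closed on leaving the with block
```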

Limitations

Currently the client only supports OData version 2. Version 3 has now been published and I do intend to update the classes to speak version 3 at some point. If you try to connect to a version 3 service the client will complain when it tries to load the metadata document. There are ways around this limitation; if you are interested, add a comment to this post and I'll add some documentation.

The client only speaks XML so if your service only speaks JSON it won't work at the moment. Most of the JSON code is done and tested so adding it shouldn't be a big issue if you are interested.

The client can be used to both read and write to a service, and there are even ways of passing basic authentication credentials. However, if calling an https URL it doesn't do certificate validation at the moment so be warned: your security could be compromised. Python 2.7 does now support certificate validation using OpenSSL so this could change quite easily I think.

Moving to Python 3 is non-trivial - let me know if you are interested. I have taken the first steps (running unit tests with "python -3Wd" to force warnings) and, as much as possible, the code is ready for migration. I haven't tried it yet though and I know that some of the older code (we're talking 10-15 years here) is a bit sensitive to the raw/unicode string distinction.

The documentation is currently about 80% accurate and only about 50% useful. Trending upwards though.

Downloading and Installing Pyslet

Pyslet is pure-python. If you are only interested in OData you don't need any other modules, just Python 2.7 and a reasonable setuptools to help you install it. I just upgraded my machine to Mavericks which effectively reset my Python environment. Here's what I did to get Pyslet running.

  1. Installed setuptools
  2. Downloaded the pyslet package tgz and unpacked it (download from here)
  3. Ran python setup.py install

Why?

Some lessons are hard! Ten years or so ago I wrote a migration tool to convert QTI version 1 to QTI version 2 format. I wrote it as a Python script and used it to validate the work the project team were doing on the version 2 specification itself. Realising that most people holding QTI content weren't able to easily run a Python script (especially on Windows PCs) my co-chair Pierre Gorissen wrote a small Windows-wrapper for the script using the excellent wxPython and published an installer via his website. From then on, everyone referred to it as "Pierre's migration tool". I'm not bitter, the lesson was clear. No point in writing the tool if you don't package it up in the way people want to use it.

This sentiment brings me to the latest developments with the tool. A few years back I wrote (and blogged about) a module for writing Basic LTI tools in Python. I did this partly to prove that LTI really was simple (I wrote the entire module on a single flight to the US) but also because I believed that the LTI specification was really on to something useful. LTI has been a huge success and offers a quick route for tool developers to gain access to users of learning management systems. It seems obvious that the next version of the QTI Migration Tool should be an LTI tool but moving from a desktop app to a server-based web-app means that I need a data access layer that can persist data and be smarter about things like multiple threads and processes.

2013-09-01

Transforming QTI v2 into (X)HTML 5

At a recent IMS meeting I again mentioned that transforming QTI v2 into HTML shouldn't be too difficult and that I'd already made a start on this project many years ago. I even mentioned it in an earlier blog post: Semantic Markup in HTML. To my shame, I got a comment on that post which called my bluff and I haven't posted my work - until now! I won't bore you with the excuses, day job, etc. I should also warn you that these files are in no way complete. However, they do solve most of the hard problems in my opinion and could be built out to cover the rest of the interaction types fairly easily.

If you want to sing along with this blog post you should look at the files in the following directory of the QTI migration source repository: https://code.google.com/p/qtimigration/source/browse/#svn%2Ftrunk%2Fqti2html. In there you'll find a collection of XSL files.

qti2html.xsl

The goal of this project was to see how easy it would be to transform QTI version 2 files into HTML 5 in such a way that the HTML 5 was an alternative representation of the complete QTI information model. The goal was not to create a rendering which would work in an assessment delivery engine but to create a rendering that would store all the information about an item and render it in a sensible way, perhaps in a way suitable for a reviewer to view. I was partly inspired by a comment from Dick Bacon, of SToMP fame. He said it would be nice to see everything from the question all in one go, including feedback that is initially hidden and so on. It sort of gave me the idea to do the XSL this way.

Let's see what this stylesheet does to the most basic QTI v2 example, here's the command I ran on my Mac:

xsltproc qti2html.xsl choice.xml > choice.xhtml

And here's how the resulting file looks in Firefox:

The first thing you'll notice is that there is no HTML form in sight. You can't interact with this page, it is static text. But remember the goal of this stylesheet is to represent the QTI information completely. Let's look at the generated HTML source:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:qti="http://www.imsglobal.org/xsd/imsqti_v2p1">
    <head>
        <title>Unattended Luggage</title>
        <meta name="qti.identifier" content="choice"/>
        <meta name="qti.adaptive" content="false"/>
        <meta name="qti.timeDependent" content="false"/>
        <style type="text/css">
        <!-- removed for brevity -->
        </style>
    </head>

To start with, the head element contains some useful meta tags with names starting "qti.", which allow us to capture basic information that would normally be in the root element of the item.

<body>
        <h2>Unattended Luggage</h2>
        <div class="qti-itemBody">
            <p>Look at the text in the picture.</p>
            <p>
                <img src="images/sign.png" alt="NEVER LEAVE LUGGAGE UNATTENDED"/>
            </p>
            <div class="qti-choiceInteraction" id="RESPONSE" data-baseType="identifier"
                data-cardinality="single" data-shuffle="false" data-maxChoices="1">
                <p class="qti-prompt">What does it say?</p>
                <ul class="qti-choiceInteraction">
                    <li class="qti-simpleChoice" data-identifier="ChoiceA" data-correct="true">You
                        must stay with your luggage at all times.</li>
                    <li class="qti-simpleChoice" data-identifier="ChoiceB">Do not let someone else
                        look after your luggage.</li>
                    <li class="qti-simpleChoice" data-identifier="ChoiceC">Remember your luggage
                        when you leave.</li>
                </ul>
            </div>
        </div>

I've abbreviated the body here but you'll see that the item body maps into a div with an appropriate class name (this time prefixed with qti- to make styling easier). The HTML copies across pretty much unchanged but the interesting part is the div with a class of "qti-choiceInteraction". Here we've mapped the choiceInteraction in the original XML into a div and used HTML5 style data attributes to add information about the behaviour: cardinality, shuffle, and so on. In essence, this div performs the role of both interaction and response variable declaration.

I chose to map the choices themselves on to an unordered list in HTML, again, using HTML5 data- attributes to provide the additional information required by QTI.
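To see that the data- attributes really do preserve the interaction information (which is what makes a reverse transform possible at all), here's a short sketch using Python's xml.etree on markup abbreviated from the example above:

```python
import xml.etree.ElementTree as ET

XHTML = '{http://www.w3.org/1999/xhtml}'
src = '''<div xmlns="http://www.w3.org/1999/xhtml" class="qti-choiceInteraction"
    id="RESPONSE" data-baseType="identifier" data-cardinality="single"
    data-shuffle="false" data-maxChoices="1">
    <ul class="qti-choiceInteraction">
        <li class="qti-simpleChoice" data-identifier="ChoiceA"
            data-correct="true">You must stay with your luggage.</li>
    </ul>
</div>'''

div = ET.fromstring(src)
# the response declaration is recoverable from the data- attributes
print(div.get('id'), div.get('data-cardinality'), div.get('data-baseType'))
for li in div.iter(XHTML + 'li'):
    print(li.get('data-identifier'), li.get('data-correct'))
```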

html2qti.xsl

So can this format be converted back to QTI v2? Yes it can. The purpose of this stylesheet is to take the XHTML5 representation created by the previous stylesheet and turn it back into valid QTI v2. This gives us the power to work in either representation. If we want to edit the HTML we can!

xsltproc html2qti.xsl choice.xhtml > choice-new.xml

The output file is not identical to the input file: there are some changes to comments, white space and the order in which attributes are expressed, but it is valid QTI v2 and is the same in every important respect.

qti.xsl

Once we have an XHTML5 representation of the item it seems like it should be easy to make something more interactive. A pure HTML/JS delivery engine for QTIv2 is an attractive proposition, especially if you can use an XSL transform to get there from the original QTI file. One of the main objections to QTI I used to hear from the SCORM crowd back in the early days was that there was no player for QTI files that were put in SCORM packages. But given that most modern browsers can do XSL this objection could just disappear, at least for formative quizzes where you don't mind handing out the scoring rules embedded in the HTML. You could even tie up such an engine with calls to the SCORM API to report back scores.

Given this background, the qti.xsl file takes a first step to transforming the XHTML5 file produced with the first XSL into something that is more like an interactive question.

xsltproc qti.xsl choice.xhtml > choice_tryout.xhtml

This is what the output looks like in Firefox:

This page is interactive, if you change your choice and click OK it re-scores the question. Note that I currently have a score of 1 as I have the correct choice selected.

Future Work

These XSL files will handle some of the other variants of multi-choice, including multi-response (yes, even when interacting) but are in no way a complete method of using XSL to transform QTI. For me, the only compelling argument for using XSL is if you want to embed it in a package to force the client to do the transformation or if you are using one of those Cocoon-like tools that runs XSL pipelines at runtime. Realistically, writing XSL is a pig and so much time has elapsed since I wrote it that it would take an hour or two to familiarise myself with it again if I wanted to make changes.

But the key idea is in the first two transforms, which present an alternative binding for the QTI information model.

2012-07-02

QTI Pre-Conference Workshop: next week!

Sadly I won't be able to make this event next week but I thought I'd pass on a link to the flyer in case there is anyone still making travel plans.

http://caaconference.co.uk/wp-content/uploads/CAA-2012-Pre-Conference-Workshop.pdf

I'm still making the odd change to the QTI migration tool - and the integration with the PyAssess library is going well. This will bring various benefits like the ability to populate the correct response for most item types when converting from v1. So if you have a v1 to v2 migration question coming out of the workshop please feel free to get in touch or post it here.


2012-06-01

Atom, OData and Binary Blobs

I've been doing a lot of work on Atom and OData recently. I'm a real fan of Atom and the related Atom Publishing Protocol (APP for short). OData is a specification from Microsoft which builds on these two basic building blocks of the internet to provide standard conventions for querying feeds and representing properties using a SQL-like model.

Given that OData can be used to easily expose data currently residing in SQL databases it is not surprising that the issue of binary blobs is one that takes a little research to figure out. At first sight it isn't obvious how OData deals with them, in fact, it isn't even obvious how APP deals with them!

Atom Primer

Most of us are familiar with the idea of an RSS feed for following news articles and blogs like this one (this article prompted me to add the gadget to my blogger templates to make it easier to subscribe). Atom is a slightly more formal definition of the same concept and is available as an option for subscribing to this blog too. Understanding the origins of Atom helps when trying to understand the Atom data model, especially if you are coming to Atom from a SQL/OData point of view.

Atom is all about feeds (lists) of entries. The data you want, be it a news article, blog post or a row in your database table is an entry. A feed might be everything, such as all the articles in your blog or all the rows in your database table, or it may be a filtered subset such as all the articles in your blog with a particular tag or all the rows in your table that match a certain query.

Atom adheres closely to the REST-based service concept. Each entry has its own unique URI. Feeds also have their own URIs. For example, the Atom feed URL for this blog is:

http://swl10.blogspot.com/feeds/posts/default

But if you are only interested in the Python language then you might want to use a different feed:

http://swl10.blogspot.com/feeds/posts/default/-/Python

Obviously the first feed contains all the entries in the second feed too!

Atom is XML-based so an entry is represented by an <entry> element and the content of an entry is represented by a <content> child element. Here's an abbreviated example from this blog's Atom feed. Note that the atom-defined metadata elements appear as siblings of the content...

<entry>
  <id>tag:blogger.com,1999:blog-8659912959976079554.post-4875480159917130568</id>
  <published>2011-07-17T16:00:00.000+01:00</published>
  <updated>2011-07-17T16:00:06.090+01:00</updated>
  <category scheme="http://www.blogger.com/atom/ns#" term="QTI"/>
  <category scheme="http://www.blogger.com/atom/ns#" term="Python"/>
  <title type="text">Using gencodec to make a custom character mapping</title>
  <content type="html">One of the problems I face...</content>
  <link rel="edit" type="application/atom+xml"
    href="http://www.blogger.com/feeds/8659912959976079554/posts/default/4875480159917130568"/>
  <link rel="self" type="application/atom+xml"
    href="http://www.blogger.com/feeds/8659912959976079554/posts/default/4875480159917130568"/>
</entry>

For blog articles, this content is typically html text (yes, horribly escaped to allow it to pass through XML parsers). Atom actually defines three types of native content: 'html', 'text' and 'xhtml'. It also allows the content element to contain a single child element corresponding to other XML media types. OData uses this method to represent the property name/value pairs that might correspond to the column names and values for a row in the database table you are exposing.

Here's another abbreviated example taken from the Netflix OData People feed:

<entry>
  <id>http://odata.netflix.com/v2/Catalog/People(189)</id>
  <title type="text">Bruce Abbott</title>
  <updated>2012-06-01T07:55:17Z</updated>
  <category term="Netflix.Catalog.v2.Person" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="application/xml">
    <m:properties>
      <d:Id m:type="Edm.Int32">189</d:Id>
      <d:Name>Bruce Abbott</d:Name>
    </m:properties>
  </content>
</entry>

Notice the application/xml content type and the single properties element from Microsoft's metadata schema.
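Pulling those name/value pairs back out is straightforward with Python's xml.etree; a sketch using the namespaces from the example above:

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'
M = '{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}'
D = '{http://schemas.microsoft.com/ado/2007/08/dataservices}'

src = '''<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <content type="application/xml">
    <m:properties>
      <d:Id m:type="Edm.Int32">189</d:Id>
      <d:Name>Bruce Abbott</d:Name>
    </m:properties>
  </content>
</entry>'''

entry = ET.fromstring(src)
props = entry.find(ATOM + 'content/' + M + 'properties')
for p in props:
    # strip the d: namespace to recover the property (column) name
    print(p.tag.replace(D, ''), '=', p.text)
```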

Any other type of content is considered to be external media. But Atom can still describe it, it can still associate metadata with it and it can still organize it into feeds...

Binary Blobs as Media

There is nothing stopping the content of an entry from being a non-text binary blob of data. You just change the type attribute to be your favourite blob format and add a src attribute to point to an external file or base-64 encode it and include it in the entry itself (this second method is rarely used I think).

Obviously the URL of the entry (the XML document containing the <entry> tag) is not the same as the URL of the media resource, but they are closely related. The entry is referred to as a Media Link because it contains the metadata about the media file (such as the title, updated date etc) and it links to it. The media file itself is known as a media resource.

There's a problem with OData though. OData requires the child of the content element to be the properties element (see example above) and the type attribute to be application/xml. But Atom says there can only be one content element per entry. So how can OData be used for binary blobs?

The answer is a bit of a hack. When the entry is a media link entry the properties move into the metadata area of the entry. Here's another abbreviated example from Netflix which illustrates the technique:

<entry>
  <id>http://odata.netflix.com/v2/Catalog/Titles('13aly')</id>
  <title type="text">Red Hot Chili Peppers: Funky Monks</title>
  <summary type="html">Lead singer Anthony Kiedis...</summary>
  <updated>2012-01-31T09:45:16Z</updated>
  <author>
    <name />
  </author>
  <category term="Netflix.Catalog.v2.Title"
    scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="image/jpeg" src="http://cdn-0.nflximg.com/en_us/boxshots/large/5632678.jpg" />
  <m:properties xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
    <d:Id>13aly</d:Id>
    <d:Name>Red Hot Chili Peppers: Funky Monks</d:Name>
    <d:ShortName>Red Hot Chili Peppers: Funky Monks</d:ShortName>
    <d:Synopsis>Lead singer Anthony Kiedis...</d:Synopsis>
    <d:ReleaseYear m:type="Edm.Int32">1991</d:ReleaseYear>
    <d:Url>http://www.netflix.com/Movie/Red_Hot_Chili_Peppers_Funky_Monks/5632678</d:Url>
    <!-- more properties.... -->
  </m:properties>
</entry>

This entry is taken from the Titles feed, notice that the entry is a media-link to the large box graphic for the film.

Binary Blobs and APP

APP adds a protocol for publishing information to Atom feeds and OData builds on APP to allow data feeds to be writable, not just read-only streams. You can't upload your own titles to Netflix as far as I know so I don't have an example here. The details are all in section 9.6 of RFC 5023 but in a nutshell, if you post a binary blob to a feed the server should store the blob and create a media link entry that points to it (populated with a minimal set of metadata). Once created, you can then update the metadata with HTTP's PUT method on the media link's edit URI directly, or update the binary blob by using HTTP's PUT method on the edit-media URI of the media resource. (These links are given in the <link> elements in the entries; see the first example above.)

There is no reason why binary blobs can't be XML files of course. Many of the technical standards for education that I work with are very data-centric. They define the format of XML documents such as QTI, which are designed to be opaque to management systems like item banks (an item bank is essentially a special-purpose content management system for questions used in assessment).

So publishing feeds using OData or APP from an item bank would most likely use these techniques for making the underlying content available to third party systems. Questions often contain media resources (e.g., images) of course but even the question content itself is typically marked up using XML, as it is in QTI. This data is not easy to represent as a simple list of property values and would typically be stored as a blob in a database or as a file in a repository. Therefore, it is probably better to think of this data as a media resource when exposing it via APP/OData.

2012-05-22

Common Cartridge, Namespaces and Dependency Injection

This post is about coping with a significant change to the newer (public) versions of the IMS Common Cartridge specification.  This change won't affect everyone the same way, your implementation may just shrug it off.  However, I found I had to make an important change to the QTI migration tool code to make it possible to read QTI version 1 files from the newer form of cartridges.

There have been three versions of this specification now, versions 1.0, 1.1 and most recently version 1.2.  The significant change for me was between versions 1.0 (published October 2008) and 1.1 (revised May 2011).

Changing Namespaces

The key change between 1.0 and 1.1 was to the namespaces used in the XML files.  In version 1.0, the default namespace for content packaging elements is used in the manifest file: http://www.imsglobal.org/xsd/imscp_v1p1.

Content Packaging has also been through several revisions.  The v1p1 namespace (above) was defined in the widely used Content Packaging 1.1 (now on revision 4).  The same namespace was used for most of the elements in the (public draft) of the newer IMS Content Packaging version 1.2 specification too.  In this case, the decision was made to augment the revised specification with a new schema containing definitions of the new elements only.  The existing elements would stay in the 1.1 namespace to ensure that tools that recognise version 1.1 packages continue to work, ignoring the unrecognised extension elements.

Confusingly though, the schema definition provided with the content packaging specification is located here: http://www.imsglobal.org/xsd/imscp_v1p1.xsd whereas the schema definition provided with the common cartridge specification (1.0), for the same namespace, is located here: http://www.imsglobal.org/profile/cc/ccv1p0/derived_schema/imscp_v1p2.xsd.  That's two different definition files for the same namespace.  Given this discrepancy it is not surprising that newer revisions of common cartridge have chosen to use a new namespace entirely.  In the case of 1.1, the namespace used for the basic content packaging elements was changed to http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1.

But this decision is not without consequences.  The decision to retain a consistent namespace in the various revisions of the Content Packaging specification enabled existing tools to continue working.  Sure enough, the decision to change the namespace in Common Cartridge means that some tools will not continue working.  Including my Python libraries used in the QTI migration tool.

From Parser to Python Class

In the early days of XML, you could identify an element within a document by its name, scoped perhaps by the PUBLIC identifier given in the document type definition.  The disadvantage was that all elements had to be defined in the same scope.  Namespace prefixes were introduced to help sort this mess out.  A namespace-aware parser splits off the namespace prefix (everything up to the colon) from the element name and uses it to identify the element by a pair of strings: the namespace (a URI) and the remainder of the element name.

The XML parser at the heart of my python libraries uses these namespace/name pairs as keys into a dictionary which it uses to look up the class object it should use to represent the element.  The advantage of this approach is that I can add behaviour to the XML elements when they are deserialized from their XML representations through the methods defined on the corresponding classes.  Furthermore, a rich class hierarchy can be defined allowing concepts such as XHTML's organization of elements into groups like 'inline elements' to be represented directly in the class hierarchy.

If I need two different XML definitions to map to the same class I can easily do this by adding multiple entries to the dictionary and mapping them to the same class.  So at first glance I seem to have avoided some of the problems inherent with tight-coupling of classes.  The following two elements could be mapped to the same Manifest class in my program:

('http://www.imsglobal.org/xsd/imscp_v1p1', 'manifest')
('http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1', 'manifest')
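As a sketch, the dispatch table might look something like the following (the names here are illustrative, not Pyslet's actual internals):

```python
# Hypothetical sketch of the parser's element-dispatch idea: look up the
# class for each (namespace, localName) pair in a dictionary.
CP_NS = 'http://www.imsglobal.org/xsd/imscp_v1p1'
CC_NS = 'http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1'


class Manifest:
    """Stand-in for the class that represents a <manifest> element."""


# Both namespace/name pairs map to the same class.
ELEMENT_CLASS_MAP = {
    (CP_NS, 'manifest'): Manifest,
    (CC_NS, 'manifest'): Manifest,
}


def class_for_element(ns, name, default=None):
    """Return the class registered for this namespace/name pair."""
    return ELEMENT_CLASS_MAP.get((ns, name), default)


print(class_for_element(CC_NS, 'manifest').__name__)  # Manifest
```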

This would work fine when reading the manifest from the XML stream but what about writing manifests?  How does my Manifest class know which namespace to use when I'm creating a new manifest?  The following code snippet from the python interpreter shows me creating an instance of a Manifest (I pass None as the element's parent).  The instance knows which namespace it should be in:

>>> import pyslet.imscpv1p2 as cp
>>> m=cp.Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1">
 <organizations/>
 <resources/>
</manifest>

This clearly won't work for the new common cartridges.  The Manifest class 'knows' the namespace it is supposed to be in because its canonical XML name is provided as a class attribute on its definition.  The obvious solution is to wrap the class with a special common cartridge Manifest that overrides this attribute.  That is relatively easy to do, here is the updated definition:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')

Unfortunately, this doesn't do enough.  Continuing to use the python interpreter....

>>> class Manifest(cp.Manifest):
...     XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
... 
>>> m=Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1">
    <organizations xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
    <resources xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
</manifest>

Now we've got the namespace correct on the manifest but the required organizations and resources elements are still created in the old namespace.

The Return of Tight Coupling

If I'm going to fix this issue I'm going to have to wrap the classes used for all the elements in the Content Packaging specification.  That sounds like a bit of a chore, but remember that the namespace changed because Common Cartridge adds additional constraints to the specification, so we're likely to have to override at least some of the behaviours too.

Unfortunately, wrapping the classes still isn't enough.  In the above example the organizations and resources elements are required children of the manifest.  So when I created my instance of the Manifest class the Manifest's constructor needed to create instances of the related Organizations and Resources classes, and it did this using the default implementations, not the wrapped versions I'd defined in my Common Cartridge module.  This is known as tight coupling, and the remedy is dependency injection.  For a comprehensive primer on common solutions you could do worse than reading Martin Fowler's article Inversion of Control Containers and the Dependency Injection pattern.

The important point here is that the logic inside my Manifest class, including the logic that takes place during construction, needs to be decoupled from the decision to use a particular class object to instantiate the Organizations and Resources elements.  These dependencies need to be injected into the code somehow.

I must admit, I find the example solutions in Java frameworks confusing because the additional coding required to satisfy the compiler makes it harder to see what is really going on.  There aren't many good examples of how to solve the problem in python.  The python wiki points straight to an article called Dependency Injection The Python Way.  But this article describes a full feature broker (like the service locator solution) which seems like overkill for my coupling problem.

A simpler solution is to pass dependencies in (in my case on the constructor) following a pattern similar to the one in this blog post.  In fact, that poster is trying to solve a related problem of module-level dependency, but the basic idea is the same: I could pass the wrapped class objects to the constructor.

Dependency Injection using Class Attributes

The spirit of the python language is certainly one of adopting the simplest solution that solves the problem.  So here is my dependency injection solution to this specific case of tight coupling.

I start by adding class attributes to set class dependencies.  My base Manifest class now looks something like this:

class Manifest:
    XMLNAME=("http://www.imsglobal.org/xsd/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

    # method definitions and other attributes follow...

And in my Common Cartridge module it is overridden like this:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

Although these look similar, in the first case the Metadata, Organizations and Resources names refer to classes in the base Content Packaging module whereas in the second definition they refer to overrides in the Common Cartridge Module (note the use of cp.Manifest to select the base class from the original Content Packaging module).

Now the original Manifest's constructor is modified to use these class attributes to create the required child elements:

    def __init__(self,parent):
        self.Metadata=None
        self.Organizations=self.OrganizationsClass(self)
        self.Resources=self.ResourcesClass(self)

The upshot is that when I create an instance of the Common Cartridge Manifest I don't need to override the constructor just to solve the dependency problem. The base class constructor will now create the correct Organizations and Resources members using the overridden class attributes.
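Stripped to its essentials, the pattern looks like this (the class names are illustrative stand-ins for the real Content Packaging and Common Cartridge classes):

```python
class Organizations:
    pass


class Resources:
    pass


class Manifest:
    # Dependencies are declared as class attributes...
    OrganizationsClass = Organizations
    ResourcesClass = Resources

    def __init__(self):
        # ...and resolved through self at construction time, so a
        # subclass's overrides are picked up without this constructor
        # ever needing to be overridden.
        self.Organizations = self.OrganizationsClass()
        self.Resources = self.ResourcesClass()


class CCOrganizations(Organizations):
    pass


class CCResources(Resources):
    pass


class CCManifest(Manifest):
    OrganizationsClass = CCOrganizations
    ResourcesClass = CCResources


m = CCManifest()
print(type(m.Organizations).__name__)  # CCOrganizations
```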

I've abbreviated the code a bit, if you want to see the full implementation you can see it in the trunk of the pyslet framework.

2012-01-20

Explore QTI in depth

Explore QTI in depth: "Xatapult"

Interesting little article explaining the basics of the QTI v2 data model - I think this type of document provides a much better overview than the documentation that comes with the specification itself.

2011-08-23

Do you <object> to <img>?

An interesting question in QTI history came up in discussion amongst the QTI IPS group recently and I thought I'd answer via my blog rather than email.

In QTI, you may use either the <img> element or the <object> element to include images in questions.  But in some cases only <object> will do.  In the words of the question put to me:

"The rule appears to be that declaration as img is used where the XHTML is to be passed on to (say) a browser for display, but declaration as an object is used where more complex handling is  required."

Why?


Both <object> and <img> are valid ways of including an image in XHTML - during spec development Safari would render them if you just pointed it at the QTI without any processing!

However, at the time of writing <object> was considered the better way to include images with <img> being the legacy method common in HTML files.  As with many efforts to regularize practice this hasn't really caught on as much as the HTML working group hoped!

Anyway, as a result of the above, where we wanted to accept an image and nothing else (i.e., in certain graphic interactions where issues like bounding rectangles, click-detection and the like needed to be simple and well defined) we chose the <object> form and not the <img> form.

This doesn't stop you putting images in runs of text in places like simpleChoice where layout is not critical.  And the width and height can be put on both <img> and <object> to hint at desired rendering size.  In both cases the attributes are optional as most rendering engines can use the default dimensions drawn from the media file itself.

But note that the <img> element has no way to specify the media type, which places a burden on the rendering engine: if it needs to sniff the image size it will have to use file-extension heuristics, magic bytes or similar to determine the type of file before it is able to use it.  These are the little things that cause bugs: <object> requires the media type, so it wins there too.
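To illustrate the kind of heuristic a renderer is forced into when no media type is declared, here is a minimal magic-byte sniffer (a hypothetical helper, not part of any QTI engine):

```python
def sniff_image_type(data):
    """Guess a media type from the first few bytes of an image file."""
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'image/png'
    if data.startswith(b'\xff\xd8\xff'):
        return 'image/jpeg'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'image/gif'
    return None  # fall back to file-extension guessing, or give up


print(sniff_image_type(b'\x89PNG\r\n\x1a\n' + b'\x00' * 16))  # image/png
```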

At first glance, <object> seems to lack the accessibility features of <img> as it has no alt text.  But it does provide a fairly rich way of handling platform limitations and accessibility needs, the rule being that the contents of the element should be used as a fallback if the object itself is not suitable.  There is an issue here with its use in QTI.  We chose <object> because we wanted to force people to use an image, not a text flow: in graphic questions drag-and-drop style renderings are common, and these are much harder with arbitrary chunks of HTML.  But if the image is in a format that the browser does not support, will a rendering agent apply the fallback rules, and what happens if it ends up with a text flow anyway?  QTI is silent on this, so I would not rely on any fallback content when <object> is used in graphical interactions for one of its special purposes.  If a delivery engine can't show the objects in a graphical question it should seek an alternative question, not an alternative rendering, in my opinion.

By the way, in the case of <gapImg> we explicitly added a label attribute alongside the <object> to make it clear that, if a label is given by the author it should be shown to all candidates and not treated as a more accessible alternative.  Which brings me on to the issue of alt text in assessment generally...

When I first left university I seriously considered working as a graphic artist, so speaking from this limited experience I feel entitled to say that however much you think of your drawing skills others may not recognize your 'dog', 'cat' or even your 'boa constrictor digesting an elephant' (Google it if you are curious).  If in doubt, a label is a good idea for everyone.



2011-07-17

Using gencodec to make a custom character mapping

One of the problems I face in the QTI migration tool is markup that looks like this:

<mattext>The circumference of a circle diameter 1 is given by the mathematical constant: </mattext>
<mattext charset="greek">p</mattext>

In XML the charset used in a document is detected according to various rules, starting from information available before the XML stream is parsed and culminating in the encoding declaration in the XML declaration at the top of the file:

<?xml version="1.0" encoding="UTF-8"?>

For this reason, the use of the charset parameter in QTI version 1 is of limited value, at best it might provide a hint on an appropriate font to use when rendering the element.  This is not a huge problem these days but when QTI v1 was written it was common for document renderings to be peppered with large squares indicating that the selected font had no glyph for the required character.  These days renderers are smarter about selecting default fonts enabling developers to display arbitrary unicode text.

So you would think that charset is redundant but there is one situation where we do need to take note: the symbol font. The problem is explained well in this article: Symbol font – Unicode alternatives for Greek and special characters in HTML.  The use of 'greek' in the QTI v1 examples is clearly intended to indicate use of the symbol font in a similar way - not the use of the 'greek' codepage in ISO-8859. The Symbol font is used a lot in older mathematical questions, you can play around with the codec on this neat little web page: Symbol font to Unicode converter.

According to the above article, the unicode character representing the lower-case letter 'p', when rendered in the symbol font, actually appears to the user like this: π - known as Greek small letter pi.

The problem for my Python script is that I need to map these characters to the target unicode forms before writing them out to the QTI version 2 file.   This is where the neat gencodec.py script comes in.  I don't know where this is documented other than in the gencodec source file itself.  But this is a very useful utility!

The synopsis of the tool is:

This script parses Unicode mapping files as available from the Unicode site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec modules from them.

So I downloaded the following mapping to a directory called 'codecs' on my laptop:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/SYMBOL.TXT

Then I ran the gencodec script:

$ python gencodec.py codecs pyslet
converting SYMBOL.TXT to pysletsymbol.py and pysletsymbol.mapping

And confirmed that the mapping was working using the interpreter:

$ python
Python 2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('p','symbol')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: symbol
>>> import pysletsymbol
>>> reg=pysletsymbol.getregentry()
>>> import codecs
>>> def SymbolSearch(name):
...   if name=='symbol': return reg;
...   else: return None
... 
>>> codecs.register(SymbolSearch)
>>> unicode('p','symbol')
u'\u03c0'
>>> print unicode('p','symbol')
π

In previous versions of the migration tool I didn't include symbol font mapping because I thought it would be too laborious to create the mapping.  I was wrong, future versions will do this mapping automatically.

2011-06-17

Getting ready for HTML5: Accessibility in QTI, img and alt text

Last night I was playing around with David McKain and co's excellent MathAssessEngine site.

I tripped over an interesting error with some test data produced by the QTI migration tool.  I was converting a basic MCQ with four images used as the choices.  On loading my content package into MathAssessEngine I got four errors like this:

Error: Required attribute is not defined: alt

I went off in search of a bug in my XML generation code from the migration tool but discovered that what MathAssessEngine is really complaining about is an empty string being used as the alt text for an image.  Actually, empty alt text is allowed by the specification (my QTI v2 files validate) and it is also allowed by HTML4, so I think this is more a bug in MathAssessEngine; but it did force me to go and check current thinking on this issue because it is so important for accessibility.

According to the current editor's draft of HTML5 the alt attribute "must be specified and its value must not be empty" so it looks like QTI-based tools will need to address this issue in the near future.

The problem with the QTI migration tool is that it only has old scrappy content to work with.  There isn't even the facility to put an alt attribute on QTI version 1.x's matimage which, incidentally, is another reason why the community should be moving to QTI version 2.

So is there any way to set the alt text automatically when migrating version 1 content?

One possibility is to use the label attribute on matimage as the alt text for the img element when converting to version 2.  The description of the label attribute in QTI version 1 is a 'content label' used for editing and searching.  This might be quite close to the purpose of alt for matimage because a text string used to find an image in a search might be a sensible string for use when the image cannot be displayed.  However,  editing sounds like something only the author would do so there is a risk that the label would be inappropriate for the candidate.  There is always the risk of spoiling the question, for example, if the label on an image contained the word "correct" then candidates that experienced the alt text instead of the image would be at a significant advantage!

Another common way to auto-generate alt text is to use the file name of the image, this is less likely to leak information as authors are more likely to figure that the file name might be visible to the candidate anyway.  Unfortunately, image file names are typically meaningless so it would be text for the sake of it and it might even confuse the candidate - especially if the names contained letters or numbers that might get confused with the controls: just imagine a speech interface reading a shuffled MCQ: "option A image B; option B image C;  option C image A" - now our poor alt text user is at a serious disadvantage.

Finally, adding a general word like 'image' is perhaps the safest thing and something I might experiment with in the future for the QTI Migration tool but clearly the word 'image' needs to be localized to the language used in the context of the image tag, otherwise it might also be distracting.  I don't have a suitable look-up table to hand.
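A label-based heuristic with a guard against leaking answer information might look something like this (entirely hypothetical, not what the migration tool currently does):

```python
# Words that suggest the label would give away the answer if used as alt
# text; the list is illustrative, not exhaustive.
ANSWER_WORDS = {'correct', 'incorrect', 'right', 'wrong', 'answer'}


def alt_from_label(label, default='image'):
    """Derive alt text from a QTI v1 matimage label, falling back to a
    generic (ideally localized) default when the label is empty or
    could spoil the question."""
    if not label or not label.strip():
        return default
    if any(word in ANSWER_WORDS for word in label.lower().split()):
        return default
    return label.strip()


print(alt_from_label('Boa constrictor digesting an elephant'))
print(alt_from_label('the correct option'))  # image
```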

So in conclusion, content converted from version 1 is always likely to need review for accessibility.  Also, my experience with the migration tool reaffirms my belief that developers of QTI 2 authoring tools should start enforcing the non-empty constraint on alt for compatibility now to get ready for HTML5.

2011-06-16

QTI on the Curriculum

01NPYPD - Linguaggi e Ambienti Multimediali: "qui"

It is amazing what you stumble upon when you use Google alerts. I was intrigued to see these course materials which include a 62-slide introduction to QTI version 2 alongside such esteemed subjects as HTML5, SVG and CSS.

The slides are in English by the way!


2011-02-03

Semantic Markup in HTML

A few days ago I spotted an interesting link from the BBC about the use of semantic markup.

This page got me thinking again about something I blogged about on my Questionmark blog too.  One of the problems we experienced during the development of QTI was the issue of 'presentation'.  In QTI, the presentation element refers to the structured material and interaction points that make up the question.  However, to many people the word 'presentation' means the form of markup used at the point the question is delivered.

I always found this confusion difficult.  Clearly people don't present the XML markup to candidates, so the real fear was that QTI would allow question authors to specify things that should be left open until the method of presentation is known by the delivery system.

For some people, using a language like HTML implies that you have crossed this line.  But the BBC page on using HTML to hold semantic markup is heartening to me because I think that QTI would be better bound directly into HTML too.

HTML has been part of QTI since the early days (when you had to choose between HTML and RTF for marking up material).  With QTI version 2 we made the integration much closer.  However, XHTML was in its infancy and work to make the XHTML schema more flexible through use of XML Schema and modularisation of the data model was only just getting going.  As a result, QTI contains a clumsy profile of XHTML transformed into the QTI namespace itself.

In fact, XHTML and XML Schema have not played so well together and HTML5 takes the format in a new technical direction as far as the binding is concerned.  For QTI, this may become a block to the rapid building of innovative applications that are also standards compliant.

But bindings are much less important than information.  I always thought that QTI XML would be transformed directly into HTML for presentation by server-side scripts or, if required, by pre-processing with XSLT to make HTML-based packages.  That hasn't really happened, so perhaps it is harder than it looks.

However, I did a little research and have had no difficulty transforming the simple QTI XML examples from QTI version 2 into XHTML 5 and back again using data-* attributes to augment the basic HTML markup.  I'll post the transform I used if there is interest.  Please add a comment/reply to this post.
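To give a flavour of the approach, here is a toy one-element transform; the data-* attribute names are my own invention for illustration, not the ones used in the actual transform:

```python
import xml.etree.ElementTree as ET

QTI_NS = 'http://www.imsglobal.org/xsd/imsqti_v2p1'


def simple_choice_to_html(elem):
    """Map a QTI simpleChoice element to an HTML li, stashing the QTI
    semantics in data-* attributes so the mapping can be reversed."""
    li = ET.Element('li')
    li.set('data-qti-element', 'simpleChoice')
    li.set('data-qti-identifier', elem.get('identifier', ''))
    li.text = elem.text
    return li


src = ET.fromstring(
    '<simpleChoice xmlns="%s" identifier="A">Paris</simpleChoice>' % QTI_NS)
html = simple_choice_to_html(src)
print(ET.tostring(html, encoding='unicode'))
```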

Steve