2012-05-04

IMS LTI and the length of oauth_consumer_key

I ran into an interesting problem today.  While playing around with the IMS LTI specification I hit MySQL's restriction that an index key can be at most 1000 bytes long.


ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes


OAuth uses the concept of a consumer key to identify the system from which a signed HTTP request has been generated.  The consumer key can, in theory, be any string of Unicode characters and the specification is silent on the issue of a maximum length.  The LTI specification uses examples in which the consumer key is derived from the DNS name of the originating system, perhaps prefixed with some additional identifier.  A DNS name can be a maximum of 255 octets (253 characters in the usual dotted text form), and the character set of a DNS name is restricted to a simple ASCII subset.  International domain names are now allowed but these are transformed into the simpler ASCII form, which is longer than the original, so the effective maximum length for a domain name using characters outside the simple ASCII set is reduced.

It seems likely that an oauth_consumer_key is going to get used as a key in a database table at some point during your implementation.  The clue is in the name.

A field such as VARCHAR(255) seems reasonable as storage, provided the character set of the field can take arbitrary Unicode characters.  Unfortunately this is likely to reserve a large amount of space: MySQL reserves 3 bytes per character when the UTF-8 character set is used, to ensure that the worst-case encoding is accommodated.  That means that this key alone takes up 255 x 3 = 765 bytes of the 1000 byte limit, leaving only 235 bytes for the other columns of any compound key.  If the other column is also a UTF-8 VARCHAR, that's a maximum of VARCHAR(78), which seems short if it holds something like LTI's context_id, which is also an arbitrary Unicode string of unrestricted size.  The context_id identifies the course within the Tool Consumer, so a combined key of oauth_consumer_key and context_id looks like a reasonable choice.
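
The arithmetic above can be checked with a few lines of Python.  This is only a sketch of the calculation: the 1000 byte limit is taken from the error message and the 3 bytes per character from the worst case described above, and the names (MAX_KEY_BYTES, index_bytes and so on) are made up for illustration.

```python
# Sketch of the key-size arithmetic; all names here are illustrative.
MAX_KEY_BYTES = 1000   # limit quoted in the MySQL error message
BYTES_PER_CHAR = 3     # worst case reserved per character for UTF-8 columns


def index_bytes(varchar_length, bytes_per_char=BYTES_PER_CHAR):
    """Bytes reserved in the index for a VARCHAR(varchar_length) column."""
    return varchar_length * bytes_per_char


# Space taken by oauth_consumer_key stored as VARCHAR(255)
consumer_key_bytes = index_bytes(255)

# What is left over for the second column of a compound key
remaining_bytes = MAX_KEY_BYTES - consumer_key_bytes

# Largest VARCHAR(n) that still fits in the remaining space
max_second_varchar = remaining_bytes // BYTES_PER_CHAR

print(consumer_key_bytes, remaining_bytes, max_second_varchar)
```

Changing BYTES_PER_CHAR to 1 shows why an ASCII-only column is so much cheaper: both columns could then be VARCHAR(255) with room to spare.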

One possibility might be to collapse consumer key values onto ASCII using the same (or a similar) algorithm to the one used for international domain names (see RFC 3490).  This algorithm would then allow use of the ASCII character set for these keys with the benefit that keys based on domain names, even if expressed in the Unicode original form, would end up taking 255 bytes or less.  Doing the translation may add to the overhead of the look-up but the benefit of reducing the overall key size might pay off anyway.
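
As a rough sketch of that idea, Python's built-in "idna" codec implements the RFC 3490 conversion, so a consumer key that looks like a domain name can be collapsed onto ASCII before it is used for a look-up.  The function name to_ascii_key and the example key are my own inventions, and the codec only suits keys that are valid domain names; a key with an arbitrary prefix or other free-form text would need its own algorithm.

```python
def to_ascii_key(consumer_key):
    """Collapse a domain-style consumer key onto ASCII.

    Uses Python's built-in 'idna' codec (RFC 3490).  The codec raises
    UnicodeError for input it cannot treat as a domain name, for example
    a label longer than 63 characters.
    """
    return consumer_key.encode("idna").decode("ascii")


# Hypothetical consumer key based on an international domain name
key = to_ascii_key("bücher.example")
print(key)
```

The result contains only ASCII characters, so it can be stored in a column with a single-byte character set.  Note that the codec also normalises case as part of the conversion, which may or may not be what you want for a consumer key.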

2 comments:

  1. The way I'll be dealing with this when I get my tool using a database (xml files for now) is using an md5 of the LTI key as the database table key.

    1. Thanks for the tip Niall, yes I can see that a digest of the longer keys makes sense. It may be a good idea to move to a stronger algorithm than MD5 though as I am increasingly seeing code audit issues where it has been used. The SHA-2 based algorithms are now preferred -- http://en.wikipedia.org/wiki/SHA-2
