2014-11-24

LTI Tools: Putting Cookies in the Frame

Last year I ran into a problem with IMS LTI in some versions of Internet Explorer. It turns out that my LTI tool was assuming it was OK to save cookies in the browser but IE was assuming it wasn't. Chuck has written about this briefly on his blog and the advice given there is basically now best practice and documented in the specification itself. See LTI, Frames and Cookies – Oh MY!

So I just spent the day trying to implement something like this ready for my QTI migration tool to become an LTI tool rather than a desktop application and it was much harder than I thought it would be.

Firstly, why is this happening? Well if your tool is launched in a frame inside the tool consumer then you are probably falling in to the definition of 'third party' as far as cookies are concerned. There is clearly a balance to be had between privacy and utility here. If you think about it, framed pages probably know who framed them and so something innocuous like an advert for an online store could use cookies to piece together a browser history of every site you visited that included their ad. That's why when you visit your favourite online store after browsing some informational site to do product research the store seems to already know what you want to buy.

Cynics will see a battle between companies like Google and Amazon who make money by encouraging you to find products online (and buy them) and Microsoft who are more into the business of selling software and services, especially to businesses who may not appreciate having their purchasing department gamed. Perhaps it is no wonder that Amazon found themselves in court in Seattle having to defend their technical implementation of P3P.

It turns out that IE can/could be persuaded to accept your tool's cookie if your tool publishes a privacy policy which IE isn't able to parse. I don't want to appear too dismissive but I simply cannot understand how decisions about privacy can be codified into policies that computers can then read and execute with any degree of usefulness. Stackoverflow has some advice on how to do it 'properly' if you are that way inclined, however.


For the rest of us, we're going to have to get used to the idea that cookies may not be accepted and follow Chuck's advice to workaround the issue by opening a new window. It isn't just IE now anyway, by default Safari will block cookies in this situation too. Yes, the solution is horrible, but how horrible?

Some time ago I wrote some Python classes to help implement basic LTI. That felt like a good starting point for my new LTI tool. Here's Safari, pointing at the IMS LTI test harness which I've just primed with the URL of my tool: http://localhost:8081/launch

In the next shot, I've scrolled down to where the action is going to take place. I've opened Safari's developer tools so we can see something of what is happening with cookies.

So what happens when I hit "Press to Launch"? It all happens rather quickly but the frame POSTs to my launch page, which then redirects to a cookie test page. That page realises that the cookies didn't stick and shows a form which autosubmits, popping up a new window with my tool content in. The content is not very interesting but here's how my browser looked immediately after:

There's a real risk that this window will be blocked by a pop-up blocker. Indeed, Chrome does seem to block this type of pop-up window. In fact, Chrome allows the original frame to use cookies so on that browser the pop-up doesn't need to fire. I only tripped the blocker during simulated tests. Still, it is clear that you do need to consider that LTI tools may neither have access to cookies nor be able to pop-up a new window automatically. To cover this case my app has a button that allows you to open the window manually instead, if I change tabs back to the Tool Consumer page you'll see how my pop-up sequence has left the original page/frame.

Notice that Safari now shows my two cookies. These cookies were set by the new window, not by the content of this frame, even though they show up in this view. They show me that my tool has not only displayed its content but has successfully created a cookie-trackable session.


The code required to make this happen is more complex than I thought. I've tried to represent my session construction logic in diagramatic form. Each box represents a request. The top line contains the incoming cookie values ('-' means missing) and the page name. The next line indicates the values of some key query parameters used to pass Session information, Return URL and whether or not a pop-up Window has been opened. The final line shows the out-going cookies. If you follow the right hand path from top to bottom you see a hit to the home page of the tool (aka launch page) with no cookies. It sets the values 0/A and redirects to the test page which confirms that cookies work, outputs a new session identifier (B) and redirects to the content. The left-turns in the sequence show the paths taken when cookies that were set are blocked by the browser.

Why does the session get rewritten from value A to B during the sequence? Just a bit of paranoia. Exposing session identifiers in URLs is considered bad form because they can be easily leaked. Having taken the risk of passing the initial session identifier through the query string we regenerate it to prevent any form of session hijacking or fixation attacks. This is quite a complex issue and is related to other weaknesses such as cross-site request forgery. I found Cross-Site Request Forgery (CSRF) Prevention Cheat Sheet very useful reading!


I will check the code for this into my Pyslet GitHub project shortly, if you want a preview please get in touch or comment on this post. The code is too complex to show in the blog post itself.

2014-11-14

Basic Authentication, SSL and Pyslet's HTTP/OData client

Pyslet is my Python package for Standards in Learning Education and Training and represents a packaging up of the core of my QTI migration script code in a form that makes it easier for other developers to use. Earlier this year I released Pyslet to PyPi and moved development to Github to make it easier for people to download, install and engage with the source code.

Note: this article updated 2017-05-24 with code correction (see comments for details).

Warning: The code in this article will work with the latest Pyslet master from Github, and with any distribution on or later than pyslet-0.5.20141113. At the time of writing the version on PyPi has not been updated!

A recent issue that came up concerns Pyslet's HTTP client. The client is the base class for Pyslet's OData client. In my own work I often use this client to access OData feeds protected with HTTP's basic authentication but I've never properly documented how to do it. There are two approaches...

The simplest way, and the way I used to do it, is to override the client object itself and add the Authorization header at the point where each request is queued.

from pyslet.http.client import Client

class MyAuthenticatedClient(Client):

    # add an __init__ method to set some credentials 
    # in the client

    def queue_request(self, request):
        # add in the authorization credentials
        if (self.credentials is not None and
                not request.has_header("Authorization")):
            request.set_header('Authorization',
                               str(self.credentials))
            super(MyAuthenticatedClient, self).queue_request(request)

This works OK but it forces the issue a bit and will result in the credentials being sent to all URLs, which you may not want. The credentials object should be an instance of pyslet.http.auth.BasicCredentials which takes care of correctly formatting the header. Here is some sample code to create that object:

from pyslet.http.auth import BasicCredentials
from pyslet.rfc2396 import URI

credentials = BasicCredentials()
credentials.userid = "user@example.com"
credentials.password = "secretPa$$word"
credentials.protectionSpace = URI.from_octets(
    'https://www.example.com/mypage').get_canonical_root()

With the above code, str(credentials) returns the string: 'Basic dXNlckBleGFtcGxlLmNvbTpzZWNyZXRQYSQkd29yZA==' which is what you'd expect to pass in the Authorization header.

To make this code play more nicely with the HTTP standard I added some core-support to the HTTP client itself, so you don't need to override the class anymore. The HTTP client now has a credential store and an add_credentials method. Once added, the following happens when a 401 response is received:

  1. The client iterates through any received challenges
  2. Each challenge is matched against the stored credentials
  3. If matching credentials are found then an Authorization header is added and the request resent
  4. If the request receives another 401 response the credentials are removed from the store and we go back to (1)

This process terminates when there are no more credentials that match any of the challenges or when a code other than 401 is received.

If the matching credentials are BasicCredentials (and that's the only type Pyslet supports out of the box!), then some additional logic gets activated on success. RFC 2617 says that for basic authentication, a challenge implies that all paths "at or deeper than the depth of the last symbolic element in the path field" fall into the same protection space. Therefore, when credentials are used successfully, Pyslet adds the path to the credentials using BasicCredentials.add_success_path. Next time a request is sent to a URL on the same server with a path that meets this criterium the Authorization header will be added pre-emptively.

If you want to pre-empt the 401 handling completely then you just need to add a suitable path to the credentials before you add them to the client. So if you know your credentials are good for everything in /website/~user/ you could continue the above code like this:

credentials.add_success_path('/website/~user/')

That last slash is really important, if you leave it off it will add everything in '/website/' to your protection space which is probably not what you want.

SSL

If you're going to pass basic auth credentials around you really should be using https. Python makes it a bit tricky to use HTTPs and be sure that you are using a trusted connection. Pyslet tries to make this a little bit easier. Here's what I do.

  1. With Firefox, go to the site in question and check that SSL is working properly
  2. Export the certificate from the site in PEM format and save to disk, e.g., www.example.com.crt
  3. Repeat for any other sites I want my python script to work with.
  4. Concatenate the files together and save them to, say, 'certificates.pem'
  5. Pass this file name to the HTTP (or OData) client constructor.
from pyslet.http.client import Client

my_client = Client(ca_certs='certificates.pem')
my_client.add_credentials(credentials)

In this code, I've assumed that the credentials were created as above. To be really sure you are secure here, try grabbing a file from a different site or, even better, generate a self-signed certificate and use that instead. (The master version of Pyslet currently has such a certificate ready made in unittests/data_rfc2616/server.crt). Now pass that file for ca_certs and check that you get SSL errors! If you don't, something is broken and you should proceed with caution, or you may just be on a Mac (see notes in Is Python's SSL module correctly validating certificates... for details). And don't pass None for ca_certs as that tells the ssl module not to check at all!

If you don't like messing around with the certificates, and you are using a machine and network that is pretty trustworthy and from which you would happily do your internet banking then the following can be used to proxy for the browser method:

import ssl, string
import pyslet.rfc2396 as uri

certs = []
for s in ('https://www.example.com', 'https://www.example2.com', ):
    # add other sites to the above tuple as you like
    url = uri.URI.from_octets(s)
    certs.append(ssl.get_server_certificate(url.get_addr(),
                 ssl_version=ssl.PROTOCOL_TLSv1))
    with open('certificates.pem', 'wb') as f:
        f.write(string.join(certs,''))

Passing the ssl_version is optional above but the default setting in many Python installations will use the discredited SSLv3 or worse and your server may refuse to serve you, I know mine does! Set it to a protocol you trust.

Remember that you'll have to do this every so often because server certificates expire. You can always grab the certificate authority's certificate instead (and thereby trust a whole slew of sites at once) but if you're going that far then there are better recipes for finding and re-using the built-in machine certificate store anyway. The beauty of this method is that you can self-sign a server certificate you trust and connect to it securely with a Python client without having to mess around with certificate authorities at all, provided you can safely courier the certificate from your server to your client that is! If you are one of the growing number of people who think the whole trust thing is broken anyway since Snowden then this may be an attractive option.

With thanks to @bolhovsky on Github for bringing the need for this article to my attention.

2014-11-04

VMWare Fusion, Windows 8 and British Keyboards

I've been running VMWare Fusion for years to run a dual monitor setup with one screen Mac and the other screen 'PC'.  I recently upgraded to VMWare Fusion 7 and my Windows 7 virtual machine kept working just fine.  However I recently had to start from scratch with a new VM to install Windows 8.1.  I realise that there may be upgrade routes via Windows 8 but, after two hours on the phone with Microsoft technical support for an unrelated issue I personally decided that a clean install would be the way to go.

The problem is, the keyboard just doesn't work properly in the default setup.  It is years since I had to play with these things and it took me a while to figure out how to put mappings in place that make my British keyboard do the right thing.  There may actually be a problem with Fusion 7 if this forum thread is anything to go by.

The symptoms are that the '@' symbol, which is typed with shift-2 on the Mac keyboard was coming out as double-quote like it would on a PC keyboard and try as I might I could not figure out how to type a back-slash at first at all.

The solution is to go to the VM's settings, duplicate the profile you are using (VMWare created a Windows 8 profile for me) and then add some custom mappings. I found this a bit counter intuitive because you need to start by opening something like Notepad in the VM and experimenting until you find the characters you want, note down the key combination you pressed to get them and then switch to the profile screen and add a mapping from the keys you want to press to the key combinations you just discovered work. A picture will help...

Shift-'
This is the double quote on my keyboard but without the custom profile it types '@', a simple switch with Shift-2 that almost all Mac users are probably already familiar with.
Shift-2
The reverse mapping of the above
Option-3
Turns out that the backslash key types a hash/number/sharp sign on the PC so we map Option-3 to that if, like me, you have become habituated to the Apple way.
\
Turns out that the backslash is now typed by the section sign (that's the curly thing that looks like a tiny spiral-arm galaxy).
Shift-`
Lastly, you may have to hunt for the '~', but it gets typed when you hit shift-backslash on the PC. On the Mac it appears over the back-quote or back-tick symbol.

That's the most important ones. Note that shift-3 correctly types the pound-sterling sign '£' so I didn't have to touch that at all. I have no idea how one types a section sign or the combined plus-minus '±' sign on the PC but in my line of work there is very little call for them. I warmly invite readers to add their own suggested mappings as comments to this post!