The ideas in this note arose from discussions in IVOA meetings, in AstroGrid and in the VOTech-DS3 component of EuroVO. I thank my colleagues there for all their ideas and advice. In particular, I thank Reagan Moore for first pointing out that Shibboleth was useful and Andy Lawrence for insisting that it still be considered when I'd discovered the problems and was becoming dismissive.
The idea of using Shibboleth services with PKI authentication was taken from Von Welch's "GridShib" project, as presented at the UK e-Science meeting on security [e-Science], although the details have been worked out separately and the eventual solution may not be the same.
Shibboleth [Shibboleth] is an on-line security system developed by the Internet 2 project. It provides authentication functions for HTTP services and also serves information useful for authorization decisions. At base, Shibboleth is a single-sign-on (SSO) facility for a grid of HTTP services. The authors of Shibboleth are particularly concerned with the case where the user agent is a web browser and the HTTP service is a conventional web server.
Shibboleth attempts to deal with huge numbers of users by 'federating administration of users'. This means that users are registered, and their accounts managed, at the users' home institutions. These registration details are looked up at run-time by the 'payload' services needing to authenticate the users; Shibboleth is the set of middleware that allows the payload services to perform this authentication.
The problem addressed by Shibboleth is very close to that faced by the IVO in controlling access to resources. Shibboleth is a possible solution, both as a design for an SSO system and as a reference implementation. This paper examines the Shibboleth design, looking at how it might fit into the IVOA architecture. At the time of writing, I have not evaluated the Shibboleth implementation; that could be done if there is sufficient interest in adopting the design.
Shibboleth adds a number of services to the basic, unsecured system. At each site with web resources needing to be secured, there is a Shibboleth Attribute Requester (SHAR) and a Shibboleth Indexical Reference Establisher (SHIRE). At each site where users are registered, there is an Attribute Authority (AA), a Handle Service (HS) and a local authentication system (by which users sign on). There is also, typically, a Where Are You From (WAYF) service. The WAYF is chosen by the service provider, so, in principle, each payload service could run its own WAYF. However, a WAYF is typically provided by the virtual organization in which the services are federated.
The SHIRE, WAYF and HS are used to establish a handle for a user. This handle is an anonymized, unique reference to the user that is understood by the other services at the user's point of registration.
The SHAR and AA are used to authenticate the user's use of a handle and to provide 'attributes' to the payload service that allow it to make an authorization decision.
The security-check process is in three parts.
The SHIRE intercepts the attempted access to the web resource and initiates the search for a handle. It typically delegates this to a WAYF service. The delegation is done by sending a redirect response to the user agent. The URL to which the agent is redirected contains two parameters: the URL that the agent was originally trying to reach and a service endpoint in the SHIRE to which a handle may be sent.
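The redirect built by the SHIRE can be sketched as follows. This is a minimal illustration, not the reference implementation: the query-parameter names `target` and `shire`, the helper `wayf_redirect_url` and all the example URLs are assumptions made for the sketch.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wayf_redirect_url(wayf_base: str, target_url: str, shire_callback: str) -> str:
    """Build the URL to which a SHIRE might redirect the user agent.

    The parameter names 'target' and 'shire' are assumed for illustration.
    """
    query = urlencode({"target": target_url, "shire": shire_callback})
    return f"{wayf_base}?{query}"


# Hypothetical endpoints, used only to exercise the function.
target = "https://data.example.org/images/siap?POS=180,0&SIZE=0.5"
callback = "https://data.example.org/shire/handle"
location = wayf_redirect_url("https://wayf.example.org/WAYF", target, callback)

# The WAYF can recover both values unchanged from the query string.
params = parse_qs(urlsplit(location).query)
```

Note that the original URL, including its own query string, has to be percent-encoded so that it survives as a single parameter of the redirect URL.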
The WAYF 'interacts with [the user] to find out his origin site'. The Shibboleth documents explicitly decline to state how this interaction is carried out. The protocol specification says:
'A WAYF is free to interact with the principal's user agent in any manner it deems appropriate to determine the identity provider to which to relay the authentication request. This includes, but is not limited to, presenting lists, a search interface, heuristics based on client characteristics, etc. A WAYF service SHOULD provide some means for the user agent to cache the user's selection, perhaps using HTTP cookies, but SHOULD also provide a reasonable means for the user to change the selection in the future.'
Given that Shibboleth is intended for use with web browsers, this means the WAYF sends the user agent a form for the user to fill in. Having accepted the form, the WAYF sets one or more cookies to retain the information for future authentications. This is the only method that can work with current web-browsers. The details of the form and the cookies are not defined and can vary between virtual organizations.
The WAYF redirects the user agent to the HS, passing on in the URL the two parameters that the WAYF itself received from the SHIRE: the target URL and the call-back URL on the SHIRE. The HS derives a handle for the user. The means by which the HS identifies the user is not specified exactly:
'...the principal is identified by the identity provider by some means outside the scope of this specification. This may require a new act of authentication or it may reuse an existing authenticated session.'
Given that the user agent is typically a web browser, 'reuse an existing session' implies that the HS looks for an HTTP cookie set when the user originally logged on to the system.
The HS encodes the handle in a SAMLResponse (an XML structure defined by the Security Assertion Mark-up Language). The HS signs this structure digitally.
The HS then returns an HTML form to the user agent and the agent is assumed to display it to the user. The form tells the user what security information is being shared with the SHIRE; the SAMLResponse carrying that information is present as a base-64-encoded, hidden parameter of the form. Submitting the form sends an HTTP-post message to the callback endpoint specified by the SHIRE when it started the authentication process.
There is provision for automating this stage:
'Furthermore [the HS] MAY include in the response sufficient client-side scripting to cause the form to be submitted automatically without intervention by the user...'
'Client-side scripting' presumably means JavaScript or ECMAScript.
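The form-based hand-off from the HS to the SHIRE can be sketched as below. This is an illustration under stated assumptions, not the Shibboleth implementation: the field name `TARGET`, the helper names and the URLs are hypothetical, the XML is a placeholder rather than a real SAMLResponse, and the digital signature is omitted.

```python
import base64
import html
from html.parser import HTMLParser


def handle_form(callback_url: str, target_url: str, saml_response_xml: str) -> str:
    """Build a form of the kind the HS might return to the user agent."""
    encoded = base64.b64encode(saml_response_xml.encode("utf-8")).decode("ascii")
    return (
        f'<form method="post" action="{html.escape(callback_url)}">\n'
        f'  <input type="hidden" name="TARGET" value="{html.escape(target_url)}">\n'
        f'  <input type="hidden" name="SAMLResponse" value="{encoded}">\n'
        f'  <input type="submit" value="Continue">\n'
        f'</form>'
    )


class HiddenFields(HTMLParser):
    """Collect the hidden inputs of a form, as a non-browser agent would have to."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            self.fields[attributes["name"]] = attributes["value"]


# Placeholder content standing in for a signed SAMLResponse.
placeholder = "<Response>opaque-handle-1234</Response>"
form = handle_form(
    "https://data.example.org/shire/handle",
    "https://data.example.org/images/siap?POS=180,0&SIZE=0.5",
    placeholder,
)

parser = HiddenFields()
parser.feed(form)
decoded = base64.b64decode(parser.fields["SAMLResponse"]).decode("utf-8")
```

A user agent that is not a browser must do what `HiddenFields` does here: parse the HTML, extract the hidden parameters and post them to the call-back endpoint itself.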
The SHIRE is required to validate the signature on the SAMLResponse. However, the details of how it does this are not fully specified by Shibboleth:
'The verification key is assumed to be obtainable through unspecified means (e.g. in a certificate passed along with the [SAMLResponse]; also unspecified is how the association between that key and the HS is to be validated by the SHIRE...'
If the SHIRE accepts the handle, it passes it to the SHAR by unspecified means:
'Shibboleth doesn't specify the interaction between the SHIRE and the SHAR components. In many, perhaps most, cases, the SHIRE and SHAR will be elements of a common implementation module within an HTTP server...'
Once the SHAR has the handle, it can ask for SAML 'attributes' relating to that handle. It sends a SAML request to the Attribute Authority and receives back a SAML response. The SAML specification gives schemata for this request and response and the Shibboleth specification defines which parts of SAML must be used.
The SHARs and AAs in a given virtual organization may use any protocol, but the Shibboleth specification requires that both support SOAP 1.1 over HTTPS. This is presumably the protocol supported by the reference implementation of Shibboleth.
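As a rough sketch of the attribute request, the following builds a SOAP 1.1 envelope around a simplified SAML attribute query for a handle. The element and namespace names follow my understanding of SAML 1.1; the function name, the handle and the URLs are hypothetical, and items that a real request would need (request identifier, issue instant, any signature) are deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SAMLP_NS = "urn:oasis:names:tc:SAML:1.0:protocol"
SAML_NS = "urn:oasis:names:tc:SAML:1.0:assertion"


def attribute_query_envelope(handle: str, name_qualifier: str, resource: str) -> str:
    """Wrap a simplified attribute query for one handle in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(
        body, f"{{{SAMLP_NS}}}Request", {"MajorVersion": "1", "MinorVersion": "1"}
    )
    # Resource names the payload resource for which attributes are wanted.
    query = ET.SubElement(request, f"{{{SAMLP_NS}}}AttributeQuery", {"Resource": resource})
    subject = ET.SubElement(query, f"{{{SAML_NS}}}Subject")
    name_id = ET.SubElement(
        subject, f"{{{SAML_NS}}}NameIdentifier", {"NameQualifier": name_qualifier}
    )
    name_id.text = handle  # the opaque handle issued by the HS
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical values, for illustration only.
message = attribute_query_envelope(
    "opaque-handle-1234",
    "https://idp.example.org/shibboleth",
    "https://data.example.org/images/siap",
)

# Round-trip check: the handle can be read back from the envelope.
parsed = ET.fromstring(message)
found = parsed.find(f".//{{{SAML_NS}}}NameIdentifier")
```

The SHAR would post such a message to the AA over HTTPS and read the attributes from the SAML response in the SOAP body of the reply.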
Shibboleth has some features considered important for the IVO.
Shibboleth also comes with a working reference implementation. It can be added to a simple web-server without writing any code and, apparently, without changing any of the content on that web server. Thus, Shibboleth security can easily be applied to static files on a web server and to CGI services on such a server. It would be quite straightforward, for example, to apply it to the SIAP service ivo://uk.ac.cam.ast/INT-WFS/images/siap-atlas, which is a CGI programme running on an Apache web-server. However, for the reasons listed in the following section, it would be harder to apply Shibboleth to a service made of Java servlets and very hard to apply Shibboleth to any SOAP service.
Shibboleth is designed to control access to web pages by users with web browsers. The primary use-case in the Shibboleth specification is 'Joe surfs the web'. This focus severely restricts the use of other user agents. Shibboleth assumes the following points.
I am not sure about point 10; the architecture documents imply it but do not state it.
Points 1 and 2 and, to a lesser extent, points 3 and 5 make it difficult to use a Shibboleth-protected service from a user agent that is not a web browser. Even the common command-line tool wget will not cope with a Shibboleth-protected web-site. It is not feasible in the general case to write a user agent that can cope autonomously with the arbitrary, unspecified form sent by a WAYF; however, it might be possible to automate this process for a given, known WAYF.
Points 7, 8 and 9 make it difficult to build a Shibboleth system out of parts from different authors. So much protocol is left unspecified that I assume that Shibboleth works only because it is a single, reference implementation. An alternative implementation of some part could be made by reverse-engineering the protocol from the reference implementation (the source-code is open), but this could prove fragile.
Point 4 makes it difficult to use Shibboleth to protect resources on an FTP server. It is perhaps possible to map every FTP URL to an equivalent HTTP URL, and to redirect from the HTTP server to the FTP server after the security check; but this makes the work of setting up web resources more complex and fragile, and the underlying FTP URLs are then only obscure, not truly secure.
The most-limiting point is number 10. If I understand the architecture correctly, the body of a request for a web resource is lost in the process of authentication; only the URL, with any embedded parameters, survives. This means that SOAP messages sent over HTTP are destroyed by Shibboleth, unless the SOAP envelope is encoded and embedded in the URL, which is a perverse way to use SOAP.
If Shibboleth does preserve the message body, then it is possible to write a SOAP client that deals with Shibboleth. However, not all toolkits for generating SOAP stubs will handle the redirections; Apache Axis, for example, will not.
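The concern about lost message bodies can be illustrated with a toy model. It encodes my reading of the architecture, not verified behaviour of the implementation: the only thing carried through the redirections is the target URL, so the request that finally reaches the payload service is a plain GET. The class, the function names, the `target` parameter and the URLs are all hypothetical, and the WAYF and HS steps are collapsed into one.

```python
from dataclasses import dataclass
from urllib.parse import urlencode, urlsplit, parse_qs


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    body: bytes = b""


def shire_intercept(request: Request, wayf_url: str) -> str:
    """Return the redirect location; only the request's URL is carried forward."""
    return wayf_url + "?" + urlencode({"target": request.url})


def resume_after_authentication(redirect_location: str) -> Request:
    """Rebuild the request that reaches the payload service after sign-on."""
    target = parse_qs(urlsplit(redirect_location).query)["target"][0]
    return Request("GET", target)  # nothing of the original body is available here


# A SOAP call: the meaningful content is in the body, not the URL.
soap_call = Request(
    "POST",
    "https://data.example.org/services/Query",
    b"<soap:Envelope>...</soap:Envelope>",
)
location = shire_intercept(soap_call, "https://wayf.example.org/WAYF")
resumed = resume_after_authentication(location)
```

In this model the URL survives intact but the SOAP envelope does not, so a SOAP client would have to detect the redirection and resend its message once the handle has been established.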
The use of the handle system limits the strength of the authentication of agents to payload services. Handles are effectively secrets; any agent that learns of a valid handle can use it to authenticate to a payload service. If the user agent connects to the payload site using unencrypted HTTP rather than HTTPS, then handles are being transmitted in clear text, so can be read and copied. This can be mitigated by considering the handles to be valid only for a short time (say 10 minutes) and then requiring the user to reauthenticate to the system. This works for casual, interactive browsing but is a problem for automated or scripted use of the system.
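The short-lifetime mitigation can be sketched as a store of handles with an expiry time. This is an illustration only: the class `HandleStore`, its methods and the use of random tokens as handles are assumptions, and the 10-minute lifetime is simply the figure suggested above.

```python
import secrets
import time


class HandleStore:
    """Issue opaque handles and accept each one only for a limited time."""

    def __init__(self, lifetime_seconds: float = 600.0, clock=time.monotonic) -> None:
        self._lifetime = lifetime_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._expiry = {}

    def issue(self) -> str:
        handle = secrets.token_urlsafe(32)  # unguessable, but still a bearer secret
        self._expiry[handle] = self._clock() + self._lifetime
        return handle

    def is_valid(self, handle: str) -> bool:
        expires = self._expiry.get(handle)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expiry[handle]  # expired: the user must reauthenticate
            return False
        return True


# Usage with a fake clock, so that time can be advanced by hand.
now = [0.0]
store = HandleStore(lifetime_seconds=600.0, clock=lambda: now[0])
handle = store.issue()
valid_at_start = store.is_valid(handle)
now[0] = 600.0
valid_after_ten_minutes = store.is_valid(handle)
```

Expiry narrows the window in which a copied handle is useful but does not close it; within the lifetime, anyone holding the handle is still accepted, and a long-running script must be able to obtain a fresh handle without the user present.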
I consider that the Shibboleth Handle Service and Attribute Authority service are both usable as they are (relatively) simple SAML services. The use of redirections to obtain a handle is not suitable for the IVO as it does not work well for services that are not CGI programmes and for clients that are not interactive web-browsers. The use of handles as secrets is too weak to be a general basis for the IVO; it may be secure enough for some cases but we should not impose the weakness on services that need to be more secure.
The value in using Shibboleth parts lies more in reusing existing databases of users than in reusing software. If we use Shibboleth at all, then we certainly have to rewrite parts of it to cover web services.
Therefore, the most fruitful path seems to be to use Shibboleth installations as a source of handles and attributes, but to replace the redirection 'dance' by which they are used. The following approach seems plausible:
The implication is that we keep the Shibboleth installations at user sites, add some parts at those sites and replace the Shibboleth parts intended for payload sites.
This model is discussed in more detail in the proposed SSO architecture for the IVO [SSO architecture].
[e-Science] UK e-Science core programme, Town meeting on Security for e-Science, approaches and interoperability, http://www.nesc.ac.uk/events/townmeeting0405/
[SAML] SAML technical committee of OASIS, SAML v1.1 information, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=security#samlv11
[Shibboleth] Internet 2 project, Shibboleth web-site, http://shibboleth.internet2.edu/