VOSpace service specification: Replies to Request for Comments
This page adds additional details to some of the replies to the questions raised on the VOSpace RFC comments page.
Comments from MarkTaylor
By default, all properties are treated as string name:value pairs, or more accurately as uri:value pairs. So a client can set a generic property, and the service will just store it as a string.
If we have a property URI that represents 'mime type', then a service that provided HTTP GET access could add the property value to the relevant header field of the HTTP GET response.
A basic service might not implement support for adding the mime type header to an HTTP GET response. In that case, it would not list the 'mime type' property in its 'accepts' or 'provides' list. A client can still set the property, but the service would just treat it as a string value and would not try to interpret any meaning from the value.
A more complex service might understand that the 'mime type' property should be added to the headers of an HTTP GET response. By including the property in its 'accepts' and 'provides' lists, the service is declaring that it will allow clients to modify the value, and that it will add it to the relevant header field of an HTTP GET response.
If we have a property URI that represents 'data size', then the value will depend on the actual data itself, and logically it should not be possible to set this from outside. By including the 'data size' property in its 'provides' list, the service is declaring that it 'understands' the property, and will generate the property value from the data.
(DaveMorris)
Comments from DougTody
In my view, a basic architectural principle for services is that it should be possible to understand and use any service stand-alone, independent of other software such as the registry (although the registry might be used for related higher level functions such as discovery). This does not appear to be the case here, as fundamental information about transport details and data formats or attributes are only available via indirect URIs which are intended to be registry resolvable (there are some weasel words about nonresolvable URNs, but clearly this is discouraged and registry integration is the intention).
Once we have registered the core properties, protocols and views then these will all become fixed URIs.
Although the URIs would refer to descriptive registry resources, in practice simple clients and services would not need to dereference these via the registry; they just become constant strings and can be hard coded into both service and client.
During normal operation neither the service nor the client should need to dereference any of the URI identifiers. They can all be treated as opaque identifier strings.
However I appreciate that this isn't clear in the way that the specification is worded.
A number of the questions raised in the RFC comments suggest that although the specification may be technically correct, it isn't clear about how things would actually work in a deployed service.
We plan to create a supplementary document which will include one or more concrete examples of how a get and put transfer would work.
In the meantime, the following use cases may help to explain a bit more:
Use case #1:
A client tries to access data in a service that only supports encrypted or authenticated protocols. The client does not understand these protocols, but it can still list the (unknown) protocol URIs in its error message:
  Unable to find common transfer protocol.
  Client supports :
    ivo://aaaa
    ivo://bbbb
  Service provides :
    ivo://xxxx
    ivo://yyyy
A slightly more complex client may dereference these URIs and show the display names of the unknown protocols:
  Unable to find common transfer protocol.
  Client supports :
    "Simple http put" (ivo://aaaa)
    "Simple ftp put" (ivo://bbbb)
  Service provides :
    "Secure http put" (ivo://xxxx)
    "Secure ftp put" (ivo://yyyy)
The user can then look for a different client tool that implements one of the secure protocols (by searching the registry for applications that support the protocol URI). Or, they can send a request to the application developer asking whether they plan to support the new protocol. All the user would need to pass to the developer is the URI of the new protocol, ivo://xxxx.
The corresponding description in the registry should provide enough information to enable the developer to implement the new protocol.
The simplest case would be to include a URL in the registry description that pointed to an external resource containing the full protocol specification.
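As a rough illustration of use case #1, the following sketch shows how a simple client might compare protocol URIs as opaque strings and build the error message. The function names and the display-name table are assumptions made for this example; they are not part of the VOSpace specification.

```python
# Hypothetical sketch: protocol URIs are compared as opaque strings.
def find_common_protocols(client_supports, service_provides):
    """Return the URIs both sides list, in the client's order of preference."""
    provided = set(service_provides)
    return [uri for uri in client_supports if uri in provided]


def describe(uri, display_names=None):
    """Format a URI, adding a display name if one has been looked up."""
    name = (display_names or {}).get(uri)
    return f'"{name}" ({uri})' if name else uri


def no_common_protocol_message(client_supports, service_provides, display_names=None):
    """Build an error message in the style of use case #1."""
    lines = ["Unable to find common transfer protocol.", "Client supports :"]
    lines += ["  " + describe(u, display_names) for u in client_supports]
    lines.append("Service provides :")
    lines += ["  " + describe(u, display_names) for u in service_provides]
    return "\n".join(lines)


client = ["ivo://aaaa", "ivo://bbbb"]
service = ["ivo://xxxx", "ivo://yyyy"]

if not find_common_protocols(client, service):
    # Basic client: URIs only, no registry access needed.
    print(no_common_protocol_message(client, service))
    # A client that has dereferenced the URIs can pass the names it found
    # (this table stands in for a registry lookup).
    names = {"ivo://aaaa": "Simple http put", "ivo://xxxx": "Secure http put"}
    print(no_common_protocol_message(client, service, names))
```

The registry is only consulted, if at all, to make the message friendlier. The comparison itself never needs it.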
Use case #2:
A service provider begins to see a lot of transfer requests using a new protocol listed in their server logs. If the logs indicate that a lot of client applications are capable of using the new protocol, then they may decide that it is time to update their service to support it. The service provider can pass the protocol URI to their developer team, who can then look up the description in the registry.
Use case #3:
A complex GUI client may dereference the protocol URIs and use the display name and description to present the options to the user in a more user-friendly way. To speed things up, the client implementation may hard code the most common ones and cache the unusual ones. The GUI tool may also use the descriptive names and text from the registry resource to populate select lists, tool tips or help boxes.
Summary
Basic things such as a data format ("view") or available transfer protocol (e.g., HTTP) are described indirectly in descriptors which are stored in a registry and referenced by URI (so far as I can tell). Although it does not explicitly say, I suspect we might be able to use string equality on such a URI to test for these things (as one would test a MIME type for example), but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do. The actual spec repeatedly says things like "at the time of this writing, the schema for registering in the IVO registry has not been finalized". Even ignoring the undesirable registry interaction, which should not be required to directly use a service, it does not appear that such details are sufficiently specified at this point.
"... I suspect we might be able to use string equality on such a URI to test for these things ..." - Yes
"... but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do ..." - No
Once the core set of properties, protocols and views have been registered, client and server implementations can treat the URIs as hard coded string constants.
There is no need to dereference the registry URIs unless an application such as a complex GUI client wants to use the information to display user friendly names and descriptions.
(DaveMorris)
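A minimal sketch of the string-equality test described in this reply, assuming the registered URIs are hard coded as constants. The URI values here are placeholders invented for the example, since the real identifiers depend on what is eventually registered.

```python
# Placeholder identifiers (hypothetical): the real values would be the
# URIs registered for the core views and protocols.
VIEW_VOTABLE = "ivo://example/views#votable"
PROTOCOL_HTTP_GET = "ivo://example/protocols#http-get"


def service_provides(advertised_uris, wanted_uri):
    """Plain string comparison, much like testing a MIME type.

    No registry lookup and no XML parsing is involved.
    """
    return wanted_uri in advertised_uris


# URIs a service might advertise; the second one is unknown to this client
# and is simply ignored.
advertised = [PROTOCOL_HTTP_GET, "ivo://example/protocols#something-new"]

print(service_provides(advertised, PROTOCOL_HTTP_GET))
print(service_provides(advertised, VIEW_VOTABLE))
```

An unknown URI does not cause a failure here. The client only needs to recognise the identifiers it has been built to handle.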
Although the spec says it does not attempt to address hierarchy, the restriction to "no slashes" in a URI path appears arbitrary. The internal logical structure implied by a pathname should be transparent to something like VOSpace, for which this is merely a component of a string identifying the file within a VOSpace. This might change if the VOSpace supports "directory"-level operations, but this could be added without affecting file pathnames within a VOSpace. Otherwise, to flatten a directory hierarchy, it will be necessary for a VOSpace to invent arbitrary filenames, unique within a VOSpace, to substitute for pathnames containing a slash.
This was intentional.
We plan to get v1.1 agreed as soon as possible, and that would support a hierarchical data structure, with '/' as the path delimiter.
This means that '/' will have a specific meaning in vospace 1.1, so to avoid confusion between the two we want to exclude it from use in vospace 1.0.
It may be possible for a vospace 1.1 service to publish a sub directory as a vospace 1.0 service, allowing vospace 1.0 clients to access data within just that sub directory. In that case, we want the URI identifiers to make sense between the two systems.
If we allow '/' in vospace 1.0 names (and just treat them as normal strings without interpreting the '/'), then there would be an 'odd' difference in behaviour between vospace 1.0 and vospace 1.1 services.
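To make the reasoning concrete, here is a hedged sketch of how the 'no slashes' rule keeps names consistent when a vospace 1.1 sub directory is exposed as a vospace 1.0 service. Both functions and the example paths are hypothetical; the specification does not define them.

```python
def is_valid_v10_name(name):
    """A vospace 1.0 node name must be non-empty and contain no '/'."""
    return bool(name) and "/" not in name


def v10_name_within(subdirectory, v11_path):
    """Name a 1.0 client would see for a 1.1 path exposed via `subdirectory`.

    Returns None when the path is not a direct child of the sub directory,
    because the remaining name would then need a '/'.
    """
    prefix = subdirectory.rstrip("/") + "/"
    if not v11_path.startswith(prefix):
        return None
    name = v11_path[len(prefix):]
    return name if is_valid_v10_name(name) else None


print(v10_name_within("surveys/2008", "surveys/2008/image.fits"))
print(v10_name_within("surveys", "surveys/2008/image.fits"))
```

Because a 1.0 name can never contain '/', every name a 1.0 client sees maps onto exactly one entry in the 1.1 hierarchy.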
I am not convinced of the need for "structured" nodes at this early stage, or for a VOSpace to perform arbitrary file format or data model-based conversions. This might be useful to allow a VOSpace to be used to access tables stored natively in a RDBMS, but at present this is not sufficiently well defined, at least not in the written specification. It is not clear whether a VOSpace should natively provide such a capability when this will already be provided by the more object-oriented DAL interfaces such as TAP, SIA, etc. It might be best to first provide a solid VOSpace interface for basic "file" (simple byte stream) access before addressing object-oriented access.
Support for StructuredData was added to the specification to complement the DAL services, not to replace them. One target application of VOSpace is to provide a way of importing data into these services.
Some specific examples might help :
Images in SIA
The SIA specification provides tools for finding and getting images, but as yet, no API for importing images. In order to provide an import mechanism, a SIA service could also provide a VOSpace API. The user could use the VOSpace API to transfer images into the service, using the StructuredData node to identify the files as FITS images.
VOTable in TAP
The TAP interface provides tools for searching and querying the database, but as yet, no API for importing data. The VOSpace API would enable the user to transfer tabular data into the service, using the StructuredData node to identify the files as tabular data.
Does pushToVOSpace deal with a single data node? It says it returns a list of URLs, but there is only a single destination node, so I would guess that the URLs refer to alternative transport protocols. How does the client decide which to use - does it have to parse each URL to determine the protocol? (Possibly this is addressed in the WSDL, but semantic details like this should be addressed in the written specification as well).
Yes, the list of protocol options represents different endpoints which provide access to the same data.
These could be alternative mirror services with the same protocol, or different protocol URLs.
If asked to provide access to the data via http or ftp, the service response could contain :
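As an illustration only, a response of this kind might be handled as sketched here. The sketch assumes that each option pairs a protocol URI with an endpoint URL; the URIs, host names and the choose_endpoints helper are all invented for the example and are not taken from the specification.

```python
# Hypothetical response: each option pairs a protocol URI with an endpoint
# URL. The first two are mirrors offering the same protocol.
options = [
    ("ivo://example/protocols#http-get", "http://mirror-a.example.org/data/1234"),
    ("ivo://example/protocols#http-get", "http://mirror-b.example.org/data/1234"),
    ("ivo://example/protocols#ftp-get", "ftp://ftp.example.org/data/1234"),
]


def choose_endpoints(options, preferred_protocols):
    """Return usable endpoint URLs, ordered by the client's protocol preference.

    The protocol is identified by its URI, so the endpoint URL itself is
    never parsed to work out which protocol it uses.
    """
    chosen = []
    for protocol in preferred_protocols:
        chosen += [url for uri, url in options if uri == protocol]
    return chosen


# A client that prefers ftp but can fall back to http.
preferred = ["ivo://example/protocols#ftp-get", "ivo://example/protocols#http-get"]
for url in choose_endpoints(options, preferred):
    print(url)
```

A client could then try the returned endpoints in order, moving on to the next mirror or protocol if one fails.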
One of the things I was looking for was whether a URL could be obtained to directly GET or PUT a file, and this does appear to be the case (e.g, pullFromVOSpace). This would allow us, for example, to have a data service deliver a message to a client giving the access reference URL for a data object, once it has been generated by an asynchronous service. This might allow the VOSpace machinery to be hidden from a simple client, for example when functionality such as VOSpace and UWS are used within a TAP or SIA data service.
Yes - this was one of the use cases we had in mind when designing the interface.