International Virtual Observatory Alliance


Revision 4, 2012-06-26 - root


VOSpace service specification: Replies to Request for Comments

This page adds further detail to some of the replies to the questions raised on the VOSpace RFC comments page.

Comments from MarkTaylor

getProperties operation
- accepts return value is described as "A list of identifiers that the service accepts and understands". What does "understands" mean here?

This is a way for the service to declare that it will interpret specific properties as having specific meanings.

As an example :
By default, all properties are treated as string name:value pairs, or more accurately as uri:value pairs. So a client can set a generic property, and the service will just store it as a string.

If we have a property URI that represents 'mime type', then a service that provided HTTP GET access could add the property value to the relevant header field of the HTTP GET response.

A basic service might not implement support for adding the mime type header to an HTTP GET response. In that case, it would not list the 'mime type' property in its 'accepts' or 'provides' list. A client can still set the property, but the service would just treat it as a string value and would not try to interpret any meaning from it.

A more complex service might understand that the 'mime type' property should be added to the headers of an HTTP GET response. By including the property in its 'accepts' and 'provides' lists, the service is declaring that it will allow clients to modify the value, and that it will add it to the relevant header field of an HTTP GET response.

If we have a property URI that represents 'data size', then the value will depend on the actual data itself, and logically it should not be possible to set this from outside.

By including the 'data size' property in its 'provides' list, the service is declaring that it 'understands' the property, and will generate the property value from the data.
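The difference between storing a property and 'understanding' it can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the property URIs, the class names and the method names are all hypothetical, and a real service would expose this behaviour through the VOSpace interface rather than through Python objects.

```python
# Hypothetical property URIs; real identifiers would be registered ones.
MIME_TYPE = "ivo://example.org/vospace/properties#mimetype"
DATA_SIZE = "ivo://example.org/vospace/properties#datasize"


class Node:
    """A data node holding bytes plus uri:value string properties."""

    def __init__(self, data=b""):
        self.data = data
        self.properties = {}


class BasicService:
    """Stores every property as an opaque string and interprets none of them."""

    accepts = []
    provides = []

    def set_property(self, node, uri, value):
        node.properties[uri] = str(value)

    def http_get_headers(self, node):
        # No property is understood, so nothing is copied into the headers.
        return {}


class ComplexService(BasicService):
    """Understands 'mime type' (client-settable) and 'data size' (generated)."""

    accepts = [MIME_TYPE]
    provides = [MIME_TYPE, DATA_SIZE]

    def set_property(self, node, uri, value):
        if uri == DATA_SIZE:
            # The value depends on the data, so it cannot be set from outside.
            raise ValueError("data size is generated by the service")
        super().set_property(node, uri, value)

    def get_property(self, node, uri):
        if uri == DATA_SIZE:
            return str(len(node.data))
        return node.properties.get(uri)

    def http_get_headers(self, node):
        headers = {}
        if MIME_TYPE in node.properties:
            headers["Content-Type"] = node.properties[MIME_TYPE]
        return headers
```

Both services accept the same set_property call for the 'mime type' URI; only the second one acts on the value.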

(DaveMorris)

Comments from DougTody

In my view, a basic architectural principle for services is that it should be possible to understand and use any service stand-alone, independent of other software such as the registry (although the registry might be used for related higher level functions such as discovery). This does not appear to be the case here, as fundamental information about transport details and data formats or attributes are only available via indirect URIs which are intended to be registry resolvable (there are some weasel words about nonresolvable URNs, but clearly this is discouraged and registry integration is the intention).

Once we have registered the core properties, protocols and views then these will all become fixed URIs. Although the URIs would refer to descriptive registry resources, in practice simple clients and services would not need to dereference these via the registry, they just become constant strings and can be hard coded into both service and client.

During normal operation neither the service nor the client should need to dereference any of the URI identifiers. They can all be treated as opaque identifier strings.

However I appreciate that this isn't clear in the way that the specification is worded.

A number of the questions raised in the RFC comments suggest that although the specification may be technically correct, it isn't clear about how things would actually work in a deployed service.

We plan to create a supplementary document which will include one or more concrete examples of how a get and put transfer would work.

In the meantime, the following use cases may help to explain a bit more :

Use case #1 :

A client tries to access data in a service that only supports encrypted or authenticated protocols. The client does not understand these protocols, but it can still list the (unknown) protocol URIs in its error message :

    Unable to find common transfer protocol.
    Client supports :
        ivo://aaaa
        ivo://bbbb
    Service provides :
        ivo://xxxx
        ivo://yyyy

A slightly more complex client may dereference these URIs and display the display name for the unknown protocols :

    Unable to find common transfer protocol.
    Client supports :
        "Simple http put" (ivo://aaaa)
        "Simple ftp  put" (ivo://bbbb)
    Service provides :
        "Secure http put" (ivo://xxxx)
        "Secure ftp  put" (ivo://yyyy)

The user can then look for a different client tool that implements one of the secure protocols (by searching the registry for applications that support the protocol URI).

Or, they can send a request to the application developer asking them if they plan to support the new protocol. All the user would need to pass to the developer is the URI of the new protocol ivo://xxxx. The corresponding description in the registry should provide enough information to enable the developer to implement the new protocol. The simplest case would be to include a URL in the registry description that pointed to an external resource containing the full protocol specification.
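As a rough sketch of use case #1, the function below compares the two protocol lists as opaque strings and, if nothing matches, builds a message in the style of the examples above. The function name and the optional display_names mapping (standing in for names previously looked up in the registry) are assumptions made for illustration only.

```python
def negotiate(client_supports, service_provides, display_names=None):
    """Return the first protocol URI listed by both sides.

    URIs are compared as opaque strings. display_names is an optional,
    hypothetical mapping from URI to a human-readable name; when a URI
    has an entry, the name is shown alongside the URI in the error text.
    """
    for uri in client_supports:
        if uri in service_provides:
            return uri

    names = display_names or {}

    def label(uri):
        return f'"{names[uri]}" ({uri})' if uri in names else uri

    lines = ["Unable to find common transfer protocol.", "Client supports :"]
    lines += [f"    {label(uri)}" for uri in client_supports]
    lines.append("Service provides :")
    lines += [f"    {label(uri)}" for uri in service_provides]
    raise LookupError("\n".join(lines))
```

A simple client would call negotiate without display_names and still produce a useful message; a more complex client passes the names it has found.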

Use case #2 :

A service provider begins to see a lot of transfer requests for a new protocol listed in their server logs.

If their logs indicate that a lot of client applications are capable of using the new protocol, then they may decide that it is time to update their service to support it. The service provider can pass the protocol URI to their developer team, who can then look up the description in the registry.

Use case #3 :

A complex GUI client may dereference the protocol URIs and use the display name and description to present the options to the user in a more user friendly way.

To speed things up, the client implementation may hard code the most common ones and cache the unusual ones.

The GUI tool may also use the descriptive names and text from the registry resource to populate select lists, tool tips or help boxes.
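One possible shape for the 'hard code the common ones, cache the unusual ones' approach is sketched below. The class name and the lookup callable are hypothetical; lookup stands in for whatever registry query the client actually uses and is assumed to return a display name, or None if the URI cannot be resolved.

```python
class DisplayNames:
    """Resolve protocol URIs to display names for a GUI client."""

    def __init__(self, lookup, builtin=None):
        self._lookup = lookup                 # hypothetical: URI -> name or None
        self._builtin = dict(builtin or {})   # hard coded, never dereferenced
        self._cache = {}                      # filled on first use

    def name_for(self, uri):
        if uri in self._builtin:
            return self._builtin[uri]
        if uri not in self._cache:
            # Fall back to the URI itself if the registry has no name.
            self._cache[uri] = self._lookup(uri) or uri
        return self._cache[uri]
```

With this arrangement the registry is contacted at most once per unusual URI, and never for the common ones.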

Summary

The specification only defines the identifiers as xsd:anyURI.

Implementations are free to use proprietary URN style identifiers such as urn:file-size.
If an identifier points to a resource in the registry that contains things like display name and description, then it gives the community a common reference point to describe the property, view or protocol.

Server and client implementations can treat the URI identifiers as opaque strings.
If a client or service understands a particular property, then the corresponding URI can be treated as a fixed string.
If a client or service encounters a new URI that it does not understand, then it can just treat it as an unknown string.

A more complex implementation may use the display name and description in the registry resource to display additional information to the user.

(DaveMorris)

Basic things such as a data format ("view") or available transfer protocol (e.g., HTTP) are described indirectly in descriptors which are stored in a registry and referenced by URI (so far as I can tell). Although it does not explicitly say, I suspect we might be able to use string equality on such a URI to test for these things (as one would test a MIME type for example), but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do. The actual spec repeatedly says things like "at the time of this writing, the schema for registering in the IVO registry has not been finalized". Even ignoring the undesirable registry interaction, which should not be required to directly use a service, it does not appear that such details are sufficiently specified at this point.

"... I suspect we might be able to use string equality on such a URI to test for these things ..." - Yes

"... but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do ..." - No

Once the core set of properties, protocols and views have been registered, client and server implementations can treat the URIs as hard coded string constants.

There is no need to dereference the registry URIs unless an application such as a complex GUI client wants to use the information to display user friendly names and descriptions.

(DaveMorris)

Although the spec says it does not attempt to address hierarchy, the restriction to "no slashes" in a URI path appears arbitrary. The internal logical structure implied by a pathname should be transparent to something like VOSpace, for which this is merely a component of a string identifying the file within a VOSpace. This might change if the VOSpace supports "directory"-level operations, but this could be added without affecting file pathnames within a VOSpace. Otherwise, to flatten a directory hierarchy, it will be necessary for a VOSpace to invent arbitrary filenames, unique within a VOSpace, to substitute for pathnames containing a slash.

This was intentional. We plan to get v1.1 agreed as soon as possible, and that would support hierarchical data structure, with '/' as the path delimiter. This means that '/' will have a specific meaning in vospace 1.1, so to avoid confusion between the two we want to exclude it from use in vospace 1.0.

It may be possible that a vospace 1.1 service publishes a sub directory as a vospace 1.0 service, allowing vospace 1.0 clients to access data within just that sub directory. In that case, we want the URI identifiers to make sense between the two systems.

If we allow '/' in vospace 1.0 names (and just treat them as normal strings without interpreting the '/'), then there would be an 'odd' difference in behaviour between vospace 1.0 and vospace 1.1 services.

If '/' was not excluded, then you could create something called 'a/b/c/d' in vospace 1.0 even though the implied parent 'a/b/c' didn't exist.
In vospace 1.1, creating 'a/b/c/d' would fail if 'a/b/c' didn't exist because it treats the '/' as a path delimiter.

We wanted to avoid this discrepancy by explicitly excluding '/' in vospace 1.0.
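The two behaviours can be put side by side in a small sketch. The class names and the in-memory set of nodes are purely illustrative, and the 1.1 class only models the parent check described above, not the eventual v1.1 specification.

```python
class VOSpace10:
    """Flat namespace: names containing '/' are rejected outright."""

    def __init__(self):
        self.nodes = set()

    def create(self, name):
        if "/" in name:
            raise ValueError("'/' is not allowed in vospace 1.0 names")
        self.nodes.add(name)


class VOSpace11:
    """Hierarchical namespace: '/' is treated as the path delimiter."""

    def __init__(self):
        self.nodes = set()

    def create(self, path):
        # Split off the last path element; sep is empty for a top-level name.
        parent, sep, _ = path.rpartition("/")
        if sep and parent not in self.nodes:
            raise LookupError(f"parent {parent!r} does not exist")
        self.nodes.add(path)
```

Because the 1.0 class refuses 'a/b/c/d' instead of storing it as a plain string, a name never succeeds in one version while failing for structural reasons in the other.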

(DaveMorris)

I am not convinced of the need for "structured" nodes at this early stage, or for a VOSpace to perform arbitrary file format or data model-based conversions. This might be useful to allow a VOSpace to be used to access tables stored natively in a RDBMS, but at present this is not sufficiently well defined, at least not in the written specification. It is not clear whether a VOSpace should natively provide such a capability when this will already be provided by the more object-oriented DAL interfaces such as TAP, SIA, etc. It might be best to first provide a solid VOSpace interface for basic "file" (simple byte stream) access before addressing object-oriented access.

Support for StructuredData was added to the specification to complement the DAL services, not to replace them. One target application of VOSpace is to provide a way of importing data into these services.

Some specific examples might help :

Images in SIA

The SIA specification provides tools for finding and getting images, but as yet, no API for importing images. In order to provide an import mechanism, a SIA service could also provide a VOSpace API. The user could use the vospace API to transfer images into the service, using the StructuredData node to identify the files as FITS images.

If the user imported files as UnstructuredData containing FITS images, the service would just store them as files on disk.

If the user imported them as StructuredData containing FITS images, then the service would attempt to interpret the contents of the files.
- If the files did not contain valid data, then the service would reject them.
- If the files did contain valid data, then the service would process the FITS headers and add them to the SIA database.
- The images would then be available for access via the SIA interface.
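The import behaviour described above might look something like the following. Everything here is a simplified assumption: the class and method names are invented, the 'database' is a plain list, and the validity check only tests that the file starts with the SIMPLE keyword that opens a FITS primary header, where a real service would parse the headers properly.

```python
def looks_like_fits(data: bytes) -> bool:
    """Crude stand-in for real validation: check the first header keyword."""
    return data.startswith(b"SIMPLE  =")


class ImageService:
    """Hypothetical SIA service that also accepts imports via VOSpace."""

    def __init__(self):
        self.files = {}      # stand-in for files on disk
        self.sia_index = []  # stand-in for the SIA database

    def import_node(self, name, data, node_type):
        if node_type == "StructuredData":
            # The service tries to interpret the contents and rejects bad data.
            if not looks_like_fits(data):
                raise ValueError(f"{name} does not contain valid FITS data")
            self.files[name] = data
            self.sia_index.append(name)
        else:
            # UnstructuredData: stored as-is, no interpretation attempted.
            self.files[name] = data
```

The same bytes can therefore be imported either way; only the StructuredData route makes them visible through the SIA interface.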

VOTable in TAP

The TAP interface provides tools for searching and querying the database, but as yet, no API for importing data. The VOSpace API would enable the user to transfer tabular data into the service, using the StructuredData node to identify the files as tabular data.

Again, the user may import UnstructuredData containing VOTable data, and the service would just store them as files on disk.

If the user imported them as StructuredData containing VOTable data, then the service would attempt to interpret the contents of the files.
- If the files did not contain valid data, then the service would reject them.
- If the files did contain valid data, then the service would process the VOTable metadata and create new database tables to contain the data.
- The new tables would then be available for access via the TAP interface.

As a specific example of how this could be used, AstroGrid plans to provide a database service that contains static data from a large survey co-located with user data imported via VOSpace. Users would be able to use the VOSpace service to upload their own VOTable files into the user area of the service, and then use the TAP interface to create ADQL join queries to cross reference data from the large survey with their own data.

(DaveMorris)

Does pushToVOSpace deal with a single data node? It says it returns a list of URLs, but there is only a single destination node, so I would guess that the URLs refer to alternative transport protocols. How does the client decide which to use -does it have to parse each URL to determine the protocol? (Possibly this is addressed in the WSDL, but semantic details like this should be addressed in the written specification as well).

Yes, the list of protocol options represents different endpoints which provide access to the same data.

These could be alternative mirror services with the same protocol, or different protocol URLs.
If asked to provide access to the data via HTTP or FTP, the service response could contain :

- two separate HTTP URLs for different HTTP mirrors
- plus an FTP URL

All of which would provide access to the same data.

The client is free to use them in any order it wants to, but it can only reliably use each one once (some of the endpoints may be one-shot URLs with cookies embedded in the URL itself).

The client does not have to parse the URL, it only has to recognise the protocol URI. Again, these URIs can be hard coded into the client as part of the code that implements that protocol.
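A client loop along these lines would be enough to work through the returned list. It is only a sketch: the protocol URIs are hypothetical placeholders, the response is modelled as a list of (protocol URI, endpoint URL) pairs, and each handler is assumed to return True when the transfer succeeds.

```python
# Hypothetical protocol URIs, hard coded alongside the code that implements them.
HTTP_GET = "ivo://example.org/vospace/protocols#http-get"
FTP_GET = "ivo://example.org/vospace/protocols#ftp-get"


def transfer(endpoints, handlers):
    """Try each offered endpoint at most once; return the URL that worked.

    endpoints: list of (protocol_uri, url) pairs from the service response.
    handlers:  dict mapping a known protocol URI to a callable taking the URL
               and returning True on success.
    """
    for protocol_uri, url in endpoints:
        handler = handlers.get(protocol_uri)
        if handler is None:
            # Unrecognised protocol URI: skip it without parsing the URL.
            continue
        if handler(url):
            return url
        # Endpoints may be one-shot, so a failed URL is never retried.
    return None
```

The choice of handler depends only on string equality of the protocol URI; the endpoint URL is passed through untouched.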

(DaveMorris)

One of the things I was looking for was whether a URL could be obtained to directly GET or PUT a file, and this does appear to be the case (e.g, pullFromVOSpace). This would allow us, for example, to have a data service deliver a message to a client giving the access reference URL for a data object, once it has been generated by an asynchronous service. This might allow the VOSpace machinery to be hidden from a simple client, for example when functionality such as VOSpace and UWS are used within a TAP or SIA data service.

Yes - this was one of the use cases we had in mind when designing the interface.

A VOSpace client can handle the transfer negotiation, requesting access using 'standard HTTP GET' or 'standard FTP GET' in the list of protocols.
The VOSpace response would contain standard endpoint URLs for the requested protocols.
The vospace client can then pass these URLs to a separate application which does the data transfer.

(DaveMorris)
