VOSpace 2.0 Proposed Recommendation: Request for Comments

This document will act as RFC centre for the IVOA VOSpace 2.0 Proposed Recommendation. The latest version of the specification (02-Dec-2011) can be found at:

VOSpace is the IVOA interface to distributed storage. This specification presents the first RESTful version of the interface, which is functionally equivalent to the SOAP-based VOSpace 1.15 specification. Note that prior VOSpace (1.x) clients will not work with this new version of the interface.

Reference Interoperable Implementations

The following are known implementations of VOSpace 2.0:

Implementations Validators

(If any, indicate here the links to Implementations Validators)

RFC Review Period: 20-May-2012 to 22-Jun-2012

TCG Review Period: TCG_start_date - TCG_end_date



Comments from the IVOA Community during RFC period: 20-May-2012 to 22-Jun-2012

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the GWS VOSpace mailing list (vospace@ivoa.net). However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.

Comment by NormanGray

Sect.2 mentions that VOSpace URIs may have fragments, and illustrates how a fragment is literally copied into the retrieval URL from the vos: URL. However this is the only mention of 'fragment' in the document. If the intention is that the VOSpace spec regards the path+query+fragment as opaque, then it would be useful to state this explicitly in Sect.2. Given the potential problems with fragments (see Urbana TCG slides and draft uri-fragments note) it might be useful to do one of the following:

  • forbid fragments in VOSpace URIs;
  • require that any # characters in a VOSpace URI are encoded when being transformed into a retrieval URL; or
  • specify that any fragment is removed as part of the process of transforming the VOSpace URI into the retrieval URI, on the grounds that the fragment will (or at least should) be automatically removed, in passing, by the library which does the HTTP retrieval.

Editorial remark: the HTML version is served with MIME type text/html; charset=UTF-8 (the meta@http-equiv element in the HTML header is ignored), but the content is ISO-8859-1, and there are two soft hyphens (0xad) in the Sect 2 example identifier which appear wrongly in a browser which pays attention to the MIME type.

The example given is: vos://org.astrogrid.cam!vospace/container-6/siap-out-1.vot?foo=bar#baz. I agree that this is ambiguous and could be misinterpreted as identifying a node rather than referring to something internal to a data object. For example, vos://nvo.caltech!vospace/mydata/table1#row3 refers to "row3" within the data resource vos://nvo.caltech!vospace/mydata/table1 (and is only resolved by the client once table1 has been retrieved) and should not identify the data object "row3" in its own right. I will amend the text to correctly describe the behaviour (option 3 above) in the next version. -- MatthewGraham - 21 May 2012
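As an illustration of option 3, here is a minimal sketch that assumes the fragment is simply split off before the VOSpace URI is used to identify a node. The helper name split_vos_fragment is hypothetical and does not come from the specification.

```python
def split_vos_fragment(vos_uri: str) -> tuple[str, str]:
    """Split a vos: URI into (node identifier, fragment).

    Hypothetical helper: the fragment is kept aside for the client to
    apply after retrieval and is never treated as part of the node
    identifier.
    """
    node_uri, _, fragment = vos_uri.partition("#")
    return node_uri, fragment


node_uri, fragment = split_vos_fragment(
    "vos://nvo.caltech!vospace/mydata/table1#row3"
)
print(node_uri)  # vos://nvo.caltech!vospace/mydata/table1
print(fragment)  # row3
```

Under this reading the service only ever sees the identifier for table1, and "row3" is resolved by the client once the data have been retrieved.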

Comments by DaveMorris

Changes to node type
Lots of places say "this operation cannot be used to change the node type".

Is there a mechanism where we can change the node type ?

What is the use case for this? It does not make sense in most cases, e.g., changing a ContainerNode to any other type. -- MatthewGraham - 25 May 2012

Multi-value properties
Section 3.2.1 states

"When a Property can take multiple values, e.g., a list of groups which can access a particular resource, these SHALL be represented as a comma-separated list."

Why SHALL and not MAY ?

Unless there is a specific reason for making this explicit, can't we leave it up to the definition of the property type?

Some property type MAY be comma separated, most probably will, but why do we need to explicitly exclude everything else.

General principle - restrict as little as necessary, and only when we have a specific reason.

The issue of multiple valued parameters was discussed at a previous Interop and it was decided to represent these as a CSV list. Allowing arbitrary delimiters would mean that you would have to check for this information on a per space basis and it was just simpler to specify the delimiter from the outset. -- MatthewGraham - 25 May 2012

When the issue of multiple valued parameters was raised, it was suggested that they COULD be represented as a CSV list. I don't remember it being decided that they SHOULD.

The delimiter used in a particular property would be defined in the definition of the property type, identified by the property type URI, not on a per space basis.

A client only needs to check what the delimiter is IF it intends to do something with the value. Which implies that it already understands what that type of property contains, so it will know what the delimiter is.

e.g. "This property is a comma separated list of intensity values"

An application that understands the VOSpace property for "a list of intensity values" will know that the list will be comma separated.

An application that does not understand the VOSpace property for "a list of intensity values" will just treat the whole thing as a string.

If we only allow a specific delimiter, then anyone with existing tab, space, colon or other delimited data will have to re format their data to meet the specification, or they will avoid adding any detail and just define everything as opaque strings.

e.g. Java classpath

"This property is a list of files formatted using the Java classpath rules (colon or semi-colon delimited)"

is more informative than

"This property is an application specific string"

which is what will happen if we try to force people to use a tool that doesn't fit their data.

What do we gain by specifying the delimiter ?

Suggested compromise - change the spec to say

"multiple values SHOULD be comma separated, unless the property description defines a specific delimiter"

-- DaveMorris - 28 May 2012
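The suggested compromise could look roughly like the sketch below. It assumes a client-side table that maps property type URIs to the delimiter given in the property description; the table, the URIs in it, and the function name are all invented for illustration and are not part of the specification.

```python
# Hypothetical lookup table: property type URI -> delimiter declared in the
# property description. The URI below is invented for illustration only.
PROPERTY_DELIMITERS = {
    "ivo://example.org/vospace/props#classpath": ":",
}

# Default used when the property description does not declare a delimiter
# (the "SHOULD be comma separated" part of the suggested wording).
DEFAULT_DELIMITER = ","


def split_property_value(property_uri: str, value: str) -> list[str]:
    """Split a multi-valued property using its declared delimiter,
    falling back to a comma when none is declared."""
    delimiter = PROPERTY_DELIMITERS.get(property_uri, DEFAULT_DELIMITER)
    return [item.strip() for item in value.split(delimiter)]


print(split_property_value("ivo://example.org/vospace/props#groups", "a, b,c"))
print(split_property_value("ivo://example.org/vospace/props#classpath", "lib/a.jar:lib/b.jar"))
```

A client that does not recognise a property type would not call such a function at all and would treat the whole value as an opaque string.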

Standard properties
Are the standard properties listed in section actually registered somewhere ?

If so, is there a queryable registry where we can access the definitions ?

The wording of the specification "The following URIs SHALL be used .." implies that these properties are part of the VOSpace 2.0 standard. In which case, these properties, and their data types, should be defined in an appendix.

As part of the standardization process, the properties will be registered using the StandardsRegExt in the Registry of Registries (i.e., under the ivoa.net namespace). I agree about an appendix and will add it to the next version. -- MatthewGraham - 25 May 2012

Soft hyphens
There are more than two soft hyphens (0xad) in the text. They appear as <?> symbols in a web browser, but sometimes they are not displayed at all in the PDF version.

They show up in some of the XML examples

and in the example identifiers

  • vos://nvo.caltech!vospace/myresults/siapout1.vot
  • vos://nvo.caltech!vospace/myresults/siap?out?1.vot
  • vos://nvo.caltech!vospace/myresults/siap-out-1.vot

I'll make the appropriate edits in the next version. -- MatthewGraham - 25 May 2012

Typo property in identifier
In section 3.2.4 Standard properties, the last property identifier

  • ivo://ivoa.net/vospace/core@btime

should probably be

  • ivo://ivoa.net/vospace/core#btime

I'll make the appropriate edit in the next version. -- MatthewGraham - 25 May 2012

Typos in compliance matrix
In appendix B: Compliance matrix

Property definition 13

  • 13 A Property has elements:uri, endpoint and param

should probably be

  • 13 A Property has elements:uri, value and optional readonly flag

If we have

  • 16 Standard capabilities are represented by the specified URIs

then should we also have

  • xx Standard properties are represented by the specified URIs

Property definition 26

  • 26 A Protocol has elements: uri, endpoint< and param

should probably be

  • 26 A Protocol has elements: uri, endpoint and param

-- DaveMorris - 23 May 2012

I'll make the appropriate edits in the next version. -- MatthewGraham - 25 May 2012


typos

sec 3.2.4 The last core property ivo://ivoa.net/vospace/core@btime has an @ instead of a #

sec 6, in the changes from 2.00-20110628, second bullet, "synchonous" instead of "synchronous"

clarity

sec 3.2.4 standard node properties

For the recently added properties for access permissions (groupread, groupwrite, and publicread) the actual semantics are important. Specifically, for groupwrite we implemented this as "allowed to read and write" so that it was usable by itself (without having to also set groupread); this was mainly to make it easier for users to understand and manage. I don't think the property names need to change, but the definition would be better if it was clearly read-only, read-write, and anon-read-only. It is true that this does not permit the "permission to write to some hidden container I can't see" use of the UNIX filesystem permissions, but I see that as a feature, not a shortcoming :-)
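The semantics described above (groupwrite meaning read-write, groupread meaning read-only, publicread meaning anonymous read-only) can be sketched as follows. This is only an illustration: the function names and the boolean inputs are assumptions, and node ownership is deliberately left out.

```python
def can_read(public_read: bool, in_read_group: bool, in_write_group: bool) -> bool:
    """Read access under the proposed semantics.

    public_read    -- the node has publicread set (anonymous read-only)
    in_read_group  -- the caller belongs to a group listed in groupread
    in_write_group -- the caller belongs to a group listed in groupwrite
    """
    # groupwrite is read-write, so write-group membership alone grants read.
    return public_read or in_read_group or in_write_group


def can_write(in_write_group: bool) -> bool:
    """Write access comes only from groupwrite membership."""
    return in_write_group


# A member of the write group can read without groupread being set.
print(can_read(public_read=False, in_read_group=False, in_write_group=True))
```

With this definition there is no combination that grants write access without read access, which is the "feature" noted above.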

sec 3.7 Searches and 5.3.3 findNodes

It is not clear what role the optional node plays in the search. Is this supposed to be a starting node, with the matches constraints applied to this node and all of its children? Recursively? If this is absent, that appears to mean "search the whole vospace", so it makes sense for node= to mean "search in this part".

The representation to be used in the UWS job info is not specified here, but sec 5.3.3.1 says to look in sec 3.7 for how to represent the search job. The example representation in the detailed example sequence below sec 5.3.3.3 could be extracted and put into 5.3.3.1 or 3.7, which would help readers.

Following from the detailed example search, does the URL /searches/{jobid}/results/searchDetails also return the resulting node list? It is really a UWS question, but iirc the purpose of specifying the name of the result explicitly is to enable the client to immediately know the URL and get the result. Of course, a GET to this URL could well redirect to the same URL as the href attribute (where the result is actually stored, which is what we implemented in our UWS library). It would be worth explaining this.

sec 3.8 REST bindings

The /{sync} resource says it is for synchronous jobs rather than (more explicitly) synchronous transfer jobs (as opposed to sync searches). The subsequent text does say transfer details. It would be worth expanding the "The endpoint /{sync}" paragraph with "Synchronous transfers are limited to (intended for?) pushToVoSpace and pullFromVoSpace transfers only, where the client is requesting endpoint URLs where it can read or write data." Adding all of this depends on the issue with 5.4.1 (see below).

issues

First, I'm glad to see that 500 response codes are now reserved for (utter) service failure and are not needed for any kind of client error or usage problems. Operators/monitors will be happy.

sec 5.3.1.3 getNode faults

The status code 404 and NodeNotFound fault are specified if the target node does not exist. This should be clarified to include non-existence of a parent container node. Alternatively, in both createNode and deleteNode (5.2.1.3 and 5.2.4.3), there is a 404 ContainerNotFound if a parent node in the path does not exist. This would provide more information to the client (good) but in some implementations it may be more complex to implement (basically, you fail to find the node and now have to check if all the parents exist... maybe "harder to optimise" would be a better description). Still, for consistency it seems that a 404 ContainerNotFound fault when a parent container does not exist (as in other ops) would be a good addition.
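To make the distinction concrete, here is a minimal sketch of a lookup that reports a missing parent separately from a missing target. It assumes an in-memory tree in which containers are dicts and data nodes are None; the exception classes and the find_node function are hypothetical stand-ins for the NodeNotFound and ContainerNotFound faults, both of which would map to a 404.

```python
class NodeNotFound(Exception):
    """Stand-in for the NodeNotFound fault (404)."""


class ContainerNotFound(Exception):
    """Stand-in for the ContainerNotFound fault (404)."""


def find_node(tree: dict, path: str):
    """Walk the path; report a missing parent differently from a missing target."""
    parts = [p for p in path.split("/") if p]
    current = tree
    for index, name in enumerate(parts):
        is_last = index == len(parts) - 1
        if not isinstance(current, dict) or name not in current:
            if is_last:
                raise NodeNotFound(path)
            # A container somewhere along the path is missing.
            raise ContainerNotFound("/" + "/".join(parts[: index + 1]))
        current = current[name]
    return current


tree = {"mydata": {"table1": None}}
for path in ("/mydata/missing", "/nope/table1"):
    try:
        find_node(tree, path)
    except (NodeNotFound, ContainerNotFound) as fault:
        print(type(fault).__name__, fault)
```

In this toy walk the extra information comes for free; an implementation that resolves a node with a single indexed lookup would need additional work to find which parent is missing, which is the optimisation concern raised above.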

sec 5.3.2.3 setNode faults

Same as above (specify 404 ContainerNotFound)

sec 5.4.1 pushToVoSpace

The request section says that the convenience method is to POST to /sync (should it be /{sync} since the actual name is not fixed?) and that HTTP PUT is assumed. In practice, there is no problem with negotiating to use other protocols since you get back a transferDetails representation that says which ones to use (and at which URL). It may be desirable to require HTTP PUT with the sync transfer negotiation, but negotiation should still be possible.

In the response section, it says the convenience (POST to /{sync}) responds with a 303 to /transfers/{jobid}/results/transferDetails. The first resource name there should be {transfers} as used elsewhere. Is it actually required to have a common transfer job list and this convenient /{sync} way to optimise the interaction? The value of making this explicit is that the client can parse the URL and check the job (error summary). If this is the intent (I agree with it) then (i) sec 3.8 needs to be more explicit and (ii) we should think about which URL to redirect to.

If we do as above but the job failed, the client will get a 404 and have to parse the URL to get the job and find out why it failed. If we redirect to the finished (COMPLETED or ERROR) job, the client can choose to check the job or append and go for the transferDetails immediately, but they can't simply follow and read the transfer document directly (the normal success case). So, is it better to redirect to /{transfers}/{jobid}/results/transferDetails (optimise for success case but be prepared for a 404) or redirect to /{transfers}/{jobid} (promotes careful client, can be optimised by aware client)? I'm ambivalent. The latter removes having to run and poll an async transfer and the former goes one more step and returns the transferDetails more immediately.

Note: In our impl, we return a transfer document even if the negotiation failed (it just has no protocols/endpoints in it, at which point the client has to parse the URL and check the job for error messages), but I'm not very attached to this.
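"Parsing the URL to get the job" could look like the sketch below. It assumes the redirect target ends in /results/transferDetails as described above; the function name and the example host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Path suffix of the result resource, as described in the text above.
RESULT_SUFFIX = "/results/transferDetails"


def job_url_from_details_url(details_url: str) -> str:
    """Derive the UWS job URL from a transferDetails result URL."""
    parts = urlsplit(details_url)
    if not parts.path.endswith(RESULT_SUFFIX):
        raise ValueError("not a transferDetails URL: " + details_url)
    job_path = parts.path[: -len(RESULT_SUFFIX)]
    # Drop any query or fragment; only the job resource is wanted.
    return urlunsplit((parts.scheme, parts.netloc, job_path, "", ""))


print(job_url_from_details_url(
    "http://example.org/vospace/transfers/abc123/results/transferDetails"
))
# http://example.org/vospace/transfers/abc123
```

A client that receives a 404 (or an empty transfer document) at the redirect target could use the derived URL to fetch the job and read its error summary.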

sec 5.4.1 pullFromVoSpace

This section mentions the "view=data" convenience method for HTTP GET. Is there any reason the /{sync} convenient negotiation is not mentioned here? It is perfectly usable and well specified with minimal extra language. It is probably more work to make /{sync} not support pullFromVoSpace.

As for the view=data method, that is fine and usable in very simple cases, but the text about it returning a 303 to an alternate URL is asking for trouble. The first thing I found is that some HTTP implementations cannot, by design, follow a redirect that changes protocol (java.net.HttpURLConnection cannot even change from http to https). I feel strongly that changing protocol should not be allowed here.
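A service or a test client could check for this condition with something like the following sketch, which compares the scheme of the request URL with that of the redirect Location. The function name and example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def redirect_changes_protocol(request_url: str, location: str) -> bool:
    """Return True if following the redirect would switch URL scheme."""
    # Resolve a relative Location against the request URL first.
    target = urljoin(request_url, location)
    return urlsplit(request_url).scheme != urlsplit(target).scheme


request = "http://example.org/vospace/nodes/mydata/table1?view=data"
print(redirect_changes_protocol(request, "https://example.org/data/table1"))
print(redirect_changes_protocol(request, "http://data.example.org/table1"))
```

The first call reports a protocol change (http to https), the second does not, since only the host differs.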

Comments from Mark Taylor

  • Sec. 1.1: There are namespace declarations in the examples here which don't appear to be doing anything (xmlns:vost, xmlns:xsi, xmlns:xsd), unless I'm missing something. Might be clearer to omit them.
  • Sec 3.8: I don't understand why the REST endpoints are listed with braces here, e.g. "/{protocols}" rather than "/protocols". As far as I can see these are literal strings to be used as endpoints rather than (as noted) the parts like "(job-id)" that can be chosen by the service. Am I missing something?
  • If memory serves there are fewer 500 responses mandated here than in earlier versions of this standard, but there are still some. 500 is reserved for "unexpected" conditions - it seems a bit questionable to mandate it for specific circumstances.
  • Sec 5.2.2.2 and 5.2.3.2: examples omit a closing </uws:result> tag.
  • There are a couple of things I don't understand about the OPTIONAL operations. Taking pushToVospace as an example (Sec 5.4.1):
    • how does a client know if such an operation is implemented for a given service? (maybe that's a job for a future VOSpaceRegExt)
    • how does the service respond to a request for one of these optional operations if it does not support it? None of the faults in sec 5.4.1.3 looks appropriate.
    • does support of the UWS mode imply support of the convenience mode and vice versa, or is it permissible for a service to support e.g. the convenience mode but not the UWS mode?
-- MarkTaylor - 20 Jun 2012




Topic revision: r11 - 2012-06-20 - MarkTaylor
 