International Virtual Observatory Alliance

Difference: VOSpace2RFC (12 vs. 13)

Revision 13 2012-07-11 - PatrickDowler

  META TOPICPARENT 
  name="IvoaGridAndWebServices"  

 VOSpace 2.0 Proposed Recommendation: Request for Comments 

This document will act as RFC centre for the IVOA VOSpace 2.0 Proposed Recommendation. The latest version of the specification (02-Dec-2011) can be found at:
  IVOA VOSpace V2.00 Proposed Recommendation
 VOSpace is the IVOA interface to distributed storage. This specification presents the first RESTful version of the interface, which is functionally equivalent to the SOAP-based VOSpace 1.15 specification. Note that prior VOSpace (1.x) clients will not work with this new version of the interface.

 Reference Interoperable Implementations 

The following are known implementations of VOSpace 2.0:
  CADC - the source code is available at: http://code.google.com/p/opencadc/source/checkout
  VAO - the source code is available via the VAO SVN repository: svn+ssh://usvao-svn@svn.usvao.org/usvao/prototype/vospace/trunk/vospace-2.0/java/merged
  CDS - information is available here: http://cds.u-strasbg.fr/resources/doku.php?id=vospace
  Implementations Validators 
(If any, indicate here the links to Implementations Validators)

 RFC Review Period: 20-May-2012 to 22-Jun-2012 
 TCG Review Period: TCG_start_date - TCG_end_date 




 Comments from the IVOA Community during RFC period: 20-May-2012 to 22-Jun-2012 
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the GWS VOSpace mailing list (vospace@ivoa.net). However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.
 Sample comment by WikiName 
 Response (by WikiName)
  Comment by NormanGray 

Sect.2 mentions that VOSpace URIs may have fragments, and illustrates how a fragment is literally copied into the retrieval URL from the vos: URL.  However this is the only mention of 'fragment' in the document.  If the intention is that the VOSpace spec regards the path+query+fragment as opaque, then it would be useful to state this explicitly in Sect.2.  Given the potential problems with fragments (see Urbana TCG slides and draft uri-fragments note) it might be useful to do one of the following:
  forbid fragments in VOSpace URIs;
  require that any # characters in a VOSpace URI are encoded when being transformed into a retrieval URL; or
  specify that any fragment is removed as part of the process of transforming the VOSpace URI into the retrieval URI, on the grounds that the fragment will (or at least should) be automatically removed, in passing, by the library which does the HTTP retrieval.
 Editorial remark: the HTML version is served with MIME type text/html; charset=UTF-8 (the meta@http-equiv element in the HTML header is ignored), but the content is ISO-8859-1, and there are two soft hyphens (0xad) in the Sect 2 example identifier which appear wrongly in a browser which pays attention to the MIME type.

The example given is: vos://org.astrogrid.cam!vospace/container-6/siap-out-1.vot?foo=bar#baz. I agree that this is ambiguous and could be misinterpreted as identifying a node rather than referring to something internal to a data object.
For example, vos://nvo.caltech!vospace/mydata/table1#row3 refers to "row3" within the data resource vos://nvo.caltech!vospace/mydata/table1 (and is only resolved by the client when table1 has been retrieved) and should not identify the data object "row3" in its own right. I will amend the text to correctly describe the behaviour (option 3 above) in the next version.
-- MatthewGraham - 21 May 2012
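
The behaviour described in this response (option 3) can be sketched in a few lines of Python. This is only an illustration, not text from the specification: the function name split_fragment is hypothetical, and it assumes the fragment is everything after the first # character.

```python
def split_fragment(vos_uri: str) -> tuple[str, str]:
    """Split a VOSpace URI into (node_uri, fragment).

    Assumption: the fragment is everything after the first '#'.
    The node URI is what gets resolved to a retrieval URL; the
    fragment is kept aside for the client to apply after retrieval.
    """
    node_uri, _, fragment = vos_uri.partition("#")
    return node_uri, fragment


node_uri, fragment = split_fragment("vos://nvo.caltech!vospace/mydata/table1#row3")
print(node_uri)  # the data object to retrieve
print(fragment)  # resolved by the client once table1 has been retrieved
```

A URI without a fragment passes through unchanged, with an empty string as the fragment.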

 Comments by DaveMorris 
 Changes to node type 
Lots of places say "this operation cannot be used to change the node type".

Is there a mechanism where we can change the node type ?

What is the use case for this? It does not make sense in most cases, e.g., changing a ContainerNode to any other type. 
-- MatthewGraham - 25 May 2012

 Multi-value properties 
Section 3.2.1 states

"When a Property can take multiple values, e.g., a list of groups which can access a particular resource, these SHALL be represented as a comma-separated list."

Why SHALL and not MAY ?

Unless there is a specific reason for making this explicit, can't we leave it up to the definition of the property type?

Some property types MAY be comma separated, and most probably will be, but why do we need to explicitly exclude everything else?

General principle - restrict as little as necessary, and only when we have a specific reason.

The issue of multiple valued parameters was discussed at a previous Interop and it was decided to represent these as a CSV list. Allowing arbitrary delimiters would mean that you would have to check for this information on a per space basis and it was just simpler to specify the delimiter from the outset.
-- MatthewGraham - 25 May 2012

When the issue of multiple valued parameters was raised, it was suggested that they COULD be represented as a CSV list. I don't remember it being decided that they SHOULD.

The delimiter used in a particular property would be defined in the definition of the property type, identified by the property type URI, not on a per space basis.

A client only needs to check what the delimiter is IF it intends to do something with the value, which implies that it already understands what that type of property contains, so it will know what the delimiter is.

e.g.
"This property is a comma separated list of intensity values"

An application that understands the VOSpace property for "a list of intensity values" will know that the list will be comma separated. 

An application that does not understand the VOSpace property for "a list of intensity values" will just treat the whole thing as a string. 

If we only allow a specific delimiter, then anyone with existing tab, space, colon or other delimited data will have to reformat their data to meet the specification, or they will avoid adding any detail and just define everything as opaque strings.

e.g. Java classpath

"This property is a list of files formatted using the Java classpath rules (colon or semi-colon delimited)"

is more informative than

"This property is an application specific string"

which is what will happen if we try to force people to use a tool that doesn't fit their data.

What do we gain by specifying the delimiter ?

Suggested compromise - change the spec to say

"multiple values SHOULD be comma separated, unless the property description defines a specific delimiter"

-- DaveMorris - 28 May 2012
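
As a rough illustration of the suggested compromise, a client could default to a comma and use another delimiter only when the property description defines one. The Python sketch below is hypothetical: the lookup table, the function name and the example.org property URI are invented for illustration, and the core#groupread URI is assumed from the pattern of the standard property identifiers.

```python
# Hypothetical table of property types whose description defines a delimiter.
# The URI below is invented for illustration only.
PROPERTY_DELIMITERS = {
    "ivo://example.org/vospace/props#classpath": ":",
}


def split_property_value(property_uri: str, value: str) -> list[str]:
    """Split a multi-valued property, defaulting to a comma delimiter."""
    delimiter = PROPERTY_DELIMITERS.get(property_uri, ",")
    # Surrounding whitespace is dropped here; that is a choice of this sketch.
    return [item.strip() for item in value.split(delimiter) if item.strip()]


# Assumed property URI, following the pattern ivo://ivoa.net/vospace/core#...
print(split_property_value("ivo://ivoa.net/vospace/core#groupread", "groupA, groupB"))
print(split_property_value("ivo://example.org/vospace/props#classpath", "a.jar:b.jar"))
```

A client that does not recognise the property URI never needs to call such a function; it can treat the whole value as an opaque string.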

 Standard properties 
Are the standard properties listed in section actually registered somewhere ?

If so, is there a queryable registry where we can access the definitions ?

The wording of the specification "The following URIs SHALL be used .." implies that these properties are part of the VOSpace 2.0 standard. In which case, these properties, and their data types, should be defined in an appendix.

As part of the standardization process, the properties will be registered using the StandardsRegExt in the Registry of Registries (i.e., under the ivoa.net namespace). I agree about an appendix and will add it to the next version.
-- MatthewGraham - 25 May 2012

 Soft hyphens 
There are more than two soft hyphens (0xad) in the text.
They appear as <?> symbols in a web browser, but sometimes they are not displayed at all in the PDF version.

They show up in some of the XML examples
  http://www.w3.org/2001/XMLSchemainstance
  http://www.w3.org/2001/XMLSchema?instance
  http://www.w3.org/2001/XMLSchema-instance
 and in the example identifiers
  vos://nvo.caltech!vospace/myresults/siapout1.vot
  vos://nvo.caltech!vospace/myresults/siap?out?1.vot
  vos://nvo.caltech!vospace/myresults/siap-out-1.vot
 I'll make the appropriate edits in the next version.
-- MatthewGraham - 25 May 2012
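
One possible way to locate the remaining soft hyphens before the next version is sketched below in Python. It assumes the document text is ISO-8859-1, as noted in the editorial remark above, so that byte 0xad decodes to U+00AD; the function names are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # byte 0xad in ISO-8859-1


def find_soft_hyphens(text: str) -> list[int]:
    """Return the character offset of every soft hyphen in text."""
    return [i for i, ch in enumerate(text) if ch == SOFT_HYPHEN]


def replace_soft_hyphens(text: str) -> str:
    """Replace each soft hyphen with an ordinary hyphen-minus."""
    return text.replace(SOFT_HYPHEN, "-")


raw = b"vos://nvo.caltech!vospace/myresults/siap\xadout\xad1.vot"
text = raw.decode("iso-8859-1")
print(find_soft_hyphens(text))     # offsets of the two soft hyphens
print(replace_soft_hyphens(text))  # identifier with ordinary hyphens
```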


 Typo in property identifier 
In section 3.2.4 Standard properties, the last property identifier
  ivo://ivoa.net/vospace/core@btime
 should probably be
  ivo://ivoa.net/vospace/core#btime
 I'll make the appropriate edit in the next version.
-- MatthewGraham - 25 May 2012


 Typos in compliance matrix 
In appendix B: Compliance matrix

Property definition 13
  A Property has elements:uri, endpoint and param
 should probably be
  A Property has elements:uri, value and optional readonly flag
 If we have
  Standard capabilities are represented by the specified URIs
 then should we also have
  xx Standard properties are represented by the specified URIs
 Property definition 26
  A Protocol has elements: uri, endpoint< and param
 should probably be
  A Protocol has elements: uri, endpoint and param
 -- DaveMorris - 23 May 2012

I'll make the appropriate edits in the next version.
-- MatthewGraham - 25 May 2012





 typos 

sec 3.2.4 The last core property ivo://ivoa.net/vospace/core@btime has an @ instead of a #

sec 6, in the changes from 2.00-20110628, second bullet, "synchonous" instead of "synchronous" 

 clarity 

sec 3.2.4 standard node properties

For the recently added properties for access permissions (groupread, groupwrite, and publicread) the 
actual semantics are important. Specifically, for groupwrite we implemented this as "allowed to read 
and write" so that it was usable by itself (without having to also set groupread); this was mainly to
make it easier for users to understand and manage. I don't think the property names need to change, 
but the definition would be better if it was clearly read-only, read-write, and anon-read-only. It is true
that this does not permit the "permission to write to some hidden container I can't see" use of the
UNIX filesystem permissions, but I see that as a feature, not a shortcoming.
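
The semantics described above (groupwrite meaning "allowed to read and write") could be expressed roughly as follows. This Python sketch is an assumption-laden illustration, not the actual implementation: the dictionary layout of the node properties and the function names are hypothetical.

```python
def can_read(node_props: dict, user_groups: set[str]) -> bool:
    """Read is allowed by publicread, groupread, or groupwrite."""
    if node_props.get("publicread", False):
        return True
    readers = set(node_props.get("groupread", ())) | set(node_props.get("groupwrite", ()))
    return bool(readers & user_groups)


def can_write(node_props: dict, user_groups: set[str]) -> bool:
    """Write is allowed only by groupwrite."""
    return bool(set(node_props.get("groupwrite", ())) & user_groups)


# Hypothetical node: only groupwrite is set, yet members can also read.
props = {"groupwrite": ["editors"]}
print(can_read(props, {"editors"}), can_write(props, {"editors"}))
```

With this reading there is no way to grant write access without read access, which matches the "feature, not a shortcoming" remark.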


sec 3.7 Searches and 5.3.3 findNodes


It is not clear what role the optional node plays in the search. Is this supposed to be a starting
node and the matches constraints are applied to this node and all children? recursively? If this is
absent, that appears to mean "search the whole vospace", so it makes sense for node= to
mean "search in this part".

The representation to be used in the UWS job info is not specified here, but sec 5.3.3.1 says to look 
in sec 3.7 for how to represent the search job. The example representation in the detailed example 
sequence below sec 5.3.3.3 could be extracted and put into 5.3.3.1 or 3.7, which would help readers.

Following from the detailed example search, does the URL /searches/{jobid}/results/searchDetails also
return the resulting node list? It is really a UWS question, but as I recall the purpose of specifying the
name of the result explicitly is to enable the client to immediately know the URL and get the result.
Of course, a GET to this URL could well redirect to the same URL as the href attribute (where the result
is actually stored, which is what we implemented in our UWS library). It would be worth explaining this.


sec 3.8 REST bindings

The /{sync} resource says it is for synchronous jobs rather than (more explicitly) synchronous transfer jobs
(as opposed to sync searches).  The subsequent text does say transfer details. It would be worth expanding
the "The endpoint /{sync}" paragraph with "Synchronous transfers are limited to (intended for?) pushToVoSpace
and pullFromVoSpace transfers only, where the client is requesting endpoint URLs where it can read or write data."
Adding all of this depends on the issue with 5.4.1 (see below).

 issues 

First, I'm glad to see that 500 response codes are now reserved for (utter) service failure and are not
needed for any kind of client error or usage problems. Operators/monitors will be happy.

sec 5.3.1.3 getNode faults

The status code 404 and NodeNotFound fault are specified if the target node does not exist. This should be 
clarified to include non-existence of a parent container node. Alternatively, in both createNode and 
deleteNode (5.2.1.3 and 5.2.4.3), there is a 404 ContainerNotFound if a parent node in the path does not 
exist. This would provide more information to the client (good) but in some implementations it may be more
complex to implement (basically, you fail to find the node and now have to check if all the parents exist...
maybe "harder to optimise" would be a better description). Still, for consistency it seems that a 404 
ContainerNotFound fault when a parent container does not exist (as in other ops) would be a good addition.

sec 5.3.2.3 setNode faults

Same as above (specify 404 ContainerNotFound)
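
To make the distinction between the two 404 faults concrete, here is a minimal Python sketch. It assumes the service can test whether a path exists (modelled here as a set of known paths); the function name and the data model are hypothetical and say nothing about how a real implementation stores nodes.

```python
from typing import Optional


def classify_missing(path: str, existing: set[str]) -> Optional[str]:
    """Return None if the node exists, else the name of the 404 fault."""
    if path in existing:
        return None
    parts = path.strip("/").split("/")
    # The extra work mentioned above: check every ancestor container.
    for depth in range(1, len(parts)):
        parent = "/" + "/".join(parts[:depth])
        if parent not in existing:
            return "ContainerNotFound"
    return "NodeNotFound"


nodes = {"/mydata", "/mydata/table1"}
print(classify_missing("/mydata/table2", nodes))  # parent exists, node missing
print(classify_missing("/other/table1", nodes))   # parent container missing
```

The ancestor loop only runs after the direct lookup has failed, so the common success path is not slowed down.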

sec 5.4.1 pushToVoSpace

The request section says that the convenience method is to POST to /sync (should it be /{sync} since the actual
name is not fixed?) and that HTTP PUT is assumed. In practice, there is no problem with negotiating to use other
protocols since you get back a transferDetails representation that says which ones to use (and at which URL). It
may be desirable to require HTTP PUT with the sync transfer negotiation, but negotiation should still be possible.

In the response section, it says the convenience (POST to /{sync}) responds with a 303 to 
/transfers/{jobid}/results/transferDetails. The first resource name there should be {transfers} as used elsewhere.
Is it actually required to have a common transfer job list and this convenient /{sync} way to optimise the interaction? 
The value of making this explicit is that the client can parse the URL and check the job (error summary). If this is
the intent (I agree with it) then (i) sec 3.8 needs to be more explicit and (ii) we should think about which URL 
to redirect to.

If we do as above but the job failed, the client will get a 404 and have to parse the URL to get the job and find
out why it failed. If we redirect to the finished (COMPLETED or ERROR) job, the client can choose to check the job or
append and go for the transferDetails immediately, but they can't simply follow and read the transfer document 
directly (the normal success case).  So, is it better to redirect to /{transfers}/{jobid}/results/transferDetails (optimise for success case but 
be prepared for a 404) or redirect to /{transfers}/{jobid} (promotes careful client, can be optimised by aware client).
I'm ambivalent. The latter removes having to run and poll an async transfer and the former goes one more step and returns
the transferDetails more immediately. 

Note: In our impl, we return a transfer document even if the negotiation failed 
(it just has no protocols/endpoints in it, at which point the client has to parse the URL and check the job for
error messages), but I'm not very attached to this.
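
The "parse the URL and check the job" step could look like the Python sketch below. It assumes the redirect target ends in /results/transferDetails, as in the paths quoted above; the function name and the example.org URL are hypothetical.

```python
DETAILS_SUFFIX = "/results/transferDetails"


def job_url_from_details(details_url: str) -> str:
    """Derive the job URL from a transferDetails result URL."""
    if not details_url.endswith(DETAILS_SUFFIX):
        raise ValueError("not a transferDetails URL: " + details_url)
    # Dropping the suffix leaves /{transfers}/{jobid}.
    return details_url[: -len(DETAILS_SUFFIX)]


# Hypothetical redirect target returned by a POST to /{sync}.
location = "http://example.org/vospace/transfers/abc123/results/transferDetails"
print(job_url_from_details(location))
```

A client that receives a 404, or an empty transfer document, from the redirect target can fetch the derived job URL to read the error summary.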

sec 5.4.1 pullFromVoSpace

This section mentions the "view=data" convenience method for HTTP GET. Is there any reason the /{sync} convenient 
negotiation is not mentioned here? It is perfectly usable and well specified with minimal extra language. It is
probably more work to make /{sync} not support pullFromVoSpace.

As for the view=data method, that is fine and usable in very simple cases, but the text about it returning a 303 to an
alternate URL is asking for trouble. The first thing I found is that some HTTP implementations cannot, by design, follow
a redirect that changes protocol (java.net.HttpURLConnection cannot even change from http to https). I feel strongly that changing
protocol should not be allowed here.
-- PatrickDowler - 2012-06-15
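
A service or client could detect the problematic case in the last comment with a check along these lines. This is a hedged Python sketch: the function name and the example.org URLs are hypothetical, and it treats a relative Location value as staying on the same protocol.

```python
from urllib.parse import urljoin, urlsplit


def redirect_changes_protocol(request_url: str, location: str) -> bool:
    """True if following the redirect would switch URL scheme."""
    # Resolve a relative Location against the request URL first.
    target = urljoin(request_url, location)
    return urlsplit(request_url).scheme != urlsplit(target).scheme


request = "http://example.org/vospace/nodes/mydata/table1?view=data"
print(redirect_changes_protocol(request, "https://example.org/data/table1"))
print(redirect_changes_protocol(request, "/data/table1"))
```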
  Comments from Mark Taylor
  Sec. 1.1: There are namespace declarations in the examples here which don't appear to be doing anything (xmlns:vost, xmlns:xsi, xmlns:xsd), unless I'm missing something.  Might be clearer to omit them.
  Sec 3.8: I don't understand why the REST endpoints are listed with braces here, e.g. "/{protocols}" rather than "/protocols".  As far as I can see these are literal strings to be used as endpoints rather than (as noted) the parts like "(job-id)" that can be chosen by the service.  Am I missing something?
  If memory serves there are fewer 500 responses mandated here than in earlier versions of this standard, but there are still some.  500 is reserved for "unexpected" conditions - it seems a bit questionable to mandate it for specific circumstances.
  Sec 5.2.2.2 and 5.2.3.2: examples omit a closing </uws:result> tag.
  There are a couple of things I don't understand about the OPTIONAL operations. Taking pushToVospace as an example (Sec 5.4.1):
    how does a client know if such an operation is implemented for a given service? (maybe that's a job for a future VOSpaceRegExt)
    how does the service respond to a request for one of these optional operations if it does not support it?  None of the faults in sec 5.4.1.3 looks appropriate.
    does support of the UWS mode imply support of the convenience mode and vice versa, or is it permissible for a service to support e.g. the convenience mode but not the UWS mode?
 -- MarkTaylor - 20 Jun 2012