VOSpace 2.0 Proposed Recommendation: Request for Comments
This document will act as the RFC centre for the IVOA VOSpace 2.0 Proposed Recommendation. The version of the specification (02-Dec-2011) for RFC review can be found at:
A revised version (24-Aug-2012) of the document for TCG review which addresses all the issues raised during the RFC period is available here:
VOSpace is the IVOA interface to distributed storage. This specification presents the first RESTful version of the interface, which is functionally equivalent to the SOAP-based VOSpace 1.15 specification. Note that prior VOSpace (1.x) clients will not work with this new version of the interface.
Reference Interoperable Implementations
The following are known implementations of VOSpace 2.0:
Implementations Validators
(If any, indicate here the links to Implementations Validators)
RFC Review Period: 20-May-2012 to 22-Jun-2012
TCG Review Period: 29-Aug-2012 to 26-Sep-2012
Comments from the IVOA Community during RFC period: 20-May-2012 to 22-Jun-2012
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.
Additional discussion about any of the comments or responses can be conducted on the GWS VOSpace mailing list (vospace@ivoa.net). However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.
Sect.2 mentions that VOSpace URIs may have fragments, and illustrates how a fragment is literally copied into the retrieval URL from the vos: URL. However this is the only mention of 'fragment' in the document. If the intention is that the VOSpace spec regards the path+query+fragment as opaque, then it would be useful to state this explicitly in Sect.2. Given the potential problems with fragments (see Urbana TCG slides and draft uri-fragments note) it might be useful to do one of the following:
- forbid fragments in VOSpace URIs;
- require that any # characters in a VOSpace URI are encoded when being transformed into a retrieval URL; or
- specify that any fragment is removed as part of the process of transforming the VOSpace URI into the retrieval URI, on the grounds that the fragment will (or at least should) be automatically removed, in passing, by the library which does the HTTP retrieval.
Editorial remark: the HTML version is served with MIME type text/html; charset=UTF-8 (the meta@http-equiv element in the HTML header is ignored), but the content is ISO-8859-1, and there are two soft hyphens (0xad) in the Sect 2 example identifier which appear wrongly in a browser which pays attention to the MIME type.
The example given is: vos://org.astrogrid.cam!vospace/container-6/siap-out-1.vot?foo=bar#baz. I agree that this is ambiguous and could be misinterpreted as identifying a node rather than referring to something internal to a data object. For example, vos://nvo.caltech!vospace/mydata/table1#row3 refers to "row3" within the data resource vos://nvo.caltech!vospace/mydata/table1 (and is only resolved by the client when table1 has been retrieved) and should not identify the data object "row3" in its own right. I will amend the text to correctly describe the behaviour (option 3 above) in the next version. -- MatthewGraham - 21 May 2012
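As an illustration of option 3 only (this is not text from the specification), a client-side sketch of separating the fragment from a VOSpace URI before requesting a transfer might look like the following. The function name split_vos_fragment is hypothetical; the example URI is the one quoted in the reply above.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_vos_fragment(vos_uri: str) -> Tuple[str, str]:
    """Split a vos: URI into (node URI without fragment, fragment).

    Hypothetical helper: the node URI is what would be sent to the
    service, while the fragment stays with the client and is only
    interpreted after the data object has been retrieved.
    """
    node_uri, fragment = urldefrag(vos_uri)
    return node_uri, fragment


node_uri, fragment = split_vos_fragment(
    "vos://nvo.caltech!vospace/mydata/table1#row3"
)
print(node_uri)   # the data resource to retrieve
print(fragment)   # resolved by the client against the retrieved data
```

The path and query are left untouched, which matches treating everything except the fragment as opaque.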
Changes to node type
Lots of places say
"this operation cannot be used to change the node type".
Is there a mechanism where we can change the node type ?
What is the use case for this? It does not make sense in most cases, e.g., changing a ContainerNode to any other type. -- MatthewGraham - 25 May 2012
Multi-value properties
Section 3.2.1 states
"When a Property can take multiple values, e.g., a list of groups which can access a particular resource, these SHALL be represented as a comma-separated list."
Why SHALL and not MAY?
Unless there is a specific reason for making this explicit, can't we leave it up to the definition of the property type?
Some property types MAY be comma separated, most probably will, but why do we need to explicitly exclude everything else?
General principle - restrict as little as necessary, and only when we have a specific reason.
The issue of multiple valued parameters was discussed at a previous Interop and it was decided to represent these as a CSV list. Allowing arbitrary delimiters would mean that you would have to check for this information on a per space basis and it was just simpler to specify the delimiter from the outset. -- MatthewGraham - 25 May 2012
When the issue of multiple valued parameters was raised, it was suggested that they COULD be represented as a CSV list. I don't remember it being decided that they SHOULD.
The delimiter used in a particular property would be defined in the definition of the property type, identified by the property type URI, not on a per space basis.
A client only needs to check what the delimiter is IF it intends to do something with the value. Which implies that it already understands what that type of property contains, so it will know what the delimiter is.
e.g.
"This property is a comma separated list of intensity values"
An application that understands the VOSpace property for "a list of intensity values" will know that the list will be comma separated.
An application that does not understand the VOSpace property for "a list of intensity values" will just treat the whole thing as a string.
If we only allow a specific delimiter, then anyone with existing tab, space, colon or other delimited data will have to reformat their data to meet the specification, or they will avoid adding any detail and just define everything as opaque strings.
e.g. Java classpath
"This property is a list of files formatted using the Java classpath rules (colon or semi-colon delimited)"
is more informative than
"This property is an application specific string"
which is what will happen if we try to force people to use a tool that doesn't fit their data.
What do we gain by specifying the delimiter ?
Suggested compromise - change the spec to say
"multiple values SHOULD be comma separated, unless the property description defines a specific delimiter"
-- DaveMorris - 28 May 2012
The revised text will be used in the next version. -- MatthewGraham - 31 July 2012
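The suggested compromise (comma by default, unless the property description defines another delimiter) could be sketched as below. This is only an illustration: the PROPERTY_DELIMITERS table, the example.org property URI and the function name are hypothetical, and the core#groupread URI is assumed from the core# pattern used for the standard properties.

```python
DEFAULT_DELIMITER = ","

# Hypothetical table of properties whose descriptions define their own
# delimiter; anything not listed falls back to the comma default.
PROPERTY_DELIMITERS = {
    "ivo://example.org/vospace/props#classpath": ":",
}


def split_property_value(property_uri, value):
    """Split a multi-valued property using its declared delimiter."""
    delimiter = PROPERTY_DELIMITERS.get(property_uri, DEFAULT_DELIMITER)
    # Drop surrounding whitespace and empty items such as a trailing comma.
    return [item.strip() for item in value.split(delimiter) if item.strip()]


print(split_property_value(
    "ivo://ivoa.net/vospace/core#groupread", "groupA, groupB"))
print(split_property_value(
    "ivo://example.org/vospace/props#classpath", "a.jar:b.jar"))
```

A client that does not recognise a property URI would simply not call such a function and would treat the value as an opaque string.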
Standard properties
Are the standard properties listed in section actually registered somewhere ?
If so, is there a queryable registry where we can access the definitions ?
The wording of the specification "The following URIs SHALL be used .." implies that these properties are part of the VOSpace 2.0 standard. In which case, these properties, and their data types, should be defined in an appendix.
As part of the standardization process, the properties will be registered using the StandardsRegExt in the Registry of Registries (i.e., under the ivoa.net namespace). I agree about an appendix and will add it to the next version. -- MatthewGraham - 25 May 2012
Soft hyphens
There are more than two soft hyphens (0xad) in the text. They appear as
symbols in a web browser, but sometimes they are not displayed at all in the PDF version.
They show up in some of the XML examples
and in the example identifiers
- vos://nvo.caltech!vospace/myresults/siapout1.vot
- vos://nvo.caltech!vospace/myresults/siap?out?1.vot
- vos://nvo.caltech!vospace/myresults/siap-out-1.vot
I'll make the appropriate edits in the next version. -- MatthewGraham - 25 May 2012
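One way to locate the stray characters before editing is to scan the document text for U+00AD (the soft hyphen, byte 0xad in ISO-8859-1). The sketch below is only a suggestion; the function names are hypothetical and the sample string is modelled on the example identifiers listed above.

```python
SOFT_HYPHEN = "\u00ad"  # 0xad in ISO-8859-1


def find_soft_hyphens(text):
    """Return the character offsets of every soft hyphen in text."""
    return [i for i, ch in enumerate(text) if ch == SOFT_HYPHEN]


def strip_soft_hyphens(text):
    """Remove soft hyphens; whether to delete them or replace them
    with a real '-' is an editorial decision for the authors."""
    return text.replace(SOFT_HYPHEN, "")


sample = "vos://nvo.caltech!vospace/myresults/siap\u00adout\u00ad1.vot"
print(find_soft_hyphens(sample))
print(strip_soft_hyphens(sample))
```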
Typo in property identifier
In section 3.2.4 Standard properties, the last property identifier
- ivo://ivoa.net/vospace/core@btime
should probably be
- ivo://ivoa.net/vospace/core#btime
I'll make the appropriate edit in the next version. -- MatthewGraham - 25 May 2012
Typos in compliance matrix
In appendix B: Compliance matrix
Property definition 13
- 13 A Property has elements:uri, endpoint and param
should probably be
- 13 A Property has elements:uri, value and optional readonly flag
If we have
- 16 Standard capabilities are represented by the specified URIs
then should we also have
- xx Standard properties are represented by the specified URIs
Property definition 26
- 26 A Protocol has elements: uri, endpoint< and param
should probably be
- 26 A Protocol has elements: uri, endpoint and param
-- DaveMorris - 23 May 2012
I'll make the appropriate edits in the next version. -- MatthewGraham - 25 May 2012
typos
sec 3.2.4 The last core property ivo://ivoa.net/vospace/core@btime has an @ instead of a #
sec 6, in the changes from 2.00-20110628, second bullet, "synchonous" instead of "synchronous"
I'll correct these in the next version. -- MatthewGraham - 31 July 2012
clarity
sec 3.2.4 standard node properties
For the recently added properties for access permissions (groupread, groupwrite, and publicread) the actual semantics are important. Specifically, for groupwrite we implemented this as "allowed to read and write" so that it was usable by itself (without having to also set groupread); this was mainly to make it easier for users to understand and manage. I don't think the property names need to change, but the definition would be better if it was clearly read-only, read-write, and anon-read-only. It is true that this does not permit the "permission to write to some hidden container I can't see" use of the UNIX filesystem permissions, but I see that as a feature, not a shortcoming.
- I'll clarify the text in the next version. -- MatthewGraham - 31 July 2012
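The read-only / read-write / anon-read-only reading described in the comment can be sketched as follows. This is an illustration under stated assumptions, not the specified behaviour: it assumes properties are keyed by their short names, that group lists are comma separated, and that publicread carries the string "true"; owner access is not modelled.

```python
def parse_groups(value):
    """Split a comma-separated group list into a set of names."""
    return {g.strip() for g in value.split(",") if g.strip()}


def allowed_actions(props, user_groups=frozenset()):
    """Return the actions a caller may perform on a node.

    props: dict of short property names to string values (assumed).
    user_groups: groups the caller belongs to; empty for anonymous.
    """
    actions = set()
    # publicread: anonymous read-only access.
    if props.get("publicread", "").lower() == "true":
        actions.add("read")
    # groupread: read-only access for members of the listed groups.
    if user_groups & parse_groups(props.get("groupread", "")):
        actions.add("read")
    # groupwrite: read AND write, so it is usable without groupread.
    if user_groups & parse_groups(props.get("groupwrite", "")):
        actions |= {"read", "write"}
    return actions


props = {"groupread": "readers", "groupwrite": "editors", "publicread": "false"}
print(allowed_actions(props, {"editors"}))
print(allowed_actions(props, {"readers"}))
print(allowed_actions(props))
```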
sec 3.7 Searches and 5.3.3 findNodes
It is not clear what role the optional node plays in the search. Is this supposed to be a starting node and the matches constraints are applied to this node and all children? recursively? If this is absent, that appears to mean "search the whole vospace", so it makes sense for node=
to mean "search in this part".
The representation to be used in the UWS job info is not specified here, but sec 5.3.3.1 says to look in sec 3.7 for how to represent the search job. The example representation in the detailed example sequence below sec 5.3.3.3 could be extracted and put into 5.3.3.1 or 3.7, which would help readers.
Following from the detailed example search, does the URL /searches/{jobid}/results/searchDetails also return the resulting node list? It is really a UWS question, but iirc the purpose of specifying the name of the result explicitly is to enable the client to immediately know the URL to use and get the result. Of course, a GET to this URL could well redirect to the same URL as the href attribute (where the result is actually stored, which is what we implemented in our UWS library). It would be worth explaining this.
- I'll clarify the text in the next version to explain what the node argument does. I'll also add the example to 3.7 and the specific endpoint. -- MatthewGraham - 31 July 2012
sec 3.8 REST bindings
The /{sync} resource says it is for synchronous jobs rather than (more explicitly) synchronous transfer jobs (as opposed to sync searches). The subsequent text does say transfer details. It would be worth expanding the "The endpoint /{sync}" paragraph with "Synchronous transfers are limited to (intended for?) pushToVoSpace and pullFromVoSpace transfers only, where the client is requesting endpoint URLs where it can read or write data." Adding all of this depends on the issue with 5.4.1 (see below).
I'll add this text to the next version. -- MatthewGraham - 31 July 2012
issues
First, I'm glad to see that 500 response codes are now reserved for (utter) service failure and are not needed for any kind of client error or usage problems. Operators/monitors will be happy.
sec 5.3.1.3 getNode faults
The status code 404 and NodeNotFound fault are specified if the target node does not exist. This should be clarified to include non-existence of a parent container node. Alternatively, in both createNode and deleteNode (5.2.1.3 and 5.2.4.3), there is a 404 ContainerNotFound if a parent node in the path does not exist. This would provide more information to the client (good) but in some implementations it may be more complex to implement (basically, you fail to find the node and now have to check if all the parents exist... maybe "harder to optimise" would be a better description). Still, for consistency it seems that a 404 ContainerNotFound fault when a parent container does not exist (as in other ops) would be a good addition.
sec 5.3.2.3 setNode faults
Same as above (specify 404 ContainerNotFound)
- I'll add the appropriate ContainerNotFound 404s in the next version -- MatthewGraham - 31 July 2012
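To illustrate the distinction discussed above (both faults map to HTTP 404; only the fault name differs), a service-side check might be sketched like this. The in-memory set of paths and the function name are hypothetical stand-ins for whatever node store an implementation actually uses.

```python
def classify_missing(existing_paths, path):
    """Return None if the node exists, otherwise the fault name.

    existing_paths: set of absolute node paths known to the service
    (a hypothetical in-memory stand-in for the real node store).
    """
    if path in existing_paths:
        return None
    parts = path.strip("/").split("/")
    # Walk the ancestors from the root down; the first missing
    # container means the parent path itself is broken.
    for depth in range(1, len(parts)):
        parent = "/" + "/".join(parts[:depth])
        if parent not in existing_paths:
            return "ContainerNotFound"
    # All parents exist, so only the target node is missing.
    return "NodeNotFound"


nodes = {"/mydata", "/mydata/table1"}
print(classify_missing(nodes, "/mydata/table2"))
print(classify_missing(nodes, "/other/table1"))
```

The extra ancestor walk only runs after the initial lookup has failed, which is the "harder to optimise" cost mentioned in the comment.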
sec 5.4.1 pushToVoSpace
The request section says that the convenience method is to POST to /sync (should it be /{sync} since the actual name is not fixed?) and that HTTP PUT is assumed. In practice, there is no problem with negotiating to use other protocols since you get back a transferDetails representation that says which ones to use (and at which URL). It may be desirable to require HTTP PUT with the sync transfer negotiation, but negotiation should still be possible.
In the response section, it says the convenience (POST to /{sync}) responds with a 303 to /transfers/{jobid}/results/transferDetails. The first resource name there should be {transfers} as used elsewhere. Is it actually required to have a common transfer job list and this convenient /{sync} way to optimise the interaction? The value of making this explicit is that the client can parse the URL and check the job (error summary). If this is the intent (I agree with it) then (i) sec 3.8 needs to be more explicit and (ii) we should think about which URL to redirect to.
If we do as above but the job failed, the client will get a 404 and have to parse the url to get the job and find out why it failed. If we redirect to the finished (COMPLETED or ERROR) job, the client can choose to check the job or append and go for the transferDetails immediately, but they can't simply follow and read the transfer document directly (the normal success case). So, is it better to redirect to /{transfers}/{jobid}/results/transferDetails (optimise for success case but be prepared for a 404) or redirect to /{transfers}/{jobid} (promotes careful client, can be optimised by aware client)? I'm ambivalent. The latter removes having to run and poll an async transfer and the former goes one more step and returns the transferDetails more immediately.
Note: In our impl, we return a transfer document even if the negotiation failed (it just has no protocols/endpoints in it, at which point the client has to parse the URL and check the job for error messages), but I'm not very attached to this.
sec 5.4.1 pullFromVoSpace
This section mentions the "view=data" convenience method for HTTP GET. Is there any reason the /{sync} convenient negotiation is not mentioned here? It is perfectly usable and well specified with minimal extra language. It is probably more work to make /{sync} not support pullFromVoSpace.
As for the view=data method, that is fine and usable in very simple cases, but the text about it returning a 303 to an alternate URL is asking for trouble. The first thing I found is that some http implementations cannot, by design, follow a redirect that changes protocol (java.net.HttpURLConnection cannot even change from http to https). I feel strongly that changing protocol not be allowed here.
-- PatrickDowler - 2012-06-15
I'll change the text in the next version to make pushToVoSpace and pullFromVoSpace consistent in their use of the /{sync}, and its endpoint. -- MatthewGraham - 31 July 2012
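The URL parsing that the comment above describes (going from the redirect target back to the job, or from the job forward to the transfer details) could be sketched as below. This is only an illustration of the two redirect choices being weighed; the helper names and the example.org URL are hypothetical, and no HTTP request is made.

```python
RESULT_SUFFIX = "/results/transferDetails"


def job_url_from_redirect(location):
    """Derive the UWS job URL from a /{sync} redirect target.

    If the service redirected straight to the transferDetails result,
    strip the result path so the client can inspect the job (for
    example its error summary) after receiving a 404.
    """
    if location.endswith(RESULT_SUFFIX):
        return location[: -len(RESULT_SUFFIX)]
    return location


def transfer_details_url(job_url):
    """Build the transferDetails URL when redirected to the job itself."""
    return job_url.rstrip("/") + RESULT_SUFFIX


redirect = "http://example.org/vospace/transfers/abc123/results/transferDetails"
print(job_url_from_redirect(redirect))
print(transfer_details_url("http://example.org/vospace/transfers/abc123"))
```

Either redirect choice leaves the client one string operation away from the other URL, provided the result name transferDetails is fixed by the specification.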
Comments from Mark Taylor
- Sec. 1.1: There are namespace declarations in the examples here which don't appear to be doing anything (xmlns:vost, xmlns:xsi, xmlns:xsd), unless I'm missing something. Might be clearer to omit them.
I'll remove the unnecessary namespaces in the next version. -- MatthewGraham - 31 July 2012
- Sec 3.8: I don't understand why the REST endpoints are listed with braces here, e.g. "/{protocols}" rather than "/protocols". As far as I can see these are literal strings to be used as endpoints rather than (as noted) the parts like "(job-id)" that can be chosen by the service. Am I missing something?
This is the naming convention for endpoints used in the UWS spec. I'm checking what the convention means -- MatthewGraham - 31 July 2012
- If memory serves there are fewer 500 responses mandated here than in earlier versions of this standard, but there are still some. 500 is reserved for "unexpected" conditions - it seems a bit questionable to mandate it for specific circumstances.
A 500 is now only thrown if the operation fails - in normal operations, this will be a result of "unexpected" conditions. -- MatthewGraham - 31 July 2012
- Sec 5.2.2.2 and 5.2.3.2: examples omit a closing </uws:result> tag.
I'll correct this in the next version. -- MatthewGraham - 31 July 2012
- There are a couple of things I don't understand about the OPTIONAL operations. Taking pushToVoSpace as an example (Sec 5.4.1):
- how does a client know if such an operation is implemented for a given service? (maybe that's a job for a future VOSpaceRegExt)
- how does the service respond to a request for one of these optional operations if it does not support it? None of the faults in sec 5.4.1.3 looks appropriate.
- does support of the UWS mode imply support of the convenience mode and vice versa, or is it permissible for a service to support e.g. the convenience mode but not the UWS mode?
-- MarkTaylor - 20 Jun 2012
There is an amount of metadata associated with a VOSpace service that needs to be put in the registry and one of these is which optional features a particular service offers. We need to see whether we need a specific VOSpaceRegExt to do this.
Optional views return a not supported error if attempted. Service calls (findNodes, pushTo, pullTo, pushFrom) should employ the standard UWS fault mechanism: "the service shall set the PHASE to "ERROR" in the Job representation. The element in the Job representation shall be set to the appropriate value for the fault type and the appropriate fault representation provided at the error URI: http://rest-endpoint/{search|transfers}/(jobid)/error." I'll add a FaultType for operation not supported where appropriate in the next version.
The convenience mode is just a shortcut around some of the UWS negotiation but still uses most of the UWS pattern. You cannot have one without the other (at least in this version).
- OK - can you make that last point explicit in the text -- MarkTaylor - 2012-08-01
-- MatthewGraham - 31 July 2012
Comments from TCG member during the TCG Review Period: 29 August 2012 - 26 September 2012
WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or not the Standard.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
A revised version (24-Aug-2012) of the document which addresses all the issues raised during the RFC period is available here:
TCG Chair & Vice Chair (Séverin Gaudet, Matthew Graham)
Three minor things:
- In Figure 1 VOSpace in the IVOA Architecture:
- The presence of CDP is not explained in the associated paragraph nor is it referenced in section 4 Access Control.
- The presence of Resource Identifier is not explained in the associated paragraph
- In Section 4 Access Control, the first line "[NOTE: use HTTPS with client authentication and valid X.509 certificate)" should close with a "]" and not a ")"
With these minor fixes, I approve.
-- SeverinGaudet 2012-12-18
Done
-- MatthewGraham 2012-12-21
Applications Working Group (Mark Taylor, Pierre Fernique)
Some unresolved issues from the RFC period:
- Soft hyphens are still present and causing trouble, e.g. in the <transfer> examples in sec 1.1 I can't tell if the URI for transfers is supposed to end "httpget" or "http-get"; either way, in the browsers I'm using it looks clearly wrong.
- Extraneous namespace attributes have been removed from the <transfer> examples in sec 1.1, thanks, but the closing angle brackets on the transfer element start tags have gone with them; they need to be reinstated.
- The extraneous namespace attributes are still present on the <node> example in sec 1.1, ideally remove them.
- My RFC period query about use of curly brackets in sec 3.8 and elsewhere in the text, which elicited the reply "I'm checking what [the UWS] convention means" has not been resolved.
If these are fixed, Apps WG recommends acceptance.
-- MarkTaylor - 2012-09-03
Done
-- MatthewGraham 2012-12-21
Data Access Layer Working Group (Patrick Dowler, Mike Fitzpatrick)
One small thing that I feel is missing from the document is specifying the standardID for this version of the vospace spec. Further, since the various resources can be named differently in different services, it will be vital to provide a basic VOSI-capabilities document that a client app could read to figure out which resources are available. I assume these would just be $standardID#resourceName (eg ivo://ivoa.net/std/VOSpace/v2.0#nodes). Could this be added (maybe to section 2 and/or 3.8)? Even without a registry extension to describe details/policies/etc a basic VOSI-capabilities could be implemented with these URIs.
I approve this document.
-- PatrickDowler 2012-10-02
Done
-- MatthewGraham 2012-12-21
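The $standardID#resourceName pattern assumed in the comment above can be sketched as follows. Only the #nodes example comes from the comment; the other resource names are taken from the endpoints mentioned elsewhere on this page, and the service URLs are hypothetical, so this should be read as an illustration rather than the registered identifiers.

```python
STANDARD_ID = "ivo://ivoa.net/std/VOSpace/v2.0"


def resource_standard_id(resource_name):
    """Build a capability identifier as $standardID#resourceName."""
    return f"{STANDARD_ID}#{resource_name}"


# Hypothetical service whose resources do not use the default names;
# a client would look up each access URL by its capability identifier.
capabilities = {
    resource_standard_id("nodes"): "http://example.org/vospace/n",
    resource_standard_id("transfers"): "http://example.org/vospace/xfer",
    resource_standard_id("sync"): "http://example.org/vospace/synctrans",
}

print(resource_standard_id("nodes"))
print(capabilities[resource_standard_id("transfers")])
```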
Data Model Working Group (Jesus Salgado, Omar Laurino)
I approve this document.
-- JesusSalgado 2012-12-12
Grid & Web Services Working Group (Andreas Wicenec, Andre Schaaff)
I approve this document.
-- AndreSchaaff - 2012-12-18
Registry Working Group (Gretchen Greene, Pierre Le Sidaner)
I approve this document. Note to author, Pierre was unable to validate the xml schema doc included.
-- GretchenGreene - 2012-12-18
Semantics Working Group (Norman Gray, Mireille Louys)
TCG Review comments from NormanGray
There are no points of great substance in the following, but a couple of things are more than mere typos.
Punctuation: the use-case text in Sect 1.1 is hard to parse (I'm not going to go through the document sub-editing it -- it doesn't need that -- but this is an early and prominent section and so is worth particular attention). I suggest: "This is a two-stage process: creating a description of the data file (representation) in the VOSpace including any metadata (its properties) that they want to associate with it (e.g. MIME type), and then defining the transfer operation that will actually see the data file bytes uploaded to the VOSpace service."
There are still some 8-bit characters in the file, in particular in code examples (as others have noted); I'm looking at it with a UTF-8 encoding. It might be prudent to ensure that the HTML is 8-bit clean, rather than relying on content-types being correct (!)
Same section, two sentences beginning "This illustrates an important point...". It's not clear what point is being emphasised here, since it's unclear to me what the intended contrast is between 'the management of data... transfer' and 'the actual transfer'. Would "...it is only concerned with the metadata describing data storage and transfer" be better, though it's not ideal? The overall point ends up pretty clear, nonetheless.
Sect 1.1: mention of ivo://ivoa.net/vospace/core#http??put: there's an 8-bit character here (a hyphen?) which doesn't match the specification of ivo://ivoa.net/vospace/core#httpput later in the document (again, others have noted this, above).
Sect 2, bulleted list, and Sect 2.1, "Note that any fragment identifier...": the link at the end of this sentence should be an intra-document fragment link to #fragref (getting all meta, here!).
Section 3.2, "The properties of a LinkNode do not propagate to the target of the LinkNode": the use-case here seems to contradict this statement, in the sense that using a link as proxy for annotations is a type of propagation of properties. I wouldn't have guessed that a LinkNode's properties would be retrievable when examining the target, so the initial sentence seems redundant, which means in turn that its presence makes me think I'm missing something. This paragraph would make more sense to me if 'do not' were removed from the first sentence, but I doubt that's what you mean.
Section 3.2.4, Standard properties: In "The following URIs SHALL be used to represent the service properties", the 'SHALL' is debatable. Say I put a picture of my cat in a VOSpace: should the ivo://ivoa.net/vospace/core#title be 'This is Basil the cat', or 'This is a picture of Basil the cat', or both? I'm not saying one of these is correct or preferable (that's a different argument), but that a client may want to use both, or want to resolve the ambiguity with two separate properties (which is of course allowed since the list is open), or simply be uncertain, from the description here, which of the two to use. Perhaps 'SHOULD' would be better in this section, in the sense of 'use these identifiers unless you have a good reason not to'. The 'SHALL's in Sect 3.3.5, 3.4.4, 3.5.3 are unexceptionable.
Section 3.7: the sentence "However the server MAY still impose its own limit..." is highlighted in red, but appears to be the only sentence in the document so highlighted. Is this an error? The sentence doesn't seem quite worthy of this attention.
Section 4: "[Note: ... certificate)": unmatched brackets.
Section 5, various subsections, Faults: "The service SHALL throw a HTTP 500 status code including an InternalFault fault in the entity body if the operation fails" If the service fails, it almost by definition can't control what gets sent back to the client. It might be worth making a general remark in Sect. 5 that if a service is able to detect a 500-worthy system failure, then it would be desirable for it to indicate this with a 500 code and an InternalFault representation (with an appropriate MIME type) if possible, but it can never guarantee this -- it might, for example, be a web-service container or a proxy that has failed, and that's going to reply with HTML. One could I suppose assert that 500 should never be a documented part of an interface, but if it's mentioned at all, then I think it should be a 'SHOULD' at the most.
Section 5.2.1.2, Response: Shouldn't the response code here be 201 Created? (RFC 2616 s9.6 says that 201 is a MUST response to a successful PUT). The same point might be true of s5.4.1.1 and other places where PUT is specified. Also, while we're on that subject, RFC 2616 s9.7 says that a DELETE method SHOULD result in a 200 if the response includes content, and 204 otherwise, but the example in the VOSpace s5.2.4.3 spec shows a 200 response with no content.
Section 5.2.1.3, Faults: the list here switches from 'SHALL' to 'MUST' in mid-stream. RFC 2119 says these are equivalent, but the document should perhaps pick one or the other.
I like the compliance matrix of Appx B.
I approve the document, independently of the authors' responses to the above comments.
Done
-- MatthewGraham 2012-12-21
VOEvent Working Group (Matthew Graham, John Swinbank)
I approve this document.
-- MatthewGraham - 2012-12-18
Data Curation & Preservation Interest Group (Alberto Accomazzi)
I approve this document.
One typo that should be fixed: there is an instance of a URI which contains the fragment #http-put rather than #httpput
-- AlbertoAccomazzi - 2013-02-26
Knowledge Discovery in Databases Interest Group (Giuseppe Longo)
Theory Interest Group (Franck Le Petit, Rick Wagner)
I approve this document.
On the form, I just noticed in the HTML version of the document some characters that do not print properly in Mac browsers. For example between the http and the get in :
<protocol uri="ivo://ivoa.net/vospace/core#http�get"/>
-- FranckLePetit - 2012-10-07
Done
-- MatthewGraham 2012-12-21
Standards and Processes Committee (Francoise Genova)