Discussion of the VOSpace 1.0 specification Document

The VOSpace-1.0 specification has been completed and submitted as an IVOA working draft.

The final versions are available here :


This is a discussion page for the VOSpace-1.0 service specification document.

This page is a place to post proposals and for interested parties to discuss the different versions.

For each version there is a Change Requests section. Please add to it and vote on other suggestions:

  • +1 if you agree
  • 0 if you have no particular preference
  • -1 if you think the proposal is useful but needs more work
  • -2 if you disagree with the proposal

If you register a -1 or -2 vote, then please add a link to a page outlining your objections or comments.

Details and discussion of implementation plans are here.



Version 0.22

This document was produced as a result of the discussions that have occurred since the Victoria Interop meeting.

Change Requests

Allow typing of properties

The current scheme is limited to key-value pairs where the value is interpreted as a string. A problem with this is that some key-value pairs might be intended to represent other datatypes, e.g. a date or a float, and without this typing information it is impossible to check the validity of the value. A client can always add this information with an xsi:type attribute, e.g. <property uri="ivo://net.ivoa/properties/date" xsi:type="xs:dateTime">2006-11-22T18:50:03Z</property>, but this might not be interpreted properly by the browser. However, if we add an explicit type attribute then we can cover this case: <property uri="ivo://net.ivoa/properties/date" type="xs:dateTime">2006-11-22T18:50:03Z</property>. The attribute is optional, and omitting it implies that the datatype is string. The value of the attribute can be either an XML datatype or a reference to an XML schema that describes the data structure, thus allowing for more complicated properties such as:
<property uri="ivo://net.ivoa/properties/color" type="myschema.xsd">
 <color>
   <red>123</red>
   <blue>234</blue>
   <green>89</green>
 </color>
</property> 
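As a rough illustration of the validity check this proposal would enable, here is a minimal Python sketch. The converter table and the parse_property helper are assumptions made for this example and are not part of the specification; only the simple XML datatypes are handled, and schema references such as myschema.xsd are rejected as unsupported.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical converters keyed by the value of the proposed "type" attribute.
CONVERTERS = {
    "xs:string": str,
    "xs:float": float,
    "xs:dateTime": lambda text: datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ"),
}


def parse_property(xml_text):
    """Return (uri, typed value); raise ValueError if the value does not fit its type."""
    element = ET.fromstring(xml_text)
    declared = element.get("type", "xs:string")  # an omitted type implies string
    converter = CONVERTERS.get(declared)
    if converter is None:
        raise ValueError(f"unsupported type: {declared}")
    return element.get("uri"), converter(element.text or "")
```

With this sketch, a property declared as xs:dateTime whose text is not a valid timestamp raises ValueError, whereas the same text in an untyped property is accepted as a plain string.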

Votes

name vote comment
MatthewGraham +1 proposer
DaveMorris -1 Needs more detail, or defer to 1.1 ? (see email for details)

Rename views to formats

This needs to be renamed to what it actually is, i.e. format(s), since the current name is universally confusing.

Votes

name vote comment
MatthewGraham +1 proposer
DaveMorris -1 Needs more detail, or defer to 1.1 ? (see email for details)

Decoupled data servers

Under the current scheme, it is assumed that there is some communication channel between the VOSpace and a data server, e.g. a gridftp server, so that when a pushTo or pullFrom finishes, the data server can notify the VOSpace service that the transfer has completed. This sort of notification is particularly necessary when the endpoint is a logical one, e.g. a one-time-use URL. This design is fine where we have implemented the data servers ourselves, or have access to the source code so that we can add the callback. However, what happens when you are dealing with an off-the-shelf data server where adding a callback is impossible or non-trivial, e.g. the Globus gridftp server? One solution is to have the client notify the VOSpace when the transaction is complete (this is really only a problem for the asynchronous services), so pushToVoSpace would become:

  1. Client calls pushToVoSpace(<node>, <transfer>), which returns <node> and <transfer>, the latter containing details for the data server
  2. Client transfers data to data server
  3. Client notifies VOSpace that transfer has been completed, e.g. transferComplete(<node>).
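The three steps above can be sketched in Python as follows. InMemoryVOSpace, its method names, the node URI and the gsiftp endpoint are all made up for illustration; this is a toy stand-in, not a real VOSpace client or the specified interface.

```python
class InMemoryVOSpace:
    """Toy stand-in for a VOSpace service (hypothetical, for illustration only)."""

    def __init__(self):
        self.status = {}  # node URI -> "pending" or "complete"

    def push_to_vospace(self, node, transfer):
        # Step 1: record the pending transfer and return the node plus
        # transfer details (here, an invented data server endpoint).
        self.status[node] = "pending"
        return node, dict(transfer, endpoint="gsiftp://data.example.org/upload/1")

    def transfer_complete(self, node):
        # Step 3: the client, not the data server, reports completion.
        if node not in self.status:
            raise KeyError(node)
        self.status[node] = "complete"


def upload(space, node, transfer, send):
    """Run the three-step flow; `send` is whatever moves the bytes."""
    node, details = space.push_to_vospace(node, transfer)
    send(details["endpoint"])      # Step 2: client transfers data itself
    space.transfer_complete(node)  # Step 3: explicit notification
    return node
```

In this sketch, if send raises an exception the notification call is never made and the node stays in the "pending" state.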

There are a couple of problems with this, however: the client has to call the space twice and might forget to make the notification call, and it is unclear what happens if the transfer fails or is never carried out. An alternative approach is to do the data transfer first and then register the data object with the node, including its physical location, so pushToVoSpace becomes:

  1. Client transfers data to data server
  2. Client registers data with VOSpace: register(<node>, URI of location) returns the registered <node>

This is actually the only transfer method which needs a modification: all the others work fine with decoupled servers. In fact, instead of adding an additional operation, we can modify pushToVoSpace, either by giving it an additional URI argument, pushToVoSpace(<node>, <transfer>, location-uri), or by incorporating the location-uri into the transfer, so that if the protocol contains an endpoint then that endpoint is interpreted as the physical location of the data object.

It would also be useful to have an operation that returns the list of (decoupled) data servers (resources in SRB speak) that the VOSpace is using, so I suggest that we add a getDataServers operation.
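A minimal Python sketch of the second variant, where an endpoint carried in the transfer is taken as the location of data that is already in place. The dictionary layout, the "endpoint" key and the status values are assumptions for this example only.

```python
def push_to_vospace(nodes, node, transfer):
    """Toy pushToVoSpace: an endpoint in the transfer is taken as the data's location.

    `nodes` is a plain dict standing in for the service's node registry.
    """
    endpoint = transfer.get("endpoint")
    if endpoint is not None:
        # The data already sits on a decoupled server: just register where it lives.
        nodes[node] = {"location": endpoint, "status": "complete"}
    else:
        # No endpoint supplied: keep the existing behaviour, where the
        # service picks a location and waits for the transfer to happen.
        nodes[node] = {"location": None, "status": "pending"}
    return node, nodes[node]
```

The client makes a single call after the upload, so there is no separate notification step to forget.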

Votes

name vote comment
MatthewGraham +1 proposer
DaveMorris -2 Current version is enough, full asynch with callbacks is later (see email for details)

copyNode and moveNode to operate across stores

See message for the rationale behind the itemized list below:

  1. make (file)name a mandatory node property
  2. use pullDataToVOSpace with copyNode and moveNode to operate across stores
  3. {copy|move}Node to support exception StoreFull
  4. drop list paging in favor of filtering by property value

Apparently this democratic page accepts votes: Well, obviously I vote in favor of my comments, however, in case of disagreement I'd rather see how to use level 1 as is based on some real scenarios. Markus

Four points in one post, so I'll add comments as a similar list (DaveMorris).

  1. make (file)name a mandatory node property
    • The name is in the node URI, so it is already mandatory
  2. use pullDataToVOSpace with copyNode and moveNode to operate across stores
    • If services exchange SOAP messages, then this causes interoperability problems later on. All future services would have to support all the previous versions of the SOAP messages.
    • The current specification does support direct data transfer between services, but the control messages always come from the client, i.e. the client contacts service A and asks for a URL to access the data, and then calls service B and asks it to import the data from the URL.
    • See email and wiki for more details.
  3. {copy|move}Node to support exception StoreFull
    • This is a good point, and we don't have anything in the specification to handle this. Should this be a new fault type, or do we just wrap it as a PermissionDenied fault with an appropriate message ? Either way, we probably need to add something to the specification to say what should happen.
  4. drop list paging in favor of filtering by property value
    • Simple filtering on names (similar to Unix ls) is going to be added to version 1.0.
    • Complex filtering on property values (similar to Unix find) has been deferred to version 1.1.
    • The paging is primarily intended to enable clients to handle large lists without exploding.
    • See email for more details.
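To make items 2 and 3 concrete, here is a small Python sketch of a client-driven copy between two services, with a dedicated fault raised when the target store is full. ToyService, get_access_url, import_from_url and StoreFull are hypothetical names for this example; whether the real fault should be a new type or a wrapped PermissionDenied is the open question raised above.

```python
class StoreFull(Exception):
    """Hypothetical fault for a store that cannot accept more data."""


class ToyService:
    """Toy stand-in for a VOSpace service with a fixed capacity in bytes."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.data = {}  # node name -> bytes

    def get_access_url(self, node):
        # Stand-in for asking the service for a URL to read the data.
        return f"toy://{self.name}/{node}"

    def import_from_url(self, node, url, fetch):
        payload = fetch(url)
        used = sum(len(value) for value in self.data.values())
        if used + len(payload) > self.capacity:
            raise StoreFull(f"cannot store {node}: {len(payload)} bytes needed")
        self.data[node] = payload


def copy_across(source, target, node):
    """The client drives both calls; the services never message each other."""
    url = source.get_access_url(node)
    target.import_from_url(node, url, fetch=lambda _url: source.data[node])
```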

Votes

name vote comment
MarkusDolensky +1 proposer (yeah)
DaveMorris -2 (items 1 and 2)
DaveMorris +1 (item 3 needs to be addressed in the specification)
DaveMorris -1 (parts of item 4 will be included in version 1.0 or 1.1)

Make the destination of CopyNode and MoveNode a simple uri

The advantage is that it simplifies the standard document (there are so many restrictions on the node) and is much easier for the client to use, since the client does not have to create a node object, only a URI for the destination.

Votes

name vote comment
PaulHarrison +1 proposer
DaveMorris 0 It would be useful to have a list of updated properties for a copy, but possibly not worth the extra complexity in the schema

Re-assess what metadata should be returned with various faults

Some faults need to return metadata beyond simply their type to convey meaningful information about exactly what has gone wrong. For example, a setDataNodeProperties call could specify several read-only properties amongst a larger number of properties, and the client would not know which properties were in error unless a list is returned in the fault.
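One possible shape for such a fault, sketched in Python. The PermissionDeniedFault class, its property_uris field and the READ_ONLY set are assumptions for illustration; the specification does not define them.

```python
class PermissionDeniedFault(Exception):
    """Toy fault that carries the URIs of the properties that caused it."""

    def __init__(self, message, property_uris):
        super().__init__(message)
        self.property_uris = list(property_uris)


# Assumed set of read-only properties for this example.
READ_ONLY = {"ivo://net.ivoa/properties/date"}


def set_node_properties(node, updates):
    """Apply `updates` to the `node` dict, or fail listing every read-only URI."""
    rejected = sorted(uri for uri in updates if uri in READ_ONLY)
    if rejected:
        raise PermissionDeniedFault("read-only properties cannot be set", rejected)
    node.update(updates)
    return node
```

A client catching the fault can read property_uris to see exactly which properties were refused, rather than having to guess from the fault type alone.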

Votes

name vote comment
PaulHarrison +1 proposer
GuyRixon +1  
DaveMorris +1 Can you outline the proposed changes in a wiki page ?

Version 0.21

This document was produced as a result of the discussions that occurred at the Victoria Interop meeting.

Changes

VOspace10Spec21Archive

Changes for later versions of VOSpace

Consider adding an optional wildcard matching identifier to parameters for ListNodes

This would allow the client to specify a subset of the VOSpace to be listed. In effect the behaviour would be similar to the "ls" command in unix, with typical simple shell wildcard semantics.

Reason for change: improved efficiency - if ListNodes always has to list the whole VOSpace then it is a pretty blunt instrument, especially as the number of data objects in the space increases.

Use Case

Suppose that there is a 1.0 VOSpace containing 5000 data items and a workflow step is writing results into the VOSpace using a common prefix. The next step in the workflow wants to process all of the files produced, but only knows the prefix - without wild card matching the whole of the VOSpace needs to be listed to find the files.

-- PaulHarrison 19 Jun 2006
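The use case above might look like this in Python, using shell-style wildcards from the standard fnmatch module. The list_nodes function and the node names are invented for the example; in a real service the matching would happen server-side, so that the full listing never has to be sent to the client.

```python
from fnmatch import fnmatchcase


def list_nodes(node_names, pattern="*"):
    """Return the node names matching a shell-style wildcard pattern."""
    return sorted(name for name in node_names if fnmatchcase(name, pattern))


# A space holding many items, a few of which share the workflow's prefix.
space = [f"item-{i}.fits" for i in range(5000)]
space += ["step1-result-a.vot", "step1-result-b.vot"]

matches = list_nodes(space, "step1-result-*")
```

Here matches contains only the two files written by the earlier workflow step.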

Votes

name vote comment
PaulHarrison +1 proposer
*Result*: consider for later version

Add a FindNodes operation

This operation would allow a simple search on the VOSpace - the level of functionality would be similar to the unix "find" command.

Topic revision: r33 - 2007-05-01 - DaveMorris
 