International Virtual Observatory Alliance

Difference: VOSpace20Spec (5 vs. 6)

Revision 6, 2009-10-06 - MatthewGraham

VOSpace home page

Discussion of the VOSpace 2.0 specification

This is a discussion page for the VOSpace-2.0 service specification.

This is a place where we can post proposals and where interested parties can discuss the changes.

Please add your suggestions to this page and vote on other suggestions:

+1 if you agree
0 if you think the proposal is useful but needs more work
-1 if you disagree with the proposal

If you register a 0 or -1 vote, then please add a link to a page outlining your objections or comments.

Matters arising from IVOA Strasbourg (May 2009)

Pagination

The listNodes method is to be removed - listing will be possible with a straightforward HTTP GET to the node. The representation returned can be altered with the use of a 'detail' keyword:

http://nvo.caltech.edu/vospace-2.0/mjg/mytable1?detail={min|max|properties}

When the node is a container, it will return a list of the direct child nodes in the container. If the container holds a large number of nodes (>10000), the client can request a numerically limited subset using the 'limit' and 'offset' keywords:

http://nvo.caltech.edu/vospace-2.0/mjg/mydir1?limit=500&offset=2500

The ordering is determined by the server.

The aim here is that listing no longer makes use of continuation tokens but that pagination of results is controlled by the client.

findNodes can employ a similar mechanism but is directed against the appropriate search resource and not the node since a search can be an asynchronous activity.

-- MatthewGraham - 05 Oct 2009
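
As a rough illustration of client-controlled pagination, the sketch below builds the sequence of listing URLs a client might request. The function name page_urls and the assumption that the client knows the total number of children in advance are invented for the example; only the 'limit' and 'offset' keywords come from the proposal above.

```python
from urllib.parse import urlencode


def page_urls(container_url, total, limit):
    """Yield one listing URL per page of `limit` children.

    `total` is the number of children the client expects; a real client
    might instead keep requesting pages until one comes back short.
    """
    for offset in range(0, total, limit):
        yield f"{container_url}?{urlencode({'limit': limit, 'offset': offset})}"


base = "http://nvo.caltech.edu/vospace-2.0/mjg/mydir1"
urls = list(page_urls(base, total=1200, limit=500))
for url in urls:
    print(url)
```

This only works as intended if the server draws every page from the same ordered sequence, which is the consistency requirement raised in the comment that follows.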

"The ordering is determined by the server": yes, but we should be careful of the details. The ordering must be consistent between calls such that each {limit, offset} is drawn from the same, ordered sequence. That's implied, of course, but I would state it explicitly.

What happens if a client specifies {limit, offset} in a GET request to a leaf node? Do we treat it as an error or ignore the paging parameters?

-- GuyRixon - 05 Oct 2009

Votes

name	vote	comment
MatthewGraham	+1
GuyRixon	+1

Service capabilities

Simple HTTP GET for client-server transfers

I don't see why we actually need this. The simple use case of having a URL that a web browser can click on is met simply by storing the data in a HTTP web server, not in a VOSpace server.

Unless there is a strong use case for this, I don't want to mandate that a VOSpace service must implement HTTP GET for data transfers. Adding that to the VOSpace service specification would dilute a lot of the existing work that has been done to make the service protocol agnostic.

I posted a request to the VOSpace list a while ago asking for a detailed use case for adding HTTP GET: http://ivoa.net/forum/vospace/0906/0213.htm

As it is, all the examples we have seen so far have significant problems with them.

I would like to see a use case for HTTP GET with params that could not be better solved by using a separate portal or resolver service.

-- DaveMorris - 25 Aug 2009

The aim here is to maintain the protocol negotiation capability to support advanced protocols, while also adding simple retrieval access via HTTP for cases where unauthenticated access is permissible.

The HTTP URL should be persistent, i.e., it should be possible for multiple clients to retrieve the file given a single URL. Even where authentication is required we might want to retain the same URL, but merely require HTTPS to access it.

A view needs to be specified to return the appropriate resource, since a basic HTTP GET call to a node just returns the resource representation (listing). So the minimum call is:

http://nvo.caltech.edu/vospace-2.0/mydata/table3?view=ivo://net.ivoa/core/views/votable-1.1

This may result in a redirection to another endpoint if appropriate.

-- MatthewGraham - 05 Oct 2009

I have two problems here.

First, the mapping for a node from the vos:// URI to http:// is broken. That's a bigger problem that I won't detail here, but it has to be sorted out before this can work.

Second, most of the time one doesn't want to specify the output format; either there's no choice or the client wants the default. However, a GET without a view is already specified to return the node metadata.

I would make the basic form like this:

http://nvo.caltech.edu/vospace-2.0/mydata/table3?view=data

then use HTTP negotiation to sort out the available formats. We didn't use HTTP negotiation in VOSpace 1.x because SOAP breaks it. We haven't used it in DAL because DAL is supposed to be independent of HTTP. Here, we define VOSpace to be based on HTTP so we can use HTTP features.

-- GuyRixon - 05 Oct 2009
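
To make the HTTP negotiation suggestion concrete, here is a minimal sketch of a request that asks for the data with an Accept header instead of naming a view URI. The "view=data" form is the one suggested in the comment above, not an agreed part of the specification, and the media type and q-values are assumptions chosen for the example. The request is built but not sent.

```python
from urllib.request import Request

# Node URL from the example above; "view=data" is only a suggestion.
url = "http://nvo.caltech.edu/vospace-2.0/mydata/table3?view=data"

# Ask for VOTable first, falling back to anything the server can provide.
# The media type and q-value are assumptions made for this sketch.
request = Request(url, headers={"Accept": "application/x-votable+xml, */*;q=0.1"})

print(request.get_method())          # GET, since no body is attached
print(request.get_header("Accept"))
```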

Votes

name	vote	comment
DaveMorris	-1	needs justification

Parameterized version as alternate to XML representation

Needs a detailed use case to define what this means and why we need it. I would like to avoid VOSpace falling into the same trap that TAP seems to be in, trying to support two conflicting models of how a web service should work.

-- DaveMorris - 25 Aug 2009

Instead of HTTP POSTing a transfer representation to the space (http://nvo.caltech.edu/vospace2.0/transfers):

<transfer xmlns="http://www.ivoa.net/xml/VOSpaceTypes-v2.0">
  <target>vos://nvo.caltech.edu!vospace2.0/mydata/table3</target>
  <direction>pushToVoSpace</direction>
  <view uri="ivo://ivoa.net/vospace/core#votable"/>
  <protocol uri="ivo://ivoa.net/vospace/core#http-put"/>
</transfer>

we can HTTP POST a parameterized representation:

http://nvo.caltech.edu/vospace2.0/transfers?target=vos://nvo.caltech.edu!vospace2.0/mydata/table3&direction=pushToVoSpace&view=ivo://ivoa.net/vospace/core#votable&protocol=ivo://ivoa.net/vospace/core#http-put

This would return the URI of the transfer resource giving operational details of the transfer such as the endpoint.

-- MatthewGraham - 05 Oct 2009

This is a correct use of POST to create a child resource as per the original HTTP spec. In an actual request, the URIs in the parameters would need to be encoded, I think.

-- GuyRixon - 05 Oct 2009
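
The encoding point can be checked with a short sketch. It takes the parameter names and values from the transfer example above and percent-encodes them with the Python standard library; whether the parameters travel in the query string or in a form-encoded POST body is left open here.

```python
from urllib.parse import urlencode, parse_qs

# Parameter names and values mirror the transfer example above.
params = {
    "target": "vos://nvo.caltech.edu!vospace2.0/mydata/table3",
    "direction": "pushToVoSpace",
    "view": "ivo://ivoa.net/vospace/core#votable",
    "protocol": "ivo://ivoa.net/vospace/core#http-put",
}

# urlencode percent-encodes reserved characters such as ':', '/', '#' and '!'.
body = urlencode(params)
print(body)

# Decoding the body recovers the original URIs unchanged.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
```

Without encoding, the '#' in the view and protocol URIs would be read as the start of a URL fragment, so everything after it would never reach the server.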

Votes

name	vote	comment
DaveMorris	-1	needs justification
GuyRixon		I think (still trying to mentally track down the implications).

UWS for transfers (how to cancel a transfer)

We need something that represents the state of a transfer, enabling the user to cancel a long running transfer.

Using the existing IVOA UWS system would make sense, if it can do everything we need. However modelling the nested structure of a VOSpace transfer may need changes to the UWS specification.

So there are three separate questions:

We need to represent the state of a transfer
We need a webservice API for modifying (cancelling) all or part of a transfer
Is the existing UWS specification the best way of doing this?

-- DaveMorris - 25 Aug 2009

Under UWS, the elements in the transfer resource representation become parameters in the UWS Job resource representation. A transfer job could thus be created by a parameterized HTTP POST to the transfers endpoint (http://nvo.caltech.edu/vospace2.0/transfers).

This would return the jobid and it could then be initiated with a POST of a single parameter PHASE=RUN to the endpoint:

http://nvo.caltech.edu/vospace2.0/transfers/{jobid}/phase

A transfer could be aborted at any time with a POST of a single parameter PHASE=ABORT to the same endpoint.

-- MatthewGraham - 06 Oct 2009
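
A minimal sketch of the two phase changes described above. The helper name phase_request and the job id "3112" are made up for the example; the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def phase_request(transfers_url, job_id, phase):
    """Build (but do not send) a POST that sets the phase of a transfer job."""
    data = urlencode({"PHASE": phase}).encode("ascii")
    return Request(f"{transfers_url}/{job_id}/phase", data=data, method="POST")


transfers = "http://nvo.caltech.edu/vospace2.0/transfers"
run = phase_request(transfers, "3112", "RUN")      # "3112" is a made-up job id
abort = phase_request(transfers, "3112", "ABORT")

print(run.get_method(), run.full_url, run.data)
print(abort.get_method(), abort.full_url, abort.data)
```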

Votes

name	vote	comment
DaveMorris	+1	need to control long running transfers
GuyRixon		I would add the ability to run the transfer from the original post by adding PHASE=RUN there.

Extra information requested

As part of the information returned in the resource representation of a container, the amount of space available within that container should be returned.

-- MatthewGraham - 06 Oct 2009

Matters arising from WD-2.0-20090513

The following comments refer to sections in the initial working draft of the version 2.0 specification.

Section 3.3 Service capabilities

Should we consider refactoring the capabilities elements to use a similar structure to the capability elements in the registry schema?

At the moment, we just have a single URI for each capability, which means that if we want to distinguish between different versions, we need to add a version number to the URI. This results in vospace having to define a separate parallel set of URIs for all the IVOA services that a vospace may want to provide. The registry schema uses a single URI for all versions of a capability, with the different versions identified in the interface element. I'm not sure what changes to our XML schema would be required, but I think we should at least consider the possibility of re-using the capability and version URIs used by the registry.

Votes

name	vote	comment
DaveMorris	+1	proposer

Section 3.6.1 service initiated transfers

Do we need to have a mechanism for canceling long running or stalled transfers?

Votes

name	vote	comment
DaveMorris	+1	proposer

Section 3.6.2 Client initiated transfers

I would like to add the option for the client to send an empty list of protocols, in which case the service would reply with the set of protocols that it is prepared to offer. This would make it easier to initiate a transfer from one service to another.

In the existing system, to transfer data from service A to service B, the client has to query service B to get a list of protocols it supports, pass the list of protocols to service A, which replies with the subset of protocols that it is prepared to offer, which the client then passes back to service B to initiate the transfer.

If service A responded to an empty protocol list by filling in the list of protocols it is prepared to offer, then this eliminates the need for the client to query service B to start the process.

The client would just send an empty protocol list to service A, collect the offers and pass them to service B, which then initiates the transfer.
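
The proposed behaviour can be sketched as a small function. The name offer_protocols, the two protocol lists and the '#http-get' URI are hypothetical; the only rule taken from the proposal is that an empty request is answered with everything the service is prepared to offer.

```python
def offer_protocols(requested, supported):
    """Return the protocols a service is prepared to offer.

    An empty `requested` list is read as "tell me what you support";
    otherwise only requested protocols the service supports are offered.
    """
    if not requested:
        return list(supported)
    return [uri for uri in requested if uri in supported]


# Hypothetical capabilities of two services.
service_a = [
    "ivo://ivoa.net/vospace/core#http-get",
    "ivo://ivoa.net/vospace/core#http-put",
]
service_b = ["ivo://ivoa.net/vospace/core#http-get"]

offers = offer_protocols([], service_a)          # client sends an empty list to A
accepted = offer_protocols(offers, service_b)    # client passes A's offers to B
print(accepted)
```

With this rule the client needs two round trips (A, then B) instead of the three needed in the existing system (B, then A, then B again).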

Votes

name	vote	comment
DaveMorris	+1	proposer

Section 5.2.2 Delete

The current specification uses a HTTP DELETE command sent to the REST endpoint for a node. This is the standard scheme for a REST service, however it may cause problems for some implementations.

Firstly, it requires a HTTP client that supports the DELETE command, which may be a problem for some implementations. This is the only part of the specification that cannot be driven using the standard HTTP GET, POST and PUT operations.

It may also cause problems for implementations that treat a delete as a recursive operation, either to free physical resources or to check access permissions. In that case, the time taken for a delete operation to complete will depend on the number of child nodes in the tree, and the operation may hit socket timeouts for a large and complex tree.

An alternative to using the HTTP DELETE command is to model a delete as an internal move to 'null' or 'trash'. This would mean the client would post a transfer requesting a move to 'null', making the delete operation asynchronous.

Note - these two do not need to be exclusive. The specification could support both the HTTP DELETE command as currently defined, and a move operation that accepts 'null' as the destination.

Votes

name	vote	comment
DaveMorris	0	not sure if this is needed

Section 3.8 REST bindings

The current specification mixes nodes and transfers in the same REST binding.

If the base URL for a service node tree is

    http://hostname:port/service/nodes

then this URL points to a container node

    http://hostname:port/service/nodes/path

this URL points to a file node

    http://hostname:port/service/nodes/path/file

and this URL points to the REST interface for initiating a transfer for the file

    http://hostname:port/service/nodes/path/file/transfers

and this URL points to the state of a transfer for the file

    http://hostname:port/service/nodes/path/file/transfers/3112

Issues

This means the name 'transfers' becomes a reserved word, and we cannot have a file or container with the same name.

It should be fairly easy to add code to the services that prevents the user creating a file or container with a reserved name. However, it becomes more tricky if the service supports uploading and unpacking tar or zip archives. It means the user has to check the contents of the archive to make sure it does not contain any conflicting names before they upload it.
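
To show what that check would involve, here is a minimal sketch for a zip archive. The function name conflicting_names and the in-memory test archive are invented for the example; a tar archive would need the tarfile module instead.

```python
import io
import zipfile

RESERVED = {"transfers"}  # reserved path segment under the current binding


def conflicting_names(archive_bytes, reserved=RESERVED):
    """Return archive members with a path segment matching a reserved name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if any(part in reserved for part in name.split("/"))
        ]


# Build a small in-memory archive to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/table3.xml", "<VOTABLE/>")
    archive.writestr("data/transfers/log.txt", "example")

conflicts = conflicting_names(buffer.getvalue())
print(conflicts)
```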

One option would be for the service to rename conflicting files or directories when the archive is unpacked, but that leaves the question of how to tell the user that this has happened. If the user transfers an archive file into a vospace service, then the expectation is that the resulting vospace nodes will have the same names as the original archive.

There is also a potential conflict if a service provides both a vospace 1.1 and 2.0 interface, because the vospace 1.1 interface would allow the user to create a file called 'transfers' that would not be accessible from the vospace 2.0 service.

These problems could be mitigated by choosing a less common name, or by adding special characters to the reserved name, e.g. '--transfers'. However, this may end up making things worse: 99.99% of the time it would just work, but it would fail unexpectedly when the user accidentally uses a name that matches the reserved word (users never read the manual, especially the small print).

Proposal

I would like to suggest a way of avoiding this by handling nodes and transfers with separate REST endpoints.

Suppose we have two REST endpoints, one for nodes

    http://hostname:port/service/nodes

and a separate one for transfers

    http://hostname:port/service/transfers

To initiate a transfer, the client posts a transfer object to the transfers endpoint and the server responds with a redirect to the state of the new transfer.

    http://hostname:port/service/transfers/3112

This separates the two object types: the node endpoint handles nodes and the transfers endpoint handles transfers.

Issues

Two problems with this.

First, if the nodes and transfers are handled by separate endpoints, then there is no longer an implicit link between a transfer and the node it applies to. So each transfer object would need to contain the URI of the target node, which is what happens in the existing v1.1 SOAP service.

    [transfer url="http://hostname:port/service/transfers/3112"]
        [node uri="vos://service/path/file"/]
        ....
    [/transfer]

Second is how the client discovers where the transfer endpoint actually is. This can be handled as a capability element in the service registration and/or in the node details.

If the client uses the registry to resolve the service endpoint(s) based on a vos://... URI, then the service registration could list the two endpoints as separate capabilities.

    [capability standardID="ivo://vospace-nodes"]
        [interface ...]
            [accessURL ...]
                http://hostname:port/service/nodes
            [/accessURL]
        [/interface]
    [/capability]

    [capability standardID="ivo://vospace-transfers"]
        [interface ...]
            [accessURL ...]
                http://hostname:port/service/transfers
            [/accessURL]
        [/interface]
    [/capability]
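
A client-side lookup against such a registration might look like the sketch below. The XML is a simplified, hypothetical rendering of the bracketed example above (the elided attributes are dropped and a made-up root element is added), and the standardID values are the placeholder URIs from the example, not final identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical registration fragment based on the example above.
REGISTRATION = """
<resource>
  <capability standardID="ivo://vospace-nodes">
    <interface><accessURL>http://hostname:port/service/nodes</accessURL></interface>
  </capability>
  <capability standardID="ivo://vospace-transfers">
    <interface><accessURL>http://hostname:port/service/transfers</accessURL></interface>
  </capability>
</resource>
"""


def endpoint_for(registration_xml, standard_id):
    """Return the accessURL of the capability with the given standardID, or None."""
    root = ET.fromstring(registration_xml)
    for capability in root.iter("capability"):
        if capability.get("standardID") == standard_id:
            url = capability.findtext("interface/accessURL")
            return url.strip() if url else None
    return None


transfers_url = endpoint_for(REGISTRATION, "ivo://vospace-transfers")
print(transfers_url)
```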

Alternatively, given access to the node interface, a request for the node details could list the transfer endpoint as a capability for that node.

    [node uri=""]
        ....
        [capabilities]
            [capability uri="ivo://vospace-transfers"]
                [endpoint]
                    http://hostname:port/service/transfers
                [/endpoint]
            [/capability]
        [/capabilities]
    [/node]

Note that the examples above use 'service/nodes' and 'service/transfers' in the endpoint URLs as examples only. These would not be part of the specification, and would not be reserved words. The only fixed names in this scheme would be the URIs for the service capabilities (URIs used in the examples are just examples, final URIs would be full standardID URIs).

Votes

name	vote	comment
DaveMorris	+1	proposer
