VOSpace 1.1 specification
This contains details of the proposed specification for VOSpace 1.1.
Abstract
VOSpace is the IVOA interface to distributed storage. This version extends the existing VOSpace 1.0 specification to support containers, links between individual VOSpace instances, third party APIs, and a find mechanism.
Introduction
VOSpace is the IVOA interface to distributed storage. VOSpace 1.0 [REF] defined a flat, unconnected data space. VOSpace 1.1 builds on top of this and introduces the following new functionality:
- containers - this allows the grouping of data in a hierarchical fashion
- links - this allows the federation of distinct VOSpace services
- third party APIs - this allows data objects and collections to be exposed through other interfaces
- find - this offers a more extensive search capability than is provided by list with wildcard support
Document roadmap
The rest of this document is structured as follows:
[TO BE DONE FOR PAPER COPY]
VOSpace data model
VOSpace 1.1 extends the VOSpace 1.0 data model by introducing two new node types: ContainerNode and LinkNode. A new element, capabilities, is also added to the DataNode node type.
[DISCUSSION POINT: Should capabilities only exist on the ContainerNode?]
[NEW DIAGRAM]
ContainerNode describes a data item that can contain other data items. These can be of any type including other ContainerNodes. A ContainerNode has no data bytes associated with it directly but only with its contents - in a tree representation, a ContainerNode is a branch whereas data objects are leaves.
ContainerNode extends DataNode and so has the following elements:
- uri: the vos:// identifier for the node, URI-encoded according to RFC2396 [REF].
- properties: a set of metadata properties for the node.
- accepts: a list of the views (data formats) that the node can accept.
- provides: a list of the views (data formats) that the node can provide.
- busy: a boolean flag to indicate that the data associated with the node or its children cannot be accessed.
[DISCUSSION POINT: capabilities should be in this list as well since we are adding to the standard definition of DataNode.]
The busy flag is used to indicate that an internal operation, such as the service implementation unpacking an archive format, is in progress and so none of the node data are available.
Container identifiers
Slashes in the URI path imply a hierarchical arrangement of data: the data object identified by vos://nvo.caltech!vospace/tables/myTable1 is within the container identified by vos://nvo.caltech!vospace/tables. In fact, all ancestors in the hierarchy will be resolvable containers back to the root node of the space. This precludes any system of implied hierarchy in the naming scheme for nodes with ancestors that are just logical entities and cannot be reified, e.g. the Amazon S3 system.
[DISCUSSION POINT: The root node for a VOSpace must be represented by a ContainerNode]
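As an illustration only (not part of the specification), the ancestor rule above can be sketched in Python. The function name ancestor_containers is hypothetical, and writing the root container as the bare authority (vos://nvo.caltech!vospace) is an assumption, since the representation of the root node is still a discussion point.

```python
def ancestor_containers(node_uri: str) -> list[str]:
    """Return the URI of every container above node_uri, root first."""
    prefix = "vos://"
    if not node_uri.startswith(prefix):
        raise ValueError(f"not a vos:// identifier: {node_uri}")
    authority, _, path = node_uri[len(prefix):].partition("/")
    segments = [s for s in path.split("/") if s]
    if not segments:
        # The root node itself has no ancestors.
        return []
    # Assumption: the root container is written as the bare authority.
    root = prefix + authority
    ancestors = [root]
    # Every proper prefix of the path names a container that must resolve.
    for depth in range(1, len(segments)):
        ancestors.append(root + "/" + "/".join(segments[:depth]))
    return ancestors
```

A service could use such a list to verify that each ancestor exists as a ContainerNode before accepting a new node.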
Inheritable properties
Properties on a ContainerNode may be designated as inheritable and will propagate to children nodes of the container if they are specified in the accepts or provides list for this node.
[DISCUSSION POINT: If a property is also declared on a child, which value takes priority? How are properties registered as inheritable?]
Container views
For VOSpace 1.1, a view is the data representation (format) of the file that is transferred. If the view is an archive format (tar, zip, etc.) then the space will provide access to the archive contents as children nodes of the container. Whether or not the space actually unpacks the archive is implementation dependent but the service will behave as though it has done so. For example, a client wishes to upload a tar file containing several images to a VOSpace service. If the client associates it with (uploads it to) a Structured/UnstructuredDataNode then it will be treated as a blob and its contents will not be available. However, if the client uses a ContainerNode with an accepts view of "tar" then the image files within the tar file will be represented as children nodes of the ContainerNode and accessible like any other data object within the space.
[DISCUSSION POINT: What are the names of the children nodes? Are these Structured/UnstructuredDataNodes? What is the default? How is this set?]
[DISCUSSION POINT: How does the service identify what it considers to be archive formats?]
If a provides view is an archive format (tar, zip, etc.) then the space will package the container and all its children nodes in the specified format.
LinkNode describes a node that points to another node. The target can be of any type, including another LinkNode. A LinkNode has no data bytes associated with it.
LinkNode extends Node and so has the following elements associated with it:
- uri: the vos:// identifier for the node, URI-encoded according to RFC2396 [REF]
- properties: a set of metadata properties for the node. The properties do not propagate to the target of the LinkNode. One use case is to enable third-party annotations to be associated with a data object but without the data object itself getting cluttered with unnecessary metadata. In this case, the client creates a LinkNode pointing to the data object in question and then adds the annotations as properties of the LinkNode.
- target: the identifier, URI-encoded according to RFC2396, for the data object to which the LinkNode points.
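The third-party annotation use case above can be sketched as follows. This is a minimal illustration, not a normative data model: the Python class layout, the annotate helper and the property URIs used in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uri: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class LinkNode(Node):
    # Identifier of the data object to which the link points.
    target: str = ""


def annotate(target: Node, link_uri: str, annotations: dict[str, str]) -> LinkNode:
    """Attach annotations to a new LinkNode instead of to the target itself."""
    # The annotations live only on the link; the target's properties are untouched.
    return LinkNode(uri=link_uri, properties=dict(annotations), target=target.uri)
```

For example, annotate(image, "vos://service/notes/m31", {"urn:example:comment": "check WCS"}) would yield a LinkNode carrying the comment while the image node keeps its original property set.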
Capabilities
A Capability is a third-party interface to a data object. It enables data access using other non-VOSpace methods.
A Capability has the following members:
- uri: the Capability identifier
- endpoint: the endpoint URL to use for the third-party interface
[DISCUSSION POINT: Should there be any more members to a Capability, e.g. param to specify additional arguments that might be required for access?]
Example use cases
A ContainerNode contains image files and has a DAL SIAP capability so that the images in the container can also be accessed using a SIAP service. In this way, a user could create a container in VOSpace, drop some images into it and then query the set of images using the SIAP interface.
Another example is a DataNode with an iRODS capability so that the data replication for this data object can be handled using the iRODS service API located at the specified endpoint.
Capability identifiers
Every new type of Capability requires a unique URI to identify the Capability.
The rules for the Capability identifiers are similar to the rules for namespace URIs in XML schema. The only restriction is that it must be a valid (unique) URI.
- An XML schema namespace identifier can be just a simple URN, e.g. urn:my-namespace
- Within the IVOA, the convention for namespace identifiers is to use an HTTP URL pointing to the namespace schema, or a resource describing it.
The current VOSpace schema defines Capability identifiers as anyURI [TBD]. The only restriction is that it must be a valid (unique) URI.
- A Capability URI can be a simple URN, e.g. urn:my-capability
This may be sufficient for testing and development on a private system, but it is not scalable for use on a public service.
For a production system, any new Capabilities should have unique URIs that can be resolved into a description of the Capability.
Ideally, these should be IVO registry URIs that point to a description registered in the IVO registry:
- ivo://my-registry/vospace/capabilities#my-capability
Using an IVO registry URI to identify Capabilities has two main advantages:
- IVO registry URIs are by their nature unique, which makes it easy to ensure that different teams do not accidentally use the same URI
- If the IVO registry URI points to a description registered in the IVO registry, this provides a mechanism to discover how to use the Capability.
Capability descriptions
If the URI for a particular Capability is resolvable, i.e. an IVO registry identifier or an HTTP URL, then it should point to an XML resource that describes the Capability.
A CapabilityDescription should describe the third-party interface and how it should be used in this context.
A CapabilityDescription should have the following members:
- uri: the formal URI of the Capability
- DisplayName: a simple display name of the Capability.
- Description: a text block describing the third-party interface and how it should be used in this context.
Note that at the time of writing, the schema for registering CapabilityDescriptions in the IVO registry has not been finalized.
UI display name
If a client is unable to resolve a Capability identifier into a description then it may just display the identifier as a text string:
- Access data using urn:edu.sdsc.irods
If a client can resolve the Capability identifier into a description then the client may use the information in the description to display a human readable name and description of the Capability.
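The display fallback described in this section can be sketched as below. This is an illustration under stated assumptions: the Python field names mirror the CapabilityDescription members listed earlier, the display_label function is hypothetical, and how a client actually resolves an identifier into a description is left out because the registry schema is not yet finalized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapabilityDescription:
    uri: str           # formal URI of the Capability
    display_name: str  # simple display name
    description: str   # text describing the third-party interface


def display_label(capability_uri: str,
                  description: Optional[CapabilityDescription]) -> str:
    """Build the text a client shows for a Capability."""
    # Fall back to the raw identifier when no description could be resolved.
    name = description.display_name if description else capability_uri
    return f"Access data using {name}"
```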
Standard capabilities
The VOSpace team intend to register Capability URIs and CapabilityDescriptions for the core set of Capabilities, e.g.
- Cone Search
- SIAP
- SSAP
- TAP
However, this is not intended to be a closed list and different implementations are free to define and use their own Capabilities.
Web service operations
A VOSpace 1.1 service shall be a SOAP service with the following operations:
Service metadata
getProtocols
This is unchanged from VOSpace 1.0 (Sec 5.1.1).
getViews
This is unchanged from VOSpace 1.0 (Sec 5.1.2).
getProperties
This is unchanged from VOSpace 1.0 (Sec 5.1.3).
[DISCUSSION POINT: Is this true - do we want to denote inheritable properties in some fashion?]
getCapabilities
[DISCUSSION POINT: Do we want this operation?]
Creating and manipulating data nodes
createNode
Create a new node at a specified location.
Parameters
This is the same as VOSpace 1.0 (Sec 5.2.1.1) except that:
- the permitted values of xsi:type are:
  - vos:Node
  - vos:DataNode
  - vos:UnstructuredDataNode
  - vos:StructuredDataNode
  - vos:ContainerNode
  - vos:LinkNode
- .auto replaces vos://null as the reserved URI to indicate an auto-generated URI for the destination, i.e. vos://service/path/.auto will cause a new unique URI for the node within vos://service/path to be generated.
The capabilities list for the Node cannot be set using this method.
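One way a service might honour the .auto convention described above is sketched here. The specification does not say how the unique name is chosen; using a random UUID is purely an assumption, as are the function name resolve_auto_uri and the idea of passing in the set of existing URIs.

```python
import uuid

AUTO_SEGMENT = ".auto"


def resolve_auto_uri(requested_uri: str, existing: set[str]) -> str:
    """Replace a trailing .auto segment with a new unique node name."""
    parent, _, leaf = requested_uri.rpartition("/")
    if leaf != AUTO_SEGMENT:
        # An ordinary URI is used exactly as requested.
        return requested_uri
    while True:
        # Assumption: a random hex UUID is an acceptable generated name.
        candidate = f"{parent}/{uuid.uuid4().hex}"
        if candidate not in existing:
            return candidate
```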
Returns
This is the same as VOSpace 1.0 (Sec 5.2.1.2) except that:
- the capabilities list for the Node may not be filled in until some data has been imported into the Node.
Faults
This is the same as VOSpace 1.0 (Sec 5.2.1.3) except that:
- The service shall throw a LinkFound exception if the parent path includes a link.
- The service shall throw a LinkFound exception if the parent node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes
[DISCUSSION POINT: Do we need both a LinkFound and a ContainerNotFound exception or does the latter work for both cases?]
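The parent-path faults listed above can be sketched as a single walk from the root towards the new node. This is an illustration only: the Python exception classes stand in for the SOAP faults, the node_types mapping (URI to node type name) is a hypothetical stand-in for the service's own store, and the root is assumed to be a ContainerNode.

```python
class LinkFound(Exception):
    """Raised with the URI of the LinkNode found in the path."""


class ContainerNotFound(Exception):
    """Raised with the URI of the missing ContainerNode."""


def check_parent_path(node_uri: str, node_types: dict[str, str]) -> None:
    """Check every ancestor of node_uri below the root."""
    prefix = "vos://"
    if not node_uri.startswith(prefix):
        raise ValueError(f"not a vos:// identifier: {node_uri}")
    authority, _, path = node_uri[len(prefix):].partition("/")
    segments = [s for s in path.split("/") if s]
    current = prefix + authority
    # The final segment is the node itself, so only its ancestors are checked.
    for segment in segments[:-1]:
        current = f"{current}/{segment}"
        kind = node_types.get(current)
        if kind == "LinkNode":
            raise LinkFound(current)
        if kind != "ContainerNode":
            raise ContainerNotFound(current)
```

Checking for a link before checking for a container is a design choice of this sketch; if the discussion point above is resolved in favour of a single fault, the two branches collapse into one.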
deleteNode
Delete a node.
When the target is a ContainerNode, all its children (the contents of the container) will also be deleted.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.2.2.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.2.2.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.2.2.3) except that:
- The service shall throw a LinkFound exception if the parent path includes a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes.
listNodes
List nodes in a space.
When a target URI is a ContainerNode, only direct (first generation) children of the node will be listed.
Parameters
This is the same as VOSpace 1.0 (Sec 5.2.3.1) except that:
- Wild cards can only be used in the final part of the URI path: for example, a/b/c/*.txt is allowed but a/*/c/*.txt is not.
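The wildcard restriction above can be checked with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, paths are treated as plain strings, and fnmatch is used for the matching even though it also interprets ? and [...] patterns, which the text above does not mention.

```python
import fnmatch


def wildcard_allowed(path: str) -> bool:
    """True if '*' appears only in the final segment of the path."""
    segments = path.split("/")
    return all("*" not in segment for segment in segments[:-1])


def match_children(pattern: str, node_paths: list[str]) -> list[str]:
    """Return the direct children of the pattern's parent that match it."""
    if not wildcard_allowed(pattern):
        raise ValueError(f"wildcard outside final path segment: {pattern}")
    parent, _, leaf_pattern = pattern.rpartition("/")
    matches = []
    for path in node_paths:
        node_parent, _, leaf = path.rpartition("/")
        # Only first-generation children of the parent are considered.
        if node_parent == parent and fnmatch.fnmatchcase(leaf, leaf_pattern):
            matches.append(path)
    return matches
```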
Returns
This is unchanged from VOSpace 1.0 (Sec 5.2.3.2).
Faults
This is unchanged from VOSpace 1.0 (Sec 5.2.3.3).
findNodes
Find nodes whose properties match the specified values.
Parameters
- token: An optional continuation token from a previous request
  - No token indicates a request for a new find operation.
  - The server may impose a limited lifetime on the continuation token. If a token has expired, the server will throw an exception, and the client will have to make a new request.
- limit: An optional limit indicating the maximum number of nodes in the response
  - No limit indicates a request for an unpaged response. However the server may still impose its own limit on the size of an individual response, splitting the results into more than one page if required.
- detail: The level of detail in the returned response
  - min: The response contains the minimum detail for each Node with all optional parts removed - the node type should be returned
    - e.g. <node uri="vos://service/name" xsi:type="Node"/>
  - max: The response contains the maximum detail for each Node, including any xsi:type specific extensions
  - properties: The response contains a basic node element with a list of properties for each Node with no xsi:type specific extensions.
- matches: A list of match elements identifying the properties and values to match against and whether these should be applied in conjunction (and) or disjunction (or).
The match element has a uri attribute to identify the property to which it is applying. The regular expression against which the property values are to be matched is then specified as the value of the match element:
<match uri="..."> regex </match>
The match elements can be combined in conjunction and/or disjunction by specifying them as subelements of <and> and <or> respectively. For example, the predicate "(property1 and property2) or property3" would be specified as:
<or>
  <and>
    <match uri="property1"> regex </match>
    <match uri="property2"> regex </match>
  </and>
  <match uri="property3"> regex </match>
</or>
[DISCUSSION POINT: Are wildcards allowed in the property URIs - find me all nodes where any property matches this regular expression? ]
An empty list of <matches> implies a full listing of the space.
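To make the match semantics concrete, the following sketch evaluates such a predicate against the properties of a single node. Several points are assumptions rather than anything stated above: element names are taken without XML namespaces, surrounding whitespace in the regex is stripped, a partial match (re.search) counts as a match, and a node lacking the property simply does not match.

```python
import re
import xml.etree.ElementTree as ET


def evaluate(element: ET.Element, properties: dict[str, str]) -> bool:
    """Evaluate a <match>, <and> or <or> element against one node's properties."""
    if element.tag == "match":
        value = properties.get(element.get("uri"))
        pattern = (element.text or "").strip()
        # Assumption: a node without the property does not match.
        return value is not None and re.search(pattern, value) is not None
    results = [evaluate(child, properties) for child in element]
    if element.tag == "and":
        return all(results)
    if element.tag == "or":
        return any(results)
    raise ValueError(f"unexpected element: {element.tag}")


# "(property1 and property2) or property3" with illustrative regexes.
predicate = ET.fromstring(
    '<or><and>'
    '<match uri="property1"> ^a </match>'
    '<match uri="property2"> b$ </match>'
    '</and>'
    '<match uri="property3"> c </match></or>'
)
```

A findNodes implementation could apply evaluate(predicate, node_properties) to each candidate node and return those for which it is true.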
Returns
- token: An optional continuation token, indicating that the response is incomplete
  - The client may use this token to request the next block of Nodes in the sequence
  - No token indicates that the list is complete.
- limit: An optional limit which must be present if a limit parameter was used in the request
  - If present, the value is the value from the original request and not any limit imposed by the service
- nodes: A list of the Nodes matching the requested properties
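From the client side, the token and limit parameters imply a simple paging loop, sketched below. The find_page callable is a hypothetical stand-in for the SOAP call: it is assumed to take (token, limit) and return the list of node URIs together with the next continuation token, or None when the list is complete. The fake_find_page function exists only to make the sketch runnable.

```python
from typing import Callable, Iterator, Optional

Page = tuple[list[str], Optional[str]]


def iterate_nodes(find_page: Callable[[Optional[str], Optional[int]], Page],
                  limit: Optional[int] = None) -> Iterator[str]:
    """Yield every node from a paged findNodes-style operation."""
    token: Optional[str] = None  # no token requests a new find operation
    while True:
        nodes, token = find_page(token, limit)
        yield from nodes
        if token is None:
            # No token in the response means the list is complete.
            return


def fake_find_page(token: Optional[str], limit: Optional[int]) -> Page:
    """Illustrative in-memory service holding five nodes."""
    data = [f"vos://service/n{i}" for i in range(1, 6)]
    start = int(token) if token else 0
    end = start + (limit or len(data))
    next_token = str(end) if end < len(data) else None
    return data[start:end], next_token
```

A real client would also need to handle the InvalidToken fault by starting a new find operation, since the server may expire tokens.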
Faults
- The service shall throw an InternalFault exception if the operation fails
- The service shall throw a PermissionDenied exception if the user does not have permissions to perform the operation
- The service shall throw a PropertyNotFound exception if a particular property is specified and does not exist in the space
- This does not apply if wildcards are allowed in the property URIs
- The service shall throw an InvalidToken exception if it does not recognize the continuation token
- The service shall throw an InvalidToken exception if the continuation token has expired
moveNode
Move a node within a VOSpace service.
When the source is a ContainerNode, all its children (the contents of the container) will also be moved to the new destination.
When the destination is an existing ContainerNode, the source will be placed under it (i.e. within the container).
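The two rules above can be sketched over a flat mapping of node URIs to node types. This is only an illustration: the mapping is a hypothetical stand-in for the service's store, the moved node is assumed to keep its final path segment as its name when placed inside an existing container, and the link and parent-path checks are omitted.

```python
class DuplicateNode(Exception):
    """Raised with the URI at which a Node already exists."""


def move_node(nodes: dict[str, str], source: str, destination: str) -> dict[str, str]:
    """Return a new URI-to-type mapping with source and its descendants moved."""
    if nodes.get(destination) == "ContainerNode":
        # Existing container: place the source under it, keeping its leaf name.
        destination = f"{destination}/{source.rpartition('/')[2]}"
    if destination in nodes:
        raise DuplicateNode(destination)
    moved = {}
    for uri, kind in nodes.items():
        if uri == source or uri.startswith(source + "/"):
            # Rewrite the source prefix so that all children move as well.
            moved[destination + uri[len(source):]] = kind
        else:
            moved[uri] = kind
    return moved
```

The same prefix rewrite, applied to a copy of the matching entries rather than in place of them, would give the deep recursive behaviour described for copyNode.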
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.2.4.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.2.4.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.2.4.3) except that:
- The service shall throw a DuplicateNode exception if a Node already exists at the destination unless it is a ContainerNode.
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a LinkFound exception if the parent path includes a link.
- The service shall throw a LinkFound exception if the parent node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes
- The service shall throw an InvalidArgument exception if the source is a ContainerNode and the destination is not.
- The service shall throw a ContainerNotFound exception if the target node is a ContainerNode and does not exist.
copyNode
Copy a node within a VOSpace service.
When the source is a ContainerNode, all its children (the full contents of the container) get copied, i.e. this is a deep recursive copy.
When the destination is an existing ContainerNode, the copy will be placed under it (i.e. within the container).
Parameters
This is the same as VOSpace 1.0 (Sec 5.2.5.1) except that:
- .auto replaces vos://null as the reserved URI to indicate an auto-generated URI for the destination, i.e. vos://service/path/.auto will cause a new unique URI for the node within vos://service/path to be generated.
Returns
This is unchanged from VOSpace 1.0 (Sec 5.2.5.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.2.5.3) except that:
- The service shall throw a DuplicateNode exception if a Node already exists at the destination unless it is a ContainerNode.
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a LinkFound exception if the parent path includes a link.
- The service shall throw a LinkFound exception if the parent node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes
- The service shall throw an InvalidArgument exception if the source is a ContainerNode and the destination is not.
- The service shall throw a ContainerNotFound exception if the target node is a ContainerNode and does not exist.
Accessing metadata
getNode
Get the details for a specific Node.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.3.1.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.3.1.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.3.1.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
setNode
Set the property values for a specific node.
Changes to inheritable properties on ContainerNodes will propagate to children nodes of the container where applicable.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.3.2.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.3.2.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.3.2.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
Transferring data
pushToVoSpace
Request a list of URLs to send data to a VOSpace node.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.4.1.1) except that:
[DISCUSSION POINT: If a Node already exists at the target URI and it is a ContainerNode, should it be overwritten by the target Node or should the target Node become a child of the ContainerNode? This also applies to pullToVoSpace.]
Returns
This is unchanged from VOSpace 1.0 (Sec 5.4.1.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.4.1.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes.
pullToVoSpace
Import data into a VOSpace node.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.4.2.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.4.2.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.4.2.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes.
pullFromVoSpace
Request a set of URLs that the client can read data from.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.4.3.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.4.3.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.4.3.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes.
pushFromVoSpace
Ask the server to send data to a remote location.
Parameters
This is unchanged from VOSpace 1.0 (Sec 5.4.4.1).
Returns
This is unchanged from VOSpace 1.0 (Sec 5.4.4.2).
Faults
This is the same as VOSpace 1.0 (Sec 5.4.4.3) except that:
- The service shall throw a LinkFound exception if the target path includes a link.
- The service shall throw a LinkFound exception if the target node is a link.
- The service shall throw a ContainerNotFound exception if the parent path is not composed solely of ContainerNodes.
Fault arguments
This is the same as VOSpace 1.0 (Sec 5.5) with the addition of:
- ContainerNotFound: this is thrown with the URI of the missing ContainerNode.
- LinkFound: this is thrown with the URI of the found LinkNode.
References
[VOSpace] Matthew Graham, Paul Harrison, Dave Morris, Guy Rixon, VOSpace service specification v1.02, IVOA Recommendation 2007 October 01, http://www.ivoa.net/Documents/latest/VOSpace.html
--
MatthewGraham - 07 Jan 2008