International Virtual Observatory Alliance

Difference: VOSpace11Spec (6 vs. 7)

Revision 7, 2007-09-23 - DaveMorris

Discussion of the VOSpace 1.1 specification

This is a discussion page for the VOSpace-1.1 service specification.

Version 1.1 aims to extend the VOSpace-1.0 specification to include links and containers. The proposed mechanism is that we introduce two new node types:

LinkNode - this is like a Node but also has the URI that the link points to
ContainerNode - this cannot hold any data (no bytes) but can have child nodes (of any type) and views for container level formatting (aggregate zip/gzip).
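
As a rough illustration of the proposed data model, the two new node types could be sketched as below. This is only a sketch: the class and field names (target, children, views) and all of the example URIs are assumptions made for illustration, and none of them come from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Base node, identified by its URI."""
    uri: str


@dataclass
class LinkNode(Node):
    """A Node that also carries the URI the link points to."""
    target: str = ""  # hypothetical field name


@dataclass
class ContainerNode(Node):
    """Holds no bytes itself, but may have children of any node type
    and views for container level formatting (e.g. aggregate zip/gzip)."""
    children: List[Node] = field(default_factory=list)
    views: List[str] = field(default_factory=list)  # view URIs


# All URIs below are made up for illustration.
container = ContainerNode(
    uri="vos://example/results",
    views=["ivo://example/views/zip"],
)
container.children.append(Node(uri="vos://example/results/table1"))
container.children.append(
    LinkNode(uri="vos://example/results/latest",
             target="vos://example/results/table1")
)
```

Keeping children as a list of the base type reflects the "of any type" wording: a container can hold plain nodes, links, or other containers.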

At the May 2007 interop, we identified the following items as requiring resolution for VOSpace 1.1:

Container level metadata - how to distinguish metadata that relates to the contents of the container through inheritance from metadata that relates to the container itself
Generated names - vos://null does not work for containers, so use ".auto" or "/" as an alternative
Typing of protocol and view parameters - these are currently designated as "string" but normal parameters are "URI"
ACL - although this forms part of a wider SSO context, should VOSpace have some notion of ACL control?
Find - an equivalent to the Unix find command is desired

This page is somewhere we can post proposals and where interested parties can discuss the changes.

Please add your suggestions to this page and vote on other suggestions:

+1 if you agree
0 if you think the proposal is useful but needs more work
-1 if you disagree with the proposal

If you register a 0 or -1 vote, then please add a link to a page outlining your objections or comments.

Logical storage units

A request has come from our friends at SDSC to include references to the actual storage units that data is deposited on. The use case is data replication. For example, I want to move or copy a data object from a slow tape archive to an ultrafast disk where both hardware units are within the same VOSpace, or I want to retrieve a data object from the ultrafast disk copy rather than the slow tape one.

I think that we can incorporate this easily into our existing data model. We will refer to hardware units as logical storage units, with the implication that they are identified via a logical identifier (URI) that is set by the particular VOSpace implementation. To get the list of available storage units from a VOSpace, we will need a method, getLogicalStorageUnits(), which will return a list of URIs. These URIs may be resolvable to a description of the storage unit.

The logical storage unit identifier will be an optional argument in the entity so that, as part of the data transfer negotiation, the user can specify a list of storage units that they want the data transferred to/from. The identifier will also be an optional argument in the entity so that specific hardware can be targeted in moving and copying data. (MatthewGraham - 13 Aug 2007)
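
To make the proposal more concrete, here is a minimal sketch of how a service might expose its storage units and honour an optional list of preferred units during a transfer. Only the idea of getLogicalStorageUnits() comes from the text above; the class, the choose_unit helper, the fallback behaviour and the unit URIs are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class VOSpaceSketch:
    """Toy in-memory stand-in for a VOSpace service (hypothetical)."""

    def __init__(self, units: Dict[str, str]) -> None:
        # Maps a logical storage unit URI to a human-readable description.
        self._units = units

    def get_logical_storage_units(self) -> List[str]:
        """Return the URIs of the storage units this VOSpace exposes."""
        return list(self._units)

    def choose_unit(self, preferred: Optional[List[str]] = None) -> str:
        """Pick a storage unit for a transfer.

        `preferred` models the optional list of unit URIs a user may
        supply. When it is omitted, the service decides for itself
        (here, simply the first unit it knows about).
        """
        available = self.get_logical_storage_units()
        if not preferred:
            return available[0]
        for uri in preferred:
            if uri in self._units:
                return uri
        raise ValueError("none of the requested storage units is available")


# Unit URIs are made up; a real service would define its own.
space = VOSpaceSketch({
    "vos://example/units/tape": "slow tape archive",
    "vos://example/units/disk": "ultrafast disk",
})
fast_copy = space.choose_unit(["vos://example/units/disk"])
```

Because the argument is optional, clients that do not care where the data lives can ignore storage units entirely.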

I'm not sure there is a strong science use case for this. Turn your example round the other way, and what is the science use case for explicitly wanting to get the data from the slow tape store rather than the fast disk store ?

Adding references to the storage units will add a great deal of complexity to VOSpace, complexity that other tools and services already handle. As soon as we start to deal with things like replication, we will need to define the expected behaviour in a lot more detail than just simply adding references to logical storage units.

Some of the questions that we would need to answer (not a complete list) :

If the data for a node is stored on more than one storage unit, if I change the data on one unit, are the changes reflected in the other 'copy' ?
How does this affect something like tabular data stored in a StructuredData node ?
- Can the data for a StructuredData node be stored in a database table and as a file on disk at the same time ?
- If so, what kind of validation is applied when I import data to the disk copy ?
- If I run a SQL statement that modifies the database table, are the changes replicated to the copy on disk ?

These are all solvable; in fact they have all been solved by systems such as SRB and iRODS. In which case, why try to re-invent the wheel ? If we try to solve these issues in VOSpace, then I am concerned that we will end up doing one of two things.

We base our solutions on how SRB and iRODS have solved the problems.
- In which case we are effectively saying "a VOSpace service must handle replication the same way that SRB does".
- This would make it much more difficult to implement a VOSpace service that uses an alternative replication mechanism.
We come up with our own solutions that behave slightly differently to the way that SRB and iRODS have solved the problems.
- This would make it much more difficult to implement a VOSpace based on SRB and iRODS.

Votes

name	vote	comment
MatthewGraham	+1	proposer
DaveMorris	-1	I'd rather handle this in a separate interface

Alternative service interfaces

This is in reference to the above suggestion of adding references to logical storage units to support data replication. Why attempt to re-invent the wheel ?

If a VOSpace service is based on an SRB or iRODS system, then provide a way for the user to access the SRB or iRODS service interface directly.

If a VOSpace service uses a different replication mechanism, then provide a way for the user to control the replication using that mechanism instead.

The suggestion is that we add a list of alternative service interfaces for accessing the node. These can either be added to the existing provides list, or to a specific list of alternative service capabilities.

Take the specific example of data replication using SRB or iRODS.

If we define a URI that means 'access the data using the iRODS service interface', then a VOSpace service that is based on an SRB or iRODS server can add the iRODS service interface to the provides list for a node.

    <node uri="vos://xxxx">
        ....
        <provides>
            ....
            <!-- iRODS service (version 0.9) -->
            <view uri="ivo://irods.sdsc.edu/interface/irods-v0.9">
                <endpoint>.....</endpoint>
            </view>
        </provides>
    </node>

In effect the VOSpace service is saying, "the data replication for this node can be handled using the iRODS service API at [endpoint]".

A slight tweak to the VOSpace provides and view elements, and we get access to all of the iRODS service API for free.
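
From the client side, discovering the alternative interface could look something like the sketch below. It assumes the node description has the shape proposed above (a view with a nested endpoint element). The node URI and the endpoint value are invented placeholders, and find_endpoint is a hypothetical helper, not part of any VOSpace client.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical node description; the node URI and endpoint are placeholders.
NODE_XML = """
<node uri="vos://example/mydata">
    <provides>
        <view uri="ivo://example/views/votable"/>
        <view uri="ivo://irods.sdsc.edu/interface/irods-v0.9">
            <endpoint>irods://irods.example.org:1247/zone/mydata</endpoint>
        </view>
    </provides>
</node>
"""

IRODS_INTERFACE = "ivo://irods.sdsc.edu/interface/irods-v0.9"


def find_endpoint(node_xml: str, interface_uri: str) -> Optional[str]:
    """Return the endpoint advertised for interface_uri, or None."""
    root = ET.fromstring(node_xml)
    for view in root.findall("./provides/view"):
        if view.get("uri") == interface_uri:
            endpoint = view.find("endpoint")
            if endpoint is not None and endpoint.text:
                return endpoint.text.strip()
    return None


endpoint = find_endpoint(NODE_XML, IRODS_INTERFACE)
```

A client that does not recognise the iRODS interface URI simply skips that view, so existing clients are unaffected.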

Votes

name	vote	comment
DaveMorris	+1	proposer

ContainerNode

From the introduction above : this cannot hold any data (no bytes) but can have .... views for container level formatting (aggregate zip/gzip)

So, we have :

A ContainerNode may have child nodes
A ContainerNode cannot hold any data
A ContainerNode may have a list of views for accessing aggregated data.

The 'no data' part is (backend) storage specific, and should not be part of the external interface. The specification should define what an external actor sees, not the internal implementation details.

Note - if it has a list of accepts and provides views, then to an external Actor a ContainerNode behaves the same way as a DataNode, and does indeed appear to handle data.

So the definition becomes :

A ContainerNode may have child nodes
A ContainerNode may have a list of views for accessing aggregated data.

However, I don't see the need to specify what type of data the views may or may not provide. A view that provided additional DublinCore metadata about the container itself is perfectly valid, but would be excluded by the 'aggregated data' clause.

So the definition simply becomes :

A ContainerNode may have child nodes
A ContainerNode may have a list of views.

Again - Why explicitly add clauses that exclude things that don't break anything ?
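
The effect of the 'aggregated data' clause can be illustrated with a small sketch. The view URIs and the views_allowed helper are hypothetical, and modelling the clause as a whitelist is just one possible reading of how it might be enforced.

```python
from typing import Iterable, Optional, Set

# Hypothetical view URIs, used only for illustration.
ZIP_VIEW = "ivo://example/views/aggregate-zip"
DC_VIEW = "ivo://example/views/dublin-core"


def views_allowed(views: Iterable[str],
                  permitted: Optional[Set[str]] = None) -> bool:
    """Check a container's views against a definition.

    permitted=None models the simplified definition: any view is valid.
    A set models a definition that only admits certain kinds of view.
    """
    if permitted is None:
        return True
    return all(v in permitted for v in views)


container_views = [ZIP_VIEW, DC_VIEW]

# 'Aggregated data' clause: the Dublin Core metadata view is rejected.
restricted = views_allowed(container_views, permitted={ZIP_VIEW})
# Simplified definition: both views are accepted.
simplified = views_allowed(container_views)
```

Under the restricted reading the container is rejected only because of its metadata view, even though that view breaks nothing.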

Votes

name	vote	comment
DaveMorris	+1	proposer
