I'm not sure there is a strong science use case for this.
Turn your example round the other way, and what is the science use case for explicitly wanting to get the data from the slow tape store rather than the fast disk store ?
Adding references to the storage units will add a whole load of complexity to VOSpace, that is already handled by other tools and services.
As soon as we start to deal with things like replication, we will need to define the expected behaviour in a lot more detail than just simply adding references to logical storage units.
Some of the questions that we would need to answer (not a complete list) :
- If the data for a node is stored on more than one storage unit, and I change the data on one unit, are the changes reflected in the other 'copy' ?
- How does this affect something like tabular data stored in a StructuredData node ?
- Can the data for a StructuredData node be stored in a database table and as a file on disk at the same time ?
- If so, what kind of validation is applied when I import data to the disk copy ?
- If I run a SQL statement that modifies the database table, are the changes replicated to the copy on disk ?
These are all solvable; in fact they have all been solved by systems such as SRB and iRODS.
In which case, why try to re-invent the wheel ?
If we try to solve these issues in VOSpace, then I am concerned that we will end up doing one of two things.
- We base our solutions on how SRB and iRODS have solved the problems.
  - In which case we are effectively saying "a VOSpace service must handle replication the same way that SRB does".
  - This would make it much more difficult to implement a VOSpace service that uses an alternative replication mechanism.
- We come up with our own solutions that behave slightly differently to the way that SRB and iRODS have solved the problems.
  - This would make it much more difficult to implement a VOSpace service based on SRB or iRODS.
Votes
Alternative service interfaces
In reference to the above suggestion of adding references to logical storage units to support data replication: why attempt to re-invent the wheel ?
If a VOSpace service is based on a SRB or iRODS system, then provide a way for the user to access the SRB or iRODS service interface directly.
If a VOSpace service uses a different replication mechanism, then provide a way for the user to control the replication using that mechanism instead.
The suggestion is we add a list of alternative service interfaces for accessing the node.
These can either be added to the existing provides list, or to a separate list of alternative service capabilities.
In the specific example of data replication using SRB or iRODS, if we define a URI that means 'access the data using the iRODS service interface', then a VOSpace service that is based on a SRB or iRODS server can add the iRODS service interface to the provides list for a node.
<node uri="vos://xxxx">
....
<provides>
....
<!-- iRODS service (version 0.0) -->
<view uri="ivo://irods.sdsc.edu/interface/irods-v0.9">
<endpoint>.....</endpoint>
</view>
</provides>
</node>
In effect the VOSpace service is saying, "the data replication for this node can be handled using the iRODS service API at [endpoint]".
A slight tweak to the VOSpace provides and view elements, and we get access to all of the iRODS service API for free.
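As a rough illustration of the client side, the sketch below scans a node's provides list for the iRODS view URI and pulls out the advertised endpoint. It assumes the node document looks like the XML example above, with no XML namespaces; the endpoint value, the second view URI and the function name `find_alternative_endpoints` are made up for the example and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# View URI taken from the example above; it is illustrative, not a registered identifier.
IRODS_VIEW_URI = "ivo://irods.sdsc.edu/interface/irods-v0.9"

# Sample node document. The endpoint and the first view URI are hypothetical placeholders.
NODE_XML = """
<node uri="vos://xxxx">
  <provides>
    <view uri="ivo://example.org/views/hypothetical-default"/>
    <view uri="ivo://irods.sdsc.edu/interface/irods-v0.9">
      <endpoint>irods://example.org:1247/hypothetical/path</endpoint>
    </view>
  </provides>
</node>
"""


def find_alternative_endpoints(node_xml, view_uri):
    """Return the endpoints advertised for view_uri in the node's provides list."""
    root = ET.fromstring(node_xml)
    endpoints = []
    for view in root.findall("./provides/view"):
        if view.get("uri") != view_uri:
            continue
        for endpoint in view.findall("endpoint"):
            if endpoint.text and endpoint.text.strip():
                endpoints.append(endpoint.text.strip())
    return endpoints


endpoints = find_alternative_endpoints(NODE_XML, IRODS_VIEW_URI)
if endpoints:
    print("Use the iRODS interface at:", endpoints[0])
else:
    print("No iRODS interface advertised; use plain VOSpace access.")
```

A client that does not recognise the view URI simply ignores it, so a service can advertise the alternative interface without affecting existing clients.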
Votes
ContainerNode
From the introduction above :
this cannot hold any data (no bytes) but can have .... views for container level formatting (aggregate zip/gzip)
So, we have :
- A ContainerNode may have child nodes
- A ContainerNode cannot hold any data
- A ContainerNode may have a list of views for accessing aggregated data.
The 'no data' part is (backend) storage specific, and should not be part of the external interface.
The specification should define what an external actor sees, not the internal implementation details.
Note - if it has a list of accepts and provides views, then to an external actor a ContainerNode behaves the same way as a DataNode, and does indeed appear to handle data.
So the definition becomes :
- A ContainerNode may have child nodes
- A ContainerNode may have a list of views for accessing aggregated data.
However, I don't see the need to specify what type of data the views may or may not provide.
A view that provided additional DublinCore metadata about the container itself is perfectly valid, but would be excluded by the 'aggregated data' clause.
So the definition simply becomes :
- A ContainerNode may have child nodes
- A ContainerNode may have a list of views.
Again - Why explicitly add clauses that exclude things that don't break anything ?
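To make the reduced definition concrete, here is a minimal sketch of a data model in which a ContainerNode differs from a DataNode only by having child nodes. The class names mirror the node types discussed above, but the field names and all of the view URIs are assumptions made for the illustration, not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class View:
    # Identifies what the view offers; the model places no restriction on it.
    uri: str


@dataclass
class Node:
    uri: str
    accepts: List[View] = field(default_factory=list)
    provides: List[View] = field(default_factory=list)


@dataclass
class DataNode(Node):
    pass


@dataclass
class ContainerNode(Node):
    # The only thing a ContainerNode adds is a list of child nodes.
    children: List[Node] = field(default_factory=list)


# Hypothetical URIs: one aggregate view and one metadata-only view.
container = ContainerNode(
    uri="vos://xxxx/container",
    provides=[
        View("ivo://example.org/views/zip"),
        View("ivo://example.org/views/dublin-core"),
    ],
    children=[DataNode(uri="vos://xxxx/container/file")],
)

for view in container.provides:
    print(container.uri, "provides", view.uri)
```

Nothing in the model needs to know whether a view returns aggregated data or metadata about the container, which is the point of dropping the extra clauses.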
Votes