VOSpace service specification: Request for Comments

This document will act as RFC centre for the VOSpace service specification Proposed Recommendation V1.01. There is also an accompanying WSDL and schema.

Review period: 23 July 2007 - 20 August 2007

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your WikiName so authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Discussions about any of the comments or responses should be conducted on the VOSpace mailing list, vospace@ivoa.net.

Implementation details

Three implementations of VOSpace 1.0 have been produced:

  • Astrogrid
    • Endpoint:
    • VOSpace root:
    • Secure:

Although not fully documented, the Caltech VOSpace implements all features of the VOSpace 1.0 specification including StructuredDataNodes and the ability to request different views.

Two VOSpace clients have also been produced at Caltech - one to talk to secure VOSpace services using digital signatures (WS-Security) and one to talk to unsecure services. These clients utilize a different underlying web service infrastructure to the one that the Caltech VOSpace implementation is based on (Apache Axis 1.2.1 Final vs. XFire 1.2.2).

The secure client has successfully negotiated data storage at the ESO VOSpace, including a third party data transfer into the VOSpace from a cone search service. This demonstrates sufficient interoperability of the implementations.

Comments from the Community

  • First sample comment (by MarSierra): ...
    • Response (by authorname): ...

Comments from MarkTaylor

  • Description members:
    • The Description members of various items are characterised "A text block describing ...". For implementors and users of this standard it would be helpful if a bit more detail about the intended format of this text block could be supplied, for instance should it be a short summary or a detailed description and should newlines and spaces in it be honoured or are multiple whitespaces insignificant. A mismatch in interpretation of this sort of thing between metadata supplier and data consumer can lead to ugly/unreadable presentation of such items to the user. I understand that the details of how various descriptions are to be written is somewhat dependent on schemas external to this document yet to be written, so possibly this can't be clarified at this stage.
      • One would hope that common sense prevails and that the length of the Description is sufficient to provide a concise overview and any specific usage details where necessary. As to formatting details, I agree that some sort of convention would be nice but we do not specify such constraints with other descriptive text elsewhere in the VO (c.f. the element in VOResource) so maybe this is a wider issue that needs to be addressed. (MatthewGraham)
      • I agree that we should establish a convention. Once we register the core set of properties, protocols and views, we can refer to these as examples of 'best practice'. (DaveMorris)

  • getProperties operation
    • accepts return value is described as "A list of identifiers that the service accepts and understands". What does "understands" mean here?
      • The intent here is that there might be metadata (properties) that is implementation dependent but can be used by the user to control operational aspects and this is what "understands" refers to. An example is access control: a VOSpace implementation might allow individual users to control the permissions on data objects via a property called something like "permissions". If the VOSpace receives a data object with this property then it understands what this property refers to and can deal with it accordingly. I agree that clarification is required in the spec and we will address this in the next revision. (MatthewGraham)
      • This is a way for the service to declare that it will interpret specific properties to have specific meanings. See VOSpace replies page for details. (DaveMorris)

    • contains return value: Is it wise to require this as a return? If the VOSpace service is implemented in such a way that it can accept arbitrary properties to be associated with each node, then I'd have thought this could be rather a large list and expensive to determine.
      • Firstly, any VOSpace can accept arbitrary properties - this is not implementation dependent. Some properties might be implementation controls which is what accepts returns (see comment above on this). Now it need not necessarily be expensive to determine contains but this is an implementation detail. Given that arbitrary properties can be used, there should be an operation that returns a list of which properties are being used in a particular space - this will aid with queries across federated VOSpaces, e.g. I want to find all data objects with this property. It is inefficient to do a full listing of each space when one can first of all just get a list of all the properties used within that space and determine whether it should be used further or can be passed over. (MatthewGraham)

  • listNodes operation
    • Either I've misunderstood how the wildcarding works or the list of matched examples is wrong. To me it looks like *.txt ought to match .txt and frog.txt but not .txtinfo or frog.txtinfo.
      • I agree that *.txt should only return .txt and frog.txt and will remedy this in the next revision of the document. (MatthewGraham)

  • Data model UML diagram:
    • s/propetty/property in Node box

  • setNode operation, Faults section:
    • "if the requests attempts" -> "if the request attempts"

  • pullDataFromVoSpace operation, Notes section:
    • "The any endpoint URLs..." rephrase.

  • pushDataFromVoSpace operation, Faults section:
    • Delete Bullet item 7 (partial repetition of bullet item 8)

-- MarkTaylor - 01 Aug 2007
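
The listNodes wildcard behaviour agreed in the exchange above can be illustrated with a minimal sketch. It assumes shell-style wildcard semantics as provided by Python's fnmatch module, which may not be exactly what the specification intends; the helper name match_nodes is hypothetical, and the node names are the ones quoted in the comment.

```python
from fnmatch import fnmatchcase

# Node names quoted in the comment above.
names = [".txt", "frog.txt", ".txtinfo", "frog.txtinfo"]


def match_nodes(pattern, candidates):
    """Return the candidates matching a shell-style wildcard pattern.

    Hypothetical helper: assumes '*' matches any run of characters,
    including an empty one, and that matching is case-sensitive.
    """
    return [name for name in candidates if fnmatchcase(name, pattern)]


matched = match_nodes("*.txt", names)
print(matched)
```

Under these assumptions only .txt and frog.txt are matched; a pattern such as *.txt* would be needed to pick up the two txtinfo names as well.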



From RayPlante - 20 Aug 2007

Nice job on this document; it has the proper level of specificity for a service specification.

  • I think we need to treat these specifications like any other peer-review publication; thus, it is appropriate to include acknowledgements to our funders. For contributions by Matthew, I suggest the boiler-plate that I provided in my SSO comments.

  • There's an additional bit of boilerplate that I like to see in these documents that defines abbreviations that we insiders are familiar with. Again, you can consider my suggested text for a preface section called "Definitions".

  • I would strongly encourage numbering the sections of the document in the style "1.4.2". This allows one to refer to a specific definition or requirement with a little more specificity (e.g. "This service is not compliant because it does not ... as required by the VOSpace specification, section 2.3.1").

  • Introduction: This document would greatly benefit from a subsection in the introduction (e.g. "Typical Use of a VOSpace Service") that steps us through an example of putting and retrieving data to/from a VOSpace, showing sample SOAP messages. This should introduce the major concepts, mapping them to concepts we understand (e.g. Node represents a file) without necessarily defining them generally. As it is now, it provides a bottom-up description of the service, which is what you want in a spec; however, the reader doesn't discover how it all fits together until the end. (I felt like I was reading a mystery novel, flipping the pages back to see how I was supposed to understand the clues revealed earlier.)

  • Introduction: Along the same theme, it would also be good to end the introduction with another subsection (e.g. "Document Roadmap") that describes how the rest of the document is laid out. With this road map in mind, the reader will understand how the parts will be fitting together as he/she is reading them.

  • Nodes and node types: please repeat the definition of VOSpace here. (It's given in the Abstract; however, an abstract is usually a summary of the contents of the document.) Then think carefully about how the term is being used in the section and make sure you are being consistent. I suspect that some clarification of the definition will be needed, but I'm not sure.

  • Nodes and node types: please provide a clearer semantic definition of node before launching into a definition of the types of nodes. It should be clear (perhaps with additional explanation) before talking about the types that a data file (something we all understand) is represented as a node.

  • Nodes and node types: paragraph after first bulleted list of types needs to end in a period.

  • Property identifiers: I strongly recommend that when IVOA identifiers are used to identify properties (as well as views and protocols) that they use the form, ivo://auth/blah#property-name. As the authors know, every IVOA Identifier must resolve to a separate resource description. By using the pound delimiter, all of the method names can be defined within a single resource description. The latest internal WD version of VOStandard (v0.2) supports this type of definition. We can push this to a released WD to help support VOSpace.

  • Views: In keeping with the above recommendation, I suggest that standard views be defined with one of the following forms:
    • ivo://ivoa.net/vospace#view-any, to include the definition in the standard that describes the VOSpace standard as a whole, or
    • ivo://ivoa.net/vospace/views#any, to have a resource specifically for the definition of views.
    I prefer the first alternative, because it consolidates all the VOSpace information in one resource document, which will make maintenance simpler and make the registry seem less cluttered.
    • Yes, we should probably have one document for the core set of standards, so probably ivo://ivoa.net/vospace/core#view-any. However, it should be clear that although putting multiple definitions into one resource may be 'best practice', it is not mandatory. (DaveMorris)
  • View descriptions, last sentence: The Registry WG can provide the clarity needed to replace this sentence with a more definitive statement on the timescale of the approval process for this document. In particular, the RWG should:
    • add support for DisplayName in the VOStandard schema (if desired)
    • release updated schema to WD status

  • Protocol identifiers: change form of IVO-ids to form using pound (#).

  • Local NFS transfers: change form of IVO-ids to form using pound (#).

  • Web service operations: it would be helpful to be a bit more explicit that when you say "description of the Node" you mean a structure in terms of the model specified on p. 9. This can either be done by saying something like "(type: Node)" or with an explicit page or section reference to where Node is defined.

  • pushToVoSpace, Returns: it would be helpful to state here that the endpoint refers to the destination URL (yes?)
    • Yes, we will provide a supplementary document containing examples which should make this clearer. (DaveMorris)

  • pullToVoSpace, Parameters: it would be helpful to state here that the endpoint refers to the source URL (yes?). (And so on with other operations.)
    • Yes, we will provide a supplementary document containing examples which should make this clearer. (DaveMorris)

  • Can we include/reference the official standard WSDL?

  • Can we define the Registry VOResource extension for registering VOSpaces? Is this coming?

-- RayPlante - 20 Aug 2007
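
The pound-delimited identifier form recommended above can be illustrated with a minimal sketch: the part before the # names a single registered resource, and the part after it names one definition inside that resource. The function name split_identifier is hypothetical; the identifiers are the ones quoted in the comments above, and nothing here consults a registry.

```python
def split_identifier(identifier):
    """Split 'resource#name' into (resource, name).

    Hypothetical helper: raises ValueError when the identifier has no
    non-empty fragment after the '#' delimiter.
    """
    resource, separator, name = identifier.partition("#")
    if not separator or not name:
        raise ValueError(f"no fragment in identifier: {identifier!r}")
    return resource, name


# Identifiers quoted in the comments and response above.
candidates = [
    "ivo://ivoa.net/vospace#view-any",
    "ivo://ivoa.net/vospace/views#any",
    "ivo://ivoa.net/vospace/core#view-any",
]

for candidate in candidates:
    resource, name = split_identifier(candidate)
    print(resource, "->", name)
```

With this form, several views, protocols and properties can share one resource description and differ only in the fragment.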



From DougTody - 20 Aug 2007

It is good to see progress on VOSpace, as we badly need this to add flexible data management capabilities to our services, and provide for more than just basic client-server data access. My comments are mostly fairly basic. I won't try to comment upon whether these should be addressed in this version or the next one. My apologies if I misunderstand aspects of the specification.

General comments:

  • It is not enough for Grid technology like VOSpace to be demonstrated in code written by the design team; we need to verify that it is useful in actual real world applications, written by others. While it is useful to have a specification for trials, I am not sure we should accept this as a proven standard until this has been demonstrated in real applications.
    • There are a number of points to respond to here:
      • Firstly I am not certain why Grid technology is being specifically singled out here. The whole approach of the VO is to employ a service-oriented or distributed computing (Grid) approach. In fact, it is very unclear what is meant by Grid technology in this context at all: VOSpace is solely a SOAP-based web service as are all SkyNodes, registry interfaces, programmatic interfaces to Footprint Services, Spectrum Services, CasJobs, STOMP, Wesix, and indeed IRAF. VOSpace implementations may employ WS-Security as a security mechanism but this is just one of the approved authentication mechanisms under the IVOA SSO Profile and nothing peculiar to VOSpace. So whatever holds true for VOSpace, holds true for all the VO web services and vice versa.
      • Secondly the requirement for RFC (actually PR) is that "each feature of the technical report has been implemented" and "preferably, the Working Group should be able to demonstrate two interoperable implementations of each feature". There is no stipulation about who the implementors are and, in fact, it is hard enough to get two independent implementations from the working group let alone from some other developers.
      • Thirdly there has been work by groups not connected to the design team, notably the SRB team at SDSC and Tamas Budavari at JHU, to implement VOSpace 1.0 on top of SRB and Amazon S3 respectively. These are ongoing efforts, however, and not directly relevant for the RFC process.
      • Lastly this really is an issue for the IVOA Exec and TCG to address. (MatthewGraham)

  • The basic concept and approach appears sound. That is, address a single service first, ignoring links between multiple VOSpaces. Separate interface from the underlying implementation. Demonstrate that basic data storage and transport can be achieved.

  • As it stands though I am not sure I can understand all the details required to use this for basic data storage and transport. The key issue is that details essential for basic data manipulation, such as data format, basic file attributes, and transport protocol (HTTP etc.), are not addressed directly in the specification. This appears to be left to the service implementor, or to the client trying to upload data, to describe indirectly in registry records (View, Protocol, Transfer, etc.) independent of the actual VOStore specification. While the flexibility to describe arbitrary data formats and transport protocols is nice, this approach appears overly general, with the result that the basic VOSpace specification will be hard and ambiguous to use and does not adequately address basic data management using common formats and protocols.
    • The aim of VOSpace is to be a lightweight abstraction layer that sits on top of any data storage hardware, ranging in scope from the hard drive on my laptop to a Petabyte SRB facility on the other side of the world. This necessitates that the common interface bears no direct implementation dependencies but rather just addresses the fundamental operations inherent in storing and retrieving data whilst also providing a mechanism (logical identifiers) for implementation details (such as data format, file attributes, transport protocols) to be specified. There might be a case for an IVOA policy document that specifies how VOSpace is to be used in terms of which common formats and protocols must be supported but this is an ancillary document to the interface specification. This document does state that the VOSpace team will register a core set of standard transport protocols. (MatthewGraham)
    • Point taken, the specification on its own is very abstract. We will work on providing a supplementary document containing worked examples, which should make this clearer. (DaveMorris)

  • In my view, a basic architectural principle for services is that it should be possible to understand and use any service stand-alone, independent of other software such as the registry (although the registry might be used for related higher level functions such as discovery). This does not appear to be the case here, as fundamental information about transport details and data formats or attributes are only available via indirect URIs which are intended to be registry resolvable (there are some weasel words about nonresolvable URNs, but clearly this is discouraged and registry integration is the intention).
    • We will provide a supplementary document containing examples, which should make this clearer. (DaveMorris)
    • Once we have registered the core properties, protocols and views then these will all become fixed URIs. Although the URIs would refer to descriptive registry resources, in practice simple clients and services would not need to dereference these via the registry. (DaveMorris)
    • During normal operation neither the service nor the client should need to dereference any of the URI identifiers. They can all be treated as opaque identifier strings. See VOSpace replies page for a more detailed example. (DaveMorris)
  • Basic things such as a data format ("view") or available transfer protocol (e.g., HTTP) are described indirectly in descriptors which are stored in a registry and referenced by URI (so far as I can tell). Although it does not explicitly say, I suspect we might be able to use string equality on such a URI to test for these things (as one would test a MIME type for example), but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do. The actual spec repeatedly says things like "at the time of this writing, the schema for registering in the IVO registry has not been finalized". Even ignoring the undesirable registry interaction, which should not be required to directly use a service, it does not appear that such details are sufficiently specified at this point.
    • Addressing the last two points together:
      • Firstly logical identifiers (URIs) are used to refer to arbitrary metadata (properties) associated with a data object, data formats and transport protocols. Again this is to prevent any implementation dependencies in the specification. A specific set of URIs that should be supported could be stated in a usage policy document. This really is no different from using MIME types as a format identifier, it's just that we are familiar with the syntax from everyday usage so that we do not need to refer to a document to know that image/jpeg refers to the JPEG format (but you do look it up when you come across an unknown MIME type). With practice, the same will become true of our URIs, e.g. "ivo://net.ivoa.vospace/views#jpeg" and the XML data structure will not need to be parsed.
      • Secondly the specification is clear that nonresolvable URIs such as 'urn:my-data-format' "may be sufficient for testing and development on a private system but [are] not scalable for use on a public service" and that production systems should use URIs that are resolvable into descriptions of what they represent. Ideally these should be IVO registered URIs resolvable to a description in a registry since that is currently the only framework we have within the IVOA to support such entities.
      • Thirdly it is true that the schema for registering the resolvable descriptions in the IVO registry has not been finalized. We cannot define everything ourselves and are dependent on the Registry Working Group to provide this component of the infrastructure for us. However, please note Ray's comment above about VOStandard v0.2 and his offer of pushing this to a released WD to alleviate this particular problem. (MatthewGraham)
    • Once the core set of properties, protocols and views have been registered, simple client and server implementations may treat the URIs as hard coded string constants. There is no need to dereference the registry URIs unless a complex GUI client wants to use the information to display the user friendly names. See VOSpace replies page for a more detailed example. (DaveMorris)
  • The interface defines adhoc SOAP methods which can query the service capabilities; these (appear to) return obscure, indirect URIs which contain further information on service capabilities such as what transport protocols the service supports. Whatever happened to the proposed getCapabilities operation, which is supposed to describe the capabilities of a service instance? This would seem to be the obvious way to describe things such as what formats (views), or transport protocols a given service instance supports.
    • The proposed getCapabilities operation is defined as part of the latest WD (16 May 2007) of the VOSI specification that all VO services shall implement. In light of the lack of stability in this specification at the moment, its status as a WD only and, indeed, any other service providing this mechanism, it would be unwise to specify this as the sole mechanism for retrieving service capabilities at the current time. As and when this stabilises and matures to a recommendation, we can easily incorporate it into our existing services as SOAP operations on a separate SOAP endpoint, which is one of the specified implementation possibilities. (MatthewGraham)
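
The responses above on registry dependence state that clients and services can treat the URIs as opaque, hard-coded string constants. A minimal sketch of that usage follows. The view URI is the example quoted in the response; the property URI and both helper names are hypothetical and are not taken from any registered standard.

```python
# Example view URI quoted in the response above.
VIEW_JPEG = "ivo://net.ivoa.vospace/views#jpeg"
# Hypothetical property URI, invented for this illustration only.
PROP_SIZE = "ivo://net.ivoa.vospace/properties#size"


def supports_view(offered_views, wanted_view):
    """Test for a view by plain string equality, with no registry lookup."""
    return wanted_view in offered_views


def get_property(properties, key, default=None):
    """Read a property from a mapping keyed by opaque URI strings."""
    return properties.get(key, default)


# Values a service might return; both are made up for the sketch.
offered = ["ivo://net.ivoa.vospace/views#jpeg"]
node_properties = {PROP_SIZE: "2048"}

print(supports_view(offered, VIEW_JPEG))
print(get_property(node_properties, PROP_SIZE))
```

This mirrors how a MIME type string is compared: the URI only needs to be dereferenced when a client wants the human-readable description behind it.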

More detailed comments (these are representative, I am not attempting to be uniform and complete at this point):

  • Although the spec says it does not attempt to address hierarchy, the restriction to "no slashes" in a URI path appears arbitrary. The internal logical structure implied by a pathname should be transparent to something like VOSpace, for which this is merely a component of a string identifying the file within a VOSpace. This might change if the VOSpace supports "directory"-level operations, but this could be added without affecting file pathnames within a VOSpace. Otherwise, to flatten a directory hierarchy, it will be necessary for a VOSpace to invent arbitrary filenames, unique within a VOSpace, to substitute for pathnames containing a slash.
    • The reason for the restriction to 'no slashes' is explained: "slashes in the path imply a hierarchical arrangement of data, as is normal with URIs [see RFC 2396]. Since the current version of this specification does not support data hierarchies, an identifier for a node in a current service must have one slash at the start of the path [to denote the root node] and no other slashes." Containers supporting data hierarchies will be introduced in VOSpace v1.1 and the restriction on slashes will be removed. Our intent is to aggressively attack v1.1 as soon as v1.0 is passed so that users will not have to flatten directory hierarchies for long. (MatthewGraham)
    • As Matthew says, this was done to avoid problems when we introduced hierarchical trees in VOSpace 1.1. See VOSpace replies page for a more detailed example. (DaveMorris)
  • I am not convinced of the need for "structured" nodes at this early stage, or for a VOSpace to perform arbitrary file format or data model-based conversions. This might be useful to allow a VOSpace to be used to access tables stored natively in a RDBMS, but at present this is not sufficiently well defined, at least not in the written specification. It is not clear whether a VOSpace should natively provide such a capability when this will already be provided by the more object-oriented DAL interfaces such as TAP, SIA, etc. It might be best to first provide a solid VOSpace interface for basic "file" (simple byte stream) access before addressing object-oriented access.
    • The requirement to support RDBMS-based storage systems has been one of the primary use cases for VOSpace since its inception. The intent is that DB-based services such as CasJobs will expose user areas such as MyDB via VOSpace. This type of functionality has been demonstrated technically with one of the previous implementations of the VOStore specification and there is ongoing work with JHU to implement this with the current VOSpace interface. Such a capability is also provided by SRB and other large storage technologies. The distinction between structured and unstructured nodes is not, however, really whether the underlying hardware implementation is db- or file-based but rather with the former that the space understands the data format of the associated data object and can perform transform operations as a result. This functionality is optional but applicable to arbitrary data formats and not just those accessible via TAP or SIAP. (MatthewGraham)
    • Support for StructuredData was added to the specification to complement the DAL services, not to replace them. One target application of VOSpace is to provide a way of importing data into these services. See VOSpace replies page for a more detailed example. (DaveMorris)
  • It seems overly complex to describe each node property as an independently resolvable URI. The obvious solution is more along the lines of a simple "name=value". Yes, names could fail to be globally unique, but this is possibly less of a problem than adoption of an overly-complex and under-specified approach (so far as I can tell, no standard properties, even for basic file attributes such as size, modify date, etc., have yet been defined). An alternative way to address the problem of uniqueness might be to use property names such as "type:name" where "type" defines a property namespace, e.g., "file". Then we could have one property instance (possibly optional) of something like "file:schema", the value of which would be a single URI pointing to something which defines all the defined names for that namespace. (This is similar to what we do for UTYPE namespaces already for example). While this could be extensible, the most common cases could be defined directly in the core standard, without any need to inspect the registry or any such outside service.
    • Properties are intended to represent arbitrary metadata associated with a data object and are expressible as keyword-value pairs. The only distinction is that we choose to use logical identifiers (URIs) as our keywords so that these can be globally unique. In practice, there will be a core set of properties addressing common metadata such as file size and these will be documented in a standard in the registry as the specification promises. If we do not use globally unique resolvable names then how can a user know for sure that file:size used by one space means exactly the same as file:size used by another without first querying both spaces to find out what the terms mean? This is especially true when implementation details of the underlying storage are hidden - file:size does not mean the same for a db-based system as for a file-based one. (MatthewGraham)

  • In "Views" it would be better to separate concerns such as the content type of a file (FITS, VOTable, etc.), from unrelated matters such as GZIP compression (ZIP is different yet since it is a multi-file container). Otherwise it is much harder for a client to sort out what a "View" offers; it would have to parse a View descriptor, understand all the options, see what is offered in this particular view and how that compares to what the client wants, and so on.
    • Views refer specifically to the data format being used to transfer data objects. I appreciate that "tar.gz containing a VOTable" could be read as a conflation of file content type and transport container format but our intent is that "tar.gz containing VOTable" is a subset of "tar.gz" and just a semantically richer specification of the transportation format. This will, however, be covered in greater detail in VOSpace 1.1 where it has more of an impact (container-level metadata). (MatthewGraham)

  • For "unstructured" data nodes it should be possible to record primary data attributes such as the MIME type of the "file" (data node) directly, without having to deal with some indirect registry entry for a "View". The obvious thing would be a property such as file:MIMEType. If we really need object metadata at the level of VOStore, this could be a separate property namespace such as "table:NRecords".
    • A core property for MIME type is fine. However, we will use the URI name. (MatthewGraham)

  • What does moveNode do, really? Is this a rename, as in Unix?
    • In VOSpace 1.0, moveNode is just a rename but in VOSpace 1.1, it will be a full Unix move. (MatthewGraham)

  • Does pushToVOSpace deal with a single data node? It says it returns a list of URLs, but there is only a single destination node, so I would guess that the URLs refer to alternative transport protocols. How does the client decide which to use - does it have to parse each URL to determine the protocol? (Possibly this is addressed in the WSDL, but semantic details like this should be addressed in the written specification as well).
    • Yes, pushToVoSpace deals with a single data node. The process is described in the "Asynchronous transfers" section on page 19 and we should probably reference it from pushToVoSpace. Basically, however, a user specifies a list in order of preference of transport protocols that they are prepared to use. The space then returns a 'negotiated' list of protocols that can be used with implementation details filled in (URLs). The user then works through the list attempting to use each protocol until one is successful, which may very well be the first one. (MatthewGraham)
    • Yes, see VOSpace replies page for a more detailed example. (DaveMorris)
  • One of the things I was looking for was whether a URL could be obtained to directly GET or PUT a file, and this does appear to be the case (e.g., pullFromVOSpace). This would allow us, for example, to have a data service deliver a message to a client giving the access reference URL for a data object, once it has been generated by an asynchronous service. This might allow the VOSpace machinery to be hidden from a simple client, for example when functionality such as VOSpace and UWS are used within a TAP or SIA data service.

-- DougTody - 20 Aug 2007
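
The protocol negotiation described in the pushToVoSpace response above (the client lists protocols in order of preference, the space returns the ones it can serve with URLs filled in, and the client tries each in turn) can be sketched as follows. This is an illustration under stated assumptions, not the service's actual behaviour: the protocol identifiers, endpoint URLs and function names are all hypothetical, and both sides of the exchange are modelled locally rather than through SOAP messages.

```python
# Hypothetical protocol identifiers, invented for this illustration.
HTTP_PUT = "ivo://example.org/protocols#http-put"
FTP_PUT = "ivo://example.org/protocols#ftp-put"
GRIDFTP_PUT = "ivo://example.org/protocols#gridftp-put"


def negotiate(preferred, supported):
    """Return (protocol, url) pairs the service supports, in client order."""
    return [(p, supported[p]) for p in preferred if p in supported]


def push(negotiated, attempt):
    """Try each negotiated endpoint in turn; return the first protocol
    for which attempt(protocol, url) succeeds, or None if all fail."""
    for protocol, url in negotiated:
        if attempt(protocol, url):
            return protocol
    return None


# Endpoints a hypothetical service can offer for one destination node.
service_endpoints = {
    HTTP_PUT: "http://example.org/upload/abc",
    FTP_PUT: "ftp://example.org/upload/abc",
}
client_preferences = [GRIDFTP_PUT, FTP_PUT, HTTP_PUT]

negotiated = negotiate(client_preferences, service_endpoints)


def fake_attempt(protocol, url):
    """Stand-in for a real transfer: pretend only HTTP succeeds."""
    return protocol == HTTP_PUT


used = push(negotiated, fake_attempt)
print(used)
```

Because each returned URL is paired with its protocol identifier, the client in this sketch never has to parse a URL to work out which protocol it belongs to.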


Comments by TCG

Chairs should add their comments under their name.

Mark Allen (Applications WG)

I approve

Christophe Arviset (TCG vice Chair)

I approve this document. The detailed description of all the Web Services operations is really useful.

Bob Hanisch (Data Curation & Preservation IG)

I approve.

Gerard Lemson (Theory IG)

I approve.

Mireille Louys (Data Models WG)

I approve the document

Keith Noddle (DAL WG)

I approve this document. VOSpace V1.0 is the foundation upon which the IVOA can build truly useful data storage services. I hope we see such services and an enhanced standard (V1.1?) emerge quickly once VOSpace V1.0 has been approved.

Francois Ochsenbein (VOTable WG)

I approve the document.

Pedro Osuna (VOQL WG)

I approve.

Ray Plante (Resource Registry WG)

I see that my comments from the RFC period were addressed; therefore, I approve this document.

Andrea Preite-Martinez (Semantics WG)

I approve.

Roy Williams (VOEvent WG)

I approve this document. While this document does not give all the detail necessary to build a full implementation (schemas not defined, etc), it does provide a clear description of how the system works.

Response from MatthewGraham: This document gives a full description of the interface and the request and response messages that it uses and is perfectly sufficient to build a full implementation. However, what Roy is alluding to is that the IVOA usage policy for VOSpace 1.0 is not defined in this document. This relates to what URIs identify the core metadata and the registry extension schema for registering descriptions of these. We have addressed these points in the comments above and the document does note that these are ongoing activities but not necessary for approval of the interface. As mentioned in the latest version of the spec (1.02), GWS WG is working on an ancillary document which will provide this information.


Topic revision: r24 - 2007-09-14 - ChristopheArviset
 