
VO and VOSpace data formats

This is a discussion page looking at defining and registering a list of standard VO data formats.


The recent discussion thread about data formats and MIME types on the DAL mailing list started with a question about "gzipped images in SIAP 1.0" and has opened up into a wider discussion about MIME types and HTTP headers.

The thread has highlighted the fact that there are at least three different concepts that we need to represent when transferring data.

  • What it contains : tabular data, a spectrum, or an image
  • How it is represented : the primary data format, e.g. FITS or VOTable (plus modifiers such as binary VOTable and tiled FITS)
  • How it is transferred : e.g. zip, tar, gzip, or gzipped tar files or streams

The discussion has looked at various options for representing these concepts in the HTTP header fields :

  • Content-type
  • Content-encoding
  • Content-disposition
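
As a rough sketch of how the three concepts might line up with these header fields, the Python below keeps each concept in its own field and builds the headers from them. The `Payload` class, the `http_headers` function and the file name are hypothetical names made up for illustration, and the mapping is only one possible reading of the discussion, not an agreed design. Note that `Content-Encoding` is normally used for compression codings such as gzip, while archive formats such as zip or tar would usually appear as a media type instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Payload:
    """The three concepts, kept as separate fields (hypothetical class)."""
    content: str                    # what it contains, e.g. "image"
    media_type: str                 # how it is represented, e.g. "application/fits"
    coding: Optional[str] = None    # how it is transferred, e.g. "gzip"
    filename: Optional[str] = None  # suggested name for the receiver


def http_headers(p: Payload) -> Dict[str, str]:
    """Build HTTP header fields for a payload (illustrative mapping only)."""
    headers = {"Content-Type": p.media_type}
    if p.coding:
        headers["Content-Encoding"] = p.coding
    if p.filename:
        headers["Content-Disposition"] = f'attachment; filename="{p.filename}"'
    return headers


# A gzipped FITS image, as in the question that started the thread.
gzipped_image = Payload("image", "application/fits", "gzip", "m31.fits")
print(http_headers(gzipped_image))
```

The `content` field is deliberately left unmapped in this sketch: of the three concepts, "what it contains" is the one with no obvious header field of its own.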

We have been struggling to solve similar problems within the VOSpace group. The problems we are trying to solve within VOSpace may be slightly more complex for a number of reasons :

  • VOSpace supports asynchronous third-party transfers, where the client arranges for data to be sent from one VOSpace service to another. This means that the data transfer occurs outside the scope of the SOAP or REST service call, and so the client cannot use the HTTP request fields to indicate what format it would like the data to be transferred in.
  • VOSpace needs to be able to support other data transfer protocols, like FTP and GridFTP, which may not have equivalent metadata fields for Content-type, Content-encoding etc.
  • VOSpace needs to be able to distinguish between
    • "I am going to send you a zip file containing a set of FITS images, and I want you to store it as a zipfile"
    • "I am going to send you a zip file containing a set of FITS images, and I want you to unpack it and store or process the individual FITS images"
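
The difference between these two requests can be sketched in a few lines of Python. The `store_upload` function and its `unpack` flag are hypothetical stand-ins for however a client would express its intent; they are not part of the VOSpace specification, and the "FITS" members are placeholder bytes rather than valid files.

```python
import io
import zipfile
from typing import Dict


def store_upload(name: str, data: bytes, unpack: bool) -> Dict[str, bytes]:
    """Return the stored objects as a mapping of name -> bytes.

    'unpack' is a hypothetical flag, used here only for illustration.
    """
    if not unpack:
        # First case: keep the archive as a single opaque object.
        return {name: data}
    # Second case: unpack and store each member individually.
    stored = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            stored[member] = archive.read(member)
    return stored


def make_zip(members: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip file so the example is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member, content in members.items():
            archive.writestr(member, content)
    return buffer.getvalue()


upload = make_zip({"a.fits": b"placeholder 1", "b.fits": b"placeholder 2"})
print(sorted(store_upload("images.zip", upload, unpack=False)))
print(sorted(store_upload("images.zip", upload, unpack=True)))
```

The same bytes arrive in both cases, so the receiving service cannot tell the two requests apart from the data alone; the intent has to be carried separately.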

However, the core problems of deciding which formats to use and how to refer to them are the same.

The current VOSpace specification introduces registered URIs that refer to 'views'. Unfortunately, a view attempts to represent at least two of these concepts, what the data contains and how it is represented, using a single URI. We know this is not sufficient for what we need, and plan to revisit it fairly soon.

Using registered URIs to identify content and formats has proved to be useful, certainly within the context of VOSpace. However, I appreciate that other groups may prefer to use MIME types or something similar, particularly for HTTP based access.

I think we should work together with the other groups to produce a list of the main VO data formats and their corresponding MIME types, and to see whether we can identify an inheritance hierarchy where appropriate.

For VOSpace services, we would probably want to register these formats in a VOStandard registry resource, enabling us to use registry URIs to refer to them.

However, even if we eventually decide not to use the registered URIs in services and applications, creating a list of the main VO data formats and their corresponding MIME types will provide a useful resource for developers working on VO projects.

The first step is to make an initial list of the formats, define the corresponding MIME types and the inheritance hierarchy.
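
To give a feel for what such a list might look like, here is a minimal Python sketch of a format table with MIME types and a parent link for the inheritance hierarchy. The format names, the choice of parents and the helper functions are assumptions made for illustration, and the MIME types shown are examples in common use rather than an agreed list.

```python
from typing import Dict, List, NamedTuple, Optional


class Format(NamedTuple):
    mime_type: str
    parent: Optional[str]  # name of the more general format, if any


# Illustrative entries only; names and hierarchy are assumptions.
FORMATS: Dict[str, Format] = {
    "xml": Format("text/xml", None),
    "votable": Format("application/x-votable+xml", "xml"),
    "fits": Format("application/fits", None),
    "fits-image": Format("image/fits", "fits"),
}


def lineage(name: str) -> List[str]:
    """Return the format followed by its ancestors, most specific first."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = FORMATS[current].parent
    return chain


def is_a(name: str, ancestor: str) -> bool:
    """True if 'name' is 'ancestor' or inherits from it."""
    return ancestor in lineage(name)


print(lineage("votable"))
print(is_a("fits-image", "fits"))
```

A service that only understands the general format could use a lookup like `is_a` to decide whether it can accept a more specific one.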


Topic revision: r1 - 2007-05-29 - DaveMorris
 