[[http://www.ivoa.net/twiki/bin/view/IVOA/VOSpaceHome][VOSpace home page]]

---+ VO and VOSpace data formats

This is a discussion page looking at defining and registering a list of standard VO data formats.

----

The recent discussion [[http://www.ivoa.net/forum/dal/0705/0608.htm][thread]] about data formats and MIME types on the DAL mailing list started with a question about "gzipped images in SIAP 1.0" and has opened up into a wider discussion about MIME types and HTTP headers.

The thread has highlighted the fact that there are at least three different concepts that we need to represent when transferring data.

   * _What it contains_ : tabular data, a spectrum, or an image
   * _How it is represented_ : the primary data format, e.g. FITS or VOTable (plus modifiers, e.g. binary VOTable and tiled FITS)
   * _How it is transferred_ : e.g. zip, tar, gzip or tar+gzip compressed files or streams

The discussion has looked at various options for representing these concepts in the HTTP header fields :

   * _Content-Type_
   * _Content-Encoding_
   * _Content-Disposition_

We have been struggling to solve similar problems within the VOSpace group. The problems we are trying to solve within VOSpace may be slightly more complex for a number of reasons :

   * VOSpace supports asynchronous third-party transfers, where the client arranges for data to be sent from one VOSpace service to another. This means that the data transfer occurs outside the scope of the SOAP or REST service call, and so the client cannot use the HTTP request fields to indicate what format it would like the data to be transferred in.
   * VOSpace needs to be able to support other data transfer protocols, like FTP and GridFTP, which may not have equivalent metadata fields for _Content-Type_, _Content-Encoding_ etc.
   * VOSpace needs to be able to distinguish between
      * "I am going to send you a zip file containing a set of FITS images, and I want you to store it as a zip file"
      * "I am going to send you a zip file containing a set of FITS images, and I want you to unpack it and store or process the individual FITS images"

However, the core problems of deciding which formats to use, and how to refer to them, are the same.

The current VOSpace specification introduces the concept of registered URIs to refer to 'views', which unfortunately attempts to represent at least two of the concepts, _what it contains_ and _how it is represented_, using a single URI. We know this is not sufficient for what we need, and plan to revisit this fairly soon.

Using registered URIs to identify content and formats has proved to be useful, certainly within the context of VOSpace. However, I appreciate that other groups may prefer to use MIME types or something similar, particularly for HTTP based access.

I think we should work together with the other groups to produce a list of the main VO data formats and their corresponding MIME types, and see if we can identify an inheritance hierarchy where it is appropriate.

For VOSpace services, we would probably want to register these formats in a VOStandard registry resource, enabling us to use registry URIs to refer to them. Even if we eventually decide not to use the registered URIs in services and applications, creating a list of the main VO data formats and their corresponding MIME types will provide a useful resource for developers working on VO projects.

The first step is to make an initial list of the formats, and to define the corresponding MIME types and the inheritance hierarchy.

----

---++ Data formats and container formats

My initial guess is that we need two types of format, a data format and a container format.

   * A data format is a file format that contains a representation of the data, e.g.
     FITS or VOTable
   * A container format is a file format that contains other files, e.g. zip or tar

Ideally, we would want to be able to come up with a standard vocabulary that could represent the following concepts :

   * The selected images are available as individual _FITS/image_ files.
   * The selected images are available as a _zip file_ containing the individual _FITS/image_ files.

In this model FITS may be a special case. From my admittedly basic understanding of FITS, it is possible for FITS to be both a data format (a FITS file containing tabular or image data) and a container format (one FITS file containing multiple images or tables). I defer to others who have a much better understanding of the FITS format and its usage within astronomy to define this in more detail.

Ideally, we would want to be able to come up with a standard vocabulary that could represent the following concepts :

   * The selected images are available as individual _FITS/image_ files.
   * The selected images are available as a single FITS file containing multiple images.

I don't know if we can define a simple MIME type that distinguishes between these two. At the moment I'm looking at building a common list of what formats we want to be able to represent, and our initial best guess at how we want to describe them. If this particular case is overly complex and rarely used, then we label it as 'there be dragons' and move on.

---++ Specialization and inheritance

Some data formats may be specializations of existing formats, and may inherit the MIME type or other details from their parent format.

---+++ Java JAR format

An example of this is the Java Archive format. From the JAR file [[http://java.sun.com/j2se/1.4.2/docs/guide/jar/jar.html][specification]] :

   * _A JAR file is essentially a ZIP file that contains an optional META-INF directory ... [that contains] specific files and directories ....
     that are recognized and interpreted by the Java 2 Platform._

This implies that the JAR format extends the ZIP specification, with additional metadata in the META-INF directory. As far as I know, there isn't a specific MIME type defined for Java JAR files, so the JAR format would 'inherit' the MIME type string from the ZIP format.

This particular example may not be useful for astronomers, but I listed it here because it is an already established extension format that developers will be familiar with.

---+++ VOSpace archive format

What I think will be useful for astronomers is to be able to say

   * _Archive this branch of my VOSpace as a zip or gzip file and save it to backup space_

then at a later date

   * _Load this backup into VOSpace and restore all the VOSpace metadata_

In order to support this, we may need to define a similar extension to the zip or gzip format that includes VOSpace metadata.

---+++ Survey specific FITS format

Having talked about this with some of our astronomers, one thing they mentioned would be useful is to be able to define extension types that represent data from specific surveys.

If we had a VO data type that represented _FITS image_, then they would like to be able to define a new type that represented a _FITS image from a specific survey_. This new type would extend the standard _FITS image_, and describe the specific FITS header fields that the particular survey used in its files.

The extension type would not define a new MIME type, so files of this type would inherit the standard MIME type from _FITS image_. However, the more specific content URI would point to the extension type, enabling users and software tools to process the data more accurately.

At this point I don't know if this is a good idea or not, or whether this information should be encoded in the content type or in a separate field.
However, our astronomers seemed to think that the ability to distinguish between a _generic FITS file_ and a _FITS file from a specific survey_ was important. _How_ we enable them to make this distinction is up for discussion.
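
As a rough illustration of the ideas on this page, the following Python sketch shows one way a format list with an inheritance hierarchy might behave. It is only a sketch under stated assumptions: the class names (=Format=, =FormatList=, =Transfer=) and the =example:= identifiers are made up for this illustration and are not registered URIs or part of any VOSpace specification. The MIME types used are =application/zip= and =image/fits=; the JAR entry inherits from ZIP following the reasoning in the JAR section above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Format:
    """One entry in a hypothetical VO format list."""
    uri: str                         # made-up identifier, not a real registry URI
    kind: str                        # "data" or "container"
    mime_type: Optional[str] = None  # None means "inherit from the parent format"
    parent: Optional[str] = None     # URI of the format this one specializes


class FormatList:
    """Hypothetical lookup table of formats, keyed by URI."""

    def __init__(self) -> None:
        self._formats: Dict[str, Format] = {}

    def add(self, fmt: Format) -> None:
        # Parents must be added first, so the hierarchy can never contain a cycle.
        if fmt.parent is not None and fmt.parent not in self._formats:
            raise KeyError(f"unknown parent format: {fmt.parent}")
        self._formats[fmt.uri] = fmt

    def mime_type(self, uri: str) -> Optional[str]:
        """Walk up the parent chain until a MIME type is found."""
        fmt = self._formats[uri]
        while fmt.mime_type is None and fmt.parent is not None:
            fmt = self._formats[fmt.parent]
        return fmt.mime_type

    def is_a(self, uri: str, ancestor: str) -> bool:
        """True if `uri` is `ancestor` or a specialization of it."""
        current: Optional[str] = uri
        while current is not None:
            if current == ancestor:
                return True
            current = self._formats[current].parent
        return False


@dataclass(frozen=True)
class Transfer:
    """What is being sent, and what the receiver should do with it."""
    content: str                     # URI of the data format being sent
    container: Optional[str] = None  # URI of the container format, if any
    unpack: bool = False             # True: store the members; False: store the container

    def describe(self) -> str:
        if self.container is None:
            return f"individual {self.content} files"
        action = "unpack and store members" if self.unpack else "store as is"
        return f"{self.container} containing {self.content} files ({action})"


def example_formats() -> FormatList:
    """Build a small list covering the examples discussed on this page."""
    formats = FormatList()
    formats.add(Format("example:zip", "container", "application/zip"))
    formats.add(Format("example:jar", "container", parent="example:zip"))
    formats.add(Format("example:fits-image", "data", "image/fits"))
    formats.add(Format("example:fits-image-mysurvey", "data",
                       parent="example:fits-image"))
    return formats
```

With this list, =example_formats().mime_type("example:fits-image-mysurvey")= resolves to the MIME type of the generic FITS image, while the more specific identifier remains available to tools that understand the survey. The =unpack= flag on =Transfer= is one possible way to express the difference between storing a zip file as is and unpacking it; whether that belongs in the format description or in a separate field is exactly the open question above.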
Topic revision: r3 - 2007-05-29 - DaveMorris