From pjq@eso.org Thu Jan 29 07:26:04 2004
Date: Fri, 23 Jan 2004 11:35:30 +0100
From: Peter Quinn
To: 'IVOA'
Subject: FW: Data Model WG report

------ Forwarded Message
> From: Jonathan McDowell
> Date: Thu, 22 Jan 2004 20:49:47 -0500 (EST)
> To: pjq@eso.org
> Subject: Data Model WG report
>
> IVOA Data Models WG Progress Report
>
> Summary
> -------
>
> The Data Models WG is working on several fronts:
> - UCDs
> - Simple data model for DAL WG's Simple Spectral Access
> - Full data model for astronomical data, currently divided
>   into Observation (high level to describe datasets such as images,
>   spectra etc. with all their metadata) and Quantity (reusable
>   low level standardization of arrays, keywords, coordinate systems).
>   The Quantity document is well advanced and will be recirculated
>   to the group shortly.
> - Component models for parts of the problem, of which Space-Time Metadata
>   is the most advanced.
> - Methods for serialization of models as XML
> - At the suggestion of G. Lemson, I have drafted a FAQ for the
>   goals of the working group, currently at
>   hea-www.harvard.edu/~jcm/vo/docs
>   (it will go on the twiki soon).
>
> Despite three months of furious activity, the WG has not yet achieved
> consensus on any standards. However, work on several documents has
> greatly advanced and they should reach IVOA Working Draft circulation
> stage soon.
>
> Activity has included three small technical meetings: one at ADASS
> (Observation data model, about 10 people participating); one in
> Baltimore in December (Quantity data model) with McDowell and Dittmar
> (SAO), Thomas and Shaya (GSFC), Berry (Starlink); and one in Garching
> this week (Observation again), expected to include McDowell (SAO),
> Louys (CDS), Giaretta (Starlink), Micol (ESO) and Lemson (GAVO).
> In between, we have been using the mailing list, which had a flood of
> exchanges from October to mid-December, and email and telecon exchanges
> with the smaller number of those actually working on documents.
>
> Technical Details
> -----------------
>
> * High level models
>
> *- Spectrum Model
>
> D. Tody and J. McDowell are collaborating on a simple Spectrum
> data model to be used in the Simple Spectral Data Access protocol.
> We tentatively propose that:
>
> - For spectral energy distributions, we represent photometry points
>   as single-point spectra and SEDs as a collection of spectra,
>   thus mixing continuous spectra and photometry in a single paradigm.
> - Observation details are shared by all points in a single spectrum.
> - An SED therefore consists of a number of Spectrum entries, each
>   of which contains observation metadata and a set of 1 or more data
>   points (and optionally a corresponding number of error values and flags).
> - The error model to be used should support at least two-sided errors
>   and upper limits, since these are important in SEDs.
> - The model can be serialized in FITS as a binary table with one
>   row per spectrum, using the variable-length array mechanism to
>   accommodate spectra of different lengths.
>
> The formal document has not yet been completed or circulated. A preliminary
> paper setting out ideas for the spectrum data model was presented by
> J. McDowell at ADASS and will be published in those proceedings.
>
> *- Observation model
>
> A technical meeting to reach consensus on an Observation data model was
> held in October in Strasbourg during the ADASS conference. Significant
> progress was made in reconciling the models proposed by different groups
> at the high level. A second technical meeting to prepare a document on
> the Observation model is planned for January 2004.
>
> We identified several common areas in our different views of
> the Observation model.
>
> - For array data such as images, the data itself will be represented by
>   a Quantity object.
> - A CoordinateSystem object is required (this is now to be incorporated
>   into the Frame object of the Quantity model).
> - A Curation object will hold traceability information and be compatible
>   with the curation data required by the Registry.
> - Coverage will summarize the bounds of the observation in at least the
>   space, time and frequency domains.
> - Provenance will combine information on the instrument and observation
>   process, and the data acquisition process (the boundary between these
>   being fuzzy for modern instruments). This model will include
>   Observatory, Instrument and Processing sub-models.
> - Source, Target and Field will need to be modelled to
>   describe the target of the observation.
>
> The next step is to elaborate each of these models, with the
> goal of agreeing on a simple representation that will be
> suitably extensible in future.
>
> *- XML Schema for astronomical objects
>
> In a poster at the ADASS meeting, E. Shaya presented an initial
> concept for an all-inclusive astronomical schema that begins
> with VO:AstroObject/Universes/Large-Scale Structures/Galaxies/Clusters
> etc. down to meteorites. Each node contains as children the
> appropriate parameters and attributes (luminosity, mass, position,
> velocity etc.). This could be used to support XMLQuery in a distributed
> astronomical data system. G. Lemson and P. Dowler also presented
> a knowledge model for astronomy which included both astronomical
> objects and the processes for observing them.
>
> * Low level models
>
> Work on a Quantity model is now well advanced with a subgroup
> consisting of J. McDowell (CfA), B. Thomas (Raytheon/UMD),
> M. Dittmar (CfA), Ed Shaya (Raytheon/UMD) and Canadian
> (P. Dowler) and UK (D. Berry) representatives iterating on
> a definition document.
> A Quantity model would define a standard
> method to connect numerical information with its errors, units,
> and physical semantics in a way that could be reused in the
> high level image and spectrum models. M. Dittmar and
> B. Thomas presented preliminary Quantity models at the ADASS meeting.
>
> Following a technical meeting in December
> in Baltimore, we identified broad areas of agreement:
>
> - A quantity model representing both single numerical values
>   and arrays of values should provide the common interface
>   to data items which may have units, errors and quality information
>   as well as semantics provided by a UCD or equivalent; the units,
>   coordinate system and UCD form a context which we will call
>   a Frame and may be used separately from the values.
> - When data values are provided in arrays, they may be associated
>   with coordinate axes which themselves have Frames giving
>   UCD semantics and units,
>   and whose coordinate values may be defined algorithmically
>   (by a WCS type function) or explicitly by enumerated values.
> - The interface to the quantity model should support alternate
>   Frames for both the Quantity values and (if present) the
>   coordinate axes. For example, a Q instance may know how to present
>   itself in several different coordinate systems.
> - The full StandardQuantity object inherits from a simple
>   BasicQuantity object that can be used for simple keyword-value pairs.
>
> We have recently resolved some outstanding technical issues:
>
> - whether a coordinate axis on an image can be represented
>   by a Quantity model (the axes and the Q itself have very similar
>   properties); this is now allowed.
> - whether a BasicQuantity without errors, coordinate system etc.
>   should be defined as a separate object which Quantity inherits
>   from, or whether it is defined as a restricted interface to
>   the Quantity object; this is now defined by inheritance.
> - whether the support for explicit values on coordinate axes
>   can be provided without undesirable software overhead; this
>   use case is now adequately addressed.
>
> We now plan to reach consensus with the one co-author who
> has been out of the loop on the last few iterations, and then
> circulate the document.
>
> * Descriptors and ontologies (UCDs)
>
> UCDs or something equivalent, already very useful in catalog
> cross-match applications, will be critical to identify physical
> phenomena in the serializations of the data models.
> The UCD2 proposal was presented at Strasbourg in October 2003, and got a
> mixed reception. The proposal (in the WG chair's opinion) is much more
> capable of describing the semantics of astronomical data; but some
> participants felt that it was too complicated and it wasn't clear how to
> apply it in practice. A UCD1+ proposal has recently been
> presented by Sebastien Derriere and seems promising.
>
> * Space-Time and regions
>
> The Space-Time Coordinate metadata schema's transformation from choice
> groups to substitution groups made some progress. A proper UML diagram,
> independent of the XML schema, is in an advanced state of completion.
> A. Rots is still planning to generate two or three simplified schemas that
> are upward compatible with the full schema. That upward compatibility
> is extremely important since it provides a mechanism to make defaults
> and implied assumptions explicit, which is a crucial requirement.
>
> General agreement was reached on short-hand notations for regions with
> the people working on the VOQL. This issue is similar to the issue of
> the simplified STC schemas mentioned above. Rots has promised a separate
> design document for the Regions specification for the first quarter of
> 2004.
>
> * Schema styles
>
> Ray Plante (UIUC) has proposed style standards for XML Schemas
> to be used in the VO.
------ End of Forwarded Message
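
For readers who find the Quantity discussion in the report easier to follow
in code, here is a minimal Python sketch of the areas of agreement listed
under "Low level models": a Frame holding units, UCD and coordinate system;
a StandardQuantity inheriting from BasicQuantity; axes whose coordinates are
either enumerated or computed by a function; and alternate Frames. It is
only an illustration under stated assumptions, not the WG's definition
document. The names Frame, BasicQuantity and StandardQuantity come from the
report; the Axis class, every field and method name, and the UCD strings are
hypothetical placeholders. For brevity the axis is a separate small class,
although the report notes that an axis may itself be a Quantity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Context for a set of values: units, UCD semantics, coordinate system."""
    unit: str
    ucd: str                             # placeholder semantic label
    coord_system: Optional[str] = None


@dataclass
class BasicQuantity:
    """A simple keyword-value pair, with no errors or Frame."""
    name: str
    value: object


@dataclass
class Axis:
    """Coordinate axis: values are enumerated or computed (hypothetical class)."""
    frame: Frame
    length: int
    values: Optional[Sequence[float]] = None             # explicit values
    function: Optional[Callable[[int], float]] = None    # WCS-type function

    def coordinates(self) -> List[float]:
        if self.values is not None:
            return list(self.values)
        if self.function is not None:
            return [self.function(i) for i in range(self.length)]
        raise ValueError("axis has neither explicit values nor a function")


@dataclass
class StandardQuantity(BasicQuantity):
    """Full quantity: adds Frame, errors, quality, axes, alternate Frames."""
    frame: Optional[Frame] = None
    errors: Optional[Sequence[float]] = None
    quality: Optional[Sequence[int]] = None
    axes: List[Axis] = field(default_factory=list)
    # Alternate Frames keyed by unit, each with a value-conversion function.
    alternates: Dict[str, Tuple[Frame, Callable[[float], float]]] = field(
        default_factory=dict
    )

    def values_in(self, unit: str) -> List[float]:
        """Present the array values in one of the alternate Frames."""
        _, convert = self.alternates[unit]
        return [convert(v) for v in self.value]


# A three-point flux array on an algorithmically defined wavelength axis.
wavelength = Axis(Frame("nm", "em.wl"), length=3,
                  function=lambda i: 500.0 + 10.0 * i)
flux = StandardQuantity(
    name="flux",
    value=[1.0, 2.0, 1.5],
    frame=Frame("Jy", "phot.flux.density"),
    errors=[0.1, 0.1, 0.2],
    axes=[wavelength],
    alternates={"mJy": (Frame("mJy", "phot.flux.density"),
                        lambda v: v * 1000.0)},
)
observer = BasicQuantity(name="OBSERVER", value="J. Doe")

print(wavelength.coordinates())   # [500.0, 510.0, 520.0]
print(flux.values_in("mJy"))      # [1000.0, 2000.0, 1500.0]
```

Keeping the Frame as its own immutable object mirrors the report's point
that the context "may be used separately from the values": the same Frame
can be attached to a quantity, to an axis, or passed around on its own.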