-- MireilleLouys - 2012-09-10
Comments on the use cases:
A. There is an underlying core use case not mentioned above:
Create labels that map a piece of metadata (any_name, value) to an IVOA data model field, if such a field exists and if its usage as defined in the model matches its usage in the service or application under consideration.
Utypes were originally invented to "attach" to a metadata value the name of a data model attribute in an object-oriented model describing astronomical observations and simulations. This was designed for a set of metadata attached to an observation and represented in a VOTable file.
The idea is simply to propose, as general uniform 'names', a set of strings logically constructed from the names of the classes and attributes defined in data models.
The variety of FITS keywords and vocabulary extensions defined in various archives shows that defining a uniform language for all pieces of metadata is too complex a problem.
Relying on a recommended modeled arrangement of metadata (the current set of recommended IVOA data models) provides a framework for that.
Even though it is restricted and bounded, it covers the common use cases.
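The labelling idea in point A can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `make_utype` and `label_metadata`, the `LABELS` table and the keyword names `RA_OBJ`/`DEC_OBJ` are hypothetical and not taken from any IVOA document; the STC attribute paths are the ones used in the VOTable example further down this page.

```python
def make_utype(prefix, *path):
    """Join a model prefix and a class/attribute path into a utype-like string."""
    if not prefix or not path:
        raise ValueError("a prefix and at least one path element are required")
    return f"{prefix}:{'.'.join(path)}"


# Hypothetical mapping from service-specific metadata names to model attributes.
LABELS = {
    "RA_OBJ": make_utype("stc", "AstroCoords", "Position2D", "Value2", "C1"),
    "DEC_OBJ": make_utype("stc", "AstroCoords", "Position2D", "Value2", "C2"),
}


def label_metadata(name, value):
    """Return (name, value, utype); utype is None when no model field applies."""
    return name, value, LABELS.get(name)
```

Metadata with no matching model field simply stays unlabelled, which mirrors the "if it exists" condition above.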
B. Flat serialisation is not the only case to consider
Today we can distinguish several ways to distribute data sets: ( -- please iterate on this to enrich this section ... -- )
-- MireilleLouys - 2012-09-10/ updated Sept 12
Remarks on breaking out reqs 1a through 1c: Supporting 1a and 1b introduces a whole new set of constraints on the actual method(s) employed, and I'm pretty sure their cost in terms of specification complexity is rather high (e.g., 1b requires that we define a DM specification language). If we go this way, we should do so knowing the cost. As to 1c, that's an el-cheapo dumb-down of 1b that's probably much easier to achieve but might just be good enough. -- MarkusDemleitner - 2012-09-12
Remarks on 8: The classic example (I'm reluctant to call it a use case) is that an STC library should be able to identify and, e.g., transform coordinates given in characterization metadata even if it has no idea that something like char exists. -- MarkusDemleitner - 2012-09-12
Remark on 9: This is currently realized by way of FIELDref/PARAMref; an obvious -- and format-agnostic -- alternative is to allow multiple "pointers" in a single utype, e.g., by concatenating individual foo:bar.quux strings with a separator (";" was once floated as a candidate for that). Other solutions, possibly less demanding on the format (FIELDref/PARAMref) or the utype format itself (internal structure) are certainly conceivable. -- MarkusDemleitner - 2012-09-14
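As a rough sketch of the "multiple pointers in a single utype" alternative: the function name `split_utype_pointers` is made up for illustration, and ";" is only the candidate separator mentioned above, not an agreed convention.

```python
def split_utype_pointers(utype, separator=";"):
    """Split a utype attribute value into its individual pointers.

    Whitespace around each pointer is dropped and empty pieces are ignored.
    """
    return [part.strip() for part in utype.split(separator) if part.strip()]


pointers = split_utype_pointers("foo:bar.quux; baz:spam.eggs")
```

A client that only understands one of the models can then pick the pointers whose prefix it knows and ignore the rest.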
Represent a Photometry Catalog with a definite number of Magnitudes expressed in columns and astronomical sources in rows. For example, an SDSS catalog with the following columns:
SDSSID | RA | DEC | U | G | R | I | Z
A Photometry Catalog could refer to a single object observed in a number of filters, or to different objects observed in a number of filters, and the number of filters is arbitrary. An efficient relational approach suggests representing this as a table in which each magnitude occupies its own row, while the other information (object name, coordinates, instrument, filter, etc.) is stored in columns, or factored out into the table header when it is common to all points.
For instance, here is a (simple) example of an (unnormalized) catalog for different sources. Notice that this table doesn't use any controlled vocabulary for filters, target names and instruments, while VO documents should:
TargetName | RA | DEC | Instrument | Filter | Magnitude | Units
M51 | xx.yy | xx.yy | SDSS | u | xx | ABMAG
M51 | xx.yy | xx.yy | SDSS | g | xx | ABMAG
NGC1068 | xx.yy | xx.yy | GALEX | nuv | xx | Jy
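To make the difference between the two layouts concrete, here is a minimal sketch that turns the column-per-filter layout (SDSSID | RA | DEC | U | G | R | I | Z) into the row-per-magnitude layout. The function name `wide_to_long` and all values in `EXAMPLE` are invented placeholders, not real SDSS data.

```python
BANDS = ("U", "G", "R", "I", "Z")


def wide_to_long(rows, bands=BANDS):
    """Turn one row per source into one row per (source, filter) measurement."""
    long_rows = []
    for row in rows:
        for band in bands:
            if row.get(band) is None:
                continue  # no measurement in this filter
            long_rows.append({
                "SDSSID": row["SDSSID"],
                "RA": row["RA"],
                "DEC": row["DEC"],
                "Filter": band.lower(),
                "Magnitude": row[band],
            })
    return long_rows


# Placeholder values, for illustration only.
EXAMPLE = [{"SDSSID": "obj-1", "RA": 10.0, "DEC": 20.0,
            "U": 19.1, "G": 18.2, "R": 17.9, "I": None, "Z": 17.5}]
long_rows = wide_to_long(EXAMPLE)
```

The long form copes with an arbitrary number of filters without changing the table structure, at the price of repeating the source columns in every row.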
An Aggregated SED is defined as an aggregation of different segments of spectro-photometric data, where each segment can be a Photometry Catalog, a Spectrum or an entire SED itself. It should be possible to serialize an SED as a list of several tables, each table representing a segment. Complex formats like VOTable and FITS allow the tables to be stored in the same file.
It has to be possible to embed STC instances in tables and attach these instances to other objects in the table. This is a generic example of model reuse (the STC model is reused by other models).
Since right now, the only non-deprecated way to include STC metadata in VOTables relies on utypes, it is particularly unfortunate we have no way of doing this that's REC. I'm counting this as a separate use case since IMHO it's a particular shame something as basic is almost undefined in our recommended format -- the format we, as the VO community, actually control.
Also, IMHO ideally clients should not have to worry about what (if any) data model some data within a VOTable conforms to. Just as any VOTable library could support (the now deprecated) COOSYS element, support for "modern" STC metadata should be "generic".
Sorry for littering the use case with discussions on practice; if you have a better place for this stuff, please do move it there.
One plan to do this is described in Referencing STC in VOTable. The basic idea is to collect all information pertaining to STC (or even some other data model) in one group, like this:
<GROUP utype="stc:CatalogEntryLocation">
  <PARAM name="CoordFlavor" datatype="char" arraysize="*"
    utype="stc:AstroCoordSystem.SpaceFrame.CoordFlavor" value="SPHERICAL"/>
  <PARAM name="CoordRefFrame" datatype="char" arraysize="*"
    utype="stc:AstroCoordSystem.SpaceFrame.CoordRefFrame" value="ICRS"/>
  <PARAM name="ReferencePosition" datatype="char" arraysize="*"
    utype="stc:AstroCoordSystem.TimeFrame.ReferencePosition" value="BARYCENTER"/>
  <PARAM name="TimeScale" datatype="char" arraysize="*"
    utype="stc:AstroCoordSystem.TimeFrame.TimeScale" value="TT"/>
  <PARAM name="Epoch" datatype="char" arraysize="*"
    utype="stc:AstroCoords.Position2D.Epoch" value="2010.2"/>
  <PARAM name="yearDef" datatype="char" arraysize="*"
    utype="stc:AstroCoords.Position2D.Epoch.yearDef" value="J"/>
  <PARAM name="TimeInstant" datatype="char" arraysize="*"
    utype="stc:AstroCoords.Time.TimeInstant" value="2002-01-28T09:30:00"/>
  <PARAM name="URI" datatype="char" arraysize="*"
    utype="stc:DataModel.URI" value="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"/>
  <FIELDref ref="raErr" utype="stc:AstroCoords.Position2D.Error2.C1"/>
  <FIELDref ref="deErr" utype="stc:AstroCoords.Position2D.Error2.C2"/>
  <FIELDref ref="ra" utype="stc:AstroCoords.Position2D.Value2.C1"/>
  <FIELDref ref="de" utype="stc:AstroCoords.Position2D.Value2.C2"/>
  <FIELDref ref="pmra" utype="stc:AstroCoords.Velocity2D.Value2.C1"/>
  <FIELDref ref="pmde" utype="stc:AstroCoords.Velocity2D.Value2.C2"/>
</GROUP>
<FIELD ID="ra" name="ra" datatype="float"/>
<FIELD ID="de" name="de" datatype="float"/>
<FIELD ID="raErr" name="raErr" datatype="float"/>
<FIELD ID="deErr" name="deErr" datatype="float"/>
<FIELD ID="pmra" name="pmra" datatype="float"/>
<FIELD ID="pmde" name="pmde" datatype="float"/>
One advantage of this scheme is that it's fairly easy to isolate the STC parsing/unparsing code from the rest of the VOTable handling since the stuff it has to operate on is "just an element", not many elements spread out over the entire document.
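The isolation argument can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree`. It assumes a simplified, namespace-free fragment (real VOTables normally carry an XML namespace, which would have to be handled), and the function name `extract_stc_group` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modelled on the GROUP example above.
FRAGMENT = """
<TABLE>
  <GROUP utype="stc:CatalogEntryLocation">
    <PARAM name="CoordRefFrame" datatype="char" arraysize="*"
           utype="stc:AstroCoordSystem.SpaceFrame.CoordRefFrame" value="ICRS"/>
    <FIELDref ref="ra" utype="stc:AstroCoords.Position2D.Value2.C1"/>
    <FIELDref ref="de" utype="stc:AstroCoords.Position2D.Value2.C2"/>
  </GROUP>
  <FIELD ID="ra" name="ra" datatype="float"/>
  <FIELD ID="de" name="de" datatype="float"/>
</TABLE>
"""


def extract_stc_group(table):
    """Return (params, columns) for the first GROUP carrying an stc: utype.

    params maps utype -> literal value, columns maps utype -> FIELD ID.
    """
    for group in table.iter("GROUP"):
        if group.get("utype", "").startswith("stc:"):
            params = {p.get("utype"): p.get("value")
                      for p in group.findall("PARAM")}
            columns = {r.get("utype"): r.get("ref")
                       for r in group.findall("FIELDref")}
            return params, columns
    return {}, {}


params, columns = extract_stc_group(ET.fromstring(FRAGMENT))
```

Everything the STC code needs is reachable from the one GROUP element; the FIELD declarations are only consulted through the IDs collected in `columns`.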
This scheme also lets you embed multiple data models in a single VOTable. The "primary" data model (e.g., obscore or spectrum in ObsTAP or SSAP, respectively) could still use the FIELD's utype attributes; even though the "primary" data models only have crippled STC metadata (and would not, e.g., support proper motions), such information can still be transmitted in the VOTable and evaluated by clients. Here's an example that could be part of an obscore response in which the server also provides information on SSA:
<GROUP utype="stc:CatalogEntryLocation">
  <PARAM name="CoordFlavor" datatype=... etc, as above
</GROUP>
<GROUP utype="spec:Spectrum">
  <FIELDref ref="ra" utype="spec:Target.pos.ra"/>
  <FIELDref ref="de" utype="spec:Target.pos.dec"/>
  ... (whatever else is in the table for Spectrum) ...
</GROUP>
<FIELD ID="ra" name="ra" datatype="float"
  utype="obscore:char.spatialaxis.coverage.location.coord.position2d.value2.c1"/>
<FIELD ID="de" name="de" datatype="float"
  utype="obscore:char.spatialaxis.coverage.location.coord.position2d.value2.c2"/>
...
(ignoring the fact that I've probably made up spec utypes and obscore talks about observation rather than target; I guess you get the drift anyway). Of course, it would be much nicer if we could agree on throwing overboard most of the existing practice and could just say
<GROUP utype="stc:CatalogEntryLocation" ID="targetPos">
  <PARAM name="CoordFlavor" datatype="char" arraysize="*"
    utype="stc:AstroCoordSystem.SpaceFrame.CoordFlavor" value="SPHERICAL"/>
  ... etc, as before
</GROUP>
<GROUP utype="spec:Spectrum">
  <GROUPref ref="targetPos" utype="spec:Target.pos"/>
</GROUP>
But it's probably too late for that, we don't have a GROUPref element in the first place, and less referencing is better as a rule.
-- MarkusDemleitner - 2012-09-20
This is a more specific example of model reuse. The Spectral DM uses some parts of the STC DM to describe the reference frame of the observations. The frame can include several axes: time, spectral coordinate, flux, photometry filters. Some of these axes could be different instances of the same STC or STC-derived class, like the photometry filters. This has to be represented in a generic tabular format using Utypes to describe the structure of the serialized instances. Different instances of the same class must somehow be distinguished from one another.
Given a database and an archive containing spectra serialized in a non-compliant way (but in a supported format, like FITS), a data publisher might want to create a VO service. In principle this would require creating a new database (or a view on the original one) and copying and changing the headers of the non-compliant files. A more efficient solution would be to leave the archive and the database untouched and add a layer on top of them that adds the required metadata to the original files on the fly (see R #4). For example, the service can read the information in the database and fill a VOTable-compliant header (combining the database values with the predefined Utypes) that wraps the original FITS file in the response.
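A minimal sketch of such a layer, assuming the database row arrives as a plain dictionary. The column names `target_name` and `internal_id`, the `UTYPE_MAP` table and the function `build_header` are hypothetical; only the utype string `spec:Spectrum.Target.Name` is taken from this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from local database columns to predefined utypes.
UTYPE_MAP = {
    "target_name": "spec:Spectrum.Target.Name",
}


def build_header(record, utype_map=UTYPE_MAP):
    """Build a GROUP of PARAMs for the columns that have a known utype.

    Columns without an entry in utype_map are left out of the header.
    """
    group = ET.Element("GROUP")
    for column, value in record.items():
        utype = utype_map.get(column)
        if utype is None:
            continue
        ET.SubElement(group, "PARAM", name=column, datatype="char",
                      arraysize="*", utype=utype, value=str(value))
    return group


header = build_header({"target_name": "M51", "internal_id": 42})
header_xml = ET.tostring(header, encoding="unicode")
```

The archive files and the database stay untouched; only the mapping table has to be maintained by the publisher.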
Spectrum 1.1 REC, Quality Flags: Data.FluxAxis.Quality.n, where n is an integer. (Parsable? Anyway this is gone in Spectral 2.0)
Photometry 1.0 PR, Access class: “we use the Access class defined in ObsTAP and inherited from SSA” -> PhotometryFilter.transmissionCurve.Access.*
Photometry 1.0 PR, Spectrum is imported using the spec namespace (notice the difference with the previous approach).
Namespace (in several DMs): the namespace must be parsed out of the Utype string… but then again which is the actual Utype string?
Extensibility (e.g. NED SED): Data.FluxAxis.Published.Value: is this Utype by any chance related to the standard Data.FluxAxis or to Target.Name? (How can I infer it?)
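The namespace and extensibility questions can be made concrete with a naive sketch. The helper names `split_utype` and `may_extend` are invented, and the prefix test is only a string heuristic: it can suggest that a Utype extends a standard path, but it cannot prove it.

```python
def split_utype(utype):
    """Split 'prefix:path' into (prefix, path); prefix is None if absent."""
    prefix, sep, path = utype.partition(":")
    if not sep:
        return None, utype
    return prefix, path


def may_extend(utype, standard_path):
    """Heuristic: does the utype path equal or start with standard_path?"""
    _, path = split_utype(utype)
    return path == standard_path or path.startswith(standard_path + ".")


prefix, path = split_utype("spec:Spectrum.Target.Name")
related = may_extend("Data.FluxAxis.Published.Value", "Data.FluxAxis")
```

Without a rule stating whether the prefix is part of the Utype, a client cannot tell which of the two returned pieces it is supposed to compare.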
Introduced as an attribute for FIELD and PARAM in VOTable 1.2:
spec:Spectrum.Target.Name and ssa:Target.Name are the same thing.
Utype strings can change when DMs are reused. Also, the namespace changes globally for each DM (spec:Target.Name, ssa:Target.Name).
Utypes are only partially used in FITS serializations: they can be used for columns but not for parameters; for parameters, the DM document provides an arbitrary 8-character string instead.
DMs do not define an “xmlns” link to the DM URI.
See here for a discussion of how VO-URP can support UTYPEs.