
Utypes Tiger Team

Utypes Use Cases

Raw use cases from Urbana-Champaign

  • UC #1. Serialize DM instances into a file
  • UC #2. Deserialize a DM instance from a file
  • UC #3. Embed STC information into VOTables
    • UC #3.1. Embed STC information in FITS
  • UC #4. Provide an abstract (de)serialization strategy that can work with any expressive enough file format. A client can instantiate an object equivalent to the object that was originally serialized
    • UC #4.1 Trivial roundtripping
  • UC #5. Link columns in a relational model of the registry to VOResource schema elements
  • UC #6. Tag metadata in a DAL query response
  • UC #7. Render datasets/archives VO-compliant
  • UC #8. Extensions of standard DMs
  • UC #9. Support serialization of multiple instances of the same DM class
  • UC #10. Standard, machine readable DM description
    • UC #10.1 Versioning of DM descriptions
    • UC #10.2 DM descriptions should express relationships between DMs (reuse, extensions)
  • UC #11. Documentation of DM fields
  • UC #12. Query archives by DM attribute (e.g. by observation’s target name)

Reviewed Use Cases and Requirements

Requirements:

  • R #1 General requirement: provide a serialization and deserialization strategy for generic data model instances using tabular data formats expressive enough to contain metadata. In other words, given a complex object, it should be possible to serialize it, without loss of information, into a generic tabular format so that a reader can reconstruct an instance of the original object. To do so, data models should be expressed in a machine-readable format, and the serialization strategy should be abstract enough to allow a generic library to serialize or deserialize objects using only the data model description and the instance itself. More concrete, format-dependent strategies should be addressed on a format-by-format basis, but they should always be consistent with the abstract strategy.

  • R #2 Utypes-XPATH: Utypes should be compatible with XPATH strings pointing to elements in a compliant XML instance. It should be possible, for instance, to use the Utypes in a VOTable as XPATH strings to find the same information in an XML instance of a Data Model.

  • R #3 Compliance with existing services and applications: compliant instances must remain usable by end users and applications, so transforming a non-compliant file into a compliant one of the same format should be as easy as possible and should not require changing the file, only adding metadata to it. This might not be possible in some very complex cases.

  • R #4 Support as many tabular formats as possible, in particular VOTable and FITS.
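
As a rough illustration of R #2, the sketch below treats a Utype as a dotted path and resolves it against an XML instance with Python's standard `xml.etree.ElementTree`. The XML snippet, the `lookup` helper, and the rule that the first path step must match the root element are assumptions made for this example; none of them comes from an IVOA specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance of a data model (element names are illustrative).
XML_INSTANCE = "<Spectrum><Target><Name>M51</Name></Target></Spectrum>"


def lookup(xml_text, utype):
    """Resolve a Utype such as 'spec:Spectrum.Target.Name' in an XML instance.

    Returns the element text, or None when the path does not match.
    """
    root = ET.fromstring(xml_text)
    # Drop the optional namespace prefix ('spec:'); keep the dotted path.
    path = utype.rpartition(":")[2]
    first, *steps = path.split(".")
    # Assumption: the first step names the root element of the instance.
    if first != root.tag:
        return None
    # Replace '.' with '/' to obtain a path relative to the root element.
    node = root.find("/".join(steps)) if steps else root
    return None if node is None else node.text


print(lookup(XML_INSTANCE, "spec:Spectrum.Target.Name"))  # prints M51
```

Under these assumptions a Utype whose first step differs from the root element, such as `ssa:Target.Name`, does not resolve, which is one way the prefix and root conventions listed under Current Practices affect parsability.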

Abstract Use Cases:

Data Model (de)serialization

  • UC #1 Serialize DM instances to file: given an instance of a Data Model and the DM machine readable description, a writer can serialize the instance into a number of supported tabular formats. The writer could be a DAL service.
  • UC #2 Deserialize DM instance from file: given a serialized instance of a Data Model in a supported tabular format and the DM machine readable description, a reader can deserialize the instance into memory, building an object consistent with the DM itself.
  • UC #3 Trivial round-tripping: given a serialized instance of a Data Model in a supported tabular format, an I/O library (possibly model-unaware) can convert the instance into a different, supported format without breaking its VO compliance.
  • UC #4 Represent an arbitrary number of instances of the same class in a DM instance (for example, N instances of the PhotometryFilter class in a PhotometryCatalog instance of the Spectral DM). [Omar: in Utype terms this means that the same Utype could be used several times to describe attributes of several different instances in the same file. Also, several Utyped values should be bundled together in some way, so that each instance of the class can be reconstructed.]
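
The use cases above can be sketched in a few lines, assuming a DM instance is held in memory as a nested dictionary. Each instance is flattened into one group of (Utype, value) pairs, so the same Utype can appear once per group, and each group can be rebuilt into an equivalent object. The function names, the attribute names under `PhotometryFilter`, and the values are hypothetical, and the grouping by list position merely stands in for whatever bundling mechanism a concrete format offers.

```python
def flatten(obj, prefix):
    """Yield (utype, value) pairs for a nested dict, depth first."""
    for key, value in obj.items():
        utype = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten(value, utype)
        else:
            yield utype, value


def serialize(instances, class_name):
    """Return one group of (utype, value) pairs per instance."""
    return [list(flatten(instance, class_name)) for instance in instances]


def deserialize(group, class_name):
    """Rebuild one nested dict from a group of (utype, value) pairs."""
    obj = {}
    for utype, value in group:
        # Strip the leading 'ClassName.' and walk the remaining steps.
        steps = utype[len(class_name) + 1:].split(".")
        node = obj
        for step in steps[:-1]:
            node = node.setdefault(step, {})
        node[steps[-1]] = value
    return obj


# Two instances of the same class; attribute names and values are hypothetical.
filters = [
    {"name": "u", "transmissionCurve": {"Access": {"format": "votable"}}},
    {"name": "g", "transmissionCurve": {"Access": {"format": "votable"}}},
]
groups = serialize(filters, "PhotometryFilter")
restored = [deserialize(group, "PhotometryFilter") for group in groups]
print(restored == filters)  # prints True
```

Without the grouping, the two `PhotometryFilter.name` values could not be assigned to the right instance, which is the bundling problem raised in UC #4.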

VO Tools

  • UC #4 VO-Importer: given a non-compliant file (or set of files) and the library of all the DM descriptions, an importer application can allow users to map columns and parameters in the file (or the set of files, or database) to IVOA DM attributes, thus producing a compliant version of the file (or set of files).
  • UC #5 VO-Publisher: given a database and the library of all the DM descriptions, a helper application can allow data providers to map tables and columns in a database to IVOA DM attributes, in order to build a DAL service.
  • UC #6 VO-Query: given a compliant archive/service it is possible to query it by using Utypes to refer to data model elements (for instance, query all observations for a target whose name is given). [Omar: I am not sure I understand this: does it mean that I could, for instance, query an archive for all the SDSS.g and SDSS.u magnitudes using Utypes?]

Data Model description

  • UC #7 Data Model representation: DMs should be represented by a machine-readable description that makes it possible to:
    • UC #7.1 Describe and document Data Model elements.
    • UC #7.2 Keep versioning information about the DM.
    • UC #7.3 Reuse an existing DM in a new DM.
    • UC #7.4 Extend an existing DM with a new DM.
    • UC #7.5 Abstract the creation of VO-compliant I/O libraries from the details of the single DM. Depending on the programming language, each DM would be represented by some kind of plugin of the generic library. [Omar: this use case is actually a consequence of the implementation of the other 7.x cases].

Others

  • UC #8. Link columns in a relational model of the registry to VOResource schema elements

Concrete Use Cases:
(Photometry Catalog, points in columns)

Represent a Photometry Catalog with a definite number of Magnitudes expressed in columns and astronomical sources in rows. For example, an SDSS catalog with the following columns:

SDSSID | RA | DEC | U | G | R | I | Z

(Photometry Catalog, points in rows)

A Photometry Catalog could refer to a single object observed in a number of filters, or to different objects observed in a number of filters, and the number of filters could be arbitrary. An efficient relational approach suggests representing this as a table where each magnitude is expressed in a different row, and the other information (object name, coordinates, instrument, filter, etc.) is in columns, or is factored out into the table header if it is common to all points.

For instance, here is a (simple) example of an (unnormalized) catalog for different sources. Notice that this table doesn't use any controlled vocabulary for filters, target names and instruments, while VO documents should:

TargetName | RA | DEC | Instrument | Filter | Magnitude | Units
M51 | xx.yy | xx.yy | SDSS | u | xx | ABMAG
M51 | xx.yy | xx.yy | SDSS | g | xx | ABMAG
NGC1068 | xx.yy | xx.yy | GALEX | nuv | xx | Jy
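
To make the relation between the two layouts concrete, here is a minimal sketch that converts one "points in columns" record into "points in rows" records. The column names follow the two example tables; the identifier and numeric values are made up for illustration, and lowercasing the band name to obtain the filter label is an assumption, since no controlled vocabulary is used here.

```python
# Magnitude columns of the "points in columns" layout.
BANDS = ("U", "G", "R", "I", "Z")


def columns_to_rows(record):
    """Turn one wide record into one row per magnitude."""
    # Everything that is not a magnitude is shared by all generated rows.
    shared = {key: value for key, value in record.items() if key not in BANDS}
    return [
        {**shared, "Filter": band.lower(), "Magnitude": record[band]}
        for band in BANDS
        if band in record
    ]


# Made-up values, for illustration only.
record = {"SDSSID": "obj-1", "RA": 10.0, "DEC": 20.0,
          "U": 19.1, "G": 18.2, "R": 17.8, "I": 17.5, "Z": 17.4}
rows = columns_to_rows(record)
print(len(rows))  # prints 5
```

In the columns layout the filter is implicit in the column name, while in the rows layout it becomes data; a Utype annotation has to cope with both.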

(Aggregated SED)

An Aggregated SED is defined as an aggregation of different segments of spectro-photometric data, where each segment can be a Photometry Catalog, a Spectrum or an entire SED itself. It should be possible to serialize an SED as a list of several tables, each table representing a segment. Complex formats like VOTable and FITS can allow the tables to be stored in the same file.

(STC serialization)

It has to be possible to embed STC instances in tables and attach these instances to other objects in the table. This is a generic example of Model reuse (the STC model is reused by other models)

(STC in a Spectrum)

This is a more specific example of model reuse. The Spectral DM uses some parts of the STC DM to describe the reference frame of the observations: the frame can include several axes (time, spectral coordinate, flux, photometry filters). Some of these axes could be different instances of the same STC or STC-derived class, like the photometry filters. This has to be represented in a generic tabular format using Utypes to describe the structure of the serialized instances. Different instances of the same class must somehow be disentangled from one another.

(Create compliant SSA service using a database and a non-compliant archive)

Given a database and an archive containing spectra serialized in a non-compliant way (but in a supported format, like FITS), a data publisher might want to create a VO service. In principle this would require creating a new database (or a view on the original one) and copying and changing the headers of the non-compliant files. A more efficient solution would be to leave the archive and the database untouched and add a layer on top of them: the layer would add the required metadata to the original files on the fly (see R #3). For example, the service can read the information in the database and fill a VOTable-compliant header (combining the database values with the predefined Utypes) that wraps the original FITS file in the response.
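
A minimal sketch of such a layer, assuming the database row is available as a dictionary and that a publisher-supplied mapping links column names to Utypes. The column names, the `build_header` helper, and the choice to emit every value as a `char` PARAM are assumptions for illustration. The output is only a TABLE fragment, not a complete VOTable document.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from database column names to predefined Utypes.
UTYPE_MAP = {"target": "ssa:Target.Name"}


def build_header(db_row, utype_map):
    """Build a TABLE element whose PARAMs carry database values and Utypes."""
    table = ET.Element("TABLE")
    for column, value in db_row.items():
        utype = utype_map.get(column)
        if utype is None:
            # Columns without a Utype mapping are left out of the header.
            continue
        ET.SubElement(table, "PARAM", {
            "name": column,
            "datatype": "char",   # simplification: all values emitted as strings
            "arraysize": "*",
            "value": str(value),
            "utype": utype,
        })
    return table


header = build_header({"target": "M51", "internal_id": 42}, UTYPE_MAP)
print(ET.tostring(header, encoding="unicode"))
```

The original files and the database are only read, never modified; all VO-specific metadata lives in the generated header.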

Current Practices and Uses

Spectrum 1.1 REC, Quality Flags: FluxAxis.Quality.n, where n is an integer. (Parsable? Anyway this is gone in Spectral 2.0)

Photometry 1.0 PR, Access class: “we use the Access class defined in ObsTAP and inherited from SSA” -> PhotometryFilter.transmissionCurve.Access.*

Photometry 1.0 PR, Spectrum is imported using the spec namespace (notice the difference with the previous approach).

Namespace (in several DMs): the namespace must be parsed out of the Utype string… but then again which is the actual Utype string?

Extensibility (e.g. NED SED): FluxAxis.Published.Value: is this Utype by any chance related to the standard FluxAxis or to Target.Name? (How can I infer it?)

Introduced as an attribute for FIELD and PARAM in VOTable 1.2:

  • Maps FIELD/PARAM to a DM attribute
  • Encourages use of the XML namespace convention for avoiding name collisions
  • Encourages use of the XML xmlns for linking to the DM
  • Highlights the usefulness of utypes for space-time coordinates and provides an example for STC
  • Does not say anything about parsability

Redefined in SSA 1.1:

  • The goal of utypes is to “flatten a hierarchical data model so that all fields are represented by fixed strings in a flat namespace”
  • They are introduced as “fixed” strings, but no explanation is given on the meaning of “fixed”.
  • “Of course, if a data model becomes complex enough this will no longer be possible”
  • Introduces a serialization mechanism for multiple instances (multiple equal Utypes in the same file), providing an example using serialization-specific features, for VOTable.
  • Does not say anything explicit about parsability, however…
  • In other sections (e.g. query response metadata) other features are introduced: Utype is built with the pseudo-grammar “”.””

spec:Spectrum.Target.Name and ssa:Target.Name are the same thing.

  • More information about utypes in Section 4.2.7 (Metadata Extension Mechanism)

Redefined in Spectrum 1.1, also introducing Data Model inheritance:

  • Analogy with XPATH (‘.’ instead of ‘/’). “a.b.c.d”, dots indicate “has-a” relationship (3.5)
  • ‘Data Model Field’ and ‘Utype’ interchangeable (3.5)
  • “Other IVOA standards may use a different prefix instead of “Spectrum.” … This represents Data Model inheritance.” (3.5)
  • “the utypes can be used to infer the data model structure” (8.2)

Most DMs define utypes in tables, using different conventions

Utype strings can change when DMs are reused. Also, the namespace changes globally for each DM (spec:Target.Name, ssa:Target.Name)
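
The parsing problem can be illustrated with a short sketch that splits a Utype into its namespace prefix and path steps, then compares two Utypes by their trailing steps. The suffix comparison is a heuristic invented for this example, not a rule taken from any of the documents above, and it shows how fragile such inference is: it treats spec:Spectrum.Target.Name and ssa:Target.Name as the same attribute, but it would equally match unrelated paths that happen to share a suffix.

```python
def split_utype(utype):
    """Split 'prefix:A.b.c' into ('prefix', ['A', 'b', 'c'])."""
    prefix, separator, path = utype.partition(":")
    if not separator:
        # No namespace prefix present; the whole string is the path.
        return None, utype.split(".")
    return prefix, path.split(".")


def same_attribute(first, second):
    """Heuristic: ignore prefixes and compare the trailing path steps."""
    _, first_steps = split_utype(first)
    _, second_steps = split_utype(second)
    shorter, longer = sorted((first_steps, second_steps), key=len)
    return longer[-len(shorter):] == shorter


print(split_utype("spec:Spectrum.Target.Name"))
print(same_attribute("spec:Spectrum.Target.Name", "ssa:Target.Name"))  # prints True
```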

Utypes are only partially used in FITS serializations: they can be used for columns, not for parameters: in this case, an arbitrary 8 char string is provided by the DM document.

DMs do not define an “xmlns” link to the DM URI

Minutes of telecon 2012-08-21

Persons present: Omar, Markus, Gerard, Matthew, Patrick

Points discussed (not in temporal order):

Starting with a clean slate as far as the current working draft is concerned (while harvesting from the current text what's useful): None of the persons present object.

But: We'll have to first come up with a report that summarizes current usage of utypes (plus, when we have a better idea of where we want to go, the impact of any changes to current practice). Matthew volunteers to edit this report. Markus will help out.

Omar will be editor of the working draft, and we assume Mireille will also want to be part of the effort.

Omar will circulate a list of the use cases over the DM list; there's already something on this page (http://wiki.ivoa.net/twiki/bin/view/IVOA/UtypesTigerTeam), but it needs smoothing and elaboration. Documents will end up in volute.

We need to make explicit what the prime use cases driving utypes are. Not all of the ones on the wiki may be achievable.

Attempts to define utypes: Are they "pointers into a data model instance", much like XPath points into an XML document? That much seems not very contentious. As far as this function is concerned, inheritance and embedding are not dealt with very well at present.

"I should be able to serialize and deserialize a data model instance in any way I want" using utypes (Matthew). But we obviously need to pose additional constraints on what the target formats need to be able to do (e.g., represent key-value pairs).

What formats do we actually care about? VOTable obviously, FITS probably (though some say people that want FITS won't care about data models), SAMP messages. JSON? No agreement. CSV doesn't have enough of a header to allow utype mapping, probably.

We've got about 8 weeks until the interop. What do we want to get done by then? We want the use cases in a form so TCG/plenary/whoever can decide which absolutely have to be covered, where we as Tiger Team provide comments/suggest priorities. We also must have the overview of where utypes are already used. Telecons every two weeks.

Topic revision: r3 - 2012-09-04 - OmarLaurino
 