Characterisation Data Model RFC

This page acts as the RFC centre for the Characterisation Data Model 1.11 Proposed Recommendation.

Review period: 1 June 2007 to 4 July 2007
TCG evaluation: 10 September 2007
An updated version of the document, taking the comments into account, is available at:
http://www.ivoa.net/Documents/PR/DM/CharacterisationDM-20070914.pdf
(if not yet available, see http://alinda.u-strasbg.fr/Model/Characterisation/CharacterisationDM_PRUpdate_20070914.pdf)

Two other important documents should be checked together with the model: one describing existing implementations, and one listing the XML element names derived from the model for metadata tagging (Utypes).

Implementations of this model are described in the following note, submitted today (13 August 2007):

http://www.ivoa.net/Documents/Notes/ImplemtationCharacDM/ImplementationCharacterisation-20070813.pdf

Utypes derived from the UML model are listed and commented on in the following note: http://www.ivoa.net/Documents/latest/UtypeListCharacterisationDM.html

An XML schema has been built from the UML model and is available at: http://www.ivoa.net/xml/Characterisation/Characterisation-v1.11.xsd

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your WikiName so authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Discussion about any of the comments or responses should be conducted on the data model mailing list, dm@ivoa.net.

Comments


  • Sample comment (by BrunoRino): ...
    • Response (by authorname): ...

Hello, instead of having many collections of Properties for each instance of Characterisation (like Resolution, SamplingPrecision, etc.), one for each axis as shown in Figure 4, why not have a single collection of CharacterisationAxis instances, each of which would contain the properties for its axis? I think it would make the design clearer. It is also convenient because each CharacterisationAxis instance can be associated with a single coordinate frame, which then also applies to each of its properties.

The XML representation of the model has this feature: each CharacterisationAxis is a node under which we have the properties Coverage, Resolution and Sampling for this axis. For the model, we want to keep the symmetry, the matrix-like design shown in the different examples. One could search or group the metadata in Property-first order and have, for example, Resolution for all axes, then Coverage for all axes, etc. When the axes are not independent (for instance, when space depends on time), Resolution may be represented by a multi-variable function of all axes, and will be hooked into the Resolution class.
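To make the symmetry argument concrete, here is a minimal Python sketch, assuming all axes are independent so that the characterisation fits a property-by-axis grid. The property and axis names follow the discussion above; the values and the `axis_first` helper are hypothetical illustrations, not part of the Characterisation schema or its Utypes.

```python
from typing import Dict

# Property-first grid: property name -> axis name -> value.
Grid = Dict[str, Dict[str, str]]


def axis_first(property_first: Grid) -> Grid:
    """Regroup a Property-first grid as axis name -> property name -> value.

    Missing cells (e.g. no Resolution given for the Time axis) simply
    stay missing; nothing is invented.
    """
    by_axis: Grid = {}
    for prop, per_axis in property_first.items():
        for axis, value in per_axis.items():
            by_axis.setdefault(axis, {})[prop] = value
    return by_axis


# Hypothetical values, for illustration only.
char: Grid = {
    "Coverage": {"Spatial": "circle r=0.1 deg", "Spectral": "400-700 nm", "Time": "2007-06-01"},
    "Resolution": {"Spatial": "1.2 arcsec", "Spectral": "0.5 nm"},
}

print(axis_first(char)["Spectral"])
# {'Coverage': '400-700 nm', 'Resolution': '0.5 nm'}
```

Both views hold the same cells, so either traversal order can be derived from the other. The regrouping breaks down as soon as one value, such as a Resolution that is a function of several axes, no longer belongs to a single cell; that is the case the response above keeps attached to the Resolution class itself.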


Several documents related to Characterisation are referenced by URLs on alinda.u-strasbg.fr, but the links do not work. I was especially interested in the VOTable serialisations given as http://alinda.u-strasbg.fr/Model/Characterisation/examples/MPFSVOt-v1.1.xml

I really like the document: well structured, easy to read, with well explained concepts and clear examples.


Actually, this link is available; the machine was simply down when you tried. We plan to have all these examples on the IVOA site anyway.


Comments from the Working Group Chairs and Interest Group Chairs

Chairs should add their comments under their name.

Mark Allen (Applications WG)

I approve

Christophe Arviset (TCG vice Chair)

    • Responses by M. Louys follow each point below.
I approve this document, provided that the following comments are implemented:

- the document "Utype list for the CharDM" v1.1 should be part of the CharDM document, possibly as an annex (as for other DM specs), so that the CharDM document is self-contained.

    • (ML) As the Utype syntax is still under debate, we decided to keep the Utype serialisation in a separate document. That document illustrates the feasibility of serialising the Characterisation DM; it can easily be turned into a browsable list, and its string syntax can evolve later on.

- p. 39 (nomenclature comment): it would be useful to have a unified definition of "partially compliant", "compliant" and "fully compliant" across all IVOA standards.

  • "partially compliant" implements
    • some (but not all) MANDATORY/MUST

  • "compliant" implements
    • all MANDATORY/MUST

  • "fully compliant" implements
    • all MANDATORY/MUST
    • all RECOMMENDED/SHOULD
    • (ML) This suggestion has been incorporated in the document in Appendix D
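The three levels proposed above amount to a simple set comparison. The Python sketch below is a hypothetical illustration: the requirement names are placeholders rather than items from Appendix D, and the fourth outcome, "not compliant" (no MANDATORY item implemented at all), is an assumption added only so that the function covers every case.

```python
from typing import Set


def compliance_level(implemented: Set[str], must: Set[str], should: Set[str]) -> str:
    """Classify an implementation by the MUST and SHOULD items it covers."""
    if must <= implemented:  # every MANDATORY/MUST item is implemented
        if should <= implemented:  # ... and every RECOMMENDED/SHOULD item too
            return "fully compliant"
        return "compliant"
    if must & implemented:  # some, but not all, MUST items
        return "partially compliant"
    return "not compliant"  # assumption: not one of the three proposed terms


# Placeholder requirement names, not taken from the document.
MUST = {"must_1", "must_2"}
SHOULD = {"should_1"}

print(compliance_level({"must_1"}, MUST, SHOULD))                        # partially compliant
print(compliance_level({"must_1", "must_2"}, MUST, SHOULD))              # compliant
print(compliance_level({"must_1", "must_2", "should_1"}, MUST, SHOULD))  # fully compliant
```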

- (editorial comment) throughout the document there are a lot of references to other sections (e.g. "see section N.M"), which sometimes makes reading difficult; e.g. on pages 15-16 there are 10 references to other sections.

    • (ML) I agree and have simplified this.

- p10, section 3.4.1 "The data provider will be required to supply UCD for each axis as well as units" I assume one means UTYPE and not UCD. This should be modified.

    • see response below by F. Bonnarel.

- p44 Appendix E The document revision history should be better placed on the IVOA twiki on the DM WG pages, rather than on the CDS server.

  • (ML) agreed and inserted in Appendix E.


On the point about UCD for an axis: Christophe, you are right that each attribute in the data model has a Utype, but in the specific case of the axis we have a UCD ATTRIBUTE: in the generic case this attribute tells which data parameter we are dealing with. Is it a flux? A magnitude? Or a mass (if we are considering a snapshot as a dataset instead of an observation), etc. So the UCD should remain there.


Markus Dolensky (Data Access Layer WG, Vice Chair)

Yes, I approve but I also wish to say:

  • the judgment is not based on practical experience of how to implement compliant tools
  • sect 2.2 links to other IVOA modeling efforts: please add a reference to the spectral DM. There should be a sentence pointing SSAP implementers to the other document. A historic note: spectrum DM was split off to assure consistency of the protocol with the relevant parts of the DM but on a shorter timescale.
  • I am not sure whether the document provides a general convention on how to interpret values without an error, as opposed to instances with an associated accuracy/error; it should do so
  • a systematic tabular listing of the model items (one utype per row), such as Table 1 in the aforementioned Spectrum DM document, would be very helpful for looking up specific items

  • Response (by FrancoisBonnarel and ML ):
    • For your first point: there will be a version 2, which I presume will benefit from the experience of implementers.
    • For your second point: you are right, we have to point out there that a full observation data model exists for 1D spectra: the Spectrum data model, as I already explained in my answer to Pedro Osuna. SSA itself reuses the Spectrum data model, just as SLAP uses the Line data model and SNAP the so-called "SNAP data model". SIA2 will, I presume, make direct use of Characterisation.
      A reference to the SpectrumDM PR document is explicitly included in the new version of the document (1.12).
    • For your third point: axes without a known error may still be useful for data discovery; for analysis it is more difficult, I must say. But here again, experience will help to determine where the boundary lies, in version 2.
    • For your 4th point: this list exists in a separate document available at http://www.ivoa.net/Documents/latest/UtypeListCharacterisationDM.html .

Matthew Graham (Grid & Web Services WG)

I approve this document.

Bob Hanisch (Data Curation & Preservation IG)

I find the document clear, and think it will be a real aid for those concerned with data curation. I add my approval.

Gerard Lemson (Theory IG)

I approve this document, but would like to see a statement regarding the scope similar to the one recently added to STC.

Mireille Louys (Data Models WG)

I vote for approval.

Francois Ochsenbein (VOTable WG)

I approve the document -- as stated above, I feel the concepts are clearly explained and fully detailed.

Pedro Osuna (VOQL WG)

The concepts in the document apply very well to the description (Characterisation) of, for instance, an observation. There is a clever separation into axes with different properties, which makes a complete characterisation possible with few parameters. For instance, the table on page 9 is very useful for correctly identifying the concepts needed for a proper description of a "data set". Figure 2, on the different levels of description, is also very enlightening.

There are some points, however, that, while not showstoppers for the Recommendation process, should be considered for improvement, either in this version or in subsequent ones. Other issues are probably more a matter of editing than anything else.

- on p. 3 reference is made to the Observation DM. However, it looks as if not much work has been going on on the Observation DM. Is the Observation DM going to go ahead, or is it going to be "superseded" by the Characterisation DM globally? If the Obs DM disappears, the Char DM should contain all the info that is currently in the Obs DM. If it goes ahead, the differences (or complementarities) between the two should be briefly mentioned.

- on p. 10 (section 3.4.1) mention is made of a combination of UCD and units to "ensure uniqueness and recognition by standard software". This issue has been discussed many times, and at the last Beijing meeting it was finally recognised that a set of "bijective Utypes" should be defined for Data Models to be understood. These bijective Utypes were given the name "UFIs" by Jonathan. Although they are still undefined, they closely resemble object-oriented attribute naming, i.e., names built by putting dots between class names. How to mention actions in the UFIs is also an important issue, and a small team has been created to deal with these things (although, admittedly, it has not started work yet). Therefore, this paragraph should either contain a reference to that, or the first paragraph should be removed.

- the whole list of attributes in the Data Model should be clearly made explicit in the document. They should normally correspond to what has been called UTypes during all this time, and they should be inside the document, and not in a separate one. In particular, the UTypes should clearly reflect the UML structure of the DM. The UML is missing a "top level" diagram, where inheritances and associations are clearly seen. Also, the "Axis vs Properties" issue can be solved through proper UML modeling of the DM, and should be done so, in my opinion. Leaving the possibility to traverse the tree upside down (Axis to Properties) or the other way around might make the model unworkable for software handling. The UML and its attributes should be reviewed.

- mention is made in section 4.4 of the Quantity DM. In my understanding, the Quantity DM effort has been discontinued by the DM group. If this is the case, the reference should be removed. Otherwise, the document should state in more detail how the Quantity DM affects it.

- point 5.2 returns to Utype creation. Utypes (or UFIs in our more "modern" view after Beijing) should not have repetitions in their names: model classes don't need to carry the name of their parent in their own name. In any case, the whole list of UFIs (or Utypes) should be given in the document, with a clear one-to-one mapping to the UML diagram that represents them. The Spectral DM could be a good example of how this can be done.


I am only answering about the Observation data model and will let MireilleLouys address the other points:

This data model effort has been frozen since we decided to focus on the characterization part, which became a model in itself. In the original view (see Note http://www.ivoa.net/Documents/latest/DMObs.html, 2004), Characterization was a class of the overall Observation data model. Other classes were Data, Curation, Provenance, DataID, Target, etc. But I think it is now time to go back to it; there is some demand for that.

Actually, a complete Observation data model already exists, but specific to 1D spectra: it's the Spectrum data model, and its relationship to overall Characterization is rather clear, I think: the Char class in the Spectrum data model is a specific implementation of overall Char.

For an overall Observation Data Model we will have to revisit Data, Curation, DataID and Target as in Spectrum, and check whether they are general enough. My personal opinion is that Data is not, and that Curation, DataID and Target probably are. (But do we really need a Data class for images, 3D data, etc.? We have FITS, which works well for data in those cases...)

So for an overall Observation data model we have to split the effort into two parts:
- the Data class on one side (which may be kept simple, or even neglected?);
- the metadata classes on the other side.

For all the metadata classes the Spectrum model is a good starting point, except that Provenance is missing. There is a lot of demand in various communities for modelling the instrumental and, even more, the software Provenance of datasets. So I think the next step, as we already stated during the first DM session in Beijing (see my talk there and Jonathan's conclusions), will be to focus on Provenance...

The last point you address is one of vocabulary. Some people may want to call all metadata "Characterization", including Curation and Provenance (some astronomers outside our IVOA community already do so when they specify their needs; see the discussions during the Spectroscopic workshop at ESAC in March 2007). That may be the final IVOA choice, but in that case we would have to find a new class name for what is presently called Characterization, namely the part of the metadata describing the dataset in the data parameter space. Personally, I see no good reason to change, so the future ObservationDM map could be:

  • ObservationDM
    • Data
    • Metadata: ObsId, Curation, Char, Provenance, etc.


    • in pg. 10 (point 3.4.1)
I agree the expression "a combination of UCD and units to ensure uniqueness and recognition by standard software" is too loose to fully address the problem. I'll change it. It seems to me that referring to the ongoing Utype/UFI discussion within the standard document is not appropriate, as the discussion will go on while the standard document, at least this version, will stay as it is.

    • "the whole list of attributes in the Data Model ..." should be included
If the Utypes syntax needs to be revisited, as they are a serialisation of the data model, it will be easier to change it in a separate document. It is convenient to keep the concepts/classes separate from the serialisations.

    • Axis-first versus Property-first serialisation:
This point is not straightforward and I thank you for raising this question. As discussed in a few DM sessions, the axes along which we make the measurements are not necessarily independent. This is taken care of in the model by the links between properties, like Coverage and Resolution, and the description of the axis along which these properties vary. The UML diagram in Fig. 3, page 16, shows that the overall Characterisation class encompasses information about the properties (yellow boxes) and has links to information about the axes as well. There are dependency links (dashed lines) between Coverage and the CharacterisationAxis class. When all axes are independent we get a table of properties vs axes, like the table examples in Appendix C. If the sensitivity within an observation varies along several physical variables (position, wavelength, time, for instance), then the sensitivity function can simply be stored below the Coverage property, with the corresponding axes listed afterwards.

So there are cases where the Property-first serialisation is better than the Axis-first one.

The UML model can support bi-directional links, axis to property and property to axis, since UML can accommodate a graph structure, while XML only supports a tree structure. In other words, we do not want to lose the generality and evolution possibilities of UML by mapping the metadata directly onto the XML tree structure.

Still, from a practical point of view, there will probably be much more data serialised in the Axis-first XML serialisation provided in the Characterisation XML schema. Another reason to keep the rich graph structure in UML is that a relational database structure easily supports these bi-directional links and offers another possible binding of the model concepts to the metadata values.
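The tree-versus-graph trade-off can be sketched with Python's standard `xml.etree.ElementTree` module. In this hypothetical layout (the element names `characterisation` and `axis`, the `axes` attribute and all values are illustrative assumptions, not the Characterisation XML schema), a property that depends on a single axis is written Axis-first under that axis node, whereas a property varying along several axes has no single parent axis in a tree and is written Property-first with the list of axes it depends on. In the model itself the sensitivity function hangs under Coverage; the flat placement here is a simplification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prop:
    name: str
    value: str
    axes: List[str]  # axes along which this property varies


def serialise(props: List[Prop]) -> ET.Element:
    """Write single-axis properties Axis-first, multi-axis ones Property-first."""
    root = ET.Element("characterisation")
    axis_nodes: Dict[str, ET.Element] = {}
    for p in props:
        if len(p.axes) == 1:
            axis = p.axes[0]
            if axis not in axis_nodes:  # create each axis node only once
                axis_nodes[axis] = ET.SubElement(root, "axis", name=axis)
            ET.SubElement(axis_nodes[axis], p.name).text = p.value
        else:
            # A tree node has exactly one parent, so keep the property at the
            # top level and record the axes it depends on as an attribute.
            node = ET.SubElement(root, p.name, axes=" ".join(p.axes))
            node.text = p.value
    return root


props = [
    Prop("coverage", "400-700 nm", ["spectral"]),
    Prop("resolution", "1.2 arcsec", ["spatial"]),
    Prop("sensitivity", "sensitivity map", ["spatial", "spectral", "time"]),
]
print(ET.tostring(serialise(props), encoding="unicode"))
```

The UML graph keeps both navigation directions; any tree has to pick one parent per node, which is why the graph (or a relational binding) is kept alongside the Axis-first schema.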

    • Quantity DM:
Agreed: a separate model describing values and interpretation rules has not been the first priority in the last two years. If there were such a package to encode values, units, UCDs and Utypes together, it would be convenient to reuse it. We shall reformulate that in the document.

    • point 5.2 Utype creation:
Agreed. As the discussion about UFIs/Utypes is still going on, the current list mapping class attributes to Utypes for this version will be checked. Mapping an actual metadata value to one specific Utype is not always straightforward; that is why the Utype list document contains all the attached descriptions.

Ray Plante (Resource Registry WG)

This is well-written, well-organized, and easily digested--nice work.

A few small comments:

  • If the schema discussed in section 5 is intended to be used as an interoperable exchange format, then the schema should be located at www.ivoa.net and should have namespace URI rooted at www.ivoa.net. (I recommend http://www.ivoa.net/Characterisation/v1.1.)
  • I wonder if the contents of App. D should be made part of the main document since it gives specific requirements for judging compliance.


Thanks! The XML schema is actually already online at the IVOA site (http://www.ivoa.net/xml/Characterisation/v1.11). I shall update the link inside the document.

    • Appendix D was meant to clarify which pieces of metadata should be given, at a minimum, to obtain a meaningful metadata serialisation.
This can be thought of as a set of rules for minimal compliance of a serialisation (an XML document instance, for example) for a minimum usage: data discovery and selection. Pedro raised the same question about the Spectrum DM document. Should we have a specific policy on how we circulate compliance rules?

Andrea Preite-Martinez (Semantics WG)

Good work, I approve the document for Recommendation.

Rob Seaman (VOEvent WG, Vice-Chair)

YES I approve but I also wish to say:

  • An interesting read - a job well done for all involved.

  • VOEvent should be able to work with this.

  • Future versions would benefit from editing - move the justification to appendices, use the main document for simple declarative statements. (Spoken by someone who knows well the difference between florid and spartan language.)

  • Formatting on figures 8, 9 & 10, for instance, spilled off the bottom of the page on my copy. Maybe a font size issue.
    • (ML) Font size has been reduced for these figures in the new version of the document

  • The phrase "UML diagram" is without meaning. Captions should describe the content of the particular diagram. I think we've reached the point at which folks should be expected to recognize UML. Perhaps include a UML reference, and/or mention UML when describing the doc's formatting conventions, etc. On the other hand, UML is no substitute for elegant descriptive prose.
    • (ML) The title "UML diagram" has been removed from the captions in the updated version 1.12.

  • A key concept is that descriptions of "observed or simulated astronomical data sets" should be "independent of instrumental signatures as far as possible". However, presumably this model will be applied to raw data as well as reduced. The obvious issue is that instrumental signatures remain in raw data. Two less obvious issues, perhaps, are first that the property versus axis grid for a raw data set will be much more sparse than for the corresponding reduced data - i.e., reduction implies calibration, thus filling the grid. Second, the question of defaults will always be a per-instrument issue for raw data.

  • If "STC...encompasses the description of most of the Characterisation Axes", but "does not have the flexibility needed" - a consensus similar to that held by the VOEvent WG - one is left wondering where the disconnect lies, with STC or our use of it.

On the last two points: regarding the relationship between raw and calibrated data, you are perfectly right, but Characterisation summarises the observation on the data axes. It will be the job of Provenance to describe the transformation of raw data into calibrated data, as well as to link those to the calibration data.

About STC: it was actually perfectly possible to reuse elements such as AstroCoords, AstroCoordArea or AstroCoordSystem. What was considered not flexible enough in our case is the full ObservationLocation feature.


(at the request of Roy Williams, Chair)


Topic revision: r35 - 2007-09-14 - MireilleLouys
 