
STC2:Meas Proposed Recommendation: Request for Comments


WATCH OUT: This RFC page replaces RFC #1

Rationale for a second RFC round:

  • Many comments were collected after RFC #1. Some of them required only text improvements, but others implied significant model changes, especially for Coords.
  • A global RFC answer was sent to the WG mailing list in February 2020: answer
  • These data models should support the description of datasets and DAL responses, by defining fundamental elements which are commonly used in many contexts. The intent is that they be imported as components in these more complex models so that they all build from the same basis, thereby enhancing interoperability.
  • They should NOT be expected to fully support any use-cases outside of the described set. For example, they cannot currently support the complex error types found in various Catalogs. These use-cases are to be considered in future updates to the models.
  • Measurements cannot be used without Coords, because the latter is imported by Measurements; Coords, however, can be used in other contexts, e.g. Transform.
  • The nature of the changes in Coords, which in turn impact Measurements, led the TCG to decide on 2020-10-08 to hold a second RFC round for both models.

Summary

Version 1 of STC was developed in 2007, prior to the development and adoption of vo-dml modeling practices. As we progress to the development of vo-dml compliant component models, it is necessary to revisit those models which define core content. Additionally, the scope of the STC-1.0 model is very broad, making a complete implementation and the development of validators very difficult. As such, it may be prudent to break the content of STC-1.0 into component models which, as a group, cover the scope of the original.

This effort will start from first principles with respect to defining a specific project use-case, from which requirements will be drawn, satisfied by the model, and implemented in the use-case. We will make use of the original model to ensure that the coverage of concepts is complete and that the models will be compatible. However, the form and structure may be quite different. This model will use vo-dml modeling practices, and model elements may be structured differently to more efficiently represent the concepts.

This model covers the description of measured or determined astronomical data, and includes the following concepts:

  • The association of the determined 'value' with corresponding errors. In this model, the 'value' is given by the various Coordinate types of the Coordinates data model (Rots and Cresitello-Dittmar et al., 2019).
  • A description of the Error model.
The latest version of the model and supporting documents:
  • Model document: here
  • VO-DML/XML representation: here
  • XML Schema: here

Implementation Requirements

(from DM Working group twiki):

The "IVOA Document Standards" standard has a broad outline of the implementation requirements for IVOA standards. These requirements fit best into the higher level standards for applications and protocols than for data models themselves. At the Oct 2017 interop in Trieste, the following implementation requirements for Data Model Standards was agreed upon, which allow the models to be vetted against their requirements and use cases, without needing full science use cases to be implemented.

  • VO-DML models must validate against schema
  • Serializations which touch each entity of the model. These serializations may be 'fake' (i.e., not based on actual data files), and are to be produced by the modeler as unit tests/examples.
  • Real world serializations covering use cases, produced by others following the model, in a mutually agreed upon format.
  • Software which interprets these serializations and demonstrates proper interpretation of the content

Serializations:

  • Modeler Generated Examples:
    • Using home-grown Python code, the modeler has generated example serializations which span all elements of the model. The examples are generated in 4 formats:
      • VOTable-1.3 standard syntax; Validates using votlint
      • VOTable-1.3 annotated with VO-DML/Mapping syntax; Validates using xmllint to a VOTable-1.3 schema enhanced with an imported VO-DML mapping syntax schema
      • XML format; Validates against the model schema
      • An internal DOC format; XML/DOM structure representing the instances generated when interpreting the instance templates.
  • Real world serializations:

Software:

  • vodml Parser: a Notebook developed by Omar Laurino parses VO-DML/Mapping annotation to generate instances of TimeSeries-Cube-Meas-Coord using AstroPy, and generates plots of the content using Matplotlib. Note: this was developed and presented in 2017 using earlier drafts of the models. These differ only in detail, and the demo could be updated to the current model suite.
  • TDIG: Working project of Time Series as Cube.
    • An ongoing project is underway to enhance SPLAT to load/interpret/analyze TimeSeries data. This was most recently presented at the IVOA Interop in Paris 2019 (see PDF)
    • The tool currently uses a combination of TIMESYS, table column descriptions, UCDs and UTypes to identify and interpret the data automatically. Each of these is an annotation scheme which ties directly to model components.
    • Delays in settling on a standard annotation syntax have held back this project from fully realizing its possibilities. This is a high priority for upcoming work.
  • pyVO: extract_skycoord_from_votable()
    • Demonstrated at the IVOA Interop in Paris 2019, this product of the hack-a-thon generates AstroPy SkyCoord instances from VOTables using various elements embedded in the VOTable.
      • Interrogates a VOTable-1.3 serialization, identifies key information and uses that to automatically generate instances of SkyCoord.
        • UCD: 'pos.eq.ra', 'pos.eq.dec'
        • COOSYS.system: "ICRS", "FK4", "FK5"
        • COOSYS.equinox
      • The COOSYS maps directly to SpaceFrame, with the value of system identifying the reference frame.
      • The UCD 'pos.eq' maps directly to meas:EquatorialPosition; with 'pos.eq.ra|dec' identifying the corresponding attributes (EquatorialPosition.ra|dec) as coordinates coords:Longitude and coords:Latitude.
      • This illustrates that even with minimal annotation, this sort of automatic discovery/instantiation can take place. With a defined annotation syntax, this utility could be expanded to generate other AstroPy objects very easily.
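The discovery logic described above can be sketched with only the Python standard library. This is illustrative, not the actual pyVO implementation: the real extract_skycoord_from_votable() uses AstroPy and builds SkyCoord objects, while the helper and inline VOTable below merely collect the information it would need.

```python
# Minimal sketch of UCD/COOSYS-based discovery in a VOTable-1.3 document,
# using only the standard library. Hypothetical helper; the real pyVO
# utility constructs AstroPy SkyCoord instances from this information.
import xml.etree.ElementTree as ET

VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <COOSYS ID="sys1" system="ICRS" equinox="J2000"/>
    <TABLE>
      <FIELD name="ra"  datatype="double" ucd="pos.eq.ra"  unit="deg"/>
      <FIELD name="dec" datatype="double" ucd="pos.eq.dec" unit="deg"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

NS = "{http://www.ivoa.net/xml/VOTable/v1.3}"

def discover_position_info(votable_xml):
    """Return the frame and the RA/Dec column names found in the VOTable."""
    root = ET.fromstring(votable_xml)
    coosys = root.find(f".//{NS}COOSYS")
    info = {"system": coosys.get("system") if coosys is not None else None,
            "equinox": coosys.get("equinox") if coosys is not None else None}
    for field in root.iter(f"{NS}FIELD"):
        ucd = field.get("ucd", "")
        if ucd == "pos.eq.ra":
            info["ra_column"] = field.get("name")
        elif ucd == "pos.eq.dec":
            info["dec_column"] = field.get("name")
    return info

info = discover_position_info(VOTABLE)
print(info)  # {'system': 'ICRS', 'equinox': 'J2000', 'ra_column': 'ra', 'dec_column': 'dec'}
```

As in the hack-a-thon demo, the COOSYS supplies the frame (SpaceFrame) and the UCDs identify the EquatorialPosition attributes; with a defined annotation syntax, the same pattern extends to other model elements.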

Validators

As noted above, the serializations may be validated to various degrees using the corresponding schemas:

  • VOTable-1.3 using votlint: verifies the serialization complies with VOTable syntax
  • VOTable-1.3 + VODML: verifies the serialization is properly annotated
  • XML using xmllint with model schema: verifies the serialization is a valid instance of the model.
  • NOTE: The modeler examples undergo all levels of validation, showing that the VOTable serializations are also valid instances of the model.
I don't believe there are validators for the various software utilities. Their purpose is to show that given an agreed serialization which can be mapped to the model(s), the data can be interpreted in an accurate and useful manner.
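As a rough illustration of the lightest of these validation levels, a well-formedness check can be done with the Python standard library alone; note this is only a sketch, since validating against the model schema (XSD), as xmllint does, requires an XSD-aware tool such as xmllint or the lxml package.

```python
# Minimal well-formedness check for an XML serialization, using only the
# standard library. This is the weakest validation level; checking an
# instance against the model schema needs xmllint or lxml instead.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<meas><value>1.0</value></meas>"))  # True
print(is_well_formed("<meas><value>1.0</meas>"))          # False (mismatched tag)
```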

Links with Coords

The Measurement model is heavily dependent on the Coordinates model (also in RFC) for its core elements. Information about its relation to the Coordinates model, and how the requirements are distributed between them, can be found on the STC2 page.

Comments from the IVOA Community during RFC/TCG review period: 2020-10-26 - 2020-12-07

Comments by Markus Demleitner

I'd still retain the introduction and points (1), (4), (7), and (11) from my RFC 1 comments, and I essentially stand by the summary, which I'd update to:

I'd make the model a lot smaller, thus creating space we'll need once we tackle strict errors (i.e., explicit distributions) in earnest one day. We ought to have, perhaps, NaiveMeasurement (a pseudo-distribution saying "value is something like an expectation or perhaps a median, and error is something like a first moment or something like that"), and perhaps AsymmetricNaiveMeasure. I'm not even convinced we have credible use cases for statError and sysError. And if we want correlations, these should be explicit as relations between, I guess, measurements (or their parameters?).

The way to exercise that is to have a few data centers annotate a representative part of their tables; I'm not saying they need to be able to precisely annotate everything -- if it's enough for the "automatically plot error bars" use case, I'd say that's a success on which we can build.

On the changes since PR1, the recent addition "The current model assumes Gaussian distributions with shapes defined at the 68% confidence level" we really shouldn't do -- it's a lot more than we can confidently claim about most of our data holdings. At this point, I think we can only say "what we're doing here are rough error bars". Going for actual distributions is something for when we know what we'll do with them, and an "automatic error propagation" use case, to me, is a bit ambitious when we can't even plot error bars at this time.

Adding to point (4) of the original RFC comments -- that we shouldn't have Time, Position, Velocity, ProperMotion, and Polarization as separate classes, but instead distinguish physics by UCD as elsewhere in the VO -- people have said that's a serialisation issue; well, it's not in our current plans, where the DM types would directly sit on VOTable elements. This means that both UCD and DM type would do about the same thing. Let's avoid that.

Since I can't make out a use case for why something figuring out distributions would actually need to know physics, my preference would be to keep the whole notion outside of measurements in the first place. If such a use case were identified, I'd say adding a UCD attribute would be ok, with the understanding that some magic in our VOTable mapping would make that the UCD of the annotated FIELD, PARAM, or GROUP.

-- MarkusDemleitner - 2020-11-25

Comments from TCG members during the RFC/TCG Review Period: 2020-10-10 - 2020-12-07

WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.

IG chairs or vice chairs are also encouraged to do the same, although their input is not compulsory.

TCG Chair & Vice Chair

Applications Working Group

Data Access Layer Working Group

Data Model Working Group

Grid & Web Services Working Group

Registry Working Group

Semantics Working Group

Data Curation & Preservation Interest Group

Education Interest Group

Knowledge Discovery Interest Group

Solar System Interest Group

Theory Interest Group

Time Domain Interest Group

Operations

Standards and Processes Committee


TCG Vote : Vote_start_date - Vote_end_date

If you have minor comments (typos) on the last version of the document, please indicate this in the Comments column of the table and post them, with the date, in the TCG comments section above.

| Group | Yes | No | Abstain | Comments |
| TCG | | | | |
| Apps | | | | |
| DAL | | | | |
| DM | | | | |
| GWS | | | | |
| Registry | | | | |
| Semantics | | | | |
| DCP | | | | |
| KDIG | | | | |
| SSIG | | | | |
| Theory | | | | |
| TD | | | | |
| Ops | | | | |
| StdProc | | | | |




Topic revision: r3 - 2020-11-25 - MarkusDemleitner