Model Instances in VOTables (MIVOT)

MIVOT at a Glance

Model Instances in VOTables (MIVOT) defines a syntax for mapping VOTable data to any model serialized in VO-DML. The annotation operates as a bridge between the data and the model: it associates the column/param metadata of the VOTable with the data model elements (classes, attributes, types, etc.) of a standardized IVOA data model expressed in the Virtual Observatory Data Modeling Language (hereafter VO-DML).

It can also supply data or metadata that are missing from the table metadata. The data model elements are grouped in an independent annotation block complying with the MIVOT XML syntax. This annotation block is added as an extra resource element at the top of the result resource of the query response. The data annotation can be operated in the context of any VO protocol.

The MIVOT syntax makes it possible to describe a data structure as a hierarchy of classes. It can also represent relations and compositions between them, and build up data model objects by aggregating instances from different tables of the VOTable. Missing metadata can also be provided using MIVOT, for instance by completing the coordinate system description or by providing curation tracing.

The annotation block is the VO-DML transcription of data model classes, with their attributes, types, and relations. It maps the VOTable data onto the relevant model classes. It is made of reusable bricks that facilitate the development of tools on both the client and server sides. The adopted design does not alter the original VOTable content.
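
As an illustration of the layout described above, here is a minimal Python sketch that locates an annotation block inside a VOTable and lists which columns it maps to which model roles. The VOTable is hand-written for this example; the "demo" model, the column IDs, the MIVOT namespace URI and the element and attribute names (VODML, TEMPLATES, INSTANCE, ATTRIBUTE, dmrole, dmtype, ref) are assumptions made for illustration and should be checked against the standard. It is not the API of any existing package.

```python
import xml.etree.ElementTree as ET

# The VOTable namespace is the usual one; the MIVOT namespace is an assumption.
NS = {
    "vot": "http://www.ivoa.net/xml/VOTable/v1.3",
    "dm": "http://www.ivoa.net/xml/mivot",
}

# Hand-written toy VOTable: the annotation block sits in an extra
# RESOURCE placed at the top of the result resource.
ANNOTATED_VOTABLE = """
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <RESOURCE type="meta">
      <VODML xmlns="http://www.ivoa.net/xml/mivot">
        <MODEL name="demo" />
        <TEMPLATES tableref="sources">
          <INSTANCE dmtype="demo:Position">
            <ATTRIBUTE dmrole="demo:Position.ra" dmtype="ivoa:real" ref="col_ra" />
            <ATTRIBUTE dmrole="demo:Position.dec" dmtype="ivoa:real" ref="col_dec" />
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE ID="sources">
      <FIELD ID="col_ra" name="ra" datatype="double" unit="deg" />
      <FIELD ID="col_dec" name="dec" datatype="double" unit="deg" />
      <DATA><TABLEDATA>
        <TR><TD>10.5</TD><TD>-3.2</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def find_mapping_block(votable_xml):
    """Return the annotation block element, or None if the VOTable has none."""
    root = ET.fromstring(votable_xml)
    return root.find(".//vot:RESOURCE[@type='meta']/dm:VODML", NS)


def mapped_columns(block):
    """Map each model role to the ID of the column it references."""
    attribute_tag = "{%s}ATTRIBUTE" % NS["dm"]
    return {
        attr.get("dmrole"): attr.get("ref")
        for attr in block.iter(attribute_tag)
        if attr.get("ref")
    }


block = find_mapping_block(ANNOTATED_VOTABLE)
for role, column_id in mapped_columns(block).items():
    print(role, "->", column_id)
```

Because the block lives in its own resource, a client that ignores it still reads the table exactly as before, which is the sense in which the design leaves the original VOTable content untouched.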

History

The first proposed solution for mapping data to models was based on GROUPS and UTypes. This approach suffered from some flaws, and the IVOA decided to promote a mapping syntax closer to VO-DML, a new standard for serializing models in XML that became a REC in 2016. The baseline of this approach is to insert, into the VOTable to be annotated, an XML block that is faithful to the model structure and that acts as a bridge between the model leaves and the actual data.

  • A first proposal was published as a VO working draft in 2018.
    • Users were able to test it on their data during a hands-on session in May 2018. The conclusion was that the approach was promising, but that models applicable to real data were missing at that time.
    • This syntax proposal was also tested on some model examples (STC2, Cube, TimeSeries, Tesselation...). The major objections were twofold: 1) the verbosity should be reduced; 2) the mapping of some ORM features had to be simplified.
  • After this, the working group ran a wide survey to gather requirements for annotating source data. A simplified mapping syntax (namely vodml-lite), driven by the need to improve both the readability and the compactness of the annotations, was tested in this context as well as alongside the work of the time domain interest group.
  • Finally, at the end of 2020, the DM-WG was mandated by the TCG to run a virtual workshop to clearly define the model usage policy in the VO and to issue a common syntax for the annotations.
The present proposal, one of the major outcomes of that workshop, results from a lot of work by many people. It is the product of an evolution that combines the best features of both proposals.

Standard and Tools

The draft is managed following the GitHub workflow. The proposal is hosted at https://github.com/ivoa-std/ModelInstanceInVot

  • The PDF, updated after each merge on the main branch, is available as a release asset.
  • This draft comes with another repository (modelinstanceinvot-code) that gathers code being developed to exercise the mapping syntax on real data.
    • This project contains a few Jupyter notebooks that can be launched online.
    • This code is meant to be integrated into PyVO.

Repository

The GitHub project repository contains the following sections:

  • Document sources: one TEX file per section.
  • XML schema: the XSD 1.1 schema file currently used for the validation. XSD 1.1 is needed to state the syntactic rules that apply to elements depending on the context in which they are used.
  • Test suite: an extensive Python test bench validating all individual MIVOT features. The tests check both valid features and forbidden patterns, along with the reasons for which the latter are rejected. We encourage people interested in practicing the mapping to use it as a source of snippet examples showing what to do and what not to do.

Reference Interoperable Implementations

Data annotation is one step of a broader workflow that starts from raw data and ends up with the science code. It is difficult to define a reference implementation for the mapping when there are neither services able to provide annotated data nor clients able to process them. We overcame this difficulty by emulating the missing links. Our implementations work mainly with prototype services or with datasets annotated by hand. The client code shows that the dataset content can be interpreted by reading the annotations alone.

  • Code sample: the modelinstanceinvot-code package provides a lot of code able to process annotated VOTables.
    • This CODE is based on a model viewer object able to provide different serialisations of the mapped objects. It performs the following steps:
      • Mapping block extraction
      • Reference resolution (model leaves are set with table data). At this point the user gets an XML serialization of the model instance that can be used in different ways:
        • Extracting model components with XPATH queries
        • Converting the XML serialization into JSON
        • Building Python instances of datamodel place-holder classes (see client/class_wrapper)
        • Building Astropy objects (see client/class_wrapper/astropy_wrapper)
    • WATCHOUT:
      • This code is being developed to exercise the annotation processing; it likely still has some weaknesses.
      • The mapped models are not necessarily VO standards. They can be prototype models (MANGO, SparseCube) or pre-PR versions (MCT, PhotDM), but they are all VO-DML serializable.
    • JUPYTER notebooks (./jupyter). These notebooks are based on a data sample annotated by hand and located in ./mivot_code/examples/data.
      The notebooks can be run with Binder (https://mybinder.org/v2/gh/ivoa/modelinstanceinvot-code/package)
      • gaia_3D.ipynb: 3D position plot of a star cluster (GAIA) based on MCT classes
      • gaia_3D_astropy.ipynb: 3D position plot of a star cluster (GAIA) based on astropy.SkyCoord
      • moving_source.ipynb: Plot the positions of an XMM source over 20 years of observations
      • photdm_impl.ipynb: Plot SEDs from a table of cross-matched XMM sources
    • UNIT TESTS: There are many unit tests checking that the parser is able to process all of the mapping features.
      • The MIVOT snippets are in tests/data/input, the output references are in tests/data/output
      • Pay particular attention to test #14, which extracts a SparseCube instance from a VOTable that includes all mapping features.
        This VOTable was written for that purpose by Mark CD as a DM workshop use case (2021).
    • EXAMPLES: (mivot_code/examples) some standalone scripts, each processing one specific VOTable.
      • The scripts named example.1.xtapdb.* have the best documentation in both the Python code and the XML files
    • LAUNCHERS: (mivot_code/launchers)
  • PhotDM Photometric calibrations serialized with MIVOT (M. Louys 2022)
    • Various photometric calibrations are available here
    • No processing code available, just look at the VOTables.
  • Field of View:
    • Instrument fields of view serialized with MIVOT and consumed by Aladin (Lite and Desktop) (UTBM Intern Clément Nogueira, 2022).
    • Graphical editor for the FoV shapes.
      • Allows downloading MIVOT serializations of the drawn FoVs
      • A FITS image can be plotted in the background as a drawing template.
    • Can be tested here
      • Follow the landing page instructions
        • Draw a FoV with the editor
        • Save it on disk
        • Look at the MIVOT file
        • Upload it in AladinLite
  • XTapDB: TAP service mapping XMM data to MANGO on the fly (Unistra intern I. Errami, 2022)
    • Annotation processing
      • Install the Python package modelinstanceinvot-code
      • Run xcatdb-client 'select * from catalogueentry'
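
The processing steps listed under "Code sample" above (mapping block extraction, reference resolution, XPath queries, JSON conversion) can be sketched with the Python standard library alone. This is a simplified illustration, not the modelinstanceinvot-code API: the "demo" model, the column IDs, the MIVOT namespace URI and the element and attribute names are assumptions, and only flat ATTRIBUTE references into a single table are handled.

```python
import copy
import json
import xml.etree.ElementTree as ET

VOT = "{http://www.ivoa.net/xml/VOTable/v1.3}"
DM = "{http://www.ivoa.net/xml/mivot}"  # assumed MIVOT namespace

# Hand-written toy VOTable with a hypothetical "demo" model.
SAMPLE = """
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <RESOURCE type="meta">
      <VODML xmlns="http://www.ivoa.net/xml/mivot">
        <TEMPLATES tableref="sources">
          <INSTANCE dmtype="demo:Position">
            <ATTRIBUTE dmrole="demo:Position.ra" dmtype="ivoa:real" ref="col_ra" />
            <ATTRIBUTE dmrole="demo:Position.dec" dmtype="ivoa:real" ref="col_dec" />
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE ID="sources">
      <FIELD ID="col_ra" name="ra" datatype="double" />
      <FIELD ID="col_dec" name="dec" datatype="double" />
      <DATA><TABLEDATA>
        <TR><TD>10.5</TD><TD>-3.2</TD></TR>
        <TR><TD>11.0</TD><TD>-3.1</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def resolve_rows(votable_xml):
    """Yield one copy of the template instance per row, with leaves set from the data."""
    root = ET.fromstring(votable_xml)
    # Step 1: mapping block extraction.
    templates = root.find(f".//{DM}TEMPLATES")
    table = next(
        t for t in root.iter(f"{VOT}TABLE") if t.get("ID") == templates.get("tableref")
    )
    # Column ID -> position of the column in each row.
    col_index = {f.get("ID"): i for i, f in enumerate(table.findall(f"{VOT}FIELD"))}
    # Step 2: reference resolution, row by row.
    for row in table.iter(f"{VOT}TR"):
        cells = [td.text for td in row.findall(f"{VOT}TD")]
        instance = copy.deepcopy(templates.find(f"{DM}INSTANCE"))
        for attr in instance.iter(f"{DM}ATTRIBUTE"):
            ref = attr.get("ref")
            if ref in col_index:
                attr.set("value", cells[col_index[ref]])
        yield instance


def to_dict(instance):
    """Convert a resolved instance into a JSON-friendly dictionary."""
    return {
        "dmtype": instance.get("dmtype"),
        "attributes": {
            a.get("dmrole"): a.get("value") for a in instance.findall(f"{DM}ATTRIBUTE")
        },
    }


instances = list(resolve_rows(SAMPLE))
# XPath-style query on the resolved XML serialization.
ra = instances[0].find(f"{DM}ATTRIBUTE[@dmrole='demo:Position.ra']").get("value")
print("ra of first row:", ra)
# JSON serialization of the same instance.
print(json.dumps(to_dict(instances[0]), indent=2))
```

Working on a deep copy of the template for each row keeps the annotation block itself unchanged, so the same template can be reused for every row of the table.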

Implementations Validators

mivot-validator is a Python validator for VOTables annotated with MIVOT.

The validation process has two steps:

  • VOTable validation (against 1.3)
  • MIVOT validation
Both must succeed for the files to be considered valid.

The validator can process either individual files or the contents of a directory (without recursion).
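
The control flow described above can be sketched as follows. This is not the mivot-validator code or its interface: the two checker callables are hypothetical placeholders for the actual VOTable and MIVOT schema validations, and selecting files by the ".xml" suffix is an assumption made for the example.

```python
from pathlib import Path


def validate_file(path, check_votable, check_mivot):
    """A file is valid only if both steps succeed.

    check_votable and check_mivot are hypothetical callables taking a path and
    returning a truthy value on success. The MIVOT step runs only when the
    VOTable step has passed.
    """
    return bool(check_votable(path)) and bool(check_mivot(path))


def validate_path(path, check_votable, check_mivot):
    """Validate a single file, or the files directly inside a directory.

    Subdirectories are not visited (no recursion). Returns {file path: bool}.
    """
    path = Path(path)
    if path.is_dir():
        # Assumption: only *.xml files at the top level are considered.
        files = sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".xml")
    else:
        files = [path]
    return {str(f): validate_file(f, check_votable, check_mivot) for f in files}
```

Passing the checkers in as arguments keeps the sketch independent of any particular schema library; a real validator would plug in its VOTable 1.3 and MIVOT schema checks at those two points.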

Annotations and Astropy

The TCG mandate about the mapping syntax also included a commitment to provide tools that could help both data providers and client developers assess the impact of working with annotated data.

  • Some work has been done on the design of an integration of the mapping processing into the Astropy/PyVO ecosystem.


Comments from the IVOA Community during RFC/TCG review period: 2022-09-12 - 2022-10-24

The comments from the TCG members during the RFC/TCG review should be included in the next section.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.

Comments added as GitHub issues must also be reported here.

Using GitHub Issues: You can use GitHub issues to comment. In this case, just add here a short label with the issue URL. The discussion can take place on GitHub and the final editor answer will be summarized with a reference to the corresponding MR if any.

Comments from TCG member during the RFC/TCG Review Period: 2022-09-16 - 2022-10-31

WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair

Applications Working Group

Data Access Layer Working Group

Data Model Working Group

Grid & Web Services Working Group

Registry Working Group

Semantics Working Group

Data Curation & Preservation Interest Group

Education Interest Group

Knowledge Discovery Interest Group

Operations Interest Group

Overall the text looks carefully written. I haven't yet tried using this proposal myself so I don't have a feel for how it works in practice, but implementations and validators seem to be in place.

One suggestion: The examples in the text for each feature are welcome, but it would be nice as well to have a single complete example of a VOTable marked up using MIVOT, alongside an explanation of what the markup is doing or how it could be used. Because of the size of the standard it wouldn't make sense to show off all the features in such an example, I have in mind something fairly simple. I appreciate that there are multiple examples in the github repository, but it's not obvious to the reader where to start, and not all those examples correspond to the published version. I leave it to the authors to decide whether this suggestion is a good idea or practicable.

A few minor corrections below (version 2022-09-16):

  • Section 4.6: listings 10 and 11 are identical.
  • Section 4.9: broken Appendix reference "See more examples in Appendix ??" . Same thing in (at least) Listing 10 and Section 4.10 - missing \label{appen_*} definitions?
  • Listing 9: outermost closing tag is missing a "/"; example reads "<GLOBALS>...<GLOBALS>" rather than "<GLOBALS>...</GLOBALS>". Same thing in listings 10, (11), 12.
  • Table 22: "he host" -> "the host"
  • Section 4.12: there appear to be a couple of full stops missing.

Radio Astronomy Interest Group

Solar System Interest Group

Theory Interest Group

Time Domain Interest Group

Standards and Processes Committee


TCG Vote : 2022-09-16 - 2022-10-31

If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.

Group      Yes  No  Abstain  Comments
TCG
Apps
DAL
DM
GWS
Registry
Semantics
DCP
Edu
KDIG
Ops        *
Radio
SSIG
Theory
TD
StdProc

Topic revision: r25 - 2022-09-26 - MarkTaylor
 