Model Annotation in VOTables (MIVOT)


MIVOT at a Glance

Model Instances in VOTables (MIVOT) defines a syntax to map VOTable data to any model serialized in VO-DML. The annotation operates as a bridge between the data and the model. It associates the column/param metadata from the VOTable with the data model elements (classes, attributes, types, etc.) of a standardized IVOA data model, expressed in the Virtual Observatory Data Modeling Language (hereafter VO-DML).

It can also supply data or metadata that were missing from the table metadata. The data model elements are grouped in an independent annotation block complying with the MIVOT XML syntax. This annotation block is added as an extra resource element at the top of the resource containing the query response. Data annotation can be operated in the context of any VO protocol.

The MIVOT syntax allows a data structure to be described as a hierarchy of classes. It can also represent relations and compositions between them, and it can build data model objects by aggregating instances from different tables of the VOTable. Missing metadata can also be provided using MIVOT, for instance by completing the coordinate system description or by providing curation tracing.

The annotation block is the VO-DML transcription of data model classes, with their attributes, types, and relations. It maps the VOTable data on the relevant model classes. It is made of reusable bricks that facilitate the development of tools on both client and server sides. The adopted design does not alter the original VOTable content.

History

The first proposal for a solution for mapping data on models was based on GROUPS and UTypes. This approach suffered from some flaws, and the IVOA decided to promote a mapping syntax closer to VO-DML, a new standard for model serializations in XML that became a REC in 2016. The baseline of this approach consists in inserting, into the VOTable to be annotated, an XML block that is faithful to the model structure and that acts as a bridge between the model leaves and the actual data.

  • A first proposal was published as a VO working draft in 2018.
    • Users were able to test it on their data during a hands-on session in May 2018. The conclusions were that the approach was promising but that models applicable to real data were missing at that time.
    • This syntax proposal was also tested on some model examples (STC2, Cube, TimeSeries, Tessellation, ...). The major objections were twofold: 1) the verbosity should be reduced; 2) the mapping of some ORM features had to be simplified.
  • After this, the working group ran a wide survey to gather requirements for annotating source data. A simplified mapping syntax (namely vodml-lite), driven by the need to improve both the readability and the compactness of the annotations, was tested in this context as well as alongside the work of the Time Domain Interest Group.
  • Finally, at the end of 2020, the DM-WG was mandated by the TCG to run a virtual workshop to clearly define the model usage policy in the VO and to issue a common syntax for the annotations.
The present proposal, one of the major outcomes of that workshop, results from a lot of work put in by many people. It is the product of an evolution that combines the best features of both proposals.

Standard and Tools

The draft is managed following the GitHub workflow. The proposal is hosted at https://github.com/ivoa-std/ModelInstanceInVot

  • The PDF, updated after each merge on the main branch, is available as a release asset.
  • This draft comes with another repository (modelinstanceinvot-code) that gathers code being developed to exercise the mapping syntax on real data.
    • This project contains a few Jupyter notebooks that can be launched online
    • This code is meant to be integrated in PyVO

Repository

The GitHub project repository contains the following sections:

  • Document sources: one TEX file per section.
  • XML schema: XSD 1.1 schema file that is currently used for the validation. XSD 1.1 is needed to state the syntactic rules that apply to elements depending on the context in which they are used.
  • Test suite: Extensive Python test bench validating all individual MIVOT features. The tests check both valid features and forbidden patterns, along with the reasons for which they are rejected. We encourage people interested in practicing the mapping to look there for snippet examples showing what to do and what not to do.

Reference Interoperable Implementations

Data annotation is one step of a broader workflow that starts from raw data and ends up with the science code. It is difficult to define what a reference implementation of the mapping is without services able to provide annotated data or clients able to process them. We overcame this difficulty by emulating the missing links. Our implementations work mainly with prototype services or with datasets annotated by hand. Client code shows that the dataset content can be interpreted by reading the annotations alone.

  • Code sample: the modelinstanceinvot-code package provides a lot of code able to process annotated VOTables.
    • This code is based on a model viewer object able to provide different serialisations of the mapped objects. It performs the following steps:
      • Mapping block extraction
      • Reference resolution (model leaves are set with table data). At this point the user gets an XML serialization of the model instance that can be used in different ways:
        • Extracting model components with XPATH queries
        • Converting the XML serialization to JSON
        • Building Python instances of datamodel place-holder classes (see client/class_wrapper)
        • Building Astropy objects (see client/class_wrapper/astropy_wrapper)
    • WATCHOUT:
      • This code is being developed to exercise the annotation processing; it likely suffers from some weaknesses.
      • The mapped models are not necessarily VO standards. They can be prototype models (MANGO, SparseCube) or pre-PR versions (MCT, PhotDM), but they are all VO-DML serializable.
    • JUPYTER notebooks ( ./jupyter). These are notebooks based on a data sample annotated by hand and located in ./mivot_code/examples/data.
      The notebooks can be run with Binder (https://mybinder.org/v2/gh/ivoa/modelinstanceinvot-code/package)
      • gaia_3D.ipynb: 3D position plot of a star cluster (GAIA) based on MCT classes
      • gaia_3D_astropy.ipynb: 3D position plot of a star cluster (GAIA) based on astropy.SkyCoord
      • moving_source.ipynb: Plot the positions of an XMM source over 20 years of observations
      • photdm_impl.ipynb: Plot SEDs from a table of cross-matched XMM sources
    • UNIT TESTS: There are many unit tests checking that the parser is able to process all of the mapping features.
      • The MIVOT snippets are in tests/data/input, the output references are in tests/data/output
      • For the record, pay attention to test #14, which extracts a SparseCube instance from a VOTable including all mapping features.
        This VOTable was written on purpose by Mark CD as a DM workshop use case (2021).
    • EXAMPLES: ( mivot_code/examples) some standalone scripts processing each one specific VOTable.
      • The scripts named example.1.xtapdb.* have the best documentation in both Python code and XML files
    • LAUNCHERS: ( mivot_code/launchers)
  • PhotDM Photometric calibrations serialized with MIVOT (M. Louys 2022)
    • Various photometric calibrations are available here
    • No processing code available, just look at the VOTables.
  • Field of View:
    • Instrument fields of view serialized with MIVOT and consumed by Aladin (Lite and Desktop) (UTBM Intern Clément Nogueira, 2022).
    • Graphical editor for the FoV shapes.
      • Allows MIVOT serializations of the drawn FoVs to be downloaded
      • A FITS image can be plotted in the background as a drawing template.
    • Can be tested here
      • Follow the landing page instructions
        • Draw a FoV with the editor
        • Save it on disk
        • Look at the MIVOT file
        • Upload it in AladinLite
  • XTapDB: TAP service mapping XMM data on MANGO on the fly (Unistra intern I. Errami, 2022)
    • Annotation processing
      • Install the Python package modelinstanceinvot-code
      • run xcatdb-client 'select * from catalogueentry'
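
The processing steps listed above for the code sample (mapping block extraction, reference resolution, XPath queries) can be sketched as follows. This is only a minimal illustration and not the modelinstanceinvot-code API: the toy VOTable, the `toy:` model, and the function names `extract_mapping_block` and `resolve_references` are assumptions made for the example, and XML namespaces are omitted for brevity.

```python
import copy
import xml.etree.ElementTree as ET

# Toy annotated VOTable (hypothetical content, namespaces omitted).
VOTABLE = """
<VOTABLE>
  <RESOURCE type="results">
    <RESOURCE type="meta">
      <VODML>
        <TEMPLATES tableref="t1">
          <INSTANCE dmtype="toy:Position">
            <ATTRIBUTE dmrole="toy:Position.ra" dmtype="ivoa:real" ref="_ra" unit="deg"/>
            <ATTRIBUTE dmrole="toy:Position.dec" dmtype="ivoa:real" ref="_dec" unit="deg"/>
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE ID="t1">
      <FIELD ID="_ra" name="ra" datatype="double"/>
      <FIELD ID="_dec" name="dec" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>10.5</TD><TD>-3.25</TD></TR>
        <TR><TD>11.0</TD><TD>-3.5</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def extract_mapping_block(root):
    """Step 1: return the annotation block (the VODML element)."""
    return root.find(".//VODML")


def resolve_references(root, block):
    """Step 2: yield one resolved copy of each TEMPLATES INSTANCE per table row.

    Every ATTRIBUTE @ref is replaced by a @value read from the matching column.
    The original annotation block is left untouched.
    """
    for templates in block.findall("TEMPLATES"):
        table = root.find(f".//TABLE[@ID='{templates.get('tableref')}']")
        column_of = {f.get("ID"): i for i, f in enumerate(table.findall("FIELD"))}
        for row in table.iter("TR"):
            cells = [td.text for td in row.findall("TD")]
            for instance in templates.findall("INSTANCE"):
                resolved = copy.deepcopy(instance)
                for attr in resolved.iter("ATTRIBUTE"):
                    ref = attr.attrib.pop("ref", None)
                    if ref is not None:
                        attr.set("value", cells[column_of[ref]])
                yield resolved


root = ET.fromstring(VOTABLE)
instances = list(resolve_references(root, extract_mapping_block(root)))

# Step 3: query a model component with an XPath expression.
first = instances[0]
print(first.find("ATTRIBUTE[@dmrole='toy:Position.ra']").get("value"))  # 10.5
```

Working on copies of the INSTANCE elements keeps the annotation block reusable for every row, which mirrors the idea that the annotation does not alter the original VOTable content.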

Implementations Validators

mivot-validator is a Python validator for VOTables annotated with MIVOT.

The validation process has two steps:

  • VOTable validation (against 1.3)
  • MIVOT validation
Both must succeed for the files to be considered as valid.

The validator can process either individual files or directory contents (non-recursively).
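
The two-step logic and the non-recursive directory handling described above could be sketched as follows. This is not the mivot-validator API: `validate_file`, `validate_path`, the two checker callables, and the `.xml` suffix filter are all hypothetical names and choices introduced for the illustration.

```python
from pathlib import Path


def validate_file(path, check_votable, check_mivot):
    """Return True only if both steps succeed (VOTable first, then MIVOT).

    check_votable and check_mivot are hypothetical callables taking a path
    and returning a truthy value on success.
    """
    return bool(check_votable(path)) and bool(check_mivot(path))


def validate_path(path, check_votable, check_mivot):
    """Validate one file, or the direct content of a directory (no recursion)."""
    path = Path(path)
    if path.is_dir():
        # iterdir() lists direct children only, so sub-directories are skipped.
        files = sorted(
            p for p in path.iterdir() if p.is_file() and p.suffix == ".xml"
        )
    else:
        files = [path]
    return {str(f): validate_file(f, check_votable, check_mivot) for f in files}
```

Because the MIVOT step is only evaluated when the VOTable step has succeeded, a file that is not even a valid VOTable is reported as invalid without running the second check.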

Annotations and Astropy

The TCG mandate about the mapping syntax also included a commitment to provide tools that could help both data providers and client developers assess the impact of working with annotated data.

  • Some work has been put into the design of an integration of the mapping processing into the Astropy/PyVO ecosystem.

Comments from the IVOA Community during RFC/TCG review period: 2022-09-12 - 2022-10-24

The comments from the TCG members during the RFC/TCG review should be included in the next section.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.

Comments added as GitHub issues must also be reported here.

Using GitHub Issues: You can use GitHub issues to comment. In this case, just add here a short label with the issue URL. The discussion can take place on GitHub and the final editor answer will be summarized with a reference to the corresponding MR if any.

Comments by Markus Demleitner

You'll not be surprised to learn that I still think this is far too complex, complicated and powerful, and that I severely doubt we have the capacities to even fully specify such a thing, let alone correctly implement it.

LM Comment (on the comments): MIVOT has been designed 1) to render any object hierarchy, 2) to map any data arrangement on that hierarchy, and 3) to integrate seamlessly into the VOTable ecosystem:

  1. For 1) we need 3 elements: attributes, objects and collections (note that this perfectly matches the JSON elements).
  2. For 2) we need joins and references.
  3. For 3) we need to contain the annotations in a block that does not interfere with the VOTable schema.
But even if you prove me wrong on this, I still believe it's not a good idea to require a complete ORM formalism for a basic task like annotating frames and assembling ra, dec, PMs, RVs and parallaxes. As promised, I'll not try to stop this, but I will say I'd be a lot more relaxed if we had a solid fallback position in a refurbished COOSYS.

LM Answer: This has been discussed in detail in the 2021 DM workshop (https://github.com/ivoa/dm-usecases). My point is that MIVOT is able to simply do simple things without prohibiting more complex patterns. You can verify this assertion by browsing the examples we provide within the RFC page. You can also have a look at this talk to see what I mean. In these slides I propose to use MIVOT first to simply add various coordinate frames in VOTables. Just have a look at the XTapDB output and you will get a fallback position in a refurbished COOSYS.

I'd also be a lot more relaxed if I saw something like a PR against astropy exercising all the various features at least at the level of https://github.com/msdemlei/astropy; I can't say I've properly reviewed the reference implementations to figure out how much of what's discussed here they actually cover.

LM Answer: You can find an example of an Astropy binding (SkyCoord output) in https://github.com/ivoa/modelinstanceinvot-code. Have a look at the notebook.

This review is against git commit ca37efa5414ceebb8c129e08216894899dc342b9 with PDF page numbers.

(1) p. 7 "The ultimate goal of this standard is to allow clients to get a better understanding of the data complexity whether it is legacy or set for a particular model." Hm. Shouldn't this rather be "understand complex data"?

LM Answer: see PR #166

(2) p. 8 "the addition of any other elements (e.g. X-ray energy band) would require VOTable version upgrades." -- depends. Where the content is controlled by vocabularies, and I suppose "X-ray band" would be an example if VOTable talked about these things (it's part of the messenger vocabulary), all it would take is a VEP. Perhaps you can choose a somewhat clearer example?

LM Answer: see PR #166

(3) p. 8 "This is the purpose of e.g. Measure model which proposes..." -- I'm not sure what the "e.g." means here, and we shouldn't say "propose" when this is a REC.

LM Answer: see PR #166

(4) p. 7 ff I can't say I am too wild about the representation of the use cases and requirements as itemize lists ("bullet points"). At least use enumerations (which will later let you reference individual use cases and requirements; you see, ideally, each requirement would say which use case it was derived from...) -- or go for subsubsections. If I may say so myself, I think that worked out nicely for Vocabularies 2.0...

LM Answer: see PR #166

(5) p. 9, "The vocabulary in the annotation name-space must not overlap with the VOTable elements (names or attributes)" -- that is actually not terribly compelling (and I'd rather read "tag set" instead of "vocabulary" here). Most real VOTable parsers have, for a large part, been ignoring unknown elements almost forever, and since the schema versioning EN, they are actually required to do so. I'd expect we can have elements in the VOTable namespace without breaking anything not already broken for other reasons (that's corroborated by the lack of protest after we pushed in TIMESYS). So... I'd dispute that requirement.

LM Answer: You are right that VOTable parsers will naturally skip MIVOT blocks, but our concern was more to prevent MIVOT parsers from getting lost in VOTable elements. This requirement is actually meant to make the life of MIVOT client developers as easy as possible. We will keep it.

(6) p. 9 "The content of the annotation block must be validated according to a specific schema." -- you mean: "validatable"?

LM Answer: see PR #166

(7) p. 9 "The annotation schema must be independent from the VOTable schema." -- again, I'd dispute that that requirement can be derived from any of your use cases, and I also don't think many of the following requirements have a strong grounding in the use cases. They sound more like something describing the system you eventually came up with. Perhaps you can move these into the introduction as desiderata rather than pretending they're hard requirements?

LM Answer: This requirement does not derive from any use case, but from a TCG requirement saying not to touch any VOTable features and not to bind the annotation syntax to the VOTable schema either (you strongly supported this requirement, AFAIR). That has been one of the main guidelines of this standard.

(8) p. 10 "Clients must be informed on the execution status of the annotation process." -- I don't think I understand what you mean by this. Is this requiring a "Worked out/didn't work out" flag? This would certainly profit from a cross-reference to a use case that would indicate how such a thing would be used in a UI/a workflow/whatever. [Later:] read about "REPORT" and now understand your drift here. Can't you make that a bit plainer? Something like "We would like to have a place where we can stick error messages in case the annotation went wrong"?

LM Answer: see PR #166

(9) p.10 "Clients must be informed of the data model name and version..." -- I don't think that has a lot to do with "on the fly"; I'd say it fits much better under validation, as perhaps in "In order to allow validators to retrieve the VO-DML specifications a document claims to adhere to, the document must make it easy to find out data model names and versions in use" or so.

LM Answer: see PR #166

(10) p. 11 "which MUST be the first child of a RESOURCE with type="results"." -- I'd drop at least the requirement on the type. Consider the famous "self-describing Datalink" service (regardless of whether I personally think it's a good idea...) that just returns a meta RESOURCE containing the parameters the service accepts. You may definitely want to annotate such a thing with something like PDL to say "param B can only be given when param A is given, but then it's mandatory". Let's not kill such an application early by requiring a specific RESOURCE type (and: why should we?)

LM Answer: see PR #166

(11) p. 11 xmlns="http://www.ivoa.net/xml/merged-syntax" -- I'm definitely against calling this schema "merged-syntax". Whoever hasn't been part of the development of this won't have a clue what "syntax" you're talking about here, let alone what might have been "merged" to produce it. Make the local part, perhaps, mivot or, if you prefer, mivot1.

LM Answer: see PR #166

(12) p. 12 "There are only three elements for the models structure itself." Perhaps that's an indication that these three elements should go into VOTable proper (well, ok, you'd need TEMPLATES, too, but that would be it for structural annotation)? That way, people just interested in annotation won't need to worry about JOIN, WHERE, and the various COLLECTION-s.

LM Answer (see my top comment)

(13) p. 12 "These excerpts are out of the box, they refer..." -- Um... what does this mean?

LM Answer: see PR #166

(14) p.13 "When the mapping REPORT is set to KO, the" -- there's a reference to the status attribute missing here, isn't it?

LM Answer: see PR #166

(15) p. 13 "When the mapping REPORT is set to KO, the other children mapping elements are optional" -- well, apart from MODEL, they are anyway, no? And even MODEL has minOccurs=0, so I'd say this sentence is more confusing than informative and hence should probably go. This is somewhat related to your XSD assertion that you have at least one MODEL element for a successful annotation. I don't think that's a good idea. A null annotation arguably is successful, too, no?

LM Answer: see PR #166

(16) p. 14 Listing 4. In the interest of giving a good example, I'd say we should try to come up with a better message than "The annotation process failed", which just repeats the status="KO" -- at least I am usually rather annoyed when something gives me error messages that are at the same time obvious and useless. Can't we invent something like "Conflicting types for column ra" or "Missing mandatory attribute foo on an instance of class samplemod:Bar"? People will still produce useless error messages, but at least they can't claim they just followed our example.

LM Answer: see PR #166 (labeled as second topic 14)

(17) @status "must be either OK or KO" -- the "KO" is admittedly cute, but at least consider something like a more conventional "FAILED" here, or perhaps even codes indicating whether there was no annotation to begin with (ANNOTATION-MISSING -- if you consider that an error, which may or may not be a good idea) or whether there was annotation that could somehow not be rendered (ANNOTATION-BAD) -- for instance.

LM Answer: see PR #166

(18) p. 15 "Only models that are used in the file must be declared." Please clarify if that means "You must not declare models you do not use" [why would you require that?] or "Every data model in use in WHATEVER must be declared" (but you can declare more if you want). The WHATEVER should, I think, be RESOURCE rather than file here.

LM Answer: see PR #166

(19) p. 15 "This attribute MUST not be empty and forms the prefix used in @dmrole @dmtype tags of elements from that model." There's at least an "and" missing here, but I don't get the "tags of elements" thing here either.

LM Answer: see PR #166

(20) p. 15 Table 6. I don't think you have explained what "attribute patterns" are, and while I get the MAND and OPT thing, I'm really not sure what the third column in this table is supposed to say and why it would be in a single table with the other columns. Actually, when one reads on, one starts getting an idea why you do this, what with different requirements on attributes depending on the parent of the element. Well... this smells extremely fishy, but if you really think there's no way around this kind of thing, please add some language on this to the introduction. I'm rather sure I've never seen such tables before, and a brief explanation where they come from would certainly be appreciated by many.

LM Answer: see PR #166

(21) p. 15 "The GLOBALS block holds singular data model instances," -- umm... is this supposed to be "singleton"? But then isn't the criterion here that everything is constant?

LM Answer: see PR #166

(22) p. 15 "The contained instances have a global scope" -- the other time you're talking about scopes in this sense is when you state that the declarations have the scope of the enclosing RESOURCE. I think that's true for both GLOBALS and TEMPLATES in your scheme, so I don't think you should talk about scope here at all.

LM Answer: see PR #166 (grouped with 21)

(23) p. 16 "(see in \ref{TEMPLATES_snippet}).}" -- this is formatted as "see in 202", which is rather confusing. I'd say this should be something like "see line~\ref{...} in Appendix D".

LM Answer: see PR #166

(24) p. 17 "It plays no role as a GLOBALS element" in Listing 9's caption -- I don't understand what you are saying here.

LM Answer: see PR #166

(25) p. 19 "Datamodel element id, MUST be unique within the document." That's better in other places you describe dmid, where it says "within the mapping block". I'd say this needs to be changed here, too.

LM Answer: see PR #166

(26) p. 19 In general, COLLECTION scares me, and I can't say I really understand all the magic it does. It does seem it's somehow combining properties of a relation (i.e., a table) and or something like an array ("The collection contains a matrix of atomic values."). And that the content model of COLLECTION changes depending on its parent elements is outright weird as far as I am concerned. To me, this really suggests that you should pick the thing apart and create at least two different elements from this -- but as I said, you kind of lost me in sect. 4.6 anyway, so I can't even say if that number shouldn't be higher.

LM Answer (see my top comment): A collection is a bag/array/set of something. To be short; COLLECTION-s are the counterpart of the JSON [...]
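
The JSON analogy in this answer can be illustrated with a short sketch in which an INSTANCE becomes a JSON object, a COLLECTION becomes a JSON array, and an ATTRIBUTE becomes a leaf value. The toy block, the `toy:` model, and the `to_json` helper are assumptions made for the example; the snippet has not been validated against the MIVOT schema and values are kept as strings.

```python
import json
import xml.etree.ElementTree as ET

# Toy resolved annotation (hypothetical content, namespaces omitted).
BLOCK = """
<INSTANCE dmtype="toy:Source">
  <ATTRIBUTE dmrole="toy:Source.name" dmtype="ivoa:string" value="src-1"/>
  <COLLECTION dmrole="toy:Source.magnitudes">
    <ATTRIBUTE dmtype="ivoa:real" value="12.3"/>
    <ATTRIBUTE dmtype="ivoa:real" value="11.8"/>
  </COLLECTION>
</INSTANCE>
"""


def to_json(element):
    """Map the three structural elements onto their JSON counterparts."""
    if element.tag == "ATTRIBUTE":
        return element.get("value")                    # leaf value
    if element.tag == "COLLECTION":
        return [to_json(child) for child in element]   # JSON array [...]
    if element.tag == "INSTANCE":
        # JSON object {...}, keyed by the role of each child
        return {child.get("dmrole"): to_json(child) for child in element}
    raise ValueError(f"unsupported element: {element.tag}")


as_dict = to_json(ET.fromstring(BLOCK))
print(json.dumps(as_dict, indent=2))
```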

(27) p. 21 "The Position column indicates the required rank of the child element" -- "rank" isn't a well-defined term here. Let's just drop that sentence; I'd say the "Position" column is clear enough as is.

LM Answer: see PR #166

(28) p. 22 "The ATTRIBUTE must always have a non empty @dmtype XML attribute." -- ouch. You see, when an attribute references a FIELD or PARAM, a parser then has to juggle datatype, arraysize, and xtype from the VOTable thing and now on top your dmtype. Why would you want to inflict something like that on our clients? It's bad enough that VO-DML had to introduce another type system -- please keep it out of VOTable and outlaw @dmtype when referencing VOTable PARAMs and FIELDs. As long as it's legal, it's only a matter of time until someone abuses that attribute to deepen the confusion of the VOTable types even further.

LM Answer: I agree with you that there is a sort of mess around all the type definitions you enumerate. However, as dmtype are mandatory parts of the models (not only in VODML), MIVOT has to support them.

(30) p. 23 "Reference of the FIELD or PARAM that has to be used to set the ATTRIBUTE value." - aw, that can of worms. Using VOTable or XML id-s for these references is rather uncool, for one, because writers have to come up with these ids, but even more because when you combine multiple resources into one VOTable (which, for instance, TOPCAT does), requiring global uniqueness means you have to manage the dmids or the individual resources, which usually is ugly and error-prone. So... since we're now directly annotating tables, can't we at least (also) allow using names as references? I realise that's a bit clumsy because PARAM-s and FIELD-s can have the same names, but enabling name-based references would count as my strongest reason to bind annotations to single tables. So, let's reap that benefit...

LM Answer: see PR #166

(31) p. 24 Where I think that COLLECTION may have a few meanings too many, I'm not entirely sure why there's an extra REFERENCE element. What would be worse if you just allowed dmref on ATTRIBUTE?

LM Answer: In the first draft, INSTANCE@dmref was used instead of REFERENCE, but it turned out that, apart from the XSD complexity, turning INSTANCE into a Swiss Army knife was detrimental to readability. In addition, our experience with on-the-fly annotations made us think that this pattern might raise some difficulties.

(32) p. 24, the sourceref attribute -- I can't claim I've thought this properly through, but this sounds more than a bit scary. This is intended to support cases where metadata is in table columns, right? If so, I'd strongly say we should drop the feature. I know there are tables like that out there, but they're broken (which you can tell already because you can't do proper descriptions or UCDs for them), and we should not encourage people to write broken tables. Let's use mivot to nudge people in the right direction, not complicate its implementation.

LM Answer: This pattern allows one particular item to be retrieved from a collection without using its position. This is useful, e.g., to get items by filter name (which is what you complain about) or to retrieve data related to one particular source from collections mixing data from different sources, which is rather usual.

(33) p. 24, Listing 15: There's LaTeX markup left in the listing.

LM Answer: see PR #166

(34) p. 25 "The foreign collection can either be a static element GLOBALS COLLECTION or a collection of INSTANCE resulting from the iteration over a TEMPLATES." -- for one, I don't get the GLOBALS COLLECTION part, and then the numerus of the various words doesn't fit. Either write "collection of INSTANCE-s" or, perhaps preferably, "collection of INSTANCE elements". Analogously, "iteration over a TEMPLATES element", etc. [that problem exists in several other places, too, but it's worst here, so this is where I complain...]

LM Answer: see PR #166

(35) p. 26 "This over-statement may help the parser but it can carry inconsistencies. An error must be risen in this case." -- how does it help the parser to have to check the consistency? While I've not written any code yet, I can already predict I'll curse you for this. Please just don't do it. If you think there's a point for sourceref (which isn't obvious to me, see above), then at least outlaw dmref and sourceref at the same time.

LM Answer: see PR #166

(36) p. 26 "the foreign COLLECTION must be direct child of GLOBALS..." -- this sequence of requirements left me scratching my head (and you probably want to say "possibly" rather than "eventually" a few lines below that). All the if-then-s here made me wonder, and I tried to locate corresponding code for that in the mivot_validator. I eventually figured out you're using xs:assert statements in the schema itself, so I ended up staring at these; first, please remove the commented-out asserts, which make the stuff look even worse. And the assertions after current line 210 re-inforce my skepticism -- having so many if-thens is an indication to me that we ought to structure the whole standard differently.

LM Answer: We use as few different XML elements as possible, with an obvious gain in terms of both compactness and readability. The price for this is that the flexibility is carried by the element attributes. This requires either defining the allowed attribute patterns within the document or using XSD 1.1. We chose the latter because it is unambiguous by construction and machine-readable.

(37) p. 27 "The mapping syntax does not specify the data types to be used to evaluate the expression." -- I notice that your implementation tries to make good on this by stringifying the various values. That may be marginally all right. But the consequence is that NULLs will participate in joins (as 'None'=='None'), which they don't in SQL, and I think it'd be a lot smarter if we followed SQL in this. Which brings me to a major qualm with the current proposal: It doesn't say anything about NULLs anywhere. Perhaps that's all right (though I suspect commenting on what a parser should do when it encounters NULLs in the various places FIELDs and PARAMs are used would uncover a few hidden snags), but at least with WHERE you have to be explicit.

LM Answer: see PR #166
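
The NULL issue raised in comment (37) can be made concrete with a small sketch comparing a join on stringified keys with a join following SQL semantics, where NULL never matches anything. The row data and the function names `join_stringified` and `join_sql_like` are hypothetical and only illustrate the behaviour discussed in the comment.

```python
def join_stringified(left, right, key):
    """Join on str(value): None becomes 'None', so NULL keys match each other."""
    return [(l, r) for l in left for r in right if str(l[key]) == str(r[key])]


def join_sql_like(left, right, key):
    """Join following SQL semantics: a NULL (None) key never matches."""
    return [
        (l, r)
        for l in left
        for r in right
        if l[key] is not None and r[key] is not None and l[key] == r[key]
    ]


sources = [{"id": 1}, {"id": None}]
detections = [{"id": 1, "flux": 2.0}, {"id": None, "flux": 9.9}]

print(len(join_stringified(sources, detections, "id")))  # 2: the NULL rows are paired
print(len(join_sql_like(sources, detections, "id")))     # 1: only id == 1 matches
```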

(38) p. 29 primary_key patterns -- for this case, I checked the mivot-code tests and didn't find a test for when there's a non-empty @ref. Is there something else exercising this?

LM Answer: Unit test #11 checks the pattern <PRIMARY_KEY dmtype="fffff" ref="zzzz" />. These tests only validate the syntax, not the client processing. In fact, this is one of the features that are still not tested by mivot-code.

(39) p.30 "This is part of this standard and available at http://ivoa.net/XML/MIVOT/mivot-v1.0.xsd." It's perhaps not unreasonable to link to the versioned artefact here. But please add something like "Implementations should always retrieve the schema from the namespace URI, i.e., http://www.ivoa.net/xml/merged-syntax" (or, really, whatever better NS URI you come up with, see above).

LM Answer: see PR #166

(40) p. 30, the extra validity requirements at the foot of the page ("Are references resolvable" ff): mivot-validator doesn't look at any of this, does it?

LM Answer: It doesn't. It just validates against both VOTable and MIVOT schema. XML attribute values are not covered by the schema.

(41) p. 23 "value is set by a resolved @ref, @unit must be compliant with the unit of the referenced FIELD or PARAM. The inconsistency handling at this level is beyond the scope of this document." Please don't do this. It's easy to just outlaw unit when there's ref, and I don't think there's a plausible situation where allowing it would improve anything.

LM Answer: see PR #166

(42) p. 31 "This imposes the client to parse some XML using..." -- I think you can't use "impose" like this. Perhaps: "To do that, the client must parse..." (but then the "using XPath strings" doesn't quite match -- you can't parse using xpath... In general, if I had to choose I'd drop the entire section 6 ("Client APIs"); it doesn't seem like something that should be part of a REC, and I suspect it'll strike readers as a bit odd several years down the road.

LM Answer: see PR #166

(43) "No previous versions yet." -- Well, according to the IVOA doc repo, there has been a WD in April 2022, and according to the git log, there have been quite a few changes between the WD and the PR. Perhaps it doesn't matter much at this point, but as a service to folks that have already looked at the WD, it probably would be a nice service to summarise these changes.

LM Answer: see PR #166

(44) p. 32 Appendix A I'd also rather drop from a standard.

LM Answer: Appendix A has been swapped with appendix D. My take is that it is important to recall that MIVOT is not something out of the box but that it has been tested on services delivering legacy data. This raises some issues that deserve to be mentioned outside the standard definition sections.

(45) p. 32f Appendices B and C should perhaps be inlined to where the elements are treated. That way, it's less likely they'll be forgotten when these elements are updated, and with appendix B in REFERENCE's section I'd have known right away that I'm sure this shouldn't be supported:-).

LM Answer: These examples have been moved into the appendices because they are too long to stay in the normative section. Appendix B (Dynamic References) is referenced from the REFERENCE section and appendix C (Join Examples) is referenced from the JOIN section.

(46) p. 34ff, Appendix D -- please let's not use a totally denormalised table as an example in such a spec. People shouldn't write data like this, where mag and flux are really inhomogeneous; as I said above, you can already see that when you're trying to find UCDs for them. The right way to write such a table is to have per-band columns. Giving a denormalised example is going to be a bad precedent for many years ("but MIVOT is doing it like this, too"). If you really have nothing else to show off your features (but then perhaps the features should go?), at least put a big, fat warning here to the effect that "This example was chosen as a particular challenge for annotation. Do not write tables like this at home."

LM Answer: see PR #166. I wouldn't say that this table is denormalised, taking into account that each measurement has its own time. It is rather an efficient way to provide photometric points covering different wavelengths, each with a different timestamp.

(47) Please add a test target to your Makefile as per https://ivoa.net/documents/Notes/IVOATexDoc/20220525/NOTE-ivoatexDoc-1.3-20220525.html#tth_sEc3.11.1. As a hint why such a thing is important: Your example appendix_D.xml has not validated as it was in my checkout; a side benefit of having pre-flight tests is that you have a chance to notice when you change our schema and break a feature exercised in your example.

With the corrections I have applied in my small-edits PR, I'd say

stilts xsdvalidate schemaloc="http://www.ivoa.net/xml/merged-syntax=../schema/xsd/mivot-v1.0.xsd" appendix_D.xml

for a reasonably recent stilts should give no errors; it doesn't though. It looks as if the XSD mechanism in stilts doesn't understand your xsd:assert-s. Perhaps you can work with Mark to see if he can include an XSD processor that supports them? Or, sigh, use some XSD processor that knows about xs:assert instead of stilts?

However, frankly: I'd say a specification that can live without these assert elements and still catches a lot of possible errors would probably be a better spec anyway...

ANSWER 47 (LM): We had this discussion with MT. It turned out that XSD 1.1 is no longer natively supported in Java and Mark does not wish to add external dependencies to stilts. He told us that this shouldn't prevent us from using XSD asserts. We spent a lot of time trying to figure out how to avoid using them, but we failed. This is why I wrote a Python validator (https://github.com/ivoa/mivot-validator).
A valid_snippet target has been added to the makefile (https://github.com/ivoa-std/ModelInstanceInVot/pull/165).

I've made PR#164 with a few editorial changes.

ANSWER (LM) https://github.com/ivoa-std/ModelInstanceInVot/pull/164 merged

-- MarkusDemleitner - 2022-11-04

Second review by MarkusDemleitner

I have some replies to Laurent's replies to my first, November 2022, review above. This is what "item (n)" below refers to; I'm extracting them here to keep things from disappearing in the text above. Finally, as implementation feedback, I have a new point below, (f).

(a) [on the lack of implementations]

LM Answer: you can find an example of an Astropy binding (SkyCoord output) in https://github.com/ivoa/modelinstanceinvot-code. Have a look at the notebook

Hm -- I'm really unhappy with this example, as it does not address the fundamental challenge: How do I assemble the six-parameter solution with a position and its velocities so the machine can confidently do epoch propagation? Sure, you could blindly pull some Velocity instance, but that breaks the moment there is more than one position or velocity. That is: What this demonstrates would have worked just as well just with UCDs.

I also find the positional error modelled rather oddly in the luhman16 example: Why the mix between meas:Asymmetrical3D and meas:Asymmetrical2D.minus? Why is that asymmetrical in the first place? I give you that's mostly a problem with the models, but MIVOT can't be exercised without them, and so regrettably we can't quite abstract away from these.

So... the example would be a lot more convincing if you did epoch propagation (perhaps having an epoch slider in your plot), and even more convincing if you did epoch propagation in the presence of multiple solutions, as only then a gain over UCDs is demonstrable.

LM answer: I can answer with a little homework: take the input VOTable of that notebook, add new positional columns and run Jupyter again. You will get the same result.

(b) On item (5)

LM Answer: You are right that VOTable parsers will naturally skip MIVOT blocks, but our concern was more to prevent MIVOT parsers from getting lost in VOTable elements. The actual purpose of this requirement is to make life as easy as possible for MIVOT client developers. We will keep it.

Given that having to juggle the namespaces makes life harder for the much larger number of document producers, I still maintain that's a questionable decision by utilitarian points of view. But true, as long as we have that plethora of new elements, we certainly don't want to pollute the VOTable namespace with them. So, as long as we don't greatly simplify MIVOT, I'll not belabour that point any further.

(c) on item (28)

LM Answer: I agree with you that there is a sort of mess around all the type definitions you enumerate. However, as dmtype are mandatory parts of the models (not only in VODML), MIVOT has to support them.

Supporting them for literals (as long as you think you have to have them) is one thing. Requiring them when you reference FIELD-s and PARAM-s is a totally different thing and, I claim, is entirely wrong. What use case would even suggest that, let alone require it? So, I keep up my request to forbid dmtype on references to PARAMs and FIELDs.

MCD answer:

o The dmtype and dmrole attributes are what map the VOTable content to the model.
o On Markus' earlier argument (VOTable PARAMs already have datatype, arraysize, xtype):
  - For those to serve the same purpose, we would have had to link the annotation and VOTable schemas (at the very least) and require providers to change their existing VOTable serializations to match model expectations.
  - Both of these were rejected as possibilities long ago.
o On the current comment:
  - For the most part, there should be pretty good agreement between the ATTRIBUTE dmtype and the PARAM/FIELD datatype, but:
    + Model types don't necessarily agree with the serialization. This is usually trivial, but for example, if a model says 'ivoa:nonnegativeinteger' and the VOTable says datatype="double", model-savvy code can identify data values that are bad/invalid from the model perspective (eg: -1.0 or 4.384).
    + dmtype allows mapping string content to particular interpretations: eg: "ivoa:Unit" and "Phot:UCD" will both be VOTable datatype='char'.
    + dmtype allows different packaging:
      VOTable: <PARAM datatype="double" arraysize="2" unit="deg" value="47.9343 -83.5931"/>
      MIVOT: 2 ATTRIBUTEs populating the LonLatPoint lon and lat attributes individually
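
The 'ivoa:nonnegativeinteger' case mentioned above can be sketched in a few lines. This is a minimal illustration only: `is_valid_for_dmtype` is a hypothetical helper, not part of MIVOT or of any existing client library, and it implements just the one rule discussed here.

```python
def is_valid_for_dmtype(raw_value: str, dmtype: str) -> bool:
    """Tell whether a serialized VOTable value is acceptable for a model type.

    Hypothetical helper: only the 'ivoa:nonnegativeinteger' rule from the
    discussion is implemented; any other dmtype is accepted unchecked.
    """
    if dmtype != "ivoa:nonnegativeinteger":
        return True
    try:
        number = float(raw_value)  # the VOTable side may say datatype="double"
    except ValueError:
        return False
    # The model asks for a whole number that is not negative.
    return number >= 0 and number.is_integer()


# Values as they could be read from a column declared with datatype="double"
for raw in ("3.0", "-1.0", "4.384"):
    print(raw, is_valid_for_dmtype(raw, "ivoa:nonnegativeinteger"))
```

A check of this kind is only possible because the annotation carries the model type next to the column reference; the VOTable datatype alone would accept all three values.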

(d) on item (31)

LM Answer: In the first draft, INSTANCE@dmref was used instead of REFERENCE, but it turned out that, apart from the XSD complexity, turning INSTANCE into a Swiss Army knife harmed readability. In addition, our experience with on-the-fly annotations made us think that this pattern might raise some difficulties.

Could you try and explain where REFERENCE vs. dmref improves anything?

MCD answer:
The REFERENCE element conveys a model relation, that the target object has a reference relationship with the parent object.
eg: A COLLECTION of REFERENCES is different from a COLLECTION of INSTANCES.

LM answer: An INSTANCE able either to convey an object's content or to reference another INSTANCE would be a big mess for the XSD and for users, because you would have to deal with many rules stating how ref/value/dmref/element-emptiness should live together. Your remark is a bit in contradiction with your considerations on XSD complexity (off record).

(e) on item (26)

LM Answer: We are using as few different XML elements as possible, with an obvious gain in terms of both compactness and readability.

For the record, I'd dispute the readability part: If I have to go through all these if-thens before I understand what a construct actually is, that's not readable at all. But note that that's not an argument for more elements. It's an argument for fewer features.

MCD answer (same as above): The syntax needs to convey the modeling concepts supported by VODML, so that means either more elements (à la original syntax) or more complex descriptions and workflows (à la MIVOT). I don't think 'fewer features' is an option.

(f) My new concern: I have never been particularly fond of fully qualified role names ('dmrole="ivoa:Quantity.val"'). I have now tried to implement things and I have found that with these qualified names, I have to implement parsing of VO-DML files (and then following inheritance hierarchies) just to infer these fully qualified attribute names. That's about an *order of magnitude* more work than if I can just write 'dmrole="val"' (note that the type that attribute sits on is already given by the dmtype of the embedding element, except that that's the actual type, not the type that defines the attribute). Given that: is there a really strong reason to have these qualified role names? And where in the spec is the "algorithm" to figure out these names spelled out anyway? Now that I have looked, it would seem to me I'd be perfectly justified to write "val" (or just about anything else) there...

MCD answer: the value of these is a VODML "ElementRef" and is defined in the vodml standard section 4.2. It is that which identifies what 'modeled thing' is being talked about. MIVOT could make a statement about the value origin and syntax referring to the VODML spec, but MIVOT itself does not define these.
o there is some merit to the general statement.. it is awkward to have an INSTANCE with dmtype="Phot:PhotometryFilter" and then each contained ATTRIBUTE with dmrole="Phot:PhotometryFilter.xxx" and even more so when class hierarchy gets in the mix.

o However

1) this has been a core part of the syntax design from the start (so over 5 years now), so it's a bit unfair to bring this up as an issue now.

2) I can't say I've done a lot of it, but it seems more awkward to have to refer back to the parent element to determine the context in which an object is playing that role.

eg: find the target position:

a) search for instance with dmtype="meas:Position" and dmrole="ds:AstroTarget.position"

b) search for instance with dmtype="meas:Position" and dmrole="position" whose parent dmtype="ds:AstroTarget"

eg: what's the bandwidth for the "SDSS.G" Photometry filter?

a) get parent instance of attribute with dmrole="Phot:PhotometryFilter.name" and value="SDSS.G"; return values of child attributes with dmrole="Phot:Bandwidth.start" and dmrole="Phot:Bandwidth.stop"

b) since 'name' is a role that comes up often, you need to make sure you are in the right context.. you can't just search for any attribute with dmrole='name' and value='SDSS.G'. so the process is:

get all instances with dmtype="Phot:PhotometryFilter"; find instance containing direct child attribute with dmrole="name" and value="SDSS.G"; find containing instance with dmtype="Phot:Bandwidth"; return values of child attributes with dmrole="start" and dmrole="stop".
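
The two lookups for the "SDSS.G" bandwidth can be compared side by side in a short sketch. Everything here is illustrative: the two blocks are hand-written, the bandwidth values are placeholders rather than real filter data, the roles "Phot:PhotometryFilter.bandwidth" and "bandwidth" are assumed, and `ex:Survey` is an invented type whose only purpose is to show why style (b) needs the extra type check.

```python
import xml.etree.ElementTree as ET

# Style (a): fully qualified role names.
QUALIFIED = """
<GLOBALS>
  <INSTANCE dmtype="Phot:PhotometryFilter">
    <ATTRIBUTE dmrole="Phot:PhotometryFilter.name" dmtype="ivoa:string" value="SDSS.G"/>
    <INSTANCE dmrole="Phot:PhotometryFilter.bandwidth" dmtype="Phot:Bandwidth">
      <ATTRIBUTE dmrole="Phot:Bandwidth.start" dmtype="ivoa:real" value="1.0"/>
      <ATTRIBUTE dmrole="Phot:Bandwidth.stop" dmtype="ivoa:real" value="2.0"/>
    </INSTANCE>
  </INSTANCE>
</GLOBALS>
"""

# Style (b): short role names; the first instance is a decoy with the same name.
UNQUALIFIED = """
<GLOBALS>
  <INSTANCE dmtype="ex:Survey">
    <ATTRIBUTE dmrole="name" dmtype="ivoa:string" value="SDSS.G"/>
  </INSTANCE>
  <INSTANCE dmtype="Phot:PhotometryFilter">
    <ATTRIBUTE dmrole="name" dmtype="ivoa:string" value="SDSS.G"/>
    <INSTANCE dmrole="bandwidth" dmtype="Phot:Bandwidth">
      <ATTRIBUTE dmrole="start" dmtype="ivoa:real" value="1.0"/>
      <ATTRIBUTE dmrole="stop" dmtype="ivoa:real" value="2.0"/>
    </INSTANCE>
  </INSTANCE>
</GLOBALS>
"""


def attribute_value(element, dmrole):
    """Value of the first descendant ATTRIBUTE playing the given role, or None."""
    for attr in element.iter("ATTRIBUTE"):
        if attr.get("dmrole") == dmrole:
            return attr.get("value")
    return None


def bandwidth_qualified(root, filter_name):
    """Style (a): the qualified role alone identifies the right attribute."""
    for inst in root.iter("INSTANCE"):
        for attr in inst.findall("ATTRIBUTE"):
            if (attr.get("dmrole") == "Phot:PhotometryFilter.name"
                    and attr.get("value") == filter_name):
                return (attribute_value(inst, "Phot:Bandwidth.start"),
                        attribute_value(inst, "Phot:Bandwidth.stop"))
    return None


def bandwidth_unqualified(root, filter_name):
    """Style (b): the dmtype of the enclosing instance supplies the context."""
    for inst in root.iter("INSTANCE"):
        if inst.get("dmtype") != "Phot:PhotometryFilter":
            continue  # skips the decoy, which also has name="SDSS.G"
        names = [a.get("value") for a in inst.findall("ATTRIBUTE")
                 if a.get("dmrole") == "name"]
        if filter_name not in names:
            continue
        for child in inst.findall("INSTANCE"):
            if child.get("dmtype") == "Phot:Bandwidth":
                return (attribute_value(child, "start"),
                        attribute_value(child, "stop"))
    return None


print(bandwidth_qualified(ET.fromstring(QUALIFIED), "SDSS.G"))
print(bandwidth_unqualified(ET.fromstring(UNQUALIFIED), "SDSS.G"))
```

Both functions return the same pair. The difference lies in where the context comes from: the role string in (a), the dmtype of the enclosing INSTANCE in (b).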



-- MarkusDemleitner – 2023-01-26



Comments from TCG member during the RFC/TCG Review Period: 2022-09-16 - 2022-10-31

WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair

Applications Working Group

In general the document is well written and the design and decision making process very well documented. There is a lot of information in addition to the document itself and it's obvious that a lot of thought and effort went into creating it.

From the Apps perspective, I would have liked to see more practical examples in the "2.1 Use Cases and Requirements" section. In practice, how is MIVOT going to solve the problems listed there? Take cross-matching sources in VOTables as an example. I doubt that end users (astronomers) want to deal with the XML annotations, so how can we automate the parsing of the VOTable? What do we offer them instead? DM-specific classes? astropy classes? MIVOT offers the means to achieve serialization but in itself is not an end-user solution. I am aware that there are efforts underway to use it with PyVO - maybe that can offer some answers, but the use cases should give examples of what can be achieved.

LM answer: I propose to append something to section 6 for this. See PR #185

A main drawback that I see with MIVOT (and maybe it has been discussed before) is that the end user has no visibility into what fields need to be selected to get back a DM component. Or maybe that could be part of the client software API where the user does a "SELECT *" with MAXREC=0 and parses the result to get a list of tables and columns that need to appear in the SELECT statement. This is just another example of a practical problem that this spec could solve.

LM answer: This point may be clarified in appendix D. We discussed this a lot when preparing the ADASS 2021 BoF. The fact is that we do not have any query language able to SELECT model classes. Consequently, the server has to do the best mapping it can with the selected columns. If you want to make sure to get all the columns you need to work with objects, you have to run SELECT * FROM SOMETHING. This is one of the reasons why we introduced the REPORT element: it informs the client about the mapping status. See PR #185
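
As a rough sketch of the client-side idea discussed above: given an annotated, metadata-only response (for instance from "SELECT *" with MAXREC=0), a client could list the columns the mapping block actually references. The sample document is hand-written; the `ex:` model, the IDs and the column names are invented, and whether a given service annotates metadata-only responses is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented metadata-only response: three columns, two of them mapped.
SAMPLE = """
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <RESOURCE type="meta">
      <VODML xmlns="http://www.ivoa.net/xml/merged-syntax">
        <TEMPLATES>
          <INSTANCE dmtype="ex:Position">
            <ATTRIBUTE dmrole="ex:Position.lon" dmtype="ivoa:real" ref="_ra"/>
            <ATTRIBUTE dmrole="ex:Position.lat" dmtype="ivoa:real" ref="_dec"/>
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE name="ex_table">
      <FIELD ID="_ra" name="ra" datatype="double" unit="deg"/>
      <FIELD ID="_dec" name="dec" datatype="double" unit="deg"/>
      <FIELD ID="_note" name="note" datatype="char" arraysize="*"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def mapped_columns(votable_xml):
    """Names of the FIELDs referenced by at least one ATTRIBUTE@ref."""
    root = ET.fromstring(votable_xml)
    fields = {el.get("ID"): el.get("name")
              for el in root.iter() if local_name(el.tag) == "FIELD"}
    refs = {el.get("ref") for el in root.iter()
            if local_name(el.tag) == "ATTRIBUTE" and el.get("ref")}
    return sorted(fields[ref] for ref in refs if ref in fields)


print(mapped_columns(SAMPLE))
```

The resulting list is what a client would put in its SELECT clause to be sure the mapped object can be rebuilt; columns such as "note" above are not needed by the mapping.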

I propose to add somethin

Other minor observations:

Section 4.2

"Each element subsection come" -> "Each element subsection comes"

LM answer: See PR #82

Section 4.10

I think that the serialization should work across languages and should specify the representation of NULL values (vs empty ones).

LM answer: OK for adding the representation. I've opened issue #183 to get feedback before opening a MR. See proposal in PR #185

"but the way it do it is" -> "but the way it does it is"

LM answer: See PR #182

-- AdrianDamian - 2023-01-26

Data Access Layer Working Group

The standard looks well written and seems a good fit for the requirements. We haven't directly used it ourselves yet though.

-- JamesDempsey - 2023-01-25

Data Model Working Group

Grid & Web Services Working Group

The document is well structured and there are examples and code that guide implementations.

It seems that the Editor has resolved the comments from other WGs and from the Community. I think at this stage the document can be accepted.

-- GiulianoTaffoni - 2023-01-25

Registry Working Group

(1) From the Registry perspective: it could be useful for VO users and validators to discover which services may implement MIVOT blocks in the VOTables they return, or to check if a particular service does it. This document could therefore benefit from a small section about that need, which could evoke a new Registry capability to be defined in details in a future MIVOTRegExt standard.

LM answer: an appendix on the registry issue has been added in #184

-- RenaudSavalle - 2023-01-27

Semantics Working Group

It seems Semantics appears largely unconcerned.

(1) We would, however, be grateful for a brief guide of the type "To see that this works and perhaps play around with it a bit, do X, Y, and Z"; it's nice that there's apparently quite a bit of code around, but at the same time that makes it a bit hard to decide what to look at (and for). You know, this is a rather complex beast, and I find it hard to see which of the many patterns here are actually implemented in which sense.

LM Answer: You are right. We put a lot of work into developing this standard and various tools to help the community implement it; it would have been appropriate to also provide a beginner's guide. However, writing it would have required too much manpower and thus delayed once more the RFC process. We prefer to move forward to make the REC happen while remaining ready to provide assistance to anyone interested in getting involved.

(2) The obvious connection to Semantics is VO-DML's SemanticConcept. As you in general don't say a lot about the relationship of MIVOT and the various VO-DML elements (should we be worried about that?), you don't say anything about that, either. So... what about this? Are there common rules needed to reference vocabulary terms? Can you perhaps point us to examples where you used, say reference_frame?

LM Answer: In short, there is no direct connection between MIVOT and VODML. MIVOT is designed to map data on class hierarchies made of 3 structural elements: the atomic values, the lists and the objects (see section 4). Each element can have a type and play a role in that hierarchy. This is the basis of any object-oriented approach. The link with VODML is that both roles and types in the mapping block must refer to models that must be serialized in VODML. However, if you ignore this requirement, you won't break anything, as shown by all of our validation snippets. The processing of semantic concepts is a concern for the models, not for the mapping. In VODML, semantic concepts are rather used to constrain the values of model leaves. This has nothing to do with the mapping syntax, which does not check model compliance.

-- MarkusDemleitner - 2022-11-04

After poking a bit more, I am really unconvinced that sufficient implementation has been demonstrated. Sure, there is a lot of code, but for all I can see that's in various stages of being outdated, partly clearly broken, and partly using unclear or outdated models.

The least I would like to see as "reference implementation" is a live TAP service generating MIVOT annotation sufficient for (a) doing epoch propagation and (b) outputting (wavelength, flux) pairs from sufficiently annotated magnitudes. With my WG chair hat off, I would be happy to assist in that implementation, but with my WG chair hat on I cannot vote for this if that basic functionality is not convincingly demonstrated.

LM answer: I think I have to clarify that MIVOT is not an end-to-end framework. Its purpose is to map data on models, any model, nothing more.
We are here on the RFC page, where we are supposed to validate that MIVOT does the job it has been designed for well. I'm sure it does.
MIVOT is one element of the WG roadmap as presented for 4 years: 1) make models for components (STC/PhotDM); 2) make a mapping syntax that works with any model (here we are); 3) the last stage: build a model that aggregates those of 1). This is the purpose of the MANGO data model we are working on. Once we have all of this in RFC or so, you could rightfully address the science cases that are not covered. But as long as we do not have MIVOT as a REC and MANGO as an IVOA draft, your request is not relevant. I'm reacting that way because connecting a measure with a Photcal is a job for MANGO (see many interop PPTs). I would also remind you that this PhotDM impl notebook is very close to what you are expecting. I'm ready to work with you to improve it, but this is a model matter (MANGO), not a mapping issue.

Another very central thing I could not spot is a proper instance validator. The validator that's there apparently only does a schema validation (and that only with XSD 1.1, which is a major step for the IVOA that's so far kept to XSD 1.0, something I'd advocate a lot because XSD 1.1 implementations appear to be few and far between). But the actual promise of the whole DM thing is that I can see whether instances actually match the constraints set by the VODML -- is there anything like that? Again, I think I really cannot vote for this until there is at least a credible prototype for such a validator, because without it, I can already predict we will have almost only invalid annotations given the complexity of this spec.

-- MarkusDemleitner – 2023-01-26

LM answer: Building a tool that can validate MIVOT annotations against the model structure is feasible but out of reach for the resources we have. I have enough experience with this sort of tool to say that. If someone volunteers to make it, I can help. If this is your request, the answer is negative: I cannot promise it. I can, however, enable my validator to check that all dmroles and dmtypes match those of the declared models.

Data Curation & Preservation Interest Group

Education Interest Group

Knowledge Discovery Interest Group

Operations Interest Group

Overall the text looks carefully written. I haven't yet tried using this proposal myself so I don't have a feel for how it works in practice, but implementations and validators seem to be in place.

One suggestion: The examples in the text for each feature are welcome, but it would be nice as well to have a single complete example of a VOTable marked up using MIVOT, alongside an explanation of what the markup is doing or how it could be used. Because of the size of the standard it wouldn't make sense to show off all the features in such an example, I have in mind something fairly simple. I appreciate that there are multiple examples in the github repository, but it's not obvious to the reader where to start, and not all those examples correspond to the published version. I leave it to the authors to decide whether this suggestion is a good idea or practicable.

LM Answer: Thank you for the suggestion. We had a similar discussion among the authors and we then considered a way to highlight the purpose of the snippets in a real context (see Github Issue). The idea was to add an appendix with a complete VOTable containing most of the mapping features. We have one compiled by MCD. This would take 7 pages, which is affordable. The snippets within the normative text would then be extracted from that table along with \label\ref links allowing readers to jump back and forth between the normative section and the original location in the VOTable. We exercised it with some success (and one tweak), but as this would require a lot of work we decided to postpone it until we got strong motivation from the reviewers. So let's do it.

A PR has been opened with the desired changes

-- LaurentMichel - 2022-10-06

A few minor corrections below (version 2022-09-16):

  • Section 4.6: listings 10 and 11 are identical.
  • Section 4.9: broken Appendix reference "See more examples in Appendix ??" . Same thing in (at least) Listing 10 and Section 4.10 - missing \label{appen_*} definitions?
  • Listing 9: outermost closing tag is missing a "/"; example reads "<GLOBALS>...<GLOBALS>" rather than "<GLOBALS>...</GLOBALS>". Same thing in listings 10, (11), 12.
  • Table 22: "he host" -> "the host"
  • Section 4.12: there appear to be a couple of full stops missing.
LM Answer: Fixed by https://github.com/ivoa-std/ModelInstanceInVot/pull/161
-- LaurentMichel - 2022-10-05

Radio Astronomy Interest Group

Solar System Interest Group

Theory Interest Group

Time Domain Interest Group

Standards and Processes Committee


TCG Vote : 2022-09-16 - 2022-10-31

If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.

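| Group | Yes | No | Abstain | Comments |
| TCG | | | | |
| Apps | | | | |
| DAL | * | | | |
| DM | | | | |
| GWS | * | | | |
| Registry | | | | |
| Semantics | | | | |
| DCP | | | | |
| Edu | | | | |
| KDIG | | | | |
| Ops | * | | | |
| Radio | | | | |
| SSIG | | | | |
| Theory | | | | |
| TD | | | | |
| StdProc | | | | |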
Topic revision: r48 - 2023-01-27 - LaurentMichel
 