Model Annotation in VOTables (MIVOT)
MIVOT at a Glance
Model Instances in VOTables (MIVOT) defines a syntax to map VOTable data to any model serialized in VO-DML. The annotation operates as a bridge between the data and the model. It associates the column/param metadata from the VOTable to the data model elements (classes, attributes, types, etc.) of a standardized IVOA data model, expressed in the Virtual Observatory Data Modeling Language (hereafter VO-DML). It can also supply data or metadata that are missing from the table metadata. The data model elements are grouped in an independent annotation block complying with the MIVOT XML syntax. This annotation block is added as an extra resource element at the top of the resource containing the query response. The data annotation can be operated in the context of any VO protocol.
The MIVOT syntax allows a data structure to be described as a hierarchy of classes. It can also represent relations and composition between them, and it can build data model objects by aggregating instances from different tables of the VOTable. Missing metadata can also be provided using MIVOT, for instance by completing the coordinate system description or by providing curation tracing. The annotation block is the VO-DML transcription of data model classes, with their attributes, types, and relations. It maps the VOTable data on the relevant model classes. It is made of reusable bricks that facilitate the development of tools on both client and server sides. The adopted design does not alter the original VOTable content.
History
The first proposal for a solution for mapping data on models was based on GROUPS and UTypes. This approach suffered some flaws and the IVOA decided to promote a mapping syntax closer to VO-DML, a new standard for model serializations in XML that became a REC in 2016. 
The baseline of this approach consists in inserting into the VOTable to be annotated, an XML block that is faithful to the model structure and that acts as a bridge between the model leaves and the actual data.
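As an illustration of the annotation block acting as a bridge between model leaves and table columns, here is a minimal Python sketch that locates such a block in a VOTable and lists which model role points at which column. The namespace URIs, the `example:Position` model, the column IDs and the exact element layout are assumptions made for this sketch; it is not guaranteed to validate against the MIVOT schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; check them against the schema version you target.
VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
MIVOT_NS = "http://www.ivoa.net/xml/mivot"

# Hypothetical annotated VOTable: the mapping block sits in an extra
# RESOURCE at the top of the RESOURCE that holds the query response.
ANNOTATED_VOTABLE = f"""
<VOTABLE xmlns="{VOT_NS}" version="1.4">
  <RESOURCE type="results">
    <RESOURCE type="meta">
      <VODML xmlns="{MIVOT_NS}">
        <TEMPLATES tableref="Results">
          <INSTANCE dmtype="example:Position">
            <ATTRIBUTE dmrole="example:Position.ra" dmtype="ivoa:real" ref="_ra"/>
            <ATTRIBUTE dmrole="example:Position.dec" dmtype="ivoa:real" ref="_dec"/>
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE ID="Results">
      <FIELD ID="_ra" name="ra" datatype="double" unit="deg"/>
      <FIELD ID="_dec" name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA><TR><TD>162.3</TD><TD>-53.3</TD></TR></TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
""".strip()


def find_mivot_block(votable_xml):
    """Return the first VODML element in the MIVOT namespace, or None."""
    root = ET.fromstring(votable_xml)
    return root.find(f".//{{{MIVOT_NS}}}VODML")


def map_roles_to_columns(block):
    """Map each ATTRIBUTE dmrole to the FIELD/PARAM ID it references."""
    return {
        attr.get("dmrole"): attr.get("ref")
        for attr in block.iter(f"{{{MIVOT_NS}}}ATTRIBUTE")
        if attr.get("ref") is not None
    }


block = find_mivot_block(ANNOTATED_VOTABLE)
role_map = map_roles_to_columns(block)
print(role_map)
```

Because the block lives in its own namespace, a client can read it without touching the TABLE content, and a VOTable parser that ignores the extra resource still sees the original data unchanged.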
Standard and Tools
The draft is managed following the GitHub workflow. The proposal is hosted at https://github.com/ivoa-std/ModelInstanceInVot
Repository
The GitHub project repository contains the following section:
Reference Interoperable Implementations
Data annotation is one of the steps of a broader workflow that starts from raw data and ends up with the science code. It is difficult to define a reference implementation for the mapping without services able to provide annotated data or clients able to process them. We overcame this difficulty by emulating the missing links. Our implementations work mainly with prototype services or with datasets annotated by hand. Client code shows that the dataset content can be interpreted by reading the annotations alone.
Implementations
Validators
mivot-validator is a Python validator for VOTables annotated with MIVOT. The validation process has two steps:
Annotations and Astropy
The TCG mandate about the mapping syntax also included the commitment to provide tools that could help both data providers and client developers assess the impact of working with annotated data.
Comments from the IVOA Community during RFC/TCG review period: 2022-09-12 - 2022-10-24
The comments from the TCG members during the RFC/TCG review should be included in the next section. In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment. Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document. Comments added as GitHub issues must also be reported here.
Using GitHub Issues: You can use GitHub issues to comment. In this case, just add here a short label with the issue URL. The discussion can take place on GitHub and the final editor answer will be summarized with a reference to the corresponding MR, if any.
Comments by Markus Demleitner
You'll not be surprised to learn that I still think this is far too complex, complicated and powerful, and that I severely doubt we have the capacities to even fully specify such a thing, let alone correctly implement it.
LM Comment (on the comments): MIVOT has been designed 1) to render any object hierarchy 2) to map any data arrangement on that hierarchy 3) to have a seamless integration into the VOTable ecosystem:
(39) p.30 "This is part of this standard and available at http://ivoa.net/XML/MIVOT/mivot-v1.0.xsd." It's perhaps not unreasonable to link to the versioned artefact here. But please add something like "Implementations should always retrieve the schema from the namespace URI, i.e., http://www.ivoa.net/xml/merged-syntax" (or, really, whatever better NS URI you come up with, see above).
LM Answer: see PR #166
(40) p. 30, the extra validity requirements at the foot of the page ("Are references resolvable" ff): mivot-validator doesn't look at any of this, does it?
LM Answer: It doesn't. It just validates against both the VOTable and MIVOT schemas. XML attribute values are not covered by the schema.
(41) p. 23 "value is set by a resolved @ref, @unit must be compliant with the unit of the referenced FIELD or PARAM. The inconsistency handling at this level is beyond the scope of this document." Please don't do this. It's easy to just outlaw unit when there's ref, and I don't think there's a plausible situation where allowing it would improve anything.
LM Answer: see PR #166
(42) p. 31 "This imposes the client to parse some XML using..." -- I think you can't use "impose" like this. Perhaps: "To do that, the client must parse..." (but then the "using XPath strings" doesn't quite match -- you can't parse using xpath...). In general, if I had to choose I'd drop the entire section 6 ("Client APIs"); it doesn't seem like something that should be part of a REC, and I suspect it'll strike readers as a bit odd several years down the road.
LM Answer: see PR #166
(43) "No previous versions yet." -- Well, according to the IVOA doc repo, there has been a WD in April 2022, and according to the git log, there have been quite a few changes between the WD and the PR. Perhaps it doesn't matter much at this point, but as a service to folks that have already looked at the WD, it probably would be a nice service to summarise these changes.
LM Answer: see PR #166
(44) p. 32 Appendix A I'd also rather drop from a standard.
LM Answer: Appendix A has been swapped with appendix D. My take is that it is important to remind readers that MIVOT is not something out of the box but that it has been tested on services delivering legacy data. This raises some issues that deserve to be mentioned alongside the standard definition sections.
(45) p. 32f Appendices B and C should perhaps be inlined to where the elements are treated. That way, it's less likely they'll be forgotten when these elements are updated, and with appendix B in REFERENCE's section I'd have known right away that I'm sure this shouldn't be supported:-).
LM Answer: These examples have been moved into the appendices because they are too long to stay in the normative section. Appendix B (Dynamic References) is referenced from the REFERENCE section and appendix C (Join Examples) is referenced from the JOIN section.
(46) p. 34ff, Appendix D -- please let's not use a totally denormalised table as an example in such a spec. People shouldn't write data like this, where mag and flux are really inhomogeneous; as I said above, you can already see that when you're trying to find UCDs for them. The right way to write such a table is to have per-band columns. Giving a denormalised example is going to be a bad precedent for many years ("but MIVOT is doing it like this, too"). If you really have nothing else to show off your features (but then perhaps the features should go?), at least put a big, fat warning here to the effect that "This example was chosen as a particular challenge for annotation. Do not write tables like this at home."
LM Answer: see PR #166. I wouldn't say that this table is denormalised, taking into account that each measurement has its own time. This is rather an efficient way to provide photometric points covering different wavelengths, each with a different timestamp. 
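Items (40) and (41) above concern checks that schema validation does not cover: whether each @ref resolves, and whether @unit appears together with @ref. A minimal sketch of such a check follows, assuming a namespace-agnostic scan of ATTRIBUTE elements; the function names and the sample document are hypothetical, and this is not how mivot-validator works.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def check_references(votable_xml):
    """Report ATTRIBUTE refs that do not resolve, and refs combined with a unit."""
    root = ET.fromstring(votable_xml)
    # Collect the IDs of every FIELD and PARAM, whatever their namespace.
    column_ids = {
        el.get("ID")
        for el in root.iter()
        if local_name(el.tag) in ("FIELD", "PARAM") and el.get("ID")
    }
    problems = []
    for el in root.iter():
        if local_name(el.tag) != "ATTRIBUTE" or el.get("ref") is None:
            continue
        role, ref = el.get("dmrole"), el.get("ref")
        if ref not in column_ids:
            problems.append(f"{role}: ref '{ref}' matches no FIELD or PARAM ID")
        if el.get("unit") is not None:
            problems.append(f"{role}: unit is given together with ref")
    return problems


# Hypothetical sample: one dangling ref and one ref carrying a unit.
SAMPLE = """
<VOTABLE>
  <RESOURCE>
    <RESOURCE type="meta">
      <VODML>
        <TEMPLATES>
          <INSTANCE dmtype="example:Position">
            <ATTRIBUTE dmrole="example:Position.ra" dmtype="ivoa:real" ref="_ra"/>
            <ATTRIBUTE dmrole="example:Position.dec" dmtype="ivoa:real" ref="_missing"/>
            <ATTRIBUTE dmrole="example:Position.err" dmtype="ivoa:real" ref="_err" unit="arcsec"/>
          </INSTANCE>
        </TEMPLATES>
      </VODML>
    </RESOURCE>
    <TABLE>
      <FIELD ID="_ra" name="ra" datatype="double" unit="deg"/>
      <FIELD ID="_err" name="err" datatype="double" unit="arcsec"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>
""".strip()

problems = check_references(SAMPLE)
for line in problems:
    print(line)
```

The sketch only covers static references to FIELD and PARAM IDs; dynamic references and joins would need more than an ID lookup.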
(47) Please add a test target to your Makefile as per https://ivoa.net/documents/Notes/IVOATexDoc/20220525/NOTE-ivoatexDoc-1.3-20220525.html#tth_sEc3.11.1. As a hint why such a thing is important: Your example appendix_D.xml has not validated as it was in my checkout; a side benefit of having pre-flight tests is that you have a chance to notice when you change your schema and break a feature exercised in your example. With the corrections I have applied in my small-edits PR, I'd say stilts xsdvalidate schemaloc="http://www.ivoa.net/xml/merged-syntax=../schema/xsd/mivot-v1.0.xsd" appendix_D.xml for a reasonably recent stilts should give no errors; it doesn't though. It looks as if the XSD mechanism in stilts doesn't understand your xsd:assert-s. Perhaps you can work with Mark to see if he can include an XSD processor that supports them? Or, sigh, use some XSD processor that knows about xs:assert instead of stilts? However, frankly: I'd say a specification that can live without these assert elements and still catches a lot of possible errors would probably be a better spec anyway...
ANSWER 47 (LM): We had this discussion with MT. It turned out that XSD 1.1 is no longer natively supported in Java and Mark does not wish to add external dependencies to stilts. He told us that this shouldn't prevent us from using XSD asserts. We spent a lot of time trying to avoid using them but we failed. This is why I wrote a Python validator (https://github.com/ivoa/mivot-validator). A valid_snippet target has been added to the makefile (https://github.com/ivoa-std/ModelInstanceInVot/pull/165).
I've made PR#164 with a few editorial changes.
ANSWER (LM) https://github.com/ivoa-std/ModelInstanceInVot/pull/164 merged
-- MarkusDemleitner - 2022-11-04
Second review by MarkusDemleitner
I have some replies to Laurent's replies to my first, November 2022, review above. This is what "item (n)" below refers to; I'm extracting them here to keep things from disappearing in the text above. 
Finally, as an implementation feedback, I have a new point below, (f).
(a) [on the lack of implementations]
LM Answer: you can find an example of Astropy binding (SkyCoord output) in https://github.com/ivoa/modelinstanceinvot-code. Have a look at the notebook
Hm -- I'm really unhappy with this example, as it does not address the fundamental challenge: How do I assemble the six-parameter solution with a position and its velocities so the machine can confidently do epoch propagation? Sure, you could blindly pull some Velocity instance, but that breaks the moment there is more than one position or velocity. That is: What this demonstrates would have worked just as well just with UCDs. I also find the positional error modelled rather oddly in the luhman16 example: Why the mix between meas:Asymmetrical3D and meas:Asymmetrical2D.minus? Why is that asymmetrical in the first place? I give you that's mostly a problem with the models, but MIVOT can't be exercised without them, and so regrettably we can't quite abstract away from these. So... the example would be a lot more convincing if you did epoch propagation (perhaps having an epoch slider in your plot), and even more convincing if you did epoch propagation in the presence of multiple solutions, as only then a gain over UCDs is demonstrable.
(b) On item (5)
LM Answer: You are right that VOTable parsers will naturally skip MIVOT blocks, but our concern was more to prevent MIVOT parsers from getting lost in VOTable elements. This requirement is actually to make the life of MIVOT client developers as easy as possible. We will keep it.
Given that having to juggle the namespaces makes life harder for the much larger number of document producers, I still maintain that's a questionable decision from a utilitarian point of view. But true, as long as we have that plethora of new elements, we certainly don't want to pollute the VOTable namespace with them. 
So, as long as we don't greatly simplify MIVOT, I'll not belabour that point any further.
(c) on item (28)
LM Answer: I agree with you that there is a sort of mess around all the type definitions you enumerate. However, as dmtype are mandatory parts of the models (not only in VODML), MIVOT has to support them.
Supporting them for literals (as long as you think you have to have them) is one thing. Requiring them when you reference FIELD-s and PARAM-s is a totally different thing and, I claim, is entirely wrong. What use case would even suggest that, let alone require it? So, I keep up my request to forbid dmtype on references to PARAMs and FIELDs.
(d) on item (31)
LM Answer: In the first draft, INSTANCE@dmref was used instead of REFERENCE but it turned out that, apart from the XSD complexity, turning INSTANCE into a Swiss Army knife was detrimental to readability. In addition, our experience with on-the-fly annotations made us think that this pattern might raise some difficulties.
Could you try and explain where RESOURCE vs. dmref improves anything?
(e) on item (26)
LM Answer: We are using as few different XML elements as possible, with an obvious gain in terms of both compactness and readability.
For the record, I'd dispute the readability part: If I have to go through all these if-thens before I understand what a construct actually is, that's not readable at all. But note that that's not an argument for more elements. It's an argument for fewer features.
(f) My new concern: I have never been particularly fond of fully qualified role names ('dmrole="ivoa:Quantity.val"'). I have now tried to implement things and I have found that with these qualified names, I have to implement parsing of VO-DML files (and then following inheritance hierarchies) just to infer these fully qualified attribute names. 
That's about an *order of magnitude* more work than if I can just write 'dmrole="val"' (note that the type that attribute sits on is already given by the dmtype of the embedding element, except that that's the actual type, not the type that defines the attribute). Given that: is there a really strong reason to have these qualified role names? And where in the spec is the "algorithm" to figure out these names spelled out anyway? Now that I have looked, it would seem to me I'd be perfectly justified to write "val" (or just about anything else) there...
-- MarkusDemleitner – 2023-01-26
Comments from TCG member during the RFC/TCG Review Period: 2022-09-16 - 2022-10-31
WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment. IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
TCG Chair & Vice Chair
Applications Working Group
In general the document is well written and the design and decision making process very well documented. There is a lot of information in addition to the document itself and it's obvious that a lot of thought and effort went into creating it.
From the Apps perspective, I would have liked to see more practical examples in the "2.1 Use Cases and Requirements" section. In practice, how is MIVOT going to solve the problems listed there? Cross-matching sources in VOTables, to pick an example. I doubt that end users (astronomers) want to deal with the XML annotations so how can we automate the parsing of the VOTable? What do we offer them instead? DM specific classes? astropy classes? MIVOT offers the means to achieve serialization but in itself is not an end user solution. I am aware that there are efforts underway to use it with PyVO - maybe that can offer some answers but the use cases should give examples of what can be achieved. 
LM answer: I propose to append something to section 6 for this
A main drawback that I see with MIVOT (and maybe it has been discussed before) is that the end user has no visibility into what fields need to be selected to get back a DM component. Or maybe that could be part of the client software API where the user does a "SELECT *" with MAXREC=0 and parses the result to get a list of tables and columns that need to appear in the SELECT statement. This is just another example of a practical problem that this spec could solve.
LM answer: This point may be clarified in appendix D. We discussed this a lot when preparing the ADASS 2021 BoF. The fact is that we do not have any query language able to SELECT model classes. Consequently, the server has to do the best mapping it can with the selected columns. If you want to make sure to get all the columns you need to work with objects, you have to run SELECT * FROM SOMETHING. This is one of the reasons why we introduced the REPORT element. It informs the client about the mapping status. I propose to add something
Other minor observations:
Section 4.2
"Each element subsection come" -> "Each element subsection comes"
LM answer: See PR #82
Section 4.10
I think that the serialization should work across languages and should specify the representation of NULL values (vs empty ones).
LM answer: OK for adding the representation. I've opened issue #183 to get feedback before opening an MR.
"but the way it do it is" -> "but the way it does it is"
LM answer: See PR #182
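The "SELECT * with MAXREC=0" idea raised in the Applications comments above could look roughly like this on the client side. This is only a sketch: the service URL and table name are placeholders, no request is actually sent, and the helper names are invented for illustration. The request parameters (REQUEST, LANG, QUERY, MAXREC) are the usual TAP synchronous query parameters.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_probe_url(tap_base_url, table):
    """Build a TAP sync URL that returns metadata only (MAXREC=0)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": f"SELECT * FROM {table}",
        "MAXREC": 0,
    }
    return f"{tap_base_url.rstrip('/')}/sync?{urlencode(params)}"


def list_field_names(votable_xml):
    """Return the names of all FIELD elements in a VOTable response."""
    root = ET.fromstring(votable_xml)
    return [
        el.get("name")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "FIELD"
    ]


# Placeholder service and table, for illustration only.
url = build_probe_url("https://example.org/tap/", "my_schema.my_table")
print(url)

# Hypothetical empty response: column metadata is present, rows are not.
EMPTY_RESPONSE = """
<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA/></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
""".strip()

columns = list_field_names(EMPTY_RESPONSE)
print(columns)
```

If the server annotates the empty response, the same document would also carry the mapping block, so a client could compare the columns it lists with the columns the annotation references before writing a narrower SELECT.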
Data Access Layer Working Group
The standard looks well written and seems a good fit for the requirements. We haven't directly used it ourselves yet though. -- JamesDempsey - 2023-01-25
Data Model Working Group
Grid & Web Services Working Group
The document is well structured and there are examples and code that guide implementations. It seems that the Editor has resolved the comments from other WGs and from the Community. I think at this stage the document can be accepted. -- GiulianoTaffoni - 2023-01-25
Registry Working Group
(1) From the Registry perspective: it could be useful for VO users and validators to discover which services may implement MIVOT blocks in the VOTables they return, or to check if a particular service does it. This document could therefore benefit from a small section about that need, which could evoke a new Registry capability to be defined in detail in a future MIVOTRegExt standard.
LM answer: an appendix on the registry issue has been added in #184
-- RenaudSavalle - 2023-01-27
Semantics Working Group
It seems Semantics appears largely unconcerned.
(1) We would, however, be grateful for a brief guide of the type "To see that this works and perhaps play around with it a bit, do X, Y, and Z"; it's nice that there's apparently quite a bit of code around, but at the same time that makes it a bit hard to decide what to look at (and for). You know, this is a rather complex beast, and I find it hard to see which of the many patterns here are actually implemented in which sense.
LM Answer: You are right. We put a lot of work into developing this standard and various tools to help the community implement it; it would have been appropriate to also provide a beginner's guide. However, writing it would have required too much manpower and thus delayed once more the RFC process. We prefer to move forward to make the REC happen while remaining ready to provide assistance to anyone interested in getting involved.
(2) The obvious connection to Semantics is VO-DML's SemanticConcept. As you in general don't say a lot about the relationship of MIVOT and the various VO-DML elements (should we be worried about that?), you don't say anything about that, either. So... what about this? Are there common rules needed to reference vocabulary terms? Can you perhaps point us to examples where you used, say reference_frame?
LM Answer: In short, there is no direct connection between MIVOT and VODML. MIVOT is designed to map data on class hierarchies made of 3 structural elements: the atomic values, the lists and the objects (see section 4). Each element can have a type and play a role in that hierarchy. This is the basis of any object-oriented approach. The link with VODML is that both roles and types in the mapping block must refer to models that must be serialized in VODML. However, if you ignore this requirement, you won't break anything, as shown by all of our validation snippets. 
The processing of semantic concepts is a concern for the models, not for the mapping. In VODML, semantic concepts are rather used to constrain the values of model leaves. This has nothing to do with the mapping syntax, which does not check model compliance.
-- MarkusDemleitner - 2022-11-04
After poking a bit more, I am really unconvinced that sufficient implementation has been demonstrated. Sure, there is a lot of code, but for all I can see that's in various stages of being outdated, partly clearly broken, and partly using unclear or outdated models. The least I would like to see as "reference implementation" is a live TAP service generating MIVOT annotation sufficient for (a) doing epoch propagation and (b) outputting (wavelength, flux) pairs from sufficiently annotated magnitudes. With my WG chair hat off, I would be happy to assist in that implementation, but with my WG chair hat on I cannot vote for this if that basic functionality is not convincingly demonstrated. Another very central thing I could not spot is a proper instance validator. The validator that's there apparently only does a schema validation (and that only with XSD 1.1, which is a major step for the IVOA that's so far kept to XSD 1.0, something I'd advocate a lot because XSD 1.1 implementations appear to be few and far between). But the actual promise of the whole DM thing is that I can see whether instances actually match the constraints set by the VODML -- is there anything like that? Again, I think I really cannot vote for this until there is at least a credible prototype for such a validator, because without it, I can already predict we will have almost only invalid annotations given the complexity of this spec.
-- MarkusDemleitner – 2023-01-26
Data Curation & Preservation Interest Group
Education Interest Group
Knowledge Discovery Interest Group
Operations Interest Group
Overall the text looks carefully written. 
I haven't yet tried using this proposal myself so I don't have a feel for how it works in practice, but implementations and validators seem to be in place. One suggestion: The examples in the text for each feature are welcome, but it would be nice as well to have a single complete example of a VOTable marked up using MIVOT, alongside an explanation of what the markup is doing or how it could be used. Because of the size of the standard it wouldn't make sense to show off all the features in such an example; I have in mind something fairly simple. I appreciate that there are multiple examples in the GitHub repository, but it's not obvious to the reader where to start, and not all those examples correspond to the published version. I leave it to the authors to decide whether this suggestion is a good idea or practicable.
LM Answer: Thank you for the suggestion. We had a similar discussion among the authors and we then considered a way to highlight the purpose of the snippets in a real context (see GitHub issue). The idea was to add an appendix with a complete VOTable containing most of the mapping features. We have one compiled by MCD. This would take 7 pages, which is affordable. The snippets within the normative text would then be extracted from that table along with \label\ref links allowing readers to jump back and forth between the normative section and the original location in the VOTable. We exercised it with some success (and one tweak) but as this would require a lot of work we decided to postpone it until we got strong motivation from the reviewers. So let's do it. A PR has been opened with the desired changes
-- LaurentMichel - 2022-10-06
A few minor corrections below (version 2022-09-16):
-- LaurentMichel - 2022-10-05
Radio Astronomy Interest Group
Solar System Interest Group
Theory Interest Group
Time Domain Interest Group
Standards and Processes Committee
TCG Vote : 2022-09-16 - 2022-10-31
If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.