Vocabularies in the VO 2 Proposed Recommendation: Request for Comments
Vocabularies in the VO, version 2, proposes formats and practices to manage hierarchical word lists that need consensus within the VO. See
http://ivoa.net/rdf for the vocabularies currently in use or under consideration.
Note that this is not “Semantics in the VO”, i.e., further applications of RDF (e.g., full ontologies) are by no means excluded by this specification.
Latest version of Vocabularies in the VO 2 can be found at:
A build of svn trunk that already includes fixes after reviewer comments is available at
https://docs.g-vo.org/Vocabularies.pdf
Reference Interoperable Implementations
Vocabularies of the type described here are in use by several existing standards:
- Datalink (the semantics column)
- VOTable (TIMESYS time scales and reference positions)
- VOResource (relationship types, content levels, content types, date roles, prospectively the subject keywords)
- SimpleDALRegExt (under review: product types)
- VODataService (under review: messengers)
The code managing the RDF repository is available at
https://volute.g-vo.org/svn/trunk/projects/semantics/voc-source
Implementations on the consumer side:
- stilts' VOTable validator uses vocabularies to check the TIMESYS attributes (this gives a simple example for how to deal with IVOA vocabularies in Java)
- pyVO will use the Datalink vocabulary for query expansion in the bysemantics method (https://github.com/astropy/pyvo/pull/241). This gives an example for how to use vocabularies from Python.
- Sembarebro is an example for how to use vocabularies from JavaScript (code)
- Another example for using IVOA vocabularies from Python is the implementation of the gavo_vocmatch ADQL User Defined Function in DaCHS. See http://dc.g-vo.org/tap/capabilities for a definition of the UDF (Code, around line 526)
- A somewhat more complex use case is using the UAT hierarchy in mapping metadata, which again contains examples for vocabulary use in Python.
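As a rough illustration of what such consumer-side code does, here is a minimal sketch of query expansion: a term is expanded to itself plus all of its transitively narrower terms. It assumes the vocabulary has already been parsed from its JSON (desise) form into a dict with a `terms` mapping whose entries carry a `narrower` list. The sample data is a hand-written excerpt and the function name `expand_term` is made up for this sketch; neither is taken from pyVO or DaCHS.

```python
def expand_term(vocabulary, term):
    """Return a set containing term and all its transitively narrower terms."""
    terms = vocabulary["terms"]
    found = set()
    pending = [term]
    while pending:
        current = pending.pop()
        if current in found:
            continue
        found.add(current)
        # Terms that are unknown or lack a "narrower" key are treated as leaves.
        pending.extend(terms.get(current, {}).get("narrower", []))
    return found


# Hand-written excerpt shaped like a parsed vocabulary (assumed layout).
SAMPLE_VOCABULARY = {
    "terms": {
        "calibration": {"wider": [], "narrower": ["bias", "dark", "flat"]},
        "bias": {"wider": ["calibration"], "narrower": []},
        "dark": {"wider": ["calibration"], "narrower": []},
        "flat": {"wider": ["calibration"], "narrower": []},
        "preview": {"wider": [], "narrower": []},
    }
}

print(sorted(expand_term(SAMPLE_VOCABULARY, "calibration")))
```

A client searching for calibration data could then match any of the returned terms rather than only the literal string it was given.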
On processes defined:
- several VEPs have been run
- a PEN has been produced for vocabulary adoption: https://ivoa.net/documents/uat-as-upstream/20201117/ ; it is probably a good idea to give this a brief look, too, when reviewing Vocabularies 2. Perhaps Vocabularies 2 should give some constraints on what must minimally be addressed in this kind of document.
Plans for the consumer side:
- The RofR publishing registry validator should use the VOResource vocabularies; we expect this to happen during RFC.
Implementations Validators
The vocabulary process itself is in some sense self-validating because the input files are parsed and mangled. A “deeper” validation (“are these concepts any good?”; “can people work out from a description what is and what is not within the concept?”) is probably beyond what automated validators can do.
As to the external interface, common RDF validators can be used to check the syntactic correctness of our artefacts, for instance the W3C RDF validator.
Comments from the IVOA Community during RFC/TCG review period: 2021-03-22 through 2021-05-03
The comments from the TCG members during the RFC/TCG review should be included in the next section.
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.
Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.
- Comment by CarloMariaZwoelf :
- According to Section 5, the chairs will become de facto the curators of all the standard vocabularies (including orchestrating the VEP workflows). In the case of externally managed vocabularies, the chairs also have the responsibility to keep the IVOA mirrors synchronized. This strongly reshapes the role of the chairs: it moves from fostering, coordinating and driving the WG discussions towards consensus forming and issue-solving (as it is for other WGs/IGs) to something completely new, without a well-defined framework. Alternative solutions to discuss and agree/disagree on may be:
- Have some sort of advisory committee the way it was originally envisioned for UCDs
- Entrust some defined institution with vocabulary stewardship
- Make the Exec appoint a vocabulary steward personally, as with the document coordinator.
- vocabulary stewardship should be personalised while the management of the VEP process remains the role of chairs?
- While it doesn't seem wise to make strong commitments as to organisational structures in a REC, I think you're right and we should open up the text for decoupling certain roles in vocabulary maintenance from the vocabulary chair. In volute rev. 5953, it now says "In the following, the phrase ``chair of the Semantics WG'' is understood to mean ``chair or vice-chair of the Semantics WG, or a person designated by them for the purpose with the consent of the TCG''." Does that address your concerns somewhat? -- MarkusDemleitner - 2021-05-17
- Another point: as has been repeated several times, the role of IVOA is to promote standards, not to check/ensure the quality of what is distributed via the standards. This is up to the service providers. In this case semantics is an exception: the quality checks and controls on what is distributed with a given standard (i.e. the set of vocabularies following the VOC2 standard) become the mission of the Semantics WG. Is this coherent with the "IVOA mission"?
Comments from TCG member during the RFC/TCG Review Period: 2021-03-22 through 2021-05-03
WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
TCG Chair & Vice Chair
Approved. -- PatrickDowler - 2021-05-11
I found the document to be well written and well thought out. Questions that came up while reading were answered clearly in the subsequent text. I found the code snippets very useful as they made some basic usage concepts more concrete.
Section 5.3 (Externally Managed Vocabularies) raises concerns for me in that I wonder if the benefits of having a mirror of an external vocabulary will ever outweigh the maintainability and complexity costs in following the process. That said, I don't object to the section since it ensures that such a situation will not be handled in an ad hoc way, but I'm skeptical that it will be used (maybe a good thing?).
Approved -- TomDonaldson - 2021-05-12
The spec is well thought out and a very good definition of how vocabularies should be defined and implemented. We note that sections 1 and 2 have quite complex sentence structure and recommend these be edited. This would enhance readability and accessibility of the standard.
- Hm... If you point me to particularly bad parts, I'm happy to try my hand again (and of course I'm grateful for any sorts of readability enhancements directly in the VCS). -- MarkusDemleitner - 2021-05-11
- I've tried to defuse some of the worse instances of commas and dashes in volute rev 5954. Does this help? -- MarkusDemleitner - 2021-05-17
- It does, thanks. I've also just committed a few more edits for clarity whilst hopefully retaining the meaning. -- JamesDempsey - 2021-05-18
We noted the following specific issues:
- Abstract - The mention of three flavours is confusing here. The mention of classes and properties should state that these are RDF, perhaps as "and the two strict hierarchies of RDF classes and RDF properties on the other" (like it's made explicit in §4).
- I've tried to simplify the language; feel free to further streamline it. -- MarkusDemleitner - 2021-05-11
- §1.1 - SKOS needs to be defined and/or cited
- Added a reference, and while I was at it, to RDFS, too. -- MarkusDemleitner - 2021-05-11
- §1.1 - last sentence misses main verb, like "... where several fields are defined, the values ..."
- §1.4 - expanding CURIE won't hurt here
- I'm not aware that there is an expansion beyond the not very well-fitting "Compact URI". I've put in an explanation instead. -- MarkusDemleitner - 2021-05-11
- §2.1.2 - The intent of "They" in the last sentence is unclear
- §2.1.5 - Does VOResource v 1.0 and 1.1 need to be cited?
- Added a 1.0 reference. The "last version of VOResource" is introduced in the "Role within in the VO" section. -- MarkusDemleitner - 2021-05-11
- §3.1 - how is the "email to Semantics notification" managed and advertised besides this text in par.5 in §3.1? I feel it's conflicting w.r.t. "the casual user"
- While I'd like to avoid fixing such details in a REC, the foot line of the vocabularies seems about the right place. Right now, this just links to the spec. Once we have a bit of experience, I guess we'd link a little how-to page on the wiki, perhaps a bit like our GettingIntoTheRegistry page. -- MarkusDemleitner - 2021-05-11
- §3.2 - It would be good to state that you are defining desise here rather than using something that already exists.
- I've tried my best -- does this work for you? -- MarkusDemleitner - 2021-05-11
- maybe desise has to be quoted, slanted or else, being a tool/acronym-like
- §3.2 - In deprecated and preliminary, what does "mapped to a reserved value" mean? Should a specific value be provided here?
- Well, I was hoping I could get away with "don't look there". Technically, what's there is known as a "blank node" in RDF, and I'd hate to scare people with this here. On the other hand, blank nodes in RDF can have various representations, and perhaps we one day want to do something with them. I've hence not touched the text for now. If you want, I can add a footnote "reserved means 'don't look at its value, the presence of the key is all you need to know'". Or I can research how much of null or undefined has made it into JSON. But, really, I'd prefer to keep it as is. -- MarkusDemleitner - 2021-05-11
- §3.2 - Should the python example be in a non-normative subsection (like §4.2.2)?
- Good point. I've added subsubsections for "desise definition" and "examples (non-normative)" -- MarkusDemleitner - 2021-05-11
- §4.1.1 - Should skos:exactMatch refer to §2.2.13?
- §4.1.2 - Is OpticalSource the best broader entry here? e.g. Most radio point sources are AGN
- Interesting question; however, this is taken live from the actual UAT, so this, really, is a question to them. On the other hand, I could argue it's a good example for how SKOS' broader by design is a lot looser than the strict is-a relationships we have in our tree-like vocabularies (and that they should be preferred when we don't absolutely need SKOS' laissez-faire).
- §4.4 - ivoasem:vocflavour - Should may in "define what string may occur here" be stronger such as MUST or SHOULD?
- I don't think so; this sentence is just a statement about things within the document -- and at least for now we don't allow anyone else to define vocflavours. Or am I missing what you're aiming at here? -- MarkusDemleitner - 2021-05-11
- §4.4 - ivoasem:preliminary - "The object of triples using it is a blank node" - what does this mean?
- same in ivoasem:deprecated
- That's RDF jargon; in the simplest case, it's a bit like SQL's NULL (and that's how it's used here). The truth is a lot more complicated, as you can use them as inner nodes in trees, so there can be many such blank nodes with some amount of individuality. Now, while I promise in 1.3 that "Concepts not covered by Gray's essay will be informally introduced here" and Norman doesn't go into technicalities of this kind, could you let me off the hook here? Or perhaps let me get away with "(i.e., don't look at the object in the triple)"? -- MarkusDemleitner - 2021-05-11
- §4.4 - ivoasem:useInstead - should there be a MUST or SHOULD linkage to deprecated? i.e. it MUST be present if a deprecated property is present and/or MUST NOT be present if deprecated is not present?
- The latter. I've added "This property MUST NOT be used with non-deprecated subjects" -- MarkusDemleitner - 2021-05-11
- §4.4.1 - Why are two rdf:Description elements used for the two properties (dc:created and ivoasem:vocFlavour) for relationship_type?
- RDF/X is painfully flexible and lets you organise your triples in many different ways; a Description element can have one or more properties, and there may be as many Description elements for a resource as you want. There's a reason I eventually gave in and invented desise... -- MarkusDemleitner - 2021-05-11
- §4.4.1 - How would these be used alongside the examples in §4.[123].2 ?
- They are just dumped into the same file; that's part of the beauty of RDF: you can just pile triples upon triples to your heart's desire, and they don't need to know of each other, as long as the URIs match. Since normal VO users should never touch these technicalities (and even the vocabulary maintainer will never write RDF/X directly), however, I'd rather not expand on the details of RDF/X serialisation here.
- §5.1 - Should there be a statement about removing ivoasem:preliminary when vocabulary is approved?
- §A.1 - Should all values be quoted? That would allow the standard csv separator of comma to be used and would make it easier to parse for other tooling.
- Hm. How valuable would you consider that? I'd have to touch all the existing vocabularies (~dozen) and the existing tooling. It wouldn't be really bad, but since I personally prefer typing semicolons to having quotes all around I'll frankly only do it if someone expresses a strong preference. -- MarkusDemleitner - 2021-05-11
- §A.2 - Are there any references you could use to cite the INI-style format definition?
- Gosh. Never even thought about it. Just checked the configparser docs from the python stdlib, and they don't have any references in there, either. I'd say that's an "industry standard". If you have anything, I'll happily add it to ivoatex... -- MarkusDemleitner - 2021-05-11
- §A.3 - Should we be moving this to github now?
- Perhaps -- but since nobody has yet bothered to clearly lay out our github policies (I've already migrated ivoatexDoc to github to make PRs easy), and I'd hate to rack my brains for good ways to use it for Microsoft's benefit, I don't see an urgent need to put this into the document now. The whole thing's non-normative, so we can later move at any time convenient. -- MarkusDemleitner - 2021-05-11
- affects also §B
- §A - Should the tooling itself be described or referenced?
- You mean as in "it's in version control here"? Perhaps, but once we've worked out things on github I expect it'll move there, and I think adding docs on that then will be early enough, no? That aside, I'd say where that really belongs is our IVOA assets note. -- MarkusDemleitner - 2021-05-11
- §C - The "B1875.0" entry is missing a useInstead entry
- No -- it's just deprecated. Sometimes concepts just don't work out, and giving them another term won't help. The message to adopters of this term then is: You need fundamental changes (where I'm not saying we'll ever have a B1875 term, and if we had it, there'd probably be no reason to deprecate it, either). -- MarkusDemleitner - 2021-05-11
- §D - It would be useful to have a "Changes from REC v1" section noting that this is a complete revision, just so people can trace the history.
-- JamesDempsey - 2021-05-10
Thanks for this very thorough review (and also the typo fixes before)! The changes are in Volute rev. 5952 -- MarkusDemleitner - 2021-05-11
Thanks for the updates. I've approved the RFC now. -- JamesDempsey - 2021-05-18
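The answers on §3.2 and §4.4 above come down to two rules: for deprecated and preliminary, only the presence of the property matters, never its value; and useInstead must not appear on a term that is not deprecated. The following is a minimal sketch of how a client might act on these rules, assuming term records that have been parsed into plain dicts. The key names `deprecated`, `preliminary` and `useInstead`, the sample terms, and both function names are assumptions made for this illustration, not a statement about the normative serialisation.

```python
def term_status(term_record):
    """Classify a term by key presence only; the values are never inspected."""
    if "deprecated" in term_record:
        return "deprecated"
    if "preliminary" in term_record:
        return "preliminary"
    return "regular"


def misplaced_use_instead(terms):
    """Return the names of terms that carry useInstead without being deprecated."""
    return sorted(
        name
        for name, record in terms.items()
        if "useInstead" in record and "deprecated" not in record
    )


# Hypothetical term records; the empty strings stand in for the reserved value.
SAMPLE_TERMS = {
    "old-term": {"deprecated": "", "useInstead": "new-term"},
    "new-term": {},
    "draft-term": {"preliminary": ""},
    "broken-term": {"useInstead": "new-term"},
}

for name, record in SAMPLE_TERMS.items():
    print(name, term_status(record))
print("violations:", misplaced_use_instead(SAMPLE_TERMS))
```

Testing for the key rather than comparing its value keeps client code independent of however the reserved value happens to be represented.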
[changes in volute commit 5948 -- MarkusDemleitner - 2021-05-04]
I've a few comments that do not require document changes:
- P6 The UCD-Vocab binding could be more detailed
- I've added a few words to the effect that at this point, UCDs are not concerned by VocInVO2 and how they might become so in the future. -- MarkusDemleitner - 2021-05-04
- P7 I really appreciate the reading guide
- P10 Links between DM and vocab need further reflection
- That is true, but I think that ought to be done in the VO-DML document or perhaps in the mapping document. -- MarkusDemleitner - 2021-05-04
- General: How can a client retrieve the vocabulary a word refers to? A client getting BARYCENTER as TIMESYS@reflocation has no way to know that the word BARYCENTER is part of http://ivoa.net/rdf/refposition
- The full resource URIs contain the vocabulary name; in the example, it's http://ivoa.net/rdf/refposition#reflocation, which immediately resolves. Individual standards can (and should) provide shortcuts. "Take the terms from vocabulary X" -- this is what VOTable and VOResource do -- then translates into "The RDF URI for t is X#t". Datalink says "absolutify relative URIs with http://www.ivoa.net/rdf/datalink/core", which is why the terms there look like "#calibration". Other standards might want to use the full URIs. Do you think something like this should be part of the document? If so, where? -- MarkusDemleitner - 2021-05-04
- P34 A3: is the volute URL supposed to survive Vocabulary 2.0?
- Well, it's non-normative. The trouble is that we still have no actual text on our github policies, and while that's not there I have a hard time just saying something like "put it on github". -- MarkusDemleitner - 2021-05-04
and a few non blocking suggestions from a novice reader
- P14 2.3 RDF triples: a footnote (https://www.w3.org/TR/rdf-concepts/#section-triples) would help as well
- Hm... I think Norman explains that a lot better in the little essay I'm referencing in the reading guide. I could make a footnote like "see p. 9 of Gray (2015)", but it's a slippery slope from there to explaining more and more of RDF proper, which is far beyond the scope of this document. Hence, I'd rather not. If you think it really helps, I will, though. Just holler. -- MarkusDemleitner - 2021-05-04
- P18 Examples of valid and non valid RDF URIs would help.
- I have added a few examples of existing, recommended forms. I could be talked into listing a few bad ones, but I think that, really, only has a point to illustrate common pitfalls. Since I'm not aware of any of those at this point, I've not put in any yet. -- MarkusDemleitner - 2021-05-04
Approved -- LaurentMichel - 2021-04-22
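The rule given in the reply above ("The RDF URI for t is X#t") and the Datalink convention of resolving relative URIs against the vocabulary URI can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in that reply; the function names `term_uri` and `resolve_datalink_semantics` are invented here and do not come from any IVOA library.

```python
from urllib.parse import urljoin

# Base URI quoted in the discussion above for Datalink semantics values.
DATALINK_CORE = "http://www.ivoa.net/rdf/datalink/core"


def term_uri(vocabulary_uri, term):
    """Build the full RDF URI of a term following the "X#t" rule."""
    return f"{vocabulary_uri}#{term}"


def resolve_datalink_semantics(value, base=DATALINK_CORE):
    """Turn a semantics value such as "#calibration" into an absolute URI.

    Values that already are absolute URIs are returned unchanged.
    """
    return urljoin(base, value)


print(term_uri("http://ivoa.net/rdf/refposition", "BARYCENTER"))
print(resolve_datalink_semantics("#calibration"))
```

In both cases the vocabulary URI comes from the standard that uses the term, which is how a client knows which vocabulary a bare word belongs to.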
I found the document well written and the use of code snippets extremely useful. I have no further editorial changes to ask.
Approved -- GiulianoTaffoni - 2021-05-17
Approved, with current editorial changes and with externalizing Vocabularies stewardship as a separate role from WG chair, potentially held by someone else as is currently the case for document coordinator.
I'd appreciate a footnote on tuples as well (per Laurent's comment above), it can be from Gray or the spec itself.
- Ok, I've put a brief footnote in in volute rev. 5955: "i.e., basic statements of the form (subject, predicate, object) within the RDF; see page 8 of Gray (2015) for a less terse definition." Does this help? -- MarkusDemleitner - 2021-05-17
-- TheresaDower - 2021-05-14
The Simulation Data Model uses semantic (SKOS) vocabularies extensively.
And members of the theory interest group have had extensive discussions about Vocabularies 2.0 in this context.
Changes that were proposed were not implemented.
That said, we think that theory does not need vocab 2.0 and will not block the RFC process.
The Theory I.G. will where necessary propose its own recommendations for vocabularies describing simulations and their maintenance.
Accept. -- MarkTaylor - 2021-03-20
A well written and pretty clear specification.
- As the opinion of the IG has been asked, we checked this against the usage of vocabularies in the various specs of interest for radio astronomy.
- There is probably a typo on page 4: "having extra extra convention". Or is it a style effect?
- The example at the end of 2.1.3 is not consistent with the recent discussion on VEP006. "Dark" is not a progenitor. Propose to rephrase it this way: "For instance, they may want to discern among calibration files the bias frame, a dark frame, and a flat field. In all these cases, clients should still be able to work out that such artefacts are calibrations."
- In 5.2 we appreciate that a VEP can now gather several related term proposals, which was not the case in a previous version. This turned out to be important during some recent (2020) DataLink semantics terms discussions. When the terms change during a VEP discussion, the VEP is withdrawn, but it is very important to be able to find the whole discussion. So the following statement should be made mandatory: "The VEP withdrawn receives a Superceded-by item referencing any new VEPs, any new VEPs have a Supercedes item referencing the original VEP." --> "MUST receive" and "MUST have"
- Since this language describes a procedure ("what do you do when a VEP fails") and describes several different end points (including no followup at all), I'd say making these look more formal will not help much but might cause confusion ("um... what do I do if there is no superseding VEP?"). Since I don't see scenarios where this might actually become a major problem, and I'd like to keep last-second text changes at a minimum, I'm keeping the text as-is. -- MarkusDemleitner - 2021-05-25
Approved -- FrancoisBonnarel - 2021-05-23 and MarkLacy
TCG Vote: TBD
If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.
| Group | Yes | No | Abstain | Comments |
| TCG | * | | | |
| Apps | * | | | |
| DAL | * | | | |
| DM | * | | | |
| GWS | * | | | |
| Registry | * | | | |
| Semantics | * | | | |
| DCP | | | | |
| KDIG | | | | |
| SSIG | | | | |
| Theory | | | | |
| TD | | | | |
| Ops | * | | | |
| RIG | * | | | |
| StdProc | | | | |