Semantics Calls 4
The fourth edition of the Semantics Calls telecon took place on
Wednesday, Sept 15, 13:00 UTC.
Minutes
VEP-006:
In the end, the main point of contention apparently results from the fact that the datalink semantics
vocabulary uses the identifier
http://www.ivoa.net/rdf/datalink/core#calibration, whereas the concept's
label (which Markus insists is the human-facing identity)
is "Calibration applicable".
This is VEP-006's #calibrated, used roughly as follows: there is a datalink
document for a raw data file (#this), and this document then links to files that
can be used to create a calibrated file.
This is different from the case where the datalink document is for a calibrated file
and the calibration files (e.g., statistical profiles,
simulations) that were applied to a raw data set (e.g., an
event list) are linked from it. VEP-006's #calibration doesn't apply to
those, and many voices oppose using the current #progenitor for
them, on the grounds that "Progenitor" in the vernacular just doesn't
include such files. That's the background of VEP-009.
The label of such a concept should probably be "Calibration Applied".
Markus notes that no such link appears to exist in the current VO and that
we should defer discussion until someone actually wants to have one.
Markus suspects it will be a lot more common to have a separate datalink
document listing both raw science data and calibration files, and that we
shouldn't prevent #progenitor from pointing to it from the reduced data.
Other people feel
http://www.ivoa.net/rdf/datalink/core#progenitor needs
to be fixed because it mixes two categories of data that are separated
when we deal with provenance tracking; however, no case was brought
forward where that mixing would become operationally relevant in
datalink (i.e., it's unclear how a datalink client would treat the two
differently). Markus suggests such distinctions simply belong to the domain
of the provenance data model, not to datalink.
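To make the identifier/label distinction concrete, here is a minimal Python sketch of how a client might expand a datalink semantics value and pick a display label. It assumes, in line with the usage in these minutes (#this, #progenitor), that a bare "#term" abbreviates a term in http://www.ivoa.net/rdf/datalink/core. The LABELS table and the function names are hypothetical; only the "Calibration applicable" label comes from the discussion above, and a real client would load labels from the published vocabulary.

```python
# Hypothetical sketch: expand datalink "semantics" shorthand and choose a
# human-facing label. Not an official client API.
CORE_VOCABULARY = "http://www.ivoa.net/rdf/datalink/core"

# Illustrative label table; only the #calibration label is from the minutes.
LABELS = {
    CORE_VOCABULARY + "#calibration": "Calibration applicable",
}


def expand_semantics(value: str) -> str:
    """Turn a '#term' shorthand into a full concept URI; pass full URIs through."""
    if value.startswith("#"):
        return CORE_VOCABULARY + value
    return value


def display_label(value: str) -> str:
    """Return the concept's label, falling back to the bare term identifier."""
    uri = expand_semantics(value)
    return LABELS.get(uri, uri.rsplit("#", 1)[-1])


if __name__ == "__main__":
    print(expand_semantics("#calibration"))
    print(display_label("#calibration"))
```

The point of the fallback is that the identifier ("calibration") is only shown when no label is known; what a user should see is the label.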
VEP-007
There was quite a bit of discussion on the mailing list beforehand about whether
we need #metadata in addition to #documentation. Unfortunately, the
main proponent of that was not present, so the discussion strayed a bit
towards the applications #detached-header has out there. This led to the
question of why a datalink client would need to distinguish machine-readable
material (like a header or an XLS file with weather data) from human-readable
material (a PDF with instrument schematics); nobody had an answer.
Example for PDS: the data set
https://astrogeology.usgs.gov/search/map/Pluto/NewHorizons/Pluto_NewHorizons_Global_Mosaic_300m_Jul2017
has detached metadata at something like
https://astropedia.astrogeology.usgs.gov/download/Pluto/NewHorizons/ancillary/Pluto_NewHorizons_Global_Mosaic_300m_Jul2017_8bit_pds3.lbl or here
https://astrogeology.usgs.gov/search/map/Pluto/NewHorizons/Pluto_NewHorizons_Global_Mosaic_300m_Jul2017.xml
[MD: this is a 404-ish thing for me]
VEP-009
Markus is missing the pragmatics here: why would a computer care about
the distinction between a "science" progenitor and a "calibration"
progenitor?
Mireille wondered whether we can really be sure that datalink won't be used
to drive recalibrations; Markus thinks that unlikely: there's just too
much metadata missing in datalink to make that realistic, and we already
have provenance, which is much closer to enabling this kind of thing.
After the session, Laurent remarked that calling a calibration a progenitor is
semantically wrong. Markus objected by pointing out there's no
consensus on what "Progenitor" ought to mean in the present context, and
that the current datalink definition ("data resources that were used to
create this dataset (e.g. input raw data)") quite clearly contradicts
Laurent's idea of what progenitors should be.
Anyway, if people feel strongly about that, Markus argues that
rather than fiddling with a perfectly good concept, we ought to
fix #progenitor's label (perhaps: "Part of Provenance"?).
UCD atom order
In the context of a larger effort to improve validity of the UCDs used
in the VO, two topics mainly related to the order constraints for UCDs
came up on the mailing list:
http://mail.ivoa.net/pipermail/semantics/2021-September/002833.html
and
http://mail.ivoa.net/pipermail/semantics/2021-July/002827.html.
We mainly tackled the meta.curation case. meta.curation is currently P,
but people have asked for things like meta.ref.ivoid;meta.curation and
meta.date;meta.curation. Also, the "identity" bit in the current
definition is a bit unclear.
All this is exacerbated by SSAP using meta.curation in both S and P
positions.
We discussed whether it's wise to comply and make meta.curation Q.
That, however, makes it hard to find a definition that will work in both
positions. Can it be S? Then we can start the definition with
"related to", which would make the semantics a lot clearer. But then
SSAP needs a change.
Let's do that: their meta.curation can easily be replaced by
meta.id;meta.curation. Mireille will make that happen.
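The position constraints discussed above can be checked mechanically. The following Python sketch is a simplified illustration, not the official UCD validator: it only knows the P (primary position only), S (secondary only) and Q (either position) codes, and its flag tables are hypothetical. Apart from meta.curation's current P flag and the proposed S, the flags given for meta.id and meta.ref.ivoid are assumptions for illustration, not copied from the UCD list.

```python
# Simplified sketch of UCD atom-order checking (P/S/Q codes only).
# Flag values other than meta.curation's are ASSUMED for illustration.
PRIMARY_OK = {"P", "Q"}
SECONDARY_OK = {"S", "Q"}

CURRENT_FLAGS = {
    "meta.id": "Q",          # assumed
    "meta.ref.ivoid": "Q",   # assumed
    "meta.curation": "P",    # current flag, per the minutes
}
# The proposal from the call: make meta.curation secondary-only.
PROPOSED_FLAGS = {**CURRENT_FLAGS, "meta.curation": "S"}


def check_ucd(ucd: str, flags: dict) -> list:
    """Return a list of order problems for a ';'-separated UCD (empty if fine)."""
    problems = []
    atoms = [atom.strip() for atom in ucd.split(";")]
    for index, atom in enumerate(atoms):
        flag = flags.get(atom)
        if flag is None:
            problems.append(f"unknown atom: {atom}")
        elif index == 0 and flag not in PRIMARY_OK:
            problems.append(f"{atom} ({flag}) not allowed in primary position")
        elif index > 0 and flag not in SECONDARY_OK:
            problems.append(f"{atom} ({flag}) not allowed in secondary position")
    return problems


if __name__ == "__main__":
    for ucd in ("meta.curation", "meta.id;meta.curation"):
        print(ucd, check_ucd(ucd, CURRENT_FLAGS), check_ucd(ucd, PROPOSED_FLAGS))
```

Under the proposed flags, SSAP's bare meta.curation becomes invalid while meta.id;meta.curation and meta.ref.ivoid;meta.curation pass, which is why SSAP needs the change noted above.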
We have deferred the discussion about arith.diff and arith.ratio being S
to the next call.
UAT adoption EN
This was drafted as part of the Vocabularies overhaul and now needs a
thorough review before we can confidently tell Registry we've figured
this out for them. So, please review:
http://ivoa.net/documents/uat-as-upstream/20201117/index.html
AOB
- Updates on the object type vocabulary: Simbad has now published their version, so there are just a few probably minor points to be worked out, in particular the lower-case mapping for the term identifiers; let's try again to make clear that the "human-facing" identity of a concept is the label, not the identifier.
- Not much activity on product type (which would be good for obscore) since Markus' Interop talk: https://wiki.ivoa.net/internal/IVOA/InterOpMay2021Semantics/product-type.pdf