ObsCore 1.1, Erratum 3: Drop obs_id non-NULL requirement
Author: Markus Demleitner
Date last changed: 2022-04-12
Date accepted: (not accepted yet)
Rationale
In Version 1.1, ObsCore requires the obs_id column to be non-NULL. This column is used in one technique for publishing observational results that yield multiple artefacts: each artefact gets a separate obscore row, and the rows are grouped by sharing a single obs_id ("multi-row observation"). For services that use datalink for such data, or that only have single-artefact data in the first place, obs_id plays no role and is therefore often neglected (e.g., there is no index on the column, and it may contain meaningless data invented for the sole purpose of satisfying the non-NULL requirement).
This becomes a problem when validators check that the non-NULL requirement is in fact satisfied. In a 1.1-correct obscore table, such a check will not find a row with a NULL obs_id, so in the absence of an index the database has to inspect, and hence load from disk, the entire table (at least in a row-store). For a large obscore table, this can take minutes, which slows down validation (or leads to false negatives due to timeouts) and puts significant load on the database servers.
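The cost difference described above can be illustrated with a small sketch. It uses Python's sqlite3 module purely as a stand-in; production obscore services typically run on other database engines, and the exact query a validator sends is not specified here. The table layout, the index name obs_id_idx, and the check query are therefore assumptions made for illustration only.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan step.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for an obscore table.
conn.execute("CREATE TABLE obscore (obs_publisher_did TEXT, obs_id TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [(f"ivo://example/data?{i}", f"obs-{i}") for i in range(1000)],
)

# Assumed shape of a validator's check: look for any row with a NULL obs_id.
check = "SELECT 1 FROM obscore WHERE obs_id IS NULL LIMIT 1"

# Without an index, the planner has no choice but to walk the table.
plan_without = query_plan(conn, check)

# With an index on obs_id, the planner can answer from the index.
conn.execute("CREATE INDEX obs_id_idx ON obscore (obs_id)")
plan_with = query_plan(conn, check)

print("no index:  ", plan_without)
print("with index:", plan_with)
```

In a compliant table the check returns no row, so without the index the scan cannot stop early and has to visit every row.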
To remedy this situation, the operators of such services could of course add indexes to their obs_id columns. At least where obscore is implemented as a view, however, this is not necessarily simple. An attempt was therefore made to find a case where the non-NULL requirement on obs_id actually provides an operational benefit (see the DAL list in March 2022).
Since this did not yield a convincing case, this Erratum drops the non-NULL requirement on obs_id.
Erratum Content
(1) In Table 4, replace the table row
ivoa.ObsCore obs_id | adql:VARCHAR | | not null |
with
ivoa.ObsCore obs_id | adql:VARCHAR | | |
(2) In the subsection of Section 4.1 (erroneously called "1.1.1"), replace
Progenitors and their derived data products must have the same obs_id.
with
Progenitors and their derived data products must have the same obs_id if their obs_id is non-NULL.
(3) In Section 4.4, delete the sentence "Values in the obs_id column must not be NULL."
Impact Assessment
Since no use cases were identified that would fail when NULL obs_ids are admitted, we do not expect any practical impact from this change. Service operators who do choose to have NULL obs_ids should consider that queries that use obs_id but are not aware of this erratum will group all the corresponding rows into a single observation. Given that no such queries were found in software that is not tailored to a specific archive, that is probably acceptable behaviour.
If this erratum is accepted, authors of queries inspecting obs_id are advised not to group observations with a NULL obs_id; NULL obs_ids always indicate distinct observations.
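As an illustration of this advice, the following sketch contrasts a grouping that is unaware of the erratum with one that keeps NULL obs_ids apart. It again uses Python's sqlite3 module as a stand-in for a real service. The sample rows are invented, and falling back to obs_publisher_did as a per-row key is an assumption: it only works if that column is unique per row and never coincides with an obs_id value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for an obscore table.
conn.execute("CREATE TABLE obscore (obs_publisher_did TEXT, obs_id TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("ivo://example/a1", "obs-A"),  # two artefacts of one observation
        ("ivo://example/a2", "obs-A"),
        ("ivo://example/b1", None),     # two unrelated rows without obs_id
        ("ivo://example/b2", None),
        ("ivo://example/c1", "obs-C"),
    ],
)

# Unaware of the erratum: GROUP BY puts all NULL obs_ids into one group,
# so the two unrelated rows look like a single observation.
naive = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id"
).fetchall()

# Aware of the erratum: rows with a NULL obs_id fall back to a per-row key
# (assumed unique), so each of them forms its own group.
null_aware = conn.execute(
    "SELECT COALESCE(obs_id, obs_publisher_did), COUNT(*) "
    "FROM obscore GROUP BY COALESCE(obs_id, obs_publisher_did)"
).fetchall()

print("naive groups:     ", len(naive))
print("null-aware groups:", len(null_aware))
```

An alternative with the same effect is to restrict the grouping to rows where obs_id IS NOT NULL and to treat the remaining rows individually.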