ObsCore 1.1, Erratum 3: Drop obs_id non-NULL requirement

Author: Markus Demleitner

Date last changed: 2022-04-12

Date accepted: (not accepted yet)

Rationale

In version 1.1, ObsCore requires the obs_id column to be non-NULL. This column supports one technique for publishing observational results that yield multiple artefacts: each artefact gets its own obscore row, and the rows are grouped by sharing a single obs_id ("multi-row observation"). For services that use datalink for such data, or that only have single-artefact data in the first place, obs_id plays no role and is therefore often neglected (e.g., there is no index on the column, and it may contain meaningless data invented for the sole purpose of satisfying the non-NULL requirement).

This becomes a problem because validators check that the non-NULL requirement is in fact satisfied. In a 1.1-compliant obscore table, no row has a NULL obs_id, so in the absence of an index the database has to inspect, and hence load from disk, the entire table (at least in a row store) before it can conclude that no such row exists. For a large obscore table, this can take minutes, which slows down validation (or leads to false negatives due to timeouts) and puts significant load on the database servers.
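The effect can be illustrated with a minimal sketch. It uses SQLite as a stand-in for whatever database actually backs a service, a toy table named obscore with made-up rows, and an assumed form of the validator's check (a search for a single row with a NULL obs_id); none of these are taken from a real validator or archive. The query plan shows a full table scan without an index and an index search once one exists.

```python
import sqlite3


def null_check_plan(conn):
    """Return the plan detail strings for a NULL check on obs_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT obs_id FROM obscore WHERE obs_id IS NULL LIMIT 1"
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


# Toy stand-in for an obscore table; all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, access_url TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [(f"obs-{i}", f"http://example.org/data/{i}") for i in range(1000)],
)

# Without an index, every row must be visited to prove no NULL exists.
plan_without_index = null_check_plan(conn)

# With an index on obs_id, the same check becomes an index lookup.
conn.execute("CREATE INDEX obscore_obs_id ON obscore (obs_id)")
plan_with_index = null_check_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

Other database engines report their plans differently, but the underlying trade-off is the same: proving the absence of NULLs requires either an index or a pass over the whole table.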

To remedy this situation, the operators of such services could of course add indexes to their obs_id columns. At least where obscore is implemented as a view, however, this is not necessarily simple. An attempt was therefore made to find a case where the non-NULL requirement on obs_id actually provides an operational benefit (see the DAL list in March 2022).

Since this did not yield a convincing case, this Erratum drops the non-NULL requirement on obs_id.

Erratum Content

(1) In Table 4, replace the table row

ivoa.ObsCore obs_id adql:VARCHAR   not null

with

ivoa.ObsCore obs_id adql:VARCHAR    

(2) In the sub-chapter to section 4.1 (erroneously called "1.1.1"), replace

Progenitors and their derived data products must have the same obs_id.

with

Progenitors and their derived data products must have the same obs_id if their obs_id is non-NULL.

(3) In Section 4.4, delete the sentence "Values in the obs_id column must not be NULL."

Impact Assessment

Since no use case was identified that would fail when NULL obs_ids are admitted, we do not expect any practical impact of this change. Service operators that do choose to have NULL obs_ids should consider that queries using obs_id but not aware of this erratum will group all the corresponding rows into a single observation. Since no such queries were found in software other than that tailored to a specific archive, that is probably acceptable behaviour.

If this erratum is accepted, authors of queries inspecting obs_id are advised to consider adding an obs_id IS NOT NULL condition for robustness.
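A minimal sketch of the difference, again using SQLite and a toy table as stand-ins: the table name obscore, the identifiers, and the grouping query are illustrative assumptions rather than queries taken from existing clients. A query grouping on obs_id treats all NULL rows as one group, whereas adding the IS NOT NULL condition leaves only genuine multi-row observations.

```python
import sqlite3

# Toy stand-in for an obscore table; all identifiers are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs-A", "ivo://example/a1"),  # two artefacts of one observation
        ("obs-A", "ivo://example/a2"),
        (None, "ivo://example/b"),      # unrelated single-artefact rows
        (None, "ivo://example/c"),
    ],
)

# Query unaware of the erratum: the NULL rows end up in a single group.
naive = dict(
    conn.execute(
        "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id"
    ).fetchall()
)

# Robust variant: rows without an obs_id are excluded from the grouping.
robust = dict(
    conn.execute(
        "SELECT obs_id, COUNT(*) FROM obscore "
        "WHERE obs_id IS NOT NULL GROUP BY obs_id"
    ).fetchall()
)

print(naive)
print(robust)
```

In the first result the two unrelated rows appear as one group keyed by NULL; in the second only obs-A remains.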


Topic revision: r1 - 2022-04-12 - MarkusDemleitner