ObsCore 1.1, Erratum 3: Drop obs_id non-NULL requirement

Author: Markus Demleitner
Date last changed: 2022-04-12
Date accepted: (not accepted yet)

Rationale

In Version 1.1, Obscore requires the obs_id column to be non-NULL. This column is used in one technique for publishing observational results yielding multiple artefacts: each artefact has a separate obscore row, and the rows are grouped by sharing a single obs_id ("multi-row observation"). For services that use datalink for such data, or that only have single-artefact data in the first place, obs_id plays no role and is therefore often neglected (e.g., there is no index on the column, and it may contain meaningless data invented for the sole purpose of satisfying the non-NULL requirement).

This becomes a problem when validators check that the non-NULL requirement is in fact satisfied. In a 1.1-correct obscore table, the database will not find a row with a NULL obs_id; in the absence of an index, it therefore has to inspect, and hence load from disk, the entire table (at least in a row-store). For a large obscore table, this can take minutes, which slows down validation (or leads to false negatives due to timeouts) and puts significant load on the database servers.

To remedy this situation, the operators of such services could of course add indexes to their obs_id columns. At least where obscore is implemented as a view, however, this is not necessarily simple. An attempt was therefore made to find a case where the non-NULL requirement on obs_id actually provides an operational benefit (see DAL list in March 2022). Since this did not yield a convincing case, this Erratum drops the non-NULL requirement on obs_id.

Erratum Content

(1) In Table 4, replace the table row
Impact Assessment

Since no use case was identified that would fail when NULL obs_ids are admitted, we do not expect any practical impact of this change. Service operators that do choose to have NULL obs_ids should consider that queries using obs_id but not aware of this erratum will group all the corresponding rows into a single observation. Given that no such queries were found in general-purpose software (that is, software not tailored to a specific archive), that is probably acceptable behaviour.
Changed:
< < If this erratum is accepted authors of queries inspecting obs_id are advised to consider adding an obs_id IS NOT NULL condition for robustness.
> > If this erratum is accepted authors of queries inspecting obs_id are advised to
Added:
> > not group observations with a NULL obs_id; NULL obs_ids always indicate distinct observations.
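
The grouping behaviour discussed in the Impact Assessment can be illustrated with a minimal sketch. It is not part of the erratum; it assumes a toy two-column table in an in-memory SQLite database, and the table contents and identifiers (`ivo://example/...`, `obs-A`) are invented for illustration. A real obscore service would be queried through ADQL/TAP, where the same NULL grouping semantics would be expected to apply.

```python
import sqlite3

# Hypothetical miniature obscore table; real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_publisher_did TEXT, obs_id TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("ivo://example/a1", "obs-A"),  # two artefacts of one observation
        ("ivo://example/a2", "obs-A"),
        ("ivo://example/b1", None),     # unrelated single-artefact rows
        ("ivo://example/c1", None),
    ],
)

# Query unaware of the erratum: GROUP BY puts all NULL obs_ids
# into a single group, i.e. one spurious "observation".
naive = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id ORDER BY obs_id"
).fetchall()

# Erratum-aware query: group only the non-NULL obs_ids and pass each
# row with a NULL obs_id through as a distinct observation.
aware = conn.execute(
    """
    SELECT obs_id, COUNT(*) AS n FROM obscore
      WHERE obs_id IS NOT NULL GROUP BY obs_id
    UNION ALL
    SELECT obs_id, 1 FROM obscore WHERE obs_id IS NULL
    """
).fetchall()

print(naive)
print(aware)
```

In the naive result, the two unrelated rows with a NULL obs_id are merged into one group; the second query reports them as two separate single-row observations.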
ObsCore 1.1, Erratum 3: Drop obs_id non-NULL requirement
Changed:
< < Author: Markus Demleitner
> > Author: Markus Demleitner
Changed:
< < Date last changed: 2022-04-12
> > Date last changed: 2022-04-12
Date accepted: (not accepted yet)
Impact Assessment
Changed:
< < Since no use case was identified that would fail when NULL obs_id-s are admitted, we do not expect any practical impact of this change. Service operators that do choose to have NULL obs_ids should consider that queries using obs_id but not aware of this erratum will group all the corresponding rows into a single observation. Given that such queries were not found in software not specifically tailored to a specific archive, that is probably acceptable behaviour.
> > Since no use case were identified that would fail when NULL obs_id-s are admitted, we do not expect any practical impact of this change. Service operators that do choose to have NULL obs_ids should consider that queries using obs_id but not aware of this erratum will group all the corresponding rows into a single observation. Given that such queries were not found in software not specifically tailored to a specific archive, that is probably acceptable behaviour.
If this erratum is accepted authors of queries inspecting obs_id are advised to consider adding an obs_id IS NOT NULL condition for robustness.