RegTAP 1.1 Erratum 1: No Case Folding for table_name

Author: Markus Demleitner

Date last changed: 2021-08-30

Date accepted: Not accepted yet

Rationale

Table names in the VO at this point will almost always refer to SQL table names. These typically come as sequences of regular SQL identifiers and thus are indeed case-insensitive, which would suggest case-folding for robust matching of names. However, SQL also admits delimited identifiers in all of catalog, schema, and table name, and there are important VO services that employ them. These identifiers do not case fold.

When RegTAP folds case (as in the pre-erratum version), a table name like "J/A+A/437/789/table2" would become "j/a+a/437/789/table2". This form does not work in queries, and indeed a TAP client has no way to find out the actual, functioning name of the table to query from the RegTAP response.
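As a minimal sketch of the problem, the following Python fragment mimics the lowercasing that pre-erratum RegTAP 1.1 prescribes at ingestion. The helper fold_case is hypothetical and only stands in for whatever a registry implementation actually does.

```python
def fold_case(table_name: str) -> str:
    """Lowercase a table name, as pre-erratum ingestion would (illustrative only)."""
    return table_name.lower()


# A sequence of regular SQL identifiers is case-insensitive, so folding is harmless.
regular = "ivoa.ObsCore"
# A delimited identifier (double quotes included) is case-sensitive.
delimited = '"J/A+A/437/789/table2"'

folded_regular = fold_case(regular)
folded_delimited = fold_case(delimited)

# The folded delimited name is no longer the name the service accepts in queries.
name_survives = folded_delimited == delimited
```

After folding, nothing in the stored string tells a client which characters were originally uppercase, so the functioning name cannot be reconstructed from the RegTAP response.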

This was not noticed so far because up to now no common RegTAP client has used information from rr.res_table to pre-select tables or generate queries against them.

This erratum is intended to pave the way for such clients.

Erratum Content

In RegTAP 1.1, sect. 8.6, remove table_name from the list of columns converted to lowercase during ingestion. That is, replace

The following columns MUST be lowercased during ingestion: ivoid, table_name, table_type, table_utype.

with

The following columns MUST be lowercased during ingestion: ivoid, table_type, table_utype.

Impact assessment

RegTAP clients using the = or LIKE operators to compare lowercased strings with the content of table_name will no longer find them. To assess the severity of the problem, I ran select table_name from rr.res_table on GAVO's RegTAP service (which already stopped case-folding table names) on 2021-08-30, removed all names with slashes in them (these are VizieR names, the text content of which probably has not played a role in data discovery even before this erratum) as well as all names referencing TAP_SCHEMA (which do not need discovery). This leaves 9683 table names that contain uppercase characters and hence will no longer match in the above scenario. Examples include ivoa.ObsCore, ATLASGAL_V1, video_er3_zyjhks_CDFS_catMetaData_fits_V2, dbo.ObjectQualityFlags, ~QsoCatalogAll, BestDR7.TiledTargetAll, or "FIRST".firstSource12Feb16.
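The filtering described above can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually run: the function name affected_names and the short sample list are made up for the example, and the TAP_SCHEMA test is a simple substring check.

```python
def affected_names(table_names):
    """Return the names that would no longer match a lowercased comparison."""
    result = []
    for name in table_names:
        if "/" in name:
            continue  # names with slashes (VizieR-style) are excluded
        if "tap_schema" in name.lower():
            continue  # names referencing TAP_SCHEMA are excluded
        if name != name.lower():
            result.append(name)  # contains at least one uppercase character
    return result


# Hypothetical sample; a real run would use the result of
# "select table_name from rr.res_table".
sample = [
    "ivoa.ObsCore",
    "ATLASGAL_V1",
    "J/A+A/437/789/table2",
    "TAP_SCHEMA.tables",
    "ivoa.obscore",
    '"FIRST".firstSource12Feb16',
]
affected = affected_names(sample)
```

On this sample, three names remain: ivoa.ObsCore, ATLASGAL_V1, and "FIRST".firstSource12Feb16.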

The adoption of this erratum will mean that, for instance, table_name='ivoa.obscore' will miss a few entries it has previously caught (note that this is not the recommended discovery mode for obscore tables); for most of the other table names, I expect that most of what matched in normalised camel case will have counterparts in titles and descriptions (which would make them show up in typical discovery queries as issued by TOPCAT or WIRR).

An alternative avoiding this behavioural change would be to add a column in RegTAP that has the unmodified table name. In the interest of avoiding the accumulation of cruft for excessive backwards compatibility (“A20-gateism”), I would submit that this is the less desirable alternative. Instead, clients should be fixed to use ILIKE (or similar) when (ab)using rr.res_table.table_name in data discovery (which in most use cases is a questionable practice anyway).
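One way a client might build such a case-insensitive query is sketched below in Python. It assumes that the service offers RegTAP's ivo_nocasematch user-defined function as the "similar" facility. The helper table_name_query is hypothetical, and a service supporting ILIKE could use that operator instead.

```python
def table_name_query(pattern: str) -> str:
    """Build an ADQL query matching table_name case-insensitively.

    Assumes the RegTAP service provides ivo_nocasematch(value, pattern),
    which returns 1 on a case-insensitive LIKE-style match.
    """
    # Double single quotes so the pattern is a valid ADQL string literal.
    escaped = pattern.replace("'", "''")
    return (
        "SELECT ivoid, table_name FROM rr.res_table "
        f"WHERE 1=ivo_nocasematch(table_name, '{escaped}')"
    )


query = table_name_query("ivoa.obscore")
```

Since the pattern follows LIKE conventions, % and _ in it act as wildcards; a client wanting a literal match on names containing underscores would have to take that into account.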

Apart from that, the impact of this change will be that clients can actually use RegTAP table names with VizieR and services sporting table names like "FIRST".firstSource12Feb16.

Implementation Note

As of late August 2021, reg.g-vo.org has already adopted this, risking temporary non-compliance. For the STScI searchable registry, this is a non-problem because it relies on case-insensitive searches on the side of the database. The EuroVO RegTAP service still behaves as prescribed by pre-erratum RegTAP 1.1.


Topic revision: r2 - 2021-08-31 - MarkusDemleitner
 