RegTAP 1.1 Erratum 1: No Case Folding for table_name
Author: Markus Demleitner
Date last changed: 2021-08-30
Date accepted: 2021-10-13
Rationale
Table names in the VO at this point will almost always refer to SQL
table names. These typically come as sequences of regular SQL
identifiers and thus are indeed case-insensitive, which would suggest
case-folding for robust matching of names. However, SQL also admits
delimited identifiers in all of catalog, schema, and table name, and
there are important VO services that employ them. Delimited identifiers
are not case-folded.
When RegTAP folds case (as in the pre-erratum version), a table name
like "J/A+A/437/789/table2" would become "j/a+a/437/789/table2". This
form does not work in queries, and a TAP client has no way to recover
the actual, functioning name of the table from the RegTAP response.
This has gone unnoticed so far because no common RegTAP client has yet
used information from rr.res_table to pre-select tables or generate
queries against them.
This erratum is intended to pave the way for such clients.
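To illustrate why case folding breaks delimited identifiers, here is a minimal Python sketch of how a database might resolve a qualified table name. It assumes PostgreSQL-style folding of regular identifiers to lower case (the SQL standard folds to upper case instead) and uses a simplified pattern for regular identifiers; the helper names are hypothetical and not part of any RegTAP or TAP software.

```python
import re

# Simplified regular SQL identifier: letter or underscore, then letters,
# digits, or underscores. Real SQL dialects accept somewhat more.
REGULAR_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def split_qualified(name: str) -> list:
    """Split catalog.schema.table on dots that are outside double quotes."""
    parts, current, in_quotes = [], "", False
    for ch in name:
        if ch == '"':
            in_quotes = not in_quotes  # a doubled "" toggles twice: no net change
        if ch == "." and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def resolve_part(part: str) -> str:
    """Return the name the database resolves one identifier part to."""
    if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
        # Delimited identifier: drop the quotes, keep the case exactly.
        return part[1:-1].replace('""', '"')
    if REGULAR_IDENTIFIER.match(part):
        # Regular identifier: case-folded (assumed here: to lower case).
        return part.lower()
    raise ValueError(f"not a valid identifier: {part!r}")


def same_table(a: str, b: str) -> bool:
    """True if two table names resolve to the same database object."""
    return [resolve_part(p) for p in split_qualified(a)] == [
        resolve_part(p) for p in split_qualified(b)
    ]


# Regular identifiers survive lowercasing...
print(same_table("ivoa.ObsCore", "ivoa.obscore"))  # True
# ...but lowercasing a delimited identifier names a different table.
print(same_table('"J/A+A/437/789/table2"', '"j/a+a/437/789/table2"'))  # False
```

The sketch only compares names; it does not consult any real database, so it shows the principle rather than the exact rules of a particular DBMS.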
Erratum Content
In RegTAP 1.1, sect. 8.6, remove table_name from the list of columns
converted to lowercase during ingestion. That is, replace
The following columns MUST be lowercased during ingestion: ivoid,
table_name, table_type, table_utype.
with
The following columns MUST be lowercased during ingestion: ivoid,
table_type, table_utype.
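A minimal sketch of the amended ingestion rule, assuming rows of rr.res_table arrive as Python dictionaries keyed by column name. The function name is hypothetical; it is not taken from any existing RegTAP implementation.

```python
# Columns of rr.res_table that still MUST be lowercased after this erratum;
# table_name is deliberately absent.
LOWERCASED_COLUMNS = frozenset({"ivoid", "table_type", "table_utype"})


def normalize_res_table_row(row: dict) -> dict:
    """Return a copy of row with the designated string columns lowercased."""
    return {
        column: (
            value.lower()
            if column in LOWERCASED_COLUMNS and isinstance(value, str)
            else value
        )
        for column, value in row.items()
    }


row = {
    "ivoid": "ivo://Example/TAP",  # hypothetical identifier
    "table_name": '"FIRST".firstSource12Feb16',
    "table_type": "output",
    "table_utype": None,  # NULL values are passed through unchanged
}
print(normalize_res_table_row(row)["table_name"])  # "FIRST".firstSource12Feb16
```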
Impact assessment
RegTAP clients using the = or LIKE operators to compare lowercased
strings with the content of table_name will no longer find names that
contain uppercase characters. To assess the severity of the problem, I ran
select table_name from rr.res_table
on GAVO's RegTAP service (which had already stopped case-folding table
names) on 2021-08-30 and removed all names containing slashes (these are
VizieR names, whose text content has probably not played a role in data
discovery even before this erratum) as well as all names referencing
TAP_SCHEMA (which need no discovery). This leaves 9683 table names
containing uppercase characters, which hence will no longer match in the
above scenario. Examples include ivoa.ObsCore, ATLASGAL_V1,
video_er3_zyjhks_CDFS_catMetaData_fits_V2, dbo.ObjectQualityFlags,
~QsoCatalogAll, BestDR7.!TiledTargetAll, or "FIRST".firstSource12Feb16.
The adoption of this erratum will mean that, for instance,
table_name='ivoa.obscore' will miss a few entries it previously caught
(note that this is not the recommended discovery mode for obscore
tables). For most of the other table names, I expect that most of what
matched against their lowercased camel-case forms will have counterparts
in titles and descriptions, which would make them show up in typical
discovery queries as issued by TOPCAT or WIRR.
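The filtering steps of the assessment described above can be sketched in Python as follows. The input would be the result of the select table_name from rr.res_table query; here a few names mentioned in this erratum (plus TAP_SCHEMA.tables) stand in for it. Treating "referencing TAP_SCHEMA" as a case-insensitive substring match is an assumption.

```python
def affected_names(table_names):
    """Return the names a comparison against lowercased strings would miss.

    VizieR names (containing a slash) and TAP_SCHEMA tables are skipped;
    the remaining names are kept if they contain uppercase characters.
    """
    affected = []
    for name in table_names:
        if "/" in name:
            continue  # VizieR names
        if "tap_schema" in name.lower():
            continue  # assumption: case-insensitive substring match
        if name != name.lower():
            affected.append(name)
    return affected


# A few names from this erratum, standing in for the query result.
sample = [
    "ivoa.ObsCore",
    "ATLASGAL_V1",
    '"J/A+A/437/789/table2"',
    "TAP_SCHEMA.tables",
    "ivoa.obscore",
    '"FIRST".firstSource12Feb16',
]
print(affected_names(sample))
# ['ivoa.ObsCore', 'ATLASGAL_V1', '"FIRST".firstSource12Feb16']
```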
An alternative avoiding this behavioural change would be to add a column
in RegTAP that has the unmodified table name. In the interest of
avoiding the accumulation of cruft for excessive backwards compatibility
(“A20-gateism”), I would submit that this is the less desirable alternative.
Instead, clients should be fixed to use ILIKE (or similar) when (ab)using
rr.res_table.table_name in data discovery (which in most use cases is a
questionable practice anyway).
Apart from that, the impact of this change will be that clients can
actually use RegTAP table names with VizieR and with services sporting
table names like "FIRST".firstSource12Feb16.
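To illustrate the recommendation that clients use ILIKE (or similar), the following sketch uses an in-memory SQLite table as a stand-in for rr.res_table (named res_table here, since SQLite has no rr schema by default). SQLite has no ILIKE, so applying LOWER() to the column takes its place as the "similar" case-insensitive comparison; the ivoids are made up.

```python
import sqlite3

# In-memory stand-in for rr.res_table with two hypothetical entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE res_table (ivoid TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO res_table VALUES (?, ?)",
    [
        ("ivo://example/tap1", "ivoa.ObsCore"),  # case preserved post-erratum
        ("ivo://example/tap2", "ivoa.obscore"),
    ],
)


def count_matches(where_clause: str, value: str) -> int:
    """Count res_table rows satisfying where_clause with one bound value."""
    sql = f"SELECT COUNT(*) FROM res_table WHERE {where_clause}"
    return conn.execute(sql, (value,)).fetchone()[0]


# Exact comparison (case-sensitive in SQLite): misses "ivoa.ObsCore".
exact = count_matches("table_name = ?", "ivoa.obscore")
# Case-insensitive comparison: finds both spellings.
folded = count_matches("LOWER(table_name) = ?", "ivoa.obscore")
print(exact, folded)  # 1 2
```

On a RegTAP service, the second form would correspond to an ADQL constraint such as table_name ILIKE 'ivoa.obscore', where the service supports that operator.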
Implementation Note
As of late August 2021, reg.g-vo.org has already adopted this, risking
temporary non-compliance. For the STScI searchable registry, this is a
non-problem because it relies on case-insensitive searches on the side
of the database. The EuroVO RegTAP service still behaves as prescribed
by pre-erratum RegTAP 1.1.