This page contains my comments on the
TAP 0.31 spec
First summary of main points, then list of more detailed comments with location intext.
Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so may be somewhat repetitive.
Some comments may have been made irrelevant by the recent version 0.4.
NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka
PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.
Major points/issues/questions
- /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /asyn ADQL.I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine if some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and should be part of service metadata.
NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)
- Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
- foreign keys MUST be queriable (though may not exists ofcourse), therefore added to metadata
- indexes MUST be queriable (though may not exists ofcourse) , but MUST NOT be specified simply with an index=true attribute on column metadata
- "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
- IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
NOTE: metadata discussion deferred until after next draft (PD)
- Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
- Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
NOTE: This should be much more clear in TAP-0.4 (PD)
- Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
- ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
- Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
- Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)
line-by-line notes/questions/issues
(s=section,p=page,par=paragraph on page or in section).
- s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might than indeed produce such links.
NOTE: This was already gone in 0.4 (PD)
- s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers, that they could just as well implement TAP on files, we'd do them a disservice. The best way to suport
ADQL queries is by storing one's results in a relational database and pass it the
ADQL, possibly slightly adapted. Not write one's own database engine.
NOTE: extra text about abstraction removed for clarity (PD)
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
, table t
...
- s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
- s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
- s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with ) as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
- s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
- s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
- s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
- s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows (or less) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be give, but in the manner described in 2.8.4, using an INFO element.
- s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i,e, using MAXREC) I would not call this a null query, but a null request. * s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModfied" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate",
especially if tables get updated over time. If tables get created and filed in one bulk insert it may be useful to add such information to the table's metadata?
- s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed be removed from version 0.4]
- s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
- s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
- s2.5 This section seems to belong to 2.6, can it not be merged with that section?
- s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
- s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred form the UML diagram at the bottom of this page):
- s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table? * s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select *
from TAP_SCHEMA.tables
where table_name like 'TAP_SCHEMA.%'
This is a JPEG version of a MagicDraw model which is available in UML form
here.
In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components.
In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.
NB, the
original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well.
That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived form the UML automatically. :
