|
META TOPICPARENT |
name="GerardLemson" |
This page contains my comments on the
TAP 0.31 spec
First summary of main points, then list of more detailed comments with location intext.
Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so may be somewhat repetitive.
Some comments may have been made irrelevant by the recent version 0.4.
Major points/issues/questions
- /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /asyn ADQL.I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine if some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and should be part of service metadata.
- Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
- foreign keys MUST be queriable (though may not exists ofcourse), therefore added to metadata
- indexes MUST be queriable (though may not exists ofcourse) , but MUST NOT be specified simply with an index=true attribute on column metadata
- "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
- IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
- Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
- Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
- Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
- ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
- Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
- Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
line-by-line notes/questions/issues
(s=section,p=page,par=paragraph on page or in section).
- s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might than indeed produce such links.
- s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers, that they could just as well implement TAP on files, we'd do them a disservice. The best way to suport ADQL queries is by storing one's results in a relational database and pass it the ADQL, possibly slightly adapted. Not write one's own database engine.
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
, table t
...
- s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
- s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
- s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with
) as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows (or less) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be give, but in the manner described in 2.8.4, using an INFO element.
s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i,e, using MAXREC) I would not call this a null query, but a null request. * s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModfied" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate",
especially if tables get updated over time. If tables get created and filed in one bulk insert it may be useful to add such information to the table's metadata?
- s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed be removed from version 0.4]
- s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
- s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
- s2.5 This section seems to belong to 2.6, can it not be merged with that section?
- s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
- s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred form the UML diagram at the bottom of this page):
- s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table? * s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select *
from TAP_SCHEMA.tables
where table_name like 'TAP_SCHEMA.%'
|
|
< < | Guess this depends much on UWS interaction, must check later in doc. |
> > | Guess this depends much on UWS functionality? |
|
- s2.7 p20 par7 "... any type of file ... do something useful with the file."
|
|
< < | Check rest of document whether examples are given of this for other parameters. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example). |
> > | I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example). |
|
- s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
- s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes?Suggest to use "standard" that embedded double quotes should be doubled.
- s2.8.1 p21 par 1 "The first data row should give the column name..."
- First, is there a distinction between data rows and other rows?
- Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
|
|
< < |
- s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
|
> > |
- s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
|
|
- s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause?
|
|
< < | That clause can only contain constraints on a single table, can not include joins. The tableset table does not exists. A Tableset represents the whole database. A single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnessecary complications.Thsough ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill defined complications? |
> > | That clause can only contain constraints on a single table, can not include joins. The tableset table does not exists. A tableset represents the whole database, a single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnessecary complications.Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill defined complications? |
|
- s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset"An alternative use of VOTable for representing table sets would be for it contain the serialisation of the TAP_SCHEMA tables as individual table elements.In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys.
|
|
< < | The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
- s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
|
> > | The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
- s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
|
|
- s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
|
|
< < | should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it.That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
- s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client
|
> > | should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
- s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client
|
| might have recieved if there are no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows, through TOP (or MAXREC for param queries)to be returned, and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC) for more than the service is willing to return, then I think the service should return the its maximum number of rows but give a warning message indicating this truncation.I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has explicitly for this purpose a closing INFO element in its DATA? |
|
< < |
- s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
|
> > |
- s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
|
|
- s2.10 I think that parts of this section could be usefully extracted and made into separate spec. In particular the "meta-specification" on how to create ranges, lists as values forparameters have already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only.This could be the "Common elements in the DAL2 family of services" specification.
- s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter." ?As ParamQuery is otional, a TAP service must accept SELECT parameters without error, but need not implement it.
- s2.20.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate.
|
|
< < | Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with other statemennts (though has my preference). |
> > | Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec. |
|
- s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens".
|
|
< < | Considering that later in the section special forms of the string parameter are described, it would be good if the BMF would be completed. |
> > | Considering that later in the section special forms of the string parameter are described, it would be good if the BMF would be complete. |
|
- s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should DB understand both 0/1 and false/true?
|
|
< < | This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in sql99 has values true and false (and null).
- s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned.
|
> > | This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in sql99 has values true and false (and null). |
|
> > |
- s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned.
|
| If a user queries select ra, dec ... than the service MUST return an ra and a dec column.
- s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong especially for ADQL queries.It requires a parser to understand a query in great detail, more then we might expect from the of-the-shelf parsers taht will be written. And is it necessary. When a user submits a query,(s)he is assumed to understand the schema and the query and understand how things belong together.
|
|
< < | |
> > | |
|
<--
--> |
|
< < | This is a JPEG version of a MagicDraw model which is available in UML form here\.
NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute.
XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived form the UML automatically. : |
> > | This is a JPEG version of a MagicDraw model which is available in UML form here.
In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components.
In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys. |
|
> > |
NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well.
That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived form the UML automatically. : |
|
META FILEATTACHMENT |
attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3" |
|