---------- PRELIMINARY -----------------

My workspace for commenting on TAP 0.31... Nobody other than GerardLemson should use this page.

Notes/questions/issues on reading TAP 0.31

Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
  • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely hiding from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also, if our message to potential TAP implementors were that they could just as well implement this on files, we would do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database, not to write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
  • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries
  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". This is not quite correct: ADQL is based on SQL92, but is not a strict subset.
  • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL: in general one cannot simply pass it through to the underlying database, even if it is properly supplied with the required user-defined functions.
  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
  • s1.1.3 p6 par3: The use of UWS, which is not accepted yet, in this specification would seem to require that TAP define its own view of what UWS is, for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has less to do with how "advanced" a use case is than with whether a query requires lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Other queries make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.
  • skip 1.2 for now, must be matched to normative sections.
  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that some requirements in this section are aimed at clients, not the service. These should be identified; if I am correct, must something be done about that?
  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
  • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say anything about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?
  • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?
  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make the allowed combinations explicit somewhere, e.g. in tabular form.
  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
  • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
  • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to the corresponding VOSI metadata.
    • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
    • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • If this is sometimes desirable, could it be a capability, and would it be possible for a TAP service to state that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe one should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBMSs would be useful.
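The difference between the basic (yyyymmdd) and extended (yyyy-mm-dd) ISO 8601 calendar date forms noted above can be sketched as follows. This is only a minimal Python illustration under my own assumptions, not behaviour taken from TAP or from any database; the helper names parse_iso_date and to_extended are hypothetical. A service whose backend accepts only the extended form could normalise input along these lines before passing a query on.

```python
import re
from datetime import date, datetime

# Only two of the forms ISO 8601 allows: basic (yyyymmdd) and extended (yyyy-mm-dd).
_BASIC = re.compile(r"\d{8}")
_EXTENDED = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(text: str) -> date:
    """Parse a calendar date given in basic or extended ISO 8601 form."""
    if _EXTENDED.fullmatch(text):
        return datetime.strptime(text, "%Y-%m-%d").date()
    if _BASIC.fullmatch(text):
        return datetime.strptime(text, "%Y%m%d").date()
    raise ValueError(f"not a supported ISO 8601 calendar date: {text!r}")


def to_extended(text: str) -> str:
    """Normalise either form to the extended form (hypothetical helper)."""
    return parse_iso_date(text).isoformat()
```

Week dates, ordinal dates and time-of-day parts are deliberately left out; which of those a TAP service must accept is exactly the open question.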
  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc.) which one may or may not support?
  • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go? Is the following query ok:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,    table t
...
  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not? Which language/version is then supposed to be supported? Is this a capability?
  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
  • s2.4.5 p13 list Might it be useful to have an HTML table as a possible return type? Such a result could be added to a wrapping web page, possibly AJAX-like. Might TeX tables be of interest?
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". If a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react, and then in the manner described in 2.8.4, using an INFO element.
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
  • s2.4.8 As with MAXREC, I don't think this parameter should go together with ADQL. If a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, that is an indication that it is not possible to pose this type of query. It could perhaps be noted that in general it is good practice for publishers to add such columns ("createDate", "updateDate") explicitly to their data model. Maybe there should be a place in the metadata of a table for the case where all rows of the table have the same updateTime.
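The MAXREC policy argued for in the comments above can be sketched roughly as follows. This is a minimal Python/sqlite3 illustration under my own assumptions, not the behaviour prescribed by TAP 0.31: SERVICE_MAX_ROWS, make_demo_db and run_query are hypothetical names, and the overflow flag stands in for whatever the INFO element would carry. A user-supplied limit is honoured exactly; only when the service's own cap applies is one extra row fetched internally to detect truncation, and that extra row is never returned.

```python
import sqlite3

SERVICE_MAX_ROWS = 5  # hypothetical service-wide cap, not a value from TAP


def make_demo_db():
    """Create an in-memory table t with ids 1..8 (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 9)])
    return conn


def run_query(conn, sql, maxrec=None):
    """Return (rows, overflow); overflow is True only if the service cap cut the result."""
    if maxrec is not None and maxrec <= SERVICE_MAX_ROWS:
        # The user's own limit: return at most maxrec rows and never flag overflow.
        rows = conn.execute(f"SELECT * FROM ({sql}) LIMIT ?", (maxrec,)).fetchall()
        return rows, False
    # The service cap applies: fetch one extra row internally to detect truncation.
    rows = conn.execute(
        f"SELECT * FROM ({sql}) LIMIT ?", (SERVICE_MAX_ROWS + 1,)
    ).fetchall()
    overflow = len(rows) > SERVICE_MAX_ROWS
    return rows[:SERVICE_MAX_ROWS], overflow
```

With this shape, MAXREC=0 is simply a request that returns no rows, which matches the "null request" reading rather than a "null query".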
  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
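The nesting I am suggesting, [[catalog_name.]schema_name.]table_name, can be made concrete with a small sketch. This is a Python illustration under simplifying assumptions: only regular identifiers (a letter followed by letters, digits or underscores) are handled, delimited identifiers are ignored, and the names IDENT, QUALIFIED_NAME and split_table_name are hypothetical.

```python
import re

# Simplified regular identifier; delimited ("quoted") identifiers are not handled.
IDENT = r"[A-Za-z][A-Za-z0-9_]*"

# [[catalog_name.]schema_name.]table_name: a catalog may appear only with a schema.
QUALIFIED_NAME = re.compile(
    rf"^(?:(?:(?P<catalog>{IDENT})\.)?(?P<schema>{IDENT})\.)?(?P<table>{IDENT})$"
)


def split_table_name(name):
    """Split a table name into (catalog, schema, table); missing parts are None."""
    match = QUALIFIED_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid qualified table name: {name!r}")
    return match.group("catalog"), match.group("schema"), match.group("table")
```

Under this grammar "c.s.t", "s.t" and "t" are accepted, while the SQLServer-style "c..t" with an empty schema is rejected.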
  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Various comments on the actual metadata prescription follow. I have a proposal for a data model for the TAP_SCHEMA in UML form here:
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
    • third table I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see the data model proposal at http://vo-urp.googlecode.com/svn/trunk/input/tap-metadata/images/TAP_METADATA.jpg).
    • s2.7 Metadata prescription for foreign keys is missing, and foreign keys are very important. See the discussions in the same mail thread starting here. A proposal for a model is given here again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.7 Since user-defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model has a suggestion for this.
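To illustrate the remark on the "indexed" column that a complete index definition may need 2 extra tables, here is one possible shape, sketched in Python with sqlite3. It is not the layout of the linked data model proposal; the table names tap_indexes and tap_index_columns, all column names and the demo rows are hypothetical. The point is only that the member columns and their order become queryable.

```python
import sqlite3

# Hypothetical metadata tables: one row per index, one row per (index, column).
METADATA_DDL = """
CREATE TABLE tap_indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL,
    is_unique  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tap_index_columns (
    index_name  TEXT NOT NULL REFERENCES tap_indexes (index_name),
    column_name TEXT NOT NULL,
    ordinal     INTEGER NOT NULL,  -- position of the column within the index
    PRIMARY KEY (index_name, ordinal)
);
"""


def make_metadata_db():
    """Build an in-memory example with one two-column index (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(METADATA_DDL)
    conn.execute("INSERT INTO tap_indexes VALUES ('ix_obj_ra_dec', 'objects', 0)")
    conn.executemany(
        "INSERT INTO tap_index_columns VALUES (?, ?, ?)",
        [("ix_obj_ra_dec", "dec", 2), ("ix_obj_ra_dec", "ra", 1)],
    )
    return conn


def index_definition(conn, index_name):
    """Return the columns of an index in the order they appear in the index."""
    rows = conn.execute(
        "SELECT column_name FROM tap_index_columns "
        "WHERE index_name = ? ORDER BY ordinal",
        (index_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

A client could then see that a condition on "ra" alone can use this index, whereas a condition on "dec" alone probably cannot, which a boolean "indexed" flag per column does not convey.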


  • TAP_METADATA.jpg (attached image of the data model proposal)
Topic revision: r14 - 2009-02-18 - GerardLemson
 