Difference: GerardLemsonTAP_031 (1 vs. 32)

Revision 32, 2012-06-26 - root

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the
TAP 0.31 spec
 First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it would be preferable to let implementers choose between /sync and/or /async rather than mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would always prefer /async for data queries. I propose that either (or both) be allowed, and that the choice be part of the service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page is a proposal for a UML data model containing all contents already in the TAP_SCHEMA model, plus extras. From it, XML schema and TAP_SCHEMA tables can easily be derived. Based partially on discussions on the mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), and are therefore added to the metadata
    2. indexes SHOULD [GL changed from MUST] be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as a possible data type to column metadata. [GL see datatypes page]
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe the standard functions such as INTERSECTS etc. should then also be specified here, IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)
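
To make point 2 above concrete, here is a minimal sketch of why a boolean flag on a column is not enough: an index definition needs its columns and their order, which takes two small tables. The table and column names (`indexes`, `index_columns`, `position`) and the sample index are hypothetical, chosen for illustration only; they are not taken from the TAP draft or from the UML proposal. SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical metadata tables; names and columns are illustrative only.
DDL = """
CREATE TABLE indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL
);
CREATE TABLE index_columns (
    index_name  TEXT NOT NULL REFERENCES indexes(index_name),
    column_name TEXT NOT NULL,
    position    INTEGER NOT NULL  -- order of the column within the index
);
"""

def index_definition(conn, index_name):
    """Return the columns of one index, in index order."""
    rows = conn.execute(
        "SELECT column_name FROM index_columns "
        "WHERE index_name = ? ORDER BY position",
        (index_name,),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO indexes VALUES ('ix_sources_pos', 'sources')")
# Rows are inserted out of order on purpose; `position` restores the order.
conn.executemany(
    "INSERT INTO index_columns VALUES (?, ?, ?)",
    [("ix_sources_pos", "dec", 2), ("ix_sources_pos", "ra", 1)],
)
print(index_definition(conn, "ix_sources_pos"))  # ['ra', 'dec']
```

A foreign key could be described with the same pair-of-tables pattern (one row per key, one row per column pair).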

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also, I think that if this abstracting-away were to translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass it the ADQL, possibly slightly adapted, not to write one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because services MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.) can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. We should identify those; if I am correct, must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
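
The behaviour described in this NOTE (extract what you know, ignore the rest) can be sketched as follows. This is only an illustration under my own assumptions: `KNOWN_PARAMS` lists just the parameter names mentioned on this page, not the full list from the spec, and the function name is made up. Names are matched case-insensitively, values are passed through untouched.

```python
from urllib.parse import parse_qsl

# Parameter names mentioned in these comments; a real service would use the
# full list from the spec.
KNOWN_PARAMS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}

def extract_known_params(query_string):
    """Keep known parameters (names matched case-insensitively), drop the rest."""
    known = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()
        if key in KNOWN_PARAMS:
            # Values are left as received; if a name is repeated, the last
            # occurrence wins in this sketch.
            known[key] = value
    return known

params = extract_known_params("request=getTableMetadata&foo=bar&MaxRec=10")
print(params)  # {'REQUEST': 'getTableMetadata', 'MAXREC': '10'}
```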

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in the default installation on my desktop PC).
Maybe useful to look at report on different database systems by JVO in
 Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
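
The two modes of case insensitivity mentioned above (keywords and schema names versus CHAR values) can be seen side by side in a small experiment. This sketch uses SQLite purely because it ships with Python; it says nothing about SQLServer or Postgres defaults, and the table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sources (Name TEXT)")
conn.execute("INSERT INTO Sources VALUES ('Vega')")

# Mode 1: keywords and schema names. SQLite resolves these case-insensitively.
by_identifier = conn.execute("select name from SOURCES").fetchall()

# Mode 2: CHAR values. SQLite's default (BINARY) collation is case sensitive...
exact = conn.execute("SELECT Name FROM Sources WHERE Name = 'vega'").fetchall()

# ...but a collation chosen per comparison (or per column) can relax that.
relaxed = conn.execute(
    "SELECT Name FROM Sources WHERE Name = 'vega' COLLATE NOCASE"
).fetchall()

print(by_identifier)  # [('Vega',)]
print(exact)          # []
print(relaxed)        # [('Vega',)]
```

Since the second mode can differ per column, it is the one that would need something like the isCaseSensitive column metadata suggested above.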

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBS would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
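
Until such an overview exists, one way around the yyyymmdd problem is to normalise dates to the extended form before they reach the database. The following is a minimal sketch under that assumption; it handles calendar dates only (not times, time zones or the rest of ISO 8601), and the function name is made up.

```python
from datetime import datetime

# Accepted input layouts: extended (yyyy-mm-dd) first, then basic (yyyymmdd).
# This covers calendar dates only, not the whole of ISO 8601.
_DATE_LAYOUTS = ("%Y-%m-%d", "%Y%m%d")

def to_extended_date(text):
    """Return the date in extended ISO 8601 form, or raise ValueError."""
    for layout in _DATE_LAYOUTS:
        try:
            return datetime.strptime(text, layout).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"not a recognised ISO 8601 calendar date: {text!r}")

print(to_extended_date("20090302"))    # 2009-03-02
print(to_extended_date("2009-03-02"))  # 2009-03-02
```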

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go. Is the following query ok for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and its treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as a possible return type? Such a result could be added to a wrapping web page, possibly AJAX-like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types (TABLEDATA, BINARY, FITS), and also LINKs instead of DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing the quoted rule would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, and then in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
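
The rule I am arguing for can be sketched in a few lines. This is my own illustration, not text from any TAP draft: the function name and the `service_max` default are hypothetical, and the returned flag stands for "the response should carry an explicit overflow message".

```python
def apply_row_limit(rows, requested=None, service_max=1000):
    """Return (rows_to_send, truncated_by_service).

    requested   -- the client's own limit (TOP or MAXREC), or None
    service_max -- hypothetical service-wide cap on result size
    """
    rows = list(rows)
    if requested is not None and requested <= service_max:
        # The client asked for at most `requested` rows: return exactly
        # that many (or fewer), with no extra row and no warning.
        return rows[:requested], False
    # The service limit is the binding one: truncate and flag it so that
    # the response can carry an explicit overflow message.
    return rows[:service_max], len(rows) > service_max

sent, truncated = apply_row_limit(range(5000), requested=1000)
print(len(sent), truncated)  # 1000 False

sent, truncated = apply_row_limit(range(5000), requested=3000)
print(len(sent), truncated)  # 1000 True
```

Note that no MAXREC+1 row is ever produced; the flag alone signals truncation.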

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns ("createDate", "updateDate"), especially if tables get updated over time. If tables get created and filled in one bulk insert, it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
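
For illustration, the [[catalog_name.]schema_name.]table_name form can be taken apart as below. This is a rough sketch with a made-up function name: it ignores delimited (quoted) identifiers, and it rejects the catalog..table form with an empty schema name, in line with my reading of ADQL above.

```python
def split_table_name(name):
    """Split '[[catalog.]schema.]table' into (catalog, schema, table).

    Missing qualifiers come back as None. Delimited (quoted) identifiers
    containing dots are not handled in this sketch.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        # Rejects forms such as 'catalog..table' (empty schema name).
        raise ValueError(f"not a valid table name: {name!r}")
    padded = [None] * (3 - len(parts)) + parts
    return tuple(padded)

print(split_table_name("tables"))             # (None, None, 'tables')
print(split_table_name("TAP_SCHEMA.tables"))  # (None, 'TAP_SCHEMA', 'tables')
```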

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As views are apparently described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines the view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines the view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed on the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDL is not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in the same mail thread starting here. A proposal for a model is again given in the diagram below. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular, FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes should be doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal, new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.20.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate. Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
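
As an illustration of the quoting convention suggested for s2.8.1 above (embedded double quotes are doubled, and values containing commas are enclosed in double quotes), here is a minimal Python sketch using the standard csv module. The column names and values are invented for the example; nothing here is taken from the TAP spec itself.

```python
import csv
import io

# Hypothetical result: a header row followed by one data row whose
# second value contains both a comma and double quotes.
rows = [
    ["name", "comment"],
    ["M31", 'the "Andromeda" galaxy, a spiral'],
]

# QUOTE_MINIMAL quotes only values that need it; the csv module's
# default doublequote=True doubles any embedded double quote.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
encoded = buffer.getvalue()

# Reading the text back with the same convention recovers the values.
decoded = list(csv.reader(io.StringIO(encoded)))
```

With this convention a reader needs no extra escape character, which is presumably why it is the common choice among CSV producers.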


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

For those who don't like UML, here an attempt at a summary:

  • database [name,description, utype]
    • schema [name,description, utype]
      • table/view: [name,description, utype, sql (for views)]
        • column [name, description, utype, datatype, ucd, etc]
        • foreignkey [toTableName, ...]
          • foreignKeyColumn [fromColumnName, toColumnName]
        • index[name, description, ...]
          • indexColumn [columnName, rank]
        • group [name, id, ...]
          • columnRef [columnName, rank]
          • param(Ref) [...]
          • group(Ref) [...]
        • param [name, ucd, ..., value]
  • QueryResult
    • Result column
    • ?source column?

META FILEATTACHMENT attr="h" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"
META FILEATTACHMENT attr="" comment="Added query result" date="1239974738" name="TAP_METADATA.png" path="TAP_METADATA.png" size="118303" user="GerardLemson" version="1.3"

Revision 31 2009-04-17 - GerardLemson

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that this should be part of service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
    2. indexes SHOULD [GL changed from MUST] be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata. [GL see datatypes page]
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because services MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Other queries, by contrast, make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc), can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. We should identify those; if I am correct, must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make an explicit indication of which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
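
The behaviour described in this NOTE (extract the parameters the service knows about, ignore everything else) can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the set of known parameters is an illustrative subset of names mentioned on this page, not the full list from the spec. Parameter names are matched case-insensitively and values are left untouched, following the statement from s2.4.13 quoted further down.

```python
# Illustrative subset of parameter names mentioned on this page.
KNOWN_PARAMS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}


def extract_known_params(raw_params):
    """Keep only parameters the service knows about.

    Names are compared case-insensitively; values are passed through
    unchanged. Anything unknown is treated as spurious and dropped
    silently rather than reported as an error.
    """
    known = {}
    for name, value in raw_params.items():
        key = name.upper()
        if key in KNOWN_PARAMS:
            known[key] = value
    return known
```

For example, a request carrying `request=getCapabilities` plus an unknown `foo=bar` would be handled as if only REQUEST had been sent.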

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list The allowed values seem to use arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in the default installation on my desktop PC). It may be useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBMSs would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
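
One way a service could shield the backend from the different ISO8601 variants is to accept a small, explicitly listed subset and normalise it to the extended form before passing it on. The sketch below does this with Python's standard datetime module. The accepted formats and the function names are assumptions made for the example; the spec (at least as quoted here) does not say which variants must be supported.

```python
from datetime import datetime

# Assumed subset: extended timestamp, extended date, basic date.
ACCEPTED_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y%m%d")


def parse_iso8601_subset(text):
    """Parse a date/timestamp in one of the accepted ISO8601 variants."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


def to_extended(text):
    """Normalise an accepted value to the extended yyyy-mm-ddThh:mm:ss form."""
    return parse_iso8601_subset(text).strftime("%Y-%m-%dT%H:%M:%S")
```

A value in the basic form yyyymmdd would then reach the database as yyyy-mm-ddT00:00:00, which avoids depending on the backend accepting the basic form.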

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support? After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported to make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and its treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not? Which language/version is then supposed to be supported? Is this a capability?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types (TABLEDATA, BINARY, FITS), and also LINKs instead of DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
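
The rule proposed in the comment above can be written down as a small decision function. The following Python sketch is only meant to pin down the intended behaviour; the function name, the default limit of 1000 and the use of an in-memory list are all assumptions made for the example, and it deliberately sidesteps the question of how a real service would detect truncation without fetching extra rows.

```python
def apply_row_limit(rows, requested=None, service_max=1000):
    """Return (rows_to_send, truncated_by_service).

    requested   -- row limit asked for by the client (TOP or MAXREC),
                   or None if the client set no limit.
    service_max -- hypothetical maximum the service is willing to return.

    A warning flag is raised only when the service's own limit cut the
    result short, never when the client's own limit did.
    """
    if requested is None or requested > service_max:
        limit = service_max
        service_imposed = True
    else:
        limit = requested
        service_imposed = False
    rows_to_send = rows[:limit]
    truncated = service_imposed and len(rows) > limit
    return rows_to_send, truncated
```

Under this rule no extra row is ever added; the flag would be translated into an INFO element in the response.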

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
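
For illustration, the corrected form [[catalog_name.]schema_name.]table_name can be checked with a very small parser. The Python sketch below assumes plain identifiers without quoting (so no dots inside delimited names), which is a simplification, and the function name is invented. It rejects the catalog_name..table_name form with an empty schema name, in line with the reading of ADQL given in the comment above.

```python
def split_table_name(qualified):
    """Split [[catalog.]schema.]table into (catalog, schema, table).

    Missing leading parts are returned as None. Empty parts, as in
    'cat..tab', are rejected. Quoted identifiers are not handled.
    """
    parts = qualified.split(".")
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"not a valid table name: {qualified!r}")
    # Pad on the left so that the table name is always the last element.
    catalog, schema, table = [None] * (3 - len(parts)) + parts
    return catalog, schema, table
```

Padding on the left reflects the nesting of the brackets: a catalog name can only appear if a schema name is present.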

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)
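
To illustrate the s2.6 comment above that a boolean "indexed" flag is not enough, here is a minimal sketch of the "2 extra tables" idea using SQLite from Python. The table and column names (tap_indexes, tap_index_columns, col_rank) are hypothetical and only loosely follow the data model proposal on this page; they are not taken from any TAP draft.

```python
import sqlite3

# Hypothetical metadata tables: one row per index, one row per indexed column.
DDL = """
CREATE TABLE tap_indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL
);
CREATE TABLE tap_index_columns (
    index_name  TEXT NOT NULL REFERENCES tap_indexes(index_name),
    column_name TEXT NOT NULL,
    col_rank    INTEGER NOT NULL,  -- position of the column within the index
    PRIMARY KEY (index_name, col_rank)
);
"""


def index_definitions(conn, table_name):
    """Return {index_name: [column names in index order]} for one table."""
    rows = conn.execute(
        """
        SELECT i.index_name, c.column_name
          FROM tap_indexes i
          JOIN tap_index_columns c ON c.index_name = i.index_name
         WHERE i.table_name = ?
         ORDER BY i.index_name, c.col_rank
        """,
        (table_name,),
    )
    result = {}
    for index_name, column_name in rows:
        result.setdefault(index_name, []).append(column_name)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO tap_indexes VALUES ('ix_sources_pos', 'sources')")
# Rows are inserted out of order on purpose; col_rank restores the index order.
conn.executemany(
    "INSERT INTO tap_index_columns VALUES (?, ?, ?)",
    [("ix_sources_pos", "dec", 2), ("ix_sources_pos", "ra", 1)],
)
defs = index_definitions(conn, "sources")
```

With this layout a client can see that the (made-up) index covers ra first and dec second, which a per-column index=true flag cannot express.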

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains the column names?
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client would have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
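
The s2.8.1 suggestion above (double any embedded double quote inside a quoted value) is the convention that Python's standard csv module applies by default, so it can serve as a quick illustration. This is only a sketch of the quoting rule; the helper name to_csv and the sample row are made up for the example, and nothing here is prescribed by the TAP draft.

```python
import csv
import io


def to_csv(header, rows):
    """Serialise a header row and data rows, quoting only where needed."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a value only if it contains a comma, a quote or a
    # line break; the default doublequote=True doubles embedded double quotes.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


original = 'the "Andromeda" galaxy, nearby'
text = to_csv(["name", "note"], [["M31", original]])

# Reading the text back recovers the value unchanged.
parsed = list(csv.reader(io.StringIO(text)))
```

The value with both a comma and double quotes is written as "the ""Andromeda"" galaxy, nearby" and round-trips without loss.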


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

For those who don't like UML, here an attempt at a summary:

  • database [name,description, utype]
    • schema [name,description, utype]
      • table/view: [name,description, utype, sql (for views)]
        • column [name, description, utype, datatype, ucd, etc]
        • foreignkey [toTableName, ...]
          • foreignKeyColumn [fromColumnName, toColumnName]
        • index[name, description, ...]
          • indexColumn [columnName, rank]
        • group [name, id, ...]
          • columnRef [columnName, rank]
          • param(Ref) [...]
          • group(Ref) [...]
        • param [name, ucd, ..., value]
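
For readers who prefer code to UML, part of the summary above could be sketched as Python dataclasses. The attribute names are transliterated from the summary; index, group and param are left out for brevity, and the helper join_condition and the sample tables are purely illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    datatype: str
    description: str = ""
    utype: str = ""
    ucd: str = ""


@dataclass
class ForeignKeyColumn:
    from_column_name: str
    to_column_name: str


@dataclass
class ForeignKey:
    to_table_name: str
    columns: List[ForeignKeyColumn] = field(default_factory=list)


@dataclass
class Table:
    name: str
    description: str = ""
    utype: str = ""
    sql: Optional[str] = None  # defining SQL, only meaningful for views
    columns: List[Column] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)


@dataclass
class Schema:
    name: str
    description: str = ""
    utype: str = ""
    tables: List[Table] = field(default_factory=list)


def join_condition(table: Table, fk: ForeignKey) -> str:
    """Build the join condition that a foreign key, used as a pointer, implies."""
    return " AND ".join(
        f"{table.name}.{c.from_column_name} = {fk.to_table_name}.{c.to_column_name}"
        for c in fk.columns
    )


# Made-up example: detections point at sources through source_id.
fk = ForeignKey("sources", [ForeignKeyColumn("source_id", "id")])
detections = Table(
    name="detections",
    columns=[Column("source_id", "BIGINT"), Column("flux", "DOUBLE")],
    foreign_keys=[fk],
)
condition = join_condition(detections, fk)
```

The point of the helper is that a client can derive a join from the foreign key metadata alone, without the target table needing a declared primary key.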
 

Revision 30 2009-04-17 - GerardLemson

 
This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of the service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist of course), therefore added to metadata
    2. indexes SHOULD [GL changed from MUST] be queryable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata. [GL see datatypes page]
    4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass it the ADQL, possibly slightly adapted, not to write one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because services MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; if that reading is correct, must something be done about it?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
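
A minimal sketch of the behaviour described in the NOTE above: the service picks out the parameters it knows and ignores everything else. The set KNOWN_PARAMS only lists parameter names mentioned on this page and is not a complete list from the spec; the function name, the exact-case matching of names and the sample request (including the value doQuery) are assumptions for illustration.

```python
from urllib.parse import parse_qs

# Illustrative subset of parameter names mentioned on this page (assumption:
# names are matched exactly as written here).
KNOWN_PARAMS = {"REQUEST", "QUERY", "LANG", "MAXREC", "MTIME"}


def split_params(query_string):
    """Return (known, ignored): known parameters and the names that were skipped."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    # If a parameter is repeated, keep its last value.
    known = {name: values[-1] for name, values in parsed.items() if name in KNOWN_PARAMS}
    ignored = sorted(name for name in parsed if name not in KNOWN_PARAMS)
    return known, ignored


known, ignored = split_params("REQUEST=doQuery&LANG=ADQL&QUERY=select+1&FOO=bar")
```

Here FOO ends up in the ignored list and does not cause an error, while a missing required parameter could still be detected by checking the known dictionary.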

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop pc). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
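
The "two modes" of case insensitivity mentioned above (names of tables and columns versus CHAR values) can be seen in a small SQLite session driven from Python. SQLite is used here only because it ships with Python; it is not one of the databases discussed on this page, and the table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Observations (Observer TEXT)")
conn.execute("INSERT INTO Observations VALUES ('Smith')")


def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


# Mode 1: keywords, table names and column names are matched case-insensitively.
by_identifier = count("select count(*) from OBSERVATIONS where observer = 'Smith'")

# Mode 2: string values are compared case-sensitively by default ...
by_value_default = count("SELECT COUNT(*) FROM Observations WHERE Observer = 'smith'")

# ... unless a case-insensitive collation is requested for the comparison.
by_value_nocase = count(
    "SELECT COUNT(*) FROM Observations WHERE Observer = 'smith' COLLATE NOCASE"
)
```

In SQLite the collation can also be declared per column, which is similar in spirit to the isCaseSensitive column metadata suggested above.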

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBMSs would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
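
One way to sidestep differences between databases, sketched here under the assumption that the extended form (yyyy-mm-dd) is the most widely accepted one, is to normalise incoming date values before passing them through. The list of accepted formats and the function name are illustrative choices, not something the TAP draft defines, and only a small part of ISO 8601 is covered.

```python
from datetime import datetime

# Accepted input formats (an illustrative subset of ISO 8601): extended and
# basic calendar dates, with or without a time of day.
_FORMATS = (
    "%Y-%m-%d",
    "%Y%m%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y%m%dT%H%M%S",
)


def to_extended_iso(value):
    """Return the value in extended form, e.g. 20090302 -> 2009-03-02."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        if "%H" in fmt:
            return parsed.strftime("%Y-%m-%dT%H:%M:%S")
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"unsupported date format: {value!r}")


basic = to_extended_iso("20090302")
extended = to_extended_iso("2009-03-02")
with_time = to_extended_iso("20090302T102030")
```

A value in any other form raises ValueError, which a service could turn into an error response.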

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported to make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and its treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing the current rule would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, and then in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
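
The behaviour proposed in the s2.4.7 comment above can be made concrete with a small Python sketch. Nothing here comes from the TAP document: the function name `apply_row_limit`, the `service_max` argument and the returned overflow flag are hypothetical. The sketch returns at most the requested number of rows and reports overflow only when the service's own limit, not the client's, cut the result short.

```python
from itertools import islice
from typing import Iterable, List, Optional, Tuple


def apply_row_limit(rows: Iterable[tuple],
                    requested: Optional[int],
                    service_max: int) -> Tuple[List[tuple], bool]:
    """Return (rows, overflow) following the proposal in the comment above.

    requested   -- row limit asked for by the client (MAXREC or TOP), or None
    service_max -- hypothetical hard limit configured by the service
    """
    # The effective limit is the smaller of the client's and the service's limit.
    limit = service_max if requested is None else min(requested, service_max)
    # Read one row more than will be returned, only to learn whether more exist.
    fetched = list(islice(rows, limit + 1))
    more_available = len(fetched) > limit
    # Overflow is reported only when the service's limit did the truncating.
    service_truncated = requested is None or requested > service_max
    return fetched[:limit], more_available and service_truncated
```

The extra row is read internally but never returned; a service would presumably map the flag onto the INFO element mentioned above.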

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. If a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
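
The optional-prefix form `[[catalog_name.]schema_name.]table_name` discussed above can be sketched in Python as follows. The function `split_table_name` and the `TableName` type are hypothetical, and the sketch handles only plain identifiers; delimited identifiers that themselves contain dots are out of scope here.

```python
from typing import NamedTuple, Optional


class TableName(NamedTuple):
    catalog: Optional[str]
    schema: Optional[str]
    table: str


def split_table_name(name: str) -> TableName:
    """Split '[[catalog.]schema.]table' into its parts.

    Empty components are rejected, so the SQLServer-style 'catalog..table'
    (default schema) is treated as an error in this sketch.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"not a valid table name: {name!r}")
    # Pad on the left so that the table is always the last component.
    padded = [None] * (3 - len(parts)) + parts
    return TableName(*padded)
```

A service that follows the NOTE above would store the default schema explicitly in its metadata, so that every table name it publishes passes this kind of check.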

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But as a consequence the CAST function can not be supported either. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST? What if all returned columns are strings and we can not be sure whether the first row contains column names?
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and can not include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter." ? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
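
The quoting convention suggested in the s2.8.1 comment above (enclose the value in double quotes and double any embedded double quote) is what Python's standard `csv` module does by default. The following is only an illustrative sketch; the helper names `to_csv_line` and `from_csv_line` and the sample values are made up.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row, quoting only where needed and doubling embedded quotes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue()


def from_csv_line(line):
    """Parse one CSV line back into a list of strings."""
    return next(csv.reader(io.StringIO(line)))


row = ["NGC 1234, field 2", 'the "best" fit', "plain"]
line = to_csv_line(row)
print(line, end="")  # "NGC 1234, field 2","the ""best"" fit",plain
assert from_csv_line(line) == row
```

With this convention a value containing both commas and double quotes survives a round trip unchanged, which is the case the comment asks about.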


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.
Changed: <
< TAP_METADATA.jpg>
> TAP_METADATA.jpg 

For those who don't like UML, here an attempt at a summary:

  • database [name,description, utype]
    • schema [name,description, utype]
      • table/view: [name,description, utype, sql (for views)]
        • column [[name,description, utype,datatype, ucd, etc]
        • foreignkey [toTableName, ...]
          • foreignKeyColumn [fromColumnName, toColumnName]
        • index[name, description, ...]
          • indexColumn [columnName, rank]
Changed: <
<
        • group [name, id]
>
>
        • group [name, id, ...]
 
          • columnRef [columnName, rank]
Changed: <
<
          • paramRef [...]
          • groupRef [...]
>
>
          • param(Ref) [...]
          • group(Ref) [...]
 
        • param [name, ucd, ..., value]
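
As a rough illustration of how the table-level part of this summary nests, here is a sketch using Python dataclasses. It is not derived from the UML model itself: the class and attribute names are transliterations of the names in the list above, fields shown there only as "..." are left out rather than guessed, and database, schema, group and param are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    datatype: str
    description: str = ""
    utype: Optional[str] = None
    ucd: Optional[str] = None


@dataclass
class ForeignKeyColumn:
    from_column_name: str
    to_column_name: str


@dataclass
class ForeignKey:
    to_table_name: str
    columns: List[ForeignKeyColumn] = field(default_factory=list)


@dataclass
class IndexColumn:
    column_name: str
    rank: int  # position of the column within the index


@dataclass
class Index:
    name: str
    description: str = ""
    columns: List[IndexColumn] = field(default_factory=list)

    def column_order(self) -> List[str]:
        """Column names in the order they appear in the index."""
        return [c.column_name for c in sorted(self.columns, key=lambda c: c.rank)]


@dataclass
class Table:
    name: str
    description: str = ""
    utype: Optional[str] = None
    sql: Optional[str] = None  # defining SQL, for views only
    columns: List[Column] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)
    indexes: List[Index] = field(default_factory=list)
```

Keeping `rank` on each index column is what lets a client recover the column order of a multi-column index, which a per-column boolean flag cannot express.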

META FILEATTACHMENT attr="h" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"
Changed: <
<
META FILEATTACHMENT attr="" comment="" date="1239896193" name="TAP_METADATA.png" path="TAP_METADATA.png" size="77819" user="GerardLemson" version="1.1"
>
>
META FILEATTACHMENT attr="" comment="Added some VOTable ellements" date="1239968234" name="TAP_METADATA.png" path="TAP_METADATA.png" size="89392" user="GerardLemson" version="1.2"
 

Revision 292009-04-17 - GerardLemson

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of the service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queriable (though they may not exist of course), therefore added to metadata
    2. indexes SHOULD [GL changed from MUST] be queriable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata. [GL see datatypes page]
    4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted. Not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because services MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; and if I am correct, must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
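
The behaviour described in this NOTE (take the parameters the service knows, ignore everything else) can be sketched in Python as below. The function name is hypothetical and `KNOWN_PARAMETERS` is just an illustrative subset of the parameter names mentioned on this page, not the list from the spec. Names are matched case-insensitively, in line with the s2.4.13 quote elsewhere on this page, while values are left untouched.

```python
from typing import Dict, Mapping

# Illustrative subset of parameter names mentioned on this page; not a complete list.
KNOWN_PARAMETERS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}


def extract_known_parameters(request: Mapping[str, str]) -> Dict[str, str]:
    """Keep only parameters the service knows about; silently drop the rest.

    Names are matched case-insensitively, values are passed through untouched.
    """
    known = {}
    for name, value in request.items():
        key = name.upper()
        if key in KNOWN_PARAMETERS:
            known[key] = value
    return known
```

Under this reading an unknown parameter never causes an error by itself; only a missing required parameter would.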

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop pc). Maybe useful to look at report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBS would be useful.
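One way a service could shield the database from these differences is to normalise dates before passing them on. A minimal sketch, assuming only calendar dates in the basic (yyyymmdd) and extended (yyyy-mm-dd) forms need to be handled; the function name is hypothetical and times and time zones are ignored:

```python
from datetime import datetime


def normalise_iso_date(text):
    """Return the extended form yyyy-mm-dd for a basic or extended ISO 8601 date."""
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted form
    raise ValueError(f"not an ISO 8601 calendar date: {text!r}")
```

A database that only accepts the extended form could then be given normalise_iso_date("20090302") instead of the raw client string.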

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go. Is the following query ok for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and it's treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX-like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok, but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.
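The rule proposed here can be sketched as follows. This is only an illustration of the suggestion, not anything taken from the spec: the function names and the service_max parameter are hypothetical, and it assumes the service can tell how many rows were available.

```python
def plan_row_limit(requested, service_max):
    """Return (limit, may_truncate) for a request.

    requested   -- the client's TOP/MAXREC value, or None if not given
    service_max -- hypothetical service-wide maximum number of rows
    """
    if requested is not None and requested <= service_max:
        # The client gets exactly what it asked for: no warning, no extra row.
        return requested, False
    # The service limit applies; a warning is needed if rows are actually cut off.
    return service_max, True


def overflow_warning(limit, may_truncate, rows_available):
    """Return a warning text only when the service limit cut the result short."""
    if may_truncate and rows_available > limit:
        return f"result truncated at service limit of {limit} rows"
    return None
```

With these rules a request for 1000 rows against a service limit of 5000 never produces a warning, whatever the table size, while a request without a limit produces one only if more than 5000 rows exist.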

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns ("createDate", "updateDate"), especially if tables get updated over time. If tables get created and filled in one bulk insert, it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required, so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
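A minimal sketch of what the corrected form [[catalog_name.]schema_name.]table_name accepts. The function name is hypothetical, and the sketch assumes plain (non-delimited) identifiers that contain no dots themselves:

```python
def split_table_name(qualified):
    """Split [[catalog.]schema.]table into a (catalog, schema, table) tuple."""
    parts = qualified.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(
            f"not of the form [[catalog.]schema.]table: {qualified!r}")
    # Pad on the left so that missing prefixes become None.
    padded = [None] * (3 - len(parts)) + parts
    return tuple(padded)
```

Note that the SQLServer-style catalog..table (empty schema name) is rejected here, in line with my reading of ADQL.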

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes should be doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataService spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Second, case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
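On the s2.8.1 comment above about values containing both commas and double quotes: the doubling convention is what common CSV tools already implement. A short sketch with Python's standard csv module (the column names and values are made up):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "comment"])
writer.writerow(["NGC 1", 'bright, "nearby" galaxy'])
text = buffer.getvalue()

# Reading the text back recovers the original value unchanged.
rows = list(csv.reader(io.StringIO(text)))
```

The second line of text comes out as NGC 1,"bright, ""nearby"" galaxy", so a reader following the same convention can recover the value without ambiguity.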


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

For those who don't like UML, here is an attempt at a summary:

  • database [name,description, utype]
    • schema [name,description, utype]
      • table/view: [name,description, utype, sql (for views)]
        • column [name,description, utype,datatype, ucd, etc]
        • foreignkey [toTableName, ...]
          • foreignKeyColumn [fromColumnName, toColumnName]
        • index[name, description, ...]
          • indexColumn [columnName, rank]
        • group [name, id]
          • columnRef [columnName, rank]
          • paramRef [...]
          • groupRef [...]
        • param [name, ucd, ..., value]
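As an illustration of how the index part of this summary could map to queryable tables, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (indexes, index_columns, rank, ...) are hypothetical and only loosely follow the summary; they are not taken from TAP_SCHEMA or from the UML model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indexes (
    index_name  TEXT PRIMARY KEY,
    table_name  TEXT NOT NULL,
    description TEXT
);
CREATE TABLE index_columns (
    index_name  TEXT NOT NULL REFERENCES indexes(index_name),
    column_name TEXT NOT NULL,
    rank        INTEGER NOT NULL   -- position of the column within the index
);
""")
conn.execute(
    "INSERT INTO indexes VALUES ('ix_pos', 'sources', 'positional index')")
conn.executemany(
    "INSERT INTO index_columns VALUES (?, ?, ?)",
    [("ix_pos", "dec", 2), ("ix_pos", "ra", 1)],
)

# List the columns of every index on 'sources', in index order.
index_layout = conn.execute("""
    SELECT i.index_name, c.column_name
      FROM indexes i JOIN index_columns c ON c.index_name = i.index_name
     WHERE i.table_name = 'sources'
     ORDER BY i.index_name, c.rank
""").fetchall()
```

Unlike a single indexed=true flag on a column, this tells a user that ra is the leading column of the index, which is what matters when writing an efficient query. Foreign keys could be handled with an analogous pair of tables.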
 
META FILEATTACHMENT attr="h" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"
META FILEATTACHMENT attr="" comment="" date="1239896193" name="TAP_METADATA.png" path="TAP_METADATA.png" size="77819" user="GerardLemson" version="1.1"

Revision 28 2009-04-16 - GerardLemson

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that this should be part of the service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
    2. indexes SHOULD [GL changed from MUST] be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata. [GL see datatypes page]
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or similar? If not, what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
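
A minimal Python sketch of the behaviour described in the note above: the service picks out the parameters it knows and ignores the rest. The set of known names and the "first occurrence wins" rule for repeated parameters are assumptions made for illustration; they are not taken from the TAP text.

```python
from urllib.parse import parse_qsl

# Hypothetical list of parameters this sketch "knows about".
KNOWN_PARAMS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}


def extract_known_params(query_string):
    """Return the known parameters; anything else is treated as spurious and dropped."""
    known = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()  # parameter names are matched without regard to case
        if key in KNOWN_PARAMS:
            # Assumption: if a parameter is repeated, keep the first value.
            known.setdefault(key, value)  # values keep their original case
    return known


params = extract_known_params("request=getCapabilities&foo=bar&MaxRec=10")
```

Here `foo=bar` is silently ignored rather than reported as an error, which is the reading given in the note.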

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
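
The two modes of case (in)sensitivity mentioned above can be seen in a small Python sketch using SQLite, chosen only because it ships with Python; it is not one of the databases discussed in the comment. The table and column names are made up for the example. In SQLite, keywords and identifiers are case insensitive, string comparison is case sensitive by default, and a column can be declared case insensitive with COLLATE NOCASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "alias" is declared case insensitive at the column level; "name" uses the default.
conn.execute("CREATE TABLE Sources (name TEXT, alias TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO Sources VALUES ('Vega', 'Vega')")


def count(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return conn.execute(sql).fetchone()[0]


# Mode 1: keywords and table/column names, written in a different case.
ident_match = count("select COUNT(*) from SOURCES where NAME = 'Vega'")
# Mode 2: character values, default column versus NOCASE column.
value_default = count("SELECT COUNT(*) FROM Sources WHERE name = 'VEGA'")
value_nocase = count("SELECT COUNT(*) FROM Sources WHERE alias = 'VEGA'")
```

With this setup the first and third queries find the row and the second does not, which is the kind of per-column difference an isCaseSensitive metadata element would have to describe.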

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBMSs would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
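
As an illustration of the basic versus extended date forms discussed above, here is a small Python sketch that accepts either yyyymmdd or yyyy-mm-dd and always returns the extended form, so that a value could be passed on to a backend that only takes the latter. The function name is hypothetical, and only calendar dates are handled (no times, week dates or ordinal dates).

```python
import re
from datetime import date


def to_extended_date(text):
    """Normalise an ISO 8601 calendar date (basic or extended form) to yyyy-mm-dd."""
    match = (re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
             or re.fullmatch(r"(\d{4})(\d{2})(\d{2})", text))
    if match is None:
        raise ValueError(f"not a basic or extended ISO 8601 calendar date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() rejects impossible values such as month 13 with a ValueError.
    return date(year, month, day).isoformat()


basic = to_extended_date("20090302")
extended = to_extended_date("2009-03-02")
```

Both calls yield the same string, so the service could accept more of ISO 8601 than the underlying database does.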

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and its treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not? Which language/version is supposed to be supported? Is this a capability?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types, TABLEDATA, BINARY, FITS, and also LINKs instead of DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
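
The behaviour proposed in the comment above can be sketched in Python as follows. This illustrates the commenter's proposal only, not what any TAP version prescribes; the function and parameter names are hypothetical. One extra row is read internally to detect truncation, but it is never returned, and the overflow flag is raised only when the service's own limit cut the result short.

```python
from itertools import islice


def apply_row_limit(rows, requested, service_max):
    """Return (rows_to_send, overflow) for an iterable of result rows.

    requested   -- the client's TOP/MAXREC value, or None if none was given
    service_max -- the largest number of rows the service is willing to return
    """
    limited_by_service = requested is None or requested > service_max
    limit = service_max if limited_by_service else requested
    window = list(islice(rows, limit + 1))  # peek one row past the limit
    overflow = limited_by_service and len(window) > limit
    return window[:limit], overflow


# Client asked for 3 rows and more exist: 3 rows, no overflow message.
exact = apply_row_limit(range(5), 3, 10)
# No client limit and more rows than the service allows: truncated, flagged.
capped = apply_row_limit(range(20), None, 10)
```

The overflow flag would then be reported in an INFO element rather than by sending an extra row.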

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
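
To make the [[catalog_name.]schema_name.]table_name form concrete, here is a small Python sketch that splits such a name into its parts. The function name is hypothetical, and the sketch assumes plain identifiers: delimited (quoted) identifiers that contain dots are not handled. An empty schema part, as in catalog_name..table_name, is rejected, in line with the remark above that ADQL does not allow it.

```python
def split_table_name(name):
    """Split '[[catalog.]schema.]table' into (catalog, schema, table).

    Parts that are absent are returned as None.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"not of the form [[catalog.]schema.]table: {name!r}")
    # Left-pad with None so the table name is always the last element.
    catalog, schema, table = [None] * (3 - len(parts)) + parts
    return catalog, schema, table


qualified = split_table_name("TAP_SCHEMA.tables")
bare = split_table_name("tables")
```

Note that the optional parts nest from the left: a catalog can only be given if a schema is given too.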

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)
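
The remark above that a complete index definition "may require 2 extra tables" can be illustrated with a small Python/SQLite sketch. The table names `indexes` and `index_columns`, their columns, and the sample index are all hypothetical; they are not taken from TAP_SCHEMA or from the UML proposal, and only show why a single boolean per column cannot carry the same information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per index, and one row per (index, column) pair with its position.
CREATE TABLE indexes (index_name TEXT PRIMARY KEY, table_name TEXT);
CREATE TABLE index_columns (index_name TEXT, column_name TEXT, position INTEGER);
INSERT INTO indexes VALUES ('ix_radec', 'sources');
INSERT INTO index_columns VALUES ('ix_radec', 'dec', 2);
INSERT INTO index_columns VALUES ('ix_radec', 'ra', 1);
""")


def index_definitions(table_name):
    """Return (index_name, column_name) pairs in index column order."""
    return conn.execute(
        "SELECT i.index_name, c.column_name "
        "FROM indexes i JOIN index_columns c ON c.index_name = i.index_name "
        "WHERE i.table_name = ? ORDER BY i.index_name, c.position",
        (table_name,),
    ).fetchall()


definition = index_definitions("sources")
```

The result tells a user that the index leads with ra, followed by dec, which an indexed=true flag on each column could not express.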

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any more indication of what standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" that embedded double quotes should be doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table, and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows, through TOP (or MAXREC for param queries), to be returned, and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
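
Regarding the s2.8.1 comment on values that contain both commas and double quotes: the convention suggested there (enclose the value in double quotes and double any embedded double quote) is what Python's standard csv module does by default, so a short sketch can show the resulting text. The sample values are made up, and the newline line terminator is chosen only to keep the example short.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row; fields with commas or quotes are quoted, quotes doubled."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


row = ["NGC 224", "M31, Andromeda", 'the "great" nebula']
line = to_csv_line(row)
# Reading the line back recovers the original values unchanged.
parsed = next(csv.reader(io.StringIO(line)))
```

The same doubling rule lets a reader distinguish a quote that ends a field from one that is part of the value.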


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.
 

Revision 272009-03-23 - GerardLemson

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that this should be part of service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
Changed:
<
<
    1. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    2. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
>
>
    1. indexes SHOULD [GL changed from MUST] be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    2. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata. [GL see datatypes page]
 
    1. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, which could possibly be added at the level of the complete database, or schema, or table, or even column. It is only relevant for (VAR)CHAR columns, and maybe the T and Z in ISO8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also, I think that if this abstracting-away were to translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted. Not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Other queries, in contrast, make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; if I am correct, must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? If not, what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
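The behaviour described in this NOTE (extract the parameters the service knows about, ignore everything else) could be sketched roughly as follows. This is only a minimal illustration: the function name `extract_known_params` is hypothetical, and the set of known parameter names is an assumption made up from parameters mentioned on this page, not a list taken from the spec.

```python
# Assumption: an illustrative set of parameter names, collected from this page.
KNOWN_PARAMS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}


def extract_known_params(raw_params):
    """Return the known parameters keyed by upper-cased name.

    Anything the service does not know about is treated as spurious
    and silently ignored rather than reported as an error.
    """
    known = {}
    for name, value in raw_params.items():
        key = name.upper()  # parameter names are matched without regard to case
        if key in KNOWN_PARAMS:
            known[key] = value  # values are passed through verbatim
    return known


if __name__ == "__main__":
    request = {"request": "getTableMetadata", "maxrec": "10", "colour": "blue"}
    print(extract_known_params(request))
```

Matching names case-insensitively while leaving values untouched follows the statement quoted from s2.4.13 further down; whether values should really stay case sensitive is exactly the open question raised there.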

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). It may be useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
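The two modes of case insensitivity distinguished above (keywords and schema names on one side, CHAR values on the other) can be seen side by side in a small experiment. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient stand-in; SQLite is not one of the databases discussed on this page, and the table and column names are made up for the illustration.

```python
import sqlite3


def case_sensitivity_demo():
    """Show identifier vs. value case handling in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # 'label' gets a per-column case-insensitive collation, 'name' keeps the default.
    conn.execute("CREATE TABLE Sources (name TEXT, label TEXT COLLATE NOCASE)")
    conn.execute("INSERT INTO Sources VALUES ('M31', 'M31')")

    # Mode 1: keywords, table and column names are matched regardless of case.
    by_identifier = conn.execute("select NAME from sources").fetchall()
    # Mode 2a: with the default collation, string values are compared case sensitively.
    default_match = conn.execute("SELECT name FROM Sources WHERE name = 'm31'").fetchall()
    # Mode 2b: the NOCASE column compares values without regard to case.
    nocase_match = conn.execute("SELECT name FROM Sources WHERE label = 'm31'").fetchall()

    conn.close()
    return by_identifier, default_match, nocase_match


if __name__ == "__main__":
    print(case_sensitivity_demo())
```

That value comparison can differ per column here is the kind of situation a column-level metadata element such as the suggested isCaseSensitive would have to describe.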

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBS would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
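Until such an overview exists, one defensive option for a service is to normalise incoming dates to the extended form before passing them to the database. The following is a minimal sketch under that assumption; the function name `normalise_iso_date` is hypothetical, and only the two calendar-date forms mentioned above (yyyymmdd and yyyy-mm-dd) are handled, not the rest of ISO8601.

```python
from datetime import datetime

# Extended form first, then the basic form that some databases reject.
_ACCEPTED_FORMS = ("%Y-%m-%d", "%Y%m%d")


def normalise_iso_date(text):
    """Return the date in extended yyyy-mm-dd form, or raise ValueError."""
    for form in _ACCEPTED_FORMS:
        try:
            return datetime.strptime(text, form).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next accepted form
    raise ValueError(f"unsupported date form: {text!r}")


if __name__ == "__main__":
    print(normalise_iso_date("20090302"))
    print(normalise_iso_date("2009-03-02"))
```

Both calls print 2009-03-02. Times, time zones and week or ordinal dates would need additional forms, which is part of why the question of how much of ISO8601 must be supported matters.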

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner, I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and it's treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types (TABLEDATA, BINARY, FITS), and also LINKs instead of DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
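For reference, the truncation rule proposed in the comment above (never return more rows than the client asked for, and warn only when the service's own limit cuts the result short) might look like the sketch below. It illustrates the proposal, not what any TAP draft prescribes; the function name `apply_row_limit` and its arguments are hypothetical, and for simplicity it assumes the full result is available as a list.

```python
def apply_row_limit(rows, requested, service_max):
    """Return (rows_to_send, truncated_by_service).

    requested   -- row limit asked for by the client (TOP or MAXREC), or None
    service_max -- the service's own maximum number of rows
    """
    if requested is None or requested > service_max:
        # The service limit is the binding one: warn only if rows were really cut.
        limit = service_max
        truncated_by_service = len(rows) > service_max
    else:
        # The client got what it asked for: no extra row, no warning.
        limit = requested
        truncated_by_service = False
    return rows[:limit], truncated_by_service


if __name__ == "__main__":
    rows = list(range(2000))
    sent, warn = apply_row_limit(rows, requested=1000, service_max=5000)
    print(len(sent), warn)
```

When the flag is set, the service would report the truncation in an INFO element rather than by appending an extra row.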

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns ("createDate", "updateDate"), especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.

NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
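The nesting `[[catalog_name.]schema_name.]table_name` discussed above means that a name with one dot carries a schema and a table, never a catalog and a table. A minimal sketch of splitting names on that basis is given below. The function name `split_table_name` is hypothetical, and the sketch deliberately ignores delimited (quoted) identifiers, which a real ADQL parser would have to handle.

```python
def split_table_name(name):
    """Split '[[catalog.]schema.]table' into (catalog, schema, table).

    Missing leading parts are returned as None. Empty parts, as in
    'catalog..table', are rejected.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"not a valid table name: {name!r}")
    # Pad on the left so that the last element is always the table name.
    padded = [None] * (3 - len(parts)) + parts
    return tuple(padded)


if __name__ == "__main__":
    print(split_table_name("TAP_SCHEMA.tables"))
    print(split_table_name("sources"))
```

Rejecting `catalog..table` reflects the reading of ADQL given in the comment; a service whose backend has a default schema could, as the NOTE suggests, put that schema explicitly in the metadata instead.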

  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?

NOTE: fixed in TAP 0.41 (PD)

    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?

NOTE: fixed in TAP 0.41 (PD)

    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).

NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)

    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?

NOTE: fixed in TAP 0.41 (PD)

    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)

It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But it also means that the CAST function cannot be supported. One issue would therefore be which SQL types to use.

    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.

NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)
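Returning to the comment on the "indexed" column above: the remark that a complete index definition needs all the columns and their order, and hence probably two extra tables, can be made concrete with a small sketch. The table and column names used here (`metadata_indexes`, `metadata_index_columns`, `ordinal`) are hypothetical and are not taken from TAP_SCHEMA or from the UML proposal below; SQLite via Python's `sqlite3` is used only to make the example runnable.

```python
import sqlite3


def index_columns_demo():
    """Return the columns of the indexes on table 'sources', in index order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per index (hypothetical layout).
        CREATE TABLE metadata_indexes (index_name TEXT, table_name TEXT);
        -- One row per column in an index, with its position in the index.
        CREATE TABLE metadata_index_columns (
            index_name TEXT, column_name TEXT, ordinal INTEGER);

        INSERT INTO metadata_indexes VALUES ('ix_sources_radec', 'sources');
        INSERT INTO metadata_index_columns VALUES ('ix_sources_radec', 'dec', 2);
        INSERT INTO metadata_index_columns VALUES ('ix_sources_radec', 'ra', 1);
        """
    )
    rows = conn.execute(
        "SELECT c.column_name "
        "FROM metadata_indexes i "
        "JOIN metadata_index_columns c ON c.index_name = i.index_name "
        "WHERE i.table_name = 'sources' "
        "ORDER BY c.ordinal"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(index_columns_demo())
```

A single boolean per column cannot express that this index is on (ra, dec) in that order, which is the information a user needs in order to write queries that can actually use it.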

  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any more indication of what standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" that embedded double quotes should be doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldface of the must here is inappropriate; I think it does not correspond to the meaning in IETF RFC 2119. Also, the case insensitivity is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and to understand how things belong together.
  • (maybe more later)


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically. : TAP_METADATA.jpg

META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

Revision 262009-03-17 - PatrickDowler

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of the service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
    2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc. should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

Added:
>
>
NOTE: After discussion on dal mailing list, the doc has been changed to defer to the query language spec in matter of case sensitivity.
 

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think that if this abstracting-away were to translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2: " ... (ADQL), a standardized subset of SQL92...". This is not quite correct. ADQL is based on SQL92, but is not a strict subset, as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries, whereas other queries make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified, and if that is correct, must something be done about it?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise, what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), it might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in the default installation on my desktop PC). It may be useful to look at the report on different database systems by JVO in Victoria. There might therefore be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
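The two modes of case insensitivity distinguished above (keywords and schema names versus CHAR values) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in backend; the table `sources`, its columns and its single row are made up for illustration, and other database systems may well behave differently by default, which is exactly the point of the comment.

```python
import sqlite3

# In-memory database with a hypothetical table; name_ci is declared
# with a case-insensitive collation, name uses the default (BINARY).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (name TEXT, name_ci TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO sources VALUES ('Smith', 'Smith')")

# Mode 1: keywords and table/column names. SQLite ignores their case.
rows_identifiers = conn.execute("select NAME from SOURCES").fetchall()

# Mode 2: CHAR values. The default comparison is case sensitive ...
rows_binary = conn.execute(
    "SELECT name FROM sources WHERE name = 'smith'"
).fetchall()

# ... unless the column was declared case insensitive.
rows_nocase = conn.execute(
    "SELECT name FROM sources WHERE name_ci = 'smith'"
).fetchall()

print(rows_identifiers, rows_binary, rows_nocase)
```

A per-column flag such as the suggested isCaseSensitive would correspond to the difference between the `name` and `name_ci` columns in this sketch.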

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of what other RDBMSs support would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
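To make the yyyymmdd versus yyyy-mm-dd point concrete, here is a minimal sketch of how a service might normalise calendar dates to the extended form before passing them to a backend that only accepts that form. The helper name `to_extended_date` is hypothetical, and the sketch deliberately covers calendar dates only, not times, week dates or ordinal dates.

```python
import re
from datetime import date

# ISO 8601 basic calendar date: eight digits, no separators.
_BASIC_DATE = re.compile(r"(\d{4})(\d{2})(\d{2})")


def to_extended_date(value: str) -> str:
    """Return a calendar date in ISO 8601 extended form (yyyy-mm-dd).

    Accepts either the basic form (yyyymmdd) or the extended form and
    raises ValueError for anything that is not a valid calendar date.
    """
    match = _BASIC_DATE.fullmatch(value)
    if match:
        value = "-".join(match.groups())
    # fromisoformat validates the extended form, e.g. rejects month 13.
    return date.fromisoformat(value).isoformat()


print(to_extended_date("20090302"))
print(to_extended_date("2009-03-02"))
```

Whether such normalisation belongs in the service or should be left to the client is one of the questions the list above raises.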

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support? After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.

NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported, to make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).

  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.

NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).

  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner, I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...

NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and its treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).

  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not? Which language/version is then supposed to be supported? Is this a capability?

NOTE: changed to MUST in TAP-0.4 (PD)

  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?

NOTE: good idea (PD)

  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as a possible return type? Such a result could be added to a wrapping web page, possibly AJAX-like. Might TeX tables be of interest?

NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)

  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types, TABLEDATA, BINARY, FITS, and also LINKs instead of DATA? (Maybe answered in 2.12?)

NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)

  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?

NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).

  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.

NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).

  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing the MAXREC+1 rule would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.

NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
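One possible reading of the truncation rule proposed above, as a hedged sketch: never return more rows than the client asked for, and flag an overflow only when the service's own limit, not the client's request, is what cut the result short. The function `apply_limits`, the constant `SERVICE_MAX_ROWS` and its value are hypothetical and appear in no TAP draft.

```python
from typing import Iterable, List, Optional, Tuple

SERVICE_MAX_ROWS = 1000  # hypothetical service-side limit


def apply_limits(
    rows: Iterable[tuple],
    requested: Optional[int],
    service_max: int = SERVICE_MAX_ROWS,
) -> Tuple[List[tuple], bool]:
    """Return (rows_to_send, truncated_by_service).

    requested is the client's MAXREC/TOP value, or None if not given.
    """
    limit = service_max if requested is None else min(requested, service_max)
    result: List[tuple] = []
    truncated = False
    for row in rows:
        if len(result) == limit:
            # More rows exist than will be sent. Only warn if the
            # service limit, not the client's request, was binding.
            truncated = requested is None or requested > service_max
            break
        result.append(row)
    return result, truncated


available = [(i,) for i in range(50)]
print(apply_limits(available, requested=10, service_max=20)[1])
print(apply_limits(available, requested=None, service_max=20)[1])
```

In this sketch the `truncated` flag is what would drive the INFO element; no extra row is ever appended to the result.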

  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).

  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModfied" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate",
especially if tables get updated over time. If tables get created and filled in one bulk insert, it may be useful to add such information to the table's metadata?

NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).

  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]

NOTE: Yes, this was moved to a separate document (PD).

  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.

NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).

  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.

NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).

  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
Added:
>
>
NOTE: Fixed in TAP 0.41. Clarified to say that table name is defined in the query language spec. In cases where one can use the .. construct to specify the default schema, there is still a schema and you can put that explicitly in the metadata, so I don't see a problem. (PD)
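The corrected form [[catalog_name.]schema_name.]table_name can be sketched as a small parser. This is only an illustration: the names `TableName` and `parse_table_name` are hypothetical, and the sketch assumes plain identifiers made of letters, digits and underscores starting with a letter; delimited (double-quoted) identifiers, which may themselves contain dots, are not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumption: plain identifiers only (letter, then letters/digits/underscores).
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


class TableName(NamedTuple):
    catalog: Optional[str]
    schema: Optional[str]
    table: str


def parse_table_name(text: str) -> TableName:
    """Parse [[catalog_name.]schema_name.]table_name.

    A catalog can only be given together with a schema, so the empty
    schema in 'catalog..table' is rejected.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"too many name parts: {text!r}")
    if not all(_IDENTIFIER.fullmatch(part) for part in parts):
        raise ValueError(f"not a valid table name: {text!r}")
    # Left-pad with None so that the table name is always the last part.
    padded = [None] * (3 - len(parts)) + parts
    return TableName(*padded)


print(parse_table_name("TAP_SCHEMA.tables"))
```

Under this grammar a two-part name is always schema plus table, never catalog plus table, which is the difference from the form quoted from the draft.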
 
  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
Added:
>
>
NOTE: fixed in TAP 0.41 (PD)
 
    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
Added:
>
>
NOTE: fixed in TAP 0.41 (PD)
 
    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?
Added:
>
>
NOTE: fixed in TAP 0.41 (PD)
 
    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
Added:
>
>
NOTE: ADQL does not specify CREATE statements so this could not be described with ADQL. As for showing the SQL CREATE VIEW, is that actually worthwhile? It will not necessarily map to anything the user could infer from the metadata (table and column names could be arbitrarily different, for example). (PD)
 
    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
Added:
>
>
NOTE: fixed in TAP 0.41 (PD)
 
    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types that is of relevance when constructing queries.
Added:
>
>
NOTE: neither the SQL type nor the VOTable type is actually sufficient; one needs the ADQL type which includes the region constructs as well. The SQL types of those will be (var)char or (var)binary (most likely)... will post discussion to dal list (PD)
 It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed on the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
Added:
>
>
NOTE: There have been discussions about functions and so far the consensus has been that we should just leave it out of the initial version. That does mean that people will not be able to use any of the ADQL region stuff without just guessing it will work and being ready for an error. Will initiate further discussion on dal list (PD)
 
  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard is meant?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality.
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement them.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in sql99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
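
The s2.8.1 suggestion above (double any embedded double quote) is the convention Python's standard csv module already follows, so a minimal sketch can show the round trip. The column names and the sample value are made up for illustration; nothing here is taken from the TAP spec itself.

```python
import csv
import io


def write_csv(header, rows):
    """Serialise a header and rows as CSV text.

    Fields containing a comma or a double quote are enclosed in double
    quotes, and embedded double quotes are doubled.
    """
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_MINIMAL,
        doublequote=True,
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), doublequote=True))


# Hypothetical data: one value with both a comma and double quotes.
header = ["observer", "target"]
rows = [['Smith, "Jo"', "M31"]]
text = write_csv(header, rows)
round_trip = read_csv(text)
```

With this convention the awkward value survives a write/read cycle unchanged, which is the property the comment asks the spec to guarantee.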


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

Revision 252009-03-02 - PatrickDowler

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist of course), therefore added to metadata
    2. indexes MUST be queryable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)
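
To make "foreign keys MUST be queryable" concrete, here is a minimal sketch of what queryable foreign-key metadata could look like when the key is treated as a pointer only, with no primary-key constraint on the target. The table names `keys` and `key_columns`, their columns, and the sample tables `obs.spectra` and `obs.sources` are all hypothetical; they are not taken from the UML diagram or from the TAP draft. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical metadata tables: one row per key, one row per column pair.
    CREATE TABLE keys (key_id TEXT, from_table TEXT, target_table TEXT);
    CREATE TABLE key_columns (key_id TEXT, from_column TEXT, target_column TEXT);
    INSERT INTO keys VALUES ('k1', 'obs.spectra', 'obs.sources');
    INSERT INTO key_columns VALUES ('k1', 'source_id', 'id');
    """
)


def join_conditions(from_table, target_table):
    """Return the join predicates a client could derive from the metadata."""
    rows = conn.execute(
        "SELECT kc.from_column, kc.target_column "
        "FROM keys k JOIN key_columns kc ON kc.key_id = k.key_id "
        "WHERE k.from_table = ? AND k.target_table = ?",
        (from_table, target_table),
    ).fetchall()
    return [f"{from_table}.{a} = {target_table}.{b}" for a, b in rows]


conditions = join_conditions("obs.spectra", "obs.sources")
```

A client that can run this kind of lookup knows how two tables are meant to be joined without guessing from column names, which is the point of item 1 above.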

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?

NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)

  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries

NOTE: deferring (as above for metadata)

  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.

NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)

  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".

NOTE: this text is no longer in as of TAP-0.4 (PD)

  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.

NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)

  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.

NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?

NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)

  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.

NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)

  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.

NOTE: agreed, changed in TAP-0.41

  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?

NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).

  • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?

NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).

  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?

NOTE: as above, sync vs async discussion on mailing list (PD)

  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.

NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).

  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?

NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).

  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?

NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
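
A minimal sketch of the behaviour described in this note: the service picks out the parameters it knows, matching names case-insensitively while leaving values untouched, and silently drops everything else. The set of known names is limited to parameters mentioned on this page and is illustrative only; `extract_parameters` is a hypothetical helper, not part of any TAP library.

```python
from urllib.parse import parse_qs

# Illustrative subset of parameter names mentioned in these comments.
KNOWN_PARAMETERS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}


def extract_parameters(query_string):
    """Keep known parameters (names folded to upper case), ignore the rest.

    Values are returned exactly as sent, since only parameter names are
    treated as case-insensitive here.
    """
    parsed = parse_qs(query_string, keep_blank_values=True)
    accepted = {}
    for name, values in parsed.items():
        key = name.upper()
        if key in KNOWN_PARAMETERS:
            # Repeated parameters are not handled in this sketch:
            # only the first value is kept.
            accepted[key] = values[0]
    return accepted


params = extract_parameters("request=getCapabilities&Lang=ADQL&foo=bar")
```

Here `foo=bar` is treated as spurious and ignored rather than reported as an error.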

  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.

NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).

  • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?

NOTE: consistent case in TAP-0.4 (PD).

  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)

NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).

  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).

NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
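
The "two modes" of case insensitivity mentioned above (keywords and schema names versus CHAR values) can be seen side by side in a small experiment. This sketch uses SQLite purely because it ships with Python; it says nothing about how SQLServer, Postgres or any TAP service behaves. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column with the default (binary) collation and one declared
# case-insensitive at the column level.
conn.execute(
    "CREATE TABLE Sources (observer TEXT, observer_ci TEXT COLLATE NOCASE)"
)
conn.execute("INSERT INTO Sources VALUES ('Smith', 'Smith')")


def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


# Mode 1: keywords, table names and column names in mixed case.
n_identifiers = count("select COUNT(*) FROM sources WHERE OBSERVER = 'Smith'")

# Mode 2: the case of string values, per column.
n_value_default = count("SELECT COUNT(*) FROM Sources WHERE observer = 'smith'")
n_value_nocase = count("SELECT COUNT(*) FROM Sources WHERE observer_ci = 'smith'")
```

In SQLite the first query matches regardless of identifier case, while value matching depends on the collation of the column, which is the kind of per-column property an isCaseSensitive metadata element would describe.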

  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?

NOTE: part of the whole case-sensitivity topic above; it is a requirement on the client as stated (PD).

  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBS would be useful.

NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
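
As an illustration of the basic versus extended ISO 8601 issue raised above, a service could normalise incoming date values to the extended form before passing them to a database that only accepts yyyy-mm-dd. This is a sketch under the assumption that only four layouts need handling; `to_extended_iso` is a hypothetical helper and covers nowhere near all of ISO 8601.

```python
from datetime import datetime

# Layouts handled by this sketch, extended forms first.
_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",  # extended date and time
    "%Y%m%dT%H%M%S",      # basic date and time
    "%Y-%m-%d",           # extended date
    "%Y%m%d",             # basic date
)


def to_extended_iso(value):
    """Return the extended ISO 8601 form of a basic or extended value."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if "T" in fmt:
            return parsed.isoformat()
        return parsed.date().isoformat()
    raise ValueError(f"unsupported date format: {value!r}")
```

For example, `to_extended_iso("20090302")` and `to_extended_iso("2009-03-02")` both give the extended date, so the underlying database sees a single layout.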

  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support? After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.
Changed:
<
<
We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).
>
>
NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supported so make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).
 
  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
Added:
>
>
NOTE: For consistency with the direct ADQL constructs, we could require support for position, circle, and box in STC/S. In general, the claim is that services and applications can support whatever part of STC they like and that is OK... (PD).
 
  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...
Added:
>
>
NOTE: I agree that ADQL allows these and in ADQL discussions where people didn't like the look of such constructs it was argued that this was just the nature of ADQL (SQL) and it's treatment of argument types (literal is equivalent to column ref); this text was included in a provocative manner when it should be simply a warning to users that if they do this they are possibly going to make mistakes. Of course, there are plenty of ways to make mistakes with ADQL and this particular complexity is not going to solve that. Changed text to make this a note/warning in TAP-0.4 (PD).
 
  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
Added:
>
>
NOTE: changed to MUST in TAP-0.4 (PD)
 
  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
Added:
>
>
NOTE: good idea (PD)
 
  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
Added:
>
>
NOTE: There is an html format but that is for the whole page. Can you plausibly get an html table element without associated CSS style sheets and expect something useful? Marginally maybe... (PD)
 
  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
Added:
>
>
NOTE: I think the intent is for TABLEDATA ONLY. TBD? Will clarify text to say TABLEDATA only for now in TAP-0.41 (PD)
 
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
Added:
>
>
NOTE: It must be a legal table name as defined by ADQL, so not following the should means that one has added optional schema (and maybe catalog) names as prefixes. If the schema name is TAP_UPLOAD (doc incorrectly says TAP_SCHEMA) that would be ok but if it is anything else it would have to be an error. Clarifying to say "must be an unqualified table name" in TAP-0.41 (PD).
 
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
Added:
>
>
NOTE: MAXREC is used to possibly negotiate a query size limit with the service, which may not otherwise be able to tell what the query will return. Without adding MAXREC to an ADQL query (even one using TOP) the service may truncate the result at a different place due to default limits (PD).
 
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.
Added:
>
>
NOTE: this paragraph was removed in TAP-0.4; rules for indicating truncation are described elsewhere (PD).
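
The rule proposed in the comment above can be written down as a short sketch: return at most what the client asked for, never an extra row, and flag truncation only when the service's own limit is what cut the result short. This models the reviewer's proposal, not the behaviour specified by any TAP draft; the function names and the idea of passing rows as a list are assumptions made for the example.

```python
def plan_limit(requested, service_max):
    """Return (row_limit, service_bound).

    requested is the client's MAXREC/TOP value, or None if not given.
    service_bound is True when the service maximum, not the client's
    request, determines the limit.
    """
    if requested is None or requested > service_max:
        return service_max, True
    return requested, False


def apply_limit(rows, requested, service_max):
    """Return (rows_to_send, truncated_by_service)."""
    limit, service_bound = plan_limit(requested, service_max)
    # Warn only if the service limit actually removed rows.
    truncated = service_bound and len(rows) > limit
    return rows[:limit], truncated


# Hypothetical result set of 5000 rows and a service maximum of 2000.
all_rows = list(range(5000))
explicit, explicit_warning = apply_limit(all_rows, 1000, 2000)
implicit, implicit_warning = apply_limit(all_rows, None, 2000)
```

In the first call the client gets exactly 1000 rows and no message; in the second the service returns its maximum and sets the flag that would drive an INFO element.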
 
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
Added:
>
>
NOTE: Last sentence mentioning null-query removed in TAP-0.41 (PD).
 
  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate",
especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?
Added:
>
>
NOTE: It is true that MTIME is intended for finding new/changed/deleted records and making a mirror. While that may generally be best done via param-query, at this point only ADQL support is required so although MTIME is optional we did not want to make it dependent on other optional features. If the service cannot deal with MTIME it is ignored, as usual (PD).
 
  • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
Added:
>
>
NOTE: Yes, this was moved to a separate document (PD).
 
  • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
Added:
>
>
NOTE: The section on LANG no longer says value is case insensitive as of TAP-0.4; other case-sensitivity issues TBD (PD).
 
  • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
Added:
>
>
NOTE: It is, although it also says that the service never has to deal with multi-valued parameters in the HTTP sense. Not sure why not.. will bring up on DAL list (PD).
 
  • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
  • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
  • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred form the UML diagram at the bottom of this page):
    • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
    • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
    • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?
    • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
    • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
    • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
    • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
    • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
      select primary, indexed, std from tap_schema.columns
      ? (I guess 2.11 might say something about this)
    • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
    • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
  • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
  • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
select * 
  from TAP_SCHEMA.tables
 where table_name like 'TAP_SCHEMA.%'
    • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
    • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any more indication of which standard?
  • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
    select * from TAP_SCHEMA.tableset
    as that table does not exist.
  • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? Guess this depends much on UWS functionality?
  • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. E.g. REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
  • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
  • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
  • s2.8.1 p21 par 1 "The first data row should give the column name..."
    • First, is there a distinction between data rows and other rows?
    • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
  • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
  • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
  • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
  • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
  • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
  • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows, through TOP (or MAXREC for param queries), to be returned, and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has explicitly for this purpose a closing INFO element in its DATA?
  • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
  • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
  • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
  • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec.
  • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
  • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
  • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
  • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
  • (maybe more later)
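As an illustration of the suggestion in the s2.8.1 comment above (embedded double quotes are doubled), here is a minimal Python sketch. It relies on the standard `csv` module, whose default dialect already follows that convention; the helper names and the sample row are invented for this example and are not part of the TAP draft.

```python
import csv
import io

def to_csv_row(values):
    """Serialise one row; fields containing commas or quotes get quoted,
    and embedded double quotes are doubled (the csv module's default)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buffer.getvalue()

def from_csv_row(line):
    """Parse a single CSV line back into a list of strings."""
    return next(csv.reader(io.StringIO(line)))

# Made-up row with a value that contains both a comma and double quotes.
row = ["NGC 1234", 'He said "hi", then left', "42"]
encoded = to_csv_row(row)
print(encoded, end="")
print(from_csv_row(encoded) == row)
```

The round trip shows that the doubled-quote convention is unambiguous for values containing both commas and quotes, which a rule covering only commas does not guarantee.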


  • TAP_METADATA.jpg:
This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

Revision 24 2009-03-02 - PatrickDowler

 
META TOPICPARENT name="GerardLemson"
This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.

Major points/issues/questions

  • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of service metadata.

NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)

  • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
    2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
    3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
    4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.

NOTE: metadata discussion deferred until after next draft (PD)
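To show why a single index=true flag per column is not enough, the following Python sketch stores index definitions in two extra tables and reads back the ordered column list of each index. It uses an in-memory SQLite database purely for convenience; the table names `indexes` and `index_columns`, their columns, and the sample index are all hypothetical and are not taken from the TAP draft or from the UML proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical metadata tables: one row per index, one row per indexed column.
CREATE TABLE indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL,
    is_unique  INTEGER NOT NULL
);
CREATE TABLE index_columns (
    index_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    ordinal     INTEGER NOT NULL,   -- position of the column within the index
    PRIMARY KEY (index_name, ordinal)
);
INSERT INTO indexes VALUES ('ix_sources_radec', 'sources', 0);
-- Inserted out of order on purpose; 'ordinal' carries the real order.
INSERT INTO index_columns VALUES ('ix_sources_radec', 'dec', 2);
INSERT INTO index_columns VALUES ('ix_sources_radec', 'ra', 1);
""")

def index_definitions(conn, table_name):
    """Map each index on table_name to its columns in index order."""
    rows = conn.execute(
        "SELECT i.index_name, c.column_name "
        "FROM indexes i JOIN index_columns c ON c.index_name = i.index_name "
        "WHERE i.table_name = ? ORDER BY i.index_name, c.ordinal",
        (table_name,),
    ).fetchall()
    result = {}
    for index_name, column_name in rows:
        result.setdefault(index_name, []).append(column_name)
    return result

print(index_definitions(conn, "sources"))
```

With only a boolean flag on `ra` and `dec`, a client could not tell a composite (ra, dec) index from two separate single-column indexes, nor which column leads.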

  • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.

NOTE: This should be much more clear in TAP-0.4 (PD)

  • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
    • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)
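The distinction between case sensitivity of names and of column values can be seen in a small experiment. The sketch below uses SQLite only because it ships with Python; other engines behave differently by default, so this shows one engine's behaviour rather than a statement about ADQL or TAP. The table and the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Observations (Observer TEXT)")
conn.executemany("INSERT INTO Observations VALUES (?)", [("Smith",), ("smith",)])

# Table and column names are matched regardless of case,
# but the default string comparison on values is case sensitive.
by_default = conn.execute(
    "select OBSERVER from observations where observer = 'Smith'"
).fetchall()

# Case-insensitive value comparison has to be requested explicitly.
nocase = sorted(
    r[0]
    for r in conn.execute(
        "SELECT observer FROM observations WHERE observer = 'Smith' COLLATE NOCASE"
    )
)

print(by_default)
print(nocase)
```

This is the kind of per-engine difference that a case-sensitivity capability, or a per-column metadata flag, would have to describe.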

line-by-line notes/questions/issues

(s=section,p=page,par=paragraph on page or in section).

  • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
Changed:
<
<
NOTE: This was already gone in 0.4 (PD)
>
>
NOTE: this text is no longer in as of TAP-0.4 (PD)
 
  • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, rather than writing one's own database engine.

NOTE: extra text about abstraction removed for clarity (PD)

  • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because services MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
Added:
>
>
NOTE: In my opinion it is necessary to allow services to support a subset of ADQL; this would be described in the capabilities returned from the VOSI capabilities request... not sure if one lists all the ADQL features (keywords) that are supported or the version of ADQL and then the ones that are not (should be a smaller list)... TBD (PD)
 
  • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.
Added:
>
>
NOTE: this text is no longer in as of TAP-0.4 (PD)
 
  • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
    1. querying standardised tables using ADQL or PARAMQUERY
    2. tableset queries
    3. VOSI queries
Added:
>
>
NOTE: deferring (as above for metadata)
 
  • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.
Added:
>
>
NOTE: this text is no longer in as of TAP-0.4 (PD)
 
  • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.
Added:
>
>
NOTE: this text is no longer in as of TAP-0.4; underlying issue not otherwise addressed (PD)
 
  • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".
Added:
>
>
NOTE: this text is no longer in as of TAP-0.4 (PD)
 
  • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.
Added:
>
>
NOTE: It was accepted in Trieste that the UWS spec would have to be developed and standardised ahead of TAP (PD)
 
  • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Other queries, in contrast, make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.) can be supported with /sync just as well. And /sync is MUCH easier to implement.
Added:
>
>
NOTE: this text is mostly gone as of TAP-0.4; discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)
 
  • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; if this reading is correct, must something be done about that?
Added:
>
>
NOTE: it is true that when describing a service interface that some things are requirements for the service and some for the clients; the latter also need to be described so that the correct response can be specified (e.g. an error when a required param is missing); will try to clarify after next draft (PD)
 
  • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
Added:
>
>
NOTE: discussion of sync vs async (from main points above) redirected to DAL mailing list (PD)
 
  • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.
Added:
>
>
NOTE: agreed, changed in TAP-0.41
 
  • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
  • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?
Added:
>
>
NOTE: Yes, the R in REST is represent(...). I don't see any reason one could not have/serve other web resources from within the tree. TAP (and UWS) simply enumerate a (required) set of resources and what they mean (PD).
 
  • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
Added:
>
>
NOTE: This is just explaining how HTTP works in practice and really belongs in Use of HTTP (Section 7 in TAP-0.4).
 
  • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
Added:
>
>
NOTE: as above, sync vs async discussion on mailing list (PD)
 
  • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.
Added:
>
>
NOTE: Let's revisit this w.r.t. TAP-0.4 now that many parameters have been moved to a separate document (PD).
 
  • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
Added:
>
>
NOTE: It is informative for a service implementor though: it tells them what to assume and what is an error. It could be worded in a more service-implementor centric fashion, but then a client-centric doc would be needed -- maybe is? (PD).
 
  • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
Added:
>
>
NOTE: It is spurious. It is assumed that the service will extract parameters it knows about from the request and ignore anything that is not applicable, which includes everything it does not know about.
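The behaviour described in this note (extract the parameters the service knows, ignore the rest) can be sketched in a few lines of Python. The set of known names below is only an illustrative subset drawn from parameters mentioned on this page, and the function name is invented; matching names case-insensitively follows the s2.4.13 statement quoted elsewhere in these comments.

```python
# Illustrative subset of parameter names; a real service would have its own list.
KNOWN_PARAMETERS = {"REQUEST", "LANG", "QUERY", "MAXREC", "MTIME"}

def extract_known(params):
    """Split request parameters into those the service knows and those it ignores.

    Names are matched case-insensitively; values are passed through unchanged.
    """
    known = {}
    ignored = []
    for name, value in params.items():
        key = name.upper()
        if key in KNOWN_PARAMETERS:
            known[key] = value
        else:
            ignored.append(name)   # spurious: silently dropped, not an error
    return known, sorted(ignored)

known, ignored = extract_known({"request": "doQuery", "Lang": "ADQL", "FOO": "bar"})
print(known)
print(ignored)
```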
 
  • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
Added:
>
>
NOTE: not sure what this refers to as the page numbers are not that helpful (did you print on A4?) but you are right that the REQUEST requirement applies to direct access to the /async and /sync endpoints. That is, you would not need REQUEST to access the child resources under a UWS job. Text clarified in TAP-0.41 (PD).
 
  • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
Added:
>
>
NOTE: consistent case in TAP-0.4 (PD).
 
  • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
    • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
    • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
Added:
>
>
NOTE: As with UWS (above), we expect that VOSI as it pertains to TAP will be standardised ahead of TAP (PD). The returned XML is specified by the VODataService spec; anyway, this needs discussion as part of the whole metadata topic (PD).
 
  • s2.4.2 p12 par1 "The query string is case sensitive."
    • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in the default installation on my desktop PC). It may be useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
Added:
>
>
NOTE: initiated discussion of case sensitiveness on mailing list 2009-03-02 (PD).
 
  • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
Added:
>
>
NOTE: part of the whole case-sensitive topic above; it is a requirement on the client as stated (PD).
 
  • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
    • Is ISO8601:2004 intended?
    • Must all of ISO8601(:2004) be supported?
    • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    • An overview of other RDBS would be useful.
Added:
>
>
NOTE: Agreed: reviewing what DBs mostly support so that dates can be passed through easily would be good... TBD (PD)
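One way a service could shield a backend that only accepts the extended form is to normalise incoming values before passing them on. The Python sketch below accepts a handful of basic and extended ISO 8601 calendar forms and rewrites them to the extended form. The list of accepted formats is an assumption made for this example and covers only a small part of ISO 8601 (no time zones, fractional seconds, week or ordinal dates); the function names are invented.

```python
from datetime import datetime

# Assumed subset of ISO 8601 forms: extended first, then basic.
ACCEPTED_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",   # extended date and time
    "%Y-%m-%d",            # extended date
    "%Y%m%dT%H%M%S",       # basic date and time
    "%Y%m%d",              # basic date
)

def parse_iso8601(text):
    """Parse text using the first accepted format that matches."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported date/time form: {text!r}")

def to_extended(text):
    """Rewrite an accepted value in the extended yyyy-mm-ddThh:mm:ss form."""
    return parse_iso8601(text).strftime("%Y-%m-%dT%H:%M:%S")

print(to_extended("20090302"))
print(to_extended("2009-03-02T12:30:00"))
```

Which forms a TAP service must accept is the open question; a sketch like this only shows that accepting a form and passing it to the database unchanged are separate decisions.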
 
  • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support? After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.
Added:
>
>
NOTE: We wanted to specify a fixed set of ADQL region constructs that everyone supports, to make it easier on the client and on the implementor (fewer decisions). The text says "contains columns with spatial" AND "service wants to support". This is intended to mean that spatial querying support via ADQL region constructs is optional. The example of a range of dec above is independent of this and perfectly acceptable. Text clarified in TAP-0.41 (PD).
 
  • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
  • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
select POINT(c.coordSys, t.ra, t.dec)
from (select 'ICRS' as coordSys) c
,	 table t
...
  • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not: which language/version is supposed to be supported? Is this a capability?
  • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
  • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types (TABLEDATA, BINARY, FITS), and also LINKs instead of DATA? (Maybe answered in 2.12?)
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed on the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDL is not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
        select primary, indexed, std from tap_schema.columns
        ? (I guess 2.11 might say something about this)
      • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard is meant?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS functionality?
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find whether the document defines such behaviour explicitly, e.g. REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in the example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name..."
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
    • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
    • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldfacing of the must here is inappropriate; it does not correspond to the meaning in IETF RFC 2119, I think. Second, case-insensitivity is inconsistent with statements elsewhere in the spec.
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
    • (maybe more later)
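
    The overflow behaviour argued for under s2.4.7 and s2.8.5 can be sketched as follows. This is only an illustration of the proposal, not text from the TAP draft: the function name fetch_rows and its arguments are made up here, and the returned flag stands in for the closing INFO warning.

```python
from itertools import islice
from typing import Iterable, List, Optional, Tuple


def fetch_rows(rows: Iterable[tuple],
               requested: Optional[int],
               service_limit: int) -> Tuple[List[tuple], bool]:
    """Return (rows, overflow_warning) following the proposal above.

    requested     -- the client's TOP/MAXREC value, or None if not given
    service_limit -- hypothetical maximum row count the service will return
    """
    if requested is not None and requested <= service_limit:
        # The client set its own bound: return at most that many rows,
        # with no extra row and no warning.
        return list(islice(rows, requested)), False

    # The service limit applies: look one row ahead internally to detect
    # truncation, but never hand that extra row to the client.
    fetched = list(islice(rows, service_limit + 1))
    return fetched[:service_limit], len(fetched) > service_limit


# Client asks for 1000 rows, service allows 2000: no warning.
rows_a, warn_a = fetch_rows(iter(range(5000)), 1000, 2000)
# Client gives no bound and the result exceeds the service limit: warning.
rows_b, warn_b = fetch_rows(iter(range(5000)), None, 2000)
```

    The look-ahead row is needed only in the second branch, so a client-supplied TOP or MAXREC can be passed straight through to the database as TOP/LIMIT.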

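
    For the CSV question under s2.8.1, the doubling convention suggested there is what Python's standard csv module does by default. A minimal sketch; the helper name to_csv_line and the sample values are invented for illustration.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row, quoting only fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue()


# A field containing both a comma and double quotes.
line = to_csv_line(["M31", 'He said "hi", then left', 42])

# Reading the line back recovers the original string values.
parsed = next(csv.reader(io.StringIO(line)))
```

    Here the second field is written as "He said ""hi"", then left", i.e. enclosed in double quotes with each embedded double quote doubled, and a standard reader restores it unchanged.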

    • TAP_METADATA.jpg:
    This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

    NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived from the UML automatically. : TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

    Revision 232009-03-02 - PatrickDowler

     
    META TOPICPARENT name="GerardLemson"
    This page contains my comments on the TAP 0.31 spec. First a summary of main points, then a list of more detailed comments with location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.
    Added:
    >
    >
    NOTE: Notes like this one included below once action has been taken with respect to each point (PD aka PatrickDowler). If TAP doc version is not specified, it is TAP-0.41.
     

    Major points/issues/questions

    • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of service metadata.
    Added:
    >
    >
    NOTE: post to dal mailing list (2009-03-02) to explain and initiate discussion (PD)
     
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
      1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
      2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
      3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
      4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
    Added:
    >
    >
    NOTE: metadata discussion deferred until after next draft (PD)
     
    • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
      • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
    Added:
    >
    >
    NOTE: This should be much more clear in TAP-0.4 (PD)
     
    • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
    Added:
    >
    >
    NOTE: posted explanation and request for comment to dal mailing list on 2009-03-02 (PD)
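
    The two modes of case sensitivity distinguished above (names versus column values, with the latter possibly set per column) can be seen in a small experiment. SQLite is used here only because it ships with Python; the table Sources and its columns are made up, and other database systems have different defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'survey' gets a case-insensitive collation; 'name' keeps the default one.
conn.execute("CREATE TABLE Sources (name TEXT, survey TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO Sources VALUES ('Smith', 'SDSS')")


def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


# Keywords, table names and column names: case does not matter.
names_any_case = count("select COUNT(*) from SOURCES where NAME = 'Smith'")
# Column values with the default collation: case matters.
value_default = count("SELECT COUNT(*) FROM Sources WHERE name = 'smith'")
# Column declared COLLATE NOCASE: case of the value does not matter.
value_nocase = count("SELECT COUNT(*) FROM Sources WHERE survey = 'sdss'")
```

    In this setup the first and third queries match the row and the second does not, which is the kind of per-column behaviour a capability or column-level metadata element would have to describe.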
     

    line-by-line notes/questions/issues

    (s=section,p=page,par=paragraph on page or in section).

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    Added:
    >
    >
    NOTE: This was already gone in 0.4 (PD)
     
    • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
    Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.
    Added:
    >
    >
    NOTE: extra text about abstraction removed for clarity (PD)
     
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.
    • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
      select * from thattable
      , not advanced at all, but it may lead to timeouts/overflows for /sync queries. Other queries, in contrast, make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; if I am correct, must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
      • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • Must all of ISO8601(:2004) be supported?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support? After all
       select * from sources where dec between -10 and 10
      looks like a spatial query, but does not require INTERSECTS etc.
    • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner, I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok, for example:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
    • s2.4.5 p13 list Might it be useful to have an HTML table (i.e. starting with <table..> and ending with </table>) as a possible return type? Such a result could be added to a wrapping web page, possibly AJAX-like. Might TeX tables be of interest?
  • s2.4.5 p13 list Is the VOTable allowed to contain data in any of its available DATA serialisations (TABLEDATA, BINARY, FITS), and also LINKs instead of DATA? (Maybe answered in 2.12?)
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows be returned, either using this parameter or using TOP in ADQL, then MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, and then in the manner described in 2.8.4, using an INFO element.
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
  • s2.4.8 I don't think MTIME should be used together with ADQL. If a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, that indicates that this type of query cannot be posed. It might be suggested that in general it is good practice to have such columns ("createDate", "updateDate"), especially if tables get updated over time. If tables get created and filled in one bulk insert, it may be useful to add such information to the table's metadata?
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed on the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDL is not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
        select primary, indexed, std from tap_schema.columns
        ? (I guess 2.11 might say something about this)
      • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? Guess this depends much on UWS functionality?
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name..."
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST? What if all returned columns are strings and we can not be sure whether the first row contains column names?
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and can not include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
    • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter." ? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
    • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldface on the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Second, the case-insensitivity is inconsistent with statements elsewhere in the spec.
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and to understand how things belong together.
    • (maybe more later)
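
The overflow rule argued for in the s2.8.5 comment above can be written down as a small function. This is only a sketch of that proposal, not behaviour defined by the TAP spec: the names `apply_row_limits`, `requested` (the client's TOP or MAXREC, or `None` if absent) and `service_max` (the service's own cap) are hypothetical, and a real service would push the limit into the SQL rather than slice a list.

```python
from typing import List, Optional, Sequence, Tuple


def apply_row_limits(
    rows: Sequence[tuple],
    requested: Optional[int],
    service_max: int,
) -> Tuple[List[tuple], bool]:
    """Return (rows_to_send, truncated_by_service).

    Hypothetical helper illustrating the proposed rule:
    never send an extra row, and warn only when the service's
    own cap removed rows the client could otherwise have received.
    """
    # The effective limit is the smaller of the client's wish and the service cap.
    limit = service_max if requested is None else min(requested, service_max)
    to_send = list(rows[:limit])

    # What the client would have received had the service set no cap.
    unrestricted = len(rows) if requested is None else min(len(rows), requested)

    # Overflow warning only if the service cap actually cut rows.
    truncated = len(to_send) < unrestricted
    return to_send, truncated
```

With 5000 matching rows and a service cap of 2000, a client asking for 1000 rows gets exactly 1000 and no warning, while a client giving no limit gets 2000 rows and `truncated` set, which would be reported through an INFO element rather than an extra row.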

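The quoting convention suggested in the s2.8.1 comment above (enclose a value in double quotes and double any embedded double quote) is what Python's standard `csv` module does by default, so it can serve as a quick illustration. The helper names `to_csv_line` and `from_csv_line` are made up for this sketch; nothing here is prescribed by the TAP spec.

```python
import csv
import io
from typing import List


def to_csv_line(values: List[str]) -> str:
    """Serialise one row; values containing commas or quotes get quoted,
    and embedded double quotes are doubled (csv default: doublequote=True)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue()


def from_csv_line(line: str) -> List[str]:
    """Parse one row back, undoing the quoting applied by to_csv_line."""
    return next(csv.reader(io.StringIO(line)))


row = ['Smith, "J."', "12.5"]
line = to_csv_line(row)
print(line, end="")          # "Smith, ""J.""",12.5
print(from_csv_line(line))   # ['Smith, "J."', '12.5']
```

The round trip shows that a value containing both a comma and double quotes survives unchanged under this convention.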

    • TAP_METADATA.jpg:
    This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

    NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

    META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

    Revision 2 2009-03-02 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"
    This page contains my comments on the TAP 0.31 spec. First a summary of the main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

    Major points/issues/questions

    • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of service metadata.
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
      1. foreign keys MUST be queryable (though they may not exist of course), therefore added to metadata
      2. indexes MUST be queryable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
      3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
      4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
    • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
      • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
    • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
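
To make metadata point 2 above concrete: a complete index definition needs the member columns and their order, which a single index=true flag on a column cannot carry. The sketch below uses SQLite only as a convenient stand-in; the table names `tap_indexes` and `tap_index_columns`, their columns, and the sample index `ix_sources_radec` are all hypothetical and are not taken from TAP_SCHEMA or from the UML model.

```python
import sqlite3
from typing import List


def build_index_metadata(conn: sqlite3.Connection) -> None:
    """Create two hypothetical metadata tables and one sample index entry."""
    conn.executescript(
        """
        CREATE TABLE tap_indexes (
            index_name TEXT PRIMARY KEY,
            table_name TEXT NOT NULL,
            is_unique  INTEGER NOT NULL
        );
        CREATE TABLE tap_index_columns (
            index_name  TEXT NOT NULL REFERENCES tap_indexes(index_name),
            column_name TEXT NOT NULL,
            position    INTEGER NOT NULL,  -- order of the column within the index
            PRIMARY KEY (index_name, position)
        );
        """
    )
    conn.execute(
        "INSERT INTO tap_indexes VALUES (?, ?, ?)",
        ("ix_sources_radec", "sources", 0),
    )
    # Inserted out of order on purpose: the order comes from `position`.
    conn.executemany(
        "INSERT INTO tap_index_columns VALUES (?, ?, ?)",
        [("ix_sources_radec", "dec", 2), ("ix_sources_radec", "ra", 1)],
    )


def index_columns(conn: sqlite3.Connection, index_name: str) -> List[str]:
    """Return the columns of an index in index order."""
    cur = conn.execute(
        "SELECT column_name FROM tap_index_columns "
        "WHERE index_name = ? ORDER BY position",
        (index_name,),
    )
    return [row[0] for row in cur]
```

A client that learns the index is on (ra, dec) in that order knows that a constraint on ra alone can use it, whereas a constraint on dec alone probably can not; a per-column boolean would report both columns as "indexed" and hide that difference.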

    line-by-line notes/questions/issues

    (s=section,p=page,par=paragraph on page or in section).

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
    Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass it the ADQL, possibly slightly adapted, rather than to write one's own database engine.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.
    • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
      select * from thattable
      , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to indicate explicitly which combinations are valid.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
      • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured even at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • Must all of ISO8601(:2004) be supported?
     
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
       select * from sources where dec between -10 and 10
      looks like a spatial query, but does not require INTERSECTS etc.
    • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go. Is the following query ok for example:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
    • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function can not be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
        select primary, indexed, std from tap_schema.columns
        ? (I guess 2.11 might say something about this)
      • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? Guess this depends much on UWS functionality?
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? Suggest to use the "standard" convention that embedded double quotes should be doubled.
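
The doubling convention suggested here is what Python's standard `csv` module does by default, so it makes a convenient illustration. This is a small sketch only; the helper name `to_csv_line` and the sample values are made up for the example.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row with the csv module's default quoting rules."""
    buf = io.StringIO()
    # Defaults: quote only when needed, and double any embedded double quote.
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue().rstrip("\n")


# A value containing both a comma and double quotes.
original = ["NGC 1234", 'say "hi", then go']
line = to_csv_line(original)
print(line)

# Reading the line back recovers the original values unchanged.
round_trip = next(csv.reader([line]))
```

With this convention a reader can always tell a field separator from a comma inside a value, and a closing quote from an embedded one.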
    • s2.8.1 p21 par 1 "The first data row should give the column name..."
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and can not include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has explicitly for this purpose a closing INFO element in its DATA?
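
The rule proposed in this comment can be written down in a few lines. The following Python sketch is my reading of it, not anything taken from the spec; the function name `limit_rows` and its parameters are hypothetical, and the result set is modelled as a plain list.

```python
def limit_rows(rows, client_limit, service_limit):
    """Apply the proposed truncation rule.

    rows          -- all rows the query would produce (a list, for the sketch)
    client_limit  -- TOP/MAXREC requested by the client, or None if absent
    service_limit -- maximum number of rows the service is willing to return

    Returns (returned_rows, overflow), where overflow is True only when the
    service limit, not the client's own request, cut the result short.
    """
    # First honour what the client asked for; this alone is never an overflow.
    wanted = rows if client_limit is None else rows[:client_limit]
    # Only truncation imposed by the service deserves a warning (INFO) message.
    if len(wanted) > service_limit:
        return wanted[:service_limit], True
    return wanted, False


all_rows = list(range(5000))

# Client asks for 1000, service allows 2000: 1000 rows, no message.
a_rows, a_overflow = limit_rows(all_rows, 1000, 2000)
# Client sets no limit: the service truncates to 2000 and must warn.
b_rows, b_overflow = limit_rows(all_rows, None, 2000)
# Client asks for 3000 but only 1500 rows exist: nothing was cut short.
c_rows, c_overflow = limit_rows(all_rows[:1500], 3000, 2000)
```

Note that no extra row is ever added; the overflow flag alone would drive the warning in the response.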
    • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement them.
    • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119 I think. Second, case-insensitiveness is inconsistent with statements elsewhere in the spec.
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
    • (maybe more later)


    • TAP_METADATA.jpg:
    This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.

    NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.

    META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

    Revision 21 2009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"
    This page contains my comments on the TAP 0.31 spec. First a summary of main points, then a list of more detailed comments with their location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.

    Major points/issues/questions

    • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would prefer always /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of service metadata.
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
      1. foreign keys MUST be queryable (though they may not exist of course), therefore added to metadata
      2. indexes MUST be queryable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
      3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
      4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
    • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
      • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
    • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).

    line-by-line notes/questions/issues

    (s=section,p=page,par=paragraph on page or in section).

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
    Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted, not by writing one's own database engine.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.
    • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if it is properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
      select * from thattable
      , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate.
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statement on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
      • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop pc). Maybe useful to look at the report on different database systems by JVO in Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
       select * from sources where dec between -10 and 10
      looks like a spatial query, but does not require INTERSECTS etc.
    • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go. Is the following query ok for example:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
    • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
  • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
  • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
  • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
  • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows (or less) MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be give, but in the manner described in 2.8.4, using an INFO element.
  • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request.
  • s2.4.8 I don't think MTIME should be used together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate", especially if tables get updated over time. If tables get created and filled in one bulk insert it may be useful to add such information to the table's metadata?
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
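
The nesting I would expect, [[catalog_name.]schema_name.]table_name, can be illustrated with a small Python sketch. The function name `split_table_name` is hypothetical, and the sketch assumes plain identifiers (no quoted names containing dots), which is a simplification of what ADQL allows.

```python
def split_table_name(name):
    """Split a qualified table name into (catalog, schema, table).

    Parts are assigned right to left, so a catalog can only appear when a
    schema is given as well. Missing parts are returned as None.
    Assumption: identifiers are not quoted and contain no dots themselves.
    """
    parts = name.split(".")
    # One to three non-empty parts; 'cat..tab' (empty schema) is rejected.
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError("not of the form [[catalog.]schema.]table: %r" % name)
    padded = [None] * (3 - len(parts)) + parts
    return tuple(padded)


print(split_table_name("cat.sch.tab"))
print(split_table_name("sch.tab"))
print(split_table_name("tab"))
```

Under this form a two-part name is always read as schema.table, never catalog.table, which is the ambiguity the bracket placement in the spec leaves open.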
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But as a consequence the CAST function can not be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e what are valid values for
        select primary, indexed, std from tap_schema.columns
        ? (I guess 2.11 might say something about this)
      • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests?
    Changed: <
    <Guess this depends much on UWS interaction, must check later in doc.>
    >Guess this depends much on UWS functionality? 
    • s2.7 p20 par7 "... any type of file ... do something useful with the file."
    Changed: <
    <Check rest of document whether examples are given of this for other parameters. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example).>
    >I could not find if the document defines such behaviour explicitly. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (and the STC in example). 
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? Suggest to use the "standard" convention that embedded double quotes should be doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name..."
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
    Changed: <
    <
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    >
    >
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is the value of the REQUEST parameter meant?
     
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause?
    Changed: <
    <That clause can only contain constraints on a single table, can not include joins. The tableset table does not exists. A Tableset represents the whole database. A single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnessecary complications.Thsough ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill defined complications?>
    >That clause can only contain constraints on a single table and can not include joins. The tableset table does not exist. A tableset represents the whole database; a single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined properly and likely leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications? 
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys.
    Changed: <
    <The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    >
    >The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely mainly to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." I assume this should read: "...one TABLE element per table..."?
     
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
    Changed: <
    <should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it.That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client
    >
    >should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" (already commented on above) I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client
     might have received if there are no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has, explicitly for this purpose, a closing INFO element in its DATA?
    Changed: <
    <
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    >
    >
    • s2.10 I suppose that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11. [I guess that version 0.4 takes care of part of this.]
     
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement them.
    • s2.20.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate.
    Changed: <
    <Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with other statemennts (though has my preference).>
    >Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with statements elsewhere in the spec. 
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens".
    Changed: <
    <Considering that later in the section special forms of the string parameter are described, it would be good if the BMF would be completed.>
    >Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were complete. 
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should DB understand both 0/1 and false/true?
    Changed: <
    <This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in sql99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned.
    >
    >This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    Added: >
    >
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends only on the query what is returned.
     If a user queries
    select ra, dec ...
    then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
    Changed: <
    <
    >
    >
    • (maybe more later)
     


    • TAP_METADATA.jpg:
    Changed: <
    <This is a JPEG version of a MagicDraw model which is available in UML form here\. NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived form the UML automatically. :>
    >This is a JPEG version of a MagicDraw model which is available in UML form here. In white components that have been taken over unchanged. In orange existing components that have been updated. In purple completely new components. In green a suggestion by Francois Ochsenbein on primary keys and their use in the definition of foreign keys.
    Added: >
    > NB, the original MagicDraw diagram can be obtained from the VO-URP GoogleCode project as well. That project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically. :  TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"

    Revision 20 2009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"
    Changed:
    <
    <
    This page contains my comments on the TAP 0.31 spec. First summary of main points, then list of more detailed comments.
    >
    >
    This page contains my comments on the TAP 0.31 spec
    Added:
    >
    >
    First summary of main points, then list of more detailed comments with location in the text. Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so they may be somewhat repetitive. Some comments may have been made irrelevant by the recent version 0.4.
     

    Major points/issues/questions

    • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL.
    Changed:
    <
    <
    I think /async is so much harder to implement that a /sync-only service should be allowed. But I can imagine if some implementers would prefer always /async for data queries, so propse that to be possible as well. Should probably be a capability, i.e. explicitly mentioned in TAP service metadata.
    >
    >
    I think /async is so much harder to implement that a /sync-only service should be allowed, but I can imagine that some implementers would always prefer /async for data queries. I propose that either (or both) is allowed, and that the choice should be part of the service metadata.
     
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    Changed:
    <
    <
      • how to access metadata
        1. REQUEST=ADQL/Param : query to TAP_SCHEMA tables => tabular result: OK
        2. REQUEST=getTableMetadata => VODataService result(?); OK but Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=xml ?
    >
    >
      1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
      2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
      3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
      4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... );
    Deleted:
    <
    <
    And unclear how the WHERE clause should be implemented. Is it really necessary to bring in this kind of complication? We have ADQL and to lesser extent ParamQuery to allow users any type of flexibility in querying, and is supported at the moment as querying for data itself is supported. The WHERE clause in a getTableSet query is ill defined anyway I think, as it can only talk to one table, and a table set is really the result of querying all TAP_SCHEMA tables.
        1. 2.8.2 Tableset queries +> empty VOTable with no data, multiple tables. Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=votable ? I suppose for this reason Francois has added his primaryLey and foreignKey etc constructs to VOTable? Why not allow the VOTable version of the metadata to be the obvious serialisation to VOTable of the TAP_SCHEMA tables? I.e. what I would get if I executed
          select * from TAP_SCHEMA.schemas
          and
          select * from TAP_SCHEMA.tables
          etc, with the results combined in one VOTable.No alternative representation of the metadata is then required.
        2. 2.8.3 "VOSI-table metadata": seems not to exist in VOSI doc.
      • content of metadata:
        1. foreign keys MUST be queriable (though may not exists ofcourse), therefore added to metadata
        2. indexes MUST be queriable (though may not exists ofcourse) , but MUST NOT be specified simply with an index=true attribute on column metadata
        3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
        4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... );
      maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
    Changed:
    <
    <
    • Grouping of and dependencies between parameters for the different request types should be made explicit.
    >
    >
    • Grouping of and dependencies between HTTP parameters for the different request types should be made explicit.
    Added:
    >
    >
      • Imho, MAXREC and MTIME parameters should not be mixed with ADQL.
     
    • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    Changed:
    <
    <
      • propose that case sensitivity is only an issue for column values, not names of tables and columns etc.
    >
    >
      • Propose that case sensitivity is only an issue for column values, not (never ?) for names of tables and columns etc.
    Deleted:
    <
    <
      • Inconsistent with ParamQuery/WHERE treatment of string fields.
     
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
    Deleted:
    <
    <
      • I would suggest that the ADQL keywords and table/column/function identifiers be case insensitive as ADQL defines.
    • MAXREC and MTIME parameters should not be mixed with ADQL. Overflow should only happen if the service decides not to return all rows that could be retrieved for a query.NOT when a user requested (TOP for ADQL, MAXREC for ParamQuery) maximum is reached.
     

    line-by-line notes/questions/issues

    Changed:
    <
    <
    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf
    >
    >
    (s=section,p=page,par=paragraph on page or in section).
     
    Deleted:
    <
    <
    I know not all sections are normative, nevertheless I am commenting on some of them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days,so may be somewhat repetitive.
     
    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    Changed:
    <
    <
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away form a user whether there is a relational database on the backend or not. In some sense tha fact that one can send ADQL, whihc is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to suport ADQL queries is by storing one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database. Not write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    >
    >
    • s1 end p4: ".. is not visible to users." I don't know whether it is necessarily a good idea to completely abstract away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. They may then also expect, and use, some specific database features such as indexes and foreign keys when writing their queries.
    Also I think if this abstracting-away would translate into a suggestion to potential implementers that they could just as well implement TAP on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing it the ADQL, possibly slightly adapted.
    Added:
    >
    >
    Not write one's own database engine.
     
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    Changed:
    <
    <
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    >
    >
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref]! Maybe such a "meta-specification" would be a good place to put some of the parameter query specification in.
     
    • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    Changed:
    <
    <
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined-functions.
    >
    >
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset as it adds extensions such as user defined functions and of course all the REGION stuff.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database,
    Added:
    >
    >
    even if it is properly supplied with the required user-defined-functions.
     
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly
    Changed:
    <
    <
    on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepts. Same is true for possible dependencies on other not-yet-accepted standards.
    >
    >
    on the database. It likely refers to the usual suspect cone search as the most common use case, but is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is. This would be particularly useful for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards such as VOSI.
     
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced"
    Changed:
    <
    <
    a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso
    >
    >
    a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as
    select * from thattable
    , not advanced at all. But it may lead to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso
     download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    Deleted:
    <
    <
    • skip 1.2 for now, must be matched to normative sections.
     
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. These should be identified; if this is correct, must something be done about it?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database),
    Changed:
    <
    <
    is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL".
    >
    >
    is it possible to change the requirements to something like: "A TAP service MUST support at least one of sync-ADQL and async-ADQL".
     I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    Changed:
    <
    <
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible.
    >
    >
    • s2.1 p9 3rd item in list I would think that table metadata MUST be provided. Without it no queries are possible.
     
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    Changed:
    <
    <
    • s2.4 p11 par2 Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. tabular.
    >
    >
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit indication of which combinations are valid.
     
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not yet an accepted standard (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
      • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and
    Changed:
    <
    <
    column names also Postgres seems to be case insensitive (at least in my default installation on my desk top pc). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even.
    >
    >
    column names also Postgres seems to be case insensitive (at least in my default installation on my desk top pc). Maybe useful to look at report on different database systems by JVO in
    Added:
    >
    >
    Victoria. Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even.
     This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
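The two modes of case insensitivity mentioned above (keywords+schema versus CHAR values, configurable per column) can be tried out on a small scale. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; SQLite is not one of the databases discussed here, and the table and column names are made up for illustration.

```python
import sqlite3


def case_behaviour():
    """Return (rows found via differently-cased identifiers,
               matches on a default text column,
               matches on a COLLATE NOCASE column)."""
    con = sqlite3.connect(":memory:")
    # 'observer' is declared case-insensitive at the column level.
    con.execute(
        "CREATE TABLE Sources (name TEXT, observer TEXT COLLATE NOCASE)"
    )
    con.execute("INSERT INTO Sources VALUES ('Smith', 'Smith')")
    # Keywords and identifiers: case does not matter in SQLite.
    ident_rows = len(con.execute("select NAME from SOURCES").fetchall())
    # Values: the default collation compares case-sensitively.
    default_match = con.execute(
        "SELECT count(*) FROM sources WHERE name = 'smith'"
    ).fetchone()[0]
    # Values in the NOCASE column compare case-insensitively.
    nocase_match = con.execute(
        "SELECT count(*) FROM sources WHERE observer = 'smith'"
    ).fetchone()[0]
    con.close()
    return ident_rows, default_match, nocase_match


if __name__ == "__main__":
    print(case_behaviour())  # (1, 0, 1)
```

A per-column flag such as the isCaseSensitive element proposed above would let a client know which of the last two behaviours to expect.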
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
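The difference between the basic (yyyymmdd) and extended (yyyy-mm-dd) ISO8601 date forms can be illustrated with a small parser that accepts both. This is a minimal sketch using Python's standard `datetime` module; the function name is hypothetical and it covers calendar dates only, not the full range of ISO8601 forms.

```python
from datetime import date, datetime

# Extended form first, then basic form; both are calendar dates in ISO8601.
_DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d")


def parse_iso8601_date(text: str) -> date:
    """Parse a calendar date in extended or basic ISO8601 form."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"not a supported ISO8601 calendar date: {text!r}")


if __name__ == "__main__":
    print(parse_iso8601_date("2008-11-24"))  # 2008-11-24
    print(parse_iso8601_date("20081124"))    # 2008-11-24
```

A service whose backend accepts only one of the two forms could normalise incoming values this way before passing them on.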
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns,
    Changed:
    <
    <
    one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    >
    >
    one MUST implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS and other REGION-like extensions) which one may or may not support. After all
     select * from sources where dec between -10 and 10
    looks like a spatial query, but does not require INTERSECTS etc.
    • s2.4.2 p12 par3 "the extent of STC/S support within the REGION function is left up to the implementation" I can read this as allowing no support for STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values."
    Added:
    >
    >
    I do not understand the reason for this restriction at all. Also noted by Markus Demleitner I think. This seems like a change to the language, which might even require different parsers/interpreters than one would normally implement. How far does this restriction go? Is the following query ok for example:
     
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not? Which language/version is then supposed to be supported? Is this a capability?
    Changed:
    <
    <
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available: TABLEDATA BINARY FITS, LINKs iso DATA?
    >
    >
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly in the document?
    • s2.4.5 p13 list Might it be useful to have an html-table (i.e. starting with <table..> and ending with </table>) as a possible return type? Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    Added:
    >
    >
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available, TABLEDATA, BINARY, FITS, also LINKs iso DATA? (Maybe answered in 2.12?)
     
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    Changed:
    <
    <
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in
    >
    >
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. Useful for ParamQueries though.
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". In my opinion, if a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, MAXREC rows (or fewer) MUST be returned, not MAXREC+1. In particular, enforcing the MAXREC+1 rule would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. ONLY if the service's "maximum permitted value for MAXREC" is reached should an overflow warning be given, but in the manner described in 2.8.4, using an INFO element.
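The overflow behaviour argued for here can be written down as a small decision rule. The following is only a sketch of that proposal, not of what the TAP draft specifies; the function name, the default service limit of 2000 and the warning text are all assumptions made for illustration, and a real service would apply the limit in the database rather than on an in-memory list.

```python
from typing import List, Optional, Sequence, Tuple


def apply_row_limits(
    rows: Sequence,
    requested_max: Optional[int] = None,
    service_max: int = 2000,
) -> Tuple[List, Optional[str]]:
    """Return (rows to send, warning or None).

    A user-requested maximum (TOP/MAXREC) is honoured silently: never an
    extra row, never a message. A warning is produced only when the
    service's own limit is what cut the result short.
    """
    if requested_max is None or requested_max > service_max:
        limit = service_max
        service_limited = True
    else:
        limit = requested_max
        service_limited = False
    returned = list(rows[:limit])
    warning = None
    if service_limited and len(rows) > limit:
        # Would be reported through an INFO element rather than an extra row.
        warning = f"result truncated by service limit of {service_max} rows"
    return returned, warning


if __name__ == "__main__":
    data = list(range(5000))
    out, msg = apply_row_limits(data, requested_max=1000)
    print(len(out), msg)
    out, msg = apply_row_limits(data)
    print(len(out), msg)
```

The first call returns exactly the requested number of rows without any message; the second is truncated by the service and carries the warning.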
    Deleted:
    <
    <
    the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react, and then in the manner described in 2.8.4, using an INFO element.
     
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i,e, using MAXREC) I would not call this a null query, but a null request.
    Changed:
    <
    <
    Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModfied" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate",
    >
    >
    • s2.4.8 I don't think MTIME should be used together with ADQL. If a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, that is an indication that it is not possible to pose this type of query. It might be suggested that in general it is good practice to have such columns, "createDate", "updateDate",
    especially if tables get updated over time. If tables get created and filled in one bulk insert, it may be useful to add such information to the table's metadata?
    Deleted:
    <
    <
    explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
     
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something
    Changed:
    <
    <
    similar was specified in SSA already as well.
    >
    >
    similar was specified in SSA already as well. [I guess it has indeed been removed from version 0.4]
     
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    Changed:
    <
    <
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a
    >
    >
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    Deleted:
    <
    <
    request". Seems to be a SHOULD requirement on clients.
     
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    Changed:
    <
    <
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear.Various comments on the actual metadata prescription. I have a proposal for a datamodel for the TAP_SCHEMA attached to this page (below).
    >
    >
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Some comments on the actual metadata prescription (a summary of the proposal can be inferred from the UML diagram at the bottom of this page):
    Deleted:
    <
    <
    It is a JPEG version of a MagicDraw model which is available in UML form here\. NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas can be derived form the UML automatically.:
     
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this
    Changed:
    <
    <
    table as well.
    >
    >
    table as well. I suggest an extra row, "view_sql", containing the SQL that defines this view (for rows with table_type=view).
    Deleted:
    <
    <
    I.e. I suggest an extra row, "view_sql, the SQL that defines this view (for rows with table_type=view).
     
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService
    Changed:
    <
    <
    starting with Ray's email here. One problem that has been identified there is that ADQL does
    >
    >
    starting with Ray's email here. One problem that has been identified there is that ADQL does
     not define data types explicitly. One reason why it seems not to need them in the language is that DDL is not supported. But this also means the CAST function cannot be supported now. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database?
    Changed:
    <
    <
    I.e what are valid values for
    select primary,.indexed,std from tap_schema.columns
    ?
      • s2.6 Metadata prescription for foreign keys is missing and are very important. See discussions in same
    >
    >
    I.e. what are valid values for
    select primary, indexed, std from tap_schema.columns
    ? (I guess 2.11 might say something about this)
      • s2.6 Metadata prescription for foreign keys is missing but very important. See discussions in same
     mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this, i.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    Changed:
    <
    <
    • s2.6 p19 par2* "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client."
    >
    >
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client."
     I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well, i.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS interaction; must check later in doc.
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." Check rest of document whether examples are given of this for other parameters. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
    Changed:
    <
    <
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First is there a distinction between data rows and other rows?
    >
    >
    • s2.8.1 p21 par 1 "The first data row should give the column name..."
      • First, is there a distinction between data rows and other rows?
     
      • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    • s2.8.2 p21 par2 footnote "a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database. A single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, and others should follow it. That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client would have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter." ? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement them.
    • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate; it does not correspond to the meaning in IETF RFC 2119, I think. Case-insensitiveness is inconsistent with other statements (though it has my preference).
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were completed.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
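The quoting convention suggested in the s2.8.1 comment above (enclose a value in double quotes when needed, and double any embedded double quote) can be illustrated with a minimal Python sketch. It relies on the default behaviour of the standard `csv` module; the helper names `to_csv` and `from_csv` and the sample values are made up for illustration and are not part of the TAP spec.

```python
import csv
import io


def to_csv(header, rows):
    """Serialise a header row and data rows as CSV text.

    csv.writer's defaults quote a value only when needed and double any
    embedded double quote, which is the convention suggested in the comment.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical result: one value holds a comma, another holds double quotes.
header = ["observer", "note"]
rows = [["Smith, J.", 'the "deep" field']]
text = to_csv(header, rows)
print(text)
```

With this convention a value containing both commas and double quotes survives a round trip unchanged, which is the case the comment asks about.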


    Changed:
    <
    <
    • TAP_METADATA.jpg:
      TAP_METADATA.jpg
    >
    >
    • TAP_METADATA.jpg:
    Added:
    >
    >
    This is a JPEG version of a MagicDraw model which is available in UML form here. NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically. : TAP_METADATA.jpg
     
    Changed:
    <
    <
    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"
    >
    >
    META FILEATTACHMENT attr="" comment="" date="1235058275" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="444453" user="GerardLemson" version="1.3"
     

    Revision 192009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"
    Changed:
    <
    <

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31...
    >
    >
    This page contains my comments on the TAP 0.31 spec. First summary of main points, then list of more detailed comments.
    Deleted:
    <
    <
    Others than GerardLemson should not use this page.
     
    Deleted:
    <
    <
     

    Major points/issues/questions

    Changed:
    <
    <
    • /sync vs /async: I would like it to be possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /asyn ADQL.I think /async is so much harder to implement that a /sync only service should be allowed. But I can understand if some implementers would prefer always /async for data queries, so that should be possible as well. Should therefore be a capability.
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived.)
      • comments on how to access metadat
    >
    >
    • /sync vs /async: I think it preferable if it were possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed. But I can imagine that some implementers would always prefer /async for data queries, so I propose that to be possible as well. It should probably be a capability, i.e. explicitly mentioned in TAP service metadata.
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in TAP_SCHEMA model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived. Based partially on discussions on mailing list.)
    Added:
    >
    >
      • how to access metadata
     
        1. REQUEST=ADQL/Param : query to TAP_SCHEMA tables => tabular result: OK
        2. REQUEST=getTableMetadata => VODataService result(?); OK, but unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=xml ? And it is unclear how the WHERE clause should be implemented. Is it really necessary to bring in this kind of complication? We have ADQL, and to a lesser extent ParamQuery, to allow users any type of flexibility in querying, and this is supported as soon as querying for data itself is supported. The WHERE clause in a getTableSet query is ill defined anyway I think, as it can only talk to one table, and a table set is really the result of querying all TAP_SCHEMA tables.
        3. 2.8.2 Tableset queries => empty VOTable with no data, multiple tables. Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=votable ? I suppose for this reason Francois has added his primaryKey and foreignKey etc constructs to VOTable? Why not allow the VOTable version of the metadata to be the obvious serialisation to VOTable of the TAP_SCHEMA tables? I.e. what I would get if I executed
          select * from TAP_SCHEMA.schemas
          and
          select * from TAP_SCHEMA.tables
          etc, with the results combined in one VOTable. No alternative representation of the metadata is then required.
        4. 2.8.3 "VOSI-table metadata": seems not to exist in VOSI doc.
    Changed:
    <
    <
      • comments on content of metadata:
    >
    >
      • content of metadata:
     
        1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
        2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
        3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
        4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
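To make point 2 of the list above concrete, the following is a minimal sketch of how a complete index definition (which columns, in which order) could be stored and queried, as opposed to a single index=true flag per column. It uses an in-memory SQLite database purely for illustration. The table names `tap_indexes` and `tap_index_columns`, their columns, and the sample index are hypothetical and are not taken from the TAP spec or from the data model proposal.

```python
import sqlite3


def build_demo_metadata():
    """Create two hypothetical metadata tables and one two-column index entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tap_indexes (
            index_name TEXT PRIMARY KEY,
            table_name TEXT NOT NULL
        );
        CREATE TABLE tap_index_columns (
            index_name  TEXT NOT NULL REFERENCES tap_indexes(index_name),
            column_name TEXT NOT NULL,
            position    INTEGER NOT NULL,  -- order of the column in the index
            PRIMARY KEY (index_name, position)
        );
    """)
    conn.execute("INSERT INTO tap_indexes VALUES ('ix_obj_ra_dec', 'objects')")
    # Inserted out of order on purpose; 'position' carries the real order.
    conn.executemany(
        "INSERT INTO tap_index_columns VALUES (?, ?, ?)",
        [("ix_obj_ra_dec", "dec", 2), ("ix_obj_ra_dec", "ra", 1)],
    )
    return conn


def index_definitions(conn, table_name):
    """Return {index_name: [columns in index order]} for one table."""
    sql = """
        SELECT i.index_name, c.column_name
          FROM tap_indexes i
          JOIN tap_index_columns c ON c.index_name = i.index_name
         WHERE i.table_name = ?
         ORDER BY i.index_name, c.position
    """
    result = {}
    for index_name, column_name in conn.execute(sql, (table_name,)):
        result.setdefault(index_name, []).append(column_name)
    return result


conn = build_demo_metadata()
print(index_definitions(conn, "objects"))
```

A client seeing that the index covers (ra, dec) in that order can tell that a constraint on ra alone may use the index while a constraint on dec alone probably cannot, which a per-column boolean cannot express.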
    Changed:
    <
    <
    • Case sensitivity:
    >
    >
    • Grouping of and dependencies between parameters for the different request types should be made explicit.
    Added:
    >
    >
    • Case sensitivity: The QUERY parameter is supposed to be case sensitive. Imho this should not be the case.
     
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • propose that case sensitivity is only an issue for column values, not names of tables and columns etc.
      • Inconsistent with ParamQuery/WHERE treatment of string fields.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
      • I would suggest that the ADQL keywords and table/column/function identifiers be case insensitive as ADQL defines.
    Changed:
    <
    <
    • Grouping of and dependencies between parameters for the different request types should be made explicit.
    • Separation of some of the ParamQuery details from into a separate document.
    >
    >
    • MAXREC and MTIME parameters should not be mixed with ADQL. Overflow should only happen if the service decides not to return all rows that could be retrieved for a query, NOT when a user-requested maximum (TOP for ADQL, MAXREC for ParamQuery) is reached.
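The overflow rule proposed in the last point can be written down as a small sketch. This is only an illustration of the behaviour argued for in these comments, not of what the TAP draft prescribes; the function name `apply_limits` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def apply_limits(
    rows: Sequence[tuple],
    requested_max: Optional[int] = None,
    service_max: int = 1000,
) -> Tuple[List[tuple], bool]:
    """Return (rows to send, overflow flag) under the proposed rule.

    requested_max stands for the user's TOP (ADQL) or MAXREC (ParamQuery);
    service_max is the service's own maximum permitted number of rows.
    """
    # The user's own limit is within what the service allows: honour it
    # exactly, with no extra row and no overflow message.
    if requested_max is not None and requested_max <= service_max:
        return list(rows[:requested_max]), False

    # Otherwise the service limit applies. Overflow is flagged only if the
    # user would have received more rows without the service restriction.
    wanted = len(rows) if requested_max is None else min(requested_max, len(rows))
    return list(rows[:service_max]), wanted > service_max


available = [(i,) for i in range(5)]
print(apply_limits(available, requested_max=3, service_max=3))
print(apply_limits(available, requested_max=None, service_max=3))
```

In a real service the overflow flag would translate into a closing INFO element in the VOTable rather than an extra data row.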
    Deleted:
    <
    <
    This could be the "Common elements in the DAL2 family of services" specification.Note, it is very feasible to design a next SIAP spec as follows: 1 Create a data model for SIAP2, for its metadata really.
      1. Map it to a relational data model.
      2. Replace the queryData part from SIAP with the REQUEST=doQuery part in this specialisation of TAP.
      3. (If in the future we develop TAP further, and allow for example BLOB datatypes, one might even implement the getData part as an ADQL query! Requires some thought on serialising BLOBs of course)
      • Note that this (apart from the last step) is exactly the approach originally taken in SNAP, now SimDB+SimDAP. SimDB implements the first three steps, SimDAP the getData part.
      • Note furthermore that this is an immediate answer to Roy Williams' concern that TAP does not support interoperability. It does insofar that the main tasks for new spec can shift to designing the proper common data model. The query protocol is already given !
     
    Changed:
    <
    <
    >
    >

    line-by-line notes/questions/issues

    Deleted:
    <
    <

    line-by-line notes/questions/issues on reading TAP 0.31

     Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf
    Changed:
    <
    <
    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. In red comments that require checking elsewhere in document.
    >
    >
    I know not all sections are normative, nevertheless I am commenting on some of them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. I have been writing comments down while reading the spec over a couple of days, so may be somewhat repetitive.
     
    Changed:
    <
    <
    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might than indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away form a user whether there is a relational database on the backend or not. In some sense tha fact that one can send ADQL, whihc is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to suport ADQL queries is by storing one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database. Not write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
    >
    >
    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice.
    Added:
    >
    >
    The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be at least three ways of querying for table metadata:
     
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    Changed:
    <
    <
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, evenif properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepts. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    >
    >
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    Added:
    >
    >
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification would seem to require that TAP define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has less to do with how "advanced" a use case is than with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
     
    • skip 1.2 for now, must be matched to normative sections.
    Changed:
    <
    <
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    >
    >
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the
    Added:
    >
    >
    service. Should identify those and if correct must something be done about that?
     
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    Changed:
    <
    <
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I know not enough abut the service metadata to say something about there SHOULD. Btw, why does VOSI have anything to say about the structure of a table set. I would think TAP is the place to specify this, and VOSI should adhere to this spec iso theother way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    >
    >
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    Added:
    >
    >
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
     
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make the matching explicit somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    Changed:
    <
    <
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    >
    >
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems
    Added:
    >
    >
    to be inconsistent with the last par on p9, which does not mandate error.
     
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    Changed:
    <
    <
    • s2.4.1 p11 par2, list The statement on getCapabilities, getAvaialability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata?
    >
    >
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not yet an accepted standard (correct?), it might be good (formally necessary) to give TAP's view on what this means explicitly. (Or is this done later?)
      • Why does this spec, which seems to be the correct specification for defining how to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata?
      Actually, there seems to be no tables metadata in VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf,
    Changed:
    <
    <
    is that correct VOSI spec?)
    >
    >
    is that the correct VOSI spec?)
     
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    Changed:
    <
    <
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case0-insensitive. SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names also Postgres seems to be case insensitive (at least in my default installation on my desk top pc). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other database handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    >
    >
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by
    Added:
    >
    >
    default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
     
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
    Changed:
    <
    <
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    >
    >
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs
    Added:
    >
    >
    extended version yyyy-mm-dd.
     
      • An overview of other RDBMSs would be useful.
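To make the basic-versus-extended distinction concrete, here is a minimal Python sketch of a date parser that accepts both the basic yyyymmdd and the extended yyyy-mm-dd calendar-date forms. It is only an illustration of the point above, not something prescribed by TAP or ISO 8601 tooling; the function name `parse_iso8601_date` is hypothetical, and time-of-day, week dates and ordinal dates are deliberately ignored.

```python
import re
from datetime import date, datetime

# Strict patterns: ISO 8601 calendar dates are zero-padded.
_BASIC = re.compile(r"^\d{8}$")                # yyyymmdd
_EXTENDED = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # yyyy-mm-dd


def parse_iso8601_date(text: str) -> date:
    """Parse a calendar date given in basic or extended ISO 8601 form.

    Raises ValueError for anything else, including impossible dates.
    """
    if _EXTENDED.match(text):
        return datetime.strptime(text, "%Y-%m-%d").date()
    if _BASIC.match(text):
        return datetime.strptime(text, "%Y%m%d").date()
    raise ValueError(f"not an ISO 8601 calendar date: {text!r}")
```

A service that forwards date literals unchanged to a backend accepting only one of the two forms could normalise them this way first, e.g. with `parse_iso8601_date(value).isoformat()`, which always yields the extended form.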
    Changed:
    <
    <
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    >
    >
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that
    Added:
    >
    >
    I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
     
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    Changed:
    <
    <
    • *s2.4.5 p13 list* Is it allowed for the VOTable to contain data in all its DATA types available: TABLEDATA BINARY FITS, LINKs iso DATA?
    >
    >
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types: TABLEDATA, BINARY, FITS, or LINKs instead of DATA?
     
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    Changed:
    <
    <
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, taht's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i,e, using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModfied" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    >
    >
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react, and then in the manner described in 2.8.4, using an INFO element.
    Added:
    >
    >
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC, I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to the data model. Maybe there is a place in the metadata of a table for the case that a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
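The following minimal Python sketch shows what "names case-insensitive, values case-sensitive" means in practice, and why it bites for LANG. It is an illustration only: the helper name `normalise_params` is hypothetical, and the parameter names and values in the usage note are just examples in the style of the spec.

```python
from urllib.parse import parse_qsl


def normalise_params(query_string: str) -> dict:
    """Fold parameter NAMES to upper case; leave VALUES exactly as sent.

    If a name is repeated with different casing, the last one wins here;
    a real service might prefer to report an error instead.
    """
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        params[name.upper()] = value
    return params
```

With this reading, `lang=adql` and `LANG=ADQL` address the same parameter but carry different values, so a strictly case-sensitive comparison of the value against "ADQL" would reject the first form.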
     
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ?
    Changed:
    <
    <
    Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    >
    >
    Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to
    Added:
    >
    >
    using the default schema.
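As a sketch of the nesting I have in mind, [[catalog_name.]schema_name.]table_name, the following Python fragment splits a qualified name into its parts. It is an illustration under simplifying assumptions: identifiers are taken to be a letter followed by letters, digits or underscores, delimited (quoted) identifiers are not handled, and the name `split_table_name` is hypothetical.

```python
import re

# Simplified identifier rule (assumption): letter, then letters/digits/underscores.
_IDENT = r"[A-Za-z][A-Za-z0-9_]*"

# [[catalog.]schema.]table : catalog may only appear if schema is present.
_TABLE_NAME = re.compile(
    rf"^(?:(?:(?P<catalog>{_IDENT})\.)?(?P<schema>{_IDENT})\.)?(?P<table>{_IDENT})$"
)


def split_table_name(name: str):
    """Return (catalog, schema, table); missing qualifiers are None."""
    match = _TABLE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid qualified table name: {name!r}")
    return match.group("catalog"), match.group("schema"), match.group("table")
```

Under this grammar the SQLServer-style `catalog..table` form is rejected, which matches my reading of ADQL.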
     
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Various comments on the actual metadata prescription follow. I have a proposal for a data model for the TAP_SCHEMA attached to this page (below). It is a JPEG version of a MagicDraw model which is available in UML form here.
    NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
    Changed:
    <
    <
      • second table IN third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL(ADQL?) that defines this view in this table as well.
    >
    >
      • second table IN third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL(ADQL?) that defines this view in this
    Added:
    >
    >
    table as well.
     I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDL is not supported. But as a consequence the CAST function can not be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
        select primary, indexed, std from tap_schema.columns
        ?
      • s2.6 A metadata prescription for foreign keys is missing, and these are very important. See discussions in the same mail thread starting here. A proposal for a model is again given in the diagram below. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
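To illustrate why a boolean "indexed" flag on a column loses information, here is a minimal Python sketch of index metadata that keeps the ordered column list. The class and function names (`IndexDefinition`, `has_leading_column`, `columns_flagged_indexed`) and the example table are hypothetical; they are not taken from TAP_SCHEMA or from the data model attached below.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class IndexDefinition:
    """One index: the table it belongs to and its columns IN ORDER."""
    table_name: str
    index_name: str
    columns: Tuple[str, ...]


def has_leading_column(index: IndexDefinition, column: str) -> bool:
    """True if `column` is the first column of the index.

    For a typical B-tree index only a constraint on the leading
    column(s) can use the index efficiently.
    """
    return bool(index.columns) and index.columns[0] == column


def columns_flagged_indexed(indexes: Iterable[IndexDefinition]) -> Set[str]:
    """What a per-column boolean flag would report: any column in any index."""
    return {col for idx in indexes for col in idx.columns}
```

For a composite index on (ra, dec), the flag view marks both columns as indexed, while only a constraint on ra can be expected to benefit; the full definition makes that distinction queryable.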
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any more indication of what standard
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? Guess this depends much on UWS interaction, must check later in doc.
    Changed:
    <
    <
    • *s2.7 p20 par7* "... any type of file ... do something useful with the file."
    >
    >
    • s2.7 p20 par7 "... any type of file ... do something useful with the file."
     Check rest of document whether examples are given of this for other parameters. Eg REGION (for STC mask upload?).
    Changed:
    <
    <
    Otherwise better to remove mention of this (also STC in example).
    >
    >
    Otherwise better to remove mention of this (also STC in example).
     
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First is there a distinction between data rows and other rows?
      • Second, can we make this a MUST? What if all returned columns are strings and we can not be sure whether the first row contains column names?
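The quote-doubling convention mentioned for s2.8.1 is what Python's standard csv module does by default, so a small sketch can show it. This is only an illustration of the convention, not a statement about what TAP mandates; the helper name `to_csv_line` is hypothetical.

```python
import csv
import io


def to_csv_line(values) -> str:
    """Serialise one row; fields containing a comma or a double quote are
    enclosed in double quotes, and embedded double quotes are doubled."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buffer.getvalue()


def from_csv_line(line: str) -> list:
    """Parse one row written with the same convention."""
    return next(csv.reader(io.StringIO(line)))
```

A value containing both a comma and double quotes survives a round trip through these two functions unchanged, which is the property one would want the spec to guarantee.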
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    • s2.8.2 p21 par2 footnote "a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause?
    Changed:
    <
    <
    That clause can only contain constraints on a single table, can not include joins. The tableset table does not exists. A Tableset represents the whole database. A single WHERE clause can not query that.
    >
    >
    That clause can only contain constraints on a single table and can not include joins. The tableset table does not exist. A Tableset represents the whole database.
    Added:
    >
    >
    A single WHERE clause can not query that.
     I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]"
    Changed:
    <
    <
    I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
    >
    >
    I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
     should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it.
    Changed:
    <
    <
    That includes the VODataServices spec. TBD was mentioned earlier, don't repeat.
    >
    >
    That includes the VODataServices spec. This comment is a duplicate of one above, but still relevant.
     
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client would have received had there been no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
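The overflow rule proposed here can be written down as a small Python sketch. It reflects my proposal, not the behaviour described in the TAP draft; the function name `apply_row_limit` and its arguments are hypothetical, and how the truncation flag is turned into an INFO element is left out.

```python
from typing import List, Optional, Sequence, Tuple


def apply_row_limit(
    rows: Sequence, requested: Optional[int], service_max: int
) -> Tuple[List, bool]:
    """Return (rows_to_send, truncated_by_service).

    requested   : the client's TOP/MAXREC value, or None if not given
    service_max : the service's own maximum number of rows
    """
    if requested is not None and requested <= service_max:
        # The client set its own limit within what the service allows:
        # honour it exactly, with no extra row and no message.
        return list(rows[:requested]), False
    # No client limit, or one above the service maximum: the service cap
    # applies, and a warning is due only if rows were actually dropped.
    wanted = len(rows) if requested is None else min(requested, len(rows))
    return list(rows[:service_max]), wanted > service_max
```

Only the second return value being True would trigger a truncation message; a client asking for 1000 rows and receiving 1000 rows never sees one.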
    Changed:
    <
    <
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    >
    >
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section,
    Added:
    >
    >
    as well as section 2.4.11.
     
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement them.
    • s2.10.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First, I guess that the boldface of the must here is inappropriate; it does not correspond to the meaning in IETF RFC 2119, I think. Second, case-insensitivity is inconsistent with other statements (though it has my preference).
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were completed.
    • s2.11 p30-31 par1 How should one query a database that declares that it has a boolean column? Should the DB understand both 0/1 and false/true? This may be a task for ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.

    • TAP_METADATA.jpg:
      TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"

    Revision 18 2009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.

    Major points/issues/questions

    • /sync vs /async: I would like it to be possible to make a choice for implementing /sync and/or /async and not mandate both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed. But I can understand if some implementers would prefer always /async for data queries, so that should be possible as well. This should therefore be a capability.
    Changed:
    <
    <
    • Metadata:
    >
    >
    • Metadata: (at the bottom of this page a proposal for a UML data model containing all contents already in model and extra. From it XML schema and TAP_SCHEMA tables can be easily derived.)
     
      • comments on how to access metadata
        1. REQUEST=ADQL/Param : query to TAP_SCHEMA tables => tabular result: OK
        2. REQUEST=getTableMetadata => VODataService result(?); OK, but unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=xml ? And it is unclear how the WHERE clause should be implemented. Is it really necessary to bring in this kind of complication? We have ADQL and, to a lesser extent, ParamQuery to allow users any type of flexibility in querying, and this is supported the moment querying for data itself is supported. The WHERE clause in a getTableSet query is ill defined anyway I think, as it can only talk to one table, and a table set is really the result of querying all TAP_SCHEMA tables.
        3. 2.8.2 Tableset queries => empty VOTable with no data, multiple tables. Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=votable ? I suppose for this reason Francois has added his primaryKey and foreignKey etc constructs to VOTable? Why not allow the VOTable version of the metadata to be the obvious serialisation to VOTable of the TAP_SCHEMA tables? I.e. what I would get if I executed
          select * from TAP_SCHEMA.schemas
          and
          select * from TAP_SCHEMA.tables
          etc, with the results combined in one VOTable. No alternative representation of the metadata is then required.
        4. 2.8.3 "VOSI-table metadata": seems not to exist in VOSI doc.
      • comments on content of metadata:
        1. foreign keys MUST be queryable (though they may not exist of course), therefore added to metadata
        2. indexes MUST be queryable (though they may not exist of course), but MUST NOT be specified simply with an index=true attribute on column metadata
        3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
        4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
    • Case sensitivity:
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • propose that case sensitivity is only an issue for column values, not names of tables and columns etc.
      • Inconsistent with ParamQuery/WHERE treatment of string fields.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
      • I would suggest that the ADQL keywords and table/column/function identifiers be case insensitive as ADQL defines.
    • Grouping of and dependencies between parameters for the different request types should be made explicit.
    • Separation of some of the ParamQuery details into a separate document. This could be the "Common elements in the DAL2 family of services" specification. Note, it is very feasible to design a next SIAP spec as follows:
      1. Create a data model for SIAP2, for its metadata really.
      2. Map it to a relational data model.
      3. Replace the queryData part from SIAP with the REQUEST=doQuery part in this specialisation of TAP.
      4. (If in the future we develop TAP further, and allow for example BLOB datatypes, one might even implement the getData part as an ADQL query! Requires some thought on serialising BLOBs of course.)
      • Note that this (apart from the last step) is exactly the approach originally taken in SNAP, now SimDB+SimDAP. SimDB implements the first three steps, SimDAP the getData part.
      • Note furthermore that this is an immediate answer to Roy Williams' concern that TAP does not support interoperability. It does, insofar as the main task for a new spec can shift to designing the proper common data model. The query protocol is already given!
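A minimal Python sketch of the serialisation suggested under "Metadata" above, in which the VOTable form of the metadata is simply the result sets of the TAP_SCHEMA queries combined in one document. The function name, the sample rows and the choice to write every column as a variable-length char FIELD are assumptions made for illustration; XML namespaces and the remaining TAP_SCHEMA tables are left out.

```python
import xml.etree.ElementTree as ET


def tap_schema_to_votable(tables):
    """Combine several result sets into one VOTable, one TABLE element each.

    `tables` maps a table name to a (column_names, rows) pair. All values
    are written as character data to keep the sketch short.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE", name="TAP_SCHEMA")
    for table_name, (columns, rows) in tables.items():
        table = ET.SubElement(resource, "TABLE", name=table_name)
        for column in columns:
            ET.SubElement(table, "FIELD", name=column,
                          datatype="char", arraysize="*")
        data = ET.SubElement(table, "DATA")
        tabledata = ET.SubElement(data, "TABLEDATA")
        for row in rows:
            tr = ET.SubElement(tabledata, "TR")
            for value in row:
                # Empty TD stands in for a null value in this sketch.
                ET.SubElement(tr, "TD").text = "" if value is None else str(value)
    return votable


# Hypothetical content, standing in for the results of
# "select * from TAP_SCHEMA.schemas" and "select * from TAP_SCHEMA.tables".
sample = {
    "TAP_SCHEMA.schemas": (["schema_name"], [["TAP_SCHEMA"], ["public"]]),
    "TAP_SCHEMA.tables": (["schema_name", "table_name", "table_type"],
                          [["public", "public.galaxies", "table"],
                           ["public", "public.bright", "view"]]),
}
xml_text = ET.tostring(tap_schema_to_votable(sample), encoding="unicode")
```

With this layout, a new kind of metadata (indexes, foreign keys) only adds another TABLE element and needs no change to the VOTable format itself.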

    line-by-line notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. In red comments that require checking elsewhere in document.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think that if this carried through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has less to do with how "advanced" a use case is than with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. We should identify those; and if this is correct, must something be done about it?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no tables metadata in VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured even at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support?
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its available DATA types: TABLEDATA, BINARY, FITS, LINKs instead of DATA?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. I have a proposal for a data model for the TAP_SCHEMA attached to this page (below). It is a JPEG version of a MagicDraw model which is available in UML form here. NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically. Various comments on the actual metadata prescription:
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But also the CAST function can now not be supported. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e what are valid values for
        select primary, indexed, std from tap_schema.columns
        ?
      • s2.6 Metadata prescription for foreign keys is missing, and these are very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? Guess this depends much on UWS interaction, must check later in doc.
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." Check rest of document whether examples are given of this for other parameters, e.g. REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? Suggest to use the "standard" that embedded double quotes should be doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table, can not include joins. The tableset table does not exist. A tableset represents the whole database. A single WHERE clause can not query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it. That includes the VODataServices spec. TBD was mentioned earlier, don't repeat.
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client might have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not even in that case add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used in SSA, for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter." ? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
    • s2.20.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfaced-ness of the must here is inappropriate. Does not correspond to meaning in IETF RFC 2119 I think. Case-insensitiveness is inconsistent with other statements (though has my preference).
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were completed.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a task for ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
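The row-limit behaviour argued for in the s2.4.7 and s2.8.5 comments above can be sketched as follows. This is only an illustration in Python of that proposal, not of what the TAP 0.31 text prescribes; the function name `limit_rows` and the parameter `service_max` (the service's own maximum permitted row count) are hypothetical.

```python
def limit_rows(rows, requested=None, service_max=1000):
    """Apply a client limit (TOP/MAXREC) and a service limit to a result.

    Returns (rows_to_return, truncated_by_service). The flag is True only
    when the service's own limit cut the result short; in that case a
    service would add an INFO message, never an extra row.
    """
    rows = list(rows)
    if requested is not None and requested <= service_max:
        # The client asked for at most `requested` rows and the service
        # allows that many: return exactly that, with no overflow notice.
        return rows[:requested], False
    # No client limit, or one above what the service permits:
    # the service limit applies and truncation must be reported.
    truncated = len(rows) > service_max
    return rows[:service_max], truncated
```

For example, a client asking for 1000 rows from a table of 2000 gets exactly 1000 rows and no message, while a client asking for 5000 rows from a service capped at 1000 gets 1000 rows plus the truncation flag.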


    • TAP_METADATA.jpg:
      TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"

    Revision 17 2009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.
    Changed:
    <
    <

    Major points/issues/uqestions

    >
    >
    Added:
    >
    >

    Major points/issues/questions

    • /sync vs /async: I would like it to be possible to choose between implementing /sync and/or /async, rather than mandating both /sync and /async ADQL. I think /async is so much harder to implement that a /sync-only service should be allowed. But I can understand if some implementers would prefer always /async for data queries, so that should be possible as well. This should therefore be a capability.
     
    • Metadata:
      • comments on how to access metadata
        1. REQUEST=ADQL/Param : query to TAP_SCHEMA tables => tabular result: OK
        2. REQUEST=getTableMetadata => VODataService result(?); OK, but unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=xml ? And it is unclear how the WHERE clause should be implemented. Is it really necessary to bring in this kind of complication? We have ADQL and, to a lesser extent, ParamQuery to allow users any type of flexibility in querying, and this is supported the moment querying for data itself is supported. The WHERE clause in a getTableSet query is ill defined anyway I think, as it can only talk to one table, and a table set is really the result of querying all TAP_SCHEMA tables.
        3. 2.8.2 Tableset queries => empty VOTable with no data, multiple tables. Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=votable ? I suppose for this reason Francois has added his primaryKey and foreignKey etc constructs to VOTable?
    Changed:
    <
    <
    Why not allow the VOTable version of the metadata to be the ibvious serialisation to VOTable of the TAP_SCHEMA tables?
    >
    >
    Why not allow the VOTable version of the metadata to be the obvious serialisation to VOTable of the TAP_SCHEMA tables?
     I.e. what I would get if I executed
    select * from TAP_SCHEMA.schemas
    and
    select * from TAP_SCHEMA.tables
    etc., with the results combined in one VOTable. No alternative representation of the metadata is then required.
        1. 2.8.3 "VOSI-table metadata": seems not to exist in VOSI doc.
      • comments on content of metadata:
        1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
        2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
        3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
        4. IF UDFs are really part of ADQL, metadata about them MUST be queriable (though ... ); maybe here also the standard functions such as INTERSECTS etc should then be specified IF they are supported.
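The suggestion above, that the VOTable form of the metadata could simply be the serialisation of the TAP_SCHEMA tables, can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the service database, the table and column names are hypothetical stand-ins (SQLite has no schemas, so the TAP_SCHEMA prefix is folded into the table name), every FIELD is declared as a string, and the output is a bare skeleton rather than a schema-valid VOTable.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db():
    """Create hypothetical, minimal stand-ins for two TAP_SCHEMA tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tap_schema_schemas (schema_name TEXT, description TEXT);
        CREATE TABLE tap_schema_tables (schema_name TEXT, table_name TEXT, table_type TEXT);
        INSERT INTO tap_schema_schemas VALUES ('demo', 'illustrative schema');
        INSERT INTO tap_schema_tables VALUES ('demo', 'demo.halo', 'table');
        INSERT INTO tap_schema_tables VALUES ('demo', 'demo.galaxy', 'view');
    """)
    return db


def metadata_votable(db, table_names):
    """Run 'select * from <table>' per name; each result becomes one TABLE element."""
    votable = ET.Element("VOTABLE")
    resource = ET.SubElement(votable, "RESOURCE")
    for name in table_names:
        cursor = db.execute(f"SELECT * FROM {name}")
        table = ET.SubElement(resource, "TABLE", name=name)
        for column in cursor.description:
            # Simplification: every column is declared as a variable-length string.
            ET.SubElement(table, "FIELD", name=column[0], datatype="char", arraysize="*")
        tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
        for row in cursor:
            tr = ET.SubElement(tabledata, "TR")
            for value in row:
                ET.SubElement(tr, "TD").text = str(value)
    return votable


doc = metadata_votable(build_demo_db(), ["tap_schema_schemas", "tap_schema_tables"])
xml_text = ET.tostring(doc, encoding="unicode")
```

The point of the sketch is that no second representation of the metadata is needed: whatever is added to the metadata tables shows up in the combined document without any change to the VOTable format.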
    • Case sensitivity:
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
    Changed:
    <
    <
      • propose that case sensitivity is only an issue for column values.
    >
    >
      • propose that case sensitivity is only an issue for column values, not names of tables and columns etc.
    Added:
    >
    >
      • Inconsistent with ParamQuery/WHERE treatment of string fields.
     
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
      • I would suggest that the ADQL keywords and table/column/function identifiers be case insensitive as ADQL defines.
    Changed:
    <
    <
    • Separation and dependencies between parameters for the different request types should be made clear.
    >
    >
    • Grouping of and dependencies between parameters for the different request types should be made explicit.
     
    • Separation of some of the ParamQuery details into a separate document. This could be the "Common elements in the DAL2 family of services" specification. Note, it is very feasible to design a next SIAP spec as follows:
      1. Create a data model for SIAP2, for its metadata really.
      2. Map it to a relational data model.
      3. Replace the queryData part from SIAP with the REQUEST=doQuery part in this specialisation of TAP.
      4. (If in the future we develop TAP further, and allow for example BLOB datatypes, one might even implement the getData part as an ADQL query! This requires some thought on serialising BLOBs, of course.)
    Changed:
    <
    <
      • Note that this (apart from the last step) is exactly he approach originally taken in SNAP, no SimDB+!SimDAP! SimDB implements the first three steps. SimDAP the getData part.
    >
    >
      • Note that this (apart from the last step) is exactly the approach originally taken in SNAP, now SimDB+SimDAP. SimDB implements the first three steps, SimDAP the getData part.
     
      • Note furthermore that this is an immediate answer to Roy Williams' concern that TAP does not support interoperability. It does, insofar as the main task for a new spec can shift to designing the proper common data model. The query protocol is already given!

    line-by-line notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. In red comments that require checking elsewhere in document.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also, I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification would seem to require that TAP define its view of what UWS is, for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not so much to do with how "advanced" a use case is as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Other queries, in contrast, make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. We should identify those; if this is correct, must something be done about it?
    Changed:
    <
    <
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    >
    >
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database),
    Added:
    >
    >
    is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
     
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so? Otherwise, what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make an explicit matching somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no tables metadata in VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
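The two modes of case insensitivity distinguished above (keywords and identifiers versus CHAR values) can be seen side by side in a small experiment. The sketch below uses SQLite purely as a convenient stand-in; it says nothing about SQLServer or Postgres defaults, and the table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE observation (
        observer_cs TEXT,                 -- default (binary) collation: case sensitive
        observer_ci TEXT COLLATE NOCASE   -- case insensitive, set at the column level
    );
    INSERT INTO observation VALUES ('Smith', 'Smith');
""")

# Mode 1: keywords and table/column identifiers. SQLite ignores their case.
identifier_rows = db.execute("SeLeCt OBSERVER_CS FrOm Observation").fetchall()

# Mode 2: values of character columns. The outcome depends on the column's collation.
sensitive_rows = db.execute(
    "SELECT * FROM observation WHERE observer_cs = 'SMITH'").fetchall()
insensitive_rows = db.execute(
    "SELECT * FROM observation WHERE observer_ci = 'SMITH'").fetchall()
```

Here the first query succeeds despite the mixed case, the second matches nothing, and the third matches the row. A per-column flag such as the isCaseSensitive element suggested above would let a client predict the difference between the last two.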
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    Changed:
    <
    <
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available: TABLEDATA BINARY FITS, LINKs iso DATA?
    >
    >
    • *s2.4.5 p13 list* Is it allowed for the VOTable to contain data in all its DATA types available: TABLEDATA BINARY FITS, LINKs iso DATA?
     
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
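The two parameter rules quoted above (names not case sensitive, values case sensitive; parameters should not be repeated) could be handled on the service side roughly as below. This is a sketch only: the function name is hypothetical, and keeping the first occurrence of a repeated parameter is my assumption, since the quoted text does not say what a service should do with repeats.

```python
from urllib.parse import parse_qsl


def normalise_parameters(query_string):
    """Upper-case parameter names, leave values untouched, and collect repeated names."""
    params, repeated = {}, []
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()
        if key in params:
            # Assumption: the first occurrence wins; the repeat is only reported.
            repeated.append(key)
        else:
            params[key] = value
    return params, repeated


params, repeated = normalise_parameters("request=doQuery&Lang=ADQL&LANG=adql")
```

Note that "ADQL" and "adql" remain distinct values here, which is exactly where the conflict with the LANG requirement in 2.4.4 shows up.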
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
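To make the difference between the two bracketings concrete, here is a minimal sketch that accepts only names of the form [[catalog_name.]schema_name.]table_name, so that a catalog can never appear without a schema. The function name is hypothetical, and the sketch deliberately ignores delimited (quoted) identifiers that contain dots.

```python
def split_table_name(qualified_name):
    """Split [[catalog_name.]schema_name.]table_name into (catalog, schema, table)."""
    parts = qualified_name.split(".")
    # Reject more than three parts and empty parts such as 'catalog..table'.
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"not of the form [[catalog.]schema.]table: {qualified_name!r}")
    # Pad on the left, so a missing catalog or schema becomes None.
    catalog, schema, table = [None] * (3 - len(parts)) + parts
    return catalog, schema, table
```

With this reading, "schema.table" yields (None, "schema", "table"), while the SQLServer-style "catalog..table" is rejected.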
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Various comments on the actual metadata prescription follow. I have a proposal for a data model for the TAP_SCHEMA attached to this page (below). It is a JPEG version of a MagicDraw model which is available in UML form here.
    NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically:
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e what are valid values for
        select primary, indexed, std from tap_schema.columns
        ?
      • s2.6 A metadata prescription for foreign keys is missing; these are very important. See discussions in same mail thread starting here. A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
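To illustrate why a single "indexed" flag per column is not enough, here is a minimal sketch of the two extra tables mentioned above: one row per index, and one row per column in an index together with its position. All table and column names are hypothetical and are not taken from the attached data model; SQLite is used only to make the sketch runnable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One row per index (hypothetical table, not part of TAP 0.31).
    CREATE TABLE indexes (index_name TEXT PRIMARY KEY, table_name TEXT, is_unique INTEGER);
    -- One row per column in an index; 'position' records the column order.
    CREATE TABLE index_columns (index_name TEXT, column_name TEXT, position INTEGER);
    INSERT INTO indexes VALUES ('ix_halo_snap_mass', 'demo.halo', 0);
    INSERT INTO index_columns VALUES ('ix_halo_snap_mass', 'mass', 2);
    INSERT INTO index_columns VALUES ('ix_halo_snap_mass', 'snapnum', 1);
""")


def index_definition(db, table_name):
    """Return {index_name: [column names in index order]} for one table."""
    definition = {}
    rows = db.execute(
        """SELECT i.index_name, c.column_name
             FROM indexes i JOIN index_columns c ON c.index_name = i.index_name
            WHERE i.table_name = ?
            ORDER BY i.index_name, c.position""",
        (table_name,))
    for index_name, column_name in rows:
        definition.setdefault(index_name, []).append(column_name)
    return definition


halo_indexes = index_definition(db, "demo.halo")
```

With indexed=true on both columns a user could not tell that the index leads with snapnum, which is what determines whether a constraint on mass alone can use it. Foreign keys could be described with an analogous pair of tables.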
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on the UWS interaction; must check later in doc.
    • *s2.7 p20 par7* "... any type of file ... do something useful with the file." Check rest of document whether examples are given of this for other parameters. Eg REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? Suggest using the "standard" convention that embedded double quotes are doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First, is there a distinction between data rows and other rows?
      • Second, can we make this a MUST? What if all returned columns are strings and we cannot be sure whether the first row contains column names?
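The quoting convention suggested above (enclose a value in double quotes when needed, and double any embedded double quote) together with a mandatory header row is what Python's standard csv module produces by default, so it can serve as a quick check of the idea. The function name and the sample values are made up for illustration.

```python
import csv
import io


def to_csv(column_names, rows):
    """Write a header row followed by the data rows, quoting only where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(column_names)   # first row always carries the column names
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv(["name", "note"], [["NGC 1", 'bright, "blue" object']])
# Reading the text back recovers the original values, commas and quotes included.
parsed = list(csv.reader(io.StringIO(text)))
```

The value containing both a comma and double quotes is written as "bright, ""blue"" object" and survives the round trip unchanged.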
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    • s2.8.2 p21 par2 footnote "a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause? That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database. A single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnecessary complications. Through ADQL users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]"
    Changed:
    <
    <
    I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
    >
    >
    I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata,
     should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata, others should follow it.
    Changed:
    <
    <
    That includes the VODataServices spec.
    >
    >
    That includes the VODataServices spec. TBD was mentioned earlier, don't repeat.
     
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client would have received had the service set no restrictions. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. Even in that case I would not add an extra row. The info message should be explicit and sufficient. I believe VOTable 1.2 has a closing INFO element in its DATA explicitly for this purpose?
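The overflow behaviour argued for above can be stated as a small decision rule. The sketch below is my reading of it, not anything prescribed by the spec: the function and parameter names are hypothetical, the full result is held in a list only to keep the example short, and how the truncation flag is turned into an INFO message is left out.

```python
def fetch_with_limit(rows, requested=None, service_max=1000):
    """Return (returned_rows, truncated_by_service).

    rows        -- the complete query result (a list, for this sketch only)
    requested   -- the client's TOP/MAXREC value, or None if not given
    service_max -- the service's maximum permitted number of rows
    """
    if requested is not None and requested <= service_max:
        # The client's own limit applies: at most that many rows, never a message.
        return rows[:requested], False
    # The service limit applies: truncate, and report it only if rows were dropped.
    return rows[:service_max], len(rows) > service_max
```

For example, a request for 5 rows out of 10 returns exactly 5 rows and no warning, while a request without any limit against a service maximum of 8 returns 8 rows and sets the truncation flag. No extra row is returned in either case.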
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used, in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
    Added:
    >
    >
    • s2.10.3 p27 par1 "The must implement a SELECT parameter" I suppose this should be "The service MUST support a SELECT parameter."? As ParamQuery is optional, a TAP service must accept SELECT parameters without error, but need not implement it.
    • s2.20.5 p28 par4 "the field “observer” must contain the case insensitive substring “smith”" First I guess that the boldfacing of the must here is inappropriate. It does not correspond to the meaning in IETF RFC 2119, I think. Case-insensitivity is inconsistent with other statements (though it has my preference).
    • s2.10.5 p29 par1 "... not attempted to detail the BNF for the numeric, string, and date tokens". Considering that later in the section special forms of the string parameter are described, it would be good if the BNF were completed.
    • s2.11 p30-31 par1 How should one query a database that declares to have a boolean column? Should the DB understand both 0/1 and false/true? This may be a charge to ADQL parsers/transformers. Could it be a capability for a boolean column? Note that boolean does not exist in SQL92, and in SQL99 has values true and false (and null).
    • s2.12 p33 par2 "then the output may also use multiple columns". I would think it depends on the query pure and simple what is returned. If a user queries
      select ra, dec ...
      then the service MUST return an ra and a dec column.
    • s2.12 p33 par3 "and may be aggregated with the VOTable GROUP construct" I would think this is quite difficult to do correctly, and easy to do wrong, especially for ADQL queries. It requires a parser to understand a query in great detail, more than we might expect from the off-the-shelf parsers that will be written. And is it necessary? When a user submits a query, (s)he is assumed to understand the schema and the query and understand how things belong together.
     
    Added:
    >
    >
     


    • TAP_METADATA.jpg:
      TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"

    Revision 162009-02-19 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.
    Changed:
    <
    <

    Notes/questions/issues on reading TAP 0.31

    >
    >
    Added:
    >
    >

    Major points/issues/uqestions

    • Metadata:
      • comments on how to access metadata
        1. REQUEST=ADQL/Param : query to TAP_SCHEMA tables => tabular result: OK
        2. REQUEST=getTableMetadata => VODataService result(?); OK, but unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=xml? And it is unclear how the WHERE clause should be implemented. Is it really necessary to bring in this kind of complication? We have ADQL and, to a lesser extent, ParamQuery to allow users any type of flexibility in querying, and querying the metadata this way is supported as soon as querying for data itself is supported. The WHERE clause in a getTableSet query is ill defined anyway I think, as it can only talk to one table, and a table set is really the result of querying all TAP_SCHEMA tables.
        3. 2.8.2 Tableset queries => empty VOTable with no data, multiple tables. Unclear how to actually query for this. Is it REQUEST=getTableMetadata&FORMAT=votable? I suppose for this reason Francois has added his primaryKey and foreignKey etc. constructs to VOTable? Why not allow the VOTable version of the metadata to be the ibvious serialisation to VOTable of the TAP_SCHEMA tables? I.e. what I would get if I executed
          select * from TAP_SCHEMA.schemas
          and
          select * from TAP_SCHEMA.tables
          etc., with the results combined in one VOTable. No alternative representation of the metadata is then required.
        4. 2.8.3 "VOSI-table metadata": seems not to exist in VOSI doc.
      • comments on content of metadata:
        1. foreign keys MUST be queryable (though they may not exist, of course), therefore added to metadata
        2. indexes MUST be queryable (though they may not exist, of course), but MUST NOT be specified simply with an index=true attribute on column metadata
        3. "SQL type" SHOULD (MUST?) be added as possible data type to column metadata.
        4. IF UDFs are really part of ADQL, metadata about them MUST be queryable (though ... ); maybe here also the standard functions such as INTERSECTS etc. should then be specified IF they are supported.
    • Case sensitivity:
      • ADQL is case insensitive. So are some major online databases (SDSS, Millennium, others?). So are many default settings on relational databases.
      • propose that case sensitivity is only an issue for column values.
      • Propose to make this a capability, possibly can be added at level of complete database, or schema, or table, or even column level. It is only relevant for (VAR)CHAR columns, maybe the T and Z in iso8601 dates(?).
      • I would suggest that the ADQL keywords and table/column/function identifiers be case insensitive as ADQL defines.
    • Separation and dependencies between parameters for the different request types should be made clear.
    • Separation of some of the ParamQuery details into a separate document. This could be the "Common elements in the DAL2 family of services" specification. Note, it is very feasible to design a next SIAP spec as follows:
      1. Create a data model for SIAP2, for its metadata really.
      2. Map it to a relational data model.
      3. Replace the queryData part from SIAP with the REQUEST=doQuery part in this specialisation of TAP.
      4. (If in the future we develop TAP further, and allow for example BLOB datatypes, one might even implement the getData part as an ADQL query! This requires some thought on serialising BLOBs, of course.)
      • Note that this (apart from the last step) is exactly the approach originally taken in SNAP, now SimDB+SimDAP. SimDB implements the first three steps, SimDAP the getData part.
      • Note furthermore that this is an immediate answer to Roy Williams' concern that TAP does not support interoperability. It does, insofar as the main task for a new spec can shift to designing the proper common data model. The query protocol is already given!
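
    The two modes of case sensitivity mentioned under "Case sensitivity" above (keywords and identifiers versus column values) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 purely as a stand-in database; the table Galaxy and its columns are hypothetical, and other RDBMSs will have different defaults.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Galaxy ("
    " name TEXT,"                  # default BINARY collation: case-sensitive values
    " band TEXT COLLATE NOCASE)"   # per-column case-insensitive values
)
conn.execute("INSERT INTO Galaxy VALUES ('M31', 'R')")

# Mode 1: keywords and identifiers. SQLite resolves them case-insensitively.
identifiers_ok = conn.execute("select NAME from galaxy").fetchall()

# Mode 2: column values. The outcome depends on the collation of the column.
sensitive_hit = conn.execute(
    "SELECT name FROM Galaxy WHERE name = 'm31'"
).fetchall()
insensitive_hit = conn.execute(
    "SELECT name FROM Galaxy WHERE band = 'r'"
).fetchall()
```

    Here the identifier lookup succeeds regardless of case, the comparison on name finds nothing, and the comparison on band matches. This is the kind of per-column behaviour that a capability or a column-level metadata flag would have to describe.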

    line-by-line notes/questions/issues on reading TAP 0.31

     Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others. In red comments that require checking elsewhere in document.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think that if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database. Not writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification would seem to require that TAP define its view of what UWS is, for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, but rather with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make an explicit matching somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    Changed:
    <
    <
    • s2.4.1 p11 par2, list The statement on getCapabilitie, getAvaialability and especially getTableMetadata relate to corresponding VOSI metadata.
    >
    >
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
     
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
    Changed:
    <
    <
      • Why does this spec which seems to be the main way to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata? Actually, there seems to bve no tables metadata in VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that correct VOSI spec?)
    >
    >
      • Why does this spec, which seems to be the main way to talk to and about table sets/database, defer to another, not yet accepted spec, for table metadata?
    Added:
    >
    >
    Actually, there seems to be no tables metadata in VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that correct VOSI spec?)
     
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    Added:
    >
    >
    • s2.4.5 p13 list Is it allowed for the VOTable to contain data in all its DATA types available: TABLEDATA BINARY FITS, LINKs iso DATA?
     
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC, I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column, it is an indication that it is not possible to pose this type of query. It could be noted, maybe, that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to the data model. Maybe there is a place in the metadata of a table for the case where a table contains only rows that all have the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Various comments on the actual metadata prescription follow. I have a proposal for a data model for the TAP_SCHEMA attached to this page (below). It is a JPEG version of a MagicDraw model which is available in UML form here.
    NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically:
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets (databases, really) that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But as a consequence the CAST function cannot be supported either. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database? I.e. what are valid values for
        select primary, indexed, std from tap_schema.columns
        ?
      • s2.6 A metadata prescription for foreign keys is missing, and these are very important. See discussions in the same mail thread starting here. A proposal for a model is again given in the diagram below. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
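
    The point about the "indexed" column can be made concrete with a minimal sketch. It assumes two hypothetical metadata tables, indexes and index_columns; these names and columns are purely illustrative and are not taken from the TAP spec or from the data model attached to this page. Python's sqlite3 is used only as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metadata tables: one row per index, one row per indexed column.
conn.executescript("""
CREATE TABLE indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL,
    is_unique  INTEGER NOT NULL
);
CREATE TABLE index_columns (
    index_name  TEXT NOT NULL REFERENCES indexes(index_name),
    column_name TEXT NOT NULL,
    position    INTEGER NOT NULL,   -- order of the column within the index
    PRIMARY KEY (index_name, position)
);
INSERT INTO indexes VALUES ('ix_obj_radec', 'survey.obj', 0);
INSERT INTO index_columns VALUES ('ix_obj_radec', 'dec', 2);
INSERT INTO index_columns VALUES ('ix_obj_radec', 'ra', 1);
""")


def index_definition(index_name):
    """Return the columns of an index in the order they appear in it."""
    rows = conn.execute(
        "SELECT column_name FROM index_columns"
        " WHERE index_name = ? ORDER BY position",
        (index_name,),
    ).fetchall()
    return [row[0] for row in rows]


radec_columns = index_definition("ix_obj_radec")
```

    A boolean flag on each column could say only that ra and dec are each indexed somehow; the two-table layout also records that they belong to one composite index and in which order.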
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any more indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on the UWS interaction; must check later in the doc.
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." Check the rest of the document for whether examples of this are given for other parameters, e.g. REGION (for STC mask upload?). Otherwise it is better to remove mention of this (also STC in the example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes are doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First is there a distinction between data rows and other rows?
      • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
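
    The quoting convention suggested for s2.8.1 above (enclose values containing commas in double quotes, and double any embedded double quote) is what Python's standard csv module does by default, so it can serve as a small illustration. The helper names and the example row are made up for this sketch.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row; csv.writer quotes only where needed and
    doubles embedded double quotes (its default behaviour)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


def from_csv_line(line):
    """Parse one CSV line back into a list of strings."""
    return next(csv.reader(io.StringIO(line)))


# A value containing both a comma and double quotes.
row = ["NGC 224", 'the "Andromeda" galaxy, M31']
line = to_csv_line(row)
```

    The second value is written as "the ""Andromeda"" galaxy, M31" and from_csv_line(line) recovers the original row, so the convention round-trips without ambiguity.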
    Changed:
    <
    <
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query?Is this meant for param queies only? Can we not simply deal with non-table results as special REQUEST-s?
    >
    >
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is it really the value of the REQUEST parameter?
    • s2.8.2 p21 par2 footnote " a tableset query can be restricted by the WHERE clause of that query" I assume this WHERE clause refers to the ParamQuery WHERE clause?
    Added:
    >
    >
    That clause can only contain constraints on a single table and cannot include joins. The tableset table does not exist. A tableset represents the whole database. A single WHERE clause cannot query that. I would say this option of restricting a tableset XML document should not be available, as it needs to be defined and leads to unnecessary complications. Through ADQL, users can query all the metadata tables in any way they want. Through the getTableMetadata/XML they get all metadata in one go. Why add more ill-defined complications?
    • s2.8.2 p21 par3 "The special use of VOTable must be a dataless VOTable in which the header elements denote the structure of the tableset" An alternative use of VOTable for representing table sets would be for it to contain the serialisation of the TAP_SCHEMA tables as individual table elements. In the current proposal new features have to be introduced into the VOTable spec for each new metadata feature we may think of: indexes, foreign keys, primary keys. The fact that Francois has added some way to deal with the latter two to the new VOTable proposal is likely only to cover this case?
    • s2.8.2 p21 par3 "...there MUST be on VOTable element per table ..." In any case, assume this should read: "...one TABLE element..."?
    • s2.8.3 p21 par1 "Representations of VOSI outputs ... table metadata) must be as defined in the VOSI standard [6]" I do not see any mention of table metadata in the VOSI spec. In any case I do not see why TAP, which is the main spec for defining database metadata, should defer to another spec for representing that. I'd think it is TAP's responsibility to define the complete content of the metadata; others should follow it. That includes the VODataServices spec.
    • s2.8.5 "Overflows" I think the only overflow that can happen and should lead to an error info message is when the service returns fewer rows than the client would have received if there were no restrictions set by the service. If the client explicitly asks for a maximum of 1000 rows to be returned, through TOP (or MAXREC for param queries), and there are 1000 rows available, 1000 should be returned, WITHOUT ANY MESSAGE OR EXTRA ROW! If the user asks explicitly, or implicitly (no TOP/MAXREC), for more than the service is willing to return, then I think the service should return its maximum number of rows but give a warning message indicating this truncation. I would not add an extra row even in that case. The info message should be explicit and sufficient. I believe VOTable 1.2 has explicitly for this purpose a closing INFO element in its DATA?
    • s2.10 I would suggest that all parameters defined in this section are deemed irrelevant when query=ADQL. I would include therefore the subsection on MTIME and MAXREC in this section, as well as section 2.4.11.
    • s2.10 I think that parts of this section could be usefully extracted and made into a separate spec. In particular the "meta-specification" on how to create ranges and lists as values for parameters has already been needed and used, in SSA for example. A proper BNF for these would be good, as is used here for the WHERE clause only. This could be the "Common elements in the DAL2 family of services" specification.
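
    The overflow behaviour argued for under s2.4.7 and s2.8.5 above can be summarised in a minimal sketch. The function name limit_rows, its parameters and the default service limit are hypothetical; the truncated flag stands for whatever INFO-style message the service would attach, and no extra row is ever returned.

```python
def limit_rows(rows, requested=None, service_max=1000):
    """Return (rows_to_send, truncated_by_service).

    requested   -- the client's explicit limit (TOP or MAXREC), or None
    service_max -- the service's own maximum number of rows (assumed value)
    """
    rows = list(rows)
    if requested is not None and requested <= service_max:
        # The client's own limit applies: at most that many rows, no message.
        return rows[:requested], False
    # No client limit, or one above the service maximum: the service limit
    # applies, and the client is told only if rows were actually dropped.
    truncated = len(rows) > service_max
    return rows[:service_max], truncated


# Client asks for exactly what is available: no warning, no extra row.
exact_rows, exact_flag = limit_rows(range(1000), requested=1000)
# Client sets no limit and the service cuts the result: warning flag set.
cut_rows, cut_flag = limit_rows(range(1500))
```

    In the first call 1000 rows come back with no flag; in the second the service returns its maximum of 1000 rows and sets the flag, which is the only case where a message would be needed.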
     
    Deleted:
    <
    <
     


    • TAP_METADATA.jpg:
      TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"

    Revision 152009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf
    Changed:
    <
    <
    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.
    >
    >
    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section).
    Added:
    >
    >
    Some of these comments have (no doubt) been noted by others. In red comments that require checking elsewhere in document.
     
    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think that if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database. Not writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not so much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    Changed:
    <
    <
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    >
    >
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
     
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make an explicit matching somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe we should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
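The two modes of case sensitivity raised in the s2.4.2 comment above (keywords and schema names versus CHAR/VARCHAR values) can be tried out directly. The following is only a minimal sketch: SQLite, via Python's standard sqlite3 module, stands in for the backend database, and the Galaxy table and its contents are made up for the illustration. Other database systems will behave differently by default, which is exactly the point of the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE Galaxy (Name VARCHAR(20))")
conn.execute("INSERT INTO Galaxy VALUES ('M31')")

# Mode 1: keywords and table/column names. SQLite ignores their case.
names_any_case = conn.execute("select NAME from galaxy").fetchall()

# Mode 2: CHAR/VARCHAR values. SQLite compares them case sensitively by
# default; an explicit collation changes that for a single comparison.
strict = conn.execute(
    "SELECT count(*) FROM Galaxy WHERE Name = 'm31'"
).fetchone()[0]
relaxed = conn.execute(
    "SELECT count(*) FROM Galaxy WHERE Name = 'm31' COLLATE NOCASE"
).fetchone()[0]

print(names_any_case, strict, relaxed)
```

In this backend the first mode is insensitive and the second is sensitive unless a collation says otherwise, which suggests that a per-column metadata flag such as the proposed isCaseSensitive would carry real information.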
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to the data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
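The s2.4.7 comment on MAXREC above argues for returning exactly the requested number of rows and signalling overflow separately. A minimal sketch of that behaviour follows, assuming the service is free to fetch one extra row internally just to detect the overflow. Python with SQLite stands in for a TAP service here; run_with_maxrec and the table t are hypothetical names, and the overflow flag stands for whatever notification (e.g. an INFO element) the service would emit.

```python
import sqlite3


def run_with_maxrec(conn, sql, maxrec):
    """Return (rows, overflow): at most maxrec rows plus an overflow flag."""
    # Fetch one extra row internally, only to detect that more rows exist.
    cursor = conn.execute(f"SELECT * FROM ({sql}) LIMIT ?", (maxrec + 1,))
    rows = cursor.fetchall()
    overflow = len(rows) > maxrec
    # The caller never sees the extra row.
    return rows[:maxrec], overflow


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

rows, overflow = run_with_maxrec(conn, "SELECT id FROM t", 3)
exact_rows, exact_overflow = run_with_maxrec(conn, "SELECT id FROM t", 5)
print(len(rows), overflow, len(exact_rows), exact_overflow)
```

The design choice is that the "+1" stays an implementation detail: the result table has the size the user asked for, and the overflow information travels outside the rows.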
    Changed:
    <
    <
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
    >
    >
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ?
    Added:
    >
    >
    Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
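The qualified-name pattern [[catalog_name.]schema_name.]table_name suggested above can be spelled out as a small parser. This is a sketch only: split_table_name is a hypothetical helper, it rejects the catalog_name..table_name form in line with the reading of ADQL given above, and it does not handle quoted (delimited) identifiers that contain dots.

```python
def split_table_name(name):
    """Split '[[catalog_name.]schema_name.]table_name' into three parts.

    Missing leading parts are returned as None.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        # Rejects e.g. 'catalog..table' (empty schema_name).
        raise ValueError(f"not a valid table name: {name!r}")
    catalog, schema, table = [None] * (3 - len(parts)) + parts
    return catalog, schema, table


print(split_table_name("table_name"))
print(split_table_name("schema_name.table_name"))
print(split_table_name("catalog_name.schema_name.table_name"))
```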
     
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicitly clear.
    Changed:
    <
    <
    Various comments on the actual metadata prescription. I have a proposal for a datamodel for the TAP_SCHEMA in UML form here:
    >
    >
    Various comments on the actual metadata prescription. I have a proposal for a datamodel for the TAP_SCHEMA attached to this page (below). It is a JPEG version of a MagicDraw model which is available in UML form here.
    Added:
    >
    >
    NB, the VO-URP GoogleCode project is a split-off from the SimDB development in Volute. XML schema serialisations of the model, as well as a specific design for DDL schemas, can be derived from the UML automatically.
     
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
    Changed:
    <
    <
      • third table I believe it would be very useful to also have an indication of the SQL type of a column.
    >
    >
      • third table, datatype I believe it would be very useful to also have an indication of the SQL type of a column.
     It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types.
    Changed:
    <
    <
    This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService
    >
    >
    This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService
     starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is that DDLs are not supported. But this also means that the CAST function can not be supported. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index.
    Changed:
    <
    <
    This may require 2 extra tables. (see the [[http://vo-urp.googlecode.com/svn/trunk/input/tap-metadata/images/TAP_METADATA.jpg][data model proposal]).
      • s2.7 Metadata prescription for foreign keys is missing and are very important. See discussions in same
    >
    >
    This may require 2 extra tables (see again the data model proposal below).
      • third table What are the datatypes of Primary, Indexed and Std? All boolean? How should that be stored in a database?
    Added:
    >
    >
    I.e what are valid values for
    select primary, indexed, std from tap_schema.columns
    ?
      • s2.6 Metadata prescription for foreign keys is missing and are very important. See discussions in same
     mail thread starting here.
    Changed:
    <
    <
    A proposal for a model is given here again. A proposal for an XML representation is given in
    >
    >
    A proposal for a model is given in the diagram below again. A proposal for an XML representation is given in
     http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
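The index and foreign-key comments above ask for complete definitions rather than a boolean flag. As a rough sketch of what queriable metadata of that kind could look like, the following uses SQLite through Python's sqlite3 module. All table and column names here (indexes, index_columns, foreign_keys, fk_columns, and the galaxy/halo example) are hypothetical and are not taken from the TAP spec or from the attached data model; foreign keys are stored as plain pointers, without requiring a primary key on the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical metadata tables; names are illustrative only.
CREATE TABLE indexes (index_name TEXT PRIMARY KEY, table_name TEXT);
CREATE TABLE index_columns (index_name TEXT, column_name TEXT, position INTEGER);
CREATE TABLE foreign_keys (key_id TEXT PRIMARY KEY, from_table TEXT, target_table TEXT);
CREATE TABLE fk_columns (key_id TEXT, from_column TEXT, target_column TEXT);

INSERT INTO indexes VALUES ('ix_galaxy_pos', 'galaxy');
INSERT INTO index_columns VALUES ('ix_galaxy_pos', 'dec', 2), ('ix_galaxy_pos', 'ra', 1);
INSERT INTO foreign_keys VALUES ('fk1', 'galaxy', 'halo');
INSERT INTO fk_columns VALUES ('fk1', 'halo_id', 'id');
""")

# The ordered column list of an index, which a boolean "indexed" flag cannot express.
index_cols = [row[0] for row in conn.execute(
    "SELECT column_name FROM index_columns WHERE index_name = ? ORDER BY position",
    ("ix_galaxy_pos",),
)]

# The pointer a client needs in order to construct a join.
join_hint = conn.execute(
    "SELECT k.from_table, c.from_column, k.target_table, c.target_column "
    "FROM foreign_keys k JOIN fk_columns c ON c.key_id = k.key_id"
).fetchall()

print(index_cols, join_hint)
```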
    Changed:
    <
    <
      • s2.7 Since user defined functions are part of the ADQL language, the metadata should reflect this.I.e. we need a way to query for them. The data model has a suggestion for this.
    >
    >
      • s2.6 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model below has a suggestion for modelling this.
    Added:
    >
    >
    • s2.6 p19 par2 "The schema name TAP_UPLOAD should be included in the table name for any tables uploaded to the service by a client." I suppose this is a requirement on the client? Must TAP_UPLOAD also be added in the TAP_SCHEMA.schemas table?
    • s2.6 p19 par3 "...may be queried for tables named TAP_SCHEMA.*..." Is this intended to imply the following ADQL query?
    select * 
      from TAP_SCHEMA.tables
     where table_name like 'TAP_SCHEMA.%'
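To see what the query above would return under one possible reading, here is a minimal sketch in Python with SQLite as a stand-in backend. It assumes that table_name stores schema-qualified names, which is exactly what the comment is asking about; the attached in-memory database plays the role of the TAP_SCHEMA schema, and the sample rows (including millimil.galaxy) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached database acts as the TAP_SCHEMA schema in this sketch.
conn.execute("ATTACH DATABASE ':memory:' AS TAP_SCHEMA")
conn.execute("CREATE TABLE TAP_SCHEMA.tables (table_name TEXT, table_type TEXT)")
conn.executemany(
    "INSERT INTO TAP_SCHEMA.tables VALUES (?, ?)",
    [
        ("TAP_SCHEMA.schemas", "table"),
        ("TAP_SCHEMA.tables", "table"),
        ("TAP_SCHEMA.columns", "table"),
        ("millimil.galaxy", "table"),  # invented non-metadata table
    ],
)

# Note that '_' in a LIKE pattern matches any single character,
# so the pattern is slightly broader than the literal prefix.
found = [row[0] for row in conn.execute(
    "select * from TAP_SCHEMA.tables "
    "where table_name like 'TAP_SCHEMA.%' order by table_name"
)]
print(found)
```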
    
      • s2.6 p19 par4 "...“Primary” indicates that the column should be visible in the default (narrow) view of a table" I suppose this is only relevant for Param queries?
      • s2.6 p19 par4 "Std ... a given column is defined by some standard". What is the relation of this to UTYPE? Is it required? Is it useful without any further indication of which standard?
    • s2.6 p19 par5 "A simple tablesetquery must return the entire tableset ..." Very unclear. Why not define it accurately here, or leave the whole description to section 2.8.2? E.g. how does one issue such a query? Certainly (I hope) not by
      select * from TAP_SCHEMA.tableset
      as that table does not exist.
    • s2.7 p19 par2 "Tables in the TAP_UPLOAD schema persist only for the lifetime of the query" I suppose the uploaded tables are visible only to the "session" as well. I.e. different requests can upload tables with the same name. How does this work in /async sessions? As long as the query has not completed, should the user be able to find the uploaded tables in other requests? I guess this depends much on UWS interaction; must check later in doc.
    • s2.7 p20 par7 "... any type of file ... do something useful with the file." Check rest of document whether examples are given of this for other parameters, e.g. REGION (for STC mask upload?). Otherwise better to remove mention of this (also STC in example).
    • s2.8.1 p20 par3 "... MIME type of text/xml;content=xvotable" Is different from the "application/x-votable+xml" Content-Type in the example in the previous section. Is that how it should be?
    • s2.8.1 p21 par1 "If a column value contains a comma the entire column value should be enclosed in double quotes." How do we deal with strings that contain commas as well as double quotes? I suggest using the "standard" convention that embedded double quotes should be doubled.
    • s2.8.1 p21 par 1 "The first data row should give the column name"
      • First is there a distinction between data rows and other rows?
      • Second, can we make this a MUST. What if all returned columns are strings and we can not be sure if first row contains column name.
    • s2.8.2 p21 par1 "If the target of the query is the special table TAP_SCHEMA.tableset ...". What is the "target" of a query? Is this meant for param queries only? Can we not simply deal with non-table results as special REQUEST-s?
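The quote-doubling convention suggested in the s2.8.1 comment above is what common CSV tooling already does. As a small illustration, Python's standard csv module with its default settings quotes a field that contains a comma or a double quote and doubles any embedded double quotes; the sample row is invented.

```python
import csv
import io

# Invented row: the second value contains both a comma and double quotes.
row = ["NGC 1234", 'the "bright", nearby one', "12.5"]

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# Reading the line back recovers the original values unchanged.
decoded = next(csv.reader(io.StringIO(encoded)))

print(encoded)
print(decoded)
```

A service that writes its CSV output this way can be read by most existing clients without extra rules in the spec beyond naming the convention.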
     


    • TAP_METADATA.jpg:
      TAP_METADATA.jpg

    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"

    Revision 142009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". This is not quite correct: ADQL is based on SQL92, but is not a strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not so much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make an explicit matching somewhere, e.g. in tabular form.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe we should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react, and then in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    Changed:
    <
    <
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Should this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema. *
    >
    >
    • s2.5 This section seems to belong to 2.6, can it not be merged with that section?
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Following ADQL, shouldn't this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema.
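    The nesting suggested above, [[catalog_name.]schema_name.]table_name, can be made concrete with a small sketch. This is only an illustration of the proposed reading, not text from the TAP or ADQL specs; the function name `split_table_name` and the simplified identifier pattern (no delimited identifiers) are assumptions.

```python
import re

# Simplified identifier: an assumption for this sketch, ADQL's real rules are richer.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# [[catalog.]schema.]table : the catalog may only appear if a schema is present.
PATTERN = re.compile(
    rf"^(?:(?:(?P<catalog>{NAME})\.)?(?P<schema>{NAME})\.)?(?P<table>{NAME})$"
)

def split_table_name(qualified):
    """Split a qualified table name into (catalog, schema, table).

    Missing parts are returned as None. A form such as 'cat..tab'
    (empty schema) is rejected, matching the reading of ADQL above.
    """
    match = PATTERN.match(qualified)
    if match is None:
        raise ValueError(f"not a valid table name: {qualified!r}")
    return match.group("catalog"), match.group("schema"), match.group("table")
```

    With this reading, "sch.tab" is a schema-qualified table, while "cat..tab" raises an error instead of falling back to a default schema.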
    Added:
    >
    >
    • s2.6 I understand this section to imply that TAP should expose these three tables and make them accessible through ADQL and Param queries. If so, that might be made more explicit. Various comments on the actual metadata prescription follow. I have a proposal for a data model for the TAP_SCHEMA in UML form here:
      • first table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In first row (schema_name), "catalog.schema", should this be [catalog.]schema ?
      • second table In second row (table_name), "catalog.schema.table", should this be [[catalog_name.]schema_name.]table_name ?
      • second table In third row (table_type). As apparently views are described in TAP_SCHEMA.tables, I think it would be useful to store the SQL (ADQL?) that defines this view in this table as well. I.e. I suggest an extra row, "view_sql", the SQL that defines this view (for rows with table_type=view).
      • third table 2nd row (table_name), "catalog.schema.table" should this be [[catalog.]schema.]table ?
      • third table I believe it would be very useful to also have an indication of the SQL type of a column. It is that type, and not its mapping to VOTable types, that is of relevance when constructing queries. It is understood that the result of a query is to be expressed as a VOTable, but VOTable is a messaging format, and should not determine how to express metadata for table sets, databases really, that can be queried with ADQL. For example, date-like types are missing from the VOTable types. This issue has been discussed in the mailing list, in particular in some emails in the registry thread on VODataService starting with Ray's email here. One problem that has been identified there is that ADQL does not define data types explicitly. One reason why it seems not to need them in the language is because DDLs are not supported. But also the CAST function can now not be supported. One issue would therefore be which SQL types to use.
      • third table "indexed" This column is useless. To make proper use of indexes one needs to have their complete definition. This includes all the columns in a given index and the order in which they appear in the index. This may require 2 extra tables (see the [[http://vo-urp.googlecode.com/svn/trunk/input/tap-metadata/images/TAP_METADATA.jpg][data model proposal]]).
      • s2.7 Metadata prescription for foreign keys is missing, and foreign keys are very important. See discussions in same mail thread starting here. A proposal for a model is given here again. A proposal for an XML representation is given in http://www.ivoa.net/forum/registry/0811/2023.htm. Note that there has been a discussion between Francois Ochsenbein and me on some details of this model. In particular FO argues that to define a foreign key (FK) one also needs a primary key (PK). Imho this is not required for us here, though indeed it is required in all relational databases. But there FKs represent a constraint, whereas in my original proposal they define a pointer only.
      • s2.7 Since user defined functions are part of the ADQL language, the metadata should reflect this. I.e. we need a way to query for them. The data model has a suggestion for this.
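    To illustrate the point about the "indexed" column, here is a minimal sketch of what "2 extra tables" for index metadata could look like. It is not the linked data model proposal: the table names `tap_indexes` and `tap_index_columns`, their columns, and the sample index are all hypothetical, and SQLite is used only so the sketch runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per index (hypothetical table, not part of TAP_SCHEMA).
CREATE TABLE tap_indexes (
    index_name TEXT PRIMARY KEY,
    table_name TEXT NOT NULL
);
-- One row per column in an index; 'ordinal' records the column order.
CREATE TABLE tap_index_columns (
    index_name  TEXT NOT NULL REFERENCES tap_indexes(index_name),
    column_name TEXT NOT NULL,
    ordinal     INTEGER NOT NULL,
    PRIMARY KEY (index_name, ordinal)
);
INSERT INTO tap_indexes VALUES ('ix_obj_radec', 'obj');
INSERT INTO tap_index_columns VALUES ('ix_obj_radec', 'ra', 1);
INSERT INTO tap_index_columns VALUES ('ix_obj_radec', 'dec', 2);
""")

def index_definition(conn, index_name):
    """Return the columns of an index in the order they appear in it."""
    rows = conn.execute(
        "SELECT column_name FROM tap_index_columns "
        "WHERE index_name = ? ORDER BY ordinal",
        (index_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

    A boolean "indexed" flag on each column could not tell a client that ra comes before dec in the same composite index; the ordered rows here can.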
     

    Deleted:
    <
    <
     
    <--  
    -->
    Added:
    >
    >
    • TAP_METADATA.jpg:
      TAP_METADATA.jpg
     
    Added:
    >
    >
    META FILEATTACHMENT attr="" comment="" date="1234968163" name="TAP_METADATA.jpg" path="TAP_METADATA.jpg" size="437828" user="GerardLemson" version="1.1"
     

    Revision 13 2009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say anything about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. tabular.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBMS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react, and then in the manner described in 2.8.4, using an INFO element.
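    The implementation consequence mentioned in the MAXREC comment (needing "TOP ..+1") can be sketched as follows. This is only an illustration under assumptions: the helper `fetch_with_overflow` is hypothetical, SQLite's LIMIT stands in for TOP, and the overflow flag stands in for whatever INFO element the service would emit. The sketch fetches one extra row internally to detect overflow but still hands back at most MAXREC rows.

```python
import sqlite3

def fetch_with_overflow(conn, sql, maxrec):
    """Run `sql`, returning (rows, overflowed).

    One extra row is requested so that overflow can be detected,
    but at most `maxrec` rows are returned to the caller.
    """
    limited = f"SELECT * FROM ({sql}) LIMIT ?"
    rows = conn.execute(limited, (maxrec + 1,)).fetchall()
    overflowed = len(rows) > maxrec
    return rows[:maxrec], overflowed

# Small in-memory table with ten rows to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

rows, overflowed = fetch_with_overflow(conn, "SELECT id FROM t", 5)
```

    Here the caller receives exactly five rows plus a separate overflow indication, rather than a sixth row that has to be interpreted as a signal.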
    Changed:
    <
    <
    • *s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    >
    >
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
     
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
    Changed:
    <
    <
    >
    >
    • s2.4.11 This seems to me a perfect example of a meta-standard suitable for the "DAL-2 family of specifications": how to specify lists and ranges in DAL service parameters. Something similar was specified in SSA already as well.
    • s2.4.13 "Parameter names must not be case sensitive, but parameter values must be so." Seems to conflict with the requirement on LANG in 2.4.4. See also my comment on case sensitivity of ADQL queries above.
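    The rule quoted in s2.4.13 (names case-insensitive, values case-sensitive) can be sketched as below. The function name `normalize_params` and the choice to fold names to upper case are assumptions for illustration; the spec does not prescribe an implementation.

```python
from urllib.parse import parse_qsl

def normalize_params(query_string):
    """Parse a query string, folding parameter names to upper case.

    Values are left untouched, so 'ADQL' and 'adql' stay distinct.
    If a name is repeated, the last occurrence wins in this sketch.
    """
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        params[name.upper()] = value
    return params
```

    Under this rule "lang=ADQL" and "LANG=ADQL" are the same request, but "LANG=adql" is not, which is where the possible conflict with the LANG requirement in 2.4.4 would show up.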
    Added:
    >
    >
    • s2.4.14 p17 par2 "Clients should not repeat parameters in a request". Seems to be a SHOULD requirement on clients.
    • s2.5 p17 par1 "[[catalog_name”.”[schema_name”.”]table_name]]" Should this be [[catalog_name”.”]schema_name”.”]table_name ? Note, if I am not mistaken, ADQL does not allow catalog_name..table_name , i.e. schema_name="" (possible IF catalog_name = ""), something which is allowed in SQLServer and corresponds to using the default schema. *
     


    <--  
    -->

    Revision 12 2009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and passing the ADQL, possibly slightly adapted, to the database, not by writing one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one cannot simply pass it through to the underlying database, even if properly supplied with the required user-defined-functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say anything about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. tabular.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBMS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
    Added:
    >
    >
    • s2.4.7 p14 par4 "...if overflow occurs, MAXREC plus one rows should be returned to indicate that overflow occurred ...". IF a user requests that MAXREC rows are to be returned, either using this parameter, or using TOP in ADQL, I think MAXREC rows MUST be returned, not MAXREC+1. In particular, enforcing this would mean that the obvious implementation (using TOP or LIMIT in the SQL) would need to use TOP ..+1 etc. If a user asks for MAXREC rows, that's what (s)he should get. ONLY if the service's "maximum permitted value for MAXREC" is reached should one react. But in the manner described in 2.8.4, using an INFO element.
    • s2.4.7 p14 par5 "..null query, that is, a query which produces an empty table.." In its current form (i.e. using MAXREC) I would not call this a null query, but a null request. Only if "TOP 0" is used might one call a query a null query. This is mainly based on the distinction between request and query (the payload of a request) that has been made in the mailing lists.
    • s2.4.8 Similar to MAXREC I don't think this parameter should go together with ADQL. IF a table contains a "lastModified" column, users can use it in their ADQL queries. If there is no such column it is an indication that it is not possible to pose this type of query. It could be noted maybe that in general it is good practice to have such columns, "createDate", "updateDate", explicitly added to their data model. Maybe there is a place in the metadata of a table in case a table contains only rows all with the same updateTime.
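
To make the MAXREC point above concrete, here is a minimal sketch of how a service could detect overflow internally while still returning exactly MAXREC rows. It is not taken from the TAP spec: the function name run_limited, the table t and the use of SQLite (with LIMIT rather than TOP) are assumptions for illustration only.

```python
import sqlite3

def run_limited(conn, sql, maxrec):
    """Return (rows, overflow): at most maxrec rows plus an overflow flag.

    Hypothetical helper: the extra row is fetched only to detect
    truncation and is never handed back to the caller.
    """
    limited = f"SELECT * FROM ({sql}) LIMIT ?"
    rows = conn.execute(limited, (maxrec + 1,)).fetchall()
    overflow = len(rows) > maxrec
    return rows[:maxrec], overflow

# Illustrative in-memory table with ten rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

rows, overflow = run_limited(conn, "SELECT id FROM t", 5)
```

With this shape the client gets the number of rows it asked for, and the overflow flag could be reported separately, for example through an INFO element.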
     


    <--  
    -->

    Revision 112009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    ---------- PRELIMINARY -----------------

    Added:
    >
    >
    My workspace for commenting on TAP 0.31... Others than GerardLemson should not use this page.
     

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database. Not write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. tabular.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive. SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    Changed:
    <
    <
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there.
    >
    >
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there. I believe a TAP service should only impose its own "maximum permitted value of MAXREC", not some default value that comes into play when a user does not specify MAXREC. And only in that case should an overflow notification be given (s2.8.5).
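
As an illustration of the ISO8601 remark under s2.4.2 above (basic yyyymmdd versus extended yyyy-mm-dd), a service front end could normalise both date forms to the extended one before passing values to a database that only accepts the latter. This is only a sketch under that assumption; normalize_iso_date is a hypothetical helper, and it handles calendar dates only, not times or time zones.

```python
from datetime import datetime

def normalize_iso_date(text):
    """Accept an ISO8601 calendar date in extended (yyyy-mm-dd) or
    basic (yyyymmdd) form and return the extended form."""
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"not a supported ISO8601 calendar date: {text!r}")

basic = normalize_iso_date("20090218")
extended = normalize_iso_date("2009-02-18")
```

Both calls yield the same extended-form string, so the backend only ever sees one representation.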
     


    <--  
    -->

    Revision 102009-02-18 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"
    Added:
    >
    >

    ---------- PRELIMINARY -----------------

     

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database. Not write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or so. Otherwise what is meant by this? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". Don't really understand this paragraph. Isn't this up to service. If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7) does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." Would be good to make an explicit matching somewhere, e.g. tabular.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root, otherwise it seems to be inconsistent with the last par on p9, which does not mandate error.
    • s2.4.1 p11 par2, list Case of allowed values seems to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec, for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability and would it be possible to state for a TAP service that it is in fact case-insensitive. SkyServer and Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe should look into report on different database systems by JVO in Victoria). Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can be configured at the column level even. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBS would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc. Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc) which one may or may not support.
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which implies really that I do not support REGION, which I MUST do when supporting spatial queries. Seems not consistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go. Is the following query ok:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." What if the service does not, which language/version is supposed to be supported. Is this a capability ?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there.
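
The "two modes of case insensitivity" raised under s2.4.2 above (keywords+schema versus CHAR values, configurable per column) can be tried out quickly. The sketch below uses SQLite purely as a convenient stand-in; it says nothing about how SQLServer or Postgres behave, and the table obj and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'name' uses the default (case-sensitive) comparison,
# 'tag' is declared case-insensitive at the column level.
conn.execute("CREATE TABLE obj (name TEXT, tag TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO obj VALUES ('M31', 'M31')")

def count(where):
    # Table and column names are deliberately written in upper case:
    # identifiers are matched case-insensitively in SQLite.
    return conn.execute(f"SELECT COUNT(*) FROM OBJ WHERE {where}").fetchone()[0]

by_name = count("NAME = 'm31'")  # value comparison is case sensitive here
by_tag = count("TAG = 'm31'")    # value comparison ignores case here
```

Here the identifier lookup succeeds regardless of case, while the value comparison depends on the per-column setting, which is the kind of information an isCaseSensitive metadata element would expose.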


    <--  
    -->

    Revision 92009-02-17 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also I think if this would carry through in our message to potential TAP implementors, that they could just as well implement this on files, we'd do them a disservice. The best way to support ADQL queries is by storing one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database. Not write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an intricate part of ADQL and because service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". Is not quite correct. Is based on SQL92, but no strict subset.
    • s1.1.2 p6 par3: "... use an offtheshelf ADQL parser...". This is the problem with ADQL, that in general one can not simply pass it through to the underlying database, even if properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Use of UWS, which is not accepted yet, in this specification, would seem to require that TAP must define its view of what UWS is for those people who want to implement TAP before UWS is completely accepted. Same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has not much to do with how "advanced" a use case is, as with queries requiring lots of work and/or resources on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Whereas other queries make very advanced use of ADQL, and precisely because of that (calculating statistics on the server iso download, proper index usage, proper database design etc) can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that there are some requirements in this section that are aimed at clients, not the service. Should identify those and if correct must something be done about that?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say something about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or similar? If not, what is meant by it? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make the matching explicit somewhere, e.g. in a table.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root; otherwise it seems to be inconsistent with the last par on p9, which does not mandate an error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe we should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
      • An overview of other RDBMSs would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc.) which one may or may not support?
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which really implies that I do not support REGION, which I MUST do when supporting spatial queries. This seems inconsistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go? Is the following query OK:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." If the service does not, which language/version is supposed to be supported? Is this a capability?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    Changed:
    <
    <
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    >
    >
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
    Added:
    >
    >
    • s2.4.6 p14 par1 "...name for the table name SHOULD be an unqualified tablename...". Seems a requirement on clients, but not a MUST. What if not obeyed?
    • s2.4.7 MAXREC seems not necessary for ADQL, as TOP plays that role there.
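
To illustrate the ISO8601 point under s2.4.2 p12 par2: a minimal sketch, assuming a Python layer in front of the database, of how a service could accept both the basic (yyyymmdd) and the extended (yyyy-mm-dd) calendar-date forms and hand only the extended form to a backend that rejects the basic one. The function name normalize_iso_date is hypothetical and not part of TAP or ADQL, and only plain calendar dates are covered, not times or time zones.

```python
import re
from datetime import date

# The two ISO 8601 calendar-date layouts discussed above.
_EXTENDED = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # yyyy-mm-dd
_BASIC = re.compile(r"(\d{4})(\d{2})(\d{2})")       # yyyymmdd


def normalize_iso_date(text: str) -> str:
    """Return the extended form yyyy-mm-dd for either accepted layout."""
    match = _EXTENDED.fullmatch(text) or _BASIC.fullmatch(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 calendar date: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() raises ValueError for impossible values such as month 13.
    return date(year, month, day).isoformat()


if __name__ == "__main__":
    for value in ("20090217", "2009-02-17"):
        print(value, "->", normalize_iso_date(value))
```

Whether such normalisation belongs in the service or should be left to the client is exactly the kind of question the spec would need to settle.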
     



    Revision 8 2009-02-17 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf
    Changed:
    <
    <
    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section).
    >
    >
    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section). Some of these comments have (no doubt) been noted by others.
     
    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also, if this carried through into our message to potential TAP implementors, namely that they could just as well implement this on files, I think we'd do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database, not to write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". This is not quite correct: ADQL is based on SQL92, but is not a strict subset.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL: in general one cannot simply pass it through to the underlying database, even if that database is properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Using UWS, which is not accepted yet, in this specification would seem to require that TAP define its view of what UWS is, for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has less to do with how "advanced" a use case is than with how much work and/or resources a query requires on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Other queries, by contrast, make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that some requirements in this section are aimed at clients, not the service. We should identify those; if this is correct, must something be done about it?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say anything about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or similar? If not, what is meant by it? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make the matching explicit somewhere, e.g. in a table.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root; otherwise it seems to be inconsistent with the last par on p9, which does not mandate an error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to http://www.ivoa.net/Documents/WD/GWS/VOSI-20081023.pdf, is that the correct VOSI spec?)
    • s2.4.2 p12 par1 "The query string is case sensitive."
      • ADQL spec states (p4, 3rd line; p6 1st line): "Case insensitiveness otherwise stated" and "Both the identifiers and the keywords are case insensitive". So why does TAP go against this?
    Changed:
    <
    <
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe we should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive.
    >
    >
      • IF this is sometimes desirable, could this be a capability, and would it be possible to state for a TAP service that it is in fact case-insensitive? SkyServer and the Millennium database are not case sensitive, as MS SQLServer is case insensitive by default. Note that for these databases the case-insensitivity even applies to values of CHAR and VARCHAR columns! The latter is not so in Postgres, though as far as keywords and table and column names are concerned Postgres also seems to be case insensitive (at least in my default installation on my desktop PC). (Maybe we should look into the report on different database systems by JVO in Victoria.) Therefore there might be two modes of case insensitivity: keywords+schema and CHAR values. SQLServer allows case sensitivity, and this can even be configured at the column level. This might imply another metadata element for columns: isCaseSensitive. In any case it would be useful to see how other databases handle case sensitivity (by default).
    Added:
    >
    >
    • s2.4.2 p12 par1 "...the case of table and column names must be preserved..." This seems a requirement on the client, or does it imply that if the client uses a different case for a table for example the service MUST report an error?
     
    • s2.4.2 p12 par2 "...the service must support the use of datetime/timestamp values in ISO8601 format." Apparently ISO8601 is still rather liberal and has different versions.
      • Is ISO8601:2004 intended?
    Changed:
    <
    <
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it does not allow
    >
    >
      • MS SQLServer 2005 seems not to support all allowed ISO8601 versions, even though it claims it is compatible. For example it seems (in my installation) not to allow yyyymmdd, needs extended version yyyy-mm-dd.
    Added:
    >
    >
      • An overview of other RDBMSs would be useful.
    • s2.4.2 p12 par3 "...enable the caller to perform spatial queries...MUST support the INTERSECTS..." Does this imply that if a published table contains pos.eq.ra and pos.eq.dec columns, one must implement INTERSECTS etc.? Or are "spatial queries" a separate class of queries (namely those including INTERSECTS etc.) which one may or may not support?
    • s2.4.2 p12 par3 "the extent of STCS support within the REGION function is left up to the implementation" I can read this as supporting no STC string at all, which really implies that I do not support REGION, which I MUST do when supporting spatial queries. This seems inconsistent.
    • s2.4.2 p12 par4 "...should return an error if ... mix constants and column references for coordinate system and coordinate values." I do not understand the reason for this restriction at all. This seems like a change to the language, requiring different parsers etc. How far does this restriction go? Is the following query OK:
    select POINT(c.coordSys, t.ra, t.dec)
    from (select 'ICRS' as coordSys) c
    ,	 table t
    ...
    
    • s2.4.4 p13 "The service SHOULD implement the LANG parameter." If the service does not, which language/version is supposed to be supported? Is this a capability?
    • s2.4.5 p13 par1 Could the acceptable MIME types be listed explicitly?
    • s2.4.5 p13 list Might it be useful to have an html-table as possible return type. Such a result could be added to a wrapping web page, possibly AJAX like. Might TeX tables be of interest?
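
The two modes of case (in)sensitivity discussed under s2.4.2 p12 par1 can be tried out quickly. A minimal sketch, using Python's built-in sqlite3 module purely as a stand-in (SQLite is not one of the databases mentioned above, and the table and column names are made up): identifiers are matched case-insensitively, while string comparison is case sensitive by default and can be switched per column with COLLATE NOCASE, which is the kind of per-column setting an isCaseSensitive metadata element would have to describe.

```python
import sqlite3

# In-memory database with one default (case-sensitive) text column
# and one column declared case-insensitive at the column level.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Galaxy (name TEXT, label TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO Galaxy VALUES ('M31', 'M31')")


def count(where: str) -> int:
    """Count matching rows; table name is deliberately written in another case."""
    return conn.execute(f"select count(*) from GALAXY where {where}").fetchone()[0]


identifier_hit = count("NAME = 'M31'")  # column name in a different case
binary_hit = count("name = 'm31'")      # value in a different case, default collation
nocase_hit = count("label = 'm31'")     # value in a different case, NOCASE column

if __name__ == "__main__":
    print(identifier_hit, binary_hit, nocase_hit)
```

Running the same three comparisons against each candidate backend would give the overview of default behaviour asked for above.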

     

    Revision 7 2009-02-17 - GerardLemson

     
    META TOPICPARENT name="GerardLemson"

    Notes/questions/issues on reading TAP 0.31

    Source: http://www.ivoa.net/internal/IVOA/TableAccess/TAP-0.31-20081124.pdf

    I know not all sections are normative, nevertheless I am commenting on them as well. (s=section,p=page,par=paragraph on page or in section).

    • s1 p4 par2: "... it is not a table containing links to data object ...". I suppose that if someone publishes a table that contains links to data sets, images or spectra, there is no problem with that. Queries might then indeed produce such links.
    • s1 end p4: ".. is not visible to users." I don't see why it is a good idea to aim at completely abstracting away from a user whether there is a relational database on the backend or not. In some sense the fact that one can send ADQL, which is clearly an SQL dialect, makes users expect relational database technology. Also, if this carried through into our message to potential TAP implementors, namely that they could just as well implement this on files, I think we'd do them a disservice. The best way to support ADQL queries is to store one's results in a relational database and pass the ADQL, possibly slightly adapted, to the database, not to write one's own database engine. I would also like to see some specific database features such as indexes and foreign keys show up explicitly in the metadata.
    • s1 p5 par2: "... joins ... and provided the service supports these capabilities.". I would think that services MUST support joins, as those are an integral part of ADQL and because a service MUST support ADQL queries. Or is it possible to specify that one supports only a subset of ADQL?
    • s1 p5 par3:".. conforming to the second generation (DAL2) interface standards [ref]." It would be really good to have this [ref] ! A lot of arguments use this homogeneous family of standards, but where is the "meta-specification" that describes this?
    • s1.1.1: Confusing section. There seem to be three ways of querying for table metadata:
      1. querying standardised tables using ADQL or PARAMQUERY
      2. tableset queries
      3. VOSI queries
    • s1.1.2 p6 end par2:" ... (ADQL), a standardized subset of SQL92...". This is not quite correct: ADQL is based on SQL92, but is not a strict subset.
    • s1.1.2 p6 par3: "... use an off-the-shelf ADQL parser...". This is the problem with ADQL: in general one cannot simply pass it through to the underlying database, even if that database is properly supplied with the required user-defined functions.
    • s1.1.2 p6 par3: "... simplified parametric queries for the most common use cases." How do we know what the "most common use cases" are? I think this depends strongly on the database. It likely refers to the usual suspect that cone search is the most common use case, is that true? Could be changed to "some common use cases".
    • s1.1.3 p6 par3: Using UWS, which is not accepted yet, in this specification would seem to require that TAP define its view of what UWS is, for those people who want to implement TAP before UWS is completely accepted. The same is true for possible dependencies on other not-yet-accepted standards.
    • s1.1.3 p7 par1: "... there are many more advanced use cases where synchronous queries are not sufficient." I would argue that this has less to do with how "advanced" a use case is than with how much work and/or resources a query requires on the server side. The query can be as simple as "select * from table", not advanced at all, but possibly leading to timeouts/overflows for /sync queries. Other queries, by contrast, make very advanced use of ADQL and, precisely because of that (calculating statistics on the server instead of downloading, proper index usage, proper database design etc.), can be supported with /sync just as well. And /sync is MUCH easier to implement.
    • skip 1.2 for now, must be matched to normative sections.
    • s2 "Requirements for a TAP service (normative)" (my italics). It seems to me that some requirements in this section are aimed at clients, not the service. We should identify those; if this is correct, must something be done about it?
    • s2.1: As /sync is SO MUCH easier to implement, and can nevertheless provide more than adequate support (from experience with sync-only Millennium database), is it possible to change the requirements to something like: "A TAP service must support at least one of sync-ADQL and async-ADQL". I first thought that sync alone should be made mandatory, but I guess some people would like to only implement async.
    • s2.1 p9 I would think that table metadata MUST be provided. Without it no queries are possible. I do not know enough about the service metadata to say anything about the SHOULD there. Btw, why does VOSI have anything to say about the structure of a table set? I would think TAP is the place to specify this, and VOSI should adhere to this spec instead of the other way around. But then I do not really know VOSI.
    • s2.1 p9 final par "...inheritance of requirements ...". This is relevant as well for SimDB. There we define a global data model for describing (3+1D/space-time/"cosmological") simulations. The model gets a mapping to TAP with the goal that users can use ADQL (sync only necessary!) to query SimDB implementations.
    • s2.2 p9 par1+2 and p10 par2 "...service must be represented as a tree structure..." and "... represent the service as a whole" and "...web resource must represent the results...". Is "represent" a formal concept in REST or similar? If not, what is meant by it? Must everything under the root be related to the service?
    • s2.2 p10 par4 "...may return a cached copy...". I don't really understand this paragraph. Isn't this up to the service? If it knows that a certain query always corresponds to a particular cached data product, why would it depend on a GET or a POST? Also (see par7), does it mean that /async requests can never return cached data?
    • s2.2 p10 par1 and par5 "A TAP service must provide a web resource with relative URL /sync" and "A TAP service must provide a web resource with relative URL /async." See the comment (@*2.1*) above for motivation. Could this be SHOULD or MAY? Or allow implementers to choose one (or both)?
    • s2.4 p11 par2 "Not all combinations of the parameters are meaningful." It would be good to make the matching explicit somewhere, e.g. in a table.
    • s2.4.1 p11 par1 "A TAP client must set this parameter correctly ...". This is an example of comment @*2* above, a MUST requirement on a client. Is this appropriate?
    • s2.4.1 p11 par2 "If a service receives a spurious parameter ...". Is a parameter that is not in the list of parameters to be considered spurious as well, or is it an error?
    • s2.4.1 p11 par1 "If a TAP service receives a request without...". I assume that this concerns a TAP service request that has a /sync or /async added to the root; otherwise it seems to be inconsistent with the last par on p9, which does not mandate an error.
    • s2.4.1 p11 par2, list The allowed values seem to have arbitrary case. Is this to be coordinated with the table on p11?
    • s2.4.1 p11 par2, list The statements on getCapabilities, getAvailability and especially getTableMetadata relate to corresponding VOSI metadata.
      • As VOSI is not accepted (correct?), might be good (formally necessary) to give TAP's view on what this means explicitly.
      • Why does this spec, which seems to be the main way to talk to and about table sets/databases, defer to another, not yet accepted spec for table metadata? Actually, there seems to be no table metadata in the VOSI spec at all (I refer to