Scope and Role of TAP Parameter Query
Introduction
First some history. When we began discussing TAP a couple of years ago, it was agreed that TAP should be able to query table metadata as well as table data. While all agreed that VOSI support was needed for use internal to the VO projects, some of us felt that the VOSI approach, a big block of XML describing the full tableset, was not what we wanted to provide to science application writers (often non-professionals) for table metadata queries. Rather, we wanted to use the same table query mechanism to query table metadata as well as table data, in the spirit of the elegant SQL information schema concept. While ADQL could be used for this purpose, at the time it was felt that requiring ADQL to support fully general table metadata queries, while nice as an advanced feature, was more than was needed and would require data providers to implement the TAP_SCHEMA as actual database tables, which we (AstroGrid in particular) wanted to avoid. A simple parameter query interface could support basic table metadata queries more flexibly than VOSI but without requiring full ADQL support for table metadata queries, sharing the same basic query i/o interface with the ADQL query.
Most of what a client would want for table metadata queries could be provided by simple queries such as
    <preamble>&FROM=TAP_SCHEMA.tables
which would list and describe all tables supported by the TAP service, and
    <preamble>&FROM=TAP_SCHEMA.columns&WHERE=table_name,foo
which would describe the columns of table "foo". These basic queries are simple enough to be implemented using static metadata if desired, as with VOSI, but would return only the metadata required by typical client use cases, in a format convenient for the client to process. The standard query interface with all of its features would be available without change for client table metadata queries.
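To illustrate the point about static metadata, here is a minimal sketch of how a service might answer the two metadata queries above without any metadata tables in the DBMS. Everything in it is an assumption made for illustration: the STATIC_SCHEMA contents, the metadata_query function, and the treatment of WHERE as a single "field,value" equality test (which is all the table_name,foo example needs; the proposed WHERE syntax is richer).

```python
# Static stand-in for TAP_SCHEMA. Table and column entries are invented.
STATIC_SCHEMA = {
    "TAP_SCHEMA.tables": [
        {"table_name": "foo", "description": "example catalog"},
        {"table_name": "bar", "description": "another example catalog"},
    ],
    "TAP_SCHEMA.columns": [
        {"table_name": "foo", "column_name": "ra", "datatype": "double"},
        {"table_name": "foo", "column_name": "dec", "datatype": "double"},
        {"table_name": "bar", "column_name": "id", "datatype": "char"},
    ],
}


def metadata_query(params):
    """Answer a FROM[/WHERE] metadata query from static metadata.

    Hypothetical helper: only one 'field,value' equality constraint is
    handled, as in WHERE=table_name,foo.
    """
    rows = STATIC_SCHEMA.get(params["FROM"])
    if rows is None:
        raise ValueError(f"unknown metadata table: {params['FROM']}")
    where = params.get("WHERE")
    if where is None:
        return list(rows)
    # Split "field,value" at the first comma and keep the matching rows.
    field, _, value = where.partition(",")
    return [row for row in rows if str(row.get(field)) == value]


tables = metadata_query({"FROM": "TAP_SCHEMA.tables"})
foo_columns = metadata_query(
    {"FROM": "TAP_SCHEMA.columns", "WHERE": "table_name,foo"}
)
print([t["table_name"] for t in tables])
print([c["column_name"] for c in foo_columns])
```

A real service would of course serialize the rows in the requested output format rather than return Python dictionaries.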
Once such an interface was contemplated it quickly became clear that it could also be useful for simple filter-type queries of individual tables (e.g. astronomical catalogs). Adding a spatial region constraint (cone search) capability was also easy, and would provide an attractive upgrade path for phasing out legacy cone search. This concept was easily extended to support multi-position queries and general STC regions (REGION parameter) as well. Fully relational DBMS queries would require ADQL (which TAP also provides) but are not required for typical queries of individual astronomical catalogs.
A final motivation for the parameter query, at least for some of us, was to provide something simple which could be implemented robustly now, for use for our most common astronomical use cases while the more complex and powerful ADQL-based TAP query technology matures. In particular, advanced functionality such as scalable multi-position ("multicone") queries could be robustly implemented while providing a simple interface to the scientist-programmer user. Once we succeed in enticing astronomer-users with such a simple interface they will be motivated to learn the more complex and powerful ADQL interface, with all the advanced analysis capabilities that SQL provides (one reason for providing both in the same service interface). Professional programmers and the large projects developing advanced portal applications would probably use the ADQL query capability from the beginning, developing this technology into a mature capability in the process.
The parameter query and ADQL query are merely two alternative ways of expressing a query. Param query is much more constrained than ADQL but provides explicit support for some important common use cases. ADQL is a general parsed query language, directly leverages SQL, and is much more flexible and powerful, but also more complex. Both share the same service interface, execution engine, and output processing.
Prototypes of the TAP param query have thus far been implemented only within NVO (to my knowledge). Fairly complete prototypes are available from both STScI and IPAC/IRSA. Similar capabilities (each with a different interface) have however been provided for years by many of the major astronomical data centers, e.g., CDS, CADC, IRSA, HEASARC, and others, and have been quite popular with users for basic catalog access. A survey and analysis of these was done early on in the development of the TAP param query proposal.
Scope of TAP Parameter-Based Queries
Thus far two proposals have been made to begin to define the scope of TAP parameter-based queries. The first originated within NVO and deals explicitly with the issue of querying table data and metadata within the TAP interface. In a later phase of the TAP discussions the concept of a generalized parameter query language (PQL) was also introduced. The functionality proposed for parameter-based queries of table data and metadata was summarized in late February. The details can be found here: http://www.ivoa.net/forum/dal/0902/1016.htm We won't repeat the details here but the capabilities discussed in the link above include the following:
Interface
The most complete definition of the proposed param query interface and functionality may be found in section 3.3 of V0.3 of the TAP draft (as presented at the Baltimore interop in fall 2008): http://www.ivoa.net/internal/IVOA/TableAccess/TAP-v0.3.pdf
The basic interface is preserved in later versions of the draft TAP spec but TAP-specific functionality such as use of param query for table metadata queries and multi-position queries is no longer fully specified.
The param query interface as currently proposed includes the following parameters:
Usage of these parameters is more fully presented in the TAP draft specifications, as at the link above.
Other parameters, common with the ADQL query, can also be used, e.g.,
FORMAT, UPLOAD, MAXREC, MTIME, RUNID (an issue not discussed further
here is whether MTIME should be limited to the param query).
PQL preserves all this but proposes a more general DAL parameter-based
query language, not specific to table data. Non-spatial query
parameters such as BAND and TIME are proposed; these are not normally
associated with table data but are used in the other DAL interfaces.
Aside from semantics the most significant change is replacement of
separate ParamQuery and AdqlQuery operations with a single query
operation, using LANG (or some such parameter) to specify the type
of query method to be used, i.e., ADQL, other-QL, or param.
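As a rough sketch of what a single query operation selected by LANG might look like on the service side, consider the dispatch below. The handler functions, the QUERY parameter name, and the LANG values "ADQL" and "PARAM" are assumptions for illustration only; none of them is fixed by the text above.

```python
def run_adql(params):
    # Placeholder: a real service would parse and execute the ADQL text.
    # The QUERY parameter name is assumed here.
    return ("adql", params.get("QUERY"))


def run_param(params):
    # Placeholder: a real service would evaluate FROM, WHERE, etc.
    return ("param", params.get("FROM"), params.get("WHERE"))


# One handler per query method; a further query language would add an entry.
HANDLERS = {"ADQL": run_adql, "PARAM": run_param}


def query(params):
    """Single query operation: LANG selects the query method."""
    lang = params.get("LANG", "").upper()
    try:
        handler = HANDLERS[lang]
    except KeyError:
        raise ValueError(f"unsupported LANG: {lang!r}") from None
    return handler(params)


print(query({"LANG": "param", "FROM": "TAP_SCHEMA.tables"}))
print(query({"LANG": "ADQL", "QUERY": "SELECT * FROM TAP_SCHEMA.tables"}))
```

The point of the sketch is only that both query methods enter through one operation and can share everything downstream of the dispatch.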
Issues For Discussion
1. TAP-Specific Parameter Queries vs Generic PQL
The issue here is whether the TAP param query should be specific to TAP, or some more generic query language like ADQL which could be used in other contexts.
A primary requirement for parameter queries in TAP is that we fully specify how to query data tables as well as table metadata - not images, not spectra, not spectral line lists, etc., but actual tables of some sort, as this is what TAP is primarily for. As noted under "scope" above, we want to be able to do cone search or multi-position queries of astronomical catalogs, possibly including a filter constraint specified over the table fields. A simple filter-type query with no spatial constraint is also needed. Param query should provide a basic mechanism for table metadata queries. Whatever we do, TAP param query needs to fully and explicitly specify how we do these things.
The possibility to use a generic parameter-based query to query for any type of data (not just tables) is also intriguing, and is part of the motivation for the PQL proposal. While there is some potential here (more on this below) there are two main issues with this proposal. First, while DAL queries such as SIA, SSA, etc. may look similar and have similar parameters, they are used for actual data access as well as for data discovery and the semantics are necessarily specific to the type of data being accessed and the need to specify virtual data. For example, if we look at what is required for spectral extraction, or slicing and dicing a data cube, or generation of synthetic spectra from a theoretical model, this has little to do with some generic query mechanism. Second, if we try to make TAP parameter queries generic we must not in the process compromise our primary requirement of fully specifying how to query table data and metadata.
Nonetheless there is a role in DAL for a generic data query mechanism, known as the generic dataset query. This has been under discussion
for some years and is documented in the DAL2 architecture document
http://www.ivoa.net/internal/IVOA/SiaInterface/DAL2_Architecture.pdf
In object modeling terms the generic dataset is the base class for
all the DAL interfaces, with SIA, SSA, etc. being subclassed from the
generic dataset, providing specialized access for each major type of
astronomical data. In addition, the generic dataset query would make
it possible to discover any type of data with a single query, describe
associations among related primary datasets to model complex data,
link to the actual physical (archival) datasets for retrieval, or link
to data services which could be used for more advanced data access.
DAL has long proposed adding an actual service to implement the generic
dataset query. However, one can't help but notice that the proposed
PQL and the generic dataset query have similarities, especially if we
restrict PQL to data discovery (no actual data access or virtual data).
While something like PQL could not serve as the base class for actual
data access services, it is similar to the generic dataset query.
The proposed generic dataset query would provide both parameter
and ADQL capabilities to query for generic datasets, returning the
result as a table providing associations and data links to model
complex data and provide access to such data. In practice a site
would probably construct an actual DBMS table providing an index to
all primary datasets (e.g., archive files) available at the site,
describing each such dataset using the generic dataset metadata (this
is essentially the same as the Observation data model in DM parlance).
If this generic dataset index is an actual table, can we use TAP to
query it? Clearly we can, with both parameter and ADQL interfaces,
because that is what TAP already specifies. The next question is
whether we can generalize the TAP parameter query capability to
provide the functionality of the proposed generic dataset query.
It is very close already. If we extend TAP param query to do the
generic dataset query (as well as general table data/metadata queries)
then we will have the generic dataset query as well as something
much like the proposed PQL. So long as the generic dataset index
at a site is an actual table, TAP ADQL queries would be supported
as well. What the TAP param/GDS query would add would be integrated,
high level support for the generic dataset data model.
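To make the idea of the generic dataset index as an actual table concrete, here is a small sketch using an in-memory SQLite table. The table name, the columns, the example rows and URLs, and the param_query helper are all hypothetical; in particular the columns are not the generic dataset metadata defined by DAL2, and WHERE is reduced to a single "field,value" equality test.

```python
import sqlite3

# Hypothetical generic dataset index; columns are illustrative only.
COLUMNS = ("dataset_id", "dataset_type", "ra_deg", "dec_deg", "access_url")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_index ("
    "dataset_id TEXT PRIMARY KEY, dataset_type TEXT, "
    "ra_deg REAL, dec_deg REAL, access_url TEXT)"
)
conn.executemany(
    "INSERT INTO dataset_index VALUES (?, ?, ?, ?, ?)",
    [
        ("img-001", "image", 10.68, 41.27, "http://example.org/data/img-001"),
        ("spec-001", "spectrum", 10.68, 41.27, "http://example.org/data/spec-001"),
        ("spec-002", "spectrum", 83.82, -5.39, "http://example.org/data/spec-002"),
    ],
)


def param_query(conn, params):
    """Translate a FROM[/WHERE] param query into SQL over the index table."""
    if params.get("FROM") != "dataset_index":
        raise ValueError("unknown table")
    sql = "SELECT dataset_id, dataset_type, access_url FROM dataset_index"
    args = ()
    where = params.get("WHERE")
    if where:
        field, _, value = where.partition(",")
        # Field names cannot be bound as SQL parameters, so check them
        # against the known column list before building the statement.
        if field not in COLUMNS:
            raise ValueError(f"unknown field: {field}")
        sql += f" WHERE {field} = ?"
        args = (value,)
    return conn.execute(sql + " ORDER BY dataset_id", args).fetchall()


# Discover every dataset, then only the spectra.
everything = param_query(conn, {"FROM": "dataset_index"})
spectra = param_query(
    conn, {"FROM": "dataset_index", "WHERE": "dataset_type,spectrum"}
)
print([row[0] for row in everything])
print([row[0] for row in spectra])
```

Because the index is an ordinary table, the same data could equally be queried with ADQL; the param query simply covers the common discovery cases with less machinery.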
2. Use of Range-List for Filter Specification (WHERE parameter)
The value of the WHERE parameter in TAP param query is a simple list of table field constraints, each of which is a simple open or closed range, list of allowable values, or textual whole or substring match. Negation and null value comparisons are supported. Pattern matches are case insensitive but can be made case sensitive by quoting. Lexical analysis can also be defeated by quoting.
The use of a simple range-list for the WHERE parameter has always been a debatable issue. It provides all that is needed for simple table metadata queries (the original motivation), but one might like a more powerful parsed expression capability for general table filter constraints. The range list (a DAL2 standard parameter syntax) does not require a rule-based parser to process; all that is required is simple lexical token generation. It is straightforward to convert a param query WHERE clause into an equivalent native SQL (or ADQL) expression. WHERE as proposed is simple to compose and process, and adequate for simple filter-type table field constraints. It is tempting to permit more general expressions but this could significantly complicate implementations, and in any case we already have the ADQL query if general expressions are required.
The issue is whether a more general expression mechanism is warranted, and if so what it would look like. If possible we would like to maximize DAL2 compatibility to promote code reuse, and minimize use of HTTP-unfriendly metacharacters to simplify user submission of queries with common Web tools. The syntax currently proposed is a compromise, taking all these considerations into account.
3. Table Metadata Queries
Should VOSI (for the registry or registry-oriented client apps) and param query be our primary TAP mechanisms for metadata queries? MAXREC=0, with either the ADQL or param query, can also be used but this provides more limited information.
Imagine we are demonstrating TAP to a user, using only a Web browser. It is very tempting to type in something like
    <preamble>?FROM=TAP_SCHEMA.tables&FORMAT=text (or html,csv,tsv)
to get a simple list of the tables the service supports, followed by something like
    <preamble>?FROM=TAP_SCHEMA.columns&WHERE=table_name,xxx
to examine the columns of a table, after which we are ready to submit a data query. So long as we have the TAP_SCHEMA this is all straightforward. An ADQL query could be used as well, although it would be overkill for simple metadata queries. The issue is whether we want to define a minimum requirement for such metadata queries (as in the original TAP 0.3 draft). The above queries are implementable without requiring actual metadata tables in the DBMS.
4. Specifying Query Method
Since we no longer have separate ParamQuery and AdqlQuery service operations, both having been combined into a single operation, an issue is how we specify this in the service interface. What we currently have is LANG; however, the parameter query is not a general query language in the sense that ADQL is. Perhaps LANG should be generalized to QUERYTYPE, QUERYMETHOD, or some such concept.
-- DougTody - 14 May 2009