Scope and Role of TAP Parameter Query

Introduction

First some history. Back when we began discussing TAP a couple of years ago, it was agreed that TAP should be able to query table metadata as well as table data. While all agreed that VOSI support was needed for use internal to the VO projects, some of us felt that the VOSI approach of a big block of XML describing the full tableset was not what we wanted to provide to science application writers (often non-professionals) for table metadata queries. Rather we wanted to use the same table query mechanism to query table metadata as well as table data, in the spirit of the elegant SQL information schema concept.

While ADQL could be used for this purpose, at the time it was felt that requiring ADQL to support fully general table metadata queries, while nice as an advanced feature, was more than was needed and would require data providers to implement the TAP_SCHEMA as actual database tables, which we (AstroGrid in particular) wanted to avoid. A simple parameter query interface could support basic table metadata queries more flexibly than VOSI but without requiring full ADQL support for table metadata queries, sharing the same basic query i/o interface with the ADQL query. Most of what a client would want for table metadata queries could be provided by simple queries such as

    <preamble>&FROM=TAP_SCHEMA.tables

which would list and describe all tables supported by the TAP service, and

    <preamble>&FROM=TAP_SCHEMA.columns&WHERE=table_name,foo

which would describe the columns of table "foo". These basic queries are simple enough to be implemented using static metadata if desired, as with VOSI, but would return only the metadata required by typical client use cases, in a format convenient for the client to process. The standard query interface with all of its features would be available without change for client table metadata queries.

Once such an interface was contemplated it quickly became clear that it could also be useful for simple filter-type queries of individual tables (e.g. astronomical catalogs). Adding a spatial region constraint (cone search) capability was also easy, and would provide an attractive upgrade path for phasing out legacy cone search. This concept was easily extended to support multi-position queries and support for general STC regions (REGION parameter) as well. Fully relational DBMS queries would require ADQL (which TAP also provides) but are not required for typical queries of individual astronomical catalogs.

A final motivation for the parameter query, at least for some of us, was to provide something simple which could be implemented robustly now and used for our most common astronomical use cases while the more complex and powerful ADQL-based TAP query technology matures. In particular, advanced functionality such as scalable multi-position ("multicone") queries could be robustly implemented while providing a simple interface to the scientist-programmer user. Once we succeed in enticing astronomer-users with such a simple interface they will be motivated to learn the more complex and powerful ADQL interface, with all the advanced analysis capabilities that SQL provides (one reason for providing both in the same service interface). Professional programmers and the large projects developing advanced portal applications would probably use the ADQL query capability from the beginning, developing this technology into a mature capability in the process.

The parameter query and ADQL query are merely two alternative ways of expressing a query. Param query is much more constrained than ADQL but provides explicit support for some important common use cases. ADQL is a general parsed query language, directly leverages SQL, and is much more flexible and powerful, but also more complex. Both share the same service interface, execution engine, and output processing.

Prototypes of the TAP param query have thus far been implemented only within NVO (to my knowledge). Fairly complete prototypes are available from both STScI and IPAC/IRSA. Similar capabilities (each with a different interface) have however been provided for years by many of the major astronomical data centers, e.g., CDS, CADC, IRSA, HEASARC, and others, and have been quite popular with users for basic catalog access. A survey and analysis of these was done early on in the development of the TAP param query proposal.

Scope of TAP Parameter-Based Queries

Thus far two proposals have been made to begin to define the scope of TAP parameter-based queries. The first originated within NVO and deals explicitly with the issue of querying table data and metadata within the TAP interface. In a later phase of the TAP discussions the concept of a generalized parameter query language (PQL) was also introduced.

The functionality proposed for parameter-based queries of table data and metadata was summarized in late February. The details can be found here:

http://www.ivoa.net/forum/dal/0902/1016.htm

We won't repeat the details here but the capabilities discussed in the link above include the following:

  • Simple table/DBMS metadata queries.
  • Cone search replacement (spatial data model support).
  • Multi-position queries ("multicone").
  • Simple filter-type queries of astronomical catalogs.
  • Query for table modifications (MTIME).
  • Use of views to leverage SQL with simple param queries.

As part of TAP, param query would support inline or URL-based table uploads, and querying of arbitrarily large catalogs using async execution or streaming data transfers. Integration with VOTable and use of UTYPE-based queries may also eventually be possible. These capabilities are however shared with ADQL-based queries and are not specific to the parameter query, and with the exception of VOTable integration and UTYPE support they have already been specified.

The parameter query language (PQL) concept proposes an ADQL-like general query capability using parameters instead of a parsed language to pose the query. To the extent that this is used to query tables it is the same as what is described above; the issues arise if we attempt to use a generic query to query virtual data or access typed data (images, spectra, etc.) where the semantics of the query necessarily depend upon the type of data being accessed. This issue is discussed further under "Issues for Discussion" below.

Interface

The most complete definition of the proposed param query interface and functionality may be found in section 3.3 of V0.3 of the TAP draft (as presented at the Baltimore interop in fall 2008):

http://wiki.ivoa.net/internal/IVOA/TableAccess/TAP-v0.3.pdf

The basic interface is preserved in later versions of the draft TAP spec but TAP-specific functionality such as use of param query for table metadata queries and multi-position queries is no longer fully specified.

The param query interface as currently proposed includes the following parameters:

POS, SIZE -- "Cone search" type spatial queries including multi-position queries using table uploads.
REGION -- Spatial queries using more general STC-based regions.
SELECT -- Specifies the table columns to be returned.
FROM -- Specifies the table to be queried (including TAP_SCHEMA tables).
WHERE -- Specifies an optional simple filter to be applied to specific table fields.

Usage of these parameters is more fully presented in the TAP draft specifications, as at the link above.

Other parameters, common with the ADQL query, can also be used, e.g., FORMAT, UPLOAD, MAXREC, MTIME, RUNID (an issue not discussed further here is whether MTIME should be limited to the param query).
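As a rough illustration of how a client might assemble such a request, the sketch below builds a param query URL from the parameters listed above. The base URL, the table name "catalog", the POS and SIZE value formats (comma-separated decimal degrees), and the rule that FROM must be present are all assumptions made for the example; the TAP draft remains the authority on the actual syntax.

```python
from urllib.parse import urlencode

# Parameter names taken from this note; value formats are not validated here.
PARAM_QUERY_KEYS = {"POS", "SIZE", "REGION", "SELECT", "FROM", "WHERE"}
SHARED_KEYS = {"FORMAT", "UPLOAD", "MAXREC", "MTIME", "RUNID"}


def build_param_query(base_url, params):
    """Return a GET URL for a param query, rejecting unknown parameter names."""
    query = {name.upper(): str(value) for name, value in params.items()}
    unknown = set(query) - PARAM_QUERY_KEYS - SHARED_KEYS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    if "FROM" not in query:
        # Assumption for this sketch: every query names the table to search.
        raise ValueError("FROM is required")
    # Keep "," and "/" literal so list and range values stay readable.
    return base_url + "?" + urlencode(query, safe=",/")


# Hypothetical service endpoint and table name, for illustration only.
url = build_param_query(
    "http://example.org/tap/query",
    {"FROM": "catalog", "POS": "180.0,-30.0", "SIZE": "0.1", "FORMAT": "csv"},
)
print(url)
```

Checking parameter names on the client side is a convenience only; a service would still have to validate the request itself.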

PQL preserves all this but proposes a more general DAL parameter-based query language, not specific to table data. Non-spatial query parameters such as BAND and TIME are proposed; these are not normally associated with table data but are used in the other DAL interfaces. Aside from semantics the most significant change is replacement of separate ParamQuery and AdqlQuery operations with a single query operation, using LANG (or some such parameter) to specify the type of query method to be used, i.e., ADQL, other-QL, or param.

Issues For Discussion

1. TAP-Specific Parameter Queries vs Generic PQL

The issue here is whether the TAP param query should be specific to TAP, or some more generic query language like ADQL which could be used in other contexts.

A primary requirement for parameter queries in TAP is that we fully specify how to query data tables as well as table metadata - not images, not spectra, not spectral line lists, etc, but actual tables of some sort, as this is what TAP is primarily for. As noted under "scope" above, we want to be able to do cone search or multi-position queries of astronomical catalogs, possibly including a filter constraint specified over the table fields. A simple filter-type query with no spatial constraint is also needed. Param query should provide a basic mechanism for table metadata queries. Whatever we do, TAP param query needs to fully and explicitly specify how we do these things.

The possibility to use a generic parameter-based query to query for any type of data (not just tables) is also intriguing, and is part of the motivation for the PQL proposal. While there is some potential here (more on this below) there are two main issues with this proposal. First, while DAL queries such as SIA, SSA, etc. may look similar and have similar parameters, they are used for actual data access as well as for data discovery and the semantics are necessarily specific to the type of data being accessed and the need to specify virtual data. For example, if we look at what is required for spectral extraction, or slicing and dicing a data cube, or generation of synthetic spectra from a theoretical model, this has little to do with some generic query mechanism. Second, if we try to make TAP parameter queries generic we must not in the process compromise our primary requirement of fully specifying how to query table data and metadata.

Nonetheless there is a role in DAL for a generic data query mechanism, known as the generic dataset query. This has been under discussion for some years and is documented in the DAL2 architecture document:

http://wiki.ivoa.net/internal/IVOA/SiaInterface/DAL2_Architecture.pdf

In object modeling terms the generic dataset is the base class for all the DAL interfaces, with SIA, SSA, etc. being subclassed from the generic dataset, providing specialized access for each major type of astronomical data. In addition, the generic dataset query would make it possible to discover any type of data with a single query, describe associations among related primary datasets to model complex data, link to the actual physical (archival) datasets for retrieval, or link to data services which could be used for more advanced data access.

DAL has long proposed adding an actual service to implement the generic dataset query. However, one can't help but notice that the proposed PQL and the generic dataset query have similarities, especially if we restrict PQL to data discovery (no actual data access or virtual data).

While something like PQL could not serve as the base class for actual data access services, it is similar to the generic dataset query. The proposed generic dataset query would provide both parameter and ADQL capabilities to query for generic datasets, returning the result as a table providing associations and data links to model complex data and provide access to such data. In practice a site would probably construct an actual DBMS table providing an index to all primary datasets (e.g., archive files) available at the site, describing each such dataset using the generic dataset metadata (this is essentially the same as the Observation data model in DM parlance).

If this generic dataset index is an actual table, can we use TAP to query it? Clearly we can, with both parameter and ADQL interfaces, because that is what TAP already specifies. The next question is whether we can generalize the TAP parameter query capability to provide the functionality of the proposed generic dataset query. It is very close already. If we extend TAP param query to do the generic dataset query (as well as general table data/metadata queries) then we will have the generic dataset query as well as something much like the proposed PQL. So long as the generic dataset index at a site is an actual table, TAP ADQL queries would be supported as well. What the TAP param/GDS query would add would be integrated, high level support for the generic dataset data model.

2. Use of Range-List for Filter Specification (WHERE parameter)

The value of the WHERE parameter in TAP param query is a simple list of table field constraints, each of which is a simple open or closed range, list of allowable values, or textual whole or substring match. Negation and null value comparisons are supported. Pattern matches are case insensitive but can be made case sensitive by quoting. Lexical analysis can also be defeated by quoting.

The use of a simple range-list for the WHERE parameter has always been a debatable issue. It provides all that is needed for simple table metadata queries (the original motivation), but one might like a more powerful parsed expression capability for general table filter constraints. The range list (a DAL2 standard parameter syntax) does not require a rule based parser to process; all that is required is simple lexical token generation. It is straightforward to convert a param query WHERE clause into an equivalent native SQL (or ADQL) expression.
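To make the conversion concrete, here is a minimal sketch of a range-list to SQL translation. Only the "field,value" form is taken from the examples in this note. The rest of the syntax is assumed for illustration: ";" separating constraints, "lo/hi" for ranges (with either bound optional), and "*" as a substring wildcard. Quoting, negation, and null comparisons are left out. The function name where_to_sql is hypothetical.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def _value(token):
    """Convert a token to int or float when possible, else keep it as text."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def where_to_sql(where):
    """Translate an (assumed) range-list WHERE value into (sql, params)."""
    clauses, params = [], []
    for constraint in filter(None, where.split(";")):
        field, *items = constraint.split(",")
        if not _IDENT.fullmatch(field) or not items:
            raise ValueError(f"bad constraint: {constraint!r}")
        alternatives = []
        for item in items:
            if "/" in item:  # range: lo/hi, lo/ or /hi
                lo, hi = item.split("/", 1)
                bounds = []
                if lo:
                    bounds.append(f"{field} >= ?")
                    params.append(_value(lo))
                if hi:
                    bounds.append(f"{field} <= ?")
                    params.append(_value(hi))
                if not bounds:
                    raise ValueError(f"empty range in: {constraint!r}")
                alternatives.append("(" + " AND ".join(bounds) + ")")
            elif "*" in item:  # case-insensitive substring match
                alternatives.append(f"LOWER({field}) LIKE ?")
                params.append(item.replace("*", "%").lower())
            else:  # single allowed value
                alternatives.append(f"{field} = ?")
                params.append(_value(item))
        # Items listed for one field are alternatives; fields are ANDed.
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses), params


print(where_to_sql("table_name,foo"))
print(where_to_sql("mag,5/10;name,*ngc*"))
```

The values are returned as bind parameters rather than spliced into the SQL text, so the only user-supplied strings that reach the SQL itself are field names that have passed the identifier check.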

WHERE as proposed is simple to compose and process, and adequate for simple filter-type table field constraints. It is tempting to permit more general expressions but this could significantly complicate implementations, and in any case we already have the ADQL query if general expressions are required. The issue is whether a more general expression mechanism is warranted, and if so what it would look like. If possible we would like to maximize DAL2 compatibility to promote code reuse, and minimize use of HTTP-unfriendly metacharacters to simplify user submission of queries with common Web tools. The syntax currently proposed is a compromise, taking all these considerations into account.

3. Table Metadata Queries

Should VOSI (for the registry or registry-oriented client apps) and param query be our primary TAP mechanisms for metadata queries? MAXREC=0, with either the ADQL or param query, can also be used but this provides more limited information.

Imagine we are demonstrating TAP to a user, using only a Web browser. It is very tempting to type in something like

    <preamble>?FROM=TAP_SCHEMA.tables&FORMAT=text    (or html, csv, tsv)

to get a simple list of the tables the service supports, followed by something like

    <preamble>?FROM=TAP_SCHEMA.columns&WHERE=table_name,xxx

to examine the columns of a table, after which we are ready to submit a data query.

So long as we have the TAP_SCHEMA this is all straightforward. An ADQL query could be used as well, although it would be overkill for simple metadata queries. The issue is whether we want to define a minimum requirement for such metadata queries (as in the original TAP 0.3 draft). The above queries are implementable without requiring actual metadata tables in the DBMS.
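A minimal sketch of how a service might answer these two queries from static metadata, with no DBMS tables involved, is given below. The table "foo", its columns, and every field other than table_name are invented for the example, and only the simple "field,value" form of WHERE shown above is handled.

```python
# Hypothetical static description of the service's holdings.
TABLES = [
    {"table_name": "foo", "description": "example source catalog"},
    {"table_name": "bar", "description": "example observation log"},
]
COLUMNS = [
    {"table_name": "foo", "column_name": "ra", "datatype": "double"},
    {"table_name": "foo", "column_name": "dec", "datatype": "double"},
    {"table_name": "bar", "column_name": "obs_id", "datatype": "char"},
]
STATIC_SCHEMA = {
    "tap_schema.tables": TABLES,
    "tap_schema.columns": COLUMNS,
}


def metadata_query(params):
    """Answer a TAP_SCHEMA param query from in-memory metadata."""
    table = params.get("FROM", "").lower()
    if table not in STATIC_SCHEMA:
        raise KeyError(f"not a metadata table: {params.get('FROM')!r}")
    rows = STATIC_SCHEMA[table]
    where = params.get("WHERE")
    if where:
        # Only the simple "field,value" equality form is supported here.
        field, value = where.split(",", 1)
        rows = [row for row in rows if str(row.get(field)) == value]
    return rows


for row in metadata_query({"FROM": "TAP_SCHEMA.columns", "WHERE": "table_name,foo"}):
    print(row["column_name"], row["datatype"])
```

Formatting the returned rows as text, HTML, CSV, or VOTable would be handled by the same output stage used for data queries.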

4. Specifying Query Method

Since we no longer have separate ParamQuery and AdqlQuery service operations, both having been combined into a single operation, an issue is how we specify the query method in the service interface. What we currently have is LANG; however, the parameter query is not a general query language in the sense that ADQL is. Perhaps LANG should be generalized to QUERYTYPE, QUERYMETHOD, or some such concept.
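Whatever the parameter ends up being called, the single query operation amounts to a dispatch on its value. The sketch below shows one way this might look on the service side. The value "PARAM", the QUERY parameter carrying the ADQL text, and both handler functions are placeholders assumed for the example, not part of any draft.

```python
def run_adql(params):
    """Placeholder: a real service would parse and execute the ADQL text."""
    return ("adql", params.get("QUERY", ""))


def run_param(params):
    """Placeholder: a real service would evaluate FROM, WHERE, POS, etc."""
    return ("param", params.get("FROM", ""))


# Query methods known to this hypothetical service, keyed by LANG value.
HANDLERS = {"ADQL": run_adql, "PARAM": run_param}


def dispatch(params):
    """Route a request to the handler selected by the LANG parameter."""
    lang = params.get("LANG", "").upper()
    handler = HANDLERS.get(lang)
    if handler is None:
        raise ValueError(f"unsupported LANG value: {lang!r}")
    return handler(params)


print(dispatch({"LANG": "param", "FROM": "TAP_SCHEMA.tables"}))
```

Renaming LANG to QUERYTYPE or QUERYMETHOD would change only the key looked up in dispatch; adding another query language would add one entry to the handler table.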

-- DougTody - 14 May 2009


Topic revision: r3 - 2009-05-15 - DougTody
 