From dtody@nrao.edu Thu Nov 15 21:01:43 2007
Date: Thu, 15 Nov 2007 20:17:48 -0700
From: Doug Tody
To: Keith Noddle, Francois Ochsenbein, Pedro Osuna, Alex Szalay, John Good
Cc: Robert Hanisch, Roy Williams, Ray Plante, Matthew Graham
Subject: RE: TAP dance

Hi Keith, All -

We discussed TAP in the NVO team meeting in Boston last week, with the
goal of reviewing the key issues within NVO to be clear on our
requirements and priorities in advance of the TAP meeting in Baltimore
next week.  All of the NVO folks involved in table access over the past
several years were present in these discussions.  The following is
provided to summarize these discussions and the consensus reached, so
that you all know where we stand going into our meeting next week, and
have time to think about it before we meet.

I think the following accurately represents the consensus of our
discussions within NVO; if I got any of it wrong, or if anyone wants to
clarify or expand upon a point, please do so.

        - Doug


1. Primary TAP Use-Cases

    1.1 Complex/Large Table Query
        Large queries (requires async, vospace, authentication)
        Multi-table operations (join etc.)
        Multi-region queries including user table upload
        Advanced ADQL/SQL capabilities

        *Required to support cross-match portal, advanced apps*
        *Full functionality is required*

    1.2 Simple Table/Catalog Query
        Filter-type operation upon a single table
        Most basic astronomical catalog access is of this type
        ADQL, async useful but not required for simple queries

        *Probably sufficient for most small data providers*
        *Cone search is not enough*

    1.3 Table Metadata Query
        Key point: metadata describing stored data is also data
        Can be virtual/computed OTF, subsetted, transformed, etc.
        Queries against both data/metadata reqd at the service level
        Typical use case:
            Client application queries TAP service for available data
            Tables, table columns, relationships, etc.
        Service metadata - capabilities etc - is a separate issue

        *Basic metadata model should be simple*
        *Extensibility required for advanced query support*

    1.4 Data Access Query
        This is "ADQL integration into DAL"
        ADQL query against a DAL data model (complex data etc.)

        This is less critical than 1-3 above, but also important in
        the longer term.


2. Key Requirements/Issues

    2.1 ADQL and Grid capabilities

        Motivation
            Required for portals and advanced applications
            Needs everything: ADQL, multi-region, async, vospace, SSO, etc.

        Issues or Options
            Not controversial: everyone agrees we need this
            Not required however for basic usage
            Complex; will take time to prototype, specify, standardize
            Unrealistic to expect community implementation w/o toolkits

This is required to support development of a distributed cross-match
portal or data discovery portal, for grid workflows, as well as for
other advanced applications.  Full functionality (async, vospace, etc.)
will inevitably be required for the larger problems.

Within NVO we would like to have this type of capability functional by
summer 2008 to support NVO Facility applications development, the NVO
summer school in September 2008, etc.  Realistically, this will probably
be a functional prototype based on working drafts of the relevant IVOA
standards, and we think it may take a year or so before all the relevant
standards (not just TAP) reach Recommendation status.  It would be good
to have the first round of NVO prototypes functional by March or April
of 2008.

    2.2 Simple Query capability

        Motivation
            Provide a simple, basic table access capability
            Probably needed anyway for simple table metadata queries
            Adequate for most simple filter-type queries of single table
            Supplants cone search; much more powerful but still simple
            Provide robust implementation while we develop advanced stuff

        Issues
            Provides minimal-TAP option for small data providers
            Sufficient for publishing a few typical catalogs
            Provides all that is needed for >80% of typical queries

Since all are agreed that a full-up ADQL/Grid capability is required,
this was the most important issue discussed in the NVO meeting and all
agreed that a simple query/simple-TAP capability is a requirement for
NVO.  We are not willing to abandon our small data providers, and the
simple (non-ADQL, non-async) query is an 80-90% solution which is
sufficient for small data providers until the more advanced technology
is fully mature, with ready to use service toolkits and frameworks
provided.  This is essential to ensure buy-in by the broader community
outside the major data centers.

A "simple query" capability is probably desirable for tableset metadata
queries in any case, and ideally the same interface can be used for
both.

Most small data providers (even many larger data providers) only publish
a few independent catalogs which can be queried with a simple filtering
operation working upon a single table.  ADQL would be useful but is not
required for such simple filter type queries, and sequential filter type
queries on a single table have a quick start up hence can stream,
permitting larger queries without requiring asynchronous capabilities or
data staging.  Cone search might continue to provide a solution, but is
*too* simple, and we would like to have something significantly more
powerful in the suite of second generation services, while retaining a
simple basic implementation requirement, with the advanced capabilities
optional.
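
To illustrate why a sequential filter on a single table can stream, here
is a minimal Python sketch.  It is not part of any TAP draft: the
three-row catalog, the column names, and the helper names filter_rows
and bright are all hypothetical, chosen only to show that each matching
row can be handed to the client as soon as it is read, with no staging
of the full result.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]


def filter_rows(rows: Iterable[Row],
                predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield matching rows one at a time; nothing is buffered or staged."""
    for row in rows:
        if predicate(row):
            yield row


# Hypothetical three-row catalog standing in for a provider's single table.
CATALOG = (
    "id,ra,dec,mag\n"
    "1,10.0,-5.0,12.3\n"
    "2,10.5,-5.2,17.9\n"
    "3,200.0,45.0,11.1\n"
)


def bright(row: Row) -> bool:
    """Example filter: keep sources brighter than magnitude 15."""
    return float(row["mag"]) < 15.0


reader = csv.DictReader(io.StringIO(CATALOG))
matches = filter_rows(reader, bright)

# The first match is available after reading only the first data row.
first = next(matches)
```

Because filter_rows is a generator, memory use does not grow with the
size of the table, which is the property that lets a simple service
return large results synchronously.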

Finally, a simple query capability can be provided without taking
anything away or otherwise compromising the advanced capabilities
planned for TAP.  It should be possible to define this early on, and
provide it for immediate use while work continues on prototyping,
specifying, and ultimately standardizing all aspects of the advanced TAP
capabilities.

    2.3 TAP Information Schema

        Motivation
            Provide uniform access to both table data and metadata
            Same query/access interface used for both
            Supports virtual data, dynamic queries, format options, etc
            Easily extended without changing interface
            Don't do one thing now, another later

        Issues or Options
            Need to specify/agree upon minimal core metadata
            Strategy: Adopt registry table model with minor changes
            Other options: VOTable with no data, literal registry XML

TAP should provide access to both tableset data and metadata; this is a
requirement.  We think it is unacceptable to define one interface for
table metadata now, and a different interface on the time scale of next
spring, expecting the user community to implement both (both NVO and
Astrogrid/EuroVO already have interim table access interfaces which can
serve in the meantime).  Hence a good strategy is to pick *one* approach
which can provide what we require in the "long term" (next summer),
while being simple to specify and implement initially.

Various options have been discussed.  The only viable option we can see
which meets all requirements is the information schema type approach,
since this can be simple initially but is easily extensible, while
reusing the same access interface defined for table data.  To avoid
having to spend time defining a TAP information schema early on, we
suggest adopting the Registry (VODataService) tableset model initially,
but recast into the form of an information schema.
Pat, Ray, and I (as well as others within NVO) have already discussed
this and it appears that all can agree to this, so if we can get
agreement within the TAP team and within the Registry WG (for any minor
changes to the VODataService schema) then this could provide a quick
solution for TAP 1.0 which will also work in the longer term.

With this approach the initial TAP information schema would be what is
currently defined in the registry VODataService, possibly with minor
changes, i.e.:

    TAP_SCHEMA.Table
        name            [[catalog.]schema.]table
        type            base table, view, output, etc.
        description     table description

    TAP_SCHEMA.Columns
        name            column name
        tableName       table name
        description     column description
        unit            unit in VO standard format
        ucd             UCD if any
        utype           UTYPE if any
        dataType        dataType as in VOTable/registry
        arrayShape      array "shape"/size as in VOTable
        std             standard column (else custom)

The only significant change from the current Registry VODataService
suggested is the addition of a "type" attribute to Table, replacing
"role" with a more general concept.  UTYPE would also be added to
Columns as this is currently missing.  Other than that, what is shown
above is what is currently defined by VODataService to describe tables
and their columns in the registry, and this provides the essential
information required to compose client data queries.  Extension is
possible by adding additional fields to either table, or by adding new
tables; however, this could be deferred and dealt with as part of the
prototyping effort for advanced ADQL queries.

The obvious way to access this metadata is with a SimpleQuery, which
avoids the need to add additional special case operations at the service
level.  The default output format would be VOTable as for a data query;
however, registry-compliant XML could also be output if required.
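
As a rough illustration of the idea that table metadata is just another
table queried through the same interface, the sketch below builds the two
tables listed above in an in-memory SQLite database and queries them with
ordinary SQL.  Everything beyond the table and field names given above is
an assumption made for the example: SQLite merely stands in for a
service's backend, the SQL column types are guesses, the "photo.stars"
catalog and its rows are invented, and describe_table is a hypothetical
helper, not a TAP operation.

```python
import sqlite3

# SQLite is used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Attach a second in-memory database so the tables can be addressed with a
# TAP_SCHEMA prefix, mirroring the names used in the message.
conn.execute("ATTACH DATABASE ':memory:' AS TAP_SCHEMA")

# "Table" is an SQL keyword, hence the double quotes.
# Column types are assumptions; the message only lists field names.
conn.executescript("""
CREATE TABLE TAP_SCHEMA."Table" (
    name        TEXT,
    type        TEXT,
    description TEXT
);
CREATE TABLE TAP_SCHEMA.Columns (
    name        TEXT,
    tableName   TEXT,
    description TEXT,
    unit        TEXT,
    ucd         TEXT,
    utype       TEXT,
    dataType    TEXT,
    arrayShape  TEXT,
    std         INTEGER
);
""")

# Invented metadata for one hypothetical catalog.
conn.execute(
    'INSERT INTO TAP_SCHEMA."Table" VALUES (?, ?, ?)',
    ("photo.stars", "base table", "Example point-source catalog"),
)
conn.executemany(
    "INSERT INTO TAP_SCHEMA.Columns VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("ra", "photo.stars", "Right ascension", "deg",
         "pos.eq.ra;meta.main", None, "double", None, 1),
        ("dec", "photo.stars", "Declination", "deg",
         "pos.eq.dec;meta.main", None, "double", None, 1),
        ("mag", "photo.stars", "Magnitude", "mag",
         "phot.mag", None, "float", None, 0),
    ],
)


def describe_table(conn, table_name):
    """Return (name, unit, ucd, dataType) for each column of one table."""
    cursor = conn.execute(
        "SELECT name, unit, ucd, dataType FROM TAP_SCHEMA.Columns "
        "WHERE tableName = ? ORDER BY name",
        (table_name,),
    )
    return cursor.fetchall()


# A client first discovers the tables, then the columns of one of them,
# using the same kind of filter query it would use for data.
tables = conn.execute('SELECT name, type FROM TAP_SCHEMA."Table"').fetchall()
columns = describe_table(conn, "photo.stars")
```

Adding a field or a new metadata table later changes only the schema, not
the query mechanism, which is the extensibility argument made above.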

The details of how an advanced ADQL query might work have also been
discussed, for example how VOSpace integration might work, or uploading
of a user supplied multi-region table at query time.  However, everyone
appears to agree that we need these capabilities, so most attention so
far has focused on the scope of the basic TAP interface.