ADQL RFC
This document will act as the RFC centre for the Astronomical Data Query Language v2.0 Proposed Recommendation. The document has been under TCG evaluation for approval since 15 Sep 2008. TCG members should record their approval or otherwise in the reserved section at the bottom of this page.
SELECT list: The service is free to return a standard String serialization of the geometry.
yyyy[-mm[-dd[Thh[:mm[:ss[.s...]]]]]] :
I.e., the limited ISO 8601 format only, as defined for FITS and in STC.
VOQL-TEG Answer: RDBMSs implicitly convert strings to internal (datetime or timestamp) form using a variety of techniques; ISO 8601 is already an acceptable format. As with other string representations, it is a service capability (possibly mandatory) to understand specific formats.
The PR introduces a "geometry" data type, but it really is a hodgepodge of strings. Essentially, a geometry data type is defined, if I understand it correctly, as a string that contains a function call, with a variety of functions and parameters that seem to be reinventing STC. If one wants to introduce a new data type, there is a much more elegant and ready-made option: an STC-S string. I would be more inclined to call it an STC data type, since the term geometry data type is misleadingly narrow, but that can be discussed.
VOQL-TEG Answer: As described in the attached presentation (and above), STC can be used within the REGION construct; it is up to a service to specify that it understands STC, and it can support as little or as much as is feasible. The ADQL language itself remains independent of any specific string representation or version thereof. In the Interop discussions it was explicitly stated that we expect TAP services to use standards (for region and the coordinate system arguments). The text within the PR will be expanded to make this clear.
Having defined such a data type, one can use the to-be-developed STC library to verify its validity and manipulate (interpret) the string, and define a set of STC core functions to provide the necessary operations, such as intersection, union, contained_in, etc. The STC-S string has the added advantage that it can also handle coordinates, so that problem would be resolved, too. As it is, the PR basically develops its own STC model, through an ad-hoc set of functions and parameters. This, I think, is undesirable.
If the IVOA has an accepted standard for certain structures, one should use it, rather than roll one's own. The reason is a very practical one: it encapsulates the model issues in that particular standard, allowing its related library to handle the details in a transparent and uniform way across applications, allowing models to evolve as necessary without impacting the users. If ADQL, SIAP, SSAP, and Registry all started using their own definitions of, say, Circle, we would create chaos for the user.
I have heard many complain that incorporating STC greatly complicates the systems into which it is to be integrated. That is nonsense. I have said it numerous times and will say it again: there is nothing wrong with applications recognizing only a limited subset of STC structures and coordinate systems, returning an error if they encounter something they cannot handle. That is the way STC was incorporated into VOEvent, and it works. It's perfectly acceptable to accept only simple strings and a limited set of shapes and coordinate systems. It is not hard to parse those strings, and the enormous advantage is that (a) one is compliant with an IVOA standard and (b) one is ready to expand functionality to include more cases, shapes, and coordinate systems at any time, without any changes to the interface, just by adding code to the server software.
Aside from the fact that it is not a good idea to develop a new STC model for ADQL, there are a number of specific problems with the way it has been done, as well. The only coordinate system flavor recognized in the PR is 2-D spherical. Coordinates are specifically assumed to consist of a longitude and latitude. This may seem sensible, but it means that one is painting oneself into a corner for future extensions.
VOQL-TEG Answer: The defined geometry types are 2-D and the LONGITUDE/LATITUDE functions unnecessarily restrict it to spherical coordinates (see above for more about LONG/LAT).
VOQL-TEG Answer: The PR was arrived at by balancing usefulness and complexity, and the TEG feels we have found the best compromise. We have considered future extensibility and found that it is plausible to make only modest extensions beyond the current PR. It is feasible to add 1-D regions (intervals) and 2-D ellipse in a way that is consistent with the PR. After that, one would have to break with SQL to add much value.
Another issue is that, again, if I understand it correctly, the parameters in the function calls may be literal values or column references. The problem is that there is no way to handle the coordinate system if the coordinate elements consist of references. I cannot say "function ('ICRS', table.ra, table.dec)" because I do not necessarily know whether the table contains ICRS positions, nor should I have to.
VOQL-TEG Answer: The first argument should be another column in the table, or the whole construct (a POINT) should be taken from a single column of the table. See attached presentation.
If I ask whether the positions in a table lie within a certain region that I define in a certain coordinate system, the server (knowing what its own coordinate system is, one would hope) should perform the necessary transformations. The PR specifies that when a server cannot perform a necessary transformation, it should return a NULL. I question whether this is sensible: it is indistinguishable from "I have no data". I would much rather see a standardized warning returned.
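The argument made above, that a service may accept only a small subset of STC-S and report an explicit error for anything else, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `parse_region`, `Circle`, and `UnsupportedSTCError` are hypothetical, the supported frames and reference positions are arbitrary, and the fixed six-token form is much stricter than the real STC-S grammar.

```python
from dataclasses import dataclass

# Assumption: the subset this toy service understands.
SUPPORTED_FRAMES = {"ICRS", "FK5"}
SUPPORTED_REFPOS = {"GEOCENTER"}


class UnsupportedSTCError(ValueError):
    """Raised when a region string falls outside the supported subset."""


@dataclass(frozen=True)
class Circle:
    frame: str
    refpos: str
    lon_deg: float
    lat_deg: float
    radius_deg: float


def parse_region(stcs: str) -> Circle:
    """Parse 'Circle <frame> <refpos> <lon> <lat> <radius>' or raise."""
    tokens = stcs.split()
    if not tokens:
        raise UnsupportedSTCError("empty region string")
    if tokens[0].upper() != "CIRCLE":
        raise UnsupportedSTCError(f"shape not supported: {tokens[0]}")
    if len(tokens) != 6:
        raise UnsupportedSTCError(
            "expected: Circle <frame> <refpos> <lon> <lat> <radius>"
        )
    frame, refpos = tokens[1].upper(), tokens[2].upper()
    if frame not in SUPPORTED_FRAMES:
        raise UnsupportedSTCError(f"frame not supported: {tokens[1]}")
    if refpos not in SUPPORTED_REFPOS:
        raise UnsupportedSTCError(f"reference position not supported: {tokens[2]}")
    try:
        lon, lat, radius = (float(t) for t in tokens[3:6])
    except ValueError as exc:
        raise UnsupportedSTCError("coordinates must be numeric") from exc
    return Circle(frame, refpos, lon, lat, radius)
```

Calling `parse_region("Circle FK5 GEOCENTER 15.0 -20.0 1.0")` yields a `Circle`, while a polygon or an unknown frame raises `UnsupportedSTCError`, so the caller can tell "not supported" apart from "no data". Widening the subset later only requires more server-side code, not a change to the string interface.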
Section 1
This document provides the general semantics for the language elements; where these semantics are ambiguous, the specification of the service or application using ADQL should clarify how the elements should be applied.
Section 2.1.2
Section 2.2
The query should be interpreted much like an SQL92 query: where ADQL and SQL92 keywords are identical, the ADQL keywords and their operands should be interpreted in the same way as defined in SQL92.
Section 2.4
Comments by TCG
Chairs should add their comments under their name.
Tom McGlynn (Applications WG)
Christophe Arviset (TCG)
I approve the document. As a minor editorial comment, it would be useful to add page numbers.
Keith Noddle (Data Access Layer)
I approve the document. As has been stated elsewhere, this is an evolutionary step towards a full ADQL specification and as such represents a substantial effort on the part of the authors; they are to be congratulated on their work. I feel sure that the compromises included in the document (especially with respect to REGION etc.) will allow practical implementations to help guide the way forward in future versions of ADQL.
Matthew Graham (Grid & Web Services WG)
I approve this document: it is reassuring to see that there are bona fide implementations for what is clearly a standard on an evolutionary track.
Mireille Louys (Data Models WG)
The document is now easier to read and consistent, and has integrated many of the improvements asked for in the previous phase.
Francois Ochsenbein (VOTable WG)
I approve the document (see, however, the comments below).
Section 2.3 Mathematical and Trigonometrical Functions: in the power function, the second argument is a double (not an integer); and atan2 is not exactly the arc tangent (it gives the polar angle of a point having the cartesian coordinates (x,y)).
Section 2.4: I feel the authors made a good compromise between the divergent points of view (on my side, I still would have preferred ADQL to stick to a single reference system, i.e. remove completely the requirement of a COORDSYS -- any of the described regions can be expressed in the ICRS, and coordinate system transformation is outside the scope of ADQL). Nevertheless, having to refer to a definition of the time to define a simple point on a sphere does not look rational...
Pedro Osuna (VOQL WG)
I approve.
Ray Plante (Resource Registry WG)
All in all, this is a well written document, and nicely separates itself as a language from the service that utilizes it. I traced the BNF through a few of the concepts, and it appears in good shape. I did not review this document during the RFC period as I should have, so some of my comments are out-of-scope of the TCG comment period. Nevertheless, I summarize them briefly here, and, for the benefit of the record and the authors, detail them above in the RFC section:
Because these are out of scope, don't interpret them as conditions for approval. Nevertheless, I think these issues may be very easily rectified/addressed. I would appreciate it if the authors could comment on these or reference previous discussion where the rationale was examined. (Please see also my detailed comments in the RFC section above.)
I'm not sure what to make of the implementations. I see reference to only one implementation (below in response to Hanisch's comments). The URL points only to a paragraph describing the code, not to the actual code itself. Can we have a pointer to the source? The paragraph seems to indicate that the parser simply converts ADQL into XML, which is then translated into a native SQL via a selection of stylesheets. I am not sure how to assess whether this implementation demonstrates compliant use of the spec: is it sufficient just to unambiguously parse the query, or should we also demonstrate it being used to execute queries with the correct meaning? I tend to think the latter, and for this, we need more specific explanations of the semantics.
In this regard, I agree with Bob's comment:
I would like to see a more restricted, unambiguous, and implementable definition than what we have here if it is to go forward as a REC.
I believe that this can be easily rectified with the following:
I also agree with Arnold's concern about whether coordinate conversions are expected (also referenced in Bob's comments below). The authors address this explicitly for the CONTAINS function (2.4.6), but it applies to all of the functions that accept a coordinate system argument. To be consistent, I think that coordinate conversion is expected in all cases. An explicit statement that addresses this (insertable in section 2.4.1 under data types) might be:
When a geometry data type is used within the context of the WHERE clause, the given coordinate system may not match the system in use by the underlying database that the query is applied to. In this case, it is expected that the service or application will perform the necessary coordinate conversions to evaluate the constraints. If this conversion is not possible, then an appropriate error should be thrown as defined by the service or application using ADQL.
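The behaviour proposed in this statement (convert when the systems differ, and raise an error rather than silently return nothing when conversion is impossible) might look like the following sketch. It is illustrative only: `CONVERTERS`, `to_frame`, `point_in_circle`, and `CoordinateConversionError` are hypothetical names, the converter registry is left empty, and the containment test is a plain great-circle separation check rather than anything defined by the ADQL specification.

```python
import math


class CoordinateConversionError(Exception):
    """Raised when no conversion between two coordinate systems is available."""


# Hypothetical registry: (source, target) -> function (lon, lat) -> (lon, lat).
# A real service would register its supported transformations here.
CONVERTERS = {}


def to_frame(lon, lat, src, dst):
    """Return (lon, lat) expressed in dst, converting from src if needed."""
    if src == dst:
        return lon, lat
    try:
        convert = CONVERTERS[(src, dst)]
    except KeyError:
        raise CoordinateConversionError(
            f"no conversion from {src} to {dst}"
        ) from None
    return convert(lon, lat)


def angular_separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def point_in_circle(point, point_frame, circle, circle_frame):
    """point = (lon, lat), circle = (lon, lat, radius), all in degrees.

    Returns True or False; raises CoordinateConversionError if the circle
    cannot be expressed in the point's coordinate system.
    """
    clon, clat = to_frame(circle[0], circle[1], circle_frame, point_frame)
    return angular_separation_deg(point[0], point[1], clon, clat) <= circle[2]
```

With this shape, a `False` result always means "outside the region", and a failed conversion surfaces as an exception the service can map to whatever error mechanism it defines.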
Sebastien Derriere (Semantics WG)
Rob Seaman (VOEvent WG)
Bob Hanisch (Data Curation & Preservation IG)
Here are some comments that Arnold sent to Pedro et al., based on discussions that he, I, and Alex had in the past couple of weeks.
1. Section 2.1.3 correctly states that ISO 8601 is an acceptable format and that therefore it would be perfectly in order for services to indicate that they understand it. But that was not the point I was making. I suggested that recognizing ISO 8601 date-time be mandatory, for the simple reason that it would guarantee a single date-time format that will be recognized by all services. As it currently stands, there is no such thing, and clients would have to query each service as to what it does and does not understand, or break up the datetime into small pieces and handle them separately. I feel that time is sufficiently essential in astronomy to warrant requiring at least one common format (other than JD or MJD) that clients can count on to be understood by all servers.
2. The geometrical functions in Section 2.4 are better defined in the present version, but I still feel that the ADQL-specials complicate life unnecessarily. I do not see that there is much to be gained (and, imho, much more to be lost) by allowing the same thing to be specified by: CIRCLE ('UTC-FK5-GEO', 25.0, -20.0, 1) and: REGION ('Circle FK5 GEOCENTER 15.0 -20.0 1.0') I am afraid that the duplication is going to lead to a situation where some clients will support one, but not the other, and services that support the other, but not the one. That is not interoperability. Alternatively, everyone has to write code to understand both: the information is the same, but the formatting subtly different. (Actually, strictly speaking, the former would need to look up the AstroCoordSystem in the library that UTC-FK5-GEO points to, so it's worse off than using STC-S directly.)
This is a waste of resources, and I would like to advocate that the only geometric function we recognize be REGION, with an STC-S string as its argument. I think the STC-S syntax in REGION is no more complicated or obtuse than the arguments in CIRCLE. It has been said that SQL cannot validate the STC-S string, but the same is true for CIRCLE: SQL can count the arguments and their types, but cannot make a judgment as to whether their values make sense; that's left to the CIRCLE function. The same is true for REGION: SQL can check the number and type of the parameters (one string), but it is left to REGION to validate the string. There is the issue of passing references (like "t.ra" and "t.dec") on. I am sure we can come to an agreement on that, for instance by escaping the names, like: REGION ('Circle FK5 GEOCENTER @t.ra @t.dec 1.0')
3. The meaning of a coordinate system specification in the presence of referenced coordinate values. To some extent, this refers to Section 2.4.9, which is not very clear, but it is more fundamental than that. What does it mean when I say (as in the example above): REGION ('Circle FK5 GEOCENTER 15.0 -20.0 1.0') or: CIRCLE ('UTC-FK5-GEO', t.ra, t.dec, 1) Clearly, it cannot mean: "I will interpret (t.ra,t.dec) as FK5", because they may not be and besides, the server should know. The most intuitive meaning would probably be: "I want (t.ra,t.dec) in FK5, so, if they aren't, do the transformation." But that may not be the most practical solution under all circumstances. The point is, though, this needs to be clarified and specified very explicitly.
(End of comments.)
Subsequently there was an e-mail response from Jeff Lusted: "I think most members of the VOQL TEG group are sympathetic to the concerns you express, but feel we have gone well beyond v1.0 and need implementation feedback before we make further progress.
However, I speak for myself here, and the opinions in the following points are my own"
If we need implementation feedback then I do not see how we are quite at the point of accepting this as a REC. We made an exception to this guideline for STC, at my (and others') urging. Some still think this was a mistake. However, STC had been implemented and had been used in several software environments. Who has an ADQL parser that can accept queries in the variety of syntaxes that are described in the document? I would like to see a more restricted, unambiguous, and implementable definition than what we have here if it is to go forward as a REC.
VOQL Chair Answer:
Issue 1 concerns imposing a specific date-time format. This is to be addressed in either Protocols or Services, rather than in the language.
Issue 2 concerns the usage and semantics of the REGION constructs. The usage of those has been discussed at length and agreed at the last interop meeting in Trieste. The point made raises a worry about the possible use of two ways of formatting a query, but does not imply an inherent problem.
Issue 3 might require a small clarification in the final version of the document.
With respect to implementations, there are at least two implementations of the ADQL doc:
- a full parser implementing all the BNF defining the language (cf. http://deployer.astrogrid.org/software/adqlparser-2008.2/index.html)
- a service for Table Access from CVO making use of ADQL
A definitive implementation of a BNF-based language should be a parser that understands the language constructs, and this is already available (see above). Other implementations will take place whenever protocols making use of the language are available so that clients can implement them. A parallel example is easily found with the SQL language itself. I believe therefore the document is ready for REC.
(end of VOQL Chair Answer)
Herve Wozniak (Theory IG)
Approved.
Masatoshi Ohishi (Astro-RGIG)