|
META TOPICPARENT |
name="IvoaVOQL" |
This document will act as RFC centre for the Astronomical Data Query Language v2.0 Proposed Recommendation.
This document is currently under TCG evaluation for approval since 15 Sep 2008. TCG members shall approve or otherwise in the reserved section at the bottom of these pages.
<-- Tcg evaluation: -->
Section 1
2nd paragraph: "The ADQL specification pretends...". I don't think this is was you mean. Better to just say "The ADQL specification avoids any distinction..."
or "The ADQL specification makes no distinction..."
VOQL-TEG Answer: OK. The document will be updated accordingly.
3rd paragraph, last sentence, "which" should be "that".
VOQL-TEG Answer: OK. The document will be updated accordingly.
4th paragraph, "keywords" should be "key words" (to avoid confusion with, e.g., FITS keywords. The cited RFC also says "key words".
VOQL-TEG Answer: OK. The document will be updated accordingly.
Section 2.3.2
I know that this territory has been covered many times now, but I am concerned to see all of the region names explicitly defined in the ADQL grammar rather than referring to the IVOA standard for regions, STC. Almost all of this part of the document could be reduced to a reference to STC-S. I am not enough of an SQL expert to know whether it is better to express these all as just arguments to a generic REGION function, or to bring the region names directly into the ADQL. But there is no reference here to STC whatsoever, like it does not exist.
VOQL-TEG Answer: As shown in the current (Trieste 2008) meeting, the REGION construct takes a string representation of the region and converts it to internal format. The format of the string is to be specified by a service that accepts ADQL by refering to a standard format. Thus, accepting STC (S or X) is a service capability (see attached presentation by P. Dowler). As for the specific shapes, we chose the ones that cover a large fraction of the use cases and which could be defined unambiguously (see RECTANGLE comment below). It has been agreed, however, that a more explicit mention shall be done in the document pointing to the STC one.
2.3.2.2.3 and 2.3.2.2.4
Are these really necessary? Why do these calculations need to be done in the ADQL language (implying that they must be server-side functions)?
VOQL-TEG Answer: It was felt these functions would be useful extensions to have for a query language. Other, similar initiatives in extending SQL (eg: OpenGIS) have utility functions such as AREA and CENTROID, amongst others. Further functions might be added in the future, where calculations within the ADQL syntax itself might be considered overly burdensome.
Section 2.3.2.2.5.6 INTERSECTS
Near the end it says the "intersect function is symmetric in its arguments." I think it would be more concise to say the "intersect function's arguments are commutative."
VOQL-TEG Answer: OK. The document will be updated accordingly.
Section 2.3.2.2.5.11
Last sentence says "Transforming a rectangle to another coordinate system will generally result in polygon." This is not correct, is it? I mean, a RECTANGLE will have at least one side that is not a great circle. What coordinate transformation will turn that side into an arc of a great circle (which is required for a POLYGON)?
VOQL-TEG Answer: RECTANGLE is poorly defined. It was agreed to change this to BOX, which will be defined as a short-cut for POLYGON and consistently with STC.
Section 2.3.2.2.6
Second to last sentence, "gratuitously" is misspelled.
(By they way, I think the subsectioning is totally out of control here! Five levels of subsections!! This is not really useful.)
VOQL-TEG Answer: OK. The document will be updated accordingly.
First, I would like to thank Pedro + Inaki for the new VOQL document -- the regions stuff looks now understandable, and the document can be used to start implementations as SQL extensions.
I have however a couple of major problems, and a few other considerations that I would like to stress before the document becomes an IVOA recommendation.
Major points:
- At the Beijing meeeting, it was agreed to implement a sphericalDistance function as sphericalDistance(ra1, dec1, ra2, dec2). This function is completely independent of STC, and it follows the standard definitions of SQL functions: any of the 4 arguments may be either a number or a column reference.
VOQL-TEG Answer: A sphericalDistance function defined with numeric values and implicitly the same coordinate system was considered a subset of the general DISTANCE function.
- The definitions of functions in the document should be completed -- the units of arguments and returned values must be specified in the final document. And for my own point of view, I would strongly recommend to use radians for the sphericalDistance function, and steradians for the returned value of the AREA function, in order to be coherent with the other standard math functions existing in SQL.
Other comments:
- About the introduction of the REGION or Point objects: these are not numbers neither strings and cannot be directly manipulated by SQL, hence the introduction of the new <geometry_value_expression>. It is not clear how these objects are stored in relational tables; one may assume they are stored as e.g. string or binary objects, or they are computed from the other attributes of the table. I feel that some explanations about such non-atomic data would help the reader to understand how it works. The concept of a function which returns an object like CENTROID would also benefit of some explanation -- it differs from any SQL standard function since it does not return a number nor a string. What do you get if you issue a select statement like
Select centroid(mytable.centre) from mytable which, according to the BNF, looks correct ?
VOQL-TEG Answer: It is up to the implementing parties to decide how the geometry attributes are to be stored or if they are to be materialized at all. The same applies to
situations where a is present in the SELECT list: The service is free to return a standard String serialization of the geometry.
- the ELLIPSE region is especially important in astronomical applications, like cross-matching or studies of surface brightness of galaxies. Was it omitted because the ellipse is a too complex geometry ?
- About the query expression: I share Jeff's comments about readability
where CIRCLE('ICRS', 123, 45,12) contains Point('ICRS', mytable.ra, mytable.dec) (b1) looks more understandable than where CONTAINS(Point('ICRS',mytable.ra, mytable.dec), CIRCLE('ICRS', 123, 45,12)) = 1 (b2) The functionnalities are equivalent, and there is surely a way of mapping exactly (b1) into (b2) which is SQL-92 compliant. Assuming that a table contains an attribute "coo" which represents a point, the 2 expressions would be where CIRCLE('ICRS', 123, 45,12) contains mytable.coo versus where CONTAINS(mytable.coo, CIRCLE('ICRS', 123, 45,12)) = 1
VOQL-TEG Answer We had long discussions within the group about function vs. operator approach. We finally decided to go for the functional approach as it would conform an automatically valid SQL statement, provided that the mapping of the actual parameters on to the precise definition of the function is correct. As Alex Szalay showed in the Cambridge Interop, the functional approach could be implemented directly in SQL.
- I don't see the necessity of separating math and trigo functions; the table in section 2.3.1 gathers both functions... Maybe just modify the caption of this table ?
- Terminology: rather than <system_defined_function> I would prefer the term <adql_defined_function>
VOQL-TEG Answer BNF construct names like <system_defined_function> and <user_defined_function> have been discussed and agreed within the VOQL-TEG already.
- about the section numbering: it looks strange to require a 6-level numbering to present ADQL; is it to compete with the STC document ? It might look rational for the writer, but it looks really not balanced for the reader ... and the table of contents is therfore lacking the important topics related to the region stuff.
VOQL-TEG Answer: Ok. See response to BobHanisch
for his last comment.
In summary, the new document clarifies the concepts (even if some details could be even better clarified :-); and the presence of the BNF appendix is really helpful.
(1) The standard as given is very simple in terms of coordinate frames etc -- assumes all handled elsewhere. What should a user do if they wish to query a table that is in galactic coordinates or B1950.0? What if the user wishes to write queries in other coordinate systems?
(2) The functions defined in the paper have poor extensibility because of these fixed fortran-like arguments. Why not plan for the future extension of the standard, and also make things more readable, with something like this:
POINT(frame="ICRS" equinox="1950" position2d="270.0, 22.0")
VOQL-TEG Answer: We expended a lot of energy in debating syntax, particularly the difference between a function based approach and an operator approach for things like CONTAINS and INTERSECTS (See the answer to Francois' points above). This is of a similar ilk, with questions of usability and extensibility and ease of implementation. The short answer is that it is an implementation question (though that is too simple!). You can perhaps see it partly depends on documentation (the parameters could be well documented in a user manual) and also on the presence or absence of an underlying parser. In the end we compromised and went for the simplest functional approach, as it would be a virtual pass through to a valid SQL statement.
(3) The given document is difficult to read with its single-minded focus on bulding a BNF grammar. It needs examples to show the semantics and semiotics of the BNF statements. Without examples it is difficult for me to understand semantics: either how customers might respond, or how difficult it is to implement. How, for example is the LONGITUDE(POINT) used?
-- Can I say this:
SELECT * from Table where LONGITUDE(POINT) between 270 and 285
If so, what kind of tables have POINT structures in them?
-- Can I say this to implement a SIAP?
SELECT * from Imagetable where INTERSECT(POLYGON, SDSS_DR6_FOOTPRINT)
-- Can I say
SELECT POLYGON from Imagetable
What kind of tables can be interpreted to have POLYGON attributes?
VOQL-TEG Answer: In Trieste we agreed to create an auxiliary document with examples.
(4) What kind of table can I specify in the select clause? Can I use VOSpace, like this:
SELECT * from vos://nvo.caltech.edu/myTable.dat where INNER JOIN SDSS_DR6 ....
I ask this, because the DAL group is assuming that everyone will have a VOSpace that can provide temporary storage, and that crossmatches are done through ADQL queries. An alternative is for VOQL services to allow qualified users to upload tables, is that envisioned?
VOQL-TEG Answer: Yes, a VOSpace URI could be used as a table name if escaping it with double quotes. As can be seen in the grammar, the table name construct allows either a regular identifier (made of latin letters, digits and/or underscores) or a delimited identifier (double quotes being the escape/delimiter characters).
(5) There is already a specification of spatial regions (*) that is semantically identical to the STC standard, which is and IVOA Recommendation.
But the definitions in the ADQL document use different words and different meanings from this, (eg one says BOX and the other says RECTANGLE). Have the ADQL authors read the existing IVOA standards? If so, what has been inherited from the STC standard into the ADQL?
VOQL-TEG Answer: See response to BobHanisch, Section 2.3.2 and 2.3.2.2.5.11 (above)
(*) http://www.ivoa.net/Documents/Notes/STC-S/STC-S-20071205.html, section 4.2
- I agree with Francois that the the Functions table in sec 2.3 needs some more attention:
- are trig function arguments and return values in degrees or radians?
- how does the seed work in the
rand(x) function? I guess x is the seed - is it an integer or a float? It looks like you need a seed every time you invoke it, but that wouldn't be very convenient. VOQL-TEG Answer: The seed is supposed to be an integer. We will detail the data type for each function argument in the next version of the document.
- how come the
round(x,n) ("Round to nearest integer") function has two arguments? VOQL-TEG Answer: The function description is not complete. It rounds to the given number of decimal places, with the default being to round to the nearest integer. You use a negative number to round to the left of the decimal point. The function with two args is supported by the main manufacturers (both parameters are required in SQL Server). The function description will be corrected in the next version of the document.
- Some other functions need to be documented more carefully:
-
LONGITUDE and LATITUDE - the units of the returned values need to be specified (presumably degrees). So do the ranges: I guess LATITUDE will return a value in the range [-90,+90]. LONGITUDE suggests (to me) a quantity in the range [-180,+180], but given that it's basically talking about Right Ascension people will probably be expecting [0,360]. Or can it be either, left up to the implementation? (I suggest not). Either way, document the intention. Permitted ranges should also be noted for supplying latitude and (especially) longitude values as arguments to the various geometry functions - will values outside the nominal range of lat/long be wrapped automatically or are they errors?
-
AREA - what are the units of the return value?
- VOQL-TEG Answer: The
LONGITUDE and LATITUDE functions are meant to extract the first and second coordinate from an attribute of POINT type. No coordinate transformation is supposed to take place.
- Coordinate systems: Is there a vocabulary for these? The only example given is 'ICRS' and the BNF is specified only as
<string_value_expression> . Are coordinate system specifications intended to be lifted from STC (or elsewhere) or is it intended that the implementation may define any values it likes for known coordinate systems? In practice - if only for testing purposes - it is often convenient to use "the default coordinate system", i.e. whatever is in use in the database; is there some way this can be selected? VOQL-TEG Answer: In Trieste we agreed to define a vocabulary for the possible coordinate systems.
- It seems a bit strange that you can extract the longitude and latitude, but not the coordinate system from a POINT. But maybe it's OK - I must admit I don't have a clear idea of how these constructs are expected to be used.
VOQL-TEG Answer: A function for extracting the coordinate system from a POINT will be added.
I am not happy with the way ADQL handles STC metadata in the PR.
First, as a minor point, it would be nice if an isodatetime data type
were added, so that we could directly use ISO 8601 datatime strings:
yyyy[-mm[-dd[Thh[:mm[:ss[.s...]]]]]] :
I.e., the limited ISO 8601 format only, as defined for FITS and in STC.
VOQL-TEG Answer: RDBMSs implicitly convert strings to internal (datetime or timestamp) form using a variety of techniques; ISO8601 format is an acceptable format already. As with other string representations, it is a service capability (possibly mandatory) to understand specific formats.
The PR introduces a "geometry" data type, but it really is a
hodgepodge of stings - essentially, a geometry data type is defined,
if I understand it correctly, as a string that contains a function
call, with a variety of functions and parameters that seem to be
reinventing STC.
If one wants to introduce a new data type, there is a much more
elegant and ready-made option: an STC-S string.
I would be more inclined to call it an STC data type, since the term
geometry data type is misleadingly narrow, but that can be discussed.
VOQL-TEG Answer: As described in the attached presentation (and above) STC can be used within the REGION construct; it is up to a service to specify that it understands STC and they can support as little or much as is feasible. The ADQL language itself remains independent of any specific string representation or version thereof. In the Interop discussions it was explicitly stated that we expect TAP services to use standards (for region and the coordinate system arguments). The text within the PR will be expanded to make this clear.
Having defined such a data type, one can use the to-be-developed STC
library to verify its validity and manipulate (interpret) the string,
and define a set of STC core functions to provide the necessary
operations, such as intersection, union, contained_in, etc.
The STC-S string has the added advantage that it can also handle
coordinates, so that problem would be resolved, too.
As it is, the PR basically develops its own STC model, through an
ad-hoc set of functions and parameters. This, I think, is
undesirable. If the IVOA has an accepted standard for certain
structures, one should use it, rather than roll one's own. The reason
is a very practical one: it encapsulates the model issues in that
particular standard, allowing its related library to handle the
details in a transparent and uniform way across applications, allowing
models to evolve as necessary without impacting the users.
If ADQL and SIAP and SSAP and Registry all would start using their own
definitions of, say, Circle we create chaos for the user.
I have heard many complain that incorporating STC greatly complicates
the systems into which it is to be integrated. That is nonsense.
I have said it numerous times and will say it again:
There is nothing wrong with applications recognizing only a limited
subset of STC structures and coordinate systems - returning an error
if they encounter something they cannot handle.
That is the way STC was incorporated into VOEvent - and it works.
It's perfectly acceptable to accept only simple strings and a limited
set of shapes and coordinate systems. It is not hard to parse those
strings and the enormous advantage is that (a) one is compliant with
an IVOA standard and (b) one is ready to expand functionality to
include more cases, shapes, and coordinate systems at any time,
without any changes to the interface, just by adding code to the
server software.
Aside from the fact that it is not a good idea to develop a new STC
model for ADQL, there are a number of specific problems with the way
it has been done, as well.
The only coordinate system flavor recognized in the PR is 2-D
spherical. Coordinates are specifically assumed to consist of a
longitude and latitude. This may seem sensible, but it means that one
is painting oneself into a corner for future extensions.
VOQL-TEG Answer: The defined geometry types are 2-D and the LONGITUDE/LATITUDE functions unnecessarily restrict it to spherical coordinates (see above for more about LONG/LAT).
VOQL-TEG Answer: The PR was arrived at by balancing usefulness and complexity and the TEG feels we have found the best comprimise. We have considered future extensibility and found that it is plausible to make only modest extensions beyond the current PR. It is feasible to add 1-D regions (intervals) and 2-D ellipse in a way that is consistent with the PR. After that, one would have to break with SQL to add much value.
Another issue is that, again, if I understand it correctly, the
parameters in the function calls may be valued or consist of column
references. The problem is that there is no way to handle the
coordinate system if the coordinate elements consist of references.
I cannot say "function ('ICRS', table.ra, table.dec)" because I do not necessarily know whether the table contains ICRS positions - nor
should I have to.
VOQL-TEG Answer: The first argument should be another column in the table, or the whole construct (a POINT) should be taken from a single column of the table. See attached presentation.
If I ask whether the positions in a table lie within
a certain region that I define in a certain coordinate system, the
server (knowing what its own coordinate system is, one would hope)
should perform the necessary transformations.
The PR specifies that when a server cannot perform a necessary
transformation, it should return a NULL. I question whether this is a
sensible thing: it is indistinguishable from "I have no data". I would
much rather see a standardized warning returned.
Section 2.1.2. RAND is listed as a reserved word, but isn't in the list of mathematical functions.
VOQL-TEG Answer: John, Not quite sure which copy of the spec you have. In mine, RAND is there in the table of maths functions. JeffLusted
Functions to do bitwise operations would be useful - some datasets such as the SDSS need them to select objects with particular flags set.
VOQL-TEG Answer: We are aware the the language still has shortcomings, the absence of bitwise operations being one. These should be up for consideration in the next version. JeffLusted
Comments by TCG
Chairs should add their comments under their name.
Tom McGlynn (Applications WG)
I have lots of comments. The most substantial are that I think we need to clarify the nature of geometry objects within the database better than is currently done and also think about whether we should entangle the astronomical concept of coordinate systems with the purely geometric concepts of regions on a sphere -- I think we should not. The first I think needs to be addressed before I'd be at all comfortable with this -- it doesn't seem like the behavior is currently well defined. The second reflects my view of how things should be done, so there I would be more reluctant for my viewpoint to impede progress.
- 2.1.2 Many existing databases have case sensitive identifiers (including ours). Requiring case insentitivity in ADQL will be extremely costly for us to implement.
- General Elements of the ADQL syntax that are extensions of standard SQL syntax should be clearly identified (e.g., TOP)
- Use of TOP: The syntax for limiting queries to a given number of rows is extremely varied among the databases. Use of the LIMIT syntax (used by MySQL for example) might be preferred since it allows not just the specification of the first N rows, but also the n-m'th row. However this isn't a big deal.
- Section 2.2.1 is completely incomprehensible to me. It needs to have examples of what is included and excluded.
- 2.3.1 The arguments (and return values) of the mathematical functions need to be clarified with regard to radians versus degrees.
Especially if we are going to have some functions that return degrees (though I think that is a bad idea for any function to use degrees other than the degrees/radians conversion functions).
- 2.3.2 Can geometric types be used in subselects. What is the effect of null values in the creation of a geometry type? Do they become null? Can tables support persistent geometry types? How do geometry types act under standard SQL operations? E.g., if I specify the same point in two different coordinates systems are they equal? Can >< operations be done on these? E.g., is CIRCLE(ICRS, x,y,10)>circle(ICRS,x,y,5), .... There needs to be a lot more discussion of how geometry objects work.
- 2.3.2 I don't like the contains function returning a numeric value. It's a geometric equivelent of the SQL IN function and should be a boolean. Intersects is better as an number but only because it could give a value (between 0 and 1) that reflects the degree of intersection. If we want these things to be boolean, then that's what they should be. SQL does not use integers as substitutes for boolean values and we shouldn't import that C atrocity into ADQL.
- 2.3.2.2.5.1 The units of area are not given. They should be steradians.
- 2.3.2.2.5.2 What is the centroid of a geometry? Does it need to be inside the geometry? How it it to be computed? Is it required to be computed the same way on all platforms? E.g., what is the centroid of a belt around the equator? One of the poles? Which one? Lots of questions here.
- 2.3.2.2.5.3 I don't believe that any geometry other than perhaps point should refer to a specific coordinate system. So I'd recommend a syntax of CIRCLE(POINT('ICRS', x,y),radius). Also, there should be no reference to a cone in the sky in the normative section, though it could be noted that circles may be helpful in implementing cone searches. My view is that we want to isolate the explicit astronomical elements of ADQL into a very few areas. The extensions to SQL should be made as primarily new mathematical capabilities (such as regions) indenpendent of astronomical peculiarities. Also, I believe that all angular measurements should be in radians.
Even better would be to have no dependence of any geometry on astronomical coordinates. Everything happens on the unit sphere. We could provide a single function
transform(InputSystem, OutputSystem, Point)
that would transform a point from one system to the other and that would be the only piece of ADQL that would need to worry about Astronomical coordinate systems.
- 2.3.2.2.5 generally. This is weirdly structured. The functions that generate area datatypes should be specified in one section and those things that operation on them to produce standard types (or perhaps booleans) elsewhere. I.e., point, circle and polygon (and box if retained in one section. Area, distance, latitude, longitude in another and perhaps contains and intersects in a third.
- 2.3.2.2.5.5 Again I think that all functions that take angular measurements should be in radians. Otherwise it gets very confusing as you combine sines and distances and such.
- 2.3.2.2.5.7-9 I think there is a very important issue concealed here. In my conception, we allow users to specify a coordinate system in creating a point primarily so that the local system can make any necessary transformation to get the point into what coordinate representation is used locally. However since the ADQL query may be a multidatabase query it's possible that multiple native coordinate representations are involved. So either we are going to specify a standard way in which geometries are passed between databases in a distributed query, or we need to tag each of the geometries that are created with the coordinate system in which they were created. As I mentioned above, I don't think we want to pollute geometries with this very astronomical concept. They should be defined on an abstract unit sphere.
- 2.3.2.2.5.10 I'd have made it POLYGON(POINT('ICRS', 10,10),POINT('ICRS',10,20),POINT('ICRS', 15,15). All the code for handling coordinate systems is embedded in POINT (and maybe LATITUDE and LONGITUDE). Despite being longer this is easier for me to read and it's more powerful.
If we could take the coordinate systems out of the geometry entirely, then in the case where the user was simply making a query in the same coordinate system ast the table it would be
POLYGON(POINT(10,10),Point(10,20),POINT(15,15))
or if we know that the table has galactic coordinates and the user has ICRS, then we'd have the straightforward
POLYGON(transform('ICRS','Galactic',POINT(10,10)),
transform('ICRS','Galactic',POINT(10,20)),
transform('ICRS','Galactic',POINT(15,15)))
- 2.3.2.2.5.11 I do not believe that the rectangle should be supported. Note that the statement that transforming a rectangle from one coordinate system to another results in a polygon is incorrect. The parallels that define a rectangle in one coordinate system do not transform to great circles. Rectangles can be supported using standard SQL on coordinates where needed and need no special support within ADQL.
- 2.3.2.2.5.12 I see no reason for a special REGION syntax.
- 2.3.3. Are user defined functions supported? It is unclear from this section. If they are not, then this section should be deleted until they are. Who are the users that define these functions? How and when do they get defined? How are they invoked after being defined? How persistent are they? Can they be shared between users?....
VOQL Chair Answer
I will obviously not have time to answer this very late and long list of comments before I take a plane to Baltimore tomorrow.
Nevertheless, could you please make sure you have read the proper version and eventually comment over that one? It seems your comments are directed against an old one. The latest version is linked from the top of this page. |
|
Christophe Arviset (TCG)
I approve the document.
As a minor editorial comment, it would be useful to add page numbers.
VOQL Chair Answer
Page numbers will eventually be added
(end VOQL Chair Answer)
Keith Noddle (Data Access Layer)
I approve the document.
As has been stated elsewhere, this is an evolutionary step towards a full ADQL specification and as such represents a substantial effort on the part of the authors; they are to be congratulated on their work. I feel sure that the compromises included in the document (especially with respect to REGION etc) will allow practical implementations to help guide the way forward in future versions of ADQL.
Matthew Graham (Grid & Web Services WG)
I approve this document: it is reassuring to see that there are bona fide implementations for what is clearly a standard on an evolutionary track.
Mireille Louys (Data Models WG)
The document is now easier to read and consistent, has integrated many of the improvements asked in the previous phase.
- I just noted a detail : in section 2.4.15 Geometry in the SELECT clause, the example
« SELECT circle(‘UTC-FK5-GEO’, 1, 2, 0.5) »
is not compatible to the query syntax specification of SELECT in 2.2: FROM is missing.
- dissemination of this language:
An appendix with a few query examples would help newcomers to digest the specification and write their own queries more easily. Alternatively, a web page with examples would also work to make ADQL more popular.
I approve the revised document for recommendation.
VOQL Chair Answer
As agreed in Trieste, a separate document is being created with examples, as a sort of annex to the specification. The document is currently being worked by the VOQL-TEG (B. Guffler) and will be available soon.
(end VOQL Chair Answer)
Francois Ochsenbein (VOTable WG)
I approve the document (see however comments below)
Section 2.3 Mathematical and Trigonometrical Functions:
in the power function, the second argument is a double
(not an integer); and atan2 is not exactly the arc tangent
(it gives the polar angle of a point having the cartesian
coordinates (x,y))
VOQL Chair Answer
Will eventually be corrected
(end VOQL Chair Answer)
Section 2.4 I feel the authors made a good compromise between
the divergent points of view (on my side, I still would
have preferred ADQL to stick to a single reference system,
i.e. remove completely the requirement of a COORDSYS --
any of the described regions can be expressed in the ICRS, and
coordinate system transformation is outside the scope of ADQL).
Nevertheless having to refer to a definition of the time to define
a simple point on a sphere does not look rational...
Pedro Osuna (VOQL WG)
I approve.
Ray Plante (Resource Registry WG)
All in all, this is a well written document, and nicely separates itself as a language from the service that utilizes it. I traced the BNF through a few of the concepts, and it appears in good shape.
I did not review this document during the RFC period as I should have, so some of my comments are out-of-scope of the TCG comment period. Nevertheless, I summarize them briefly here, and, for the benefit of the record and the authors, detail them above in the RFC section:
- TOP is not SQL92 and is highly problematic, so I don't think it should not be included.
- The defined semantics for the language elements are not spelled out sufficiently to test whether an implementation is correct. Is this really what we want?
- There are a few minor wording improvements that could be made
Because these are out of scope, don't interpret them as conditions for approval. Nevertheless, I think these issues may be very easily rectified/addressed. I would appreciate it if the authors could comment on these or reference previous discussion where the rational was examined. (Please see also my detailed comments in the RFC section above.)
I'm not sure what to make of the implementations. I see reference to only one implementation (below in response to Hanisch's comments). The URL points only to a paragraph describing the code, not to the actual code itself. Can we have a pointer to the source? The paragraph seems to indicate that the parser simply converts ADQL into XML, which is then translated into a native SQL via a selection of stylesheets. I not sure how to assess whether this implementation demonstrates compliant use of the spec--is it sufficient just to unambiguously parse the query, or should we also demonstrate it being used to execute queries with the correct meaning. I tend to think the latter, and for this, we need more specific explanations of the semantics.
In this regard, I agree with Bob's comment:
I would like to see a more restricted, unambiguous, and implementable definition than what we have here if it is to go forward as a REC.
I believe that this can be easily rectified with the following:
- a blanket statement near the beginning indicating that where a term in ADQL is identical to that in SQL92, the semantics of that term (and its arguments) are the same as in SQL92.
- spelling out the meanings and types of all arguments for functions. (Suggested sentences are given here.)
- you mention that the region functions are based on STC; I recommend that an explicit statement indicating that the arguments are identical in type and meaning as the arguments defined for STC/s.
I also agree with Arnold's concern about whether coordinate conversions are expected (also referenced in Bob's comments below). The authors address this explicitly for the CONTAINS function (2.4.6) but it applys to all of the functions that accept a coordinate system argument. To be consistant, the I think that coordinate conversion is expected in all cases. An explicit statement that addresses this (insertable in section 2.4.1 under data types) might be:
When a geometry data type within the context of the WHERE clause, the given coordinate system may not match the system in use by the underlying database that the query is applied to. In this case, it is expected that the service or application will perform the necessary coordinate conversions to evaluate the constraints. If this conversion is not possible, then an appropriate error should be thrown as defined by the service or applicaiton using ADQL.
VOQL Chair Answer
* TOP falls under the category of ADQL Reserved Words, as it was understood as an extension to the SQL92 reserved keywords. First attempts on TAP to sketch query ranking functionalities for large datasets showed the need to provide such features in the language rather than in the service by combining the use of ORDER_BY and TOP. At that time, no problem was found to include TOP despite the fact that it can be named and implemented in different ways in different database systems. That is why it is defined as an ADQL extension and not and SQL reserved word. The first time TOP appears is in the April 2007 version. Later, in Beijing (May 2007), some people expressed concerns that TOP might have different implementations in different DB systems, but there was no final requirement from any party to remove it. Nor was there such a requirement within any of the members of the VOQL-TEG. In case the TOP would create troubles when implementing services making use of ADQL language, it could be deprecated in next ADQL version. For the time being, it would give a good functionality for eventual TAP-like services, and therefore it should stay in.
* I am not sure I understand the part on implementations. The BNF parser is a full system very kindly developed by Jeff Lusted (Astrogrid) and which source code, packaging information, documentation, etc. is fully available at the pages given in the referred URL (you only have to click on the corresponding menu at the left to navigate through the source code, etc.). As an example, please find attached this URL where part of the java code can be seen: http://www.astrogrid.org/viewcvs/astrogrid/adql2/
The language is defined via a BNF (as was agreed several times in the past). The above implementation is a full implementation of the parser. Actually, I believe this is one of the most thorough implementation there has ever been of an IVOA proposed standard.
The fact that Protocols (and Services making use of them) will make use of the ADQL within their future implementations shall not be confused with an implementation of the BNF of the language itself, which -as said above- is attained by the referenced parser.
* A language should not give instructions for services to do any type of transformation. It is up to the service to understand that if coordinates are given to it in equatorial RA,DEC and the service only has galactic L,B then either an error shall be returned or the calculation done silently.
* In response to: [...] blanket statement near the beginning indicating that where a term in ADQL is identical to that in SQL92, the semantics of that term (and its arguments) are the same as in SQL92[...] I believe the document clearly states what is SQL92 standard and what is ADQL specific.
* In response to: [...]you mention that the region functions are based on STC; I recommend that an explicit statement indicating that the arguments are identical in type and meaning as the arguments defined for STC/s.[...] read points 2.4.1 and 2.4.14 of the document (see excerpt here)
[...]A special attention has to be paid to the REGION function. As can be seen more
in detail in Section 2.4.14, this construct is a general purpose function and it
takes a string value expression as argument. The format of the string is to be
specified by a service that accepts ADQL by referring to a standard format.
Currently STC/s (See [3] and [4]) is the only standardized string representation a
service can declare.[...]
(end VOQL Chair Answer)
RayPlante replies:
- I'm fine with your response regarding TOP (thanks).
- Thanks for the extra link to the implementation; I couldn't seem to find my way to the source from those navigation links on cited page. (I've queried Jeff from some extra detail.) I have no problems with the BN--it looks great. My main issue is with the semantics.
- In response to issue of transformations: It's fine if you do not want to go as far as indicating that it is an error when the transformation can not be done. However, is it your intention that a service be allowed to interpret the POINT positions as in a coordinate system other than the one provided? If the answer is yes, then we're okay; although, there is a bit of inconsistancy with CONTAINS and INTERSECTS addressing transformations but other not. It would be helpful if a statement regarding this point be made either way. The important point is that these functions are intended to mean something regardless of where ADQL is used; that meaning needs to be respected and, more importantly, made clear.
- In response to [...]I believe the document clearly states what is SQL92 standard and what is ADQL specific[...] I apologize if I missed this, but could you point to the statement[s] that are intended to clarify this?
- In response to reference to 2.4.1 and 2.4.14: Perhaps I was unclear. Is there an intended relationship between, say, CIRCLE() in ADQL and the "Circle" sub-phrase in SQL/s? If so, then stating this would help clarify the semantics of this function. Please see my suggested edit for 2.4.5 CIRCLE
Sebastien Derriere (Semantics WG)
I share some of the comments already done, and am reassured by the VOQL TEG answer. There are only minor issues I hope can be clarified:
-
- Properly define the argument and return units (degree/radians) for trigonometric functions (as Mark Taylor above)
- Why does round(x,n) take 2 arguments? Is round(0.5)=1 ?
- section 2.3.2.2.2 indicates in "later additions" the "overlaps" keyword. But note this is a SQL reserved keyword in section 2.1.2.
But as a whole, I approve this document.
VOQL Chair Answer
I am afraid, Sebastien, that you were reading an old version. In the current version linked at the top of this page:
* arguments for trig functions were updated
* explanation on round(x,n) was added
* there is no longer 2.3.2.2.2 section (implementing a comment from B. Hanisch on inconvenience to have so many subchapter levels)
(end VOQL Chair Answer)
Rob Seaman (VOEvent WG)
I approve.
Bob Hanisch (Data Curation & Preservation IG)
Here are some comments that Arnold sent to Pedro et al., based on discussions that he, I, and Alex had in the past couple of weeks.
1. section 2.1.3 correctly states that ISO 8601 is an acceptable
format and that therefore it would be perfectly in order for
services to indicate that they understand it.
But that was not the point I was making.
I suggested that recognizing ISO 8601 date-time be mandatory, for the
simple reason that it would guarantee a single date-time format that
will be recognized by all services. As it currently stands, there is
no such thing and clients would have to query each service as to what
it does and does not understand - or break up the datetime in small
pieces and handle them separately.
I feel that time is sufficiently essential in astronomy to warrant
requiring at least one common format (other than JD or MJD) that
clients can count on to be understood by all servers.
2. The geomerical functions in Section 2.4 are better defined in the
present version, but I still feel that the ADQL-specials complicate
life unnecessarily.
I do not see that there is much to be gained (and, imho, much more to
be lost) by allowing the same thing to be specified by:
CIRCLE ('UTC-FK5-GEO', 25.0, -20.0, 1)
and:
REGION ('Circle FK5 GEOCENTER 15.0 -20.0 1.0')
I am afraid that the duplication is going to lead to a situation where
some clients will support one, but not the other, and services that
support the other, but not the one. That is not interoperability.
Alternatively, everyone has to write code to understand both - the
information is the same, but the formatting subtly different.
(Actually, strictly speaking, the former would need to look up the
AstroCoordSystem in the library that UTC-FK5-GEO points to, so it's
worse off than using STC-S directly.)
This is a waste of resources and would like to advocate that the only
geometric function we recognize be REGION with as argument an STC-S
string.
I think the STC-S syntax in REGION is no more complicated or obtuse
than the arguments in CIRCLE. It has been said that SQL cannot
validate the STC-S string, but the same is true for CIRCLE: SQL can
count the arguments and their types, but cannot make a judgment as to
whether their values make sense - that's left to the CIRCLE function.
The same is true for REGION: SQL can check the number and type of the
parameters (one string), but it is left to REGION to validate the
string.
There is the issue of passing references (like "t.ra" and "t.dec") on.
I am sure we can come to an agreement on that, for instance by
escaping the names, like:
REGION ('Circle FK5 GEOCENTER @t.ra @t.dec 1.0')
3. The meaning of a coordinate system specification in the presence of
referenced coordinate values. To some extent, this refers to
Section 2.4.9 which is not very clear, but it is more fundamental
than that.
What does it mean when I say (as in the example above):
REGION ('Circle FK5 GEOCENTER 15.0 -20.0 1.0')
or:
CIRCLE ('UTC-FK5-GEO', t,ra, t.dec, 1)
Clearly, it cannot mean: "I will interpret (t.ra,t.dec) as FK5",
because they may not be and besides, the server should know.
The most intuitive meaning would probably be: "I want (t.ra,t.dec) in
FK5, so, if they aren't, do the transformation." But that may not be
the most practical solution under all circumstances.
The point is, though, this needs to be clarified and specified very
explicitly.
(End of comments.)
Subsequently there was an e-mail response from Jeff Lusted: "I think most members of the VOQL TEG group are sympathetic to the
concerns you express, but feel we have gone well beyond v1.0 and need
implementation feedback before we make further progress. However, I
speak for myself here, and the opinions in the following points are my
own"
If we need implementation feedback then I do not see how we are quite at the point of accepting this as a REC. We made an exception to this guideline for STC, at my (and others) urging. Some still think this was a mistake. However, STC had been implemented and had been used in several software environments. Who has an ADQL parser that can accept queries in the variety of syntaxes that are described in the document?
I would like to see a more restricted, unambiguous, and implementable definition than what we have here if it is to go forward as a REC.
VOQL Chair Answer:
Issue 1 concerns imposing a specific date-time format. This is to be addressed in either Protocols or Services, rather than language
Issue 2 concerns the usage and semantics of the REGION constructs. The usage of those has been discussed at length and agreed at the last interop meeting in Trieste. The point made worries about possible use of two ways of formatting a query, but does not imply an inherent problem.
Issue 3 might require a small clarification of the final version of the document.
With respect to implementations, there are at least two implementations of the ADQL doc:
- a full parser implementing all the BNF defining the language (c.f.: http://deployer.astrogrid.org/software/adqlparser-2008.2/index.html)
- a service for Table Access from CVO making use of ADQL
A definitive implementation of a BNF-based language should be a parser that understands the language constructs, and this is already available (see above). Other implementations will take place whenever protocols making use of the language are available so that clients can implement them. A parallel example is easily found with the SQL language itself.
I believe therefore the document is ready for REC.
(end of VOQL Chair Answer)
Herve Wozniak (Theory IG)
Approved.
Masatoshi Ohishi (Astro-RGIG)
META FILEATTACHMENT |
attr="" comment="Region and STC explanation" date="1211352567" name="RegionSTC-ADQL-RFC.pdf" path="RegionSTC-ADQL-RFC.pdf" size="198161" user="PatrickDowler" version="1.1" |
META FILEATTACHMENT |
attr="" comment="Notes taken during the VOQL session @ Trieste" date="1211366783" name="notes-20080521-voql.txt" path="notes-20080521-voql.txt" size="4925" user="BenjaminGufler" version="1.1" |
META TOPICMOVED |
by="BrunoRino" date="1209164473" from="IVOA.RfcPages" to="IVOA.ADQLv2RFC" |
|