This document will act as
RFC centre for the
Astronomical Data Query Language v2.0 Proposed Recommendation.
This document is currently under TCG evaluation for approval since 15 Sep 2008. TCG members shall approve or otherwise in the reserved section at the bottom of these pages.
Section 1
2nd paragraph: "The
ADQL specification pretends...". I don't think this is was you mean. Better to just say "The
ADQL specification avoids any distinction..."
or "The
ADQL specification makes no distinction..."
VOQL-TEG Answer: OK. The document will be updated accordingly.
3rd paragraph, last sentence, "which" should be "that".
VOQL-TEG Answer: OK. The document will be updated accordingly.
4th paragraph, "keywords" should be "key words" (to avoid confusion with, e.g., FITS keywords. The cited RFC also says "key words".
VOQL-TEG Answer: OK. The document will be updated accordingly.
Section 2.3.2
I know that this territory has been covered many times now, but I am concerned to see all of the region names explicitly defined in the
ADQL grammar rather than referring to the IVOA standard for regions,
STC. Almost all of this part of the document could be reduced to a reference to
STC-S. I am not enough of an SQL expert to know whether it is better to express these all as just arguments to a generic REGION function, or to bring the region names directly into the
ADQL. But there is no reference here to
STC whatsoever, like it does not exist.
VOQL-TEG Answer: As shown in the current (Trieste 2008) meeting, the REGION construct takes a string representation of the region and converts it to internal format. The format of the string is to be specified by a service that accepts
ADQL by refering to a standard format. Thus, accepting
STC (S or X) is a service capability (see attached presentation by P. Dowler). As for the specific shapes, we chose the ones that cover a large fraction of the use cases and which could be defined unambiguously (see RECTANGLE comment below). It has been agreed, however, that a more explicit mention shall be done in the document pointing to the
STC one.
2.3.2.2.3 and 2.3.2.2.4
Are these really necessary? Why do these calculations need to be done in the
ADQL language (implying that they must be server-side functions)?
VOQL-TEG Answer: It was felt these functions would be useful extensions to have for a query language. Other, similar initiatives in extending SQL (eg: OpenGIS) have utility functions such as AREA and CENTROID, amongst others. Further functions might be added in the future, where calculations within the
ADQL syntax itself might be considered overly burdensome.
Section 2.3.2.2.5.6 INTERSECTS
Near the end it says the "intersect function is symmetric in its arguments." I think it would be more concise to say the "intersect function's arguments are commutative."
VOQL-TEG Answer: OK. The document will be updated accordingly.
Section 2.3.2.2.5.11
Last sentence says "Transforming a rectangle to another coordinate system will generally result in polygon." This is not correct, is it? I mean, a RECTANGLE will have at least one side that is not a great circle. What coordinate transformation will turn that side into an arc of a great circle (which is required for a POLYGON)?
VOQL-TEG Answer: RECTANGLE is poorly defined. It was agreed to change this to BOX, which will be defined as a short-cut for POLYGON and consistently with
STC.
Section 2.3.2.2.6
Second to last sentence, "gratuitously" is misspelled.
(By they way, I think the subsectioning is totally out of control here! Five levels of subsections!! This is not really useful.)
VOQL-TEG Answer: OK. The document will be updated accordingly.
First, I would like to thank Pedro + Inaki for the new VOQL document -- the regions stuff looks now understandable, and the document can be used to start implementations as SQL extensions.
I have however a couple of major problems, and a few other considerations that I would like to stress before the document becomes an IVOA recommendation.
Major points:
- At the Beijing meeeting, it was agreed to implement a sphericalDistance function as sphericalDistance(ra1, dec1, ra2, dec2). This function is completely independent of STC, and it follows the standard definitions of SQL functions: any of the 4 arguments may be either a number or a column reference.
VOQL-TEG Answer: A sphericalDistance function defined with numeric values and implicitly the same coordinate system was considered a subset of the general DISTANCE function.
- The definitions of functions in the document should be completed -- the units of arguments and returned values must be specified in the final document. And for my own point of view, I would strongly recommend to use radians for the sphericalDistance function, and steradians for the returned value of the AREA function, in order to be coherent with the other standard math functions existing in SQL.
Other comments:
- About the introduction of the REGION or Point objects: these are not numbers neither strings and cannot be directly manipulated by SQL, hence the introduction of the new <geometry_value_expression>. It is not clear how these objects are stored in relational tables; one may assume they are stored as e.g. string or binary objects, or they are computed from the other attributes of the table. I feel that some explanations about such non-atomic data would help the reader to understand how it works. The concept of a function which returns an object like CENTROID would also benefit of some explanation -- it differs from any SQL standard function since it does not return a number nor a string. What do you get if you issue a select statement like
Select centroid(mytable.centre) from mytable
which, according to the BNF, looks correct ?
VOQL-TEG Answer: It is up to the implementing parties to decide how the geometry attributes are to be stored or if they are to be materialized at all. The same applies to
situations where a
is present in the SELECT list: The service is free to return a standard String serialization of the geometry.
- the ELLIPSE region is especially important in astronomical applications, like cross-matching or studies of surface brightness of galaxies. Was it omitted because the ellipse is a too complex geometry ?
- About the query expression: I share Jeff's comments about readability
where CIRCLE('ICRS', 123, 45,12) contains Point('ICRS', mytable.ra, mytable.dec) (b1)
looks more understandable than
where CONTAINS(Point('ICRS',mytable.ra, mytable.dec), CIRCLE('ICRS', 123, 45,12)) = 1 (b2)
The functionnalities are equivalent, and there is surely a way of mapping exactly (b1) into (b2) which is SQL-92 compliant.
Assuming that a table contains an attribute "coo" which represents a point, the 2 expressions would be
where CIRCLE('ICRS', 123, 45,12) contains mytable.coo
versus
where CONTAINS(mytable.coo, CIRCLE('ICRS', 123, 45,12)) = 1
VOQL-TEG Answer We had long discussions within the group about function vs. operator approach. We finally decided to go for the functional approach as it would conform an automatically valid SQL statement, provided that the mapping of the actual parameters on to the precise definition of the function is correct. As Alex Szalay showed in the Cambridge Interop, the functional approach could be implemented directly in SQL.
- I don't see the necessity of separating math and trigo functions; the table in section 2.3.1 gathers both functions... Maybe just modify the caption of this table ?
- Terminology: rather than <system_defined_function> I would prefer the term <adql_defined_function>
VOQL-TEG Answer BNF construct names like <system_defined_function> and <user_defined_function> have been discussed and agreed within the VOQL-TEG already.
- about the section numbering: it looks strange to require a 6-level numbering to present ADQL; is it to compete with the STC document ?
It might look rational for the writer, but it looks really not balanced for the reader ... and the table of contents is therfore lacking the important topics related to the region stuff.
VOQL-TEG Answer: Ok. See response to BobHanisch
for his last comment.
In summary, the new document clarifies the concepts (even if some details could be even better clarified :-); and the presence of the BNF appendix is really helpful.
(1) The standard as given is very simple in terms of coordinate frames etc -- assumes all handled elsewhere. What should a user do if they wish to query a table that is in galactic coordinates or B1950.0? What if the user wishes to write queries in other coordinate systems?
(2) The functions defined in the paper have poor extensibility because of these fixed fortran-like arguments. Why not plan for the future extension of the standard, and also make things more readable, with something like this:
POINT(frame="ICRS" equinox="1950" position2d="270.0, 22.0")
VOQL-TEG Answer: We expended a lot of energy in debating syntax, particularly the difference between a function based approach and an operator approach for things like CONTAINS and INTERSECTS (See the answer to Francois' points above). This is of a similar ilk, with questions of usability and extensibility and ease of implementation. The short answer is that it is an implementation question (though that is too simple!). You can perhaps see it partly depends on documentation (the parameters could be well documented in a user manual) and also on the presence or absence of an underlying parser. In the end we compromised and went for the simplest functional approach, as it would be a virtual pass through to a valid SQL statement.
(3) The given document is difficult to read with its single-minded focus on bulding a BNF grammar. It needs examples to show the semantics and semiotics of the BNF statements. Without examples it is difficult for me to understand semantics: either how customers might respond, or how difficult it is to implement. How, for example is the LONGITUDE(POINT) used?
-- Can I say this:
SELECT * from Table where LONGITUDE(POINT) between 270 and 285
If so, what kind of tables have POINT structures in them?
-- Can I say this to implement a SIAP?
SELECT * from Imagetable where INTERSECT(POLYGON, SDSS_DR6_FOOTPRINT)
-- Can I say
SELECT POLYGON from Imagetable
What kind of tables can be interpreted to have POLYGON attributes?
VOQL-TEG Answer: In Trieste we agreed to create an auxiliary document with examples.
(4) What kind of table can I specify in the select clause? Can I use VOSpace, like this:
SELECT * from vos://nvo.caltech.edu/myTable.dat where INNER JOIN SDSS_DR6 ....
I ask this, because the DAL group is assuming that everyone will have a VOSpace that can provide temporary storage, and that crossmatches are done through ADQL queries. An alternative is for VOQL services to allow qualified users to upload tables, is that envisioned?
VOQL-TEG Answer: Yes, a VOSpace URI could be used as a table name if escaping it with double quotes. As can be seen in the grammar, the table name construct allows either a regular identifier (made of latin letters, digits and/or underscores) or a delimited identifier (double quotes being the escape/delimiter characters).
(5) There is already a specification of spatial regions (*) that is semantically identical to the STC standard, which is and IVOA Recommendation.
But the definitions in the ADQL document use different words and different meanings from this, (eg one says BOX and the other says RECTANGLE). Have the ADQL authors read the existing IVOA standards? If so, what has been inherited from the STC standard into the ADQL?
VOQL-TEG Answer: See response to BobHanisch, Section 2.3.2 and 2.3.2.2.5.11 (above)
(*) http://www.ivoa.net/Documents/Notes/STC-S/STC-S-20071205.html, section 4.2
- I agree with Francois that the the Functions table in sec 2.3 needs some more attention:
- are trig function arguments and return values in degrees or radians?
- how does the seed work in the
rand(x) function? I guess x is the seed - is it an integer or a float? It looks like you need a seed every time you invoke it, but that wouldn't be very convenient.
VOQL-TEG Answer: The seed is supposed to be an integer. We will detail the data type for each function argument in the next version of the document.
- how come the
round(x,n) ("Round to nearest integer") function has two arguments?
VOQL-TEG Answer: The function description is not complete. It rounds to the given number of decimal places, with the default being to round to the nearest integer. You use a negative number to round to the left of the decimal point. The function with two args is supported by the main manufacturers (both parameters are required in SQL Server). The function description will be corrected in the next version of the document.
- Some other functions need to be documented more carefully:
-
LONGITUDE and LATITUDE - the units of the returned values need to be specified (presumably degrees). So do the ranges: I guess LATITUDE will return a value in the range [-90,+90]. LONGITUDE suggests (to me) a quantity in the range [-180,+180], but given that it's basically talking about Right Ascension people will probably be expecting [0,360]. Or can it be either, left up to the implementation? (I suggest not). Either way, document the intention. Permitted ranges should also be noted for supplying latitude and (especially) longitude values as arguments to the various geometry functions - will values outside the nominal range of lat/long be wrapped automatically or are they errors?
-
AREA - what are the units of the return value?
- VOQL-TEG Answer: The
LONGITUDE and LATITUDE functions are meant to extract the first and second coordinate from an attribute of POINT type. No coordinate transformation is supposed to take place.
- Coordinate systems: Is there a vocabulary for these? The only example given is 'ICRS' and the BNF is specified only as
<string_value_expression>. Are coordinate system specifications intended to be lifted from STC (or elsewhere) or is it intended that the implementation may define any values it likes for known coordinate systems? In practice - if only for testing purposes - it is often convenient to use "the default coordinate system", i.e. whatever is in use in the database; is there some way this can be selected?
VOQL-TEG Answer: In Trieste we agreed to define a vocabulary for the possible coordinate systems.
- It seems a bit strange that you can extract the longitude and latitude, but not the coordinate system from a POINT. But maybe it's OK - I must admit I don't have a clear idea of how these constructs are expected to be used.
VOQL-TEG Answer: A function for extracting the coordinate system from a POINT will be added.
I am not happy with the way ADQL handles STC metadata in the PR.
First, as a minor point, it would be nice if an isodatetime data type
were added, so that we could directly use ISO 8601 datatime strings:
yyyy[-mm[-dd[Thh[:mm[:ss[.s...]]]]]]:
I.e., the limited ISO 8601 format only, as defined for FITS and in STC.
VOQL-TEG Answer: RDBMSs implicitly convert strings to internal (datetime or timestamp) form using a variety of techniques; ISO8601 format is an acceptable format already. As with other string representations, it is a service capability (possibly mandatory) to understand specific formats.
The PR introduces a "geometry" data type, but it really is a
hodgepodge of stings - essentially, a geometry data type is defined,
if I understand it correctly, as a string that contains a function
call, with a variety of functions and parameters that seem to be
reinventing STC.
If one wants to introduce a new data type, there is a much more
elegant and ready-made option: an STC-S string.
I would be more inclined to call it an STC data type, since the term
geometry data type is misleadingly narrow, but that can be discussed.
VOQL-TEG Answer: As described in the attached presentation (and above) STC can be used within the REGION construct; it is up to a service to specify that it understands STC and they can support as little or much as is feasible. The ADQL language itself remains independent of any specific string representation or version thereof. In the Interop discussions it was explicitly stated that we expect TAP services to use standards (for region and the coordinate system arguments). The text within the PR will be expanded to make this clear.
Having defined such a data type, one can use the to-be-developed STC
library to verify its validity and manipulate (interpret) the string,
and define a set of STC core functions to provide the necessary
operations, such as intersection, union, contained_in, etc.
The STC-S string has the added advantage that it can also handle
coordinates, so that problem would be resolved, too.
As it is, the PR basically develops its own STC model, through an
ad-hoc set of functions and parameters. This, I think, is
undesirable. If the IVOA has an accepted standard for certain
structures, one should use it, rather than roll one's own. The reason
is a very practical one: it encapsulates the model issues in that
particular standard, allowing its related library to handle the
details in a transparent and uniform way across applications, allowing
models to evolve as necessary without impacting the users.
If ADQL and SIAP and SSAP and Registry all would start using their own
definitions of, say, Circle we create chaos for the user.
I have heard many complain that incorporating STC greatly complicates
the systems into which it is to be integrated. That is nonsense.
I have said it numerous times and will say it again:
There is nothing wrong with applications recognizing only a limited
subset of STC structures and coordinate systems - returning an error
if they encounter something they cannot handle.
That is the way STC was incorporated into VOEvent - and it works.
It's perfectly acceptable to accept only simple strings and a limited
set of shapes and coordinate systems. It is not hard to parse those
strings and the enormous advantage is that (a) one is compliant with
an IVOA standard and (b) one is ready to expand functionality to
include more cases, shapes, and coordinate systems at any time,
without any changes to the interface, just by adding code to the
server software.
Aside from the fact that it is not a good idea to develop a new STC
model for ADQL, there are a number of specific problems with the way
it has been done, as well.
The only coordinate system flavor recognized in the PR is 2-D
spherical. Coordinates are specifically assumed to consist of a
longitude and latitude. This may seem sensible, but it means that one
is painting oneself into a corner for future extensions.
VOQL-TEG Answer: The defined geometry types are 2-D and the LONGITUDE/LATITUDE functions unnecessarily restrict it to spherical coordinates (see above for more about LONG/LAT).
VOQL-TEG Answer: The PR was arrived at by balancing usefulness and complexity and the TEG feels we have found the best comprimise. We have considered future extensibility and found that it is plausible to make only modest extensions beyond the current PR. It is feasible to add 1-D regions (intervals) and 2-D ellipse in a way that is consistent with the PR. After that, one would have to break with SQL to add much value.
Another issue is that, again, if I understand it correctly, the
parameters in the function calls may be valued or consist of column
references. The problem is that there is no way to handle the
coordinate system if the coordinate elements consist of references.
I cannot say "function ('ICRS', table.ra, table.dec)" because I do not necessarily know whether the table contains ICRS positions - nor
should I have to.
VOQL-TEG Answer: The first argument should be another column in the table, or the whole construct (a POINT) should be taken from a single column of the table. See attached presentation.
If I ask whether the positions in a table lie within
a certain region that I define in a certain coordinate system, the
server (knowing what its own coordinate system is, one would hope)
should perform the necessary transformations.
The PR specifies that when a server cannot perform a necessary
transformation, it should return a NULL. I question whether this is a
sensible thing: it is indistinguishable from "I have no data". I would
much rather see a standardized warning returned.
Section 2.1.2. RAND is listed as a reserved word, but isn't in the list of mathematical functions.
VOQL-TEG Answer: John, Not quite sure which copy of the spec you have. In mine, RAND is there in the table of maths functions. JeffLusted
Functions to do bitwise operations would be useful - some datasets such as the SDSS need them to select objects with particular flags set.
VOQL-TEG Answer: We are aware the the language still has shortcomings, the absence of bitwise operations being one. These should be up for consideration in the next version. JeffLusted
Comments by TCG
Chairs should add their comments under their name.
Tom McGlynn (Applications WG)
Christophe Arviset (TCG)
Keith Noddle (Data Access Layer)
Matthew Graham (Grid & Web Services WG)
Bob Hanisch (Data Curation & Preservation IG)
Herve Wosniak (Theory IG)
Mireille Louys (Data Models WG)
Francois Ochsenbein (VOTable WG)
Pedro Osuna (VOQL WG)
Ray Plante (Resource Registry WG)
Sebastien Derriere (Semantics WG)
Rob Seaman (VOEvent WG)