UDF catalogue Proposed Endorsed Note: Request for Comments
This PEN proposes a mechanism for listing
ADQL extension functions that are required to work the same way across different data centers. It is a supplement to the
ADQL standard, and currently there's text in
ADQL 2.1 referencing it.
The latest version of the UDF catalogue can be found at:
http://ivoa.net/documents/udf-catalogue/
Version control:
https://github.com/jontxu/udf-catalogue
Reference Interoperable Implementations
As to the endorsed content, there should always be at least two services carrying the functions. Regrettably, the required RegTAP res_detail table doesn't carry UDF names (there's always a use case exactly for the most exotic feature you can think of), so to find these services, you'll have to rely on voluntary extra keys parsed by the services.
The RegTAP service at
http://dc.g-vo.org/tap parses the extra res_detail. Use a query like
select access_url
from rr.res_detail
natural join rr.interface
natural join rr.capability
where detail_xpath='/capability/language/languageFeatures/feature/form'
and detail_value like 'ivo_healpix_index%'
and standard_id='ivo://ivoa.net/std/tap'
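To look for a UDF other than the healpix one, the same query can be generated for an arbitrary function name. The following is only a sketch; the helper name build_discovery_query is made up for illustration, and the query text is simply the one given above with the function name swapped in.

```python
def build_discovery_query(udf_name):
    """Return the RegTAP query above, adapted to an arbitrary UDF name."""
    # Double single quotes so the name is a safe ADQL string literal.
    # As in the original query, '_' is itself a LIKE wildcard; that only
    # makes the match slightly more permissive.
    literal = udf_name.replace("'", "''")
    return "\n".join([
        "select access_url",
        "from rr.res_detail",
        "natural join rr.interface",
        "natural join rr.capability",
        "where detail_xpath='/capability/language/languageFeatures/feature/form'",
        f"and detail_value like '{literal}%'",
        "and standard_id='ivo://ivoa.net/std/tap'",
    ])


# Example: print the query for the healpix index function.
print(build_discovery_query("ivo_healpix_index"))
```

The resulting string can be pasted into any TAP client pointed at a RegTAP service that parses the extra res_detail keys.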
Since people have asked for it, here's how to look at things for the healpix functions. With the above query, you will identify (at this point) 24 services carrying the functions; it's a bit tricky to see which really use different software (see
these considerations over at Ops for more on this), but you can trust me that
https://gaia.ari.uni-heidelberg.de/tap and
http://dc.zah.uni-heidelberg.de/tap are different software (actually: I think
VizieR can do it, too, it's just missing from its capabilities).
So, fire up your TOPCAT and paste either URL into the TAP URL box in the TAP client. Then check out the PEN to find:
While ADQL does not support standalone evaluation of functions, a query like SELECT TOP 1 <example> AS res FROM TAP_SCHEMA.tables will return one row with the function result for simple, non-aggregate func-
Armed with this, try, for instance,
SELECT TOP 1 ivo_healpix_index(0, 1, 1) AS res FROM TAP_SCHEMA.tables
(from the first example in 2.1.1), and see that it actually returns 4.
If some volunteer wrote up a little bash/stilts (or pyvo) script to automate that, I'd gladly include it in the spec archive. --
MarkusDemleitner - 2020-09-02
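A rough sketch of what such an automation might look like, using only the Python standard library rather than pyvo or stilts. It assumes that the service offers the standard synchronous TAP endpoint at <TAP URL>/sync and that it can serialise results as CSV when asked with FORMAT=csv (CSV support is not guaranteed for every service). All function names here are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request


def build_check_query(example):
    """Wrap a function call in the query pattern quoted from the PEN."""
    return f"SELECT TOP 1 {example} AS res FROM TAP_SCHEMA.tables"


def build_sync_request(tap_url, adql):
    """Return (endpoint, form-encoded body) for a synchronous TAP query."""
    endpoint = tap_url.rstrip("/") + "/sync"
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "csv",  # assumption: the service can serialise to CSV
        "QUERY": adql,
    }
    return endpoint, urllib.parse.urlencode(params).encode("ascii")


def parse_single_value(csv_text):
    """Extract the one data cell from a header-plus-one-row CSV result."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[1][0]


def check_udf(tap_url, example, expected, timeout=30):
    """Run the example on a live service and compare with the expected value."""
    endpoint, body = build_sync_request(tap_url, build_check_query(example))
    # Passing data= makes urllib send a POST request.
    with urllib.request.urlopen(endpoint, data=body, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return float(parse_single_value(text)) == float(expected)


# Usage (needs network access, so it is left commented out here):
# check_udf("http://dc.zah.uni-heidelberg.de/tap",
#           "ivo_healpix_index(0, 1, 1)", 4)
```

The numeric comparison via float() is a simplification that only suits functions returning a single number; string-valued or geometry-valued UDFs would need their own comparison.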
Implementations Validators
Not really applicable here. It might be nice to have a few standard queries per function with the intended results, but since
ADQL doesn't have queries without from clauses and there is essentially no content in the underlying databases you can rely on, they're really hard to write.
Perhaps a case for an
ADQL extension: Facilitate writing tests?
Comments from the IVOA Community during RFC/TCG review period: 2020-02-15 .. 2020-03-30
The comments from the TCG members during the RFC/TCG review should be included in the next section.
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.
Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document
- Comments by GregoryMantelet :
-
ivo_apply_pm(...)
: currently, it is in the "HEALPIX-related" section, but it is not HEALPix-related. I suggest moving it to a different section (e.g. "Coordinates") or renaming the "HEALPIX-related" section to something more generic.
- We've taken out the whole function; we'll need to think a bit more about which semantics we want, anyway -- MarkusDemleitner - 2020-08-11
-
ivo_healpix_index(spoint, ...)
: rename the 1st argument from spoint to point, or to something else that does not make one think of the datatype spoint of PgSphere
-
gavo_transform(from_sys, to_sys, geo)
: what are the allowed syntaxes for the two 1st arguments? Can the equinox be specified in there?
- My suggestion would be that the third-party UDFs just have what the operators give in terms of documentation. This one I've updated in the meantime, and I will reflect that in the document; in general, I'd say problems in them should be reported to the service operators rather than the UDFcat editors. -- MarkusDemleitner - 2020-08-11
- Maybe a few examples could be added to the functions. This could be helpful for implementers (especially for
gavo_transform(...)
)
- For the normative ones, we've added examples in WD-20200806. For the others, again, I'd say that's up to the service operators. -- MarkusDemleitner - 2020-08-11
Comments from TCG member during the RFC/TCG Review Period: 2020-02-15 .. 2020-03-30
WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
TCG Chair & Vice Chair
I very much like the idea of collecting these definitions in a central location, and find this particular location to be easy to understand, and hopefully easy enough to find. I endorse the note as is with the assumption that the link to the harvesting script (
http://www.ivoa.net/documents/udf-catalogue/20200806/harvestfuncs.py) will be fixed in subsequent deployments.
As discussed in other comments, I wonder if the maintenance and updating of this note will be a little too process heavy. Should that turn out to be the case, I would quickly endorse an update that describes a lighter-weight open development style process that stores the function definitions elsewhere.
I also understand why the "Implementation Validators" section above is declared not really applicable, and I think that's OK for now. Understanding the complexities (including for cases where the spec is deliberately flexible), I would still encourage more thought in that area after the approval of the note. Having some sort of test suite associated with a function definition could build confidence for interoperability, and could help define edge cases that may be inadvertently ambiguous.
--
TomDonaldson - 2020-11-16
DAL is happy with the goal and main content of this Note and will be happy to support its Endorsement provided that the following comments are taken into account and fixed in the document.
- purely technical: the .pdf and .html available online and linked at the top of this page lack the internal references and references list. We suspect this is due to some issue in building them, since building the package from scratch from the svn provides them and fixes what looks like a locale issue in the version control metadata.
- general remark (also raised by G. Mantelet above): examples of usage of the functions would be really valuable.
- Sec. 1 (Introduction), second paragraph ("In order to..."): this sentence is quite long and difficult to follow. We'd like it clarified and the reference to ADQL to be like "starting from ADQL-2.1" or "ADQL-2.1 and subsequent versions". We suggest a look at current wording in ADQL-2.1 (currently here: https://github.com/ivoa-std/ADQL/pull/26/files) to align the PEN statements.
- Forgot this for the WD; in commit cfb2c317b4e712eab2adab5bc5821b301b69d260, I'm now writing: "In order to avoid different signatures or semantics on functions offering identical or similar functionality, this document defines UDFs with names prefixed with ivo_. If services have functions with this names, they must work as defined here." -- do you like this better? As to a reference to ADQL 2.1: Why would you want that? Right now, we will make no ADQL 2.0 service invalid (because none has noncompliant ivo_ functions); in the future, even ADQL 2.0 services shouldn't deal with ivo_ differently from what we are saying here, no? -- MarkusDemleitner - 2020-08-11
- Sec. 1 (Introduction), third paragraph ("This note documents..."): also here we'd like some re-wording, plus we don't agree on the safety of having the maintenance devoted to svn commits (slightly better if GitHub issues) when the normative list actually comes from RECs and the non-normative comes from Registry harvesting (actually the MOC related UDF come from neither one...). We suggest to fix this by stating something on the implementation of these UDFs: if they come from RECs reference is in the REC, if they come from TAP service homogenization please list the providers previously using them (like gavo_ + cds_ -> ivo_) to understand where the norm comes from. At that point maintenance will be just a matter of re-checking/updating the Note on a regular basis or at REC issuing.
- The trouble is that the generation of the non-RegTAP functions hasn't worked this way. The HEALPix ivo_ UDFs were forged by me and Grégory, one of us standing in the other's office door. So, there's really no prior art we could claim, and if the functions are accepted, it's on the TCG members. As to saying anything about VCSes, we're now just referencing the DocStds -- MarkusDemleitner - 2020-08-11
- Sec. 1 (Introduction), fourth paragraph ("Note that no function..."): we are not sure stating a "must" is part of an EN, we suggest, again to try to harmonize the sentence to the vision the ADQL text reports: this is a best practice, for everyone to follow.
- Hm -- either it's a must or it's not; and we go for endorsement exactly to make this binding (a "best practice" could just be a note). Asked another way: Why would anyone want to violate this "must"? Would you actually want to downgrade validator diagnostics to a warning or an info if it found that a UDF doesn't do what it's supposed to? (well, I'm not mentioning that I still have a few "legacy" ivo_ functions that I should rename gavo_ -- but you ought to be able to come after me for that) -- MarkusDemleitner - 2020-08-11
Other than the above some typos or clarification requests (some other, minor, sent to authors directly) are listed here:
- Sec. 2: maybe "precision" rather than "length" of (REAL) floating point values
- 2.1.1 ivo_healpix_index: ra/dec vs. long/lat parameters differ in the text w.r.t. the signature
- 2.1.2 ivo_healpix_index: hpxOrder missing in signature, reference to PgSphere spoint (see G. Mantelet comment)
- 2.1.4 ivo_apply_pm: tangential plane, we suggest usage of the pmra description w.r.t. the introductory one
- if it's independent of the frame, why not use lat, long, pmlat, pmlong, or highlight the "Note that..."
- 2.2.2 ivo_nocasematch: please reword the "In ADQL 2.1 and later, use the ILIKE operator instead." into a softer "Please consider that...suggest usage of ILIKE" or the like.
- 2.3.1 ivo_interval_overlaps: the shortened param names lead to typographical ambiguity on the "l" and there's a double "h1" in the signature. lowerN, higherN would be better?
- 2.3.2 ivo_interval_has: we'd prefer longer param names for better readability
- there are 2 appendixes "A"
- A1.6 is it really a 13c_angpix function? We suspect a typo
-
- All of these have been considered for the WD or the most recent commit, or the functions they refer to have been taken out of the spec. -- MarkusDemleitner - 2020-08-11
- Providing this catalogue is a great idea for both service and client developers and for interoperability in general. It is a tool necessary to avoid seeing the same functionality implemented with different names or parameters in different services.
- Giving naming rules for UDFs is also important. Identifying something in the global VO picture just by looking at its name is an important step toward a better reliability.
- Function prototypes are well defined as well.
- The risk of the exercise is that some functions behave slightly differently from one implementation to another, but in the present case the descriptions are clear enough to avoid such ambiguities.
--
LaurentMichel - 2020-10-22
The Note is well-written and comprehensible. It does not directly affect the GWS standards, but the goal and the content of this note are extremely relevant. I do not have comments that were not already expressed by other WG/IG and that have not been addressed, so
I am willing to Endorse it.
I'm in general agreement with Ops that this is a good list to keep somewhere, but unsure whether the need for it to be continually updated works with or is in conflict with the process for a Note. I'll sign off if general consensus is to keep it in a Note by the time other WGs have weighed in.
I, frankly, am not happy about the sluggish RFC either. But inventing a new process for the relatively minor thing of reviewing new interoperable ADQL UDFs just seems disproportionate to me. And, of course, if you want a wide review of something, I doubt that changing the process will make much of a difference in terms of net agility. -- MarkusDemleitner - 2020-10-19
With my TAP implementer hat on, I do find it useful to see what other data providers are calling their own support functions, and what functions already exist from standards other than ones we have implemented locally, as we add some UDFs to new and existing services. There are some UDFs already available in MAST services, not all useful to include (especially under their current names; aliases may fix this), namely ones tightly coupled with other MAST interfaces, allowing for parallel queries to those in CASJOBS, etc. If we keep this a note, I may propose the inclusion of some generically useful existing UDFs from MAST in a future version, particularly UDFs for reference frame/scale conversion and the like.
--
TheresaDower - 2020-10-07
This Proposed Endorsed Note lists a series of User Defined Functions (UDF), which are now related to
ADQL but, as recalled in the text, may in future not be naturally associated with a particular standard.
For the identified functions, a human readable description is provided. A given description lists the set of input parameters (together with their type and physical meaning) and the output.
The goal of the Parameter Description Language is to deal with these descriptions in a standard way (cf. section 2 of
http://www.ivoa.net/documents/PDL/20140518/PR-PDL-1.0-20140518.pdf). Indeed one of the primary goals of PDL was to provide the community with a grammar allowing data/service providers to describe the parameters of their functions.
- A PDL description of the functions described in this note should be provided: the XML description files should be online and the note should contain the links pointing to the descriptions. If the PDL description is compact, this could also be reported in the annexe of the note.
- This use case is welcome for kicking off the discussion about how to register (i.e. where to put and store) the PDL description of a given resource in the IVOA ecosystem.
- Since PDL is dealing with semantics (a concept describing the physical meaning is attached to each input/output parameter), the concepts describing the parameters of UDFs should be in some IVOA vocabulary. If we follow VOC2 these should be official IVOA vocabularies, shouldn't they?
The advantage of providing PDL descriptions for UDFs is on the client side: client software may parse the description and perform parameter consistency checks before invoking the remote function. It can also raise an alert if the meaning of a parameter is misunderstood by the users.
Carlo and I have discussed this a bit, and we figured that while having a uniform system of describing interfaces is highly desirable and thus PDL should be adopted by standards requiring this kind of metadata, this EN probably is not a good place to begin that; for one, the signatures are already implemented and work well for the sort of weak typing we find in SQL. More importantly, the UDF catalogue simply gives descriptions as required by TAPRegExt, and hence PDL would need to be adopted there; adding the PDL descriptions in the UDF catalogue would then probably be relatively trivial. -- MarkusDemleitner - 2020-10-19
General comment: it is useful to have this information in a centralised form, and the Note is generally well-written and comprehensible. I am willing to Endorse it. However, although an EN is more lightweight than a Recommendation-track document, the endorsement process is still typically quite slow. An alternative would be to have something more informal such as a wiki page (it's worked quite well for
SampMTypes, which has no enforced restrictions on editing, but has in practice been very stable).
One or two specific comments on the text beyond those already noted:
- Sec 2.1.3: make explicit that this function uses the HEALPix NESTED scheme.
- Sec 2.1.4: Consider use of mas/year rather than degrees/year here? That's the unit used in the Gaia source catalogue, which will presumably be a common input for this function. But if this description is codifying existing usage, it may be too late to make the change.
- Sec 2.2.4: Make explicit (following RegTAP) that the function returns zero in case of no match
TCG Vote : 2020-11-26 - 2020-12-11
If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.
| Group | Yes | No | Abstain | Comments |
| TCG | | | | |
| Apps | * | | | |
| DAL | * | | | |
| DM | | | | |
| GWS | * | | | |
| Registry | | | | |
| Semantics | | | | |
| DCP | | | | |
| KDIG | | | | |
| SSIG | | | | |
| Theory | | | | |
| TD | | | | |
| Ops | * | | | |