*IVOA May 2021 Interoperability Meeting - DAL session
*Time: Wednesday May 26 20:30 UTC
Participants: ~66
*Schedule
- Markus Demleitner - Datalink XSLT
- Grégory Mantelet - Wrapping up ADQL-2.1
- François Bonnarel - TimeSeries discovery and Access DAL possible solutions
- François Bonnarel - SIAV2-next and Simple dataset discovery
*Notes
*Datalink XSLT (Markus)
Showcase: http://dc.g-vo.org/static/datalinks.shtml
XSLT and JS to style Datalink-provided products.
Basic: turn the VOTable into an HTML table and extract a preview (if available). Semantics are still not nicely reported.
Further: turn it into a tree-like structure with the help of the Datalink core vocabulary. Each vocabulary term is turned into a label and assigned a description. This requires JS to work.
Prettify SODA parameters using Aladin Lite for graphical interaction and other features.
It can be used with _your_ Datalink output just by adding the XSLT preprocessing on top of the VOTable/XML.
Looking forward to feedback and contributions (github repo: https://github.com/msdemlei/datalink-xslt).
JD: CASDA has a similar solution.
*Wrap up ADQL-2.1 (Grégory)
No issues on 2.1 anymore. New PR^2 soon and RFC.
Walk through changes from 2.0 to 2.1:
- coordinate system argument deprecated for geometry functions
- added syntax using POINT instead of coordinate couples only
- DISTANCE as the preferred syntax to express positional x-match
- string function additions
- CAST function
- datatype system definitions
- care for identifiers, to avoid users quoting reserved keywords and such
- UDFs with prefix, and Catalogue reference in Endorsed Note
- ivo_healpix_index UDF as a step towards MOC support
- "pagination" through OFFSET (+ ORDER BY + TOP)
- IN_UNIT conversion
- Set operations
- Common Table Expressions
Implementations are on the way.
List of next features and changes.
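Editor's sketch (not from the talk): the "pagination" item in the walk-through above combines TOP, ORDER BY and OFFSET. A minimal Python illustration of how a client might build successive page queries follows. The helper name `paged_queries`, the page size, and the choice of `ivoa.obscore` ordered by `obs_publisher_did` are assumptions made for the example, and the clause order (OFFSET after ORDER BY) is how we read the ADQL-2.1 draft; check it against the final text.

```python
def paged_queries(table, order_column, page_size, n_pages):
    """Yield one ADQL query string per page (hypothetical helper).

    A stable ORDER BY is needed so that consecutive pages do not
    overlap or skip rows.
    """
    for page in range(n_pages):
        offset = page * page_size
        yield (
            f"SELECT TOP {page_size} * FROM {table} "
            f"ORDER BY {order_column} OFFSET {offset}"
        )


# Illustrative table and column names; any table with a unique,
# sortable column would do.
queries = list(paged_queries("ivoa.obscore", "obs_publisher_did", 1000, 3))
for query in queries:
    print(query)
```

Each string would then be submitted as a separate TAP query; the client stops once a page comes back with fewer rows than the page size.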
Q: Petr Skoda: Recently I needed to query SIMBAD to find the longest target name (to create a table column wide enough in our local database), and it appeared that ADQL does not support LENGTH(string). Does the new ADQL support it?
Markus: No, there is no LENGTH function yet in ADQL.
Jesus Salgado: CHAR_LENGTH is a reserved word, but I think it is not commonly implemented.
PS: Thanks all for the explanation - I supposed that LENGTH or CHAR_LENGTH was associated with things like LOWER etc ... And was surprised to find a reserved word (thanks Jesus...) which did not work ...
GM: About LENGTH(...) (or any other variant), I think it is something that we could at least consider for a next version of ADQL. I'll add this to my list and we'll see later.
PS: Thanks, Gregory - IMHO it should not be so complicated.
MD: Petr: It's trivial as long as we have ASCII. It's nasty once we don't. You'll quickly end up in the uncomfortable realm of VOTable's unicodeChar type, which is in dire need of repair...
MT: A LENGTH that just comes up with some number (possibly arguably wrong for unicode) would still be 95%+ useful.
MD: But I'd not veto a LENGTH that says "ASCII only, please".
PS: Mark, thanks for supporting me - the 80/20 rule would be enough (no need for 95 % ;-)
GM: I agree, Mark. And I also agree that some clarification about ASCII or Unicode would be needed.
Anne Raugh: Does thinking in terms of "bytes" vs "characters" help?
MD: Characters vs. bytes is exactly the problem...
AR: Just wondering if it would be easier to count bytes without addressing what the bytes contain, and implement something that recognizes "characters" down the road.
MD: VOTable, where the results probably end up, has char (which is ASCII only in theory but contains UTF-8 in some cases) and unicodeChar (which contains UCS-2, which almost nobody can do any more).
MD: Going to UTF-16, which is UCS-2's heir apparent, only deepens the trouble. So... I'd say "if length, then ASCII only".
We can always say "non-ASCII behaviour is implementation-defined".
PS: Suppose ADQL with LENGTH is applied to published catalogues with some names (objects, surveys ...) ... do we have many catalogues (e.g. in SIMBAD, VizieR ...) where non-ASCII strings are used?
MD: Ouch. What non-ASCII is in there? How is it retrieved right now? unicodeChars?
XW: No non-ASCII data in NED right now. But we are thinking about it for certain units for photometry data,
XW: like superscript and subscript in unit strings.
MD: Units are (probably) metadata, so XML does all the escaping for us, so it's not a problem.
MD: However: if you re-do unit strings, I'd politely request to use (at least in VOTables) VOUnit.
MT: The VOUnit standard restricts to ASCII. NED isn't obliged to follow that of course, but it might be worth considering.
PS: I would expect something like author or observer names in some observing logs, names of observatories etc ... But IMHO what goes to VizieR must be in a FORTRAN-like format README and pure ASCII.
Gregory Dubois-Felsmann: Yes, that should be a client-side task, to manifest “pretty-printed” units where appropriate.
XW: The units are the ones the data was published in, for display purposes.
XW: They are not uniform.
XW: Mark, I will check out VOUnit. NED data was there before the VO.
Anais Oberto: In SIMBAD, names are only ASCII, but publication titles and abstracts do have UTF-8.
XW: NED is the same as SIMBAD currently.
MD: And you're packing those into unicodeChars in TAP responses?
AO: I assume it is like it is, in UTF-8? I will check.
GM: Markus, I doubt so. It is probably a regular char datatype. I am pretty sure I did not implement unicodeChar in my library (not knowing at that time how to deal with or identify such a thing, and in which case it would be useful)... That's definitely something that could be fixed anyway while I upgrade it entirely to TAP-1.1 (with the new way to express datatypes... as in VOTable).
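Editor's sketch (not from the session): the "characters vs. bytes" point in the LENGTH thread above can be seen with a few strings in Python. The helper name `lengths` and the sample strings are made up for illustration; the sketch only shows why a single LENGTH number is unambiguous for ASCII and ambiguous otherwise.

```python
def lengths(text):
    """Return three plausible answers to 'how long is this string?'."""
    return {
        # Number of Unicode code points (what Python's len() counts).
        "code_points": len(text),
        # Number of bytes when stored as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Number of 16-bit units in UTF-16; UCS-2 has no way to store
        # characters that need two units.
        "utf16_units": len(text.encode("utf-16-be")) // 2,
    }


print(lengths("NGC 1275"))
# {'code_points': 8, 'utf8_bytes': 8, 'utf16_units': 8}
print(lengths("\u00c5ngstr\u00f6m"))  # "Ångström", precomposed letters
# {'code_points': 8, 'utf8_bytes': 10, 'utf16_units': 8}
print(lengths("\U0001F52D"))  # one character outside the 16-bit range
# {'code_points': 1, 'utf8_bytes': 4, 'utf16_units': 2}
```

For pure ASCII all three counts agree, which is why an "ASCII only" LENGTH is easy to specify; the other two strings show where a byte count, a UCS-2 count and a character count start to differ.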
CADC implementation is still ADQL-2.0; may have time/effort to update this summer.
JD: Is there an ADQL validator?
Gregory: There's an online one for 2.0 that can be updated to take care of this.
Mark T: Promotion of ADQL 2.1 to PR and RFC shouldn't happen before the validator (presumably VOLLT) is available.
Tamara Civera: At CEFCA, we have just implemented the OFFSET property and it is very, very useful. Users have been using it since the day it became available.
Mark Taylor: Tamara, what are people doing with OFFSET - when do they want to use it?
Baptiste Cecconi: In VESPA we need TOP+ORDER_BY+OFFSET for pagination.
TC: They are using it because we have a limit on the maximum number of rows returned, so with the OFFSET property they can obtain all the data in chunks.
MT: I see, makes sense. Thanks.
Theresa Dower: Nice. There are MAST interfaces that use pagination, but TAP has not been able to. I am curious to try this out with our translator derived from Gregory's. Is this in the MSSQL class yet? I can try my hand at it as a pull request if not.
MD: Well, TOP n for one won't work with pg in any way I can imagine..
GM: Markus, I was also thinking of that particular example...
Ed Sabol: We implemented ADQL on top of PostgreSQL+pgsphere using an SQL file that defines a variety of operators and types in PostgreSQL. Is there an updated version of that?
Markus: I wasn't aware there is such a thing. Do you remember where you got it?
Ed Sabol: Tom McGlynn got it from somewhere. Not sure where.
PD: Tom wrote that, iirc.
MD: Ed: I'd be curious about it - also about what you do with it, because you probably still need an external translator, right?
GM: Ed, like Markus, I'd also be curious about your implementation on top of Postgres+PgSphere with an SQL file.
ES: Markus and Gregory: I checked the SQL files, and there’s no author information. Maybe Tom McGlynn developed them, actually. He has since retired. Anyway, I think I have your email, Markus.
I can send them to you. Gregory, I don’t have your email, unfortunately.
MD: Ed: That would be most appreciated. Of course, it might be a valuable resource for implementors, so if you can slap a licence on it and let us publish it in, say, the IVOA wiki, that would be great.
Xiuqin Wu: Ed: could you post the information in the note?
ES: I think I would have to have a clear provenance in order to publish these SQL files with a license, but I am willing to share them privately. Considering their age, I imagine they implement ADQL 1.0 or thereabouts.
*TimeSeries Discovery (François)
Tentative definition of a time series.
Goal of the talk: discovery and access, not representation (for simple light curve or cube+mango - different note and talk).
Discovery modes:
- source driven: time series attached to a source in a catalog (retrieved via Cone or TAP); get a direct URL or Datalink (especially when multiple time series are attached to the same source or other products exist)
- ObsCore based: dataproduct_type = "time series" + other params. Are other params needed? Create a timeCore table to extend obscore? (presented at the Nov. '20 interop) Why not set up a parameter-based I/F to obscore? Extending SIA? (pointer to next talk)
- mixed case, GAPS use case, using two tables and joins
Access: full retrieval by URL or DataLink, or transform through SODA.
Building time series from catalog data? Using responses from ConeSearch or TAP, leveraging the TIMESYS and TIME parameter proposed in Cone-1.1.
Using STMOC, also in querying Simple* protocols.
New version of the TimeSeries Discovery and Access Note (soon on github).
MD: What about using SSA directly, using dataproduct type in SimpleDALRegExt?
FB: Yes, but SSA might be revised and put together with SIA.
C.Boisson: How to handle missing evolving time series?
FB: 2 questions: one can be answered by the obscore extension, the other requires span of sample.
Ada: It's different to have t_exp only and min/max values to filter/discover.
PD: I would like to see a solution that uses ObsCore (+ possible extended metadata) so discovery can be done via a Simple param service and TAP (for more complex use cases, joins, etc).
*SIAv2 Next and simple dataset discovery (François)
Some history about feedback on SIAv2 (5.5 years after REC release).
Feedback on SIA and SODA can be discussed on github, where SIA has recently been pushed. Issues are already there (discussed in the slides).
Some errata already filed.
How to distinguish "archive" from "virtual" mode if using the SODA/Datalink solution for the _cutout_ method, not provided in 2.0 w.r.t. 1.0?
Vocabulary usage for FACILITY and INSTRUMENT?
Extension of the allowed dataproduct_type to other than image and cube? Some products seem easy to plug in, others (radio ones) not.
Adding specific extensions raises the question of what the behaviour would be when querying with extensions on a "simple" siav2. And what to call that protocol?
G.D-F.: Rubin will make pretty heavy use of dataproduct_subtype, so we are definitely interested in the query-on-optional-parameter question.
B. Cecconi: Increasing dataproduct_type could be better than adding subtypes.
MD: +1 on extending product-type.
PD: +1
ML: +1, mainly because subtype is optional and not controlled by the standard.
JD: +1, it would be good to have a catalog type, for instance.
G.D-F.: Use case at Vera Rubin is for different products that have the same dataproduct_type but different meaning for it.
PD: Simple Obscore Access Protocol? SOAP? ouch
PD: Joking aside, I like the idea of SDA and have a prototype based on the ObsCore view in our CAOM TAP service ready to go.
G.D-F.: As presented at the last interop, we will have catalogues both via TAP and as sky-tile parquet files; it would be good to have those discoverable by obscore and SIA2.
MD: Could you use datalink?
G.D-F.: Yes, we have looked at this, but discovery via images is not ideal.