*IVOA DAL WG Running Meeting #2
Monday 27 July 2020 - 22:00 UTC - vconf

Participants: (8) Marco Molinaro, Brent Miszalski, Dave Morris, James Dempsey, Serge Monkewitz, Tim Jenness, Gregory Dubois-Felsmann, Jesus Salgado

*Agenda:
   1. position of service-descriptor w.r.t. DataLink usages
   1. multi-column primary key annotation in TAP
   1. ConeSearch roadmap

*Notes

*1. ordering in VOTable & DataLink
See slack post:
   * https://ivoa.slack.com/archives/CB51P1MC7/p1592254695028400
(MM: I copy here the post since it won't live forever in slack chat)

[Serge Monkewitz] Hi all - IRSA is currently working on returning DataLink service descriptors in TAP query results. One question that came up is whether we should put the service descriptor ahead of the "results". It would be nice for clients to have descriptor metadata available before reading through all the query results, but then column references in the service descriptor would point forwards in the document. This goes against the recommendation of the VOTable 1.4 standard, which says that:

The ID attribute is therefore required in the elements which have to be referenced, but the elements having an ID attribute do not need to be referenced. From VOTable 1.2, it is further recommended to place the ID attribute prior to referencing it whenever possible.

I am curious if ignoring this recommendation causes known issues with clients (failures, service descriptors being ignored...), and what others have chosen to do here. Thanks for any comments/advice!

[Gregory Dubois-Felsmann] … one motivation being the ability to present a full-featured, DataLink-driven UI for the initially received rows of a very large dataset even if the remaining data are being delivered and processed in a streaming manner.

[reply from Markus] Since we've not done this from the start, it's difficult to fix much now. It would be nice to be able to say "if you've parsed a resource, you've parsed all datalink services there are."
So, my proposal would be to deprecate all uses different from * * ... * * * i.e., datalink resources are nested behind all TABLEs belonging to a RESOURCE. I wonder if any clients break when we move to a nested resource... If we can't have that, I don't see what's to be gained by imposing additional rules.

[remark from François] I don't think this is a real issue. Having the ref attribute before its target is only a general recommendation but cannot be considered as mandatory. It is quite reasonable that clients read the service descriptor first, prepare the interface according to the content of the RESOURCE, and then search for the appropriate ref attribute value in the DataLink table to identify the associated rows in that table. This is also because a current usage is to have a DataLink table for a single ID "main" item. Maybe a short sentence explaining that should be added to the spec?

[SM] issue on VOTable & action about client dealing with this.
[GDF] need to digest Markus's remark. OK with opening issues on repos.
[GDF] Maybe we can ping the tool developers specifically to see if they would support resources being up front?
[DM] Do we have a (test) service instance that does this? Having a service to test with might help in discussion with tool developers.

*2. TAP multi-column primary keys
See slack post:
   * https://ivoa.slack.com/archives/CB51P1MC7/p1592957700041100
(MM: I copy here the post since it won't live forever in slack chat)

[Gregory Dubois-Felsmann] I am looking for a way that a TAP data provider could signal to a client, preferably both via TAP_SCHEMA and in the VOTable response to a query, that a column or columns in a specific table (or tabular response) constitute a unique key into that table.
I have several use cases in mind for this, some rather outré, but including just the basic capability to assist a user of a graphical or programmatic client in ensuring that, when they select a subset of columns from a table of interest, that subset includes a unique key that would let them connect their work back to the full breadth of data in the table at a later date. I see that there is no meta.* UCD that has exactly this meaning (meta.id is an appropriate but broader concept). I see a “hack” that could be used: it seems to me that a TAP_SCHEMA.keys entry with from_table==target_table could represent such a concept, particularly if there were an available utype value that had the semantics of “unique key”. Has this come up before?

[James Dempsey] I've used the UCD meta.id;meta.main for this purpose

[Gregory Dubois-Felsmann] For concreteness’ sake, I’ll illustrate by example: I am performing a multi-epoch survey like ZTF or Rubin/LSST or one of the Vista surveys; I observe the same sky locations many times, and I analyze the data by coadding to find sources on the sky, and then given those detections, I go back to the single-epoch data to perform forced photometry. I end up with two tables: Object and ObjectObs.
   * Object is a list of astrophysical objects that I have found in my dataset and is, potentially, a really wide table.
   * ObjectObs is a list of single-epoch forced-photometry results and is joinable on Object.id == ObjectObs.objectid to obtain, for instance, a light curve for a single object.
   * ObjectObs.epochid is the ID of the observation (single-epoch image) on which the measurement was made.
   * Object.id is a unique primary key for Object; { ObjectObs.objectid, ObjectObs.epochid } is a unique key for ObjectObs.
I already know how to use TAP_SCHEMA to indicate the cross-table key relationships here. But I’d also like to be able to document explicitly the unique keys for the two tables, with the second one being a composite key.
I’m fine with giving Object.id the UCD meta.id;meta.main but I am less clear on how to do this for the ObjectObs table. Oh… I could create a in a VOTable and give the two separate columns ObjectObs.objectid and ObjectObs.epochid meta.id UCDs but give meta.id;meta.main only to the . But that’s not visible in TAP_SCHEMA or, I think, at the /tables endpoint. I’d have to get it via MAXREC=0, perhaps, if I wanted to know about it in advance of performing a large query.

[Patrick Dowler] something more sophisticated w.r.t keys would be nice. The keys and key_columns more or less describe foreign keys (the standard way to join tables), but the minimalist indexed flag in columns doesn't even say unique, let alone "primary key". And then multi-column indices... well, in youcat we have a requirement that goes in this direction as well. Users want to be able to specify a PK and right now they can create a unique index, but tap_schema.columns only gets updated to say indexed is true, so they aren't quite sure they did it right and can't (some months later) look at the uniqueness constraints. I have been procrastinating on adding PK and multi-column indices to the half-implemented unique we have until I can do it in a way that fits the TAP spec. So, sounds like we have a couple of places needing the same sort of enhancement.
reminder: in youcat, we added ways to create content but all the query usage and output is bog-standard TAP service; if you want users to be able to do things and see/verify the result (eg and write any kind of tests or validation) then we've hit the wall

[reply from Markus] Is there *any* primary key annotation in TAP at this point? If we want that, I suppose I'd add another column in tap_schema.columns, "pk_part", perhaps (because primary is reserved). Then, multiple columns aren't a problem.
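
To make the "pk_part" idea concrete, here is a minimal sketch, assuming a boolean-like flag on each row of tap_schema.columns. Nothing here is in the TAP spec: the in-memory SQLite table `tap_schema_columns` is a stand-in for TAP_SCHEMA.columns with only a few of its columns, `pk_part` is just the name floated above, and the `ra` and `flux` rows are hypothetical filler next to the Object/ObjectObs example.

```python
import sqlite3

# Toy stand-in for TAP_SCHEMA.columns: (table_name, column_name, ucd, indexed, pk_part).
# "pk_part" is the hypothetical flag from the discussion, not part of any TAP spec.
# The "ra" and "flux" rows are invented filler for illustration.
ROWS = [
    ("Object", "id", "meta.id;meta.main", 1, 1),
    ("Object", "ra", "pos.eq.ra", 0, 0),
    ("ObjectObs", "objectid", "meta.id", 1, 1),
    ("ObjectObs", "epochid", "meta.id", 1, 1),
    ("ObjectObs", "flux", "phot.flux", 0, 0),
]


def build_demo_schema():
    """Create the in-memory stand-in table and load the demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tap_schema_columns ("
        "table_name TEXT, column_name TEXT, ucd TEXT, "
        "indexed INTEGER, pk_part INTEGER)"
    )
    conn.executemany("INSERT INTO tap_schema_columns VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def primary_key_columns(conn, table_name):
    """Return the columns flagged with pk_part for one table, sorted by name."""
    cur = conn.execute(
        "SELECT column_name FROM tap_schema_columns "
        "WHERE table_name = ? AND pk_part = 1 ORDER BY column_name",
        (table_name,),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_demo_schema()
object_key = primary_key_columns(conn, "Object")
objectobs_key = primary_key_columns(conn, "ObjectObs")
print(object_key, objectobs_key)
```

With a per-column flag like this, a composite key is simply every flagged column of the table, so a client could check a user's column selection against `primary_key_columns(...)` before running a large query.
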
This would be a good time to decide on such a thing, because I can quickly stick it into VODataService 1.2 at this point (while I'm waiting for takeup before RFC).

[GDF] could use this directly in Firefly if it is in the spec
[MM] Could raise an issue on the TAP github repo but copy to mailing list as repo is not yet active.
[GDF] I'll file issue.
[MM] follow up by email

*3. ConeSearch roadmap
[MM] I'm doing some work on the repo, first goal is to clean it to make it usable as a proper WD for the community. I've made some fixes to the text with this goal. This requires a (small?) decision to re-instate Ray as an editor (after all, we're going minor)
   * also I wasn't able to find the originating NVO specification document
[JD] reasonable to keep Ray in
[GDF] with the chance to remove himself if he wants to
[MM] Will email Ray to offer chance to remove himself
[MM] Unable to find a link to the old NVO proto-cone search spec
[GDF] ask NAVO for NVO products wrt the originating cone search (GDF to ask Bruce)
[DM] Is max search radius allowed by a service in metadata? If so, should there be a maximum time window?
[MM] We should consider adding a maximum time window. Also we should allow the time bounds available to query to be shown in the service metadata.
[MM] add issue for time window constraints: timespan & datetime limits
[DM] Can we use 180 as special value for an all sky query? DALI is currently working on a solution for this for all standards, so no sense in SCS going its own way with a new field.
[SM] How would a service register they had a maximum radius plus support for all sky?
[MM] Will have to discuss with registry
[GDF] Do we have any providers who want to provide an all-sky search?
[MM] Some time domain providers would. Can't just leave out the RA and Dec as that is an error, and that cannot be changed in a minor revision.
[GDF] It's not clear if the ra, dec and sr must be specified by the user, or that they just need to be supported by the provider.
[DM] Sections 2.1.1 to 2.1.3 indicate this in the WD
[GDF] Also noted in section 2 point 3 in the 1.03 recommendation.
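
As a rough illustration of the SR=180 question, the sketch below builds a cone search request with all three of RA, DEC and SR present and checks that a 180 degree radius geometrically contains even the antipodal point. It assumes decimal degrees and a base URL that already ends in "?" or "&"; the example.org URL, the function names and the upper bound of 180 on SR are choices made for this sketch. Whether services should accept SR=180 as an all-sky request was left open in the discussion.

```python
import math
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Append RA, DEC and SR (decimal degrees) to a base URL ending in '?' or '&'."""
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must be within [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    # Upper bound is this sketch's choice: a larger radius adds no sky coverage.
    if not 0.0 <= sr <= 180.0:
        raise ValueError("SR must be within [0, 180] degrees")
    return base_url + urlencode({"RA": ra, "DEC": dec, "SR": sr})


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    phi1, phi2 = math.radians(dec1), math.radians(dec2)
    dphi = phi2 - phi1
    dlam = math.radians(ra2 - ra1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp against rounding so asin never sees a value above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def in_cone(ra, dec, sr, src_ra, src_dec):
    """True if the source position lies within sr degrees of the cone centre."""
    return angular_separation_deg(ra, dec, src_ra, src_dec) <= sr


# Hypothetical service URL; the centre is arbitrary when SR covers the whole sphere.
url = cone_search_url("https://example.org/scs?", 10.0, -5.0, 180.0)
antipode_in_all_sky = in_cone(10.0, -5.0, 180.0, 190.0, 5.0)
antipode_in_small_cone = in_cone(10.0, -5.0, 1.0, 190.0, 5.0)
print(url)
```

Because the request still carries RA and DEC, this form stays within the existing rule that omitting them is an error; the centre simply becomes irrelevant once the radius spans the whole sphere.
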