SIAP-2.0 & SODA-1.0 feedback & PyVO
(Wednesday May 8 - 15:00 UTC)
Participants: 56

*Session notes:

François' presentation (see slides for details: https://wiki.ivoa.net/internal/IVOA/InterOpMay2020DAL/SIA2-SODA-nextPyVO-support.pdf)

* SIAP2.0 is 4.5 years old; SODA dates from 2017
* Multiple service implementations
- From Gregory to Everyone: (11:09 AM) https://irsa.ipac.caltech.edu/SIA/ , documentation at https://irsa.ipac.caltech.edu/ibe/sia.html
- From Bruce Berriman to Everyone: (11:16 AM) KOA also has an SIAPv2 instance: https://koa.ipac.caltech.edu/UserGuide/program_interface.html#query_examples
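For orientation, a query against an SIAv2 service such as the ones linked above is a plain HTTP GET carrying parameters like POS and BAND. The sketch below only builds such a URL and does not contact any service. The helper name `build_sia2_query` is made up for this illustration, and treating the IRSA URL from the chat as the query endpoint is an assumption.

```python
from urllib.parse import urlencode

# Base URL copied from the chat message above. That it is the {query}
# endpoint itself is an assumption made for this sketch.
SIA2_BASE_URL = "https://irsa.ipac.caltech.edu/SIA"


def build_sia2_query(base_url, ra_deg, dec_deg, radius_deg, band_m=None, maxrec=None):
    """Build an SIAv2 query URL for a circular region (hypothetical helper).

    POS uses the CIRCLE form with ICRS degrees, BAND is a wavelength
    interval in metres, MAXREC caps the number of returned records.
    """
    params = [("POS", f"CIRCLE {ra_deg} {dec_deg} {radius_deg}")]
    if band_m is not None:
        low, high = band_m
        params.append(("BAND", f"{low} {high}"))
    if maxrec is not None:
        params.append(("MAXREC", str(maxrec)))
    # urlencode turns the spaces inside POS and BAND into '+'.
    return f"{base_url}?{urlencode(params)}"


url = build_sia2_query(SIA2_BASE_URL, 10.68, 41.27, 0.1, band_m=(5e-7, 6e-7), maxrec=10)
print(url)
```

The response would be a VOTable of ObsCore-like records, which a client such as PyVO parses on the user's behalf.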
* PyVO client is recent
* Aladin and Topcat (pre-release) clients
* All the specs are on github with initial issues filed. There will be "next" version pages on the twiki.

Feedback:

2 SIA2 errata
* POS=RANGE inconsistency
* Possible confusion between FORMAT and RESPONSEFORMAT

PARAMETERs are less useful if we have no prior idea of their possible values
* Proposal to require the currently optional Service PARAMETER self description; then how to retrieve that?

PyVO
* New development by Adrian for ALMA, CADC

PyVO feedback
Due to issues with SIA2, CADC had to move to ObsTAP in order to query the ALMA archive through Astroquery (the port by Adrian to use PyVO had some features not achievable with SIA2).
- E.g., ObsCore has an optional release date parameter, but SIA2 doesn't provide a corresponding query PARAMETER
* STRING QUERY PARAMETER doesn't allow wildcards or partial values
* COLLECTION=HST_* not allowed
* Case sensitivity is an issue

Compare SIA1 vs SIA2
* SIA1 had cutout and mosaic modes in "1 shot"
* SIA2 or ObsTAP requires 2 shots via SODA, DataLink
* see slides for detailed comparison...

SODA has 2 errata
- Example for polarization parameter values is syntactically wrong
- Wrong example for BAND interval MIN and MAX values (see: http://mail.ivoa.net/pipermail/dal/2020-April/008331.html )

SODA simple enhancement
* allow pixel coordinates (instead of world coordinates)

On the SODA side, CADC supports pixel cutouts using cfitsio syntax. It's pretty complicated (JJ: "rich") because the user needs to know a lot about the file structure to use it. It is well-used regardless. Curious about how to come up with a simpler SODA interface that would cover not just FITS (cfitsio being FITS-centric).
* This is the issue behind MAST wanting to know the best way to expose the TESScut (Astrocut) service.
* Definitely desired by Rubin/LSST
* [Pat] What's the syntax for an N-dimensional cube?
* [James] CASDA have used CHANNEL with a range for the spectral axis.
Not sure about spatial syntax, but maybe something around x,y
* At CADC we support pixel cutout in cfitsio syntax, but the client has to know a lot to do that (especially with multi-extension FITS files)
* maybe: BAND=100 200 & POS=circle 100 200 20 & FRAME=pix ??

Beyond simple cutout
* proposal for rebinning/regridding
* Cutouts from a HEALPix cube reprojected to TAN are likely to be of interest to IRSA for SPHEREx

Several open questions for "beyond simple cutout"
* extended to SIA in "virtual mode"
* "Virtual" (on-demand) image data products are a key part of the Rubin/LSST data model
* extend rebinning/reprojection beyond the spatial axis?
* TimeSeries
* spectral and polarization axes
* How to simulate this new behavior in ObsTAP context?
* dedicated extension of ADQL?
● Can ObsTAP discover virtual data?
* Looking forward to discussing this in more detail
● Should we allow an SIAP2/SIAP1 mixture in the meantime?
● This is possible: add new parameters in the query and FIELDs in the response.
● In that case it doesn’t have to be normalized
* What is the border between standard service, custom service and « code to the data » on science platforms to do such things?
● Cube generation from visibility data is probably NOT a SODA thing

Document source still to be ported to github

Q1 [M.Molinaro] scientific concerns in soda next steps: rebinning, regridding, merging/mosaicing
Concern is how to make clear to the user that the manipulation was done.

JJ: Clarifies the Astroquery issue with SIA2. It's just that some of the features already exposed in the ALMA module were not possible with SIA2, so they used ObsTAP. Those features don't necessarily need to appear in SIA2 since they were at a fairly detailed level.
FB: Please comment on the github issue
Pat: Idea behind ObsTAP and SIAv2 using the same model was to allow many of the same things, while allowing a path for a simpler query for some cases (via SIA2).
That means we probably shouldn't look to complicate the SIA2 query to accommodate a lot of new features.
[Pat] well, not overly complicate it
ALMA services (TAP, SIAv2) are CADC code but are not running at CADC: they run at 3 ALMA archive sites.
FB: What is the PyVO SODA interface doing? We think it's a light-weight wrapper to help with the parameters (type conversions, etc.)
SB: Submitting a separate request to the service according to values in the data row or VOTable response - for example image cutouts

Q4: [Gregory D-F] Querying file-oriented catalog data products (e.g., Parquet files from sky tiles) - this is OK with ObsCore (and therefore ObsTAP), using dataproduct_type=measurements, but SIAv2 is explicitly banned from returning these (from the DPTYPE parameter documentation: "For the SIA {query} resource, the only values that should be returned for dataproduct_type are image and cube, so this parameter can be only really be used to select one of these."). Could we re-write the standard to have "image+cube" be the *default* but allow other DPTYPE values to be explicitly requested?
FB: A note has been written with Marco. Includes vocabulary issues, and annotation issues with time series.
GDF: Should we separate the annotation issue?
[Ada] I’m trying to find the Note; it was about discovery of time series --> this is a separate issue from annotation, I totally agree, and I thought that having that note out would help with making the needed modifications that were highlighted there. I see now that I have not followed up on that one. --> http://www.ivoa.net/documents/Notes/TimeSeriesDiscoveryAndAccess/index.html
[Pat] How about generalising SIA to Simple Observation Access without the dataproduct_type restriction? Expose all ObsCore content! Also note that ObsCore dataproduct_type will soon be a vocabulary, so more extensible...
From Gregory to Everyone: (12:10 PM) 
“The time series "DataModelling" effort identified specific additional time and observable attributes necessary for "fine-grain" Discovery of the time series. These new attributes include (not exhaustively) the min and max exposure time per sample and the min and max time span separating samples (Nebot et al, 2018).” (from the note Ada posted)
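The attributes named in this quote can be derived from the sampling of a single time series. The following is a minimal sketch under stated assumptions: the field names (`t_exp_min`, `t_delta_max`, ...) are invented here and are not taken from the Note, and the "time span separating samples" is assumed to mean the difference between consecutive sample start times.

```python
from dataclasses import dataclass


@dataclass
class SamplingSummary:
    """Hypothetical container for the four fine-grain discovery attributes."""
    t_exp_min: float    # shortest exposure time per sample
    t_exp_max: float    # longest exposure time per sample
    t_delta_min: float  # smallest span between consecutive samples
    t_delta_max: float  # largest span between consecutive samples


def summarise_sampling(start_times, exposures):
    """Compute min/max exposure and min/max sample separation.

    start_times and exposures are parallel sequences in the same time unit.
    """
    if len(start_times) != len(exposures):
        raise ValueError("start_times and exposures must have the same length")
    if len(start_times) < 2:
        raise ValueError("at least two samples are needed to measure a separation")
    ordered = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return SamplingSummary(min(exposures), max(exposures), min(gaps), max(gaps))


summary = summarise_sampling([0.0, 10.0, 40.0, 45.0], [2.0, 2.0, 5.0, 1.0])
print(summary)
```

A discovery service could store these four values per time series so that a client can filter on cadence without downloading the data.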
[Ada] The problem is that Nebot ... was not published since it never concluded... but I will find the table I was talking about, where I defined what I considered the minimum information that could be useful for discovery of TimeSeries "a la" ObsCore and compared ObsCore and EPNCore. --> Found the information in a presentation I gave in Santiago: https://wiki.ivoa.net/internal/IVOA/InterOpOct2017TDIG/IVOA2017Santiago-DiscussionTDIG-DM-DAL-Session1.pdf
+1 (Mireille)
[Ada] And here is the draft document we wrote back in 2018 (Nebot et al, 2018): http://volute.g-vo.org/svn/trunk/projects/time-domain/time-series/note/TSSerializationNote.pdf

GDF: What does it mean that virtual data products work in "SIA" but not in ObsTAP? Rubin/LSST plans to have many non-persisted data products, and we don't see any problem with providing access to them via SIAv2/ObsTAP queries that return links-service (DataLink) access_url values, with the links table resulting from the followup query providing instructions for how to retrieve these on-demand data products. Can we clarify what was meant by "virtual"?
[François B] For SIA1 in cutout mode, the response could be created by the service but didn't exist yet. If we want that with ObsTAP, it's not clear how to do this. Doesn't every record returned by ObsTAP already have to be in a database?
[Gregory D-F] Aha: the difference seems to be between virtual data products that can be enumerated _in advance of their creation_ AND _in advance of their query_ (as is the case for Rubin/LSST's plans - e.g., a difference image for a particular observation) vs. a virtual data product whose definition is driven by specific query parameters, as in the example given by François of a spectral cutout where the band edges are not known until the query is executed.
So an ObsTAP query with a BAND range would be limited to returning records for _pre-defined_ data products that _overlapped_ the specified range, but an SIA query for that range could return a record for a virtual data product (e.g., a cutout from a spectral cube) that corresponded _exactly_ to the specified range.

AM: Suppose the user wants two images (2 positions, each with a separate size). The standard now gives back 4 images if you give those positions and sizes.
FB: What parameters are being used for position and size? In SODA these are in the same parameter. Probably should solve this offline.
JD: We have a similar issue with two galaxies with different positions and velocities. If you provide 2x POS and 2x BAND then it would give you 4 products, but you only want two object cutouts.
Pat: This would need an upload of a table with sets of linked parameters. It was considered for v1 but not included due to time.
AM: What if you have multiple products as a result?
Pat: FB was right, the parameters are structured to allow getting just the 2 images you want, but you can also get all 4 if you want that. In SODA sync, a multi-extension FITS file (MEF) can contain multiple values. In SODA async, you can return multiple result URLs if you want.
[Gregory] This can get arbitrarily hard: what if the individual outputs a service wants to return are already MEFs? Would it be reasonable to concatenate them? Depends on the data model, but in many cases that would be a disaster, if there are magic extension numbers and/or names in the data models of the individual MEFs.
The "cartesian product" occurs when you submit different params: POS xN & BAND xM will give N*M results because there is no way to "couple" a group of param=value... UPLOAD
From Gregory to Everyone: (12:03 PM) 
The multi-cutout via UPLOAD just described by Pat would be clearly useful for SPHEREx. Exactly the above use case of doing cutouts from different locations at different wavelengths will be relevant.

Meeting has officially ended. Conversation continues informally, including...
Annotation issues versus models issues (see note referenced above). Maybe add some time-related parameters to ObsCore.
From Mark Allen to Everyone: (12:08 PM) 
cutouts using hips2fits may be a solution for many situations. Publish a hips, and you get a cut-out generator for free… Mentioned in Francois’ talk, http://alasky.u-strasbg.fr/hips-image-services/hips2fits
From Kai Polsterer to Everyone: (12:09 PM) 
exactly Mark. hips2fits makes deep learning so much simpler. Depending on the task, you might need millions of cut-outs!!!
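The "cartesian product" behaviour raised in the multi-cutout discussion (POS xN & BAND xM giving N*M results) can be made concrete with a small sketch. It is not SODA client code: it only enumerates the cutouts that each interpretation implies. The function names and the sample POS/BAND values are invented for illustration, and the "coupled" variant stands in for the table-upload idea Pat described, whose actual syntax is not defined in these notes.

```python
from itertools import product

# Two target positions and two wavelength intervals (metres); sample values only.
positions = ["CIRCLE 10.0 20.0 0.1", "CIRCLE 150.0 -30.0 0.05"]
bands = ["0.2110 0.2115", "0.2120 0.2125"]


def cartesian_cutouts(positions, bands):
    """Independent multi-valued parameters: every POS is paired with every BAND."""
    return [{"POS": pos, "BAND": band} for pos, band in product(positions, bands)]


def coupled_cutouts(positions, bands):
    """Linked parameter sets, as in one uploaded table row per cutout."""
    if len(positions) != len(bands):
        raise ValueError("coupled parameters need one BAND per POS")
    return [{"POS": pos, "BAND": band} for pos, band in zip(positions, bands)]


all_combinations = cartesian_cutouts(positions, bands)
wanted = coupled_cutouts(positions, bands)
print(len(all_combinations), "cutouts from independent parameters")
print(len(wanted), "cutouts from coupled rows")
```

With two galaxies this is the difference between 4 products and the 2 object cutouts the user actually asked for; the gap grows as N*M versus N for larger target lists.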

*Questions: (write your "Name: my question?" in this list)
1. [M.Molinaro] scientific concerns in soda next steps: rebinning, regridding, merging/mosaicing
2. [Gregory D-F] ProvDM auxiliary responses from SODA operations that perform non-trivial computations?
3. [Gregory D-F] SIAv2 validators?
4. [Gregory D-F] Querying file-oriented catalog data products (e.g., Parquet files from sky tiles) - this is OK with ObsCore (and therefore ObsTAP), using dataproduct_type=measurements, but SIAv2 is explicitly banned from returning these (from the DPTYPE parameter documentation: "For the SIA {query} resource, the only values that should be returned for dataproduct_type are image and cube, so this parameter can be only really be used to select one of these."). Could we re-write the standard to have "image+cube" be the *default* but allow other DPTYPE values to be explicitly requested?
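
The behaviour proposed in the last question (image+cube as the default, other dataproduct_type values only on explicit request) can be sketched on the ObsTAP side as an ADQL query builder. This is an illustration of the proposal, not of any existing service: the function name `obscore_query` and the chosen output columns are assumptions, while `ivoa.ObsCore`, `obs_id`, `dataproduct_type` and `access_url` are the standard ObsCore table and column names.

```python
def obscore_query(dataproduct_types=("image", "cube"), top=100):
    """Build an ADQL query selecting ObsCore rows by dataproduct_type.

    The default mirrors what SIAv2 returns today (image and cube only);
    callers opt in to other types explicitly (hypothetical behaviour).
    """
    if not dataproduct_types:
        raise ValueError("at least one dataproduct_type is required")
    # Double any single quote so the value stays a valid ADQL string literal.
    literals = ", ".join("'" + t.replace("'", "''") + "'" for t in dataproduct_types)
    return (
        f"SELECT TOP {int(top)} obs_id, dataproduct_type, access_url "
        f"FROM ivoa.ObsCore "
        f"WHERE dataproduct_type IN ({literals})"
    )


default_query = obscore_query()
catalog_query = obscore_query(dataproduct_types=("measurements",), top=10)
print(default_query)
print(catalog_query)
```

An SIAv2 service adopting the proposal would apply the same default when DPTYPE is absent and honour an explicit DPTYPE=measurements otherwise.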