* IVOA May 2021 Interoperability Meeting - DAL/DM session
*Time: Wednesday May 27 06:30 UTC
participants: 71
*schedule
Brent Miszalski - Data Central’s Simple Spectral Access Service
Vandana Desai - IVOA Spectral Models and Access in the Era of Big Data
Jesus Salgado - ObsLocTAP status
François Bonnarel - Radio data and interferometry ObsCore extension proposal
*Notes
*Brent Miszalski - Data Central’s Simple Spectral Access Service
SIA (July 2020) + SSA (Feb. 2021) -> more on the second.
No TAP underneath, Python 3 Django model based.
M+ (growing) spectra
ObsCore Django model attached to each spectrum to complement the DataCentral model one.
Developed specutils loaders to take care of heterogeneous spectra.
Different output data formats: VOTable by default, FITS supported.
1D simplified directly via access_url. Original spectra also available.
Example Python scripts provided.
(various examples/screenshots/features exemplified in slides)
Plan to reduce 2fFdr from AAT to incorporate them into the SSA service.
Future: use STMOC as input, SODA/Datalink for further access, maybe a TAP wrapper to the ObsCore table underneath, register the services (SIA and SSA).
Looking forward to SIA/SSA generalisation.
G.D-F.: Pipeline-aaS - could this use UWS?
Brent: Django and Celery, to be able to recover jobs that fail. Will look into UWS when implementing the system in production.
FB: for a generic S-Dataset-protocol: are there additional parameters beyond the ObsCore ones, for spectra, time domain related maybe?
Brent: mostly covered by the current specification. Time domain: nothing specific, but start and end times for spectra are used and curated.
G.D-F.: Brent mentioned "seeing" as missing from the ObsCore data model. Are s_resolution (mandatory) and s_resolution_min/max not suitable for this at a generic-interface level?
BM: Not really, as the fibres don't have spatial resolution. I did consider using s_resolution but it felt a little awkward.
Would it be helpful if I put together some of these (possible) gaps in the data model?
Yes, that would definitely be helpful.
Brent also mentioned adding a query parameter to SSA to support querying against the rest-frame wavelength of the spectral coverage (if it is known) - that might be an extension to ObsCore to consider along the lines that François suggested.
BM: Great idea. I'd be happy to add some more details on this. I've tried looking for an SSA IVOA std doc on GitHub to add notes, but could not see one. Is there an ObsCore one? Some of the details may be found here: https://docs.datacentral.org.au/reference/services/simple-spectral-access-ssa-service/ Redshift I think was also missing IIRC.
I *think* SSA and ObsCore have not yet been migrated to GitHub. There is a lot of discussion of extensions to ObsCore, so that might be a good one to transition soon.
BM: Yes, that would be great to see on GitHub and I'd be happy to add some notes there.
*Vandana Desai - IVOA Spectral Models and Access in the Era of Big Data
SpecDM: 1.1 (2011), 2.0 (2016 abandoned) -> started from v.1.1
spectral model and access assessment: it's time! numbers are growing...
how to help archive re-use of existing spectral holdings:
- quick browse high level data: 1D spectra
- spectra mixing across various missions and measurements
How can SpecDM help?
Plot requirements for: spectral orders and limits for SEDs -> proposed changes to SDM-1.1
Spectral orders can overlap. 2 options:
- separate table for each order (complex)
- put order in a column (straightforward, UCD exists, but no utype)
Plot limits for SEDs, upper or lower
Current SDM-1.1 represents them as highly asymmetric errors, which is not exactly what they mean. Plus lower limits are not addressed.
Proposal: new utypes combined with existing UCDs
Requests:
- SDM-1.1 uses OGIP units, why not move to VOUnits?
- FITS representation in download for multiple spectra (instead of VOTable)
- redshift and other features for spectra need to be added
- overplot synthetic photometry
Better a small revision with the utypes above while looking for a full major revision.
FB: apart from the utypes, do you think having ObsCore-like ranges for that could be useful?
Vandana: need to think it over and discuss with users.
Michèle Sanguillon - Another request: Pollux, which provides synthetic stellar spectra, also needs a new feature in SDM 1.1: the possibility to have, in addition to the flux axis, a normalised flux axis (the DM proposes only one flux axis).
Alberto Micol: You proposed an upper limit and a lower limit column, but I do not understand how that will work with the FITS binary table which requires every spectrum or SED to be serialised in one record using arrays: one cell for the wavelength array, one cell for the flux array, etc. While I see the usefulness of an "order" column I cannot figure out how you'd serialise the upper/lower limits into columns. Could you explain that?
G.D-F.: In general it is meaningful to quote both a central value with (possibly asymmetric) error bars and statistical upper (and sometimes lower) limits. For values that are statistically consistent with zero, at some significance threshold, one tends to prefer to quote a limit for some purposes. But the original measurement is still relevant statistically, particularly for use in combining with other measurements. In *current* IRSA datasets we usually only have *either* a central value with uncertainties *or* a limit, but this is likely to change in the future. For now, we use the "limit" columns to allow displaying spectral and/or SED data points for which only a limit is quoted in the source. Firefly displays them distinctively, as Vandana showed. Distinguishing them from proper measurements allows clients to avoid incorrectly statistically combining the two.
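A minimal sketch of the separate-column idea described above, assuming a client receives rows that carry a flux, its uncertainty, and an optional upper limit. The key names (`flux`, `flux_err`, `flux_upper_limit`) and the values are hypothetical and are not the proposed utypes; the point is only that keeping limits in their own column lets a client combine proper measurements (here with an inverse-variance weighted mean) without mixing in limits.

```python
import math

# Hypothetical rows: repeated flux measurements of one source in one band.
# Keys and values are illustrative only; they are not the proposed utypes.
rows = [
    {"flux": 10.0, "flux_err": 1.0, "flux_upper_limit": None},
    {"flux": 12.0, "flux_err": 2.0, "flux_upper_limit": None},
    {"flux": None, "flux_err": None, "flux_upper_limit": 5.0},  # limit only
]


def split_rows(rows):
    """Separate proper measurements from rows that quote an upper limit."""
    measurements = [
        r for r in rows if r["flux"] is not None and r["flux_err"] is not None
    ]
    limits = [r for r in rows if r["flux_upper_limit"] is not None]
    return measurements, limits


def weighted_mean(measurements):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / r["flux_err"] ** 2 for r in measurements]
    total = sum(weights)
    mean = sum(w * r["flux"] for w, r in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


measurements, limits = split_rows(rows)
mean, err = weighted_mean(measurements)
print(f"combined flux = {mean:.2f} +/- {err:.2f} from {len(measurements)} points")
print(f"{len(limits)} upper limit(s) kept aside for display")
```

A row that quotes both a measurement and a limit would appear in both lists, which matches the case where both are available for the same point.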
A.M.: It is the serialisation part that I do not fully understand: would the upper limit be a column containing one array of booleans that tells whether the related flux point is a real measurement or an upper limit?
In the VOTable serialization we are literally using a different column, so that we have the ability to quote both a measurement and a limit, as I noted above. We have not confronted the FITS serialization of this. Obviously it is not efficient for bulk spectral data to have multiple columns - this is why many existing datasets (e.g., WISE photometry) overload limits onto the flux column, with some "magic number" flags to indicate the meaning of the value found in the column. As Vandana said, we are now moving on to the bulk-spectral-data problem and will be looking at this. Thanks!
G.D-F.: François, we are currently proposing Spectrum.Data.FluxAxis UTypes for the per-point limits. For dataset-level bounds on the largest and smallest fluxes in the dataset, one could use Spectrum.Char.FluxAxis.Coverage.Bounds. This does not appear in SpectrumDM 1.1 but, as Vandana said, perhaps we should think about use cases. I can imagine that if someone is studying faint objects, the ability to exclude from an SSA query all spectra that simply don't reach down that far (in the selected wavelength range?) would be of interest for discovery purposes.
Laurent Michel: Any SDM update would require serializing the model in VO-DML. This is a bit of work, but it would allow the use of more advanced annotations, such as those shown in the DM workshop, that could help to sort out complex use cases. Mark Cresitello wrote below the second part of my statement about using cube. In any case, I think we have to meet with MCD in the frame of the DMWG to discuss this option.
VRD: Yes, let us do that. I do remember that you reached out after the last talk, and I didn't get back to you. Sorry-- it was a busy time but we do want to push this forward.
Very frankly: Let's make this a small, 1.x update, keeping SDM as it is right now (meaning: in particular, not try to stick it into VO-DML, which would dramatically change it). Otherwise this will take forever. So... I'd volunteer to take the existing document to ivoatex/github, and then we'd just add a few utypes and leave the rest as it is. -- MarkusDemleitner
G.D-F.: 100% agreement from me (speaking just for myself). We're not trying to derail the next-generation work.
Xiuqin Wu: Same thought as Gregory stated above.
VRD: Laurent & Markus, I think I am not completely understanding the process. I will defer to those who have more experience here.
Mark CD: Markus, I agree with you on that, no need to divert attention from the current plan to make a vodml compliant version of Spectrum 1.1.
Mark Cresitello-Dittmar: I was heavily involved in the Spectral-2.0 model work which was abandoned. The focus moved toward the Cube model. The current 'plan' is to use Cube as the basis and represent TimeSeries and Spectra as cuts/specializations of that. I'd be interested to get your impressions on this and possibly see how/if Firefly can evolve into this new paradigm.
VRD: Mark, we have use cases for Spitzer, SOFIA, Euclid, and SPHEREx that mirror the kind of representation you are talking about: taking a cube and slicing it to arrive at a 1-D spectrum. To answer your question with any meaning, I would need to study the SDM2 further.
Ditto; I don't feel like I understand the direction of the next-generation model yet. Perhaps we could use one of the running meetings to explore this?
running meetings?
(these are monthly meetings of the DAL group - online and public)
Ah.. those! got it.
JD: Yes there seems to be plenty of interest. I'll pencil it in
Petr Skoda: Normalised spectra (normalised to the continuum level) are quite common in stellar astronomy. There are no units on the flux - it is unitless (just a ratio of the spectrum and a mathematical function, like a spline or polynomial).
In SDM 1 it was not possible to express this correctly - please take this into account. I can find the discussion in the old DM list and bring it back. Please contact me when you open the discussion on a new DM.
Michèle Sanguillon: I am also interested!
BM: We have a lot of normalised spectra in our SSA service from GALAH DR3. We chose to use dataproduct_subtype = normalised for the normalised spectra, which is not ideal. More of a short-term workaround. The non-normalised spectra have dataproduct_subtype = combined (they are usually combinations of several exposures). An example query is e.g. https://datacentral.org.au/vo/ssa/query?COLLECTION=galah_dr3&REQUEST=queryData&MAXREC=100 (open with e.g. TOPCAT)
P.S.: Concerning the orders - it would be useful for echelle spectra as well, and there are in fact two order numbers. One is the relative order counting from 1 (in IRAF it is called aperture number) and then there is an ABSOLUTE echelle order number - it is the real n where the echelle optics is working (the number of wavelengths which will fit the distance between two grooves on the echelle grating). This absolute number is important for checking the wavelength calibration etc...
It sounds like those might even need distinct UCDs.
P.S.: Yes, IMHO the echelle spectra were abandoned in SSA and SDM for years (I have tried to push it since 2009 or so...)
BM: Very interested to see echelle spectra supported too. We have a lot of archival echelle spectra taken with the AAT (e.g. UCLES). Petr's note about the orders for a check on wavelength calibration is spot on. Reduction pipelines can sometimes misidentify orders, resulting in order-scale wavelength shifts that are quite bad, so it's useful to be able to pick up on those.
Markus' previous proposals on SSAP:
* https://blog.g-vo.org/from-byurakan-to-l2-short-spectra/
* http://docs.g-vo.org/talks/2012-urbana-ssapstate.pdf
P.S.: I am glad that there is now better knowledge about such issues in VO spectra handling than several years ago.
I think it is time to start working on it (not only echelle) in both SDM and SSA. Will you join?
*Jesus Salgado - ObsLocTAP status
overview of the model and how obs planning is not homogeneous in the real world
RFC status (~finished), implementations available, including work on a docker-containerised solution.
Reference client implementation available. STILTS/taplint validation available.
Registration of ObsLocTAP services: example included in the document, using tableset utype with TAP capability ivoid.
Review of comments from TCG in the RFC period.
Open issue: moving objects in the solar system and transits.
There's the idea to endorse this spec in IAU: having it REC soon could help.
Baptiste (comment): it would be useful also for radio astronomy.
*François Bonnarel - Radio data and interferometry ObsCore extension proposal
complex correspondence raw to science data
method 1: datalink, but no natural description and it's a 2-step discovery
method 2: obstap/siav2 service, not obvious mapping
visibility: frequency better than wl (issue), maybe better a UDF to take care of the transformation (a draft for that is discussed in https://blog.g-vo.org/spectral-units-in-adql/).
splitting observations in multiple obs datasets
no unique solution for the science cubes you end up with, spatial, time, spectral are usually ranges. Giving rough values or more accurate min/max
ObsCore extension:
- adding table to ivoa schema. Already existing examples.
Otherwise extend the content of the obscore table
Mark Kettenis: Problem with two tables is that you need something to join on and to my knowledge there is no obvious “unique” column in ObsCore to do that on…
Alberto Micol: @Mark Kettenis: obs_publisher_did is the primary key
MK: That isn’t spelled out in the standard is it?
Markus: No, it's not...
MK: (and our current implementation uses the same obs_publisher_did for different targets in the same dataset)
Markus: I'm not sure that's a good idea (with a view to datalink, for instance), but I'm quite sure it's not against any regulation.
MK: The reason is pretty much datalink since you get back the same datalink for rows that share the same obs_publisher_did
AM: Obscore specifies:
4.5. Publisher Dataset Identifier (obs_publisher_did)
The obs_publisher_did column contains the IVOA dataset identifier (Plante and al. 2007) for the published data product. This value must be unique within the namespace controlled by the dataset publisher (data center).
MD: Yeah... you can't have the same pubDID for two datasets.
MK: Yes, in our case the obs_publisher_did is unique for each dataproduct
MD: but a dataset can be in obscore multiple times (e.g., a spectrum in FITS and SDM).
MK: it is just that we have multiple ObsCore rows that point to the same data product
AM: Different formats for the same product should be addressed in datalink more than in obscore, but that does not matter here, as the join with a second table will indeed work, as you’d want both the FITS and the SDM records to be joined with extra information in a second table
MD: That, of course, is true. Both parts. The SSAP way for format selection is of course horrible.
AM: we need to upgrade SSAP!
MK: So this all stems from the fact that VLBI observations include multiple targets/pointings in the same observation.
MD: Oh... yeah, I've long advocated fixing SSAP, at least since the Rio interop. But someone would have to do it...
MK: And splitting a VLBI dataset by target really makes little sense
MK: scientifically
MK: so the choice I face is 1. having multiple obs_publisher_did's for the same data product or 2.
having a 1:1 correspondence between obs_publisher_did's and data products but have multiple rows (for each target/spectral range) with the same obs_publisher_did
+1
The toy model we came up with 6 months ago was for the *table* name to be algorithmically derivable from the dataproduct_type, and the join key to be the obs_publisher_did, which is guaranteed to be not_null. To quote:
"The obs_publisher_did column contains the IVOA dataset identifier (Plante and al. 2007) for the published data product. This value must be unique within the namespace controlled by the dataset publisher (data center). The value will also be globally unique since each publisher has a unique IVOA registered publisher ID. The same dataset may however have more than one publisher dataset identifier if it is published in more than one location; the creator DID, if defined for the given dataset, would be the same regardless of where the data is published.
"The returned obs_publisher_did for a static data product should remain identical through time for future reference.
"Values in the obs_publisher_did column must not be NULL."
In cases where (as mentioned in the Zoom chat) there are multiple rows in the ObsCore data that have the same obs_publisher_did, my understanding of what's been said above is that they represent different realizations of the semantically same dataset, and therefore there would probably be no harm in using it to join with the additional dataproduct-specific characterization data in the additional tables we're considering.
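A minimal sketch of the join discussed above, using an in-memory SQLite database rather than a TAP service. The extension table name (`obscore_radio_ext`), its columns, the identifiers and the values are all hypothetical. It mimics the VLBI case where several ObsCore rows (one per target) share one obs_publisher_did, and shows that if the extension table also holds per-target rows, joining on obs_publisher_did alone multiplies the rows, while adding the target to the join condition keeps one row per target.

```python
import sqlite3

DID = "ivo://example.org/vlbi?exp1"  # hypothetical publisher DID


def build_demo_db():
    """Create toy obscore and extension tables (illustrative schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE obscore (
            obs_publisher_did TEXT NOT NULL,
            dataproduct_type  TEXT,
            target_name       TEXT
        );
        CREATE TABLE obscore_radio_ext (
            obs_publisher_did TEXT NOT NULL,
            target_name       TEXT,
            uv_distance_max   REAL
        );
        """
    )
    # Two targets observed in the same data product, sharing one DID.
    conn.executemany(
        "INSERT INTO obscore VALUES (?, ?, ?)",
        [(DID, "visibility", "TargetA"), (DID, "visibility", "TargetB")],
    )
    conn.executemany(
        "INSERT INTO obscore_radio_ext VALUES (?, ?, ?)",
        [(DID, "TargetA", 1000.0), (DID, "TargetB", 800.0)],
    )
    return conn


def join_rows(conn, on_target):
    """Join obscore with the extension table, optionally also on target."""
    cond = "o.obs_publisher_did = e.obs_publisher_did"
    if on_target:
        cond += " AND o.target_name = e.target_name"
    sql = (
        "SELECT o.target_name, e.target_name, e.uv_distance_max "
        "FROM obscore AS o JOIN obscore_radio_ext AS e ON " + cond + " "
        "ORDER BY o.target_name, e.target_name"
    )
    return conn.execute(sql).fetchall()


conn = build_demo_db()
did_only = join_rows(conn, on_target=False)
did_and_target = join_rows(conn, on_target=True)
print(len(did_only), "rows when joining on obs_publisher_did only")
print(len(did_and_target), "rows when joining on obs_publisher_did and target")
```

If the extension table held a single row per obs_publisher_did instead, the join on that column alone would attach the same characterization to every ObsCore row of the data product.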
MK: The uv characterization would be different since that is very much tied to the target. Primary reason for having the uv characterization is to determine whether there is enough data for a particular target in a dataset to have a reasonable chance to satisfy the science goals.
Additions to obscore from visibility data: s_fov, s_resolution ranges (_min, _max)
Characterise the Fourier space (uv plane) or put instrument details to help uv coverage, or use plots/maps. (proposed fields in tables on the slides).
Next:
- extension: EN or REC?
- plots via datalink: semantics in there?