
SIA v2.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA Simple Image Access (version 2.0) Proposed Recommendation.

Latest version of the IVOA SIAV2.0 can be found at:

Reference Interoperable Implementations

(Indicate here the links to at least two Reference Interoperable Implementations)

AMIGA SIAv2 Archive Prototype

The AMIGA SIAv2 Archive Prototype provides discovery and access interfaces for two collections of radiointerferometric single-line emission datacubes of galaxies. The B0DEGA collection is composed of 30 velocity datacubes of southern nearby galaxies obtained using the Submillimeter Array (SMA) for CO(2-1) emission of the circumnuclear regions (1 arcmin). The WHISP collection provides 33 velocity datacubes of the more extended public WHISP survey observed for HI 21cm emission line. RESTful interfaces may be tested with HTML forms that were initially developed for debugging purposes.

Interfaces provided by AMIGA SIAv2 Archive Prototype:

DALI/VOSI compliant endpoints for /capabilities and /availability are also provided for the SIAv2 Discovery Service (as well as for the DataLink Service). The content of these responses (e.g. standardID values) may be modified to the final agreed values accordingly.

Service descriptors for DataLink and Adhoc cube extraction services are provided in the SIAv2 discovery responses (as well as in the Full Characterization Metadata Service and DataLink responses)

The AMIGA SIAv2 discovery service can accept other input params (very specific to searching in the HTML form interface: LINE, VELOCITY and CHANWIDTH), which does not prevent it from being compliant with the last SIAv2 draft. The AMIGA SIAv2 discovery service is compliant with recommendations for the use of COOSYS in VOTables.

-- JoseEnriqueRuiz - 2014-11-18

CADC implementation of the SIA-2.0

The CADC implementation has been updated to support all the query parameters in the PR document. Resources:

http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/availability

http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/capabilities

http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query

The landing page (http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/) provides some example queries. The access_url in the output is currently a link to the CADC DataLink service (as described by the access_format value); this may change in the future to be a direct link to science data. We also include a DataLink service descriptor in the response (VOTable) document that can be used to generate calls to the DataLinks service using ID values from the table.

Implementation note: This implementation is really just a front-end for the CADC TAP service. It generates an ADQL query on the ivoa.ObsCore table and submits that as a query job to the TAP sync endpoint; the caller gets the output directly from the CADC TAP service. All of the code to implement such a front-end is available in the cadcSIA module at https://code.google.com/p/opencadc/ -- the code here can be used to parse query params, generate ADQL, and provide a "TAP-frontend resource" using the sync support in the cadcUWS library.
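As a rough illustration of the front-end approach described in the implementation note, the sketch below turns already-parsed query constraints into an ADQL query on ivoa.ObsCore. It is not the cadcSIA code: the function names are hypothetical, only circular POS regions and BAND intervals are handled, and the use of the ObsCore columns s_region, em_min and em_max is an assumption about how such a front-end might map the parameters.

```python
def circle_constraint(ra, dec, radius):
    """ADQL test: dataset footprint intersects a circle (degrees, ICRS)."""
    return f"INTERSECTS(s_region, CIRCLE('ICRS', {ra}, {dec}, {radius})) = 1"


def band_constraint(lo, hi):
    """ADQL test: dataset wavelength range overlaps [lo, hi] (metres)."""
    return f"(em_min <= {hi} AND {lo} <= em_max)"


def build_adql(pos_circles, bands):
    """Hypothetical helper: OR within a parameter, AND across parameters."""
    clauses = []
    if pos_circles:
        clauses.append(" OR ".join(circle_constraint(*c) for c in pos_circles))
    if bands:
        clauses.append(" OR ".join(band_constraint(*b) for b in bands))
    query = "SELECT * FROM ivoa.ObsCore"
    if clauses:
        query += " WHERE " + " AND ".join(f"({c})" for c in clauses)
    return query


# One circle, two wavelength intervals (illustrative values only).
print(build_adql([(10.0, 20.0, 0.5)], [(0.2, 0.22), (0.5, 0.6)]))
```

The resulting string would then be submitted to a TAP sync endpoint; that step is left out here because it depends on the service.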

Implementations Validators

(If any, indicate here the links to Implementations Validators)

RFC Review Period: 2014-07-31 - 2014-09-09



Comments from the IVOA Community during RFC period: 2014-07-14 - 2014-09-09

Comments from José Enrique Ruiz

  • Abstract
    • "SIA provides capabilities for image discovery and access, but capabilities for access are defined elsewhere." (?!)
  • 1.2.1. Simple Data Discovery
    • "find data within a range of exposure (integration) time" I guess this requirement directly translates to searching data with flux in a specified range. I do care about integration time only if all observations have been made with the same instrumental set-up/sensitivity, which is not the case in broadcasted discovery queries.
    • I can think of other requirements related to spectral radio velocity datacubes.
      • "find data observed for a specific spectral line"
      • "find data within a specified range of velocity for a specific spectral line"
  • 2. Resources
    • In the table, what does the asterisk mean in no* ?
Changed to "reserved for future use". -- PatrickDowler - 2014-10-20
  • 2.1.2 BAND
    • This is a major point. I would use this param to search for datasets that have been observed in a specific range of wavelengths. The doc states that "values used in the BAND parameter are always assumed to (vacuum) wavelength with UNKNOWN reference position". I have to admit that I do not fully understand this sentence. Does it mean that I have to provide redshift-corrected values in my query instead of the actual frequency values in the instrumental set-up of the observational dataset? If this is the case, this perfectly answers how I could search for spectral velocity radio datacubes that have been observed for a specific emission line (independently of its redshift). But what if I am searching for data within a specified range of velocity for a specific spectral line? My proposal is to keep BAND as observed wavelength, and add a param for LINE, and another one for VELOCITY.
  • 2.1.6 SPATRES
    • I just draw your attention to a particular point in radio interferometry observations. There is an instrumental param called "Maximum Angular Scale" that gives the largest angular structure that may be recovered, so larger structures in the sky are "resolved out" and cannot be detected. A potential use case could be to find observations performed with a particular "Maximum Angular Scale" so we are sure we do not miss any structure in the sky smaller than this param. I guess this use case cannot be addressed with values for SPATRES.
  • 2.1.7 EXPTIME
    • As I said previously, I only see this param as useful when I'm searching for observations that have been made with the same instrumental set-up/sensitivity, which in general is not the case for broadcasted discovery queries.
  • 2.1.8 ID
    • Do we have a use case for broadcasted discovery using obs_publisher_did?
    • "should this parameter override the normal case-sensitive string equality comparison?" I would say yes.
  • COLLECTION, FACILITY, INSTRUMENT, TARGET
    • I think it is important to reach an agreement on what to do wrt. case-sensitivity and strict-equality for these string-valued params. I personally see them more in the realm of use cases for search of services in the registry, with the exception of target that could be translated to coordinates by a name/coords look-up service like Sesame.
  • 2.1.16 SPECRP
    • In spectral velocity radio datacubes, resolving power (more used in optical wavelength observations) may have its analogue in the concept of "channel width" usually measured (again) in units of velocity.
  • 2.1.18 UPLOAD
    • It could be good to have an example of how to reference (from any input param) a column in a VOTable provided by UPLOAD (I guess only tabular data in VOTables allowed?).
Left undefined in 2.0. -- PatrickDowler - 2014-10-20
  • page 19
    • "Note that the {query} resource does not have to be named as shown in the accessURL(s) above; in fact, they could share the same accessURL since they also differ in the value of the REQUEST parameter..." Since the REQUEST param has been removed, all this text should be revised, and the {metadata} resource declared in VOSI capabilities.
All query parameters are defined in terms of the ObsCore data model and we now have params for all the interesting mandatory fields. Fixed the bit about REQUEST. -- PatrickDowler - 2014-10-20

  • side point: logical operators
    • It is said in the doc: "The constraints from multiple values of a parameter are combined with logical OR operator. The constraints from different parameters are combined with a logical AND operator." Given the high number of params, should we start thinking (for a future version) about a mechanism to define logical operators for the combination of multi-valued params in multi-queries? I'm thinking of the potential use case of working with inclusion AND operators for a list of ranges in the POS params (overlapping in spatial dimension), for example. Moreover, this will help to combine different multi-valued params at the same time.
This is something that could be added in 2.1; for example some participants are experimenting with passing query parameters in a document (e.g. json) and it would be simple enough to add logical operators and brackets to that. -- PatrickDowler - 2014-10-20 -- JoseEnriqueRuiz - 2014-10-08

Comments from Mark Taylor

Looks mostly well-written, reasonable and comprehensible to me (though I don't have a strong interest in the details of image access). Some small comments just to prove I read it:

  • One or two parts have not been prepared to the level of PR readiness:
    • Sec 1.1: TODO include architecture diagram
    • Sec 1.3: "queryData and get-gory-details ... to be renamed in diagram before PR"
    • Sec 2.1.8 and 2.2.1: TBDs concerning ID case-sensitivity
    • The placeholders on the RFC page for implementations and validators are blank.
  • Fixed. -- PatrickDowler - 2014-10-01

  • Minor suggestions/queries:
    • Sec 2.1.1: Is POS longitude always in the range 0..360 degrees, or are services permitted/required to make sense of values in the range -180..0 as well? (possibly this is covered by one of the other referenced standards, but it might be good to add a short note here in any case).
    • Sec 2.4: "in fact, they could share the same accessURL since they also differ in the value of the REQUEST parameter that invokes their standard behaviour." I don't quite understand which resources are being referred to here as potential sharers, but in any case I think this comment is invalidated since the removal of the REQUEST parameter from the {query} resource. Probably this clause can just be removed.
    • It would be nice either in the Changes section or somewhere in the introduction to include a short comment on the difference between SIA v1 and v2.
  • Added valid coordinate value range to text (longitude in [0,360], latitude in [-90,90] since we only support ICRS RA and DEC). Removed the bit about REQUEST. -- PatrickDowler - 2014-10-01
  • SIA v1 and v2 are quite different so we will add a paragraph in the Introduction to explain the major changes. -- PatrickDowler - 2014-10-05

  • Tiny typos:
    • Sec 2: "will return detailed metadata can be used" - missing "that"?
    • Sec 2.1.4: "ther ObsCore data model"
    • Sec 2.1.5: "The FOV parameter define the range(s)" -> "defines"
  • Fixed. -- PatrickDowler - 2014-10-01
-- MarkTaylor - 2014-08-31

Comments from FrancoisBonnarel

  1. 1.4 POL (page 13)
POL=I POL=Q POL=U

means dataset contains "I OR Q OR U"

There is currently no way to say that a dataset contains "I AND Q AND U"

Currently implementations understand POL=I as "contains I" and not as "strictly equal to I".

We do not recommend changing this for version 2.0
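The "contains" reading described above can be sketched as follows. This is only an illustration: matches_pol is a hypothetical helper, and the dataset's polarization states are assumed to be available as a plain collection of labels.

```python
def matches_pol(dataset_states, requested):
    """True if the dataset contains at least one requested state (OR).

    dataset_states: states present in the dataset, e.g. {"I", "Q", "U"}
    requested: values given as POL=... in the query
    """
    return not set(dataset_states).isdisjoint(requested)


# POL=I POL=Q POL=U: any dataset holding I, Q or U qualifies.
full_stokes = matches_pol({"I", "Q", "U", "V"}, ["I", "Q", "U"])
# POL=I also matches a dataset that holds more than just I.
contains_i = matches_pol({"I", "Q"}, ["I"])
# A dataset with none of the requested states does not match.
only_v = matches_pol({"V"}, ["I", "Q", "U"])
```

Expressing "I AND Q AND U" would need a subset test instead (set(requested) <= set(dataset_states)), which the current parameter semantics cannot request.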


I have other personal comments. One (2.2) is a real question

  • 1.1 I have a question about "SIA defines a separate resource for accessing the complete ImageDM metadata for a single dataset. See below."

  • Page 7 we still have get-gory-details in the diagram and page 8 in the text. I think metadata is fine as a final name, although FullMetadata, or ImageMetadata could be more precise.
  • Fixed name in diagram and text. -- PatrickDowler - 2014-10-01
* 2.1 (page 9) "all parameters for the {query} resource defined below are required." ---> I assume this means that all parameters must be interpreted by the service, but they may be absent from the URL built by the client/user. Otherwise ... Should we say that or is this obvious?

* 2.1.3 (page 11) BAND. Find data that includes 21 cm: this should be BAND = 0.21. BAND = 0.20/0.22 finds data which overlaps 0.20/0.22. It may be a range of 0.195/0.205 and in that case 0.21 will be outside ... (because we have said intervals are intersecting)

* 2.1.17 (page 17)

I don't think FORMAT = .... can be used when access_format has a DataLINK value. How are we sure we know the content-type values for the considered dataset ?

* 2.2 {metadata} resource

The choice made there is to access metadata for one single dataset (except if we put several IDS values in the ID parameter)

In that case {metadata} is very close to the {accessdata} resource, not in the actual content but in the relationship to query. The output (one to several full descriptions) is then left open with a single format. I had in mind that a totally different mechanism could be useful: discovery with full metadata. In that case the {metadata} resource shares the same input parameters with {query}, but the output table (or tables) should be consistent with the full Image DM. It is a "long query" in some way! In other words, WCS is directly available in the response obtained by the full parameter query.

-- FrancoisBonnarel - 2014-07-31

Answer by DougTody -

Since the full metadata (WCS etc.) for a single dataset can be very large, and a discovery query may find many datasets, I think getMetadata should be a separate operation, restricted to a single image. Also, for reasons of complexity, only queryData should do discovery, and getMetadata and accessData should be limited to single datasets.

However, if we want basic WCS info in the discovery query response then that is possible as we probably do not need full image metadata. The minimal stuff we have proposed for ObsTAP 1.1, plus s_region, already comes close. If we were to add just a bit more (e.g. reference point and projection type) then it would suffice for the typical nsubarrays=1 use case.

-- DougTody - 2014-07-31

Comments from WalterLandry -2014-09-26

* Reminder: the parameters you are critiquing are from the SIAv2 query capability. This capability is designed to query for datasets that match the specified conditions, so the parameters are inherently those that express the conditions and not (necessarily) those that describe some ideal (construct-able) data. -- Answer by PatrickDowler - 2014-09-29

* Comment/response inline -- Answer by PatrickDowler - 2014-09-29

Back in July, I sent a note to this list about some issues I had with SIA 2.0. Since then, we have implemented a synthetic image generation service for the Planck satellite. We tried to implement this in a way consistent with SIA 2.0, but we had some difficulties.

1) BAND

The Planck satellite detector bands are all specified in GHz: 30, 44, 70, etc. These are nice, integer numbers. Mapping to wavelength leaves me with numbers that are not exactly representable in floating point. This means that every search has to give a range. It would be nice if I could specify the frequency instead of the wavelength. We ended up using the keyword FREQ in MHz, since I do not know of any astronomical observations of EM radiation that go lower than that.

* We've had this argument so many times and it always comes down to "just pick something". In ObsCore-1.0 the em_min and em_max fields are wavelength in meters, so that is what SIA-2.0 query uses to query those fields. No matter what you pick for the standard, someone has to transform nice looking values into something with scientific notation... Implementing FREQ in addition is fine for you; requiring it in the standard is more work for all services; implementing FREQ instead means services are not compatible or (at best) clients have to get/grok service capabilities before being able to call them. -- Answer by PatrickDowler - 2014-09-29
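To make the trade-off in this exchange concrete, here is a small sketch of how a client might turn a nominal detector frequency into a wavelength interval in meters. The helper name and the relative half-width are illustrative assumptions, not part of any draft.

```python
C = 299_792_458.0  # speed of light in m/s (exact by definition)


def freq_to_band(freq_hz, rel_halfwidth=1e-6):
    """Return (lo, hi) wavelengths in metres bracketing c / freq_hz.

    rel_halfwidth is an arbitrary tolerance chosen for illustration;
    it absorbs the rounding error of the frequency-to-wavelength step.
    """
    wavelength = C / freq_hz
    delta = wavelength * rel_halfwidth
    return wavelength - delta, wavelength + delta


# Planck 30 GHz band: the central wavelength is just under 1 cm.
lo, hi = freq_to_band(30e9)
print(lo, hi)
```

The interval is needed because the converted value is not an exact decimal, so an equality-style search on a single wavelength could miss the dataset.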

2) RANGE

As I mentioned in July, RANGE is prohibitively expensive for this data set. So we do not support it, will never support it, and I still think it should not be part of the spec.

* I don't disagree with this; at best it is a convenience for making some common (large) polygons. -- Answer by PatrickDowler - 2014-09-29

3) POS

Since this is a synthetic image generation service, it would be nice to make rectangular images. The current SIA 2.0 spec has no great circle rectangles. The client has to construct the polygons themselves, which is non-trivial. Given the negative reaction I got to Box's last time, we ended up ditching SIA 2.0 for this entirely and using SIA 1 syntax: POS, SIZE, CFRAME, CDELT (though SPATRES would have been fine). I would really prefer a better mechanism than this.

* Well, I basically grok what you're trying to do and it is just using a small bit of FITS WCS to describe what you want to get back. That seems to me to fit much better in an AccessData-ish service and not in a pure data discovery service. Since it seems to be driven from real data, there is obviously some part of the usage that would involve discovery... let's make sure to discuss this kind of usage in Banff next week. -- Answer by PatrickDowler - 2014-09-29

4) TARGET vs OBJECT

Why does SIA 2.0 use the TARGET parameter? OBJECT is an existing standard FITS convention.

* The query result is ObsCore; in there the name of the field being constrained is target_name, hence TARGET. -- Answer by PatrickDowler - 2014-09-29

5) Syntax

Consider these issues:

a) In July, I highlighted a problem with the syntax of POS parameters. It requires spaces, which must be URL encoded or things silently break. Silent breakage is the worst kind of breakage.

* Syntax requires encoding, yes. Even without syntax, parameter values must be encoded to be safe or strange things happen. How many times have I cursed the IAU naming convention that includes + sign? Lost count. Must encode. -- Answer by PatrickDowler - 2014-09-29
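A short sketch of the encoding point, using only Python's standard library. The POS value and the target name are made-up illustrations (the name is hypothetical, chosen only because it contains a + sign).

```python
from urllib.parse import parse_qs, urlencode

# Values containing spaces and a '+' sign.
params = [("POS", "CIRCLE 12.0 34.0 0.5"), ("TARGET", "J1234+5678")]

# urlencode() escapes spaces as '+' and a literal '+' as '%2B'.
encoded = urlencode(params)

# Decoding the encoded query recovers the original values.
round_trip = parse_qs(encoded)

# Without encoding, the '+' in the name is silently read as a space.
broken = parse_qs("TARGET=J1234+5678")

print(encoded)
print(round_trip["TARGET"], broken["TARGET"])
```

The unencoded request still parses without error, which is the silent breakage referred to above: the service simply sees a different target name.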

b) Polygon searches use a straight list of numbers. It would be better to have a list of pairs to make typos more obvious.

* It is possible to make syntax errors. More syntax solves it? -- Answer by PatrickDowler - 2014-09-29

c) We need to be able to select multiple detectors at once, so we would like to have an array of strings.

* Not sure I follow... you said synthetic but now are talking about multiple detectors. If the underlying data is some kind of mosaic camera then you have several choices on what constitutes a single ObsCore entity (been there, we can discuss off-line), but describing the complexity of 1 observation -> N subarrays is not in the scope of ObsCore-1.0 so not in the scope of SIA-2.0 query. There is ObsCore-1.1 work underway, plus the ImageDM and consequent SIAv2 "metadata" capability for exposing it. -- Answer by PatrickDowler - 2014-09-29

d) There is no way to add arbitrary parameters. COORD was the way to do that in old versions of SIA 2.0. Now I have to use up a keyword and hope it does not accidentally conflict with new versions of the standard. This is not going to scale.

* SELECT and COORD were never part of SIA-2.0 query; they were part of WD-AccessData-1.0 to show a way that SimDAL could be supported within that spec. -- Answer by PatrickDowler - 2014-09-29

e) We have a smart client doing searches on behalf of the user. In general, we would like to set arbitrary metadata that are not necessary for the search but convenient for the user.

* You can always add custom fields to your ObsCore output. If they aren't really custom, but just in the optional fields of the appendix, then choosing standard names would be a good thing to do. -- Answer by PatrickDowler - 2014-09-29

This prompted me to use a more general syntax to express queries. Specifically, I used json5

https://github.com/aseemk/json5

It is an extension of JSON to make it friendlier to write. It is a strict superset of JSON and a strict subset of Javascript. So every valid JSON file is valid json5, and eval() will still work for those of you foolish enough to run it on unverified user input. To be specific, a sample query would be

http://irsa.ipac.caltech.edu/cgi-bin/Planck_TOI/nph-planck_toi_sia?POS=[0.053,-0.062]&CFRAME='GAL'&ROTANG=90&SIZE=1&CDELT=0.05&FREQ=44000&ITERATIONS=20&INSTRUMENT=['24m','24s']&TIME=[[0,55300],[55500,Infinity]]&USER_METADATA={CLIENT:'IRSA Smart Client'}

Note that the service is not public yet, so this URL will not work for you yet.

Internally, every parameter is converted into a json5 element. So this would turn into the json5 document

{ POS:[0.053,-0.062], CFRAME:'GAL', ROTANG:90, SIZE:1, CDELT:0.05, FREQ:44000, ITERATIONS:20, INSTRUMENT:['24m','24s'], TIME:[[0,55300],[55500,Infinity]], USER_METADATA:{CLIENT:'IRSA Smart Client'} }

Modulo whitespace, this is just replacing '&' with ',' and '=' with ':'. We also support submitting a json5 document directly

http://irsa.ipac.caltech.edu/cgi-bin/Planck_TOI/nph-planck_toi_sia?{POS:[0.053,-0.062],CFRAME:'GAL',ROTANG:90,SIZE:1,CDELT:0.05,FREQ:44000,ITERATIONS:20,INSTRUMENT:['24m','24s'],TIME:[[0,55300],[55500,Infinity]],USER_METADATA:{CLIENT:'IRSA Smart Client'}}
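The mapping from a parameter list to a json5 document described above can be sketched in a few lines. This is not the service's code: query_to_json5 is a hypothetical helper, and it assumes that no parameter value contains a literal '&'.

```python
def query_to_json5(query):
    """Rewrite 'K1=V1&K2=V2' as the json5 text '{K1:V1,K2:V2}'.

    Only the first '=' of each item separates key from value, so
    values that themselves contain '=' are passed through untouched.
    """
    members = []
    for item in query.split("&"):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"parameter without a value: {item!r}")
        members.append(f"{key}:{value}")
    return "{" + ",".join(members) + "}"


sample = "POS=[0.053,-0.062]&CFRAME='GAL'&FREQ=44000"
document = query_to_json5(sample)
print(document)
```

The output is json5 text rather than a parsed object; turning it into a data structure would need a json5 parser, which the Python standard library does not provide.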

* This is interesting and I remember talking about json5 at the last interop. Hopefully we can see/discuss this further in Banff. -- Answer by PatrickDowler - 2014-09-29

On a side note, I have used JSON (not json5) as input for simulations in geology [1]. The SAMRAI Adaptive Mesh Refinement framework for massively parallel simulations [2] also uses a format that is almost indistinguishable [3] from json5 for input files. So I would claim that json5 would be able to cover any needs that SIMDAL would need in specifying model parameters.

Given all of the issues that I ran into, it is not clear to me that it would be a good idea to ratify SIA 2 as it is now. Whether or not you like the changes I made, there seem to be some major deficiencies that need to be addressed before the standard can be accepted.

[1] http://geodynamics.org/cig/software/gale
[2] https://computation-rnd.llnl.gov/SAMRAI/index.php
[3] You can separate elements in an array or object with newlines instead of commas.

-- WalterLandry 2014-09-26

Comments from José Enrique Ruiz 2014-10-28

Comments for SIAv2 PR 2014-10-24

  • 1.2.1. Simple Data Discovery. Last item "find data within a range of exposure (integration) time"
I guess this requirement directly translates to searching data with flux in a specified range, i.e. finest details of an object detected. Integration time is not so useful when observations have been made with different instrumental set-up/sensitivities. I would prefer a scientific requirement taking exposure (integration) time as a threshold, then the delivered response would need subsequent filtering.

  • 2.1 {query} resource. "All query parameters are multi-valued. The constraints from multiple values of a parameter are combined with a logical OR operator. The constraints from different parameters are combined with a logical AND operator."
I think readers will appreciate a more developed paragraph, clearly saying that a single invocation of the service with several multi-valued input params results in a complex discovery task built from the combinations of the values among the parameters, so multi-queries are possible with a single invocation of the service.

  • 2.1 {query} resource. Last paragraph "The units for numeric values are are..."
Typo

  • 2.1.2 BAND. Last paragraph "Energy values used in the BAND parameter are always assumed to be (vacuum) wavelength in meters. The default interpretation is with an UNKNOWN reference position [9]."
More clarity would be appreciated. i.e. "Energy values used in the BAND parameter are always assumed to be observed wavelength in meters"

  • 2.1.4 POL. Last paragraph "Possible values for the POL parameter are defined in, and the POL parameter..."
Something is missing after "are defined in.." ?

  • 2.1.8 ID
I guess this param prevails over the whole rest. If a value for ID is provided, the rest of input params are dismissed?

  • 3.2 Errors. "If the requested format is VOTable..."
The DALI RESPONSEFORMAT param is not present in the doc. I guess the requested format could still be passed in this "not mandatory" param.

-- JoseEnriqueRuiz - 2014-10-28
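The OR/AND combination rule quoted in the 2.1 comment above can be read as a set of single-valued queries whose results are merged. The sketch below enumerates them; expand is a hypothetical helper and the parameter values are opaque placeholders rather than real SIA syntax.

```python
from itertools import product


def expand(params):
    """List the single-valued queries equivalent to one multi-valued request.

    params maps a parameter name to its list of values. Values of one
    parameter are alternatives (OR); different parameters must all
    hold (AND), so the request is the union of every combination.
    """
    names = sorted(params)
    return [
        dict(zip(names, combo))
        for combo in product(*(params[name] for name in names))
    ]


request = {"POS": ["pos-1", "pos-2"], "BAND": ["band-1", "band-2", "band-3"]}
queries = expand(request)
print(len(queries))
for q in queries:
    print(q)
```

A dataset matches the original request exactly when it matches at least one of the expanded queries, which is why a single invocation can stand in for several separate ones.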

Answer by FrancoisBonnarel

I go back to this discussion between Walter and Pat (see above). I think the point has been clarified at Banff and in the new version.

-- Walter said 3) POS

Since this is a synthetic image generation service, it would be nice to make rectangular images. The current SIA 2.0 spec has no great circle rectangles. The client has to construct the polygons themselves, which is non-trivial. Given the negative reaction I got to Box's last time, we ended up ditching SIA 2.0 for this entirely and using SIA 1 syntax: POS, SIZE, CFRAME, CDELT (though SPATRES would have been fine). I would really prefer a better mechanism than this.

* Well, I basically grok what you're trying to do and it is just using a small bit of FITS WCS to describe what you want to get back. That seems to me to fit much better in an AccessData-ish service and not in a pure data discovery service. Since it seems to be driven from real data, there is obviously some part of the usage that would involve discovery... let's make sure to discuss this kind of usage in Banff next week. -- Answer by PatrickDowler - 2014-09-29

** This has been clarified in the introduction (differences SIAV1 and SIAV2)

" The capabilities for dynamic access to image datasets are expanded in scope, but are separated from data discovery and download of whole image datasets. A separate "AccessData" specification currently under development will define the more advanced dynamic data access functionality. Automated virtual data generation and discovery (as in SIA- 1.0) is not currently supported but is being considered for a future version of SIA."

Actually, AccessData 1.0 allows cutouts to be extracted. Regridding and changes of WCS will be covered by AccessData 1.1. Discovery of virtual data in the query phase will be covered by SIAV2.1.



Comments from TCG members during the TCG Review Period: 2014-11-20 - 2014-12-20

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or not the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Séverin Gaudet, Matthew Graham )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Jesus Salgado, Omar Laurino )

Grid & Web Services Working Group ( André Schaaff, Brian Major )

Registry Working Group ( Markus Demleitner, Pierre Le Sidaner )

Semantics Working Group ( Norman Gray, Mireille Louys )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Mike Fitzpatrick )

Data Curation & Preservation Interest Group ( Françoise Genova )

Knowledge Discovery in Databases Interest Group ( George Djorgovski )

Theory Interest Group ( Franck Le Petit, Rick Wagner )

Standards and Processes Committee ( Françoise Genova)




Topic revision: r16 - 2014-12-08 - FrancoisBonnarel
 