International Virtual Observatory Alliance

Difference: DALFuture (13 vs. 14)

Revision 14 - 2020-05-13 - FrancoisBonnarel




 DAL Future - discussion page 

This page is meant to gather opinions, feedback and proposals for the future of DAL (see the Trieste 2016 Fall Interop presentation).

The underlying question is what the main evolution of the DAL protocols (SIA, SODA, TAP, ADQL, DataLink, etc.) will be in the near future.

The main drivers are:
  * CSP priorities: what is achieved and what is not achieved.
  * Feedback from the implementation of recently recommended protocols (SIA, DataLink, SODA, new TAP version).
  * The multidimensional data access roadmap as established in October 2013 in Hawaii (Fall Interop + ADASS).
 
Properties to consider:
  * Data Type
  * Functionalities
  * Software Design
  * Position in IVOA landscape
 
(see ADASS XXVI DAL poster for reference)

Contributions can take the usual forms: pages linked from the topics listed below (or from newly added ones), or mailing-list follow-ups on these topics.

(When creating a page to be linked from a topic here, please give it a DALFutureTopic TWiki name.)

-- FrancoisBonnarel - 2020-05-08: After a while, the discussions here have been split into dedicated pages:
  * DataLink -> here
  * SIA2 -> here
  * SODA -> here
  * SCS -> here
  * TimeSeries:
     * discovery and access -> here
     * additional ObsCore parameters -> here and here

  Data Type 

Add Time Domain (HIGHEST scientific priority: see the CSP Time Series use cases)
 

 use cases
 

Experience from SVO, high-energy groups (XMM archive, SVOM project), CDS VizieR, planetary science at ESAC, CTU Prague, GAVO, etc.

Metadata needed for discovery 

 Spatial coordinate system

Time coordinate system: scale, reference position, representation

Time, spectral, spatial and polarisation characterisation and statistics: raw or mean position, raw bounding limits, standard deviation

Time sampling characterisation and statistics: mean sampling step, sampling step limits, sampling step standard deviation

Total exposure time; exposure time characterisation and statistics: mean total exposure time, mean exposure time per step, min, max and standard deviation of the exposure time per step

Characterisation on the time-frequency axis: periodograms are another representation of the data. We can give the period(s) for periodic or variable data. We can perform a frequency analysis and provide coefficients and frequencies. Phase representation.

What are the dependent and independent quantities? What is the nature of the dependent quantities?

Which mode are the data in: transient or periodic? This can be seen on a periodogram or from the target class.

Target name, class and subclass are needed, e.g. SN, eclipsing binary, spectroscopic binary, .... This also gives a hint of the variability type. Reuse of a standard vocabulary is suggested.

It would be nice to answer questions such as: "do we have more observations on Wednesdays, or every day between one and two o'clock?". This is useful to track artefacts.
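Answering such a question needs only the observation epochs. A minimal sketch, assuming the epochs are available as MJD values and ignoring the time scale (UTC, TT, ...) and any local time zone, so "between one and two o'clock" means hour 1 in the scale of the input; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# MJD 0 is 1858-11-17T00:00; time scale and leap seconds are ignored here
MJD_EPOCH = datetime(1858, 11, 17)
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mjd_to_datetime(mjd: float) -> datetime:
    """Convert a Modified Julian Date to a naive calendar datetime."""
    return MJD_EPOCH + timedelta(days=mjd)


def observation_patterns(mjds):
    """Count observation epochs per weekday name and per hour of day."""
    by_weekday = Counter()
    by_hour = Counter()
    for mjd in mjds:
        dt = mjd_to_datetime(mjd)
        by_weekday[WEEKDAYS[dt.weekday()]] += 1
        by_hour[dt.hour] += 1
    return by_weekday, by_hour


if __name__ == "__main__":
    epochs = [60000.0, 60007.5, 60001.25]  # made-up epochs for illustration
    weekdays, hours = observation_patterns(epochs)
    print(weekdays)
    print(hours)
```

An unusually high count for one weekday or one hour is then a hint of an observing-pattern artefact rather than of astrophysical variability.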


What do the data called «TimeSeries» encompass?

It's a temporal sequence of «measurement points» containing a time coordinate and either one or several flux(es), with errors, resolution, etc., or a derivative (mag, mag diff, etc.), a radial velocity (double stars, exoplanets), or a position (solar activity).
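A minimal sketch of such a sequence of «measurement points», assuming one time coordinate (e.g. MJD) per point and an open set of named dependent quantities (flux, mag, radial velocity, position, ...), each with an optional error. The class and field names are hypothetical, not taken from any IVOA data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MeasurementPoint:
    """One point of a time series: a time coordinate plus dependent quantities."""
    time: float                                  # e.g. MJD
    values: Dict[str, float]                     # e.g. {"flux": ...} or {"rv": ...}
    errors: Dict[str, float] = field(default_factory=dict)


@dataclass
class TimeSeries:
    """A temporal sequence of measurement points, kept sorted by time."""
    points: List[MeasurementPoint] = field(default_factory=list)

    def add(self, point: MeasurementPoint) -> None:
        self.points.append(point)
        self.points.sort(key=lambda p: p.time)

    def quantity(self, name: str) -> List[Optional[float]]:
        """Values of one dependent quantity, None where a point lacks it."""
        return [p.values.get(name) for p in self.points]


if __name__ == "__main__":
    ts = TimeSeries()
    ts.add(MeasurementPoint(60001.0, {"mag": 12.3}, {"mag": 0.02}))
    ts.add(MeasurementPoint(60000.0, {"mag": 12.1}))
    print(ts.quantity("mag"))
```

Keeping the dependent quantities open-ended is one way to cover fluxes, magnitudes, velocities and positions with a single structure; sequences of spectra or images do not fit it and raise the cube vs event-list question below.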

But also: spectra, images.

In these last 2 cases, is it better represented as a regular cube with only one sparse axis, or as an event list?

Should we recommend a time representation for standard output (probably MJD)?
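If MJD were recommended for standard output, services would have to convert from whatever they store. A minimal sketch converting ISO 8601 calendar timestamps to MJD with the standard library only; it deliberately ignores the time scale and reference position (the "time coordinate system" metadata listed above), so it is only right to the extent the input is already in the intended scale. The function name is hypothetical.

```python
from datetime import datetime, timezone

# MJD 0 is 1858-11-17T00:00; leap seconds and time scales are ignored here
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86400.0


def iso_to_mjd(iso: str) -> float:
    """Convert an ISO 8601 timestamp to MJD (no offset given = assumed UTC)."""
    dt = datetime.fromisoformat(iso)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - MJD_EPOCH).total_seconds() / SECONDS_PER_DAY


if __name__ == "__main__":
    print(iso_to_mjd("2000-01-01T00:00:00"))
```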

Relative time for theoretical data?

-- FrancoisBonnarel - 2017-05-30
 

 ideas for solutions
 

Possible view for TS discovery and access (also see "Driving extended functionalities" below)

Extend ObsCore with a new TimeSeriesCore table

ObsTAP-TS can query both tables together

Extend the «SIAV2» query interface with new TimeSeries-specific query parameters,
and rename «SIAV2» to DataSetDiscovery

-->Archived time series retrieval or DataLink 

Virtual data discovery (= TimeSeries produced on the fly) in SIAV2 (= DsDisc): Access.reference is a
SODA URL
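In this view, a discovery response for a time series produced on the fly would carry a SODA URL in its Access.reference (access_url) column instead of a link to a static file. A minimal sketch of building such a URL, assuming the SODA 1.0 convention that TIME is an MJD interval written as two space-separated numbers; the endpoint, dataset identifier and function name are hypothetical.

```python
from urllib.parse import urlencode


def virtual_dataset_access_url(soda_endpoint: str, dataset_id: str,
                               t_min: float, t_max: float) -> str:
    """Build a SODA-style access URL for a time series cut on the fly.

    TIME is written as an MJD interval "t_min t_max".
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    # urlencode percent-escapes the ID and turns the space in TIME into '+'
    query = urlencode({"ID": dataset_id, "TIME": f"{t_min} {t_max}"})
    return f"{soda_endpoint}?{query}"


if __name__ == "__main__":
    print(virtual_dataset_access_url("https://example.org/soda/sync",
                                     "ivo://example.org/data?ts42",
                                     59000.0, 59010.0))
```

The discovery service only has to echo the user's TIME constraint into the URL; the actual extraction work is deferred to SODA when (and if) the URL is dereferenced.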

SODA extensions for TS. Besides «cutout» or time selection, add:

Selection on time frequencies

Selection on exposure times

Time binning

Add Frequency or phase output. 
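Time binning is listed as a possible SODA extension, but no parameter names exist for it yet. The sketch below only shows the server-side operation one might implement (mean flux in fixed-width bins), with hypothetical names; a real service would also have to propagate errors and exposure times.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def time_bin(times: List[float], fluxes: List[float],
             bin_width: float, t0: float) -> List[Tuple[float, float, int]]:
    """Average fluxes in fixed-width time bins aligned on t0.

    A point at time t goes to bin floor((t - t0) / bin_width).
    Returns (bin_center, mean_flux, n_points) for every non-empty bin,
    ordered by time.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if len(times) != len(fluxes):
        raise ValueError("times and fluxes must have the same length")
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for t, f in zip(times, fluxes):
        k = math.floor((t - t0) / bin_width)
        sums[k] += f
        counts[k] += 1
    return [(t0 + (k + 0.5) * bin_width, sums[k] / counts[k], counts[k])
            for k in sorted(sums)]


if __name__ == "__main__":
    print(time_bin([0.0, 0.4, 1.2, 1.9, 3.5], [1.0, 3.0, 2.0, 4.0, 5.0],
                   bin_width=1.0, t0=0.0))
```

Empty bins are simply omitted, which keeps the output a (possibly sparse) time series of the same kind as the input.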

Retrieval of full metadata consistent with extended data models (VO-DML)

-- FrancoisBonnarel - 2017-05-30
 

 attempts and prototype reports
 

 Functionalities 
 

 Feedback: what is missing/inadequate in current protocols
  * Extended metadata (for Cubes and Time Domain)
  * Extended server-side data services (regridding, deconvolution, etc.)
 

 Software Design 

 Interface design 
 

 Feedback on current interfaces
 
Recently I discovered that calls to DataLink+SODA outside the ObsCore and SIAv2 context are available in VizieR, and also at GAVO. Markus is now using them in SIA 1.0 responses (see http://dc.zah.uni-heidelberg.de/lswscans/res/positions/siap/siap.xml?POS=311.41098458333335,30.723715361111108&SIZE=2.64&FORMAT=image/fits). This sounds reasonable, and I expect we will find more of that in the future. But since there are already fields with utype Access.Reference in the main SIA part, it is now difficult or impossible to recognize whether a given link will return DataLink or something else.

On the other hand, the DataLink VOTable response doesn't contain any signature allowing a client to recognize a posteriori that it has DataLink content. That's a pity: such a signature would have been harmless and could simplify everything.

This has a direct consequence for Aladin v10 (or any other client): although it has the code necessary to perform specific actions for DataLink, it cannot benefit from all these standardisation efforts and is limited to displaying the DataLink result as a simple VOTable without coordinates or, even worse, in a Web browser.

The good point is that we are now close to real usage and no longer prototyping. This kind of usage is probably wider than what was initially imagined for DataLink. Aladin is close to making something of it, and there are now a couple of servers delivering these things. It is just sad that everybody is using it in their own way, with the consequence that it remains unusable, while nearly nothing is missing to fix that.

So, I would recommend two changes:

- A simple method to identify the links and to discover what they are supposed to return:

solution 1: a specific utype for DataLink URLs (e.g. utype="Access.Reference.Datalink.1.xxx")

solution 2: an appropriate LINK in the FIELD definition (e.g. LINK content-type="application-stream/votable;datalink" ...)

- A signature in the VOTable, for example using an INFO tag or an attribute of RESOURCE (RESOURCE type="result;datalink" ...)

These "solutions" are just examples to illustrate the idea. We have to check their validity with regard to the evolution of the standards. -- PierreFernique - 2017-03-07
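The client-side problem can be illustrated with a heuristic detector. A sketch, assuming the client receives a VOTable plus the HTTP Content-Type: it accepts the DataLink-specific MIME parameter (content=datalink), the hypothetical RESOURCE signature proposed above, and, as a fallback, a table carrying all the columns DataLink 1.0 defines for a {links} response. The function names are hypothetical; the fallback is exactly the kind of guesswork an explicit signature would make unnecessary.

```python
import xml.etree.ElementTree as ET

# Columns of a DataLink 1.0 {links} response table
DATALINK_COLUMNS = {"ID", "access_url", "service_def", "error_message",
                    "description", "semantics", "content_type", "content_length"}


def _local(tag: str) -> str:
    """Strip an XML namespace: '{ns}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def looks_like_datalink(votable_xml: str, content_type: str = "") -> bool:
    """Heuristically decide whether a VOTable document is a DataLink response."""
    # 1. The MIME type, when the server sends the DataLink-specific one
    if "content=datalink" in content_type.replace(" ", "").lower():
        return True
    root = ET.fromstring(votable_xml)
    for resource in root.iter():
        if _local(resource.tag) != "RESOURCE":
            continue
        # 2. Hypothetical explicit signature, as proposed on this page
        if "datalink" in resource.get("type", "").lower():
            return True
        # 3. Fallback: a child TABLE declares every standard DataLink column
        for table in resource:
            if _local(table.tag) == "TABLE":
                names = {f.get("name") for f in table if _local(f.tag) == "FIELD"}
                if DATALINK_COLUMNS <= names:
                    return True
    return False
```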
 

 Driving extended functionalities
 
Current DAL situation for dataset discovery and access

We have a core of 4 protocols: SIAV2, ObsTAP, DataLink and SODA.

ObsTAP is controlled via ADQL and its response is an ObsCore table. SIAV2 is a parameter-driven service that doesn't require a TAP infrastructure. In some respects it is a PQL ObsTAP (actually, a parameter language allows more evolution towards virtual data).

DataSets can be searched by any of the spatial, time, band and polarization criteria.
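The "PQL ObsTAP" remark can be made concrete by writing the same cone + time-interval search both as SIAV2 parameters and as ADQL on the ObsCore table. A sketch; the overlap semantics (a dataset matches if its footprint intersects the circle and its time range overlaps the interval) is an assumption chosen to mirror SIAV2 POS/TIME, and a real query may also need to handle NULL s_region, t_min or t_max. Function names are hypothetical.

```python
def sia2_params(ra: float, dec: float, radius: float,
                t_min: float, t_max: float) -> dict:
    """Parameter-style (SIAV2) form of a cone + MJD time-interval query."""
    return {"POS": f"CIRCLE {ra} {dec} {radius}", "TIME": f"{t_min} {t_max}"}


def obstap_query(ra: float, dec: float, radius: float,
                 t_min: float, t_max: float) -> str:
    """Roughly equivalent ADQL against the ObsCore table (ObsTAP)."""
    return (
        "SELECT * FROM ivoa.obscore "
        f"WHERE 1 = INTERSECTS(s_region, CIRCLE('ICRS', {ra}, {dec}, {radius})) "
        # interval overlap: dataset starts before the end and ends after the start
        f"AND t_min <= {t_max} AND t_max >= {t_min}"
    )


if __name__ == "__main__":
    print(sia2_params(10.5, -20.0, 0.1, 59000.0, 59100.0))
    print(obstap_query(10.5, -20.0, 0.1, 59000.0, 59100.0))
```

The parameter form is shorter and leaves the service free to interpret it (including producing virtual data), whereas the ADQL form spells out the exact selection on the ObsCore columns.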

Access is managed by SODA, currently only doing cutouts on ND cubes.

DataLink provides a gluing facility between the responses of all these protocols and with other services.

Older protocols' "lost" functionalities

SSA allows discovering spectra by spatial and BAND positions; the response in discovery mode is standardized as the SSA response (some kind of pre-ObsCore).

SIAV1.0 allows discovering anything with a 2D spatial signature, but only the spatial axes are standardized.

SSA and, above all, SIAV1 have a "virtual data discovery mode". In that case, the retrieval performs "server-side operations for data access".

Possible evolution to extend the functionalities of the new protocol and tackle the TimeSeries CSP priority

The SIAV2 interface could allow discovery of TimeSeries and Spectra with small extensions (time-frequency characterisation, for example). Some functionalities available in SSA/SIAV1.0 have to be added to SIAV2.

Virtual data discovery: basically, the service interprets the discovery query parameters and proposes a matching SODA URL. This is especially useful for TimeSeries, where the TimeSeries is often built from the data content.

We could extend the parameters in ObsCore to tackle TimeSeries and spectra, add input parameters to constrain them, and add virtual data functionality.

This will be both an extension of SIAV2.0 and a new overall "DataSet discovery" protocol.

SODA: add spectra and TimeSeries functionalities.

Provide extended metadata consistent with the NDCube DM: is this a job for SIAV2 or for SODA 1.1?
Extended metadata retrieval (any kind of full serialization of data models for a given dataproduct_type) is very close to the retrieval of the datasets themselves (or of excerpts/transformations of datasets). So it seems that this functionality belongs to SODA rather than to SIAV2...

-- FrancoisBonnarel - 2017-05-15
 

 Standard definition of custom services
 
Full knowledge of custom services is in the hands of the developers of these custom services! Currently, service descriptors are in the hands of the data centers operating DAL discovery and/or DataLink services.

They may not know all the details of the service parameters.

DataLink service operators may not know the dataset metadata in detail.

It could be useful to add to DataLink the feature that services describe themselves.
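One way to read the autodescription idea: the custom service itself publishes the service descriptor that DataLink services currently have to write on its behalf (a RESOURCE of type "meta" with utype "adhoc:service" and a GROUP of input PARAMs, as in DataLink 1.0), and DataLink services just copy it. A sketch of the serialization with the standard library; the endpoint, parameter set and function name are hypothetical, and a complete descriptor would also need ref attributes, VALUES ranges, etc.

```python
import xml.etree.ElementTree as ET


def service_descriptor(access_url: str, standard_id: str,
                       input_params: dict) -> str:
    """Serialize a DataLink-style service descriptor as a VOTable RESOURCE.

    input_params maps name -> (datatype, arraysize or None, ucd, description).
    """
    res = ET.Element("RESOURCE", type="meta", utype="adhoc:service")
    for name, value in (("accessURL", access_url), ("standardID", standard_id)):
        ET.SubElement(res, "PARAM", name=name, datatype="char",
                      arraysize="*", value=value)
    group = ET.SubElement(res, "GROUP", name="inputParams")
    for name, (datatype, arraysize, ucd, text) in input_params.items():
        attrs = {"name": name, "datatype": datatype, "ucd": ucd, "value": ""}
        if arraysize:
            attrs["arraysize"] = arraysize
        param = ET.SubElement(group, "PARAM", attrs)
        ET.SubElement(param, "DESCRIPTION").text = text
    return ET.tostring(res, encoding="unicode")


if __name__ == "__main__":
    print(service_descriptor(
        "https://example.org/soda/sync",
        "ivo://ivoa.net/std/SODA#sync-1.0",
        {"ID": ("char", "*", "meta.id;meta.dataset", "Dataset identifier"),
         "TIME": ("double", "2", "time.interval", "MJD interval")}))
```

Generated by the service developers, such a fragment would carry their full knowledge of the parameters, which is the point made above.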

-- FrancoisBonnarel - 2017-05-15

 Pushing code to the data 
 

 Science cases
  Attempts and prototypes
 

 Formats and Languages 
 

 Json, PQL, PDL, etc.
  What to use next ?
 

 TAP evolution 
 

 From relational databases to...
  Relaxing ADQL ?
 

 TAP & Healpix 

[This part summarizes the talk "Bringing Healpix and MOC in TAP" by G. Mantelet, presented at the IVOA Interoperability meeting in May 2017 in Shanghai.]

In TAP, with a few extensions, it could be possible to retrieve Healpix information and/or to add constraints on Healpix information.

 Proposed new features: 
 

  * New UDFs:
     * ivo_healpix_index(hpx_order INTEGER, position POINT) --> BIGINT
     * ivo_healpix_index(hpx_order INTEGER, ra DOUBLE, dec DOUBLE) --> BIGINT [optional]
     * ivo_healpix_center(hpx_order INTEGER, hpx_index BIGINT) --> POINT [optional]
     * moc_agg(hpx_order INTEGER, position POINT) --> MOC
     * moc_agg(hpx_order INTEGER, hpx_index BIGINT) --> MOC [optional]
  * Addition of the geometrical type 'MOC' in DALI (in addition to the already existing POINT and REGION):
     * in VOTable: <FIELD ... datatype="char" arraysize="*" xtype="MOC" />
     * a MOC would be serialized into an ASCII representation described in the appendices of the talk "Bringing Healpix and MOC in TAP". Example: "10/63-65,87 11/1 13/" (i.e. a MOC of order 13 with 4 cells at order 10 and one cell at order 11)
     * using this ASCII serialization, it could be possible to create a MOC manually with the function REGION(...). Example: "REGION('1/1,3,4 2/4,25,12-14,21')"
  * The ADQL grammar of some geometrical functions (like AREA, CONTAINS and INTERSECTS) should be adapted so that they can operate on any function returning geometrical regions. For the moment, it is limited to POINT, BOX, CIRCLE, POLYGON and REGION. This does not allow the usage of UDFs returning regions, like moc_agg(...) as defined above, or REGION(...) with an ASCII representation of a MOC.
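The ASCII serialization is simple enough to parse by hand. A minimal sketch, assuming the NESTED numbering scheme (the one MOCs use) and the "order/cells" syntax of the examples above, with ranges written "a-b" and a bare "order/" only declaring an order; the function names are hypothetical. The containment test relies on the NESTED property that the parent of cell i is cell i >> 2 one order up, which is what a CONTAINS(POINT, MOC) implementation could exploit once the point has been turned into a cell index at the MOC's maximum order.

```python
from typing import Dict, Set


def parse_moc_ascii(text: str) -> Dict[int, Set[int]]:
    """Parse an ASCII MOC such as "10/63-65,87 11/1 13/" into {order: cells}."""
    moc: Dict[int, Set[int]] = {}
    order = None
    for token in text.replace(",", " ").split():
        if "/" in token:
            order_str, token = token.split("/", 1)
            order = int(order_str)
            moc.setdefault(order, set())
            if not token:  # e.g. "13/": declares an order without cells
                continue
        if order is None:
            raise ValueError("cell list before any order")
        if "-" in token:
            first, last = (int(x) for x in token.split("-"))
            moc[order].update(range(first, last + 1))
        else:
            moc[order].add(int(token))
    return moc


def moc_contains(moc: Dict[int, Set[int]], order: int, ipix: int) -> bool:
    """True if NESTED cell `ipix` at `order` lies inside one of the MOC cells.

    MOC cells finer than `order` are ignored (partial overlap is not containment).
    """
    for moc_order, cells in moc.items():
        if moc_order <= order and (ipix >> 2 * (order - moc_order)) in cells:
            return True
    return False
```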
 
Still to answer:
 

  * How to designate a Healpix-related value (e.g. Hpx index, MOC) in a VOTable?
     * UCD? Datatype? XType (xtype='MOC' is fine, but what about a Hpx index: xtype='HEALPIX'?) Or something more complex, like a VOTable GROUP?
  * A Healpix index or derived product always requires 2 additional pieces of information: scheme ('nested' or 'ring') and order (an integer between 0 and 29). How to provide them along with the Healpix-related value in a VOTable?
 
 
Usage examples:
 

 2-D histogram using Healpix (see the ADASS XXVI poster, http://www.star.bris.ac.uk/~mbt/papers/adassXXVI-P1-31-poster.pdf, for more examples):
 
SELECT ivo_healpix_index(7, POINT('', ra, dec)) AS hpx_index, COUNT(*) AS density
FROM tycho2
GROUP BY hpx_index

 

 creating a MOC at Healpix order 7 from Tycho2 (or a subset):
 

SELECT moc_agg(7, POINT('', ra, dec)) AS mymoc
FROM tycho2
...
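What moc_agg(...) would have to compute can be sketched on the client side: collect the cell indices at the requested order, then repeatedly replace any complete group of 4 sibling NESTED cells by their parent. A sketch, assuming the input indices were already computed (e.g. with ivo_healpix_index) at that order; the output uses the ASCII syntax of the proposal, including the trailing "order/" token that records the maximum order. Function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def moc_agg(order: int, cells: Iterable[int]) -> Dict[int, Set[int]]:
    """Aggregate NESTED cell indices at `order` into a normalized MOC."""
    moc: Dict[int, Set[int]] = {order: set(cells)}
    for o in range(order, 0, -1):
        parents = {c >> 2 for c in moc[o]}
        # a parent replaces its 4 children only when all of them are present
        full = {p for p in parents if all(4 * p + k in moc[o] for k in range(4))}
        moc[o] -= {4 * p + k for p in full for k in range(4)}
        moc.setdefault(o - 1, set()).update(full)
    # drop empty orders, but keep the maximum order so it can be serialized
    return {o: c for o, c in moc.items() if c or o == order}


def moc_to_ascii(moc: Dict[int, Set[int]]) -> str:
    """Serialize {order: cells} as "o/a-b,c ..." with contiguous runs as ranges."""
    parts = []
    for o in sorted(moc):
        runs = []
        for c in sorted(moc[o]):
            if runs and c == runs[-1][1] + 1:
                runs[-1][1] = c
            else:
                runs.append([c, c])
        body = ",".join(f"{a}" if a == b else f"{a}-{b}" for a, b in runs)
        parts.append(f"{o}/{body}")
    return " ".join(parts)


if __name__ == "__main__":
    print(moc_to_ascii(moc_agg(1, [0, 1, 2, 3, 5])))
```

Normalizing keeps the MOC compact, which matters if it is returned in a single VOTable cell.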
 

 filtering by Healpix index:
 

SELECT *
FROM tycho2
WHERE ivo_healpix_index(7, POINT('', ra, dec)) IN (12,23,68,69,70)
 

 filtering by MOC:  
 with an ASCII serialization:
 
 

SELECT *
FROM tycho2
WHERE 1 = CONTAINS(POINT('', ra, dec), REGION('2/12-20 5/60'))
 

  
 with a MOC embedded in a VOTable column (or more):
 
 

SELECT t.*
FROM tycho2 AS t JOIN TAP_UPLOAD.mymoc AS m
ON 1 = CONTAINS(POINT('', t.ra, t.dec), m.moc1)

TAP_UPLOAD.mymoc is a normal uploaded VOTable table with a column named moc1 of type 'VARCHAR' and xtype 'MOC'.
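For this case, the uploaded table only needs the FIELD declaration given in the proposal. A sketch that builds such a one-column VOTable with the standard library; "moc1" and the MOC string come from the examples above, xtype="MOC" is the proposed (not yet standardized) value, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def moc_upload_votable(column: str, mocs: list) -> str:
    """Build a one-column VOTable whose cells are ASCII MOCs (xtype="MOC")."""
    votable = ET.Element("VOTABLE", version="1.3", xmlns=VOTABLE_NS)
    resource = ET.SubElement(votable, "RESOURCE", type="results")
    table = ET.SubElement(resource, "TABLE")
    ET.SubElement(table, "FIELD", name=column, datatype="char",
                  arraysize="*", xtype="MOC")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for moc in mocs:
        # one row per MOC, stored as its ASCII serialization
        ET.SubElement(ET.SubElement(tabledata, "TR"), "TD").text = moc
    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    print(moc_upload_votable("moc1", ["2/12-20 5/60"]))
```

Such a document would be sent as the table mymoc of the TAP UPLOAD parameter, making m.moc1 available to the join above.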
 

  
 with a MOC formatted into a binary FITS table:
 
 

SELECT t.*
FROM tycho2 AS t JOIN TAP_UPLOAD.mymoc AS m
ON 1 = CONTAINS(POINT('', t.ra, t.dec), m.moc)

Instead of uploading a VOTable, a FITS file would be uploaded (the TAP implementation has to allow that). The uploaded FITS file has special headers specifying that it represents neither an image nor a table, but a MOC; it should then be treated as such when used in the ADQL query. However, TAP only allows the upload of tables. So, in order to use the uploaded MOC, the TAP service has to create a table with a single cell containing the uploaded MOC (i.e. one cell for the entire FITS file). As for a "normal" upload, the name of the table is provided in the HTTP parameter UPLOAD, but there is no name for the column containing the single MOC, which we need to refer to in the ADQL query. To solve this issue, we could agree on a standard name for this column, say "moc". So, in the above example we would have UPLOAD="mymoc,param:moc.fits", which is uploaded as the table TAP_UPLOAD.mymoc with only one row and one column named "moc".
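Only the HTTP side of this can be sketched with what TAP already defines: the UPLOAD value "mymoc,param:moc.fits" points to a multipart part named moc.fits. A minimal standard-library sketch of building such a request body; the special FITS headers marking the file as a MOC and the single "moc" column are the proposal's assumptions and are not handled here, the application/fits part type is an assumption too, and the function names are hypothetical.

```python
import uuid


def tap_upload_fields(table: str, part: str, query: str) -> dict:
    """Form fields of a synchronous TAP query referencing one inline upload."""
    return {
        "REQUEST": "doQuery",  # TAP 1.0 parameter
        "LANG": "ADQL",
        "QUERY": query,
        # TAP syntax: table name, then a URI; "param:<part>" names a multipart part
        "UPLOAD": f"{table},param:{part}",
    }


def encode_multipart(fields: dict, files: dict) -> tuple:
    """Return (body, content_type) for a multipart/form-data POST.

    `files` maps part name -> (filename, raw bytes).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", str(value).encode()]
    for name, (filename, data) in files.items():
        lines += [f"--{boundary}".encode(),
                  (f'Content-Disposition: form-data; name="{name}"; '
                   f'filename="{filename}"').encode(),
                  b"Content-Type: application/fits",  # assumed part type
                  b"", data]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

The resulting body would be POSTed to the service's /sync endpoint; on the server side, turning the part into the one-cell table TAP_UPLOAD.mymoc is the part that still needs to be agreed on.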

-- GregoryMantelet - 2017-05-29
 Position in IVOA landscape 

 Merging DAL protocols, HiPS and MOC 
 

 HiPS is also a discovery and access mode: how can the service approach and the HiPS approach be married, in both directions?
  Querying services by MOC: is that necessary?
 

 Data Models 
 

 ObsCore
  DataSet Metadata
  ND-Cube
  SparseCube
  STC
  other...
 

 Updating SCS 

SCS still requires VOTable 1.1 and has some other quirks that make it needlessly incompatible with the rest of the DAL landscape (in particular, it's totally DALI-incompatible). We should figure out how we can evolve it to be less odd with minimal disruption to existing services and clients (e.g., relaxing VOTable requirements, support for DALI MAXREC, RESPONSEFORMAT, metadata discovery). -- MarkusDemleitner - 2017-03-01
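A sketch of what a client request could look like if the DALI parameters mentioned above were accepted by SCS: RA, DEC and SR (in degrees) are the existing SCS parameters, while MAXREC and RESPONSEFORMAT are the proposed additions that a current SCS service would not be expected to honour. The base URL and function name are hypothetical; the helper tolerates base URLs that already end with "?" or "&", as many registered SCS endpoints do.

```python
from urllib.parse import urlencode


def scs_url(base: str, ra: float, dec: float, sr: float,
            maxrec=None, responseformat=None) -> str:
    """Build a Simple Cone Search URL, optionally with proposed DALI parameters."""
    params = {"RA": ra, "DEC": dec, "SR": sr}
    if maxrec is not None:
        params["MAXREC"] = maxrec
    if responseformat is not None:
        params["RESPONSEFORMAT"] = responseformat
    if base.endswith(("?", "&")):
        sep = ""          # base already ready for more parameters
    elif "?" in base:
        sep = "&"         # base already has a query string
    else:
        sep = "?"
    return base + sep + urlencode(params)


if __name__ == "__main__":
    print(scs_url("https://example.org/scs", 10.0, 20.0, 0.5, maxrec=100))
```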

In SCS there are three things which are causing problems for me:

1) The UCDs required by the spec are rather outdated.
2) VOTable is required to be 1.0 or 1.1, which is far behind the current 1.3.
3) It requires a column with ucd="ID_MAIN". In UCD1+, this would be meta.id. We do not always have a column with that UCD, but we do have one with ucd="meta.record". So I would propose that the table must have a column with one of meta.id or meta.record.

-- WalterLandry - 2017-04-15
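Walter's proposal translates into a simple client-side rule. A sketch, assuming the rule "accept the first FIELD whose UCD contains ID_MAIN, meta.id or meta.record as one of its atoms"; the function name is hypothetical, and a real client would also have to decide what to do when several columns qualify (this one just returns the first).

```python
import xml.etree.ElementTree as ET

# SCS 1.03 requires ID_MAIN; meta.id / meta.record are the proposed alternatives
ACCEPTED_ID_UCDS = ("ID_MAIN", "meta.id", "meta.record")


def id_column(votable_xml: str):
    """Return the name of the first FIELD whose UCD marks a record identifier."""
    root = ET.fromstring(votable_xml)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] != "FIELD":
            continue
        # UCD1+ values are ';'-separated atoms, e.g. "meta.id;meta.main"
        atoms = [a.strip() for a in elem.get("ucd", "").split(";")]
        if any(a in ACCEPTED_ID_UCDS for a in atoms):
            return elem.get("name")
    return None
```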

Please continue the SCS future discussion on the SCS-1_03-Next page, where the discussion for SCS-1.1 will take place. -- MarcoMolinaro - 2017-07-11
