Data Product Type

  1. Should we specify all possible values? The current document suggests allowing the value 'other' when no available category fits a given science data product.
  2. If we allow 'other', how do we specify more information? In free format, or in Subtype? But then interoperability is not guaranteed.
  3. How could we suggest a possible set of (type, subtype) pairs for a data product and provide examples?
-- MireilleLouys - 07 Mar 2011

Comments

In my opinion, in an ideal world this would be something like a set of tags (e.g., it's easy to imagine spectrum-image, or spectrum-timeseries, or image-timeseries). A taxonomy of such classes (the "tree") could then be used for query expansion by clients. However, there are no set-valued columns in ADQL, and faking them using, e.g., strings and SQL patterns ("AND producttype like '%/spectrum/%'") would defeat indexing and simply be ugly. So, I'd say make a catalog, ideally using the StandardKeyEnumeration from StandardsRegExt, and "other" is simply the SQL NULL. Tell both data providers and consumers not to fill in the type if there's not a good match. Ideally, the descriptions in the StandardKeyEnumeration would try to cover as many real data products as possible ("This category covers objective prism exposures"). My feeling is that that's the Pareto-correct way of doing things, actually covering probably around 98% of the actual queries rather than just 80%.

-- MarkusDemleitner - 04 Mar 2011
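
The indexing concern above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption: a real TAP service would typically run on another engine whose planner may behave differently. The table obs, the index idx_type, and the helper plan_for are hypothetical names chosen for this illustration.

```python
import sqlite3

# Hypothetical stand-in table; a real ObsTAP service exposes far more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obs_id TEXT, dataproduct_type TEXT)")
conn.execute("CREATE INDEX idx_type ON obs (dataproduct_type)")


def plan_for(where_clause):
    """Return SQLite's query-plan detail strings for a given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT obs_id FROM obs WHERE " + where_clause
    ).fetchall()
    # The human-readable plan text is the last column of each row.
    return [row[-1] for row in rows]


# One value per row: a plain equality test can use the index.
equality_plan = plan_for("dataproduct_type = 'spectrum'")

# Tags packed into one string: the leading wildcard rules out an index lookup.
pattern_plan = plan_for("dataproduct_type LIKE '%/spectrum/%'")

print(equality_plan)
print(pattern_plan)
```

In SQLite the first plan is reported as a SEARCH using idx_type, while the second is a SCAN over all rows. This is the behaviour the comment refers to as defeating indexing.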


I second this way of working. NULL should stand for "other", and the enumeration should be registered through StandardsRegExt. Among the data product types I would propose:

  • image
  • spectrum
  • image-spectrum (for series of images ordered in a frequency/velocity range, i.e. radio data cube)
  • spectrum-timeseries
  • image-spectrum-timeseries (for a 4D datacube consisting of image-spectrum cubes at different times)
  • image-timeseries
  • spectrum-spatialseries (for spectra on arbitrary positions, such as IFUs, OTFs, MOS)
  • image-mef (for multi-exposure frame images)
  • spectrum-mef (for multi-exposure frame images or _spectra_?)
  • visibilities (for radio visibilities)
  • timeseries (for arbitrary data, specified by o_obs, changing with time; similar to event data)
  • map (for a 2D data product with spatial indexes, and arbitrary observable)
  • map-timeseries (for a time-varying 2D data product with spatial indexes, and arbitrary observable)
  • datacube (for a 3D data product with spatial indexes, and arbitrary (non-time) 3rd dimension axis and observable)

Things like an image of a given polarization, or a polarization map, should be indicated using o_ucd.

-- JuanDeDiosSantanderVela - 07 Mar 2011
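
As a rough sketch of how a client might do the query expansion mentioned earlier with single-valued types like these, the following treats each hyphen-separated part of a proposed value as a component. The list is copied from the proposal above and is not a registered vocabulary; the functions expand and adql_condition are hypothetical helpers, and splitting on hyphens is an assumption about how the compound names are meant to be read.

```python
# Values copied from the list proposed above; not a registered vocabulary.
PROPOSED_TYPES = [
    "image", "spectrum", "image-spectrum", "spectrum-timeseries",
    "image-spectrum-timeseries", "image-timeseries",
    "spectrum-spatialseries", "image-mef", "spectrum-mef",
    "visibilities", "timeseries", "map", "map-timeseries", "datacube",
]


def expand(component):
    """Return every proposed type whose hyphen-separated parts include component."""
    return [t for t in PROPOSED_TYPES if component in t.split("-")]


def adql_condition(component):
    """Build an ADQL condition matching all types that contain component."""
    matches = expand(component)
    if not matches:
        raise ValueError("no proposed type contains " + repr(component))
    values = ", ".join("'" + t + "'" for t in matches)
    return "dataproduct_type IN (" + values + ")"


print(adql_condition("map"))
# prints: dataproduct_type IN ('map', 'map-timeseries')
```

Because the expansion happens on the client, the service only ever sees plain equality comparisons on dataproduct_type, so no string patterns are needed.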

Comments on Juande's suggestion -- MireilleLouys - 09 Mar 2011

What would be the difference between image and map? image has flux as observable?

I like this kind of classification with only one field and no subtype needed.

Notice that dataproduct_type is not nillable, and should be filled by the data provider.

For instance:

  • I do not find a category for my data set: I choose "other"
  • I forgot to fill in dataproduct_type and left it as NULL: it is not a valid implementation of ObsTAP

In this case I guess 'other' helps to distinguish missing entries.
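
A minimal sketch of this reading, in which dataproduct_type is not nillable and "other" is an accepted value, could look as follows. The set ALLOWED_TYPES is a hypothetical subset chosen for illustration, and check_dataproduct_type is not part of any ObsTAP tooling.

```python
# Hypothetical subset of allowed values, for illustration only.
ALLOWED_TYPES = {"image", "spectrum", "timeseries", "visibilities", "other"}


def check_dataproduct_type(value):
    """Classify a dataproduct_type entry under the 'not nillable' reading."""
    if value is None:
        # A NULL entry is treated as a provider omission, not as a category.
        return "invalid: missing entry"
    if value == "other":
        # The provider looked for a category and found none that fits.
        return "valid: no category fits"
    if value in ALLOWED_TYPES:
        return "valid"
    return "invalid: unknown value"


for entry in ("image", "other", None, "img"):
    print(repr(entry), "->", check_dataproduct_type(entry))
```

Under this policy a validator can tell a deliberate "no category fits" apart from a forgotten field, which is the distinction argued for above.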

-- JuanDeDiosSantanderVela

  • About map vs images: yes, the observable for an image is a real flux, flux_density, or counts (for a raw image). But I think then that it is better to forget map, unless we want to be able to decide if something is an image or a map when an ObsTAP service does not support o_ucd. Perhaps o_ucd should not be optional.

-- IgorChilingarian 08 Mar 2011

What data product type is appropriate for multi-object spectroscopy? Presently, a large fraction of all spectra provided by modern facilities come as multi-object (or multi-slit) data. When calib_level=2, one would normally expect the spectra to be extracted from the dataset and published one by one, and then "spectrum" is fine (we have to think about which Observation ID to attach). But for lower calib_level values, the reduced spectra obtained during the same exposure are often presented "all in one" in specific formats (e.g. Euro3D, ESO FLAMES). I would definitely disagree with calling this "other". Juande is proposing to use a fine-grained data product type, but we have to discuss this in more detail.

AnitaRichards 09 Mar 2011

p15 3.3.1 Data Product Type Is it worth stating explicitly that visibility data is likely to be in formats such as FITS, Measurement Sets (MS) or Science Data Models (SDM, ASDM etc.) (usually distributed as tar or zip directories) - just so that implementors can recognise the type of the latter, relatively new formats?

Should metadata only (as VOTable?) be included as a data product type? See comment on 4.6.

p17 3.3.3

I find that the discussion in paras 3, 4 adds confusion to the nice clear description in 3.3.2 - why are aggregates of multiple files limited to levels 0 or 1 (e.g. MS, CASA-format images are directories but can be completely calibrated and science-ready or even advanced products)? Surely the approaches adopted are up to the provider. Either cut it down or leave it out, or replace it with more fully described examples from specific archives?

p20 4.3 For e.g. MERLIN+VLA images, is 'MERLIN+VLA' acceptable a) logically b) is '+' allowed or should it be 'and' or ...? - guidance on allowed characters?

DougTody 10 Mar 2011

ObsTAP per se does not specify the possible file formats, it just allows them to be described. The plan is to address this file format issue mainly in the access_format specification. This is still being specified, but we are in the process of looking at ALMA, EVLA and others to see how well they map onto {collection, dataproduct_type/subtype, calib_level, access_format} which should be sufficient to fully specify what a data product is.

If the "data product" (tar, directory, etc.) being described is a collection of instrument-specific files, regardless of their individual calibration level, it is still an instrument-specific data product. Hence probably level 1. The instrument signature has to be removed to get to level 2-3. Level 1 may be calibrated; the issue here is whether it is an instrument-specific data product.

If instead of a tar or a dir the individual data products are exposed then they can have separate calibration levels. An image or cube for example could be level 2-3. But if they are all together as an instrument "observation" grouping (tar or dir) then they are probably instrument specific. Of course, it is up to the DP to make the final decision on the calibration level.

PatrickDowler 09 Mar 2011 to Anita

> p15 3.3.1 Data Product Type
> > Is it worth stating explicitly that visibility data is likely to be in
> > formats such as FITS, Measurement Sets (MS) or Science Data Models (SDM,
> > ASDM etc.) (usually distributed as tar or zip directories) - just so that
> > implementors can recognise the type of the latter, relatively new formats?

The access_format will say if it is FITS (application/fits is a valid value of that column).

-- AnitaRichards 09 Mar 2011

It is not FITS I am worried about, it is the newer formats - but as the list is extensible it is not a big issue.

-- FrancoisBonnarel 15 Mar 2011

It is important to distinguish the data product type (spectrum, image, etc..) and the format (fits, mef, etc ...).

I think allowing NULL for dataproduct_type solves the "other" issue. I am also in favor of an optional subtype field, which would allow specifying both the typology of these undescribed data products and details such as "multiple" for spectrum or "multi-wavelength" for a data cube.

Anything related to the description of the quantity that is functionally dependent on other "coordinates" should be set by o_ucd, which is mandatory, and not by the optional dataproduct subtype.

-- PatrickDowler 2011-03-21

I agree with Markus and Francois that we should allow NULL for dataproduct_type rather than introduce "other". I am very strongly against hacky special values like "other", "none", "unknown", etc. when SQL has the correct concept and it behaves correctly when queries execute.
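
The difference in query behaviour can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a TAP back end, which is an assumption; the table obs, its four rows, and the helper ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obs_id TEXT, dataproduct_type TEXT)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("a", "image"), ("b", "spectrum"), ("c", None), ("d", "other")],
)


def ids(where_clause):
    """Return the obs_id values matching a WHERE clause, in sorted order."""
    rows = conn.execute(
        "SELECT obs_id FROM obs WHERE " + where_clause + " ORDER BY obs_id"
    )
    return [row[0] for row in rows]


images = ids("dataproduct_type = 'image'")       # ['a']
not_images = ids("dataproduct_type <> 'image'")  # ['b', 'd']
unclassified = ids("dataproduct_type IS NULL")   # ['c']

print(images, not_images, unclassified)
```

The NULL row matches neither the equality nor the inequality test and is only found with IS NULL, whereas the row carrying the string "other" is returned by the inequality test like any other value.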



Topic revision: r10 - 2011-03-21 - PatrickDowler
 