Data Product Type
- Should we specify all possible values? The current document suggests allowing the value 'other' when no available category fits a given science data product.
- If we allow 'other', how do we specify more information? In free format, in Subtype? But then interoperability is not guaranteed.
- How could we suggest a possible set of (type, subtype) pairs for a data product, and provide examples?
--
MireilleLouys - 07 Mar 2011
Comments
In my opinion, in an ideal world this would be something like a set of tags (e.g., it's easy to imagine spectrum-image, or spectrum-timeseries, or image-timeseries). A taxonomy of such classes (the "tree") could then be used for query expansion by clients. However, there are no set-valued columns in ADQL, and faking them using, e.g., strings and SQL patterns ("AND producttype like '%/spectrum/%'") would defeat indexing and simply be ugly. So, I'd say make a catalog, ideally using the StandardKeyEnumeration from StandardsRegExt, and "other" is simply the SQL NULL. Tell both data providers and consumers not to fill in the type if there's not a good match. Ideally, the descriptions in the StandardKeyEnumeration would try to cover as many real data products as possible ("This category covers objective prism exposures").
My feeling is that that's the Pareto-correct way of doing things, actually covering probably around 98% of the actual queries rather than just 80%.
--
MarkusDemleitner - 04 Mar 2011
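To illustrate the query expansion idea in the comment above, here is a minimal sketch of how a client might expand one term into a plain IN condition on a single-valued column, instead of a LIKE pattern on a tag string. The taxonomy contents, the function names and the column name `dataproduct_type` as used here are assumptions for illustration only; no such tree has been agreed on.

```python
# Hypothetical taxonomy: each key is a broader class, each value lists
# narrower product types a client may want to include when querying it.
TAXONOMY = {
    "spectrum": ["spectrum-timeseries", "image-spectrum"],
    "image": ["image-timeseries", "image-spectrum"],
}


def expand(term, taxonomy=TAXONOMY):
    """Return the term followed by all narrower terms, without duplicates."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(taxonomy.get(current, [])))
    return seen


def adql_condition(term, column="dataproduct_type"):
    """Build an equality-based condition that an ordinary index can serve."""
    values = ", ".join("'" + v.replace("'", "''") + "'" for v in expand(term))
    return f"{column} IN ({values})"


print(adql_condition("spectrum"))
```

The expansion happens entirely on the client side, so the service only ever sees exact matches on one column.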
I second this way of working. NULL should be the "other", and the enumeration should be registered using StandardsRegExt. Among the data types I would propose:
- image
- spectrum
- image-spectrum (for series of images ordered in a frequency/velocity range, i.e. radio data cube)
- spectrum-timeseries
- image-spectrum-timeseries (for a 4D datacube consisting of image-spectrum cubes at different times)
- image-timeseries
- spectrum-spatialseries (for spectra on arbitrary positions, such as IFUs, OTFs, MOS)
- image-mef (for multi-exposure frame images)
- spectrum-mef (for multi-exposure frame images or _spectra_?)
- visibilities (for radio visibilities)
- timeseries (for arbitrary data, specified by o_obs, changing with time; similar to event data)
- map (for a 2D data product with spatial indexes, and arbitrary observable)
- map-timeseries (for a time-varying 2D data product with spatial indexes, and arbitrary observable)
- datacube (for a 3D data product with spatial indexes, and arbitrary (non-time) 3rd dimension axis and observable)
Things like an image of a given polarization, or a polarization map, should be indicated using o_ucd.
--
JuanDeDiosSantanderVela - 07 Mar 2011
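As a rough sketch of how the hyphenated names proposed above could still serve as tags on the client side, the following splits each name into its components and looks a type up by its component set. The helper names `type_for` and `types_with` are hypothetical, and returning `None` for an unlisted combination assumes the NULL convention discussed in this thread.

```python
# The values proposed in the list above, taken verbatim.
KNOWN_TYPES = frozenset({
    "image", "spectrum", "image-spectrum", "spectrum-timeseries",
    "image-spectrum-timeseries", "image-timeseries",
    "spectrum-spatialseries", "image-mef", "spectrum-mef",
    "visibilities", "timeseries", "map", "map-timeseries", "datacube",
})

# Index each type by the set of its hyphen-separated components.
BY_COMPONENTS = {frozenset(name.split("-")): name for name in KNOWN_TYPES}


def type_for(components):
    """Return the listed type made of exactly these components, or None."""
    return BY_COMPONENTS.get(frozenset(components))


def types_with(component):
    """Return all listed types that contain the given component, sorted."""
    return sorted(t for t in KNOWN_TYPES if component in t.split("-"))


print(type_for({"timeseries", "image"}))
print(types_with("spectrum"))
```

With this, a provider never has to remember the order of the components, and a combination that is not in the list simply yields no type.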
Comments on Juande's suggestion
--
MireilleLouys - 09 Mar 2011
What would be the difference between image and map? Does an image have flux as its observable?
I like this kind of classification with only one field and no subtype needed.
Note that dataproduct_type is not nillable, and should be filled in by the data provider.
For instance:
- I do not find a category for my data set: I choose "other".
- I forget to fill in dataproduct_type and leave it NULL: this is not a valid implementation of ObsTAP.
In this case I guess 'other' helps to distinguish missing entries.
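The two cases above could be checked along these lines. This is only a sketch of the non-nillable reading described in this comment (other comments on this page prefer NULL instead of "other"); the function name, the messages and the short list of known types are made up for the example.

```python
def check_dataproduct_type(value, known_types):
    """Classify a dataproduct_type value under the non-nillable reading.

    None stands for SQL NULL. The returned strings are illustrative only.
    """
    if value is None or value.strip() == "":
        return "invalid: dataproduct_type is missing"
    if value == "other":
        return "valid: no category fits"
    if value in known_types:
        return "valid"
    return "invalid: unknown value"


# Hypothetical subset of allowed values, just for the demonstration.
known = {"image", "spectrum", "visibilities"}
for candidate in ("image", "other", None, "imgae"):
    print(repr(candidate), "->", check_dataproduct_type(candidate, known))
```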
--
JuanDeDiosSantanderVela
- About map vs image: yes, the observable for an image is a real flux, flux_density, or counts (for a raw image). But then I think it is better to forget map, unless we want to be able to decide whether something is an image or a map when an ObsTAP service does not support o_ucd. Perhaps o_ucd should not be optional.
--
IgorChilingarian 08 Mar 2011
Which data product type would be appropriate for multi-object spectroscopy? Presently, a large fraction of all spectra provided by modern facilities come as multi-object (or multi-slit). When calib_level=2, one would normally expect the spectra to be extracted from the dataset and published one by one; then "spectrum" is fine (though we have to think about which Observation ID to attach). But for lower calib_level values, the reduced spectra obtained during the same exposure are often presented in specific formats (e.g. Euro3D, ESO FLAMES), "all in one". I would definitely disagree with calling that "other". Juande is proposing a fine-grained data product type, but this we have to discuss in more detail.
AnitaRichards 09 Mar 2011
p15 3.3.1 Data Product Type: Is it worth stating explicitly that visibility data is likely to be in formats such as FITS, Measurement Sets (MS) or Science Data Models (SDM, ASDM etc.) (usually distributed as tar or zip directories) - just so that implementors can recognise the type of the latter, relatively new formats?
Should metadata only (as VOTable?) be included as a data product type? See comment on 4.6.
p17 3.3.3: I find that the discussion in paras 3, 4 adds confusion to the nice clear description in 3.3.2 - why are aggregates of multiple files limited to levels 0 or 1 (e.g. MS, CASA-format images are directories but can be completely calibrated and science-ready or even advanced products)? Surely the approaches adopted are up to the provider. Either cut it down or leave it out, or replace it with more fully described examples from specific archives?
p20 4.3: For e.g. MERLIN+VLA images, is 'MERLIN+VLA' acceptable a) logically, and b) is '+' allowed, or should it be 'and' or ...? Some guidance on allowed characters?
DougTody 10 Mar 2011
ObsTAP per se does not specify the possible file formats, it just allows
them to be described. The plan is to address this file format issue
mainly in the access_format specification. This is still being
specified, but we are in the process of looking at ALMA, EVLA and others
to see how well they map onto {collection, dataproduct_type/subtype,
calib_level, access_format} which should be sufficient to fully specify
what a data product is.
If the "data product" (tar, directory, etc.) being described is a
collection of instrument-specific files, regardless of their individual
calibration level, it is still an instrument-specific data product.
Hence probably level 1. The instrument signature has to be removed to
get to level 2-3. Level 1 may be calibrated; the issue here is whether
it is an instrument-specific data product.
If instead of a tar or a dir the individual data products are exposed
then they can have separate calibration levels. An image or cube for
example could be level 2-3. But if they are all together as an
instrument "observation" grouping (tar or dir) then they are probably
instrument specific. Of course, it is up to the DP to make the final
decision on the calibration level.
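The rule of thumb above might be sketched as follows. This is only one possible reading: the class, the function name and the idea of capping a bundle at level 1 are assumptions made for the example, and the data provider's own decision would override any such default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductDescription:
    """The four items said above to characterise a data product."""
    collection: str
    dataproduct_type: Optional[str]  # None stands for SQL NULL
    calib_level: int
    access_format: str


def suggested_calib_level(is_instrument_bundle, individual_level):
    """Suggest a default calib_level (hypothetical helper).

    A tar or directory of instrument-specific files is capped at level 1,
    whatever the level of the files inside; a product exposed on its own
    keeps its individual level.
    """
    if is_instrument_bundle:
        return min(individual_level, 1)
    return individual_level


# Placeholder values: the collection name is invented, and
# "application/x-tar" is assumed here as the format of a tar bundle.
bundle = ProductDescription(
    collection="EXAMPLE",
    dataproduct_type="visibilities",
    calib_level=suggested_calib_level(True, 2),
    access_format="application/x-tar",
)
print(bundle.calib_level)
```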
PatrickDowler 09 Mar 2011 to Anita
> p15 3.3.1 Data Product Type
> Is it worth stating explicitly that visibility data is likely to be in
> formats such as FITS, Measurement Sets (MS) or Science Data Models (SDM,
> ASDM etc.) (usually distributed as tar or zip directories) - just so that
> implementors can recognise the type of the latter, relatively new formats?
The access_format will say if it is FITS (application/fits is a valid value of
that column).
--
AnitaRichards 09 Mar 2011
It is not FITS I am worried about, it is the newer formats - but as the list is extensible it is not a big issue.
--
FrancoisBonnarel 15 Mar 2011
It is important to distinguish the data product type (spectrum, image, etc.) from the format (fits, mef, etc.).
I think allowing NULL for dataproduct_type solves the "other" issue.
I am also in favor of an optional subtype field, which would allow specifying both the typology of these undescribed data products and details such as "multiple" for a spectrum or "multi-wavelength" for a data cube.
Anything related to the description of the quantity that depends functionally on other "coordinates" should be set by o_ucd, which is mandatory, and not by the optional dataproduct subtype.
--
PatrickDowler 2011-03-21
I agree with Markus and Francois that we should allow NULL for dataproduct_type rather than introduce "other". I am very strongly against hacked-in special values like "other", "none", "unknown", etc. when SQL has the correct concept and it behaves correctly when queries execute.
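A small demonstration of the NULL behaviour referred to above, using an in-memory SQLite database as a stand-in for whatever database sits behind a TAP service. The table name `obs` and the three rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obs_id TEXT, dataproduct_type TEXT)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("a", "image"), ("b", "spectrum"), ("c", None)],  # None becomes NULL
)


def ids(where):
    """Return the obs_id values matching a WHERE condition, sorted."""
    sql = f"SELECT obs_id FROM obs WHERE {where} ORDER BY obs_id"
    return [row[0] for row in conn.execute(sql)]


# A comparison with NULL is never true, so the unclassified row is not
# reported as "something other than an image".
not_image = ids("dataproduct_type <> 'image'")

# Unclassified rows are still easy to find explicitly.
unclassified = ids("dataproduct_type IS NULL")

# Aggregates over the column skip NULLs without any special casing.
classified_count = conn.execute(
    "SELECT COUNT(dataproduct_type) FROM obs"
).fetchone()[0]

print(not_image, unclassified, classified_count)
```

With a sentinel string such as "other" in place of NULL, the first query would return the unclassified row as well, and every client would have to remember to filter it out.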