On Thu, 12 Nov 2009, Doug Tody wrote:

> Hi -
>
> Some more thoughts on the issue of content types.
>
> First, what are we actually trying to describe?  If the observation table
> describes "observations", this is not necessarily the same thing as,
> for example, downloadable archive files.  So for example an "observation
> type" could be quite different from a "data product type", and might
> describe the physical nature of an observation rather than the type of
> a specific data product (a data product could also be an observing log,
> proposal cover page, etc.).  As already noted there is often a one-to-many
> relationship.  This is true also for "projects", "survey fields", and
> so on.  All of these really require their own model to describe.
>
> Since "observation" is not well defined and we are physically
> characterizing and accessing data I suspect we are actually talking
> about archive data products - catalogs, images, spectra, and so
> forth.  Probably we only want to index data with physical coverage
> (spatial, spectral, time, polarization), i.e., data with a physically
> meaningful observable of some sort (hence the actual implied meaning
> of "observation").  Any other data products should be available only
> indirectly via association or linking; we are trying to keep those out
> of the index in order to provide a mechanism which does not require
> custom relational queries.
>
> If the above is true then we want to index catalogs, images, spectra,
> time series, etc., and possibly instrumental data as well (observatory
> archives at least need this).  Other data products such as observing
> logs or proposal cover pages do not fit the model and should only be
> available via data linking.
>
> This implies that we will need to make all data products which have an
> observable and coverage available in the ObsTAP index, regardless of type.
> A more complex concept such as an observation, observing project, survey
> field, etc. may include any number of individual data products plus
> their associated non-observed data products.
>
> If this is what we want to do then we can conclude:
>
>     o	Each row of the Obs table describes a science data product.
>
>     o	A science data product has a primary type such as catalog,
> 	 image, spectrum, and so forth (or various, ultimately open-ended
> 	 subclassifications, all the way down to instrumental data).
>
>     o	Rows may be logically associated to describe complex data
>    	 associations such as an observation, survey field, etc.
>
>     o	Data product descriptions (rows of the main table) could
> 	 directly include an acref URL to get a single data product.
> 	 In general however the OBSID (or OBS_ID, whatever) could be
> 	 used as a foreign key to search an associated data links table
> 	 to discover an open-ended set of links which could be followed
> 	 to do various types of things with the science data product.
>
> 	 A simple link might describe a type of associated data product
> 	 with a specific data product type (not necessarily a science
> 	 data product at this point), dataset identifier, and acref
> 	 URL which could be used to directly access the data product.
> 	 Other types of links such as for services, queries, etc. are
> 	 also possible as already mentioned.
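The table-plus-links arrangement in the list above can be sketched roughly as follows, assuming one Obs row per science data product and a separate data links table that uses obs_id as its foreign key. The class names (ObsRow, DataLink), the field names, the link_type values and the example.org URLs are hypothetical, chosen only for illustration; they are not proposed ObsTAP column names.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical: one row of the Obs table = one science data product.
@dataclass
class ObsRow:
    obs_id: str                   # key; used as a foreign key by the links table
    dataproduct_type: str         # e.g. "image.cube" (dotted type string)
    acref: Optional[str] = None   # optional direct-download URL for one product


# Hypothetical: one row of the associated data links table.
@dataclass
class DataLink:
    obs_id: str                         # foreign key back to ObsRow.obs_id
    link_type: str                      # e.g. "dataproduct", "service", "query"
    url: str
    product_type: Optional[str] = None  # only meaningful for data-product links
    dataset_id: Optional[str] = None


def links_for(row: ObsRow, links: List[DataLink]) -> List[DataLink]:
    """Return every link whose foreign key matches the row's obs_id."""
    return [link for link in links if link.obs_id == row.obs_id]


# Illustrative data only.
row = ObsRow("obs-001", "image.cube", acref="http://example.org/get?id=obs-001")
links = [
    DataLink("obs-001", "dataproduct", "http://example.org/log?id=obs-001",
             product_type="observing_log"),
    DataLink("obs-001", "service", "http://example.org/cutout?id=obs-001"),
    DataLink("obs-002", "dataproduct", "http://example.org/get?id=obs-002"),
]
print(len(links_for(row, links)))  # 2
```

The simple case needs nothing beyond the acref on the row itself; the links table only comes into play when a client wants the open-ended set of associated products or services.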
>
> This scheme would provide for simple discovery and description of any
> type of science data product, in many cases with a simple acref URL
> for download.  But we could also describe more complex associations
> consisting of several simpler science data products, and link to
> associated non-science data products, or services which are available
> on the server to do more complex things with the data product.
>
> Content Types
>
> If the content type, or data product type, describes an individual
> science data product then I suggest we start with the main categories
> which astronomers expect when they access our archives: catalog, image,
> spectrum, and so forth.  A more general multi-parametric classification
> could also be useful for semantic analysis but we need to do something
> reasonable if the astronomer just asks for all "images", "spectra",
> "SEDs", "spectral image cubes", or whatever.  A simple keyword search
> on such a term might suffice in most cases.
>
> A reasonable approach might be a scheme with several levels, e.g., a
> primary object or data product type (catalog, image, spectrum, etc.),
> a predefined subtype (catalog.source_catalog, image.cube, spectrum.SED),
> and then whatever the data provider wants to add, e.g., to identify data
> from a particular instrument.
>
> We need to discuss what the half-dozen or so primary types might be but
> something like table/catalog, image, spectrum, time_series, visibility,
> event would come close to what an astronomer might expect.
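A possible way to split a type string into the levels described above, as a minimal sketch: it assumes types are compared case-insensitively and that only the primary level is checked against a fixed list. The message lists "table/catalog" as one primary type, while the tree in the follow-up puts Catalog under Table, so the sketch simply accepts both at the top level. PRIMARY_TYPES, ProductType and parse_product_type are hypothetical names, and the set itself is a placeholder pending the discussion.

```python
from typing import NamedTuple, Optional

# Placeholder set, taken from the candidates floated in the message.
PRIMARY_TYPES = {"table", "catalog", "image", "spectrum",
                 "time_series", "visibility", "event"}


class ProductType(NamedTuple):
    primary: str
    subtype: Optional[str]
    custom: Optional[str]  # provider-defined remainder; may itself contain dots


def parse_product_type(value: str) -> ProductType:
    """Split '<primary_type>.<subtype>.<custom>' into its three levels."""
    # Assumption: matching is case-insensitive, so normalize to lower case.
    parts = value.strip().lower().split(".", 2)
    primary = parts[0]
    if primary not in PRIMARY_TYPES:
        raise ValueError(f"unknown primary type: {primary!r}")
    subtype = parts[1] if len(parts) > 1 else None
    custom = parts[2] if len(parts) > 2 else None
    return ProductType(primary, subtype, custom)


print(parse_product_type("image.cube.spectral"))
print(parse_product_type("Spectrum.SED"))
```

Only the primary level is validated here because the subtypes are meant to be predefined but extensible, and the custom level is explicitly open-ended.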
>
> Ultimately we end up with something like <primary_type>.<subtype>.<custom>
> and a search on <primary_type> (catalog, image, spectrum, etc.) would
> find everything in that category.  But if additional types are specified,
> anything including those keywords would be found.  Finally, the exact
> data product or content type used by the data provider could be specified
> to get an exact query.
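One reading of these query semantics is a level-by-level prefix match on the dotted string: "image" finds every image, "image.cube" narrows to cubes, and giving every level amounts to an exact query. This is an interpretation, not something the message pins down; a looser keyword-anywhere match would also fit the wording. The function name `matches` and the sample type strings (including "myinstrument") are hypothetical.

```python
def matches(product_type: str, query: str) -> bool:
    """True if every dotted level in the query equals the corresponding
    leading level of the product type (case-insensitive)."""
    have = product_type.lower().split(".")
    want = query.lower().split(".")
    return len(want) <= len(have) and have[:len(want)] == want


# Illustrative type strings only.
indexed_types = ["image.2d", "image.cube.spectral", "image.cube.time",
                 "spectrum.sed", "spectrum.1d.myinstrument"]

print([t for t in indexed_types if matches(t, "image")])
# ['image.2d', 'image.cube.spectral', 'image.cube.time']
print([t for t in indexed_types if matches(t, "image.cube")])
# ['image.cube.spectral', 'image.cube.time']
```

Matching by whole levels rather than by raw string prefix keeps a query such as "image.cube" from accidentally matching a hypothetical "image.cubes" subtype.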
>
> I won't attempt to refine the classification here, but something
> along these lines is flexible, should be familiar to astronomers,
> and would probably work fairly well.
>
> 	- Doug


Here is a more graphical illustration of what I described above.

Three levels of classification (four if we include the root GDS/Obs model)

    GDS/Obs
	Primary Type (Table/Catalog, Image, Spectrum etc.)
	    Subtype (Image.2D, Image.Cube, etc.)
		Custom (provider-defined, open ended)

    GDS/Obs
	Table
	    Catalog
		Source_Catalog
	    SimDB
	        (etc.)

	Image
	    2D sky image
	    Cube
		Spectral
		Time
		Polarization
	    Longslit
	        (etc.)

	Spectrum
	    1D spectrum
	    SED
	    MOS/IFU aggregation

	TimeSeries
	    Light Curve

	Visibility

	Event
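The tree above maps directly onto the dotted <primary_type>.<subtype>.<custom> strings. A small sketch, assuming the tree is held as nested dictionaries with lower-case, underscore-joined node names; the encoding, the node spellings such as "mos_ifu_aggregation", and the name dotted_paths are hypothetical, and the leaves are only the illustrative entries shown, not an agreed vocabulary.

```python
from typing import Dict, Iterator

# Hypothetical encoding of the tree above (GDS/Obs root omitted).
TREE: Dict[str, dict] = {
    "table": {"catalog": {"source_catalog": {}}, "simdb": {}},
    "image": {"2d": {},
              "cube": {"spectral": {}, "time": {}, "polarization": {}},
              "longslit": {}},
    "spectrum": {"1d": {}, "sed": {}, "mos_ifu_aggregation": {}},
    "time_series": {"light_curve": {}},
    "visibility": {},
    "event": {},
}


def dotted_paths(tree: Dict[str, dict], prefix: str = "") -> Iterator[str]:
    """Yield every node as a dotted type string, each parent before its children."""
    for name, children in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        yield from dotted_paths(children, path)


for path in dotted_paths(TREE):
    print(path)
```

Each printed path is a valid query term under the scheme described above, so the same structure could serve both as documentation of the vocabulary and as a source for validating provider-supplied types.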