On Thu, 12 Nov 2009, Doug Tody wrote:

> Hi -
>
> Some more thoughts on the issue of content types.
>
> First, what are we actually trying to describe? If the observation table
> describes "observations", this is not necessarily the same thing as,
> for example, downloadable archive files. So for example an "observation
> type" could be quite different than a "data product type", and might
> describe the physical nature of an observation rather than the type of
> a specific data product (a data product could also be an observing log,
> proposal cover page, etc.). As already noted there is often a one-to-many
> relationship. This is true also for "projects", "survey fields", and
> so on. All of these really require their own model to describe.
>
> Since "observation" is not well defined and we are physically
> characterizing and accessing data I suspect we are actually talking
> about archive data products - catalogs, images, spectra, and so
> forth. Probably we only want to index data with physical coverage
> (spatial, spectral, time, polarization), i.e., data with a physically
> meaningful observable of some sort (hence the actual implied meaning
> of "observation"). Any other data products should be available only
> indirectly via association or linking; we are trying to avoid those to
> provide a mechanism which does not require custom relational queries.
>
> If the above is true then we want to index catalogs, images, spectra,
> time series, etc., and possibly instrumental data as well (observatory
> archives at least need this). Other data products such as observing
> logs or proposal cover pages do not fit the model and should only be
> available via data linking.
>
> This implies that we will need to make all data products which have an
> observable and coverage available in the ObsTAP index, regardless of type.
> A more complex concept such as an observation, observing project, survey
> field, etc.
> may include any number of individual data products plus
> their associated non-observed data products.
>
> If this is what we want to do then we can conclude:
>
>     o Each row of the Obs table describes a science data product.
>
>     o A science data product has a primary type such as catalog,
>       image, spectrum, and so forth (or various, ultimately open-ended
>       subclassifications, all the way down to instrumental data).
>
>     o Rows may be logically associated to describe complex data
>       associations such as an observation, survey field, etc.
>
>     o Data product descriptions (rows of the main table) could
>       directly include an acref URL to get a single data product.
>       In general however the OBSID (or OBS_ID, whatever) could be
>       used as a foreign key to search an associated data links table
>       to discover an open-ended set of links which could be followed
>       to do various types of things with the science data product.
>
>       A simple link might describe a type of associated data product
>       with a specific data product type (not necessarily a science
>       data product at this point), dataset identifier, and acref
>       URL which could be used to directly access the data product.
>       Other types of links such as for services, queries, etc. are
>       also possible as already mentioned.
>
> This scheme would provide for simple discovery and description of any
> type of science data product, in many cases with a simple acref URL
> for download. But we could also describe more complex associations
> consisting of several simpler science data products, and link to
> associated non-science data products, or services which are available
> on the server to do more complex things with the data product.
>
> Content Types
>
> If the content type, or data product type, describes an individual
> science data product then I suggest we start with the main categories
> which astronomers expect when they access our archives: catalog, image,
> spectrum, and so forth.
> A more general multi-parametric classification
> could also be useful for semantic analysis but we need to do something
> reasonable if the astronomer just asks for all "images", "spectra",
> "SEDs", "spectral image cubes", or whatever. A simple keyword search
> on such a term might suffice in most cases.
>
> A reasonable approach might be a scheme with several levels, e.g., a
> primary object or data product type (catalog, image, spectrum, etc.),
> a predefined subtype (catalog.source_catalog, image.cube, spectrum.SED),
> and then whatever the data provider wants to add, e.g., to identify data
> from a particular instrument.
>
> We need to discuss what the half-dozen or so primary types might be but
> something like table/catalog, image, spectrum, time_series, visibility,
> event would come close to what an astronomer might expect.
>
> Ultimately we end up with something like ..
> and a search on (catalog, image, spectrum, etc.) would
> find everything in that category. But if additional types are specified
> anything including those keywords would be found. Ultimately the exact
> data product or content type used by the data provider could be specified
> to get an exact query.
>
> I won't attempt to refine the classification here, but something
> along these lines is flexible, should be familiar to astronomers,
> and would probably work fairly well.
>
>       - Doug

Here is a more graphical illustration of what I describe above.

Three levels of classification (four if we include the root GDS/Obs model):

    GDS/Obs
        Primary Type (Table/Catalog, Image, Spectrum, etc.)
            Subtype (Image.2D, Image.Cube, etc.)
                Custom (provider-defined, open ended)

    GDS/Obs
        Table
            Catalog
                Source_Catalog
            SimDB
            (etc.)
        Image
            2D sky image
            Cube
                Spectral
                Time
                Polarization
            Longslit
            (etc.)
        Spectrum
            1D spectrum
            SED
            MOS/IFU aggregation
        TimeSeries
            Light Curve
        Visibility
        Event
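
To make the scheme a bit more concrete, here is a rough Python sketch of how dotted product types and a links table keyed on the observation ID might behave. Everything named here (ObsRow, DataLink, type_matches, the field names, the example.org URLs, the sample type strings) is made up for illustration and is not part of any agreed model. The matching is a simple level-by-level prefix match, which is only one reading of the search behaviour described above; a looser keyword match would fit the description as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObsRow:
    """One row of the Obs table: a single science data product (hypothetical fields)."""
    obs_id: str
    product_type: str  # dotted path: primary[.subtype[.custom]]


@dataclass(frozen=True)
class DataLink:
    """One row of an associated data links table (hypothetical fields)."""
    obs_id: str      # foreign key back to ObsRow.obs_id
    link_type: str   # kind of linked product or service
    acref: str       # access reference URL


def type_matches(product_type: str, query: str) -> bool:
    """True if the levels of `query` are the leading levels of `product_type`.

    Levels are compared whole and case-insensitively, so "image" matches
    "Image.Cube.Spectral" but not "imagery.x".
    """
    have = product_type.lower().split(".")
    want = query.lower().split(".")
    return have[: len(want)] == want


def find_by_type(rows, query):
    """Return the rows whose product type falls under `query`."""
    return [row for row in rows if type_matches(row.product_type, query)]


def links_for(links, obs_id):
    """Use obs_id as a foreign key to collect the links for one data product."""
    return [link for link in links if link.obs_id == obs_id]


# Illustrative data only.
rows = [
    ObsRow("obs-1", "image.cube.spectral"),
    ObsRow("obs-2", "image.2d"),
    ObsRow("obs-3", "spectrum.sed"),
    ObsRow("obs-4", "catalog.source_catalog.myinstrument"),
]
links = [
    DataLink("obs-1", "preview", "http://example.org/data/obs-1/preview"),
    DataLink("obs-1", "observing_log", "http://example.org/data/obs-1/log"),
]

print([row.obs_id for row in find_by_type(rows, "image")])       # ['obs-1', 'obs-2']
print([row.obs_id for row in find_by_type(rows, "image.cube")])  # ['obs-1']
print([link.link_type for link in links_for(links, "obs-1")])    # ['preview', 'observing_log']
```

A query on a primary type alone returns everything in that category, adding a subtype narrows it, and giving the provider's full type string yields an exact match. Non-science products such as the observing log never appear as rows of their own and are only reachable through the links table.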