Observation Data model Core Components and ObsTAP implementation: 
   + Request for Comments
This wiki page will act as the RFC center for the Proposed Recommendation entitled "Observation Data Model Core Components and its Implementation in the Table Access Protocol, version 1.0".
The specification can be found below as attached files and on the IVOA Document page:
http://www.ivoa.net/Documents/index.html
 Last updates (Oct 2011)  are available below as attached files.
 Reference Implementations  
  
      access web page (soon) 
http://xcatdb.u-strasbg.fr/2xmmidr3/tap 
  
 Other implementations  
-  Chandra ObsTAP service: ongoing
   + Review period:  03 May 2011 to 17 July 2011*
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your WikiName so authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.
Additional discussion about any of the comments or responses can be conducted on the Data Model and DAL mailing lists.
However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.
 Comments from the IVOA Community during RFC period: 03 May 2011 to 17 July 2011* 
 
An Implementation Note is currently in preparation to list and illustrate how an ObsTAP service can be set up based on the ObsCore/TAP specification described in this Proposed Recommendation.
-- 
MireilleLouys - 04 May 2011
Thanks for adding your comments below.
(1) In Figure 1, shouldn't SimpleDALRegExt rather be TAPRegExt?
-- Response: 
MireilleLouys - 04 July 2011
The figure has been updated with TAPRegExt highlighted in red.
(2) For Section 5, I'd suggest the following wording:
  The standard URI of ObsCore is ivo://ivoa.net/std/ObsCore-1.0.  Thus,
  to declare that a TAP service implements the ObsCore model, include
  
  <dataModel ivo-id="ivo://ivoa.net/std/ObsCore-1.0">ObsCore 1.0</dataModel>
  
  in the service's capabilities element as included in the registry
  record and available on the service's capabilities endpoint.  The
  details are discussed in [TAPRegExt].
  To accommodate naive user queries, it is recommended that TAP services
  implementing the ObsCore model give ``ObsCore'' as a subject in the
  registry record.  
Rationale: TAP services must be registered anyway (so "should be registered" is in conflict with the TAP REC).  The ObsCore "keyword" is a nice and lightweight idea, but in registry language "keyword" is "subject".  
-- Response: 
MireilleLouys - 04 July 2011
This has been partly included in the updated Section 5, thanks to Pat Dowler.
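As an illustration of the declaration proposed in comment (2), a client could look for the ObsCore dataModel element in a service's capabilities document. This is only a minimal sketch: the function name declares_obscore and the wrapper elements in SAMPLE are assumptions made for the example, and a real capabilities document is namespaced and much richer. The ivo-id value is the one quoted in the comment above.

```python
import xml.etree.ElementTree as ET

# Standard URI quoted in comment (2).
OBSCORE_IVOID = "ivo://ivoa.net/std/ObsCore-1.0"


def declares_obscore(capabilities_xml: str) -> bool:
    """Return True if any dataModel element carries the ObsCore ivo-id."""
    root = ET.fromstring(capabilities_xml)
    for elem in root.iter():
        # Drop an XML namespace prefix such as "{uri}dataModel" if present.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "dataModel" and elem.get("ivo-id") == OBSCORE_IVOID:
            return True
    return False


# Hypothetical, heavily simplified capabilities fragment (assumption).
SAMPLE = """<capabilities>
  <capability>
    <dataModel ivo-id="ivo://ivoa.net/std/ObsCore-1.0">ObsCore 1.0</dataModel>
  </capability>
</capabilities>"""

print(declares_obscore(SAMPLE))
```

The comparison here is an exact string match; a production client would probably want to follow the identifier comparison rules of the registry standards instead.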
(3) For Section 6 -- in case it matters for the note, I have ObsCore support in my data center software DaCHS, and it's live on http://dc.g-vo.org/tap.  For the record, I think reference implementations (and, if at all possible, validators) should be part of the REC rather than being deferred to a note.
-- 
MarkusDemleitner - 06 May 2011
-- Response: 
MireilleLouys - 04 July 2011
Due to the long specification and discussion of the ObsCore Data model, implementation examples will be described in a separate note.
Comment: Please make the pol_states keyword mandatory
Reason: There are four types of information an electro-magnetic wave can
be described with but only three of them are mapped into ObsCore. Section B.6.6. already describes how polarization can be mapped in detail. In my opinion this information should be mandatory to 
   (1) make the description of the measurements complete and to 
   (2) render all ObsTAP services symmetric.
It is important to note that this request will not put any additional burden on service providers of standard imaging or spectral products, as for these the pol_states value will always be constant: "I". In any reference implementation this could be the default value, thus resulting in no additional work whatsoever for these providers.
-- 
FelixStoehr 2011, May 23
----------------------------------------------------------------------
-- Response 
MireilleLouys 2011,May 30
Whether or not a data set contains polarised data is expressed in the mandatory field o_ucd.
If this string field contains 'polarization', then we know polarised data is present.
If not, then it is not necessary to consider this data set in a search for polarised data.
In this way, we think we use less metadata to describe observations recording one type of flux.
pol_states is useful in the specific use cases where we search for which kind of polarisation is present after o_ucd has been checked.
-- Response : complements 
MireilleLouys 2011, June 08
This was discussed during the telecon on Monday June 06. See the minutes attached below. It was agreed to make pol_states mandatory, with a NULL value when not applicable. In some database systems, an ADQL query that combines constraints on o_ucd and pol_states requires both columns to be defined. Therefore pol_states needs to be defined, and hence mandatory.
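The argument above can be tried out with a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for a TAP service's database back end (an assumption; real services use other engines and ADQL rather than plain SQL). The table contents, the o_ucd strings and the '/I/Q/U/' encoding of pol_states are illustrative values only.

```python
import sqlite3

# A query combining constraints on o_ucd and pol_states.
QUERY = (
    "SELECT obs_id FROM obscore "
    "WHERE o_ucd LIKE '%polarization%' AND pol_states LIKE '%Q%'"
)


def run_query(with_pol_states: bool):
    """Run QUERY on a toy table; return matching obs_ids or the error text."""
    columns = "obs_id TEXT, o_ucd TEXT"
    if with_pol_states:
        columns += ", pol_states TEXT"
    con = sqlite3.connect(":memory:")
    con.execute(f"CREATE TABLE obscore ({columns})")
    if with_pol_states:
        con.executemany(
            "INSERT INTO obscore VALUES (?, ?, ?)",
            [
                # Plain image: pol_states is NULL (not applicable).
                ("img-1", "phot.flux.density", None),
                ("pol-1", "phot.flux.density;phys.polarization", "/I/Q/U/"),
            ],
        )
    else:
        con.execute("INSERT INTO obscore VALUES ('img-1', 'phot.flux.density')")
    try:
        return [row[0] for row in con.execute(QUERY)]
    except sqlite3.OperationalError as exc:
        # The column is missing, so the query cannot even be prepared.
        return f"error: {exc}"
    finally:
        con.close()


print(run_query(with_pol_states=True))
print(run_query(with_pol_states=False))
```

With the column present, rows holding NULL simply fail the pol_states constraint; without the column, the same query is rejected outright, which is the situation the mandatory column avoids.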
Comment: typos and clarification on cresolution
1) Table 6 typos: for the em_min and em_max utypes I assume the last terms
should be LoLim, HiLim, not LoLlim and HiLlim.
2) The s_resolution is listed as "Char.SpatialAxis.Resolution.refval.cresolution".
What does the "cresolution" refer to?
Also, the resolving power is "Char.SpectralAxis.Resolution.resolpower.refval".
Should it be refval.resolpower to be consistent?
-- Randy Thompson 2011, Jun 02
-- Response: 
MireilleLouys - 04 July 2011
Typos on LoLim, HiLim corrected. CResolution is an STC substitution type and has been removed. The correct Utype is now "Char.SpatialAxis.Resolution.refval".
A lot of typos found:
I am referring to pdf version so some objections given below may be valid only for this - but it has to be checked anyway
1. Introduction last sentence - the global data discoverability and accessibility is in very strange hand-written font. Similar but different font in Section 2, 2nd sentence
4.12 4th paragraph 1st sentence - redundant "the" in "of the observed the region"
Sec.5 at the end [REF] - probably the reference is expected 
Sec. 6. - comment [add url] twice - should be filled. The question mark to be removed.
Appendix A.1.2 Use case 1.2
What does it mean LIST=SERVICE REQ and AND=SERVREQ 
Explanation needed.
In addition in use case III - bad range 5000-9000 (missing zero)
A.1.3 in listing:
3rd line - AND s_resolution< 0.3 (missing underscore)
A.1.4. 
III  Spatial Resolution < 0.7 arcseconds   (double dot instead of period 0:7)
A1.5 
III - again strange comment SERVIC REQ + NEEDS ANOTHER SERVICE (CATALOGUE)  and missing right parenthesis 
A1.6. 
LIST=SERVREQ
and in last line - is the absolute value really called by vertical line in TAP query ?
A.2.3
SERVICEREQ+NEEDS ANOTHER SERVICE
A.3.3 listing 
AND t_exptime> 3600 ?    (redundant question mark)
A.5.2  listing last line
s_region (missing underscore)
A.6.2  sentence For the quasars, give me high resolution (<0.5") 
the double dot in 0:5
In general - in the appendix the capitalized comments like SERVICE REQ are cryptic and inconsistently written (although I understand they were copied from some notes). However, a document like ObsCore should be precise even here (the missing underscores may confuse people if they try to call a service using cut and paste in the future).
B.2.1 in sentence "So the users could be warned"
(it is only "be warn")
B.4.1  The last sentence begins in different font (The same dataset - it is smaller - is it an intention ?
B.4.4 - defines rights "proprietary"  but in B4.5 last sentence
An observation with a NULL value in the releaseDate attribute is private by definition.
Is it meant the "proprietary" as a reference to B.4.4 rights ??
B.6.1.1. last sentence: strange reference to STC, verbatim "CITATION STC \I 1036" - probably some remnant of a citation tool
B.6.2 last sentence - again strange citation remnant
B.6.2.3.  after the case a)
(FWHM of the LSF) -- is it the Line Spread Function ? (probably not so commonly known abbreviation to anybody ...) 
after case b)
 ... resolution power stored as   - there is period power.stored
after case c) 
Char.spectralAxis.Resolution.resolPower LoLim
seems to be a white space before .LoLim
B.6.3.1
strange reference
B.6.5.1 
... more complex combinations such as:
phot.flux.density;phys.polarization.stokes.I 
is it correct  the stokes.I ? (or Stokes.I ?)
last sentence
should be probably parentheses (See examples in the table) 
B.6.5.2
strange citation
B6.6.
2nd paragraph  
phot.flux.density;phys.polarisation.Stokes.I   (written Stockes) and missing period - is it OK ?
Tables 6 and 7:
What do Principal, Index and Std mean?
The index TBD - or zero - what is the meaning of it?
And why are there sometimes question marks? It is a standard and everything should be clear to the reader.
 
In addition to the typos and clarification needed:
There are a lot of places referring to obsolete versions of SDM or SSAP.
I suppose SSA 1.0 at least (or 1.1) should be referred to
(in 3.3, B.4.3, B.5, B.6.5.2) and for SDM probably version 1.1
(in 3., B.3.7, B.6.2).
One inconsistency between Table 5 and the definition in B.6.2.1:
there is no em_calib_status in the table, but in B.6.2.1 it is defined as {calibrated, uncalibrated, relative, normalized}.
I think it should be the level of wavelength calibration (not flux),
so the usage of "normalized" is questionable (in SSA we have normalized only for fluxcalib). But it would break the symmetry in the DM (same calibration for all axes).
However, I think the introduction of the observable axis o_calib_status in B.6.5.2 is the correct place, but it is {absolute, relative, normalized, any}.
B.6.5.2 refers to SSA (it is a value of the FLUXCALIB query parameter).
So in a query I can use "any", but if it is describing the characterization (data model), it does not directly correspond to a query;
however, absolute and relative are both a state and a query value.
I feel the columns from ObsCore should be mainly used for queries, so the SSA value of "any" is more correct than the SDM one.
I think the whole em_calib_status should refer to the SSA query params, with the value "normalized" kept for symmetry with SDM (maybe I can normalize the wavelength as well, to get a relative distance from a given line, e.g. in percent).
The same inconsistency exists between Table 5 and B.6.1.4:
s_calib_status is defined as {uncalibrated, raw, calibrated} but in the table it is NOT CALIBRATED, FINE COARSE....
For t_calib_status (Table 5), "Type of coord calibration" is written, with no values defined, and nothing is defined in B.6.3.3 - what values can the time calibration level have?
I am confused about this - is it an inconsistency between SDM, SSA and CharDM, or am I missing something important?
-- 
PetrSkoda 2011, Jun 03
-- Response: 
MireilleLouys - 04 July 2011 
Most typos corrected in the last version. Values of calibStatus updated for each axis.
Comments from Arnold Rots during Chandra's ObsTAP service implementation  -- August 01, 2011
There is indeed not a showstopper [for Recommendation...], but there are (imho) obvious ways in which the PR can be improved, making it a more rigorous standard and preventing future problems.
I list 7 specific recommendations, in case you would want to amend the
PR, but you will notice that they are in decreasing order of priority,
at least on my scale.
To separate out the Data Link components and to make it a pure Data
Discovery protocol:
1. Remove the access_* parameters, since they really belong to Data
Linking.
2. Allow dataproducts_type to contain a list of all data products
types that are available for the observation. This should include the
current list, augmented with something that indicates multi-file
packages. The dataproduct_subtype can probably be dropped, unless it
were to be used to indicate the contents of packages.
3. I think it would be extremely helpful if there were a parameter
added that enumerates the IVOA protocols (SIAP, SCS, ...) that are
available for the observation - again, a list.
These three items would allow repositories to specify an observation
in a single record and make the responses more concise.
In addition, there is one item that would be useful for HEA (event)
data:
4. Add a parameter o_stat_error_type (or similar), a string, that
allows the server to specify the nature of the errors in the
observable. "POISSON" would be appropriate for event data (just a
plain single value, as is currently provided, clearly is not useful),
but other uses have been suggested as well.
-- Response: 
MireilleLouys - 16 Sept 2011
This has been explicitly developed in the document as an example of additional columns; see section B.6, Additional parameters on the Observable Axis.
In the text of the PR there is inconsistency with respect to the
definition of resolutions. Time and spectral resolutions are to be
"mean" or "average", while spatial resolution has to be the "highest
available". A single value or a range is allowed in all cases, but
there is no reference in the text to a range in spatial resolution.
-- Response: 
MireilleLouys - 16 Sept 2011
Added description in section B.6.1.3 mentioning the range in spatial resolution: s_resolution_min, s_resolution_max.
5. I would like to suggest that the single values should represent a
"typical good value" (or something similar) and encourage the use of
ranges whenever there is such a range. Averages do not make sense,
because it is not clear what one should average over: just the average
of the range; weighted by FOV area? Trust me, you don't want to go
there with X-ray observations.
-- Response: 
MireilleLouys - 16 Sept 2011
Text changed accordingly in section 4.13
The obs_release_date could conceivably become ambiguous: is it the
date when the observation was publicly released or when the current
version of the data products was released? I assume it is the former,
but one could consider:
6. Add an optional parameter dataversion_release_date to indicate the
release date of the currently available data products.
-- Response: 
MireilleLouys - 16 Sept 2011
This is true: some ambiguity remains. Here the model needs to be worked out in further detail. Several dates are already available in DataID and Curation. This can be discussed and clarified from the lessons learnt in various implementation tests and brought into the next version of the specification.
Finally, a pet peeve of mine that is not terribly important, but is
telling about the mindset of the PR: the spectral units.
One has the choice between wavelength, frequency, and energy.
The PR has chosen 'm' which is the least physically meaningful of the
three - it just runs the wrong way when one does physics with the data.
7. Change the spectral unit from 'm' to 'Hz'.
-- Response: 
MireilleLouys - 16 Sept 2011
Actually, the PR chose wavelength and the SI unit system. There is no way to avoid conversion when considering the full range of spectral coordinates.
In order to support global search in the spectral domain, we need to fix a common spectral unit in which all comparable quantities for spectral coverage can be expressed.
Therefore the mandatory spectral items defined in this model (em_min, em_max, etc.) have been chosen as wavelengths expressed in meters, widely used in multiwavelength astronomy.
These metadata are not stored like this in radio or high-energy data centers and require a conversion to be exposed as ObsTAP metadata.
It seems safer to have the conversion done on the archive side, where care is taken by the publisher, than by a client application building on-the-fly queries (my personal opinion).
This is necessary for global discovery, but does not prevent distributing data sets described in their native units.
For instance, to expose X-ray spectra, the data model items will be:
-  ObsTAP mandatory items: em_min, em_max in meters, in order to support multi-wavelength queries
-  ObsTAP optional parameters em_unit=keV, em_ucd=em.energy, and values in keV in customised columns named like: em_min_kev, em_max_kev, etc.
Then the global query in use case 1.1 (Appendix A) will be:
 SELECT * FROM ivoa.Obscore
 WHERE em_min < 2.48E-10 AND em_max > 2.48E-10
Finer queries can be built on the customised columns of the service, like:
 SELECT * FROM ivoa.Obscore
 WHERE em_ucd='em.energy' AND em_min_kev < 5 AND em_max_kev > 5
Customised columns are easily retrieved with, for instance:
 SELECT * FROM ivoa.Obscore WHERE obs_collection='myfavorite-xray-data'
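The archive-side conversion described in this response can be sketched as follows. This is a minimal illustration, not part of the specification: the function names kev_to_meters and energy_band_to_em_range are hypothetical, and the only physics used is the relation wavelength = h c / E. It shows that a photon energy of 5 keV corresponds to roughly 2.48E-10 m, the wavelength used in the global query.

```python
# Physical constants (exact values in the SI since the 2019 redefinition).
PLANCK_J_S = 6.62607015e-34       # Planck constant, in J s
LIGHT_M_S = 299792458.0           # speed of light, in m/s
JOULE_PER_KEV = 1.602176634e-16   # one keV expressed in joules


def kev_to_meters(energy_kev: float) -> float:
    """Return the wavelength in meters of a photon of the given energy in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return PLANCK_J_S * LIGHT_M_S / (energy_kev * JOULE_PER_KEV)


def energy_band_to_em_range(e_min_kev: float, e_max_kev: float):
    """Map an energy band in keV to (em_min, em_max) in meters.

    Higher energy means shorter wavelength, so the bounds swap:
    e_max gives em_min and e_min gives em_max.
    """
    return kev_to_meters(e_max_kev), kev_to_meters(e_min_kev)


print(f"{kev_to_meters(5.0):.3e}")          # wavelength of a 5 keV photon
print(energy_band_to_em_range(0.5, 10.0))   # em_min, em_max for 0.5-10 keV
```

Note the swap of bounds in energy_band_to_em_range: a publisher that copies the minimum energy into em_min without swapping would expose an inverted range.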
 Comments from TCG member during the TCG Review Period: 21-July-2011 - 15-September-2011 
WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or not the Standard.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
 TCG Chair & Vice Chair (Christophe Arviset, Séverin Gaudet) 
I greatly appreciate the important effort that has gone into this IVOA standard, which represents the first instance of an IVOA standard fully driven by a Science Case. I thank the people who have spent lots of time, effort and travel in this context.
I approve the Document.
-- 
ChristopheArviset, 04 October 2011
 Applications Working Group (Tom Mcglynn, Mark Taylor) 
While I don't see any major issues with the proposal technically, the proposal seems to contain a lot of ancillary justification that isn't necessary for the standard and which could get in the way of someone trying to read it with the goal of implementing it. A lengthy discussion of the justification for the standard, use cases, implementations and such does not belong in the standard itself (though they may be needed in the RFC process).  
My one technical concern is that the inclusion of the region field implies some capability of handling ADQL SQL extensions substantially beyond what a minimal TAP service need support, but this is mitigated by this field being nullable.
-- 
TomMcGlynn June 13, 2011
 Data Access Layer Working Group (Patrick Dowler, Mike Fitzpatrick) 
The issue raised above about the access_* columns pertains to how the PR is used within a TAP service. The only solution currently available is to provide a URL where users can retrieve the data. The PR does not constrain what is available at this URL, so client software will have to be careful (check content-type and do something appropriate). This is not a problem that the PR can hope to solve and the DAL-WG is undertaking work to solve this (currently called DataLink).
The access_format and access_estsize columns are not easy to support in more complex data centre environments where such things belong to some other subsystem (the storage system). In this case I agree that these access columns are mixing in access details that do not belong in the data model. However, implementors can return NULL and in a future version they could be made optional or removed (e.g. implementors with these columns would remain compliant since extra columns are allowed). I do not feel that these belong in the model, but it is not a showstopper.
For now, the DAL-WG accepts the use of access_url since we do not have an alternative to offer at this time. 
Approved.
-- 
PatrickDowler 2011-10-04
 Data Model Working Group (Jesus Salgado, Omar Laurino) 
I approve the document.
I really appreciate the effort done and the quality of the specification. I also appreciate the compromise to review this specification as soon as we have a stable recommendation of Data Linking. 
A future combination of ObsCoreTAP and Data Linking would be quite powerful.
I only have a short list of typos that could be corrected in the last update:
- List of Acronyms:
DAL should be Data Access Layer instead of Data Access Protocol
- There are some references in the specification to the sentence "global data discoverability and accessibility" with a strange typography 
- Section 3 Spectrum data mode instead of Spectrum data model
- Simple Examples: the last "curl" example does not look the same as the previous ADQL SELECT query
- There are some references to utypes in the text that do not seem to show the proper case sensitivity (in particular in Appendix B1) like Char.spectralAxis.Resolution.refval.value, Char,spectralAxis,Resolution.resolPower.refval
This looks changed in the table. As it is not really decided how to proceed with case sensitivity for utypes, I propose to review them and correct wherever applicable.
-- 
JesusSalgado Sept 23, 2011
-- Response: 
MireilleLouys - 2011, Oct 13 
Typos and Utype case corrected in version 20110926 and following.
 Grid & Web Services Working Group (Andreas Wicenec, Andre Schaaff ) 
I approve the document.
-- 
AndreasWicenec, 5 Oct 2011
 Registry Working Group (Gretchen Greene, Pierre Le Sidaner) 
Would it be possible to define a rule for naming parameters?
For example:
(three letters from the DM)_(parameter_name), e.g. sed_frequency
As there will be different data models related to TAP, when comparing results we need to identify parameters that can be cross-correlated.
-- 
GretchenGreene Sept 19, 2011
-- Response: 
MireilleLouys - 27 Sept 2011
This specification defines all items for one model only, for an ObsTAP service. The TAP_SCHEMA columns use the utype field and the "obscore:" prefix to tell which data model the column refers to.
An SED TAP service can be defined along the same lines.
For a TAP service using fields from, say, two models, the utype prefix should be explicitly mentioned and not factored out as in the tables presented in the appendix.
-- Response: 
JesusSalgado - 16 Oct 2011
From the TCG meeting:
Although this is an interesting topic, a general rule to generate column names (or short aliases) from utypes is not covered by the ObsCoreDM specification. This could be addressed in the utypes specification to be developed in the short term. Another approach followed in the past is "query by utypes" (a query using utypes is then mapped to your DB data model by a special server implementation mapping). This was prototyped in the past, although some limitations were found.
In summary, the issue mentioned will be discussed as a possible feature of the utypes specification but, in our view, it should not be needed for the current ObsCoreDM, where the column names and the utypes are directly described in the spec.
I approve the Document.
-- 
GretchenGreene, 18 October 2011
 Semantics Working Group (Sebastien Derriere, Norman Gray) 
Congratulations on this important and very well written standard.
Document approved.
Some typos found in the document :
- p10 Section 3 : Spectrum data mode -> Spectrum data model
- p13 in table BB5.2 -> B5.2
- p15 last paragraph of section 3.3.1 there is a reference to section 3.3.1, while it should be 4.7
- unit for kilobytes should be "kbyte", not kB (in table p17, p35, p48)
- p24 in section 6, there is a [add url] left. And section 6 is numbered 2 in the PDF
- p28 curl command inconsistent with ADQL above; in A.1.1, missing - sign in em_min: 2.48E-10
- p38 B.1.2 missing "be" on line 5 "specific archive to BE precisely"
- p38 B.2.1 could be warn -> could be warned
- p39 B.3 two dataset -> two datasets
- p45 "phys.polarization.stokes.I" is not a standard word, but "phys.polarization.stokes" exists
- remarks on UCDs in tables 6 and 7: 
-  p48 and 50: class and creator are not valid words, use "meta.id" for dataproduct_type, dataproduct_subtype and obs_creator_name
-  for calib_level, ucd="meta.code;obs.calib"
-  for access_format, ucd="meta.code.mime"
-  for s_region ucd phys.angArea more adapted than phys.area?
-  for t_max : ucd="time.end;obs.exposure"
-  for em_res_power ucd="spect.resolution" (missing t)
-  in s_resolution_min/max use dot and not underscore in stat.min, stat.max
-- 
SebastienDerriere - 09 Oct 2011
-- Response: 
MireilleLouys - 2011, Oct 13 
Typos, references and UCDs updated in the PR-ObsCore-v1.0-20111008 documents.
 VOEvent Working Group (Matthew Graham, Roy Williams) 
I approve this document. 
-- 
MatthewGraham - 13 Oct 2011
 Data Curation & Preservation Interest Group (Alberto Accomazzi) 
 Knowledge Discovery in Databases Interest Group (Giuseppe Longo) 
 Theory Interest Group (Herve Wozniak, Franck Le Petit) 
Approved.
-- 
HerveWozniak - 05 Oct 2011
 Standards and Processes Committee (Francoise Genova)