Spectral2RFC < IVOA


IVOA Web>IvoaDataModel>DataModelPassband>DataModelSpectra>Spectral2RFC (2015-10-27, MarkCresitelloDittmar)


Spectral v2.0: Proposed Recommendation: Request for Comments

This page contains public discussion of the Spectral 2.0 Proposed Recommendation; latest version: PS-SpectralDM-2.0-20150206

For the Second RFC period (Oct 2015) page, see: Spectral2RFCr2

Reference Interoperable Implementations

Spectral 2.0 has been implemented at:

VAO: speclib - Java library for interpreting Spectral2.0 model
- loads model definition from ancillary file, with design capable of alternate resource types ( DB, vo-dml, etc)
- generates interfaces for model components, supports all primitive datatypes, Quantity, Lists, used in Spectral model
GAVO: GAVO DC Software Distribution DaCHS
SVO: TSAP - Theoretical Spectral Model Service

Comments from the IVOA Community during RFC period: 13 May 2014 - 31 July 2014

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document

Comments by -- JesusSalgado - 2014-05-13
- Comment
Comments by -- MireilleLouys - 2014-06-19

Here are comments I made by iterating with Mark Cresitello-Dittmar during the Interop meeting in Madrid last May (2014).

Section 4: Characterisation

This section reads : "The following named instances may be used for defining axis requirements for specific use cases" - the diagram shows specializations for 'domain' restrictions (e.g. SpectralCharAxis) which is consistent with the Char-1.13 model. However, the description as "named instances" does not seem correct. We suggest the text be changed to: "The following labels may be used for defining axis requirements for specific use cases".

This term "instance" is also present in "3.9.2 Domain subclasses"

The *CharAxis objects are the "specialized realizations of the CharacterisationAxis" which are identified by the simple labels.

Attributes names and spelling:
- CalibStatus: CalibrationStatus
  - To agree with proposed ObsCore-1.1 changes this should be
    - lower case 'calibrationStatus' attribute
    - with UTypes changed to match "CalibStatus" -> "CalibrationStatus"
- BinSize|BinLow|BinHigh
  - should be annotated as additions for DataAxis support... they are not part of Char-1.13 model
- SampleExtent
  - we could discuss the ambiguity of this with BinSize on SpectralAxis. However, since the BinSize is only a DataAxis element, this should not be a conflict. The SampleExtent can provide the characteristic BinSize.

Section 6: IVOA Conventions

- subsection "6.1.3 UTypes" describes labels FluxAxis etc, as "denote specific instances of CharacterisationAxis"
  - the phrasing should be consistent. CharacterisationAxis can be considered as a template; the *CharAxis are specialized realizations of that template, and these (FluxAxis, SpectralAxis, etc.) are the labels for these specialized realizations in the UType string. Moreover, these labels express the role of each specialised axis with respect to the Characterisation class.

All changes implemented on version PR-SpectralDM-2.0-20140730 -- JesusSalgado - 2014-07-31

Comment by -- OmarLaurino - 2014-08-08

In order to help implementors upgrading from Spectrum 1.1 to Spectral 2.0, Mark Cresitello Dittmar and Jamie Budynkiewicz produced this "cheat-sheet" highlighting how elements in Spectral 2.0 map to elements in Spectrum 1.1 and the new elements utypes in Spectral 2.0: map-document-new.pdf

Comments from TCG member during the TCG Review Period: 1 August 2014 - 15 September 2014

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or not the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Séverin Gaudet, Matthew Graham )

2 comments:

1. The architecture diagram needs updating and a section added to explain the related standards. I have sent this separately to Mark.
2. Are there two interoperating reference implementations of SpectralDM? How does one demonstrate/explain that a reference implementation is complete? Until we can answer these questions, it will not be possible to ask the Exec for endorsement as a recommendation.

Once point 1 is done, I approve the document. The document should be updated and be ready for Exec approval. Once point 2 is answered, then the standard can be submitted to Exec.

-- SeverinGaudet - 2014-10-11

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved -- PierreFernique - 2015-02-09

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Approved -- MarcoMolinaro - 2014-09-15

Data Model Working Group ( Jesus Salgado, Omar Laurino )

Approved -- JesusSalgado - 2014-09-10

Grid & Web Services Working Group ( André Schaaff, Andreas Wicenec )

Approved -- AndreSchaaff - 2014-09-30

Registry Working Group ( Markus Demleitner, Pierre Le Sidaner )

First off, apologies for coming with this so late. Many of the following comments I should have made in public RFC as an implementor; alas, I was busy elsewhere. I certainly don't want to abuse my position as TCG member to push them in; the points marked with (*), however, are I believe appropriate for a TCG review. Disregard the others as you see fit, except if the document makes another round through public RFC (which I, frankly, would prefer).

My main concern is: Implementations. The RFC page lists speclib on the client side and TSAP on the server side -- as the model is so large, I'd like to get an idea what part of the standard these actually use/support, and in what sense interoperability has been shown. I am a bit surprised that TSAP is a reference implementation, when 7.1.2 requires a 'SpatialAxis'. On the mailing list, the rationale for this was that the model was meant for observational spectra. As a baseline, I think there should be at least example serialisations using all features provided by the DM (*)

Are there any other experiences with the data model and, in particular, its serialisations? [With my GAVO hat on: I'm offering an attempt at a partial server-side implementation until the end of the year].

Also, is there any plan to make a validator for at least the VOTable serialisation? If speclib really reads a significant portion of this, couldn't that be used to make some sort of validator? (*)

If I remember correctly, one reason this was considered relatively urgent was that it was hoped it would show how to serialise time series, in line with our current time domain priority. Now, though time series are hinted at in some places, they are not really worked out as far as I can tell -- wouldn't it be economical to add the few pages here, next to the two other concrete types (Spectrum and Photometry Point)? It'd probably be faster than starting another effort in another document.

Other general remarks:

Given the number of cross references in the document, it would really help if cross-references could be hyperlinks.

Many headings are missing whitespace between the numbering and the heading text itself

For my taste, "Zero (0)" (passim) looks a bit odd, in particular when there's frequently no "(1)" after the "one"s that are often following. Isn't "Zero" enough?

I can't say I'm a big fan of the duplication of metadata between 'DataAxis' and 'CharAxis'. If I don't misunderstand the effort and this really is only to allow embedding "immediate" values in metadata, maybe that could be made a bit more explicit -- and maybe mentioning that, e.g., 'DataAxis'.unit and 'DataAxis'.ucd would not appear explicitly in a VOTable as they are mapped to PARAM attributes might also help understanding.

On to more specific issues, by section header:

MCD: I'm not sure if this is the right way to respond.. but I'll make comments in-line (in green), especially the (*) items. As I mentioned in Banff, there was never an attempt or requirement to 'correct' the descriptions of the metadata over the earlier Spectral document(s), though I did some cleanup and comparison against obscore at the time. The Dataset Metadata document for the Cube work has had a much more thorough scrubbing and cleanup. Many of these comments are very helpful for that effort, but I'm not sure how much time and effort should be put into this document given that:
1) the content is entirely consistent with previous versions
2) will be revised after the cube work for consistency with that model framework.

MCD-20150206: The document will certainly be up for revision post-Cube; how long that will be, I can't say. The consensus was that this version provided enough benefit and interest from the community to move forward.

1.3 Use cases

The section is empty -- while that's regrettable, at least the headline should be removed. (*)

MCD: I will restore the use case section from the previous doc, and see if new cases should be added.

MCD-20150206: Done

2.1.2 'Dataset.DataProductSubtype':

I can see the motivation of leaving the vocabulary here open -- but without further indication of the intended use, if only by example, it's hard to figure out what one could possibly put here.

MCD-20150206: Done - enhanced description, and added examples

2.1.3 'Dataset.CalibLevel'

0 here appears to mean "not in a standard format" -- in what circumstances could that turn up? If a dataset has a way to communicate this, isn't it in a standard format already? It'd also help if the text provided indication of (let's say, an example or two for) what 3 ("Enhanced") is intended to mean.

MCD-20150206: Done - enhanced description, and added examples

2.4.3 'Curation.PublisherDID': String

I'm not quite happy with "may be an internal ID" here. Common understanding, I believe, has been that PubDIDs should be VO global and IVORNs. I'd much rather have:

The 'PublisherDID' is a VO-global identifier of the dataset as assigned by the publisher. The recommended form is ?, e.g., ivo://example.net/imageservice?2013/5/2342. Other schemes, for instance using the authority ID as basis, are allowed, too. Note that the part in front of the question mark must be resolvable in the VO Registry.

After I've proceeded to 2.6., I'm now doubtful of this: Isn't this what 'DataId'.datasetID is? IMHO there should be some explanation on the relation here. (*)

MCD: I will see what I can do to help clarify the distinction.

MCD-20150206: Done - enhanced description, not quite as specifically as your text, but clearly states the values must be a valid IVOA identifier.
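
As an illustration of the identifier form proposed in the 2.4.3 comment above, the following Python sketch splits a PublisherDID at the question mark into the part that would have to resolve in the Registry and the publisher-local part. The function name `split_publisher_did` is hypothetical, and the check performed here (an `ivo` scheme with a non-empty authority) is an assumption made for the example, not a rule quoted from the Spectral document.

```python
from urllib.parse import urlsplit


def split_publisher_did(did: str) -> tuple[str, str]:
    """Split a dataset identifier into (registry part, local part).

    The registry part is everything before the first '?'; the local part
    is whatever follows it (empty if there is no '?').
    Hypothetical helper for illustration only.
    """
    registry_part, _, local_part = did.partition("?")
    parts = urlsplit(registry_part)
    # Assumed sanity check: ivo scheme and a non-empty authority.
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an ivo:// identifier: {did!r}")
    return registry_part, local_part


registry_part, local_part = split_publisher_did(
    "ivo://example.net/imageservice?2013/5/2342"
)
print(registry_part)  # the part a client would look up in the Registry
print(local_part)     # the key assigned by the publisher
```

Whether a client actually resolves the registry part is a separate step that this sketch does not attempt.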

2.4.7 Curation.References: String

I'm fairly unhappy with keeping this so generic. The way this is written, people will dump any old string in there, glueing together different references with characters an implementation has no way of figuring out. Couldn't we write (*):

2.4.7 Curation.Reference: String [Singular!]

One or more bibliographic references associated with the dataset. Applications might use these to suggest what works to reference when a dataset is used. To allow for automatic processing, values should be either bibcodes (discernable to the client as 19-character strings beginning with four digits) or DOIs (discernable to the client by their prefix "doi:"). Freetext references are allowed but discouraged.

The containing element can occur multiple times. Do not combine multiple references into one value.

MCD: Perhaps the notation isn't clearly conveyed. The convention in this doc (and the Cube) is for attributes with multiplicity >1 to be plural. If you consider the attribute, it holds the 'references'. Each instance is a singular reference which could be described as you suggest. So, the convention used is in question, and if changed, would be done across the board. Another option would be to show the type as "String[]".

MCD-20150206: Done - The attribute name(s) are now singular, and the text states the multiplicity of zero or more. The description rather speaks to the serialization, but hopefully clarifies things.
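
The client-side distinction suggested in the 2.4.7 comment above (bibcode, DOI, or free text) could look roughly like the Python sketch below. The function name `classify_reference` and both sample values are made up for illustration; the two rules are taken directly from the proposed wording (19 characters beginning with four digits, or the prefix "doi:") and are heuristics, not a full validation of either identifier scheme.

```python
import re

# Heuristic from the proposed text: 19 characters, the first four are digits.
# Whitespace is excluded so that short free-text strings do not match.
BIBCODE_RE = re.compile(r"\d{4}\S{15}")


def classify_reference(value: str) -> str:
    """Return 'doi', 'bibcode' or 'freetext' for a single reference value."""
    value = value.strip()
    if value.startswith("doi:"):
        return "doi"
    if BIBCODE_RE.fullmatch(value):
        return "bibcode"
    return "freetext"


# One value per element, as the proposal asks; none of these are real references.
references = [
    "2014TEST...12..345A",
    "doi:10.1234/example",
    "Private communication",
]
for ref in references:
    print(ref, "->", classify_reference(ref))
```

Keeping one reference per value is what makes this kind of per-item classification possible; a concatenated string would fall through to 'freetext'.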

2.6.

There's an "IVAO" (rather than "IVOA") here. (*)

MCD: will fix.

MCD-20150206: Done

2.6.3 'DataID'.Collections

Again, I'd suggest to make this singular and make clear this element may be repeated. Alternatively, we need clear rules how different entities would be separated. (*)

MCD: Again, the attribute isn't singular, it is a collection/list of things (in this case Strings), each representing a particular Collection. I can see about clarifying the structure, but your concern is more along the lines of "how do I serialize array parameters in the VOTable".. which is a different concern.

MCD-20150206: Done - similar change as with reference. Added this element to votable serialization example as well.

2.6.4 'DataID.DatasetID': URI

The relation to 'Curation.PublisherDID' should be clarified. Also, I'm not sure I'm a big fan of the text on journal-based URIs. I'd much rather have here the text proposed above on PublisherDID.

MCD: will update.

MCD-20150206: Done

2.6.5 'DataID.CreatorDID': URI

Again, I'd like to see a text similar to the one on PublisherDIDs here, except that it should be made clear that the base IVORN would be the one of the creator.

MCD: will enhance the description.

MCD-20150206: Done

2.6.7 DataID.Version

There should be an explanation for how this relates to Curation.Version.

MCD-20150206: Not Done - I'm not sure what the distinction is.

2.6.11 DataID.ObservationID: String

If this is intended to actually be an "internal" id, are there any expectations on the semantics? An example might help.

MCD: I don't believe there are any expectations/restrictions on the semantics.

2.7 DataModel

I think it would help understanding if it were mentioned here that concrete values are bound in sections 7 and 8. There, I'm confused that Prefix and Type are optional. Are there guidelines when they can/should be left out or be included? If not, I'd suggest to drop them entirely -- if an application cannot rely on them in the first place and they're not necessary for some specific task, why bother at all?

MCD: I'll look at the language. This bit has migrated quite a lot during the Cube model discussions, so is a little tricky. In the new Cube docs, this object doesn't even exist. The name MUST be present, and MUST match that specified by the particular model. The prefix MAY be specified, if not given, the default value MUST be used within the serialization. I think that's how it went. User-defined content would provide both to indicate their elements.

MCD-20150206: Done - descriptions updated

2.8.1 Derived.SNR: Double

Either provide an embedded hyperlink or say "can be obtained from [7]" at the end of the first paragraph.

MCD-20150206: Done - embedded the hyperlink.

2.9.1 ObservingElements

Typo: ObservingElments (*)

MCD: will fix

MCD-20150206: Done

2.9.4 DataSource.Name: String

Is this SSAP's DataSource? If so, can we harmonise this type with what's in the SSA registry extension for that?

MCD: OK

MCD-20150206: Done

2.11.1 Target.Name: String

I'd like to see some prose in here like "If at all possible, this object name should resolve in the domain-specific resolution services, e.g., SIMBAD or NED".

MCD: the cube model description does that.. I'll adopt it here.

MCD-20150206: Done - updated description.

2.11.4 Target.Class: String

I think we can't really write things like "an initial deployment of the VO would" in 2014. If we can't agree on a closed vocabulary here, let's at least put in some representative recommended terms. If we can't get any better, then let's at least put in the text for the equivalent field in obscore. (*)

MCD: In the cube work I have: "General classification of the target. This field supports the discovery of data pertaining to a common class, e.g. 'Star', 'Galaxy', 'AGN'. At the time of this writing, there is no IVOA recommended vocabulary for this field. The SIMBAD and NED databases use defined vocabularies for astronomical object classifications which may serve as the basis for such."

MCD-20150206: Done - updated description

3.1.2 SpectralSI: String (and following)

Since we now have VOUnits, is it really a good idea not to use it here? (*)

MCD: I don't see the conflict. These are an alternative/generalized unit representation of the VOUnit strings.

MD: ...which of course means that producers have to include very similar information twice in different ways, and consumers will have to support two ways of reading it, decide on how to reconcile conflicts, etc. I don't really understand what's the use of this. What functionality would be lost if all the *SI items were just removed?

MCD-20150206: I agree with your point. The values are a re-casting of the units associated with the data axis values (Data.*Axis.unit), and probably don't need to be part of the model. I'm not sure what use-case motivated their inclusion, they are mentioned in the SSA document as well, so there may be a dependency that isn't clear. I wouldn't be comfortable removing them without some discussion/consideration of the effects.

3.4 CoordSys

I'm concerned about the duplication of responsibilities between this and CoordSys. As I believe in general embedding is preferable to referencing in VOTables and I see little to gain by "normalising" this (so items with common systems can reference instead of repeat): Can't we just agree on sticking this information in CoordSys all the way through? It's as expressive, and it would remove optional elements and, best of all, another source for potentially conflicting information.

MCD: I'm afraid I don't understand.. which two CoordSys are you referring to?

MD: Hm -- re-reading this I don't really understand how things are intended to play out between characterisation and this CoordSys. It also doesn't turn up in C.1.1 (but it does in C.1.2). I guess what this boils down to is that some additional explanations are needed of how the axes down in the Char elements relate to the axes in CoordSys.

MCD-20150206: The relation between these was a big issue with this document, I recall we spent a good bit of time working this out. The top level CoordSys holds complete coordinate system definitions for the dataset, there can be several coordinate systems represented in any given dataset. The CoordSys element is associated with a particular axis, and is to specify which Frame the CharacterisationAxis information is provided in (ie: which of the N SpaceFrame-s is associated with the SpatialAxis metadata). I haven't made any changes to these descriptions.. or the examples.
I wanted to keep the serialization examples lean, since they really shouldn't be included in the model doc. A full serialization of the model would basically be a reference implementation, and external to the doc, either as a Note or in some location where we keep reference implementations of our standards.

3.6 CorrectionItem

For interoperability, I think this should include strict rules on what clients are supposed to do when they encounter CorrectionItems they don't understand (at least if they're marked as "not applied").

MCD-20150206: The requirements about what to do with elements they don't understand would vary depending on what type of client it is. A cutout service may just pass it along, where an analysis tool would need to handle it. So, I don't think that decision is appropriate for the model doc.

3.9.6 DataAxis.unit: String

I'd like to see a "MUST conform to VOUnits" here. (*)

MCD: I can add that (to each 'unit' element), section 6.2 does specifically state that the model requires compliance with VOUnit-1.0, but may be worth repeating.

MCD-20150206: Done

3.13 SpectralResolution

Is there actually a compelling reason to keep both ResolPower.refVal and Resolution, in particular since, as stated in the text, they can be fairly trivially transformed into each other? Having two spots for essentially the same thing is an implementation liability at the very least, and I'd argue for a 2.0 version "backwards compatibility" is not a terribly strong reason. And even "obscore compatibility" doesn't convince me. It would certainly be nice if our data models had more consistency, but at least until VO-DML is ready it seems we have to choose between intra-model consistency (one place, one form for an item) and inter-model consistency. I, for one, would go for the former any day.

MCD-20150206: again, I agree that trimming the fat would be good, but was outside the scope of this revision. I wouldn't want to remove things without a review of the consequences.
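
The transformation referred to in the 3.13 comment above can be sketched as follows, assuming the usual definition of resolving power, R = λ/Δλ, evaluated at a reference spectral coordinate. The function names and the choice of wavelength as the reference quantity are assumptions for illustration; the document's exact definitions of ResolPower.refVal and Resolution should be checked before relying on this.

```python
def resolution_from_power(ref_wavelength: float, resolving_power: float) -> float:
    """Resolution element (same unit as ref_wavelength) from resolving power.

    Assumes R = lambda / delta_lambda at the reference wavelength.
    """
    if resolving_power <= 0:
        raise ValueError("resolving power must be positive")
    return ref_wavelength / resolving_power


def power_from_resolution(ref_wavelength: float, resolution: float) -> float:
    """Resolving power from a resolution element given in the same unit."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return ref_wavelength / resolution


# Round trip at an arbitrary reference wavelength of 5000 (e.g. Angstrom).
delta = resolution_from_power(5000.0, 1000.0)
print(delta)
print(power_from_resolution(5000.0, delta))
```

Both directions need the reference spectral coordinate, so storing either quantity alone is only sufficient when that coordinate is also available in the metadata.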

4.4 Coverage

Here, the document structure appears to have gone funny -- there are first three empty subsections, then three sections that appear to flesh out these subsections. (*)

MCD: The structure is consistent, but the content is weak. Coverage (4.4) consists of 3 elements: Location (4.4.1), Bounds (4.4.2), Support (4.4.3). In those sections there should be content describing their use in that context (as elements of Coverage), and I didn't have anything specific to say. They each then have sections (4.5, 4.6, 4.7 respectively) defining the elements themselves. The wonky part of the section structure is that the sections are not in alphabetical order as they are in the other sections; it seemed more confusing to do that.

5.3.2 DopplerDefinition: Enum

"Comparisons to these values should not be case sensitive." -- does case-insensitivity help here a lot? In my implementation practice, I've always found case folding to be a noticeable burden and source of errors, while I usually fail to understand how they could be useful.

5.6.3 TimeFrame.Zero: Double

I'm not sure I understand what this is intended to do. That may be me, but I'd read this as allowing some global shift in all times in a document, and that I'd find at least in clear need of a strong justification.

6.1.3 UTypes

"These labels are used as synonyms for the CharacterisationAxis portion of the relevant UType,..." -- they certainly are no "synonyms", right? Maybe it should say "specializations" or something like that here? ()

MCD: Will change the wording.

MCD-20150206: Done

6.4.2.6 Position

The "unit" item here again appears to insinuate multiple units might end up in one string. It would certainly be useful if this said how these would be separated (or otherwise distributed to the fields in question).

MCD: In this doc, the Position object has a singular unit field which contains the unit string applicable to all Cn attributes of the sub-types (Position1, Position2). Basically, it requires all contained values to be given in the same unit.

MCD-20150206: Done - enhanced the description

6.4.3.1 stdRefPosition

I'll not tire to point out that having this large list of potential reference positions makes it unlikely that any implementation will ever support even a large part of them, which at best may lead to interoperability problems. Can't we say people are supposed to support a small subset of these? There's always TOPOCENTER for Pluto-orbiting observatories. Of course, the topoi in there are to be described somewhere else, but that we'd really need anyway.

I'd propose a similar reasoning for stdSpaceRefFrame.

MCD: I think document should have the comprehensive list.. (well, actually, stc should). The applications would define which are required.. for example the SSAP protocol could specify that a service need only handle (blah. blah. blah) to be IVOA compliant.

MD: Which, of course, will make it virtually impossible that a client can just say "I understand Spectra in VOTable SDM2" or so, which I think makes all kinds of things needlessly difficult.

6.4.3.4 CalibrationStatus

Since we're making a major release anyway: Can't we just drop ABSOLUTE?

MCD: I have no strong opinion on this. Char-1.13 doesn't have it, so would be making it more consistent. Unless there is strong objection, I will remove it.

MCD-20150206: Done - no one objected :-)

C.1.1 Basic Spectrum Instance

As said above a couple of times, I think this would be a good place to say how sequence- or array-like items are to be serialised.

MCD: yes it would.. the serializations are 'minimal' and none of those items are required, but it would be useful.

MCD-20150206: Done - I added several items to the Spectrum VOTable example. "an example of the various datatypes ( string, double, array, enum, uri, url ), as well as complex attribute (Derived.Redshift) and element with multiplicity greater than one (DataID.Collection)."

Also, there's this in the instance:

 <GROUP name="Data"> <FIELDref ref="DataFluxValue"/> <FIELDref ref="DataSpectralValue"/> </GROUP>

With old-style utypes I'd argue that makes no real sense. The utypes on the FIELDs are enough. If you keep it in, you'd at least have to explain how this is intended to be used and whether that's mandatory or not. (*)

MCD: Are you referring to my serializing the Data group as containing the 2 FIELDref specs rather than let the UType on the field elements show that they are part of the Data element? If so, this is true for all of the GROUPs. The first paragraph in C.1 states, "We use the VOTable GROUPS construct to aid readability. It is not a requirement for users to make use of this construct for all elements of the model." One could repeat the utype in the FIELDref, but that wouldn't (I think) aid readability.

MD: No, my concern was that here you could (and probably should) just put the FIELDs themselves in the GROUP. The FIELDrefs are necessary in more advanced schemes where they allow the association of additional metadata -- utypes not on the FIELDs themselves. As that's not necessary here, I don't think we should tell people to do referencing, which is always a liability.

MCD-20150206: The choice of using FIELD or FIELDref is a user choice. I'm not sure why I went with this, other than maybe to show that you can. It is true that, in this case, it would be equivalent to put the FIELD elements in there directly.
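
To make the referencing discussed above concrete, the Python sketch below shows what a client has to do with the GROUP from C.1.1: follow each FIELDref to the FIELD carrying the matching ID. The FIELD declarations (names, datatypes, units) are invented for this example, the fragment omits the VOTable namespace for brevity, and `resolve_group_fields` is a hypothetical helper, not part of any existing library.

```python
import xml.etree.ElementTree as ET

# Minimal fragment: the GROUP from the example plus two assumed FIELDs.
FRAGMENT = """
<TABLE>
  <GROUP name="Data">
    <FIELDref ref="DataFluxValue"/>
    <FIELDref ref="DataSpectralValue"/>
  </GROUP>
  <FIELD ID="DataSpectralValue" name="wavelength" datatype="double" unit="Angstrom"/>
  <FIELD ID="DataFluxValue" name="flux" datatype="double" unit="Jy"/>
</TABLE>
"""


def resolve_group_fields(table: ET.Element, group_name: str) -> list:
    """Return the FIELD elements referenced by the named GROUP, in GROUP order."""
    fields_by_id = {f.get("ID"): f for f in table.iter("FIELD")}
    group = next(g for g in table.iter("GROUP") if g.get("name") == group_name)
    # Each FIELDref points at a FIELD through its 'ref' attribute.
    return [fields_by_id[ref.get("ref")] for ref in group.iter("FIELDref")]


table = ET.fromstring(FRAGMENT)
names = [f.get("name") for f in resolve_group_fields(table, "Data")]
print(names)  # ['flux', 'wavelength']
```

The extra lookup step is the cost of referencing that the comment alludes to: a dangling `ref` would surface here as a `KeyError`.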

C.2 FITS Serialization

I believe as an implementor I would ask how I'd annotate existing spectra that have SPECSYS CMBDIPOL or SOURCE..

MCD: Yeah, I suppose one would. The question applies regardless of VOTable or FITS, these aren't in the STC reference position list, so would be "CUSTOM" frames. Since this model uses a 'simplified' STC model which allows only standard reference position values, I don't think this would be supported. This is the sort of issue that the Cube work should help resolve.

p 100 "Open Issues"

I guess having "open issues" in a REC would merit a brief comment on why they're left open and what the cost of that is. Then, there's no utypes specification to date, and frankly, the expectation utypes could magically solve the problem described in the first bullet point is part of the problem -- they can't, really. So, I'd propose to describe the problem without reference to utypes (where it would seem to me that at least in VOTable serialisation an ad-hoc convention would solve the problems). (*)

MCD: Both of these are serialization items, I'll rename it "Serialization Issues", and rework the first bullet to recommend using GROUP elements in VOTable to provide the structure.

MCD-20150206: Done

Summing up, I'd certainly appreciate collecting somewhat more implementation experience with this; however, after the points marked with (*) are addressed in one way or another, I'd not hold up the process and approve.

-- MarkusDemleitner - 2014-09-16

Semantics Working Group ( Norman Gray, Mireille Louys )

The Semantics WG had interactions in the past about the way UCD were used in the previous version of this specification. Provided the updates of SpectralDM v2.0 do not touch the semantic aspects, the Semantics WG approves this document.

A question still remains, as for other data models: how to define a metric that will help to evaluate how much of a data model has been covered in the reference implementations.

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Mike Fitzpatrick )

Data Curation & Preservation Interest Group ( Alberto Accomazzi, Françoise Genova )

Knowledge Discovery in Databases Interest Group ( George Djorgovski )

Theory Interest Group ( Franck Le Petit, Rick Wagner )

The Spectral Data Model v2.0 deals with Theoretical Spectra.

It would have been useful if the Theory Interest Group had been consulted during the process.

The major concern I see is that the same kind of data (here spectra) can be published and retrieved through different DM / Access Protocols. For example, we may find some services about Theoretical Spectra described through the Spectral Data Model and other ones described through the Simulation Data Model. I think this situation may make it problematic to discover data and to develop tools to discover data in an interoperable way. Somebody who wants to retrieve Theoretical Spectra will have to develop two ways to access them, with both SDM / SSA and SimDM / SimDAL.

Historically, Theoretical Spectra have always been described by the Spectral DM v1, but that was by default, because no other DM covered Theoretical Data. Since 2012, the IVOA has had the Simulation DM, and so, for clarity, we can wonder whether it is time to say that Theoretical Data has to be described using the Simulation Data Model.

Standards and Processes Committee ( Françoise Genova )
