Resource Metadata RFC

This document will act as RFC centre for the Resource Metadata v1.1 Proposed Recommendation.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your WikiName so authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Discussion about any of the comments or responses should be conducted on the Resource Registry mailing list, registry@ivoa.net.

Community Comments

  • First sample comment (by MarcoLeoni): ...
    • Response (by authorname): ...

  • Second sample comment (): ...

Comments from the Technical Coordination Group

TCG members should add their comments under their name. The deadline for comments is 27 Oct 2006.

Note: RM v1.01 is a REC already; see sect. 7 for summary of changes since then.

Mark Allen (Applications IG)

I approve

Francoise Genova (Data Curation & Preservation IG)

please edit

Bob Hanisch (Standards & Documentation WG)

The Standards and Documentation WG is in hibernation. However, as the editor of this document I will take this spot to respond to the comments.

Regarding Jonathan's suggestions about Creator and Creator.Logo, the label Creator is taken from Dublin Core and, I think, should not be changed. The label Creator.Logo was chosen here, and that pattern used elsewhere (e.g., with Coverage) to show a logical grouping. Creator.Logo is a subgroup of one, so perhaps this is not so necessary. Is there really a compelling reason to change this, though?

Regarding ReferenceBibcode, the element Source (defined right before ReferenceURL) is intended to be where the bibcode is given. Again, this is for maximum consistency with Dublin Core.

Reagan asked about specifying the version number. It is there -- the element is called Version, and in the SDSS example the version is SDSS EDR. Regarding whether a resource is on-line or off-line, I think this is a registry internal function. That is, it does not seem to me like the kind of information one would want to harvest, since it could be a highly time-variable status. If a resource is totally off-line, i.e., not electronically accessible, its value as a VO resource is rather limited.

As for the definition of Format, there is a list given in the definition, and it is intended to describe the format of information returned by a service. I am happy to expand the definition if it would be helpful. What is unclear? I don't understand the question "Is a separate identifier needed?" For what?

On Reagan's suggestion that Instrument be used to describe the computational resource behind a simulation, that is in fact exactly what the document already suggests. "Comments: Can be a specific instrument name (Wide Field/Planetary Camera 2) or generic instrument type (CCD camera). Theoretical data is produced by a computer code, and the name of the code could be specified."

Roy suggests adding a phone number to the Contact metadata... this was in an earlier version of RM and was removed. I have restored the address and telephone elements.

Regarding Francois's comments:

  • Section 3.3 Type definition, I'm not sure how to understand extensive: is it required to include one of the types enumerated ? or would any list of types be valid ? Also it seems to me that EPOResource is not really necessary, as it looks to mean the same thing as the list "Education, Outreach"

The word extensive does not appear in this section; perhaps Francois meant extensible? The idea is that new Types can be added by amending the RM document, or even more simply, by extending the registry schema.

  • Sections 3.4 Coverage.Depth and 4 Uncertainty.Photometric required to be expressed in Jy looks quite unusual at least in the optical domain -- adding some key numbers could be useful. For photometric data absolute uncertainty values are generally less significant than relative uncertainties.

The choice of Jy is arbitrary and, in any case, should be hidden in the registry with user interfaces presenting more suitable units for other parts of the EM spectrum.
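As an illustration of how a user interface could present Jy values in units more familiar to optical astronomers, here is a minimal sketch that converts a flux density in Jy to an AB magnitude and back. It relies only on the standard AB zero point of 3631 Jy; the function names are hypothetical and are not part of RM or of any registry interface.

```python
import math

# Flux density of a zero-magnitude source in the AB system, in Jy.
AB_ZERO_POINT_JY = 3631.0


def jy_to_ab_mag(flux_jy: float) -> float:
    """Convert a flux density in Jy to an AB magnitude (hypothetical helper)."""
    if flux_jy <= 0:
        raise ValueError("flux density must be positive")
    return -2.5 * math.log10(flux_jy / AB_ZERO_POINT_JY)


def ab_mag_to_jy(mag: float) -> float:
    """Convert an AB magnitude back to a flux density in Jy (hypothetical helper)."""
    return AB_ZERO_POINT_JY * 10 ** (-0.4 * mag)


# Example: a Coverage.Depth value stored in Jy, displayed as a magnitude.
depth_jy = 3.631e-6
print(f"{depth_jy:.3e} Jy is about AB magnitude {jy_to_ab_mag(depth_jy):.2f}")
```

With a conversion like this, the registry can keep a single stored unit while a client shows magnitudes for optical resources.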

  • Section 4, there are also many commonalities with the Characterisation model which could be mentioned.

Well, RM predated Characterisation by several years. We could include this if someone would like to do the match-up. I tend to see RM as kind of stand-alone, focused on collections, and if our standards like STC and Characterisation draw upon and extend RM, that is great.

  • Last question, there is nothing in the Resource Metadata about authentication (whether all or some of the data are publicly available). Is it totally irrelevant ?

The element Rights at the end of Section 3.4 already covers this.

Regarding Doug's comments, I added the blank line (and fixed some other places where blank lines were missing or extra blank lines were stuck between elements). The reference to VOTable as an encoding format for resource metadata seems ok to me, in that XML is used in many registry implementations (even if not the particular flavor of XML used in VOTable). As for CreatorID, I have not added this but have no major objection, except that this then requires Creators to be registered as well as Publishers. Typically the Creator is just a name or names, and there is really no compelling need to register an ID. And unfortunately, I think we need to stick with Source for the bibliographic reference since, as Doug notes, this is for compatibility with Dublin Core.

Guy suggests that the definition of uncertainties should be clarified. However, uncertainties are just one set of metadata elements here that are only quite generally defined, and where it is up to the data provider to decide what is best to present in the registry. For an archive of pointed observations, obviously there is no way to characterize something like photometric uncertainty in any simple way. The same is true for things like Resolution.Spatial, .Spectral, and .Temporal. Thus, I think we should keep things the way they are and follow the guidelines in the intro to Section 4: "The following metadata elements are intended to capture the most basic measures of data quality, and may well require extensions as VO usage practices evolve and become more sophisticated."

The issue with Service.InterfaceURL came about because the registry schema in which these metadata concepts are actually implemented uses a more generic service URL concept. In consultation with Ray Plante, I have added a new element, Service.AccessURL, whose definition is consistent with the registry standard schema. Also, to avoid confusion, Service.InterfaceURL has been renamed to Service.DefinitionURL. These changes bring the RM document into accord with the actual implementations and should have no other effect.

Gerard Lemson (Theory IG)

I approve.

Jonathan McDowell (Data Models WG)

In section 3.2, I don't like the idea that there is Creator and Creator.Logo. I'd like these fields to be the utypes of the data model for RM, and that would be cleaner if you either change Creator to Creator.Name or change Creator.Logo to CreatorLogo (no dot).

You have ReferenceURL which must be a URL, but I think there also ought to be a ReferenceBibcode and I don't see one.

I realize these technical comments are late, and I will approve the rec even if you choose not to adopt them.

Reagan Moore (Data Curation & Preservation IG)

I did not see a place in the SDSS example for listing the current release number (DR5). Is there a way to specify the version of the sky survey easily? A second question is how to simply define whether a resource is online or offline? The format field example was not clear. Is a separate identifier needed? Finally there are a few minor questions: For the identifier "Type", define the term EPO. For the identifier "facility", theoretical data is more typically associated with a project than with the compute resource where the simulation was generated. Perhaps the "instrument" for theoretical data is the name of the compute resource.

I approve the recommendation even if these minor changes are not incorporated.

Francois Ochsenbein (VOTable WG)

First the document is quite useful, clear, and improved compared to the previous versions. Minor comments follow, but I approve the recommendation whether these comments are taken into account or not.

  • Section 3.3 Type definition, I'm not sure how to understand extensive: is it required to include one of the types enumerated ? or would any list of types be valid ? Also it seems to me that EPOResource is not really necessary, as it looks to mean the same thing as the list "Education, Outreach"
  • Sections 3.4 Coverage.Depth and 4 Uncertainty.Photometric required to be expressed in Jy looks quite unusual at least in the optical domain -- adding some key numbers could be useful. For photometric data absolute uncertainty values are generally less significant than relative uncertainties.
  • Section 4, there are also many commonalities with the Characterisation model which could be mentioned.
  • Last question, there is nothing in the Resource Metadata about authentication (whether all or some of the data are publicly available). Is it totally irrelevant ?

And sorry for being late...

Pedro Osuna (VOQL WG)

The doc is not in sync with the Registry schema. For instance, the Contact telephone number appears in the Registry schema but not in the Resource Metadata, even though the Registry schema is at a lower version (1.0) than the Resource Metadata (1.1). This is not a problem, just a small asynchronicity.

I approve the recommendation with the above minimal caveat to be considered.

Ray Plante (Resource Registry WG)

Submitting Chair: no comments.

Andrea Preite-Martinez (Semantics WG)

I approve.

Guy Rixon (Grid & Web Services WG)

I approve if the following changes are made.

  • Fix some typos in HTML version. In the coverage section, pound-sterling symbols have replaced algebraic operators in some definitions.
  • In section 4, clarify the Uncertainty.x entries. Do they refer to the typical uncertainties in the resource or to the worst case (e.g. if there are 10% of faint sources in a catalogue with bad photometry and 90% with good, do the metadata highlight the bad ones or the good ones?).
  • Change the definition of Service.InterfaceURL in respect of web (SOAP) services. This metadatum maps to AccessURL in the registry and the agreement with Registry-WG was that AccessURL should state the service endpoint, not the URL for the WSDL document. The base definition of the RM should be made consistent with registry practice.

Doug Tody (Data Access Layer WG)

I have been using the resource metadata specification for some time and have found it a very useful starting point for specifying things like service and dataset metadata, however it does not (and probably should not) go into enough detail to describe actual datasets or the capabilities of an individual service type. Rather, it approximates these at the level of general resource metadata in the registry. It appears that RM defines and standardizes some important concepts and nomenclature regarding VO resources, but is not an actual interface or specification, hence more specific standards based on the RM will apply where defined. With that caveat, and given that RM is already a standard, I think significant changes are probably not warranted and I approve RM with whatever minor revisions may be made.

Minor comments follow, in the order things appear in the document.

Minor comment: in section two it mentions "construct a resource descriptor in a standard format (such as VOTable)". I don't think VOTable is used for or appropriate for resource descriptions, so this could be rephrased.

Very minor comment: in section 3.1, put a blank line before "Identifier (URI)" to improve readability.

A general comment on Identity and Curation metadata comes up in comparison with the related DataID and Curation as used in DAL for generic dataset metadata (see SSA spec). We had to expand upon these to describe individual datasets and do things like distinguish between dataset creation by a data provider, and curation and publication by a publisher, or describe how virtual data relates back to an archival dataset. This required further elaboration of the concepts of fundamental identity and curation/publication. However, at the level of detail of RM I think these two approaches are reasonably consistent and don't see a problem.

Minor comment: if we have Publisher and PublisherID, perhaps there should be a CreatorID to go along with Creator.

I also think that "Source" is a poor name for a bibliographic reference in the context of astronomical resources or data. "Reference" might be a better name for this, although I see from Bob's comments that the name was chosen for compatibility with Dublin Core.

The service metadata is probably getting into a bit of trouble by trying to say too much, as real service metadata will surely differ, but what we have here is a good starting point so long as we don't have to take it literally!

Nic Walton (GGF Astro-RG)

I approve

Roy Williams (VOEvent WG)

In section 3.2, for Contact information, I think there should be a place for telephone number. Also, I think multiple telephone numbers and emails should be allowed. This is because VOEvent schema clones the Contact schema from RM, and in that case it would be crucial to be able to contact the author of an event.

Once this change is made, I would add my vote for Recommendation.


Concluding comments from Registry Chair:

I agree with all of Bob's responses given above with the following caveat regarding Contact information: I would go ahead and add Contact.address and Contact.email for the benefit of VOEvent. (These have already been added to VOResource.) RM need not place restrictions on the number of emails or phone numbers given, apart from perhaps requiring that the encoding format clearly express how the more specific information should be grouped. VOResource, for example, does allow multiple Contacts, each with a name, email, phone, etc.
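To make the grouping requirement concrete, here is a minimal sketch of one way an encoding could keep each contact's details together while still allowing several emails and phone numbers. The class and field names are hypothetical illustrations only; they are not taken from the RM document or from the VOResource schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """One contact, with its own address, emails, and telephone numbers grouped together."""
    name: str
    address: str = ""
    emails: List[str] = field(default_factory=list)
    telephones: List[str] = field(default_factory=list)


@dataclass
class Curation:
    """Hypothetical container holding any number of contacts for a resource."""
    publisher: str
    contacts: List[Contact] = field(default_factory=list)


def emails_for(curation: Curation, contact_name: str) -> List[str]:
    """Return the emails that belong to the named contact, or an empty list."""
    for contact in curation.contacts:
        if contact.name == contact_name:
            return list(contact.emails)
    return []


# Placeholder values for illustration only.
curation = Curation(
    publisher="Example Archive",
    contacts=[
        Contact(name="Help Desk", emails=["help@example.org"], telephones=["+1-555-0100"]),
        Contact(name="Event Author", emails=["author@example.org", "backup@example.org"]),
    ],
)

print(emails_for(curation, "Event Author"))
```

Because each email and telephone number sits inside its own Contact, a client such as a VOEvent consumer can tell which details belong to which person.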

The only unresolved/unresponded recommendations listed above are from Guy:

  • clarification of uncertainties
  • AccessURL vs. InterfaceURL: I recommend we discuss this at our telecon.

-- RayPlante - 05 Dec 2006

Topic revision: r25 - 2007-02-14 - NicholasWalton
 