Copyright © 2003 IVOA®
This is a Working Draft. The first release of this document was 30 June 2003 and the Registry Working Group is making its best effort to address comments received since then, releasing several drafts and resolving a list of issues meanwhile. The working group seeks confirmation that comments have been addressed to the satisfaction of the community.
Comments on this document are due 1 October 2003 for consideration in the next version of this document. They should be sent to public-rql-comments@ivoa.net, a mailing list with a public archive. General discussion of related technology is welcome on the Rwp03 wiki site or on the mailing list: registry@ivoa.net.
It is expected that once it is accepted as an IVOA Recommendation, this document, or an appropriately adapted version, will be submitted to the Internet Engineering Task Force (IETF) to register the "ivo" URI scheme.
This is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than "work in progress." A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/docs/.
This document builds on the concept of a Uniform Resource Identifier as described in IETF RFC 2396 by Berners-Lee, Fielding, & Masinter [RFC 2396] and the subsequent IETF Internet Draft revisions [Berners-Lee et al. 2003].
The authors would like to acknowledge the support of NSF ..., and
the UK ....
Conformance-related definitions
The words "MUST", "SHALL", "SHOULD", "MAY", "RECOMMENDED", and
"OPTIONAL" (in upper or lower case) used in this document are to be
interpreted as described in IETF standard, RFC 2119
[RFC 2119].
The Virtual Observatory (VO) is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The International
Virtual Observatory Alliance (IVOA) is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
Syntax Notation
This document uses the Augmented Backus-Naur Form (ABNF) notation
(RFC 2234) to
formally define syntax for identifier components. It references the
following core ABNF syntax productions (defined in section 6.1 of RFC 2234):
ALPHA, DIGIT.
Many data providers in the VO have been creating and using IDs for a long time. Their choices of identifiers were presumably made to best fit the needs of the data. If an IVOA ID framework is to minimize the cost of adoption, then it needs to maximize the control providers have to reuse the IDs they already have in place as well as to create new IDs that are consistent with their overall organization. In addition, providers will need full flexibility over what an ID refers to. It could be a dataset that can be transmitted over the network, or a scientific instrument which cannot, or an abstract concept or organization.
Identifiers are very important to registries which aid users in discovering data and services. In general, a registry stores descriptions of data and services in a searchable form, and it distinguishes them by a unique ID. Furthermore, when users encounter an ID, they should be able to go to a registry and find out something about the thing it refers to.
It is important to distinguish between two forms of referencing. The first is a reference to a specific instance of something. Typically, we think of this thing as having a specific location; however, in the framework proposed here, we think of it as being held and managed by a specific organization even though its physical location may be undefined. The framework recognizes that entities like datasets do not always remain under the control of a single organization forever; this necessitates a second form of referencing that is location-independent--or more precisely, organization-independent. When several copies of a dataset exist at several locations around the VO, one can refer to all of them collectively, deferring the choice of a particular instance until it is actually needed. Also, the curation of a dataset may be transferred from one organization to another; an organization-independent reference thus serves as a persistent pointer to data that can be resolved to a new location when it moves. This is very important to journal publishers that wish to refer to data in publications (whose useful life might be measured in decades) without worrying that the references will become obsolete.
This proposal defines identifiers of the first type described above--that is, organization-dependent identifiers. Persistent, organization- and location-independent identifiers are not currently defined as part of this proposal, because it is not clear under what conditions it can be claimed that a resource is unchanged when its management is changed or duplicated from one organization to another. It is expected that a standard for organization-independent identifiers will be based on this proposal and the standard registry framework.
Nevertheless, these two forms of referencing are precisely what are addressed by the IETF standards for URIs [RFC 2396] and URNs [RFC 2141]. Thus, the framework proposed in this document builds directly on these standards. They provide an explicit mechanism for communities to add additional syntactic restrictions onto URIs and URNs and to define rules for interpreting them, all while still remaining compatible with the generic standards. This proposal uses this mechanism to enable the retrieval of descriptions of referenced items given their specific identifier and the resolving of organization-independent identifiers to specific ones.
In the Internet world, a resource is essentially anything that can be referenced by a URI. This document refines this definition by adding that it is describable by the generic resource metadata defined in the IVOA Working Draft on Resource Metadata (Hanisch et al. 2003, hereafter referred to as RM).
We also refer to organizations and providers in the sense that they are defined in the RM:
An organization is a specific type of resource that brings people together to pursue participation in VO applications. Organizations can be hierarchical and range greatly in size and scope. At a high level, it could be a university, observatory, or government agency. At a finer level, it could be a specific scientific project, space mission, or individual researcher. A provider is an organization that makes data and/or services available to users over the network.
Other types of resources, including data collection and service, are also defined in the RM, and those definitions are assumed by this document.
As discussed above, identifiers are critical to registries. A registry is a resource that stores metadata about other resources, including organizations, data collections, and services, and makes that information accessible through a set of services. Typically, one gets at this information either by doing a search against the metadata or through a look-up operation given an identifier. Registration is an operation that a provider carries out to tell a registry that a resource exists and can be referred to by a particular identifier; this is typically done through a registry service. An IVOA-compliant registry is a registry that implements the minimum IVOA standard registry services (currently under development), including description look-up by ID.
<ResourceID>
<AuthorityID>adil.ncsa.uiuc.edu</AuthorityID>
<ResourceKey>surveys/96.JC.01</ResourceKey>
</ResourceID>
The same identifier can be expressed as a URI:
ivo://adil.ncsa.uiuc.edu/surveys/96.JC.01
In both forms, the identifier has two components: an authority ID and
a resource key. The former establishes a namespace within which the
rest of the ID, the resource key, can be considered unique. Identifiers are considered case-insensitive; however, the preferred rendering of character case in the ID is determined when its resource is registered.
An authority identifier is a compact string of ASCII text that defines a globally unique namespace controlled by a single naming authority. The authority ID string is a compliant URI authority component (RFC 2396, section 3.2) with the following restrictions:
In ABNF notation, the syntax for an authority ID is as follows:
    authorityid = alphanum 2*unreserved
    alphanum    = ALPHA / DIGIT
    reserved    = "/" / "?" / "#" / "[" / "]" / ";" / ":" / "@" / "&" / "=" / "+" / "$" / "," / "<" / ">"
    unreserved  = alphanum / mark / discouraged
    mark        = "-" / "_" / "."
    discouraged = "!" / "~" / "*" / "'" / "(" / ")"
- Recommendation:
- It is intended (but not required) that the authority ID look like a DNS hostname. Use of an authority ID of this form does not imply that such a hostname actually exists and is DNS-registered. To make this form more recognizable, the following characters are discouraged from being a part of the authority ID: "!", "~", "*", "'", "(", and ")". It is also recommended that authority IDs avoid multiple, sequential occurrences of periods, ".".
A naming authority is allowed to control multiple authority IDs to organize related resources into different namespaces. For example, an organization may choose to control two authority IDs, one for research-related resources and one for education/outreach resources, even though they are all maintained by the same organization and perhaps made available through the same machine.
VO applications should be case-insensitive when handling authority IDs (see "Comparing Identifiers" below). In practice, applications are encouraged to present identifiers using all lower-case characters.
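As an informal illustration only (it is not part of this specification), the authority ID syntax above could be checked with a regular expression along the following lines. The function names is_valid_authority_id and authority_id_warnings are hypothetical, and the sketch assumes the ABNF productions ALPHA and DIGIT cover exactly the ASCII letters and digits.

```python
import re

# Characters permitted by the "unreserved" production:
# alphanum / mark / discouraged (ASCII only, by assumption).
_UNRESERVED = r"A-Za-z0-9\-_.!~*'()"

# authorityid = alphanum 2*unreserved
# i.e. one alphanumeric character followed by at least two unreserved ones.
_AUTHORITY_RE = re.compile(rf"[A-Za-z0-9][{_UNRESERVED}]{{2,}}")

# Characters that are legal but discouraged in an authority ID.
_DISCOURAGED = set("!~*'()")


def is_valid_authority_id(text):
    """Return True if text satisfies the authorityid production."""
    return _AUTHORITY_RE.fullmatch(text) is not None


def authority_id_warnings(text):
    """List the recommendations (not requirements) that text ignores."""
    warnings = []
    if any(ch in _DISCOURAGED for ch in text):
        warnings.append("contains discouraged characters")
    if ".." in text:
        warnings.append("contains consecutive periods")
    return warnings
```

Keeping the hard syntax check separate from the softer recommendations mirrors the distinction the text draws between what is required and what is merely discouraged.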
3.1.2. Resource Key
A resource key is a localized
name for a resource that is unique within the namespace of an
authority ID. The naming authority creates keys for its namespaces
and has complete control over their form beyond the syntax constraints
specified here.
A resource key must conform to the syntax of a URI path component (RFC 2396, section 3.3); that is, it is a slash ("/") delimited ASCII string. In addition, it must not contain any of the other reserved characters, ";", ":", "@", "?", "#", ",", "[", "]", "<", ">".
In ABNF notation, the syntax for a resource key is as follows:
    resourcekey = segment *( "/" segment)
    segment     = *unreserved
Naming authorities are discouraged from creating segments matching either "." or "..". Empty segments, resulting in two or more consecutive slashes or a trailing slash, are also discouraged. As described in section 3.4, "Comparing Identifiers", such segments do not have the special meaning they have in traditional filesystem pathnames; that is, a resource key cannot be transformed to remove any "." or ".." segments and still reference the same resource. Such segments are interpreted as a literal component of the path.
The naming authority is free to create a resource key that suggests something about the resource it refers to. Any meaning that is suggested by the resource key is intended only for human consumption. The character content of a resource key is not semantically machine-interpretable. Any information about the resource, apart from whether it is the exact same resource (i.e. instance or copy) as the resource referred to by another identifier (see "Comparing Identifiers"), can only be determined by examining the resource metadata.
- Note:
- The reserved characters may be employed in the future to differentiate additional components of an identifier beyond the authority ID and resource key.
The presence of a resource key is optional. An identifier that contains only an authority ID refers to the organization that acts as the naming authority for that namespace.
VO applications should be case-insensitive when handling a resource key (see "Comparing Identifiers" below). In practice, the preferred use of case is set by the rendering of the key by the naming authority when the resource is registered. This may contain one or more capital letters to improve readability.
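The resource key rules can be sketched in the same informal way. The example below is an illustration under stated assumptions rather than a normative implementation: the function names are hypothetical, and "unreserved" is taken to mean the ASCII characters listed in the authority ID syntax.

```python
import re

# segment = *unreserved  (a segment may be empty, although that is discouraged)
_SEGMENT_RE = re.compile(r"[A-Za-z0-9\-_.!~*'()]*")


def is_valid_resource_key(key):
    """Return True if key matches: segment *( "/" segment )."""
    return all(_SEGMENT_RE.fullmatch(seg) is not None for seg in key.split("/"))


def discouraged_segments(key):
    """Return the segments that are legal but discouraged: "", "." and ".."."""
    return [seg for seg in key.split("/") if seg in ("", ".", "..")]
```

Note that the sketch only reports "." and ".." segments; it never removes or resolves them, since such segments are literal components of the key.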
3.2 Identifier Formats
The identifier components can be combined into
two equivalent formats: an XML-tagged form and a URI-compliant form.
3.2.1. XML Format
An IVOA identifier rendered in XML can be described as an XML complex
type as defined by the XML Schema specification
(XMLSchema-1). This type contains two child
elements. The first element, which must
be present and is non-repeatable, is named
AuthorityID, and its content is a string that
conforms to the syntax of an authority ID. The
second element, which is optional but non-repeatable, is named
ResourceKey, and its content is a string that
conforms to the syntax of a resource key.
Appendix A lists an XML Schema definition that
includes the XML element, Identifier, that is
defined to be an IVOA identifier for a registered resource. Other
schemas may import and use this element directly wherever this
general metadata concept is useful. If a more precise meaning is
needed, e.g. "PublisherID" meant to refer to a resource's publisher,
then a schema can create a new element whose type is
"IVOAidentifier".
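For illustration, the XML form could be produced and read back with Python's standard xml.etree.ElementTree module as sketched below. This is a non-normative sketch: the helper names build_identifier and read_identifier are hypothetical, and the namespace URI is the one used by the schema in Appendix A.

```python
import xml.etree.ElementTree as ET

# Target namespace of the schema listed in Appendix A.
VID_NS = "http://www.ivoa.net/xml/prop/VOIdentifier"


def build_identifier(authority_id, resource_key=None):
    """Build an Identifier element; the ResourceKey child is optional."""
    root = ET.Element(f"{{{VID_NS}}}Identifier")
    ET.SubElement(root, f"{{{VID_NS}}}AuthorityID").text = authority_id
    if resource_key is not None:
        ET.SubElement(root, f"{{{VID_NS}}}ResourceKey").text = resource_key
    return root


def read_identifier(element):
    """Return (authority_id, resource_key); resource_key is None if absent."""
    authority_id = element.findtext(f"{{{VID_NS}}}AuthorityID")
    resource_key = element.findtext(f"{{{VID_NS}}}ResourceKey")
    return authority_id, resource_key


if __name__ == "__main__":
    elem = build_identifier("adil.ncsa.uiuc.edu", "surveys/96.JC.01")
    print(ET.tostring(elem, encoding="unicode"))
```

The child elements are always written in the order AuthorityID, then ResourceKey, matching the sequence required by the complex type.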
3.2.2. URI Format
There will be occasions when an application needs to encode an
identifier in a format where the XML encoding defined in the
previous section cannot be readily
used but a simple string can. This would include a non-XML format,
such as in a FITS keyword. A number of XML-based metadata handling
systems also treat identifiers as strings (e.g. Resource Description
Language, Dublin Core, Open Archives Initiative). In some cases,
identifiers are encoded as XML attributes. To enable greater
compatibility with other metadata technologies, this document also
defines a URI format for an identifier.
This specification defines a new URI scheme called "ivo." A URI that uses this scheme signals that:
To simplify comparisons of identifiers in URI format (see "Comparing Identifiers"), case-insensitive variations on "ivo" as the scheme shall be considered equivalent; however, use of variations other than the all lower-case form is strongly discouraged.
In ABNF format, the URI form is defined as:
    ivo-scheme = ("i" / "I") ("v" / "V") ("o" / "O")
    uri-form   = ivo-scheme "://" authorityid [ "/" resourcekey ]
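As a non-normative sketch, a URI-form identifier could be split into its two components, and assembled from them, as follows. The function names parse_ivo_uri and format_ivo_uri are hypothetical, the resource key is treated as optional, and the character classes assume ASCII letters and digits.

```python
import re

_UNRESERVED = r"A-Za-z0-9\-_.!~*'()"

# The scheme in any letter case, "://", an authority ID, and an optional
# "/" followed by a resource key made of "/"-separated segments.
_URI_RE = re.compile(
    rf"[iI][vV][oO]://"
    rf"(?P<authority>[A-Za-z0-9][{_UNRESERVED}]{{2,}})"
    rf"(?:/(?P<key>[{_UNRESERVED}]*(?:/[{_UNRESERVED}]*)*))?"
)


def parse_ivo_uri(uri):
    """Split a URI-form identifier into (authority_id, resource_key).

    resource_key is None when the URI carries only an authority ID.
    """
    match = _URI_RE.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a valid ivo URI: {uri!r}")
    return match.group("authority"), match.group("key")


def format_ivo_uri(authority_id, resource_key=None):
    """Render the components in URI form with the lower-case scheme."""
    if resource_key is None:
        return f"ivo://{authority_id}"
    return f"ivo://{authority_id}/{resource_key}"
```

The parser accepts any letter case for the scheme, but the formatter always emits the all lower-case form, which is the rendering the text encourages.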
It is possible that an organization may become registered as part of the process of requesting an authority ID. Equivalently, registering an organization may result in an implicit request for a particular authority ID.
Once an organization is recognized as a naming authority, it is free to register any number of resources with identifiers having an authority ID that it controls. No organization may create identifiers with an authority ID it does not control. The naming authority has full control over the creation of a resource key as long as it conforms to the syntax and uniqueness constraints described in this specification.
It is the responsibility of the registry that accepts new resource descriptions to ensure that new descriptions are not associated with identifiers already referring to other resources. The mechanisms used to ensure this are not described here.
3.4 Comparing Identifiers
An important use of identifiers is comparing two instances to
determine if they refer to the same resource. This will most commonly
occur when using an identifier to look up the associated resource
description in a registry.
Two resource identifiers are guaranteed to refer to the same resource if they have identical authority IDs and identical resource keys. A pair of components is considered identical if a case-insensitive, character-by-character comparison indicates they are identical. Apart from a transformation to handle case-insensitive comparisons, no other normalizing transformations shall be necessary to test if two resource IDs refer to the same resource.
In general, the string-based comparison of identifiers described here cannot determine definitively if two identifiers refer to different resources. While it is not intended that a single resource be registered multiple times with different identifiers, it is not disallowed by this specification. In particular, it is possible that two resources with different identifiers may be mirrors of each other; such a relationship can only be determined by examining the metadata contained in the descriptions associated with each identifier.
- Note:
- The case-insensitive string comparison test on two identifiers in URI format is equivalent to parsing the identifiers into their two components and comparing them.
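The comparison rule can be illustrated with the short sketch below. It is not normative: the function names are hypothetical, an identifier is represented as an (authority ID, resource key) pair with None standing for an absent key, and ASCII-only content is assumed so that lower-casing is a sufficient case fold.

```python
def _fold(component):
    """Lower-case a component for comparison; keep None (absent key) as is."""
    return None if component is None else component.lower()


def identifiers_match(id_a, id_b):
    """Compare two (authority_id, resource_key) pairs.

    True guarantees that both refer to the same resource. False does not
    prove that the resources differ; only their metadata can show that.
    """
    return _fold(id_a[0]) == _fold(id_b[0]) and _fold(id_a[1]) == _fold(id_b[1])


def uris_match(uri_a, uri_b):
    """Compare two URI-form identifiers as case-insensitive strings."""
    return uri_a.lower() == uri_b.lower()
```

No other normalization is applied, so a key containing a "." or ".." segment is compared literally and does not match the key with that segment removed.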
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vid="http://www.ivoa.net/xml/prop/VOIdentifier"
           targetNamespace="http://www.ivoa.net/xml/prop/VOIdentifier"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <xs:annotation>
    <xs:documentation>Version 0.1</xs:documentation>
    <xs:documentation>
      This schema defines the XML format for IVOA Identifiers as
      specified in the IVOA Identifiers Working Draft, Version 0.1.
    </xs:documentation>
    <xs:documentation>History:</xs:documentation>
  </xs:annotation>

  <xs:element name="Identifier" type="vid:IVOAidentifier">
    <xs:annotation>
      <xs:documentation>
        a global, IVOA-compliant identifier that refers unambiguously
        to a resource.
      </xs:documentation>
    </xs:annotation>
  </xs:element>

  <xs:complexType name="IVOAidentifier">
    <xs:sequence>
      <xs:element ref="vid:AuthorityID"/>
      <xs:element ref="vid:ResourceKey" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="AuthorityID" type="vid:AuthorityIDType">
    <xs:annotation>
      <xs:documentation>
        the identifier of a namespace under the control of a single
        naming authority
      </xs:documentation>
    </xs:annotation>
  </xs:element>

  <xs:element name="ResourceKey" type="vid:ResourceKeyType">
    <xs:annotation>
      <xs:documentation>
        a name for a resource that is unique within the namespace of
        an authority ID
      </xs:documentation>
    </xs:annotation>
  </xs:element>

  <!-- one alphanumeric character followed by two or more unreserved characters -->
  <xs:simpleType name="AuthorityIDType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\w\d][\w\d\-_\.!~\*'\(\)]{2,}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- slash-delimited segments of unreserved characters -->
  <xs:simpleType name="ResourceKeyType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\w\d\-_\.!~\*'\(\)]+(/[\w\d\-_\.!~\*'\(\)]*)*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- URI form: the resource key part is optional -->
  <xs:simpleType name="IVOAIdentifierURI">
    <xs:restriction base="xs:anyURI">
      <xs:pattern value="ivo://[\w\d][\w\d\-_\.!~\*'\(\)]{2,}(/[\w\d\-_\.!~\*'\(\)]+(/[\w\d\-_\.!~\*'\(\)]*)*)?"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
[RFC 2396] http://www.ietf.org/rfc/rfc2396.txt
[Berners-Lee et al. 2003] http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-03.txt
[RFC 2119] http://www.ietf.org/rfc/rfc2119.txt
[RFC 2234] http://www.ietf.org/rfc/rfc2234.txt
[RM] http://www.ivoa.net/Documents/WD/WD-RM.htm
[RFC 2141] http://www.ietf.org/rfc/rfc2141.txt