
OAI Coordination

OAI stands for the Open Archives Initiative, an effort from within the digital library community to develop open standards for archive interoperability. However, the abbreviation is often used to refer specifically to the OAI Protocol for Metadata Harvesting (OAI-PMH). This page describes the issues we need to address to ensure uniformity across our OAI interfaces.


1. Things we need to agree on to make our OAI interfaces look the same:

1.1. Metadata format name for VOResource metadata: ivo_vor

This name is used in two places:

  • It is returned as one of the supported formats in the response to the ListMetadataFormats query, referring to the VOResource schema. The format description should look like this:


  • It can be used as a value for the metadataPrefix argument to the GetRecord and ListRecords queries. It also appears in the responses to these in the value of the metadataPrefix attribute to the request element.

ivo_vor was Sebastien's suggestion and we got sufficient agreement. It doesn't matter too much what we use, as long as we use the same name.
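
As a minimal sketch of where the name shows up on the request side, the following Python builds OAI-PMH query URLs that pass ivo_vor as the metadataPrefix argument. The base URL is a hypothetical placeholder, not a real registry endpoint, and the helper name oai_request_url is made up for illustration.

```python
from urllib.parse import urlencode

# The agreed metadata format name from this section.
METADATA_PREFIX = "ivo_vor"

# Hypothetical base URL, for illustration only.
BASE_URL = "http://registry.example.org/oai"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# ListMetadataFormats takes no metadataPrefix; ivo_vor appears in its response.
formats_url = oai_request_url(BASE_URL, "ListMetadataFormats")

# ListRecords requires a metadataPrefix; this is where ivo_vor is sent.
list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix=METADATA_PREFIX)
```

If every registry uses the same prefix, a harvester can keep METADATA_PREFIX as a single constant instead of a per-registry setting.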

1.2. The "root" element for the VOResource metadata should be <VOResource>.

<OAI-PMH xmlns="" 
  <request verb="ListRecords" from="2004-01-01"
         <VOResource xmlns="">

This is better than using VODescription because VOResource guarantees (via the schema) that only one resource is being described, making this requirement verifiable by any XML validator (including the OAI Explorer).

It's also better than using Resource or one of its sub-classes (e.g. Organisation, Service, etc.), as allowing several possible elements at this level complicates the handling of the metadata on the harvester's end.

This may not be a big deal in the short term, but in the long-term it will make it easier for a harvester to decide if it can handle the record. In general, any application must answer the following questions:

  • is the XML instance valid (for the schemas I know/care about)?
  • is the root element what I need/expect it to be?

The second question is easier to answer if there is only one possible root element to check for.
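
A harvester-side version of the second check might look like the sketch below. It is illustrative only: it compares element local names and ignores namespaces, and the sample response uses a placeholder namespace (urn:example:voresource) and a made-up identifier rather than real values. It does not perform schema validation, which is the first question.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def metadata_roots(oai_response_xml):
    """Return the local name of the first child of each <metadata> element."""
    doc = ET.fromstring(oai_response_xml)
    names = []
    for elem in doc.iter():
        if local_name(elem.tag) == "metadata":
            children = list(elem)
            names.append(local_name(children[0].tag) if children else None)
    return names


def all_voresource(oai_response_xml):
    """True if every record's metadata root is <VOResource>."""
    return all(name == "VOResource" for name in metadata_roots(oai_response_xml))


# Made-up sample response; the VOResource namespace here is a placeholder.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>ivo://example/res1</identifier></header>
      <metadata>
        <VOResource xmlns="urn:example:voresource"/>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

roots = metadata_roots(SAMPLE)
ok = all_voresource(SAMPLE)
```

With a single agreed root element, the test reduces to one string comparison per record instead of a lookup against a growing list of allowed element names.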

1.3. The form of the OAI identifier, i.e. the value of <oai:identifier>. Deferred until after Jan!

I would like to see us use our IVOA identifiers (in their URI forms) here. Otherwise, we will find ourselves having to keep track of two identifiers.

This might seem like a no-brainer, but several of us (including us at NCSA!) are using the OAI interface script from Virginia Tech, which creates its own OAI identifiers based on the local XML file name.

Ramon has placed a modified version of this script at that is meant to serve as a drop-in replacement for (Replace your old in the Perl library directory used by your script.) This version overrides the default OAI identifiers with the IVOA ones found in the corresponding VOResource files. It also has the added benefit of supporting deleted records.

Several people have suggested we defer unifying on this item until after January, since it is not absolutely critical. This only becomes a practical issue when a harvester uses the GetRecord function. As far as I know, none of our harvesting registries do.

1.4. Each OAI interface must export one <Authority> record for each AuthorityID it controls.

(See Authority Resource Definition.) By "control", we mean that the registry creates resource descriptions under that AuthorityID. These records are used to trace where a resource record originates.

1.5. Each OAI interface must export one <Registry> record describing itself.

(See Registry Resource Definition.) This record should list all AuthorityIDs the registry manages (i.e. those it has <Authority> records for) using the <ManagedAuthority> element. These records are used to trace where a resource record originates.
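
The relationship between items 1.4 and 1.5 can be checked mechanically. The sketch below is one way to do that, under stated assumptions: it assumes each <Authority> record carries its ID in a descendant element named <AuthorityID>, which this page does not specify, and the sample records and IDs are invented. Namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def check_authority_coverage(records_xml):
    """Compare <ManagedAuthority> entries against exported <Authority> records.

    Returns (managed_without_record, record_without_managed) as two sets.
    """
    root = ET.fromstring(records_xml)
    managed = set()
    described = set()
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "ManagedAuthority":
            managed.add((elem.text or "").strip())
        elif name == "Authority":
            # Assumed layout: the ID sits in a descendant named <AuthorityID>.
            for child in elem.iter():
                if local_name(child.tag) == "AuthorityID":
                    described.add((child.text or "").strip())
    return managed - described, described - managed


# Invented sample: two managed IDs, but only one <Authority> record exported.
SAMPLE = """<records>
  <VOResource><Registry>
    <ManagedAuthority>example.org</ManagedAuthority>
    <ManagedAuthority>archive.example.org</ManagedAuthority>
  </Registry></VOResource>
  <VOResource><Authority>
    <Identifier><AuthorityID>example.org</AuthorityID></Identifier>
  </Authority></VOResource>
</records>"""

missing_records, unlisted = check_authority_coverage(SAMPLE)
```

Both returned sets should be empty for an interface that satisfies 1.4 and 1.5 together. In the sample, archive.example.org is managed but has no <Authority> record.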

1.6. Standard Services (ConeSearch, SIA) should include the appropriate Capability sub-element.

By standard service, we mean a service that has a schema extension associated with it. Current examples are ConeSearch and SIA. Each extension schema defines an element (<ConeSearch> and <SimpleImageAccess>, respectively) that inherits from <Capability> and contains service-type-specific metadata.

All records that describe one of these standard services should include either the <ConeSearch> or <SimpleImageAccess> tag, even if that tag contains no content. Currently, the Data Inventory Service looks for this tag to determine what type of service it is.
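
To illustrate why an empty tag is still useful, here is a rough sketch of tag-based service detection. It is not the Data Inventory Service's actual code; the function name, the mapping, and the sample records are hypothetical, and the position of the tag inside the record is not assumed (any depth matches).

```python
import xml.etree.ElementTree as ET

# Capability element name -> short service label (labels are illustrative).
STANDARD_SERVICE_TAGS = {
    "ConeSearch": "ConeSearch",
    "SimpleImageAccess": "SIA",
}


def service_type(record_xml):
    """Return the standard service label for a record, or None if untagged."""
    root = ET.fromstring(record_xml)
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]  # ignore namespaces
        if name in STANDARD_SERVICE_TAGS:
            return STANDARD_SERVICE_TAGS[name]
    return None


# Invented records: an empty capability tag is enough to classify the service.
sia_record = "<VOResource><Service><SimpleImageAccess/></Service></VOResource>"
plain_record = "<VOResource><Service/></VOResource>"

sia_label = service_type(sia_record)
plain_label = service_type(plain_record)
```

A record without either tag returns None here, which is how an untagged ConeSearch or SIA service would go unrecognised by a client that relies on this convention.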

1.7. Set ContentLevel to "Research" if you want the service chosen by the DIS

By convention, the Data Inventory Service (DIS) requests from the registry only those services with the <ContentLevel> element set to "Research". Some data providers have used this convention to keep the DIS from using certain services (because, say, they are not fully operational) by not setting ContentLevel to "Research".
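
The convention amounts to a simple filter on the record. The sketch below shows one possible form of it, assuming the value is the text content of a <ContentLevel> element somewhere in the record; the function name and the sample records are invented, and this is not the DIS's actual query.

```python
import xml.etree.ElementTree as ET


def is_research_level(record_xml):
    """True if any <ContentLevel> element in the record has the text 'Research'."""
    root = ET.fromstring(record_xml)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "ContentLevel":  # ignore namespaces
            if (elem.text or "").strip() == "Research":
                return True
    return False


# Invented records for illustration.
research = "<VOResource><Service><ContentLevel>Research</ContentLevel></Service></VOResource>"
unset = "<VOResource><Service/></VOResource>"

selected = [r for r in (research, unset) if is_research_level(r)]
```

Leaving the element out, or giving it any other value, drops the service from the selection, which is the opt-out behaviour described above.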

-- RayPlante - 18 Dec 2003

Topic revision: r3 - 2003-12-18 - RayPlante