Links: IvoaResReg :: registry mail archive :: IVOARegWp03 :: IVOARegWp04
OAI Coordination
OAI stands for the Open Archives Initiative, an effort from within the digital library community to develop open standards for archive interoperability. However, the abbreviation is often used to refer specifically to the OAI Protocol for Metadata Harvesting (OAI-PMH).
This page describes issues we need to address to ensure uniformity across our OAI interfaces.
1. Things we need to agree on to make our OAI interfaces look the same:
1.1. Metadata format name for VOResource metadata: ivo_vor
This name is used in two places:
- It is returned as one of the supported formats in the response to the ListMetadataFormats query, referring to the VOResource schema. The format description should look like this:
<metadataFormat>
  <metadataPrefix>ivo_vor</metadataPrefix>
  <schema>http://www.ivoa.net/xml/VOResource/VOResource-v0.9.xsd</schema>
  <metadataNamespace>http://www.ivoa.net/xml/VOResource/v0.9</metadataNamespace>
</metadataFormat>
- It can be used as the value of the metadataPrefix argument to the GetRecord and ListRecords queries. It also appears in the responses to these queries as the value of the metadataPrefix attribute of the request element.
ivo_vor was Sebastien's suggestion, and we reached sufficient agreement on it. It doesn't matter much which name we use, as long as we all use the same one.
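The two uses of the name can be sketched as request URLs. This is only an illustration: the base URL below is a hypothetical placeholder, and the helper function names are invented here. The verb, metadataPrefix, identifier, and from arguments are the ones defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Hypothetical OAI endpoint (an assumption); substitute your registry's base URL.
BASE_URL = "http://registry.example.org/oai.pl"

# The agreed metadata format name for VOResource records.
METADATA_PREFIX = "ivo_vor"


def list_metadata_formats_url(base_url=BASE_URL):
    """URL whose response should list ivo_vor among the supported formats."""
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def list_records_url(from_date=None, base_url=BASE_URL):
    """URL that harvests VOResource records, optionally changed since from_date."""
    query = {"verb": "ListRecords", "metadataPrefix": METADATA_PREFIX}
    if from_date is not None:
        query["from"] = from_date
    return base_url + "?" + urlencode(query)


def get_record_url(identifier, base_url=BASE_URL):
    """URL that fetches a single record by its OAI identifier."""
    query = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": METADATA_PREFIX,
    }
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    print(list_metadata_formats_url())
    print(list_records_url("2004-01-01"))
```

If every interface answers these requests with the same prefix, a harvester can use one request template for all registries.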
1.2. The "root" element for the VOResource metadata should be <VOResource>.
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2004-01-05T19:20:30Z</responseDate>
  <request verb="ListRecords" from="2004-01-01"
           metadataPrefix="ivo_vor">...</request>
  <ListRecords>
    <record>
      <header>
        <identifier>...</identifier>
        <datestamp>...</datestamp>
      </header>
      <metadata>
        <VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
          ...
        </VOResource>
      </metadata>
    </record>
    ...
  </ListRecords>
</OAI-PMH>
This is better than using VODescription because VOResource guarantees (via the schema) that only one resource is being described, making this requirement verifiable by any XML validator (including the OAI Explorer).
It's also better than using Resource or one of its sub-classes (e.g. Organisation, Service, etc.), as allowing several possible elements at this level complicates the handling of the metadata on the harvester's end.
This may not be a big deal in the short term, but in the long term
it will make it easier for a harvester to decide whether it can handle
the record. In general, any application must answer the
following questions:
- is the XML instance valid (for the schemas I know/care about)?
- is the root element what I need/expect it to be?
The second question is easier to answer if there is only one
possible root element to check for.
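The root-element check can be sketched with a few lines of Python. This is a minimal illustration, not a full harvester: it answers only the second question, since the standard library has no schema validator, and the function names and the sample response are invented for the example.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
VOR_NS = "http://www.ivoa.net/xml/VOResource/v0.9"

# The single root element a harvester needs to look for.
EXPECTED_ROOT = "{%s}VOResource" % VOR_NS

# Hypothetical, heavily trimmed ListRecords response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>hypothetical-id</identifier></header>
      <metadata>
        <VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9"/>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def metadata_roots(oai_response_xml):
    """Return the tag of the first child of each <metadata> element."""
    tags = []
    for metadata in ET.fromstring(oai_response_xml).iter("{%s}metadata" % OAI_NS):
        children = list(metadata)
        tags.append(children[0].tag if children else None)
    return tags


def all_records_supported(oai_response_xml):
    """True if every record's metadata is rooted at <VOResource>."""
    return all(tag == EXPECTED_ROOT for tag in metadata_roots(oai_response_xml))


if __name__ == "__main__":
    print(all_records_supported(SAMPLE))
```

With a single permitted root element the test is one string comparison per record; if Resource and each of its sub-classes were allowed, the harvester would need a list of acceptable tags that grows with every schema extension.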
1.3. The form of the OAI identifier, i.e. the value of <oai:identifier>. Deferred until after January!
I would like to see us use our IVOA identifiers (in their URI forms)
here. Otherwise, we will find ourselves having to keep track of two
identifiers.
This might seem like a no-brainer, but several of us (including us at
NCSA!) are using the OAI interface script from Virginia Tech, which
creates its own OAI identifiers based on the local XML file name.
Ramon has placed a modified version of this script at
http://nvo.ncsa.uiuc.edu/VO/software/XMLFileDP_vo.pm that is meant to
serve as a drop-in replacement for XMLFileDP.pm. (Replace your old
XMLFileDP.pm in the Perl library directory used by your oai.pl script.)
This version will override the default OAI identifiers with the IVOA
ones found in the corresponding VOResource files. It also has the
added benefit of supporting deleted records.
Several people have suggested we defer unifying on this item until after January, since it is not absolutely critical. This only becomes a practical issue when a harvester uses the GetRecord function. As far as I know, none of our harvesting registries do.
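The idea of deriving the OAI identifier from the VOResource record, rather than from the file name, can be sketched as follows. This is not the logic of XMLFileDP_vo.pm, which is Perl and is not reproduced here. The sketch assumes that the record carries its identifier as AuthorityID and ResourceKey elements and that the URI form is ivo://AuthorityID/ResourceKey; the element layout, function names, and sample record are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element layout is an assumption, not taken from the schema.
SAMPLE = """<VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
  <Resource>
    <Identifier>
      <AuthorityID>example.auth</AuthorityID>
      <ResourceKey>my/resource</ResourceKey>
    </Identifier>
  </Resource>
</VOResource>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def ivoa_identifier(voresource_xml):
    """Return the ivo:// URI for a record, or None if no AuthorityID is found.

    Uses the first AuthorityID and ResourceKey elements in document order.
    """
    authority_id = None
    resource_key = None
    for element in ET.fromstring(voresource_xml).iter():
        name = local_name(element.tag)
        text = (element.text or "").strip()
        if name == "AuthorityID" and authority_id is None:
            authority_id = text
        elif name == "ResourceKey" and resource_key is None:
            resource_key = text
    if not authority_id:
        return None
    if resource_key:
        return "ivo://%s/%s" % (authority_id, resource_key)
    return "ivo://%s" % authority_id


if __name__ == "__main__":
    print(ivoa_identifier(SAMPLE))
```

Using this value for <oai:identifier> means a harvester can pass the identifier it already knows straight to GetRecord, with no second identifier to track.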
1.4. Each OAI interface must export one <Authority> record for each AuthorityID it controls.
(See Authority Resource Definition.) By "control", we mean that the interface creates resource descriptions under that AuthorityID. These records are used to trace a resource record back to where it originates.
1.5. Each OAI interface must export one <Registry> record describing itself.
(See Registry Resource Definition.) This record should list all AuthorityIDs it manages (i.e. those it has <Authority> records for) using the <ManagedAuthority> element. These records are used to trace a resource record back to where it originates.
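A registry maintainer could check that the two requirements agree with each other along the following lines. This is a rough sketch: it matches elements by local name only, the function names are invented, and the sample Registry record (including its namespace URI) is a hypothetical layout, not one taken from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical Registry record; namespace and layout are assumptions.
SAMPLE_REGISTRY = """<VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
  <Registry xmlns="urn:example:registry-extension">
    <ManagedAuthority>example.auth</ManagedAuthority>
    <ManagedAuthority>example.other</ManagedAuthority>
  </Registry>
</VOResource>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def managed_authorities(registry_record_xml):
    """Collect the text of every <ManagedAuthority> element in a Registry record."""
    return {
        (element.text or "").strip()
        for element in ET.fromstring(registry_record_xml).iter()
        if local_name(element.tag) == "ManagedAuthority"
    }


def missing_authority_records(managed_ids, authority_record_ids):
    """AuthorityIDs listed as managed that have no exported <Authority> record."""
    return sorted(set(managed_ids) - set(authority_record_ids))


if __name__ == "__main__":
    # IDs of the <Authority> records this interface exports (hypothetical).
    exported = {"example.auth"}
    print(missing_authority_records(managed_authorities(SAMPLE_REGISTRY), exported))
```

An empty result means every AuthorityID named in the Registry record can be traced to an Authority record from the same interface.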
1.6. Standard Services (ConeSearch, SIA) should include the appropriate Capability sub-element.
By standard service, we mean a service that has a schema extension associated with it. Current examples are ConeSearch and SIA. Each extension schema defines an element (<ConeSearch> and <SimpleImageAccess>, respectively) that inherits from <Capability> and contains service-type-specific metadata.
All records that describe one of these standard services should include either the <ConeSearch> or <SimpleImageAccess> tag, even if that tag contains no content. Currently, the Data Inventory Service looks for this tag to determine what type of service it is.
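The reason an empty tag is enough can be seen in a small sketch of tag-based detection. This is not the Data Inventory Service's actual code, which is not shown on this page; the function name, the sample record, and its extension namespace URI are assumptions, and elements are matched by local name only.

```python
import xml.etree.ElementTree as ET

# Capability element name -> service type label.
CAPABILITY_TAGS = {"ConeSearch": "ConeSearch", "SimpleImageAccess": "SIA"}

# Hypothetical record with an empty capability tag; the namespace is invented.
SAMPLE = """<VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
  <Service>
    <ConeSearch xmlns="urn:example:conesearch-extension"/>
  </Service>
</VOResource>"""


def standard_service_type(record_xml):
    """Return "ConeSearch" or "SIA" if a capability tag is present, else None."""
    for element in ET.fromstring(record_xml).iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop the namespace prefix
        if name in CAPABILITY_TAGS:
            return CAPABILITY_TAGS[name]
    return None


if __name__ == "__main__":
    print(standard_service_type(SAMPLE))
```

The check depends only on the presence of the element, so a record that omits the tag is indistinguishable from a non-standard service.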
1.7. Set ContentLevel to "Research" if you want it chosen by the DIS
By convention, the Data Inventory Service (DIS) requests from the registry only those services with the <ContentLevel> element set to "Research". Some data providers have used this convention to prevent the DIS from using certain services (because, say, they are not fully operational) by not setting ContentLevel to "Research".
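The convention amounts to a simple filter, sketched below. How the DIS actually phrases its registry query is not described here, so the exact string match, the function names, and the sample records are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the placement of <ContentLevel> is an assumption.
OPERATIONAL = """<VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
  <Service><ContentLevel>Research</ContentLevel></Service>
</VOResource>"""

NOT_READY = """<VOResource xmlns="http://www.ivoa.net/xml/VOResource/v0.9">
  <Service/>
</VOResource>"""


def content_levels(record_xml):
    """Return the text of every <ContentLevel> element in a record."""
    return [
        (element.text or "").strip()
        for element in ET.fromstring(record_xml).iter()
        if element.tag.rsplit("}", 1)[-1] == "ContentLevel"
    ]


def selected_by_dis(record_xml):
    """Mimic the convention: keep only records with ContentLevel "Research"."""
    return "Research" in content_levels(record_xml)


if __name__ == "__main__":
    for record in (OPERATIONAL, NOT_READY):
        print(selected_by_dis(record))
```

A provider who leaves ContentLevel unset, or sets it to another value, keeps the service in the registry while keeping it out of the DIS selection.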
-- RayPlante - 18 Dec 2003