The Registry of Registries
Version 1.00

IVOA Note 2007 June 1

Working Group: http://www.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
This version: http://www.ivoa.net/Documents/Notes/RofR-20070602.html
Latest version: http://www.ivoa.net/Documents/latest/RofR.html
Previous versions: n/a
Author(s): Ray Plante

Abstract

The Registry of Registries refers to a service maintained by the International Virtual Observatory Alliance (IVOA) via the Resource Registry Working Group that provides a mechanism for IVOA-compliant registries to learn about each other. More specifically, it is itself a compliant publishing registry which contains copies of the resource descriptions for all IVOA Registries that wish to share their records in the registry network. It can also maintain and export various other resource records that describe IVOA infrastructure, most notably IVOA standards. This document describes how the Registry of Registries is populated with records--in particular, how a curator can register a new publishing registry--and how other full harvesting registries can discover new registries to harvest from.

Status of this Document

This is an IVOA Note. The first release of this document was 2006 September 21.

As an IVOA Note, this document expresses suggestions from and opinions of the authors.
It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory. It should not be referenced or otherwise interpreted as a standard specification.

This document specifically proposes a conventional practice to aid interoperability between registries. It is not to be considered a standard as there may be alternative ways to accomplish the same objectives.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgments

The idea of a registry of registries was originally suggested by Bob Hanisch.

This document has been developed with support from the National Science Foundation's Information Technology Research Program under Cooperative Agreement AST0122449 with The Johns Hopkins University.

Definitions

The Virtual Observatory (VO) is a general term for a collection of federated resources that can be used to conduct astronomical research, education, and outreach. The International Virtual Observatory Alliance (IVOA) is a global collaboration of separately funded projects to develop standards and infrastructure that enable VO applications. A Registry is a repository for descriptions of VO resources. An IVOA-compliant Registry is a registry that conforms with the IVOA standard, "Registry Interfaces" [1] which describes service interfaces for retrieving resource descriptions from the registry.

Abstract
Status
Acknowledgments
Contents
1. Introduction
2. Registering a New Registry
3. Discovering Registries
4. Other Uses and Concerns
References

1. Introduction

Within the Virtual Observatory, resource discovery starts with registries. The IVOA standard, "Registry Interfaces" (RI) [1], describes a service interface that allows client applications to search a registry for resources of interest based on some constraints. A registry is populated through a process called publishing in which a resource provider adds resource descriptions to the registry's holdings. The VO hosts a number of registries, each able to publish resource descriptions. Many registries that support client search queries will want to collect all resource descriptions known to the VO. The RI standard also describes how registries can share their descriptions with each other through a process called harvesting: periodically, one registry can request from another registry newly published resource descriptions. By harvesting from all other known registries, a registry is able to maintain a complete collection of all available resource descriptions in the VO.

This document addresses how a registry becomes "known" so that other registries will harvest from it. It proposes a boot-strapping registry that can hold a complete list of harvestable registries. This "registry of registries" would provide an interactive web browser-based interface that allows a registry provider to deposit the resource description of a new registry. Other registries can discover the new registry by starting their harvesting process with a harvest of the registry of registries. Section 3 describes this process in detail, leveraging the standard harvesting interface defined by the RI.

The Registry of Registries (RofR, pronounced "rover") thus provides an important linchpin for connecting all VO registries into a working network. For this reason, this document proposes that this registry be provided and maintained by the IVOA. Since the Registry of Registries is itself an RI-compliant publishing registry, it can publish its own descriptions that are appropriate for the IVOA to curate. As an example, the IVOA can publish descriptions of IVOA standards; this allows standards to be assigned IVOA identifiers which can be used to refer to the standards by other descriptions or applications. This document does not describe how standards might be described or referenced, but merely provides it as an example of IVOA-published resource descriptions.

Note that it is not intended that the RofR be considered a "master" registry. It is not intended to provide, for example, any services (such as searching) for end users or user applications directly. Its only clients are other registries; even the records it publishes will typically be accessed by applications through other full registries. The RofR merely is meant to provide the glue that establishes the network of interoperable registries.

2. Registering a New Registry

The process that the RofR presents for registering a new registry incorporates a compliance check of the registry. That is, if the registry is found not to be fully compliant with the RI [1], it will not be registered. If the registry is found to be compliant, then the provider will have the option of completing the registration and having the registry's own resource record added to the RofR.

Thus, once a registry provider has deployed a new publishing registry, he or she can register it with the RofR by visiting the web page at http://rofr.ivoa.net/rofr/check.html. This page presents a simple form consisting of one text field into which the provider can enter the URL endpoint for the registry's OAI harvesting interface. This needs to be the endpoint for the standard HTTP GET interface required by the RI (i.e., not the SOAP Web Service version). Submitting the form by clicking the "Check" button will begin the validation process in the background.

As part of the validation process, the RofR will send test queries to the registry's OAI harvesting interface, invoking all of the OAI standard verbs, to test whether the interface is compliant first with the base OAI-PMH standard [2] and then with the extra requirements imposed by the RI [1]. The RofR will also use the ListRecords operation (with set=ivo_managed) to obtain all of the VOResource records the registry publishes. Each of these records is checked for compliance with the VOResource schema [3] and its standard extensions.
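A provider may want to exercise the harvesting endpoint in the same spirit before submitting it. The following is a minimal sketch, not the RofR validator's actual query set (which this note does not specify): it builds one HTTP GET URL for each of the six verbs defined by OAI-PMH [2]. The function name verb_test_urls, the endpoint, and the sample identifier are hypothetical; the argument name metadataPrefix is taken from OAI-PMH 2.0 and the value ivo_vor from the example URLs in section 3, so both should be checked against the RI [1].

```python
from urllib.parse import urlencode


def verb_test_urls(endpoint, sample_identifier):
    """Return one HTTP GET URL per OAI-PMH verb for a harvesting endpoint.

    `sample_identifier` is the IVOA identifier of any record the registry
    is known to publish; it is only needed for the GetRecord request.
    """
    requests = [
        {"verb": "Identify"},
        {"verb": "ListMetadataFormats"},
        {"verb": "ListSets"},
        {"verb": "ListIdentifiers", "metadataPrefix": "ivo_vor", "set": "ivo_managed"},
        {"verb": "ListRecords", "metadataPrefix": "ivo_vor", "set": "ivo_managed"},
        {"verb": "GetRecord", "metadataPrefix": "ivo_vor", "identifier": sample_identifier},
    ]
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return [endpoint + separator + urlencode(params) for params in requests]


# Hypothetical endpoint and identifier, for illustration only.
urls = verb_test_urls("http://registry.example.org/oai", "ivo://example.org/registry")
for url in urls:
    print(url)
```

Fetching each URL and checking the responses against the OAI-PMH and VOResource schemas is left out; the RofR's own validation page remains the reference check.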

When the validation completes, the provider will be presented with the results, including any failures, warnings, and recommendations. To be considered fully compliant, the results must include no compliance failures. If this is the case, the results page will ask if the user wishes to register the registry. When the user chooses to proceed, the RofR will extract the VOResource record describing the registry from the results of the Identify operation.

Note that since the user must take an additional explicit action to actually register a registry, it is fine if the user only wishes to use the RofR for its validation capabilities.

3. Discovering Registries

Publishing registries make known their desire to be harvested by any other registry by successfully registering their registry with the RofR. Thus, the RofR represents a listing of pointers to all resource records considered "known" by the VO. This section describes the recommended process for incorporating the RofR into an overall harvesting process that collects all resource descriptions known in the VO. Typically, the harvester is a so-called full registry (in the sense defined in the RI [1]), but it can be any application that is attempting to get every record known to the VO. It builds on the process recommended by the RI (section 3.2) that assures that the harvester does not receive duplicates of the same record.

Step 1. Retrieve a list of publishing registries from the RofR. The harvester starts by harvesting the RofR, issuing an OAI-PMH ListRecords operation with the set argument set to ivo_publishers. This will return the registry records (i.e. resources with xsi:type='vg:Registry') for the registries that successfully registered themselves as described in section 2.

The very first time the harvester executes the ListRecords operation, the since argument should not be used so that all known publishing registries are returned. The harvester should cache at least a mapping of the registry identifiers to their respective harvesting endpoints, as well as the date and time that this operation was carried out (see Tip below). Then, at the start of subsequent harvesting runs, the harvester can provide the cached date using the since argument to receive only new and updated records. When such a record is returned, the cached mapping for the corresponding registry should be updated.

Example

The URL that invokes OAI-PMH-based harvesting of known publishing registries is a standard service capability. Here is the URL that can be used the first time harvesting from the RofR:

http://rofr.ivoa.net/harvest/oai.pl?verb=ListRecords&metadataformat=ivo_vor&set=ivo_publishers

On subsequent visits, this URL can be used:

http://rofr.ivoa.net/harvest/oai.pl?verb=ListRecords&metadataformat=ivo_vor&set=ivo_publishers&since=2007-07-11T13:01:24
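The two URLs above differ only in the optional date argument, so a harvester can build both from one helper. The sketch below is illustrative: the function name list_records_url and its parameters are hypothetical. One assumption needs flagging: OAI-PMH 2.0 [2] names the format and date arguments metadataPrefix and from, whereas the example URLs in this note write metadataformat and since. The helper defaults to the OAI-PMH 2.0 names and lets the caller override them; which spelling a given endpoint accepts should be verified against that endpoint.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs in this note.
ROFR_ENDPOINT = "http://rofr.ivoa.net/harvest/oai.pl"


def list_records_url(endpoint, set_name, since=None,
                     prefix_arg="metadataPrefix", date_arg="from"):
    """Build a ListRecords URL; omit the date argument on the first harvest."""
    params = {"verb": "ListRecords", prefix_arg: "ivo_vor", "set": set_name}
    if since is not None:
        # Only ask for records created or updated after the cached timestamp.
        params[date_arg] = since
    # Keep ':' unescaped so timestamps stay readable in the URL.
    return endpoint + "?" + urlencode(params, safe=":")


# First harvest: no date argument, so every publishing registry is returned.
first_url = list_records_url(ROFR_ENDPOINT, "ivo_publishers")

# Later harvests: pass the timestamp cached from the previous response.
later_url = list_records_url(ROFR_ENDPOINT, "ivo_publishers",
                             since="2007-07-11T13:01:24")
print(first_url)
print(later_url)
```

Passing prefix_arg="metadataformat" and date_arg="since" reproduces the URLs exactly as they are written in this note.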

Note:

The harvester would not typically cache the complete registry records received in this step into its normal metadata store. This is because it will receive each of these records again when it harvests from each publishing registry. The copy obtained from the publishing registry will always be more up-to-date than the one returned by the call to the RofR.

Tip on extracting information from the harvesting response:

A registry's harvesting service endpoint is located in the resource record in the capability[xsi:type='vg:Registry']/interface[role='std' and xsi:type='vg:OAIHTTPGet']/accessURL element.
It's best to get the date and time of when the harvesting operation was carried out directly from the OAI response itself. This will ensure that your time scale matches the one used by the server. The time-stamp is found in the /oai:OAI-PMH/oai:responseDate element.
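The tip above can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not a reference implementation: the function names and the sample response are made up, the element and type names follow the path quoted in the tip, VOResource child elements are assumed to be unqualified, and xsi:type values are compared by their local name only (the prefix is not resolved to a namespace).

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def local_type(element):
    """Return the element's xsi:type without its prefix ('' if absent)."""
    return element.get(XSI_TYPE, "").split(":")[-1]


def parse_publishers(xml_text):
    """Return (responseDate, {registry identifier: harvesting endpoint})."""
    root = ET.fromstring(xml_text)
    response_date = root.findtext(OAI + "responseDate")
    endpoints = {}
    for record in root.iter(OAI + "record"):
        metadata = record.find(OAI + "metadata")
        if metadata is None:  # e.g. a deleted record carries no metadata
            continue
        for resource in metadata:
            identifier = resource.findtext("identifier")
            for capability in resource.findall("capability"):
                if local_type(capability) != "Registry":
                    continue
                for interface in capability.findall("interface"):
                    if (interface.get("role") == "std"
                            and local_type(interface) == "OAIHTTPGet"):
                        url = interface.findtext("accessURL")
                        if identifier and url:
                            endpoints[identifier.strip()] = url.strip()
    return response_date, endpoints


# A made-up, heavily trimmed response used only to exercise the parser.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <responseDate>2007-07-11T13:01:24Z</responseDate>
  <ListRecords>
    <record>
      <metadata>
        <ri:Resource xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
                     xmlns:vg="http://www.ivoa.net/xml/VORegistry/v1.0"
                     xmlns="" xsi:type="vg:Registry">
          <identifier>ivo://example.org/registry</identifier>
          <capability xsi:type="vg:Registry">
            <interface role="std" xsi:type="vg:OAIHTTPGet">
              <accessURL>http://registry.example.org/oai</accessURL>
            </interface>
          </capability>
        </ri:Resource>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

response_date, endpoints = parse_publishers(SAMPLE)
print(response_date, endpoints)
```

A production harvester would also need to follow OAI-PMH resumption tokens when the list spans several responses; that is omitted here.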

One of the registries that will be returned in this first step will be the RofR itself. That is, the RofR will be a publisher of a small number of resources on behalf of the IVOA. At a minimum (i.e. before any other registries have passed validation and have been accepted into the RofR's holdings), the RofR will return in this step just the registry record for the RofR, itself.

Step 2. Harvest from each of the returned publishing registries. The harvester then steps through its cached mapping of registries to harvesting endpoints. On each endpoint, the harvester invokes an OAI-PMH ListRecords operation using the set argument set to ivo_managed (as described in section 3.2 of the RI [1]). The since argument is applied as necessary to get only updated records from the registries.

Note:

If Step 1 returns a new publishing registry, you should remember not to use the since argument in Step 2 for that registry so that you get all records it currently publishes.
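One way to honor this rule is to keep a per-registry record of the last successful harvest and to look each registry up in it when planning Step 2. The sketch below assumes such a bookkeeping table exists; the name plan_step2 and both dictionaries are hypothetical and not part of the RI or of this note.

```python
def plan_step2(endpoints, last_harvest):
    """Plan the Step 2 requests.

    endpoints    -- {registry identifier: harvesting endpoint} from Step 1
    last_harvest -- {registry identifier: timestamp of the last successful
                    harvest from that registry}

    Returns a list of (identifier, endpoint, since) tuples. `since` is None
    for a registry that has never been harvested, so that all of the records
    it currently publishes are requested.
    """
    plan = []
    for identifier, endpoint in sorted(endpoints.items()):
        plan.append((identifier, endpoint, last_harvest.get(identifier)))
    return plan


# Hypothetical state: one known registry and one that Step 1 just revealed.
endpoints = {
    "ivo://old.example.org/registry": "http://old.example.org/oai",
    "ivo://new.example.org/registry": "http://new.example.org/oai",
}
last_harvest = {"ivo://old.example.org/registry": "2007-07-11T13:01:24Z"}

for identifier, endpoint, since in plan_step2(endpoints, last_harvest):
    print(identifier, endpoint, since or "(full harvest)")
```

Keeping the timestamp per registry, rather than one global date, also means that a registry whose harvest failed on the previous run is simply asked again from its own last successful date.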

Note:

Harvesting from the RofR with set=ivo_managed will not return any Registry records except its own, as is the case for all publishing registries. In other words, the RofR is not considered the publishing registry of other registry resources; those other registries are.

It is expected that the list of publishing registries served by the RofR will change infrequently (less frequently, for example, than the holdings of a typical publishing registry). Thus, it is not critical that Step 1 complete successfully every time the harvesting process is undertaken. In particular, if the RofR is unavailable (e.g. due to a network or server failure), then the harvester can proceed to Step 2 using the cache of harvesting endpoints it already has. It is unlikely that the harvester will miss the appearance of new registries. The corollary to this is that the RofR does not have a strong requirement for high availability; the web of IVOA registries can tolerate occasional, unannounced downtime.
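The fallback described above can be sketched as a small wrapper around Step 1. The names refresh_endpoints and fetch_publishers, and the shape of the cache, are assumptions made for illustration; fetch_publishers stands for whatever routine performs the ListRecords call against the RofR and is expected to raise OSError (the base class of Python's network errors) when the RofR cannot be reached.

```python
def refresh_endpoints(cache, fetch_publishers):
    """Try Step 1; fall back to the cached endpoints if the RofR is down.

    cache            -- {"endpoints": {identifier: url}, "since": timestamp or None}
    fetch_publishers -- callable(since) -> (response_date, {identifier: url})

    Returns (new_cache, refreshed), where `refreshed` is False when the
    cached list had to be reused unchanged.
    """
    try:
        response_date, updates = fetch_publishers(cache.get("since"))
    except OSError:
        # RofR unavailable: proceed to Step 2 with what is already known,
        # and keep the old timestamp so nothing is skipped next time.
        return cache, False
    merged = dict(cache.get("endpoints", {}))
    merged.update(updates)  # new registries and changed endpoints
    return {"endpoints": merged, "since": response_date}, True


# Hypothetical stand-ins for the real network call.
def rofr_down(since):
    raise OSError("RofR unreachable")


def rofr_up(since):
    return "2007-08-01T00:00:00Z", {"ivo://b.example.org/registry": "http://b.example.org/oai"}


cache = {
    "endpoints": {"ivo://a.example.org/registry": "http://a.example.org/oai"},
    "since": "2007-07-11T13:01:24Z",
}
cache_after_outage, refreshed_during_outage = refresh_endpoints(cache, rofr_down)
cache_after_success, refreshed_on_success = refresh_endpoints(cache, rofr_up)
print(refreshed_during_outage, refreshed_on_success)
```

Because the cached timestamp is only advanced after a successful call, a registry registered during an outage is picked up on the next run that reaches the RofR.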

4. Other Uses and Concerns

4.1. Records published by the Registry of Registries

At the time of this writing, the only resources expected to be originally published through the RofR (with the exception of the RofR's own registry record) are those of the resource types defined by the VOStandard extension schema [4]. In general, a record of this type describes a standard (defined by a specification document) used in the VO. The standards described in the RofR are those prepared and endorsed (at some level) by the IVOA. The motivations for registering a standard include:

- to define an IVOA identifier for the standard. This allows, for example, a service to indicate its compliance with a standard by citing the standard's identifier.
- to enable registry users, particularly potential service providers and client developers, to discover the specification document based on its identifier. That is, if a user encounters via a registry search a service that is compliant with a standard as yet unknown to them, they can resolve the identifier to the description of the standard in the registry, which will in turn point them to the full specification document.
- to provide definitions and information that apply to all compliant instances. This can include:
    - the names and definitions of standard properties that can be used in conjunction with an instance of the standard being described.
    - required and optional input parameters that are defined by a standard service protocol. This is to aid automated service clients and agents to understand how to call the service and to do things like dynamically create an interface for the service. When the base-level parameter information appears in the Standard record, it is not necessary for service instances to indicate their support for required inputs; they only need to describe their support for optional or non-standard arguments.

The VOStandard extension is described more fully in the associated VOStandard specification [4].

4.2. Browser-based Views of the RofR

As explained in the Introduction, the RofR was motivated by a need for registries to discover each other in a programmatic way. Thus, apart from the interface to register a registry, little more is needed in the RofR's web site to fulfill this goal. Nevertheless, it is useful to IVOA participants and developers to see and browse a list of known publishing registries through the browser. Thus, it is expected that this will be added as a convenience. It is not, however, considered a critical feature of the RofR.

It has been suggested that the RofR could serve as an aid to users--data providers, specifically--looking for a registry to publish resources through. At this time, this is considered out-of-scope for the RofR. From its inception, the RofR was never intended to provide anything directly for end users. (Serving end users directly would require higher availability; see the end of section 3.) This reflects the role of the IVOA as a whole as an organization that attempts merely to coordinate the activities of the various VO projects rather than serve end users. With this vision in mind, it is expected that any user that happens upon IVOA-managed web sites should be directed toward the member projects for services and support.

4.3. Discovering Searchable Registries

A similar question has been raised about how client applications might discover searchable registries that they can choose from to discover services that they might interact with. By analogy, this question invites the suggestion that the RofR could serve as the initial point of discovery. Since a searchable registry is not required to also be a publishing registry, the RofR would have to provide a mechanism for registering purely searchable registries for this to work in general.

At this time, this use of the RofR is considered out-of-scope for two reasons. First, it can be argued that this would cross the line of serving end-users (and require higher availability--see the end of section 3). Second, any full registry can serve this same role, since it knows about all other searchable registries. Regardless of which registry is used as the initial point of discovery, a client would still need to hard-code (or provide in an out-of-the-box default configuration) the endpoint of the registry. Thus, this does not need to be the RofR's endpoint.

(A counter-argument might be that the RofR's endpoint is considered more stable than the endpoint of a project's searchable registry. A client could guard against the initial registry's endpoint changing or disappearing by configuring multiple initial registries. It only requires one successful connection to a full registry to get the current endpoints of all searchable registries and update the client's configuration accordingly. The likelihood, then, of a client application not being able to find a registry to search becomes quite low.)

Nevertheless, it is possible for a client application to use the RofR, through its harvesting interface, to discover searchable registries (despite the advice of this document). It could examine the descriptions of the publishing registries and find one that claims to be a full registry supporting the searching interface. A query to this registry would reveal all other known full searchable registries.

4.4. Security

Secure authentication is certainly needed if we want to allow registry providers to update their registries' harvesting endpoints. (All other changes to the registry record can be captured if the RofR internally includes a harvester that checks for updates to the registry records it holds.) Endpoint updates would require that the provider revisit the RofR's validation and registration page and enter the new endpoint. Assuming the new endpoint passes validation, the previously existing record would get replaced. Consequently, the RofR would need a way to guarantee that the previous record was submitted by the same user as the one later overwriting it.

It is preferred that the RofR portal use an authentication mechanism based on IVOA security standards. When the RofR is initially launched, the IVOA security standards will likely not have ventured into standard ways to authenticate to a web portal; thus, some details of how security will be deployed and engaged by RofR users still need to be worked out.

References

[1] Benson, K., Andrews, K., Auden, E., Graham, M., Greene, G., Hill, M., Linde, T., Morris, D., O'Mullane, W., Plante, R., Rixon, G., Stebe, A. 2007, Registry Interfaces, Version 1.01, http://www.ivoa.net/Documents/latest/RegistryInterface.html

[2] Lagoze, C., Van de Sompel, H., Nelson, M., Warner, S. 2002, The Open Archives Initiative Protocol for Metadata Harvesting, http://www.openarchives.org/OAI/openarchivesprotocol.html

[3] Plante, R., Benson, K., Graham, M., Greene, G., Harrison, P., Lemson, G., Linde, T., Rixon, G., Stebe, A. 2006, VOResource: an XML Encoding Schema for Resource Metadata, Version 1.02, an IVOA Working Draft, http://www.ivoa.net/Documents/latest/VOResource.html

[4] Harrison, P., Plante, R., Rixon, G., Morris, D. 2007, VOStandard: an XML Encoding Schema for Standards, Version 0.20, an IVOA Working Draft
