The SimDB-Registry Connection

Here we discuss the relations between SimDB and the Registry.

Questions to consider

  • How should a SimDB be registered?
  • SimDB/Resource is not equivalent to Registry/Resource, but the two have clear correspondences. Should these be made "formal" in some way, for example using UTYPEs?
  • Some SimDB/Resources could/should be registered in a registry as well. This is one motivation for SimDB/Project, for example. Can this be postponed to a later version?
  • In view of the previous point, can SimDB be seen as an "extension registry"? Should this be formalised (in a later version)?
  • What can SimDB learn from registry efforts regarding:
    • allowing (and specifying?) upload of new resources
    • harvesting
    • use of references to other existing, registered resources (using IVO Identifiers)
    • query interfaces
  • other ...

Discovery Use Cases

The common ground between SimDB and Registries is the discovery of data and services. The purpose of the SimDB is to allow users to discover simulation datasets using a rich but focused data model. For example, users may visit a SimDB portal and use its search page to find experiments based on, say, the type of physical phenomena being simulated and the physical parameters that were calculated. This use case mainly addresses theorist users, that is, users expressly looking for theoretical data.

We would also like to allow theoretical datasets to be discovered by users who are not necessarily looking for them. For example, a user looking for data related to evolved stars might find a catalog of observed evolved stars as well as simulations of evolved stars of various masses. Currently, this kind of broad discovery is done via an IVOA Registry. Thus, to support this ability at some level, theoretical data collections need to have some representation in the Registry.

By analogy with current practices for collections of observed data, we can consider two ways SimDB datasets might be represented in a Registry.

Scenario 1: The NCSA Astronomical Digital Image Library

The ADIL is a repository that allows astronomers to deposit their images into a VO-enabled library when they publish the paper based on those images. One advantage of this is that the ADIL already supports VO protocols; simply by being deposited, the images become discoverable via standard VO services.

The ADIL is currently registered in the Registry as a single DataCollection. Its images are searchable via a single SIA service. Thus, to discover an ADIL image, one first discovers the ADIL via the Registry and then drills down to images via SIA. The most common way that VO users discover ADIL images is via DataScope, which essentially carries out this two-step mechanism on the user's behalf. The user provides a position initially and can then drill down by waveband.
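The two-step mechanism described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class names, the `registry_search`, `image_search` and `discover` functions, the `ivo://example/...` identifier and the sample images are all hypothetical, and no real Registry or SIA interface is being called.

```python
from dataclasses import dataclass, field
from math import acos, cos, degrees, radians, sin


@dataclass
class Image:
    title: str
    ra: float       # right ascension in degrees
    dec: float      # declination in degrees
    waveband: str


@dataclass
class Collection:
    """Hypothetical stand-in for one registered collection and its image service."""
    identifier: str
    keywords: set
    images: list = field(default_factory=list)

    def image_search(self, ra, dec, radius):
        """Toy positional search (not a real SIA call): images within radius degrees."""
        return [im for im in self.images
                if angular_separation(ra, dec, im.ra, im.dec) <= radius]


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the spherical law of cosines."""
    a1, d1, a2, d2 = map(radians, (ra1, dec1, ra2, dec2))
    c = sin(d1) * sin(d2) + cos(d1) * cos(d2) * cos(a1 - a2)
    return degrees(acos(max(-1.0, min(1.0, c))))  # clamp against rounding error


def registry_search(registry, keyword):
    """Step 1: the registry only sees collection-level keywords."""
    return [c for c in registry if keyword in c.keywords]


def discover(registry, keyword, ra, dec, radius, waveband=None):
    """Step 2: drill down into each matching collection by position, then waveband."""
    hits = []
    for coll in registry_search(registry, keyword):
        for im in coll.image_search(ra, dec, radius):
            if waveband is None or im.waveband == waveband:
                hits.append((coll.identifier, im.title))
    return hits


# Made-up sample data: one library registered as a single collection.
registry = [
    Collection(
        identifier="ivo://example/image-library",
        keywords={"images"},
        images=[
            Image("Spiral galaxy, radio", 10.0, 20.0, "radio"),
            Image("Spiral galaxy, optical", 10.01, 20.0, "optical"),
            Image("Unrelated field", 200.0, -30.0, "radio"),
        ],
    )
]
```

With this sample data, `discover(registry, "images", 10.0, 20.0, 0.1, "radio")` reaches the radio image only through the collection, while `registry_search(registry, "spiral galaxy")` finds nothing, because image-level descriptions are not visible at the registry level in this model.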

If a SimDB is like the ADIL, there will be a limit to how specifically theory data can be discovered via the Registry; in particular, the limit will be set by the focus of the SimDB repository. For example, one might have a SimDB dedicated specifically to simulations of galaxy clusters and the early universe.

I note that the ADIL images are organized into projects composed of scientifically related images. Each project has an abstract that describes the "sub"-collection. The publisher is considering registering each ADIL project individually. This would expose more science-specific metadata/keywords to the Registry and would allow users to discover, say, radio images of spiral galaxies directly via the Registry.

Scenario 2: The HEASARC Catalog Services

The HEASARC archive provides data products related to high-energy astrophysics. It provides access to some 500 catalogs, each available via the Cone Search protocol. Thus, individual catalogs can be discovered via a Registry query. In particular, the specific science terms that are part of each catalog's description appear in the Registry. This allows users to discover, say, catalogs related to cataclysmic binaries.

If a SimDB is like the HEASARC, then given some notion of logical collections (e.g. sets of simulations produced by a group of authors), each collection could be registered separately. This could be done via a specialized extension of the DataCollection Resource (or the generic Resource) from VOResource that adds metadata specific to theoretical simulations. These resources can be associated with specialized services that access the data.
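As a rough illustration of registering each collection separately, the sketch below models a generic resource record and a simulation-specific extension of it. Everything here is an assumption made for the example: the class and field names (`Resource`, `SimulationCollection`, `phenomena`, `access_services`), the `find_by_subject` helper and the `ivo://example/...` identifiers are hypothetical and are not taken from the VOResource schema.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Minimal stand-in for a generic registry resource record (hypothetical fields)."""
    identifier: str
    title: str
    subjects: list = field(default_factory=list)


@dataclass
class SimulationCollection(Resource):
    """Hypothetical extension adding metadata specific to theoretical simulations."""
    authors: list = field(default_factory=list)
    phenomena: list = field(default_factory=list)        # physical processes simulated
    access_services: list = field(default_factory=list)  # identifiers of associated services


def find_by_subject(records, term):
    """Case-insensitive substring match on subject terms, as a toy registry query."""
    term = term.lower()
    return [r for r in records if any(term in s.lower() for s in r.subjects)]


# Made-up records: one observed catalog and one separately registered simulation collection.
records = [
    Resource(
        identifier="ivo://example/observed-evolved-stars",
        title="Catalog of observed evolved stars",
        subjects=["evolved stars"],
    ),
    SimulationCollection(
        identifier="ivo://example/simdb/evolved-star-models",
        title="Simulations of evolved stars of various masses",
        subjects=["evolved stars", "stellar evolution"],
        authors=["A. Author"],
        phenomena=["stellar evolution"],
        access_services=["ivo://example/simdb/query"],
    ),
]
```

Because the extension still carries the generic subject terms, a single query such as `find_by_subject(records, "evolved stars")` returns the observed catalog and the simulation collection side by side, which is the broad-discovery behaviour discussed under Discovery Use Cases.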

In this scenario, we can still have a specialized SimDB portal that is targeted to users looking specifically for simulated data. It may or may not support the standard registry interface, but doing so would represent a currently unexercised feature of the Registry framework: a local searchable registry, that is, a registry that specializes in a particular kind of data.

Topic revision: r4 - 2009-05-26 - RayPlante