---+ Discussion of Best Practices for Registering Resources

%TOC%

---++ Background

Over several Interops, we have discussed some best practices regarding how to register scientific data collections and the services that access them. The motivations for such practices include:
   * making resources easier to find under targeted searches. We want to make sure that resource descriptions include the important information that will likely be used in queries.
   * making search responses more comprehensible by, for example,
      * avoiding displays of search results that appear to have multiple occurrences of the same resource (when they are actually subtly different)
      * making it clearer what the individual resources represent
      * distinguishing between "original" published data and "mirrored" or "re-published" data

Among the ways we have discussed doing this is by promoting a uniform pattern for registering resources. The idea is that if collections and their services were registered according to a uniform convention (and that convention could be recognized when in use), client applications (such as the [[http://vao.stsci.edu/discover][VAO DDT]]) could provide a more meaningful and easier-to-understand display of search results.

---+++ A Problem with Early Proposals

One convention that has been proposed, and which is already in some use, separates the description of the underlying collection from the description of the services that access it into separate resources (e.g. [[%PUBURL%/%WEB%/InterOpMay2013Registry/IVOA-RWG-upgrades-rofr.pdf][see "A Common Registration Pattern" slide of Plante's May2013 presentation]]). That is,
   * a data collection would be registered as a =DataCollection= resource; this description would include all of the science-related information (including table column descriptions, if applicable).
   * the services that access the collection would be registered collectively but separately (as a =DataService= or =CatalogService=).
   * relationship links would connect the collection with its services.

At the [[InterOpSept2013][Sept. 2013 Interop]], Markus Demleitner reported on some important disadvantages this approach presents for some simple but important search use cases using the TAP interface (see [[%PUBURL%/%WEB%/InterOpSep2013Registry/regtap.pdf][slide 6, "Uneasy Relationships", and beyond of his presentation]] for details). In particular, separating the access metadata (in the Service resource) from the science metadata (in the =!DataCollection= resource) requires some fairly complex joining using relationship information. It is easy to argue that under this convention, certain simple queries cannot be expressed simply.

---++ A Revised Proposal: a single resource per collection

We can avoid the joining mess described above if the science and access metadata are included in the same resource description. Doing so would also make it easier to interpret search results. I propose, then, that we combine the roles of the =DataCollection= and =DataService= (or rather =CatalogService=) into a single resource type. In detail, the proposal is as follows:
   * Every data provider is registered via an =Organisation= resource (and their authorities are registered separately as well).
   * Each data collection published by the provider is registered with a resource type<sup>*</sup> that includes both science metadata describing what is in the collection and =capability= elements for each of the services that access the collection.
   * =relationship= elements would be used to connect it to other related collections. In particular, if a collection is a mirror of or derived from another collection, it would include a relationship that points back to the source collection.
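
To make the joining issue a little more concrete, here is a rough sketch using a toy in-memory SQLite database. Everything in it is an assumption made for illustration: the three tables (=resource=, =capability=, =relationship=) and their columns are simplified stand-ins and not the actual registry TAP schema, the =ivo://example/...= identifiers and URLs are made up, and =served-by= is assumed as the relationship type linking a collection to its service. The use case is "find the cone search access URL for collections about quasars".

```python
import sqlite3

# Standard identifier used here to recognise cone search capabilities.
CONE_SEARCH = "ivo://ivoa.net/std/ConeSearch"


def build_registry():
    """Create a toy registry holding one collection per registration pattern.

    Table and column names are hypothetical simplifications, not a real schema.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE resource (ivoid TEXT PRIMARY KEY, res_type TEXT, subject TEXT);
        CREATE TABLE capability (ivoid TEXT, standard_id TEXT, access_url TEXT);
        CREATE TABLE relationship (ivoid TEXT, relationship_type TEXT, related_id TEXT);

        -- Split pattern: science and access metadata live in separate records.
        INSERT INTO resource VALUES
            ('ivo://example/quasars', 'DataCollection', 'quasars'),
            ('ivo://example/quasars/svc', 'CatalogService', NULL);
        INSERT INTO capability VALUES
            ('ivo://example/quasars/svc', 'ivo://ivoa.net/std/ConeSearch',
             'http://example.org/quasars/scs?');
        INSERT INTO relationship VALUES
            ('ivo://example/quasars', 'served-by', 'ivo://example/quasars/svc');

        -- Unified pattern: a single record carries both kinds of metadata.
        INSERT INTO resource VALUES
            ('ivo://example/agn', 'CatalogService', 'quasars');
        INSERT INTO capability VALUES
            ('ivo://example/agn', 'ivo://ivoa.net/std/ConeSearch',
             'http://example.org/agn/scs?');
    """)
    return db


def find_unified(db, subject, standard_id):
    """Unified pattern: one direct join from the resource to its capabilities."""
    query = """
        SELECT r.ivoid, c.access_url
        FROM resource AS r
        JOIN capability AS c ON c.ivoid = r.ivoid
        WHERE r.subject = ? AND c.standard_id = ?
    """
    return db.execute(query, (subject, standard_id)).fetchall()


def find_split(db, subject, standard_id):
    """Split pattern: hop through the relationship table to reach the service."""
    query = """
        SELECT r.ivoid, c.access_url
        FROM resource AS r
        JOIN relationship AS rel
          ON rel.ivoid = r.ivoid AND rel.relationship_type = 'served-by'
        JOIN capability AS c ON c.ivoid = rel.related_id
        WHERE r.subject = ? AND c.standard_id = ?
    """
    return db.execute(query, (subject, standard_id)).fetchall()


if __name__ == "__main__":
    registry = build_registry()
    print("unified:", find_unified(registry, "quasars", CONE_SEARCH))
    print("split:  ", find_split(registry, "quasars", CONE_SEARCH))
```

Note that in this toy setup neither query finds the collection registered under the other pattern, so a client serving a mixed registry would have to run both. The extra hop in =find_split= also depends on providers choosing the relationship type consistently, which is part of why a single resource per collection is attractive.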
---+++ <sup>*</sup>The Collection Resource Type and the Evolution of a Resource

There is some question as to the "proper" resource type we should use; that is, whether we can use an existing type or would need to define a new one. We observe that the existing =CatalogService= resource type provides all of the metadata required to describe a collection and its services. Re-using this type would obviously have no impact on the TAPReg schema.

Still, the semantics of this type, one could argue, do not quite match those of a "collection". For example, we allow (and in fact encourage) providers to register collections even before there are any services available to access them. Perhaps there is only a web page (which would be given via the =referenceURL= element). Perhaps the collection is a simple catalog that is downloadable as a single file but not yet searchable via ConeSearch. Semantically, this resource has not yet risen to the status of a "service". Nevertheless, a =CatalogService= resource record is not _required_ to have any =capability= elements, so syntactically, a =CatalogService= resource would work just fine.

There is one feature of a =DataCollection= that is not strictly present in a =CatalogService=: a =DataCollection= provides a special =accessURL= element for accessing the collection as a whole (e.g. as a single file or, say, a directory of files). However, this would be simple to capture in a generic =capability= if need be.

If we felt that semantics were important (or we had other reasons not to use the =CatalogService= type in this role), defining a new type need not be difficult since no new metadata elements need to be defined. (This is discussed further below.) For this reason, the impact on existing registries would likely be minimal.

One reason to define a new type (that otherwise looks just like =CatalogService=), let's call it =DataCollection2=, would be to provide an explicit signal to clients that a resource complies with the unified convention.
In particular, if one found a =DataCollection2= resource, one could be certain that there are no separate resources describing its services, whether it has =capability= elements or not.

So, we conclude that we have two choices for registering data collections and their services together:
   * use =CatalogService=, ignoring the semantic inaccuracies
   * define a new resource type (e.g. =DataCollection2=) with the correct semantics but the same contents as =CatalogService=.

---+++ Regarding Services accessing Multiple Collections

<!--
   * Set ALLOWTOPICRENAME = IVOA.TWikiAdminGroup
-->
Topic revision: r2 - 2013-10-22 - RayPlante