
Requirements and Use Cases for New Registry Search Interface

This page aims to capture requirements and use cases for a new (post-RI v1.0) registry search interface.

There's now a page to discuss Registry Interfaces version 2 at RI2Discussion. Following the Sao Paulo IVOA Interop, the RWG agreed to pursue a new RESTful approach based on two protocol directions:

  1. A TAP-compliant interface built on a RegTAP data model representation
  2. A simple RESTful resource interface that supports free-text-style search and subsetting of result sets
The RI v1.0 spec provides two interfaces for searching: one is a required ADQL-based interface and the other is an optional XQuery-based interface. Both use SOAP. Since a searchable registry is only required to support the ADQL interface, this is the one applications prefer in order to be interoperable across registries. These interfaces, and particularly the ADQL interface, have the following disadvantages:
  • They are the only remaining recommended IVOA service interfaces based on SOAP. Because of the complexities introduced by SOAP and WSDLs for both providers and clients, our community has evolved to prefer RESTful interfaces.
  • The ADQL interface relies on an old version of ADQL (v1.0) that itself never became an IVOA Recommendation.
  • The ADQL interface returns either full VOResource records (some of which can be memory-bustingly large) or just their identifiers; most applications only want a few pieces of information from the VOResource records.
This effort aims to address these problems by providing a search interface that is simpler and scriptable.

Use Cases

Consider the following as potentially useful queries and ways to express queries, not all of which may be included explicitly in a standard interface. In addition to adding more use cases, feel free to add comments to those that already appear to indicate their relative importance or role for an application. For the purposes of discussion, annotate with your name (or linked initials).

-- Ray Plante (RP)

Interface clients

Clients that would interact with a programmable web search interface:

  • General purpose discovery portals (like the NVO Directory)
  • Desktop applications that work with specific types of services (e.g. Topcat)
  • Workflow environments
  • Ad hoc scripts looking for specific types of services.

Sample Queries

  • find all TAP services; return their accessURLs
  • find all SIA services that might have spiral galaxies
    • this is a common type of query desired by interactive users: they want services of a particular type related to a subject --RP
  • find all SIA services that provide infrared images.
  • find all searchable catalogs that provide a column containing redshift
  • find all the resources that were registered by me (or another publisher)
  • find all searchable registries; return the accessURLs to their search interfaces
    • desktop apps can use this query to cache a list of alternative registries should the primary one be down.
  • find all resources of a particular type (e.g. Authority)
  • return the resource having a particular identifier
  • find all TAP services exposing a table having some word in the description and a column with a given UCD (this is something I find very common in "data discovery" jobs; if you want, substitute "subject" for the description word) -- MD
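
A few of the sample queries above can be written down as plain filters over capability-level records. This is only a sketch to make the query shapes concrete: the Service and Column classes, their field names, and the toy identifiers and URLs are all invented for illustration and are not taken from VOResource or from any registry implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    ucd: str = ""


@dataclass
class Service:
    # Hypothetical flattened record: one entry per capability of a resource.
    ivoid: str
    cap_type: str            # e.g. "TAP", "SIA", "ConeSearch"
    access_url: str
    subjects: list = field(default_factory=list)
    columns: list = field(default_factory=list)


# Toy data; identifiers and URLs are invented for illustration.
SERVICES = [
    Service("ivo://example/tap", "TAP", "http://example.org/tap",
            subjects=["quasars"],
            columns=[Column("z", "src.redshift")]),
    Service("ivo://example/sia", "SIA", "http://example.org/sia",
            subjects=["spiral galaxies"]),
    Service("ivo://example/cone", "ConeSearch", "http://example.org/cone",
            subjects=["stars"],
            columns=[Column("dist", "pos.distance")]),
]


def access_urls(services, cap_type):
    """'find all <cap_type> services; return their accessURLs'"""
    return [s.access_url for s in services if s.cap_type == cap_type]


def by_type_and_subject(services, cap_type, word):
    """'find all <cap_type> services that might have <word>'"""
    word = word.lower()
    return [s for s in services
            if s.cap_type == cap_type
            and any(word in subj.lower() for subj in s.subjects)]


def with_column_ucd(services, ucd):
    """'find all searchable catalogs that provide a column with <ucd>'"""
    return [s for s in services if any(c.ucd == ucd for c in s.columns)]
```

Note that the subject match here is a naive substring test; what a "subject" or keyword match should mean is exactly the kind of thing a standard interface would have to pin down.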
I would like to query the registry at a little more fine grained level:

  • find all SSAP services that provide time series, not spectra (or the opposite).
  • find all SSAP services that provide theoretical spectra, not observational ones (or the opposite).
  • find all services of any type that provide galaxy spectra, not stellar ones.
  • find all ConeSearch services that provide stellar distance information.
  • find all services that provide theoretical isochrones.
This could be better done having a better description of services:

- Having an "object type" keyword with values "galaxy, star, brown dwarf, etc"

- Having a "data type" keyword with values like "spectrum, time series, datacube, isochrone, distance, photometry, etc".

These keywords should be multivalued (a service giving distances and photometry for stars and galaxies). And, even though one could imagine them as a predefined set of values, I think it's more practical to leave them as free text. I imagine that the registry web form could contain a select field with all the values previously used by other services so that the registrant can choose among them and thus try to avoid too many different words with almost the same meaning.

- Having a "data origin" keyword to specify if the service provides observational or theoretical data (this could be done now using "content type=simulation" for theoretical data, but it is quite confusing, because the other options don't need to correspond to observational data either).

If the user has the option to choose before querying the registry for services, then the query itself would be more powerful, and so the following use cases would make sense:

  • find all object types having services providing spectra
  • find all data types (spectra, theoretical isochrones,...) for a specific object type
--CRB

Query Interactions

This section looks at possible service interaction patterns.

Single Resource resolution

  1. Input: identifier
    Output: a full VOResource record
    • This might be most useful for browsing purposes --RP
  2. GetCapabilities
    Input: identifier
    Output: a list of supported capabilities
  3. GetTables
    Input: identifier
    Output: a VOResource TableSet description
  4. Get the accessURL
    Input: identifier, capability type (by standardID?)
    Output: access URL
    • Considered most important to desktop apps and scripts --RP
I am not so sure if GetCapabilities and GetTables are important operations on the registry since VOSI now requires the respective endpoints on the services themselves. But a quick way to get an access URL from an ivoa id would certainly be most welcome. We'd need to specify the answer for ids that have no access URLs, though. As to the retrieval of the whole record, I think that is sufficiently covered by OAI-PMH. --MD
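
To make the "access URL from an ivoa id" operation concrete, here is a minimal sketch over an in-memory lookup table. The REGISTRY structure, the function name get_access_urls, and the UnknownIdentifier exception are assumptions made for this example. The choice shown for identifiers without access URLs (an empty list for a registered resource, an error for an unknown identifier) is just one possible answer to the open question above, not an agreed one.

```python
# Hypothetical in-memory registry: identifier -> list of (standardID, accessURL).
# The example.org identifiers and URLs are invented.
REGISTRY = {
    "ivo://example/images": [
        ("ivo://ivoa.net/std/SIA", "http://example.org/sia?"),
        ("ivo://ivoa.net/std/TAP", "http://example.org/tap"),
    ],
    # An organisation record: registered, but with no capabilities at all.
    "ivo://example/org": [],
}


class UnknownIdentifier(KeyError):
    """Raised when the identifier is not in the registry at all."""


def get_access_urls(ivoid, standard_id=None):
    """Return the access URLs of a resource, optionally for one standardID.

    A registered resource without a matching capability yields an empty
    list; an identifier the registry does not know raises UnknownIdentifier.
    """
    try:
        capabilities = REGISTRY[ivoid]
    except KeyError:
        raise UnknownIdentifier(ivoid) from None
    return [url for std, url in capabilities
            if standard_id is None or std == standard_id]
```

Returning a list rather than a single URL also leaves room for resources that expose several interfaces for the same capability.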

One issue that I cannot dismiss by handwaving towards ADQL is the "keyword search" thing. In a mail to the registry mailing list dated 2011-05-24T10:40:38, PaulHarrison notes that the current state of wildly differing responses of different registries to identical keywords is a pain. While I would stipulate that quite a few of the differences are due to registries in fact holding significantly differing data, I agree the concept of "keyword search" needs some clarification. This is particularly true since offering "google-style" queries is highly desirable when everyone feels they know how to use google. Now, ADQL has no built-in provisions for "IR-like" queries (where IR here means Information Retrieval). One way to work around this could be a user-defined function registry interfaces should provide, maybe along the lines of Postgres' Text Search functions and operators -- say, haswords("quasar recent", description). --MD
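
As a way of pinning down what such a function might mean, here is one possible reading of haswords written in Python. The semantics are an assumption for the sake of discussion: every query word must occur as a whole word in the text, case-insensitively, with no stemming and no ranking. A real registry would more likely delegate this to its database's text search.

```python
import re

_WORD = re.compile(r"[a-z0-9]+")


def haswords(words, text):
    """Return True if every word in `words` occurs as a whole word in `text`.

    Matching is case-insensitive and ignores punctuation. No stemming
    or ranking is attempted, and an empty query matches nothing.
    """
    wanted = set(_WORD.findall(words.lower()))
    present = set(_WORD.findall(text.lower()))
    return bool(wanted) and wanted <= present
```

Under this reading, haswords("quasar recent", "A catalogue of quasars") is false because "quasars" is not "quasar". Whether plural forms should match is one of the details that would need to be specified if registries are to give comparable answers.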

Search Queries

Search queries return a list of records that match a set of input constraints.

We note that a resource record can, in general, have zero or more capabilities associated with it, and each capability can have multiple interfaces (e.g. supporting different versions of a capability), each with its own primary accessURL. Thus, the result records returned from search queries could represent ...

  • a list of resources
  • a list of capabilities
    • depending on the output details, this is probably the most convenient record type for clients looking to access services --RP
  • a list of interfaces
In principle, these could come back in a variety of formats, including table formats. Desktop applications, portal apps, and workflow builders will usually want to get a few pieces of information (most notably, a service's accessURL; see opening motivation above). It would be both easier for these clients and more efficient if the client could specify exactly what information they want back.
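
The idea of a client naming exactly the fields it wants back can be sketched as a simple projection over capability-level rows, here serialised as CSV. The row layout and field names (ivoid, title, cap_type, access_url) are hypothetical; they stand in for whatever columns a real interface would expose.

```python
import csv
import io

# Hypothetical capability-level rows; identifiers and URLs are invented.
ROWS = [
    {"ivoid": "ivo://example/images", "title": "Example Images",
     "cap_type": "SIA", "access_url": "http://example.org/sia?"},
    {"ivoid": "ivo://example/images", "title": "Example Images",
     "cap_type": "TAP", "access_url": "http://example.org/tap"},
]


def select(rows, fields):
    """Project each row onto the requested fields, in the requested order."""
    unknown = [f for f in fields if rows and f not in rows[0]]
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return [{f: row[f] for f in fields} for row in rows]


def to_csv(rows, fields):
    """Serialise the selected fields as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(select(rows, fields))
    return buf.getvalue()
```

A client interested only in access URLs would call to_csv(ROWS, ["cap_type", "access_url"]) and never see the rest of the record.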

Possible output forms

  1. an XML list of VOResource records
    • This is provided by the current RI search interface; however, there are no known cases where this output is the most desired. As long as VOResource records are available individually via single identifier resolution, we can probably do without supporting this type of query. --RP
  2. VOTable format
    • This format would be convenient for both portal clients that might want to display (and allow the user to manipulate) the results and scripters (with access to an easy to use VOTable parser) --RP
  3. CSV table format
    • This format, depending on the specific clients, might be most convenient for scripters and specialized desktop tools (Topcat? - Nah - VOTable is fine for me -- MT ) --RP
  4. a list of space-delimited identifiers
Questions:
  1. Would it be useful to allow a user to select a non-standard format through the standard search interface?
If we use TAP as an access protocol, this question largely becomes moot. People will get VOTables by default, but if the system supports other formats, they can get data in those, too. The only exception is the list of VOR records, since they would probably not be accessible through TAP. These, however, are already covered by OAI-PMH, which the registry needs to speak anyway. I think OAI-PMH has enough expressivity for those applications that actually want to deal with VOR records. --MD

I don't think that multiple output formats are very useful. The clients will normally be software of some sort rather than a human directly talking to the registry service (no?), and such clients can easily transform from a single format (presumably VOTable) to whatever makes sense for the user. -- MT

Ways to constrain search results

  1. keywords: find resources related to a particular subject
  2. keywords + capability type: find SIA services related to a particular subject
    • not supported by the current RI; a number of developers have since requested this. --RP
  3. ADQL/SQL-like constraints
    • Given current experience with the RI search interface, it is unlikely that clients will need to place constraints on any arbitrary data available in a VOResource record. That is, not all VOResource metadata need be searchable. --RP
    • Despite the above statement, the query constraint must be extensible to the different specialized capabilities (e.g. TAP, SIA, etc.) including future ones. In other words, we should not have to update the new RI spec every time a new registry extension is added. --RP
  4. ADQL/SQL-like constraints + keywords: find resources related to a particular subject where publisher like %NASA%
    • I have found queries of this type particularly helpful for exploring the contents of the registry for administrative purposes. Publishers could use this capability to explore how their own resources respond to different keyword queries. --RP
I'm all for straight ADQL as the default query language in RI, but then I'm biased because that would be very easy for me. Still, most astronomers wanting to use the VO will -- I hope! -- learn ADQL, and not supporting it on our registries will not look good. This means we'll need to store the essential aspects of VOResource in relational tables. So, what about extensibility? Ray, do you have examples of those specialized capabilities? The example I can think of, data model support in TAP services, could be covered using a table containing tuples of ivo id, keyword, and value -- maybe that would be sufficient for most of these? --MD
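
The tuple-table idea can be tried out quickly with SQLite. In this sketch the table name res_detail, the keyword strings (tap.datamodel, sia.imagetype) and all identifiers are made up for illustration; the point is only that a new registry extension adds rows with new keywords rather than new columns or tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (resource, keyword, value); names are hypothetical.
    CREATE TABLE res_detail (
        ivoid   TEXT NOT NULL,
        keyword TEXT NOT NULL,
        value   TEXT NOT NULL
    );
    INSERT INTO res_detail VALUES
        ('ivo://example/tap',  'tap.datamodel', 'ivo://example/dm/A'),
        ('ivo://example/tap',  'tap.datamodel', 'ivo://example/dm/B'),
        ('ivo://example/tap2', 'tap.datamodel', 'ivo://example/dm/A'),
        ('ivo://example/sia',  'sia.imagetype', 'cutout');
""")


def resources_with(conn, keyword, value):
    """Return identifiers of resources carrying the given keyword/value pair."""
    rows = conn.execute(
        "SELECT DISTINCT ivoid FROM res_detail"
        " WHERE keyword = ? AND value = ? ORDER BY ivoid",
        (keyword, value),
    )
    return [ivoid for (ivoid,) in rows]
```

The trade-off is the usual one for key-value tables: the schema stays fixed, but values are untyped text and multi-condition queries need self-joins.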

Requirements

Here we list any functional and technical requirements (some deriving from the above use cases).

  • the interface should be REST-like, based primarily on HTTP GET.

    • I'd still prefer we just use TAP as specified, plus OAI-PMH; this may be a bit more work for those registry operators that don't have TAP services yet, but it sure beats defining a new interface. If we feel the need for custom shortcuts, I'd guess PQL would help. And sync TAP lets you have GET-only operation. --MD
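
For reference, a GET-only synchronous TAP request is just a URL with a handful of parameters. The sketch below builds one; the parameter names (REQUEST, LANG, FORMAT, QUERY) follow my reading of TAP 1.0, while the base URL and the table name in the query are invented placeholders.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url, adql, fmt="votable"):
    """Build a GET URL for a synchronous TAP query against `base_url`."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,
        "QUERY": adql,
    }
    # urlencode percent-encodes the values and turns spaces into '+'.
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical service URL and table name, for illustration only.
url = tap_sync_url("http://example.org/tap/", "SELECT ivoid FROM resources")
print(url)
```

Such a URL can be fetched from a script with any HTTP client, which covers the "scriptable" part of the motivation without a new protocol.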

Implementation Considerations

Here we collect considerations for implementations that do not necessarily derive from requirements (but may feed into them).

XML vs. Relational DB backends

The current RI v1.0 search interface reflects an attempt to provide a search interface that could be implemented against either an XML database backend or a relational one. Can this be achieved with a new, RESTful design? Can at least part of the interface be database agnostic? Is it worth trying?

PaulHarrison on the mailing list (mail dated 2011-05-24T10:40:38) argued that if we move RI towards TAP/ADQL then a "full relational registry data model should be attempted" and in consequence "we probably should stop using XML schema to describe the registry model". While I'd always advocate avoiding XML schema wherever possible, I disagree here on dropping current VOResource practice. We are part of the bibliographic community, and OAI-PMH is XML-based. In my view, it is the "primary source" of our registry information (which essentially is just a collection of bibliographic records). I'd much rather see a map of "as many" of VOResource features as "proportional" (to the purposes defined by the use cases) to some relational model. --MD

I have put together a page with some modelling of a Relational RegistryDM --PH

TAP Interface

A TAP interface that supports a "Registry Data Model" has an advantage for developers (in particular, scripters) because it is potentially a query interface they already know.

  • A simplified view of the VOResource model could probably be mapped to three tables: resources, capabilities, and table. Some DB Views that dynamically join these tables could make things more convenient for TAP users. For example, one view could combine capability information (including the accessURLs) with resource level metadata (like identifier, shortName, and title). --RP
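
The three-table layout and a convenience view can be prototyped in SQLite as follows. All table, view, and column names here are guesses made for this sketch, not a proposed schema, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified schema: resources, capabilities, tables.
    CREATE TABLE resources (
        ivoid TEXT PRIMARY KEY, short_name TEXT, title TEXT
    );
    CREATE TABLE capabilities (
        ivoid TEXT REFERENCES resources(ivoid),
        cap_type TEXT, access_url TEXT
    );
    CREATE TABLE tables (
        ivoid TEXT REFERENCES resources(ivoid),
        table_name TEXT, description TEXT
    );
    -- View joining capability data with resource-level metadata.
    CREATE VIEW service_view AS
        SELECT r.ivoid, r.short_name, r.title, c.cap_type, c.access_url
        FROM resources r JOIN capabilities c ON c.ivoid = r.ivoid;

    INSERT INTO resources VALUES
        ('ivo://example/images', 'EXIMG', 'Example Images'),
        ('ivo://example/org', 'EXORG', 'Example Organisation');
    INSERT INTO capabilities VALUES
        ('ivo://example/images', 'SIA', 'http://example.org/sia?'),
        ('ivo://example/images', 'TAP', 'http://example.org/tap');
""")

# "find all TAP services; return their accessURLs", asked of the view.
tap_rows = conn.execute(
    "SELECT short_name, access_url FROM service_view WHERE cap_type = 'TAP'"
).fetchall()
```

Because the view is an inner join, resources without capabilities (such as the organisation record) do not appear in it, which suits clients that are only looking for services to call.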
Topic revision: r11 - 2013-02-26 - GretchenGreene
 