IVOA
 Registry Working Group
International Virtual Observatory Alliance

VOResource: a Case Study in rendering a Metadata Model in XML Schema

Ray Plante (NCSA/UIUC)
17 Apr 2003

Abstract

This article gives a brief overview of the VOResource XML Schema (v0.1), a metadata model based on the document, "Resource and Service Metadata" (Hanisch et al. 2002). It includes figures to aid visualizing the model. This overview is followed by a discussion of the general approach taken to the XML modeling process. I am using this exercise to develop an XML Schema authoring style that is well suited to defining metadata in general. I plan to turn my lessons learned into a list of recommendations (to appear in a separate document) for rendering data models in XML Schema that will encourage smooth extensibility and aid the standardization process.

Contents

1. VOResource Overview
1.1. Common Resource Metadata
1.2. Support for Simple Image Access Protocol
1.3. Sample XML Document
2. General Approach
2.1. Clarity and Reuse
2.2. Extensibility
2.3. Integration with Tools
2.4. Conclusion

1. VOResource Overview

1.1 Common Resource Metadata

The starting principle of the resource data model for the VOResource schema is that anything can be a resource. An organization, a service, and a data collection are examples of resources (see Figure 1). This schema first attempts to capture all of the metadata that is potentially common to all types of resources. This commonality defines the generic Resource element. Figure 2 illustrates the structure of this metadata; the yellow box (representing a genericResource type) encloses the common Resource metadata. It includes three main categories: Curation, Content, and Coverage.

Figure 1: The different types of Resources

Figure 2: Generic Resource and Service Metadata

The second principle recognizes that different types of resources require additional metadata that is not common to the other types. This schema provides a way to "hook" on this extra metadata using a combination of XML type extension and XML substitution groups (see section 2.2 below). In a real sense, the Organization, Project, DataCollection, and Service elements inherit from the generic Resource. Resources that don't match one of these "sub-elements" could be described by the generic Resource element, or additional sub-elements could be defined to accommodate them. The arrows in Figures 1 and 2 indicate that a sub-element can be substituted anywhere that a Resource is expected. The VODescription element is provided to allow a listing of Resources of different types.

The Service element extends Resource by adding two child elements: Capability and Interface (see Figure 3).

Figure 3. Generic Service Metadata.

The Interface element is an extension point for describing different types of interfaces, such as Web browser-driven forms (WebBrowser) and Web Services (WebService). The WebService element simply provides a place, InterfaceURL, to point to a WSDL document.

The Capability element is the place for all metadata that does not fit well into the interface description. The StdCapability sub-element is meant for describing standard services such as Simple Image Access or Cone Search. This is described below.

1.2 Support for Simple Image Access Protocol

The metadata specific to Simple Image Access has been separated into another schema, VOStdService. The SIA's interface is described with a ParamHTTPGet element (a sub-element of Interface); this holds the supported input parameters, the BaseURL, and the MIME type of the output (see Figure 4). The SIA specification defines a special set of metadata which are interpreted here as part of its "Capability". In this schema, the StdCapability is extended to produce the SimpleImageAccess to hold that metadata, as shown in Figure 5. The VOTableColumns element is meant to list the VOTable Field tags that describe the VOTable output columns. In both figures below, the "vot" namespace prefix indicates elements borrowed from the VOTable schema.
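
As a rough illustration of what a ParamHTTPGet description gives a client, the sketch below combines a base URL with a set of input parameters to form an HTTP GET request URL. It is only a sketch under stated assumptions: the function name build_get_url and the base URL are hypothetical, and the parameter values are invented; POS, SIZE, and FORMAT are input parameters named in the SIA specification.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Join a base URL and query parameters into a single GET URL."""
    if base_url.endswith(("?", "&")):
        separator = ""   # base URL is already prepared for parameters
    elif "?" in base_url:
        separator = "&"  # base URL already carries some parameters
    else:
        separator = "?"
    return base_url + separator + urlencode(params)


# Hypothetical service address and example values (assumptions, not from adil.xml).
url = build_get_url(
    "http://example.org/sia",
    {"POS": "180.0,-30.0", "SIZE": "0.5", "FORMAT": "image/fits"},
)
print(url)
```

A registry that stores the base URL and the supported parameters separately, as ParamHTTPGet does, lets a generic client assemble such requests without service-specific code.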

Figure 4. SIA Interface represented as a ParamHTTPGet element.

Figure 5. Simple Image Access and related metadata.

I note that the separation between VOService and VOStdService is not as clean as it should be. (The separation was mainly guided by experimentation and debugging needs.) The ParamHTTPGet element should be moved to VOResource as one of the general service interface types. Furthermore, VOStdService should be specialized (at least in name) to just SIA. Other standard services should have their specific metadata in their own separate XML Schema documents.

1.3 Sample XML Document

The example, adil.xml, uses a VODescription to list four resources associated with the NCSA Astronomy Digital Image Library: a Project, a DataCollection, and two Service records. The first service is just the browser-based search page for the Library, and the second is the SIA interface. Note that the author did not feel the need to fill in values for every possible metadata tag in the schema. In particular, within the SIA's service description, the supported input query parameters and the output columns were not included; this information can be added later by an automated service verifier.
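
To give a feel for how such a listing can be consumed, the sketch below parses a small stand-in document with Python's standard ElementTree module and reports the type of each resource. The document is not adil.xml: its content is invented, namespaces are omitted for brevity, and the Title child element is an assumption made only so that each record has something to display.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for a VODescription listing (not the real adil.xml).
DOCUMENT = """
<VODescription>
  <Project><Title>Example Image Library</Title></Project>
  <DataCollection><Title>Example Image Collection</Title></DataCollection>
  <Service><Title>Example Search Page</Title></Service>
  <Service><Title>Example SIA Service</Title></Service>
</VODescription>
"""

root = ET.fromstring(DOCUMENT)

# The element name tells us what kind of resource each record describes.
listing = [(child.tag, child.findtext("Title")) for child in root]

for resource_type, title in listing:
    print(f"{resource_type}: {title}")
```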

2. General Approach

The various goals I've tried to keep in mind fall into four categories; that is, the XML Schema should:
  1. achieve good clarity in representing a data model,
  2. encourage reuse of metadata as building blocks for other schemas,
  3. provide clear paths to extensibility, and
  4. operate well with widely available XML tools.

2.1. Clarity and Reuse

I attempt to achieve the first goal with a schema model that is a close reflection of how we think about and use metadata. If the match is good, it should be easy to automatically transform the XML Schema (using XSLT) into a human-readable metadata dictionary that uses a minimum of XML jargon to describe itself. To achieve this, the schema model needs to capture the components that define a metadatum:
  1. a name,
  2. a meaning,
  3. a type, and
  4. a value matching that type.
In my mind, when it comes to handling metadata, the meaning is as important as the type--but they are not the same thing. A type defines the form the value comes in; it's a container with no meaning. For example, calling something an integer or a string doesn't tell you what the value represents, and an integer can be used to hold values that represent very different things. A value of a certain type becomes a metadatum when meaning is attached to it, represented by its name.
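
The idea of deriving a metadata dictionary from the schema can be sketched as follows. The text proposes XSLT for this; the sketch substitutes Python's standard ElementTree module purely for illustration. The two element definitions, their types, and their descriptions are invented, and the use of xs:documentation annotations as the source of the human-readable description is an assumption of this sketch rather than something the VOResource schema is known to do.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema fragment: two globally defined, documented elements.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Title" type="xs:string">
    <xs:annotation>
      <xs:documentation>the full name of a resource</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Frequency" type="xs:double">
    <xs:annotation>
      <xs:documentation>a frequency in the electromagnetic spectrum</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
"""


def metadata_dictionary(schema_text):
    """Collect name, type, and description of each global element."""
    root = ET.fromstring(schema_text)
    entries = []
    for element in root.findall(XS + "element"):
        meaning = element.findtext(
            XS + "annotation/" + XS + "documentation", default=""
        )
        entries.append({
            "name": element.get("name"),
            "type": element.get("type"),
            "meaning": meaning.strip(),
        })
    return entries


dictionary = metadata_dictionary(SCHEMA)
for entry in dictionary:
    print(f"{entry['name']} ({entry['type']}): {entry['meaning']}")
```

The value, the fourth component of a metadatum, does not appear here because it lives in instance documents rather than in the schema.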

In object-oriented languages, the distinction between type and meaning is fuzzy. A Vector class is a generic container that carries no particular meaning; its methods are usually about accessing its components. A Screen class, on the other hand, has meaning, and therefore, it might include other methods that control how it works meaningfully with other classes, like Window and Scrollbar. Fortunately, XML Schema makes a distinction between elements and types which we can take advantage of.

Thus, the metadatum components are captured in the following way:

This provides a natural way of interpreting an XML instance document, independent of a schema: the element name represents the meaning of the values it contains.

Also as a way of aiding clarity, a metadatum should have a general meaning that is independent of how it is used. Its role may be somewhat different when it is used as a component of different, more complex metadata. Nevertheless, its general meaning does not change, and its type does not change. To enforce this,

(The corollary to this is that complex elements will define their content using the ref= mechanism.) Not only does this encourage the reuse of elements to describe different things, it enforces consistent use of the element. A "Frequency" element can be used to describe a bandwidth or an observation. In both cases, the meaning of Frequency and the form that it takes should be the same.
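
A small sketch may make the ref= mechanism concrete. In the invented schema fragment below, Frequency is defined once at the top level and then referenced from two container elements; the Python code simply checks that every ref= points at a globally defined element. The container names Bandwidth and Observation, the xs:double type, and the omission of a target namespace are all simplifying assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented fragment: one global Frequency element, reused by reference.
schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Frequency" type="xs:double"/>
  <xs:element name="Bandwidth">
    <xs:complexType>
      <xs:sequence><xs:element ref="Frequency"/></xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Observation">
    <xs:complexType>
      <xs:sequence><xs:element ref="Frequency"/></xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""")

# Global elements are the direct children of the schema root.
global_names = {e.get("name") for e in schema.findall(XS + "element")}

# Every element declaration carrying a ref attribute, at any depth.
refs = [e.get("ref") for e in schema.iter(XS + "element") if e.get("ref")]

unresolved = [r for r in refs if r not in global_names]
print("references:", refs)
print("unresolved:", unresolved)
```

Because both containers refer to the same declaration, Frequency necessarily has the same type wherever it appears.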

The few elements that are defined locally (i.e. within a complexType without using a ref attribute) are ones that have no particular meaning of their own but just provide structure. For example, an item element is defined to delimit elements in a string array. Capitalization is used to emphasize the difference:

2.2. Extensibility

The Resource metadata is a good example of why extensibility of the metadata model is important. New standard services will have unique metadata associated with them, and so we need a way to integrate them as we go.

An important XML Schema feature that VOResource uses to enable extension is the substitutionGroup attribute in the definition of elements. This declares that an element can be used in place of another; when combined with XML Schema's version of inheritance--namely, extension and restriction--it provides a form of polymorphism. In the VOResource example, a Service element can be used anywhere a Resource element is expected. The difference is that the Service element will contain service-specific metadata not applicable to a generic Resource.
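
The combination of type extension and substitution groups might look roughly like the schema fragment embedded in the sketch below. The fragment is invented for illustration and is not copied from VOResource: the type names ResourceType and ServiceType, the Title element, and the simplified string-valued Interface are assumptions, and namespaces are omitted for brevity. The Python code only reads the fragment and reports which elements declare themselves substitutable for Resource.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented fragment: ServiceType extends ResourceType, and Service
# joins the substitution group headed by Resource.
schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Title" type="xs:string"/>
  <xs:element name="Interface" type="xs:string"/>
  <xs:complexType name="ResourceType">
    <xs:sequence><xs:element ref="Title"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="ServiceType">
    <xs:complexContent>
      <xs:extension base="ResourceType">
        <xs:sequence><xs:element ref="Interface"/></xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Resource" type="ResourceType"/>
  <xs:element name="Organization" type="ResourceType"
              substitutionGroup="Resource"/>
  <xs:element name="Service" type="ServiceType"
              substitutionGroup="Resource"/>
</xs:schema>
""")


def substitutes_for(head):
    """Names of global elements that may stand in for `head`, transitively."""
    found = []
    for element in schema.findall(XS + "element"):
        if element.get("substitutionGroup") == head:
            found.append(element.get("name"))
            found.extend(substitutes_for(element.get("name")))
    return found


print(substitutes_for("Resource"))
```

A new resource type defined in a separate schema would join the group the same way, without any change to the schema that defines Resource.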

Proper declaration and use of namespaces is important when extending or reusing metadata. For example, VOStdResource draws upon the definitions in both VOResource and VOTable. In order to allow other schemas to extend or reuse elements in a schema, I found it is important to always define a targetNamespace. Although the current official VOTable follows the pattern suggested in the previous section (e.g. globally defined elements), it does not define a namespace, preventing me from mixing VOTable elements with elements from VOStdResource. To get around this, I created a new version of VOTable.xsd, slightly modified to define the target namespace.
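
The practical effect of a namespace can be seen by parsing a small mixed document. In the sketch below, the namespace URIs and the document content are invented for illustration; the "vot" prefix follows the convention used in the figures, and the "vs" prefix is an assumption. Elements that belong to a namespace can be attributed to their schema of origin, while an element declared without one cannot.

```python
import xml.etree.ElementTree as ET

# Invented instance fragment mixing two namespaces; the last FIELD
# element is deliberately left without any namespace.
DOCUMENT = """
<vs:SimpleImageAccess xmlns:vs="http://example.org/VOStdService"
                      xmlns:vot="http://example.org/VOTable">
  <vs:VOTableColumns>
    <vot:FIELD name="ra"/>
    <FIELD name="dec"/>
  </vs:VOTableColumns>
</vs:SimpleImageAccess>
"""


def split_tag(tag):
    """Split an ElementTree tag into (namespace URI or None, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


root = ET.fromstring(DOCUMENT)
origins = [split_tag(element.tag) for element in root.iter()]

for namespace, local in origins:
    print(f"{local}: {namespace or '(no namespace)'}")
```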

2.3. Integration with Tools

There are a number of tools that make it easy to build XML-based applications. Most notable are those that automate the building of Web Services; these include, for example, Java's APIs for XML Binding and Messaging (JAXB and JAXM), Apache's Axis, and Microsoft's XSD. The power of these tools is in their ability to automatically generate software classes directly from the XML Schema document. Tools that help integrate XML with relational databases are also expected to be important. A schema authoring style that allows for the use of these tools ultimately reduces the amount of software VO developers will need to write.

Experimentation with such tools is ongoing. Developers at STScI/JHU (Wil O'Mullane and Gretchen Greene) have used Microsoft's XSD to create C# classes. It appears that substitution groups are not well supported by current tools; however, it is expected that they will be before long.

2.4. Conclusion

This exercise has revealed several straightforward patterns for defining and extending metadata using XML Schema, most of which have not been described here. These will be enumerated in detail in a subsequent document.
Last modified: Tue Apr 22 13:09:50 2003