The standard format of IVOA Vocabularies

Version 0.10

IVOA Note 2007 November XX

Editor(s): Andrea Preite Martinez, Frederic V. Hessman . . .

Author(s): Sebastian Derriere, CDS Strasbourg, France, Norman Gray, Glasgow, UK, Frederic Hessman, Georg-August-Universität Göttingen, Germany, Tony Linde, Leicester, UK, Andrea Preite Martinez, IASF-Roma, Italy, Alexandre Richard, IASF-Roma, Italy, Rob Seaman, NOAO, USA, Brian Thomas, University of Maryland, USA . . .


IVOA VOcabularies are named dictionaries consisting of a set of string tokens representing astrophysical concepts, data, objects, structures, devices, and processes. The tokens of a dictionary can be used to help identify, label, classify, and/or automatically process astrophysical information within Virtual Observatory (VO) or external contexts. We propose that the dictionaries be stored in SKOS (Simple Knowledge Organization System), a proposed W3C standard. SKOS provides a standard way to represent knowledge organisation systems such as thesauri, classification schemes, subject heading systems and taxonomies, using the Resource Description Framework (RDF). The use of a standard format will not only enable different groups to define and maintain their own specialized VOcabularies while letting the rest of the astronomical community access and use them, but will also enable access to and manipulation of these VOcabularies using standard tools already developed for the semantic web. Several examples of VOcabularies formatted in SKOS are presented.

Status of This Document

This is a Note. The first release of this document was YYYY Month DD. This is an IVOA Note expressing suggestions from and opinions of the authors. It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory. It should not be referenced or otherwise interpreted as a standard specification.

1 Introduction

Astronomical information of relevance to the Virtual Observatory (hereafter "VO") is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things like position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables and can now be identified using IVOA Unified Content Descriptors (hereafter "UCD"). However, astrophysical concepts and quantities come with a wide variety of names, identifications, classifications, and associations, most of which cannot be described or labeled via UCD.

Formally, one needs an ontology - a systematic mathematical description of how the concepts are both named and connected with each other - in order to process astronomical information by computer to any depth of complexity. On the other hand, there are many uses of the VO where it would be perfectly adequate to enable computers to handle astronomical tokens that intelligent humans have standardized and for which context-specific processing can be pre-defined.

One of the best examples of the need for a simple token-based vocabulary within the VO is VOEvent, the VO standard for handling astronomical events. If someone broadcasts ("publishes") the occurrence of an event, the implication is that someone else is going to want to respond to it. No institution is interested in all possible events, however, so some standardized information about what the event "is about" is necessary, in a form which ensures that the parties communicate effectively. If a "burst" is announced, is it a Gamma-Ray Burst due to the collapse of a star in a distant galaxy, a solar flare, or the brightening of an accretion disk around a star or an AGN? If a publisher doesn't use the label one would have expected, how is one to guess what other equivalent labels might have been used? Thus, rather than waiting for someone to perform the Herculean task of creating a useful VO ontology for astrophysics, most of us would be very happy simply to agree on how we label certain things, independent of what those things mean to individual researchers or computer processes.

There have been many attempts to create something less than a full astrophysical ontology - call them "vocabularies" or "taxonomies" - for astronomical purposes.

• The Second Reference Dictionary of the Nomenclature of Celestial Objects (Lortet, Borde & Ochsenbein 1994) [3] contains 500 pages (!) of astronomical nomenclature.

• For decades, professional journals have used a set of reasonably compatible keywords to help classify the content of whole articles. These keywords have been analyzed by Preite Martinez & Lesteven (2007), who derived from them a set of common keywords constituting one of the potential bases for an official VO vocabulary. A similar but less formal attempt was made by Hessman (2005) for the VOEvent working group, resulting in a similar list.

• Astronomical databases generally use simple sets of keywords - sometimes hierarchically organized - to aid the users in the querying of the databases. Two examples from totally different contexts are the list of object types used in the Simbad database and the search keywords used in the educational Hands-On Universe(TM) image database portal.

• The Astronomical Outreach Imagery (AOI) working group has created a simple taxonomy for helping to classify images used for educational or public relations purposes.

• Preite Martinez & Lesteven (2007) also attempted to derive a set of common concepts by analyzing the contents of abstracts in journal articles; the resulting list should contain more up-to-date tokens/concepts than the old list of journal keywords.

• Remote Telescope Markup Language [4], a document definition for the transfer of observing requests that has been adopted by the Heterogeneous Telescope Network (HTN) Consortium [5] and is indirectly supported by the VOEvent protocol, currently contains several telescope and observation-related taxonomies of terms (e.g. for devices, filters, objects).

The purpose of this document is to define a VO-wide standard format for such vocabularies. While the definition of the vocabulary format does specify how such vocabularies are to be encoded (in the form of an XML document with standard properties), it does not prescribe how they are stored, published, transmitted, used or processed. Several examples of vocabularies that could be useful in contexts within and external to the VO are presented.
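
To make the idea of an XML-encoded vocabulary concrete, the following is a minimal sketch, not a normative example, of two SKOS concepts serialized as RDF/XML and read back with Python's standard library. The concept URIs under http://example.org/vocab#, the labels, and the helper names load_concepts and find_by_label are assumptions made purely for illustration; only the RDF and SKOS namespaces and the skos:Concept, skos:prefLabel, skos:altLabel and skos:broader terms come from the W3C specifications.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical two-concept vocabulary; URIs and labels are illustrative only.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab#burst">
    <skos:prefLabel xml:lang="en">burst</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/vocab#gammaRayBurst">
    <skos:prefLabel xml:lang="en">gamma-ray burst</skos:prefLabel>
    <skos:altLabel xml:lang="en">GRB</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/vocab#burst"/>
  </skos:Concept>
</rdf:RDF>
"""


def load_concepts(rdf_xml):
    """Return {concept URI: labels and broader links} from an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        concepts[uri] = {
            "prefLabel": node.findtext(f"{{{SKOS}}}prefLabel"),
            "altLabels": [e.text for e in node.findall(f"{{{SKOS}}}altLabel")],
            "broader": [
                e.get(f"{{{RDF}}}resource")
                for e in node.findall(f"{{{SKOS}}}broader")
            ],
        }
    return concepts


def find_by_label(concepts, label):
    """Map a preferred or alternative label (case-insensitive) to a concept URI."""
    wanted = label.strip().lower()
    for uri, concept in concepts.items():
        labels = [concept["prefLabel"], *concept["altLabels"]]
        if any(text and text.lower() == wanted for text in labels):
            return uri
    return None


CONCEPTS = load_concepts(SAMPLE)
```

In this sketch, a publisher's label "GRB" and a subscriber's label "gamma-ray burst" resolve to the same concept, and the broader link ties that concept to the more general "burst". A production tool would more likely use a full RDF library than this element-level parsing, which handles only the one RDF/XML layout shown here.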

2 The Format of IVOA VOcabularies

3 Example VOcabularies

