Vocabularies in the Virtual Observatory, v0.90

IVOA Working Draft, 2008 February 14 [DRAFT Revision: 61 ]

Working Group: Semantics
This version: http://www.ivoa.net/twiki/bin/view/IVOA/IvoaSemantics
http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/vocabularies-0.90.xhtml
Latest version: http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies
and issues list
Editors: Alasdair J G Gray, Norman Gray, Frederic V Hessman and Andrea Preite Martinez
Authors: Sébastien Derriere, Alasdair J G Gray, Norman Gray, Frederic V Hessman, Tony Linde, Andrea Preite Martinez, Rob Seaman and Brian Thomas

Abstract

As the astronomical information processed within the Virtual Observatory becomes more complex, there is an increasing need for a more formal means of identifying quantities, concepts, and processes not confined to things easily placed in a FITS image, or expressed in a catalogue or a table. We propose that the IVOA adopt a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organisation System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of current, open standards ensures that VO applications will be able to tap into resources of the growing semantic web. Several examples of useful astronomical vocabularies are provided, including work on a common IVOA thesaurus intended to provide a semantic common base for VO applications.

Status of this document

This is (an internal draft of) an IVOA Working Draft. The first release of this document was 2008 February 14.

This document is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than work in progress.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgments

We would like to thank the members of the IVOA semantic working group for many interesting ideas and fruitful discussions.

1 Introduction (informative)

1.1 Vocabularies in astronomy

Astronomical information of relevance to the Virtual Observatory (VO) is not confined to quantities easily expressed in a catalogue or a table. Fairly simple things such as position on the sky, brightness in some units, times measured in some frame, redshifts, classifications or other similar quantities are easily manipulated and stored in VOTables and can currently be identified using IVOA Unified Content Descriptors (UCDs) [std:ucd]. However, astrophysical concepts and quantities use a wide variety of names, identifications, classifications and associations, most of which cannot be described or labelled via UCDs.

There are a number of basic forms of organised semantic knowledge of potential use to the VO, ranging from informal folksonomies (where users are free to choose their own labels) at one extreme, to formally structured vocabularies (where the label is drawn from a predefined set of definitions, and which can include relationships between labels) and ontologies (where the domain is captured in a formal data model) at the other. More formal definitions are presented later in this document.

An astronomical ontology is necessary if we are to have a computer (appear to) `understand' something of the domain. There has been some progress towards creating an ontology of astronomical object types [std:ivoa-astro-onto] to meet this need. However, there are distinct use cases for letting human users find resources of interest through search and navigation of the information space. The most appropriate technology to meet these use cases derives from the Information Science community: that of controlled vocabularies, taxonomies and thesauri. In the present document, we do not distinguish between controlled vocabularies, taxonomies and thesauri, and use the term vocabulary to represent all three.

One of the best examples of the need for a simple vocabulary within the VO is VOEvent [std:voevent], the VO standard for supporting rapid notification of astronomical events. This standard requires some formalised indication of what a published event is `about', in a formalism which can be used straightforwardly by the developer of relevant services. See 1.2 Use-cases, and the motivation for formalised vocabularies for further discussion.

A number of astronomical vocabularies have been created, with a variety of goals and intended uses. Some examples are detailed below.

The Second Reference Dictionary of the Nomenclature of Celestial Objects [lortet94], [lortet94a] contains 500 paper pages of astronomical nomenclature.
For decades professional journals have used a set of reasonably compatible keywords to help classify the content of whole articles. These keywords have been analysed by Preite Martinez & Lesteven [preitemartinez07], who derived a set of common keywords constituting one of the potential bases for a fuller VO vocabulary. The same authors also attempted to derive a set of common concepts by analysing the contents of abstracts in journal articles, which should comprise a list of tokens/concepts more up-to-date than the old list of journal keywords. A similar but less formal attempt was made by Hessman [hessman05] for the VOEvent working group, resulting in a similar list. [TODO: check differences from the A&A list.]
Astronomical databases generally use simple sets of keywords – sometimes hierarchically organised – to help users make queries. Two examples from very different contexts are the list of object types used in the Simbad database and the search keywords used in the educational Hands-On Universe image database portal.
The Astronomical Outreach Imagery (AOI) working group has created a simple taxonomy for helping to classify images used for educational or public relations [std:aoim]. See 4.3 The AOIM Taxonomy.
In 1993, Shobbrook and Shobbrook published an Astronomy Thesaurus endorsed by the IAU [shobbrook92]. This collection of nearly 3000 terms, in five languages, is a valuable resource, but has seen little use in recent years. Its very size, which gives it expressive power, is a disadvantage to the extent that it is consequently hard to use. See 4.5 The 1993 IAU Thesaurus.
The VO's Unified Content Descriptors [std:ucd] (UCD) constitute the main controlled vocabulary of the IVOA and contain some taxonomic information. However, UCD has some features which support its goals, but which make it difficult to use beyond the present application of labelling VOTables: firstly, there is no standard means of identifying and processing the contents of the text-based reference document; secondly, the content cannot be openly extended beyond that set by a formal IVOA committee without going through a laborious and time-consuming negotiation process of extending the primary vocabulary itself; and thirdly, the UCD vocabulary is primarily concerned with data types and their processing, and only peripherally with astronomical objects (for example, it defines formal labels for RA, flux, and bandpass, but does not mention the Sun). See 4.4 The UCD1+ Vocabulary.

1.2 Use-cases, and the motivation for formalised vocabularies

The most immediate high-level motivation for this work is the requirement of the VOEvent standard [std:voevent] for a controlled vocabulary usable in the VOEvent's <why/> and <what/> elements, which describe what sort of object the VOEvent packet is describing, in some broadly intelligible way. For example a `burst' might be a gamma-ray burst due to the collapse of a star in a distant galaxy, a solar flare, or the brightening of a stellar or AGN accretion disk, and having an explicit list of vocabulary terms can help guide the event publisher into using a term which will be usefully precise for the event's consumers. A free-text label can help here (which brings us into the area sometimes referred to as `folksonomies'), but the astronomical community, with a culture sympathetic to international agreement, can do better.

The purpose of this proposal is to establish a set of conventions for the creation, publication, use, and manipulation of astronomical vocabularies within the Virtual Observatory, based upon the W3C's SKOS standard. We include as appendices to this proposal formalised versions of a number of existing vocabularies, encoded as SKOS vocabularies [std:skosref].

Specific use-cases include the following.

A user wishes to process all events concerning supernovae, which means that an event concerning a Type Ia supernova must be understood to be relevant. [This supports a system working autonomously, filtering incoming information]
A user is searching an archive of VOEvents for microlensing events, and retrieves a large number of them; the search interface may then prompt her to narrow her search using one of a set of terms including, say, binary lens events. [This supports so-called `semantic search', providing semantic support to an interface which is in turn supporting a user]
A user wishes to search for resources based on the journal-supported keywords in a paper; they might either initiate this by hand, or have this done on their behalf by a tool which can extract the keywords from a PDF. The keywords are in the A&A vocabulary, and mappings have been defined between this vocabulary and others, which means that the query keywords are translated automatically into those appropriate for a search of an outreach image database (everyone likes pretty pictures), the VO Registry, a set of Simbad object types, and one or more concepts in more formal ontologies. The search interface is then able to support the user browsing up and down the AOIM vocabulary, and a specialised Simbad tool is able to take over the search, now that it has an appropriate starting place. [This supports interoperability, building on the investments which institutions and users have made in existing vocabularies]

It is not a goal of this standard, as it is not a goal of SKOS, to produce knowledge-engineering artefacts which can support elaborate machine reasoning – such artefacts would be very valuable, but require much more expensive work on ontologies. As the supernova use-case above illustrates, even simple vocabularies can support useful machine reasoning.

It is also not a goal of this standard to produce new vocabularies, or substantially alter existing ones; instead, the vocabularies included below in 4 Example vocabularies are directly derived from existing vocabularies, with adjustments to make them structurally compatible with SKOS, or to remove (in the case of the IAU-93 and IVOAT pair) significant anachronisms. It therefore follows that the ambiguities, redundancies and incompleteness of the source vocabularies are faithfully represented in the distributed SKOS vocabularies.

The reason for both of these limitations is that vocabularies are extremely expensive to produce, maintain and deploy, and we must therefore rely on such vocabularies as have been developed, and attached as metadata to resources, by others. Such vocabularies are less rich or less coherent than we might prefer, but widely enough deployed to be useful.

1.3 Formalising and managing multiple vocabularies

We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. For example, delta Sct and RR Lyr are terms one would find in a vocabulary aimed at professional astronomers, associated with the notion of variable star; however one would not find such technical terms in a vocabulary intended to support outreach activities.

One approach to this problem is to create a single consensus vocabulary, which draws terms from the various existing vocabularies to create a new vocabulary which is able to express anything its users might desire. The problem with this is that such an effort would be very expensive, both in terms of time and effort on the part of those creating it, and to the potential users, who have to learn to navigate around it, recognise the new terms, and who have to be supported in using the new terms correctly (or, more often, incorrectly).

The alternative approach to the problem is to evade it, and this is the approach taken in this document. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, help interest groups formalise as many of them as are appropriate, and standardise the process of formally declaring the relationships between them. This means that:

The various vocabularies are allowed to evolve separately, on their own timescales, managed either by the IVOA, individual working groups within the IVOA, or by third parties;
Specialised vocabularies can be developed and maintained by the community with the most knowledge about a specific topic, ensuring that the vocabulary will have the most appropriate breadth, depth, and precision;
Users can choose the vocabulary or combination of vocabularies most appropriate to their situation, either when annotating resources, or when querying them; and
We can retain the previous investments made in vocabularies by users and resource owners.

2 SKOS-based vocabularies (informative)

In this section, we introduce the concepts of SKOS-based vocabularies, and the technology of mapping between them. We describe some additional requirements for IVOA vocabularies in the next section, 3 Publishing vocabularies (normative).

2.1 Selection of the vocabulary format

After extensive online and face-to-face discussions, the authors have brokered a consensus within the IVOA community that formalised vocabularies should be published at least in SKOS (Simple Knowledge Organisation System) format, a W3C draft standard application of RDF to the field of knowledge organisation [std:skosref]. SKOS draws on long experience within the Library and Information Science community, to address a well-defined set of problems to do with the indexing and retrieval of information and resources; as such, it is a close match to the problem this document is addressing.

ISO 5964 [std:iso5964] defines a number of the relevant terms (ISO 5964:1985=BS 6723:1985; see also [std:bs8723-1] and [std:z39.19]), and some of the (lightweight) theoretical background. The only technical distinction relevant to this document is that between `vocabulary' and `thesaurus': BS-8723-1 defines a thesaurus as a

Controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. (BS-8723-1, sect. 2.39)

with a similar definition in ISO-5964 sect. 3.16. The paradigmatic relationships in question are those relating a term to a broader, narrower or more generically related term. These notions have an operational definition: any resource retrieved as a result of a search on a given term will also be retrievable through a search on that term's broader term (narrower is a simple inverse, so that for any pair of terms, if A skos:broader B, then B skos:narrower A; a term may have multiple narrower and broader terms). This is not a subsumption relationship, as there is no implication that the concept referred to by a narrower term is of the same type as a broader term.
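The operational definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the concept names (typeIaSupernova, supernova, stellarExplosion, gammaRayBurst), the resource identifiers and the function names are all hypothetical, and the sketch assumes that a search follows narrower links transitively.

```python
from collections import defaultdict

# Hypothetical skos:broader links: narrower concept -> its broader concepts.
BROADER = {
    "typeIaSupernova": {"supernova"},
    "supernova": {"stellarExplosion"},
    "gammaRayBurst": {"stellarExplosion"},
}

# Hypothetical annotations: resource -> concepts it is tagged with.
ANNOTATIONS = {
    "event-001": {"typeIaSupernova"},
    "event-002": {"gammaRayBurst"},
    "event-003": {"supernova"},
}


def invert(broader):
    """Derive narrower links: if A skos:broader B, then B skos:narrower A."""
    narrower = defaultdict(set)
    for concept, parents in broader.items():
        for parent in parents:
            narrower[parent].add(concept)
    return narrower


def expand(term, narrower):
    """Return the term plus every concept reachable through narrower links."""
    seen, stack = set(), [term]
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(narrower.get(concept, ()))
    return seen


def search(term, broader=BROADER, annotations=ANNOTATIONS):
    """Find resources tagged with the term or with any narrower concept."""
    wanted = expand(term, invert(broader))
    return sorted(r for r, tags in annotations.items() if tags & wanted)
```

With these made-up data, a search on supernova also returns the event tagged only with typeIaSupernova, which is the behaviour the supernova use-case in 1.2 asks for.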

Thus a vocabulary (SKOS or otherwise) is not an ontology. It has lighter and looser semantics than an ontology, and is specialised for the restricted case of resource retrieval. Those interested in ontological analyses can easily transfer the vocabulary relationship information from SKOS to a formal ontological format such as OWL [std:owl].

The purpose of a thesaurus is to help users find resources they might be interested in, be they library books, image archives, or VOEvent packets.

2.2 Content and format of a SKOS vocabulary

A published vocabulary in SKOS format consists of a set of concepts – an example concept capturing the vocabulary information about spiral galaxies is provided in the Figure below, with the RDF shown in both RDF/XML [std:rdfxml] and Turtle notation [std:turtle] (Turtle is similar to the more informal N3 notation). The elements of a concept are detailed below.

Figure: examples of SKOS vocabularies

XML Syntax

<skos:Concept rdf:about="#spiralGalaxy">
  <skos:prefLabel xml:lang="en">
    spiral galaxy
  </skos:prefLabel>
  <skos:prefLabel xml:lang="de">
    Spiralgalaxie
  </skos:prefLabel>
  <skos:altLabel xml:lang="en">
    spiral nebula
  </skos:altLabel>
  <skos:hiddenLabel xml:lang="en">
    spiral glaxy
  </skos:hiddenLabel>
  <skos:definition xml:lang="en">
    A galaxy having a spiral structure.
  </skos:definition>
  <skos:scopeNote xml:lang="en">
    Spiral galaxies fall into one of 
    three categories: Sa, Sc, and Sd.
  </skos:scopeNote>
  <skos:narrower
    rdf:resource="#barredSpiralGalaxy"/>
  <skos:broader
    rdf:resource="#galaxy"/>
  <skos:related
    rdf:resource="#spiralArm"/>
</skos:Concept>

Turtle Syntax

<#spiralGalaxy> a skos:Concept;
  skos:prefLabel
    "spiral galaxy"@en, 
    "Spiralgalaxie"@de;
  skos:altLabel "spiral nebula"@en;
  skos:hiddenLabel "spiral glaxy"@en;
  skos:definition """A galaxy having a 
    spiral structure."""@en;
  skos:scopeNote """Spiral galaxies fall
    into one of three categories:
    Sa, Sc, and Sd"""@en;
  skos:narrower <#barredSpiralGalaxy>;
  skos:broader <#galaxy>;
  skos:related <#spiralArm> .

A SKOS concept includes the following features.

A single URI representing the concept, mainly for use by computers.
A single preferred label in each supported language of the vocabulary, for use by humans.
Optional alternative labels which applications may encounter or which are in common use, whether simple synonyms or commonly-used aliases, e.g. GRB for "gamma-ray burst", or spiral nebula for spiral galaxies.
Optional hidden labels which capture terms which are sometimes used for the corresponding concept, but which are deprecated in some sense. This might include common misspellings for either the preferred or alternate labels, for example glaxy for galaxy.
A definition for the concept, where one exists in the original vocabulary, to clarify the meaning of the term.
A scope note to further clarify a definition, or the usage of the concept.
Optionally, a concept may be involved in any number of relationships to other concepts. The types of relationships are
- Narrower or more specific concepts, for example a link to the concept representing a barred spiral galaxy.
- Broader or more general concepts, for example a link to the concept representing galaxies in general.
- Related concepts, for example a link to the concept representing spiral arms of galaxies
  (note this relationship does not say that spiral galaxies have spiral arms – that would be ontological information of a higher order which is beyond the requirements for information stored in a vocabulary).

In addition to the information about a single concept, a vocabulary can contain information to help users navigate its structure and contents:

The top concepts of the vocabulary, i.e. those that occur at the top of the vocabulary hierarchy defined by the broader/narrower relationships, can be explicitly stated to make it easier to navigate the vocabulary.
Concepts that form a natural group can be defined as being members of a collection.
Versioning information can be added using change notes.
Additional metadata about the vocabulary, e.g. the publisher, may be documented using the Dublin Core metadata set [std:dublincore].

2.3 Relationships Between Vocabularies

There already exist several vocabularies in the domain of astronomy. Instead of attempting to replace all these existing vocabularies, which have been developed to serve different aims and user groups, we embrace them. This requires a mechanism to relate the concepts in the different vocabularies.

Part of the SKOS standard [std:skosref] allows a concept in one vocabulary to be related to a concept in another vocabulary. There are four types of relationship provided to capture the relationships between concepts in vocabularies, which are similar to those defined for relationships between concepts within a single vocabulary. The types of mapping relationships are:

Equivalence between concepts, i.e. the concepts in the different vocabularies refer to the same real-world entity. This is captured with the RDF statement
AAkeys:#Cosmology skos:exactMatch aoim:#Cosmology
which states that the cosmology concept in the A&A Keywords is the same as the cosmology concept in the AOIM. (Note the use of the external namespaces AAkeys and aoim, which must be defined within the document.)
Broader concept, i.e. there is not an equivalent concept but there is a more general one. This is captured with the RDF statement
AAkeys:#Moon skos:broadMatch aoim:#PlanetSatellite
which states that the AOIM concept Planet Satellite is a more general term than the A&A Keywords concept Moon.
Narrower concept, i.e. there is not an equivalent concept but there is a more specific one. This is captured with the RDF statement
AAkeys:#IsmClouds skos:narrowMatch aoim:#NebulaAppearanceDarkMolecularCloud
which states that the AOIM concept Nebula Appearance Dark Molecular Cloud is more specific than the A&A Keywords concept ISM Clouds.
Related concept, i.e. there is some form of relationship. This is captured with the RDF statement
AAkeys:#BlackHolePhysics skos:relatedMatch aoim:#StarEvolutionaryStageBlackHole
which states that the A&A Keywords concept Black Hole Physics has an association with the AOIM concept Star Evolutionary Stage Black Hole.

The semantic mapping relationships have certain properties. The broadMatch relationship has the narrowMatch relationship as its inverse and the exactMatch and relatedMatch relationships are symmetrical. The consequence of these properties is that if you have a mapping from concept A in one vocabulary to concept B in another vocabulary then you can infer a mapping from concept B to concept A.
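The inference described above can be sketched as follows. This is a minimal illustration, not a SKOS implementation: mappings are held as plain string triples rather than parsed RDF, and the function name with_inverses is hypothetical. The example triples are taken from the list above.

```python
# Inverse of each SKOS mapping property, as described in the text above.
INVERSE = {
    "skos:exactMatch": "skos:exactMatch",      # symmetric
    "skos:relatedMatch": "skos:relatedMatch",  # symmetric
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def with_inverses(mappings):
    """Return the given (subject, property, object) triples plus their inverses."""
    closed = set(mappings)
    for subject, prop, obj in mappings:
        closed.add((obj, INVERSE[prop], subject))
    return closed


# Three of the example mappings from the text, written as plain strings.
MAPPINGS = [
    ("AAkeys:#Cosmology", "skos:exactMatch", "aoim:#Cosmology"),
    ("AAkeys:#IsmClouds", "skos:narrowMatch",
     "aoim:#NebulaAppearanceDarkMolecularCloud"),
    ("AAkeys:#BlackHolePhysics", "skos:relatedMatch",
     "aoim:#StarEvolutionaryStageBlackHole"),
]
```

Applied to MAPPINGS, the function adds, for instance, the statement that the AOIM dark molecular cloud concept has a broadMatch to the A&A Keywords concept ISM Clouds, so a tool starting from either vocabulary can follow the mapping.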

3 Publishing vocabularies (normative)

3.1 Requirements

A vocabulary which conforms to this IVOA standard has the following features. In this section, the keywords must, should and so on, are to be interpreted as described in [std:rfc2119].

3.1.1 Dereferenceable namespace

The namespace of the vocabulary must be dereferenceable on the web. That is, typing the namespace URL into a web browser will produce human-readable documentation about the vocabulary. In addition, the namespace URL should return the RDF version of the vocabulary if it is retrieved with an HTTP Accept header of application/rdf+xml.
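A client might request the RDF version of a vocabulary along the following lines. This is a sketch using only the Python standard library; the namespace URL is a hypothetical placeholder, and the function names rdf_request and fetch_rdf are illustrative rather than part of any VO tool.

```python
import urllib.request

# Hypothetical namespace URL, used for illustration only.
NAMESPACE = "http://example.org/vocabularies/constellation"


def rdf_request(namespace_url):
    """Build a GET request asking for the RDF/XML form of a vocabulary."""
    return urllib.request.Request(
        namespace_url, headers={"Accept": "application/rdf+xml"}
    )


def fetch_rdf(namespace_url, timeout=10):
    """Retrieve the vocabulary; returns a (content type, body bytes) pair."""
    request = rdf_request(namespace_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()
```

A caller would normally check that the returned content type is application/rdf+xml before handing the body to an RDF parser, since a server that ignores the Accept header may return the human-readable page instead.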

Rationale: These prescriptions are intended to be compatible with the patterns described in [berrueta08] and [sauermann07], and vocabulary distributors should follow these patterns where possible.

3.1.2 Long-term availability

The files defining a vocabulary, including those of superseded versions, should remain permanently available. There is no requirement that the namespace URL be at any particular location, although the IVOA web pages, or the online sections of the A&A journal would likely be suitable archival locations.

3.1.3 Distribution format

Vocabularies must be made available for distribution as SKOS RDF files, in either RDF/XML [std:rdfxml] or Turtle [std:turtle] format; vocabularies should be made available in both formats. See issue [distformat-2].

A publisher may make available documentation and supporting files in other formats.

Rationale: this does imply that the vocabulary source files can only realistically be parsed using an RDF parser. An alternative is to require that vocabularies be distributed using a subset of RDF/XML which can also be naively handled as traditional XML; however as well as creating an extra standardisation requirement, this would make it effectively infeasible to write out the distribution version of the vocabulary using an RDF or general SKOS tool.

3.1.4 Clearly versioned vocabulary

To be decided. There are interactions with 'long-term availability' and 'dereferenceable namespace', since this implies that the vocabulary version should be manifestly encoded in the namespace URI. See issue [versioning-3].

3.1.5 No restrictions on source files

This standard does not place any restrictions on the format of the files managed by the maintenance process, as long as the distributed files are as specified above. See issue [masterformat-1].

3.2 Suggested good practices

This standard imposes a number of requirements on conformant vocabularies (see 3 Publishing vocabularies (normative)). In this section we list a number of good practices that IVOA vocabularies should abide by. Some of the prescriptions below are more specific than good-practice guidelines for vocabularies in general.

The adoption of the following guidelines will make it easier to use vocabularies in generic VO applications. However, VO applications should be able to accept any vocabulary that complies with the latest SKOS standard [std:skosref] (this does not imply, of course, that an application will necessarily understand the terms in an alien vocabulary, although the presence of mappings to a known vocabulary should allow it to derive some benefit).

Concept identifiers should consist only of the letters a-z, A-Z, and numbers 0-9, i.e. no spaces, no exotic letters (e.g. umlauts), and no characters which would make a token inexpressible as part of a URI; since tokens are for use by computers only, this is not a big restriction, since the exotic letters can be used within the labels and documentation if appropriate.
The concept identifiers should be kept in human-readable form, directly reflect the implied meaning, and not be semi-random identifiers only (for example, use spiralGalaxy, not "t1234567"); tokens should preferably be created via a direct conversion from the preferred label via removal/translation of non-token characters (see above) and sub-token separation via capitalisation of the first sub-token character (e.g. the label "My favourite idea-label #42" is converted into "MyFavouriteIdeaLabel42").
Labels should be in the form of the source vocabulary. When developing a new vocabulary the singular form should be preferred, e.g. spiral galaxy, not "spiral galaxies". Open issue
Each concept should have a definition (skos:definition) that constitutes a short description of the concept which could be adopted by an application using the vocabulary. Each concept should have additional documentation using SKOS Notes or Dublin Core terms as appropriate (see [std:skosref]).
The language localisation should be declared where appropriate, in preferred labels, alternate labels, definitions, and the like.
Relationships (broader, narrower, related) between concepts should be present, but are not required; if used, they should be complete (thus all broader links have corresponding narrower links in the referenced entries and related entries link each other).
TopConcept entries (see above) should be declared and normally consist of those concepts that do not have any broader relationships (i.e. not at a sub-ordinate position in the hierarchy).
The SKOS standard describes some good practices for vocabulary maintenance, such as using <skos:changeNote> and the like. Publishers should respect such good maintenance practices as are available.
Publishers should publish mappings between their vocabularies and other commonly used vocabularies. These should be external to the defining vocabulary document so that the vocabulary can be used independently of the publisher's mappings. Open issue.
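The label-to-identifier conversion suggested in the list above could be sketched as below. This is one possible reading of the guideline, not a normative algorithm: the function name label_to_identifier is hypothetical, and the choice to translate accented letters to their unaccented base letters (rather than drop them) is an assumption.

```python
import re
import unicodedata


def label_to_identifier(label):
    """Convert a preferred label into an identifier of ASCII letters and digits."""
    # Translate accented letters to their base letters (u-umlaut becomes u);
    # characters with no ASCII decomposition are dropped.
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_label = decomposed.encode("ascii", "ignore").decode("ascii")
    # Split into sub-tokens at every run of non-token characters.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", ascii_label) if p]
    # Capitalise the first character of each sub-token, leaving the rest as-is.
    return "".join(p[0].upper() + p[1:] for p in parts)
```

This sketch follows the "MyFavouriteIdeaLabel42" example and therefore capitalises the first sub-token as well; a publisher preferring identifiers in the style of spiralGalaxy would lower-case the first character of the result.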

4 Example vocabularies

The intent of having the IVOA adopt SKOS as the preferred format for astronomical vocabularies is to encourage the creation and management of diverse vocabularies by competent astronomical groups, so that users of the VO and related resources can benefit directly and dynamically without the intervention of the IAU or IVOA. However, we felt it important to provide several examples of vocabularies in the SKOS format as part of the proposal, to illustrate their simplicity and power, and to provide an immediate vocabulary basis for VO applications.

The vocabularies described below are included, as SKOS files, in the distributed version of this standard. These vocabularies have stable URLs [format TBD, see issue [versioning-3]], and may be cited and used indefinitely. These vocabularies will not, however, be developed as part of the maintenance of this standard. Interested groups, within and outwith the IVOA, are encouraged to take these as a starting point and absorb them within an existing process.

The exception to this is the IVOA-T (see 4.6 Towards an IVOA Thesaurus), which will be developed as part of a process which has already begun. Clarify wording here; include reference to forthcoming IVOA-T document; ??include snapshot of vocabulary in the distribution??.

We provide a set of SKOS files representing the vocabularies which have been developed, and mappings between them. These can be downloaded at the URL

http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/vocabularies-0.90/vocab-0.90.tar.gz

Not yet: instead go to http://code.google.com/p/volute/downloads/list

[To be expanded:] there are no mappings at the moment. Also, the vocabularies are all in a single language, though translations of the IAU93 thesaurus are available. See also issue [mappings-6].

4.1 A Constellation Name Vocabulary

This vocabulary is presented as a simple example of an astronomical vocabulary for a very particular purpose, e.g. handling constellation information like that commonly encountered in variable star research. For example, SS Cygni is a cataclysmic variable located in the constellation Cygnus. The name of the star uses the genitive form Cygni, but the alternate label SS Cyg uses the standard abbreviation Cyg. Given the constellation vocabulary, all of these forms are recorded together in a computer-manipulatable format. Various incorrect forms should probably be represented in SKOS `hidden labels'.

The <skos:ConceptScheme> contains a single <skos:TopConcept>, constellation:

XML Syntax

Turtle Syntax

<skos:Concept rdf:about="#constellation">
  <skos:inScheme rdf:resource=""/>
  <skos:prefLabel>
    constellation
  </skos:prefLabel>
  <skos:definition>
    IAU-sanctioned constellation names
  </skos:definition>
  <skos:narrower rdf:resource="#Andromeda"/>
  ...
  <skos:narrower rdf:resource="#Vulpecula"/>
</skos:Concept>

<#constellation> a :Concept;
  :inScheme <>;
  :prefLabel "constellation";
  :definition "IAU-sanctioned constellation names";
  :narrower <#Andromeda>;
  ...
  :narrower <#Vulpecula>.

and the entry for Cygnus, again in XML syntax followed by Turtle syntax, is

<skos:Concept rdf:about="#Cygnus">
  <skos:inScheme rdf:resource=""/>
  <skos:prefLabel>Cygnus</skos:prefLabel>
  <skos:definition>Cygnus</skos:definition>
  <skos:altLabel>Cygni</skos:altLabel>
  <skos:altLabel>Cyg</skos:altLabel>
  <skos:broader rdf:resource="#constellation"/>
  <skos:scopeNote>
    Cygnus is nominative form; the alternative
    labels are the genitive and short forms
  </skos:scopeNote>
</skos:Concept>

<#Cygnus> a :Concept;
  :inScheme <>;
  :prefLabel "Cygnus";
  :definition "Cygnus";
  :altLabel "Cygni";
  :altLabel "Cyg";
  :broader <#constellation>;
  :scopeNote """Cygnus is nominative form;
    the alternative labels are the genitive and
    short forms""" .

Note that SKOS alone does not distinguish between genitive forms and abbreviations, but the use of alternate labels is more than adequate for processing by VO applications, where the difference between SS Cygni, SS Cyg, and the incorrect form SS Cygnus is probably irrelevant.
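As an illustration of how an application might use these labels, the following Python sketch indexes the preferred and alternative labels of the Cygnus entry and resolves a star name to its constellation. It is not part of the vocabulary distribution: the helper names (label_index, constellation_of) are hypothetical, the entry is wrapped in an rdf:RDF element so that it parses on its own, and taking the last word of the star name is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The Cygnus entry from the text, wrapped in rdf:RDF so it is a
# well-formed standalone document (the wrapper is an assumption).
CYGNUS_RDF = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="#Cygnus">
    <skos:prefLabel>Cygnus</skos:prefLabel>
    <skos:altLabel>Cygni</skos:altLabel>
    <skos:altLabel>Cyg</skos:altLabel>
    <skos:broader rdf:resource="#constellation"/>
  </skos:Concept>
</rdf:RDF>
"""


def label_index(rdf_xml):
    """Map each preferred or alternative label (lower-cased) to its concept."""
    index = {}
    root = ET.fromstring(rdf_xml)
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        about = concept.get(f"{{{RDF}}}about")
        for tag in ("prefLabel", "altLabel"):
            for label in concept.findall(f"{{{SKOS}}}{tag}"):
                index[label.text.strip().lower()] = about
    return index


def constellation_of(star_name, index):
    """Hypothetical helper: look up the last word of a star name."""
    return index.get(star_name.split()[-1].lower())


index = label_index(CYGNUS_RDF)
for name in ("SS Cygni", "SS Cyg", "SS Cygnus"):
    print(name, "->", constellation_of(name, index))
```

All three spellings resolve to the same concept, #Cygnus, which is the behaviour the paragraph above describes; a name whose last word matches no label yields None.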

4.2 The Astronomy & Astrophysics Keyword List

This vocabulary is a set of keywords made available on a web page by the publisher of the journal. The intended usage of the vocabulary is to tag articles with descriptive keywords to aid searching for articles on a particular topic.

The keywords are organised into categories, which have been modelled as hierarchical relationships. Additionally, some of the keywords are grouped into collections, and this grouping has been mirrored in the SKOS version. The vocabulary contains no definitions or related links, as these are not provided in the original keyword list, and only the handful of alternative labels and scope notes that are present in the original keyword list.

4.3 The AOIM Taxonomy

This vocabulary is published by the IVOA to allow images to be tagged with keywords that are relevant for the public. It consists of a set of keywords organised into an enumerated hierarchical structure. Each term consists of a taxonomic number and a label. There are no definitions, scope notes, or cross references.

When converting the AOIM into SKOS, it was decided to model the taxonomic number as an alternative label. Since some terms are duplicated, the token for a term consists of the full hierarchical location of the term. Thus, it is possible to distinguish between

Planet -> Feature -> Surface -> Canyon

and

Planet -> Satellite -> Feature -> Surface -> Canyon

which have the tokens PlanetFeatureSurfaceCanyon and PlanetSatelliteFeatureSurfaceCanyon respectively.
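The token construction can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the conversion script actually used: the function name aoim_token is hypothetical, and the sketch assumes that the labels along the path are simply concatenated after trimming surrounding whitespace.

```python
def aoim_token(path):
    """Build a token from a hierarchical path written as 'A -> B -> C'."""
    return "".join(part.strip() for part in path.split("->"))


canyon_on_planet = aoim_token("Planet -> Feature -> Surface -> Canyon")
canyon_on_moon = aoim_token("Planet -> Satellite -> Feature -> Surface -> Canyon")

print(canyon_on_planet)  # PlanetFeatureSurfaceCanyon
print(canyon_on_moon)    # PlanetSatelliteFeatureSurfaceCanyon
```

Because the whole path goes into the token, the two Canyon terms receive distinct identifiers even though their labels are identical.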

4.4 The UCD1+ Vocabulary

The UCD standard is an officially sanctioned and managed vocabulary of the IVOA. The normative document is a simple text file containing entries consisting of tokens (e.g. em.IR), a short description, and usage information (syntax codes which permit UCD tokens to be concatenated). The form of the tokens implies a natural hierarchy: em.IR.8-15um is obviously a narrower term than em.IR, which in turn is narrower than em.
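The implied hierarchy can be recovered mechanically from the tokens. The following Python sketch assumes that the broader term of a token is obtained by dropping its last period-separated component; the function names are hypothetical, and the sketch does not check that each resulting prefix is itself an entry in the UCD1+ list.

```python
def broader_ucd(token):
    """Return the parent of a UCD token, or None for a top-level token."""
    head, sep, _ = token.rpartition(".")
    return head if sep else None


def ancestors(token):
    """List successively broader tokens, nearest parent first."""
    chain = []
    parent = broader_ucd(token)
    while parent is not None:
        chain.append(parent)
        parent = broader_ucd(parent)
    return chain


print(ancestors("em.IR.8-15um"))  # ['em.IR', 'em']
```

Each (token, parent) pair produced in this way corresponds to one broader/narrower relationship in a SKOS rendering of the vocabulary.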

Given the structure of the UCD1+ vocabulary, the natural translation to SKOS consists of preferred labels equal to the original tokens (the UCD1 words include dashes and periods), vocabulary tokens created using guidelines in 3.2 Suggested good practices (e.g., "emIR815Um" for em.IR.8-15um), direct use of the definitions, and the syntax codes placed in usage documentation: <skos:scopeNote>UCD syntax code: P</skos:scopeNote>

Note that the SKOS document containing the UCD1+ vocabulary does NOT constitute the official version: the normative document is still the text list. However, in the long term, the IVOA may decide to make the SKOS version normative, since the SKOS version contains all of the information contained in the original text document but has the advantage of being in a standard format easily read and used by any application on the semantic web, whilst still being usable in the current ways.

4.5 The 1993 IAU Thesaurus

The IAU Thesaurus consists of concepts with mostly capitalised labels and a rich set of thesaurus relationships (BT for "broader term", NT for "narrower term", and RT for "related term"). The thesaurus also contains U (for "use") and UF (for "use for") relationships. In a SKOS model of a vocabulary these are captured as alternative labels. A separate document contains translations of the vocabulary terms in five languages: English, French, German, Italian, and Spanish. Enumerable concepts are plural (e.g. SPIRAL GALAXIES) and non-enumerable concepts are singular (e.g. STABILITY). Finally, there are some usage hints like combine with other

In converting the IAU Thesaurus to SKOS, we have been as faithful as possible to the original format of the thesaurus. Thus, preferred labels have been kept in their uppercase format.
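The correspondence between the thesaurus relationship codes and SKOS properties can be summarised in a short Python sketch. It is a minimal illustration under stated assumptions, not the conversion code used for the published files: the input layout (a term plus a list of code/term pairs) and the function name to_skos are hypothetical, and the example relationships attached to SPIRAL GALAXIES are invented for illustration rather than copied from the thesaurus.

```python
# Thesaurus relationship codes and the SKOS property used to model each.
RELATION_TO_SKOS = {
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
    "UF": "skos:altLabel",
}


def to_skos(term, relations):
    """Return (subject, property, object) triples for one thesaurus entry.

    `relations` is a list of (code, other_term) pairs; this layout is an
    assumption made for the example.
    """
    triples = []
    for code, other in relations:
        if code == "U":
            # "term U other": term is a non-preferred label of `other`.
            triples.append((other, "skos:altLabel", term))
        else:
            triples.append((term, RELATION_TO_SKOS[code], other))
    return triples


# Illustrative entries only; these relations are not taken from the thesaurus.
entry = to_skos("SPIRAL GALAXIES", [("BT", "GALAXIES"), ("UF", "SPIRALS")])
redirect = to_skos("SPIRALS", [("U", "SPIRAL GALAXIES")])

for triple in entry + redirect:
    print(triple)
```

Note that a U entry and the matching UF entry yield the same alternative-label statement, so the two directions of the original relationship collapse into a single SKOS label.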

The IAU Thesaurus has been unmaintained since its initial production in 1993; it is therefore significantly out of date in places. This vocabulary is published for the sake of completeness, and to make the link between the evolving vocabulary work and any uses of the 1993 vocabulary which come to light. We do not expect to make any future maintenance changes to this vocabulary, and would expect the IVOA-T vocabulary, based on this one, to be used instead (see 4.6 Towards an IVOA Thesaurus).

4.6 Towards an IVOA Thesaurus

While the adoption of SKOS will make it easy to publish and access different astronomical vocabularies, there is as yet no vocabulary which makes it easy to jump-start the use of vocabularies in generic astrophysical VO applications: each of the previously developed vocabularies has its own limits and biases. For example, the IAU Thesaurus provides a large number of entries, copious relationships, and translations to four other languages, but there are no definitions, many concepts are now only useful for historical purposes (e.g. many photographic or historical instrument entries), some of the relationships are false or outdated, and many important or newer concepts and their common abbreviations are missing.

Despite its faults, the IAU Thesaurus constitutes a very extensive vocabulary which could easily serve as the basis vocabulary once we have removed its most egregious faults and extended it to cover the most obvious semantic holes. To this end, a heavily revised IAU thesaurus is in preparation for use within the IVOA and other astronomical contexts. The goal is to provide a general vocabulary foundation to which other, more specialised, vocabularies can be added as needed, and to provide a good lingua franca for the creation of vocabulary mappings.

5 Example Mapping

To show how mappings can be expressed between two vocabularies, we have provided one example mapping document which maps the concepts in the A&A Keywords vocabulary to the concepts in the AOIM vocabulary. All four types of mappings were required. Since all the mapping relationships have inverse relationships defined, the mapping document can also be used to infer the set of mappings from the AOIM vocabulary to the A&A keywords.
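The inference of the reverse mappings can be sketched as follows. The sketch assumes that the four mapping types are the SKOS relations exactMatch, relatedMatch, broadMatch and narrowMatch, with exactMatch and relatedMatch symmetric and broadMatch and narrowMatch inverses of each other; the function name invert and the concept identifiers are hypothetical placeholders, not entries from the mapping document.

```python
# Each mapping relation paired with its inverse (assumed SKOS semantics).
INVERSE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def invert(mappings):
    """Derive target-to-source mappings from source-to-target ones."""
    return [(target, INVERSE[rel], source) for source, rel, target in mappings]


# Placeholder identifiers, for illustration only.
aa_to_aoim = [
    ("aa:conceptA", "skos:exactMatch", "aoim:conceptP"),
    ("aa:conceptB", "skos:broadMatch", "aoim:conceptQ"),
]
aoim_to_aa = invert(aa_to_aoim)

for mapping in aoim_to_aa:
    print(mapping)
```

Inverting twice returns the original list, which is a simple check that the table of inverses is self-consistent.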

To provide provenance information about the set of mappings in a document, Dublin Core metadata is included in the mapping document.

To come

Appendices

Bibliography

[berrueta08] Diego Berrueta and Jon Phipps.: Best practice recipes for publishing RDF vocabularies. W3C Working Draft, January 2008. [Online].
[hessman05] F V Hessman.: VOConcepts - a proposed UCD for astronomical objects, events, and processes. [Online, cited January 2008].
[lortet94] M-C Lortet, S Borde, and F Ochsenbein.: Second reference dictionary of the nomenclature of celestial objects. Astron. Ap. Supp., 107 pp. 193-218, 1994. [Online].
[lortet94a] M-C Lortet, S Borde, and F Ochsenbein.: The second reference dictionary of the nomenclature of celestial objects (solar system excluded). volumes i, ii. Technical Report 24, Centre de Données astronomiques de Strasbourg, 1994. [Online].
[preitemartinez07] Andrea Preite Martinez and Soizick Lesteven.: Astronomical keywords in the era of the virtual observatory. IVOA Note, IVOA, 2007. [Online].
[sauermann07] Leo Sauermann, Richard Cyganiak, and Max Völkel.: Cool URIs for the semantic web. Technical Memo TM-07-01, Deutsches Forschungszentrum für Künstliche Intelligenz, 2007. [Online].
[shobbrook92] R.M. Shobbrook and R.R. Shobbrook.: The IAU thesaurus for improved on-line access to information. Proc. Astron. Soc. of Australia, 10 pp. 134, 1992. [Online].
[std:aoim] Robert Hurt, Lars Lindberg Christensen, and Adrienne Gauthier.: Astronomical outreach imagery metadata tags for the virtual observatory. [Online, cited January 2008].
[std:bs8723-1] Structured vocabularies for information retrieval - guide - definitions, symbols and abbreviations (BS 8723-1:2005).: British Standard, 2005.
[std:dublincore] DCMI Usage Board.: DCMI metadata terms. DCMI Recommendation, 2006. [Online].
[std:iso5964] Documentation - guidelines for the establishment and development of multilingual thesauri (ISO 5964:1985=BS 6723:1985).: International Standard, 1985.
[std:ivoa-astro-onto] Sébastien Derriere, Andrea Preite Martinez, and Alexandre Richard, editors.: Ontology of astronomical object types. IVOA Working Draft, 2007. [Online].
[std:owl] World Wide Web Consortium.: The web ontology language. [Online].
[std:rdfxml] Dave Beckett.: RDF/XML syntax specification (revised). W3C Recommendation, February 2004. [Online].
[std:rfc2119] S Bradner.: Key words for use in RFCs to indicate requirement levels. RFC 2119, March 1997. [Online].
[std:skosref] Alistair Miles and Sean Bechhofer, editors.: SKOS reference. W3C Working Draft, January 2008. [Online].
[std:turtle] Dave Beckett.: Turtle - terse RDF triple language. Draft Standard, November 2007. [Online].
[std:ucd] Sébastien Derriere, Andrea Preite Martinez, and Roy Williams, editors.: UCD (Unified Content Descriptor) - moving to UCD1+. IVOA Recommendation, 2004. [Online, cited February 2008].
[std:voevent] Rob Seaman and Roy Williams, editors.: Sky event reporting metadata (VOEvent). IVOA Recommendation, 2006. [Online].
[std:z39.19] Guidelines for the construction, format and management of monolingual thesauri (ANSI/NISO Z39.19-2005).: American National Standard, 2005. Closely corresponds to BS 8723:2005, parts 1 and 2, which replaces BS 5723; and to forthcoming ISO 25964, which replaces ISO 2788. [Online].

Revision: 61 Date: 2008-02-14 17:43:23 +0000 (Thu, 14 Feb 2008)