Publishing Data in the VO

This document is intended as a guide to scientists, archives and developers who wish to provide astronomical data in the Virtual Observatory but are unfamiliar with the details of the Virtual Observatory protocols and may find the plethora of VO standards intimidating. It discusses the advantages of the VO approach, how you might decide which VO capabilities are useful to you and your users, how to learn how to implement the VO standards, and how to make your new capabilities visible to the rest of the VO.

Why publish in the VO?

There are many scientific and practical benefits to publishing your data in the VO. Your data will be interoperable with other resources from all wavelength regimes and from astronomical institutions around the world, making multimission, multiwavelength, multiarchive research much more feasible. Your data will automatically become visible in many widely used tools like the TOPCAT and Aladin Java clients, and data portals like the MAST astronomy data portal. Scientists who may not know of your resources can nonetheless discover them when they make queries looking for data with the appropriate characteristics.

For developers there are many clients and tools that can be used to interact with the data when they are available through VO protocols. If you are still in the early stages of developing your resources, you can reuse any of a number of well-tested frameworks for publishing data. When you need to define data models for your services, the VO standards can provide a very helpful starting point allowing you to take advantage of the expertise of many domain experts.

Increasingly, astronomical research involves collaborations of scientists who analyze already massive but still increasing volumes of data with sophisticated software tools. For this to work, there need to be effective ways for the software tools to access the data. You can build your own special interfaces (and occasionally that may be appropriate), but the goal of the VO is to give you an easier path to making your data accessible to the astronomy community.

To publish your capabilities in the Virtual Observatory you need to do three things:

  1. Decide what resources you are trying to publish and how you might wish to make them available through the VO.
  2. Build VO interfaces.
  3. Register the VO interfaces in the VO registry so that others can find them.
Let’s talk about each of these in turn.

Planning your VO presence.

As a data provider you want to make your data available through the VO. What does that mean? It depends upon what kind of data you have and what level of access you want to provide. Are you trying to provide a cutout service for images from some telescope? Or an object catalog from some major survey? Or maybe you are building the archive for some new spectral instrument? To understand what you might want to provide, let’s go over some of the VO standards that are likely to apply. In IVOA-speak these are mostly ‘Data Access Layer’ or DAL protocols. As a data provider, data access is what you are looking for! Let’s define some VO jargon…

Basic VO Standards

  • The simplest VO protocol is called Simple Cone Search or sometimes just cone search. It allows users to ask for a table of data within some radius of a position on the sky. (It’s a cone search because we don’t specify the distance from the Earth or Sun, so that the circle on the sky expands as we get more distant). If you have a list of objects or observations scattered on the sky, this may be a nice way to get the data out to users quickly.
  • The Simple Image Access or SIA protocol allows users to ask for images in a given location or locations. It also allows users to limit the images by time, waveband and a few other criteria. This works nicely for providing access to an image archive.
  • The Simple Spectral Access or SSA protocol similarly provides a way to allow users to download spectra by position or a few other criteria.
  • The Table Access Protocol (TAP) is a more sophisticated way to make queries. It uses ADQL (Astronomical Data Query Language), a dialect of the Structured Query Language (SQL) used in relational databases. ADQL adds some functions to SQL to make it easier to specify positional queries and regions on the sky. If you want to allow users to make sophisticated queries this is the way to go. E.g., a user can ask for all of the objects where the difference between two magnitudes is greater than some value, or where the background is below some level, or where the principal investigator is themselves, or…. If you have used the CASJobs systems for the SDSS you will have a sense of the power. If you have more than one table, the user can do a cross-correlation. They can even upload their own table to participate in a multitable query.
If you want to allow sophisticated query access to a table or set of tables, TAP is what you are looking for.
  • The VOTable format is the primary output format for VO services. When you access a cone search, SIA, SSA or TAP service you are going to get a VOTable back. This is just an XML format with capabilities similar to FITS tables, but with a lot more ability to add in descriptive metadata, and without limits on keywords and such. This format is commonly supported even in non-VO tools and interfaces. Ideally you won’t need to worry about this too much. All the clients already understand VOTable and you can use one of the existing libraries to build VOTables on the server side.
  • The Hierarchical Progressive Survey or HiPS protocol enables you to present image (and spatial catalog) data in a Google Maps-like format where users can zoom and pan very easily. This is actually a data format rather than a protocol. If you convert your data to the appropriate format (often you can use the CDS’s Aladin tools to do so in a few minutes or hours), you just need to put the data in a directory on the web and HiPS-enabled tools will immediately be able to do amazing things with your data. If you have survey images or large survey catalogs, HiPS is a really powerful introduction to the benefits of the VO.
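To make the cone search idea concrete, here is a minimal sketch of how a client might build a Simple Cone Search request. The parameter names RA, DEC and SR (all in decimal degrees) are the ones the cone search standard uses; the base URL and the function name are hypothetical and only for illustration.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a Simple Cone Search request URL.

    RA, DEC and SR are the standard parameter names, in decimal degrees.
    base_url is whatever access URL the service publishes (hypothetical here).
    """
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be between 0 and 360 degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be between -90 and +90 degrees")
    if radius_deg < 0.0:
        raise ValueError("SR must not be negative")

    # Published base URLs may or may not already carry a query string.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return base_url + separator + query


if __name__ == "__main__":
    # example.org is a placeholder, not a real VO service.
    print(cone_search_url("https://example.org/scs", 10.68, 41.27, 0.1))
```

A service that answers a request of this shape with a VOTable of matching rows is, in essence, a cone search service.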
You’ll probably want to start with the protocols discussed above, but as you get familiar with the VO approach a few more standards are likely to be useful:
  • The DataLink protocol supplements the table, image and spectral protocols, by allowing you to specify additional data that a user might want. E.g., while you might return the image in an SIA service, you could use the DataLink to provide links to the calibration data. DataLink is a little different from the other data protocols. Normally you don’t make an initial DataLink query, rather you add DataLink information to the response from a TAP, SIA or SSA request that points to additional information.
  • ObsTAP, the Observation Data Model in TAP, defines a particular way to build a TAP service that allows you to point to observations from one or more missions. Since it has columns that are specifically defined as links to datasets, clients know how to use these links to download data. With ObsTAP you can provide a complete interface to a mission archive. ObsTAP works especially nicely when you adopt it early in the development of your archive, since you can make sure that you have the data it needs in the appropriate format. It can be harder to introduce later; in that case DataLink may be easier, since it can point to resources with far fewer restrictions.
  • The Simple Line Access Protocol, or SLAP, allows you to provide access to lists of spectral lines.
There are lots of VO data model standards and vocabularies which we haven’t described here. These are standards which describe what kinds of information are useful for various queries. You can certainly take a look at these standards, and if you are just starting an archive or service, these can be really valuable in helping to understand the kinds of issues that will come up. However the data access protocols we discussed above will define the critical aspects of the models that you need to get data to users.
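As an illustration of the TAP and ADQL description above, the following sketch builds an ADQL positional query and the form parameters for a synchronous TAP request. CONTAINS, POINT and CIRCLE are ADQL geometry functions, and REQUEST, LANG and QUERY are TAP parameters sent to the service's sync endpoint. The table name, column names and service URL are assumptions made up for the example.

```python
from urllib.parse import urlencode


def cone_adql(table: str, ra_col: str, dec_col: str,
              ra_deg: float, dec_deg: float, radius_deg: float,
              limit: int = 10) -> str:
    """Return an ADQL query selecting rows within a circle on the sky."""
    return (
        f"SELECT TOP {limit} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def tap_sync_request(tap_base_url: str, adql: str) -> tuple[str, str]:
    """Return (url, form_body) for a synchronous TAP query.

    The body is meant to be sent as an HTTP POST with content type
    application/x-www-form-urlencoded; the response is normally a VOTable.
    """
    url = tap_base_url.rstrip("/") + "/sync"
    body = urlencode({"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})
    return url, body


if __name__ == "__main__":
    # Hypothetical table, columns and service URL.
    query = cone_adql("mysurvey.sources", "ra", "dec", 10.68, 41.27, 0.1)
    url, body = tap_sync_request("https://example.org/tap", query)
    print(url)
    print(body)
```

In practice users would rarely write this by hand; clients such as TOPCAT construct these requests for them.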

We’ve also not addressed the standards for authentication, identification and credential handling, nor the Universal Worker Service (UWS) protocol that allows for asynchronous capabilities. These are mostly used by clients, but if you want to use the VO to allow access to restricted capabilities, you’ll need to think about these too.

One other set of protocols that we won’t discuss here are those that are used by the VO registry. The registry is where people and tools find out what resources are available in the VO. We’ll talk about registering your service later, but we recommend that you use existing registry clients or worked examples rather than working from the registry documents, which are pretty abstruse.

Issues to consider

Above we described some of the major capabilities the VO can provide. You can now assess how these fit with your situation as a data provider. If you want to enable access to images, then either HiPS or SIA makes sense. HiPS works especially well if we have a single survey we’re trying to publish. SIA is better when we have lots of observations which may overlap but where we want users to get them separately, e.g., to look for temporal variations. For spectra we just have SSA. If we’re trying to provide access to more kinds of data and we’re able to use the standard VO observation data model, we might want to build an ObsTAP service; otherwise we can add DataLink.

If we have table data and expect that most users are just going to query by position, then a cone search service may be enough. If we want to enable more complex queries then TAP may make more sense.

Another aspect of implementation is to consider where you are already. Are you just beginning the planning for your archive, or do you have a mature service in place and want to add VO compatibility? In the first case you may want to start with ObsTAP, and you’ll find that pretty much all of the relevant VO capabilities can be implemented easily on that basis. If you’ve got a lot of data and you want VO users to see it with the same names and such as non-VO users, then you may find that TAP and DataLink are more feasible, at least in the short term. Or maybe a cone search is all that you can easily provide in your existing infrastructure.

Overall, you don’t want to start from the question of which protocols to implement. Think of what you expect users to need and see what VO protocols will make that possible. If you have any questions please contact the IVOA at ….

Implementing your VO Services

So you’ve decided what VO protocols make sense for you to implement? How should you do it?

It is possible to try to read the standards and write new code on your own. All of the VO data access protocols are defined in terms of HTTP (or HTTPS) communications between client and server. You can read the standard and try to carefully implement the protocol. This isn’t especially hard for the Simple Cone Search protocol, but even there, there is enough complexity that almost no one implements it perfectly ab initio. Other standards are substantially more complex.
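As a hedged illustration of where a from-scratch cone search can go wrong, the sketch below shows only the core server-side test: keeping the rows that fall within the search radius. Two pitfalls it has to handle are the wrap-around of right ascension at 0/360 degrees and positions near the poles; these are our own examples of the kind of detail involved, not a list taken from the standard. The row layout (dictionaries with "ra" and "dec" keys in degrees) is an assumption, and a real service also has to validate parameters, report errors and write a VOTable.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula.

    Working on the sphere (rather than comparing RA and Dec differences
    directly) handles RA wrap-around and high declinations correctly.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cone_filter(rows: list[dict], ra: float, dec: float, sr: float) -> list[dict]:
    """Return the rows whose position lies within sr degrees of (ra, dec)."""
    return [
        row for row in rows
        if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= sr
    ]


if __name__ == "__main__":
    catalog = [{"ra": 359.9, "dec": 0.0}, {"ra": 10.0, "dec": 0.0}]
    # The search position sits just across the RA = 0 boundary from the first row.
    print(cone_filter(catalog, 0.1, 0.0, 0.5))
```

For large tables a linear scan like this is too slow, which is one more reason to build on an existing implementation backed by a spatially indexed database.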

If you can, we strongly recommend that you use or modify an existing implementation. All of the IVOA standards are required to have at least two independent implementations before the standard can be approved. Almost all VO implementations are freely available for re-use. Take a look at the table below for links to implementations of the protocols that you are interested in. Included with each implementation is someone you can contact if – when – you have questions.

| Standard | Implementation | License | Contact | Language | Link | Comments |
| TAP | XYZZY | Public domain | IVOA | Java | https:… | Statement of the reliability and completeness of this implementation. |

If for some reason you do need to write your own implementation, or if you are making significant modifications to an existing one, then you’ll want to be sure to begin validating the implementation as soon as it begins producing data and long before users can see it. There are validators for all of the major standards you are likely to work with. These are described in our Validators Status page. In some cases you can download the validator and run it on your site. For others you may need to have gotten to where you are producing content on the web before you can do the validation.

You can use standard VO client tools like TOPCAT and Aladin to play with your service and see it from a user perspective. It can be a pain to build your own TAP or HiPS client. Take advantage of existing software resources.
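Alongside the interactive clients, a small script can give a quick sanity check of what your service returns. The sketch below uses only the Python standard library to list the column names and rows of the first table in a VOTable document. It assumes the VOTable 1.3 namespace and the TABLEDATA serialization, and the sample document is invented for the example; it is not a substitute for the validators or for a full VOTable library.

```python
import xml.etree.ElementTree as ET

# Namespace of VOTable 1.3; other VOTable versions use different URIs.
VOTABLE_NS = {"vo": "http://www.ivoa.net/xml/VOTable/v1.3"}

# A tiny invented VOTable, standing in for a real service response.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_first_table(votable_xml: str) -> tuple[list[str], list[list[str]]]:
    """Return (field names, rows of cell text) for the first TABLE found."""
    root = ET.fromstring(votable_xml)
    table = root.find("vo:RESOURCE/vo:TABLE", VOTABLE_NS)
    if table is None:
        raise ValueError("no TABLE element found in the first RESOURCE")
    names = [field.get("name") for field in table.findall("vo:FIELD", VOTABLE_NS)]
    rows = [
        [cell.text for cell in row.findall("vo:TD", VOTABLE_NS)]
        for row in table.findall("vo:DATA/vo:TABLEDATA/vo:TR", VOTABLE_NS)
    ]
    return names, rows


if __name__ == "__main__":
    field_names, table_rows = read_first_table(SAMPLE)
    print(field_names)
    print(table_rows)
```

A check like this can run in a test suite each time the service changes, long before users see it.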

Don’t hesitate to contact the IVOA if you have a question or suggestion about one of the standards. Is there something we could do to make it better? Or if you see a problem or have a suggestion for the software tool you’re using, let the developer know. We all want feedback.

Registering your service.

You decided what services you would provide to the community, and you’ve successfully implemented them. When you run a validation test, there are no problems. How do you let people know about your new services?

All VO services are indexed by the VO registry. There are several instances of the registry maintained by different organizations, but they are all linked and should all provide the same information. You need to get your service or services into the registry.

Getting your institution in the VO

The first decision to make is whether this new service is part of some larger organization which already has some registered services. If not, then you need to announce to the world that there is a whole new provider of VO services. You may only have a single service. That’s fine and quite common. There’s a strong analogy here with web services. You can put a new web page on an existing host, or you can use a new hostname. All IVOA services are identified by an id, or more specifically an ivoid, that looks like ivo://authority/…

Here authority looks rather like an HTTP host name. For example, all services from NASA’s HEASARC use ivoids that start with ivo://nasa.heasarc/.

Before you register the first service from your organization, you need to register a new authority ID. There are no specific standards here. Pick something reasonable, maybe ivo://country.institution.vo.

To register a new authority ID go to one of the registries which allow users to enter new services in the table below and …[I don’t know how to do this…]

Getting services in the VO

Now you have an authority ID. You’re ready to register your actual VO services. If you have just a few services to register you likely want to use one of the existing registry clients that allow you to do this. Presumably you’ll be using the same one you used to register your authority ID. For each service you’ll go through a series of web prompts that ask you about the service you are registering. You’ll need to specify a unique IVOID for each new service. Think about a framework for how you’re going to specify these. You get to specify the scheme used for your authority ID, but it’s nice if it makes some sense to users and isn’t an opaque block.
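One way to keep your identifiers consistent is to generate them from a single helper rather than typing them by hand. The sketch below composes and splits ivoids of the ivo://authority/… form described above. The authority and resource names are hypothetical, and the checks are deliberately loose; the registry standards define the exact rules for which characters are allowed.

```python
IVOID_PREFIX = "ivo://"


def make_ivoid(authority: str, resource_key: str) -> str:
    """Compose an ivoid from an authority ID and a resource key."""
    if not authority or "/" in authority:
        raise ValueError("authority must be non-empty and contain no '/'")
    return f"{IVOID_PREFIX}{authority}/{resource_key.lstrip('/')}"


def split_ivoid(ivoid: str) -> tuple[str, str]:
    """Split an ivoid into (authority, resource_key)."""
    if not ivoid.lower().startswith(IVOID_PREFIX):
        raise ValueError("an ivoid must start with ivo://")
    rest = ivoid[len(IVOID_PREFIX):]
    authority, _, resource_key = rest.partition("/")
    if not authority:
        raise ValueError("the ivoid has no authority part")
    return authority, resource_key


if __name__ == "__main__":
    # Hypothetical naming scheme: <kind>/<survey>.
    for kind, survey in [("cone", "survey1"), ("tap", "survey1")]:
        print(make_ivoid("example.institute", f"{kind}/{survey}"))
    print(split_ivoid("ivo://example.institute/cone/survey1"))
```

A scheme such as kind/survey is only one possibility; what matters is that the keys are unique and mean something to users.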

| Registry | Registration Link | Documentation Link | Recommended for |
| NAVO/MAST | ?? | ?? | North American Services |

If you have a lot of services to register (more than 10 or 20), it may make sense for you to build your own ‘publishing’ registry. This means that you will actually write the XML that defines each registry entry and provide it to the rest of the world using a non-VO standard called OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). You can then update and manage your registry entries as you will. This isn’t necessarily hard, but it’s also not trivial. You may want to talk to an IVOA member from one of the institutions that has done this already and steal some of their code.

Conclusions

Congratulations! You’ve thought about what services you wanted and decided how to implement them through VO protocols. You reused somebody’s implementation or built your own. You’ve registered your services. Now the VO world can see your data and integrate it into the latest research. If you ran into problems or have any thoughts about how the standards, software, documentation, or support could be improved, let us know.

Topic revision: r3 - 2019-01-10 - JanetEvans
 