International Virtual Observatory Alliance

May 14th 2005

What is the Virtual Observatory?

A guide for astronomical end-users

V0.1

Andy Lawrence, University of Edinburgh


Notes on first draft. This is a document aimed at astronomical end-users to introduce them to the VO and what it can do. It is intended to go on the IVOA public web pages. It is aimed mostly at regular working astronomers rather than converts or techies, but might also be of interest to e-scientists from other disciplines, and astro-politicians. It is mostly about what the VO looks like from the point of view of an end-user, but should also explain some of the technology background. It should be a short introduction not a long manual. It should be mostly plain text, but with some simple diagrams and screen shots. It should have a small amount of information about specific projects.


(1) The Virtual Observatory Vision

The power of the World Wide Web is its transparency - it feels as if all the documents in the world are inside your PC. The idea of the Virtual Observatory (VO) is to achieve the same transparency for astronomical data.

All the world's data on your desk - all archives speaking the same language, accessed through a uniform interface, and analysable by the same tools. The world-wide super archive becomes the sky, and software becomes the instrument with which we collect data from that sky - hence the metaphor of the Virtual Observatory. However, the idea is to offer not just access to the data, but also the operations on the data, and on the returned results, that are essential for their full exploitation - for example the ability to visualise results, to stack and mosaic images, to query catalogues and create subsets, to integrate data from different origins, or to calculate a correlation function. Such calculations will be offered as data services by the expert data centres holding the data, and will be standardised so that they are compatible across many archives. The result can be described as a "grid of services".

The VO is not a monolithic system; it's a way of life. Like the Web, it is really a set of standards that make all the components of the system interoperable - data and metadata standards, agreed protocols and methods, and standardised mix-and-match software components. These standards and software modules constitute the VO Framework. To achieve the whole vision, data centres, tool writers, and facility builders will all work within this framework.

(2) Why do we need the VO?

We really have no choice but to build the VO. It is driven by scientific trends, user requirements, technological problems, and technology opportunities.

Astronomical databases are growing every day. The largest databases are several terabytes (TB), and will soon be several petabytes (PB). More importantly, the number and diversity of different databases is expanding, while astronomers increasingly wish to combine data from multiple databases - to combine different wavelengths, spot moving objects, or cross-correlate solar flares and magnetospheric events. Meanwhile it is becoming more and more normal to work online, through science archives, and to publish papers based on archival data. Data are pulled out of popular archives faster than they are put in, and the same data are used many times. Of course this is a trend throughout the modern world. Users will increasingly expect analysis of data, as well as access to data, to be automatic.

Database querying is limited by I/O speed, which, unlike storage and CPU, is improving only slowly. So the natural thing is to put specialised parallel search engines next to where the data are stored, and provide searching as a service. Many astronomical analysis problems scale as the square or the cube of the number of records, so data-mining problems are not rescued by Moore's law either, and we can soon expect supercomputer analysis engines likewise to be offered as a service. Finally, constructing a well-designed database is a demanding, specialised job, and is closely linked to the nature of the particular scientific project. All this points to the continued growth of data and resource centres with data holdings, experts who curate the data, and teams who design databases.
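To make the scaling argument concrete, here is a minimal sketch of the naive pair-counting step that sits inside a simple two-point correlation estimate. It assumes, purely for illustration, that records are 2D Cartesian positions (real sky positions need angular separations), and the function names are hypothetical. The point is the count of pairs: N(N-1)/2, so a catalogue a thousand times larger needs roughly a million times more work.

```python
import itertools
import math


def count_close_pairs(points, radius):
    """Naive O(N^2) pair count: examine every distinct pair exactly once."""
    count = 0
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if math.hypot(x2 - x1, y2 - y1) <= radius:
            count += 1
    return count


def pairs_examined(n):
    """Number of distinct pairs the naive loop visits: N(N-1)/2."""
    return n * (n - 1) // 2


# Growth of the workload as the catalogue grows by factors of a thousand.
for n in (10**3, 10**6, 10**9):
    print(f"N = {n:>13,d} records -> {pairs_examined(n):.2e} pairs")
```

For a billion-record catalogue this is about 5 x 10^17 pairs, which is why such calculations are better run as a service on large machines next to the data than on a desktop.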

Of course all these themes are familiar across science and much of commercial life. New internet technology is therefore developing fast to solve these common problems, and astronomy needs to adapt to this changing landscape. For all these reasons, something like the VO is inevitable.

(3) The user experience

Because the VO is not a specific software system, there is likewise no single user interface or user experience. Since data services and application programmes are all compatible, different people can put them together in different combinations. Nonetheless, a few common themes are developing.

The VO framework is largely server centred and based on web technology. There is not a piece of software called "the VO" which you install and run. All you need is a browser, so you can get at the services which the astronomical marketplace is offering. Typically you start from a web page, but you may then also launch Java programmes which give a more flexible interface.

Some tools work pretty much like existing web forms - you fill in boxes, click a button, and get results back. A good example is the US-VO "datascope" tool, which lets you query many databases simultaneously. Others look like standard image analysis tools, but are linked into the VO framework. For example, the CDS Aladin visualiser gives you a menu of image servers to load from, and you can save intermediate results to the "MySpace" virtual storage area. A growing need is to make queries of source catalogues and other databases, along the lines of "give me all the galaxies between A and B that are redder than X and not in this other database". Astronomers are rapidly learning SQL, but VO projects are also writing button-driven "query builder" tools. Such a query typically returns a table in a standard format. There are some new tools, such as the Indian VOplot or UK TopCat, with which you can make 2D plots from selected columns of these tables. In the UK AstroGrid system, you can string together a sequence of tasks using a "workflow builder". The workflow can be saved as a file, so you can reload it and change it another day.
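As an illustration of what such a query looks like in SQL, here is a minimal sketch using Python's built-in sqlite3 module on two small in-memory tables. The table names, column names and values are all hypothetical. It assumes that "between A and B" means a range in RA and that "redder than X" means a g - r colour above a threshold; "not in this other database" is expressed with NOT EXISTS on a shared identifier, whereas real cross-archive matching would normally compare sky positions.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE galaxies (id INTEGER PRIMARY KEY, ra REAL, dec REAL,
                           g_mag REAL, r_mag REAL);
    CREATE TABLE other_survey (id INTEGER PRIMARY KEY);
    INSERT INTO galaxies VALUES
        (1, 150.1, 2.2, 19.5, 18.1),  -- red, in range, but also in other_survey
        (2, 150.3, 2.1, 20.0, 18.9),  -- red, in range, not in other_survey
        (3, 150.4, 2.3, 19.0, 18.8),  -- in range but too blue
        (4, 160.0, 2.0, 21.0, 19.0);  -- red but outside the RA range
    INSERT INTO other_survey VALUES (1);
""")

# "All galaxies between A and B that are redder than X and not in this
# other database": A and B as an RA range, X as a minimum g - r colour.
query = """
    SELECT g.id, g.ra, g.dec, g.g_mag - g.r_mag AS colour
    FROM galaxies AS g
    WHERE g.ra BETWEEN :ra_min AND :ra_max
      AND g.g_mag - g.r_mag > :min_colour
      AND NOT EXISTS (SELECT 1 FROM other_survey AS o WHERE o.id = g.id)
    ORDER BY g.id
"""
rows = conn.execute(query, {"ra_min": 150.0, "ra_max": 151.0,
                            "min_colour": 0.8}).fetchall()
for row in rows:
    print(row)  # each row is one table record: (id, ra, dec, colour)
```

The result is itself a table, which is the kind of output a plotting tool can then take columns from.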

A key feature of these tools and services is that they are dynamic rather than hardwired and static. This works because of a key VO concept called a Registry. This is a kind of yellow pages of resources. Here "resources" means not just data, but available services and applications. Several different registries are being maintained - in the US, the UK, France, and Japan. They are not identical, because they hold different information and are structured differently, but they are all compatible, so they can harvest from each other. There are tools for browsing a registry, so that you can "see what's out there", but more importantly, software can automatically locate what it needs. For example, a VO-enabled version of Sextractor is available. If this is called up inside a workflow, the workflow programme doesn't need to know how Sextractor works - the parameters that Sextractor wants are either listed in the registry and pop up automatically (that's the AstroGrid method) or the Sextractor service is called up to ask what it wants (that's the US-VO method). For data centres or tool writers, the way you "get on the VO" is then to publish to a Registry in a standardised way.
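The registry idea can be illustrated with a toy in-memory "yellow pages". This is a hypothetical sketch only: the class names, fields and example identifiers are invented, and it does not reproduce the actual IVOA registry interfaces or metadata schema. It shows three things from the paragraph above: publishing a resource, one registry harvesting entries from another, and a workflow step locating a tool by capability and reading its parameter list from the registry entry (the AstroGrid-style approach; asking the service itself is not shown).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One hypothetical registry entry: data, a service, or an application."""
    identifier: str
    kind: str            # e.g. "data", "service", "application"
    capability: str      # what it does, e.g. "source-extraction"
    parameters: dict = field(default_factory=dict)  # name -> description


class ToyRegistry:
    """A hypothetical in-memory 'yellow pages' of resources."""

    def __init__(self):
        self._entries = {}

    def publish(self, resource):
        """How a data centre or tool writer 'gets on the VO' in this toy."""
        self._entries[resource.identifier] = resource

    def harvest(self, other):
        """Copy in entries from another compatible registry, keeping our own."""
        for ident, res in other._entries.items():
            self._entries.setdefault(ident, res)

    def find(self, capability):
        """Let software locate resources by what they do, not by name."""
        return [r for r in self._entries.values() if r.capability == capability]


# Two registries, each holding different entries (identifiers are made up).
uk = ToyRegistry()
us = ToyRegistry()
us.publish(Resource("ivo://example.org/sextractor", "application",
                    "source-extraction",
                    {"image": "input image", "threshold": "detection threshold"}))
uk.publish(Resource("ivo://example.org/survey-images", "data", "image-access"))
uk.harvest(us)

# A workflow step finds an extractor without knowing in advance how it works.
extractor = uk.find("source-extraction")[0]
print(extractor.identifier)
for name, description in extractor.parameters.items():
    print(f"  {name}: {description}")
```

Because lookup is by capability rather than by a hardwired address, a new tool published to any harvested registry becomes visible to existing workflows without changing them.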

It may sound like the VO is a kind of creative anarchy, with lots of separate individual data services and application programmes. To some extent this is true, and indeed this is much like the way the Web works - lots of people can run their own web servers, and users can surf all of them. (Of course, for both the Web and the VO this freedom is made possible by a remarkable degree of international cohesion in developing and adhering to standards.) However, a more uniform approach comes from several directions. First, collaborating alliances of data centres will naturally come together to offer combined services, for example a common access interface, a pooled virtual storage resource, or grid-like routing to the currently least busy copy of a mirrored database. Second, some projects will specialise in offering a portal of some kind - a VO "one stop shop". Such a project would typically run a registry, have well-organised entry web pages, and provide visible launch points for favourite third-party applications. Third, even though services are increasingly automatic, some kind of user support will still be needed. Although some of this support will come from the parent services, portal owners may well offer a helpdesk or similar. Finally, major astronomical organisations such as ESO, NASA or AURA, whose role is to provide services to the community, may do all of the above - co-ordinating and supporting data centre alliances, offering a portal, and supporting users. Such a major activity would constitute a VO Facility Centre.

(4) What can you do that you couldn't do before?

(5) Parts of the VO

(6) Key technologies

(7) International collaboration

(8) Where are we now?

(9) Where do we go next?

High-end data mining

Visualisation

Intelligent resource discovery

Single sign-on. At the moment most VO services are based around open-access data and applications. But of course a lot of data are proprietary for some period, and supercomputer time is typically allocated by quota. If we are not to have a nightmare of multiple passwords, and continually break the chain in workflows, the VO framework needs to develop some kind of standardised way of expressing identity.

User script upload