Radio Data Providers questionnaire

This was sent to about a dozen establishments involved in radio (including sub-mm) interferometry in 2003, and posted on radiovo@ivoa.net, and later drawn to the attention of the RadioNet software forum. In all, seven facilities replied, some representing more than one array.

| Facility | Location | Approx Wavelength Range | Other info |
| ATCA | Australia | cm | Interferometer |
| BIMA | US | mm | Interferometer |
| IRAM (Plateau de Bure) | France | mm | Interferometer |
| JCMT | Hawaii | sub-mm | Mostly single-dish - heterodyne spectra, SCUBA (bolometer) etc. |
| JIVE | Holland | cm | Correlator for European VLBI Network (EVN) |
| MERLIN | UK | cm | Interferometer |
| NRAO | USA | cm (mm-m) | (E)VLA interferometer; VLBA; GBT (single dish/VLBI) |

Information was also gleaned from discussions and on-line information from the Nancay cm-wave single dish and from IR - (sub-)mm arrays under construction, ALMA and CARMA.

The full responses and provider details can be found in the radiovo forum archive.

I summarise the responses briefly here, followed by the original questions and a response-by-response summary.

I have attempted to make suggestions arising from the issues in italics; some of these may already be in practice, and alternative ideas are welcome. The questionnaire didn't ask whether users made their own observations, but it is worth noting that this is the exception rather than the rule for interferometry; for example, most VLA users never visit the site.

Electronic Data Availability

In some cases (mostly higher frequencies) there is no provision for public access to existing data, but there is access to metadata. In all cases PIs can obtain metadata and often the data itself. This is usually via the observatory; for mountain-top facilities the metadata may be supplied through a general archive service, e.g. CADC, CDS. Of more interest to the IVOA, the majority of facilities are in the process of establishing general on-line data access. There are a number of models. NRAO have just set up on-line access to all raw VLA data; they and other facilities that only provide raw data usually provide or point to downloadable software and pipelines. NRAO use ADIL, including SIA and cone search.

Data may be available instantly via a browser, or via ftp following email notification in response to a query. Other facilities provide partially calibrated data and images. For VLBI it may be necessary to get the data on tapes due to the many-Gb volume, but the calibration and metadata are available electronically. Relatively few (e.g. MERLIN) make observed data products such as FITS images routinely available via CDS etc., but specific major surveys, e.g. NVSS, are accessible via such proto-VOs. Most data providers expressed an interest in publishing data to VOs. Issues include the need for an authentication filter so that proprietary and legacy data can be processed by the same route.
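As an illustration of the authentication filter mentioned above, here is a minimal sketch in Python of a single access check that handles proprietary and public data by the same route. The `Dataset` record, its field names and the `can_access` function are hypothetical, invented for this example; the proprietary periods (12 or 18 months, or never released) simply echo the range of policies reported in the responses, and a real archive would of course need proper user authentication rather than a name comparison.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical archive record; field names are assumptions."""
    dataset_id: str
    observed: date
    pi: str
    proprietary_months: Optional[int]  # None means the data never become public


def add_months(day: date, months: int) -> date:
    """Return the date `months` calendar months after `day`."""
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    # Clamp the day so that e.g. 31 Aug + 6 months gives the last day of February.
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def can_access(dataset: Dataset, user: str, today: date) -> bool:
    """One check for all requests: the PI always passes, others only after release."""
    if user == dataset.pi:
        return True
    if dataset.proprietary_months is None:
        return False
    return today >= add_months(dataset.observed, dataset.proprietary_months)


example = Dataset("EXPT-001", date(2003, 1, 15), "pi_user", 12)
print(can_access(example, "guest", date(2003, 12, 31)))
print(can_access(example, "guest", date(2004, 1, 15)))
```

Because the release date is computed from the stored observation date and policy, old (legacy) data pass the same test as new data without a separate code path.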

Extracted data, e.g. survey results, calibrator source lists, are also published in many ways - in the literature, on the observatory web site etc. There was also interest in integrating the publication of source properties via VOs.

Identification

Some facilities, e.g. NRAO, have on-line archives which can be searched by a wide range of parameters (position, frequency, PI ID etc.). Others can only identify an experiment by the pointing position (which may cover a large field of view) or even just an ID number, with additional information via a separate interface. There may be links between related observations (target and calibrators, series of targets, repeat observations of variable objects...). The original calibration source may be of intrinsic interest to another user. If data providers want to publish to VOs, they should make clear the paths to all the essential information needed to establish whether a source (e.g. an IAU position or SIMBAD object) has actually been observed, and the IVOA registry and data models should assist this.
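To make the identification problem concrete, the following Python sketch tests whether a target position falls inside the field of view of any catalogued pointing, which is the minimum needed when an archive is indexed only by pointing centre. The `Pointing` record, its field names and the example entries are hypothetical; the field-of-view radius is assumed to be stored per observation, which not every archive will do.

```python
import math
from dataclasses import dataclass


@dataclass
class Pointing:
    """Hypothetical catalogue entry; positions and radius in degrees."""
    obs_id: str
    ra_deg: float
    dec_deg: float
    fov_radius_deg: float


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def pointings_covering(target_ra, target_dec, pointings):
    """Return the IDs of pointings whose field of view contains the target."""
    return [
        p.obs_id
        for p in pointings
        if angular_separation_deg(target_ra, target_dec, p.ra_deg, p.dec_deg)
        <= p.fov_radius_deg
    ]


catalogue = [
    Pointing("OBS-A", 10.0, 20.1, 0.25),   # centre 0.1 deg from the target below
    Pointing("OBS-B", 50.0, -30.0, 0.25),  # far from the target
]
print(pointings_covering(10.0, 20.0, catalogue))
```

A match here only shows that the position was inside a field of view; whether the source was actually imaged or detected still depends on the processing, as discussed under data products.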

Metadata

For many observatories the primary and sometimes only source of metadata is the FITS headers. In some cases extracted information and other details, e.g. proposal information, are available as XML and/or VOTable, and databases are widely used. In the real world FITS headers are not entirely standard and, even if they are, are very complex. The IVOA should publicise what flavours of FITS and what keywords our tools can cope with (e.g. as done by AVO FITS image support) and what metadata is required, so that it can be supplied another way if the FITS headers don't tell us. There was a demand for tools for VOTable handling, and a tool for extraction of information from FITS headers into VOTable or IVOA standard format would be useful. We should publicise the existing tools (TopCat, ConVot etc.) better.
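The kind of extraction tool asked for could start from something like the Python sketch below, which reads simple FITS header cards and writes each keyword as a VOTable PARAM. It is only an outline under stated assumptions: it uses the standard library rather than any existing FITS or VOTable package, it handles only fixed-position `KEYWORD = value / comment` cards (no continued strings or doubled quotes), the function names are my own, and the output omits the XML namespace declaration and any UCDs that a properly published VOTable would carry.

```python
import xml.etree.ElementTree as ET


def parse_card(card):
    """Split one FITS header card into (keyword, value, VOTable datatype).

    Returns None for cards without a value (COMMENT, HISTORY, END, blank).
    """
    keyword = card[:8].strip()
    if not keyword or card[8:10] != "= ":
        return None
    body = card[10:].strip()
    if body.startswith("'"):
        # String value: ends at the next single quote (doubled quotes not handled).
        end = body.index("'", 1)
        return keyword, body[1:end].rstrip(), "char"
    value = body.split("/", 1)[0].strip()  # drop the trailing comment, if any
    if value in ("T", "F"):
        return keyword, value, "boolean"
    for cast, datatype in ((int, "long"), (float, "double")):
        try:
            cast(value)
            return keyword, value, datatype
        except ValueError:
            pass
    return keyword, value, "char"


def header_to_votable(cards):
    """Serialise the value-bearing cards as PARAM elements in a VOTable."""
    votable = ET.Element("VOTABLE", version="1.3")  # version chosen for illustration
    resource = ET.SubElement(votable, "RESOURCE")
    for card in cards:
        parsed = parse_card(card)
        if parsed is None:
            continue
        keyword, value, datatype = parsed
        attrs = {"name": keyword, "datatype": datatype, "value": value}
        if datatype == "char":
            attrs["arraysize"] = "*"
        ET.SubElement(resource, "PARAM", attrs)
    return ET.tostring(votable, encoding="unicode")


header = [
    f"{'SIMPLE':<8}= T / conforms to FITS standard",
    f"{'NAXIS':<8}= 2",
    f"{'OBJECT':<8}= '3C273   ' / target name",
    f"{'CRVAL1':<8}= 187.277915 / RA at reference pixel (deg)",
    "HISTORY calibrated in AIPS",
    "END",
]
print(header_to_votable(header))
```

The HISTORY and END cards are skipped on purpose; processing history would need its own representation rather than being flattened into parameters.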

Reversibility and History

Most normal pre-user data processing is reversible or (almost) raw data is archived separately. Histories are usually in the context of the software used (e.g. AIPS HI extension table, copy of pipeline commands) and thus may be incomplete (if processing hit a glitch) or unintelligible to the non-expert. Observatories should be encouraged to summarise data processing for the general astronomer; however it isn't a big issue in my experience since anyone who knows enough to care probably does know the software. However this does place an obligation on data providers to describe the data accuracy (photometry, astrometry etc.) reliably using standard metadata.

Desirable Data Products and Formats

There is a variety of intentions regarding how far data should be pipelined or otherwise pre-processed; desirable products include:

  • Raw FITS plus cal info and possibly extension tables
  • Partially calibrated uv data
  • Partially calibrated data plus rough images
  • The best available (high-resolution) images/datacubes
  • Calibrated uv data for combining with data from other arrays
  • Extracted source properties (flux, position ....)
  • Full data characterisation
  • Analysis tools

There is no unique product from a radio astronomy observation; the field of view is ultimately limited by the primary beam and can cover 10^3 - 10^5 resolution elements in diameter. In rare cases the entire field is known to be of interest; in other cases the PI only images a small patch but other images are possible. Different weightings can produce images with different combinations of resolution and surface brightness sensitivity. 1- or 2-D spectra can be extracted, and the visibilities themselves are used to produce light curves and for modelling.
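The number of resolution elements across the field can be estimated with a few lines of Python. The sketch uses the usual diffraction approximations (primary beam about wavelength/dish diameter, synthesised beam about wavelength/longest baseline), so the ratio reduces to baseline/diameter. The dish size, baseline and wavelength are illustrative round numbers, not the specification of any particular array, and the function names are invented for the example.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def beam_arcsec(wavelength_m, aperture_m):
    """Diffraction-limited angular scale, wavelength / aperture, in arcsec."""
    return wavelength_m / aperture_m * ARCSEC_PER_RADIAN


def resolution_elements_across(wavelength_m, dish_m, max_baseline_m):
    """Primary beam width divided by synthesised beam width."""
    primary = beam_arcsec(wavelength_m, dish_m)
    synthesised = beam_arcsec(wavelength_m, max_baseline_m)
    return primary / synthesised


# Illustrative values: 6 cm wavelength, 25 m dishes, 36 km longest baseline.
wavelength, dish, baseline = 0.06, 25.0, 36_000.0
print(f"primary beam     ~ {beam_arcsec(wavelength, dish):.0f} arcsec")
print(f"synthesised beam ~ {beam_arcsec(wavelength, baseline):.2f} arcsec")
print(f"elements across  ~ {resolution_elements_across(wavelength, dish, baseline):.0f}")
```

With these numbers the field is of order 10^3 beams across, at the lower end of the range quoted above; baselines of hundreds of km or more push the figure up accordingly, which is why a single "image of the observation" is rarely well defined.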

There is also little progress in standardising radio astronomy software or in achieving full compatibility of existing data formats, especially in the uv plane, although there are good intentions for facilities now under construction. Even further in the future, the novel observing modes of e.g. LOFAR and SKA will require innovative data reduction.

There was overwhelming agreement for a standard metadata format and much interest in supplying products accessible to a variety of common packages without information loss. Most facilities felt that this implied offering fairly highly processed products as it would not be practical to standardise fully the early stages of data processing (e.g. visibility calibration) and it was recognised that a variety of products were needed to match user experience. It was felt that local expertise was needed at the early stages of data processing and to maintain the relevant software. There was a lot of support for the development of user-driven pipelines which could be steered remotely via an interface any astronomer could use (i.e. free of radio-specific jargon). The user should not get more details than they can cope with.

VOs should consider how to interface with such specialised data providers in the medium/long term. In the short term we should continue to tackle minor issues hampering the publication of existing analysis-ready data, e.g. units (Jy/beam, T_ant, etc.), access to (sub-)mm and radio calibration source catalogues, handling spectra and data cubes, and cross-identification of radio sources in SIMBAD.
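As an example of the units issue, the Python sketch below converts a surface brightness in Jy/beam to a brightness temperature in kelvin, assuming a Gaussian restoring beam and the Rayleigh-Jeans approximation: T = lambda^2 S / (2 k Omega), with Omega = pi theta_maj theta_min / (4 ln 2). The function names are my own. Note that this is brightness temperature; antenna temperature scales also involve telescope efficiencies, which are not modelled here.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
C_LIGHT = 299_792_458.0      # m/s
JANSKY = 1e-26               # W m^-2 Hz^-1


def beam_solid_angle_sr(bmaj_arcsec, bmin_arcsec):
    """Solid angle of an elliptical Gaussian beam with the given FWHMs."""
    arcsec = math.radians(1.0 / 3600.0)
    return (math.pi * (bmaj_arcsec * arcsec) * (bmin_arcsec * arcsec)
            / (4.0 * math.log(2.0)))


def jy_per_beam_to_kelvin(flux_jy_per_beam, freq_hz, bmaj_arcsec, bmin_arcsec):
    """Rayleigh-Jeans brightness temperature for a Gaussian beam."""
    wavelength = C_LIGHT / freq_hz
    omega = beam_solid_angle_sr(bmaj_arcsec, bmin_arcsec)
    return wavelength ** 2 * flux_jy_per_beam * JANSKY / (2.0 * K_BOLTZMANN * omega)


# 1 mJy/beam at 1 GHz in a 1 arcsec circular beam: about 1.2e3 K.
print(f"{jy_per_beam_to_kelvin(1e-3, 1e9, 1.0, 1.0):.0f} K")
```

The conversion needs the beam size and observing frequency as well as the pixel value, so a data product quoted in Jy/beam is only usable by generic tools if those two quantities are present in standard metadata.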


-----------------------------------

1. Name and nature of observatory and/or facility

(see above for list )


2. Current archive status and description:
a) How are data stored?

Most (or recent) on-line (e.g. RAID).  Back-ups DAT, DVD, CD, Exabyte
Robotic tape system
MERLIN catalogue partially in Vizier
IRAM metadata accessible via Vizier
JCMT data also held at CADC

b) How are data catalogued?

Pointing position, date
Project, date
Experiment-code based
Metadata extracted during pipeline processing
FITS headers
Links between related obs

DBs, XML for metadata

c) Do you provide information about sources (as distinct from about
   observations) e.g. calibrator lists, target properties, and if so
   is this:
   i) catalogued information?

Observation info and lists of related cal sources etc.
Detailed properties of calibration and phase-reference sources (may
require separate query) 
Source information planned
Source role in older data not always clear

  ii) plots?

yes  no calibration sources (calibration tables, visibilities, images)
calibration and data quality plots

photometry time series

3. What can be accessed on-line?

All raw data since 1991; some FITS images and auxiliary data
UV data (VLA)
Raw data, metadata (FITS headers)
FITS files of raw data for some experiments, calibration tables and other 
supporting data/plots.  Pipeline diagnostic products
All metadata browsable; datasets downloadable.
All data since 1976
All data in fixed period e.g. 1990-2003 (updating and v. old data can be problems)

Raw data; FITS images (beta release)

4. Who can access it?

In all cases(?) some observational details are public, either rapidly
or after a restricted period.  The actual data are only available to
PIs in two cases (both mm arrays); in other cases anyone can access
public uv data and/or images and auxiliary data directly.  In some
cases proprietary data can be accessed by http using an authorisation
process.

Data Internally, externally by arrangement;
Metadata via CDS (in future)
Internal; external planned including authorisation groups.

5. What are the methods of access?

http, ftp via archive page or CDS/AVO (AstroGrid eventually)
ssh
web form, email notification for ftp retrieval
http, browser helper for multiple downloads (DART)
http, html or VOTable lists of results
web via CADC

6. What search parameters are available?

Field centre, target name if recognised, rough wavelength via CDS;
various other observational parameters via archive
Header info 
Project/PI details, date or wide range of observing parameters
incl. cone search on pointing positions; molecular transition, SIMBAD
name.
Experiment name, source name (different searches needed for different
information)
Observing mode
Object name / RA dec

7. What is your Archive Policy?

Archive everything

Access:
Data public after 1 yr
Data remains proprietary, headers public after 1 yr
18 mth

8. What software do you use?

Local software
Oracle, Java
My/MiniSQL, cgi, perl, AIPS
GILDAS
Tomcat, JBOSS
AIPS++ measurement set handling - Glish, C++
AIPS++
XML XSLT
VOTable (intention)
PostgreSQL, rsync, Miriad, Apache,
development tools: Python, Java servlets,
browser helper for multiple downloads (DART)
Currently SPECX and CLASS. Moving to AIPS++

9. What software do users need, and can you provide it?

Some images require no special processing; otherwise AIPS, DIFMAP
etc. (easily available)
Supply GILDAS
Web browser
Java for calibrator tool
browser helper for multiple downloads (DART) supplied
Miriad
Specific JCMT tools provided by Starlink

10.What format(s) are your data in (or can be translated into)?

Raw data proprietary, processed data AIPS FITS
GILDAS format; convertible to AIPS and other FITS 
VLA export (understood by AIPS), various FITS
Miriad/AIPS++ MS2 for UV; also FITS for images
RPFITS
SCUBA NDF
Other GSD, will be FITS

Plots etc.: PS, PNG, tar archives

11.Do you use pipelines?

Yes for continuum
Yes, recent
Yes for VLBA
Being developed for VLA
Will be for EVLA, ALMA
prototype
under development
For continuum: Generic JAC pipeline ORAC-DR, also generates light-curves

12.How far are data normally reduced before being supplied to the user?

Some data have minimal calibration and conversion to FITS, others have
enough calibration to allow imaging, or user can elect to reduce raw
data (only at observatory)

UV data calibration
Much reduction; flux calibration and editing still required

Conversion to AIPS++ MS and thence to IDI_FITS. Calibration tables generated.

Calibration and imaging of single tracks; track combination planned

Visibility data; plans to pipeline calibration and imaging and some
extracted source properties and cross-links

Calibrated and coadded, need more resources for rigorous quality
control in archiving

Often implicit that user editing in particular may be needed.

13.Are these stages:
a) documented?
yes
yes in FITS AIPS history/ Miriad history
pipeline script supplied and documented.
Histories kept, scripts supplied.

b) reversible?
all after FITS
yes
not entirely
no but can get unprocessed data

14.Can data be processed remotely (i.e. user in x, data in
   observatory/datacentre)?

Prototype
Not yet
No
Planned
Yes (but I think they meant off-site entirely - question initially badly 
worded)
Could in theory use pipeline to control remote workflow using web
services

15.What Virtual Observatory projects (if any) are you involved in?

(GAVO)
AstroGrid AVO IVOA (also PVO)
None formally
NVO
Aus-VO, IVOA
None formally, would like to publish data to VOs

16.Do you use explicitly any interoperability tools, e.g. data models,
   UCDs, VOTable?

UCDs, VOTable for CDS/AVO publication.  Prototype DM.  Developing
local AstroGrid Data Centre.
As used for ADIL (cone search, SIA etc.)
Yes as VO compatibility developed
Developing DM
No
Not directly

17.Do you publish any data via existing VO-like facilities e.g. CDS,
   MAST?

CDS
Planned
Surveys e.g. SUMSS, NVSS published
ADIL
No
Header information at CADC.  Would like to see published sub-mm flux
measurements available via such facilities.

18.Making data access easier for a wider range of astronomers - what are
   your views on whether/how these suggestions should be implemented:

a) Using a VO interface to radio observatories/data centres to run
   hidden software to provide required image, light curve,
   visibilities etc.?

yes
yes
yes 
yes - access for astronomer at any level of radio experience, clear
information about what products are available, which should be a wide
range.  Interface should encourage appropriate use.
SIA useful; datacube access needed
Need to investigate practically and in discussion what could be useful
for visibility data
Planned

b) Supplying information about hidden processing (software, versions,
   parameterisation, etc)?

yes but how to make user-friendly...
yes especially for remote processing via VO
Let people get the information they want/need, not what they don't
(which will depend on user level of experience).
Supply software-generated history via metadata interface
Data always comes with history files

c) Standardising the software in use at radio observatories/ data
   centres?

yes or user-friendly VO-type interface
Increase interoperability where possible
Provide tools/support for producing and extracting info from VOTables
Standardising of interfaces is more important
No, need to leave software selection to local experts who know how to
provide the products the users want
Reduced images incl. data cubes are already reasonably accessible;
acquiring bolometer time series data could be made so.

Generic software for preparing observations exists for JCMT and UKIRT

d) Standardising the format of data products?

Yes where useful.  Use standard data model for exportable (meta)data
(can be specific where only needed internally)
Highly desirable
Highly desirable for processed data in accessible archives
e.g. calibrated uv data, spectral cubes.
Standard method of viewing (?extracting measurements?) is more important.
Yes, e.g. to convert FITS between the different flavours in use in
interferometry without loss of information. Don't lock format to
software.  Use standard data model.
Yes (might be easier said than done)
Our FITS cubes use standard WCS, could use other standard headers
(does this imply conversion desirable?)

19.What do you think astronomers want from your data?

Some: raw data
Most: finished products (or for final easy stages)
Many: custom products obtained from calibrated uv data e.g. M+V, light
curves with sample time/uv dist avg determined interactively
All: History, other information
Many: Analysis tools

Some: raw data to do better than any machine
Others: final images/flux measurements

High spectral resolution uv data
Cleaned images
Cleaned images
Calibrated uv data especially for combination with other data incl. SD
remote Analysis tools e.g. extract cal fluxes 

Easy access
Full characterisation of origin, quality, analysis tools

easy quick reliable processing to intelligible products (which will
have different implications for different levels of user experience).

20.What are your plans for archive development or any other relevant
   suggestions?

VO compatibility 

Complete population of archive with pipelined products (from partially
calibrated uv data to images depending on expt)
On-line remote data product extraction - local web page initially, then VO
VO compatibility and access including use of US Teragrid for access
and possibly processing.
On-line access to data archive and products via a single local
interface incl. authorisation if required.
Pipeline all expts and archive reports and results
Connect to other archives/data centres (espec. issue for VLBI)
Put pipeline into production mode, enhance capabilities including
user-driven.
(see ATCA on-line proj doc)
Provide fully calibrated and reduced data to VO with good provenance.
Work out how to cope with high data rates from new instruments.




Topic revision: r2 - 2004-05-11 - BobHanisch
 