InterOpMay2005 DataCP (Data Curation & Preservation)


Reagan Moore, Francoise Genova, Robert Hanisch, Pepi Fabbiano, plus others.
Will contact the following for their participation: Bob Mann, Ray Plante, Gunther Eichhorn, Mike Watson (policy), Michael Kurtz, Peter Quinn.


Time   Topic          Slide  Leader
09:00  CP Overview    .pdf   ReaganMoore
09:30  Discussion
11:30  Action points         ReaganMoore

Materials and notes from the work group and plenary sessions

Overview: The development of curation mechanisms for IVOA collections is viewed as an essential prerequisite to any preservation effort. Curation comprises appraisal (assessing what is worth keeping), accession (controlled import of material), arrangement (organizing the material), description (providing the metadata needed for discovery and to assert authenticity), preservation (creating a standard archival form and associating integrity metadata with it), and access (creating suitable display and manipulation services).

The IVOA community has established the basic standards needed for curation, including a standard metadata vocabulary (Unified Content Descriptors, UCDs), a standard data encoding format (FITS), and a standard set of access services (SIAP, SSAP, Cone Search).

IAU Commission 5 has published a resolution recognizing the need for full access to data by scientists, based on the emergence of the concept of a virtual observatory. The resolution by Ray Norris recommends that, after a suitable proprietary period, data be made accessible for scientific use, and that funding agencies and journals be asked to encourage this.

Opportunities for IVOA: Of interest is the identification of at-risk material that should be brought into a curation and preservation environment. R. Hanisch expressed interest in curating and preserving the digital data (images, graphical data, spectra) published in journals. The goal is to provide access to the digital data on which an article is based.

Francoise Genova described a related effort within CDS, which curates tables published in A&A and provided by other journals, as well as catalogs submitted by researchers. CDS has developed a set of procedures to validate the semantics of the tables (based on a standardized description of the tables and Unified Content Descriptors) and to check their consistency. A similar project to validate images and digital graphs would be useful to the community.

Pepi Fabbiano raised the issue of support for the digitization of plates; an example project is being pursued at Harvard. Digitization implies a major effort to understand the calibration of the original data.

Consensus on activities:

  1. Develop policies for capturing digital data (images, graphical data based on a digital representation) in an approved form. [ BobHanisch, Pepi Fabbiano ]
  2. Characterize the current metadata curation procedures used for tabular data by CDS. [ FrancoiseGenova ]
  3. Develop a preservation repository example that would support digital data for publishers. This includes:
    • Developing tools for validating submissions. An attempt will be made to build upon existing tools for FITS validation. [ Pepi Fabbiano ]
    • Collaborations with the Data Models group on the testing of digital data for compliance with the SIA and SSA protocols
    • Interactions with publishers to establish a curation process [ BobHanisch, FrancoiseGenova ]
    • Development of an IVOA compliant repository for preserving the digital data [ ReaganMoore ]
  4. Establish interactions with existing preservation groups. These include the IAU task force on digitization and preservation, and repository development efforts such as MIT DSpace and Cornell Fedora.
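
As an illustration of the submission validation mentioned in item 3, the following is a minimal sketch in Python and not one of the existing FITS validation tools. It assumes a submission arrives as a byte string, applies three structural checks drawn from the FITS standard (2880-byte blocks, a leading SIMPLE = T card, an END card), and records a SHA-256 digest as one possible form of integrity metadata. The names check_submission and make_minimal_fits are hypothetical, as is the choice of SHA-256.

```python
import hashlib

FITS_BLOCK = 2880  # FITS files are organized in 2880-byte blocks
CARD = 80          # each header card is 80 ASCII characters


def check_submission(data: bytes) -> dict:
    """Run minimal structural checks on FITS bytes and record a digest.

    Hypothetical helper: this is far from a full FITS conformance test.
    """
    problems = []
    if len(data) == 0 or len(data) % FITS_BLOCK != 0:
        problems.append("length is not a positive multiple of 2880 bytes")

    # A conforming primary header starts with SIMPLE = T, with the
    # logical value right-justified to column 30.
    first_card = data[:CARD].decode("ascii", errors="replace")
    if not (first_card.startswith("SIMPLE  = ")
            and first_card[10:30].strip() == "T"):
        problems.append("first card is not 'SIMPLE  = T'")

    # Scan the 80-byte cards for the END card that closes the header.
    cards = (data[i:i + CARD] for i in range(0, len(data), CARD))
    if not any(card.rstrip() == b"END" for card in cards):
        problems.append("no END card found")

    return {
        "valid": not problems,
        "problems": problems,
        # Assumed form of integrity metadata, stored alongside the file.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def make_minimal_fits() -> bytes:
    """Build a header-only FITS byte string for trying out the checks."""
    cards = [
        f"{'SIMPLE':<8}= {'T':>20}",
        f"{'BITPIX':<8}= {'8':>20}",
        f"{'NAXIS':<8}= {'0':>20}",
        "END",
    ]
    header = "".join(card.ljust(CARD) for card in cards)
    return header.ljust(FITS_BLOCK).encode("ascii")
```

Calling check_submission(make_minimal_fits()) reports a valid file with an empty problem list, while arbitrary non-FITS bytes fail all three checks. A production validator would go further, for example by checking mandatory keywords and data array sizes.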

Topic revision: r9 - 2018-06-20
