Agenda

Discussion

Bob Hanisch described the NSF Datanet initiative, an interdisciplinary data curation and preservation (DCP) program. Datanet projects must span different types of research communities. The NVO project is collaborating with a proposal team led by the JHU library, and has passed the pre-proposal and full proposal stages of the process. A site review was conducted a few weeks ago. Selections should be announced in August. Projects will be funded for up to $20M over five years, and may be continued for a second five years but with decreasing funding. Part of the challenge to Datanet projects is to make data curation and preservation sustainable; i.e., NSF does not plan to directly fund DCP efforts indefinitely. There may be various models for sustainability, including institutional commitment (research libraries), fees for data providers, fees for data consumers, and/or something else. Solutions may be different for research communities with strong commercial interests (biology, chemistry). The NVO interest in Datanet focuses on the data associated with peer-reviewed papers: "homeless data". A solution involves a partnership with the professional societies, the publishers, and the research community.

Andy Lawrence asked about copyright concerns. Bob noted that the AAS has a liberal policy on re-use, and that the AAS itself, not its publishing agent, holds the copyright for the papers in the ApJ and AJ. Francoise Genova noted that they work closely with Astronomy & Astrophysics and have no problems with the tables that they maintain for them.

Dave De Young asked about the practicality of a DCP system for such a wide range of disciplines, from astronomy to ornithology. Bob said that they were starting with a very general data model for observations (drawing on VO experience) in which the where, when, what, how types of questions form the basis for the metadata. Whether this will be sufficiently powerful to span such diverse research areas remains to be seen.
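As a purely illustrative sketch of what a discipline-neutral "where, when, what, how" record could look like, the following Python fragment uses a hypothetical ObservationRecord class and a hypothetical missing_fields helper. Neither the names nor the example values come from the Datanet proposal or from any VO standard; they are assumptions made here only to make the idea concrete.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObservationRecord:
    """Hypothetical discipline-neutral record; not the actual Datanet data model."""
    where: str      # location of the observed subject (sky position, field site, ...)
    when: datetime  # time of the observation
    what: str       # quantity or subject that was observed
    how: str        # instrument or method used


def missing_fields(record):
    """Return the names of any of the four fields that were left empty."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


# Made-up example values: one astronomical and one ornithological observation.
star = ObservationRecord(
    where="RA 83.63 deg, Dec 22.01 deg",
    when=datetime(2008, 5, 16, tzinfo=timezone.utc),
    what="optical flux",
    how="CCD imaging",
)
bird = ObservationRecord(
    where="lat 39.33, lon -76.62",
    when=datetime(2008, 5, 17, tzinfo=timezone.utc),
    what="species count",
    how="",  # method deliberately left blank to show the completeness check
)

print(missing_fields(star))
print(missing_fields(bird))
```

The same four fields describe both records, which is the appeal of so general a model; whether four free-form fields carry enough detail for real cross-disciplinary queries is exactly the open question raised in the discussion.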

Igor Chilingarian reported that CNES and the Observatory of Paris have been collaborating on data curation, and that he gave an introduction to the VO. He has also been using his own data as a suite of test cases, publishing them through the Observatory and making them available through VO protocols.

Francoise noted that the work they have been doing at CDS for many years now on the tables associated with publications has been successful, and pointed out that there is a significant cost associated with data curation.

Igor said that some authors do not want to share their data. Bob noted that we cannot insist on data publication without first having a means to support it, and that some journals (e.g., in gene sequencing) require data publication as a condition of acceptance of a manuscript.

Pepi Fabbiano described in very general terms the US interagency study on DCP. The report was commissioned by the White House and remains confidential until it is accepted by the National Science Board. Pepi served on the subcommittee on interoperability. The study encompassed all government agencies.

Pepi described a similar effort going on within the Smithsonian Institution, which is also very diverse (astronomy, natural history, zoology, meteorology, and museums with many physical artifacts). Some people are just beginning to understand the importance of a digital record. Common access to diverse digital data could enable truly cross-cutting investigations such as in climate change and biodiversity.

Bob showed the website for an upcoming meeting, co-organized by Wolfgang Voges, in which DCP efforts of the Max-Planck-Institutes will be discussed.

Andy reminded us that the UK is active in DCP efforts through the Digital Curation Centre (Bob Mann will be assisting with our IVOA activities).

Andy also pointed out the importance of the research library community, which commands significant resources, and noted that commercial journal publishers are not sleeping. There is also the open access movement. It is a highly fluid time in publishing, libraries, and information management.

We then turned to discussion of the White Paper. We should be sure to discuss at-risk data (digital and non-digital) and consider the cost-benefit arguments: are all data worth saving? We agreed that preservation of software is very difficult, and maybe even impossible. Software might best be archived via its effect on data (e.g., SDSS DR 1, 2, 3, ... 6). It is also fine to capture source code. We should describe DCP success stories and then see if the approaches they used are scalable. We need to define the audience for the White Paper. The idea is that it is a policy statement, backed up by the motivation and benefits of DCP, that can be used as leverage by academic departments, institutions, projects, and national and international organizations. Wolfgang has some funding to support work on this White Paper. Our goal is to have a draft ready for discussion at the Interop meeting in October.

-- BobHanisch, based on notes taken by FrancoiseGenova


Topic attachments
DCPWhitePaper0p1.doc (Microsoft Word file, 55.5 K, 2008-05-16 18:52, BobHanisch)
Topic revision: r4 - 2008-05-27 - BobHanisch
 