Use of DataCite DOIs for Citing Astronomical Data
Arnold Rots, Raffaele D'Abrusco, Sherry Winkelman
Now that data citation has become a mature and accepted concept, there is a need for a globally accepted mechanism for assigning Persistent Identifiers (PIDs) to the cited datasets. About 15 years ago a working group (ITWG) of the ADEC (the executive committee of the NASA data centers) defined a PID mechanism based on IVOA identifiers under the authority of the ADS, which was agreed upon with the AAS journals. This system has served the Chandra Data Archive (CDA) well, but it found a limited following and was clearly designed for a limited community, with equally limited support.
Globally accepted PID types are now available, and data repositories in astronomy and other fields are generally adopting DOIs issued by DataCite. These are eminently suitable and provide a good set of metadata. As the astronomical repositories are beginning to design and implement the use of these DOIs, this seems the right moment to meet and discuss our plans and insights regarding their use, in order to attain a certain level of commonality in approach, which will benefit the discoverability of the cited datasets and flexibility in their access.
Three design and implementation matters are especially pertinent in this respect:
- The schema used for the DOIs; for instance, STScI currently mints DOIs that are publication-based, containing pointers to an aggregate of datasets, while CDA plans to augment these with a separate set of DOIs associated with single datasets.
- The metadata schema used for astronomical datasets: what elements do we use and what conventions govern their content; this is particularly relevant for data discovery directly through DataCite.
- The use of trailing fragments to allow access to components within a dataset.
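As a purely illustrative sketch of the third point, the following Python fragment shows one way a client might separate a dataset DOI from a trailing fragment that names a component. Everything specific here is an assumption made for illustration: the DOI prefix `10.99999`, the suffix, the fragment name `evt2`, and the helper `split_doi_reference` are hypothetical, and no repository convention for fragments is implied.

```python
from typing import NamedTuple


class DoiReference(NamedTuple):
    """A DOI reference split into the dataset DOI and an optional component."""
    doi: str        # the part that the DOI resolver would resolve
    component: str  # trailing fragment; empty string when absent


def split_doi_reference(reference: str) -> DoiReference:
    """Split 'DOI#fragment' at the first '#'.

    Hypothetical helper: how a landing page interprets the fragment
    is left to the repository and is not modelled here.
    """
    doi, _, component = reference.partition("#")
    return DoiReference(doi=doi, component=component)


# Hypothetical example values (not real DOIs).
whole = split_doi_reference("10.99999/example.obs.1234")
part = split_doi_reference("10.99999/example.obs.1234#evt2")
```

In this sketch both references share the same `doi` value, so a citation to a component still resolves to the dataset as a whole; only the `component` field differs.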