Provenance Data Model RFC #2 PageHistoryAfter the first RFC period (November 2018) during which no consensus was reached, the Provenance model has been revamped.
DocumentThe attached document presents the IVOA Provenance Data Model v1.0 which is proposed for review is accessible in ivoadoc. This document describes how provenance information can be modeled, stored and exchanged within the astronomical community in a standardized way. We follow the definition of provenance as proposed by the W3C (https://www.w3.org/TR/prov-overview/), i.e. that "provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness."Such provenance information in astronomy is important to enable any scientist to trace back the origin of a dataset (e.g. an image, spectrum, catalog or single points in a spectral energy distribution diagram or a light curve), a document (e.g. an article, a technical note) or a device (e.g. a camera, a telescope), learn about the people and organizations involved in a project and assess the reliability, quality as well as the usefulness of the dataset, document or device for her own scientific work.
Reference Implementation
CTA implementations (M. Servillat)The Cherenkov Telescope Array (CTA) is the next generation ground-based very high energy gamma-ray instrument. Contrary to previous Cherenkov experiments, it will serve as an open observatory providing data to a wide astrophysics community, with the requirement to propose self-described data products to users that may be unaware of the Cherenkov astronomy specificities. Because of the complexity in the detection process and in the data processing chain, provenance information of data products are necessary to the user to perform a correct scientific analysis. Provenance concepts are relevant for different aspects of CTA:
Pollux Provenance: a simple access protocol to provenance of theoretical spectra (M. Sanguillon)POLLUX is a stellar spectra database proposing access to high resolution synthetic spectra computed using the best available models of atmosphere (CMFGEN, ATLAS and MARCS), performant spectral synthesis codes (CMF_FLUX, SYNSPEC and TURBOSPECTRUM) and atomic line lists from VALD database and specific molecular line lists for cool stars. Currently the provenance information is given to the astronomer in the header of the spectra files (depending on the format: FITS, ASCII, XML, VOTable, ...) but in a non-normalized description format. The implementation of the provenance concepts in a standardized format allows users on one hand to benefit from tools to create, visualize and transform to another format the description of the provenance of these spectra and on a second hand to select data depending on provenance criteria. In this context, the ProvSAP protocol has been implemented to retrieve provenance information in different formats of the serialized data: PROV-N, PROV-JSON, PROV-XML, VOTABLE and to build diagrams in the following graphic formats: PDF, PNG, SVG. These serializations and graphics are generated using the voprov python package derived from the prov Python library (MIT license) developed by Trung Dong Huynh (University of Southampton).SVOM Quick Analysis (L. Michel)The SVOM satellite is a Sino-French variable object monitor to be launched in 2021. When a transient object is detected, a fast alert is sent to the ground station through a worldwide VHF network. Burst advocates and instrument scientists are in charge of evaluating the scientific relevance of the event. To complete the assement, scientists have at their disposal high level data products such as light curves or spectra generated by an automatic data reduction pipeline. In some case, they might need to reprocess raw data with refined parameters. To do so, scientific products (calib_leve >= 2) embedd their JSON provenance serialization in a specific extension. This provenance instance can be extracted, updated and then uploaded to a dedicated pipeline to reprocess the photon list with different parameters.ProvHIPS CDS prototype service providing provenance metadata for HiPS datasets stored at CDS. ( F. Bonnarel, A. Egner)This prototype is both an implementation of Provenance Data Model and of the DAL ProvTAP protocol. ProvTAP is a proposal for providing Provenance metadata via TAP services. The current draft for this DAL protocol definition (TBC) can be found here). It is basically providing a TAP-schema mapping the IVOA Provenance model onto a relational schema.
| ||||||||
Changed: | ||||||||
< < | We are currently creating a new version of our ProvHiPS database containing the history of the HiPS tiles for all the HST HIPS subsets, as well as the HiPS version of the DSS2 Schmidt plates survey. This is illustrated by this viewgraph and the result of this work will be presented at next interop meeting in Groningen. | |||||||
> > | We are currently creating a new version of our ProvHiPS database containing the history of the HiPS tiles for all the HST HIPS subsets (see here, here and there ) as well as the HiPS version of the DSS2 Schmidt plates survey. This is also globally illustrated by this viewgraph and the result of this work will be presented at next interop meeting in Groningen. | |||||||
A triple Store implementation for an image database (M.Louys, F.X Pineau, L.Holzmann, F.Bonnarel).We have implemented the IVOA provenance DM proposed using a BlazeGraph triplestore data base. It handles astronomical images as Entities with a succession of Activities that produce or consume them. Entities represent image files or photometric plates. Activities are described by an ActivityDescription instance which gives the template for execution of several different Activity instances based on the same template. / The Blazegraph prov-test prototype translates the classes via an Ontology of Object in OWL. The Prov_Owl diagram attached exposes the main objects of this ontology. This prototype implements:
MuseWise Provenance: Implementation of ProvTAP, ProvSAP, W3C, and visualisation (O. Streicher)MUSE is an integral field spectrograph installed at the Very Large Telescope (VLT) of the European Southern Observatory (ESO). It consists of 24 spectrographs, providing a 1x1arcmin FOV (7.5" in Narrow Field Mode) with 300x300 pixel. For each pixel, a spectrum covering the range 465-930nm is provided. MuseWise is the data reduction framework that is used within the MUSE collaboration. It is built on the Astro-WISE system, which has been extended to support 3D spectroscopic data and integrates data reduction, provenance tracking, quality control and data analysis. The MUSE pipeline is very flexible and offers a variety of options and alternative data flows to fit to the different instrument modi and scientific requirements. MuseWISE hast provenance "built-in", i.e. it stores all relevant provenance information during the execution of the pipeline. Our implementation presents this provenance information conform to the upcoming standard. Activity configuration is modelled as Entities, to enable accessing the provenance of the configuration, and for W3C compatibility. The provenance data are primarily presented as a relational database which is conform to the IVOA Provenance model and close to the upcoming ProvTAP standard. It can be queried via SQL and ADQL queries. The database covers the full processing chain from the exposure to the science-ready product, including all necessary entity, activity, usage, and generation descriptions. On top of the relational representation, we developed a few tools for alternative acess methods: 1. A complete ProvSAP sever that translates REST queries into relational queries for a provenance database, converts the result into the W3C Provenance model and presents it in W3C formats (XML, Prov-N, OWL2, JSON). The part of the IVOA model that is not W3C compatible (ActivityConfiguration package) is not queried. The result can be processed and stored by any W3C compatible tool and service. 2. A tabular vizualisation prototype of the provenance based on the W3C model (from the first tool), to present the data to the user. This page shows an example output for the complete chain of one data product. We also tested this prototype on other available data that we locally stored in databases with the same structure as our MuseWISE implementation: HIPS generation, CTA, and cube segmentation.APPLAUSE Provenance: publically available implementation via TAP (A. Galkin)German astronomical observatories own considerable collection of photographic plates. While these observations lead to significant discoveries in the past, they are also of interest for scientists today and in the future. In particular, for the study of long-term variability of many types of stars, these measurements are of immense scientific value. There are about 85000 plates in the archives of Hamburger Sternwarte, Dr. Karl Remeis-Sternwarte Bamberg, and Leibniz-Institut für Astrophysik Potsdam (AIP). The plates are digitized with high-resolution flatbed scanners. In addition, the corresponding plate envelopes and observation logbooks are digitized, and further metadata are entered into the database. The APPLAUSE implementaion of the provenance model is the only publically accessable imlementation and is accessable through the TAP protocol on https://www.plate-archive.org Currently, the provenance schema for the APPLAUSE Data Release 3 has 528082 entities, 425421 activities, 327 agents, 799097 used relations, 471888 wasGeneratedBy relations and 138552 wasAttributedTo relations.Implementations ValidatorsThere is no validator for the model as such. Validation tools should be applied to specific implementations of the model.RFC Review Period: 2019 July 23 - 2019 September 03Comments from WG membersTCG Review Period: TCG_start_date - TCG_end_dateTBC<--
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
Added: | ||||||||
> > |
| |||||||