Provenance meeting in Paris, France, 2018 August 28th to August 30th
supported by OV-France, GAVO, Paris Data Center and the ASTERICS project
Follow_up_meeting
hosted at Observatoire de Paris, France
Finalising the proposed recommendation for Fall interoperability meeting
Minutes / ML to be completed
Side discussion before the start:
Poster for the ADASS conference?
- Mathias Fussling (?) / Karl Kasak to present CTA-O science user support: a provenance flavour on CTA / CTA data processing
- ProvDM data model poster: Mathieu or Mireille as first author?
- Implementation for CTA: Mathieu?
- Implementation prototype in a triple store: Mireille
To be decided before Thursday noon.
The ProvDM draft update we plan to validate over the next days:
- ActivityFlow: designed as a coarse-grained Activity in ProvDM / designed as a Plan in PROV-ONE
--> Skip it; leave it for a later version
ProvSAP: new draft updated by Kristin. Output format: in a multi-layer graph, ids should be unique so that several graphs can be merged (?)
Do you plan graph merging? Kristin's first attempt: all nodes are present only once
What does DEPTH= mean?
1- Generation levels for entities?
2- Relations between 2 classes of the Prov core? Then we need to subclass the kind of relations: entity/entity or entity/activity/entity?
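The two readings of DEPTH can be contrasted on a toy graph. This is only a sketch: the graph, the node names and both helper functions are hypothetical and are not taken from the ProvSAP draft.

```python
from collections import deque

# Hypothetical toy chain; each node points to what it depends on (backwards in time).
EDGES = {
    "entity:catalog": ["activity:extract"],
    "activity:extract": ["entity:image"],
    "entity:image": ["activity:calib"],
    "activity:calib": ["entity:raw"],
    "entity:raw": [],
}


def by_relations(start, depth):
    """Reading 2: DEPTH counts every single relation (edge) that is followed."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:
            continue  # budget of relations exhausted on this path
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return seen


def by_generations(start, depth):
    """Reading 1: DEPTH counts entity generations (entity -> activity -> entity is one step)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node.startswith("entity:") and d == depth:
            continue  # this entity is already 'depth' generations away
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                # only reaching a new entity consumes one generation
                step = 1 if nxt.startswith("entity:") else 0
                queue.append((nxt, d + step))
    return seen
```

With DEPTH=1 and the catalog as starting point, the first function stops at the extraction activity, while the second also returns the image the catalog was extracted from.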
Round of introductions (tour de table) - participants
- Mathieu Servillat, LUTH
- Anastasia Galkin, Kristin Riebe, AIP, Potsdam
- Michael Johnson, postdoc at Southampton. Co-advised by Luc Moreau and ?
- Markus Nullmeier, ARI, Heidelberg
- Catherine Boisson, LUTH, Paris
- Michèle Sanguillon, LUPM, Montpellier
- François Bonnarel, Mireille Louys, CDS Strasbourg
Discussed other data models presented at Provenance Week:
- PROV-ONE: scientific workflows
- UniProv: a binding of Prov DM + PROV-ONE
Parameter discussion:
Are there contexts where parameters are different from entities?
Simulation? A result is computed by an activity: is it an entity? A parameter?
Proposal:
Is WasDerivedFrom the relation to express that a parameter stems from an Entity?
Still to be discussed.
Scope for the draft?
- ActivityFlow / workflow postponed to the next version, Prov 2.0
- Will benefit from PROV-ONE concepts and from discussion with its authors
— End of day 1
Wednesday Aug 29
Presentations
- ProvDM draft: where do we stand? - MS
Discussion: what format and content should come out of ProvSAP?
Proposed by Potsdam: this tool is to be compatible with W3C tools. Agreed.
It will display more concepts than W3C PROV,
and extends the attributes provided in the W3C core to specific entity classes.
But each class derived from Entity can be shown as an entity type: parameter, parameter description, activity description, etc.
Translation for relations:
IVOA --> W3C
had context --> was influenced by
had config --> was influenced by
But then it is not easy for the user to read.
It loses the fine classification that applications need to sort the provenance info.
Should we derive new visualisation methods for the IVOA? Enrich the Provenance suite, for instance…
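One way to limit the loss of classification noted above is to keep the IVOA relation name as an extra attribute when translating. The sketch below is an assumption-laden illustration: the identifier spellings (hadContext, hadConfig), the attribute name ivoa:relationType and the dict layout are all hypothetical; only the mapping itself comes from the discussion.

```python
# Mapping recorded in the minutes; identifier spellings are assumed.
IVOA_TO_W3C = {
    "hadContext": "wasInfluencedBy",
    "hadConfig": "wasInfluencedBy",
}


def to_w3c(relation):
    """Translate one relation dict to its W3C name.

    The original IVOA name is preserved in a hypothetical
    'ivoa:relationType' attribute so that applications can still sort on it.
    Relations without an entry in the mapping are returned unchanged.
    """
    ivoa_name = relation["type"]
    w3c_name = IVOA_TO_W3C.get(ivoa_name, ivoa_name)
    translated = dict(relation, type=w3c_name)
    if w3c_name != ivoa_name:
        attributes = dict(relation.get("attributes", {}))
        attributes["ivoa:relationType"] = ivoa_name
        translated["attributes"] = attributes
    return translated
```

A W3C-only viewer would then show a plain wasInfluencedBy edge, while an IVOA-aware viewer could read the attribute to restore the finer relation type.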
- CTA pipe - MS
CTA pipe: how calibration provenance is simply recorded.
Input, output, context, config --> trace what happens
This is to cover the CTA pipeline execution.
ML: I see it as a workflow tracking system
- OPUS - MS
Job management: PDAC service at Observatoire de Paris
The description of the step is taken from the ActivityDescription definitions.
A form can be filled in by the user submitting the job:
Describe the ActivityDescription record with parameter names, UCDs, datatypes
Used / generated section
Includes default parameter values
When the job is run, the values are transmitted.
The provenance graph is generated at the end of job execution and displayed.
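The step from described parameters with defaults to the values transmitted at run time can be sketched as follows. This is not the OPUS code: the class, the function and the parameter names are hypothetical, chosen only to illustrate the mechanism described above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ParameterDescription:
    """Hypothetical record for one described parameter (name, datatype, UCD, default)."""
    name: str
    datatype: type
    ucd: Optional[str] = None
    default: Any = None


def resolve_parameters(descriptions, submitted):
    """Merge the values submitted with the job into the described defaults.

    Unknown parameter names are rejected; values are cast to the described datatype.
    """
    known = {d.name: d for d in descriptions}
    unknown = set(submitted) - set(known)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for name, desc in known.items():
        raw = submitted.get(name, desc.default)
        values[name] = None if raw is None else desc.datatype(raw)
    return values


# Hypothetical description of a job step with two parameters.
DESCRIPTIONS = [
    ParameterDescription("pixel_scale", float, default=1.0),
    ParameterDescription("output_name", str, default="out.fits"),
]
```

The resolved dictionary is what a provenance record for the job would store as the parameter values actually used.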
DEPTH param in ProvSAP:
Option "all agents": does it mean track them all, or show them all?
To be discussed.
Identifying files locally and across various partners or a common store:
Used a namespace to generate the unique ids.
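A minimal sketch of namespace-based identifiers, assuming each partner owns a namespace prefix or URI. The function names, the namespaces and the file names used with them are hypothetical; `uuid.uuid5` is the standard-library call for deterministic, name-based UUIDs.

```python
import uuid


def qualified_id(namespace, local_name):
    """Prefix a local name with the partner's namespace (qualified-name style)."""
    return f"{namespace}:{local_name}"


def stable_uuid(namespace_uri, local_name):
    """Deterministic UUID derived from a namespace URI and a local name.

    The same inputs always give the same UUID, so an id can be recomputed
    by any partner without a central registry.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, namespace_uri.rstrip("/") + "/" + local_name)
```

Two partners that both hold a file called run001.fits then get distinct identifiers, as long as their namespaces differ.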
- Anastasia - provenance for the APPLAUSE database
Log pages to be hooked to the scanning process: you could have a scanning activity.
Not bound to a simple Prov object. Not clear.
Provenance for light curves:
The source for each point comes from a different observation, which makes the graph too crowded.
How to distinguish the data points provided from the provenance of the observations?
A source, or rather a detection, as an entity.
How can you adjust the granularity?
Michael (?): did you try to use the summary functions from the ProvToolbox? Summaries help to gather info in a coarse-grained view.
- Michael Johnson - astronomy workflows
2 use cases:
a. LSST differential photometry: is it a new detection or an error?
b. Was the image badly calibrated due to a bad choice of standard star? Trace how the standard star is measured and chosen.
What you pay for provenance: +45% info to store
What you get: trust improves 99% for use case 1, and ?% (a bit less) for use case 2
UML2PROV Carlos … to generate
Provenance helps to upgrade data quality and trust.
Highly use-case dependent: you must track the info that may cause errors or improve trust
--> this feeds the ActivityDescription and parameter descriptions
How to construct Prov templates?
What is the status?
- ProvSAP for Pollux - Michèle
Agents: which agents? All, or only those of one activity?
Mainly to track the history of a data set
ProvTAP is more general
Parameter RESPONSEFORMAT
ML: should RESPONSEFORMAT also be used for graphics formats?
How could we distinguish both outputs?
Response: the output can be chained to another application, so keep only one output
We will probably have advanced viewers that expand groups of metadata through hierarchical browsing in the viewer.
- Kristin - Provenance for the RAVE survey
Shows a ProvSAP client interface
A Sankey representation for graphics is a possible output format
They are exclusive choices: you need to run the query again if you also want the result as metadata.
Then it is not scriptable? What to do with all the SVG files?
What does DEPTH mean? Counting relations, or entity-to-entity generation layers?
- François Bonnarel ProvTAP
Draft ready since April, please provide comments
Prototype for a TAP service:
This is an implementation on a simple database
TAP service working
Interface with Aladin, TOPCAT, etc.
To do, to enrich the CDS prototype: insert the 300 HiPS files?
Question from Anastasia: how many GB does the provenance metadata cost?
Store data provenance for any source?
Is it possible?
Discuss with the Gaia people?
François: if you can query the database in SQL and store all your sources, then TAP can do it.
If you have a unique id for an entity and a dataset publisher id,
you can query both sides, the provenance database and the ObsCore database, by a DataLink, or by a broadcast mechanism from one database to the other?
Give me the provenance of obs-publisherid = 12345 --> query ProvTAP or ProvSAP with e-id = 12345
And vice versa: find an e-id and build up the query on an ObsTAP service…
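The two-way lookup could be sketched as a pair of ADQL query builders. Several things here are assumptions: the provenance table and column names are placeholders (not taken from the ProvTAP draft), and the entity id is assumed to be identical to the dataset publisher id, as in the example above. Only `ivoa.obscore` and `obs_publisher_did` are ObsCore names.

```python
def adql_literal(value):
    """Quote a value as an ADQL string literal (embedded single quotes are doubled)."""
    return "'" + str(value).replace("'", "''") + "'"


# Placeholder names for the provenance side; adapt to the actual service schema.
PROV_ENTITY_TABLE = "provenance.entity"
PROV_ENTITY_ID = "id"


def provenance_query(obs_publisher_did):
    """ADQL to look up the provenance entity matching an ObsCore dataset id."""
    return (
        f"SELECT * FROM {PROV_ENTITY_TABLE} "
        f"WHERE {PROV_ENTITY_ID} = {adql_literal(obs_publisher_did)}"
    )


def obscore_query(entity_id):
    """ADQL to look up the ObsCore record matching a provenance entity id."""
    return (
        "SELECT * FROM ivoa.obscore "
        f"WHERE obs_publisher_did = {adql_literal(entity_id)}"
    )
```

Each string would be sent to the corresponding TAP service; the builders only prepare the query text and do not contact any server.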
Problems with PostgreSQL and server settings…
Maybe try Docker…
A new package can include extra relations in a W3C provenance document and visualize them
Extensions of the visualisation
+ Working Draft discussion:
Check the DM requirements:
If an entity exists, there is an Activity that created it.
Update the expression of these constraints.
Identifiers: must be unique.
A replica of an entity is another entity. It must be specified as a copy.
Agreed: Entities, Activities and Agents must be uniquely identifiable.
12. Contact info should be recorded for all Activities and Entities,
i.e. there is always a known Agent linked with them (even a general one).
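The requirements above could be checked mechanically on a set of provenance records. The sketch below assumes a deliberately simple, hypothetical layout (lists of id strings plus two lookup dicts); it is not how any of the prototypes store their data.

```python
from collections import Counter


def check_constraints(entities, activities, agents, generated_by, responsible):
    """Return a list of human-readable violations of the three requirements.

    entities, activities, agents: lists of identifier strings
    generated_by: dict mapping an entity id to the id of the activity that created it
    responsible: dict mapping an entity or activity id to an agent id
    """
    problems = []

    # Identifiers must be unique across entities, activities and agents.
    counts = Counter(entities + activities + agents)
    for ident, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate identifier: {ident}")

    # If an entity exists, there is an activity that created it.
    for e in entities:
        if generated_by.get(e) not in activities:
            problems.append(f"entity without generating activity: {e}")

    # There is always a known agent linked with each entity and activity.
    for x in entities + activities:
        if responsible.get(x) not in agents:
            problems.append(f"no agent linked to: {x}")

    return problems
```

An empty list means the records satisfy all three requirements as expressed here; how strictly the first one should apply to raw input entities is left open in the discussion.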
What is EntityDescription? Do we need it? Who has implemented it?
How do we describe the role and the constraints that entities must satisfy when they are to be used by an activity?
Used and UsedDescription /
What is needed for input params?
An id
A title
For us, we can have a role
What is needed for output params?
MS: The description side is the degree of freedom for each project to feed in its specific metadata.
EntityDescription could be one of these placeholders ???
We need more info about this… more practical usage examples.
Parameter :
What is a parameter in our model?
A parameter cannot be used or generated by ?
It is an entity, but restricted…
It can be derived from an Entity when it is data.
?? Do we need to say it ??
We could also set up this rule: if your parameter must be traced, define it as an entity.
EntityDescription
What would you put here?
I would put the required format for my entities to be used with ActivityDescription Axx
The role
Example:
ActivityDescription: regridding
Constraints
input-format = fits ??? MIME type?
output formats = fits
errorMapPreviewformats = jpeg
SExtractor: with particular output styles: .dat --> check for an example to describe
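The regridding example can be written down as data to see what an EntityDescription would have to carry. This is a sketch for discussion only: the class layout, the field names and the `accepts` helper are hypothetical and not part of the draft; only the role and format values come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityDescription:
    """Hypothetical record: the role an entity plays and the formats allowed for it."""
    role: str
    allowed_formats: tuple


@dataclass
class ActivityDescription:
    """Hypothetical record: a named step with described inputs and outputs."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def accepts(self, role, fmt):
        """True if an entity with this role may have this format (case-insensitive)."""
        for desc in self.inputs + self.outputs:
            if desc.role == role:
                return fmt.lower() in desc.allowed_formats
        return False


# The regridding example from the discussion, expressed with the classes above.
regridding = ActivityDescription(
    name="regridding",
    inputs=[EntityDescription("input", ("fits",))],
    outputs=[
        EntityDescription("output", ("fits",)),
        EntityDescription("errorMapPreview", ("jpeg",)),
    ],
)
```

Whether the format constraint is best expressed as a short name like this or as a MIME type is exactly the open question raised in the example.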
Do we specialize the table for each particular specialized entity?
Yes, for a full archive…
What goes into the draft:
- ParameterDescription
- EntityDescription: implemented as
  - François: Doculink to a list of properties of the data used as input entities
  - Mathieu: parameter from the UWS interface: MIME type
  - Michèle: a Doculink to a list of information on the data structure / documentation
Explain that this class can be used to hook in project-specific info explaining the constraints on an entity.
Proposal MS: this is not normative in the document.
- Discussion on the Description part of the model (in the Appendix):
This part is important to provide metadata to the core model.
We explored the PROV-ONE proposal and also checked the possibility of using the W3C Plan.
Mention that we consider reusing the input and output ports from PROV-ONE.
Process from PROV-ONE is similar to our ActivityDescription, but maybe not with the same granularity.
To be checked.
— End of day 2