Provenance meeting in Paris, France, 2018 August 28th to August 30th

Supported by OV-France, GAVO, Paris Data Center and the ASTERICS project

Follow-up meeting

Hosted at Observatoire de Paris, France

Agenda details and talks are available on the Indico page: https://indico.obspm.fr/event/59/
Aim: finalise the Proposed Recommendation for the Fall interoperability meeting.

Minutes of the meetings (ML), to be completed

Side discussion before the start: poster for the ADASS conference ???

planned:

  • Matthias Füßling (?) / Karl Kosack to present CTA-O science user support: a provenance flavour on CTA / CTA data processing
  • ProvDM data model poster: Mathieu or Mireille as first author?
  • Implementation for CTA: Mathieu?
  • Implementation prototype in a triple store: Mireille
To be decided before Thursday noon.

The ProvDM draft updates we plan to validate over the next days:

  • Activity Flow: designed as a coarse-grain Activity in Prov DM / designed as a Plan in PROV-ONE --> skip it until a later version
ProvSAP: new draft updated by Kristin. Output format: in a multi-layer graph, IDs should be unique so that several graphs can be merged (?). Do we plan graph merging? Kristin's first attempt: all nodes are present only once.
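
The point about unique IDs and graph merging can be illustrated with a minimal sketch. It assumes a graph is a plain dictionary of nodes and edges and that IDs are made unique by a namespace prefix; the `qualify` and `merge_graphs` helpers and the `obsA` / `obsB` namespaces are hypothetical and not taken from the ProvSAP draft.

```python
def qualify(namespace, local_id):
    """Build a globally unique id from a namespace prefix and a local id."""
    return f"{namespace}:{local_id}"


def merge_graphs(*graphs):
    """Merge graphs shaped like {"nodes": {id: attrs}, "edges": [(src, rel, dst)]}.

    Nodes sharing the same qualified id are treated as one node and kept
    once; duplicate edges are dropped.
    """
    nodes, edges = {}, []
    for graph in graphs:
        for node_id, attrs in graph["nodes"].items():
            nodes.setdefault(node_id, {}).update(attrs)
        for edge in graph["edges"]:
            if edge not in edges:
                edges.append(edge)
    return {"nodes": nodes, "edges": edges}


# Two graphs from hypothetical services "obsA" and "obsB" sharing one entity.
image = qualify("obsA", "image42")
calib = qualify("obsA", "calib")
catalog = qualify("obsB", "catalog7")
g1 = {"nodes": {image: {"type": "entity"}, calib: {"type": "activity"}},
      "edges": [(image, "wasGeneratedBy", calib)]}
g2 = {"nodes": {image: {"type": "entity"}, catalog: {"type": "entity"}},
      "edges": [(catalog, "wasDerivedFrom", image)]}
merged = merge_graphs(g1, g2)
```

Without the namespace prefix, two services could both use a local ID such as `image42` for different files, and the merge would silently fuse them.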

What does DEPTH= mean:

  • Option 1 (?): generation levels for entities
  • Option 2 (?): relations between two classes of the Prov core; we would then need to subclass the kind of relation, entity/entity or entity/activity/entity?
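
The two readings of DEPTH give different results on the same graph. The sketch below is only an illustration on a toy chain, with hypothetical node IDs and a hypothetical `trace_back` helper; it is not the behaviour defined in the ProvSAP draft.

```python
from collections import deque

# Toy provenance chain, oldest to newest: e1 -> a1 -> e2 -> a2 -> e3.
# Each edge points backwards in time: (node, relation, predecessor).
EDGES = [
    ("e3", "wasGeneratedBy", "a2"),
    ("a2", "used", "e2"),
    ("e2", "wasGeneratedBy", "a1"),
    ("a1", "used", "e1"),
]


def is_entity(node_id):
    # Assumption for this toy example only: entity ids start with "e".
    return node_id.startswith("e")


def trace_back(start, depth, count_entities_only):
    """Return the set of nodes reached from `start` within `depth` steps.

    count_entities_only=False: every relation costs one step.
    count_entities_only=True: only arriving at an entity costs one step,
    so one step is a full entity-to-entity generation.
    The simple visited check is adequate for a chain like this one.
    """
    reached = {start}
    queue = deque([(start, 0)])
    while queue:
        node, cost = queue.popleft()
        for source, _relation, target in EDGES:
            if source != node or target in reached:
                continue
            step = 1 if (is_entity(target) or not count_entities_only) else 0
            if cost + step <= depth:
                reached.add(target)
                queue.append((target, cost + step))
    return reached
```

With DEPTH=2 starting from `e3`, counting relations stops at `e2`, while counting entity generations goes back to `e1` and includes both activities on the way.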

Round-table introductions (tour de table): participants

  • Mathieu Servillat, LUTH
  • Anastasia Galkin, Kristin Riebe, AIP, Potsdam
  • Michael Johnson, postdoc at Southampton, co-advised by Luc Moreau and ?
  • Markus Nullmeier, ARI, Heidelberg
  • Catherine Boisson, LUTH, Paris
  • Michèle Sanguillon, LUPM, Montpellier
  • François Bonnarel, Mireille Louys, CDS, Strasbourg
Discussed other data models presented at Provenance week:
  • PROV-ONE: scientific workflows
  • UniProv: a binding of Prov DM + PROV-ONE
Parameter discussion: are there contexts where parameters are different from entities? Simulations? A result is computed by an activity: is it an entity? A parameter? Proposal: wasDerivedFrom is the relation expressing that a parameter stems from an Entity? --> still to be discussed

Scope for the draft?

  • ActivityFlow / Workflow postponed to the next version, Prov 2.0
  • Will benefit from PROV-ONE concepts and from discussion with its authors
— End of day 1

Wednesday Aug 29

Presentations

  1. ProvDM draft: where do we stand? (MS)

Discussion: what format and content should come out of ProvSAP? Proposed by Potsdam: the output should be compatible with W3C tools. Agreed.

It will display more concepts than W3C PROV and extends the attributes provided in the W3C core to specific entity classes. But each class derived from Entity can be shown as an entity type: parameter, parameter description, activity description, etc.

Translation for relations: IVOA --> W3C

  • had context --> was influenced by
  • had config --> was influenced by
But then it is not easy for the user to read: it loses the fine classification that applications need to sort the provenance information.
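
A minimal sketch of this translation and of the loss it causes. The camel-case relation names `hadContext` and `hadConfig` and the `ivoa:relationType` attribute are assumptions made for the example; only `wasInfluencedBy` is a W3C PROV relation name.

```python
# Mapping from the notes: both IVOA relations collapse to one W3C relation.
IVOA_TO_W3C = {
    "hadContext": "wasInfluencedBy",
    "hadConfig": "wasInfluencedBy",
}


def to_w3c(relations, keep_ivoa_type=False):
    """Translate (subject, ivoa_relation, object) triples to W3C relation names.

    With keep_ivoa_type=True the original relation name is kept in an extra
    attribute (hypothetical name) so that applications can still sort on it.
    """
    translated = []
    for subject, relation, obj in relations:
        record = {
            "subject": subject,
            "relation": IVOA_TO_W3C.get(relation, relation),
            "object": obj,
        }
        if keep_ivoa_type:
            record["ivoa:relationType"] = relation
        translated.append(record)
    return translated


triples = [("act1", "hadContext", "ctx1"), ("act1", "hadConfig", "cfg1")]
plain = to_w3c(triples)
typed = to_w3c(triples, keep_ivoa_type=True)
```

In `plain` the two relations can no longer be told apart, which is the readability problem noted above; `typed` shows one possible way to carry the fine classification along as an attribute.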

Should we derive new visualisation methods for the IVOA serialisation? Enrich the Southampton Provenance Suite, for instance …

CTA pipe - MS

CTA pipe: how calibration provenance is simply recorded.

Input, output, context, config --> trace what happens. This covers the CTA pipeline execution. You can see it as a workflow tracking system.

OPUS - MS

Job management: PDAC service at Observatoire de Paris. The description of a step is taken from the ActivityDescription definitions. A form can be filled in by the user submitting the job:

Describe the ActivityDescription record with parameter names, UCDs, datatypes, and the Used and Generated sections of the form, including default parameter values. When the job is run, the values are transmitted. The provenance graph is generated at the end of the job execution and displayed.

DEPTH parameter in ProvSAP: does the option "all agents" mean track them all or show them all?? To be discussed

Identifying files locally, across various partners or in a common store: namespaces are used to generate the unique IDs.

Anastasia - Provenance for the APPLAUSE database

https://cloud.aip.de/index.php/s/aY6EccPAQFepH1I

Log pages are to be hooked to the scanning process: you could have a scanning activity. They are not bound to a simple Prov object. Not clear. Provenance for light curves: the source for each point comes from a different observation, which makes the graph too crowded.

How to distinguish the data points provided from the provenance of the observations? A source, or rather a detection, as an entity. How can you adjust the granularity?

? Michael: did you try the summary functions from ProvToolbox (Summaries)? They help to gather the information in a coarse-grain view.

Michael Johnson - Astronomy workflows

2 use cases:

a. LSST differential photometry: is it a new detection or an error?

b. Was the image badly calibrated due to a bad choice of standard star? Trace how the standard star is measured and chosen.

What you pay for provenance: +45% information to store. What you get: trust improvement of 99% for use case 1, and ?% (a bit less) for use case 2.

UML2PROV (Carlos …), which generates provenance templates, helps to upgrade data quality and trust. It is highly use-case dependent: you must design what to track by selecting the information that may cause errors or improve trust.

--> this feeds the Activity description and the Parameter descriptions

How to construct Prov templates? What is the status?

ProvSAP for Pollux - Michèle

Agents: which agents? All, or only those of one activity? Mainly to track the history of a data set.

More focused than ProvTAP, which by design will allow a larger set of queries.

? Parameter RESPONSEFORMAT

ML (?): should RESPONSEFORMAT also cover graphics formats? How could we distinguish the two outputs? Response: the output can be chained to another application, so keep only one output? Needs to be discussed further.

We will probably have advanced viewers that expand groups of metadata using a hierarchical browsing function.

Kristin - Provenance for the RAVE survey

Shows a ProvSAP client interface. Sankey representation for graphics.

? ML: Graphics is a possible output format, but the options are exclusive choices. You need to run the query again if you want the result as metadata. Is it then not scriptable? What to do with all the SVG files?

? What does DEPTH mean? Counting relations, or entity-to-entity generation layers?

Discussion:

Kristin: proposes to give up the IVOA JSON format (and use only PROV-JSON).

Markus: we have several JSON flavours in the game. Maybe better to leave it open and see how people implement it.

François Bonnarel - ProvTAP

The ProvTAP draft has been ready since April; please provide comments on the document.

Demonstration of a prototype Provenance TAP service:

This is an implementation on a simple database accessed through a TAP service. The interfaces with Aladin, TOPCAT, etc. are working and allow users to discover datasets served in ObsCore, for instance from a selection on the provenance metadata.

To do, to enrich the CDS prototype: insert more HiPS files (300 planned, using an ingestion script).

Question from Anastasia: how many gigabytes does the provenance metadata cost? Is the number of entities a problem in a relational database?

Challenge: store data provenance for any source? Is it possible? Discuss with the Gaia people?

François: if you can query the database in SQL and store all your sources, then TAP can do it, as long as you have a unique ID for an entity and a dataset publisher ID.

You can query both sides, the provenance database and the ObsCore database: by a DataLink, or by a broadcast mechanism from one database to the other?

Give me the provenance of obs-publisherid = 12345 --> query ProvTAP or ProvSAP with e-id = 12345. And vice versa: find an e-id and build up the query on an ObsTAP service …

Problems with PostgreSQL and server settings … Maybe try Docker …

A new package can include extra relations in a W3C provenance document and visualize them.
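
The two-way lookup between an ObsCore record and its provenance, as discussed above, could be sketched as a pair of ADQL query builders. This is only an illustration: `ivoa.obscore` and `obs_publisher_did` are the standard ObsCore names, but the provenance-side table `provenance.entity` and its `id` column are hypothetical, and the sketch assumes the entity ID is identical to the dataset publisher ID.

```python
def adql_literal(value):
    """Quote a value as an ADQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def provenance_query(obs_publisher_did):
    """ADQL for the provenance side, given an ObsCore publisher dataset id."""
    # ASSUMPTION: table and column names on the provenance side are hypothetical.
    return ("SELECT * FROM provenance.entity "
            f"WHERE id = {adql_literal(obs_publisher_did)}")


def obscore_query(entity_id):
    """ADQL for the ObsCore side, given a provenance entity id."""
    return ("SELECT * FROM ivoa.obscore "
            f"WHERE obs_publisher_did = {adql_literal(entity_id)}")
```

Both directions rely on the same identifier being stored on each side, which is why a unique entity ID and a dataset publisher ID are the preconditions mentioned in the discussion.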

An extension of the visualisation functions is needed if we extend the model. ? The PROV Suite authors will be OK with this (from a discussion between Mathieu and Duong).

Working Draft discussion / how to finalise it

Check the DM requirements: if an entity exists, there is an Activity that created it. Update the expression of these constraints. Identifiers must be unique. A replica of an entity is another entity and must be specified as a copy. Agreed: Entities, Activities and Agents must be uniquely identifiable.
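
A minimal sketch of how these two constraints could be checked on a set of provenance records. The `check_constraints` helper and the flat list-of-IDs input are assumptions made for the example, not part of the draft.

```python
def check_constraints(entities, activities, agents, was_generated_by):
    """Return a list of human-readable constraint violations.

    entities, activities, agents: lists of identifiers.
    was_generated_by: list of (entity_id, activity_id) pairs.
    """
    problems = []

    # Constraint 1: identifiers are unique across entities, activities, agents.
    seen = set()
    for identifier in list(entities) + list(activities) + list(agents):
        if identifier in seen:
            problems.append(f"duplicate identifier: {identifier}")
        seen.add(identifier)

    # Constraint 2: every entity has a known activity that generated it.
    generated = {entity for entity, activity in was_generated_by
                 if activity in activities}
    for entity in entities:
        if entity not in generated:
            problems.append(f"entity without generating activity: {entity}")

    return problems
```

A replica would pass this check only if it is recorded under its own identifier, in line with the rule that a copy is a separate entity.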

Requirement 12: contact information should be recorded for all Activities and Entities, i.e. there is always a known Agent linked with them (even a general one).

What is EntityDescription? Do we need it? Who has implemented it?

How do we describe the role of entities, and the constraints on them, when they are to be used by an activity? Used and used description /

  • What is needed for input parameters? An ID, a title. For us, we can have a role
  • What is needed for output parameters?
MS: The description side is the degree of freedom for each project to feed in its specific metadata. EntityDescription could be one of these placeholders ??? We need more information about this … more practical usage examples.
Parameter:

What is a parameter in our model? A parameter cannot be used or generated by ? It is an entity, but restricted … It can be derived from an Entity when it is data. ?? Do we need to say it ??

ML: We could also set up this rule: if your parameter must be traced, define it as an entity.

EntityDescription

What would you put here? I would put the required format for my entities to be used with ActivityDescription 'Axyz'.

  • Example: ActivityDescription: regridding
    • Constraints
      • input format = FITS ??? MIME type?
      • output formats = FITS
      • errorMapPreview formats = JPEG
      • etc.
  • SExtractor: with particular output styles: .dat --> check for an example to describe
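
The regridding example can be written down as a small data structure, to make concrete what an EntityDescription attached to an ActivityDescription might hold. The class layout, field names and the `format_allowed` helper are hypothetical; only the role and format values come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDescription:
    role: str                # role of the entity within the activity
    allowed_formats: tuple   # accepted formats, lower case


@dataclass
class ActivityDescription:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


regridding = ActivityDescription(
    name="regridding",
    inputs=[EntityDescription("input", ("fits",))],
    outputs=[
        EntityDescription("output", ("fits",)),
        EntityDescription("errorMapPreview", ("jpeg",)),
    ],
)


def format_allowed(description, role, fmt):
    """True if `fmt` satisfies the format constraint declared for `role`."""
    for entity_desc in description.inputs + description.outputs:
        if entity_desc.role == role:
            return fmt.lower() in entity_desc.allowed_formats
    return False  # unknown role: no declared constraint to satisfy
```

Whether the constraint should be a simple format name or a MIME type is the open question noted in the list; the sketch uses plain format names.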
Do we specialize the table for each particular specialized entity? Yes for a full archive … when we have many instances of each subclass, it is useful to store them in separate tables.

What goes into the draft:

  • ParameterDescription
  • EntityDescription: implemented as
    • François: doculink to a list of properties of the data used as input entities
    • Mathieu: parameter from the UWS interface: MIME type
    • Michèle: as a doculink to a list of information on the data structure / documentation
Explain that this class can be used to hook project-specific information explaining the constraints on an entity.

Proposal MS: this is not normative in the document.

  • Discussion on the Description part of the model (in Appendix):
This part is important to provide metadata to the core model. We explored the PROV-ONE proposition and have also checked the possibility of using the W3C Plan for ActivityDescription.

Mention that we consider we could reuse the input and output ports from PROV-ONE. Process from the PROV-ONE DM is similar to our ActivityDescription, but maybe not with the same granularity. To be checked.

We note that EntityDescription, even if still fuzzy, is interpreted as a way to convey metadata about the usage of entities and cannot simply be removed.

We need more practice to clarify the possible usage.

It is really important to go for implementations and to sort out the refactoring and specialisation possibilities from the feedback.

Two reference implementations are required anyway for the IVOA review of the Proposed Recommendation.

— End of day 2

Useful links:

Most of the presentations are linked from the Indico page mentioned above.

Topic revision: r5 - 2018-08-31 - MireilleLouys
 