Provenance Day at Heidelberg, June 14th
========================================

Start: around 10:30 am

End: around 7:30 pm

Attendees: Mireille Louys, Francois Bonnarel, Markus Nullmeier, Markus Demleitner (until 1 pm), Michèle Sanguillon, Mathieu Servillat, Kristin Riebe

* Mireille and Kristin started by going through the agenda, fixing the items to be discussed and updating details on the Provenance webpage (roadmap).
* Mathieu reported on his meeting with the CTA pipeline people and relayed their feedback and ideas: provenance meets the "real world". A pipeline could provide provenance automatically, e.g. by using a customized "open" function that records which files are opened and closed, and by using the input parameters of functions to find input entities.
* Kristin talked briefly about the JavaScript libraries behind her small RAVE provenance application: sankey.js and d3.js. She will provide instructions somewhere on how she used them.
* Discussion on possible contributions to ADASS:
    - Mireille will attend as invited speaker
    - Mathieu will attend, could present CTA and Provenance in CTA
    - Michèle is not yet sure whether she will attend
    - Kristin will attend, could give a general talk on the provenance DM
* SimDM:
    - Mireille: Is this high priority?
    - Kristin: I want to do this anyway as part of another project; it will be helpful to compare SimDM and ProvDM and see which concepts may be reused
    - For the provenance draft we should come up with a mapping of classes between SimDM and ProvDM, in order to identify similar patterns
    - Mathieu: SimDM focuses more on the parameters; there is a Parameter class in SimDM, which could be a good reason for having one in ProvDM as well
* Next provenance meeting:
    - would be best if in July already, ~ 18th - 20th July
    - Paris Observatory
    - Mathieu will check whether a room is free; if not, the last full week of August (starting on the 22nd) is an alternative
    - rather one full long day than two half days
    - invite someone from SimDM and/or the CTA pipeline as well
    - next time maybe in Montpellier
* Data model discussions:
    - Kristin talked about the map between Activity and Entity, with thoughts from the discussion at the provenance dinner at the previous InterOp (South Africa)
    - we rather need two separate mapping classes for input/output, i.e. use "Used" and "WasGeneratedBy" instead of one "ActivityEntityMap"
    - more useful, because with just one map one cannot ensure that there is only one wasGeneratedBy relation for each entity
    - if a wasDerivedFrom relationship is included, this puts additional constraints on the derivation activity (if it is also present in the provenance information)
    - wasDerivedFrom adds an ambiguity that Markus D. does not like at all
    - VODML: before the next meeting, Mireille will rebuild the UML diagram in Modelio (latest version 3.5), so that a script can later be used to export everything as VO-DML
    - discussion on the "role" attribute in the mapping classes and what it could be used for
    - We created mock examples for clarifying possible role values and description classes, using a "FlatField" activity:

    **ActivityDescription**

    ```
    id: "ex:flatfield"
    name/label?: "Flat field division"  -- arbitrary name or label
    type: "calibration"
    subtype: "flatfieldDivision"  -> predefined list/vocabulary
    description: "Dividing a raw image by a flat field image"
    version:
    url: link html page or code description or rdf or to vocabulary-description? [depending on the request from the client] shall make a human understand what the Activity does
    bibref:
    ```

    **UsedDescription**

    DB table would contain 2 entries:

    ```
    ActivityDescriptionId: ex:flatfield
    EntityDescriptionId: ex:flatfieldImage
    role: flatfieldImage
    ```

    ```
    ActivityDescriptionId: ex:flatfield
    EntityDescriptionId: ex:rawImage
    role: rawImage
    ```

    **EntityDescription**

    DB table would contain 2 entries:

    ```
    id: ex:flatfieldImage
    type: image
    subtype: flatfield-sky
    description: "A sky flatfield image"
    ```

    ```
    id: ex:rawImage
    type: image
    description: "A raw image from an observation"
    ```

    - Possible "role" values are defined by the ActivityDescription (and its relation to EntityDescriptions)
    - different ActivityDescriptions may reuse the same roles
    - example use case: "Select all entities that were used as calibration image"
    - It probably makes more sense to use less fine-grained entity descriptions, i.e. use just "image" and then make clear from the "role" in the "Used" table what the image was used as.
    - Maybe include the `calib_level` for data products; this could help to ensure that calibration steps are only applied to raw images (of the correct calib_level) and not to science-ready images
    - Thus we may need only 1 EntityDescription "Image", including attributes:

    ```
    id: ex:image
    type: voprov:image
    description: "An image"
    obscore:dataproduct_Type: [image, spectrum, ...]
    obscore:calib_level: [0,1,...]
    obscore:dataproduct_Subtype
    ```

    - primary key in the entity-activity-map tables (used, wasGeneratedBy) must be constructed from ActivityDescriptionId + EntityDescriptionId + role(Id), because there can be e.g. multiple images used as input images for the same activity, but with different roles.
    - Maybe use a multiplicity attribute in addition to "role" as well, e.g. for a "Fusion" activity 2 or more images of the same type ("Image") could be required
* Serialisation of Prov-information
    - EntityDescriptions => see ObsCore
    - Mathieu will look at this for the next meeting
    - once we have a VO-DML version, this will also provide a serialisation format
* ProvDiscovery/ProvDAL/ProvAccess:
    - make a ProvTAP query, get json or xml format as output
    - use ObsCore
    - all should write down query examples for this, e.g.
        + select entity from ProvCore where entityId = ...
        + given an entityId:
            * return all relations
            * return all progenitors
            * return this entity's details
* New use case: HiPS generation
* Provenance talk for Asterics Forum:
    - will be given by Mathieu
    - Michèle and Kristin provide one slide each with their specific use case

## Working draft discussion:

* minimum requirements:
    - Most important tasks are:
        + Give me all the progenitors for a data set
        + Give me all activities involved in producing a data set
        + View agents
            * = responsibility view in W3C
            * Maybe skip them in the very first requirements? Or assign lower priority, since not needed for each use case?
    - TODO-everyone: think about minimum requirements for your own use case, present at next Provenance Day => then find the common min. requirements
* There are different detail levels for provenance information, but the granularity of provenance will have to be defined by each project/use case individually. => Mention this in the introduction of the working draft
* Put section 1.2.4 as paragraph into introduction
* Francois will write a section on previous efforts
* Mention in introduction that provenance is not defining workflows, it's about past history => Introduction should contain the goal of the provenance model and what is outside its scope
* Meaning of provenance, requirements written by Mathieu (?)
* Restructure model description sections: merge them together, "This is our model, and we see similar patterns in W3C ..."
* use tables instead of lists for listing attributes, for better overview, easier to spot
* Michèle writes section on collections (EntityCollection, ActivityCollection)
* mapping SimDM and ProvDM should go into appendix
* skip instrument details and ambient conditions
* split the LaTeX file into many sections, for easier editing without major conflicts
* put version 0.1 (pdf) somewhere, rename to version 0.2 now.

## Summary of action items:

* everyone: minimum requirements:
    - think about requirements for your own use case
    - write down more detailed questions (query examples) from your use case
    - as preparation for next PROV-discussions in July
* Kristin: prov-javascript:
    - write down somewhere which JavaScript libraries can be used for provenance information (sankey.js, d3.js, compare with prov.js?)
* Kristin, Mathieu, anyone else who will register for ADASS:
    - write abstracts for ADASS talks
* Kristin: SimDM:
    - comparison of ProvDM and SimDM using a simulation example
    - map classes of SimDM and ProvDM to each other
* Mathieu: Next meeting:
    - check availability of rooms at Paris Observatory for 18th - 20th July or the week of 22nd August
    - one full ProvenanceDay (not 2 half days)
* Mireille: VO-DML
    - build a VODML version of the data model, using Modelio (latest version 3.5)
    - before the next meeting
* Mathieu: EntityDescriptions and ObsCore
    - prepare something for next meeting
* Francois: Working draft
    - section on previous efforts (refer to ObsDM note 2005?)
    - write section on Provenance access (ProvTAP/ProvDAL)
* Mathieu (?): contribute to introduction, meaning of provenance, why useful
* Michèle: section on collections (EntityCollection, ActivityCollection)
* Kristin: Working draft restructuring
* Mireille: collect minutes and send them out
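
The FlatField mock example from the data model discussions, in its coarse variant with a single "Image" EntityDescription, can be tried out as relational tables. The following is only a minimal sketch: the use of SQLite, the SQL layout, and the helper functions `build_example_db`, `roles_for_activity` and `entity_descriptions_with_role` are assumptions made for illustration, not an agreed design. It shows why the primary key of the "Used" description table has to include the role: the same EntityDescription appears twice for one activity, once per role.

```python
import sqlite3

# Hypothetical schema: names follow the mock example in the minutes,
# but the SQL layout itself is an assumption for illustration only.
SCHEMA = """
CREATE TABLE ActivityDescription (
    id TEXT PRIMARY KEY, name TEXT, type TEXT, subtype TEXT, description TEXT
);
CREATE TABLE EntityDescription (
    id TEXT PRIMARY KEY, type TEXT, description TEXT
);
CREATE TABLE UsedDescription (
    activityDescriptionId TEXT NOT NULL REFERENCES ActivityDescription(id),
    entityDescriptionId   TEXT NOT NULL REFERENCES EntityDescription(id),
    role                  TEXT NOT NULL,
    -- composite key: one EntityDescription may be used several times by
    -- the same activity, as long as the role differs
    PRIMARY KEY (activityDescriptionId, entityDescriptionId, role)
);
"""


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database holding the FlatField mock example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO ActivityDescription VALUES (?, ?, ?, ?, ?)",
        ("ex:flatfield", "Flat field division", "calibration",
         "flatfieldDivision", "Dividing a raw image by a flat field image"),
    )
    conn.execute(
        "INSERT INTO EntityDescription VALUES (?, ?, ?)",
        ("ex:image", "voprov:image", "An image"),
    )
    # Two rows for the same activity and entity description, differing in role.
    conn.executemany(
        "INSERT INTO UsedDescription VALUES (?, ?, ?)",
        [("ex:flatfield", "ex:image", "flatfieldImage"),
         ("ex:flatfield", "ex:image", "rawImage")],
    )
    conn.commit()
    return conn


def roles_for_activity(conn: sqlite3.Connection, activity_description_id: str) -> list:
    """Return the roles defined for one ActivityDescription, sorted by name."""
    rows = conn.execute(
        "SELECT role FROM UsedDescription "
        "WHERE activityDescriptionId = ? ORDER BY role",
        (activity_description_id,),
    )
    return [row[0] for row in rows]


def entity_descriptions_with_role(conn: sqlite3.Connection, role: str) -> list:
    """Return the ids of all EntityDescriptions that are used in the given role."""
    rows = conn.execute(
        "SELECT entityDescriptionId FROM UsedDescription WHERE role = ?",
        (role,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_example_db()
    print(roles_for_activity(db, "ex:flatfield"))
    print(entity_descriptions_with_role(db, "rawImage"))
```

With a key built only from ActivityDescriptionId + EntityDescriptionId, the second insert into `UsedDescription` would be rejected. The same pattern would apply to a corresponding table for wasGeneratedBy.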