Provenance Meeting Paris, 2017
==============================

27.+28.07.2017

Attending: Kristin Riebe, Mathieu Servillat, Michele Sanguillon, Mireille Louys, Francois Bonnarel (remotely), Markus Nullmeier, Anastasia Galkin, Ole Streicher; on Friday (28.07.) also: Karl Kosack, Catherine Boisson, Julien Lefaucheur

Notes from Kristin Riebe.

Thursday, 27.07.2017
--------------------

1) Setting up schedule: Provenance Data Model -> Recommendation track?

- Discussion inside the Data Model group --> 4 weeks
- DM chair/vice chair decides to go for RFC
- Request for Comments (RFC) period for all of the IVOA --> 4 weeks
- 2 weeks for the Technical Coordination Group (TCG), which has to provide comments

==> Ready for Recommendation!

Contributions for ADASS etc.:

* Talk by Mathieu: CTA Pipe and OPUS UWS
    - large-scale complex instrument, mentions the IVOA Provenance Data Model
* Michele's poster:
    - voprov focus
    - example title: "Implementations of the Provenance Data Model"
    - within the ADASS key topic "Open sources in Astronomy"
    - Michèle finishes the abstract draft by Monday, 31 July.
    - different blocks:
        + one for each use case/implementation?
        + some for collecting provenance (cta-pipe, OPUS, HiPS (ProvTAP service!!)), some for serving provenance (voprov, pollux, django-prov_vo)
        + what does each implementation highlight, which parts does it use, ...
        + the CTA-Pipe package contains a reusable Provenance class.
    - --> also see the cellphone pictures of the blackboard for drafts of what this poster may look like
* **Task:** everyone with an implementation prepares a paragraph that could be included in the poster --> due before Monday evening, so Michèle can prepare her abstract and register for ADASS
* **Task:** everyone fills in his/her section on implementation details in the implementation note. We can still finalize the implementation note during the review process.
* Possible reference implementations:
    - ctapipe will be public! It's already on GitHub. But it's rather a use case than a real reference implementation.
2) Implementations

* Talk on the RAVE web application (by Kristin)
    * created a reusable Django package for provenance
    * ProvDAL implemented
    * Discussion:
        - STEP=LAST or =ONE or =1?
        - Lively discussion on how many steps to include etc.
        - undirected graph traversal (parameter DEPTH)?
        - or combine DEPTH with a DIRECTION keyword?
        - We voted for using the keywords BACKWARD and FORWARD; values are 1, 2, 3, etc. or ALL
        - FORWARD is optional
        - default values are: BACKWARD=ALL and FORWARD=0
        - if a service does NOT implement FORWARD, it should return an error if this keyword appears in the URL request
        - some parts were discussed again on the 2nd day, see below
* OPUS: uses ActivityDescription for defining jobs
    - provjson files etc. are returned
    - uses the prov Python library to convert prov to different formats
    - just one step, so it's easy
    - VOTable: used-group, with group parameter
    - input attributes: image/fits as MIME type? But my FITS file is not an image
* CTAPipe: close to recording input/output automatically; developers have to use special read/write functions that record the provenance at the same time (ctapipe/core/provenance.py)
    - --> quite independent, can also handle nested activities
    - entity/used/etc.: just input/output, no roles for this yet.
    - add creation time as an additional attribute for entity?

Meet again: Friday, 9:00, same location

Friday, 28.07.2017
-------------------

2.2 Implementations
-------------------

* CDS implementation (Mireille, Francois)
    * Francois Bock: TAP service
    * Workflow 1: from photographic Schmidt plates to RGB color composition
        * feed data into the DB through a VOTable list of instances
        * want to use an external IVOA prov. representation
        * wasDerivedFrom: automatically generated from Ent.->Act.->Ent.
            - may not work in general, but for this use case it's good
        * includes ActivityDescription
        * Prov. interface to select graphics/text output, back/forward
        * RGB roles?
          Which roles are needed must be defined in the ActivityDescription.
    * We should have a list of ActivityDescriptions in the VO that can be reused by anyone.
    * Workflow 2: HiPS
        - make query joins to get properties of entities and activities
        - JSON output as response to a TAP query on Provenance
            -> This is just an extract of the instances that are stored in the database
        - voprov:entity.label etc. are the utypes from the underlying class instances
        - mixes activity and entity attributes in the query response
            -> This is NOT a PROV-JSON or PROV-VOTable output, just a VOTable or JSON with mixed attributes as query response.
        - Probably cannot be handled differently, because TAP requires returning just one VOTable as response.
* CTA (Karl Kosack):
    - using grid jobs for their "activity"; these can only write to a scratch filesystem, and the results are ingested into the archive later
    - files on scratch will be overwritten
    - how to make sure that the entities are the same if a file is output from one activity and reused in another?
    - ActivityFlow is needed
* Pollux (Michèle):
    - tried to add DOI
    - implements Prov on top of Pollux, i.e. maps existing content to Prov output
        -> great use case (not as prototype implementation, but as application)
* ProvDAL discussion (continued):
    - Kristin had some more issues: what does "BACKWARD" mean?
        => going along the provenance direction for any relation, except agent relations and membership? (that's closer to what the user expects)
    - Markus argued against that, for a simpler implementation, i.e. make no exceptions based on the type of relation, but treat all relations equally
    - In a provenance graph, all relations have a direction; in the BACKWARD case, follow this direction strictly, with no exceptions.
    - In the FORWARD case, go along the relations in reverse provenance direction.
    - If one wants more user-friendliness, additionally implement the parameters:
        + expand_agentrelations: for always including agents
        + expand_collections: for including membership to collections/members of collections
        + expand_activityflow: for including membership to an ActivityFlow/members of an ActivityFlow
    - We more or less agreed on this, but not everyone is really happy with this solution.
    - **TODO: Implementations will have to show whether this is the way to go or whether there is a better way.**

3.) Draft discussion
---------------------

* Mathieu goes through the draft; everyone comments and makes suggestions where needed
* EntityDescription discussion shifted to the end
* many smaller things to be changed --> mainly Mathieu, Kristin
* EntityDescription: Do we really need it?
    - Mathieu uses MIME types to define the expected input/output of the OPUS UWS service, e.g. image/fits, text/ascii (but: FITS can also contain tables ...)
    - EntityDescription could contain all the things that we already know *before* an entity exists, e.g. format, structure, content-type (category)
* Tasks:
    - **Mathieu:**
        - add context/configuration in the introduction
        - rewrite the EntityDescription section
    - **Mireille:**
        - update the first figure (overview diagram) in section 2: remove attributes from relation boxes
    - **Kristin:**
        - add attribute 'category' to Table 4, EntityDescription
        - in the 'wasDerivedFrom' section, remove the sentence after 'shortcut.'
        - in the Entity-Activity relations section, remove the part on timing
        - add a diagram on Entity-Activity relations to the corresponding section
        - add section on serialisation of W3C from ProvenanceDM classes (move from implementation note)
        - include a serialisation example of ActivityFlow
        - shorten the serialisation examples (no annotation etc.)
        - reverse the column order in the data model mapping tables, so readers see how to build Prov attributes from existing implementations of other data models
        - mention a possible FITS extension for provenance more explicitly
        - add the currently discussed changes for ProvDAL
    - **TODO for someone**:
        - add a table with example activity types
        - add tables with attributes for each relation (already done for wasDerivedFrom)
        - Shall we add default entity roles for specific entity(descriptions)?
        - add size, min, max, enum to ParameterDescription
        - write a section on what services need in order to implement provenance
* VODML version:
    - We already have the auto-generated HTML documentation for our model, using modelio-export
    - **TODO Kristin:** add a direct link to this documentation in the draft
    - Kristin also wrote two scripts: one extracts the descriptions of each class and attribute from the vodml-xml file into an extra file, the other adds them back from that file into the vodml-xml file, from which the HTML documentation is generated.
    - **TODO Kristin:** explain how to use these scripts in the README in the vodml directory (also: check why the style of this HTML documentation has changed)
* We should have validators at some point, e.g. for the voprov serialisation formats.
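The ProvDAL behaviour discussed on both days (BACKWARD/FORWARD with values 1, 2, 3, ... or ALL, defaults BACKWARD=ALL and FORWARD=0, an error when an unsupported FORWARD is requested, strict direction-following as Markus proposed, plus the optional expand_* flags) might be sketched as below. This is only an illustration under stated assumptions, not the ProvDAL specification. The graph is a hypothetical in-memory list of (source, target, relation) triples, each pointing in provenance direction. One "step" is assumed to be one relation hop. The expand_* flags are assumed to take the value "true" and to add every relation of the selected kinds that touches a node already reached. The names `provdal`, `walk`, `EXPAND_GROUPS` and `EXAMPLE_EDGES` are hypothetical. expand_activityflow is left out because these notes do not name the ActivityFlow membership relation; it would be one more entry in `EXPAND_GROUPS`.

```python
from collections import deque

ALL = None  # sentinel for an unlimited depth (value "ALL")

# Hypothetical mapping from expand_* flags to the relation kinds they pull in.
EXPAND_GROUPS = {
    "expand_agentrelations": {"wasAssociatedWith", "wasAttributedTo", "actedOnBehalfOf"},
    "expand_collections": {"hadMember"},
}


class ProvDALError(ValueError):
    """Request that a (hypothetical) ProvDAL service must reject."""


def parse_depth(value):
    """Convert a BACKWARD/FORWARD value: 'ALL' or a non-negative integer."""
    if value.upper() == "ALL":
        return ALL
    if not value.isdecimal():
        raise ProvDALError(f"invalid depth: {value!r}")
    return int(value)


def parse_params(params, supports_forward=False):
    """Apply the agreed defaults BACKWARD=ALL and FORWARD=0."""
    if "FORWARD" in params and not supports_forward:
        # A service without FORWARD support must answer with an error.
        raise ProvDALError("FORWARD is not implemented by this service")
    backward = parse_depth(params.get("BACKWARD", "ALL"))
    forward = parse_depth(params.get("FORWARD", "0"))
    return backward, forward


def walk(edges, start, depth, reverse):
    """Breadth-first walk that treats every relation kind the same way.

    reverse=False follows each edge in provenance direction (BACKWARD);
    reverse=True follows it against that direction (FORWARD).
    """
    adjacency = {}
    for edge in edges:
        src, dst, _kind = edge
        head, tail = (dst, src) if reverse else (src, dst)
        adjacency.setdefault(head, []).append((tail, edge))
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if depth is not ALL and level >= depth:
            continue  # depth limit reached, do not expand this node
        for nxt, edge in adjacency.get(node, []):
            found.add(edge)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, level + 1))
    return found


def provdal(edges, start, params, supports_forward=False):
    """Return the set of relations a ProvDAL-like query would deliver."""
    backward, forward = parse_params(params, supports_forward)
    result = walk(edges, start, backward, reverse=False)
    if forward != 0:
        result |= walk(edges, start, forward, reverse=True)
    # Optional expansion: add relations of the selected kinds that touch
    # the start node or any node already reached.
    nodes = {start} | {n for src, dst, _ in result for n in (src, dst)}
    for flag, kinds in EXPAND_GROUPS.items():
        if params.get(flag, "false").lower() == "true":
            result |= {e for e in edges
                       if e[2] in kinds and (e[0] in nodes or e[1] in nodes)}
    return result


# Hypothetical example graph loosely following the CDS RGB workflow.
EXAMPLE_EDGES = [
    ("rgb_image", "composition", "wasGeneratedBy"),
    ("composition", "plate_red", "used"),
    ("composition", "plate_blue", "used"),
    ("plate_red", "digitisation", "wasGeneratedBy"),
    ("composition", "operator", "wasAssociatedWith"),
]
```

For instance, `provdal(EXAMPLE_EDGES, "rgb_image", {"BACKWARD": "1"})` returns only the wasGeneratedBy relation of rgb_image. Adding `"expand_agentrelations": "true"` also pulls in the operator's wasAssociatedWith relation. Passing FORWARD to a service created with `supports_forward=False` raises `ProvDALError`. Because all relations are treated equally, `walk()` needs no per-relation special cases, which was the argument for the simpler rule. The expand_* flags add the user-friendliness on top of that rule.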