## Provenance examples

Working Example - Cube segmentation - M. Louys - 2019-02-21
Data model version 1.2 / preparation for PR-version ML.

-----------------------------------------------------------------------

This example describes a hyperspectral data cube segmentation activity. It takes as input a CALIFA hyperspectral data cube, a catalog of redshifts for the galaxies in the CALIFA data collection, and a list of emission lines to detect. It produces a set of 4 maps: the H\alpha amplitude, the radial velocity extracted from the line shift, the velocity dispersion in the field of the galaxy image, and the residual error map.

The schema for the processing is modeled in the IVOA Provenance DM, as shown in the attached, manually designed graph. A Modelio instance diagram illustrates the links between the objects.

// How to read this instance of the model
// comments and updates

2019.01.29-A  "a1 :: Activity" is an instance of an Activity. I used the short id a1 to name the instance, but all links are operated through the id attributes of each instance. This helps to match and check.
2019.01.29-D  The .description attribute, which holds a short text description, is kept. We did not find a better word: abstract, title, shortdescription, and summary are no better.
2019.01.31-E  Changed .activityType and .entityType to .type.
2019.01.31-F  For type and subtype in DatasetDescription, used a vocabulary derived from the ObsCore dataproduct_type.
2019.01.31-G  Updated the type values for various classes with respect to Mathieu's categories.
2019.02.13    Study a hierarchy for the description features so that the description tree can be browsed easily without connecting each left-side instance to its description (Laurent Michel's suggestion). If many entities can be used or produced with the same role, the pointer to the description counterpart becomes necessary.

a1 :: Activity
Activity.id = 2019-CubeSeg-jlx
Activity.name = cubeseg_shell_12
Activity.startTime = 2019-01-01T09:13:00
Activity.endTime = 2019-01-01T10:07:35
Activity.activityDescription_id = '2019-cubeseg_1'
Activity.comment = 'spiral (snail-shell) segmentation, test 12'

// wasConfiguredBy - Configuration part

p1 :: Parameter
Parameter.id = p1_abc
Parameter.value = (128, 128)
Parameter.parameterDescription_id = 'p1_central_position'
Parameter.activity_id = 2019-CubeSeg-jlx

p2 :: Parameter
Parameter.id = p2_path_567
Parameter.value = shell
Parameter.parameterDescription_id = 'p2_estimation_strategy'
Parameter.activity_id = 2019-CubeSeg-jlx
// here the Activity --> Parameter link is implemented in reverse: Parameter --> Activity

p2_d :: ParameterDescription
ParameterDescription.id = 'p2_estimation_strategy'
ParameterDescription.default = 'shell'
ParameterDescription.name = 'linefittingStrategy'
ParameterDescription.options = ['single', 'shell'] // settings_available
ParameterDescription.description = 'the initialisation of line fitting is extrapolated from the guess at the neighboring spaxel (shell mode) or just from the single spaxel (single mode)'
ParameterDescription.activityDescription_id = 2019-cubeseg_1

p1_d :: ParameterDescription
ParameterDescription.id = 'p1_central_position'
ParameterDescription.datatype = integer // here I reused the VOTable notation: datatype instead of valueType
ParameterDescription.unit = ''
ParameterDescription.ucd = pos.cartesian
ParameterDescription.default = (1,1)
ParameterDescription.name = 'start position'
ParameterDescription.description = 'Start position in pixel number (nbline, nbcol)'
ParameterDescription.activityDescription_id = 2019-cubeseg_1

ad_1 :: ActivityDescription
ActivityDescription.id = '2019-cubeseg_1'
ActivityDescription.name = 'VelocitySegmentation'
ActivityDescription.version = 1.0
ActivityDescription.description = 'Hyperspectral cube line fitting for velocity maps'
ActivityDescription.doculink = ''
ActivityDescription.type = 'image processing' // can be part of a specialized vocabulary
ActivityDescription.subtype = 'cube segmentation ; line fitting'
// here the ActivityDescription --> ParameterDescription link is implemented in reverse: ParameterDescription --> ActivityDescription

// Used and input Entities

Ui1 :: Used
Used.entity_id = 'e_765'
Used.role = 'input cube'
Used.activity_id = 2019-CubeSeg-jlx
Used.usageDescription_id = ED_a

e1 :: DatasetEntity
DatasetEntity.id = 'e_765'
DatasetEntity.name = CubeCalifa-NGC0036
DatasetEntity.comment = 'points to Califa datacube NGC0036'
DatasetEntity.location = http://cds_hips_sdss.cdsarc.unistra.fr/e_123

Ui2 :: Used
Used.role = 'input line list'
Used.entity_id = e_line_A1
Used.activity_id = 2019-CubeSeg-jlx
Used.usageDescription_id = ED_b

e2 :: DatasetEntity
DatasetEntity.id = 'e_line_A1'
DatasetEntity.name = requestedLinelist
DatasetEntity.comment = 'line list to fit'
DatasetEntity.location = http://seafile@unistra/mlouys/segCalifa/l1.csv

Ui3 :: Used
Used.role = 'input galaxy catalog'
Used.entity_id = e_catalog_Califa_V500
Used.activity_id = 2019-CubeSeg-jlx
Used.usageDescription_id = ED_c

e3 :: DatasetEntity
DatasetEntity.id = 'e_catalog_Califa_V500'
DatasetEntity.name = Califa_catalog_selection
DatasetEntity.location = http://seafile.unistra.fr/califaSeg-select/galaxy_params

// wasGeneratedBy and result Entities

Wgb1 :: wasGeneratedBy
wasGeneratedBy.role = 'result Halpha amplitude image'
wasGeneratedBy.entity_id = e_Halpha_AmpMap
wasGeneratedBy.activity_id = 2019-CubeSeg-jlx
wasGeneratedBy.generationDescription_id = HalphaAmp_1

e4 :: DatasetEntity
DatasetEntity.id = e_Halpha_AmpMap
DatasetEntity.name = Halpha_Amplitude_Map
DatasetEntity.comment = 'halpha amplitude image map'
DatasetEntity.location = http://seafile.unistra.fr/califaSeg-select/ampl_map_IC5376

Wgb2 :: wasGeneratedBy
wasGeneratedBy.role = 'result Velocity map Halpha'
wasGeneratedBy.entity_id = e_Halpha_Veloc_x1
wasGeneratedBy.activity_id = 2019-CubeSeg-jlx
wasGeneratedBy.generationDescription_id = HalphaVelc_1

e5 :: DatasetEntity
DatasetEntity.id = 'e_Halpha_Veloc_x1'
DatasetEntity.name = Halpha_veloc_1
DatasetEntity.comment = 'halpha velocity image map'
DatasetEntity.location = http://seafile.unistra.fr/califaSeg-select/veloc_map_IC5376

Wgb3 :: wasGeneratedBy
wasGeneratedBy.role = 'result Velocity Dispersion map Halpha'
wasGeneratedBy.entity_id = e_Halpha_VelocDisp_x1
wasGeneratedBy.activity_id = 2019-CubeSeg-jlx
wasGeneratedBy.generationDescription_id = Halpha_VelocDispersion_1

e6 :: DatasetEntity
DatasetEntity.id = e_Halpha_VelocDisp_x1
DatasetEntity.name = Halpha_velocdisp_1
DatasetEntity.comment = 'velocity dispersion map'
DatasetEntity.location = http://seafile.unistra.fr/califaSeg-select/velocdisp_map_IC5376

Wgb4 :: wasGeneratedBy
wasGeneratedBy.role = 'residual map'
wasGeneratedBy.entity_id = e_residuals_x1
wasGeneratedBy.activity_id = 2019-CubeSeg-jlx
wasGeneratedBy.generationDescription_id = Fit_Residual_1

e7 :: DatasetEntity
DatasetEntity.id = 'e_residuals_x1'
DatasetEntity.name = Halpha_residualfits_1
DatasetEntity.comment = 'Halpha emission line residual map'
DatasetEntity.location = http://seafile.unistra.fr/califaSeg-select/velocdisp_map_IC5376

// UsageDescription instances and their related EntityDescription instances

UD_a :: UsageDescription
UsageDescription.id = 'ud_inputcube_a'
UsageDescription.type = Main
UsageDescription.role = 'input cube'
UsageDescription.multiplicity = 1
UsageDescription.description = 'input cube to be segmented'
UsageDescription.entityDescription_id = ED_a // pointer to the EntityDescription instance, i.e. its id
UsageDescription.activityDescription_id = '2019-cubeseg_1'

ED_a :: DatasetDescription
DatasetDescription.id = ED_input_a
DatasetDescription.type = cube
DatasetDescription.subtype = NULL
DatasetDescription.contentType = image/fits
DatasetDescription.comment = NULL // no comments on this instance
DatasetDescription.description = 'cube 2D+lambda'

UD_b :: UsageDescription
UsageDescription.id = ud_inputLineList
UsageDescription.type = document
UsageDescription.role = 'input line list'
UsageDescription.description = 'lines to be fitted with name and reference wavelength'
UsageDescription.multiplicity = 1
UsageDescription.entityDescription_id = ED_b
UsageDescription.activityDescription_id = '2019-cubeseg_1'

ED_b :: EntityDescription
EntityDescription.id = 'ED_b'
EntityDescription.type = document
EntityDescription.subtype = list
EntityDescription.contentType = text/csv
EntityDescription.description = 'line list'

UD_c :: UsageDescription
UsageDescription.id = 'ud_inputgal_cat'
UsageDescription.type = catalog/table
UsageDescription.role = 'input catalog'
UsageDescription.description = 'catalog in which the task extracts the galaxy redshift'
UsageDescription.multiplicity = 1
UsageDescription.entityDescription_id = ED_c
UsageDescription.activityDescription_id = '2019-cubeseg_1'

ED_c :: EntityDescription
EntityDescription.id = 'ED_c'
EntityDescription.type = catalog // same vocabulary definition as dataproduct_type in ObsCore
EntityDescription.subtype = NULL
EntityDescription.description = 'parameters for observed galaxies in input cubes'
EntityDescription.contentType = application/x-votable+xml

// GenerationDescription instances

GD_1 :: GenerationDescription
GenerationDescription.id = HalphaAmp_1
GenerationDescription.type = Main
GenerationDescription.role = 'Halpha amplitude map'
GenerationDescription.description = 'Location of Halpha line with line amplitude'
GenerationDescription.multiplicity = 1
GenerationDescription.entityDescription_id = output_edMap_1
GenerationDescription.activityDescription_id = '2019-cubeseg_1'

GD_2 :: GenerationDescription
GenerationDescription.id = HalphaVelc_1
GenerationDescription.type = Main
GenerationDescription.role = 'Halpha velocity map'
GenerationDescription.description = 'velocity map derived from line doppler shift'
GenerationDescription.multiplicity = 1
GenerationDescription.entityDescription_id = output_edMap_1
GenerationDescription.activityDescription_id = '2019-cubeseg_1'

GD_3 :: GenerationDescription
GenerationDescription.id = Halpha_VelocDispersion_1
GenerationDescription.type = Main
GenerationDescription.role = 'Halpha velocity dispersion map'
GenerationDescription.description = 'velocity dispersion map as gaussian sigma for the fitted line'
GenerationDescription.multiplicity = 1
GenerationDescription.entityDescription_id = output_edMap_1
GenerationDescription.activityDescription_id = '2019-cubeseg_1'

GD_4 :: GenerationDescription
GenerationDescription.id = Fit_Residual_1
GenerationDescription.type = Main
GenerationDescription.role = Residuals
GenerationDescription.description = 'Error Map of line fits-Halpha'
GenerationDescription.multiplicity = 1
GenerationDescription.entityDescription_id = output_edMap_1
GenerationDescription.activityDescription_id = '2019-cubeseg_1'

output_edMap_1 :: DatasetDescription
DatasetDescription.id = output_edMap_1
DatasetDescription.name = Map1-2019
DatasetDescription.description = 'feature map'
DatasetDescription.type = image
DatasetDescription.subtype = 2D
DatasetDescription.contentType = image/jpeg

output_edMap_PNG :: DatasetDescription // can exist and be pointed to in some other user scenario
DatasetDescription.id = 'output_edMap_PNG'
DatasetDescription.type = image
DatasetDescription.subtype = 2D
DatasetDescription.contentType = image/png

Here we differentiate the UsageDescription instances per input data type, and we factorize the EntityDescription for the generated outputs: all outputs are maps. If we had a log file as output, we would need another GenerationDescription and (probably) a new EntityDescription for all log files.

GD_5 :: GenerationDescription
GenerationDescription.id = log_1
GenerationDescription.type = Log // belongs to special IVOA-PROV vocabulary
GenerationDescription.role = logfile // belongs to special IVOA-PROV vocabulary
GenerationDescription.multiplicity = 1
GenerationDescription.entityDescription_id = ED_logfile
GenerationDescription.activityDescription_id = '2019-cubeseg_1'

ED_logfile :: EntityDescription
EntityDescription.id = ED_logfile
EntityDescription.type = Document
EntityDescription.subtype = 'log file'
EntityDescription.contentType = text/csv

// comments

For a data provider aware of a processing step, describing the workflow is easy: we can build a tree from an instance of an Activity and branch out to each instance of
* used entities: link the existing entity or insert it into the entity table
* generated entities: link the existing entity or insert it into the entity table
* parameters
* one instance of ActivityDescription
* instances of UsageDescription
* instances of GenerationDescription
* instances of EntityDescription, following its type (Dataset/Document/ValueEntity)
* one or more instances of the Agent class
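
The tree-building step described in the comments above could be sketched roughly as follows. This is only an illustration in Python, not part of the data model: the dict-of-lists `tables` layout and the `build_tree` function are hypothetical, and only a few of the instances listed in this example are reproduced. The ids and roles are taken from the listing; links are resolved purely through the `*_id` attributes.

```python
# Hypothetical in-memory tables (an assumption for illustration only).
# Attribute names and id values are copied from the instance listing.
tables = {
    "Activity": [
        {"id": "2019-CubeSeg-jlx", "activityDescription_id": "2019-cubeseg_1"},
    ],
    "Parameter": [
        {"id": "p1_abc", "value": (128, 128), "activity_id": "2019-CubeSeg-jlx"},
        {"id": "p2_path_567", "value": "shell", "activity_id": "2019-CubeSeg-jlx"},
    ],
    "Used": [
        {"role": "input cube", "entity_id": "e_765",
         "activity_id": "2019-CubeSeg-jlx"},
        {"role": "input line list", "entity_id": "e_line_A1",
         "activity_id": "2019-CubeSeg-jlx"},
    ],
    "WasGeneratedBy": [
        {"role": "residual map", "entity_id": "e_residuals_x1",
         "activity_id": "2019-CubeSeg-jlx"},
    ],
    "Entity": [
        {"id": "e_765"},
        {"id": "e_line_A1"},
        {"id": "e_residuals_x1"},
    ],
}


def build_tree(tables, activity_id):
    """Collect the instances attached to one Activity by following *_id links."""
    entities = {e["id"]: e for e in tables["Entity"]}

    def linked(table_name):
        # The links are stored on the child side (child --> activity),
        # so we filter each table on its activity_id column.
        return [row for row in tables[table_name]
                if row["activity_id"] == activity_id]

    activity = next(a for a in tables["Activity"] if a["id"] == activity_id)
    return {
        "activity": activity,
        "parameters": linked("Parameter"),
        "used": [(u["role"], entities[u["entity_id"]])
                 for u in linked("Used")],
        "generated": [(g["role"], entities[g["entity_id"]])
                      for g in linked("WasGeneratedBy")],
    }


tree = build_tree(tables, "2019-CubeSeg-jlx")
for role, entity in tree["used"] + tree["generated"]:
    print(role, "->", entity["id"])
```

The description side (ActivityDescription, UsageDescription, GenerationDescription, EntityDescription) could be attached in the same way, by looking up the corresponding `*Description_id` pointers.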