
Revealing the Detailed Lineage of Script Outputs using Hybrid Provenance

Qian Zhang1, Yang Cao1, Qiwen Wang2, Duc Vu3, Priyaa Thavasimani4, Timothy McPhillips1, Paolo Missier4, Peter Slaughter5, Christopher Jones5, Matthew B. Jones5, Bertram Ludäscher1,2

1 School of Information Sciences, University of Illinois at Urbana-Champaign
2 Department of Computer Science, University of Illinois at Urbana-Champaign
3 Department of Electrical and Computer Engineering, University of Illinois at Chicago
4 School of Computing Science, Newcastle University, UK
5 National Center for Ecological Analysis and Synthesis, UCSB, Santa Barbara, USA

github.com/yesworkflow-org/yw-idcc-17
IDCC 2017, February 22, 2017, Edinburgh, Scotland

Motivation: Provenance is good for self and others …
• Data- and Workflow-Provenance are crucial for transparency and reproducibility in computational and data-driven science.
• Scientific workflow systems provide both prospective provenance (workflow graphs) and retrospective provenance (runtime observables).
(Image source: http://ajdcreative.com.au/thoughts/a-w-food-packaging-wine-label-design-trend-series#.WKsMFRKLSRs)

… but there are some Challenges ...
• Most computational analyses and workflows are conducted using scripts (Python, R, MATLAB, ...) rather than workflow systems.
• Retrospective Provenance Observables
  • e.g. DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level)
  … only yield isolated fragments of the overall data lineage and processing history.
• Prospective Provenance links and contextualizes fragments into a meaningful and comprehensible workflow, but scripts alone do not reveal the underlying workflow.
• Provenance (like other metadata) appears to be rarely actionable or immediately useful for those who are expected to provide it (“provenance for others”).

Approach
• Simple YesWorkflow (YW) annotations allow authors to reveal the workflow (prospective provenance) implicit in scripts.
• Prospective provenance queries expose and test data dependencies at the workflow level.
• Hybrid provenance queries situate runtime observables (retrospective provenance) in the overall workflow, yielding meaningful knowledge artifacts.
⇒ Comprehensible workflow graphs & customizable provenance reports for script runs, along with data and code in scientific studies (“provenance for self”).

Hybrid provenance
Logic rules for reconstructing, querying, and visualizing prospective and retrospective provenance together.

[Figure: hybrid provenance architecture. Panel labels, in reading order:
  Sources: YesWorkflow-annotated scripts; General purpose provenance bridges; Raw runtime observations (Log files, File I/O Events, Provenance Recorders); Function call graph and variable dependencies.
  Tools: YesWorkflow toolkit; YesWorkflow toolkit; Provenance Exporters; noWorkflow toolkit.
  Tool functions: Extract annotations and model script as a workflow; Reconstruct script run and retrospective provenance; Query and visualize provenance; Query and visualize provenance.
  Results: Workflow model (graph); Reconstructed provenance; Run observations. Each is exported as Facts (Prolog).]

Annotated script excerpt shown in the figure:
  # @BEGIN Gravitational_Wave_Detection
  # @IN fn_d @as FN_Detector
  # @IN fn_sr @as FN_Sampling_Rate
  # @OUT shifted.wav @as shifted_wave
  # @OUT whitenbp.wav @as whitened_bandpass
  import numpy as np
  from scipy import signal
  …
  # @BEGIN Amplitude_Spectral_Density
  # @IN strain_H1
  # @IN strain_L1
  # @PARAM fs
  # @OUT psd_H1
  # @OUT psd_L1
  # @OUT GW150914_ASDs.png @URI …
  …
  NFFT = 1*fs
  fmin, fmax = 10, 2000
  …

[Figure: function call graph and variable dependencies recorded for the script run. Each node is labeled with a number and a variable or function name, e.g. "136 strain_H1", "139 strain_L1", "262 psd_H1", "263 psd_L1", "630 filtfilt"; the remaining node labels are omitted here.]
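The two ideas in the Approach, revealing the workflow through annotations and situating runtime observables in it, can be sketched in a few lines of Python. This is only an illustrative sketch, not the YesWorkflow or noWorkflow toolkit: the functions `extract_workflow` and `situate` are hypothetical names, nested blocks and `@URI` are ignored, the `@END` line is added here because the excerpt on the poster elides it, and the number/name pairs are copied from node labels in the figure (reading the number as a script line number is an assumption).

```python
import re
from collections import defaultdict

# Excerpt of the annotated script from the poster. The @END line is an
# addition for this sketch; the poster excerpt elides the end of the block.
SCRIPT = """\
# @BEGIN Amplitude_Spectral_Density
# @IN strain_H1
# @IN strain_L1
# @PARAM fs
# @OUT psd_H1
# @OUT psd_L1
NFFT = 1*fs
fmin, fmax = 10, 2000
# @END Amplitude_Spectral_Density
"""

# Matches a comment line carrying one annotation, with an optional @as alias.
ANNOTATION = re.compile(
    r"#\s*@(BEGIN|END|IN|OUT|PARAM)\s+(\S+)(?:\s+@as\s+(\S+))?",
    re.IGNORECASE,
)


def extract_workflow(source):
    """Prospective provenance: map each block name to its declared ports.

    Simplification: blocks are assumed not to be nested.
    """
    blocks = {}
    current = None
    for line in source.splitlines():
        match = ANNOTATION.match(line.strip())
        if not match:
            continue  # ordinary code or comment, not an annotation
        keyword = match.group(1).upper()
        name, alias = match.group(2), match.group(3)
        if keyword == "BEGIN":
            current = name
            blocks[current] = {"in": [], "out": [], "param": []}
        elif keyword == "END":
            current = None
        elif current is not None:
            blocks[current][keyword.lower()].append(alias or name)
    return blocks


def situate(blocks, observations):
    """Hybrid step: attach each runtime observation to matching block ports."""
    port_index = defaultdict(list)
    for block, ports in blocks.items():
        for role, names in ports.items():
            for name in names:
                port_index[name].append((block, role))
    placed, unplaced = [], []
    for number, name in observations:
        hits = port_index.get(name)
        if hits:
            placed.extend((block, role, name, number) for block, role in hits)
        else:
            unplaced.append((number, name))
    return placed, unplaced


# Number/name pairs as they appear on nodes of the dependency graph figure.
OBSERVATIONS = [
    (136, "strain_H1"),
    (139, "strain_L1"),
    (142, "fs"),
    (417, "NFFT"),
    (262, "psd_H1"),
    (263, "psd_L1"),
]

blocks = extract_workflow(SCRIPT)
placed, unplaced = situate(blocks, OBSERVATIONS)
for block, role, name, number in placed:
    print(f"{block}: {role} {name} (observed at {number})")
print("not situated in any block:", unplaced)
```

Matching by variable name is the simplest possible join. Observations that match no declared port, such as `NFFT` here, stay as isolated fragments, which is the situation the Challenges section describes for retrospective provenance on its own.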