Revealing the Detailed Lineage of Script Outputs using Hybrid Provenance

Qian Zhang1, Yang Cao1, Qiwen Wang2, Duc Vu3, Priyaa Thavasimani4, Timothy McPhillips1, Paolo Missier4, Peter Slaughter5, Christopher Jones5, Matthew B. Jones5, Bertram Ludäscher1,2

1 School of Information Sciences, University of Illinois at Urbana-Champaign
2 Department of Computer Science, University of Illinois at Urbana-Champaign
3 Department of Electrical and Computer Engineering, University of Illinois at Chicago
4 School of Computing Science, Newcastle University, UK
5 National Center for Ecological Analysis and Synthesis, UCSB, Santa Barbara, USA

github.com/yesworkflow-org/yw-idcc-17

IDCC 2017, February 22, 2017, Edinburgh, Scotland

Motivation: Provenance is good for self and others …

• Data and workflow provenance are crucial for transparency and reproducibility in computational and data-driven science.
• Scientific workflow systems provide both prospective provenance (workflow graphs) and retrospective provenance (runtime observables).

… but there are some Challenges ...

• Most computational analyses and workflows are conducted using scripts (Python, R, MATLAB, ...) rather than workflow systems.
• Retrospective provenance observables, e.g. from DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level), only yield isolated fragments of the overall data lineage and processing history.
• Prospective provenance links and contextualizes fragments into a meaningful and comprehensible workflow, but scripts alone do not reveal the underlying workflow.
• Provenance (like other metadata) appears to be rarely actionable or immediately useful for those who are expected to provide it ("provenance for others").

Approach

• Simple YesWorkflow (YW) annotations allow authors to reveal the workflow (prospective provenance) implicit in scripts.
• Prospective provenance queries expose and test data dependencies at the workflow level.
• Hybrid provenance queries situate runtime observables (retrospective provenance) in the overall workflow, yielding meaningful knowledge artifacts.
⇒ Comprehensible workflow graphs and customizable provenance reports for script runs, along with data and code in scientific studies ("provenance for self").

[Figure: Hybrid provenance architecture. YesWorkflow-annotated scripts and raw runtime observations (log files, file I/O events, general-purpose provenance recorders, function call graph and variable dependencies) are connected by hybrid provenance bridges: logic rules for reconstructing, querying, and visualizing prospective and retrospective provenance together.]

Excerpt of the YesWorkflow-annotated script shown in the figure:

    # @BEGIN Gravitational_Wave_Detection
    # @IN fn_d @as FN_Detector
    # @IN fn_sr @as FN_Sampling_Rate
    # @OUT shifted.wav @as shifted_wave
    # @OUT whitenbp.wav @as whitened_bandpass
    import numpy as np
    from scipy import signal
    …
    # @BEGIN Amplitude_Spectral_Density
    # @IN strain_H1
    # @IN strain_L1
    # @PARAM fs
    # @OUT psd_H1
    # @OUT psd_L1
    # @OUT GW150914_ASDs.png @URI …
    …
    NFFT = 1*fs
    fmin, fmax = 10, 2000
    …

Components shown in the figure:
• YesWorkflow toolkit: extract annotations and model script as a workflow
• YesWorkflow toolkit: reconstruct script run and retrospective provenance
• Provenance exporters and noWorkflow toolkit: query and visualize provenance
• Products, each exported as facts (Prolog): workflow model (graph), reconstructed provenance, run observations
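The idea of extracting a workflow model from annotations can be illustrated with a minimal Python sketch. This is not the YesWorkflow parser: the regular expression, the `extract_blocks` helper, and the `@END` lines added to close the excerpt are assumptions made for illustration, and only the `@BEGIN`, `@END`, `@IN`, `@OUT`, `@PARAM`, and `@as` keywords seen in the excerpt are handled.

```python
import re

# Hypothetical excerpt modeled on the annotated script in the slides.
# The @END lines are added here (an assumption) so that both blocks are closed.
SCRIPT = """\
# @BEGIN Gravitational_Wave_Detection
# @IN fn_d @as FN_Detector
# @IN fn_sr @as FN_Sampling_Rate
# @OUT shifted.wav @as shifted_wave
# @OUT whitenbp.wav @as whitened_bandpass
import numpy as np
# @BEGIN Amplitude_Spectral_Density
# @IN strain_H1
# @IN strain_L1
# @PARAM fs
# @OUT psd_H1
# @OUT psd_L1
NFFT = 1*fs
# @END Amplitude_Spectral_Density
# @END Gravitational_Wave_Detection
"""

# Matches a comment line such as "# @IN fn_d @as FN_Detector".
ANNOTATION = re.compile(
    r"#\s*@(BEGIN|END|IN|OUT|PARAM)\s+(\S+)(?:\s+@as\s+(\S+))?",
    re.IGNORECASE,
)


def extract_blocks(source):
    """Return {block_name: {"in": [...], "out": [...], "param": [...]}}."""
    blocks = {}
    stack = []  # names of currently open blocks, innermost last
    for line in source.splitlines():
        match = ANNOTATION.match(line.strip())
        if not match:
            continue  # ordinary code or comment, not an annotation
        tag, name, alias = match.group(1).lower(), match.group(2), match.group(3)
        if tag == "begin":
            blocks[name] = {"in": [], "out": [], "param": []}
            stack.append(name)
        elif tag == "end":
            if stack:
                stack.pop()
        elif stack:
            # Ports belong to the innermost open block; prefer the @as alias.
            blocks[stack[-1]][tag].append(alias or name)
    return blocks


blocks = extract_blocks(SCRIPT)
for block_name, ports in blocks.items():
    print(block_name, ports)
```

Because the annotations live in comments, the same scan works for any language whose comment prefix is adjusted in the pattern, which is in the spirit of the language-independent design described in the slides.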

[Figure: noWorkflow (code-level) provenance graph for a run of the gravitational wave detection script. Nodes are function calls and variables labeled by script line number (e.g. loaddata, psd, whiten, iir_bandstops, get_filter_coefs, filter_data, reqshift, write_wavfile); individual node labels omitted. Output files shown include GW150914_H1_whitenbp.wav, GW150914_L1_whitenbp.wav, GW150914_NR_whitenbp.wav, GW150914_H1_shifted.wav, GW150914_L1_shifted.wav, GW150914_NR_shifted.wav, and plots such as GW150914_strain.png, GW150914_ASDs.png, and GW150914_filter.png.]

[Figure labels: YesWorkflow toolkit: query provenance, render workflow model graphically. Provenance queries (esp. graphs) and visualize results.]

[Figure: three views of the subgraph upstream of strain_L1_whitenbp, with panels upstream(strain_L1_whitenbp) [NW-recon], upstream(strain_L1_whitenbp) [prospective], and upstream(strain_L1_whitenbp) [URI-recon].
Workflow steps: LOAD_DATA, AMPLITUDE_SPECTRAL_DENSITY, WHITENING, BANDPASSING.
Data: FN_Detector, FN_Sampling_rate, strain_H1, strain_L1, fs, PSD_H1, PSD_L1, strain_H1_whiten, strain_L1_whiten, strain_L1_whitenbp.
Observed values shown include fn_d = 'L-L1_LOSC_4_V1-1126259446-32.hdf5', fs = 4096, dt = 0.000244140625, Nt = 131072.
URI templates shown: file:H-H1_LOSC_{Rate}_V1-..., file:{Detector}_LOSC_4_V1-..., with files H-H1_LOSC_4_V1-1126259446-32.hdf5, L-L1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_16_V1-1126259446-32.hdf5.]

Provenance types compared in the figure:
• Hybrid provenance: prospective + file-level runtime observables; prospective + code-level runtime observables
• Prospective provenance: user-defined workflow models
• Retrospective provenance: Python runtime observables

Prospective element: script-based workflow graph

• YesWorkflow (YW) tool
  ➢ try.yesworkflow.org
  ➢ YW models a script as a workflow using YW annotations
    ✓ Benefits script authors, future code readers, and/or general users
  ➢ Language-independent: Python, MATLAB, Java, R, C/C++, bash
  ➢ User-declared:
    ✓ Author adds simple YW annotations to reveal the hidden workflow (W)
      ▪ @begin, @end, @in, @out
      ▪ Model what's important
  ➢ Script users can see W's structure graphically (prospective provenance graph), including its underlying dataflow structure
• Visualize or query prospective provenance before running a script.

Prospective provenance queries

• Workflow-structure (general) queries
  ➢ Show/render the complete prospective provenance graph.
  ➢ List all of the code blocks
    ✓ defined in the script, along with any description given for each.
    ✓ nested (directly or indirectly) within a particular code block.
    ✓ that invoke a particular function or external program.
    ✓ that contain a particular block (directly or indirectly).
    ✓ that receive inputs derived (directly or indirectly) from the outputs of a particular upstream code block.
    ✓ affected (directly or indirectly) by a particular parameter value provided to the script.
• Prospective data provenance (general) queries
  ➢ Reveal the upstream lineage / complete derivation of any final or intermediate data product y (i.e., the subgraph of data and steps upstream of y in W).
    ✓ List the inputs to the script
    ✓ List the computational steps (code blocks) involved
  ➢ Compute the downstream influence of x (i.e., the subgraph of the data derived from x along with the derivation steps).
    ✓ Given a data item, to what other downstream items does it contribute? I.e., list the outputs that depend on / are downstream of a particular script input.
  ➢ For a particular computational step, reveal where each input to the step comes from: for example, an input to the script, a constant in the script, or a value produced by a different step.
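The upstream and downstream queries listed above amount to graph reachability over the workflow model. The following Python sketch shows one way to compute them. The step and data names are taken from the C3/C4 example in these slides, but the wiring between them in `WORKFLOW` is an assumption made for illustration (it is not extracted from the actual script), and the helper names `upstream`, `downstream`, and `script_inputs` are hypothetical.

```python
# Assumed wiring, for illustration only: step -> (inputs, outputs).
WORKFLOW = {
    "fetch_SYNMAP_land_cover_map_variable": (
        ["SYNMAP_land_cover_map_data"], ["lon_variable", "lat_variable"]),
    "fetch_monthly_mean_air_temperature_data": (["mean_airtemp"], ["Tair_Matrix"]),
    "fetch_monthly_mean_precipitation_data": (["mean_precip"], ["Rain_Matrix"]),
    "initialize_Grass_Matrix": ([], ["Grass_variable"]),
    "examine_pixels_for_grass": (["Tair_Matrix", "Rain_Matrix"], ["C3_Data", "C4_Data"]),
    "generate_netcdf_file_for_C4_fraction": (
        ["C4_Data", "lon_variable", "lat_variable"], ["C4_fraction_data"]),
    "generate_netcdf_file_for_Grass_fraction": (
        ["Grass_variable", "lon_variable", "lat_variable"], ["Grass_fraction_data"]),
}


def _reach(start, workflow, backward):
    """Collect the data items and steps reachable from `start` in one direction."""
    seen_data, seen_steps = set(), set()
    frontier = [start]
    while frontier:
        item = frontier.pop()
        for step, (ins, outs) in workflow.items():
            # Backward: follow steps that produce the item; forward: steps that consume it.
            touches, nxt = (outs, ins) if backward else (ins, outs)
            if item in touches and step not in seen_steps:
                seen_steps.add(step)
                for data in nxt:
                    if data not in seen_data:
                        seen_data.add(data)
                        frontier.append(data)
    return seen_data, seen_steps


def upstream(data, workflow=WORKFLOW):
    """Data items and steps that `data` depends on, directly or indirectly."""
    return _reach(data, workflow, backward=True)


def downstream(data, workflow=WORKFLOW):
    """Data items and steps derived from `data`, directly or indirectly."""
    return _reach(data, workflow, backward=False)


def script_inputs(workflow=WORKFLOW):
    """Data consumed by some step but produced by none."""
    consumed = {d for ins, _ in workflow.values() for d in ins}
    produced = {d for _, outs in workflow.values() for d in outs}
    return consumed - produced


grass_data, grass_steps = upstream("Grass_fraction_data")
c4_data, c4_steps = upstream("C4_fraction_data")
print(sorted(grass_data & script_inputs()))
print(sorted(c4_data & script_inputs()))
print(sorted(downstream("mean_precip")[0]))
```

Under this assumed wiring, intersecting an upstream set with `script_inputs()` answers "which script inputs does this product depend on", and `downstream` answers the forward question for a given input.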

What C4_fraction_data depends on …

[Figure: overall workflow graph of C3_C4_map_present_NA shown next to the upstream lineage subgraph of C4_fraction_data.
Data nodes: SYNMAP_land_cover_map_data, mean_precip, mean_airtemp, Rain_Matrix, Tair_Matrix, C3_Data, C4_Data, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Grass_variable, C3_fraction_data, C4_fraction_data, Grass_fraction_data.
Steps: fetch_monthly_mean_precipitation_data, fetch_monthly_mean_air_temperature_data, fetch_SYNMAP_land_cover_map_variable, initialize_Grass_Matrix, examine_pixels_for_grass, generate_netcdf_file_for_C3_fraction, generate_netcdf_file_for_C4_fraction, generate_netcdf_file_for_Grass_fraction.]

C4_fraction_data lineage very similar to overall workflow graph!

What Grass_fraction_data depends on …

[Figure: overall workflow graph of C3_C4_map_present_NA shown next to the upstream lineage subgraph of Grass_fraction_data.
Lineage subgraph nodes: SYNMAP_land_cover_map_data, fetch_SYNMAP_land_cover_map_variable, initialize_Grass_Matrix, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Grass_variable, generate_netcdf_file_for_Grass_fraction, Grass_fraction_data.]

Grass_fraction_data lineage different from overall workflow graph!
- Smaller subgraph
- Depends on 1 of 3 inputs!

Retrospective element

• Runtime observables:
  ➢ input, output, and intermediate data products
• Raw runtime observables come from:
  ➢ YW-recon
    ✓ Not a runtime provenance recorder
      ▪ YW reconstructs the script run and retrospective provenance using URI templates
  ➢ YW-recon log files
    ✓ Not a runtime provenance recorder
      ▪ User-defined log files capture runtime observables at any level of granularity
      ▪ YW reconstructs the script run and retrospective provenance using @log associated with @out YW annotations
  ➢ DataONE MATLAB RunManagers (RM)
    ✓ MATLAB runtime recorder
      ▪ Captures provenance at the file level
    ✓ Provenance exporter → run observation facts in Prolog
  ➢ noWorkflow (NW) toolkit
    ✓ Python runtime provenance recorder
      ▪ Captures provenance at both the code level and the file level
    ✓ Provenance exporter → run observation facts in Prolog
    ✓ Query and visualize provenance

Prospective vs. retrospective provenance

• Prospective provenance
  ➢ Forward data provenance describes how data is used/applied after it has been created.
  ➢ Approach:
    ✓ YesWorkflow (YW) tool for modeling scripts as workflows
• Retrospective provenance
  ➢ Backward data provenance describes where the data came from and how it was generated.
  ➢ Approach:
    ✓ Reconstructing provenance with YW-recon
      o Extra: YW-log
    ✓ Recording provenance with DataONE RunManager (RM)
    ✓ noWorkflow (NW)

Hybrid provenance

• Prospective + retrospective
  ➢ Situate runtime observables (retrospective provenance) in the overall workflow, yielding meaningful knowledge artifacts
  ➢ Fragments of workflow execution traces, with structure provided by the user-declared YW models (prospective provenance) and execution details filled in from one or more sources of runtime observables (retrospective provenance)
• Hybrid provenance bridge
  ➢ Logic rules for reconstructing, querying, and visualizing prospective and retrospective provenance together
  ➢ Integrate provenance gathered from file names and paths, log files, data file headers, run metadata, and records of runtime events.
    ✓ YW-RM bridge at the file level
    ✓ YW-NW bridge
      ▪ At the file level
      ▪ At the code level

(General) hybrid queries and views

• Show/render reconstructed provenance graph with all observables in context of the workflow.
• Backward lineage query:
  ◦ Given a file output by a run of a script, indicate the input files from which it was derived or by which it was affected.
  ◦ Show/render reconstructed workflow graph upstream of a given data product.
• Forward lineage query:
  ◦ Given an input file to a script, indicate which output files were derived from or affected by the data contained in that file.
  ◦ Show/render reconstructed workflow graph downstream of a given data product.

What happens after running the script?
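Backward and forward lineage queries amount to reachability over a dependency graph. The Python sketch below shows this with a plain depth-first traversal; it is not how YW evaluates queries (the slides describe logic rules over Prolog facts). The function names are hypothetical, and the edge list is a simplified, hand-written subset using data names from the C3_C4_map_present_NA example: the edges themselves are assumptions made for illustration.

```python
from collections import defaultdict

# (source, derived) pairs: "derived was computed from source". Assumed edges.
EDGES = [
    ("mean_precip", "Rain_Matrix"),
    ("mean_airtemp", "Tair_Matrix"),
    ("Rain_Matrix", "C3_Data"),
    ("Rain_Matrix", "C4_Data"),
    ("Tair_Matrix", "C3_Data"),
    ("Tair_Matrix", "C4_Data"),
    ("SYNMAP_land_cover_map_data", "C3_Data"),
    ("SYNMAP_land_cover_map_data", "C4_Data"),
    ("SYNMAP_land_cover_map_data", "Grass_variable"),
    ("C3_Data", "C3_fraction_data"),
    ("C4_Data", "C4_fraction_data"),
    ("Grass_variable", "Grass_fraction_data"),
]


def build_index(edges):
    """Index the edge list in both directions."""
    parents, children = defaultdict(set), defaultdict(set)
    for source, derived in edges:
        children[source].add(derived)
        parents[derived].add(source)
    return parents, children


def reachable(start, neighbors):
    """All nodes reachable from start by following neighbors transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def backward_lineage(item, edges):
    """Everything the item was derived from (upstream)."""
    parents, _ = build_index(edges)
    return reachable(item, parents)


def forward_lineage(item, edges):
    """Everything derived from the item (downstream)."""
    _, children = build_index(edges)
    return reachable(item, children)
```

With these assumed edges, `backward_lineage("Grass_fraction_data", EDGES)` contains only the land cover branch, which is the kind of answer the upstream views in this deck are meant to reveal.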

Hybrid provenance graph!

[Figure: hybrid provenance graph of the C3_C4_map_present_NA workflow, with the files observed at run time attached to the workflow's data elements.]
Inputs:
• mean_precip: inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.N.nc, N = 1 to 12 (12 files)
• mean_airtemp: inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.N.nc, N = 1 to 12 (12 files)
• SYNMAP_land_cover_map_data: inputs/land_cover/SYNMAP_NA_QD.nc (1 file)
Steps: fetch_monthly_mean_precipitation_data, fetch_monthly_mean_air_temperature_data, fetch_SYNMAP_land_cover_map_variable, initialize_Grass_Matrix, examine_pixels_for_grass, generate_netcdf_file_for_C3_fraction, generate_netcdf_file_for_C4_fraction, generate_netcdf_file_for_Grass_fraction
Intermediate data: Rain_Matrix, Tair_Matrix, C3_Data, C4_Data, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Grass_variable
Outputs:
• C3_fraction_data: outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc
• C4_fraction_data: outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc
• Grass_fraction_data: outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc

• 3 inputs spread across 25 (= 2×12 + 1) files
• Do all 3 output files depend on all 25 inputs?

What C4_fraction_data depends on (hybrid) …

[Figure: the part of the C3_C4_map_present_NA hybrid provenance graph upstream of C4_fraction_data. Files shown: the 12 monthly precipitation files inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.N.nc and the 12 monthly air temperature files inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.N.nc (N = 1 to 12), plus inputs/land_cover/SYNMAP_NA_QD.nc. Steps shown: fetch_monthly_mean_precipitation_data, fetch_monthly_mean_air_temperature_data, fetch_SYNMAP_land_cover_map_variable, examine_pixels_for_grass, generate_netcdf_file_for_C4_fraction. Data shown: mean_precip, mean_airtemp, Rain_Matrix, Tair_Matrix, SYNMAP_land_cover_map_data, C4_Data, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, and C4_fraction_data (outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc).]
[Inset: earlier prospective query result, showing the same steps and data elements without the file-level observations.]

Raw runtime observations | YesWorkflow-annotated scripts | General purpose provenance bridges

[Figure: hybrid provenance architecture, described by column.]
• Raw runtime observations: log files, file I/O events, and the function call graph and variable dependencies. Provenance recorders and provenance exporters (including the noWorkflow toolkit, which can also query and visualize provenance) turn these into run observations, exported as facts (Prolog).
• YesWorkflow-annotated scripts: the YesWorkflow toolkit extracts annotations and models the script as a workflow, giving the workflow model (graph) as facts (Prolog). It also reconstructs the script run and retrospective provenance, giving reconstructed provenance as facts (Prolog). Annotated script excerpt shown in the figure:
    # @BEGIN Gravitational_Wave_Detection
    # @IN fn_d @as FN_Detector
    # @IN fn_sr @as FN_Sampling_Rate
    # @OUT shifted.wav @as shifted_wave
    # @OUT whitenbp.wav @as whitened_bandpass
    …
    import numpy as np
    from scipy import signal
    …
    # @BEGIN Amplitude_Spectral_Density
    # @IN strain_H1
    # @IN strain_L1
    # @PARAM fs
    # @OUT psd_H1
    # @OUT psd_L1
    # @OUT GW150914_ASDs.png @URI …
    …
    NFFT = 1*fs
    fmin, fmax = 10, 2000
    …
• General purpose provenance bridges: logic rules for reconstructing, querying, and visualizing prospective and retrospective provenance together. Provenance queries are answered with the YesWorkflow toolkit, which renders the workflow model graphically, queries provenance, and visualizes results (esp. graphs).
• Resulting views, each illustrated by a thumbnail graph:
  ◦ Prospective provenance: user-defined workflow models (upstream(strain_L1_whitenbp) [prospective])
  ◦ Hybrid provenance: prospective + file-level runtime observables (upstream(strain_L1_whitenbp) [URI-recon]) and prospective + code-level runtime observables (upstream(strain_L1_whitenbp) [NW-recon])
  ◦ Retrospective provenance: Python runtime observables

[Figure: noWorkflow function call and variable dependency graph for the LIGO GW150914 script. Each node is labeled with a script line number and a variable or call name (for example 136 strain_H1, 139 loaddata, 559 iirdesign, 630 filtfilt, 770 write, 811 rfft), grouped by called function (filter_data, whiten, iir_bandstops, get_filter_coefs, write_wavfile, reqshift). Output files shown: GW150914_{NR,H1,L1}_whitenbp.wav, GW150914_{NR,H1,L1}_shifted.wav, GW150914_strain.png, GW150914_ASDs.png, GW150914_strain_whitened.png, GW150914_{H1,L1}_spectrogram.png, GW150914_{H1,L1}_spectrogram_whitened.png, GW150914_filter.png, GW150914_H1_strain_unfiltered.png, GW150914_H1_strain_filtered.png, GW150914_H1_ASD_16384.png, GW150914_H1_ASD_16384_zoom.png, GW150914_H1_ASD_4096_zoom.png.]

[Figure: NW_FILTERED_LINEAGE_GRAPH_FOR_STRAIN_L1_WHITENBP, the noWorkflow lineage of strain_L1_whitenbp with recorded values, for example 141 fn_d = 'L-L1_LOSC_4_V1-1126259446-32.hdf5', 151 fs = 4096, 155 dt = 0.000244140625, 326 Nt = 131072, 364 strain_L1_whitenbp = array([ 8.18464884, 19. ... .18198039, -0.68432653]).]

What Grass_fraction_data depends on …

[Figure: the overall workflow C3_C4_map_present_NA next to two views captioned "Upstream of Grass_fraction_data (hybrid)" and "Upstream of Grass_fraction_data (prospective)". Nodes shown upstream: SYNMAP_land_cover_map_data (inputs/land_cover/SYNMAP_NA_QD.nc), fetch_SYNMAP_land_cover_map_variable, initialize_Grass_Matrix, lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Grass_variable, generate_netcdf_file_for_Grass_fraction, and Grass_fraction_data (outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc).]

Other use cases spanning multiple disciplines

Domain                     | Use case                 | Programming language | Provenance methods
Climate science            | C3C4                     | MATLAB               | YW + MATLAB RunManager
Astrophysics               | LIGO                     | Python               | YW + NW (code-level)
Protein crystal samples    | Simulate data collection | Python               | YW + NW (code-level)
Biodiversity data curation | kurator-SPNHC            | Python               | YW-recon + YW-logging
Social network analysis    | Twitter                  | Python               | YW + NW (file-level)
Oceanography               | OHIBC Howe Sound         | R                    | YW + R RunManager

Rows run from single script/run (top) down to multi-script/run (multi-run, multi-script) at the bottom.

demo site: github.com/yesworkflow-org/yw-idcc-17

Overall workflow

[Figure: YW workflow graph of the LIGO example, GRAVITATIONAL_WAVE_DETECTION. Inputs: FN_Detector (file:{Detector}_LOSC_4_V1-1126259446-32.hdf5) and FN_Sampling_rate (file:H-H1_LOSC_{DownSampling}_V1-1126259446-32.hdf5). Steps and their descriptions:]

• LOAD_DATA: Load hdf5 data.
• AMPLITUDE_SPECTRAL_DENSITY: Amplitude spectral density.
• SPECTROGRAMS_FOR_STRAIN_DATA: Plot spectrogram for strain data.
• FILTER_COEFS: Filter signal in time domain (bandpassing).
• DOWNSAMPLING: Downsampling from 16384 Hz to 4096 Hz.
• WHITENING: Suppress low-frequency noise.
• FILTER_DATA: Filter data.
• BANDPASSING: Remove high-frequency noise.
• SPECTROGRAMS_FOR_WHITEND_DATA: Plot spectrogram for whitened data.
• STRAIN_WAVEFORM_FOR_FILTERED_DATA: Plot the filtered data.
• SHIFT_FREQUENCY_BANDPASSED: Shift frequency of bandpassed signal.
• WAVE_FILE_GENERATOR_FOR_WHITENED_DATA: Make sound files for whitened data.
• STRAIN_WAVEFORM_FOR_WHITENED_DATA: Plot whitened data.
• WAVE_FILE_GENERATOR_FOR_SHIFTED_DATA: Make sound files for shifted data.

Data products shown: strain_L1, strain_H1, fs, strain_16, strain_4, ASDs, spectrogram, H1_ASD_SamplingRate, PSD_L1, PSD_H1, COEFFICIENTS, filtered_white_noise_data, strain_H1_whiten, strain_L1_whiten, strain_L1_filt, strain_H1_filt, spectrogram_whitened, H1_strain_unfiltered, H1_strain_filtered, strain_H1_whitenbp, strain_L1_whitenbp, whitened_bandpass_wavefile, WHITENED_strain_data, strain_H1_shifted, strain_L1_shifted, shifted_wavefile.

Output files shown: file:GW150914_ASDs.png, file:GW150914_{detector}_spectrogram.png, file:GW150914_H1_ASD_{SamplingRate}.png, file:GW150914_filter.png, file:GW150914_{detector}_spectrogram_whitened.png, file:GW150914_H1_strain_unfiltered.png, file:GW150914_H1_strain_filtered.png, file:GW150914_{detector}_whitenbp.wav, file:GW150914_strain_whitened.png, file:GW150914_{detector}_shifted.wav.

LIGO example: What strain_L1_whitenbp depends on …

• 3 inputs spread across 5 (=2x2 + 1) files
• Does intermediate data strain_L1_whitenbp depend on all 5 inputs?
• Intermediate data strain_L1_whitenbp depends only on 2 out of 5 inputs!

[Figure, three panels:]
• upstream(strain_L1_whitenbp) [prospective]: Upstream of strain_L1_whitenbp (prospective). Chain shown: FN_Detector and FN_Sampling_rate, LOAD_DATA, strain_H1, strain_L1, fs, AMPLITUDE_SPECTRAL_DENSITY, PSD_H1, PSD_L1, WHITENING, strain_H1_whiten, strain_L1_whiten, BANDPASSING, strain_L1_whitenbp.
• upstream(strain_L1_whitenbp) [URI-recon]: Upstream of strain_L1_whitenbp (hybrid YW-NW at the file-level). File names shown for FN_Detector and FN_Sampling_rate include L-L1_LOSC_4_V1-1126259446-32.hdf5, H-H1_LOSC_4_V1-1126259446-32.hdf5, and H-H1_LOSC_16_V1-1126259446-32.hdf5.
• upstream(strain_L1_whitenbp) [NW-recon]: Upstream of strain_L1_whitenbp (hybrid YW-NW at the code-level). Values shown: strain_L1 = array([-1.779e-18, -1.765e-18, ..., -1.719e-18]); fs = 4096; psd_L1 = scipy.interpolate.interpolate.interp1d object at 0x113969418; strain_L1_whiten = array([8.494, -1.672, ..., 72.156]); strain_L1_whitenbp = array([8.184, 19.935,..., -0.684]).

YW-log implementation for the SPNHC biodiversity data cleaning use case
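As a hedged illustration of the YW-logging idea in this use case, the sketch below matches each scientificName against a local authority list (exact first, then fuzzy) and logs every decision together with a final status. It is not the kurator-SPNHC code: the function names, the sample names, the log wording, and the 0.8 fuzzy cutoff are all assumptions made for illustration, and the fuzzy matching uses Python's standard difflib as a stand-in for whatever matcher the real workflow uses. The comment annotations follow the YW @begin/@in/@out/@end style, with port names borrowed from the workflow figure.

```python
import difflib

# Illustrative authority list standing in for file:demo_localDB.csv
# (the contents here are made up for the example).
AUTHORITY = ["Placopecten magellanicus", "Nodipecten nodosus", "Cryptopecten vesiculosus"]


# @begin find_name_match @desc Match scientificName against the local authority source
# @in nonEmpty_scientificName
# @in local_authority_source_scientificName_list
# @out matching_record
# @out matching_method
def find_name_match(name, authority, cutoff=0.8):
    """Try an exact match first, then a fuzzy match; return (record, method)."""
    if name in authority:
        return name, "exact"
    close = difflib.get_close_matches(name, authority, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, None
# @end find_name_match


def clean_scientific_names(records, authority):
    """Return (cleaned_records, log_lines, counts) for a list of record dicts."""
    cleaned, log = [], []
    counts = {"accepted": 0, "unable to validate": 0}
    for rec in records:
        rec = dict(rec)  # copy, so the caller's records are left untouched
        name = rec["scientificName"].strip()
        if not name:
            status = "unable to validate"
            log.append(f"record {rec['id']}: scientificName is empty; {status}")
        else:
            match, method = find_name_match(name, authority)
            if match is None:
                status = "unable to validate"
                log.append(f"record {rec['id']}: no match for '{name}'; {status}")
            else:
                status = "accepted"
                if method == "fuzzy":
                    # Log the update from old value to new value before applying it.
                    log.append(f"record {rec['id']}: updated '{name}' to '{match}'")
                    rec["scientificName"] = match
                log.append(f"record {rec['id']}: {method} match; {status}")
        counts[status] += 1
        cleaned.append(rec)
    log.append(f"summary: {counts}")
    return cleaned, log, counts


records = [
    {"id": 1, "scientificName": "Placopecten magellanicus"},  # exact match
    {"id": 2, "scientificName": "Placopecten magelanicus"},   # misspelled, fuzzy match
    {"id": 3, "scientificName": ""},                          # empty name
    {"id": 4, "scientificName": "Homo sapiens"},              # not in the authority list
]
cleaned, log, counts = clean_scientific_names(records, AUTHORITY)
for line in log:
    print(line)
```

Because every branch writes a log line, the log alone is enough to reconstruct which records were accepted, updated, or left unvalidated in a given run.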

[Figure: YW workflow graph clean_name_and_date_workflow.clean_scientific_name. Steps and their descriptions:]

• initialize_run: Create the run log file.
• read_data_records: Read original dataset.
• read_scientific_name: Read scientificName from local authority source.
• check_if_name_is_nonempty: Check if scientificName value is present.
• find_name_match: Find if the scientificName matches any record in the local authority source using exact and fuzzy match.
• log_name_is_empty: Log records of empty scientific name with final status as unable to validate.
• update_authorship: Update authorship if fuzzy match is found.
• update_scientific_name: Update scientificName if fuzzy match is found.
• log_match_not_found: Log record where no match is found in authority source, with final status as unable to validate.
• log_accepted_record: Log record final status as accepted.
• log_updated_record: Log records updating from old value to new value.
• write_data_into_file: Write data into a new file.
• log_summary: Summarize on all the records.

Inputs: original_dataset (file:demo_input.csv), local_authority_source (file:demo_localDB.csv), record_id_data (file:record_id.txt).

Intermediate data: scientificName, other_fields, local_authority_source_scientificName_list, nonEmpty_scientificName, empty_scientificName, RecordID, authorship, matching_record, matching_method, match_result, record_final_status, updated_authorship, updated_scientificName, unable-to-validate_record_count, accepted_record_count.

Outputs: data_with_cleaned_names (file:demo_output_name_val.csv), name_cleaning_log (file:name_val_log.txt).

Conclusions

• Provenance from script runs can be revealed graphically and made actionable (e.g., to yield customizable data lineage reports) via
  ➢ simple YW user annotations,
  ➢ linking runtime observables (e.g. DataONE RunManager, noWorkflow), and
  ➢ sharing provenance artifacts and executable queries.
• Hybrid provenance can be created by
  ➢ prospective provenance + file-level runtime observables
  ➢ prospective provenance + code-level runtime observables
• Hybrid view
  ➢ the dual views of retrospective and prospective provenance can:
    ✓ reveal the overall story of a script run → for documenting and explaining script-based scientific workflows to others
    ✓ enable researchers to query and explore the detailed derivation history of particular data products yielded by script runs

Future work

• Multi-run/script (ongoing)
• Extend the YW toolkit to
  • support other (optional) workflow modeling constructs (e.g., control-flow to complement dataflow)
  • support graph pattern queries
  • support project-level provenance
• Evolve ProvONE to support project-level provenance and graph queries
  • to enable mapping between YW and the ProvONE data model, so that ProvONE-compatible vocabulary extensions may be used in YW in the future
  • to generate and query ProvONE-compatible RDF representations of YW annotations, workflow models, retrospective provenance, etc.

Questions?

Thank you!

• Demo site: https://github.com/yesworkflow-org/yw-idcc-17
• Script (Python, MATLAB, R, …)
• Script-based workflow (W) model
  ➢ ↔ high-level prospective provenance graph that can be visualized
  ➢ User-declared
  ➢ Extracted via the YW toolkit (try.yesworkflow.org)
  ➢ For script authors / future code readers / general users to have a high-level overview of the W's structure, including its underlying dataflow structure
• Query the database related to the conceptual workflow model ↔ prospective provenance queries
  ➢ to reveal dataflow dependencies in the workflow model
• Script run
  ➢ Runtime observables: input, output, and intermediate data products
• Query the database regarding the runtime observables ↔ retrospective provenance queries
  ➢ link the conceptual workflow entities with observations made about the script run
  ➢ combine the information extracted from a marked-up script with references to data files corresponding to a run of that script
• Hybrid query:
  ➢ Hybrid provenance: prospective + retrospective
    ✓ situate runtime observables (retrospective provenance) in the overall workflow, yielding meaningful knowledge artifacts
    ✓ fragments of workflow execution traces with structures provided by the user-declared YW models (prospective provenance) and with execution details filled in from one or more sources of runtime observables (retrospective provenance)
  ➢ → Hybrid views, queries, and visualizations
  ➢ the dual views of retrospective and prospective provenance can:
    ✓ reveal the overall story of a script run → for documenting and explaining script-based scientific workflows to others
    ✓ enable researchers to query and explore the detailed derivation history of particular data products yielded by script runs
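The difference between a prospective and a hybrid upstream query, as in the LIGO example, can be sketched as graph reachability over two dependency relations. This is a minimal sketch under stated assumptions, not the YW toolkit's query implementation: the step model STEPS is a simplified reading of four steps from the LIGO workflow figure, the variable-level dependencies in OBSERVED are hypothetical stand-ins for what a code-level recorder such as noWorkflow might report, and the helper names are invented for this example. Only the value fs = 4096 is taken from the slides.

```python
from collections import deque


def upstream(target, parents):
    """Return every node reachable from target by following parent edges."""
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Prospective model (assumed, simplified): each step with its declared inputs and outputs.
STEPS = {
    "LOAD_DATA": {"in": ["FN_Detector", "FN_Sampling_rate"],
                  "out": ["strain_H1", "strain_L1", "fs"]},
    "AMPLITUDE_SPECTRAL_DENSITY": {"in": ["strain_H1", "strain_L1", "fs"],
                                   "out": ["PSD_H1", "PSD_L1"]},
    "WHITENING": {"in": ["strain_H1", "strain_L1", "PSD_H1", "PSD_L1"],
                  "out": ["strain_H1_whiten", "strain_L1_whiten"]},
    "BANDPASSING": {"in": ["strain_H1_whiten", "strain_L1_whiten"],
                    "out": ["strain_H1_whitenbp", "strain_L1_whitenbp"]},
}


def prospective_parents(steps):
    """At the workflow level, every output of a step depends on all of its inputs."""
    parents = {}
    for step in steps.values():
        for out in step["out"]:
            parents.setdefault(out, set()).update(step["in"])
    return parents


# Retrospective observables (hypothetical): variable-level dependencies seen in one run.
OBSERVED = {
    "strain_L1_whitenbp": {"strain_L1_whiten"},
    "strain_L1_whiten": {"strain_L1", "PSD_L1"},
    "PSD_L1": {"strain_L1", "fs"},
    "strain_L1": {"FN_Detector"},
    "fs": {"FN_Detector"},
}

# Runtime values recorded for the run; only fs is given here.
VALUES = {"fs": 4096}


def hybrid_upstream(target, steps, observed, values):
    """Follow observed dependencies, keep only names declared in the model,
    and attach the recorded runtime value (None if no value was recorded)."""
    declared = {name for step in steps.values() for name in step["in"] + step["out"]}
    return {name: values.get(name)
            for name in upstream(target, observed) if name in declared}


prospective = upstream("strain_L1_whitenbp", prospective_parents(STEPS))
hybrid = hybrid_upstream("strain_L1_whitenbp", STEPS, OBSERVED, VALUES)
print(sorted(prospective))
print(sorted(hybrid))
```

In this toy setup the prospective answer is conservative, since it includes everything any upstream step declares as input, while the hybrid answer keeps the workflow vocabulary but narrows the result to what the recorded run actually touched.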