Applicability Study of the PRIMAD Model to LIGO Gravitational Wave Search Workflows Dylan Chapp Danny Rorabaugh Duncan A
Total Page:16
File Type:pdf, Size:1020Kb
Workshop Presentation P-RECS'19, June 24, 2019, Phoenix, AZ, USA Applicability Study of the PRIMAD Model to LIGO Gravitational Wave Search Workflows Dylan Chapp Danny Rorabaugh Duncan A. Brown [email protected] [email protected] [email protected] University of Delaware University of Tennessee, Knoxville Syracuse University Ewa Deelman Karan Vahi Von Welch [email protected] [email protected] [email protected] University of Southern California University of Southern California Indiana University, Bloomington Michela Taufer [email protected] University of Tennessee, Knoxville ABSTRACT in turn opens the door to reproducibility challenges associated with The PRIMAD model with its six components (i.e., Platform, Re- the implementation of those computational elements. search Objective, Implementation, Methods, Actors, and Data) pro- In order to reason about and assess reproducibility in the compu- vides an abstract taxonomy to represent computational experiments tational context, the PRIMAD model [21] was proposed. PRIMAD and promote reproducibility by design. In this paper, we employ breaks reproducibility into six named components (i.e., Platform, a post-hoc assessment of the model applicability to a set of Laser Research objective, Implementation, Methods, Actors, and Data), Interferometer Gravitational-Wave Observatory (LIGO) workflows each of which represents an element of a computational experi- from literature sources (i.e., published papers). Our work outlines ment where reproducibility can be enforced by design, or conversely potential advantages and limitations of the model in terms of its where a lack of such design can allow irreproducibility to seep in levels of abstraction and means of application. and corrode the overall integrity of the experiment. To evaluate the efficacy of PRIMAD as a tool for characterizing KEYWORDS the reproducibility of real-world computational science workflows, we examine computational workflows used to detect gravitational Reproducibility; LIGO; Workflows; PRIMAD waves using data from the Laser Interferometer Gravitational-Wave ACM Reference Format: Observatory (LIGO) [1] and the Virgo Observatory [10]. These com- Dylan Chapp, Danny Rorabaugh, Duncan A. Brown, Ewa Deelman, Karan putational workflows are designed to detect various astronomical Vahi, Von Welch, and Michela Taufer. 2019. Applicability Study of the events, including binary black hole mergers [3ś7] and binary neu- PRIMAD Model to LIGO Gravitational Wave Search Workflows. In 2nd tron star mergers [8]. International Workshop on Practical Reproducible Evaluation of Computer There are three factors that make the gravitational-wave search Systems (P-RECS’19), June 24, 2019, Phoenix, AZ, USA. ACM, New York, NY, workflows particularly appropriate for our post-hoc study through USA, 6 pages. https://doi.org/10.1145/3322790.3330591 the lens of PRIMAD: (1) LIGO and Virgo have reached a mature status with findings that have been recognized by the scientific 1 INTRODUCTION community at large; (2) gravitational-wave search workflows sup- The ability of the scientific community to incrementally build on port high impact scientific findings (i.e., empirical confirmation of experimental results depends strongly on the ability to trust that the existence of gravitational waves) that are subject to the highest those results are not accidental or transient, but rather that they can levels of scrutiny from the broader scientific community; and (3) the be reproduced to an acceptably high degree of similarity by subse- workflows consume a large amount of data and have high internal quent experiments. This notion of reproducibility is magnified both complexity, leading to reproducibility challenges in terms of the in importance and difficulty in the setting of computational science implementation and data management components of PRIMAD. workflows [39, 40]. An increasingly large fraction of scientific en- We tackle the study of gravitational-wave search workflows in deavors depend on or incorporate computational elements, which a post-hoc fashion using literature sources (i.e., published papers), rather than from the runtime point of view. The study mimics the ef- Permission to make digital or hard copies of all or part of this work for personal or fect of scientists in replicating the work done by others as a starting classroom use is granted without fee provided that copies are not made or distributed point for new research. Specifically, we study two gravitational- for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM wave searches presented by the LIGO Scientific Collaboration and must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, the Virgo Collaboration, and a third search presented by an indepen- to post on servers or to redistribute to lists, requires prior specific permission and/or a dent group of scientists1. These are: (1) the O1 Binary Catalogue [9]; fee. Request permissions from [email protected]. P-RECS’19, June 24, 2019, Phoenix, AZ, USA © 2019 Association for Computing Machinery. 1We note that this independent group contained former LIGO Collaboration members. ACM ISBN 978-1-4503-6756-1/19/06...$15.00 However, they only had access to publicly available data, code, and information in https://doi.org/10.1145/3322790.3330591 their analysis. 1 Workshop Presentation P-RECS'19, June 24, 2019, Phoenix, AZ, USA ambiguous descriptions. This scenario and its challenges match 1 of the LIGO detectors (O1). The LIGO-Virgo O2 Binary Catalogue such efforts as the Student Cluster Competition [22]. paper [23] and its constituent analyses use the dataset from Ob- The third scenario explores the impact of varying, to different serving Run 2 of the LIGO detectors (O2) as well as data from the degrees, all PRIMAD components of the model except the research VIRGO detector. The 1-OGC paper [29, 30] describes a run of the objectives and the data. By using the same data (e.g., the same PyCBC matched filtering workflow on the public LIGO O1 data initial conditions such as pressure, temperature, and volume in a using an updated pipeline configuration compared to the analysis molecular dynamic simulation) but allowing limited variability of in the LIGO-Virgo O1 Binary Catalogue. The relationships between implementations and platforms or both, along with published ma- the workflows and their parent publications are summarized inthe terial and related artifacts, an independent team is able to produce first two rows of Table 2 together with the six components ofthe result that confirm or disprove the same hypothesis presented inthe PRIMAD model. published material. The challenges associated with this, and follow- O1 Binary Catalog [9] O2 Binary Catalog [23] 1-OCG [29] ing scenarios, depend on the degree of variation in the individual PRIMAD Component O1BC- O1BC-PyCBC O2BC-PyCBC O2BC-GstLAL 1-OGC components of the PRIMAD model that we allow. GstLAL The fourth scenario is a scenario that is less constrained than (P)latform Proprietary Proprietary Public the previous one. It explores the impact of varying all PRIMAD • Re-analyze O1 with improved pipelines components except the research objectives. This scenario incorpo- (R)esearch • Discover BBH Mergers • Demonstrate Discover BBH Mergers • Re-analyze O2 with improved data quality reproducibility of O1 Objective • Re-analyze O1 with improved pipelines result on public data rates the additional challenge of collecting new data according to • Produce and release the data generation description in one or multiple original publi- sub-threshold events cations. The study can be performed by using the same or similar (I)mplementation See implementation comparison table implementations and platforms. In other words, minor variations in (M)ethod/ Matched filtering implementation and platform are acceptable in this scenario. Along Algorithm with the published material and related artifacts, an independent (A)ctors LIGO collaboration authors OGC authors team is able to produce results that confirm or disprove the same (D)ata See data comparison table hypothesis. Table 2: Overview of LIGO analyses considered in our case The fifth scenario studies the ability of a team of actors torepro- study in terms of the 6 PRIMAD model components. duce the research of another team using a different implementation (e.g., in molecular dynamics simulations using AMBER rather than CHARMM). They address the same research objectives, but the level of variability in workflow outcomes depends upon the changes in Implementation Component O1BC-PyCBC O2BC-PyCBC O1BC-GstLAL O2BC-GstLAL 1-OGC implementation. Template bank TWB-O1BC TWB-O2-PyCBC TWB-O1BC TWB-O2-GstLAL TWB-OGC The sixth scenario tackles the study of a scientific application See [15, 36, 39] See [18] See [15, 36, 39] See [28] See [14, 32, 33] Waveform model Delegated to SEOBNRv4 + Delegated to SEOBNRv4 + TaylorF2 SEOBNRv4 + for which the workflow has a substantial change of the platform multiple refs. TaylorF2 waveform multiple refs. waveform approximant TaylorF2 waveform approximant approximant and implementation. Specifically, this scenario deals with a major Template bank 15 equally sized technological transformation (e.g., from multiprocessor to acceler- decomposition Delegated to companion papers or proprietary sub-banks Noise-filtering Delegated to [2] Delegated to [37] Delegated to Delegated to [37] Delegated to [28, ated platforms). All components