
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger

Duncan A. Brown, Syracuse University; Karan Vahi, University of Southern California; Michela Taufer, University of Tennessee Knoxville; Von Welch, Indiana University; Ewa Deelman, University of Southern California

Abstract—In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by the LIGO/Virgo collaborations and were executed on the LIGO Data Grid. The codes are publicly available, but there has not yet been an attempt to directly reproduce the results, although several analyses have replicated them, confirming the detection. We attempt to reproduce the result presented in the GW150914 discovery paper using publicly available code on the Open Science Grid. We show that we can reproduce the main result, but we cannot exactly reproduce the LIGO analysis because the original data set used is not public. We discuss the challenges we encountered and make recommendations for scientists who wish to make their work reproducible.

FOR THE SCIENTIFIC COMMUNITY to build on previous results, it must trust that these results are not accidental or transient, but rather that they can be reproduced to an acceptably high degree of similarity by subsequent analyses. This notion of reproducibility is magnified both in importance and challenges in the context of computational science workflows [1]. An increasingly large fraction of scientific results depend on computational elements, which in turn creates reproducibility challenges associated with the implementation of these computational elements. Being able to reason about the validity of published scientific results and re-use them in derivative works becomes an extremely challenging task. Publishers have made great strides in including relevant artifacts along with the manuscripts. However, data, methods, and results are still hard to find and harder still to reproduce (re-creating the results from the original author's data and code), to replicate (arriving at the same conclusion from a study using new data or different methods), and to re-use in derivative works (using code or data from a previous study in a new analysis) [2].

Our work focuses on reproducing the computational analysis used to establish the significance of the first detection of gravitational waves created by colliding binary black holes and observed by the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO) [3]. As part of its commitment to Open Data, LIGO made the data and scientific codes from its first observing run available to the scientific community. Previous analyses have replicated the results of the GW150914 discovery [4], [5]. In these analyses, the data from LIGO's first observing run was re-analyzed either by independent teams of scientists with different codes, with different data, or by using different workflows to those used in the original GW150914 discovery.

In a previous work, we have performed a post-hoc comparison of these results using the published papers and the PRIMAD reproducibility formalism [6]. Here, we attempt to reproduce ab initio the original LIGO analysis used in the GW150914 discovery paper using public information. Specifically, we attempt to reproduce the results of the PyCBC search for gravitational waves [7], [8] shown in Figure 4 of Abbott et al. [3].

Our effort is not completely separate from the original analysis, as one co-author of this paper was a member of the team involved in running the original LIGO analysis. However, our aim was to automate the production of the result in a way that other co-authors of this paper who were not members of the LIGO or Virgo collaborations, as well as other scientists, could reproduce the result. The original analysis workflows were executed on the LIGO Data Grid, a collection of computational resources that are not available to the wider community. Since non-LIGO scientists do not have access to these systems, we execute the analysis on the Open Science Grid (OSG) [9] and rely on a cyberinfrastructure software stack that has the latest stable releases of key software packages such as HTCondor [9], Pegasus [10], and the CERN Virtual Machine Filesystem (CVMFS) [11]. We have created a script that automates the setup and deployment of the LIGO workflows on a typical local compute cluster, and from there Pegasus manages their execution on OSG.

Our main goal was to reproduce the results of the PyCBC search shown in Figure 4 of Abbott et al. [3], shown on the left side of our Figure 1, since this is the result used to make the statement that the signal is detected with a "significance greater than 5.1 σ" in the abstract of the paper. Our reproduction of this plot, shown on the right-hand side of Figure 1, shows that we can reproduce the search result, but there are small, noticeable differences in the search background (explained later in the paper). Based on the LIGO documentation, we believe that these differences are because the data used in the original analysis were different from the data released by the Gravitational Wave Open Science Center (GWOSC) [12] and used in our analysis. Unfortunately, the original data set is not public and so we are unable to confirm this hypothesis. However, we consider our ability to re-run a scientific workflow last executed in 2015 and largely reproduce the results to be a successful demonstration of reproducibility.

This article is structured as follows. First, we provide background on the first gravitational-wave discovery. We then describe our recent efforts on an ab initio analysis to reproduce the GW150914 result, followed by the challenges we encountered. In the results section, we explain the differences observed in our reproduction of the result published by LIGO. We perform an analysis of the workflow run-time provenance data and the compute resources required to execute the workflow. We conclude with recommendations for others who wish to reproduce the GW150914 result.

THE DISCOVERY OF GW150914

Gravitational-wave astronomy is an interesting case study for robust science because it has three main science phases: low-latency data analysis, offline analysis, and public and educational dissemination of results. The low-latency analysis processes instrumental data in near real time to identify astrophysical signals. Alerts are disseminated to the community to identify electromagnetic or neutrino counterparts to the gravitational-wave signal.

[Figure 1: two histogram panels titled "Binary coalescence search" (left: published result; right: our reproduction); x-axis: detection statistic ρ̂c; y-axis: number of events; significance scales from 2σ to >5.1σ (left) and >5.0σ (right); legend: Search Result, Search Background, Background excluding GW150914.]

Figure 1. Results from the binary coalescence search presented in the GW150914 discovery paper, reproduced from [3] with permission (left), and our attempt to reproduce these results (right). These histograms show the number of candidate events (orange markers) and the mean number of background events (black lines) as a function of the search detection statistic, with a bin width of 0.2. The scales on the top give the significance of an event in Gaussian standard deviations based on the corresponding noise background. We were able to reproduce the search result for GW150914, but we were unable to exactly reproduce the search background. The differences between the two figures are likely due to differences in the gravitational-wave strain data used, as described in the text.

Offline analyses validate the low-latency detections, identify signals missed in low latency, and provide determination of source properties. When a detection is published, the data is released to the scientific community and the public. Since the analysis codes are also released, it should be possible for people outside the LIGO Scientific Collaboration and Virgo to reproduce the published results.

Our attempt to reproduce the first detection of gravitational waves from binary black holes, known as GW150914, starts from the data released by the GWOSC [12]. GW150914 was first detected by a low-latency search for gravitational-wave bursts that identifies interesting candidates but does not provide the final statistical significance of detected events. To establish the significance of events, data from the LIGO detectors is subsequently analyzed by scientific workflows that use longer stretches of data to provide a measure of the noise background in the detectors and use this to measure the significance of candidate events. Results from two offline analyses were presented in the GW150914 discovery paper: one that used a search technique that did not make assumptions about the shape of the gravitational waveform [3] and one using matched filtering (comparing the data to a known waveform) to search for the signals from merging black holes [7], known as PyCBC. Here, we focus on reproducing the results of the PyCBC binary black hole search.

The PyCBC search uses matched filtering to compare the LIGO data with a bank of template binary black hole waveforms that model the target sources. If the noise in the LIGO detectors was stationary and Gaussian, the estimation of the statistical significance of candidate events that crossed a signal-to-noise ratio threshold would be straightforward. However, the LIGO detector data contains non-Gaussian noise transients and periods of non-stationary noise. As a result, additional signal-processing techniques are applied to the data to suppress non-Gaussian noise events. The search algorithms require that the same signal is seen in the detectors: the same waveform must be present in both detectors and the signal's time of arrival must be consistent with the gravitational-wave travel time between the observatories. The map between the detection statistic (a weighted signal-to-noise ratio) and the statistical significance of an event must be empirically measured by the workflow. This is done by time-shifting the data between the detectors and repeating the coincidence analysis many times.
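To make the time-shift method concrete, the following is a minimal sketch of the idea; it is not LIGO or PyCBC code, and the trigger times, statistic values, coincidence window, and combining rule are all synthetic, illustrative assumptions. Coincidences formed after sliding one detector's triggers by unphysically large offsets cannot be real signals, so counting how many of them rank above a candidate estimates the candidate's noise background.

```python
import numpy as np

# Minimal illustration of the time-shift (time-slide) method; this is not
# LIGO/PyCBC code and every number below is synthetic. Sliding one
# detector's trigger times by offsets much larger than the light-travel
# time between the sites destroys any real coincidences, so whatever
# coincidences remain are accidental and sample the noise background.

rng = np.random.default_rng(42)

# Synthetic stand-ins for single-detector trigger times (s) and statistics.
h1_times, l1_times = rng.uniform(0, 4096, 500), rng.uniform(0, 4096, 500)
h1_stat, l1_stat = rng.rayleigh(2.0, 500) + 4, rng.rayleigh(2.0, 500) + 4

coinc_window = 0.015  # allowed arrival-time difference (s), illustrative
shift_step = 0.1      # time-slide offset (s); must exceed coinc_window
n_shifts = 200        # number of background time slides

def coincident_stats(l1_shift):
    """Combined statistic for every H1/L1 pair coincident after a shift."""
    shifted = (l1_times + l1_shift) % 4096
    stats = []
    for t, s in zip(h1_times, h1_stat):
        close = np.abs(shifted - t) < coinc_window
        if np.any(close):
            # Quadrature sum as a toy stand-in for the ranking statistic.
            stats.append(np.sqrt(s**2 + np.max(l1_stat[close]) ** 2))
    return np.array(stats)

candidate = 12.0  # detection statistic of a hypothetical candidate event
louder = sum(int(np.sum(coincident_stats(k * shift_step) >= candidate))
             for k in range(1, n_shifts + 1))
print(f"background events at or above the candidate: {louder} "
      f"in {n_shifts} time slides")
```

In the real search the shifts are applied many thousands of times over long stretches of data, which is what makes this step memory- and compute-intensive, as described next.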

The most computationally intensive parts of the PyCBC workflow are the matched filtering and the calculation of the detection statistics. Performing the coincidence and the time-shift analysis can require a large amount of memory to process the candidate events. Once these steps are complete, the workflow produces a measurement of the statistical significance of candidates. A separate script run after the workflow completes produces a histogram that compares candidate events to the noise background.

REPRODUCING THE ANALYSIS

Our work is the first attempt to reproduce the original LIGO analysis. Previous analyses, for example the 1-OGC result [5], provide an example of replication in gravitational-wave science. In the 1-OGC analysis, a different team with a different experimental setup recovered the discovery of GW150914. Here, "different experimental setup" means a modified data-analysis pipeline with a different configuration to that used in the original analysis. The 1-OGC result independently confirmed that GW150914 was a high-significance discovery, but the event was recovered with slightly different parameters to the original discovery; these parameter differences can be explained by the differences between the algorithms used.

Here, we provide an example of reproducibility of the measurement of the statistical significance of GW150914 shown in Figure 4 of Abbott et al. [3]. Reference [3], by its nature as a brief letter, does not provide sufficient information to reproduce the result. Reference [8] provides additional description of the analysis, and the codes^1 and configuration files^2 are publicly released on GitHub. Although the codes and configuration are public, the LIGO/Virgo collaboration does not provide full instructions for running the workflow and reproducing the analysis. Our work provides a fully reproducible process.

Not all of the information needed to reproduce the GW150914 workflow was available in the public release accompanying the publications. The lack of a single, public repository of this knowledge is the most significant challenge for a group outside the LIGO and Virgo collaborations in reproducing the GW150914 result. However, one of the co-authors of our work was a member of the team who performed the original analysis. They were able to review their unpublished notes, which allowed us to successfully reproduce the LIGO analysis. To ensure that scientists who were not involved in the original analysis could reproduce the results, we created scripts that were run independently by another author of this paper who was not involved in the original analysis. These scripts were created in a peer-programming style, which started from the original scripts used to run the LIGO workflow and create the result plot. We iteratively fixed problems encountered when trying to run the analysis using information entirely in the public domain, filling in missing public information with the original analysis notes where necessary.

PyCBC is a gravitational-wave data-analysis toolkit written primarily in Python with C extensions for numerically intensive computations. Re-running old versions of interpreted Python code can be challenging if the underlying software stack has changed since the code was originally executed. Fortunately, LIGO packaged the PyCBC codes used in the original analysis as PyInstaller bundles. These bundles package the Python code with a Python interpreter and the Python library dependencies, allowing us to run the original codes without needing to recreate the entire software stack. Our final version of the workflow execution script is provided in a data release that accompanies this paper^3, and the GitHub commit history^4 documents the iterative process of addressing the issues encountered, which we describe below.

Software versions

Software provenance is critical to the reproducibility of scientific workflows. However, neither the discovery paper published in Physical Review Letters, nor the technical paper published in Physical Review D, documented the exact version of the PyCBC code used to produce the analysis.

^1 https://github.com/gwastro/pycbc
^2 https://github.com/gwastro/pycbc-config
^3 https://doi.org/10.5281/zenodo.4085984
^4 https://github.com/gwastro/gw150914-fig4b/commits/1.1

The notes from the original run recorded that PyCBC v1.3.2 was used, and recorded the git commit hash of the configuration files used (which are stored in a separate GitHub repository).

Open data

The original analysis used data and metadata that are proprietary to the LIGO Scientific Collaboration, and the workflow used tools that queried proprietary servers to locate and access these data. Our script modifies the workflow to use the public data and services provided by GWOSC. For the metadata, we created wrapper codes that have the same command-line API as the proprietary codes and translate these to queries against the public data repositories. The format of the data-quality metadata provided by GWOSC is different to that used in the original analysis. Information from the public LIGO technical note T1600011-v3 was used to determine how to use the public metadata in a way that is as close as possible to the original metadata. LIGO publishes its public data using CVMFS under the gwosc.osgstorage.org organization. Our script configures the workflow to use data from CVMFS, allowing us to rely on its distribution and caching capabilities when running jobs on the OSG. To allow the workflow generation script to find these data, we installed the LIGO Diskcache API to index the CVMFS files and the LIGO Datafind Server to resolve the workflow's metadata queries to file URLs for the CVMFS data. Configuration files for these tools are provided in our data release.

Workflow format

To provide sufficient resources to run the workflow, we executed the computationally intensive jobs on the OSG. This required a newer version of the Pegasus workflow management system [10] than the version originally used to plan and execute the analysis. Our workflow generation script modifies the workflow written by PyCBC v1.3.2 to be compatible with Pegasus 4.9.3.

Access to codes

Although all of the codes used to generate and run the analysis workflow were public, the script used to make the figure shown in Abbott et al. [3] was never released in the PyCBC software repository. Since one of the authors of this paper helped create this script, we were able to obtain the original code used.

WORKFLOW EXECUTION

After modifying the original workflow generation script to address the challenges described in the previous section, we attempted to reproduce the analysis. LIGO did not provide estimates of the runtimes or the resource requirements of the analysis tasks, so we executed the workflow on a combination of local and OSG resources. We used USC-ISI computers to manage the workflow and run the post-processing jobs, and OSG resources to run the computationally intensive jobs. Several challenges were encountered during our attempt to execute the workflow, as described below.

Operating system and hardware mismatches

The PyCBC PyInstaller bundles are not true static executables, nor are they packaged in a robust containerized environment like Singularity. The bundles require the appropriate C standard-library shared objects to be installed on the target machine and perform just-in-time compilation of bundled C code using the now-deprecated scipy.weave module. A standard set of OS libraries, the GNU C Compiler, and processor instructions was guaranteed for the original analysis, as it was run on a single homogeneous LIGO Data Grid cluster. However, not all of the OSG compute nodes had the correct version of the C standard library installed, and some nodes lacked processor instructions that the PyCBC bundles required (specifically, we encountered QEMU-emulated virtual machines that lacked the FMA4 instruction). To address this, we used the Pegasus and HTCondor matchmaking and fault-tolerance functionality, and the ability to express requirements on the desired node characteristics, to steer the PyCBC executables to compatible OSG compute nodes.

Non-deterministic memory use

The amount of memory that the matched-filtering jobs require is determined by the data that they analyze. If the LIGO data contains more non-Gaussian noise than average, more memory is required to compute signal-based vetoes and to store the resulting triggers. Since the noise is random, it is not possible to determine in advance how much memory is required for a given job. To address this, we configured HTCondor to automatically request more memory on each retried filtering job that failed.
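The retry-and-grow policy is expressed in our setup through HTCondor/Pegasus retry settings rather than in Python; the sketch below only illustrates the logic, and the base request, growth factor, and cap are assumed values, not the ones used in the original workflow.

```python
# Illustrative sketch only: in our setup this policy is expressed through
# HTCondor/Pegasus retry settings rather than Python, and the base request,
# growth factor, and cap below are assumptions, not the values used in the
# original workflow.

BASE_MEMORY_MB = 2048   # first-attempt request (assumed)
GROWTH_FACTOR = 2       # grow the request on every retry (assumed)
MEMORY_CAP_MB = 131072  # never ask for more than 128 GB (assumed)

def memory_request_mb(attempt: int) -> int:
    """Memory to request for a matched-filtering job on a given attempt.

    attempt=0 is the first try; each retry increases the request until the
    cap is reached, mirroring the retry-and-grow behaviour described above.
    """
    return min(BASE_MEMORY_MB * GROWTH_FACTOR ** attempt, MEMORY_CAP_MB)

if __name__ == "__main__":
    for attempt in range(5):
        print(f"attempt {attempt}: request {memory_request_mb(attempt)} MB")
```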

Post-processing memory requirements

Several of the workflow's post-processing jobs require very large memory footprints (greater than 128 GB). It was challenging to find machines with sufficient capacity for these jobs on OSG, and so these jobs were executed on the local cluster at USC-ISI. This cluster is managed using HTCondor partitionable slots, allowing a single job to request sufficient memory on our multi-core machines. Coordination with the cluster administrators was required to ensure that these resources were available.

Long-term code archival

During the preparation of this paper, the LIGO Scientific Collaboration deleted the repository at git.ligo.org that stored the compiled PyInstaller PyCBC executables used in the original analysis. We had preserved a copy of the PyCBC v1.3.2 PyInstaller bundles prior to their deletion in an archive file on the IEEE DataPort server^5. We have hosted an uncompressed version of this archive on a USC/ISI web server^6 and configured our workflow generation script to download the bundles from the USC/ISI server. Preserving these executables will allow others to run using the bundles rather than having to recreate the complex PyCBC software stack from the public source code available on GitHub.

RESULTS

Once the various issues described above had been addressed in our workflow-generation script generate_workflow.sh^7, and using LIGO's PyCBC v1.3.2 PyInstaller bundles, we were able to reproduce the LIGO analysis workflow. The workflow contained almost 42,000 tasks. We observed 28,676 task failures as the workflow ran, and approximately 155 days of badput (the amount of computation time used on failed jobs). The majority of job failures were caused by PyInstaller bundles landing on incompatible nodes, and the majority of badput was due to compute-intensive jobs being evicted for using too much memory. Failing jobs were re-run using the retry-on-failure semantics in Pegasus and HTCondor, which then steered these jobs to compatible nodes. Listing 1 shows results retrieved from mining the Pegasus runtime provenance database. Executing the GW150914 workflow requires approximately 22 years of computing time (the sum of the durations of all jobs in the workflow, with each job running on a single core).

The workflow generates a web page with a number of diagnostic plots that are used by LIGO scientists to understand the detector state, the properties of the noise, and the results of the search. For the purpose of reproducing Figure 4 of Abbott et al. [3], the primary data product is a 2.5 GB HDF5 file that contains the triggers found in coincidence between the LIGO detectors, and the search background estimated using the time-slide method [7], [8]. We have archived a compressed version of this HDF5 file on the IEEE DataPort server^8.

To allow future researchers to reproduce our work on their own resources, we show the distribution of physical memory used by LIGO jobs and their run times as frequency histograms in Figure 2 and Figure 3, respectively. In PyCBC workflows, each job type is associated with a transformation (executable). In Table 1 we show the top 10 transformations ordered by the maximum physical memory used. Table 2 shows the top ten transformations ordered by maximum runtime in seconds.

The workflow generates the data required to make Figure 4 of Abbott et al. [3]; however, it does not generate the actual plot. A separate Python plotting script was used to create the histogram. As noted earlier, this script was not made public. Even though we had an internal version of the script, no PyInstaller bundle was created that captured the software stack used by that script. Running the plotting script against current versions of the libraries resulted in failures, so we needed to reproduce the original software stack.

^5 http://dx.doi.org/10.21227/c634-qh33
^6 https://pegasus.isi.edu/ligo/eager/pycbc-software/v1.3.2/
^7 https://doi.org/10.5281/zenodo.4085984
^8 http://dx.doi.org/10.21227/c634-qh33

-----------------------------------------------------------------------------
Type            Succeeded   Failed   Incomplete   Total    Retries
Tasks           41856       0        0            41856    28676
Jobs            46631       0        0            46631    28676
Sub-Workflows   8           0        0            8        104
-----------------------------------------------------------------------------

Workflow wall time               : 29 days, 0 hrs
Cumulative job wall time         : 22 years, 54 days
Cumulative job badput wall time  : 155 days, 13 hrs

# Integrity Metrics
# Number of files for which checksums were compared/computed along
# with total time spent doing it.
94713 files checksums compared with total duration of 7 hrs, 55 mins
46200 files checksums generated with total duration of 4 hrs, 9 mins

# Integrity Errors
# Total:
#   Total number of integrity errors encountered across all job
#   executions (including retries) of a workflow.
# Failures:
#   Number of failed jobs where the last job instance had integrity errors.
Total: A total of 54 integrity errors encountered in the workflow
Failures: 0 job failures had integrity errors

Listing 1. Output of the pegasus-statistics tool showing runtime statistics from the OSG run
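The cumulative wall time and badput figures in Listing 1 are, conceptually, simple aggregations over per-job duration and exit-status records. The sketch below illustrates that aggregation on a few invented job records; it does not query the actual Pegasus provenance database or reproduce the pegasus-statistics implementation.

```python
from datetime import timedelta

# Invented job records for illustration: (duration in seconds, exit status).
# The real numbers come from the Pegasus runtime provenance database that
# pegasus-statistics queries; this sketch only shows the aggregation idea.
job_records = [
    (5400.0, 1),   # a matched-filtering attempt evicted for exceeding memory
    (5400.0, 0),   # the successful retry of the same job
    (9800.0, 0),   # a post-processing job
    (120.0, 0),    # a small plotting job
]

cumulative = timedelta(seconds=sum(d for d, _ in job_records))
badput = timedelta(seconds=sum(d for d, status in job_records if status != 0))

print(f"cumulative job wall time: {cumulative}")
print(f"badput (time spent on failed attempts): {badput}")
```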

LIGO Job Transformation | Count | Mean runtime (s) | Min mem (MB) | Max mem (MB) | Mean mem (MB)
distribute background bins-(FDFCC) 12H-H1L1 ID15 | 1 | 9,673.17 | 194,898.65 | 194,898.65 | 194,898.65
statmap-(FDFCC) 12H-H1L1 ID16 | 3 | 9,698.37 | 16,218.66 | 189,332.06 | 103,789.45
plot snrifar-(FDFCC) 12H (FD FB) 2-H1L1 ID32 | 1 | 20,459.47 | 150,602.64 | 150,602.64 | 150,602.64
plot snrifar-(FDFCC) 12H (FD FB) 2 IFAR-H1L1 ID34 | 1 | 2,177.91 | 62,365.84 | 62,365.84 | 62,365.84
plot snrifar-(FDFCC) 12H (FD FB) 2 CLOSED-H1L1 ID21 | 1 | 1,654.36 | 54,009.43 | 54,009.43 | 54,009.43
plot snrifar-(FDFCC) 12H (FD FB) 0 IFAR-H1L1 ID24 | 1 | 1,473.62 | 40,302.30 | 40,302.30 | 40,302.30
plot snrifar-(FDFCC) 12H (FD FB) 0-H1L1 ID22 | 1 | 1,484.56 | 40,302.11 | 40,302.11 | 40,302.11
plot snrifar-(FDFCC) 12H (FD FB) 0 CLOSED-H1L1 ID19 | 1 | 1,167.86 | 34,923.08 | 34,923.08 | 34,923.08
plot singles-MTOTAL EFFSPIN NEWSNR FULL DATA-H1 ID47 | 1 | 1,302.16 | 32,334.82 | 32,334.82 | 32,334.82
plot singles-ENDTIME DURATION NEWSNR FULL DATA-H1 ID45 | 1 | 1,168.63 | 32,334.81 | 32,334.81 | 32,334.81

Table 1. Top 10 LIGO job types by maximum physical memory (maxrss) used, in MB, where (FDFCC) expands to FULL DATA FULL CUMULATIVE CAT and (FD FB) to FULL DATA FULL BIN.

LIGO Job Transformation | Count | Mean runtime (s) | Min mem (MB) | Max mem (MB) | Mean mem (MB)
hdf trigger merge-FULL DATA-L1 ID12 | 1 | 376,478.99 | 1,026.20 | 1,026.20 | 1,026.20
calculate psd-PART5-H1 ID75 | 1 | 205,419.13 | 1,027.13 | 1,027.13 | 1,027.13
calculate psd-PART3-H1 ID73 | 1 | 204,379.13 | 1,024.27 | 1,024.27 | 1,024.27
calculate psd-PART4-H1 ID74 | 1 | 202,814.55 | 1,024.95 | 1,024.95 | 1,024.95
calculate psd-PART1-H1 ID71 | 1 | 202,169.56 | 1,024.81 | 1,024.81 | 1,024.81
calculate psd-PART9-H1 ID79 | 1 | 167,258.14 | 1,356.44 | 1,356.44 | 1,356.44
calculate psd-PART9-L1 ID68 | 1 | 116,403.74 | 1,408.61 | 1,408.61 | 1,408.61
calculate psd-PART1-L1 ID60 | 1 | 115,357.57 | 1,386.77 | 1,386.77 | 1,386.77
calculate psd-PART0-L1 ID59 | 1 | 110,270.45 | 1,025.48 | 1,025.48 | 1,025.48
calculate psd-PART7-L1 ID66 | 1 | 109,125.78 | 1,398.72 | 1,398.72 | 1,398.72
calculate psd-PART8-H1 ID78 | 1 | 71,150.19 | 1,344.01 | 1,344.01 | 1,344.01

Table 2. Top 10 LIGO job types by maximum runtime in seconds.
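One way to use the profiles in Tables 1 and 2 is to turn the observed peaks directly into scheduler resource requests. The sketch below does this for a few transformation types; the peak-memory values are taken from Table 1, but the 20% safety margin and the rounding rule are our own assumptions, not part of the original workflow configuration.

```python
import math

# Peak physical memory (MB) observed for a few transformation types,
# taken from Table 1. The 20% safety margin and the rounding to whole GB
# are assumptions for illustration.
peak_memory_mb = {
    "distribute_background_bins": 194_898.65,
    "statmap": 189_332.06,
    "plot_snrifar": 150_602.64,
}

SAFETY_MARGIN = 1.2  # request 20% above the observed peak (assumed)

def request_memory_gb(transformation: str) -> int:
    """Round the padded peak memory up to whole GB for a scheduler request."""
    return math.ceil(peak_memory_mb[transformation] * SAFETY_MARGIN / 1024)

for name in peak_memory_mb:
    print(f"{name}: request {request_memory_gb(name)} GB")
```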

[Figure 2: frequency histogram of the maximum physical memory (MAXRSS) used per job; x-axis: MAXRSS in GB (1 to 256); y-axis: count (number of jobs).]

Figure 2. Frequency histogram showing the maximum physical memory used by LIGO jobs, as reported by pegasus-kickstart, in the range of 1 to 256 GB, with both the X and Y axes on a log scale.

[Figure 3: frequency histogram of job runtimes; x-axis: runtime in thousands of seconds (0 to 380,000 s in 10,000 s bins); y-axis: count (number of jobs).]

Figure 3. Frequency histogram showing the runtime of LIGO jobs, as reported by pegasus-kickstart, in the range of 0 to 380,000 seconds.

Reproducing the original software stack was a considerable challenge and illustrates the importance of releasing containerized executables, in addition to the source code, for reproducibility in scientific analyses. We obtained the versions of PyCBC and LALSuite used by this code from notes made at the time of the original analysis (v1.3.4 and v6.36, respectively). We then determined the necessary and sufficient set of lower-level libraries required by these high-level libraries by examining the setup.py and requirements.txt in PyCBC v1.3.4. Using the PyCBC v1.3.4 install instructions, we created a Python virtual environment with the same versions of pip and virtualenv used in the original analysis. An iterative process of running the LALSuite configure script was performed until all the required dependencies of LALSuite were installed. The specific versions of fourteen libraries (and their dependencies) were either installed using pip or compiled from source into the Python virtual environment. The iterative process of determining the required dependencies was complicated by the fact that pip caches previous software builds, and so the install process is not necessarily idempotent.

After these libraries were installed, LALSuite and PyCBC were installed and the Python plotting script was executed. Our data release includes a script make_pycbc_hist.sh^9 that automates the installation and execution of the plotting code. Our reproduction of the LIGO result is shown in Figure 1, which includes the original LIGO/Virgo result for comparison.
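For readers who want to explore the archived HDF5 trigger file themselves, the sketch below shows the general shape of such a plotting step: read the foreground and time-shifted background statistics and build the cumulative histogram of Figure 1. It is not the original plotting script, and the file name, dataset paths, and attribute name are placeholders; inspect the real file with h5py to find its actual layout.

```python
import h5py
import numpy as np
import matplotlib.pyplot as plt

# The file name, dataset paths, and attribute name below are placeholders;
# inspect the archived HDF5 file (for example with h5py's visit()) to find
# its actual layout. This is not the original LIGO plotting script.
with h5py.File("H1L1-STATMAP.hdf", "r") as f:             # hypothetical name
    foreground_stat = f["foreground/stat"][:]              # hypothetical path
    background_stat = f["background/stat"][:]              # hypothetical path
    n_slides = f.attrs.get("num_time_slides", 1)           # hypothetical attr

bins = np.arange(8.0, 24.2, 0.2)  # bin width 0.2, as in Figure 1
# Cumulative counts of events at or above each statistic value.
fg_counts = [np.sum(foreground_stat >= b) for b in bins]
# Divide the background counts by the number of slides to approximate the
# mean number of background events per experiment.
bg_counts = [np.sum(background_stat >= b) / n_slides for b in bins]

plt.semilogy(bins, fg_counts, "o", label="Search Result")
plt.semilogy(bins, bg_counts, "k-", label="Search Background")
plt.xlabel(r"Detection statistic $\hat{\rho}_c$")
plt.ylabel("Number of events")
plt.legend()
plt.savefig("gw150914-fig4b-sketch.png")
```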

We find that we were able to reproduce the search result. However, there are some small but noticeable differences in the search background (continuous black line) and the lower bound on the significance that the workflow reports for GW150914; we find that the significance is greater than 5σ, rather than greater than 5.1σ (original plot). We attribute both of these differences to changes in the input data used by the workflow. The usage instructions for the GWOSC data state that the LIGO strain data in the public data set are based on the C02 calibration of the LIGO detectors, whereas the original PyCBC configuration files state that C01 data was used for the analysis of Abbott et al. [3]. We hypothesize that the GWOSC C02-based data contains slightly less analysis time than the C01 data originally used. This would result in a lower bound for the significance of the event and produce slight changes in the search background. However, we are not able to verify this, as we were unable to obtain access to the proprietary C01 data.

CONCLUSIONS

We have described the process and the challenges encountered in reproducing the measured statistical significance of GW150914. Our script is configured to execute compute-intensive jobs on the OSG. To allow scientists to run on other resources, our data release provides instructions for running all jobs on local resources. The memory and runtime profiling of the workflow tasks provided in this paper will enable appropriate resource selection.

^9 https://doi.org/10.5281/zenodo.4085984

Our execution of the workflow used HTCondor as the job scheduler; this scheduler is also used by the LIGO Data Grid. Although one could modify our workflow generation script to use alternative job schedulers, we recommend against this because of the wide variance in the memory requirements of the jobs and the need for a relatively homogeneous environment. We rely on HTCondor for job resubmission in case of failure and on its mechanisms of custom-created HTCondor classads to increase the memory requested for a job in case of failures. When using a scheduler like SLURM, we recommend that one use the upper memory bounds we provide in this paper. A better practice would be to overlay HTCondor on the native scheduler using resource-provisioning techniques such as HTCondor glideins. We also recommend the use of CVMFS to access GWOSC data. Although Pegasus can be configured to transfer the data at runtime, e.g., from the submit host where the workflow system is located (USC/ISI in our case) or via HTTP from the GWOSC web site, this requires the movement of tens of thousands of input data files. It is more efficient to rely on the CVMFS storage and caching mechanism and to configure Pegasus to create symbolic links to the CVMFS locations, rather than performing true copies.

We have demonstrated that, although LIGO did not provide complete instructions for reproducing the GW150914 result, sufficient information exists, either in the public domain or recorded as notes describing the original analysis, to reproduce the PyCBC GW150914 workflow. Although we made substantial progress in reproducing the PyCBC result shown in Figure 4 of Ref. [3], we were unable to reproduce it exactly, as we did not have access to the original input data and metadata. The LIGO data needs to be calibrated based on the understanding of the characteristics of the instrument and its state. These calibrations may change over time as the knowledge about the detectors improves. Data providers often want to publish "best quality" data and not provide earlier, outdated versions. This is the case with the data from LIGO's first observing run, where only the final calibrated data is public. If the original data used in the GW150914 discovery paper are made public, it is straightforward to modify our workflow generation script to use these data. As part of this work, we have released scripts that allow other scientists to reproduce the LIGO analysis using publicly available data, on their own compute resources or on the OSG, using the latest stable versions of Pegasus and HTCondor.

Our results show that, in principle, it is possible to release instructions and code that allow other scientists to reproduce a major scientific result. We encourage scientists who wish to do so to ensure that instructions include: access to the original data and codes used; documentation of software and configuration file versions; containerized executables that capture the complete software stack used in the original analysis; long-term archival of the code and data products used; and documentation about the computational resources needed to execute the analysis. Understanding how reproducibility is incorporated in astrophysics workflows in general, and in scientific workflows in particular, through the sharing of practices in reproducible scientific software will help enable open science across disciplines. Codes, data, and workflows generated by this and similar efforts can ultimately enable researchers and students at various levels of education to regenerate the same findings, learn about the scientific methods, and engage in new science, technology, engineering, and mathematics (STEM) research.

ACKNOWLEDGMENT

The scripts and supporting codes used to run the PyCBC workflow generation code using the GWOSC data and to make the result plot described in this article are available from GitHub at https://github.com/gwastro/gw150914-fig4b. The specific version used in this work was https://doi.org/10.5281/zenodo.4085984. The PyCBC v1.3.2 PyInstaller bundles used by the workflow and the HDF5 file created by the PyCBC workflow are available from http://dx.doi.org/10.21227/c634-qh33.

We would like to thank Mats Rynge for providing input on configuring the pipeline to run on OSG, Alexander Nitz and Maria Alessandra Papa for providing the script used to make the PyCBC result histogram, and Stuart Anderson for helpful discussions. This work was supported by the U.S. National Science Foundation under Grants OAC-1823378, OAC-1823405, OAC-1841399, and OAC-1823385.

Pegasus is supported by the U.S. National Science Foundation under Grant OAC-1664162. The Open Science Grid is supported in part by the U.S. National Science Foundation under Grant PHY-1148698, and by the U.S. Department of Energy's Office of Science. This research has made use of data, software and/or web tools obtained from the Gravitational Wave Open Science Center, a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration. LIGO is funded by the U.S. National Science Foundation. Virgo is funded, through the European Gravitational Observatory (EGO), by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale della Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by institutions from Belgium, Germany, Greece, Hungary, Ireland, Japan, Monaco, Poland, Portugal, and Spain.

REFERENCES

1. V. Stodden, M. McNutt, D. H. Bailey, E. Deelman, Y. Gil, B. Hanson, M. A. Heroux, J. P. Ioannidis, and M. Taufer, "Enhancing reproducibility for computational methods," Science, vol. 354, no. 6317, pp. 1240–1241, 2016, doi:10.1126/science.aah6168.
2. M. A. Heroux, L. Barba, M. Parashar, V. Stodden, and M. Taufer, "Toward a compatible reproducibility taxonomy for computational and computing sciences," Technical Report, Sandia National Laboratories, 2018, doi:10.2172/1481626.
3. B. P. Abbott et al., "Observation of gravitational waves from a binary black hole merger," Phys. Rev. Lett., vol. 116, p. 061102, Feb 2016, doi:10.1103/PhysRevLett.116.061102.
4. T. Venumadhav, B. Zackay, J. Roulet, L. Dai, and M. Zaldarriaga, "New search pipeline for compact binary mergers: Results for binary black holes in the first observing run of Advanced LIGO," Phys. Rev. D, vol. 100, no. 2, p. 023011, 2019, doi:10.1103/PhysRevD.100.023011.
5. A. H. Nitz, C. Capano, A. B. Nielsen, S. Reyes, R. White, D. A. Brown, and B. Krishnan, "1-OGC: The first open gravitational-wave catalog of binary mergers from analysis of public Advanced LIGO data," Astrophys. J., vol. 872, no. 2, p. 195, 2019, doi:10.3847/1538-4357/ab0108.
6. D. Chapp, D. Rorabaugh, D. A. Brown, E. Deelman, K. Vahi, V. Welch, and M. Taufer, "Applicability Study of the PRIMAD Model to LIGO Gravitational Wave Search Workflows," in Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS@HPDC), pp. 1–6, 2019, doi:10.1145/3322790.3330591.
7. S. A. Usman, A. H. Nitz, I. W. Harry, C. M. Biwer, D. A. Brown, M. Cabero, C. D. Capano, T. D. Canton, T. Dent, S. Fairhurst, M. S. Kehl, D. Keppel, B. Krishnan, A. Lenon, A. Lundgren, A. B. Nielsen, L. P. Pekowsky, H. P. Pfeiffer, P. R. Saulson, M. West, and J. L. Willis, "The PyCBC search for gravitational waves from compact binary coalescence," Class. Quant. Grav., vol. 33, no. 21, p. 215004, 2016, doi:10.1088/0264-9381/33/21/215004.
8. B. P. Abbott et al., "GW150914: First results from the search for binary black hole coalescence with Advanced LIGO," Phys. Rev. D, vol. 93, no. 12, p. 122003, 2016, doi:10.1103/PhysRevD.93.122003.
9. B. Bockelman, T. Cartwright, J. Frey, E. M. Fajardo, B. Lin, M. Selmeci, T. Tannenbaum, and M. Zvada, "Commissioning the HTCondor-CE for the Open Science Grid," Journal of Physics: Conference Series, vol. 664, no. 6, p. 062003, 2015, doi:10.1088/1742-6596/664/6/062003.
10. E. Deelman, K. Vahi, M. Rynge, R. Mayani, R. Ferreira da Silva, G. Papadimitriou, and M. Livny, "The evolution of the Pegasus workflow management software," Computing in Science & Engineering, vol. 21, no. 4, pp. 22–36, 2019, doi:10.1109/MCSE.2019.2919690.
11. D. Weitzel, B. Bockelman, D. A. Brown, P. Couvares, F. Würthwein, and E. F. Hernandez, "Data access for LIGO on the OSG," in Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact. New York, NY: ACM, 2017, p. 24, doi:10.1145/3093338.3093363.
12. M. Vallisneri, J. Kanner, R. Williams, A. Weinstein, and B. Stephens, "The LIGO Open Science Center," J. Phys. Conf. Ser., vol. 610, no. 1, p. 012021, 2015, doi:10.1088/1742-6596/610/1/012021.

Duncan A. Brown is the Charles Brightman Professor of Physics at Syracuse University, Syracuse, NY, USA. Dr. Brown received a Ph.D. degree in physics from the University of Wisconsin-Milwaukee in 2004. He was a member of the LIGO Scientific Collaboration from 1999 to 2018 and is a fellow of the American Physical Society. He is a co-author of the paper "Data access for LIGO on the OSG," which won Best Software and Data Paper at PEARC17.

His research is in gravitational-wave astronomy and astrophysics, and the use of large-scale scientific workflows. Contact him at [email protected].

Karan Vahi is a Senior Computer Scientist at the USC Information Sciences Institute, Marina del Rey, CA, USA. Karan received an M.S. in Computer Science in 2003 from the University of Southern California. He is a co-author of the paper "Integrity Protection for Scientific Workflow Data: Motivation and Initial Experiences," which won Best Paper in the Advanced Research Computing Software and Applications Track and also the "Phil Andrews Most Transformative Contribution Award" at PEARC19. His research interests include scientific workflows and distributed computing systems. Contact him at [email protected].

Michela Taufer holds the Jack Dongarra Professorship in High Performance Computing within the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. Dr. Taufer received her Ph.D. in computer science from the Swiss Federal Institute of Technology (ETH) in 2002. She is an ACM Distinguished Scientist and an IEEE Senior Member. Her interdisciplinary research is at the intersection of computational sciences, high performance computing, and data analytics. Contact her at [email protected].

Von Welch is the acting associate vice president for information security, executive director for the OmniSOC, executive director for cybersecurity innovation at Indiana University, and the director of IU's Center for Applied Cybersecurity Research (CACR). He specializes in cybersecurity for distributed systems, particularly scientific collaborations and federated identity. Contact him at [email protected].

Ewa Deelman received her Ph.D. in Computer Science from the Rensselaer Polytechnic Institute. She is a Research Director at USC/ISI and a Research Professor in the USC Computer Science Department. Her research explores the interplay between automation and the management of scientific workflows that include resource provisioning and data management. Her group has led the design and development of the Pegasus Workflow Management software (http://pegasus.isi.edu) and conducts research in job scheduling and resource provisioning in distributed systems, workflow performance modeling, provenance capture, and the use of cloud platforms for science. Dr. Deelman is an AAAS and IEEE Fellow. Contact her at [email protected].
