Research Thesis: Declaration of Authorship
Total Page:16
File Type:pdf, Size:1020Kb
Research Thesis: Declaration of Authorship Print name: Michael Johnson Title of thesis: Improving the Quality of Astronomical Survey Data I declare that this thesis and the work presented in it is my own and has been generated by me as the result of my own original research. I confirm that: 1. This work was done wholly or mainly while in candidature for a research degree at this University; 2. Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated; 3. Where I have consulted the published work of others, this is always clearly attributed; 4. Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work; 5. I have acknowledged all main sources of help; 6. Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself; 7. Parts of this work have been published as: [please list references below]: Signature: Date: References: Johnson, Michael AC, et al. "Prospecting for periods with LSST–low-mass X-ray binaries as a test case." Monthly Notices of the Royal Astronomical Society 484.1 (2019): 19-30. Strader, Jay, et al. "The Plane's The Thing: The Case for Wide-Fast-Deep Coverage of the Galactic Plane and Bulge." arXiv preprint arXiv:1811.12433 (2018). Johnson, Michael AC, et al. "Using the provenance from astronomical workflows to increase processing efficiency." International Provenance and Annotation Workshop . Springer, Cham, 2018. University of Southampton Research Repository Copyright © and Moral Rights for this thesis and, where applicable, any accompanying data are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis and the accompanying data cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content of the thesis and accompanying research data (where applicable) must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holder/s. When referring to this thesis and any accompanying data, full bibliographic details must be given, e.g. When referring to this thesis and any accompanying data, full bibliographic details must be given, eTh.ge. sis: Author (Year of Submission) "Full thesis title", University of Southampton, name of the University Faculty or School or Department, PhD Thesis, pagination. Thesis: Author (Year of Submission) "Full thesis title", University of Southampton, name of the UDantivae: rAsuittyh Foar c(uYletay ro) rT Sitcleh.o UoRl oI r[d Daetapsaertm] ent, PhD Thesis, pagination. Data: Author (Year) Title. URI [dataset] UNIVERSITY OF SOUTHAMPTON Improving the Quality of Astronomical Survey Data by Michael A. C. Johnson A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science March 2020 UNIVERSITY OF SOUTHAMPTON ABSTRACT FACULTY OF ENGINEERING, SCIENCE AND MATHEMATICS SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE Doctor of Philosophy by Michael A. C. Johnson iv Astronomical survey telescopes are becoming increasing capable at generating large datasets. The quantities of data being produced necessitate the automation of the data processing which is commonly accomplished via astronomical workflows. The large scale of the data also means that small improvements in the quality of the data processing can have large implications for the value of the science gained. However, deciding on which workflow configuration is best is usually a qualitative process, achieved through trial and improvement which lacks a quantitative measure of the quality of the results produced by each workflow version. Consequently, the best workflow cannot be reliably chosen. Thorough analysis is typically applied to find specific outputs from astronomical workflows, such as the magnitude of an object. However, this targeted analysis focuses on specific components and does not utilise the wider workflow space or the provenance of the workflows. This thesis therefore outlines an approach to be applied to workflows to assess over different workflow versions and measure the quality of data that they produce. To test the approach, it was applied to three separate use cases. The first application used the approach to predict the completeness of period recovery of tran- sient and variable astronomical sources with several candidate observing strategies from upcoming front line astronomical surveys. It was found that observing strategies which did not reduce the observations within the Galactic Plane increase the completeness by a factor of ∼3. The second was an investigation into the use of provenance to improve the timeliness of a differential photometry workflow. It was found that this method offered improvements of at least 96% in computational efficiency when analysing the outlined use cases. The third application was to improve the accuracy and completeness of a workflow designed to search for transients within a set of archival calibration data from an astronomical survey telescope. Workflow configurations were generated using the manual method in addition to via the approach. The best performing workflow found through the approach outperformed the workflow generated through the manual method and consequently found an additional ∼2,500 transient events. However, full evaluation of the approach could be a computationally expensive process, therefore the hill climbing algorithm was also investigated as a means to quickly find a verifiably good workflow configuration. The quality of the results produced by the workflow generated through this method were found to be within 0.2% of those produced by the highest quality workflow found. Contents Nomenclature xxiii Acknowledgements xxv 1 Introduction 1 1.1 Research Statement . 3 1.2 Contributions . 3 1.3 Structure . 4 2 Literature Review 5 2.1 Astronomical Objects . 5 2.1.1 Variable and Transient Objects . 5 2.1.1.1 Long Period Variable Stars . 5 2.1.1.2 Cataclysmic Variables . 6 2.1.1.3 X-ray Binaries . 6 2.2 Astronomical Data . 9 2.2.1 Photometry . 11 2.2.1.1 Source Detection . 11 2.2.1.2 Background Determination . 12 2.2.1.3 Image Registration . 12 2.2.1.4 Image Subtraction . 13 2.2.2 Data Analysis . 14 2.2.2.1 Period Determination . 14 2.3 Large Scale Astronomical Survey Telescopes . 15 2.3.1 The Large Synoptic Survey Telescope . 15 2.3.2 Gaia Space Observatory . 18 2.3.3 The Kepler Space Telescope . 19 2.3.4 Instrumentation and Imaging . 19 2.4 Handling Large Astronomical Datasets . 21 2.4.1 Data Types & Format . 21 2.4.2 Astronomical Workflows . 21 2.5 Provenance . 22 2.5.1 Template Provenance . 23 2.5.2 Provenance from Scientific Workflows . 24 2.6 The Five Characteristics of Good Quality Data . 24 2.6.1 Accuracy and Precision . 25 2.6.2 Reliability . 26 v vi CONTENTS 2.6.3 Timeliness and Relevance . 27 2.6.4 Completeness . 27 2.6.5 Availability and Accessibility . 27 2.7 Summary . 28 3 An Approach for Reasoning over Workflow Versions 29 3.1 Problem Formulation . 30 3.2 Definitions . 31 3.2.1 Processor . 31 3.2.2 Parameter . 32 3.2.3 Parameter Space . 32 3.2.4 Objects . 32 3.2.5 Workflow . 33 3.2.6 Edge . 34 3.2.7 Actual Quality Result . 34 3.2.8 Expected Quality Result . 34 3.2.9 Utility Function . 35 3.2.10 Provenance . 35 3.3 Problem Statements . 35 3.3.1 Theoretical Analysis of the Problem . 37 3.3.2 Implementation . 39 3.4 Summary . 41 4 Prospecting for Periods with the Large Synoptic Survey Telescope 43 4.1 The Requirements of the Workflow . 45 4.1.1 LMXB Lightcurve Simulations . 45 4.1.2 Observing Strategy . 47 4.1.3 Multiband Lomb-Scargle Period Measurement . 49 4.2 The Quality of Results Produced by the Workflow . 50 4.2.1 Instantiation of the Approach . 50 4.3 Investigating Versions of the Workflow . 52 4.3.1 Evaluating the Completeness . 52 4.3.2 Evaluation of the Approach . 53 4.4 Extrapolation to the Underlying Milky Way Population . 55 4.5 Discussion . 57 4.6 Conclusions . 61 5 Using the Provenance from Astronomical Workflows to Improve their Timeliness 63 5.1 Astronomy Application . 64 5.1.1 The Image Processing Pipeline . 64 5.1.2 Use Cases . 66 5.2 Provenance in Astronomy Simulations . 67 5.3 Instantiation of the Approach . 68 5.4 Evaluation . 70 5.4.1 Use Case 1 - Variation Investigation . 70 5.4.2 Use Case 2 - Calibration Propagation . 71 CONTENTS vii 5.5 Discussion . 72 5.6 Conclusions . 74 6 Finding Transients with Kepler 75 6.1 The Requirements of the Workflow . 76 6.1.1 Difference Image Analysis . 76 6.1.2 Source Detection and Aperture Photometry . 77 6.1.3 Refining the Objects . 79 6.1.4 Matching to Astronomical Databases . 80 6.2 The Quality of Results Produced by the Workflow . 80 6.2.1 Instantiation of the Approach . 81 6.3 Investigating Versions of the Workflow . 82 6.3.1 Evaluating the Completeness . 84 6.3.2 Evaluating the Accuracy . 86 6.3.3 Evaluation of the Approach . 86 6.3.4 Brute Force . 87 6.3.5 Hill Climbing . 93 6.4 Finding Transients Using the Workflow . 97 6.4.1 Increase in Magnitude Sub-Sample . 100 6.4.2 Signal to Noise Sub-Sample . ..