Data Mining by Grid Computing in the Search for Extrasolar Planets
Total Page:16
File Type:pdf, Size:1020Kb
Technological University Dublin ARROW@TU Dublin Doctoral Theses&Dissertations 2017 Data Mining by Grid Computing in the Search for Extrasolar Planets Oisin Creaner [Thesis] Dublin Institute for Advanced Studies, [email protected] Follow this and additional works at: https://arrow.tudublin.ie/ittthedoc Part of the Databases and Information Systems Commons, External Galaxies Commons, Instrumentation Commons, Numerical Analysis and Scientific Computing Commons, Other Astrophysics and Astronomy Commons, Other Computer Sciences Commons, Software Engineering Commons, Stars, Interstellar Medium and the Galaxy Commons, and the Theory and Algorithms Commons Recommended Citation Creaner, O. (2017) Data Mining by Grid Computing in the Search for Extrasolar Planets, Doctoral Thesis, Institute of Technology, Tallaght, Dublin. DOI:10.21427/7w45-6018 This Theses, Ph.D is brought to you for free and open access by the Theses&Dissertations at ARROW@TU Dublin. It has been accepted for inclusion in Doctoral by an authorized administrator of ARROW@TU Dublin. For more information, please contact [email protected], [email protected]. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 License Data Mining by Grid Computing in the Search for Extrasolar Planets Oisín Oliver Creaner B.A. (mod) Being a Thesis presented for the award of Doctor of Philosophy Dr. Eugene Hickey CPhys, MInstP, DEA, PhD Kevin Nolan BSc MSc, Dr. Niall Smith Ph.D School of Science and Computing Institute of Technology, Tallaght, Dublin Old Belgard Road, Tallaght, Dublin 24 Submitted to Quality and Qualifications Ireland (QQI) July 2016 i i. Declaration Statement I hereby certify that the material, which I now submit for assessment on the programmes of study leading to the award of PhD, is entirely my own work and has not been taken from the work of others except to the extent that such work has been cited and acknowledged within the text of my own work. No portion of the work contained in this Thesis has been submitted in support of an application for another degree or qualification to this or any other institution. __________________________________ ________________ Signature of Candidate Date I hereby certify that all the unreferenced work described in this Thesis and submitted for the award of PhD, is entirely the work of Oisín Creaner. No portion of the work contained in this Thesis has been submitted in support of an application for another degree or qualification to this or any other institution. __________________________________ ________________ Signature of Supervisor Date __________________________________ ________________ Signature of Supervisor Date __________________________________ ________________ Signature of Supervisor Date ii ii. Acknowledgements I would like to take this opportunity to thank everyone who has helped me through these last few years, and without whom this Thesis would never have made it to publication. First, I would like to thank my supervisors, Dr. Eugene Hickey and Kevin Nolan – Eugene, your support and encouragement throughout have kept me going and Kevin, without your constant drive towards perfection and robust work has made this project what it is. I would like to thank everyone at the Institute of Technology, Tallaght, especially John Behan, Martin McCarrick and Tadhg O Briain, for the opportunity to complete this project, in particular through the provision of the PhD continuation fund. Second, I thank the entire team down in Blackrock Castle Observatory, most of all Dr. Niall Smith, my co-supervisor, for all the work which you have assisted us with throughout the years. Thanks are also due to the team at Grid Ireland in Trinity College Dublin, especially John Smith, Stephen Childs, John Ryan, David O’Callaghan and Gabriele Pierantoni for their help in bringing an astrophysicist into the world of High Performance Computing. This project made extensive use of data from SDSS. Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/. I thank my family for being there for me throughout it all, my parents Ann and Aidan, who first encouraged me to look to the stars, my brothers Dónal and Rúairí who kept me sane throughout the long years of work, my grandmother Miriam who provided me with a refuge away from the city in which to work, my nephew and many aunts, uncles, cousins and friends who have been there for me when needed. Finally, I dedicate this work to two people who were there for me from the start but who weren’t able to see me finish this work. To my granddad Pat and my nanny Peg, I hope that you’re proud of me. iii iii. Abbreviations List Abbreviation Meaning ΔC Difference in colour ΔCmax Maximum permitted difference in colour Δmmax Maximum permitted difference in magnitude ∆m Difference in magnitude AD Anno Domini ADASS Astronomical Data Analysis Software and Systems API Applications Programming Interface ASCII American Standard Code for Information Interchange AT&T American Telephone & Telegraph B Johnson B magnitude Bash Bourne-Again SHell BC Before Christ BCO Blackrock Castle Observatory CAS Catalogue Access Server CCD Charge Coupled Device cf. conferre (compare with) char Character CIT Cork Institute of Technology C-K Comparison-Check Photometry Cl Colour with respect to the next band of longer wavelength CLI Command Line Interpreter/Interface Cs Colour with respect to the next band of shorter wavelength CSV /.csv Comma Separated Value CTI Catalogue Information File DAS Data Access Server DE Development Environment Dec Declination double Double Precision Floating Point Number DR Data Release e.g. exempli gratia (for example) EC2 Elastic Cloud Compute EGI European Grid Infrastructure etc. et cetera (and so on) FITS / .fits /.fit Flexible Image Transfer System FORTRAN FORmula TRANslation (programming language) iv FoV Field of view (telescope specific) g SDSS g band magnitude GB GigaByte (10 9 or 2 30 Bytes) GMS Grid Management System GUI Graphical User Interface GUID Globally Unique IDentifier HDD Hard Disk Drive HDU Header Data Unit HPC High Performance Computing i SDSS i band magnitude IDE Integrated Development Environment (IDE) int Integer ITTD Institute of Technology, Tallaght, Dublin JDL Job Description Language JSS Job Submission System kB kiloByte (10 3 or 2 10 Bytes) ksh Korn SHell LAN Local Area Network LFC Logical File Catalogue lfc-cp gLite command. Analogous to cp lfc-cr gLite command. Copy and Register. lfc-ls gLite command. Analogous to ls LFN Logical File Name long Long form integer ls Unix "list" tool. Lists the contents of a directory LSST Large Synoptic Sky Survey max maximum MB MegaByte (10 6 or 2 20 Bytes) min minimum mn magnitude in a given band (n) mn+1 magnitude in the next band of longer wavelength than a given band (n) mn-1 magnitude in the next band of shorter wavelength than a given band (n) MPI Message Passing Interface MS MicroSoft N Number: usually an integer which refers to a counted quantity PC Personal Computer PoI Point of Interception PPR/.ppr Pipeline Parameter File v PRM/.prm API Parameter File PSF Point-Spread Function QE Quantum Efficiency r SDSS r band magnitude R Rating RA Right Ascension RAM Random Access Memory Rl Rating with respect to the next band of longer wavelength Rs Rating with respect to the next band of shorter wavelength S Size (of FoV) S3 Simple Storage Service SCG Scientific Computing Group SDSS Sloan Digital Sky Survey sh bourne SHell SNR Signal-to-Noise Ratio SQL Structured Query Language SSH Secure SHell SURL Storage Universal Resource Locator TB TeraByte (10 12 or 2 40 Bytes) TCD Trinity College, Dublin tsObj SDSS calibrated object lists TTV Transit Time Variation TXT/.txt Text file u SDSS u band magnitude U Johnson U magnitude UFS Unix File System USNO United States Naval Observatory V Johnson V magnitude VLDB Very Large Data Base VM Virtual Machine VO Virtual Organisation WAN Wide Area Network wc Unix word count tool WN Worker Node XLDB eXtremely Large Data Base z SDSS z band magnitude vi iv. Abstract A system is presented here to provide improved precision in ensemble differential photometry. This is achieved by using the power of grid computing to analyse astronomical catalogues. This produces new catalogues of optimised pointings for each star, which maximise the number and quality of reference stars available. Astronomical phenomena such as exoplanet transits and small-scale structure within quasars may be observed by means of millimagnitude photometric variability on the timescale of minutes to hours. Because of atmospheric distortion, ground-based observations of these phenomena require the use of differential photometry whereby the target is compared with one or more reference stars. CCD cameras enable the use of many reference stars in an ensemble. The more closely the reference stars in this ensemble resemble the target, the greater the precision of the photometry that can be achieved. The Locus Algorithm has been developed to identify the optimum pointing for a target and provide that pointing with a score relating to the degree of similarity between target and the reference stars. It does so by identifying potential points of aim for a particular telescope such that a given target and a varying set of references were included in a field of view centred on those pointings. A score is calculated for each such pointing. For each target, the pointing with the highest score is designated the optimum pointing. The application of this system to the Sloan Digital Sky Survey (SDSS) catalogue demanded the use of a High Performance Computing (HPC) solution through Grid Ireland.