A Cpu-Gpu Framework for Astronomical Data Reduction and Analysis

UNIVERSIDAD DE CHILE FACULTAD DE CIENCIAS FÍSICAS Y MATEMATICAS´ DEPARTAMENTO DE CIENCIAS DE LA COMPUTACION´ FADRA: A CPU-GPU FRAMEWORK FOR ASTRONOMICAL DATA REDUCTION AND ANALYSIS TESIS PARA OPTAR AL GRADO DE MAGÍSTER EN CIENCIAS, MENCION´ COMPUTACION´ FRANCISCA ANDREA CONCHA RAMÍREZ PROFESOR GUÍA: MARÍA CECILIA RIVARA ZUŃIGA~ PROFESOR CO-GUÍA: PATRICIO ROJO RUBKE MIEMBROS DE LA COMISION:´ ALEXANDRE BERGEL JOHAN FABRY GONZALO ACUNA~ LEIVA Este trabajo ha sido parcialmente financiado por Proyecto FONDECYT 1120299 SANTIAGO DE CHILE 2016 Resumen Esta tesis establece las bases de FADRA: Framework for Astronomical Data Reduction and Analysis. El framework FADRA fue dise~nadopara ser eficiente, simple de usar, modular, expandible, y open source. Hoy en d´ıa,la astronom´ıaes inseparable de la computación,pero algunos de los software másusados en la actualidad fueron desarrollados tres décadasatrásy no estándise~nadospara enfrentar los actuales paradigmas de big data. El mundo del software astronómicodebe evolucionar no solo hacia prácticasque comprendan y adopten la era del big data, sino tambiénque esténenfocadas en el trabajo colaborativo de la comunidad. El trabajo desarollado consistióen el dise~no e implementaciónde los algoritmos básicos para el análisisde datos astronómicos, dando inicio al desarrollo del framework. Esto con- sideróla implementaciónde estructuras de datos eficientes al trabajar con un gran número de imágenes,la implementaciónde algoritmos para el proceso de calibracióno reducciónde imágenesastronómicas,y el dise~noy desarrollo de algoritmos para el cálculode fotometr´ıay la obtenciónde curvas de luz. Tanto los algoritmos de reduccióncomo de obtenciónde curvas de luz fueron implementados en versiones CPU y GPU. Para las implementaciones en GPU, se dise~naronalgoritmos que minimizan la cantidad de datos a ser procesados de manera de reducir la transferencia de datos entre CPU y GPU, proceso lento que muchas veces eclipsa las ganancias en tiempo de ejecuciónque se pueden obtener gracias a la paralelización. A pesar de que FADRA fue dise~nadocon la idea de utilizar sus algoritmos dentro de scripts, un módulo wrapper para interactuar a travésde interfaces gráficastambiénfue implementado. Una de las principales metas de esta tesis consistióen la validaciónde los resultados obtenidos con FADRA. Para esto, resultados de la reduccióny curvas de luz fueron comparados con resultados de AstroPy, paquete de Python con distintas utilidades para astrónomos. Los experimentos se realizaron sobre seis datasets de imágenesastronómicasreales. En el caso de reducciónde imágenesastronómicas,el Normalized Root Mean Squared Error (NRMSE) fue utilizado como métricade similaridad entre las imágenes. Para las curvas de luz, se probó que las formas de las curvas eran iguales a travésde la determinaciónde offsets constantes entre los valores numéricosde cada uno de los puntos pertenecientes a las distintas curvas. En términosde la validez de los resultados, tanto la reduccióncomo la obtenciónde curvas de luz, en sus implementaciones CPU y GPU, generaron resultados correctos al ser comparados con los de AstroPy, lo que significa que los desarrollos y aproximaciones dise~nados para FADRA otorgan resultados que pueden ser utilizados con seguridad para el análisis cient´ıficode imágenesastronómicas. En términosde tiempos de ejecución,la naturaleza intensiva en uso de datos propia del proceso de reducciónhace que la versiónGPU sea incluso máslenta que la versiónCPU. Sin embargo, en el caso de la obtenciónde curvas de luz, el algoritmo GPU presenta una disminuciónimportante en tiempo de ejecucióncomparado con su contraparte en CPU. i Abstract This thesis sets the bases for FADRA: Framework for Astronomical Data Reduction and Analysis. The FADRA framework is designed to be efficient and easy to use, modular, expandable, and open source. Nowadays, astronomy is inseparable from computer science, but some of the software still widely used today was developed three decades ago and is not up to date with the current data paradigms. The world of astronomical software development must start evolving not only towards practices that comprehend and embrace the big data era, but also that lead to collaborative work in the community. The work carried out in this thesis consisted in the design and implementation of basic algorithms for astronomical data analysis, to set the beginning of the FADRA framework. This encompassed the implementation of data structures that are efficient when working with a large number of astronomical images, the implementation of algorithms for astronomical data calibration or reduction, and the design and development of automated photometry and light curve obtention algorithms. Both the reduction and the light curve obtention algorithms were implemented on CPU and GPU versions. For the GPU implementations, the algorithms were designed considering the minimization of the amount of data to be processed, as a means to reduce the data transfer between CPU and GPU, a slow process which in many cases can even overshadow the gains in execution time obtatined through parallelization. Even though the main idea is for the FADRA algorithms to be run within scripts, a wrapper module to run Graphical User Interfaces (GUIs) for the code was also implemented. One of the most important steps of this thesis was validating the correctness of the results obtained with FADRA algorithms. For this, the results from the reduction and the light curve obtention processes were compared against results obtained using AstroPy, a Python package with different utilities for astronomers. The experiments were carried out over six datasets of real astronomical images. For the case of astronomical data reduction, the Normalized Root Mean Squared Error (NRMSE) was calculated between the images to measure their similarity. In the case of light curves, the shapes of the curves were proved to be equal by finding constant offsets between the numerical values for each data point belonging to a curve. In terms of correctness of results, both the reduction and light curve obtention algorithms, in their CPU and GPU implementations, proved to be correct when compared to AstroPy's results, meaning that the implementations and approximations designed for the FADRA framework provide correct results that can be confidently used in scientific analysis of astronomical images. Regarding execution times, the intensive data aspect of the reduction algorithm makes the GPU implementation even slower than the CPU implementation. How- ever, for the case of light curve obtention, the GPU algorithm presents an important speedup compared to its CPU counterpart. ii Acknowledgements First I would like to thank my family for always supporting me and helping me follow my dreams. This work and everything else I've accomplished so far would not have been possible without their love and encouragement. I would also like to thank Fernando Caro for his support and company, not only through the development of this thesis but in life. I would like to thank my friends for being the best company I could ask for, and for putting up with my long disappearances because \I have to work on my thesis tonight", many nights. Thank you for being so patient and for always cheering for me and supporting me. I also want to thank Professor Patricio Rojo for all these many years of friendly work and advice. This thesis would not have happened if it wasn't for him and his insistence on making better astronomical software. I would also like to thank Professor Maria Cecilia Rivara for her great support through my years as a student and all through this thesis, which I probably wouldn't have finished already if it wasn't for her relevant advice and comments. Both of my advisors were a fundamental part of my student years and of this work and I would not have made it this far if it wasn't for them. Finally I would like to express my thanks to the members of the revision committee, Professors Alexandre Bergel, Johan Fabry, and Gonzalo Acu~na,for their careful reviews of my thesis and for their relevant comments to improve it. Last but definitely not least I want to thank Ren Cerro for kindly taking the time to proof-read this text. iii Contents List of Tables vi List of Figures vii 1 Introduction 1 1.1 Astronomical data analysis . .1 1.2 Astronomical software development . .2 1.3 Thesis description . .3 1.3.1 Goals and objectives . .3 1.3.2 Research questions . .4 1.3.3 Software architecture . .4 1.3.4 Use of previous work . .5 1.3.5 Programming languages . .5 1.3.6 Validation of results . .5 2 Literature revision 8 2.1 Existing software . .9 2.2 Criticism of existing solutions . 15 3 Astronomical data and analysis 17 3.1 Astronomical data . 17 3.1.1 Astronomical images . 17 3.1.2 Astronomical spectra . 18 3.2 Astronomical image acquisition . 19 3.3 Astronomical image reduction . 21 3.4 Astronomical image processing . 22 3.4.1 Image arithmetic and combining . 23 3.4.2 Filter application . 24 3.4.3 Photometry . 25 3.4.4 Light curve or time series generation . 29 4 Introduction to General-Purpose Graphics Processing Unit (GPGPU) computing 30 4.1 What is the Graphics Processing Unit (GPU)? . 30 4.2 General-Purpose GPU computing (GPGPU) . 33 4.3 GPGPU use in astronomy . 34 4.3.1 GPGPU use for astronomical data analysis in this thesis . 35 iv 5 Software design and implementation 36 5.1 Data handling: AstroFile and AstroDir classes . 36 5.2 Calibration image combination and obtention of Master fields .

A Cpu-Gpu Framework for Astronomical Data Reduction and Analysis

CFITSIO User's Reference Guide

The Starlink Build System SSN/78.1 —Abstract I

ORAC-DR: Overview and General Introduction 4.1-0 SUN/230.6 —Abstract Ii

ORAC-DR: a Generic Data Reduction Pipeline Infrastructure✩

The Significant Properties of Software: a Study

Book of Abstracts 2009 European Week of Astronomy and Space

SLALIB — Positional Astronomy Library Programmer's Manual

Arxiv:1410.7513V1 [Astro-Ph.IM] 28 Oct 2014

Starlinksoftware Collection

FITSIO User's Guide

Learning from 25 Years of the Extensible N-Dimensional Data Format

Paper Writing