A CPU-GPU Framework for Astronomical Data Reduction and Analysis
UNIVERSIDAD DE CHILE
FACULTAD DE CIENCIAS FÍSICAS Y MATEMÁTICAS
DEPARTAMENTO DE CIENCIAS DE LA COMPUTACIÓN

FADRA: A CPU-GPU FRAMEWORK FOR ASTRONOMICAL DATA REDUCTION AND ANALYSIS

Thesis submitted for the degree of Master of Science in Computer Science

FRANCISCA ANDREA CONCHA RAMÍREZ

Advisor: MARÍA CECILIA RIVARA ZÚÑIGA
Co-advisor: PATRICIO ROJO RUBKE
Committee members: ALEXANDRE BERGEL, JOHAN FABRY, GONZALO ACUÑA LEIVA

This work was partially funded by FONDECYT Project 1120299.

SANTIAGO DE CHILE, 2016

Summary

This thesis lays the foundations of FADRA: Framework for Astronomical Data Reduction and Analysis. The FADRA framework was designed to be efficient, simple to use, modular, expandable, and open source. Today astronomy is inseparable from computing, but some of the most widely used software was developed three decades ago and is not designed to face the current big data paradigms. The world of astronomical software must evolve not only towards practices that understand and adopt the big data era, but also towards practices focused on the collaborative work of the community.

The work carried out consisted of the design and implementation of the basic algorithms for astronomical data analysis, starting the development of the framework. This included the implementation of data structures that are efficient when working with a large number of images, the implementation of algorithms for the calibration or reduction of astronomical images, and the design and development of algorithms for computing photometry and obtaining light curves. Both the reduction and the light curve algorithms were implemented in CPU and GPU versions. For the GPU implementations, the algorithms were designed to minimize the amount of data to be processed in order to reduce data transfer between CPU and GPU, a slow process that often eclipses the gains in execution time that parallelization can provide. Although FADRA was designed with the idea of using its algorithms within scripts, a wrapper module for interacting through graphical interfaces was also implemented.

One of the main goals of this thesis was the validation of the results obtained with FADRA. To this end, the reduction results and the light curves were compared with results from AstroPy, a Python package with various utilities for astronomers. The experiments were performed on six datasets of real astronomical images. For astronomical image reduction, the Normalized Root Mean Squared Error (NRMSE) was used as the similarity metric between images. For light curves, the shapes of the curves were shown to be equal by determining constant offsets between the numerical values of each of the points belonging to the different curves.

In terms of the validity of the results, both the reduction and the light curve obtention, in their CPU and GPU implementations, generated correct results when compared with those of AstroPy, which means that the developments and approximations designed for FADRA give results that can be used safely for the scientific analysis of astronomical images. In terms of execution times, the data-intensive nature of the reduction process makes the GPU version even slower than the CPU version. However, for light curve obtention, the GPU algorithm shows a significant decrease in execution time compared to its CPU counterpart.
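The two validation checks described above (an NRMSE between reduced images, and a constant offset between light curves) can be illustrated with a minimal sketch. It is not FADRA's code: the function names `nrmse` and `constant_offset`, the normalization by the reference image's value range, and the tolerance `atol` are assumptions made for illustration, since the text here does not state which NRMSE normalization or offset test was used. The sketch uses only the Python standard library and treats an image as a list of pixel rows.

```python
import math
from itertools import chain


def nrmse(reference, test):
    """RMSE between two images, divided by the value range of the reference.

    Range normalization is an assumed convention; other definitions
    divide by the mean or the standard deviation instead.
    """
    ref = list(chain.from_iterable(reference))  # flatten rows into pixels
    tst = list(chain.from_iterable(test))
    if not ref or len(ref) != len(tst):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    value_range = max(ref) - min(ref)
    if value_range == 0:
        raise ValueError("constant reference image: range is zero")
    return math.sqrt(mse) / value_range


def constant_offset(curve_a, curve_b, atol=1e-9):
    """Return the offset if curve_b - curve_a is constant within atol, else None."""
    if not curve_a or len(curve_a) != len(curve_b):
        raise ValueError("curves must be non-empty and of equal length")
    diffs = [b - a for a, b in zip(curve_a, curve_b)]
    offset = sum(diffs) / len(diffs)  # mean point-wise difference
    if all(abs(d - offset) <= atol for d in diffs):
        return offset
    return None
```

An NRMSE of zero means the two images are identical pixel by pixel, and a non-`None` result from `constant_offset` means the two curves have the same shape and differ only by a vertical shift. For real data, `atol` would need to be chosen according to the measurement noise.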
Abstract

This thesis lays the foundations for FADRA: Framework for Astronomical Data Reduction and Analysis. The FADRA framework is designed to be efficient, easy to use, modular, expandable, and open source. Nowadays, astronomy is inseparable from computer science, but some of the software still widely used today was developed three decades ago and is not up to date with current data paradigms. The world of astronomical software development must start evolving not only towards practices that comprehend and embrace the big data era, but also towards practices that lead to collaborative work in the community.

The work carried out in this thesis consisted of the design and implementation of basic algorithms for astronomical data analysis, as the starting point of the FADRA framework. This encompassed the implementation of data structures that are efficient when working with a large number of astronomical images, the implementation of algorithms for astronomical data calibration or reduction, and the design and development of automated photometry and light curve obtention algorithms. Both the reduction and the light curve obtention algorithms were implemented in CPU and GPU versions. For the GPU implementations, the algorithms were designed to minimize the amount of data to be processed, as a means to reduce the data transfer between CPU and GPU, a slow process which in many cases can even overshadow the gains in execution time obtained through parallelization. Even though the main idea is for the FADRA algorithms to be run within scripts, a wrapper module to run Graphical User Interfaces (GUIs) for the code was also implemented.

One of the most important steps of this thesis was validating the correctness of the results obtained with the FADRA algorithms. For this, the results from the reduction and the light curve obtention processes were compared against results obtained using AstroPy, a Python package with different utilities for astronomers. The experiments were carried out over six datasets of real astronomical images. For astronomical data reduction, the Normalized Root Mean Squared Error (NRMSE) was calculated between the images to measure their similarity. For light curves, the shapes of the curves were shown to be equal by finding constant offsets between the numerical values of each data point belonging to a curve.

In terms of correctness, both the reduction and light curve obtention algorithms, in their CPU and GPU implementations, proved to be correct when compared to AstroPy's results, meaning that the implementations and approximations designed for the FADRA framework provide results that can be confidently used in the scientific analysis of astronomical images. Regarding execution times, the data-intensive nature of the reduction algorithm makes the GPU implementation even slower than the CPU implementation. However, for light curve obtention, the GPU algorithm presents a significant speedup compared to its CPU counterpart.

Acknowledgements

First I would like to thank my family for always supporting me and helping me follow my dreams. This work and everything else I've accomplished so far would not have been possible without their love and encouragement. I would also like to thank Fernando Caro for his support and company, not only through the development of this thesis but in life.

I would like to thank my friends for being the best company I could ask for, and for putting up with my long disappearances because "I have to work on my thesis tonight", many nights. Thank you for being so patient and for always cheering for me and supporting me.

I also want to thank Professor Patricio Rojo for all these many years of friendly work and advice. This thesis would not have happened if it wasn't for him and his insistence on making better astronomical software.
I would also like to thank Professor María Cecilia Rivara for her great support through my years as a student and all through this thesis, which I probably wouldn't have finished already if it wasn't for her relevant advice and comments. Both of my advisors were a fundamental part of my student years and of this work and I would not have made it this far if it wasn't for them.

Finally I would like to express my thanks to the members of the revision committee, Professors Alexandre Bergel, Johan Fabry, and Gonzalo Acuña, for their careful reviews of my thesis and for their relevant comments to improve it. Last but definitely not least I want to thank Ren Cerro for kindly taking the time to proof-read this text.

Contents

List of Tables
List of Figures

1 Introduction
  1.1 Astronomical data analysis
  1.2 Astronomical software development
  1.3 Thesis description
    1.3.1 Goals and objectives
    1.3.2 Research questions
    1.3.3 Software architecture
    1.3.4 Use of previous work
    1.3.5 Programming languages
    1.3.6 Validation of results

2 Literature revision
  2.1 Existing software
  2.2 Criticism of existing solutions

3 Astronomical data and analysis
  3.1 Astronomical data
    3.1.1 Astronomical images
    3.1.2 Astronomical spectra
  3.2 Astronomical image acquisition
  3.3 Astronomical image reduction
  3.4 Astronomical image processing
    3.4.1 Image arithmetic and combining
    3.4.2 Filter application
    3.4.3 Photometry
    3.4.4 Light curve or time series generation

4 Introduction to General-Purpose Graphics Processing Unit (GPGPU) computing
  4.1 What is the Graphics Processing Unit (GPU)?
  4.2 General-Purpose GPU computing (GPGPU)
  4.3 GPGPU use in astronomy
    4.3.1 GPGPU use for astronomical data analysis in this thesis

5 Software design and implementation
  5.1 Data handling: AstroFile and AstroDir classes
  5.2 Calibration image combination and obtention of Master fields