Probabilistic Photometric Redshifts in the Era of Petascale Astronomy
Total Page:16
File Type:pdf, Size:1020Kb
c 2014 by Matias Carrasco Kind. All rights reserved. PROBABILISTIC PHOTOMETRIC REDSHIFTS IN THE ERA OF PETASCALE ASTRONOMY BY MATIAS CARRASCO KIND DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Astronomy in the Graduate College of the University of Illinois at Urbana-Champaign, 2014 Urbana, Illinois Doctoral Committee: Associate Professor Robert J. Brunner, Chair Associate Professor Athol J. Kemball Associate Professor Paul M. Ricker Professor Jon J. Thaler Abstract With the growth of large photometric surveys, accurately estimating photometric redshifts, preferably as a probability density function (PDF), and fully understanding the implicit systematic uncertainties in this process has become increasingly important. These surveys are expected to obtain images of billions of distinct galaxies. As a result, storing and analyzing all of these photometric redshift PDFs will be non-trivial, and this challenge becomes even more severe if a survey plans to compute and store multiple different PDFs. In this thesis, we have developed an end-to-end framework that will compute accurate and robust photometric redshift PDFs for massive data sets by using two new, state-of-the-art machine learning techniques that are based on a random forest and a random atlas, respectively. By using data from several photometric surveys, we demonstrate the applicability of these new techniques, and we demonstrate that our new approach is among the best techniques currently available. We also show how different techniques can be combined by using novel Bayesian techniques to improve the photometric redshift precision to unprecedented levels while also presenting new approaches to better identify outliers. In addition, our framework provides supplementary information regarding the data being analyzed, including unbiased estimates of the accuracy of the technique without resorting to a validation data set, identification of poor photometric redshift areas within the parameter space occupied by the spectroscopic training data, and a quantification of the relative importance of the variables used during the estimation process. Furthermore, we present a new approach to represent and store photometric redshift PDFs by using a sparse representation with outstanding compression and reconstruction capabilities. We also demonstrate how this framework can also be directly incorporated into cosmological analyses. The new techniques presented in this thesis are crucial to enable the development of precision cosmology in the era of petascale astronomical surveys. ii To the love of my life, my wife, the mother of our wonderful kids and my life partner Andrea. iii Preface This thesis is constructed as a compilation of articles in compliance with the rules for PhD thesis submission from the graduate research school at the University of Illinois at Urbana Champaign The refereed publications arisen from this thesis are: Carrasco Kind, M., & Brunner, R. J., 2013, “TPZ: photometric redshift PDFs and ancillary information by using • prediction trees and random forests”, MNRAS, 432, 1483 (Chapters3 and9) Carrasco Kind, M., & Brunner, R. J., 2014, “SOMz: photometric redshift PDFs with self-organizing maps and random • atlas”, MNRAS, 438, 3409 (Chapter4) Carrasco Kind, M., & Brunner, R. J., 2014, “Sparse representation of photometric redshift probability density • functions: preparing for petascale astronomy”, MNRAS, 441, 3550 (Chapters8 and9) Carrasco Kind, M., & Brunner, R. J., 2014, “Exhausting the Information: Novel Bayesian Combination of Photometric • Redshift PDFs”, MNRAS, 442, 3380 (Chapters6 and7) Sanch´ez, C., Carrasco Kind, M., Lin, H., Miquel, R., et al. 2014, “Photometric redshift analysis in the Dark Energy • Survey Science Verification data”, MNRAS accepted., arXiv: 1406.4407 (Chapter9) Banerji, M., Jouvel, S., Lin, H., McMahon, R.G., Lahav, O., Castander, F., Abdalla, F., Bertin, E., Bosman, S., • Carnero, A., Carrasco Kind, M., et al. 2014, “Combining Dark Energy Survey Science Verification Data with Near Infrared Data from the ESO VISTA Hemisphere Survey”, MNRAS submitted., arXiv: 1407.3801 (Chapter9) Other non-refereed publications related to this thesis: Carrasco Kind, M. & Brunner, R. J. 2013, “Implementing Probabilistic Photometric Redshifts”, Astronomical Data • Analysis Software and Systems XXII (ADASSXXII), ASPC, 475, 69C Newman, J., Abate, A., Abdalla, F., Allam, S., Allen, S., Ansari, R., Bailey, S., Barkhouse, W., Beers, T., Blanton, • M., Brodwin, M., Brownstein, J., Brunner, B., Carrasco Kind, M., Cervantes-Cota, J., Chisari, E., et al. Snowmass 2013 white paper, “Spectroscopic Needs for Imaging Dark Energy Experiments:Photometric Redshift Training and Calibration”, arXiv: 1309.5384 Abate, A., et al. LSST-DESC white paper, 2012, “LSST Dark Energy Science Collaboration”, arXiv: 1211.0310 • iv Acknowledgments Personal acknowledgments This thesis would certainly not have come to its completion without the help, support and trust of colleagues, friends and family. First, I would like to sincerely thank my advisor Robert J. Brunner for his help, guidance, advice, support and freedom during this work, for believing in me even before starting the thesis and for our extended brainstormings and interesting conversations about research, science, computing, family life and obviously soccer and for making this thesis a enjoyable experience. I am also grateful to the members of the thesis committee for their interest, comments and for taking the time to evaluate my work. I also want to thank my friends and colleagues in the Astronomy department who contributed to create and maintain a friendly and stimulating working environment during these four years. To Gary for his friend- ship, his help, our conversations and for suffering with me through several classes during our coursework. To Rukmani for her friendship, support and for her coffee companion during these years and for making CosmoCoffee a real thing. To Manuel and David for our lunches and lively Spanish discussions and friendship. To Jessica, Dom, Nachi, Andy, Michael, Margaret, Ian, Ashley, Ricky, and many others for being very friendly, for their understanding that I wish I’d had more time to spend with them. Special thanks Yiran and Edward for their support, discussions and for making our office a very enjoyable and friendly place. I would also like to thank many other people in this department for their support, advice and help: Bryan Dunne, Mary Margaret, Sandie, Rebecca, Kevin, Paul Ricker, Athol Kemball, Tony Wong, You-Hua Chu, Brian Fields. Special thanks to Judy and in particular to Jeri for their substantial help and for making this department a home-like place to work. I also want to acknowledge and thank my parents for their support from such a long distance, for doing the best in providing me with the best education and tools that helped to become the person I am today. Without their hard work, love, and support, none of this would have been possible. To my sisters for their love, support and for always being there for us. I would also like to sincerely thank my parents-in-law and sister-in-law who constantly traveled and helped us when we needed them. Their support and company v during the born and care of our three kids is unimaginable and without their help I couldn’t have gotten through this thesis. Last but definitely not least I would like to express my infinite gratitude to my wife Andrea, to whom this work is dedicated. Without her love, company, support, help, kindness, and advice I would not have been able to do this thesis. During these four years there was not a moment in which she wasn’t supporting and encouraging me, especially on the toughest and stressful of times she was able to remain calm and to provide love and peace to overcome them. All of this while being responsible for the care of our three happy and active little monsters. For her admirable dedication, love and unconditional support I will be eternally grateful. Technical acknowledgments I gratefully acknowledge the use of the parallel computing resource provided by the Computational Science and Engineering Program at the University of Illinois. The CSE computing resource, provided as part of the Taub cluster, is devoted to high performance computing in engineering and science. This work also used resources from the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575. I also acknowledge the support from the National Science Foundation Grant No. AST-1313415. During two years of this thesis I have also been supported by the Computational Science and Engineering (CSE) fellowship at the University of Illinois at Urbana-Champaign. Funding for the DEEP2 Galaxy Redshift Survey has been provided by NSF grants AST-95-09298, AST- 0071048, AST-0507428, and AST-0507483 as well as NASA LTSA grant NNG04GC89G. This work is also based on observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/IRFU, at the Canada-France-Hawaii Telescope (CFHT) which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l’Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. This research used the facilities of the Canadian Astronomy Data Centre operated by the National Research Council of Canada with the support of the Canadian Space Agency. CFHTLenS data processing was made possible thanks