Cluster Analysis and Unsupervised Machine Learning Applied to Find Open Star Clusters in the Milky Way
This dissertation is submitted for the degree of physicist
By Andrés Felipe Amar Lesmes
Universidad de los Andes, Faculty of Science, Physics Department
Advisor: Alejandro García, Ph.D.
Bogotá D.C., Colombia
November 2020

This project is dedicated to my family, who have been a big support along this track.
To my dearest Luna...

Acknowledgements

I would like to thank Dr. Alejandro García for his support and guidance throughout this project, for his patience, and for always pushing me forward in this task. I would also like to express my genuine thanks to my mother, grandmother, aunt and sister for encouraging me. Thanks to all my friends and colleagues who helped me study and understand many topics during the degree, with special thanks to Juanpi, Alejo2, Roosvelt, Coco, Jorge, Ávila, Gus and Daniel Calderón. Thanks to my school teacher Hermes Ortíz, who gave me a wonderful introduction to the beauty of physics. And finally, thanks to my grandad, who always accompanied me in difficult moments.

Abstract

The main aim of this project is to characterize open clusters of the Milky Way via unsupervised machine learning, using the clustering algorithm density-based spatial clustering of applications with noise (DBSCAN). The data were retrieved from Gaia DR2, so that the analysis rests on standard data that can be compared with the literature. Compared with the literature, the DBSCAN algorithm recovers the clusters with high efficiency for the open clusters Beehive, Pleiades, NGC 2451, Hyades, Blanco 1 and the Persei double cluster.
The structures were filtered precisely, and the colour-magnitude diagrams are similar to those previously described. A new parameter was proposed for the DBSCAN algorithm to optimize execution time, which also saves computing time for large volumes of data. The member counts obtained for each cluster are very similar to, and in some cases higher than, those found in the literature. To obtain a better modelling of the data, combinations of parameters must be calculated in detail and iteratively.

Key words: Astronomy, Milky Way, Open clusters, Machine learning, Clustering, DBSCAN.

Resumen (translated from Spanish)

The main objective of this project is to characterize open clusters in the Milky Way through unsupervised machine learning, using the clustering algorithm density-based spatial clustering of applications with noise (DBSCAN) and the Gaia DR2 database, in order to work with standard data that are used in the literature and allow later comparison. The results of the DBSCAN algorithm show very high efficiency relative to the literature for the Beehive, Pleiades, NGC 2451, Hyades and Blanco 1 clusters and the Persei double cluster. The structures were filtered precisely, and the analysis shows that the colour-magnitude diagrams are similar to those in the literature. A new parameter was proposed for the DBSCAN algorithm to optimize execution time, thereby saving computing time for large volumes of data. The members computed for each cluster are very similar to, and in some cases more numerous than, those found in the literature. To obtain a better modelling of the data, combinations of parameters must be calculated in detail and iteratively.

Key words: Astronomy, Milky Way, Open clusters, Machine learning, Clustering, DBSCAN.

Table of contents

1 Introduction
  1.1 Motivation
  1.2 GAIA astrometric mission
  1.3 What is an open cluster, why study them?
  1.4 Hertzsprung-Russell diagrams
    1.4.1 Beehive
    1.4.2 Pleiades
    1.4.3 Persei double cluster
    1.4.4 NGC 2451
    1.4.5 Hyades
    1.4.6 Blanco 1
2 GAIA mission
3 Fundamental Concepts of Machine Learning
  3.1 What is machine learning?
    3.1.1 Quick overview of unsupervised machine learning
  3.2 Mathematical aspects of unsupervised machine learning
    3.2.1 Partitioning Methods
    3.2.2 Hierarchical Methods
    3.2.3 Density-based spatial clustering of applications with noise (DBSCAN)
4 Results and analysis
  4.1 DBSCAN applied to test open clusters
  4.2 Pleiades
5 Conclusions and future work
Bibliography
Appendix A
  A.0.1 Beehive
  A.0.2 Persei double cluster
  A.0.3 NGC 2451
  A.0.4 Hyades
  A.0.5 Blanco 1
Appendix B GAIA SEARCH CODE
Appendix C RUWE CODE
Appendix D Main Code

Chapter 1 Introduction

1.1 Motivation

The evolution of the Universe, and of our galaxy the Milky Way in particular, has long been one of the most intriguing and difficult problems to understand. Historically this was due to the lack of theoretical resources and of instruments able to measure variables such as luminosity, speed, size and mass. The problem has inspired scientists to develop formal and technical ways of getting closer to this understanding, and physics changed the paradigm by opening new branches of study. Before diving into the main topic, the branches of physics concerned with the study of the Universe can be classified as follows.

The word astronomy was a general term that described the science of the planets, moons, sun, and stars, and all other heavenly bodies. In other words, astronomy meant the study of anything beyond Earth. Although still an applicable term, modern astronomy, like most other sciences, has been divided and subdivided into many specialties.
Disciplines that study the planets include planetary geology and planetary atmospheres. The study of the particles and fields in space is divided into magnetospheric physics, ionospheric physics, and cosmic and heliospheric physics. The Sun has its solar physics discipline. The origin and evolution of the Universe is the subject of cosmology [1].

To study galaxy formation, it is essential to realize that galaxies contain clusters: high-density groupings of space objects whose masses, mutual gravitational attraction and other characteristics are correlated in ways that reinforce the grouping. Nowadays, the data acquired by telescopes can be brought into a computer, which gives us the chance to work locally on data from outer space. In this project, we apply unsupervised machine learning, specifically clustering, to GAIA data in order to locate open star clusters in the Milky Way, using parameters that are discussed throughout this thesis. We also verify the efficiency of the implemented clustering algorithms on well-known test clusters: Beehive (M44), Pleiades (M45), NGC 2451, Hyades, Blanco 1 and the Persei double cluster.

1.2 GAIA astrometric mission

GAIA is a European astrometric space mission launched on 19 December 2013. Its main objective was to measure the positions and proper motions of stars with a high precision of 20 µas, together with photometric measurements that provide multicolour and multi-epoch observations of the stars.

Figure 1.1: GAIA astrometric satellite spacecraft.¹
The spacecraft is 4.3 m in diameter and 2.3 m in height; its shape can be appreciated in the artistic representation in Figure 1.1.

The spacecraft was launched by Arianespace, using a Soyuz ST-B rocket² with a Fregat-MT upper stage³. After launch, the rocket's upper stage carried the satellite towards an orbit around the Lagrange point L2, about 1.5 million kilometres from Earth. An object placed at this point keeps a fixed position relative to the two massive bodies; for this mission the main purpose was to keep the satellite lined up with the Sun and the Earth, as shown in Figure 1.3.

Figure 1.2: Lagrange points; this diagram shows the optimal orbit.⁴

¹ Image taken from https://www.cosmos.esa.int/web/gaia/mediagallery/images/ig_spacecraft
² Soyuz-ST-B is a three-stage carrier rocket for placing payloads into low Earth orbit. https://space.skyrocket.de/doc_lau_det/soyuzstb_fregatmt.htm
³ Fregat-MT is an autonomous and flexible upper stage that is designed to operate as an orbital vehicle. https://space.skyrocket.de/doc_stage/fregat.htm
⁴ Image taken from https://www.esa.int/Science_Exploration/Space_Science/Herschel/L2_the_second_Lagrangian_Point

Consider a three-body problem (Figure 1.2) in which the centre of mass is at $x = 0$, mass $M_1$ is located at $x = -r_1$, and mass $M_2$ is at $x = r_2$. At L2 the gravitational pull of both masses balances the centripetal acceleration of the co-rotating orbit:

$$\frac{G M_1}{(x + r_1)^2} + \frac{G M_2}{(x - r_2)^2} = \frac{G (M_1 + M_2)}{(r_1 + r_2)^3}\, x. \tag{1.1}$$

Solving for $x$ to leading order in $M_2 / M_1$, we get

$$x = r_2 \left( 1 + \sqrt[3]{\frac{M_2}{3 M_1}} \right). \tag{1.2}$$

In equation (1.2), $M_2$ is the mass of the body orbiting the central mass $M_1$, and $r_2$ is approximately the distance between the two massive bodies, because for $M_2 \ll M_1$ the centre of mass lies very close to $M_1$. The L2 region provides not only the gravitational environment required for a smooth orbit but also uninterrupted sunlight for the satellite's solar panels, and it helps the satellite remain in thermal equilibrium.
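Equations (1.1) and (1.2) can be checked numerically for the Sun-Earth system. The following is a minimal sketch, not code from this thesis: the function names `l2_position_exact` and `l2_position_approx` are hypothetical, and the solar mass, Earth mass and Sun-Earth separation are rounded textbook values assumed here for illustration. Equation (1.1) is solved by bisection and compared with the closed-form approximation (1.2).

```python
# Assumed, rounded constants (not taken from the thesis).
M_SUN = 1.989e30      # kg
M_EARTH = 5.972e24    # kg
SEPARATION = 1.496e11 # m, mean Sun-Earth distance


def l2_position_exact(m1, m2, separation, iterations=200):
    """Solve equation (1.1) for x by bisection (x measured from the centre of mass)."""
    r1 = separation * m2 / (m1 + m2)  # distance of M1 from the centre of mass
    r2 = separation * m1 / (m1 + m2)  # distance of M2 from the centre of mass

    def balance(x):
        # Left-hand side minus right-hand side of (1.1); G cancels out.
        return (m1 / (x + r1) ** 2
                + m2 / (x - r2) ** 2
                - (m1 + m2) * x / separation ** 3)

    # balance() decreases monotonically for x > r2: it is positive just
    # outside M2 and negative far away, so a single root is bracketed.
    lo, hi = r2 * (1 + 1e-6), 2 * r2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), r2


def l2_position_approx(m1, m2, r2):
    """Closed-form approximation of equation (1.2)."""
    return r2 * (1 + (m2 / (3 * m1)) ** (1 / 3))


x_exact, r2 = l2_position_exact(M_SUN, M_EARTH, SEPARATION)
x_approx = l2_position_approx(M_SUN, M_EARTH, r2)

# Distance of L2 beyond the Earth, in kilometres.
d_exact_km = (x_exact - r2) / 1e3
d_approx_km = (x_approx - r2) / 1e3
print(f"L2 distance from Earth, equation (1.1): {d_exact_km:.3e} km")
print(f"L2 distance from Earth, equation (1.2): {d_approx_km:.3e} km")
```

Under these assumed constants, both values should come out close to the 1.5 million kilometres quoted above, with the approximation (1.2) differing from the numerical solution of (1.1) by well under one percent.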
GAIA produces the largest and most homogeneous catalogue of stellar properties, combining integrated spectrophotometry with parallaxes in order to calculate the luminosity of stars.

1.3 What is an open cluster, why study them?

An open cluster is a group of up to a few thousand stars that were born from the same giant molecular cloud.