Classification of Stars from Redshifted Stellar Spectra Utilizing Machine Learning
Total Page:16
File Type:pdf, Size:1020Kb
Central Washington University ScholarWorks@CWU All Master's Theses Master's Theses Spring 2019 Classification of Stars from Redshifted Stellar Spectra utilizing Machine Learning Michael J. Brice Central Washington University, [email protected] Follow this and additional works at: https://digitalcommons.cwu.edu/etd Part of the Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons, and the Stars, Interstellar Medium and the Galaxy Commons Recommended Citation Brice, Michael J., "Classification of Stars from Redshifted Stellar Spectra utilizing Machine Learning" (2019). All Master's Theses. 1207. https://digitalcommons.cwu.edu/etd/1207 This Thesis is brought to you for free and open access by the Master's Theses at ScholarWorks@CWU. It has been accepted for inclusion in All Master's Theses by an authorized administrator of ScholarWorks@CWU. For more information, please contact [email protected]. CLASSIFICATION OF STARS FROM REDSHIFTED STELLAR SPECTRA UTILIZING MACHINE LEARNING __________________________________ A Thesis Presented to The Graduate Faculty Central Washington University ___________________________________ In Partial Fulfillment of the Requirements for the Degree Master of Science Computational Science ___________________________________ by Michael James Brice June 2019 CENTRAL WASHINGTON UNIVERSITY Graduate Studies We hereby approve the thesis of Michael James Brice Candidate for the degree of Master of Science APPROVED FOR THE GRADUATE FACULTY ______________ _________________________________________ Dr. Răzvan Andonie, Committee Chair ______________ _________________________________________ Dr. Szilárd VAJDA ______________ _________________________________________ Dr. Boris Kovalerchuk ______________ _________________________________________ Dean of Graduate Studies ii ABSTRACT CLASSIFICATION OF STARS FROM REDSHIFTED STELLAR SPECTRA UTILIZING MACHINE LEARNING by Michael James Brice June 2019 The classification of stellar spectra is a fundamental task in stellar astrophysics. There have been many explorations into the automated classification of stellar spectra but few that involve the Sloan Digital Sky Survey (SDSS). Stellar spectra from the SDSS are applied to standard classification methods such as K Nearest Neighbors, Random Forest, and Support Vector Machine to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using standard Feature Selection methods such as Chi-Squared and Fisher score and with domain-specific astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify the stellar spectra into the two standard star classification schemes, the Harvard Spectral Classification and the Morgan Keenan Luminosity Classes. If a star is classified into both of these schemes, many stellar properties can be approximated with ease, whereas the direct approaches can take up to months of observations. A physical phenomenon known as redshift causes machine learning complications through the feature matrix when classifying stellar spectra. However, classifiers utilizing redshifted stellar spectra performed with high accuracy. An approach for extracting redshift using predictions from the classification models is explored. iii ACKNOWLEDGEMENTS I would like to thank Dr. Razvan˘ Andonie and Dr. Szilard´ VAJDA for helping me secure an account for the CWU High-Performance Machine (Turing) and for assisting me with its operations. I would like to thank the Sloan Digital Sky Survey for providing me with their database and Benjamin Alan Weaver for providing me with assistance regarding the Sloan Digital Sky Survey. Dr. Boris Kovalerchuk and Dr. Szilard´ VAJDA who provided feedback and advice. I would like to thank my colleagues, friends, and family for listening to me talk about my research and for providing me with feedback. I am extremely grateful and thankful that Dr. Razvan˘ Andonie accepted to be my thesis adviser and allowed me to work on a thesis that is related to astronomy and for providing advice and ideas to achieve my goals. iv AUTHOR COMMENTS This thesis utilizes hyperlinks. If reading in PDF form, every reference to a Chapter, Section, Subsection, Equation, Table, Figure, or Reference is clickable and will automatically transition to the appropriate page. v TABLE OF CONTENTS Chapter Page I INTRODUCTION . .1 II PREVIOUS RESULTS ON STELLAR SPECTRA CLASSIFICATION . .6 III ASTRONOMY BACKGROUND . 12 Stellar Spectra . 12 Stellar Classification Types . 14 Importance of Stellar Classes . 15 Redshift . 16 IV MACHINE LEARNING BACKGROUND . 19 Classifier Methods . 19 Feature Selection Methods . 25 V CLASSIFICATION INTO THE HARVARD SPECTRAL CLASSIFICATION SCHEME . 28 Approach to Classification . 28 Experiments . 36 Discussion . 38 VI SINGLE CLASSIFICATION INTO BOTH HARVARD AND MK CLASSIFICATION SCHEMES . 43 Approach to Classification . 43 Experiments . 56 Discussion . 57 vi TABLE OF CONTENTS (CONTINUED) Chapter Page VII AN ALTERNATIVE METHOD FOR REDSHIFT EXTRACTION FOR FUTURE WORK . 63 Approach to Redshift Extraction and Results . 63 Discussion . 66 VIII CONCLUSIONS . 68 REFERENCES CITED . 70 vii LIST OF TABLES Table Page 1 Example of the feature matrix . 32 2 Example of the feature matrix that has redshifted data . 34 3 10-Fold cross validation results for Chi-Squared feature selection with undersampling for redshifted spectra and artificial rest spectra. 39 4 10-Fold cross validation results for Chi-Squared feature selection with hybrid sampling for redshifted spectra and artificial rest spectra using Random Forest. 40 5 10-Fold Cross Validation Results for Fisher feature selection with undersampling for redshifted spectra and artificial rest spectra. 41 6 10-Fold cross validation results for Fisher feature selection with hybrid sampling for redshifted spectra and artificial rest spectra using Random Forest. 41 7 Precision, Recall, and F1 Score for Fisher feature selection with Hybrid for Random Forest. 42 8 Execution time for hybrid sampling for random forest with artificial rest spectra. 42 9 Example of the feature matrix using two sets of wavelengths around two absorption lines for a total of 34 features . 55 10 10-Fold cross validation results for KNN. 58 11 10-Fold cross validation Precision, Recall, and F1 Score for KNN using Oversampling. 59 12 10-Fold cross validation Execution Times for KNN using Oversampling. 59 13 10-Fold cross validation results for RF. 59 14 10-Fold cross validation Precision, Recall, and F1 Score for RF using Oversampling. 60 15 10-Fold cross validation Execution Times for RF using Oversampling. 60 16 Sample of redshift extraction results for A type stars. 67 viii LIST OF FIGURES Figure Page 1 Example: Stellar Spectrum of a flux scaled G2 star collected by the SDSS. 13 2 Example: Stellar Spectrum of a flux scaled M5 star collected by the SDSS. 13 3 Hertzsprung Russell Diagram. 15 4 Simplified example of redshift. 18 5 Example of redshift. 18 6 Example of a Decision Tree. 21 7 Example of building a Decision Tree. 22 8 Example of linearly separating two classes. 24 9 Example of Support Vector Machine. 24 10 Distribution of classes in the dataset for ChapterV................ 29 11 Example of fitting a spectrum’s wavelengths to the template wavelengths: Right Shifting. 31 12 Example of fitting a spectrum’s wavelengths to the template wavelengths: Left Shifting. 32 13 Example of a spectrum’s wavelengths and flux arrays after fitting to a template. 32 14 10-Fold cross validation results for Chi-Squared feature selection with undersampling for redshifted spectra and artificial rest spectra. 39 15 10-Fold cross validation results for Chi-Squared feature selection with hybrid sampling for redshifted spectra and artificial rest spectra spectra. 40 16 10-Fold cross validation results for Fisher feature selection with undersampling for redshifted spectra and artificial rest spectra. 40 17 10-Fold cross validation results for Fisher feature selection with hybrid sampling for redshifted spectra and artificial rest spectra. 41 18 10-Fold cross validation results for Fisher feature selection for random forest. 42 ix LIST OF FIGURES (CONTINUED) Figure Page 19 Distribution of classes in the dataset for ChapterVI................ 44 20 Example of a B Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 47 21 Example of an A Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 48 22 Example of a K Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 49 23 Example of a M Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 50 24 Example of a F Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 51 25 Example of a G Type Star focusing on wavelengths near Hδ and Ca I absorption lines. 52 26 Example of the same Harvard class with different wavelength width (Full Width Half Max) for the same absorption line for different MK classes. 53 27 Example of how Redshift is preserved. 53 28 10-Fold cross validation results for K Nearest Neighbors using Undersampling, Hybrid, and Oversampling. 58 29 Precision, Recall, and F1 Score for Oversampling with K Nearest Neighbors . 58 30 10-Fold cross validation results for Random Forest using Undersampling, Hybrid, and Oversampling. 59 31 Precision, Recall, and F1 Score for Oversampling with Random Forest . 60 32 Sample of continuum normalized spectrums from O - G type stars. The arrows point to the Hδ absorption line. 62 33 Redshift Extraction. 65 x CHAPTER I INTRODUCTION