
An Automated Pipeline for Variability Detection and Classification for the Small Telescopes Installed at the Liverpool Telescope

Paul Ross McWhirter

A thesis submitted in partial fulfilment of the requirements of Liverpool John Moores University for the degree of Doctor of Philosophy.

July 2018

Abstract

The Small Telescopes at the Liverpool Telescope (STILT) is an almost decade old project to install a number of wide field optical instruments to the Liverpool Telescope, named Skycams, to monitor weather conditions and yield useful photometry on bright astronomical sources. The motivation behind this thesis is the development of algorithms and techniques which can automatically exploit the data generated during the first 1200 days of Skycam operation to catalogue variable sources in the La Palma sky. A previously developed pipeline reduces the Skycam images and produces photometric time-series data, named light curves, of millions of objects. 590,492 of these objects have 100 or more data points of sufficient quality to attempt a variability analysis. The large volume and relatively high noise of this data necessitated the use of Machine Learning and sophisticated optimisation techniques to successfully extract this information.

The Skycam instruments have no control over the orientation and pointing of the Liverpool Telescope and therefore resample areas of the sky highly irregularly. The term used for this resampling in astronomy is cadence. The unusually irregular Skycam cadence places increased strain on the algorithms designed for the detection of periodicity in light curves. This thesis details the development of a period estimation method based on a novel implementation of a genetic algorithm combined with a generational clustering method. Named GRAPE (Genetic Routine for Astronomical Period Estimation), this algorithm deconstructs the space of possible periods for a light curve into regions in which the genetic population clusters. These regions are then fine-tuned using a k-means clustering algorithm to return a set of independent period candidates which are then analysed using a Vuong closeness test to discriminate between aliased and true periods. This thesis demonstrates the capability of GRAPE on a set of synthetic light curves built using traditional regular cadence sampling and Skycam style cadence for four different shapes of periodic light curve. The performance of GRAPE on these light curves

is compared to a more traditional periodogram which returns a set of peaks and is then analysed using Vuong closeness tests. GRAPE obtains similar performance compared to the periodogram on all the light curve shapes but with less computational complexity, allowing for more efficient light curve analysis.

Automated classification of variable light curves has been explored over the last decade. Multiple features have been engineered to identify patterns in the light curves of different classes of variable star. Within the last few years, deep learning has come to prominence as a method of automatically generating informative representations of the data for the solution of a desired problem, such as a classification task. A set of models using Random Forests, Support Vector Machines and Neural Networks were trained using a set of variable Skycam light curves of five classes. Using 16 features engineered from previous methods, an Area under the Curve (AUC) of 0.8495 was obtained. Replacing these features with inputs from the pixel intensities from a 100 by 20 pixel image representation produced an AUC of 0.6348, which improved to 0.7952 when provided with additional context to the dimensionality of the image. Despite the inferior performance, the importance of the different pixels in the trained models demonstrated that they had produced features based on well-understood patterns in the different classes of light curve.

Using features produced by Richards et al. and Kim & Bailer-Jones et al., a set of features to train machine learning classification models was constructed. In addition to this set of features, a semi-supervised set of novel features was designed to describe the shape of light curves phased around the GRAPE candidate period. This thesis investigates the performance of the PolyFit algorithm of Prsa et al., a technique to fit four piecewise polynomials with discontinuous knots capable of connecting across the phase boundary at phases of zero and one. This method was designed to fit eclipsing binary phased light curves, however it was also described as fully capable on other types. The optimisation method used by PolyFit is replaced by a novel genetic algorithm optimisation routine to fit the model to Skycam data with substantial improvement in performance. The PolyFit model is applied to the candidate period and twice this period for every classified light curve. This interpolation produces novel features which describe similar statistics to the previously developed methods but which appear significantly more resilient to the Skycam noise and are often preferred by the trained models. In addition, Principal Component Analysis (PCA) is used to investigate

a set of 6897 variable light curves and discover that the first ten principal components are sufficient to describe 95% of the variance of the fitted models. This trained PCA model is retained and used to generate twenty novel shape features. Whilst these features are not dominant in their importance to the learned models, they have above average importance and help distinguish some objects in the light curve classification task. The second principal component in particular is an important feature in the discrimination of short period pulsating and eclipsing variables as it appears to be an automatically learned robust skewness measure.

The method described in this thesis produces 112 features of the Skycam light curves, 38 variability indices which are quickly obtainable and 74 which require the computation of a candidate period using GRAPE. A number of machine learning classifiers are investigated to produce high-performance models for the detection and classification of variable light curves from the Skycam dataset. A Random Forest classifier uses a training set of 859 light curves of 12 object classes to produce a classifier with a multi-class F1 score of 0.533. It would be computationally infeasible to produce all the features for every Skycam light curve, therefore an automated pipeline has been developed which combines a Skycam trend removal pipeline, GRAPE and our machine learned classifiers. It initialises with a set of Skycam light curves from objects cross-matched from the American Association of Variable Star Observers (AAVSO) Variable Star Index (VSI), one of the most comprehensive catalogues of variable stars available. The learned models classify the full 112 features generated for these cross-matched light curves and confident matches are selected to produce a training set for a binary variability detection model. This model utilises only the 38 variability indices to identify variable light curves rapidly without the use of GRAPE. This variability model, trained using a random forest classifier, obtains an F1 score of 0.702. Applying this model to the 590,492 Skycam light curves yields 103,790 variable candidates of which 51,129 candidates have been classified and are available for further analysis.

Publications

In the course of completing the work presented in this thesis, the content of Chapter 3 has been accepted for publication in a refereed journal:

McWhirter, P. R., Steele, I. A., Hussain, A., Al-Jumeily, D. & Vellasco, M. M. B. R., 2018, ‘GRAPE: Genetic Routine for Astronomical Period Estimation’, Monthly Notices of the Royal Astronomical Society.

In the course of completing the work presented in this thesis, the contents of Chapters 3 & 5 have been submitted for publication in refereed journals:

McWhirter, P. R., Hussain, A., Al-Jumeily, D., Fergus, P., Steele, I. A. & Vellasco, M. M. B. R., 2018, ‘Classifying Periodic Astrophysical Phenomena from non-survey optimized variable-cadence observational data’, Expert Systems with Applications.

In the course of completing the work presented in this thesis, the contents of Chapter 5 (Fourier Decomposition section) & Chapter 6, section 1 have been published in conference proceedings:

McWhirter, P. R., Wright, S., Steele, I. A., Al-Jumeily, D., Hussain, A. & Fergus, P., 2016, ‘A dynamic, modular intelligent-agent framework for astronomical light curve analysis and classification’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9771, 820–831.

McWhirter, P. R., Steele, I. A., Al-Jumeily, D., Hussain, A. & Vellasco, M. M. B. R., 2017, ‘The classification of periodic light curves from non-survey optimized observational data through automated extraction of phase-based visual features’, Proceedings of the International Joint Conference on Neural Networks (2017) 2017-May, 3058–3065.

In the course of completing the work presented in this thesis, the contents of Chapter 6, section 2 are in preparation for publication in refereed journals:

McWhirter, P. R., Steele, I. A., Hussain, A., Al-Jumeily, D. & Vellasco, M. M. B. R., 2018, ‘Representation Learning on light curves using the PolyFit algorithm’.

Paul Ross McWhirter
October 2018

Acknowledgements

I would like to thank Dr. Paul Fergus for starting me on the road to this PhD by securing my funding. This research would have been impossible without this contribution and whilst different supervisors were ultimately selected, the occasional advice was very much appreciated. I would also like to thank my Computer Science supervisors, Prof. Dhiya Al-Jumeily and Prof. Abir Hussain for managing the administration of this research and securing funding for an exchange visit to Rio de Janeiro in 2016 which substantially boosted my personal knowledge and the direction of this study. Further to this, I would like to thank Prof Marley Vellasco from the Pontifical Catholic University in Rio de Janeiro for being so welcoming and helpful in the development of the Genetic Algorithms in this work.

Very importantly I would like to thank my Astronomy supervisor Prof. Iain Steele. Without his efforts this project would have never started or progressed as efficiently as it has and the supportive environment provided in the Liverpool Telescope group has kept me focused and productive over these last few years. Additionally, I would like to thank the community at the Astrophysics Research Institute as a whole for being so welcoming to a researcher from another department and helping me fulfil my lifelong plan of producing a PhD thesis in Astronomy. I would also like to thank my colleagues in the Applied Computing Research Group who, together, have braved the PhD life with me.

More personally, I would like to thank my mother and sister, Hazel and Ellen, for their support and love through the highs and lows of the last few years. They have had to put up with some impressive rants and musings so I am extremely thankful for the support they have given even though my mum has not been a statistician for many years! I also thank my friend and flatmate Luke and my long-time friend Dane back in

Northern Ireland. Finally, I would like to thank and dedicate this thesis to my father Paul McWhirter. He passed away years before he could see my PhD dream fulfilled but I know he would be very proud that I got there in the end.

Additional thanks go to the Royal Academy of Engineering Newton Fund which supported the collaborative project with Brazil and OPTICON (Optical Infrared Coordination Network for Astronomy) for funding the NEON Observing School 2017 which I attended in La Palma in September 2017. The software produced for this thesis was developed using the language R in the integrated development environment RStudio with mathematical extensions developed by Intel and I thank all those who maintain this open source software. Finally, thanks go again to my mother who supplied funding to assist my initial stay in Liverpool for my Masters Programme and extra money for equipment to assist in the data reduction contained within this thesis.

viii “ – it reaches out it reaches out it reaches out it reaches out – One hundred and thirteen times a second, nothing answers and it reaches out. It is not conscious, though parts of it are. There are structures within it that were once separate organisms; aboriginal, evolved, and complex. It is designed to improvise, to use what is there and then move on. Good enough is good enough, and so the artifacts are ignored or adapted. The conscious parts try to make sense of the reaching out. Try to interpret it.”

– James S. A. Corey, Cibola Burn

Contents

Abstract

Publications

Acknowledgements

List of Figures

List of Tables

Abbreviations

1 Introduction
  1.1 Time Domain Astronomy
  1.2 Photometry
  1.3 Small Telescopes Installed at the Liverpool Telescope
    1.3.1 The Skycam Instruments
    1.3.2 STILT data reduction pipeline
  1.4 Variable Stars
    1.4.1 Pulsating Stars
    1.4.2 Variable Binaries
    1.4.3 Rotational Variability
    1.4.4 Eruptive Variability
  1.5 Aims, Motivations and Research Objectives
  1.6 Thesis Structure

2 Period Estimation
  2.1 Frequency Domain Methods
    2.1.1 Lomb-Scargle Periodogram
    2.1.2 Generalised Lomb-Scargle Periodogram
    2.1.3 Variance Ratio Periodogram
  2.2 Time Domain Methods
    2.2.1 Autocorrelation Function
    2.2.2 Correntropy Kernelised Periodogram
  2.3 Folding Methods
    2.3.1 String Length
    2.3.2 Analysis of Variance
    2.3.3 Entropy Minimisation
  2.4 Sampling Frequency
  2.5 Spurious period detection
  2.6 The bands method
  2.7 Correlated noise fitting routine
  2.8 Failure Modes
  2.9 Statistical Significance
    2.9.1 False Alarm Probability
    2.9.2 Vuong-Closeness Test
  2.10 Performance on STILT data
    2.10.1 Bands Method
    2.10.2 Lomb-Scargle Periodogram
    2.10.3 Variance Ratio Periodogram
    2.10.4 String Length Lafler Kinman Periodogram
    2.10.5 Conditional Entropy Periodogram
    2.10.6 Periodogram Runtime
    2.10.7 Conclusion

3 Genetic Routine for Astronomical Period Estimation
  3.1 The frequency spectrum
  3.2 Genetic Algorithms
  3.3 Bayesian Generalised Lomb-Scargle Periodogram
  3.4 The GRAPE method
    3.4.1 Initialisation method
    3.4.2 Evolutionary Optimisation
    3.4.3 Candidate period determination
  3.5 Experimental Design
  3.6 GRAPE Evaluation
    3.6.1 Period estimation performance
    3.6.2 Performance vs Period
    3.6.3 Regular vs STILT cadence
    3.6.4 Runtime considerations
  3.7 Discussion and Conclusion

4 Machine Learning
  4.1 Principal Component Analysis
  4.2 Random Forest classifiers
  4.3 Support Vector Machines
  4.4 Artificial Neural Networks
  4.5 Naive Bayes
  4.6 Performance Evaluation

5 Feature Extraction
  5.1 Variability Detection
    5.1.1 Variability Indices
    5.1.2 Quick Lomb-Scargle Indices
  5.2 Variable Star Classification
    5.2.1 Fourier Decomposition
    5.2.2 Richards et al. features
    5.2.3 Kim & Bailer-Jones et al. features
  5.3 Training Classification Models
    5.3.1 Feature Selection
    5.3.2 Random Forest classification models

6 Representation Learning Features
  6.1 Automated Extraction of Visual Representation Features
    6.1.1 Epoch-Folded Image Representation method
    6.1.2 Results: Kim et al. Features
    6.1.3 Results: Image Representation Features
    6.1.4 Conclusion
  6.2 PolyFit Feature Representation
    6.2.1 PolyFit Algorithm
    6.2.2 Genetic Optimisation for PolyFit
    6.2.3 PolyFit Principal Component Analysis
    6.2.4 Interpolated Statistical Features
    6.2.5 Training on PolyFit Features

7 Automated Classification Pipeline
  7.1 Trend Removal
    7.1.1 Trend Filtering Algorithm
    7.1.2 Light Curve interpolation
    7.1.3 Trend Removal performance
  7.2 σ–k clipping
  7.3 Colour Imputation
  7.4 Variable Classification Model
    7.4.1 Classifier Selection
    7.4.2 Aliased Object Classifier
    7.4.3 Feature Importance
    7.4.4 Probability Calibration
  7.5 Variability Detection Model
    7.5.1 Hyperparameter Tuning
    7.5.2 Variable Detection Performance
    7.5.3 Feature Importance
  7.6 Random Forest based Anomaly Detection
  7.7 Skycam bright variable source catalogue

8 Conclusions and Future Work
  8.1 Conclusions
  8.2 Future Work

Bibliography

List of Figures

1.1 Cropped IO:O SDSS-R filter image from the Liverpool Telescope of the variable star OGLE-BLAP-009 highlighted in the red crosshairs.
1.2 Annotated SkycamT image captured concurrently with figure 1.1.
1.3 Plot of the Right Ascension and Declination of every SkycamT light source with 100 or more data points.
1.4 The light curve of the star U Aquilae, a pulsating Classical Cepheid star. The y axis shows the line of sight from the Earth.
1.5 Phased SkycamT light curve of the Mira variable Y Librae with a period of 276.58 days.
1.6 Phased SkycamT light curve of the Semi-regular variable Y Serpentis with a period of 425.11 days.
1.7 Phased SkycamT light curve of the Classical Cepheid AP Sagitarii with a period of 5.06 days.
1.8 Phased SkycamT light curve of the Type II Cepheid with a period of 17.29 days.
1.9 Phased SkycamT light curve of the RV Tauri variable AC Herculis with a period of 75.58 days.
1.10 Phased SkycamT light curve of the RR Lyrae type ab variable RS Bootis with a period of 0.377 days.
1.11 Phased SkycamT light curve of the RR Lyrae type c variable RU Piscium with a period of 0.390 days.
1.12 Phased SkycamT light curve of the δ Scuti variable δ Scuti with a period of 4.65 hours.
1.13 Phased OGLE light curve of OGLE-BLAP-009 with a period of 31.94 mins produced by the author from openly available OGLE data.
1.14 The light curve of the light source HT Virginis, an eclipsing binary system. The y-axis shows the line of sight from the Earth.
1.15 Phased SkycamT light curve of the β Persei eclipsing binary DI Pegasi with a period of 0.712 days.
1.16 Phased SkycamT light curve of the β Lyrae eclipsing binary ES Librae with a period of 0.883 days.
1.17 Phased SkycamT light curve of the W Ursae Majoris eclipsing binary AM Leonis with a period of 0.366 days.
1.18 Phased SkycamT light curve of the ellipsoidal binary NSVS 2600150 with a period of 46.73 days.
1.19 Phased SkycamT light curve of the α2 Canum Venaticorum variable AR Capricorni with a period of 1.623 days.
1.20 Phased SkycamT light curve of the BY Draconis variable V0584 Andromedae with a period of 13.71 days.
1.21 Raw 2009–2012 SkycamT light curve of Z Andromedae.
2.1 The Lomb-Scargle periodogram of the star RR Lyrae.
2.2 The raw SkycamT light curve of the eclipsing binary variable CN Andromedae.
2.3 Plots of the folded SkycamT light curve of the eclipsing binary variable CN Andromedae.
2.4 Entropy periodogram for the variable star OGLE-BLAP-009.
2.5 Conditional Entropy periodogram for the variable star OGLE-BLAP-009.
2.6 Plot demonstrating the gaps between data points for the variable star RR Lyrae.
2.7 The periodogram of the time instants of the SkycamT light curve of the star RR Lyrae.
2.8 The raw SkycamT light curve of the star RR Lyrae.
2.9 The power spectra across the three strongest of ten percentile bands for the star RR Lyrae from the SkycamT light curve data.
2.10 The SLLK period estimation of the SkycamT light curve of the variable star RR Lyrae.
2.11 The SLLK period estimation of the SkycamT light curve of the variable star RR Lyrae with a continuum corrected for red noise.
2.12 The true period against Generalised Lomb Scargle estimated period for 4000 synthetic light curves with SkycamT cadence.
2.13 Contour plot of the hit rate performance of the bands method on the 859 SkycamT light curves as a function of the two input arguments N_t and N_b.
2.14 Plot of the mean runtime of the four periodograms as a function of the number of data points in the light curves binned into 10 bins.
2.15 Plot of the mean runtime of the 16 bands method evaluations as a function of the number of data points in the light curves binned into 10 bins.
3.1 A simulated sinusoidal signal with white noise sampled with more traditional regular cadence and the Skycam highly variable cadence.
3.2 A demonstration of the creation of the initial population of candidate frequency chromosomes.
3.3 A demonstration of the primary evolutionary methods utilised by GRAPE.
3.4 Plots of the exploration of the period space against generation number for a simulated sawtooth light curve.
3.5 Phased simulated light curves of a sinusoidal, sawtooth, symmetric eclipsing binary and eccentric eclipsing binary.
3.6 Performance vs Tolerance for the regular cadence sinusoidal light curves.
3.7 Performance vs Tolerance for the regular cadence sawtooth light curves.
3.8 Performance vs Tolerance for the regular cadence symmetric eclipsing binaries.
3.9 Performance vs Tolerance for the regular cadence eccentric eclipsing binaries.
3.10 Log Period vs Log Period plots for the regular cadence sinusoidal light curves.
3.11 Log Period vs Log Period plots for the Skycam cadence sinusoidal light curves.
3.12 Log Period vs Log Period plots for the regular cadence sawtooth light curves.
3.13 Log Period vs Log Period plots for the Skycam cadence sawtooth light curves.
3.14 Log Period vs Log Period plots for the regular cadence symmetric eclipsing binaries.
3.15 Log Period vs Log Period plots for the Skycam cadence symmetric eclipsing binaries.
3.16 Log Period vs Log Period plots for the regular cadence eccentric eclipsing binaries.
3.17 Log Period vs Log Period plots for the Skycam cadence eccentric eclipsing binaries.
3.18 Plot of the base-10 logarithm of the period span of light curves with a given period for the regular cadence light curves.
3.19 Plot of the base-10 logarithm of the period span of light curves with a given period for the Skycam cadence light curves.
3.20 Plot of the average runtime of the different shaped light curves as a function of the number of data points for the regular cadence light curves.
3.21 Plot of the average runtime of the different shaped light curves as a function of the number of data points for the Skycam cadence light curves.
4.1 A flowchart of a traditional feature engineering machine learning pipeline.
4.2 A flowchart of a deep learning pipeline.
4.3 An example decision tree extracted from a Random Forest model trained on a subset of Skycam light curve data.
4.4 A two dimensional feature space containing four vectors of two different classes and their associated margins.
4.5 A single hidden layer Artificial Feedforward Neural Network.
5.1 Plot of the two Quick-LSP features.
5.2 The light curve of the star Mira with a regularised harmonic fit with a primary period of 332.57 days determined by the GRAPE method.
5.3 The 16 selected features from the Mean Decrease Gini method removed iteratively using the backwards selection random forest algorithm.
5.4 The 9 selected features utilised in a cross-validation grid-search tuning of the random forest hyper-parameters.
5.5 One-vs-many ROC curves for the 61.74% AUC model.
6.1 Plot of each of the 16 features against the five super-classes in the order shown in table 6.1.
6.2 STILT dataset folded light curves of the stars Mira, Algol and Eta Aquilae.
6.3 STILT dataset folded light curves of the stars Mira, Algol and Eta Aquilae transformed into 28 × 28 pixel images.
6.4 Demonstration of the transformation of the 100 × 20 pixel phased light curve images.
6.5 ROC curve for the 16 feature trained random forest model.
6.6 ROC curve for the 0.6386 AUC image representation model.
6.7 Heatmaps of the Mean Decrease Gini feature importance as a function of pixel position for three of the main super-classes.
6.8 The PolyFit algorithm applied to the Skycam light curve of the Eclipsing Binary RS Sagitarii.
6.9 The PolyFit, spline and Fourier models fitted to a Skycam eclipsing binary light curve.
6.10 The same Skycam light curve with two different PolyFit optimisations.
6.11 Histograms of the measured standard deviation of the 4 PolyFit knots on the set of 859 SkycamT light curves from table 2.1.
6.12 Plot showing the variance described by each principal component determined from the 6889 SkycamT light curves.
6.13 Plot showing the reconstruction of the SkycamT light curve of the eclipsing binary RS Sagitarii.
6.14 Plot showing the reconstruction of U Aquilae and CN Andromedae using the learned PCA model.
6.15 Plot showing the reconstruction of S Pegasi and RS Bootis using the learned PCA model.
6.16 Plot showing the base-10 logarithm of the GRAPE-determined Period feature against the P.PCA2 feature.
6.17 Histogram showing the distribution of the Interpolated features compared to the variability index features for the 859 SkycamT light curves.
6.18 Histogram showing the distribution of the Interpolated features compared to the variability index features for the 859 SkycamT light curves.
6.19 Contour plot of the F1 score performance of the 5-fold cross-validation using a Random Forest classifier.
6.20 ROC curve of the model trained at the optimal hyperparameters.
6.21 Bar Plot of the Mean Decrease Gini of the top 20 features in the PolyFit feature Random Forest model.
7.1 Flowchart demonstrating the components of the SkycamT automated classification pipeline.
7.2 Histogram demonstrating the GRAPE estimated period for the set of SkycamT light curves shown in table 6.8.
7.3 Contour plot showing the amplitude of the sinusoidal yearly trend model as a function of Right Ascension and Declination.
7.4 Contour plot showing the phase of the sinusoidal yearly trend model as a function of Right Ascension and Declination.
7.5 Projection of the phase of the trend template model along the right ascension axis with the declination values aggregated using a median function.
7.6 Performance of the σ–k clipping algorithm for 9 values of k with a period estimation tolerance ε = 0.05 on the 240 subset light curves.
7.7 The 25 partitions used by the Colour Imputation missForest procedure on the La Palma sky.
7.8 Base-10 Log-Log scatterplot of the GRAPE estimated periods against the AAVSO catalogue periods for the 6721 cross-matched variable stars.
7.9 Correlation Plot of the variability indices computed from the 859 variable light curves.
7.10 Correlation Plot of the Fourier Decomposition features computed from the 859 variable light curves.
7.11 Correlation Plot of the interpolated and PCA PolyFit features computed from the 859 variable light curves.
7.12 Contour plots of the hyperparameter tuning of the Random Forest 5-fold cross validation with 500, 1000 and 1500 trees.
7.13 ROC curves of a Random Forest 4-fold cross validation with ntree = 500, mtry = 40 and nodesize = 75.
7.14 ROC curves of a Random Forest 4-fold cross validation on the aliased dataset with ntree = 1500, mtry = 30 and nodesize = 25.
7.15 Contour plots of the hyperparameter tuning of the aliased dataset on a Random Forest 5-fold cross validation with 500, 1000 and 1500 trees.
7.16 Bar plot of the mean decrease Gini of the top 30 features in the 859 period-matched light curve Random Forest model.
7.17 Bar plot of the mean decrease Gini of the top 30 features in the 1186 aliased period light curve Random Forest model.
7.18 Before and after calibration Classifier probabilities against the Posterior probabilities for the Random Forest models.
7.19 Contour plots of the hyperparameter tuning of the variability detection Random Forest 5-fold cross validation with 1000, 1500 and 2000 trees.
7.20 Box Plots of the F1 score performance of the 20 different non-variable training sets tested using 5-fold cross validations with 2 repeats.
7.21 ROC curves of a Random Forest 4-fold cross validation with ntree = 1500, mtry = 4 and nodesize = 25 on the 12th training dataset.
7.22 Box Plots of the F1 score performance of the 20 different non-variable training sets tested using 5-fold cross validations with 2 repeats.
7.23 ROC curves of a Random Forest 4-fold cross validation with ntree = 1500, mtry = 4 and nodesize = 25 on the 12th training dataset with the subset variable training set.
7.24 Bar plot of the mean decrease Gini of the top 20 features in the variability detection Random Forest model.
7.25 Bar plot of the mean decrease Gini of the top 20 features in the variability detection Random Forest model using the training subset of period estimated hit and multiple/submultiple variable light curves.
7.26 A set of SkycamT light curves corresponding to the minimum, median and maximum values of the 6 most important variability indices.
7.27 Histogram of the Anomaly scores of the 51,129 variable candidate light curves classified by the period-match variable star Random Forest classification model.
7.28 Histogram of the estimated period overabundance data structures near the known spurious periods for the 51,129 candidate light curves.
7.29 Confusion Matrix heatmap plot of the Gaia classifier result compared with the SkycamT classifier result for the 540 matched variable candidates.
7.30 Folded light curve of the candidate variable source [0940-0328275] classified as a W Ursae Majoris eclipsing binary by the SkycamT classification pipeline.
7.31 Folded light curve of the candidate variable source [Skycam 129.308 -16.3625] classified as a Classical Cepheid by the SkycamT classification pipeline.
7.32 Folded light curve of the candidate variable source [Skycam 285.316 +37.5594] unclassified by the SkycamT classification pipeline.
7.33 Folded light curve of the candidate variable source [0972-0706308] unclassified by the SkycamT classification pipeline.
7.34 Folded light curve of the Cepheid variable [0901-0406646] classified as a Classical Cepheid by the SkycamT classification pipeline.
7.35 Folded light curves of the Type II Cepheid variables [0936-0585159] and [0866-0257809] classified as Classical Cepheids by the SkycamT classification pipeline.
7.36 Folded light curve of the semiregular variable [Skycam 70.389 +40.1975] classified as a Classical Cepheid by the SkycamT classification pipeline.
7.37 Folded light curve of the candidate variable [0683-0807889].
7.38 Folded light curve of the candidate variable [0691-0647659] classified as a RV Tauri variable by the SkycamT classification pipeline.
7.39 Folded light curve of the candidate variable [1088-0133551] classified as a fundamental mode RR Lyrae variable by the SkycamT classification pipeline.
8.1 Folded light curve of the variable star [0634-0892788] folded at a GRAPE detected period of 10.05 days.

List of Tables

1.1 Data Summary for the SkycamT and SkycamZ photometric database (Mawson et al., 2013).
1.2 Summary of SkycamT variable star classes.
2.1 Data Summary for the 859 object, 12 class, SkycamT light curve dataset.
2.2 Period Estimation algorithm performance on the 859 SkycamT light curve dataset.
3.1 Period estimation results on regular cadence simulated light curves with a tolerance ε = 0.01.
3.2 Period estimation results on Skycam cadence simulated light curves with a tolerance ε = 0.01.
3.3 GRAPE period estimation results with a seed of 1 on regular cadence simulated light curves with a tolerance ε = 0.01.
3.4 GRAPE period estimation results with a seed of 2 on regular cadence simulated light curves with a tolerance ε = 0.01.
3.5 GRAPE period estimation results with a seed of 3 on regular cadence simulated light curves with a tolerance ε = 0.01.
3.6 GRAPE and BGLS periodogram period estimation results with three different light curve additive noise models on Skycam cadence simulated light curves with a tolerance ε = 0.01.
3.7 GRAPE and BGLS periodogram period estimation results on 100 stratified regular cadence sinusoidal light curves with various fine-tuning oversampling runs.
3.8 GRAPE and BGLS periodogram period estimation results on 100 stratified Skycam cadence sinusoidal light curves with various fine-tuning oversampling runs.
5.1 The class distribution of the STILT 2326 variable light curves.
5.2 The 16 selected features from the Mean Decrease Gini method.
5.3 The confusion matrix of the specific 61.74% AUC model on the associated validation set.
6.1 Data Summary for the 2519 object SkycamT light curve dataset.
6.2 Kim et al. (Kim and Bailer-Jones, 2016) 16 variability features.
6.3 AUC of the three algorithms on the 16 Kim et al. Features.
6.4 AUC of the three algorithms on the scaled 16 Kim et al. Features.
6.5 Mean decrease Gini coefficients for the 16 Kim et al. Features.
6.6 AUC of the three algorithms on the Visual Features.
6.7 AUC of the three algorithms on the Visual Features with Period and Amplitude.
6.8 The class distribution of the STILT 6897 variable light curves used for the PCA training.
6.9 The 38 features used in the classification of the 859 SkycamT light curves using the PolyFit algorithm.
7.1 Period estimation results on the 6897 light curves from the pre-trend removal trended dataset and the trend removed detrended dataset with a tolerance ε = 0.05.
7.2 Period estimation results on the 240 subset light curves using the σ–k clipping algorithm for 9 values of k with a period estimation tolerance ε = 0.05.
7.3 The class distribution of the 6721 SkycamT light curves cross-matched with the AAVSO Variable Star Index.
7.4 Relation between the GRAPE estimated period and the AAVSO Variable Star Index period for the 6721 cross-matched SkycamT light curves.
7.5 AUC Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves.
7.6 F1 Score Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves.
7.7 AUC Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves with rescaled features.
7.8 F1 Score Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves with rescaled features.
7.9 Predicted Class labels after probability cuts and calibration on the 51,129 SkycamT light curves selected as variability candidates by the detection model.
7.10 Number of light curves filtered from the 51,129 candidate light curves due to proximity to known spurious data structures.
7.11 Predicted Class labels after probability cuts and calibration on the 30,236 SkycamT light curves remaining as variability candidates after removal of potential spurious light curves.
7.12 Predicted Class labels after probability cuts and calibration on the 540 SkycamT light curves matched to Gaia DR2 variable sources.
7.13 Gaia classifier variable star types with the number of variables for each class in the 24,984 downloaded classified bright variable sources.

Abbreviations

AAVSO  American Association of Variable Star Observers
ACF  AutoCorrelation Function
AGB  Asymptotic Giant Branch
AGN  Active Galactic Nuclei
ANN  Artificial Neural Network
AoV  Analysis of Variance
ASAS  All Sky Automated Survey
AUC  Area Under the Curve
BGLS  Bayesian Generalised Lomb Scargle periodogram
BLAP  Blue Large Amplitude Pulsator
BKR  Blum Kiefer Rosenblatt
BS  Brier Score
CART  Classification And Regression Tree
CCD  Charge Coupled Device
CE  Conditional Entropy
CKP  Correntropy Kernelised Periodogram
CNN  Convolutional Neural Network
CPU  Central Processing Unit
CRTS  Catalina Real-Time Transient Survey
CV  Cross Validation
DB  Database
DBMS  Database Management System
DFT  Discrete Fourier Transform
EB  Eclipsing Binary
EROS  Expérience pour la Recherche d'Objets Sombres
FAP  False Alarm Probability
FATS  Feature Analysis for Time Series
FFNN  Feed Forward Neural Network
FITS  Flexible Image Transport System
FN  False Negative
FoM  Figure of Merit
FoV  Field of View
FP  False Positive
FWHM  Full Width Half Maximum
GA  Genetic Algorithm
GCVS  General Catalogue of Variable Stars
GLS  Generalised Lomb Scargle periodogram
GPU  Graphics Processing Unit
GRAPE  Genetic Routine for Astronomical Period Estimation
HR/H–R  Hertzsprung–Russell
IP  Information Potential
IR  InfraRed
IU  Index of Union
KL  Kullback Leibler
KLIC  Kullback Leibler Information Criterion
KNN  K Nearest Neighbours
LASSO  Least Absolute Shrinkage and Selection Operator
LBV  Luminous Blue Variables
LJMU  Liverpool John Moores University
LPV  Long Period Variables
LSP  Lomb Scargle Periodogram
LSST  Large Synoptic Survey Telescope
LT  Liverpool Telescope
MACHO  MAssive Compact Halo Objects
MAD  Median Absolute Deviation
MAE  Mean Absolute Error
MBRP  Median Buffer Range Percentage
MJD  Modified Julian Date
ML  Machine Learning
MUSIC  MUltiple SIgnal Classification
NB  Naive Bayes
NIR  Near InfraRed
NSVS  Northern Sky Variability Survey
OGLE  Optical Gravitational Lensing Experiment
OVV  Optical Violent Variables
Pan-STARRS  Panoramic Survey Telescope and Rapid Response System
PC  Principal Component
PCA  Principal Component Analysis
PDC  Phase Distance Correlation
PDF  Probability Density Function
PDFP  Percent Difference Flux Percentile
PDM  Phase Dispersion Minimisation
PNNV  Planetary Nuclei Variables
PSD  Power Spectrum Density
QSO  Quasi Stellar Object
RBF  Radial Basis Function
RCS  Range of a Cumulative Sum
ReLU  Rectified Linear Unit
RF  Random Forest
RMSE  Root Mean Squared Error
RNN  Recurrent Neural Network
roAp  rapidly oscillating Ap star
ROC  Receiver Operating Characteristic
RQE  Renyi Quadratic Entropy
sdB  subdwarf B star
SDSS  Sloan Digital Sky Survey
SL  String Length
SLLK  String Length Lafler Kinman
SMOTE  Synthetic Minority Over-sampling TEchnique
SN  Supernova
SNR  Signal to Noise Ratio
SQL  Structured Query Language
STILT  Small Telescopes Installed at the Liverpool Telescope
SVM  Support Vector Machine
S/N  Signal-to-Noise
TFA  Trend Filtering Algorithm
TN  True Negative
TP  True Positive
UPSILoN  AUtomated Classification for Periodic Variable Stars using MachIne LearNing
USNO  United States Naval Observatory
UV  UltraViolet
VRP  Variance Ratio Periodogram
VSI  Variable Star Index
WCS  World Coordinate System
YSO  Young Stellar Objects
ZAMS  Zero Age Main Sequence

Chapter 1

Introduction

In this chapter the background of the thesis is discussed. The field of Time Domain Astronomy is introduced along with the techniques used to produce the time-series data used in this thesis in the form of photometry. The instruments used to produce the images upon which the photometry is determined are described as well as the data reduction pipeline utilised to transform these images into a catalogue of time-series data for sources identified across multiple images. The SkycamT instrument contains time-series data on hundreds of thousands of bright stars, brighter than +12 mag.

The concept of variable stars is then introduced within the context of astrophysical phenomena generating unique signatures on the time-series data and a number of the different classifiable types of variable source are defined. Variable stars are an important probe into the physics of stellar evolution, the morphology of binary stellar systems and the measurement of interstellar distances. The analysis of the time-series of the bright stars collected by the Skycam instruments allows for the production of a bright source catalogue. Bright sources are better suited to additional observations as they require less time to obtain high signal-to-noise data.

Such a bright source catalogue presents opportunities for the scientific investigation of particular classes of variable stars to test theories on stellar evolution and formation. Cepheid and RR Lyrae variables occur at specific temperatures and luminosities and changes in their variable properties over several years can reveal the timescales of their evolution. Eclipsing binaries allow the degeneracy in stellar properties as measured by photometry and spectroscopy to be broken, allowing confident determination of masses, radii, luminosities and temperatures. Variable hot pulsating stars such as hot subdwarfs and white dwarfs allow for the study of the post giant-stage evolution of dying stars. These old stars contain information on the formation history of the Milky Way Galaxy and allow for investigation into the early forms of similar

galaxies. The chapter concludes with the description of the aims and motivations of this thesis in the context of the background in this chapter and provides the layout for the remainder of the thesis.

1.1 Time Domain Astronomy

Time Domain Astronomy is a field of research addressing astronomical objects and phenomena which result in sky sources that exhibit photometric and spectroscopic variability with timescales detectable by telescope instrumentation. These sources can exhibit intrinsic or extrinsic variability. Intrinsic variability is due to structural changes in the light source such as pulsation modes and stellar flares. Extrinsic variability is due to the obscuration of light from a background source or rotation of a source such as rotational variability, eclipses by other astronomical bodies and the formation of dust in stellar atmospheres. Analysis of these objects grants valuable information into physical processes on stellar scales up to universal scales as the properties of their variation can be used to determine the distance to the objects (through period-luminosity relations) and the chemical environment around them (through period-metallicity relations). The ability to reliably observe these light sources is rapidly improving through the development of new technological solutions, both hardware and software based.

Advances in observational, storage and data processing technologies have allowed for extended sky surveys to be conducted and exploited. These surveys range from focused observations of specific regions of the sky such as the MACHO (Alcock et al., 2000), EROS (Rahal et al., 2009) and OGLE (Udalski et al., 1997) surveys to extended sky surveys probing large swathes of the night sky such as SDSS (York et al., 2000), Pan-STARRS (Kaiser, 2002) and CRTS (Larson, 2003). This progress continues to enhance observational capability with the construction of the Large Synoptic Survey Telescope (LSST) in northern Chile due to commence operations at the beginning of the next decade (Ivezić et al., 2014). With this constant improvement in capability, the fields of Astronomy, Computer Science, Computational Intelligence and Statistics are striving to develop efficient implementations of multiple algorithms that can describe the properties of observed light sources and perform robust classification operations.

Time Domain astronomy is characterised by the large datasets generated by sky surveys containing time-series data (Vaughan, 2011). These time-series can be either photometric and/or spectroscopic data with temporal sampling over multiple epochs. This allows for the characterisation of the time-changing nature of the sources such as the determination of characteristic timescales of variability, the periodicity of variations, amplitude changes in source brightness and changes in the signatures of chemical species allowing temperature and surface gravity variations to be analysed.

Most data-gathering exercises outside of astronomy result in a large number of observations with consistent time intervals between individual observations. In Time Domain Astronomy, it is common for these observations to have a significantly uneven distribution in time with inconsistent intervals between observations (Lomb, 1976; Scargle, 1982). Major causes of this include weather limitations that can prevent telescope operation for uncertain periods, the seasonal observability of an object which can result in unfavourable positions for part of the year and limited access time to telescopes due to the volume of astronomers requiring observations. As a result, astronomy requires data processing capable of automated analysis of time-series data on individual objects that can contain a cluster of observations over the space of days followed by no additional observations for a period of months (Lomb, 1976; Scargle, 1982).

1.2 Photometry

Photometry is a technique for the precise measurement of the electromagnetic radiation flux incident on an imaging device such as a Charge Coupled Device (CCD) from a light source. Photometry is generally conducted through the analysis of photometric images taken through a filter which prevents photons that are not of specific wavelengths from reaching the detector. Ground-based photometry is usually conducted on filter bands in the optical or near infrared wavelengths of the electromagnetic spectrum. There are a number of common filter systems such as the Johnson-Cousins UBVRI system, the five SDSS filters, u’, g’, r’, i’ and z’ which have become the standard in the discipline. CCD cameras are ideal for optical and near infrared photometry as they have high quantum efficiencies at these wavelengths. This allows short exposures to detect faint objects as a small number of collected photons can trigger a charge response. These charges are then quickly conveyed across the rows and columns of the chip in a process called readout. Figure 1.1 displays a cropped frame from the IO:O CCD optical camera on the Liverpool Telescope using an SDSS-R filter. The object highlighted by the crosshairs is a pulsating variable star named OGLE-BLAP-009 which is discussed in section 1.4.1.

The comparison of the electromagnetic flux between two filters also provides colour information for target sources and having multiple band photometry allows for multiple colours to be determined. These colours, once corrected for atmospheric and interstellar extinction, allow for the determination of the temperature of target sources. Colour variability is also useful for the classification of variable sources such as Active Galactic Nuclei (AGN) and the separation of RR Lyrae overtone and contact eclipsing binaries.

Figure 1.1: Cropped IO:O SDSS-R filter image from the Liverpool Telescope of the variable star OGLE-BLAP-009 highlighted in the red crosshairs.

These two variable star classes exhibit similar shaped variability and periodicity which makes them difficult to disentangle from these properties alone, but RR Lyrae overtone variables exhibit colour changes throughout their period whereas contact eclipsing binaries do not (Layden et al., 2013).

Photometric data on a large number of objects can be generated through the production of wide-angle images of the sky. The intensity of the image pixels is determined by the activation of the CCD camera's pixels by incoming light from multiple astronomical objects with some background noise and detection bias from the camera (Mawson et al., 2013). As a result, each image contains important information about the brightness (magnitude) of the detected objects. By identifying objects in multiple images with different observation times, information on the change of the brightness of these objects can be determined. This task itself is non-trivial as the objects could be located in different regions of consecutive images due to the motion of the telescope. The resulting brightness-over-time data for each individual object is defined as the object's light curve (Lomb, 1976; Scargle, 1982; Huijse et al., 2012).

Light curves present a quantity of useful data on a light source in the form of a time-series. This time-series is univariate with magnitudes, magnitude error and the associated time instants of measurement. Magnitude is a logarithmic brightness scale used by astronomers as shown in Equation 1.1.

m − m_ref = −2.5 log10(F / F_ref)    (1.1)

Where m represents the apparent magnitude of a detected source (i.e. the magnitude of the object as it appears from Earth), m_ref represents the apparent magnitude of a suitably chosen reference source, F is the total flux of the detected source and F_ref is the total flux of the reference source.
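
To make the scale concrete, the following minimal R sketch (R being the language used for the software in this thesis) evaluates Equation 1.1; the function name is illustrative and not part of the STILT codebase.

# Minimal sketch of Equation 1.1: the magnitude difference implied by
# the flux ratio of a detected source to a reference source.
mag_difference <- function(flux, flux_ref) {
  -2.5 * log10(flux / flux_ref)
}

mag_difference(1, 100)  # a source 100x fainter than the reference is 5 mag fainter

The negative sign and logarithmic base mean that each factor of 100 in flux corresponds to exactly 5 magnitudes, with fainter sources having numerically larger magnitudes.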

This data can be manually manipulated by experienced astronomers to reveal a wealth of properties associated with the light source object(s). However, the number of light curves being generated by successive extended sky surveys has already passed the point where it is unfeasible for these light curves to be manually analysed. There are a number of problems associated with the extraction of useful information from light curve time-series in which computational intelligence algorithms are of extreme interest. These problems can be categorized as a parameter (feature) extraction process, an experience-based classification operation and an organizational method that attempts to identify structure across the large assortment of light curves (Richards et al., 2011b). These problems appear to be well positioned for exploitation by modern machine learning and computational intelligence methods.

The resultant databases from such extended sky surveys can be dauntingly large, potentially containing the light curves of millions of individual light sources. Additionally, the data itself exhibits a number of characteristics that can prove greatly detrimental to the efficient and accurate analysis of the light curves. The dominant property of astronomical light curves is the sampling of these light curves. Whilst surveys will attempt to optimize for a specific sampling rate (a property named cadence), limitations in observational schedules and telescope limitations result in uneven sampling containing artifacts such as gaps in the dataset and non-integer deviations from the desired sampling rate. For example, Earth based observations have an unavoidable periodic one-day gap in observations due to the inability to observe during daytime hours. As well as sampling artifacts, there are also periodic light variations due to local cycles such as the orbit of the Moon resulting in different phases, which periodically vary the background sky brightness through the monthly cycle. Additionally, a number of noise sources often affect astronomical data. The Earth's atmosphere can result in noise in the coordinate positioning of light sources as well as refraction and extinction resulting in variations to the measured brightness.
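
As a toy illustration of this cadence structure (an assumed model for illustration, not the actual Skycam observing log), the following R sketch draws observation times only from night-time windows and discards a fraction of nights entirely, reproducing the mixture of minute-scale and multi-day gaps described above.

# Toy simulation of uneven ground-based cadence: a few exposures per
# night within an ~8 hour window, with ~30% of nights lost to weather.
set.seed(42)
nights <- 0:364
nights <- nights[runif(length(nights)) > 0.3]     # drop whole nights
times_mjd <- unlist(lapply(nights, function(n) {
  n_obs <- sample(1:5, 1)                         # exposures this night
  n + sort(runif(n_obs, min = 0, max = 8 / 24))   # within the night window
}))
summary(diff(times_mjd))  # gaps range from minutes to several days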

1.3 Small Telescopes Installed at the Liverpool Telescope

The Liverpool Telescope is a fully robotic two-metre class telescope located at the Observatorio del Roque de los Muchachos on the island of La Palma, Canary Islands. It is administered by a collaboration between Liverpool John Moores University and the Instituto de Astrofisica de Canarias (Steele et al., 2004; Mawson et al., 2013). The Small Telescopes at the Liverpool Telescope (STILT) are a set of wide field imaging devices that complement the instrumentation available to the Liverpool Telescope (Mawson et al., 2013). The STILT instruments consist of three instruments with Andor Ikon-M DU934N-BV CCD cameras operating at −40°C to reduce noise due to dark current, which are pixel activations due to residual thermal effects (Steele et al., 2004; Copperwheat et al., 2016), detecting unfiltered optical wavelength white light (all electromagnetic radiation across the visible spectrum). They have varying fields of view and are mounted directly to the body of the main Liverpool Telescope aimed co-parallel with the main telescope's focus. These instruments have no control over the motion of the Liverpool Telescope and simply take exposures as directed by a control computer. Each instrument has a dedicated small Asus eee pc-powered control unit (Mawson et al., 2013). Figure 1.2 shows the annotated SkycamT frame captured concurrently with the IO:O frame in figure 1.1. This annotated frame is open source and freely available on the Liverpool Telescope website.

Figure 1.2: Annotated SkycamT image captured concurrently with figure 1.1. SkycamT has a 9° × 9° Field of View in this image capturing a large area of the Sagittarius constellation and a section of the Galactic Bulge.

Figure 1.3: Plot of the Right Ascension and Declination of every SkycamT light source with 100 or more data points coloured according to the number of data points in their light curves from 2009–2012, the first three years of Skycam operation.


The sky coverage and cadence of the Skycams are highly variable and not optimised for any particular survey or science program (Mawson et al., 2013). Their primary purpose is to provide scientific-grade complementary observations to those from the Liverpool Telescope. The secondary purpose is to conduct a full-sky survey from La Palma to identify variable and transient objects and catalogue them. Figure 1.3 shows a smoothed colour density scatterplot of all the SkycamT light sources in Right Ascension and Declination coordinates as a function of the number of observations.

1.3.1 The Skycam Instruments

The Skycams all use the same CCD device with 1024 × 1024 pixels and a chip size of 13.312 mm. Using equation 1.2, the Field of View (FoV) of the three instruments can be calculated.

FoV = 2 arctan(c_size / 2f)    (1.2)

Where FoV is the Field of View, c_size is the chip size and f is the focal length of the instrument. The FoV can then be used to determine the pixel scale of the instrument, the length of each pixel in angular units (usually arcseconds), by employing equation 1.3.

P_s = FoV / p_n    (1.3)

Where P_s is the pixel scale of an instrument, FoV is the Field of View and p_n is the number of pixels across the CCD, which for the Skycam instruments is 1024. These equations allow the calculation of the FoV and pixel scale of each Skycam instrument individually based on their focal length.

The first instrument, SkycamA, is capable of imaging the entire sky from La Palma. It is primarily used for monitoring the status of weather but it can be of use in the detection of bright transient objects. It is the only instrument not attached to the telescope but instead to the inside of the Liverpool Telescope enclosure. It has a 4.5 mm fish-eye lens with a focal ratio of f/4. Using equations 1.2 and 1.3 this provides the instrument with a FoV of 111.9° × 111.9° and a pixel scale of 6.56 '/pixel. With such a large area of sky per pixel, this instrument is of very limited use in variable star photometry, therefore this camera does not contribute any observations to the photometric database.

The next instrument is named SkycamT and is responsible for most of the photometric observations and is a ‘medium field’ FoV imager. It is an Andor CCD attached to a Zeiss Planar T 35 mm f/2D (set to f/4) wide-angle lens with a field of view (FoV) of 21° × 21°, a pixel scale of 75.7 ”/pixel and a magnitude limit of +12 mag in the R band. In 2014 this instrument was modified to use a Zeiss Planar T 85 mm f/1.4 wide-angle lens which dropped the field of view to 9° × 9° and improved the pixel scale to 31.5 ”/pixel. As the data used in this thesis was collected between 2009 and 2012, the original schematics of the instrument are relevant. As the pixel scale is reasonably large, many pixels may contain blended objects where multiple light sources are present in the same pixel. This can produce blended light curves which can appear photometrically variable despite the light sources being non-variable.

The remainder of the photometric database is constructed from observations by the SkycamZ instrument. This instrument contains an Andor CCD camera attached to a small Orion Optics AG8 telescope with a 200 mm primary mirror, a 760 mm focal length and a focal ratio of f/3.8. This specification gives the instrument a field of view of 1° × 1°, a pixel scale of 3.53 "/pixel and a deeper magnitude limit of +18 mag in the R band. In 2010 it was modified with an extra-length tube baffle to reduce stray light contamination. All three instruments take a 10 s exposure of their respective fields of view every minute, with a readout time of 35 s, whilst the telescope is in operation.

1.3.2 STILT data reduction pipeline

The Small Telescopes Installed at the Liverpool Telescope (STILT) database is a photometric observation Structured Query Language (SQL) database deployed on the MySQL platform, an open source database management system (DBMS). It contains 1.24 billion separate object observations of 27.74 million independent stellar objects. The database contains time-series data on the magnitude of detected objects from March 2009 to March 2012 for SkycamT and from July 2009 to March 2012 for SkycamZ (Mawson et al., 2013; McWhirter et al., 2016).

To create this MySQL database, the time-stamped observational images undergo a data reduction pipeline (Mawson et al., 2013). The pipeline first corrects the raw images using dark and flat frames. The dark current and bias noise correction is accomplished using a single reduction frame. This reduction frame consists of between 30 and 210 dark frames generated by obtaining exposures of the inside of the dome at midnight on nights where the weather prevents observing. These stacked frames are updated on a weekly basis. Upon the removal of these known sources of noise, the images are then fitted to the World Coordinate System (WCS), a system that allows the location of the frame in the sky to be determined and recorded accurately. This is required as the Skycams are not linked to the Liverpool Telescope's computer systems and therefore have no knowledge of the current coordinates of the telescope's primary field of view (Mawson et al., 2013). This necessitates the fitting of WCS information through the use of Blind Astrometric Calibration, accomplished through two pieces of software: Source Extractor (SExtractor) and Astrometry.net. Source Extractor is capable of identifying the sources of light present in an image and outputting information about them such as their pixel coordinates, the flux (intensity) of the sources, their ellipticity (how elliptical the light source appears on the image) and properties of the size of the source such as the isophotal area (area of the same brightness) (Bertin and Arnouts, 1996). It is also capable of filtering out artifacts to maintain the purity of the sources identified. The second piece of software, Astrometry.net, uses the observed frame to determine its coordinates (Lang et al., 2010). This is accomplished by assigning each extracted light source a unique hash key generated from a quadrilateral produced by four nearby bright sources. This hash key is generated in a specific manner such that it is invariant to the image's orientation and scale, allowing it to function successfully for images with various fields of view. The authors of the software claim more than a 99% success rate for contemporary near-ultraviolet (near-UV) and optical survey data with zero false positives for fields with a well-matched set of reference quadrilaterals (Lang et al., 2010).
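The quadrilateral hash at the heart of Astrometry.net can be illustrated with a heavily simplified sketch. The code below is not the production algorithm (which adds symmetry-breaking conventions, index construction and verification steps described by Lang et al., 2010); it only demonstrates how mapping the two most widely separated stars of a four-star quad onto fixed reference points yields a code invariant to translation, rotation and scale:

```python
import numpy as np

def quad_hash(points):
    """Toy geometric hash for four star positions (x, y).

    The most widely separated pair (A, B) defines a local frame in
    which A maps to 0+0j and B to 1+1j; the frame coordinates of the
    remaining two stars form the hash code. Any translation, rotation
    or scaling of the input cancels in the ratio (z - a) / (b - a).
    """
    pts = np.asarray(points, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]                  # positions as complex numbers
    d = np.abs(z[:, None] - z[None, :])             # pairwise separations
    i, j = np.unravel_index(np.argmax(d), d.shape)  # most distant pair -> A, B
    rest = np.delete(z, [i, j])
    w = (rest - z[i]) / (z[j] - z[i]) * (1 + 1j)    # frame coordinates of C, D
    return sorted((round(p.real, 6), round(p.imag, 6)) for p in w)

# The code is unchanged if the 'image' is shifted, rotated and rescaled:
quad = [(10.0, 12.0), (45.0, 80.0), (30.0, 40.0), (60.0, 20.0)]
transform = np.exp(1j * 0.7) * 2.5                  # rotation plus scaling
moved = [(p[0] + 1j * p[1]) * transform + (100 + 50j) for p in quad]
assert quad_hash(quad) == quad_hash([(c.real, c.imag) for c in moved])
```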

Table 1.1: Data Summary for the SkycamT and SkycamZ photometric database (Mawson et al., 2013).

Item                        SkycamZ        SkycamT
Images                      272,470        315,277
Unique Objects              6,290,935      21,453,608
Data Points                 332,735,320    904,033,139
Mean light curve points     53             42
Median light curve points   1              7
Max light curve points      13,802         17,370
Database disk space         66.4 GB        213 GB
Start date                  29/06/2009     05/03/2009
End date                    31/03/2012     31/03/2012
Min Declination             −36.0577°      −51.1956°
Max Declination             87.8928°       90°

All data from these images that pass all quality control checks are then stored in one of two MySQL databases, one for SkycamT images and one for SkycamZ images (Mawson et al., 2013; McWhirter et al., 2016). This data comprises Source Extractor output as described previously, data from the Flexible Image Transport System (FITS) file header (FITS is a common file type used for astronomy images) and catalogue information from the US Naval Observatory B catalogue, based on coordinate matching the sources to known stars with a tolerance of 148" for SkycamT and 9" for SkycamZ. At the end of each observing night, a check is performed to determine the quality of the observations recorded that night: the standard deviation of each object's magnitude measurements is computed. A large standard deviation indicates poor quality data, as these values are many times larger than those expected from even the most variable stars. This result is recorded in the database for every observation of an object made on a poor quality night. The Liverpool Telescope astronomer-in-charge identifies good quality photometric nights, defined as a standard deviation of less than 0.2 mag for SkycamZ data and less than 0.25 mag for SkycamT data. Of the 881 nights that STILT was operating during the three years processed into the Skycam databases, 558 of these nights were classified as photometric quality nights. Table 1.1 displays a summary of the reduced data for SkycamT and SkycamZ.
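The nightly quality check lends itself to a simple sketch. The exact per-night statistic is not fully specified above, so the fragment below assumes (for illustration only) that a night is flagged when the median per-object magnitude scatter exceeds the instrument threshold; the data layout is likewise hypothetical rather than the actual STILT schema:

```python
import numpy as np

# Thresholds from the text: 0.25 mag for SkycamT, 0.2 mag for SkycamZ.
THRESHOLDS = {"SkycamT": 0.25, "SkycamZ": 0.20}

def flag_poor_nights(nightly_mags, instrument):
    """Return the nights whose per-object magnitude scatter is too large.

    nightly_mags: dict mapping night -> {object_id: array of magnitudes}.
    A large standard deviation across a night indicates poor quality,
    being many times larger than even the most variable stars produce.
    """
    threshold = THRESHOLDS[instrument]
    poor = []
    for night, objects in nightly_mags.items():
        scatters = [np.std(mags) for mags in objects.values() if len(mags) > 1]
        if scatters and np.median(scatters) > threshold:
            poor.append(night)
    return poor
```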

1.4 Variable Stars

The analysis of variable astronomical objects is a major element in the understanding of stellar and galactic evolution as well as the topology of the universe (Richards et al., 2011b).

Table 1.2: Summary of SkycamT variable star classes.

Class                   Type                   Period
β Lyrae                 Eclipsing Binary       <1 day – few days
β Persei                Eclipsing Binary       0.1 days – years
Chemically Peculiar     Rotational Variable    Few days
Classical Cepheid       Pulsating Variable     1 day – months
δ Scuti                 Pulsating Variable     0.02–0.3 days
Ellipsoidal             Non-eclipsing Binary   <5 days
Mira                    Pulsating Variable     80–1000 days
Type II Cepheid         Pulsating Variable     1–20 days
RR Lyrae Fundamental    Pulsating Variable     0.3–1 day
RR Lyrae Overtone       Pulsating Variable     0.1–0.5 days
RS CVn                  Non-eclipsing Binary   Few days
RV Tauri                Pulsating Variable     20–200 days
Semiregular Variable    Pulsating Variable     30–1100 days
W Ursae Majoris         Eclipsing Binary       <1 day

[Light Curve column: an example light curve shape for each class.]

Many astronomical objects exhibit brightness variability due to a large number of differing physical processes that uniquely influence an object's light curve. Therefore, the light curve can be used in the classification of variable objects based on the signature of these potentially periodic physical processes and the detection of unknown candidate objects or even unknown variability phenomena that might be due to previously unrecognised astrophysical processes (Protopapas et al., 2006). In this subchapter the variable stars are placed into four distinct subgroups: Pulsating variables powered by stellar radial or non-radial shape oscillations, Variable Binaries due to the orbital dynamics of multiple gravitationally bound stars, Rotational variables with periodic changes during stellar rotation and Eruptive variables with accretion and expulsion of gas and dust from stellar objects. Table 1.2 summarises the main variable stars and time domain events present in the SkycamT database with their typical periods and demonstrations of their light curve shape.

1.4.1 Pulsating Stars

Pulsating stars are variable objects caused by periodic or quasiperiodic expansion and contraction of atmospheric surface layers (Eyer and Mowlavi, 2008; Percy, 2008). These radius oscillations produce changes in the star's temperature and brightness, resulting in a measurable change upon the light curves (Lomb, 1976; Scargle, 1982; Huijse et al., 2012). The cooling of the star can also result in a reddening of the light, detectable through colour changes between photometric filters. This reddening is a result of temperature decreases in the outer layers of the star's atmosphere as it recedes from the nuclear core. There are also additional non-radial vibrational modes which generally produce lower amplitude variability. These non-radial modes can occur in stars which are already pulsating, potentially generating multi-periodic variations.

It is suspected that most, if not all, stars exhibit some amplitude of variability due to atmospheric oscillations, but few produce signals detectable by current observational technology. As instrumentation improves, more variability will likely be detected, and the field of asteroseismology is developing rapidly from the recent high cadence space-based photometry of missions like Kepler. Therefore, the term pulsating stars is usually reserved for stars of specific properties such as mass, luminosity and temperature which, due to opacity changes in specific chemical species in their atmospheres, drive a feedback mechanism as a specific oscillation mode is continuously excited by the star's own radiation. This phenomenon has been named the κ Mechanism, or alternatively the Eddington Valve (Saio, 1993).

The wide-field large sky area surveys of recent decades have shown the Galaxy to contain many pulsating stars driven by the κ Mechanism. Stellar radiation ionises specific internal layers, often a region of hydrogen and helium, although this differs depending on the pulsating star class, and other elements such as iron have been identified as driving pulsations. The free electrons from the ionised layer dominate the opacity due to electron scattering and free-free absorption. This increased opacity results in an increased radiation pressure on the ionised layer, repulsing it from the nuclear core or shell layer. The ionised layer and all those above it expand the radius of the star, resulting in an increased luminosity. As the star expands, the layers begin to cool until the ions in the driving layer recombine into atoms. With the electrons now bound into atoms, the opacity decreases. With the radiation pressure support lost, the star collapses under gravitational free fall, contracting to the initial radius. As the driving layer is heated by its proximity to the nuclear furnace, it will once again ionise, renewing the pulsation cycle. The amplitude variation presented by these pulsations is highly dependent on the depth of the driving layer. Deep layers have a small effect on the layers above, and layers too close to the surface have a small amount of material to repulse and therefore cannot generate large variations (Percy, 2008).

Figure 1.4: The light curve of the star U Aquilae, a pulsating Classical Cepheid star. The y-axis shows the line of sight from the Earth.

Figure 1.4 shows how the light curve of a pulsating Classical Cepheid star with a period of 7.02 days is influenced by the expansion and contraction of the star.

The study of the properties of these pulsations, such as their period and amplitude, allows for the determination of the structural parameters of these stars and of the specifics of the driving layer. This has provided data for the study of the evolutionary state of pulsating stars, with notable homogeneous classes occurring as a product of stellar evolution across instability regions of the Hertzsprung–Russell (HR) diagram (Gautschy and Saio, 1996). Pulsating variables often have a period-luminosity relation that can be used as a distance estimator. With certain types of pulsating star having a high luminosity, these stars have been used to calculate distances to nearby globular clusters and galaxies (Bedding and Zijlstra, 1998; Glass and Evans, 1981; Cohen and Sarajedini, 2012; Madore et al., 2009). They are an important component of the cosmological distance ladder, bridging distance determinations of nearby stars from parallax measurements with those from Type Ia supernova explosions used for far away galaxies (Kim and Miquel, 2007).

Figure 1.5: Phased SkycamT light curve of the Mira variable Y Librae with a period of 276.58 days.
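Many of the figures in this section show 'phased' light curves: the observation times are folded on the estimated period so that all cycles overlap. A minimal sketch of this folding operation (a standard technique, not code from the Skycam pipeline) is:

```python
import numpy as np

def phase_fold(times, mags, period, epoch=0.0):
    """Fold a light curve on a trial period.

    Each observation time is mapped to a phase in [0, 1) relative to
    an arbitrary epoch, and the points are sorted by phase so that
    all pulsation cycles are overlaid, as in figures 1.5 onwards.
    """
    times = np.asarray(times, dtype=float)
    phase = np.mod(times - epoch, period) / period
    order = np.argsort(phase)
    return phase[order], np.asarray(mags)[order]

# e.g. folding Skycam-style data on the 276.58 day period quoted for Y Librae:
# phase, mag = phase_fold(jd, mag, 276.58)
```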

The analysis of pulsating variables has allowed them to be classified into a number of homogeneous classes with the primary distinguishing features of each class being their period, amplitude and colour. Using the period, the pulsating stars can be subdivided into Mid to Long Period Variables and Short Period Variables. This distinction also correlates with the evolutionary status of the pulsating stars, as more compact objects near or below the Zero Age Main Sequence (ZAMS) have a higher sound speed with shorter period pulsations, and evolved giant stars have a lower material density, a slower sound speed and longer periods. The well-known pulsating star classes are now discussed based on this subdivision.

Mid to Long Period Variables have periods of greater than one day up to several thousand days and are usually evolved giant and supergiant type stars. They are usually of spectral class G, K or M which gives them yellow to red colours. They often exhibit large amplitude variations and therefore have been studied for centuries.

• Mira variables

Mira variables have periods of 80–1000 days with the majority between 150–450 days, although the periods can slowly vary due to evolutionary progression as well as an

up to 5% cycle-to-cycle variation (He et al., 2016; Wood and Sebo, 1996). They are fundamental tone pulsating red giant stars of spectral class K or M with large amplitudes of greater than 2.5 mag, although they can range up to 10 mag (Percy, 2008). They can be both Population I and II stars and are highly common in variable catalogues due to their ease of detection, yet their evolutionary state suggests they are a rare variable. This means care has to be taken to avoid

introducing bias into samples. They are evolved giant stars with masses of 0.6 M⊙ up to a few solar masses which have ended hydrogen burning in their cores, have gone through the helium flash and are now at the end of the Asymptotic Giant Branch. They have radii of hundreds of solar radii, although the computed radii are dependent on the wavelength of the filter band. This also results in their pulsation amplitude being highly dependent on filter, with near-infrared amplitudes being substantially smaller than V band optical amplitudes. Mira variables follow period-luminosity relations (Bedding and Zijlstra, 1998; Glass and Evans, 1981). The amplitude can also vary from cycle to cycle, as can the minimum and maximum apparent magnitudes (He et al., 2016). The cooler Mira variables also have longer periods. The substantial variability is powered by pulsations and temperature changes (Wood and Sebo, 1996). As the red giant expands it cools, causing less of the star's energy to be emitted in the visible spectrum. Additionally, this cooling allows titanium oxide to form in the outer parts of the atmosphere, which absorbs additional visible light, increasing the substantial reduction in luminosity (Percy, 2008). There is a period-luminosity-colour relationship for Mira variables, with the most effective measures being with infrared magnitudes. They are a short-lived evolutionary stage and will soon eject their atmospheres, which will light up from ultraviolet radiation from the exposed hot stellar core, producing a planetary nebula. Figure 1.5 demonstrates the sinusoidal shaped phased SkycamT light curve of the Mira variable Y Librae, although Mira light curves can be more asymmetric depending on the phase difference between the radius and temperature variations.

• Semi-regular variables

Mira variables are an example of a rare but clearly periodic red giant variable. Photometric surveys have indicated that most if not all red giants exhibit some level of variability. Semi-regular variables are stars of spectral classes F, G, K and M which have amplitudes of below 2.5 mag and periodicity ranging from persistent to undefined (Percy, 2008; Soszynski et al., 2009). Semi-regular stars are subdivided into 4 subtypes: SRa, SRb, SRc and SRd. SRa and SRb are of spectral class K or M and are all variable red giants, with the subdivisions indicating the strength of the periodicity, SRa variables having somewhat defined periodicity

Figure 1.6: Phased SkycamT light curve of the Semi-regular variable Y Serpentis with a period of 425.11 days.

and SRb variables having very poor to no periodicity (Percy, 2008). They are lower mass, evolved older Population I or II stars pulsating in fundamental and overtone states and can have multiple pulsation periods from 50 days to several thousand days. SRc variables are spectral class M red supergiants which are massive, young, Population I stars. They have semi-regular variability with periods of 250–1000 days (Percy, 2008). SRd variables are much less common and are supergiants of spectral class F, G and K, having yellow to orange colours, amplitudes of up to 4 mag and semi-regular periods of 30 to 1100 days (Rosino, 1951; Giridhar et al., 1998). Semi-regular variables can also exhibit a long secondary period of thousands of days. The cause of this variability is not known, but it might be due to an interaction between two pulsation modes, a pulsation mode and the stellar rotation, or an unseen binary companion (Olivier and Wood, 2003). Figure 1.6 shows the phased SkycamT light curve of the periodic Semi-regular variable Y Serpentis, which shows a small 'bump' in the ascending branch possibly due to a secondary pulsation.

• Slow Irregular variables

Slow Irregular variables are red giants and supergiants with spectral class M. They are closely related to the Semi-regular variables and are likely variants of this class which have very poorly defined periodicity and vary over longer characteristic

Figure 1.7: Phased SkycamT light curve of the Classical Cepheid variable AP Sagittarii with a period of 5.06 days.

timescales (Percy, 2008). They are also subdivided into two types, Lb and Lc. Lb variables are red giants related closely to the SRb variables and Lc variables are red supergiants closely related to the SRc variables (Percy, 2008). The light curves of these objects will identify them as variable objects, but a period analysis will likely result in a spurious sampling period, although likely with a strong confidence generated by the underlying variability (Richards et al., 2012).

• Small Amplitude Red Giants

Small Amplitude Red Giants (SARGs) are very common and show variability on short and long timescales (Percy, 2008). They are of spectral class K and M and have semi-regular periods of 10–100 days and amplitudes of 0.005–0.13 mag in the I band (Wray et al., 2004). They are Population I and II red giants which pulsate from fundamental to high overtone modes and can be multi-periodic (Xiong and Deng, 2006).

• Classical Cepheids

Classical Cepheids are Population I yellow giant/supergiant stars of spectral class G. They are evolved stars which are moving off the main sequence and have begun hydrogen shell burning and helium core burning (Stetson, 1996; Percy, 2008). They are mostly fundamental mode pulsators powered by the κ mechanism on a helium

Figure 1.8: Phased SkycamT light curve of the Type II Cepheid W Virginis with a period of 17.29 days.

layer, although they have also been observed pulsating in overtone modes separately (the Overtone or s-Cepheids) or simultaneously (the Multimode Cepheids). Their physical properties place them in a region of the H–R diagram named the Cepheid Instability Strip, a region they can migrate across multiple times as they evolve (Gautschy and Saio, 1996). They have periods of 1–70 days and obey a period-luminosity relation named Leavitt's law (Percy, 2008). This has been used as an independent distance measurement due to the relatively bright absolute magnitude of Classical Cepheids (Madore et al., 2009). They have highly asymmetric light curves with a sawtooth shape. They exhibit a rapid rise to maximum brightness followed by a gradual fall back to minimum with amplitudes of 0.5–2 mag. Overtone cepheids have a more sinusoidal light curve with amplitudes of under 0.5 mag (Stetson, 1996; Yoachim et al., 2009). Classical Cepheid light curves with periods of 6–20 days also show additional features such as 'bumps' at certain phases. These are likely caused by a 2:1 resonance between the fundamental mode and the second overtone (Percy, 2008; Yoachim et al., 2009). The amplitude and phase of these bumps are highly correlated with the pulsation period of the star, located on the descending branch before 10 days and on the ascending branch after 10 days, before disappearing completely at longer periods. Figure 1.7 shows the sawtooth, asymmetric phased SkycamT light curve of the Classical Cepheid AP Sagittarii.

• Type II Cepheids

Type II Cepheids are low mass, metal poor Population II giant/supergiant stars. They are also located within the Cepheid Instability Strip. They are of higher effective temperature than the Classical Cepheids and have spectral classes of mid to late F and G (Percy, 2008). They have periods of 1–20 days and bridge the short period, metal poor, slightly higher effective temperature RR Lyrae variables to the longer period RV Tauri variables (Wallerstein, 2002). They also exhibit a Period-Luminosity relationship which differs from the Classical Cepheid relation. Type II Cepheids are often subclassed into three types by period. BL Herculis variables have periods of 1–8 days and amplitudes of a few tenths of a magnitude and are crossing the instability strip post horizontal branch (Percy, 2008). They are smaller and fainter than Classical Cepheids with similar periods. W Virginis variables have periods of 10–20 days with amplitudes of around 1 mag and are burning hydrogen or helium in nuclear shells (Soszynski et al., 2008). They are evolving through the instability strip from the Asymptotic Giant Branch due to an increase in effective temperature from hydrogen envelope contraction caused by a helium shell thermal pulse, named a blue-loop excursion (Percy, 2008). The final subtype is named the RV Tauri variables, which are discussed below. Figure 1.8 shows the phased SkycamT light curve of the star W Virginis.

• RV Tauri variables

RV Tauri variables are bright, low mass, metal poor Population II giant/supergiant stars of spectral classes late F and G. They are the longest period Type II Cepheids (Percy, 2008). They have periods greater than 20 days with amplitudes of around 4 mag and are the last excursion of Asymptotic Giant Branch stars through the instability strip before nuclear fusion ends and they transition to white dwarf stars. Their light curves exhibit a unique shape assisting in their identification. They have alternating shallow and deep minima and often have quoted periods of twice that of the variation period, identifying a deep and shallow minima pair (Rosino, 1951). Over longer timescales these minima also modulate, resulting in the order of the deep and shallow minima disappearing and then reversing. Over these timescales, variations in period and in the brightness of both the minima and maxima, as well as additional chaotic behaviour, can be seen (Percy, 2008). Figure 1.9 shows the phased SkycamT light curve of the RV Tauri variable AC Herculis with the clear alternating minima.

Short Period Variables have periods of less than a few days down to a few minutes. They are dwarf, subdwarf or compact objects with the exception of the giant RR Lyrae stars. They are substantially hotter than the long period variables with spectral classes of

Figure 1.9: Phased SkycamT light curve of the RV Tauri variable AC Herculis with a period of 75.58 days.

O, B, A and early F. This gives them colours ranging from blue through white to yellow-white. As they have shorter pulsation periods, they generally have smaller amplitudes than the Long Period Variables. This has resulted in the identification of many new classes of Short Period Variable within the last few decades due to improvements in the precision of observational technology.

• RR Lyrae variables

RR Lyrae variables are old, Population II giant stars with spectral classes late A to early F and are common in globular clusters. They have stable periods of 0.1 to 1 day and amplitudes of up to 1.5 mag in the V band (Percy, 2008; Smith,

2004). They are evolved horizontal branch stars with masses of about 0.5 M⊙ and low metallicity which have begun burning helium in their cores. Many RR Lyrae variables have also been found to have a long period amplitude and/or phase modulation with a timescale of 11 days to 533 days, named the Blazhko effect (Blazhko, 1907). They are subtyped into three groups based on their light curves: RRa, RRb and RRc (Percy, 2008). RRa variables have long periods, high amplitudes and asymmetrical light curves similar to the Classical Cepheids. RRb variables appear to be a smooth transition of the RRa type and have slightly longer periods, slightly smaller amplitudes and less asymmetry in their light curves compared to the RRa variables. Often this distinction is ignored and they are grouped as

Figure 1.10: Phased SkycamT light curve of the RR Lyrae type ab variable RS Bootis with a period of 0.377 days.

Figure 1.11: Phased SkycamT light curve of the RR Lyrae type c variable RU Piscium with a period of 0.390 days.

Figure 1.12: Phased SkycamT light curve of the δ Scuti variable δ Scuti with a period of 4.65 hours.

RRab variables. The RRab variables are pulsating in a fundamental mode, which is why their light curves are similar to those of the fundamental mode Classical Cepheids. RRab light curves also show a 'bump' near minimum light caused by shockwaves reflected upwards from layers deep in the star (Percy, 2008). The RRc variables have shorter periods, much smaller amplitudes and sinusoidal light curves, as these stars are pulsating in the first radial overtone. There are also rare, double-mode RR Lyrae variables which pulsate in both the fundamental and first overtone periods and are often assigned a fourth RRd subgroup. RR Lyrae variables obey a period-luminosity-metallicity relationship and have been used to measure distances to nearby globular clusters (Smith, 2004). Their fainter absolute magnitude relative to Classical Cepheid variables and more complicated relationship have limited their applicability in cosmological distance determinations. Figure 1.10 shows the phased SkycamT light curve of the RRab variable RS Bootis and figure 1.11 shows the phased SkycamT light curve of the RRc variable RU Piscium.

• δ Scuti variables

δ Scuti variables are zero age main sequence stars of spectral classification A and F. They are located in a position on the H–R diagram where the Cepheid instability strip crosses the dense main sequence and are mostly core hydrogen burning

stars (Percy, 2008). As a result, the δ Scuti variables are some of the most common pulsating variable stars, with pulsations driven by the κ mechanism similar to the Cepheids. They have been subtyped by amplitude and metallicity into High Amplitude δ Scuti (HADS) variables, low-amplitude δ Scuti variables and the low metallicity, high amplitude SX Phoenicis variables (Percy, 2008; Pigulski et al., 2005; Cohen and Sarajedini, 2012). They have periodicities of 0.02 to 0.3 days, usually positively correlated with amplitude. They have amplitudes of less than 0.2 mag, with smaller amplitudes being more common, although some of the HADS have amplitudes over 0.3 mag. They obey a period-luminosity-colour relationship, although their low amplitude variability limits its use as a distance measurement (Percy, 2008). Many δ Scuti variables are multi-periodic and can exhibit over ten independent periods (Poleski et al., 2010). The HADS variables are evolved stars and have begun hydrogen shell burning. SX Phoenicis variables are Population II stars in an advanced evolutionary state and are known as blue stragglers (Cohen and Sarajedini, 2012). These stars have not evolved away from the main sequence as would be expected for stars of their mass and temperature and are therefore suspected to have gained mass late in their evolution through binary interaction or possibly a stellar merger. Figure 1.12 shows the phased SkycamT light curve of the prototype variable δ Scuti. A large number of observations are required to obtain this period from SkycamT data for low amplitude variables.

• γ Doradus variables

γ Doradus variables are zero age main sequence stars with spectral types of early F. They are a recently identified class of pulsating variable which usually have multiple periods in the range of 0.4 to 3 days and amplitudes of 0.1 mag in the V band (Percy, 2008). They are high-order, non-radial, gravity mode pulsators pulsating in multiple modes simultaneously (Krisciunas, 1993; Percy, 2008). It is also possible for hybrid δ Scuti and γ Doradus variables to exist which pulsate in the radial δ Scuti modes and the non-radial γ Doradus modes (Guo et al., 2016).

• β Cephei variables

β Cephei stars, not to be confused with the Cepheid variables, are hot stars with spectral classes of early B. They are often multi-periodic with periods of 0.1 to 0.3 days and with very small optical band amplitudes of 0.01 to 0.3 mag, although substantially larger in the ultraviolet (Percy, 2008). The multi-periodicity is due to fundamental and overtone mode radial pulsations with close periods due to the stellar rotation. The radial pulsations are powered by the κ mechanism, except instead of a helium driving layer, the pulsations are due to the ionisation of iron deep in the star at temperatures of 200,000 K (Miglio et al., 2007). Whilst the

light amplitudes are small, the radial pulsation is large and the stellar atmosphere expands and contracts at speeds of up to 200 km s−1 (Percy, 2008).

• Slowly Pulsating B variables

Slowly Pulsating B variables, or 53 Persei variables, are hot stars of spectral class B. They are similar to the β Cephei variables but pulsate purely in non-radial modes (Percy, 2008; Miglio et al., 2007). These variables are often multi-periodic with periods ranging from 0.5 to 5 days and amplitudes of under 0.1 mag in optical bands but larger in the ultraviolet (Waelkens and Rufener, 1985).

• Rapidly oscillating Ap (roAp) stars

Ap stars are chemically peculiar main sequence stars with spectral classes of late B to early F. This chemical peculiarity is due to the star's powerful magnetic field which allows elements usually present deep in the stellar interior to migrate to the surface (Percy, 2008). A subset of these stars lying in the δ Scuti instability strip are also pulsating, with periods of 5 to 23 minutes (Kurtz, 1982). These pulsations are described by the oblique pulsator model and are caused by high overtone pulsations from a hydrogen ionisation zone which are strongly influenced by the stellar magnetic field. These pulsations are aligned with the magnetic axis, which is offset from the rotation axis. This results in a modulation of the pulsation by the stellar rotation as the observer is presented with a different aspect of the pulsation throughout the rotational period (Kurtz, 1982; Percy, 2008).

• Pulsating Hot Subdwarfs

Pulsating hot subdwarfs (sdB) are spectral class B stars with sizes and luminosities below those of main sequence dwarf stars (Jeffery et al., 2001; Percy, 2008). These subdwarfs are evolved stars at the hot end of the extreme horizontal branch (Geier, 2015). They are evolving past the instability strip as they lack a massive outer atmosphere to progress onto the Asymptotic Giant Branch. This stripping of the outer atmosphere is hypothesised to be a result of binary interaction, as many hot subdwarfs are found in binary systems. They are helium core burning stars with a thin hydrogen-rich shell, and the pulsations are generated by a partially ionised iron layer deep in their atmosphere (Geier, 2015). There are two major subgroups of pulsating sdB star: the EC 14026 stars and the PG1716 stars. EC 14026 stars have non-radial pulsations with periods of several hundred seconds and amplitudes of up to 0.1 mag (Kilkenny et al., 1998). The PG1716 stars are also non-radial pulsators with a similar amplitude to the EC 14026 stars but with periods of 1–2 hours (Reed et al., 2004).

• Blue Large Amplitude Pulsators

Blue Large Amplitude Pulsators (BLAPs) are a recently discovered type of pulsating variable detected from years of photometry from the OGLE survey (Pietrukowicz et al., 2017). They were originally classified as δ Scuti variables; however, whilst their amplitude was equivalent to HADS variables at 0.25–0.3 mag in the I band, they had periods of 20–40 minutes, shorter than the shortest δ Scuti period (Pietrukowicz et al., 2017). As δ Scuti variables usually have a positive correlation between period and amplitude, these objects appeared to be a poor fit to the class (Poleski et al., 2010). Spectroscopic follow-up identified these stars as having effective temperatures of around 30,000 K, making them spectral class O and B. The light curves of these variables have very similar shapes to the fundamental mode RR Lyrae stars. Spectral analysis indicates that they are driven by helium ionisation similar to the other fundamental mode pulsators and are moderately helium enriched (Pietrukowicz et al., 2017). The higher luminosity and lower surface gravity of BLAPs compared to hot subdwarf stars suggest they might have inflated envelopes. Three possible configurations have been proposed:

A helium core burning star with a core mass of about 1.0 M⊙, which requires substantial atmosphere stripping, possibly by a binary companion, with a fundamental mode pulsation. A red giant with a stripped atmosphere sustained by hydrogen

shell burning above a 0.3 M⊙ degenerate helium core with a fundamental mode pulsation. The final possibility is that these are hot pre-Extremely Low Mass white dwarf stars which have low-order radial pulsations including the fundamental mode at the shortest periods (Romero et al., 2018). Thus far, all known BLAPs have been discovered in or near the Galactic Bulge, with no BLAPs detected by OGLE surveys in the Magellanic Clouds (Pietrukowicz, 2018). Figure 1.13 shows the phased OGLE light curve of the star named OGLE-BLAP-009 with a period of 31.94 minutes.

• ZZ Leporis variables

ZZ Leporis variables are pulsating stars in the centre of planetary nebulae evolving towards white dwarfs (Handler et al., 2013). They are of spectral class O and B and have semi-regular radial pulsations with periods of 3–10 hours and amplitudes of 0.4–0.6 mag. Their light curves are somewhat sinusoidal with irregular long term variability, possibly due to their stellar winds. They have hydrogen-rich atmospheres with effective temperatures of 25,000–50,000 K.

• Pulsating White Dwarfs

White dwarf stars are the hot, degenerate cores of dead stars which are no longer burning material via nuclear fusion and are now slowly cooling (Percy, 2008).

Figure 1.13: Phased OGLE light curve of OGLE-BLAP-009 with a period of 31.94 minutes, produced by the author from openly available OGLE data.

They are the final evolutionary state of low mass stars and have an average mass of 0.6 M⊙ and radii similar to that of the Earth. There are a number of pulsational instability zones with properties that match white dwarfs, and as they cool they can pass through these instability strips (Althaus et al., 2010). These instabilities power non-radial gravity wave pulsations with a superposition of vibrational modes. There are four classes of pulsating white dwarf, each with their own pulsation modes (Percy, 2008). The DAV white dwarfs are of spectral class A and have hydrogen atmospheres with periods of 1.6–23.3 minutes and amplitudes of 0.01–0.3 mag due to ionised hydrogen. They are also known as ZZ Ceti variables (Koester and Chanmugam, 1990). The DBV white dwarfs are of spectral class B and have primarily helium atmospheres with periods of 2–18 minutes and amplitudes of 0.05–0.3 mag due to helium ionisation. These variables are also known as V777 Her variables (Winget et al., 1982). DOV white dwarfs are of spectral class O and are the hottest white dwarfs. They have carbon/oxygen atmospheres with ionisation of these elements driving the pulsations. Standard DOV variables have periods of 5–43 minutes and amplitudes of 0.02–0.1 mag and are also known as PG1159 or GW Virginis variables (Cox, 2003). Similar to the DOV white dwarfs there are also the Variable Planetary Nebula Nuclei (PNNV) variables. The PNNV stars are surrounded by planetary nebulae whereas DOV white dwarfs are not. They have

periods of 7–100 minutes and amplitudes of 0.01–0.15 mag (Percy, 2008). The final class of pulsating white dwarf is the recently discovered DQV class. The DQV white dwarfs have temperatures between the DAV and DBV types and are of spectral class O to B yet are rich in carbon. These stars have periods of 4–18 minutes and amplitudes of 0.005–0.015 mag (Dufour et al., 2007).

1.4.2 Variable Binaries

Variable binaries are extrinsic variables caused by two or more gravitationally bound stars executing orbits around a common gravitational centre-point (LaCourse et al., 2015). The relative proximity of the stars often means that they cannot be distinguished on an image and appear as a single source of light. These variables are highly periodic in their photometric variability, with a period equal to the orbital period of the binary system (Percy, 2008). The colour of these variables is of less importance for their classification than for the pulsating variables, as there is a wider configuration of possible spectral classes in these systems. Variable binaries can be subset into eclipsing and non-eclipsing binaries. Eclipsing binaries are the easier to detect, with variations caused by the plane of the binary orbit aligning with the view from Earth. As a result, one star periodically passes in front of another, resulting in a change in either the brightness and/or the relative brightness in different colour filters of the source of light in the astronomical images (LaCourse et al., 2015). Analysis of the light curves of eclipsing binary candidate light sources can be used to determine the number and types of star present within the system as well as their mass and orbital period (Percy, 2008). Similar smaller amplitude signals can also be caused by exoplanetary transits where one of the objects is a planet orbiting a parent star.

Figure 1.14 demonstrates the dynamics behind eclipsing binary systems and the associated light curve. The light curve in this example is of the variable source HT Virginis, a W Ursae Majoris type Eclipsing Binary system consisting of two stars closely orbiting each other with an orbital period of 0.41 days. As the more luminous star is eclipsed by the dimmer star, there is a large reduction in source brightness called the primary eclipse. The primary eclipse ends when the more luminous star emerges from the eclipse. After half an orbit, the dimmer star is eclipsed by the more luminous star, resulting in a smaller reduction in source brightness, the secondary eclipse. As the stars orbit they periodically execute primary and secondary eclipses with a period equal to the orbital period of the binary.
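The eclipse geometry described above can be caricatured with a toy flux model. The parameters below are illustrative, not fitted to HT Virginis, and real eclipses are smooth rather than box-shaped; the sketch only reproduces the qualitative structure of a deep primary eclipse at phase 0 and a shallower secondary eclipse at phase 0.5:

```python
import numpy as np

def toy_eclipsing_binary(phase, primary_depth=0.6, secondary_depth=0.25,
                         width=0.12):
    """Toy two-eclipse flux model as a function of orbital phase.

    Out of eclipse the relative flux is 1. The primary eclipse (the
    brighter star hidden) is centred on phase 0 and the secondary
    eclipse (the dimmer star hidden) on phase 0.5.
    """
    phase = np.mod(np.asarray(phase, dtype=float), 1.0)
    flux = np.ones_like(phase)
    flux[np.abs(phase) < width / 2] -= primary_depth        # primary eclipse
    flux[np.abs(phase - 1.0) < width / 2] -= primary_depth  # wrap-around
    flux[np.abs(phase - 0.5) < width / 2] -= secondary_depth
    return flux
```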

Whilst the masses and radii of the binary stars will limit the possible orbital periods of these systems, eclipsing binaries can be found with a wide variety of periods from a few

Figure 1.14: The light curve of the light source HT Virginis, an eclipsing binary system. The y-axis shows the line of sight from the Earth.

hours up to multiple decades (LaCourse et al., 2015). This is simply due to the wide range of possible stellar masses and radii allowing many stable configurations. Due to the relationship between orbital period and stellar mass and radii, eclipsing binaries are not subtyped by their periods or amplitudes. Instead, the light curve shape is used to classify eclipsing binaries into three common subtypes (Prsa et al., 2008). A word of caution, however: classifying based on light curve shape, known as phenomenological classification, does not have a simple mapping to the astrophysical classification. Whilst there is a reasonable comparison between the three phenomenological types and the three astrophysical types, there is also substantial, well discussed overlap.

These three phenomenological classifications are now highlighted with their closest matching astrophysical type, acknowledging the above warning.

Figure 1.15: Phased SkycamT light curve of the β Persei eclipsing binary DI Pegasi with a period of 0.712 days.

• β Persei eclipsing binaries

β Persei (Algol) eclipsing binary light curves exhibit sharp, narrow primary eclipses with a short duration relative to the period of the system. The maxima are almost flat due to the undistorted component stars and the light curves are highly non-sinusoidal (Percy, 2008; Kochoska et al., 2017). Some of these light curves contain a secondary eclipse and some do not, depending on the relative radii and temperatures of the two components (Prsa et al., 2008). Most β Persei light curves are of detached eclipsing binaries in which neither component fills its Roche lobe; these can have a wide range of orbital periods and component spectral types. Eccentric systems result in light curves with an unequal phase difference between primary to secondary and secondary to primary eclipses (Paegert et al., 2014). In systems where there is no secondary eclipse the eccentricity is not so easily determined. Interestingly, β Persei itself is a semi-detached binary system in which one of the components has overflowed its Roche lobe and is now transferring material to the other (Percy, 2008). Most β Persei systems consist of dwarf, subgiant and giant stars, yet some contain subdwarf and red dwarf stars. These systems are sometimes named HW Virginis binaries and have short duration orbital periods of down to 0.1 days (Kilkenny et al., 1998). At the other extreme, detached binaries have been measured with periods of thousands of days. ε Aurigae is an eclipsing

binary system containing a pulsating semi-regular yellow supergiant and a spectral type B main sequence star (or possibly a binary of two type B stars) surrounded by a large dust torus. It undergoes a two year eclipse every 27 years, the orbital period of the binary system (Hoard et al., 2012). The most recent eclipse was in 2009–2011. Such long period systems can be difficult to detect without an extended survey campaign, as few observations are likely to be performed during eclipses. This sets upper limits on the detectability of these binaries. A preliminary investigation for the Large Synoptic Survey Telescope (LSST) places a limit of periods of 10 days on the reliable recovery of β Persei systems (Wells et al., 2017). Figure 1.15 shows the phased SkycamT light curve of the eclipsing binary DI Pegasi with its narrow primary eclipses. The secondary eclipses are barely visible.

• β Lyrae eclipsing binaries

β Lyrae (Sheliak) eclipsing binaries have slightly rounded eclipsing light curves, as the stars are distorted into ellipsoids (Percy, 2008). They have primary and secondary eclipses which are uneven in size but are always present and wider than those in β Persei light curves. They are closer to sinusoidal in shape, with less time spent out-of-eclipse than the β Persei light curves (Prsa et al., 2008). They have a range of periods, although not to the extreme extent of the β Persei binaries, as the two components must be ellipsoidally distorted. The prototype of this type of eclipsing binary, β Lyrae, is an unusual example of this class, as one of the stars is obscured by a large dust disk accreted from the other component (Percy, 2008). Most β Lyrae systems are semi-detached binary systems where one of the two stars has filled its Roche lobe and is now transferring material to the other star. Figure 1.16 shows the phased SkycamT light curve of the eclipsing binary ES Librae with clearly visible primary and secondary eclipses of different minima.

• W Ursae Majoris eclipsing binaries

W Ursae Majoris eclipsing binaries have short periods, usually under 1 day, and wide continuous primary and secondary eclipses with minimal time spent out-of-eclipse. The primary and secondary eclipses have similar depths (within a few tenths of a magnitude) as the two components have almost identical surface temperatures (Percy, 2008). The spectral class of the stars is usually F to G or later. These light curves indicate that the two stars are in contact and are indicative of contact binaries. Contact binaries are systems which have evolved to the point where both stars overflow their Roche lobes and are now sharing a common atmosphere (Ivanova et al., 2013). Over time they approach the same surface temperature as the common atmosphere reaches a thermodynamic equilibrium. The two stars are highly ellipsoidally distorted and will eventually merge due to orbital decay.

Figure 1.16: Phased SkycamT light curve of the β Lyrae eclipsing binary ES Librae with a period of 0.883 days.

Figure 1.17 shows the phased SkycamT light curve of the eclipsing binary AM Leonis with both primary and secondary eclipses being of similar size.

The variability of non-eclipsing binaries is a result of gravitational or magnetic effects from the interacting stars producing a photometric variation. This variability is generally of lower amplitude than that of the eclipsing binaries, but a larger range of orbital plane inclinations relative to Earth is sufficient to produce it, allowing many more variables of this type to exist.

• Ellipsoidal variables

Ellipsoidal variables are gravitationally interacting binary stars with a small orbital separation. Both stars are distorted by each other's gravity into ellipsoidal shapes, as their radii are close to their Roche lobes. As the two stars orbit they present a different profile to the observer, ranging from circular to egg-shaped (Percy, 2008). This variation in the size of these profiles results in a small amplitude photometric variation with a sinusoidal shape, although if the binary stars differ in radii this shape can be deformed. The amplitude of these variations is dependent on how close the stellar radius is to the Roche lobe, although the amplitudes rarely exceed 0.1 mag (Morris, 1985). Binaries close enough to produce ellipsoidal variability rarely have periods greater than 5 days. Figure 1.18 shows the phased SkycamT

Figure 1.17: Phased SkycamT light curve of the W Ursae Majoris eclipsing binary AM Leonis with a period of 0.366 days.

light curve of the ellipsoidal variable NSVS 2600150. It is one of the only ellipsoidal variables with an amplitude large enough to be detected in this dataset.

• RS Canum Venaticorum variables

RS Canum Venaticorum stars are close orbiting detached binary systems whose components differ in surface temperature. The cooler component is a giant or subgiant star with spectral classes of G or K. The hotter secondary is a subgiant or dwarf star of spectral class F or G (Percy, 2008). There is also a subtype of RS Canum Venaticorum variables, named V471 Tauri variables, where the hotter component is a white dwarf. Powerful magnetic chromospheric activity on the cooler binary component produces substantial starspot activity in the photosphere of the giant star. These starspots can potentially obscure up to 50% of the stellar surface (Hall, 1981). The spots are concentrated on the surface facing the hotter companion star. Therefore, as the stars orbit each other this introduces a rotationally modulated variability with a period similar to the orbital period and amplitudes of up to 0.6 mag in the V band (Percy, 2008). RS Canum Venaticorum binaries can also exhibit eclipses in systems of sufficient inclination to allow for them (Sowell et al., 1983). The light curves exhibit a sinusoidal 'distortion wave' with an amplitude and phase

Figure 1.18: Phased SkycamT light curve of the ellipsoidal binary NSVS 2600150 with a period of 46.73 days.

that slowly vary relative to the orbital period and, if present, the eclipses. These variations are a result of changes in the location and size of the starspots (Hall, 1981; Percy, 2008).

1.4.3 Rotational Variability

The stellar surface is full of active, energetic processes such as stellar flares, starspots, convection cells of hot gas and powerful magnetic events. These events are often localised to specific locations on the stellar surface, and as the star rotates these photometrically active regions move into and out of view. This can produce periodic photometric variability equal to the rotational period of the star, although as many of these processes themselves vary over characteristic timescales, the variability will fluctuate between cycles. A number of well-known variables are understood to be a result of rotational variability. These variable types are now individually discussed.

• Sunlike stars

The Sun has been observed to have various concentrations of sunspots which vary over an 11 year activity cycle. As the Sun rotates over its 24.47 day period these sunspots would be rotated in and out of view. If viewed from interstellar distances,

Figure 1.19: Phased SkycamT light curve of the α2 Canum Venaticorum variable AR Capricorni with a period of 1.623 days.

this would produce micro-variability with an amplitude of about 0.01 mag in the V band (Percy, 2008). During the solar minimum this variability would all but disappear due to the low numbers of sunspots present. The light curves from sunlike variability would be reasonably sinusoidal in shape with possible phase changes throughout multiple cycles assuming a uniform distribution of starspots (Percy, 2008).

• α2 Canum Venaticorum variables

α2 Canum Venaticorum variables are chemically peculiar stars with spectral classes of late B to early F. As discussed under the rapidly oscillating Ap (roAp) stars, chemically peculiar stars are a result of strong magnetic fields allowing the migration of heavy elements to the surface of the star (Kurtz, 1982). Whilst few of these stars pulsate like the roAp stars, many still exhibit a periodic variability equal to the stellar rotation period. This is a result of an inhomogeneous distribution of these heavy elements across the stellar surface, producing brightness changes as a function of stellar latitude and longitude (Percy, 2008). As the star rotates, these differentially bright regions are rotated in and out of view. They have sinusoidal light curves with strict periods equal to the stellar rotation period, usually a few days, and amplitudes of usually 0.02–0.05 mag, although some have amplitudes of up to 0.3 mag (Percy, 2008). Figure 1.19 shows the phased SkycamT light curve

Figure 1.20: Phased SkycamT light curve of the BY Draconis variable V0584 Andromedae with a period of 13.71 days.

of the α2 Canum Venaticorum variable AR Capricorni with an amplitude of 0.25 mag, close to the maximum of the class.

• BY Draconis variables

BY Draconis variables are cool stars of spectral type K and M. Their variability is due to large numbers of cool starspots and the stellar rotation. They are found in single and binary systems and have a relatively fast stellar rotation period (Percy, 2008). They have typical periods of 1–10 days and amplitudes of up to 0.3 mag in the V band, but more typically around 0.1 mag. As the starspots appear, grow and decay, the amplitude, phase and shape of the light curve slowly change with time. They can also exhibit flare-like activity. Figure 1.20 shows the phased SkycamT light curve of the BY Draconis variable V0584 Andromedae.

• FK Comae Berenices variables

FK Comae Berenices variables are late-type giants with spectral classifications G and K. They rotate extremely rapidly, resulting in an ellipsoidal shape which presents a changing profile depending on the viewing angle of the observer relative to the rotation axis (Bopp and Stencel, 1981). The period of these variables is of the order of a few days, with amplitudes of up to 0.5 mag in the V band although more typically around 0.2 mag (Percy, 2008). Their light curves

are generally sinusoidal in shape due to the ellipsoidal distortion. These stars are possibly formed by a merger event from a contact binary system and are found in both single and binary systems. The fast rotation of these stars allows them to be studied with Doppler Imaging techniques (Korhonen et al., 2005).

• Pulsars

Pulsars are rapidly rotating degenerate stars, either white dwarfs or more often neutron stars. Powerful magnetic fields generate a beam of radiation from their magnetic poles. As the star rotates these beams are swept through space, as the magnetic axis and rotational axis are usually misaligned (Lyne and Graham-Smith, 2006). Neutron stars typically have rotational periods of 0.001 to 10 seconds (Percy, 2008). This results in a very rapid pulsing of electromagnetic energy detectable across much of the electromagnetic spectrum. These beams of energy can also induce photometric variability in a nearby companion as it is heated by the sweeping beam of radiation (Percy, 2008).

1.4.4 Eruptive Variability

Eruptive variability tends to be highly non-periodic and is the result of rapid releases of stellar energy or dust obscuration over variable timescales. It can produce large photometric changes very rapidly, followed by a more gradual return to normal. These processes can be particularly destructive, disrupting the victim star or even obliterating it, leaving a depleted remnant or sometimes nothing at all. Eruptive variability is not necessarily from a stellar source: the energetic supermassive black holes of distant galaxies produce substantial variability due to the accretion of vast quantities of matter and the generation of powerful jets. A number of classes of eruptive variability are now discussed.

• Flare Stars

Flare stars, also known as UV Ceti variables, are dwarf stars of spectral class K and M. They are a common type of variable star due to the numerous population of dwarf K and M stars in the galaxy. These stars are photometrically detectable through their rapid increases in brightness of several magnitudes over timescales of seconds to minutes (Percy, 2008). This rapid release of energy is due to large stellar flares liberating energy stored in the stars' magnetic fields. Smaller flares are more common than larger flares and occur at random (Schaefer et al., 2000).

• Young Stellar Objects

Young Stellar Objects (YSO) are young, pre-main sequence stars with ages of less than ten million years and spectral classes of F, G, K or M. They have larger radii than main sequence stars of these spectral classes as they are still contracting. They exhibit irregular optical variability of up to 5 mag due to surface processes on the forming stars and sometimes the presence of a disc of accreting material (Percy, 2008). They are also known to generate large bipolar outflows. They are subtyped into a number of groups such as classical T Tauri stars, weak-lined T Tauri stars, Herbig AE/BE stars and FU Orionis variables. Classical T Tauri stars have a circumstellar accretion disc whereas weak-lined T Tauri stars lack this disc (Appenzeller and Mundt, 1989). Herbig AE/BE stars are similar to T Tauri stars but have a higher mass and temperature, and FU Orionis stars are T Tauri stars that undergo rapid outburst events with a gradual decay, where they brighten by possibly more than 6 mag and change spectral classification from cool spectral class M variables to a spectral class resembling F or G supergiants (Reipurth, 1990).

• γ Cassiopeiae variables

γ Cassiopeiae variables, also known as Be stars, are non-supergiant stars of spectral class B. They are shell stars: they rotate so rapidly that they are ellipsoidally distorted to the degree that material separates from the surface of the star and forms an equatorial disc (Slettebak, 1982). The orientation, thickness and variable presence of this disc can produce intermittent variability with amplitudes of up to 1.5 mag (Percy, 2008; Slettebak, 1982). A subset of these shell stars, named λ Eridani variables, also exhibit small amplitude variations of up to 0.1 mag with periods of 0.3–2 days. This variability may be rotationally driven or a result of non-radial pulsations (Carrier et al., 2002).

• R Coronae Borealis variables

R Coronae Borealis variables are uncommon, low-mass yellow supergiant variables of spectral classes F and G. They exhibit sudden and dramatic reductions in brightness of up to 10 mag followed by a slow return to maximum light (Clayton, 1996). These events happen unpredictably after long periods of quiescence. The brightness reductions also coincide with a reddening of the star's light, indicating possible extinction by clouds of dust generated by some unknown process in the stellar atmosphere. As these stars are yellow supergiants, some of them have been seen to exhibit pulsational variability (Percy, 2008).

• S Doradus variables (Luminous Blue Variables)

S Doradus variables, also known as Luminous Blue Variables (LBV), are rare, massive blue supergiant variable stars of spectral classes O and B. They exhibit brightness fluctuations of several magnitudes combined with occasional outbursts and dimming of up to 5 mag (Thackeray, 1974). This can lead to these variables having a combined variability range of 10 mag. They are massive, unstable supergiant stars losing 10⁻⁶ M☉ yr⁻¹ of atmosphere through intense stellar winds. They are stars rapidly approaching the end of their lives as a supernova explosion and undergo giant outbursts with increased luminosity and mass-loss as well as the production of surrounding nebulae (Percy, 2008). They can also exhibit short duration variability with periods of under a year and amplitudes of around 0.1 mag as well as stochastic variations. α Cygni variables are an example of Luminous Blue Variables which exhibit multi-periodic non-radial pulsation modes (Percy, 2008).

• Wolf Rayet stars

Wolf Rayet stars are massive, evolved stars with masses of 10-25 M☉ and spectral class O. These objects follow the S Doradus variables and are highly luminous Population I stars (Percy, 2008). They have powerful, high velocity, high temperature stellar winds, losing considerable mass every year, up to 10⁻⁴ M☉ yr⁻¹. They exhibit highly unstable photometric variability possibly powered by rotation, pulsation, binary interactions and variations in the stellar wind. As these objects are highly helium enriched due to their mass loss, they are possibly the progenitors of type Ib and Ic supernovae as discussed below (Percy, 2008).

• Cataclysmic variables

Cataclysmic variables are binary systems containing a degenerate star, usually a white dwarf, and an evolved companion usually of spectral class G to M which has overflowed its Roche lobe (Percy, 2008). They can exhibit complex light curves as an accretion disc or a powerful magnetic field deposits material onto the white dwarf. Small amplitude variability is common due to thermal instabilities from this deposition of material resulting in stochastic variations. Over longer timescales, more interesting activity can occur. Novae are unpredictable increases in the brightness of a cataclysmic variable source by up to 15 mag over a few days (Darnley et al., 2012). This is caused by a thermonuclear reaction when sufficient hydrogen rich material has been deposited on the surface of the white dwarf by the binary companion. These novae can occur over long timescales or, in the case of dwarf novae in close orbiting systems with a dwarf companion, over timescales as short as a week with amplitudes of a few magnitudes. Dwarf novae are not a result of thermonuclear reactions but of an instability in the accretion disc (Percy, 2008). Novae can also be recurrent if they have been detected previously.

Figure 1.21: Raw 2009–2012 SkycamT light curve of the Symbiotic binary Z Andromedae with a bright irregular outburst during the first observation year with a peak brightness at the end of 2009.


• Z Andromedae variables

Z Andromedae variables, also known as symbiotic binaries, are binary systems containing a hot main sequence star of spectral class O to B or a white dwarf, and a cool giant of spectral class M. The binary system undergoes nova-like outbursts due to the transfer of material from the cool giant, which has overflowed its Roche lobe, to the hot companion (Percy, 2008). This extended envelope is excited by the hot star's radiation, which can result in irregular eruptive events. They are complicated systems and can exhibit variability from nova-like outbursts, pulsational variability from the cool giant, eclipses from the orbital motion and violent eruptions which lead to irregular variations in brightness of up to 4 mag (Percy, 2008). Figure 1.21 shows the raw 2009–2012 SkycamT light curve of Z Andromedae exhibiting a major outburst event with a peak brightness in December 2009/January 2010.

• Supernovae

Supernovae are amongst the most powerful events in the universe. Over a timescale of days a star brightens by 10-20 mag potentially outshining its host galaxy and slowly fading over the next few weeks. These outbursts mark the death of massive

stars or the last phase of a cataclysmic variable (Percy, 2008). They irreversibly disrupt the victim star, ejecting its material into an expanding shell of ejecta and occasionally leaving behind an exotic remnant such as a neutron star or black hole. The initial outburst is a result of the shockwave that ejects the stellar atmosphere and the longer duration afterglow is powered by radioactive decay processes within the ejected material (Percy, 2008). These events are classified under the name 'transient' as they are bright and not expected to be a repetitive event. Supernovae are spectroscopically classified into two types with a number of additional divisions depending on the features of their spectra. Supernovae type Ia are the last stage of a cataclysmic variable (Yoon and Langer, 2004). The thermonuclear detonations that power novae do not eliminate all the accreted material, therefore the white dwarf gains mass with each subsequent eruption. Eventually the white dwarf reaches the Chandrasekhar limit, 1.4 M☉, the maximum mass at which a white dwarf can remain stable. A runaway thermonuclear explosion propagates through the star in less than a second, ejecting its material and leaving no remnant. As these explosions all have a similar mass and therefore luminosity, and are extremely bright, they have been used to determine cosmological distances (Kim and Miquel, 2007). Supernovae type II are core collapse supernovae where massive stars are unable to generate more energy and collapse under their own gravity forming an extremely dense core (Heger et al., 2003). A rebounding shockwave propagates out from the core of the collapsing star and ejects the stellar atmosphere in a massive explosion. Supernovae type Ib and Ic are also core collapse supernovae where the progenitors lacked hydrogen in their atmospheres due to some form of atmospheric stripping, possibly by a binary companion (Woosley and Eastman, 1997).

• Active Galactic Nuclei

Active Galactic Nuclei (AGN) are photometrically variable sources generated by supermassive black holes in the centres of distant galaxies. BL Lacertae is a variable source, initially thought to be a variable star, which undergoes stochastic large-amplitude variations of several magnitudes (Falomo et al., 2014). It is now known that BL Lacertae is a Blazar, a type of AGN where the jets generated from the supermassive black hole are orientated towards the observer. The brightest Blazars are BL Lacertae objects and Optically Violent Variables (OVVs) (Percy, 2008). The variable light from these objects is highly polarised during outbursts and is a result of the generation of synchrotron radiation from relativistic particles interacting with magnetic fields in the jet. There can also be a thermal component to this light caused by energy emitted from the supermassive black hole's accretion disc (Urry and Padovani, 1995).

1.5 Aims, Motivations and Research Objectives

In this chapter the background of the Small Telescopes Installed at the Liverpool Telescope has been introduced and a classification scheme detailing multiple variable sources has been described. The STILT data reduction pipeline has reduced the SkycamT and SkycamZ images into photometric light curves calibrated with the US Naval Observatory B catalogue. The STILT database contains the light curves of hundreds of thousands of objects with sufficient observations to determine if a source is variable and to classify the variable sources. The goal of this thesis is the development, implementation and testing of an automated pipeline designed to separate the STILT light curves into variable and non-variable sources. The light curves exhibiting sufficient variability are then classified into variable star classes as described in this chapter. This pipeline must be efficient as it is required to process over half a million light curves. This is an important task as astronomers require a catalogue of interesting objects for their chosen field of study: follow-up observations have a cost in both time and resources, and therefore a selection of targets is required prior to the collection of additional data. The key objectives are:

• Produce a complete literature review that catalogues previous statistical and computational achievements within the fields of period estimation methods and variable light curve detection and classification. This review would be combined with an investigation validating the performance of previous methods on the STILT dataset to produce a control set by which the performance of the proposed novel automated pipeline can be measured. This analysis will also characterise the strengths and weaknesses of the STILT data to assist in the application of novel methods to improve the reliability of the automated pipeline. This objective is explored in chapters 2, 4 and 5.

• Development of a period estimation method designed for the uneven cadence of the STILT light curves. The method is also required to be robust to automation and therefore capable of correcting known failure modes without intervention. This method is to be rigorously tested by applying it to synthetic light curves with controlled noise statistics. These light curves will have regular cadence variants and Skycam cadence variants to determine the relative performance differences with Skycam cadence. This will help clarify the potential issues with the Skycam survey method in the detection of periodic light curves. This objective is explored in chapter 3.

• Identify the multiple statistical features described in the literature review that can be extracted from the STILT dataset in order to determine those that correlate strongly with the desired machine learning classification task. Such results will

improve understanding and strongly assist in the production of novel features. Develop additional features as required to improve the performance of machine learned classifiers. This objective is explored in chapters 5 and 6.

• Develop a method to utilise machine learning algorithms for time-series analysis with the ability to adapt to uneven time-series data to generate automated features. This representation learning approach can allow the machine learning algorithm to construct an independent set of features automatically from the training data without the need for intensive feature engineering. The machine learning models will be compared to the feature engineered models for performance in variable classification tasks. This objective is explored in chapter 6.

• Use variable star catalogues to construct a high quality dataset of light curves of multiple variable star types with known periods for the training of machine learning classification models. This cross-matched dataset is required for the comparison of the period estimation methods. The variable detection and classification models are trained with a number of different machine learning algorithms and their relative performance compared using metrics such as sensitivity and specificity, Receiver Operating Characteristic (ROC) curves, the Area under the Curve (AUC) statistic and the Brier score of the cross-validated and tested models. This objective is explored in chapter 7.

• Upon the training of the best performing variable light curve detection and classification models, apply the automated pipeline to the sufficiently sampled STILT light curves to identify candidate variable sources from the database and determine anomalous light curves for further investigation. This objective is also explored in chapter 7.

1.6 Thesis Structure

The layout of the remaining chapters of this thesis is as follows. Chapter 2 reviews the literature on the period estimation of time-series data with the goal of characterising the value and strength of periodic activity. This can be performed using a number of methods which are detailed and tested on Skycam light curves. The characterisation of the significance of periodic activity and the possible failure modes of the period estimation algorithms are also discussed.

Chapter 3 presents GRAPE: Genetic Routine for Astronomical Period Estimation, a novel method for the rapid and accurate estimation of periods tuned specifically for the Skycam data. The chapter carefully describes the operations present in this method and subjects the technique to a selection of tests to determine the computational and performance characteristics of the method using a set of synthetic data generated to mirror periodic Skycam light curves.

Chapter 4 reviews the capabilities of Machine Learning classifiers in the production of automated classification models. The performance measures used in the determination of the capability of the automated pipeline classification models are also defined.

Chapter 5 introduces the numerous engineered features, the inputs to the classification algorithms. Many of these features have been developed by previous research and are described for their inclusion into the new Skycam classification pipeline. At the end of the chapter a Random Forest classification model is trained using these features to demonstrate the strengths and weaknesses of this approach and to highlight potential improvements to the feature set.

Chapter 6 presents the use of representation learning to allow machine learning algorithms to produce novel features suited to the classification tasks. This chapter introduces the use of Machine Learning algorithms to automatically extract features from the light curves based on an epoch-folded image-based representation. The chapter also introduces a genetic algorithm enhanced polynomial interpolation procedure named the Genetic PolyFit algorithm. PolyFit is a method designed for modelling eclipsing binaries (Prsa et al., 2008) which has been enhanced with a novel genetic algorithm optimisation to accurately determine the fitted model for noisy SkycamT light curves. This algorithm allows the interpolation of the light curve for the removal of noise and the generation of more accurate statistical features of Skycam light curves based on the GRAPE estimated period. The Genetic PolyFit is described in detail and the application of novel automated shape representation features with Principal Component Analysis is discussed.

Chapter 7 details the construction and performance of the final classification pipeline, which builds on the design of the ASAS classifier of Richards et al. (Richards et al., 2012) whilst implementing a novel SkycamT Trend Removal Pipeline based on the Trend Filtering Algorithm (TFA) (Kovacs et al., 2005), the features produced by the Genetic PolyFit in chapter 6 and the period estimation from GRAPE in chapter 3.

Finally, Chapter 8 presents the conclusions of this thesis and proposed future work to build on the conclusions.

Chapter 2

Period Estimation

The classification of variable stars requires the determination of important properties of their light curves. Three of the most important features, especially for pulsating variables, are period, amplitude and colour (Debosscher et al., 2007; Richards et al., 2011b). Colour is easily determined from multiband photometry, although in the case of white light instruments such as the Skycams, this information must be extracted from catalogue data (Mawson et al., 2013). Period and amplitude are linked features: amplitude is best determined by treating it as a function of period to separate the signal and noise components, and an incorrect determination of period will lead to an incorrect amplitude, resulting in two of the three main features performing poorly. Even this understates how destructive a poorly calculated period is, as many features that attempt to characterise the shape and dispersion of periodic light curves are generated within phase space, which itself is dependent on the determined period (Richards et al., 2011b).

It is clear it is worth expending a large fraction of the computational time on a given light curve on correctly estimating the underlying primary period (McWhirter et al., 2016). As a result, multiple methods of period estimation have been developed with a variety of differing approaches. These range from fitting sets of sinusoids to the data to minimising the dispersion of the data points for different candidate periods across a dense grid (Lomb, 1976; Scargle, 1982; Stellingwerf, 1978). The majority of period estimation methods can be grouped into three subtypes: frequency domain methods, time domain methods and epoch folding methods (Huijse et al., 2012). Frequency domain methods utilise Fourier transforms and their uneven-data analogues to deconstruct a light curve into a series of frequency components with associated amplitudes and phases (Schuster, 1898; Lomb, 1976; Scargle, 1982; Zechmeister and Kürster, 2009). Time domain methods identify groups of data points which align with others over given time lags; if enough groups share this correlation it is evidence for an underlying period (Jenkins and Watts, 1968). Finally, epoch folding methods 'fold' the light curves over candidate periods and characterise the resulting phased light curves for correlations and signals at these periods (Lafler and Kinman, 1965; Stellingwerf, 1978; Schwarzenberg-Czerny, 1989; Clarke, 2002). As period estimation methods produce the same object, a real-valued period, they can be used interchangeably depending on the quality of the data and the class of light curve of most interest. These algorithmic approaches may also be combined in linear and non-linear ways to produce statistics that benefit from the individual components (Saha and Vivas, 2017).

Neural Networks have also been used to extract useful information from light curves for period estimation tasks. A prominent example uses a non-linear neural network to extract the important frequencies of a light curve which is then classified using the Multiple Signal Classification (MUSIC) algorithm to recover the primary periods through astrophysical noise (Tagliaferri et al., 1999). Gaussian Processes have been introduced to model the secondary stochastic variability of Mira variables (He et al., 2016). They are good for fitting non-sinusoidal light curves but they require reasonably good sampling to fit the stochastic variations. Wavelets are another set of period finding algorithms suitable for light curves with quasi-periodic variability. These algorithms are utilised to extract information at multiple frequency ranges. This has been used to identify the redshift of detected Gamma Ray Burst (GRB) transient events (Ukwatta and Wozniak, 2015). There is no precisely known 'correct' periodogram for a given survey, although many appear to be similar in performance (Heck et al., 1985), and research has been performed to understand the relationship between these algorithms and survey data (Graham et al., 2013b).

Deep Learning presents another avenue for period estimation. Despite the multiple approaches discussed in this chapter, the 'perfect' periodogram does not yet exist. The perfect periodogram would have a high recovery fraction down to low numbers of observations and/or low signal to noise whilst being independent of the shape of the periodic signal. The learning capability of modern networks can be trained on a set of data with known periods. The performance of these networks can provide a strong indication towards the optimal period estimation approach. Work in this area is only just beginning, with previous deep learning light curve classification approaches still relying on a separate period estimation method prior to processing. An example of this is the deep learning classification applied to Kepler light curves which used a Box Least Squares (BLS) algorithm to process the light curves prior to training (Shallue and Vanderburg, 2018). The first attempts to use deep learning for period estimation have only recently been published in the form of a deep one-dimensional Convolutional Neural Network (CNN) designed to process simulated data from a high-cadence space telescope to search for shallow planetary transits (Zucker and Giryes, 2018; Pearson et al., 2018). The deep CNN outperformed the BLS algorithm in detecting the true simulated transits whilst minimising the false positives despite the presence of correlated noise generated by a Gaussian process. The performance of the deep CNN remained strong in the regime of low signal-to-noise light curves.

The potential for deep learning is very promising with continued research in the next few years as the next generation of surveys become available. A major difficulty of the deep learning approach is formulating a topology for entering the data into the deep learning network. The shallow transit model discussed in the previous paragraph accomplished this by using a high cadence space telescope with well sampled, evenly distributed observations. Such a method is not available for ground based surveys and the current method of phasing a light curve prior to analysis presents a clear problem as the period estimation must be computed prior to the deep learning. Until this problem is resolved, ground based surveys must depend on traditional methods. In this chapter a number of proven methods are discussed based on subtype and their performance compared using a set of STILT light curves. Methods of defining this performance and the possible failure modes are also described.

2.1 Frequency Domain Methods

Fourier analysis is based on Fourier series. A Fourier series is an infinite sum of sine and cosine functions with the frequencies, amplitudes and phases of these component sinusoids providing the required degrees of freedom to exactly represent a time-series such as a light curve (Fourier, 1878). The Fourier transform is a mathematical transformation that can transform the time-series from time-space to frequency-space, where the Fourier components are represented as a set of peaks at the associated frequencies with a magnitude based on the amplitude. The difference between the amplitudes of the sine and cosine functions at the same frequency describes the phase of the original signal component (Debosscher et al., 2007). As the infinite frequencies are impossible to compute for a finite problem, the Discrete Fourier Transform (DFT) allows this computation for a set of discrete frequency components of sufficient density to closely approximate the time-series (Jenkins and Watts, 1968). The Discrete Fourier Transform of a finitely-sampled and evenly-sampled signal x_n with data points n = 0, ..., N − 1, where N is the number of data points, is defined in equation 2.1.

$$X_k = \sum_{n=0}^{N-1} x_n \cdot \exp\left(-\frac{2\pi i}{N} k n\right) \tag{2.1}$$

The conversion from a continuous signal into a discrete signal results in an effect named aliasing where a significant Fourier component can become indistinguishable at a number of different frequencies due to the sampling rate. An appropriate choice of sampling rate can be used to minimise this disruption. For a sampling frequency F_s, the DFT produces a set of frequencies f = (k/N) F_s. The Shannon-Nyquist Sampling Theorem defines a frequency around which reflected alias frequencies become indistinguishable from the true frequencies as a function of the sampling rate of the signal (Shannon, 1949). This is named the Nyquist frequency and is shown in equation 2.2.

$$f_N = \frac{F_s}{2} \tag{2.2}$$

where f_N is the Nyquist frequency and F_s is the sampling rate. The Fourier transform contains the full information on the frequency, amplitude and phase of the sampled signal, but when attempting to identify clear frequencies in the data it is better to analyse the signal power as a function of frequency. The power spectral density is the square of the absolute values of the Fourier transform. The classical periodogram or Schuster periodogram can estimate the power spectrum of a time-series directly from the data and present it as a metric representing the strength of periodic activity across a spectrum of frequencies (the inverse of the period) (Schuster, 1898). The strongest components result in peaks located at the dominant periods in the data. The periodogram is defined in equation 2.3.

$$P_s(f) = \frac{1}{M} \left| \sum_{n=0}^{M-1} x_n \cdot \exp\left(-\frac{j 2 n \pi f}{F_s}\right) \right|^2 \quad \text{where} \; -f_N < f < f_N \tag{2.3}$$

where P_s(f) is the periodogram power at frequency f, M is the number of data points in the time-series, F_s is the sampling rate and f_N is the Nyquist frequency. For an evenly sampled signal the peak of the periodogram will quickly identify the dominant period in a time-series, with a peak at the associated frequency f = 1/P, where f is the frequency with the maximum periodogram value and P is the dominant period of the time-series.

Ultimately, the DFT and Fourier transform are of limited usefulness in light curve period estimation as they require the light curve to be evenly sampled (Lomb, 1976; Scargle, 1982). Applying them to a light curve with uneven data points results in unreliable peaks as the phases of the different frequency components are not consistent. A light curve can be interpolated into evenly sampled points but this typically introduces additional noise into the periodogram and unreliable results (VanderPlas, 2017). The solution is the implementation of a method designed around the correction of the uneven sampling.
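As a concrete illustration of equation 2.3, the short Python sketch below (illustrative code under the stated assumptions, not part of the STILT pipeline) computes the classical periodogram of an evenly sampled signal via the fast Fourier transform and recovers the frequency of a synthetic sinusoid:

```python
import numpy as np

def schuster_periodogram(x, fs):
    """Classical (Schuster) periodogram of an evenly sampled signal,
    per equation 2.3: the squared modulus of the DFT divided by the
    number of samples, at frequencies from 0 up to the Nyquist
    frequency fs/2."""
    m = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / m
    freqs = np.fft.rfftfreq(m, d=1.0 / fs)
    return freqs, power

# Synthetic, evenly sampled sinusoid with a 0.5668 day period,
# observed 100 times per day for 60 days.
fs = 100.0
t = np.arange(0.0, 60.0, 1.0 / fs)
x = np.sin(2.0 * np.pi * t / 0.5668)
freqs, power = schuster_periodogram(x, fs)
print(freqs[np.argmax(power)])  # ~1.76 cycles/day, i.e. 1/0.5668 days
```

The same code applied to an unevenly sampled Skycam light curve would produce the unreliable peaks described above, which motivates the Lomb-Scargle formulation of the next subsection.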

2.1.1 Lomb-Scargle Periodogram

The Lomb-Scargle periodogram (LSP) was developed independently by Lomb and Scargle for the fitting of unevenly sampled time-series (Lomb, 1976; Scargle, 1982). The LSP can be considered in a least-squares view as the minimisation of a sinusoidal model as shown in equation 2.4 (VanderPlas, 2017).

$$y(t; f) = A_f \sin\left(2\pi f (t - \phi_f)\right) \tag{2.4}$$

where A_f is the amplitude and φ_f is the phase of the model at frequency f. By calculating a χ²(f) metric at a set of candidate frequencies, shown in equation 2.5, the performance of a fitted sinusoidal model at each frequency can be determined. The power of the Lomb-Scargle periodogram is then proportional to the reciprocal of the χ²(f) values and the minimised χ̂²(f) is located at the best fitting frequency f.

$$\chi^2(f) = \sum_{n=1}^{N} \left(y_n - y(t_n; f)\right)^2 \tag{2.5}$$

where χ²(f) is the χ² statistic at frequency f, N is the total number of data points in the time-series, y_n is the univariate time-series value at the nth data point and y(t_n; f) is the univariate time-series value predicted by the sinusoidal model at frequency f at the time instant t_n.

The solution to the inconsistent phases of the frequency components is to introduce an arbitrary phase correction τ which acts at each frequency to generalise the periodogram. Scargle introduced the periodogram in the form shown in equation 2.6 (Scargle, 1982).

$$P(f) = \frac{A^2}{2} \left( \sum_n y_n \cos\left(2\pi f (t_n - \tau)\right) \right)^2 + \frac{B^2}{2} \left( \sum_n y_n \sin\left(2\pi f (t_n - \tau)\right) \right)^2 \tag{2.6}$$

where P(f) is the periodogram power, y_n is the univariate value at the nth data point, t_n is the nth time instant and A, B and τ are functions chosen so as to correct the phases at each time instant for each frequency. The solutions for these three functions lead to the Lomb-Scargle periodogram as shown in equation 2.7, with the phase correction of equation 2.8 (Lomb, 1976; Scargle, 1982).

$$P_{LS}(f) = \frac{1}{2}\left[\frac{\left(\sum_n y_n \cos(2\pi f(t_n - \tau))\right)^2}{\sum_n \cos^2(2\pi f(t_n - \tau))} + \frac{\left(\sum_n y_n \sin(2\pi f(t_n - \tau))\right)^2}{\sum_n \sin^2(2\pi f(t_n - \tau))}\right] \tag{2.7}$$

$$\tau = \frac{1}{4\pi f}\tan^{-1}\left(\frac{\sum_n \sin(4\pi f t_n)}{\sum_n \cos(4\pi f t_n)}\right) \tag{2.8}$$
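Equations 2.7 and 2.8 translate directly into code. The following sketch is a deliberately simple brute-force evaluation (function name illustrative); it assumes the magnitudes have had their mean subtracted, as the classical fixed-mean LSP requires:

```python
import numpy as np

def lomb_scargle_power(t, y, freqs):
    """Lomb-Scargle power (equation 2.7) with the phase correction
    tau (equation 2.8) at each trial frequency. y is assumed to be
    mean-subtracted."""
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        omega = 2.0 * np.pi * f
        # Equation 2.8: tau decouples the sine and cosine terms.
        tau = np.arctan2(np.sum(np.sin(2.0 * omega * t)),
                         np.sum(np.cos(2.0 * omega * t))) / (2.0 * omega)
        c = np.cos(omega * (t - tau))
        s = np.sin(omega * (t - tau))
        power[i] = 0.5 * (np.sum(y * c) ** 2 / np.sum(c ** 2) +
                          np.sum(y * s) ** 2 / np.sum(s ** 2))
    return power
```

Production implementations use fast approximations rather than this direct O(N) sum per frequency, but the statistic computed is the same.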

Figure 2.1: The Lomb-Scargle periodogram of the star RR Lyrae. The three main peaks are highlighted and are the true period and two sidereal day aliases due to the sampling cadence of ground based surveys.

This equation forms the basis of one of the most common period estimation methods in astronomy today (Debosscher et al., 2007; Richards et al., 2011b, 2012). Figure 2.1 shows a Lomb-Scargle periodogram computed on the variable star RR Lyrae with a period range of 0.1 days to 60 days (Ruf, 1999). The three dominant peaks are the true underlying period and two aliases. Despite the popularity of the approach, it does exhibit a number of drawbacks that inhibit its use. Like the classical periodogram, the LSP suffers from the leakage of spectral power into aliases and harmonics of the true underlying frequency (VanderPlas, 2017). It can also be contaminated by spurious peaks due to sampling periodicities within the time instants and correlated noise. In the worst cases, these two effects can superimpose, resulting in a noisy, aliased peak that has a higher power value than the true result. Close analysis of the underlying sampling periodicities and frequency harmonics is recommended for this method and often requires human study to determine good candidate periods. Another difficulty presented by the LSP is the sinusoidal model it relies on for period fitting. If a periodic signal is highly non-sinusoidal, such as those produced by eclipsing binary light curves, the periodogram power is substantially weakened and therefore even more at risk of aliased and spurious peaks. As a result, research continues to search for a version of this periodogram equivalent to fitting an arbitrarily shaped model to satisfy all possible signals (Schwarzenberg-Czerny, 1996).

Upcoming surveys will gather light curves in a number of different filters or bands. Multiband periodograms are of interest for the processing of this data as the periodicity in each band can be modelled and used to reinforce the confidence in detected frequencies (Vanderplas and Ivezić, 2015). When the light curves are poorly sampled, the multiband generalised Lomb-Scargle algorithm can be improved through the intelligent combination of information from each band using a penalised generalised Lomb-Scargle algorithm, penalising the phases and amplitudes of the different multiband functions (Long et al., 2016).

2.1.2 Generalised Lomb-Scargle Periodogram

The Generalised Lomb-Scargle periodogram (GLS) is an extension to the Lomb-Scargle periodogram through the addition of two main generalisations (Zechmeister and Kürster, 2009). The first is the addition of a floating mean constant term that allows the fitted sinusoids to have a non-zero mean, as shown in equation 2.9.

$$y(t; f) = C(f) + A_f \sin\left(2\pi f (t - \phi_f)\right) \tag{2.9}$$

where C(f) is the floating mean constant term for the sinusoidal model at each frequency. Previously the mean of a univariate time-series would be subtracted from all the data points and the LSP would assume a zero mean. In astronomy this is a risky procedure as light curves can lack data coverage, such as with an object that varies above and below a given instrument's sensitivity limit. When this occurs, the fainter points are potentially not recorded in the light curve. This would result in a truncated light curve displaying only the brightest observations of the variations. In this case, the mean of the recorded data points is brighter than the true mean brightness of the source. For a signal well-modelled by a sinusoid this would likely lead to the LSP underestimating the true period with the half-period harmonic by fitting a lower amplitude sinusoidal model to the bright half of the light curve (VanderPlas, 2017). The second addition is the application of an error term to the χ² fitting procedure, allowing the periodogram to take into account poor quality data points and outliers, as shown in equation 2.10.

$$\chi^2(f) = \sum_{n=1}^{N} \frac{\left(y_n - y(t_n; f)\right)^2}{\sigma_n^2} \tag{2.10}$$

where N is the number of data points and σ_n is the error in the univariate value of the nth data point. The Phase Distance Correlation periodogram extends the GLS periodogram into a correlation measure using the fact that phase is a circular variable. This approach improves on the GLS performance for sawtooth type light curves such as those from Cepheid and RR Lyrae variable stars (Zucker, 2018).
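In practice the generalised periodogram is available in standard libraries. The sketch below assumes astropy's LombScargle implementation: fit_mean=True adds the floating mean term C(f) of equation 2.9, and supplying the per-point errors dy applies the weighting of equation 2.10. The light curve itself is synthetic and purely illustrative:

```python
import numpy as np
from astropy.timeseries import LombScargle

# Illustrative unevenly sampled light curve: 500 observations over
# 1200 days with a 0.5668 day sinusoidal signal and noisy magnitudes.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1200.0, 500))
dy = rng.uniform(0.05, 0.15, 500)
y = 10.0 + 0.3 * np.sin(2.0 * np.pi * t / 0.5668) + rng.normal(0.0, dy)

# Floating mean (equation 2.9) and error weighting (equation 2.10).
ls = LombScargle(t, y, dy, fit_mean=True)
freq, power = ls.autopower(minimum_frequency=1.0 / 60.0,
                           maximum_frequency=10.0)
print(1.0 / freq[np.argmax(power)])  # best-fitting period in days
```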

2.1.3 Variance Ratio Periodogram

The Variance Ratio Periodogram (VRP) is a version of the multi-harmonic Fourier periodograms. Multi-harmonic periodograms fit a harmonic sinusoidal model with extra Fourier components at integer multiples of the candidate frequency (Baluev, 2009; VanderPlas, 2017; Saha and Vivas, 2017). This allows the periodogram to accurately model more non-sinusoidal light curve shapes, producing stronger peaks at the frequencies of these signals. Unfortunately, the additional degrees of freedom in the model allow the periodogram to better fit the noise at an incorrect frequency, increasing the strength of the continuum peaks (VanderPlas, 2017). The multi-harmonic periodogram is displayed in equation 2.11 with K − 1 harmonic terms.

$$y(t; f) = C(f) + A_f^{(0)} + \sum_{k=1}^{K} A_f^{(k)} \sin\left(2\pi k f (t - \phi_f^{(k)})\right) \tag{2.11}$$

This periodogram is useful on the light curves of eclipsing binaries due to their non-sinusoidal nature. A common failure mode of the Lomb-Scargle and Generalised Lomb-Scargle periodograms on these objects is obtaining a peak value at half the true period, as the single sinusoid model cannot closely fit the primary and secondary eclipses (VanderPlas, 2017). The multi-harmonic periodogram is capable of accomplishing this but at the risk of falling victim to noisy peaks. The multi-harmonic periodogram also produces harmonic peaks at integer submultiples of the true frequency due to the harmonics achieving a reasonable performance where the primary frequency component cannot. The more harmonics in the model, the more harmonic peaks in the resulting periodogram, which can result in an incorrect result in an automated period estimation task.

The variance ratio periodogram used in this work functions by fitting a harmonic model of the form shown in equation 2.11 for a set of candidate frequencies. The variance of the light curve is then measured before and after a prewhitening operation performed with the harmonic model for each frequency (Debosscher et al., 2007; McWhirter et al., 2016). The prewhitening operation involves the subtraction of the harmonic model from the light curve, leaving a residual light curve. The ratio between the variance of the residual light curve and that of the initial light curve as a function of frequency is used as the period estimation statistic and is shown in equation 2.12.

$$P_{VRP}(f) = \frac{\sigma_{res}^2(\theta; f)}{\sigma_{lc}^2} \tag{2.12}$$

where P_VRP(f) is the variance ratio for a frequency f, σ²_res(θ; f) is the variance of the residual light curve produced from prewhitening the light curve with the harmonic model θ at frequency f and σ²_lc is the variance of the initial light curve. This statistic has the advantage of being automatically normalised between 0 and 1. A poorly fitted model will remove no variance from the light curve, leaving a variance ratio of 1/1 = 1, whereas a perfect fit will remove all the variance from the light curve, leaving a variance ratio of 0/1 = 0. The best fitting frequencies will minimise the variance ratio periodogram.
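The variance ratio statistic is straightforward to prototype: fit the harmonic model of equation 2.11 by linear least squares at each trial frequency, prewhiten, and take the ratio of variances as in equation 2.12. The sketch below is a minimal illustrative version, not the thesis implementation:

```python
import numpy as np

def variance_ratio_periodogram(t, y, freqs, n_harmonics=3):
    """Variance ratio (equation 2.12) at each trial frequency: fit a
    constant plus sine/cosine pairs at harmonics k*f (equation 2.11)
    by least squares, subtract the model (prewhitening) and compare
    the residual variance to the original. Minima mark good fits."""
    var_lc = np.var(y)
    ratios = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        cols = [np.ones(len(t))]
        for k in range(1, n_harmonics + 1):
            cols.append(np.sin(2.0 * np.pi * k * f * t))
            cols.append(np.cos(2.0 * np.pi * k * f * t))
        design = np.column_stack(cols)
        coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coeffs          # prewhitened light curve
        ratios[i] = np.var(residual) / var_lc   # normalised to [0, 1]
    return ratios
```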

2.2 Time Domain Methods

Time domain methods are a set of methods which compute some measure of similarity between neighbouring data points separated by a candidate time gap (a candidate period). Through the comparison of every pair of data points [x_i, x_j] and their similarity as a function of their time instants [t_i, t_j], a measure of periodic or semi-periodic activity is possible.

2.2.1 Autocorrelation Function

The autocorrelation function is a measure of how similar a signal is to itself when separated by a time lag τ. It is calculated using equation 2.13 for a discrete time-series (Jenkins and Watts, 1968).

$$r_x(\tau) = \frac{1}{N - \tau + 1} \sum_{i=\tau}^{N-1} x_i \cdot x_{i-\tau} \tag{2.13}$$

where τ is a time lag (related to a trial period), r_x(τ) is the autocorrelation for a candidate time lag, N is the number of data points in the time-series and x_i is the ith univariate time-series value. For values of τ with multiple data points sharing similar univariate values x_i, the autocorrelation function r_x(τ) is maximised.

The autocorrelation function has a minimal computational cost, allowing it to be used for a rapid analysis of potential periodic activity. It suffers from degraded performance in situations with multi-periodic signals as one period repositions the time lag separated data points of a second period. Additionally, correlated noise within the time-series produces a slope in the autocorrelation function which may result in false positive period detections. The autocorrelation function is not functional for time-series with unevenly sampled data and requires a modification to the discrete or slotted autocorrelation function, which introduces a hyperparameter named the slot size to generate lags as a function of the slot size (Edelson and Krolik, 1988).
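For an evenly sampled series, equation 2.13 is a few lines of code. The sketch below is illustrative; the slotted variant mentioned above would replace the exact integer lag with all pairs of points whose time separation falls within a slot of some chosen width:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Discrete autocorrelation (equation 2.13) of an evenly sampled
    series for integer lags 0..max_lag, using the 1/(N - tau + 1)
    normalisation given in the text. Peaks indicate candidate
    periods in units of the sampling interval."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        # Sum of products of points separated by tau samples.
        r[tau] = np.sum(x[tau:] * x[:n - tau]) / (n - tau + 1)
    return r
```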

2.2.2 Correntropy Kernelised Periodogram

The Correntropy Kernelised Periodogram (CKP) is based on the concept of Correntropy. Correntropy or Correlated Entropy measures the similarity between the information entropy or content of the data points of a time-series separated by a time lag (Liu et al., 2006; Huijse et al., 2012). It is related to autocorrelation but with a kernel function which measures the similarity in a higher-dimensional kernel space and is defined in equation 2.14.

$$V(t_1, t_2) = E_{x_{t_1} x_{t_2}}\left[\kappa(x_{t_1}, x_{t_2})\right] \tag{2.14}$$

where E is the expectation value of a function and κ is a positive definite kernel function. The application of a Gaussian kernel function allows the definition of the autocorrentropy of a univariate time-series using equation 2.15, with a similar appearance to the autocorrelation definition in equation 2.13.

$$\hat{V}_{\sigma_m}[\tau] = \frac{1}{\sqrt{2\pi}\sigma_m} \cdot \frac{1}{N - \tau + 1} \sum_{i=\tau}^{N} \exp\left(-\frac{||x_i - x_{i-\tau}||^2}{2\sigma_m^2}\right) \tag{2.15}$$

where τ is a candidate time lag, V̂_σm[τ] is the autocorrentropy value at time lag τ, N is the number of data points in the time-series and x_i is the univariate value of the ith data point.

The CKP makes use of a periodic Gaussian kernel function shown in equation 2.16. This function is cyclical and repeats with an arbitrary period P .

$$G_{\sigma_t;P}(z - y) = \frac{1}{\sqrt{2\pi}\sigma_t} \exp\left(-\frac{2\sin^2\left(\frac{\pi}{P}(z - y)\right)}{\sigma_t^2}\right) \tag{2.16}$$

where σ_t is the Gaussian standard deviation and P is a candidate period. Using the correntropy function and the periodic kernel function, the CKP is computed using equation 2.17.

$$\hat{V}_{P(\sigma_t,\sigma_m)}(P_t) = \frac{\sigma_m}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[\frac{1}{\sqrt{2\pi}\sigma_m} \exp\left(-\frac{||x_i - x_j||^2}{2\sigma_m^2}\right) - IP\right] \cdot G_{\sigma_t;P}(t_i - t_j) \cdot \omega(t_i - t_j) \tag{2.17}$$

where V̂_P(σt,σm)(P_t) is the CKP statistic, σ_m and σ_t are hyperparameter variables named the magnitude kernel size and the time kernel size, of which σ_m can be estimated from the time-series, G_σt;P(t_i − t_j) is the periodic kernel function for a trial period P_t, IP is the Information Potential defined in equation 2.18 and ω(t_i − t_j) is a Hamming window defined in equation 2.19.

$$IP = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\sqrt{2\pi}\sigma_m} \exp\left(-\frac{||x_i - x_j||^2}{2\sigma_m^2}\right) \tag{2.18}$$

$$\omega(t_i - t_j) = 0.54 + 0.46 \cdot \cos\left(\frac{\pi (t_i - t_j)}{T}\right) \tag{2.19}$$

where T is the time span of the time-series. The Hamming window smooths the estimation of the periodogram.

The CKP is very flexible as the correlation based metric is independent of the shape of a light curve signal (Huijse et al., 2012). It has two primary weaknesses: the determination of the hyperparameter σ_t and the computational overhead. σ_t has been determined to be primarily dependent on the sampling pattern of the light curves, therefore it should maintain a small range for a set of light curves from the same survey. The computational overhead discourages its use in period estimation tasks for time-series with many observations or over a large period range. Optimisation techniques may assist in exploring the period space more efficiently than a brute-force approach, although the CKP statistic is usually a highly non-linear function of the trial period P_t, which may limit such an approach.
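As an illustration of the correntropy building block, the sketch below implements the Gaussian-kernel autocorrentropy of equation 2.15 for an evenly sampled series. The full CKP of equation 2.17 layers the periodic kernel, the Information Potential and the Hamming window on top of this quantity; the function name and structure here are illustrative only:

```python
import numpy as np

def autocorrentropy(x, tau, sigma_m):
    """Autocorrentropy at integer lag tau (equation 2.15): a Gaussian
    kernel of width sigma_m (the magnitude kernel size) applied to the
    differences between points separated by the lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diffs = x[tau:] - x[:n - tau]
    kernel = np.exp(-diffs ** 2 / (2.0 * sigma_m ** 2))
    return np.sum(kernel) / ((n - tau + 1) * np.sqrt(2.0 * np.pi) * sigma_m)
```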

2.3 Epoch Folding Methods

Epoch Folding is an astronomical analysis technique which involves the mapping of time-series data from a time representation to a phase representation. The phase representation is a function of a candidate period P and when this candidate period aligns with the period of a true underlying periodic signal, each individual oscillation of the signal is superimposed onto one waveform (Larsson, 1996). As poor sampling is a common problem in astronomy, this acts to take numerous oscillations with a minimal number of observations and combine them into one, which reveals the true underlying waveform.

The new phase data points φ_i ∈ [0, 1] represent the decimal periods from the time instants t_i with some arbitrary zero phase epoch, as defined in equation 2.20.

$$\phi_i(P) = \mathrm{mod}\left(\frac{t_i - t_0}{P}\right) \tag{2.20}$$

where φ_i(P) is the phase value of observation i as a function of the candidate period P, t_i is the measurement time of observation i, t_0 is an arbitrarily chosen start time (currently defined so as to make the peak magnitude occur at a phase of 0.25), P is a candidate period for the light curve and the modulus (mod) operation retains just the decimal component of the calculated value (Lafler and Kinman, 1965; Dworetsky, 1983; Schwarzenberg-Czerny, 1989, 1996; Larsson, 1996; Clarke, 2002; Domínguez et al., 2013).

Epoch Folding can be used as the basis of a family of period estimation methods. By establishing a set of trial periods P_t, a set of light curve data points with time instants t_i and observations x_i can be transformed into a set of phases and magnitudes [φ_i, x_i] using equation 2.20. Finally, the magnitudes are re-ordered by ascending phase from 0 to 1. These trial periods can be formulated from a grid search or, alternatively, epoch folding can be used as a follow-up to a previous periodogram, with the trial periods determined from strong candidate frequencies from those tests. Alone, this is sufficient to allow a human to visually inspect the folded light curves (the magnitudes plotted as a function of phase). For example, consider the plot in figure 2.2: the raw light curve of the star CN Andromedae, a β Lyrae type eclipsing binary. This object has an underlying period of 0.463 days (Van Hamme et al., 2001) and when folded around this period using equation 2.20 it produces the characteristic eclipsing binary shape in figure 2.3(a). The data points are clearly aligned into the expected shape, but if folded at an arbitrary incorrect period such as 5.23 days, it produces an unaligned spread of data points devoid of structure, as shown in figure 2.3(b). This analysis of the plotted folded light curve is useful, but it is insufficient for an automated period estimation method (except if some form of computer vision method is used, as described in chapter 6). Therefore some form of statistical metric is required to extract a representation of the folded data points and return a value that reflects how well or poorly the data fit a trial period. In the following subsections, a set of the most popular epoch folding statistical metrics are presented and discussed; they exhibit a number of potent benefits, such as often being completely independent of the light curve shape, a property that the Fourier derived methods (such as the LSP) do not share.

Figure 2.2: The raw SkycamT light curve of the eclipsing binary variable CN Andromedae over the three year 2009–2012 first observing season.

Figure 2.3: Plots of the folded SkycamT light curve of the eclipsing binary variable CN Andromedae at the true period of 0.463 days (Van Hamme et al., 2001) (a) and an incorrect period of 5.23 days (b). Any structure in the 5.23 day folded light curve is primarily a result of the sampling cadence of the SkycamT instrument on this variable.
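Epoch folding itself reduces to a one-line transformation. The sketch below applies equation 2.20 and sorts by phase (with t0 = 0 for simplicity rather than the peak-magnitude convention described above); folding the CN Andromedae light curve at 0.463 days versus 5.23 days would reproduce the contrast between figures 2.3(a) and 2.3(b):

```python
import numpy as np

def epoch_fold(t, x, period, t0=0.0):
    """Epoch fold a light curve (equation 2.20): map each time instant
    to a phase in [0, 1) for the candidate period, then order the
    magnitudes by ascending phase."""
    phases = np.mod((t - t0) / period, 1.0)
    order = np.argsort(phases)
    return phases[order], x[order]
```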

2.3.1 String Length

String Length methods utilise a simple metric to represent the alignment of a phase folded light curve of trial period P_t (Lafler and Kinman, 1965; Dworetsky, 1983). Their name is derived from the idea that a string can be laid on top of the data points such that each phase-ordered data point [φ_i(P_t), x_i] is connected to the previous data point [φ_{i−1}(P_t), x_{i−1}] and the next data point [φ_{i+1}(P_t), x_{i+1}] with straight pieces of string. Finally, the last data point [φ_N(P_t), x_N], where N is the number of data points, is connected to the first data point [φ_1(P_t), x_1] across the overlapping phase boundary. The final length of the string provides a Euclidean distance between each data point across the whole phase-space. For a set of trial periods, this produces a statistic which, when minimised, identifies a trial period which produces a well-aligned folded light curve. This string length is defined in equation 2.21 through the use of trigonometric relations for Euclidean distances (Dworetsky, 1983).

$$d(P_t) = \sqrt{(\phi_1 - \phi_N)^2 + (x_1 - x_N)^2} + \sum_{k=2}^{N} \sqrt{(\phi_k - \phi_{k-1})^2 + (x_k - x_{k-1})^2} \tag{2.21}$$

where d(P_t) is the string length distance as a function of trial period P_t.

The metric in equation 2.21 has the disadvantage of being a function of the number of data points in a light curve. Therefore a normalisation factor was proposed to both eliminate this dependence and limit the maximum value of the function to d(P_t) ≈ 1 at the limit of maximally unaligned phased data (Clarke, 2002). This normalised definition is shown in equation 2.22 and is named the String Length Lafler Kinman (SLLK) statistic (Lafler and Kinman, 1965; Clarke, 2002).

$$T(P_t) = \frac{N-1}{2N} \times \frac{\sum_{i=1}^{N}(x_{i+1} - x_i)^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2} \tag{2.22}$$

where x_i is the magnitude of data point i folded and sorted by trial period P_t, N is the number of data points in the time-series and T(P_t) is the String Length Lafler Kinman (SLLK) statistic as a function of P_t (Clarke, 2002). The minimisation of T(P_t) by a trial period P_t = P_c identifies a candidate period P_c which produces well-aligned epoch folded data independent of the number of light curve data points. As the Fourier methods produce peaks in their associated statistics, the final operation, θ_SLLK(P_t) = 1 − T(P_t), re-orders the period estimation so that when no signal is present θ_SLLK(P_t) = 0, with aligned data peaking towards θ_SLLK(P_t) = 1.
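A minimal sketch of the SLLK statistic of equation 2.22 follows (illustrative code; the wrap-around term closes the string across the phase boundary):

```python
import numpy as np

def sllk(t, x, trial_period):
    """String Length Lafler-Kinman statistic (equation 2.22) for one
    trial period: fold, sort by phase and sum the squared magnitude
    differences between phase-adjacent points, normalised by the
    variance. Smaller values indicate better-aligned folds."""
    phases = np.mod(t / trial_period, 1.0)
    xs = x[np.argsort(phases)]
    n = len(xs)
    # np.roll pairs each point with its phase-successor, including
    # the last-to-first pair across the phase boundary.
    numerator = np.sum((np.roll(xs, -1) - xs) ** 2)
    denominator = np.sum((xs - xs.mean()) ** 2)
    return (n - 1) / (2.0 * n) * numerator / denominator
```

The peak convention statistic is then obtained as θ_SLLK(P_t) = 1 − sllk(t, x, P_t).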

This metric struggles when the trial period P_t is close to the total timespan of the light curve, t_max − t_min. At this limit, the data points no longer sample the phase space sufficiently and large sections of the signal are missing. The string length method can calculate a straight-line Euclidean distance over a large phase-range with no data points. As the minimum possible length of string is a straight line, this results in a minimisation of the string length statistic, not as a result of data points aligned on a signal, but due to a lack of data.

An extension to this idea of aligning data within phase space is a recently proposed periodogram based on the Blum-Kiefer-Rosenblatt (BKR) statistical independence test (Zucker, 2016). Instead of utilising the alignment of the data as in the string-length methods, a rank correlation test is performed on the phase-folded data within each candidate period phase space. As the phase folded light curve aligns at a strong period, the correlation between the magnitude and phase of each data point rises. The peak of this correlation is a good candidate frequency and can be determined from time-series of limited size (Zucker, 2016).

2.3.2 Analysis of Variance

Analysis of Variance (AoV) is a period estimation method based on a family of statistical tests (Schwarzenberg-Czerny, 1989). It requires the partitioning of the phase space into r equally spaced bins. For a trial period P_t, the time-series data is epoch folded and placed into the phase space. The variance between the equally spaced phase bins is determined as shown in equation 2.23, and the variance of the data within each phase bin is computed using equation 2.24.

$$s_1^2(P_t) = \frac{1}{r - 1} \sum_{i=1}^{r} n_i (\bar{x}_i - \bar{x})^2 \tag{2.23}$$

$$s_2^2(P_t) = \frac{1}{N - r} \sum_{i=1}^{r} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \tag{2.24}$$

where x̄ is the mean of the whole time-series, x̄_i is the mean of the data in phase bin i, n_i is the number of data points present within bin i and N is the total number of data points in the time-series. The final statistic is equation 2.23 divided by equation 2.24, as shown in equation 2.25.

$$\theta_{AoV}(P_t) = \frac{s_1^2(P_t)}{s_2^2(P_t)} \tag{2.25}$$

θ_AoV is described as a Fisher-Snedecor statistic and can be tested with the F statistic with parameters N and r (Schwarzenberg-Czerny, 1989). This statistical test provides a value at which the null hypothesis no longer holds for the epoch folded time-series, as one or more of the phase bin means differs substantially from the overall mean across every phase bin. In the event that the light curve has been folded at an incorrect period, the data points are randomly distributed within the phase space and every bin should have similar mean values.

This method is well-suited to the determination of non-sinusoidal signals as it assumes no particular shape for a folded signal. Like the other period estimation methods, it is usually tested by constructing a frequency grid of trial periods across the desired period range. As the F statistic is a significant part of the test, it is trivial to determine confidence limits on the strength of any candidate period. The primary disadvantage of the Analysis of Variance periodogram is the required hyperparameters which define the size, location and number of phase bins required by the method. This necessitates some form of data-driven validation analysis to determine optimal bins for a set of light curve data. Phase Dispersion Minimisation (PDM) is another period estimation method using the variance in phase bins to evaluate trial periods (Stellingwerf, 1978). This statistic is not a Fisher-Snedecor statistic and therefore is not evaluated with the F statistic, but it should perform similarly to AoV (Schwarzenberg-Czerny, 1989). The Plavchan algorithm is a modification of phase dispersion minimisation (Stellingwerf, 1978) that discards the original requirement for phase binning (Plavchan et al., 2008). This is accomplished by running a boxcar smoothing function across the light curve and comparing the smoothed light curve to the original.
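An illustrative sketch of the AoV statistic of equations 2.23 to 2.25, assuming equally spaced phase bins and skipping any empty bins:

```python
import numpy as np

def aov_statistic(t, x, trial_period, n_bins=10):
    """Analysis of Variance statistic (equation 2.25): the ratio of
    the between-bin variance (equation 2.23) to the within-bin
    variance (equation 2.24) of the epoch folded data. Large values
    indicate a strong candidate period."""
    phases = np.mod(t / trial_period, 1.0)
    bins = np.minimum((phases * n_bins).astype(int), n_bins - 1)
    grand_mean = x.mean()
    s1 = 0.0  # between-bin sum of squares
    s2 = 0.0  # within-bin sum of squares
    for b in range(n_bins):
        xb = x[bins == b]
        if xb.size == 0:
            continue  # empty phase bin contributes nothing
        s1 += xb.size * (xb.mean() - grand_mean) ** 2
        s2 += np.sum((xb - xb.mean()) ** 2)
    return (s1 / (n_bins - 1)) / (s2 / (x.size - n_bins))
```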

2.3.3 Entropy Minimisation

Information Theory has developed the concept of information entropy which describes the amount of information held in a dataset of an underlying signal. When a time-series has been epoch folded using a trial period P_t, aligned or ordered phase space data has a lower entropy than unaligned or disordered data (Graham et al., 2013a). Therefore, if a frequency grid of trial periods is produced and the entropy determined for each trial period, a period where the information entropy of the epoch folded data is minimised is a good candidate period for a periodic signal. Information entropy, or alternatively Shannon's entropy, is defined in equation 2.26.

$$H_s(P_t) = -\sum_{i=1}^{N} p(x_i) \log(p(x_i)) \tag{2.26}$$

where H_s is Shannon's entropy, p(x_i) is the probability that the data point x_i has its associated univariate value given an underlying signal and N is the number of data points in the time-series. The probability p(x_i) can be estimated from the epoch folded data through the computation of the 'density' of data points within r equally spaced phase bins upon the normalisation of the folded time-series in both phase and magnitude so that φ ∈ [0, 1] and x_i ∈ [0, 1]. With this transformation, the occupation probability of the univariate data points x within phase bins a_j can be estimated by equation 2.27, where δ is the Dirac delta function (1 when x_i is in the phase bin a_j and 0 otherwise).

$$\mu_p(a_j) = \frac{1}{N} \int_{a_j} \sum_{i=1}^{N} \delta\left[(\phi, x) - (\phi_i, x_i)\right] \, d\phi \, dx \tag{2.27}$$

where μ_p(a_j) is the occupation probability. As this is a direct estimation of p(x_i) across phase bins, this can be substituted into equation 2.26 as 'phase bin data points', producing the computable Shannon's entropy shown in equation 2.28.

$$H_p(P_t, a) = -\sum_{j=1}^{r} \mu_p(a_j) \log(\mu_p(a_j)) \tag{2.28}$$

This computable definition of Shannon's entropy has introduced a hyperparameter in the form of the number of phase bins used in the estimation of the occupation probability. The resulting periodogram of computing this statistic across a frequency grid will be influenced by this, therefore it must be tuned and validated independently of a period estimation task. The method also struggles with aliased signals, as those frequencies have similar information content to the true signal period. Spurious periods are a significant issue as, when epoch folded, they produce a phase space where all data points occupy a small phase range. As this alignment of data points is superior to any true periodic signal, spurious periods produce a much larger response from this method.

A solution to the spurious period problem was proposed in the form of Conditional Entropy minimisation (Graham et al., 2013a). This method introduces a second tuned hyperparameter in the form of magnitude bins. The Conditional Entropy minimisation statistic places an additional requirement in the form of a good alignment in the magnitude bins as well as the phase bins. Data epoch folded at spurious periods, whilst well aligned in the phase bins, are poorly aligned in the magnitude bins, whereas for true astrophysical periods and their aliases, the epoch folded data is aligned in both the magnitude and phase bins.

Through the selection of r phase bins and m magnitude bins, the new occupation probability p(φ_i, x_i) of the set of [a_j, b_k] magnitude-phase bins is determined. The Conditional Entropy statistic is then defined by equation 2.29.

$$H_c(P_t) = \sum_{i,j} p(\phi_i, x_j) \log\left(\frac{p(\phi_i)}{p(\phi_i, x_j)}\right) \tag{2.29}$$

Figure 2.4: Plot (a) demonstrates the Entropy periodogram for the variable star OGLE-BLAP-009. The peak is located at 0.997 days and is a sampling period caused by the sidereal day. Plot (b) shows the epoch folded OGLE light curve of OGLE-BLAP-009 at this period showing the data points concentrated over a small phase range.

Figure 2.5: Plot (a) demonstrates the Conditional Entropy periodogram for the variable star OGLE-BLAP-009. The peak is located at 0.022 days and is the pulsating astrophysical signal from this star (Pietrukowicz et al., 2017). Plot (b) shows the epoch folded OGLE light curve of OGLE-BLAP-009 at this period showing the smooth pulsating modes of this star.

As entropy and conditional entropy period estimation rely on a minimisation of the target statistic, we apply a transformation θ_ent(P_t) = 1 − H_s(P_t), similar to the String Length method (and analogously for the conditional form). This allows the resulting Entropy and Conditional Entropy periodograms to be analysed for peaks associated with maximised values. Figure 2.4 demonstrates the result of an Entropy minimisation period estimation on the pulsating star OGLE-BLAP-009 over a frequency grid of 0.5 cycles/day to 50 cycles/day using 20 phase bins. The estimated period of 0.997 days is clearly a spurious period produced by the alignment of the data points around a sampling period (specifically the sidereal day sampling period). Compare this figure with figure 2.5 using 20×20 phase-magnitude bins, where the estimated period of the variable star is 0.022 days, or 31.94 minutes, which agrees closely with the period estimated by other methods such as the Generalised Lomb-Scargle periodogram (Pietrukowicz et al., 2017).
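An illustrative sketch of the Conditional Entropy statistic of equation 2.29, estimating the occupation probabilities with a normalised two-dimensional phase-magnitude histogram (20×20 bins by default, matching the figure above); minimising this value over a grid of trial periods would give a periodogram like that of figure 2.5:

```python
import numpy as np

def conditional_entropy(t, x, trial_period, phase_bins=20, mag_bins=20):
    """Conditional entropy (equation 2.29) of the epoch folded light
    curve: occupation probabilities are estimated from a 2D histogram
    in normalised phase and magnitude. True periods minimise this;
    spurious sampling periods do not, because their folds remain
    poorly aligned in the magnitude bins."""
    phases = np.mod(t / trial_period, 1.0)
    mags = (x - x.min()) / (x.max() - x.min())
    hist, _, _ = np.histogram2d(phases, mags,
                                bins=[phase_bins, mag_bins],
                                range=[[0.0, 1.0], [0.0, 1.0]])
    p_joint = hist / hist.sum()                    # p(phi, x)
    p_phase = p_joint.sum(axis=1, keepdims=True)   # marginal p(phi)
    p_phase = np.broadcast_to(p_phase, p_joint.shape)
    nz = p_joint > 0.0
    return np.sum(p_joint[nz] * np.log(p_phase[nz] / p_joint[nz]))
```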

2.4 Sampling Frequency

Period estimation methods operate over a frequency range with a finite set of candidate frequencies separated by intervals. As the objects in the database can have short and long period variations, this interval is set as constant to produce a uniform sample across the full frequency range. The lowest frequency corresponds to the longest period that can be expected to be detected by the periodogram and is defined as the reciprocal of the difference between the maximum and minimum modified Julian Date of the object's observations, named the total observation time t_tot. The maximum frequency is an interesting discussion area and is related to the minimum periods that can be found from an object's data. Hypothetically, scanning down to the minimum possible periods for variable stars is recommended. However, for some pulsating white dwarf stars, this can be as low as 1-2 minutes. The Lomb-Scargle Periodogram is not limited to a Nyquist frequency (VanderPlas, 2017). However, at very high frequencies many noisy peaks can be generated, as any data can be fitted well by a model with such high frequencies. In previous methods, a Pseudo-Nyquist frequency was proposed for determining a maximum frequency for unevenly sampled data, which approximates the Nyquist frequency by taking the mean of the individual time intervals between the observations of an object (Shannon, 1949; Debosscher et al., 2007; Percy, 2008; VanderPlas, 2017). This is shown in equation 2.30.

f_{sN} = \frac{1}{2} \left( \frac{1}{\overline{\Delta T}} \right)    (2.30)

where f_{sN} is the Pseudo-Nyquist frequency and \overline{\Delta T} is the mean of the vector of time intervals between observations of an object. In the Richards et al. methodology, the frequencies could rise beyond this value with the periodogram normalised power subjected to a soft penalty term that weakens the peaks beyond the Pseudo-Nyquist frequency based on the relative difference between this frequency and the candidate frequency (Richards et al., 2011b).

Figure 2.6: Plot demonstrating the gaps between data points for the variable star RR Lyrae. The plot contains two dominant peaks due to seasonal sampling but many of the smaller peaks are due to the telescope scheduler not returning to the area of the sky with RR Lyrae.

This frequency is determined by the mean interval between observations in an uneven time-series. Gaps in observations are not considered uneven observations; instead they are treated as times with a lack of observations and do not contribute to the calculation of the Pseudo-Nyquist frequency. Despite this distinction, a globally accepted definition of a 'gap' and an 'uneven sample' was not identified (VanderPlas, 2017). Theoretically, the time intervals between observations could all be considered gaps. This is not really possible as the start and stop times of Skycam observations are unlikely to be an integer number of minutes (the interval between exposures). Ignoring this, the sampling rate could be the Nyquist frequency of the evenly sampled exposure intervals, which is half of the reciprocal of a minute: 720 cycles per day, equivalent to a period of 2 minutes. For the example with no gaps, every interval is included in the calculation of the Pseudo-Nyquist frequency. This results in a frequency that may cause the loss of short period signals from the periodogram. Figure 2.6 demonstrates the gaps between data points for the variable star RR Lyrae. The seasonal observational gaps of this star produce two clear peaks in the plot yet there are numerous smaller gaps due to the scheduled observations of the Liverpool Telescope not resampling the patch of sky containing this star.

Evidence suggests that neither of these options is ideal. For example, the dominant period of the star RR Lyrae has been determined as 0.5668 days (Kolenberg et al., 2010). From the SkycamT data for this star, if none of the time intervals are considered gaps, the Pseudo-Nyquist frequency results in a minimum detectable period of approximately 0.7 days. No candidate frequencies would be evaluated by the periodogram near the 0.5668 day period despite it being a strong peak. By manually allowing for a larger frequency range, this period is detected as the dominant period. Conversely, if the 720 cycles per day frequency is used as the Nyquist frequency as described above, the frequency spectrum is dense enough to result in extreme processing load from the periodogram. Ultimately, the solution is to choose a minimum period in the search range as defined by the objects detectable by, or of interest to, a survey. Alternatively, the use of a method which does not rely on a frequency grid renders this distinction unimportant as long as the method has the required resolution.
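As an illustration of the ambiguity discussed above, the following Python sketch computes the Pseudo-Nyquist frequency of equation 2.30, both with every interval included and with intervals above an arbitrary threshold treated as gaps; the one day threshold is purely illustrative and is exactly the undefined 'gap' convention.

import numpy as np

def pseudo_nyquist(times):
    # Half the reciprocal of the mean interval between consecutive
    # observations (eq. 2.30), with every interval included.
    dt = np.diff(np.sort(times))
    return 0.5 / dt.mean()

def pseudo_nyquist_no_gaps(times, gap_threshold=1.0):
    # Variant excluding intervals above a chosen gap threshold (in days).
    dt = np.diff(np.sort(times))
    return 0.5 / dt[dt < gap_threshold].mean()

For a light curve with large seasonal gaps, the first function returns a low frequency (a long minimum detectable period) whilst the second approaches the Nyquist frequency of the exposure cadence, reproducing the two extremes described in the text.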

2.5 Spurious period detection

In order to remove sampling periodicities from a light curve, the first step is the identification of the spurious periods. These periods are caused by both local periodicities due to the location of the measuring device at a ground based observatory as well as interactions between the sampling time instants and these local periodicities (Huijse et al., 2012; Protopapas et al., 2015; McWhirter et al., 2016). The strongest spurious periods for ground based surveys tend to be related to the sidereal day and a yearly seasonal period. The sidereal day periodicity is generated as a result of no observations being gathered during daytime hours due to scattered sunlight as well as observations being taken when objects have returned to favourable positions in the sky each day. The seasonal periods are usually strongly related to the solar year as, depending on the location of an object in the sky and the location of the telescope on the Earth, the object may only be visible for certain months during the year. The sampling rate can also produce a spurious period in the data as the observations are always an integer multiple of this sampling rate. In the case of highly unevenly sampled data such as the Skycam cadence, these sampling periods tend to be weaker, but they can be a major concern for space-based observations (Neff et al., 2014; McWhirter et al., 2016).

Through the use of the classical, or Schuster, periodogram or the Lomb-Scargle periodogram it is possible to detect the spurious periods for a given light curve (Protopapas et al., 2015; VanderPlas, 2017). If a classical periodogram is utilised on data featuring the time instants of a given object's light curve, but with a constant signal with no magnitude data, the peaks received cannot be caused by variations in the object magnitude due to an astrophysical signal or noise and therefore must be caused by sampling periodicity. The Fourier periodogram is defined as the square modulus of the Discrete Fourier Transform of a set of observations or samples which, by the Wiener-Khinchin theorem, corresponds to the Fourier transform of the signal's autocorrelation function and serves as an estimate of its Power Spectral Density (PSD). In this method, we do not want to provide any magnitude data as this could introduce actual signal periodicity which we do not wish to filter. Therefore, a constant signal is provided; the autocorrelation of a constant is unity, so the autocorrelation component can be removed from the equation, giving equation 2.31.

Figure 2.7: The periodogram of the time instants of the SkycamT light curve of the star RR Lyrae. The three dominant peaks are related to the sidereal day, the half sidereal day alias and an approximately 180 day seasonal periodicity likely due to the sky coordinates of the star.

P_k = \frac{1}{N} \left| \sum_{i=1}^{N} \exp(j 2\pi f_k t_i) \right|^2    (2.31)

where P_k is the spectral power for a candidate frequency k, f_k is a candidate frequency from the set of candidate frequencies and t_i is the time instant value of an observation i. This produces a vector of spectral power values for each frequency k. Figure 2.7 shows the periodogram from this spurious period removal method applied to the SkycamT light curve of the star RR Lyrae.
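The spurious period search therefore reduces to evaluating equation 2.31 over a frequency grid. A minimal Python sketch, with illustrative names, is given below.

import numpy as np

def spectral_window(times, freqs):
    # Classical periodogram of a constant signal sampled at the observation
    # times (eq. 2.31); peaks indicate sampling periodicities only.
    t = np.asarray(times)
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        power[k] = np.abs(np.exp(2j * np.pi * f * t).sum()) ** 2 / len(t)
    return power

For a ground based survey, the strongest peaks of this function are expected near the sidereal day frequency (approximately 1.0027 cycles per day), its harmonics and the seasonal frequencies, as in figure 2.7.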

2.6 The bands method

The computational load from investigating a large frequency spectrum is prohibitive to realistic analysis of large astronomical light curve datasets. It would be advantageous to have access to an efficient method of reliably reducing the size of this frequency spectrum to a subset of useful candidates. This can be accomplished by using a method very similar to the one used above to remove spurious periods (Protopapas et al., 2015). By splitting the light curves into unevenly spaced horizontal bands based on percentiles, a tightly controlled amount of magnitude information can be provided to the period estimation methods, generating new interesting signal related peaks whilst filtering out the sampling periodicities established from the spurious period removal method. The bands method relies on data points of similar magnitude being separated by integer multiples of the period in a periodic light curve. Therefore, if specific magnitude bands are selected, there should be a strong periodic response where these bands differ from the nominal magnitude bands of the light curve.

This method is unable to guarantee that the true astrophysical period for every object will be present in the resulting candidate frequency subset but it can achieve reasonably high accuracy. Additionally, even if the astrophysical period is present in the candidate set, this method alone is highly unlikely to place this value at the top of the predicted frequencies. Therefore, additional periodograms are still required, but they can be more processing intensive on the smaller candidate set yet operate at a similar processing load to inferior methods applied to the whole frequency spectrum (Protopapas et al., 2015). The execution of this method commences with the determination of the derivatives of the light curve. These are simply computed as the gradients between successive time-ordered data points as shown in equation 2.32.

d_i = \frac{x_{i+1} - x_i}{t_{i+1} - t_i}    (2.32)

The raw light curve is then split into ten unevenly spaced horizontal bands based on magnitude percentile. Figure 2.8 demonstrates the split of the raw light curve of the variable RR Lyrae into ten coloured bands. The time instants associated with a magnitude value within each of these ten bands are then binned into ten magnitude percentile bins. For each of the ten bands, equation 2.33 is computed using the derivatives of the data points present in the associated band.

D_j = \sum_{i \in B_j} |d_i|, \quad j = 1, \ldots, 10    (2.33)

This summation describes the variability of each band, with the nominal portion of the light curve having a lower value of D_j than the bands where substantial variation occurs. Therefore, only the top N_b bands are retained with the selected value of N_b providing a trade-off between the rate of correct period estimations and the number of trial frequencies. For each of the retained bands the spectral window function shown in equation 2.34, similar to the function used to detect spurious periods, is computed across a frequency spectrum f with individual frequencies f_m.

Figure 2.8: The raw SkycamT light curve of the star RR Lyrae from March 2009 to March 2012 with ten percentile magnitude bands each colour coded from red, through yellow, green and blue to pink.

P_m = \left| \sum_{k \in B_j} \exp(j 2\pi f_m t_k) \right|^2    (2.34)

where P_m is the spectral power, j is the imaginary unit, f_m is a candidate frequency from the frequency spectrum set and t_k is the time instant of an observation k whose magnitude value is present within band B_j, where j ∈ {1, ..., 10}. For each band, the top N_t frequencies are retained; like the N_b parameter, N_t determines the trade-off between the accuracy of the bands method and the volume of candidate frequencies.

This calculation is performed on a constant signal as in the spurious period example, but by providing only the time instants within a specific band, some magnitude information is applied to the spectral window function. In a periodic light curve, the top and bottom percentile bands are likely to have groups of equally spaced time instants due to this periodic signal (Protopapas et al., 2015). This then translates into an associated peak for each band. The frequency spectrum for each band is then sorted by spectral power in decreasing order. Two arguments have then been supplied for this method, named N_t and N_b, and they define the number of frequencies and bands to retain. The top N_t frequencies from each band are retained with all others rejected and the top N_b bands are retained with the others rejected. This produces a set of candidate frequencies of size N_t N_b, although there are likely to be many duplicates identified within multiple bands which can be filtered out, leaving this as the maximum possible size of the candidate frequency set.

Figure 2.9: The power spectra across the three strongest of ten percentile bands for the star RR Lyrae from the SkycamT light curve data. The red peaks are the most variable band, followed by the blue peaks and lastly the green peaks.
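The complete candidate selection can be summarised in a short Python sketch; the band assignment and tie-breaking details are illustrative simplifications of the published method (Protopapas et al., 2015), not the pipeline implementation.

import numpy as np

def bands_method(times, mags, freqs, n_bands=3, n_top=500):
    freqs = np.asarray(freqs)
    order = np.argsort(times)
    t, x = np.asarray(times)[order], np.asarray(mags)[order]

    # Derivatives between successive time-ordered data points (eq. 2.32).
    d = np.diff(x) / np.diff(t)

    # Assign each data point to one of ten magnitude percentile bands.
    edges = np.percentile(x, np.linspace(0, 100, 11))
    band = np.clip(np.digitize(x, edges[1:-1]), 0, 9)

    # Band variability D_j (eq. 2.33) using the derivative at each point.
    D = np.array([np.abs(d[band[:-1] == j]).sum() for j in range(10)])
    kept_bands = np.argsort(D)[::-1][:n_bands]

    candidates = set()
    for j in kept_bands:
        tk = t[band == j]
        # Spectral window of the band's time instants (eq. 2.34).
        power = np.array([np.abs(np.exp(2j * np.pi * f * tk).sum()) ** 2
                          for f in freqs])
        candidates.update(freqs[np.argsort(power)[::-1][:n_top]])
    return np.array(sorted(candidates))

The returned set, at most N_t N_b frequencies after duplicate removal, is the reduced frequency spectrum passed to the more expensive periodograms.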

As this method will still generate the spurious periods identified in the previous spurious period removal method, all trial periods that match with a spurious period are removed using a Gaussian filter. It is interesting to note that for known periodic objects in the Skycam data, the number of retained trial frequencies tended to be greater than for known non-periodic objects. This is not surprising as the non-periodic objects would contain more sampling periodicity duplicates, as there are no astrophysical periodicities to compete for the limited number of candidate frequencies. Figure 2.9 demonstrates how the power spectrum changes across bands for the star RR Lyrae based on magnitude variations across the top three bands. The red peaks are from the calculation of the magnitude band with the highest variability, the blue peaks are from the band with the second highest variability and the green peaks are from the band with the third highest variability. This periodogram contains the primary spurious period peaks from figure 2.7 with additional peaks containing possible periodic signal components.

2.7 Correlated noise fitting routine

As previously discussed, as well as an astrophysical signal (which can be constant for a non-variable star) and the sampling periodicities, noise results in a substantial contribution to the resulting peaks in a periodogram (VanderPlas, 2017). There is a continuum of small peaks caused primarily by noise and because the time series is not infinitely long. This means that a periodogram on a finite time series with a zero noise component will still have a slight continuum due to patterns in the shape and length of the sampled time period. In fact, another common sampling periodicity in the Skycam object data is simply the total length of time that an object has been observed.

Consider the SLLK period estimation shown in figure 2.10. For a purely white noise component, the power of the noise continuum is independent of frequency. However, in this SLLK period estimation of RR Lyrae, the strength of the noise is clearly increasing at lower frequencies. This is due to the correlated, or red, noise component (Vaughan, 2011). The Lomb-Scargle periodogram does not suffer as large a performance loss as it models a signal as a sinusoid, which the red noise does not fit well. Unfortunately, for periodograms more inclined to model a signal in a non-sinusoidal form, or for epoch folding methods with no defined form, this red noise component becomes significantly more disruptive. The String Length Lafler Kinman period estimation almost always returns a peak value close to the maximum length of the observing time; this is the lowest frequency tested and therefore the frequency that receives the largest contribution from the red noise component.

The candidate frequency chosen from the period estimation is simply the frequency with the largest peak which has not been filtered out as a spurious period. As the red noise causes the low frequency peaks to amplify, often resulting in an incorrect period, a method is needed to subtract away this red noise component, flattening the noise continuum. Unfortunately, this is not easily accomplished when utilising the bands method due to the very rapid rejection of frequencies for performance purposes. In order to perfectly model the continuum, the entire frequency spectrum would need to be calculated using the period estimation algorithm, defeating the purpose of the bands method. Fortunately, it is not necessary to model the red noise to an extreme accuracy, only to the point where enough of the red noise is subtracted from the period estimation to allow the peaks caused by the signal at any frequency to become dominant. Therefore, we develop a simple, but novel, method to collect a number of red noise fitting frequencies, determined to be in the noisy continuum by examining the weak end of the bands method candidate frequencies, and fit the appropriate red noise model using linear regression to these frequencies. The resulting red noise model, customised for the specific period estimation method in question, is then subtracted from the initial period estimation statistic, depleting a significant amount of the red noise contribution to render it undisruptive.

Figure 2.10: The SLLK period estimation of the SkycamT light curve of the variable star RR Lyrae. Without correcting for the exponential noise continuum caused by red noise, the peak period is primarily influenced by noise rather than signal. In this case the estimated period is 66.25 days which completely disagrees with the known astrophysical signal (Kolenberg et al., 2010).

Red noise is a correlated noise with an exponential decrease in power with frequency (Vaughan, 2011). The model followed by the red noise in the Periodograms is shown in equation 2.35.

P_{RN}(f) = k \exp(-A f) + C    (2.35)

where P_{RN}(f) is the associated periodogram statistic contribution sourced from red noise for a given frequency f, and k, A and C are coefficients based on the specific period estimation generated from a specific light curve. The coefficient A determines the rate of exponential decay of the red noise strength across the frequency spectrum, k is the red noise component strength at a frequency of zero (the minimum possible frequency) and C determines the strength of the flat white noise continuum.

Linear regression can be used to model the values of the variables k and C based on candidate frequency power spectrum values extracted from the period estimation. However, A cannot be modelled this way as it is not a linear coefficient and instead controls the model's exponential shape. Therefore it must be approximated by testing a spectrum of possible A values and utilising the value associated with the best fitting model. A spectrum of possible A values is defined over a range from 0 to 10 in steps of 0.01, resulting in 1001 candidates for the A value.

This method can accurately fit the appropriate model to the period estimator, but a selection of candidate frequencies which generate the power spectrum values must be made. These frequencies must have a periodicity contribution sourced almost entirely from red noise and be spread logarithmically across the frequency spectrum, as the primary exponential contribution is detected at low frequencies. Therefore, first the power spectrum of a light curve for a preselected red-noise prone algorithm is generated. These algorithms include the Variance Ratio Periodogram, the String Length Lafler Kinman method and the Correntropy Kernelised Periodogram. This power spectrum is produced after selecting candidate frequencies from the initial frequency spectrum using the bands method. Then a set of logarithmically spaced bins must be generated from which candidate frequencies determined to be red noise dominant are selected. There is an important compromise here: with too many candidates the red noise fit incurs a significant performance penalty, whereas with too few candidates, any chosen frequencies that do have a significant non-red-noise component can have a major effect on the fitted model and therefore distort the red noise elimination.

Fits on a set of SkycamT light curves have shown that 50 red noise candidate frequencies appear to be sufficient for a good fit whilst maintaining pipeline performance, a good trade-off between the quality of the fit and the computation time required to compute the additional frequencies and fit the model. These 50 frequencies are determined by first computing 51 logarithmically separated frequencies between the minimum and maximum frequencies within the initial frequency spectrum. This produces 50 binned regions from each of which one red noise dominated frequency is chosen. A random number generator produces a randomly selected frequency from each of the 50 bins, resulting in 50 candidate frequencies. In order to select good frequencies that have no signal or sampling contributions, any random frequency found to match with a frequency found using the bands method, be it a spurious period or otherwise, is regenerated using the random number generator.

Upon the selection of 50 satisfactory trial frequencies, the same period estimation algorithm is utilised to calculate the associated period estimation statistic at each of these frequencies. Due to the trial frequency selection process, these are assumed to be dominated by the noise contribution. These values are then fitted using 1001 linear regression operations, one for each value of A in the sampling grid. This returns the best fitting values of k, A and C: the value of A comes from the model that minimises the mean squared error of the fit, and k and C are taken from the fitted coefficients of this specific model. This model is then used to whiten the periodogram determined prior to noise-extraction by simply using equation 2.35 to predict the power level associated with noise for each candidate frequency and subtracting it from the original period estimator's statistic for each candidate frequency. This results in a periodogram that has been mostly flattened through the removal of the exponential frequency-dependent red noise component. This technique now allows the period estimation algorithms prone to red noise errors to function as intended. Applying this technique to the SLLK period estimation on RR Lyrae shown in figure 2.10 results in a fitted red noise model with coefficients k = −0.0069, A = 2.16 and C = 0.74. The prewhitened SLLK is shown in figure 2.11 and has a flattened noise continuum. The new highest SLLK peak occurs at 0.5668 days, matching the expected astrophysical period of this variable star (Kolenberg et al., 2010).

Figure 2.11: The SLLK period estimation of the SkycamT light curve of the variable star RR Lyrae with a continuum corrected for red noise. Now the estimated period is 0.5668 days (Kolenberg et al., 2010) which agrees with the known astrophysical signal.
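The fitting routine itself is compact. The following Python sketch, under the assumption that the noise-dominated frequencies and their statistic values have already been collected, grid-searches A and solves for k and C by least squares as described above.

import numpy as np

def fit_red_noise(noise_freqs, noise_power):
    # Fit P_RN(f) = k * exp(-A * f) + C (eq. 2.35). A is grid-searched from
    # 0 to 10 in steps of 0.01; k and C are linear coefficients at each A.
    f = np.asarray(noise_freqs)
    p = np.asarray(noise_power)
    best = None
    for A in np.arange(0.0, 10.01, 0.01):
        X = np.column_stack([np.exp(-A * f), np.ones_like(f)])
        coef, _, _, _ = np.linalg.lstsq(X, p, rcond=None)
        mse = np.mean((X @ coef - p) ** 2)
        if best is None or mse < best[0]:
            best = (mse, coef[0], A, coef[1])
    _, k, A, C = best
    return k, A, C

def whiten(freqs, power, k, A, C):
    # Subtract the fitted red noise model from the periodogram statistic.
    return power - (k * np.exp(-A * freqs) + C)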

2.8 Failure Modes

Period estimation methods are, by definition, designed to select the most likely period from an input time-series by maximising or minimising some statistical function. Despite the best efforts, it is nearly impossible to ensure that all light curves can be correctly estimated and there are a number of common failure modes that the period estimation methods can produce (VanderPlas, 2017). These failure modes are often a result of directly computing some spurious period, as explained above, or some form of aliasing or pseudo-aliasing effect generated as a result of a true astrophysical signal or the interaction of this signal with a spurious period. Figure 2.12 shows the comparison of the true period against the period estimated by a Bayesian Generalised Lomb-Scargle (BGLS) periodogram for 4000 synthetic SkycamT cadence light curves of 4 different sinusoidal and non-sinusoidal shapes. This plot demonstrates the four well known failure modes: the spurious period failure mode, the aliasing failure mode, the harmonic multiple or submultiple failure mode and the symmetric failure mode.

Figure 2.12: The true period against Generalised Lomb Scargle estimated period for 4000 synthetic light curves with SkycamT cadence using 4 different sinusoidal and non-sinusoidal shapes. This plot demonstrates the four most common failure modes with the red lines indicating the harmonic multiples and submultiples up to n = 3 and the blue lines indicating the sidereal day and half sidereal day spurious periods. The green line shows the correct period estimation region with gradient unity.

Fortunately, many of these failure modes have a simple linear or non-linear relationship to the true period allowing any peak to be mapped to a potential true period. However, the true period and a failure mode of the true period may be hard to separate. Using these functional relations, the periodogram performance on the 4000 light curves with a 1% tolerance on period estimation error is: 35.125% of the estimated periods correctly matched the true period, 0.1% were harmonic multiples, 13.875% were harmonic submultiples, 7.175% were an alias of the one day spurious period and 2.3% were an alias of the half day harmonic of the spurious period. The remaining 42.05% of the estimated periods are either a spurious period failure mode or unknown.

The spurious period failure mode is marked by the blue lines in figure 2.12 for both the sidereal day and half sidereal day periods. This failure mode has no dependence on the period of the astrophysical signal (the only failure mode that does not) as it is a result of the spurious period peak becoming dominant in the periodogram. This failure mode is described by equation 2.36.

P_{obs} = (\delta f)^{-1}    (2.36)

where δf is the frequency associated with a spurious sampling periodicity. This may be correctable through the filtering of spurious periods using a classical periodogram. This failure mode often occurs when the signal-to-noise of the astrophysical signal is very weak.

Alias periods are caused by spectral leakage from the periodogram or, in the case of epoch folding, reflections of the true signal caused by a spurious sampling period. As these reflections produce a signal with very similar shape to the true signal, they generate strong peaks in period estimation methods. Noisy data can make this effect worse as the weakened reflections are indistinguishable from the noisy true period signal. Equation 2.37 shows the functional relationship between an aliased period and the true astrophysical period through a spurious sampling frequency δf (VanderPlas, 2017).

P_{obs} = \left( \frac{1}{P_{true}} + n \delta f \right)^{-1}    (2.37)

where δf is the frequency associated with a spurious sampling periodicity, P_{true} is the true underlying astrophysical period and n is an integer. As the alias decreases in strength with higher values of n it is usually acceptable to stop searching for alias periods above n = 4 as they are highly unlikely to produce a dominant peak.

The red lines in figure 2.12 are associated with the harmonic failure mode. Integer multiples of a periodic signal also produce a strong peak in many period estimation methods as the signal aligns well every m oscillations. Non-sinusoidal light curves can also result in harmonic failures in sinusoidally modelled period estimation methods such as the Fourier methods due to a superior sinusoidal model fit at these harmonics. The sampling of a time-series can result in undersampled regions which allow an integer submultiple period to fit the data as sufficiently as the true period. This is an example of the flaw discussed with Lomb-Scargle periodograms and Generalised Lomb-Scargle periodograms where the bottom half of a light curve has been undersampled due to instrument detection limitations, resulting in a half-period fit producing a dominant periodogram peak. A modification of equation 2.37 introduces an additional positive integer value m which can also take reciprocal values m^{-1} for the submultiple periods. This modified version is displayed in equation 2.38 (VanderPlas, 2017).

P_{obs} = \left( \frac{m}{P_{true}} + n \delta f \right)^{-1}    (2.38)

The symmetric failure mode is a result of even symmetry around the f = 0 frequency axis. Every possible candidate period P_c has a reflection −P_c which produces an almost identical periodogram peak (VanderPlas, 2017). Usually this is not a problem as we would never search for a negative period as it does not make physical sense. The harmonic and alias failure modes from equation 2.38 complicate this as they can act on a negative frequency to produce an aliased frequency in the positive frequency space. A positive frequency's alias positioned in the negative frequency space can also reflect along the f = 0 axis into positive frequency space. As a result, equation 2.38 is again modified to show no distinction between positive and negative frequencies by taking its absolute value, shown in equation 2.39. It is also interesting to note that in the case of asymmetric light curves, the alias reflection will also reflect this asymmetry. In the case of a sawtooth Cepheid light curve with a rapid ascending branch and slow descending branch, a reflected alias will produce a clearly variable shape but with a slow ascending branch and fast descending branch (VanderPlas, 2017).

P_{obs} = \left| \frac{m}{P_{true}} + n \delta f \right|^{-1}    (2.39)

Remaining failures are due to noise or, when the true period is close to the time span of the light curve, a period related to the true period or to a known failure mode but outside of the utilised tolerance limit. Employing techniques that investigate these well understood failure modes on candidate periods from the period estimation methods can potentially allow the correction of some of these effects, leading to improved performance. It is important to note that some failure modes such as the alias periods are not distinguishable from the true period using the periodogram alone and additional techniques are required to correct for them.
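A practical consequence is that estimated periods can be tested against the failure mode relations directly. The Python sketch below, with an assumed sidereal sampling frequency and an illustrative tolerance, checks whether an estimated period matches the true period or one of its known failure modes via equation 2.39.

import numpy as np

def failure_mode(p_est, p_true, spur_freq=1.0027, tol=0.01, n_max=4,
                 m_values=(1, 2, 3, 1/2, 1/3)):
    # Test P_obs = |m / P_true + n * df|^-1 (eq. 2.39) for each (m, n).
    # (m, n) = (1, 0) is a correct estimate, (2, 0) a period submultiple,
    # (1/2, 0) a period multiple and (1, +/-1) a one day alias.
    for m in m_values:
        for n in range(-n_max, n_max + 1):
            f_obs = abs(m / p_true + n * spur_freq)
            if f_obs == 0:
                continue
            if abs(1.0 / f_obs - p_est) / p_est < tol:
                return (m, n)
    return None  # unknown failure mode or a noise-dominated result

Applying a mapping of this kind to every estimated period is how hit, multiple, submultiple and alias percentages such as those quoted above can be compiled.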

2.9 Statistical Significance

Reporting uncertainties in period estimation tasks is unusual compared to many other tests. As period estimation methods operate in a frequency space, a convolution operation on the usual time-series, the widths of peaks are more dependent on the sampling rate and time span of the time-series than on the signal-to-noise (S/N) or number of data points of the time-series (VanderPlas, 2017). Therefore techniques that are usually considered reasonable, such as measuring the full-width-half-maximum (FWHM) of a Gaussian function at the peak frequency, carry no meaningful information about the confidence in a candidate frequency.

The uncertainty in a periodogram peak is instead associated with the strength, or height, of the peak. Higher S/N and/or a greater number of observations of the time-series will improve the quality of the fit associated with the period estimation method statistic, resulting in a higher (if a maximising test) or lower (if a minimising test) statistical value (VanderPlas, 2017). Therefore the statistical significance of a peak is based on its height (value) relative to the heights of the frequency peaks in the nominal regions of the periodogram. It is important to determine this statistical significance so as to confirm that the peak of a candidate frequency is significant and not just a noisy result in a non-variable time-series (Frescura et al., 2008). This also quantifies the strength of the primary periodic component in the time-series for use as a feature of the data which can influence the classification of the variable light curves.

2.9.1 False Alarm Probability

The Fourier based methods including the Lomb-Scargle periodogram utilise a False Alarm Probability (FAP) to describe the uncertainty in a periodogram peak. The FAP ∈ [0, 1] is the probability that a peak P_N(f) at frequency f satisfies the null hypothesis, in this case, that the peak was drawn from a signal with no periodic component (VanderPlas, 2017). The False Alarm Probability can be calculated from the value of P_N(f) using equation 2.40 (Scargle, 1982).

p(P_N(f)) = 1 - \left( 1 - \exp(-P_N(f)) \right)^M    (2.40)

where p(P_N(f)) is the probability of obtaining a peak at frequency f with power P_N(f) and M is a variable depending on the number of data points, their cadence and the number of independent frequencies tested by the periodogram (Scargle, 1982; Press et al., 1994).

A confidence level must also be calculated from the M value of the time-series to provide the periodogram power level required for statistical significance at a given type I error probability α. Equation 2.41 shows the calculation of this confidence level Λ.

\Lambda = -\log\left[ 1 - (1 - \alpha)^{1/M} \right]    (2.41)

where Λ is the confidence level for a periodogram and α is the type I error probability, such as α = 0.01.

It has been shown that M is best estimated from the number of data points N in a time-series, and this applies for both evenly and unevenly sampled time-series when the periodogram frequencies are greater than the Nyquist frequency (Horne and Baliunas, 1986; Press et al., 1994). Equation 2.42 shows the determination of M used in the Lomb-Scargle method used here (Ruf, 1999). This determination is dependent on the time-span of the time-series and the number of trial frequencies in the periodogram.

M = \frac{2 n_{out}}{v} \quad \text{where} \quad n_{out} = v (f_{max} - f_{min})(t_N - t_1)    (2.42)

where n_{out} is the number of periodogram frequencies from f_{min} to f_{max}, v is the oversampling rate, t_N is the time instant of the N-th data point (where N is the number of data points) and t_1 is the initial time instant. The FAP is valid only in cases where the errors are normally distributed, or alternatively, in the absence of correlated noise (Frescura et al., 2008). The inclusion of correlated noise into the FAP requires substantial computational effort and therefore it remains uncorrected here; this must be remembered when applying such a method.
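As a worked illustration, the FAP machinery of equations 2.40 to 2.42 reduces to a few lines of Python; the oversampling default follows the v = 5 convention used later in this chapter.

import numpy as np

def fap(peak_power, n_freqs, v=5):
    # Equations 2.40 and 2.42: M = 2 * n_out / v independent frequencies.
    M = 2.0 * n_freqs / v
    return 1.0 - (1.0 - np.exp(-peak_power)) ** M

def confidence_level(alpha, n_freqs, v=5):
    # Equation 2.41: peak height required for type I error probability alpha.
    M = 2.0 * n_freqs / v
    return -np.log(1.0 - (1.0 - alpha) ** (1.0 / M))

For example, with n_out = 50,000 trial frequencies and v = 5 (so M = 20,000), a type I error probability of α = 0.01 requires a normalised peak height of Λ = −log(1 − 0.99^(1/20000)) ≈ 14.5.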

2.9.2 Vuong-Closeness Test

The Vuong-Closeness test is an information theoretic comparison between two parameterised models using the Kullback-Leibler Information Criterion (KLIC) (Vuong, 1989; Baluev, 2012). For a model µ_k(t, θ_k), the estimate of θ_k can be calculated using maximum likelihood from equation 2.43.

\hat{\theta}_k = \underset{\theta_k \in \Theta_k}{\arg\max} \sum_{i=1}^{N} f_k(x_i \mid z_i, \theta_k) \quad \text{where} \quad z_i = [t_i, \sigma_i]    (2.43)

where \hat{\theta}_k is the estimated model parameters for model k, Θ_k is the set of model parameters for model k, x is the vector of univariate measurements, z is the set of univariate measurements and their errors and f_k is the probability density function of each measurement. The Kullback-Leibler Information Criterion (KLIC) is a measure of the relative information content between two discrete probability distributions, in this case the functions of two models with parameter sets θ_1 and θ_2, and is defined in equation 2.44 for a parameter set θ_k.

\mathrm{KLIC}_k(\theta_k) = \int \log\left( \frac{h(x \mid z)}{f_k(x \mid z, \theta_k)} \right) h(x, z)\, \mathrm{d}x\, \mathrm{d}z = E^0_{x,z}\left[ \log \frac{h(x \mid z)}{f_k(x \mid z, \theta_k)} \right]    (2.44)

where h(x, z) represents the true, unknown, joint probability density of a measurement x and the associated errors in z, and h(x|z) is the conditional density of x given associated errors z. For a comparison between two models θ_1 and θ_2 we perform a difference calculation as shown in equation 2.45.

\mathrm{KLIC}_{12}(\theta_1, \theta_2) = \int \log\left( \frac{f_1(x \mid z, \theta_1)}{f_2(x \mid z, \theta_2)} \right) h(x, z)\, \mathrm{d}x\, \mathrm{d}z = E^0_{x,z}\left[ \log \frac{f_1(x \mid z, \theta_1)}{f_2(x \mid z, \theta_2)} \right]    (2.45)

This determines the ability of the first model to produce the underlying distribution relative to the second model and is dependent on E^0, the expectation under the underlying distribution. The Vuong-Closeness test will automatically process the underlying distribution so it does not need to be estimated. The log likelihood ratio for each data point x_i is used to define an empirical variance of the log likelihoods as shown in equation 2.46.

v^2 = \frac{1}{N} \sum_{i=1}^{N} l_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} l_i \right)^2 \quad \text{where} \quad l_i = \log \frac{f_1(x_i \mid z_i, \hat{\theta}_1)}{f_2(x_i \mid z_i, \hat{\theta}_2)}    (2.46)

where \hat{\theta}_k is the best estimated set of parameters for model k and N is the number of data points in the time-series. The Vuong statistic is computed from equation 2.46 by re-normalising the log-likelihood ratio as shown in equation 2.47.

V = \frac{\sum_{i=1}^{N} l_i}{\sqrt{\sum_{i=1}^{N} l_i^2 - \frac{1}{N} \left( \sum_{i=1}^{N} l_i \right)^2}}    (2.47)

The Vuong Closeness test functions by utilising a null hypothesis KLIC_{12} = 0. If the value of |V| is large then the models are well-distinguishable and can be compared, selecting the model with the better likelihood. If V is positive then model θ_1 is the better choice and if V is negative then model θ_2 is the better choice. For low values of |V| the models are indistinguishable and neither model can be reliably selected as a superior option. The Vuong Closeness test can also be adapted into a False Alarm Probability through the calculation FAP = 2Φ(|V|), where Φ(x) is the Gaussian tail function; a cut can be selected so that the probability of the determined value of V is comparable to a confidence level (Baluev, 2012).
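In practice the test requires only the per-point log likelihoods of the two fitted models. A minimal Python sketch, assuming those likelihood vectors are available, is shown below; the Gaussian tail is taken from SciPy.

import numpy as np
from scipy.stats import norm

def vuong_statistic(loglik_1, loglik_2):
    # l_i is the per-point log likelihood ratio; V renormalises its sum
    # by the empirical standard deviation (eq. 2.47).
    l = np.asarray(loglik_1) - np.asarray(loglik_2)
    return l.sum() / np.sqrt((l ** 2).sum() - l.sum() ** 2 / len(l))

def vuong_fap(V):
    # Two-sided tail probability used as a False Alarm Probability
    # (Baluev, 2012); norm.sf is the Gaussian survival (tail) function.
    return 2.0 * norm.sf(abs(V))

A large positive V selects the first model (for instance, the true period fit) over the second (an aliased period fit); values near zero leave the two candidate periods indistinguishable.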

2.10 Performance on STILT data

To guide the selection of an appropriate method for period estimation on SkycamT light curves, some of the period estimation methods as well as the bands method and red noise removal method are tested using a dataset of 859 SkycamT light curves consisting of variable stars from 12 known variability classes shown in table 2.1. They are a subset of a crossmatch with the AAVSO Variable Star Index catalogue with a tolerance of 148″ which have been tested for periodicity using a number of methods and then manually vetted to confirm they are examples of the given class. They are all detected by the GRAPE algorithm discussed in chapter 3 as either a period match with the AAVSO catalogue, or a multiple or submultiple of the catalogue period. The light curves have been processed by the trend removal pipeline documented in chapter 6. The quoted correct estimations and associated failure modes are all determined using a tolerance of 5% (ε = 0.05) on the estimated period relative to the AAVSO catalogue period as defined in the Failure Modes section.

Table 2.1: Data Summary for the 859 object, 12 class, SkycamT light curve dataset.

Number  Class                       Type                 Count
1       β Lyrae                     Eclipsing Binary     57
2       β Persei                    Eclipsing Binary     106
3       Chemically Peculiar         Rotational Variable  18
4       Classical Cepheid           Pulsating Variable   67
5       δ Scuti                     Pulsating Variable   14
6       Mira                        Pulsating Variable   369
7       RR Lyrae Fundamental Mode   Pulsating Variable   26
8       RR Lyrae Overtone Mode      Pulsating Variable   9
9       RV Tauri                    Pulsating Variable   5
10      Semiregular Variable        Pulsating Variable   50
11      Spotted Variable            Rotational Variable  22
12      W Ursae Majoris             Eclipsing Binary     116

2.10.1 Bands Method

The bands method allows for the rapid estimation of useful frequencies to reduce the size of the frequency space to be explored. The method returns a maximum of N_t N_b trial frequencies where N_t is the number of frequencies per band and N_b is the number of bands, sorted by variability. The larger the number of trial frequencies returned, the higher the likelihood that the true frequency of a given light curve is contained within the banded frequency spectrum (Protopapas et al., 2015). This does not answer which of the two input arguments is dominant in this operation. We performed an experiment using the 859 light curve dataset to evaluate the bands method performance on SkycamT light curves. For a given input of N_t and N_b, a set of trial frequencies is generated with the bands method. The frequency-inverted period which has the closest match to the AAVSO catalogue period is then selected as the 'best match' to the desired period. These objects are then measured for hits, submultiples and multiples.

Figure 2.13: Contour plot of the hit rate performance of the bands method on the 859 SkycamT light curves as a function of the two input arguments Nt and Nb.

The bands method is run with a minimum period of 0.05 days up to the timespan of the light curve (the difference between the maximum and minimum time instant of the light curve) in days and with an oversampling factor v = 5, which determines how finely sampled the period range is by defining the step between frequencies shown by equation 2.48 (Ruf, 1999).

f_{step} = \frac{1}{v (t_{max} - t_{min})}    (2.48)

where f_{step} is the step size between frequencies, t_{max} − t_{min} is the timespan of the light curve and v is the oversampling factor. This method is used in the generation of the frequency spectrum in all following period estimation tests. We computed this performance for 16 different bands method runs with N_t = [150, 500, 1500, 3000] and N_b = [1, 3, 5, 7]. Figure 2.13 demonstrates the contour plot produced by determining the hit rate of each experiment and extrapolating the hit rates into a surface. The N_t argument contributes more to the performance of the bands method. This is not surprising as there is only a limited number of highly variable percentile bands and therefore a value of N_b greater than this results in a performance loss. This result mirrors that found from EROS2 light curves in the original paper (Protopapas et al., 2015).
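For reference, the frequency grid shared by all of these tests follows directly from equation 2.48; the short Python sketch below reproduces it for the default settings of a 0.05 day minimum period and v = 5.

import numpy as np

def frequency_grid(times, min_period=0.05, v=5):
    # Linear frequency grid from 1/timespan to 1/min_period with the
    # step size of equation 2.48.
    timespan = times.max() - times.min()
    f_step = 1.0 / (v * timespan)
    return np.arange(1.0 / timespan, 1.0 / min_period + f_step, f_step)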

Table 2.2: Period Estimation algorithm performance on the 859 SkycamT light curve dataset. The LSP has the highest recovery of correct periods and submultiple periods (likely from eclipsing binaries). The VRP has the best performance on minimising the proportion of unknown light curves; however, many of the light curves generate an aliased result around the one day spurious period.

Method     Hit      Multiple  Submultiple  1 day alias  1/2 day alias  Unknown
LSP        29.453%  1.746%    23.166%      16.647%      12.573%        28.056%
VRP        17.737%  1.630%    13.970%      46.217%      10.943%        23.283%
SLLK       4.773%   1.397%    8.964%       30.966%      19.208%        42.142%
SLLK w/RN  12.806%  5.937%    1.630%       0.931%       0.116%         78.696%
CE         3.609%   2.445%    1.513%       30.617%      4.075%         59.371%

2.10.2 Lomb-Scargle Periodogram

The first row of table 2.2 demonstrates the performance of the Lomb-Scargle Periodogram (LSP) on the 859 light curve dataset. The selected period is simply the highest peak from the periodogram run with a minimum period of 0.05 days up to the timespan of the light curve (the difference between the maximum and minimum time instant of the light curve) in days and with an oversampling factor v = 5.

The LSP performs reasonably well by recovering over 50% of the light curves as either hits (correct estimations), multiples or submultiples of the catalogue period. The LSP commonly produces a period submultiple which is half the true period for eclipsing binaries, which make up a substantial part of the 859 light curve dataset (VanderPlas, 2017). The LSP also returned almost 30% of the light curves with a period equal to a one day or half day alias of the catalogue period due to the diurnal sampling rate of Skycam. Almost 30% of the returned periods did not match with the catalogue period in a known way. The total percentages can add up to over 100% as a period can satisfy multiple criteria simultaneously, commonly a submultiple and an alias period.

2.10.3 Variance Ratio Periodogram

The second row of table 2.2 shows the results of the application of the Variance Ratio Periodogram (VRP) to the 859 object dataset. Due to the computational cost of the VRP relative to the LSP, the bands method was first used to preselect a set of candidate frequencies using the settings N_t = 500, N_b = 3. Upon the selection of these frequencies, the VRP was computed for each of them and the highest peak is the selected period. Compared to the LSP which uses a similar but simpler model, the VRP performs significantly worse with over 20% poorer recovery of hits, multiples and submultiples. However, the VRP also exhibited fewer unknown failure modes, with a large number of one day aliased periods. On Skycam it appears that the extra flexibility of the multi-harmonic model allows the detection of a form of the correct periodicity but at the sidereal day aliased period. The improved ability of the VRP to fit more non-sinusoidal shapes appears to be overfitting the large noise component of the Skycam light curves. Alternatively, the bands method may be failing to obtain the correct period, but obtaining the aliased period, limiting the options of the VRP. The VRP may be useful for fine-tuning periods over small search ranges but is not a good choice for the initial check, especially with its computational expense. The single sinusoid of the LSP appears to be a better option.

2.10.4 String Length Lafler Kinman Periodogram

The third row of table 2.2 shows the results of the String Length Lafler Kinman (SLLK) statistic on the 859 object dataset (Clarke, 2002). As with the VRP, the SLLK periodogram is computationally more intensive than the LSP but not to the same extent as the VRP. Therefore a larger set of candidate frequencies was selected by the bands method using the settings N_t = 2500, N_b = 3. The trial period where the SLLK statistic is minimised is then selected as the candidate period for a given light curve. Similar to the VRP periodogram, the models with increased non-sinusoidal capability struggle with the aliasing structure present in the Skycam light curves. The 42.142% unknown failure mode rate is significantly inferior to that of the Fourier based methods, suggesting that they are the better choice for processing SkycamT light curves, likely due to being more resilient to the systematic noise in the data.

The SLLK periodogram is highly disrupted by red noise and therefore the procedure detailed above is used to reduce the effect of this on the frequency continuum. The fourth row of table 2.2 shows the results of the same SLLK periodogram without the application of the red noise removal. Whilst the hit rate is higher, mostly due to the Long Period Variables, which suffer extremely heavily from aliasing in the red noise corrected method but benefit from the red noise bias at longer periods, the overall unknown failure mode rate is catastrophically higher. Almost every shorter period variable light curve has returned a red noise generated longer period. As these light curves have been processed by a trend removal pipeline, although limited to yearly trends, there must be additional correlations in the data which will affect the performance of many period estimation methods. Additionally, when the red noise removal method is applied to the SLLK periodogram, performance is highest on light curves with a period under one day. This suggests that the current method may overfit the long period noise and therefore it might be responsible for the incidence rate of aliased Long Period Variables. These results suggest that the Fourier based methods are the best option for the development of a dedicated SkycamT period estimation pipeline.

2.10.5 Conditional Entropy Periodogram

The last row of table 2.2 shows the performance of the Conditional Entropy (CE) periodogram. Initially it appears that the method has identified many one-day aliases from the data but under closer inspection these are all just the sidereal day. The cadence of the Skycam light curves results in data points having a very clumpy distribution when folded at the astrophysical period. As a result, the modification to the Entropy periodogram to produce Conditional Entropy breaks down and the results equal those of the Entropy periodogram, identifying the dominant spurious period for every light curve. Most of the 859 SkycamT light curves generated a CE period near 0.5 days, 1.0 days or around the 1.0 year period, the same periods identified by computing a spurious period search. As the longer period light curves satisfy the alias requirement at the spurious period, they register as aliases in the table. Of the light curves that satisfied the hit criteria, some were objects with catalogue periods near 0.5 days and 1.0 days whilst the remainder were well sampled Long Period Variables with periods near 1.0 year. Inspection of the periodogram plots indicates that the poor performance of this method is not due to an exponential red noise component. These results clearly show that Conditional Entropy is an extremely poor choice for use on variable cadence data like the Skycam data.

2.10.6 Periodogram Runtime

For each of the computations above, we recorded the time required to compute each statistic, with the experiment being performed on a virtual machine with 4 cores of an Intel Xeon E5-2670 processor and 12 GB of RAM. Figure 2.14 shows the mean runtime of each of the methods as a function of the number of data points in each light curve binned into 10 bins. The SLLK and VRP results include the utilisation of a smaller bands method frequency spectrum as defined in their associated subsections above. All the methods have a quadratic dependency on the number of data points. The implementation used for these algorithms is purely single-threaded in the R environment. All the algorithms analysed in this experiment are natively highly parallelisable as the calculation of the associated statistic for a trial frequency f_k is independent from the calculation of any other trial frequency in the frequency spectrum. This means that with sufficient processing cores, the periodograms can be computed very quickly. For the processing of many light curves, the period estimation task can also be parallelised by computing multiple light curves simultaneously across several processing cores.

The LSP has the largest dependency on the number of data points in the light curves and quickly rises to almost three minutes per light curve with close to 15,000 data points. The CE periodogram is the next most efficient method as it does not require the use of the bands method and follows a shallower, but still quadratic, gradient. The SLLK uses the bands method which cuts down the number of trial periods it evaluates to less than 10% of the number evaluated by the LSP and CE methods. At these settings, it follows a similar runtime to the CE method although it appears to have a slightly larger dependency on the number of data points. The VRP method was substantially slower when computed on low numbers of data points whilst also using the bands method to reduce its frequency spectrum to 1-2% of the LSP frequency spectrum. This is likely due to the complexity of calculating the multi-harmonic model with six learned coefficients for each trial period. However, the method also appears to have a lower dependency on the number of data points, although the number of data points at which this difference becomes appreciable is near the limit of the best sampled SkycamT light curves which make up a tiny fraction of the total database.

Figure 2.14: Plot of the mean runtime of the four periodograms as a function of the number of data points in the light curves binned into 10 bins.

The computational overhead of the bands method components is also examined using the mean runtimes as a function of data points for each of the combinations of N_t and N_b from the evaluation above. Figure 2.15 demonstrates the runtimes for each of the 16 combinations. The plot clearly shows that the bands method also has a quadratic dependency on the number of data points, which is to be expected as it makes use of the classical 'Schuster' periodogram. The interesting result is the large disparity between the runtime dependency of N_t and N_b, with N_t exhibiting virtually zero effect on the computational runtime of the bands method. This makes sense when the algorithm is considered. The value of N_t determines how many frequencies are retained from the classical periodogram, but the whole periodogram must still be computed to decide the N_t retained frequencies. Therefore, the value of N_t has no effect on the number of frequencies computed. On the other hand, N_b does affect the number of classical periodograms which require computation, one per retained band. Therefore it does have a clear contribution to the evaluation runtime.

Figure 2.15: Plot of the mean runtime of the 16 bands method evaluations as a function of the number of data points in the light curves binned into 10 bins.

From the above experiment it is clear that N_t is superior to N_b in the selection of candidate frequencies for the set of 859 SkycamT light curves and it has a reduced computational cost on the calculation of the reduced frequency spectrum. Therefore, when using the bands method, it is important to select as high an N_t as required for good performance on the period selection whilst minimising the computational cost of N_b. The bands method has significant potential and has certainly impressed when used on the noisy and poorly sampled Skycam light curves, but ultimately it still has a computational overhead which can be significant on light curves with many observations.

2.10.7 Conclusion

In this section we have performed experiments to determine the performance and runtimes of several of the period estimation methods described in this chapter. The bands method has obtained good performance despite the limitations of the Skycam data but still requires a quadratic computational overhead even with a good selection of the input arguments N_t and N_b. Many of the periodograms exhibited great difficulty in operating successfully on the Skycam light curves with their substantial aliased components. The Lomb-Scargle Periodogram based methods appear to provide the best results as they have the best performance on correctly estimating the periods of the set of known light curves. This is likely due to the resilience of the method to correlated noise in the light curves due to the simple sinusoidal fit which, whilst best tuned to sinusoidal light curves, still maintains reasonable performance on the more non-sinusoidal options such as the eclipsing binary light curves.

The LSP has the poorest performance in terms of runtime combined with a greater quadratic dependence on the number of data points. This can be mitigated by the use of the bands method or possibly a more powerful optimisation method. In conclusion, for the best performance on Skycam light curves, the utilisation of a Lomb-Scargle based method is recommended with an improved optimisation technique which can limit the quadratic dependency on well observed light curves. As the Skycam data is prone to heavy aliasing, the use of a method which can correct for this is recommended. In the next chapter these recommendations are employed in a new period estimation method using the Vuong Closeness test to correct aliased periods in a set of candidate periods determined by a genetic algorithm optimisation.

Chapter 3

Genetic Routine for Astronomical Period Estimation

This chapter has been accepted by the Monthly Notices of the Royal Astronomical Society Journal with submission ID: MN-18-0502-MJ.R1

In chapter 2 the strengths and weaknesses of traditional period estimation methods on SkycamT light curves were determined. It is clear the successful extraction of the period of a periodic source is of great importance to successful and reliable object classification. Many periodic variable types such as the Mira-type red giant long-period variables, the classical cepheid giants and the population II cepheid giants inhabit a strict range of periods (Glass and Evans, 1981; Stetson, 1996; Tanvir et al., 2005; Yoachim et al., 2009). The shape of many light curves is not sinusoidal and is closer to an asymmetrical sawtooth (Tanvir et al., 2005; Yoachim et al., 2009). This can result in difficulties for methods designed to detect sinusoidal variability. Some periodic sources such as eclipsing binaries have a large number of configurations with a wide range of periods (Prsa et al., 2008). In these cases the precise determination of the period of these eclipses is still required for the generation of the characteristic eclipse shapes (Paegert et al., 2014).

In this chapter we introduce a new approach to facilitate the rapid estimation of light curve periodicity through the use of a tuned genetic algorithm named GRAPE: Genetic Routine for Astronomical Period Estimation. This method combines genetic parameter optimisation with astronomical period estimation functions and alias correction routines, allowing the quick and accurate evaluation of candidate periods across a large potential period range for light curves with many observations (Charbonneau, 1995). The fitness function for a given candidate period is determined by a Bayesian Generalised Lomb-Scargle (BGLS) periodogram capable of detecting periodic variability in light curves with substantial noise components (Mortier et al., 2015; Mortier and Cameron, 2017). Genetic algorithms, by virtue of their ability to quickly move around large, high dimension parameter spaces, allow this period estimation task to be performed with high speed even on light curves with thousands to tens of thousands of data points. The versatility and speed of this method is offset by a difficulty in fine-tuning the solution, as genetic algorithms sacrifice absolute accuracy on the final solution in exchange for locating it in such a large parameter space.

3.1 The frequency spectrum

All period estimation methods produce a statistic which is a function of period or frequency. Of course, this makes perfect sense as determining strong periods is the result we are hoping to obtain. However, many approaches select trial periods through the creation of a frequency spectrum. This discretised vector of candidate periods limits the precision of the best-fit period the algorithms can present. Whether the method looks for a minimum or a maximum of a given statistic, or perhaps something more complicated, the period associated with the optimised statistic is still one of these initial candidates. Period estimation pipelines often mitigate this with a fine-grid search, computing ever finer grids around strong candidates at the cost of increased computational effort (Debosscher et al., 2007; Richards et al., 2011b, 2012). It is also likely that many of the trial frequencies contain nothing of value and large sections of the frequency spectrum may be filtered out prior to more computationally expensive calculations. Yet, despite the capability of frequency spectrum approaches, it is clear that the true period parameter space is a continuous variable space. The discretised frequency spectrum can be oversampled to a degree where frequency estimation errors associated with the finite observational data are larger than the gaps between consecutive candidate frequencies. As true frequencies produce Gaussian shaped peaks due to finite-length data (VanderPlas, 2017), a candidate frequency which is slightly out of position from the true frequency might only be evaluating the side of the Gaussian peak. This would result in a lower valued statistic than would be appropriate for the true period, resulting in an increased potential for an incorrect result. This can be seen in long period variability candidates as the frequency spectrum is reciprocal to the period space. Treating the period parameter space as a continuous variable allows for the correction of this artefact.

Trial period extraction using the bands method was used as a follow-up to the development of the Correntropy Kernelised Periodogram (CKP) (Huijse et al., 2012). Due to the increased computational complexity of this calculation, it was determined that preselecting trial periods was optimal (Protopapas et al., 2015), and the bands method was introduced to conduct this preselection. The approach operates on the idea that data points of similar magnitude are likely to be found at integer multiples of any underlying period within magnitude percentile bands which contain a strong signal component. A candidate variable light curve can be dismantled into a number of percentile bands. $N_b$ percentile bands have their spectral window function computed on a linearly spaced frequency grid and the resulting $N_t$ local maxima are retained. This results in $N_b N_t$ trial periods for further analysis, with the values of $N_b$ and $N_t$ chosen as a trade-off between computational load and correct trial estimation (Protopapas et al., 2015). This trial period selection approach, whilst still relying on frequency grids, suggests that it is possible to rapidly evaluate sections of the candidate period parameter space. With our proposed method, we consider an approach that can treat the period parameter as a continuous variable whilst simultaneously exploring the period parameter space and isolating regions of interest to optimise towards the true underlying period.

3.2 Genetic Algorithms

Period estimation tasks are best described as a global optimisation problem across a continuous parameter space. This optimisation must be solvable with a minimal computation of trial periods to identify a global minimum in a cost function. This cost function is a function of the parameter we wish to evaluate, in this case the period of a light curve, and the trial period at the global minimum is the desired result (Charbonneau, 1995; Rajpaul, 2012). Many optimisation routines have been developed, but the complexity of the underlying cost function can rapidly diminish the performance of many of them until they are no better than a brute-force approach. Evolutionary algorithms are inspired by biological evolution and are capable of exploring a large, possibly high-dimensional, parameter space efficiently whilst also selecting good candidate results with a high precision (McWhirter et al., 2016). Genetic algorithms are the most popular subset of evolutionary algorithms and utilise computational variants of many well-known staples of biological evolution (Holland, 1975). These include natural selection (survival of the fittest), genetic recombination, inheritance and mutation (Rajpaul, 2012). Genetic algorithms have been utilised in a number of problems in astrophysics (Rajpaul, 2012) including period estimation (Charbonneau, 1995). To our knowledge, however, they have not been employed with current generation periodograms, using these highly capable models as a fitness function whilst allowing the genetic algorithm to optimise towards the desired period without requiring a frequency spectrum.

3.3 Bayesian Generalised Lomb-Scargle Periodogram

The Bayesian Generalised Lomb-Scargle periodogram (BGLS) applies a Bayesian framework to the generalised Lomb-Scargle periodogram through a sinusoidal prior on the data (Mortier et al., 2015). This allows the transformation of the power statistic of the Lomb-Scargle periodogram into a posterior probability of frequency $f$ given light curve data $D$ and a sinusoidal model $I$, producing equation 3.1.

$$P(f|D,I) \propto \exp\left(P_{\mathrm{GLS}}(f)\right) \qquad (3.1)$$

where $P_{\mathrm{GLS}}(f)$ is the generalised Lomb-Scargle power and $P(f|D,I)$ is the probability of a fitted candidate frequency given the light curve data and a sinusoidal signal. This exponential transformation results in the suppression of less-significant peaks in the periodogram, such as the noisy continuum. Caution must be taken when applying such a method, as it can eliminate defects in the estimated power spectrum that could give insight into properties of the light curve, such as the associated cadence and possible non-sinusoidal features. Mortier et al. developed the BGLS on the formalisms of the original generalised Lomb-Scargle periodogram (Zechmeister and Kürster, 2009; Mortier et al., 2015). These formalisms are shown in equations 3.2 to 3.10.

$$W = \sum_{i=1}^{N} w_i \qquad (3.2)$$

$$Y = \sum_{i=1}^{N} w_i d_i \qquad (3.3)$$

$$\hat{YY} = \sum_{i=1}^{N} w_i d_i^2 \qquad (3.4)$$

$$\hat{YC} = \sum_{i=1}^{N} w_i d_i \cos(2\pi f t_i - \theta) \qquad (3.5)$$

$$\hat{YS} = \sum_{i=1}^{N} w_i d_i \sin(2\pi f t_i - \theta) \qquad (3.6)$$

$$C = \sum_{i=1}^{N} w_i \cos(2\pi f t_i - \theta) \qquad (3.7)$$

$$S = \sum_{i=1}^{N} w_i \sin(2\pi f t_i - \theta) \qquad (3.8)$$

$$\hat{CC} = \sum_{i=1}^{N} w_i \cos^2(2\pi f t_i - \theta) \qquad (3.9)$$

$$\hat{SS} = \sum_{i=1}^{N} w_i \sin^2(2\pi f t_i - \theta) \qquad (3.10)$$

where $t_i$ is the time instant of data point $i$, $d_i$ is the magnitude of data point $i$, $w_i$ is the weight of data point $i$, $f$ is a candidate frequency and $\theta$ ensures the orthogonality of the sine and cosine terms (as discussed in the Lomb-Scargle section in chapter 2), as shown in equation 3.11.

$$\theta = \frac{1}{2}\tan^{-1}\left(\frac{\sum_i w_i \sin(4\pi f t_i)}{\sum_i w_i \cos(4\pi f t_i)}\right) \qquad (3.11)$$

Combining these expressions into the least squares representation yields solutions for the coefficients of the sine and cosine components with respect to the Bayesian priors, producing the proportionality that describes the probability of the candidate frequencies as a function of the sinusoidal components, shown in equations 3.12 to 3.15.

$$P(f|D,I) \propto \frac{1}{\sqrt{|K|\,\hat{CC}\,\hat{SS}}} \exp\left(M - \frac{L^2}{4K}\right) \qquad (3.12)$$

where:

$$K = \frac{C^2\,\hat{SS} + S^2\,\hat{CC} - W\,\hat{CC}\,\hat{SS}}{2\,\hat{CC}\,\hat{SS}} \qquad (3.13)$$

$$L = \frac{Y\,\hat{CC}\,\hat{SS} - C\,\hat{YC}\,\hat{SS} - S\,\hat{YS}\,\hat{CC}}{\hat{CC}\,\hat{SS}} \qquad (3.14)$$

$$M = \frac{\hat{YC}^2\,\hat{SS} + \hat{YS}^2\,\hat{CC}}{2\,\hat{CC}\,\hat{SS}} \qquad (3.15)$$

As the prior assumes the data has been generated by an underlying sinusoidal signal, important peaks are suppressed for more unusually shaped signals. Therefore this method is usually only recommended in situations where the data is highly sinusoidal, as the aliased features in the generalised Lomb-Scargle method will convey more important information to the user. The approach introduces a hyperparameter named 'white noise jitter'. This parameter determines the variance of the prior sinusoidal model caused by additional signals and correlated noise in the light curve. If this parameter is too low, every frequency model has a vanishingly low probability and the periodogram collapses to zero, whereas if it is too large, the noise is incorporated into the frequency models and noisy peaks are generated in the continuum.
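For concreteness, equations 3.2 to 3.15 can be transcribed directly into code. The sketch below is an illustrative Python rendering under our own naming, without the numerical safeguards of the published BGLS implementations; it returns the logarithm of the proportionality in equation 3.12, which is sufficient for ranking candidate frequencies.

```python
import numpy as np

def bgls_logprob(t, d, err, f, jit=0.0):
    """Log of the BGLS probability (eq. 3.12) at one trial frequency f.
    A sketch following equations 3.2-3.15; `jit` is the white noise
    jitter added in quadrature to the photometric errors."""
    w = 1.0 / (err**2 + jit**2)                       # data point weights
    theta = 0.5 * np.arctan2(np.sum(w * np.sin(4*np.pi*f*t)),
                             np.sum(w * np.cos(4*np.pi*f*t)))  # eq. 3.11
    x = 2*np.pi*f*t - theta
    W, Y = np.sum(w), np.sum(w * d)                   # eqs. 3.2-3.3
    YC = np.sum(w * d * np.cos(x))                    # eq. 3.5
    YS = np.sum(w * d * np.sin(x))                    # eq. 3.6
    C, S = np.sum(w * np.cos(x)), np.sum(w * np.sin(x))  # eqs. 3.7-3.8
    CC = np.sum(w * np.cos(x)**2)                     # eq. 3.9
    SS = np.sum(w * np.sin(x)**2)                     # eq. 3.10
    K = (C**2 * SS + S**2 * CC - W * CC * SS) / (2 * CC * SS)  # eq. 3.13
    L = (Y * CC * SS - C * YC * SS - S * YS * CC) / (CC * SS)  # eq. 3.14
    M = (YC**2 * SS + YS**2 * CC) / (2 * CC * SS)              # eq. 3.15
    return M - L**2 / (4 * K) - 0.5 * np.log(np.abs(K) * CC * SS)
```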

This recommendation does not fully apply in our case, as one of the main aims of this thesis is the development of a completely automated period estimation technique. For a completely automated method, it is somewhat more important to suppress complications in the periodograms, at the risk of lower performance on non-sinusoidal data. As the genetic algorithm must quickly determine the significant frequencies of a light curve, the suppression of the myriad of low-significance peaks assists the optimisation routine. The assumption of an imperfect prior distribution is a worthy price to pay for tighter failure mode control, the suppression of aliased frequencies and more reliable optimisation. These periodograms can be further stacked to reveal fluctuations in the signal-to-noise ratio of detected sine functions over time (Mortier and Cameron, 2017).

3.4 The GRAPE method

We now introduce the method we developed, named GRAPE: Genetic Routine for Astronomical Period Estimation. The method was needed to improve the performance and runtime of period estimation on the Skycam database. As a result of the Skycam instruments not controlling the motion of the Liverpool Telescope, Skycam has a uniquely variable cadence compared to more regular surveys (Mawson et al., 2013). There are no guarantees of when an object will be resampled, and often a cluster of data points with a sampling of minutes is separated by gaps many days long. This can lead to difficulties such as aliasing, where sampling periods reflect signals across the parameter space. Figure 3.1 demonstrates the difference between Skycam cadence and regular cadence for a simulated sinusoidal signal with white noise.

3.4.1 Initialisation method

Genetic algorithms operate by employing a set number of individuals (or phenotypes) named the population. These individuals are a set of candidate solutions (in our case candidate periods). The individuals are distributed across the feature space either randomly or in an organised approach, such as from a previously identified underlying distribution like the bands method. Despite these varying degrees of complication, the improvement over a uniformly random distribution is generally minimal. We describe below how GRAPE establishes its initial population, as it requires definitions of the parameter space. This population is evolved from generation $i$ to generation $i + 1$ until a cut-off point has been reached, either when the best answer reaches a given precision or when a predefined number of generations has been computed (Charbonneau, 1995).

Figure 3.1: A simulated sinusoidal signal with white noise sampled with the more traditional regular cadence and the highly variable Skycam cadence. Top left is the regular cadence raw light curve, top right is the regular cadence phased light curve, bottom left is the Skycam cadence raw light curve and bottom right is the Skycam cadence phased light curve. Skycam cadence clearly introduces new structures into the data which can produce spurious and aliased results in period estimation tasks.

The parameters of the solution, which in our case is just a single period parameter, must be encoded into a string named a chromosome, with each character named a gene. GRAPE utilises base-10 chromosomes where each gene can take an integer value $I \in \mathbb{Z} \cap [0, 9]$. Our encoding process is very simple and depends on the size of the period parameter space to be explored. GRAPE operates in frequency space and the frequency search is performed from a minimum frequency $f_{\mathrm{min}}$, defined in Equation 3.16,

$$f_{\mathrm{min}} = \frac{1}{t_{\mathrm{max}} - t_{\mathrm{min}}} \qquad (3.16)$$

where $t_{\mathrm{max}}$ is the time instant of the last data point and $t_{\mathrm{min}}$ is the time instant of the first data point, up to a maximum frequency of $f_{\mathrm{max}} = 20$ (Charbonneau, 1995). This $f_{\mathrm{max}}$ is equivalent to detecting periods down to 1.2 hours, which is sufficient to detect the variable star classes expected to be present in the Skycam data. The $f_{\mathrm{max}}$ can be increased, if required, to probe shorter duration periods at a cost of increased computational load, as the parameter space to be explored is larger. All units of frequency are in cycles per day ($\mathrm{d}^{-1}$). For any candidate frequency $f_{\mathrm{can}}$, which always obeys $f_{\mathrm{min}} \leq f_{\mathrm{can}} < f_{\mathrm{max}}$, the encoded chromosome is generated by first rescaling the frequency space so that any candidate frequency lies in $[0, 1]$, using Equation 3.17.

$$f_{\mathrm{scaled}} = \frac{f_{\mathrm{can}} - f_{\mathrm{min}}}{f_{\mathrm{max}} - f_{\mathrm{min}}} \qquad (3.17)$$

The frequency is rounded to 10 decimal places and the value of each decimal is encoded into the associated gene, creating chromosomes containing 10 genes. Using this calculation and its inverse, the genetic algorithm can perform generational updates on the chromosomes and extract new candidate frequencies for testing. Whilst we discuss our treatment of period as a continuous variable, it is important to recognise that this method still has a precision limit. The base-10 chromosomes can encode ten decimal places of the normalised frequency space. Skycam light curves have a baseline of approximately 1000 days, which results in a precision of $10^{-10}$ days for extremely short periods and $10^{-3}$ days for periods close to the maximum period. Light curves with baselines of up to several hundred years should still maintain a 0.1 day precision for candidates close to this maximum. By comparison, a frequency spectrum from 0.05 days to 1000 days with an oversampling factor of 10 would only achieve equivalent worst-case precision on periods under 3 days, and degrades to a precision worse than 2 days for periods above 100 days. Therefore, we believe that our precision is sufficient to justify our description of a continuous parameter space and the use of 10 decimal places. As with the $f_{\mathrm{max}}$ calculation, the number of genes can be increased at increased computational cost. The precision of, at worst, $10^{-3}$ days for long period variables using 10 decimal places is sufficient to produce folded light curves suitable for further analysis, which is why this value was selected.
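As an illustrative Python sketch (the helper names and the guard against rounding up to exactly 1.0 are our own), the encoding of Equation 3.17 and its inverse are:

```python
N_GENES = 10   # ten base-10 genes per chromosome

def encode(f_can, f_min, f_max):
    """Encode a candidate frequency as ten base-10 genes (equation 3.17)."""
    scaled = (f_can - f_min) / (f_max - f_min)            # rescale to [0, 1)
    digits = min(int(round(scaled * 10**N_GENES)), 10**N_GENES - 1)
    return [int(c) for c in str(digits).zfill(N_GENES)]   # 10 decimal places

def decode(genes, f_min, f_max):
    """Invert the encoding to recover the candidate frequency."""
    scaled = sum(g * 10**-(i + 1) for i, g in enumerate(genes))
    return f_min + scaled * (f_max - f_min)
```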

The initial GRAPE population (the individuals of generation 1), of size $N_{\mathrm{pop}}$, is generated using two distributions in order to sample sufficiently across the parameter space, prioritising regions where periodic variability is common. The first half of the population is generated using a distribution which is base-10 logarithmic in frequency space, shown in Equation 3.18.

$$\{\mathrm{Pop}_1\}_i = 10^{\,\mathrm{runif} \times \left(\log_{10}(f_{\mathrm{max}}) - \log_{10}(f_{\mathrm{min}})\right) + \log_{10}(f_{\mathrm{min}})} \qquad (3.18)$$

where $\{\mathrm{Pop}_1\}_i$ is the $i$th first-half candidate frequency, rerun for $i = 1, \ldots, 1000$ to give $\{\mathrm{Pop}_1\}$, the set of first-half candidate frequencies; runif is a set of uniformly distributed random numbers generated between 0 and 1; and $f_{\mathrm{min}}$ and $f_{\mathrm{max}}$ are as above. This distribution skews the population towards lower frequencies, and therefore higher periods, yet it still heavily samples the low-period end of the parameter space as it is a function of frequency. The second half of the population is produced by a function linear in period space, and therefore reciprocal in frequency space. This set of individuals is generated using Equation 3.19 with the same definitions as above.

$$\{\mathrm{Pop}_2\}_i = \frac{1}{\mathrm{runif} \times \left(\frac{1}{f_{\mathrm{min}}} - \frac{1}{f_{\mathrm{max}}}\right) + \frac{1}{f_{\mathrm{max}}}} \qquad (3.19)$$

where $\{\mathrm{Pop}_2\}_i$ is the $i$th second-half candidate frequency, rerun for $i = 1, \ldots, 1000$ to give $\{\mathrm{Pop}_2\}$, the set of second-half candidate frequencies. This distribution is highly skewed to low frequencies, and therefore high periods, and is required to ensure the long-period end of the parameter space is sufficiently explored. Had we used a distribution that was simply linear in frequency space, the algorithm would often remain in the low-period end of the parameter space, resulting in a heavy loss of performance. Figure 3.2 demonstrates this candidate generation process, showing a set of generated candidate periods. The candidate frequencies must then be sorted based on their performance on a fitness function which computes how well they fit the data. This sorting is then used to determine which candidate frequencies should be retained in subsequent generations. GRAPE uses the Bayesian generalised Lomb-Scargle periodogram (BGLS) as its fitness function (Mortier et al., 2015; Mortier and Cameron, 2017). This method is quickly computable due to a lack of any operation more complex than simple summations. Despite this light computational complexity, it was one of the best performing periodograms when trialled on SkycamT light curves which had been matched to known periodic variable stars through cross-matching with the American Association of Variable Star Observers (AAVSO) Variable Star Index (VSI) catalogue. We surmise this is likely due to the method's flexibility on population mean and data point weighting, combined with the ability to control the signal and noise components through careful use of the white noise jitter argument. Whilst we make use of the BGLS as our fitness function, it is important to note that the genetic algorithm design is highly modular, and other periodograms or combinations of periodograms may be used. Our main priority was to maintain high accuracy across the entire period space whilst limiting the processing time on the Skycam data, which exhibits 136,420 light curves with more than 2000 data points and can be computationally intensive with the frequency spectrum approach (Mawson et al., 2013).
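The two initialisation distributions of Equations 3.18 and 3.19 can be sketched in Python as follows (illustrative; the seed and function name are our own):

```python
import numpy as np

rng = np.random.default_rng(1)   # illustrative seed

def initial_population(n_pop, f_min, f_max):
    """Half log-uniform in frequency (eq. 3.18), half uniform in period,
    i.e. reciprocal in frequency (eq. 3.19)."""
    u1 = rng.uniform(size=n_pop // 2)
    pop1 = 10**(u1 * (np.log10(f_max) - np.log10(f_min)) + np.log10(f_min))
    u2 = rng.uniform(size=n_pop - n_pop // 2)
    pop2 = 1.0 / (u2 * (1/f_min - 1/f_max) + 1/f_max)
    return np.concatenate([pop1, pop2])
```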

In chapter 2 the parallelisation options of the frequency spectrum methods were discussed. As the fitness function at a given candidate frequency is independent of the other frequencies, the frequency spectrum can be split into multiple groups, run on separate processing threads and reunited to produce the result. For a large set of light curves, the parallelisation can be applied by splitting the light curves into multiple groups across separate processors, where the frequency spectrum for a given light curve is computed on one thread but multiple light curves are analysed simultaneously. Genetic algorithms are also easily parallelisable for a bulk set of light curves, but they are more difficult to parallelise across multiple frequencies. Whilst the fitness function remains independent across the different candidate frequencies, the evolution of the population requires the computation of each candidate frequency in a given generation before the next generation can be evolved. This places a limit on how easily the genetic algorithm can be parallelised. There are potential solutions to this issue, such as the island model used to parallelise genetic algorithms on a Graphics Processing Unit (GPU) (Cantu-Paz, 2000; Pospíchal et al., 2010). In this method, the population is split into multiple independent islands which evolve individually towards different suboptima, with an additional step called migration. Migration allows the transfer of good genetic chromosomes between the islands to influence the evolution towards the best performing individuals across every island. The current implementation of GRAPE does not make use of further parallelisation options such as the island method and is therefore only suited to parallelisation across multiple light curves. Depending on the resources available, with a sufficiently large set of light curves such as those present in the Skycam database, GRAPE is capable of reasonable parallelisation. For an extremely large quantity of processing cores, the frequency spectrum approach may offer faster performance due to the increased potential for parallelisation across both candidate frequencies and light curves. In this case, the frequency spectrum will still be limited by the oversampling precision and it is therefore likely better to apply the additional genetic algorithm parallelisation methods to GRAPE.

Figure 3.2: A demonstration of the creation of the initial population of candidate frequency chromosomes. The top left plot shows the probability distributions used to generate the randomly initialised candidates. The red line with circular points is the logarithmic distribution from equation 3.18 and the blue line with triangular points is the reciprocal distribution from equation 3.19. The top right set shows 20 candidate periods drawn from these two distributions, 10 from each. The bottom right set shows these periods transformed into frequencies and normalised based on the minimum and maximum allowed frequencies for the optimisation. In the bottom left set the candidate frequencies are rounded to 10 decimal places and these 10 decimals become the 10 genes which make up the genetic chromosome for each candidate. The chromosomes are strings of ten digits where each digit represents a gene. This representation allows precision, over our 0.05 to 1000 day period range, of $10^{-3}$ days at the long-period extreme to $10^{-10}$ days at the short-period extreme. This precision allows the optimisation to treat the parameter space as a continuous variable.

3.4.2 Evolutionary Optimisation

The candidate frequency chromosomes, sorted by the BGLS fitness function, must be used to create the next generation, propagating the knowledge gained from the initial population whilst retaining the flexibility to explore regions that were not initially scanned. Genetic algorithms use a mechanism analogous to sexual reproduction to generate the subsequent generation. A number of the previous generation are paired up into $N_{\mathrm{pairups}}$ pairings. Unlike sexual reproduction, a pairup can consist of two copies of the same individual if the genetic algorithm selects it. It is also important to utilise a parameter named selection pressure for this operation. This determines what quantity of the fittest individuals are selected to reproduce: the parents. Selection pressure can take several forms, such as tournament selection, where random subsets of the individuals compete to be selected as a parent, or roulette selection, where a probability of selection summing to one is assigned to each individual based on their fitness function evaluation. For GRAPE we utilise the roulette selection pressure method, which is superior at preserving important genetic diversity compared with simply selecting the top best-fitting individuals. We employ an argument named $P_{\mathrm{fdif}} \in \mathbb{R} \cap [0, 1]$ which defines the contribution of the sorted fitness function to selecting the parent individuals. Setting $P_{\mathrm{fdif}} = 1$ results in a high contribution and essentially results in the best performing individual pairing up with itself $N_{\mathrm{pairups}}$ times. Whilst this would seem to be a good idea, as it uses the best fitting frequency, the loss of all other genetic information greatly hinders the algorithm's ability to explore other areas of the frequency space; the algorithm would likely get stuck at one of the initial candidate frequencies and fail to find the true frequency. On the other hand, $P_{\mathrm{fdif}} = 0$ means that the parents are chosen randomly from the previous generation, so very little learned knowledge of the fitness function is propagated to the next. This results in an algorithm that haphazardly jumps around the parameter space, requiring a 'lucky hit' in one of the final generations to produce a reasonable result. Ultimately, a value must be determined that results in a propagation of fitness information without a complete loss of genetic diversity. It is also extremely important that the rank number of the sorted chromosomes be used, not the raw BGLS statistic, or else the selection might still focus on the best performing chromosome despite the selection pressure. This is due to the large difference between BGLS statistic values for even two similarly performing candidate frequencies (Charbonneau, 1995). Selection pressure is what introduces survival of the fittest into our approach.
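One plausible reading of this rank-based roulette selection, as a Python sketch (the exact way $P_{\mathrm{fdif}}$ blends rank weighting with uniform random choice is our assumption; we also assume a larger fitness value is better, so the sign should be flipped for a cost function):

```python
import numpy as np

rng = np.random.default_rng(1)

def roulette_select(fitness, n_pairups, p_fdif):
    """Draw n_pairups parent pairs. Rank numbers, not raw BGLS values,
    set the selection weights so one dominant peak cannot monopolise
    reproduction; p_fdif in [0, 1] controls how strongly rank matters."""
    n = len(fitness)
    ranks = np.argsort(np.argsort(fitness)) + 1       # 1 = worst, n = best
    weights = p_fdif * ranks / ranks.sum() + (1 - p_fdif) / n
    weights = weights / weights.sum()
    # The same individual may be drawn as both parents of a pairup.
    return rng.choice(n, size=(n_pairups, 2), p=weights)
```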

Upon the selection of the $N_{\mathrm{pairups}}$ reproduction events, the children of these events must be determined. Each pair of parents produces two children as part of the reproduction. These children inherit genetic information from their parents but may also be a new, unique formulation of this information. This means that the reproduction will often produce two new candidate frequencies for evaluation in the next generation. These frequencies use genetic information from the previous generation whilst still exploring a new part of the parameter space. The two mechanisms used for this modification of the parents' chromosomes are called crossover and mutation. Crossover is analogous to how children inherit traits from both their parents: it splits the parent chromosomes into sections and then recombines them into two children with combinations of both parents' chromosomes. GRAPE utilises the simplest form of crossover by performing a single split on both parent chromosomes. First, an argument named $P_{\mathrm{crossover}} \in \mathbb{R} \cap [0, 1]$ determines the probability that any given reproductive event will result in a crossover. If a crossover is triggered during a reproduction, a uniform random number generator selects an integer value in $\mathbb{Z} \cap [1, 10]$. This determines at what location the split will occur. Child chromosome 1 will contain the genes from parent chromosome 1 up to the split location and the genes from parent chromosome 2 after the split location. Child chromosome 2 is the inverse, with the first part being from parent 2 and the second part from parent 1. If no crossover occurs, the child chromosomes are identical to the parent chromosomes at this point. Next the mutations are calculated. Mutations are randomly selected gene changes inside the child chromosomes. For each gene in the two child chromosomes, an argument $P_{\mathrm{mutation}}$ determines the probability that a mutation occurs. A uniform random number generator on $\mathbb{R} \cap [0, 1]$ generates a value for each of the 20 genes in the two child chromosomes. Any gene for which the associated generated value is below or equal to $P_{\mathrm{mutation}}$ is determined to have undergone a mutation. For each mutated gene, a uniform random number generator computes an integer value between 0 and 9 and assigns this value as the new gene value at that location. Mutation allows the genetic algorithm to create new candidate frequencies in previously unvisited locations without requiring the two parents to contain this information in their chromosomes. As each child gene is subjected to this mutation probability, it is usually recommended to use a low value for the later generations to prevent the genetic algorithm from jumping away from optimised answers. Upon the completion of the reproductive events, any new child chromosomes are evaluated using the BGLS and appropriately ranked by the fitness function. Figure 3.3 shows an example of this genetic evolution using some of the chromosomes created in figure 3.2; a code sketch of the reproduction step follows below.
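A single reproductive event under these rules can be sketched as (illustrative Python; the function name is our own):

```python
import numpy as np

rng = np.random.default_rng(1)

def reproduce(parent1, parent2, p_crossover, p_mutation):
    """Single-point crossover then per-gene mutation on two ten-gene
    base-10 chromosomes (lists of integers 0-9)."""
    child1, child2 = list(parent1), list(parent2)
    if rng.uniform() < p_crossover:
        split = int(rng.integers(1, 10))              # split location
        child1 = list(parent1[:split]) + list(parent2[split:])
        child2 = list(parent2[:split]) + list(parent1[split:])
    for child in (child1, child2):
        for i in range(len(child)):
            if rng.uniform() <= p_mutation:           # 'copy error'
                child[i] = int(rng.integers(0, 10))   # new gene value 0-9
    return child1, child2
```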

These operations result in a generation $i + 1$ population of size $N_{\mathrm{pop}} + 2N_{\mathrm{pairups}}$. As each subsequent generation would produce new offspring, the population size would rapidly increase, introducing extra computational complexity with zero benefit. Therefore, we must remove $2N_{\mathrm{pairups}}$ candidate frequencies from the population in a process analogous to biological death, with the exception that, if the algorithm chooses it, any individual could live forever. We employ an 'anti-selection' pressure to accomplish this, which we call the death fraction. This is, like selection pressure, a probability $P_{\mathrm{dfrac}} \in \mathbb{R} \cap [0, 1]$ which determines what proportion of the removed candidate frequencies are drawn from the poorest performing BGLS frequencies versus randomly eliminated chromosomes of any fitness ranking. The only difference from selection pressure is that we always retain the best performing candidate frequency from the current generation to preserve its genetic knowledge (a sketch of this culling step follows below). This last step produces the individuals of generation $i + 1$ and the cycle continues as described towards the production of generation $i + 2$. This cycle is repeated until a predetermined cut-off is reached or generation $N_{\mathrm{gen}}$ is produced. The fittest individual from this final generation is returned as the genetically chosen fittest chromosome; it is then decoded into a chosen frequency and returned to the user. Figure 3.4 demonstrates how the exploration of the period space evolves across the generations due to the propagation of genetic information on the performance of the fitness function for a sinusoidal light curve. It is clear that the whole period space is investigated initially, and by generation 40 the only remaining regions are the true period and its multiples. By generation 80 all regions of the period space apart from that located at the best-fit period are discarded. The final generations are utilised for fine-tuning the ultimate result.
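One plausible reading of this culling step, as a Python sketch (the exact split between ranked and random removals, and the convention that a larger fitness value is better, are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def cull(population, fitness, n_remove, p_dfrac):
    """Remove n_remove individuals: a fraction p_dfrac from the worst
    performers, the remainder at random. The single best individual is
    always protected, preserving its genetic knowledge."""
    fitness = np.asarray(fitness)
    order = np.argsort(fitness)                   # ascending: worst first
    best = int(order[-1])
    n_worst = int(round(p_dfrac * n_remove))
    doomed = set(int(i) for i in order[:n_worst])
    survivors = [i for i in range(len(fitness))
                 if i not in doomed and i != best]
    extra = rng.choice(survivors, size=n_remove - n_worst, replace=False)
    doomed.update(int(i) for i in extra)
    keep = [i for i in range(len(fitness)) if i not in doomed]
    return [population[i] for i in keep], fitness[keep]
```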

Figure 3.3: A demonstration of the primary evolutionary methods utilised by GRAPE. As seen in figure 3.2, the candidate frequencies are expressed as a set of chromosomes encoding the frequency information. A subset of the population is selected to produce the next generation. The first operation utilises the fitness function, in our case a BGLS function. This function is used to rank the candidate frequencies in order of their response, as the candidates selected for reproductive operations are dependent on their model performance. The crossover operation selects two chromosomes and generates two children by splitting the chromosomes at a given point between two genes and joining the starting genes of parent one with the final genes of parent two, and vice versa. Finally, the mutation operation can select any of the genes for a change, also known as a 'copy error'. Once complete, the newly generated chromosomes are ranked and become part of the next generation. This generational update is then repeated as many times as required to produce a population of high performing individuals based on the chosen fitness function.

3.4.3 Candidate period determination

Many period estimation methods, including the Bayesian Generalised Lomb-Scargle used in the GRAPE fitness function, suffer from common failure modes as a result of the cadence of a light curve and non-sinusoidal shape. This translates into the periodogram as multiple significant peaks, of which any one may be the true astrophysical period. Therefore our genetic algorithm must be capable of identifying not just a global optimum but clusters of persistent local optima which are then optimised to a candidate period. Additionally, whilst it would be preferable if the chosen frequency could be trusted as the best possible result for a given set of data, genetic algorithms suffer from the disadvantage that they are somewhat of a black box. The propagation of the candidate frequencies from generation to generation is heavily controlled by randomness sourced from random number generators, as is obvious from the method described above. This can result in differences in the exploration of the parameter space purely due to the use of different random number seed values in each run. Therefore, GRAPE has been designed to detect $N_t$ trial periods through the use of k-means clustering, with $k = N_t$, applied to each generation of a run with a 'dominant' seed value. The clusters with a standard deviation below a critical value $\sigma_t$ are recorded per generation. The routine continues until the standard deviation of the cluster means for a given generation drops below $\sigma_t$. The cluster means are then rounded to 2 decimal places and analysed for repetition across multiple generations. The top $N_t - 1$ cluster means are selected as candidate periods, along with the global minimum of the genetic run. Due to the randomness of the genetic algorithm, this operation is performed a second time with a different 'submissive' seed value. In the event that any of the candidate periods are close repetitions of any other candidate period, the repeated candidate periods are replaced with new candidate periods from the submissive run.

Figure 3.4: Plots of the exploration of the period space against generation number for a simulated sawtooth light curve. The top plot shows the simulated light curve that GRAPE is processing. It is a Skycam cadence simulated sawtooth with a period of 87.58 days and a signal-to-noise ratio (SNR) of 2. It has 1721 data points sampled in yearly seasonal windows with a strong diurnal sampling. The middle and bottom plots demonstrate the evolution of the GRAPE genetic algorithm across the 100 generations of genetic optimisation. The middle plot shows the minimum and median statistics of the BGLS fitness function for every candidate in each generation. The red line with circular points is the minimised candidate for each generation and the blue line with triangular points is the median of the fitness of every candidate for each generation. The bottom plot shows the location of each generation's candidate frequencies in period space. The candidate periods selected by the genetic optimisation are indicated by the black horizontal lines.
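The per-generation clustering step can be sketched with scikit-learn's KMeans (illustrative Python; clustering in log-period space is our assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def generation_clusters(periods, n_trial, sigma_t):
    """Cluster one generation's candidate periods into n_trial groups
    and return the means of clusters whose standard deviation falls
    below the critical value sigma_t."""
    logp = np.log10(np.asarray(periods)).reshape(-1, 1)
    km = KMeans(n_clusters=n_trial, n_init=10).fit(logp)
    means = []
    for k in range(n_trial):
        members = logp[km.labels_ == k]
        if members.size and members.std() < sigma_t:
            means.append(float(10**members.mean()))   # back to period space
    return sorted(means)
```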

Upon the production of the $N_t$ candidate periods, which may or may not be similar, they are fine-tuned through the use of $N_t$ single-period genetic runs of $N_{\mathrm{finegen}}$ generations to achieve an optimised result within a period range of $\pm 10\%$ of each candidate period.

The $N_t$ candidate periods are then tested to determine which period produces the best performing fit to the light curve. It is possible to use the BGLS to determine the best performing of these frequencies; however, GRAPE utilises a more powerful information-theoretic statistic. The Vuong closeness test is a statistical model similarity measure based on the Kullback-Leibler Information Criterion (Vuong, 1989). This method was proposed for the discrimination of aliased periods during the period estimation task (Baluev, 2009). Aliased periods are reflections of true periodic signals around a sampling period, also known as a spurious period. They are calculated by Equation 3.20 (Heinze et al., 2018).

$$P_{j,f} = \frac{t_{\mathrm{sid}}}{f\left(t_{\mathrm{sid}}/P_t\right) + j} \qquad (3.20)$$

where $P_{j,f}$ is a set of alias periods produced by a trial period $P_t$, $t_{\mathrm{sid}} = 0.99726957$ days is the sidereal day, $f$ is an integer value from the vector $f = [1, 2, 3]$ which defines the multiples of the trial period, and $j = [-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3]$ is the set of possible alias frequencies, limited to $|j| \leq 3$. This is due to higher values of $j$ producing aliases with fitness function responses of similar order to the noise continuum (VanderPlas, 2017). For each of the $N_t$ periods, GRAPE generates 27 sinusoidal regression models at the 27 frequencies $P_{j,f}^{-1}$, each with 4 parameters: an intercept, a linear trend and a sine and cosine component. These models are used to compute the Vuong closeness statistic between the trial period $P_t$ model and each of the $P_{j,f}$ models. If the Vuong closeness statistic indicates one of the $P_{j,f}$ models outperforms the trial period, the trial period is replaced with the Vuong-closeness-maximising candidate period, as long as it is not a known spurious period calculated previously.
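The alias enumeration of Equation 3.20 can be sketched as (illustrative Python; combinations yielding non-positive denominators, which have no physical period, are skipped):

```python
T_SID = 0.99726957   # sidereal day in days

def alias_periods(p_trial):
    """Enumerate the 27 alias/multiple periods of equation 3.20 for a
    trial period, over f = 1..3 and j with |j| <= 3."""
    js = [-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3]
    out = []
    for f in (1, 2, 3):
        for j in js:
            denom = f * (T_SID / p_trial) + j
            if denom > 0:
                out.append(T_SID / denom)
    return out
```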

Upon the generation of the chosen $N_t$ periods by the genetic algorithm, GRAPE generates another set of $N_t + 2$ sinusoidal regression models with 4 parameters, as above. The frequencies of these Fourier components are defined as the $N_t$ genetically chosen frequencies, a constant model with only an intercept, and a daily model with a period of one day.

Finally, the $N_t$ models for the chosen periods are compared using the Vuong closeness test to select the final period. GRAPE offers two options for this method. In the first, the model of every chosen period from each rerun is compared to every other period model. This requires the computation of $N_t^2 - N_t$ Vuong closeness tests. Alternatively, if the value of $N_t$ is prohibitively high (although this many reruns is unlikely), the chosen periods are all compared to a constant model with no linear or sinusoidal terms. This requires $N_t$ Vuong closeness tests. This second method has the disadvantage that it may screen out the correct period in favour of one caused by the sampling, and we therefore recommend the first option. This last step completes the GRAPE routine and results in a determined period.

The chosen period is used to compute the Vuong closeness statistic between the chosen frequency model and both the constant model and the one sidereal day model (a sinusoidal model with a frequency of $t_{\mathrm{sid}}^{-1}$ cycles per day). These statistics describe the significance of the chosen period relative to a purely non-periodic model, as well as comparing any detected periodicity to the dominant one sidereal day spurious period. A significant periodic signal may produce a high value against the constant model, but will have a much lower value against the one day model if it is due to a sampling periodicity. A real astrophysical signal would be expected to be significant in both of these statistics. GRAPE ultimately returns, for a given light curve, an optimised period which has been checked for multiplicity and aliasing with a sinusoidal model, together with two Vuong closeness test statistics for the sinusoidal model of this optimised period, calculated against a constant signal model and a sinusoidal signal with a period of the sidereal day. No confidence margins are applied to the Vuong closeness statistics, although they can be generated from them.
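For intuition, a minimal Vuong-style comparison of two of these 4-parameter models can be sketched as below, assuming Gaussian likelihoods with a fitted variance; this is our own simplification rather than the exact GRAPE implementation.

```python
import numpy as np

def harmonic_loglik(t, y, freq):
    """Pointwise Gaussian log-likelihoods of a 4-parameter model:
    intercept, linear trend and a sine/cosine pair at `freq`."""
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(2*np.pi*freq*t), np.cos(2*np.pi*freq*t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    var = r.var()
    return -0.5 * np.log(2*np.pi*var) - r**2 / (2*var)

def vuong_statistic(t, y, freq1, freq2):
    """Positive values favour the model at freq1, negative the model at
    freq2; both models have equal complexity, so no correction term
    is applied."""
    m = harmonic_loglik(t, y, freq1) - harmonic_loglik(t, y, freq2)
    return np.sqrt(len(m)) * m.mean() / m.std()
```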

3.5 Experimental Design

We presume that GRAPE should exhibit improved performance over a standard frequency spectrum approach due to the treatment of period as a continuous variable. In terms of the processing time for each light curve, GRAPE should require fewer calculations than a frequency spectrum, as only newly computed candidate frequencies must be evaluated and low-interest areas of the period space can be avoided.

We designed an experiment using simulated light curves in order to compare the performance of GRAPE against the traditional BGLS periodogram with a dense frequency spectrum. As our method is designed as a component of a classification pipeline for the Skycam instruments, we decided to produce two sets of light curves. The first set uses the variable Skycam cadence produced by an instrument with no control over the movement of the parent telescope. The second simulates a more traditional cadence with seasonal gaps added, to reproduce light curves similar to standard surveys. We started with the generation of the Skycam cadence time instants. We accessed the SkycamT database and randomly selected 1000 light curves, of which 250 had $100 \leq n < 200$, 250 had $200 \leq n < 500$, 250 had $500 \leq n < 1000$ and the final 250 had $1000 \leq n < 2000$, where $n$ is the number of data points in each light curve. This was chosen to generate simulated light curves with a wide range of entries but with a median much closer to low values, a statistical trait of the Skycam survey. The time instants of these light curves, recorded in Modified Julian Date (MJD), are recorded as the Skycam-cadence set. To generate the regular-cadence set we took the minimum and maximum time instant value for each light curve in the Skycam-cadence set and generated a linear grid of time instants with a separation of 0.1 days for each light curve. This grid of time instants was phased with a period of 365.25 days and only time instants with an associated phase of $0.0 \leq \mathrm{ph} < 0.5$ are retained. Finally, $n$ of these time instants are randomly selected, where $n$ is the number of data points in the Skycam-cadence time instants for each of the 1000 light curves. We then generated 1000 periods for these two sets of 1000 time instant vectors. We used a uniform random number generator with the function shown in Equation 3.21.

$$P_i = 10^{\,\mathrm{runif} \times \left(\log_{10}(P_{\mathrm{max}}) - \log_{10}(P_{\mathrm{min}})\right) + \log_{10}(P_{\mathrm{min}})} \qquad (3.21)$$

where $P_i$ is test period $i$, rerun 1000 times for $i = 1, \ldots, 1000$ to obtain $\{P\}$, the set of test periods; $\mathrm{runif} \in \mathbb{R} \cap [0, 1]$ is a uniform random number generator; and $P_{\mathrm{min}} = 0.05$ and $P_{\mathrm{max}} = 1000$ are the minimum and maximum periods of the period space we wish to optimise. We use a logarithmic projection to skew the period distribution to low periods, of which there are more known object classes (Debosscher et al., 2007; Richards et al., 2011b, 2012). With the two lists of 1000 light curve time instant vectors and 1000 simulated periods, we may now generate light curves of various shapes to test our method.
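Equation 3.21 can be sketched directly (illustrative Python; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)   # illustrative seed

def test_periods(n, p_min=0.05, p_max=1000.0):
    """Draw n test periods log-uniformly between p_min and p_max days
    (equation 3.21), skewing the sample towards short periods."""
    u = rng.uniform(size=n)
    return 10**(u * (np.log10(p_max) - np.log10(p_min)) + np.log10(p_min))
```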

We chose to generate light curves of four different shapes: sinusoidal, sawtooth, symmetric eclipsing binary and eccentric eclipsing binary. This resulted in 2000 light curves of each shape, one set with Skycam cadence and one with regular cadence, giving 8000 total light curves to test with the two algorithms. The light curves are populated with Gaussian white noise using Equation 3.22.

$$A_s = \sigma_n \sqrt{2\,\mathrm{SNR}} \qquad (3.22)$$

where $A_s$ is the sinusoidal amplitude of the synthetic signal, $\sigma_n$ is the standard deviation of the white noise, and SNR is the desired signal-to-noise ratio. For this experiment all the light curves are generated with an SNR of 2, which corresponds to a 0.4 unit amplitude with a 0.2 unit standard deviation of white noise. The sinusoidal light curves were generated using the time instants and the associated simulated period using Equation 3.23.

$$y_i = A_s \sin\left(\frac{2\pi t_i}{P}\right) + \sigma_n \epsilon_i \qquad (3.23)$$

where $y_i$ is the magnitude value of data point $i$, $t_i$ is the time instant of data point $i$, $\epsilon_i$ is a normally distributed error value for data point $i$ with a mean of 0 and a standard deviation of 1, and $P$ is the simulated period. $A_s$ and $\sigma_n$ are as above. Sawtooth light curves use the function shown in Equation 3.24.

$$y_i = 2A_s\left(\frac{t_i}{P} - \left\lfloor\frac{t_i}{P}\right\rfloor\right) + \sigma_n \epsilon_i \qquad (3.24)$$

where $\lfloor x \rfloor$ is the closest integer to $x$ rounded down, the 'floor' of $x$. For the eclipsing binary light curves, we decided to keep the transit duration and shape at constant phases, i.e. they are a linear function of the underlying period. We concede this is not a perfectly physical representation, as the parameters of two binary stars and their orbital properties allow many different possible eclipse shapes (Prsa et al., 2008; Paegert et al., 2014); however, we wished to test GRAPE on eclipse light curve shape independently of period. Research has been conducted into the performance of eclipsing binary detection with alternative eclipse shapes and periods (Prsa et al., 2011; Wells et al., 2017). We follow a similar process to generate both the symmetric and eccentric eclipsing binary simulated light curves. First we populate the light curve with a constant signal plus white noise using Equation 3.25.

$$y_i = \sigma_n \epsilon_i \qquad (3.25)$$

We then phase the simulated data points around the period. Data points between the phases $0.0 \leq \mathrm{ph}_i \leq 0.1$ and $0.9 \leq \mathrm{ph}_i \leq 1.0$ are located within the primary eclipse and have a triangular subtraction applied with a depth of $2A_s$ and a base length in phase of 0.2. After these operations, the symmetric and eccentric light curve methods diverge. The symmetric light curves have a secondary eclipse located at $\mathrm{ph} = 0.5$ with a triangular subtraction of depth $A_s$ and a base length in phase of 0.1. The eccentric light curves have a secondary eclipse of identical size but centred at $\mathrm{ph} = 0.7$. Figure 3.5 demonstrates the phased light curves of these four signal shapes.
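The four generators of Equations 3.22 to 3.25, together with the eclipse geometry just described, can be sketched in one illustrative Python helper (the function and its argument names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(t, period, shape, snr=2.0, sigma_n=0.2):
    """Synthetic light curve of one of the four shapes: 'sinusoidal',
    'sawtooth', 'symmetric' or 'eccentric' (eclipsing binaries)."""
    amp = sigma_n * np.sqrt(2 * snr)                  # equation 3.22
    eps = rng.standard_normal(len(t))                 # unit Gaussian noise
    if shape == 'sinusoidal':                         # equation 3.23
        signal = amp * np.sin(2 * np.pi * t / period)
    elif shape == 'sawtooth':                         # equation 3.24
        signal = 2 * amp * (t / period - np.floor(t / period))
    else:                                             # eq. 3.25 plus eclipses
        signal = np.zeros(len(t))
        phase = (t / period) % 1.0
        d1 = np.minimum(phase, 1.0 - phase)           # distance from phase 0
        in1 = d1 <= 0.1                               # primary eclipse
        signal[in1] -= 2 * amp * (1 - d1[in1] / 0.1)  # triangular, depth 2As
        c2 = 0.5 if shape == 'symmetric' else 0.7     # secondary centre
        d2 = np.abs(phase - c2)
        in2 = d2 <= 0.05                              # base length 0.1
        signal[in2] -= amp * (1 - d2[in2] / 0.05)     # triangular, depth As
    return signal + sigma_n * eps
```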

Figure 3.5: Phased simulated light curves of the sinusoidal, sawtooth, symmetric eclipsing binary and eccentric eclipsing binary signals.

The results of GRAPE and the BGLS are determined by taking the input period and the estimated period and computing whether their relationship is one of six possible types: a hit, a multiple, a submultiple, a one-day alias, a half-day alias or an unknown mode. A hit is when the estimated period matches the input period to within a tolerance, and is true if it satisfies the inequality shown in Equation 3.26.

$$|P_i - P_e| < \epsilon P_i \qquad (3.26)$$

where $P_i$ is the input period, $P_e$ is the estimated period and $\epsilon$ is the tolerance. A multiple is a realistic integer multiple of the input period and is defined as any relationship which does not satisfy the hit inequality, satisfies $P_e > P_i$, and satisfies the inequalities in either Equation 3.27 or 3.28.

$$\left\lfloor \frac{P_e}{P_i} \right\rfloor \leq 3 \quad \text{and} \quad \frac{P_e}{P_i} - \left\lfloor \frac{P_e}{P_i} \right\rfloor < \epsilon \qquad (3.27)$$

$$\left\lceil \frac{P_e}{P_i} \right\rceil \leq 4 \quad \text{and} \quad \left\lceil \frac{P_e}{P_i} \right\rceil - \frac{P_e}{P_i} < \epsilon \qquad (3.28)$$

where $\lceil x \rceil$ is the closest integer to $x$ rounded up, the 'ceiling' of $x$. A submultiple is similar to the multiple and is a realistic integer division of the input period; it is defined as any relationship which does not satisfy the hit inequality, satisfies $P_e < P_i$, and satisfies the inequalities in either Equation 3.29 or 3.30.

$$\left\lfloor \frac{P_i}{P_e} \right\rfloor \leq 3 \quad \text{and} \quad \frac{P_i}{P_e} - \left\lfloor \frac{P_i}{P_e} \right\rfloor < \epsilon \qquad (3.29)$$

$$\left\lceil \frac{P_i}{P_e} \right\rceil \leq 4 \quad \text{and} \quad \left\lceil \frac{P_i}{P_e} \right\rceil - \frac{P_i}{P_e} < \epsilon \qquad (3.30)$$

If the estimated period is the one-day alias of the input period, the inequality shown in Equation 3.31 is satisfied.

$$\left| \frac{P_i}{1 \pm P_i} - P_e \right| < \epsilon \qquad (3.31)$$

The presence of a half-day alias can be determined using the similar inequality shown in Equation 3.32.

$$\left| \frac{P_i}{1 \pm 2P_i} - P_e \right| < \epsilon \qquad (3.32)$$

Any light curve period estimation task where $P_i$ and $P_e$ do not satisfy any of the above inequalities is declared an unknown failure.
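The six-way outcome classification of Equations 3.26 to 3.32 can be sketched as (illustrative Python; alias denominators that would give non-positive periods are skipped):

```python
import numpy as np

def classify(p_in, p_est, eps=0.01):
    """Label an estimated period against the input period following
    equations 3.26-3.32."""
    if abs(p_in - p_est) < eps * p_in:
        return 'hit'                                   # equation 3.26
    r = p_est / p_in
    if p_est > p_in and ((np.floor(r) <= 3 and r - np.floor(r) < eps) or
                         (np.ceil(r) <= 4 and np.ceil(r) - r < eps)):
        return 'multiple'                              # equations 3.27-3.28
    q = p_in / p_est
    if p_est < p_in and ((np.floor(q) <= 3 and q - np.floor(q) < eps) or
                         (np.ceil(q) <= 4 and np.ceil(q) - q < eps)):
        return 'submultiple'                           # equations 3.29-3.30
    for s in (1, -1):
        d1, d2 = 1 + s * p_in, 1 + s * 2 * p_in
        if d1 > 0 and abs(p_in / d1 - p_est) < eps:
            return 'one-day alias'                     # equation 3.31
        if d2 > 0 and abs(p_in / d2 - p_est) < eps:
            return 'half-day alias'                    # equation 3.32
    return 'unknown'
```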

3.6 GRAPE Evaluation

Using the synthetic data we performed a set of experiments to test the performance of GRAPE relative to the BGLS periodogram and the computational complexity of these operations. Additionally, the dependence of the performance of GRAPE on the period of the light curve is tested, as is the relative performance between the regular cadence and Skycam cadence light curves. The final experiment involves the implementation of a fine-tuning frequency grid on the BGLS periodogram to attempt to replicate the period resolution of GRAPE using a more traditional frequency spectrum.

3.6.1 Period estimation performance

GRAPE ran the data set with 3 different dominant seeds, $\mathrm{Seed}_D = [1, 2, 3]$, to screen out poor convergence events with the Vuong closeness test, with the submissive seed set to $\mathrm{Seed}_S = \mathrm{Seed}_D + 100$. The input arguments were as follows: $N_{\mathrm{pop}} = 200$, $N_{\mathrm{pairups}} = 50$, $N_{\mathrm{gen}} = 100$, $N_{\mathrm{finegen}} = 50$, $P_{\mathrm{crossover}} = 0.65$, $P_{\mathrm{mutation}} = 0.8 - 0.008i$, where $i$ is the generation, $P_{\mathrm{fdif}} = 0.6$ and $P_{\mathrm{dfrac}} = 0.7$. These values were established by a grid-search cross-validation operation on a stratified subset of 100 synthetic light curves, although the sinusoidal light curves indicated a shallower gradient of $P_{\mathrm{mutation}} = 0.8 - 0.003i$ on the mutation rate. This was determined to be a result of the selection of all $N_t$ trial periods in the same period region as the true period; it only applies in situations where the signal is purely sinusoidal with Gaussian noise and was therefore rejected. The number of candidate periods selected by the GRAPE genetic clustering method is $N_t = 5$, which are then analysed by the Vuong closeness test. The linear decay gradient on the mutation value has an important effect on exploring the parameter space as well as fine-tuning the final period.

The BGLS periodogram is performed by selecting the top $N_t$ independent peak periods with an oversampling factor, which determines the density of the frequency spectrum, of $N_{\mathrm{ofac}} = 5$. The Vuong closeness test is then applied to these $N_t = 5$ peaks, as well as their multiples and aliases, in a similar operation to the one performed on the GRAPE candidate periods. The best performing period model is selected as the BGLS periodogram final period. There are also a number of shared arguments for the BGLS fitness function between GRAPE and the periodogram. The white noise jitter, which tunes the probability response for candidate periods, is set to $\mathrm{jit} = 0.4 \cdot A_{\mathrm{lc}}$, where $A_{\mathrm{lc}}$ is the estimated amplitude of the light curve determined by Equation 3.33.

$$A_{\mathrm{lc}} = \frac{|y_{\mathrm{max}} - y_{\mathrm{min}}|}{2} \qquad (3.33)$$

where $y_{\mathrm{min}}$ and $y_{\mathrm{max}}$ are the minimum and maximum values of the measurement unit for a given light curve. The period space runs from $P_{\mathrm{min}} = 0.05$ days to $P_{\mathrm{max}} = (t_{\mathrm{max}} - t_{\mathrm{min}})$ days, where $t_{\mathrm{min}}$ and $t_{\mathrm{max}}$ are the minimum and maximum time instants for a given light curve. The Gaussian filter for spurious period removal is left unused.

The confidence intervals of the GRAPE and BGLS periodogram performances on the synthetic light curves are determined using 100,000 bootstrapped samples from the 3000 GRAPE light curve period estimations (all 3 seeds on the 1000 synthetic light curves) and the 1000 BGLS periodogram period estimations. In this bootstrapping operation the results are resampled with replacement 100,000 times and used to compute a mean performance and the 95% confidence intervals through the selection of the 5th and 95th percentile performance values for the hit rate, the multiple rate (the sum of the multiples and submultiples) and the aliasing rate (the sum of the one day and half day aliases).

Table 3.1: Period estimation results on regular cadence simulated light curves with a tolerance $\epsilon = 0.01$. The confidently best performing method for each light curve shape is emboldened.

Method  Type          Hit            Multiple       Alias
GRAPE   Sinusoidal    0.831 ± 0.011  0.001 ± 0.001  0.004 ± 0.002
BGLS    Sinusoidal    0.703 ± 0.024  0.001 ± 0.002  0.000 ± 0.000
GRAPE   Sawtooth      0.741 ± 0.013  0.006 ± 0.002  0.006 ± 0.003
BGLS    Sawtooth      0.696 ± 0.024  0.002 ± 0.003  0.001 ± 0.002
GRAPE   Symmetric EB  0.032 ± 0.005  0.637 ± 0.014  0.024 ± 0.006
BGLS    Symmetric EB  0.031 ± 0.009  0.632 ± 0.025  0.024 ± 0.009
GRAPE   Eccentric EB  0.362 ± 0.014  0.280 ± 0.014  0.011 ± 0.005
BGLS    Eccentric EB  0.426 ± 0.026  0.251 ± 0.023  0.012 ± 0.007

Table 3.2: Period estimation results on Skycam cadence simulated light curves with a tolerance $\epsilon = 0.01$. The confidently best performing method for each light curve shape is emboldened.

Method  Type          Hit            Multiple       Alias
GRAPE   Sinusoidal    0.824 ± 0.011  0.000 ± 0.001  0.026 ± 0.007
BGLS    Sinusoidal    0.717 ± 0.023  0.001 ± 0.002  0.030 ± 0.009
GRAPE   Sawtooth      0.527 ± 0.015  0.007 ± 0.003  0.141 ± 0.004
BGLS    Sawtooth      0.508 ± 0.026  0.010 ± 0.005  0.143 ± 0.021
GRAPE   Symmetric EB  0.015 ± 0.004  0.358 ± 0.014  0.075 ± 0.010
BGLS    Symmetric EB  0.021 ± 0.008  0.382 ± 0.025  0.085 ± 0.016
GRAPE   Eccentric EB  0.149 ± 0.011  0.158 ± 0.011  0.112 ± 0.012
BGLS    Eccentric EB  0.159 ± 0.019  0.166 ± 0.020  0.121 ± 0.019
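The bootstrapped intervals above can be reproduced along the following lines (illustrative Python; the function name is our own, and the 5th and 95th percentile bounds follow the text):

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_rate(outcomes, label, n_boot=100_000):
    """Mean rate of one outcome label with bootstrapped percentile
    bounds, resampling the results with replacement."""
    hits = (np.asarray(outcomes) == label).astype(float)
    n = len(hits)
    means = np.array([hits[rng.integers(0, n, n)].mean()
                      for _ in range(n_boot)])
    return hits.mean(), np.percentile(means, 5), np.percentile(means, 95)
```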

Table 3.1 shows the results of this experiment on the four regular cadence light curve types with an accuracy tolerance of $\epsilon = 0.01$, a one percent allowed error in period. Note, it is possible for the row sum of a light curve type to be above 1, as some periods can satisfy both a submultiple and an alias simultaneously and we do not presume which mode is responsible for this error. Table 3.2 shows the performance of the two methods on the Skycam cadence light curves.

GRAPE clearly outperforms the BGLS periodogram in the correct period estimation of the sinusoidal light curves, with a hit rate improvement of 10%. Sawtooth light curves performed better on GRAPE by 2% for the regular cadence light curves, and similarly on the Skycam cadence light curves, as well as identifying some poorer quality light curves as aliases. The symmetric and eccentric eclipsing binaries suffer from a significant submultiple failure mode. This is a well understood result of the Lomb-Scargle method and its extensions. GRAPE maintains the performance of the BGLS periodogram within the statistical confidence levels for the symmetric eclipsing binaries. The eccentric eclipsing binaries were the worst performing shape on both GRAPE and the BGLS periodogram, as they were the least sinusoidal of the light curves and therefore the BGLS fitness function is not tuned for them. The periodogram slightly outperformed GRAPE for this shape of light curve. This is likely due to the importance of the fitness function in the propagation of genetic information. For eccentric binaries, the response from the correct period did not outperform a poor period by a significant margin and therefore it was never optimised heavily, whereas the frequency spectrum would sample a period close to the true period by default. These failings are not an inherent disadvantage of GRAPE. It is entirely possible to replace the fitness function with another more appropriate to the required task and achieve the same performance increase as the sinusoidal light curves did with the BGLS fitness function. Ultimately, this experiment shows that GRAPE exhibits a similar performance to the BGLS periodogram with Vuong closeness for every light curve shape other than purely sinusoidal. This demonstrates that the performance of the method is primarily driven by the Vuong closeness test and its ability to distinguish between hit, multiple and aliased models. The frequency spectrum approach did not appear to result in the expected loss of performance. It is likely that the use of the multiple models in the Vuong closeness test corrected for this weakness.

Figure 3.6: Performance vs tolerance for the regular cadence sinusoidal light curves. The hit rate rises to nearly perfect very quickly, showing GRAPE can effectively fine-tune sinusoidal periods.

Figure 3.7: Performance vs tolerance for the regular cadence sawtooth light curves. The hit rate rise is similar to the sinusoidal light curves but with an increased number of multiple and aliased periods.

Figure 3.8: Performance vs tolerance for the regular cadence symmetric eclipsing binaries. The N = 2 submultiple is the dominant failure mode until the tolerance reaches $\epsilon = 0.5$. Here, the submultiple satisfies the hit inequality and causes the observed performance swap. This failure mode is caused by the secondary eclipse at phase 0.5. The sinusoidal model provides a better fit at half the true period through the combined eclipses, as it is unable to sufficiently model the differentially sized eclipses at the true period.

Figure 3.9: Performance vs tolerance for the regular cadence eccentric eclipsing binaries. Here, roughly half the non-aliased light curves are hits and half are N = 3 submultiples. This is a result of the eccentric secondary eclipse being at phase 0.7, close to 0.667. Therefore, depending on the sampling, some light curves appear at a third of the period with a missing third eclipse, or else the eccentricity of the phased light curve results in the best fit being at the true period. This is only expected to happen for secondary eclipse phases near 0.333 and 0.667, as half the eclipses would be missing for the N = 4 submultiple variant at phases 0.25 or 0.75, which would prevent period matching at a quarter of the true period.

We also decided to investigate the tolerance of the two methods. For the previous experiment we produced results with a tolerance $\epsilon = 0.01$. It is desirable to achieve as tight a tolerance as possible, but as GRAPE has been designed for use in a classification pipeline, it is likely we can still generate informative features of a light curve even if the error on the period estimation is higher than expected. Figures 3.6 to 3.9 show plots of the tolerance $0.0 \leq \epsilon \leq 1.0$ against the recovered rate of hits, multiples, submultiples and aliases for the four different light curve shapes with the regular cadence. The Skycam cadence light curves show a much higher incidence rate of aliases due to the poorer phase sampling of some periods.

Due to the random seeds used in GRAPE, we repeated the experiment with three different seed values, $\mathrm{Seed}_D = [1, 2, 3]$, to see how the performance varied with the random processes inside a genetic algorithm. Table 3.3 shows the performance of the $\mathrm{Seed}_D = 1$ period estimation on the 1000 synthetic light curves, with table 3.4 showing the $\mathrm{Seed}_D = 2$ performance and table 3.5 showing the $\mathrm{Seed}_D = 3$ performance. The reported errors are the 95% confidence intervals as computed by a bootstrapping confidence estimator with 100,000 resamples. For the modes with sufficient population to determine an accurate confidence interval, the confidence intervals indicate that the results of GRAPE are consistent over a large set of light curves. Whilst the performance on an individual light curve can vary depending on the seed, the overall percentage of matched light curves should remain consistent for a large dataset. For important individual light curves, GRAPE can be rerun with different seeds or, alternatively, with a larger set of returned candidate periods, as this will reduce the chance that a good trial period is not detected and evaluated.

Table 3.3: GRAPE period estimation results with a seed of 1 on regular cadence simulated light curves with a tolerance $\epsilon = 0.01$.

Type          Hit            Multiple       Alias
Sinusoidal    0.833 ± 0.019  0.001 ± 0.002  0.000 ± 0.000
Sawtooth      0.754 ± 0.023  0.004 ± 0.004  0.004 ± 0.003
Symmetric EB  0.026 ± 0.009  0.647 ± 0.025  0.025 ± 0.010
Eccentric EB  0.371 ± 0.025  0.279 ± 0.024  0.007 ± 0.005

Table 3.4: GRAPE period estimation results with a seed of 2 on regular cadence simulated light curves with a tolerance $\epsilon = 0.01$.

Type          Hit            Multiple       Alias
Sinusoidal    0.823 ± 0.020  0.001 ± 0.002  0.009 ± 0.006
Sawtooth      0.727 ± 0.023  0.009 ± 0.005  0.004 ± 0.004
Symmetric EB  0.037 ± 0.010  0.619 ± 0.025  0.026 ± 0.009
Eccentric EB  0.352 ± 0.025  0.286 ± 0.024  0.010 ± 0.007

Table 3.5: GRAPE period estimation results with a seed of 3 on regular cadence simulated light curves with a tolerance $\epsilon = 0.01$.

Type          Hit            Multiple       Alias
Sinusoidal    0.838 ± 0.019  0.001 ± 0.002  0.004 ± 0.004
Sawtooth      0.741 ± 0.023  0.005 ± 0.004  0.011 ± 0.007
Symmetric EB  0.032 ± 0.009  0.644 ± 0.025  0.021 ± 0.008
Eccentric EB  0.364 ± 0.025  0.274 ± 0.023  0.017 ± 0.009

We also computed a stratified subset of 400 of the light curves, 100 for each shape. This stratified set was generated two more times in addition to the previous set resulting in three different additive noise generations to determine the consistency of the period estimation performance. This was performed for the Skycam cadence light curves. The confidence intervals are again the 95% confidence intervals as determined from a boot- strapping estimation using 100,000 resamples. The results are shown in table 3.6 with the Data column indicating the period estimation algorithm and the additive noise seed number [10, 20, 30] utilised in the bootstrap. As with the GRAPE seed performance, the different noise models are consistent to the 95% confidence intervals. Therefore, we suggest that the performance of GRAPE and the BGLS periodogram are consistent Genetic Routine for Astronomical Period Estimation 115

Table 3.6: GRAPE and BGLS periodogram period estimation results with three different light curve additive noise models on Skycam cadence simulated light curves with a tolerance $\epsilon = 0.01$.

Data             Hit            Multiple       Alias
GRAPE lcseed 10  0.348 ± 0.040  0.128 ± 0.028  0.073 ± 0.023
GRAPE lcseed 20  0.363 ± 0.040  0.120 ± 0.028  0.098 ± 0.023
GRAPE lcseed 30  0.365 ± 0.040  0.103 ± 0.025  0.080 ± 0.023
BGLS lcseed 10   0.310 ± 0.038  0.128 ± 0.028  0.090 ± 0.028
BGLS lcseed 20   0.313 ± 0.038  0.110 ± 0.025  0.090 ± 0.025
BGLS lcseed 30   0.335 ± 0.040  0.103 ± 0.025  0.095 ± 0.030

3.6.2 Performance vs Period

Our next experiment is to determine whether the failure modes of GRAPE and the BGLS periodogram have a dependence on the period. In the frequency spectrum case of the BGLS periodogram, we would expect the performance of the periodogram to be a function of the candidate period. We therefore plot two log-log period diagrams of the GRAPE estimated periods and the periodogram estimated periods on the sinusoidal light curves, displayed in figure 3.10. The plots clearly show that both methods perform more poorly the closer the period is to $P_{\max}$. This is due to poorer sampling of the phase space, as there are fewer complete cycles viewed inside the light curve baseline. Additionally, the BGLS periodogram suffers a much greater performance loss at this extreme. This is likely due to a combination of the frequency spectrum selecting a submultiple of the true period followed by a Vuong closeness test correction. As this correction must be an integer multiple of the initially detected period, it introduces an error near $P_{\max}$. This can be seen in figure 3.10 as a selection of light curves with BGLS periodogram estimated periods between 1000 and 1200 days. GRAPE performs much better in this range as it treats the parameter space as a continuous variable, whereas the periodogram samples the long periods extremely sparsely; GRAPE therefore does not need to rely on the Vuong closeness test to correct as many long periods. Figure 3.11 demonstrates the same plots for the Skycam cadence sinusoidal light curves. The performance on the Skycam cadence sinusoidal light curves was similar to the regular cadence sinusoidal light curves, except with a higher incidence of aliased periods due to poor sampling. This is an interesting result and demonstrates that the Skycam sampling does not adversely affect the performance of the period estimation when the fitness function is a good match to the data. Figure 3.12 shows the same plots for the regular cadence sawtooth light curves and figure 3.13 shows the plots of the Skycam cadence sawtooth light curves.

Figure 3.10: Log Period vs Log Period plots for the regular cadence sinusoidal light curves. For the sinusoidal light curves, performance was good from both fitness functions. GRAPE performs better at longer periods, seen by the increased dispersion on the frequency grid plot due to the frequency spectrum sampling this region poorly.

In the sawtooth light curves the same effects can be seen, but with additional extremely long periods found by GRAPE and the BGLS periodogram due to the Vuong closeness test deciding to pick a multiple of the identified period for many light curves with periods between 850 and 1000 days. This effect appears worse in the regular cadence light curves, with the Skycam cadence light curves showing a higher rate of aliases instead. This is likely due to the tested model being a simple sinusoid, which does not fit the sawtooth signal, combined with a minimal number of observed cycles. A more generalised model for the Vuong closeness test would be desirable in this situation, especially if a different non-sinusoidal fitness function is selected.

The symmetric eclipsing binary log-log period plots in figures 3.14 and 3.15 show similar errors to the sawtooth light curves, with poor Vuong closeness corrections estimating periods outside of the long period range. The periods are heavily underestimated above 500 days due to the common N = 2 submultiple failure mode of symmetric eclipsing binaries. The eccentric eclipsing binaries are split almost equally between hits and the N = 3 submultiple, showing a moderate long period depletion.

Figure 3.11: Log Period vs Log Period plots for the Skycam cadence sinusoidal light curves. The performance on the Skycam cadence sinusoids is similar to that of the regular cadence sinusoids. This is an interesting result as it indicates that the uneven sampling of Skycam may not be significantly detrimental to continuous variations.

The eccentric eclipsing binaries perform extremely poorly at long periods in both regular and Skycam cadence, as seen in figures 3.16 and 3.17. This is a result of many of the long periods exhibiting the N = 3 failure mode due to the small number of sampled eclipses. Additionally, at long periods the phase sampling often fails to measure the eclipse at all due to the seasonal sampling windows we have added to our light curves, especially for the Skycam cadence. This is a common issue in real survey eclipsing binaries (Prsa et al., 2011; Wells et al., 2017), and arguably a bigger problem there, as our simulated light curves have unphysically large eclipse durations at long periods which decrease the probability that the eclipse will be missed. It is interesting to note that the Vuong closeness test long period multiplication failure mode does not seem to occur on the Skycam cadence light curves. This is possibly due to the poor long duration sampling producing poor quality models that do not improve over the initial period estimation model. Alternatively, the alias of the period might be selected instead due to the strong Skycam aliases.

Figure 3.12: Log Period vs Log Period plots for the regular cadence sawtooth light curves. There are many spurious detections near the time span of the light curves for this test, despite the regular cadence, as the sawtooth shape diverges from the expectations of the BGLS fitness function.

3.6.3 Regular vs STILT cadence

The error in the estimated periods against the period span is an excellent indicator of the performance of the GRAPE period estimation, due to cadence, against the simulated period. The period span is defined as the baseline of the light curve divided by the input period and is the number of cycles present in the light curve. It is calculated by equation 3.34:

$$P_{\mathrm{span}} = \frac{t_{\max} - t_{\min}}{P_i} \qquad (3.34)$$

where $t_{\min}$ is the minimum time instant of the light curve, $t_{\max}$ is the maximum time instant of the light curve and $P_i$ is the input period. For low values of $P_{\mathrm{span}}$, the performance is expected to be poorer as there are fewer complete cycles and the cadence results in unsampled regions of the phase space. Large performance errors occurring where $P_{\mathrm{span}} = t_{\max} - t_{\min}$ (corresponding to an input period of one day) also indicate that the cadence is resulting in substantial aliasing. The estimated error, calculated by equation 3.35, is the fractional error from the input period:

$$\xi = \frac{|P_i - P_e|}{P_i} \qquad (3.35)$$

where $P_e$ is the estimated period.
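Equations 3.34 and 3.35 translate directly into code; a small sketch, assuming times is an array of observation instants:

import numpy as np

def period_span(times, p_input):
    # Equation 3.34: number of cycles inside the light curve baseline.
    return (np.max(times) - np.min(times)) / p_input

def fractional_error(p_input, p_estimated):
    # Equation 3.35: fractional error of the estimated period.
    return abs(p_input - p_estimated) / p_input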

Figure 3.13: Log Period vs Log Period plots for the Skycam cadence sawtooth light curves. The performance of the Skycam cadence sawtooth light curves appears similar to the regular cadence. There is a noticeable dispersion near 1000 days, likely due to sampling difficulties with a single sawtooth variation.

Figure 3.18 shows this plot for the regular cadence light curves of the four types and figure 3.19 shows the results of GRAPE on the Skycam cadence light curves. The symmetric eclipsing binary estimated periods have been doubled due to the common Lomb-Scargle failure mode. The regular cadence light curves show only two main patterns. The increase in error near $\log_{10}(P_{\mathrm{span}}) = 0$ is due to the poorer sampling of the shape of the light curve. This is less of an issue for the sinusoidal light curves, as the model can interpolate the missing data, but it becomes a progressively bigger problem as the light curve shape becomes more non-sinusoidal. Missing the eclipses in the eclipsing binary light curves also causes smaller $P_{\mathrm{span}}$ values to perform poorly. There are also a number of failed period estimations near $\log_{10}(P_{\mathrm{span}}) = 1$ due to the seasonal sampling periodicity, as the Vuong closeness test incorrectly determines multiples of the period due to unsampled data.

Figure 3.14: Log Period vs Log Period plots for the regular cadence symmetric eclipsing binaries. There is an expected, significant N = 2 failure mode due to the fitness function. Some of the longer periods are correctly estimated, possibly due to sampling resulting in no data points from the secondary eclipse. In this case, the best fitting sinusoidal model is at the correct period.

The Skycam cadence plot in figure 3.19 exhibits the same patterns but with an additional uncorrelated set of erroneous period estimations, due to the sampling of these light curves being insufficient despite many cycles being present in the baseline. The dominant feature of this failure mode is the increased period estimation errors near $\log_{10}(P_{\mathrm{span}}) = 3$, which is close to the sidereal day spurious period. The light curves in this additional failure mode also increase in number as the underlying signal becomes more non-sinusoidal. It is clear that the Skycam sampling is inferior to the regular cadence, yet this analysis can be used to augment the spurious and alias correction steps present in GRAPE. Ultimately, the advantage of the Skycams' capability of performing survey astronomy independently of the operations of the Liverpool Telescope is met with the significant disadvantage that many variable objects observed by the instruments will not be sampled at sufficient quality to detect.

Figure 3.15: Log Period vs Log Period plots for the Skycam cadence symmetric eclipsing binaries. The Skycam cadence light curves have similar performance to the regular cadence light curves except for additional long period hits. This is likely due to the even greater chance of losing either eclipse at long periods due to the sampling rate of Skycam cadence compared to the regular cadence.

3.6.4 Runtime considerations

The previous experiments have shown the GRAPE performance to be consistent with the BGLS periodogram, and even improved for signals close to the modelled fitness function. This fulfils the first major requirement for this method as an application to the Skycam survey data. The second requirement, which is also the original driving force behind the development of GRAPE, is for the period estimation task to be as computationally efficient as possible. Many of the important properties of the Skycam light curves are extracted from statistics which are a function of the candidate period and therefore require the estimation of a period prior to calculation. For large numbers of light curves this calculation must be as rapid as possible whilst maintaining performance. To understand the runtime requirements of GRAPE compared to a periodogram approach, we calculated the average light curve processing time as the mean runtime for the stratified set of 400 regular cadence light curves and 400 Skycam cadence light curves used in the testing of the additive noise models. These light curves are separated by the number of data points they contain, based on the intervals selected when generating the synthetic data. The experiments were run on a virtual machine with 4 cores of an Intel Xeon E5-2670 CPU and 12 GB of RAM.

Figure 3.16: Log Period vs Log Period plots for the regular cadence eccentric eclipsing binaries. Many of the light curves have been underestimated into the N = 3 submultiple. About half of the period estimates are correct and the other half are in this N = 3 submultiple failure mode.

Figure 3.20 shows the mean runtime in seconds of the groups of light curves against the binned number of data points for the regular cadence dataset, separated by light curve shape and period estimation method. A number of interesting patterns emerge in this result. Firstly, the periodogram runtime grows much more steeply with the number of data points than GRAPE. This is expected as the periodogram algorithm is an $O(N^2)$ method whereas GRAPE, whilst having an additional overhead due to the generational updates which are not a function of the number of data points, is an $O(N)$ method. As both methods use the Vuong closeness test, its contribution is present in both operations. The effect of the Vuong closeness test is visible in the form of the runtime separation of the different light curve shapes.

Figure 3.17: Log Period vs Log Period plots for the Skycam cadence eccentric eclipsing binaries. The poor performance at detecting many of the long period light curves is due to limits in the BGLS fitness function at fitting eccentric eclipsing binaries. Additionally, the Skycam cadence results in many of the eclipses being unobserved, resulting in a spurious period estimation.

For the periodogram, this separation results in sinusoidal and sawtooth light curves requiring less runtime at low numbers of data points compared with eclipsing binary light curves. At higher numbers of data points, the runtime required by the $O(N^2)$ periodogram becomes dominant and all the runtimes converge. For GRAPE, the effect is also to separate the sinusoidal and sawtooth runtimes from the slower eclipsing binary runtimes. However, due to GRAPE's linear dependence on the number of data points, this runtime difference persists for light curves with higher numbers of data points.

For the Skycam cadence results shown in figure 3.21, the runtimes are longer and similar effects to the regular cadence light curves are seen, but suppressed. This suppression is likely a result of the poorer sampling quality of the light curves, which increases the computational effort required by both the GRAPE genetic optimisation and the Vuong closeness test used in both GRAPE and the periodogram. As a result, the periodogram requires more time to process the lower data point light curves as the aliased periods are more dominant. The separation of the runtimes on the sinusoidal and sawtooth light curves compared to the eclipsing binary light curves is still apparent but reduced, as the sinusoidal and sawtooth light curves require additional computation to deal with aliased periods. Ultimately, for the Skycam cadence light curves, which are of more interest as they more closely reproduce the properties of the real survey data, GRAPE appears to maintain similar performance to the periodogram for the sinusoidal and sawtooth light curves and requires less runtime than the periodogram on light curves with more than 500-1000 data points. Whilst many of the interesting Skycam light curves do have data point counts in this range, there are many light curves with between 1000 and 15,000 data points which would require an unacceptably long time to run using a periodogram (due to the $O(N^2)$ scaling of this method). GRAPE provides an approach to obtain the desired results in less time.

Figure 3.18: Plot of the base-10 logarithm of the period span of light curves with a given period as a function of the estimated period fractional error for the regular cadence light curves.

Figure 3.19: Plot of the base-10 logarithm of the period span of light curves with a given period as a function of the estimated period fractional error for the Skycam cadence light curves.

We also performed one final experiment to determine the runtime required for the BGLS periodogram, with the Vuong closeness test, to obtain similar results to GRAPE on the regular cadence and Skycam cadence sinusoidal light curves through the use of a fine-tuning step on the initial $N_t = 5$ candidate periods, with a ±10% search radius using a boosted oversampling factor. As the ofac = 5 of the previous experiments was sufficient for all non-sinusoidal light curves, we perform this only on the set of 100 stratified regular cadence sinusoidal light curves and 100 stratified Skycam cadence sinusoidal light curves. Table 3.7 demonstrates the results of this experiment on the regular cadence light curves, contrasting the GRAPE performance with the base periodogram performance with ofac = 5 and with two fine-tuning variants, one with ofac = 20 and one with ofac = 50. Table 3.8 demonstrates the results of this experiment on the Skycam cadence light curves with the same oversampled fine-tuning. The results indicate that the periodogram can replicate the performance of GRAPE on sinusoidal light curves through an additional oversampling fine-tuning operation. For the regular cadence light curves this is possible with a fine-tuning step with an oversampling factor of 20. For the Skycam cadence light curves the fine-tuning oversampling factor was required to be 50. These performance gains come at a computational cost which further affects the runtime of the periodogram compared to GRAPE. The mean light curve runtime for GRAPE on the regular cadence sample was 37.32 seconds and the BGLS periodogram with no fine-tuning had a mean runtime of 45.43 seconds. For the Skycam cadence these runtimes increased, due to the greater difficulty in identifying candidate periods, to a mean runtime of 52.59 seconds for GRAPE and 57.55 seconds for the periodogram. The fine-tuning operation with ofac = 20 increases this runtime to 54.94 seconds on the regular cadence data and 64.25 seconds on the Skycam cadence data. For the ofac = 50 fine-tuning operation, the computational expense increases to 56.77 seconds on the regular cadence data and 64.46 seconds on the Skycam cadence data.
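A sketch of how such a fine-tuning grid might be constructed, assuming the coarse search returns $N_t$ candidate periods; the grid construction shown here illustrates the general approach rather than the exact BGLS implementation. The returned trial periods would then be scored with the same fitness function and the strongest peak retained.

import numpy as np

def finetune_grid(candidate_periods, baseline, ofac=20, width=0.10):
    # The natural frequency resolution of a baseline T is 1/T; the
    # oversampling factor ofac refines this to df = 1 / (ofac * T).
    df = 1.0 / (ofac * baseline)
    grids = []
    for p in candidate_periods:
        # Search a +/- 10% window (in period) around each candidate.
        f_lo = 1.0 / (p * (1.0 + width))
        f_hi = 1.0 / (p * (1.0 - width))
        grids.append(1.0 / np.arange(f_lo, f_hi, df))
    return np.concatenate(grids)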

Figure 3.20: Plot of the average runtime of the different shaped light curves as a function of the number of data points for the regular cadence light curves.

Table 3.7: GRAPE and BGLS periodogram period estimation results on 100 stratified regular cadence sinusoidal light curves with various fine-tuning oversampling runs.

Data                Hit            Multiple       Alias
GRAPE               0.803 ± 0.037  0.000 ± 0.000  0.000 ± 0.000
BGLS Periodogram    0.650 ± 0.080  0.000 ± 0.000  0.000 ± 0.000
ofac = 20 finetune  0.810 ± 0.070  0.000 ± 0.000  0.000 ± 0.000
ofac = 50 finetune  0.820 ± 0.070  0.000 ± 0.000  0.000 ± 0.000

Table 3.8: GRAPE and BGLS periodogram period estimation results on 100 stratified Skycam cadence sinusoidal light curves with various fine-tuning oversampling runs.

Data                Hit            Multiple       Alias
GRAPE               0.780 ± 0.040  0.000 ± 0.000  0.013 ± 0.013
BGLS Periodogram    0.660 ± 0.080  0.000 ± 0.000  0.050 ± 0.040
ofac = 20 finetune  0.760 ± 0.070  0.000 ± 0.000  0.020 ± 0.030
ofac = 50 finetune  0.790 ± 0.070  0.000 ± 0.000  0.010 ± 0.020

Ultimately, whilst the fine-tuning operations allow the periodogram to match the performance of GRAPE, it is at the cost of increasing the required runtime. As the fine-tuning operation is also frequency spectrum based, this additional computational effort is also of $O(N^2)$ complexity, meaning that for light curves with many observations the processing time will be prohibitive for real-time analysis.

Figure 3.21: Plot of the average runtime of the different shaped light curves as a function of the number of data points for the Skycam cadence light curves.

3.7 Discussion and Conclusion

In this chapter we introduced GRAPE, a period estimation statistic embedded in a genetic algorithm with a Vuong closeness test based alias and multiple model discrimination procedure. BGLS was selected as the period estimation statistic to be used as the fitness function. We note that other methods can be used instead, as the role of the fitness function is highly modular and different measures can be combined (Saha and Vivas, 2017). Our experiments in this chapter show that Lomb-Scargle type methods remain functional on poorly sampled data due to sinusoidal interpolation, although this means that performance on non-sinusoidal signals degrades rapidly.

It is also important to caution against the use of Bayesian periodogram methods in the search for periodicity of generic signal shapes. Bayesian periodograms output probabilistic statements based on the assumption that the data has been drawn from a sinusoidal model (VanderPlas, 2017). This results in the suppression of features in the periodogram which would normally convey information on the nature of the underlying periodicity. However, GRAPE has been designed for automated functionality and it is therefore important that potential failure modes be tightly controlled. As a major risk of any optimisation method is becoming trapped in local minima, the suppression of aliases is highly useful in propagating the genetic information. Therefore, BGLS has been deployed in this work despite the output being unreliable in the regime of non-sinusoidal periodicity. It is possible that the use of a quantum evolutionary algorithm would afford additional protection from populations becoming trapped at insignificant local minima and therefore sever our current dependency on the Bayesian periodogram (Abs da Cruz et al., 2007).

GRAPE outperformed the periodogram frequency grid in all datasets with sinusoidal signals which are well described by the fitness function. We suggest, as discussed earlier in the chapter, that GRAPE treating the period space as a continuous variable leads to this success, as the genetic algorithm can fine tune the result. This is further supported by the fine-tuning periodogram method, which used a frequency spectrum with ofac = 50 to oversample the 10% period range around the $N_t$ candidate periods. The sinusoidal light curves had a relative hit rate improvement of 18.2% using GRAPE compared to the periodogram for the regular cadence data and 14.9% for the Skycam cadence data, with both methods utilising the same BGLS fitness function. This was determined by assuming that the failed matches in GRAPE were also failures in the periodogram and calculating the percentage of additional failures in the periodogram. Using the same method, we determine that the sawtooth light curves had a 6.4% improvement on regular cadence and 3.7% on Skycam cadence when using GRAPE, although this is close to the 95% confidence interval. The symmetric EB and eccentric EB light curves are too much of a departure from the sinusoidal assumptions of the Lomb-Scargle method and exhibited a GRAPE relative performance similar, possibly slightly inferior, to the periodogram when comparing the hit and submultiple rates. This is likely a result of the reliance of GRAPE on the genetic propagation of useful information about the period space during the evolution of the candidates. On a sinusoidal light curve, the genetic algorithm places additional candidates near the sinusoidal signal period due to the prevalence of superior fitting models in this region of the period space. The algorithm can then fine tune the resulting period from this region. For the eccentric EBs, the fitness function returns a substantially weaker response to candidate periods near the true underlying simulated period. As a result, it only requires the presence of a similar-strength false model (such as on a spurious or aliased signal) to 'kick' candidate periods out of this region of the period space, removing them from the fine-tuning operation. In this case, the brute force approach of the periodogram frequency grid outperformed GRAPE purely because a candidate period close to the true period would always be sampled. These results are an overall measure of the relative performance and, as can be seen in the plots in figures 3.10 to 3.16, the actual performance of the methods is strongly dependent on the shape of the underlying light curve, the value of the true period and the baseline (total measured time) of the light curve.

Our experiments show that the sampling inherent to the Skycam mode of surveying will leave some objects insufficiently sampled for successful identification; however, the yield looks reasonable based on the relative performance degradation between the regular cadence and Skycam cadence data for sinusoidal and sawtooth light curves. Unfortunately, from a sampling viewpoint the loss of eclipsing binaries will be substantial for the Skycam survey regardless of period estimation method. The simulated data we present in this research contains only a signal and a white noise component. In reality, red (correlated) noise sources are common in real light curves. GRAPE currently makes no attempt to address the presence of red noise within candidate light curves, with the BGLS fitness function assuming purely white noise residuals. Whilst the red noise is something that can be modelled, at the moment we make use of a prewhitening technique to eliminate large correlated systematic signals when applying this method to real light curves from the Skycam database.

GRAPE has a notable computational time benefit over the frequency spectrum on light curves with more than 500-1000 data points, occasionally with increased performance likely based on the chosen fitness function. This is due to the genetic algorithm's $O(N)$ dependency on the number of data points in a light curve compared with the $O(N^2)$ of the frequency spectrum approach. We found that the topology of our genetic algorithm, as defined by the arguments listed in the previous section, is close to the fastest implementation we could produce before a substantial loss in performance due to having insufficient population or generations to explore the period space. We found that the linear decay of mutation was an incredibly powerful way of maintaining performance whilst decreasing the number of generations. The mutation can be described with a thermodynamic analogy. At the beginning the mutation is extremely high and the candidate periods jump rapidly across the parameter space like hot particles escaping a nearby potential. As the mutation rate decreases, the population has less energy to climb out of the potential wells and therefore begins to fine tune the local periods rather than exploring new regions of the parameter space. Eventually the mutation rate is so low that little to no new exploration is occurring, as the population cannot stray far from the local potential. Individuals located in poor areas of the parameter space die off, leaving the group near the true period to reproduce solely with the purpose of fine tuning this result. We are satisfied with the computational overhead required, as many Skycam light curves have a large number of data points on which computing a periodogram becomes prohibitively expensive, combined with the improved accuracy of GRAPE over the periodogram. For a Skycam light curve with 5114 data points it takes the frequency spectrum periodogram about 454.7 seconds to complete, whereas GRAPE completes the same task in 93.5 seconds.
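A minimal sketch of the linearly decaying mutation rate described above; the start and end rates, and the Gaussian perturbation in the usage comment, are illustrative assumptions rather than the exact GRAPE parameters.

def mutation_rate(generation, n_generations, rate_start=0.5, rate_end=0.01):
    # Linear decay from a hot, exploratory population to a cold,
    # fine-tuning one over the course of the run.
    frac = generation / max(n_generations - 1, 1)
    return rate_start + frac * (rate_end - rate_start)

# Inside the generation loop the rate would scale the size of the random
# perturbations applied to candidate periods, e.g.
#   sigma = mutation_rate(g, n_gen) * (p_max - p_min)
#   periods += rng.normal(0.0, sigma, size=periods.shape)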

Chapter 4

Machine Learning

Machine Learning is a multidisciplinary field with a basis in statistics and computer science, based on the deployment of computational algorithms which use training data to create a model approximating an unknown function that maps the inputs of the training data to its outputs (Samuel, 1988). This learning happens iteratively as the algorithms improve the learned model by minimising the error between the predicted output and the desired output. Machine Learning has been described as follows: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E (Mitchell, 1998).

Machine Learning tasks can be subdivided into one of two groups: regression and classification (Russell and Norvig, 2009). Many of the methods already introduced, such as the fitting of harmonic Fourier series and the Variance Ratio Periodogram, are powered using linear regression, a form of regression machine learning. Regression methods utilise a set of features (which may just be the raw data points) and produce a model, either limited to a specific set of parameters as in linear regression, or flexible and even non-linear as in neural networks and random forests, which outputs a real value. This output is then refined through the computation of an error term, such as the root mean-squared-error (RMSE), which is minimised by the training process. The normal equation used for the computation of harmonic models is a form of training for linear regression which can refine a solution to the global optimum in one pass, but other methods, such as gradient descent based approaches, can accomplish the same with more iteration.
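As an illustration of the one-pass solution mentioned above, a minimal sketch of fitting linear regression with the normal equation, assuming a design matrix X and target vector y:

import numpy as np

def fit_linear_regression(X, y):
    # Solve the normal equation (X^T X) theta = X^T y in a single pass.
    X = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    return np.linalg.solve(X.T @ X, X.T @ y)

For this convex problem, iterative alternatives such as gradient descent converge to the same global optimum, just over many passes rather than one.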

Classification machine learning follows a similar set of procedures except that the output is the probability of the input features being produced by an observation, given a set of possible classes of which the observation may be a member (Kotsiantis, 2007). This probability vector, of length equal to the number of possible classes, can then be used to assign the observation to a given class based on probability cuts and confidence intervals. Usually this is accomplished through the application of a non-linearity function which converts the output of a machine learning algorithm into a probabilistic representation.
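One common choice of such a function is the softmax, discussed later in this chapter; a minimal sketch:

import numpy as np

def softmax(z):
    # Convert raw model outputs into a probability vector over classes.
    z = z - np.max(z)  # shift by the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

# softmax(np.array([2.0, 1.0, 0.1])) -> approximately [0.66, 0.24, 0.10]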


Figure 4.1: A flowchart of a traditional feature engineering machine learning pipeline. The feature extraction functions are manually built by experts, requiring substantial time investment. Upon the determination of a good set of features, the best performing ones can be selected for use in a shallow machine learning model with a reduced computational effort.

The training process of Machine Learning systems can be grouped into supervised, unsupervised and reinforcement learning (Russell and Norvig, 2009). In supervised learning, the labels for the classes or the regression objective values are supplied during the training phase of the algorithm. Unsupervised learning methods lack this label knowledge and instead attempt to cluster or group the training data based on the patterns within it. Reinforcement learning utilises a response metric to determine the performance of each classification in an attempt to learn which actions result in high performance.

Machine learning can also make use of shallow or deep topologies. Shallow topologies are easier to train but are not capable of generating a model from the raw data, instead requiring a feature engineering and extraction procedure where the raw data is processed into a set of features designed to describe the desired problem. Figure 4.1 shows a flowchart demonstrating this feature engineered machine learning pipeline. Deep topologies are much harder to train due to the number of parameters in the learning models, but are capable of performing feature extraction as part of the learning process. They are supplied with the raw data in a machine-readable form and then construct a set of non-linear features from the training data that results in the best performance for the desired task. With the new generations of modern hardware, these deep learning methods are seeing widespread utilisation. Figure 4.2 contains a flowchart demonstrating a deep learning pipeline. In this chapter we introduce the algorithms we make use of in the development of the automated classification pipeline, a predominantly supervised learning based problem. We also discuss how to evaluate the performance of a learned model through a set of performance metrics.

Figure 4.2: A flowchart of a deep learning pipeline. The deep learning training is capable of extracting features useful to the desired classification task directly from the raw light curve data due to the number of parameters available in the network. The models take substantially longer to train as they have a larger number of parameters, as well as requiring a pretraining step to initialise the network. This increased training time is compensated by the reduced time requirement from human developers.

4.1 Principal Component Analysis

Principal Component Analysis (PCA) is a mathematical method that transforms a number of variables or features, which may be correlated, into a set of uncorrelated variables called principal components (Pearson, 1901; Hotelling, 1933, 1936). The principal components account for the variability of the dataset in order, with the first principal component describing the largest amount of the variance and each subsequent principal component describing much of the remaining variance, until the last principal component contains whatever variance remains.

Principal components can be calculated from a design matrix X, with columns containing the variables of the dataset and rows containing the observations. Initially the variables in the design matrix must be rescaled so they have comparable values using equation 4.1:

$$\bar{x}_j = \frac{x_j - \mu_j}{\sigma_j} \quad \text{for } j = 1, 2, \ldots, N \qquad (4.1)$$

where $\bar{x}_j$ is the rescaled $j$th variable, $\mu_j$ is the mean of the $j$th variable, $\sigma_j$ is the standard deviation of the $j$th variable and N is the number of variables in the design matrix X. The covariance matrix of the rescaled design matrix X is then computed using equation 4.2:

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} X_{[i]} X_{[i]}^{\top} \qquad (4.2)$$

where $X_{[i]}$ is the row vector of the design matrix X for the $i$th observation and m is the total number of observations (rows) in the design matrix. Using the covariance matrix, the eigenvalues and eigenvectors are computed. The eigenvalues, sorted from high to low, give the principal components of the solution in order of the variance contained in each principal component. The eigenvector associated with each eigenvalue can be used to determine the corresponding principal component using equation 4.3:

$$\mathrm{PCA}_j = \Theta_j^{\top} X \qquad (4.3)$$

where $\mathrm{PCA}_j$ is the $j$th principal component, $\Theta_j$ is the eigenvector associated with the $j$th sorted eigenvalue and X is the design matrix. This produces uncorrelated principal components and can also be used for dimensionality reduction through the selection of an integer value k where $1 \leq k \leq N$. The choice of k can be decided through the computation of how much variance is described by the reduced design matrix $\hat{X}$, where $\hat{X} = \Theta_{1,\ldots,k}^{\top} X$. Equation 4.4 shows the inequality which must be satisfied by the minimum value of k to retain $1 - \epsilon$ of the variance of the design matrix X:

$$\epsilon \geq \frac{\frac{1}{m} \sum_{i=1}^{m} \|x_{[i]} - \hat{x}_{[i]}\|^2}{\frac{1}{m} \sum_{i=1}^{m} \|x_{[i]}\|^2} \qquad (4.4)$$

where $(1 - \epsilon)$ is the required retained variance, $x_{[i]}$ is the row vector of the design matrix X for the $i$th observation, $\hat{x}_{[i]}$ is the row vector of the reduced design matrix $\hat{X}$ for the $i$th observation reconstructed using k principal components, and m is the total number of observations (rows) in the design matrix.
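A minimal sketch of this procedure following equations 4.1 to 4.3, using the eigenvalue form of the variance criterion (equivalent to the reconstruction-error inequality of equation 4.4); the retained variance threshold is an illustrative assumption.

import numpy as np

def pca(X, retain=0.95):
    # Equation 4.1: rescale each variable to zero mean and unit variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Equation 4.2: covariance matrix of the rescaled design matrix.
    sigma = (Xs.T @ Xs) / len(Xs)
    # Eigendecomposition; sort eigenvalues from high to low variance.
    vals, vecs = np.linalg.eigh(sigma)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Smallest k retaining the requested fraction of the total variance.
    frac = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(frac, retain) + 1)
    # Equation 4.3: project onto the first k principal components.
    return Xs @ vecs[:, :k], k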

Figure 4.3: An example decision tree extracted from a Random Forest model trained on a subset of Skycam light curve data. Features such as the amplitude and interquartile range of the light curve are used to place the training light curves into the appropriate classes. The binary splits are clearly seen where a single feature is used to compute the split (more can be used). The leaf nodes are assigned to a class label when they contain a small enough number of training objects.

4.2 Random Forest classifiers

Random Forests are an example of an ensemble classification method where a collection of weak classifiers are combined to produce a single strong classifier (Breiman, 2001). For the Random Forest ensemble, the individual classifiers are Classification And Regression Trees (CARTs) (Breiman et al., 1984). A CART is constructed by performing binary splits around a variable or set of variables. These branches then undergo additional binary splits until the tree has enough depth that the branch can be ended as a leaf node, where these leaf nodes contain an arbitrarily sized subset of the initial dataset. Figure 4.3 demonstrates a decision tree extracted from a Random Forest model trained on Skycam light curves.

A number of these trees, set by an argument ntrees which determines the number of decision trees trained by the Random Forest ensemble, are grown. These trees are trained through the use of bootstrap aggregating where, similar to the bootstrapping used in the evaluation of GRAPE, the training set is randomly sampled with replacement to train each individual tree. This process is commonly known as Bagging (Breiman, 2001). For each split in each tree, a random sample of variables of size mtry is selected to decide the split.

A third argument named nodesize allows control of how deep or shallow the trees are grown by determining the maximum amount of training data allowed in a terminal leaf node; any node with an amount of data greater than this size must execute a split. Deep trees can describe many variable interactions but readily overfit the training data, whereas shallow trees limit the complexity of the decision tree model, which can result in a high bias model where the model is too simple for the desired problem. The Random Forest can be formally defined as a set of classifiers $h(x|\Theta_1), \ldots, h(x|\Theta_K)$ produced from a set of training data $D = [(x_i, y_i)]_{i=1}^{n}$, where h is the non-linear function of the decision tree parameters, $\Theta_j$ are the parameters of the $j$th decision tree, $x_i$ are the variables of the $i$th observation and $y_i$ is the class of the $i$th observation.

Random Forests are not trained in the traditional sense. They do not make use of an iterative update process which improves the performance of the individual decision trees. Rather, they continue to grow new trees using the random splitting of the variables in the training data until a cohort of trees with good performance on the desired problem is produced. This cohort of trees can be identified through the use of an 'out-of-bag' error. This error is a measure of the performance of an individual tree using the training data which was not selected as part of the bagging process, and will have a lower value for the better performing trees. As the performance of the model is not affected by the low performance trees, the overall performance of the ensemble improves as trees with good performance are generated. The simplicity of the Random Forest trees compared to traditional decision trees allows them to be more generalisable, addressing one of the main limitations of CARTs, although they must be used as part of an ensemble method.

Another interesting component of Random Forests is that they have automatically incorporated performance and feature importance measures. The out-of-bag error is a measure of the error rate of a trained Random Forest model, computed from the performance of each decision tree using the data which was not selected during the bootstrap sampling operations. The Gini criterion is a measure of the diversity, i.e. the proportion of classes of differing types, in each leaf node. The more important features will result in a larger change in this Gini criterion if they are removed from the training. This change is named the Mean Decrease Gini of the feature given a trained Random Forest model. The Gini criterion is defined by equation 4.6 given equation 4.5:

$$g(S_j) = \sum_{i=1}^{N} \hat{P}(C_i|S_j)\left(1 - \hat{P}(C_i|S_j)\right) \qquad (4.5)$$

where $S_j$ is the set of data in the $j$th leaf node, $C_i$ is the $i$th class in the data and $g(S_j)$ is the variation of the $j$th leaf node, minimised when the node $S_j$ contains only one class $C_i$. $\hat{P}(C_i|S_j)$ is the proportion of data in leaf node $S_j$ which is of class $C_i$, and N is the number of classes in the training dataset. The Gini criterion is then determined by the weighted sum of the variations, shown in equation 4.6:

$$G = \sum_{j=1}^{M} \hat{P}(S_j)\, g(S_j) \qquad (4.6)$$

where G is the Gini criterion and $\hat{P}(S_j)$ is the proportion of training data in the $j$th leaf node relative to the total number of training data objects. This criterion, through the Mean Decrease Gini feature importance measurement, can be used in the process of feature selection. Random Forests determine their final prediction based on 'votes' from each of the individual decision tree classifiers, with the probability of an object being of class $C_i$ decided by the proportion of trees which classify the object as a member of class $C_i$. Random Forests also allow for the determination of the similarity between the feature vectors of two objects by determining the proportion of the decision trees where the two objects are placed in the same terminal leaf node. This similarity measure resembles a euclidean distance between the objects in the feature space, weighted by the importance of the features in the Random Forest model.
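A sketch of training such an ensemble with scikit-learn, on hypothetical stand-in data; n_estimators, max_features and min_samples_leaf play roles comparable to the ntrees, mtry and nodesize arguments described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 16))    # stand-in for 16 light curve features
y = rng.integers(0, 5, size=600)  # stand-in labels for five classes

clf = RandomForestClassifier(
    n_estimators=500,      # number of trees, comparable to ntrees
    max_features="sqrt",   # variables sampled per split, comparable to mtry
    min_samples_leaf=5,    # leaf granularity, comparable in role to nodesize
    oob_score=True,        # out-of-bag performance from the bagging process
    random_state=1,
)
clf.fit(X, y)
print("out-of-bag accuracy:", clf.oob_score_)
print("Gini-based importances:", clf.feature_importances_)
probs = clf.predict_proba(X[:1])  # class probabilities from the tree votes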

4.3 Support Vector Machines

Support Vector Machines (SVMs) generate a model by determining the widest margin between vectors within an n-dimensional feature space (Cortes and Vapnik, 1995). This decision rule is defined as a hyperplane of dimensionality n − 1 maximising this margin. The vectors responsible for the position of this hyperplane are named support vectors. This algorithm is powerful at solving classification problems based on the hyperplane generated by the training set vectors. The representation of the class boundaries is flexible and can fluctuate based on the introduction of new support vectors. This allows the SVM to improve its model based on misclassified vectors. The algorithm also implements automatic complexity control to reduce overfitting by allowing the violation of margins by vectors in order to better accommodate the remaining support vectors. Finally, the algorithm has a single global minimum which can be found in polynomial time (Cortes and Vapnik, 1995). This means the algorithm will rapidly converge to the best fitting model without the presence of local minima causing the production of a poor model. Other competing algorithms are unable to guarantee this as they often lack a convex optimisation function.

The SVM is easy to define by considering the dynamics of the feature space the vectors operate within. The Hard Margin Support Vector Machine is the simplest form of this mathematical construct and shall therefore be described in detail here. The extended Soft Margin Support Vector Machine is then introduced, which makes use of a cost penalty allowing poor performance on outlier training set vectors to improve the generalisability of the model.

Figure 4.4: A two dimensional feature space containing four vectors of two different classes and their associated margins. The two red lines indicate the maximum margins determined by the closest positive and negative classes; these are the support vectors. The blue line indicates the optimal hyperplane between the two classes based on maximising the distance between the two margins. The w vector indicates the position of this optimal hyperplane and u is a test-case feature vector.

Figure 4.4 represents a simple two dimensional feature space containing four feature vectors, two positively classified and two negatively classified. The vectors $x_-$ and $x_+$ indicate the locations of two support vectors, one classified negatively and one positively. The vector u is an unclassified test vector. Finally, the w vector is a vector normal to the blue separating hyperplane that represents this hyperplane. The two red lines represent the best fitting margins separating the negative support vectors and the single positive support vector. As the top right hand corner positive vector does not lie on or within the margin of the separating hyperplane, it is not a support vector, whereas the other three vectors are. If the vector u lies upon the positive side of the separating hyperplane, the inner product between w and u is greater than an undefined constant c, $w \cdot u \geq c$. This can be converted to equation 4.7 by defining a new constant b where $c = -b$. This equation becomes the first decision rule defined by the hyperplane and defines when the unknown vector u is classified as positive:

$$w \cdot u + b \geq 0 \qquad (4.7)$$

An additional variable can be introduced to generalise equation 4.7 into a single decision rule. Naming this new variable $y_i$ such that $y_i = +1$ for positive samples and $y_i = -1$ for negative samples produces the new decision rule shown in equation 4.8:

$$y_i \left( x_i \cdot w + b \right) - 1 \geq 0 \qquad (4.8)$$

For $x_i$ along the margin, the limit of the margin, $y_i (x_i \cdot w + b) - 1 = 0$. The width of the margins can be defined as shown in equation 4.9, where $\frac{w}{\|w\|}$ is the unit vector of w:

$$M = (x_+ - x_-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|} \qquad (4.9)$$

The best model will maximise the size of these margins, so we must therefore maximise $\frac{2}{\|w\|}$, which is equivalent to minimising $\|w\|$. This is a quadratic optimisation problem, minimising $\|w\|$ with respect to the margins, and is solved using Lagrange multipliers, leading to the final decision rule in equation 4.10 for a positive classification of u:

$$\sum_i \alpha_i y_i x_i \cdot u + b \geq 0 \qquad (4.10)$$

where $w = \sum_i \alpha_i y_i x_i$ determines the values of the $\alpha_i$ weights, as $\alpha_i$ provides the weighting of each training vector $x_i$.

This Hard Margin SVM is very inflexible. It can only create decision rules where the vectors are never allowed to violate the margin boundaries. This can lead to an overfitting hyperplane, and therefore an overfitting decision rule, if any of the support vectors are outliers. A better approach is to use a Soft Margin Support Vector Machine. This approach allows vectors to violate the margins at an associated penalty cost. This can result in a superior decision rule despite the possible incorrect classification of feature vectors in extreme cases. As any vector that manipulates the decision boundary is a support vector, any vectors that violate the margins are also support vectors. A new cost parameter $\xi_i$ is introduced. This parameter identifies the cost associated with the violation of the margin by a support vector $x_i$.

SVMs are natively a linear binary classification algorithm; that is, they determine a flat hyperplane as the decision boundary. Kernel functions are a method to extend this functionality to non-linear models (Shawe-Taylor and Cristianini, 2004). The SVM is dependent only on the inner product between support vectors, not the specific values of the support vectors themselves. Kernel functions therefore express this inner product in a higher dimensional space where the non-linear problem becomes linearly separable. This is also known as the 'Kernel Trick'. Kernel functions $K(x, y)$ must satisfy Mercer's condition, as shown in equation 4.11:

$$\iint g(x)\, K(x, y)\, g(y)\, dx\, dy \geq 0 \qquad (4.11)$$

The four kernel functions most often utilised for non-linearity in SVMs are the linear kernel function, the Radial Basis Function (RBF) kernel function, the polynomial kernel function and the sigmoid kernel function. There are also kernel functions for more specific use cases, such as the family of string kernel functions for dealing with sequences of string-based data.
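A brief sketch of a soft margin SVM with an RBF kernel using scikit-learn, on hypothetical data that is not linearly separable in the original feature space; the parameter C sets the penalty attached to margin violations.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # ring-shaped boundary

# Soft margin SVM: C penalises margin violations (the slack costs xi_i);
# the RBF kernel makes the ring linearly separable in the induced space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print("number of support vectors:", len(clf.support_vectors_))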

4.4 Artificial Neural Networks

Artificial Neural Networks are highly scalable learning methods which attempt to replicate the biological structures present in the human brain (Rosenblatt, 1958). This scalability allows them to produce state-of-the-art performance on many modern tasks at the cost of requiring substantial amounts of training data and computational resources. Neural Network algorithms use layers of neurons: a neuron is a nexus of connections from a previous layer where the outputs of the previous layer are weighted, summed and then have a non-linear activation function applied to them to introduce non-linearity into the system (Vora and Yagnik, 2014; Che et al., 2011). The networks assume that the variables are normally distributed and therefore the data should be rescaled prior to use. Transforming some of the variables using logarithms may also be recommended.

Artificial Neural Networks combine multiple neurons into a connective learning network which can adopt multiple topologies. The two most common topologies are the Feedforward Neural Network and the Recurrent Neural Network (Fine, 1999; Jain and Medsker, 1999). Feedforward Neural Networks propagate information through the layers in one direction, from the early hidden layers to the later hidden layers until it reaches the output layer. Recurrent Neural Networks also implement links which allow information from later layers to be transmitted to earlier layers or even the same layer. These recurrent links make these networks much more difficult to train but allow them to store information in a sort of neural memory. For this reason they are attractive for data containing a temporal structure. Recurrent Neural Networks have seen recent successes in the field of variable star classification (Naul et al., 2018).

Feedforward Neural Networks contain an input layer composed of the number of input features plus a bias unit, i.e. the input layer has N + 1 neurons where N is the number of features. There are then one or more hidden layers, sets of neurons which learn representations of the input features that map well to the output task. Finally, the output layer is either a single linear neuron for regression tasks or, for classification, a set of m neurons where m is the number of classes. This classification output layer can either feed into another classification method, such as the Support Vector Machine, or alternatively can be implemented using the softmax function, a multinomial logistic regression classification function. The networks can assume a number of different topologies based on the number of neurons present in the hidden layers and the number of hidden layers. Figure 4.5 demonstrates the topology of a simple Feedforward Neural Network with three input features and three hidden layer neurons.

Figure 4.5: A single hidden layer Artificial Feedforward Neural Network. The inputs from the previous layer are weighted by learned parameters (the $\Theta_j$ matrix for the $j$th layer) and summed. This final summation has a non-linear activation function applied to it and the result is sent to the next layer.

The Feedforward Artificial Neural Network can be defined using a set of activation function neurons $a_i^{(j)}$, the activation of unit i in layer j. These activations are weighted by a matrix of weights $\Theta^{(j)}$ controlling the mapping from layer j to layer j + 1. Equation 4.12 and equation 4.13 demonstrate the activation of the hidden layer neurons by the inputs:

$$a^{(2)} = g\left(\left(\Theta^{(1)}\right)^{\top} X\right) \quad \text{for the first hidden layer} \qquad (4.12)$$

$$a^{(j)} = g\left(\left(\Theta^{(j-1)}\right)^{\top} a^{(j-1)}\right) \quad \text{for the remaining layers} \qquad (4.13)$$

where $a^{(j)}$ is the output of layer j, $\Theta^{(j-1)}$ is the weight matrix of layer j − 1, $X = x_1, x_2, \ldots, x_n$ is the input feature vector and g(x) is the non-linear activation function of the network. There are a number of popular activation functions, including the sigmoid function shown in equation 4.14, the tanh function $g(x) = \tanh(x)$ and the rectified linear unit (relu) function $g(x) = \max(0, x)$:

$$g(x) = \frac{1}{1 + \exp(-x)} \qquad (4.14)$$
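A minimal sketch of forward propagation following equations 4.12 to 4.14, assuming each weight matrix includes a row for the bias unit of the preceding layer:

import numpy as np

def sigmoid(z):
    # Equation 4.14.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas, g=sigmoid):
    # x: input feature vector; thetas: list of weight matrices Theta^(j),
    # each of shape (size of layer j + 1, size of layer j + 1's output)
    # so the bias unit prepended to every non-output layer is absorbed.
    a = np.concatenate([[1.0], x])  # input layer with bias unit
    for j, theta in enumerate(thetas):
        a = g(theta.T @ a)          # equations 4.12 and 4.13
        if j < len(thetas) - 1:
            a = np.concatenate([[1.0], a])  # bias for the next hidden layer
    return a                        # output layer activations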

This forward pass of information is named Forward Propagation and it allows the inputs to be mapped to the outputs of the Neural Network. However, this method does not allow the training of the weight matrices $\Theta^{(j)}$ based on a training dataset. The Back-Propagation algorithm was developed to accomplish this task through an iterative forward then backward pass structure. The forward pass computes the outputs for a set of input training data X. The output layer is then compared to the desired output through the computation of a cost or error function. The errors are then determined as a function of the neurons in the network and corrected through the calculation of an 'error derivative' for the output layer neurons. This correction then propagates backwards through the network until the weight matrices of each layer have been adjusted. A forward pass is then computed to determine the change in error from this corrective backward pass, and this process is repeated iteratively until the output errors on the training set have been minimised.

Consider $\delta_j^{(l)}$ as the error of neuron j in layer l. We can compute the error of the output layer as $\delta^{(o)} = a^{(o)} - y$, where $a^{(o)}$ is the output layer after a forward pass, y is the objective ground truth output and $\delta^{(o)}$ is the error in the output layer after a forward pass. Equation 4.15 defines the computation of these derivatives as they propagate backwards through the network:

$$\delta^{(j)} = \left(\Theta^{(j)}\right)^{\top} \delta^{(j+1)} \odot g'\left(\left(\Theta^{(j-1)}\right)^{\top} a^{(j-1)}\right) \qquad (4.15)$$

where $\delta^{(j)}$ is the error vector for the $j$th layer, $\Theta^{(j)}$ is the weight matrix of the $j$th layer, $a^{(j)}$ is the activation vector of the $j$th layer, $g'(x)$ is the first derivative of the activation function g(x) and $\odot$ represents the elementwise multiplication operation. In this backpropagation method there is no $\delta^{(1)}$. Backpropagation is not the only way to train a Neural Network; the Levenberg-Marquardt method has also been used to produce trained models (Basterrech et al., 2011). Genetic algorithms are also a potential training method due to their ability to locate global minima in complex functions (Che et al., 2011).
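A sketch of one backpropagation update for a single-hidden-layer network, following equation 4.15 with sigmoid activations (for which g'(z) = g(z)(1 − g(z))); the learning rate and plain gradient descent update are illustrative assumptions.

import numpy as np

def backprop_step(x, y, theta1, theta2, lr=0.1):
    g = lambda z: 1.0 / (1.0 + np.exp(-z))
    a1 = np.concatenate([[1.0], x])                 # input layer with bias
    a2 = np.concatenate([[1.0], g(theta1.T @ a1)])  # hidden layer with bias
    a3 = g(theta2.T @ a2)                           # output layer

    delta3 = a3 - y  # output layer error, delta^(o) = a^(o) - y
    # Equation 4.15: propagate the error back through theta2, dropping the
    # bias row, and multiply elementwise by the activation derivative.
    delta2 = (theta2 @ delta3)[1:] * (a2[1:] * (1.0 - a2[1:]))

    # Gradient descent update of both weight matrices.
    theta2 -= lr * np.outer(a2, delta3)
    theta1 -= lr * np.outer(a1, delta2)
    return theta1, theta2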

4.5 Naive Bayes

The Naive Bayes classifier makes use of Bayesian probability to classify a set of variables into a given class. It is described as naive as it applies the assumption that the features are strongly independent, i.e. they are not correlated (Kotsiantis, 2007; Russell and Norvig, 2009). The algorithm also assumes that the features are drawn from a normal distribution. Equation 4.16 demonstrates how the probability of an input feature vector x conditioned on the class $y_j$ is a function of the probability of each feature $x_i$ having its value conditioned on being a member of class $y_j$:

$$p(x|y_j) = p(x_1|y_j)\, p(x_2|y_j) \cdots p(x_n|y_j) \qquad (4.16)$$

where $x_i$ is the $i$th feature, $y_j$ is the $j$th class and there are n total features in the feature vector. It is more important to determine the probability of the feature vector being of class $y_j$ given it has a set of features x. This is accomplished using Bayes' Theorem, shown in equation 4.17:

$$p(y_j|x) = \frac{p(y_j)\, p(x|y_j)}{p(x)} \qquad (4.17)$$

Assuming that the features satisfy conditional independence, equation 4.18 modifies Bayes' Theorem to show that the probability of the feature vector x being a member of class $y_j$ is determined by the probability of any feature vector being a member of class $y_j$, times the product of the individual probabilities of the features being members of class $y_j$, scaled by a constant produced by the probability of the feature vector having the associated features:

$$p(y_j|x) = \frac{p(y_j) \prod_{i=1}^{n} p(x_i|y_j)}{p(x)} \qquad (4.18)$$

The algorithm is quick to train and can operate successfully on feature vectors with missing values by ignoring the missing features. The conditional independence assumption is a significant limitation of the method, as many training sets will have some level of correlation between the features.
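A minimal sketch using scikit-learn's Gaussian Naive Bayes, which encodes both the normality and conditional independence assumptions described above, on hypothetical stand-in data:

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))     # hypothetical feature vectors
y = rng.integers(0, 3, size=300)  # hypothetical class labels

clf = GaussianNB()  # normal, conditionally independent features
clf.fit(X, y)
posterior = clf.predict_proba(X[:1])  # p(y_j | x) as in equation 4.18
print(posterior)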

4.6 Performance Evaluation

The performance of a classification algorithm on a set of training and test data must be expressed mathematically, or differing methods cannot be compared. There are a number of potential diagnostic functions which define a 'Figure of Merit' (FoM) for a given classification model (Brink et al., 2013). In this section a number of these FoMs utilised in the rest of the thesis are defined.

The production of a machine learning model always begins with the collection of a set of data, as these algorithms are driven by learning from data. To evaluate the learned model, a portion of the data must be set aside for the testing phase of a completed model. Often an additional set of data is selected for validating the hyperparameters of a model, as doing this on the training data can introduce a bias into the model. A popular split is 60% training set, 20% testing set and 20% validation set.

In some machine learning problems (such as the one addressed in this thesis), a lack of usable high quality training data can require the use of a technique named k-fold Cross-Validation, where the performance of the model is determined through the splitting of the data into k sets. The machine learning algorithm then learns k distinct models, each trained on a different k − 1 of the k subsamples of the data and tested on the remaining split. The test results of each of the models are then aggregated into a final performance measure.
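A sketch of k-fold cross-validation with k = 5 using scikit-learn on hypothetical stand-in data; stratified folds preserve the class proportions in each split.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 16))    # hypothetical feature matrix
y = rng.integers(0, 5, size=500)  # hypothetical class labels

# k = 5: five models, each trained on four folds and tested on the fifth.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(RandomForestClassifier(n_estimators=200), X, y, cv=cv)
print("aggregated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))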

Upon the successful training of a machine learned model with a set of available testing data, there are a number of performance metrics which define the success of assigning objects to the correct class relative to the objects assigned to the incorrect class. The first four metrics to highlight are the true positives (TP) which are the correctly classified members of the class C_i, true negatives (TN) which are the correctly classified non-members of the class C_i, false positives (FP) which are the incorrectly classified non-members of the class C_i assigned to this class and false negatives (FN) which are the incorrectly classified members of the class C_i not assigned to this class. The TP, TN, FP and FN are often displayed in the form of a confusion matrix, a table of the predicted classes against the actual classes. These four metrics are used to produce the first FoM, the accuracy defined in equation 4.19.

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}   (4.19)

This is a very simple measure of the number of TPs and TNs (correct classifications) relative to the total number of objects, which is the sum of all four metrics. This metric is popular but highly limited as it takes no account of the distribution of the classes and can therefore claim a model is performing extremely well when it is ignoring a minority class completely. To address the class imbalance two new metrics are introduced named sensitivity and specificity. These two metrics measure a classifier's performance at detecting all members which belong to a class whilst minimising the number of falsely classified members proportionally to the number of members in each class. The sensitivity is defined in equation 4.20 and the specificity is defined in equation 4.21 (Sokolova et al., 2006).

Sensitivity = \frac{TP}{TP + FN}   (4.20)

Specificity = \frac{TN}{TN + FP}   (4.21)

The Receiver Operating Characteristic (ROC) curve is a 2D representation of the cut-off values of sensitivity and specificity. This is a simple method of showing the possible performance of these two metrics as a function of the probability cut on a given classification model required to assign an object to a class C_i. The ROC is shown in equation 4.22 (Sokolova et al., 2006).

ROC = \frac{P(x|C_i)}{P(x|\bar{C}_i)}   (4.22)

where P(x|C_i) is the probability of the feature vector x being of class C_i and P(x|\bar{C}_i) is the probability of the feature vector x not being of class C_i. The index i is an integer in [0, N] where N is the total number of classes in the problem.

The Area under the Curve metric is a measure of the area under the ROC curve. The AUC is determined by equation 4.23 which determines a compromise value where the sensitivity is maximised relative to the specificity (Sokolova et al., 2006).

AUC = \frac{Sensitivity + Specificity}{2}   (4.23)

This value describes the best possible correct classification rate whilst minimising incorrect classifications where the weightings of correct and incorrect classifications are identical. In many applications these weightings are not equal. In this pipeline, minimising the false positives tends to be more important than a full recall of all true members of a class. This is due to the computational difficulties inherent to the processing of a large dataset of light curves. It is often better to risk not detecting potential variable objects than become bogged down by a large number of falsely flagged objects. Despite this fact, the AUC can still be useful in the identification of high quality models as well as selecting suitable cut probabilities for the final classification pipeline.

There is an alternative metric which automatically incorporates this weighting between detectability and the purity of the classification named the F-Score or F_β-Score (Sokolova et al., 2006). The F-Score is defined using a modification of the sensitivity and specificity named precision and recall. Precision is defined in equation 4.24 and recall is equal to the sensitivity.

Precision = \frac{TP}{TP + FP}   (4.24)

The F-Score is then defined by equation 4.25.

F_β = \frac{(β^2 + 1) \times Precision \times Recall}{β^2 \times Precision + Recall}   (4.25)

where the term β defines the relative weighting of the precision and recall. When β < 1 the precision is favoured by the evaluation metric and when β > 1 the recall is favoured by the evaluation metric. When β = 1 precision and recall have equal weighting similar to the previous metrics and the F-Score becomes known as the F1-Score, a harmonic mean between the two measurements.
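
The counting metrics above reduce to a few lines of code. The following Python sketch computes accuracy, sensitivity, specificity, precision and the F_β score directly from the four counts (illustrative names only; not the pipeline's actual code).

    def classification_metrics(tp, tn, fp, fn, beta=1.0):
        """Counting metrics of equations 4.19 to 4.25."""
        accuracy = (tp + tn) / (tp + fp + fn + tn)        # equation 4.19
        sensitivity = tp / (tp + fn)                      # equation 4.20 (the recall)
        specificity = tn / (tn + fp)                      # equation 4.21
        precision = tp / (tp + fp)                        # equation 4.24
        f_beta = ((beta ** 2 + 1) * precision * sensitivity /
                  (beta ** 2 * precision + sensitivity))  # equation 4.25
        return accuracy, sensitivity, specificity, precision, f_beta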

The ROC curve of a given classification model can also be used to determine the probability cuts for optimal performance on individual classes using the Index of Union (IU) measure (Unal, 2017). This is a recently proposed method which utilises the AUC, sensitivity and specificity of a classification model to determine these cuts and can be applied to multiclass problems. This is achieved through the computation of multiple ROC curves using a one-versus-many approach where for each class C_i the objects classified as a member of class C_i are placed into a positive class and all other objects are placed into the negative class regardless of the class to which they were assigned. This technique is of interest in producing high purity classifications in the automated pipeline and it is enhanced with an additional parameter named offset which acts similarly to the β in the F-Score. The enhanced Index of Union is defined in equation 4.26.

IU(c) = |Sensitivity(c) + offset − AUC| + |Specificity(c) − offset − AUC|   (4.26)

where IU(c) is the Index of Union at cut probability c, Sensitivity(c) is the sensitivity at cut probability c and Specificity(c) is the specificity at cut probability c. The Index of Union is computed across a fine resolution probability spectrum from 0 to 1 for each class C_i and the probability c which minimises the Index of Union function is the optimal cut probability for the C_i class. This is repeated for the N classes in the multiclass problem to determine the optimal cut probabilities for every class.
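
A sketch of the cut selection for a single class follows (Python/NumPy; it assumes sensitivity and specificity have already been evaluated on a fine grid of cut probabilities, and the function name is illustrative).

    import numpy as np

    def optimal_cut(sensitivity, specificity, auc, offset=0.0):
        """Cut probability minimising the enhanced Index of Union (equation 4.26).
        sensitivity, specificity: arrays evaluated over a fine grid of cuts in [0, 1]."""
        cuts = np.linspace(0.0, 1.0, len(sensitivity))
        iu = (np.abs(sensitivity + offset - auc)
              + np.abs(specificity - offset - auc))
        return cuts[int(np.argmin(iu))]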

Finally, there is a metric named the Brier Score. The Brier Score is best described as a ‘mean squared error of classification’ and uses the ground truth class probability vector and the predicted probability vector summed over multiple observations (Brier, 1950). The Brier Score is shown in equation 4.27.

BS = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} (y_{ij} − \hat{y}_{ij})^2   (4.27)

where BS ∈ [0, 1] is the Brier Score, y_{ij} is the ground truth probability of the i-th observation being a member of the j-th class, \hat{y}_{ij} is the predicted probability of the i-th observation being a member of the j-th class, C is the total number of possible classes and N is the total number of observations. When the Brier Score has a value of 0 the model performance is perfect and at 1 the model performance is completely incorrect. The method does require a significant number of observations to correctly measure the performance of rare events such as the presence of an uncommon class in a dataset (Wilks, 2010).
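
The Brier Score itself is a one-line computation; a minimal Python/NumPy sketch (illustrative names):

    import numpy as np

    def brier_score(y_true, y_pred):
        """Equation 4.27: y_true and y_pred are (N, C) class probability arrays."""
        return float(np.mean(np.sum((y_true - y_pred) ** 2, axis=1)))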

Chapter 5

Feature Extraction

The variability indices subsection and the Training Classification Models section are in press with the Expert Systems with Applications journal with submission ID ESWA-D-18-00934, and the Fourier Decomposition subsection has been published in Lecture Notes in Computer Science (McWhirter et al., 2016) under the title ‘A dynamic, modular intelligent-agent framework for astronomical light curve analysis and classification’.

Chapter one introduced the concept of light curves, time-series photometry of light sources, as well as the variations these light curves can exhibit due to different variable star types. The STILT reduction pipeline has produced an easily queried database of Skycam objects with their associated observations (Mawson et al., 2013). To further process the STILT database observational data, a method is devised to describe individual light sources using a number of fundamental parameters that describe the photometry recorded in their light curves. The STILT database contains data from both the SkycamT and SkycamZ instruments. As discussed previously, SkycamT has a larger FoV which improves the resampling of light curves whereas SkycamZ sacrifices this resampling rate for increased depth. In this trade-off between depth and resampling, the SkycamT instrument is the clear victor. SkycamT has over three times the number of sky sources with just under three times more observations. SkycamZ has a number of well sampled fields but most of the sky is not sufficiently sampled for the production of a good set of cross-matched training objects. SkycamT has a much higher number of well sampled light curves with 590,492 individual objects with more than 100 data points. For this reason, the development of this automated pipeline makes use of only the SkycamT data so as to perform the analysis on a homogeneous survey catalogue.


The determination of high quality features which describe the statistics of unevenly sampled light curves relevant for classification problems is a multiple-decade-old problem (Stetson, 1996; Richards et al., 2011b; Debosscher et al., 2007). Multiple research groups have taken data from inhomogeneous surveys and investigated which descriptors provide superior performance in a time-intensive procedure named feature engineering (Protopapas et al., 2006; Debosscher et al., 2007; Richards et al., 2011b; Deb and Singh, 2009; Tanvir et al., 2005; Yoachim et al., 2009; Shin et al., 2009; Prsa et al., 2008; Paegert et al., 2014; Pichara et al., 2016; Butler and Bloom, 2011; Kim and Bailer-Jones, 2016). There are currently hundreds of possible features which analyse the shape, distribution and variation of light curves, whose performance varies depending on the target survey data and the desired classification problem (Richards et al., 2011b; Nun et al., 2015; Kim and Bailer-Jones, 2016). These features also exhibit differing computational overheads for their determination (Richards et al., 2011b). With modern astronomical surveys the volume of data continues to increase and demands ever more sophisticated yet computationally-efficient analysis techniques. This is clearly an optimisation problem for future data reduction pipelines as a good balance must be found between the computational power and time required to analyse new data combined with the ability to extract interesting signals from the deluge of data. Another concern is the automation of such reduction pipelines as human intervention must be extremely limited due to the volume of necessary decisions. In this chapter the current extent of light curve feature engineering developed by multiple research groups over the previous decade is explored with results generated from SkycamT data. The results of this experiment demonstrate that the currently available methods are insufficient to accurately classify the SkycamT light curves. Two new features are also introduced in the form of the Quick LSP Period and Quick LSP P-value features. These novel features use the strength of the Skycam aliases to quickly define the strength of periodic activity in the light curves without requiring a computationally intensive period search across the entire parameter space.

5.1 Variability Detection

Variability detection is the problem of identifying and separating candidate variable light curves from the large population of non-variable light curves (Shin et al., 2009). This can be treated as a binary classification problem or an anomaly detection problem depending on the utilised methodology (Nun et al., 2014; Pichara et al., 2016). As variability detection is an initial component of the process of identifying variable objects, it must be computationally inexpensive due to its application to many light curves (Shin et al., 2009; Nun et al., 2015). Therefore the features chosen for this problem tend to be quickly computable statistics of the light curve data points such as their skewness, dispersion and alignment in time (Richards et al., 2011b). In this section, the set of variability indices used in current generation surveys are discussed. Additionally, the Quick Lomb-Scargle features are introduced. These features are designed to improve the detection of short period variables which can be neglected by the former indices in the presence of noisy data.

5.1.1 Variability Indices

These features are in many ways easier and less computationally intensive to produce as they tend to involve the use of simple statistical functions applied to one or more of the light curve's time instants, magnitude measurements and magnitude errors (Richards et al., 2011b; Nun et al., 2015).

The first of the variability indices is skewness (Nun et al., 2015). Skewness is a measure of the asymmetry of the magnitude measurements around the mean magnitude measurement. Put in the context of light curves, it is a measure of whether an object's light curve features brightening events, dimming events or, like in the case of a periodic pulsating star, both brightening and dimming events as it oscillates around its mean brightness (Richards et al., 2011b). Skewness is calculated using Equation 5.1.

b = \frac{\frac{1}{n} \sum_{i=1}^{n} (m_i − µ)^3}{\left( \frac{1}{n−1} \sum_{i=1}^{n} (m_i − µ)^2 \right)^{3/2}}   (5.1)

where b is the estimated value of the skewness, m_i is the magnitude value of observation i, µ is the weighted (by the reciprocal of the magnitude errors) mean of the magnitudes and n is the total number of observations. Due to pulsating stars having skewness close to zero and eclipsing binaries, with their strong dimming events, having a positive skewness, this feature is expected to be extremely useful in separating these two classes of objects (Richards et al., 2011b).
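
As a sketch, the error-weighted mean (equation 5.4, defined below) and the skewness of equation 5.1 might be computed as follows in Python/NumPy (names are illustrative; the reciprocal-error weighting follows the definition above).

    import numpy as np

    def skewness(mags, errs):
        """Skewness of equation 5.1 with an error-weighted mean."""
        w = 1.0 / errs                                  # reciprocal magnitude errors
        mu = np.sum(mags * w) / np.sum(w)               # weighted mean (equation 5.4)
        n = len(mags)
        numerator = np.mean((mags - mu) ** 3)           # (1/n) sum (m_i - mu)^3
        denominator = (np.sum((mags - mu) ** 2) / (n - 1)) ** 1.5
        return numerator / denominator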

The standard deviation is expected to be an important feature in discriminating between variable and non-variable objects as the variable objects will have a larger deviation caused by a signal and noise component whereas non-variable stars will only have a noise component (although in reality all stars will have a signal component caused by Earth-based daily and seasonal light level variations). It is calculated using Equation 5.2 (Nun et al., 2015).

σ = \sqrt{\frac{1}{n−1} \sum_{i=1}^{n} (m_i − µ)^2}   (5.2)

where σ is the estimated value of the standard deviation, m_i is the magnitude value of observation i, µ is the weighted mean of the magnitudes and n is the total number of observations.

Beyond 1 Standard Deviation is another important feature that describes the fraction of photometric magnitudes that lie above or below one standard deviation from the weighted mean of the photometric magnitudes. It is similar to the standard deviation as it is a representation of the proportion of data points whose magnitude varies far from the weighted mean (Richards et al., 2011b; Nun et al., 2015).

Small Kurtosis is a feature which measures how peaked a distribution is or, described alternatively, how closely the data values are clumped around the mean (Nun et al., 2015). This calculation is performed on a vector of magnitude values using Equation 5.3.

k = \frac{n(n+1)}{(n−1)(n−2)(n−3)} \sum_{i=1}^{n} \left( \frac{m_i − µ}{σ} \right)^4 − \frac{3(n−1)^2}{(n−2)(n−3)}   (5.3)

where k is the estimated value of the small kurtosis, m_i is the magnitude value of observation i, µ is the weighted mean of the magnitudes, σ is the standard deviation of the magnitudes and n is the total number of observations.

The weighted mean is retained as a non-periodic feature although it is not expected to be as informative as other features (Richards et al., 2011b; Nun et al., 2015). This is due to the mean apparent brightness not revealing much information about the brightness of the actual object (which would be useful) as all these objects are at various distances from the observer and the brightness of light decreases with the square of the distance to the light source. It is calculated using Equation 5.4.

µ = \frac{\sum_{i=1}^{n} m_i w_i}{\sum_{i=1}^{n} w_i}   (5.4)

where µ is the weighted mean of the magnitudes, m_i is the magnitude value of observation i and w_i is the weighting of observation i.

Mean variance is a feature defined as the ratio of the standard deviation to the weighted mean magnitude (Shin et al., 2009; Richards et al., 2011b; Nun et al., 2015). This is easily calculated using Equation 5.5 as the standard deviation and weighted mean have already been established. If a light curve has a strong variability the value of this feature will generally assume a larger positive value.

V = \frac{σ}{µ}   (5.5)

where V is the mean variance of the observed magnitudes, µ is the weighted mean of the magnitudes and σ is the estimated value of the standard deviation.

Variability Index is a feature that strongly indicates if there is a variable signal present in the data and is a robust standard deviation indicator (Shin et al., 2009; Richards et al., 2011b; Nun et al., 2015). It is defined as the ratio of the mean of the square of successive differences to the variance of the data points. This means that it is resilient to noise as only a signal propagating throughout the light curve will cause a notable change in this ratio. It checks if successive data points are independent or if they are linked by a common signal. It is capable of detecting trends throughout the light curve and is modified to take into account the uneven sampling of astronomical light curves. The variability index is calculated as indicated in Equation 5.6.

η^e = \bar{w} (t_{N−1} − t_1)^2 \frac{\sum_{i=1}^{N−1} w_i (m_{i+1} − m_i)^2}{σ^2 \sum_{i=1}^{N−1} w_i}, \qquad w_i = \frac{1}{(t_{i+1} − t_i)^2}   (5.6)

where w_i is a weight term that decreases the contribution of data points that are further apart, \bar{w} is the mean of these weights, m_i is the magnitude of the observation i, t_i is the time instant of the observation i and σ^2 is the variance of the magnitudes. This equation is used to generate a second variability feature called the folded variability index. This is the same calculation but on the light curve folded at the dominant period detected by the periodogram. As a result, this feature cannot be calculated until the period has been determined (Richards et al., 2011b).
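
A sketch of the unfolded variability index of equation 5.6 (Python/NumPy; whether the leading time span is taken to the last or second-to-last observation is an implementation detail assumed here).

    import numpy as np

    def variability_index(times, mags):
        """Unfolded variability index eta^e of equation 5.6."""
        dt = np.diff(times)                    # t_{i+1} - t_i
        dm = np.diff(mags)                     # m_{i+1} - m_i
        w = 1.0 / dt ** 2                      # pair weights of equation 5.6
        sigma2 = np.var(mags, ddof=1)          # variance of the magnitudes
        return (np.mean(w) * (times[-1] - times[0]) ** 2
                * np.sum(w * dm ** 2) / (sigma2 * np.sum(w)))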

A number of the features so far are designed to describe normal or near normal distributions. Many astronomical objects are unlikely to fit this description and therefore additional features are needed to describe them. The first of these is named Q31, defined as the difference between the third and the first quartile of the magnitude values (Richards et al., 2011b; Nun et al., 2015). The first quartile is a magnitude value in which 25% of the data points have lower magnitudes and 75% have higher magnitudes. The third quartile is a magnitude value in which 75% of the data points have lower magnitudes and 25% have higher magnitudes.

Range of a cumulative sum is a feature that can detect the largest change in the partial sums of the magnitudes within a light curve (Richards et al., 2011b; Nun et al., 2015). The feature is defined as the difference between the maximum and minimum value of a vector of sums defined by Equation 5.7.

S_l = \frac{1}{Nσ} \sum_{i=1}^{l} (m_i − µ)  for l = 1, 2, \ldots, N   (5.7)

Like the variability index, there is a second feature using the equation of the range of a cumulative sum but applied to magnitudes sorted by phase instead of time using phases generated by the dominant period from a periodogram on the light curve (Richards et al., 2011b).

In addition to the kurtosis measure, Stetson devises a second robust kurtosis based feature for the determination of a descriptive parameter for the light curves of Cepheid variables (Stetson, 1996). Variable objects with highly sinusoidal light curves will exhibit more data points at the maximum and minimum amplitude than at the mean whereas for a light curve with a constant signal and Gaussian noise, more data points are expected near the mean compared to the minimum and maximum amplitudes (Shin et al., 2009; Richards et al., 2011b; Nun et al., 2015). Stetson kurtosis is determined as follows.

k_S = \frac{\frac{1}{N} \sum_{i=1}^{N} |δ_i|}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} δ_i^2}}, \qquad δ_i = \sqrt{\frac{n}{n−1}} \, \frac{m_i − µ}{σ}   (5.8)

where N is the number of observations, m_i is the magnitude of the observation i and σ is the standard deviation of the magnitudes. This feature is often referred to as Stetson-K. There are three additional features devised by Stetson which have the names Stetson-I, Stetson-J and Stetson-L (Stetson, 1996; Shin et al., 2009; Richards et al., 2011b; Nun et al., 2015).
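
A minimal Stetson-K sketch (Python/NumPy; the plain mean and sample standard deviation are used in place of the weighted quantities for brevity, which is an assumption).

    import numpy as np

    def stetson_k(mags):
        """Stetson kurtosis (Stetson-K) of equation 5.8."""
        n = len(mags)
        mu = np.mean(mags)                     # the text uses the weighted mean
        sigma = np.std(mags, ddof=1)
        delta = np.sqrt(n / (n - 1.0)) * (mags - mu) / sigma
        return np.mean(np.abs(delta)) / np.sqrt(np.mean(delta ** 2))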

Maximum slope is a feature that gives the value of the maximum absolute gradient between two consecutive observations (Richards et al., 2011b; Nun et al., 2015). This is a simple and quick calculation defined in Equation 5.9.

\max\left( \left| \frac{m_{i+1} − m_i}{t_{i+1} − t_i} \right| \right)  for i = 1, 2, \ldots, N − 1   (5.9)

Amplitude is a feature that describes the largest consistent deviation in variability from the mean (Richards et al., 2011b; Nun et al., 2015). In a pulsating object, it is interpreted to be the extent of the brightness changes between the star's brightest or dimmest state and its mean state. In order to include consistent deviations and not fit noise, which would just produce the largest difference in the magnitudes with both signal and noise components, the median of the top and bottom 5% of measurements is used (Richards et al., 2011b). This is calculated as shown in Equation 5.10.

a = \frac{median(top 5%) − median(bottom 5%)}{2}   (5.10)

Consecutive points or con is defined as the proportion of consecutive sets of data points in the light curve where 3 data points have magnitudes at least two times the standard deviation away from the mean magnitude (Shin et al., 2009; Richards et al., 2011b; Nun et al., 2015). This is calculated as shown in Equation 5.11.

con = \frac{C}{n − 3}   (5.11)

where C is the number of consecutive sets of 3 data points with magnitudes greater than two standard deviations and n is the number of data points.

Median Absolute Deviation is a measure of the median discrepancy of the data (Richards et al., 2011b; Nun et al., 2015). It is another method of describing the distribution of magnitudes in a distribution-generic way and is shown by Equation 5.12.

MAD = median(|m − median(m)|) (5.12)

Median Buffer Range Percentage measures the fraction of photometric magnitudes within the amplitude divided by 10 of the median magnitude (Richards et al., 2011b). Therefore, it is a measure of what fraction of data points lie close to the median magnitude. It is expected that a variable object would have a lower fraction of points near this value. To simplify the calculation, the inverse is calculated, i.e. the fraction of data points lying outside of the amplitude divided by 10 above and below the median magnitude, which is then subtracted from one.

The pair slope trend is a feature that considers the last thirty time-sorted measurements of observed magnitudes and determines the fraction of increasing first differences minus the fraction of decreasing first differences (Richards et al., 2011b). It counts the fraction of intervals between the last thirty measurements where the slope is positive and subtracts the fraction of intervals where the slope is negative.

There are a set of five features based on the Flux Percentile Ratio. These features describe the ratio between different percentiles of the magnitudes and the difference between the 95th and the 5th percentiles (Richards et al., 2011b). For example, a value named F_{a,b} is defined as the difference between the b-th percentile and the a-th percentile. We divide this percentile difference F_{a,b} by the percentile difference between the 5th and 95th percentiles, F_{5,95}. This is used to define five features by substituting in percentiles for the a and b variables. These features are the Flux Percentile Ratio mid20, F_{40,60}/F_{5,95}; the Flux Percentile Ratio mid35, F_{32.5,67.5}/F_{5,95}; the Flux Percentile Ratio mid50, F_{25,75}/F_{5,95}; the Flux Percentile Ratio mid65, F_{17.5,82.5}/F_{5,95}; and the Flux Percentile Ratio mid80, F_{10,90}/F_{5,95}. They describe the distribution of brightness across the central magnitude band of a light curve with five differently sized magnitude bands. Flux Percentile Ratio mid20 has a small central band whereas Flux Percentile Ratio mid80 uses a large band covering most measured magnitudes.

Percentile Amplitude is a feature that determines the largest percentage difference between the minimum or maximum magnitude and the median magnitude (Richards et al., 2011b). Equation 5.13 shows the Percentile Amplitude calculation.

PA = \frac{\max(|m − µ|)}{median(m)}   (5.13)

where m is the vector of magnitudes and µ is the mean magnitude of the light curve. Next, the Percent Difference Flux Percentile is a feature defined as the ratio of the percentile magnitude difference F_{5,95} over the median magnitude as shown in Equation 5.14.

PDFP = \frac{F_{5,95}}{median(m)}   (5.14)

The Anderson-Darling statistic is a statistical test of whether a given sample of data is drawn from a given probability distribution, in this case, a normal distribution (Richards et al., 2011b). It is a powerful statistical tool for detecting most departures from normality. The statistic is rescaled using Equation 5.15 to emphasise departures from normality. Normal distributions tend to return a value of 0.3 and any departures from normality rapidly tend towards one.

A = \frac{1}{1 + e^{−10(a − 0.3)}}   (5.15)

where a is the Anderson-Darling normality statistic and A is our rescaled statistic.

The final two features describe light curve correlation between points in time. The first is named the autocorrelation function length (Richards et al., 2011b). The autocorrelation function can be used to determine the linear dependence of a signal with itself at two points in time. It is a vector of values that describes the similarity of the signal with itself against time lags. These time lags are integer values based on the units of the time instants (in the case of Skycam light curves, these are units of days). For strongly periodic signals, it would be expected to see a slow fall off as the signal deviates from itself until a lag that aligns with the period where the value would rise again. Equation 5.16 shows the autocorrelation function.

\hat{ρ}_h = \frac{\sum_{t=h+1}^{T} (m_t − \bar{m})(m_{t−h} − \bar{m})}{\sum_{t=1}^{T} (m_t − \bar{m})^2}   (5.16)

for an observed series m_1, m_2, \ldots, m_T with a sample mean \bar{m} and a sample lag h. It should be noted that \hat{ρ}_h is a vector of values. A feature must be a single value descriptor and therefore a mechanism of generating this value must be defined. This is where the ‘length’ component of autocorrelation function length is determined. The length of the autocorrelation function is defined as the lag value h where \hat{ρ}_h drops below the value of e^{−1}.
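
A sketch of the autocorrelation function length (Python/NumPy; this simple version assumes a regularly indexed series, whereas the Skycam implementation uses integer day lags as described above).

    import numpy as np

    def acf_length(mags, max_lag=100):
        """Smallest lag at which the autocorrelation of equation 5.16 drops below e^-1."""
        m = mags - np.mean(mags)
        denom = np.sum(m ** 2)
        for h in range(1, max_lag + 1):
            rho = np.sum(m[h:] * m[:-h]) / denom
            if rho < np.exp(-1):
                return h
        return max_lag                    # no drop found within the searched lags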

The last feature is called the slotted autocorrelation function length (Richards et al., 2011b). The slotted autocorrelation function is an improvement on the standard autocorrelation function, implementing the lags as slots (intervals) instead of single time values. The slotted autocorrelation function at a certain time lag slot is computed by averaging the cross product between samples whose time differences fall in the given slot as shown in Equation 5.17.

\hat{ρ}(τ = kh) = \frac{1}{\hat{ρ}(0) N_τ} \sum_{t_i} \sum_{t_j = t_i + (k−0.5)h}^{t_i + (k+0.5)h} \bar{m}_i(t_i) \, \bar{m}_j(t_j)   (5.17)

where h is the slot size, \bar{m} is the normalised magnitude, \hat{ρ}(0) is the slotted autocorrelation at zero lag and N_τ is the number of pairs of observations that fall in the given slot. Again the length of the slotted autocorrelation function is defined as the lag value where \hat{ρ}(τ = kh) becomes smaller than e^{−1}.

We include the Renyi Quadratic Entropy feature (Huijse et al., 2012). This is an information theoretic estimator of the randomness of a set of data points. In this case, this is the magnitude values and their associated errors sorted by the time instants of measurement. These can be used to calculate the Information Potential shown in Equation 5.18.

IP = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\sqrt{2π} σ_m} e^{−\frac{(m_i − m_j)^2}{2σ_m^2}}   (5.18)

where N is the number of data points, σ_m is the median of all the magnitude errors and m_i is the magnitude value, sorted by time instant, at time instant i. This is used to calculate the Renyi Quadratic Entropy using Equation 5.19.

RQE = − log(IP) (5.19)

Additionally, there are two features that allow the differentiation between stellar sources and quasars, short for quasi-stellar objects, the active cores of faraway galaxies. Variations in quasars can produce similar features to variable stars therefore new features are needed to distinguish them. The features are called QSO and Non-QSO (Butler and Bloom, 2011; Dubath et al., 2012). QSO is a statistic that identifies the Bayesian probability of the source being a quasar. There have also been methods designed to classify QSO light curves from more standard features (Kim et al., 2011). Other purpose built models have also been developed that can characterise quasi-periodic variations using a damped random walk model (Zinn et al., 2017).

Characterising variability from non-periodic features is challenging, as the period is an important property of many variable objects. This challenge can prove worthwhile, as the extraction of periodic information is computationally expensive. Neuropercolation, a family of probabilistic biologically inspired models, has also been used to characterise quasi-periodic light curves (Elzo et al., 2016). Kepler light curves were used to identify periodic activity through the determination of the width of a histogram across a characteristic timescale (Neff et al., 2014). Care must also be taken in regards to the selection of features that may be of interest to the classification of astronomical transients. These light curves rarely display periodic activity and are usually of short duration therefore automated methods must have access to features that correctly describe these properties (Disanto et al., 2016).

5.1.2 Quick Lomb-Scargle Indices

The variability indices presented above are designed to be easily computed so as to be applicable to a large set of light curves for the detection of variable light curves independent of noise and sampling effects (Richards et al., 2011b; Nun et al., 2015). Due to the Skycam cadence resulting in large sampling gaps, many of these features are sensitive to longer period light curves, such as the variability index and the autocorrelation function length. The Skycam database contains a large number of short period variables with sufficient sampling and signal to allow for correct period estimation and classification.

Other research has indicated that performing a period estimation procedure, even if at low resolution and obtaining an incorrect answer, can still provide useful information in the identification of variability across a wide range of periods (Shin et al., 2009). Unfortunately, due to the number of data points present in the Skycam data, utilising even the fastest of period estimation methods was still infeasible (McWhirter et al., 2016). Initial attempts to compensate for this involved subsetting the light curves into a smaller set of data points but this was unreliable due to many light curves requiring the total number of data points to achieve a sufficient confidence in the candidate period. Additionally, the exact selected subset of data points was not obvious and rerunning the method with multiple subsets defeats the benefit of subsetting. Computing the periodogram with a smaller frequency spectrum also produced a poor performance as many of the catalogued periods were unsampled.

Two possible solutions are proposed for the production of a low computation period estimation for this task. The first method involves applying the bands method discussed in chapter 2 using a Lomb-Scargle Periodogram (Protopapas et al., 2015). With a sufficiently small value of N_t and N_b, the frequency spectrum can be reduced to a very rapidly computable size whilst maintaining the performance shown in the bands method experiments in chapter 2.

Figure 5.1: Plot of the two Quick-LSP features. Whilst not a definitive method of identifying variable light curves, especially with the aliasing due to the Skycam cadence, there are interesting structures inside the data of benefit to a variability detection classification model.

The second method utilises the strong diurnal sampling window of the Skycam data. In chapter 2, the alias failure mode was defined as a ‘reflection’ of the true period around the sidereal day period (VanderPlas, 2017). As a result, most periodic light curves will produce a strong response in the periodogram at the one day alias of the true period shown by equation 5.20.

P_{alias} = \left( P_{true}^{−1} ± t_{sid}^{−1} \right)^{−1}   (5.20)

where P_{alias} is the aliased period, P_{true} is the underlying true astrophysical period and t_{sid} is the sidereal day at 0.99726957 days.
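
Equation 5.20 is cheap to evaluate; a small Python sketch returning both first-order aliases follows (illustrative names; the relation is undefined when the true period equals the sidereal day exactly).

    T_SID = 0.99726957  # sidereal day in days

    def one_day_aliases(p_true):
        """First-order sidereal-day aliases of a true period (equation 5.20)."""
        f, f_sid = 1.0 / p_true, 1.0 / T_SID
        # |.| guards against negative frequencies for periods longer than a sidereal day
        return 1.0 / (f + f_sid), 1.0 / abs(f - f_sid)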

This method relies on this aliasing to reduce the range of the period search from 0.5 days to 1.5 days. The Lomb-Scargle Periodogram can then maintain the required resolution to sample the period range without introducing a substantial computational runtime. Two additional variability indices are generated by this period estimation, the period estimated by the maximum peak of the Lomb-Scargle Periodogram and the negative base-10 logarithm of the p-value of the maximum peak as calculated by the False Alarm Probability (Scargle, 1982; VanderPlas, 2017). Using double precision, the precision of the p-value is limited to 10^{−323} and values below it are rounded to zero. As the negative base-10 logarithm of zero is positive infinity, all objects with p-values of zero are set to the value of 324. These two features are named the Quick Lomb-Scargle Period and Quick Lomb-Scargle p-value and indicate the presence of a strongly periodic signal regardless of the true period. Figure 5.1 demonstrates the scatterplot of the Quick LSP features determined by the 590,492 well-sampled SkycamT light curves overlaid by red data points representing the 859 catalogue period matched objects and 1186 catalogue objects with aliased periods. The identification of variability is not definitive using these two features but they do give a good indication of potential variables with strong periodicity. The many periodic peaks in the p-values across this range show the extent of the intensive aliasing caused by the Skycam cadence.

As the bands method has an associated computation time which rivals the time required for the complete computation of the Quick Lomb-Scargle features, the Quick Lomb-Scargle features are added to the variability indices defined above to produce the complete set of features used by the automated pipeline to detect variable light curves in the Skycam data.

5.2 Variable Star Classification

The variability indices have been developed for the task of discriminating variable and non-variable light curves; however, this is not the limit of their usefulness. Many of these features are useful for the classification of different classes of variable star (Richards et al., 2011b). For example, as noted above, skewness should be capable of distinguishing eclipsing binary light curves from pulsating variable light curves. Richards et al. make use of many of the above variability indices to supplement their variable star classifiers (Richards et al., 2011b, 2012).

Useful as the variability indices are for classifying variable light curves, they lack full descriptions of light curve shape and periodicity. In the previous chapters the importance of the period in the determination of variable stars was discussed in detail. The classes used by modern astronomers for variable stars are very much driven by their period and colour. As a result, the period is clearly an important feature for this task which is not present in the variability indices due to the increased computational load in its determination as shown in chapters 2 and 3 (Debosscher et al., 2007).

The period is more than just an important feature in this task as it can be used as the gateway to transforming the light curves into new representations such as using the epoch folding method defined in chapter 2. The period is used as the basis for a selection of features named ‘periodic’ features which use a candidate period returned from a period estimation algorithm to define additional properties of the light curve such as its amplitude and shape (McWhirter et al., 2017). The documented variable star classes from chapter 1 clearly show that the shape of the light curve can vary significantly based on the underlying astrophysics even if the amplitudes and periods are similar. In this section the set of periodic features developed over the last decade is introduced and described, demonstrating the use of Fourier models to extract amplitude and phase information from light curves developed by Debosscher et al. (Debosscher et al., 2007) along with the additions from two of the dominant sources of engineered periodic features, Richards et al. and Kim & Bailer-Jones (Richards et al., 2011b, 2012; Kim and Bailer-Jones, 2016).

5.2.1 Fourier Decomposition

Fourier decomposition makes use of the properties of Fourier series. A Fourier series is a method of approximating any continuous univariate function as a sum of an infinite number of sine and cosine functions (Fourier, 1878). The amplitudes and phases of these sine and cosine components define the shape of the resulting Fourier series. A truncated Fourier series retains a finite number of these infinite components and will result in an imperfect approximation with an accuracy depending on how many components are retained (Debosscher et al., 2007; Deb and Singh, 2009; Richards et al., 2011b, 2012; Kim and Bailer-Jones, 2016). For light curve modelling, the univariate time-series can be fit using regression by a truncated Fourier series with properties determined by the developer based on the quality and complexity of the light curves. The amplitudes and phases can be extracted from the regressed model and form a set of features that describe the shape of the light curve (Debosscher et al., 2007).

The method we utilise on the SkycamT light curves is a variant of the approach introduced into Machine-Learned periodic light curve classification by Debosscher et al. (Debosscher et al., 2007) and improved with a regularisation technique by Richards et al. (Richards et al., 2011b, 2012). To model periodic and multi-periodic light curves, a period estimation algorithm is applied to the light curve time-series data. Any efficient and reliable period estimation method may be chosen for Fourier decomposition as it only requires the real-valued period. Upon the determination of the candidate period, a harmonic model with a linear trend is fitted with this period across the time-series data. In order for the model to have the degrees of freedom required to accurately fit the data, artificial data points are bound into the time-series. These artificial data points have a uniform distribution in time and a magnitude equal to the mean magnitude of the time-series. The artificial data points must be assigned zero weight so as not to contribute to the fitted model. As the weights are defined as the reciprocal of the magnitude error, the artificial data points are assigned a magnitude error of positive infinity. Weighted linear regression is performed on this new time-series to fit an m-harmonic sinusoid model using the period detected by the periodogram and is demonstrated by Equation 5.21. This model has ten coefficients in our method as we use m = 4 to sufficiently model the light curve.

y(t) = ct + \sum_{j=1}^{m} \left( a_j \sin(2πjf_1t) + b_j \cos(2πjf_1t) \right) + b_0   (5.21)

where b_0 is the mean magnitude of the light curve and c is the linear trend of the time-series. a_j and b_j are Fourier coefficients for the fitted model. f_1 is the frequency from the periodogram. This model is subtracted from the time-series in a process called pre-whitening to eliminate any periodic activity within the time-series based on the dominant period detected by the periodogram. This pre-whitened time-series is processed with another period estimation method to identify a second period independent of the first dominant period. A harmonic model is fitted for this period and subtracted off in a second pre-whitening phase. Finally, a third period is identified independent of the first two periods. The time-series is then restored to its original state, archived prior to the pre-whitening operations. A harmonic best-fit is computed using weighted linear regression and a model with twenty-six coefficients. This is shown in Equation 5.22 when n = 3. It is possible to perform additional pre-whitening operations for additional model complexity although overfitting is a concern.

y(t) = ct + \sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ij} \sin(2πjf_it) + b_{ij} \cos(2πjf_it) \right) + b_0   (5.22)

The linear trend of the time-series, calculated alongside the sinusoidal model, is retained as a feature of the object. By calculating this linear trend in the same regression operation as the sinusoidal models, a time-series with a non-integer number of wavelengths within the sampling period (with a corresponding trend) will not interfere with the linear trend caused by a gradual brightening or dimming of the object, an important feature.

The frequencies f_i and the coefficients a_{ij} and b_{ij} are retained to provide a good description of the light curve as long as it is periodic and accurately reproducible as a sum of sinusoids. These coefficients are not time-translation invariant and are transformed into better descriptors of the light curve. This is accomplished by transforming the Fourier coefficients into a set of amplitudes A_{ij} and phases PH_{ij}, determined according to Equations 5.23 and 5.24 respectively (Debosscher et al., 2007).

A_{ij} = \sqrt{a_{ij}^2 + b_{ij}^2}   (5.23)

PH_{ij} = \arctan\left( \frac{b_{ij}}{a_{ij}} \right)   (5.24)

The phases are not time-translation invariant and are defined relative to PH_{11}, the phase of the first harmonic of the dominant period, using Equation 5.25.

PH'_{ij} = \arctan\left( \frac{b_{ij}}{a_{ij}} \right) − \frac{jf_i}{f_1} \arctan\left( \frac{b_{11}}{a_{11}} \right)   (5.25)

The phases are then constrained between −π and +π by the transformation indicated in Equation 5.26. Please note that for simplicity, the double dash is dropped through the rest of this thesis.

PH''_{ij} = \arctan\left( \frac{\sin(PH'_{ij})}{\cos(PH'_{ij})} \right)   (5.26)

For light curves that are primarily monoperiodic, this results in the production of 28 features that are time-translation invariant allowing direct comparison between light curves measured at different phases (Debosscher et al., 2007). Monoperiodic light curves are those that oscillate with one dominant period (Aerts et al., 2006). This assumption does not hold for all potential variable stars but as the primary period is usually highly dominant in multi-period variables, this assumption is a good approximation (Debosscher et al., 2007). These features include the slope of the linear trend, the three frequencies used in the final harmonic model, the twelve amplitude coefficients and eleven phase coefficients (as PH_{11} is always zero it is discarded) and the ratio of data variance (called variance ratio) between the variance before the pre-whitening of the harmonic model of the primary period and after. This statistic is a strong indicator of the importance of the primary period to the light curve relative to the other periods.
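
A sketch of the coefficient transformation of equations 5.23 to 5.26 follows (Python/NumPy; arctan2 is used in place of arctan to resolve the quadrant, an implementation choice rather than the stated formula, and the function name is illustrative).

    import numpy as np

    def amplitudes_and_phases(a, b, freqs):
        """Transform Fourier coefficients a_ij, b_ij (n periods x m harmonics)
        into amplitudes and time-translation invariant phases."""
        A = np.sqrt(a ** 2 + b ** 2)                       # equation 5.23
        PH = np.arctan2(b, a)                              # equation 5.24
        j = np.arange(1, a.shape[1] + 1)                   # harmonic index 1..m
        # phases relative to PH_11 (equation 5.25)
        rel = PH - (j[None, :] * freqs[:, None] / freqs[0]) * PH[0, 0]
        # constrain to (-pi, +pi] (equation 5.26)
        return A, np.arctan2(np.sin(rel), np.cos(rel))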

The Fourier decomposition method is powerful but is sensitive to noise and outliers. Reducing the number of model parameters is a solution to preventing overfitting on noise but can also result in a loss of information on complicated astrophysical signals. An alternative method is the use of L2 regularisation (Richards et al., 2012). L2 regularisation is a technique applied to optimisation methods such as the harmonic regression used in fitting the Fourier models. It adds a second term to the least squares minimisation used in the harmonic regression which places a penalisation on the higher harmonic amplitudes and phases. If the higher harmonic components attempt to fit on noise, the penalty will cause their parameters to asymptotically approach zero eliminating their effect. However, in the case of fitting a complicated signal with sufficient supporting evidence in the light curve, the higher harmonic parameters will resist the regularisation and maintain their contribution. Equation 5.27 demonstrates the optimisation function to be minimised in this regularised regression fitting.

Figure 5.2: The light curve of the star Mira with a regularised harmonic fit with a primary period of 332.57 days determined by the GRAPE method.

R(θ, λ) = \sum_{i=1}^{N} \frac{(d_i − m_i)^2}{σ_i^2} + Nλ \sum_{n=1}^{4} n^4 \left( A_n^2 + B_n^2 \right)   (5.27)

where θ is the vector of model parameters, λ is the regularisation parameter, N is the number of light curve points, d_i are the photometric data points, m_i are the model data points and \sqrt{A_n^2 + B_n^2} is the amplitude of the n-th Fourier harmonic. The value of the regularisation parameter allows the control of the smoothing of the model with small values allowing the modelling of high frequency structure and large values smoothing this structure out. For the SkycamT light curves the regularised fit is applied with a regularisation parameter of 0.01 determined by manual inspection of the resulting models across a set of known variable light curves. In the event that the regularisation procedure fails, the light curves are fit with a non-regularised procedure at the risk of overfitting. The regularisation is performed using the normal equation shown in equation 5.28.

θ = \left( X^T X + λW \right)^{−1} X^T y   (5.28)

where X is the ‘design matrix’ of the problem, a matrix with the Fourier components in columns and the data points in rows, λ is the regularisation parameter, W is the matrix of regularisation weights determined from equation 5.27, y is the vector of data point values and X^T denotes the transpose of X. This also provides the additional benefit of making X^T X + λW invertible, as the normal equation cannot be solved if X^T X is non-invertible, also known as a singular matrix.
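
A sketch of the regularised normal equation fit for a single period follows (Python/NumPy; the design matrix construction mirrors equation 5.21 and the n^4 penalty of equation 5.27, while the exact column ordering and the unpenalised trend and intercept terms are assumptions).

    import numpy as np

    def regularised_fit(times, mags, f1, m=4, lam=0.01):
        """Solve theta = (X^T X + lam W)^-1 X^T y (equation 5.28) for one period."""
        cols, penalty = [times], [0.0]                  # linear trend ct, unpenalised
        for j in range(1, m + 1):
            cols += [np.sin(2 * np.pi * j * f1 * times),
                     np.cos(2 * np.pi * j * f1 * times)]
            penalty += [j ** 4, j ** 4]                 # n^4 weighting from equation 5.27
        cols.append(np.ones_like(times))                # intercept b0, unpenalised
        penalty.append(0.0)
        X = np.column_stack(cols)
        W = len(times) * np.diag(penalty)               # N multiplies lambda in equation 5.27
        return np.linalg.solve(X.T @ X + lam * W, X.T @ mags)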

Figure 5.2 demonstrates the model produced using the described method for the SkycamT data collected on the star Mira, the prototype of the Mira class variables, showing a clear sinusoidal oscillation. The period of Mira has been widely reported as 332 days, verified by surveys such as Hipparcos (Bedding and Zijlstra, 1998). In the event of the periodogram returning a result similar to the star's correct period, linear regression can produce an accurate model despite the prevalence of noise within the time-series data.

5.2.2 Richards et al. features

Richards et al. utilised a large number of features in their machine learned classification models for the All-Sky Automated Survey (ASAS) (Richards et al., 2012). Whilst most of their features have already been discussed in the variability indices and Fourier decomposition subsections, they did introduce a number of novel features to characterise the performance of period estimation methods in addition to descriptors of the phase positions of eclipse features for eclipsing binary classification. Eclipsing binaries are not well modelled by sinusoidal models therefore the Fourier decomposition is of limited use as higher harmonic components are required for good fits yet these parameters are heavily regularised.

There are five eclipsing binary features which have been calculated from the Skycam light curves by making use of the PolyFit algorithm discussed in chapter 6. This allows the phases of the maxima and minima to be more carefully isolated than if extracted from the Fourier harmonic model. They are defined as:

• Eclipse Max Delta Mags

This feature determines the absolute value of the magnitude difference between the two maxima of the phase-folded light curve at 2× the candidate period. These maxima correspond to the primary and secondary eclipses for an eclipsing binary. For most eclipsing binaries this value will be non-zero as the eclipses have differing heights whereas for a light curve folded at 2× the true period, the two maxima should be the same producing a feature of zero.

• Eclipse Min Delta Mags

This feature determines the absolute value of the magnitude difference between the two minima of the phase-folded light curve at 2× the candidate period. This feature measures the relative out-of-eclipse brightness difference between the primary and secondary eclipse and the secondary and primary eclipse. For most eclipsing binaries this is expected to be zero unless an additional effect is present such as the distortion wave of an RS Canum Venaticorum variable.

• Eclipse Phase Ratio

This feature is used to define the eccentricity of an eclipsing binary based on the phases of the primary and secondary eclipses. It is defined as the ratio between the phase difference of the first minimum and first maximum associated with the primary eclipse and the phase difference between the second minimum and second maximum associated with the secondary eclipse. This feature takes on values near unity for highly symmetrical eclipsing binaries. Values diverging from unity suggest the presence of either an eccentric eclipsing binary or some other class of non-eclipsing variable.

• Reference Phase

This feature defines the location of the reference phase, the phase associated with the out-of-eclipse brightest magnitude preceding the primary eclipse. This feature reflects the performance of the PolyFit algorithm at fitting the phase-folded light curve.

• Period Double Ratio

The Period Double Ratio is a new feature we have defined which is also designed to assist in the identification of light curves where the period estimation method has produced a period at half of the true astrophysical period. The period estimated by the GRAPE method is fine-tuned through the use of the Variance Ratio Periodogram, a multi-harmonic period estimation which can correctly model non-sinusoidal signals such as the eclipsing binary light curve. The Period Double Ratio is defined as the ratio between the variance ratio of the candidate period and the variance ratio of twice the candidate period. This calculation is shown in equation 5.29.

P_{rat} = \frac{V_{rat}(P)}{V_{rat}(2P)}   (5.29)

where P_{rat} is the Period Double Ratio and V_{rat}(x) is the variance ratio calculated for a four harmonic Fourier fit determined by a candidate period x. If this feature is greater than unity, the better fitting model is the one generated by the candidate period indicating the light curve is either a pulsating star or a very close contact binary. On the other hand, if the feature is less than unity, the light curve is likely an eclipsing binary with a better fitting multi-harmonic model at twice the candidate period.

Richards et al. also introduced a set of features designed to describe the performance of the Fourier harmonic model in fitting the light curves (Richards et al., 2012). Using statistics that quantify the normality and scatter of the residuals of the fit calculated by equation 5.30, light curves with complexity beyond the modelling capability of the harmonic models can be identified. These features have been adopted from the work of Dubath et al. and Kim et al. from their classification methods (Dubath et al., 2011; Kim et al., 2011).

r_i = m_i − \hat{m}_i   (5.30)

where r_i is the residual of the i-th data point, m_i is the magnitude of the i-th data point and \hat{m}_i is the predicted magnitude of the i-th data point as determined by the harmonic Fourier model. There are two features that quantify the statistics of the residuals of the harmonic Fourier model defined as:

• Residual Normality

This feature is produced by applying the Anderson-Darling test discussed above to the set of residuals calculated by equation 5.30. For light curves with signal and Gaussian noise, the application of a good-fitting harmonic model to the light curve should leave purely Gaussian noise which will return a highly normal distribution. For real light curves such as the SkycamT light curves, the correlated noise does reduce the effectiveness of this test but it might still be of some use to the classifiers.

• Residual Raw Scatter

The Residual Raw Scatter determines the ratio of the range of the spread of magnitudes of the residuals to the initial amplitude. If the residuals have values across a large range of magnitudes it suggests either a poor fit by the harmonic model or the light curve has a large noise contribution. As the light curves are assumed to have similar noise statistics, the classifier will interpret the relative values of this feature as a measure of the goodness-of-fit of the harmonic model. Residual Raw Scatter is defined by equation 5.31.

r_{MAD} = \frac{med(|r − med(r)|)}{amp}   (5.31)

where r_{MAD} is the Residual Raw Scatter, med is the median operation, r is the vector of residual magnitudes and amp is the amplitude of the initial light curve.

There is one final feature named Squared Differences over Variance. This feature is defined as the sum of the squared magnitude differences in successive measurements divided by the variance of the light curve (Kim et al., 2011). It is equivalent to the unnormalised variability index so it is much more sensitive to the changes in well sampled light curves. The normalised variability index is likely more useful for light curve classification but this is implemented regardless and can be removed by feature selection operations. Equation 5.32 demonstrates the calculation of this feature.

sdv = \frac{1}{σ^2} \sum_{i=1}^{N−1} (m_{i+1} − m_i)^2   (5.32)

where sdv is the Squared Differences over Variance feature, σ is the standard deviation of the magnitudes of the light curve, N is the total number of data points in the light curve and m_i is the magnitude value of the i-th data point.

5.2.3 Kim & Bailer-Jones features

In 2016 Kim & Bailer-Jones published a new package for the classification of variable stars named UPSILoN: AUtomated Classification for Periodic Variable Stars using MachIne LearNing (Kim and Bailer-Jones, 2016). This package uses a machine learning classification model trained using Random Forest classifiers from a high quality set of OGLE and EROS-2 data. The classification performance was then tested on a set of data from the MACHO, LINEAR and ASAS surveys achieving reasonable cross-survey performance. The periodic light curves were defined using a set of 16 features including period and Fourier decomposition components. A number of the features have already been implemented such as period and a set of variability indices such as skewness, kurtosis, the Stetson-K index, the interquartile range and the folded variability index.

The remaining features range from modifications of the Fourier decomposition components to novel features which describe a previously unmeasured light curve property. In the Fourier Decomposition method there are multiple features describing amplitude and phase. Kim & Bailer-Jones identified that the ratio or difference between these features was sufficient to describe the primary properties of the light curves of the differing classes. Their method uses a Fourier model with the primary period and four harmonics of which they retain the period and two harmonics to produce two amplitude ratios and two phase differences as well as retaining the amplitude of the primary period fit.

These are defined as:

• R_{21}, Ratio between the 2nd and 1st amplitudes from the Fourier decomposition.

• R_{31}, Ratio between the 3rd and 1st amplitudes from the Fourier decomposition.

• φ_{21}, Difference between the 2nd and 1st phases from the Fourier decomposition.

• φ_{31}, Difference between the 3rd and 1st phases from the Fourier decomposition.

There are two more features of interest defined by a previous study (Long et al., 2012):

• mp10, 10% percentile of slopes for 2× candidate period, phase-folded light curve.

• mp90, 90% percentile of slopes for 2× candidate period, phase-folded light curve.

These features are calculated by computing the slopes between each consecutive pair of data points phased at a candidate period and sorted by phase as shown in equation 5.33.

δ_i = \frac{φ_{i+1} − φ_i}{m_{i+1} − m_i}  for i = 1, 2, \ldots, N − 1   (5.33)

where φ_i is the phase of the i-th data point and m_i is the magnitude of the i-th data point. mp10 and mp90 are then simply generated by taking the 10% percentile and 90% percentile respectively.

The work by Kim & Bailer-Jones also makes use of a different normality statistic to the Anderson-Darling method used as one of the variability indices. Their normality statistic is the Shapiro-Wilk normality test. This method has advantages over the Anderson-Darling method, as it has been shown to perform slightly better in a Monte Carlo analysis (Razali and Wah, 2011). Despite this, the implementation we made use of was limited to a maximum of 5000 data points and therefore was not applicable to many of the Skycam light curves. We therefore do not implement the Shapiro-Wilk test and retain our original statistic. Due to the correlated noise in the Skycam data, this feature is in any case unlikely to be of much use, as no Skycam light curve closely approximates a normal distribution.

5.3 Training Classification Models

To demonstrate the capability of these sets of features for classification, we selected 2326 variable star light curves from the STILT database on which to compute and evaluate machine learned models. These light curves have over 100 observations per object and have been cross-matched to the AAVSO variable star index with a tolerance of 3.6″. They have been manually classified into eight classes, as shown in Table 5.1 along with the number of light curves per class. There are four primary classes, with the eclipsing binaries split into three subtypes: contact, semi-detached and detached. The Long Period Variables have also been split into Mira class variables and Semiregular variables. The utilised features are heavily influenced by the work of Richards et al. (Richards et al., 2011b), which has been integrated into the new Feature Analysis for Time Series (FATS) library produced by the Institute for Applied Computational Science at Harvard University (Nun et al., 2015), and they are used in this light curve classification problem.

Table 5.1: The class distribution of the STILT 2326 variable light curves.

Class  Type                                Acronym  Count
1      Delta Cepheid Variables             DCEP       132
2      Delta Scuti Variables               DSCT       499
3      Eclipsing Binaries – Detached       EA         683
4      Eclipsing Binaries – Semi-detached  EB         242
5      Eclipsing Binaries – Contact        EW         291
6      Mira-Type Long Period Variables     M          149
7      RR Lyrae Variables                  RR         114
8      Semiregular Long Period Variables   SR         216

This experiment utilises 62 features designed to differentiate between the eight classes in Table 5.1. The methodology used to create these models is based on traditional shallow machine learning methods using the Random Forest algorithm, with a layout identical to the flowchart shown in figure 4.1.

5.3.1 Feature Selection

There is likely some redundancy of information in the 62 features for the optimal discrimination of these eight classes. Machine Learning classifiers have been found to suffer a performance penalty because of this redundancy, although Random Forest classifiers are resilient due to their nature as an ensemble of weak classification trees (Brink et al., 2013). Therefore, features that contain less useful information for this classification task need to be eliminated (Donalek et al., 2013). We created a backwards-selection algorithm based on a previous method that uses a Random Forest classifier on different subsets of the features to iteratively deselect them (Brink et al., 2013). This algorithm makes use of a cross-validation step with a configurable number of folds and repeats. In this experiment, we used 5-fold cross-validation with two repeats to train a set of models with which to evaluate the features. The 5-fold cross-validation involves randomising the order of the 2326 light curves, which are then separated into five sets. These sets are stratified, meaning each subset has approximately the same proportion of each class as the complete collection. The sets are used to validate five random forest models: one subset is selected as a validation set and the other four are used to train the learned model, performed five times so that each subset serves once for validation. The 2326 light curves are re-randomised for the second repeat and the same process is repeated.
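The selection loop itself is compact. The sketch below is a minimal scikit-learn rendering of the procedure rather than the R implementation actually used in this work; the function name is ours, and the figure of merit anticipates the multi-class ROC AUC described below.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def backward_select(X, y, feature_names, n_keep=3):
    """Iteratively drop the feature whose removal gives the best
    cross-validated figure of merit (mean one-vs-rest ROC AUC)."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
    kept = list(feature_names)
    while len(kept) > n_keep:
        trials = []
        for f in kept:                        # try removing each feature
            cols = [feature_names.index(k) for k in kept if k != f]
            fom = cross_val_score(RandomForestClassifier(n_estimators=500),
                                  X[:, cols], y, cv=cv,
                                  scoring='roc_auc_ovr').mean()
            trials.append((fom, f))
        best_fom, to_remove = max(trials)     # best FoM after one removal
        kept.remove(to_remove)
    return kept
```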

Each validation step is used to produce a figure of merit (FoM), a metric that defines the success of each model on the successful classification of all eight classes. We chose a multi-class Receiver Operating Characteristic (ROC) Area under the Curve (AUC) statistic that reflects the overall performance on all the classes. By maximising this metric, we can select the best model for overall performance. Cross-validation is performed a set number of times, based on the number of features, with each step having one of the features removed. The cross-validation that returns the best FoM then determines which feature to remove in this iteration, namely the feature removed in that particular cross-validation. We selected a cut-off of three features in this experiment.

Table 5.2: The 16 selected features from the Mean Decrease Gini method.

Number  Name                                           Mean Decrease Gini
1       Mean Magnitude                                 146.66110
2       Range of a Cumulative Sum                      102.65558
3       Folded Range of a Cumulative Sum                99.57324
4       Amplitude of first frequency, first harmonic    99.07806
5       Q31 Interquartile range                         96.34338
6       Percent Difference Flux Percentile              96.26259
7       Variance Ratio                                  95.24125
8       Median Absolute Deviation                       92.05929
9       Mean Variance                                   90.49209
10      Folded Variability Index                        85.90732
11      Stetson Kurtosis Measure                        82.42626
12      Skewness                                        81.37099
13      First identified (dominant) Frequency           78.97057
14      P-Value of first frequency                      78.68567
15      Autocorrelation Length                          77.47186
16      Renyi Quadratic Entropy                         76.93127

For 62 features, this process can be computationally expensive. One solution is to drop the number of models generated in each cross-validation, but this was found to decrease the performance of the algorithm due to variance in the AUC of the trained models. Instead we opted for a second, rougher feature selection method to initialise the backwards selection by quickly eliminating a large number of the features in a single cross-validation step. This is accomplished through a single 10-fold cross-validation with three repeats on all 62 features, producing 30 Random Forest models. Each model has a statistic called the Mean Decrease Gini, a measure of how much each feature contributes to the nodes and leaves in the numerous trees of the random forest model, which can be used as a simple description of the importance of the features in the learned models. For each of these thirty models, which all differ slightly due to their individual dataset sampling, we took the mean of the thirty individual Mean Decrease Gini measures for each of the 62 features and selected the 16 features with the highest mean value. Sixteen was chosen arbitrarily to limit the computational expense of the backwards selection algorithm and was not a required value. These 16 features are displayed in Table 5.2. The backwards selection algorithm is then performed on these 16 features with the iterative removal of one feature at a time down to three final features.
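A rough sketch of this screening step, again in scikit-learn terms, where the impurity-based `feature_importances_` attribute plays the role of the Mean Decrease Gini statistic; the repeated refits with different seeds stand in for the 30 cross-validation models, and all names are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def screen_features(X, y, feature_names, n_select=16, n_models=30):
    """Rank features by their mean impurity-based importance over many
    independently seeded forests and keep the n_select highest."""
    importances = np.zeros((n_models, X.shape[1]))
    for seed in range(n_models):
        rf = RandomForestClassifier(n_estimators=500, random_state=seed)
        importances[seed] = rf.fit(X, y).feature_importances_
    ranked = np.argsort(importances.mean(axis=0))[::-1]
    return [feature_names[i] for i in ranked[:n_select]]
```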

Figure 5.3: The 16 selected features from the Mean Decrease Gini method removed iteratively using the backwards selection random forest algorithm. Peak performance attained after the removal of the Median Absolute Deviation feature.

The performance of these models is expected to improve over the first iterations, whilst poorer features are removed from the learned models, until a threshold where useful information starts to be lost, resulting in a performance decrease. This was observed after 7 of the 16 features had been eliminated (iteration 8), as shown in Figure 5.3.

From this feature selection, we retained 9 features from the original 62 that appear to be the most informative for our dataset. The features selected by this method exhibit a few interesting results. Many of our folded light curve representations have been rejected. These are usually expected to strongly identify the individual signals of the different classes and carry information on additional periodicities. The variance ratio statistic performs a similar function.

However, for many classes, the period detected by the Lomb-Scargle Periodogram fails to align with the reference AAVSO period. For many objects it fails to identify a period inside the normal range for their associated class of object. Only the Mira, Delta Cepheid and RR Lyrae classes have a reasonable fraction of correct period estimations. This explains why the dominant frequency and dominant amplitude features remain in the selected set, although with importance metrics that are substantially weaker than would be expected.

For many of the classes in our dataset, a good performance is expected from these two features alone (Richards et al., 2011b). Therefore, many of our surviving features are from the non-periodic (independent) set, which do not require correct period estimation. The mean magnitude feature exhibited the highest Mean Decrease Gini value when tested with the initial feature selection method. This feature is more dependent on the distance between the light source and the Earth than on the actual luminosity of the object. However, the CCD instrument does have an associated performance based on the apparent magnitude of a light source. It is possible that, combined with one or more of the other features, the mean magnitude is communicating useful information about the quality of the light curves. The performance of the Mean Magnitude could also indicate a selection bias present in our training data. This feature is strong at identifying our Mira-type Long Period Variable training objects. These stars are large, bright stars (Percy, 2008) and are easily detectable due to their large brightness changes. Their increased brightness allows them to be detected at larger distances from Earth, allowing this class to become dominant at the brighter magnitudes in our training dataset.

5.3.2 Random Forest classification models

With the selection of nine features for our classification task, we then performed a 10-fold cross-validation with three repeats on the 2326 STILT light curves. We made use of a random forest classifier as it was found to produce state-of-the-art results using these features (Richards et al., 2011b). We used the same random forest algorithm utilised in the backwards feature selection to explore the hyper-parameters more thoroughly. The Random Forest algorithm uses three primary parameters to generate the classification trees used for the learning task: mtry defines the number of variables that are randomly sampled as candidates for each branching in the classification trees; ntree defines the number of decision trees generated in the random forest; and nodesize adjusts the size of the trees' terminal nodes in terms of the number of training objects populating each node. These tuning parameters smooth the learning problem by affecting the complexity of the decision boundaries estimated by the random forest ensemble (Brink et al., 2013).

Figure 5.4: The 9 selected features utilised in a cross-validation grid-search tuning of the random forest hyper-parameters. Higher data-points represent superior performing models. Our best models were generated with ntree = 500, mtry = 3 and nodesize = 3.

Maintaining our figure-of-merit from the feature selection, we use the mean AUC of our cross-validation to estimate the performance of each hyper-parameter combination in a grid-based search. We utilised 10-fold cross-validation with three repeats for each hyper-parameter combination and deployed 108 combinations. This encompasses every possible configuration of mtry from two to seven, nodesize from two to seven and three configurations of ntree: 500, 1000 and 2000 trees. We find that our optimal performance occurs at ntree = 500, mtry = 3 and nodesize = 3. All of our cross-validation models, regardless of hyper-parameter values, are stable in terms of the standard deviation of the individual cross-validated model AUCs. Figure 5.4 demonstrates the performance of nine different combinations of the ntree and nodesize hyper-parameters when plotted as the cross-validation mean AUC against the mtry hyper-parameter. The performance is fairly stable, with similar results at slight deviations from the optimal hyper-parameters.
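A sketch of an equivalent grid search in scikit-learn terms, where max_features, min_samples_leaf and n_estimators play the roles of mtry, nodesize and ntree respectively; this is an illustration of the procedure, not the R code used in this work.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# mtry -> max_features, nodesize -> min_samples_leaf, ntree -> n_estimators
param_grid = {
    'max_features': list(range(2, 8)),       # six values
    'min_samples_leaf': list(range(2, 8)),   # six values
    'n_estimators': [500, 1000, 2000],       # three values: 108 combinations
}
search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    scoring='roc_auc_ovr',
    cv=RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0),
)
# search.fit(X, y); search.best_params_ then holds the optimal combination
```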

With the hyper-parameter tuning completed, we repeat the 10-fold cross-validation with three repeats using our optimal hyper-parameter setup. We generate confusion matrices and one-versus-many ROC curves for the eight classes in our dataset to understand the individual performance of each class. We demonstrate here one of the thirty trained models that is typical of the standard performance of the cross-validated models. This specific model had an overall multiclass AUC of 61.74%.

Table 5.3: The confusion matrix of the specific 61.74% AUC model on the associated validation set. Mira type stars (M) perform well, but many of the other classes are heavily misclassified, likely because no class-assignment probability threshold is defined, so each test candidate defaults to whichever class has the highest probability.

Prediction/Reference  DCEP  DSCT   EA   EB   EW    M   RR   SR
DCEP                     4     1    3    1    1    0    0    2
DSCT                     1    17   15    7    4    0    1    3
EA                       3    17   26    9   12    0    2    5
EB                       0     3    2    1    1    1    1    1
EW                       0     4   11    3    9    0    3    0
M                        0     1    0    0    0   12    0    0
RR                       0     2    3    1    2    1    3    0
SR                       6     5    9    3    0    1    2   11

The confusion matrix produced for the validation set of this model is shown in Table 5.3. Only the Mira type variable stars exhibited a high sensitivity and specificity, statistics that reflect the rate of false positives and false negatives for each class. This is due to the class-assignment metric being purely the highest class probability for a given test candidate. This is not the optimal method for assigning a class to a candidate as it risks many false positives and false negatives, as can be seen from the confusion matrix in Table 5.3. The method to correct these errors is to accept that for some light curves it might not be possible to assign a class label and that they are best left as unknown. This is accomplished by studying the ROC curves of each individual class and deciding an appropriate false positive rate that minimises these misclassifications at the cost of reducing the true positive rate, i.e. the number of objects of that class in our dataset that we do identify instead of assigning them unknown. Inspecting the ROC curves reveals that a number of our classes have reasonable performance as long as a suitable false positive rate is determined. Figure 5.5 shows the one-versus-many ROC curves for the 61.74% AUC model. These curves reflect the performance of each class weighted against the misclassification of every other class. Some interesting patterns are present. Firstly, the Mira type light curves are well classified by the model. This is not surprising as they have larger variability amplitudes than the other classes and the selected features tend to diverge notably from the remaining classes. Five of the eight classes had reasonable results on the selected set of features.

The more common misclassifications also share some interesting patterns. Four of the five better performing classes, the Delta Cepheids, RR Lyraes, Mira type variables and the contact eclipsing binaries, all exhibit light curve signals that can be reasonably well modelled by the sinusoidal Lomb-Scargle Periodogram. Therefore, it is likely that the dominant frequency feature for these classes is substantially better. The fifth class with reasonable performance, the semi-regular variables, can have sinusoidal signals too but tend to be more variable in their periodicity. They also have the second highest amplitude of variability in the set, which clearly differentiates them from many of the other classes.

Figure 5.5: One-vs-many ROC curves for the 61.74% AUC model. Mira type variable stars (M) perform well with slightly poorer performance from Cepheids, Semi-regular variables and RR Lyraes. Eclipsing Binaries performed extremely poorly.

The detached and semi-detached binaries exhibited significantly worse performance. The signals of these variables tend to be highly non-sinusoidal, resulting in very poor performance of the dominant frequency feature. Additionally, they can have much wider ranges of period and amplitude than the pulsating variable stars, making it more difficult for the learned model to identify a typical parameter set for these classes.

Finally, the Delta Scuti light curves show poor performance. This is due to the amplitude of the brightness variations in this class being minor, likely below the noise threshold, making these signals extremely difficult to detect. This results in a complete failure of the Periodogram to identify the underlying signal, even if it is highly sinusoidal, and therefore the amplitude and period features end up producing incorrect results.

Chapter 6

Representation Learning Features

The ‘Automated Extraction of Visual Representation Features’ section has been published in the International Joint Conference on Neural Networks 2017 (IJCNN ‘17) proceedings (McWhirter et al., 2017) under the title ‘The classification of periodic light curves from non-survey optimized observational data through automated extraction of phase-based visual features’.

Chapter 5 presented a set of carefully engineered features for the description of light curves. The process of developing these features was powered by over a decade of work performed by experts in the analysis of light curves. The features they developed were designed for the generation of survey data present at the time, with many systems relying on OGLE (Udalski et al., 1997; Richards et al., 2011b), MACHO (Alcock et al., 2000; Kim et al., 2011), EROS2 (Rahal et al., 2009; Protopapas et al., 2015) and Kepler (LaCourse et al., 2015; Kugler et al., 2016; Matijevic, 2012; Parvizi et al., 2014; Neff et al., 2014) data. As a result, despite the best of intentions, biases have been introduced into the classification process. Firstly, the experts involved in this process will have individual and combined opinions on the properties that define light curves. These objects are often part of a continuous range of types with fairly ambiguous boundaries, such as the Type II Cepheids which extend from the short period BL Herculis objects to the long period RV Tauri stars (Percy, 2008). Whilst period ranges have been used to define these classes, it can be difficult to distinguish between a long period BL Herculis variable and a short period W Virginis variable. Secondly, features that are designed for, and perform well at, classification tasks on one set of surveys are not guaranteed to perform well on later surveys with differing statistics (Benavente et al., 2017).

In regards to the classifiers, research has been conducted into mapping the models trained on one survey to the unclassified data of another, such as considering different survey statistics to be a rotation in a higher dimensional feature space (Benavente et al., 2017). Other approaches aim to produce a set of highly capable classification models on a subset of object types with high performance and then combine them into a meta-classification model for improved multi-survey capability (Pichara et al., 2016). The experiment at the end of chapter 5 demonstrates that, whilst the features derived from the works of Richards et al. (Richards et al., 2011b) and Kim & Bailer-Jones (Kim and Bailer-Jones, 2016) are useful in the classification of Skycam light curves, many of the statistical features are heavily under-represented in the resulting mean decrease Gini importance measurements. The physical reasons for these features to be important for classification remain intact, but the considerable noise in the Skycam data heavily poisons them. As a result, research was conducted into the computation of a set of new features tuned for performance on the Skycam light curves. Representation Learning is a machine learning technique which extracts useful non-linear representation features from the raw data based on their performance at a given task, such as the classification of variable star light curves (Bengio et al., 2013).

In this chapter we introduce two novel methods for the automatic extraction of light curve classification features, tuned specifically for performance on the Skycam database. The first method transforms the light curves into a 2D image representation which is then processed by a number of machine learning classifiers to identify groups of useful pixels. This method identifies features useful for classification, but inferior to the previously engineered features; therefore a second method was implemented using an interpolated PolyFit light curve model applied to an epoch-folded light curve at the GRAPE period (Prsa et al., 2008). The application of a Principal Component Analysis method to the interpolated PolyFit model allows for the production of a finite set of features that describe the shape of the model. Additionally, the interpolated PolyFit model is used to produce replacements for the engineered features which suffer poor performance due to the noise, by using the interpolation to ‘clean’ the noise from the light curve under the assumption that the GRAPE estimated period is correct.

6.1 Automated Extraction of Visual Representation Features

In this section we introduce the initial results and problems from the application of the methods from these previous studies to the STILT observations, and propose a method of automatically extracting shape-based features from the phase-folded light curves through the use of multiple learning algorithms trained to recognise visual features, mirroring the methodology employed manually by astronomers. In the future, additional topologies will be introduced to further power this feature extraction and allow for the classification of super- and sub-classes in a large hierarchical multiclass problem.

Table 6.1: Data Summary for the 2519 object SkycamT light curve dataset.

Class  Type                            Acronym  Count
1      δ Cepheid (Classical Cepheids)  DCEP       132
2      δ Scuti                         DSCT       499
3      Eclipsing Binaries              EB        1409
4      Long Period Variables           LPV        365
5      RR Lyrae Variables              RR         114

We present Random Forest, Support Vector Machine and Feedforward Neural Network models to classify 2519 SkycamT variable star light curves. We extract 16 features found to be highly informative in previous studies (Kim and Bailer-Jones, 2016) and achieve an area under the curve of 0.8495 using a feedforward neural network with 50 hidden neurons trained with stratified 10-fold cross-validation with 3 repeats. We propose an automated visual feature extraction technique that transforms bin-averaged phase-folded light curves into image-based representations. This eliminates much of the noise, and the missing phase data due to sampling defects should have a less destructive effect on these shape features as they remain at least partially present. There is also no need for feature engineering, as the learning algorithms can learn shape features directly from the light curves. We produced a set of scaled images based on a threshold of data points in each pixel. Training on the same feedforward network, we achieve an area under the curve of 0.6348. By introducing the Period and Amplitude as features into this dataset, thereby giving meaning to the dimensions of the image, we show this improves to 0.7952. Our current models lack translational invariance, and the method may be better suited to specific sub-classification problems common in the variable object hierarchical multi-class problem.

Reliable class information is required for a subset of objects in the SkycamT database in order to test classification methods on these light curves. The optimal method to extract this class information is through a comparison between the SkycamT object data and a variable star catalogue. The American Association of Variable Star Observers (AAVSO) operates one of the largest and best-updated catalogues of nearby bright variable stars in the world, the AAVSO International Variable Star Index. This catalogue does not contain any of the AAVSO gathered light curves, but it does contain data on 373,565 known variable stars including their name, coordinates in right ascension and declination and their currently identified class. The coordinates of these variable stars were matched to objects in the STILT database with a tolerance of 3.6″ (seemingly sufficient to avoid detection collisions between nearby stars) and a minimum of 100 individual observations. This resulted in 12,461 variable stars of various types. Five variable star super-classes were selected which describe a large number of known periodic variable object types. They were all well represented in this dataset, leaving 2519 corresponding objects. Table 6.1 demonstrates the class-by-class breakdown of this dataset.

Table 6.2: Kim et al. (Kim and Bailer-Jones, 2016) 16 variability features.

Feature  Description                                                Reference
Period   Period derived by the Lomb-Scargle Periodogram             Kim et al. 2014
Ψn       Variability index of a phase-folded light curve            Kim et al. 2014
Ψcs      Cumulative sum index of a phase-folded light curve         Kim et al. 2014
R21      Amplitude ratio of the 2nd and 1st Fourier harmonics       Kim et al. 2014
R31      Amplitude ratio of the 3rd and 1st Fourier harmonics       Kim et al. 2014
Φ21      Difference between the 2nd and 1st phases                  Kim et al. 2014
Φ31      Difference between the 3rd and 1st phases                  Kim et al. 2014
γ1       Skewness                                                   Kim et al. 2014
γ2       Kurtosis                                                   Kim et al. 2014
K        Stetson K index                                            Kim et al. 2014
Q3−1     Difference between the 3rd and 1st quartiles               Kim et al. 2014
A        Ratio of magnitudes brighter or fainter than the average   Kim et al. 2016
H1       Amplitude from Fourier decomposition                       Kim et al. 2016
W        Shapiro-Wilk normality test                                Kim et al. 2016
mp10     10th percentile of slopes of a phase-folded light curve    Long et al. 2012
mp90     90th percentile of slopes of a phase-folded light curve    Long et al. 2012

6.1.1 Epoch-Folded Image Representation method

Following the selection of these 2519 variable light curves, the performance of the features used in previous high-performance general purpose classifiers was established. The 16 features used by Kim et al. (Kim and Bailer-Jones, 2016) were chosen for this operation as they had been shown to be capable of reliably separating super-classes as well as achieving respectable inter-class accuracy. These features are shown in Table 6.2.

The Lomb-Scargle Periodogram (Lomb, 1976; Scargle, 1982; Ruf, 1999) utilised in this method operated over a linear frequency grid from the reciprocal of the total observation time of a light curve up to 20 cycles/day. The interval between candidate frequencies is shown in equation 6.1, where tmax and tmin are the last and first observation times respectively.

\[
f_{\mathrm{step}} = \frac{0.25}{t_{\mathrm{max}} - t_{\mathrm{min}}} \tag{6.1}
\]

When calculated using the Lomb-Scargle Periodogram on light curves from the STILT dataset, the Period feature has a correct match rate of under 5% relative to the AAVSO reference period (which is treated as the ground truth in this study) for many of the classes. As the period calculated by the Lomb-Scargle Periodogram is the basis on which phase-folded light curves are generated, this inaccuracy heavily pollutes an additional 9 features, over half the number of features used in this analysis. As for the non-folded features, their distributions amongst the classes appear to centre at or near the expected means; however, the ranges are much greater than expected, increasing the overlap between classes. This is likely a result of the larger-than-usual noise threshold in the STILT data (McWhirter et al., 2016). Classifiers trained using these polluted features resulted in models with accuracies only slightly better than the no-information accuracy, the expected result of a completely randomly trained model; what little success remained was primarily due to the Long Period Variable class, which exhibits long period sinusoidal variations with a high amplitude signal. Therefore, for the following analysis, the periodogram-derived period was replaced with the AAVSO reference period, purifying the features shown in figure 6.1.
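For illustration, the frequency grid and periodogram evaluation can be sketched as follows, using astropy's Lomb-Scargle implementation purely as a stand-in for the periodogram code actually used; observation times are assumed to be in days and the function name is ours.

```python
import numpy as np
from astropy.timeseries import LombScargle

def periodogram(times, mags):
    """Linear frequency grid from 1/T up to 20 cycles/day with the spacing
    of equation 6.1, evaluated with a Lomb-Scargle periodogram."""
    t_span = times.max() - times.min()
    f_step = 0.25 / t_span                         # equation 6.1
    freqs = np.arange(1.0 / t_span, 20.0, f_step)
    return freqs, LombScargle(times, mags).power(freqs)
```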

These 16 features have been shown to be fully capable of training useful classifiers in previous work; however, they require the fitting of Fourier models to the light curves. These Fourier models are used to generate features such as the amplitude ratios, the phase information and the amplitude of the first harmonic. In the method of Kim et al. utilised in this work, a five harmonic Fourier model is fit to the light curves (Kim and Bailer-Jones, 2016). These models are very versatile but suffer from three serious drawbacks on our dataset. Firstly, the sums of sinusoids used to assemble Fourier models fit some classes of astrophysical signal poorly, such as the sharp dips associated with eclipsing binary stars (Richards et al., 2011b). Secondly, the more harmonics used in a Fourier model, the more complex the signal it can fit. Unfortunately, this can also result in overfitting the light curve data, causing the noise to make an unwanted contribution to any resulting features. Finally, as these Fourier models are fit in the time dimension, poorly sampled regions can cause the fitted model to deviate outside expected ranges, again resulting in a non-signal contribution to the Fourier coefficients. The cadence concerns raised for the STILT dataset can make this a considerable source of poor results.

It is important for the analysis method to address the above dangers given the weight these extracted features carry. As the classes we are attempting to train models on are highly periodic, we propose to transform the light curves into an epoch-folded representation and extract features from it. These features describe the shape of the brightness changes through the dominant periodic variation by ‘folding’ all the gathered data points into one waveform. This is very useful in astronomy due to the limitations in gathering data. In fact, it is one of the most powerful techniques for eliminating sampling issues, as long as the light curve does have a dominant period (Paegert et al., 2014). For non-periodic variable objects in astronomy, such as transient light sources, other approaches must be considered. In the case of many periodic variable objects, the shapes of the light curves in these phase-folded representations carry significant information about the class of the light source.

Figure 6.1: Plot of each of the 16 features against the five super-classes in the order shown in table 6.1. Many features appear to poorly differentiate the super-classes with light curves from the STILT dataset.

Figure 6.2: STILT dataset folded light curves of the star Mira (Mira class) with a 332 day period, Algol (Algol-type eclipsing binary) with a 2.86 day period and Eta Aquilae (Delta Cepheid class) with a period of 7.18 days. The shape of each light curve is distinctive to the associated class.

Figure 6.2 shows an example of three of the light curves in this dataset: a Mira-type Long Period Variable, an Algol-type eclipsing binary and a Delta Cepheid. These light curves have been folded at the AAVSO period of the associated objects. Therefore, a light curve must clearly demonstrate these shape features in order for the 16 Kim et al. features (and many more) to extract enough of this information from the noise.

In this form, the poorly sampled regions due to cadence limitations in the original observations can be removed, with the restriction that the dominant period must be a small fraction of the complete observed time. The 16 features of the previous research do contain a number of features extracted from the phase-folded light curve; however, the Fourier model fit is performed in the time-space. We propose to replace the shape features from the Fourier model with new shape features automatically extracted from the folded light curves by applying machine learning algorithms directly to a visualised image-based representation of the folded light curve, in essence allowing the learning of class-specific shapes.

The light curves in figure 6.2 are fairly typical of the better sampled light curves from the STILT database, yet they do exhibit potential issues. Firstly, there are a lot of points with significant noise. This is possibly instrumental in nature, but is much more likely due to a number of simplifications made to reduce the computational load of the pre-processing pipeline. This noise is likely the cause of both the larger range of the non-periodic features and the very low period match rate from the Lomb-Scargle Periodogram. Secondly, whilst the examples in figure 6.2 are well sampled across the whole phase space, other light curves lack this coverage due to the highly variable cadence of the STILT observations. This means that important shape features may be only partly present, and not at the level required by the features extracted in previous studies.

Figure 6.3: STILT dataset folded light curves of the stars Mira (Mira class), Algol (Algol-type eclipsing binary) and Eta Aquilae (Delta Cepheid class) transformed into 28 × 28 pixel images. These images are in a two dimensional data format which can be accessed by machine learning algorithms designed for image recognition.

Yet, despite these obvious limitations, human astronomers can still look at these light curves and recognise the main shape patterns. Therefore it seems reasonable to conclude that even in the more poorly sampled, noisy STILT light curves, there are still unextracted features that astronomers are using for manual classification. Ideally the models used to fit the light curves should attempt to parameterise the shape of the actual variable object classes rather than some predefined or abstract form. This can be done by determining the specific form of different astrophysical signals directly from the astrophysics driving the variability of these object types. This is quite an undertaking, further complicated by a lack of consensus about the dominant physical processes shaping these variabilities in many classes. Therefore, it seems more appropriate to have a model that can identify the patterns in the shape of the light curves without requiring an underlying physically produced model. This can be accomplished through a learning process applied to visualised examples of light curves. Over the last decade, neural networks have been developed into platforms for visual reasoning (Krizhevsky et al., 2012). The ImageNet classification challenge is a good example: a large dataset of images collected into 1000 classes, for which respectable classification accuracy has been achieved through the use of deep networks with convolutional layers for visual feature extraction (Krizhevsky et al., 2012). We attempt to replicate these visual feature extraction layers through the construction of hidden layers tuned to find visual features. As this is just an initial investigation, convolutional layers have not yet been utilised, which does limit the models' ability to detect variable light curves which are translated (i.e. at a different phase) relative to the training data. To minimise this potential issue, the epoch-folded light curves are phase-normalised so that maximum brightness occurs at a phase of 0.25. Figure 6.3 demonstrates a set of 28×28 pixel images generated from the SkycamT light curves shown in figure 6.2.

Figure 6.4: Demonstration of the transformation of the 100x20 pixel phased light curve images into a flattened feature vector of 2000 real values of the pixel activations. These 2000 pixel activations are then used as input into a machine learning algorithm (in this example, a neural network) for training or prediction into one of C classes after non-linear modelling in the H neuron hidden layer.

We first phase-folded the light curves for each of the STILT dataset light sources. This task requires a candidate period. As the Lomb-Scargle Periodogram performs poorly on our data, we instead used the AAVSO period, as we had when the 16 Kim et al. features were produced. This phase space extends from 0 to 1, with the brightest data point placed at a phase of 0.25. Outliers were also eliminated from the light curves by limiting the brightness range of the folded light curve around the mean brightness to the amplitude of the light curve. This amplitude is defined as the difference between the median of the maximum 5% of data points and the median of the minimum 5% of data points, divided by two. In order to emphasise shapes present at the edge of the folded light curve (as shapes can be split when phase values over 1 loop around to 0), the folded light curve was duplicated over a new phase space of -1 to 1. To reduce the noise, we applied a bin-averaging process to the folded light curve data: the phase space was binned into 100 phase bins, each with a phase range of 0.02; all observed data points in each phase bin are mean averaged and retained; empty phase bins are removed. The bin-averaged phase-folded light curves were then used to generate pixelated images. This has a number of important uses. First, we can guarantee an identical number of inputs into our neural network regardless of the sampling of the light curve. Second, the image size can be minimised to a level which optimises for computational cost. For this task we decided to transform the light curves into 100x20 pixel images, giving 2000 input ‘feature’ pixels, as shown in figure 6.4. Each light curve produced a magnitude-scaled image of the amplitude of the light curve centred on its weighted mean.
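A simplified sketch of this transformation under the stated conventions (brightest point shifted to phase 0.25, amplitude from the 5% medians, 100 phase bins, a 100x20 binary image). The duplication over the [-1, 1] phase space, the outlier clipping and the weighted mean are omitted for brevity, and all names are ours.

```python
import numpy as np

def light_curve_image(times, mags, period, width=100, height=20):
    """Phase-fold a light curve, bin-average it and render a binary image
    with -0.5 for 'off' pixels and +0.5 for 'on' pixels."""
    phase = (times / period) % 1.0
    # shift the brightest point (minimum magnitude) to phase 0.25
    phase = (phase - phase[np.argmin(mags)] + 0.25) % 1.0
    # amplitude: half the gap between the medians of the brightest
    # and faintest 5% of points
    m_sorted = np.sort(mags)
    k = max(1, len(mags) // 20)
    amp = (np.median(m_sorted[-k:]) - np.median(m_sorted[:k])) / 2.0
    centre = np.mean(mags)                 # simplification: unweighted mean
    bins = np.minimum((phase * width).astype(int), width - 1)
    image = np.full((height, width), -0.5)
    for b in range(width):
        in_bin = mags[bins == b]
        if in_bin.size == 0:
            continue                       # empty phase bins stay 'off'
        level = (np.mean(in_bin) - (centre - amp)) / (2.0 * amp)
        row = int(np.clip(level * (height - 1), 0, height - 1))
        image[row, b] = 0.5
    return image
```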

6.1.2 Results: Kim et al. Features

Previous studies found that the Kim et al. features performed best when trained using the random forest algorithm (Richards et al., 2011b; Kim and Bailer-Jones, 2016). Therefore, in this study we make use of Random Forest models with a number of different parameters, a Feedforward Neural Network with a single hidden layer of appropriate size and a Support Vector Machine with a linear kernel, as the radial basis function kernel was unable to extract usable information from the features, resulting in all predictions being assigned to the dominant class in the dataset, the eclipsing binaries. All the results were obtained through training on a 3.4 GHz Intel Core i7-3770 processor with 16 GB of memory. RStudio was used as the running environment. The STILT data was stored on a separate 1 TB hard drive within a MySQL database.

The 2519 light curves are evaluated through a process of stratified 10-fold cross-validation with 3 repeats. In this procedure the dataset is split into ten sections whilst maintaining the ratio of each class in the subsets relative to the whole dataset. Each subset is then used as a validation set for a model trained using the other nine subsets. This validation involves the prediction of the classes of the validation set light curves followed by the computation of the Area under the Curve (AUC) statistic from the computed multi-class Receiver Operating Characteristic (ROC) curve. The validation is performed once for each of the ten data subsets and the results are mean averaged to produce the validation statistic for that repeat. The process is repeated three times with differing random seed values and again mean averaged to produce the final validation statistic. As the AUC is expected to vary around a mean value due to slight differences in the quality of the light curves used for training and validation in each cycle, this procedure is hoped to be sufficient to smooth out any variation in the trained models' performance.

The 10-fold cross-validation was first applied to the full 16 feature dataset determined for each of the 2519 light curves. The random forest model was tuned with a hyper-parameter that defines the number of predictors sampled for splitting at each node; the best performance was obtained with this parameter set to 4. The Neural Network was trained using backpropagation on a single hidden layer feedforward network with 16 input neurons, 50 neurons in the hidden layer and 5 neurons in the output layer using a softmax classifier. The hyperbolic tangent function was used for non-linearity and complexity control was introduced through a momentum term valued at 0.9. All weights are initialised with uniform random numbers between 0 and 0.07. The learning rate was set at 0.005. The network was trained using backpropagation for 600 iterations. The Support Vector Machine was tuned using a grid-based search for the best performing cost value, which was found to be 32 for this evaluation. The results of this evaluation are shown in table 6.3.
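For concreteness, a network with the stated topology and training settings might look as follows in PyTorch; the original work was run in R, so this translation and its names are ours, and the softmax output is folded into the cross-entropy loss.

```python
import torch
import torch.nn as nn

# 16 inputs -> 50 tanh hidden units -> 5 outputs; the softmax of the
# output layer is applied implicitly by CrossEntropyLoss below.
model = nn.Sequential(nn.Linear(16, 50), nn.Tanh(), nn.Linear(50, 5))
for p in model.parameters():
    nn.init.uniform_(p, 0.0, 0.07)     # uniform initialisation in [0, 0.07]

loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train(X, y, iterations=600):
    """Full-batch backpropagation; X is a float tensor of shape (N, 16)
    and y a long tensor of class indices."""
    for _ in range(iterations):
        optimiser.zero_grad()
        loss_fn(model(X), y).backward()
        optimiser.step()
```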

Table 6.3: AUC of the three algorithms on the 16 Kim et al. Features.

Model                                             Mean AUC
Random Forest, mtry = 4                           0.84195528
Support Vector Machine, linear kernel, Cost = 32  0.80900461
Feedforward Neural Network, layers: 16-50-5       0.84946005

Table 6.4: AUC of the three algorithms on the scaled 16 Kim et al. Features.

Model                                             Mean AUC
Random Forest, mtry = 4                           0.84195528
Support Vector Machine, linear kernel, Cost = 32  0.80897909
Feedforward Neural Network, layers: 16-50-5       0.76962864

Additionally, as neural networks often perform better with scaled data, the 16 features were scaled to a mean of 0 and a standard deviation of 1, and all three machine learning methods were validated on this scaled dataset, with results shown in table 6.4. The hyper-parameters were re-tested but did not change from the previous validation.

Surprisingly, the scaled dataset resulted in a marginal drop in performance for the Support Vector Machine and a substantial drop for the Neural Network. The results show that our feedforward neural network with 50 hidden neurons achieved the best performance on the STILT light curves. The random forest model had a similar performance, as it makes no assumption of normally scaled data, a result expected from previous studies (Debosscher et al., 2007); the no-information rate corresponds to an AUC statistic value of 0.5.

By using the probabilities predicted by the random forest algorithm for each of the light curves to determine five binary one-versus-all ROC curves, a form of multi-class ROC curve can be plotted. These curves are a measure of a class's true positive rate against its false positive rate, with the ideal classifier maximising the true positive rate whilst minimising the false positive rate. Therefore, the better a class performs, the further its curve deviates towards the top left corner from the random state, a straight line with gradient unity shown by the dotted black line in the figures. Figure 6.5 shows the ROC curve generated by one of the validation 16 feature random forest models with a multiclass AUC of 0.8102. Each line relates to a one-vs-many prediction on a specific class given by the line colour in the legend. This is performed by assigning a class label of 1 to the appropriate class and a label of 0 to all other classes.
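A minimal sketch of this one-versus-all construction, assuming probs is the class-probability matrix returned by the random forest with one column per class; the function name is ours.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def one_vs_all_roc(y_true, probs, classes):
    """One binary ROC curve per class: that class labelled 1, all others 0."""
    curves = {}
    for i, cls in enumerate(classes):
        binary = (np.asarray(y_true) == cls).astype(int)
        fpr, tpr, _ = roc_curve(binary, probs[:, i])
        curves[cls] = (fpr, tpr, auc(fpr, tpr))
    return curves
```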

These results are reasonable considering the cadence limitations of this survey, with all the classes detectable with a greater than 80% retrieval rate at a cost of, at worst, a 10% false retrieval rate. The Delta Scuti and Long Period Variable classes achieve the best AUC, with values of 0.9959 and 0.9877 respectively. This is likely a result of the two classes exhibiting clear periodic and amplitude features. Long Period Variables tend to have highly sinusoidal variations with periods of the order of years and amplitudes of multiple magnitudes, whereas Delta Scuti variables have periods of only a few hours and variations on the order of a tenth of a magnitude.

Figure 6.5: ROC curve for the 16 feature trained random forest model. The Long Period Variable (LPV) and Delta Scuti (DSCT) classes exhibit the best performance.

This conclusion can be reinforced by obtaining the feature importance of the random forest model, defined by the Mean Decrease Gini statistic. Table 6.5 demonstrates this importance statistic for the model used to generate the ROC curve in figure 6.5. It shows that the Period, the variability of the folded light curve and the amplitude of the Fourier model are dominant.

Overall, the most important features are the period and the amplitude of the primary harmonic of the Fourier model. This amplitude is superior to the raw magnitude range of a light curve, as the range is prone to noisy observations. The Fourier model does have the disadvantage of poor fits due to sampling, as discussed previously, yet the models have still selected it as a strong feature in these 2519 light curves. Additional features of interest are the Kurtosis and the slope features mp10 and mp90. These features are strong at identifying light curves with sharp peaks or dips in brightness, a property commonly associated with a large number of eclipsing binary light curves. Finally, the feature Ψn shows how strongly aligned the data points are for a given period. This can be useful for Long Period Variables due to secondary periods.

6.1.3 Results: Image Representation Features

Like the previous 16 feature models, the images produced from the bin-averaged phase-folded light curves are used to produce a new validation dataset. The same machine learning algorithms from the previous validation were applied to this dataset generated by the new method.

Table 6.5: Mean decrease Gini coefficients for the 16 Kim et al. Features.

Feature  Mean Decrease Gini
Period        2019.2553583
Ψn             549.73631507
Ψcs            192.76109940
R21            209.11562924
R31            136.55228001
Φ21             69.433471117
Φ31             71.555076485
γ1              97.304429700
γ2             107.25834895
K              142.53335421
Q3−1           150.26993955
A               81.838996126
H1             413.76605741
W               88.665784819
mp10           425.10419581
mp90           320.10894756

The primary difference from the previous models is that there are now 2000 input units, where each one is the value of a specific pixel from a concatenated 100x20 image representation vector: -0.5 for an off (black) pixel and +0.5 for an on (white) pixel.

The random forest model was tuned with the number of predictors sampled for splitting at each node set to 4, as in the previous validation. The Neural Network was trained using backpropagation on a single hidden layer feedforward network with 2000 input neurons, 200 neurons in the hidden layer and 5 neurons in the output layer using a softmax classifier. The number of neurons in the hidden layer was increased in order to model the more complex patterns expected to be present in the input features. Potentially more might be required, but this was limited by the available resources. The hyperbolic tangent function was used for non-linearity and complexity control was introduced through a momentum term valued at 0.9. All weights are initialised with uniform random numbers between 0 and 0.07. The learning rate was set at 0.005. The network was trained using backpropagation for 600 iterations. The Support Vector Machine was tuned using a grid-based search for the best performing cost value, which was found to be C = 1 for this evaluation. The results of this evaluation are shown in table 6.6.

Whilst the AUC results are notably inferior to the previous results from the features of the previous study, they do show that features were automatically extracted by the machine learning algorithms and used to recognise visual shapes for a classification task.

Table 6.6: AUC of the three algorithms on the Visual Features.

Model                                            Mean AUC
Random Forest, mtry = 4                          0.63483958
Support Vector Machine, linear kernel, Cost = 1  0.58861276
Feedforward Neural Network, layers: 2000-200-5   0.61050239

Figure 6.6: ROC curve for the 0.6386 AUC image representation model. Global performance is poorer than the 16 feature models, with the best resolved classes remaining the Long Period Variables and the Delta Scuti variables.

It is also worth noting that this approach may prove better at discriminating between two similar subclasses than on an overall superclass problem. This network is also extremely limited in the visual features it can extract. For example, despite attempts to position certain magnitude features at specific phases, noise quite often causes these features to be placed at slightly different phases. This results in the requirement for any visual feature layer to implement translation invariance. This can be accomplished by neural networks using convolutional layers (Krizhevsky et al., 2012), but this has not been implemented in these models, which is a significant limitation. Figure 6.6 shows the ROC curves from one of the image representation random forest models with a multi-class AUC of 0.6386.

The image representation performance can be augmented through the recognition that significant information is lost through the lack of scaling in the images. The Delta Cepheid and Long Period Variable folded waveforms can look very similar until the realisation is made that the amplitude of the Long Period Variables is significantly larger. Therefore we included two features that describe the two axes of the images. As the horizontal direction shows the phase of the folded light curve, the period describes the length of time this phase covers. The vertical direction is by definition the amplitude of the light curve as defined above. Including these two features along with the 2000 input pixel values produced the AUC cross-validation results displayed in table 6.7.

Table 6.7: AUC of the three algorithms on the Visual Features with Period and Amplitude.

Model                                            Mean AUC
Random Forest, mtry = 4                          0.66047666
Support Vector Machine, linear kernel, Cost = 1  0.76388919
Feedforward Neural Network, layers: 2000-200-5   0.79524852

The neural network model can also be used to plot the weights between the 2000 input neurons and the H hidden layer neurons to produce a set of H 100 × 20 ‘weight images’, which show the areas of the input image-representation light curves that produce strong responses in the hidden layer neurons.
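Continuing the earlier PyTorch sketch, these weight images are simply the rows of the first-layer weight matrix reshaped back into the image dimensions; the row-major flattening order assumed here is our assumption.

```python
import torch.nn as nn

# The image model's first layer holds a (200, 2000) weight matrix; each row
# reshapes into one 'weight image' (row-major flattening of the 100x20
# pixels is assumed).
image_model = nn.Sequential(nn.Linear(2000, 200), nn.Tanh(), nn.Linear(200, 5))
weight_images = image_model[0].weight.detach().reshape(200, 20, 100)
```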

Figure 6.7 demonstrates the importance of the individual image pixels for the classification task on three of the super-classes using their mean decrease Gini coefficients. Whilst each individual pixel has a minimal weighting, together they can identify interesting structures. All three plots show a clear structure across the phase space: the importance varies from low weight at the exterior of the image to higher weighting close to the centre, which is where important signal structures are expected to be found. In the Delta Cepheid and RR Lyrae classes we can see a clear secondary structure where the importance rises to peaks at pixels near the phases of 0.25 and 0.75, where major peaks and dips are expected, given that we set the maximum brightness to occur near phase 0.25 when the light curves were epoch-folded. Finally, the Eclipsing Binary plot shows a weighting that is a fraction of that of the other two classes. It appears clear that our newly proposed method struggled to resolve usable detail from the eclipsing binary light curves. Whilst the reason for this is unclear, it may be a result of the observational cadence of these objects. If the characteristic dip in the light curve due to the transit event is not sufficiently observed, the resulting folded light curve will exhibit a reduced amplitude and lack this characteristic dip feature, except possibly for a gap near the expected dip; and it would be very unlikely for such a gap to occur at the same phase location across objects.

6.1.4 Conclusion

In these experiments we built a number of models based on features from Kim et al. and our own image-representation approach, using stratified 10-fold cross-validation on 2519 STILT variable light curves from five object super-classes. We showed these features contained important information even when the light curves were noisy and poorly sampled. Our method initially struggled to compete with features engineered from previous studies, attaining a best AUC of 0.6348 compared to a best AUC of 0.8495 for the 16 Kim et al. features, until we introduced the period and amplitude features into the training phase.

Figure 6.7: Heatmaps of the Mean Decrease Gini feature importance as a function of pixel position for three of the main super-classes: Delta Cepheid variables (top), RR Lyrae variables (middle) and Eclipsing Binaries (bottom). Each pixel individually has a low weighting, but together they can communicate important class knowledge.

These features give context to the two dimensions of the image representations, allowing for an improvement in the best AUC to 0.7952. These strengths were offset by limitations in the machine learning algorithms we used when applied to image-based representations, especially the lack of translational and scale invariance. This caused test light curves of a well-known class but with a different phase alignment to be misclassified. Implementing convolutional layers in our feedforward neural network topology should introduce these invariances (Krizhevsky et al., 2012). The neural networks used in this study are of the shallow variety, having only one hidden layer. Extending this network using deep learning methods into a deep topology with multiple layers, possibly implementing convolutional layers, may offer an improvement in performance and constitutes important future work for this development.

There are also a number of hyper-parameters that have not been fully investigated, such as the optimal pixel ‘resolution’ for the light curve images. The horizontal pixels contain the majority of the sampling noise, and the vertical pixels carry a lot of the magnitude noise, which is heavily instrumental and data-reduction limited. A superior method of determining the candidate period at which to phase-fold the light curve must also be found, or the resulting image carries no useful information on the class of the light curve. These efforts will improve light curve classification and potentially redefine the limitations of survey cadence required for scientific analysis. Ultimately, until this method matches the performance of more traditional feature extraction methods, the automated classification pipeline development will continue with the Richards et al., Kim et al. and interpolated PolyFit features presented below.

6.2 PolyFit Feature Representation

The poor performance of the image representation learning method required another approach to perform a similar task for the Skycam classification pipeline. Deep Learning is an option based on the image representation shown previously. However, in this section, an alternative approach is suggested using unsupervised learning methods to construct a representation of the data. The goal of the representation learning is to produce features that model the shape of the folded light curve. This can be accomplished through the interpolation of a model fitted to the data points. Such a model would remove much of the light curve noise and have the flexibility to correctly fit any possible phase-folded light curve shape whilst not overfitting on the noise. The chosen model for this interpolation on the Skycam light curves is the PolyFit model. This model was developed for the fitting of eclipsing binary light curves and is therefore specifically designed to accurately reproduce the thin primary and secondary eclipses of detached binaries whilst still maintaining good performance on other light curve shapes such as those of pulsating variables (Prsa et al., 2008; Paegert et al., 2014; Parvizi et al., 2014).

6.2.1 PolyFit Algorithm

The PolyFit algorithm is designed to outperform Fourier and spline models when applied to any eclipsing binary light curve. It is a method of fitting a polynomial chain P(x) of smooth, piecewise nth order polynomials which connect at a set of knots (Prsa et al., 2008). The algorithm has two main additions compared to normal piecewise polynomial methods to achieve the desired performance. First, unlike spline models, the polynomials are not required to be differentiable (although they remain continuous) at the knots, allowing the modelling of sharp, narrow features such as eclipses. The second requirement is that the model cycles across the phase boundary between 0.5 and -0.5 when centred on zero. Our implementation of PolyFit utilises 4 knots with four 2nd order piecewise polynomials fit using regularised polynomial regression, through the normal equation, on the 4 subsets of data defined between each pair of consecutive knots. This produces a set of 16 parameters which fully describe the fitted PolyFit model: 4 knot locations with phases in [−0.5, 0.5] and 12 polynomial parameters, an intercept, a first order and a second order coefficient for each of the four polynomials. This is substantially fewer free parameters than a Fourier model would require for similar eclipse fitting performance (Debosscher et al., 2007). In addition to these features, the PolyFit model is used to interpolate 99 magnitude values across the [−0.5, 0.5] phase range for further analysis.
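A simplified sketch of the per-segment fit: given fixed knot phases, each of the four segments is a ridge-regularised quadratic solved with the normal equation. The knot optimisation and the continuity constraints of the full PolyFit algorithm are omitted, so this is illustrative rather than a faithful implementation, and all names are ours.

```python
import numpy as np

def fit_segments(phases, mags, knots, ridge=1e-3):
    """Fit a 2nd order polynomial to each of the four segments defined by
    consecutive knot phases, using the regularised normal equation
    beta = (X^T X + lambda*I)^-1 X^T y."""
    knots = np.asarray(knots)
    coeffs = []
    for k0, k1 in zip(knots, np.roll(knots, -1)):
        if k0 < k1:
            mask = (phases >= k0) & (phases < k1)
        else:                          # segment wrapping across the boundary
            mask = (phases >= k0) | (phases < k1)
        X = np.vander(phases[mask], 3)             # columns: phi^2, phi, 1
        beta = np.linalg.solve(X.T @ X + ridge * np.eye(3),
                               X.T @ mags[mask])
        coeffs.append(beta)            # [2nd order, 1st order, intercept]
    return coeffs
```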

Figure 6.8 demonstrates the PolyFit algorithm applied to the Skycam light curve of the eclipsing binary RS Sagittarii. The black points indicate the light curve observations phased by a candidate period and phase binned into 100 bins, the red line indicates the fitted PolyFit model and the green crosses indicate the phase locations of the four knots. Figure 6.9 shows the capability of this method to accurately fit narrow eclipse features compared with spline and Fourier models. The top plot is the PolyFit model, which fits the primary and secondary eclipses without substantially overfitting on the out-of-eclipse noise. The middle plot demonstrates a spline model with a span of 0.2, where the span defines the smoothness of the fitted spline polynomials. This shorter span results in a spline model which performs well on the deep eclipse but overfits the noise in the out-of-eclipse light curve. Increasing the span improves the out-of-eclipse performance at a cost of poorer eclipse modelling. Each eclipsing binary light curve will have an optimal value of span which compromises between eclipse and noise fitting, yet it is unlikely that this optimal span value will perform as well as the PolyFit model. The bottom plot demonstrates a Fourier fit with eight harmonics and an intercept, with 17 parameters compared to the 16 PolyFit model parameters. As with the spline model with a low span argument, the eight harmonic Fourier model correctly models the deep primary eclipse but also overfits the out-of-eclipse noise. Reducing the number of harmonics in the Fourier model has a similar effect to increasing the span argument value. The PolyFit model is the only method of the three to correctly fit the eclipses without overfitting on the noise.

Figure 6.8: The PolyFit algorithm applied to the Skycam light curve of the eclipsing binary RS Sagittarii. The light curve has been phase-folded and phase binned into 100 bins. The red line indicates the fitted PolyFit model and the green crosses indicate the optimal knot points found by the optimisation algorithm. This method produces a superior fit to the narrow eclipse feature than the Fourier or spline models in figure 6.9.

Another possible method for reconstructing the astrophysical signal from the observed data points is a convex optimisation procedure using the robust L1-norm (Candès and Romberg, 2005). Most astrophysical signals produce a sparse frequency spectrum when Fourier transformed as they are a sum of a small number of frequency components. Therefore, this signal can be rebuilt from a small number of measurements as long as there are more observations than frequency components. Using a non-linear L1-norm regularisation parameter, the degeneracy of the parameter selection can be probed for a model which closely reconstructs the original sparse signal. This method has been used to create 'superresolution' sequences from poorly sampled periodic data (Chan et al., 2016). Such a technique has not been applied to unevenly sampled data in astronomy and may struggle with a limited set of measurements, as the observations are substantially fewer than the sampling rate for the SkycamT instrument. The other concern relative to models such as PolyFit is the presence of correlated noise in the data. These noise sources will introduce additional components in the Fourier transform of the observations which are unwanted for modelling the underlying astrophysical signal. In this case, it may be better to apply models which are not designed to reconstruct signals in the Fourier space.

The PolyFit algorithm is implemented by first selecting an initial state for the knots, either at random or by a controlled method such as where the difference between the magnitudes of two data points crosses the mean magnitude. Using this initial set of knots $x_k$, $k = 1, \ldots, 4$, the phase range of [−0.5, 0.5] is partitioned into 4 intervals as shown in equation 6.2 (Prsa et al., 2008).

$$I_1 = [x_1, x_2),\quad I_2 = [x_2, x_3),\quad I_3 = [x_3, x_4),\quad I_4 = [x_4, x_1) \tag{6.2}$$

For the first phase interval $I_1$, a regularised least-squares regression is fit using the data points in this phase interval with 3 free parameters, as shown in equation 6.3.

$$P_1(x) = a_0^{(1)} + a_1^{(1)}(x - x_1) + a_2^{(1)}(x - x_1)^2 \tag{6.3}$$

where $P_1(x)$ is the first polynomial as a function of phase $x$, $x_1$ is the phase of the first knot and $a_j^{(1)}$ are the fitted polynomial parameters with $j = 0, 1, 2$. With the first three parameters computed, the next requirement is to compute $P_2(x)$ with respect to $P_1(x)$ and $P_3(x)$ with respect to $P_2(x)$, as shown in equation 6.4.

$$P_k(x) = a_0^{(k)} + a_1^{(k)}(x - x_k) + a_2^{(k)}(x - x_k)^2 \tag{6.4}$$

where $P_k(x)$ is the $k$th polynomial of interval $I_k$. This must be computed whilst satisfying the constraint that the polynomial must connect with the previous polynomial at the knot $x_k$. This is shown in equation 6.5 and results in the computation of $P_2(x)$ and $P_3(x)$ requiring only 2 free parameters each, as the intercept $a_0^{(k)}$ for $k = 2, 3$ has already been computed.

$$P_k(x_k) = P_{k-1}(x_k):\qquad a_0^{(k)} = a_0^{(k-1)} + a_1^{(k-1)}(x_k - x_{k-1}) + a_2^{(k-1)}(x_k - x_{k-1})^2 \tag{6.5}$$

For the final phase interval $I_4$ there are two constraints to be satisfied: the polynomial must connect with the third interval $I_3$ at the knot location $x_4$ (connectivity), and the phase space must wrap from 0.5 to −0.5. As the connectivity constraint has already determined the intercept of the 4th polynomial, $a_0^{(4)}$, calculated as before from equation 6.5, the phase wrapping constraint fixes the first order parameter of the polynomial fit, $a_1^{(4)}$, through constraint equation 6.6, leaving $a_2^{(4)}$ as the remaining free parameter.

Figure 6.9: The PolyFit algorithm (top) fitted to a Skycam eclipsing binary light curve. The model provides a much more satisfactory fit than the spline model (middle) or the Fourier model (bottom) despite the Fourier model utilising more fitted parameters. The PolyFit model can accurately reproduce narrow eclipsing binary features whilst still providing good performance on pulsating and rotational light curves.

$$P_4(x) = a_0^{(4)} + a_2^{(4)}(x - x_4)(x - x_1) \tag{6.6}$$

The original PolyFit implementation placed the four knots where the light curve data points crossed the mean magnitude of the light curve, randomly perturbed the knots using a random Gaussian 'kick' and then allowed them to relax into a minimum $\chi^2$ state over a small number of iterations (Paegert et al., 2014). Each iteration must be carefully checked as the phase intervals must have an appropriate number of data points to prevent degeneracy in the polynomial fits. This means that the set of knots $x_k$ must be rejected if the interval $I_1$ lacks 5 data points, $I_2$ and $I_3$ lack 4 data points each and $I_4$ lacks 3 data points. Finally, to prevent two or more knots from adopting values too close to each other, the fitting function $\chi^2$ must have an additional penalty term which disincentivises this undesirable outcome. This is accomplished using a quadratic repulsion term, shown in equation 6.7, which penalises a given fit in inverse proportion to the square of the distance between each pair of adjacent knots, with the size of this repulsion defined by an argument $\xi$ (Prsa et al., 2008).

$$r_{\mathrm{cost}}(x_k; \xi) = \xi\left[(x_2 - x_1)^{-2} + (x_3 - x_2)^{-2} + (x_4 - x_3)^{-2} + (x_1 + 1 - x_4)^{-2}\right] \tag{6.7}$$

Due to the noise in the Skycam light curves this approach was not sufficient to produce good models, as the nearest $\chi^2$ minimum was highly dependent on where the noise settled relative to the initial state of the knots. To limit the noise in the Skycam light curves, the data points are first phase binned into 100 mean-averaged bins. This prevents the high frequency noise from affecting the PolyFit model and is particularly effective on Skycam due to the large number of observations in many light curves, as well as substantially reducing the computation time since the regression has fewer data points to process. Despite this binning operation, the white noise in the light curves was still of sufficient amplitude to produce many local minima in the $\chi^2$ minimisation procedure. As a result, the fitting procedure was insufficient for reliably determining the optimal PolyFit model for a given light curve.
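The binning step itself is straightforward; a minimal Python sketch is given below, assuming a simple unweighted mean per bin (the function name and exact conventions are ours, not the pipeline's).

```python
import numpy as np

def phase_bin(times, mags, period, n_bins=100):
    """Phase-fold a light curve and mean-average it into n_bins phase bins.

    Returns the bin-centre phases and mean magnitudes of non-empty bins.
    """
    phases = (np.asarray(times) / period) % 1.0 - 0.5  # fold into [-0.5, 0.5)
    edges = np.linspace(-0.5, 0.5, n_bins + 1)
    idx = np.clip(np.digitize(phases, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=mags, minlength=n_bins)
    filled = counts > 0                                # skip empty phase bins
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres[filled], sums[filled] / counts[filled]
```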

6.2.2 Genetic Optimisation for PolyFit

The poor performance of the PolyFit algorithm's original fitting routine was a substantial problem in the use of this method on the Skycam light curves. Fortunately, the genetic algorithm optimisation developed for use in the GRAPE method provides a novel solution to the issues with PolyFit on noisy light curves. As discussed in chapter 3, Genetic Algorithms are highly capable at identifying the global optimum of a highly non-linear fitness function with many local optima (Charbonneau, 1995). As stated above, the fitness function of the PolyFit algorithm exhibits these properties.

The Genetic Algorithm method from GRAPE was modified to identify the optimal locations for the set of 4 knots $x_k$ through the computation of the $\chi^2$ fitness function augmented by the repulsion term in equation 6.7. This fitness function is shown in equation 6.8.

$$\chi^2(x_k; \xi) = \sum_{j=1}^{N} w_j\left(p(x_j) - y_j\right)^2 + r_{\mathrm{cost}}(x_k; \xi) \tag{6.8}$$

where $p(x_j)$ is the PolyFit interpolated magnitude at phase point $x_j$, $y_j$ is the magnitude of the phase binned data point $j$ at a phase of $x_j$, $w_j$ are the weights of each phase bin, which are kept at $w_j = 1$, $r_{\mathrm{cost}}(x_k; \xi)$ is the knot repulsion from equation 6.7, a function of a repulsion strength $\xi$ and the knot positions $x_k$, and $N$ is the number of binned data points in the light curve.

Where GRAPE utilises a one-dimensional feature space, the genetic PolyFit method requires the use of a four-dimensional feature space: the phase locations of the 4 knots. As the 12 polynomial parameters are generated through regularised regression as a function of the 4 knot positions $x_k$, they do not need to be determined by the genetic algorithm, leaving just the 4 knots. The initial population of size $N_{\mathrm{pop}}$ is established by a uniform random number generator which creates $N_{\mathrm{pop}}$ sets of $x_k \in [-0.5, 0.5]$ sorted from −0.5 to 0.5. This population is then encoded into chromosome strings by rescaling the phases to between 0 and 1 (simply performed by computing $\hat{x}_k = x_k + 0.5$) followed by the recording of the top 5 decimal places for each of the 4 knots into a concatenated string of 20 base-10 numerals. Similar to the GRAPE method, these chromosomes undergo a genetic update process where knots which minimise the $\chi^2$ fitness function are bred into children and have crossover, mutation and fitness selection operations applied for a set number of generations $N_{\mathrm{gen}}$, by which the knots have converged to the global optimum.
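The encoding step can be sketched as follows. This is an illustration of the scheme just described (rescale to [0, 1), keep 5 decimal places per knot, concatenate into 20 numerals), with hypothetical function names.

```python
def encode_knots(knots):
    """Encode 4 knot phases in [-0.5, 0.5) as a 20-digit base-10 chromosome."""
    digits = ""
    for x in sorted(knots):
        x_hat = x + 0.5                               # rescale phase to [0, 1)
        digits += f"{int(round(x_hat * 1e5)) % 100000:05d}"  # top 5 decimals
    return digits

def decode_knots(chromosome):
    """Recover the 4 sorted knot phases from a 20-digit chromosome."""
    return sorted(int(chromosome[i:i + 5]) / 1e5 - 0.5
                  for i in range(0, 20, 5))
```

For example, knots at phases (−0.25, −0.05, 0.15, 0.40) encode to the string '25000450006500090000', which the crossover and mutation operators can then manipulate digit by digit.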

The arguments of the genetic algorithm are selected through a grid cross-validation procedure on a set of 859 light curves with the limitation that the PolyFit routine must complete in under two seconds. The input arguments were as follows: $N_{\mathrm{pop}} = 100$, $N_{\mathrm{pairups}} = 20$, $N_{\mathrm{gen}} = 100$, $P_{\mathrm{crossover}} = 0.65$, $P_{\mathrm{mutation}} = 0.03$, $P_{\mathrm{fdif}} = 0.6$ and $P_{\mathrm{dfrac}} = 0.7$. For further information on these genetic algorithm arguments we recommend reading the Evolutionary Implementation section of chapter 3. We found that this genetic optimisation routine produced more reliable optimal knot locations $x_k$ regardless of the distribution of the initial knot candidates.

Figure 6.10: The same Skycam light curve with two different PolyFit optimisations. The top plot demonstrates the desired PolyFit model where the knots are located either side of the eclipse at the beginning and end of the dimming event. The bottom plot demonstrates a PolyFit model with a superior fit according to the fitness function in equation 6.8. This is not an ideal model as one knot has been located at the base of the eclipse, not the intended location, and the second knot has been used to overfit on noise. Whilst the bottom fit minimises the fitness function, it is not the desired model and therefore the fitness function requires an additional corrective penalty term.

There remained one limitation relative to the original expected PolyFit performance on eclipsing binaries. The eclipse features are intended to be modelled by two knots at the beginning and end of the eclipse, with a highly quadratic polynomial modelling the narrow eclipse dip. Unfortunately, the increased noise in the Skycam light curves resulted in the genetic PolyFit algorithm determining an optimal knot at the base of the eclipse and modelling the two sides of the dip in two separate phase intervals. This allows the PolyFit to use the second knot elsewhere in the phase space, usually overfitting on the noise. This defect is shown in figure 6.10, where the top plot demonstrates the desired PolyFit model and the bottom plot demonstrates a model with a superior $\chi^2$ performance but overfit on noise by placing a knot at the bottom of the eclipse. Our solution is to employ a second additional term in the $\chi^2$ fitness function shown in equation 6.8, which introduces a penalty for selecting knot phase locations $x_k$ where the resulting polynomial fit interpolated magnitude at that phase is far from the median magnitude of the light curve. This penalty incentivises the genetic PolyFit algorithm to place the knots at locations with interpolated magnitudes near the out-of-eclipse magnitude. This penalty is shown in equation 6.9 and is weighted by an argument $\delta$ which defines the relative cost of placing knots far from the median.

$$m_{\mathrm{cost}}(x_k; \delta) = \delta \sum_{j=1}^{4}\left(a_0^{(j)} - \mathrm{median}(y)\right)^2 \tag{6.9}$$

where $m_{\mathrm{cost}}(x_k; \delta)$ is the new cost term, $\delta$ is an argument that defines the strength of the penalty, $a_0^{(j)}$ is the polynomial intercept term for phase interval $I_j$ and $\mathrm{median}(y)$ is the median magnitude of the phase binned light curve. Equation 6.10 defines the final fitness function of the genetic PolyFit algorithm with the two penalty terms.

$$\chi^2(x_k; \xi; \delta) = \sum_{j=1}^{N} w_j\left(p(x_j) - y_j\right)^2 + r_{\mathrm{cost}}(x_k; \xi) + m_{\mathrm{cost}}(x_k; \delta) \tag{6.10}$$

This method, whilst designed to correctly fit narrow features such as eclipses, does not adversely affect smooth, continuous light curves. The $m_{\mathrm{cost}}(x_k; \delta)$ penalty term does apply an increased cost to knots which may be correctly located far from the median magnitude for these light curve shapes. In this case, the substantial number of data points far from the median magnitude at these phase locations allows the initial $\chi^2$ component to 'outweigh' this penalty term, allowing the correct PolyFit model to be applied.
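A sketch of the full fitness computation of equation 6.10 is given below. It is illustrative only: the symbol $\xi$ follows our notation for the repulsion strength, the default strengths are placeholders rather than the tuned pipeline values, and the model magnitudes $p(x_j)$ and intercepts are assumed to come from the regression step described earlier.

```python
import numpy as np

def polyfit_fitness(model_mags, binned_mags, knots, intercepts,
                    xi=1e-6, delta=0.1, weights=None):
    """Fitness of a candidate PolyFit model (equation 6.10).

    model_mags  : PolyFit magnitudes p(x_j) at the binned phases.
    binned_mags : phase-binned magnitudes y_j.
    knots       : sorted knot phases x_1..x_4 in [-0.5, 0.5).
    intercepts  : the four polynomial intercepts a0^(1)..a0^(4).
    xi, delta   : repulsion and median-penalty strengths (placeholders).
    """
    if weights is None:
        weights = np.ones_like(binned_mags)      # w_j = 1 as in the text
    chi2 = np.sum(weights * (model_mags - binned_mags) ** 2)

    # Knot repulsion (equation 6.7): penalise knots drifting together.
    x1, x2, x3, x4 = knots
    r_cost = xi * ((x2 - x1) ** -2 + (x3 - x2) ** -2 +
                   (x4 - x3) ** -2 + (x1 + 1.0 - x4) ** -2)

    # Median penalty (equation 6.9): keep knot magnitudes near the median.
    m_cost = delta * np.sum((np.asarray(intercepts)
                             - np.median(binned_mags)) ** 2)
    return chi2 + r_cost + m_cost
```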

To verify the performance of our genetic algorithm approach to optimising the PolyFit algorithm, we implement an experiment to compare the knots fit by the genetic algorithm against those fit using a random perturbation approach across a set of 50 initial knot positions. This is applied to the 859 SkycamT light curves selected by a correct period match or submultiple match with the AAVSO catalogue period using the GRAPE period estimation method. These light curves are phased around the AAVSO catalogue period in order to generate a set of pulsating, rotational and eclipsing light curve shapes for the PolyFit algorithm to model.

Figure 6.11 shows the results of this experiment with the two distinct distributions of standard deviation performance for the two optimisation methods. For every knot the genetic algorithm produced more consistent knot locations regardless of the initial knot positions, with phase standard deviations of 0.01 to 0.1 for most of the 859 light curves. The random perturbation method was substantially less stable, with the final knots across the 50 initial states varying by a standard deviation of 0.1 for most of the light curves and many performing more poorly than this, out to 0.2 in phase standard deviation. Both methods require approximately 1.5 seconds of runtime to determine the optimal knots and produce a final PolyFit model. The genetic algorithm based optimisation has a larger range of standard deviations than the random perturbation optimisation. This is likely a result of the quality of the light curve having more effect on the resulting fit, as very noisy light curves can still vary substantially with the genetic generational updates due to the minimal difference between the fitness function values for many potential models and the optimal model. The variability in the genetically optimised PolyFit model is still not ideal; however, it is satisfactory as the possible optimised models are all acceptable for the extraction of classification features for a given light curve.

6.2.3 PolyFit Principal Component Analysis

Despite the efforts of implementing the $\delta$ cost and using the genetic algorithm to optimise the knot locations, there is still substantial variation in the 16 parameters of the PolyFit model. They therefore cannot reliably be used as features as, at best, the relationship between these parameters for multiple classes is highly non-linear and difficult to learn. A different method of representing the important features of the interpolated PolyFit model shape is therefore required, one which does not rely on the parameter values but simply on the magnitudes of the fitted model. Principal Component Analysis has been previously used to learn a set of features from interpolated Fourier models applied to light curves (Deb and Singh, 2009; Tanvir et al., 2005; Yoachim et al., 2009). The extraction of features from fitted models has also been accomplished using other methods such as echo-state-networks, a form of recurrent neural network (Kugler et al., 2016), and local linear embedding (Matijevic, 2012) with positive results. The method used is not dependent on the model used to interpolate the light curves, only on the distribution of interpolated magnitudes. It is therefore suitable to investigate the performance of such an approach on sets of interpolated magnitudes extracted from SkycamT light curves by the PolyFit algorithm. This method can potentially generate learned features useful for Machine Learning classification in the automated pipeline.

Figure 6.11: Histograms of the measured standard deviation of the 4 PolyFit knots on the set of 859 SkycamT light curves from table 2.1. The knots produced by each light curve over 50 initial states are recorded and the standard deviation measured from this set. Each histogram contains two distributions. The red distribution was produced by the knots optimised by the genetic algorithm method and the blue distribution was produced by the knots optimised by the random perturbation method from the original PolyFit algorithm (Prsa et al., 2008).

Table 6.8: The class distribution of the 6897 STILT variable light curves used for the PCA training.

Number  Class                       Type                   Count
1       β Lyrae                     Eclipsing Binary         412
2       β Persei                    Eclipsing Binary        1518
3       Chemically Peculiar         Rotational Variable      477
4       Classical Cepheid           Pulsating Variable       195
5       δ Scuti                     Pulsating Variable       453
6       Ellipsoidal                 Non-eclipsing Binary     131
7       Mira                        Pulsating Variable      1256
8       Pop II Cepheid              Pulsating Variable        29
9       R Coronae Borealis          Eruptive Variable          3
10      RR Lyrae Dual Mode          Pulsating Variable         3
11      RR Lyrae Fundamental Mode   Pulsating Variable        99
12      RR Lyrae Overtone Mode      Pulsating Variable        60
13      RS Canum Venaticorum        Non-eclipsing Binary     528
14      RV Tauri                    Pulsating Variable        36
15      Small Amplitude Red Giant   Pulsating Variable         4
16      S Doradus                   Eruptive Variable          1
17      Semiregular Variable        Pulsating Variable       989
18      W Ursae Majoris             Eclipsing Binary         703

Principal Component Analysis (PCA), defined in chapter 4, can be used as a dimensionality reduction technique to define the interpolated magnitudes of the PolyFit model as a set of features. The training set of light curves is determined by cross-matching the SkycamT database against the set of AAVSO catalogue objects with period information and with types present in the BigMacc All-Sky Automated Survey (ASAS) light curve classification pipeline (Richards et al., 2012). This produces 6897 light curves across 18 variable star classes, shown in table 6.8. This set of light curves is much bigger as the trained PCA must be capable of modelling any possible PolyFit interpolation on the Skycam database. Many of these light curves are of dubious quality and GRAPE does not agree with the catalogue period for many of the objects, as they are not in the 859 light curve dataset. Despite this, the PCA components learned using the larger light curve dataset generalise better than those from the smaller dataset.

Upon the computation of the PolyFit model for any given light curve, the model is used to interpolate 99 magnitude data points on an evenly distributed grid of phase values from −0.49 to 0.49 with intervals of 0.01. The interpolation does not go to −0.5 or 0.5 due to limitations in the spline fitting code, as the edge values could be out of bounds. This is of course not a limitation of the PolyFit algorithm, as the entire phase space is mapped by the polynomial chain $P(x)$, yet in the event of the PolyFit algorithm failing to converge on a light curve, the spline algorithm may still produce an acceptable (but likely inferior) fit to collect some useful information. The interpolated magnitudes are zeroed by having the mean magnitude of the phase-binned data points subtracted from their values. Using the phase-binned mean magnitude instead of the interpolated mean magnitude preserves the interpolated skewness of the light curve. The PolyFit interpolated magnitudes are then used to recalculate the phase zero-point through the identification of the maximum magnitude $y_i$ data point (which corresponds to the minimum brightness). The phase space is then adjusted so that the minimum brightness of the light curve occurs at phase 0.0. For eclipsing binaries this is the primary eclipse and for other variables it simply corresponds with the minimum brightness of the variability. This is performed so the Richards et al. features discussed in chapter 5 for determining eclipsing binary light curve eccentricity and shape are directly computable from the interpolated magnitudes (Richards et al., 2012). The interpolated magnitudes are then normalised by equation 6.11, which means that the amplitude of the light curve is not a component of the learned PCA components. The zero-mean and scaling operation corresponds to the operation in the PCA method in chapter 4 shown by equation 4.1.

$$\hat{y}_i = \frac{y_i}{|\max(y) - \min(y)|} \tag{6.11}$$
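The zeroing, re-phasing and normalisation steps can be sketched as below; this assumes the 99-point grid described above and is our own illustrative arrangement of the operations rather than the pipeline code.

```python
import numpy as np

def normalise_interpolation(interp_mags, binned_mean):
    """Zero, re-phase and normalise the 99 interpolated PolyFit magnitudes.

    interp_mags : magnitudes interpolated on the phase grid -0.49..0.49.
    binned_mean : mean magnitude of the phase-binned data points.
    """
    y = np.asarray(interp_mags) - binned_mean   # zero with the binned mean
    centre = len(y) // 2                        # grid index of phase 0.0
    shift = int(np.argmax(y))                   # minimum-brightness index
    y = np.roll(y, centre - shift)              # place it at phase 0.0
    return y / np.abs(y.max() - y.min())        # equation 6.11
```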

The genetic PolyFit algorithm was then applied to the 6897 light curves in table 6.8, producing a training matrix of 99 columns (the 99 interpolated magnitudes across the phase space) and 6889 rows (6889 light curves generated a PolyFit model; for 8 light curves the PolyFit did not converge and they were discarded). Using equations 4.2 and 4.3 from chapter 4, the PCA operation is computed, producing 99 Principal Components sorted by their variance, where the first Principal Component contains most of the information of the system. Figure 6.12 shows the top 10 principal components of the resulting PCA model. The top ten principal components describe 96% of the variance in the PolyFit modelled light curves, whereas the remaining 89 principal components only add the remaining 4% of variance, which is likely noise dominated.
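The thesis computes the PCA via the equations of chapter 4; the scikit-learn sketch below is a stand-in that performs the equivalent decomposition on the training matrix (scikit-learn centres the columns internally, which overlaps with the zeroing already applied to the interpolated magnitudes).

```python
from sklearn.decomposition import PCA

def fit_polyfit_pca(X, n_components=10):
    """Fit a PCA basis to the (6889, 99) matrix of interpolated magnitudes.

    Returns the fitted model and the per-light-curve component weights.
    """
    pca = PCA(n_components=n_components)
    weights = pca.fit_transform(X)      # ten PCA weights per light curve
    print("variance described:", pca.explained_variance_ratio_.sum())
    return pca, weights

# A new light curve's features are its projection onto the learned basis:
# features = pca.transform(new_interp_mags.reshape(1, -1))
```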

The top ten Principal Components (PCs) can be used to closely reconstruct the original PolyFit model, as each learned component contains information about a specific transformation of the light curve weighted by the specific value for a given light curve. Figure 6.13 demonstrates this reconstruction on the eclipsing binary RS Sagittarii. The black line shows the original PolyFit model as seen previously in figure 6.8. Each coloured line represents the reconstruction of the original model by adding on an additional principal component.

Figure 6.12: Plot showing the variance described by each principal component determined from the 6889 SkycamT light curves. The first principal component describes 23.7% of the variance of the light curves and the second principal component an additional 16.4%. The first ten principal components describe 96% of the total variance of the light curves. This allows these ten values to describe the shape of the light curves, as the remaining 4% is likely noise related.

From this reconstruction it is clear that the first principal component, PC1, models the general shape of the light curve, determining the range between the minimum and the maximum magnitudes, but does not model the secondary eclipse. PC2 and PC3 are responsible for modelling the asymmetries between the [−0.5, 0] and [0, 0.5] phase intervals as well as the presence of a secondary eclipse. This is important in the modelling of eccentric eclipsing binaries as well as containing the weighting that distinguishes eclipsing binary light curves from pulsating light curves. PC4 to PC9 are used to reconstruct variations in the smooth continuum of the light curve, such as the 'bumps' common to some Cepheid and RR Lyrae type pulsating variables, although there is likely a large noise component to these principal components. The final principal component, PC10, appears specifically to model the very narrow primary eclipses seen in the longer period β Persei variables. As there are few examples of this type of variable in the set of Skycam light curves, this explains why this important principal component has a lower variance, as the variance describes how informative it is to the given training dataset. The most distinguishing feature of the principal components is the narrow spike feature located near phase zero. This is the component of the learned representation that models the width of the eclipses and is weaker for objects lacking a narrow eclipse. This is why PC1 to PC9 contain this spike; it is there to cancel out the narrow eclipse in the majority of light curves which lack this feature. PC10 is used to apply this feature when it is required for a given light curve.

The features utilised from this PCA are the ten weights which, in conjunction with the PCA model, allow the approximate reconstruction of the PolyFit model. The different shapes of light curve are expected to produce different sets of the ten weights and can therefore be used in the classification task. Figure 6.14 shows the principal component reconstruction for U Aquilae, a Classical Cepheid, and CN Andromedae, a β Lyrae eclipsing binary. Figure 6.15 shows this reconstruction for S Pegasi, a Mira-type Long Period Variable, and RS Bootis, a fundamental mode RR Lyrae variable. Each variable light curve contains shapes distinctive to its class and therefore has unique PCA weights which can be used to determine the class of an unknown light curve.

Figure 6.13: Plot showing the reconstruction of the SkycamT light curve of the eclipsing binary RS Sagittarii from the top ten principal components of the learned PCA model.

Comparing the PCA features for the set of 859 light curves across 12 classes allows for the identification of features which may be of potential interest to a classification method. The best of the PCA features for discriminating the classes is PC2, which is not surprising as PC2 and PC3 are the strongest indicators of the differences between the eclipsing binary 'double dip' light curves and the pulsating variable 'single dip, single peak' light curves. Figure 6.16 plots the base-10 logarithm of the Period (the best feature overall for distinguishing the classes) against the second principal component feature. Whilst there is substantial overlap between the classes, the eclipsing binary classes tend to adopt lower values of P.PCA2 around 1 to 2, whereas the pulsating variables have larger values of P.PCA2 closer to 3 and 4. The feature appears very useful for separating the short-period eclipsing binaries from the RR Lyrae pulsating variables, which means it could be of important use in the machine learning classifiers.

Figure 6.14: Plot showing the reconstruction of a selection of SkycamT light curves using the learned PCA model. The top light curve is of U Aquilae, a Classical Cepheid pulsating variable. The bottom light curve is of CN Andromedae, a β Lyrae eclipsing binary.

Figure 6.15: Plot showing the reconstruction of a selection of SkycamT light curves using the learned PCA model. The top light curve is of S Pegasi, a Mira-type Long Period Variable. The bottom light curve is of RS Bootis, a fundamental mode pulsating RR Lyrae variable.

Figure 6.16: Plot showing the base-10 logarithm of the GRAPE-determined Period feature against the P.PCA2 feature, the second principal component calculated by the PolyFit model and the PCA model at the estimated GRAPE period. The feature partially separates the pulsating variables and eclipsing binaries, with a specific strength at distinguishing RR Lyrae fundamental mode and overtone mode pulsators from the short-period eclipsing binaries of similar period range.

6.2.4 Interpolated Statistical Features

The features extracted directly from the PolyFit interpolation are not the only useful information extractable from the PolyFit method. The Random Forest classification models trained in chapter 5 on the set of variability indices and Fourier decomposition features assigned a low importance to many of the statistical properties of the light curve, such as standard deviation, kurtosis and amplitude, with skewness and the Fourier decomposed amplitude of the sinusoidal fit to the primary period remaining the only useful features. It is possible that these features are simply not useful in the classification task. Alternatively, the loss of performance might be a result of a substantial noise component propagating into the features: the larger noise causes a greater overlap between the features of different variable star classes, leading to poorer discrimination.

The interpolated PolyFit model provides an alternative method to define these statistics by computing them directly from the interpolated data. A number of features are produced to potentially replace the original variability indices and Fourier components, as the binned genetic PolyFit algorithm has reduced the high frequency noise and fits models capable of good quality modelling of any light curve shape. The 99 interpolated magnitude data points at the 99 evenly split phase positions $[x_i, y_i]$ replace the $[\phi_i, m_i]$ data points from the phase-binned light curve in the computation of the following features (a code sketch computing them follows the list):

• PolyFit.Phase.Binned.Ratio
The PolyFit phase binned ratio is a measure of how well sampled a light curve is. It is the ratio of phase bins containing at least one phased data point from the light curve, $n$, to the total number of phase bins, $N_{\mathrm{bins}}$. It is calculated by equation 6.12.

$$\mathrm{PBR}_{\mathrm{PF}} = \frac{n}{N_{\mathrm{bins}}} \tag{6.12}$$

This feature is not directly related to the classification of variable star light curves but it is a measure of how well sampled the light curve is when epoch-folded around the estimated period. Poorly sampled light curves have less reliable fitted models.

• PolyFit.Goodness.of.Fit
The PolyFit Goodness of Fit feature is a measure of how well the PolyFit model matches the phase-binned light curve data points and is defined as the $\chi^2$ value of the fitted model prior to the addition of the penalty terms $r_{\mathrm{cost}}$ and $m_{\mathrm{cost}}$ as shown in equation 6.10. This feature is expected to be of moderate usefulness as it indicates light curves with multi-periodic signals, as they have significant variance remaining after fitting the dominant period. The feature is limited by the presence of noise, which also increases the $\chi^2$ statistic, along with poorly selected candidate periods, although this would also disrupt the rest of the PolyFit features.

• PolyFit.Interpolated.Amplitude
The PolyFit Interpolated Amplitude is calculated by computing equation 6.13 on the interpolated PolyFit magnitude data.

$$a_{\mathrm{PF}} = \frac{|\max(y) - \min(y)|}{2} \tag{6.13}$$

where $a_{\mathrm{PF}}$ is the interpolated amplitude and $y$ is the vector of interpolated magnitudes. This feature is expected to be very important as many variable star types are classified by the amplitude of their light curves and the interpolated amplitude is expected to be less distorted by noise than the other amplitude features.

• PolyFit.Interpolated.StD
The PolyFit Interpolated Standard Deviation is calculated by computing equation 6.14 on the interpolated PolyFit magnitude data.

$$\sigma_{\mathrm{PF}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \mu_{\mathrm{PF}}\right)^2} \tag{6.14}$$

where $\mu_{\mathrm{PF}}$ is the mean of the $y_i$ magnitudes computed by equation 6.15.

$$\mu_{\mathrm{PF}} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{6.15}$$

• PolyFit.Interpolated.Skewness
The PolyFit Interpolated Skewness is calculated by computing equation 6.16 on the interpolated PolyFit magnitude data.

$$b_{\mathrm{PF}} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mu_{\mathrm{PF}}\right)^3}{\sigma_{\mathrm{PF}}^3} \tag{6.16}$$

• PolyFit.Interpolated.Small.Kurtosis
The PolyFit Interpolated Small Kurtosis is calculated by computing equation 6.17 on the interpolated PolyFit magnitude data.

$$k_{\mathrm{PF}} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{y_i - \mu_{\mathrm{PF}}}{\sigma_{\mathrm{PF}}}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \tag{6.17}$$

• PolyFit.Interpolated.Beyond.1.StD
The PolyFit Interpolated Beyond 1 Standard Deviation feature calculates the ratio of interpolated data points $[x_i, y_i]$ which have magnitude values outside of plus or minus one standard deviation from the mean of the interpolated magnitudes. This feature is calculated by equation 6.18.

$$(\mathrm{beyond1\sigma})_{\mathrm{PF}} = \frac{n_{>\sigma_{\mathrm{PF}}}}{n} \tag{6.18}$$

where $n_{>\sigma_{\mathrm{PF}}} = \sum_{i=1}^{n} \mathbf{1}\left(|y_i - \mu_{\mathrm{PF}}| > \sigma_{\mathrm{PF}}\right)$ is the count of interpolated data points more than one standard deviation from the mean.

• PolyFit.Interpolated.Range.Cumulative.Sum
The PolyFit Interpolated Range of a Cumulative Sum is calculated by first computing the vector of cumulative sums $S_{\mathrm{PF}}$ using equation 6.19 on the interpolated PolyFit magnitude data.

$$S_{\mathrm{PF},l} = \frac{1}{n\sigma_{\mathrm{PF}}}\sum_{i=1}^{l}\left(y_i - \mu_{\mathrm{PF}}\right) \quad \mathrm{for}\ l = 1, 2, \ldots, n \tag{6.19}$$

The range of the cumulative sum $R(S_{\mathrm{PF}})$ is then determined using equation 6.20.

$$R(S_{\mathrm{PF}}) = \max(S_{\mathrm{PF}}) - \min(S_{\mathrm{PF}}) \tag{6.20}$$
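The sketch below computes the interpolated features above from the vector of 99 interpolated magnitudes; the function name and return structure are illustrative rather than the pipeline's own.

```python
import numpy as np

def interpolated_features(y):
    """Compute the interpolated features of equations 6.13 to 6.20 from the
    vector of 99 PolyFit-interpolated magnitudes y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = y.mean()                                      # equation 6.15
    sigma = y.std(ddof=1)                              # equation 6.14
    amplitude = np.abs(y.max() - y.min()) / 2.0        # equation 6.13
    skewness = np.mean((y - mu) ** 3) / sigma ** 3     # equation 6.16
    small_kurtosis = (                                 # equation 6.17
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        * np.sum(((y - mu) / sigma) ** 4)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    beyond_1_std = np.mean(np.abs(y - mu) > sigma)     # equation 6.18
    s = np.cumsum(y - mu) / (n * sigma)                # equation 6.19
    range_cumsum = s.max() - s.min()                   # equation 6.20
    return {"amplitude": amplitude, "std": sigma, "skewness": skewness,
            "small_kurtosis": small_kurtosis, "beyond_1_std": beyond_1_std,
            "range_cumsum": range_cumsum}
```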

Figure 6.17: Histogram showing the distribution of the Interpolated features compared to the variability index features for the 859 SkycamT light curves. Many of the features exhibit superior performance due to their resilience to the noise in the data. This explains the interpolated amplitude being closer to zero and the narrower distribution of the interpolated skewness and interpolated small kurtosis. This figure contains Interpolated Amplitude (top), Interpolated Skewness (middle), and the Interpolated Small Kurtosis (bottom).

Using the 859 SkycamT light curves selected by GRAPE shown in table 2.1, we compute a PolyFit model for each light curve phased at 2× the GRAPE estimated period and calculate these interpolated features. The distributions of the features are shown in histograms relative to the equivalent features from the non-interpolated data. Figure 6.17 (top) demonstrates the Interpolated Amplitude relative to the variability index Amplitude. The Interpolated Amplitude distribution is closer to zero than the Amplitudes for the 859 light curves. This is due to the interpolated model reducing the high frequency noise, allowing the fit to more accurately reflect the true amplitude of the variability.

Figure 6.18: Histogram showing the distribution of the Interpolated features compared to the variability index features for the 859 SkycamT light curves. This figure contains Interpolated Standard Deviation (top), Interpolated Beyond 1 σ (middle), and the Interpolated Range of a Cumulative Sum (bottom).

Figure 6.17 (middle) shows the Interpolated Skewness feature relative to the variability index Skewness. The distributions of these two features are very similar, which suggests that the skewness feature is not distorted by the noise. This is surprising as the skewness feature was not as dominant as expected in the feature selection in chapter 5, shown in figure 5.3. The conclusion that this is due to noise may therefore not be accurate; it is possibly due to a sample bias from the lack of narrow eclipses in our set of β Persei eclipsing binaries. This bias is not due to noise and is a result of poor sampling of this class of object by the Skycam cadence. This is a problem which has been discussed for other surveys such as Kepler and the Large Synoptic Survey Telescope (LSST) (Prsa et al., 2011; Wells et al., 2017; Parvizi et al., 2014; LaCourse et al., 2015).

Figure 6.17 (bottom) demonstrates the distribution of the Interpolated Small Kurtosis feature relative to the variability index Small Kurtosis. This feature is interesting as the Interpolated Small Kurtosis feature is primarily negative for this set of light curves whereas the small kurtosis feature is positive. The Interpolated Small Kurtosis feature has a narrower distribution, an effect of the reduced noise component in the interpolated feature, which likely causes the difference in the distribution centres. Figure 6.18 (top) shows the interpolated standard deviation feature relative to the variability index standard deviation. The interpolated standard deviation has a much narrower distribution due to a lower noise component. Figure 6.18 (middle) demonstrates the interpolated Beyond 1 Standard Deviation feature. This feature adopts higher values as the approximately 0.2-0.25 mag white noise component suppresses the non-interpolated feature in the low amplitude variable classes. Figure 6.18 (bottom) shows the interpolated Range of a Cumulative Sum feature, which also exhibits a narrower distribution relative to the variability index Range of a Cumulative Sum feature due to the noise reduction from the interpolated PolyFit model.

The performance of these features on the 859 SkycamT light curves appears superior to the associated variability indices. This demonstrates the strength of this approach, although as the period is an important component of the generation of these models, failure of the period estimation method will disrupt these features substantially more than the variability indices. As the period is the dominant feature in the variable star classification task, the poorer interpolated features are unlikely to be the dominant source of error in the classification of poorly detected variability.

6.2.5 Training on PolyFit Features

Using the features described in the sections above, a dataset was generated from the 859 SkycamT light curves from the GRAPE period match operation. A set of 38 features is produced, displayed in table 6.9. These features include the GRAPE estimated period and the PolyFit features, both the PCA weights and the interpolated variability indices, generated by a PolyFit model phased at the GRAPE estimated period. This process is repeated to produce a second set of PolyFit features at two times the GRAPE estimated period, the 'double period'. The final feature is the ratio of the variances at the period and the double period.

A Random Forest classifier was selected to determine the individual importance and overall performance of these features in the classification task of assigning the 859 SkycamT light curves to the correct variability class out of 12 possible classes. We perform a hyperparameter optimisation on the three Random Forest arguments using the method described in chapter 5.

Table 6.9: The 38 features used in the classification of the 859 SkycamT light curves using the PolyFit algorithm.

Feature                Description
Period                 Period (P) estimated by the GRAPE method
P.Binned.Ratio         PolyFit phase binned ratio at P
P.Goodness.of.Fit      χ² of PolyFit model at P
P.Int.Std              PolyFit interpolated Standard Deviation at P
P.Int.Skewness         PolyFit interpolated Skewness at P
P.Int.Small.Kurtosis   PolyFit interpolated small Kurtosis at P
P.Int.Amplitude        PolyFit interpolated Amplitude at P
P.Int.Beyond.1.StD     PolyFit interpolated Beyond 1 StD at P
P.Int.cs               PolyFit interpolated Range of a Cumulative Sum at P
P.PCA1                 PolyFit interpolated PC1 at P
P.PCA2                 PolyFit interpolated PC2 at P
P.PCA3                 PolyFit interpolated PC3 at P
P.PCA4                 PolyFit interpolated PC4 at P
P.PCA5                 PolyFit interpolated PC5 at P
P.PCA6                 PolyFit interpolated PC6 at P
P.PCA7                 PolyFit interpolated PC7 at P
P.PCA8                 PolyFit interpolated PC8 at P
P.PCA9                 PolyFit interpolated PC9 at P
P.PCA10                PolyFit interpolated PC10 at P
P2.Binned.Ratio        PolyFit phase binned ratio at 2P
P2.Goodness.of.Fit     χ² of PolyFit model at 2P
P2.Int.Std             PolyFit interpolated Standard Deviation at 2P
P2.Int.Skewness        PolyFit interpolated Skewness at 2P
P2.Int.Small.Kurtosis  PolyFit interpolated small Kurtosis at 2P
P2.Int.Amplitude       PolyFit interpolated Amplitude at 2P
P2.Int.Beyond.1.StD    PolyFit interpolated Beyond 1 StD at 2P
P2.Int.cs              PolyFit interpolated Range of a Cumulative Sum at 2P
P2.PCA1                PolyFit interpolated PC1 at 2P
P2.PCA2                PolyFit interpolated PC2 at 2P
P2.PCA3                PolyFit interpolated PC3 at 2P
P2.PCA4                PolyFit interpolated PC4 at 2P
P2.PCA5                PolyFit interpolated PC5 at 2P
P2.PCA6                PolyFit interpolated PC6 at 2P
P2.PCA7                PolyFit interpolated PC7 at 2P
P2.PCA8                PolyFit interpolated PC8 at 2P
P2.PCA9                PolyFit interpolated PC9 at 2P
P2.PCA10               PolyFit interpolated PC10 at 2P
Period.Double.Ratio    Ratio of the VRP at P and 2P

Figure 6.19: Contour plot of the F1 score performance of the 5-fold cross-validation using a Random Forest classifier with 500 trees on the 859 SkycamT light curves as a function of the mtry and nodesize hyperparameters. The optimal hyperparameters are mtry = 16 and nodesize = 30.

We found that the number of trees in the Random Forest model did not heavily influence the performance of the classification task, therefore we kept this value at ntree = 500. The mtry and nodesize parameters are determined using a grid-search from 8 to 18 with intervals of 2 for the mtry parameter and 10 to 30 with intervals of 5 for the nodesize parameter. Figure 6.19 demonstrates the surface plot generated from the F1 score of a 5-fold cross-validation with 2 repeats, our figure of merit (FoM) in this experiment, as a function of the Random Forest arguments mtry and nodesize. This hyperparameter optimisation procedure selects the optimal values as mtry = 16 and nodesize = 30 for 500 trees in the Random Forest, with a 12-class mean F1 score of 0.477 and a standard deviation of 0.087.
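The mtry, nodesize and ntree arguments belong to the R randomForest implementation used in this work; a scikit-learn sketch of an equivalent grid search is given below, using max_features, min_samples_leaf and n_estimators as approximate analogues.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

def tune_random_forest(X, labels):
    """Grid-search the Random Forest hyperparameters with 5-fold CV, 2 repeats.

    max_features ~ mtry, min_samples_leaf ~ nodesize, n_estimators ~ ntree.
    """
    grid = {
        "max_features": list(range(8, 19, 2)),       # mtry: 8..18 step 2
        "min_samples_leaf": list(range(10, 31, 5)),  # nodesize: 10..30 step 5
    }
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=500, random_state=0),
        grid, scoring="f1_macro", cv=cv)             # mean multi-class F1 FoM
    search.fit(X, labels)
    return search.best_params_, search.best_score_
```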

Figure 6.20 demonstrates the Receiver Operator Characteristic (ROC) curve of the model trained with the optimal hyperparameters. The poorest performance is found on the β Lyrae eclipsing binaries, overtone mode RR Lyrae variables and the Semiregular Long Period Variables. The PolyFit features are primarily descriptors of the shape of the folded light curves. The β Lyrae eclipsing binary folded light curves appear similar to the β Persei eclipsing binary light curves, which leads to misclassifications between these two classes from purely light curve shape features (Malkov et al., 2007; Hoffman et al., 2008). The overtone RR Lyrae light curves are very similar in shape to the much more common W Ursae Majoris eclipsing binaries, which results in a poor performance for this class. The poor performance of the Semiregular Long Period Variables is easily explained, as this class of variable does not have a consistent period or amplitude, which results in long term modulation of the light curve. This important class-specific information is discarded when the light curve is epoch-folded and averaged in the phase bins.

Figure 6.20: ROC curve of the model trained at the optimal hyperparameters. Many classes exhibit good performance as they obtain high recall of the true light curves of their respective classes whilst minimising the false positives. The poorest performing classes are the β Lyrae eclipsing binaries, overtone mode RR Lyrae variables and the Semiregular Long Period Variables. This failure is due to limitations in the description of these classes from a purely epoch-folded light curve shape position.

The mean decrease Gini of the Random Forest can be used to display the importance of the features in the trained classification model. Figure 6.21 demonstrates the importance of the top 20 features in the classification of the 859 light curves. The period is the dominant feature, as expected from the definition of many variable star types being based on this property of the variable. The interpolated amplitude features are the next most important, which again relates to the amplitude being an important part of the definition of the variability classes. The interpolated Range of a Cumulative Sum and interpolated Skewness features are also of higher importance than many other features. The interesting selection is the second principal component of the PolyFit model folded at the period. In the previous subsection this was highlighted as a possible discriminator between a number of classes which overlap strongly in the Period and Amplitude feature space. The mean decrease Gini can also be determined for a specific class; for the two RR Lyrae variable types and the W Ursae Majoris eclipsing binaries this feature became the second or third most important, replacing the interpolated amplitude features, although period still retained the top position.

Figure 6.21: Bar plot of the Mean Decrease Gini of the top 20 features in the PolyFit feature Random Forest model. As expected, the Period feature is dominant in the classification task, followed by the interpolated amplitudes. The second principal component of the PolyFit model folded at the period is also important, as was suspected earlier. Other important features include the interpolated Range of a Cumulative Sum feature and the interpolated Skewness. Skewness is normally a more important feature in the detection of eclipsing and dimming variable stars yet it appears diminished in the Skycam light curves, possibly due to a more robust description from the PCA features.

These trained models indicate that the PolyFit derived features contain significant knowledge of the shape and distribution of the variable light curves. These features allow the discrimination of the twelve chosen variability classes with reasonably strong performance using the SkycamT light curve database. The features are also limited, as they are specifically tuned to detect shape-based information without taking into account the suitability of the initial epoch-folding operation. This means the features rapidly lose importance and meaning in the event of a light curve being semi-periodic or non-periodic, as was seen in the performance loss on Semiregular Long Period Variables.

The other primary limitation is the dependency on period. If an incorrect period is estimated, the interpolated features will be poor and possibly inferior to the variability indices they were designed to replace. Ultimately, due to the importance of period, an incorrect period estimation is likely to have wider ranging problems than those caused by a poorly generated PolyFit model. The PolyFit features have been shown to be powerful on the noisy SkycamT light curves, yet they cannot be applied to the light curve classification task alone and should be used in concert with the features discussed in chapter 5. Regardless of the strengths and weaknesses of this approach, this investigation has shown that representation learning methods can generate new and informative features for learned light curve classification models.

The model produced by the PolyFit interpolation clearly outperforms the model trained using the traditional engineered features in chapter five. The ROC curve shown in figure 6.20 shows that the 12 variability classes have higher AUCs than the 8 variability classes in figure 5.5. This shows that the representation features are conveying superior information to the machine learning algorithm than the traditional features, which appear to have substantial difficulty with the noisy light curves in the SkycamT data. The use of principal components is also useful in decorrelating the features, as each principal component is orthogonal to the others. Correlated features can cause difficulties in training well performing models and therefore this decorrelation can improve the final classification.

It is possible that the traditional features may still contain information not described by the PolyFit approach and therefore it is wise to offer both sets of features to the machine learning algorithms and let them select the best performing features for the task from a large selection of statistics. This is demonstrated in the following chapter as the techniques discussed in chapters five and six are combined to produce a SkycamT classification pipeline.

Chapter 7

Automated Classification Pipeline

The previous chapters have discussed the importance of accurate and efficient period estimation for the variable classification task, combined with additional features, both periodic and non-periodic, which describe many properties of a variable light curve. GRAPE and the PolyFit features are designed to improve the performance of Machine Learning classifiers in the identification of variable light curves in the Skycam data. The Skycam database can be explored for interesting objects by integrating these novel techniques into an automated pipeline designed to select cross-matched variable light curves, generate feature data for the light curves and train machine learning classifiers for the detection and classification of candidate variable light curves.

In this chapter the techniques utilised in the development and deployment of the Skycam Automated Classification Pipeline are documented. This system interfaces with the SkycamT SQL database (Mawson et al., 2013) to train variability detection and classification models using a set of 109 features which describe the periodicity, shape, variability and sampling of the Skycam light curves. The light curves are corrected for long-term seasonal trends in the data which can result in incorrect period estimation and poorly generated features. Outliers are removed using an iterative σ–k clipping routine which cuts poor quality data points on a light curve by light curve basis. A feature describing the colour of the SkycamT objects is developed through cross-matching to another catalogue, as the Skycam data lacks colour information. Matched objects are then used to determine the colours of unmatched objects through a Random Forest imputation method.

The 109 features are utilised with a training set of cross-matched variables to train a variable star classifier capable of classifying variable light curves into one of 12 variability classes. The high-confidence classifications from the variable star classifier are then used to train a variability detection classifier, a binary classifier designed to use a subset of the features, the 35 variability indices, to classify all other light curves with more than 100 data points as variable or non-variable. The light curves classified as variable are then selected for the computation of the full set of 109 features for novel variable star classification. Figure 7.1 shows a flowchart demonstrating the components of the SkycamT automated classification pipeline, where the SkycamT light curve database undergoes a trend removal operation followed by a feature extraction process. The set of feature extracted light curves is then connected to a cross-matched catalogue dataset of known variable objects for ground truth label information to train a machine learned classification model, which is then used to create a machine learned variability detection model.

7.1 Trend Removal

In previous chapters the definition and cause of spurious periods was discussed. These periods are a result of sampling periodicity and are common to ground-based surveys due to the motion of the Earth, both rotationally and in orbit around the Sun, combined with the tilt of the Earth's axis. These motions can result in more than just a sampling periodicity, as many of the effects that prevent observations can also produce a sinusoidal variation in the light levels received from an object depending on the time instant of the measurement. This is easily understood by considering the light changes throughout the day. The telescope is shut during the daytime, which produces the spurious period in the light curve data, yet the telescope opens shortly after sunset and closes shortly before sunrise. The sky brightness is higher at these times of day due to scattered solar light from higher in the atmosphere. Either side of this 'civil' twilight there are additional sky brightness changes due to interactions between solar radiation and the atoms in the upper atmosphere, named 'nautical' and 'astronomical' twilight.

High quality photometry takes this into account by using various techniques to construct background maps around sources (Newell, 1983; Bijaoui, 1980; Beard et al., 1990; Almoznino et al., 1993), although this can be difficult in dense fields (areas of the sky with multiple closely packed sources) (Bertin and Arnouts, 1996). There are also monthly and yearly variations due to the proximity between a target source and the moon and sun respectively. In many of the SkycamT light curves there is a clear yearly periodicity generating a sinusoidal signal within the data. This is not simply a spurious period, as many light curves epoch-folded at 365 days reveal this low amplitude sinusoidal shape. In light curves with low amplitude astrophysical periodicity, the astrophysical signal was not detected due to the presence of this yearly variability.

Figure 7.1: Flowchart demonstrating the components of the SkycamT automated classification pipeline. The pipeline uses a cross-matched catalogue of variable objects to automatically train models for variability detection and classification. The pipeline automatically tunes the hyperparameters and selects appropriate machine learning algorithms based on a cross-validation performance measure. Newly detected variables are classified with calibrated probabilities to match the expected distribution of these variable objects. The final light curves are returned as a data table of unknown variable light curves.

There are two options to attempt to correct this yearly trend. The first is to go back to the software which performed the initial source extraction and photometry on the raw SkycamT images. This is likely the ideal method but it requires significant re-engineering of the original image processing pipeline, which was not addressed during this thesis. As a result, the second option is adopted: removing the yearly sinusoidal trend at the light curve level. In this section we describe the method utilised to reliably remove this yearly periodicity, without causing significant changes to the light curves of true long period variables, through the determination of seasonal variability on large samples of SkycamT light curves.

7.1.1 Trend Filtering Algorithm

Our method is adapted from the Trend Filtering Algorithm (TFA), which utilises the statistics of a large number of light curves to determine variability present in the bulk of the data in a field (Kovacs et al., 2005). The TFA was designed to replace filtering using a low-pass filter such as a spline or low-order polynomial fit, as these filters are tuned specifically for one period and, more importantly, they are unable to distinguish the signal due to a trend from the signal due to true variability. The algorithm utilises the premise that a large number of stars from a survey will have similar systematic effects which introduce these trends across many light curves. A trend template can be modelled by inspecting the common variability traits for a subset of the light curves named the template set. This light curve subset is chosen randomly from the total set of light curves with care taken not to introduce biases from low quality and poorly sampled light curves. The original paper detailing this trend filtering algorithm utilised 50 light curves in the template training set (Kovacs et al., 2005).

The algorithm first assumes that the light curves contain the same number of data points all sampled at the same time instants. This is a poor assumption on the Skycam dataset but we will address a method to correct this shortly. A template filter is produced from a subset of zero-averaged light curves of size M. The filter F(i) for i = 1, 2, ..., N, where N is the number of data points in the light curves, is a weighted linear combination of the light curves present in the template training subset X_j(i) for i = 1, 2, ..., N and j = 1, 2, ..., M, as shown in equation 7.1.

\[
F(i) = \sum_{j=1}^{M} c_j X_j(i) \tag{7.1}
\]

where c_j are the coefficients which weight the contribution of the light curve X_j to the template filter. These coefficients are determined through the minimisation of equation 7.2.

\[
D = \sum_{i=1}^{N} \left( Y(i) - A(i) - F(i) \right)^2 \tag{7.2}
\]

where D is the detrending error, Y is the target light curve being filtered, A is the best estimate of the detrended light curve and F is the template filter. Using a frequency analysis or signal reconstruction method, A can be iteratively tuned after initialisation as a constant at the mean magnitude of the target light curve. This iterative method for determining the c_j is computationally heavy and is not adopted for the Skycam light curves, leaving c_j = 1 (Kovacs et al., 2005). The corrected time series is then simply computed using equation 7.3.

\[
\hat{Y}(i) = Y(i) - \sum_{j=1}^{M} c_j X_j(i) \tag{7.3}
\]

where \hat{Y}(i) is the ith magnitude of the detrended light curve, Y(i) is the ith magnitude of the initial trended light curve and the summation is F(i), the value of the template filter at the ith data point.
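The detrending step is a direct transcription of equations 7.1 and 7.3. The following minimal sketch (Python with NumPy; the function and variable names are illustrative, not from the original pipeline) applies a template filter under the simplification c_j = 1 adopted above:

```python
import numpy as np

def tfa_detrend(target, templates, coeffs=None):
    """Apply the template filter of equations 7.1 and 7.3.

    target    : length-N array of zero-averaged magnitudes Y(i).
    templates : M x N array of zero-averaged template light curves X_j(i),
                sampled at the same N time instants as the target.
    coeffs    : optional length-M weights c_j; defaults to c_j = 1,
                the simplification adopted for the Skycam light curves.
    """
    templates = np.atleast_2d(np.asarray(templates, dtype=float))
    if coeffs is None:
        coeffs = np.ones(templates.shape[0])
    # Equation 7.1: the filter F(i) is a weighted linear combination
    # of the template light curves.
    F = coeffs @ templates
    # Equation 7.3: subtract the filter to obtain the detrended curve.
    return np.asarray(target, dtype=float) - F
```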

7.1.2 Light Curve Interpolation

This method of removing trends is powerful as it does not depend on any specific period, or even regular periodicity, in the removed trends. Any common variations amongst the subset of template training light curves are modelled and removed. Unfortunately, this method requires there to be an equal number of data points in the light curves, and the ith data point of every light curve must be sampled at the same time instant.

Our solution to this problem is to perform a large-scale time instant interpolation operation on the template training light curves in order to align their data points onto a grid. Whilst only a few light curves will contain a data point at a specific interpolated time instant, for a sufficiently large subset of template training light curves there should still be 50 data points from 50 independent light curves in many of the interpolated time instant bins. This interpolation routine will also destroy the short-period trend information and therefore these trends cannot be corrected. This is acceptable as an additional step is performed to limit the trend removal to yearly periods.

The interpolated grid is defined by a starting time instant, an ending time instant and a number of bins across the time instant space. The starting time instant was determined by querying the SkycamT database for the time instant with the minimum Modified Julian Date (MJD) value. Ten days are subtracted from this value to provide an empty border to the time instant space, and the result is rounded to the nearest complete day and assigned as the starting time instant. This MJD value is 54884 days and corresponds to the calendar date of the 22nd of February, 2009. A similar procedure is used to select the ending time instant by querying for the maximum MJD value, adding 10 days to it and rounding it to the nearest day. This MJD value is 56079 days and corresponds to the calendar date of the 1st of June, 2012. This corresponds to a maximum timespan of 1195 days. For the number of bins on the grid, 56,000 bins were selected, corresponding to approximately 30 minutes per bin. This is a very coarse resolution and is much lower than the resolution required to determine the correct period in interpolated variable light curves, yet the resolution required for the period estimation task is computationally infeasible for the trend removal task.
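A minimal sketch of this grid (constants taken from the text above; names are illustrative) maps each observation time onto its interpolation bin:

```python
import numpy as np

MJD_START = 54884.0   # grid start (MJD), from the text above
MJD_END = 56079.0     # grid end (MJD)
N_BINS = 56000        # approximately 30 minutes per bin

def bin_indices(mjd):
    """Map observation times onto the coarse interpolation grid."""
    edges = np.linspace(MJD_START, MJD_END, N_BINS + 1)
    # np.digitize returns 1-based bin numbers; shift to 0-based indices.
    return np.digitize(np.asarray(mjd, dtype=float), edges) - 1

def bin_centres():
    """Centre time of every bin, used when fitting the trend template."""
    width = (MJD_END - MJD_START) / N_BINS
    return MJD_START + (np.arange(N_BINS) + 0.5) * width
```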

This low resolution interpolation means that the Trend Filtering Algorithm cannot be applied as originally intended (Kovacs et al., 2005). The resolution is sufficient to remove trends down to under a day, but at the cost of the complete destruction of most interesting periodic signals. Our solution is to specifically target the yearly seasonal trends, which have caused the greatest difficulties, by modelling this yearly trend with the harmonic sinusoidal model of equation 7.4.

\[
y(t) = b_0 + ct + \sum_{j=1}^{2} \left[ a_j \sin\!\left( \frac{2\pi j t}{365.24217} \right) + b_j \cos\!\left( \frac{2\pi j t}{365.24217} \right) \right] \tag{7.4}
\]

where b_0 is the intercept magnitude, which should be near zero for the zero-averaged training light curves, c is the linear component of the trend and the a_j and b_j coefficients identify the amplitude and phase of the yearly sinusoidal trend.

An initial template model was produced by randomly selecting 1000 SkycamT light curves to act as the training set for the algorithm. 1000 light curves were chosen as this was around the number needed for there to be sufficient interpolation bins with 50 data points from 50 different light curves for the harmonic fitting procedure, based on our interpolation grid defined above. The data points from the light curves are placed in the appropriate bin and the empty bins are assigned a null data value. Each interpolation bin is mean-averaged over all the data points present in that bin from the 1000 light curves, and the mean value is recorded along with the number of data points used in the calculation. Each bin with fewer than 50 aggregated light curve data points (n < 50) is discarded, leaving a template model in which each data point is a mean aggregate of at least 50 light curves. The harmonic sinusoidal model of equation 7.4 is then fit to the template light curve using the normal equation regression shown in equation 5.28. The resulting fit has 6 parameters which define the shape, amplitude and phase of the yearly trend. This model can then be subtracted from candidate light curves to remove this yearly trend. In addition to this model, we use a regularised random forest regression model to determine if there is any minor seasonal change in the mean magnitude of the objects due to noise statistics. Whilst this fault is not common, these variations can prevent a correct period estimation and must be prewhitened from the light curve.
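The aggregation and fit can be sketched as follows (reusing `bin_indices` and `bin_centres` from the sketch above; names are illustrative and the regularised random forest check is omitted). The design matrix implements equation 7.4 and is solved by ordinary least squares, equivalent to the normal equation regression of equation 5.28:

```python
import numpy as np

P_YEAR = 365.24217  # period of the yearly trend in equation 7.4

def build_template(light_curves, min_count=50):
    """Mean-aggregate zero-averaged light curves onto the grid, keeping only
    bins fed by at least min_count data points (a simplification; the text
    requires points from at least 50 distinct light curves)."""
    sums = np.zeros(N_BINS)
    counts = np.zeros(N_BINS, dtype=int)
    for mjd, mag in light_curves:                 # one (times, mags) pair per curve
        idx = bin_indices(mjd)
        np.add.at(sums, idx, mag - np.mean(mag))  # zero-average each curve
        np.add.at(counts, idx, 1)
    good = counts >= min_count
    return bin_centres()[good], sums[good] / counts[good]

def fit_yearly_trend(t, mag):
    """Least-squares fit of the 2-harmonic yearly model of equation 7.4,
    returning the 6 coefficients (b0, c, a1, b1, a2, b2)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for j in (1, 2):
        w = 2.0 * np.pi * j * t / P_YEAR
        cols += [np.sin(w), np.cos(w)]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), mag, rcond=None)
    return beta

def yearly_trend(t, beta):
    """Evaluate the fitted model so it can be subtracted from a light curve."""
    t = np.asarray(t, dtype=float)
    b0, c, a1, b1, a2, b2 = beta
    w = 2.0 * np.pi * t / P_YEAR
    return (b0 + c * t + a1 * np.sin(w) + b1 * np.cos(w)
            + a2 * np.sin(2 * w) + b2 * np.cos(2 * w))
```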

This initial template exhibited two major faults. Firstly, the coefficients produced from the fitted model indicated an extremely low amplitude for the yearly trend, significantly below that seen in the individual light curves. Recomputing the template with a number of different training sets made little difference to this result. This is a result of the yearly trend not being homogeneous across the entire Skycam sky, which causes the trend in one part of the sky to be reduced to near zero by the lack of this trend in the other parts of the sky. The solution to this is to partition the sky into distinct regions using the Right Ascension and Declination coordinate system used by the Skycam light curves.

Using the minimum and maximum Right Ascensions and Declinations of the SkycamT objects with greater than 100 observations, the sky was partitioned into 64 (8×8) regions from 0◦ to 360◦ in Right Ascension and −40◦ to 80◦ in Declination. Each of these 64 regions has a distinct trend removal template model trained using the method defined above. This requires the computation of 64,000 interpolated training light curves compared to 1000, and therefore additional time to complete, but the amplitudes of the templates are a closer match to the trends seen in candidate light curves.

The second fault is due to some light curves lacking the yearly trend for an unknown reason. These light curves are not common, yet it is important that the template not induce variability where there was previously none. To determine if it is appropriate to prewhiten a candidate light curve with the trend template model, the harmonic sinusoidal model in equation 7.4 is fit to the candidate light curve and the Pearson correlation coefficient between this model and the trend template model is calculated. If the Pearson correlation coefficient between these two models is negative, the candidate light curve is not prewhitened by the trend template model.
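This gate can be expressed compactly, as in the sketch below (reusing `fit_yearly_trend` and `yearly_trend` from the sketch above; names are illustrative):

```python
import numpy as np

def prewhiten_if_trended(t, mag, template_beta):
    """Subtract the regional trend template only when the candidate's own
    fitted yearly model correlates positively with the template model."""
    cand_model = yearly_trend(t, fit_yearly_trend(t, mag - np.mean(mag)))
    temp_model = yearly_trend(t, template_beta)
    r = np.corrcoef(cand_model, temp_model)[0, 1]  # Pearson correlation
    if r < 0.0:
        return mag                    # no shared yearly trend: leave as-is
    return mag - temp_model           # prewhiten with the template model
```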

7.1.3 Trend Removal Performance

The trend removal method should reduce the number of light curves which have periods near 365 days, or half this period, due to the yearly trend being the dominant periodic signal in the data. To evaluate if this is occurring we computed the GRAPE estimated period for the 6897 light curves used for training the PolyFit principal component analysis model shown in table 6.8. This was performed twice, once without use of the trend removal method and a second time with the trend removal. Figure 7.2 shows the histogram of the estimated periods of the trended and detrended light curves.

Figure 7.2: Histogram demonstrating the GRAPE estimated period for the set of SkycamT light curves shown in table 6.8. This estimated period is computed twice, once without applying the trend removal method and once after applying the trend removal. The red dataset shows the estimated periods of the trended light curves with a notable over-abundance of periods near the yearly period and half-yearly period. The blue dataset demonstrates the estimated periods of the same set of data after the trend removal method has been applied, demonstrating a highly successful removal of the trends. This period range has not been completely reduced to zero as there are true periodic variables with these periods, which shows the trend removal pipeline does not disrupt the light curves of these stars. The purple colour is due to an overlap between the two distributions by objects unaffected by the trend removal.

Table 7.1: Period estimation results on the 6897 light curves from the trended (pre-trend removal) dataset and the detrended (post-trend removal) dataset with a tolerance ε = 0.05.

Data        Hit             Multiple        Alias           Unknown
Trended     0.076 ± 0.005   0.046 ± 0.004   0.118 ± 0.016   0.783 ± 0.008
Detrended   0.076 ± 0.005   0.047 ± 0.004   0.186 ± 0.018   0.720 ± 0.009

The over-abundance of periods at the yearly period, and at the n = 2 and n = 3 submultiples of this period, has been corrected in the detrended light curves. There are still some objects with estimated periods at the yearly period and its submultiples due to objects with astrophysical periods at these locations. As these objects have unique signals independent of the trend determined across the multiple template training light curves, the detrending method does not remove their signals, which could not be guaranteed if a low-pass filter method were utilised.

Table 7.1 shows the changes in the hit, multiple, alias and unknown rates of the trended and detrended datasets with a tolerance of 5%, ε = 0.05. The error ranges have been computed using the 95% confidence limits as determined by a bootstrapping method with 10,000 runs.

Figure 7.3: Contour plot showing the amplitude of the sinusoidal yearly trend model as a function of Right Ascension and Declination. The amplitude of the yearly trend is fairly uniform, with a number of spikes at low declinations which are more likely a result of the poor quality of light curves at these positions as they never rise far above the horizon at La Palma. Most of the regions have a trend amplitude of 0.1 mag to 0.2 mag, which is what is expected from inspecting candidate light curves.

Most of the light curves from both sets have unknown periods relative to the catalogue periods, yet the detrended light curves produce more aliases, which may be due to light curves that formerly exhibited the yearly trend period now being identified with sidereal day periods. If this is the case it suggests that almost 6% of the light curves in this dataset were affected by the yearly trend. The detrending had little effect on the hit and multiple rates, indicating that the light curves most affected by the yearly trend are still not of sufficient quality to identify the true period, or alternatively have been poorly cross-matched to the AAVSO variable star catalogue.

The discovery of the yearly trend varying as a function of sky position suggests that the variation may be a result of a local sky brightness variation through the year. To investigate this phenomenon we study the amplitudes and phases of the 64 trend template models to discover the relationship between these model parameters and the right ascension and declination of the sky region. Figure 7.3 shows the contour plot of the amplitude of the sinusoidal template model across the La Palma sky. The amplitude is fairly constant across the sky, with amplitudes of 0.1 mag to 0.2 mag, similar to the amplitudes seen in individual candidate light curves exhibiting this trend. There is a spike at low declinations of up to 0.5 mag, which is likely a result of the poor sampling and poor data quality of objects in this region of the sky, as they are heavily influenced by seasonal sampling, being visible for only a small number of months each year and remaining low to the horizon when visible.

Figure 7.4: Contour plot showing the phase of the sinusoidal yearly trend model as a function of Right Ascension and Declination. There appears to be a trend in phase as a function of right ascension which indicates the time of the year that the maximum sky brightness occurs, with the minimum phase occurring at approximately 0◦ to 60◦ right ascension. There is a weaker dependence on declination, which suggests that the yearly variation is caused by sky brightness changes throughout the year due to seasonal variation because of the Earth's axial tilt. The phases overlap from −π to π.

Figure 7.4 demonstrates the contour plot of the phase of the sinusoidal model across the La Palma sky, where the phase takes values φ = [−π, π] and loops at the boundaries in the same manner as the epoch-folded phases. There appears to be a trend along the right ascension axis with a minimum value around 0◦ to 60◦, where the boundary loop occurs. This is more clearly visible in Figure 7.5, which projects the phase of the trend model along the right ascension axis, aggregating the different declination values using a median function. There is a weaker relation between the trend model phase and the declination, which suggests that the yearly variation is caused by sky brightness changes throughout the year. When tested, there does not appear to be a correlation between the angular distance to the Sun and the brightness variations of this trend, indicating it is not directly related to Sun proximity. This trend is likely a result of a seasonal variation due to the Earth's axial tilt causing summer observations to have a higher sky brightness than the winter observations, which has not been corrected by the photometry. Ultimately, the explanation for this trend is an interesting conundrum but is not important for the application of the trend removal method, and thus we apply this method prior to the feature extraction of the light curves to improve the feature statistics.

Figure 7.5: Projection of the phase of the trend template model along the right ascension axis with the declination values aggregated using a median function. This emphasises the claimed trend between the model sinusoidal phase and the right ascension as well as the minimum phase near 60◦.

7.2 σ–k Clipping

In the initial Skycam image reduction pipeline, a percentile cut is used to remove the outer 1% of the maximum and minimum magnitude data points as outliers, independent of the statistics of the light curves. The Skycam light curves are plagued by outliers and similar out-of-position data points. Many of these outliers are short duration jumps in magnitude which suggest a short duration dimming event. These groups of data points are difficult to deal with as they can appear similar to a short duration eclipse. The most common phenomenon producing these artefacts is simply clouds causing a short duration localised extinction in the images. Despite the simplicity of this explanation, clouds have long caused astronomers difficulty, and they are quite difficult to correct in the light curves using reliable automated methods.

As these undesired events can alter the proportion of outliers on a light curve by light curve basis, simply performing percentile cuts across the entire database is not helpful. A cut which might remove outliers in one light curve might leave imperfections in another light curve. Additionally, cutting too aggressively will reduce the performance of the period estimation and feature extraction methods. We therefore employ a σ–k clipping algorithm which uses the effect of outlier data points on the standard deviation of the light curve, relative to its robust median, to determine which data points to remove.

The σ–k clipping algorithm is a computationally efficient method of selecting outliers using an iterative procedure where the data points furthest from the median are removed and the median is recalculated. The process continues until the standard deviation of the data points changes by less than a given tolerance ξ. The procedure is defined as:

1. Calculate the standard deviation (σ) and median (m) of the light curve.

2. Remove all data points with magnitudes not within the range m ± kσ.

3. Determine the relative change in the standard deviation from this removal: d = (σ_old − σ_new) / σ_new.

4. If the value of d is greater than the tolerance (d > ξ), return to step 1.

5. If d ≤ ξ, exit, flagging the removed data points as outliers.
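A minimal sketch of this procedure (Python with NumPy; names are illustrative) returns a boolean mask of the retained data points:

```python
import numpy as np

def sigma_k_clip(mag, k=4.0, xi=1e-6):
    """Iterative sigma-k clipping of light curve magnitudes."""
    mag = np.asarray(mag, dtype=float)
    keep = np.ones(mag.size, dtype=bool)
    while True:
        m = np.median(mag[keep])                 # step 1: robust centre
        s_old = np.std(mag[keep])                # step 1: current spread
        keep &= np.abs(mag - m) <= k * s_old     # step 2: clip outside m +/- k*sigma
        s_new = np.std(mag[keep])
        d = (s_old - s_new) / s_new              # step 3: relative change in sigma
        if d <= xi:                              # step 5: converged, exit
            return keep                          # outliers are where keep is False
        # step 4: d > xi, so iterate with the updated statistics
```

With k = 4 and ξ = 10⁻⁶ this matches the settings adopted for the pipeline later in this section.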

This method functions because the median is robust to outliers yet the standard deviation is highly influenced by them. The algorithm has two free arguments: the tolerance ξ for ending the iterations and the k parameter which determines the boundary for the outlier removal. On the Skycam light curves a low value of k will result in the flagging of many of the data points as outliers, whereas a high value of k will result in minimal to zero data points being flagged as outliers. The optimal value of k must be determined based on the performance of a task highly influenced by the presence of outliers. We perform an experiment using a subset of the AAVSO cross-matched light curves with a cross-match distance tolerance of 10” on 12 different variability classes to produce a reliable set of objects. This identifies 3948 SkycamT light curves with a cross-matched catalogue class and period. Twenty light curves were then randomly selected from each of the 12 variable star classes, producing a dataset of 240 light curves.

For each of the 240 light curves we compute the GRAPE estimated period after the light curves have been processed by the trend removal method and the σ–k clipping outlier removal method with multiple values of k from 2 to 6 in intervals of 0.5, producing 9 distinct period estimation datasets. The tolerance adopted for this test is ξ = 10⁻⁶, which was found to work well on the SkycamT light curves. The GRAPE estimated periods are then compared to the catalogue periods and the hit, multiple and alias rates are compared to the value of k utilised in their calculation.

Table 7.2 demonstrates the results of this experiment. There are two maxima in the hit rate, at k = 3.5 and k = 5.5, which have corresponding minima in the aliasing rate.

Table 7.2: Period estimation results on the 240 subset light curves using the σ–k clipping algorithm for 9 values of k with a period estimation tolerance ε = 0.05.

k     Hit             Multiple        Alias           Unknown
2.0   0.108 ± 0.033   0.063 ± 0.025   0.171 ± 0.046   0.692 ± 0.050
2.5   0.150 ± 0.038   0.108 ± 0.025   0.208 ± 0.054   0.588 ± 0.050
3.0   0.167 ± 0.042   0.079 ± 0.029   0.196 ± 0.046   0.592 ± 0.054
3.5   0.175 ± 0.042   0.075 ± 0.029   0.183 ± 0.046   0.600 ± 0.050
4.0   0.158 ± 0.038   0.075 ± 0.029   0.192 ± 0.050   0.613 ± 0.050
4.5   0.163 ± 0.042   0.079 ± 0.029   0.204 ± 0.050   0.588 ± 0.054
5.0   0.171 ± 0.042   0.071 ± 0.029   0.188 ± 0.050   0.604 ± 0.050
5.5   0.175 ± 0.042   0.079 ± 0.029   0.175 ± 0.046   0.608 ± 0.050
6.0   0.158 ± 0.038   0.096 ± 0.033   0.183 ± 0.046   0.600 ± 0.050

This is seen more clearly in figure 7.6, plotted using the data in table 7.2. This suggests that there might be two different distributions of light curve noise: those which require the minimal clipping of k = 5.5 and those with outliers closer to the mean requiring k = 3.5. Further analysis to determine the source of these two distributions would be advantageous, although it is possibly due to a binary split between light curves heavily affected by clouds and light curves which are clear of clouds. There is little variation in the overall unknown rate other than a significant rise at k = 2, as many useful data points have been rejected as outliers at this value. This suggests that the σ–k clipping algorithm is modifying the strength of the alias relative to the true period of a signal rather than fully correcting or removing it. There is also an increase in the multiple rate at k = 2.5, although this does not mean that the period estimation is performing any better on eclipsing binaries.

The choice of an optimal value of k is not immediately clear from this experiment. Whilst there are peaks in the hit rate at k = 3.5 and k = 5.5, the 95% confidence limits indicate that these are not significant variations and therefore cannot be used to claim the existence of these distributions. For the pipeline we have adopted a value of k = 4. Whilst this value is not optimal according to this experiment and can be changed in later revisions, it was selected based on its performance during the pipeline development, and the conclusions of this experiment are not strong enough to reject it.

7.3 Colour Imputation

The application of the Trend Removal method and the σ–k clipping algorithm to the 590,492 SkycamT sources with 100 or more observations produces 590,492 light curves of sufficient quality to generate a set of 34 variability indices. These indices describe many properties of the light curves of the sources but they lack colour information, and therefore the introduction of a feature to define the colour of the sources would be ideal.

Figure 7.6: Performance of the σ–k clipping algorithm for 9 values of k with a period estimation tolerance ε = 0.05 on the 240 subset light curves. The dashed lines indicate the 95% confidence limits for these results. The best performance appears to be near the two locations k = 3.5 and k = 5.5, where the hit rate is locally maximised and the alias rate locally minimised without significant effect on the multiple rate. This suggests that there might be two different distributions of light curve noise: those which require the minimal clipping of k = 5.5 and those with outliers closer to the mean requiring k = 3.5. Attempting to determine which of these distributions a light curve is a member of would be advantageous.

The SkycamT instrument is strictly achromatic as its measurements are unfiltered optical white-light calibrated to an R-band catalogue. Without multiband information the colours of objects located in the SkycamT images cannot be determined. This is problematic as colour is an important feature in the discrimination of some variable star classes. There is, however, colour information available in the Skycam database due to the image reduction pipeline (Mawson et al., 2013). The sources detected in the Skycam images were cross-matched to the US Naval Observatory B (USNO-B) catalogue and those sources with a match have a selection of catalogue information recorded in the database. This selection includes a B-R colour, the difference between the magnitude of the object in the B band and the R band. Of the 590,492 SkycamT sources, 440,443 were successfully cross-matched to a source in the USNO-B catalogue. This is approximately three quarters of the database, 74.6% of the sources. The remaining sources do not have a corresponding catalogue entry and therefore lack colour data.

It is not optimal to remove the sources with missing B-R colour as they still describe large sections of the parameter space with their other light curve features, and removing them could introduce a bias into the dataset. The lack of B-R colour data does not prevent the application of techniques such as the Naive Bayes classifier, which does not require every feature to have a real value for every source, yet limiting the classification model to this method is certainly not ideal (Kotsiantis, 2007; Russell and Norvig, 2009). Richards et al. propose an alternative approach, named imputation, in the development of their All Sky Automated Survey (ASAS) classification pipeline (Richards et al., 2012). Imputation is the process of replacing missing data with substituted values produced by an alternative method (Barnard and Meng, 1999).

There are a number of methods for the imputation of the missing B-R colour feature through the regression modelling of the remaining light curve features. These include K-nearest neighbours, where the imputed colour is the mean value of the closest k light curves with known B-R colour (Altman, 1992), and Least Absolute Shrinkage and Selection Operator (LASSO) methods using regularised linear or non-linear regression (Tibshirani, 1996). Random Forests can also be used to produce forests of regression trees utilising both real- and categorical-valued features for the imputation task; through the missForest technique they have been shown to outperform the above methods (Stekhoven and Bühlmann, 2012). This is an iterative method which first makes an initial guess for the missing B-R colours, such as the mean of the known B-R colours. Each feature vector then has its B-R colour imputed using a Random Forest regression model trained on the matched source feature vectors. The iterations continue until the prediction error of the Random Forest on the feature vectors with known B-R colour is minimised.

We utilise the missForest method to impute the colours of the 25% of the SkycamT dataset lacking B-R colour by training on the 75% of the sources which have the cross-matched USNO-B B-R colour. The feature vectors input into this method are the 35 variability indices computed on all light curves with 100 or more observations. This is performed using Random Forests with 100 trees and the method is allowed up to 10 iterations to minimise the prediction error (in practice we find it takes only 3 to 6 iterations to meet the stopping criterion). Unfortunately, we quickly found that the method has a quadratic computational complexity and the runtime and memory requirements of imputing the entire set of 590,492 light curves are prohibitive.
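A close scikit-learn analogue of this missForest scheme (an assumption for illustration; the original technique is provided by the missForest package and this is not the thesis implementation) is the iterative imputer driven by a Random Forest regressor:

```python
import numpy as np
# This import activates scikit-learn's experimental IterativeImputer API.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

def impute_br_colour(features):
    """features: sources x features array with np.nan in the B-R colour
    column for sources without a USNO-B cross-match."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100),  # 100 trees, as above
        max_iter=10,               # up to 10 refinement iterations
        initial_strategy="mean",   # initial guess: mean of the known values
    )
    return imputer.fit_transform(features)
```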

We need to split the dataset into smaller components without introducing bias or losing useful imputation features. Our solution is to partition the dataset according to the position of the sources in the night sky, in coordinates of right ascension and declination. We justify this choice by considering that there is good coverage across the sky of stars with the full range of possible B-R colours. More unusual star types which, for example, may be confined to the galactic disk will be primarily located in a small number of the sky partitions and can therefore influence the imputation of the other stars in those locations, which may share similar properties.

Figure 7.7: The 25 partitions used by the Colour Imputation missForest procedure on the La Palma sky. The regions are smaller near the galactic plane (between 241◦ and 294◦ right ascension) and around the best viewed areas of the sky from La Palma (between −10◦ and 44◦ declination) due to the abundance of well-sampled sources.

We found that partitioning the sky into 25 regions was sufficient to allow the imputation to complete in a satisfactory amount of time whilst still providing the Random Forest method with 10⁴ to 10⁵ training light curves per region. Unlike the partitioning of the sky in the trend removal method, the cuts are performed in percentiles so that each right ascension and declination band has a similar number of sources. Figure 7.7 shows this sky partition and the number of matched and unmatched objects in each region. There is a clear abundance of sources near the galactic plane, which is to be expected. There are also many sources between −10◦ and 44◦ declination due to these objects being well positioned high in the La Palma sky at certain times of the year and therefore being well sampled. There is some colour variability for pulsating variable stars but, as most of our dataset is non-variable, this colour variability should not be a major concern during the imputation.

The main problem with the colour imputation procedure is that the machine learning algorithms depending on this data as a source of information are unaware of the origin of the data, be it a true catalogue match colour or the result of an imputation, which is more likely to be erroneous. By applying imputation the dataset is not gaining any new information, as the imputed values are manufactured out of the already present information. As a result this procedure cannot be recommended for use in any finalised pipeline. A better alternative is to invest effort into improving the cross-matching with other astronomical catalogues which have similar detectability to the SkycamT instrument. With a high quality cross-match, any remaining unmatched sources are likely to be spurious detections (or transients) and can be processed accordingly. Therefore, the rest of this thesis makes use of these imputed colours in order to test the prototype pipeline, but future efforts should be focused on replacing the colour imputation with true cross-matched catalogue colours.

7.4 Variable Classification Model

The application of the above methods using the variability indices discussed in chapter 5 has generated a dataset of 35 light curve features for the 590,492 selected SkycamT sources. Whilst these features are descriptive, they are insufficient for the discrimination of specific classes of variable star. With the implementation of the feature extraction processes discussed in the previous chapters, using a period estimated by the GRAPE method and fine-tuned by the multi-harmonic Variance Ratio Periodogram, it is possible to generate additional features capable of a full description of the light curves. Unfortunately, the computational effort required to generate the full set of features is prohibitive, and of limited usefulness as there is no point in generating variable star light curve features for non-variable sources. Variable light curves must be isolated from the full SkycamT dataset for additional variable star classification.

Logically, it would make sense to produce a variability detection model prior to the variable star classification model. This is not an option as the training of a variability detection model requires a high quality dataset of variable light curves, which can only be determined through use of a variable star classification model. The most confident classifications from this model for a given set of variable star types can then be used to train a variability detection model. The pipeline must make use of the initial data available to it, which includes the 590,492 light curves from the SkycamT dataset and the list of variables in the AAVSO Variable Star Index catalogue. Using the variable star catalogue right ascension and declination positions, the 590,492 SkycamT sources are cross-matched with a tolerance radius of 148”, which is sufficient to determine the cross-match given the SkycamT pixel scale. The cross-matches are also limited to a set of 12 classes which are detectable given the large white-noise component of the SkycamT data (Mawson et al., 2013). For this class-based subset of AAVSO sources with multiple Skycam objects within the 148” tolerance radius, the closest object is selected as the matched Skycam source. This method is unreliable as there is no guarantee that the closest source to the AAVSO catalogue position is the correct match, therefore a further quality control is required. Our quality control method is to select objects where the period estimated by the GRAPE method matches, or is a multiple or submultiple of, the AAVSO catalogue period.
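The positional cross-match itself is a nearest-neighbour search on the sky; a sketch using astropy (an assumed tool for illustration, as the thesis does not specify its cross-matching code) is:

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

def crossmatch_aavso(skycam_ra, skycam_dec, aavso_ra, aavso_dec,
                     radius_arcsec=148.0):
    """For each AAVSO variable, find the closest SkycamT source and flag
    whether it falls within the tolerance radius."""
    skycam = SkyCoord(ra=skycam_ra * u.deg, dec=skycam_dec * u.deg)
    aavso = SkyCoord(ra=aavso_ra * u.deg, dec=aavso_dec * u.deg)
    idx, sep2d, _ = aavso.match_to_catalog_sky(skycam)   # nearest neighbour
    matched = sep2d < radius_arcsec * u.arcsec           # inside 148 arcsec
    return idx, matched
```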

Table 7.3: The class distribution of the 6721 SkycamT light curves cross-matched with the AAVSO Variable Star Index.

Number   Class                        Type                  Count
1        β Lyrae                      Eclipsing Binary      412
2        β Persei                     Eclipsing Binary      1518
3        Chemically Peculiar          Rotational Variable   477
4        Classical Cepheid            Pulsating Variable    195
5        δ Scuti                      Pulsating Variable    453
6        Mira                         Pulsating Variable    1253
7        RR Lyrae Fundamental Mode    Pulsating Variable    99
8        RR Lyrae Overtone Mode       Pulsating Variable    60
9        RV Tauri                     Pulsating Variable    36
10       Semiregular Variable         Pulsating Variable    988
11       Spotted Variable             Rotational Variable   528
12       W Ursae Majoris              Eclipsing Binary      702

We also construct a second dataset from the cross-matched SkycamT sources with GRAPE periods which are aliases of the catalogue period. This cross-match identifies 8041 SkycamT sources, of which 6721 light curves have sufficient quality to successfully generate the full set of 109 classification features. This dataset has the class distribution shown in table 7.3. Richards et al. recommend the use of a machine learning technique to test the statistics of each of the nearby matched sources relative to the catalogue data to determine which, if any, of the nearby sources is a good candidate, which may prove a better method (Richards et al., 2012).

The period estimation performance for this dataset of 6721 light curve feature vectors is determined relative to the cross-matched AAVSO catalogue source with a tolerance of 5%, ε = 0.05, as described in chapter 2. The result of this period estimation task is shown in table 7.4. This produces a dataset of 859 SkycamT light curves expected to be exemplars of the 12 classes of variable star for use in training the variable star classification model, as shown in table 2.1. A set of light curves with aliased periods is also produced, containing 1186 light curves; these still contain some useful information as a light curve epoch-folded at an aliased period retains the characteristic shape of the associated class. The production of these two datasets of light curves, which are likely to be correct matches due to the strong relationship between the GRAPE estimated period and the catalogue period, allows the training of machine learning classifiers to identify the 12 classes of variable star from their 109 light curve features. Figure 7.8 demonstrates the base-10 log-log plot of the GRAPE period against the AAVSO catalogue period. The green line highlights the hit rate margin and the red lines the n = 2 and n = 3 multiple and submultiple margins. There are four blue lines which denote the four main failure periods, which are independent of the catalogue period: the half sidereal day, the sidereal day, the lunar month (29.5 days) and the 365 day year.

Figure 7.8: Base-10 log-log scatterplot of the GRAPE estimated periods against the AAVSO catalogue periods for the 6721 cross-matched variable stars. The green line shows the hit rate margin, the red lines the n = 2 and n = 3 multiple and submultiple margins and the blue lines the half sidereal day, sidereal day, lunar month (29.5 days) and year spurious periods. Only 7.8% of the data is correctly estimated, with many objects producing spurious period results or an alias between 0.1 days and 10 days due to systematics from the Skycam cadence.

Table 7.4: Relation between the GRAPE estimated period and the AAVSO Variable Star Index period for the 6721 cross-matched SkycamT light curves.

Period Mode      Matching Percentage
Hit              7.826%
Multiple         0.818%
Submultiple      4.136%
One-day alias    8.526%
Half-day alias   10.073%
Unknown          71.448%

The trend removal method has heavily depleted the yearly and half-yearly spurious periods but the lunar month still remains potent. There is also a large quantity of poor estimations in the sub-day to few-day region due to unexplained systematics in the Skycam data producing aliased signals at these locations.

Table 7.5: AUC Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves.

Classifier   Hyper-parameters                             Mean AUC   StD AUC
RF           mtry = 40, ntree = 500, nodesize = 75        0.760      0.021
SVM          linear kernel, C = 2⁴                        0.710      0.043
SVM          radial kernel, C = 1, γ = 2⁻⁴                0.731      0.040
SVM          polynomial kernel, C = 2³, γ = 2⁻⁶, d = 3    0.717      0.057
FFNN         layers: 109-200-12, a(x) = sigmoid           0.651      0.037
FFNN         layers: 109-200-12, a(x) = tanh              0.582      0.038
FFNN         layers: 109-200-12, a(x) = ReLU              0.508      0.022
FFNN         layers: 109-500-12, a(x) = sigmoid           0.646      0.032
FFNN         layers: 109-500-12, a(x) = tanh              0.605      0.048
FFNN         layers: 109-500-12, a(x) = ReLU              0.512      0.031
FFNN         layers: 109-1000-12, a(x) = sigmoid          0.641      0.033
FFNN         layers: 109-1000-12, a(x) = tanh             0.561      0.048
FFNN         layers: 109-1000-12, a(x) = ReLU             0.503      0.009
NB           None                                         0.706      0.048

7.4.1 Classifier Selection

In chapter 4 we described a number of machine learning classification algorithms. These algorithms each have strengths and weaknesses for the variable star classification task and must be investigated for this problem using the training dataset of 859 variable light curves of 12 variable star classes. The performance of the classifiers can be limited by correlation between the features, so these correlations are determined using Spearman's correlation coefficient. These correlations are plotted in three correlation plots shown in figures 7.9, 7.10 and 7.11. Many of the variability indices are highly correlated as they contain multiple statistics designed to evaluate similar properties of the light curves in different ways. The amplitude and phase features from the Fourier decomposition also appear highly correlated, which explains why they performed poorly in the Random Forest model documented in chapter 5: the harmonic components are not describing the important non-sinusoidal features in the data. The interpolated PolyFit features are reasonably independent, which explains why the model trained using them in chapter 6 performed well, as the features describe independent properties of the light curves. This is not a surprising result as the principal components used to produce many of these features are, by definition, independent due to the PCA orthogonal rotation of the feature space. It is also interesting to see that the P.PCA2 feature, which proved important in the discrimination of short period pulsating variables and eclipsing binaries, is strongly negatively correlated with the interpolated skewness feature. The skewness was expected to be useful for this discrimination yet it had thus far seemed lacklustre. The P.PCA2 feature appears to be acting as a 'robust skewness' measure, fulfilling the role originally intended for the skewness feature.

Figure 7.9: Correlation plot of the variability indices computed from the 859 variable light curves. Many of the variability indices are highly correlated with each other as there are multiple features which measure a similar aspect of the light curve in differing ways. An interesting interaction is that of the Quick LSP P value feature, which strongly correlates with the variability indices and the Stetson indices, indicating these statistics are functioning on the variable cadence light curves ubiquitous in the Skycam database.

Figure 7.10: Correlation plot of the Fourier decomposition features computed from the 859 variable light curves. These features are less correlated than the variability indices, although the harmonic amplitude and phase features are not independent, which explains why they are not utilised in previous classification methods on the Skycam data. The features indicating the performance of the GRAPE method are also highly correlated as they all quantify the confidence of the estimated period.

Figure 7.11: Correlation plot of the interpolated and PCA PolyFit features computed from the 859 variable light curves. The correlation in this data is very limited, which is a good indicator that these features will perform well for classification. The low correlation is expected as many of the features are computed from the principal components of the PolyFit model, which are all independent by definition.

Table 7.6: F1 Score Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves.

Classifier   Hyper-parameters                             Mean F1   StD F1
RF           mtry = 40, ntree = 500, nodesize = 75        0.533     0.073
SVM          linear kernel, C = 2⁴                        0.346     0.033
SVM          radial kernel, C = 1, γ = 2⁻⁴                0.349     0.037
SVM          polynomial kernel, C = 2³, γ = 2⁻⁶, d = 3    0.324     0.022
FFNN         layers: 109-200-12, a(x) = sigmoid           0.079     0.054
FFNN         layers: 109-200-12, a(x) = tanh              0.062     0.036
FFNN         layers: 109-200-12, a(x) = ReLU              0.017     0.011
FFNN         layers: 109-500-12, a(x) = sigmoid           0.081     0.047
FFNN         layers: 109-500-12, a(x) = tanh              0.060     0.040
FFNN         layers: 109-500-12, a(x) = ReLU              0.017     0.008
FFNN         layers: 109-1000-12, a(x) = sigmoid          0.081     0.049
FFNN         layers: 109-1000-12, a(x) = tanh             0.047     0.021
FFNN         layers: 109-1000-12, a(x) = ReLU             0.013     0.009
NB           None                                         0.278     0.032

These correlations indicate that the Naive Bayes classifier may perform poorly due to the features not being independent (Kotsiantis, 2007; Russell and Norvig, 2009). The Random Forest method may benefit as it is capable of processing the correlated features (Breiman et al., 1984; Breiman, 2001). To determine the best classifier we perform a 5-fold stratified cross validation with 2 repeats using the 109 features in the 859 light curve dataset. The training objects for each cross validation split are oversampled by randomly adding copies of the minority classes until every class equals the size of the largest class. This prevents the classifier from assigning too much weight to the most common variable star types. We test the Random Forest (RF), Feedforward Neural Network (FFNN) with 1 hidden layer, Support Vector Machine (SVM) and Naive Bayes (NB) algorithms. As a number of these algorithms assume a normal distribution, we run each classifier with the original features and with the features rescaled to a mean of 0 and a standard deviation of 1. The performance of the resulting models is determined from the mean AUC and F1 score of the ten cross validation models, with the variation in model performance determined by the standard deviation in the performance of the ten models.
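The oversampling applied to each training split can be sketched as follows (Python with scikit-learn; names are illustrative, and the repeated stratified folds mirror the 5-fold, 2-repeat scheme described above):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.utils import resample

def oversample_to_largest(X, y, random_state=0):
    """Randomly duplicate minority-class rows until every class matches
    the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        X_parts.append(resample(X[y == c], replace=True, n_samples=n_max,
                                random_state=random_state))
        y_parts.append(np.full(n_max, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

# 5-fold stratified cross validation with 2 repeats; oversampling is applied
# to each training split only, never to the held-out fold.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
```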

The Random Forest hyperparameters were determined using a grid search over the three main arguments: mtry = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], ntree = [500, 1000, 1500] and nodesize = [25, 50, 75]. The cross validation is performed for each combination and the best performing hyperparameters according to the F1 score are selected and used to train the final classification model. The Random Forest hyperparameter tuning results are shown in figure 7.12, with the best performing cross validation using the F1 score as a discriminator being ntree = 500, mtry = 40 and nodesize = 75, with a mean F1 score of 0.533 and a mean AUC of 0.760.
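In scikit-learn terms this grid search looks like the sketch below (an analogy for illustration; the thesis used the R randomForest arguments, where ntree, mtry and nodesize map approximately to n_estimators, max_features and min_samples_leaf):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [500, 1000, 1500],                        # ntree
    "max_features": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],  # mtry
    "min_samples_leaf": [25, 50, 75],                         # nodesize
}
search = GridSearchCV(RandomForestClassifier(), param_grid,
                      scoring="f1_macro", cv=5)
# search.fit(X_train, y_train) then exposes the best hyperparameters via
# search.best_params_, mirroring the F1-driven selection described above.
```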

Table 7.7: AUC Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves with rescaled features.

Classifier   Hyper-parameters                             Mean AUC   StD AUC
RF           mtry = 40, ntree = 500, nodesize = 75        0.765      0.021
SVM          linear kernel, C = 2⁴                        0.710      0.043
SVM          radial kernel, C = 1, γ = 2⁻⁴                0.731      0.040
SVM          polynomial kernel, C = 2³, γ = 2⁻⁶, d = 3    0.717      0.057
FFNN         layers: 109-200-12, a(x) = sigmoid           0.742      0.028
FFNN         layers: 109-200-12, a(x) = tanh              0.759      0.034
FFNN         layers: 109-200-12, a(x) = ReLU              0.756      0.033
FFNN         layers: 109-500-12, a(x) = sigmoid           0.749      0.016
FFNN         layers: 109-500-12, a(x) = tanh              0.750      0.039
FFNN         layers: 109-500-12, a(x) = ReLU              0.747      0.036
FFNN         layers: 109-1000-12, a(x) = sigmoid          0.748      0.025
FFNN         layers: 109-1000-12, a(x) = tanh             0.740      0.036
FFNN         layers: 109-1000-12, a(x) = ReLU             0.773      0.032
NB           None                                         0.706      0.052

Table 7.8: F1 Score Performance of the Machine Learning algorithms trained on the 859 matched SkycamT light curves with rescaled features.

Classifier   Hyper-parameters                             Mean F1   StD F1
RF           mtry = 40, ntree = 500, nodesize = 75        0.538     0.070
SVM          linear kernel, C = 2⁴                        0.346     0.033
SVM          radial kernel, C = 1, γ = 2⁻⁴                0.349     0.037
SVM          polynomial kernel, C = 2³, γ = 2⁻⁶, d = 3    0.324     0.022
FFNN         layers: 109-200-12, a(x) = sigmoid           0.254     0.092
FFNN         layers: 109-200-12, a(x) = tanh              0.258     0.070
FFNN         layers: 109-200-12, a(x) = ReLU              0.249     0.102
FFNN         layers: 109-500-12, a(x) = sigmoid           0.252     0.094
FFNN         layers: 109-500-12, a(x) = tanh              0.238     0.104
FFNN         layers: 109-500-12, a(x) = ReLU              0.210     0.101
FFNN         layers: 109-1000-12, a(x) = sigmoid          0.268     0.083
FFNN         layers: 109-1000-12, a(x) = tanh             0.228     0.096
FFNN         layers: 109-1000-12, a(x) = ReLU             0.263     0.105
NB           None                                         0.287     0.029

The Support Vector Machine utilises a tuning function implemented in the algorithm which performs a grid search across the three main hyperparameters prior to the cross validation: the cost C = 2ⁿ for n = [−2, −1, 0, 1, 2, 3, 4, 5], the radial basis function and polynomial kernel γ = 2ⁿ for n = [−8, −7, −6, −5, −4, −3, −2, −1, 0] and the polynomial kernel degree d = [1, 2, 3]. Following the hyperparameter tuning the cross validation is computed. Three kernel functions are applied individually: the linear kernel, the radial basis function kernel and the polynomial kernel. Each kernel has its hyperparameters tuned individually.

The Feedforward Neural Network is implemented using the Keras interface to the Tensorflow library with a GPU implementation for parallelisation. Only a single hidden layer of 200, 500 or 1000 neurons was used (a second hidden layer was tried but the model was found to excessively overfit the training data) with a final softmax (multinomial logistic regression) classification layer. The complexity of the model was controlled through the addition of a dropout layer which nulls the weights of 20% of the neurons per training iteration. This prevents complex co-adaptation from allowing neurons to overfit aspects of the dataset. We also tested three popular activation functions for the neural network non-linearity: the sigmoid function, the tanh function and the Rectified Linear Unit (ReLU). The Feedforward Neural Network results are not exactly reproducible due to the lack of seed number control in the parallelised GPU implementation used in the Keras package. This is a known limitation of the package and is being addressed, although the only way to accomplish reproducibility at the moment is to disable parallelisation, which would make our neural network model training computationally prohibitive. The Naive Bayes classifier has no hyperparameters and thus was computed using a single 5-fold cross validation on the 859 light curves.
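A sketch of one of these networks in the Python Keras API follows (the thesis used the Keras package; the optimiser and loss below are illustrative assumptions as they are not stated in the text):

```python
from tensorflow import keras

def build_ffnn(n_features=109, n_hidden=200, n_classes=12,
               activation="sigmoid"):
    """Single-hidden-layer network matching the 109-H-12 layouts tested."""
    model = keras.Sequential([
        keras.layers.Dense(n_hidden, activation=activation,
                           input_shape=(n_features,)),
        keras.layers.Dropout(0.2),  # zeroes 20% of activations per training step
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",                 # assumption: not stated
                  loss="categorical_crossentropy")  # assumption: not stated
    return model
```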

The best performing classification algorithm on the original features is the Random Forest, with the Support Vector Machine using the radial basis function kernel a close second. The Naive Bayes is weaker than these two algorithms, as expected, and the Neural Network models perform poorly, with the sigmoid and tanh networks achieving only modest performance and the ReLU networks achieving results no better than random. This is likely a result of the neural network classifier assumptions, which are not satisfied by the unscaled features in the dataset. Applying the rescaling operation to the features in the training data results in a substantial improvement in the neural networks, with a small improvement in the Random Forest and Naive Bayes classifiers. The Support Vector Machine performance was unchanged by the feature rescaling, resulting in the neural networks outperforming the SVMs in the AUC metric, although the SVMs still had a higher F1 score. With this dataset the feedforward neural network with 1000 neurons using the ReLU activation function outperformed the Random Forest classifier in the AUC metric yet was still substantially inferior in the more desirable F1 score. The standard deviations of the models mean that it is difficult to conclusively state which model performs the best overall, as they all exhibit similar performance (except for the neural networks on the unscaled data). The Random Forest F1 score on both the unscaled and scaled data confidently outperforms the other models and thus we have selected this classifier for the production of our final classification model. These results are similar to those found by Richards et al.: on their data, which was unscaled, the neural networks were outperformed by the other algorithms and the Random Forest classifier was the top performing algorithm (Richards et al., 2011b). Figure 7.13 shows the ROC curves produced using this Random Forest model on a one-vs-many classification of the 12 training classes.

Figure 7.12: Contour plots of the hyperparameter tuning of the Random Forest 5-fold cross validation with 500, 1000 and 1500 trees. The model performance is primarily dependent on mtry and nodesize, as 500 trees are sufficient to model the required parameters of the 12 class variable star classification on the 859 SkycamT light curves. The best performing cross validation using the F1 score as a discriminator is ntree = 500, mtry = 40 and nodesize = 75, with a mean F1 score of 0.533 and a mean AUC of 0.760.

Figure 7.13: ROC curves of a Random Forest 4-fold cross validation with ntree = 500, mtry = 40 and nodesize = 75. The performance on many of the classes is strong, with the weakest classes being the β Lyrae eclipsing binaries, the spotted variables and the semiregular variables. The poorer β Lyrae performance is a result of subclass confusion with the other eclipsing binaries. The spotted variables have low amplitude signals which are difficult to detect in the SkycamT light curves, and the semiregular variables are poorly defined by a strict period, which limits the performance of useful feature sets such as the PolyFit model features.

Many of the classes have high confidence, with the poorest performing classes suffering classification difficulty due to subclass misclassification for the eclipsing binaries (Malkov et al., 2007; Hoffman et al., 2008), low amplitude variability in the spotted variables and poor epoch-folded features from the semiregular long period variables (Richards et al., 2011b).

7.4.2 Aliased Object Classifier

The set of 1186 aliased light curves is also of potential interest in the classification of candidate variables. In the event of the GRAPE estimated period of a variable being aliased by the sidereal day spurious period, a model trained on the aliased light curves could produce a more confident classification of the true class of the object than relying on the previous hit and multiple variable star classification model. We selected the Random Forest classifier to train this model and performed a hyperparameter tuning using 5-fold cross validation on the 1186 aliased light curves to determine the optimal arguments for the final classification model.

Figure 7.14: ROC curves of a Random Forest 4-fold cross validation on the aliased dataset with ntree = 1500, mtry = 30 and nodesize = 25. The performance on the individual classes is substantially weaker than for the hit and multiple light curve dataset. This is due to the period feature, the most important for variable star classification, being a much poorer discriminator due to the aliased result. The classifier must therefore make use of the remaining features to try and develop a satisfactory model. The poorer ROC of the classes shows that the confidence of the classifier will be low. This means that the model can be used on potentially aliased candidate light curves, as confident classifications by the aliased model over the 859 light curve model are a strong indicator that the candidate light curve is aliased.

Figure 7.15 shows the contour plots of the hyperparameter tuning; the aliased data performs best when there are more, deeper trees with additional features and branches. This allows the classifier to probe the features thoroughly for information to replace the poorly estimated period feature. The resulting best performing cross validation using the F1 score as a discriminator is ntree = 1500, mtry = 30 and nodesize = 25, with a mean F1 score of 0.353 and a mean AUC of 0.688. This model is clearly outperformed by the model trained on the 859 light curves with superior GRAPE estimated periods, but it is better suited to the classification of other aliased light curves. Figure 7.14 shows the ROC curves produced using this Random Forest model on a one-vs-many classification of the 12 training classes. The confidence in the classifications for all the classes is lower than for the 859 light curve model due to the loss of period as an important feature. Therefore, a confident classification using this model is a strong indicator that the candidate light curve has an aliased GRAPE estimated period.

Figure 7.15: Contour plots of the hyperparameter tuning of the aliased dataset on a Random Forest 5-fold cross validation with 500, 1000 and 1500 trees. The best performing cross validation using the F1 score as a discriminator is ntree = 1500, mtry = 30 and nodesize = 25, with a mean F1 score of 0.353 and a mean AUC of 0.688. The aliased data is unable to depend on the period and therefore requires more trees and a deeper forest to produce the best possible model from the other features.

7.4.3 Feature Importance

The introduction of the Random Forest model in chapter 4 described the Gini criterion, a measure of the diversity of the leaf nodes in the forest. The Gini measure can be used to produce a feature importance metric through computing the mean change in the Gini criterion due to the removal of a feature from the forest. This feature importance metric is named the Mean Decrease Gini and is computed directly from the final Random Forest model. We compute this statistic for each feature in our two Random Forest variable star classification models to determine which features have the most influence on the decision boundaries for this classification.
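In scikit-learn the same statistic is exposed directly, as in the sketch below (an analogy for illustration; the R randomForest importance() function reports it as MeanDecreaseGini):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def gini_importances(X, y, feature_names, n_estimators=500):
    """Rank features by the (normalised) mean decrease in Gini impurity."""
    forest = RandomForestClassifier(n_estimators=n_estimators).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return [(feature_names[i], float(forest.feature_importances_[i]))
            for i in order]
```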

Figure 7.16 demonstrates the mean decrease Gini of the top 30 features from the 859 period-matched light curve classification model. The models trained on previous survey light curves indicate that the period, amplitude, colour and skewness features are the most important in the classification of variable stars (Debosscher et al., 2007; Richards et al., 2011b, 2012). The model trained on the initial feature set in chapter 5 did not reflect this as many of these features were not performing adequately on the SkycamT light curves. The application of the PolyFit features appears to have restored this capability with the new Random Forest model reflecting the results of the previous research. The period and double period are both dominant in the Random Forest model as the period is the basis of the initial classification hierarchy of many of these variable star types. The strong correlation between the two features suggests that one of them should perhaps be removed but the Random Forest algorithm still performs well.

The B-R colour is the second most important feature after the period features, which indicates it was worth the difficulty of implementing the colour imputation routine, as many giant red stars are natively variable (Percy, 2008). The Period Spurious Level feature is designed to indicate if the GRAPE estimated period is close to a known spurious period and is assigned a high importance in this model. This is unlikely to be because the feature contains important information for the classification task; it is probably a result of the high correlation between this feature and the period, as shown in figure 7.10. The slope of a linear trend is a stronger feature than expected and it is primarily driven by the RR Lyrae overtone class, which is possibly due to a selection bias or an unknown effect. The next several features are highly important to this research as they indicate that our novel PolyFit interpolation and PCA features are selected over their equivalent variability indices and Fourier decomposition features. The Interpolated Amplitude of the light curves, calculated from PolyFit models folded at both the GRAPE estimated period and 2× this period, contains the most robust estimation of the amplitude of the variable SkycamT light curves.

Figure 7.16: Bar plot of the mean decrease Gini of the top 30 features in the 859 period-matched light curve Random Forest model. The period feature is dominant in the classification and the PolyFit interpolated amplitude has been selected as the am- plitude measure. The P.PCA2 feature has been selected as the most robust indicator of the light curve skewness. The B-R colour has also been identified as an important indi- cator of variable star type. Few of the variability indices are considered important for this task and the Fourier decomposition features have been almost completely replaced by their PolyFit interpolation equivalents. the light curves calculated from PolyFit models folded at both the GRAPE estimated period and 2× this period contain the most robust estimation of the amplitude of the variable SkycamT light curves. The next highest mean decrease Gini feature is the P.PCA2 feature determined from the second principal component of the PolyFit model. This feature discriminated the short period eclipsing binaries from the short period pulsating variables in the experiment in chapter 6. The correlation plot in figure 7.11 also show that this feature is strongly negatively correlated with the interpolated skewness feature. Skewness performed this task in the Richards et al. classifiers but is not as clear in the SkycamT data (Richards et al., 2011b, 2012). The remaining features have similar importance after the interpolated range of a cumulative sum, Stetson K index and interpolated skewness features.

Figure 7.17 demonstrates the mean decrease Gini of the top 30 features from the 1186 aliased period light curve classification model. The period feature has dropped in importance to below the B-R colour. This is because the B-R colour is independent of the computation of the period, and the aliased periods for the 12 classes are substantially less discriminatory than the matched periods as many of them are close to the sidereal day. The Quick LSP P-Value becomes much more important in this model as the strength of the alias is a strong indicator of the amplitude of the original signal, stronger than the poorer quality PolyFit features due to the poor period estimation. The light curve epoch-folded at an aliased period should still produce a light curve shape similar to the true period, yet the differences produce an alternative set of PolyFit features which have higher feature importance than in the matched-period model.

Figure 7.17: Bar plot of the mean decrease Gini of the top 30 features in the 1186 aliased period light curve Random Forest model. The period feature suffers a substantial reduction in feature importance due to many of the variable star classes having aliased periods near the sidereal day. The period still has some class discrimination ability and therefore remains one of the more important features, second only to the B-R colour. The Quick LSP P-Value becomes much more important in this model as the strength of the alias is a strong indicator of the amplitude of the original signal, stronger than the poorer quality PolyFit features due to the poor period estimation.

7.4.4 Probability Calibration

The output of the machine learned classification models is a probability vector of length 12, giving the probability for each of the 12 classes, where the sum of probabilities for a light curve across all classes is unity. The probabilities output by the Random Forest model are the confidence in the prediction of light curve x as a member of class Ci where i = 1, ..., 12. These probabilities do not necessarily match the actual posterior probability of a light curve x being a true member of each class, as the Random Forest may be too conservative or too confident in a given class probability. It would be useful to have the classification model probabilities calibrated to the expected posterior probabilities for the dataset. This would mean that if all objects with a probability p(x|Ci) = 0.8 of being a member of class Ci are determined, 80% of those objects would truly be members of that class.

We perform a probability calibration on the matching period and aliased period datasets through the use of the probability adjustment transformation defined by Boström, as utilised by Richards et al. in their ASAS variable star classifier (Boström, 2008; Richards et al., 2012). This transformation is shown in equation 7.5.

\hat{p}_{ij} =
\begin{cases}
p_{ij} + r(1 - p_{ij}), & \text{if } p_{ij} = \max[p_{i1}, p_{i2}, \ldots, p_{iC}] \\
p_{ij}(1 - r), & \text{otherwise}
\end{cases}
\qquad (7.5)

where [pi1, pi2, ..., piC] is the vector of class probabilities for the ith light curve and r ∈ [0, 1] is a scalar variable which determines the required probability calibration across the dataset for a given machine learning model. This transformation is ideal as the calibrated probability vector always sums to unity regardless of the value of r, as long as the input probability vector sums to unity (which it always should). In this form, r is very restrictive as it will apply the same rescaling to any probability vector regardless of the confidence of the classification model for any class (Richards et al., 2012). We follow the approach used by Richards et al. by parameterising r using a sigmoid function based on the classifier margin between the highest probability class and the second highest probability class, ∆i = pi,max − pi,2nd, for each classification light curve. This model, shown in equation 7.6, has two free parameters A and B which must be learned from the classifier training set (Boström, 2008; Richards et al., 2012).

r(\Delta_i) = \frac{1}{1 + \exp(A\Delta_i + B)} \qquad (7.6)
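A minimal sketch of this transformation, combining equations 7.5 and 7.6; the function name and the example probability vector are illustrative, and the A and B values used are the matched-period optima quoted later in this section.

import numpy as np

def calibrate_probs(p, A, B):
    """Boström-style probability adjustment (equations 7.5 and 7.6).

    p : 1-D array of class probabilities summing to unity.
    A, B : sigmoid parameters learned by minimising the Brier Score.
    """
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)[::-1]
    delta = p[order[0]] - p[order[1]]          # classifier margin Delta_i
    r = 1.0 / (1.0 + np.exp(A * delta + B))    # equation 7.6
    p_hat = p * (1.0 - r)                      # non-maximal classes shrink
    p_hat[order[0]] = p[order[0]] + r * (1.0 - p[order[0]])  # maximal class grows
    return p_hat                               # still sums to unity

# Example: a conservative 12-class probability vector.
probs = np.full(12, 0.02)
probs[1] = 0.78
print(calibrate_probs(probs, A=-2.100, B=2.285))

Because the maximal class gains r(1 − pmax) while the others jointly lose r(1 − pmax), the calibrated vector sums to unity by construction, matching the property claimed above.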

The optimal values of A and B, generating the optimal value of r given ∆i, are tuned using the Brier Score (as shown in equation 4.27) of the dataset after the probability calibration. The parameters that minimise the Brier Score for a given training set of light curves are the optimal A and B for that classifier. In the SkycamT pipeline this calibration is performed on both the matched period model and the aliased period model using a genetic algorithm. A uniform population of npop randomly generated values of A ∈ [−20, 20] and B ∈ [−20, 20] populates the two-dimensional feature space. The input arguments were as follows: Npop = 100, Npairups = 20, Ngen = 50, Pcrossover = 0.65, Pmutation = 0.03, Pfdif = 0.6 and Pdfrac = 0.7. For further information on these genetic algorithm arguments we recommend the Evolutionary Implementation section of chapter 3.
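The objective being minimised is easy to state directly. Below is a hedged sketch of the Brier Score fitness for a candidate (A, B) pair, with toy probability vectors standing in for the cross validated classifier outputs; the genetic algorithm machinery of chapter 3 is not reproduced here, and all names are illustrative.

import numpy as np

def calibrate(p, A, B):
    """Apply the Boström adjustment (equations 7.5 and 7.6) to one vector."""
    p = np.asarray(p, dtype=float)
    top, second = np.sort(p)[-1], np.sort(p)[-2]
    r = 1.0 / (1.0 + np.exp(A * (top - second) + B))
    out = p * (1.0 - r)
    out[np.argmax(p)] = top + r * (1.0 - top)
    return out

def brier_score(p_matrix, y_true):
    """Mean squared distance between probability vectors and one-hot labels."""
    onehot = np.zeros_like(p_matrix)
    onehot[np.arange(len(y_true)), y_true] = 1.0
    return float(np.mean(np.sum((p_matrix - onehot) ** 2, axis=1)))

def fitness(params, p_matrix, y_true):
    """GA fitness for (A, B): Brier Score after calibration (lower is better)."""
    A, B = params
    calibrated = np.vstack([calibrate(p, A, B) for p in p_matrix])
    return brier_score(calibrated, y_true)

# Toy check with three 12-class probability vectors and their true labels.
rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(12), size=3)
y = np.array([0, 4, 7])
print(fitness((-2.1, 2.285), P, y))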

Utilising this genetic algorithm optimisation for the probability calibration on the period-matched Random Forest model produces optimal values of A = −2.100 and B = 2.285, which indicates that this model is too conservative in its classifications as it underestimates the true class probabilities. The aliased-period Random Forest model produces optimal values of A = −3.685 and B = 2.395, which indicates that this model also underestimates the true class probabilities. To verify that the classifier probabilities are better calibrated to the posterior probability we use 5-fold cross validation with the optimal Random Forest hyperparameters to classify the matching period dataset and the aliased period dataset. For 13 disjoint probability bins, the proportion of objects which are true members of the specified class is determined. The classifier probabilities are plotted against the posterior probabilities along with the calibrated classifier probabilities against the posterior probabilities. Figure 7.18 demonstrates the two plots generated by performing this experiment on the matching period dataset Random Forest model and the aliased period dataset Random Forest model. The calibrated probabilities are closer to the true posterior probability across the probability space.

Figure 7.18: Classifier probabilities against the posterior probabilities, before and after calibration, for the matched period light curve RF model (left) and the aliased period light curve RF model (right). Both original models, shown by the black lines, initially underestimate the classification probability. When calibrated using the method developed by Boström, the classifier probabilities match the true class probabilities much more closely, as shown by the dashed blue lines (Boström, 2008).

In addition to this probability calibration we also determine optimal probability cuts for confident classifications from the ROC curves of the classification models. To determine this probability cut vector, we compute the Index of Union, as shown in equation 4.26, on the ROC curve produced from each of the k cross validation sets (Unal, 2017). The offset term is chosen by the pipeline user as the trade-off between the purity of the classification (minimising the false positives at the cost of more false negatives) and the recall of the classification (minimising the false negatives at the cost of more false positives). The Index of Union determines an optimal probability cut for each class for each of the k cross validation sets. All k probability cuts are then mean aggregated to produce the final probability cut vector for the model for a given offset. During classification these probability cut vectors are also propagated through the same probability calibration procedure as the classifier probability vectors, creating a unique probability cut for each light curve based on the values of r and ∆i utilised in the probability calibration. For a given light curve, the class with the highest value of p̂(x|Ci) − p̂(x; Ci,cut), where p̂(x|Ci) is the calibrated probability of the light curve x being a member of the ith class Ci and p̂(x; Ci,cut) is the calibrated probability cut of the light curve x for the ith class, is the classification pipeline predicted class of light curve x. If every class has a negative value after this calculation, the classification model is not confident enough to assign any class to the light curve x and it is given an unknown predicted class, a 13th possible class.
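The final decision rule can be sketched in a few lines; the class list mirrors table 7.9 and the probability and cut values below are illustrative only, not pipeline outputs.

import numpy as np

def predict_with_cuts(p_hat, cut_hat, class_names):
    """Assign the class maximising p_hat - cut_hat, or 'Unknown' if all negative.

    p_hat   : calibrated probability vector for one light curve.
    cut_hat : per-class probability cuts after the same calibration.
    """
    margin = np.asarray(p_hat) - np.asarray(cut_hat)
    best = int(np.argmax(margin))
    return class_names[best] if margin[best] >= 0 else "Unknown"

classes = ["beta Lyrae", "beta Persei", "Chemically Peculiar", "Classical Cepheid",
           "delta Scuti", "Mira", "RR Lyrae (F)", "RR Lyrae (O)", "RV Tauri",
           "Semiregular", "Spotted", "W Ursae Majoris"]
p = np.full(12, 0.05); p[5] = 0.45        # illustrative calibrated probabilities
cuts = np.full(12, 0.30)                  # illustrative calibrated cuts
print(predict_with_cuts(p, cuts, classes))  # -> "Mira"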

7.5 Variability Detection Model

The variable star classification model allows the selection of a set of variable light curves for the training of a further variability detection model. This model is a binary classifier which separates candidate light curves into two classes, variable and non-variable. It is trained using the variability indices only, as the GRAPE and PolyFit methods are too computationally intensive to apply to every sufficiently-sampled SkycamT light curve. The SkycamT subset of variable candidate light curves can then be processed by GRAPE and PolyFit and classified using the variable star classification models. The variability detection model can be trained in a supervised or unsupervised manner on one or two classes, for example by treating the problem as an anomaly detection problem where the light curve variability indices are clustered and the main clusters are assumed to be non-variable light curves (Shin et al., 2009, 2012). There are a number of methods which have been successfully used in astronomy to cluster variable and non-variable light curves using their variability indices. An Infinite Gaussian Mixture Model (GMM) can be used to cluster the data into a set of clusters with Gaussian noise depending on the distribution of the features (Shin et al., 2009). Unlike other clustering methods, the infinite GMM does not require the number of expected clusters to be provided prior to operation, removing any prior assumptions on the distribution of the data. During training the model assumes there is an infinite number of possible clusters, of which only a finite number are populated by the dataset. Gibbs sampling then determines the optimal set of populated clusters. An alternative approach is to make use of the one-class Support Vector Machine (SVM) (Schölkopf et al., 2000). The one-class SVM is similar to the multiclass variant discussed in chapter 4 except only one class is present in the training dataset and the optimisation changes to determining only the members of this class. The SVM accomplishes this by performing a mapping to a higher dimensional space using a kernel function and then selecting a hyperplane that maximises the margin between the training data and the origin. New candidates which are located near the training data are classified as members of the one training class; otherwise they are declared too anomalous to be classified as a member of the training class. This algorithm can be used for detection of known object types or for anomaly detection. This method has been used to detect anomalous sources from the Wide-field Infrared Survey Explorer (WISE) spacecraft (Solarz et al., 2017). These methods have limitations in noisy data like the SkycamT light curves as the clusters are all heavily convolved and it is difficult to make confident decisions based on single light curves.
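As an illustration of the one-class SVM approach, the sketch below uses scikit-learn's OneClassSVM (an implementation of the Schölkopf et al. formulation) on synthetic stand-ins for the variability indices; it is not the SkycamT detection model, which the remainder of this section describes.

import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder variability-index vectors for an assumed "normal" population;
# the real pipeline would use the SkycamT variability indices.
rng = np.random.default_rng(1)
train_indices = rng.normal(0.0, 1.0, size=(1000, 5))

# The RBF kernel maps to a high-dimensional space; nu bounds the fraction
# of training points allowed to fall outside the learned boundary.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(train_indices)

candidates = rng.normal(0.0, 1.0, size=(5, 5))
candidates[0] += 6.0                      # one clearly anomalous candidate
print(detector.predict(candidates))      # +1 = consistent with training class, -1 = anomalous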

The SkycamT variability detection model is treated as a supervised binary classifier with two distinct variable and non-variable datasets. We selected the Random Forest classifier to train the variability detection model due to the correlation in the variability indices. The variable star light curves used to train both the matched-period and aliased-period classifiers in the previous section were selected as the variable light curve training set, constituting 1895 light curves. We found that including the aliased light curves provided a training set which better identified variable light curves from the variability indices. The aliased features are only computed after the GRAPE period estimation and should have minimal effect on the training using only the variability indices. The non-variable training light curves are randomly selected from the remaining 588,597 sufficiently sampled SkycamT light curves not present in the training set. This creates a set of non-variable training light curves equal in size to the variable light curve training dataset. There is a risk in this random selection of light curves: we do not want to introduce a sampling bias and, more importantly, it is expected that a small percentage of these light curves are potentially variable candidates. In the following section we demonstrate the hyperparameter tuning results on the Random Forest variability detection model and analyse the importance of the selected non-variable training data, the use of aliased light curves in the variable training data and the features in the identification of variable candidate sources.

7.5.1 Hyperparameter Tuning

The Random Forest hyperparameters were determined using a grid search for the three main arguments, mtry = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20], ntree = [500, 1000, 1500] and nodesize = [25, 50, 75], using a single set of the non-variable training light curves. The mtry parameters cover a smaller range due to the reduced number of features in this classification task. The cross validation is performed on each combination and the best performing hyperparameters using the F1 score are selected and used to train the final classification model. The Random Forest hyperparameter tuning results are shown in figure 7.19, with the best performing cross validation using the F1 score as a discriminator being ntree = 1500, mtry = 4 and nodesize = 25 with a mean F1 score of 0.702. As with the aliased period variable star classification model, the variability detection model requires a larger forest with deeper trees to discriminate between the two classes as they are heavily convolved due to the noise in the SkycamT light curves.
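A hedged sketch of this grid search, mapping ntree, mtry and nodesize onto scikit-learn's n_estimators, max_features and min_samples_leaf; the feature matrix and labels are synthetic placeholders, and the full grid is slow to evaluate exhaustively.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder variability-index matrix and variable/non-variable labels.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 2, size=400)

# Grid from the text; each combination is scored by mean F1 over 5 folds.
best_params, best_f1 = None, -np.inf
for ntree in (500, 1000, 1500):
    for mtry in range(2, 21, 2):
        for nodesize in (25, 50, 75):
            clf = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                         min_samples_leaf=nodesize, random_state=0)
            f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
            if f1 > best_f1:
                best_params, best_f1 = (ntree, mtry, nodesize), f1
print(best_params, best_f1)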

7.5.2 Variable Detection Performance

The decision to randomly select the non-variables from the SkycamT light curves which were not cross-matched with the AAVSO variable star index catalogue must be shown not to introduce significant variability into the classifier performance depending on the non-variable training set. We designed an experiment to confirm that this random selection does not substantially influence the trained model by generating 20 different non-variable training samples and comparing the performance across the models. Each of the twenty training sets is evaluated using a 5-fold cross validation with 2 repeats, with the mean F1 score indicating the performance of that training set and the standard deviation of the F1 score indicating the variability in the trained models. The hyperparameters determined from the tuning procedure are used for the Random Forest classifiers. The results of this experiment are shown in figure 7.20, indicating that the variance across the twenty models is similar to the variance of each specific 5-fold cross validation. The mean F1 score of this experiment is 0.713 with a standard deviation of the means of 0.007. Only one of the cross validations has a mean significantly outside of the ±σ margin, indicating a possibly biased non-variable training set. Whilst this experiment cannot make a conclusion on how much the presence of candidate variables in the randomly selected non-variable training set influences the final model, it does show that the overall quality of the non-variable training set is similar regardless of the sampling. Overall, the stratified sampling of the training set during the cross validations has a higher variance than the random sampling of non-variable training light curves. Figure 7.21 demonstrates the ROC curves of a 4-fold cross validation applied to the 12th training dataset, as the mean F1 score of this cross validation most closely matches the overall mean of the 20 datasets.

Figure 7.19: Contour plots of the hyperparameter tuning of the variability detection Random Forest 5-fold cross validation with 500, 1000 and 1500 trees. The best performing cross validation using the F1 score as a discriminator is ntree = 1500, mtry = 4 and nodesize = 25 with a mean F1 score of 0.702, although this is not very significant relative to the other models, with a higher standard deviation between the 5 folds than between the hyperparameters. The hyperparameter space is highly non-linear due to the difficulty of this classification on the poor quality variability indices.

Figure 7.20: Box Plots of the F1 score performance of the 20 different non-variable training sets tested using 5-fold cross validations with 2 repeats (10 models per cross validation). The red line indicates the mean F1 score of every training set and the dashed blue lines indicate the ±σ where σ is the standard deviation of the means of each 5-fold cross validation. Despite randomly selecting the non-variable set from the non-cross matched SkycamT light curves with no guarantee of preventing unknown variable candidates from influencing the dataset, the performance is similar across most of the twenty cross validations.

The model is slightly better at classifying variable light curves as variable than at rejecting non-variable light curves. This is likely a result of the spurious and seasonal trends in the SkycamT light curves, which can induce signals that appear similar to astrophysical variable signals, making it more difficult for the model to reject them, especially with the limitations of the variability indices.

We also trained a variability detection model using the above method but with only the variable light curves with a hit or multiple/submultiple GRAPE estimated period. This model has fewer training light curves and is therefore at risk of additional bias problems from the non-variable training set selection. With a higher quality set of variable light curves the model may outperform the full variable set model. We use the same hyperparameters established in the previous tuning test, although we concede that these may no longer be optimal due to the change in the training conditions. Figure 7.22 demonstrates the F1 score of the 20 non-variable training sets using the subset of variable light curves in this experiment. The mean F1 score performance of this new training dataset is substantially improved, with a mean F1 score of 0.799 and a standard deviation of 0.007. Despite the assumption that the variance between the different non-variable training sets would increase, the overall statistics appear similar to the previous full variable dataset cross validations, with only one dataset having a mean significantly outside the ±σ margin. There do appear to be more outliers, but an aggregation of these models should correct for the poor classifications of individual models. Figure 7.23 shows the ROC curves of a 4-fold cross validation applied to the 12th training dataset, as the mean F1 score of this cross validation most closely matches the overall mean of the 20 datasets. These ROC curves indicate that, as with the full dataset, the matched period subset variability detection models are better at detecting variable light curves than rejecting non-variables, supporting the previous conclusion as it is independent of the training set split.

Figure 7.21: ROC curves of a Random Forest 4-fold cross validation with ntree = 1500, mtry = 4 and nodesize = 25 on the 12th training dataset. The performance of the model in detecting variable light curves is superior to its performance in rejecting non-variable light curves. This is likely a result of the spurious and seasonal trends in the SkycamT light curves.

Ultimately, it would appear that the best performance is gained from using the subset variable training set models, although there is still a question over whether the full dataset may identify more poor quality variables at the cost of more false positives. It is also possible to improve the quality of these models by performing periodic retraining of the variability detection model as additional variable light curves are confidently classified. Adding these light curves to the variable training set, whilst removing variable light curves from the non-variable training set and simultaneously increasing the size of the non-variable training set, will generate a training sample which better represents the statistics of the SkycamT database. This should reduce the false positives whilst providing more confident classification of acceptable quality variable light curves.

Figure 7.22: Box plots of the F1 score performance of the 20 different non-variable training sets tested using 5-fold cross validations with 2 repeats (10 models per cross validation). The variable training set has been subset to the hit and multiple/submultiple estimated period sources. The red line indicates the mean F1 score of every training set and the dashed blue lines indicate ±σ, where σ is the standard deviation of the means of each 5-fold cross validation. Despite randomly selecting the non-variable set from the non-cross-matched SkycamT light curves with no guarantee of preventing unknown variable candidates from influencing the dataset, the performance is similar across the twenty cross validations.

We utilise our initial model selected by the hyperparameter optimisation to classify the 590,492 SkycamT light curves from their 35 variability indices. We use a probability cut-off with an offset of 0.2 to improve the purity of the classification over recall, as we want to minimise the number of false positives due to the limited computational power available to process a large dataset. We obtain 88,849 variable candidates from this classification with a variability cut-off of 0.587. We also applied the model produced from the training subset of hit and multiple/submultiple period estimation variable light curves with an offset of 0 to identify the number of candidate variables proposed by this model. This model selects 103,790 light curves as variable candidates. This selection is equivalent to 17.58% of the SkycamT light curves, which is higher than the 10% of the Northern Sky Variability Survey (NSVS) light curves found to be variable using a similar approach (Shin et al., 2012). This overestimation of variable candidates is likely a continuation of the difficulties of discriminating between variable light curves and non-variable light curves plagued with systematic noise. We selected the top 52,000 light curves from the initial 88,849 variable candidates sorted by the detection model probability of being a variable light curve. These light curves were processed by the GRAPE period estimation method and the genetic PolyFit algorithm to produce a full set of features for variable star classification and anomaly detection.

Figure 7.23: ROC curves of a Random Forest 4-fold cross validation with ntree = 1500, mtry = 4 and nodesize = 25 on the 12th training dataset with the subset variable training set. As with the full dataset models, variables are easier to classify than non-variables. The higher AUC of the better performing models using this subset of variable light curves is clear relative to the full dataset.

Figure 7.24: Bar plot of the mean decrease Gini of the top 20 features in the variability detection Random Forest model. The error bars demonstrate the ±2σ where σ is the standard deviation in each of the features across the 20 different models using the 20 different non-variable light curve training sets. The top 4 features are strong indicators of the presence of periodic variability in the light curves. The B-R colour also contains important information on potential variability as some of the variable star types are in specific colour regions.

7.5.3 Feature Importance

The mean decrease Gini statistic is extracted from the trained Random Forest models for each of the 20 non-variable training sets. The mean and standard deviation of the mean decrease Gini for each feature across the 20 models (each model is trained on the entire training dataset) are computed. Figure 7.24 demonstrates the result of this computation, where the bars represent the mean aggregate of the 20 mean decrease Gini values for each feature and the error bars represent ±2σ of the mean decrease Gini, where σ is the standard deviation. The Inverse Von Neumann ratio and η variability index features are the dominant indicators of variability as they measure the differences between consecutive data points. A variable light curve has a continuous signal influencing these differences, which clearly influences the feature. The Robust Median Statistic is a robust measure of the light curve standard deviation, which is a strong indicator of variability when combined with the mean magnitude feature, as variable light curves have standard deviations above those expected for the associated magnitude. The Quick LSP P-Value feature is also highly useful as it demonstrates the strength of the periodicity of a light curve using the sidereal alias. Even though this is not the true period, the strength of this alias still provides important information to the model, and many of the classes trained on the variable star classifier are highly periodic (Shin et al., 2009, 2012). The B-R colour also contains important information on potential variability as some of the variable star types fall in specific colour regions due to the stellar properties required to drive the variability (Percy, 2008).

Figure 7.25: Bar plot of the mean decrease Gini of the top 20 features in the variability detection Random Forest model using the training subset of period estimated hit and multiple/submultiple variable light curves. The error bars demonstrate ±2σ, where σ is the standard deviation of each feature across the 20 different models using the 20 different non-variable light curve training sets. The top features are similar to the full training set model except the Quick LSP P-Value is stronger, along with improvements in the importance of the Autocorrelation Length and the Stetson J Index.

We also compute the mean decrease Gini measures for the 20 non-variable training sets with the subset of training variables with hit and multiple/submultiple estimated periods. The feature importance results of these models are shown in figure 7.25. The Inverse Von Neumann ratio and η variability index features remain the strongest indicators of variability in these models; however, the Quick LSP P-Value has improved to a clear second place behind these two correlated features. The other two features with a strong improvement in their importance are the autocorrelation length and the Stetson J index. The autocorrelation length is a strong indicator of long period variability, which complements the Quick LSP P-Value as the Quick LSP period for a long period variable is very close to the sidereal day, which can cause difficulties in estimating the strength of the periodic component of the true period. The Stetson J index is a robust variability measure which computes the sum of the weighted magnitude change between pairs of data points (Stetson, 1996). In this pipeline we define the pairs as each consecutive group of two data points. Figure 7.26 demonstrates the SkycamT light curves associated with the minimum, median and maximum of the top 6 performing variability indices: η variability index, Quick LSP P-Value, Autocorrelation Length, Robust Median Statistic, Stetson J and the Percent Amplitude. Some of these light curves are interesting, such as the maximum Stetson J and the minimum η caused by the blending of two sources. The maximum Quick LSP P-Value light curve corresponds to the long period variable Mira. As this feature has a limit of 324, there are multiple light curves with this maximum value. The light curve of the maximum Autocorrelation Length feature also corresponds to a known variable star, the Semiregular variable V521 Ophiuchi.
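A minimal sketch of the Stetson J computation with consecutive pairs, as described above; the error-normalised deltas follow the standard Stetson (1996) definition, the equal pair weights are an assumption, and the example data are synthetic.

import numpy as np

def stetson_j(mag, err):
    """Stetson J index with consecutive observations as pairs.

    delta values are error-normalised residuals from the mean magnitude;
    P_k is the product of the deltas of each consecutive pair of points.
    """
    mag = np.asarray(mag, dtype=float)
    err = np.asarray(err, dtype=float)
    n = mag.size
    delta = np.sqrt(n / (n - 1.0)) * (mag - mag.mean()) / err
    p = delta[:-1] * delta[1:]               # consecutive pairs
    w = np.ones_like(p)                      # equal pair weights assumed here
    return np.sum(w * np.sign(p) * np.sqrt(np.abs(p))) / np.sum(w)

# A correlated (periodic) signal scores higher than pure noise of the
# same total scatter, because consecutive deltas share the same sign.
t = np.linspace(0, 10, 200)
rng = np.random.default_rng(3)
signal = 0.3 * np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)
noise = rng.normal(0, np.sqrt(0.3**2 / 2 + 0.1**2), t.size)
print(stetson_j(signal, np.full(t.size, 0.1)),
      stetson_j(noise, np.full(t.size, 0.1)))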

Despite the performance difference between the variable subset detection model and the full dataset detection model, the most important features are very similar in both position and in relative mean decrease Gini value. This suggests that the performance difference is a result of a higher quality training dataset rather than any definitive differences in the models. This assumption can be examined by computing the number of objects in the 590,492 SkycamT light curves selected by the two models given the same input arguments. Using an offset of 0 to decide the probability cuts for both models, the full dataset variability detection model selects 165,573 light curves with a variable probability cut of 0.481, whereas the subset hit and multiple/submultiple variables training set selects 103,790 light curves with a probability cut of 0.456. These results indicate that our assumption is false: despite the similarity between the feature importances in the two models, the full dataset model classifies many more light curves as variable candidates. Many of these additional variable candidates are likely false positives due to the number of aliased light curves from the training set which are not true aliases but spurious periods. This indicates that the subset of variable training data is the better method to train the variability detection model, as the reduced size of the training dataset can be augmented through additional training cycles and additional non-variable training sets.

Figure 7.26: A set of SkycamT light curves corresponding to the minimum, median and maximum values of the 6 most important variability indices, η variability index, Quick LSP P-Value, Autocorrelation Length, Robust Median Statistic, Stetson J and the Percent Amplitude. These light curves exhibit interesting effects such as blending and two of them are known long period variable stars.

7.6 Random Forest based Anomaly Detection

The variable star classification models are trained using a set of light curves of 12 classes. These 12 classes were chosen as they fulfilled two requirements for the SkycamT light curves: they needed to have characteristic variability detectable over the, at best, 0.2 mag amplitude noise present in every light curve, and there needed to be sufficient AAVSO cross-matched and GRAPE estimated period matched light curves to perform the 5-fold cross validation tuning operations. As a result, there are likely light curves in the SkycamT database of sufficient quality to be detectable as variable light curves whose true class is not a trained class of the variable star classification model. These light curves, when applied to the models described above, would likely produce a classification probability vector which is matched to the most similar training set class or, for objects very dissimilar to all the training classes, the classification probability vector would have a low, nearly equal probability for every class. This will result either in an unknown classification, or a low-confidence classification into the closest matching class. These light curves are known as anomalous light curves as they do not match the classifier's learned knowledge. The identification of anomalous light curves is of interest as it may identify interesting objects of additional classes beyond the current 12 trained classes.

Figure 7.27: Histogram of the anomaly scores of the 51,129 variable candidate light curves classified by the period-matched variable star Random Forest classification model. The bulk of the light curves in this dataset have an anomaly score relative to the variable star classification training set of 1.14 to 2.13. The most anomalous light curve has a value of 6.46 and the 5% most anomalous light curves (2556 sources) in this dataset have values greater than 3.07.

Anomalous light curves can be determined by computing a distance (similarity) metric between the known light curves and an unknown candidate. This can be done through the computation of distance features directly from the light curve (Protopapas et al., 2006; Rebbapragada et al., 2009). Fortunately, it is possible to compute similarity metrics directly from the machine learning classification models using the training light curve feature vectors, as the machine learning models weight the different features according to their importance (Richards et al., 2012; Bhattacharyya et al., 2011). The Random Forest classification models can be made to output a proximity matrix for a set of light curves. This matrix is an n × n matrix of similarities between the n feature vectors. For two feature vectors xi and xj, the similarity measure is computed by determining the proportion of the trees in the random forest where the two feature vectors are located in the same leaf node at the end of the classification. Similar feature vectors will end up in the same leaf node in every tree and produce a similarity ρij = 1, whereas highly dissimilar feature vectors will never be in the same leaf node and therefore ρij = 0. This is a very potent method of determining similarity as it is equivalent to a Euclidean distance between the two feature vectors weighted by the importance of each feature in the classification model (Richards et al., 2012). The desired anomaly score must use this similarity in a method which produces a real-valued measure of anomalousness. The similarity can easily be converted to this scaling using equation 7.7.

Table 7.9: Predicted class labels after probability cuts and calibration on the 51,129 SkycamT light curves selected as variability candidates by the detection model. A 13th class has been added to identify light curves which did not secure sufficient classification confidence to obtain a class label. These may include non-variable false positives, poor period estimations or anomalous feature vectors.

Number  Predicted Class             Type                 Count
1       β Lyrae                     Eclipsing Binary     5924
2       β Persei                    Eclipsing Binary     16544
3       Chemically Peculiar         Rotational Variable  4419
4       Classical Cepheid           Pulsating Variable   491
5       δ Scuti                     Pulsating Variable   2392
6       Mira                        Pulsating Variable   1469
7       RR Lyrae Fundamental Mode   Pulsating Variable   398
8       RR Lyrae Overtone Mode      Pulsating Variable   536
9       RV Tauri                    Pulsating Variable   71
10      Semiregular Variable        Pulsating Variable   4733
11      Spotted Variable            Rotational Variable  983
12      W Ursae Majoris             Eclipsing Binary     6887
13      Unknown classification      Below Threshold      6282

d(x_i, x_j) = \frac{1 - \rho_{ij}}{\rho_{ij}} \qquad (7.7)

where d(xi, xj) is the anomaly score of xi based on a similarity measurement with feature vector xj. Richards et al. used this methodology in their ASAS classification pipeline by defining xj as the second nearest neighbour to the test feature vector xi in the training dataset utilised for the associated Random Forest model. Utilising this methodology on the 51,129 variable candidate light curves classified by the matched-period variable star classification model, we find the anomaly scores for the variable candidate SkycamT light curves peak at 1.59 with a tail out to 6.46 and a top 5% percentile of 3.07. Figure 7.27 demonstrates this distribution of anomaly scores for these 51,129 conservative probability cut variable candidate light curves.
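A minimal sketch of this proximity-based anomaly score, using scikit-learn's Random Forest apply() method to recover the leaf co-occurrence proportions; the training data are synthetic placeholders, the second-nearest-neighbour choice follows the Richards et al. convention, and the guard against ρ = 0 is an added assumption to keep the score finite.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X_train = rng.normal(size=(300, 16))      # placeholder training feature vectors
y_train = rng.integers(0, 12, size=300)
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

def anomaly_score(x, forest, X_train):
    """Equation 7.7 against the second nearest training neighbour, where rho
    is the fraction of trees in which the two samples share a leaf node."""
    leaves_x = forest.apply(x.reshape(1, -1))[0]     # leaf index per tree
    leaves_train = forest.apply(X_train)             # shape (n_train, n_trees)
    rho = (leaves_train == leaves_x).mean(axis=1)    # proximity to each training vector
    rho_2nd = np.sort(rho)[-2]                       # second nearest neighbour
    rho_2nd = max(rho_2nd, 1.0 / forest.n_estimators)  # avoid division by zero
    return (1.0 - rho_2nd) / rho_2nd

print(anomaly_score(rng.normal(size=16), forest, X_train))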

7.7 Skycam bright variable source catalogue

Applying the variability detection and classification models to the 590,492 SkycamT light curves with greater than 100 data points, using probability cuts determined from the best performing sensitivity and specificity of the variable star class, 103,790 variable candidates are selected. This is 17.6% of the full dataset, a slightly higher estimate than the 10% of the Northern Sky Variability Survey (NSVS), most likely due to the increased systematics in the Skycam light curves producing 'phantom' variability (Shin et al., 2012). Half of these candidates (51,129 sources) have had a full set of period and PolyFit features computed and were classified using the variable star classification model. The probabilities have been calibrated to the posterior probability of the classes and probability cuts established from the Index of Union of the training data. The table of predicted classes is shown in table 7.9 as computed after the probability calibration and applied probability cuts.

Figure 7.28: Histogram of the estimated period overabundances near the known spurious periods for the 51,129 candidate light curves. Light curves in these overabundances cannot be trusted as true variables and must be removed.

Figure 7.28 shows the histogram produced by this set of variable candidates and it reveals that the variability detection model is not sufficient to filter all the spurious light curves from the dataset. The diurnal alias is very strong in the SkycamT light curves, likely due to the simplification of the airmass extinction corrections applied to the light curve photometry (Mawson et al., 2013). There are two additional significant overabundances located near 27.5 days and 29.5 days. The 29.5 day period is a result of light curves with substantial contamination by moonlight across the lunar month. The 27.5 day period is close to the seasonal alias of the lunar month and is also likely a result of moonlight contamination. To improve the quality of the final dataset, the light curves near these spurious overabundances are removed. This action does eliminate some potentially real candidate variables and therefore it would be advantageous to apply a superior method to attempt to retain significant signals. Richards et al. developed a method using the period and confidence produced by the Lomb-Scargle periodogram to determine the true variables based on the strength of the periodogram peaks against the proximity of the associated frequency to the known spurious frequencies (Richards et al., 2012). This method is not yet used in the pipeline due to the stronger spurious signals in the SkycamT data relative to the ASAS data used in their pipeline; however, it may be useful in the future if superior airmass corrections reduce the strength of the diurnal alias. The overabundances are manually determined from histograms similar to those in figure 7.28 by selecting the significantly overabundant bins with a bin size of 0.02 days for the diurnal aliases and 0.05 days for the lunar spurious periods. Table 7.10 shows the number of light curves removed in this operation as well as the period range removed around each of the associated spurious overabundances.

Table 7.10: Number of light curves filtered from the 51,129 candidate light curves due to proximity to known spurious data structures.

Spurious Period  Removed Period Range  Removed Light Curve Count
0.997 days       ±0.04 days            8897
0.499 days       ±0.02 days            4808
0.332 days       ±0.01 days            2034
0.249 days       ±0.01 days            1733
0.199 days       ±0.01 days            1038
0.166 days       ±0.01 days            812
27.5 days        ±0.25 days            407
29.5 days        ±0.25 days            1164
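The removal itself reduces to a window test around the table 7.10 centres; a minimal sketch, with the candidate periods purely illustrative.

import numpy as np

# Known spurious period centres and removal half-widths from table 7.10 (days).
SPURIOUS = [(0.997, 0.04), (0.499, 0.02), (0.332, 0.01), (0.249, 0.01),
            (0.199, 0.01), (0.166, 0.01), (27.5, 0.25), (29.5, 0.25)]

def is_spurious(period):
    """True if a GRAPE period lies inside any known spurious window."""
    return any(abs(period - centre) <= width for centre, width in SPURIOUS)

periods = np.array([0.995, 0.575, 21.17, 29.4, 238.61])
print([p for p in periods if not is_spurious(p)])   # -> [0.575, 21.17, 238.61]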

The removal of these light curves leaves 30,236 light curves with periods which cannot be explained by known spurious structures in the SkycamT data and which are much more likely to be a result of true variability. Including the training dataset and assuming a similar proportion of the unclassified variable candidates are spurious, this suggests approximately 62,200 variable candidates, equal to 10.5% of the SkycamT light curves with 100 or more data points, which is very similar to the proportion of candidate variable sources found by the NSVS analysis. Table 7.11 demonstrates the predicted classes for the 30,236 light curves. The classes which have been depleted relative to table 7.9 are dominated by the short period and low amplitude variables, especially the eclipsing binaries, as the airmass extinction can produce eclipse-like features at diurnal alias periods. The removal of the potentially spurious light curves has also resulted in fundamental mode RR Lyrae variables outnumbering the overtone mode variants. Proportionally, a similar number of variable light curves have unknown classifications after the removal operation compared to before it.

Table 7.11: Predicted class labels after probability cuts and calibration on the 30,236 SkycamT light curves remaining as variability candidates after removal of potential spurious light curves.

Number  Predicted Class             Type                 Count
1       β Lyrae                     Eclipsing Binary     2551
2       β Persei                    Eclipsing Binary     8817
3       Chemically Peculiar         Rotational Variable  2827
4       Classical Cepheid           Pulsating Variable   436
5       δ Scuti                     Pulsating Variable   1117
6       Mira                        Pulsating Variable   1469
7       RR Lyrae Fundamental Mode   Pulsating Variable   251
8       RR Lyrae Overtone Mode      Pulsating Variable   239
9       RV Tauri                    Pulsating Variable   71
10      Semiregular Variable        Pulsating Variable   4733
11      Spotted Variable            Rotational Variable  619
12      W Ursae Majoris             Eclipsing Binary     2904
13      Unknown classification      Below Threshold      4202

None of the candidate variables in this dataset were present in the AAVSO Variable Star Index as of the 28th of April 2016 at a matching tolerance of 148″. Recently, the Gaia mission released its Data Release 2 (DR2) for public consumption. The Gaia spacecraft has been making precision measurements of over one billion sources and DR2 contains data from the first 22 months of scientific observations. The Gaia DR2 contains 363,969 photometrically identified and classified variable sources, which have been reduced to 24,984 variable sources with Gaia G-band magnitude brighter than +12. The right ascension and declination coordinates of these variable stars were downloaded along with the classification result from the Gaia machine learning classification pipeline. The 30,236 variable candidates from the above operations were then checked against this Gaia variable source catalogue. 540 of our variable candidates were within 148″ of a Gaia variable source, suggesting that approximately 1.8% of the candidate sources are due to a bright variable source, with the remaining candidates likely a result of the SkycamT systematic noise and variables too faint to have been sufficiently sampled by Gaia in the first 22 months. Table 7.12 shows the predicted classes of these 540 variable candidates present in the Gaia variable catalogue.
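The cross-match reduces to an angular separation test at the 148″ tolerance; a minimal sketch using a brute-force great-circle separation (a production pipeline would presumably use a spatial index for the full candidate and Gaia lists), with the coordinates below purely illustrative.

import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between points given in degrees,
    using the Vincenty formula, which is stable at small separations."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    num = np.hypot(np.cos(dec2) * np.sin(d_ra),
                   np.cos(dec1) * np.sin(dec2)
                   - np.sin(dec1) * np.cos(dec2) * np.cos(d_ra))
    den = np.sin(dec1) * np.sin(dec2) + np.cos(dec1) * np.cos(dec2) * np.cos(d_ra)
    return np.degrees(np.arctan2(num, den)) * 3600.0

def match_to_gaia(candidates, gaia, tol_arcsec=148.0):
    """Indices of candidate (ra, dec) pairs within tol of any Gaia variable."""
    matched = []
    for i, (ra, dec) in enumerate(candidates):
        seps = angular_sep_arcsec(ra, dec, gaia[:, 0], gaia[:, 1])
        if seps.min() <= tol_arcsec:
            matched.append(i)
    return matched

gaia_vars = np.array([[129.308, -16.3625], [285.316, 37.5594]])  # illustrative
cands = np.array([[129.309, -16.3620], [200.0, 10.0]])
print(match_to_gaia(cands, gaia_vars))   # -> [0]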

Table 7.12: Predicted Class labels after probability cuts and calibration on the 540 SkycamT light curves matched to Gaia DR2 variable sources.

Number  Predicted Class             Type                 Count
1       β Lyrae                     Eclipsing Binary     23
2       β Persei                    Eclipsing Binary     100
3       Chemically Peculiar         Rotational Variable  11
4       Classical Cepheid           Pulsating Variable   11
5       δ Scuti                     Pulsating Variable   10
6       Mira                        Pulsating Variable   113
7       RR Lyrae Fundamental Mode   Pulsating Variable   3
8       RR Lyrae Overtone Mode      Pulsating Variable   1
9       RV Tauri                    Pulsating Variable   2
10      Semiregular Variable        Pulsating Variable   153
11      Spotted Variable            Rotational Variable  4
12      W Ursae Majoris             Eclipsing Binary     32
13      Unknown classification      Below Threshold      77

Table 7.13: Gaia classifier variable star types with the number of variables for each class in the 24,984 downloaded classified bright variable sources. The Long Period Variables appear to be dominant in the bright sources due to their high luminosity allowing them to be viewed across greater distances.

Number  Predicted Class  Type                                  Count
1       CEP              Classical Cepheid variables           496
2       DSCT SXPHE       δ Scuti and SX Phoenicis variables    126
3       MIRA SR          Mira and Semiregular variables        23,907
4       RRAB             Fundamental mode RR Lyrae variables   205
5       RRC              Overtone mode RR Lyrae variables      14
6       RRD              Dual-mode RR Lyrae variables          3
7       T2CEP            Type II Cepheid variables             233

Finally, the classifier result from the SkycamT pipeline can be compared to the result from the Gaia machine learning pipeline. A direct comparison is difficult as the Gaia classifier is specifically trained to detect a strict set of pulsating variables only. There are seven classes shown in table 7.13 out of the nine total Gaia classes, as the omitted classes have no known bright variables. As a result of the limitations of the Gaia classification, many of our variable candidates have been classified as Mira or Semiregular long period variables. Whilst this suggests that our variable candidates may simply be a result of blending with a nearby high amplitude variable source, the lack of eclipsing binary classification in the Gaia sources leaves the comparison inconclusive. Ultimately, despite the lack of diurnal aliases in the Gaia data, the limited resampling of the Gaia DR2 data limits the potential for finding low amplitude variables and eclipsing binaries. Figure 7.29 demonstrates the results of comparing the Gaia classification to the SkycamT classification pipeline result. Assuming that Gaia DR2 has a sufficient baseline to reliably classify the long period variables, the number of successfully classified SkycamT long period variables is 51.7% of the expected amount.

Figure 7.29: Confusion matrix heatmap plot of the Gaia classifier result compared with the SkycamT classifier result for the 540 matched variable candidates. The only sources which agree well between the two sets are the Mira and Semiregular variables, which is not unexpected given the Gaia classifier is optimised for pulsating variables.

The SkycamT sources which are classified as long period variables agree strongly with the Gaia classifications. The remaining sources are primarily candidate rotational and binary variables which lack a class in the Gaia classification and may be either real variables undetected by Gaia or the result of blending, possibly aliased, with the Gaia long period variable source. For the RR Lyrae variables, SkycamT overestimates the number of both fundamental and overtone mode RR Lyrae variables relative to the Gaia classifications, indicating that many of these classifications may be due to systematic variability, likely from diurnal aliases given their characteristic periods of under a day, or possibly blending with a nearby bright variable source. The Gaia data suggests that all the RR Lyrae variables are spurious results as there are no RR Lyrae matches in the Gaia classification model. As expected, these results suggest that many of the long period variable classifications are reliable detections, benefiting from the long period seasonal trend removal method, but many of the shorter period variables are still plagued with systematic noise which generates considerable numbers of false positive classifications on the SkycamT light curves.

Taking the 540 candidate variable sources from this analysis, interesting objects are selected for comment. The candidate variable source [0940-0328275] is classified as a W Ursae Majoris eclipsing binary with a probability of 0.8988, an anomaly score of 0.4124 and a period (doubled from the GRAPE result) of 0.285 days, shown in figure 7.30. The General Catalogue of Variable Stars (GCVS) identifies this source as V1089 Ophiuchi, an uncertain RR Lyrae variable of unknown subtype with an unreported period. The Gaia variable source catalogue classifies this source as a Mira or Semiregular long period variable, a result not supported by the SkycamT light curve.

Figure 7.30: Folded light curve of the candidate variable source [0940-0328275] classified as a W Ursae Majoris eclipsing binary by the SkycamT classification pipeline. The GRAPE period has been doubled to reproduce the characteristic variability.

[Skycam 129.308 -16.3625] is a variable candidate light curve which is not matched to a USNO-B catalogue source, shown in figure 7.31. The SkycamT classification pipeline has classified this object as a Classical Cepheid with a calibrated probability of 0.8135 and an anomaly score of 0.3477. Matching this source to the GCVS identifies it as a long period variable candidate with the name V370 Hydrae, and Gaia identifies it as a Mira or Semiregular variable. GRAPE detects a period of 21.17 days for this source, which suggests it may be a Small Amplitude Red Giant variable as it is too red in colour to be a Classical Cepheid. The ASAS classifier developed by Richards et al. identifies the source as a Long Secondary Period variable with a period of 386.969 days. Long Secondary Period variables are usually semiregular variables with a secondary longer period generated by an unknown process, possibly the coupling of a pulsation mode to a rotation mode (Percy, 2008). Epoch folding the light curve at this period reveals a possible variation, although it is of poor quality due to seasonal sampling as shown in figure 7.31.

Figure 7.31: Folded light curve of the candidate variable source [Skycam 129.308 -16.3625] classified as a Classical Cepheid by the SkycamT classification pipeline. The top plot shows the epoch folding at the GRAPE period of 21.17 days and the bottom plot shows the light curve folded at the ASAS period of 386.969 days.

[Skycam 285.316 +37.5594] is another USNO-B unmatched source and has been assigned a unique Skycam identifier, shown in figure 7.32. This candidate variable has not been classified as the probabilities of each class did not meet the required cut-off threshold. The variable class with the highest probability is the Mira class. This object is too far north to have been detected in the ASAS data. It is present in the Gaia classification as a Mira or Semiregular variable. It has a GRAPE period of 124.19 days and an anomaly score of 0.3850. Manual inspection of the folded light curve reveals this to be a Mira-type variable at the limit of detectability, where the source is only visible when the Mira is at its brightest. The GCVS identifies this source as the Mira variable RT Lyrae. The poor sampling of the Mira light curve due to the detection limit results in the GRAPE period being half of the GCVS catalogue period of 253.7 days.

The candidate variable source [0972-0706308], like the above source, is also unclassified by the SkycamT pipeline, although the Semiregular probability is extremely close to the threshold. The light curve has a GRAPE period of 238.61 days and an anomaly score of 0.4085. The source has been classified from its ASAS light curve as a Semiregular variable with a period of 207.067 days. On the SkycamT data, the GRAPE period is a superior fit relative to the ASAS period, as shown in figure 7.33. This variable is present in the GCVS, where it has been assigned the name EV Pegasi and classified as a long period variable star, a fairly general classification.

The candidate variable source [0901-0406646] is classified as a Classical Cepheid in the SkycamT pipeline with a probability of 0.8294, a GRAPE period of 7.30 days and an anomaly score of 0.4535. This source is very close to the magnitude limit of the SkycamT instrument and exhibits increased scatter as a result. Both ASAS and Gaia also identify this variable source as a Classical Cepheid, with this object being in the training dataset for the ASAS classifier. GCVS identifies this variable star as the Classical Cepheid V336 Aquilae and GRAPE, ASAS and GCVS agree on the period of the light curve. Figure 7.34 demonstrates the noisy SkycamT light curve of this Cepheid variable.

The candidate variable sources [0936-0585159] and [0866-0257809] are classified as Classical Cepheids in the SkycamT pipeline with respective probabilities of 0.8638 and 0.6633 and respective anomaly scores of 0.6026 and 0.6835. These anomaly scores differ substantially from the other Classical Cepheids classified by the pipeline. The Gaia classifier also identifies these two sources as Classical Cepheids, but the GCVS identifies them as the variable stars TX Delphini and W Virginis, Type II Cepheid variables. Due to a lack of sufficient training examples, as there are few Type II Cepheids bright enough to be detected by SkycamT, the SkycamT pipeline has not been trained to identify this class of variable and therefore the Classical Cepheid class is chosen as the most similar class with a notable increase in anomaly score. Gaia is trained with a Type II Cepheid class and has therefore misclassified these two variable stars. Figure 7.35 shows the light curves of these two variable stars folded at their GRAPE periods, which agree with the GCVS periods.

Figure 7.32: Folded light curve of the candidate variable source [Skycam 285.316 +37.5594] unclassified by the SkycamT classification pipeline. The top plot shows the epoch folding of this Mira variable at the GRAPE period of 124.19 days, due to sampling artefacts produced by the detection limit of SkycamT, and the bottom plot shows the GCVS catalogue period of 253.7 days.

Figure 7.33: Folded light curve of the candidate variable source [0972-0706308] unclassified by the SkycamT classification pipeline. The top plot shows the epoch folding of this Mira variable at the GRAPE period of 238.61 days and the bottom plot shows the ASAS light curve period of 207.07 days. The GRAPE period is a much better fit on the SkycamT light curve.

Figure 7.34: Folded light curve of the Cepheid variable [0901-0406646] classified as a Classical Cepheid by the SkycamT classification pipeline. The period of this variable star is 7.30 days, which agrees with the periods from the GCVS and ASAS data.

[Skycam 70.389 +40.1975] is shown in figure 7.36. This candidate source is classified as a Classical Cepheid with a probability of 0.7148 and an anomaly score of 0.6949. Gaia identifies this source as a Semiregular variable and the GCVS classifies the source as a long period variable star named HO Persei. This variable star is too far north to appear in the ASAS classifications. GRAPE identifies a period of 22.46 days, but manual inspection of the folded light curve reveals this is likely an incorrect period as many data points align poorly with the central sawtooth signal. The raw light curve of this variable star, also shown in figure 7.36, reveals that the star has significant long term irregular variability, which suggests that the object is a type SRb non-periodic semiregular variable star.

The candidate variable source [0683-0807889] is classified as a Chemically Peculiar variable in the SkycamT pipeline with a probability of 0.4962, a GRAPE period of 11.06 days and an anomaly score of 1.3256. The folded and raw light curves of this source are shown in figure 7.37. This candidate variable source is within 148″ of a Gaia classified Mira or Semiregular variable star, but has a spectral class of A, a strong property of chemically peculiar stars, which combined with the 11.06 day period leads to this classification.

Figure 7.35: Folded light curves of the Type II Cepheid variables [0936-0585159] and [0866-0257809] classified as Classical Cepheids by the SkycamT classification pipeline. The top plot shows the light curve of TX Delphini with a period of 6.17 days and the bottom plot shows the light curve of W Virginis with a period of 17.29 days.

Figure 7.36: Folded light curve of the semiregular variable [Skycam 70.389 +40.1975] classified as a Classical Cepheid by the SkycamT classification pipeline. The top plot shows the folded light curve of this variable with a poorly defined period of 22.46 days and the bottom plot shows the raw light curve of this semiregular variable with long-term irregular variability.

This source is not present in the GCVS, suggesting it may be a result of blending with the nearby long period variable. The period of 11.06 days does not appear to be an especially clear choice for this light curve as there appear to be additional structures in the folded light curve separate from the main sawtooth signal. The raw light curve of this source reveals a long period variability which also indicates that this variable candidate might be a result of blending with the nearby bright variable star.

The candidate variable source [0691-0647659] is classified as an RV Tauri variable in the SkycamT pipeline with a probability of 0.3192, a GRAPE period of 62.34 days and an anomaly score of 1.778. The Semiregular probability is higher than the RV Tauri probability but as the cut-off threshold for the RV Tauri class is substantially smaller than the Semiregular cut-off threshold, the RV Tauri classification becomes dominant. The anomaly score indicates that this source is unlike the RV Tauri variables in the training data. Gaia classifies this source as a Mira or Semiregular variable star and the ASAS classifier identifies this source as a Long Secondary Period variable with a period of 811.514 days. Figure 7.38 shows the folded light curve of this variable source at both the GRAPE period of 62.34 days and the ASAS period of 811.514 days. The GRAPE period is very clear and the asymmetry of the variations skews the light curve into an RV Tauri-like shape. This object is likely a Semiregular variable star with a period of 62.34 days. The epoch folding at the Long Secondary Period suggested by the ASAS data reveals a possible secondary period although it is not clear due to extensive sampling gaps and a period close to the SkycamT baseline. The GCVS identifies this source as a variable star of unknown classification.

The candidate variable source [1088-0133551] is classified as a fundamental mode RR Lyrae variable in the SkycamT pipeline with a probability of 0.4064, a GRAPE period of 0.575 days and an anomaly score of 2.817. This anomaly score is relatively large and close to the outlier region discussed in chapter 7 and the object only just surpassed the cut-off threshold required for classification. Gaia classifies this source as a Mira or Semiregular variable and the ASAS classifier identifies this candidate variable as a Semiregular variable with a poorly defined period, producing an alias of 0.995 days in the ASAS period estimation. Figure 7.39 shows the folded light curve of this object at the GRAPE period of 0.575 days showing a sawtooth signal with significant ‘clumping’ due to the sampling of this light curve. This figure also shows the raw light curve of this source which exhibits poorly sampled non-periodic variability over longer timescales which is not characteristic of RR Lyrae variables. This variable star is likely an SRb type Semiregular variable with a poorly defined period which has produced an aliased result at the diurnal period. This is a good example of a source which could benefit from a secondary classification model trained on aliased light curves, as the 0.575 day GRAPE period would otherwise suggest a shorter period variable class.

Figure 7.37: Folded light curve of the candidate variable [0683-0807889] classified as a Chemically Peculiar star by the SkycamT classification pipeline. The top plot shows the folded light curve of this variable with an unconvincing period of 11.06 days and the bottom plot shows the raw light curve of this candidate variable with long-term irregular variability potentially caused by blending with a nearby long period variable star.

Figure 7.38: Folded light curve of the candidate variable [0691-0647659] classified as an RV Tauri variable by the SkycamT classification pipeline. The top plot shows the folded light curve of this variable with a GRAPE period of 62.34 days and the bottom plot shows the folded light curve of this candidate variable at a potential long secondary period of 811.514 days. This source is likely a semiregular variable star possibly with a long secondary period.

Figure 7.39: Folded light curve of the candidate variable [1088-0133551] classified as a fundamental mode RR Lyrae variable by the SkycamT classification pipeline. The top plot shows the folded light curve of this variable with a GRAPE period of 0.575 days and the bottom plot shows the raw light curve of this candidate variable revealing possible irregular long term variability indicating it may be an SRb type Semiregular variable star.


Ultimately, most of the true variable sources with magnitudes brighter than +12 have been successfully detected, although the period is often left undefined in the main catalogues such as GCVS. The classifications can also be too general, such as ‘long period variable’, and many sources are still listed as candidates. The SkycamT data can be used to constrain the period of these bright variables due to the increasing length of the survey baseline as it continues to collect data. The short period variables appear to be extremely unreliable due to the strong diurnal aliasing in the data. Further corrections must be applied to the diurnal feature, such as airmass extinction, to improve the performance of the classification pipeline for light curves with periods shorter than one day.

Chapter 8

Conclusions and Future Work

In this chapter the conclusions of this thesis are discussed. The aims and objectives of this research were the development of novel feature extraction methods and machine learning models to produce a Skycam automated variability classification pipeline designed to process the noisy and variable cadence data as effectively as those methods designed for other surveys. The analysis of the light curve data was required to determine the main difficulties in the production of the completed pipeline and the bias-free selection of training light curves. This analysis also suggested possible alternative approaches using representation learning to exploit the useful components of the data. The final result of the pipeline is a catalogue of variable candidates identified and classified into 12 detectable variable star types using statistics robust to the Skycam data systematics, which can be exploited for future scientific study. The chapter ends with a consideration of possible future work directions to improve performance in specific areas of difficulty in the pipeline and to continue to tune the machine learning models for a higher quality set of results from the final classifications.

8.1 Conclusions

There are a substantial number of variable star classes, with photometric changes arising from internal atmospheric changes driving pulsations, from the rotation of stars with surfaces of uneven brightness, and from orbital motion causing stars to obscure other stars. These different classes produce different shapes of light curves due to the various timescales and processes of the underlying astrophysics. Many of these classes have subtypes which describe relatively minor differences within a class. These subtypes often have fuzzy boundaries as they are representations of a continuous distribution of variable sources (Percy, 2008). This can be seen in the misclassifications between the three eclipsing binary subtypes, the β Persei subtype, the β Lyrae subtype and the W Ursae Majoris subtype, with the first two subtypes being the hardest to distinguish (Malkov et al., 2007; Hoffman et al., 2008).

The development of this pipeline required the implementation of a robust period estimation method as the period is a very important feature and is a dependency of many other features. The investigation of the literature revealed there are three primary groups of period estimation methods in astronomy: frequency domain algorithms using Fourier decomposition of the time-series, time domain methods identifying patterns in the positions of similar data points, and epoch folding methods where the light curve is phased around a candidate period and a performance measure is determined from the phased data (Larsson, 1996). The Lomb-Scargle based frequency domain methods were the best performing on a subset of SkycamT variable cross-matched light curves (Lomb, 1976; Scargle, 1982). There is potential in the epoch folding methods yet they are plagued by the high noise and unusual cadence of the Skycam data. We also analysed the bands method for the subsetting of the frequency space into useful trial frequencies and found that the frequency space could be reduced from 200,000 trial frequencies to fewer than 10,000 whilst maintaining a correct period recall rate of greater than 0.95. This is a useful result but still substantially inferior to the performance on the EROS-2 light curves, where these results were obtained from under 1,000 returned trial frequencies (Protopapas et al., 2015).
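As an illustrative sketch only, the frequency domain approach can be expressed in a few lines of Python using the astropy package (this is not the pipeline's own implementation, and the synthetic light curve below is purely an example):

    # Sketch of frequency domain period estimation with a Lomb-Scargle
    # periodogram; the Skycam-like sampling and signal are synthetic.
    import numpy as np
    from astropy.timeseries import LombScargle

    rng = np.random.default_rng(42)
    t = np.sort(rng.uniform(0, 1200, 500))   # irregular sampling over 1200 days
    y = 0.3 * np.sin(2 * np.pi * t / 6.17) + rng.normal(0, 0.1, t.size)

    frequency, power = LombScargle(t, y).autopower(minimum_frequency=1/1200,
                                                   maximum_frequency=10)
    best_period = 1 / frequency[np.argmax(power)]   # strongest periodogram peak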

Many of the epoch folding methods have computational runtimes which necessitate the use of the bands method. Unfortunately, these methods are also plagued by a frequency-dependent noise continuum due to correlated noise in the light curves. To attempt to address this difficulty we proposed a method of using the bands method to determine likely ‘uninteresting’ frequencies in the frequency spectrum. These frequencies are assumed to be noise dominated and are used to quickly and approximately fit this continuum, allowing the peaks due to signals to become detectable. Most of these algorithms were shown to have a quadratic complexity with the number of light curve data points (VanderPlas, 2017). This is acceptable for most surveys as they have light curves with a few hundred high quality data points. However, there are enough SkycamT light curves with thousands of data points to significantly slow down the processing. Subsetting the data points is not a good or reliable solution as the light curves are noisy enough that every data point matters in achieving the signal to noise required to consistently identify the true period.

The proposed solution to this problem is to replace the frequency spectrum grid-based method in traditional periodograms with a genetic algorithm optimisation. The genetic algorithm uses a Bayesian Generalised Lomb-Scargle (BGLS) fitness function, a variant of the Lomb-Scargle periodogram which performed well on the SkycamT cross-matched dataset (Mortier et al., 2015; Mortier and Cameron, 2017). The evolution of the candidate periods allows for an exploration of the period space from which a number of candidate periods are selected. This optimisation method is shown to have a longer runtime than a traditional periodogram for light curves with 100 to 200 data points, equivalent runtimes from 200 up to between 500 and 1000 data points, and the fastest runtime above 1000 data points. This is a result of the substantially lower number of candidate period computations required for optimisation compared to a traditional periodogram, although the genetic algorithm does have some other computational overhead such as the generational update method.
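A minimal sketch of this idea is given below, with standard Lomb-Scargle power standing in for the BGLS fitness function; the population size, number of generations and mutation scale are chosen purely for illustration and are not the GRAPE values:

    # Sketch of a genetic-algorithm period search; Lomb-Scargle power is a
    # stand-in fitness for BGLS and all hyperparameters are illustrative.
    import numpy as np
    from astropy.timeseries import LombScargle

    def ga_period_search(t, y, pmin=0.1, pmax=1000.0, pop=100, gens=50, seed=0):
        rng = np.random.default_rng(seed)
        ls = LombScargle(t, y)
        # Work in log-period so mutation steps are scale-free.
        logp = rng.uniform(np.log(pmin), np.log(pmax), pop)
        for _ in range(gens):
            fitness = ls.power(1.0 / np.exp(logp))
            # Tournament selection: each child descends from the fitter of two
            # randomly drawn members of the current population.
            i, j = rng.integers(0, pop, (2, pop))
            parents = np.where(fitness[i] > fitness[j], logp[i], logp[j])
            # Gaussian mutation in log-period keeps exploration local.
            logp = np.clip(parents + rng.normal(0.0, 0.05, pop),
                           np.log(pmin), np.log(pmax))
        return np.exp(logp)  # final population clusters around candidate periods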

The method also introduces the Vuong closeness test as an alias correction method (Baluev, 2012). This information theoretic test compares the performance of two fitted sinusoidal models to determine whether the candidate period is better fit by an aliased or multiple/submultiple period. This method results in improved performance over simply selecting the period with the best fitness function for both the genetic algorithm and periodogram optimisation methods. The improved runtime of the genetic algorithm optimisation is complemented by equivalent to superior performance on the correct period estimation of a set of synthetic light curves with well-known true periods, depending on how well the light curve is modelled by the BGLS fitness function. The period estimation performance was matched by a periodogram with a follow-up fine tuning optimisation around the top candidate periods from the periodogram, at the cost of even longer runtimes. The completed period estimation method is optimised for time-series with many data points, poor sampling and high noise. The method may also be well suited to time-series from other disciplines, such as climate science.
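A sketch of such a test, assuming Gaussian photometric errors (dy as a per-point error array) and simple least-squares sinusoid fits rather than the thesis implementation, is:

    # Vuong-style closeness test between two sinusoidal fits, e.g. a candidate
    # period and its diurnal alias. Gaussian per-point errors are assumed.
    import numpy as np
    from scipy.stats import norm

    def pointwise_loglik(t, y, dy, period):
        # Weighted least-squares sinusoid at the trial period, then the
        # per-point Gaussian log-likelihood of the residuals.
        X = np.column_stack([np.sin(2*np.pi*t/period),
                             np.cos(2*np.pi*t/period), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(X / dy[:, None], y / dy, rcond=None)
        resid = y - X @ coef
        return -0.5 * (resid / dy)**2 - np.log(dy * np.sqrt(2*np.pi))

    def vuong_test(t, y, dy, period_a, period_b):
        d = pointwise_loglik(t, y, dy, period_a) - pointwise_loglik(t, y, dy, period_b)
        z = d.sum() / (np.sqrt(len(d)) * d.std())
        return z, 2 * norm.sf(abs(z))   # z > 0 favours period_a; two-sided p-value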

The development of a robust period estimation method for the SkycamT data allows the computation of sets of features identified in the literature to be good descriptors of important aspects of light curve variability. Many of these features were found to be virtually worthless for the Skycam light curves, corrupted beyond usefulness by the noise component of the data. The introduction of representation learning provides machine learning algorithms which learn to extract the components of the data least corrupted by the noise and therefore the strongest at discriminating between the different classes in the training data. The first approach of encoding the light curves into a two dimensional image-based representation prior to machine learning feature extraction showed promise but was ultimately inferior to the previous features (McWhirter et al., 2017). This is a limitation of the machine learning models applied, and returning to this approach with additional methods could potentially yield important results. After the development of this method, an alternative approach to the image-based data encoding has shown substantially more promise (Mahabal et al., 2017).

The second approach was to interpolate the epoch-folded light curve using a parameterised model designed to fit variable light curves of sinusoidal and non-sinusoidal shape. The selected parameterised model is the PolyFit algorithm, a piecewise polynomial chain (Prsa et al., 2008; Paegert et al., 2014). Employing a novel improvement on this method designed to fit noisy light curves, the PolyFit model is applied to the binned epoch-folded light curves using a genetic algorithm and regularised polynomial regression, reducing each light curve to a set of 99 interpolated magnitudes. These interpolated magnitudes, assuming the period estimation selected a good period for the epoch folding, represent the light curve shape with a large quantity of the noise removed. As a result, the original variability indices computed on the interpolated data have substantially improved statistics for discriminating different shapes of light curve. This technique is further extended using a Principal Component Analysis to extract a number of uncorrelated features from a large set of light curves which represent different aspects of the light curve shapes. This is an interesting result as many of the light curves in the training set were of low quality yet had minimal effect on the performance of the analysis, as the model simply replicated the shape of the epoch-folded training object with little response to whether it appeared to be a true variable star. The second Principal Component was found to perform well in discriminating the short period eclipsing binaries from δ Scuti and RR Lyrae pulsating variables. This feature is heavily anti-correlated with the interpolated skewness feature, suggesting it is a form of ‘robust skewness’ which is less influenced by the noise and sampling of the light curve. Machine learning models trained on this set of PolyFit features performed reasonably well, only a few percent below the overall final variable star classification model used in this pipeline.
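Assuming the interpolated magnitudes are assembled into an array X of shape (n_sources, 99), the feature extraction step reduces to a standard PCA, sketched here with scikit-learn:

    # Sketch: PCA over PolyFit-interpolated light curve shapes. X is assumed
    # to be an (n_sources, 99) array of interpolated magnitudes, one
    # epoch-folded, noise-reduced light curve shape per row.
    from sklearn.decomposition import PCA

    def polyfit_pca_features(X, n_components=10):
        # Centre each light curve on its own mean so only shape remains.
        X_shape = X - X.mean(axis=1, keepdims=True)
        pca = PCA(n_components=n_components)
        scores = pca.fit_transform(X_shape)   # one feature vector per source
        return scores, pca.explained_variance_ratio_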

With an improved set of training features produced by the representation learning methods, these features were calculated for a set of cross-matched variable stars for the training of machine learning classifiers. Prior to the calculation of these features, the histogram of estimated periods indicated an overabundance of light curves at the yearly period and its submultiples. This was found to be a result of a seasonal trend in the data. A modification of the Trend Filtering Algorithm (TFA) is used to fit sinusoidal yearly models to templates of the seasonal variability produced by large sets of interpolated, binned, mean aggregated light curves. This utilises the assumption that the systematic trends should occur over many of the light curves. The initial application of this method was found to be a poor fit as the phase of the yearly trend was a function of the sky position of the light curves, suggesting it is related to yearly changes in sky brightness in different parts of the sky. This was incorporated into the templates by partitioning the sky into 64 regions using right ascension and declination and computing a template trend model for each region, requiring a 64× larger training dataset. This partitioning improved the fit performance of the yearly trend sinusoidal model for the accurate filtering out of this trend. Plotting the histogram after this trend removal demonstrates the overabundance has been eliminated without creating an under-abundance relative to the expected number of variable sources in the period bins.
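The core prewhitening operation, stripped of the regional template machinery, amounts to fitting and subtracting a one-year sinusoid; a minimal per-source sketch is:

    # Sketch of the yearly trend removal: least-squares fit of a one-year
    # sinusoid to a light curve, then subtraction. The thesis fits regional
    # template trends; a single per-source fit illustrates the idea.
    import numpy as np

    def remove_yearly_trend(t, y, period=365.25):
        X = np.column_stack([np.sin(2*np.pi*t/period),
                             np.cos(2*np.pi*t/period), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ coef + coef[2]   # keep the mean level, remove the sinusoid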

A σ–k clipping algorithm is also used to remove outliers dependent on the specific light curve. This method shows there is a bimodal distribution of light curves which have peak period estimation performance at two different values of k, k = 3.5 and k = 5.5. The likely cause of this bimodality is the distinction between light curves with and without significant cloud-related outliers, as cloudy data points need a smaller value of k for reliable removal without damaging a true variable signal. Colour is also determined to be a useful feature yet only 75% of the light curves have been cross-matched to a catalogue to provide the B-R colour feature. The remaining 25% require an imputed colour and therefore a Random Forest based regression method is applied to predict and impute the colour of the unknown sources using the common features of the sources with known B-R colour. Due to the computational effort required to compute this imputation across the whole database, the sky is partitioned into 25 percentile bins, each containing approximately equal numbers of sources, for the training of the imputation models. This is not a recommended procedure and is used purely to test the initial set of SkycamT light curves. A better method of determining the colours of the detected sources must be produced.
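A minimal sketch of the σ–k clipping step, shown here with k = 3.5, one of the two peak-performance values identified above, is:

    # Sketch of sigma-k clipping: iteratively reject points further than
    # k standard deviations from the median until the mask stabilises.
    import numpy as np

    def sigma_k_clip(t, y, k=3.5, max_iter=10):
        t, y = np.asarray(t), np.asarray(y)
        mask = np.ones(y.size, dtype=bool)
        for _ in range(max_iter):
            centre, spread = np.median(y[mask]), np.std(y[mask])
            new_mask = mask & (np.abs(y - centre) < k * spread)
            if np.array_equal(new_mask, mask):
                break
            mask = new_mask
        return t[mask], y[mask]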

The cleaned light curves generate a set of 109 features of which 35 are variability indices not requiring a candidate period for their computation. The full set of features is used to select a set of 859 variable light curves of 12 classes to train a machine learning variable star classifier. Of the classifiers tested, the Random Forest performed the best with the Support Vector Machine using a radial basis function kernel slightly behind. The neural networks had the worst performance. This agrees with the results produced from a similar set of features on OGLE and Hipparcos light curves, although with a smaller number of classes and worse performance due to the much lower quality of the SkycamT data compared to these two surveys (Richards et al., 2011b; McWhirter et al., 2016). The low performance of the neural networks is likely a result of the feature vectors not being rescaled to reflect a normal distribution, in both Richards et al. and our implementation. The training is recomputed with rescaled features and the neural network performance improved significantly with regards to the AUC measure, although the F1 scores still show that these models had the worst classification performance. This discrepancy suggests that the probability vectors of the neural networks could use calibration. A model is also trained on the light curves with aliased estimated periods in an attempt to determine confident classifications from the poorer quality data although the performance was extremely poor. The Random Forest model was selected as the final pipeline model with hyperparameters of ntree = 500, mtry = 40 and nodesize = 75 with a mean F1 score of 0.533 ± 0.070. Analysing the importance of each of the features in the Random Forest models indicates that the dominant variable star discriminators are period, colour, amplitude and skewness. These results agree with the models trained on OGLE and Hipparcos (Richards et al., 2011b). It is exciting to see that the amplitude and skewness measures selected by the machine learning classifiers are from the novel set of PolyFit features proposed in this thesis.
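The quoted hyperparameters follow the R randomForest naming; an approximately equivalent scikit-learn configuration, where the mapping of nodesize to min_samples_leaf is an assumption of this sketch rather than an exact translation, would be:

    # Sketch of the final classification model in scikit-learn terms.
    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(n_estimators=500,     # ntree = 500
                                 max_features=40,      # mtry = 40
                                 min_samples_leaf=75)  # nodesize = 75 (assumed mapping)
    # clf.fit(features, labels); clf.feature_importances_ then ranks the
    # discriminators (period, colour, amplitude and skewness in the thesis).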

The confident variable star classifications are used to train a variability detection model using the Random Forest classifier on a set of randomly selected non-variable candidates. This was performed for a dataset consisting of both aliased and matched period light curves and a training set of just the matched period light curves. The model with the aliased light curves has poorer performance which, like the aliased variable star model performance, shows that the aliased objects likely include many objects at the sidereal day spurious period. As a result, both the aliased variable star classifier and the full dataset variability detection model are discarded. There is a risk of unwittingly selecting variable candidates in the randomly selected non-variable dataset. To compensate, the non-variable training dataset is randomly selected a number of times and the family of variability detection models trained by the method acts as an aggregated Random Forest. As the Random Forest is a collection of weak-classifier decision trees, adding additional Random Forests is just a linear combination of these models. The final variability detection model has hyperparameters of ntree = 1500, mtry = 4 and nodesize = 25 with a mean F1 score of 0.799 ± 0.007. Using this model, with a probability cut determined using the best performing sensitivity and specificity of the variable star class, 103,790 variable candidates are selected. This is 17.6% of the full dataset, slightly higher than the 10% estimate of the Northern Sky Variability Survey (NSVS), most likely due to the increased systematics in the Skycam light curves producing ‘phantom’ variability (Shin et al., 2012). Half of these candidates (51,129 sources) have had a full set of period and PolyFit features computed and have been classified using the variable star classification model. The probabilities have been calibrated to the posterior probability of the classes and probability cuts established from the Index of Union of the training data. The bright variable source catalogue will be made available for scientific follow-up.
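A sketch of this aggregated detection model, with illustrative function names, the hyperparameter mapping assumed as before and the quoted ntree, mtry and nodesize values, is:

    # Sketch: several Random Forests, each trained against a different random
    # non-variable subset, with their probability votes averaged.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_detection_ensemble(X_var, X_pool, n_models=5, seed=0):
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.choice(len(X_pool), size=len(X_var), replace=False)
            X = np.vstack([X_var, X_pool[idx]])
            y = np.r_[np.ones(len(X_var)), np.zeros(len(idx))]
            m = RandomForestClassifier(n_estimators=1500, max_features=4,
                                       min_samples_leaf=25)
            models.append(m.fit(X, y))
        return models

    def predict_variable(models, X):
        # Linear combination of the individual forests' probability votes.
        return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)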

8.2 Future Work

In the processing of the light curves, there have been a number of systematics which have resulted in poor performance in the reliable period estimation of variable stars and the detection of variable light curves. The trend removal method is used to remove a subset of these systematics in the form of yearly sinusoidal variability in a large set of the SkycamT light curves and was reflected in the elimination of the overabundance of light curves with yearly periods as shown in figure 7.2. A close inspection of this histogram reveals that there is a second overabundance near 30 days. This is a result of the lunar month causing sky background magnitude variations over a 29.46 day period. It would be advantageous to expand the trend removal method to remove this overabundance and therefore reduce the false positives due to lunar-induced light curve variability. There are other systematics generating non-astrophysical variability in the light curves as evidenced by the overabundance of variable sources in the Skycam data. The trend removal method makes use of a correlation test on interpolated harmonic models to determine if a light curve should be prewhitened. This can occasionally fail due to a poor interpolated fit caused by seasonal gaps in the data; the method should therefore be improved to ignore the gaps by removing the interpolated trend template and test light curve data from these unsampled regions. The trend removal method identified a sky position dependence on the phase of the seasonal trend. It is possible there is also a magnitude dependence on the trend which should be investigated to further improve the accuracy of the light curve prewhitening, although this will result in the requirement of additional training light curves.

The systematics also have an undesirable performance cost in the period estimation of known variable stars for the training data. Figure 7.8 shows the base-10 log-log plot of the GRAPE estimated period against the catalogue period of this dataset of SkycamT light curves. As well as demonstrating the spurious lunar month results mentioned previously, there are other unusual structures in the plot such as a square-shaped region of heavy misclassifications between 0.1 days and 10 days on both axes. This is likely a result of the variable Skycam cadence generating undefined aliases with an undetermined cadence relationship. The analysis of the structure of these aliases would be extremely beneficial to correcting these artefacts and improving the purity of the variable light curve sample. More powerful machine learning methods might be able to model the systematics directly through training on a large set of Skycam light curves and treating the systematics like a background to the foreground variability due to variable stars. This may be possible using matrix decomposition and Principal Component Analysis techniques (Mahabal et al., 2017).

Some objects also exhibit a clear alias at periods unrelated to their true periods by an identifiable spurious period. A good example is the light curve of the variable star [0634-0892788] as shown in figure 8.1. GRAPE detects a period of 10.05 days for this object and the variable classification model predicts it is a classical cepheid with a 0.857 probability. The light curve exhibits a clear sawtooth shaped light curve with a period in the range of the cepheid variables. The only feature which indicates that it may not be a cepheid is the catalogue B-R colour, suggesting it is a red star, outside of the normal yellow for this class of star.

Figure 8.1: Folded light curve of the variable star [0634-0892788] folded at a GRAPE detected period of 10.05 days. The SkycamT classifier identifies this object as a classical cepheid with a sawtooth light curve although it is actually a small amplitude red giant, a class not present in the trained classification models.

Searching the All Sky Automated Survey (ASAS) data confirms this star is a small amplitude red giant, a class not trained in the classifier; as such, the classification appears reasonable, although at 10.05 days this would be the shortest period small amplitude red giant known. On closer inspection of the ASAS light curve, the red giant has a 27.7 day period, much more typical for this class of star. There is no indication in the ASAS light curve of the 10.05 day signal detected by GRAPE in the SkycamT light curve. Analysis of the SkycamT light curve shows that it does contain a 27.7 day periodicity as well, with a signal of similar strength to the 10.05 day signal. Using the alias equation 3.20 (Heinze et al., 2018), a search for the spurious period was conducted with no candidates found. It is likely many other light curves are exhibiting similar artefacts which can lead to incorrect conclusions, and the cause, such as a possible beat period with a systematic variation, must be identified. There is an additional airmass (brightness reduction due to atmospheric extinction) component in the image reduction data products utilised by this automated classification pipeline. A new reduction has been produced with improved computation of airmass corrections and a high priority task is to migrate the classification pipeline to operate on this new and improved dataset.
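A sketch of such a spurious period search, using the generic alias relation f_detected = |f_true ± m f_spurious| as a stand-in for equation 3.20 (the exact form used in the thesis may differ), is:

    # Sketch: solve the alias relation for the implied spurious period linking
    # a detected period to a known true period, for small integer multiples m.
    def implied_spurious_periods(p_true, p_detected, m_max=3):
        f_true, f_det = 1.0 / p_true, 1.0 / p_detected
        candidates = []
        for m in range(1, m_max + 1):
            for f in (abs(f_det - f_true), f_det + f_true):
                candidates.append((m, m / f))   # implied spurious period in days
        return candidates

    # e.g. implied_spurious_periods(27.7, 10.05) can then be compared against
    # known systematics such as 1.0 (diurnal) or 365.25 (yearly) day periods.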

Colour imputation is a poor method of generating colour information for the machine learning classifiers as they assume that the imputed data is of the same quality as a truly measured colour quantity. This flaw can only be solved by obtaining true colour information, either from improvements to the instruments or from superior source matching to the US Naval Observatory catalogue or an alternative catalogue. It would be advantageous to collect data in multiple bands using the Skycam instruments but this would come at extra cost and complexity and may not necessarily be easily applied to previously gathered data. An alternative solution is improving the cross-matching to the survey data. The cross-matching to the catalogues is performed using a rudimentary distance minimisation. Due to the pixel scale of the SkycamT camera, there may be multiple sources contributing light to the photometric data points in an identified source. In fact, there are cases in the database where a collection of sources, each with a small number of data points, has been generated because the statistics of the data points result in a rejection of the hypothesis that they are from the same source. There are also examples of a false positive in this hypothesis producing blended light curves, of which a number can be seen in figure 7.26 as they produce clear signatures in the variability index features. The presence of multiple catalogue objects in the same SkycamT pixels means that the distance minimisation method is not ideal as the dominant star may not be the closest one to the coordinates of the detected source. Richards et al. proposed a solution using a machine learning classifier to determine the best catalogue object for a detected source using the statistics of the source data (Richards et al., 2012). Using a number of features such as colour and magnitude to describe the detected source, a machine learning classifier is trained on a set of manually confirmed cross-matches to produce a model which predicts the probability of a catalogue object producing the detected source. If a catalogue object exists with greater than 0.5 probability of matching the detected source it is selected as the candidate (and if there are multiple, the highest is selected). If no catalogue object achieves this probability threshold, they are all rejected and the source is considered unmatched. Implementation of a similar method on the Skycam data will improve the light curves produced by the image reduction pipeline by reducing blended light curves and also ensure that the colour of the cross-matched catalogue object is appropriate for the training of the variable star classification models.

The GRAPE period estimation produces the best performance on the Skycam light curves but could benefit from some further development. The current clustering algorithm used to identify independent candidate periods is k-means clustering. This is the only clustering method which has been employed for this task and it is strongly recommended that additional methods be investigated, especially since k-means clustering is one of the simpler methods. This must be done with caution as a significant computational cost in the clustering operation will substantially increase the runtime of the GRAPE method. The initial GRAPE method had an additional component designed to filter out spurious periods using a Gaussian filter which modified the fitness function to disincentivise the genetic evolution from proceeding at these values. Unfortunately, it proved extremely difficult to reliably control with static parameters for the Gaussian function height and width. There is still potential in this approach and investigating a way to produce flexible Gaussian functions for individual light curves as required would be a potent addition to this method. Use of machine learning algorithms for this task is a strong possibility.
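The notch itself is simple to express; a sketch with static, illustrative height and width parameters (the quantities which proved difficult to control) is:

    # Sketch of the spurious-period filter: a Gaussian notch that suppresses
    # fitness near known spurious frequencies. A (height) and sigma (width)
    # are the static parameters discussed above; values here are illustrative.
    import numpy as np

    def notched_fitness(fitness, freq, spurious_freqs, A=0.9, sigma=1e-3):
        penalty = np.ones_like(freq)
        for f_s in spurious_freqs:
            penalty *= 1.0 - A * np.exp(-0.5 * ((freq - f_s) / sigma) ** 2)
        return fitness * penalty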

GRAPE is currently specifically designed to identify purely periodic phenomena. There are many quasi-periodic and semiregular variable objects in astronomy. These objects suffer a significance degradation in a method such as this as amplitude and phase changes cannot be mapped across the baseline. GRAPE uses only a single dimensional parameter space despite genetic algorithms being highly effective at the exploration of high dimensional spaces (Charbonneau, 1995; Rajpaul, 2012). This method can be extended using a multi-dimensional parameter space, which can stack different combinations of the data points into multiple period spaces simultaneously and use the genetic methods to optimise to a single answer across all the combinations. Alternatively, depending on the combinations of data points stacked by the algorithm, this single answer could instead be a set of results expressed as a function of amplitude and phase, allowing for quasi-periodic signals to be expressed in the output. This multi-dimensional approach can also be used to implement a multi-band light curve variant of GRAPE which can determine a set of candidate periods from their simultaneous performance at successfully fitting data in multiple bands. This is accomplished through a modification to the fitness function where the initial single-dimension statistic is weighted against the other dimensions as the chosen candidates should provide satisfactory fits in every band. The Vuong closeness test can also be modified to evaluate multi-dimensional hyperplanes constructed from multiple sinusoidal models in each dimensional band. This method is currently in development and is named ‘Bunch of GRAPES’. Expanding genetic algorithms into multiple period spaces has very interesting applications in upcoming next generation astronomical surveys.

This pipeline makes use of many features drawn from a variety of research projects as well as machine learning methods. There are still more features which have not been investigated, such as the slotted autocorrelation function and the quasi-stellar object features which have been shown to be of high importance in other classification models (Richards et al., 2011b). There are also some limitations with the features that have been employed in the pipeline. The Stetson-I, J, K and L features have been shown to be important in their current implementation, however they are sensitive to outliers and poorly sampled data (Shin et al., 2009, 2012). As these features are formulated from pairs of consecutive data points, there are a number of variants which limit the allowed pairings: by the magnitude difference between the data point pairs, named the Stetson-I, J, K and L clip features, and by the difference between the time instants of the data point pairs, named the Stetson-I, J, K and L time features. These features can be used to eliminate light curves which trigger a high or low response due to the distribution and noise of the light curves instead of the desired astrophysical signal. The optimal segmentation of the light curves can be determined using Bayesian Block Representations (Scargle et al., 2013). Another subset of variability features which demonstrates an unusual artefact is the percent amplitude and mean variance features. These features determine the variance of a light curve proportional to its mean magnitude. They begin to generate infinities as the mean magnitude approaches zero and can go negative for negative magnitudes. This is not a problem for the surveys used in the development of these methods as they have greater depth than STILT, meaning that bright objects near zero saturate the detector and are removed. This does create problems with SkycamT as it can photometrically observe objects up to -2 in the R band. A possible and easy to implement solution is to add a constant to the mean magnitude in the formulation of these features so their statistics act similarly to the surveys with sources up to ten magnitudes fainter than SkycamT sources.
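A sketch of this proposed fix for the mean-normalised features, using an illustrative offset of ten magnitudes and the coefficient-of-variation form of the mean variance feature, is:

    # Sketch: shifting the mean magnitude by a constant keeps the denominator
    # positive for the very bright SkycamT sources (mean magnitudes near or
    # below zero), so the statistic neither diverges nor flips sign.
    import numpy as np

    def mean_variance(mags, offset=10.0):
        shifted = np.asarray(mags) + offset   # behaves like a survey ~10 mag deeper
        return np.std(shifted) / np.mean(shifted)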

The representation learning features have been identified as highly important measures of variability, such as the PolyFit interpolation results. The value of machine learning techniques which learn the best features from a dataset for a given task, automatically and without bias, cannot be overstated. The interpolated PolyFit features are extremely useful for the classification of the variable stars by providing features which are more robust to systematics and noise. The variability indices recalculated on the interpolated light curve were considered of higher importance by the machine learning classifiers. Using this interpolation procedure, additional variability indices such as the Median Absolute Deviation (MAD) should be computed to replace the very noisy and correlated features. The image-representation approach did not produce a robust set of features, primarily due to the extremely limited machine learning classifiers employed in the research. The utilisation of more powerful machine learning techniques such as Convolutional Neural Networks is strongly recommended to better process image based data as translational and rotational invariance can be described by the convolutional layers. As well as more capable machine learning algorithms, the technique of encoding the light curves into an image representation is of equal importance to this task. The epoch-folded representation used here is limited as it must have a pre-computed period for the light curve, which limits the approach on poor quality data, much as the performance of the PolyFit interpolated features is strongly tied to the performance of the period estimation. Recent research has proposed an alternative approach of computing the magnitude and time instant differences of pairs of data points, somewhat reminiscent of the Stetson variability indices (Mahabal et al., 2017). Both Convolutional Neural Networks and Random Forest classifiers demonstrated good performance on these image representations. The application of this approach to the SkycamT light curves, combined with the additional component capable of isolating the systematic components, shows great potential and is strongly recommended. This thesis has heavily discussed the research areas of period estimation and machine learned classification individually. The intersection between these methods, through employing techniques such as this, can produce the next generation of period estimation algorithms which are computationally efficient and very robust to survey systematics whilst making no prior assumptions on the shape of any potential periodic variability.
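A sketch of this pairwise difference encoding, with illustrative bin counts and no attempt to reproduce the published binning of Mahabal et al. (2017), is:

    # Sketch of a dm-dt image: pairwise magnitude and time differences binned
    # into a 2D histogram, requiring no pre-computed period. Assumes the time
    # stamps t are sorted in ascending order.
    import numpy as np

    def dmdt_image(t, y, dm_bins=20, dt_bins=20):
        i, j = np.triu_indices(len(t), k=1)
        dt = t[j] - t[i]
        dm = y[j] - y[i]
        pos = dt > 0                          # guard against duplicate time stamps
        img, _, _ = np.histogram2d(dm[pos], np.log10(dt[pos]),
                                   bins=[dm_bins, dt_bins])
        return img / img.max()                # normalised image for a CNN or Random Forest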

The learned features from the application of the PCA algorithm to the training set of known variable light curves performed well and automatically learnt properties of the light curves such as skewness. The PCA method is quite simplistic given the breadth of available clustering and unsupervised learning algorithms. It is limited to linear interactions between the input interpolated magnitudes which restricts the complexity of patterns the method can establish. Methods such as t-distributed Stochastic Neighbor Embedding (t-SNE) offer the ability to propose a subset of dimensions where the data is projected with reduced dimensionality using non-linear relations (van der Maaten and Hinton, 2008). Further to this, the interactions can be modelled directly using deep learning architectures, such as non-linear PCA and deep Autoencoders, which are also capable of determining similar non-linear interactions (Baldi, 2011). By applying these more powerful techniques, the PolyFit model can better model the noisy light curves with a set of features which describe the important properties of the light curves required for classification.

The variable star classifier performed well considering the limitations of the data and was heavily tested with multiple classification methods. There is one major uninvestigated component of the models due to the assigned task. The model was trained to classify the variable light curves into a set of 12 classes. In chapter one, the variable star taxonomy is discussed and it follows a hierarchical classification structure. This hierarchical structure can be implemented into the variable star classification task through the computation of a superclass model, such as between pulsating variables, eclipsing binaries and rotational variables. The classifier can then focus on the features which distinguish these major groups of objects before further models are trained to specifically classify objects assigned to a given superclass into the appropriate subclasses. This has been employed in previous studies to achieve superior automated classification performance at the risk of ‘catastrophic’ misclassifications where an object is predicted as the wrong superclass and therefore cannot be classified into the correct subclass (Richards et al., 2011b). Many of the misclassifications are more minor, such as confusion between the eclipsing binary subtypes. There is also potential for the use of fuzzy logic based methods as the classification boundaries of the various variable star types are not hard limits and reflect the nature of these objects as a continuous distribution of astrophysical phenomena with a set of continuous properties. The capability of meta-classification models is also worth investigating. These models use a set of high performance models, named experts, each trained to perform extremely well at a limited classification task, with the meta-classification model guiding the selection of specific experts for the given classification task (Pichara et al., 2016).

The variability detection models also require additional investigation. The method applied here focused on the Random Forest classifier due to its strong performance on the variable star classifier. It is highly recommended that other machine learning methods be tested with this data as the minimisation of false positives, whilst maintaining good true positive performance, is a must to maintain the minimum computational effort for maximum reward from this pipeline. The investigation of the selection of a good training set indicated that the best performance was from the GRAPE period-matched variable light curves combined with multiple subsets of randomly selected non-variable candidates. The investigation of the rates of training set corruption due to the integration of variable candidates is an important task which was not fully explored in this thesis. Additionally, the best performance might be produced through the combination of the individual models produced by the different training sets. The multiple models can be treated as an ensemble method where the overall classifications are a linear combination of votes from the individual models, similar to how the Random Forest predicts using the votes from all the individual classification and regression trees. The implementation of an ensemble method for the variability detection models is strongly recommended for the future development of this pipeline.

The machine learned models trained and employed in this pipeline are extremely static. Upon the completion of the training phase they are no longer updated using additional data processed by the pipeline. This is a waste of the available resources and of the capability of machine learning to continue to learn and improve from new experiences. Self-training is the process of training a classification model, using it to classify a set of unknown data and then utilising the unknown data which has been confidently classified and verified as of sufficient quality to produce an updated model, adding this new knowledge to the pipeline. This can assist in the training of under-represented classes where the model may overfit the specific training examples, resulting in lower classification probabilities on objects of the same class which do not precisely match the training object feature vectors. The technique of Active Learning has also been successfully applied to variable star classifiers resulting in a drop of error rate of up to 20% (Richards et al., 2011a, 2012). In Active Learning the classifier will request further information on unknown objects which lie close to important decision boundaries. Through an expert manual classification of a minimal subset of the dataset, the overall automated classification performance can be substantially improved. This can also be implemented using a citizen science based method where the active learning objects can be uploaded to an online repository for manual classification by a large set of individuals. These classifications can then be used to update the training probability vector using manual confidences for the active learning examples, allowing the machine learning model to produce state-of-the-art results for automated classification. A similar technique was used in the training of Convolutional Neural Networks for galaxy classification using Galaxy Zoo with unprecedented performance capability (Dieleman et al., 2015). The continuous improvement of the machine learning models in this pipeline will improve the purity of the variable light curve sample and the accuracy of the final variable star classification produced in the final data products.

The class imbalance of the training sets for these classifiers required the use of an oversampling technique. This technique results in the machine learning classifiers correctly using the performance on every class to determine the optimal model. Oversampling is extremely simplistic and can result in training issues such as those mentioned above, with an overemphasis on specific feature vectors resulting in poorer performance on objects of the same class with small differences in their feature vectors. There are alternative methods for balancing a dataset such as the Synthetic Minority Over-sampling Technique (SMOTE) which uses imputation on the features of the minority class to generate a set of additional class examples which represent the underlying statistical distribution of the features given a particular class whilst producing unique feature vectors (Chawla et al., 2002). The unique feature vectors then prevent the machine learning classifiers from overfitting a given minority class feature vector whilst still learning a good representation of the class from the true training set objects. The employment of SMOTE or a similar technique may prevent some of the difficulties with classifying the minority classes such as the RV Tauri variables.
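Using the imbalanced-learn package, the rebalancing step reduces to a few lines; a sketch is:

    # Sketch of SMOTE-based rebalancing with imbalanced-learn as an
    # alternative to plain oversampling of minority classes such as RV Tauri.
    from imblearn.over_sampling import SMOTE

    def rebalance(features, labels, seed=0):
        smote = SMOTE(random_state=seed)
        X_res, y_res = smote.fit_resample(features, labels)
        return X_res, y_res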

Using the proximity measure computed by the Random Forest variable star classifier, an anomaly score has been implemented for the non-training set light curves. Using the classifications of a set of 51,129 light curves, the distribution of these anomaly scores has been determined. Whilst these anomaly scores are available in the data products, the anomaly score threshold which defines the outlier light curves is not established. Richards et al. employ a method using cross-validation to determine the misclassification error rate of their machine learning model for multiple anomaly score thresholds, treating each light curve with an anomaly score above the candidate outlier threshold as an error. This process is repeated until a threshold is determined which results in minimal performance decrease, indicating any light curve with an anomaly score higher than this value is often misclassified regardless (Richards et al., 2012). Implementation of a similar method in the Skycam automated pipeline would assist in the identification of interesting outlier light curves from potentially novel sources.

The final recommendation for future work on this automated pipeline is the deployment of front end software for scientific investigators and others to inspect the data products from the pipeline. Currently the results are output into a table of classifications and feature vectors which is not particularly user friendly. The deployment of a software interface will allow users to inspect the light curves and study the associated feature vectors by performing a variety of queries and sorting operations on the classification table. An example of the form of interface well suited to this project is the ASAS classification pipeline results from Richards et al. found online at http://www.bigmacc.info (Richards et al., 2012). Use of such an interface will allow active engagement in the STILT project, which has been primarily utilised by the Liverpool Telescope group thus far, by outside researchers and the general public. The advantages offered by studying the methodology used in the ASAS classification pipeline certainly benefitted the research in this thesis greatly.

Bibliography

Abs da Cruz, A. V. A., Vellasco, M. M. B. R., and Pacheco, M. A. C. (2007). Quantum-inspired evolutionary algorithm for numerical optimization. Hybrid Evolutionary Algorithms, Studies in Computational Intelligence, 75.

Aerts, C., Marchenko, S. V., Matthews, J. M., Kuschnig, R., Guenther, D. B., Moffat, A. F. J., Rucinski, S. M., Sasselov, D., Walker, G. A. H., and Weiss, W. W. (2006). Delta ceti is not monoperiodic: Seismic modeling of a beta cephei star from MOST space-based photometry. The Astrophysical Journal, 642(1):470–477.

Alcock, C., Allsman, R. A., Alves, D. R., Axelrod, T. S., Becker, A. C., Bennett, D. P., Cook, K. H., Dalal, N., Drake, A. J., and Freeman, K. C. (2000). The MACHO project: Microlensing results from 5.7 years of large magellanic cloud observations. The Astrophysical Journal, 542(1):281–307.

Almoznino, E., Loinger, F., and Brosch, N. (1993). A procedure for the calculation of background in images. Monthly Notices of the Royal Astronomical Society, 265(3):641.

Althaus, L. G., Córsico, A. H., Isern, J., and García-Berro, E. (2010). Evolutionary and pulsational properties of white dwarf stars. The Astronomy and Astrophysics Review, 18(4):471–566.

Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175–185.

Appenzeller, I. and Mundt, R. (1989). T tauri stars. The Astronomy and Astrophysics Review, 1:291–334.

Baldi, P. (2011). Autoencoders, unsupervised learning and deep architectures. UTLW'11 Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning workshop, 27:37–50.

Baluev, R. V. (2009). Detecting non-sinusoidal periodicities in observational data using multiharmonic periodograms. Monthly Notices of the Royal Astronomical Society, 395(3):1541–1548.


Baluev, R. V. (2012). Distinguishing between a true period and its alias, and other tasks of model discrimination. Monthly Notices of the Royal Astronomical Society, 422(3):2372–2385.

Barnard, J. and Meng, X. L. (1999). Applications of multiple imputation in medical studies: from AIDS to NHANES. Statistical Methods in Medical Research, 8(1):17–36.

Basterrech, S., Mohammed, S., Rubino, G., and Soliman, M. (2011). Levenberg–Marquardt training algorithms for random neural networks. The Computer Journal, 54(1):125–135.

Beard, S. M., MacGillivray, H. T., and Thanisch, P. F. (1990). The cosmos system for crowded-field analysis of digitized photographic plate scans. Monthly Notices of the Royal Astronomical Society, 247:311–321.

Bedding, T. R. and Zijlstra, A. A. (1998). Hipparcos period-luminosity relations for mira and semiregular variables. The Astrophysical Journal, 506(1).

Benavente, P., Protopapas, P., and Pichara, K. (2017). Automatic survey-invariant classification of variable stars. The Astrophysical Journal, 845(2):147.

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.

Bertin, E. and Arnouts, S. (1996). SExtractor: Software for source extraction. Astronomy and Astrophysics Supplement Series, 117(2):393–404.

Bhattacharyya, S., Richards, J. W., Rice, J., Starr, D. L., Butler, N. R., and Bloom, J. S. (2011). Identification of outliers through clustering and semi-supervised learning for all sky surveys. Lecture Notes in Statistics, Statistical Challenges in Modern Astronomy V, 902:483–485.

Bijaoui, A. (1980). Sky background estimation and application. Astronomy and Astrophysics, 84(1–2):81–84.

Blazhko, S. (1907). Mitteilung über veränderliche Sterne. Astronomische Nachrichten, 175:327.

Bloom, J. and Richards, J. (2011). Data mining and machine learning in time-domain discovery and classification. Advances in Machine Learning and Data Mining for Astronomy, Chapman & Hall/CRC Data Mining and Knowledge Discovery Series.

Bopp, B. W. and Stencel, R. E. (1981). The fk comae stars. Astrophysical Journal, Part 2 - Letters to the Editor, 247:L131–L134.

Boström, H. (2008). Calibrating random forests. Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and regression trees. Monterey Publishing.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78:1–3.

Brink, H., Richards, J. W., Poznanski, D., Bloom, J. S., Rice, J., Negahban, S., and Wainwright, M. (2013). Using machine learning for discovery in synoptic survey imaging data. Monthly Notices of the Royal Astronomical Society, 435(2):1047–1060.

Butler, N. R. and Bloom, J. S. (2011). Optimal time-series selection of quasars. The Astronomical Journal, 141(3):93.

Candès, E. and Romberg, J. (2005). l1-magic: Recovery of sparse signals via convex programming.

Cantu-Paz, E. (2000). Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers.

Carrier, F., Burki, G., and Burnet, M. (2002). Search for duplicity in periodic variable be stars. Astronomy and Astrophysics, 385(2):488–502.

Chan, K. G., Streichan, S. J., Trinh, L. A., and Liebling, M. (2016). Simultaneous temporal superresolution and denoising for cardiac fluorescence microscopy. IEEE Transactions on Computational Imaging, 2(3):348–358.

Charbonneau, P. (1995). Genetic algorithms in astronomy and astrophysics. The Astrophysical Journal Supplement Series, 101:309.

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.

Che, Z.-G., Chiang, T.-A., and Che, Z.-H. (2011). Feed-forward neural networks training: A comparison between genetic algorithm and back-propagation learning algorithm. International Journal of Innovative Computing Information and Control, 7(10):5839–5843.

Clarke, D. (2002). String/rope length methods using the lafler-kinman statistic. Astronomy & Astrophysics, 386(2):763–774.

Clayton, G. C. (1996). The r coronae borealis stars. Publications of the Astronomical Society of the Pacific, 108(721).

Cohen, R. E. and Sarajedini, A. (2012). Sx phoenicis period-luminosity relations and the blue straggler connection. Monthly Notices of the Royal Astronomical Society, 419(1):342–357.

Copperwheat, C. M., Steele, I. A., Piascik, A. S., Bersier, D., Bode, M. F., Collins, C. A., Darnley, M. J., Galloway, D. K., Gomboc, A., and Kobayashi, S. (2016). Liverpool telescope follow-up of candidate electromagnetic counterparts during the first run of advanced ligo. Monthly Notices of the Royal Astronomical Society, 462(4):3528–3536.

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.

Cox, A. N. (2003). A pulsation mechanism for gw virginis variables. The Astrophysical Journal, 585(2):975–982.

Darnley, M. J., Ribeiro, V. A. R. M., Bode, M. F., Hounsell, R. A., and Williams, R. P. (2012). On the progenitors of galactic novae. The Astrophysical Journal, 746(1).

Deb, S. and Singh, H. P. (2009). Light curve analysis of variable stars using fourier decomposition and principal component analysis. Astronomy & Astrophysics, 507(3):1729–1737.

Debosscher, J., Sarro, L. M., Aerts, C., Cuypers, J., Vandenbussche, B., Garrido, R., and Solano, E. (2007). Automated supervised classification of variable stars. Astronomy & Astrophysics, 475(3):1159–1183.

Dieleman, S., Willett, K. W., and Dambre, J. (2015). Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2):1441–1459.

D'Isanto, A., Cavuoti, S., Brescia, M., Donalek, C., Longo, G., Riccio, G., and Djorgovski, S. G. (2016). An analysis of feature relevance in the classification of astronomical transients with machine learning methods. Monthly Notices of the Royal Astronomical Society, 457(3):3119–3132.

Domínguez, A. B., Chini, R., Nuñez, F. P., Haas, M., Hackstein, M., Drass, H., Lemke, R., and Murphy, M. (2013). Eclipsing high-mass binaries. Astronomy & Astrophysics, 557.

Donalek, C., Djorgovski, S. G., Mahabal, A. A., Graham, M. J., Drake, A. J., Kumar, A. A., Philip, N. S., Fuchs, T. J., Turmon, M. J., and Yang, M. T.-C. (2013). Feature selection strategies for classifying high dimensional astronomical data sets. 2013 IEEE International Conference on Big Data.

Dubath, P., Lecoeur-Taïbi, I., Rimoldini, L., Süveges, M., Blomme, J., López, M., Sarro, L. M., De Ridder, J., Cuypers, J., and Guy, L. (2012). Hipparcos variable star detection and classification efficiency. Astrostatistics and Data Mining, pages 117–125.

Dubath, P., Rimoldini, L., Süveges, M., Blomme, J., López, M., Sarro, L. M., De Ridder, J., Cuypers, J., Guy, L., Lecoeur, I., Nienartowicz, K., Jan, A., Beck, M., Mowlavi, N., De Cat, P., Lebzelter, T., and Eyer, L. (2011). Random forest automated supervised classification of hipparcos periodic variable stars. Monthly Notices of the Royal Astronomical Society, 414(3):2602–2617.

Dufour, P., Liebert, J., Fontaine, G., and Behara, N. (2007). White dwarf stars with carbon atmospheres. Nature, 450:522–524.

Dworetsky, M. M. (1983). A period-finding method for sparse randomly spaced observations or how long is a piece of string? Monthly Notices of the Royal Astronomical Society, 203(4):917–924.

Edelson, R. A. and Krolik, J. H. (1988). The discrete correlation function - a new method for analyzing unevenly sampled variability data. The Astrophysical Journal, 333:646–659.

Elzo, C., Estevez, P. A., and Kozma, R. (2016). Modeling quasi-periodic lightcurves using neuropercolation. 2016 International Joint Conference on Neural Networks (IJCNN).

Eyer, L. and Mowlavi, N. (2008). Variable stars across the observational hr diagram. Journal of Physics: Conference Series, 118:012010.

Falomo, R., Pian, E., and Treves, A. (2014). An optical view of bl lacertae objects. The Astronomy and Astrophysics Review, 22:73.

Fine, T. L. (1999). Feedforward Neural Network Methodology. Springer-Verlag.

Flewelling, H. A., Magnier, E. A., Chambers, K. C., Heasley, J. N., et al. (2016). The Pan-STARRS1 database and data products. arXiv:1612.05243v2.

Fourier, J. B. J. (1878). The Analytical Theory of Heat. Cambridge University Press.

Frescura, F. A. M., Engelbrecht, C. A., and Frank, B. S. (2008). Significance of periodogram peaks and a pulsation mode analysis of the beta cephei star v403 car. Monthly Notices of the Royal Astronomical Society, 388(4):1693–1707.

Gautschy, A. and Saio, H. (1996). Stellar pulsations across the hr diagram: Part II. Annual Review of Astronomy and Astrophysics, 34:551–606.

Geier, S. (2015). Why do hot subdwarf stars pulsate? Astronomy in Focus, 1.

George, S. V., Ambika, G., and Misra, R. (2015). Effect of data gaps on correlation dimension computed from light curves of variable stars. Astrophysics and Space Science, 360(1).

Giridhar, S., Lambert, D. L., and Gonzalez, G. (1998). The chemical compositions of the srd variable stars. I. XY Aquarii, RX Cephei, AB Leonis, and SV Ursae Majoris. Publications of the Astronomical Society of the Pacific, 110:671–675.

Glass, I. S. and Evans, T. L. (1981). A period-luminosity relation for mira variables in the large magellanic cloud. Nature, 291(5813):303–304.

Graham, M. J., Drake, A. J., Djorgovski, S. G., Mahabal, A. A., and Donalek, C. (2013a). Using conditional entropy to identify periodicity. Monthly Notices of the Royal Astronomical Society, 434(3):2629–2635.

Graham, M. J., Drake, A. J., Djorgovski, S. G., Mahabal, A. A., Donalek, C., Duan, V., and Maker, A. (2013b). A comparison of period finding algorithms. Monthly Notices of the Royal Astronomical Society, 434(4):3423–3444.

Guo, Z., Gies, D. R., and Matson, R. A. (2016). Kepler eclipsing binaries with delta scuti/gamma doradus pulsating components I: KIC 9851944. The Astrophysical Journal, 826.

Hall, D. S. (1981). The rs canum venaticorum binaries. Solar Phenomena in Stars and Stellar Systems, pages 431–447.

Handler, G., Prinja, R. K., Antoci, V., and Urbaneja, M. A. (2013). The ZZ Leporis Stars: Wind-variable central stars of young planetary nebulae. 18th European White Dwarf Workshop.

He, S., Yuan, W., Huang, J. Z., Long, J., and Macri, L. M. (2016). Period estimation for sparsely sampled quasi-periodic light curves applied to miras. The Astronomical Journal, 152(6):164.

Heck, A., Manfroid, J., and Mersch, G. (1985). On period determination methods. Astronomy & Astrophysics Supplement Series, 59:63–72.

Heger, A., Fryer, C. L., Woosley, S. E., Langer, N., and Hartmann, D. H. (2003). How massive single stars end their life. The Astrophysical Journal, 591(1).

Heinze, A. N., Tonry, J. L., Denneau, L., Flewelling, H., Stadler, B., Rest, A., Smith, K. W., Smartt, S. J., and Weiland, H. (2018). A first catalog of variable stars measured by the asteroid terrestrial-impact last alert system (ATLAS). arXiv:1804.0213v1.

Hoard, D. W., Ladjal, D., Stencel, R. E., and Howell, S. B. (2012). The invisible monster has two faces: Observations of epsilon aurigae with the herschel space observatory. Astrophysical Journal Letters, 748(2).

Hoffman, D. I., Harrison, T. E., Coughlin, J. L., McNamara, B. J., Holtzman, J. A., Taylor, G. E., and Vestrand, W. T. (2008). New β lyrae and algol candidates from the northern sky variability survey. The Astronomical Journal, 136(3):1067–1078.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. The University of Michigan Press.

Horne, J. H. and Baliunas, S. L. (1986). A prescription for the analysis of unevenly sampled time series. The Astrophysical Journal, 302:757–763.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417–441 & 498–520.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28:321–377.

Huijse, P., Estevez, P. A., Protopapas, P., Principe, J. C., and Zegers, P. (2014). Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Computational Intelligence Magazine, 9(3):27–39.

Huijse, P., Estevez, P. A., Protopapas, P., Zegers, P., and Principe, J. C. (2012). An information theoretic algorithm for finding periodicities in stellar light curves. IEEE Transactions on Signal Processing, 60(10):5135–5145.

Ivanova, N., Justham, S., Chen, X., et al. (2013). Common envelope evolution: where we stand and how we can move forward. The Astronomy and Astrophysics Review, 21:59.

Ivezić, Ž., Tyson, J. A., Abel, B., et al. (2014). LSST from science drivers to reference design and anticipated data products. arXiv:0805.2366v4.

Jain, L. C. and Medsker, L. R. (1999). Recurrent Neural Networks: Design and Applications. CRC Press.

Jeffery, C. S., Starling, R. L. C., Hill, P. H., and Pollacco, D. (2001). Cyclic and secular variation in the temperatures and radii of extreme helium stars. Monthly Notices of the Royal Astronomical Society, 321(1):111–130.

Jenkins, G. M. and Watts, D. G. (1968). Spectral analysis and its applications. Holden-Day.

Johnston, K. B. and Peter, A. M. (2017). Variable star signature classification using slotted symbolic markov modeling. New Astronomy, 50:1–11.

Kaiser, N. (2002). Pan-STARRS: A large synoptic survey telescope array. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, 4836:154–164.

Kilkenny, D., O'Donoghue, D., Koen, C., Lynas-Gray, A. E., and van Wyk, F. (1998). The ec 14026 stars - VIII. PG 1336-018: a pulsating sdB star in an HW Vir-type eclipsing binary. Monthly Notices of the Royal Astronomical Society, 296(2):329–338.

Kim, A. G. and Miquel, R. (2007). Measuring type Ia supernova distances and redshifts from their multi-band light curves. Astroparticle Physics, 28.

Kim, D.-W. and Bailer-Jones, C. A. L. (2016). A package for the automated classification of periodic variable stars. Astronomy & Astrophysics, 587.

Kim, D.-W., Protopapas, P., Byun, Y.-I., Alcock, C., Khardon, R., and Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from macho large magellanic cloud database. The Astrophysical Journal, 735(2):68.

Kochoska, A., Mowlavi, N., Prsa, A., Lecoeur-Taïbi, I., Holl, B., Rimoldini, L., Süveges, M., and Eyer, L. (2017). Gaia eclipsing binary and multiple systems. a study of detectability and classification of eclipsing binaries with gaia. Astronomy & Astrophysics, 602.

Koester, D. and Chanmugam, G. (1990). Physics of white dwarf stars. Reports on Progress in Physics, 53(7).

Kolenberg, K., Bryson, S., Szabo, R., Kurtz, D. W., Smolec, R., Nemec, J. M., Guggenberger, E., Moskalik, P., Benko, J. M., and Chadid, M. (2010). Kepler photometry of the prototypical blazhko star RR Lyr: an old friend seen in a new light. Monthly Notices of the Royal Astronomical Society, 411(2):878–890.

Korhonen, H., Berdyugina, S. V., and Tuominen, I. (2005). Surface differential rotation on fk com. Proceedings of the 13th Cambridge Workshop on Cool Stars, Stellar Systems and the Sun, page 719.

Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31(3):249–268.

Kovacs, G., Bakos, G., and Noyes, R. W. (2005). A trend filtering algorithm for wide field variability surveys. Monthly Notices of the Royal Astronomical Society, 356:557–567.

Krisciunas, K. (1993). A new class of pulsating stars. American Astronomical Society, 183rd AAS Meeting, 25:1422.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25.

Kugler, S. D., Gianniotis, N., and Polsterer, K. L. (2016). An explorative approach for inspecting Kepler data. Monthly Notices of the Royal Astronomical Society, 455(4):4399–4405.

Kurtz, D. W. (1982). Rapidly oscillating ap stars. Monthly Notices of the Royal Astro- nomical Society, 200(3):807–859.

LaCourse, D. M., Jek, K. J., Jacobs, T. L., Winarski, T., Boyajian, T. S., Rappaport, S., Sanchis-Ojeda, R., Conroy, K. E., Nelson, L., Barclay, T., Fischer, D. A., Schmitt, J. R., Wang, J., Stassun, K. G., Pepper, J., Coughlin, J. L., Shporer, A., and Prsa, A. (2015). Kepler eclipsing binary stars - VI. identification of eclipsing binaries in the K2 campaign 0 data set. Monthly Notices of the Royal Astronomical Society, 452(4):3561–3592.

Lafler, J. and Kinman, T. D. (1965). An RR Lyrae star survey with the Lick 20-inch astrograph II. the calculation of RR Lyrae periods by electronic computer. The Astrophysical Journal Supplement Series, 11:216.

Lang, D., Hogg, D. W., Mierle, K., Blanton, M., and Roweis, S. (2010). Astrometry.net: Blind astrometric calibration of arbitrary astronomical images. The Astronomical Journal, 139(5):1782–1800.

Larson, S. (2003). The CSS and SSS NEO surveys. AAS/Division for Planetary Sciences Meeting Abstracts, 35:982.

Larsson, S. (1996). Parameter estimation in epoch folding analysis. Astronomy and Astrophysics Supplement Series, 117:197–201.

Layden, A., Anderson, T., and Husband, P. (2013). Colors of c-type rr lyrae stars and interstellar reddening. In 40 Years of Variable Stars: A Celebration of Contributions by Horace A. Smith.

Leroy, B. (2012). Fast calculation of the lomb-scargle periodogram using nonequispaced fast fourier transforms. Astronomy & Astrophysics, 545.

Liu, W., Pokharel, P. P., and Principe, J. C. (2006). Correntropy: A localized similarity measure. The 2006 IEEE International Joint Conference on Neural Network Proceedings.

Lomb, N. R. (1976). Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science, 39(2):447–462.

Long, J. P., Chi, E. C., and Baraniuk, R. G. (2016). Estimating a common period for a set of irregularly sampled functions with applications to periodic variable star data. The Annals of Applied Statistics, 10(1):165–197.

Long, J. P., Karoui, N. E., Rice, J. A., Richards, J. W., and Bloom, J. S. (2012). Optimizing automated classification of variable stars in new synoptic surveys. Publications of the Astronomical Society of the Pacific, 124(913):280–295.

LSST Science Collaborations, Abell, P. A., Allison, J., et al. (2009). LSST science book, version 2.0. arXiv:0912.0201v1.

Lyne, A. G. and Graham-Smith, F. (2006). Pulsar Astronomy. Cambridge University Press.

Mackenzie, C., Pichara, K., and Protopapas, P. (2016). Clustering-based feature learning on variable stars. The Astrophysical Journal, 820(2):138.

Madore, B. F., Rigby, J., Freedman, W. L., Persson, S. E., Sturch, L., and Mager, V. (2009). The cepheid period-luminosity relation (the leavitt law) at mid-infrared wavelengths. III. cepheids in ngc 6822. The Astrophysical Journal, 693(1).

Mahabal, A., Sheth, K., Gieseke, F., Pai, A., Djorgovski, S. G., Drake, A., and Graham, M. (2017). Deep-learnt classification of light curves. 2017 IEEE Symposium Series on Computational Intelligence (SSCI).

Malkov, O. Y., Oblak, E., Avvakumova, E. A., and Torra, J. (2007). A procedure for the classification of eclipsing binaries. Astronomy and Astrophysics, 465(2):549–556.

Matijevič, G. (2012). Kepler eclipsing binary stars. III. classification of Kepler eclipsing binary light curves with locally linear embedding. The Astronomical Journal, 143:123–128.

Mawson, N. R., Steele, I. A., and Smith, R. J. (2013). STILT: System design and performance. Astronomische Nachrichten, 334(7):729–737.

McWhirter, P. R., Steele, I. A., Al-Jumeily, D., Hussain, A., and Vellasco, M. M. B. R. (2017). The classification of periodic light curves from non-survey optimized observational data through automated extraction of phase-based visual features. 2017 International Joint Conference on Neural Networks (IJCNN).

McWhirter, P. R., Wright, S., Steele, I. A., Al-Jumeily, D., Hussain, A., and Fergus, P. (2016). A dynamic, modular intelligent-agent framework for astronomical light curve analysis and classification. Intelligent Computing Theories and Application Lecture Notes in Computer Science, pages 820–831.

Miglio, A., Montalban, J., and Dupret, M.-A. (2007). Revised instability domains of spb and beta cephei stars. Communications in Asteroseismology, 151.

Mitchell, T. (1998). Machine Learning. McGraw Hill.

Morris, S. L. (1985). The ellipsoidal variable stars. The Astrophysical Journal, 295:143–152.

Mortier, A. and Cameron, A. C. (2017). Stacked bayesian general lomb-scargle periodogram: Identifying stellar activity signals. Astronomy & Astrophysics, 601.

Mortier, A., Faria, J. P., Correia, C. M., Santerne, A., and Santos, N. C. (2015). BGLS: A bayesian formalism for the generalised lomb-scargle periodogram. Astronomy & Astrophysics, 573.

Naul, B., Bloom, J. S., Pérez, F., and van der Walt, S. (2018). A recurrent neural network for classification of unevenly sampled variable stars. Nature Astronomy, 2:151–155.

Neff, J. E., Wells, M. A., Geltz, S. N., and A., B. (2014). Automated variability classifi- cation and constant stars in the Kepler database. 18th Cambridge Workshop on Cool Stars, Stellar Systems, and the Sun.

Newell, E. B. (1983). Crowded-field stellar photometry. Proceedings of the Workshop on Astronomical Measuring Machines.

Nun, I., Pichara, K., Protopapas, P., and Kim, D.-W. (2014). Supervised detection of anomalous light-curves in massive astronomical catalogs. The Astrophysical Journal, 793(1).

Nun, I., Protopapas, P., et al. (2015). FATS: Feature analysis for time series. arXiv:1506.00010v2.

Olivier, E. A. and Wood, P. R. (2003). On the origin of long secondary periods in semiregular variables. The Astrophysical Journal, 584:1035–1041.

Paegert, M., Stassun, K. G., and Burger, D. M. (2014). The eb factory project. I. a fast, neural-net-based, general purpose light curve classifier optimized for eclipsing binaries. The Astronomical Journal, 148(2):31.

Parvizi, M., Paegert, M., and Stassun, K. G. (2014). The eb factory project. II. validation with the Kepler field in preparation for K2 and TESS. The Astronomical Journal, 148(6):125.

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11):559–572.

Pearson, K. A., Palafox, L., and Griffith, C. A. (2018). Searching for exoplanets using artificial intelligence. Monthly Notices of the Royal Astronomical Society, 474(1):478–491.

Percy, J. R. (2008). Variable stars. Cambridge University Press.

Pichara, K., Protopapas, P., and León, D. (2016). Meta-classification for variable stars. The Astrophysical Journal, 819(1):18.

Pietrukowicz, P. (2018). On the properties of blue large-amplitude pulsators. no blaps in the magellanic clouds. The RR Lyrae 2017 Conference. Revival of the Classical Pulsators: from Galactic Structure to Stellar Interior Diagnostics.

Pietrukowicz, P., Dziembowski, W. A., Latour, M., Angeloni, R., Poleski, R., di Mille, F., Soszynski, I., Udalski, A., Szymanski, M. K., Wyrzykowski, L., Kozlowski, S., Skowron, J., Skowron, D., Mroz, P., Pawlak, M., and Ulaczyk, K. (2017). Blue large-amplitude pulsators as a new class of variable stars. Nature Astronomy, 1(0166).

Pigulski, A., Kolaczkowski, Z., Ramza, T., and Narwid, A. (2005). High-amplitude delta scuti stars in the galactic bulge from the OGLE-II and MACHO data. Proceedings of the ”Stellar Pulsation and Evolution” Conference.

Plavchan, P., Jura, M., Kirkpatrick, J. D., Cutri, R. M., and Gallagher, S. C. (2008). Near-infrared variability in the 2MASS calibration fields: A search for planetary transit candidates. The Astrophysical Journal Supplement Series, 175(1):191–228.

Poleski, R., Soszyński, I., Udalski, A., Szymański, M. K., Kubiak, M., Pietrzyński, G., Wyrzykowski, L., Szewczyk, O., and Ulaczyk, K. (2010). The optical gravitational lensing experiment. the OGLE-III catalog of variable stars. VI. delta scuti stars in the large magellanic cloud. Acta Astronomica, 60(1):1–16.

Pospíchal, P., Jaroš, J., and Schwarz, J. (2010). Parallel genetic algorithm on the cuda architecture. Applications of Evolutionary Computation, Lecture Notes in Computer Science, 6024:442–451.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1994). Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press.

Protopapas, P., Giammarco, J. M., Faccioli, L., Struble, M. F., Dave, R., and Alcock, C. (2006). Finding outlier light curves in catalogues of periodic variable stars. Monthly Notices of the Royal Astronomical Society, 369(2):677–696.

Protopapas, P., Huijse, P., Estevez, P. A., Zegers, P., Principe, J. C., and Marquette, J.-B. (2015). A novel, fully automated pipeline for period estimation in the eros 2 data set. The Astrophysical Journal Supplement Series, 216(2):25.

Prsa, A., Guinan, E. F., Devinney, E. J., Degeorge, M., Bradstreet, D. H., Giammarco, J. M., Alcock, C. R., and Engle, S. G. (2008). Artificial intelligence approach to the determination of physical properties of eclipsing binaries. i. the ebai project. The Astrophysical Journal, 687(1):542–565.

Prsa, A., Pepper, J., and Stassun, K. G. (2011). Expected large synoptic survey telescope (LSST) yield of eclipsing binary stars. The Astronomical Journal, 142(2):52.

Rahal, Y. R., Afonso, C., Albert, J.-N., Andersen, J., Ansari, R., Aubourg, É., Bareyre, P., Beaulieu, J.-P., Charlot, X., and Couchot, F. (2009). The EROS2 search for microlensing events towards the spiral arms: the complete seven season results. Astronomy & Astrophysics, 500(3):1027–1044.

Rajpaul, V. (2012). Genetic algorithms in astronomy and astrophysics. Proceedings of SAIP2011, the 56th Annual Conference of the South African Institute of Physics, pages 519–524.

Razali, N. and Wah, Y. B. (2011). Power comparisons of shapiro-wilk, kolmogorov-smirnov, lilliefors and anderson-darling tests. Journal of Statistical Modeling and Analytics, 2(1):21–33.

Rebbapragada, U., Protopapas, P., Brodley, C. E., and Alcock, C. (2009). Anomaly detection in catalogs of periodic variable stars. Astronomical Data Analysis Software and Systems XVIII ASP Conference Series, 411:264.

Reed, M. D., Green, E. M., Callerame, K., Seitenzahl, I. R., White, B. A., Hyde, E. A., Giovanni, M. K., Østensen, R., Bronowska, A., Jeffery, E. J., Cordes, O., Falter, S., Edelmann, H., Dreizler, S., and Schuh, S. L. (2004). Discovery of gravity-mode pulsators among Subdwarf B stars: PG 1716+426, the class prototype. The Astrophysical Journal, 607:445–450.

Reipurth, B. (1990). FU Orionis eruptions and early stellar evolution. In: Flare Stars in Star Clusters, Associations and the Solar Vicinity, pages 229–251.

Richards, J. W., Starr, D. L., Brink, H., Miller, A. A., Bloom, J. S., Butler, N. R., James, J. B., Long, J. P., and Rice, J. (2011a). Active learning to overcome sample selection bias: Application to photometric variable star classification. The Astrophysical Journal, 744(2):192.

Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., Higgins, J., Kennedy, R., and Rischard, M. (2011b). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1):10.

Richards, J. W., Starr, D. L., Miller, A. A., Bloom, J. S., Butler, N. R., Brink, H., and Crellin-Quick, A. (2012). Construction of a calibrated probabilistic classification catalog: Application to 50k variable sources in the all-sky automated survey. The Astrophysical Journal Supplement Series, 203(2):32.

Romero, A. D., Córsico, A. H., Althaus, L. G., Pelisoli, I., and Kepler, S. O. (2018). On the evolutionary status and pulsations of the recently discovered blue large-amplitude pulsators (blaps). Monthly Notices of the Royal Astronomical Society: Letters, 477(1):L30–L34.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408.

Rosino, L. (1951). The spectra of variables of the RV Tauri and yellow semiregular types. The Astrophysical Journal, 113:60.

Ruf, T. (1999). The lomb-scargle periodogram in biological rhythm research: Analysis of incomplete and unequally spaced time-series. Biological Rhythm Research, 30(2):178–201.

Russell, S. and Norvig, P. (2009). Artificial Intelligence - A Modern Approach 3rd ed. Prentice Hall.

Saha, A. and Vivas, A. K. (2017). A hybrid algorithm for period analysis from multi-band data with sparse and irregular sampling for arbitrary light-curve shapes. The Astronomical Journal, 154(6).

Saio, H. (1993). An overview of stellar pulsation theory. Astrophysics and Space Science, 210(1–2):61–72.

Samuel, A. L. (1988). Some studies in machine learning using the game of checkers. I. Computer Games I, pages 335–365.

Scargle, J. D. (1982). Studies in astronomical time series analysis. ii - statistical aspects of spectral analysis of unevenly spaced data. The Astrophysical Journal, 263:835.

Scargle, J. D., Norris, J. P., Jackson, B., and Chiang, J. (2013). Studies in astronomical time series analysis. vi. bayesian block representations. The Astrophysical Journal, 764(2):167.

Schaefer, B. E., King, J. R., and Deliyannis, C. P. (2000). Superflares on ordinary solar-type stars. The Astrophysical Journal, 529(2).

Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., and Platt, J. (2000). Support vector method for novelty detection. Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, pages 582–588.

Schuster, F. A. F. (1898). On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terrestrial Magnetism, 3(1):13–41.

Schwarzenberg-Czerny, A. (1989). On the advantage of using analysis of variance for period search. Monthly Notices of the Royal Astronomical Society, 241(2):153–165.

Schwarzenberg-Czerny, A. (1996). Fast and statistically optimal period search in uneven sampled observations. The Astrophysical Journal, 460(2).

Shallue, C. J. and Vanderburg, A. (2018). Identifying exoplanets with deep learning: A five-planet resonant chain around kepler-80 and an eighth planet around kepler-90. The Astronomical Journal, 155(2):94.

Shannon, C. E. (1949). Communication in the presence of noise. Proceedings of the IRE, 37(1):10–21.

Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.

Shin, M.-S., Sekora, M., and Byun, Y.-I. (2009). Detecting variability in massive astronomical time series data I. application of an infinite gaussian mixture model. Monthly Notices of the Royal Astronomical Society, 400(4):1897–1910.

Shin, M.-S., Yi, H., Kim, D.-W., Chang, S.-W., and Byun, Y.-I. (2012). Detecting variability in massive astronomical time-series data. II. variable candidates in the northern sky variability survey. The Astronomical Journal, 143(3).

Slettebak, A. (1982). Spectral types and rotational velocities of the brighter Be stars and A-F type shell stars. Astrophysical Journal Supplement Series, 50:55–83.

Smith, H. A. (2004). RR Lyrae Stars. Cambridge University Press.

Sokolova, M., Japkowicz, N., and Szpakowicz, S. (2006). Beyond accuracy, f-score and roc: A family of discriminant measures for performance evaluation. Advances in Artificial Intelligence (Lecture Notes in Computer Science), 4304.

Solarz, A., Bilicki, M., Gromadzki, M., Pollo, A., Durkalec, A., and Wypych, M. (2017). Automated novelty detection in the wise survey with one-class support vector machines. Astronomy and Astrophysics, 606:A39.

Soszynski, I., Udalski, A., Szymanski, M. K., Kubiak, M., Pietrzynski, G., Wyrzykowski, L., Szewczyk, O., Ulaczyk, K., and Poleski, R. (2008). The optical gravitational lensing experiment. the OGLE-III catalog of variable stars. II. type II cepheids and anomalous cepheids in the large magellanic cloud. Acta Astronomica, 58:293.

Soszynski, I., Udalski, A., Szymanski, M. K., Kubiak, M., Pietrzynski, G., Wyrzykowski, L., Szewczyk, O., Ulaczyk, K., and Poleski, R. (2009). The optical gravitational lensing experiment. the OGLE-III catalog of variable stars. IV. long-period variables in the large magellanic cloud. Acta Astronomica, 59:239.

Sowell, J. R., Hall, D. S., Henry, G. W., Burke Jr., E. W., and Milone, E. F. (1983). MM Herculis: An eclipsing binary of the RS CVn type. Astrophysics and Space Science, 90(2):421–435.

Steele, I. A., Smith, R. J., Rees, P. C., Baker, I. P., Bates, S. D., Bode, M. F., Bowman, M. K., Carter, D., Etherton, J., and Ford, M. J. (2004). The liverpool telescope: performance and first results. Ground-based Telescopes.

Stekhoven, D. J. and Bühlmann, P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1):112–118.

Stellingwerf, R. F. (1978). Period determination using phase dispersion minimization. The Astrophysical Journal, 224:953–960.

Stetson, P. B. (1996). On the automatic determination of light-curve parameters for cepheid variables. Publications of the Astronomical Society of the Pacific, 108:851.

Tagliaferri, R., Ciaramella, A., Milano, L., Barone, F., and Longo, G. (1999). Spectral analysis of stellar light curves by means of neural networks. Astronomy and Astrophysics Supplement Series, 137(2):391–405.

Tanvir, N. R., Hendry, M. A., Watkins, A., Kanbur, S. M., Berdnikov, L. N., and Ngeow, C. C. (2005). Determination of cepheid parameters by light-curve template fitting. Monthly Notices of the Royal Astronomical Society, 363(3):749–762.

Thackeray, A. D. (1974). Variations of s dor and hde 269006. Monthly Notices of the Royal Astronomical Society, 168(1):221–233.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288.

Townsend, R. H. D. (2010). Fast calculation of the lomb-scargle periodogram using graphics processing units. The Astrophysical Journal Supplement Series, 191(2):247–253.

Udalski, A., Kubiak, M., and Szymanski, M. (1997). Optical gravitational lensing experiment. OGLE-2 – the second phase of the OGLE project. Acta Astronomica, 47:319–344.

Ukwatta, T. N. and Wozniak, P. R. (2015). Integrating temporal and spectral features of astronomical data using wavelet analysis for source classification. 2015 IEEE Applied Imagery Pattern Recognition Workshop (AIPR).

Unal, I. (2017). Defining an optimal cut-point value in roc analysis: An alternative approach. Computational and Mathematical Methods in Medicine, 2017.

Urry, C. M. and Padovani, P. (1995). Unified schemes for radio-loud active galactic nuclei. Publications of the Astronomical Society of the Pacific, 107(715).

van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research, 9:2579–2608.

Van Hamme, W., Samec, R. G., Gothard, N. W., Wilson, R. E., Faulkner, D. R., and Branly, R. M. (2001). Cn andromedae: A broken-contact binary? The Astronomical Journal, 122(6):3436–3446.

VanderPlas, J. T. (2017). Understanding the lomb-scargle periodogram. arXiv:1703.09824v1.

Vanderplas, J. T. and Ivezić, Ž. (2015). Periodograms for multiband astronomical time series. The Astrophysical Journal, 812(1):18.

Vaughan, S. (2011). Random time series in astronomy. Philosophical Transactions of the Royal Society, 371.

Vora, K. and Yagnik, S. (2014). A survey on backpropagation algorithms for feedforward neural networks. International Journal of Engineering Development and Research, 1(3):193–197.

Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2):307–333.

Waelkens, C. and Rufener, F. (1985). Photometric variability of mid-b stars. Astronomy and Astrophysics, 152(1):6–14.

Wallerstein, G. (2002). The cepheids of population II and related stars. Publications of the Astronomical Society of the Pacific, 114(797).

Wells, M., Prsa, A., Jones, L., and Yoachim, P. (2017). Initial estimates on the performance of the lsst on the detection of eclipsing binaries. Publications of the Astronomical Society of the Pacific, 129(976):065003.

Wilks, D. S. (2010). Sampling distributions of the brier score and brier skill score under serial dependence. Quarterly Journal of the Royal Meteorological Society, 136(1):2109–2118.

Winget, D. E., van Horn, H. M., Tassoul, M., Fontaine, G., Hansen, C. J., and Carroll, B. W. (1982). Hydrogen-driving and the blue edge of compositionally stratified zz ceti star models. Astrophysical Journal, Part 2 - Letters to the Editor, 252:L65–L68.

Wood, P. R. and Sebo, K. M. (1996). On the pulsation mode of mira variables: evidence from the large magellanic cloud. Monthly Notices of the Royal Astronomical Society, 282(3):958–964.

Woosley, S. E. and Eastman, R. G. (1997). Type Ib and Ic supernovae: Models and spectra. Thermonuclear Supernovae.

Wozniak, P. R. (2000). Difference image analysis of the OGLE-II bulge data. i. the method. Acta Astronomica, 50:421–450.

Wray, J. J., Eyer, L., and Paczyński, B. (2004). OGLE small amplitude red giant variables in the galactic bar. Monthly Notices of the Royal Astronomical Society, 349(3):1059–1068.

Xiong, D. R. and Deng, L. (2006). Small amplitude variable red giants. Proceedings IAU Symposium No. 239.

Yoachim, P., Mccommas, L. P., Dalcanton, J. J., and Williams, B. F. (2009). A panoply of cepheid light curve templates. The Astronomical Journal, 137(6):4697–4706.

Yoon, S.-C. and Langer, N. (2004). Presupernova evolution of accreting white dwarfs with rotation. Astronomy and Astrophysics, 419(2):623–644.

York, D. G., Adelman, J., and Anderson, J. E. (2000). The sloan digital sky survey: Technical summary. The Astronomical Journal, 120(3):1579–1587.

Zechmeister, M. and Kürster, M. (2009). The generalised lomb-scargle periodogram. Astronomy & Astrophysics, 496(2):577–584.

Zinn, J. C., Kochanek, C. S., Kozlowski, S., Udalski, A., Szymanski, M. K., Soszynski, I., Wyrzykowski, L., Ulaczyk, K., Poleski, R., and Pietrukowicz, P. (2017). Variable classification in the lsst era: exploring a model for quasi-periodic light curves. Monthly Notices of the Royal Astronomical Society, 468(2):2189–2205.

Zucker, S. (2016). Detection of periodicity based on independence tests – II. improved serial independence measure. Monthly Notices of the Royal Astronomical Society: Letters, 457(1).

Zucker, S. (2018). Detection of periodicity based on independence tests – III. phase distance correlation periodogram. Monthly Notices of the Royal Astronomical Society: Letters, 474(1):L86–L90.

Zucker, S. and Giryes, R. (2018). Shallow transits – deep learning. i. feasibility study of deep learning to detect periodic transits of exoplanets. The Astronomical Journal, 155(4):9.