Classifying Exoplanet Candidates with Convolutional Neural Networks: Application to the Next Generation Transit Survey

Classifying exoplanet candidates with convolutional neural networks: application to the Next Generation Transit Survey Chaushev, A., Raynard, L., Goad, M. R., Eigmüller, P., Armstrong, D. J., Briegal, J. T., Burleigh, M. R., Casewell, S. L., Gill, S., Jenkins, J. S., Nielsen, L. D., Watson, C. A., West, R. G., Wheatley, P. J., Udry, S., & Vines, J. I. (2019). Classifying exoplanet candidates with convolutional neural networks: application to the Next Generation Transit Survey. Monthly Notices of the Royal Astronomical Society, 488(4), 5232-5250. https://doi.org/10.1093/mnras/stz2058 Published in: Monthly Notices of the Royal Astronomical Society Document Version: Peer reviewed version Queen's University Belfast - Research Portal: Link to publication record in Queen's University Belfast Research Portal Publisher rights © 2019 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society. This work is made available online in accordance with the publisher’s policies. Please refer to any applicable terms of use of the publisher. General rights Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The Research Portal is Queen's institutional repository that provides access to Queen's research output. Every effort has been made to ensure that content in the Research Portal does not infringe any person's rights, or applicable UK laws. If you discover content in the Research Portal that you believe breaches copyright or violates any law, please contact [email protected]. Download date:29. Sep. 2021 MNRAS 000,1{21 (2019) Preprint 26 July 2019 Compiled using MNRAS LATEX style file v3.0 Classifying Exoplanet Candidates with Convolutional Neural Networks: Application to the Next Generation Transit Survey Alexander Chaushev,1? Liam Raynard,2 Michael R. Goad,2 Philipp Eigmuller,¨ 3 David J. Armstrong,4;5 Joshua T. Briegal,6 Matthew R. Burleigh,2 Sarah L. Casewell,2 Samuel Gill,4;5 James S. Jenkins,7;8 Louise D. Nielsen,9 Christo- pher A. Watson,10 Richard G. West,4;5 Peter J. Wheatley,4;5 Stéphane Udry,9 Jose I. Vines7 1Center for Astronomy and Astrophysics, TU Berlin, Hardenbergstr. 36, D-10623 Berlin, Germany 2Department of Physics and Astronomy, University of Leicester, University Road, Leicester, LE1 7RH, UK 3Institute of Planetary Research, German Aerospace Center, Rutherfordstrasse 2, 12489 Berlin, Germany 4Dept. of Physics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK 5Centre for Exoplanets and Habitability, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK 6Astrophysics Group, Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 0HE, UK 7Departamento de Astronom´ıa, Universidad de Chile, Casilla 36-D, Santiago, Chile 8Centro de Astrof´ısica y Tecnolog´ıas Afines (CATA), Casilla 36-D, Santiago, Chile 9Observatoire de Genève, Universitéde Genève, 51 Ch. des Maillettes, 1290 Sauverny, Switzerland 10Astrophysics Research Centre, School of Mathematics and Physics, Queen's University Belfast, Belfast BT7 1NN, UK Accepted for publication 23 July 2019 ABSTRACT Vetting of exoplanet candidates in transit surveys is a manual process, which suf- fers from a large number of false positives and a lack of consistency. Previous work has shown that Convolutional Neural Networks (CNN) provide an efficient solution to these problems. Here, we apply a CNN to classify planet candidates from the Next Generation Transit Survey (NGTS). For training datasets we compare both real data with injected planetary transits and fully-simulated data, as well as how their dif- ferent compositions affect network performance. We show that fewer hand labelled lightcurves can be utilised, while still achieving competitive results. With our best model, we achieve an AUC (area under the curve) score of ¹95:6 ± 0:2º% and an accu- racy of ¹88:5±0:3º% on our unseen test data, as well as ¹76:5±0:4º% and ¹74:6±1:1º% in comparison to our existing manual classifications. The neural network recovers 13 out of 14 confirmed planets observed by NGTS, with high probability. We use simulated data to show that the overall network performance is resilient to mislabelling of the training dataset, a problem that might arise due to unidentified, low signal-to-noise arXiv:1907.11109v1 [astro-ph.EP] 25 Jul 2019 transits. Using a CNN, the time required for vetting can be reduced by half, while still recovering the vast majority of manually flagged candidates. In addition, we identify many new candidates with high probabilities which were not flagged by human vetters. Key words: methods: data analysis { planets and satellites: detection { techniques: photometric 1 INTRODUCTION Exoplanets detected via the transit method constitute 80% of the total confirmed population (Akeson et al. 2013)1. ? E-mail: [email protected] However, current detection methods produce large numbers © 2019 The Authors 2 A. Chaushev & L. Raynard of false positives. Since these candidates are analysed man- of these are false positives, reducing to 82% after initial vet- ually by several human vetters, this is a time consuming ting tests. These numbers are broadly consistent with false process that lacks consistency. positives from other missions such as CoRoT and Kepler, Recent results (Shallue & Vanderburg 2018; Ansdell which can range from ∼ 50% to ∼ 90% (Deleuil et al. 2018; et al. 2018; Schanche et al. 2019; Osborn et al. 2019; Dat- Akeson et al. 2013; Santerne et al. 2016). tilo et al. 2019; Yu et al. 2019) have shown that a convolu- Large numbers of candidates with a high false posi- tional neural network (hereafter CNN) provides an efficient, tive rate demand many resources during the vetting process, automatic approach to classifying exoplanet candidates. A since candidates are analysed visually by a human being. It CNN can be used to reduce the time burden on human vet- is also difficult to ensure consistency across expert vetters, ters, as well as to identify promising candidates which may as some of the judgements may be subjective, particularly have been missed, particularly those in lower signal-to-noise for marginal candidates. (S/N) regimes where there are many false positives. In this paper, we present the first application of a CNN to data from the Next Generation Transit Survey (NGTS) 1.2 Deep Learning (Wheatley et al. 2018) and show that it is effective in classi- Machine learning is a subset of artificial intelligence which fying exoplanet candidates found by ORION, an implemen- studies algorithms that `learn' to perform a task instead of tation of the Box Least-Squares detection algorithm (BLS) following explicit steps. Machine learning approaches have (Collier Cameron et al. 2006). We demonstrate that there is become increasingly popular in the field of exoplanet de- good agreement between the CNN ranking and our exten- tection and vetting, and are being applied to address the sive database of classifications produced by expert human shortcomings of transit detection algorithms. vetters. In addition, we build on previous work by investi- McCauliff et al.(2015) and Mislis et al.(2016) utilised gating the optimal size and composition of the dataset used a random forest classifier (Breiman 2001) on Transit Cross- to train the neural network. Previous studies have relied on ing Events (TCEs) in Kepler data. Others have used self- false-positive candidates, identified during the human vet- organising maps to group Kepler lightcurves with simi- ting process, to formulate their CNN training data. By util- lar features and to classify new objects according to their ising transit injections we find that we can reduce the num- similarity with each group (Thompson et al. 2015; Arm- ber of human labelled lightcurves needed for training, while strong et al. 2017). Armstrong et al.(2018) combined a self- still achieving competitive results. Labelling data is a time- organising map with a random forest model to rank NGTS intensive process and a key road block in training a CNN. candidates produced by ORION. The Autovetter algorithm In x2 we describe our datasets and data preparation achieved an Area Under the Curve (AUC) score of 97.6% in procedures. In x3 we describe the architecture of the neural ranking injected transits against false positives in the NGTS network and set out our methods for training and optimising dataset. the CNN. We discuss the results of training using simulated More recently, a variety of machine learning techniques, data in x4. In x5 we characterise the performance of our net- called\Deep Learning", have provided performance improve- work using real NGTS data. We draw comparison to human ments for many applications (LeCun et al. 2015; Khamparia candidate classification in x6 and describe our search for & Singh 2019). A Deep Neural Network (DNN) consists of new, promising planet candidates in x7. Finally, we discuss three or more layers of interconnected neurons, and is ca- our results and present our conclusions in x8. pable of learning useful features for classifying the data au- tomatically. Performance of Deep Learning techniques have also been shown to scale well with large volumes of data 1.1 Transit search (Sun et al. 2017). These traits are advantageous for the exoplanet candidate classification problem. A CNN is a common Variants of the Box-Least Squares fitting (BLS)(Kovács form of a DNN, which is loosely based on the architecture et al. 2002) and matched filter (Jenkins 2002; Bordéet al. of the animal visual cortex (LeCun et al. 1990, 1998). CNNs 2007) methods have become the canonical tools for the de- are particularly suited to data that contains spatial struc- tection of exoplanet signals in transit lightcurves.

Classifying Exoplanet Candidates with Convolutional Neural Networks: Application to the Next Generation Transit Survey

Arxiv:2003.04650V1 [Astro-Ph.EP] 10 Mar 2020 Sphere (Arcangeli Et Al

Exoplanetary Atmospheres: Key Insights, Challenges, and Prospects

Abstracts of Extreme Solar Systems 4 (Reykjavik, Iceland)

Ground-Based Search for the Brightest Transiting Planets with the Multi-Site All-Sky Camera - MASCARA

KELT-25 B and KELT-26 B: a Hot Jupiter and a Substellar Companion Transiting Young a Stars Observed by TESS

An Ultrahot Neptune in the Neptune Desert

Atmospheres of Hot Alien Worlds

The Multi-Site All-Sky Camera (MASCARA)

The Science of Echo Exoplanet Characterisation Observatory

Inferring Atmospheric Characteristics from Transiting Exoplanets

Considerations for Planning TESS Extended Missions

Abstracts Book