Applying Convolutional Neural Networks to Classify Fast Radio Bursts Detected by The CHIME Telescope

by

Prateek Yadav

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Physics)

The University of British Columbia (Vancouver)

April 2020

© Prateek Yadav, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Applying Convolutional Neural Networks to Classify Fast Radio Bursts Detected by The CHIME Telescope submitted by Prateek Yadav in partial fulfillment of the requirements for the degree of Master of Science in Physics.

Examining Committee:

Dr. Ingrid H. Stairs, Astronomy (Supervisor)

Dr. Gary F. Hinshaw, Astronomy (Supervisory Committee Member)

Abstract

The Canadian Hydrogen Intensity Mapping Experiment (CHIME) is a novel radio telescope that is predicted to detect up to several dozen Fast Radio Bursts (FRBs) per day. However, CHIME's FRB detection software pipeline is susceptible to a large number of false positive triggers from terrestrial sources of Radio Frequency Interference (RFI). This thesis describes intensityML, a software pipeline designed to generate waterfall plots and automatically classify radio bursts detected by CHIME without explicit RFI-masking and DM-refinement. The pipeline uses a convolutional neural network based classifier trained exclusively on events detected by CHIME, and the classifier achieves accuracy, precision and recall of over 99%. It has also successfully discovered several FRBs, both in real-time and in archival data. The ideas presented in this thesis may play a key role in designing future machine-learning models for FRB classification.

Lay Summary

Fast radio bursts are bright flashes of radio waves that last only a few milliseconds. These radio bursts originate from outside our galaxy, but their exact origins remain a mystery. CHIME is a novel radio telescope that aims to detect a large number of these radio bursts in order to better understand their origins. However, radio telescopes are extremely susceptible to picking up radio signals from terrestrial sources, such as airplanes and mobile phones. This thesis presents an automated classifier that can examine the radio bursts detected by CHIME and determine whether they had a terrestrial or an astrophysical origin.

Preface

The work presented in this thesis was primarily done by me, with contributions from various members of The CHIME/FRB Collaboration. All members of the collaboration have contributed to this thesis in some way or another, through instrument and software development or through data acquisition and verification.

The description of The CHIME/FRB system in Section 1.2 has been mostly summarised from previous publications by The CHIME/FRB Collaboration [4][5]. Chapter 2 presents some basic background knowledge for understanding convolutional neural networks, which can be readily found in most deep-learning textbooks such as Goodfellow et al., 2016 [19].

Dr. Shriharsh Tendulkar played a significant role in supervising and providing insights into the development of the plotting routine described in Section 3.2. The scripts were primarily coded by me, with some contributions from Dr. Shriharsh Tendulkar and Mr. Chitrang Patel. These scripts extensively utilise modules from Intensity Analysis Utilities, a library developed by The CHIME/FRB Collaboration to analyze intensity data from radio telescopes. The idea of DM-augmentation, described in Subsection 3.2.2, also emerged from discussions within the collaboration. The classifier described in Section 3.1 was developed and trained independently by me. While most of the events used were labelled by members of the collaboration, they were also double-checked by me. Mr. Charanjot Brar and Mr. Chitrang Patel assisted me with the real-time deployment of intensityML. Dr. Ingrid Stairs supervised the overall project and assisted with editing this thesis.

Table of Contents

Abstract ...... iii

Lay Summary ...... iv

Preface ...... v

Table of Contents ...... vi

List of Tables ...... viii

List of Figures ...... ix

Glossary ...... xiii

Acknowledgments ...... xiv

1 Introduction ...... 1
1.1 Fast Radio Bursts ...... 1
1.2 The CHIME/FRB Project ...... 3
1.3 Related Work: Use of Machine Learning in FRB Classification ...... 8
1.3.1 Hybrid Deep Neural Network ...... 8
1.3.2 Transfer Learning on ImageNet Models ...... 10
1.4 Thesis Organisation ...... 11

2 Introduction to Convolutional Neural Networks ...... 12
2.1 Feed-Forward Neural-Networks ...... 12

2.2 Convolutional Neural Networks ...... 15

3 Description of intensityML ...... 19
3.1 The FRBNet Architecture ...... 19
3.2 Generating Waterfall Plots ...... 23
3.2.1 Automated Plotting Scripts ...... 23
3.2.2 Data Augmentation ...... 24

4 Results ...... 28
4.1 Training ...... 28
4.2 Results ...... 29

5 Discussion ...... 33
5.1 Discussion and Future Work ...... 33
5.1.1 Discussion ...... 33
5.1.2 Future Work ...... 34
5.1.3 Science Goals ...... 35
5.2 Conclusion ...... 36

Bibliography ...... 38

List of Tables

Table 4.1 Accuracy, precision, recall and F1-score computed on the test set ...... 29
Table 4.2 L1 SNRs and DMs for events shown in Figures 4.1 and 4.2 ...... 30

List of Figures

Figure 1.1 (a) Plot on the left shows the waterfall plot for an FRB after correcting for the effects from electromagnetic dispersion. (b) The plot on the right shows the same waterfall plot, but with partial dedispersion to demonstrate the effects of the quadratic dispersive delay in Equation 1.1. The masked frequency channels in both plots are due to interference from the LTE band ...... 2
Figure 1.2 CHIME radio telescope located at The Dominion Radio Astrophysical Observatory (DRAO) in Canada. Photo taken by Mark Halpern and used with permission ...... 4
Figure 1.3 A schematic of CHIME's signal path. The raw data collected by the four reflectors (red arcs) is transferred to the F- and X-Engines at a rate of 13 Tb/s. The F-Engine consists of FPGA boards to digitise and channelise the data. The X-Engine utilises a GPU cluster for Fast Fourier Transform beam-forming. The CHIME/FRB backend receives the 1024 stationary intensity beams at 1 ms cadence and 16k frequency channels. Image adapted from [4] ...... 5
Figure 1.4 A high-level overview of the CHIME/FRB's software pipeline showing different stages of processing. Image adapted from [4] ...... 6
Figure 1.5 A schematic diagram showing the hybrid deep neural-network developed by Connor et al. Image adapted from [9]. In the original figure, the authors seem to have incorrectly labelled the operation on the waterfall plot as 1D convolution ...... 9

Figure 1.6 Diagram showing an example of a network architecture used by Agarwal et al. Image adapted from [2] ...... 11

Figure 2.1 Schematic diagram of a fully-connected $n$-layer neural-network. An input vector $\mathbf{x} \in \mathbb{R}^d$ is transformed to a vector $\mathbf{h}^{(1)} \in \mathbb{R}^m$ in the first hidden layer. This transformation is repeated $n$ times as shown in Equation 2.1. The final output vector $\hat{\mathbf{y}} \in \mathbb{R}^k$ is obtained by the transformation shown in Equation 2.3, where $\hat{y}_i$ represents the probability for class $i$ if a softmax function (Equation 2.4) is used. Image adapted from [19] ...... 13
Figure 2.2 The figures show how convolution can be used to extract the edges from an input image with a single colour channel. The red pixels in the convolution kernels represent negative numbers, while the blue pixels represent positive numbers ...... 16
Figure 2.3 Schematic diagram showing the operation performed by the convolution layer. See Equation 2.8. Image adapted from [19] ...... 17
Figure 2.4 Diagram shows the typical sequence of operations in a CNN. Image adapted from [19] ...... 18

Figure 3.1 Architecture of FRBNet. The model takes a single-channel 256 × 256 pixel image as input. Layer 0 of the model convolves the input with a fixed set of thirteen 7 × 7 kernels. The resulting thirteen-channel 250 × 250 image is then down-sampled with 2 × 2 max-pooling and followed with a non-linear ReLU activation function. The next three layers each perform a 5 × 5 convolution with a stride of two, followed by a ReLU activation. At the end of Layer 3, the ten-channel image is down-sampled using max-pooling to give a one-dimensional vector of size ten. Finally, a fully-connected layer performs the operation in Equation 2.3 to give an output vector of size 2. Image style adapted from [23] ...... 20

Figure 3.2 The top left plot shows the waterfall plot of an FRB. The horizontal streaks in this plot are RFI contamination. The remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot. The kernel on the top right is simply the identity kernel. The remaining kernels show the Prewitt (left) and Sobel (right) kernels embedded in a 7 × 7 grid. These help enhance the vertical pulse shape while wiping away the horizontal RFI streaks ...... 21
Figure 3.3 The top left plot shows the waterfall plot of an RFI event (with no astrophysical pulse present). Similar to Figure 3.2, the remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot ...... 22
Figure 3.4 Some examples of plots generated by the automated script for events that were classified as astrophysical by the CHIME/FRB pipeline. The top five plots are pulses from FRBs and known pulsars that were correctly classified as astrophysical by the pipeline. The pulses are not perfectly vertical as L1's DM search doesn't find the optimal DM value. The horizontal RFI streaks can also be seen on these plots due to the absence of RFI-masking. The bottom five events were also classified as astrophysical by the pipeline, but these events are most likely RFI ...... 24
Figure 3.5 The figure shows the difference between the plots generated by the online-waterfaller (left) and intensityML (right) for a scattered FRB. There is no down-sampling performed by the online-waterfaller by default. intensityML, on the other hand, performs automatic down-sampling, which makes the burst appear narrow in the plot ...... 25
Figure 3.6 The waterfall plots of an FRB and its ten DM-augmented counterparts generated by the plotting script. L1 typically finds a sub-optimal DM value, which is why the burst in (a) does not appear vertical ...... 26

Figure 3.7 Noise-augmentation for an FRB's waterfall plots. (a) shows how this is performed by taking a weighted sum of the default waterfall plot with the blank-sky plot, where ξ ∼ 0.6. There is also a probability of 0.5 to flip the blank-sky plot along its time-axis before the addition. (b) shows this effect for all of the DM-augmented counterparts from Figure 3.6 ...... 27

Figure 4.1 Waterfall plots generated by intensityML (left) compared with the ones generated by the online-waterfaller (right) for some of the potential FRB candidates. These were discovered by users with the help of real-time deployment ...... 31
Figure 4.2 Waterfall plots for some of the potential FRB candidates discovered from archival data that were initially misclassified as RFI by users. These were discovered by either visually inspecting plots generated by intensityML or with the help of the classifier ...... 32

Figure 5.1 Diagram shows an example of an Inception module [44] for Layer 0 convolution...... 35

Glossary

CHIME Canadian Hydrogen Intensity Mapping Experiment

CNN Convolutional Neural Network

DBSCAN Density-Based Spatial Clustering of Applications with Noise

DM Dispersion Measure

DRAO Dominion Radio Astrophysical Observatory

FRB Fast Radio Burst

GBT Green Bank Telescope

RFI Radio Frequency Interference

RRAT Rotating Radio Transient

SNR Signal-to-Noise Ratio

SVM Support Vector Machine

Acknowledgments

I would like to express my gratitude towards my supervisor, Dr. Ingrid H. Stairs, for supervising and supporting my research. I would also like to thank the entire CHIME/FRB collaboration for providing numerous insightful discussions and assisting me with countless roadblocks encountered over the course of this project. I would also like to extend my gratitude towards the faculty and staff, as well as the greater student community, at The University of British Columbia for their support during my time here. Finally, I would like to thank my family for their financial and moral support throughout my education.

Chapter 1

Introduction

1.1 Fast Radio Bursts

A Fast Radio Burst (FRB) is a highly dispersed, millisecond-duration radio signal of unknown extra-galactic origin [25][35]. Figure 1.1 shows the intensity as a function of frequency vs. time (also known as the waterfall plot) for an FRB detected by the CHIME telescope [5] (see Section 1.2 for more details on CHIME). As an FRB propagates through space, the effects of electromagnetic dispersion due to the interstellar cold plasma result in a quadratic delay in the time of arrival as a function of frequency [25]. The amount of dispersion can be quantified by calculating the Dispersion Measure (DM) using the cold-plasma dispersion law [24]:

$$\mathrm{DM} = \frac{2\pi m_e c\,\Delta t}{e^2\left(f_{\mathrm{low}}^{-2} - f_{\mathrm{high}}^{-2}\right)}, \tag{1.1}$$

where $m_e$ and $e$ are the mass and charge of an electron respectively, $c$ is the speed of light and $\Delta t$ is the difference in the time of arrival between the higher frequency ($f_{\mathrm{high}}$) and the lower frequency ($f_{\mathrm{low}}$) of the burst. The DM is also equal to the integral of the electron density ($n_e$) along the line-of-sight to the FRB [34]:

$$\mathrm{DM} = \int_{0}^{d} n_e(l)\,dl, \tag{1.2}$$


Figure 1.1: (a) Plot on the left shows the waterfall plot for an FRB after correcting for the effects from electromagnetic dispersion. (b) The plot on the right shows the same waterfall plot, but with partial dedispersion to demonstrate the effects of the quadratic dispersive delay in Equation 1.1. The masked frequency channels in both plots are due to interference from the LTE band.

where $d$ is the distance to the FRB. The optimum DM value can be determined by a process known as dedispersion, in which the lower frequency channels are shifted with respect to the higher frequency channels over several trial DM values until the burst lines up vertically [25]. Figure 1.1 shows the waterfall plots of the FRB and the effects of dedispersion.

The first FRB was discovered by Lorimer et al. in 2007 [25] during their search of archival data from the Parkes radio telescope [42]. This FRB came to be known as the Lorimer burst. They found the DM of this burst to be 375 cm⁻³ pc. However, according to the NE2001 model [10], the Milky Way should have only contributed about 25 cm⁻³ pc along this line of sight [25]. This led them to conclude that the burst originated from outside our galaxy. Since then, around 110 FRBs have been discovered by various radio surveys across the world¹ [32]. Some FRB sources are also known to repeat, and the first repeating source was identified by Spitler et al. in 2016 [41].

¹ http://frbcat.org
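To make the dedispersion procedure described above concrete, here is a minimal NumPy sketch that undoes the dispersive delay of Equation 1.1 for a single trial DM. It is purely illustrative and not CHIME/FRB's search code; the function name and array conventions are assumptions, while the dispersion constant is the standard cold-plasma value.

```python
import numpy as np

# Standard cold-plasma dispersion constant in s MHz^2 cm^3 pc^-1
K_DM = 4.1488e3

def dedisperse(waterfall, freqs_mhz, dm, dt):
    """Shift each frequency channel to undo the quadratic dispersive delay.

    waterfall : 2-D array (n_channels, n_samples), one row per frequency
    freqs_mhz : centre frequency of each channel in MHz
    dm        : trial dispersion measure in cm^-3 pc
    dt        : sampling time in seconds
    """
    f_ref = freqs_mhz.max()
    # Delay of each channel relative to the highest frequency (Equation 1.1)
    delays = K_DM * dm * (freqs_mhz ** -2 - f_ref ** -2)
    shifts = np.round(delays / dt).astype(int)
    out = np.empty_like(waterfall)
    for i, shift in enumerate(shifts):
        # Advance the delayed (lower-frequency) channels so the burst lines up
        out[i] = np.roll(waterfall[i], -shift)
    return out
```

A brute-force search would repeat this over many trial DMs and keep the one that maximises the band-summed signal-to-noise; the L1 tree algorithm described in Section 1.2 accomplishes this far more efficiently.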

It is estimated that the new generation of telescopes, such as The Canadian Hydrogen Intensity Mapping Experiment (CHIME), may be capable of detecting dozens of new FRBs every day [4].

Despite numerous detections, the source of FRBs still remains a mystery [35]. Some of the popular progenitor theories include compact object mergers, collapse of compact objects, supernova remnants and active galactic nuclei [35]. For example, two neutron stars in a highly eccentric binary orbit may see a reduction in their orbital separation due to dissipation via gravitational waves [13]. It is predicted that FRBs may be produced when the neutron stars approach their periastron and their magnetospheres interact, before they finally merge into each other [13].

However, repeating FRBs have been observed to have a different burst morphology when compared to non-repeating ones, which suggests that their emission mechanism or local environment may be different [6][15]. Moreover, repeating FRBs can only originate from a mechanism that does not involve the destruction of the original source. For example, Metzger et al. developed a theory involving synchrotron maser emission from young magnetars [27] to describe the origins and characteristics of repeating FRBs. However, it is still not clear whether repeating and non-repeating FRBs originate from the same mechanism. Needless to say, an increased number of FRB detections would help better constrain these theoretical models.

1.2 The CHIME/FRB Project

The following section gives a brief overview of The CHIME/FRB software pipeline. For more details, refer to [4][5].

The Canadian Hydrogen Intensity Mapping Experiment (CHIME) was originally designed to measure the baryon acoustic oscillations by mapping neutral hydrogen in the frequency range of 400–800 MHz [28]. The telescope consists of four adjacent 20 m × 100 m semi-cylindrical reflectors oriented in the North-South direction, as shown in Figure 1.2 [4]. Each of these reflectors has 256 dual-polarisation feeds along its axis, resulting in a total of 1024 independent intensity beams and a large field-of-view of about 250 deg² [4][5]. The CHIME/FRB project aims to utilise CHIME's wide bandwidth, high sensitivity, large field-of-view and powerful correlator to detect multiple FRBs per day [4].

Figure 1.2: CHIME radio telescope located at The Dominion Radio Astrophysical Observatory (DRAO) in Canada. Photo taken by Mark Halpern and used with permission.

Figure 1.3 shows a schematic diagram of the telescope signal path. The input from the receiver feeds is processed by the F-Engine and the X-Engine to produce 1024 stationary intensity beams. Each beam has a high sampling rate of 0.983 ms and a frequency resolution of 24.4 kHz, resulting in a total of 16384 frequency channels in the 400–800 MHz range [4]. The data rate into the CHIME/FRB backend is a massive 142 Gb/s [4].

Figure 1.4 shows a schematic diagram of the CHIME/FRB software pipeline. The processing is split into four stages, namely L1, L2, L3 and L4. The L0 stage refers to the pre-processing, including beam-forming, done by the X-Engine correlator. The L1 stage receives data from L0 and utilises a dedicated cluster of 128 compute nodes to perform two key tasks [4]. The first task involves removing terrestrial sources of Radio Frequency Interference (RFI). It is imperative to excise these RFI signals from the intensity data in order to prevent misclassifying them as astrophysical signals. L1 does this by using a specialised algorithm to apply a custom mask to the intensity data in the frequency vs. time space. The second task involves dedispersion and identification of potential burst candidate events, and it is the most computationally expensive part of the pipeline.

Figure 1.3: A schematic of CHIME's signal path. The raw data collected by the four reflectors (red arcs) is transferred to the F- and X-Engines at a rate of 13 Tb/s. The F-Engine consists of FPGA boards to digitise and channelise the data. The X-Engine utilises a GPU cluster for Fast Fourier Transform beam-forming. The CHIME/FRB backend receives the 1024 stationary intensity beams at 1 ms cadence and 16k frequency channels. Image adapted from [4].

A highly optimised tree-algorithm performs dedispersion on the data stream and searches for candidate events with DMs up to 13,000 cm⁻³ pc and pulse widths up to 128 ms [5]. Once an event is identified, various techniques, including machine-learning, are used to classify it [4][5]. In L1, a Support Vector Machine (SVM) classifier [11] uses the event's Signal-to-Noise Ratio (SNR) behaviour to give it a score between 0 and 10. This score is called the L1 Grade, and it reflects how likely the event is to be astrophysical. A score closer to 0 suggests that the event was most likely a terrestrial source of RFI, whereas a score closer to 10 suggests that the event was of an astrophysical origin. A lightweight description of the event containing key parameters, such as SNR, DM, L1 Grade, detection-time and sky-coordinates, is passed on to the next stage for event identification. Since the CHIME/FRB data rate is so high, the event's baseband (raw voltage data from antennae) and intensity data are stored temporarily in L0 and L1 buffers respectively, and are retrieved later if the event is confirmed to be astrophysical further down the pipeline.

Figure 1.4: A high-level overview of the CHIME/FRB's software pipeline showing different stages of processing. Image adapted from [4].

The L2 stage utilises the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [14] to consolidate events that are from the same origin, but detected simultaneously in different beams, into one single event based on their detection-time, DM, and sky-coordinates [5]. Once the events from all beams have been collated and grouped, another SVM classifier called the RFI-sifter gives them a score between 0 and 10 depending on their SNR behaviour, beam activity and L1 Grade. The RFI-sifter has an accuracy and recall of about 99% [5]. The events classified as RFI are sent straight to the L4 stage. The events classified as astrophysical have their positions refined and fluxes estimated before being sent to the L3 stage.

The L3 stage checks whether the events are of galactic or extra-galactic origin based on their DMs and refined positions [4]. It also checks whether an event is coming from a source that is already known, such as a pulsar, a known FRB or a Rotating Radio Transient (RRAT), by checking the ATNF pulsar catalogue² [26], the RRATalog³, the FRB catalogue⁴ [32] and CHIME/FRB's own database of discoveries.

Once this is determined, L3 consults a set of rules which determine what further actions need to be taken by the L4 stage.

L4 performs the actions requested by L3 [4]. These include sending a call-back request to L1 to retrieve the raw intensity data for extra-galactic events. The call-back request may also trigger L0 to write raw voltage data from the antennae by performing a baseband dump. The intensity data that is called back is written to a network-shared archiver and can be used for further offline analysis. L4 also hosts a relational database that stores key parameters of each event sent past the L1 stage.

Post L4, several offline analysis routines are monitored and controlled by The CHIME/FRB Master. One such routine is the online-waterfaller, which automatically generates waterfall plots for all called-back events. These plots are displayed on an interactive web-interface where users can modify the plots by manually refining the DM, masking and sub-banding the frequency channels, down-sampling the time samples etc. With the assistance of these tools, users visually inspect these plots and try their best to classify the called-back events as astrophysical or RFI.

However, despite multiple efforts to mitigate RFI events, the CHIME/FRB pipeline is still susceptible to a large number of false positives, i.e., a significant proportion of called-back events that the pipeline classifies as astrophysical are actually RFI. For example, roughly 135 out of 360 called-back events in the months of October and November 2019 were false positives. This can make manual classification a rather laborious task. To address this, members of the CHIME/FRB collaboration are assigned periodic 5–6 hour-long shifts where they monitor the waterfall plots of the events as they are called back, and classify them as either RFI or astrophysical. Needless to say, image-recognition via machine-learning can play a significant role in minimising the workload in this area.

² www.atnf.csiro.au/people/pulsar/psrcat/
³ astro.phys.wvu.edu/rratalog/
⁴ www.frbcat.org/

1.3 Related Work: Use of Machine Learning in FRB Classification

Convolutional Neural Network (CNN) based classifiers have been successfully applied to pulsar searches (see for example [20][47]). More recently, they have also been successfully applied to FRB searches. In 2018, Zhang et al. successfully applied CNNs to detect FRBs directly in the intensity data stream from Breakthrough Listen observations with the Green Bank Telescope (GBT), West Virginia [46]. CNNs have also been applied to search for FRBs from event data, and this section will describe two such applications that are similar to the model presented in this thesis [9][2]. For technical details on CNNs, refer to Chapter 2. The performance of such models is typically evaluated by four metrics:

1. Accuracy: $\dfrac{\text{No. of correctly classified events}}{\text{Total no. of events}}$

2. Recall: $\dfrac{\text{No. of correctly classified astrophysical events}}{\text{Total no. of astrophysical events}}$

3. Precision: $\dfrac{\text{No. of correctly classified astrophysical events}}{\text{Total no. of events classified as astrophysical}}$

4. F$_1$-score: $\dfrac{2}{\text{Recall}^{-1} + \text{Precision}^{-1}}$
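For concreteness, the four metrics can be computed from binary labels as in the sketch below; the function is generic and not from the CHIME/FRB codebase, with 1 denoting an astrophysical event and 0 denoting RFI:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels (1 = astrophysical)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # correctly kept astrophysical
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # correctly rejected RFI
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # RFI mistaken for a burst
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed astrophysical event
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 / (1.0 / recall + 1.0 / precision)  # harmonic mean, as defined above
    return accuracy, precision, recall, f1
```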

1.3.1 Hybrid Deep Neural Network

In 2018, Connor et al. constructed a tree-like hybrid deep neural network, as shown in Figure 1.5 [9]. This hybrid network takes in four different features:

1. Dedispersed waterfall plot.

2. Pulse profile obtained by summing the dedispersed waterfall plot along its frequency axis.

3. DM vs. time plot, where each row of the 2-dimensional plot is the pulse profile at a different DM value.

4. The SNRs of neighbouring beams that were triggered.

Figure 1.5: A schematic diagram showing the hybrid deep neural-network developed by Connor et al. Image adapted from [9]. In the original figure, the authors seem to have incorrectly labelled the operation on the waterfall plot as 1D convolution.

Three different CNNs independently extract higher-level features from the first three input features. A fully-connected neural-network extracts higher-level features from the last input feature. All of these output features are merged into one large fully-connected neural-network, which makes the final prediction.

Connor et al. trained and tested their model independently on two different datasets. The first dataset consists of events triggered on the CHIME Pathfinder, which is a precursor to CHIME [3]. The model was trained on 4850 simulated FRBs and an equal number of RFI triggers from the CHIME Pathfinder. The trained model was then tested on several hundred RFI triggers and single pulses from the pulsar B0329+54 and the Crab pulsar⁵.

The classifier had a recall rate of about 99% on the test set.

The second dataset used by Connor et al. was from the Apertif Telescope [29]. The training set consisted of roughly 10,000 RFI candidates, 9,800 simulated FRBs and a couple of hundred single pulses from galactic pulsars. The trained model was tested on several hundred single pulses from galactic pulsars and RFI triggers, and the resulting recall rate was about 99.7%.

1.3.2 Transfer Learning on ImageNet Models

In 2019, Agarwal et al. trained several popular deep CNN architectures [2]. These models have previously shown remarkable performance on large public repositories of real-life images, such as ImageNet [12]. Agarwal et al. trained these models on images of waterfall plots and DM vs. time plots independently, using the method of transfer learning. In this method, all of the model parameters are initialised to their values from training on the ImageNet dataset, and only the parameters of the last few convolution layers are fitted. Once these individual models were optimised, they removed each model's classification layer and fused them into a hybrid network, as shown in Figure 1.6.

Agarwal et al. trained the models on RFI and galactic pulsars from The Green Bank Telescope (GBT) [43][37] and the 20 m telescope [18] located in Green Bank, West Virginia. The training set also consisted of simulated FRBs. The examples used for training were also augmented to artificially increase the size of the training set. They did so by flipping the waterfall plots along their time axes and flipping the DM vs. time plots along both axes. The test set consisted of about 13,500 real events, of which roughly half were from galactic pulsars and the other half from RFI. Their top 11 hybrid models had greater than 99% accuracy, recall and F1-score. They also tested their models on 56 real FRB events from ASKAP [39], Parkes [31][33][45][8] and Breakthrough Listen [17][46], and most of their models had close to perfect recall.

⁵ It is a common practice to use pulses from pulsars, as their waterfall plots look extremely similar to those from FRBs.
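The general transfer-learning recipe described above can be sketched in PyTorch as follows, with a torchvision ResNet standing in for the architectures that Agarwal et al. actually used; this is an illustration of the method, not their code.

```python
import torch.nn as nn
from torchvision import models

# Initialise all parameters to their ImageNet-trained values
model = models.resnet18(pretrained=True)

# Freeze everything, then unfreeze only the last convolution block
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

# Swap the 1000-class ImageNet head for a 2-class one (astrophysical vs. RFI)
model.fc = nn.Linear(model.fc.in_features, 2)
```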

Figure 1.6: Diagram showing an example of a network architecture used by Agarwal et al. Image adapted from [2].

1.4 Thesis Organisation

This thesis presents intensityML, a software pipeline which automatically generates waterfall plots for called-back events and classifies them. The classifier has been designed specifically for the CHIME/FRB system, as it has been trained and tested exclusively on real events called back by CHIME.

The second chapter of this thesis presents a brief overview of convolutional neural networks. The third chapter describes the features of intensityML, and the fourth chapter reports its performance. Finally, the fifth chapter presents a discussion of the current results and potential future work.

Chapter 2

Introduction to Convolutional Neural Networks

This chapter will provide a very quick and condensed description of feed-forward neural-networks and convolutional neural networks. For more comprehensive details, refer to sources like [19] and [16].

2.1 Feed-Forward Neural-Networks

A neural-network is a nonlinear statistical model that is commonly used for classification problems. An input example with $d$ features can be represented as a vector $\mathbf{x} \in \mathbb{R}^d$. For a classification problem with $k$ target labels, a neural-network maps the input vector $\mathbf{x}$ to a target vector $\hat{\mathbf{y}} \in \mathbb{R}^k$. The mapping consists of a series of nonlinear transformations or hidden layers, as shown in Equation 2.1.

$$\begin{aligned}
\mathbf{h}^{(1)} &= g^{(1)}\left(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right)\\
\mathbf{h}^{(2)} &= g^{(2)}\left(\mathbf{W}^{(2)}\mathbf{h}^{(1)} + \mathbf{b}^{(2)}\right)\\
&\;\;\vdots\\
\mathbf{h}^{(n)} &= g^{(n)}\left(\mathbf{W}^{(n)}\mathbf{h}^{(n-1)} + \mathbf{b}^{(n)}\right)
\end{aligned} \tag{2.1}$$

Figure 2.1: Schematic diagram of a fully-connected $n$-layer neural-network. An input vector $\mathbf{x} \in \mathbb{R}^d$ is transformed to a vector $\mathbf{h}^{(1)} \in \mathbb{R}^m$ in the first hidden layer. This transformation is repeated $n$ times as shown in Equation 2.1. The final output vector $\hat{\mathbf{y}} \in \mathbb{R}^k$ is obtained by the transformation shown in Equation 2.3, where $\hat{y}_i$ represents the probability for class $i$ if a softmax function (Equation 2.4) is used. Image adapted from [19].

Each layer multiplies its input vector with a matrix $\mathbf{W}^{(i)}$ and adds a bias vector $\mathbf{b}^{(i)}$ to it. The resulting vector is then acted on by an element-wise nonlinear function $g^{(i)}$, which is also known as the activation function. The rectified linear unit (ReLU) is the typical choice of activation function in most neural-networks [19]:

$$g(z) = \max\{0, z\}. \tag{2.2}$$

The output layer takes the vector $\mathbf{h}^{(n)}$ and performs a transformation similar to the hidden layers:

$$\hat{\mathbf{y}} = f\left(\mathbf{W}\mathbf{h}^{(n)} + \mathbf{b}\right), \tag{2.3}$$

except the nonlinear function $f$ is typically the softmax function [19]:

$$f(z_j) = \frac{\exp(z_j)}{\sum_{c=1}^{k} \exp(z_c)}. \tag{2.4}$$

The advantage of using the softmax function is that it allows us to interpret the elements of $\hat{\mathbf{y}}$ as the predicted probabilities for each class. Figure 2.1 shows a schematic diagram summarising Equations 2.1 and 2.3.

The parameters of a neural-network (the $\mathbf{W}$'s and $\mathbf{b}$'s) are a priori unknown. Let $\Theta$ represent all the neural-network parameters. The optimal values of these parameters, $\hat{\Theta}$, are determined by maximum likelihood estimation:

$$\hat{\Theta} \in \underset{\Theta}{\operatorname{argmax}} \left\{ \prod_{i=1}^{N} p\!\left( y^{i} \mid x^{i}, \Theta \right) \right\}, \tag{2.5}$$

where $y^{i}$ is the true label for example $x^{i}$, and $N$ is the total number of examples. This can equivalently be expressed as finding the minimiser of the negative log-likelihood:

$$\hat{\Theta} \in \underset{\Theta}{\operatorname{argmax}} \left\{ \prod_{i=1}^{N} \sum_{c=1}^{k} y_c^{i}\, \hat{y}_c^{i} \right\} = \underset{\Theta}{\operatorname{argmin}} \left\{ -\sum_{i=1}^{N} \sum_{c=1}^{k} y_c^{i} \ln \hat{y}_c^{i} \right\}, \tag{2.6}$$

where $y_c^{i}$ is equal to 1 only if $c$ is the correct class label for example $i$, and $\hat{y}_c^{i}$ is the predicted probability for example $i$ to be of class $c$. The term being minimised in Equation 2.6 is also known as the cross-entropy loss function [16].

A given dataset is typically split into three parts for training, validation and testing. The training set is used for minimising the loss function iteratively via some variation of the gradient descent algorithm:

$$\Theta^{t+1} = \Theta^{t} - \alpha \cdot \nabla_{\Theta^{t}} L(\Theta^{t}), \tag{2.7}$$

where $\Theta^{t}$ are the weights at iteration $t$, $\alpha$ is the step-size and $L$ is the loss function. This process, however, can often lead to a model that overfits the training dataset, since neural-networks are typically over-parameterised. In the context of machine-learning, overfitting refers to a situation where a model is fit too well to the peculiarities of the features in the training set, and generalises poorly to newer datasets. The validation set can be used to gauge this effect by estimating how well the trained model generalises to data that it did not 'see' during training. The generalisation error on the validation set can also be used for fine-tuning the hyper-parameters of the neural-network, such as the number of hidden layers or the size of each layer. It can also be used to determine when to stop the gradient descent algorithm once an optimal value for the validation error has been obtained. Lastly, the test set is used to get an unbiased estimate of the final trained model's performance and should ideally only be used once to avoid optimisation bias.
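The following PyTorch sketch ties Equations 2.1–2.7 together for a small fully-connected network on dummy data; the layer sizes are arbitrary. Note that nn.CrossEntropyLoss applies the softmax of Equation 2.4 internally, so the network itself outputs raw scores.

```python
import torch
import torch.nn as nn

# Two hidden layers with ReLU activations (Equations 2.1 and 2.2)
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 2),               # output layer of Equation 2.3
)
loss_fn = nn.CrossEntropyLoss()     # cross-entropy loss of Equation 2.6
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(128, 64)            # 128 dummy examples with d = 64 features
y = torch.randint(0, 2, (128,))     # dummy class labels

for step in range(100):             # gradient-descent iterations (Equation 2.7)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```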

2.2 Convolutional Neural Networks

A Convolutional Neural Network (CNN) [22] is a special type of neural-network that generally performs well on image-classification problems [19]. Unlike fully-connected neural-networks, a CNN consists of convolution layers that perform convolution operations instead of matrix multiplications. The purpose of the convolutions is to extract features from an input image. Figure 2.2 shows how convolutions with vertical and horizontal Prewitt operators [36] can be used to extract edges. For an input two-dimensional grid of pixels $I$, the output $O$ from the convolution operation¹ performed by a convolution layer looks like:

$$O_{i,j} = \sum_{m}\sum_{n} I_{i+m,\,j+n}\, K_{m,n} + b, \tag{2.8}$$

where $K$ is the two-dimensional convolution kernel, $b$ is a constant bias term and the subscripts denote pixel numbers. Figure 2.3 illustrates this operation. For a two-dimensional image with multiple input colour channels, the convolution operation is performed with a slight modification to Equation 2.8:

$$O_{i,j,k} = \sum_{m}\sum_{n}\sum_{p} I_{i+m,\,j+n,\,p}\, K_{k,m,n,p} + b_k, \tag{2.9}$$

where subscripts $p$ and $k$ denote the colour channels of the input and output image respectively.

¹ Technically, the operation performed here is cross-correlation, which is similar to convolution. See [19] for more details.
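As a minimal illustration of Equation 2.8, the sketch below convolves a random single-channel image with the vertical 3 × 3 Prewitt operator shown in Figure 2.2; the helper function is purely didactic.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Valid-mode 2-D convolution as in Equation 2.8 (strictly speaking
    cross-correlation; see the footnote above)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

# Vertical Prewitt operator: responds to left-right intensity gradients,
# highlighting vertical edges such as a dedispersed pulse
prewitt_vertical = np.array([[-1, 0, 1],
                             [-1, 0, 1],
                             [-1, 0, 1]])

edges = conv2d(np.random.rand(64, 64), prewitt_vertical)
```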

(a) Vertical edge detection with a 3 × 3 Prewitt operator.

(b) Horizontal edge detection with a 3 × 3 Prewitt operator.

Figure 2.2: The figures show how convolution can be used to extract the edges from an input image with a single colour channel. The red pixels in the convolution kernels represent negative numbers, while the blue pixels represent positive numbers.

The convolution operation can also be written as a matrix multiplication:

$$\mathbf{Z} = \mathbf{W}\mathbf{x} + \mathbf{b}, \tag{2.10}$$

where $\mathbf{x}$ and $\mathbf{Z}$ are the images $I$ and $O$ flattened into one-dimensional vectors respectively, and $\mathbf{b}$ is the bias vector. $\mathbf{W}$ is a sparse doubly-block circulant-matrix, where the number of non-zero elements of each row is equal to the total number of elements in the matrix $K$. Here, we can see the similarity between the CNN and the fully-connected neural network from the previous subsection. However, due to the sparsity of $\mathbf{W}$, not all input features interact with each other. This can be advantageous as it allows the transformation to focus on local regions of an image. It also greatly reduces the number of parameters in the model. Moreover, the same set of parameters is applied at every position of the input. As a result, CNNs have better time-complexity, memory requirements and statistical efficiency than fully-connected neural-networks.

Figure 2.3: Schematic diagram showing the operation performed by the convolution layer. See Equation 2.8. Image adapted from [19].

Similar to fully-connected neural-networks, the convolution operation is usually followed by a non-linear activation function applied to every single pixel of the output. This is then typically followed by a pooling function, which down-samples the dimensions of the output. This set of processes is applied repeatedly to extract higher-level features while reducing the size of the image. Finally, the output is reshaped into a one-dimensional vector and is fed into a fully-connected neural-network, which then classifies it. Figure 2.4 shows the architecture of a typical CNN. Note that the convolution kernels are also typically a priori unknown and are fitted during the training phase.

Figure 2.4: Diagram shows the typical sequence of operations in a CNN. Image adapted from [19].

Chapter 3

Description of intensityML

3.1 The FRBNet Architecture

This section describes the architecture of FRBNet, the CNN model used by intensityML for classification. Figure 3.1 shows its architecture.

In Layer 0 of FRBNet, the input image is first convolved with a set of thirteen kernels which are kept fixed during the training phase¹. The convolution operation is followed by a max-pooling operation [19], where the image is down-sampled by taking the maximum value over a window of size 2 × 2 every two pixels. Finally, a ReLU activation function is applied to every pixel.

Figures 3.2 and 3.3 show the thirteen kernels and their corresponding Layer 0 transforms for an FRB and an RFI event respectively. There are twelve Sobel [40][1] and Prewitt [36] kernels of different sizes for vertical edge detection. It can be seen how these kernels help extract the vertical pulse while clearing away the RFI. The 3 × 3 kernels are typically more sensitive to narrower bursts, whereas the 7 × 7 kernels are more sensitive to wider ones. A single identity kernel is also used to retain some information from the original plot. Since the kernels are of different sizes, the smaller kernels were padded with zeroes to give larger 7 × 7 kernels. This allows the convolution operations to be performed simultaneously within the same convolution layer.

¹ Similar techniques have been applied in numerous CNN applications. See for example [7] and [38].

Figure 3.1: Architecture of FRBNet. The model takes a single-channel 256 × 256 pixel image as input. Layer 0 of the model convolves the input with a fixed set of thirteen 7 × 7 kernels. The resulting thirteen-channel 250 × 250 image is then down-sampled with 2 × 2 max-pooling and followed with a non-linear ReLU activation function. The next three layers each perform a 5 × 5 convolution with a stride of two, followed by a ReLU activation. At the end of Layer 3, the ten-channel image is down-sampled using max-pooling to give a one-dimensional vector of size ten. Finally, a fully-connected layer performs the operation in Equation 2.3 to give an output vector of size 2. Image style adapted from [23].

Layers 1, 2 and 3 each perform a 5 × 5 convolution followed by a ReLU activation. Instead of applying max-pooling, the convolution operation is performed with a stride of 2. These kernels are not kept frozen and are fitted during the training phase. The purpose of these layers is to extract higher-level features from the Layer 0 outputs. Finally, the extracted information is down-sampled to a one-dimensional vector, which is then transformed to give the classification probabilities.

Figure 3.2: The top left plot shows the waterfall plot of an FRB. The horizontal streaks in this plot are RFI contamination. The remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot. The kernel on the top right is simply the identity kernel. The remaining kernels show the Prewitt (left) and Sobel (right) kernels embedded in a 7 × 7 grid. These help enhance the vertical pulse shape while wiping away the horizontal RFI streaks.

Figure 3.3: The top left plot shows the waterfall plot of an RFI event (with no astrophysical pulse present). Similar to Figure 3.2, the remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot.

The classification probability is converted to a score between 0 and 10. For events detected in multiple beams, the final event score is determined by averaging the score for each beam's waterfall plot. During the training procedure, each waterfall plot is treated independently and the model is optimised to give the correct score to each waterfall plot. The averaging can help minimise errors in the final event classification due to classification errors in individual beams' waterfall plots.
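A PyTorch sketch of this layer sequence is given below. It follows Figure 3.1, but it is a simplified stand-in rather than the trained FRBNet: Layer 0 here repeats a single frozen Sobel kernel thirteen times instead of using the thirteen distinct kernels of Figure 3.2, and the channel widths of Layers 1 and 2 are assumptions.

```python
import torch
import torch.nn as nn

class FRBNetSketch(nn.Module):
    """Simplified stand-in for FRBNet (cf. Figure 3.1)."""

    def __init__(self):
        super().__init__()
        # Layer 0: fixed 7x7 kernels, 1 x 256x256 -> 13 x 250x250
        self.layer0 = nn.Conv2d(1, 13, kernel_size=7, bias=False)
        sobel = torch.zeros(7, 7)
        sobel[2:5, 2:5] = torch.tensor([[-1., 0., 1.],
                                        [-2., 0., 2.],
                                        [-1., 0., 1.]])  # embedded in a 7x7 grid
        self.layer0.weight.data = sobel.repeat(13, 1, 1, 1)
        self.layer0.weight.requires_grad = False   # kept fixed during training
        self.pool0 = nn.MaxPool2d(2)
        # Layers 1-3: trainable 5x5 convolutions with a stride of two
        self.layers = nn.Sequential(
            nn.Conv2d(13, 13, 5, stride=2), nn.ReLU(),
            nn.Conv2d(13, 13, 5, stride=2), nn.ReLU(),
            nn.Conv2d(13, 10, 5, stride=2), nn.ReLU(),
        )
        self.fc = nn.Linear(10, 2)                  # Equation 2.3

    def forward(self, x):
        x = torch.relu(self.pool0(self.layer0(x)))
        x = self.layers(x)
        x = x.flatten(2).max(dim=2).values          # max-pool to a vector of size ten
        return self.fc(x)

scores = FRBNetSketch()(torch.randn(1, 1, 256, 256))  # output of shape (1, 2)
```

In the real pipeline, the two-element output is passed through a softmax and mapped to the 0–10 score that is then averaged over the beams in which the event was detected.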

3.2 Generating Waterfall Plots

3.2.1 Automated Plotting Scripts

The called-back raw intensity data for an event is converted into a 256 × 256 pixel waterfall plot by intensityML via an automated plotting script which (see the sketch after this list):

1. Dedisperses to the DM value calculated by L1.

2. Trims and down-samples the plot along its time-axis to give a total of 256 time samples.

3. Sub-bands the frequency-axis to give 256 frequency channels.

4. Scales each frequency channel independently by subtracting the median value and dividing by the standard deviation.
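A rough NumPy rendering of these four steps is shown below (the sketch referenced above). It reuses the dedisperse helper sketched in Section 1.1, and the trim window and down-sampling factor are illustrative parameters; in the real scripts these are derived from the event's L1 parameters.

```python
import numpy as np

def make_waterfall_plot(intensity, freqs_mhz, dm, dt, peak, t_factor=4):
    """Reduce raw intensity data (n_channels, n_samples) to a 256 x 256 image.

    `peak` is the sample index of the event; `t_factor` stands in for the
    down-sampling factor that intensityML derives from L1's width-related
    parameters.
    """
    n_pix = 256

    # 1. Dedisperse to the DM reported by L1
    data = dedisperse(intensity, freqs_mhz, dm, dt)

    # 2. Trim around the event and down-sample to 256 time samples
    half = n_pix * t_factor // 2
    data = data[:, peak - half:peak + half]
    data = data.reshape(data.shape[0], n_pix, t_factor).mean(axis=2)

    # 3. Sub-band the frequency axis down to 256 channels
    f_factor = data.shape[0] // n_pix
    data = data.reshape(n_pix, f_factor, n_pix).mean(axis=1)

    # 4. Scale each frequency channel independently
    median = np.median(data, axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    return (data - median) / std
```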

Figure 3.4 shows plots for typical examples of astrophysical and RFI events generated from this automated script. All of these events were classified as astrophysical by the CHIME/FRB pipeline. Note that DM-refinement and RFI-masking are not performed on any of these plots, as these are often difficult to automate and can be relatively time-consuming processes.

A key feature of these scripts is that they automatically down-sample plots based on their L1 parameters. L1's dedispersion algorithm finds parameters which are correlated with the burst width, and these are used by intensityML for automatic down-sampling. As a result, bursts which are wide and scattered appear narrow, as shown in Figure 3.5. This property is extremely useful to the FRBNet architecture.

Figure 3.4: Some examples of plots generated by the automated script for events that were classified as astrophysical by the CHIME/FRB pipeline. The top five plots are pulses from FRBs and known pulsars that were correctly classified as astrophysical by the pipeline. The pulses are not perfectly vertical as L1's DM search doesn't find the optimal DM value. The horizontal RFI streaks can also be seen on these plots due to the absence of RFI-masking. The bottom five events were also classified as astrophysical by the pipeline, but these events are most likely RFI.

3.2.2 Data Augmentation

Data augmentation is a powerful technique that is commonly used when training image classifiers [19]. It involves generating extra modified copies of the training examples by rotating, flipping, translating, cropping, adding noise etc. This technique significantly improves how well a trained model can generalise to new data. Three types of data augmentation strategies were used for training.

The first of these was DM-augmentation, where an event was plotted about ten times at different DM values uniformly sampled within the event's DM-error range. Figure 3.6 shows this for an FRB event. This augmentation can also be performed for RFI events.

The plotting scripts can also be used to generate blank-sky plots, where the time-window of an FRB's waterfall plot is shifted to give a plot with just noise. A weighted sum of this blank-sky plot and the FRB plot is then taken to produce a noisier FRB plot. This technique was also used by Zhang et al. in 2018 [46].

Figure 3.5: The figure shows the difference between the plots generated by the online-waterfaller (left) and intensityML (right) for a scattered FRB. There is no down-sampling performed by the online-waterfaller by default. intensityML, on the other hand, performs automatic down-sampling, which makes the burst appear narrow in the plot.

This process will be referred to as noise-augmentation, and Figure 3.7 shows this effect on the plots from Figure 3.6. The summation weight, ξ, was chosen to be a random number close to 0.6. This value was chosen as it gave the right balance between diluting the signal and keeping it visible. There was also a 50% chance of flipping the blank-sky plot before summation, in order to further increase the randomness introduced by this augmentation technique. Noise-augmentation was performed only for high-SNR FRBs, as the signal from fainter FRBs may get too diluted by this effect. For pulsars, the blank-sky plots may contain additional pulses if the plotting window is shifted, and for RFI events, the augmentation may not give a realistic plot.

The combination of the two aforementioned augmentation techniques can be used to multiply the size of the training set by about an order of magnitude. The size of the training set was further doubled by mirroring each plot along its frequency-axis (similar to Agarwal et al. [2]).
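A sketch of the noise-augmentation step is shown below. The weighting convention is an assumption, taking the convex combination (1 − ξ) · FRB + ξ · blank-sky with ξ drawn near 0.6, so that most of the weight goes to the noise; the exact convention in intensityML may differ.

```python
import numpy as np

def noise_augment(frb_plot, blank_sky, rng=None):
    """Blend an FRB waterfall plot with a blank-sky plot of the same shape.

    Assumes the weighted sum (1 - xi) * FRB + xi * blank-sky with xi near
    0.6; the exact convention used by intensityML may differ.
    """
    rng = rng or np.random.default_rng()
    xi = rng.normal(0.6, 0.05)
    if rng.random() < 0.5:
        blank_sky = blank_sky[:, ::-1]  # flip along the time axis
    return (1.0 - xi) * frb_plot + xi * blank_sky
```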

25 (a) Default waterfall plot.

(b) DM-augmentations of the waterfall plot.

Figure 3.6: The waterfall plots of an FRB and its ten DM-augmented counterparts generated by the plotting script. L1 typically finds a sub-optimal DM value, which is why the burst in (a) does not appear vertical.

26 (a) Noise-augmentation for an FRB waterfall plot.

(b) Noise-augmentation for all DM-augmented waterfall plots of the FRB.

Figure 3.7: Noise-augmentation for an FRB’s waterfall plots. (a) shows how this is performed by taking a weighted sum of the default waterfall plot with the blank-sky plot, where ξ ∼ 0.6. There’s also a probability of 0.5 to flip the blank-sky plot along its time-axis before the addition. (b) shows this effect for all of the DM-augmented counterparts from Figure 3.6.

Chapter 4

Results

4.1 Training

The dataset used for training, validation and testing consisted of events from September 2018 to November 2019. The raw data for these events were processed into waterfall plots using intensityML. Events and plots which appeared ambiguous or corrupted were excluded.

The training set consisted of roughly 3000 called-back events from September 2018 to August 2019. Roughly half of these were from pulsars and FRBs, and the other half were from RFI. After applying augmentations, the astrophysical set consisted of about 57,000 waterfall plots, whereas the RFI set consisted of about 49,000. Some blank-sky plots were also included in the RFI set to further improve the classifier's sensitivity to faint bursts. The validation set consisted of around 195 astrophysical events (pulsars and FRBs) and 95 RFI events from September 2019. All waterfall plots were standardised such that the brightest and dimmest pixels in each plot had values of 1 and 0 respectively.

PyTorch [30] was used to implement the FRBNet architecture and train it. The model was trained using the Adam optimiser [21], which is a variation of the gradient descent algorithm. The optimal model configuration was selected using the method of early-stopping [19], where the validation error was tracked during the optimisation process and the configuration with the lowest error was selected. The optimal configuration had 100% validation accuracy.
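The early-stopping selection can be sketched as follows; train_one_epoch, validation_error and the data loaders are hypothetical placeholders for the actual training code.

```python
import copy

max_epochs = 50                      # illustrative training budget
best_error, best_state = float("inf"), None

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)   # placeholder helpers
    error = validation_error(model, val_loader)
    if error < best_error:           # keep the configuration with the lowest
        best_error = error           # validation error seen so far
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)    # roll back to the optimal configuration
```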

4.2 Results

The test set consisted of roughly 135 RFI events and 223 astrophysical events (pulsars and FRBs) from October and November 2019. Table 4.1 shows the accuracy, precision, recall and F1-score for the test set. The lower performance on the test set when compared to the validation set could be due to optimisation bias towards the latter, as its size is very small.

Accuracy (%)    Precision (%)    Recall (%)    F1-Score (%)
    99.2             99.1           99.6           99.3

Table 4.1: Accuracy, precision, recall and F1-score computed on the test set.

The latest model has been deployed for real-time classification of called-back events via The CHIME/FRB Master. This also now allows users to view the plots generated by intensityML alongside the ones generated by the online-waterfaller. In several cases, users have reported that these plots were significantly more discernible than the default ones produced by the online-waterfaller. Figure 4.1 shows the waterfall plots for a few FRB candidates that were discovered via the real-time deployment of intensityML.

The combination of improved plots and the means to automatically classify them proved to be extremely useful in discovering several potential FRB candidates from archival data that were initially misclassified as RFI by users. Figure 4.2 shows several examples of such candidates that were discovered through the latest model or through one of its preliminary versions. This also helped clear over 70 TB of disk-space from the archiver by deleting old unwanted event data.

Table 4.2 shows the L1 SNRs and DMs for the events shown in Figures 4.1 and 4.2. It can be seen that these events cover a wide range of SNRs and DMs, which suggests that the classifier may be capable of finding bursts over a wide range of parameters.

L1 SNR    L1 DM (cm⁻³ pc)
  12.1          179.5
  10.3         1074.0
   8.9          105.1
   8.9          714.9
   9.2          648.6
  25.6          308.9
   9.4         1184.0
  10.6          669.6
   9.6          334.8
  11.1          975.4
   8.7          706.9
  12.9          681.0

Table 4.2: L1 SNRs and DMs for events shown in Figures 4.1 and 4.2.

Figure 4.1: Waterfall plots generated by intensityML (left) compared with the ones generated by the online-waterfaller (right) for some of the potential FRB candidates. These were discovered by users with the help of real-time deployment.

Figure 4.2: Waterfall plots for some of the potential FRB candidates discovered from archival data that were initially misclassified as RFI by users. These were discovered by either visually inspecting plots generated by intensityML or with the help of the classifier.

Chapter 5

Discussion

5.1 Discussion and Future Work

5.1.1 Discussion

The preliminary analysis in this thesis presents an alternative CNN architecture that can give excellent performance on FRB searches. The FRBNet architecture presented here is simpler and more lightweight than the ones presented by Connor et al. [9] and Agarwal et al. [2], but exhibits comparable classification performance. A simpler model with fewer input features and parameters takes significantly less time for training and classification. This also greatly reduces the memory required to store the parameters of the trained model. For example, the parameters of the models presented by Agarwal et al. have a size of at least 100 MB [2], whereas FRBNet's parameters occupy only about 45 KB. However, it should be stressed that the comparisons presented here are weak, as these models were trained and tested on completely different datasets.

Nevertheless, the ideas from this thesis, combined with the ones presented in the two papers, can provide key insights into building and training even better models in the future. The DM-augmentation technique could become a very powerful tool for generating additional training examples when using waterfall plots dedispersed at sub-optimal DMs. Moreover, building future architectures may also involve experimenting with different types of convolution operations in the very first layer. Hybrid architectures like the ones presented by Connor et al. [9] can perhaps be optimised by pruning away redundant input features or by replacing the first 2D CNN with an architecture similar to that of FRBNet. Moreover, training the deep CNN architectures that were successful on the ImageNet challenge may not be necessary for this type of classification problem. Since waterfall plots contain much simpler features than images of real-life objects from 1000 classes, using deeper networks to extract very high-level features may not be as necessary as Agarwal et al. [2] claim. A relatively shallow network, as presented in this thesis, may be sufficient. Clearly, all of these ideas can be validated when working with even larger datasets as they become available.

5.1.2 Future Work

It should also be stressed that the specific attributes of the classifier are subject to change in the future. The classifier presented in this thesis has been trained and tested on a relatively small dataset, and as CHIME/FRB's dataset grows, the combination of model architecture and optimisation strategy would most likely evolve too. The distribution of events is also highly dependent on ever-changing factors such as the beam sensitivity, the local RF environment or the performance of the upstream classifiers. The key ideas presented here will nonetheless be very helpful in developing future classifiers.

Besides training and testing on larger datasets, other key future improvements to this work may involve exploring how to further optimise the architecture of FRBNet. For example, it may be worth investigating whether some of the kernels in Layer 0 could be pruned to reduce redundancy in its output channels; having both Prewitt and Sobel filters, and their reflections, may be unnecessary. It would also be worth exploring the effects of replacing some of the vertical-edge detection kernels with horizontal-edge detection kernels, as this would increase the contrast in the layer's output channels and reduce redundancy. The classifier may also be sped up further by adopting an architecture similar to that of the Inception module [44] for Layer 0, as shown in Figure 5.1. In this method, instead of padding the convolution kernels, the convolution operations of different sizes are performed by different branches, and the output is concatenated.

Figure 5.1: Diagram shows an example of an Inception module [44] for Layer 0 convolution.

However, the downside of this implementation is that it would be more challenging to parallelise.

5.1.3 Science Goals

The trained model's ability to efficiently classify events without explicit RFI-masking and DM-refinement makes it ideal for real-time event classification within the CHIME/FRB pipeline. This is because the L1 tree-dedispersion algorithm usually finds a sub-optimal DM, and automating the process of finding the optimal DM is often a difficult task, especially in the presence of RFI contamination. Similarly, RFI-masking is also a challenging task to automate. These tasks can also be computationally expensive and can hinder the speed of real-time classification.

Currently, the intensityML plots and real-time classification have not fully replaced the need for manual classification, but instead complement this task. It will be crucial to train and test on much larger datasets before manual classification can be completely phased out. Another key piece of future work would involve training on events with SNRs lower than the pipeline's default threshold. This would increase the sensitivity of the classifier in the low-SNR range, which would eventually help lower the pipeline's SNR threshold and increase the overall FRB detection rate.

For several key scientific goals, it is crucial to miss as few FRBs as possible, and the intensityML plots and classifier would certainly assist with this challenge. An exhaustive catalogue of detected FRBs would improve estimates of CHIME's event detection rates and their variations with DM, sky-coordinates, fluence, morphological structure etc. This will also help get better estimates of properties of FRBs, such as their repetition rates.

The automatic real-time classifier can also be used to trigger baseband dumps [9]. An event's baseband data provides a higher temporal resolution than the intensity data, which allows studying burst morphology in even greater detail [6]. Moreover, it also contains an event's polarisation information [15]. However, baseband data requires substantially more disk space and takes considerably longer to download. In order to minimise the risk of overloading the system with baseband dumps triggered by false positives, the real-time classifier can be used to efficiently control them. This technique can also be applied to trigger data dumps from outrigger sites, which can help improve the localisation of FRBs [9].

Presently, The CHIME/FRB pipeline does not perform callbacks for galactic sources, as it would result in an overwhelming event-rate. However, an automated classifier can easily fulfill this role with minimal human supervision. This would allow CHIME/FRB to detect a large number of pulses from known galactic pulsars, which could help characterise properties of individual pulses and provide better estimates of their pulse statistics. The automated classifier may also help CHIME discover new pulsars.

5.2 Conclusion

This thesis presented a detailed description of intensityML, a pipeline designed to generate plots and automatically classify events for The CHIME/FRB system. The CNN classifier used by intensityML has a unique architecture compared to other classifiers used in this field, and it has been trained exclusively on events collected by CHIME, without using any simulated events, DM-refinement, or RFI-masking. The training phase also utilised DM-augmentation, a novel data-augmentation technique. The combination of intensityML's new plotting routine and its trained classifier helped discover several FRBs, both in real-time and from archival data. The work presented in this thesis encourages further exploration in refining future CNN architectures designed for FRB classification. A large and comprehensive catalogue of FRBs will be a major step towards answering the key scientific questions behind the sources of FRBs, and automated classifiers will play a key role in accomplishing this task.

Bibliography

[1] Sobel filter kernel of large size. URL https://stackoverflow.com/questions/9567882/sobel-filter-kernel-of-large-size. Accessed: 2019-10-01. → page 19

[2] D. Agarwal, K. Aggarwal, S. Burke-Spolaor, D. R. Lorimer, and N. Garver-Daniels. Towards deeper neural networks for Fast Radio Burst detection, 2019. → pages x, 8, 10, 11, 25, 33, 34

[3] M. Amiri, K. Bandura, P. Berger, J. R. Bond, J. F. Cliche, L. Connor, M. Deng, N. Denman, M. Dobbs, R. S. Domagalski, and et al. Limits on the Ultra-bright Fast Radio Burst Population from the CHIME Pathfinder. The Astrophysical Journal, 844(2):161, Aug 2017. ISSN 1538-4357. doi:10.3847/1538-4357/aa713f. URL http://dx.doi.org/10.3847/1538-4357/aa713f. → page 9

[4] M. Amiri, K. Bandura, P. Berger, M. Bhardwaj, M. M. Boyce, P. J. Boyle, C. Brar, M. Burhanpurkar, P. Chawla, and et al. The CHIME Fast Radio Burst Project: System Overview. The Astrophysical Journal, 863(1):48, Aug 2018. ISSN 1538-4357. doi:10.3847/1538-4357/aad188. URL http://dx.doi.org/10.3847/1538-4357/aad188. → pages v, ix, 3, 4, 5, 6, 7

[5] M. Amiri, K. Bandura, M. Bhardwaj, et al. Observations of Fast Radio Bursts at Frequencies Down to 400 Megahertz. Nature, 566(7743):230–234, Jan 2019. ISSN 1476-4687. doi:10.1038/s41586-018-0867-7. URL http://dx.doi.org/10.1038/s41586-018-0867-7. → pages v, 1, 3, 5, 6

[6] B. Andersen, K. Bandura, M. Bhardwaj, P. Boubel, M. Boyce, P. Boyle, C. Brar, T. Cassanelli, P. Chawla, D. Cubranic, et al. CHIME/FRB Discovery of Eight New Repeating Fast Radio Burst Sources. The Astrophysical Journal Letters, 885(1):L24, 2019. → pages 3, 36

[7] A. Calderón, S. Roa, and J. Victorino. Handwritten digit recognition using convolutional neural networks and Gabor filters. Proc. Int. Congr. Comput. Intell, 2003. → page 19

[8] D. J. Champion, E. Petroff, M. Kramer, M. J. Keith, M. Bailes, E. D. Barr, S. D. Bates, N. D. R. Bhat, M. Burgay, S. Burke-Spolaor, and et al. Five new fast radio bursts from the HTRU high-latitude survey at Parkes: first evidence for two-component bursts. Monthly Notices of the Royal Astronomical Society: Letters, 460(1):L30–L34, Apr 2016. ISSN 1745-3933. doi:10.1093/mnrasl/slw069. URL http://dx.doi.org/10.1093/mnrasl/slw069. → page 10

[9] L. Connor and J. van Leeuwen. Applying Deep Learning to Fast Radio Burst Classification. The Astronomical Journal, 156(6):256, Nov 2018. ISSN 1538-3881. doi:10.3847/1538-3881/aae649. URL http://dx.doi.org/10.3847/1538-3881/aae649. → pages ix, 8, 9, 33, 34, 36

[10] J. M. Cordes and T. J. W. Lazio. NE2001.I. A New Model for the Galactic Distribution of Free Electrons and its Fluctuations, 2002. → page 2

[11] C. Cortes and V. Vapnik. Support-Vector Networks. Mach. Learn., 20(3):273–297, Sept. 1995. ISSN 0885-6125. doi:10.1023/A:1022627411411. URL https://doi.org/10.1023/A:1022627411411. → page 5

[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009. → page 10

[13] V. I. Dokuchaev and Y. N. Eroshenko. Recurrent Fast Radio Bursts from Collisions of Neutron Stars in the Evolved Stellar Clusters, 2017. → page 3

[14] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pages 226–231. AAAI Press, 1996. URL http://dl.acm.org/citation.cfm?id=3001460.3001507. → page 6

[15] E. Fonseca, B. Andersen, M. Bhardwaj, P. Chawla, D. Good, A. Josephy, V. Kaspi, K. Masui, R. Mckinven, D. Michilli, et al. Nine New Repeating Fast Radio Burst Sources from CHIME/FRB. arXiv preprint arXiv:2001.03595, 2020. → pages 3, 36

[16] J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2009. ISBN 978-0387848570. → pages 12, 14

[17] V. Gajjar, A. P. V. Siemion, D. C. Price, C. J. Law, D. Michilli, J. W. T. Hessels, S. Chatterjee, A. M. Archibald, G. C. Bower, C. Brinkman, and et al. Highest Frequency Detection of FRB 121102 at 4–8 GHz Using the Breakthrough Listen Digital Backend at the Green Bank Telescope. The Astrophysical Journal, 863(1):2, Aug 2018. ISSN 1538-4357. doi:10.3847/1538-4357/aad005. URL http://dx.doi.org/10.3847/1538-4357/aad005. → page 10

[18] G. Golpayegani, D. R. Lorimer, S. W. Ellingson, D. Agarwal, O. Young, F. Ghigo, R. Prestage, K. Rajwade, M. A. McLaughlin, and M. Mingyar. GBTrans: A commensal search for radio pulses with the Green Bank twenty metre telescope. Monthly Notices of the Royal Astronomical Society, Sep 2019. ISSN 1365-2966. doi:10.1093/mnras/stz2424. URL http://dx.doi.org/10.1093/mnras/stz2424. → page 10

[19] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org. → pages v, x, 12, 13, 15, 17, 18, 19, 24, 28

[20] P. Guo, F. Duan, P. Wang, Y. Yao, Q. Yin, and X. Xin. Pulsar Candidate Identification with Artificial Intelligence Techniques, 2017. → page 8

[21] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization, 2014. → page 28

[22] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput., 1(4):541–551, Dec. 1989. ISSN 0899-7667. doi:10.1162/neco.1989.1.4.541. URL http://dx.doi.org/10.1162/neco.1989.1.4.541. → page 15

[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998. ISSN 1558-2256. doi:10.1109/5.726791. → pages x, 20

[24] D. R. Lorimer and M. Kramer. Handbook of Pulsar Astronomy, volume 4. 2004. → page 1

[25] D. R. Lorimer, M. Bailes, M. A. McLaughlin, D. J. Narkevic, and F. Crawford. A Bright Millisecond Radio Burst of Extragalactic Origin. Science, 318(5851):777–780, Nov 2007. ISSN 1095-9203. doi:10.1126/science.1147532. URL http://dx.doi.org/10.1126/science.1147532. → pages 1, 2

[26] R. N. Manchester, G. B. Hobbs, A. Teoh, and M. Hobbs. The Australia Telescope National Facility Pulsar Catalogue. The Astronomical Journal, 129(4):1993–2006, Apr 2005. ISSN 1538-3881. doi:10.1086/428488. URL http://dx.doi.org/10.1086/428488. → page 7

[27] B. D. Metzger, B. Margalit, and L. Sironi. Fast radio bursts as synchrotron maser emission from decelerating relativistic blast waves. Monthly Notices of the Royal Astronomical Society, 485(3):4091–4106, Mar 2019. ISSN 1365-2966. doi:10.1093/mnras/stz700. URL http://dx.doi.org/10.1093/mnras/stz700. → page 3

[28] L. B. Newburgh, G. E. Addison, M. Amiri, K. Bandura, J. R. Bond, L. Connor, J.-F. Cliche, G. Davis, M. Deng, N. Denman, and et al. Calibrating CHIME: A New Radio Interferometer to Probe Dark Energy. Ground-based and Airborne Telescopes V, Jul 2014. doi:10.1117/12.2056962. URL http://dx.doi.org/10.1117/12.2056962. → page 3

[29] T. Oosterloo, M. Verheijen, W. van Cappellen, L. Bakker, G. Heald, and M. Ivashina. Apertif - the focal-plane array system for the WSRT, 2009. → page 10

[30] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019. → page 28

[31] E. Petroff, M. Bailes, E. D. Barr, B. R. Barsdell, N. D. R. Bhat, F. Bian, S. Burke-Spolaor, M. Caleb, D. Champion, P. Chandra, and et al. A real-time fast radio burst: polarization detection and multiwavelength follow-up. Monthly Notices of the Royal Astronomical Society, 447(1):246–255, Dec 2014. ISSN 0035-8711. doi:10.1093/mnras/stu2419. URL http://dx.doi.org/10.1093/mnras/stu2419. → page 10

[32] E. Petroff, E. D. Barr, A. Jameson, E. F. Keane, M. Bailes, M. Kramer, V. Morello, D. Tabbara, and W. van Straten. FRBCAT: The Fast Radio Burst Catalogue. Publications of the Astronomical Society of Australia, 33, 2016. ISSN 1448-6083. doi:10.1017/pasa.2016.35. URL http://dx.doi.org/10.1017/pasa.2016.35. → pages 2, 7

[33] E. Petroff, S. Burke-Spolaor, E. F. Keane, M. A. McLaughlin, R. Miller, I. Andreoni, M. Bailes, E. D. Barr, S. R. Bernard, S. Bhandari, and et al. A polarized fast radio burst at low galactic latitude. Monthly Notices of the Royal Astronomical Society, May 2017. ISSN 1365-2966. doi:10.1093/mnras/stx1098. URL http://dx.doi.org/10.1093/mnras/stx1098. → page 10

[34] E. Petroff, J. W. T. Hessels, and D. R. Lorimer. Fast Radio Bursts. The Astronomy and Astrophysics Review, 27(1), May 2019. ISSN 1432-0754. doi:10.1007/s00159-019-0116-6. URL http://dx.doi.org/10.1007/s00159-019-0116-6. → page 1

[35] E. Platts, A. Weltman, A. Walters, S. Tendulkar, J. Gordin, and S. Kandhai. A living theory catalogue for fast radio bursts. Physics Reports, 821:1–27, Aug 2019. ISSN 0370-1573. doi:10.1016/j.physrep.2019.06.003. URL http://dx.doi.org/10.1016/j.physrep.2019.06.003. → pages 1, 3

[36] J. Prewitt and B. Lipkin. “Object Enhancement and Extraction”. In Picture Processing and Psychopictorics. Academic Press, 1970. → pages 15, 19

[37] K. M. Rajwade, D. Agarwal, D. R. Lorimer, N. M. Pingel, D. J. Pisano, M. Ruzindana, B. Jeffs, K. F. Warnick, D. A. Roshi, and M. A. McLaughlin. A 21 cm pilot survey for pulsars and transients using the Focal L-Band Array for the Green Bank Telescope. Monthly Notices of the Royal Astronomical Society, 489(2):1709–1718, Aug 2019. ISSN 1365-2966. doi:10.1093/mnras/stz2207. URL http://dx.doi.org/10.1093/mnras/stz2207. → page 10

[38] S. S. Sarwar, P. Panda, and K. Roy. Gabor filter assisted energy efficient fast learning Convolutional Neural Networks. 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Jul 2017. doi:10.1109/islped.2017.8009202. URL http://dx.doi.org/10.1109/ISLPED.2017.8009202. → page 19

[39] R. M. Shannon, J.-P. Macquart, K. W. Bannister, R. D. Ekers, C. W. James, S. Osłowski, H. Qiu, M. Sammons, A. Hotan, M. A. Voronkov, R. J. Beresford, M. Brothers, A. J. Brown, J. D. Bunton, A. Chippendale, C. Haskins, M. Leach, M. Marquarding, D. McConnell, M. Pilawa, E. M. Sadler, E. Troup, J. Tuthill, M. T. Whiting, J. R. Allison, C. S. Anderson, M. E. Bell, J. D. Collier, G. Gürkan, G. Heald, and C. J. Riseley. The dispersion–brightness relation for fast radio bursts from a wide-field survey. Nature, 562:386–390, 2018. → page 10

[40] I. Sobel. An Isotropic 3x3 Image Gradient Operator. Presentation at Stanford A.I. Project 1968, Feb 2014. → page 19

[41] L. G. Spitler, P. Scholz, J. W. T. Hessels, S. Bogdanov, A. Brazier, F. Camilo, S. Chatterjee, J. M. Cordes, F. Crawford, J. Deneva, and et al. A repeating fast radio burst. Nature, 531(7593):202–205, Mar 2016. ISSN 1476-4687. doi:10.1038/nature17168. URL http://dx.doi.org/10.1038/nature17168. → page 2

[42] L. Staveley-Smith, W. Wilson, T. Bird, M. Disney, R. Ekers, K. Freeman, R. Haynes, M. Sinclair, R. Vaile, R. Webster, et al. The Parkes 21 cm multibeam receiver. Publications of the Astronomical Society of Australia, 13(3):243–248, 1996. → page 2

[43] M. P. Surnis, D. Agarwal, D. R. Lorimer, X. Pei, G. Foster, A. Karastergiou, G. Golpayegani, R. J. Maddalena, S. White, W. Armour, and et al. GREENBURST: A commensal Fast Radio Burst search back-end for the Green Bank Telescope. Publications of the Astronomical Society of Australia, 36, 2019. ISSN 1448-6083. doi:10.1017/pasa.2019.26. URL http://dx.doi.org/10.1017/pasa.2019.26. → page 10

[44] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015. → pages xii, 34, 35

[45] D. Thornton, B. Stappers, M. Bailes, B. Barsdell, S. Bates, N. D. R. Bhat, M. Burgay, S. Burke-Spolaor, D. J. Champion, P. Coster, and et al. A population of fast radio bursts at cosmological distances. Science, 341(6141):53–56, Jul 2013. ISSN 1095-9203. doi:10.1126/science.1236789. URL http://dx.doi.org/10.1126/science.1236789. → page 10

[46] Y. G. Zhang, V. Gajjar, G. Foster, A. Siemion, J. Cordes, C. Law, and Y. Wang. Fast Radio Burst 121102 Pulse Detection and Periodicity: A Machine Learning Approach. The Astrophysical Journal, 866(2):149, Oct 2018. ISSN 1538-4357. doi:10.3847/1538-4357/aadf31. URL http://dx.doi.org/10.3847/1538-4357/aadf31. → pages 8, 10, 24

[47] W. W. Zhu, A. Berndsen, E. C. Madsen, M. Tan, I. H. Stairs, A. Brazier, P. Lazarus, R. Lynch, P. Scholz, K. Stovall, and et al. Searching for pulsars using image pattern recognition. The Astrophysical Journal, 781(2):117, Jan 2014. ISSN 1538-4357. doi:10.1088/0004-637x/781/2/117. URL http://dx.doi.org/10.1088/0004-637X/781/2/117. → page 8
