ASTEROIDS-FOR-ALL:

BUILDING AN OPEN-SOURCE ASTEROID DETECTION LINKING SOFTWARE

by

Nicole Tan Jieyi

Submitted to Wellesley College

in Partial Fulfillment of the Prerequisite for Honors in Astrophysics

under the advisement of Carrie Nugent and Kim McLeod

May 2020

© Copyright by NICOLE TAN JIEYI, 2020

ACKNOWLEDGMENTS

I’d like to acknowledge that this work was partially written during the 2019-2020 coronavirus pandemic. My gratitude to everyone that I will mention below is infinitely greater because of it; despite the challenging conditions of countless back and forth emails, Zoom calls, scheduling conflicts with respect to a 12 hour time difference and the general state of the world, everyone has been immensely accommodating and kind in pushing me through the final stages of the thesis process.

I’d like to express my deepest gratitude to my off-campus advisor, Professor Carrie Nugent at Olin College, for first taking me on as a summer student and then taking on the huge responsibility of becoming my thesis advisor over the academic year. Thank you for introducing me to the world (no pun intended) of asteroids and Solar System objects. I cannot begin to thank you enough for everything you’ve done for me, and I will miss our weekly thesis meetings.

Thank you to my on-campus advisor, Professor Kim McLeod, for being incredibly supportive throughout the entire thesis process. Thank you for all your comments and advice through the different thesis drafts, and your continued faith in my abilities—not just during this thesis process, but throughout my four years at Wellesley.

Many thanks to the rest of my thesis committee for not just their time in reviewing my thesis, but also for their individual contributions to my time here at Wellesley. Professor James Battat, for his role as my major advisor and a source of continued guidance; Professor Wes Watters, for his help in crafting my KNAC presentation on my initial summer work for this project; and Professor Lisa Rodensky, who deserves credit for any semblance of good writing I’ve done (and I am sorry for all the passive voice constructions).

Thanks also to my friends and family. The Astro Squad, Maura Shea and Karisa Zdanky, for all the laughs (and tears) during our shared struggles through the astrophysics major. Laurel Stickney, my roommate of four years, for all the interesting conversations, both profound and nonsensical, that have acted as a sounding board for most of my ideas. My family for supporting me through this journey, and for leaving me (mostly) alone to write this.

This work would not have been possible without Bill Gray’s wonderful software, Find_Orb.

Thank you for putting it out there for all to use, and for your continuous updates, meticulous documentation and extremely helpful answers to all my questions, from software installation to orbital mechanics.

This research has made use of data and services provided by the International Astronomical Union’s Minor Planet Center, and has made use of the NASA/IPAC Infrared Science Archive, which is funded by the National Aeronautics and Space Administration and operated by the California Institute of Technology. The Pan-STARRS1 Surveys (PS1) and the PS1 public science archive have been made possible through contributions by the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, the Queen’s University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation Grant No. AST-1238877, the University of Maryland, Eotvos Lorand University (ELTE), the Los Alamos National Laboratory, and the Gordon and Betty Moore Foundation.

ASTEROIDS-FOR-ALL:

BUILDING AN OPEN-SOURCE ASTEROID DETECTION LINKING SOFTWARE

by Nicole Tan Jieyi, B.A. Wellesley College May 2020

ABSTRACT

We developed the first free and open-source asteroid detection linking software for use with time-series sky survey data. It takes cleaned Source Extractor outputs of non-stationary detections from the images and finds tracklets representing asteroid motion within them. The linking software is a Python-based program that incorporates Find_Orb, an orbit determination program by Bill Gray. The bulk of the program identifies candidate tracklets based on exposure-to-exposure object movement. The program runs through all possible combinations of observation linkages, feeds possible tracklets through Find_Orb, and identifies candidate tracklets based on user-defined criteria including angular speed, angle between detections and mean residuals from Find_Orb. We tested the software with data from the Near-Earth Asteroid Tracking (NEAT) survey and from the Palomar Transient Factory (PTF). For NEAT tests with artificial noise added, we found that the code performed ideally for noise levels up to 80%. In the PTF tests, we discovered a previously unreported asteroid position. Subsequently, we have submitted this tracklet to the Minor Planet Center. Our trials show that under ideal conditions, the linking software is able to perform on par with the linking capabilities of the NEAT and PTF systems, which have a more complex design.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 Introduction
1.1 Project overview
1.2 Overview of asteroids
1.2.1 Small body taxonomy
1.3 Asteroid significance
1.3.1 Clues to the past
1.3.2 Keys to further space exploration
1.3.3 A potential hazard
1.4 Detection method
1.4.1 Introduction to asteroid surveys
1.4.2 Detection linking
1.5 Scope of work
1.5.1 Expected inputs and outputs
1.5.2 Purpose

CHAPTER 2 Methods
2.1 Software overview
2.2 Find_Orb
2.2.1 Purpose
2.2.2 Find_Orb method
2.2.3 Find_Orb limitations
2.3 Tracklet Screening
2.3.1 Data handling
2.3.2 Software parameters
2.3.3 Recursion
2.3.4 Screening passes

CHAPTER 3 Code testing with NEAT
3.1 NEAT overview
3.2 NEAT data
3.3 Code modifications for NEAT
3.4 Noise addition
3.5 Findings

CHAPTER 4 Applications to PTF data
4.1 PTF overview
4.2 Accessing PTF data
4.3 Star removal
4.3.1 Accessing Pan-STARRS data
4.3.2 Code for star removal
4.4 Trials
4.4.1 Testing on 20’ PTF data

CHAPTER 5 Conclusions

APPENDIX A
A.1 Code
A.1.1 linking_library.py
A.1.2 add_library.py
A.1.3 FindPOTATOs.py

LIST OF TABLES

1.1 Types of objects at hypothetical distances and their calculated orbital and angular speed, rounded to two significant figures.

1.2 Specifications for the Near-Earth Asteroid Tracking (NEAT), Palomar Transient Factory (PTF), and Outer Solar System Origins Survey (OSSOS) surveys, taken from Pravdo et al. (1999), Law et al. (2009) and Bannister et al. (2016). The streak limits for NEAT and PTF are calculated based on pixel size and exposure. For seeing-limited OSSOS, the streak limit is based on exposure and an assumed 2" seeing.

LIST OF FIGURES

1.1 MOID is a measure of the distance between the closest point of two orbits. This diagram shows the Earth MOID for a hypothetical NEO.

1.2 An example of a tracklet found by the Catalina Sky Survey. These three images of asteroid 2010RX30 are of the same field of the sky. From exposure to exposure, most sources are stationary, except for detections of the asteroid (circled). This series of three detections is a tracklet.

2.1 θ measurement in relation to the three detections.

2.2 Overview of the main linking process, assuming one tracklet is made of three detections. Pre-Find_Orb parameters are the angular speed and angle limit; Post-Find_Orb parameters are MOID limit and mean residual limit. Not meeting any one of these parameters causes the function to retry with another group of three detections.

3.1 Accuracy measurements for linking software based on number of good tracklets and number of false positive tracklets. Note that noise percentage refers to percent of total input data that is noise; e.g. 20% noise is a 2:8 noise to true detection ratio, while 90% noise is a 9:1 ratio.

Chapter One

Introduction

1.1 Project overview

We are building Python-based software to be used for the detection of asteroids. This software will analyze catalog data obtained from astronomical images of the sky taken by asteroid-hunting surveys. Locating asteroids, especially ones that are potentially hazardous, is key to avoiding a potentially cataclysmic event. By making the software freely available, we hope to enable smaller observatories to find asteroids in their data.

1.2 Overview of asteroids

Asteroids are small, rocky or metallic inner Solar System objects that orbit the Sun. They are minor planets—too small to have a large gravitational effect over a significant region of space, but large enough to be detected individually by a telescope. They are diverse in size and composition across varied distances from the Sun (DeMeo & Carry, 2014).

1.2.1 Small body taxonomy

Asteroids are just one of the many types of small bodies in the Solar System. Small bodies are largely characterised by their orbital distance from the Sun, composition and size.

For this work, we will be focusing our efforts toward asteroids. However, the nature of asteroid detection (which is largely just looking at moving point sources) means that other similarly sized small bodies may pop up as part of the process every so often. Therefore, we will need to define the other possible objects that we may encounter and detect.

Minor planet populations

Minor planets can broadly be grouped based on their orbits. The ones found by our detection linking software will likely be part of one of these groups:

• The asteroid belt, where the majority of asteroids reside. The belt is formed by asteroids with orbits that lie between Mars and Jupiter (IAU, 2013). These asteroids are termed main belt asteroids (MBAs).

• Near-Earth objects (NEOs), which are somewhat misleadingly named. NEOs are defined as minor planets that are less than 1.4 AU from the Sun at closest approach (IAU, 2013). A subset of NEOs are Near-Earth asteroids (NEAs), the rest being comets.

• Kuiper belt objects (KBOs) are minor planets found in a disc that extends from Neptune’s orbit to 50 AU from the Sun (Stern & Colwell, 1997). These are technically not asteroids because they are too far away from the Sun, and are also largely composed of ices.

• Comets are small bodies that are defined by having a visible coma and sometimes a tail when close to the Sun, caused by gases being released due to heat. They can be found throughout the Solar System. The line between asteroids and comets is hard to distinguish because comets may lurk as asteroids if they have yet to display a coma. In instances such as these (or when a comet has a very small coma), the comet will be detected by our software as an asteroid. Comets only make up ∼0.5% of discovered small bodies (MPC, 2020), so this situation occurring is not a major concern.

• Some other classifications of small bodies include Trans-Neptunian objects (TNOs), Jupiter trojans, and Centaurs.

Small body mechanics

Because the different populations of small bodies are located in different regions of the Solar System, they also move with different speeds. For a body in gravitational orbit around the Sun, the motion is governed by the orbital velocity $v_{\rm orb}$, which for a circular orbit is $v_{\rm orb} = \sqrt{GM_{\rm Sun}/r}$, where $M_{\rm Sun}$ is the mass of the Sun and $r$ is the heliocentric distance. We can also calculate the angular speed across the sky. Using the small angle approximation, the angular speed $v_{\rm ang}$ is equal to the speed of the object relative to Earth ($v_{\rm rel}$) divided by its distance from the Earth ($r_{\rm rel}$). Since $\vec{v}_{\rm rel}$ is just the difference between the orbital velocity of the object and the orbital velocity of the Earth, and, for an asteroid near opposition, $r_{\rm rel}$ is the heliocentric distance minus Earth's distance from the Sun, we can write $v_{\rm ang}$ as:

$$v_{\rm ang} = \frac{v_{\rm rel}}{r_{\rm rel}} = \left| v_{\rm orb} - v_{\rm orb,Earth} \right| \frac{1}{r - r_{\rm Earth}} = \left| \sqrt{\frac{GM_{\rm Sun}}{r}} - \sqrt{\frac{GM_{\rm Sun}}{r_{\rm Earth}}} \right| \frac{1}{r - r_{\rm Earth}}$$

With this expression for $v_{\rm ang}$, we can get a rough estimate of how fast we expect various objects to be moving across the sky as viewed from Earth. This is helpful to know when we are trying to identify moving objects. Table 1.1 contains calculated orbital and angular speeds for hypothetical minor planets observed at opposition, with orbits assumed to be minimally inclined, prograde, and circular.

Object   Heliocentric distance (AU)   Orbital speed (m/s)   Angular speed ("/min)
NEA      1.5                          2.4 × 10^4            0.9
MBA      2                            2.1 × 10^4            0.7
KBO      40                           4.7 × 10^3            0.05

Table 1.1 Types of objects at hypothetical distances and their calculated orbital and angular speed, rounded to two significant figures.
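As a quick check, the values in Table 1.1 can be reproduced directly from the expression for $v_{\rm ang}$ above. The short script below is purely illustrative and is not part of FindPOTATOs; the constants and the function name are our own.

import numpy as np

GM_SUN = 1.327e20      # gravitational parameter of the Sun, m^3 s^-2
AU = 1.496e11          # astronomical unit, m
RAD_TO_ARCSEC = 206265.0

def angular_speed_at_opposition(r_au):
    """On-sky angular speed ("/min) of a circular, prograde orbit seen at opposition."""
    r = r_au * AU
    r_earth = 1.0 * AU
    v_orb = np.sqrt(GM_SUN / r)              # object's circular orbital speed
    v_earth = np.sqrt(GM_SUN / r_earth)      # Earth's circular orbital speed
    v_rel = abs(v_orb - v_earth)             # relative speed near opposition
    v_ang_rad_per_s = v_rel / (r - r_earth)  # small-angle approximation
    return v_ang_rad_per_s * RAD_TO_ARCSEC * 60.0

for name, r_au in [("NEA", 1.5), ("MBA", 2.0), ("KBO", 40.0)]:
    print(name, round(angular_speed_at_opposition(r_au), 2), '"/min')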

1.3 Asteroid significance

Asteroids hold a unique status amongst celestial entities because of their value in astronomy as well as for broader human society. Since some are composed of the oldest and least altered material in the Solar System, their composition can give us clues as to how planets were formed and how life on Earth started (Alexander et al., 2012). While the answers to these scientific questions have major ramifications for civilisation at large, asteroids also hold relevance for day-to-day life. With a large enough asteroid, an impact could cause a mass extinction and major climatic change.

1.3.1 Clues to the past

The Solar System began 4.5 billion years ago with the gravitational collapse of part of a giant molecular cloud. Most of the collapsed mass formed the Sun, while the remaining material became a swirling disk of dust.

Over time, much of the disk’s mass was depleted by the gravitational attraction of the Sun. What was left over from this stellar accretion process is what makes up Solar System objects like asteroids, comets and planets. These bodies were formed in the protoplanetary disk, as dust particles coalesced to form larger objects.

To understand how the planets formed from small particles, we can turn to asteroids. Some asteroids appear to be protoplanets. Like planets, these asteroids are compositionally layered, with the densest material at their cores (Thomas et al., 2005). This shows that these bodies were potential planets, stunted in their evolution. Studying these asteroids thus gives us a better understanding of planetary formation.

Asteroids can also shed light on the origins of life. Some asteroids have been found to contain the building blocks of life, such as water and organic molecules. It is believed that similar objects fell to Earth, bringing materials necessary for life (Alexander et al., 2012).

1.3.2 Keys to further space exploration

The diverse orbits of asteroids make them great stepping stones for further space exploration.

After the Moon, NEAs are the next step for human spacefaring, and can help pave the way for further missions to Mars (Jones et al., 1994). These NEA missions would be useful for testing technologies needed for further Solar System destinations. Currently, NEAs that are potentially accessible by human round-trip missions have already been defined and identified by the Near-Earth Object Human Space Flight Accessible Targets Study (NHATS) (Abell et al., 2015).

An additional benefit to exploring asteroids is the identification of resources available.

The variety of materials found on asteroids (organic molecules, metals, water) suggests that apart from being destinations in and of themselves, asteroids have the potential to act as “pit stops” (Abell et al., 2015). Successfully extracting and utilizing the resources of asteroids could allow for longer and further missions. This would also mean that fewer supplies would be needed for missions, freeing space for other payloads or systems.

1.3.3 A potential hazard

Predicting asteroid impacts is perhaps the most pragmatic reason to study asteroids. The last major extinction event, which wiped out three-quarters of plant and animal species, was caused by an impact.

Figure 1.1 MOID is a measure of the distance between the closest point of two orbits. This diagram shows the Earth MOID for a hypothetical NEO.

Potentially hazardous asteroids (PHAs) are NEAs that are large enough to cause significant damage in the event of a collision. PHAs are defined by two criteria: distance and size, as an asteroid must be close enough and large enough to be deemed a threat. To assess potential close approaches, the minimum orbit intersection distance (MOID) is used (Fig. 1.1). Size is inferred using absolute magnitude, H, which is a brightness measure used for planets and asteroids. An H magnitude is the magnitude of an object, assuming it is 1 AU from the Sun and the observer (in an impossible configuration where the observer is located at the center of the Sun). An asteroid is classed as a PHA if it has a MOID < 0.05 AU and H < 22 (corresponding to a diameter of roughly 140 m) (Perna et al., 2015). By monitoring and calculating the orbits of PHAs, we are able to accurately predict possible impact events and take precautions if necessary.
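To make the correspondence between H < 22 and the quoted 140 m concrete, the standard conversion between absolute magnitude and diameter can be used (this relation is not given in the text above, and the geometric albedo $p_V = 0.14$ is an assumed typical value):

$$ D \approx \frac{1329\ \mathrm{km}}{\sqrt{p_V}}\, 10^{-H/5}, \qquad D(H = 22,\ p_V = 0.14) \approx \frac{1329}{\sqrt{0.14}} \times 10^{-4.4}\ \mathrm{km} \approx 0.14\ \mathrm{km} = 140\ \mathrm{m}. $$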

Impact avoidance

While there are many asteroid deflection ideas, none have been tested so far. However, the three most feasible methods (Harris et al., 2015) are:

• Kinetic impactors. An impact by a massive hypervelocity spacecraft transfers momentum to a target PHA, and is capable of deflecting the asteroid’s course (Harris et al., 2015).

• Gravity tractors. These are powered spacecraft that hover near a target PHA and use the gravitational attraction between themselves and the asteroid to slowly move the asteroid off-course (Harris et al., 2015). This technique is only feasible for small NEAs that are some time away from collision, because the gravitational force exerted by a tractor would be relatively small.

• Blast deflection. This method is the only potential option for a very large hazardous asteroid with little time for deflection. Blast deflection involves using an explosive either close to or on the surface of a PHA (Harris et al., 2015). The goal of such a detonation is to alter an asteroid’s trajectory. An asteroid should only be completely disrupted if the resulting debris would not pose a hazard as impactors themselves.

What technique (or combination of techniques) is used in deflection is completely situation dependent; the hazardous asteroid’s size, distance and composition are all factors to consider.

However, in all cases, early detection is key—finding hazardous asteroids is always the first step to mitigating the risks posed by an impact.

1.4 Detection method

1.4.1 Introduction to asteroid surveys

Currently, the preferred method to identify asteroids is by their motion. In still images, asteroids appear as point sources just as stars do, which makes them difficult to tell apart. Thus all asteroid detection surveys involve taking multiple exposures of the same field of sky, and then looking for detections across the exposures that may correspond to a moving object (e.g., Mainzer et al. (2011)). Because an asteroid’s brightness can vary greatly during its rotational and orbital period, it is generally the motion alone that is used for identification.

The time between exposures (known as the survey cadence) is relatively short, so that far away objects (like stars) appear as stationary detections from exposure to exposure. The data is then “cleaned” by removing these stationary detections, either through exposure-to-exposure comparisons (e.g., Masci et al. (2019)) or by comparing source positions to a pre-existing catalog.

However, besides asteroids (and other minor planets), other objects that remain after stationary object filtering include cosmic rays and telescope artifacts such as glints and diffraction spikes. We can isolate asteroids from this noise because, as orbital objects, asteroids can be identified by their specific motion across the sky.

Survey variances

Many different surveys for small body detection exist, and they vary in terms of cadence, exposure and equipment. They are optimised based on their detection technique and their objects of interest.

Survey   Field of View (°)   Cadence (min)   Exposure time (s)   Pixel size (")   Streak limit ("/min)
NEAT     1.6                 15              20                  1.4              8.4
PTF      7.3                 45              60                  1                2
OSSOS    0.9                 varies          287                 0.184            0.42

Table 1.2 Specifications for the Near-Earth Asteroid Tracking (NEAT), Palomar Transient Factory (PTF), and Outer Solar System Origins Survey (OSSOS) surveys, taken from Pravdo et al. (1999), Law et al. (2009) and Bannister et al. (2016). The streak limits for NEAT and PTF are calculated based on pixel size and exposure. For seeing-limited OSSOS, the streak limit is based on exposure and an assumed 2" seeing.

Table 1.2 shows the differences between three surveys. The streak limit is the maximum on-sky motion an object can have while remaining a point source in a survey’s images. If a minor planet moves any faster, it will appear as a streak. The streak limits in the table are rough estimates based on the pixel size and exposure of the surveys; however, they provide a useful ballpark to understand how surveys are catered to specific small body populations. NEAT and PTF are surveys that are used in this project and are used to find asteroids, while OSSOS is focused on objects in the outer Solar System, such as KBOs. As seen in Table 1.2, the streak limit for OSSOS is significantly lower than for NEAT or PTF, which makes sense as it looks at objects that are further away, and thus are moving slower. NEAT and PTF look for asteroids, and so have a higher streak limit (Ye et al., 2019).
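For reference, the tabulated streak limits can be reproduced if an object is allowed to move by roughly two pixels (or, for OSSOS, one seeing disk) during a single exposure; this factor of two is our reading of the numbers and is not stated in the text. A minimal check in Python:

def streak_limit(scale_arcsec, exposure_s):
    """Angular speed ("/min) at which an object trails by scale_arcsec in one exposure."""
    return scale_arcsec / exposure_s * 60.0

print(streak_limit(2 * 1.4, 20))   # NEAT: ~2 pixels of 1.4" in a 20 s exposure -> 8.4
print(streak_limit(2 * 1.0, 60))   # PTF:  ~2 pixels of 1.0" in a 60 s exposure -> 2.0
print(streak_limit(2.0, 287))      # OSSOS: one 2" seeing disk in 287 s         -> ~0.42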

1.4.2 Detection linking

Figure 1.2 An example of a tracklet found by the Catalina Sky Survey. These three images of asteroid 2010RX30 are of the same field of the sky. From exposure to exposure, most sources are stationary, except for detections of the asteroid (circled). This series of three detections is a tracklet.

Detection linking is used to establish whether any set of non-stationary detections across the multiple exposures can correspond to a single asteroid moving across the field of view. Linked detections that follow asteroid-like motion across the sky form a tracklet (Fig. 1.2). There are several methods of linking described in the literature (Denneau et al., 2013; Granvik & Muinonen, 2005; Kubica et al., 2007; Holman et al., 2018; Jones et al., 2018), but all share the commonality of verifying tracklets based on whether plausible orbits can be found based on exposure-to-exposure movement.

1.5 Scope of work

For this thesis, we developed a free and open-source asteroid detection linking software.

The software identifies all tracklets from multi-exposure observations over one night. It does so by running through all combinations of observations, finding candidate tracklets based on exposure-to-exposure separation, then confirming the tracklets via Find_Orb.

Find_Orb is an orbit-determination software created by Bill Gray (Gray, 2017). Find_Orb calculates a best-fit orbit to match detections that it is given, and outputs, among other things, orbital elements and the mean residual for how well the best-fit orbit matches the detections.

The linking software uses a script to quickly and efficiently run through all possible combinations of moving detections that may form a tracklet. Plausible tracklets are run through Find_Orb. The linking software accepts tracklets that have sensible mean residuals and asteroid-like orbits determined by Find_Orb.

The majority of the processing lies in finding candidate tracklets using Python. Pre-screening tracklets before running them through Find_Orb is essential, as the large number of combinations makes it time-prohibitive to run them all through Find_Orb. Another key component is defining what “asteroid-like” orbits mean when checking tracklets with Find_Orb.

These two considerations (optimizing tracklet pre-screening and Find_Orb screening) are therefore two major areas of focus, and are detailed in Chapter Two.

1.5.1 Expected inputs and outputs

Our asteroid detection linking software is not designed to be used on raw image files. Instead, it is meant to be used as the last stage in the detection process. Our linking software requires data that has been processed by Source Extractor (Bertin & Arnouts, 1996). Source Extractor is a program that reduces images into catalogs of detections with measured properties (position, flux, etc.), which is the appropriate format for the linking software. The input data should also be “cleaned” by having stationary detections removed from the Source Extractor catalog.

The linking software outputs a list of tracklets found from the given input in the Minor Planet Center’s (MPC) 80-column format1. This allows found tracklets to be submitted to the MPC for approval.

1.5.2 Purpose

While several detection linking software packages currently exist (Denneau et al., 2013; Holman et al., 2018; Jones et al., 2018), none are freely available. We hope that by making our linking software available for all, especially smaller observatories and citizen scientists, we can increase asteroid detections on a broader scale. Therefore, to make our linking software as broadly applicable as possible, we designed the software to be compatible with different surveys of different cadences and exposures per field.

While this work can help to increase asteroid discoveries, it can also be used for asteroid pre-covery. Newly discovered asteroids may have been missed by previous asteroid surveys; by re-analyzing archival data with our linking software, it may be possible to unearth these previously missed tracklets. With pre-covered tracklets, we are able to get astrometric positions over a long period of time. These longer observation arcs contribute to a better constraint on the asteroid’s orbit. Having accurate measurements of an asteroid’s orbital elements allow for follow-up observations to be made, especially of NEAs.

Our software is named Find Point-source Object Tracklets Affirmed Through Orbit-fits (FindPOTATOs). It is so named because, arguably, asteroids look like potatoes2, and like potatoes, the code is designed to be basic, versatile and adaptable. The name also pays tribute to Find_Orb.

1A full description of the use of each column can be found at https://www.minorplanetcenter.net/iau/info/OpticalObs.html
2Please refer to https://apod.nasa.gov/apod/ap200401.html

Chapter Two

Methods

2.1 Software overview

FindPOTATOs is Python-based software that incorporates Find_Orb, an orbit determination program (Gray, 2017). The bulk of the program identifies candidate tracklets based on exposure-to-exposure object movement. The program runs through all possible combinations of observation linkages and identifies candidate tracklets based on pre-defined parameters.

These candidates are then fed through Find_Orb, which fits an orbit to each of them. A tracklet is accepted if Find_Orb’s orbit fit gives a residual that is below a threshold; this threshold can be adjusted. This process can also be done in passes whereby less stringent parameters are applied in subsequent passes to allow for flexibility in allowed orbits.

2.2 Find_Orb

2.2.1 Purpose

Find_Orb is C++-based software developed by Bill Gray. The program takes a series of observations of an asteroid, comet, or natural or artificial satellite and finds the corresponding orbit. It takes in observational data in the MPC’s 80-column observation format or in the AstDyS/NEODys .rwo format, and returns a .txt file containing orbital elements and the mean residual for the fitted orbit.

Find_Orb can be used for a variety of purposes, such as generating ephemerides for object tracking, testing the quality of astrometric observations, computing impact probabilities, and more. For our purposes, we used Find_Orb to make sure that it is physically possible for a series of detections to constitute a tracklet. Note, however, that just because Find_Orb is able to fit an orbit to detections that it is fed does not mean these detections are likely to be a tracklet. Find_Orb tries its best to fit an orbit to detections, but these orbits could be highly unlikely. Therefore, FindPOTATOs can be used to eliminate these unlikely orbits.

Find_Orb is distributed under GPL (GNU General Public License), version 2, so using it as part of FindPOTATOs does not affect its free and open-source nature.

2.2.2 Find_Orb method

There are many different methods of orbital fitting, and many elegant solutions exist as outlined in textbooks. However, orbit fitting codes, like Find_Orb, use simpler tools that are computationally efficient. Since the orbits that will be fed into Find_Orb from FindPOTATOs tend to be of short arcs (on the order of an hour), Find_Orb will default to using statistical ranging for orbit determination (B. Gray 2020, personal communication).

Statistical Ranging

Statistical ranging involves using the first and last detection to calculate plausible orbits, fitting the rest of the observations to these orbits, and then picking the most likely orbit out of the possibilities.

The first step involves figuring out a possible distance to the hypothetical object. There are hard limits to this distance: we assume that the object cannot be within the Earth’s atmosphere, and it cannot be interstellar. To narrow down the possible distances, we can get ranges of possible distances based on the locations of the first and last detection, and the time difference between them.

By treating the distance to the hypothetical object as a variable, we can find possible ranges for it. This involves finding the physical separation between the two detections. The maximum separation possible between the two detections is governed by the Solar System escape velocity of the object, which relies on the object’s distance from Earth. The minimum separation between the detections can be expressed as a function of the on-sky positions of the two detections, and the distance to the object. We can then write the maximum and minimum separations as an inequality statement1 with both sides a function of object distance. Solving this inequality results in possible ranges for the object distance from Earth.

Find_Orb then makes random guesses for an object distance within these distance ranges, and then makes a radial velocity guess within the escape velocity limit. This results in only one possible orbit that can fit the first and last detections. Then, Find_Orb compares how well this orbit fits to the other detections it is fed, and adjusts the orbit slightly to get lower residuals across detections. Find_Orb repeats this process of making guesses for object distance and radial velocity, in order to find an orbit with low residual fits across detections.
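A minimal sketch of that sampling step (our reconstruction, not Find_Orb's actual code; the distance range and all names here are illustrative):

import numpy as np

GM_SUN = 1.327e20   # m^3 s^-2
AU = 1.496e11       # m

def escape_speed(r_m):
    """Solar System escape speed (m/s) at heliocentric distance r."""
    return np.sqrt(2.0 * GM_SUN / r_m)

def draw_guess(r_min_au, r_max_au, rng):
    """One statistical-ranging style guess: a distance plus a radial velocity bounded by escape."""
    r_au = rng.uniform(r_min_au, r_max_au)
    v_rad = rng.uniform(-1.0, 1.0) * escape_speed(r_au * AU)
    return r_au, v_rad

rng = np.random.default_rng(seed=1)
for _ in range(5):
    r_au, v_rad = draw_guess(0.01, 6.0, rng)
    # A real implementation would now construct the unique orbit passing through the first
    # and last detection for this guess, propagate it to the intermediate detections, and
    # keep the guess whose fit has the lowest residuals.
    print(f"distance guess {r_au:5.2f} AU, radial velocity guess {v_rad / 1e3:+8.2f} km/s")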

2.2.3 Find_Orb limitations

Find_Orb’s accuracy means that it could be used as the main method of linking on its own. By feeding randomly selected detections into Find_Orb, one could theoretically find tracklets just by looking at Find_Orb outputs. However, the problem with this approach is two-fold:

• Scalability. While using Find_Orb on its own may be feasible for a handful of detections, the number of tracklet possibilities increases dramatically with a larger search area and number of exposures. Running all combinations through Find_Orb is impractical even for linking software that is not time-limited such as this, because the time taken for Find_Orb to generate an orbit can be up to 20 seconds per detection set.

• Orbit likelihood. Even if Find_Orb manages to match a well-fitting orbit to a series of detections, it does not take into account how likely such an orbit is for a specific survey.

1The full details of how Find_Orb finds the range of distances to the hypothetical object, including the full inequality statement, can be found at https://github.com/Bill-Gray/find_orb/blob/master/sr.cpp.

2.3 Tracklet Screening

In order to have an effective linking software, we have written code that incorporates and enhances the use of Find_Orb as a linking method. Our software runs through all combinations of tracklets and identifies candidate tracklets based on a series of adjustable parameters. These parameters will be discussed below. We use this method of identifying candidate tracklets because it is much more time-efficient to eliminate blatantly “bad” tracklets with Python before running them through Find_Orb. The linking software is also used as a way to screen out undesired orbits found by Find_Orb.

2.3.1 Data handling

Types of data formats

FindPOTATOs interfaces with the same data in three different formats. These formats, and their use in the linking software, are listed below:

• Source Extractor table. Data that has been processed by Source Extractor should be in a .tbl format, the columns of which are outlined below. The linking software will only accept data in this format.

• Pandas DataFrame. The Python Pandas library is an efficient data format for data manipulation. All of the calculations done within Python use a DataFrame version of the detection data.

• MPC 80-col format. This is one of the two accepted input formats for Find_Orb, and also the desired output format for the linking software.

Input data

FindPOTATOs requires cleaned Source Extractor outputs, with all detections formatted in a .tbl file. This .tbl file should have the following columns:

• ‘sid’ — Source ID. This is an individual identification number that is assigned to each detection in an image. While the linking code disregards what the actual source IDs for the detections are, it is important that detections are individually named.

• ‘obsjd’ — Observation date. This column encodes the observation time of any particular detection, in MJD format.

• ‘ra’ — Detection RA.

• ‘dec’ — Detection declination.

• ‘fieldnum’ — Field number of image. This is an identification number that is assigned based on the field of the image. Images taken at different times, but centered on the same sky position, are considered to have the same field number.

Any extraneous columns do not need to be deleted; however, all of the above columns must be included for the software to run properly.
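As an illustration only (the rows below are invented, and the IPAC writer is just one way to produce a .tbl file; the thesis does not prescribe the exact dialect), a minimal input catalog with the required columns could be built like this:

import pandas as pd
from astropy.table import Table

# Invented example rows: two detections from the same field, observation times in MJD.
detections = pd.DataFrame(
    {
        "sid": [101, 102],
        "obsjd": [56387.25000, 56387.28125],   # observation time (MJD)
        "ra": [197.03210, 197.04012],          # degrees
        "dec": [13.66590, 13.67001],           # degrees
        "fieldnum": [4021, 4021],              # same pointing -> same field number
    }
)

required = {"sid", "obsjd", "ra", "dec", "fieldnum"}
assert required.issubset(detections.columns), "input is missing required columns"

# Write a .tbl file of the kind consumed by the linking software.
Table.from_pandas(detections).write("detections.tbl", format="ascii.ipac", overwrite=True)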

Format Conversion

FindPOTATOs first takes the input .tbl file and converts it into two formats: a Pandas DataFrame and MPC 80-col format. To avoid the problem of having to constantly translate between these two formats, we decided to convert the .tbl file into both right at the beginning, and then use indexing to refer to specific detections in either format.

This conversion is done using the function tblConvert(). The function first turns each row in the .tbl file into a Pandas DataFrame for analysis, with each detection taking up a row in the DataFrame and columns for the detection’s right ascension, declination, field number, exposure number and time of observation. These values are also saved in an appropriate datatype (e.g. datetime object for time of observation, AstroPy Angle for RA and Dec) within the DataFrame. Next, each detection’s corresponding exposure number is calculated based on its field number and observation time, and appended to the DataFrame as a new column. The DataFrame is then converted into MPC 80-col format, and saved as a .txt file in the same directory as the original input .tbl file.
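A much-simplified sketch of the kind of conversion tblConvert() performs (our reconstruction, not the actual function; the real code also writes the MPC 80-column file, which is omitted here because the exact column layout is documented at the MPC link given in Section 1.5.1):

import pandas as pd
from astropy.table import Table
from astropy.coordinates import Angle
import astropy.units as u

def tbl_to_dataframe(tbl_path):
    """Read a cleaned Source Extractor .tbl catalog and add an exposure number per field."""
    df = Table.read(tbl_path, format="ascii.ipac").to_pandas()

    # Store coordinates as astropy Angles and times as datetimes (MJD -> UTC).
    df["ra_angle"] = [Angle(r, unit=u.deg) for r in df["ra"]]
    df["dec_angle"] = [Angle(d, unit=u.deg) for d in df["dec"]]
    df["obstime"] = pd.to_datetime(df["obsjd"] + 2400000.5, unit="D", origin="julian")

    # Exposure number: the nth distinct observation time within each field, in time order.
    df["expnum"] = df.groupby("fieldnum")["obsjd"].rank(method="dense").astype(int)
    return df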

2.3.2 Software parameters

In order to narrow down the possible tracklets, the linking software screens out candidate tracklets based on several parameters, listed below. Where these parameters are used is detailed in Section 2.3.3.

Angular speed

The angular speed restriction is the maximum on-sky movement that will be accepted by FindPOTATOs. This is specified in arcseconds per second. This parameter is used to pre-screen tracklets.

Angle between detections

The angle between detections (Figure 2.1) is a parameter used for surveys with tracklets that are formed by three detections. This angle is calculated using the cosine rule, and assumes a flat sky, and that all three detections are equidistant from the observer.
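As a sketch of that calculation (our own illustration; the real implementation may differ), the three pairwise separations are treated as flat-sky side lengths and the law of cosines gives the angle θ at the middle detection:

import numpy as np
from astropy.coordinates import SkyCoord
import astropy.units as u

def turning_angle(c1, c2, c3):
    """Angle (radians) at the middle detection, via the law of cosines on a flat sky."""
    a = c1.separation(c2).arcsec   # side between detections 1 and 2
    b = c2.separation(c3).arcsec   # side between detections 2 and 3
    c = c1.separation(c3).arcsec   # side opposite the angle at detection 2
    cos_theta = (a**2 + b**2 - c**2) / (2.0 * a * b)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Nearly collinear motion gives an angle close to pi (asteroid-like);
# a sharp kink gives a much smaller angle, and such a tracklet would be rejected.
p1 = SkyCoord(197.030, 13.665, unit=u.deg)
p2 = SkyCoord(197.040, 13.667, unit=u.deg)
p3 = SkyCoord(197.050, 13.669, unit=u.deg)
print(np.degrees(turning_angle(p1, p2, p3)))   # ~180 degrees for straight-line motion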

Figure 2.1 θ measurement in relation to the three detections

Find_Orb mean residual

For all orbits calculated by Find_Orb, all detections that are used to find the orbit also have a residual attached. The final orbit that Find_Orb outputs includes a mean of all the residuals calculated, which is a reliable indication of how well the calculated orbit fits the detections overall. FindPOTATOs rejects all Find_Orb results that have a mean residual higher than what is specified by the user.

MOID

For some calculated orbits, Find_Orb also calculates an Earth MOID if it is small. There is a user-specified minimum MOID in order to eliminate unlikely orbits being found by Find_Orb.

2.3.3 Recursion

Pre-screening tracklets

The DataFrame is fed through nested for-loops that run through detection combinations.

We have chosen to work with for-loops for the ease of breaking out of an iteration once a condition is met (i.e. when a tracklet is found).

However, straightforward nested for-loops would mean that the number of exposures per field the software can analyse is hard-coded (as each for-loop would correspond to one exposure); we have avoided this by making use of a recursive function, findTracklet(), instead. findTracklet() is also a for-loop, but since it is a function it can be called an indefinite number of times to suit the data. A main for-loop, found in the function linking(), is used to run through all detections in the first exposure of all fields. For every one of these first detections, the main for-loop cuts the main DataFrame down into a smaller DataFrame consisting of only detections from the same field. This smaller DataFrame is fed into findTracklet(), which loops through all detections in the next exposure, and uses AstroPy (Astropy Collaboration et al., 2018) to find the angular separation between the detection chosen in the previous exposure and each detection in the current exposure. If the angular separation fits the maximum angular speed set, the function recurses and detections from the next exposure are looped through.
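A heavily simplified sketch of this recursion (our reconstruction; column names follow the input format described earlier, and the real findTracklet() also handles the fo.txt bookkeeping and the angle check described below):

from astropy.coordinates import SkyCoord
import astropy.units as u

def within_speed_limit(det_a, det_b, max_speed_arcsec_per_s):
    """True if the sky motion implied by two detections is below the angular speed limit."""
    sep = SkyCoord(det_a["ra"], det_a["dec"], unit=u.deg).separation(
        SkyCoord(det_b["ra"], det_b["dec"], unit=u.deg)
    )
    dt = abs(det_b["obsjd"] - det_a["obsjd"]) * 86400.0  # MJD days -> seconds
    return sep.arcsec <= max_speed_arcsec_per_s * dt

def find_tracklet(field_df, chain, exposure, n_exposures, max_speed):
    """Recursively extend chain (a list of detections) one exposure at a time."""
    if exposure > n_exposures:
        return chain                                  # candidate tracklet: hand off to Find_Orb
    for _, det in field_df[field_df["expnum"] == exposure].iterrows():
        if within_speed_limit(chain[-1], det, max_speed):
            result = find_tracklet(field_df, chain + [det], exposure + 1, n_exposures, max_speed)
            if result is not None:
                return result                         # propagate success up the recursion
    return None                                       # no linkable detection in this exposure

# linking() would call this once per detection in the first exposure, e.g.:
# tracklet = find_tracklet(field_df, [first_det], 2, n_exposures, max_speed)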

In order to keep track of the detections being linked, the detections, in MPC format, are saved to a temporary text file named fo.txt (located in ~/.find_orb/fo.txt), as the code weaves through the for-loops. If the recursive function links a detection, that detection is saved to file prior to recursion. If not, the function deletes the previously saved detection, and goes back a level of recursion to loop through the previous exposure.

An extra step is added when three detections are linked. The angle between the three detections is calculated and compared against the angle limitation. If the angle is below the angle limitation, the last detection is unlinked, and the function goes back a level of recursion; if it is above the angle limit, the last detection is linked.

Find_Orb incorporation

Once the recursive function links a detection in the last exposure, the temporary text file fo.txt is fed through the Find_Orb software via the find_orb() function. The function feeds fo.txt to the non-interactive version of Find_Orb using the Popen function from the subprocess library. When Find_Orb finishes running and successfully fits an orbit to the detections in fo.txt, it saves these orbital elements to the file ~/.find_orb/elements.txt.

The find_orb function then analyses Find_Orb’s output to determine if the fitted orbit meets user-set parameters. This is done by parsing the elements.txt file using the search() function from the re Python module. ‘MOID’ and ‘mean residual’ are searched for in elements.txt, and if these values are acceptable, a tracklet is considered “found”. If so, find_orb() returns a true value. Otherwise, a false value is returned.
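A rough sketch of this step (our reconstruction; the command name, file locations and the wording searched for in elements.txt follow the description above and may differ on a given installation or in the real find_orb() function):

import os
import re
from subprocess import Popen, PIPE

def run_find_orb(obs_file, max_residual, min_moid):
    """Run non-interactive Find_Orb on an 80-column file and screen the fitted orbit."""
    # Assumed invocation of the non-interactive build; adjust to the local install.
    process = Popen(["fo", obs_file], stdout=PIPE, stderr=PIPE)
    process.communicate()

    elements_path = os.path.expanduser("~/.find_orb/elements.txt")
    with open(elements_path) as f:
        elements = f.read()

    # Pull the mean residual and MOID values out of the elements file with regexes.
    resid_match = re.search(r"mean residual\s+([\d.]+)", elements)
    moid_match = re.search(r"MOID[:\s]+([\d.]+)", elements)

    if resid_match is None:
        return False                       # no usable fit
    if float(resid_match.group(1)) > max_residual:
        return False                       # orbit fits the detections too poorly
    if moid_match is not None and float(moid_match.group(1)) < min_moid:
        return False                       # implausibly Earth-grazing orbit
    return True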

Post-Find_Orb

If the function find_orb() returns a false value, the main findTracklet() recursive function goes back a level of recursion to find another suitable last detection. However, if find_orb() returns a true value, findTracklet() uses the truth value to exit out of all levels of recursion, and back to the main for-loop in linking(). The overall process of linking() is summarised in 2.2.

Within the main for-loop, when a tracklet is found, the contents of fo.txt are appended to the file outputs/tracklets.txt, and the contents of elements.txt are appended to outputs/OrbitalElements.txt.

Figure 2.2 Overview of the main linking process, assuming one tracklet is made of three detections. Pre-Find_Orb parameters are the angular speed and angle limit; Post-Find_Orb parameters are MOID limit and mean residual limit. Not meeting any one of these parameters causes the function to retry with another group of three detections.

2.3.4 Screening passes

Since one of the main goals of this project is to create linking software that is flexible for a myriad of orbits, we have decided to implement screening in passes. Users should ideally choose less stringent parameters for each successive pass. This method is used in order to quickly identify easily linked tracklets in the first pass, and to avoid mis-linkages in which those detections would instead be linked into a more erratic orbit.

We are able to do screening in passes because all of the linking is done with the function linking(). Once all tracklets are found and linking() has finished running in the first pass, the software re-runs the linking() function again with new parameters on the remaining detections.

Chapter Three

Code testing with NEAT

This linking software was originally designed as part of a NEAT (Near-Earth Asteroid Tracking) re-analysis pipeline. Therefore, preliminary tests of this software were done with NEAT data.

3.1 NEAT overview

The Near-Earth Asteroid Tracking (NEAT) survey operated from 1995 to 2007. NEAT was one of the first surveys of its kind; it discovered 41,227 minor planets and reported observations of 258 comets.

3.2 NEAT data

As the rest of the re-analysis pipeline was being developed in conjunction with the linking software, at the time of development, we did not have Source Extractor-cleaned NEAT data to be used as input. Therefore, we decided to use NEAT tracklets that have been submitted and accepted by the MPC as test data. The benefit of this is that it is an easy way to check if the software is working as intended. Since we are feeding it a known quantity of tracklets, we can easily compare the output results to the known tracklets to see if all tracklets were found, and if any were mis-linked.

Accessing NEAT data

All detections reported to and accepted by the MPC are saved in the 80-col format and archived in two files: UnnObs.txt and NumObs.txt1. We downloaded these files and isolated the detections from NEAT. From there, we chose a subset of 42 detections (corresponding to 14 tracklets) from the same night to use with our linking software.

Field and exposure numbers

One major limitation of using NEAT data was that it was formatted according to the MPC 80-col format, and as such, did not have information corresponding to each detection’s field and exposure number (i.e. we did not know which image each point source came from). This is an issue because FindPOTATOs is reliant on finding tracklets within the same fields. Therefore, in order to use NEAT data from the MPC, we had to re-label all detections and assign them field numbers and exposure numbers.

3.3 Code modifications for NEAT

Since FindPOTATOs was designed for Source Extractor .tbl input, we used a slightly modified version for MPC 80-col files. This involved converting the 80-col data into a Pandas DataFrame, and then using a new function, labelNEAT(), to add field and exposure numbers.

labelNEAT()

The labelNEAT() function is a rudimentary method that was used to work around these NEAT data limitations. The function labels NEAT data (in the form of a DataFrame) based on date and time of observation. It creates the following new columns in the input DataFrame:

• ‘image’: Specifies which NEAT image a detection comes from. All detections with the exact same observation time can be assumed to be from the same image, so each detection can be chronologically labelled with an image number based on its observation time.

• ‘frame’: Specifies a best guess of which field of view a detection comes from; corresponds to ‘fieldnum’ from a typical .tbl input.

• ‘time’: Specifies a best guess of which exposure of a particular field of view a detection comes from; i.e. it is the nth time NEAT has returned to a specific field during the course of one night.

1These files can be found at the MPC’s Observation Archive: https://minorplanetcenter.net/iau/ECS/MPCAT-OBS/MPCAT-OBS.html

Limitations

However, because of the irregularities of the NEAT survey, labelNEAT() does not always work as intended. For instance, some detections are labelled as having been observed microseconds apart. This is likely to be a recording error: since exposures are generally 30 seconds long, they must be part of the same exposure. However, labelNEAT() will catalog them as two separate images.

Since the linking code is not originally intended to work with data in 80-col format, we decided that we would not try to fix labelNEAT() to work in all scenarios. Instead, we isolated 42 detections (corresponding to 14 tracklets) that the function is able to correctly classify and label. This is the main data set that we have been testing on, in terms of NEAT observations.

3.4 Noise addition

For a more quantitative assessment of the linking software’s performance, we decided to add noise incrementally to the 42 detections. Noise addition was done with another function, addNoise().

The function adds noise points to the NEAT data by randomly choosing a detection and placing a point within 1 degree (corresponding to NEAT’s field of view) of the chosen observation. The noise point uses the chosen observation as a template, so it has all the non-positional attributes (magnitude, time of observation, etc.) of the original observation for use in MPC 80-col format. Noise observations are appended at the end of the original .txt file (with the designation ‘FAKE’ as an identifier), and are also added to the DataFrame.

The new DataFrame is then fed through the main linking function. The software is then able to identify false positive tracklets that are due to linked noise in the final output by counting how many detections have the ‘FAKE’ designation.
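A simplified sketch of this kind of noise injection (our illustration; the real addNoise() also appends the fake observations to the MPC-format text file, and its offset geometry may differ from the simple box used here):

import numpy as np
import pandas as pd

def add_noise(df, n_points, field_radius_deg=1.0, seed=0):
    """Append n_points fake detections, each placed near a randomly chosen real detection."""
    rng = np.random.default_rng(seed)
    fakes = []
    for _ in range(n_points):
        template = df.iloc[int(rng.integers(len(df)))].copy()
        # Shift the position by up to field_radius_deg in each coordinate; every other
        # attribute (time, field, magnitude, ...) is inherited from the template detection.
        template["ra"] += rng.uniform(-field_radius_deg, field_radius_deg)
        template["dec"] += rng.uniform(-field_radius_deg, field_radius_deg)
        template["sid"] = "FAKE"   # marker used later to count linked-noise false positives
        fakes.append(template)
    return pd.concat([df, pd.DataFrame(fakes)], ignore_index=True)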

3.5 Findings

Using a maximum angular speed setting of 0.3"/s, we ran FindPOTATOs on NEAT’s 42 detections. The software successfully re-linked all 14 tracklets with no errors.

We then ran the 42 observations through the linking software ten times for each different noise level tested. We have used two metrics to gauge the accuracy of the detection linking software. Good tracklets is a count of how many (of the expected 14 tracklets) are returned by the software with no mis-linkages whatsoever. False positive tracklets are detections that are either mis-linked, or linked with noise. Figure 3.1 illustrates the software’s performance, averaged over the ten trials, using both of these metrics. As shown, FindPOTATOs performs ideally until the 80% noise level (indicating that 168 noise points have been added), where the number of good tracklets begins to decrease and the number of false positives begins to increase. Encouragingly, even at 95% noise (i.e. 798 added noise points), the software still manages to perform at a decent level of accuracy, with only about 1 missed tracklet and 1 false positive tracklet on average.

Figure 3.1 Accuracy measurements for linking software based on number of good tracklets and number of false positive tracklets. Note that noise percentage refers to percent of total input data that is noise; e.g. 20% noise is a 2:8 noise to true detection ratio, while 90% noise is a 9:1 ratio.

Chapter Four

Applications to PTF data

While the preliminary trials done on NEAT data were encouraging, they did not best emulate data that would be fed to the linking software. In real-world usage, actual data would be in .tbl format, and would not be limited to confirmed detections (since not all non-stationary sources are necessarily asteroid detections). Therefore, we applied the code to Palomar Transient Factory (PTF) data instead.

4.1 PTF overview

PTF was a survey based at Palomar Observatory that operated from 2009 to 2012 (Law et al., 2009). It was designed for the detection of transient objects: objects that vary in magnitude over time. These include supernovae, variable stars, comets and asteroids.

4.2 Accessing PTF data

The PTF data is archived on the NASA/IPAC Infrared Science Archive (IRSA). First, we identified a region of the sky that was imaged multiple times for asteroid detection. This was done by examining reported PTF asteroid observations submitted to the Minor Planet Center. We chose an observation at random, and recorded the RA, Dec, and date of observation.

“PTF Sources Catalog” was accessed via the “Catalog Search” function1. Using “Cone Search”, we queried all sources within a certain radius of the recorded coordinate, with the additional column constraint of a specific night corresponding to the date of the recorded observation. The output columns selected, besides the default “ra” and “dec” columns, were “sid”, “obsmjd (days)”, “mag_auto”, “ptffield”, and “goodflag”. Given a sensible search radius, these search settings return a downloadable .tbl file that can be directly used by FindPOTATOs.

4.3 Star removal

The main issue with using PTF data to test FindPOTATOs is that while the PTF data has been resolved by Source Extractor, it still includes all resolved sources, including stationary objects. Since the linking software is designed to be used with “cleaned” data, we had to do preliminary stationary source removal on the PTF data before running it through FindPOTATOs. To do so, we used data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) (Chambers et al., 2019), and compared it with the PTF data to identify stationary sources.

4.3.1 Accessing Pan-STARRS data

Pan-STARRS data is archived at the Space Telescope Science Institute (STScI) (Flewelling et al., 2019). We used the Pan-STARRS catalog search, and did a coordinate search using the same RA, Dec, and search radius used to query the PTF database. Under the catalog type and release section, we selected “PS1 DR2” and “stacked object”. This search returns a downloadable .csv file.

1https://irsa.ipac.caltech.edu/Missions/ptf.html

4.3.2 Code for star removal

In order to process multiple samples of PTF data efficiently, we created a function within FindPOTATOs to be run before all the linking is done, so that downloaded PTF data can be directly run through the linking software. This function, removeStars(), takes in a Pandas DataFrame version of the PTF data and the .csv Pan-STARRS file. It compares the on-sky positions of all detections in both files, and if there is a detection from Pan-STARRS that is within 1" of a detection from PTF, that detection is deleted from the PTF DataFrame. This process takes longer than the actual linking, so the PTF data with stars removed is also saved at this point so that more trials can be run on the same dataset if needed.
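A compact sketch of this kind of positional cross-match (our illustration; the Pan-STARRS column names raMean and decMean are assumptions about the downloaded .csv, and the real removeStars() may be structured differently):

import pandas as pd
from astropy.coordinates import SkyCoord
import astropy.units as u

def remove_stars(ptf_df, panstarrs_csv, match_radius=1.0 * u.arcsec):
    """Delete PTF detections that coincide with a Pan-STARRS (stationary) source."""
    ps = pd.read_csv(panstarrs_csv)

    ptf_coords = SkyCoord(ptf_df["ra"].values * u.deg, ptf_df["dec"].values * u.deg)
    ps_coords = SkyCoord(ps["raMean"].values * u.deg, ps["decMean"].values * u.deg)

    # For every PTF detection, find the nearest Pan-STARRS source and its separation.
    _, sep2d, _ = ptf_coords.match_to_catalog_sky(ps_coords)

    # Keep only detections with no stationary counterpart inside the match radius.
    return ptf_df[sep2d > match_radius].reset_index(drop=True)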

4.4 Trials

Throughout the development of the code, several tests were run with different parameters. By changing the search radii on the PTF catalog search, we created inputs of varying sizes for FindPOTATOs. We tested the code incrementally, going from search radii of 3’ to 20’. Due to time constraints, tests on data sets larger than 20’ could not be done because of the time-consuming nature of star removal. Trials done on data of search radii 3’, 5’, 10’ and 20’ were all successful, with all expected tracklets found. Since the code was still in active development as these trials were done, we used the trials as a way to debug and catch software flaws. Therefore, while we reached 100% accuracy on all trials run, this cannot be taken to mean that the software will perform at such levels for every field of view with similar specifications. Due to the repetitive nature of these trials, only the most significant trial (at 20’) is summarised here, in order to demonstrate the software testing process.

4.4.1 Testing on 20’ PTF data

The search settings we chose for the 20’ trial were all data from 2013 April 5, within a 20’ radius of RA=197.03233 and Dec=+13.66614. With these search settings, the number of sources found from the PTF catalog search was 6996; the star removal function narrowed this number down to 326 detections.

FindPOTATOs was then run with the settings of an angular speed limit of 0.3"/s, a maximum Find_Orb residual of 0.2, an angle limit of 3π/4 and a MOID limit of 0.001 AU. No second pass was done. Out of these 326 detections, the linking software found 5 tracklets. Four tracklets were represented in the MPC’s list of confirmed detections. The remaining tracklet was originally thought to be mis-linked; however, further analysis determined that the 5th tracklet was an unreported tracklet.

Analyzing the unreported 5th tracklet

Confirming the validity of the 5th tracklet involved using the MPC’s Minor Planet Checker (MPChecker)2. We first queried for a list of known minor planets within a 20’ radius of our initial search coordinates. Unlike the MPC’s list of confirmed detections, MPChecker finds all minor planets that could be physically spotted by an observatory at any given time within a given field, based on orbit calculations. Therefore, MPChecker gives a list of objects that could hypothetically be spotted at a specific time.

MPChecker suggested that 6 objects could be spotted, along with their coordinates. Four of these objects matched the tracklets from the MPC’s confirmed detection list. Another object was determined to be outside PTF’s field of view, as its coordinates were not represented in the data downloaded from the PTF Sources Catalog. There was therefore one object left unaccounted for, object 401570.

Following this, we used the Minor Planet Checker again, this time pasting in the 5th tracklet in the 80-col format with the “around these observations” option selected. This resulted in a match to the orbit of object 401570, with 0 degree offsets from the predicted coordinates. This confirmed that the 5th tracklet was unreported. Subsequently, we have submitted this tracklet to the Minor Planet Center, and it has been accepted.

2https://minorplanetcenter.net/cgi-bin/checkmp.cgi

Chapter Five

Conclusions

We built Python-based software for asteroid detection linking. By making the software FindPOTATOs freely available, we hope it will enable more asteroid detections.

While many trials have been done with the linking software on both NEAT and PTF data, the most promising result came from the 20’ trial on PTF data. It demonstrated linking performance at 100% accuracy in a specific 20’ field. This result is consistent with the results from the NEAT trials, which showed decent accuracy at the 95% noise level, with an average of 1 missed tracklet and 1 false positive tracklet, keeping in mind that the NEAT field used (1 degree) is larger than the PTF trial radius. In addition, the trials show that under ideal conditions, the linking software is able to perform on par with the linking capabilities of the more complexly designed NEAT and PTF systems.

A key area of focus for future development is the ability to process larger amounts of data through the linking software. Currently, FindPOTATOs is limited by the star removal process. Ideally, we would be able to test the software with a full night’s worth of PTF data, and compare the output with known detections in order to better understand software accuracy.

Additionally, FindPOTATOs could be made compatible with the Astrometry Data Exchange Standard (ADES)1 in the future. This would allow more precise astrometry, photometry and uncertainties to be stored.

1Full details of the ADES format can be found at https://minorplanetcenter.net/iau/info/ADES.html
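As a rough illustration only (not part of this work), ADES observations can be exchanged in a pipe-separated (PSV) form. The sketch below writes a single made-up record; the field names and file structure are illustrative, and the authoritative schema should be taken from the ADES documentation linked in the footnote above.

    # Rough sketch only: one ADES-style pipe-separated (PSV) record.
    # Field names (trkSub, mode, stn, obsTime, ra, dec, rmsRA, rmsDec, mag, band)
    # are illustrative; consult the ADES documentation for the real schema.
    header = "trkSub |mode|stn |obsTime                |ra        |dec      |rmsRA|rmsDec|mag |band"
    row    = "X00B   |CCD |I41 |2013-04-05T06:30:00.00Z|197.032330|13.666140| 0.15|  0.15|20.1|r   "
    with open("outputs/tracklets.psv", "w") as f:
        f.write(header + "\n")
        f.write(row + "\n")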

REFERENCES

Abell, P., Barbee, B. W., Chodas, P. W., et al. 2015, in Asteroids IV, ed. P. Michel, F. E. DeMeo, & W. F. Bottke (University of Arizona Press)
Alexander, C. M. O., Bowden, R., Fogel, M. L., et al. 2012, Sci, 337, 721, doi: 10.1126/science.1223474
Astropy Collaboration, Price-Whelan, A. M., Sipőcz, B. M., et al. 2018, AJ, 156, 123, doi: 10.3847/1538-3881/aabc4f
Bannister, M. T., Kavelaars, J. J., Petit, J.-M., et al. 2016, AJ, 152, 70, doi: 10.3847/0004-6256/152/3/70
Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393, doi: 10.1051/aas:1996164
Chambers, K. C., Magnier, E. A., Metcalfe, N., et al. 2019, arXiv:1612.05560. http://arxiv.org/abs/1612.05560
DeMeo, F. E., & Carry, B. 2014, Natur, 505, 629, doi: 10.1038/nature12908
Denneau, L., Jedicke, R., Grav, T., et al. 2013, PASP, 125, 357, doi: 10.1086/670337
Flewelling, H. A., Magnier, E. A., Chambers, K. C., et al. 2019, arXiv:1612.05243. http://arxiv.org/abs/1612.05243
Granvik, M., & Muinonen, K. 2005, Icar, 179, 109, doi: 10.1016/j.icarus.2005.06.001
Gray, B. 2017, Find_Orb. https://projectpluto.com/find_orb.htm
Harris, A. W., Boslough, M., Chapman, C. R., et al. 2015, in Asteroids IV, ed. P. Michel, F. E. DeMeo, & W. F. Bottke (University of Arizona Press)
Holman, M. J., Payne, M. J., Blankley, P., Janssen, R., & Kuindersma, S. 2018, arXiv:1805.02638. https://ui.adsabs.harvard.edu/abs/2018arXiv180502638H/abstract
IAU. 2013, Near Earth Asteroids (NEAs). https://www.iau.org/public/themes/neo/nea2/
Jones, R. L., Slater, C. T., Moeyens, J., et al. 2018, Icar, 303, 181, doi: 10.1016/j.icarus.2017.11.033
Jones, T., Eppler, D., Davis, D., et al. 1994, in Hazards Due to Comets and Asteroids, ed. T. Gehrels (University of Arizona Press)
Kubica, J., Denneau, L., Grav, T., et al. 2007, Icar, 189, 151, doi: 10.1016/j.icarus.2007.01.008
Law, N. M., Kulkarni, S. R., Dekany, R. G., et al. 2009, PASP, 121, 1395, doi: 10.1086/648598
Mainzer, A., Grav, T., Bauer, J., et al. 2011, ApJ, 743, 156, doi: 10.1088/0004-637X/743/2/156
Masci, F. J., Laher, R. R., Rusholme, B., et al. 2019, PASP, 131, 018003, doi: 10.1088/1538-3873/aae8ac
MPC. 2020, Minor Planet Center. https://minorplanetcenter.net/
Perna, D., Dotto, E., Ieva, S., et al. 2015, AJ, 151, 11, doi: 10.3847/0004-6256/151/1/11
Pravdo, S. H., Rabinowitz, D. L., Helin, E. F., et al. 1999, AJ, 117, 1616, doi: 10.1086/300769
Stern, S. A., & Colwell, J. E. 1997, AJ, 490, 879, doi: 10.1086/304912
Thomas, P. C., Parker, J. W., McFadden, L. A., et al. 2005, Natur, 437, 224, doi: 10.1038/nature03938
Ye, Q., Masci, F. J., Lin, H. W., et al. 2019, PASP, 131, 078002, doi: 10.1088/1538-3873/ab1b18

Appendix A

A.1 Code

A.1.1 linking_library.py

Module used for main linking analysis.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from astropy.coordinates import SkyCoord, Angle, Distance
import astropy.units as u
import itertools  # for looping through colors when plotting all points
import os
from subprocess import Popen, PIPE  # used to call Find_Orb
import re  # regular expressions, used to search for mean residuals in Find_Orb output files
from time import sleep
import matplotlib.cm as cm
import time
from astropy.time import Time
from datetime import timedelta
import math
from matplotlib.backends.backend_pdf import PdfPages


def linking(saveObs, combinations, correctTracklets, falsePositives, lines, data, speed, maxResidual, original, originalAddNoise, angleLim, nullResid=True, MOIDLim=False):
    timeA = (data.loc[data['time'] == 0])
    for i in np.arange(len(timeA)):  # loop through all timeA images
        findOrbTxt = open(os.path.expanduser("~/.find_orb/fo.txt"), "w")
        indexes = []
        trackletFound = False
        FOV = timeA['frame'].iloc[i]
        coord1 = SkyCoord(ra=timeA['RA'].iloc[i], dec=timeA['Dec'].iloc[i], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        date = timeA['date'].iloc[i]
        maxTime = max(original.loc[original['frame'] == FOV]['time'])
        findOrbTxt.write(lines[timeA['line'].iloc[i]])
        indexes.append(timeA['line'].iloc[i])
        findOrbTxt.close()
        trackletFound, indexes, timenum = findTracklet(indexes, trackletFound, 1, lines, FOV, data, coord1, speed, date, maxTime, maxResidual, originalAddNoise, angleLim, nullResid, MOIDLim)
        if trackletFound:
            rightTracklet = True
            combinations = combinations + 1
            for line in open(os.path.expanduser('~/.find_orb/fo.txt')):
                match = re.search('FAKE', line)
                if match:
                    rightTracklet = False
            if rightTracklet:
                correctTracklets = correctTracklets + 1
            if not rightTracklet:
                falsePositives = falsePositives + 1
            if saveObs:
                open("outputs/OrbitalElements.txt", "a+").writelines([l for l in open(os.path.expanduser("~/.find_orb/fo.txt")).readlines()])
                for line in open(os.path.expanduser("~/.find_orb/elements.txt")):
                    li = line.strip()
                    if not li.startswith("#"):
                        open("outputs/OrbitalElements.txt", "a").writelines(line.rstrip())
                        open("outputs/OrbitalElements.txt", "a").writelines("\n")
                open("outputs/OrbitalElements.txt", "a").writelines("\n\n")
            open("outputs/tracklets.txt", "a").writelines([l for l in open(os.path.expanduser("~/.find_orb/fo.txt")).readlines()])
            os.remove(os.path.expanduser("~/.find_orb/fo.txt"))
            for ii in (indexes):
                data = data[data.line != ii]
                originalAddNoise = originalAddNoise[originalAddNoise.line != ii]
    return data, originalAddNoise, combinations, correctTracklets, falsePositives


def plotTracklets():
    """
    Plots all found tracklets, four per figure in 2x2 subplots.
    """
    widths = [
        14,  # name
        18,  # date
        12,  # RA
        12,  # Dec
    ]
    data = pd.read_fwf('outputs/tracklets.txt', widths=widths, header=None)
    pp = PdfPages('outputs/trackletsPlot.pdf')
    data.columns = ["name", "date", "RA", "Dec"]
    xlim = (None, None)
    ylim = (None, None)
    figsNum = np.ceil(len(data)/3/4)
    for figs in np.arange(figsNum):
        fig = plt.figure(figsize=(15, 15))
        ax1 = fig.add_subplot(221)
        ax1.set_xlabel('RA (degrees)')
        ax1.set_ylabel('Dec (degrees)')
        ax1.axis('equal')
        ax1.grid(True)
        set = data[int(figs*4):int(figs*4+3)]
        ra = Angle(set['RA'], unit=(u.hourangle))
        ra = ra.wrap_at(180*u.degree)
        dec = Angle(set['Dec'], unit=(u.deg))
        for i in np.arange(len(ra)):
            ax1.scatter(ra.degree[i], dec.degree[i], label="observation %i" % (i))
        ax1.set_title('Tracklet %i' % (figs*4+1))
        ax1.legend()
        if (len(data)/3)-(figs*4+1) > 0:
            ax2 = fig.add_subplot(222)
            ax2.set_xlabel('RA (degrees)')
            ax2.set_ylabel('Dec (degrees)')
            ax2.grid(True)
            ax2.axis('equal')
            set = data[int(figs*4+3):int(figs*4+6)]
            ra = Angle(set['RA'], unit=(u.hourangle))
            ra = ra.wrap_at(180*u.degree)
            dec = Angle(set['Dec'], unit=(u.deg))
            for i in np.arange(len(ra)):
                ax2.scatter(ra.degree[i], dec.degree[i], label="observation %i" % (i))
            ax2.legend()
            ax2.set_xlim(xlim)
            ax2.set_ylim(ylim)
            ax2.set_title('Tracklet %i' % (figs*4+2))
        if (len(data)/3)-(figs*4+2) > 0:
            ax3 = fig.add_subplot(223)
            ax3.set_xlabel('RA (degrees)')
            ax3.set_ylabel('Dec (degrees)')
            ax3.grid(True)
            ax3.axis('equal')
            set = data[int(figs*4+6):int(figs*4+9)]
            ra = Angle(set['RA'], unit=(u.hourangle))
            ra = ra.wrap_at(180*u.degree)
            dec = Angle(set['Dec'], unit=(u.deg))
            for i in np.arange(len(ra)):
                ax3.scatter(ra.degree[i], dec.degree[i], label="observation %i" % (i))
            ax3.legend()
            ax3.set_xlim(xlim)
            ax3.set_ylim(ylim)
            ax3.set_title('Tracklet %i' % (figs*4+3))
        if (len(data)/3)-(figs*4+3) > 0:
            ax4 = fig.add_subplot(224)
            ax4.set_xlabel('RA (degrees)')
            ax4.set_ylabel('Dec (degrees)')
            ax4.grid(True)
            ax4.axis('equal')
            set = data[int(figs*4+9):int(figs*4+12)]
            ra = Angle(set['RA'], unit=(u.hourangle))
            ra = ra.wrap_at(180*u.degree)
            dec = Angle(set['Dec'], unit=(u.deg))
            for i in np.arange(len(ra)):
                ax4.scatter(ra.degree[i], dec.degree[i], label="observation %i" % (i))
            ax4.legend()
            ax4.set_xlim(xlim)
            ax4.set_ylim(ylim)
            ax4.set_title('Tracklet %i' % (figs*4+4))
        plt.savefig(pp, format='pdf')
        plt.clf()
    pp.close()


# new and improved
def find_orb(maxResidual, nullResid=True, MOIDLim=False):
    """
    Feeds observations in MPC format that are located in ~/.find_orb/fo.txt to
    the non-interactive version of find_orb, fo. find_orb stores orbital
    elements in ~/.find_orb/elements.txt, which this function will read to
    find the mean residual to the orbital fit. If the mean residual is less
    than maxResidual (specified in ") and all observations in
    ~/.find_orb/fo.txt were used to generate the orbital fit, then the
    function will return True. In other cases (e.g. find_orb doesn't run;
    mean residual greater than maxResidual; not all observations in
    ~/.find_orb/fo.txt used), the function will return False.
    """
    if os.path.exists(os.path.expanduser("~/.find_orb/elements.txt")):
        os.remove(os.path.expanduser("~/.find_orb/elements.txt"))
    sp = Popen(['cd ~/.find_orb\n~/find_orb/find_orb/fo fo.txt -c'], shell=True)
    totSleep = 0
    # wait for find_orb to create elements.txt. If it takes longer than 20 seconds
    # then find_orb probably can't find an orbit.
    while not os.path.exists(os.path.expanduser("~/.find_orb/elements.txt")):
        sleep(0.2)
        totSleep = totSleep + 0.2
        if totSleep > 20:
            break
    if os.path.exists(os.path.expanduser("~/.find_orb/elements.txt")):
        if os.path.getsize(os.path.expanduser("~/.find_orb/elements.txt")) == 0:
            sleep(0.2)
        numObs = sum(1 for line in open(os.path.expanduser("~/.find_orb/fo.txt")))

        # save all inputs to find_orb
        open("outputs/AllPotentialTracklets.txt", "a+").writelines([l for l in open(os.path.expanduser("~/.find_orb/fo.txt")).readlines()])
        for line in open(os.path.expanduser("~/.find_orb/elements.txt")):
            li = line.strip()
            if not li.startswith("#"):
                open("outputs/AllPotentialTracklets.txt", "a").writelines(line.rstrip())
                open("outputs/AllPotentialTracklets.txt", "a").writelines("\n")
        open("outputs/AllPotentialTracklets.txt", "a").writelines("\n\n")

        resCheck = False
        for line in open(os.path.expanduser('~/.find_orb/elements.txt')):
            match = re.search('mean residual (\d+)".(\d+)', line)
            match2 = re.search('MOID: (\d+).(\d+)', line)
            if match:
                res = int(match.group(1)) + float(('0.' + match.group(2)))
                if nullResid:
                    if (res < maxResidual) & (res > 0):  # maxResidual in "
                        resCheck = True
                    else:
                        resCheck = False
                else:
                    if (res < maxResidual):  # maxResidual in "
                        resCheck = True
                    else:
                        resCheck = False
            if (match2):
                if MOIDLim:
                    MOID = int(match2.group(1)) + float(('0.' + match2.group(2)))
                    if MOID > MOIDLim:
                        break
        if resCheck:
            return True
        else:
            return False
    else:
        return False


def findTracklet(indexes, trackletFound, timenum, lines, FOV, df, coord1, speed, date, maxTime, maxResidual, originalAddNoise, angleLim, nullResid=True, MOIDLim=False):
    """
    Function recursively identifies nearby observations in the next frame.
    Saves each observation it links to fo.txt for use with find_orb.
    """
    FOVB = df.loc[df['time'] == timenum]
    FOVB = FOVB.loc[FOVB['frame'] == FOV]
    for i in np.arange(len(FOVB)):
        if (trackletFound):
            return trackletFound, indexes, timenum
            break
        coord2 = SkyCoord(ra=FOVB['RA'].iloc[i], dec=FOVB['Dec'].iloc[i], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        timeDelta = FOVB['date'].iloc[i] - date
        maxDist = (timeDelta/np.timedelta64(1, 's'))*(speed)
        maxDist = Distance(maxDist, u.kpc)
        sep = coord1.separation_3d(coord2)
        # print(sep, 'separation')
        if (maxDist > sep) & (timenum != maxTime):
            #####
            # enters this if statement if observations A and B are
            # close enough in distance to each other, AND there are
            # additional exposures to loop through
            #####
            # setup variables for recursive loop, by turning the
            # current observation B into observation A
            date = FOVB['date'].iloc[i]
            coord1 = coord2
            timenum = timenum + 1
            findOrbTxt = open(os.path.expanduser("~/.find_orb/fo.txt"), "a")
            findOrbTxt.write(lines[FOVB['line'].iloc[i]])
            findOrbTxt.close()
            indexes.append(FOVB['line'].iloc[i])
            # print('next:', FOVB['name'].iloc[i])
            findOrbTxt = findOrbTxt.close()
            trackletFound, indexes, timenum = findTracklet(indexes, trackletFound, timenum, lines, FOV, df, coord1, speed, date, maxTime, maxResidual, originalAddNoise, angleLim, nullResid, MOIDLim)

        elif (maxDist < sep):
            #####
            # if B is too far away from A, delete B
            #####
            df = df[df.line != FOVB['line'].iloc[i]]

        elif (maxDist >= sep) & (timenum == maxTime):
            #####
            # if A and B are close enough AND there are no more
            # exposures to loop through
            #####
            findOrbTxt = open(os.path.expanduser("~/.find_orb/fo.txt"), "a")
            findOrbTxt.write(lines[FOVB['line'].iloc[i]])
            findOrbTxt.close()
            indexes.append(FOVB['line'].iloc[i])
            # print('final:', FOVB['name'].iloc[i])
            angle = findAngle(angleLim)
            if angle >= (angleLim * u.radian):
                #####
                # if linked tracklet makes an angle greater than
                # angleLim
                #####
                print("angle=", angle)
                trackletFound = find_orb(maxResidual, nullResid, MOIDLim)
            if not trackletFound:
                #####
                # if linked tracklet is rejected (either by find_orb
                # or by angle), delete last observation, and re-enter
                # recursive function
                #####
                indexes = indexes[:-1]
                readFile = open(os.path.expanduser("~/.find_orb/fo.txt"))
                fileLines = readFile.readlines()
                readFile.close()
                findOrbTxt = open(os.path.expanduser("~/.find_orb/fo.txt"), "w")
                findOrbTxt.writelines([item for item in fileLines[:-1]])
                findOrbTxt.close()
                df = df[df.line != FOVB['line'].iloc[i]]

    if trackletFound:
        return trackletFound, indexes, timenum

    if (not trackletFound) & (timenum >= 2):
        #####
        # if no tracklets are found, go back one exposure
        # as long as not working with first or second exposure.
        timenum = timenum - 1
        df = df[df.line != indexes[-1]]
        B = originalAddNoise.loc[originalAddNoise['time'] == timenum+1]
        df = df.append(B)
        df = df.sort_values(by='date')
        date = df[df.line == indexes[timenum-1]].iloc[0].date
        if len(indexes[:-1]) != 0:
            indexes = indexes[:-1]
            readFile = open(os.path.expanduser("~/.find_orb/fo.txt"))
            fileLines = readFile.readlines()
            readFile.close()
            findOrbTxt = open(os.path.expanduser("~/.find_orb/fo.txt"), "w")
            findOrbTxt.writelines([item for item in fileLines[:-1]])
            findOrbTxt.close()
        coord1 = SkyCoord(ra=df[df.line == indexes[-1]].iloc[0].RA, dec=df[df.line == indexes[-1]].iloc[0].Dec, unit=(u.hourangle, u.deg), distance=70*u.kpc)
        trackletFound, indexes, timenum = findTracklet(indexes, trackletFound, timenum, lines, FOV, df, coord1, speed, date, maxTime, maxResidual, originalAddNoise, angleLim, nullResid, MOIDLim)

    return trackletFound, indexes, timenum


## function to plot RA and Dec
def plot_coords(df, noise_start=False, specframe=False, spectime=False):
    """
    Plots observations and noise. Not fully refined, especially with
    regards to color and scale.
    """
    if not specframe:
        specframe = df.frame.unique()
    if not spectime:
        spectime = df.time.unique()
    if not noise_start:
        noise_start = max(df['line'])

    fig = plt.figure(figsize=(8, 6))
    ax1 = fig.add_subplot(211, projection="mollweide")
    ax2 = fig.add_subplot(212)
    ax1.set_xticklabels(['14h', '16h', '18h', '20h', '22h', '0h', '2h', '4h', '6h', '8h', '10h'])
    ax1.grid(True)
    # ax2.set_xticklabels(['14h', '16h', '18h', '20h', '22h', '0h', '2h', '4h', '6h', '8h', '10h'])
    ax2.set_xlabel('RA (degrees)')
    ax2.set_ylabel('Dec (degrees)')
    # ax2.set_xlim(-61.9, -63.9)
    # ax2.set_ylim(-33.4, -35.4)
    ax2.grid(True)
    colors = itertools.cycle((cm.rainbow(np.linspace(0, 1, 11))))

    for frame in specframe:
        eachframe = df[df['frame'] == frame]
        color = next(colors)
        markers = itertools.cycle(('+', '.', 'o', 'P'))
        for time in spectime:
            eachtime = eachframe[eachframe['time'] == time]
            # find where noise
            noise = eachtime[eachtime['line'] >= noise_start]
            ra = Angle(noise['RA'], unit=(u.hourangle))
            ra = ra.wrap_at(180*u.degree)
            dec = Angle(noise['Dec'], unit=(u.deg))
            ax1.scatter(ra.radian, dec.radian, s=20, color=color, marker='*', facecolors='none', alpha=0.5)
            ax2.scatter(ra.degree, dec.degree, s=20, color=color, marker='*', facecolors='none', alpha=0.5)
            # find where not noise:
            not_noise = eachtime[eachtime['line'] < noise_start]
            ra = Angle(not_noise['RA'], unit=(u.hourangle))
            ra = ra.wrap_at(180*u.degree)
            dec = Angle(not_noise['Dec'], unit=(u.deg))
            ax1.scatter(ra.radian, dec.radian, s=10, color=color, marker=next(markers), facecolors='none', alpha=0.5)
            ax2.scatter(ra.degree, dec.degree, s=10, color=color, marker=next(markers), facecolors='none', alpha=0.5)

    plt.savefig('outputs/coord_plot.pdf')


def findAngle(angleLim):
    """
    Finds angle of a tracklet's trajectory, assuming a flat
    plane of movement, using the cosine rule. Only works with
    tracklets of exactly three observations.
    """
    col_specification = [(32, 43), (44, 55)]
    fo = pd.read_fwf(os.path.expanduser("~/.find_orb/fo.txt"), colspecs=col_specification, header=None)
    fo.columns = ["RA", "Dec"]
    if len(fo) == 3:
        print("RAs:", fo['RA'].iloc[0], fo['RA'].iloc[1], fo['RA'].iloc[2])
        coordA = SkyCoord(ra=fo['RA'].iloc[0], dec=fo['Dec'].iloc[0], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        coordB = SkyCoord(ra=fo['RA'].iloc[1], dec=fo['Dec'].iloc[1], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        coordC = SkyCoord(ra=fo['RA'].iloc[2], dec=fo['Dec'].iloc[2], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        lenAB = coordA.separation_3d(coordB)
        lenBC = coordB.separation_3d(coordC)
        lenCA = coordC.separation_3d(coordA)
        cosine_angle = ((lenAB ** 2) + (lenBC ** 2) - (lenCA ** 2)) / (2 * lenAB * lenBC)
        angle = np.arccos(cosine_angle)
    else:
        angle = angleLim * u.radian
    return angle

A.1.2 add_library.py

Module used for additional functions needed to process NEAT and PTF data.

import numpy as np
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from astropy.coordinates import SkyCoord, Angle, Distance
import astropy.units as u
from tqdm import tqdm, trange  # used to make progress bars for star removal
import itertools  # needed for progress bars
import os
import time
from astropy.time import Time
from datetime import timedelta
from astropy.io import ascii
import pickle
import random


def labelNEAT(new):
    """
    Function labels NEAT data containing confirmed asteroids based on date
    and time of observation. Creates new columns in the input DataFrame:
    'image', 'frame' and 'time'.

    'image': Specifies a best guess of which NEAT image an observation
    comes from. Images are ordinally ranked according to chronology.
    'frame': Specifies a best guess of which field of view an observation
    comes from.
    'time': Specifies a best guess of which image of a particular field of
    view an observation comes from. I.e. if a particular field of view has
    been scanned by NEAT 3 times, 'time' specifies if an observation comes
    from the 1st, 2nd or 3rd time.
    Might be easier to think of 'frame' and 'time' as space and time
    representations. 'frame' specifies where in the sky an observation is
    from; 'time' represents that an observation comes from the nth time
    NEAT has surveyed that part of the sky.

    Not fully refined because of abnormalities in NEAT data, so labelling
    is off in non-specific instances.
    """
    deAsterisk = str.maketrans({"*": r""})
    image = []
    imagenum = 0
    for idx in np.arange(len(new)):
        if idx == 0:
            imagenum = 0
        elif new['date'].iloc[idx] != new['date'].iloc[idx-1]:
            imagenum = imagenum + 1
        image.append(imagenum)
    new = new.assign(image=image)

    # assign 'frame'
    frame = []

    for idx in np.arange(len(new)):
        if new['image'].iloc[idx] == 0:
            framenum = 0
        else:
            framenum = max(frame)+1
            if new['image'].iloc[idx] == new['image'].iloc[idx-1]:
                framenum = frame[idx-1]
            elif new['image'].iloc[idx] != new['image'].iloc[idx-1]:
                # this part is repetitive and hideous
                subset = (new.loc[new['image'] == new['image'].iloc[idx-1]])
                subset2 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 1])
                subset3 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 2])
                subset4 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 3])
                subset5 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 4])
                subset6 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 5])
                subset7 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 6])
                subset8 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 7])
                subset9 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 8])
                subset10 = (new.loc[new['image'] == new['image'].iloc[idx-1] - 9])
                for i in np.arange(len(subset)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-1]
                        break
                for i in np.arange(len(subset2)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset2['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-1]
                        break
                for i in np.arange(len(subset3)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset3['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-1]
                        break
                for i in np.arange(len(subset4)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset4['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-1]
                        break
                for i in np.arange(len(subset5)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset5['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-1]
                        break
                for i in np.arange(len(subset6)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset6['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-len(subset5)-1]
                        break
                for i in np.arange(len(subset7)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset7['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-len(subset5)-len(subset6)-1]
                        break
                for i in np.arange(len(subset8)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset8['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-len(subset5)-len(subset6)-len(subset7)-1]
                        break
                for i in np.arange(len(subset9)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset9['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-len(subset5)-len(subset6)-len(subset7)-len(subset8)-1]
                        break
                for i in np.arange(len(subset10)):
                    if (new['name'].iloc[idx].translate(deAsterisk) == subset10['name'].iloc[i].translate(deAsterisk)):
                        framenum = frame[idx-(len(subset))-len(subset2)-len(subset3)-len(subset4)-len(subset5)-len(subset6)-len(subset7)-len(subset8)-len(subset9)-1]
                        break
        frame.append(framenum)

    new = new.assign(frame=frame)

    new = new.sort_values(['frame', 'date'], ascending=[True, True])
    time = []
    for idx in np.arange(len(new)):
        if (new['frame'].iloc[idx] != new['frame'].iloc[idx-1]) and (new['date'].iloc[idx] != new['date'].iloc[idx-1]):
            timenum = 0
        elif (new['frame'].iloc[idx] == new['frame'].iloc[idx-1]) and (new['date'].iloc[idx] == new['date'].iloc[idx-1]):
            timenum = time[idx-1]
        elif (new['frame'].iloc[idx] == new['frame'].iloc[idx-1]) and (new['date'].iloc[idx] != new['date'].iloc[idx-1]):
            timenum = time[idx-1] + 1
        time.append(timenum)
    new = new.assign(time=time)
    new = new.sort_values(by='date')
    return new


def removeStars(filename, df_allSources, StarFile):
    """
    Removes PTF objects that are within 1" of Pan-STARRS sources.
    Pickles dataframe post-star removal. If post-star removal
    file exists, will use that instead.
    """
    if os.path.exists('inputs/%s_dfStarsRemoved.pickle' % (filename)):
        with open('inputs/%s_dfStarsRemoved.pickle' % (filename), 'rb') as f:
            df_allSources = pickle.load(f)
    else:
        print("reading stars")
        stars = pd.read_csv(StarFile)
        print("removing stars")
        for i in trange(len(stars), desc="loop1"):
            for index, row in tqdm(df_allSources.iterrows(), desc="loop2"):
                c1 = SkyCoord(ra=row["ra"], dec=row["dec"], unit=(u.deg, u.deg))
                c2 = SkyCoord(ra=stars["raMean"].iloc[i], dec=stars["decMean"].iloc[i], unit=(u.deg, u.deg))
                if c1.separation(c2) < Angle(1, unit=u.arcsec):
                    df_allSources.drop(index, axis=0, inplace=True)
        with open('inputs/%s_dfStarsRemoved.pickle' % (filename), 'wb') as f:
            pickle.dump(df_allSources, f)
        # break
    return df_allSources


def tblConvert(filename, StarFile):
    """
    Converts PTF data from IPAC's .tbl format to Pandas DataFrame
    and MPC 80-col format for use with linking code and Find_Orb
    respectively. Pandas DataFrame is pickled.
    """
    if not os.path.exists('inputs'):
        os.makedirs('inputs')
    if os.path.exists('inputs/%s_df.pickle' % (filename)):
        with open('inputs/%s_df.pickle' % (filename), 'rb') as f:
            df = pickle.load(f)
    else:
        raw = ascii.read('%s.tbl' % (filename), format='ipac', guess=False)
        df = raw.to_pandas()
        with open('inputs/%s_df.pickle' % (filename), 'wb') as f:
            pickle.dump(df, f)
    if StarFile:
        df = removeStars(filename, df, StarFile)
    data = pd.DataFrame()
    print("converting .tbl to DataFrame")
    data['name'] = df['sid']
    data['date'] = df['obsmjd'].apply(lambda x: datetime.datetime.strptime(Time(x, format='mjd', out_subfmt='date*').iso, '%Y-%m-%d %H:%M:%S.%f'))
    data['RA'] = df['ra'].apply(lambda x: Angle(x, u.degree).to_string(unit=u.hourangle, sep=' ', precision=2, fields=3))
    data['Dec'] = df['dec'].apply(lambda x: Angle(x, u.degree).to_string(unit=u.degree, sep=' ', precision=1, alwayssign=True, fields=3))
    data['line'] = np.arange(len(df))
    data = data.sort_values(by='date')
    image = []
    imagenum = 0
    print("labelling data")
    for idx in np.arange(len(data)):
        if idx == 0:
            imagenum = 0
        elif (data['date'].iloc[idx-1] + timedelta(seconds=1) < data['date'].iloc[idx]):
            imagenum = imagenum + 1
        image.append(imagenum)
    data = data.assign(image=image)
    data['frame'] = df['ptffield'] - min(df['ptffield'])
    data = data.sort_values(['frame', 'date'], ascending=[True, True])
    times = []
    for idx in np.arange(len(data)):
        if idx == 0:  # this needs to be added in NEAT library, in label_NEAT function
            timenum = 0
        elif (data['frame'].iloc[idx] != data['frame'].iloc[idx-1]) and (data['date'].iloc[idx] != data['date'].iloc[idx-1]):
            timenum = 0
        elif (data['frame'].iloc[idx] == data['frame'].iloc[idx-1]) and (data['date'].iloc[idx] == data['date'].iloc[idx-1]):
            timenum = times[idx-1]
        elif (data['frame'].iloc[idx] == data['frame'].iloc[idx-1]) and (data['date'].iloc[idx] != data['date'].iloc[idx-1]):
            timenum = times[idx-1] + 1
        times.append(timenum)
    data = data.assign(time=times)
    data = data.sort_values(by='date')

    # make MPC 80-col format
    print("converting .tbl to MPC1992")
    filename = "%s_MPC1992" % (filename)
    if os.path.exists("%s.txt" % (filename)):
        os.remove("%s.txt" % (filename))
    f = open("%s.txt" % (filename), "a+")
    for idx in np.arange(len(data)):
        RA = Angle(df['ra'].iloc[idx], u.degree).to_string(unit=u.hourangle, sep=' ', pad=True, precision=2, fields=3)
        Dec = Angle(df['dec'].iloc[idx], u.degree).to_string(unit=u.degree, sep=' ', pad=True, precision=1, alwayssign=True, fields=3)
        date = datetime.datetime.strptime(Time(df['obsmjd'].iloc[idx], format='mjd', out_subfmt='date*').iso, '%Y-%m-%d %H:%M:%S.%f')
        Year = 'C' + date.strftime('%Y')
        Month = date.strftime('%m')
        dt = timedelta(hours=date.hour, minutes=date.minute, seconds=date.second)
        secs_per_day = 24*60*60  # hours * mins * secs
        Day = str(np.around((dt.total_seconds()/secs_per_day), decimals=5)).ljust(7, '0')
        Day = str(str(date.day).zfill(2)) + Day[1:]
        Mag = str(np.around(df['mag_auto'].iloc[idx], decimals=1))
        Obs = 'I41'
        new_line = ' J91tX00B' + Year + ' ' + Month + ' ' + Day + ' ' + RA + ' ' + Dec + ' ' + Mag + ' ' + Obs + '\n'
        f.write(new_line)
    f.close()
    return filename, data


def addNoise(df, noise_lvl, maxSep, filename, lines):
    """
    Function adds noise points to data by randomly choosing an observation
    and placing a point within 1 degree of the chosen observation. The
    noise point uses the chosen observation as a template, so has all the
    non-positional attributes (magnitude, time of observation, etc) of the
    original observation for use in MPC 80-col format. Noise observations
    are appended at the end of file.
    """
    numNoise = int((len(df) * noise_lvl) / (1 - noise_lvl))
    maxLines = len(lines)
    lineNum = len(lines)
    f = open("%s.txt" % (filename), "a")
    maxIm = df['image'].max()
    for i in np.arange(numNoise):
        # generate noise elements
        image = random.randint(0, maxIm)
        field = df[df.image == image]
        field = field[field.line < maxLines]
        template = random.randint(0, len(field)-1)
        templateCoord = SkyCoord(ra=field['RA'].iloc[template], dec=field['Dec'].iloc[template], unit=(u.hourangle, u.deg), distance=70*u.kpc)
        position_angle = np.random.uniform(0, 360) * u.deg
        separation = np.random.uniform(0, maxSep) * u.arcsec
        ranCoord = templateCoord.directional_offset_by(position_angle, separation)
        ranRA = ranCoord.to_string(style='hmsdms', sep=' ', precision=2, alwayssign=False, pad=True)[0:11]
        ranDec = ranCoord.to_string(style='hmsdms', sep=' ', precision=2, alwayssign=True, pad=True)[12:]
        # append noise data to DataFrame
        df = df.append({'name': 'FAKE' + field['name'].iloc[template][4:7],
                        'date': field['date'].iloc[template],
                        'RA': ranRA,
                        'Dec': ranDec,
                        'line': lineNum,
                        'image': field['image'].iloc[template],
                        'frame': field['frame'].iloc[template],
                        'time': field['time'].iloc[template]
                        }, ignore_index=True)
        # append noise data to file
        line_original = lines[field['line'].iloc[template]]
        new_line = line_original[0:5] + 'FAKE' + line_original[9:32] + ranRA + ' ' + ranDec + line_original[56:]
        # print(line_original)
        print(new_line)
        f.write(new_line)
        lineNum = lineNum + 1

    f.close()
    df = df.sort_values(by='date')
    return df

A.1.3 FindPOTATOs.py

The main linking code, FindPOTATOs.

import numpy as np
import pandas as pd
import datetime
from astropy.coordinates import SkyCoord, Angle, Distance
import astropy.units as u
import os
import re  # regular expressions, used to search for noise points in tracklets
import linking_library as ll
import add_library as al
import shutil
import math

print('reading data')
"""
change below based on input and desired outputs.
"""
import configparser
config = configparser.ConfigParser()
config.read('inputs/linking_parameters.ini')

if config['FILE_LOCATIONS']['IPAC_table_file'] and config['FILE_LOCATIONS']['NEAT_file']:
    print('please only specify one file: either NEAT data or IPAC-formatted data')
    quit()
elif (not config['FILE_LOCATIONS']['IPAC_table_file']) and (not config['FILE_LOCATIONS']['NEAT_file']):
    print('please specify one file for analysis')
    quit()
elif config['FILE_LOCATIONS']['IPAC_table_file']:
    ipac = 1
elif config['FILE_LOCATIONS']['NEAT_file']:
    ipac = 0

noise_lvl = float(config['SETTINGS']['Noise_level'])
saveObs = float(config['SETTINGS']['Save_observation'])
maxSep = float(config['SETTINGS']['Maximum_separation'])


if ipac == 1:
    filename = config['FILE_LOCATIONS']['IPAC_table_file']  # 'ptf10_2' # 'onlyGood' # 'ptf_10arcmin'
    starFile = config['FILE_LOCATIONS']['star_file']
    filename, data = al.tblConvert(filename, starFile)
if ipac == 0:
    filename = config['FILE_LOCATIONS']['NEAT_file']
    widths = [
        14,  # name
        18,  # date
        12,  # RA
        12,  # Dec
    ]
    raw = pd.read_fwf('%s.txt' % (filename), widths=widths, header=None)
    raw.columns = ["name", "date", "RA", "Dec"]
    data = pd.DataFrame()
    data['name'] = raw['name']
    data['date'] = raw['date'].apply(lambda x: datetime.datetime.strptime(x[-16:-6], '%Y %m %d') + datetime.timedelta(days=float(x[-6:len(x)])))
    data['RA'] = raw['RA']
    data['Dec'] = raw['Dec']
    data['line'] = np.arange(len(data))
    data = data.sort_values(by='date')
    print('labelling data')
    data = al.labelNEAT(data)

inputTxt = open('%s.txt' % (filename))
lines = inputTxt.readlines()
inputTxt.close()

original = data.copy(deep=True)


if (noise_lvl > 0) & (ipac == 0):
    """
    Datafile is duplicated to create noise file.
    """
    shutil.copy('%s.txt' % (filename), '%s.txt' % (filename + '_noise'))
    filename = filename + '_noise'
    print('adding noise')
    data = al.addNoise(data, noise_lvl, maxSep, filename, lines)

originalAddNoise = data.copy(deep=True)
noiseObsNum = len(data) - len(original)
if not os.path.exists("outputs/"):
    os.mkdir("outputs")

if ipac == 0:
    ll.plot_coords(data, len(lines))

inputTxt = open('%s.txt' % (filename))
lines = inputTxt.readlines()
inputTxt.close()

dates = data.date.unique()

if os.path.exists("outputs/tracklets.txt"):
    os.remove("outputs/tracklets.txt")
if os.path.exists("outputs/OrbitalElements.txt"):
    os.remove("outputs/OrbitalElements.txt")
if os.path.exists("outputs/AllPotentialTracklets.txt"):
    os.remove("outputs/AllPotentialTracklets.txt")

print('finding tracklets and running through find_orb')
print(data)


combinations = 0
correctTracklets = 0
falsePositives = 0


### first pass ###
print('first pass')

for x in np.arange(len(config.sections())-2):
    keyName = 'PARAMETERS' + str(x + 1)
    print('pass #' + str(x+1))
    """
    parameters, can be changed with linking_parameters.ini
    """
    angleLim = float(config[keyName]['Angle_limit'])
    speed = float(config[keyName]['Angular_speed'])
    maxResidual = float(config[keyName]['Maximum_residual'])
    MOIDLim = float(config[keyName]['MOID_limit'])
    nullResid = True
    if (x+1) == 2:
        nullResid = False
        MOIDLim = False
    data, originalAddNoise, combinations, correctTracklets, falsePositives = ll.linking(saveObs, combinations, correctTracklets, falsePositives, lines, data, speed, maxResidual, original, originalAddNoise, angleLim, nullResid=nullResid, MOIDLim=MOIDLim)


if os.path.exists("outputs/tracklets.txt"):
    ll.plotTracklets()
print("unlinked observations")
print(data)
print(noiseObsNum, 'noise observations added')
print(len(original), 'real observations used')
print(combinations, 'total tracklets found')
print(correctTracklets, 'good tracklets found')
print(falsePositives, 'false positives found')
