Computer Science Helps Shield Earth from Asteroids



Bruce Yellin
Data Center Architect
[email protected]

Table of Contents

The Threat
Finding The Threats: A Brief History of Asteroid Detection
How Do We Find Asteroids Today?
Optical Telescopes
Charge-Coupled Device – CCD
Radio and Radar Telescopes
Ground-Based Telescopes
Large Synoptic Survey Telescope - LSST - Optical Telescope
Asteroid Terrestrial-impact Last Alert System – ATLAS – Optical Telescope
Satellite Telescopes
NEOWISE – Optical Telescope
Gaia Space Telescope – Optical Telescope
The Square Kilometer Array – Mankind’s Largest Big Data Challenge – Radio Telescope
Using Hadoop To Spot An Asteroid
3D Asteroid Modeling – Try It Yourself!
Taking Action
High-Performance Computing and Big Data
Conclusion
Appendix - Glossary
Appendix – Draw an Ellipse in Excel
Footnote

Disclaimer: The views, processes or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.

2016 EMC Proven Professional Knowledge Sharing 2

Earth is facing an asteroid threat from outer space, and it isn’t the Arachnids of Klendathu from the 1997 science fiction film Starship Troopers hurling them at our planet. It is a real threat from one of the hundreds of millions of asteroids that orbit the Sun and travel between Mars and Jupiter and beyond. In essence, Earth sits in an asteroid shooting gallery.

Many were caught off guard early Friday, February 15, 2013, when a medium-sized 66-foot-wide meteoroid weighing 28 million pounds (13,000 metric tons) approached Earth at 43,000 mph1. (Meteoroids traveling at 160,000 mph can enter the atmosphere, eventually decelerating to a much slower speed2.) Coming in at a steep 30° angle3, it glowed from friction 23-29 miles above the ground and exploded in the atmosphere 18 miles over Chelyabinsk, Russia, producing a Sun-bright light.

With kinetic explosive energy greater than 20-30 WWII atomic bombs, the shockwave broke glass windows and hurt nearly 2,000 people4. Astronomers never saw the meteoroid coming – it was just too small and it came from behind the Sun so Earth’s telescopes could not detect it. This diagram, constructed after the event, shows the path in yellow-green5. Current estimates indicate there could be as many as 80 million “rocks” of this size6.

[Figure: Chelyabinsk asteroid orbit, showing the Sun, the orbits of Venus, Earth and Mars, and Earth at impact]

“…it came dangerously close to wiping us all out.” – Prof. Brian Cox

In a short 8 day period from March 4-11, 2014, four asteroids silently approached Earth. The largest would have likely wiped out a city the size of London. On March 4, a 380-foot asteroid called “2014 DU110” came within 13 million miles of Earth. The next day, an asteroid discovered by telescope only 5 days earlier named “2014 DX110” passed Earth at about the same distance as the Moon. Given the vastness of space, many would call this a near-miss. On March 6, a 100 foot “2014 EC” asteroid (orbit diagrams to the right7), discovered only 2 days earlier, came within 38,300 miles of our planet – less than 1/6th the distance to the Moon and just above the 22,000 mile geosynchronous orbit of some satellites. According to University of Manchester physicist Dr. Brian Cox, there is an “asteroid with our name on it” and it is only a matter of time before an asteroid large enough to wipe out the human race collides with Earth8.

Asteroid impacts are not rare. While the chance that a large one will obliterate a city is once in a century9, this map shows a total of 556 impacts from 1994-2013, with 26 asteroids, containing a force of 1 to 600 kilotons of TNT, exploding in the atmosphere. By contrast, the Hiroshima atomic bomb equaled 15 kilotons of TNT. One might conclude our current strategy to protect the planet consists of “blind luck”.
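The comparisons with atomic bombs rest on the kinetic energy formula, one half of the mass times the square of the speed. The sketch below applies it to the Chelyabinsk meteoroid using the mass and speed quoted earlier and the 15-kiloton Hiroshima figure. It assumes that all of the pre-entry kinetic energy counts toward the total, which ignores mass lost and speed shed in the atmosphere, so treat the result as a rough upper estimate rather than a measured yield. The function and constant names are illustrative.

```python
# Rough kinetic energy of the Chelyabinsk meteoroid from the figures in the text.
MPH_TO_MPS = 0.44704            # metres per second in one mile per hour
JOULES_PER_KILOTON = 4.184e12   # energy released by one kiloton of TNT
HIROSHIMA_KILOTONS = 15.0       # figure quoted in the text


def impact_energy_kilotons(mass_kg: float, speed_mph: float) -> float:
    """Return the kinetic energy (1/2 * m * v^2) in kilotons of TNT."""
    speed_mps = speed_mph * MPH_TO_MPS
    joules = 0.5 * mass_kg * speed_mps ** 2
    return joules / JOULES_PER_KILOTON


# 13,000 metric tons at 43,000 mph, as quoted for Chelyabinsk.
chelyabinsk_kt = impact_energy_kilotons(mass_kg=13_000 * 1000, speed_mph=43_000)
hiroshima_equivalents = chelyabinsk_kt / HIROSHIMA_KILOTONS

print(f"{chelyabinsk_kt:.0f} kilotons of TNT")
print(f"about {hiroshima_equivalents:.0f} Hiroshima-sized bombs")
```

The result lands in the hundreds of kilotons, which is consistent with the "greater than 20-30 WWII atomic bombs" description.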

In 1908, an asteroid perhaps as big as “2014 CU13” exploded 3-6 miles above the ground near Vanavara, Russia. Called the Tunguska Event, it destroyed a 770 square mile area about 2,200 miles east of Moscow. The damage equaled 10-15 megatons of TNT (up to 1,000 times the energy of the WWII atom bomb). An explosion of that magnitude over a heavily populated area like New York City would wipe it out, kill perhaps a million people, create an unparalleled ecological disaster and plunge the world’s economy into chaos10.


Sixty-five million years ago, as noted by the Alvarez hypothesis11, an asteroid 6-7 miles in diameter (10-12 kilometers) traveling at 45,000 mph (20 km/s)12 struck offshore near the Yucatán Peninsula with the force of three billion WWII atomic bombs13. It created a 15-mile deep, 110-mile wide Chicxulub (Chi’-shoo-loob) crater and a 100-meter (328 feet) tsunami. The impact triggered the planet’s fifth mass extinction event14, eradicating dinosaurs and most other species15, and marked the end of the 350 million-year-old Age of Reptiles16.

Asteroids of this size hitting Earth would convert kinetic energy into an instantaneous inferno with “hot-coal colored” rocks shooting into the sky eventually causing global firestorms. Ash would fill the air and block out the sun. Food and breathable air would be gone. If this happened today, perhaps landing further offshore, U.S. Gulf states like Florida, Alabama, Mississippi, Louisiana and Texas might disappear underwater. The human race would be extinct.

While astronomers believe the chance of a devastating strike is small, one seems inevitable. And if one does hit, mankind would be eradicated. Earth needs an approach that gives scientists and leaders enough notice to deflect an asteroid when it is millions of miles away. We are scanning the skies for asteroids. We have plans to protect the human race. Asteroid defense is a big data analysis problem.

BIG DATA: “When accumulated data exceeds the capacity or capture rate of local resources, local storage and manipulation is impractical at best, impossible at worst.”17

The Threat

Asteroids are minor planets that orbit our part of the Solar System in 4 distinct regions. The main asteroid belt contains millions of bodies 200 million miles from the Sun and is found between the orbits of Mars and Jupiter18. There are also Trojan groups which pace and follow Jupiter by ±60°, a Kuiper belt or region which ranges from 2,800 to 4,650 million miles away19, and the Oort cloud which is thought to be 100,000 AU or 9,300 billion miles from the Sun20. This image shows the expected location of the main asteroid belt (shown in red/pink in this diagram) and the Trojan group (green in the diagram) on June 28, 2016.21

[Figure: the inner Solar System with Mercury, Venus, Earth, Mars and Jupiter, the main asteroid belt and the Trojan group of asteroids. Callout: the main asteroid belt is 100 million miles wide and 111 million miles from Earth.]


While most asteroids “peacefully” orbit the Sun, there are those that travel through our inner solar system and are of primary concern should they strike the Earth. These are called Near Earth Asteroids (NEAs), and together with other Near Earth Objects (NEOs) such as comets, they create a hazard ranging from fireballs in the sky to the dinosaur extinction documented by Alvarez.

For the most part, asteroids are 4.5 billion-year-old rotating, irregular solar system building blocks. They are sometimes called planetoids. Comprised of clay, silicates, and nickel-iron, they can weigh from 1,200 billion billion tons (5,000 times lighter than Earth)22 in the case of the largest called Ceres, down to the weight of a car or even a pebble. They can also be as large as Ceres’s 590-mile diameter (Earth’s diameter is 7,918 miles). About 10 million NEAs are larger than 10 meters wide while many millions of asteroids are tiny with little mass.23

Asteroid Size (Diameter)      Quantity
A few hundred miles           Several dozen
Tens of miles                 Hundreds
A few miles                   Thousands
Large fraction of a mile      Tens of thousands
Small fraction of a mile      Hundreds of thousands
Source: http://cseligman.com/text/asteroids/sizedistribution.htm

Current asteroid hunting initiatives mainly scan space for objects larger than 1 kilometer – 3,280 feet – or about 500 feet higher than Burj Khalifa in Dubai, the world’s tallest building. Astronomers estimate they have found about 95% of civilization-ending asteroids24.

With asteroids 30 feet wide passing near our Moon every week, a study that examined the last 20 years of data from global nuclear weapons testing sensors concluded that perhaps 60 asteroids approaching 20 meters in size have hit Earth's atmosphere, exceeding previous estimates25. In 2005, the U.S. Congress instructed NASA to find 90% of the asteroids 140 meters wide or larger (1.5 football fields long) by the year 202026, but as of late 2014, they have only found 10% of them27. There is no mandated program for asteroids smaller than 500 feet long.

The Minor Planet Center (MPC) maintains a database of over 140 million asteroid observations and tracks over 700,000 asteroids28. Orbit calculations must be constantly revised because orbits change (for example, when objects collide). The following Hubble Space Telescope image shows the 460-foot diameter asteroid “P/2010 A2” gaining a dust and gravel trail after being struck by another asteroid29, undoubtedly changing its orbit. It is presently beyond our “big data” technology to comprehensively monitor all of the main asteroid belt activity.

An asteroid’s path can also be altered by the Yarkovsky effect – when the Sun warms a rotating asteroid, the heat is later radiated away in a different direction, giving the asteroid a tiny push30. Accurate orbit predictions require that all of these influences be tracked. From Earth, one way to track an asteroid’s rotation is by observing the timing of light reflecting off its surface. Spherical asteroids have a fairly constant amount of reflected light31. Asteroid occultation, occurring when an asteroid passes in front of a star temporarily blocking its light, can also help us measure its size, shape and exact position32.

Finding The Threats: A Brief History of Asteroid Detection

If astronomers could predict meteoroid and asteroid strikes years in advance, Earth would conceivably have time to prepare for the disaster or possibly even prevent it. It all starts with finding the threats, and the first such discovery occurred in 1801.

An Italian astronomer, Giuseppe Piazzi, was in Palermo searching the Italian sky with the telescope to the left, looking to prove a then-prevailing theory that a planet orbited between Mars and Jupiter33. He recorded the position of a small dot of light on January 1, 1801, along with angular measurements and exact times as shown in the table below. (A precursor to today’s rows and columns in Excel and database theory, the use of data tables to record information can be traced to the Sumerians of 3100 BC34). He wasn’t sure if it was a star or a comet35. On subsequent nights, he observed the dot move from its original position and in front of known stars. Overall, he made 22 observations of a large object over 41 days until it disappeared behind the Sun on February 11, 1801. He named the object Ceres Ferdinandea in honor of the Roman era goddess of agriculture (Ceres or Cerere in Italian) and King Ferdinand of Sicily36, although it was later known as Ceres. After Piazzi published his data, other astronomers tried to find the object in the August and September sky, without success.

A 24-year old German mathematician, Carl Friedrich Gauss, studied the complex problem, taking into account that Piazzi’s observations were made (1) from an Earth rotating once every 24 hours, (2) while the planet was moving along an elliptical orbit around the Sun, and (3) while the object itself was also orbiting the Sun. Gauss needed to understand the object’s orbit through an ever changing, time-sensitive set of motions.

In general, the orbit of a planet or asteroid is described by how closely it resembles a circle, ellipse or parabola. This is called eccentricity and measures the deviation from a circle, which has an eccentricity of 0. A hyperbola has an eccentricity greater than 1, a parabola has an eccentricity of 1, and an ellipse is between a parabola and a circle. [NOTE: If you would like to try your hand at constructing an ellipse, please see the appendix.] No one knew what type of orbit Ceres was following, but Gauss assumed it was elliptical - i.e. an eccentricity between 0 and 1. Mathematicians and astronomers had no known methods to compute an elliptical orbit from available observations.

[Figure: an asteroid’s elliptical orbit around the Sun, labeled with aphelion, perihelion and semi-major axis]
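The eccentricity ranges can be written down directly. The sketch below classifies an orbit from its eccentricity and, for an ellipse, computes the closest and farthest distances from the Sun using the standard relations perihelion = a(1 - e) and aphelion = a(1 + e), where a is the semi-major axis. The function names and the sample values (a = 2.77 AU, e = 0.08, chosen to be roughly Ceres-like) are illustrative assumptions, not figures from this article.

```python
from typing import Tuple


def classify_orbit(eccentricity: float) -> str:
    """Name the conic section that corresponds to an eccentricity value."""
    if eccentricity < 0:
        raise ValueError("eccentricity cannot be negative")
    if eccentricity == 0:
        return "circle"
    if eccentricity < 1:
        return "ellipse"
    if eccentricity == 1:
        return "parabola"
    return "hyperbola"


def apsides(semi_major_axis_au: float, eccentricity: float) -> Tuple[float, float]:
    """Return (perihelion, aphelion) in AU for a circular or elliptical orbit."""
    if not 0 <= eccentricity < 1:
        raise ValueError("apsides are only defined here for closed orbits (0 <= e < 1)")
    perihelion = semi_major_axis_au * (1 - eccentricity)
    aphelion = semi_major_axis_au * (1 + eccentricity)
    return perihelion, aphelion


# Illustrative, roughly Ceres-like values (assumed for this example).
perihelion_au, aphelion_au = apsides(semi_major_axis_au=2.77, eccentricity=0.08)
print(classify_orbit(0.08), round(perihelion_au, 3), round(aphelion_au, 3))
```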

From Piazzi’s 22 observations, Gauss decided to work with only three37: those from January 2, January 22, and February 11. The actual orbit of the Earth was well understood in 1801, so Gauss could pinpoint Piazzi’s position for these observations. Using the exact time to the fraction of a second, and two angles down to the tenths of seconds of arc, but lacking the distance from Palermo to the white dot, Gauss was able to construct 11 equations in 6 unknowns and solve this complex problem using a “least squares” approximation method he had developed years earlier to analyze the Moon’s orbit.

Piazzi’s Ceres observations used in Gauss’s calculations
Date             Time (HH:MM:SS)    Right Ascension    Declination
Jan 2, 1801      08:39:04.6         51º 47′ 49″        15º 41′ 05″
Jan 22, 1801     07:20:21.7         51º 42′ 21″        17º 3′ 18″
Feb 11, 1801     06:11:58.2         54º 10′ 23″        18º 47′ 59″


Least squares can help estimate an orbit when there are more equations than unknowns. It is often used to determine the approximate shape and direction of a best fitting curve with a given set of points. This is done by minimizing the sum of the squares of the offsets of the data points from the curve. On the left is an example of red data points and the resulting blue curve that could be drawn as the line that would best represent the points. In Gauss’s case as shown on the right, using just 3 observation points could mean the object is traveling through space in a circular, parabolic, elliptical, or hyperbolic curve. Gauss leveraged the work of Johannes Kepler almost two centuries earlier and assumed Ceres followed an elliptical orbit.
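To make the idea of minimizing squared offsets concrete, here is a minimal sketch that fits a straight line to a handful of points. It is a deliberately simple stand-in: Gauss fitted an orbit, not a line, and the data points below are made up for illustration. The closed-form slope and intercept used here are the standard least-squares solution for a line.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    denominator = n * sum_xx - sum_x ** 2   # zero only if every x is identical
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def sum_squared_offsets(points, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Made-up observations that roughly follow y = 2x + 1.
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
slope, intercept = fit_line(data)
best = sum_squared_offsets(data, slope, intercept)

# Any other line should do no better than the fitted one.
worse = sum_squared_offsets(data, slope + 0.1, intercept)
print(slope, intercept, best < worse)
```

Nudging the slope or the intercept away from the fitted values always increases the sum of squared offsets, which is what "best fitting" means here.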

On November 25, 1801, astronomers were able to find Ceres in the sky not far from where Gauss had predicted it would be38. The basis of Gauss’s calculations is still used today to calculate post-flight trajectory simulations of solid and liquid fueled rockets39.

As an asteroid, it was soon given the name “1 Ceres” as early discoveries were given a number followed by a mythical name such as 2 Pallas, 3 Juno, 4 Vesta, and so on40. Over time, the MPC adopted other naming conventions including a provisional designation and a permanent designation. These names can be confusing. The table below explains the provisional designation for asteroid “2012 DA14”, discovered on February 23, 2012.41 Permanent numbers are assigned by the International Astronomical Union (IAU) when the object has enough observations to ensure it can be found at another time.

Example: The meaning behind the name of asteroid "2012 DA14"
Year: 2012
First letter (half-month of discovery, starting on the 1st or the 16th; the letter I is not used):
A Jan 1, B Jan 16, C Feb 1, D Feb 16, E Mar 1, F Mar 16, G Apr 1, H Apr 16, J May 1, K May 16, L Jun 1, M Jun 16, N Jul 1, O Jul 16, P Aug 1, Q Aug 16, R Sep 1, S Sep 16, T Oct 1, U Oct 16, V Nov 1, W Nov 16, X Dec 1, Y Dec 16
Second letter (order within the half-month; the letter I is not used):
A 1, B 2, C 3, D 4, E 5, F 6, G 7, H 8, J 9, K 10, L 11, M 12, N 13, O 14, P 15, Q 16, R 17, S 18, T 19, U 20, V 21, W 22, X 23, Y 24, Z 25
Subscript 14: Multiply the number by 25 and add the value of the second letter (A = 1). So 14 becomes 14*25+1 = 351.
As a result, asteroid "2012 DA14" was the 351st object found in 2012 in the 2nd half of February.
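The provisional designation scheme is mechanical enough to decode in a few lines. The following sketch follows the rules described for "2012 DA14": the first letter selects the half-month, the second letter gives a position from 1 to 25, and the trailing number counts how many times the 25 letters have already been used. The function name and the returned dictionary layout are illustrative choices, and the parser only handles the simple "year, space, two letters, optional number" form.

```python
import re

HALF_MONTH_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXY"   # 24 half-months, I is skipped
ORDER_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"       # 25 positions, I is skipped
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def decode_provisional(designation: str) -> dict:
    """Decode a provisional designation such as '2012 DA14'."""
    match = re.fullmatch(r"(\d{4}) ([A-HJ-Y])([A-HJ-Z])(\d*)", designation)
    if match is None:
        raise ValueError(f"not a recognized provisional designation: {designation!r}")
    year, half_letter, order_letter, cycles = match.groups()

    half_index = HALF_MONTH_LETTERS.index(half_letter)
    month = MONTHS[half_index // 2]
    half = "first" if half_index % 2 == 0 else "second"

    # Each completed pass through the 25 letters adds 25 to the running count.
    order = int(cycles or 0) * 25 + ORDER_LETTERS.index(order_letter) + 1
    return {"year": int(year), "month": month, "half": half, "order": order}


print(decode_provisional("2012 DA14"))
# {'year': 2012, 'month': 'Feb', 'half': 'second', 'order': 351}
```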


How Do We Find Asteroids Today?

Telescopes are designed to receive electromagnetic waves of particular wavelengths. We are very familiar with visible light, which allows us to see colors in the 400–700 nanometer (nm) wavelength range, but there are many wavelengths that we cannot see. There are shorter X-ray and ultraviolet wavelengths, as well as longer infrared and radio wavelengths.
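Wavelength and frequency describe the same wave and are tied together by the speed of light: frequency equals the speed of light divided by the wavelength. The short sketch below converts the ends of the visible range quoted above into frequencies. The helper names are illustrative, and the visible-range check simply uses the 400–700 nm limits from the text.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometers to a frequency in terahertz."""
    wavelength_m = wavelength_nm * 1e-9
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m / 1e12


def is_visible(wavelength_nm: float) -> bool:
    """True if the wavelength falls inside the 400-700 nm range quoted in the text."""
    return 400 <= wavelength_nm <= 700


violet_thz = wavelength_nm_to_thz(400)   # short-wavelength end of visible light
red_thz = wavelength_nm_to_thz(700)      # long-wavelength end of visible light
print(round(violet_thz), round(red_thz), is_visible(550), is_visible(1050))
```

Shorter wavelengths mean higher frequencies, so the 400 nm end of the visible range comes out near 750 THz and the 700 nm end near 430 THz.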

Optical telescopes are either ground-based or space-based, use lenses or mirrors, and are generally designed to capture light in the infrared through X-ray spectrum. Their images can be affected by atmospheric distortions, so they are often located on high mountain tops to minimize the interference, or in space42. Asteroids appear much brighter in infrared than in visible light.43 Radio telescopes are only found on Earth, and use parabolic receivers to capture long wavelengths. Asteroids that reflect sunlight can be seen by optical telescopes while very dark non-reflective asteroids are best viewed by a radio telescope. This set of Crab Nebula images shows the amount of information available in each of the wavelengths44.

[Figure: the Crab Nebula imaged in radio wave, infrared, visible light, ultraviolet and X-ray]

Optical Telescopes

There are three basic types of optical telescopes – refractor, reflector, and compound. Refractor telescopes have a large glass lens on their farthest end allowing light to be bent (refracted) to the focal point and magnified when viewed through the eyepiece45. Isaac Newton invented the reflector telescope. Light bounces (reflects) off a rear mirror until it reaches a flat mirror. It is then directed to the eyepiece after reaching the focal point. The compound or catadioptric telescope uses reflecting and refracting to reduce optical error. Light is bounced off a curved mirror in the back, then bent by a lens towards the front, and finally sent backward again through its focal point and out the eyepiece.

Charge-Coupled Device – CCD

This miracle of integrated circuits revolutionized the world of photography and optical telescope-based astronomy. Up until 1980, modern astronomers relied on film cameras. Invented at Bell Labs in 1969 for use as a memory device46, the CCD ushered in the era of digital photography, which meant images could be transmitted and digitally stored on a disk. This is the same camera technology that we now take for granted in our smartphones. Whereas film uses silver halides suspended in an emulsion to capture certain wavelengths of photons, the silicon CCD transforms photons into electric signals. Without the CCD and powerful processors with large memory capacity, telescopes such as the Hubble Space Telescope would be nearly impossible, since they could not practically rely on film for imagery.

A CCD contains an array of photodiodes that essentially absorb photons of light and convert them into a measurable electrical charge47. Comprised of silicon, they absorb photons and store the resulting charge like a capacitor such that the greater the number of photons, the higher the electrical charge. In rapid succession, single pixels contained in shifting rows of image information are processed by dedicated circuits and handed off to a serial shift register – something that assembler language programmers are very familiar with. Electron packets accurately timed by a horizontal shift register clock are shifted one row at a time to an output amplifier which registers the photodiode charge. After the array has been exposed to light, the values are stored in memory - see the illustration to the left48.
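The row-by-row readout can be mimicked with ordinary lists. The sketch below is a toy model only: it treats each pixel's charge as a number, moves the bottom row into a serial register, and then shifts that register out one pixel at a time to a single "output amplifier". Real devices differ in clocking, direction and amplifier count, and the function name and sample charges are invented for illustration.

```python
def read_out_ccd(charge_rows):
    """Simulate CCD readout.

    Returns (image, readout_order): the reassembled image and the sequence
    in which individual charge packets reached the output amplifier.
    """
    rows = [list(row) for row in charge_rows]   # copy so the input is untouched
    image = []
    readout_order = []
    while rows:
        serial_register = rows.pop()            # bottom row shifts into the register
        line = []
        while serial_register:
            packet = serial_register.pop(0)     # one pixel shifts to the amplifier
            readout_order.append(packet)
            line.append(packet)
        image.append(line)
    image.reverse()                             # rows arrived bottom-first
    return image, readout_order


# A tiny 2 x 3 sensor with made-up charge values.
sensor = [[10, 20, 30],
          [40, 50, 60]]
image, readout_order = read_out_ccd(sensor)
print(image)          # [[10, 20, 30], [40, 50, 60]]
print(readout_order)  # [40, 50, 60, 10, 20, 30]
```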


The CCD memory images are bitmap (raster) graphics – a series of black and white dots (pixels). The images lend themselves to a table layout similar to Excel’s (x, y) addressing scheme of rows, columns, and cells. This allows the data to easily be manipulated using most computer languages. In this simple example, you see a magnified asteroid shape translated into a 1-bit matrix image of zeroes and ones.

A 1-bit asteroid representation:
0 0 0 0
0 1 1 0
0 0 1 1
0 0 1 0

With an 8-bit image, up to 256 shades of gray can be represented in each cell based on the electron charge of each pixel. More bits per pixel mean finer shading and a larger disk storage requirement.
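Because the image is just a grid of numbers, a few lines of code can hold and inspect it. This sketch stores the 1-bit asteroid example as a list of rows, lists the lit pixels, and estimates how storage grows with bit depth. The zero-based (row, column) addressing and the helper names are assumptions made for the example.

```python
# The 1-bit asteroid example: 1 marks a pixel that recorded light.
asteroid_1bit = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]


def lit_pixels(bitmap):
    """Return the (row, column) address of every non-zero pixel."""
    return [(r, c)
            for r, row in enumerate(bitmap)
            for c, value in enumerate(row)
            if value]


def storage_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of an image, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


print(lit_pixels(asteroid_1bit))   # [(1, 1), (1, 2), (2, 2), (2, 3), (3, 2)]
print(storage_bytes(4, 4, 1))      # 2 bytes for the 1-bit example
print(storage_bytes(4, 4, 8))      # 16 bytes for the same grid at 8 bits
print(storage_bytes(100, 100, 8))  # 10000 bytes for a 100 x 100 8-bit image
```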

The material used to build the CCD photodiode dictates the wavelength it records. For example, a silicon photodiode captures light in the 190 - 1100 nm range of the electromagnetic spectrum, which includes visible light.

Photodiode material           Wavelength (nm)
Silicon                       190–1100
Germanium                     400–1700
Indium gallium arsenide       800–2600
Lead(II) sulfide              <1000–3500
Mercury cadmium telluride     400–14000
Source: https://en.wikipedia.org/wiki/Photodiode

Fairchild Semiconductor produced the first commercial CCD in 1973. With a resolution of 100 x 100 pixels (~10 KB), it was used in a telescope the following year49. In 1975, Kodak built the first digital camera. It weighed 8 pounds and recorded a 0.01 megapixel (100 x 100 pixels) black and white photo to cassette tape (shown to the right of the blue body of the camera) in 23 seconds50. In comparison, the iPhone 6s incorporates a 12-megapixel camera51.

Color filters enable a grayscale CCD to record color images. A red filter allows only red light to pass through to the pixel, a green filter absorbs all the colors of visible light except for green, and so forth. CCD pixels can be arranged in a mosaic with discrete color “Bayer” filters as shown to the left, with each pixel mapped to a primary color.


Multi-chip mosaics are a cost-effective way to gain the advantages of a much larger CCD or can be used to build a camera with far greater resolution than might be available with a single chip design. The image to the right is from the wide-field Chilean VLT Survey Telescope that uses 32 CCD chips, each with 2K x 4K pixels, making the entire mosaic a 16K-by-16K array, or 268 megapixels52.
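The mosaic arithmetic is easy to check. The sketch below multiplies out the chip count and chip dimensions, assuming that "K" means 1,024 pixels (an assumption, since the text does not define it), and confirms that 32 chips of 2K x 4K cover the same number of pixels as a 16K-by-16K array.

```python
K = 1024  # assumption: "K" in chip dimensions means 1,024 pixels


def mosaic_pixels(chip_count: int, chip_width: int, chip_height: int) -> int:
    """Total pixels in a mosaic of identical CCD chips."""
    return chip_count * chip_width * chip_height


vst_pixels = mosaic_pixels(chip_count=32, chip_width=2 * K, chip_height=4 * K)
vst_megapixels = vst_pixels / 1_000_000

print(vst_pixels)               # 268435456
print(round(vst_megapixels))    # 268
print(vst_pixels == (16 * K) ** 2)
```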

Radio and Radar Telescopes

All telescopes capture photons. Optical telescopes capture photons with a wavelength of about 390-700 nm (violet to red) and record them with a CCD camera. Radio telescopes capture the longest wavelengths, typically 1 millimeter up to hundreds of meters, and do not use a CCD camera.

Even though the same object in the sky emits photons across all wavelengths, our eyes can only process certain wavelengths – i.e., we cannot see or hear a radio wave. The parabolic shape of the radio dish antenna focuses the low energy photons at the antenna. The antenna absorbs the energy and hands the weak space signal to an amplifier. From there, the signals are usually recorded on a disk drive and processed by computer.

When used as radar, radio telescopes detect asteroids (or any other object) by initially sending a signal into space, and if it bounces off an asteroid, the antenna receives that signal – a “ping” and “echo”. The amount of time the radio wave takes to make the round trip is used to calculate the distance from the dish to the asteroid. The technique is called ranging and is the basis of RADAR (Radio Detection and Ranging).


The following set of 5 images is based on the work of Emily Lakdawalla53 and depicts a radio dish sending a signal towards the asteroid. The asteroid is moving, rotating and irregularly shaped. The signal bounces off the closest part of the asteroid first, with subsequent waves bouncing back as they reach the farthest portions of the asteroid. As the dish receives and processes the reflected signals, a waveform image of the asteroid begins to appear. Eventually, the dish receives the entire reflected signal, including those parts bouncing off the farthest face of the asteroid.

[Figure: five panels plotting wavelength against time. The radio dish sends a signal broadcast at one wavelength; the signal reflects from the closest parts of the asteroid first; reflected wavelengths are compressed from parts rotating towards the antenna and extended from parts rotating away; the radio dish sees return signals at many wavelengths around the broadcast wavelength.]

Since the object is irregular, rotating, and moving (left to right, near to far, etc.), the imagery taken over days would show multiple facets of the asteroid. For example, in this radar image taken of asteroid “2007 PA8”, these 9 reflected images were taken over a 2 week period and show multiple sides of this rotating and moving object.

From the orbit diagram of November 5, 2012, the asteroid came within 0.0472 AU or about 4.4 million miles of the radar dish on Earth54. (Earth’s “white” orbit appears next to the 2007 PA8 “blue” orbit.) Processing the radar image makes it possible to estimate the size of the asteroid and its movement since the radio signals are transmitted and received at the speed of light.
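Ranging comes down to one line of arithmetic: the signal travels out and back at the speed of light, so the distance is the speed of light times the round-trip time, divided by two. The sketch below runs that calculation in both directions for the 0.0472 AU approach of 2007 PA8. The function names are illustrative, and the conversion constants are the standard values for the astronomical unit and the mile.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
KM_PER_AU = 149_597_870.7
KM_PER_MILE = 1.609344


def range_from_echo(round_trip_seconds: float) -> float:
    """Distance in km to a target, given the ping-to-echo delay."""
    return SPEED_OF_LIGHT_KM_PER_S * round_trip_seconds / 2


def echo_delay(distance_km: float) -> float:
    """Round-trip travel time in seconds for a radar signal."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S


distance_km = 0.0472 * KM_PER_AU          # closest approach quoted for 2007 PA8
distance_miles = distance_km / KM_PER_MILE
delay_s = echo_delay(distance_km)

print(f"{distance_miles / 1e6:.1f} million miles")
print(f"echo returns after about {delay_s:.0f} seconds")
```

At that distance the echo takes a little over three quarters of a minute to return.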

With a radar telescope, astronomers are not tied to reflected sunlight or radiation. By bouncing a signal off an object, day or night, clear sky or cloudy, they illuminate it with radio waves, allowing them to evaluate its intensity, direction, orbit and other deduced data.


Ground-Based Telescopes

Telescopes can be located on Earth or in space, with pros and cons for each approach. For example, Earth-bound telescopes can use very large mirrors such as the 10-meter mirror in the Keck Observatory in Hawaii whereas the Hubble Space Telescope uses a 2.4-meter mirror. Larger mirrors gather more light and ground telescopes generally cost less. Space-based telescopes are free from Earth’s atmospheric distortions and can capture a wider range of wavelengths, including light that would normally be filtered out by our atmosphere55. With that in mind, let’s take a look at some of the major telescopes in use and their standing in the big data era.

Large Synoptic Survey Telescope - LSST - Optical Telescope

Scheduled to be operational in January 2022, the LSST will photograph space from Earth every few nights to find asteroids and perhaps unlock the nature of dark energy. Using a wide field of view telescope to record images to its 3.2 gigapixel CCD camera, the LSST will take about 800 panoramic images a night equaling 15 TB of raw data every day56. To put that into perspective, the Sloan Digital Sky Survey (SDSS) in 2000 gathered in just a few weeks more data than throughout the then-history of astronomy. In a matter of a few days, the LSST will gather more data than the entire SDSS project57.

Over its ten year mission, hundreds of petabytes will be processed to produce 60 PB of data and a 15 PB database catalog, thereby creating a 3D map of space effectively allowing a user to “fly” through space58. The camera will take a 15-second exposure every 20 seconds59 covering 6 wavelengths from 320 nm near ultraviolet to 1050 nm near infrared, and is expected to take over 200,000 pictures a year occupying well over a petabyte of uncompressed disk space.
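The "well over a petabyte" figure can be sanity-checked from the camera size and the yearly image count. The sketch below assumes each pixel is stored as 2 bytes (16 bits), which is a guess about the raw format rather than a figure from this article, and uses decimal units (1 GB = 10^9 bytes).

```python
PIXELS_PER_IMAGE = 3.2e9     # 3.2 gigapixel camera, as stated in the text
IMAGES_PER_YEAR = 200_000    # "over 200,000 pictures a year"
BYTES_PER_PIXEL = 2          # assumption: 16 bits stored per pixel

image_gb = PIXELS_PER_IMAGE * BYTES_PER_PIXEL / 1e9   # size of one uncompressed image
yearly_pb = image_gb * IMAGES_PER_YEAR / 1e6          # one year of images in petabytes

print(f"{image_gb:.1f} GB per image")
print(f"{yearly_pb:.2f} PB per year")
```

Under that assumption a single image is several gigabytes and a year of images exceeds a petabyte, in line with the estimate above.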

The LSST camera uses 189 4K x 4K CCD chips arranged in a mosaic focal plane. In this image, you can see the 21 replaceable electronic assemblies (called rafts), with each raft containing 9 CCD chips in a 3 x 3 mosaic. If you look at the center raft, you will see the addressing scheme also uses (x, y) with (0, 0) in the lower left and (2, 2) in the upper right.


The LSST’s camera is enormous. Pictured to the left, it weighs 6,200 pounds, and is 5.5 feet tall and 9.8 feet wide. On the right is a picture of a staffer showing the relative size of the CCD mosaic.

The LSST will create unprecedented volumes of high-quality data – more than astronomers can manually process every night. It will mark a revolution in how humans will explore space through computer science. This effort is classified as a big data problem as the management and data mining of this real-time data is paramount for astronomers to interpret the observations. Initial processing is estimated to require 3,000 16-core compute nodes at the telescope’s location in Chile60. In 60 seconds, the captured image data must undergo a multi-step parallel processing reduction to find asteroids and other moving objects, all before the next batch of data comes in61. Once a day, raw data and metadata are sent 5,000 miles to a supercomputer at the University of Illinois to be reprocessed and archived. Archiving the data will initially require 150 teraflops of compute power, growing to nearly a petaflop by the 10th year, and use 15 PB of disk space a year. The immense volume of data must be statistically analyzed for low-level correlations to help reverse-engineer the results and determine the cause and underlying cosmic physics – this is called the “inverse problem”62.

The 2010 prototype used 200,000 lines of C++ and Python code.63 “The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching, and querying of large survey catalogs (>10^9 rows, >1 TB).”64 The processing complex is estimated to have a source catalog of 350 billion rows and an object catalog of 37 billion rows, each with 200+ attributes, all representing 400,000 16-megapixel images65. The LSD uses partitioned tables stored as compressed Hierarchical Data Format 5 (HDF5) files. HDF5 uses B-trees to index table objects and works well with 3D data for faster access than the rows of an SQL database. HDF5 can represent complex data objects and metadata much more simply and quickly than a star schema66,67. “Vertically, the tables are partitioned into sets of related columns (‘column groups’), grouping together logically related data (e.g. astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping “cells” by position in space (lon, lat) and time (t).”68
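The vertical and horizontal partitioning described in that quotation can be pictured with a small in-memory sketch. This is not the LSD API and does not use HDF5: it is plain Python that buckets rows into cells by (lon, lat, t) and splits each row into column groups. The cell sizes, the column names (lon, lat, t, mag) and the function names are all hypothetical, and the cells here do not overlap, which is a simplification of the scheme quoted above.

```python
from collections import defaultdict

# Hypothetical column groups: which columns are stored together.
COLUMN_GROUPS = {
    "astrometry": ("lon", "lat", "t"),
    "photometry": ("mag",),
}


def cell_key(lon_deg, lat_deg, t_days, cell_deg=10.0, cell_days=30.0):
    """Map a position and time to the index of the cell that contains it."""
    return (int((lon_deg % 360) // cell_deg),
            int((lat_deg + 90) // cell_deg),
            int(t_days // cell_days))


def partition(rows):
    """Group rows horizontally by cell and vertically by column group."""
    cells = defaultdict(lambda: {group: [] for group in COLUMN_GROUPS})
    for row in rows:
        key = cell_key(row["lon"], row["lat"], row["t"])
        for group, columns in COLUMN_GROUPS.items():
            cells[key][group].append(tuple(row[c] for c in columns))
    return dict(cells)


# Made-up catalog rows for illustration.
rows = [
    {"lon": 12.5, "lat": -3.0, "t": 5.0, "mag": 18.2},
    {"lon": 14.0, "lat": -1.0, "t": 10.0, "mag": 19.0},
    {"lon": 200.0, "lat": 45.0, "t": 40.0, "mag": 17.5},
]
cells = partition(rows)
print(sorted(cells))                    # [(1, 8, 0), (20, 13, 1)]
print(cells[(1, 8, 0)]["photometry"])   # [(18.2,), (19.0,)]
```

A query for one patch of sky and one time window then only needs to open the matching cells, and only the column groups it actually uses.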


Asteroid Terrestrial-impact Last Alert System – ATLAS – Optical Telescope

ATLAS was designed to be Earth’s asteroid collision “early warning” system. It scans space to provide a day's warning for 30-kiloton "town killer” asteroid impacts, a week’s notice for a 5-megaton 150-foot diameter "city killer" asteroid, and three weeks of warning for a 100-megaton 390-foot "county killer” strike69. (NOTE – the Chelyabinsk meteor had an estimated mass of 13,000 metric tons and a width of 66 feet). ATLAS’s first discovery (composite image to the right) came on August 9, 2015, when it spotted asteroid “2015 PE312”, estimated to be 200-500 feet in diameter based on its brightness70.

If ATLAS provides enough lead time, authorities can evacuate an impact area, or a tsunami zone if the object strikes the ocean. With two ground-based telescopes 100 miles apart, ATLAS robotically scans the sky four times every night seeking out NEOs by looking for movement against the background of stars and galaxies. ATLAS may eventually have 8 telescopes.

The ATLAS system can analyze 500 MB/min to make detailed comparisons of images taken one hour apart71. The telescope observes the same area of space four times before software combines them into a single image. As this illustration shows, algorithms subtract static “stars” and “planets” leaving only objects that appear to be moving. Objects moving in a straight line between images become “suspect” asteroids. With a “suspect” asteroid, the system searches a database in real-time for this object using its coordinates and brightness data and issues an alert within 10 minutes after analysis72. More on this critical step in the section “Using Hadoop To Spot An Asteroid”.

[Figure: 4 CCD images taken minutes apart are combined; the static image is subtracted from the combined images, leaving possible asteroids]
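The subtract-and-look-for-motion idea can be sketched with small grids of brightness values. This is not ATLAS's actual pipeline: as a stand-in, the static sky is estimated as the per-pixel minimum across the frames (anything present in every frame survives, anything that moves does not), each frame is compared against it, and the surviving detections are tested for straight-line motion. The threshold, frame size and function names are invented for the example.

```python
def static_background(frames):
    """Per-pixel minimum across frames: light present in every frame is treated as static."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[min(frame[r][c] for frame in frames) for c in range(width)]
            for r in range(height)]


def moving_detections(frames, threshold=50):
    """For each frame, list the pixels that are brighter than the static background."""
    background = static_background(frames)
    return [[(r, c)
             for r, row in enumerate(frame)
             for c, value in enumerate(row)
             if value - background[r][c] > threshold]
            for frame in frames]


def is_straight_line(points):
    """True if all points are collinear (cross-product test)."""
    (r0, c0), (r1, c1) = points[0], points[1]
    return all((r1 - r0) * (c - c0) == (c1 - c0) * (r - r0) for r, c in points[2:])


def make_frame(mover, size=5):
    """Build a made-up frame with one fixed star and one moving object."""
    frame = [[0] * size for _ in range(size)]
    frame[0][0] = 200                  # a star that never moves
    frame[mover[0]][mover[1]] = 150    # the candidate asteroid
    return frame


path = [(1, 1), (2, 2), (3, 3), (4, 4)]          # four exposures of the same field
frames = [make_frame(position) for position in path]
detections = moving_detections(frames)
track = [hits[0] for hits in detections]

print(track)                    # [(1, 1), (2, 2), (3, 3), (4, 4)]
print(is_straight_line(track))  # True, so this would be a "suspect" asteroid
```

The fixed star at (0, 0) never appears in the detections because it is part of the static background in every frame.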

The ground-based ATLAS will have the same limitations as other telescopes of this variety – the Sun makes it impossible to see what is directly behind it and its glare blocks out those reflective asteroids in a perimeter around the Sun. That is what happened with the Chelyabinsk meteor – it came from the direction of the Sun and was not visible. With ATLAS located in Earth’s northern hemisphere, it is also unable to see into a major part of the southern sky. The Moon also reflects the Sun’s light causing other asteroids coming from that direction to not be visible.


ATLAS exemplifies the blurred lines between astronomy and automation. A human would be hard pressed to accomplish this mission without serious compute power. Each telescope will have a 10.5 K x 10.5 K CCD equaling 110 megapixels and take 1,000 images a night73. That equates to 150 GB every day or 55 TB/year/telescope. With two telescopes, 110 TB a year will be generated, and if eight telescopes come on-line, they will generate about 440 TB a year, approaching half a petabyte.
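The storage figures follow from simple arithmetic, sketched here in Python. The 150 GB per night and 1,000 images per night come from the text; observing on all 365 nights and decimal units (1 TB = 1,000 GB) are assumptions made for the example, and the function name `yearly_tb` is hypothetical.

```python
GB_PER_NIGHT = 150        # per telescope, as quoted in the text
IMAGES_PER_NIGHT = 1_000  # as quoted in the text
NIGHTS_PER_YEAR = 365     # assumption: the telescope observes every night

def yearly_tb(telescopes, gb_per_night=GB_PER_NIGHT):
    """Raw data per year in TB, using decimal units (1 TB = 1,000 GB)."""
    return telescopes * gb_per_night * NIGHTS_PER_YEAR / 1000

# A 10.5 K x 10.5 K CCD, expressed in megapixels.
ccd_megapixels = 10_500 * 10_500 / 1e6

# Average storage per pixel implied by the quoted nightly volume.
implied_bytes_per_pixel = GB_PER_NIGHT * 1e9 / (IMAGES_PER_NIGHT * 10_500 * 10_500)

for n in (1, 2, 8):
    print(f"{n} telescope(s): {yearly_tb(n):.1f} TB/year")
print(f"CCD size: {ccd_megapixels:.2f} megapixels")
print(f"Implied storage: {implied_bytes_per_pixel:.2f} bytes/pixel")
```

Under these assumptions one telescope produces about 55 TB a year, two about 110 TB, and eight about 438 TB.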

Satellite Telescopes

Hunting asteroids with a space telescope has many advantages over ground-based telescopes. Space-based telescopes are not susceptible to the filtering of infrared and ultraviolet light by Earth’s atmosphere or to the optical distortion caused by atmospheric turbulence. While space telescopes cost more and are harder to repair, they allow astronomers to get clear images of outer space. Let’s look at two space telescopes that will help us find asteroids.

NEOWISE – Optical Telescope

In 2009, NASA launched the 6-foot wide, 10-foot tall Wide-field Infrared Survey Explorer (WISE) space telescope aboard a Delta II rocket74. With solar panels for energy, WISE orbits 325 miles above Earth and follows a Sun-synchronous path from the North Pole to the South Pole75.

With infrared’s ability to find “dark” asteroids or ones that do not reflect a lot of visible light, WISE uses four 1-megapixel CCDs, each sensitive to a different infrared wavelength, to capture amazing images of space76. This greatly enhanced infrared image of the dying star Helix Nebula shows an asteroid’s red streaks. CCDs made of Mercury-Cadmium-Telluride (MCT) capture the infrared wavelength bands of 3.4 and 4.6 microns while CCDs made of Arsenic-doped Silicon capture the 12 and 22-micron bands77.

In this infrared illustration, WISE scientist Dr. Amy Mainzer is holding a teacup. On the left, there is not enough visible light to see any details. On the right, infrared shows many more details. The same holds true in space when looking for asteroids without the aid of visible light or when their surfaces are not highly reflective. Dark asteroids absorb sunlight, so they get hotter and appear to glow with infrared detection, just like Dr. Mainzer.

Every space object emits infrared light, and the warmer it is, the more infrared light it produces. As a result, the WISE telescope needs to be colder than the objects it observes or it would pick up infrared from the telescope itself. When WISE was launched, it contained enough hydrogen to cool the telescope for 10 months. After that time, the Arsenic-doped Silicon CCDs failed even though the MCT CCDs continued to operate78. NASA renamed the WISE telescope NEOWISE (Near-Earth Object WISE) using just the surviving MCT CCDs. In February 2011, NEOWISE was “turned off” or decommissioned. In September 2013, NASA reactivated and reprogrammed NEOWISE to search for asteroids that could hit Earth and to find asteroids that could theoretically be redirected into a Moon orbit79.

WISE takes a picture every 11 seconds and took 2.7 million of them in 2010. The Tracking and Data Relay Satellite System (TDRSS) transmits WISE imagery to ground stations using communication satellites operating at 300 megabits/s in the Ku/Ka-bands and 800 megabits/s in the S-band80. WISE radios data 4 times a day in 15-minute durations81. The computing complex located in the Infrared Processing and Analysis Center (IPAC) at the California Institute of Technology (Caltech) in Pasadena, California combines the images into a catalog for worldwide access82. The satellite uses stored commands for automatic controls such as attitude control and receives new sequences sent from the NASA Jet Propulsion Laboratory (JPL).

The IPAC processes images following this block diagram83. The Ingest module (❶) accepts NEOWISE data packets, telemetry, and other data and puts them into the Level 0 database (❷). The Level 0 images are then handed off to Pipeline processing (❸). This pipeline removes instrument signatures and performs other QA work on the raw images84. The WISE-MOPS portion of the pipeline finds the NEOs. The Final Product Generation (❹) documents the images and puts them in the Archive (❺).

[Block diagram: WISE Science Data System @IPAC, with blocks for Ingest, Level 0 Archive, Scan/Frame Pipelines, Multi-Frame Pipeline, WISE-MOPS, Quality Assurance, Final Product Generation, and Archive (IRSA).]


The processing of a raw image starts on the top left of this sequence. It is filtered, with new bad and previously bad pixels (shown in the yellow circle) removed85.
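One common way to "remove" a bad pixel is to replace it with a value interpolated from its neighbours. The Python sketch below illustrates that general idea only. The text does not describe the method the WISE pipeline actually uses, so the median-of-neighbours rule and the function name `repair_bad_pixels` are assumptions made for the example.

```python
from statistics import median

def repair_bad_pixels(image, bad_pixels):
    """Return a copy of `image` with each flagged pixel replaced by the
    median of its good 8-connected neighbours."""
    rows, cols = len(image), len(image[0])
    bad = set(bad_pixels)
    repaired = [row[:] for row in image]
    for r, c in bad:
        neighbours = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))
                      if (rr, cc) != (r, c) and (rr, cc) not in bad]
        if neighbours:   # leave the pixel untouched if no good neighbour exists
            repaired[r][c] = median(neighbours)
    return repaired

raw = [[10, 11, 10],
       [12, 999, 11],    # 999: a newly detected hot pixel
       [10, 12, 0]]      # 0: a dead pixel already in the bad-pixel mask
clean = repair_bad_pixels(raw, [(1, 1), (2, 2)])
print(clean)
```

Flagged pixels are excluded from each other's neighbourhoods so that one defect does not contaminate the repair of another.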

In 2011, the WISE/IPAC processing used:

• 5 Sun/Oracle X4270 storage servers
• 15 Sun/Oracle J4400 SAS JBODs, H/W RAID, 3 x 18 TB usable per server; 270 TB total
• 42 node compute cluster; Dell 8-core Xeon, 32 GB RAM, 0.5-1 TB internal disk
• 3 Cisco 48-port Catalyst 3750E switches with two 10 Gbit/s interfaces each
• Resource management RHE4 (cluster), Solaris/ZFS (servers), NFS3, Condor, Ganglia86

Gaia Space Telescope – Optical Telescope

The European Space Agency used a Soyuz-STB rocket to launch an optical space telescope named Gaia in December 2013 for a 5-year mission primarily to create a 3D catalog of 1 billion objects in space, or roughly 1% of our Milky Way galaxy87. It uses an optical telescope and CCDs to capture images of stars in the 400 to 1000 nanometer wavelength range and is expected to find thousands of planets the size of Jupiter, quasars, and the positions and velocities of over 200,000 asteroids and comets88.

Unlike other space telescopes, Gaia orbits what is known as the second Lagrange point, or L2 – a gravitationally balanced location on the far side of Earth from the Sun where a satellite can hold its position relative to both bodies. Stationed 1 million miles from Earth, it will be unaffected by the same blind spot that causes Earth-bound telescopes to be unable to detect asteroids emerging from behind the Sun.

Using 106 CCDs, each with 4500 x 1966 pixels for a mosaic of 1 billion pixels, Gaia will take images and collect makeup, position, motion, and other data on a billion stars and other objects 70 times over its 5-year mission. Each object will become a discrete Java object on Earth when processed. The data is transmitted over a 5 Mbit/s radio link during an 8-hour period each day. Gaia generates 50 GB of raw data daily, and by the time the mission ends, it will have created 200 TB of data. The data is stored in the main database and in Caché, an object-oriented database management system from InterSystems, and processed by the Data Processing and Analysis Consortium (DPAC)89. The final product is estimated to equal one petabyte.


In 2013, Gaia was believed to be the largest astronomy data processing challenge to date90. To process Gaia’s data, DPAC uses a processing complex depicted by the diagram to the right91. The processing is performed by equipment architected and operated by over 400 European scientists and software developers from 24 countries including France, Italy, UK, Germany, Belgium, Spain, and Switzerland92. This “team effort” consortium has broken the Gaia processing into 9 components to facilitate geographically distributed development. The components are called Coordination Units (CU), 8 of which perform various aspects of processing with the 9th handling the data archive catalog. CU1 and CU2 handle development and simulations, and CU3, 5, and 6 handle the data processing of astrometric, photometric and spectroscopic data. The CU3 is also known as the Astrometric Global Iterative Solution (AGIS) and is designed to insert over 7 billion Java objects into the Caché database every day93. Double star, orbital boundary, and solar system object analysis are performed by the CU4 component. CU7 tackles variable stars and CU8 handles spectral classification. Lastly, CU9 is involved with Gaia data publication94.

The data processing would be distributed across the nations listed in the following table. The DPAC requires that each CU uses the Java framework to be database-agnostic and run using any vendor’s database95.

GAIA Data Processing Centers

Acronym   Coordination Unit   Location
ESAC      CU 1, 3             Madrid, Spain
BPC       CU 2, 3, 9          Barcelona, Spain
ISDC      CU 7                Geneva, Switzerland
IoA       CU 5                Cambridge, England
CNES      CU 4, 6, 8          Toulouse, France
OATO      CU 3                Torino, Italy

An enormous amount of processing, as part of the AGIS “astrometric core solution”, is needed to create position and motion data for the observed objects. While the main database (center of the data flow diagram on the top of this page) holds the Gaia data and the results of data processing, the AGIS contains a subset of the data for up to 40 passes through 100 TB of Java objects in a 4-week period96. Multiple AGIS Java programs ingest 50 billion discrete 600-byte objects contained in the 100 TB data in just 5 days. AGIS finished results are stored in a versioned copy of the main database.


As an example of the processing power behind Gaia, the Barcelona, Spain BPC data center in charge of CU2 simulations and CU3 Intermediate Data Updating (IDU) uses the “MareNostrum III”97 supercomputer that has 3,028 compute nodes using 16-core Intel SandyBridge-EP E5-2670 processors (2.6 GHz), 32 GB of RAM and 500 GB of local disk. Interconnected with an Infiniband point-to-point 10 Gb fiber optic network, the nodes utilize IBM’s General Parallel File System (GPFS, now renamed to Spectrum Scale) mapped to 1.9 PB of disk space98.

In Toulouse, France, the Data Processing Center CNES (DPCC) is responsible for components CU4, CU6, and CU8. They are handled with Dell servers used in both a Hadoop cluster and a high performance compute cluster as pictured below99. CNES will have a big data mission to assist in the processing of Gaia’s one petabyte of data stored in tables of 80 billion rows100.

The Square Kilometer Array – Mankind’s Largest Big Data Challenge – Radio Telescope

There is a new set of radio telescopes coming on-line called the Square Kilometer Array (SKA). SKA will be the largest scientific instrument on the planet when completed101 and be 100 times more sensitive than existing radio telescopes. The amount of data it is expected to generate will dramatically push the boundaries of today’s computer science techniques.

With approximately 1/3rd of the telescopes located in Australia and 2/3rds in South Africa, SKA will cover an area of 1,000,000 square meters, equaling the size of 187 American football fields. Three different types of antennas will be used, each capable of receiving specific frequencies. The low-frequency aperture array uses dipole antennas to handle the 50 to 350 MHz range, acting in unison or as many smaller independent radio telescopes102,103. The mid frequency is captured with dish antennas that cover the 350 MHz to 14 GHz spectrum while a subset in the 350 MHz – 4 GHz range is handled with larger traditional parabolic antennas.

With the ability to scan the sky 10,000 times faster than before104, the SKA requires innovations in supercomputing, algorithmic analytics, and disk storage. The telescopes use a “Central Signal Processor” (CSP) to forward the image data by high-speed communication links to scientists working around the world. The Digital Data Backhaul (DDBH) network moves signals from the telescope to the CSP, then to the Science Data Processor (SDP), and finally to local SKA distribution centers. The distances, some measured in thousands of kilometers, data rates to 27 terabits/second105 (almost 300,000 TB/day), and its timing requirements will stretch the limits of modern telecommunications.
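The conversion from the quoted link rate to a daily volume can be checked with a short Python sketch. It assumes the link runs at the full 27 terabits/second around the clock and uses decimal units. The function name is hypothetical.

```python
def terabits_per_s_to_terabytes_per_day(tbit_per_s):
    """Convert a sustained rate in terabits/second to terabytes/day."""
    seconds_per_day = 24 * 60 * 60
    return tbit_per_s / 8 * seconds_per_day   # 8 bits per byte

daily_tb = terabits_per_s_to_terabytes_per_day(27)
print(f"{daily_tb:,.0f} TB/day")
```

Under those assumptions the result is 291,600 TB/day, which matches the "almost 300,000 TB/day" figure.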

Initial SKA prototypes were named MeerKAT in South Africa, and ASKAP and MWA in Australia. MWA’s “Phase 1” will have 250,000 low-frequency antennas, increasing to a million over time106. It should provide a much higher resolution and will scan the sky 135 times faster than existing radio telescopes.

In the first of multiple phases, telescopes will produce 160 TB of raw data per second (35,000 DVDs per second). With low-frequency range telescopes collectively generating 157 TB/s, and mid frequency range telescopes generating 2 TB/s107, SKA is a big data computing project. Individual telescopes will create up to 20 GB of raw data per second108. In total, up to 5 exabytes (EB) every day needs to be processed by a supercomputer, with the systems handling 156 zettabytes of data annually when fully operational. Data traffic is estimated at ten times the current global internet traffic109 with the SKA requiring enough fiber channel cable to wrap around the Earth twice110. The volume of data makes it impractical to move through a network, so it must somehow be processed where it finally lands.

SKA Represents a Computing Revolution

                                     Petabytes      Exabytes    Zettabytes
                                     a year         a year      a year
Data generated by SKA2 antennas **   138,555,830    135,300     156
Data generated by SKA1 antennas      13,855,583     13,530      16
Global Internet Traffic 2013         430,080        420         0.5
SKA1 combined archive                6,656          6.50        < 0.01
Business emails sent worldwide       3,000          2.90        < 0.01
Facebook uploads                     180            0.17        < 0.01
Google searches                      98             0.09        < 0.01
YouTube                              15             0.01        < 0.01
CERN                                 15             0.01        < 0.01
NOAA                                 6              < 0.01      < 0.01
Library of Congress                  5              < 0.01      < 0.01

** SKA1 = first phase of SKA = 10% of total projected data
Source: SpaceUp Toulouse - The Square Kilometre Array telescope https://www.youtube.com/watch?v=PkR6LAOgSII

As shown in this SKA Big Data Flow Diagram, the radio dish and array data rates rapidly increase to 5 PB/s in Phase 2. Researchers are able to review the data and work with subsets, perhaps in a cloud computing model, after it lands in the Science Archive to the right of the diagram. The parallel architecture needed to process these rates and volume sizes must take into account the worldwide geographic routing of data. Existing IT infrastructure simply cannot handle these data rates. Imagine the impact of taking an outage to cope with unplanned code upgrades or break-fix issues. Here is a flowchart of the anticipated data rates. SKA is the very definition of a truly ambitious big data project.

[Flowchart: Massive Data Flow, Storage & Processing, from Antenna & Front-End Systems through Correlation, Data Product Generation, Temporary Storage, High Availability Storage / DB, On-Demand Processing, and Long Term Storage. Labeled figures: > 7 Petabytes/s, > 1 Exaflop/s, 800 Petabytes, 30 Petaflops/s, > 300 Gigabytes/s, 18 PB/year.]

SKA’s 500,000 telescopes will collect an enormous 14 EB of radio signal data and store 1 PB every day. If you tried to store a petabyte of data on an EMC VNX2 using RAID 6(14+2), you would consume 300 x 4 TB drives every day111. However, the critical issue is the compute power and infrastructure to process a petabyte of data every day and not disk capacity per se. The scalability, bandwidth, power consumption, and drive characteristics such as Input/Output Operations per Second (IOPS) would dictate a far more elegant solution (if it even exists today).
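The drive count for a petabyte a day can be approximated with a short Python sketch. It assumes 1 PB = 1,000 TB, that every 4 TB drive delivers its full nominal capacity, and that capacity is added in whole RAID 6 (14+2) groups. Hot spares and formatting overhead, which a real array would add, are ignored. The function name `drives_needed` is hypothetical.

```python
import math

def drives_needed(usable_tb, drive_tb=4, data_drives=14, parity_drives=2):
    """Drive count for `usable_tb` of capacity, allocated in whole RAID groups."""
    usable_per_group = data_drives * drive_tb          # parity drives hold no user data
    groups = math.ceil(usable_tb / usable_per_group)
    return groups * (data_drives + parity_drives)

per_day = drives_needed(1000)    # 1 PB taken as 1,000 TB
print(per_day, "drives per day")
```

This gives 288 drives per day (18 groups of 16), in line with the roughly 300 drives quoted above once spares and overhead are allowed for.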

The SKA design team initially used a conservative blade architecture design and extrapolated it to 2018/2020 to handle future processing requirements. From the LOFAR (Low-Frequency Array) low-power design112, a Dell PowerEdge T620 using 8-core dual Xeon E5-2600 processors with PCIe Gen3 15.75 GB/s expansion slots, 768 GB RAM, 32 x 2½” solid-state disk drive bays, 2 x 10 or 2 x 40 GbE NICs, and 2 x 56 Gb/s Infiniband ports was envisioned. Using Moore's Law, these blades could have double to triple the processing power by 2020 and be capable of 64 TFlops.

[Sidebar: Moore’s Law – every two years, the number of CPU transistors doubles, effectively doubling computer processing power.]

[Figure: Processing blade with a multi-core X86 host processor, PCI bus, GPGPU or MIC accelerators (>10 TFLOP/s each), four ≥1 TB disks, and 56 Gb/s links to the rack switches.]


Twenty of these 2U blades will be housed in a 42U rack. Each node, taking into account memory, network interfaces, SSDs and other components, is expected to consume 882 watts. Two 36 port Mellanox SX6536 Infiniband “leaf” switches connect to one 56 Gb/s port on each blade, delivering 74.52 Tb/s of switching capacity. Each rack would have an electrical power density of about 20 kW. Creating a low-profile SKA processing building block is essential to be able to power the overall processing complex necessary to handle the expected data rates. The SKA 2013 “SDP Element Concept” architecture guide described a bulk storage system incorporating a “scale-out” Xyratex ClusterStor 3000 which uses the Lustre file system that is expandable to 30 PB and uses Infiniband to connect the blades. Its power consumption is 18.5 kW113. [Note: Lustre (Linux Cluster) is a parallel distributed file system used for large-scale cluster computing114.]

[Figure: 42U rack holding Processing blades 1 through 20 and two 56 Gb/s leaf switches.]
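The rack budget can be sanity-checked with the figures quoted above. In this Python sketch, attributing the gap between the blade total and the roughly 20 kW rack figure to switches and other overhead is an assumption, since the text gives no breakdown.

```python
BLADES_PER_RACK = 20
WATTS_PER_BLADE = 882
BLADE_HEIGHT_U = 2
RACK_UNITS = 42
RACK_POWER_KW = 20      # "about 20 kW" per rack, as quoted

blade_power_kw = BLADES_PER_RACK * WATTS_PER_BLADE / 1000
units_left = RACK_UNITS - BLADES_PER_RACK * BLADE_HEIGHT_U
# Assumption: whatever the blades do not draw goes to switches and overhead.
headroom_kw = RACK_POWER_KW - blade_power_kw

print(f"Blades draw {blade_power_kw:.2f} kW, leaving {headroom_kw:.2f} kW")
print(f"Rack units left for switches: {units_left}")
```

The twenty blades account for 17.64 kW and 40U, leaving 2U and about 2.4 kW for the two leaf switches and everything else.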

To explore the enormous processing power required over the entire SKA timeline, with a focus on Phase 1 of SKA, IBM and the Netherlands Institute for Radio Astronomy (ASTRON) are working to create a massively powerful computing system through advanced chip designs. In an effort called “Project DOME”, they will try to find energy efficient ways to transport the huge data volumes from the radio antennas to a central location, and provide real-time data filtering and methods to store the data. Ideally, they need to develop a 300 petaflop computer that uses less than 8 MW of power, or more than 10 times the fastest supercomputer with the same energy profile115. In total, ASTRON and IBM have mapped out 7 projects to handle this new SKA big data frontier. They include information management, computer chip system design employing 3D stacked chips, optical interconnects, water cooling and nanophotonics.

Projects
1. Algorithms and Machines
2. Access Patterns
3. Nanophotonics
4. Microservers
5. Accelerators
6. Compressive Sampling
7. Realtime Communications

The software architecture is expected to include an Application layer, Common software layer, High-Performance Computing (HPC) services, and Operating System layers. The designers envision a “loose coupling in the higher layers of the software stack…” with tighter coupling for performance oriented lower layers116. Further subdivisions of each layer are likely.

[Figure: SKA Common Software stack. SKA subsystems and service components sit on top of High-level APIs and Tools (UIF Toolkit, SKA Common Software Application Framework), Core Services (Access Control, Monitoring Archiver, Live Data Access, Logging System, Alarm Service, Configuration Management, Scheduling Block Service), Base Tools (Communication Middleware, Database Support, 3rd Party Tools and Libraries, Development Tools), and the Operating System.]

The Base Tools layer contains Common Software development tools and run-time environment on top of the operating system. This layer contains a Communication Middleware that handles intra-application exchanges and a Database Support component providing administration, data access and abstraction application programming interfaces (API), which may include Cassandra, the Hadoop database HBase, or relational databases such as MySQL and Postgres. Third party tools and libraries might include astronomical libraries such as casacore, wcslib, HDF5, etc.117 “Development Tools comprises a comprehensive build system that supports recursive compilation, executing of unit and functional tests and creation of deployable packages (release process). It also provides wrappers on top of existing compilers such as make and/or SCons for C++ applications, Ant/Maven for Java applications and setuptools for Python.”118

Access control and authentication, archiving of monitor data, access to SKA real-time monitoring and control data, application logging, alarm tools, configuration management, and scheduling are part of Core Services.

High-level APIs and Tools provide APIs, allowing packages to integrate and access core services. The User Interface Toolkit has APIs for the Graphical User Interface (GUI) including widgets for displays, log browsing, alarms, and tools to monitor and operate large scale control systems.

The Science Data Processor binds hardware compute, network, software, and algorithms together to handle data rates exceeding the daily worldwide web traffic119. Planned to be online by 2020 and at “full power” by 2025, 100 petaflop supercomputers (100,000,000,000,000,000 floating point operations per second) will be needed to crunch SKA data120. Ultimately, exaflop supercomputers will be required. As of June 2015, the fastest supercomputer was China’s Tianhe-2. Capable of “just” 34 petaflops, it could only handle 1/3 of SKA’s requirements121. The compute power is needed to process real-time image data from thousands of telescopes operating at thousands of frequencies. Some of the calculations include122:

• Removing corrupted data
• Calibrating each antenna
• Transforming the data onto a rectangular grid
• Applying Fourier transformations to convert the data into an image of the sky
• Removing data spikes from bright stars

The process then iteratively combines parameters such as complex gains to eventually create a converged image. These steps are memory intensive and require massive data storage capabilities. However, neither the processing power nor storage capabilities exist today on a practical basis.
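The gridding and Fourier steps listed above can be illustrated with a toy Python example. It is a sketch only: the telescope samples are simulated for a single point source, they already fall on integer grid cells, and a slow direct inverse transform stands in for the fast Fourier transforms, convolutional gridding, and calibration that a real radio-astronomy pipeline would use. The function names are hypothetical.

```python
import cmath

def grid_visibilities(samples, size):
    """Place (u, v, value) samples onto a size x size grid (nearest cell, summed)."""
    grid = [[0j] * size for _ in range(size)]
    for u, v, value in samples:
        grid[round(v) % size][round(u) % size] += value
    return grid

def inverse_dft2(grid):
    """Direct 2D inverse discrete Fourier transform. O(N^4), fine for a toy grid."""
    n = len(grid)
    image = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0j
            for v in range(n):
                for u in range(n):
                    total += grid[v][u] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
            image[y][x] = (total / (n * n)).real
    return image

n = 8
x0, y0 = 2, 5     # true position of the simulated point source
# Simulated measurements: a point source produces a pure phase ramp.
samples = [(u, v, cmath.exp(-2j * cmath.pi * (u * x0 + v * y0) / n))
           for u in range(n) for v in range(n)]

image = inverse_dft2(grid_visibilities(samples, n))
peak = max((image[y][x], (y, x)) for y in range(n) for x in range(n))[1]
print("brightest pixel (row, col):", peak)
```

The brightest pixel lands at row 5, column 2, which is the position used to simulate the source. The direct transform costs on the order of N^4 operations for an N x N grid, which hints at why far faster algorithms and far larger machines are needed at SKA scale.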

As we have seen in this section, SKA data rates will overwhelm the ability of astronomers and data scientists to work with the raw data, pushing the analysis of patterns and correlations beyond the limits of the human brain. SKA promises to redefine all that we associate with the term big data – maybe we should call this “Ultra Big Data”?

Using Hadoop To Spot An Asteroid

With millions of asteroids in space, you would think it would be easier to find them. However, their relatively small size poses a problem as they only appear to be tiny dots of light in the sky. Is the dot a star or an asteroid? In order to find an asteroid, telescopic images must be compared, and an object that seems to move from one image to the next might be an asteroid. In Piazzi’s time, the comparison was done manually, and as a result, few asteroids were found.

French physicists first used a camera for astronomy in 1845, but the film was not sensitive enough to capture starlight123. These days, telescopes are far more sensitive and film cameras have been replaced by CCD cameras. Algorithms now compare images, with positive findings reviewed by astronomers. Algorithmic methods have plusses and minuses. Algorithms that are too sensitive can yield many “false positives”, while those with lower sensitivity may miss the object.

The Catalina Sky Survey took 7 images of asteroid “2014 AA” on January 1, 2014124. This SUV-sized asteroid weighed about 44 tons and burned up in our atmosphere the next day125. These are 4 of those images126. At a high level, an Earth-bound telescope adjusted for planetary rotation to take CCD images minutes apart of the same part of space. As mentioned in the ATLAS section of this paper, the images were aligned and cleaned up through coaddition to allow image subtraction to isolate the asteroid.


In greater detail, when a telescope takes multiple images of space minutes apart, the images partially overlap, and they (or images from different telescopes) need to be coadded to clean up and enhance faint objects. Starting with a base image, subsequent images of the same section of space are algorithmically aligned and added to produce a sharper, brighter image. With ever-higher resolution and an increasing number of images, astronomers rely on massively parallel processing Hadoop systems to do this work. In this illustration, image data is injected into the Hadoop system where dozens to thousands of nodes break the problem apart and parallelize the search for boundary matching (MAP). The images are eventually stacked and brought together into a mosaic (REDUCE). When the processing is complete, a final composite image is produced127. This approach is far faster than a serial approach of image alignment.
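The MAP and REDUCE roles described above can be mimicked in a single Python process. This sketch is an assumption-laden illustration rather than actual Hadoop code: each image is a small grid with an already-known offset into a shared sky coordinate system (so alignment is taken as solved), and an in-memory dictionary stands in for Hadoop's shuffle across nodes. The names `map_phase` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict

def map_phase(image):
    """MAP: emit (sky_cell, brightness) pairs for every pixel of one image."""
    off_r, off_c = image["offset"]
    for r, row in enumerate(image["pixels"]):
        for c, value in enumerate(row):
            yield (off_r + r, off_c + c), value

def reduce_phase(pairs):
    """REDUCE: average all brightness values that landed on the same sky cell."""
    buckets = defaultdict(list)
    for cell, value in pairs:
        buckets[cell].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in buckets.items()}

images = [
    {"offset": (0, 0), "pixels": [[1, 2], [3, 4]]},
    {"offset": (0, 1), "pixels": [[4, 6], [6, 8]]},   # overlaps the first by one column
]

# On a cluster each image would be mapped on a different node.
pairs = [pair for image in images for pair in map_phase(image)]
mosaic = reduce_phase(pairs)
print(sorted(mosaic.items()))
```

Because each pixel is keyed by its sky cell, the map work is independent per image and can be spread over many nodes, while overlapping cells are combined only in the reduce step.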

To complete this image process, once bright static dots are isolated and subtracted from each frame, an asteroid can be seen streaking across the sky as shown on the previous page. This is called image or pixel subtraction128 and allows an asteroid’s motion to stand out – i.e., the stars are so far away they appear fixed in space. What is left is possibly an asteroid. This is hard to spot without computer algorithms.

3D Asteroid Modeling – Try It Yourself!

Asterank is a database created by Ian Webster that contains information on over 600,000 asteroids129 using their known orbit and physical composition data from the Minor Planet Center and NASA’s Jet Propulsion Laboratory. Webster’s highly informative 3D full-motion view of asteroids shows their interaction with the planets and serves as a model of potential Earth impactors. Here is a still image of the 3D view. The interface allows for speed settings, pan and zoom, and the layering of planet orbits and the Milky Way. In this zoomed image for May 17, 2016, you can see the Sun in the center, the planetary orbits of Mercury, Venus, Earth, and Mars and a portion of Jupiter, and the position of asteroids in this section of space. You are encouraged to explore this database and the viewer at http://www.asterank.com. The source code is on GitHub130.

There is also an API to query the MongoDB database using the syntax: {field: {$lt: value} }, where $lt selects the documents where the value of the field is less than the specified value131. For example, {"e":{"$lt":0.1},"i":{"$lt":4},"a":{"$lt":1.5}}&limit=1 searches for an Asteroid with Eccentricity E <0.1, Inclination (degrees) I <4, and a Semi Major axis < 1.5AU. The query returns asteroid 138911 “2001 AE2”.
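The example query can be assembled into a request URL with Python's standard library. The endpoint path `/api/asterank` and the parameter names `query` and `limit` are assumptions based on the query fragment shown above and should be checked against the Asterank site. The sketch only builds the URL and does not contact the server, so it runs offline.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

API_BASE = "http://www.asterank.com/api/asterank"   # assumed endpoint path

def build_query_url(criteria, limit=1):
    """Encode a MongoDB-style filter as URL query parameters."""
    query = json.dumps(criteria, separators=(",", ":"))
    return f"{API_BASE}?{urlencode({'query': query, 'limit': limit})}"

# Eccentricity < 0.1, inclination < 4 degrees, semi-major axis < 1.5 AU.
criteria = {"e": {"$lt": 0.1}, "i": {"$lt": 4}, "a": {"$lt": 1.5}}
url = build_query_url(criteria)
print(url)

# Round-trip check: decoding the URL recovers the original filter.
decoded = parse_qs(urlparse(url).query)
print(json.loads(decoded["query"][0]) == criteria)
```

To run the query for real, the URL could be passed to `urllib.request.urlopen` and the response parsed with `json.loads`, assuming the service is reachable and returns JSON.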

Taking Action

While the threat of a cataclysmic, massive, civilization-ending asteroid colliding with Earth has a very low probability, the likelihood of smaller strikes remains constant. Based on the Moon’s craters, we know Earth has been and will continue to be hit repeatedly. Asteroids with the equivalent of 600 kilotons of TNT have hit Earth over the last decade. In 1997, David Morrison, one of the pre-eminent experts on NEOs and asteroids, stated that of the “roughly two thousand kilometer-scale asteroids that are expected in Earth-crossing orbits, fewer than two hundred have actually been found.”132 In 2005, British astronomer David J. Asher co-authored a paper titled “Earth In The Cosmic Shooting Gallery” and wrote, “The terrestrial impact rate appears to be substantially higher than current near-Earth object population models imply, consistent with a significant unseen cometary contribution to the terrestrial impact hazard.”133


While we may not see extinction in our lifetimes, many feel we are fortunate to have made it this far. The argument Dr. Asher’s analysis naturally raises is one of preparation. If we were to wake up tomorrow and be told an asteroid will strike on Friday, it could be too late to react. If there is not enough lead time to deflect it, then it only makes sense, based on his findings, that we need a strategy to put an object deflection infrastructure in place before a threat is detected.

Let’s look at what can be done if an asteroid of sufficient size is on a collision course with Earth. Any defensive strategy depends on computer science as demonstrated by recent endeavors to send a spacecraft to an asteroid, as we did with NASA’s Dawn space probe. We have the technology to put a probe in orbit around asteroid Vesta and dwarf planet Ceres to take great pictures as shown in this image of the Ceres surface.

Launched in September 2007, it took almost 4 years (July 2011) and a lot of planning to have Dawn orbit Vesta, some 117 million miles from Earth. Due to its relatively slow speed and Vesta’s own orbital velocity, Dawn traveled 1.7 billion miles with a Martian gravity assist along the way. In September 2012, it was sent on the second part of its mission, a 930 million mile, 2½ year journey to Ceres134.

While asteroids threatening us will not be as distant as Vesta or Ceres, the key to deflecting or redirecting them is sufficient lead time, perhaps measured in years. No nation today is prepared to launch a rocket to deflect, redirect or destroy an asteroid. Based on the object’s size and the lead time, a change of a fraction of a degree is all that it would likely take to change its orbit and prevent the collision.

Nuclear Explosion

In 1998, a Texas-sized asteroid was 18 days away from annihilating Earth, or so the disaster movie Armageddon goes. In the movie, space shuttles with nuclear bombs were launched towards the asteroid with a plan to save mankind by using the bombs to break it into pieces. A few months after Armageddon, the film Deep Impact depicted a crew using nuclear bombs on a 7-mile wide comet. Unfortunately, they broke it into 2 large pieces, with both still targeting Earth.


One fragment causes a 3,500-foot tsunami in the Atlantic Ocean near America’s East Coast, killing millions, while the other piece is destroyed before it strikes Canada.

In real life, the lack of warning could lead to a similarly desperate approach. With lead time, a spacecraft with a nuclear weapon could be launched to deflect a certain sized asteroid. By detonating the weapon near the asteroid, the shockwave or intense radiation could be sufficient to nudge the asteroid off course while keeping it intact, causing it to miss Earth135. It is generally agreed that nothing should be detonated on the asteroid’s surface or subsurface, since breaking it into many smaller but still significant pieces could leave them targeting Earth – a “buckshot effect”136.

Kinetic Impact

NEOShield-2 is a project by the European and German space agencies to create a kinetic impactor that can crash into an asteroid at high velocity137. The impactor transfers its momentum to the asteroid, causing a small change in the asteroid’s velocity and thus diverting its course by a fraction of a degree. An example of this is when a cue ball hits another billiard ball, imparting kinetic energy and sending the other ball flying.

The degree of deflection depends on the mass and speed of the impactor. A small impactor moving quickly can have the same effect as a large impactor moving very slowly. Calculations show that a 1 mile-per-hour change in velocity would divert an asteroid 170,000 miles if it were struck 20 years in advance138. If an asteroid were small enough, ramming it with a spacecraft like Dawn could supply enough kinetic energy to throw it off course. There are also hybrids that use this kinetic approach. One such hybrid is called an HAIV, or Hypervelocity Asteroid Intercept Vehicle. HAIVs consist of two spacecraft with the first kinetically punching a hole in the asteroid and the second implanting explosives in the asteroid similar to the Nuclear Explosion method139.
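The scale of the 1 mile-per-hour figure can be checked with a back-of-the-envelope Python sketch. It assumes the velocity change simply accumulates as straight-line distance, ignoring orbital mechanics, and models the impact as a perfectly inelastic collision. Both simplifications, and the function names, are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365.25 * 24

def drift_miles(delta_v_mph, years):
    """Distance gained from a constant velocity change, ignoring orbital mechanics."""
    return delta_v_mph * years * HOURS_PER_YEAR

def delta_v(impactor_mass, impactor_speed, asteroid_mass):
    """Velocity change of the asteroid after a perfectly inelastic impact
    (conservation of momentum; any consistent units)."""
    return impactor_mass * impactor_speed / (asteroid_mass + impactor_mass)

miles = drift_miles(1, 20)
print(f"1 mph sustained for 20 years: {miles:,.0f} miles")
```

The straight-line estimate comes to 175,320 miles, the same order as the 170,000 miles cited. The `delta_v` helper shows why the product of impactor mass and speed matters: a light, fast impactor and a heavy, slow one with equal momentum give nearly the same velocity change when the asteroid is far more massive than either.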

The Yarkovsky/Paint Effect

Russian civil engineer Ivan Yarkovsky wrote in the year 1900 “…that the diurnal heating of a rotating object in space would cause it to experience a force that, while tiny, could lead to large long-term effects in the orbits of small bodies…”140. We feel this ourselves when wearing a white or a black shirt on a hot sunny day – the white reflects some of the heat while the black shirt absorbs it. In other words, if we could paint one side of an asteroid white, it would change the number of thermal photons reflected off of it causing it to change course.

The photons act as a tiny rocket pushing the asteroid in a different direction. Adjusting the thrust could be accomplished through the opaqueness of white paint or by painting the opposite side black. This approach would take many years or even decades to change an orbit, so plenty of impact notice would be needed.

Sails

German astronomer Johannes Kepler noted in 1619 that a comet’s tail pointed away from the Sun because of pressure from sunlight141. Similar to a sailboat that uses large sails and wind power to move, the pressure of sunlight against a giant solar sail pushes it forward. Sunlight is made of photons. Photons have no mass but they do have momentum, and the larger the solar sail, the greater the capture of photons to push it – in essence, the Sun has wind energy.

If a spacecraft can attach a solar sail to an asteroid, then the Sun’s emitted photons hit the sail and push against it, transferring its momentum. The sail would slowly nudge the asteroid into a slightly different orbit. By furling or unfurling the sail, the degree of propulsion could be changed. This concept would work for smaller asteroids but the size of the sail might make it impractical for very large ones or if the lead time to attach one is too small.
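
To get a feel for how gentle this push is, the pressure of sunlight can be estimated from the intensity of sunlight and the speed of light. The sketch below is an illustration under stated assumptions: the sail is flat, faces the Sun squarely, and sits at Earth's distance from the Sun. The sail area, the asteroid mass and the ten-year duration are hypothetical values, not figures from any mission.

```python
SOLAR_CONSTANT_W_M2 = 1361.0        # sunlight intensity at Earth's distance (1 AU)
SPEED_OF_LIGHT_M_S = 299_792_458.0

def sail_force_newtons(area_m2, reflectivity=1.0):
    """Force on a flat sail facing the Sun at 1 AU.
    reflectivity=1.0 is a perfect mirror (photons bounce back, doubling
    the momentum transfer); 0.0 is a perfect absorber."""
    pressure = (1.0 + reflectivity) * SOLAR_CONSTANT_W_M2 / SPEED_OF_LIGHT_M_S
    return pressure * area_m2

# Hypothetical case: a one-square-kilometer sail on a billion-kilogram asteroid
force = sail_force_newtons(1_000_000)
asteroid_mass_kg = 1e9
seconds_in_ten_years = 10 * 365.25 * 24 * 3600
delta_v_m_s = force / asteroid_mass_kg * seconds_in_ten_years

print(f"Force on sail: {force:.2f} N")
print(f"Speed change after ten years: {delta_v_m_s:.2f} m/s")
```

Even a sail a kilometer on a side yields only a few newtons, which is why the approach suits smaller asteroids and long lead times.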

Catch It

If you could snare an asteroid in a net – a giant one made of metal or some strong carbon fiber – then a spacecraft could “tug” the asteroid into a new orbit. It could also bring it somewhere else such as into an orbit around the Moon for further study142.

Heat it up

Using giant mirrors, sunlight could be aimed at an asteroid that contains trapped water to heat it up. The heat would cause any vapor in the asteroid to be ejected out. The ejected vapor would act like a small rocket motor pushing the asteroid into a slightly different orbit143. A high-powered laser aimed at the asteroid (laser sublimation) would have the same effect.

Nudge It

Similar to the manipulated gravitational forces generated by the Star Trek Enterprise’s tractor beam, we know that objects, even man-made objects, exert a gravitational pull. By orbiting a spacecraft around an asteroid, a weak gravitational force would be exerted on the asteroid, and by very slight changes in the spacecraft’s direction, it could nudge the asteroid enough to change its course as well144. Care would need to be taken that the spacecraft didn’t accidentally strike it or aim its thrusters towards the asteroid’s surface in its attempt to orbit it. The closer the orbit, the greater the gravitational pull. In theory, you could also tether the asteroid to another heavy object like a giant spacecraft, thereby altering the asteroid’s orbit. The lead time for these ideas could be measured in decades.
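
Newton's law of gravitation gives a sense of how weak this pull is and why lead times run to decades. The sketch below is a back-of-the-envelope illustration only. The spacecraft mass, asteroid mass and separation distances are hypothetical, distances are measured between centers of mass, and the spacecraft is assumed to hold its position for the whole period.

```python
G = 6.674e-11  # gravitational constant, N m^2 / kg^2

def pull_newtons(spacecraft_kg, asteroid_kg, distance_m):
    """Mutual gravitational attraction between two bodies."""
    return G * spacecraft_kg * asteroid_kg / distance_m ** 2

spacecraft_kg = 20_000.0   # hypothetical heavy spacecraft
asteroid_kg = 1e10         # hypothetical small asteroid

f_far = pull_newtons(spacecraft_kg, asteroid_kg, 200.0)
f_close = pull_newtons(spacecraft_kg, asteroid_kg, 100.0)

# Speed change of the asteroid if the far pull is held for ten years
seconds_per_decade = 10 * 365.25 * 24 * 3600
delta_v_m_s = f_far / asteroid_kg * seconds_per_decade

print(f"Pull at 200 m: {f_far:.4f} N, at 100 m: {f_close:.4f} N")
print(f"Speed change after a decade: {delta_v_m_s:.4f} m/s")
```

Halving the distance quadruples the pull, which is the "closer the orbit" effect described above, yet a decade of towing still changes the asteroid's speed by only about a centimeter per second in this example.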

Attach a rocket motor to it

If time is short or the object is too large, then waiting for a sail to guide it away or spray painting it white might not be the right approach. If a spacecraft could attach a big chemical rocket engine to it, it would push the asteroid in a different direction145.

Eat It

In 2004, NASA floated a farfetched idea: send dozens of nuclear-powered spacecraft to an asteroid where, working as a team, they would drill into it and eject the rubble into space using powerful electromagnets or a rail gun. NASA called this project Modular Asteroid Deflection Mission Ejector Node (MADMEN). The loss of mass, together with the recoil from sending the chunks away, would alter the asteroid’s course. NASA’s analysis showed they would need a formation of 39 “munching” spacecraft, needing just 17 to survive the landing on the asteroid. With the craft fully functioning, the mission stood a 43% chance of success146.

These methods and dozens of others all rely on sufficient warning. As we have seen, the warning can only come through active computer science-aided observation of space. The problem is enormous and even stretches today’s definition of big data in that the technology does not yet exist that can process all the data in sufficient time to be of value.

High-Performance Computing and Big Data

The amount of telescope data generated has grown at an incredible rate. Astrophysicist and data scientist Dr. Kirk Borne tells a story of an astronomer in 2000 who asked if NASA could store a terabyte of sky survey data and was told “That’s impossible! Don’t you realize that the entire data set NASA has collected over the past 45 years is one terabyte?”147 These days, “virtual astronomy” is measured in petabytes and exabytes. As we’ve discussed, the SKA will create 5 petabytes of data per second when fully operational.

Sky Survey Projects                            Data Volume Estimate
DPOSS (The Palomar Digital Sky Survey)         3 TB
2MASS (The Two Micron All-Sky Survey)          10 TB
SDSS (The Sloan Digital Sky Survey)            40 TB
SkyMapper Southern Sky Survey                  500 TB
GBT (Green Bank Telescope)                     20 PB
LSST (The Large Synoptic Survey Telescope)     ~ 200 PB expected
SKA (The Square Kilometer Array)               ~ 4.6 EB expected

Source: http://datascience.codata.org/articles/10.5334/dsj-2015-011/

Computer science uses parallel processing to address problems such as how to defend Earth through timely decisions based on huge volumes of data. Rather than trying to overcome the limitations of silicon, thermodynamics, and the speed of light to build a highly scalable single processor chip, it is far more practical to increase overall computational performance by using multiple processor chips that communicate with each other. Distributing the compute load among processors is a practical approach to solving problems that split easily into independent pieces – these are called “embarrassingly parallel”148. For example, Hadoop splits up large chunks of work among processing nodes working in parallel, and when the nodes are finished processing, the individual answers are brought together into a single solution. This is the same concept as using multi-lane highways to allow more cars to travel in parallel, but without speeding up the cars.
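
The split, process and combine pattern described above can be sketched in a few lines. This is not Hadoop's API. It is a minimal illustration using Python's standard thread pool as a stand-in for a cluster of nodes, and the task (counting bright pixels in a list of brightness values) and all function names are hypothetical. In standard Python, threads do not speed up pure computation, so a real workload would use separate processes or a cluster framework; the structure is what matters here.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, n_chunks):
    """Cut a list into at most n_chunks pieces of nearly equal size."""
    size = max(1, -(-len(items) // n_chunks))   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_bright(chunk, threshold):
    """Work done independently by one 'node': no chunk needs any other."""
    return sum(1 for value in chunk if value > threshold)

def parallel_count(pixels, threshold, workers=4):
    chunks = split(pixels, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: count_bright(c, threshold), chunks))
    return sum(partials)                        # combine the partial answers

pixels = list(range(100))                       # stand-in brightness values
bright = parallel_count(pixels, threshold=89)
print(f"Bright pixels found: {bright}")
```

Because each chunk is counted without reference to any other chunk, adding more workers adds more "lanes" without changing the logic.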

Two of the popular parallel processing approaches are Single Instruction, Multiple Data Stream (SIMD), and Multiple Instruction, Multiple Data Stream (MIMD). At a very high level, an SIMD can run the same instruction on all processors but on different data streams while an MIMD can run different instructions on different data streams.

Today’s approach to processing vast amounts of astronomical data is to apply advanced parallel processing techniques. This class of HPC computer science problem can require a supercomputer – something that can apply aggregated compute power. Supercomputers can either be built from a few dozen to thousands of off-the-shelf servers (e.g. Dell, HP, Lenovo/IBM) aggregated with high performance interconnects, or designed from the ground-up to be specialty supercomputers (e.g. Cray, IBM) incorporating commodity components. Examples of expensive specialty supercomputers include:

• Cray XC40. The U.S. National Nuclear Security Administration purchased “Trinity” for $174 million to run Linux on 9,436 nodes using 301,056 compute cores and 2 PB of memory to support a 78 PB parallel file system with a bandwidth of 1.6 TB/s149.
• Tianhe-2 is the fastest machine in the world according to Top500150. It uses Intel Xeon E5 processors to supply 3,120,000 compute cores (10X that of the Cray XC40), 1 PB of memory, and 12.4 PB of storage for a cost of $390 million151. It consumes between $26 million and $36 million worth of electricity every year152.

There is also a cloud computing framework that unites HPC and big data on a pay-as-you-go basis (scalable and dynamic) without the need to own the platform. Various public cloud providers such as Amazon153 and Google154 offer these environments.

If supercomputing is to help protect Earth, it must continue to follow Moore’s Law. By 2020, the world needs its first exaflop machine (1,000 petaflops) capable of a quintillion (10^18) calculations per second, but current systems are trending below that goal as this chart shows (performance based on Tianhe-2 as shown in the red oval is trending flat in 2015)155. Other critical factors include cost, power, cooling, storage requirements, etc.

To further emphasize the critical nature of this problem, the supercomputer in this illustration is tasked to prepare a planetary defense that predicts the success of a nuclear detonation156. A logical 3D grid is created such that each cube of the grid can be assigned to its own asteroid segment and compute core. Then the most precise elliptical orbit data of the asteroid is fed into the grid such that its composition, speed, mass, trajectory and other data is represented, with each compute core working on its own view of the cube. The goal is to determine how big a blast is needed and where it should be placed such that when detonated, the asteroid will be blown into much smaller fragments that will miss Earth. Each compute core is applying physics equations to understand the effect of an explosion on its piece of the asteroid. Given the 3D model is tracking a rotating high-speed asteroid, the simulation would represent a timeline with second or sub-second resolution. Each core must be in inter-process communications with other cores so the blast effect on its piece of the asteroid will be understood and taken into account by adjacent cores. The overall simulation must begin in the future to allow enough lead time to put a plan in place to blow up the object.

[Figure caption: Create a computational box to do the simulation and divide it into 100 million cubes. Place a model of the asteroid in the computational box and assign groups of the cubes to different processors. Allow processors to compute how the contents of each cube evolve.]
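
The idea of carving the computational box into cubes, handing groups of cubes to processors, and having each processor talk to its adjacent processors can be sketched as follows. This is an illustrative block decomposition, not the scheme used in the cited simulation. The function names are hypothetical, the grid here is a tiny 8 x 8 x 8 box rather than 100 million cubes, and the number of cubes per side is assumed to divide evenly by the number of blocks per side.

```python
from collections import Counter

def owner_rank(ix, iy, iz, cubes_per_side, blocks_per_side):
    """Return the processor (rank) that owns cube (ix, iy, iz)."""
    block = cubes_per_side // blocks_per_side   # cubes along one edge of a block
    bx, by, bz = ix // block, iy // block, iz // block
    return (bx * blocks_per_side + by) * blocks_per_side + bz

def face_neighbors(rank, blocks_per_side):
    """Ranks whose blocks share a face with this rank's block.
    These are the processors it must exchange boundary data with."""
    n = blocks_per_side
    bx, by, bz = rank // (n * n), (rank // n) % n, rank % n
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    result = []
    for dx, dy, dz in offsets:
        x, y, z = bx + dx, by + dy, bz + dz
        if 0 <= x < n and 0 <= y < n and 0 <= z < n:
            result.append((x * n + y) * n + z)
    return sorted(result)

# Tiny example: 8 x 8 x 8 cubes shared among 2 x 2 x 2 = 8 processors
cubes_per_rank = Counter(
    owner_rank(x, y, z, 8, 2)
    for x in range(8) for y in range(8) for z in range(8)
)
print(dict(cubes_per_rank))      # every rank owns the same number of cubes
print(face_neighbors(0, 2))      # the corner block has three face neighbors
```

Each processor only needs to communicate with the handful of ranks returned by `face_neighbors`, which keeps the inter-process traffic local even as the grid grows.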

From an IT perspective, the supercomputer is just a machine and can break down, so the architecture or the machine itself must provide enough redundancy to minimize the impact of, for example, a processor replacement or a bad cable. If the simulation gets corrupted, the system must be able to roll back to the appropriate timestamp or checkpoint since starting the process from scratch is obviously not an option. Checkpoints must record all the processing since the last checkpoint – i.e. each core needs to know where in its calculations it must resume. These checkpoints could take hundreds of terabytes of data storage, so I/O service time must be taken into account. There is also the likelihood that multiple checkpoints would need to be saved – perhaps exabytes of fast storage will be required.
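
Checkpoint and restart can be illustrated with a toy simulation that saves its state periodically and picks up from the last saved state after a failure. This is a minimal single-process sketch, not how a production HPC system does it. The `step` function is a stand-in for the real physics, the file format (JSON) and the function names are assumptions made for illustration, and the "hardware failure" is simulated by raising an exception.

```python
import json
import os
import tempfile

def step(state):
    """One toy simulation step (a stand-in for the real physics)."""
    return {"t": state["t"] + 1, "value": state["value"] * 1.01 + 1.0}

def run(path, total_steps, checkpoint_every, fail_at=None):
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"t": 0, "value": 0.0}
    while state["t"] < total_steps:
        if fail_at is not None and state["t"] == fail_at:
            raise RuntimeError("simulated hardware failure")
        state = step(state)
        if state["t"] % checkpoint_every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)   # swap in the new file only once fully written
    return state

with tempfile.TemporaryDirectory() as folder:
    clean = run(os.path.join(folder, "clean.json"), 100, 10)

    crashed_path = os.path.join(folder, "crashed.json")
    try:
        run(crashed_path, 100, 10, fail_at=25)   # dies at step 25
    except RuntimeError:
        pass
    resumed = run(crashed_path, 100, 10)         # restarts from step 20

print(resumed == clean)
```

The resumed run repeats only the five steps lost since the last checkpoint rather than all 25, and ends in the same state as the uninterrupted run. Writing to a temporary file and renaming it means a crash during the write never destroys the previous good checkpoint.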

Just as collecting and analyzing petabytes of data in real-time pushes the boundaries of Moore’s Law, the same challenges apply to storing the data. While Moore’s Law predicts the immense supercomputer power to generate data faster than ever before, the ability to store it for additional analysis has not kept up. There is a growing gap between the speed of processors and storage – spinning hard disk drive (HDD) performance is simply too slow. HDDs were invented in 1956157, well before the first commercially available microprocessor in 1971158. In 2000, the fastest HDD operated at 15,000 RPM but they have not rotated any faster since. Improvements in classical hard drive technology have focused on platter density, larger cache memory, etc., allowing the fastest rotating 600 GB drive with a 128 MB cache buffer to transfer 290 MB/s of sequential 4K block data over a 12 Gb/s SAS interface.159 Mechanical drive capacity is not the answer – the largest helium-filled 10 TB drive with a 256 MB cache transfers data at a sustained transfer rate of 249 MB/s160. Before the introduction and commercialization of the solid-state drive (SSD), supercomputers might require tens of thousands of HDDs just to handle the throughput performance requirements (e.g. IOPs).

Employing integrated circuits to store data instead of rotating platters with moving arm magnetic heads, today’s SSD (typically at a higher cost) performs well beyond the fastest HDD. As a result, supercomputers might only need thousands of SSDs to do their work. Even with the pace of telescopes like SKA, hundreds of terabytes of checkpoint data written to thousands of SSDs will still take many minutes.

As illustrated by this chart, even faster solutions are becoming available as storage moves from an external storage area network (SAN) to sit “next to” processor memory. Bypassing disk controller cards and host bus adapters effectively gives in-memory technology orders of magnitude higher throughput than today’s best storage array. These memory images could be gradually de-staged to slower SSDs and even slower but huge HDDs, allowing the supercomputer to continue with its calculations with the checkpoint completed as a background process. Examples of in-server flash memories include EMC’s DSSD which provides compute-side SSDs directly through the PCIe bus161. Co-location with the computer allows for near-memory speed storage with bandwidths of 1 TB/s and 250 M IOPS.162

In-server Flash Memory Accelerators
Random READ MB/s          2,800
Random WRITE MB/s         2,200
Random READ IOPs          345,000
Random WRITE IOPs         385,000

Solid-State Disks with 12 Gb/s SAS Interface
Sequential READ MB/s      980
Sequential WRITE MB/s     740
Random READ IOPS          199
Random WRITE IOPS         115

Source: https://www.sandisk.com/business/datacenter/products
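
A quick estimate shows why checkpoint writes are measured in minutes. The sketch below uses two figures quoted in this article, the 740 MB/s sequential write rate for a 12 Gb/s SAS SSD and the 1 TB/s bandwidth cited for in-server flash. Everything else is an assumption: the 300 TB checkpoint size and the count of 1,000 SSDs are hypothetical, decimal units are used, and throughput is assumed to scale perfectly across devices, which real systems rarely achieve.

```python
def write_seconds(checkpoint_tb, devices, mb_per_s_each):
    """Time to write a checkpoint striped evenly across identical devices."""
    total_mb = checkpoint_tb * 1_000_000        # decimal units: 1 TB = 1,000,000 MB
    return total_mb / (devices * mb_per_s_each)

# Hypothetical 300 TB checkpoint
ssd_minutes = write_seconds(300, 1_000, 740) / 60        # 1,000 SAS SSDs
flash_minutes = write_seconds(300, 1, 1_000_000) / 60    # 1 TB/s aggregate

print(f"1,000 SSDs at 740 MB/s each: {ssd_minutes:.1f} minutes")
print(f"In-server flash at 1 TB/s:   {flash_minutes:.1f} minutes")
```

Under these assumptions either option takes several minutes per checkpoint, which is why completing the checkpoint as a background process while the calculation continues is attractive.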

To defend our planet, it is acknowledged that supercomputing and new storage technologies need to work together as part of the HPC/big data asteroid defense problem. The field of astronomy recognizes these critical areas and established astroinformatics and astrostatistics disciplines to focus on them. Astroinformatics combines astronomy and IT technologies such as machine learning, statistics, visualization, data management, and others163 while astrostatistics encompasses astrophysics, statistical analysis, and data mining164.

Conclusion

Most of the funded projects are naturally focused on finding asteroids, but equally important is what to do about them when they pose a risk to us. There is little doubt that we need a plan beyond praying to deflect or destroy them, especially with little or no lead time. Critical to both parts of this approach is data analysis. To find asteroids, and to deflect or destroy them, you need computer science.

Taking the form of computation algorithms, HPC, big data, modeling, simulation, data mining, networking and other critical areas, computer science is fundamentally critical to help shield Earth from asteroids. Coupled with the work of many gifted astronomers, the “golden age” of astronomy, marked by massive photon gathering mirrors, radar telescopes, and spacecraft like the Hubble Space Telescope, would not be where it is today without the integrated circuit CCDs and microprocessors.

Clearly times have changed, and every day the field of astronomy is being transformed by computer science. Less than 100 years ago, the field relied on humans to scan the sky with optical telescopes and on people to perform manual data reductions, and results were often distributed to a limited few or kept in a desk drawer. Technology now allows for giant maps of the sky to be collected in an automated fashion with data scrubbed by algorithms. The result is the beginning of a giant database of imagery and metadata searchable by anyone around the world. We are witnessing the initial creation of a space shield that will hopefully protect mankind from what happened to the dinosaur – if it’s not too late.

[Cartoon caption: “All I’m saying is now is the time to develop the technology to deflect an asteroid.” Source: www.slideshare.net/perficientinc/creating-a-successful-api-program-to-drive-digital-transformation]

Appendix - Glossary

Aphelion - the asteroid’s farthest distance from the Sun measured in astronomical units (AU).

Asteroid – A small (relative to a planet) rocky body orbiting the Sun.

Astrometry - the precise measurement of the positions and motions of celestial bodies.

Declination - (abbreviated dec; symbol δ) is one of the two angles that locate a point on the celestial sphere in the equatorial coordinate system, the other being “hour angle”.165

Ephemerides – A table of future positions.

Kinetic energy – The energy an object possesses when in motion. The heavier the object and the faster it travels, the more the kinetic energy it possesses.

Meteoroid – A small piece of an asteroid that orbits the Sun. Generally less than 1 meter in size.

Meteor - The streak of light produced by atmospheric friction as an asteroid or meteoroid enters Earth’s atmosphere.

Meteorite - A chunk of a meteoroid that is not vaporized on entry into the atmosphere and lands on Earth.

Perihelion - the asteroid's closest distance to the Sun measured in astronomical units (AU).

Photometry - the measurement of the brightness of a celestial body over wide bands of wavelength.

Planetoid – See asteroid.

Right ascension - (abbreviated RA; symbol α) is the angular distance measured eastward along the celestial equator from the vernal equinox to the hour circle of the point in question.166

Semi-major axis - a distance equal to one-half of the major axis of an ellipse.

Spectrometry - the measurement of the spectrum of light emitted by a celestial body.

Appendix – Draw an Ellipse in Excel

I’ve included some simple steps if you would like to draw some simple ellipses using Excel. The general formula for an ellipse with its major and minor axes lying on a graph’s x- and y-axes follows this formula:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

To put it into an easy to use Excel form, you want to “solve” this equation for y:

$$y = \pm\sqrt{\left(1 - \frac{x^2}{a^2}\right) b^2}$$

so as an Excel equation, it looks like y = ±sqrt((1 − x^2/a^2) * b^2)

Width "A" = 4    Height "B" = 3    y=±sqrt((1-x^2/a^2)*b^2)

x       y      y-
-4      0.0    0.0
-3.5    1.5    -1.5
-3      2.0    -2.0
-2.5    2.3    -2.3
-2      2.6    -2.6
-1.5    2.8    -2.8
-1      2.9    -2.9
-0.5    3.0    -3.0
0       3.0    -3.0
0.5     3.0    -3.0
1       2.9    -2.9
1.5     2.8    -2.8
2       2.6    -2.6
2.5     2.3    -2.3
3       2.0    -2.0
3.5     1.5    -1.5
4       0.0    0.0

[Chart: the resulting ellipse, plotted with the x-axis running from -4 to 4 and the y-axis from -4.0 to 4.0]

The following shows you the formulas in each cell. By changing the values in A1, you can alter the width of the ellipse and with B1 you can change the height of it. By using a larger X range with smaller intervals, the ellipse would look smoother. My example uses X intervals of 0.5, so if you used 0.1, the curve would appear smoother.
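
For readers without Excel, the same table can be generated with a short script. This is a sketch of the calculation above rather than a replacement for the spreadsheet. The function name and the `step` parameter are hypothetical, with `a`, `b` and the 0.5 interval matching the example.

```python
import math

def ellipse_points(a, b, step=0.5):
    """Return (x, y, -y) rows for the ellipse x^2/a^2 + y^2/b^2 = 1."""
    n = round(2 * a / step)
    points = []
    for i in range(n + 1):
        x = -a + i * step
        inside = max(0.0, 1 - x**2 / a**2)   # guard against tiny negative round-off
        y = math.sqrt(inside * b**2)
        points.append((x, y, -y))
    return points

# Same width (a = 4), height (b = 3) and 0.5 interval as the Excel example
table = [(x, round(y, 1), round(y_neg, 1)) for x, y, y_neg in ellipse_points(4, 3)]
for row in table:
    print(row)
```

Passing a smaller `step`, such as 0.1, produces more rows and therefore a smoother curve when plotted.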

Footnote

1 https://en.wikipedia.org/wiki/Chelyabinsk_meteor
2 http://www.amsmeteors.org/fireballs/faqf/#5
3 http://www.popsci.com/science/article/2013-02/astronomers-calculate-russian-meteorites-orbit-and-realize-it-has-80-million-cousins
4 http://www.wired.com/2015/07/asteroid-2015-hm10-will-not-destroy-earth/
5 http://neo.jpl.nasa.gov/images/Chelya_orb.png
6 http://www.popsci.com/science/article/2013-02/astronomers-calculate-russian-meteorites-orbit-and-realize-it-has-80-million-cousins
7 NASA JPL Orbit Diagram of 2014 EC http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=2014+EC&orb=1
8 http://www.express.co.uk/news/nature/507480/Asteroid-Strikes-Earth-Damage-Nasa-Destruction
9 http://www.jpl.nasa.gov/news/news.php?feature=4380
10 https://en.wikipedia.org/wiki/Tunguska_event
11 http://paleobiology.si.edu/dinosaurs/info/everything/why_2.html
12 http://www.scientificamerican.com/article/asteroid-killed-dinosaurs/
13 https://www.youtube.com/watch?v=Dcp0JhwNgmE
14 https://en.wikipedia.org/wiki/Extinction_event
15 http://www.space.com/19681-dinosaur-killing-asteroid-chicxulub-crater.html
16 https://en.wikipedia.org/wiki/The_Age_of_Reptiles
17 https://ccaeducause.files.wordpress.com/2011/01/bernard-meade.pdf
18 https://en.wikipedia.org/wiki/Asteroid_belt
19 https://en.wikipedia.org/wiki/Kuiper_belt
20 https://en.wikipedia.org/wiki/Oort_cloud
21 http://imgur.com/gallery/FTE4Ly9
22 http://www.daviddarling.info/childrens_encyclopedia/comets_QA.html
23 http://www.popsci.com/article/technology/what-nasa-should-do-instead-asteroid-retrieval-mission
24 http://www.space.com/23501-russian-meteor-explosion-asteroid-threat.html
25 http://www.bbc.co.uk/news/science-environment-24839601
26 http://www.computerweekly.com/news/1280090479/Lack-of-funds-puts-Earth-in-shadow-of-asteroid-threat
27 http://www.vox.com/2014/9/16/6226379/nasa-asteroid-risk-location
28 http://www.minorplanetcenter.net/iau/lists/ArchiveStatistics.html
29 https://en.wikipedia.org/wiki/P/2010_A2
30 http://www.popsci.com/science/article/2013-02/how-powerful-new-telescopes-are-helping-us-find-more-asteroids-hopefully-just-time
31 http://www.britannica.com/EBchecked/topic/39730/asteroid
32 https://en.wikipedia.org/wiki/Occultation#Occultations_by_asteroids
33 Mathematics Magazine, Vol. 72(1999), pp. 83-91
34 https://en.wikipedia.org/wiki/History_of_ancient_numeral_systems#cite_note-13
35 www.lpi.usra.edu/books/AsteroidsIII/pdf/3027.pdf
36 https://en.wikipedia.org/wiki/Ceres_(dwarf_planet)
37 http://www.schillerinstitute.org/fid_97-01/982_orbit_ceres.pdf
38 https://www.math.rutgers.edu/~cherlin/History/Papers1999/weiss.html
39 “Orbital Mechanics: Theory and Applications” by Tom Logsdon, ISBN 0-471-14636-6, p. 164
40 http://www.open.edu/openlearn/science-maths-technology/science/physics-and-astronomy/astronomy/the-naming-asteroids
41 https://groups.google.com/forum/#!topic/b-a-s/bYkwFzW9t7o
42 http://science.nasa.gov/science-news/science-at-nasa/1999/features/ast20apr99_1/
43 http://www.lawrencehallofscience.org/static/hou/hs/wise/ppt/WISE-Asteroids.ppt
44 https://en.wikipedia.org/wiki/Telescope
45 https://www.youtube.com/watch?v=goL3K_xQzbE
46 http://inventors.about.com/od/cstartinventions/a/CCD.htm
47 https://en.wikipedia.org/wiki/Photodiode
48 Computerworld. August 6, 2001, p.49
49 http://www.digicamhistory.com/1970s.html
50 http://petapixel.com/2010/08/05/the-worlds-first-digital-camera-by-kodak-and-steve-sasson/
51 http://www.apple.com/iphone-6s/specs/
52 http://spiff.rit.edu/richmond/asras/catch_plates/catch_plates.html
53 http://www.planetary.org/blogs/emily-lakdawalla/2011/3248.html
54 http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=2007+PA8&orb=1
55 https://answers.yahoo.com/question/index?qid=20080212210936AAHddvM

56 http://www.lsst.org/about/dm
57 http://www.gutenberg.us/articles/big_data
58 http://www.lsst.org/lsst/public
59 https://en.wikipedia.org/wiki/Large_Synoptic_Survey_Telescope
60 http://www.symmetrymagazine.org/breaking/2010/10/18/astronomical-computing
61 http://www.symmetrymagazine.org/breaking/2010/10/18/astronomical-computing
62 https://en.wikipedia.org/wiki/Inverse_problem
63 http://www.theregister.co.uk/Print/2010/11/26/lsst_big_data_and_agile/
64 http://research.majuric.org/wp/survey-science/large-survey-database/
65 http://www.lsst.org/about/dm/technology
66 https://en.wikipedia.org/wiki/Hierarchical_Data_Format
67 https://www.cac.cornell.edu/education/Training/Data12/DataFormats2012.pdf
68 http://research.majuric.org/wp/survey-science/large-survey-database/
69 http://fallingstar.com/home.php
70 http://blog.fallingstar.com/index.php/2015/12/04/our-first-neo/
71 http://www.leonarddavid.com/asteroid-alert-system-first-light-reported/
72 https://gears.guidebook.com/guide/39106/event/11384479/
73 http://fallingstar.com/specifications.php
74 http://wise.ssl.berkeley.edu/mission_faq.html
75 http://wise.ssl.berkeley.edu/mission.html
76 http://www.jpl.nasa.gov/multimedia/wise/
77 http://wise.ssl.berkeley.edu/mission_faq.html
78 http://wise2.ipac.caltech.edu/docs/release/allsky/expsup/sec8_1.html
79 https://en.wikipedia.org/wiki/Wide-field_Infrared_Survey_Explorer#NEOWISE
80 https://en.wikipedia.org/wiki/Tracking_and_Data_Relay_Satellite_System
81 http://wise.ssl.berkeley.edu/documents/wise/launch/2009-12-03.pdf
82 http://wise.ssl.berkeley.edu/edu_accessing_images.html
83 http://wise2.ipac.caltech.edu/docs/release/neowise/expsup/sec4_1.html
84 http://wise2.ipac.caltech.edu/docs/release/allsky/expsup/sec4_3a.html
85 http://wise2.ipac.caltech.edu/docs/release/prelim/expsup/sec4_3a.html
86 http://www.eso.org/sci/php/meetings/adass2011/Slides/PDF/All/ADASS_XXI_I01_Cutri.pdf
87 https://en.wikipedia.org/wiki/Gaia_(spacecraft)
88 http://esamultimedia.esa.int/multimedia/publications/BR-296/
89 https://en.wikipedia.org/wiki/Gaia_(spacecraft)
90 http://www.odbms.org/wp-content/uploads/2013/11/Charting_the_Galaxy.pdf
91 http://www.mpia.de/gaia/about/dpac
92 https://en.wikipedia.org/wiki/Data_Processing_and_Analysis_Consortium
93 http://www.intersystems.com/library/library-item/european-space-agency-chooses-intersystems-cach-database-for-gaia-mission-to-map-milky-way/
94 http://gaia.ac.uk/mission/gaia-dpac
95 http://www.iwinac.uned.es/Astrostatistics/w/manuscripts/deTeodoro.pdf
96 http://www.odbms.org/blog/2011/02/objects-in-space/
97 https://upload.wikimedia.org/wikipedia/commons/b/ba/MareNostrum_III_cenital_general.jpg
98 http://gaia.ub.edu/?page_id=4327
99 http://www.apc.univ-paris7.fr/~beckmann/common/Gleyzes_Espace_BigData_CNES.pdf
100 http://www.spaceops2012.org/proceedings/documents/id1275512-Paper-003.pdf
101 https://www.youtube.com/watch?v=PkR6LAOgSII
102 https://www.skatelescope.org/location/
103 https://www.skatelescope.org/layout/
104 https://en.wikipedia.org/wiki/Square_Kilometre_Array
105 https://www.skatelescope.org/sadt-report-skaenews-july2015/
106 https://www.youtube.com/watch?v=PkR6LAOgSII
107 https://www.skatelescope.org/frequently-asked-questions/
108 https://www.skatelescope.org/signal-processing/
109 https://www.skatelescope.org/signal-processing/
110 https://www.skatelescope.org/signal-processing/
111 https://www.emc.com/collateral/software/white-papers/h10938-vnx-best-practices-wp.pdf A RAID 6 (14+2) raid group using 4TB drives contains approximately 50TB of usable space. Twenty of these groups would equal 1 PB.
112 https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf
113 https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf
114 https://en.wikipedia.org/wiki/Lustre_(file_system)

115 http://www.cam.ac.uk/research/features/masters-of-the-universe#sthash.5RBAd34q.dpuf
116 https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf
117 https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf
118 https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf, p. 48
119 https://www.skatelescope.org/sdp/
120 https://www.skatelescope.org/software-and-computing/
121 http://www.top500.org/lists/2015/06/
122 https://www.skatelescope.org/software-and-computing/
123 https://en.wikipedia.org/wiki/Timeline_of_astronomy
124 http://minorplanetcenter.net/blog/lets-start-2014-with-a-bang-hello-and-goodbye-to-asteroid-2014-aa/
125 https://en.wikipedia.org/wiki/2014_AA
126 http://minorplanetcenter.net/blog/wp-content/uploads/2014/01/2014AA-2014-01-02-673_0-by-G96.gif
127 http://kti.tugraz.at/staff/elex/courses/science20/slides/e-science_e-infrastructures_content_mining_week4.pdf
128 https://en.wikipedia.org/wiki/Image_subtraction
129 http://www.asterank.com/about
130 https://github.com/typpo/asterank
131 https://docs.mongodb.org/v3.0/reference/operator/query/lt/
132 http://www.csicop.org/si/show/is_the_sky_falling
133 http://www.arm.ac.uk/preprints/455.pdf
134 http://dawn.jpl.nasa.gov/multimedia/pdfs/Dawn_Vesta_Ceres_Lithograph.pdf
135 https://en.wikipedia.org/wiki/Asteroid_impact_avoidance
136 “Military Space Power: A Guide to the Issues” by Wilson Wong and James Fergusson, ISBN 0313356807, p. 98
137 http://www.neoshield.net/mitigation-measures/kinetic-impactor/
138 http://news.discovery.com/space/asteroids-meteors-meteorites/top-10-asteroid-deflection-13013010.htm
139 http://www.travelsinorbit.com/save-the-planet-from-asteroids/
140 https://en.wikipedia.org/wiki/Yarkovsky_effect
141 “Solar Sailing: Technology, Dynamics and Mission Applications” by Colin McInnes, ISBN 3540210628, p. 33
142 http://www.dailymail.co.uk/sciencetech/article-2308660/Animation-released-shows-Nasa-intends-CAPTURE-asteroid.html
143 http://phys.org/news/2008-12-asteroid.html
144 http://www.universetoday.com/90605/nasa-developing-real-life-tractor-beams/
145 http://www.projectrho.com/public_html/rocket/infrastructure.php
146 http://www.sei.aero/downloads/SEI_LOEM_30March2004.pdf
147 http://discovermagazine.com/2011/apr/14-when-astronomy-met-computer-science
148 https://gigadom.wordpress.com/2011/06/29/to-hadoop-or-not-to-hadoop/
149 http://www.cray.com/sites/default/files/CP-Cray-NNSA-XC40-Trinity.pdf
150 Top500 is an organization that rates supercomputers (www.Top500.org).
151 https://en.wikipedia.org/wiki/Tianhe-2
152 http://www.hpcwire.com/2014/07/17/dd/
153 https://d0.awsstatic.com/whitepapers/Intro_to_HPC_on_AWS.pdf
154 https://cloud.google.com/solutions/architecture/highperformancecomputing
155 http://www.nextplatform.com/2015/07/13/top-500-supercomputer-list-reflects-shifting-state-of-global-hpc-trends/
156 http://www.lanl.gov/science/NSS/pdf/NSS_April_2013.pdf
157 http://www.pcworld.com/article/127105/article.html
158 https://en.wikipedia.org/wiki/Intel_4004
159 HGST Ultrastar C15K600 https://www.hgst.com/sites/default/files/resources/Ultrastar_C15K600_SAS_Spec_V1.4.pdf
160 https://www.hgst.com/products/hard-drives/ultrastar-he10
161 http://www.theregister.co.uk/2015/08/18/dssd_nvme_fabric_flash_magic/
162 http://insidehpc.com/2015/04/taccs-wrangler-uses-dssd-technology-for-data-intensive-computing/
163 https://en.wikipedia.org/wiki/Astroinformatics
164 https://en.wikipedia.org/wiki/Astrostatistics
165 https://en.wikipedia.org/wiki/Declination
166 https://en.wikipedia.org/wiki/Right_ascension

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
