Machine Learning in Large Radio Astronomy Surveys (How to Do Science with Petabytes)
Ray Norris, Western Sydney University & CSIRO Astronomy & Space Science

ASKAP: Australian Square Kilometre Array Pathfinder
▪ $185m telescope built by CSIRO, approaching completion
▪ Mission: to solve fundamental problems in astrophysics
▪ “EMU” = Evolutionary Map of the Universe

PAFs (phased array feeds) -> Big Data
Data rate to correlator = 100 Tbit/s
  = 3000 Blu-ray disks/second
  = 62 km tall stack of disks per day
  = world internet bandwidth in June 2012
Processed data volume = 70 PB/yr (only 4 PB/yr is stored)

EMU: Evolutionary Map of the Universe
▪ PI: Ray Norris
▪ Will survey the whole sky for radio continuum
▪ Will discover ~70 million galaxies, compared to 2.5 million currently known
▪ Will revolutionize our view of the Universe
▪ Will revolutionize the way we do astronomy: “large-n astronomy”

[Figure, time axis 1940 to 2020: ASKAP radio continuum survey: EMU = 70 million; NVSS = 1.8 million; current total = 2.5 million. From Norris, 2017, Nature Astronomy, 1, 671]

EMU Team: ~300 scientists in 21 countries

Key project  Title                                               Project leader
KP1          EMU Value-Added Catalogue                           Nick Seymour (Curtin)
KP2          Characterising the Radio Sky                        Ian Heywood (Oxford)
KP3          EMU Cosmology                                       David Parkinson (KASI, Korea)
KP4          Cosmic Web                                          Shea Brown (Iowa)
KP5          Clusters of Galaxies                                Melanie Johnston-Hollitt (NZ)
KP6          Cosmic star formation history                       Andrew Hopkins (AAO)
KP7          Evolution of radio-loud AGN                         Anna Kapinska (UWA)
KP8          Radio AGN in the EoR                                Jose Afonso (Lisbon)
KP9          Radio-quiet AGN                                     Isabella Prandoni (Bologna)
KP10         Binary super-massive black holes                    Roger Deane (Cape Town)
KP11         Local Universe                                      Josh Marvil (NRAO)
KP12         The Galactic Plane                                  Roland Kothes (Canada)
KP13         SCORPIO: Cataloguing the Radio Stars in our Galaxy  Grazia Umana (Catania)
KP14         WTF: Discovering the Unexpected                     Ray Norris (CSIRO/WSU)
KP15         The Magellanic Clouds                               Miroslav Filipovic (WSU)

Bad news: even with 300+ scientists, we cannot analyse the data in traditional ways.
Good news: with 70 million galaxies, we can extract the science from the data in innovative ways -> “large-n astronomy”

Integrated Sachs-Wolfe effect can measure Dark Energy
▪ EMU can cross-correlate galaxy positions against the cosmic microwave background

E.g.: “Cosmic magnification” enables us to measure whether gravity still obeys General Relativity at large distances
▪ Large-n approach: EMU can cross-correlate foreground galaxies and background galaxies

But to do these tests we need to know (roughly) the distances (redshifts) of the galaxies
▪ Traditional approach: measure redshifts with a spectrometer on a large optical telescope
▪ Large-n approach: use machine learning on all observational features to estimate the “statistical redshift”

Currently comparing machine-learning techniques for redshift measurement.
[Figure: ML-determined redshift versus spectroscopic redshift, with fit and residual, in four panels: Neural net, Random Forest (NA), Random Forest (JHU), kNN. From Norris et al. 2018, submitted to PASP]

“There’s nothing as useless as a radio source” (Jim Condon, 2011)
[Figure: data from Jordan Collier PhD thesis]

Source classification & Cross-identification
Problem: radio sources can consist of several components

Cross-identification
[Figure: radio image; radio image (contours) overlaid on infrared image (greyscale)]

Radio Galaxy Zoo
• Radio Galaxy Zoo Project started in 2010 to solve the EMU cross-ID problem
• Initially used ATLAS but then added FIRST
• Launched in December 2013
• >12,000 citizen scientists
• >2,000,000 classifications
• >120,000 galaxies classified
(image courtesy of Mathew Alger and Radio Galaxy Zoo)

Radio Galaxy Zoo (over 2 million identifications by citizen scientists)
• Double radio sources have an infrared galaxy between the radio components
• Single radio sources have an infrared galaxy at the same position as the radio components
[Figure: radio contours on infrared grey-scale]

Current Radio Source cross-ID projects
• Expert manual cross-ID for training/test (lead: Jesse Swan, U. Tas)
• Radio Galaxy Zoo (lead: Ivy Wong, UWA)
• Bayesian (lead: Dongwei Fan, NO/CAS, & Tamas Budavari, JHU)
• Convolutional neural networks (several groups)
• Self-organized maps (Tim Galvin, CSIRO)
• Self-organized maps with auto-encoders (Nic Ralph, WSU)
• Image complexity (lead: Gary Segal, UQ)
(Gold standard reliability: NVSS 90%, ATLAS/EMU 99%)

A Bayesian approach

“Machine learning in astronomy” collaboration
• Participants from WSU, CSIRO (CASS, Data61, IM&T), ANU, Sydney Uni, UQ, U. Iowa, U. Minnesota, U. Calgary, etc.
• Regular informal Zoom research meetings take place every 2nd Thursday at 11.00-12.00 ADST, 09.00-10.00 AWST, 0200-0300 UTC on https://uws.zoom.us/j/2319669070
• Provide data sets and training sets for experimenting
• E.g. ATLAS DR3, synthetic data sets, etc.
• Other resources: see http://mlprojects.pbworks.com
We welcome everyone interested to join our discussion meetings.
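As a starting point for experimenting with training sets like those mentioned above, the kNN flavour of the “statistical redshift” idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the comparison by Norris et al.: the function names (`knn_redshift`, `make_synthetic_sample`), the two toy “colour” features and their linear dependence on redshift are all hypothetical, and the data are synthetic rather than drawn from ATLAS or EMU.

```python
import math
import random


def knn_redshift(train_features, train_redshifts, query, k=5):
    """Estimate a redshift as the mean redshift of the k nearest training galaxies."""
    # Euclidean distance in feature space from the query to every training galaxy
    distances = [
        (math.dist(features, query), z)
        for features, z in zip(train_features, train_redshifts)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(z for _, z in nearest) / len(nearest)


def make_synthetic_sample(n, rng):
    """Hypothetical toy data: two noisy 'colour' features that vary linearly with redshift."""
    features, redshifts = [], []
    for _ in range(n):
        z = rng.uniform(0.0, 2.0)
        colour_a = 0.8 * z + rng.gauss(0.0, 0.05)   # assumed relation, for illustration only
        colour_b = -0.5 * z + rng.gauss(0.0, 0.05)  # assumed relation, for illustration only
        features.append((colour_a, colour_b))
        redshifts.append(z)
    return features, redshifts


rng = random.Random(42)
# Training set plays the role of galaxies with spectroscopic redshifts
train_x, train_z = make_synthetic_sample(500, rng)
# Test set plays the role of galaxies whose redshift we want to estimate
test_x, test_z = make_synthetic_sample(100, rng)

# Residual = ML-estimated redshift minus "true" (spectroscopic) redshift
residuals = [knn_redshift(train_x, train_z, x, k=5) - z for x, z in zip(test_x, test_z)]
rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(f"RMS residual (estimated - true redshift): {rms:.3f}")
```

With real survey data the features would be measured quantities such as fluxes and colours, and they would normally be rescaled before computing distances so that no single feature dominates the neighbour search.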
We acknowledge the Wajarri Yamaji people as the traditional owners of the ASKAP site (Western Australia).
See our newsletter on http://askap.pbworks.com.