
Maximizing the Information from Surveys of the 2020s

Daniel Masters (JPL/Caltech)

Collaborators: Peter Capak, Iary Davidzon, Daniel Stern, Jason Rhodes, Judy Cohen, Shooby Hemmati, Megan Tjandrasuwita, Bahram Mobasher, Dave Sanders, Adam Stanford

IPAC Seminar, October 16, 2019

© 2019 California Institute of Technology. Government sponsorship acknowledged.

Breakthrough from the 1990s: Accelerating cosmic expansion

2011 Nobel Prize in Physics

Λ? Tension of local H0 measurement with CMB-based value now at 4.4σ (Riess et al. 2019)

Probes of the Cosmic Expansion History: Standard Candles

type Ia supernovae (other supernovae) (quasars)

Probes of the Cosmic Expansion History: Standard Rulers

cosmic microwave background

Baryon acoustic oscillations (BAO)

characteristic size is imprinted into cosmic density fluctuations at recombination; can measure that characteristic scale over cosmic time

Probes of the : Weak Gravitational Lensing

intervening large scale structure magnifies and distorts (shears) the shapes of background sources (~2% effect)

Upcoming Dark Energy Missions

                       LSST                       Euclid                   WFIRST
Proposed lifetime      2022 - 2032                2022 - 2028              2025 - 2031
Mirror size (m)        6.5 (effective diameter)   1.2                      2.4
Survey size (sq deg)   20,000                     15,000                   2,227
Median z (WL)          0.9                        0.9                      1.2
Depth (AB mag)         ~27.5                      ~24.5                    ~27
FoV (sq deg)           9.6                        0.5 (Vis), 0.5 (NIR)     0.28

The measurement problem

• Weak lensing probes the growth of structure with redshift
• Need to split shear sample up into well-defined redshift bins, so we know which are foreground vs. background galaxies
  - Mean redshift of galaxies in ~10-20 shear bins must be known to better than 0.2%
• Not possible with existing photometric redshift (photo-z) methods

Degradation of cosmological constraint with increasing photo-z bias (Ma et al. 2006)
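As a rough illustration of the 0.2% requirement on the mean redshift of each shear bin, the check below compares estimated and true redshifts for one bin. It is a minimal sketch: the normalization of the error by (1 + mean redshift) is an assumed convention (the slide only quotes "0.2%"), the function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def mean_redshift_bias(z_est, z_true):
    """Fractional error in the mean redshift of one tomographic bin.

    Assumption (not stated on the slide): the error is normalized by
    (1 + <z_true>), a common convention for weak-lensing requirements.
    """
    z_est = np.asarray(z_est, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    return (z_est.mean() - z_true.mean()) / (1.0 + z_true.mean())

def meets_requirement(z_est, z_true, tolerance=0.002):
    """True if the bin's mean redshift is known to better than `tolerance`."""
    return abs(mean_redshift_bias(z_est, z_true)) < tolerance

# Synthetic bin centered near z = 0.9 with a small systematic offset plus scatter.
rng = np.random.default_rng(0)
z_true = rng.normal(0.9, 0.05, size=10_000)
z_est = z_true + 0.001 + rng.normal(0.0, 0.03, size=z_true.size)
print(mean_redshift_bias(z_est, z_true), meets_requirement(z_est, z_true))
```

Note that the per-galaxy scatter (0.03 here) can be much larger than the tolerance; it is the error on the bin mean that the requirement constrains.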

Template fitting

• Use models of galaxy spectra to define a grid of possible colors
• Main issues:
  - Are templates fully representative of the true population? What about overfitting?
  - How to determine correct priors for different template/redshift combinations?

Stickley et al. 2016, simulated SPHEREx data

Machine learning

• Can we uncover the color-redshift relation directly?
• Relies on spectroscopic training samples
• Unfortunately we’re not in a data-rich environment: spectroscopic samples are limited and biased

Credit: Google, www.google.com/about/main/machine-learning-qa/


The empirical P(z|C) relation

SDSS galaxy distribution in two colors (axes: i - g, g - r); Rahman et al. 2015

→ Photo-z’s are fundamentally a mapping of galaxy colors to redshift
→ Color distribution of galaxies to a given depth is limited and measurable

The self-organizing map

• The problem of mapping a high-dimensional dataset arises in many fields, and many techniques have been developed
• We used the Self-Organizing Map (SOM; Kohonen 1990)

Illustration of the SOM (from Carrasco Kind & Brunner 2014)

What is a SOM? Starts with high-dimensional data

Credit: Shoubaneh Hemmati (JPL/Caltech)

What is a SOM? Similar in one dimension

Credit: Shoubaneh Hemmati (JPL/Caltech)

What is a SOM? Similar in another dimension

Credit: Shoubaneh Hemmati (JPL/Caltech)

What is a SOM?

• The SOM projects a high-dimensional data space to lower dimension in a topologically-ordered way

Credit: Shoubaneh Hemmati (JPL/Caltech)

Training the map

[Figure labels: Training data; Best-matching cell in current SOM]

1. Initialized map is presented with training data, i.e. the colors of one galaxy from the overall sample.
2. Map moves towards training data, with the closest cells being most affected.
3. Process repeats many times with samples drawn from training set until the map approximates the data distribution well.
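The three training steps can be sketched in a few lines of NumPy. This is an illustrative toy, not the code used for the maps in the talk: the grid size, the exponentially decaying learning rate, the Gaussian neighborhood function, and the names `train_som` and `best_matching_cell` are all assumptions, and the input is synthetic.

```python
import numpy as np

def train_som(data, n_rows=10, n_cols=10, n_iter=5000,
              lr_start=0.5, lr_end=0.01, seed=0):
    """Train a small rectangular SOM on `data` with shape (n_samples, n_features).

    Grid size, learning-rate schedule, and Gaussian neighborhood are
    illustrative assumptions, not values from the talk.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_features = data.shape[1]

    # Step 1: initialize cell weights randomly within the range of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    weights = rng.uniform(lo, hi, size=(n_rows, n_cols, n_features))

    # Grid coordinates of every cell, used to measure distance on the map.
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    sigma_start, sigma_end = max(n_rows, n_cols) / 2.0, 1.0

    for t in range(n_iter):
        frac = t / n_iter
        lr = lr_start * (lr_end / lr_start) ** frac            # decaying learning rate
        sigma = sigma_start * (sigma_end / sigma_start) ** frac  # shrinking neighborhood

        # Present one randomly drawn training object to the map.
        x = data[rng.integers(len(data))]

        # Best-matching cell: smallest Euclidean distance in feature space.
        dist2 = np.sum((weights - x) ** 2, axis=2)
        bmu_r, bmu_c = np.unravel_index(np.argmin(dist2), dist2.shape)

        # Step 2: move cells towards x; cells near the best match move most.
        grid_d2 = (rows - bmu_r) ** 2 + (cols - bmu_c) ** 2
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, :, None] * (x - weights)

    # Step 3 is the loop itself: repeat until the map approximates the data.
    return weights

def best_matching_cell(weights, x):
    """(row, col) of the cell whose weight vector is closest to x."""
    dist2 = np.sum((weights - np.asarray(x, dtype=float)) ** 2, axis=2)
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Synthetic stand-in for an 8-dimensional color vector per galaxy.
rng = np.random.default_rng(42)
colors = rng.normal(size=(2000, 8))
som = train_som(colors, n_rows=8, n_cols=8, n_iter=2000)
print(som.shape, best_matching_cell(som, colors[0]))
```

Because each update is a weighted average of the old cell weight and the presented object, the cell weights always stay inside the range spanned by the training data.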

The 8-color Galaxy SOM to depth

[Figure: six panels of AB mag vs. wavelength (0.5-2.0 μm), one per SOM cell, each labeled with its photo-z estimate:
  Cell # 8642, x = 17, y = 115: photo-z estimate 1.186
  Cell # 8988, x = 63, y = 119: photo-z estimate 0.595
  Cell # 8342, x = 17, y = 111: photo-z estimate 1.298
  Cell # 6490, x = 40, y = 86: photo-z estimate 2.364
  Cell # 2043, x = 18, y = 27: photo-z estimate 0.760
  Cell # 3051, x = 51, y = 40: photo-z estimate 0.529]

Masters et al. 2015, 2017; Masters et al. 2015, ApJ, 813, 53

SOM traces bulk of the color distribution

Blue points: colors of SOM cells

Red points: Galaxy colors to Euclid depth
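The cell labels in the example panels of the 8-color galaxy SOM (e.g. Cell # 8642, x = 17, y = 115) are all consistent with row-major numbering on a grid 75 cells wide, i.e. cell # = x + 75 y. The width of 75 is inferred here from those six labels and is not stated on the slides; the helper names are hypothetical.

```python
MAP_WIDTH = 75  # assumption: inferred from the six cell labels, not stated in the talk

def cell_number(x, y, width=MAP_WIDTH):
    """Row-major cell index from map coordinates."""
    return y * width + x

def cell_position(number, width=MAP_WIDTH):
    """Map coordinates (x, y) from a row-major cell index."""
    y, x = divmod(number, width)
    return x, y

# The six (cell #, (x, y)) labels quoted in the figure panels.
LABELS = {
    8642: (17, 115),
    8988: (63, 119),
    8342: (17, 111),
    6490: (40, 86),
    2043: (18, 27),
    3051: (51, 40),
}

for number, (x, y) in LABELS.items():
    print(number, cell_number(x, y) == number, cell_position(number) == (x, y))
```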

Intentionally mapping the color space

• The unsupervised manifold learning of the color space lets us explicitly measure where we have spectroscopic training data

Masters et al. 2015, ApJ, 813, 53

C3R2 = Complete Calibration of the Color-Redshift Relation

Judith Cohen (Caltech) - PI of Caltech Keck C3R2 allocation: 16 nights (DEIMOS + LRIS + MOSFIRE, kicked off program in 2016A)
Daniel Stern (JPL) - PI of NASA Keck C3R2 allocation: 10 nights (all DEIMOS; “Key Strategic Mission Support”)
Daniel Masters (JPL) - PI of NASA Keck C3R2 allocation 2018A/B: 10 nights (5 each LRIS/MOSFIRE; “Key Strategic Mission Support”)
Dave Sanders (IfA) - PI of Univ. of Hawaii Keck C3R2 allocation: 6 nights (all DEIMOS) + H20
Bahram Mobasher (UC-Riverside) - PI of UC Keck C3R2 allocation: 2.5 nights (all DEIMOS)

+ time allocations on VLT (PI F. Castander), MMT (PI D. Eisenstein), and GTC (PI C. Guitierrez)
- Coordinating closely with these collaborators for these observations
- Sample drawn from 6 fields totaling ~6 deg²

Additional Collaborators: Peter Capak, S. Adam Stanford, Nina Hernitschek, Francisco Castander, Sotiria Fotopoulou, Audrey Galametz, Iary Davidzon, Stephane Paltani, Jason Rhodes, Alessandro Rettura, Istvan Szapudi, and the Euclid Organization Unit – Photometric (OU-PHZ) team

Mapping the galaxy P(z|C) relation

Complete Calibration of the Color-Redshift Relation (C3R2) Survey:

• Designed to “fill the gaps” in our knowledge of the color-redshift relation to Euclid depth

• Collaboration of Caltech (PI J. Cohen, 16 nights), NASA (PI D. Stern, 10 nights; PI D. Masters, 10 nights in 2018A/2018B), the University of Hawaii (PI D. Sanders, 6 nights), and the University of California (PI B. Mobasher, 2.5 nights), with European participation on the VLT (PI F. Castander)
  - Multiplexed spectroscopy with a combination of Keck DEIMOS, LRIS, and MOSFIRE and VLT FORS2/KMOS targeting VVDS, SXDS, COSMOS, and EGS
  - DR1 (Masters, Stern, Capak et al. 2017) comprised 1283 redshifts; DR2 (Masters, Stern, Cohen, et al. 2019) brings the total to >4400 redshifts; observations in 2017B and later will comprise DR3 (https://sites.google.com/view/c3r2-survey/home)

• Currently a total of 44 Keck nights awarded (29 observed in 2016A-2017A, 5 nights each in 2017B/2018A/2018B)

C3R2 color map overview

C3R2 stats through DR2 (2016A-2017A)

• 29 nights, ~19 good weather
  - 22 DEIMOS, 5 LRIS, 2 MOSFIRE
• 6696 spectra: 4525 Q >= 3 (high quality), 3970 Q = 4 (certain)

Masters et al. 2019 ApJ accepted

SOM-based redshift performance

• Simple test: Use position on SOM to predict photo-z
  - Incorporate nothing in defining P(z|C) relation other than the median deep survey photo-z in cells of the Euclid/WFIRST color space
  - Outlier fraction 4.7%, bias (after removing outliers) of 0.18%

Masters et al. 2019 ApJ 877, 81

Color space coverage
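The SOM-based redshift test (assign each cell the median deep-survey photo-z of its occupants, then compare against spectroscopic redshifts) could be sketched as follows. The definitions used here are assumptions, since the slide does not state them: the error is taken as dz = (z_phot - z_spec) / (1 + z_spec), outliers as |dz| > 0.15, and the bias as the mean of dz after removing outliers. Function names and the tiny input arrays are hypothetical.

```python
import numpy as np

def cell_median_photoz(cell_ids, deep_photoz, n_cells):
    """Median deep-survey photo-z of the galaxies occupying each SOM cell.

    Cells with no galaxies are left as NaN.
    """
    cell_ids = np.asarray(cell_ids)
    deep_photoz = np.asarray(deep_photoz, dtype=float)
    medians = np.full(n_cells, np.nan)
    for cell in np.unique(cell_ids):
        medians[cell] = np.median(deep_photoz[cell_ids == cell])
    return medians

def photoz_performance(z_phot, z_spec, outlier_cut=0.15):
    """Return (outlier fraction, bias after removing outliers).

    Assumed conventions (not stated in the talk): dz is normalized by
    (1 + z_spec), outliers have |dz| > outlier_cut, bias is the mean of dz.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    outlier = np.abs(dz) > outlier_cut
    return outlier.mean(), dz[~outlier].mean()

# Toy example: five deep-survey galaxies in a three-cell map.
medians = cell_median_photoz([0, 0, 1, 1, 1], [0.4, 0.6, 1.0, 1.2, 1.1], n_cells=3)

# Spectroscopic galaxies inherit the median photo-z of the cell they land in.
spec_cells = np.array([0, 1, 1])
z_spec = np.array([0.52, 1.05, 1.12])
print(photoz_performance(medians[spec_cells], z_spec))
```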

How much does galaxy brightness matter?

All unique pairs of spec-z galaxies with matching positions on SOM are shown, illustrating the relation of magnitude and redshift at fixed color.
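Forming all unique pairs of spec-z galaxies that share a SOM cell is a small grouping exercise. The sketch below assumes each galaxy comes with a cell index, a magnitude, and a spectroscopic redshift; the function name and the toy values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def same_cell_pairs(cell_ids, magnitudes, redshifts):
    """(delta_mag, delta_z) for every unique pair of galaxies sharing a SOM cell."""
    by_cell = defaultdict(list)
    for cell, mag, z in zip(cell_ids, magnitudes, redshifts):
        by_cell[cell].append((mag, z))

    pairs = []
    for members in by_cell.values():
        # combinations() yields each unordered pair exactly once.
        for (mag_a, z_a), (mag_b, z_b) in combinations(members, 2):
            pairs.append((mag_b - mag_a, z_b - z_a))
    return pairs

# Toy input: three galaxies in cell 0, one galaxy alone in cell 1.
pairs = same_cell_pairs([0, 0, 0, 1], [22.0, 23.0, 24.0, 22.0], [1.0, 1.1, 1.2, 0.5])
print(len(pairs), pairs)
```

Plotting delta_z against delta_mag for such pairs shows how redshift depends on magnitude at fixed color.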

How to view the SOM solution

• Dimensionality reduction?
• “Explicit” neural network-style functional solution?
• Cluster identifier?
• All of these interpretations are valid…
• But what we might really be interested in is density estimation!

SOM as density estimator
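One simple way to read a trained map as a density estimate is to count how many objects land in each cell: the occupation fraction approximates the probability mass of that region of color space. This is a sketch under stated assumptions, not the method of the talk: the map is taken as already trained and stored as a flat array of cell weight vectors, and `assign_cells` and `cell_occupation_density` are hypothetical names.

```python
import numpy as np

def assign_cells(weights, data):
    """Index of the best-matching cell for each object.

    weights: (n_cells, n_features) cell weight vectors of a trained map.
    data:    (n_objects, n_features) measured colors.
    """
    weights = np.asarray(weights, dtype=float)
    data = np.asarray(data, dtype=float)
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

def cell_occupation_density(weights, data):
    """Fraction of objects falling in each cell (sums to 1)."""
    cells = assign_cells(weights, data)
    counts = np.bincount(cells, minlength=len(weights))
    return counts / counts.sum()

# Toy map with three cells in a two-dimensional color space.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
data = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [5.0, 5.2]])
print(cell_occupation_density(weights, data))
```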

The data prior

• Will have lots of noisy measurements M estimating some true underlying value C (e.g., true galaxy colors)
  - Assume we want to measure some parameter z given M:

Bayes’ rule

[Equation annotations]
Probability of measurement M given true underlying value C: easy
Prior on the data having a truth value: encoded in the data distribution itself!
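The equation on this slide did not survive extraction. One form consistent with the two annotations (a likelihood P(M|C) and a data prior P(C)) is written out below. The marginalization over the true value C is an assumption about how the slide connects z to M, not a transcription of the original.

```latex
P(z \mid M) = \int P(z \mid C)\, P(C \mid M)\, dC,
\qquad
P(C \mid M) = \frac{P(M \mid C)\, P(C)}{\int P(M \mid C')\, P(C')\, dC'}
```

Here P(M|C) is the measurement (noise) model and P(C) is the prior on the true value, which is the term the following slides call the density prior.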

P(C) – The density prior

? Most likely truth value!

Why is the data prior so important?

• Lets us identify true outliers, or the “surprise” value of a given observation
• Increases information content of noisy measurements
• Lets us achieve unbiased statistical estimates
• True density distribution encodes galaxy evolution
• But how to recover P(C)?

What we really want is high-dimensional deconvolution

• Like extreme deconvolution (XDC; Bovy, Hogg & Roweis 2011), we want to infer a distribution function from a large number of individual, noisy, and sometimes incomplete samples

“The description you are interested in as a scientist is not the observed distribution, what you really want is the description of the distribution you would have if you had good data, that is, data with vanishingly small uncertainties and with all of the dimensions measured.” - Bovy, Hogg & Roweis, 2011
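To make the idea of deconvolution concrete, the toy below recovers the mean and variance of a noise-free one-dimensional Gaussian from samples that each carry their own known noise variance. It uses expectation-maximization updates of the extreme-deconvolution type, restricted to a single Gaussian component, one dimension, and complete data; that restriction, the function name, and the synthetic data are assumptions made for illustration, and this is not the scalable high-dimensional approach the talk refers to.

```python
import numpy as np

def deconvolve_gaussian(x, noise_var, n_iter=200):
    """Estimate the mean and variance of the underlying (noise-free) distribution.

    Model: truth v_i ~ N(mean, var); observed x_i = v_i + e_i,
    with e_i ~ N(0, noise_var_i) known per object.
    """
    x = np.asarray(x, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)

    # Start from the observed distribution, which is broadened by the noise.
    mean, var = x.mean(), x.var()

    for _ in range(n_iter):
        total = var + noise_var              # predicted variance of each observed x_i
        # E-step: posterior mean and variance of each object's true value.
        b = mean + var / total * (x - mean)
        B = var - var ** 2 / total
        # M-step: update the parameters of the underlying distribution.
        mean = b.mean()
        var = np.mean((b - mean) ** 2 + B)
    return mean, var

# Synthetic test: true distribution N(2, 1), heteroscedastic noise.
rng = np.random.default_rng(3)
truth = rng.normal(2.0, 1.0, size=20_000)
noise_sigma = rng.uniform(0.5, 1.5, size=truth.size)
observed = truth + rng.normal(0.0, noise_sigma)

mean_hat, var_hat = deconvolve_gaussian(observed, noise_sigma ** 2)
print("observed variance:", observed.var())
print("deconvolved mean, variance:", mean_hat, var_hat)
```

The observed variance is inflated by the noise, while the deconvolved variance should land close to the true value of 1 used to generate the sample.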

Manifold learning for galaxy physics

• Evidence suggests the information content of the ~8 broadband images is significantly higher than would be inferred from, e.g., template fitting
• We are actively exploring this problem
• Color selections have a long history in
• What can we learn from high-dimensional color selection?

Power of simple color selections

Intermediate redshift galaxies (Steidel et al. 2004)

Passive galaxies (Williams et al. 2009)

Quasar selection (Stern et al. 2005)

Red sequence cluster selection (Gladders & Yee 2000)

Link between colors and physics

NGC 4125

Spectrum encodes the physics

Arp 256S

Templates from Brown et al. 2014

Position on SOM predicts spectral properties

SOM built from colors of galaxies in the CANDELS survey selected to look like the WFIRST weak lensing sample (Hemmati et al. 2019, ApJ accepted)

Position on SOM predicts spectral properties

SOM built from colors of galaxies in the HORIZON-AGN cosmological hydro-simulation (Laigle et al. 2019, Davidzon et al. 2019 in prep)

Davidzon et al. in prep

Physics from the manifold – rate

Davidzon et al. in prep

Summary

• Exploring different machine learning approaches to derive maximum information from large astrophysical datasets of the 2020s
• Manifold learning has been used successfully (e.g., the SOM, C3R2)
• C3R2 survey is progressing with the goal of calibrating photo-z for Euclid and laying the groundwork for WFIRST
• Working on a scalable high-dimensional deconvolution approach to measure the true distributions encoded by these datasets
• All in service of statistically accurate inference in the world of huge datasets of the next decade!

Thank you!
