McWilliams Center for

McWilliams Center for Cosmology

McWilliams Center for Cosmology

McWilliams Center for Cosmology

,CODQTGG Rachel Mandelbaum (+Optimus Prime) Observational cosmology: • how can we make the best use of large datasets? (+stats, ML connection) • • the - connection

I measure this: for tens of millions of to (statistically) map dark matter and answer these questions

Data I use now:

Future surveys I’m involved in: Hung-Jin Huang

0.2 0.090 0.118 0.062 0.047 0.118 0.088 0.044 0.036 0.186 0.016 70 0.011 0.011 − 0.011 − 0.011 − 0.011 − 0.011 − 0.011 0.011 − 0.011 0.011 ± ± ± ± ± ± ± ± ± ± 50 η

0.1 30 ∆ Fraction 10 1 24 23 22 21 0.40.81.2 0.20.61.0 0.6 0.2 0.20.6 0.30.50.70.9 1.41.61.82.02.2 0.15 0.25 0.10.30.5 0.4 0.00.4 − − 0−.1 − − − − . 0.090 ∆log(cen. R )[kpc/h] P ∆ 0.011 0.3 cen. Mr cen. color cen. e eff cen log(richness) z cluster e R ± dom 1 0.2 . −

cen 0.1 3 Fraction − 21 Research: − 0.118 0.622 0.3 0.011 0.009 Mr 22 1 . ± ± 0.2

0 − . 23

− 0.1 Fraction cen intrinsic alignments in 24 − 0.8 0.062 0.062 0.084 0.7 1.2 0.011 0.011 0.011 0.6 redMaPPer clusters − ± − ± − ± 0.5 color

. 0.4 0.8 0.3 Fraction cen 0.2 0.4 0.1 1.0 0.047 0.048 0.052 0.111 0.3 Advisor : 0.011 0.011 0.011 0.011 e − ± − ± − ± − ± . 0.2 0.6 cen 0.1 Fraction Rachel Mandelbaum 0.2 h] / 0.6 0.6 0.118 0.046 0.000 0.045 0.086 0.5

)[kpc − 0.011 − 0.011 − 0.011 − 0.011 0.011 ff 0.4 e 0.2 ± ± ± ± ±

R 0.3 . 0.2

0.2 Fraction − 0.6 0.1 φsat − log(cen 0.6

∆ 0.9 0.088 0.481 0.393 0.107 0.034 0.089 0.5 0.011 0.010 0.010 0.011 0.011 0.011 0.7 − ± − ± − ± ± − ± ± 0.4 cen 0.3 P 0.5 0.2 Fraction 0.1 0.3 0.5 2.2 0.044 0.164 0.288 0.099 0.003 0.087 0.025 0.4 2.0 0.011 0.011 0.011 0.011 0.011 0.011 0.011 − ± ± − ± ± ± ± ± 0.3 1.8 θcen 0.2

1.6 Fraction 0.1 log(richness) 1.4 0.036 0.175 0.152 0.153 0.074 0.214 0.005 0.022 0.3 0.011 − 0.011 − 0.011 − 0.011 0.011 − 0.011 − 0.011 − 0.011 0.25 ± ± ± ± ± ± ± ± 0.2 z

0.1 Fraction central 0.15 0.186 0.002 0.046 0.028 0.001 0.031 0.056 0.127 0.145 0.3 0.5 − 0.011 0.011 0.011 0.011 0.011 0.011 − 0.011 − 0.011 − 0.011 ± ± ± ± ± ± ± ± ± 0.2 0.3 Fraction cluster e 0.1 0.1 φsat 0.4 0.016 0.057 0.039 0.013 0.074 0.018 0.051 0.010 0.009 0.017 0.3 0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 ± − ± ± − ± ± − ± − ± ± ± − ± R 0.2

∆ 0.0

0.1 Fraction 0.4 − 10 30 50 70 3 1 1 24 23 22 21 0.40.81.2 0.20.61.0 0.6 0.2 0.20.6 0.30.50.70.9 1.41.61.82.02.2 0.15 0.25 0.10.30.5 0.4 0.00.4 − − − − 0−.1 − − − − ∆η cen. dom. cen. Mr cen. color cen. e ∆log(cen. Reff)[kpc/h] Pcen log(richness) z cluster e ∆R Sukhdeep Singh Graduate Student with Prof. Rachel Mandelbaum

Research

0.8 20 LOWZ-CMB lensing Wm = 0.20 LOWZ-z1 15 W = 0.25 LOWZ-galaxy lensing 0.7 m LOWZ-z2

gm 10 Wm = 0.30 CMASS °

1. Weak Lensing p

r LOWZ Pullen+,15 5 0.6 0 i 5 G 0.5 Closed markers: Galaxy lensing Science E h 100 LOWZ-CMB lensing Open markers: CMB lensing LOWZ-galaxy lensing 0.4

- Gravitational Physics error

gm 1 10 0.3 - Nature of Dark Matter, Dark Energy ° 0.2 100 101 102 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 rp [Mpc/h] z

IA 2. Intrinsic Alignments measurement

- Galaxy Formation and Evolution - Weak Lensing Systematics ́μ˧ͽ̏΍͕υ Ͱ˧ͽϝυυ̤

ࠇ °ųɯƚɟʲŏɾǞȩȘŏȀ żȩɯȒȩȀȩǃʿ ࠇ ĦƚŏǺ ƚȘɯǞȘǃ ࠇ ɯɾɟȩɯɾŏɾǞɯɾǞżɯ ࠇ ëɔŏɟɯǞɾʿ ࠘ ›ŏżǕǞȘƚ ƚŏɟȘǞȘǃ

ÚȩɯɾƇȩż ʶǞɾǕ áŏżǕƚȀ ›ŏȘƇƚȀųŏʕȒ Danielle Leonard McWilliams Postdoctoral Fellow

• Weak lensing + other LSS probes of non-standard cosmology, especially alternative theories of gravity • Degeneracies involving beyond-LCDM parameters • Understanding theoretical uncertainties, as related to next-generation surveys Matthew G. Walker Mao-Sheng Liu (Terrence) Advisor: Matthew Walker

Study the distribution of dark matter at small scale through sampling-based inference, including: ρ

g o • Likelihood Approximation L • Approximate Bayesian Computation • Machine Learning

Log r

Evan Tucker - 4th Year Grad Student Working with Matt Walker, we developed a new model for fitting galaxy spectra extracting population properties: age,

[Fe/H], vlos, and mass. We are now developing a new mixture model to understands dynamics of galaxy clusters.

Alex Ge!n"r-SamePostdoctoral# researcher Astroparticle physics

astrophysical observation = dark matter + backgrounds particle physics ⇥ distribution gamma-rays dwarf galaxies 5 10 statistical tools

] annihilation 1 1

10 6

sr stellar dark matter

1 13390 18 11 1 Fermi Reticulums II 51 33 22

2 135.4 detecting unresolved sources kinematics distribution 96.3 59.2 6 33.5 WIMPs 10 19.9 11.5 6.5 3.9 2.3

[GeV cm 1.3 0.6 0.3 0.2 0.1 0.1 velocity dispersion profile pulsars,

dF/dE correlations 2 E 7 10 0 1 2 moving objects 10 10 10 Energy [GeV] search for new particles beyond the t VERITAS Standard Model

+ 24 ⌧ ⌧ 10

1.0 3 25 ] 10 ⌦ 1 Annihilation known populations of gamma- 24 Draco y s

0.8 3 ray emitters vs dark matter 25 , dJ/d 10 23

⌦ unassoc (178) bin (1) [cm ? 0.6 2 /d fsrq (100) hmb (3) i 10 22

v Decay bll (36) nov (1)

h 21 decay 26 0.4 bcu (22) snr (13)

dJ inferred dark matter i 20 agn (2) spp (15) v 26 nlsy1 (2) psr (11) Curvature parameter

x 0.5 101 distribution 0.2 h 10 3 19 statistical framework to 10 0.9 rdg (2) pwn (2) 19 ,J 18 gal (1) glc (6) J 0.8 0.0 1 0 1 2 3 4 5 extract particle physics 1 2 3 10 10 10 decay 17 J Spectral index ↵ at 1 GeV 3 100 0.7 1.0 from astro data Mass [GeV] 16 0.4 ¯ b¯b hh 0.6 bb 15 2 + 2 1 2 GeV 0 + 100.8 10 GeV 10 10 ⌧ ⌧ µ µ 0.5 2 1 10 ✓ [deg] 104 1 10 1 GeV 1 2 3 0.4 10 10 10 0.6 spacetime correlation function 0.3 1000 0 Mass [GeV] 0.3 Search power to search for moving sources 0.5 GeV 0.2 0.4

Angular separation [deg] 100

0.2 1 Mass used for search [GeV] 0.1 containment fraction Detection significance 0.1 J 0.2 10 2 search for an 101 0.0 0 Counts 0 1 2 1 2 3 10 10 10 annihilation signal 10 10 10 Draco 3 True mass [GeV] 0.0 1 Event energy [GeV] 1 2 3 10 10 10 0.0 0.2 0.4 0.6 0.8 1.0 tail due to unresolved pulsars Mass [GeV] ✓ [deg] 0.1

0.01 20 25 30 35 40 Score Tina Kahniashvili

The McWilliams Center For Cosmology

– Cosmology – • Very Early Universe • Cosmic Magne9c Fields – Fundamental • MHD Turbulence Symmetries Tests – Gravita9onal Waves – Cosmic Microwave Background • Accelerated Expansion – Modified Gravity – Dark Energy • Astro-Par9cle Physics – Neutrino Mass Origin

Hy Trac CMB Asst Prof 8307 Wean Hall First Stars & Galaxies [email protected] Galaxy Clusters Group Minghan Chen, Paul La Plante, Michelle Ntampaka, Jeff Patrick, Layne Price

Interests Structure formation & evolution, large-scale structure, dark matter halos, Meshing Meshfree galaxies, clusters, cosmic reionization

Tools Cosmological simulations, N-body, hydro, radiative transfer

Ether (finite-volume particle method) Hyper (fast hydro-particle-mesh) Michelle'Ntampaka'

Research:' Outreach:' • Graduate'student'working'in' • Early'Childhood'Astronomy' Hy'Trac’s'group' • Research:''Galaxy'Cluster' Dynamics'with'ML'and'Stats' Paul La Plante • Graduate student, soon-to-be postdoc

• Works with Prof. Hy Trac

• Simulations of helium reionization

properties, IGM thermal history, Lyman-alpha forest

• Efficient, scalable algorithms Helium reionization in action for cosmological simulations z =2.9820 and analysis 1.0 HI 0.8 HeII 0.6 Flux 0.4 0.2 0.0 Peta-scale 0 5000 10000 15000 20000 computation Lyman-alpha forest used to v [km/s] measure large-scale structure Diane Turnshek, Special Faculty, Physics, CMU • Teaches Astronomy and manages classroom demos • Teacher Advisory Panelist at Carnegie Science Center • IDA Dark Sky Defender • Chair of IAU Technical Working Group against light pollution Layne Price Cosmo/Stats/ML

Early universe Machine Bayesian theory learning modelling

Jeff$Peterson:$Radio$Astronomy$With$ 20007receiver$Telescopes$ • Building$three$radically$new$telescopes$in$Canada$(CHIME),$ South$Africa$(HIRAX)$and$China$(Tianlai).$$ • Primary$Goals:$$ – Map$LSS$via$217cm$intensity$field—BAO$dark$energy$test$ – Find$and$localize$10$Fast$Radio$Burst$per$day$ Thanks! Zhonghao Luo (Roy)

● Advisor: Jeff Peterson ● 4th year graduate student ● Lyman Alpha Intensity Mapping using a small aperture telescope with grism spectrotomography Hsiu-Hsien Lin

• 4th year physics graduate student Advisor: Jeffrey Peterson

• Search Fast Radio Bursts (FRBs) and Pulsars by using Green Bank Telescope and incoming telescopes.

• Discover FRB110523 Masui, K., Lin, H.-H., Sievers, J., et al. 2015, Nature, 528, 523 Name : Aklant Kumar Bhowmick Advisor: Tiziana Di Matteo

• Third year graduate student at CMU • Worked in the field of interfacial instabilities • Interested in Smooth Particle Hydrodynamic simulations

Stanic et al. 2012 http://web.phys.cmu.edu/~tiziana/D6bh/article/compare2.gif Carolina Núñez Research Topics:

● Photometrically selected massive galaxy catalog

● SZ effect Research Assistant, “Pre-Doc” ● Group: Prof. Shirley Ho Photo-z Ross O’Connell

McWilliams Center

LSS, BAO, etc.

Current interests:! Covariance matrix estimation! ! Tomographic analysis! (for eBOSS, DESI, etc.)! ! https://github.com/rcoconnell/Rascal

Rupert Croft

weak gravitational lensing of the lyman-alpha forest

relativistic distortions of cosmology galaxies and large-scale structure video games McWilliams Center for Cosmology

McWilliams Center for Cosmology

McWilliams Center for Cosmology Fitting Baryon Acoustic Oscillations in

SIDDHARTH SATPATHY, PROF. SHIRLEY HO, PROF. RUPERT CROFT McWilliams Center for Cosmology

Work in progress.

r( ) r( ) r( ) Gravitational redshifts in galaxy clusters / MaNGA BCGs

__author__ = ‘Hongyu Zhu’

__status__ = ‘4th year graduate student’

__advisor__ = ‘Prof. Rupert Croft’ RECENT ASTROPARTICLE THEORY PROJECTS: Leonard Kisslinger and Collaborators: 1) Review of QCD, Quark-Gluon Plasma, Heavy Quark Hybrids, and Heavy Quark State Production in p-p and A-A Collisions, Leonard S. Kisslinger and Debasish Das, Int.J.Mod.Phys.A 31, 1630010 (2016) This was a review of the Quantum Chromodynamics Cosmological Phase Transitions, the Quark-Gluon Plasma, the production of heavy quark states via p-p collisions and RHIC (Relativistic Heavy Ion Colli- sions) using the mixed hybrid theory for the Ψ(2S), Υ(3S) states; and the possible detection of the Quark-Gluon Plasma via heavy quarkproduction using RHIC.

2) Polarized Gravitational Waves from Cosmological Phase Transitions, L.S.K. and Tina Kahniashvili, Phys.Rev.D 92, 043006 (2015) We estimated the degree of circular polarization for the gravitational waves generated during the electroweak and QCD phase transitions (EWPT and QCDPT) from the kinetic and magnetic helicity generated by bubble collisions during those cosmological phase transitions. 3) STERILE NEUTRINOS: neutrinos in addition to the three standard model active neutrinos. One Sterile Neutrino: Experimental parameters were used to estimate the effect on muon neutrinos converting (oscillating) to electron netrinos: L.S.K., Int.J.Theor.Phys. 54,2141(2015); Two Sterile Neutrinos: L.S.K., Int.J.Theor.Phys. to be published (2016) Review of Neutrino Oscillations With Sterile and Active Neutrinos: L.S.K., Int. J. Mod. Phys. A, Vol 31, 1630037 (2016) 4) Dark Mass Creation During EWPT via Dark Energy Interaction, L.S.K. and S. Casper, Modern Physics Letters A Vol 29, 1450055(2014). Since all standard particles received their mass during the early uni- verse cosmological Electroweak Phase Transition (EWPT), weaddedDark Matter Dark Energy terms with a Dark Energy field interacting with a Dark Matter field, called the Qunitessence fiels, to the EW Lagrangian previously used to calculate the magnetic field created during the EWPT. From this model we calculated the mass of Dark Matter Particles and es- timated the Dark Matter masses to be in the range of a few GeV to 140 GeV, which is consistent with recent experiments.

1 OFFICIAL UNIVERSITY SEAL

The official seal of Carnegie Mellon University from 1967 was originally reserved for only official documents, including diplomas, presidential and trustee minutes or other legal, academic or official university documentation—or on the highest awards or certificates.

The seal can now also be used for FORMAL occasions and formal products, including items for commencement, specific gift items in brass, silver or pewter, appropriate clothing (sports coats), stationery and other items. DO NOT use the official seal in combination with the wordmark. If an item ˆÃʈ˜ÊµÕiÃ̈œ˜]Ê«i>ÃiÊVœ˜Ì>VÌÊ >ÀŽï˜}Ê œ““Õ˜ˆV>̈œ˜ÃÊ>ÌÊ[email protected]. Please contact The Communications Design and Photography Group at commdesign-photog@andrew. cmu.edu to obtain the official seal graphic.

The Official Seal: 4-color, 1-color

OFFICIAL UNIVERSITY SEAL

The official seal of Carnegie Mellon University from 1967 was originally reserved for only official documents, including diplomas, presidential and trustee minutes or other legal, academic or official university documentation—or on the highest awards or certificates.

The seal can now also be used for FORMAL occasions and formal products, including items for commencement, specific gift items in brass, silver or pewter, appropriate clothing (sports coats), Detecting Galaxy Mergers at High Redshiftstationery and other items. DO NOT use the official seal in combination with the wordmark. If an item ˆÃʈ˜ÊµÕiÃ̈œ˜]Ê«i>ÃiÊVœ˜Ì>VÌÊ >ÀŽï˜}Ê œ““Õ˜ˆV>̈œ˜ÃÊ>ÌÊ[email protected]. Please contact The Communications Design and Photography Group at commdesign-photog@andrew. cmu.edu to obtain the official seal graphic.

The Official Seal: 4-color, 1-color

4-color version 1-color version greater than 2 inches P. Freeman, R. Izbicki, A. Lee, C. Schafer, D. Slep!ev, and J. Newman (Pitt) If the one color seal is used two inches or greater in diameter, Carnegie Mellon University appears in Department of a bolder font. International Computational Astrostatistics -- www.incagroup.org Peter Freeman Detecting Galaxy Mergers at High Redshift MNRASStatistics000, 1–9 (2016)Domain: PreprintAstrostatistics 10 August 2016 Compiled using MNRAS LATEX style file v3.0

4-color version 1-color version greater than 2 inches P. Freeman, R. Izbicki, A. Lee, C. Schafer, D. Slep!ev, and J. Newman (Pitt) If the one color seal is used two inches or greater in diameter,A Carnegie unifiedMellon University appears in framework for constructing, tuning and assessing Introduction Data Department ofAnalysisa bolder font. International Computational Astrostatistics -- www.incagroup.org Statistics 2 P.photometric E. Freeman et al. redshift density estimates in a selection bias setting 18 In our initial analysis,R. we IZBICKI populate ET AL. a p-dimensional space Mergers play an important role in the development of biased estimation1-color version less of than parameters 2 inches in downstream cosmological anal- If the one color seal (shown here in pms 187) is used at a massive galaxies at redshifts z ~ 2, transforming star- of covariates (e.g., G, andyses new (e.g.diameterWittman measures less than two2009 inches, Carnegie); forsuch Mellon instance, University1 ?as maxRatioMandelbaumWhite version et2 reversed; al. out(2008 of a background) 1 appearsP. in E.a lighter font. Freeman, R. IzbickiWhen reversing theand seal out of a A.color or black B. Lee SeriesCS demonstrateComb1 CS that the useKNN of theCS conditional densitybackground, estimate do not use thef ( 1-colorz x )(black) version; see below) with our trainingOne-color data,Department version canthen be printed of Statistics, inuse black, PMS the 187-red, Carnegie lasso Mellon always University, use the reversed5000 (white)| version. Forbes Avenue, Pittsburgh, PA 15213, USA forming disks into Introductionquenched spheroidal systems. Data Analysis Metallic PMS 877-silver and 873-gold. β(x)~3.38 (oftenβ denoted(x)~4.502Departmentp(z) in of the Statistics, astronomicalβ(x)~0.00 Federal literature University and of São often Carlos, called Brazil estimator (see, e.g., Hastie, Tibshirani,Ideal for embossing and foil stamping. and Friedman 2009) D Automated detection of merging systems in this redshift the probability–1– density estimate, or PDF) reduces systematic error in In our initial analysis,to we select populate a sparse a p-dimensional set ofgalaxy-galaxy most space important weak lensing covariates: analyses. (Here, x can represent mag- regimeMergers is thus play critical an important for testing role in theoriesthe development of hierarchical of 1-color version less than 2 inches If the one color seal (shown here in pms 187) is used at a of covariates (e.g., G, and newdiameter measures less than two inches, Carnegie such Mellon University as maxRationitudesWhite and/orversion reversed; colours out of a background and/or other ancillary information measured massive galaxies at redshifts z ~ 2, transforming star- n CARNEGIE MELLON UNIVERSITY MERCHANDISE BRAND GUIDELINESp | 04/12/ 2010! 3 galaxy formation. appears in a lighter font. When reversing the seal out of a color or black for abackground, galaxy.) do not use Several the 1-color (black) other version; works have touted the use of f (z x) forming disks into quenched spheroidal systems. see below) with our trainingOne-color data, version canthen be printed inuse black, PMS the 187-red,T lasso2 always useAccepted the reversed (white) XXX.version. Received YYY; in original form ZZZ minMetallic PMS 877-silver and( 873-gold.Yi + X ) + ⇥ 1 , where 1 = j | Ideal for embossing and foil stamping. i as well, often as a step towards better estimates of ensemble red- Automated detection of merging systems in this redshift estimator (see, e.g., Hastie, Tibshirani, 0.0 0.5 and Friedman1.0 0.0 2009)||0.5 || 1.0 0.0 ||0.5|| 1.0 | | D –1– i=1 shift distributions! (usually denoted N (z))j in=1 tomographic studies At lowregime redshifts is thus (criticalz ~ 0.2), for mergerstesting theories are efficiently of hierarchical detected to select a sparse set of most importantβ(x)~2.88 covariates: β(x)~0.75 β(x)~0.00 X (e.g. Cunha et al. 2009, Ball & Brunner 2010X, Sheldon et al. 2012, via, e.g.,galaxy the formation. comparison of two summary statistics of a n Here, CARNEGIEtheDensity MELLON response UNIVERSITY MERCHANDISE Y BRAND is GUIDELINES thep |proportion 04/12/ 2010! of votes 3 for merger/ABSTRACT T 2 Carrasco Kind & Brunner 2013, Carrasco KindPhotometric & Brunner redshift2014, estimation is an indispensable tool of precision cosmology. One problem min (Yi + Xi interaction) + ⇥ 1 , (M/I).where Given1 =Rau # et andj al. 2015 $, ,weBonnett generate2015, De predicted Vicente, Sánchez & Sevilla- Approximate galaxy image: the Gini coefficient G and the moment of || || ! || || | | that plagues the use of this tool in the era of large-scale sky surveys is that the bright galaxies At low redshifts (z ~ 0.2), mergers are efficiently detected i=1 Noarbej=1 2016),p and standard methods such as the aforementioned light statistic M20 (see below; Abraham et al. 2003, Lotz et X responses 0 < Y’ < 1 andX find the optimal threshold forthat are selected for spectroscopic observation do not have properties that match those of (far via, e.g., the comparison of two summary statistics of a Here, the response Y is the proportion of votes forBPZ, merger/ EAZY,= and ANNz provide f (z x) as an available output. 15 17 19 21 −1 0 1 2 3 0.0 1.0 2.0 −0.2 0.2 0.6 1.0 −0.5 0.0 0.5 1.0 identifying M/I’s: 1 j | more numerous) dimmer galaxies;r magnitude thus, ill-designedu−g empiricalg−r methodsr−i that producei−z accurate Bayesian al. 2004).galaxy However, image: the Ginisimulations coefficient performed G and the moment by Lotz of et al. interaction (M/I). Given # and 0.0$, we generate0.5 1.0 predicted0.0|| || 0.5 |1.0 | 0.0 0.5 1.0 p j=1 D and precise redshift estimates for the former generally will not produce good estimates for the indicatelight that statistic the M estimators20 (see below; of Abraham G and Met 20al. become 2003, Lotz et responses 0 < Y’ < 1 and find the optimal threshold for z X latter. In this paper, we provide a statistically rigorous framework for generating conditional identifying M/I’s: 1 = j SeriesCS CombCS KNNCS increasinglyal. 2004). biasedHowever, in simulations the low S/N performed and low by Lotz resolution et al. || || | | Regarding point (2): it is a well-establisheddensity truism that estimates in large- (i.e. photometricFigure 1. Estimated redshift distributions PDFs) ofthatr-band takesmodel intomagnitudes account selectionand ugriz bias and Computation j=1 regime,indicate affecting that the merger estimators detection of G and efficiencyM20 become at high z. X scale surveys there is selection bias, wherein rarethe and covariate bright galaxies shift that thiscolours bias for induces. labeled (i.e. We spectroscopic; base our approach red dashed on lines) the and assumption unlabeled that the increasingly biased in the low S/N and low resolution Conditionalare preferentially selectedDensity for spectroscopic observation.probability This that bias astronomers(i.e. photometric; label a galaxy blue solid (i.e. lines) determine galaxies, its primarily spectroscopic observed by redshift) the Sloan depends Digital Sky Survey (see Section 7). These distributions indicate the bias for maxRatioinduces a covariate–1– = max shift,R sincei the properties ofonly theseon bright its galaxiesmeasured (photometric and perhaps other) properties x and not on its true redshift. regime, affecting merger detection efficiency at high z. i s selecting brighter galaxies for spectroscopic follow-up observations, which We identify2 maxRatio as the mostWith this assumption, we can explicitly write down risk functions that allow us to both tune In this poster, we show that the G and M20 statistics alone maxRatio–1– = max Ri do not match those of more numerous dimmer galaxies (see e.g. Fig- induces a covariate shift that is here manifested as mismatches in the distri- Densityi s ure 1). Thisimportant shift aects of theouraccuracy currentRatio andcovariates. precisionand compare of empirical methods for estimating importance weights (i.e. the ratio of densities of unlabeled do notIn allowthis poster, one weto showdetect that mergers the G and (and M20 statisticsinteractions) alone in the nWe identify2 maxRatio as the most + p butions of colours between labeled and unlabeled data. and labeled galaxies for dierent values of x) and conditional densities. We also provide a do not allow one to detect mergers (and interactions) in the n important0.0 of0.5 our currentT 1.0 covariates.2photo-z0.0 estimates.0.5p 1.0 One0.0 can mitigate0.5 covariate1.0 shift by estimating The Astrophysical Journal Supplement Series,213:9(29pp),2014July Drake et al. GOODS-S ERS2 field (Windhorst et al. 2011) that were min (Yi + X ) importance+ ⇥ weights1 ,(xwhere) = fU (x)/ f L(x),1 the=method ratio of for thecombining densityj multiple conditional density estimates for the same galaxy into a single GOODS-S ERS2 field (Windhorst et al. 2011) that were T 2 Estimationi Estimation 6 min (Y + X ) + ⇥ Density , where = visually identified by members of the Cosmic Assembly In the H band (1.6 "m), the GOODS-S ERS2 field containsi i 1 of1 galaxies|| without|| j! redshift labels to|| those|| observedestimate spectroscopi- with| better| properties.2 PROBLEM We apply STATEMENT: our risk functions SELECTION to an analysis BIAS of 10 galaxies, visually identified by members of the Cosmic Assembly In the H band (1.6 "m), the GOODS-S ERS2 field contains i||=1|| ! || || | | mostlyj=1 observed by SDSS, and demonstrate through multiple diagnostic tests that⇡ our method Near-IR Deep Extragalactic Legacy Survey (CANDELS) Source 118 i=1 X Source 118cally. Forj=1 instance, Lima et al. (2008) attempt to directlyX estimate The data in a conventional photometric redshift estimation problem Near-IR Deep Extragalactic Legacy Survey (CANDELS) 6,178 detected6,178 detectedsources; for sources; 1,653 of these, for 1,653 we have of boththese, we haveX both N (z) inX a covariate shift setting with a k-nearest-neighbor-basedachieves good conditional density estimates for thed unlabeled galaxies. consist of covariates x R (photometric colours and/or magni- team team(Grogin (Grogin et al.et al. 2011). 2011). WeWe areare currently currently developing developing estimatesestimates of G and M of20 Gas andwell asM visual as well identifications as visual identifications estimator of the importance weights, an estimator since utilized by 2 20 Evolution of Galaxy MorphologyKey words: galaxies: distancestudes, etc.) and and redshifts redshifts –z. galaxies: We have access fundamental to two data parameters samples: an – galaxies: new statisticsnew statistics for for merger merger detectiondetection in in the the high- high-z regime;z regime; (disk, spheroid, merger, interaction, etc.). Of these, 236 are 1.0 Cunha et al. (2009), Sheldon et al. (2012), and Sánchez et al. (2014). xU,...,xU

40 independent and identically distributed (i.i.d.) sample (disk, spheroid, merger, interaction, etc.). Of these, 236 are 0.0p 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 statistics – methods: data analysis – methods: statistical 1 nU here, we specifically discuss a new statistic based on level identified as mergers, 70 as interacting, and 46 received maxRatioreal Rau et al. (2015p), who propose a weighted kernel density estima- consisting of photometric data without associated labels (i.e. red- here, we specifically discuss a new statistic based on level identified! as mergers, 70 as! interacting, and 46 received maxRatioz !! !! tor for f (z x),oer two other methods for computing the weights L L L L ! ! sets, which we dub maxRatio. ! ! 0.8 ! ! ! ! x , ,..., x , mixed votes (both merging and interacting). 1 = j | shifts), and an i.i.d. labeled sample ( z ) ( nL znL ) con- ! !! !! 75 1 1 ! ! sets, which we dub maxRatio. 30 Define a sequence s of levels relative to an object’s(quantile regression forest and ordinal classification PDF). All these mixed votes (both! ! merging and interacting).! ! || || | | 1 = j structed from follow-up spectroscopic studies. (For computational ! ! ! j=1 ! ! ! ! ! maximum intensity.Fig 7:ForDefineTop each: Estimated levelXa sequence i, generate densities s oflevel||weight andlevels|| sets, estimated estimators relative feature| importance to| parameters an object’s weights that one for must 6 tune galax- for proper eciency, in our analyses these datasets are samples taken from ! ! ! ! 0.6 ● 0.7 0.7 ●●● 1 j=1 ●● Cluster1 ! ! ! ! performance.1 INTRODUCTION One would generally tune estimators● by minimizing cal methods such as ANNz (Collister & Lahav 2004).DataThe former ● ! ! ! ! ! ●● ● ! ! then compute theies ratio inmaximum theRi of spectroscopic the area intensity. of the sample. second-largest For Vertical each lines level indicate i, generate the true (spectroscopicallylevel● sets,● larger pools of available labeled and unlabeled data.) Our goal is to ! ! ! ! X Cluster2 ●● ●● ●●● 20 ●● ● ! !!!! !!! ! ● ● ! ! !! ! ! !! 0.1 ● ●● ● 0.7 0.2 0.7 an estimate of risk using a validation dataset,● but the authors listed utilize sets of galaxy SED templates that are redshifted until a best ! !!! ! ! ! ! ! ! ! ●●● ! ! !!! ! !! ! !!! ! ! ! ! !! ! ! !!! ! ! level set to the largest.confirmed) maxRatio redshift. is Bottom: Examples of estimated densities for 6 galaxies● ● in the construct a photo-z conditional density estimator, f (z x), that per- ! !! ! !! !!!!! !!! !! ! !!! !!! ●● 0.6 0.6 Photometric redshift (or photo-z) estimation● is an indispensable tool ! ! !! ! !! !!!!!! ! ! ! ! !! !!!!!! ! ! then0.4 compute the ratio R of the area of the second-largest● ● ! !!! ! ! !! !!!!!!!!!!!! ! !! ! !!! !!!!!!!! ! i ●● ● | ! !!!! !! ! ! !! !!!!!!!!!!!!!!! ! ! !! ! !! ! ! !! !!!!!!!!!!! ! ! above skirt the issue of tuning by setting the● ● number of nearest match with a galaxy’s observed photometry is found, whereas the ! ! ! ! !!!! ! !!!!!!!!!!!!!!!!! ! ! !!! ! ! !! 0.1!!!!!!!!!!!! !!!! ! !!! 50 ● forms well when applied to the unlabeled data (where “well" can be G ! !! !!!!!!!! ! !!!!!!!!!!!!!!!!!!!!!! ! ! !! ! !! !!! !!!! ! ! !!!!!!!!!!!!!!!!! ! ! !! photometric sample. of precision cosmology. The planners●● of current and future large- ! ! ! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!! ! ! ! ! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!! ! ● ● ! ! !!! !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!! ! !! ! ! !! !! !!!!! !!!!!!!!!!!!!!!!!!!! !! ! ! ! ! D Matching ● !!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !!! ! !! ! !!! ! !! !! !! !!!!!!!!!!!!!!!!!!!!!!!!! ! ! !! ! ! !!! ! level set to the largest.neighbors maxRatio a priori, is or, in the case of Rau et al.●●● (2015), by utilizing a latter utilize spectroscopically observed galaxies to train machine !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !!!!!!!! !!!!! !!!!!!!!!!!!!!!!!!!!!!! !

! !! ! !!10 !!!!! !!! !! ! !!! !!! ●● 0.6 0.6 ● defined by its performance with respect to a number of metrics; see !!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ● 0.5 0.5 maxRatio = max R Cluster3 ● ! ! !! ! !! !!!!!! ! ! ! !! !!!!!! ! i ●● ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!! scale photometric surveys such as the Dark Energy Survey (Flaugher !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !!! ! ! !! !!!!!!!!!!!! !!!!! !! !!! !!!!!!!!!!!!!!!!!!!!!!! ! !! ! !!! !!!!!!!! ● ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!! !! ! ! !! !!!!!!!!!!!!!! ! !!!! !!!!!!!!!!!!!!!!!!!!!! ! !! ! !! ! ! !! !!!!!!!!!!! 0.2 i s plug-in bandwidth estimate via Scott’s rule (see their equation 24). learning methods to predict the redshifts of those galaxies that are ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! ! ! !!!! ! !!!!!!!!!!!!!!!!! ! ! ! !!!!!!!! ! ! !!!!!!!!!!!!!! ! ! ! !! !!!!!!!!!!!!!!!! ! ● Cluster4 Aligned embeddings G ! ! !! ! ! !!!!!!!!!!!!!!!!!!!! ! !! !!!!!!!! ! !!!!!!!!!!!!!!!!!!!!!! ! !! ! !! !!!!!!!!!!!!!!! ! !! !!! !!!! ! ! !!!!!!!!!!!!!!!!! ● e.g. §6 for two examples). !!!! !!!! !!!!!!!!!!!!!!!!!!!!!!! ! ! !! !!! !!!!!!!!!!!!!!!!!!!!!! ! 2 2005) and the Large Synoptic Survey● ● Telescope (IveziÊ et al. 2008), !!!!! !! !!!! !!!!!!!!!!!!!! ! ! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!! ! !!!! !!! !!!!!!!! ! ! ! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!! ! ●● ● ! !! !! !! !!!!!! ! !!! !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !! ! !! !!!!! ! ! !! !! !!!!! !!!!!!!!!!!!!!!!!!!! !! ● only observed photometrically. ! !! ! !!!!! ! !! !!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !! !! !! ! !! !! !! !! !!!!!!!!!!!!!!!!!!!!!!!!! ● !!! ! !!!! ! !! !!! ! ! ! ! ● ● ● x 0.4 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!0.4 !!!!!!! ! ! !!!!!!!! !!!!! !!!!!!!!!!!!!!!!!!!!!!! ! ●● An issue that arises when constructing f (z ) via empirical !! !!! !!! ! !! ! ! !!! ! As in Section ??, one can further improve on the estimators●●● of f(z x)by !!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!! ! which combined will observe over● one billion galaxies, require ac- 0.5 0.5 0 maxRatio = max R ● ● i ● ●●● ● ● ! ! ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! ! !!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!! ●●● ●● ● ● | ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !!!!! !! !!! !!!!!!!!!!!!!!!!!!!!!!! ! 0.0 ● ●● ● ● ● Less well established within the field of photo-z6 estimation, ! ! ! ! ! ●● ● ● | ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!! !!!!!!!!!!!!!!!!!!!!!! i s ●● ●● ● ● techniques is that of selection bias. A standard assumption in statis- ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !!!!!!!! ! ! !!!!!!!!!!!!!! curate and precise redshift estimates● ●● in order to fully leverage the ! ! !! ! ! !!!!!!!!!!!!!!!!!!!! ! !! ! !! !!!!!!!!!!!!!!! selecting a subset of the 10 covariates from25 the model and cmodel● ● magnitude● D !!!! !!!! !!!!!!!!!!!!!!!!!!!!!!! ! ! !! !!! !!!!!!!!!!!!!!!!!!!!!! ! 2 Cluster5 ● ● however, are methods that (1) produce conditional density estimates !!!!! !! !!!! !!!!!!!!!!!!! !!!! ! !!!! !!! !!!!!!!! ● ● ● ● tics and machine learning is that labeled and unlabeled data are ! !! !! !! !!!!! ! !! ! !! !!!!! In thisconstraining paper, we power describe of cosmological a statistically● ●● probes rigorous● such method as baryon for acoustic 0.0 −0.5 −1.0 −1.5 −2.0 −2.5! !! !−3.0!!!!! ! !! 0.00 −0.5 −1.010−1.5 −2.0 −2.520−3.0 ! !! 30!! !! ! !! 40 0.0 0.2 0.4 0.6 0.8 1.0 ● ● 0.5 !!! ! !!!! ! !! !!! ! ! ! ! ●● ● ● 6 0.4 ! 0.4 systems. Table 2 lists the selected covariates for the combined estimator● from (or error estimates) of individual galaxy redshifts and at the same !! !!! !!! ! !! ! ! !!! ! ● ●● 5 S S ! ● ● ● ●● sampled from similar distributions, which we denote5 L and U generatingoscillations conditional and density weak estimates gravitational●●● ● f●(z x lensing.) in a selection● Numerous bias estimators P 7 P where I is the galaxy image smoothed by a boxcar of width 0.25 rp, and B is the average ! ! ! ! ●● ● ● Cluster17 4 7 ! ! ●● ● Cluster18 ! ! ! ! ! ! ● ● | ● time (2) properly take into account4 the discrepancy between the ! ! a forward stepwise model search initialized by an estimate●● of●● the marginal● respectively. However, as Figure 1 demonstrates, these two distri- Second coordinate Cluster9 ● ● ● M20 setting. Incurrently §2, we define exist thatboth achieve the problem●● “good"● and● ● point our●● notation.●estimates In of● § photo-z3- red- smoothness of the background. To decrease the eect that a smoothed central bulge has on S, all ! ● ● ●● ●● ● ● ● ● ● 3 8 ●●● ●● ●● ●D ● ● Cluster19 populations of spectroscopically observed galaxies (roughly8 closer 0.7 ● ● ●● ● ● distribution f(z). For the SDSS data, the loss (9) of the combined● ●● ●● estimator● ● butions can dier greatly for sky surveys that mix spectroscopy §4 we showshifts that at if low we redshifts assume that (z the0.5), probability● where “good" that a meansgalaxy that is photo-z 0.0 3 Examples of merging (left) and interacting (right) galaxies observed . ●● ● ● ● ● ●● ● pixels within 0.25 rp of the center of the image are excluded in the calculation of S. From L04: Cluster8 ●● ●● ●● ● ● ● ●● ●● 0.0 −0.5 −1.0 −1.5 −2.0 −2.5 −3.0 0.0 −0.5 −1.0 −1.5 −2.0 −2.5 −3.0 ●●●● ● ● ●● ● ● ●● ●● and brighter) and those observed via photometry only (farther and ! ● ● ● ● ●●● ●●●● ●●● ●● ● ●●●●●●●●●●● ●● ● Cluster20and photometry; brighter galaxies2 are more likely to be selected for at low redshifts (z " 0.01) by the Hubble Space Telescope. ! 0 ●●●●● ●●● ●● ●●●●● ●●●●●●●●●●●●●●●●●●●●● ●● ●● ●●●● S For these data,S mergers (red), interactions ! labeled dependsand spectroscopic only on its photometry (or spec-z)●●●●●●●● ● ● ●●estimates and●●●●●●●●●● not●●●●●● ●●● on●●●●● for●●●● its●●●●●●● the●●● true●●●●●●same●●● redshift,●●●● galaxy largely ! ! ! with variable selection (Comb in Fig. 6b)is ●●2●●●.51● ●●●●●●●●●●●0●●●.●●●09,●●●●●●●●●●●●●●●● which●●●●●●●●●●●●●●●●●●●●●●●●●●●●● is●●● 2 CSVS ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● “because of its strong dependence on resolution, [S] is not applicable to poorly resolved and distant ! ! ! Cluster7 ●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● fainter). ! !! ! ! ! ●●● ● ● ● ●●●●●●●●●●●● ● where I is the galaxy image smoothed by a boxcar of width 0.25 r , and B is the average 0.6 ● ●●●● ● p ●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● 9 ! ! ! ! ! ●●●●●●●●●●●●●●●●●●●●●●● ± ●●●●●●●●●●●●● ●●●●● follow-up spectroscopic observation. To model how selection bias 9 (blue), and mixes (green) are clearly not ! ! ! !! ! ! ! ! ●●●●●●● ● ! which is a valid assumption●●●●●●●●●●● within●●●● the redshift regime probed by ! ! ! !! !! match, with only●●●●●●●●● a●●●●●●●● small percentage of catastrophic outliers. These 1 ! ! ! !! ! !! !!!!! ! ●●●●●●●●●●●● 1 G smaller than 2.36 0.10, the loss achieved by●● the combined estimator galaxies.” M20 ! ! !!!! !!!!!! !! ●●●● Regarding point (1): the error distributions of photo-z estimates disambiguated from otherwise normal ! ! !! !!!! !!!! !!! ! Cluster13 aects learning methods,0.5 one needs to invoke additional assump- smoothness of the background. To decrease the eect that a smoothed central bulge has on S, all ! !! !!!!!!!!!!!!!!!! ! Cluster12 Figure 18. Examples of spotted contact binaries. Left panel: observed light curves of three spotted eclipsing binary systems. Right panel: phased light curves of the − ! ! !!! !!!!!!!!!!!!!!!!! ! ! ± shallow surveysestimators such are as the conventionally SloanCluster11 Digital divided Sky Survey into (SDSS; two classes:York template 2nd diffusion coord. ! !!! ! !!!!!!!!!!!!!! ! maxRatio ! 1 maxRatio ! 0

Source0.7 118 Source 118 ! !! !!! !!!!!!! !! ! ! Cluster14 galaxies (black) using G and M20 alone. 0.5 Cluster16 are often asymmetric and/or multi-modal, so that single-number ! ! !!!!!!!! !!!!!!!! ! (CombCS) based on the 5 model covariates.Cluster6 Figure 8 shows goodness-of- tions about the relationship between and (e.g. Gretton et same systems. Examples of merging (left) and interacting (right) galaxies observed ! !!!!! !!!!!!!! !!!!!!!!! Cluster10 PL PU 10 ! ! !!!!!!!! ! !!! !! ! ! et al. 2000fitters,), we oft-usedcan write examples down risk of functions which include that allow BPZCluster15 one (Benítez to 2000) pixels within 0.25 rp of the center ofGini the image / M are20 excluded in the calculation of S. From L04: !!!!!!! !! !!! ! 10 ! ! ! ! !! ! !!!! summary statistics such as the mean or median are insucient to !! !! ! ! !!! ! ! al. 2008, Moreno-Torres et al. 2012). In this work, we assume that (A colorTransient version of this figure is available Classification in the online journal.) at low redshifts (z " 0.01) by the Hubble Space Telescope. ! ! ! ! ! ! ! ! fit plots for Comb . The Q-Q plot to the left indicates that our final For these data, mergers (red), interactions! ! ! CSVS properly tuneand estimators EAZY (Brammer, of both von(x) Dokkumand f (z x &). CoppiThese risk2008 func-), and empiri- ! !! ! !! ! ! ! ! 0.4 ! ! ! ● “because of its strong dependence on resolution, [S] is not applicable to poorly resolved and distant ! ● | describe their1.0 shapes. Furthermore, the use of such statistics leads to Ergo we might not expect to see large changes in G in CANDELS data. ! !! ! ! ! ●● 0.6 ●● −50 −25 0 25 the50 probability that a galaxy is labeled with a spectroscopic redshift ● − Gini Coecient G. ! ! ! ! ! tions also allow●● us to choose from among competing estimators. In (blue), and mixes (green) are clearly not ! ! ! !! ! ! ! ! ●● Gini coefficient: 40 ! 40 ●●● ! ! ! !! !! ●●● ●●● First coordinate ! ! !! ! !! !!!!! ●●● depends only upon x (in accord with Lima et al. 2008 and Sheldon galaxies.” G ! ! !!!! !!!!!! !! ●●● 6.2.4. Semi-Detached Binaries ●● ! ! !! !!!! !!!! !!! §5 we show●●● how one can combine estimators of conditional density disambiguated from otherwise normal ●●● n ! !! !!!!!!!!!!!!!!!! ! ●●● −2 −1 0 1 2 ●●●● ! ! !!! !!!!!!!!!!!!!!!!! ! ! ●●●● et al. 2012); i.e. 1 0.0 −0.5 −1.0 −1.5 −2.0 −2.5 −3.0 ! !!! ! !!!!!!!!!!!!!! ! ●●● maxRatio 0 ● ! maxRatio 1 ● ! ● ! ! !! !!! !!!!!!! !! ! ! imsart-aoas ver. 2012/08/31to improve●● upon file: the main.tex results achieved date: by August any one 26, estimator 2014 alone. 0.5 ●● galaxies (black) using G and M20 alone. ●●●● ! ! !!!!!!!! !!!!!!!! ! ●●●● playing with summary statistics G = (2i n 1)X , ! !!!!! !!!!!!!! !!!!!!!!! Figure●●● 7: The result of the di↵usion k-means clustering with k =20.Theobservationshave 1st diffusion coord. During our inspection we separated sources with significant i ●●●● ! ! !!!!!!!! ! !!! !! ! ! ●●●● ¯ 30 30 simulated ●●● 1 Moment of Light M20. To calculateGini this property, / M rank20 order the pixels by flux, determine the !!!!!!! !! !!! ! In●● §6 we provide three diagnostic tests that one may use to deter- ● Xn(n 1) ! ●●● See, e.g. Hildebrandt et al. (2010), Dahlen et al. (2013), and Sánchez et al. ! ! ! ! !! ! !!!! ●●● P(S = 1 x, z) = P(S = 1 x) (1) i=1 M !! !! ! ! !!! ! ! ● variations in depth and V-shapedEtc., eclipses from theEtc. contact 20 2e 2e been projected on the first two di↵usion coordinates for visualization. −05 ! ! ! ! ! ! ! −05 ? | | collection of brightest pixels that contribute 20% of the total flux, i.e., ! ! ! mine the absoluteE-mail: performance [email protected] of conditional density estimators. In (2014), who compare and constrast numerous estimators from both classes. ! !! ! !! ! 0.4 binaries. These sources consist of semi-detached and detached ! ●●●●●●●●● Ergo we might not expectwhere the to seeXi are large pixel changes flux values in G sortedin CANDELS into ascending data. −05 ●●●●●●●●● −05 ●●●●● ●●●●● ●●●●● ●●●● where the random variable S equals 1 if a datum is labeled and 0 ●●●●●● ●●●●●●● ●●●●● §7 we demonstrate our methods by applying them to SDSS data. Gini Coecient G. 3e ●●●●● 3e ●●●●●● ●●●● where the X are pixel flux values sorted into ascending order. For a flat surface brightness profile, ●●●● ●●●●● ●●● Figure 9: Aligned embeddings of broad-band Brown spectra (black points) and broad-band simulated i ●●● ●●●●●● ●●●

20 20 ● Gini coefficient: ! ●●●● ●●●●●● ●●●● eclipsing binaries. ●●● ●●● ●● order. Values range from 0 for uniform surface brightness ●●●●● ●●●●● sponding●● alternative hypotheses are ●●●●●●●●● ●●●●●● ●● otherwise. This assumption is valid to depths spanned by surveys i s.t. fi =0.2ftot , 05 ●●●●●●●● 05 ●● Finally, in© §20168 we The summarize Authors our results. In future works, we will Acknowledgements and References − − ●● G 0, while for galaxies with more concentrated light, G 1. The advantage over concentration ● spectra (red points) using Algorithm 1. Note that this approach does not require a training set or direct ● Semi-detached eclipsing binaries, including β Lyrae-type 4e 4e ● n 5e 5e ● such as e.g. SDSS. It implies covariate shift, defined as ⇥ to 1 for all light concentratedi in one pixel.⇥ −05 ●● −05 ●provide methods for variable selection (i.e. the selection of the most ● ● ● ●● is that there is no implicit assumption that1 the brightest pixels are those at the center of the galaxy. 0.0 −0.5 −1.0 −1.5 −2.0 −2.5 −3.0 ● ● ●● distance computations between the original spectra; both data sets are embedded and aligned in diffusionvariables (EBs), consist of pairs of stars where one of the stars ⌅ ● ● ●● ! ●● ● ●● Hi,1A : One specific mass group dominates the other mass group on the ith cluster. ●● ● ●● 1e−05 1e● −05 ●● ● informative●●● colours, etc., to retain from a large set of possible co- 10 10 ● ● G = (2i n 1)X , ●● ●●●●●●●●● ●●● i ● x x , x = x . We thank the CANDELS morphology team for providing visual identifications ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● f L ( ) , fU ( ) f L (z ) fU (z ) (2) has a full Roche lobe and the other does not. This enables the Aperture is vital: including sky pixels will systematically increase G; not including enough LSB ● ● ●●●●●● ●●●●●●●●●●●●● space only. The numbers represent the geometric landmarks obtained by a density insensitive diffusion then compute the ratio of the second-order¯ moment of these pixels to that of all pixels: ●● ● ●●●●●● Moment of Light M . To calculate this property, rank order the pixels by flux, determine the ● ● ●●● ● ●●● 20 Xn(n 1) ●● variates)●●●●●Hi,1B : andOne explore specific methods age group in which dominates we relax the the other galaxy-labeling age group on the ith cluster. | | Moment of Light statistic: ! ●● ●●●● of GOODS-S ERS2 galaxy morphology. This work was supported by NASA ●● ● ●●●●● M ●●● ●●●●● transfer of gas from the Roche-lobe-filling star to the other. galaxy pixels will systematically decrease G. i=1 20 ●●● K-means clustering. ●●● ● ●●● assumption stated above. At first glance, it may seem that covariate shift would not pose a collection of brightest pixels that contribute 20% of the total flux, i.e., grant NNX09AK59G. ●● 2 2 ● H : One specific SFR group dominates the other SFR group on the ith cluster. ●● i,1C f [(x x ) +(y y ) ] 0 0 Semi-detached eclipsing variables can be distinguished by where the Xi are pixel fluxj i jvaluesj c sortedj intoc ascending M20 = log10 . light curves that continuously vary between eclipses due to where theFromXi L04:are pixel“observations flux values of distant sorted galaxies intof are ascending[(x generallyx )2 +( order.ofy lowery )2 For S/N] a and flat resolution surfaceAbraham, than R., brightness van those den Bergh, profile, S., & Nair, P. 2003, ApJ, 588, 218 MNRAS , 1–9 (2016) ⇤j mask j j c j c ⇥ for i =1,...,20. Furthermore, the local two-sample tests at sample points are conducted1 th well−matched spectrum 000330 th well−matched spectrum order. Values rangei froms.t. 0 forfi =0uniform .2ftot , surface brightnessGrogin, N. A., et al. 2011, arXiv:1105.3753 0 10 20 30 40 0 10 20 30 40 ellipsoidal variations of the distorted star. Unlike with contact G 0, whileof local for galaxies. galaxies We with have more defined concentratedG and M20 . light, . . inG an attempt1. The toadvantage minimize oversystematic concentrationAcknowledgements and References to 1 forwhere all ilight denotes concentrated the⇤ set of brightest in one pixels pixel. collectively Hastie, T., Tibshirani, R., & Friedman, J. 2009, The Elements of Statistical based on Algorithm2 with the k-NN regressor, and the results are visualized in the di↵usion2 2 systems, the depth of the eclipses is unequal and more V-shaped. ⇥ oMsets20 di withers noise from andC in resolution. this bright pixelsNevertheless, awayi from any the measurement center are⇥ more is ultimately heavily weighted, limited and by the there S/N These boxplots indicate that M/I’s have values of is that there is no implicitcontaining assumption 20% of the that total⌅ the flux, brightest and (x ,y pixels) is the areestimated those atLearning the center (2nd edition), of the Springer galaxy. map to demonstrate the validity of binomial tests. After that, the information about signif-1 1 However, unlike detached binary systems, it is not possible to is no implicit assumption of circular symmetry. c c Lotz, J., Primack, J., & Madau, P. 2004, AJ, 128, 163 maxRatio typically larger than those for non-M/I’s, of the observations. Also, the PSF and finite pixel size of the images may introduce increasing icant clusters are presented in Appendix B. Further details on the significant clusters can0 0 distinguish the point at which an eclipse begins or ends. In Aperturethen compute is vital: the including ratiogalaxy of centroid. the sky second-order pixels will moment systematically of these increase pixels toG that; not ofWindhorst, including all pixels:We R. A.,thank et enough al. the2011, CANDELS ApJS, LSB 193, 27 morphology team for providing visual identificationsbut that there is overlap between the two populations. uncertaintiesMoment to the morphologies of Light statistic: as the resolution decreases and small-scale structures areof GOODS-S washed ERS2 galaxy morphology. This work was supported by NASA be found in Appendix C. Figure 19,wepresentexamplesoftheseobjects. relative intensity relative intensity relative galaxy pixelsFigures will 5 andsystematically 6 of L04 (above) decrease show thatGM. 20 systematically increases with both decreased S/N 2 2 out.” − − 2 2 grant NNX09AK59G. After reviewing all of the detached candidates, we separated and decreased resolution. The circlesj (ellipticals)i fj[(xj increasexc) +( theyj most,yc which) ] is consistent with our 0 20 40 60 80 120 0 20 40 60 80 120 the sample into semi-detached and detached binaries based on not seeing smallM ( 20-2)= values log of M is the CANDELS data: . Index Index Figure 5 of L04 shows. that G10systematically20 decreases with2 reduced S/N2 in simulated images:Abraham, R., van den Bergh, S., & Nair, P. 2003, ApJ, 588, 218 whether it was possible to determine the start or end time of the From L04: “observations of distant galaxiesj mask f arej[(xj generallyxc) +( ofyj loweryc) S/N]⇥ and resolution than those ⇤ Grogin, N. A., et al. 2011, arXiv:1105.3753 eclipses. Given the similarity of the light curves, this process is of local galaxies. We have defined G and M20 . . . in an attempt to minimize systematic 660 th well−matched spectrum 1000 th well−matched spectrum Hastie, T., Tibshirani, R., & Friedman, J. 2009, The Elements of Statistical uncertain. where i denotes the⇤ set of brightest pixels collectively 15 2 2 oMsets20 di withers noise from andC in resolution. this bright pixelsNevertheless, away from any the measurement center are more is ultimately heavily weighted, limited and by the there S/N These boxplots indicate that M/I’s have values of

Learning (2nd edition), Springer 1 1 6.2.5. Detached Binaries is no implicit assumptioncontaining of 20% circular of the symmetry. total flux, and (xc,yc) is the estimated maxRatio typically larger than those for non-M/I’s, Figure 19. Examples of semi-detached binary light curves. of the observations. Also, the PSF and finite pixel size of the images may introduceLotz, J., Primack, increasing J., & Madau, P. 2004, AJ, 128, 163 0 0 uncertainties to thegalaxy morphologies centroid. as the resolution decreases and small-scale structuresWindhorst, are R. washedA., et al. 2011, ApJS, 193, 27 but that there is overlap between the two populations. Detached eclipsing binaries, often noted as EAs (or Algol relative intensity relative intensity relative 2 2

− − types), consist of two separated stars aligned closely along our out.”Figures 5 and 6 of L04 (above) show that M20 systematically increases with both decreased S/N while the increasingly separated semi-contact and detached 0 20 40 60 80 120 0 20 40 60 80 120 line of sight. Unlike contact binaries, these stars can have very binaries have longer periods. and decreased resolution. The circles (ellipticals) increase the most, which is consistent with our Index Index different temperatures, resulting in systems with a high degree not seeing small ( -2) values of M is the CANDELS data: Figure 5 of L04 shows. that G systematically20 decreases with reduced S/N in simulated images: of variability. EAs can also have highly elliptical orbits. In such 6.3. Compact Eclipsing Binary Systems Figure 10: Examples of 4 matches in Example 2 using Algorithm 1; i.e., matching without referencecases, the primary and secondary eclipses are not evenly spaced. points. The red curves represent the query spectra from simulations and the black curves representIn the Figure 20,wepresentexamplesofEAswitheclipsedepths Compact binaries can often exhibit orbital periods below ranging from 0.4 to 3 mag. Binaries with eclipse depths greater 0.2 days. Such binaries include systems with white dwarfs matched outputs, which are averages of the k = 3 closest Brown spectra in diffusion space. The results are 2 than a magnitude result from objects of significantly different (WDs) and subdwarfs (mainly sdB’s and sdO’s). Post-common- ranked according to increasing ` matching error of spectra; i.e., from the best (no. 1) to the worsttemperatures. (no. We denote these as deeply eclipsing systems. envelope binaries (PCEBs) include WD-dM systems and are 1000) matched pair for a total of 1000 simulated query spectra. [Jaehyeok: Can you color the simulated In Figure 21, we plot the distribution of VCSS w1 colors as related to interacting close binaries such as CVs. These sys- spectra in Figure 9 by how well they match the Brown spectra in observables space?] afunctionofperiodforEAs,EBs,andEWs.Asexpected,the− tems can aid our understanding of the complex common enve- contact systems have the shortest periods for any given color, lope evolutionary phase in binary systems. Subdwarf binaries

12 12

However, Figure 6 of L04 shows that G systematically increases with reduced resolution in simulated images:

However, Figure 6 of L04 shows that G systematically increases with reduced resolution in simulated images: Zongge Liu

• About Me: 2nd year PhD in statistics

• Advisor: Chad Schafer

• Interest: applied statistics in astronomy/ data mining

• Projects:

✦ Astrostats: predicting emission line from galaxy continuum

✦ Data mining: Effective recovery and efficient fusion for aggregated historical data.

✦ Cosmology: CMB weak lensing Exploring the Intergalactic Medium Collin Eubanks, Jessi Cisewski, Rupert Croft, Doug Nychka, and Larry Wasserman

Goal: Produce 3D map of HI density fluctuations in the IGM from Lyman-α forest in BOSS/eBOSS QSO spectra Principle Challenges: Highly nonuniform sampling, computational costs

Density of BOSS DR12 Observations Cross-section of Predicted Map z ~ 2 1.5 1.0 Density 0.5

0.0 2.0 2.5 3.0 3.5 4.0 z Future Work: (Suboptimal) homogeneous map, eBOSS DR13 (and future releases), supercluster catalog, topological analysis, … Interests: Galaxy morphology

• Comparing distributions of galaxy morphologies between two populations (high-mass vs. low-mass, old vs. new and high SFR vs. low SFR)

• Main interest is to know how two populations are locally different in a multivariate space of morphology statistics such as M, I, D, Gini, M20, C and A.

Local significantLocal significant differences differences (1D toy example) Binomial Tests for Mass (7-dim feature space) Result of Binomial Tests for Mass 75 ! Low-mass dominated 0.4 Decision ! High-mass dominated ● ●● ● ●●● ● ●● ● f(x)=g(x) ●●● ●● ●● ●● ● ●●● f(x)>g(x) ● ●● ●

0.3 ● 50 ● ●● ● ● f(x)

● ● ●● statistic ●● ●●● 10 ●● 0.2 ●●● 5 ● ● ●●● ● ● 0 ● 25 ● −5 ● −10 Density ●● ● −15 ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● 0.1 ●● ●● ● ●●● ●● ● ● ● ● Second coordinate ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●●● ●● ●● ● ● ● ● ●●● ● ● ● ● ● ●● ● ●● ● ●● ● ●● ● ●● ●● ● ● ● ●● ●● ● ●● ● ●●●●●●● ●● ● ● ●● ●●● ●●●● ●●●●●●●●● ● ●● 0 ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●●●●● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●● ●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● 0.0 ●●●●●● ●●●●●●●●●●●

−50 −25 0 25 50 −6 −4 −2 0 2 4 6 First coordinate X

1Department of Statistics 2Department of Physics and Astronomy 1 1 1 2 Ilmun Kim , Ann Lee , Peter Freeman and Jeffrey Newman Carnegie Mellon University University of Pittsburgh