Talk at 30 May 2017

New Insights on Galaxy Formation from Comparing Simulations and Observations

Joel Primack Distinguished Professor of Emeritus, UCSC

Brief introduction to modern cosmology, based on ΛCDM: dark energy and dark matter

Cosmic large scale structure simulations and star formation in galaxies

Comparing high-resolution hydrodynamic galaxy simulations with observations

Astronomers used to think that galaxies form as disks, that forming galaxies are pretty smooth, and that galaxies generally grow in radius as they grow in mass — but Hubble Space Telescope data show that all these statements are false, and our simulations may explain why.

We are using these simulations and deep learning to improve understanding of galaxy formation, with support from Google. Our Cosmic Address The Modern Scientific each dot Cosmos is a big galaxy

Sloan Digital Sky Survey Matter and Energy Content of the Universe

Imagine that the entire universe is an ocean of dark energy. On that ocean sail billions of ghostly ships made of dark matter... Matter and Energy Content of the Dark Universe Matter Ships

on a ΛCDM

Dark Double Energy Dark Ocean Imagine that the entire universe is an ocean of dark Theory energy. On that ocean sail billions of ghostly ships made of dark matter... Planck Collaboration: Cosmological parameters

PlanckPlanck Collaboration: Collaboration: The ThePlanckPlanckmissionmission Planck Collaboration: The Planck mission 6000 6000 Angular scale 5000 90 500018 1 0.2 0.1 0.07 Planck Collaboration: The Planck mission

European 6000 ] 4000 2

] 4000 2 Double µ K [ µ K 3000 [ 3000Dark TT Space TT 5000 D

D 2000 2000Theory

Agency 1000 1000

] 4000 2 Cosmic 0 0 PLANCK 600 Variance600 60

µ K 60

[ 3000 300300 30 TT 30 TT 0 0 Satellite D 0 0 D D Fig. 7. Maximum posterior CMB intensity map at 50 resolution derived from the joint baseline-30 analysis of Planck, WMAP, and

-300 -300 408 MHz observations. A small strip of the Galactic plane, 1.6 % of the sky, is filled in by-30 a constrained realization that has the same statistical properties as the rest of the sky. 2000 -600 -60-60 Data -600 Temperature-Temperature 22 1010 3030 500500 10001000 15001500 20002000 25002500 Released 1000The Planck 2015 temperature power spectrum. At multipoles ` 30 we show the maximum likelihood frequency averaged Fig.Fig.Fig. 9. 9. 10.ThePlanckPlanck TT2015power temperature spectrum. power The points spectrum. in the At upper multipoles panel show` 30 the we maximum-likelihood show the maximum estimateslikelihood of frequency the primary averaged CMB temperature spectrum computed from the Plik cross-half-mission likelihood with foreground and other nuisance parameters deter- temperaturespectrum computed spectrum{ as computed described from in the the textPlik forcross-half-mission the best-fit foreground likelihood and nuisance with foreground parameters and of other the Planck nuisance+WP parameters+highL fit deter- listed February 9, mined from the MCMC analysis of the base ⇤CDM cosmology. In the multipole range 2 ` 29, we plot the power spectrum minedin Table from5. The the red MCMC line shows analysis the of best-fit the base base⇤⇤CDMCDM cosmology. spectrum. The In the lower multipole panel shows range the 2 residuals` 29, with we plot respect the powerto the theoretical spectrum 2015 estimatesestimates from the Commander component-separationcomponent-separation algorithm algorithm computed computed over over 94 94 % % of of the the sky. sky. The The best-fit best-fit base base⇤CDM⇤CDM theoretical theoretical spectrummodel. The fitted0 error to barsthe Planck are computedTT+lowP from likelihood the full covariance is plotted in matrix, the upper appropriately panel. Residuals weighted with across respect each to band this (see model Eqs. are36a shownand in spectrum fitted to the Planck TT+lowP likelihood is plotted in the upperFig. 8. Maximum panel. posterior Residuals amplitude Stokes Q ( withleft) and U ( respectright) maps derived to from thisPlanck observations model between are 30 and shown 353 GHz. in 36b), and include2 beam uncertainties10 and50 uncertainties500 in the foregroundThese1000 model mapS have been parameters. highpass-filtered with1500 a cosine-apodized filter between ` = 202000 and 40, and the a 17 % region of the Galactic2500 thethe lower lower panel. The error error bars bars show show 11 uncertainties.uncertainties. From FromPlanckPlanck Collaborationplane Collaboration has been replaced with a XIII constrained XIII( Gaussian2015(2015 realization). ). (Planck Collaboration IX 2015). From Planck Collaboration X ±± Multipole moment,(2015). viewed as work in progress. Nonetheless, we find a high level of 8.2.2. Number of modes consistency in results between the TT and the full TT+TE+EE likelihoods. Furthermore, the cosmological parameters (which One way of assessing the constraining power contained in a par- 100 ticular measurement of CMB anisotropies is to determine the 140 do not depend strongly on ⌧) derived from the TE spectra have comparable errors to the TT-derived parameters, and they are e↵ective number of a`m modes that have been measured. This

The temperature angular power spectrum of the primary CMB from Planck] , showing a precise measurement of seven acoustic peaks,/ that Fig. 19. consistent to within typically 0.5 or better. Polarization-is equivalent to estimatingPolarization 2 times the square of the total S N

Temperature-Polarization 2 are well fit by a simple six-parameter ⇤CDM theoretical model (the model plotted80 is the one labelled [Planck+WPin the power+highL] spectra, a measure in Planck that contains all Collaborationthe available ] 70 2 XVI (2013)). The shaded area around the best-fit curve represents cosmic variance,µ K 16 including the sky cut used. The error bars on individual points 60

also include cosmic variance. The horizontal axis is logarithmic up to ` = 50, 5 and linear beyond. The vertical scale is `(` + 1)Cl/2⇡. The measured [ µ K 0

spectrum shown here is exactly the same as the one shown in Fig. 1 of Planck[10 Collaboration40 XVI (2013), but it has been rebinned to show better TE the low-` ` region. D EE -70 ` 20 Double Dark Theory C 0 Double Dark Theory -140 Angular scale tected by Planck over the entire sky, and which therefore con- 9010 18 1 0.2 0.1 tains both4 Galactic and extragalactic objects. No polarization in- EE `

6000TE ` 0 formation0 is provided for the sources at this time. The PCCS C 5000 D -10 di↵ers-4 from the ERCSC in its extraction philosophy: more e↵ort has been made on the completeness of the catalogue, without re- 30 500 1000 1500 2000 30 500 1000 1500 2000

] 4000 ducing notably the reliability of the detected sources, whereas 2 ` the ERCSC was built in the spirit of` releasing a reliable catalog µ K

[ 3000

Fig. 10. Frequency-averaged TE (left) and EE (right) spectra (withoutsuitable fitting for quick for T– follow-upP leakage). (in The particular theoretical withTE theand short-livedEE spectra Fig.Fig. 10. 11.Frequency-averagedPlanck T E (left) andTEEE(leftspectra) and (right)EE (right computed) spectra as (withoutdescribed fitting in the for text.T– TheP leakage). red lines The show theoretical the polarizationTE and spectraEE spectra from plottedD in the upper panel of each plot are computed from the best-fitHerschel modeltelescope). of Fig. 9. Residuals The greater with amount respect of to data, this theoretical di↵erent selec- model plottedthe2000 base in⇤ theCDM upperPlanck panel+ ofWP each+highL plot aremodel, computedwhich from is fitted the to best-fit the TT model data only of Fig.. 9. Residuals with respect to this theoretical model areare shown shown in the lower panel panel in in each each plot. plot. The The error error bars bars show show 1tion1 errors. processerrors. The The and green thegreen improvementslines lines in inthe the lower lower in thepanels panels calibration show show the and the best-fit map-best-fit temperature-to-polarization leakage model, fitted separately to the±± TE and EE spectra. From Planck Collaboration XIII (2015). temperature-to-polarization1000 leakage model, fitted separately to the makingTE and EE processingspectra. (references)From Planck help Collaboration the PCCS XIII to( improve2015). the performance (in depth and numbers) with respect to the previ- 0 2 10 50 500 1000 1500 2000 ous ERCSC. 4 cosmologicalcosmological informationMultipole if if we we assume assumemoment, that that the the anisotropies anisotropies are are andandThe 96 96 000 sources 000 modes, modes, were respectively. respectively. extracted4 fromFromFrom the this 2013 this perspective perspectivePlanck thefrequency the2015 2015 purelypurely Gaussian Gaussian (and hence hence ignore ignore all all non-Gaussian non-Gaussian informa- informa- mapsPlanckPlanck (Sect.datadata 6), constrain constrain which approximatelyinclude approximately data acquired 55 %55 more % over more modes more modes than than intwo in Fig.tiontion 20. coming comingThe temperature from lensing, angular the the power CIB, CIB, cross-correlations cross-correlationsspectrum of the CMB, with with other esti- other skythethe coverages.2013 2013 release. release. This Of Of implies course course this that this is the not is flux not the thedensities whole whole story, of story, most since since of matedprobes,probes, from etc.). etc.). the SMICA CarryingPlanck out outmap. this this procedure The procedure model for plotted for the the isPlanckPlanck the one20132013 la- somesome pieces pieces of of information information are are more more valuable valuable than than others, others, and and TT power spectrum data provided in Planck Collaboration XV thein sources fact Planck are anis able average to place of three considerably or more tighter di↵erent constraints observa- on belledTT [Planckpower+ spectrumWP+highL] data in provided Planck Collaboration in Planck Collaboration XVI (2013). The XV tionsin fact overPlanck a periodis able of to 15.5 place months. considerably The tighterMexican constraints Hat Wavelet on shaded((20142014 area)) and and aroundPlanck the Collaboration best-fit curve XVI represents XVI((20142014 cosmic),), yields yields variance, the the number number in- particularparticular parameters parameters (e.g., (e.g., reionization reionization optical optical depth depth or certain or certain 24 algorithm (Lopez-Caniego´ et al. 2006) has been selected as the cluding826826 000 000 the sky (which (which cut used. includes The error the the e ebars↵↵ectsects on of ofindividual instrumental instrumental points noise, do noise, not cos- in- cos- mic variance and masking). The 2015 TT data have increased baseline4 method for the production of the PCCS. However, one cludemic cosmic variance variance. and masking). The horizontal The axis 2015 is logarithmicTT data have up to increased` = 50, 4HereHere we we have have used used the the basic basic (and (and conservative) conservative) likelihood; likelihood; more more this value to 1 114 000, with TE and EE adding a further 60 000 additional methods, MTXF (Gonzalez-Nuevo´ et al. 2006) was andthis linear value beyond. to 1 114 The 000, vertical with scaleTE and is `EE(` +adding1)Cl/2 a⇡ further. The binning 60 000 modesmodes are are e↵ eectively↵ectively probed probed by byPlanckPlanckif oneif one includes includes larger larger sky skyfrac- frac- scheme is the same as in Fig. 19. implementedtions.tions. in order to support the validation and characteriza- tion of the PCCS.

8.1.1. Main catalogue The source selection for the PCCS is made on the basis17 of17 Signal-to-Noise Ratio (SNR). However, the properties of the The Planck Catalogue of Compact Sources (PCCS, Planck background in the Planck maps vary substantially depending on Collaboration XXVIII (2013)) is a list of compact sources de- frequency and part of the sky. Up to 217 GHz, the CMB is the

27 Cosmic Horizon (The Big Bang) Cosmic Background Radiation

Cosmic Dark Ages

Bright Galaxies Form Big Galaxies Form

Earth Forms Today

When we look Cosmic out in space we look back Spheres in time… of Time Cosmological Simulations Astronomical observations represent snapshots of moments in time. It is the role of astrophysical theory to produce movies -- both metaphorical and actual -- that link these snapshots together into a coherent physical theory.

Cosmological dark matter simulations show large scale structure and dark matter halo properties, basis for semi-analytic models

Hydrodynamic galaxy formation simulations: evolution of galaxies, formation of galactic spheroids, mock galaxy images and spectra including stellar evolution and dust effects Aquarius Simulation Volker Springel

Milky Way 100,000 light years

Milky Way Dark Matter Halo 1.5 million light years 8

Bolshoi Cosmological Simulation Anatoly Klypin & Joel Primack 8.6x109 particles 1/h kpc resolution Pleiades Supercomputer at NASA Ames Research Center

1 Billion Light Years 100 Million Light Years

1 Billion Light Years How the Halo of the Big Cluster Formed

100 Million Light Years Bolshoi-Planck Cosmological Simulation Merger Tree of a Large Halo

Peter Behroozi & Christoph Lee We theorists make very complicated models of the star formation rate (SFR) in galaxies — but Is Main Sequence SFR Controlled by Halo Mass Accretion? by Aldo Rodríguez-Puebla, Joel Primack, Peter Behroozi, MNRAS 2016

One can show that this must be true on average Our radical SHARC (stellar halo accretion rate co-evolution) hypothesis is that this may be true halo-by-halo for many dark matter halos hosting star-forming galaxies We then put SHARC in the bathtub, by combining the SHARC hypothesis with “bathtub” galaxy models

KEY BACKGROUND INFORMATION - the stellar/halo mass relation - the galaxy main sequence The Astrophysical Journal,795:104(20pp),2014November10 Whitaker et al. Two Key Discoveries About Galaxies The4 Astrophysical Journal,795:104(20pp),2014November10BEHROOZI, WECHSLER & CONROY Whitaker et al. (a) (b) Relationship(a) Between Galaxy Star-forming(b) Galaxies Lie z TheStellar Astrophysical Mass Journal,795:104(20pp),2014November10 and Halo Mass 8on42 a “Main 1 Sequence”0.5Whitaker 0.2 et al. 0 0.1 (a) (b) ] 3 0.1 0.01 / yr Mpc • O

0.001 z=0.1 0.01 z=1.0 z=2.0 Observations

Stellar Mass / Halo z=3.0 Full Star Formation History Best-Fit Model 0.0001 z=4.0 Cosmic SFR [M Time-Independent SFE Time-Independent SFE Mass and Time-Independent SFE 10 11 12 13 14 15 0.001 10 10 10 10 10 10 1 2 4 6 8 10 12 13.8 Halo Mass [M ] Time Since Big Bang [Gyr] O• Figure 1. Star formation rate as a function of stellar mass for star-forming galaxies. Open circles indicate the UV+IR SFRs from a stacking analysis, with a second-order polynomial fit above the mass completeness limits (solid vertical lines). Open squares signify measurements below the mass-completeness limits. Therunningmedians for individuallyFTheIG.3.— stellarLeft detected masspanel: objects to the halo instellar MIPS mass mass 24 µ ratio tom imaging halo at mass multiple with ratio S/N at> multiple3(shownasagray-scaledensityplotinthePanel(a),left)areindicatedwithfilledcirclesinthe redshifts as derived from observations (Behroozi et al. 2012) compared to a model which righthasFigure panelredshifts a time-independent 1. andStar are formationas color-coded derived rate star as byfrom aformation redshift.function observations of The efficiency stellar number mass (SFE). of forcompared star-forming star-forming Error barsto galaxies. galaxies show Open with 1 Scircles/uncertaintiesN > indicate3 detections the (Behroozi UV+IR in the SFRs 24 etµ from al.m imaging 2012). a stacking A and time-independent analysis, those with with S a/ secoN 3(shownasagray-scaledensityplotinthePanel(a),left)areindicatedwithfilledcirclesinthe with signifyα 0. measurements3–0.6fromz 0 below.5 to z the2.5. mass-completeness We show the SDSS curve limits. (gray Th⊙ dottederunningmedians line in Panel right panel and are color-coded by redshift. The number of star-forming galaxies with S/N >are3 detections controlled in the 24byµ mtheir imaging mass, and those the withgalaxy S/N < star3areindicated for individually detected(b))to objects from the1σ best-fit Brinchmannuncertainties. in MIPS model 24et fromal.µ mA (2004 time-independent aimaging star) as formation it is with one of S history/ theN few> Star3(shownasagray-scaledensityplotinthePanel(a),left)areindicatedwithfilledcirclesinthe reconstruction measurements Formation that technique goes= to (Behroozi very low mass, et= al. but 2012) it is= basedas well on as another the time-independent SFR indicator. SFE model. The latter model in the bottom right of each panel. The star formation sequence for star-forming galaxies is curved,formation with a constantrate (SFR) slope of is unity approximately at log(M⋆/M ) < 10 (solid black right panel and are color-codedworksEfficiency surprisingly by redshift. predicts well The up numbera roughly to redshifts of star-formingtime-independent of z 4. However, galaxies awithstellar model S/ N which> 3 has detections a constant in the efficiency 24 µm (with imaging mass and and those time) with also⊙ S reproduces/N < 3areindicated the decline in star (A colorline in version Panel (b) of is this linear), figure whereas is available the slope in the at the online⇠ massive journal.) end flattens with α 0.3–0.6fromproportionalz 0.5 to z 2 .5.to We the show stellar the SDSS mass curve, (gray with dotted the line in Panel in the bottom right of eachformation(b))mass panel. from Brinchmann wellto The halo since star mass formation etz al.2. (2004 relationship) as sequence it is one of for. the (Behroozi, star-forming few measurements galaxies that goes= is to curved, very low mass, with= but a constant it is= based on slope another of SFR unity indicator. at log(M⋆/M ) < 10 (solid black ⇠ proportionality constant increasing with redshift⊙ up line in Panel (b) is linear),(A whereasWechsler, color version the slopeConroy, of this figureat the ApJL is massive available 2013 in end) the flattens online journal.) with α 0.3–0.6fromz 0.5 to z 2.5. We show the SDSS curve (gray dotted line in Panel (b)) from BrinchmannWuyts et al. (2004 et al.) as2007 it is;Williamsetal. one of the few measurements2009; Bundy that et goes= al. to2010 very; low mass,24=toµ butmdetections,totaling9015star-forminggalaxiesovertheabout it is= z based = 2.5. on ( anotherWhitaker SFR et indicator.al. ApJ 2014) Cardamone et al. 2010; Whitaker et al. 2011; Brammer et al. (A color version of this figure is available in the online journal.)z full redshift range. Mass completenessz limits are indicated by 2011Wuyts;Pateletal. et al. 20072012;Williamsetal.); quiescent galaxies2009; have Bundy strong et al. Balmer2010;/ 24verticalµmdetections,totaling9015star-forminggalaxiesoverthe lines. The GOODS-N and GOODS-S fields have deeper Cardamone8 42 et al. 2010; Whitaker 1 et0.5 al. 2011; Brammer 0.2 et 0 al. full redshift8 range.42 Mass completeness 1 limits are0.5 indicated 0.2 by 0 400014 Å breaks, characterized by red rest-frame U–V colors MIPS12 imaging (3σ limit of 10 µJy) and HST/WFC3 JF 125W 10 2011;Pateletal.2012); quiescent galaxies have strong Balmer/ vertical10 lines. The GOODS-N and∼ GOODS-S fields have deeper Wuyts et al. 2007and;Williamsetal. relatively blue rest-frame2009; BundyV–J colors. et al. Following2010; the two-24 µmdetections,totaling9015star-forminggalaxiesovertheand HF 160W imaging (5σ 26.9 mag), whereas the other three 4000 Å breaks, characterized by red rest-frame U–V colors MIPS imaging (3σ limit of ∼10 µJy) and HST/WFC3 JF 125W colorand separations relatively blue defined rest-frame in WhitakerV–J colors. et al. Following (2012a), wethe selecttwo- andfieldsH haveimaging shallower (5σ MIPS∼26.9 imaging mag), whereas (3σ limits the otherof 20 threeµJy) and Cardamone et al. 201013 ; Whitaker et al. 2011; Brammer et al. full redshiftF 160 range.11W Mass completeness limits are∼ indicated by 58,97310 color star-forming separations defined galaxies in Whitaker at 0.5

4000 Å breaks, characterizedabove the mass-completeness by red14 rest-frame limits (TalU– etV al.colors2014). AmongMIPS imaging90%] completeness (3σ limit limits of derived10 µJy) by Tal and etHST al.∼ (2014/WFC3), calculatedJF 125W ] HST v4.0 catalogs. Of these, 39,106 star-forming galaxies are The mass• completeness limits in Figure 1 correspond to the O

• 12 10

O ∼ and relatively bluethe10 rest-frameaboveUVJ-selected the mass-completenessV– star-formingJ colors. Following galaxies limits (Tal with et the al. masses two-2014). above Among theand H90%Fby160 completenessW comparing10imaging object limits (5σ derived detection26.9 by mag), Talin the et al. whereasCANDELS (2014), calculated the/deep other with three a color separationscompleteness definedthe UVJ in-selected Whitaker limits, star-forming 22,253 et al. have ( galaxies2012a S/N), with we> masses select1MIPS24 above theµfieldsm by havere-combined comparing shallower object subset MIPS detection of∼ imaging the in exposures the CANDELS (3σ thatlimits reach/deep of the with20 depthµ a Jy) of and detections11completeness (amongst limits, which 22,253 9,015 have have S S//NN >> 3)1MIPS24 and 35,916µm are re-combinedthe CANDELS9 subset/wide of the fields. exposures Although that reach the mass the depth∼ completeness of 58,973 star-forming galaxies at 0.5 N3)< and1). 35,916The are full thein CANDELS the10 deeper/wide GOODS-N fields. Although and GOODS-S the mass fields completeness∼ will extend to HST v4.0 catalogs. Of these, 39,106 star-forming galaxies are15 The mass completeness limits in Figure 1 correspond to the sampleundetected of star-forming in MIPS 24 galaxiesµm photometry are considered (S/N < in1). theThe stacking full inlower the deeper stellar GOODS-N masses, we and adopt GOODS-S the more fields conservative will extend limits to for above the mass-completenessanalysis.10sample Although of star-forming limits we (Tal have galaxies et not al. areremoved2014 considered). sources Among in the with stacking-1.0 X-ray90% completenesslowerthe shallower stellar8 masses, limitsHST we/WFC3 derivedadopt imaging. the moreby TalObserved conservative et al. (Limit2014 limits), calculatedfor -1.2 Halo Mass [M 10 10 the UVJ-selecteddetections star-forminganalysis. in Although the galaxies following we have analysis,with not masses removed we estimate above sources the the contribution with X-rayby comparingthe shallowerFirst,Stellar Mass [M we objectHST look/WFC3 detection at the imaging. measurements in the CANDELS for individual/deep galaxies. with a detections in the following analysis, we estimate the contribution First, we look at the measurements for individual galaxies. completeness limits,of active 22,253 galactic have nuclei S/N (AGNs)> 1MIPS24 to the medianµm 24 µ-2.5mfluxre-combinedThe running subset median of the of exposures the individual that UV+IR reach measurements the depth-2.6 of 9of active galactic nuclei (AGNs) to the median 24 µmflux The running7 median of the individual UV+IR measurements(SFR*ND) 10

densities in Section 4.2. (SFR*ND) of the SFR are indicated with solid circles when the data are detections (amongst10 whichdensities 9,015 in Section have4.2 S. /N > 3) and 35,916 are the CANDELSof the SFR10 are/wide indicated fields. with Although solid circles the when mass the data completeness are 15 10 complete both in stellar mass and SFR (above the shallower undetected in MIPS 24 µm photometry (S/N < 1). The full -4.0 in thecomplete deeper both GOODS-N in stellar mass and and GOODS-S SFR (above16 fields the shallower will extendlog -4.0 to sample of star-forming8 galaxies3. THE3. THE are STAR STAR considered FORMATION FORMATION in the SEQUENCE SEQUENCE stackinglog datadata 3σ 3MIPSσ6 MIPS 24 µ 24mµ detectionm detection limit). limit).16 We considerWe consider all MIPS all MIPS 10 lower stellarphotometry10 masses, in the we median adopt the for more the individual conservative UV+IR limits SFRs for analysis. AlthoughFigure we have11 2shows not removed 4 the star 6 formation sources 8 with sequence, 10 X-ray 12 log 13.8as a photometry in1 the/ 2 median 4 for the 6 individual 8 UV+IR 10 SFRs 12 13.8 Figure 1 shows the star formation sequence, log ΨΨasthe a shallowermeasurementsmeasurementsHST (filledWFC3 (filled circles), circles), imaging. even even those those galaxies galaxies intrinsically intrinsically M Time Since Big Bang [Gyr] Time Since Big Bang [Gyr] detections in the followingfunctionfunction of analysis, of log log M⋆,infourredshiftsbinsfrom⋆ we,infourredshiftsbinsfrom estimate the contributionzz 00.5to.5toFirst,faintfaint we in in the look the IR. IR. at Only Onlythe 1% measurements 1% of the of star-forming the star-forming for galaxies individual galaxies above above the galaxies. the z z 2.5.2.5. We We use use a a single single SFR SFR indicator, indicator, the UV+IR UV+IR== SFRs SFRs of active galactic nuclei= = (AGNs) to the median 24 µmflux The running2020µJyµJy limit median limit in each in each of redshift the redshift bin individual have bin have 24 µm UV+IR 24 photometryµm photometry measurements with with describeddescribed in in Section Section2.42.4,probingovertwodecadesinstellar,probingovertwodecadesinstellar S/SN/N< <1. 1. densities in Section 4.2FIG..4.—Left panel: Star formation rate as a function of halo massof and the cosmic SFR time, are weighted indicated by the number with density solid of circles dark matter when halos the at that data time. are Contours mass.mass. The The gray gray scale scale represents represents the the density density of points points for for star- star- ToTo leverage leverage the the additional additional decade decade lower lower in stellar in stellar mass mass showforming where galaxies 50 and 90% selected of all in stars Section were formed;2.5 with dashed S/N > line3MIPS showscomplete the median halo both mass in for stellar star formation mass as and a function SFR of (above time. Right thepanel: shallower Star formation forming galaxies selected in Section 2.5 with S/N > 3MIPS thatthat the the CANDELS CANDELSHSTHST/WFC3/WFC3 imaging imaging enables enables us to probe us to probe 3. THE STARrate as FORMATION a function of galaxy SEQUENCE stellar mass and time, weighted by thedata number 3σ densityMIPS of galaxies 24 µm at detection that time. Contours limit). and16 We dashed consider line are as all in top-left MIPS panel; dotted14 line shows current minimum stellar masses reached by observations. 16 14 EssentiallyEssentially identical identical to to the the publicly publicly released released catalogs catalogs available through through 16In the case of the 1.0 the3MIPS star formation effi- ever, as discussed previously, the baryon accretion rate for that the CANDELS HST/WFC312 imaging enables us to probe ciency’s normalization. Using the maximum circular velocity small halos (Mh < 10 M ) can differ from the dark matter 14 (V ) or the velocity dispersion () instead would also lead Essentially identical to thecirc publicly released catalogs available through 16 In the case ofaccretion the 1.0 4; Behroozi & Silk 2015; Finkelstein et al. 2015). distinct halo of mass Mvir hosts a central galaxy of stellar Moreover, Behroozi, Wechsler & Conroy (2013a) showed mass M∗.ThentheGSMFforcentralgalaxiesasafunction 2 that assuming a time-independent ratio of galaxy specific of stellar mass is given by star formation rate (sSFR) to host halo specific mass accre- 2.2 The Galaxy Mass Function ∞ tion rate (sMAR), defined as the star formation efficiency ϵ, φ∗,cen(M∗)= P (M∗|Mvir)φh(Mvir)dMvir. (2) simply explainsWe use the the cosmic GSMF star for formation central galaxies rate since reportedz =4. in ? and Z0 If we assumeobtained a time-independent from the ? galaxy SHMR, group the catalog star formation based on the SDSS efficiency isDR7. the slope This catalog of the SHMR, represents an updated version of ?. Here, φh(Mvir)isthehalomassfunctionandP (M∗|Mvir) is a log-normal distribution assumed to have a scatter of M˙ ∗/M∗ ∂ log M∗ ϵ = = . (3) ˙ σc =0.15 dex independent of halo mass. Such a value is Mvir/M2.3vir Connecting∂ log Mvir Galaxies to Halos supported by the analysis of large group catalogs (Yang, This equation simply relates galaxy SFRs to their host Mo & van den Bosch 2009; Reddick et al. 2013), studies of We model the central GSMF by defining P (M∗|Mvir)asthe halo MARs without requiring knowledge of the underlying the kinematics of satellite galaxies (More et al. 2011), as well probability distribution function that a distinct halo of mass physics. (This is the main difference between the equilibrium as clustering analysis of large samples of galaxies (Shankar Mvir hosts a central galaxy of stellar mass M∗.Thenthe solution we present below and previous “bathtub” models.) et al. 2014; Rodr´ıguez-Puebla et al. 2015). Note that this GSMF for central galaxies as a function of stellar mass is Our primary motivation here is to understand whether halo scatter, σ ,consistsofanintrinsiccomponentandamea- given by c MARs are responsible for the mass and redshift dependence surement error component. At z =0,mostofthescatter ∞ of the SFR main sequence and its scatter. Similar models appears to be intrinsic, but that becomes less and less true φ∗,cen(M∗)= P (M∗|Mvir)φh(Mvir)dMvir. (2) have been explored in the0 past for different purposes, includ- at higher redshifts (see, e.g., Behroozi, Conroy & Wechsler Z ing generatingHere, mockP (M catalogs∗|Mvir)isalognormaldistributionswithascatter (Taghizadeh-Popp et al. 2015) 2010; Behroozi, Wechsler & Conroy 2013b; Leauthaud et al. and understandingaround M the∗ assumed different to clustering be constant of with quenchedσ =0. and15 dex. Such 2012; Tinker et al. 2013). Here, we do not deconvolve to re- c star-formingavalueissupportedtheanalysisofgenerallargegroupcat- galaxies (Becker 2015). move measurement error, as most of the observations that Usingalogs halo(alias?) MARs, we,studiesonthekinematicsofsatellitegalaxies operationally infer galaxy SFRs we will compare to include these errors in their measure- as follows.(More Let M et∗ = al.M 2011)∗(Mvir as(t) well,t) be as onthe clustering stellar mass analysis of a of large ments. central galaxysamples formed of galaxies in a halo??. of mass Mvir(t)attimet. As2594 regardsA. Rodr´ıguez-Puebla the GSMF et al. of central galaxies, we here use In a time-independentEmphasis SHMR, that the the above model reduces reproduces to M the∗ = observed the resultsTable 1. List of reported acronyms used in this in paper. Rodr´ıguez-Pueblaral halo finder etROCKSTAR al.(Behroozi, (2015). Wechsler In & Wua2013b;Behroozi et al. 2013c). Halo masses were defined using spherical overden-M∗(Mvir(t)).GSMF From at this redshift relationz ∼ 4. the change of stellar mass ART Adaptive refinement tree (simulation code). recent analysis of the SDSS DR7, Rodr´ıguez-Pueblasities according to the redshift-dependent et virial al. overdensity "vir(z) CSFR Cosmic star formation rate. given by the spherical collapse model (Bryan & Norman 1998), IMF Initial mass function. in time is simply with "vir 178 for large z and "vir 333 at z 0withour (2015)ISM derived theInterstellar total, medium. central, and satellite GSMF for stel- # . Like= the Bolshoi simulation (Klypin= et al. 2011= ), Bolshoi– GSMF Galaxy stellar mass9 function. M 12 ˙ Planck is complete down to haloes of maximum circular velocitydM∗ ∂ log M∗ dMvir lar massesMAR from MMass∗ =10 accretion rate, MMvir.⊙ to M∗ =10 M1 ⊙ based on the vmax 55 km s− . ∗ 2.4 Inferring Star Formation Rates From Halo SHARC Stellar halo accretion rate coevolution. ∼ = f , (4) E SHARC Equilibrium SHARC. In this paper, we calculate instantaneous halo MARs from the NYU-VAGC+ (Blanton+ et al. 2005) and using the 1/Vmax es- dt ∂ log Mvir dt SDSS Sloan digital sky survey. Bolshoi–Planck simulation, as well as halo MARs averaged over Mass Accretion Rates SFR Star formation rate. the dynamical time (M˙ vir,dyn), defined as timator.SHMR The membershipStellar-to-halo mass relation. (central/satellite) for each galaxy where f∗ = M∗/Mvir is the stellar-to-halo mass ratio. ˙ dMvir Mvir(t) Mvir(t tdyn) sMAR Specific mass accretion rate, Mvir/Mvir. − − . (1) In recent analysis of the galaxy stellar mass functions, star was obtainedsSFR fromspecific an star formation updated rate, SFR/M . versiondt ofdyn = the Yangtdyn et al.

∗ ! " EquationDownloaded from (4) implies stellar-halo accretion rate coevolution, 1/2 formation rates and cosmic star formation rates from z =0 The dynamical time of the halo is t (z) [G" (z)ρ ]− , which (2007) group catalog presented in Yang et al. (2012).dyn = Thevir m the spherical overdensity criterion of Bryan & Norman (1998). We is 20 per cent of the Hubble time. Simulations (e.g. Dekel et al.SHARC. The left panel of Figure 4 shows the resulting ∼ to z =8combinedwiththegrowthhalosobtainedfromN- correspondingalso assume a Chabrier SHMR (2003) IMF. Finally, is shown Table 1 lists all as the the2009) black suggest that most curve star formation in results Fig- from cold gas flowing acronyms used in this paper. inward at about the virial velocity – i.e. roughly a dynamical timestellar-to-halo mass ratio, f∗,derivedforSDSScentralgalax- Figure 7. Upper Panel: Cosmic mass density as a function of body simulations, ? show that the M∗–Mvir relation evolves after the gas enters. As instantaneous accretion rates for distinct z z ure 3, and the SHMR for all galaxieshaloes from near clusters Behroozi, can also be negative Wech- (Behroozi et al. 2014), http://mnras.oxfordjournals.org/ .Cosmicstar-formationrateasafunctionof . 2STELLARHALOACCRETIONRATE ies (see Section 2.2). Consistent with previous studies, we using time-averaged accretion rates allows galaxies in these haloes slowly with redshift. Moreover, ? showed that when assum- COEVOLUTION (SHARC) 12 sler & Conroy (2013a) is shown as theto continue red forming curve. stars. The dif- find that f∗ has a maximum of ∼ 0.03 at Mvir ∼ 10 M⊙, Fig. 1 shows the instantaneous and the dynamical-time-averaged ing that the ratio of galaxies specific star formation rates ference2.1 The between simulation the two curves for halo masses lower than halo MARs as a function of halo mass and redshift, and Fig. 2 showsand it decreases(sSFR) at to both their higher host and halos lower specific halo masses. mass accretion The rates We generate12 our mock galaxy catalogues based on the N-body their respective scatters. Even before converting halo accretion rates rates averaged over a dynamical time instead. Additionally, Mvir ∼ 10 M⊙ reflects the fact that the SHMR of cen- Bolshoi–Planck simulation (Klypin et al. 2014). The Bolshoi– into SFRs (Section 2.3), it is evident that both the slope and disper-product f∗(sMAR),× ϵ = dM star∗/dM formationvir will effi beciency shownϵ,isindependentofred- as the black trals andPlanck simulation satellite is based on galaxies the !CDM cosmology are with slightly param- sion di inff haloerent MARs are as already menti very similaroned to that of galaxy SFRs we show that a redshift-independent M∗–Mvir model ex- eters consistent with the latestIs results Main from the Planck Collabora- Sequenceon the main sequence. SFR Controlledcurves in Figurebyshift Halo simply 5 below. explainsMass theAccretion? cosmic star formation rate since tion (Planck Collaboration XIII 2015) and run using the ART code 12 at University of California, Santa Cruz on December 10, 2015 plains the observed scatter of the SFR−M∗ in main sequence ⊙ above.(Kravtsov, At Klypin halo & Khokhlov masses1997;Gottloeber&Klypin higher than2008). Mvir ∼ 10 M ,this In the more general case M∗ = M∗(Mvir(t),z), equation by Aldo Rodríguez-Puebla,1 3 2.2 Connecting galaxies toJoel haloes Primack, Peter Behroozi,z =4. Sandra Faber MNRAS 2016 The Bolshoi–Planck simulation has a volume of (250 h− Mpc) and galaxies. differencecontains 2048 is3 particles primarily of mass 1.9 10 due8 M .Haloes/subhaloes to the diThefferences abundance matching between technique is a simple the and powerful statis- × (4) generalizesIn to this paper we use these results by assuming that GSMFsand their used merger trees to were deriveHalo calculated with mass these the⊙ phase-space SHMRs, accretion tempo- tical Behroozi approach rates to connecting etz=0 galaxies al. toto 2013 haloes. 3 In its most simple the M∗–Mvir is independent of redshift. We use the relation dM∗ ∂M∗(Mvir(t),z) dMvir ∂M∗(Mvir(t),z) dz used (Moustakas et al. 2013). When comparing both GSMFs = obtained in Section 2.3+ for local galaxies., Specifically,(5) we dt ∂Mvir dt ∂z dt we find that the high mass-end from Rodr´ıguez-Puebla et al. infer galaxy star formation rates from halo mass accretion 3RESULTS (2015) is significantly different to the one derive in (Mous- where the first term is the contribution to the SFR from but if the M∗–Mratesvir relation as follow. is independent Let M∗ = M of∗( Mredshiftvir(t),t then)thestellarmassof the takas et al. 2013). In contrast, when comparing Rodr´ıguez- halo MAR and the second term is the change in the SHMR Figures: SFR vs M∗;sSFRvsM∗;SFRDvszandcosmic stellar mass ofacentralgalaxyformedinahaloofmass a central galaxy formed in a halo of massMvir (t)attimet.If Puebla et al. (2015) GSMF with Bernardi et al. (2010) we with redshift. Although in this paper we assume a constant mass density vs z. Mvir(t) is M∗ =M M∗∗–(MMvirvir(t)).is independent From this relation of redshift star thenformationM∗ = M∗(Mvir(t)). find an excellent agreement, for a more general discussion ratesSHMR, are given theFrom formalism simply this by relation that we star describe formation below rates applies are given to this simply by; see Rodr´ıguez-Puebla et al. (2015). In less degree, we also more general case. dM∗ d log M∗ dMvir find that the different values employed for the scatter of the The relation= betweenf∗ stellar mass, growth and observed (3) 4DISCUSSION dt d log Mvir dt SHMR explain these differences. star formation rate is given by where f∗ = M∗/Mvir.Moreover,fromtheaboveequation Discuss about the star formation efficiency. For which galax- where f∗ = M∗/Mvir. We call this Stellar-Halo Accretion Rate ⃝c 0000 RAS, MNRAS 000,000–000 we can deduce that the star formation efficiency, ϵ,isjust, ies ϵ =1.Doweneedafigureofϵ vs Mvir? Coevolution (SHARC) if true halo-by-haloStellar-Halo. Accretion Rate Coevolution 11 sSFR d log M∗ = ϵ = . (4) Figure 1. Halo MARs from z 0 to 3, from the Bolshoi–Planck simulation. The instantaneous rate is shown in black, and the dynamically time averaged rate Stellar halo accretion rate coevolution 2595 = ˙ ˙ SHARC correctlysMAR predictsd log starMvir formation rates to z ~ 4 in red. The grey band is the 1σ (68 per cent) range of the instantaneous MARs. All the slopes are approximately the same 1.1 both for Mvir and Mvir,dyn. 4.1 Implications for the bathtub model Scatter of halo mass accretion∼ rates While in the above analysis the term dMvir/dt refers to MNRAS 455, 2592–2606 (2016) Equation 3 is essentially the bathtub model. Differences be- the instantaneous mass accretion rates we also infer SFRs tween the observed SFRs and our models will give con- by using dMvir/dt averaged over a dynamical time scale as straints on the regime where the bathtub model is valid. measured from the simulations. As we will show below, we confirm the previous claim • For z>5ourSFRsareaboveobservations.Thismeans in ? that this model reproduces the observed evolution of that at early epochs galaxies did not convert gas in stars as the SFR−M∗ and cosmic star formation rate. Moreover, we fast as they receive it. This is a phase of gas accumulation show that this is also true when using halo mass accretion where the bathtub is being fill with gas.

⃝c 20?? RAS, MNRAS 000,1–?? Downloaded from

Figure 2. Scatter of halo MARs from z 0 to 3 from the Bolshoi–Planck = 9 9.5 10 10.5 simulation. As in Fig. 1, scatter for the instantaneous rate is shownFigure in black,8. Specific star formation rates as a function of redshift z for stellar masses M∗ =10, 10 , 10 and 10 M⊙ from time- independent SHMR model. The red and black curves are the sSFRs, from both dynamically-time-averaged and instantaneous mass and that for the dynamically time averaged rate in red. accretion rates, respectively, with the gray band representing the dispersion in the latter. Both are corrected for mergers. The orange curve is the Speagle et al. (2014) summary of observed sSFRs onthemainsequence.ObservationsfromWhitakeretal.(2014), Ilbert et al. (2015) and Schreiber et al. (2015) are also included. http://mnras.oxfordjournals.org/ form, the cumulative halo and subhalo mass function1 and the cu-

mulative GSMF are matched in order to determine the massesting relation to discuss these differences in the light of the constant between haloes and galaxies. In order to assign galaxiesSHMR to haloes model. in the Bolshoi–Planck simulation, in this paper we use a moreFirst, gen- the observed sSFRs of galaxies at z>4aresys- eral procedure for abundance matching. Recent studiestematically have shown lower than the time independent SHMR model predictions. These differences increase at z =6.Thedis- that the mean SHMRs of central and satellite galaxiesagreement are slightly between the constant SHMR predicted SFRs and different, especially at lower masses where satellitesthe tend observations to have implies that the changing SHMR must be used, as in equation (5), at least at high redshift. more stellar mass compared to centrals of the same halo mass (for a more general discussion see Rodr´ıguez-Puebla et al. 2012Between, 2013; z =4andz =3theobservedstar-forming sequence is consistent with the SHARC predictions. Between at University of California, Santa Cruz on December 10, 2015 Reddick et al. 2013;Watson&Conroy2013;Wetzeletal.z =2and2013z).=0.5, the observed sSFRs are slightly above Since we are interested in studying the connection betweenthe SHARC halo predictions. This departure occurs at the time of the peak value of the cosmic star formation rate. mass accretion and star formation in central galaxies, for our anal- After the compilation carried out by Speagle et al. ysis we derive the SHMR for central galaxies only. (2014), new determinations of the sSFR have been pub- lished, particularly for redshifts z<2.5. In Figures 7 and 8, We model the GSMF of central galaxies by defining P (M Mvir) Figure 9. Scatter of the sSFR for main-sequence galaxies pre- ∗ we reproduce| new data published in Whitaker et al. (2014); dicted in our model. as the probability distribution function that a distinctIlbert halo of et mass al. (2015) and Schreiber et al. (2015). This new Mvir hosts a central galaxy of stellar mass M . Then theset GSMF of data for agrees better with our model between z =2 ∗ and z =0.5, implying that the time-independent SHMR 4.2 Scatter of the sSFR Main Sequence central galaxies as a function of stellar mass is given by(SHARC assumption) may be nearly valid across the wide redshift range from Figurez ∼ 4to 3. Upperz ∼ 0, panel: a remarkable SHMR for SDSS result. galaxies.We The now red turn curve our is discussion for all to the scatter of the star-forming ∞ SDSS galaxies, from Behroozi et al. (2013d) abundance matching using the ˙ φ (M ) P (M M )φ (M )dM . However, it(2) is not clear whether this is valid since the newer main sequence, displayed in Figure 9. When using Mvir,the ,cen vir h vir vir observations have not been recalibrated as in Speagle et al. scatter is nearly independent of redshift and it increases ∗ ∗ = 0 ∗| Bolshoi simulation. The black curve is for SDSS central galaxies, using the ! (2014). abundance matching method of Rodr´ıguez-Puebla,very Avila-Reese slowly with & Drory mass for z<2. The value of the scat- Here, φh(Mvir) is the halo mass function and P (M Mvir) is a log- ⃝c 0000 RAS, MNRAS(2013000)appliedtotheBolshoi–Plancksimulation.Thelatteriswhatwe,000–000 normal distribution assumed to have a scatter of ∗σ| 0.15 dex c = use in this paper, where we restrict attention to central galaxies. Bottom independent of halo mass. Such a value is supported by the anal- Panel: halo-to-stellar mass relations. The dotted vertical line and the blue ysis of large group catalogues (Yang, Mo & van den Bosch 2009; arrow indicate that galaxies below M 1010.5 M are considered as main ∗ = Reddick et al. 2013), studies of the kinematics of satellite galaxies sequence galaxies, while some higher mass galaxies⊙ are not on the main (More et al. 2011), as well as clustering analysis of large samples sequence. of galaxies (Shankar et al. 2014;Rodr´ıguez-Puebla et al. 2015). 12 Note that this scatter, σ c, consists of an intrinsic component and a to 10 M based on the NYU-VAGC (Blanton et al. 2005)and measurement error component. At z 0, most of the scatter ap- using the⊙ 1/V estimator. The membership (central/satellite) for = max pears to be intrinsic, but that becomes less and less true at higher each galaxy was obtained from an updated version of the Yang redshifts (see e.g. Behroozi, Conroy & Wechsler 2010;Leauthaud et al. (2007) group catalogue presented in Yang et al. (2012). The et al. 2012; Behroozi et al. 2013d; Tinker et al. 2013). Here, we corresponding SHMR is shown as the black curve in Fig. 3,and do not deconvolve to remove measurement error, as most of the the SHMR for all galaxies from Behroozi et al. (2013a) is shown observations that we will compare to include these errors in their as the red curve. The difference between the two curves for halo 12 measurements. masses lower than Mvir 10 M reflects the fact that the SHMR As regards the GSMF of central galaxies, we here use the results of centrals and satellite∼ galaxies are⊙ slightly different as mentioned 12 reported in Rodr´ıguez-Puebla et al. (2015). In a recent analysis of above. At halo masses higher than Mvir 10 M , this difference the SDSS DR7, Rodr´ıguez-Puebla et al. (2015) derived the total, is primarily due to the differences between∼ the GSMFs⊙ used to derive central, and satellite GSMF for stellar masses from M 109 M these SHMRs, Behroozi et al. (2013c) used Moustakas et al. (2013). ∗ = ⊙ When comparing both GSMFs, we find that the high-mass end from Rodr´ıguez-Puebla et al. (2015) is significantly different to the one 1 Typically defined at the time of subhalo accretion. derive in Moustakas et al. (2013). In contrast, when comparing

MNRAS 455, 2592–2606 (2016) Astronaut Andrew Feustel installing Wide Field Camera Three on the last visit to Hubble Space Telescope in 2009

The infrared capabilities of WFC3 allow us to see the full stellar populations of forming galaxies

candels.ucolick.org

WFC3 ACS

(blue 0.4 μm)(1+z) = 1.6 μm @ z = 3 (red 0.7 μm)(1+z) = 1.6 μm @ z = 2.3 Redshift COSMIC WEB This frame from the Bolshoi supercom- 69 4 3 2 10.50.20 puter simulation depicts the distribution of matter at ) – 2 redshift 3. Clusters of galaxies lie along the bright filaments. CANDELS Dark matter and cold gas flow along the filaments to supply × 10 3

galaxies with the material they need to form stars. 10 / CANDELS, ET AL. )

Cosmic 5

Cosmic dawn Cosmic high noon Star-formation rate UNIVERSITY OF NOTTINGHAMUNIVERSITY OF (

(solar masses/year/megaparsecs 0 BER

Ö 012345 6 7 8 9 10 11 12 13 Time since Big Bang (billions of years) Big Now Bang 13.7 STARBIRTH RATE Using data from many surveys, including CANDELS, astronomers have plotted the rate of star formation through cosmic history.

The rate climbed rapidly at cosmic dawn and peaked at cosmic high noon. GREGG DINDERMAN; SOURCE: KENNETH DUNCAN ANATOLY KLYPIN / STEFAN GOTTL STEFAN / KLYPIN ANATOLY S&T:

At the present day, only a few galaxies lie between the present-day red ellipticals with old stars grew in size by peaks of the blue and red galaxies, in the so-called “green “dry” mergers — mergers between galaxies having older valley” (so named because green wavelengths are midway red stars but precious little star-forming cold gas. But between red and blue in the spectrum). A blue galaxy that the jury is still out on whether this mechanism works in is vigorously forming stars will become green within a detail to explain the observations. few hundred million years if star formation is suddenly quenched. On the other hand, a galaxy that has lots of old The Case of the Chaotic Blue Galaxies stars and a few young ones can also be green just through Ever since Hubble’s first spectacular images of distant the combination of the blue colors of its young stars and galaxies, an enduring puzzle has been why early star- the red colors of the old ones. The Milky Way probably forming galaxies look much more irregular and jumbled falls in this latter category, but the many elliptical galaxies than nearby blue galaxies. Nearby blue galaxies are around us today probably made the transition from blue relatively smooth. The most beautiful ones are elegant to red via a rapid quenching of star formation. CANDELS “grand-design” spirals with lanes of stars and gas, such as lets us look back at this history. M51. Smaller, irregular dwarf galaxies are also often blue. Most galaxies of interest to astronomers working on But at cosmic high noon, when stars were blazing CANDELS have a look-back time of at least 10 billion into existence at peak rates, many galaxies look distorted years, when the universe was only a few billion years old. or misshapen, as if galaxies of similar size are colliding. Because the most distant galaxies were relatively young at Even the calmer-looking galaxies are often clumpy and the time we observe them, we thought few of them would irregular. Instead of having smooth disks or spiral arms, have shut off star formation. So we expected that red gal- early galaxies are dotted with bright blue clumps of very axies would be rare in the early universe. But an impor- active star formation. Some of these clumps are over 100 tant surprise from CANDELS is that red galaxies with the times more luminous than the Tarantula Nebula in the same elliptical shapes as nearby red galaxies were already Large Magellanic Cloud, one of the biggest star-forming common only 3 billion years after the Big Bang — right regions in the nearby universe. How did the chaotic, dis- in the middle of cosmic high noon. ordered galaxies from earlier epochs evolve to become the Puzzlingly, however, elliptical galaxies from only familiar present-day spiral and elliptical galaxies? about 3 billion years after the Big Bang are only one- Because early galaxies appear highly distorted, astro- third the size of typical elliptical galaxies with the same physicists had hypothesized that major mergers — that is, stellar mass today. Clearly, elliptical galaxies in the early collisions of galaxies of roughly equal mass — played an universe must have subsequently grown in a way that important role in the evolution of many galaxies. Merg- increased their sizes without greatly increasing the num- ers can redistribute the stars, turning two disk galaxies ber of stars or redistributing the stars in a way that would into a single elliptical galaxy. A merger can also drive gas change their shapes. Many astronomers suspect that the toward a galaxy’s center, where it can funnel into a black

SkyandTelescope.com June 2014 21 Cosmological Simulations Astronomical observations represent snapshots of moments in time. It is the role of astrophysical theory to produce movies -- both metaphorical and actual -- that link these snapshots together into a coherent physical theory.

Cosmological dark matter simulations show large scale structure and dark matter halo properties, basis for semi-analytic models

Hydrodynamic galaxy formation simulations: evolution of galaxies, formation of galactic spheroids, mock galaxy images and spectra including stellar evolution and dust effects Galaxy Hydro Simulations: 2 Approaches 1. Low resolution (~ kpc) Advantages: it’s possible to simulate many galaxies and study galaxy populations and their interactions with CGM & IGM. Disadvantages: we learn little about how galaxies themselves evolve, and cannot compare in detail with high-z galaxy images and spectra. Examples: Overwhelmingly Large Simulations (OWLs, EAGLE), AREPO simulations in 100 Mpc box (Illustris) 2. High resolution (~10s of pc) Advantages: it’s possible to compare in detail with high-z galaxy images and spectra, to discover how galaxies evolve, morphological drivers (e.g., galaxy shapes, clumps and other instabilities, origins of galactic spheroids, quenching). Radiative feedback essential? Disadvantages: it’s hard to run statistical galaxy samples, so the best approach puts simulation insights into SAMs. Examples: ART and FIRE simulation suites, AGORA simulation comparison project GAS, Face-on GAS, Edge-on STARS, Face-on STARS, Edge-on ART hydro sims. CeverinoClumpy et al. 2010 Galaxies Face-onin hydroART GenerationEdge-on 1 Simulations Generation 1 Simulation Edge-on Ly alpha blobs from same simulation

observed Edge-on simulated z ~ 2 galaxies z ~ 2 galaxies

Face-on

Fumagalli, Prochaska, Kasen, Dekel, Ceverino, & Primack 2011 Radiative Feedback: Fewer Stars Simulated Galaxy 10 billion years ago

as it would More Elongated appear nearby to VELA27-RP VELA27-RP our eyes z = 2.1 z = 2.1 face-on edge-on

CANDELized as it would appear to Hubble’s ACS visual camera

as it would appear to Hubble’s WFC3 infrared camera Ceverino+ RP simulations analyzed by Zolotov, Dekel, VELA07-RP VELA12-RP Tweed, Mandelker, Ceverino, & Primack MNRAS 2015 FAST-TRACK

Barro+ (CANDELS) 2013

VELA11-RP VELA27-RP

SLOW-TRACK

COMPACTION —>

• minor merger • major merger 7

Zolotov+2015 Compaction and Quenching in the Inner 1 kpc

Innerwhole galaxy10 kpc

Avishai Dekel based on Zolotov+2015

inner 1 kpc DM Gen 3 VELA07-RP Animations z = 4.4 to 2.3 Gas Compaction

Stars } Face-on Edge-on

< 5 kpc

Gas Gas

< 1 kpc

Stars Stars The Astrophysical Journal Letters,792:L6(6pp),2014September1 van der Wel et al.

1

0.75

0.5

0.25

0 1

Prolate Galaxies0.75 Dominate at High Redshifts & Low Masses

0.5

Prolate 0.25 Spheroidal Oblate 0

Figure 3. Reconstructed intrinsic shape distributions of star-forming galaxies in our 3D-HST/CANDELS sample in four stellar mass bins and five redshift bins. The model ellipticity and triaxiality distributions are assumed to be Gaussian, with the mean indicated by the filled squares, and the standard deviation indicated by the open vertical bars. The 1σ uncertainties on the mean and scatter are indicated by the error bars. Essentially all present-day galaxies have large ellipticities, and small triaxialities—they are almost all fairly thin disks. Toward higher redshifts low-mass galaxies become progressively more triaxial. High-mass galaxies always have rather low triaxialities, but they become thicker at z 2. ∼ (A color version of this figure is available in the online journal.)

van der Wel+2014 Figure 4. Color bars indicate the fraction of the different types of shape defined in Figure 2 as a function of redshift and stellar mass. The negative redshift bins representSee the also SDSS Morphological results for z<0.1; the Survey other bins of are Galaxies from 3D-HST z=1.5-3.6/CANDELS. Law, Steidel+ ApJ 2012 (A color version of this figure is available in the online journal.) When Did Round Disk Galaxies Form? T. M. Takeuchi+ ApJ 2015 Letter allows us to generalize this conclusion to include earlier Despite this universal dominance of disks, the elongatedness epochs. of many low-mass galaxies at z ! 1impliesthattheshapeof At least since z 2moststarformationisaccountedforby agalaxygenerallydiffersfromthatofadiskatearlystages 10 ∼ !10 M galaxies (e.g., Karim et al. 2011). Figures 3 and 4 in its evolution. According to our results, an elongated, low- show that⊙ such galaxies have disk-like geometries over the same mass galaxy at z 1.5 will evolve into a disk at later times, or, redshift range. Given that 90% of stars in the universe formed reversing the argument,∼ disk galaxies in the present-day universe over that time span, it follows that the majority of all stars in the do not initially start out disks.13 universe formed in disk galaxies. Combined with the evidence As can be seen in Figure 3, the transition from elongated that star formation is spatially extended, and not, for example, to disky is gradual for the population. This is not necessarily concentrated in galaxy centers (e.g., Nelson et al. 2012;Wuyts et al. 2012)thisimpliesthatthevastmajorityofstarsformedin 13 This evolutionary path is potentially interrupted by the removal of gas and disks. cessation of star formation.

4 Our cosmological zoom-in simulations often produce elongated galaxies like observed ones. The elongated stellar distribution follows the elongated inner dark matter halo. MNRAS 453, 408–413 (2015) doi:10.1093/mnras/stv1603 Formation of elongated galaxies with low masses at high redshift Formation of elongated galaxies with low masses at high redshift Daniel Ceverino, Joel Primack and Avishai Dekel DanielABSTRACT Ceverino,1,2‹ Joel Primack3 and Avishai Dekel4 1WeCentro report de Astrobiolog the identification´ıa (CSIC-INTA), of elongated Ctra de Torrej (triaxialon´ a Ajalvir, or prolate) km 4, E-28850 Torrejon´ de Ardoz, Madrid, Spain 2galaxiesAstro-UAM, in Universidad cosmological Autonoma simulations de Madrid, at Unidadz ~ 2. AsociadaThese are CSIC, E-28049 Madrid, Spain 3 Department of Physics, University of California, Santa9.5 Cruz, CA 95064, USA 4preferentiallyCenter for Astrophysics low-mass and Planetary galaxies Science, (M∗ ≤ Racah 10 InstituteM⊙), residing of Physics, in The Hebrew University, Jerusalem 91904, Israel Dark matter halos are elongated, especially ! near their centers. Initially stars follow the ! dark matter (DM) haloes with strongly elongated inner parts, a gravitationally dominant dark matter, as shown.! common feature of high-redshift DM haloes in the cold dark But later as the ordinary matter central density Acceptedmatter cosmology. 2015 July 14. A Received large population 2015 June 26; of in originalelongated form galaxies 2015 April 20 grows and it becomes gravitationally dominant, produces a very asymmetric distribution of projected axis ratios, the star and dark matter distributions both Downloaded from become disky — as observed by Hubble as observed in high-z galaxy surveys. This indicates that the Space Telescope (van der Wel+ ApJL Sept majority of the galaxies at high redshiftsABSTRACT are not discs or spheroids 2014).! but rather galaxies with elongatedWe morphologies report the identification of elongated (triaxial or prolate) galaxies in cosmological simula- tions at z 2. These are preferentially low-mass galaxies (M 109.5 M ), residing in dark ≃ ∗ ≤ matter (DM) haloes with strongly elongated inner parts, a common feature⊙ of high-redshift

Nearby large galaxies are mostly disks and spheroids — but they start out looking more like pickles. http://mnras.oxfordjournals.org/ DM haloes in the ! cosmology. Feedback slows formation of stars at the centres of these haloes, so that a dominant and prolate DM distribution gives rise to galaxies elongated along the DM major axis. As galaxies grow in stellar mass, stars dominate the total mass within the galaxy half-mass radius, making stars and DM rounder and more oblate. A large population of elongated galaxies produces a very asymmetric distribution of projected axis ratios, as observed in high-z galaxy surveys. This indicates that the majority of the galaxies at high redshifts are not discs or spheroids but rather galaxies with elongated morphologies. Key words: galaxies: evolution – galaxies: formation. at University of California, Santa Cruz on October 16, 2015

cluded that the majority of the star-forming galaxies with stellar 1INTRODUCTION masses of M 109–109.5 M are elongated at z 1. At lower The intrinsic, three-dimensional (3D) shapes of today’s galaxies redshifts, galaxies∗ = with similar⊙ masses are mainly oblate,≥ disc-like can be roughly described as discs or spheroids, or a combination of systems. It seems that most low-mass galaxies have not yet formed the two. These shapes are characterized by having no preferential a regularly rotating stellar disc at z ! 1. This introduces an interest- long direction. Examples of galaxies elongated along a preferential ing theoretical challenge. In principle, these galaxies are gas-rich direction (prolate or triaxial) are rare at z 0(Padilla&Strauss and gas tends to settle in rotationally supported discs, if the angular 2008;Weijmansetal.2014). They are usually= unrelaxed systems, momentum is conserved (Fall & Efstathiou 1980;Blumenthaletal. such as ongoing mergers. However, at high redshifts, z 1–4, we 1986;Mo,Mao&White1998;Bullocketal.2001). At the same may witness the rise of the galaxy structures that we see= today at time, high-mass galaxies tend to be oblate systems even at high-z. the expense of other structures that may be more common during The observations thus suggest that protogalaxies may develop an those early and violent times. early prolate shape and then become oblate as they grow in mass. Observations trying to constrain the intrinsic shapes of the stellar Prolateness or triaxiality are common properties of dark mat- components of high-z galaxies are scarce but they agree that the ter (DM) haloes in N-body-only simulations (Jing & Suto 2002; distribution of projected axis ratios of high-z samples at z 1.5–4 Allgood et al. 2006; Bett et al. 2007;Maccioetal.` 2007;Maccio,` is inconsistent with a population of randomly oriented disc= galaxies Dutton & van den Bosch 2008;Schneider,Frenk&Cole2012, (Ravindranath et al. 2006;Lawetal.2012;Yuma,Ohta&Yabe and references therein). Haloes at a given mass scale are more pro- 2012). After some modelling, Law et al. (2012) concluded that late at earlier times, and at a given redshift more massive haloes the intrinsic shapes are strongly triaxial. This implies that a large are more elongated. For example, small haloes with virial masses 11 population of high-z galaxies are elongated along a preferential around Mv 10 M at redshift z 2areasprolateastoday’s direction. galaxy clusters.≃ Individual⊙ haloes are= more prolate at earlier times, van der Wel et al. (2014) looked at the mass and redshift de- when haloes are fed by narrow DM filaments, including mergers, pendence of the projected axis ratios using a large sample of star- rather than isotropically, as described in Vera-Ciro et al. (2011). The forming galaxies at 0

C ⃝ 2015 The Authors Published by Oxford University Press on behalf of the Royal Astronomical Society FormationIn hydro of sims, elongated dark-matter galaxies dominated with low galaxies masses are at Daniel Ceverino, Joel Primack and Avishai Dekel MNRAS 2015 high redshift prolateCeverino, Primack, DekelMNRAS 453, 408 (2015) 10 Stars M* <10 M☉ at z=2

Dark matter

Also Tomassetti et al. 2016 MNRAS Simulated elongated galaxies are aligned with cosmic web filaments, become round after compaction 20 kpc (gas inflow to center) How we are using galaxy simulations and deep learning to improve understanding of galaxy formation, with support from Google.

Sander Dieleman used a deep learning code to predict Galaxy Zoo nearby galaxy image classifications with 99% accuracy, winning 2014 Kaggle competition

Marc Huertas-Company used Dieleman’s code to classify CANDELS galaxy images H-C et al. 2015, Catalog of Visual-like Morphologies in 5 CANDELS Fields Using Deep Learning H-C et al. 2016, Mass assembly and morphological transformations since z ~ 3 from CANDELS

Google supports Marc H-C’s visits to UCSC Summer 2016 and 2017, and his grad student Fernando Caro’s visit March-August 2017 using deep learning, CANDELS images, and Primack group’s galaxy simulations to understand galaxy formation UCSC group here today: Profs. David Koo, Joel Primack; grad students Fernando Caro, Christoph Lee, Viraj Pandya, astrophysics senior thesis student Sean Larkin

Related UCSC deep learning project: better galaxy environment estimates Here today: Dr. Doug Hellinger, grad students James Kakos, Dominic Pasquali

Related UCSC deep learning project: damped Lyα systems in SDSS spectra Here today: Dr. Shawfeng Dong Sander Dieleman used a deep learning code to predict Galaxy Zoo nearby galaxy image classifications with 99% accuracy, winning 2014 Kaggle competition

http://benanne.github.io/2014/04/05/galaxy-zoo.html

The Galaxy Zoo 2 decision tree. Reproduced from fig.1 in Willett et al. (2013). Krizhevsky-style diagram of the architecture of the best performing network.

Dieleman, Willett, Dambre 2015, Rotation-invariant convolutional neural networks for galaxy morphology prediction, MNRAS

From the ABSTRACT: The Galaxy Zoo project successfully applied a crowdsourcing strategy, inviting online users to classify images by answering a series of questions. We present a deep neural network model for galaxy morphology classification which exploits translational and rotational symmetry. For images with high agreement among the Galaxy Zoo participants, our model is able to reproduce their consensus with near-perfect accuracy (>99 per cent) for most questions. Marc Huertas-CompanyThe Astrophysical Journal Supplement Series, 221:8 (23pp used), 2015 November Dieleman’s codeHuertas-Company to classify et al. CANDELS galaxy images H-C et al. 2015, Catalog of Visual-like Morphologies in 5 CANDELS Fields Using Deep Learning In this work, we mimic human perception with deep learning using convolutional neural networks (ConvNets). The ConvNet is trained to reproduce the CANDELS visual morphological classification based on the efforts of 65 individual classifiers who contributed to the visual inspection of all of the galaxies in the GOODS-S field. It was then applied to the otherThe Astrophysical four JournalCANDELS Supplement Series, fields.221:8 (23pp The), 2015 November galaxy classification dataHuertas-Company was then et al. released to the astronomical community.

Following the approach in CANDELS, we associate five real numbers with each galaxy corresponding to the frequency at which The Astrophysical Journal Supplement Series, 221:8 (23pp), 2015 November Fully Connected LayersHuertas-Company et al. expert classifiers flagged a galaxy as having a bulge, having a disk, presenting an irregularity, being compact or point-source, and being unclassifiable. Galaxy images are interpolated to a fixed size, rotated, and randomly perturbed before feeding the network to (i) avoid over-fitting and (ii) reach a comparable ratio of background versus galaxy pixels in all images. ConvNets are

Figure 2. Configuration of the Convolutional Neural Network used in this paper. The Network is based on the one used by Dieleman et al. (2015) on SDSS galaxies. It able to predict the votes of expert classifiers is made of 5 convolutional layers followed by 2 fully connected perceptron layers. In the convolutional part there are also 3 max-pooling steps of different sizes. The input are SDDSized CANDELS galaxies as explained in the text and the output (for this paper) is made of 5 real values corresponding to the fractions defined in the with a <10% bias and a ∼10% scatter. This fi CANDELS classi cation scheme. makes the classification almost equivalent to Configuration of the Convolutional Neural Network used in this paper, a visual-based classification. The training based on the one used by Dieleman et al. (2015) on SDSS galaxies. It took 10 days on a GPU and the classification is made of 5 convolutional layers followed by 2 fully connected is performed at a rate of 1000 galaxies/hour. perceptron layers.

Figure 2. Configuration of the Convolutional Neural Network used in this paper. The Network is based on the one used by Dieleman et al. (2015) on SDSS galaxies. It is made of 5 convolutionalFigure 3. CANDELS layers followedMain Morphology by 2 fully connectedvisual classi perceptronfication layers. scheme In as the described convolutional in Kartaltepe part there et are al. ( also2014 3). max-pooling Each classifi stepser (3– of5 diff pererent galaxy sizes. onaverage The ) is asked to input are SDDSizedprovideCANDELS 5 flags for galaxies each galaxy as explained corresponding in the totext the and main the morphological output (for this properties paper) is made of the of galaxy 5 real as values labeled corresponding in the figure. to The thefl fractionsags are then defined combined in the to produce the H-CCANDELS et classifractions fial.cation of peoplescheme. 2016, that voted for Mass a given feature. assembly and morphological transformations since z ~ 3 from CANDELS ConvNets have been proven to perform extremely well in compared to the rate of 26.2% achieved by the second best From theimage ABSTRACT: recognition tasks. For example,We quantify they have achieved the anevolutioncompetitors of(non-deep star-forming).Also,theperformanceofconvolu- and quiescent galaxies as a function of morphology from error rate of 0.23% for the MNIST database, which is a tional neural networks on the ImageNet tests is now close z ~ 3 to the present. Our main results are: 1) At z ~ 2, 80% of thefi stellar mass density of star-forming galaxies is in Figurecollection 2. Configuration of manuscript of the Convolutional numbers Neural considered Network usedas in this a standardpaper. The Networkto is based a purely on the one usedhuman-based by Dieleman et classi al. (2015cation) on SDSS( galaxies.Russakovsky It is madetest of 5 for convolutional all new layers machine followed by learning 2 fully connected algorithms perceptron(Ciresan layers. In the convolutionalet al. 2014 part). there are also 3 max-pooling steps of different sizes. The 9 irregularinputet are al.SDDSized 2012systems.).Whenappliedtofacialrecognition,theyachieveCANDELS galaxies However, as explained in the by text andz the∼ output0.5,(for irregular this paper)ConvNetsis made of objects 5 real were valuesfi correspondingrst only applied to dominate tothe fractions galaxy defi morphologicalned in at the stellar masses below 10 M⊙. fi CANDELSa97.6%recognitionrateon5600imagesofmorethan10 classi cation scheme. classification earlier this year in the framework of the Galaxy 13 10.8 2)Figure Quenching: 3. CANDELSsubjectsMain(Matusugu Morphology We etvisual confirm al. 2003 classifi)cation.TheImageNetLargeScale schemethat as describedgalaxies in Kartaltepe reaching et al.Zoo(2014 Challenge). Each classi a fistellar oner (3 the–5 per Kaggle galaxy mass on platform. average M) is asked∗The ~ to 10 goal of theM⊙ tend to quench. Also, quenching implies provide 5 flagsVisual for each Recognition galaxy corresponding Challenge to the main is morphological a benchmark properties in of object the galaxy as labeledchallenge in the wasfigure. to Thefindflags an are algorithm then combined able to to produce predict the the 37 votes thefractions presence of people that voted for of a given a feature.bulge: the abundance of massive red disks is negligible at all redshifts classification and detection, with millions of images and of the Galaxy Zoo 2 release. The winner of the competition hundreds of object classes. In Krizhevsky et al. (2012), ConvNetsConvNets have been were proven able to to perform achieve extremely an error well rate in of 15.3%compared to13 thehttps: rate//www.kaggle.com of 26.2% achieved/c/galaxy-zoo-the-galaxy-challenge by the second best image recognition tasks. For example, they have achieved an competitors (non-deep).Also,theperformanceofconvolu- error rate of 0.23% for the MNIST database, which is a tional neural3 networks on the ImageNet tests is now close collection of manuscript numbers considered as a standard to a purely human-based classification (Russakovsky test forFigure all 3. newCANDELS machineMain Morphology learningvisual algorithms classification(Ciresan scheme as describedet al. in2014 Kartaltepe). et al. (2014). Each classifier (3–5 per galaxy on average) is asked to et al. 2012provide).Whenappliedtofacialrecognition,theyachieve 5 flags for each galaxy corresponding to the main morphological propertiesConvNets of the galaxy were as labeledfirst in theappliedfigure. The tofl galaxyags are then morphological combined to produce the a97.6%recognitionrateon5600imagesofmorethan10fractions of people that voted for a given feature. classification earlier this year in the framework of the Galaxy subjects (Matusugu et al. 2003).TheImageNetLargeScale Zoo Challenge on the Kaggle platform.13 The goal of the Visual RecognitionConvNets have Challenge been proven is a to benchmark perform extremely in object well inchallengecompared was to fi tond the an rate algorithm of 26.2% able achievedto predict by the the 37 votessecond best classifiimagecation recognition and detection, tasks. with For example, millions theyof images have achieved and anof thecompetitors Galaxy Zoo( 2non-deep release.) The.Also,theperformanceofconvolu- winner of the competition hundredserror of rate object of 0.23% classes. for In the Krizhevsky MNIST database, et al. (2012 which), is a tional neural networks on the ImageNet tests is now close 13 ConvNetscollection were ofable manuscript to achieve numbers an error considered rate of as 15.3% a standard https://towww.kaggle.com a purely/ human-basedc/galaxy-zoo-the-galaxy-challenge classification (Russakovsky test for all new machine learning algorithms (Ciresan et al. 2014). et al. 2012).Whenappliedtofacialrecognition,theyachieve3 ConvNets were first applied to galaxy morphological a97.6%recognitionrateon5600imagesofmorethan10 classification earlier this year in the framework of the Galaxy subjects (Matusugu et al. 2003).TheImageNetLargeScale Zoo Challenge on the Kaggle platform.13 The goal of the Visual Recognition Challenge is a benchmark in object challenge was to find an algorithm able to predict the 37 votes classification and detection, with millions of images and of the Galaxy Zoo 2 release. The winner of the competition hundreds of object classes. In Krizhevsky et al. (2012), ConvNets were able to achieve an error rate of 15.3% 13 https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge

3 Google supports Marc H-C’s visits to UCSC Summer 2016 and 2017, and his grad student Fernando Caro’s visit March-August 2017 using deep learning, CANDELS images, and Primack group’s galaxy simulations to understand galaxy formation

Bulge Disk F160W image

Bulge+Disk

H-band morphology compaction merger

V23 DM F160W image

DM Gas Stars Stars ex situ stars Zoom-in on Gas merger event

Simulation at z ~ 1.2

Galaxy Center Whole Galaxy

EvolutionFigure of 3: zoom-in Evolution galaxy of zoom-in simulation galaxy VELA23-RP. simulation VELA23-RP. The upper Thethree upper panels three show panels the show probabilities the that theprobabilities galaxy is that best the fit galaxy by GALFIT is best as fit a by single-Sersic GALFIT as aBulge single-S´ersic or Disk, Bulge or instead or Disk, as or a instead double as Sersic Bulge+Disk,a double based S´ersic Bulge+Disk, on classifications based on by classifications a deep learning by deep code learning trained codes using trained synthetic using images. (Notesynthetic that these images. probabilities (Note that do these not probabilitiesneed to sum do to not unity, need since to sum they to unity,are independent.) since they are Classificationsindependent.) are Classifications plotted for 20 are different plotted for orientations, 20 di↵erent orientations,with the medians with the plotted medians as plottedheavy lines. as heavy lines. The lower panels show the evolution of masses and rates in the inner 1 kpc (left panel) and out to 10 kpc (right panel). Masses plotted are dark matter (black), stars formed in situ (red), accreted ex situ stars (green), and gas (blue), and mass rates plotted are star formation (purple), gas inflow (cyan), and gas outflow (magenta).

mergers leads to stellar mass dominating over dark matter in the inner kpc. The Bulge probability accordingly increases at z < 3andthepureDiskandBulge+Diskprobabilities drop there. The gas density in the galaxy⇠ center then declines as gas is turned into stars or expelled, and the stellar mass in the inner kpc remains essentially constant for several Gyr. But on the 10 kpc scale star formation continues, producing a disk around the bulge, so the Bulge+Disk and the pure Disk probabilities increase and the pure Bulge probability decreases. Then a gas-rich major merger occurs at z 1.2, leading to significant central star formation with a corresponding increase in the Bulge⇠ and Bulge+Disk probabilities and a decrease in the pure Disk probability. The key message from this and other tests we have done is that the deep learning codes eciently extract information in the H-band images of the forming galaxy at most orien- tations that correlates with the astrophysical phenomena. Note also that the compaction due to gas inflow at z 3andthebulgegrowthduetoamergeratz 1.2leadtosimilar ⇠ ⇠ 7

last last last last

last last

~400Myrs ~400Myrs ~400Myrs

no event in in event no in event no in event no

~400Myrs

no event in in event no

No mergers No mergers No mergers No

[Preliminar!] [Preliminar!] [Preliminar!]

No mergers No [Preliminar!]

[Preliminar!]

microns] microns] microns]

microns]

time time time

35 VELA time

mock images mock images mock images mock

Mergers Mergers Mergers

time time

mock images mock

NIRCAM JWST JWST NIRCAM JWST NIRCAM JWST NIRCAM

3 filters [0.7-2.7 [0.7-2.7 filters 3 [0.7-2.7 filters 3 [0.7-2.7 filters 3 Mergers

mock images [Preliminar!] JWST NIRCAM

3 filters [0.7-2.7 [0.7-2.7 filters 3

~600Myrs ~600Myrs ~600Myrs

window of of window of window of window

zoom-in ~600Myrs window of of window Google supportssimulations Marc35 VELA H-C’s visits to UCSCNIRCAM Summer JWST 2016 and 2017, and his grad 3 filters [0.7-2.7mock images

student Fernando Caro’szoom-in visit P{merger} March-August 2017 using deep learning, CANDELS microns]NIRCAM JWST simulations CNN

images, and Primack group’s galaxy simulations3 filters to [0.7-2.7 understand galaxy formation

CNN CNN CNN P{merger}

microns] CNN

P{merger} P{merger}

P{merger} ROC curve UCSC group here today: Profs. David KooCNN, Joel Primack; grad students FernandoP{merger} Caro, Christoph Lee, Viraj Pandya, astrophysics senior thesis student Sean ROCLarkin curve [Preliminar!] [Preliminar!] ROC curve 35 VELA Mergers No mergers mock images 35 VELA zoom-in DM mass < Rv mock imagessimulations time no event in NIRCAM JWST zoom-in Lookback time window of last 3 filters [0.7-2.7 P[merger]

NIRCAM JWST~600Myrs ~400MyrsP{merger}

zoom-in zoom-in zoom-in zoom-in

Mergers No mergers

Lookback time Lookback time Lookback

Lookback time Lookback microns]

35 VELA VELA 35 VELA 35 VELA 35

simulations

zoom-in zoom-in

Lookback time Lookback

35 VELA VELA 35

simulations simulations simulations CNN

3 filters [0.7-2.7 simulations

DM mass < Rv CNN

P{merger}

DM mass < Rv < mass DM Rv < mass DM DM mass < Rv < mass DM time no event in Lookback time microns] window of last Rv < mass DM CNN ~600Myrs ~400Myrs INPUT INPUTOUTPUT INPUT Mergers No mergers

Example of DM mass < Rv time no event in P_merger P_merger Lookback time window of last 1 simulation ~600Myrs ~400Myrs P_merger

Mergers No mergers Redshift Redshift Redshift DM mass < Rv time no event in Redshift Redshift ROC curve ThisLookback is an timeoversimplified example,window of last Redshift where we just used the total dark~600Myrs ~400Myrs matter mass within the halo radius RV to estimate when a major merger Applied to all occurred. We are now analyzing 35 simulations the entire satellite galaxy population to determine when major and minor mergers and satellite fly-bys occur.

INPUT P_merger

Redshift Redshift Page 8 of 14 1 2 1 Page 9 of 14 Page 8 of 14 3 2 Page 9 of 14 4 3 5 4 1 5 6 2 1 6 7 3 2 7 SORT allows recovery of the 2-point 8 4 8 3 9 5 correlation function for s > 4 Mpc/h 9 4 SORT allows recovery of the 2-point 10 6 5 11 10 correlation function for s > 4 Mpc/h 11 7 6 12 8 7 13 12 Related UCSC deep learning project: better galaxy environment9 8 estimates 14 13 10 9 15 14 11 16 15 10 Here today: Dr. Doug Hellinger, grad students James Kakos, 12Dominic Pasquali 17 16 11 18 17 13 12 Images at various wavelengths (=>photometric19 18 redshifts, photo-z’s)14 are13 much more plentiful than 1 20 19 15 14 2 spectroscopic redshifts. HowPage can 8 of 14we best21 20 combine a few spectroscopic16 15 z’s with many photo-z’s Page 9 of 14 1 Page 8 of 14 3 22 21 17 16 to estimate the environment2 of each galaxy? A preprint by Nicholas Tejos, Aldo Rodriguez- 4 23 22Page 9 of 14 18 17 3 5 24 23 19 18 Puebla, and Joel4 Primack introduces a method (“sort”) to do this. Can deep learning do better? 6 1 24 20 19 5 25 7 2 251 21 6 26 20 8 3 26 22 7 27 2 21 9 4 spec-z’s photo-z’s28 273 sort23 z’s22 true z’s 10 8 SORT allows recovery28 of the 2-point 5 29 4 24 23 SORT allows recovery of the 2-point 11 6 9 29 10 correlation function30 5 for s > 4 Mpc/h 25 24 12 7 30 correlation function for s > 4 Mpc/h 11 31 6 26 25 13 8 31 12 32 7 27 26 14 9 33 32 15 13 338 28 27 10 14 34 9 29 28 16 11 35 34 17 15 3510 30 29 12 16 36 11 31 30 18 37 36 13 17 12 32 31 19 38 37 14 18 13 Page 6 of 14 33 32 20 Page 9 of 14 39 38 Page 6 of 14 15 19 14 34 21 40 39 33 22 16 20 4015 35 34 17 41 23 1 1 21 1 4116 36 35 18 42 24 2 2 22 2 4217 37 36 19 23 43 25 3 3 3 44 4318 38 37 20 24 4 26 4 4 45 4419 39 38 21 25 5 27 5 5 46 4520 40 39 22 26 6 28 6 6 47 4621 41 40 29 23 7 27 7 47 7 8 48 22 42 41 30 24 8 28 48 Ratio of measured and 8 9 49 23 43 42 31 25 29 49 9 9 10 50 24 44 43 true 2-point correlation 32 30 50 1026 10 11 51 25 45 44 33 27 11 31 51 function as a function 11 32 12 52 26 46 45 34 28 12 13 52 12 33 53 27 47 46 35 29 13 14 54 53 of redshift space 36 13 34 28 48 47 30 14 15 55 54 37 14 35 29 49 48 distance s. Sort gets it 31 15 16 56 55 38 15 36 30 50 49 -1 32 16 17 57 56 right for s > 4 h Mpc, 39 16 37 18 31 51 50 33 17 58 57 40 17 38 19 32 52 51 34 18 59 58 while photo-z’s fail 41 18 39 20 33 5352 19 60 59 -1 3519 40 21 34 5453 42 20 60 even at s > 40 h Mpc. 43 3620 41 22 35 5554 21 44 3721 42 23 36 5655 22 45 3822 43 24 37 5756 23 25 46 3923 44 38 5857 40 24 24 45 26 47 27 39 5958 41 25 46 48 25 28 40 6059 26 47 49 42 26 29 41 60 50 43 27 27 48 28 30 42 51 44 28 49 31 43 29 50 52 45 29 32 44 30 51 53 46 30 33 45 31 52 34 54 47 31 46 55 32 53 35 48 32 47 56 33 54 36 49 33 48 57 34 55 37 50 34 49 58 35 56 38 51 35 39 50 59 52 36 36 57 37 58 40 51 60 53 37 38 59 41 52 54 38 42 39 60 53 55 39 43 40 54 56 40 44 41 55 57 41 45 42 56 58 42 46 43 47 57 59 43 44 48 58 44 60 45 49 59 45 46 50 60 46 47 51 47 48 52 48 53 49 49 54 50 50 55 51 51 56 52 52 57 53 53 58 54 59 54 55 60 55 56 56 57 57 58 58 59 59 60 60 Related UCSC deep learning project: damped Lyα (DLA) systems in SDSS spectra Here today: Dr. Shawfeng Dong; co-authors David Park, Prof. J. Xavier Prochaska, Dr. Zheng Cai

DLA systems seen in quasar spectra, corresponding to at least 2x1020 hydrogen atoms/cm2, represent most of the neutralDLA hydrogen detection in the using universe deep learningat redshifts z = 2 to 4. About 7000 DLAs were identified by astronomers in about 100,000 quasar spectra. The additional 270,000 DLA detection using deepsightlines learning thatDLA recently detection DLAbecame using detection deep available learning using deep from learning the Sloan Digital Sky Survey were scanned for DLAs by a deep learning code, and the resulting DLA catalog will be made publicly available.

The sightline is broken into 400 pixel segments in a sliding window, so 1748 DLA DLA inference computations must be made for each sightline. Using each of the 1748 pixels in the sightline as the center point of a 400 pixel window generates a prediction per pixel. This approach facilitates identifying overlapping DLAs and generates a large training dataset. Figure 15: The sightline is broken into 400 pixelFigure segments 15: The in a sightline slidingFigure window, 15: is brokenThe so sightline into 1748 400 inference is pixel broken segments computations into 400 in pixel a sliding must segments window, in a so sliding 1748 inference window, socomputations 1748 inference must computations must be made for each sightline. Using each of thebe 1748 made pixels for each in thebe sightline. sightline made for Using as each the sightline. each center of pointthe Using 1748 of a eachpixels 400 of in the the 1748 sightline pixels as in the the center sightline point as of the a center400 point of a 400 pixel window generates a prediction per pixel. Thispixel approach window facilitates generatespixel window aidentifying prediction generates overlapping per pixel. a prediction This DLAs approach per and pixel. facilitates This approach identifying facilitates overlapping identifying DLAs overlapping and DLAs and generates a large training dataset. generates a large traininggenerates dataset. a large training dataset. identified sightlines with or withoutidentified a DLA sightlinesofidentified the window with sightlines andor without the localization with a DLA or without labelof skips the a DLA windowof and the the window localization and the label localization skips label skips (simple binary classification). We then(simple created binaryfrom(simple classification).60 to binary0 suddenly. classification). We then This created would We then certainlyfrom created60 to 0 fromsuddenly.60 to This0 suddenly. would certainly This would certainly ± ± ± a separate network that performed localizationa separate networkcausea separate a that learning performed network algorithm that localization performed trouble, and localizationcause to side a learningcause algorithm a learning trouble, algorithm and to trouble, side and to side by improving upon the simple binary classifica-by improvingstep uponby the improving the problem simple upon we binary do the not classifica- simple train thebinary algorithmstep classifica- the problemstep we the do problem not train we the do algorithm not train the algorithm tion and outputting a predicted center pointtion and of outputtingontion these and a cases, predicted outputting and during center a predicted point inference of center weon do point these not of cases,on and these during cases, inference and during we do inference not we do not the DLA relative to the current input.the Finally DLA relativeneedthe the to DLA the values relative current at these to input. the edges current Finally to be input. accurate,need Finally the valuesneed at these the values edges at to these be accurate, edges to be accurate, An outline of the neural network architecture used, three convolutional we trained a third model to take the sampleswe trained athey thirdwe go trained model unused a tolayers third and take untrained. modelfollowed the samples to by take a fully thethey samplesconnected go unused layer.they and goThese untrained. unused layers and of untrained. the network are where a DLA exists somewhere in the windowwhere a DLA existsThewhere figure somewhere a DLA17sharedshows exists in somewherethe acomponents. sightline window and in The the the windowThefinal posi- figure layer 17of showstheThe network figure a sightline17 hasshows 3 and independent a the sightline posi- and the posi- and trained it to predict the column densityand trained of tive it toand (green), predict trained negativethe itfully column to predictconnected (red), density the and column layers. ignored of tive densityEach (grey) (green), of ofthese negative tive3 layers (green), (red), connects negative and ignored to (red),the shared (grey) and ignored fully (grey) the DLA, regardless of its location. Wethe found DLA, regardlessregions.the DLA, Notablyof its regardless location.connected there of are We itslayer. far found location. more The negativenetworkregions. We found is Notably trainedregions. thereusing Notably arethe farAdam more there gradient negative are far descent more negative that combining the models into one producedthat combiningregionsthat the combining models than positiveoptimizer into the one regionsmodels produced in Tensor-flow. into of the one sightline.regions produced than positiveregions regions than positive of the regions sightline. of the sightline. better results than training each modelbetter inde- resultsThis thanbetter is the training results reason eachthan that our modeltraining training inde- each dataset modelThis only is inde- the reasonThis that is the our reason training that dataset our training only dataset only pendently. An astute reader will note thatpendently. us- Ancontainspendently. astute sightlines reader An will astute with note readerDLAs. that us- willIn training notecontains that we us- sightlinescontains with sightlines DLAs. In with training DLAs. we In training we ing a single model for all 3 of these outputsing a singlesample modeling a all for single positive all 3model of samples these for outputs all from 3 of the these sightline,sample outputs all positivesample samples all positive from the samples sightline, from the sightline, will force us to train on areas of the sightlinewill force usand towill train an force equal on us areas number to train of the of on negative sightline areas of sightlines theand sightline an at equal numberand an equalof negative number sightlines of negative at sightlines at where no DLA exists, rendering the resultwhere of no DLAchosenwhere exists, at no random rendering DLA so exists, that the renderingwe result maintain of the achosen result50/50 at of randomchosen so that at random we maintain so that a 50/50we maintain a 50/50 the column density measurement irrelevant.the columnbalance densitythe column between measurement density positive measurement irrelevant. and negative regions.balance irrelevant. betweenbalance positive between and negative positive regions. and negative regions. We side step this issue by masking the gradientWe side step thisFigureNoteWe issue side that bystep 2: we maskingAn this also outline issue ignore the by gradient regions of masking the neural of the the gradient sight-Note network that we architecture alsoNote ignore that we regions used. also See ignoreof the section sight- regionsV offor the a sight- more detailed description. appropriate, and discuss this in detailappropriate, in the line andappropriate, where discuss Lyb this andabsorption in discuss detail takes in this the in place, detailline these in where the Lyblineabsorption where Ly takesb absorption place, these takes place, these section on multi task learning V. section on multiregionssection task are learning on often multi falselyV task. detected learning asV. sub-DLA’s,regions are oftenregions falsely are detected often falsely as sub-DLA’s, detected as sub-DLA’s, A visualization of the labels for each pixelA visualizationNeuraland evenA of visualization Network the bonafide labels DLAs for of each the Architecture when labels pixel the for true eachand DLA even pixel bonafideand even DLAs bonafide whenare the the DLAs true flux when DLA values the true of DLA a 400 pixel segment of in a sightline is shown in the visualizationin a sightlinehasin islog a shown sightlineNHI > in21 the. is Training shown visualization in on the these visualization regionshas log NHI >has21.log TrainingNHI > on21. these Training regions on these regions 16. We demonstrate how the data is labeled,16. We demonstratedoes16. not We stop how demonstrate the the algorithm data howis labeled, from the datalearning, isdoes labeled, but not stop thedoes algorithm not stopthe from the algorithmsightline learning, frombut (for learning, each sightline but we pass it every The neural network architecture is shown in I think classification is fairly straightI think forward,classification asI long think lowers isclassification fairly its straight accuracy is forward, fairly by training straight as long on forward, labelslowers as that long its accuracylowers by its training accuracysuch on by 400 labels training pixel that on segment labels that as separate samples, some as the DLA is within 60 pixelssome of the 400assome pixel the DLAfigureindicate is withinas the2. no DLA 60 We absorption pixels is use within of athe exists 60 convolutional 400 pixels when pixel of it the does.indicate 400 It’s pixel neural no absorptionindicate net- exists no absorption when it does. exists It’s when it does. It’s 1748 of them). The input is not normalized, words window it takes on a 1/truewords value, elsewindow 0/false.wordswork it takesinstructivewindow with on a 1/true it three to takes point value, on convolutional out a else 1/true that 0/false. the value, algorithm elseinstructive layers 0/false. is followed to pointinstructive out that to point the algorithm out that the is algorithm is are miss- Localization operates similarly,are miss- thoughLocalization insteadare miss- not operatesLocalization trained similarly, in a operates manner though similarly, that instead would though allownot instead it trained to in anot manner trained that inbut awould manner we allow do that it pad would to the allow blue it to end of the spectrum ing here of a 0/1 value the output ising an here offset betweenofing a 0/1 hereby valueidentify aof fully the a output0/1 the connected value difference is an the offset output between layer. between is an a offsetLya absorp-identify between the differenceidentify thebetween difference a Lya betweenabsorp- a Lya absorp- to lrest = 900Å if it does not have data in that [ 60, +60], essentially pointing to the[ DLA,60, +60],Convolutionaltion essentially[ and60, + a60 Ly pointing],b essentiallyabsorption. neural to the pointing DLA, Although networks totion the this DLA,and is essentially a Lybtionabsorption. and a Lyb Althoughabsorption. this Although is this is range as discussed in ??. It goes through three and column density is either the columnand den- columnbreakpotentially densityand an column isimage either feasible, density the into we column is did component either not den- the include columnpotentially piecesit in den- feasible, whichpotentially we did feasible, not include we did it not in include it in sity of a DLA if one is within 60 pixelssity of that of a DLAthe ifsity scope one of is a of within DLA this if work 60 one pixels isas within we of can that 60 simply pixelsthe com- of scope that of thisthe work scope as of westandard this can work simply as we convolutional com- can simply com- layers each followed point, or 0 (we again reference the sectionpoint, on orare 0 (wepute matchedpoint, again the Ly or referenceb 0location to(we different again the and section reference mark parts on any the identified of sectionpute the the image on Lyb locationpute you the and Lyb marklocation any andidentified mark any identified by a pooling layer (essentially down sampling multi task learning for further details heremultiV). taskare learningabsorption lookingmulti for task asfurther learning at, Lyb thenin details post-processing. for furtherbuilds here V). details theseabsorption here backV). as up Lyabsorptionb inin post-processing. as Lyb in post-processing. A notable problem occurs at the boundaryA notable problemConvolutionalA notable occurs problem network at the architecture: boundary occurs at thethe boundary neuralConvolutional networkConvolutional architecture:the network inputthe architecture: neural size) whichthe neural breaks the sightline into a hierarchical fashion to construct the desired where a valid DLA is 61 pixels from thewhere center a validnetwork DLAwhere is is a 61 validconstructed pixels DLA from is using 61 the pixels center a fairly from standardnetwork the center is constructednetwork usingis constructedcomponent a fairly standard using pieces. a fairly standard Then a fully connected layer result. The process of breaking the image, or is applied at the end. 14 14 sightline14 in our case, into component pieces, The architecture to that point is fairly sim- and learning what those components are that ple, three convolutional layers followed by a are most beneficial in producing a correct re- fully connected layer. These layers of the net- sult, forces the neural network to learn con- work are shared components. The final layer cepts such as measuring column density, rather of the network has 3 independent fully con- than simply memorizing the input. Hence con- nected layers. Each of these 3 layers connects volutional network generalize to unseen sam- to the shared fully connected layer, and each of ples quite well. the 3 output layers has a loss function: 1) sig- The structure of the network with detailed moid/cross entropy loss for [0, 1] classification; hyperparameters is provided in section V. 2) square-loss for [ 60, +60] localization; and The architecture consists of an input, which 3) square-loss for real valued column density

5 Talk at 30 May 2017

New Insights on Galaxy Formation from Comparing Simulations and Observations

Joel Primack Distinguished Professor of Physics Emeritus, UCSC

Brief introduction to modern cosmology, based on ΛCDM: dark energy & dark matter

Cosmic large scale structure simulations and star formation in galaxies

Comparing high-resolution hydrodynamic galaxy simulations with observations

Astronomers used to think that galaxies form as disks, that forming galaxies are pretty smooth, and that galaxies generally grow in radius as they grow in mass — but Hubble Space Telescope data show that all these statements are false, and our simulations may explain why.

We are using these simulations and deep learning to improve understanding of galaxy formation, with support from Google.