
Analyzing Data From Astronomical Surveys: Issues and Directions

Eddington, Malmquist, Lutz-Kelker and all that

Tom Loredo, Dept. of Astronomy, Cornell University
http://www.astro.cornell.edu/staff/loredo/

SAMSI SPS colleagues: Ruth Barrera, David Chernoff, Pablo de la Cruz, Martin Hendry, Woncheol Jang, Hyunsook Lee, Ji Meng Loh, Haywood Smith, Antonio Uribe, Michael Woodroofe . . .

Collaborators: Ira Wasserman, Brett Gladman, Jean-Marc Petit . . .

We Survey Everything!

Lunar Craters, Solar Flares, TNOs

Stars & GRBs

Distributions Galore

Basic: Directions and Fluxes — log(N)–log(S) curves, number counts, size-frequency dist’ns . . .

∼1200 GRBs from BATSE 4B Catalog

Advanced: Directions, Fluxes and Indicators & Distance Distributions

10⁴ Galaxies from Millennium Catalog

“Houston, we have a problem . . . ”

Selection effects (truncation, censoring) — obvious (usually)
• Extensively discussed in earlier SCMA meetings

“Houston, we have a problem . . . ”

“Scatter” effects (measurement error, etc.) — insidious
• Emphasized in this presentation

Outline

1 Observables, desirables & integrals — Brightness distributions; Distance–luminosity distributions

2 Cornucopia of complications — Selection effects; Scatter biases

3 Statistical methodology — Current practice; Promising directions

“You can’t always get what you want”

What we want

The distribution of sources in space: number density (intensity function for a possibly inhomogeneous Poisson point process)

n(r) = n(r, Ω) at distance r, direction Ω
     = n(r) (isotropic case)

The distribution of source luminosities: the luminosity function (luminosity = total energy emitted per unit time)

f (L; r) = pdf for L given r — “evolution”
         = f (L) — “universal” (isotropic)
         = δ(L − L0) — “standard candles”

What we get

The primary observables are direction, Ω, and flux (energy/unit time/unit area):

F = L/(4πr²)  ← the “root of all evil!”

Note this conflates r and L!

Further complications (ignored here!)

• Passbands/k-corrections
• Extinction
• Reflective sources (F ∝ 1/r⁴)
• Transient/variable sources
• . . .

Geometry of space-time alters inverse-square law (redshift, time dilation) and volume element in spatial integrals:

Fb = K(r, χ, α) Lbol/(4πr²)

χ = cosmological params (H0, Ωm, ΩΛ);  α = spectral parameters

Redshift z (observable) can be used as (partial) surrogate for r via Hubble’s law:

z = (λ − λ0)/λ0 = (vHub + vpec)/c  →  cz = H0 r + vpec

What We Really Want

Physics works in phase space: positions and velocities. We’d really like to infer ρ(r, v) — all the issues of inferring n(r), plus issues from imperfect measurement of v. Applications:

• Stars in Milky Way — Galactic dynamics, dark matter
• Stars in nearby dwarf galaxies — dark matter on small scales
• Galaxy proper motions — large scale structure

We’ll mostly focus on position + luminosity, but upcoming peculiar velocity and parallax surveys (6dF, GAIA) are where some of these issues will be most crucial.

Diverse (Arcane!) Units

Radio, x-ray & γ-ray surveys, and surveys of other quanta (cosmic rays, grav’l radiation, neutrinos) use energy units directly (L, F).

Optical & IR surveys use M instead of L:

M ≡ −2.5 log10(L/Lfid),  Φ(M; r) = pdf for M given r

and m instead of F:

m ≡ −2.5 log10(F/Ffid) = −2.5 log10[L/(4πr² Ffid)] = M + µ

with µ instead of r:

µ = 5 log10(r/10 pc)  (stars)
  = 5 log10(r/1 Mpc) + 25  (galaxies)

Fundamental Equation of Stellar Statistics

Connecting Observables to Desirables

Σ = density of sources wrt observables (per unit sr, per unit mag or flux)

Σ(F, Ω) = ∫dr ∫dL r² n(r) f (L; r) δ[F − L/(4πr²)]
        = 4π ∫dr r⁴ n(r) f (4πr²F; r)

Σ(m, Ω) = ∫dr ∫dM r² n(r) Φ(M; r) δ[m − (M + µ)]
        = ∫dr r² n(r) Φ(m − µ(r); r)

If either the density or luminosity function is known, and if Σ is accurately measured, this is a Fredholm integral equation.
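Viewed numerically, the second form of Σ(m, Ω) is a one-dimensional quadrature over distance; a minimal sketch, assuming a hypothetical uniform density and Gaussian magnitude function (the functions `mu`, `Phi`, `Sigma` and all numbers are illustrative choices, not from the talk):

```python
import numpy as np

# Quadrature sketch of Sigma(m) = \int dr r^2 n(r) Phi(m - mu(r); r).
# Hypothetical choices: uniform density n(r) = 1 out to r_max (pc), and
# a Gaussian absolute-magnitude function Phi(M) = N(M0, sig^2).

def mu(r):
    """Distance modulus, mu = 5 log10(r / 10 pc), r in pc."""
    return 5.0 * np.log10(r / 10.0)

def Phi(M, M0=-21.0, sig=0.3):
    """Gaussian luminosity function in magnitudes."""
    return np.exp(-0.5 * ((M - M0) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

def Sigma(m, r_max=100.0, n_pts=20000):
    """Sigma(m) via a simple Riemann sum over distance; n(r) = 1 inside r_max."""
    r = np.linspace(1e-3, r_max, n_pts)
    dr = r[1] - r[0]
    return np.sum(r**2 * Phi(m - mu(r))) * dr
```

For m well inside the survey volume, this uniform-density case reproduces the classic Euclidean number counts, Σ(m) growing by a factor 10^0.6 per magnitude.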

But Σ is sampled (incompletely), often with measurement error, and usually both integrands are uncertain.

Visualizing the Integral

Luminosity: M ∼ N(−21, 0.3²); Density: uniform to r = 100, then a linear drop. Curve has constant m = 13.25 ± 0.25.

Indicators

Indicators are additional observables, σ, that help make r and L identifiable ⇒ unravel the integral. (Like “instruments”?)

Two classes

• Direct: knowing σ → knowing either r or L
• Stochastic: r or L are correlated with σ

Three types (usually all called “distance indicators”)

• Distance indicators: p(r | σ)
• Luminosity indicators: p(L | σ)
• Size indicators: p(D | σ) → r via geometry

Direct Distance Indicator: Parallax

Parallax directly measures the distance to nearby stars:

tan p = 1 AU/r  →  ̟ ≡ p/1 arcsec ≈ 1 pc/r,  so  r = 1 pc/̟

“Parallax” is sometimes used as a synonym for distance. Conventional symbol is “π”.

Similar geometric considerations → orbits of minor planets.

Stochastic Luminosity Indicators

Measure a source property σ so that

L ∼ fσ(L), hopefully narrow (≲ 30%)!

σ = Color/spectral type of star (H-R diagram, “photometric parallax”)

= Period & color of periodic variable star (Period-luminosity rel’n)

= Asymptotic rot’n velocity of spiral galaxy (Tully-Fisher)
= Velocity dispersion & angular size of elliptical galaxy

(Fundamental plane)

= Shape & color of SN Ia light curve
. . .

Can consider measurement of σ → estimate of L with “measurement/instrumental error” (real σ msmt. error may compound this)

Stellar Luminosity Indicators

Hipparcos H-R Diagram, Variable Star Period-Lum. Rel’n

Galaxy Luminosity Indicators

Tully-Fisher, Fundamental Plane

Inferential Goals

• Estimate shape of Σ(m) (no aux. info)
• Estimate characteristics r, L for each object
• Estimate fσ(L) (“calibration”)
• Estimate n(r) with fσ(L) known
• Estimate f (L) for entire population (mixture of fσ)
• Detect/estimate evolution, f (L; r)
• Jointly estimate n(r), f (L; r)
• Estimate cosmological parameters (n and f are nuisances)
• . . .

Scatter Biases in Univariate Distributions

“Eddington Bias”

[Figure: true density n vs. r, and the estimated density vs. r̂, with measurement uncertainty scattering counts between ranges]
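A minimal Monte Carlo sketch of this effect, assuming a hypothetical Gaussian population and known Gaussian measurement error:

```python
import numpy as np

# Eddington bias sketch: known measurement errors scatter sources
# between ranges, depressing observed counts near a peak of the true
# distribution and inflating them in the tails. Numbers are illustrative.

rng = np.random.default_rng(42)
n = 200_000
x_true = rng.normal(0.0, 1.0, n)            # true quantities, peaked at 0
x_obs = x_true + rng.normal(0.0, 0.5, n)    # add known Gaussian error

bins = np.linspace(-4.0, 4.0, 33)
true_counts, _ = np.histogram(x_true, bins)
obs_counts, _ = np.histogram(x_obs, bins)

peak = int(np.argmax(true_counts))
deficit = true_counts[peak] - obs_counts[peak]           # counts lost at peak
excess = obs_counts[-6:].sum() - true_counts[-6:].sum()  # counts gained in tail
```

A positive `deficit` is exactly the situation where, in Jeffreys' words, "the observed number will need a positive correction."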

“A series of quantities are measured and classified in equal ranges. Each measure has a known uncertainty. On account of the errors of measurement some quantities are put into the wrong ranges. If the true number in a range is greater than those in the adjacent ranges, we should expect more observations to be scattered out of the range than into it, so that the observed number will need a positive correction.” (Jeffreys 1938)

Luminosity Calibration via Parallax
“Lutz-Kelker Bias”

Source complications:
• Parallax error (λ ≡ σ̟/̟)
• Flux error
• Transformation bias
• Density law

Selection complications:
• Magnitude (flux) truncation
• Parallax censoring
• Usually “soft” (random)

Distance Estimation via Luminosity Indicator
“Malmquist Bias”

Source complications:
• Indicator scatter
• Transformation bias
• Flux error

Selection complications:
• Magnitude/flux truncation
• Usually “soft” (random)

Distance Errors Due to Indicator Scatter

[Figure: average true radius r of sources assigned r̂ (top), and fractional deviation δr/r (bottom), vs. r̂ over 0–160]

Two Classes of Methods

[Diagram: population model (θ) → observables (r, L, F, z) via mapping φ → measurements → catalog, with indicator scatter, transformation bias, measurement error, truncation, and censoring entering along the observation and selection stages]

Inverse methods
• Try to “correct” or “debias” data via adjustments/weights
• Focus on moments & empirical dist’n function (EDF)

Forward modeling methods
• Try to predict data by applying obs. process to model
• Focus on likelihood

(Analogous to “design-based” vs. “model-based” methods in survey sampling)

Seminal Work

Eddington (1913, 1940) & Jeffreys (1938)
• Measurement error in univariate dist’ns (“density deconvolution/demixing”)
• Adjusted EDF/estimates vs. likelihood (Eddington vs. Jeffreys)

Malmquist (1920)
• Correct (r, L) dist’ns for truncation by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)

Lutz-Kelker (1973)
• Correct parallax distances for scatter by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)

Modern syntheses: Teerikorpi, Hendry & Simmons, Willick, Smith, Sandage & Saha . . .

Some Recent Methodology

Parametric corrections for scatter
• Follow original moment-correction procedures of Malmquist/L-K but with simple parameterized n(r)
• Adjust parameters via ML or ad hoc iterative procedure

Robust corrections for truncation
• Use product-limit estimators accounting for truncation
• Identify “plateau region” where other biases are minor

Parametric & Stepwise Max. Like. (SWML; Efstathiou et al.)

• Parametric methods adopt Schechter form for f (L) (gamma dist’n) — 2dFGRS Galaxy Lum. Func. (N+ ’02)
• SWML motivated by Lynden-Bell’s C⁻ method, but with piecewise-constant model

ML with finite mixtures
• Fit to fixed grid of ∼50+ normals
• Account for m error via marginalization — SDSS Galaxy Lum. Func. (B+ ’03)

Current Status

Brightness distributions:
X Parametric modelling (via marginalizing latent variables)
X Nonparametric estimates with no meas. error
× Nonparametric estimates with scatter and truncation

(r, L) distributions:
X Parametric f (L) given n(r), or n(r) given f (L)
X As above, nonparametric with truncation/censoring; no meas. error
× Everything else (including ext. to phase space)!

What Won’t Work

“Why not just wait and see? √N will save us!” Alas . . .

• Scatter biases don’t go away asymptotically!
• Convergence may be much slower than √N!

Measurement Error Doesn’t “Average Out”

Measure m = −2.5 log(flux) from sources following a “rolling power law” distribution

f (m) ∝ 10^[α(m−23) + α′(m−23)²]

[Figure: f (m) vs. m, with local slope α at m = 23]

Measurements have uncertainties ≈ 1% (bright) to 30% (dim).

Analyze simulated data with a likelihood ignoring measurement error, and with a (Bayesian) marginal likelihood. Parameter estimates (MLEs) with (dots) and without (circles) error marginalization are shown in the accompanying figure.
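The failure mode can be isolated in a toy calculation: an estimator that ignores known measurement error converges, but to the wrong value. A minimal sketch with hypothetical scatter values:

```python
import numpy as np

# Why sqrt(N) won't save us: the sample variance of noisy magnitudes
# estimates sig_pop^2 + sig_err^2, not sig_pop^2, no matter how large
# N gets. Accounting for the known error removes the bias. Numbers
# here are hypothetical.

rng = np.random.default_rng(1)
sig_pop, sig_err = 0.3, 0.2

n = 100_000
m_true = rng.normal(20.0, sig_pop, n)
m_obs = m_true + rng.normal(0.0, sig_err, n)

naive = m_obs.var()                    # ignores the error: near 0.13
corrected = m_obs.var() - sig_err**2   # uses the known error: near 0.09
```

Increasing `n` shrinks the scatter of `naive` around 0.13, not its offset from the true 0.09; that is the sense in which the bias never averages out.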

Uncertainties don’t average out! This gets rediscovered in a new field every decade.

A Role for Shrinkage?

Eddington (1913, 1926, 1940; before Stein!)

• 1913: Estimate n(r) via derivatives of EDF (binned)
• Later: Sought adjustments to best-fit (unbiased) source estimates that decreased the summed squared errors
• Emphasized that improved estimates were not good for inferring population properties
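Eddington's later adjustments are in the spirit of what is now called shrinkage; a minimal empirical-Bayes sketch (the shrinkage factor B and all numbers are illustrative, not Eddington's procedure):

```python
import numpy as np

# Empirical-Bayes shrinkage: pull each noisy estimate toward the sample
# mean by B = sig^2 / (sig^2 + tau^2). The summed squared error drops,
# but the shrunk values understate the population scatter, exactly
# Eddington's caution about population inference. Toy numbers only.

rng = np.random.default_rng(7)
k, sig = 500, 1.0                        # number of sources, known error sd
theta = rng.normal(0.0, 1.0, k)          # true source values (tau = 1)
x = theta + rng.normal(0.0, sig, k)      # raw per-source (MLE) estimates

tau2_hat = max(x.var() - sig**2, 1e-12)  # estimated population variance
B = sig**2 / (sig**2 + tau2_hat)
x_shrunk = x.mean() + (1.0 - B) * (x - x.mean())

sse_raw = np.sum((x - theta) ** 2)       # summed squared error, raw
sse_shrunk = np.sum((x_shrunk - theta) ** 2)  # smaller after shrinkage
```

The shrunk values beat the raw ones in total squared error, yet their sample variance understates the true population spread, which is why the right amount of shrinkage depends on the inferential goal.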

The Eddington 1940 approach is a variant of shrinkage estimation. It suggests empirical Bayes/hierarchical Bayes generalizations (Eddington anticipated this!).

Cautions: Varying goals require varying shrinkage!

Inferential goals — Reduce MSE of what?

Conventional shrinkage minimizes errors in individual estimates.

To estimate a histogram (EDF), minimize MSE between reported & true EDF → less shrinkage (Louis ’84).

To rank values, minimize SE on ranks → shrinkage of ranks toward mid-rank (produces non-integer ranks).

Triple-goal estimates — compromise shrinkage to produce a single summary that improves on BLUE/MLE for all three tasks (Louis & Shen ’99).

Nonparametric Estimation With Scatter

“Farewell, root-N!”

Simplest case: Estimate Σ(m) from noisy measurements:

mi ∼ Σ(m) (iid)
mi,obs = mi + ǫi,  p(ǫi) known

This is known as the (density) deconvolution problem, or as mixing density estimation. Methods are available using KDE, but convergence can be extremely poor. Performance depends strongly on the error distribution.

“Smooth” error dist’ns (e.g., Gamma, Laplace)
• Characteristic function falls as a power law, ∼ |t|^(−β)
• Slow polynomial convergence, e.g. ∼ O(1/n^(1/6))

“Supersmooth” error dist’ns (e.g., Normal, Cauchy)
• Characteristic function falls exponentially, ∼ e^(−|t|^β/γ)
• Logarithmic convergence, e.g. ∼ O((log n)^(−1/2)) — yikes!

Convergence also slows as target Σ(m) gets smoother.

Role of Deconvolution/Demixing

• Rates are discouraging: “The problem of consistent estimation is effectively insoluble in practical terms. . . . Even when the rate is polynomial, it is often particularly slow unless the [error density] contains a discontinuity” (Carroll & Hall ’04)
• Limited literature on multivariate density deconvolution indicates similar behavior in > 1-D
• Many methods assume homoscedasticity; very little work on deconvolution + selection (Stefanski & Bay ’96; Zhang+ ’00)
• Adding some parametric info may help a lot (splines — Mendelsohn & Rice ’82; Berry+ ’02; semiparametric Bayes — Carroll et al. ’04)
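A minimal Fourier-domain deconvolution sketch for the Gaussian (supersmooth) case; the band limit T, the scatter, and the sample size are hypothetical choices:

```python
import numpy as np

# Fourier deconvolution for Gaussian error: divide the empirical
# characteristic function by the error CF and cut off at a band limit T.
# Pushing T higher amplifies ECF noise by exp(sig_err^2 T^2 / 2), which
# is why the usable band, and hence the rate, grows only like sqrt(log n).

rng = np.random.default_rng(3)
n, sig_err = 20_000, 0.3
m = rng.normal(0.0, 1.0, n)               # true magnitudes, N(0, 1)
m_obs = m + rng.normal(0.0, sig_err, n)   # noisy measurements

T = 3.0                                   # hand-picked band limit
t = np.linspace(-T, T, 601)
ecf = np.array([np.mean(np.exp(1j * tt * m_obs)) for tt in t])
psi = ecf * np.exp(0.5 * (sig_err * t) ** 2)   # divide out the error CF

x = np.linspace(-4.0, 4.0, 161)
dt = t[1] - t[0]
f_hat = np.real(np.exp(-1j * np.outer(x, t)) @ psi) * dt / (2.0 * np.pi)
# f_hat approximates the N(0,1) density of the *true* magnitudes
```

With this generous T the N(0,1) target is recovered well; shrinking the error scale or the sample size quickly forces a smaller T and a visibly oversmoothed estimate, illustrating the logarithmic rates quoted above.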

Efromovich 1997 (adaptive nonlinear deconvolution): its role is “for visualizing an underlying density . . . and serving as a tool to suggest an appropriate parametric model . . . ”

Landy & Szalay 1992

A quasi-empirical Bayes approach

Data di provide estimates r̂i; true distances are ri.
Prior: p(ri) ∝ ri² n(ri); likelihood: L(ri) = lognormal

p(ri | di) = ri² n(ri) L(ri) / p(di)

p(di) = ∫dri ri² n(ri) L(ri)

LS92 set p(di) = p(r̂i) = Ψ(r̂i), a smoothed fit to {r̂i}
→ moments of p(ri | r̂i) can be found from Ψ(r̂i)
Use these to calculate corrections to r̂i
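The correction being approximated can be computed directly by quadrature; a minimal sketch assuming a hypothetical uniform n(r) and 20% lognormal indicator scatter:

```python
import numpy as np

# Posterior mean of r under prior p(r) ∝ r^2 n(r) and a lognormal
# likelihood centered on the indicator estimate r_hat. The uniform
# n(r), the scatter Delta, and r_hat are illustrative inputs.

def posterior_mean_r(r_hat, Delta=0.2, r_max=200.0, n_pts=8000):
    r = np.linspace(1e-3, r_max, n_pts)
    like = np.exp(-0.5 * (np.log(r_hat / r) / Delta) ** 2)  # lognormal in r
    post = r**2 * like          # r^2 * n(r) * L(r), with n uniform to r_max
    return np.sum(r * post) / np.sum(post)

r_corr = posterior_mean_r(50.0)
# For uniform n(r) the analytic answer is r_hat * exp(3.5 * Delta^2):
# the r^2 volume weighting pushes the estimate outward.
```

Here a 20% scatter moves a nominal distance of 50 outward by about 15%, the kind of correction LS92 extracts from their smoothed fit Ψ.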

Problems
• “Double counts” the data
• Doesn’t account for uncertainty in n(r)

Hierarchical Bayes

Population inferences

Let n(r) = n(r; θ). Infer θ directly from the data rather than via ri estimators.

p(θ | {di}) ∝ p(θ) L(θ)

L(θ) ∼ ∏i ∫dr r² n(r; θ) ℓi(r)

For noiseless msmts,

ℓi(r) = p(Fi, σi | r) = ∫dL p(Fi | L, r) p(σi, L | r) = 4πr² p(L = 4πr²Fi, σi | r)
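A toy grid-based evaluation of L(θ); the cutoff-radius density, the lognormal member likelihoods, and all numbers are illustrative assumptions, not the TNO/GRB analyses:

```python
import numpy as np

# Grid sketch of L(theta) = prod_i \int dr r^2 n(r; theta) l_i(r).
# Toy model: n(r; theta) uniform inside a cutoff radius theta, member
# likelihoods l_i(r) lognormal about noisy distance estimates r_hat_i.

rng = np.random.default_rng(11)
theta_true, Delta, N = 100.0, 0.2, 300

# Simulate: uniform in volume (CDF ∝ r^3), then 20% indicator scatter.
r_true = theta_true * rng.random(N) ** (1.0 / 3.0)
r_hat = r_true * np.exp(Delta * rng.normal(size=N))

r = np.linspace(0.5, 300.0, 3000)
ell = np.exp(-0.5 * (np.log(r_hat[:, None] / r[None, :]) / Delta) ** 2)

def loglike(theta):
    n_r = (r <= theta).astype(float) / theta**3   # normalized cutoff density
    terms = (r**2 * n_r * ell).sum(axis=1)        # per-object integral (∝)
    return np.sum(np.log(terms))

thetas = np.linspace(60.0, 160.0, 101)
ll = np.array([loglike(th) for th in thetas])
theta_mle = thetas[int(np.argmax(ll))]
# theta_mle should land near theta_true = 100 despite the 20% scatter
```

The marginalization over r inside the product is what lets the population parameter be estimated consistently even though no individual distance is well determined.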

For noisy msmts, introduce latent Fi, σi. This is what we do for TNOs, GRBs, pulsars . . .

Galaxy distances: Marginal distributions

Instead of p(r1 | r̂1), calculate

p(r1 | {di}) = ∫dθ p(θ | {di}) p(r1 | θ, d1)
             = ∫dθ p(θ | {di}) r1² n(r1; θ) ℓ1(r1)

LS92 corresponds to fixing θ = θ̂ and using a Gaussian approximation. No one has yet tried the full hierarchical calculation.

Galaxy distances: “Best” estimates

Different summaries are appropriate depending on goals:
• Individual distances
• Distribution of distances . . .

Directions for Cross-Disciplinary Research

• Terminology
• Nonparametric deconvolution/demixing
  – What is its role given slow convergence?
  – Study sensitivity to error dist’ns, heteroscedasticity
  – 2-D density deconvolution
  – Selection effects (= boundary corrections?)
• Shrinkage beyond Eddington
  – Role for multi-goal shrinkage?
  – Adaptive shrinkage?
• Parametric & semiparametric modelling
  – Empirical/hierarchical Bayes
  – Finite models + SIMEX, with selection . . .
• Beyond r and L (vpec, surface brightness, color . . . )