A Statistical Application in Astronomy: Streaming Motion in Leo I
Bodhisattva Sen
DPMMS, University of Cambridge, UK; Columbia University, New York, USA
ETH Zurich, Switzerland, 31 May 2012

Outline
1. Streaming motion in Leo I: Modeling; Threshold models
2. Separating signal from background: Method; Theory

Streaming motion in Leo I

What is streaming motion?
Local Group dwarf spheroidal (dSph) galaxies are small and dim. Is Leo I in equilibrium, or is it being tidally disrupted by the Milky Way? Such a disruption can give rise to streaming motion: the leading and trailing stars move away from the center of the main body of the perturbed system.

We have data on 328 stars: $(Y, \Sigma)$, the line-of-sight velocity and the standard deviation of its measurement error, and $(R, \Theta)$, the projected position, orthogonal to the line of sight. The data are contaminated with foreground stars (in the line of sight). The magnitude of streaming motion is likely to increase beyond a threshold radius and is likely to be aligned with the major axis of the system.

Statistical questions
Is streaming motion evident in Leo I? If so, how can it be described and estimated? To what extent can it be described by a threshold model?

Findings [Sen et al.
(AoAS, 2009)]: the magnitude of streaming motion appears to be modest and is (nearly) significant at the 5% level; it appears consistent with a threshold model; and the threshold radius is difficult to identify precisely.

Modeling
$$Y_i = \nu(R_i, \Theta_i) + \epsilon_i + \delta_i, \qquad \epsilon_i \sim N(0, \sigma^2) \text{ independent of } (R, \Theta),$$
with measurement error $\delta_i \mid \Sigma_i \sim N(0, \Sigma_i^2)$.

Cosine model: $\nu(r, \theta) = \nu + \lambda(r)\cos\theta$, with $\lambda$ nondecreasing ($\lambda \uparrow$). The estimators are
$$(\hat\nu, \hat\lambda) = \arg\min_{v \in \mathbb{R},\; u \uparrow} \sum_{i=1}^n \frac{\{Y_i - v - u(R_i)\cos\Theta_i\}^2}{\sigma^2 + \Sigma_i^2}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \left[\{Y_i - \hat\nu - \hat\lambda(R_i)\cos\Theta_i\}^2 - \Sigma_i^2\right].$$
Here $\lambda$ measures the effect of streaming, and $\cos\theta$ determines the deviation from the major axis.

[Figure: the estimate $\hat\lambda$ and the simulated effect of streaming.]

Is streaming evident?
Test $\lambda = 0$ using the log-likelihood ratio statistic; with no streaming, $\nu(r, \theta) \equiv \nu$, a constant. The P-value is approximately 0.055 (using a permutation test).

How much streaming is there at radius $r$? We need pointwise confidence intervals for $\lambda$. Result:
$$n^{1/3}\{\hat\lambda(r) - \lambda(r)\} \xrightarrow{d} \eta\,\mathbb{C}.$$
Estimating $\eta$ is tricky, so we need ways to bypass it: likelihood ratio (LR) based tests or bootstrap methods. The LR statistic is pivotal and can be used to construct CIs for $\lambda(r)$ (Banerjee and Wellner, AoS, 2001).

Likelihood ratio based method (Banerjee and Wellner, AoS, 2001)
To test $H_0: \lambda(r) = \xi_0$, use
$$\Delta\mathrm{SSE}(r, \xi_0) = \min_{v \in \mathbb{R},\; u \uparrow,\; u(r) = \xi_0} \sum_{i=1}^n \frac{\{Y_i - v - u(R_i)\cos\Theta_i\}^2}{\sigma^2 + \Sigma_i^2} - \min_{v \in \mathbb{R},\; u \uparrow} \sum_{i=1}^n \frac{\{Y_i - v - u(R_i)\cos\Theta_i\}^2}{\sigma^2 + \Sigma_i^2}.$$
Under $H_0$, $\Delta\mathrm{SSE}(r, \xi_0) \xrightarrow{d} \mathbb{D}$, and $\mathbb{D}$ does not contain any nuisance parameters. Inverting this family of tests (by varying $\xi_0$) gives a CI for $\lambda(r)$.

Bootstrap methods
We want to bootstrap $n^{1/3}\{\hat\lambda(r) - \lambda(r)\}$. Efron's bootstrap fails: we claim that the bootstrap estimate does not have any weak limit (Sen et al., AoS, 2010). The smoothed bootstrap works.

Lower (L) and upper (U) confidence limits for $\lambda(r_0)$ at levels 0.90 and 0.95 from the LR method and the bootstrap, with CP and the estimate $\hat\lambda$:

r0   | LR 0.90           | LR 0.95           | Bootstrap 0.90 | Bootstrap 0.95 | λ̂
     | L     U     CP    | L     U     CP    | L     U        | L     U        |
400  | 0     3.54  .901  | 0     3.86  .952  | 0     3.57     | 0     3.57     | 1.92
500  | 0.10  4.50  .882  | 0     5.02  .936  | 0     3.58     | 0     3.90     | 1.98
600  | 0.26  6.66  .827  | 0     7.30  .897  | 0     3.30     | 0     3.64     | 1.98
700  | 0.36  8.88  .913  | 0.05  9.56  .961  | 0     4.26     | 0     4.69     | 1.99
750  | 1.85  8.88  .906  | 1.37  9.56  .952  | 0.44  7.86     | 0     8.37     | 5.37

Threshold models
Goal: determine a "threshold" in the domain of a function where some "activity" takes place, either a rapid change in the function value or a discontinuity. Here we want to find the threshold radius in Leo I under the model
$$\nu(r, \theta) = \nu + \lambda(r)\cos\theta, \qquad \lambda(r) \begin{cases} = 0, & 0 \le r \le d_0, \\ > 0, & r > d_0. \end{cases}$$
Two approaches: change point versus split point.

Change point models
$$\nu(r, \theta) = \nu + \beta\,\psi(r - \rho)\cos\theta,$$
where $\rho$ is the change point and either $\psi(x) = 1\{x \ge 0\}$ or $\psi(x) = \max(0, x)$.

[Figure: the two choices of $\psi$, plotted for $-1 \le x \le 1$.]

Minimize
$$\mathrm{SSE}(v, \beta, r) = \sum_{i=1}^n \frac{\{Y_i - v - \beta\,\psi(R_i - r)\cos\Theta_i\}^2}{\hat\sigma^2 + \Sigma_i^2}.$$
Profile criterion: $\mathrm{SSE}(\hat v, \hat\beta, r) - \mathrm{SSE}(\hat v, \hat\beta, \hat r)$.

Split point model
We do not assume a
specific form for $\lambda(r)$, only that $\lambda \uparrow$. Instead, we find the stump closest to $\lambda$ (under misspecification) by minimizing
$$\kappa(b, r) = \int_0^r \lambda^2(s)\,ds + \int_r^\tau [\lambda(s) - b]^2\,ds.$$
The population parameters are $(\beta, \gamma) = \arg\min_{(b, r)} \kappa(b, r)$, and $\gamma$ is the split point. To estimate $\gamma$, we estimate $\lambda$ by $\hat\lambda$.

[Figure: rescaled version of the criterion function.]

Asymptotics for the split point: an $n^{1/3}$ rate of convergence?

Separating signal from background

Problem
Most astronomical data sets are polluted to some extent by foreground/background objects ("contaminants" or "noise") that can be difficult to distinguish from the objects of interest ("members" or "signal"). Contaminants may have the same apparent magnitudes, colors, and even velocities as signal stars. How do we separate out the "signal" stars? We develop an algorithm for evaluating membership: it estimates the model parameters and the probability that each object belongs to the signal population.

Example
Data on stars in nearby dwarf spheroidal (dSph) galaxies: $(X_{1i}, X_{2i}, V_{3i}, \sigma_i, \Sigma_{\mathrm{Mg},i}, \ldots)$. The velocity samples suffer from contamination by foreground Milky Way stars.

Approach
Our method is based on the Expectation-Maximization (EM) algorithm. We assign parametric distributions to the observables, in most cases derived from the underlying physics. The EM algorithm provides estimates of the unknown parameters (mean velocity, velocity dispersion, etc.) and the probability of each star belonging to the signal population; see Walker et al.
(2008).

A toy example
Suppose $N \sim \mathrm{Poisson}(b + s)$ is the number of stars observed, where $s$ is the rate for observing a signal star and $b$ is the foreground rate. Given $N = n$, we have $W_1, \ldots, W_n \sim f_{b,s}$, where the data are $\{W_i = (X_{1i}, X_{2i}, V_{3i}, \sigma_i)\}_{i=1}^n$ and
$$f_{b,s}(w) = \frac{b f_b(w) + s f_s(w)}{b + s}.$$
We assume that $f_b$ and $f_s$ are parameterized probability densities (modeled by the underlying physics).

For galaxy stars: the stellar density (number of stars per unit area) falls exponentially with radius $R$, and the distribution of velocity given position is assumed to be normal with mean $\mu$ and variance $\sigma^2 + \sigma_i^2$.

For foreground stars: the density is uniform over the field of view, and the distribution of the velocities $V_{3i}$ is independent of position $(X_{1i}, X_{2i})$. We take the distribution of $V_{3i}$ from the Besançon Milky Way model (Robin et al. 2003), which specifies velocity distributions of Milky Way stars along a given line of sight.

Model
The model is as in the toy example, except that the data are $\{W_i = (X_{1i}, X_{2i}, V_{3i}, \sigma_i, \Sigma_{\mathrm{Mg},i}, \ldots)\}_{i=1}^n$ and the parameter vector is $\beta = (s, b, \mu, \sigma^2, \ldots)$. We would ideally like to maximize the likelihood
$$L(\beta) = \prod_{i=1}^N \frac{b f_b(W_i) + s f_s(W_i)}{b + s},$$
but this is difficult.

The Likelihood
Let $Y_i$ be the indicator of a foreground star, i.e., $Y_i = 1$ if the $i$th star is a foreground star and $Y_i = 0$ otherwise. Note that the $Y_i$'s are i.i.d.
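The EM approach above treats the foreground indicators $Y_i$ as missing data. The following Python sketch illustrates the idea under simplifying assumptions that are not from the slides: it uses velocities only (no positions or $\Sigma_{\mathrm{Mg}}$), the foreground velocity density is a fixed, known function (a normal standing in for the Besançon model), and `p_fg` plays the role of $b/(b+s)$. Because $\sigma^2$ has no closed-form update when the $\sigma_i$ differ, the sketch updates it by golden-section search and keeps the new value only if it does not decrease the objective, so each iteration is a generalized EM step. The names `em_membership`, `golden_max`, `log_normal`, and `foreground_logpdf` are hypothetical, and the synthetic data use made-up parameters.

```python
import math
import random
import statistics


def log_normal(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def golden_max(f, lo, hi, tol):
    """Golden-section search for a maximizer of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def em_membership(v, err, fg_logpdf, mu, sigma2, p_fg, n_iter=100, tol=1e-6):
    """Generalized EM for a member + foreground velocity mixture (hypothetical helper).

    Members:    V_i ~ N(mu, sigma2 + err_i**2)
    Foreground: V_i has the fixed, known log-density fg_logpdf (an assumption).
    p_fg plays the role of b / (b + s).
    Returns (mu, sigma2, p_fg, w), where w[i] estimates P(star i is a member | data).
    """
    n = len(v)
    w = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior membership probability of each star.
        for i, (vi, ei) in enumerate(zip(v, err)):
            mem = (1.0 - p_fg) * math.exp(log_normal(vi, mu, sigma2 + ei ** 2))
            fg = p_fg * math.exp(fg_logpdf(vi))
            w[i] = mem / (mem + fg) if mem + fg > 0.0 else 0.0
        if sum(w) == 0.0:
            break  # no star supports the member component
        old_mu, old_sigma2 = mu, sigma2
        # M-step for the mixing proportion.
        p_fg = 1.0 - sum(w) / n
        # Conditional M-step for mu given sigma2: precision-weighted mean.
        prec = [wi / (sigma2 + ei ** 2) for wi, ei in zip(w, err)]
        mu = sum(pi * vi for pi, vi in zip(prec, v)) / sum(prec)

        # Conditional M-step for sigma2 given mu: search numerically and accept
        # the candidate only if it does not decrease the objective.
        def q(s2):
            return sum(wi * log_normal(vi, mu, s2 + ei ** 2)
                       for wi, vi, ei in zip(w, v, err))

        hi = max((vi - mu) ** 2 for vi in v) + 1.0  # q decreases beyond this
        cand = golden_max(q, 0.0, hi, 1e-6 * hi)
        if q(cand) >= q(sigma2):
            sigma2 = cand
        if abs(mu - old_mu) < tol and abs(sigma2 - old_sigma2) < tol:
            break
    return mu, sigma2, p_fg, w


def foreground_logpdf(x):
    # Hypothetical stand-in for a Besancon-type foreground velocity density.
    return log_normal(x, 0.0, 60.0 ** 2)


# Synthetic data with made-up parameters: 300 members, 100 foreground stars.
random.seed(1)
err = [random.uniform(1.0, 3.0) for _ in range(400)]
v = [random.gauss(50.0, math.sqrt(25.0 + e ** 2)) for e in err[:300]]
v += [random.gauss(0.0, 60.0) for _ in range(100)]

mu_hat, s2_hat, p_hat, w = em_membership(
    v, err, foreground_logpdf, mu=statistics.median(v), sigma2=100.0, p_fg=0.5)
print(f"mu={mu_hat:.2f} sigma2={s2_hat:.2f} foreground fraction={p_hat:.3f}")
```

The weights `w` act as membership probabilities. Extending the sketch to positions would, under the assumptions listed above, multiply each component's velocity density by its spatial density (exponential for galaxy stars, uniform for the foreground) in the E-step.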