Theoretical Statistics. Lecture 20
Total Page:16
File Type:pdf, Size:1020Kb
Theoretical Statistics. Lecture 20. Peter Bartlett 1. Recall: Functional delta method, differentiability in normed spaces, Hadamard derivatives. [vdV20] 2. Quantile estimates. [vdV21] 3. Contiguity. [vdV6] 1 Recall: Differentiability of functions in normed spaces Definition: φ : D E is Hadamard differentiable at θ D tangentially → ∈ to D D if 0 ⊆ φ′ : D E (linear, continuous), h D , ∃ θ 0 → ∀ ∈ 0 if t 0, ht h 0, then → k − k → φ(θ + tht) φ(θ) − φ′ (h) 0. t − θ → 2 Recall: Functional delta method Theorem: Suppose φ : D E, where D and E are normed linear spaces. → Suppose the statistic Tn : Ωn D satisfies √n(Tn θ) T for a random → − element T in D D. 0 ⊂ If φ is Hadamard differentiable at θ tangentially to D0 then ′ √n(φ(Tn) φ(θ)) φ (T ). − θ If we can extend φ′ : D E to a continuous map φ′ : D E, then 0 → → ′ √n(φ(Tn) φ(θ)) = φ (√n(Tn θ)) + oP (1). − θ − 3 Recall: Quantiles Definition: The quantile function of F is F −1 : (0, 1) R, → F −1(p) = inf x : F (x) p . { ≥ } Quantile transformation: for U uniform on (0, 1), • F −1(U) F. ∼ Probability integral transformation: for X F , F (X) is uniform on • ∼ [0,1] iff F is continuous on R. F −1 is an inverse (i.e., F −1(F (x)) = x and F (F −1(p)) = p for all x • and p) iff F is continuous and strictly increasing. 4 Empirical quantile function For a sample with distribution function F , define the empirical quantile −1 function as the quantile function Fn of the empirical distribution function Fn. −1 F (p) = inf x : Fn(x) p = X , n { ≥ } n(i) where i is chosen such that i 1 i − <p , n ≤ n and Xn(1),...,Xn(n) are the order statistics of the sample, that is, X X and n(1) ≤···≤ n(n) Xn(1),...,Xn(n) is a permutation of the sample (X1,...,Xn). 5 Quantiles Define φ : D[a,b] R as the pth quantile function φ(F )= F −1(p). → Here, D[a,b] is the set of cadlag functions on [a,b], considered as a subset of ℓ∞[a,b]: cadlag = continue a` droite, limite a` gauche = right continuous, with left limits. 6 Quantiles Theorem: If F D[a,b], • ∈ xp satisfies F (xp)= p, • ′ F is differentiable at xp, with F (xp) > 0, • then φ is Hadamard-differentiable at F tangentially to h D[a,b]: h is continuous at xp . { ∈ } Its Hadamard derivative is the (continuous) function ′ h(xp) φF (h)= ′ . −F (xp) 7 Quantiles Recall: −1 ′ p sx(F (p)) φ (sx F )= − F − f(F −1(p)) (sx F )(xp) = −′ . − F (xp) 8 Quantiles Theorem: For 0 <p< 1, • F differentiable at F −1(p), • F ′(F −1(p)) = f(F −1(p)) > 0, • n −1 −1 −1 1 1[Xi F (p)] p √n F (p) F (p) = ≤ − + oP (1) n − −√n f(F −1(p)) Xi=1 p(1 p) N 0, − . f 2(F −1(p)) 9 Quantiles Proof: Because x 1[x a]: a R is Donsker, { 7→ ≤ ∈ } conv x 1[x a]: a R is Donsker, hence { 7→ ≤ ∈ } Gn,F = √n(Fn F ) converges weakly in D[ , ] to an F -Brownian − −∞ ∞ bridge process GF = Gλ F . ◦ [Recall that Gλ is the standard uniform Brownian bridge.] The sample paths of GF are continuous at points where F is continuous. Now, φ : F F −1(p) is Hadamard-differentiable tangentially to the set 7→ D0 of cadlag functions that are continuous where F is continuous. And the G ′ limiting process F takes its values in this set D0. Furthermore, φF is defined and continuous everywhere in ℓ∞[ , ]. −∞ ∞ 10 Quantiles Hence, we can use the functional delta method: ′ √n(φ(Fn) φ(F )) = φ √n(Fn F ) + oP (1) − F − ′ G = φF ( n,F )+ oP (1). We can extend this result to the process √n(F −1 F −1), provided the n − differentiability conditions are satisfied over a set... 11 Quantiles Theorem: Suppose 0 <p <p < 1, • 1 2 [a,b]=[F −1(p ) ǫ,F −1(p )+ ǫ], for some ǫ> 0, • 1 − 2 F continuously differentiable and with positive derivative f on [a,b], • φ : D[a,b] ℓ∞[p ,p ] is defined by φ(G)= G−1. • → 1 2 Then φ is Hadamard differentiable at F tangentially to C[a,b], with h φ′ (h)= F −1. F − f ◦ (Recall: C[a,b] is the set of continuous functions on [a,b].) 12 Quantiles Theorem: For 0 <p <p < 1, • 1 2 [a,b]=[F −1(p ) ǫ,F −1(p )+ ǫ], for some ǫ> 0, and • 1 − 2 F continuously differentiable and with positive derivative f on [a,b], • G √n F −1 F −1 λ , n − f F −1 ◦ ∞ where the convergence is in ℓ [p1,p2], and Gλ is the standard Brownian bridge. (Recall: weak convergence in a metric space of functions is defined in terms of expectations of bounded continuous functions in the space.) 13 Contiguity Motivation: Suppose we wish to study the asymptotics of statistics Tn. Under the null hypothesis, say, Tn Pn, we can show that Tn T . What happens when ∼ the null hypothesis is not true? For instance, under the alternative hypothesis, Tn Qn. What can we say about the asymptotics? ∼ We can relate them through likelihood ratios (also called Radon-Nikodym derivatives): dP p n = n , dQn qn where pn and qn are the corresponding densities. For these to make sense, we need the ratio to exist, and in particular we need qn = 0 only if pn = 0, at least asymptotically. 14 Absolute Continuity Definition: 1. Q P (“Q is absolutely continuous wrt P ”) means A, ≪ ∀ P (A)=0= Q(A) = 0. ⇒ 2. P Q (“P and Q are orthogonal”) means ΩP , ΩQ, ⊥ ∃ P (ΩP ) = 1, Q(ΩP ) = 0, Q(ΩQ) = 1, P (ΩQ) = 0. 15 Absolute Continuity: Examples Example: 1. P = N(0, 1), Q = N(µ, σ2) with σ2 > 0. Then P (A) = 0 Q(A) = 0. Hence, P Q and Q P . ⇔ ≪ ≪ 2. P = N(0, 1), Q is uniform on [0, 1]. Then Q P but not P Q. ≪ ≪ 3. P = N(0, 1), Q is a mixture of a normal and a point mass at x: Q(x) > 0. Then P Q but not Q P . ≪ ≪ 4. P is uniform on [ 1/2, 1/2], Q is uniform on [0, 1]. Then neither − Q P nor P Q. ≪ ≪ 16 Absolute Continuity We can always decompose Q into a part that is absolutely continuous wrt P and a part that is orthogonal (singular): Suppose that P and Q have densities p and q wrt some measure µ. Define Qa(A)= Q (A p> 0 ) , ∩ { } Q⊥(A)= Q (A p = 0 ) . ∩ { } 17 Absolute Continuity Lemma: 1. Q = Qa + Q⊥, with Qa P and Q⊥P (Lebesgue decomposition) ≪ q 2. Qa(A)= dP . ZA p q 3. Q P Q = Qa Q(p = 0) = 0 dP = 1. ≪ ⇔ ⇔ ⇔ Z p 18 Absolute Continuity Proof: (1) is immediate from the definitions. (2): Qa(A)= qdµ ZA∩{p>0} q = pdµ ZA∩{p>0} p q = dP. ZA∩{p>0} p 19 Absolute Continuity (3): Q P Q = Qa ≪ ⇔ Q⊥ = 0 ⇔ Q(p = 0) = 0, from the definitions. ⇔ 20 Absolute Continuity Also, dQ = dQa + dQ⊥ Z Z Z q = dP + dQ⊥, Z p Z so q/pdP = 1 Q⊥ = 0. Z ⇔ 21 Likelihood ratios Write the likelihood ratio (= Radon-Nikodym derivative): dQ q = . dP p This is defined on ΩP = p> 0 , and it is P -almost surely unique. It does { } not depend on the choice of dominating measure µ that is used to define the densities p and q. 22 Likelihood ratios: Change of measure If Q P then Q = Qa, so we can write the Q-law of X : Ω Rk in ≪ → terms of the P -law of the random pair (X, dQ/dP ), via dQ E f(X)= E f(X) , Q P dP dQ Q(X A)= EP 1[X A] = vdPX,V (x, v), ∈ ∈ dP ZA×R where we have written the distribution under P of (X, V )=(X, dQ/dP ) as PX,V . This change of measure requires that Q is absolutely continuous wrt P . 23 Contiguity We are interested in an asymptotic version of this change of measure. That is, we know the asymptotics of Tn Pn, and we’d like to infer the ∼ asymptotics under an alternative sequence Qn. When can we do that? We clearly need an asymptotic version of absolute continuity. This is called contiguity. Definition: Qn ⊳Pn (“Qn is contiguous wrt Pn”) means, An, ∀ Pn(An) 0= Qn(An) 0. → ⇒ → 24.