<<

Theoretical . Lecture 20. Peter Bartlett

1. Recall: Functional delta method, differentiability in normed spaces, Hadamard derivatives. [vdV20] 2. Quantile estimates. [vdV21]

3. Contiguity. [vdV6]

1 Recall: Differentiability of functions in normed spaces

Definition: φ : D E is Hadamard differentiable at θ D tangentially → ∈ to D D if 0 ⊆ φ′ : D E (linear, continuous), h D , ∃ θ 0 → ∀ ∈ 0 if t 0, ht h 0, then → k − k → φ(θ + tht) φ(θ) − φ′ (h) 0. t − θ →

2 Recall: Functional delta method

Theorem: Suppose φ : D E, where D and E are normed linear spaces. → Suppose the Tn : Ωn D satisfies √n(Tn θ) T for a random → − element T in D D. 0 ⊂ If φ is Hadamard differentiable at θ tangentially to D0 then

′ √n(φ(Tn) φ(θ)) φ (T ). − θ If we can extend φ′ : D E to a continuous map φ′ : D E, then 0 → → ′ √n(φ(Tn) φ(θ)) = φ (√n(Tn θ)) + oP (1). − θ −

3 Recall: Quantiles

Definition: The of F is F −1 : (0, 1) , → F −1(p) = inf x : F (x) p . { ≥ }

Quantile transformation: for U uniform on (0, 1), • F −1(U) F. ∼ integral transformation: for X F , F (X) is uniform on • ∼ [0,1] iff F is continuous on R. F −1 is an inverse (i.e., F −1(F (x)) = x and F (F −1(p)) = p for all x • and p) iff F is continuous and strictly increasing.

4 Empirical quantile function

For a with distribution function F , define the empirical quantile −1 function as the quantile function Fn of the empirical distribution function Fn.

−1 F (p) = inf x : Fn(x) p = X , n { ≥ } n(i) where i is chosen such that i 1 i −

Xn(1),...,Xn(n)  is a permutation of the sample (X1,...,Xn).

5 Quantiles

Define φ : D[a,b] R as the pth quantile function φ(F )= F −1(p). → Here, D[a,b] is the set of cadlag functions on [a,b], considered as a subset of ℓ∞[a,b]:

cadlag = continue a` droite, limite a` gauche = right continuous, with left limits.

6 Quantiles

Theorem: If F D[a,b], • ∈ xp satisfies F (xp)= p, • ′ F is differentiable at xp, with F (xp) > 0, • then φ is Hadamard-differentiable at F tangentially to

h D[a,b]: h is continuous at xp . { ∈ } Its Hadamard derivative is the (continuous) function

′ h(xp) φF (h)= ′ . −F (xp)

7 Quantiles

Recall: −1 ′ p sx(F (p)) φ (sx F )= − F − f(F −1(p))

(sx F )(xp) = −′ . − F (xp)

8 Quantiles

Theorem: For 0 0, • n −1 −1 −1 1 1[Xi F (p)] p √n F (p) F (p) = ≤ − + oP (1) n − −√n f(F −1(p))  Xi=1 p(1 p) N 0, − .  f 2(F −1(p))

9 Quantiles

Proof: Because x 1[x a]: a R is Donsker, { 7→ ≤ ∈ } conv x 1[x a]: a R is Donsker, hence { 7→ ≤ ∈ } Gn,F = √n(Fn F ) converges weakly in D[ , ] to an F -Brownian − −∞ ∞ bridge process GF = Gλ F . ◦ [Recall that Gλ is the standard uniform Brownian bridge.]

The sample paths of GF are continuous at points where F is continuous. Now, φ : F F −1(p) is Hadamard-differentiable tangentially to the set 7→ D0 of cadlag functions that are continuous where F is continuous. And the G ′ limiting process F takes its values in this set D0. Furthermore, φF is defined and continuous everywhere in ℓ∞[ , ]. −∞ ∞

10 Quantiles

Hence, we can use the functional delta method:

′ √n(φ(Fn) φ(F )) = φ √n(Fn F ) + oP (1) − F − ′ G  = φF ( n,F )+ oP (1).

We can extend this result to the process √n(F −1 F −1), provided the n − differentiability conditions are satisfied over a set...

11 Quantiles

Theorem: Suppose 0

0, • 1 − 2 F continuously differentiable and with positive derivative f on [a,b], • φ : D[a,b] ℓ∞[p ,p ] is defined by φ(G)= G−1. • → 1 2 Then φ is Hadamard differentiable at F tangentially to C[a,b], with h φ′ (h)= F −1. F − f  ◦

(Recall: C[a,b] is the set of continuous functions on [a,b].)

12 Quantiles

Theorem: For 0

0, and • 1 − 2 F continuously differentiable and with positive derivative f on [a,b], • G √n F −1 F −1 λ , n − f F −1  ◦ ∞ where the convergence is in ℓ [p1,p2], and Gλ is the standard Brownian bridge. (Recall: weak convergence in a metric space of functions is defined in terms of expectations of bounded continuous functions in the space.)

13 Contiguity

Motivation:

Suppose we wish to study the asymptotics of statistics Tn. Under the null hypothesis, say, Tn Pn, we can show that Tn T . What happens when ∼ the null hypothesis is not true? For instance, under the alternative hypothesis, Tn Qn. What can we say about the asymptotics? ∼ We can relate them through likelihood ratios (also called Radon-Nikodym derivatives): dP p n = n , dQn qn where pn and qn are the corresponding densities. For these to make sense, we need the ratio to exist, and in particular we need qn = 0 only if pn = 0, at least asymptotically.

14 Absolute Continuity

Definition: 1. Q P (“Q is absolutely continuous wrt P ”) means A, ≪ ∀ P (A)=0= Q(A) = 0. ⇒

2. P Q (“P and Q are orthogonal”) means ΩP , ΩQ, ⊥ ∃

P (ΩP ) = 1, Q(ΩP ) = 0,

Q(ΩQ) = 1,P (ΩQ) = 0.

15 Absolute Continuity: Examples

Example: 1. P = N(0, 1), Q = N(µ, σ2) with σ2 > 0. Then P (A) = 0 Q(A) = 0. Hence, P Q and Q P . ⇔ ≪ ≪ 2. P = N(0, 1), Q is uniform on [0, 1]. Then Q P but not P Q. ≪ ≪ 3. P = N(0, 1), Q is a mixture of a normal and a point mass at x: Q(x) > 0. Then P Q but not Q P . ≪ ≪ 4. P is uniform on [ 1/2, 1/2], Q is uniform on [0, 1]. Then neither − Q P nor P Q. ≪ ≪

16 Absolute Continuity

We can always decompose Q into a part that is absolutely continuous wrt P and a part that is orthogonal (singular): Suppose that P and Q have densities p and q wrt some measure µ. Define

Qa(A)= Q (A p> 0 ) , ∩ { } Q⊥(A)= Q (A p = 0 ) . ∩ { }

17 Absolute Continuity

Lemma: 1. Q = Qa + Q⊥, with Qa P and Q⊥P (Lebesgue decomposition) ≪ q 2. Qa(A)= dP . ZA p q 3. Q P Q = Qa Q(p = 0) = 0 dP = 1. ≪ ⇔ ⇔ ⇔ Z p

18 Absolute Continuity

Proof: (1) is immediate from the definitions. (2):

Qa(A)= qdµ ZA∩{p>0} q = pdµ ZA∩{p>0} p q = dP. ZA∩{p>0} p

19 Absolute Continuity

(3):

Q P Q = Qa ≪ ⇔ Q⊥ = 0 ⇔ Q(p = 0) = 0, from the definitions. ⇔

20 Absolute Continuity

Also,

dQ = dQa + dQ⊥ Z Z Z q = dP + dQ⊥, Z p Z so q/pdP = 1 Q⊥ = 0. Z ⇔

21 Likelihood ratios

Write the likelihood ratio (= Radon-Nikodym derivative): dQ q = . dP p

This is defined on ΩP = p> 0 , and it is P -almost surely unique. It does { } not depend on the choice of dominating measure µ that is used to define the densities p and q.

22 Likelihood ratios: Change of measure

If Q P then Q = Qa, so we can write the Q-law of X : Ω Rk in ≪ → terms of the P -law of the random pair (X, dQ/dP ), via dQ E f(X)= E f(X) , Q P dP dQ Q(X A)= EP 1[X A] = vdPX,V (x, v), ∈  ∈ dP  ZA×R where we have written the distribution under P of (X, V )=(X, dQ/dP ) as PX,V . This change of measure requires that Q is absolutely continuous wrt P .

23 Contiguity

We are interested in an asymptotic version of this change of measure. That is, we know the asymptotics of Tn Pn, and we’d like to infer the ∼ asymptotics under an alternative sequence Qn. When can we do that? We clearly need an asymptotic version of absolute continuity. This is called contiguity.

Definition: Qn ⊳Pn (“Qn is contiguous wrt Pn”) means, An, ∀

Pn(An) 0= Qn(An) 0. → ⇒ →

24