
ECE 645: Estimation Theory, Spring 2015. Instructor: Prof. Stanley H. Chan

Lecture 4: Law of Large Numbers and the Central Limit Theorem (LaTeX prepared by Jing Li), March 31, 2015

This lecture note is based on ECE 645 (Spring 2015) by Prof. Stanley H. Chan in the School of Electrical and Computer Engineering at Purdue University.

1 Bounds for $P_F$ and $P_M$

In the previous lectures we have studied various detection methods. Starting from this lecture, we want to take a step further and analyze the performance of these detection methods. In order to motivate ourselves to learn a set of new tools called Large Deviation Theory, let us first review some “standard” tools, namely the Law of Large Numbers and the Central Limit Theorem. To begin our discussion, let us first consider the probability of false alarm $P_F$ and the probability of miss $P_M$. If $Y = y$ is a one-dimensional observation, we can show the following proposition.

Proposition 1. Given a one-dimensional observation $Y = y$ and a decision rule $\delta(y)$, it holds that

$$P_F \le P(\ell(y) \ge \eta \mid H_0), \tag{1}$$
and
$$P_M \le P(\ell(y) \le \eta \mid H_1), \tag{2}$$
where $\ell(y) \stackrel{\text{def}}{=} \log L(y)$ is the log-likelihood ratio.

Proof. Given $\delta(y)$, it holds that

$$P_F = \int_{\ell(y) > \eta} f_0(y)\, dy + \gamma \int_{\ell(y) = \eta} f_0(y)\, dy \le \int_{\ell(y) \ge \eta} f_0(y)\, dy = P(\ell(y) \ge \eta \mid H_0),$$
where the inequality holds because $\gamma \le 1$. Similarly, we have

$$P_M = \int_{\ell(y) < \eta} f_1(y)\, dy + (1 - \gamma) \int_{\ell(y) = \eta} f_1(y)\, dy \le \int_{\ell(y) \le \eta} f_1(y)\, dy = P(\ell(y) \le \eta \mid H_1).$$
✷
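As a quick numerical illustration (not part of the original notes), the following Python sketch checks the bound (1) by Monte Carlo for an assumed Gaussian shift model, $H_0 : y \sim \mathcal{N}(0,1)$ versus $H_1 : y \sim \mathcal{N}(1,1)$, for which $\ell(y) = y - 1/2$; the threshold value is also an assumption.

```python
import numpy as np

# Assumed model for illustration: H0: y ~ N(0,1), H1: y ~ N(1,1),
# so l(y) = log f1(y)/f0(y) = y - 1/2.
rng = np.random.default_rng(0)
eta = 0.3                            # assumed threshold
y0 = rng.normal(0.0, 1.0, 10**6)     # observations under H0
ll0 = y0 - 0.5                       # log-likelihood ratio under H0

# The rule declares H1 when l(y) > eta; P(l(y) = eta | H0) = 0 here,
# so the randomization gamma plays no role for a continuous model.
PF = np.mean(ll0 > eta)
bound = np.mean(ll0 >= eta)          # right-hand side of (1)
print(f"PF ~ {PF:.4f},  bound ~ {bound:.4f}")
```

For continuous observations the two sides coincide; the inequality in (1) is strict only when $\ell(y) = \eta$ occurs with positive probability and $\gamma < 1$.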

While the derivation shows that $P_F$ and $P_M$ can be evaluated through the probability of having $\ell(y) \lessgtr \eta$, the same trick becomes much more difficult if we proceed to a high-dimensional observation $Y = y$. In this case, we let

$$y = [y_1, y_2, \ldots, y_n]^T. \tag{3}$$
Then,

$$\int_{\ell(y) \ge \eta} f_0(y)\, dy = \int_{\ell(y) \ge \eta} f_0(y_1, \ldots, y_n)\, dy_1 \cdots dy_n = \int_{\ell(y) \ge \eta} \prod_{i=1}^{n} f_0(y_i)\, dy_1 \cdots dy_n. \tag{4}$$
Unfortunately, (4) involves multivariate integration and is extremely difficult to compute. To overcome this difficulty, it will be useful to note that

$$P_F \le P(\ell(y) \ge \eta \mid H_0). \tag{5}$$
Since
$$\ell(y) = \log \frac{f_1(y)}{f_0(y)} = \sum_{i=1}^{n} \ell_i(y_i),$$
where $\ell_i(y_i) \stackrel{\text{def}}{=} \log \frac{f_1(y_i)}{f_0(y_i)}$, it holds that

$$P(\ell(y) \ge \eta \mid H_0) = P\left( \sum_{i=1}^{n} \ell_i(y_i) \ge \eta \,\Big|\, H_0 \right). \tag{6}$$
By letting $X_i = \ell_i(y_i)$, we see that $P_F$ can be equivalently bounded as

$$P_F \le P\left( \sum_{i=1}^{n} X_i \ge \eta \,\Big|\, H_0 \right). \tag{7}$$
Therefore, if we can derive an accurate upper bound for $P(\sum_{i=1}^{n} X_i \ge \eta \mid H_0)$, then we can find an upper bound on $P_F$. So the question now is: How do we find good upper bounds for $P(\sum_{i=1}^{n} X_i \ge \eta \mid H_0)$?
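To make the reduction concrete, here is a hedged sketch for the same assumed Gaussian shift model with $n$ iid observations: the per-sample statistics $X_i = \ell_i(y_i) = y_i - 1/2$ are summed, so the multivariate event in (4) becomes the one-dimensional tail event of (7); the values of $n$ and $\eta$ are illustrative assumptions.

```python
import numpy as np

# Assumed model: n iid observations, H0: y_i ~ N(0,1), H1: y_i ~ N(1,1),
# so X_i = l_i(y_i) = y_i - 1/2.
rng = np.random.default_rng(1)
n, eta = 10, 2.0                        # assumed size and threshold
Y0 = rng.normal(0.0, 1.0, (10**5, n))   # each row: one observation under H0
X = Y0 - 0.5                            # per-sample log-likelihood ratios
tail = np.mean(X.sum(axis=1) >= eta)    # estimate of P(sum_i X_i >= eta | H0)
print(f"P(sum X_i >= eta | H0) ~ {tail:.4f}")
```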

P 2 Weak Law of Large Number

We begin the analysis by reviewing some elementary probability inequalities.

Theorem 1 (Markov Inequality). For any $X \ge 0$ and for any $\epsilon > 0$,
$$P(X > \epsilon) \le \frac{E[X]}{\epsilon}. \tag{8}$$
Proof.

$$\epsilon P(X > \epsilon) = \epsilon \int_{\epsilon}^{\infty} f_X(x)\, dx \overset{(a)}{\le} \int_{\epsilon}^{\infty} x f_X(x)\, dx \overset{(b)}{\le} \int_{0}^{\infty} x f_X(x)\, dx = E[X],$$
where (a) holds because $\epsilon \le x$ over the range of integration, and (b) holds because the integrand is nonnegative.
✷
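A minimal numerical check of the Markov inequality, assuming numpy; the Exponential(1) choice is illustrative (here $E[X] = 1$, and the exact tail is $e^{-\epsilon}$).

```python
import numpy as np

# Markov: P(X > eps) <= E[X]/eps for X >= 0, checked on Exponential(1).
rng = np.random.default_rng(2)
x = rng.exponential(1.0, 10**6)              # E[X] = 1
for eps in [0.5, 1.0, 2.0, 4.0]:
    print(eps, np.mean(x > eps), 1.0 / eps)  # empirical tail vs. Markov bound
```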

TO DO: Add a pictorial explanation using $E[X] = \int_0^{\infty} (1 - F_X(x))\, dx$.

Theorem 2 (Chebyshev Inequality). Let $X$ be a random variable such that $E[X] = \mu$ and $\mathrm{Var}(X) < \infty$. Then, for all $\epsilon > 0$,
$$P(|X - \mu| > \epsilon) \le \frac{\mathrm{Var}[X]}{\epsilon^2}. \tag{9}$$
Proof.

$$P(|X - \mu| > \epsilon) = P((X - \mu)^2 > \epsilon^2) \le \frac{E[(X - \mu)^2]}{\epsilon^2} = \frac{\mathrm{Var}[X]}{\epsilon^2},$$
where the inequality is due to the Markov inequality.
✷
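And a similar hedged check of the Chebyshev inequality, on an assumed Gaussian example:

```python
import numpy as np

# Chebyshev: P(|X - mu| > eps) <= Var(X)/eps^2, checked on N(3, 2^2).
rng = np.random.default_rng(3)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, 10**6)
for eps in [1.0, 2.0, 4.0]:
    print(eps, np.mean(np.abs(x - mu) > eps), sigma**2 / eps**2)
```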

With the Chebyshev inequality, we can now prove the following result.

Proposition 2. Let $X_1, \ldots, X_n$ be iid random variables with $E[X_k] = \mu$ and $\mathrm{Var}(X_k) = \sigma^2$. If

$$Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k,$$
then for any $\epsilon > 0$, we have
$$P(|Y_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}. \tag{10}$$
Proof. By the Chebyshev inequality, we have

$$P(|Y_n - \mu| > \epsilon) \le \frac{E[(Y_n - \mu)^2]}{\epsilon^2}.$$
Now, we can show that

$$E[(Y_n - \mu)^2] = \mathrm{Var}(Y_n) = \mathrm{Var}\left( \frac{1}{n} \sum_{k=1}^{n} X_k \right) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}(X_k) = \frac{\sigma^2}{n}.$$
✷
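The bound (10) can be watched at work numerically; a minimal sketch, assuming standard normal $X_k$ (so $\mu = 0$, $\sigma = 1$) and an illustrative $\epsilon$:

```python
import numpy as np

# Proposition 2: P(|Y_n - mu| > eps) <= sigma^2 / (n eps^2).
rng = np.random.default_rng(4)
mu, sigma, eps = 0.0, 1.0, 0.1
for n in [10, 100, 1000]:
    Yn = rng.normal(mu, sigma, (10**4, n)).mean(axis=1)   # sample means
    # empirical probability vs. the Chebyshev-based bound; the bound can
    # exceed 1 (vacuous) for small n, but both columns decay with n.
    print(n, np.mean(np.abs(Yn - mu) > eps), sigma**2 / (n * eps**2))
```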

The interpretation of Proposition 2 is important. It says that if we have a sequence of iid random variables $X_1, \ldots, X_n$, then the sample average $Y_n$ will stay around the mean $\mu$ of $X_1$. In particular:

$$\lim_{n \to \infty} P(|Y_n - \mu| > \epsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n\epsilon^2} = 0.$$
This result is known as the Weak Law of Large Numbers (WLLN).

Example

Consider a unit square containing an arbitrary shape $\Omega$. Let $X_1, \ldots, X_n$ be a sequence of iid Bernoulli random variables with probability $p = |\Omega|$, i.e., $p$ = area of $\Omega$. Let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. We can show that
$$E[Y_n] = \frac{1}{n} \sum_{k=1}^{n} E[X_k] = \frac{np}{n} = p, \tag{11}$$
and
$$\mathrm{Var}(Y_n) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}(X_k) = \frac{1}{n^2}\, np(1 - p) = \frac{p(1 - p)}{n}. \tag{12}$$
Therefore:
$$P(|Y_n - \mu| > \epsilon) \le \frac{p(1 - p)}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$
So by throwing $n$ “darts” uniformly at random at the unit square, we can approximate the area of $\Omega$.
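A minimal Monte Carlo sketch of the dart-throwing example; the concrete shape $\Omega$ (a disk of radius $1/2$ centered in the square, so $|\Omega| = \pi/4$) is an assumption for illustration.

```python
import numpy as np

# Throw n uniform "darts" at the unit square; X_k indicates a hit in Omega.
# Omega is assumed to be the disk of radius 1/2 centered at (1/2, 1/2).
rng = np.random.default_rng(5)
n = 10**6
u, v = rng.random(n), rng.random(n)          # dart coordinates
X = (u - 0.5)**2 + (v - 0.5)**2 <= 0.25      # X_k = 1 if dart k lands in Omega
print("Y_n =", X.mean(), "  true area =", np.pi / 4)
```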

Example. TO DO: Add an example of approximating $y = \sum_{i=1}^{n} a_i x_i$ by $Y = \sum_{i=1}^{n} a_i x_i I_i / p_i$.

The convergence behavior demonstrated by the WLLN is known as convergence in probability. Formally, it says the following.

Definition 1 (Convergence in Probability). We say that a sequence of random variables $Y_1, \ldots, Y_n$ converges in probability to $\mu$, denoted by $Y_n \xrightarrow{p} \mu$, if
$$\lim_{n \to \infty} P(|Y_n - \mu| > \epsilon) = 0. \tag{13}$$
For more discussion regarding the WLLN, we refer the readers to standard probability textbooks. We close this section by mentioning the following proposition, which is very useful in practice.

Proposition 3. If $Y_n \xrightarrow{p} \mu$, then $f(Y_n) \xrightarrow{p} f(\mu)$ for any $f$ that is continuous at $\mu$.

Proof. Since $f$ is continuous at $\mu$, by continuity we must have that $\forall \epsilon > 0$, $\exists \delta > 0$ such that
$$|x - \mu| < \delta \;\Rightarrow\; |f(x) - f(\mu)| < \epsilon.$$
Therefore,
$$P(|Y_n - \mu| < \delta) \le P(|f(Y_n) - f(\mu)| < \epsilon),$$
because the event “$|Y_n - \mu| < \delta$” is a subset of the event “$|f(Y_n) - f(\mu)| < \epsilon$”. Taking complements,
$$P(|f(Y_n) - f(\mu)| \ge \epsilon) \le P(|Y_n - \mu| \ge \delta) \to 0 \quad \text{as } n \to \infty.$$
✷

Example

Let $X_1, \ldots, X_n$ be iid Poisson($\lambda$). If $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$, then $Y_n \xrightarrow{p} \lambda$, and hence, by Proposition 3,
$$e^{-Y_n} \xrightarrow{p} e^{-\lambda}.$$
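A quick numerical sketch of this example, assuming numpy and an illustrative $\lambda = 2$:

```python
import numpy as np

# Continuous mapping: Y_n -> lambda in probability implies
# exp(-Y_n) -> exp(-lambda) in probability (Proposition 3).
rng = np.random.default_rng(6)
lam = 2.0
for n in [10, 100, 10000]:
    Yn = rng.poisson(lam, n).mean()          # one realization of Y_n
    print(n, np.exp(-Yn), np.exp(-lam))      # f(Y_n) vs. f(lambda)
```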

3 Central Limit Theorem

In introductory probability courses we have also learned the Central Limit Theorem. The Central Limit Theorem concerns the convergence of a sequence of distributions.

Definition 2. A sequence of distributions with CDFs $F_1, \ldots, F_n$ is said to converge to another distribution $F$, denoted as $F_n \to F$, if $F_n(x) \to F(x)$ at all continuity points $x$ of $F$.

Definition 3 (Convergence in Distribution). A sequence of random variables $Y_1, \ldots, Y_n$ is said to converge to $Y$ in distribution, denoted as $Y_n \xrightarrow{d} Y$, if $F_n \to F$, where $F_n$ is the CDF of $Y_n$ and $F$ is the CDF of $Y$.

Example

The notation $Y_n \xrightarrow{d} \mathcal{N}(0,1)$ means that the distribution of $Y_n$ is converging to $\mathcal{N}(0,1)$. Note that $Y_n \xrightarrow{d} Y$ does not mean that $Y_n$ is becoming $Y$. It only means that $F_{Y_n}$ is becoming $F_Y$.

Remark. $Y_n \xrightarrow{p} Y \Rightarrow Y_n \xrightarrow{d} Y$, but the converse is not true. For example, let $X$ and $Y$ be two iid random variables with distribution $\mathcal{N}(0,1)$. Let $Y_n = Y + \frac{1}{n}$. Then it can be shown that $Y_n \xrightarrow{p} Y$, as well as $Y_n \xrightarrow{d} Y$. This gives $Y_n \xrightarrow{d} X$, as $X$ has the same distribution as $Y$. However, $Y_n \xrightarrow{p} X$ is not true, as $Y_n$ is becoming $Y$, not $X$.
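The remark can be checked numerically; a hedged sketch with $Y_n = Y + 1/n$ and $X, Y$ iid $\mathcal{N}(0,1)$:

```python
import numpy as np

# Y_n = Y + 1/n converges to Y in probability (hence in distribution),
# and to X in distribution only: |Y_n - X| does not shrink with n.
rng = np.random.default_rng(7)
m = 10**5
X, Y = rng.normal(size=m), rng.normal(size=m)
for n in [1, 10, 1000]:
    Yn = Y + 1.0 / n
    print(n, np.mean(np.abs(Yn - Y) > 0.1), np.mean(np.abs(Yn - X) > 0.1))
# First column -> 0; second column stays near P(|Y - X| > 0.1) > 0.
```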

We now present the Central Limit Theorem.

Theorem 3 (Central Limit Theorem). Let $X_1, \ldots, X_n$ be iid random variables with $E[X_k] = \mu$ and $\mathrm{Var}(X_k) = \sigma^2 < \infty$. Then
$$\sqrt{n}(Y_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2),$$
where $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$.

Proof. It is sufficient to prove that
$$\sqrt{n}\left( \frac{Y_n - \mu}{\sigma} \right) \xrightarrow{d} \mathcal{N}(0, 1).$$
Let $Z_n = \sqrt{n}\left( \frac{Y_n - \mu}{\sigma} \right)$. The moment generating function of $Z_n$ is
$$M_{Z_n}(s) \stackrel{\text{def}}{=} E[e^{sZ_n}] = E\left[ e^{s\sqrt{n}\left(\frac{Y_n - \mu}{\sigma}\right)} \right] = \prod_{k=1}^{n} E\left[ e^{\frac{s}{\sigma\sqrt{n}}(X_k - \mu)} \right].$$

By Taylor approximation, we have

$$E\left[ e^{\frac{s}{\sigma\sqrt{n}}(X_k - \mu)} \right] = E\left[ 1 + \frac{s}{\sigma\sqrt{n}}(X_k - \mu) + \frac{s^2}{2\sigma^2 n}(X_k - \mu)^2 + O\left( \frac{s^3}{\sigma^3 n^{3/2}}(X_k - \mu)^3 \right) \right] = 1 + 0 + \frac{s^2}{2n}.$$
Therefore,

$$M_{Z_n}(s) = \left( 1 + 0 + \frac{s^2}{2n} \right)^n \overset{(a)}{\longrightarrow} e^{\frac{s^2}{2}},$$
as $n \to \infty$. To prove (a), we let $y_n = \left( 1 + \frac{s^2}{2n} \right)^n$. Then, $\log y_n = n \log\left( 1 + \frac{s^2}{2n} \right)$, and by Taylor approximation we have

$$\log(1 + x_0) \approx x_0 - \frac{x_0^2}{2}.$$
Therefore,
$$\log y_n = n \log\left( 1 + \frac{s^2}{2n} \right) \approx n\left( \frac{s^2}{2n} - \frac{s^4}{8n^2} \right) = \frac{s^2}{2} - \frac{s^4}{8n} \xrightarrow{n \to \infty} \frac{s^2}{2}.$$
✷
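A numerical sketch of the theorem, assuming iid Uniform(0,1) samples ($\mu = 1/2$, $\sigma^2 = 1/12$): the empirical CDF of $Z_n$ is compared against the standard normal CDF $\Phi$.

```python
import numpy as np
from math import erf, sqrt

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal CDF

# Z_n = sqrt(n)(Y_n - mu)/sigma for iid Uniform(0,1) samples.
rng = np.random.default_rng(8)
mu, sigma, n = 0.5, sqrt(1 / 12), 100
Zn = sqrt(n) * (rng.random((10**5, n)).mean(axis=1) - mu) / sigma
for t in [-1.0, 0.0, 1.0, 2.0]:
    print(t, np.mean(Zn <= t), Phi(t))         # empirical CDF vs. Phi(t)
```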

As a corollary of the Central Limit Theorem, we can also derive the following proposition.

Proposition 4 (Delta Method). If $\sqrt{n}(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \tau^2)$, then $\sqrt{n}(f(T_n) - f(\theta)) \xrightarrow{d} \mathcal{N}(0, \tau^2 (f'(\theta))^2)$, provided $f'(\theta)$ exists. This result is known as the Delta Method.

Proof. By Taylor expansion,

$$f(T_n) = f(\theta) + (T_n - \theta) f'(\theta) + O((T_n - \theta)^2).$$
Therefore,

$$\sqrt{n}(f(T_n) - f(\theta)) = \sqrt{n}(T_n - \theta) f'(\theta) + o_p(1) \xrightarrow{d} \mathcal{N}(0, \tau^2 (f'(\theta))^2),$$
since $\sqrt{n} \cdot O((T_n - \theta)^2) = o_p(1)$ when $T_n - \theta = O_p(1/\sqrt{n})$.
✷
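A numerical sketch of the Delta Method under illustrative assumptions: $T_n$ is the sample mean of Exp(1) variables ($\theta = 1$, $\tau^2 = 1$) and $f(x) = x^2$, so $f'(\theta) = 2$ and the limiting standard deviation should be $2$.

```python
import numpy as np
from math import sqrt

# Delta method: sqrt(n)(f(T_n) - f(theta)) ~ N(0, tau^2 f'(theta)^2).
rng = np.random.default_rng(9)
n = 1000
Tn = rng.exponential(1.0, (10**4, n)).mean(axis=1)   # sample means
W = sqrt(n) * (Tn**2 - 1.0)                          # f(x) = x^2, theta = 1
print("sample std:", W.std(), "  predicted:", 2.0)
```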

We close this section by discussing the limitation of the Central Limit Theorem. Recall that our analysis question is to study
$$P\left( \sum_{i=1}^{n} X_i \ge \eta \right). \tag{14}$$
The Central Limit Theorem says that

$$\lim_{n \to \infty} P\left( \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma} \le \epsilon \right) = \Phi(\epsilon).$$

This implies that

$$\lim_{n \to \infty} P\left( \sum_{i=1}^{n} X_i \le n\mu + \sqrt{n}\,\sigma\epsilon \right) = \Phi(\epsilon),$$
and hence

$$\lim_{n \to \infty} P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \le \mu + \frac{\sigma\epsilon}{\sqrt{n}} \right) = \Phi(\epsilon).$$
As $n \to \infty$, $\frac{\sigma\epsilon}{\sqrt{n}} \to 0$. Therefore, the deviation that the Central Limit Theorem can handle is a small deviation, i.e., one of order $\sigma/\sqrt{n}$ around the mean. TO DO: Add a picture to explain small deviation vs. large deviation.
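A hedged sketch of the small-versus-large deviation gap, assuming iid Exp(1) samples ($\mu = \sigma = 1$) and a fixed deviation $\epsilon = 0.5$; the sum of $n$ iid Exp(1) variables is Gamma($n$, 1), which lets $Y_n$ be sampled directly.

```python
import numpy as np
from math import erf, sqrt

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal CDF

# Fixed deviation eps: the CLT tail 1 - Phi(eps*sqrt(n)/sigma) is accurate
# only for deviations of order sigma/sqrt(n); for fixed eps it degrades.
rng = np.random.default_rng(10)
eps = 0.5
for n in [25, 50, 100]:
    Yn = rng.gamma(n, 1.0, 10**6) / n        # sample mean of n iid Exp(1)
    emp = np.mean(Yn >= 1.0 + eps)           # actual large-deviation tail
    clt = 1 - Phi(eps * sqrt(n))             # CLT-based approximation
    print(n, emp, clt)
# The ratio emp/clt grows with n: this regime is exactly what the
# Large Deviation Theory tools of the coming lectures address.
```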
