ECE 645: Estimation Theory (Spring 2015)
Instructor: Prof. Stanley H. Chan

Lecture 4: Law of Large Numbers and Central Limit Theorem
(LaTeX prepared by Jing Li)
March 31, 2015

This lecture note is based on ECE 645 (Spring 2015) by Prof. Stanley H. Chan in the School of Electrical and Computer Engineering at Purdue University.

1 Probability Bounds for $P_F$ and $P_M$

In the previous lectures we have studied various detection methods. Starting from this lecture, we take a step further and analyze the performance of these detection methods. In order to motivate a set of new tools called Large Deviation Theory, let us first review some "standard" tools, namely the Law of Large Numbers and the Central Limit Theorem.

To begin our discussion, let us first consider the probability of false alarm $P_F$ and the probability of miss $P_M$. If $Y = y$ is a one-dimensional observation, we can show the following proposition.

Proposition 1. Given a one-dimensional observation $Y = y$ and a decision rule $\delta(y)$, it holds that
$$P_F \le P(\ell(y) \ge \eta \mid H_0), \qquad (1)$$
and
$$P_M \le P(\ell(y) \le \eta \mid H_1), \qquad (2)$$
where $\ell(y) \overset{\text{def}}{=} \log L(y)$ is the log-likelihood ratio.

Proof. Given $\delta(y)$, it holds that
$$P_F = \int_{\ell(y) > \eta} f_0(y)\,dy + \gamma \int_{\ell(y) = \eta} f_0(y)\,dy \;\le\; \int_{\ell(y) \ge \eta} f_0(y)\,dy = P(\ell(y) \ge \eta \mid H_0),$$
where the inequality holds because $\gamma \le 1$. Similarly, we have
$$P_M = \int_{\ell(y) < \eta} f_1(y)\,dy + (1-\gamma) \int_{\ell(y) = \eta} f_1(y)\,dy \;\le\; \int_{\ell(y) \le \eta} f_1(y)\,dy = P(\ell(y) \le \eta \mid H_1). \qquad ✷$$

While the derivation shows that $P_F$ and $P_M$ can be evaluated through the probability of having $\ell(y) \gtrless \eta$, the same trick becomes much more difficult if we proceed to a high-dimensional observation $Y = y$. In this case, we let
$$y = [y_1, y_2, \ldots, y_n]^T. \qquad (3)$$
Then,
$$\int_{\ell(y) \ge \eta} f_0(y)\,dy = \int_{\ell(y) \ge \eta} f_0(y_1, \ldots, y_n)\,dy_1 \cdots dy_n = \int_{\ell(y) \ge \eta} \prod_{i=1}^{n} f_0(y_i)\,dy_1 \cdots dy_n. \qquad (4)$$
Unfortunately, (4) involves multivariate integration and is extremely difficult to compute. To overcome this difficulty, it will be useful to note that
$$P_F \le P(\ell(y) \ge \eta \mid H_0). \qquad (5)$$
Since
$$\ell(y) = \log \frac{f_1(y)}{f_0(y)} = \sum_{i=1}^{n} \ell_i(y_i),$$
where $\ell_i(y_i) \overset{\text{def}}{=} \log \frac{f_1(y_i)}{f_0(y_i)}$, it holds that
$$P(\ell(y) \ge \eta \mid H_0) = P\left( \sum_{i=1}^{n} \ell_i(y_i) \ge \eta \,\Big|\, H_0 \right). \qquad (6)$$
By letting $X_i = \ell_i(y_i)$, we see that $P_F$ can be equivalently bounded as
$$P_F \le P\left( \sum_{i=1}^{n} X_i \ge \eta \,\Big|\, H_0 \right). \qquad (7)$$
Therefore, if we can derive an accurate upper bound for $P(\sum_{i=1}^{n} X_i \ge \eta \mid H_0)$, then we can find an upper bound for $P_F$. So the question now is: How do we find good upper bounds for $P(\sum_{i=1}^{n} X_i \ge \eta \mid H_0)$?

2 Weak Law of Large Numbers

We begin the analysis by reviewing some elementary probability inequalities.

Theorem 1 (Markov Inequality). For any random variable $X \ge 0$ and for any $\epsilon > 0$,
$$P(X > \epsilon) \le \frac{E[X]}{\epsilon}. \qquad (8)$$
Proof.
$$\epsilon\, P(X > \epsilon) = \epsilon \int_{\epsilon}^{\infty} f_X(x)\,dx \overset{(a)}{\le} \int_{\epsilon}^{\infty} x f_X(x)\,dx \overset{(b)}{\le} \int_{0}^{\infty} x f_X(x)\,dx = E[X],$$
where (a) holds because $\epsilon < x$ on the domain of integration, and (b) holds because $x f_X(x) \ge 0$. ✷

TO DO: Add a pictorial explanation using $E[X] = \int_0^{\infty} (1 - F_X(x))\,dx$.

Theorem 2 (Chebyshev Inequality). Let $X$ be a random variable such that $E[X] = \mu$ and $\mathrm{Var}(X) < \infty$. Then, for all $\epsilon > 0$,
$$P(|X - \mu| > \epsilon) \le \frac{\mathrm{Var}[X]}{\epsilon^2}. \qquad (9)$$
Proof.
$$P(|X - \mu| > \epsilon) = P\big((X - \mu)^2 > \epsilon^2\big) \le \frac{E[(X - \mu)^2]}{\epsilon^2} = \frac{\mathrm{Var}[X]}{\epsilon^2},$$
where the inequality is due to the Markov inequality. ✷

With the Chebyshev inequality, we can now prove the following result.

Proposition 2. Let $X_1, \ldots, X_n$ be i.i.d. random variables with $E[X_k] = \mu$ and $\mathrm{Var}(X_k) = \sigma^2$. If
$$Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k,$$
then for any $\epsilon > 0$, we have
$$P(|Y_n - \mu| > \epsilon) \le \frac{\sigma^2}{n \epsilon^2}. \qquad (10)$$
Proof. By the Chebyshev inequality, we have
$$P(|Y_n - \mu| > \epsilon) \le \frac{E[(Y_n - \mu)^2]}{\epsilon^2}.$$
Now, we can show that
$$E[(Y_n - \mu)^2] = \mathrm{Var}(Y_n) = \mathrm{Var}\left( \frac{1}{n} \sum_{k=1}^{n} X_k \right) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}(X_k) = \frac{\sigma^2}{n}.$$
Substituting this into the Chebyshev bound gives (10). ✷
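Before interpreting this bound, it can be sanity-checked numerically. The sketch below is not part of the original note: it assumes $X_k \sim \text{Exponential}(1)$ (so $\mu = 1$, $\sigma^2 = 1$), and the value of $\epsilon$ and the sample sizes are arbitrary choices for illustration. It compares a Monte Carlo estimate of $P(|Y_n - \mu| > \epsilon)$ with the bound $\sigma^2/(n\epsilon^2)$ in (10).

```python
import numpy as np

# Minimal sanity check of Proposition 2, assuming X_k ~ Exponential(1),
# so mu = 1 and sigma^2 = 1; eps and the sample sizes are arbitrary choices.
rng = np.random.default_rng(0)
mu, sigma2, eps = 1.0, 1.0, 0.1
trials = 10_000

for n in [10, 100, 1000]:
    # `trials` independent copies of the sample mean Y_n of n i.i.d. Exp(1) variables.
    Yn = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(Yn - mu) > eps)   # Monte Carlo estimate of P(|Y_n - mu| > eps)
    bound = min(sigma2 / (n * eps**2), 1.0)      # Chebyshev bound from (10), capped at 1
    print(f"n = {n:5d}   empirical = {empirical:.4f}   Chebyshev bound = {bound:.4f}")
```

The Chebyshev bound is typically loose, but both quantities decay as $n$ grows, which is exactly what Proposition 2 guarantees.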
The interpretation of Proposition 2 is important. It says that if we have a sequence of i.i.d. random variables $X_1, \ldots, X_n$, the sample mean $Y_n$ will stay around the mean of $X_1$. In particular,
$$\lim_{n \to \infty} P(|Y_n - \mu| > \epsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n \epsilon^2} = 0.$$
This result is known as the Weak Law of Large Numbers (WLLN).

Example. Consider a unit square containing an arbitrary shape $\Omega$. Let $X_1, \ldots, X_n$ be a sequence of i.i.d. Bernoulli random variables with probability $p = |\Omega|$, i.e., $p$ is the area of $\Omega$. Let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. We can show that
$$E[Y_n] = \frac{1}{n} \sum_{k=1}^{n} E[X_k] = \frac{np}{n} = p, \qquad (11)$$
and
$$\mathrm{Var}(Y_n) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}(X_k) = \frac{1}{n^2}\, np(1-p) = \frac{p(1-p)}{n}. \qquad (12)$$
Therefore,
$$P(|Y_n - p| > \epsilon) \le \frac{p(1-p)}{n \epsilon^2} \to 0 \quad \text{as } n \to \infty.$$
So by throwing $n$ "darts" uniformly at random into the unit square, we can approximate the area of $\Omega$.

Example. TO DO: Add an example of approximating $y = \sum_{i=1}^{n} a_i x_i$ by $Y = \sum_{i=1}^{n} a_i x_i I_i / p_i$.

The convergence behavior demonstrated by the WLLN is known as convergence in probability. Formally, it says the following.

Definition 1 (Convergence in Probability). We say that a sequence of random variables $Y_1, \ldots, Y_n$ converges in probability to $\mu$, denoted by $Y_n \overset{p}{\to} \mu$, if for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|Y_n - \mu| > \epsilon) = 0. \qquad (13)$$

For more discussion regarding the WLLN, we refer the readers to standard probability textbooks. We close this section by mentioning the following proposition, which is very useful in practice.

Proposition 3. If $Y_n \overset{p}{\to} \mu$, then $f(Y_n) \overset{p}{\to} f(\mu)$ for any function $f$ that is continuous at $\mu$.

Proof. Since $f$ is continuous at $\mu$, for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$|x - \mu| < \delta \;\Rightarrow\; |f(x) - f(\mu)| < \epsilon.$$
Therefore, the event $\{|Y_n - \mu| < \delta\}$ is a subset of the event $\{|f(Y_n) - f(\mu)| < \epsilon\}$, and hence
$$P(|f(Y_n) - f(\mu)| \ge \epsilon) \le P(|Y_n - \mu| \ge \delta) \to 0 \quad \text{as } n \to \infty. \qquad ✷$$

Example. Let $X_1, \ldots, X_n$ be i.i.d. Poisson($\lambda$) and let $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$. Then $Y_n \overset{p}{\to} \lambda$, and therefore
$$e^{-Y_n} \overset{p}{\to} e^{-\lambda}.$$

3 Central Limit Theorem

In introductory probability courses we have also learned the Central Limit Theorem. The Central Limit Theorem concerns the convergence of a sequence of distributions.

Definition 2. A sequence of distributions with CDFs $F_1, \ldots, F_n$ is said to converge to another distribution $F$, denoted by $F_n \to F$, if $F_n(x) \to F(x)$ at all continuity points $x$ of $F$.

Definition 3 (Convergence in Distribution). A sequence of random variables $Y_1, \ldots, Y_n$ is said to converge to $Y$ in distribution, denoted by $Y_n \overset{d}{\to} Y$, if $F_n \to F$, where $F_n$ is the CDF of $Y_n$ and $F$ is the CDF of $Y$.

Example. The notation $Y_n \overset{d}{\to} \mathcal{N}(0,1)$ means that the distribution of $Y_n$ converges to $\mathcal{N}(0,1)$. Note that $Y_n \overset{d}{\to} Y$ does not mean that $Y_n$ is becoming $Y$; it only means that $F_{Y_n}$ is becoming $F_Y$.

Remark. $Y_n \overset{p}{\to} Y$ implies $Y_n \overset{d}{\to} Y$, but the converse is not true. For example, let $X$ and $Y$ be two i.i.d. random variables with distribution $\mathcal{N}(0,1)$, and let $Y_n = Y + \frac{1}{n}$. Then it can be shown that $Y_n \overset{p}{\to} Y$, as well as $Y_n \overset{d}{\to} Y$. This gives $Y_n \overset{d}{\to} X$, since $X$ has the same distribution as $Y$. However, $Y_n \overset{p}{\to} X$ is not true, as $Y_n$ is becoming $Y$, not $X$.
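The distinction drawn in the remark can be illustrated numerically. The sketch below is not part of the original note; the sample size, evaluation grid, and $\epsilon$ are arbitrary choices. It checks that the empirical CDF of $Y_n = Y + \frac{1}{n}$ approaches the $\mathcal{N}(0,1)$ CDF (so $Y_n \overset{d}{\to} X$), while $P(|Y_n - X| > \epsilon)$ stays bounded away from zero (so $Y_n \overset{p}{\to} X$ fails).

```python
import numpy as np
from scipy.stats import norm

# Illustration of the remark, assuming Y, X i.i.d. N(0,1) and eps = 0.5 (arbitrary).
rng = np.random.default_rng(0)
m, eps = 50_000, 0.5
Y = rng.standard_normal(m)
X = rng.standard_normal(m)            # independent copy with the same distribution as Y
grid = np.linspace(-4.0, 4.0, 801)

for n in [1, 10, 100, 1000]:
    Yn = Y + 1.0 / n
    # Kolmogorov distance between the empirical CDF of Y_n and the N(0,1) CDF -> small.
    ecdf = np.searchsorted(np.sort(Yn), grid, side="right") / m
    dist_to_normal = np.max(np.abs(ecdf - norm.cdf(grid)))
    # P(|Y_n - X| > eps) does NOT vanish, so Y_n does not converge to X in probability.
    p_far_from_X = np.mean(np.abs(Yn - X) > eps)
    print(f"n = {n:4d}   sup|F_Yn - Phi| = {dist_to_normal:.3f}   "
          f"P(|Y_n - X| > {eps}) = {p_far_from_X:.3f}")
```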
We now present the Central Limit Theorem.

Theorem 3 (Central Limit Theorem). Let $X_1, \ldots, X_n$ be i.i.d. random variables with $E[X_k] = \mu$ and $\mathrm{Var}(X_k) = \sigma^2 < \infty$. Then
$$\sqrt{n}\,(Y_n - \mu) \overset{d}{\to} \mathcal{N}(0, \sigma^2),$$
where $Y_n = \frac{1}{n} \sum_{k=1}^{n} X_k$.

Proof. It is sufficient to prove that
$$\sqrt{n}\, \frac{Y_n - \mu}{\sigma} \overset{d}{\to} \mathcal{N}(0, 1).$$
Let $Z_n = \sqrt{n}\left( \frac{Y_n - \mu}{\sigma} \right)$. The moment generating function of $Z_n$ is
$$M_{Z_n}(s) \overset{\text{def}}{=} E\left[ e^{s Z_n} \right] = E\left[ e^{s \sqrt{n} \frac{Y_n - \mu}{\sigma}} \right] = \prod_{k=1}^{n} E\left[ e^{\frac{s}{\sigma \sqrt{n}} (X_k - \mu)} \right].$$
By Taylor approximation, we have
$$E\left[ e^{\frac{s}{\sigma \sqrt{n}} (X_k - \mu)} \right] = E\left[ 1 + \frac{s}{\sigma \sqrt{n}} (X_k - \mu) + \frac{s^2}{2 \sigma^2 n} (X_k - \mu)^2 + O\!\left( \frac{s^3}{\sigma^3 \sqrt{n^3}} (X_k - \mu)^3 \right) \right] \approx 1 + 0 + \frac{s^2}{2n}.$$
Therefore,
$$M_{Z_n}(s) = \left( 1 + 0 + \frac{s^2}{2n} \right)^{n} \overset{(a)}{\longrightarrow} e^{\frac{s^2}{2}}$$
as $n \to \infty$. To prove (a), we let $y_n = \left( 1 + \frac{s^2}{2n} \right)^{n}$. Then $\log y_n = n \log\left( 1 + \frac{s^2}{2n} \right)$, and by Taylor approximation we have
$$\log(1 + x_0) \approx x_0 - \frac{x_0^2}{2}.$$
Therefore,
$$\log y_n = n \log\left( 1 + \frac{s^2}{2n} \right) \approx n\left( \frac{s^2}{2n} - \frac{s^4}{8n^2} \right) = \frac{s^2}{2} - \frac{s^4}{8n} \;\xrightarrow{\;n \to \infty\;}\; \frac{s^2}{2}. \qquad ✷$$

As a corollary of the Central Limit Theorem, we also derive the following proposition.

Proposition 4 (Delta Method). If $\sqrt{n}\,(T_n - \theta) \overset{d}{\to} \mathcal{N}(0, \tau^2)$, then $\sqrt{n}\,(f(T_n) - f(\theta)) \overset{d}{\to} \mathcal{N}\big(0, \tau^2 (f'(\theta))^2\big)$, provided $f'(\theta)$ exists.
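As a numerical illustration tying the Delta Method back to the earlier Poisson example, consider the following minimal sketch (not part of the original note; $\lambda = 2$, the number of trials, and the sample sizes are arbitrary choices). With $T_n = Y_n$, $\theta = \lambda$, $\tau^2 = \lambda$, and $f(t) = e^{-t}$, Proposition 4 predicts that the variance of $\sqrt{n}\,(e^{-Y_n} - e^{-\lambda})$ approaches $\lambda e^{-2\lambda}$.

```python
import numpy as np

# Minimal sketch checking the Delta Method prediction, assuming X_k ~ Poisson(lam)
# with lam = 2 and f(t) = exp(-t); lam and the sample sizes are arbitrary choices.
rng = np.random.default_rng(0)
lam, trials = 2.0, 20_000
predicted_var = lam * np.exp(-2.0 * lam)   # tau^2 * (f'(theta))^2 = lam * exp(-2*lam)

for n in [10, 100, 1000]:
    # `trials` independent copies of the sample mean Y_n of n i.i.d. Poisson(lam) variables.
    Yn = rng.poisson(lam, size=(trials, n)).mean(axis=1)
    Z = np.sqrt(n) * (np.exp(-Yn) - np.exp(-lam))   # sqrt(n) * (f(Y_n) - f(lambda))
    print(f"n = {n:5d}   sample variance = {Z.var():.4f}   "
          f"Delta Method prediction = {predicted_var:.4f}")
```

For small $n$ the higher-order terms of $f$ bias the result slightly, but the sample variance approaches the predicted value as $n$ grows.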