Lecture Notes 4: Expectation
• Definition and Properties
• Covariance and Correlation
• Linear MSE Estimation
• Sum of RVs
• Conditional Expectation
• Iterated Expectation
• Nonlinear MSE Estimation
• Sum of Random Number of RVs

Corresponding pages from B&T: 81-92, 94-98, 104-115, 160-163, 171-174, 179, 225-233, 236-247.

Definition

• We already introduced the notion of expectation (mean) of a r.v.
• We generalize this definition and discuss it in more depth
• Let $X$ be a discrete r.v. taking values in $\mathcal{X}$ with pmf $p_X(x)$, and let $g(x)$ be a function of $x$. The expectation or expected value of $g(X)$ is defined as
$$E(g(X)) = \sum_{x \in \mathcal{X}} g(x)\, p_X(x)$$
• For a continuous r.v. $X \sim f_X(x)$, the expected value of $g(X)$ is defined as
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$$
• Examples:
  ◦ $g(X) = c$, a constant: $E(g(X)) = c$
  ◦ $g(X) = X$: $E(X) = \sum_x x\, p_X(x)$ is the mean of $X$
  ◦ $g(X) = X^k$: $E(X^k)$ is the $k$th moment of $X$
  ◦ $g(X) = (X - E(X))^2$: $E\big[(X - E(X))^2\big]$ is the variance of $X$
• Expectation is linear, i.e., for any constants $a$ and $b$
$$E[a\,g_1(X) + b\,g_2(X)] = a\,E(g_1(X)) + b\,E(g_2(X))$$
  Examples:
  ◦ $E(aX + b) = a\,E(X) + b$
  ◦ $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
    Proof: From the definition,
$$\begin{aligned}
\mathrm{Var}(aX + b) &= E\big[\big((aX + b) - E(aX + b)\big)^2\big] \\
&= E\big[\big(aX + b - a\,E(X) - b\big)^2\big] \\
&= E\big[a^2 (X - E(X))^2\big] \\
&= a^2\, E\big[(X - E(X))^2\big] = a^2\,\mathrm{Var}(X)
\end{aligned}$$

Fundamental Theorem of Expectation

• Theorem: Let $X \sim p_X(x)$ and $Y = g(X) \sim p_Y(y)$. Then
$$E(Y) = \sum_{y \in \mathcal{Y}} y\, p_Y(y) = \sum_{x \in \mathcal{X}} g(x)\, p_X(x) = E(g(X))$$
• The same formula holds for $f_Y(y)$, using integrals instead of sums
• Conclusion: $E(Y)$ can be found using either $f_X(x)$ or $f_Y(y)$. It is often much easier to use $f_X(x)$ than to first find $f_Y(y)$ and then find $E(Y)$
• Proof: We prove the theorem for discrete r.v.s. Consider
$$\begin{aligned}
E(Y) &= \sum_{y} y\, p_Y(y) \\
&= \sum_{y} y \sum_{\{x:\, g(x) = y\}} p_X(x) \\
&= \sum_{y} \sum_{\{x:\, g(x) = y\}} y\, p_X(x) \\
&= \sum_{y} \sum_{\{x:\, g(x) = y\}} g(x)\, p_X(x) = \sum_{x} g(x)\, p_X(x)
\end{aligned}$$
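As a quick numerical check of the fundamental theorem of expectation, here is a minimal Python sketch. The particular pmf and function $g$ below are illustrative choices, not taken from the notes: it computes $E(g(X))$ directly from $p_X$ and also by first deriving the pmf of $Y = g(X)$, and the two answers agree.

```python
from collections import defaultdict

# Illustrative choice (not from the notes): X uniform on {-2, -1, 0, 1, 2}, g(x) = x^2
p_X = {-2: 0.2, -1: 0.2, 0: 0.2, 1: 0.2, 2: 0.2}
g = lambda x: x ** 2

# E(g(X)) computed directly from the pmf of X
E_gX = sum(g(x) * p for x, p in p_X.items())

# Derive the pmf of Y = g(X), then compute E(Y) = sum_y y * p_Y(y)
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p
E_Y = sum(y * p for y, p in p_Y.items())

print(E_gX, E_Y)  # both equal 2.0, as the theorem asserts
```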
Expectation Involving Two RVs

• Let $(X, Y) \sim f_{X,Y}(x, y)$ and let $g(x, y)$ be a function of $x$ and $y$. The expectation of $g(X, Y)$ is defined as
$$E(g(X, Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy$$
  The function $g(X, Y)$ may be $X$, $Y$, $X^2$, $X + Y$, etc.
• The correlation of $X$ and $Y$ is defined as $E(XY)$
• The covariance of $X$ and $Y$ is defined as
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E(X))(Y - E(Y))] \\
&= E[XY - X\,E(Y) - Y\,E(X) + E(X)\,E(Y)] \\
&= E(XY) - E(X)\,E(Y)
\end{aligned}$$
  Note that if $X = Y$, then $\mathrm{Cov}(X, Y) = \mathrm{Var}(X)$
• Example: Let
$$f(x, y) = \begin{cases} 2 & \text{for } x, y \ge 0,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
  Find $E(X)$, $\mathrm{Var}(X)$, and $\mathrm{Cov}(X, Y)$

  Solution: The mean is
$$E(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f(x, y)\, dy\, dx = \int_0^1 \int_0^{1-x} 2x\, dy\, dx = \int_0^1 2x(1 - x)\, dx = \frac{1}{3}$$
  To find the variance, we first find the second moment
$$E(X^2) = \int_0^1 \int_0^{1-x} 2x^2\, dy\, dx = \int_0^1 2x^2(1 - x)\, dx = \frac{1}{6}$$
  Thus,
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}$$
  The covariance of $X$ and $Y$ is
$$\mathrm{Cov}(X, Y) = \int_0^1 \int_0^{1-x} 2xy\, dy\, dx - E(X)\,E(Y) = \int_0^1 x(1 - x)^2\, dx - \frac{1}{9} = \frac{1}{12} - \frac{1}{9} = -\frac{1}{36},$$
  where by symmetry $E(Y) = E(X) = 1/3$

Independence and Uncorrelation

• Let $X$ and $Y$ be independent r.v.s and $g(X)$ and $h(Y)$ be functions of $X$ and $Y$, respectively. Then
$$E(g(X)h(Y)) = E(g(X))\,E(h(Y))$$
  Proof: Let's assume that $X \sim f_X(x)$ and $Y \sim f_Y(y)$. Then
$$\begin{aligned}
E(g(X)h(Y)) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)h(y)\, f_{X,Y}(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)h(y)\, f_X(x) f_Y(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \int_{-\infty}^{\infty} h(y) f_Y(y)\, dy \\
&= E(g(X))\,E(h(Y))
\end{aligned}$$
• $X$ and $Y$ are said to be uncorrelated if $\mathrm{Cov}(X, Y) = 0$, or equivalently $E(XY) = E(X)\,E(Y)$
• From our independence result, if $X$ and $Y$ are independent then they are uncorrelated
  To show this, set $g(X) = X - E(X)$ and $h(Y) = Y - E(Y)$. Then
$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(X - E(X))\,E(Y - E(Y)) = 0$$
• However, if $X$ and $Y$ are uncorrelated they are not necessarily independent
• Example: Let $X, Y \in \{-2, -1, 1, 2\}$ such that
$$p_{X,Y}(1, 1) = 2/5,\quad p_{X,Y}(-1, -1) = 2/5,\quad p_{X,Y}(-2, 2) = 1/10,\quad p_{X,Y}(2, -2) = 1/10,$$
  and $p_{X,Y}(x, y) = 0$ otherwise. Are $X$ and $Y$ independent? Are they uncorrelated?

  Solution:

  [Figure: the four point masses of $p_{X,Y}$ in the $(x, y)$ plane: probability 2/5 at $(1, 1)$ and $(-1, -1)$, and 1/10 at $(-2, 2)$ and $(2, -2)$]

  Clearly $X$ and $Y$ are not independent, since if you know the outcome of one, you completely know the outcome of the other. Let's check their covariance:
$$E(X) = \frac{2}{5} - \frac{2}{5} - \frac{2}{10} + \frac{2}{10} = 0, \qquad \text{also } E(Y) = 0, \text{ and}$$
$$E(XY) = \frac{2}{5} + \frac{2}{5} - \frac{4}{10} - \frac{4}{10} = 0$$
  Thus, $\mathrm{Cov}(X, Y) = 0$, and $X$ and $Y$ are uncorrelated!

The Correlation Coefficient

• The correlation coefficient of $X$ and $Y$ is defined as
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
• Fact: $|\rho_{X,Y}| \le 1$. To show this, consider
$$E\left[\left(\frac{X - E(X)}{\sigma_X} \pm \frac{Y - E(Y)}{\sigma_Y}\right)^2\right] \ge 0$$
$$\frac{E\big[(X - E(X))^2\big]}{\sigma_X^2} + \frac{E\big[(Y - E(Y))^2\big]}{\sigma_Y^2} \pm \frac{2\,E[(X - E(X))(Y - E(Y))]}{\sigma_X \sigma_Y} \ge 0$$
$$1 + 1 \pm 2\rho_{X,Y} \ge 0 \;\Longrightarrow\; -2 \le 2\rho_{X,Y} \le 2 \;\Longrightarrow\; |\rho_{X,Y}| \le 1$$
• From the proof, $\rho_{X,Y} = \pm 1$ iff $(X - E(X))/\sigma_X = \pm (Y - E(Y))/\sigma_Y$ (equality with probability 1), i.e., iff $X - E(X)$ is a linear function of $Y - E(Y)$
• In general, $\rho_{X,Y}$ is a measure of how closely $X - E(X)$ can be approximated or estimated by a linear function of $Y - E(Y)$
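Here is a rough Monte Carlo check of the triangle-density example above and of the definition of $\rho_{X,Y}$. The rejection-sampling approach is an implementation choice, not something specified in the notes; with enough samples it should return $E(X) \approx 1/3$, $\mathrm{Var}(X) \approx 1/18$, $\mathrm{Cov}(X, Y) \approx -1/36$, and hence (using $\mathrm{Var}(Y) = \mathrm{Var}(X)$ by symmetry) $\rho_{X,Y} \approx -1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rejection sampling from f(x, y) = 2 on the triangle {x, y >= 0, x + y <= 1}:
# draw uniform points in the unit square and keep those below the line x + y = 1.
n = 1_000_000
x = rng.random(n)
y = rng.random(n)
keep = x + y <= 1
x, y = x[keep], y[keep]

mean_x = x.mean()                                    # ~ 1/3
var_x = x.var()                                      # ~ 1/18
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))    # ~ -1/36
rho = cov_xy / np.sqrt(x.var() * y.var())            # ~ -1/2

print(mean_x, var_x, cov_xy, rho)
```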
Application: Linear MSE Estimation

• Consider the following signal processing problem:

  [Block diagram: signal $X$ → noisy channel → observation $Y$ → estimator → $\hat{X} = aY + b$]

• Here $X$ is a signal (music, speech, image) and $Y$ is a noisy observation of $X$ (output of a noisy communication channel or a noisy circuit). Assume we know the means, variances, and covariance of $X$ and $Y$
• Observing $Y$, we wish to find a linear estimate of $X$ of the form $\hat{X} = aY + b$ that minimizes the mean square error
$$\mathrm{MSE} = E\big[(X - \hat{X})^2\big]$$
• We denote the best such estimate as the minimum mean square error (MMSE) linear estimate
• The MMSE linear estimate of $X$ given $Y$ is given by
$$\hat{X} = \frac{\mathrm{Cov}(X, Y)}{\sigma_Y^2}\,(Y - E(Y)) + E(X) = \rho_{X,Y}\,\sigma_X\,\frac{Y - E(Y)}{\sigma_Y} + E(X),$$
  and its MSE is given by
$$\mathrm{MSE} = \sigma_X^2 - \frac{\mathrm{Cov}^2(X, Y)}{\sigma_Y^2} = (1 - \rho_{X,Y}^2)\,\sigma_X^2$$
• Properties of the MMSE linear estimate:
  – $E(\hat{X}) = E(X)$, i.e., the estimate is unbiased
  – If $\rho_{X,Y} = 0$, i.e., $X$ and $Y$ are uncorrelated, then $\hat{X} = E(X)$ (ignore the observation $Y$)
  – If $\rho_{X,Y} = \pm 1$, i.e., $X - E(X)$ and $Y - E(Y)$ are linearly dependent, then the linear estimate is perfect

Proof

• We first show that $\min_b E\big[(X - b)^2\big] = \mathrm{Var}(X)$, achieved for $b = E(X)$, i.e., in the absence of any observations, the mean of $X$ is its minimum MSE estimate, and the minimum MSE is $\mathrm{Var}(X)$
  To show this, consider
$$\begin{aligned}
E\big[(X - b)^2\big] &= E\big[\big((X - E(X)) + (E(X) - b)\big)^2\big] \\
&= E\big[(X - E(X))^2\big] + (E(X) - b)^2 + 2(E(X) - b)\,E(X - E(X)) \\
&= E\big[(X - E(X))^2\big] + (E(X) - b)^2 \\
&\ge E\big[(X - E(X))^2\big],
\end{aligned}$$
  with equality iff $b = E(X)$
• Now, back to our problem. Suppose $a$ has already been chosen. What should $b$ be to minimize $E\big[(X - aY - b)^2\big]$? From the above result (applied to the r.v. $X - aY$), we should choose
$$b = E(X - aY) = E(X) - a\,E(Y)$$
  So, we want to choose $a$ to minimize $E\big[\big((X - aY) - E(X - aY)\big)^2\big]$, which is the same as
$$E\big[\big((X - E(X)) - a(Y - E(Y))\big)^2\big] = \sigma_X^2 + a^2 \sigma_Y^2 - 2a\,\mathrm{Cov}(X, Y)$$
  This is a quadratic function of $a$. It is minimized when its derivative equals 0, which gives
$$a = \frac{\mathrm{Cov}(X, Y)}{\sigma_Y^2} = \frac{\rho_{X,Y}\,\sigma_X \sigma_Y}{\sigma_Y^2} = \frac{\rho_{X,Y}\,\sigma_X}{\sigma_Y}$$
  The mean square error is given by
$$\sigma_X^2 + a^2 \sigma_Y^2 - 2a\,\mathrm{Cov}(X, Y) = \sigma_X^2 + \frac{\rho_{X,Y}^2\,\sigma_X^2}{\sigma_Y^2}\,\sigma_Y^2 - 2\,\frac{\rho_{X,Y}\,\sigma_X}{\sigma_Y}\,\rho_{X,Y}\,\sigma_X \sigma_Y = (1 - \rho_{X,Y}^2)\,\sigma_X^2$$
  (A short numerical sketch of this estimator is given at the end of these notes.)

Mean and Variance of Sum of RVs

• Let $X_1, X_2, \ldots, X_n$ be r.v.s. Then, by linearity of expectation, the expected value of their sum $Y$ is
$$E(Y) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i)$$
  Example: Mean of Binomial r.v.
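The notes break off at this example. As a hedged numerical illustration of where the linearity result leads, the sketch below writes a Binomial$(n, p)$ r.v. as a sum of $n$ independent Bernoulli$(p)$ indicators (the standard decomposition, assumed here rather than quoted from the notes), so that $E(X) = \sum_i E(X_i) = np$.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 20, 0.3
# X = X_1 + ... + X_n with X_i ~ Bernoulli(p); by linearity E(X) = n * E(X_1) = n * p
trials = rng.random((200_000, n)) < p   # each row: n Bernoulli(p) indicators
X = trials.sum(axis=1)                  # Binomial(n, p) samples

print(X.mean(), n * p)                  # empirical mean ~ 6.0 = n * p
```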
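Returning to the linear MMSE estimator derived earlier, here is a minimal numerical sketch. The joint model below ($X$ standard Gaussian and $Y = X$ plus independent Gaussian noise) is an assumed toy example, not one given in the notes; it only serves to check that $\hat{X} = aY + b$ built from the moments attains $\mathrm{MSE} \approx (1 - \rho_{X,Y}^2)\,\sigma_X^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy channel (not from the notes): Y = X + Z with X ~ N(0, 1), Z ~ N(0, 0.5) independent
n = 500_000
X = rng.normal(0.0, 1.0, n)
Y = X + rng.normal(0.0, np.sqrt(0.5), n)

# Moments needed for the linear MMSE estimate
mx, my = X.mean(), Y.mean()
var_y = Y.var()
cov_xy = np.mean((X - mx) * (Y - my))
rho = cov_xy / np.sqrt(X.var() * var_y)

# Xhat = a*Y + b with a = Cov(X, Y)/Var(Y) and b = E(X) - a*E(Y)
a = cov_xy / var_y
b = mx - a * my
Xhat = a * Y + b

mse = np.mean((X - Xhat) ** 2)
print(mse, (1 - rho ** 2) * X.var())    # the two agree (~ 1/3 for this toy model)
```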