Lecture Notes 4: Expectation
• Definition and Properties
• Covariance and Correlation
• Linear MSE Estimation
• Sum of RVs
• Conditional Expectation
• Iterated Expectation
• Nonlinear MSE Estimation
• Sum of Random Number of RVs

Corresponding pages from B&T: 81-92, 94-98, 104-115, 160-163, 171-174, 179, 225-233, 236-247.
EE 178/278A: Expectation Page 4–1
Definition
• We already introduced the notion of expectation (mean) of a r.v.
• We generalize this definition and discuss it in more depth
• Let X ∈ 𝒳 be a discrete r.v. with pmf pX(x) and let g(x) be a function of x. The expectation or expected value of g(X) is defined as

E(g(X)) = Σ_{x ∈ 𝒳} g(x) pX(x)

• For a continuous r.v. X ∼ fX(x), the expected value of g(X) is defined as

E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

• Examples:
◦ g(X) = c, a constant, then E(g(X)) = c
◦ g(X) = X, then E(X) = Σ_x x pX(x) is the mean of X
◦ g(X) = X^k, then E(X^k) is the kth moment of X
◦ g(X) = (X − E(X))², then E[(X − E(X))²] is the variance of X
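The definition E(g(X)) = Σ_x g(x) pX(x) translates directly into code. A minimal Python sketch (the pmf below is a hypothetical example, not from the notes):

```python
# E[g(X)] for a discrete r.v.: sum g(x) * p_X(x) over the alphabet.
# Hypothetical pmf: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)              # E(X), the mean
second_moment = expect(lambda x: x**2, pmf)  # E(X^2), the 2nd moment
var = expect(lambda x: (x - mean)**2, pmf)   # Var(X) = E[(X - E(X))^2]
print(mean, second_moment, var)
```

The same `expect` helper computes every example above (constant, mean, kth moment, variance) just by swapping in a different g.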
• Expectation is linear, i.e., for any constants a and b
E[a g1(X) + b g2(X)] = a E(g1(X)) + b E(g2(X))

Examples:
◦ E(aX + b) = a E(X) + b
◦ Var(aX + b) = a² Var(X)

Proof: From the definition,

Var(aX + b) = E[((aX + b) − E(aX + b))²]
= E[(aX + b − a E(X) − b)²]
= E[a²(X − E(X))²]
= a² E[(X − E(X))²] = a² Var(X)
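The identities E(aX + b) = a E(X) + b and Var(aX + b) = a² Var(X) can be checked numerically on any pmf. A small sketch with hypothetical numbers:

```python
# Numeric check of E(aX + b) = a E(X) + b and Var(aX + b) = a^2 Var(X)
# on a small hypothetical pmf.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
a, b = 3.0, -7.0

def expect(g):
    return sum(g(x) * p for x, p in pmf.items())

mean_x = expect(lambda x: x)
var_x = expect(lambda x: (x - mean_x)**2)

mean_y = expect(lambda x: a * x + b)               # E(aX + b) directly
var_y = expect(lambda x: (a * x + b - mean_y)**2)  # Var(aX + b) directly

assert abs(mean_y - (a * mean_x + b)) < 1e-12      # linearity of E
assert abs(var_y - a**2 * var_x) < 1e-12           # b drops out of Var
```

Note that the additive constant b shifts the mean but, as the proof shows, cancels out of the variance entirely.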
Fundamental Theorem of Expectation
• Theorem: Let X ∼ pX(x) and Y = g(X) ∼ pY(y). Then

E(Y) = Σ_{y ∈ 𝒴} y pY(y) = Σ_{x ∈ 𝒳} g(x) pX(x) = E(g(X))

• The same formula holds for fY(y), using integrals instead of sums
• Conclusion: E(Y) can be found using either fX(x) or fY(y). It is often much easier to use fX(x) than to first find fY(y) and then find E(Y)
• Proof: We prove the theorem for discrete r.v.s. Consider

E(Y) = Σ_y y pY(y)
= Σ_y y Σ_{x: g(x)=y} pX(x)
= Σ_y Σ_{x: g(x)=y} y pX(x)
= Σ_y Σ_{x: g(x)=y} g(x) pX(x)
= Σ_x g(x) pX(x) = E(g(X))
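Both routes of the theorem are easy to compare in code: compute E[g(X)] from pX directly, and also build pY by pooling the mass of {x : g(x) = y} and compute E(Y) from it. A sketch with a hypothetical pmf:

```python
# Fundamental theorem of expectation: E[g(X)] via p_X equals E(Y) via p_Y,
# where Y = g(X). Hypothetical pmf; g(x) = x**2 maps -1 and 1 to the same y.
from collections import defaultdict

p_x = {-1: 0.3, 0: 0.4, 1: 0.2, 2: 0.1}
g = lambda x: x**2

# Route 1: E[g(X)] = sum_x g(x) p_X(x), no need for p_Y.
e_gx = sum(g(x) * p for x, p in p_x.items())

# Route 2: first derive p_Y(y) = sum over {x: g(x) = y} of p_X(x).
p_y = defaultdict(float)
for x, p in p_x.items():
    p_y[g(x)] += p
e_y = sum(y * p for y, p in p_y.items())

assert abs(e_gx - e_y) < 1e-12  # the two routes agree
```

Route 1 mirrors the left-hand side of the proof; the `defaultdict` accumulation is exactly the inner sum over {x : g(x) = y}.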
Expectation Involving Two RVs
• Let (X,Y) ∼ fX,Y(x,y) and let g(x,y) be a function of x and y. The expectation of g(X,Y) is defined as

E(g(X,Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x,y) fX,Y(x,y) dx dy

The function g(X,Y) may be X, Y, X², X + Y, etc.
• The correlation of X and Y is defined as E(XY )
• The covariance of X and Y is defined as
Cov(X,Y) = E[(X − E(X))(Y − E(Y))]
= E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
= E(XY) − E(X) E(Y)
Note that if X = Y , then Cov(X,Y ) = Var(X)
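For discrete pairs, both forms of the covariance are direct sums over the joint pmf. A sketch with a hypothetical joint pmf:

```python
# Correlation E(XY) and covariance from a small hypothetical joint pmf.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def e(g):
    """E[g(X,Y)] = sum over (x,y) of g(x,y) * p_{X,Y}(x,y)."""
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
exy = e(lambda x, y: x * y)                      # correlation E(XY)
cov = exy - ex * ey                              # Cov = E(XY) - E(X)E(Y)
cov2 = e(lambda x, y: (x - ex) * (y - ey))       # defining form of Cov
assert abs(cov - cov2) < 1e-12                   # the two forms agree
```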
• Example: Let

f(x,y) = 2 for x, y ≥ 0, x + y ≤ 1, and 0 otherwise

Find E(X), Var(X), and Cov(X,Y)

Solution: The mean is

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x,y) dy dx = ∫_0^1 ∫_0^{1−x} 2x dy dx = 2 ∫_0^1 x(1 − x) dx = 1/3

To find the variance, we first find the second moment:

E(X²) = ∫_0^1 ∫_0^{1−x} 2x² dy dx = 2 ∫_0^1 x²(1 − x) dx = 1/6

Thus,

Var(X) = E(X²) − (E(X))² = 1/6 − 1/9 = 1/18
The covariance of X and Y is
Cov(X,Y) = ∫_0^1 ∫_0^{1−x} 2xy dy dx − E(X) E(Y)
= ∫_0^1 x(1 − x)² dx − 1/9 = 1/12 − 1/9 = −1/36
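These integrals can be sanity-checked numerically. The sketch below performs the inner y-integrals in closed form (exactly as in the derivation above) and evaluates the remaining 1-D integrals with a midpoint rule; the grid size is an arbitrary choice:

```python
# Numeric check for f(x,y) = 2 on the triangle x, y >= 0, x + y <= 1.
# Inner y-integrals done in closed form; outer integrals via midpoint rule.
n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

ex  = sum(2 * x * (1 - x) for x in xs) * h       # E(X), should be 1/3
ex2 = sum(2 * x**2 * (1 - x) for x in xs) * h    # E(X^2), should be 1/6
exy = sum(x * (1 - x)**2 for x in xs) * h        # E(XY), should be 1/12
var = ex2 - ex**2                                # Var(X), should be 1/18
cov = exy - ex * ex                              # E(Y) = E(X) by symmetry; -1/36
print(ex, var, cov)
```

Note the shortcut in the last line: the density is symmetric in x and y, so E(Y) = E(X) and E(X)E(Y) = (E(X))².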
Independence and Uncorrelation
• Let X and Y be independent r.v.s and let g(X) and h(Y) be functions of X and Y, respectively. Then

E(g(X)h(Y)) = E(g(X)) E(h(Y))

Proof: Let's assume that X ∼ fX(x) and Y ∼ fY(y). Then

E(g(X)h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fX,Y(x,y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fX(x) fY(y) dx dy
= (∫_{−∞}^{∞} g(x) fX(x) dx)(∫_{−∞}^{∞} h(y) fY(y) dy)
= E(g(X)) E(h(Y))
• X and Y are said to be uncorrelated if Cov(X,Y ) = 0, or equivalently E(XY )=E(X) E(Y )
• From our independence result, if X and Y are independent then they are uncorrelated. To show this, set g(X) = (X − E(X)) and h(Y) = (Y − E(Y)), then
Cov(X,Y) = E[(X − E(X))(Y − E(Y))] = E(X − E(X)) E(Y − E(Y)) = 0
• However, if X and Y are uncorrelated they are not necessarily independent
• Example: Let X,Y ∈ {−2, −1, 1, 2} such that
pX,Y (1, 1) = 2/5, pX,Y (−1, −1) = 2/5
pX,Y (−2, 2) = 1/10, pX,Y (2, −2) = 1/10,
pX,Y (x, y) = 0, otherwise
Are X and Y independent? Are they uncorrelated?
Solution: (figure: the four pmf points plotted in the x-y plane, with mass 2/5 at (1,1) and (−1,−1), and mass 1/10 at (−2,2) and (2,−2))
Clearly X and Y are not independent, since if you know the outcome of one, you completely know the outcome of the other. Let's check their covariance:

E(X) = 2/5 − 2/5 − 2/10 + 2/10 = 0, and similarly E(Y) = 0, and

E(XY) = 2/5 + 2/5 − 4/10 − 4/10 = 0

Thus Cov(X,Y) = 0, and X and Y are uncorrelated!
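The arithmetic above can be verified mechanically. The pmf is the one from the example; the code itself is just an illustrative check:

```python
# The example pmf: dependent but uncorrelated.
p_xy = {(1, 1): 0.4, (-1, -1): 0.4, (-2, 2): 0.1, (2, -2): 0.1}

def e(g):
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
exy = e(lambda x, y: x * y)
cov = exy - ex * ey
print(ex, ey, cov)      # all (essentially) zero: uncorrelated

# Dependence: the joint pmf does not factor, e.g. at (1, 1).
p_x1 = sum(p for (x, y), p in p_xy.items() if x == 1)   # p_X(1)
p_y1 = sum(p for (x, y), p in p_xy.items() if y == 1)   # p_Y(1)
assert p_xy[(1, 1)] != p_x1 * p_y1                      # 0.4 != 0.4 * 0.4
```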
The Correlation Coefficient
• The correlation coefficient of X and Y is defined as

ρX,Y = Cov(X,Y) / √(Var(X) Var(Y)) = Cov(X,Y)/(σX σY)
• Fact: |ρX,Y| ≤ 1. To show this, consider

E[((X − E(X))/σX ± (Y − E(Y))/σY)²] ≥ 0

E[(X − E(X))²]/σX² + E[(Y − E(Y))²]/σY² ± 2 E[(X − E(X))(Y − E(Y))]/(σX σY) ≥ 0

1 + 1 ± 2ρX,Y ≥ 0 =⇒ −2 ≤ 2ρX,Y ≤ 2 =⇒ |ρX,Y| ≤ 1
• From the proof, ρX,Y = ±1 iff (X − E(X))/σX = ±(Y − E(Y ))/σY (equality with probability 1), i.e., iff X − E(X) is a linear function of Y − E(Y )
• In general ρX,Y is a measure of how closely (X − E(X)) can be approximated or estimated by a linear function of (Y − E(Y ))
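A short sketch of ρX,Y computed from a joint pmf (all the numbers are hypothetical), including a check of the two facts above: |ρ| ≤ 1 always, and ρ = 1 when Y is a positive-slope linear function of X:

```python
# Correlation coefficient rho = Cov(X,Y) / sqrt(Var(X) Var(Y)) from a joint pmf.
import math

def rho(p_xy):
    e = lambda g: sum(g(x, y) * p for (x, y), p in p_xy.items())
    ex, ey = e(lambda x, y: x), e(lambda x, y: y)
    var_x = e(lambda x, y: (x - ex)**2)
    var_y = e(lambda x, y: (y - ey)**2)
    cov = e(lambda x, y: (x - ex) * (y - ey))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical pmf: |rho| <= 1 must hold.
r1 = rho({(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2})
assert abs(r1) <= 1

# Y = 3X + 1 with probability 1 (positive slope)  =>  rho = +1.
r2 = rho({(0, 1): 0.5, (1, 4): 0.5})
assert abs(r2 - 1.0) < 1e-12
```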
Application: Linear MSE Estimation
• Consider the following signal processing problem:
(figure: X → noisy channel → Y → linear estimator → X̂ = aY + b)
• Here X is a signal (music, speech, image) and Y is a noisy observation of X (output of a noisy communication channel or a noisy circuit). Assume we know the means, variances and covariance of X and Y
• Observing Y, we wish to find a linear estimate of X of the form X̂ = aY + b that minimizes the mean square error MSE = E[(X − X̂)²]
• We call the best such estimate the linear minimum mean square error (MMSE) estimate
• The MMSE linear estimate of X given Y is given by
X̂ = (Cov(X,Y)/σY²)(Y − E(Y)) + E(X) = ρX,Y σX (Y − E(Y))/σY + E(X)

and its MSE is given by

MSE = σX² − Cov²(X,Y)/σY² = (1 − ρX,Y²) σX²
• Properties of the MMSE linear estimate:
– E(X̂) = E(X), i.e., the estimate is unbiased
– If ρX,Y = 0, i.e., X and Y are uncorrelated, then X̂ = E(X) (ignore the observation Y)
– If ρX,Y = ±1, i.e., X − E(X) and Y − E(Y) are linearly dependent, then the linear estimate is perfect
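The formulas above can be tried out on simulated data. In the sketch below all the numbers (signal mean and variance, noise variance, sample size) are hypothetical choices; moments are estimated empirically, the linear estimate is formed from them, and its empirical MSE is compared to (1 − ρ²)σX²:

```python
# Linear MMSE estimation on simulated data: Y = X + noise (hypothetical model).
import random

random.seed(0)
n = 100_000
xs = [random.gauss(1.0, 2.0) for _ in range(n)]   # signal: E(X)=1, Var(X)=4
ys = [x + random.gauss(0.0, 1.0) for x in xs]     # observation: unit-variance noise

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
var_x = mean([(x - mx) ** 2 for x in xs])
var_y = mean([(y - my) ** 2 for y in ys])
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

a = cov / var_y                                   # slope Cov(X,Y)/Var(Y)
xhat = [a * (y - my) + mx for y in ys]            # Xhat = a (Y - E(Y)) + E(X)
mse = mean([(x - h) ** 2 for x, h in zip(xs, xhat)])

rho2 = cov ** 2 / (var_x * var_y)
print(mse, (1 - rho2) * var_x)  # theory: rho^2 = 16/20, so both near 0.2 * 4 = 0.8
```

With these choices σX² = 4, σY² = 5 and Cov = 4, so ρ² = 0.8 and the theoretical MSE is 0.8; the empirical MSE matches (1 − ρ²)σX² up to sampling noise.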
Proof
• We first show that min_b E[(X − b)²] = Var(X), achieved for b = E(X); i.e., in the absence of any observations, the mean of X is its minimum MSE estimate, and the minimum MSE is Var(X)

To show this, consider

E[(X − b)²] = E[((X − E(X)) + (E(X) − b))²]