Lecture Notes 4: Expectation
• Definition and Properties
• Covariance and Correlation
• Linear MSE Estimation
• Sum of RVs
• Conditional Expectation
• Iterated Expectation
• Nonlinear MSE Estimation
• Sum of Random Number of RVs

Corresponding pages from B&T: 81-92, 94-98, 104-115, 160-163, 171-174, 179, 225-233, 236-247.
EE 178/278A: Expectation Page 4–1
Definition
• We already introduced the notion of expectation (mean) of a r.v.
• We generalize this definition and discuss it in more depth
• Let X ∈ 𝒳 be a discrete r.v. with pmf pX(x) and let g(x) be a function of x. The expectation or expected value of g(X) is defined as

E(g(X)) = Σ_{x ∈ 𝒳} g(x) pX(x)

• For a continuous r.v. X ∼ fX(x), the expected value of g(X) is defined as

E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

• Examples:
◦ g(X) = c, a constant, then E(g(X)) = c
◦ g(X) = X, then E(X) = Σ_x x pX(x) is the mean of X
◦ g(X) = X^k, then E(X^k) is the kth moment of X
◦ g(X) = (X − E(X))², then E[(X − E(X))²] is the variance of X
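The definition E(g(X)) = Σ_x g(x) pX(x) translates directly into code. A minimal Python sketch (the pmf below is a hypothetical example, not from the notes):

```python
# E[g(X)] for a discrete r.v.: sum g(x) * p_X(x) over the alphabet.
# Hypothetical pmf: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)              # E(X), the mean
second_moment = expect(lambda x: x**2, pmf)  # E(X^2), the 2nd moment
var = expect(lambda x: (x - mean)**2, pmf)   # Var(X) = E[(X - E(X))^2]
print(mean, second_moment, var)
```

The same `expect` helper computes every example above (constant, mean, kth moment, variance) just by swapping in a different g.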
• Expectation is linear, i.e., for any constants a and b
E[a g1(X) + b g2(X)] = a E(g1(X)) + b E(g2(X))

Examples:
◦ E(aX + b) = a E(X) + b
◦ Var(aX + b) = a² Var(X)

Proof: From the definition,

Var(aX + b) = E[((aX + b) − E(aX + b))²]
= E[(aX + b − a E(X) − b)²]
= E[a²(X − E(X))²]
= a² E[(X − E(X))²] = a² Var(X)
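The identities E(aX + b) = a E(X) + b and Var(aX + b) = a² Var(X) can be checked numerically on any pmf. A small sketch with hypothetical numbers:

```python
# Numeric check of E(aX + b) = a E(X) + b and Var(aX + b) = a^2 Var(X)
# on a small hypothetical pmf.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
a, b = 3.0, -7.0

def expect(g):
    return sum(g(x) * p for x, p in pmf.items())

mean_x = expect(lambda x: x)
var_x = expect(lambda x: (x - mean_x)**2)

mean_y = expect(lambda x: a * x + b)               # E(aX + b) directly
var_y = expect(lambda x: (a * x + b - mean_y)**2)  # Var(aX + b) directly

assert abs(mean_y - (a * mean_x + b)) < 1e-12      # linearity of E
assert abs(var_y - a**2 * var_x) < 1e-12           # b drops out of Var
```

Note that the additive constant b shifts the mean but, as the proof shows, cancels out of the variance entirely.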
Fundamental Theorem of Expectation
• Theorem: Let X ∼ pX(x) and Y = g(X) ∼ pY(y). Then

E(Y) = Σ_{y ∈ 𝒴} y pY(y) = Σ_{x ∈ 𝒳} g(x) pX(x) = E(g(X))

• The same formula holds for fY(y), using integrals instead of sums
• Conclusion: E(Y) can be found using either fX(x) or fY(y). It is often much easier to use fX(x) than to first find fY(y) and then find E(Y)
• Proof: We prove the theorem for discrete r.v.s. Consider

E(Y) = Σ_y y pY(y)
= Σ_y y Σ_{x: g(x)=y} pX(x)
= Σ_y Σ_{x: g(x)=y} y pX(x)
= Σ_y Σ_{x: g(x)=y} g(x) pX(x)
= Σ_x g(x) pX(x) = E(g(X))
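Both routes of the theorem are easy to compare in code: compute E[g(X)] from pX directly, and also build pY by pooling the mass of {x : g(x) = y} and compute E(Y) from it. A sketch with a hypothetical pmf:

```python
# Fundamental theorem of expectation: E[g(X)] via p_X equals E(Y) via p_Y,
# where Y = g(X). Hypothetical pmf; g(x) = x**2 maps -1 and 1 to the same y.
from collections import defaultdict

p_x = {-1: 0.3, 0: 0.4, 1: 0.2, 2: 0.1}
g = lambda x: x**2

# Route 1: E[g(X)] = sum_x g(x) p_X(x), no need for p_Y.
e_gx = sum(g(x) * p for x, p in p_x.items())

# Route 2: first derive p_Y(y) = sum over {x: g(x) = y} of p_X(x).
p_y = defaultdict(float)
for x, p in p_x.items():
    p_y[g(x)] += p
e_y = sum(y * p for y, p in p_y.items())

assert abs(e_gx - e_y) < 1e-12  # the two routes agree
```

Route 1 mirrors the left-hand side of the proof; the `defaultdict` accumulation is exactly the inner sum over {x : g(x) = y}.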
Expectation Involving Two RVs
• Let (X,Y) ∼ fX,Y(x,y) and let g(x,y) be a function of x and y. The expectation of g(X,Y) is defined as

E(g(X,Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x,y) fX,Y(x,y) dx dy

The function g(X,Y) may be X, Y, X², X + Y, etc.
• The correlation of X and Y is defined as E(XY )
• The covariance of X and Y is defined as
Cov(X,Y) = E[(X − E(X))(Y − E(Y))]
= E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
= E(XY) − E(X) E(Y)
Note that if X = Y , then Cov(X,Y ) = Var(X)
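For discrete pairs, both forms of the covariance are direct sums over the joint pmf. A sketch with a hypothetical joint pmf:

```python
# Correlation E(XY) and covariance from a small hypothetical joint pmf.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def e(g):
    """E[g(X,Y)] = sum over (x,y) of g(x,y) * p_{X,Y}(x,y)."""
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
exy = e(lambda x, y: x * y)                      # correlation E(XY)
cov = exy - ex * ey                              # Cov = E(XY) - E(X)E(Y)
cov2 = e(lambda x, y: (x - ex) * (y - ey))       # defining form of Cov
assert abs(cov - cov2) < 1e-12                   # the two forms agree
```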
• Example: Let

f(x,y) = 2 for x, y ≥ 0, x + y ≤ 1, and 0 otherwise

Find E(X), Var(X), and Cov(X,Y)

Solution: The mean is

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x,y) dy dx = ∫_0^1 ∫_0^{1−x} 2x dy dx = 2 ∫_0^1 x(1 − x) dx = 1/3

To find the variance, we first find the second moment:

E(X²) = ∫_0^1 ∫_0^{1−x} 2x² dy dx = 2 ∫_0^1 x²(1 − x) dx = 1/6

Thus,

Var(X) = E(X²) − (E(X))² = 1/6 − 1/9 = 1/18
The covariance of X and Y is
Cov(X,Y) = ∫_0^1 ∫_0^{1−x} 2xy dy dx − E(X) E(Y)
= ∫_0^1 x(1 − x)² dx − 1/9 = 1/12 − 1/9 = −1/36
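These integrals can be sanity-checked numerically. The sketch below performs the inner y-integrals in closed form (exactly as in the derivation above) and evaluates the remaining 1-D integrals with a midpoint rule; the grid size is an arbitrary choice:

```python
# Numeric check for f(x,y) = 2 on the triangle x, y >= 0, x + y <= 1.
# Inner y-integrals done in closed form; outer integrals via midpoint rule.
n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

ex  = sum(2 * x * (1 - x) for x in xs) * h       # E(X), should be 1/3
ex2 = sum(2 * x**2 * (1 - x) for x in xs) * h    # E(X^2), should be 1/6
exy = sum(x * (1 - x)**2 for x in xs) * h        # E(XY), should be 1/12
var = ex2 - ex**2                                # Var(X), should be 1/18
cov = exy - ex * ex                              # E(Y) = E(X) by symmetry; -1/36
print(ex, var, cov)
```

Note the shortcut in the last line: the density is symmetric in x and y, so E(Y) = E(X) and E(X)E(Y) = (E(X))².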
Independence and Uncorrelation
• Let X and Y be independent r.v.s and let g(X) and h(Y) be functions of X and Y, respectively. Then

E(g(X)h(Y)) = E(g(X)) E(h(Y))

Proof: Let's assume that X ∼ fX(x) and Y ∼ fY(y). Then

E(g(X)h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fX,Y(x,y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fX(x) fY(y) dx dy
= (∫_{−∞}^{∞} g(x) fX(x) dx)(∫_{−∞}^{∞} h(y) fY(y) dy)
= E(g(X)) E(h(Y))
• X and Y are said to be uncorrelated if Cov(X,Y ) = 0, or equivalently E(XY )=E(X) E(Y )
• From our independence result, if X and Y are independent then they are uncorrelated. To show this, set g(X) = (X − E(X)) and h(Y) = (Y − E(Y)), then
Cov(X,Y) = E[(X − E(X))(Y − E(Y))] = E(X − E(X)) E(Y − E(Y)) = 0
• However, if X and Y are uncorrelated they are not necessarily independent
• Example: Let X,Y ∈ {−2, −1, 1, 2} such that
pX,Y (1, 1) = 2/5, pX,Y (−1, −1) = 2/5
pX,Y (−2, 2) = 1/10, pX,Y (2, −2) = 1/10,
pX,Y (x, y) = 0, otherwise
Are X and Y independent? Are they uncorrelated?
Solution: (figure: the four pmf points plotted in the x-y plane, with mass 2/5 at (1,1) and (−1,−1), and mass 1/10 at (−2,2) and (2,−2))
Clearly X and Y are not independent, since if you know the outcome of one, you completely know the outcome of the other. Let's check their covariance:

E(X) = 2/5 − 2/5 − 2/10 + 2/10 = 0, and similarly E(Y) = 0, and

E(XY) = 2/5 + 2/5 − 4/10 − 4/10 = 0

Thus Cov(X,Y) = 0, and X and Y are uncorrelated!
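The arithmetic above can be verified mechanically. The pmf is the one from the example; the code itself is just an illustrative check:

```python
# The example pmf: dependent but uncorrelated.
p_xy = {(1, 1): 0.4, (-1, -1): 0.4, (-2, 2): 0.1, (2, -2): 0.1}

def e(g):
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
exy = e(lambda x, y: x * y)
cov = exy - ex * ey
print(ex, ey, cov)      # all (essentially) zero: uncorrelated

# Dependence: the joint pmf does not factor, e.g. at (1, 1).
p_x1 = sum(p for (x, y), p in p_xy.items() if x == 1)   # p_X(1)
p_y1 = sum(p for (x, y), p in p_xy.items() if y == 1)   # p_Y(1)
assert p_xy[(1, 1)] != p_x1 * p_y1                      # 0.4 != 0.4 * 0.4
```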
The Correlation Coefficient
• The correlation coefficient of X and Y is defined as

ρX,Y = Cov(X,Y) / √(Var(X) Var(Y)) = Cov(X,Y)/(σX σY)
• Fact: |ρX,Y| ≤ 1. To show this, consider

E[((X − E(X))/σX ± (Y − E(Y))/σY)²] ≥ 0

E[(X − E(X))²]/σX² + E[(Y − E(Y))²]/σY² ± 2 E[(X − E(X))(Y − E(Y))]/(σX σY) ≥ 0

1 + 1 ± 2ρX,Y ≥ 0 =⇒ −2 ≤ 2ρX,Y ≤ 2 =⇒ |ρX,Y| ≤ 1
• From the proof, ρX,Y = ±1 iff (X − E(X))/σX = ±(Y − E(Y ))/σY (equality with probability 1), i.e., iff X − E(X) is a linear function of Y − E(Y )
• In general ρX,Y is a measure of how closely (X − E(X)) can be approximated or estimated by a linear function of (Y − E(Y ))
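A short sketch of ρX,Y computed from a joint pmf (all the numbers are hypothetical), including a check of the two facts above: |ρ| ≤ 1 always, and ρ = 1 when Y is a positive-slope linear function of X:

```python
# Correlation coefficient rho = Cov(X,Y) / sqrt(Var(X) Var(Y)) from a joint pmf.
import math

def rho(p_xy):
    e = lambda g: sum(g(x, y) * p for (x, y), p in p_xy.items())
    ex, ey = e(lambda x, y: x), e(lambda x, y: y)
    var_x = e(lambda x, y: (x - ex)**2)
    var_y = e(lambda x, y: (y - ey)**2)
    cov = e(lambda x, y: (x - ex) * (y - ey))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical pmf: |rho| <= 1 must hold.
r1 = rho({(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2})
assert abs(r1) <= 1

# Y = 3X + 1 with probability 1 (positive slope)  =>  rho = +1.
r2 = rho({(0, 1): 0.5, (1, 4): 0.5})
assert abs(r2 - 1.0) < 1e-12
```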
Application: Linear MSE Estimation
• Consider the following signal processing problem:
(figure: X → noisy channel → Y → linear estimator → X̂ = aY + b)
• Here X is a signal (music, speech, image) and Y is a noisy observation of X (output of a noisy communication channel or a noisy circuit). Assume we know the means, variances and covariance of X and Y
• Observing Y, we wish to find a linear estimate of X of the form X̂ = aY + b that minimizes the mean square error MSE = E[(X − X̂)²]
• We call the best such estimate the linear minimum mean square error (MMSE) estimate
• The MMSE linear estimate of X given Y is given by
X̂ = (Cov(X,Y)/σY²)(Y − E(Y)) + E(X) = ρX,Y σX (Y − E(Y))/σY + E(X)

and its MSE is given by

MSE = σX² − Cov²(X,Y)/σY² = (1 − ρX,Y²) σX²
• Properties of the MMSE linear estimate:
– E(X̂) = E(X), i.e., the estimate is unbiased
– If ρX,Y = 0, i.e., X and Y are uncorrelated, then X̂ = E(X) (ignore the observation Y)
– If ρX,Y = ±1, i.e., X − E(X) and Y − E(Y) are linearly dependent, then the linear estimate is perfect
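The formulas above can be tried out on simulated data. In the sketch below all the numbers (signal mean and variance, noise variance, sample size) are hypothetical choices; moments are estimated empirically, the linear estimate is formed from them, and its empirical MSE is compared to (1 − ρ²)σX²:

```python
# Linear MMSE estimation on simulated data: Y = X + noise (hypothetical model).
import random

random.seed(0)
n = 100_000
xs = [random.gauss(1.0, 2.0) for _ in range(n)]   # signal: E(X)=1, Var(X)=4
ys = [x + random.gauss(0.0, 1.0) for x in xs]     # observation: unit-variance noise

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
var_x = mean([(x - mx) ** 2 for x in xs])
var_y = mean([(y - my) ** 2 for y in ys])
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

a = cov / var_y                                   # slope Cov(X,Y)/Var(Y)
xhat = [a * (y - my) + mx for y in ys]            # Xhat = a (Y - E(Y)) + E(X)
mse = mean([(x - h) ** 2 for x, h in zip(xs, xhat)])

rho2 = cov ** 2 / (var_x * var_y)
print(mse, (1 - rho2) * var_x)  # theory: rho^2 = 16/20, so both near 0.2 * 4 = 0.8
```

With these choices σX² = 4, σY² = 5 and Cov = 4, so ρ² = 0.8 and the theoretical MSE is 0.8; the empirical MSE matches (1 − ρ²)σX² up to sampling noise.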
Proof
• We first show that min_b E[(X − b)²] = Var(X), achieved for b = E(X); i.e., in the absence of any observations, the mean of X is its minimum MSE estimate, and the minimum MSE is Var(X)

To show this, consider

E[(X − b)²] = E[((X − E(X)) + (E(X) − b))²]