Lecture Notes 4: Expectation

• Definition and Properties
• Covariance and Correlation
• Linear MSE Estimation
• Sum of RVs
• Iterated Expectation
• Nonlinear MSE Estimation
• Sum of Random Number of RVs

Corresponding pages from B&T: 81-92, 94-98, 104-115, 160-163, 171-174, 179, 225-233, 236-247.


Definition

• We already introduced the notion of expectation (mean) of a r.v.
• We generalize this definition and discuss it in more depth

• Let X ∈ 𝒳 be a discrete r.v. with pmf p_X(x) and g(x) be a function of x. The expectation or expected value of g(X) is defined as

E(g(X)) = \sum_{x ∈ 𝒳} g(x)\, p_X(x)

• For a continuous r.v. X ∼ f_X(x), the expected value of g(X) is defined as

E(g(X)) = \int_{-∞}^{∞} g(x) f_X(x)\, dx

• Examples:
◦ g(X) = c, a constant, then E(g(X)) = c
◦ g(X) = X, E(X) = \sum_x x\, p_X(x) is the mean of X
◦ g(X) = X^k, E(X^k) is the kth moment of X
◦ g(X) = (X − E(X))^2, E[(X − E(X))^2] is the variance of X

• Expectation is linear, i.e., for any constants a and b

E[a g_1(X) + b g_2(X)] = a\, E(g_1(X)) + b\, E(g_2(X))

Examples:
◦ E(aX + b) = a E(X) + b
◦ Var(aX + b) = a^2 Var(X)

Proof: From the definition,

Var(aX + b) = E[((aX + b) − E(aX + b))^2]
            = E[(aX + b − a E(X) − b)^2]
            = E[a^2 (X − E(X))^2]
            = a^2 E[(X − E(X))^2]
            = a^2 Var(X)
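As a quick sanity check (added here for illustration, not part of the original notes), the Python sketch below estimates both sides of these identities by simulation; the distribution of X and the constants a, b are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary choices for illustration
a, b = 3.0, -2.0
x = rng.exponential(scale=1.5, size=1_000_000)  # X ~ Exp with mean 1.5

y = a * x + b

# Linearity of expectation: E(aX + b) = a E(X) + b
print(y.mean(), a * x.mean() + b)

# Scaling of variance: Var(aX + b) = a^2 Var(X)
print(y.var(), a**2 * x.var())
```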


Fundamental Theorem of Expectation

• Theorem: Let X ∼ p_X(x) and Y = g(X) ∼ p_Y(y), then

E(Y) = \sum_{y ∈ 𝒴} y\, p_Y(y) = \sum_{x ∈ 𝒳} g(x)\, p_X(x) = E(g(X))

• The same formula holds for f_Y(y), using integrals instead of sums
• Conclusion: E(Y) can be found using either f_X(x) or f_Y(y). It is often much easier to use f_X(x) than to first find f_Y(y) and then find E(Y)
• Proof: We prove the theorem for discrete r.v.s. Consider

E(Y) = \sum_{y} y\, p_Y(y)
     = \sum_{y} y \sum_{\{x:\, g(x)=y\}} p_X(x)
     = \sum_{y} \sum_{\{x:\, g(x)=y\}} y\, p_X(x)
     = \sum_{y} \sum_{\{x:\, g(x)=y\}} g(x)\, p_X(x)
     = \sum_{x} g(x)\, p_X(x)
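The following Python sketch (an added illustration, not from the notes) checks the theorem numerically for a small discrete example: it computes E(Y) once from the pmf of Y = g(X) and once directly from the pmf of X; the pmf and the function g are arbitrary choices.

```python
import numpy as np
from collections import defaultdict

# Arbitrary discrete pmf for X (values and probabilities chosen for illustration)
x_vals = np.array([-2, -1, 0, 1, 2])
p_x = np.array([0.1, 0.2, 0.3, 0.2, 0.2])

g = lambda x: x**2  # example function g

# Direct route: E(g(X)) = sum_x g(x) p_X(x)
e_direct = np.sum(g(x_vals) * p_x)

# Indirect route: first build the pmf of Y = g(X), then E(Y) = sum_y y p_Y(y)
p_y = defaultdict(float)
for x, p in zip(x_vals, p_x):
    p_y[g(x)] += p
e_via_y = sum(y * p for y, p in p_y.items())

print(e_direct, e_via_y)  # both equal 1.6
```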

Expectation Involving Two RVs

• Let (X, Y) ∼ f_{X,Y}(x, y) and let g(x, y) be a function of x and y. The expectation of g(X, Y) is defined as

E(g(X,Y)) = \int_{-∞}^{∞} \int_{-∞}^{∞} g(x, y) f_{X,Y}(x, y)\, dx\, dy

The function g(X, Y) may be X, Y, X^2, X + Y, etc.

• The correlation of X and Y is defined as E(XY )

• The covariance of X and Y is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
          = E(XY) − E(X) E(Y)

Note that if X = Y , then Cov(X,Y ) = Var(X)


• Example: Let

f(x, y) = \begin{cases} 2 & \text{for } x, y ≥ 0,\ x + y ≤ 1 \\ 0 & \text{otherwise} \end{cases}

Find E(X), Var(X), and Cov(X, Y)

Solution: The mean is

E(X) = \int_{-∞}^{∞} \int_{-∞}^{∞} x f(x, y)\, dy\, dx = \int_0^1 \int_0^{1-x} 2x\, dy\, dx = \int_0^1 2(1 − x)x\, dx = \frac{1}{3}

To find the variance, we first find the second moment

E(X^2) = \int_0^1 \int_0^{1-x} 2x^2\, dy\, dx = \int_0^1 2(1 − x)x^2\, dx = \frac{1}{6}

Thus,

Var(X) = E(X^2) − (E(X))^2 = \frac{1}{6} − \frac{1}{9} = \frac{1}{18}

The covariance of X and Y is

Cov(X, Y) = \int_0^1 \int_0^{1-x} 2xy\, dy\, dx − E(X) E(Y)
          = \int_0^1 x(1 − x)^2\, dx − \frac{1}{9} = \frac{1}{12} − \frac{1}{9} = −\frac{1}{36}
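As an added illustrative check (not in the original notes), the sketch below draws samples from this density by rejection sampling on the unit square and estimates E(X), Var(X), and Cov(X, Y); the estimates should be close to 1/3, 1/18, and −1/36.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rejection sampling: f(x, y) = 2 on the triangle x, y >= 0, x + y <= 1
n = 2_000_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
keep = x + y <= 1            # points inside the triangle are uniform on it
x, y = x[keep], y[keep]

print("E(X)     ≈", x.mean(),           "(exact 1/3)")
print("Var(X)   ≈", x.var(),            "(exact 1/18 ≈ 0.0556)")
print("Cov(X,Y) ≈", np.cov(x, y)[0, 1], "(exact -1/36 ≈ -0.0278)")
```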


Independence and Uncorrelation

• Let X and Y be independent r.v.s and g(X) and h(Y ) be functions of X and Y , respectively, then E(g(X)h(Y )) = E(g(X))E(h(Y ))

Proof: Let's assume that X ∼ f_X(x) and Y ∼ f_Y(y). Then

E(g(X)h(Y)) = \int_{-∞}^{∞} \int_{-∞}^{∞} g(x) h(y) f_{X,Y}(x, y)\, dx\, dy
            = \int_{-∞}^{∞} \int_{-∞}^{∞} g(x) h(y) f_X(x) f_Y(y)\, dx\, dy
            = \left( \int_{-∞}^{∞} g(x) f_X(x)\, dx \right) \left( \int_{-∞}^{∞} h(y) f_Y(y)\, dy \right)
            = E(g(X)) E(h(Y))

• X and Y are said to be uncorrelated if Cov(X,Y ) = 0, or equivalently E(XY )=E(X) E(Y )

• From our independence result, if X and Y are independent then they are uncorrelated. To show this, set g(X) = X − E(X) and h(Y) = Y − E(Y). Then

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(X − E(X)) E(Y − E(Y)) = 0

• However, if X and Y are uncorrelated they are not necessarily independent

• Example: Let X,Y ∈ {−2, −1, 1, 2} such that

pX,Y (1, 1) = 2/5, pX,Y (−1, −1) = 2/5

pX,Y (−2, 2) = 1/10, pX,Y (2, −2) = 1/10,

pX,Y (x, y) = 0, otherwise

Are X and Y independent? Are they uncorrelated?


Solution:

[Plot of the joint pmf in the (x, y) plane: mass 2/5 at (1, 1) and (−1, −1), mass 1/10 at (−2, 2) and (2, −2)]

Clearly X and Y are not independent, since if you know the outcome of one, you completely know the outcome of the other. Let's check their covariance:

E(X) = \frac{2}{5} − \frac{2}{5} − \frac{2}{10} + \frac{2}{10} = 0, and similarly E(Y) = 0

E(XY) = \frac{2}{5} + \frac{2}{5} − \frac{4}{10} − \frac{4}{10} = 0

Thus, Cov(X, Y) = 0, and X and Y are uncorrelated!
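A small Python check of this example (added for illustration): it computes Cov(X, Y) from the pmf and also compares the joint pmf against the product of the marginals to show that independence fails.

```python
import numpy as np
from itertools import product

# Joint pmf of the example
pmf = {(1, 1): 2/5, (-1, -1): 2/5, (-2, 2): 1/10, (2, -2): 1/10}
support = [-2, -1, 1, 2]

p_x = {x: sum(p for (a, b), p in pmf.items() if a == x) for x in support}
p_y = {y: sum(p for (a, b), p in pmf.items() if b == y) for y in support}

ex = sum(x * p for x, p in p_x.items())
ey = sum(y * p for y, p in p_y.items())
exy = sum(x * y * p for (x, y), p in pmf.items())

print("Cov(X,Y) =", exy - ex * ey)          # 0.0 -> uncorrelated

# Independence would require p(x, y) = p_X(x) p_Y(y) for every pair
independent = all(
    np.isclose(pmf.get((x, y), 0.0), p_x[x] * p_y[y])
    for x, y in product(support, repeat=2)
)
print("Independent?", independent)           # False
```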

The Correlation Coefficient

• The correlation coefficient of X and Y is defined as

\rho_{X,Y} = \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}}

• Fact: |ρ_{X,Y}| ≤ 1. To show this, consider

E\left[ \left( \frac{X − E(X)}{σ_X} ± \frac{Y − E(Y)}{σ_Y} \right)^2 \right] ≥ 0

\frac{E[(X − E(X))^2]}{σ_X^2} + \frac{E[(Y − E(Y))^2]}{σ_Y^2} ± 2\, \frac{E[(X − E(X))(Y − E(Y))]}{σ_X σ_Y} ≥ 0

1 + 1 ± 2ρ_{X,Y} ≥ 0 \Longrightarrow −2 ≤ 2ρ_{X,Y} ≤ 2 \Longrightarrow |ρ_{X,Y}| ≤ 1

• From the proof, ρ_{X,Y} = ±1 iff (X − E(X))/σ_X = ±(Y − E(Y))/σ_Y (equality with probability 1), i.e., iff X − E(X) is a linear function of Y − E(Y)

• In general, ρ_{X,Y} is a measure of how closely (X − E(X)) can be approximated or estimated by a linear function of (Y − E(Y))


Application: Linear MSE Estimation

• Consider the following signal processing problem:

[Block diagram: signal X → noisy channel → observation Y → estimator → X̂ = aY + b]

• Here X is a signal (music, speech, image) and Y is a noisy observation of X (output of a noisy communication channel or a noisy circuit). Assume we know the means, variances and covariance of X and Y

• Observing Y, we wish to find a linear estimate of X of the form X̂ = aY + b that minimizes the mean square error

MSE = E[(X − X̂)^2]

• We refer to the best such estimate as the (linear) minimum mean square error (MMSE) estimate

• The MMSE linear estimate of X given Y is given by

X̂ = \frac{Cov(X, Y)}{σ_Y^2}\, (Y − E(Y)) + E(X) = ρ_{X,Y}\, σ_X\, \frac{Y − E(Y)}{σ_Y} + E(X)

and its MSE is given by

MSE = σ_X^2 − \frac{Cov^2(X, Y)}{σ_Y^2} = (1 − ρ_{X,Y}^2)\, σ_X^2

• Properties of the MMSE linear estimate:
– E(X̂) = E(X), i.e., the estimate is unbiased
– If ρ_{X,Y} = 0, i.e., X and Y are uncorrelated, then X̂ = E(X) (ignore the observation Y)
– If ρ_{X,Y} = ±1, i.e., X − E(X) and Y − E(Y) are linearly dependent, then the linear estimate is perfect
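To make the formulas concrete, here is a small illustrative Python sketch (not from the notes) that builds the linear MMSE estimate from the means, variances, and covariance, and compares its empirical MSE to (1 − ρ²)σ_X² on synthetic data; the channel model Y = X + Z and the Gaussian choices are assumptions made only for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy channel for the demo: Y = X + Z with X, Z independent
n = 1_000_000
x = rng.normal(loc=1.0, scale=2.0, size=n)   # signal, E(X)=1, Var(X)=4
z = rng.normal(loc=0.0, scale=1.0, size=n)   # noise
y = x + z                                    # noisy observation

# Statistics the estimator is allowed to know
ex, ey = x.mean(), y.mean()
var_y = y.var()
cov_xy = np.cov(x, y)[0, 1]
rho = cov_xy / np.sqrt(x.var() * var_y)

# Linear MMSE estimate: Xhat = Cov(X,Y)/Var(Y) * (Y - E(Y)) + E(X)
a = cov_xy / var_y
x_hat = a * (y - ey) + ex

mse_empirical = np.mean((x - x_hat) ** 2)
mse_formula = (1 - rho**2) * x.var()
print(mse_empirical, mse_formula)  # both ≈ 0.8 for these parameters
```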


Proof

• We first show that \min_b E[(X − b)^2] = Var(X) and is achieved for b = E(X), i.e., in the absence of any observations, the mean of X is its minimum MSE estimate, and the minimum MSE is Var(X)

To show this, consider

E[(X − b)^2] = E[((X − E(X)) + (E(X) − b))^2]

             = E[(X − E(X))^2] + (E(X) − b)^2 + 2(E(X) − b) E(X − E(X))
             = E[(X − E(X))^2] + (E(X) − b)^2
             ≥ E[(X − E(X))^2],

with equality iff b = E(X)

• Now, back to our problem. Suppose a has already been chosen. What should b be to minimize E[(X − aY − b)^2]? From the above result, we should choose

b = E(X − aY) = E(X) − a E(Y)

So, we want to choose a to minimize

E[((X − aY) − E(X − aY))^2], which is the same as

E[((X − E(X)) − a(Y − E(Y)))^2] = σ_X^2 + a^2 σ_Y^2 − 2a\, Cov(X, Y)

This is a quadratic function of a. It is minimized when its derivative equals 0, which gives

a = \frac{Cov(X, Y)}{σ_Y^2} = \frac{ρ_{X,Y}\, σ_X σ_Y}{σ_Y^2} = \frac{ρ_{X,Y}\, σ_X}{σ_Y}

The mean square error is given by

σ_X^2 + a^2 σ_Y^2 − 2a\, Cov(X, Y) = σ_X^2 + \frac{ρ_{X,Y}^2 σ_X^2}{σ_Y^2}\, σ_Y^2 − 2\, \frac{ρ_{X,Y}\, σ_X}{σ_Y} × ρ_{X,Y}\, σ_X σ_Y
                                   = (1 − ρ_{X,Y}^2)\, σ_X^2


Mean and Variance of Sum of RVs

• Let X_1, X_2, ..., X_n be r.v.s. Then, by linearity of expectation, the expected value of their sum Y is

E(Y) = E\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} E(X_i)

Example: Mean of a Binomial r.v. One way to define a binomial r.v. is as follows: Flip a coin with bias p independently n times and define the Bernoulli r.v. X_i = 1 if the ith flip is a head and 0 if it is a tail. Let Y = \sum_{i=1}^{n} X_i. Then Y is a binomial r.v. Thus

E(Y) = \sum_{i=1}^{n} E(X_i) = np

Note that we do not need independence for this result to hold, i.e., the result holds even if the coin flips are not independent (Y is not binomial in this case, but the expectation doesn't change)
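A quick illustrative check (added, not from the notes): even when the flips are made dependent, the mean of the sum stays np; the particular dependence below (each later flip copies the first flip with some probability) is an arbitrary choice for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 20, 0.3, 200_000

# Independent flips: Y ~ Binomial(n, p)
flips = rng.random((trials, n)) < p
print(flips.sum(axis=1).mean(), n * p)   # both ≈ 6.0

# Dependent flips: each later flip copies flip 1 with probability 1/2,
# otherwise it is a fresh Bern(p) flip. Marginally each flip is still Bern(p).
copy = rng.random((trials, n)) < 0.5
fresh = rng.random((trials, n)) < p
dep = np.where(copy, flips[:, [0]], fresh)
dep[:, 0] = flips[:, 0]
print(dep.sum(axis=1).mean(), n * p)     # mean is still ≈ 6.0, though Y is not binomial
```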

• Let's compute the variance of Y = \sum_{i=1}^{n} X_i

Var(Y) = E[(Y − E(Y))^2]
       = E\left[ \left( \sum_{i=1}^{n} X_i − \sum_{i=1}^{n} E(X_i) \right)^2 \right]
       = E\left[ \left( \sum_{i=1}^{n} (X_i − E(X_i)) \right)^2 \right]
       = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} (X_i − E(X_i))(X_j − E(X_j)) \right]
       = \sum_{i=1}^{n} \sum_{j=1}^{n} E[(X_i − E(X_i))(X_j − E(X_j))]
       = \sum_{i=1}^{n} Var(X_i) + \sum_{i=1}^{n} \sum_{j ≠ i} Cov(X_i, X_j)


• If the r.v.s are independent, then Cov(X_i, X_j) = 0 for all i ≠ j, and

Var(Y) = \sum_{i=1}^{n} Var(X_i)

Note that this result only requires that Cov(X_i, X_j) = 0 for all i ≠ j, and therefore it requires only that the r.v.s be uncorrelated (which is in general weaker than independence)

• Example: Variance of a Binomial r.v. Again express Y = \sum_{i=1}^{n} X_i, where the X_i are i.i.d. Bern(p). Since the X_i are independent, Cov(X_i, X_j) = 0 for all i ≠ j. Thus

Var(Y) = \sum_{i=1}^{n} Var(X_i) = n p(1 − p)

• Example: Hat problem. Suppose n people throw their hats in a box and then each picks one hat at random. Let N be the number of people that get back their own hat. Find E(N) and Var(N)

Solution: Define the r.v. X_i = 1 if person i selects her own hat, and X_i = 0 otherwise. Thus N = \sum_{i=1}^{n} X_i

To find the mean and variance of N, we first find the means, variances, and covariances of the X_i. Note that X_i ∼ Bern(1/n), and thus E(X_i) = 1/n and Var(X_i) = (1/n)(1 − 1/n). To find the covariance of X_i and X_j, i ≠ j, note that

p_{X_i,X_j}(1, 1) = \frac{1}{n(n − 1)}

Thus,

Cov(X_i, X_j) = E(X_i X_j) − E(X_i) E(X_j) = \frac{1}{n(n − 1)} − \frac{1}{n^2} = \frac{1}{n^2(n − 1)}

The mean and variance of N are given by

E(N) = n E(X_1) = 1

Var(N) = \sum_{i=1}^{n} Var(X_i) + \sum_{i=1}^{n} \sum_{j ≠ i} Cov(X_i, X_j)
       = n\, Var(X_1) + n(n − 1)\, Cov(X_1, X_2)
       = (1 − 1/n) + n(n − 1) × \frac{1}{n^2(n − 1)} = 1
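For illustration (not part of the notes), a short Monte Carlo simulation of the hat problem; the sample mean and variance of N should both be close to 1 regardless of n.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 10, 100_000

# Each row is a random assignment of hats (a random permutation of 0..n-1);
# person i gets her own hat when the permutation has a fixed point at i.
perms = np.array([rng.permutation(n) for _ in range(trials)])
matches = (perms == np.arange(n)).sum(axis=1)   # N for each trial

print("E(N)   ≈", matches.mean())   # ≈ 1
print("Var(N) ≈", matches.var())    # ≈ 1
```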


Method of Indicators

• In the last two examples we used the method of indicators to simplify the computation of expectation

• In general the indicator of an event A ⊂ Ω is a r.v. defined as

I_A(ω) = \begin{cases} 1 & \text{if } ω ∈ A \\ 0 & \text{otherwise} \end{cases}

Thus

E(I_A) = 1 × P(A) + 0 × P(A^c) = P(A)

The method of indicators involves expressing a given r.v. Y as a sum of indicators in order to simplify the computation of its expectation (this is precisely what we did in the last two examples)

Example: Spaghetti. Consider a ball of n spaghetti strands. You randomly pick two strand ends and join them. The process is continued until there are no ends left. Let X be the number of spaghetti loops formed. What is E(X)?
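Since the expected number of loops is left as an exercise, here is an added simulation sketch (not from the notes) that estimates E(X) for a given n without giving away the closed-form answer; the union-find bookkeeping of strand segments is an implementation choice.

```python
import random

random.seed(5)

def spaghetti_loops(n):
    """Simulate joining random pairs of free ends; return the number of loops formed."""
    # Union-find over strands: joining two ends of the same segment closes a loop.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    ends = [i for i in range(n) for _ in range(2)]  # two free ends per strand
    loops = 0
    while ends:
        i = ends.pop(random.randrange(len(ends)))
        j = ends.pop(random.randrange(len(ends)))
        ri, rj = find(i), find(j)
        if ri == rj:
            loops += 1        # the two ends belong to the same segment: a loop closes
        else:
            parent[ri] = rj   # otherwise the two segments merge into one
    return loops

n, trials = 10, 50_000
print(sum(spaghetti_loops(n) for _ in range(trials)) / trials)  # Monte Carlo estimate of E(X)
```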

Conditional Expectation

• Conditioning on an event: Let X ∼ pX(x) be a r.v. and A be a nonzero probability event. We can define the conditional pmf of X given X ∈ A as

p_{X|A}(x) = P\{X = x \mid X ∈ A\} = \frac{P\{X = x, X ∈ A\}}{P\{X ∈ A\}} = \begin{cases} \frac{p_X(x)}{P\{X ∈ A\}} & \text{if } x ∈ A \\ 0 & \text{otherwise} \end{cases}

Note that pX|A(x) is a pmf on X

• Similarly for X ∼ fX(x),

f_{X|A}(x) = \begin{cases} \frac{f_X(x)}{P\{X ∈ A\}} & \text{if } x ∈ A \\ 0 & \text{otherwise} \end{cases}

is a pdf on 𝒳

• Example: Let X ∼ Exp(λ) and A = {X > a}, for some constant a > 0. Find the conditional pdf of X given A


• We define the conditional expectation of g(X) given X ∈ A as

E(g(X)|A) = \int_{-∞}^{∞} g(x) f_{X|A}(x)\, dx

• Example: Find E(X|A) and E(X2|A) for the previous example.

• Total expectation: Let X ∼ f_X(x) and A_1, A_2, ..., A_n ⊂ (−∞, ∞) be disjoint nonzero-probability events with P\{X ∈ \cup_{i=1}^{n} A_i\} = \sum_{i=1}^{n} P\{X ∈ A_i\} = 1. Then

E(g(X)) = \sum_{i=1}^{n} P\{X ∈ A_i\}\, E(g(X)|A_i)

This is called the total expectation theorem and is useful in computing expectation by divide-and-conquer

Proof: First note that, by the law of total probability,

f_X(x) = \sum_{i=1}^{n} P\{X ∈ A_i\}\, f_{X|A_i}(x)

Therefore

E(g(X)) = \int_{-∞}^{∞} g(x) f_X(x)\, dx
        = \int_{-∞}^{∞} g(x) \sum_{i=1}^{n} P\{X ∈ A_i\}\, f_{X|A_i}(x)\, dx
        = \sum_{i=1}^{n} P\{X ∈ A_i\} \int_{-∞}^{∞} g(x) f_{X|A_i}(x)\, dx
        = \sum_{i=1}^{n} P\{X ∈ A_i\}\, E(g(X)|A_i)

• Example: Mean and variance of a piecewise uniform pdf. Let X be a continuous r.v. with the piecewise uniform pdf

f_X(x) = \begin{cases} 1/3 & \text{if } 0 ≤ x ≤ 1 \\ 2/3 & \text{if } 1 < x ≤ 2 \\ 0 & \text{otherwise} \end{cases}

Find the mean and variance of X


Solution: Define the events A1 = {X ∈ [0, 1]} and A2 = {X ∈ (1, 2]}

Then, A1, A2 are disjoint and the sum of their probabilities is 1. The mean of X can be expressed as

E(X) = \sum_{i=1}^{2} P\{X ∈ A_i\}\, E(X|A_i) = \frac{1}{3} × \frac{1}{2} + \frac{2}{3} × \frac{3}{2} = \frac{7}{6}

Also,

E(X^2) = \sum_{i=1}^{2} P\{X ∈ A_i\}\, E(X^2|A_i) = \frac{1}{3} × \frac{1}{3} + \frac{2}{3} × \frac{7}{3} = \frac{15}{9}

Thus

Var(X) = E(X^2) − (E(X))^2 = \frac{11}{36}

• Mean and variance of a mixed r.v.: As we discussed, there are r.v.s that are neither discrete nor continuous. How do we define their expectation?

Answer: We can express a mixed r.v. X as a mixture of a discrete r.v. Y and a continuous r.v. Z as follows. Assume that the cdf of X is discontinuous over the set 𝒴 and that

\sum_{y ∈ 𝒴} P\{X = y\} = p

Define the discrete r.v. Y ∈ 𝒴 to have the pmf

p_Y(y) = \frac{1}{p}\, P\{X = y\}, \quad y ∈ 𝒴

Define the continuous r.v. Z such that

f_Z(z) = \begin{cases} \frac{1}{1 − p} \frac{dF_X}{dx}(z) & z ∉ 𝒴 \\ f_Z(z^-) & z ∈ 𝒴 \end{cases}

Now, we can express X as

X = \begin{cases} Y & \text{with probability } p \\ Z & \text{with probability } 1 − p \end{cases}

To find E(X), we use the law of total expectation:

E(X) = p\, E(X | X ∈ 𝒴) + (1 − p)\, E(X | X ∉ 𝒴) = p\, E(Y) + (1 − p)\, E(Z)

Now both E(Y ) and E(Z) can be computed in the usual way


Conditioning on a RV

• Let (X, Y) ∼ f_{X,Y}(x, y). If f_Y(y) ≠ 0, the conditional pdf of X given Y = y is given by

f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

• We know that f_{X|Y}(x|y) is a pdf for X (for each fixed y), so we can define the expectation of any function g(X, Y) w.r.t. f_{X|Y}(x|y) as

E(g(X,Y)|Y = y) = \int_{-∞}^{∞} g(x, y) f_{X|Y}(x|y)\, dx

• If g(X, Y) = X, then the conditional expectation of X given Y = y is

E(X|Y = y) = \int_{-∞}^{∞} x\, f_{X|Y}(x|y)\, dx

• Example: Let

f_{X,Y}(x, y) = \begin{cases} 2 & \text{for } x, y ≥ 0,\ x + y ≤ 1 \\ 0 & \text{otherwise} \end{cases}

Find E(X|Y = y) and E(XY|Y = y)

Solution: We already know that

f_{X|Y}(x|y) = \begin{cases} \frac{1}{1 − y} & \text{for } x, y ≥ 0,\ x + y ≤ 1,\ y < 1 \\ 0 & \text{otherwise} \end{cases}

Thus

E(X|Y = y) = \frac{1}{1 − y} \int_0^{1-y} x\, dx = \frac{1 − y}{2} \quad \text{for } 0 ≤ y < 1

Now, to find E(XY|Y = y), note that

E(XY|Y = y) = y\, E(X|Y = y) = \frac{y(1 − y)}{2} \quad \text{for } 0 ≤ y < 1


Conditional Expectation as a RV

• We define the conditional expectation of g(X, Y) given Y as E(g(X,Y)|Y), which is a function of the random variable Y

• So, E(X|Y ) is the conditional expectation of X given Y , a r.v. that is a function of Y

• Example: This is a continuation of the previous example. Find the pdf of E(X|Y)

Solution: The conditional expectation of X given Y is the r.v.

E(X|Y) = \frac{1 − Y}{2} \triangleq Z

The pdf of Z is given by

f_Z(z) = 8z \quad \text{for } 0 < z ≤ \frac{1}{2}

[Plot: f_Z(z) = 8z rises linearly from 0 at z = 0 to 4 at z = 1/2]

Now let’s find the expected value of the r.v. Z

E(Z) = \int_0^{1/2} 8z^2\, dz = \frac{1}{3} = E(X)


Iterated Expectation

• In general we can find E(g(X,Y )) using iterated expectation as

E(g(X,Y ))=EY [EX(g(X,Y )|Y )] ,

where EX means expectation w.r.t. fX|Y (x|y) and EY means expectation w.r.t. fY (y). To show this consider

E_Y[E_X(g(X,Y)|Y)] = \int_{-∞}^{∞} E_X(g(X,Y)|Y = y)\, f_Y(y)\, dy
                   = \int_{-∞}^{∞} \left( \int_{-∞}^{∞} g(x, y) f_{X|Y}(x|y)\, dx \right) f_Y(y)\, dy
                   = \int_{-∞}^{∞} \int_{-∞}^{∞} g(x, y) f_{X,Y}(x, y)\, dx\, dy
                   = E(g(X,Y))

• This result can be very useful in computing expectation

• Example: A coin has random bias P with f_P(p) = 2(1 − p) for 0 ≤ p ≤ 1. The coin is flipped n times independently. Let N be the number of heads; find E(N)

Solution: Of course we can first find the pmf of N and then find its expectation. Using iterated expectation we can find E(N) faster. Consider

E(N) = E_P[E_N(N|P)] = E_P(nP) = n \int_0^1 2(1 − p)\, p\, dp = \frac{n}{3}

• Example: Let E(X|Y) = Y^2 and Y ∼ U[0, 1]; find E(X)

Solution: Here we cannot first find the pdf of X, since we do not know f_{X|Y}(x|y), but using iterated expectation we can easily find

E(X) = E_Y[E_X(X|Y)] = \int_0^1 y^2\, dy = \frac{1}{3}
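An illustrative Monte Carlo check of the first example (added, not from the notes): draw the bias P from f_P(p) = 2(1 − p), flip the coin n times, and average N over many trials; the estimate should be close to n/3.

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 12, 500_000

# f_P(p) = 2(1 - p) on [0, 1] is a Beta(1, 2) density
p = rng.beta(1, 2, size=trials)

# Given P = p, N | P ~ Binomial(n, p)
n_heads = rng.binomial(n, p)

print(n_heads.mean(), n / 3)   # both ≈ 4.0
```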


Conditional Variance

• Let X and Y be two r.v.s. We define the conditional variance of X given Y = y as

Var(X|Y = y) = E[(X − E(X|Y = y))^2 \mid Y = y] = E(X^2|Y = y) − [E(X|Y = y)]^2

• The r.v. Var(X|Y) is simply a function of Y that takes on the values Var(X|Y = y). Its expected value is

E_Y[Var(X|Y)] = E_Y[E(X^2|Y) − (E(X|Y))^2] = E(X^2) − E[(E(X|Y))^2]

• Since E(X|Y) is a r.v., it has a variance

Var(E(X|Y)) = E_Y[(E(X|Y) − E[E(X|Y)])^2] = E[(E(X|Y))^2] − (E(X))^2

• Law of Conditional Variances: We can show that

Var(X) = E(Var(X|Y)) + Var(E(X|Y))

Proof: Simply add the above expressions for E(Var(X|Y)) and Var(E(X|Y)):

E(Var(X|Y)) + Var(E(X|Y)) = E(X^2) − (E(X))^2 = Var(X)

Application: Nonlinear MSE Estimation

• Consider the estimation setup with signal X and observation Y
• Assume we know the pdf of X and the conditional pdf of the channel, f_{Y|X}(y|x), for all (x, y)

• We wish to find the best nonlinear estimate X̂ = g(Y) of X that minimizes the mean square error

MSE = E[(X − X̂)^2] = E[(X − g(Y))^2]

• The X̂ that achieves the minimum MSE is called the minimum MSE (MMSE) estimate of X (given Y)


MMSE Estimate

• The MMSE estimate of X given the observation Y and complete knowledge of the joint pdf f_{X,Y}(x, y) is X̂ = E(X|Y), and its MSE (i.e., the minimum MSE) is

MSE = E[(X − E(X|Y))^2] = E_Y\left[ E[(X − E(X|Y))^2 \mid Y] \right] = E_Y[Var(X|Y)]

• Proof:

◦ Recall that \min_a E[(X − a)^2] = Var(X) and is achieved for a = E(X), i.e., in the absence of any observations, the mean of X is its minimum MSE estimate, and the minimum MSE is Var(X)
◦ We now use this result to show that E(X|Y) is the MMSE estimate of X given Y. First we use iterated expectation to write

E[(X − g(Y))^2] = E_Y\left[ E_X[(X − g(Y))^2 \mid Y] \right]

From the above result we know that, for each Y = y, E_X[(X − g(y))^2 \mid Y = y] is minimized by g(y) = E(X|Y = y). Thus the MSE is minimized for g(Y) = E(X|Y)

In fact, E(X|Y) minimizes the MSE conditioned on every Y = y, and not just its average over Y!


• Properties of the minimum MSE estimator:
◦ Since E(X̂) = E[E(X|Y)] = E(X), the best MSE estimate is called unbiased
◦ If X and Y are independent, then the best MSE estimate is E(X)
◦ The conditional expectation of the estimation error E((X − X̂)|Y = y) = 0 for all y, i.e., the error is unbiased for every Y = y
◦ From the law of conditional variance, Var(X) = Var(X̂) + E(Var(X|Y)), i.e., the sum of the variance of the estimate and the minimum MSE is equal to the variance of the signal

Example

• Again let

f(x, y) = \begin{cases} 2 & \text{for } x, y ≥ 0,\ x + y ≤ 1 \\ 0 & \text{otherwise} \end{cases}

Find the MMSE estimate of X given Y and its MSE

Solution: We already found the MMSE estimate to be

E(X|Y) = \frac{1 − Y}{2} \quad \text{for } 0 ≤ Y ≤ 1

[Plot: E(X|Y = y) decreases linearly from 1/2 at y = 0 to 0 at y = 1]


We know that

f_{X|Y}(x|y) = \begin{cases} \frac{1}{1 − y} & \text{for } x, y ≥ 0,\ x + y ≤ 1,\ y < 1 \\ 0 & \text{otherwise} \end{cases}

Thus, for Y = y, the minimum MSE is given by

Var(X|Y = y) = \frac{(1 − y)^2}{12} \quad \text{for } 0 ≤ y < 1

[Plot: Var(X|Y = y) decreases from 1/12 at y = 0 to 0 at y = 1]

Thus the minimum MSE is E_Y(Var(X|Y)) = 1/24, compared to Var(X) = 1/18. By the law of conditional variance, the difference is the variance of the estimate: Var(E(X|Y)) = 1/18 − 1/24 = 1/72
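For illustration (not in the notes), the sketch below samples from the triangle density and compares the empirical MSE of the estimator X̂ = E(X|Y) = (1 − Y)/2 against the value 1/24 computed above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sample uniformly from the triangle x, y >= 0, x + y <= 1 (density 2)
n = 2_000_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
keep = x + y <= 1
x, y = x[keep], y[keep]

x_hat = (1 - y) / 2                 # MMSE estimate E(X | Y)
mse = np.mean((x - x_hat) ** 2)

print("empirical MSE ≈", mse,     "(exact 1/24 ≈ 0.0417)")
print("Var(X)        ≈", x.var(), "(exact 1/18 ≈ 0.0556)")
```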

Sum of Random Number of Independent RVs

• Let N be a r.v. taking positive integer values and X1,X2,... be a sequence of i.i.d. r.v.s independent of N

• Define the sum

Y = \sum_{i=1}^{N} X_i

• We are given E(N), Var(N), the mean E(X) of the X_i, and their variance Var(X), and wish to find the mean and variance of Y

• Using iterated expectation, the mean is:


E(Y) = E_N\left[ E\left( \sum_{i=1}^{N} X_i \,\Big|\, N \right) \right]
     = E_N\left[ \sum_{i=1}^{N} E(X_i|N) \right]   \quad \text{by linearity of expectation}
     = E_N\left[ \sum_{i=1}^{N} E(X_i) \right]     \quad \text{since the } X_i \text{ and } N \text{ are independent}
     = E_N[N\, E(X)] = E(N)\, E(X)

Using the law of conditional variance, the variance is:

Var(Y) = E[Var(Y|N)] + Var(E(Y|N))
       = E[N\, Var(X)] + Var(N\, E(X))
       = Var(X)\, E(N) + (E(X))^2\, Var(N)

• Example: You visit bookstores looking for a copy of Great Expectations.

Each bookstore carries the book with probability p, independent of all other bookstores. You keep visiting bookstores until you find the book. In each bookstore visited, you spend a random amount of time, exponentially distributed with parameter λ. Assuming that you will keep visiting bookstores until you buy the book and that the time spent in each is independent of everything else, find the mean and variance of the total time spent in bookstores.


Solution: The total number of bookstores visited is a r.v. N ∼ Geom(p). Let X_i be the amount of time spent in the ith bookstore. Thus X_1, X_2, ... are i.i.d. Exp(λ) r.v.s. Now let Y be the total amount of time spent looking for the book. Then

Y = \sum_{i=1}^{N} X_i

The mean and variance of Y are thus

E(Y) = E(N)\, E(X) = \frac{1}{pλ}

Var(Y) = Var(X)\, E(N) + (E(X))^2\, Var(N) = \frac{1}{pλ^2} + \frac{1}{λ^2} × \frac{1 − p}{p^2} = \frac{1}{p^2 λ^2}
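A final illustrative check (added, not from the notes): simulate the bookstore search and compare the empirical mean and variance of the total time Y against 1/(pλ) and 1/(p²λ²); the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
p, lam, trials = 0.25, 2.0, 300_000

# Number of bookstores visited: Geometric(p), support 1, 2, ...
n_stores = rng.geometric(p, size=trials)

# Total time: sum of N i.i.d. Exp(lam) times. Since the times are i.i.d. and
# independent of N, sampling a Gamma(shape=N, scale=1/lam) total is equivalent.
total_time = rng.gamma(shape=n_stores, scale=1.0 / lam)

print("E(Y)   ≈", total_time.mean(), " vs 1/(p*lam)     =", 1 / (p * lam))
print("Var(Y) ≈", total_time.var(),  " vs 1/(p^2*lam^2) =", 1 / (p**2 * lam**2))
```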
