
Stat 110 Strategic Practice 10, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1 Conditional Expectation & Conditional Variance

1. Show that E((Y − E(Y|X))² | X) = E(Y²|X) − (E(Y|X))², so these two expressions for Var(Y|X) agree.

2. Prove Eve's Law.

3. Let X and Y be random variables with finite variances, and let W = Y − E(Y|X). This is a residual: the difference between the true value of Y and the predicted value of Y based on X.

(a) Compute E(W) and E(W|X).

(b) Compute Var(W), for the case that W|X ∼ N(0, X²) with X ∼ N(0, 1).

4. Emails arrive one at a time in an inbox. Let Tn be the time at which the nth email arrives (measured on a continuous scale from some starting point in time). Suppose that the waiting times between emails are i.i.d. Expo(λ), i.e., T1, T2 − T1, T3 − T2, ... are i.i.d. Expo(λ).
Each email is non-spam with probability p, and spam with probability q = 1 − p (independently of the other emails and of the waiting times). Let X be the time at which the first non-spam email arrives (so X is a continuous r.v., with X = T1 if the 1st email is non-spam, X = T2 if the 1st email is spam but the 2nd one isn't, etc.).

(a) Find the mean and variance of X.

(b) Find the MGF of X. What famous distribution does this imply that X has (be sure to state its parameter values)?
Hint for both parts: let N be the number of emails until the first non-spam (including that one), and write X as a sum of N terms; then condition on N.

5. One of two identical-looking coins is picked from a hat randomly, where one coin has probability p1 of Heads and the other has probability p2 of Heads. Let X be the number of Heads after flipping the chosen coin n times. Find the mean and variance of X.

2 Inequalities

1. Let X and Y be i.i.d. positive r.v.s, and let c > 0. For each part below, fill in the appropriate equality or inequality symbol: write = if the two sides are always equal, ≤ if the left-hand side is less than or equal to the right-hand side (but they are not necessarily equal), and similarly for ≥. If no relation holds in general, write ?.

(a) E(ln(X)) ln(E(X))

(b) E(X)   √(E(X²))

(c) E(sin²(X)) + E(cos²(X))   1

(d) E(|X|)   √(E(X²))

(e) P(X > c)   E(X³)/c³

(f) P(X ≤ Y)   P(X ≥ Y)

(g) E(XY)   √(E(X²)E(Y²))

(h) P(X + Y > 10)   P(X > 5 or Y > 5)

(i) E(min(X, Y))   min(EX, EY)

(j) E(X/Y)   EX/EY

(k) E(X²(X² + 1))   E(X²(Y² + 1))

(l) E(X³/(X³ + Y³))   E(Y³/(X³ + Y³))

2. (a) Show that E(1/X) > 1/(EX) for any positive non-constant r.v. X.

(b) Show that for any two positive r.v.s X and Y with neither a constant multiple of the other, E(X/Y )E(Y/X) > 1.

3. For i.i.d. r.v.s X1, ..., Xn with mean µ and variance σ², give a value of n (as a specific number) that will ensure that there is at least a 99% chance that the sample mean will be within 2 standard deviations of the true mean µ.

4. The famous arithmetic mean-geometric mean inequality says that for any positive numbers a1, a2, ..., an,
(a1 + a2 + ··· + an)/n ≥ (a1 a2 ··· an)^(1/n).
Show that this inequality follows from Jensen's inequality, by considering E log(X) for a r.v. X whose possible values are a1, ..., an (you should specify the PMF of X; if you want, you can assume that the aj are distinct (no repetitions), but be sure to say so if you assume this).

Stat 110 Strategic Practice 10 Solutions, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1 Conditional Expectation & Conditional Variance

1. Show that E((Y − E(Y|X))² | X) = E(Y²|X) − (E(Y|X))², so these two expressions for Var(Y|X) agree.

This is the conditional version of the fact that

Var(Y) = E((Y − E(Y))²) = E(Y²) − (E(Y))²,
and so must be true since conditional expectations are expectations, just as conditional probabilities are probabilities. Algebraically, letting g(X) = E(Y|X) we have

E((Y − E(Y|X))² | X) = E(Y² − 2Y g(X) + g(X)² | X) = E(Y²|X) − 2E(Y g(X)|X) + E(g(X)²|X),
and E(Y g(X)|X) = g(X)E(Y|X) = g(X)² and E(g(X)²|X) = g(X)² by taking out what's known, so the right-hand side above simplifies to E(Y²|X) − g(X)².

2. Prove Eve's Law.

We will show that Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). Let g(X) = E(Y|X). By Adam's Law, E(g(X)) = E(Y). Then

E(Var(Y|X)) = E(E(Y²|X) − g(X)²) = E(Y²) − E(g(X)²),
Var(E(Y|X)) = E(g(X)²) − (E(g(X)))² = E(g(X)²) − (E(Y))².
Adding these equations, we have Eve's Law.
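As an optional numerical sanity check (not needed for the proof), the short NumPy sketch below verifies Eve's Law by simulation for one concrete hierarchical model; the choice X ∼ Expo(1) with Y | X ∼ Pois(X) is an arbitrary illustration.

```python
import numpy as np

# Monte Carlo check of Eve's Law: Var(Y) = E(Var(Y|X)) + Var(E(Y|X)),
# illustrated with X ~ Expo(1) and Y | X ~ Pois(X) (an arbitrary example).
rng = np.random.default_rng(0)
n = 10**6

x = rng.exponential(scale=1.0, size=n)   # X ~ Expo(1)
y = rng.poisson(lam=x)                   # Y | X ~ Pois(X)

# Here E(Y|X) = X and Var(Y|X) = X, so both sides should be about 1 + 1 = 2.
print("Var(Y) (simulated):        ", y.var())
print("E(Var(Y|X)) + Var(E(Y|X)): ", x.mean() + x.var())
```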

3. Let X and Y be random variables with finite variances, and let W = Y − E(Y|X). This is a residual: the difference between the true value of Y and the predicted value of Y based on X.

(a) Compute E(W) and E(W|X).

Adam's law (iterated expectation), taking out what's known, and linearity give

E(W) = EY − E(E(Y|X)) = EY − EY = 0,
E(W|X) = E(Y|X) − E(E(Y|X) | X) = E(Y|X) − E(Y|X) = 0.

(b) Compute Var(W), for the case that W|X ∼ N(0, X²) with X ∼ N(0, 1).

Eve's Law gives

Var(W) = Var(E(W|X)) + E(Var(W|X)) = Var(0) + E(X²) = 0 + 1 = 1.
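As a quick check (not needed for the solution), note that W | X ∼ N(0, X²) with X ∼ N(0, 1) can be simulated as W = XZ with Z ∼ N(0, 1) independent of X; the sample variance of W should then be close to 1.

```python
import numpy as np

# Simulate W = X * Z, which has W | X ~ N(0, X^2) when X, Z are i.i.d. N(0, 1).
rng = np.random.default_rng(0)
n = 10**6
x = rng.standard_normal(n)
z = rng.standard_normal(n)
w = x * z
print("Var(W) (simulated):", w.var())   # should be close to 1
```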

4. Emails arrive one at a time in an inbox. Let Tn be the time at which the nth email arrives (measured on a continuous scale from some starting point in time). Suppose that the waiting times between emails are i.i.d. Expo(λ), i.e., T1, T2 − T1, T3 − T2, ... are i.i.d. Expo(λ).
Each email is non-spam with probability p, and spam with probability q = 1 − p (independently of the other emails and of the waiting times). Let X be the time at which the first non-spam email arrives (so X is a continuous r.v., with X = T1 if the 1st email is non-spam, X = T2 if the 1st email is spam but the 2nd one isn't, etc.).

(a) Find the mean and variance of X.

Write X = X1 + X2 + ··· + XN, where Xj is the waiting time from the (j − 1)th email to the jth email for j ≥ 2, and X1 = T1. Then N − 1 ∼ Geom(p), so
E(X) = E(E(X|N)) = E(N/λ) = 1/(λp).
And
Var(X) = E(Var(X|N)) + Var(E(X|N)) = E(N/λ²) + Var(N/λ),
which is
1/(pλ²) + (1 − p)/(p²λ²) = 1/(p²λ²).

(b) Find the MGF of X. What famous distribution does this imply that X has (be sure to state its parameter values)?
Hint for both parts: let N be the number of emails until the first non-spam (including that one), and write X as a sum of N terms; then condition on N.

Again conditioning on N, the MGF is

E(e^{tX}) = E(E(e^{tX1} e^{tX2} ··· e^{tXN} | N)) = E(E(e^{tX1}|N) E(e^{tX2}|N) ··· E(e^{tXN}|N)) = E(M1(t)^N),
where M1(t) is the MGF of X1 (which is λ/(λ − t) for t < λ). By LOTUS, this is
E(M1(t)^N) = Σ_{n=1}^∞ M1(t)^n q^{n−1} p = (p/q) Σ_{n=1}^∞ (q M1(t))^n = (p/q) · q M1(t)/(1 − q M1(t)) = p M1(t)/(1 − q M1(t)) = λp/(λp − t)

for t < λp. This is the Expo(λp) MGF, so X ∼ Expo(λp).
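As an optional check of both parts (purely illustrative), the sketch below simulates X directly from its description, using the arbitrary values λ = 2 and p = 0.3, and compares the results with the Expo(λp) answers.

```python
import numpy as np

# Simulate X = arrival time of the first non-spam email.  The number of emails
# needed satisfies N - 1 ~ Geom(p); NumPy's geometric counts trials up to and
# including the first success, so it gives N directly.  Given N, X is a sum of
# N i.i.d. Expo(lam) waits, i.e. Gamma(N, 1/lam).
rng = np.random.default_rng(0)
lam, p = 2.0, 0.3            # arbitrary illustrative parameter values
n_sims = 10**6

N = rng.geometric(p, size=n_sims)
X = rng.gamma(shape=N, scale=1.0 / lam)

print("mean:", X.mean(), "vs 1/(lam*p)   =", 1 / (lam * p))
print("var: ", X.var(),  "vs 1/(lam*p)^2 =", 1 / (lam * p) ** 2)
# Compare a few quantiles with the Expo(lam*p) quantiles -ln(1 - u)/(lam*p).
for u in (0.25, 0.5, 0.9):
    print(u, np.quantile(X, u), -np.log(1 - u) / (lam * p))
```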

The distribution of X is a mixture of two Binomials; we have seen before that X is not Binomial unless p1 = p2. Let I be the indicator of having the p1 coin. Then
E(X) = E(X | I = 1)P(I = 1) + E(X | I = 0)P(I = 0) = n(p1 + p2)/2.
Alternatively, we can represent X as X = IX1 + (1 − I)X2 with Xj ∼ Bin(n, pj), and I, X1, X2 independent. Then
E(X) = E(E(X|I)) = E(Inp1 + (1 − I)np2) = n(p1 + p2)/2.

For the variance, note that it is not valid to say "Var(X) = Var(X | I = 1)P(I = 1) + Var(X | I = 0)P(I = 0)"; an extreme example of this mistake would be claiming that "Var(I) = 0 since Var(I | I = 1)P(I = 1) + Var(I | I = 0)P(I = 0) = 0"; of course, Var(I) = 1/4. Instead, we can use Eve's Law:
Var(X) = E(Var(X|I)) + Var(E(X|I)),
where Var(X|I) = Inp1(1 − p1) + (1 − I)np2(1 − p2) is np1(1 − p1) with probability 1/2 and np2(1 − p2) with probability 1/2, and E(X|I) = Inp1 + (1 − I)np2 is np1 or np2 with probability 1/2 each, so
Var(X) = (1/2)(np1(1 − p1) + np2(1 − p2)) + (1/4)n²(p1 − p2)².
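As an optional simulation check (with the arbitrary illustrative values n = 10, p1 = 0.3, p2 = 0.8), the sketch below compares the simulated mean and variance of X with the formulas above.

```python
import numpy as np

# Mixture of two Binomials: pick the p1 coin or the p2 coin with probability
# 1/2 each, then flip the chosen coin n times and count Heads.
rng = np.random.default_rng(0)
n, p1, p2 = 10, 0.3, 0.8     # arbitrary illustrative values
n_sims = 10**6

I = rng.integers(0, 2, size=n_sims)   # 1 = p1 coin, 0 = p2 coin
X = np.where(I == 1, rng.binomial(n, p1, n_sims), rng.binomial(n, p2, n_sims))

mean_formula = n * (p1 + p2) / 2
var_formula = 0.5 * (n * p1 * (1 - p1) + n * p2 * (1 - p2)) + 0.25 * n**2 * (p1 - p2)**2
print("mean:", X.mean(), "vs", mean_formula)
print("var: ", X.var(),  "vs", var_formula)
```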

2 Inequalities

1. Let X and Y be i.i.d. positive r.v.s, and let c > 0. For each part below, fill in the appropriate equality or inequality symbol: write = if the two sides are always equal, ≤ if the left-hand side is less than or equal to the right-hand side (but they are not necessarily equal), and similarly for ≥. If no relation holds in general, write ?.

(a) E(ln(X)) ≤ ln(E(X)) (by Jensen: logs are concave)

(b) E(X) ≤ √(E(X²)) (since Var(X) ≥ 0, or by Jensen)

(c) E(sin²(X)) + E(cos²(X)) = 1 (by linearity, trig identity)

(d) E(|X|) ≤ √(E(X²)) (by (b) with |X| in place of X; here |X| = X anyway)

(e) P(X > c) ≤ E(X³)/c³ (by Markov, after cubing both sides of X > c)

(f) P(X ≤ Y) = P(X ≥ Y) (by symmetry, as X, Y are i.i.d.)

(g) E(XY) ≤ √(E(X²)E(Y²)) (by Cauchy-Schwarz)

(h) P(X + Y > 10) ≤ P(X > 5 or Y > 5) (if X + Y > 10, then X > 5 or Y > 5)

(i) E(min(X, Y)) ≤ min(EX, EY) (since min(X, Y) ≤ X gives E min(X, Y) ≤ EX, and similarly E min(X, Y) ≤ EY)

(j) E(X/Y) ≥ EX/EY (since E(X/Y) = E(X)E(1/Y), with E(1/Y) ≥ 1/EY by Jensen)

(k) E(X²(X² + 1)) ≥ E(X²(Y² + 1)) (since E(X⁴) ≥ (EX²)² = E(X²)E(Y²) = E(X²Y²), because X² and Y² are i.i.d. and independent implies uncorrelated)

(l) E(X³/(X³ + Y³)) = E(Y³/(X³ + Y³)) (by symmetry!)

2. (a) Show that E(1/X) > 1/(EX) for any positive non-constant r.v. X.

The function g(x) = 1/x is strictly convex because g''(x) = 2x⁻³ > 0 for all x > 0, so Jensen's inequality yields E(1/X) > 1/(EX) for any positive non-constant r.v. X.
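A quick numerical illustration (with X ∼ Uniform(1, 3) as an arbitrary positive non-constant example):

```python
import numpy as np

# Illustrate E(1/X) > 1/(EX) for a positive, non-constant X, here Uniform(1, 3).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 3.0, size=10**6)
print("E(1/X):", (1 / x).mean())   # exact value ln(3)/2, about 0.549
print("1/(EX):", 1 / x.mean())     # exact value 1/2
```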

(b) Show that for any two positive r.v.s X and Y with neither a constant multiple of the other, E(X/Y )E(Y/X) > 1.

The r.v. W = Y/X is positive and non-constant, so (a) yields
E(X/Y) = E(1/W) > 1/E(W) = 1/E(Y/X),
and multiplying both sides by E(Y/X) > 0 gives E(X/Y)E(Y/X) > 1.
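Similarly, a quick numerical illustration of (b), with X and Y taken to be i.i.d. Uniform(1, 3) as an arbitrary example:

```python
import numpy as np

# Illustrate E(X/Y)E(Y/X) > 1 for positive X, Y that are not constant multiples
# of each other (here i.i.d. Uniform(1, 3)).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 3.0, size=10**6)
y = rng.uniform(1.0, 3.0, size=10**6)
print("E(X/Y) * E(Y/X):", (x / y).mean() * (y / x).mean())   # exceeds 1
```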

3. For i.i.d. r.v.s X1, ..., Xn with mean µ and variance σ², give a value of n (as a specific number) that will ensure that there is at least a 99% chance that the sample mean will be within 2 standard deviations of the true mean µ.

We have to find n such that P(|X̄n − µ| > 2σ) ≤ 0.01. By Chebyshev's inequality (in the form P(|Y − EY| > c) ≤ Var(Y)/c²), we have
P(|X̄n − µ| > 2σ) ≤ Var(X̄n)/(2σ)² = (σ²/n)/(4σ²) = 1/(4n).
So the desired inequality holds if n ≥ 25.
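As an optional illustration (not needed for the solution), a simulation with n = 25 and, say, i.i.d. Expo(1) data (so µ = σ = 1, an arbitrary choice) shows that the actual probability of missing by more than 2σ is far below the Chebyshev bound 1/(4·25) = 0.01, since Chebyshev is typically conservative.

```python
import numpy as np

# Estimate P(|Xbar - mu| > 2*sigma) for n = 25 i.i.d. Expo(1) observations.
rng = np.random.default_rng(0)
n, n_sims = 25, 10**5
xbar = rng.exponential(1.0, size=(n_sims, n)).mean(axis=1)
prob = np.mean(np.abs(xbar - 1.0) > 2.0)
print("simulated P(|Xbar - mu| > 2*sigma):", prob)   # well below the bound 0.01
```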

4. The famous arithmetic mean-geometric mean inequality says that for any positive numbers a1, a2, ..., an,
(a1 + a2 + ··· + an)/n ≥ (a1 a2 ··· an)^(1/n).
Show that this inequality follows from Jensen's inequality, by considering E log(X) for a r.v. X whose possible values are a1, ..., an (you should specify the PMF of X; if you want, you can assume that the aj are distinct (no repetitions), but be sure to say so if you assume this).

Assume that the aj are distinct, and let X be a random variable which takes values a1, a2, ..., an with equal probability (the case of repeated aj's can be handled similarly, letting the probability of X = aj be mj/n, where mj is the number of times aj appears in the list a1, ..., an). Jensen's inequality gives E(log X) ≤ log(EX), since the log function is concave. The left-hand side is (1/n) Σ_{i=1}^n log ai, while the right-hand side is log((a1 + a2 + ··· + an)/n). So we have the following inequality:
(1/n) Σ_{i=1}^n log ai ≤ log((a1 + a2 + ··· + an)/n).
Thus,

e^{(1/n) Σ_{i=1}^n log ai} = e^{(1/n) log(a1 ··· an)} = (a1 ··· an)^{1/n} ≤ (a1 + a2 + ··· + an)/n.
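A quick numerical check of the AM-GM inequality on a randomly generated list of positive numbers (purely illustrative):

```python
import numpy as np

# AM-GM check on a random list of positive numbers.
rng = np.random.default_rng(0)
a = rng.uniform(0.1, 10.0, size=20)
am = a.mean()
gm = np.exp(np.log(a).mean())   # geometric mean via the exponential of the mean log
print("AM:", am, ">= GM:", gm, "->", am >= gm)
```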

Stat 110 Penultimate Homework, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1. Judit plays in a total of N ∼ Geom(s) chess tournaments in her career. Suppose that in each tournament she has probability p of winning the tournament, independently. Let T be the number of tournaments she wins in her career.
(a) Find the mean and variance of T.
(b) Find the MGF of T. What is the name of this distribution (with its parameters)?

2. Let X1, X2 be i.i.d., and let X̄ = (X1 + X2)/2. In many statistics problems, it is useful or important to obtain a conditional expectation given X̄. As an example of this, find E(w1X1 + w2X2 | X̄), where w1, w2 are constants with w1 + w2 = 1.

3. A certain stock has low volatility on some days and high volatility on other days. Suppose that the probability of a low volatility day is p and of a high volatility day is q = 1 − p, and that on low volatility days the percent change in the stock price is N(0, σ1²), while on high volatility days the percent change is N(0, σ2²), with σ1 < σ2. Let X be the percent change of the stock on a certain day. The distribution is said to be a mixture of two Normal distributions, and a convenient way to represent X is as X = I1X1 + I2X2, where I1 is the indicator r.v. of having a low volatility day, I2 = 1 − I1, Xj ∼ N(0, σj²), and I1, X1, X2 are independent.
(a) Find the variance of X in two ways: using Eve's Law, and by calculating Cov(I1X1 + I2X2, I1X1 + I2X2) directly.
(b) The kurtosis of a r.v. Y with mean µ and standard deviation σ is defined by

Kurt(Y) = E(Y − µ)⁴/σ⁴ − 3.
This is a measure of how heavy-tailed the distribution of Y is. Find the kurtosis of X (in terms of p, q, σ1², σ2², fully simplified). The result will show that even though the kurtosis of any Normal distribution is 0, the kurtosis of X is positive and in fact can be very large depending on the parameter values.

4. We wish to estimate an unknown parameter θ, based on a r.v. X we will get to observe. As in the Bayesian perspective, assume that X and θ have a joint distribution. Let θ̂ be the estimator (which is a function of X). Then θ̂ is said to be unbiased if E(θ̂ | θ) = θ, and θ̂ is said to be the Bayes procedure if E(θ | X) = θ̂.

(a) Let θ̂ be unbiased. Find E(θ̂ − θ)² (the average squared difference between the estimator and the true value of θ), in terms of marginal moments of θ̂ and θ. Hint: condition on θ.
(b) Repeat (a), except in this part suppose that θ̂ is the Bayes procedure rather than assuming that it is unbiased. Hint: condition on X.
(c) Show that it is impossible for θ̂ to be both the Bayes procedure and unbiased, except in silly problems where we get to know θ perfectly by observing X. Hint: if Y is a nonnegative r.v. with mean 0, then P(Y = 0) = 1.

5. The surprise of learning that an event with probability p happened is defined as log2(1/p) (where the log is base 2 so that if we learn that an event of probability 1/2 happened, the surprise is 1, which corresponds to having received 1 bit of information). Let X be a discrete r.v. whose possible values are a1, a2, ..., an (distinct real numbers), with P(X = aj) = pj (where p1 + p2 + ··· + pn = 1).
The entropy of X is defined to be the average surprise of learning the value of X, i.e., H(X) = Σ_{j=1}^n pj log2(1/pj). This concept was used by Shannon to create the field of information theory, which is used to quantify information and has become essential for communication and compression (e.g., MP3s and cell phones).
(a) Explain why H(X³) = H(X), and give an example where H(X²) ≠ H(X).
(b) Show that the maximum possible entropy for X is when its distribution is uniform over a1, a2, ..., an, i.e., P(X = aj) = 1/n. (This should make sense intuitively: learning the value of X conveys the most information on average when X is equally likely to take any of its values, and the least possible information if X is a constant.) Hint: this can be done by Jensen's inequality, without any need for calculus. To do so, consider a r.v. Y whose possible values are the probabilities p1, ..., pn, and show why E(log2(1/Y)) ≤ log2(E(1/Y)) and how to interpret it.

6. In a national survey, a random sample of people are chosen and asked whether they support a certain policy. Assume that everyone in the population is equally likely to be surveyed at each step, and that the sampling is with replacement (sampling without replacement is typically more realistic, but with replacement will be a good approximation if the sample size is small compared to the population size). Let n be the sample size, and let p̂ and p be the proportion of people who support the policy in the sample and in the entire population, respectively. Show that for every c > 0,
P(|p̂ − p| > c) ≤ 1/(4nc²).

Stat 110 Penultimate Homework Solutions, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1. Judit plays in a total of N ∼ Geom(s) chess tournaments in her career. Suppose that in each tournament she has probability p of winning the tournament, independently. Let T be the number of tournaments she wins in her career.

(a) Find the mean and variance of T.

We have T | N ∼ Bin(N, p). By Adam's Law,
E(T) = E(E(T | N)) = E(Np) = p(1 − s)/s.
By Eve's Law,

Var(T) = E(Var(T | N)) + Var(E(T | N))
= E(Np(1 − p)) + Var(Np)
= p(1 − p)(1 − s)/s + p²(1 − s)/s²
= p(1 − s)(s + (1 − s)p)/s².

(b) Find the MGF of T. What is the name of this distribution (with its parameters)?

Let Ij ∼ Bern(p) be the indicator of Judit winning the jth tournament. Then
E(e^{tT}) = E(E(e^{tT} | N))
= E((pe^t + q)^N)

= s Σ_{n=0}^∞ (pe^t + 1 − p)^n (1 − s)^n
= s/(1 − (1 − s)(pe^t + 1 − p)).
This is reminiscent of the Geometric MGF used on HW 9. The famous discrete distributions we have studied whose possible values are 0, 1, 2, ... are the Poisson, Geometric, and Negative Binomial, and T is clearly not Poisson since its variance doesn't equal its mean. If T ∼ Geom(θ), we have θ = s/(s + p(1 − s)), as found by setting E(T) = (1 − θ)/θ or by finding Var(T)/E(T). Writing the MGF of T as

E(e^{tT}) = s/(s + (1 − s)p − (1 − s)pe^t) = (s/(s + (1 − s)p)) / (1 − ((1 − s)p/(s + (1 − s)p))e^t),

we see that T ∼ Geom(θ) with θ = s/(s + (1 − s)p). Note that this is consistent with (a).
The distribution of T can also be obtained by a story proof. Imagine that just before each tournament she may play in, Judit retires with probability s (if she retires, she does not play in that or future tournaments). Her tournament history can be written as a sequence of W (win), L (lose), R (retire), ending in the first R, where the probabilities of W, L, R are (1 − s)p, (1 − s)(1 − p), s respectively. For calculating T, the losses can be ignored: we want to count the number of W's before the R. The probability that a result is R given that it is W or R is s/(s + (1 − s)p), so we again have T ∼ Geom(s/(s + (1 − s)p)).
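As an optional check of both (a) and (b) (purely illustrative), the sketch below simulates T with the arbitrary values s = 0.2 and p = 0.4 and compares its mean, variance, and P(T = 0) with those of Geom(θ) for θ = s/(s + (1 − s)p).

```python
import numpy as np

# Simulate T: N ~ Geom(s) tournaments (support 0, 1, 2, ...), then T | N ~ Bin(N, p).
rng = np.random.default_rng(0)
s, p = 0.2, 0.4              # arbitrary illustrative parameter values
n_sims = 10**6

N = rng.geometric(s, size=n_sims) - 1   # NumPy's geometric starts at 1, so subtract 1
T = rng.binomial(N, p)

theta = s / (s + (1 - s) * p)
print("mean:", T.mean(), "vs (1 - theta)/theta   =", (1 - theta) / theta)
print("var: ", T.var(),  "vs (1 - theta)/theta^2 =", (1 - theta) / theta**2)
print("P(T = 0):", np.mean(T == 0), "vs theta =", theta)   # Geom(theta) has P(T = 0) = theta
```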

2. Let X1, X2 be i.i.d., and let X̄ = (X1 + X2)/2. In many statistics problems, it is useful or important to obtain a conditional expectation given X̄. As an example of this, find E(w1X1 + w2X2 | X̄), where w1, w2 are constants with w1 + w2 = 1.

By symmetry E(X1 | X̄) = E(X2 | X̄), and by linearity and taking out what's known, E(X1 | X̄) + E(X2 | X̄) = E(X1 + X2 | X̄) = X1 + X2. So E(X1 | X̄) = E(X2 | X̄) = X̄ (this was also derived in class). Thus,
E(w1X1 + w2X2 | X̄) = w1E(X1 | X̄) + w2E(X2 | X̄) = w1X̄ + w2X̄ = X̄.
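Conditioning on a continuous X̄ is awkward to verify by brute force, so here is an optional discrete illustration instead: with X1, X2 i.i.d. rolls of a fair die and the arbitrary weights w1 = 0.3, w2 = 0.7, grouping the 36 equally likely outcomes by X̄ and averaging w1X1 + w2X2 within each group recovers X̄.

```python
import numpy as np

# Discrete check that E(w1*X1 + w2*X2 | Xbar) = Xbar for i.i.d. X1, X2 (fair dice).
w1, w2 = 0.3, 0.7                       # arbitrary weights with w1 + w2 = 1
faces = np.arange(1, 7)
x1, x2 = np.meshgrid(faces, faces)      # all 36 equally likely outcomes
xbar = (x1 + x2) / 2
target = w1 * x1 + w2 * x2

for v in np.unique(xbar):
    cond_mean = target[xbar == v].mean()   # E(w1*X1 + w2*X2 | Xbar = v)
    print("Xbar =", v, "-> conditional mean =", cond_mean)
```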

3. A certain stock has low volatility on some days and high volatility on other days. Suppose that the probability of a low volatility day is p and of a high volatility day is q = 1 − p, and that on low volatility days the percent change in the stock price is N(0, σ1²), while on high volatility days the percent change is N(0, σ2²), with σ1 < σ2. Let X be the percent change of the stock on a certain day. The distribution is said to be a mixture of two Normal distributions, and a convenient way to represent X is as X = I1X1 + I2X2, where I1 is the indicator r.v. of having a low volatility day, I2 = 1 − I1, Xj ∼ N(0, σj²), and I1, X1, X2 are independent.

(a) Find the variance of X in two ways: using Eve's Law, and by calculating Cov(I1X1 + I2X2, I1X1 + I2X2) directly.

By Eve's Law,
Var(X) = E(Var(X | I1)) + Var(E(X | I1)) = E(I1²σ1² + (1 − I1)²σ2²) + Var(0) = pσ1² + (1 − p)σ2²,
since I1² = I1, I2² = I2. For the covariance method, expand

Var(X) = Cov(I1X1 + I2X2, I1X1 + I2X2) = Var(I1X1) + Var(I2X2) + 2Cov(I1X1, I2X2).

Then Var(I1X1) = E(I1²X1²) − (E(I1X1))² = E(I1)E(X1²) = pVar(X1), since E(I1X1) = E(I1)E(X1) = 0. Similarly, Var(I2X2) = (1 − p)Var(X2). And Cov(I1X1, I2X2) = E(I1I2X1X2) − E(I1X1)E(I2X2) = 0, since I1I2 always equals 0. So again we have Var(X) = pσ1² + (1 − p)σ2².

(b) The kurtosis of a r.v. Y with mean µ and standard deviation σ is defined by

Kurt(Y) = E(Y − µ)⁴/σ⁴ − 3.
This is a measure of how heavy-tailed the distribution of Y is. Find the kurtosis of X (in terms of p, q, σ1², σ2², fully simplified). The result will show that even though the kurtosis of any Normal distribution is 0, the kurtosis of X is positive and in fact can be very large depending on the parameter values.

Note that (I1X1 + I2X2)⁴ = I1X1⁴ + I2X2⁴, since the cross terms disappear (because I1I2 is always 0) and any positive power of an indicator r.v. is that indicator r.v.! So

E(X⁴) = E(I1X1⁴ + I2X2⁴) = 3pσ1⁴ + 3qσ2⁴.

Alternatively, we can use E(X⁴) = E(X⁴ | I1 = 1)p + E(X⁴ | I1 = 0)q to find E(X⁴). The mean of X is E(I1X1) + E(I2X2) = 0, so the kurtosis of X is

Kurt(X) = (3pσ1⁴ + 3qσ2⁴)/(pσ1² + qσ2²)² − 3.

This becomes 0 if σ1 = σ2, since then we have a Normal distribution rather than a mixture of two different Normal distributions. For σ1 < σ2, the kurtosis is positive since pσ1⁴ + qσ2⁴ > (pσ1² + qσ2²)², as seen by a Jensen's inequality argument, or by interpreting this as saying E(Y²) > (EY)², where Y is σ1² with probability p and σ2² with probability q.
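An optional simulation check of both the variance and the kurtosis formulas, using the arbitrary illustrative values p = 0.8, σ1 = 1, σ2 = 3:

```python
import numpy as np

# Mixture of Normals: with probability p the percent change is N(0, s1^2),
# otherwise N(0, s2^2).
rng = np.random.default_rng(0)
p, s1, s2 = 0.8, 1.0, 3.0    # arbitrary illustrative values
q = 1 - p
n_sims = 10**6

low = rng.random(n_sims) < p
X = np.where(low, rng.normal(0, s1, n_sims), rng.normal(0, s2, n_sims))

var_formula = p * s1**2 + q * s2**2
kurt_formula = (3 * p * s1**4 + 3 * q * s2**4) / var_formula**2 - 3
kurt_sim = np.mean((X - X.mean())**4) / X.var()**2 - 3
print("var: ", X.var(), "vs", var_formula)
print("kurt:", kurt_sim, "vs", kurt_formula)
```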

4. We wish to estimate an unknown parameter θ, based on a r.v. X we will get to observe. As in the Bayesian perspective, assume that X and θ have a joint distribution. Let θ̂ be the estimator (which is a function of X). Then θ̂ is said to be unbiased if E(θ̂ | θ) = θ, and θ̂ is said to be the Bayes procedure if E(θ | X) = θ̂.

(a) Let θ̂ be unbiased. Find E(θ̂ − θ)² (the average squared difference between the estimator and the true value of θ), in terms of marginal moments of θ̂ and θ. Hint: condition on θ.

Conditioning on θ, we have

E(θ̂ − θ)² = E(E(θ̂² − 2θθ̂ + θ² | θ))
= E(E(θ̂² | θ)) − E(E(2θθ̂ | θ)) + E(E(θ² | θ))
= E(θ̂²) − 2E(θE(θ̂ | θ)) + E(θ²)
= E(θ̂²) − 2E(θ²) + E(θ²)
= E(θ̂²) − E(θ²).

(b) Repeat (a), except in this part suppose that θ̂ is the Bayes procedure rather than assuming that it is unbiased. Hint: condition on X.

By the same argument as (a), except now conditioning on X, we have

E(θ̂ − θ)² = E(E(θ̂² − 2θθ̂ + θ² | X))
= E(E(θ̂² | X)) − E(E(2θθ̂ | X)) + E(E(θ² | X))
= E(θ̂²) − 2E(θ̂E(θ | X)) + E(θ²)
= E(θ̂²) − 2E(θ̂²) + E(θ²)
= E(θ²) − E(θ̂²).

(c) Show that it is impossible for θ̂ to be both the Bayes procedure and unbiased, except in silly problems where we get to know θ perfectly by observing X. Hint: if Y is a nonnegative r.v. with mean 0, then P(Y = 0) = 1.

Suppose that θ̂ is both the Bayes procedure and unbiased. By the above, we have E(θ̂ − θ)² = a and E(θ̂ − θ)² = −a, where a = E(θ̂²) − E(θ²). But that implies a = 0, which means that θ̂ = θ (with probability 1). That can only happen in the extreme situation where the observed data reveal the true θ perfectly; in practice, nature is much more elusive and does not reveal its deepest secrets with such alacrity.

5. The surprise of learning that an event with probability p happened is defined as log2(1/p) (where the log is base 2 so that if we learn that an event of probability 1/2 happened, the surprise is 1, which corresponds to having received 1 bit of information). Let X be a discrete r.v. whose possible values are a1, a2, ..., an (distinct real numbers), with P(X = aj) = pj (where p1 + p2 + ··· + pn = 1).
The entropy of X is defined to be the average surprise of learning the value of X, i.e., H(X) = Σ_{j=1}^n pj log2(1/pj). This concept was used by Shannon to create the field of information theory, which is used to quantify information and has become essential for communication and compression (e.g., MP3s and cell phones).

(a) Explain why H(X³) = H(X), and give an example where H(X²) ≠ H(X).

Note that the definition of H(X) depends only on the probabilities of the distinct values of X, not on what the values are. Since the function g(x) = x³ is one-to-one, P(X³ = aj³) = pj for all j. So H(X³) = H(X). For a simple example where H(X²) ≠ H(X), let X be a "random sign," i.e., let X take values −1 and 1 with probability 1/2 each. Then X² has entropy 0, whereas X has entropy log2(2) = 1.

(b) Show that the maximum possible entropy for X is when its distribution is uniform over a1, a2, ..., an, i.e., P(X = aj) = 1/n. (This should make sense intuitively: learning the value of X conveys the most information on average when X is equally likely to take any of its values, and the least possible information if X is a constant.) Hint: this can be done by Jensen's inequality, without any need for calculus. To do so, consider a r.v. Y whose possible values are the probabilities p1, ..., pn, and show why E(log2(1/Y)) ≤ log2(E(1/Y)) and how to interpret it.

Let Y be as in the hint. Then H(X) = E(log2(1/Y)) and E(1/Y) = Σ_{j=1}^n pj · (1/pj) = n. By Jensen's inequality, E(log2(T)) ≤ log2(E(T)) for any positive r.v. T, since log2 is a concave function. Therefore,

H(X) = E(log2(1/Y)) ≤ log2(E(1/Y)) = log2(n).

Equality holds if and only if 1/Y is a constant, which is the case where pj = 1/n for all j. This corresponds to X being equally likely to take on each of its values.
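A quick numerical illustration (not needed for the solution): for random PMFs on n = 6 values, the entropy stays below log2(6) ≈ 2.585 and equals it only for the uniform PMF.

```python
import numpy as np

# Entropy H = sum_j p_j * log2(1/p_j) is maximized by the uniform distribution.
rng = np.random.default_rng(0)
n = 6

def entropy(pmf):
    pmf = pmf[pmf > 0]
    return float(np.sum(pmf * np.log2(1 / pmf)))

print("log2(n)    :", np.log2(n))
print("uniform PMF:", entropy(np.full(n, 1 / n)))
for _ in range(3):
    pmf = rng.dirichlet(np.ones(n))   # a random PMF on n values
    print("random PMF :", entropy(pmf))
```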

6. In a national survey, a random sample of people are chosen and asked whether they support a certain policy. Assume that everyone in the population is equally likely to be surveyed at each step, and that the sampling is with replacement (sampling without replacement is typically more realistic, but with replacement will be a good approximation if the sample size is small compared to the population size). Let n be the sample size, and let p̂ and p be the proportion of people who support the policy in the sample and in the entire population, respectively. Show that for every c > 0,
P(|p̂ − p| > c) ≤ 1/(4nc²).

We can write p̂ = X/n with X ∼ Bin(n, p). So E(p̂) = p and Var(p̂) = p(1 − p)/n. Then by Chebyshev's inequality,
P(|p̂ − p| > c) ≤ Var(p̂)/c² = p(1 − p)/(nc²) ≤ 1/(4nc²),
where the last inequality is because p(1 − p) is maximized at p = 1/2.
