Lecture 10 Conditional Expectation

Lecture 10:Conditional Expectation 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 10 Conditional Expectation The definition and existence of conditional expectation For events A, B with P[B] > 0, we recall the familiar object P[ j ] = P[A\B] A B P[B] . We say that P[AjB] the conditional probability of A, given B. It is important to note that the condition P[B] > 0 is crucial. When X and Y are random variables defined on the same probability space, we often want to give a meaning to the expression P[X 2 AjY = y], even though it is usually the case that P[Y = y] = 0. When the random vector (X, Y) admits a joint density fX,Y(x, y), and fY(y) > 0, the concept of conditional density fXjY=y(x) = fX,Y(x, y)/ fY(y) is introduced and R the quantity P[X 2 AjY = y] is given meaning via A fXjY=y(x, y) dx. While this procedure works well in the restrictive case of absolutely continuous random vectors, we will see how it is encompassed by a general concept of a conditional expectation. Since probability is simply an expectation of an indicator, and expectations are linear, it will be easier to work with expectations and no generality will be lost. Two main conceptual leaps here are: 1) we condition with respect to a s-algebra, and 2) we view the conditional expectation itself as a random variable. Before we illustrate the concept in discrete time, here is the definition. Definition 10.1. Let G be a sub-s-algebra of F, and let X 2 L1 be a random variable. We say that the random variable x is (a version of) the conditional expectation of X with respect to G - and denote it by E[XjG] - if 1. x 2 L1. 2. x is G-measurable, 3. E[x1A] = E[X1A], for all A 2 G. Last Updated: January 24, 2015 Lecture 10:Conditional Expectation 2 of 17 Example 10.2. Suppose that (W, F, P) is a probability space where X W = fa, b, c, d, e, f g, F = 2W and P is uniform. Let X, Y and Z be E[X (Y )] random variables given by (in the obvious notation) ! a b c d e f X ∼ , 1 3 3 5 5 7 ! ! a b c d e f a b c d e f Y ∼ and Z ∼ 2 2 1 1 7 7 3 3 3 3 2 2 We would like to think about E[XjG] as the average of X(w) over all w which are consistent with the our current information (which is G). For G = (Y) G example, if s , then the information contained in is exactly X the information about the exact value of Y. Knowledge of the fact that [X ( )] Y = y does not necessarily reveal the “true” w, but certainly rules out E all those w for which Y(w) 6= y. In our specific case, if we know that Y = 2, then w = a or w = b, 1 1 and the expected value of X, given that Y = 2, is 2 X(a) + 2 X(b) = 2. Similarly, this average equals 4 for Y = 1, and 6 for Y = 7. Let us show that the random variable x defined by this average, i.e., ! a b c d e f x ∼ , 2 2 4 4 6 6 satisfies the definition of E[Xjs(Y)], as given above. The integrability is not an issue (we are on a finite probability space), and it is clear that x is measurable with respect to s(Y). Indeed, the atoms of s(Y) are fa, bg, fc, dg and fe, f g, and x is constant over each one of those. Finally, we need to check that E[x1A] = E[X1A], for all A 2 s(Y), which for an atom A translates into ( ) = 1 E[ ] = ( 0)P[f 0gj ] 2 x w P[A] X1A ∑ X w w A , for all w A. w02A The moral of the story is that when A is an atom, part 3. of Defini- tion 10.1 translates into a requirement that x be constant on A with value equal to the expectation of X over A with respect to the conditional probability P[·jA]. In the general case, when there are no atoms, 3. still makes sense and conveys the same message. Btw, since the atoms of s(Z) are fa, b, c, dg and fe, f g, it is clear that 8 <3, w 2 fa, b, c, dg, E[Xjs(Z)](w) = :6, w 2 fe, f g. Last Updated: January 24, 2015 Lecture 10:Conditional Expectation 3 of 17 Look at the illustrations above and convince yourself that E[E[Xjs(Y)]js(Z)] = E[Xjs(Z)]. A general result along the same lines - called the tower property of conditional expectation - will be stated and proved below. Our first task is to prove that conditional expectations always exist. When W is finite (as explained above) or countable, we can always construct them by averaging over atoms. In the general case, a different argument is needed. In fact, here are two: Proposition 10.3. Let G be a sub-s-algebra G of F. Then 1. there exists a conditional expectation E[XjG] for any X 2 L1, and 2. any two conditional expectations of X 2 L1 are equal P-a.s. Proof. (Uniqueness): Suppose that x and x0 both satisfy 1., 2. and 3. of Definition 10.1. Then 0 E[x1A] = E[x 1A], for all A 2 G. 0 1 For An = fx − x ≥ n g, we have An 2 G and so E[ ] = E[ 0 ] ≥ E[( + 1 ) ] = E[ ] + 1 P[ ] x1An x 1An x n 1An x1An n An . 0 Consequently, P[An] = 0, for all n 2 N, so that P[x > x] = 0. By a symmetric argument, we also have P[x0 < x] = 0. (Existence): By linearity, it will be enough to prove that the condi- 1 tional expectation exists for X 2 L+. 1. A Radon-Nikodym argument. Suppose, first, that X ≥ 0 and E[X] = 1, as the general case follows by additivity and scaling. Then the prescription Q[A] = E[X1A], defines a probability measure on (W, F), which is absolutely continuous with respect to P. Let QG be the restriction of Q to G; it is trivially absolutely continuous with respect to the restriction PG of P to G. The Radon-Nikodym theorem - applied to the measure space (W, G, PG ) and the measure QG PG - guarantees the existence of the Radon- Nikodym derivative dQG x = 2 L1 (W, G, PG ). dPG + For A 2 G, we thus have G PG E[X1A] = Q[A] = Q [A] = E [x1A] = E[x1A]. Last Updated: January 24, 2015 Lecture 10:Conditional Expectation 4 of 17 where the last equality follows from the fact that x1A is G-measurable. Therefore, x is (a version of) the conditional expectation E[XjG]. 1. An L2-argument. Suppose, first, that X 2 L2. Let H be the family of all G-measurable elements in L2. Let H¯ denote the closure of H in the topology induced by L2-convergence. Being a closed and convex (why?) subset of L2, H¯ satisfies all the conditions of Problem ?? so that there exists x 2 H¯ at the minimal L2-distance from X (when X 2 H¯ , we take x = X). The same problem states that x has the following property: E[(h − x)(X − x)] ≥ 0 for all h 2 H¯ , and, since H¯ is a linear space, we have E[(h − x)(X − x)] = 0, for all h 2 H¯ . It remains to pick h of the form h = x + 1A 2 H¯ , A 2 G, to conclude that E[X1A] = E[x1A], for all A 2 G. Our next step is to show that x is G-measurable (after a modifi- cation on a null set, perhaps). Since x 2 H¯ , there exists a sequence 2 a.s. fxngn2N such that xn ! x in L . By Corollary ??, xnk ! x, for 0 some subsequence fxnk gk2N of fxngn2N. Set x = lim infk2N xnk 2 0 ˆ 0 ˆ ˆ L ([−¥, ¥], G) and x = x 1fjx0j<¥g, so that x = x, a.s., and x is G- measurable. 2 We still need to remove the restriction X 2 L+. We start with 1 ¥ 2 a general X 2 L+ and define Xn = min(X, n) 2 L+ ⊆ L+. Let xn = E[XnjG], and note that E[xn+11A] = E[Xn+11A] ≥ E[Xn1A] = E[xn1A]. It follows (just like in the proof of uniqueness above) that xn ≤ xn+1, a.s. We define x = supn xn, so that xn % x, a.s. Then, for A 2 G, the monotone-convergence theorem implies that E[X1A] = lim E[Xn1A] = lim E[xn1A] = E[x1A], n n 1 and it is easy to check that x1fx<¥g 2 L (G) is a version of E[XjG]. Remark 10.4. There is no canonical way to choose “the version” of the conditional expectation. We follow the convention started with Radon- Nikodym derivatives, and interpret a statement such at x ≤ E[XjG], a.s., to mean that x ≤ x0, a.s., for any version x0 of the conditional expectation of X with respect to G. If we use the symbol L1 to denote the set of all a.s.-equivalence classes of random variables in L1, we can write: E[·jG] : L1(F) ! L1(G), Last Updated: January 24, 2015 Lecture 10:Conditional Expectation 5 of 17 but L1(G) cannot be replaced by L1(G) in a natural way.

Lecture 10 Conditional Expectation

Lecture 22: Bivariate Normal Distribution Distribution

Bayes and the Law

Estimating the Accuracy of Jury Verdicts

General Probability, II: Independence and Conditional Proba- Bility

Laws of Total Expectation and Total Variance

Randomized Algorithms

Propensities and Probabilities

MA3K0 - High-Dimensional Probability

Conditional Probability and Bayes Theorem A

Multicollinearity

CONDITIONAL EXPECTATION Definition 1. Let (Ω,F,P)

Conditional Expectation and Prediction Conditional Frequency Functions and Pdfs Have Properties of Ordinary Frequency and Density Functions