Conditional Probability Theory
Total Page:16
File Type:pdf, Size:1020Kb
Conditional Probability Theory Ioanid Rosu 1 Introduction Conditional probability theory is one of the most difficult parts of basic probability theory. The reason is that it is hard to come up with good intutitions for it. Just for fun, let me give the formal definition of conditional expectations. Start with a probability space (Ω; ;P ), with Ω the event space, a σ-algebra on Ω, and P a probability on , i.e., P is a positiveF real measure on (Ω; ) with PF(Ω) = 1. Let be a sub-σ-algebra of .F Let f be a random variable on Ω (whichF is -measurable, but notB necessarily -measurable).F Then the conditional expectation E(f ) ofFf given is a -measurable functionB g on Ω which satisfies jB B B g = E(f ) Z f dP = Z g dP B : (1) jB () B B 8 2 B Not very enlightening now, is it? I think that, unless you are a mathematical genius, the most reasonable intuition you can come up with by just looking at the definition is this: Let's assume f is a simple function, i.e., f is constant on some partition of Ω by -measurable sets. Since has a finer \resolution" than (there are more sets in ), conditionalF expectation of f givenF must be some way of forcingBf to be identified by someoneF who only \sees" . The way to doB that is by averaging out f on those smaller subsets of , and making it constantB on the larger subsets of . F Directly from the definitionB we can deduce two properties which are used a lot in practice: Notice that if f is already -measurable, we can take g = f in the above definition, • hence E f = f. In otherB words, -measurable functions are treated as constants by the operatorf jBg E . In particular,B if X is any ( -measurable) random variable, we deduce this important{·|Bg property about conditional expectations:F E fX = fE X f -measurable: (2) f j Bg f jBg 8 B Suppose X :Ω R is a -measurable random variable. Define E f X = E f X , • ! F 1f j g f jB g where = X is the σ-algebra generated by all sets of the form X− (B), with B a Borel set1 inBR. ThenB we have the following property: E f X = φ(X) with φ : R R measurable: (3) f j g ! Date: March 17,2003. 1 A Borel set on R is simply an element in the σ-algebra R generated by all open intervals in R. Equivalently, a Borel set can be obtained by taking a countable numberB of intersections, unions, and complements of open intervals. 1 To explain briefly why this is the case, let g = E f X , which according to the definition f j g of conditional expectation is X -measurable. But for a function g :Ω R to be X - measurable is the same as g Bbeing a composition of the form φ X, with! φ : R B R measurable. (It is trivial that g = φ X as functions; the fact◦ that φ is measurable! ◦ comes from the fact that X induces a one-to-one correspondence between X and R restricted to Im X R. B B ⊆ But enough about the definition of conditional expectations. I'm not a big fan of it, although this is what you have to use in order to prove formulas. Luckily, there is an easier way of interpreting conditional expectations, but it comes at the cost of losing some generality. Namely, if we restrict ourselves at the subclass of \square-integrable" random variables, we can come up with a nice geometric intuition. This is because the square-integrable functions form something called a Hilbert space, where one can do geometry in the usual way. We define the Hilbert space2 of square-integrable -measurable functions on Ω by H F (X; Y ) = E XY : (4) f g The great thing about Hilbert spaces is that the apparently innocuous inner product generates a lot of structure, to the extent that in a Hilbert space one can work smoothly with geometric concepts such as distance, projection, angles, etc. For example, the distance d(X; Y ) and the angle α(X; Y ) are defined as follows: 1 E XY 2 2 d(X; Y ) = E (X Y ) ; cos α(X; Y ) = f1 g 1 : (5) f − g E X2 2 E Y 2 2 f g f g Notice that if we restrict our attention at the subspace 0 = X EX = 0 of de-meaned random variables in , the formula for the cosine of theH anglef becomes2 H j g H cos α(X; Y ) = corr(X; Y ) X; Y 0; (6) 8 2 H where corr(X; Y ) denotes correlation between X and Y . Also, we say that X and Y are perpendicular, and write X Y , if cos α(X; Y ) = 0. Equivalently, ? X Y E XY = 0: (7) ? () f g What is then the projection of an element X on the subspace of all -measurable functions in ? It must be some element Y 2so H that (b Y ) (XHB Y ) forB all b in . But Y is alsoH in , so we can replace b by b+2Y H,B and deduce− that b? (X− Y ) for all b in HB. HB ? − HB This means that E b(X Y ) = 0 for all b in . In particular, take b = 1B, the indicator f − g HB function of some subset B . We then get E 1B(X Y ) = 0, which translates into 2 B f − g B X dP = B Y dP . The subset B was arbitrary, so it follows that Y satisfies our definition ofR conditionalR expectation of X with respect to B. In other words, Proj X = E X ; (8) HB f jBg where Proj X is the projection of X on . HB HB 2A real Hilbert space is a real vector space , endowed with an inner product ( ; ): R, so that (x; x) 0 with equality iff x = 0, and such thatH with the resulting metric is complete.· · TheH × metric H ! is defined by d(x;≥ y) = (x y; x y)1=2. For details see for example Rudin, \Real andH Complex Analysis". − − 2 One last subtle point here is about the existence of the projection of X on the closed subspace = (the fact that is closed is proved by some easy measure theory). The existenceL of theH projectionB Y = ProjL X is a standard fact in Hilbert space theory, and is proved as follows: First notice that L is a Hilbert space in its own right (easy). Then one can define the continuous linear functionalL Φ( ) = (X; ): R. According to the Riesz' representation theorem, there exists Y so· that Φ( )· = (L!Y; ) on . But this is equivalent to (X Y; b) = 0 for all b , which2 by L the definition· of projection· L means exactly that Y is the projection− of X on 2.L So the crucial step in this proof is Riesz' theorem, which only makes use of the fact that L is complete and is closed (and of course plays around with the inner product). H L By contrast, to see what we gained by reducing the generality from all measurable functions to the square-integrable ones (and thus entering the neat world of Hilbert spaces), let's see what is involved in the general existence proof for the conditional expectation g = E f in (1). f jBg First notice that the measure B µ(B) = B f dP is absolutely continuous with respect to P (that's easy). Then the hard part7! is provedR by Radon{Nikodym, namely that there exists g a -measurable function such that µ(B) = B g dP . But then, given our definition (1), g is exactlyB the conditional expectation of f givenR . The Radon{Nikodym theorem is one of the deepest theorems in measure theory, and its proofB is one of the most technical proofs I know, given such an elementary statement. I dare you to read the proof and remember it after just one day. This concludes our discussion about the geometric interpretation of the conditional expec- tation. Now we want to put it to use. 2 Formulas There are two basic formulas in conditional probability theory: the law of iterated expecta- tions (9), also called the ADAM formula, and the EVE formula (10)3. Let X be a -measurable random variable, and two sub-σ-algebras of . Then: F B0 ⊆ B F E X 0 = E E X 0 ; (9) f jB g f jBg j B V X = E V X + V E X : (10) f g f jBg f jBg Both of them have a nice geometric interpretation, which we now discuss. The law of iterated expectations (9) is quite simple. With our interpretation of conditional expectations, it says that projecting X on the smaller subspace gives the same result as 0 first projecting X on the larger subspace , and then takingH ProjB X and projecting it further on . This has a simple geometricHB proof if you draw theH pictureB (I learned it in 0 8'th grade asHB the theorem of the 3 perpendiculars). But, to be more rigorous, consider the decomposition of as an orthogonal sum, = ?. Notice that via this decomposition the first componentH of X is exactly the projectionH HB Proj⊕ HB X. Now also has an orthogonal decomposition, = . Then the projectionH ofB Proj XHBon the first component, 0 ? HB HB ⊕ HB0B HB , is Proj Proj X. We now look at the decomposition = ? ?, which 0 0 0 HB HB0 HB H HB ⊕ HB B ⊕ HB 3If you haven't figured it out yet, the reason these are called the ADAM and EVE formulas comes from the form of the EVE equation (10), which should perhaps be called the EVVE equation.