MIT 14.30, Fall 2005
Raymond Guiteras

Handout on the multivariate law of total probability and Bayes' Rule

In the following, let $X = (X_1, X_2)$ be a random vector, where $X_1$ is a random $K_1$-vector and $X_2$ is a random $K_2$-vector. As usual, we write the random vector in uppercase and its value in lowercase.

1 Multivariate Law of Total Probability

Recall the original law of total probability: given a partition $B_1, \ldots, B_n$ and an event $A$, we can calculate the probability of $A$ as a weighted average of the conditional probabilities of $A$ given each event in the partition, with the probabilities of the partitioning events as the weights:

$$P(A) = P(A \mid B_1)\, P(B_1) + \cdots + P(A \mid B_n)\, P(B_n) \tag{1}$$

Let's apply this same idea to random vectors. We know that to obtain a marginal distribution, we integrate the other variables out of the joint distribution:

$$f_{X_1}(x_1) = \int_{x_2 \in \mathbb{R}^{K_2}} f_{X_1, X_2}(x_1, x_2)\, dx_2 \tag{2}$$

(Note that this is a $K_2$-fold integral. Why?) From the definition of conditional probability, we know that

$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)} \tag{3}$$

and therefore

$$f_{X_1, X_2}(x_1, x_2) = f_{X_1 \mid X_2}(x_1 \mid x_2)\, f_{X_2}(x_2) \tag{4}$$

This allows us to rewrite (2) as

$$f_{X_1}(x_1) = \int_{x_2 \in \mathbb{R}^{K_2}} f_{X_1 \mid X_2}(x_1 \mid x_2)\, f_{X_2}(x_2)\, dx_2 \tag{5}$$

This is the Law of Total Probability for random vectors. Note how closely it resembles (1), our original LoTP: the sum over the partition becomes an integral over the values of $X_2$, and the weights $P(B_i)$ become the density $f_{X_2}(x_2)$.

2 Bayes' Rule for Multivariate Distributions

Again, let's review Bayes' Rule for events:

$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{P(A)} \tag{6}$$

We can expand the denominator by using (1) to obtain

$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{P(A \mid B_1)\, P(B_1) + \cdots + P(A \mid B_n)\, P(B_n)} \tag{7}$$

For all these conditional probabilities to be well-defined, of course, we have to assume that $P(B_i) > 0 \ \forall i$. Again, let's apply the logic of events to multivariate distributions and see what we get.
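Before moving to densities, the event versions (1) and (7) can be checked on a small example. The sketch below is only an illustration: the three-event partition, the numbers, and the function names `total_probability` and `posterior` are assumptions made here, not part of the handout. Exact fractions are used so that the results can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical partition B1, B2, B3 with made-up probabilities (not from the handout).
prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]          # P(B_i), all > 0
likelihood = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]   # P(A | B_i)


def total_probability(likelihood, prior):
    """Equation (1): P(A) = sum over i of P(A | B_i) P(B_i)."""
    return sum(l * p for l, p in zip(likelihood, prior))


def posterior(likelihood, prior):
    """Equation (7): P(B_i | A) for every i, with the denominator expanded via (1)."""
    p_a = total_probability(likelihood, prior)
    return [l * p / p_a for l, p in zip(likelihood, prior)]


p_a = total_probability(likelihood, prior)
post = posterior(likelihood, prior)

print("P(A) =", p_a)  # P(A) = 11/30
for i, value in enumerate(post, start=1):
    print(f"P(B{i} | A) =", value)
```

Because the $B_i$ form a partition, the posterior probabilities sum to one, which is a quick sanity check on any calculation of this kind.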
Suppose we're interested in

$$f_{X_1 \mid X_2}(x_1 \mid x_2) \tag{8}$$

As above, by definition we have

$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)} \tag{9}$$

To rewrite the numerator, just observe that

$$f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_1}(x_1)} \tag{10}$$

by definition. We can multiply through to obtain

$$f_{X_1, X_2}(x_1, x_2) = f_{X_2 \mid X_1}(x_2 \mid x_1)\, f_{X_1}(x_1) \tag{11}$$

Substituting this into the numerator of (9), we have

$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1)\, f_{X_1}(x_1)}{f_{X_2}(x_2)} \tag{12}$$

This is Bayes' Rule for multivariate distributions. It may be convenient to apply the multivariate Law of Total Probability (5), with the roles of $X_1$ and $X_2$ exchanged, to rewrite the denominator, obtaining¹

$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1)\, f_{X_1}(x_1)}{\int_{\tilde{x}_1 \in \mathbb{R}^{K_1}} f_{X_1, X_2}(\tilde{x}_1, x_2)\, d\tilde{x}_1} = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1)\, f_{X_1}(x_1)}{\int_{\tilde{x}_1 \in \mathbb{R}^{K_1}} f_{X_2 \mid X_1}(x_2 \mid \tilde{x}_1)\, f_{X_1}(\tilde{x}_1)\, d\tilde{x}_1} \tag{13}$$

Again, note how closely this resembles (7) above.

¹ The tilde on $\tilde{x}_1$ is meant to remind you that you are integrating over all possible values of $\tilde{x}_1$, i.e. to distinguish the variable of integration from the value $x_1$ at which you are calculating the conditional density.
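The multivariate Law of Total Probability and Bayes' Rule can also be checked numerically in a case where the answer is known in closed form. The sketch below is an illustration under stated assumptions, none of which come from the handout: $K_1 = K_2 = 1$, $(X_1, X_2)$ is bivariate normal with zero means, unit variances, and correlation `RHO = 0.6`, and the integral over $\tilde{x}_1$ is approximated by a trapezoid rule on $[-10, 10]$. All function names are hypothetical. It relies on the standard result that, for this distribution, each conditional density is normal with mean $\rho$ times the conditioning value and variance $1 - \rho^2$.

```python
import math

RHO = 0.6  # assumed correlation, chosen only for illustration


def normal_pdf(x, mean=0.0, var=1.0):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def f_x1(x1):
    """Marginal density of X1 (standard normal under the assumptions above)."""
    return normal_pdf(x1)


def f_x2_given_x1(x2, x1):
    """Conditional density of X2 given X1 = x1: N(RHO * x1, 1 - RHO^2)."""
    return normal_pdf(x2, RHO * x1, 1.0 - RHO ** 2)


def integrate(g, lo=-10.0, hi=10.0, n=4000):
    """Trapezoid-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h


def f_x2_via_lotp(x2):
    """Law of Total Probability: integrate f(x2 | t) f_X1(t) over t."""
    return integrate(lambda t: f_x2_given_x1(x2, t) * f_x1(t))


def f_x1_given_x2_via_bayes(x1, x2):
    """Bayes' Rule with the denominator expanded by the Law of Total Probability."""
    return f_x2_given_x1(x2, x1) * f_x1(x1) / f_x2_via_lotp(x2)


x1, x2 = 0.3, 1.2  # arbitrary evaluation point
closed_form = normal_pdf(x1, RHO * x2, 1.0 - RHO ** 2)  # known f(x1 | x2)

print("f_X2(x2) via LoTP :", f_x2_via_lotp(x2))
print("f_X2(x2) exact    :", normal_pdf(x2))
print("f(x1|x2) via Bayes:", f_x1_given_x2_via_bayes(x1, x2))
print("f(x1|x2) exact    :", closed_form)
```

With vector-valued $X_1$ the same logic applies, but the single integral becomes a $K_1$-fold integral, so a one-dimensional trapezoid rule would no longer be enough.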