MIT 14.30, Fall 2005. Raymond Guiteras. Handout on the multivariate Law of Total Probability and Bayes' Rule

In the following, let X = (X_1, X_2) be a random vector, where X_1 is a random K_1-vector and X_2 is a random K_2-vector. As usual, we write the random vector in uppercase and its value in lowercase.

1 Multivariate Law of Total Probability

Recall the original Law of Total Probability: given a partition B_1, ..., B_n and an event A, we can calculate the probability of A as a weighted average of the conditional probabilities of A given each event in the partition, with the probabilities of the partitioning events as the weights:

    P(A) = P(A \mid B_1) \, P(B_1) + \cdots + P(A \mid B_n) \, P(B_n)    (1)

Let's apply this same idea to random vectors. We know that to obtain a marginal density, we integrate the other variables out of the joint distribution:

    f_{X_1}(x_1) = \int_{x_2 \in \mathbb{R}^{K_2}} f_{X_1, X_2}(x_1, x_2) \, dx_2    (2)

(Note that this is a K_2-fold integral. Why?) From the definition of conditional probability, we know that

    f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)}    (3)

and therefore

    f_{X_1, X_2}(x_1, x_2) = f_{X_1 \mid X_2}(x_1 \mid x_2) \, f_{X_2}(x_2)    (4)

This allows us to rewrite (2) as

    f_{X_1}(x_1) = \int_{x_2 \in \mathbb{R}^{K_2}} f_{X_1 \mid X_2}(x_1 \mid x_2) \, f_{X_2}(x_2) \, dx_2    (5)

This is the Law of Total Probability for random vectors. Note how closely it resembles (1), our original LoTP.
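As a numerical sanity check of (5), here is a minimal sketch with K_1 = K_2 = 1. The specific model (X_2 standard normal, X_1 conditionally normal around x_2) is an illustrative assumption, not taken from the handout; it is chosen because the marginal of X_1 is then known in closed form, N(0, 2), so we can compare the integral against it.

```python
import math

def norm_pdf(x, mu, sigma):
    # Univariate normal density
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Illustrative model (an assumption for this example):
# X2 ~ N(0, 1) and X1 | X2 = x2 ~ N(x2, 1), so marginally X1 ~ N(0, 2).
def marginal_x1(x1, n=4001, lo=-10.0, hi=10.0):
    # Riemann sum approximating (5): f_{X1}(x1) = \int f_{X1|X2}(x1|x2) f_{X2}(x2) dx2
    dx = (hi - lo) / (n - 1)
    return sum(norm_pdf(x1, lo + i * dx, 1.0) * norm_pdf(lo + i * dx, 0.0, 1.0) * dx
               for i in range(n))

exact = norm_pdf(0.7, 0.0, math.sqrt(2.0))  # closed-form N(0, 2) density
print(abs(marginal_x1(0.7) - exact) < 1e-6)  # True
```

Integrating the conditional density against the density of the conditioning variable recovers the marginal, exactly as (5) states.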

2 Bayes' Rule for Multivariate Distributions

Again, let's review Bayes' Rule for events:

    P(B_i \mid A) = \frac{P(A \mid B_i) \, P(B_i)}{P(A)}    (6)
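A quick worked instance of (6), using (1) to compute the denominator. The two-urn setup and its numbers are made up for illustration:

```python
# Illustrative two-urn example (numbers are assumptions for this sketch):
# urn B1 is chosen with probability 0.3 and yields a red ball with probability 0.8;
# urn B2 is chosen with probability 0.7 and yields a red ball with probability 0.2.
# A = "a red ball is drawn".
p_B = {"B1": 0.3, "B2": 0.7}          # prior P(B_i)
p_A_given_B = {"B1": 0.8, "B2": 0.2}  # likelihood P(A | B_i)

# Law of Total Probability (1): P(A) = sum_i P(A | B_i) P(B_i)
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)

# Bayes' Rule (6): P(B1 | A) = P(A | B1) P(B1) / P(A)
p_B1_given_A = p_A_given_B["B1"] * p_B["B1"] / p_A
print(round(p_A, 4), round(p_B1_given_A, 4))  # 0.38 0.6316
```

Seeing a red ball raises the probability of urn B1 from the prior 0.3 to roughly 0.63.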

We can expand the denominator by using (1) to obtain

    P(B_i \mid A) = \frac{P(A \mid B_i) \, P(B_i)}{P(A \mid B_1) \, P(B_1) + \cdots + P(A \mid B_n) \, P(B_n)}    (7)

For all these conditional probabilities to be well-defined, of course, we have to assume that P(B_i) > 0 \forall i. Again, let's apply the logic of events to multivariate distributions and see what we get. Suppose we're interested in

    f_{X_1 \mid X_2}(x_1 \mid x_2)    (8)

As above, by definition we have

    f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)}    (9)

To rewrite the numerator, just observe that

    f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_1}(x_1)}    (10)

by definition. We can multiply through to obtain

    f_{X_1, X_2}(x_1, x_2) = f_{X_2 \mid X_1}(x_2 \mid x_1) \, f_{X_1}(x_1)    (11)

Substituting this into the numerator of (9), we have

    f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1) \, f_{X_1}(x_1)}{f_{X_2}(x_2)}    (12)

This is Bayes' Rule for multivariate distributions. It may be convenient to apply the multivariate Law of Total Probability (5) to rewrite the denominator, obtaining¹

    f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1) \, f_{X_1}(x_1)}{\int_{\tilde{x}_1 \in \mathbb{R}^{K_1}} f_{X_1, X_2}(\tilde{x}_1, x_2) \, d\tilde{x}_1}
                                   = \frac{f_{X_2 \mid X_1}(x_2 \mid x_1) \, f_{X_1}(x_1)}{\int_{\tilde{x}_1 \in \mathbb{R}^{K_1}} f_{X_2 \mid X_1}(x_2 \mid \tilde{x}_1) \, f_{X_1}(\tilde{x}_1) \, d\tilde{x}_1}    (13)

Again, note how closely this resembles (7) above.
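A numerical sanity check of (13), again with an illustrative normal model chosen as an assumption for this sketch: with prior X_1 ~ N(0, 1) and likelihood X_2 | X_1 = x_1 ~ N(x_1, 1), the posterior is known in closed form to be X_1 | X_2 = x_2 ~ N(x_2/2, 1/2), so we can compare the Bayes' Rule computation against it.

```python
import math

def norm_pdf(x, mu, sigma):
    # Univariate normal density
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Illustrative model (an assumption for this example):
# X1 ~ N(0, 1), and X2 | X1 = x1 ~ N(x1, 1).
def posterior(x1, x2, n=4001, lo=-10.0, hi=10.0):
    # Denominator of (13): f_{X2}(x2) = \int f_{X2|X1}(x2|t) f_{X1}(t) dt,
    # approximated by a Riemann sum over the dummy variable t (the x~1 of the text)
    dt = (hi - lo) / (n - 1)
    denom = sum(norm_pdf(x2, lo + i * dt, 1.0) * norm_pdf(lo + i * dt, 0.0, 1.0) * dt
                for i in range(n))
    # Numerator of (13): f_{X2|X1}(x2|x1) f_{X1}(x1)
    return norm_pdf(x2, x1, 1.0) * norm_pdf(x1, 0.0, 1.0) / denom

# Known closed form for this model: X1 | X2 = x2 ~ N(x2/2, 1/2)
exact = norm_pdf(0.3, 0.5, math.sqrt(0.5))
print(abs(posterior(0.3, 1.0) - exact) < 1e-6)  # True
```

The dummy variable t plays the role of x̃_1 in (13): it is integrated out in the denominator, while the numerator is evaluated at the particular point x_1 of interest.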

¹ The tilde over x̃_1 is meant to remind you that you are integrating over all possible values of x̃_1, i.e., to distinguish the variable of integration from the value x_1 at which you are evaluating the conditional density.
