STA111 - Lecture 3: Review, Total Probability, and Bayes’ Rule

First, we will review permutations, combinations, and conditional probability. Please check the previous set of notes for definitions and examples.

1 Law of total probability and Bayes’ rule

Let’s start the section with a simple result that can be derived from the axioms of probability. If A and B are events, P(A) = P(A ∩ B) + P(A ∩ Bᶜ). Drawing a Venn diagram helps here and I will draw one in Lecture (draw one yourself, it’s good practice!). A more formal argument would be: P(A) = P((A ∩ B) ∪ (A ∩ Bᶜ)) = P(A ∩ B) + P(A ∩ Bᶜ), where the last equality holds because A ∩ B and A ∩ Bᶜ are disjoint.
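As a quick sanity check, we can verify the identity by brute-force enumeration on a small finite sample space. The die-roll events below are hypothetical choices for illustration; any pair of events would work.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}    # "the roll is even"
B = {1, 2, 3}    # "the roll is at most 3"
B_c = omega - B  # the complement of B

# Verify P(A) = P(A ∩ B) + P(A ∩ Bᶜ).
assert prob(A) == prob(A & B) + prob(A & B_c)
print(prob(A))  # 1/2
```

Using `Fraction` keeps every probability exact, so the equality check is a true test rather than a floating-point approximation.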

Law of total probability: The previous result can be extended to a more general case. If A1, A2, ..., An are disjoint events (Ai ∩ Aj = ∅ for i ≠ j) and A1 ∪ A2 ∪ ... ∪ An = Ω (i.e. if A1, ..., An is a partition of Ω), then

P (B) = P (B ∩ A1) + P (B ∩ A2) + ... + P (B ∩ An) = P (A1)P (B | A1) + ... + P (An)P (B | An). This formula looks daunting, but it can be interpreted as a weighted average. As usual, drawing a Venn diagram helps, as we saw/will see in Lecture.

Examples:

• Let’s go back to the example where Bobby has 3 bags of beans (Lecture 2). We can compute the probability that he picks a red bean (without knowing the bag from which he draws the bean) directly as 1/2, since he’s equally likely to select each bag and there are 40 red beans and 40 green beans in total. Alternatively, we can use the law of total probability. Let R be the event “the bean is red”, B1 be the event “Bobby draws the bean from bag 1”, B2 be the event “Bobby draws the bean from bag 2”, and B3 be the event “Bobby draws the bean from bag 3”. Note that B1 ∪ B2 ∪ B3 = Ω and they’re disjoint sets, so B1, B2, B3 constitute a partition of Ω. By the law of total probability:

P(R) = P(B1)P(R | B1) + P(B2)P(R | B2) + P(B3)P(R | B3) = (1/3)·(1/2) + (1/3)·(25/30) + (1/3)·(5/30) = 1/2.

• Suppose that in country S, 40% of the people support party A, 30% of the people support party B, 20% support party C, and 10% support party D. Let Q be a certain policy. We’re given that 50% of the supporters of party A are in favor of Q, 40% of the supporters of party B are in favor of Q, 30% of the supporters of party C are in favor of Q, and 100% of the supporters of party D are in favor of Q. If we draw a citizen from this imaginary country at random, what is the probability that the citizen supports Q? Let Q denote the event “is in favor of policy Q”, A be the event “supports party A” (and so on for the rest of the parties). We can find P(Q) using the law of total probability:

P (Q) = P (A)P (Q | A) + P (B)P (Q | B) + P (C)P (Q | C) + P (D)P (Q | D) = 0.4 · 0.5 + 0.3 · 0.4 + 0.2 · 0.3 + 0.1 · 1 = 0.48.
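The weighted-average interpretation is easy to see in code: P(Q) is the sum of each party’s share times that party’s support for Q. A minimal sketch, using the numbers from the example above:

```python
# Party support shares P(party) and conditional support P(Q | party),
# taken from the example in the text.
support = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
favor_q = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 1.0}

# Law of total probability: P(Q) = Σ P(party) · P(Q | party).
p_q = sum(support[p] * favor_q[p] for p in support)
print(round(p_q, 2))  # 0.48
```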

Bayes’ Rule: Suppose that A1, A2, ..., An is a partition of Ω and suppose B is an event. Then, for any j ∈ {1, 2, ..., n},

P(Aj | B) = P(Aj)P(B | Aj) / [P(A1)P(B | A1) + ... + P(An)P(B | An)].

This formula can be derived by combining the definition of conditional probability and the law of total probability.
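The rule translates directly into a small helper: divide each prior-times-likelihood term by their sum. This is a generic sketch, not tied to any particular example; the function name and the usage numbers are my own choices.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(Aj | B) for each j, given priors P(Aj) and likelihoods P(B | Aj).

    The denominator is the law of total probability: P(B) = Σ P(Ai) · P(B | Ai).
    """
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical usage: two equally likely hypotheses, one of which
# makes the observed event B nine times more likely than the other.
posterior = bayes_posterior([0.5, 0.5], [0.9, 0.1])
print([round(p, 3) for p in posterior])  # [0.9, 0.1]
```

Note that the posteriors always sum to 1 by construction, since each term is divided by the total.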

Examples:

• Bobby has drawn a bean and it turns out that it is red (in the recurring example where he has 3 bags and draws a bean from one of them; please go back to page 3 in Lecture 2 to see the details if you don’t remember). Let’s compute the probability that it comes from bag 1, bag 2, or bag 3, and find the “most probable bag”. Intuitively, it should be bag 2 because it is the one that has the highest proportion of red jelly beans, and he’s equally likely to draw a bean from each bag. Let’s confirm our intuition:

P(B1 | R) = P(B1)P(R | B1) / [P(B1)P(R | B1) + P(B2)P(R | B2) + P(B3)P(R | B3)] = ((1/3)·(1/2)) / ((1/3)·(1/2) + (1/3)·(25/30) + (1/3)·(5/30)) = 1/3,

P(B2 | R) = P(B2)P(R | B2) / [P(B1)P(R | B1) + P(B2)P(R | B2) + P(B3)P(R | B3)] = ((1/3)·(25/30)) / ((1/3)·(1/2) + (1/3)·(25/30) + (1/3)·(5/30)) = 5/9,

P(B3 | R) = P(B3)P(R | B3) / [P(B1)P(R | B1) + P(B2)P(R | B2) + P(B3)P(R | B3)] = ((1/3)·(5/30)) / ((1/3)·(1/2) + (1/3)·(25/30) + (1/3)·(5/30)) = 1/9,

so there’s over a 50% chance that the red bean comes from bag 2. Note that we could’ve found the third probability as P(B3 | R) = 1 − P(B1 | R) − P(B2 | R) (we know that the bean has to come from one of the 3 bags; there is no other possibility). In general, conditional probabilities behave like regular probabilities. All the properties you showed for probabilities apply to conditional probabilities (P(Ω | B) = 1, P(Aᶜ | B) = 1 − P(A | B), etc.).
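We can check these three posteriors exactly with fractions, using the priors and likelihoods from the bean example:

```python
from fractions import Fraction as F

priors = [F(1, 3)] * 3                         # each bag is equally likely
likelihoods = [F(1, 2), F(25, 30), F(5, 30)]   # P(R | B1), P(R | B2), P(R | B3)

# Denominator of Bayes' rule: P(R), by the law of total probability.
p_r = sum(p * l for p, l in zip(priors, likelihoods))

posteriors = [p * l / p_r for p, l in zip(priors, likelihoods)]
print(p_r)         # 1/2
print(posteriors)  # [Fraction(1, 3), Fraction(5, 9), Fraction(1, 9)]
assert sum(posteriors) == 1
```

Exact arithmetic also makes the shortcut visible: once P(B1 | R) and P(B2 | R) are known, P(B3 | R) is forced to be whatever remains to reach 1.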

• Suppose that our friend Robbie is a citizen of the imaginary country S we introduced before. We know he doesn’t support policy Q because his Facebook status is “I really dislike Q :(”. Which party does he support? First note that P(Qᶜ) = 1 − P(Q) = 1 − 0.48 = 0.52. Then:

P(A | Qᶜ) = P(A)P(Qᶜ | A) / P(Qᶜ) = 0.4 · 0.5 / 0.52 ≈ 0.385,

P(B | Qᶜ) = P(B)P(Qᶜ | B) / P(Qᶜ) = 0.3 · 0.6 / 0.52 ≈ 0.346,

P(C | Qᶜ) = P(C)P(Qᶜ | C) / P(Qᶜ) = 0.2 · 0.7 / 0.52 ≈ 0.269,

P(D | Qᶜ) = P(D)P(Qᶜ | D) / P(Qᶜ) = 0.1 · 0 / 0.52 = 0,

so it’s pretty hard to tell! We know for sure that he doesn’t support party D, which makes sense: 100% of the supporters of D are in favor of Q, so he couldn’t be supporting D.
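The same computation in code, using the party numbers from the example (P(Qᶜ | party) is just one minus each party’s support for Q):

```python
# Party shares and P(Qᶜ | party), from the country-S example.
support = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
against_q = {"A": 0.5, "B": 0.6, "C": 0.7, "D": 0.0}

# P(Qᶜ) via the law of total probability.
p_qc = sum(support[p] * against_q[p] for p in support)

# Bayes' rule for each party.
posterior = {p: support[p] * against_q[p] / p_qc for p in support}
for party, val in posterior.items():
    print(party, round(val, 3))  # A 0.385, B 0.346, C 0.269, D 0.0
```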

• Some spam filters use Bayes’ rule to compute the probability that a message is spam given the words it contains. Let S be the event “the message is spam” and C the event “the message contains the string of words ‘you won a prize!’”. Then, the Bayes filter would compute

P(S | C) = P(S)P(C | S) / P(C),

where P(S) is the probability that a “random” message is spam, P(C) is the probability that a message contains the string “you won a prize!”, and P(C | S) is the probability that a spam message contains the string “you won a prize!”. Clearly, P(C | S) is way greater than P(C), so P(S | C) will be pretty close to 1. There are many spam filters that essentially do this. If you’re interested, you can read more if you click on this link.
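To make this concrete, here is a toy calculation with made-up numbers (the probabilities below are hypothetical, chosen only to illustrate the mechanics; real filters estimate them from data):

```python
# Hypothetical inputs, not from real data:
p_spam = 0.3          # P(S): prior probability that a message is spam
p_phrase_spam = 0.2   # P(C | S): spam messages containing "you won a prize!"
p_phrase_ham = 0.001  # P(C | Sᶜ): legitimate messages containing the phrase

# P(C) via the law of total probability, then Bayes' rule for P(S | C).
p_phrase = p_spam * p_phrase_spam + (1 - p_spam) * p_phrase_ham
p_spam_given_phrase = p_spam * p_phrase_spam / p_phrase
print(round(p_spam_given_phrase, 3))  # 0.988
```

Even with a modest prior P(S) = 0.3, the posterior is close to 1, because the phrase is so much more common in spam than in legitimate mail.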

• I will give more examples in Lecture, but I don’t want to include them here because I don’t want you to know the answers in advance!

Bayes’ rule is one of the core formulas in the course, so make sure you’re familiar with it and know how to apply it.
