CSE 321 Discrete Structures
Winter 2008
Lecture 19: Probability Theory

Announcements
• Readings
  – Probability Theory
    • 6.1, 6.2 (5.1, 5.2) Probability Theory
    • 6.3 (New material!) Bayes' Theorem
    • 6.4 (5.3) Expectation
  – Advanced Counting Techniques – Ch. 7
    • Not covered
Discrete Probability
Experiment: Procedure that yields an outcome
Sample space: Set of all possible outcomes
Event: subset of the sample space
If S is a sample space of equally likely outcomes and E is an event, then the probability of E is p(E) = |E| / |S|
Example: Dice

Example: Poker
Probability of 4 of a kind
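A rough sketch in Python (mine, not from the slides) of the four-of-a-kind calculation, using p(E) = |E|/|S| and the product rule to count the favorable hands:

# Sketch (Python, assuming a standard 52-card deck and 5-card hands).
from math import comb

total_hands = comb(52, 5)            # |S|: all 5-card hands
# |E|: pick the rank of the four of a kind (13 ways), take all 4 suits of it,
# then pick any one of the remaining 48 cards as the fifth card.
four_of_a_kind = 13 * comb(4, 4) * 48

print(four_of_a_kind, total_hands, four_of_a_kind / total_hands)
# 624 / 2598960, roughly 0.00024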
Combinations of Events
E^C is the complement of E
P(E^C) = 1 – P(E)
P(E1 ∪ E2) = P(E1) + P(E2) – P(E1 ∩ E2)

Probability Concepts
• Probability Distribution
• Conditional Probability
• Independence
• Bernoulli Trials / Binomial Distribution
• Random Variable
Discrete Probability Theory
• Set S
• Probability distribution p : S → [0,1]
  – For s ∈ S, 0 ≤ p(s) ≤ 1
  – Σ_{s∈S} p(s) = 1
• Event E, E ⊆ S
• p(E) = Σ_{s∈E} p(s)
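As a concrete illustration of this definition, a short Python sketch with a made-up loaded die (my example, not one from the lecture), computing p(E) = Σ_{s∈E} p(s) for a non-uniform distribution:

# Sketch (Python): a loaded six-sided die as a distribution p : S -> [0,1].
# The probabilities are invented for illustration.
p = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

assert abs(sum(p.values()) - 1.0) < 1e-9   # the p(s) sum to 1

E = {2, 4, 6}                              # event: the roll is even
p_E = sum(p[s] for s in E)                 # p(E) = sum of p(s) over s in E
print(p_E)                                 # 0.7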
Conditional Probability
Let E and F be events with p(F) > 0. The conditional probability of E given F, denoted p(E | F), is defined as:
p(E | F) = p(E ∩ F) / p(F)
Independence
The events E and F are independent if and only if p(E ∩ F) = p(E) p(F)
Equivalently (when p(F) > 0), E and F are independent if and only if p(E | F) = p(E)

Are these independent? (the two-dice cases are checked in the sketch after this list)
• Flip a coin three times
  – E: the first coin is a head
  – F: the second coin is a head
• Roll two dice
  – E: the sum of the two dice is 5
  – F: the first die is a 1
• Roll two dice
  – E: the sum of the two dice is 7
  – F: the first die is a 1
• Deal two five card poker hands
  – E: hand one has four of a kind
  – F: hand two has four of a kind
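A small Python sketch (mine, not the lecture's) checking the two-dice cases by enumerating the 36 equally likely outcomes and comparing p(E ∩ F) with p(E)p(F):

# Sketch (Python): test independence for the two-dice examples by enumeration.
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))          # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for o in S if event(o)), len(S))

F = lambda o: o[0] == 1                           # first die is a 1
for target in (5, 7):
    E = lambda o: o[0] + o[1] == target           # sum of the two dice is target
    independent = prob(lambda o: E(o) and F(o)) == prob(E) * prob(F)
    print(target, independent)
# prints: 5 False (not independent), 7 True (independent)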
Bernoulli Trials and Binomial Distribution
• Bernoulli Trial
  – Success probability p, failure probability q
The probability of exactly k successes in n independent Bernoulli trials is
C(n, k) p^k q^(n–k)
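The binomial formula can be evaluated directly; a short Python sketch, with example numbers of my own choosing:

# Sketch (Python): probability of exactly k successes in n Bernoulli trials.
from math import comb

def binomial_prob(n, k, p):
    q = 1 - p                              # failure probability
    return comb(n, k) * p**k * q**(n - k)

# Example (made up): exactly 3 heads in 10 flips of a fair coin.
print(binomial_prob(10, 3, 0.5))           # C(10,3) / 2**10, about 0.117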
Random Variables
A random variable is a function from a sample space to the real numbers.

Bayes' Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then
p(F | E) = p(E | F) p(F) / ( p(E | F) p(F) + p(E | F^C) p(F^C) )

False Positives, False Negatives
Let D be the event that a person has the disease
Let Y be the event that a person tests positive for the disease
Testing for disease: P(D | Y)
Disease is very rare: p(D) = 1/100,000
Testing is accurate:
  – False negative rate: 1%
  – False positive rate: 0.5%
Suppose you get a positive result. What do you conclude?
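Plugging these numbers into Bayes' theorem, with D = "has the disease" and Y = "tests positive", shows how small P(D | Y) is despite the accurate test; a Python sketch:

# Sketch (Python): P(D | Y) for the rare-disease example above.
p_D = 1 / 100_000           # prior: person has the disease
p_Y_given_D = 0.99          # 1% false negative rate -> 99% true positive rate
p_Y_given_notD = 0.005      # 0.5% false positive rate

p_Y = p_Y_given_D * p_D + p_Y_given_notD * (1 - p_D)    # total probability
p_D_given_Y = p_Y_given_D * p_D / p_Y                   # Bayes' theorem
print(p_D_given_Y)           # about 0.002, i.e. roughly a 0.2% chance of disease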
Spam Filtering
• Classification domain
  – Cost of false negative
  – Cost of false positive
• Criteria for spam
  – v1agra, ONE HUNDRED MILLION USD
• Basic question: given an email message, based on spam criteria, what is the probability it is spam?

Bayesian Spam Filters
From: Zambia Nation Farmers Union [[email protected]]
Subject: Letter of assistance for school installation
To: Richard Anderson
Dear Richard, I hope you are fine, Iam through talking to local headmen about the possible assistance of school installation. the idea is and will be welcome. I trust that you will do your best as i await for more from you. Once again Thanking you very much
Sebastian Mazuba.
Proving Bayes' Theorem

Email message with phrase "Account Review"
• 250 of 20000 messages known to be spam contain the phrase
• 5 of 10000 messages known not to be spam contain the phrase
• Assuming 50% of messages are spam, what is the probability that a message with "Account Review" is spam?
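With these counts, the same Bayes calculation gives the spam probability for a message containing "Account Review"; a Python sketch:

# Sketch (Python): P(spam | "Account Review") using the counts above
# and a 50% prior that a message is spam.
p_phrase_given_spam = 250 / 20000       # 0.0125
p_phrase_given_ham = 5 / 10000          # 0.0005
p_spam = 0.5

p_phrase = p_phrase_given_spam * p_spam + p_phrase_given_ham * (1 - p_spam)
p_spam_given_phrase = p_phrase_given_spam * p_spam / p_phrase
print(p_spam_given_phrase)              # about 0.962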
Expectation
The expected value of random variable X(s) on sample space S is:
E(X) = Σ_{s∈S} p(s) X(s)

Flip a coin until the first head
Expected number of flips?
Probability Space: the outcome "first head on flip k" has probability (1/2)^k, for k = 1, 2, 3, . . .
Computing the expectation: E = Σ_{k≥1} k (1/2)^k = 2
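A Python sketch (mine, not from the slides) that approximates the series numerically and also estimates the same expectation by simulating coin flips:

# Sketch (Python): expected number of flips until the first head.
import random

print(sum(k * 0.5**k for k in range(1, 200)))    # partial sums converge to 2

def flips_until_head():
    count = 0
    while True:
        count += 1
        if random.random() < 0.5:                # heads
            return count

trials = 100_000
print(sum(flips_until_head() for _ in range(trials)) / trials)   # about 2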
Linearity of Expectation
E(X1 + X2) = E(X1) + E(X2)
E(aX) = aE(X)

Hashing
H : M → [0 .. n–1]
If k elements have been hashed to random locations, what is the expected number of elements in bucket j?
What is the expected number of collisions when hashing k elements to random locations?
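Linearity of expectation gives k/n expected elements per bucket and C(k,2)/n expected collisions, since each of the C(k,2) pairs collides with probability 1/n. A simulation sketch in Python (with parameter values of my own choosing) that checks both answers empirically:

# Sketch (Python): hash k elements to n random buckets; compare observed
# averages with k/n elements per bucket and C(k,2)/n colliding pairs.
import random
from math import comb

n, k, trials = 10, 20, 10_000               # made-up parameters
bucket0_total = 0
collision_total = 0
for _ in range(trials):
    buckets = [random.randrange(n) for _ in range(k)]    # random locations
    bucket0_total += buckets.count(0)                    # elements in bucket 0
    counts = [buckets.count(j) for j in range(n)]
    collision_total += sum(comb(c, 2) for c in counts)   # colliding pairs

print(bucket0_total / trials, k / n)              # both about 2.0
print(collision_total / trials, comb(k, 2) / n)   # both about 19.0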
Hashing analysis
Sample space: [0..n–1] × [0..n–1] × . . . × [0..n–1]
Random variables:
  Xj = number of elements hashed to bucket j
  C = total number of collisions
  Bij = 1 if element i is hashed to bucket j; Bij = 0 if element i is not hashed to bucket j
  Cab = 1 if element a is hashed to the same bucket as element b; Cab = 0 if element a is hashed to a different bucket than element b

Counting inversions
Let p1, p2, . . . , pn be a permutation of 1 . . . n
(pi, pj) is an inversion if i < j and pi > pj
  4, 2, 5, 1, 3
  1, 6, 4, 3, 2, 5
  7, 6, 5, 4, 3, 2, 1

Expected number of inversions
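By linearity of expectation, each of the C(n,2) pairs (i, j) with i < j is inverted with probability 1/2 in a random permutation, so the expected number of inversions is n(n–1)/4. A Python sketch (mine) that counts inversions in the example permutations and checks the expectation by simulation:

# Sketch (Python): count inversions; estimate the expectation by simulation.
import random

def count_inversions(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

print(count_inversions([4, 2, 5, 1, 3]))           # 6
print(count_inversions([7, 6, 5, 4, 3, 2, 1]))     # 21 = 7*6/2, every pair inverted

n, trials = 8, 20_000
total = 0
for _ in range(trials):
    p = list(range(1, n + 1))
    random.shuffle(p)
    total += count_inversions(p)
print(total / trials, n * (n - 1) / 4)             # both about 14.0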
Insertion sort for a random permutation
  4 2 5 1 3

for i := 1 to n-1 {
    j := i;
    while (j > 0 and A[j-1] > A[j]) {
        swap(A[j-1], A[j]);
        j := j - 1;
    }
}
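Each swap in the inner loop removes exactly one inversion, so the number of swaps insertion sort performs equals the number of inversions in its input. A Python translation of the pseudocode above (a sketch, not the lecture's code) that checks this on the example:

# Sketch (Python): insertion sort counting swaps; the count equals the
# number of inversions in the original array.
def insertion_sort_swaps(A):
    A = list(A)
    swaps = 0
    for i in range(1, len(A)):
        j = i
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]    # swap adjacent elements
            swaps += 1
            j -= 1
    return swaps

def count_inversions(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

A = [4, 2, 5, 1, 3]                             # example from the slide
print(insertion_sort_swaps(A), count_inversions(A))    # both are 6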
Expected number of swaps for Insertion Sort

Left to right maxima
max_so_far := A[0];
for i := 1 to n-1
    if (A[i] > max_so_far)
        max_so_far := A[i];
5, 2, 9, 14, 11, 18, 7, 16, 1, 20, 3, 19, 10, 15, 4, 6, 17, 12, 8
What is the expected number of left-to-right maxima in a random permutation?
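The i-th element (1-indexed) of a random permutation is a left-to-right maximum exactly when it is the largest of the first i elements, which happens with probability 1/i, so by linearity of expectation the expected count is the harmonic number H_n = 1 + 1/2 + . . . + 1/n. A Python sketch (mine) that counts the maxima in the example above and checks the expectation by simulation:

# Sketch (Python): count left-to-right maxima; compare the simulated
# average with the harmonic number H_n.
import random

def left_to_right_maxima(A):
    count = 0
    max_so_far = float('-inf')
    for x in A:
        if x > max_so_far:                 # a new left-to-right maximum
            max_so_far = x
            count += 1
    return count

print(left_to_right_maxima([5, 2, 9, 14, 11, 18, 7, 16, 1, 20,
                            3, 19, 10, 15, 4, 6, 17, 12, 8]))   # 5

n, trials = 20, 20_000
total = 0
for _ in range(trials):
    p = list(range(1, n + 1))
    random.shuffle(p)
    total += left_to_right_maxima(p)
H_n = sum(1 / i for i in range(1, n + 1))
print(total / trials, H_n)                 # both about 3.6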