Announcements

• Readings – Theory CSE 321 Discrete Structures • 6.1, 6.2 (5.1, 5.2) • 6.3 (New material!) Bayes’ Theorem • 6.4 (5.3) Expectation Winter 2008 – Advanced Counting Techniques – Ch 7. Lecture 19 •Not covered Probability Theory

Discrete Probability Example: Dice

Experiment: Procedure that yields an

Sample space: Set of all possible outcomes

Event: subset of the

S a sample space of equally likely outcomes, E an event, the probability of E, p(E) = |E|/|S|

Example: Poker Combinations of Events

Probability of 4 of a kind EC is the complement of E

P(EC) = 1 – P(E)

P(E1∪ E2) = P(E1) + P(E2) – P(E1∩ E2) Combinations of Events Probability Concepts

EC is the complement of E • P(EC) = 1 – P(E) • Independence • Bernoulli Trials / P(E1∪ E2) = P(E1) + P(E2) – P(E1∩ E2)

Discrete Probability Theory Examples

•Set S • Probability distribution p : S → [0,1] –For s ∈ S, 0 ≤ p(s) ≤ 1

– Σs∈ S p(s) = 1

• Event E, E⊆ S

• p(E) = Σs∈ Ep(s)

Conditional Probability Examples

Let E and F be events with p(F) > 0. The conditional probability of E given F, defined by p(E | F), is defined as: Independence Are these independent?

• Flip a coin three times The events E and F are independent if and only if – E: the first coin is a head – F: the second coin is a head p(E∩ F) = p(E)p(F) • Roll two dice – E: the sum of the two dice is 5 – F: the first die is a 1 • Roll two dice – E: the sum of the two dice is 7 E and F are independent if and only if – F: the first die is a 1 p(E | F) = p(E) • Deal two five card poker hands – E: hand one has four of a kind – F: hand two has four of a kind

Bernoulli Trials and Binomial Random Variables Distribution • Bernoulli Trial A random variable is a function from – probability p, failure probability q a sample space to the real numbers The probability of exactly k successes in n independent Bernoulli trials is

False Positives, False Bayes’ Theorem Negatives Suppose that E and F are events from a sample Let D be the event that a person has the disease space S such that p(E) > 0 and p(F) > 0. Then Let Y be the event that a person tests positive for the disease Testing for disease P(D|Y)

Disease is very rare: p(D) = 1/100,000

Testing is accurate: False negative: 1% False positive: 0.5%

Suppose you get a positive result, what do you conclude?

Spam Filtering Bayesian Spam filters

• Classification domain From: Zambia Nation Farmers Union [[email protected]] Subject: Letter of assistance for school installation – Cost of false negative To: Richard Anderson – Cost of false positive Dear Richard, I hope you are fine, Iam through talking to local headmen about the possible • Criteria for spam assistance of school installation. the idea is and will be welcome. – v1agra, ONE HUNDRED MILLION USD I trust that you will do your best as i await for more from you. Once again Thanking you very much • Basic question: given an email message, Sebastian Mazuba. based on spam criteria, what is the probability it is spam

Email message with phrase Proving Bayes’ Theorem “Account Review” • 250 of 20000 messages known to be spam • 5 of 10000 messages known not to be spam • Assuming 50% of messages are spam, what is the probability that a message with “Account Review” is spam Flip a coin until the first head Expectation Expected number of flips? The of random variable X(s) on sample space S is: :

Computing the expectation:

Linearity of Expectation Hashing

E(X1 + X2) = E(X1) + E(X2) H: M → [0..n-1]

E(aX) = aE(X)

If k elements have been hashed to random locations, what is the expected number of elements in bucket j?

What is the expected number of collisions when hashing k elements to random locations?

Hashing analysis Counting inversions

Sample space: [0..n-1] × [0..n-1] × . . . × [0..n-1] Let p1, p2, . . . , pn be a permutation of 1 . . . n pi, pj is an inversion if i < j and pi > pj Random Variables

Xj = number of elements hashed to bucket j 4, 2, 5, 1, 3 C = total number of collisions

Bij = 1 if element i hashed to bucket j B = 0 if element i is not hashed to bucket j ij 1, 6, 4, 3, 2, 5

Cab = 1 if element a is hashed to the same bucket as element b Cab = 0 if element a is hashed to a different bucket than element b 7, 6, 5, 4, 3, 2, 1 Expected number of inversions Insertion sort for a random permutation 4 2 5 1 3

for i :=1 to n-1{ j := i; while (j > 0 and A[ j - 1 ] > A[ j ]){ swap(A[ j -1], A[ j ]); j := j – 1; } }

Expected number of swaps for Left to right maxima Insertion Sort

max_so_far := A[0]; for i := 1 to n-1 if (A[ i ] > max_so_far) max_so_far := A[ i ];

5, 2, 9, 14, 11, 18, 7, 16, 1, 20, 3, 19, 10, 15, 4, 6, 17, 12, 8

What is the expected number of left-to- right maxima in a random permutation