Basic Probability Theory
TU Eindhoven Advanced Algorithms (2IL45) — Course Notes

Events. Probability theory is about experiments that can have different outcomes. The possible outcomes are called the elementary events, and the sample space is the set of all elementary events. A subset of the sample space is an event.¹ (Note that if the subset is a singleton, then the event is an elementary event.) We call the set of all events defined by a sample space S the event space defined by S, and we denote it by Events(S).

As an example, consider the experiment where we flip a coin two times. One possible outcome of the experiment is HT: we first get heads (H) and then tails (T). The sample space is the set {HH, HT, TH, TT}. The subset {HT, TH, TT} is one of the 16 events defined by this sample space, namely "the event that we get at least one T".

Probability distributions. A sample space S comes with a probability distribution, which is a mapping Pr : Events(S) → R such that

1. Pr[A] ≥ 0 for all events A ∈ Events(S).
2. Pr[S] = 1.
3. Pr[A ∪ B] = Pr[A] + Pr[B] for any two events A, B ∈ Events(S) that are mutually exclusive (that is, events A, B such that A ∩ B = ∅).

Another way of looking at this is that we assign non-negative probabilities to the elementary events (which sum to 1), and that the probability of an event is the sum of the probabilities of the elementary events it is composed of.

Note that property 3 only talks about mutually exclusive events. However, for any n events A1, ..., An (no matter whether the events are mutually exclusive) we have the union bound

Pr[A1 ∪ ··· ∪ An] ≤ Pr[A1] + ··· + Pr[An].

It is not necessarily the case that all elementary events have the same probability. For example, in the example above it could be that Pr[HH] = 1/16, Pr[HT] = Pr[TH] = 3/16, and Pr[TT] = 9/16. Note, however, that the probabilities of all the elementary events always sum to 1; this follows from conditions 1–3 above.
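The axioms and the union bound can be checked numerically on the biased-coin example from the text; the dictionary `elem` and the helper `pr` are ad hoc names introduced only for this sketch:

```python
# Elementary-event probabilities of the biased-coin example:
# Pr[HH] = 1/16, Pr[HT] = Pr[TH] = 3/16, Pr[TT] = 9/16.
elem = {"HH": 1 / 16, "HT": 3 / 16, "TH": 3 / 16, "TT": 9 / 16}

def pr(event):
    """Probability of an event = sum over its elementary events."""
    return sum(elem[a] for a in event)

# Axiom 2: the whole sample space S has probability 1.
assert abs(pr(elem.keys()) - 1.0) < 1e-9

# The event "at least one T" = {HT, TH, TT}.
print(pr({"HT", "TH", "TT"}))  # 0.9375, i.e. 15/16

# Union bound: Pr[A ∪ B] <= Pr[A] + Pr[B] for arbitrary events A, B
# (here A and B overlap in TT, so the inequality is strict).
A, B = {"HT", "TT"}, {"TH", "TT"}
assert pr(A | B) <= pr(A) + pr(B) + 1e-12
```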
If all elementary events have the same probability—thus for finite S we have Pr[A] = 1/|S| for all elementary events A ∈ S—then Pr is called the uniform distribution.

Conditional probabilities. The probability that an event happens may change if you already know that some other event has taken place. This leads to the following concept: the conditional probability of an event A given that another event B occurs is defined to be

Pr[A | B] := Pr[A ∩ B] / Pr[B],

where we require Pr[B] ≠ 0. Two events A and B are called independent if Pr[A ∩ B] = Pr[A] · Pr[B]. If Pr[B] ≠ 0 this implies Pr[A | B] = Pr[A], which explains the name: the probability that A happens is independent of whether B has happened or not.

¹ In general not all subsets need to be events, but we shall only deal with situations where every subset is an event.

Random variables and their expected value. A random variable is a function that assigns a real number to each elementary event. (Note that it does not assign a number to a non-elementary event.) We will be working with discrete random variables: random variables whose possible values come from a discrete set (usually N or Z). In the coin-flip example, for instance, we could define a random variable X that denotes the number of heads, so X(HH) = 2, X(TH) = X(HT) = 1, and X(TT) = 0.

For a random variable X and a number x we can now define the event X = x to be {A ∈ S : X(A) = x}. In other words, X = x is the collection of elementary events A such that X(A) = x. Thus we have

Pr[X = x] = Σ_{A ∈ S : X(A) = x} Pr[A].

The expected value of a random variable X is defined as

E[X] := Σ_x x · Pr[X = x],

where the sum is taken over all possible values x that the random variable X can take.

Lemma 1 (Markov inequality) Let X be a random variable taking only non-negative values, and let µ = E[X] be its expectation. Then for any t > 0 we have

Pr[X ≥ t · µ] ≤ 1/t.
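The definitions above—conditional probability, the event X = x, the expected value, and the Markov inequality—can all be verified on the biased-coin example; `cond_pr` and `pr_X_eq` are hypothetical helper names used only in this sketch:

```python
# Biased-coin distribution from the text; X = number of heads.
elem = {"HH": 1 / 16, "HT": 3 / 16, "TH": 3 / 16, "TT": 9 / 16}
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}

def pr(event):
    """Probability of an event = sum over its elementary events."""
    return sum(elem[a] for a in event)

def cond_pr(A, B):
    """Pr[A | B] = Pr[A ∩ B] / Pr[B], requiring Pr[B] != 0."""
    return pr(A & B) / pr(B)

def pr_X_eq(x):
    """Pr[X = x] = sum of Pr[A] over elementary events A with X(A) = x."""
    return pr({a for a in elem if X[a] == x})

# E[X] = sum over x of x * Pr[X = x].
mu = sum(x * pr_X_eq(x) for x in set(X.values()))
print(mu)  # 2*(1/16) + 1*(6/16) + 0*(9/16) = 0.5

# Markov: Pr[X >= t*mu] <= 1/t for non-negative X and any t > 0.
t = 2.0
tail = pr({a for a in elem if X[a] >= t * mu})
assert tail <= 1 / t + 1e-12  # 7/16 <= 1/2
```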
Often several random variables are defined over the same sample space S. We have the following important property.

Lemma 2 (Linearity of expectation) E[X + Y] = E[X] + E[Y] for any two random variables X, Y. Moreover, for any constant c we have E[cX] = c · E[X].

For multiplication a similar statement holds only if the random variables are independent. (X and Y are independent if for all x and y the events X = x and Y = y are independent.) So we have E[XY] = E[X] · E[Y] for any two independent random variables X, Y.

Bernoulli and Poisson trials. A Bernoulli trial is an experiment with two possible outcomes: 0 (= failure) or 1 (= success). If the probability of success is p, then the expected number of trials until a successful experiment takes place is 1/p. For example, if we have a fair die then the probability of throwing, say, 5 is 1/6, so the expected number of throws until 5 comes up is 6. With a fair coin, the expected number of coin flips to get heads is 2.

If we have a sequence of experiments with two possible outcomes, 0 or 1, but the success probability is different for each experiment, then these are called Poisson trials. The following result is often useful to obtain high-probability bounds for randomized algorithms.

Lemma 3 (Tail estimates for Poisson trials) Suppose we do n Poisson trials. Let Xi denote the outcome of the i-th trial and let pi = Pr[Xi = 1], where 0 < pi < 1. Let X = Σ_{i=1}^{n} Xi be a random variable that indicates the total number of successes in the n trials, and let µ = E[X] = Σ_{i=1}^{n} pi be the expected value of X (that is, the expected number of successful experiments). Then for any δ > 0

Pr[X > (1 + δ)µ] ≤ ( e^δ / (1 + δ)^{1+δ} )^µ.

Thus the probability of deviating by more than some constant factor from the expected value is exponentially small in µ. For example, for δ = 2 we have e^δ/(1 + δ)^{1+δ} < 1/2, so we get

Pr[X > 3µ] ≤ (1/2)^µ.
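Two of the claims above can be sanity-checked with a short sketch: the base e^δ/(1 + δ)^{1+δ} of the tail estimate is indeed below 1/2 for δ = 2, and a simulated fair die needs about 6 throws on average until a 5 appears. The function name and the simulation parameters (seed, number of repetitions) are arbitrary choices for this illustration:

```python
import math
import random

# Claim from the text: for delta = 2,
# e^delta / (1 + delta)^(1 + delta) = e^2 / 27 < 1/2.
delta = 2
base = math.e**delta / (1 + delta) ** (1 + delta)
print(base)  # about 0.274, indeed < 1/2
assert base < 0.5

# Bernoulli-trial claim: with success probability p = 1/6,
# the expected number of trials until the first success is 1/p = 6.
random.seed(0)

def throws_until_five():
    """Throw a fair die until a 5 comes up; return the number of throws."""
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 5:
            return n

avg = sum(throws_until_five() for _ in range(100_000)) / 100_000
print(avg)  # close to 6
```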