
Definition A random variable X on a sample space Ω is a real-valued function on Ω; that is, X : Ω → R. A discrete random variable is a random variable that takes on only a finite or countably infinite number of values.

For a discrete random variable X and a real value a, the event “X = a” denotes the set {s ∈ Ω : X(s) = a}, and

$\Pr(X = a) = \sum_{s \in \Omega : X(s) = a} \Pr(s).$

Independence

Definition Two random variables X and Y are independent if and only if

Pr((X = x) ∩ (Y = y)) = Pr(X = x) · Pr(Y = y) for all values x and y. Similarly, random variables X1, X2, ..., Xk are mutually independent if and only if for any subset I ⊆ [1, k] and any values xi, i ∈ I,

$\Pr\left(\bigcap_{i \in I} X_i = x_i\right) = \prod_{i \in I} \Pr(X_i = x_i).$

Expectation

Definition The expectation of a discrete random variable X, denoted by E[X], is given by

$E[X] = \sum_{i} i \Pr(X = i),$

where the summation is over all values in the range of X. The expectation is finite if $\sum_{i} |i| \Pr(X = i)$ converges; otherwise, the expectation is unbounded.

The expectation (or mean or average) is a weighted sum over all possible values of the random variable.

Median

Definition The median of a random variable X is a value m such that

Pr(X < m) ≤ 1/2 and Pr(X > m) < 1/2.

Linearity of Expectation

Theorem For any two random variables X and Y,

E[X + Y ] = E[X ] + E[Y ].

Lemma For any constant c and discrete random variable X,

E[cX] = cE[X].

Bernoulli Random Variable

A Bernoulli or an indicator random variable:

$Y = \begin{cases} 1 & \text{if the experiment succeeds,} \\ 0 & \text{otherwise.} \end{cases}$

Let Pr(Y = 1) = p.

E[Y] = p · 1 + (1 − p) · 0 = p = Pr(Y = 1).

Binomial Random Variable

Definition A binomial random variable X with parameters n and p, denoted by B(n, p), is defined by the following distribution on j = 0, 1, 2,..., n:

$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n-j}.$

Expectation of a Binomial Random Variable

$$
\begin{aligned}
E[X] &= \sum_{j=0}^{n} j \binom{n}{j} p^j (1-p)^{n-j} \\
&= \sum_{j=0}^{n} j \frac{n!}{j!(n-j)!} p^j (1-p)^{n-j} \\
&= \sum_{j=1}^{n} \frac{n!}{(j-1)!(n-j)!} p^j (1-p)^{n-j} \\
&= np \sum_{j=1}^{n} \frac{(n-1)!}{(j-1)!((n-1)-(j-1))!} p^{j-1} (1-p)^{(n-1)-(j-1)} \\
&= np \sum_{k=0}^{n-1} \frac{(n-1)!}{k!((n-1)-k)!} p^k (1-p)^{(n-1)-k} \\
&= np \sum_{k=0}^{n-1} \binom{n-1}{k} p^k (1-p)^{(n-1)-k} = np.
\end{aligned}
$$

X represents the number of successes in n independent experiments, each with success probability p. Let

$X_i = \begin{cases} 1 & \text{if the } i\text{-th experiment succeeds,} \\ 0 & \text{otherwise,} \end{cases}$

so Xi is the indicator random variable for the i-th experiment. Using linearity of expectations,

" n # n X X E[X ] = E Xi = E[Xi ] = np. i=1 i=1 Example: Finding the k-Smallest Element


Example: Finding the k-Smallest Element

Procedure Order(S, k);
Input: A set S, an integer k ≤ |S| = n.
Output: The k-smallest element in the set S.

1 If |S| = k = 1 return S.
2 Choose a random element y uniformly from S.

3 Compare all elements of S to y. Let S1 = {x ∈ S | x ≤ y} and S2 = {x ∈ S | x > y}.

4 If k ≤ |S1| return Order(S1, k) else return Order(S2, k − |S1|).
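A direct Python transcription of the procedure may help; this is a sketch that assumes the elements of S are distinct and that k is 1-based, as in the pseudocode above:

```python
import random

def order(S, k):
    """Return the k-th smallest element of S (1 <= k <= len(S), elements distinct)."""
    if len(S) == 1:                      # then necessarily k == 1
        return S[0]
    y = random.choice(S)                 # step 2: uniform random element of S
    S1 = [x for x in S if x <= y]        # step 3: elements <= y (y included)
    S2 = [x for x in S if x > y]         #         elements  > y
    if k <= len(S1):                     # step 4
        return order(S1, k)
    return order(S2, k - len(S1))

if __name__ == "__main__":
    data = random.sample(range(1000), 25)
    assert order(data, 7) == sorted(data)[6]   # 7th smallest
```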

Theorem The algorithm returns the k-smallest element in S performing O(n) comparisons in expectation.

Proof
• We say that a call to Order(S, k) was successful if the random element y was in the middle 1/3 of the set S. A call is successful with probability 1/3.
• After the i-th successful call, the size of the set S is bounded by n(2/3)^i. Hence there are at most log_{3/2} n successful calls.
• Let X be the total number of comparisons. Let Xi be the number of comparisons between the i-th successful call (included) and the (i+1)-th (excluded): $E[X] \le \sum_{i=0}^{\log_{3/2} n} E[X_i]$.
• Let Yi (i > 0) be the number of calls between the i-th successful call and the (i+1)-th successful call. The number of calls from one successful call to the next (included) is geometric with parameter 1/3, so E[Yi] = 3 for all i.
• Therefore E[Xi] ≤ n(2/3)^i E[Yi] = 3n(2/3)^i.
• Expected number of comparisons:

$E[X] \le 3n \sum_{j=0}^{\log_{3/2} n} \left(\frac{2}{3}\right)^j \le 9n.$

Conditional expectation

Definition Let Y and Z be random variables. The conditional expectation of Y given that Z = z is defined as

$E[Y \mid Z = z] = \sum_{y} y \Pr(Y = y \mid Z = z),$

where the summation is over all y in the range of Y.

We can also consider the expectation of Y conditioned on the event E:

$E[Y \mid E] = \sum_{y} y \Pr(Y = y \mid E).$

Lemma For any random variables X and Y,

$E[X] = \sum_{y} \Pr(Y = y) \, E[X \mid Y = y],$

where the sum is over the range of Y and all of the expectations exist.

Linearity of expectations also holds for conditional expectations: Lemma

For any finite collection of discrete random variables X1, X2, ..., Xn with finite expectations and for any random variable Y,

" n # n X X E Xi |Y = y = E[Xi |Y = y]. i=1 i=1 Conditional expectation

Note that for given random variables Y, Z the expression E[Y|Z] is itself a random variable f(Z) that takes the value E[Y|Z = z] when Z = z.

Theorem For given random variables Y, Z we have E[Y] = E[E[Y|Z]].

Proof. By definition,

$E[E[Y \mid Z]] = \sum_{z} E[Y \mid Z = z] \Pr(Z = z) = E[Y],$

where the last equality is the lemma above.
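As a toy illustration (the example is mine, not from the notes): let Z be the value of a fair die and, given Z = z, let Y be binomial with parameters z and 1/2. Then E[Y | Z] = Z/2, so E[E[Y | Z]] = E[Z]/2 = 1.75, and a short simulation confirms this also estimates E[Y]:

```python
import random

def sample_y_and_z():
    """Draw Z uniformly from {1, ..., 6}, then Y ~ Binomial(Z, 1/2) given Z."""
    z = random.randint(1, 6)
    y = sum(1 for _ in range(z) if random.random() < 0.5)
    return y, z

def main(trials: int = 200_000) -> None:
    sum_y, sum_cond = 0.0, 0.0
    for _ in range(trials):
        y, z = sample_y_and_z()
        sum_y += y
        sum_cond += z / 2                # E[Y | Z = z] = z / 2
    print("E[Y]        ~", sum_y / trials)     # ~ 1.75
    print("E[E[Y | Z]] ~", sum_cond / trials)  # ~ 1.75

if __name__ == "__main__":
    main()
```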

Application of conditional expectation

Consider a process S that, each time it is started, will recursively spawn new copies of the process S, where the number of new copies is a binomial random variable with parameters n and p. We also assume that these random variables are independent for each call to S. Now assume that we call the process S from a program once. What is the expected number of copies of S generated by this call?

To analyse this we introduce generations:
• The initial process S is (the only element) in generation 0.
• A (copy of) process S is in generation i if it was spawned by another process S in generation i − 1.

• Denote by Yi the number of processes in generation i.

• Then Y0 = 1.

As the number of processes spawned by the first process S is binomially distributed with parameters n, p, we have E[Y1] = np. Let us assume that there are y_{i−1} copies of process S in generation i − 1, and let Zk denote the number of processes spawned by the k-th process in generation i − 1, for 1 ≤ k ≤ y_{i−1}. Each Zk is binomially distributed with parameters n, p and hence has expectation E[Zk] = np.

"yi−1 # X E[Yi |Yi−1 = yi−1] = E Zk |Yi−1 = yi−1 k=1 yi−1 X = E[Zk |Yi−1 = yi−1] k=1 yi−1 n X X = jPr(Zk = j|Yi−1 = yi−1) k=1 j=0 yi−1 n X X = jPr(Zk = j) k=1 j=0

$$
\begin{aligned}
&= \sum_{k=1}^{y_{i-1}} E[Z_k] \\
&= y_{i-1} \, np.
\end{aligned}
$$

Now we can calculate the expected size of the ith generation inductively. We have

$E[Y_i] = E[E[Y_i \mid Y_{i-1}]] = E[Y_{i-1} \, np] = np \, E[Y_{i-1}],$

so by induction and Y0 = 1 we get E[Yi] = (np)^i.

The expected number of copies of S generated by the first call is given by

$E\left[\sum_{i \ge 0} Y_i\right] = \sum_{i \ge 0} E[Y_i] = \sum_{i \ge 0} (np)^i.$

This is unbounded if np ≥ 1 and equals 1/(1 − np) when np < 1. Hence the expected number of copies spawned by the first call is finite if and only if the expected number of copies spawned by each copy is less than 1.
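A small simulation sketch of this branching process (the parameter values and the truncation at 60 generations are my choices) estimates $\sum_{i \ge 0} Y_i$ and compares it with 1/(1 − np):

```python
import random

def binomial(n: int, p: float) -> int:
    """Number of new copies spawned by a single process: Binomial(n, p)."""
    return sum(1 for _ in range(n) if random.random() < p)

def total_copies(n: int, p: float, max_generations: int = 60) -> int:
    """Sum of Y_i over generations i >= 0, truncated after max_generations."""
    total, generation = 1, 1             # Y_0 = 1: the initial process
    for _ in range(max_generations):
        if generation == 0:
            break
        generation = sum(binomial(n, p) for _ in range(generation))
        total += generation
    return total

if __name__ == "__main__":
    n, p, trials = 4, 0.2, 20_000        # np = 0.8 < 1, so the expectation is finite
    est = sum(total_copies(n, p) for _ in range(trials)) / trials
    print("empirical sum of Y_i ~", est)               # ~ 1 / (1 - np) = 5.0
    print("1 / (1 - np)         =", 1 / (1 - n * p))
```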

The Geometric Distribution

Definition A geometric random variable X with parameter p is given by the following probability distribution on n = 1, 2, ...:

$\Pr(X = n) = (1 - p)^{n-1} p.$

Example: repeatedly draw independent Bernoulli random variables with parameter p > 0 until we get a 1. Let X be the number of trials up to and including the first 1. Then X is a geometric random variable with parameter p.

Memoryless Property

Lemma For a geometric random variable with parameter p and n > 0,

Pr(X = n + k | X > k) = Pr(X = n).

Proof.

$$
\begin{aligned}
\Pr(X = n + k \mid X > k) &= \frac{\Pr((X = n + k) \cap (X > k))}{\Pr(X > k)} \\
&= \frac{\Pr(X = n + k)}{\Pr(X > k)} = \frac{(1-p)^{n+k-1} p}{\sum_{i=k}^{\infty} (1-p)^i p} \\
&= \frac{(1-p)^{n+k-1} p}{(1-p)^k} \\
&= (1-p)^{n-1} p = \Pr(X = n).
\end{aligned}
$$
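A short simulation (the specific values p = 0.3, n = 4, k = 2 are arbitrary choices of mine) estimates both sides of the memoryless identity:

```python
import random

def geometric(p: float) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

def main(p: float = 0.3, n: int = 4, k: int = 2, trials: int = 500_000) -> None:
    samples = [geometric(p) for _ in range(trials)]
    beyond_k = [x for x in samples if x > k]
    lhs = sum(1 for x in beyond_k if x == n + k) / len(beyond_k)  # Pr(X = n+k | X > k)
    rhs = sum(1 for x in samples if x == n) / trials              # Pr(X = n)
    print("Pr(X = n + k | X > k) ~", lhs)   # both ~ (1 - p)^(n-1) * p ≈ 0.103
    print("Pr(X = n)             ~", rhs)

if __name__ == "__main__":
    main()
```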

Memoryless Property – Intuition

I have a coin and flip it a number of times, without telling you either the outcomes or the number of times I flipped it.

I then give the coin to you and ask what the probability is that the first head will come out after you have flipped it n times.

Intuitively, this probability should not depend on the number of times I flip the coin and/or on the corresponding outcomes.

The memoryless property tells us it is indeed so.

Lemma Let X be a discrete random variable that takes on only non-negative integer values. Then

$E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i).$

Proof.

$$
\begin{aligned}
\sum_{i=1}^{\infty} \Pr(X \ge i) &= \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) \\
&= \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) \\
&= \sum_{j=1}^{\infty} j \Pr(X = j) = E[X].
\end{aligned}
$$

The interchange of (possibly) infinite summations is justified because the terms being summed are all non-negative.

For a geometric random variable X with parameter p,

$\Pr(X \ge i) = \sum_{n=i}^{\infty} (1-p)^{n-1} p = p\left(\frac{1}{p} - \frac{1 - (1-p)^{i-1}}{p}\right) = (1-p)^{i-1}.$

$E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} (1-p)^{i-1} = \frac{1}{1 - (1-p)} = \frac{1}{p}.$
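Both expressions for E[X] can be checked numerically; the sketch below truncates the infinite sums at an arbitrary cutoff:

```python
def geometric_pmf(p: float, n: int) -> float:
    """Pr(X = n) = (1 - p)^(n-1) * p for a geometric random variable."""
    return (1 - p) ** (n - 1) * p

def expectation_direct(p: float, cutoff: int = 10_000) -> float:
    """E[X] = sum over n of n * Pr(X = n), truncated at `cutoff`."""
    return sum(n * geometric_pmf(p, n) for n in range(1, cutoff))

def expectation_via_tails(p: float, cutoff: int = 10_000) -> float:
    """E[X] = sum over i of Pr(X >= i) = sum of (1 - p)^(i-1), truncated at `cutoff`."""
    return sum((1 - p) ** (i - 1) for i in range(1, cutoff))

if __name__ == "__main__":
    p = 0.25
    print(expectation_direct(p))     # ~ 4.0
    print(expectation_via_tails(p))  # ~ 4.0 = 1 / p
```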

Example: Coupon Collector's Problem

Suppose that each box of cereal contains a random coupon from a set of n different coupons. How many boxes of cereal do you need to buy before you obtain at least one of every type of coupon? Let X be the number of boxes bought until at least one of every type of coupon is obtained. Let Xi be the number of boxes bought while you had exactly i − 1 different coupons.

$X = \sum_{i=1}^{n} X_i.$

Xi is a geometric random variable with parameter

$p_i = 1 - \frac{i-1}{n}, \qquad E[X_i] = \frac{1}{p_i} = \frac{n}{n-i+1}.$

" n # X E[X ] = E Xi i=1 n X = E[Xi ] i=1 n X n = n − i + 1 i=1 n X 1 = n = n ln n + Θ(n). i i=1 Randomized Algorithms: • Analysis is true for any input. • The is the space of random choices made by the algorithm. • Repeated runs are independent. Probabilistic Analysis: • The sample space is the space of all possible inputs. • If the algorithm is deterministic repeated runs give the same output. Algorithm classification

Randomized Algorithms:
• The analysis is true for any input.
• The sample space is the space of random choices made by the algorithm.
• Repeated runs are independent.

Probabilistic Analysis:
• The sample space is the space of all possible inputs.
• If the algorithm is deterministic, repeated runs give the same output.

Algorithm classification

A Monte Carlo Algorithm is a randomized algorithm that may produce an incorrect solution. For decision problems: a one-sided error Monte Carlo algorithm errs on only one of the possible outputs; otherwise it is a two-sided error algorithm. A Las Vegas algorithm is a randomized algorithm that always produces the correct output. In both types of algorithms the run-time is a random variable.

Example of probabilistic analysis: Quicksort

Procedure Quicksort(S);
Input: A list S = {x1, ..., xn} of n distinct elements (from a totally ordered universe).
Output: The elements of S in sorted order.

1 If |S| ≤ 1 return S.
2 Choose the first element of S as the pivot, called x.

3 Compare all elements of S to x. Let S1 = {y ∈ S | y < x} and S2 = {y ∈ S | y > x}.

4 return Quicksort(S1), x, Quicksort(S2).

(S1 and S2 are lists. If the comparisons are performed “left to right”, the elements in S1 and S2 have the same order as in S.)
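A Python sketch of this Quicksort that also counts element-to-pivot comparisons, so the expectation derived below can be checked empirically on random permutations (the values n = 200 and 500 trials are arbitrary):

```python
import math
import random

def quicksort(S):
    """Sort a list of distinct elements; return (sorted list, number of comparisons)."""
    if len(S) <= 1:
        return list(S), 0
    x = S[0]                                 # pivot: the first element of S
    S1 = [y for y in S[1:] if y < x]         # preserves the input order, as noted above
    S2 = [y for y in S[1:] if y > x]
    left, c1 = quicksort(S1)
    right, c2 = quicksort(S2)
    return left + [x] + right, len(S) - 1 + c1 + c2   # |S| - 1 comparisons against x

if __name__ == "__main__":
    n, trials = 200, 500
    total = 0
    for _ in range(trials):
        perm = random.sample(range(n), n)    # uniformly random permutation
        result, comps = quicksort(perm)
        assert result == sorted(perm)
        total += comps
    harmonic = sum(1 / k for k in range(1, n + 1))
    print("average comparisons     ~", total / trials)
    print("(2n + 2) H_n - 4n       =", (2 * n + 2) * harmonic - 4 * n)  # exact expectation
    print("2 n ln n (leading term) =", 2 * n * math.log(n))
```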

Analysis

Worst-Case: if the input S satisfies x1 > x2 > ··· > xn, Quicksort(S) performs Θ(n^2) comparisons.


Probabilistic Analysis: what happens if S is chosen uniformly at random from all possible permutations of the values?

Theorem If S is chosen uniformly at random from all possible permutations of the values, the expected number of comparisons made by Quicksort(S) is 2n ln n + O(n).

Proof

• Let y1, y2, ..., yn be the same as x1, x2, ..., xn but sorted (in increasing order).
• For i < j: $X_{ij} = \begin{cases} 1 & \text{if } y_i \text{ and } y_j \text{ are compared,} \\ 0 & \text{otherwise.} \end{cases}$
• Let X be the total number of comparisons:

$X = \sum_{i=1}^{n} \sum_{j=i+1}^{n} X_{ij}.$

• By linearity of expectation:

$E[X] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[X_{ij}].$

• Xij is an indicator random variable: E[Xij] = Pr(Xij = 1).

• Pr(Xij = 1) = ?
• Let Y^{ij} = {yi, yi+1, ..., yj−1, yj}. yi and yj are compared if and only if either yi or yj is the first pivot selected from Y^{ij}.

• Since the elements in S1 and S2 are in the same order as in S, the first pivot selected from Y^{ij} is the first element of Y^{ij} in the input list S.
• Since S is chosen at random from all possible permutations of the values, every element in Y^{ij} is equally likely to be first in S: the probability that yi is first in Y^{ij} is 1/(j − i + 1); the same holds for yj.
• Therefore Pr(Xij = 1) = 2/(j − i + 1).

$$
\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \\
&= \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
&= \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k} \\
&= \sum_{k=2}^{n} (n+1-k) \frac{2}{k} \\
&= (n+1) \sum_{k=2}^{n} \frac{2}{k} - 2(n-1) \\
&= (2n+2) \sum_{k=1}^{n} \frac{1}{k} - 4n = 2n \ln n + \Theta(n).
\end{aligned}
$$