
Math Colloquium: Five Lectures on Probability Theory. Part 2: The Law of Large Numbers

Robert Niedzialomski, [email protected]

March 3rd, 2021

Probability - Intuition

Probability Theory = a mathematical framework for modeling/studying non-deterministic behavior, where a source of randomness is introduced (this means that more than one outcome is possible). The space of all possible outcomes is called the sample space, a set of outcomes is called an event, and the source of randomness is called a random variable.

Discrete Probability

A discrete probability space consists of a finite (or countable) set Ω of outcomes ω together with a non-negative real number p_ω assigned to each outcome ω; p_ω is called the probability of the outcome ω. We require ∑_{ω∈Ω} p_ω = 1.

An event is a set of outcomes, i.e., a subset A ⊂ Ω. The probability of an event A is

    P(A) = ∑_{ω∈A} p_ω.

A random variable is a function X mapping the set Ω to the set of real numbers. We write X : Ω → R.

We note that the following Kolmogorov axioms of probability hold true:

P(∅) = 0;
if A_1, A_2, ... are disjoint events, then P(∪_{n=1}^∞ A_n) = ∑_{n=1}^∞ P(A_n).
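These definitions can be made concrete with a small sketch (the dictionary representation and function names here are illustrative choices of mine, not part of the lecture): a finite probability space is a map from outcomes to probabilities, and the probability of an event is the sum over its outcomes.

```python
from fractions import Fraction

# A discrete probability space: each outcome omega gets a probability p_omega.
# Here the space models a single roll of a fair six-sided die.
space = {omega: Fraction(1, 6) for omega in range(1, 7)}

# The probabilities are non-negative and sum to 1, as required.
assert all(p >= 0 for p in space.values())
assert sum(space.values()) == 1

def prob(event):
    """P(A) = sum of p_omega over the outcomes omega in the event A."""
    return sum(space[omega] for omega in event)

# An event is a subset of outcomes, e.g. "the roll is even".
print(prob({2, 4, 6}))  # 1/2
```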

An Example of Rolling a Die Twice

Example (Rolling a Die Twice)

Suppose we roll a fair die twice and we want to model the probability of the sum of the numbers we roll. The sample space is Ω = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}, with probability of each outcome p_ij = 1/36. Let the random variable X represent the number after the first roll and let Y be the random variable that represents the number after the second roll. Hence X(i, j) = i and Y(i, j) = j. Our goal is to study the random variable X + Y. We compute

P(X + Y = 2) = P({(1, 1)}) = 1/36,
P(X + Y = 3) = P({(1, 2), (2, 1)}) = 2 · (1/36) = 1/18.

Example (Rolling a Die Twice, Continued)

We continue our computation of the distribution of the random variable X + Y and obtain

k:   2     3     4     5     6     7     8     9     10    11    12
p_k: 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

We have obtained a new probability space

ΩX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

with probability p_k of each outcome given in the table above. This probability is called the distribution of the random variable X + Y. The expectation, denoted by E[X + Y], of this random variable is the weighted average (mean) of the values k with weights p_k, i.e.,

E[X + Y] = ∑_{k=2}^{12} p_k · k = 7.
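The table and the expectation can be reproduced by brute force over the 36 outcomes (a small Python sketch; exact arithmetic with `Fraction` is a choice of mine, not of the lecture):

```python
from collections import defaultdict
from fractions import Fraction

# Distribution of X + Y for two rolls of a fair die: each of the 36
# outcomes (i, j) has probability 1/36, and we group them by their sum.
dist = defaultdict(Fraction)
for i in range(1, 7):
    for j in range(1, 7):
        dist[i + j] += Fraction(1, 36)

for k in sorted(dist):  # reproduces the table of p_k above
    print(k, dist[k])

# The expectation is the weighted average of the values k with weights p_k.
expectation = sum(p * k for k, p in dist.items())
print(expectation)  # 7
```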

Distribution and Expectation

Let (Ω, (p_ω)) be a discrete probability space and let X be a random variable on Ω. The probability distribution of X is the discrete probability space

Ω_X = the set of values of X = {X(ω) : ω ∈ Ω}

with probability of an outcome k given by

    p_k = P(X = k) = P({ω ∈ Ω : X(ω) = k}).

The expectation E[X] of X, also called the mean, is given by

    E[X] = ∑_{k∈Ω_X} p_k · k.

Remark: the formula for expectation makes sense for a probability distribution defined on the real line, without reference to a random variable.

Theorem

The expectation of a random variable X can be computed according to the formula

    E[X] = ∑_{ω∈Ω} p_ω X(ω).

Proof. We first notice that

    p_k = P({ω ∈ Ω : X(ω) = k}) = ∑_{ω : X(ω)=k} p_ω.

Therefore

    E[X] = ∑_{k∈Ω_X} p_k · k = ∑_{k∈Ω_X} ∑_{ω : X(ω)=k} p_ω X(ω) = ∑_{ω∈Ω} p_ω X(ω).
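The theorem is easy to check numerically on the dice example (an illustrative Python sketch; the helper names are mine): summing p_k · k over the distribution and summing p_ω X(ω) over the outcomes give the same number.

```python
from fractions import Fraction

# Probability space for two rolls of a fair die.
space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def S(omega):
    """The random variable X + Y: the sum of the two rolls."""
    i, j = omega
    return i + j

# E[X + Y] via the distribution: sum over values k of p_k * k.
values = {S(omega) for omega in space}
by_distribution = sum(
    k * sum(p for omega, p in space.items() if S(omega) == k) for k in values
)

# E[X + Y] via the outcomes: sum over omega of p_omega * (X + Y)(omega).
by_outcomes = sum(p * S(omega) for omega, p in space.items())

assert by_distribution == by_outcomes == 7
```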

The theorem gives us the following properties of expectation: For two random variables X and Y we have

E[X + Y ] = E[X ] + E[Y ].

For a random variable X and a real number c we have

E[cX] = cE[X].

We say that a random variable X has zero mean if E[X ] = 0.

Bernoulli Distribution

Suppose we flip a biased coin with

probability of heads = p and probability of tails = q = 1 − p

The probability space is Ω = {H, T } with pH = p and pT = q.

Let X be the random variable that assigns the value 0 to tails and value 1 to heads. This means that X (T ) = 0 and X (H) = 1.

The probability distribution is ΩX = {0, 1} with p1 = p and p0 = q. This distribution is called the Bernoulli distribution. Its expectation is

E[X ] = 0 · p0 + 1 · p1 = p.

Binomial Distribution

Suppose we flip a biased coin n times. What is the probability of getting Heads k times?

The sample space is Ω = {(x_1, x_2, ..., x_n) : x_j = 0, 1}, where 0 represents tails and 1 represents heads. The probability of an outcome (x_1, x_2, ..., x_n) is p^(number of 1’s) · q^(number of 0’s).

Let S_n be the random variable that represents the number of Heads in n flips of the coin. We need to find P(S_n = k). We see that

    P(S_n = k) = (number of outcomes with k ones) · p^k q^(n−k).

Since the number of outcomes with k ones equals the number of k-element subsets of an n-element set, which is the binomial coefficient C(n, k), we have

    P(S_n = k) = C(n, k) p^k q^(n−k).
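The formula can be sketched in a few lines (illustrative Python; `math.comb` computes the binomial coefficient, and the parameter values are arbitrary choices of mine):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k, p):
    """P(S_n = k) = C(n, k) p^k q^(n-k), where q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

n, p = 10, Fraction(1, 3)

# The probabilities over k = 0, ..., n sum to 1, which is the binomial
# theorem applied with a = p and b = q.
assert sum(binomial_pmf(n, k, p) for k in range(n + 1)) == 1

print(binomial_pmf(n, 3, p))
```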

The distribution of the random variable S_n is

probability space: {0, 1, ..., n}
probability distribution: p_k = P(S_n = k) = C(n, k) p^k q^(n−k).

To find the expectation we need to compute

    E[S_n] = ∑_{k=0}^{n} k · C(n, k) p^k q^(n−k).

This requires the use of the Binomial Theorem, which says the following. For any real numbers a and b and any positive integer n we have

    (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^(n−k).

Let X_1, X_2, ..., X_n be the random variables representing the 1st, 2nd, ..., n-th flip of the coin. If we wanted to be precise, we would write

Xj (x1,..., xn) = xj

Each random variable X_j, where j = 1, 2, ..., n, has the Bernoulli distribution. Hence E[X_1] = E[X_2] = ... = E[X_n] = p. Moreover, we see that

S_n = X_1 + X_2 + ... + X_n. Therefore E[S_n] = E[X_1] + E[X_2] + ... + E[X_n] = np.

What happens if we keep flipping the coin, record the number of Heads, and take the average by dividing by the number of flips? In other words, we want to study

    lim_{n→∞} (X_1 + X_2 + ... + X_n)/n = lim_{n→∞} S_n/n
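This limit can be explored by simulation (a hypothetical sketch: the bias p = 0.3, the flip counts, and the fixed seed are arbitrary choices of mine, not values from the lecture):

```python
import random

random.seed(0)
p = 0.3  # probability of Heads (an arbitrary choice for illustration)

# Flip the coin repeatedly and watch the running average S_n / n.
heads = 0
for n in range(1, 100_001):
    heads += random.random() < p  # one Bernoulli(p) flip
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, heads / n)  # the average drifts toward p as n grows
```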

Law of Large Numbers

Law of Averages: Suppose we repeat an experiment independently n times. Then

    (# of successes in n trials)/n → P(success).

Law of Large Numbers: Let the random variable X_i model the i-th trial of the experiment. This means that P(X_i = 1) = P(success) = p and P(X_i = 0) = P(failure) = q = 1 − p. Then the random variables X_1, X_2, ... are independent and identically distributed (i.i.d.) with the Bernoulli distribution, and

    (X_1 + X_2 + ... + X_n)/n → E[X_1] = P(success) = p.

Theorem (Bernoulli, 1692)

It is the case that S_n/n converges to p as n → ∞, in the sense that for any ε > 0,

    P(p − ε ≤ S_n/n ≤ p + ε) → 1 when n → ∞.

Proof: Let ε > 0. Then

    P(S_n/n ≥ p + ε) = ∑_{k ≥ n(p+ε)} P(S_n = k) = ∑_{k=⌈n(p+ε)⌉}^{n} C(n, k) p^k q^(n−k).

Let λ > 0. Then 0 ≤ λ[k − n(p + ε)] = −λnε + λqk − λp(n − k) for the indices k in the sum, so e^{λ[k−n(p+ε)]} ≥ 1 there, and

    P(S_n/n ≥ p + ε) ≤ ∑_{k=⌈n(p+ε)⌉}^{n} e^{λ[k−n(p+ε)]} C(n, k) p^k q^(n−k)
                     ≤ e^{−λnε} ∑_{k=0}^{n} C(n, k) (p e^{λq})^k (q e^{−λp})^(n−k) = e^{−λnε} (p e^{λq} + q e^{−λp})^n.

We will now use the inequality saying that

    e^x ≤ x + e^{x²}, where x is any real number.

Then

    P(S_n/n ≥ p + ε) ≤ e^{−λnε} (p e^{λq} + q e^{−λp})^n
                     ≤ e^{−λnε} (pλq + p e^{λ²q²} − qλp + q e^{λ²p²})^n
                     ≤ e^{−λnε} (p e^{λ²} + q e^{λ²})^n
                     = e^{λ²n − λnε}.

The minimum of the function λ ↦ λ²n − λεn = nλ(λ − ε) occurs when λ = ε/2. We get that

    P(S_n/n ≥ p + ε) ≤ e^{−nε²/4}.

A symmetric argument gives the same bound for P(S_n/n ≤ p − ε), and this finishes the proof of the theorem.
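The bound can be compared with the exact tail probability (an illustrative Python check; the function name and the values of n, p, ε are arbitrary choices of mine):

```python
from math import ceil, comb, exp

def upper_tail(n, p, eps):
    """Exact P(S_n / n >= p + eps) for S_n ~ Binomial(n, p)."""
    q = 1 - p
    k0 = ceil(n * (p + eps))
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k0, n + 1))

n, p, eps = 100, 0.5, 0.1
bound = exp(-n * eps**2 / 4)
# The exact tail probability sits below the proved bound e^{-n eps^2 / 4}.
print(upper_tail(n, p, eps), bound)
```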

Weak Law of Large Numbers

Theorem (Weak Law of Large Numbers)

Let X_1, X_2, ... be a sequence of independent identically distributed (i.i.d.) random variables with E[X_1] = µ < ∞ and E[|X_1|²] < ∞. Then the sequence

    S_n/n = (X_1 + X_2 + ... + X_n)/n

converges to µ in the following two ways:

in probability: this means that for any ε > 0,

    P(|S_n/n − µ| ≤ ε) → 1 when n → ∞.

in the L² norm: this means that

    E[|S_n/n − µ|²] → 0 when n → ∞.
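For Bernoulli trials the L² statement can be verified exactly, since E[|S_n/n − p|²] is the variance pq/n of the sample mean (a sketch in exact arithmetic; the function name and parameter values are mine):

```python
from fractions import Fraction
from math import comb

def mean_square_error(n, p):
    """E[(S_n / n - p)^2], computed exactly from the binomial distribution."""
    q = 1 - p
    return sum(
        comb(n, k) * p**k * q**(n - k) * (Fraction(k, n) - p) ** 2
        for k in range(n + 1)
    )

p = Fraction(1, 4)
# The mean square error equals pq/n, so it tends to 0 as n grows.
for n in (10, 50, 200):
    assert mean_square_error(n, p) == p * (1 - p) / n
```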

Strong Law of Large Numbers

Theorem (Strong Law of Large Numbers)

Let X_1, X_2, ... be a sequence of independent identically distributed (i.i.d.) random variables with finite mean E[X_1] = µ < ∞. Then the sequence

    S_n/n = (X_1 + X_2 + ... + X_n)/n

converges to µ almost surely.

References

Walsh, John B. Knowing the Odds: An Introduction to Probability. Graduate Studies in Mathematics, 139. American Mathematical Society, Providence, RI, 2012.

Khoshnevisan, Davar; Rassoul-Agha, Firas. Introduction to Probability, lecture notes available online at www.math.utah.edu/~davar/math5010/summer2012/Lectures.pdf

Grimmett, Geoffrey R.; Stirzaker, David R. Probability and Random Processes. Third edition. Oxford University Press, New York, 2001.
