
Binomial Approximation and Joint Distributions
Chris Piech
CS109, Stanford University

Review

• X is a Normal random variable: X ~ N(µ, σ²)
§ Probability Density Function (PDF):
  f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²)),  where −∞ < x < ∞
§ E[X] = µ
§ Var(X) = σ²
§ Also called “Gaussian”
§ Note: f(x) is symmetric about µ

Simplicity is Humble

* A Gaussian maximizes entropy for a given mean and variance.

Density vs Cumulative

CDF of a Normal: F(x) = P(X ≤ x)
PDF of a Normal: f(x), the derivative of the CDF

Probability Density Function

For X ~ N(µ, σ²):

f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))

§ f(x): probability density at x
§ (x − µ)²: squared distance to the mean
§ σ: shows up twice
§ √(2π): a constant
§ e: “exponential”

Cumulative Density Function

Φ: the CDF of the Standard Normal, N(0, 1), a function that has been solved for numerically.
The cumulative density function (CDF) of any normal: F(x) = Φ((x − µ) / σ)

Table of Φ(z) values in textbook, p. 201, and handout.
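Φ has no closed form, hence the table; in code it can be evaluated with the standard library's error function. A minimal sketch (not part of the original slides):

```python
from math import erf, sqrt

def phi(z):
    # CDF of the standard normal, written in terms of the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    # F(x) = phi((x - mu) / sigma) for X ~ N(mu, sigma^2)
    return phi((x - mu) / sigma)

print(phi(0))   # 0.5, by symmetry
```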

Great questions!

68% Rule: does it hold only for Gaussians?

What is the probability that a normal random variable X ~ N(µ, σ²) has a value within one standard deviation of its mean?

P(µ − σ ≤ X ≤ µ + σ) = F(µ + σ) − F(µ − σ) = Φ(1) − Φ(−1) ≈ 0.68

Counterexample: the Uniform, X ~ Uni(α, β)

Var(X) = (β − α)² / 12,  so  σ = √Var(X) = (β − α) / √12

P(µ − σ ≤ X ≤ µ + σ) = 2σ · (1 / (β − α)) = 2 / √12 ≈ 0.58
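A quick simulation confirms the 68% rule fails for the uniform; a sketch assuming Uni(0, 1) (the fraction is the same for any α, β):

```python
import random
from math import sqrt

random.seed(109)  # arbitrary seed for reproducibility
alpha, beta = 0.0, 1.0
mu = (alpha + beta) / 2            # mean of Uni(alpha, beta)
sigma = (beta - alpha) / sqrt(12)  # its standard deviation

# Fraction of samples within one standard deviation of the mean
n = 100000
hits = sum(1 for _ in range(n)
           if mu - sigma <= random.uniform(alpha, beta) <= mu + sigma)
print(hits / n)   # about 2 / sqrt(12) ≈ 0.58, not 0.68
```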

End Review

ELO Ratings

What is the probability that the Warriors beat the Cavs? How do you model zero-sum games?

ELO Ratings

How it works:
• Each team has an “ELO” score S, calculated based on their past performance.
• Each game, the team has ability A ~ N(S, 200²)
• The team with the higher sampled ability wins.

Arpad Elo

A_C ~ N(1555, 200²),  A_W ~ N(1797, 200²)

[Plot: the two ability densities, centered at 1555 (Cavs) and 1797 (Warriors)]

P(Warriors win) = P(A_W > A_C) ≈ 0.80

ELO Ratings

from random import gauss

NTRIALS = 10000
WARRIORS_ELO = 1797
CAVS_ELO = 1555
STD = 200

nSuccess = 0
for i in range(NTRIALS):
    w = gauss(WARRIORS_ELO, STD)
    c = gauss(CAVS_ELO, STD)
    if w > c:
        nSuccess += 1
print(nSuccess / NTRIALS)
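Since the difference of independent normals is itself normal, the simulated win probability above can also be computed in closed form. A sketch (statistics.NormalDist needs Python 3.8+):

```python
from math import sqrt
from statistics import NormalDist

# A_W - A_C ~ N(1797 - 1555, 200^2 + 200^2) for independent abilities
diff = NormalDist(mu=1797 - 1555, sigma=sqrt(200**2 + 200**2))

# P(Warriors win) = P(A_W - A_C > 0) ≈ 0.80 with these parameters
p_warriors = 1 - diff.cdf(0)
print(p_warriors)
```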


How does Python sample from a Gaussian?

from random import gauss

for i in range(10):
    mean = 5
    std = 1
    sample = gauss(mean, std)
    print(sample)

How does this work?

3.79317794179
5.19104589315
4.209360629
5.39633891584
7.10044176511
6.72655475942
5.51485158841
4.94570606131
6.14724644482
4.73774184354

How Does a Computer Sample Normal?

Inverse Transform Sampling

[Plot: Φ(x), the CDF of the Standard Normal; y runs between 0 and 1, x roughly between −5 and 5]

Step 1: pick a uniform number y between 0 and 1.
Step 2: find the x such that Φ(x) = y, i.e., x = Φ⁻¹(y).

Further reading: Box–Muller transform

Continuous RV Relative Probability

X = time to finish pset 3, with X ~ N(10, 2)

P(X = 10) / P(X = 5) ≈ (ε · f(X = 10)) / (ε · f(X = 5)) = f(X = 10) / f(X = 5)
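The two inverse-transform steps (pick y ~ Uni(0, 1), then solve Φ(x) = y) can be sketched in Python; NormalDist.inv_cdf (Python 3.8+) plays the role of Φ⁻¹:

```python
import random
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def sample_normal():
    y = random.random()            # Step 1: y ~ Uni(0, 1)
    return std_normal.inv_cdf(y)   # Step 2: x such that Phi(x) = y

print(sample_normal())
```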

f(X = 10) / f(X = 5) = [ (1/√(4π)) e^(−(10 − 10)² / 4) ] / [ (1/√(4π)) e^(−(5 − 10)² / 4) ] = e^0 / e^(−25/4) = e^(25/4) ≈ 518

How much more likely are you to complete in 10 hours than in 5? About 518 times.
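The 518× figure can be reproduced directly from the density; a small sketch with X ~ N(10, σ² = 2):

```python
from math import exp, sqrt, pi

def normal_pdf(x, mu, var):
    # f(x) for X ~ N(mu, var)
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

# Relative likelihood of finishing in 10 hours vs. 5 hours
ratio = normal_pdf(10, 10, 2) / normal_pdf(5, 10, 2)
print(round(ratio))   # e^(25/4) ≈ 518
```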

Imagine you are sitting a test…

Website Testing

• 100 people are given a new website design
§ X = # people whose time on site increases
§ CEO will endorse the new design if X ≥ 65
What is P(CEO endorses change | it has no effect)?
§ X ~ Bin(100, 0.5). Want to calculate P(X ≥ 65)
§ Give an exact answer:
  P(X ≥ 65) = Σ_{i=65}^{100} (100 choose i) (0.5)^i (1 − 0.5)^(100 − i)

Normal Approximates Binomial

There is a deep reason for the Binomial/Normal similarity…

Let’s invent an approximation!

Website Testing


What is variance?

E[X] = 50

Website Testing


np = 50,  np(1 − p) = 25,  √(np(1 − p)) = 5

§ Use Normal approximation: Y ~ N(50, 25)

P(Y ≥ 65) = P((Y − 50)/5 ≥ (65 − 50)/5) = P(Z ≥ 3) = 1 − Φ(3) ≈ 0.0013

§ Using the Binomial: P(X ≥ 65) ≈ 0.0018

Website Testing

E[X] = 50

If Y (normal) approximates X (binomial):
P(X ≥ 65) ≈ P(Y ≥ 64.5) ≈ 0.0018
What about 64.9?

[Plot: bars at X = 64, 65, 66 under the approximating normal curve]

Continuity Correction

If Y (normal) approximates X (binomial)

Discrete (e.g., Binomial)      Continuous (Normal)
probability question           probability question
X = 6                          5.5 < Y < 6.5
X ≥ 6                          Y > 5.5
X > 6                          Y > 6.5
X < 6                          Y < 5.5
X ≤ 6                          Y < 6.5

* Note: the Binomial is always defined in units of “1”

Comparison when n = 100, p = 0.5
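The effect of the continuity correction on the website example (n = 100, p = 0.5) can be checked numerically; a sketch assuming only the standard library (math.comb needs Python 3.8+):

```python
from math import comb, erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 50 and 5

# Exact Binomial tail P(X >= 65)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(65, n + 1))

plain = 1 - phi((65 - mu) / sigma)         # P(Z > 3)   ≈ 0.0013
corrected = 1 - phi((64.5 - mu) / sigma)   # P(Z > 2.9) ≈ 0.0018
print(exact, plain, corrected)
```

The corrected value lands much closer to the exact Binomial answer than the uncorrected one.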

[Plot: P(X = k) vs. k for Bin(100, 0.5), compared with its normal approximation]

Who Gets to Approximate?

X ~ Bin(n, p)

Normal approx.: n large (> 20), p mid-ranged, np(1 − p) > 10
Poisson approx.: n large (> 20), p small (< 0.05)

If there is a choice, go with the normal approximation.

Stanford Admissions

• Stanford accepts 2050 students this year
§ Each accepted student has an 84% chance of attending
§ X = # students who will attend. X ~ Bin(2050, 0.84)
§ What is P(X > 1745)?

np = 1722,  np(1 − p) ≈ 276,  √(np(1 − p)) ≈ 16.6
§ Use Normal approximation: Y ~ N(1722, 276)
P(X > 1745) ≈ P(Y > 1745.5)
P(Y > 1745.5) = P((Y − 1722)/16.6 > (1745.5 − 1722)/16.6) = P(Z > 1.4) ≈ 0.08

Changes in Stanford Admissions
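As a check on the admissions arithmetic, a sketch of the same continuity-corrected normal approximation:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 2050, 0.84
mu = n * p                      # 1722
sigma = sqrt(n * p * (1 - p))   # ≈ 16.6

# P(X > 1745) ≈ P(Y > 1745.5) with continuity correction
approx = 1 - phi((1745.5 - mu) / sigma)
print(approx)   # ≈ 0.08
```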

Class of 2020 Admit Rates Lowest in University History

“Fewer students were admitted to the Class of 2020 than the Class of 2019, due to the increase in Stanford’s yield rate which has increased over 5 percent in the past four years, according to Colleen Lim M.A. ’80, Director of Undergraduate Admission.”

Yield: 68% ten years ago, 84% last year

Continuous Random Variables

Uniform Random Variable: X ~ Uni(α, β)
All values of x between α and β are equally likely.

Normal Random Variable: X ~ N(µ, σ²)
Aka Gaussian. Defined by mean and variance. Goldilocks distribution.

Exponential Random Variable: X ~ Exp(λ)
Time until an event happens. Parameterized by λ (same as the Poisson).

Beta Random Variable
How mysterious and curious. You must wait a few classes :)

Joint Distributions

Ode to the Thai Café (demo)

Go to this URL: https://goo.gl/KdcTAC

Events occur with other events

Probability Table for Discrete

• States all possible outcomes with several discrete variables
• A probability table is not “parametric”
• If # variables is > 2, you can still have a probability table, but you can’t draw it on a slide

[Table: rows are all values of B (0, 1, 2); columns are all values of A (0, 1, 2); e.g., the cell at A = 1, B = 1 holds P(A = 1, B = 1)]
Every outcome falls into a bucket. Here “,” means “and”.

Discrete Joint Mass Function

• For two discrete random variables X and Y, the Joint Probability Mass Function is:

p_{X,Y}(a, b) = P(X = a, Y = b)

• Marginal distributions:

p_X(a) = P(X = a) = Σ_y p_{X,Y}(a, y)

p_Y(b) = P(Y = b) = Σ_x p_{X,Y}(x, b)

• Example: X = value of die D1, Y = value of die D2
  P(X = 1) = Σ_{y=1}^{6} p_{X,Y}(1, y) = Σ_{y=1}^{6} 1/36 = 1/6

A Computer (or Three) In Every House
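The dice marginalization can be sketched with the joint table as a dictionary (using fractions for exact arithmetic):

```python
from fractions import Fraction

# Joint PMF of two fair dice: every pair (a, b) has probability 1/36
joint = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# Marginal p_X(1): sum the joint over all values of Y
p_x1 = sum(joint[(1, y)] for y in range(1, 7))
print(p_x1)   # 1/6
```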

• Consider households in Silicon Valley
§ A household has X Macs and Y PCs
§ Can’t have more than 3 Macs or 3 PCs

         X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12     ?      0.04
Y = 1     0.12    0.14    0.12    0
Y = 2     0.07    0.12    0       0
Y = 3     0.04    0       0       0
p_X(x)


         X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12    0.07    0.04
Y = 1     0.12    0.14    0.12    0
Y = 2     0.07    0.12    0       0
Y = 3     0.04    0       0       0
p_X(x)

(The missing cell must be 0.07 so that all entries sum to 1.)


         X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12    0.07    0.04    0.39
Y = 1     0.12    0.14    0.12    0       0.38
Y = 2     0.07    0.12    0       0       0.19
Y = 3     0.04    0       0       0       0.04
p_X(x)    0.39    0.38    0.19    0.04    1.00

Marginal distributions

Ode to the Thai Café (demo)
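The marginal row and column can be recomputed from the joint table; a sketch:

```python
# Joint table p(x, y): x = # Macs, y = # PCs, values from the slide
joint = {
    (0, 0): 0.16, (1, 0): 0.12, (2, 0): 0.07, (3, 0): 0.04,
    (0, 1): 0.12, (1, 1): 0.14, (2, 1): 0.12, (3, 1): 0.00,
    (0, 2): 0.07, (1, 2): 0.12, (2, 2): 0.00, (3, 2): 0.00,
    (0, 3): 0.04, (1, 3): 0.00, (2, 3): 0.00, (3, 3): 0.00,
}

# Marginals: sum out the other variable
p_x = {x: sum(joint[(x, y)] for y in range(4)) for x in range(4)}
p_y = {y: sum(joint[(x, y)] for x in range(4)) for y in range(4)}
print(p_x)
print(p_y)
```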

Let’s see the results!

Way Back

Permutations

How many ways are there to order n distinct objects?

n!

Binomial

How many ways are there to make an unordered selection of r objects from n objects?

How many ways are there to order n objects such that: r are the same (indistinguishable) (n – r) are the same (indistinguishable)?

(n choose r) = n! / (r! (n − r)!)

Called the “binomial” because of the binomial theorem from algebra.

Multinomial

How many ways are there to order n objects such that:

n1 are the same (indistinguishable) n2 are the same (indistinguishable) …

nr are the same (indistinguishable)?

(n choose n1, n2, …, nr) = n! / (n1! n2! ⋯ nr!)

Note: the multinomial coefficient generalizes the binomial coefficient.

Binomial Distribution

• Consider n independent trials of Ber(p) rand. var. § X is number of successes in n trials § X is a Binomial Random Variable: X ~ Bin(n, p)

P(X = i) = p(i) = (n choose i) p^i (1 − p)^(n − i),  for i = 0, 1, …, n

§ (n choose i): # ways of ordering the i successes
§ p^i (1 − p)^(n − i): probability of each ordering of exactly i successes (all equal)
§ The orderings are mutually exclusive, so their probabilities add

End Way Back

Welcome Back the Multinomial

§ n independent trials of an experiment are performed
§ Each trial results in one of m outcomes, with respective probabilities p1, p2, …, pm, where Σ_{i=1}^{m} p_i = 1
§ X_i = number of trials with outcome i

P(X1 = c1, X2 = c2, …, Xm = cm) = (n choose c1, c2, …, cm) · p1^c1 · p2^c2 ⋯ pm^cm

This joint distribution is the Multinomial: the coefficient counts the # of orderings of the outcomes; each ordering has equal probability, and the orderings are mutually exclusive.

(n choose c1, c2, …, cm) = n! / (c1! c2! ⋯ cm!),  where Σ_{i=1}^{m} c_i = n

Hello Die Rolls, My Old Friends

• 6-sided die is rolled 7 times § Roll results: 1 one, 1 two, 0 three, 2 four, 0 five, 3 six

P(X1 = 1, X2 = 1, X3 = 0, X4 = 2, X5 = 0, X6 = 3)
= (7! / (1! 1! 0! 2! 0! 3!)) (1/6)^1 (1/6)^1 (1/6)^0 (1/6)^2 (1/6)^0 (1/6)^3 = 420 (1/6)^7
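A sketch of the multinomial PMF in code, checked against the 420·(1/6)⁷ result above:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # P(X1 = c1, ..., Xm = cm) over n = sum(counts) independent trials
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

# 7 die rolls: one 1, one 2, zero 3s, two 4s, zero 5s, three 6s
result = multinomial_pmf([1, 1, 0, 2, 0, 3], [1/6] * 6)
print(result)   # 420 * (1/6)^7
```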

• This is a generalization of the Binomial distribution
§ Binomial: each trial had 2 possible outcomes
§ Multinomial: each trial has m possible outcomes

Probabilistic Text Analysis

• Ignoring order of words, what is the probability of any given word you write in English?
§ P(word = “the”) > P(word = “transatlantic”)
§ P(word = “Stanford”) > P(word = “Cal”)
§ Probability of each word is just a multinomial distribution
• What about the probability of those same words in someone else’s writing?
§ P(word = “probability” | writer = you) > P(word = “probability” | writer = non-CS109 student)
§ After estimating P(word | writer) from known writings, use Bayes’ Theorem to determine P(writer | word) for new writings!

Text is a Multinomial

Example document: “Pay for Viagra with a credit-card. Viagra is great. So are credit-cards. Risk free Viagra. Click for free.”

It’s a Multinomial! n = 18, with counts: Viagra = 2, Free = 2, Credit-card = 2, For = 2, …

P(document | spam) = (n! / (2! 2! 2! 2! ⋯)) · p_viagra² · p_free² · p_for² ⋯

where p_viagra is the probability of a word in a spam email being “viagra”. This is the probability of seeing this document given spam.

Who wrote the Federalist Papers?

Old and New Analysis

• Authorship of “Federalist Papers” § 85 essays advocating ratification of US constitution § Written under pseudonym “Publius”

o Really, Alexander Hamilton, James Madison and John Jay § Who wrote which essays?

o Analyzed probability of words in each essay versus word distributions from known writings of three authors

• Filtering Spam § P(word = “Viagra” | writer = you) << P(word = “Viagra” | writer = spammer)