CS 109 Lecture 19, May 9th, 2016
Midterm Distribution
[Histogram of midterm scores, 0-120]
• E[X] = 87
• √Var(X) = 15
• median = 90

Midterm Distribution
[Same histogram with grade cutoffs marked: A+, A, A-, B+, B, B-, Cs]

Midterm Distribution
[Same histogram with regions labeled "Core Understanding" and "Advanced Understanding"]

Midterm Beta
[Histogram overlaid with a fitted Beta distribution: α = 7.1, β = 2.7]
• E[X] = 87, √Var(X) = 15, median = 90

Midterm Beta
[CS109 Midterm CDF, scores 0-120] You can interpret this as a percentile function
[CS109 Midterm PDF, scores 0-120] Derivative of percentile per point

Midterm Question Correlations
[Score histograms for Problems 1-6]

Midterm Question Correlations
[Graph of correlations among sub-questions 1.a-6.c and Problem 5; legend: correlation 0.22, 0.40, 0.60]
*Correlations below 0.22 are not shown
**Almost all correlations were positive

Joint PMF for Questions 6.a, 6.c
[Joint probability mass function of points on 6.a (x-axis) vs. points on 6.c (y-axis); regression line y = 0.46x + 2.6]

Joint PMF for Questions 4, 6
[Joint probability mass function of total points on 6 (x-axis) vs. total points on 4 (y-axis); regression line y = 0.2x + 11]

Conditional Expectation
• Let X be your score on problem 4.
• Let Y be your score on problem 6.
• E[X | Y]?
[Plot of E[X | Y = y] as a function of y, with error bars]
• I use standard error because the expectation is from a sample.

Four Prototypical Trajectories
1. Inequalities

Inequality, Probability and Joviality
• In many cases, we don't know the true form of a probability distribution
  § E.g., midterm scores
  § But, we know the mean
  § May also have other measures/properties
    o Variance
    o Non-negativity
    o Etc.
  § Inequalities and bounds still allow us to say something about the probability distribution in such cases
    o May be imprecise compared to knowing the true distribution!

Markov's Inequality
• Say X is a non-negative random variable. Then for all a > 0:
    P(X ≥ a) ≤ E[X] / a
• Proof:
  § Let I = 1 if X ≥ a, 0 otherwise
  § Since X ≥ 0, we have I ≤ X/a
  § Taking expectations:
    E[I] = P(X ≥ a) ≤ E[X/a] = E[X] / a

Andrey Markov
• Andrey Andreyevich Markov (1856-1922) was a Russian mathematician
  § Markov's Inequality is named after him
  § He also invented Markov Chains…
    o …which are the basis for Google's PageRank algorithm

Markov and the Midterm
• Statistics from a previous quarter's CS109 midterm
  § X = midterm score
  § Using sample mean X̄ = 86.7 ≈ E[X]
  § What is P(X ≥ 100)?
    P(X ≥ 100) ≤ E[X] / 100 = 86.7 / 100 ≈ 0.867
  § Markov bound: ≤ 86.7% of class scored 100 or greater
  § In fact, 20.1% of class scored 100 or greater
    o Markov inequality can be a very loose bound
    o But, it made no assumption at all about the form of the distribution!
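As a quick numerical illustration of the Markov bound above, here is a minimal Python sketch. It does not use the real class data: the simulated Normal-ish scores, the sample size, and the seed are invented for demonstration only.

```python
# Minimal sketch: compare the Markov bound E[X]/a to an empirical tail
# probability for a made-up, non-negative sample of "midterm-like" scores.
# These scores are illustrative placeholders, NOT the actual class data.
import random

random.seed(109)
# Hypothetical scores: roughly Normal(86.7, 15), clamped to [0, 120].
scores = [min(120, max(0, random.gauss(86.7, 15))) for _ in range(500)]

a = 100
sample_mean = sum(scores) / len(scores)

markov_bound = sample_mean / a                            # P(X >= a) <= E[X]/a
empirical_tail = sum(s >= a for s in scores) / len(scores)

print(f"Markov bound on P(X >= {a}): {markov_bound:.3f}")
print(f"Empirical P(X >= {a}):       {empirical_tail:.3f}")
# As on the slide, the bound (~0.87) is valid but much looser than the
# tail probability actually observed.
```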
Chebyshev's Inequality
• X is a random variable with E[X] = µ, Var(X) = σ². Then for all k > 0:
    P(|X − µ| ≥ k) ≤ σ² / k²
• Proof:
  § Since (X − µ)² is a non-negative random variable, apply Markov's Inequality with a = k²:
    P((X − µ)² ≥ k²) ≤ E[(X − µ)²] / k² = σ² / k²
  § Note that (X − µ)² ≥ k² ⇔ |X − µ| ≥ k, yielding:
    P(|X − µ| ≥ k) ≤ σ² / k²

Pafnuty Chebyshev
• Pafnuty Lvovich Chebyshev (1821-1894) was also a Russian mathematician
  § Chebyshev's Inequality is named after him
    o But it was actually formulated by his colleague Irénée-Jules Bienaymé
  § He was Markov's doctoral advisor
    o And is sometimes credited with first deriving Markov's Inequality
  § There is a crater on the moon named in his honor

Four Prototypical Trajectories
2. Law of Large Numbers

Weak Law of Large Numbers
• Consider I.I.D. random variables X1, X2, ...
  § Xi have distribution F with E[Xi] = µ and Var(Xi) = σ²
  § Let X̄ = (X1 + X2 + ... + Xn) / n
  § For any ε > 0: P(|X̄ − µ| ≥ ε) → 0 as n → ∞
• Proof:
  § E[X̄] = E[(X1 + X2 + ... + Xn) / n] = µ
  § Var(X̄) = Var((X1 + X2 + ... + Xn) / n) = σ²/n
  § By Chebyshev's inequality:
    P(|X̄ − µ| ≥ ε) ≤ Var(X̄) / ε² = σ² / (nε²) → 0 as n → ∞

Strong Law of Large Numbers
• Consider I.I.D. random variables X1, X2, ...
  § Xi have distribution F with E[Xi] = µ
  § Let X̄ = (X1 + X2 + ... + Xn) / n
    P( lim n→∞ (X1 + X2 + ... + Xn) / n = µ ) = 1
  § Strong Law ⇒ Weak Law, but not vice versa
  § Strong Law implies that for any ε > 0, there are only a finite number of values of n for which the condition of the Weak Law, |X̄ − µ| ≥ ε, holds

Intuitions and Misconceptions of LLN
• Say we have repeated trials of an experiment
  § Let event E = some outcome of experiment
  § Let Xi = 1 if E occurs on trial i, 0 otherwise
  § Strong Law of Large Numbers (Strong LLN) yields:
    (X1 + X2 + ... + Xn) / n → E[Xi] = P(E)
  § Recall first week of class: P(E) = lim n→∞ n(E)/n
  § Strong LLN justifies "frequency" notion of probability
  § Misconception arising from LLN:
    o Gambler's fallacy: "I'm due for a win"
    o Consider being "due for a win" with repeated coin flips...

La Loi des Grands Nombres
• History of the Law of Large Numbers
  § 1713: Weak LLN described by Jacob Bernoulli
  § 1835: Poisson calls it "La Loi des Grands Nombres"
    o That would be "Law of Large Numbers" in French
  § 1909: Émile Borel develops Strong LLN for Bernoulli random variables
  § 1928: Andrei Nikolaevich Kolmogorov proves Strong LLN in general case

Silence!!
And now a moment of silence...
...before we present...
...the greatest result of probability theory!

Four Prototypical Trajectories
3. Central Limit Theorem

The Central Limit Theorem
• Consider I.I.D. random variables X1, X2, ...
  § Xi have distribution F with E[Xi] = µ and Var(Xi) = σ²
  § Let X̄ = (X1 + X2 + ... + Xn) / n
  § Central Limit Theorem:
    X̄ ~ N(µ, σ²/n) as n → ∞
http://onlinestatbook.com/stat_sim/sampling_dist/

The Central Limit Theorem
• Consider I.I.D. random variables X1, X2, ...
  § Xi have distribution F with E[Xi] = µ and Var(Xi) = σ²
  § Let X̄ = (X1 + X2 + ... + Xn) / n, so X̄ ~ N(µ, σ²/n) as n → ∞
  § Recall Z = (X̄ − µ) / √(σ²/n), where Z ~ N(0, 1):
    X̄ ~ N(µ, σ²/n) ⇔ Z = (X̄ − µ) / √(σ²/n) = (X1 + X2 + ... + Xn − nµ) / (σ√n)
  § That is:
    (X1 + X2 + ... + Xn − nµ) / (σ√n) → N(0, 1) as n → ∞
• Another form of the Central Limit Theorem
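The interactive demo linked on the slide shows this empirically; here is a small Python sketch in the same spirit. The choice of an Exponential distribution, the sample size n, and the number of trials are assumptions made only for illustration.

```python
# Minimal sketch: check the standardized-sum form of the CLT empirically.
# Draw i.i.d. samples from a clearly non-Normal distribution (Exponential)
# and look at Z = (X1 + ... + Xn - n*mu) / (sigma * sqrt(n)).
import math
import random

random.seed(109)

def standardized_mean(n, lam=1.0):
    """One draw of Z for a sample of n Exp(lam) random variables."""
    mu, sigma = 1 / lam, 1 / lam            # mean and std dev of Exp(lam)
    xs = [random.expovariate(lam) for _ in range(n)]
    return (sum(xs) - n * mu) / (sigma * math.sqrt(n))

n, trials = 50, 10_000
zs = [standardized_mean(n) for _ in range(trials)]

# If the CLT is at work, Z should behave like N(0, 1):
# for a standard Normal, P(Z <= 1) is about 0.841.
empirical = sum(z <= 1 for z in zs) / trials
print(f"Empirical P(Z <= 1) with n = {n}: {empirical:.3f}  (Normal: ~0.841)")
```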
Once Upon a Time…
Abraham de Moivre, 1733

Once Upon a Time…
• History of the Central Limit Theorem
  § 1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre
  § 1823: Pierre-Simon Laplace extends de Moivre's work to approximating Bin(n, p) with a Normal
  § 1901: Aleksandr Lyapunov provides precise definition and rigorous proof of CLT
  § 2003: Charlie Sheen stars in television series "Two and a Half Men"
    o By end of the 7th (final) season, there were 161 episodes
    o Mean quality of subsamples of episodes is Normally distributed (thanks to the Central Limit Theorem)

Central Limit Theorem in the Real World
• CLT is why many things in the "real world" appear Normally distributed
  § Many quantities are sums of independent variables
  § Exam scores
    o Sum of individual problems on the SAT
    o Why does the CLT not apply to our midterm?
  § Election polling
    o Ask 100 people if they will vote for candidate X (p1 = # "yes"/100)
    o Repeat this process with different groups to get p1, ..
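To make the polling example concrete, here is a minimal Python sketch of repeating a 100-person poll many times. The true support value p = 0.55 and the number of repeated polls are invented purely for illustration; each poll's sample proportion is a mean of 100 i.i.d. Bernoulli(p) variables, so by the CLT it should look roughly Normal with mean p and standard deviation √(p(1−p)/100).

```python
# Minimal sketch: repeat a 100-person poll many times and compare the
# spread of the sample proportion p_hat to what the CLT predicts.
# The "true" support p = 0.55 is an invented value for illustration.
import math
import random

random.seed(109)
p, n, polls = 0.55, 100, 5_000

# One p_hat per poll: fraction of n simulated respondents saying "yes".
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(polls)]

mean_hat = sum(p_hats) / polls
sd_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in p_hats) / polls)

print(f"Across {polls} polls: mean of p_hat = {mean_hat:.3f} (true p = {p})")
print(f"Std dev of p_hat = {sd_hat:.3f} (CLT predicts {math.sqrt(p * (1 - p) / n):.3f})")
```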