CS 109 Lecture 19, May 9th, 2016


Midterm Distribution
[Figure: histogram of midterm scores, 0 to 120 in bins of 5]
• E[X] = 87, √Var(X) = 15, median = 90
• Grade bands (A+, A, A-, B+, B, B-, Cs) are marked along the score axis
• Annotated regions: lower scores reflect "Core Understanding," higher scores "Advanced Understanding"

Midterm Beta
[Figure: midterm score histogram with a fitted Beta distribution, α = 7.1, β = 2.7 (scores scaled to [0, 1])]

CS109 Midterm CDF
[Figure: cumulative distribution function of midterm scores]
• You can interpret this as a percentile function

CS109 Midterm PDF
[Figure: probability density function of midterm scores]
• Derivative of percentile per point

Midterm Question Correlations
[Figure: score histograms for Problems 1 through 6]
[Figure: correlation graph over sub-questions 1.a through 6.c; edge weight indicates correlation strength (0.22, 0.40, 0.60)]
• Correlations below 0.22 are not shown
• Almost all correlations were positive

Joint PMF: Joint Probability Mass Function for Questions 6.a, 6.c
[Figure: scatter of points on 6.a vs. points on 6.c, with regression line y = 0.46x + 2.6]

Joint PMF: Joint Probability Mass Function for Questions 4, 6
[Figure: scatter of total points on Problem 4 vs. total points on Problem 6, with regression line y = 0.2x + 11]

Conditional Expectation
• Let X be your score on problem 4
• Let Y be your score on problem 6
• E[X | Y]?
[Figure: E[X | Y = y] plotted against y, with standard-error bars]
• I use standard error because the expectation is estimated from a sample

Four Prototypical Trajectories
1. Inequalities

Inequality, Probability and Joviality
• In many cases, we don't know the true form of a probability distribution
  § E.g., midterm scores
  § But, we know the mean
  § May also have other measures/properties
    o Variance
    o Non-negativity
    o Etc.
  § Inequalities and bounds still allow us to say something about the probability distribution in such cases
    o May be imprecise compared to knowing the true distribution!

Markov's Inequality
• Say X is a non-negative random variable. Then:
    P(X ≥ a) ≤ E[X]/a, for all a > 0
• Proof:
  § Let I = 1 if X ≥ a, 0 otherwise
  § Since X ≥ 0, we have I ≤ X/a
  § Taking expectations:
    E[I] = P(X ≥ a) ≤ E[X/a] = E[X]/a

Andrey Markov
• Andrey Andreyevich Markov (1856-1922) was a Russian mathematician
  § Markov's Inequality is named after him
  § He also invented Markov Chains...
    o ...which are the basis for Google's PageRank algorithm

Markov and the Midterm
• Statistics from a previous quarter's CS109 midterm
  § X = midterm score
  § Using sample mean X̄ = 86.7 ≈ E[X]
  § What is P(X ≥ 100)?
    P(X ≥ 100) ≤ E[X]/100 = 86.7/100 ≈ 0.867
  § Markov bound: ≤ 86.7% of the class scored 100 or greater
  § In fact, 20.1% of the class scored 100 or greater
    o The Markov inequality can be a very loose bound
    o But, it made no assumption at all about the form of the distribution!
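The looseness of the Markov bound is easy to see numerically. A minimal sketch, using simulated stand-in scores (not the actual class data) with roughly the mean reported above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for midterm scores: simulated values with mean
# near 86.7, clipped to the exam's 0-120 range. Not the real class data.
scores = np.clip(rng.normal(86.7, 15, size=10_000), 0, 120)

a = 100
markov_bound = scores.mean() / a      # E[X]/a, valid since X >= 0
empirical = (scores >= a).mean()      # actual fraction scoring >= a

print(f"Markov bound on P(X >= {a}): {markov_bound:.3f}")
print(f"Empirical P(X >= {a}):       {empirical:.3f}")

# The bound always holds, but it can be far from tight.
assert empirical <= markov_bound
```

As in the slide, the bound (around 0.87 here) holds but vastly overstates the true tail probability, because it uses only the mean and non-negativity.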
Chebyshev's Inequality
• X is a random variable with E[X] = µ, Var(X) = σ². Then:
    P(|X − µ| ≥ k) ≤ σ²/k², for all k > 0
• Proof:
  § Since (X − µ)² is a non-negative random variable, apply Markov's Inequality with a = k²:
    P((X − µ)² ≥ k²) ≤ E[(X − µ)²]/k² = σ²/k²
  § Note that (X − µ)² ≥ k² ⇔ |X − µ| ≥ k, yielding:
    P(|X − µ| ≥ k) ≤ σ²/k²

Pafnuty Chebyshev
• Pafnuty Lvovich Chebyshev (1821-1894) was also a Russian mathematician
  § Chebyshev's Inequality is named after him
    o But it was actually formulated by his colleague Irénée-Jules Bienaymé
  § He was Markov's doctoral advisor
    o And is sometimes credited with first deriving Markov's Inequality
  § There is a crater on the moon named in his honor

Four Prototypical Trajectories
2. Law of Large Numbers

Weak Law of Large Numbers
• Consider I.I.D. random variables X1, X2, ...
  § The Xi have distribution F with E[Xi] = µ and Var(Xi) = σ²
  § Let X̄ = (1/n) ∑ Xi (summing i = 1 to n)
  § For any ε > 0: P(|X̄ − µ| ≥ ε) → 0 as n → ∞
• Proof:
    E[X̄] = E[(X1 + X2 + ... + Xn)/n] = µ
    Var(X̄) = Var((X1 + X2 + ... + Xn)/n) = σ²/n
  § By Chebyshev's Inequality:
    P(|X̄ − µ| ≥ ε) ≤ Var(X̄)/ε² = σ²/(nε²) → 0 as n → ∞

Strong Law of Large Numbers
• Consider I.I.D. random variables X1, X2, ...
  § The Xi have distribution F with E[Xi] = µ
  § Let X̄ = (1/n) ∑ Xi (summing i = 1 to n)
    P( lim (X1 + X2 + ... + Xn)/n = µ as n → ∞ ) = 1
  § Strong Law ⇒ Weak Law, but not vice versa
  § The Strong Law implies that for any ε > 0, there are only a finite number of values of n for which the condition of the Weak Law, |X̄ − µ| ≥ ε, holds

Intuitions and Misconceptions of LLN
• Say we have repeated trials of an experiment
  § Let event E = some outcome of the experiment
  § Let Xi = 1 if E occurs on trial i, 0 otherwise
  § The Strong Law of Large Numbers (Strong LLN) yields:
    (X1 + X2 + ... + Xn)/n → E[Xi] = P(E)
  § Recall first week of class: P(E) = lim n(E)/n as n → ∞
  § The Strong LLN justifies the "frequency" notion of probability
  § Misconception arising from the LLN:
    o Gambler's fallacy: "I'm due for a win"
    o Consider being "due for a win" with repeated coin flips...

La Loi des Grands Nombres
• History of the Law of Large Numbers
  § 1713: Weak LLN described by Jacob Bernoulli
  § 1835: Poisson calls it "La Loi des Grands Nombres"
    o That would be "Law of Large Numbers" in French
  § 1909: Émile Borel develops the Strong LLN for Bernoulli random variables
  § 1928: Andrei Nikolaevich Kolmogorov proves the Strong LLN in the general case

Silence!!
And now a moment of silence...
...before we present...
...the greatest result of probability theory!

Four Prototypical Trajectories
3. Central Limit Theorem

The Central Limit Theorem
• Consider I.I.D. random variables X1, X2, ...
  § The Xi have distribution F with E[Xi] = µ and Var(Xi) = σ²
  § Let X̄ = (1/n) ∑ Xi (summing i = 1 to n)
  § Central Limit Theorem:
    X̄ ~ N(µ, σ²/n) as n → ∞
  § Demo: http://onlinestatbook.com/stat_sim/sampling_dist/

The Central Limit Theorem (continued)
  § Recall Z = (X̄ − µ)/(σ/√n), where Z ~ N(0, 1):
    X̄ ~ N(µ, σ²/n) ⇔ Z = ((1/n) ∑ Xi − µ) / √(σ²/n) = (∑ Xi − nµ)/(σ√n)
  § This gives another form of the Central Limit Theorem:
    (X1 + X2 + ... + Xn − nµ)/(σ√n) → N(0, 1) as n → ∞

Once Upon a Time... (Abraham de Moivre, 1733)
• History of the Central Limit Theorem
  § 1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre
  § 1823: Pierre-Simon Laplace extends de Moivre's work to approximating Bin(n, p) with a Normal
  § 1901: Aleksandr Lyapunov provides a precise definition and rigorous proof of the CLT
  § 2003: Charlie Sheen stars in the television series "Two and a Half Men"
    o By the end of the 7th (final) season, there were 161 episodes
    o The mean quality of subsamples of episodes is Normally distributed (thanks to the Central Limit Theorem)

Central Limit Theorem in the Real World
• The CLT is why many things in the "real world" appear Normally distributed
  § Many quantities are sums of independent variables
  § Exam scores
    o Sum of individual problems on the SAT
    o Why does the CLT not apply to our midterm?
  § Election polling
    o Ask 100 people if they will vote for candidate X (p1 = # "yes"/100)
    o Repeat this process with different groups to get p1, ..
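The Chebyshev bound, the Weak LLN, and the CLT above can all be seen in one quick simulation. A sketch; the Exponential(1) distribution is an arbitrary choice of a skewed, clearly non-Normal F with µ = 1 and σ² = 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Xi ~ Exponential(1): mu = 1, sigma^2 = 1, and very much not Normal.
mu, sigma, n, trials = 1.0, 1.0, 400, 20_000

samples = rng.exponential(1.0, size=(trials, n))
xbar = samples.mean(axis=1)                 # one sample mean per trial

# Weak LLN via Chebyshev: P(|Xbar - mu| >= eps) <= sigma^2 / (n eps^2)
eps = 0.1
cheb_bound = sigma**2 / (n * eps**2)
empirical = (np.abs(xbar - mu) >= eps).mean()
print(f"Chebyshev bound: {cheb_bound:.4f}, empirical: {empirical:.4f}")

# CLT: Z = (Xbar - mu) / (sigma / sqrt(n)) should be roughly N(0, 1),
# even though each Xi is exponential.
z = (xbar - mu) / (sigma / np.sqrt(n))
print(f"Z mean ~ {z.mean():.3f}, Z std ~ {z.std():.3f}")
print(f"P(|Z| < 1) ~ {(np.abs(z) < 1).mean():.3f}")   # ~0.68 for N(0,1)
```

Increasing n shrinks the spread of X̄ around µ (the LLN) while the standardized Z keeps the same bell shape (the CLT); the Chebyshev bound holds throughout but, like Markov's, is loose.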