Stats 512–513 Review
Eileen Burns, FSA, MAAA
June 16, 2009

Contents
1 Basic Probability
  1.1 Experiment, Sample Space, RV, and Probability
  1.2 Density, Value Number Urn Model, and Dice
  1.3 Expected Value Descriptive Parameters
  1.4 More About Normal Distributions
  1.5 Independent Trials and a Pictorial CLT
  1.6 The Population, the Sample, and Data
  1.7 Elementary Probability, Stressing Independence
  1.8 Expectation, Variance, and the CLT
  1.9 Applications of the CLT
2 Introduction to Statistics
  2.1 Presentation of Data
  2.2 Estimation of µ and σ²
  2.3 Elementary Classical Statistics
  2.4 Elementary Statistical Applications
3 Probability Models
  3.1 Math Facts
  3.2 Combinatorics and Hypergeometric RVs
  3.3 Independent Bernoulli Trials
  3.4 The Poisson Distribution
  3.5 The Poisson Process N
  3.6 The Failure Rate Function λ(·)
  3.7 Min, Max, Median, and Order Statistics
  3.8 Multinomial Distributions
  3.9 Sampling from a Finite Population, with a CLT
4 Dependent Random Variables
  4.1 Two-Dimensional Discrete RVs
  4.2 Two-Dimensional Continuous RVs
  4.3 Conditional Expectation
  4.4 Prediction
  4.5 Covariance
  4.6 Bivariate Normal Distributions
5 Distribution Theory
  5.1 Transformations of RVs
  5.2 Linear Transformations in Higher Dimensions
  5.3 General Transformations in Higher Dimensions
  5.4 Asymptotics
  5.5 Moment Generating Functions
6 Classical Statistics
  6.1 Estimation
  6.2 The One-Sample Normal Model
  6.3 The Two-Sample Normal Model
  6.4 Other Models
7 Estimation, Principles and Approaches
  7.1 Sufficiency, and UMVUE Estimators
  7.2 Completeness, and Ancillary Statistics
  7.3 Exponential Families, and Completeness
  7.4 The Cramér-Rao Bound
  7.5 Maximum Likelihood Estimators
  7.6 Procedures for Location and Scale Models
  7.7 Bootstrap Methods
  7.8 Bayesian Methods
8 Hypothesis Tests
  8.1 The Neyman-Pearson Lemma and UMP Tests
  8.2 The Likelihood Ratio Test
9 Regression, Anova, and Categorical Data
  9.1 Simple Linear Regression
  9.2 One Way Analysis of Variance
  9.3 Categorical Data
A Distributions
  A.1 Important Transformations

Chapter 1
Basic Probability
1.1 Experiment, Sample Space, RV, and Probability
Bell Curve (6)
Probability an observation falls within x standard deviations of the mean of a normal random variable:
  x = 1: P(−1 ≤ Z ≤ 1) = .682
  x = 2: P(−2 ≤ Z ≤ 2) = .954
  x = 3: P(−3 ≤ Z ≤ 3) = .9974
Note: this is true for any X ∼ N(µ, σ²) with Z = (X − µ)/σ.
From a quantile perspective,

  p    .500  .600  .700  .750  .800  .850  .900  .950  .975  .990  .999
  z_p   0    .254  .524  .674  .840  1.04  1.28  1.645 1.96  2.33  3.09
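These tabled values can be reproduced with Python's standard library (a quick sketch; `statistics.NormalDist` requires Python 3.8+):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal (mu = 0, sigma = 1)

# Within x standard deviations of the mean
for x in (1, 2, 3):
    print(x, round(Z.cdf(x) - Z.cdf(-x), 4))   # .6827, .9545, .9973

# Quantiles z_p
for p in (.5, .75, .85, .9, .95, .975, .99, .999):
    print(p, round(Z.inv_cdf(p), 3))
```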
Basic probability facts (8)
  P(Aᶜ) = 1 − P(A)
  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Union-Intersection Principle (§1.7) (73)
  P(∪₁ⁿ Aᵢ) = Σᵢ P(Aᵢ) − Σ_{i<j} P(AᵢAⱼ) + Σ_{i<j<k} P(AᵢAⱼAₖ) − Σ_{i<j<k<l} P(AᵢAⱼAₖAₗ) + · · ·
Partition of Event (§1.7) (67)
For a given partition of the event A,
  P(∪_{i=1}^∞ Aᵢ) = Σ_{i=1}^∞ P(Aᵢ) when all pairs of events Aᵢ, Aⱼ are disjoint.
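Both identities can be sanity-checked by brute force on a small finite sample space (a sketch; the uniform space on {0, ..., 11} and the particular events are arbitrary illustrative choices):

```python
from itertools import combinations

omega = set(range(12))                  # small uniform sample space
P = lambda event: len(event) / len(omega)

A = [set(range(0, 6)), set(range(4, 9)), set(range(7, 12))]

# Union-Intersection Principle: singletons minus pairs plus triples ...
lhs = P(set().union(*A))
rhs = sum((-1) ** (m + 1)
          * sum(P(set.intersection(*c)) for c in combinations(A, m))
          for m in range(1, len(A) + 1))
assert abs(lhs - rhs) < 1e-12

# Additivity over a disjoint partition
parts = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]
assert P(set().union(*parts)) == sum(P(p) for p in parts)
```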
Density Functions and Distribution Functions (10)
For density function f and distribution function F,
  P(a < X ≤ b) = ∫_a^b f(v) dv = ∫_{−∞}^b f(v) dv − ∫_{−∞}^a f(v) dv = F(b) − F(a)
1.2 Density, Value Number Urn Model, and Dice
1.3 Expected Value Descriptive Parameters
Discrete and continuous forms:

Expected Value (26) (31)
  µ_X = E[X]:   Σ_{all v} v · p_X(v)   (discrete)      ∫_{−∞}^∞ v f(v) dv   (continuous)
  E[g(X)]:      Σ_{all v} g(v) · p_X(v)                ∫_{−∞}^∞ g(v) f(v) dv

Mean deviation (27) (31)
  τ_X = E[|X − µ_X|]:   Σ_{all v} |v − µ_X| · p_X(v)   ∫_{−∞}^∞ |v − µ_X| f(v) dv

Variance
  σ²_X = E[X²] − (E[X])²:   Σ_{all v} (v − µ_X)² · p_X(v)   ∫_{−∞}^∞ (v − µ_X)² f(v) dv

1.4 More About Normal Distributions

1.5 Independent Trials and a Pictorial CLT

Central Limit Theorem (46)
Let Xᵢ for 1 ≤ i ≤ n denote iid rvs. Let Tₙ = X₁ + X₂ + · · · + Xₙ. Let Z denote a N(0, 1) rv.
Then for all −∞ ≤ a < b ≤ ∞,
  P(a ≤ (Tₙ − nµ_X)/(√n σ_X) ≤ b) → P(a ≤ Z ≤ b) as n → ∞.

1.6 The Population, the Sample, and Data

1.7 Elementary Probability, Stressing Independence

Conditional Probability (58)
  P(A | B) = P(A ∩ B)/P(B)        P(A ∩ B) = P(A | B) · P(B)
Independent Events (62)
  P(A | B) = P(A)        P(A ∩ B) = P(A) · P(B)

General Independence of Random Variables (64)
1. X1 and X2 are independent rvs
2. P(X₁ ∈ A and X₂ ∈ B) = P(X₁ ∈ A)P(X₂ ∈ B) for all (one-dimensional) events A and B.
3. Discrete: P (X1 = vi and X2 = wj ) = P (X1 = vi)P (X2 = wj ) for all vi, wj .
4. Continuous: fX1,X2 (v, w) = fX1 (v)fX2 (w) for all v and w in the sample space
5. Discrete: Repetitions X₁, X₂ of a value number X-experiment are independent provided sampling is done with replacement.
6. These statements generalize to n dimensions for both discrete and continuous distributions.
Law of Total Probability (67)
  P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂) + · · · + P(A | Bₙ)P(Bₙ)

Bayes Theorem (67)
  P(Bᵢ | A) = P(A ∩ Bᵢ)/P(A) = P(A | Bᵢ) · P(Bᵢ)/P(A)
            = P(A | Bᵢ)P(Bᵢ) / [P(A | B₁)P(B₁) + P(A | B₂)P(B₂) + · · · + P(A | Bₙ)P(Bₙ)]

Marginals (67)
  p_X(v) = P(X = v) = Σ_{all w} P(X = v and Y = w) = Σ_{all w} p_{X,Y}(v, w).

Example: Maximums and minimums of independent rvs (74)
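The Law of Total Probability and Bayes Theorem can be exercised on a tiny made-up example (all numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical partition: B1 = "defective machine", B2 = "good machine";
# A = "item fails inspection".  All probabilities are made up.
P_B = {"B1": 0.1, "B2": 0.9}
P_A_given_B = {"B1": 0.5, "B2": 0.02}

# Law of Total Probability
P_A = sum(P_A_given_B[b] * P_B[b] for b in P_B)          # 0.068

# Bayes Theorem: posterior probability the machine is defective given a failure
posterior = P_A_given_B["B1"] * P_B["B1"] / P_A          # 0.05 / 0.068
print(P_A, posterior)
```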
1.8 Expectation, Variance, and the CLT

Sums of Arbitrary RVs (83)
  µ_{aX+b} = aµ_X + b        µ_{aX+bY} = aµ_X + bµ_Y        σ²_{aX+b} = a²σ²_X

Sums of Independent RVs (83)
  E(XY) = E(X) · E(Y) = µ_X · µ_Y        Var(X + Y) = Var(X) + Var(Y) = σ²_X + σ²_Y

Mean and Variance of a Sum (83)
For independent repetitions of a basic X-experiment X₁, ..., Xₙ, let Tₙ = X₁ + · · · + Xₙ. Then
  E(Tₙ) = nµ_X        Var(Tₙ) = nσ²_X
  Zₙ = (Tₙ − nµ_X)/(√n σ_X) = (X̄ₙ − µ_X)/(σ_X/√n) has mean 0 and variance 1.

Covariance (86)
  Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E(XY) − µ_X µ_Y

Correlation (90)
  Corr[X, Y] = Cov[X, Y]/(σ_X σ_Y)

Sample Moments (87)
  µ̂ = X̄ = (1/n) Σ_{i=1}^n Xᵢ
  σ̂² = (1/n) Σ_{i=1}^n (Xᵢ − X̄)²
  s² = (1/(n−1)) Σ_{i=1}^n (Xᵢ − X̄ₙ)²
  σ̂_{X,Y} = (1/n) Σ_{i=1}^n (Xᵢ − X̄)(Yᵢ − Ȳ)

Handy formulas (89)
  σ² = E[X(X − 1)] + µ − µ²
  σ² = E[X(X + 1)] − µ − µ²

1.9 Applications of the CLT
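The CLT for standardized sums can be seen directly by simulation. A minimal sketch, summing fair-die rolls (the seed and sample sizes are arbitrary choices):

```python
import random

random.seed(1)
mu, sigma2 = 3.5, 35 / 12            # mean and variance of one fair-die roll
n, reps = 50, 20000

count = 0
for _ in range(reps):
    t = sum(random.randint(1, 6) for _ in range(n))
    z = (t - n * mu) / (n * sigma2) ** 0.5   # standardized sum Z_n
    count += (-1 <= z <= 1)

# Should be close to P(-1 <= Z <= 1) = .682 (in fact slightly above, since
# T_n is discrete; a continuity correction accounts for the gap)
print(count / reps)
```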
Continuity correction (93)
Pretty graphs (99)

Chapter 2
Introduction to Statistics
2.1 Presentation of Data
See Appendix B for examples of plot types.
2.2 Estimation of µ and σ2
  Estimator                                              Expected Value   Variance
  Sample mean µ̂ = X̄ = (1/n) Σ_{i=1}^n Xᵢ               E(X̄ₙ) = µ       Var[X̄ₙ] = σ²/n   (114)
  Sample variance s² = (1/(n−1)) Σ_{i=1}^n (Xᵢ − X̄ₙ)²   E[S²ₙ] = σ²     Var[S²ₙ] = (1/n)(µ₄ − ((n−3)/(n−1))σ⁴)   (115)

2.3 Elementary Classical Statistics
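The unbiasedness claim E[S²ₙ] = σ² from the table above can be verified exactly, not just by simulation, by enumerating every equally likely sample. A sketch using a fair die and n = 3 (exact rational arithmetic via `fractions`):

```python
from fractions import Fraction as F
from itertools import product

sigma2 = F(35, 12)     # variance of one fair-die roll

# Enumerate all 6^3 equally likely samples of size n = 3 and average s^2
n, total = 3, F(0)
for sample in product(range(1, 7), repeat=n):
    xbar = F(sum(sample), n)
    s2 = sum((F(x) - xbar) ** 2 for x in sample) / (n - 1)
    total += s2

assert total / 6 ** n == sigma2   # E[s^2] = sigma^2, exactly
```

The same enumeration with divisor n instead of n − 1 comes out strictly below σ², which is why s² uses the n − 1 divisor.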
Actual accuracy ratio of X̄ₙ (Tₙ) (121)
  Zₙ = (X̄ₙ − µ)/(σ/√n) ≈ N(0, 1)
CLT of Probability (121) No matter what the shape of the distribution of the population of possible outcomes of an original X- experiment that has mean µ and standard deviation σ, the following result will hold true. As n gets larger, the distribution of the standardized Zn form of the X¯n-experiment gets closer to the distribution given by the standardized bell shaped Normal curve.
Estimated accuracy ratio of X̄ₙ (Tₙ) (122)
  Tₙ = (X̄ₙ − µ)/(Sₙ/√n) ≈ T_{n−1}
CLT of Statistics (122)
The distribution of the ratio Tₙ is given (not by the standardized bell curve, but rather) by the Student T_{n−1} density. This is an approximation that gets better as the sample size n gets larger.
Random confidence intervals vs. hypothesis testing vs. p-values (123)
  Random confidence intervals: the 95% CI for µ consists of exactly those values of µ that are .95-plausible.
  Hypothesis testing: the .05-level test of H₀: µ = µ₀ rejects H₀ at the .05 level if µ₀ is not .95-plausible.
  p-values: the 2-sided p-value associated with µ₀ is exactly that level at which µ₀ first becomes plausible.
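The three viewpoints can be tied together numerically. A sketch, with a made-up sample of n = 10 observations and a crude numeric integration of the Student density from (128) standing in for t tables:

```python
import math

def t_density(x, df):
    # Student T_df density, as in (128)
    c = math.gamma((df + 1) / 2) / (math.sqrt(math.pi * df) * math.gamma(df / 2))
    return c / (1 + x * x / df) ** ((df + 1) / 2)

def t_tail(t, df, steps=20000, upper=60.0):
    # P(T_df > t) by the trapezoid rule; crude but adequate for a sketch
    h = (upper - t) / steps
    total = 0.5 * (t_density(t, df) + t_density(upper, df))
    for i in range(1, steps):
        total += t_density(t + i * h, df)
    return h * total

# Hypothetical data; test H0: mu = 5.0
data = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0]
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

mu0 = 5.0
T = (xbar - mu0) / (s / math.sqrt(n))   # estimated accuracy ratio, ~ T_{n-1}
p = 2 * t_tail(abs(T), n - 1)           # two-sided p-value
print(T, p)
```

Here T ≈ 1.13 with a two-sided p-value of roughly 0.29, so µ₀ = 5.0 is .95-plausible, sits inside the 95% CI, and H₀ is not rejected at the .05 level.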
Theorem 2.3.1 – Student T-statistic (124)
If X₁, X₂, ..., Xₙ are independent rvs that all have exactly a N(µ, σ²) distribution, then Student's estimated accuracy ratio has exactly the T_{n−1} distribution.

Metatheorem 2.3.1 (126)
If X₁, X₂, ..., Xₙ are independent rvs that all have approximately a N(µ, σ²) distribution, then Student's estimated accuracy ratio has approximately the T_{n−1} distribution.

Metatheorem 2.3.2 (126)
If X₁, X₂, ..., Xₙ are nearly independent rvs that have distributions that are not too skewed and do not produce severe outliers, then the estimated accuracy ratio T = (W̄ − µ)/S is approximately Normal, or approximately some Student T.

Student T_n density (128)
  f_n(t) = [Γ((n + 1)/2) / (√(πn) Γ(n/2))] · (1 + t²/n)^{−(n+1)/2} → (1/√(2π)) e^{−t²/2}

2.4 Elementary Statistical Applications

Type I error (130)
Rejecting the null hypothesis H₀ when it is true
Type II error (130)
Not rejecting H₀ when it is false
Test statistic Tn (130)
  Tₙ(µ₀) = (X̄ₙ − µ₀)/(sₙ/√n) ≈ T_{n−1}

Chapter 3
Probability Models
3.1 Math Facts
Geometric Series (135)
  Sₙ = Σ_{k=0}^n a rᵏ = a(1 − r^{n+1})/(1 − r)        S_∞ = Σ_{k=0}^∞ a rᵏ = a/(1 − r)

Infinite Series (135)
  f(x) = Σ_{k=0}^∞ a_k xᵏ        f′(x) = Σ_{k=0}^∞ k a_k x^{k−1}

Exponential and Logarithmic Series (135)
  eˣ = Σ_{k=0}^∞ xᵏ/k!        log(1 + x) = Σ_{k=1}^∞ (−1)^{k+1} xᵏ/k for |x| < 1

Fundamental Exponential Limit (136)
  (1 + aₙ/n)ⁿ → e^a as n → ∞, whenever aₙ → a as n → ∞.

Mean Value Theorem (136)
  f(b) − f(a) = (b − a) f′(c) for some a < c < b

L'Hospital's Rule (136)
Suppose f(a) = g(a) = 0, and that f′(a) and g′(a) ≠ 0 exist. Then
  lim_{x→a} f(x)/g(x) = f′(a)/g′(a) = lim_{x→a} f′(x)/g′(x).

Taylor Series Expansion (136)
  f(x) = f(a) + ((x − a)/1!) f′(a) + · · · + ((x − a)ᵏ/k!) f⁽ᵏ⁾(a) + R_k(x)

Integration by Parts (137)
  ∫ u dv = uv − ∫ v du

Gamma Function & Properties (138)
  Γ(r) = ∫₀^∞ y^{r−1} e^{−y} dy        Γ(r) = (r − 1)Γ(r − 1)        Γ(r) = (r − 1)! for integer r        Γ(1/2) = √π

Standard Normal Density Function & Moments (139)
  f(x) = (1/√(2π)) e^{−x²/2}        µ_X = 0        σ²_X = 1

Gamma Density & Moments (140)
  f(w) = (1/(Γ(r) θʳ)) w^{r−1} e^{−w/θ}        µ_W = rθ        σ²_W = rθ²

3.2 Combinatorics and Hypergeometric RVs

Permutations & Combinations (148)
  Pₙᵏ = n!/(n − k)!        Cₙᵏ = (n choose k) = n!/((n − k)! k!)

Sampling with replacement (Binomial) (150)

Sampling without replacement (Hypergeometric(R, W, n)) (151)
With N = R + W,
  P(X = k) = (R choose k)(W choose n−k)/(N choose n)        µ_X = nR/N        σ²_X = (nR/N) · (N − R)(N − n)/(N(N − 1))

Mathematics via revisualization (157)
  (N choose k) = (N choose N−k)
  (N choose k) = (N−1 choose k) + (N−1 choose k−1)
  (m choose n) = (n−1 choose n−1) + (n choose n−1) + (n+1 choose n−1) + · · · + (m−1 choose n−1)

3.3 Independent Bernoulli Trials

Bernoulli(p) – Xᵢ = independent trials (171)
  P(X = 1) = p        P(X = 0) = 1 − p = q        µ_X = p        σ²_X = pq

Binomial(n, p) – Tₙ = sum of independent trials (173)
  P(Tₙ = k) = (n choose k) pᵏ q^{n−k}        µ_{Tₙ} = np        σ²_{Tₙ} = npq

Geometric Turns – GeoT(p) – Yᵢ = interarrival times (174)
  P(Y = k) = q^{k−1} p        P(Y > k) = qᵏ        µ_Y = 1/p        σ²_Y = q/p²

GeoT(p) lack of memory property (174)
  P(Y > i + k | Y > k) = P(Y > i) = qⁱ

Negative Binomial Turns – NegBiT(r, p) – W_r = waiting times (176)
  P(W_r = k) = (k−1 choose r−1) pʳ q^{k−r}        µ_{W_r} = r/p        σ²_{W_r} = rq/p²
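The Binomial moment formulas and the GeoT lack-of-memory property can be checked exactly with rational arithmetic (a sketch; the particular p = 2/5, n = 8, and indices i, k are arbitrary):

```python
from fractions import Fraction as F
from math import comb

p, q = F(2, 5), F(3, 5)   # arbitrary rational p, with q = 1 - p
n = 8

# Binomial(n, p): pmf sums to 1, mean np, variance npq -- all exact
pmf = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}
assert sum(pmf.values()) == 1
assert sum(k * pmf[k] for k in pmf) == n * p
assert sum(k * k * pmf[k] for k in pmf) - (n * p) ** 2 == n * p * q

# GeoT(p) lack of memory: P(Y > i + k | Y > k) = P(Y > i) = q^i
P_gt = lambda j: q ** j      # P(Y > j)
i, k = 3, 5
assert P_gt(i + k) / P_gt(k) == P_gt(i)
```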
Event Identity (176)
  [W_r > n] = [Tₙ < r]        [W_r = k] = [T_{k−1} = r − 1] ∩ [X_k = 1]

Sums of NegBiT rvs (177)
  NegBiT(r, p) + NegBiT(s, p) = NegBiT(r + s, p)

Geometric Failures – GeoF(p) – Y′ = failures before first success (177)
  P(Y′ = k) = qᵏ p        Y′ + 1 = Y        µ_{Y′} = q/p        σ²_{Y′} = q/p²

Negative Binomial Failures – NegBiF(r, p) – W′_r = failures before the r-th success (177)
  P(W′_r = k) = (k+r−1 choose r−1) pʳ qᵏ        W′_r + r = W_r        µ_{W′_r} = rq/p        σ²_{W′_r} = rq/p²

Examples
  Comparing two success probabilities p₁ and p₂ (185)
  Gambler's Ruin (194)

3.4 The Poisson Distribution

Poisson(λ) (208, 219)
  P(T = k) = e^{−λ} λᵏ/k!        µ_T = λ        σ²_T = λ

Poisson Limit Theorem (208)
  If Tₙ ∼ Binomial(n, pₙ) with λₙ = npₙ → λ as n → ∞, then P(Tₙ = k) → P(T = k) = e^{−λ} λᵏ/k!.

Sums of Poissons are Poisson (210)

3.5 The Poisson Process N

Poisson(νt) – N_t = count by time t (208, 219)
  P(N_t = k) = e^{−νt} (νt)ᵏ/k!        µ = νt = λ        σ² = νt = λ

Exponential(θ) – Yᵢ = interarrival times (220)
  f_Y(t) = ν e^{−νt} = (1/θ) e^{−t/θ}        µ_Y = 1/ν = θ        σ²_Y = (1/ν)² = θ²

Event Identity (220)
  [Y > t] = [N_t = 0] = [N_t < 1]

Exponential(θ) lack of memory (220)
  P(Y > s + t | Y > s) = P(Y > t) = e^{−νt} = e^{−t/θ}

Gamma(r, "rate" = ν) – W_r = waiting times (221)
  f_{W_r}(t) = (1/Γ(r)) νʳ t^{r−1} e^{−νt} = (1/(Γ(r) θʳ)) t^{r−1} e^{−t/θ}        µ_{W_r} = r/ν = rθ        σ²_{W_r} = r/ν² = rθ²

Event Identity (221)
  [W_r > t] = [N(t) < r]

Sums of Gamma RVs (222)
  Gamma(r, ν) + Gamma(s, ν) = Gamma(r + s, ν)

Random location of the times of event occurrences is Binomial (222, 214)
  P(T₁ = k | T₁ + T₂ = n) = (n choose k) pᵏ (1 − p)^{n−k} where p = λ₁/(λ₁ + λ₂)

Conditioned Poisson sums have Binomial distributions (222, 214)
  (N_s | N_t = m) ∼ Binomial(m, p = s/t)

Example: Comparison of bulb lifetimes (230)

3.6 The Failure Rate Function λ(·)

Failure Rate Function (241)
  λ(t) = f(t)/(1 − F(t)) = F′(t)/[1 − F(t)]
With a constant failure rate, λ(t) = λ for all t ≥ 0. For a nonconstant failure rate, this generalizes to
  −(d/dt) log[1 − F(t)] = λ(t).
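The Poisson process construction in §3.5 (exponential interarrival times accumulating until the horizon t) can be simulated directly; the count N_t should then have mean and variance νt. A sketch, with arbitrary rate and horizon:

```python
import random

random.seed(7)
nu, t, reps = 2.0, 3.0, 20000    # rate nu, horizon t, replications

counts = []
for _ in range(reps):
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(nu)   # Exponential interarrival, mean 1/nu
        if clock > t:
            break
        n += 1
    counts.append(n)

m = sum(counts) / reps
v = sum((c - m) ** 2 for c in counts) / reps
# Both should be near nu * t = 6, since N_t ~ Poisson(nu * t)
print(m, v)
```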
3.7 Min, Max, Median, and Order Statistics

Uniform(0,1) order statistics (252)
Let U₁, ..., Uₙ be independent Uniform(0,1) rvs. Let 0 < U_{n:1} < · · · < U_{n:n} < 1 denote their order statistics.

Event Identity (252)
  [U_{n:k} > t] = [Bₙ(t) < k]

Distribution of Uniform Order Statistics (252)
U_{n:k} has the Beta(k, n − k + 1) density
  f_{U_{n:k}}(t) = (n choose k−1, 1, n−k) t^{k−1} (1 − t)^{n−k} for 0 ≤ t ≤ 1.

Asymptotic Distribution of the minimum of Uniform(0,1) rvs (258)
  P(nU_{n:1} > t) = P(X > t/n)ⁿ = (1 − t/n)ⁿ → e^{−t} for all 0 ≤ t < n.

Joint Distribution of Uniform Order Statistics (253)
  f_{U_{n:i},U_{n:k}}(s, t) = (n choose i−1, 1, k−i−1, 1, n−k) s^{i−1} (t − s)^{k−i−1} (1 − t)^{n−k} for 0 < s < t < 1.
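The Beta(k, n − k + 1) distribution of U_{n:k} has mean k/(n + 1), which a quick simulation reproduces (a sketch; n = 5, k = 2, and the seed are arbitrary):

```python
import random

random.seed(11)
n, k, reps = 5, 2, 20000

total = 0.0
for _ in range(reps):
    u = sorted(random.random() for _ in range(n))
    total += u[k - 1]                 # U_{n:k}, the k-th smallest of n uniforms

m = total / reps
# Beta(k, n - k + 1) = Beta(2, 4) has mean k/(n + 1) = 2/6
print(m)
```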