Basic Principles of Statistical Inference
Kosuke Imai, Department of Politics, Princeton University
POL572 Quantitative Analysis II, Spring 2016

What is Statistics?

- Relatively new discipline
- Scientific revolution in the 20th century
- Data and computing revolutions in the 21st century
- The world is stochastic rather than deterministic
- Probability theory is used to model stochastic events
- Statistical inference: learning about what we do not observe (parameters) using what we observe (data)
- Without statistics: wild guess
- With statistics: principled guess
  1. assumptions
  2. formal properties
  3. measure of uncertainty

Three Modes of Statistical Inference

1. Descriptive Inference: summarizing and exploring data
   - Inferring "ideal points" from roll-call votes
   - Inferring "topics" from texts and speeches
   - Inferring "social networks" from surveys
2. Predictive Inference: forecasting out-of-sample data points
   - Inferring future state failures from past failures
   - Inferring population average turnout from a sample of voters
   - Inferring individual-level behavior from aggregate data
3. Causal Inference: predicting counterfactuals
   - Inferring the effects of ethnic minority rule on civil war onset
   - Inferring why incumbency status affects election outcomes
   - Inferring whether the lack of war among democracies can be attributed to regime types

Statistics for Social Scientists

Quantitative social science research:
1. Find a substantive question
2. Construct theory and hypotheses
3. Design an empirical study and collect data
4. Use statistics to analyze the data and test the hypotheses
5. Report the results

- No study in the social sciences is perfect
- Use the best available methods and data, but be aware of their limitations
- Many wrong answers but no single right answer
- Credibility of data analysis:
  data analysis = assumption (subjective) + statistical theory (objective) + interpretation (subjective)
- Statistical methods are no substitute for good research design

Sample Surveys

- A large population of size $N$
  - Finite population: $N < \infty$
  - Super-population: $N = \infty$
- A simple random sample of size $n$
  - Probability sampling: e.g., stratified, cluster, systematic sampling
  - Non-probability sampling: e.g., quota, volunteer, snowball sampling
- The population: $X_i$ for $i = 1, \ldots, N$
- Sampling (binary) indicators: $Z_1, \ldots, Z_N$
- Assumption: $\sum_{i=1}^N Z_i = n$ and $\Pr(Z_i = 1) = n/N$ for all $i$
- Number of possible samples: $\binom{N}{n} = \frac{N!}{n!(N-n)!}$
- Estimand = population mean vs. estimator = sample mean (a simulation sketch follows below):
  $$\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^N Z_i X_i$$
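The estimand/estimator distinction can be checked by simulation. Below is a minimal sketch, not part of the slides, that draws repeated simple random samples from a hypothetical finite population and confirms that the average of the sample means is close to the population mean. The population values, $N$, $n$, and the number of simulations are made-up illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population of size N (values are made up for illustration)
N, n = 1000, 50
X = rng.normal(loc=10, scale=3, size=N)   # population values X_1, ..., X_N
pop_mean = X.mean()                       # estimand: population mean

# Repeated simple random sampling: each subset of size n is equally likely,
# so Pr(Z_i = 1) = n / N for every unit i
n_sims = 10_000
sample_means = np.empty(n_sims)
for s in range(n_sims):
    idx = rng.choice(N, size=n, replace=False)  # one simple random sample
    sample_means[s] = X[idx].mean()             # estimator: sample mean

print("population mean      :", round(pop_mean, 4))
print("avg. of sample means :", round(sample_means.mean(), 4))  # approximately unbiased
```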
Estimation of Population Mean

- Design-based inference
- Key idea: randomness comes from sampling alone
- Unbiasedness (over repeated sampling): $\mathbb{E}(\bar{x}) = \bar{X}$
- Variance of the sampling distribution:
  $$\mathbb{V}(\bar{x}) = \underbrace{\left(1 - \frac{n}{N}\right)}_{\text{finite population correction}} \frac{S^2}{n}$$
  where $S^2 = \sum_{i=1}^N (X_i - \bar{X})^2 / (N-1)$ is the population variance
- Unbiased estimator of the variance:
  $$\hat{\sigma}^2 \equiv \left(1 - \frac{n}{N}\right)\frac{s^2}{n} \quad \text{and} \quad \mathbb{E}(\hat{\sigma}^2) = \mathbb{V}(\bar{x})$$
  where $s^2 = \sum_{i=1}^N Z_i (X_i - \bar{x})^2 / (n-1)$ is the sample variance
- Plug-in (sample analogue) principle

Some VERY Important Identities in Statistics

1. $\mathbb{V}(X) = \mathbb{E}(X^2) - \{\mathbb{E}(X)\}^2$
2. $\mathrm{Cov}(X, Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$
3. Law of iterated expectation: $\mathbb{E}(X) = \mathbb{E}\{\mathbb{E}(X \mid Y)\}$
4. Law of total variance:
   $$\mathbb{V}(X) = \underbrace{\mathbb{E}\{\mathbb{V}(X \mid Y)\}}_{\text{within-group variance}} + \underbrace{\mathbb{V}\{\mathbb{E}(X \mid Y)\}}_{\text{between-group variance}}$$
5. Mean squared error decomposition:
   $$\mathbb{E}\{(\hat{\theta} - \theta)^2\} = \underbrace{\{\mathbb{E}(\hat{\theta} - \theta)\}^2}_{\text{bias}^2} + \underbrace{\mathbb{V}(\hat{\theta})}_{\text{variance}}$$

Analytical Details of Randomization Inference

1. $\mathbb{E}(Z_i) = \mathbb{E}(Z_i^2) = n/N$ and $\mathbb{V}(Z_i) = \mathbb{E}(Z_i^2) - \mathbb{E}(Z_i)^2 = \frac{n}{N}\left(1 - \frac{n}{N}\right)$
2. $\mathbb{E}(Z_i Z_j) = \mathbb{E}(Z_i \mid Z_j = 1)\Pr(Z_j = 1) = \frac{n(n-1)}{N(N-1)}$ for $i \neq j$, and thus
   $$\mathrm{Cov}(Z_i, Z_j) = \mathbb{E}(Z_i Z_j) - \mathbb{E}(Z_i)\mathbb{E}(Z_j) = -\frac{n}{N(N-1)}\left(1 - \frac{n}{N}\right)$$
3. Use these results to derive the expression:
   $$\mathbb{V}(\bar{x}) = \frac{1}{n^2}\mathbb{V}\left(\sum_{i=1}^N Z_i X_i\right)
   = \frac{1}{n^2}\left\{\sum_{i=1}^N X_i^2\,\mathbb{V}(Z_i) + \sum_{i=1}^N \sum_{j \neq i} X_i X_j\,\mathrm{Cov}(Z_i, Z_j)\right\}
   = \frac{1}{n}\left(1 - \frac{n}{N}\right)\underbrace{\frac{1}{N-1}\left(\sum_{i=1}^N X_i^2 - N\bar{X}^2\right)}_{= S^2}$$
   where we used the equality $\sum_{i=1}^N (X_i - \bar{X})^2 = \sum_{i=1}^N X_i^2 - N\bar{X}^2$
4. Finally, we proceed as follows:
   $$\mathbb{E}\left\{\sum_{i=1}^N Z_i (X_i - \bar{x})^2\right\}
   = \mathbb{E}\left[\sum_{i=1}^N Z_i \{\underbrace{(X_i - \bar{X}) + (\bar{X} - \bar{x})}_{\text{add \& subtract } \bar{X}}\}^2\right]
   = \mathbb{E}\left\{\sum_{i=1}^N Z_i (X_i - \bar{X})^2 - n(\bar{X} - \bar{x})^2\right\}
   = \mathbb{E}\left\{\sum_{i=1}^N Z_i (X_i - \bar{X})^2\right\} - n\mathbb{V}(\bar{x})
   = \frac{n(N-1)}{N}S^2 - \left(1 - \frac{n}{N}\right)S^2
   = (n-1)S^2$$
   Thus $\mathbb{E}(s^2) = S^2$, implying that the sample variance is unbiased for the population variance

Inverse Probability Weighting

- Unequal sampling probabilities: $\Pr(Z_i = 1) = \pi_i$ for each $i$
- We still randomly sample $n$ units from the population of size $N$, where $\sum_{i=1}^N Z_i = n$ implies $\sum_{i=1}^N \pi_i = n$
- Oversampling of minorities, difficult-to-reach individuals, etc.
- Sampling weights = inverse of the sampling probability
- Horvitz-Thompson estimator (see the simulation sketch below):
  $$\tilde{x} = \frac{1}{N}\sum_{i=1}^N \frac{Z_i X_i}{\pi_i}$$
- Unbiasedness: $\mathbb{E}(\tilde{x}) = \bar{X}$
- Design-based variance is complicated but available
- Hájek estimator (biased but possibly more efficient):
  $$\tilde{x}^* = \frac{\sum_{i=1}^N Z_i X_i / \pi_i}{\sum_{i=1}^N Z_i / \pi_i}$$
- Unknown sampling probability → post-stratification
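The following minimal sketch, not part of the slides, illustrates the Horvitz-Thompson and Hájek estimators under unequal but known inclusion probabilities. For simplicity it uses independent (Poisson-type) sampling, where unit $i$ is included with probability $\pi_i$, rather than a fixed-size-$n$ design, so it only approximates the setting above; the population values and probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population with unequal, known inclusion probabilities pi_i;
# all numbers here are made up for illustration.
N = 2000
X = rng.gamma(2.0, 5.0, size=N)           # population values X_1, ..., X_N
pi = rng.uniform(0.01, 0.20, size=N)      # inclusion probabilities pi_i

n_sims = 5_000
ht, hajek = np.empty(n_sims), np.empty(n_sims)
for s in range(n_sims):
    # Poisson-type sampling: unit i included independently with probability pi_i
    # (the slides assume a fixed sample size n; this is a simplification)
    Z = rng.random(N) < pi
    ht[s] = np.sum(X[Z] / pi[Z]) / N                       # Horvitz-Thompson estimator
    hajek[s] = np.sum(X[Z] / pi[Z]) / np.sum(1.0 / pi[Z])  # Hajek estimator

print("population mean  :", round(float(X.mean()), 3))
print("HT    (mean, sd) :", round(float(ht.mean()), 3), round(float(ht.std()), 3))
print("Hajek (mean, sd) :", round(float(hajek.mean()), 3), round(float(hajek.std()), 3))
```

The printout lets you compare the two estimators over repeated samples: the Horvitz-Thompson average should sit close to the population mean, while the Hájek estimator may show a small bias but, depending on the design, a smaller spread.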
Model-Based Inference

- An infinite population characterized by a probability model
  - Nonparametric: $F$
  - Parametric: $F_\theta$ (e.g., $\mathcal{N}(\mu, \sigma^2)$)
- A simple random sample of size $n$: $X_1, \ldots, X_n$
- Assumption: $X_i$ is independently and identically distributed (i.i.d.) according to $F$
- Estimator = sample mean vs. estimand = population mean:
  $$\hat{\mu} \equiv \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad \mu \equiv \mathbb{E}(X_i)$$
- Unbiasedness: $\mathbb{E}(\hat{\mu}) = \mu$
- Variance and its unbiased estimator:
  $$\mathbb{V}(\hat{\mu}) = \frac{\sigma^2}{n} \quad \text{and} \quad \hat{\sigma}^2 \equiv \frac{1}{n-1}\sum_{i=1}^n (X_i - \hat{\mu})^2$$
  where $\sigma^2 = \mathbb{V}(X_i)$

(Weak) Law of Large Numbers (LLN)

- If $\{X_i\}_{i=1}^n$ is a sequence of i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, then
  $$\bar{X}_n \overset{p}{\longrightarrow} \mu$$
  where "$\overset{p}{\longrightarrow}$" denotes convergence in probability, i.e., $X_n \overset{p}{\longrightarrow} x$ means $\lim_{n \to \infty} \Pr(|X_n - x| > \epsilon) = 0$ for any $\epsilon > 0$
- If $X_n \overset{p}{\longrightarrow} x$, then for any continuous function $f(\cdot)$, we have $f(X_n) \overset{p}{\longrightarrow} f(x)$
- Implication: justifies the plug-in (sample analogue) principle

LLN in Action

In the Journal of Theoretical Biology:
1. "Big and Tall Parents Have More Sons" (2005)
2. "Engineers Have More Sons, Nurses Have More Daughters" (2005)
3. "Violent Men Have More Sons" (2006)
4. "Beautiful Parents Have More Daughters" (2007)

These findings received broad attention in the news media. For example, the popular Freakonomics blog reported:

"A new study by Satoshi Kanazawa, an evolutionary psychologist at the London School of Economics, suggests ... there are more beautiful women in the world than there are handsome men. Why? Kanazawa argues it's because good-looking parents are 36 percent more likely to have a baby daughter as their first child than a baby son—which suggests, evolutionarily speaking, that beauty is a trait more valuable for women than for men. The study was conducted with data from 3,000 Americans, derived from the National Longitudinal Study of Adolescent Health, and was published in the Journal of Theoretical Biology."

"Law of Averages in action" figure (Gelman & Weakliem, American Scientist): percentage of girls among the children of People magazine's "50 Most Beautiful People," by the year the parents were selected:

1995: 57.1%
1996: 56.6%
1997: 51.8%
1998: 50.6%
1999: 49.3%
2000: 50.0%
No duplicates: 47.7%
Population frequency: 48.5%

[Figure caption: "The authors performed a sex-ratio study of the offspring of the most beautiful people in the world as selected by People magazine between 1995 and 2000. The girls started strong in 1995 with 32 girls to 24 boys. Girls continued strong in 1996. However, as the sample size grew, the ratio converged on the population frequency, concluding with 157 girls and 172 boys, or 47.7 percent girls, approximately the same as the population frequency of 48.5 percent."]

From Gelman & Weakliem (American Scientist): Publication in a peer-reviewed journal seemed to have removed all skepticism, which is noteworthy given that the authors of Freakonomics are themselves well qualified to judge social science research. In addition, the estimated effect grew during the reporting. As noted above, the 4.7 percent (and not statistically significant) difference in the data became 8 percent in Kanazawa's choice of the largest comparison (the most attractive group versus the average of the four least attractive groups). The data are available for download at http://www.stat.columbia.edu/~gelman/research/beautiful/. As of 2007, the 50 most beautiful people of 1995 had had 32 girls and 24 boys, or 57.1 percent girls, which is 8.6 percentage points higher than the population frequency of 48.5 percent. Removing the duplicates, such as Brad Pitt, People's most beautiful people from 1995 to 2000 have had 157 girls out of 329 children, or 47.7 percent girls (with a standard error of 2.8 percent), a statistically insignificant 0.8 percentage points lower than the population frequency.
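The convergence shown in that figure is the weak LLN at work. Below is a minimal sketch, not part of the slides, that mimics it with simulated data: i.i.d. Bernoulli "births" with girl probability 0.485 (the population frequency cited above), tracking the running proportion of girls as the sample grows. The i.i.d. model and sample sizes are illustrative assumptions, not the actual People magazine data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each simulated birth is a girl with probability 0.485 (the cited population frequency);
# this Bernoulli model is purely illustrative.
p_girl = 0.485
births = rng.random(100_000) < p_girl

# Running proportion of girls after the first k births
running_prop = np.cumsum(births) / np.arange(1, births.size + 1)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:>6} births: proportion girls = {running_prop[k - 1]:.3f}")

# The running proportion drifts widely for small samples (the "early years" effect in the
# figure) and settles near 0.485 as the sample grows, as the weak LLN predicts.
```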