Basic Principles of Statistical Inference


Kosuke Imai, Department of Politics, Princeton University
POL572 Quantitative Analysis II, Spring 2016

What is Statistics?

- A relatively new discipline: a scientific revolution in the 20th century, followed by data and computing revolutions in the 21st century
- The world is stochastic rather than deterministic, and probability theory is used to model stochastic events
- Statistical inference: learning about what we do not observe (parameters) using what we observe (data)
- Without statistics: a wild guess. With statistics: a principled guess, resting on (1) assumptions, (2) formal properties, and (3) a measure of uncertainty

Three Modes of Statistical Inference

1. Descriptive inference: summarizing and exploring data
   - Inferring "ideal points" from roll-call votes
   - Inferring "topics" from texts and speeches
   - Inferring "social networks" from surveys
2. Predictive inference: forecasting out-of-sample data points
   - Inferring future state failures from past failures
   - Inferring population average turnout from a sample of voters
   - Inferring individual-level behavior from aggregate data
3. Causal inference: predicting counterfactuals
   - Inferring the effects of ethnic minority rule on civil war onset
   - Inferring why incumbency status affects election outcomes
   - Inferring whether the lack of war among democracies can be attributed to regime types

Statistics for Social Scientists

Quantitative social science research:
1. Find a substantive question
2. Construct theory and hypothesis
3. Design an empirical study and collect data
4. Use statistics to analyze data and test the hypothesis
5. Report the results

No study in the social sciences is perfect. Use the best available methods and data, but be aware of their limitations. There are many wrong answers but no single right answer. Credibility of data analysis:
Data analysis = assumption (subjective) + statistical theory (objective) + interpretation (subjective). Statistical methods are no substitute for good research design.

Sample Surveys

- A large population of size $N$. Finite population: $N < \infty$; super population: $N = \infty$
- A simple random sample of size $n$
- Probability sampling: e.g., stratified, cluster, systematic sampling
- Non-probability sampling: e.g., quota, volunteer, snowball sampling
- The population: $X_i$ for $i = 1, \dots, N$
- Sampling (binary) indicator: $Z_1, \dots, Z_N$
- Assumption: $\sum_{i=1}^N Z_i = n$ and $\Pr(Z_i = 1) = n/N$ for all $i$
- Number of possible samples: $\binom{N}{n} = \frac{N!}{n!(N-n)!}$
- Estimand = population mean vs. estimator = sample mean:
  $$\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^N Z_i X_i$$

Estimation of the Population Mean

- Design-based inference. Key idea: randomness comes from sampling alone
- Unbiasedness (over repeated sampling): $E(\bar{x}) = \bar{X}$
- Variance of the sampling distribution:
  $$V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$$
  where $(1 - n/N)$ is the finite population correction and $S^2 = \sum_{i=1}^N (X_i - \bar{X})^2/(N-1)$ is the population variance
- Unbiased estimator of the variance:
  $$\hat{\sigma}^2 \equiv \left(1 - \frac{n}{N}\right)\frac{s^2}{n} \quad \text{with} \quad E(\hat{\sigma}^2) = V(\bar{x})$$
  where $s^2 = \sum_{i=1}^N Z_i (X_i - \bar{x})^2/(n-1)$ is the sample variance
- Plug-in (sample analogue) principle

Some VERY Important Identities in Statistics

1. $V(X) = E(X^2) - \{E(X)\}^2$
2. $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$
3. Law of iterated expectation: $E(X) = E\{E(X \mid Y)\}$
4. Law of total variance: $V(X) = E\{V(X \mid Y)\} + V\{E(X \mid Y)\}$, i.e., within-group variance plus between-group variance
5. Mean squared error decomposition: $E\{(\hat{\theta} - \theta)^2\} = \{E(\hat{\theta} - \theta)\}^2 + V(\hat{\theta})$, i.e., squared bias plus variance

Analytical Details of Randomization Inference
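The unbiasedness and finite-population-correction claims above can be checked exactly by enumerating every simple random sample from a tiny population. A minimal Python sketch (the population values are made up for illustration):

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population (values chosen only for illustration)
X = [3, 7, 8, 12, 15, 21]
N, n = len(X), 3

Xbar = mean(X)                                   # population mean
S2 = sum((x - Xbar) ** 2 for x in X) / (N - 1)   # population variance S^2

# Enumerate all (N choose n) equally likely simple random samples
samples = list(combinations(X, n))
sample_means = [mean(s) for s in samples]

# Unbiasedness over repeated sampling: E(x-bar) = X-bar
E_xbar = mean(sample_means)

# Exact design-based variance vs. the formula (1 - n/N) S^2 / n
V_exact = sum((m - Xbar) ** 2 for m in sample_means) / len(samples)
V_formula = (1 - n / N) * S2 / n
print(E_xbar, V_exact, V_formula)
```

Because every sample is enumerated with equal probability, the two variance numbers agree to floating-point precision, which is exactly the finite-population-correction result on the slide.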
1. $E(Z_i) = n/N$ and $V(Z_i) = E(Z_i^2) - \{E(Z_i)\}^2 = \frac{n}{N}\left(1 - \frac{n}{N}\right)$
2. $E(Z_i Z_j) = E(Z_i \mid Z_j = 1)\Pr(Z_j = 1) = \frac{n(n-1)}{N(N-1)}$ for $i \neq j$, and thus
   $$\mathrm{Cov}(Z_i, Z_j) = E(Z_i Z_j) - E(Z_i)E(Z_j) = -\frac{n}{N(N-1)}\left(1 - \frac{n}{N}\right)$$
3. Use these results to derive the expression:
   $$V(\bar{x}) = \frac{1}{n^2} V\left(\sum_{i=1}^N Z_i X_i\right)
   = \frac{1}{n^2}\left\{\sum_{i=1}^N X_i^2\, V(Z_i) + \sum_{i=1}^N \sum_{j \neq i} X_i X_j\, \mathrm{Cov}(Z_i, Z_j)\right\}
   = \frac{1}{n}\left(1 - \frac{n}{N}\right)\underbrace{\frac{1}{N(N-1)}\left\{N\sum_{i=1}^N X_i^2 - \left(\sum_{i=1}^N X_i\right)^2\right\}}_{=\,S^2}$$
   where we used the equality $\sum_{i=1}^N (X_i - \bar{X})^2 = \sum_{i=1}^N X_i^2 - N\bar{X}^2$
4. Finally, we proceed as follows:
   $$E\left\{\sum_{i=1}^N Z_i (X_i - \bar{x})^2\right\}
   = E\left[\sum_{i=1}^N Z_i \{\underbrace{(X_i - \bar{X}) + (\bar{X} - \bar{x})}_{\text{add \& subtract}}\}^2\right]
   = E\left\{\sum_{i=1}^N Z_i (X_i - \bar{X})^2 - n(\bar{X} - \bar{x})^2\right\}
   = E\left\{\sum_{i=1}^N Z_i (X_i - \bar{X})^2\right\} - n\,V(\bar{x})
   = \frac{n(N-1)}{N} S^2 - \left(1 - \frac{n}{N}\right) S^2 = (n-1) S^2$$
   Thus $E(s^2) = S^2$, implying that the sample variance is unbiased for the population variance.

Inverse Probability Weighting

- Unequal sampling probability: $\Pr(Z_i = 1) = \pi_i$ for each $i$
- We still randomly sample $n$ units from the population of size $N$, where $\sum_{i=1}^N Z_i = n$ implies $\sum_{i=1}^N \pi_i = n$
- Oversampling of minorities, difficult-to-reach individuals, etc.
- Sampling weight = inverse of the sampling probability
- Horvitz-Thompson estimator:
  $$\tilde{x} = \frac{1}{N}\sum_{i=1}^N \frac{Z_i X_i}{\pi_i}$$
- Unbiasedness: $E(\tilde{x}) = \bar{X}$. The design-based variance is complicated but available
- Hájek estimator (biased but possibly more efficient):
  $$\tilde{x}^* = \frac{\sum_{i=1}^N Z_i X_i / \pi_i}{\sum_{i=1}^N Z_i / \pi_i}$$
- Unknown sampling probability: post-stratification

Model-Based Inference

- An infinite population characterized by a probability model
- Nonparametric: $F$. Parametric: $F_\theta$ (e.g., $N(\mu, \sigma^2)$)
- A simple random sample of size $n$: $X_1, \dots, X_n$
- Assumption: $X_i$ is independently and identically distributed (i.i.d.) according to $F$
- Estimator = sample mean vs.
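The Horvitz-Thompson estimator's unbiasedness can be seen by taking its exact expectation over all inclusion patterns. The sketch below uses independent (Poisson) sampling with made-up values and inclusion probabilities, rather than the fixed-n design of the slides, to keep the enumeration simple; HT unbiasedness holds in both cases:

```python
# Hypothetical population with unequal inclusion probabilities
# (values and probabilities are made up for illustration)
X  = [10.0, 20.0, 30.0, 40.0]
pi = [0.8, 0.5, 0.5, 0.2]     # Pr(Z_i = 1); sum(pi) = expected sample size
N = len(X)
Xbar = sum(X) / N

def ht_estimate(Z):
    # Horvitz-Thompson: weight each sampled unit by 1/pi_i
    return sum(z * x / p for z, x, p in zip(Z, X, pi)) / N

# Exact expectation of the HT estimator under independent (Poisson) sampling:
# sum over all 2^N inclusion patterns, weighted by their probabilities
E_ht = 0.0
for pattern in range(2 ** N):
    Z = [(pattern >> i) & 1 for i in range(N)]
    prob = 1.0
    for z, p in zip(Z, pi):
        prob *= p if z else (1 - p)
    E_ht += prob * ht_estimate(Z)

print(E_ht)   # matches the population mean Xbar up to rounding
```

The inverse weights exactly cancel the unequal inclusion probabilities in expectation, which is the whole point of the estimator.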
estimand = population mean:
  $$\hat{\mu} \equiv \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad \mu \equiv E(X_i)$$
- Unbiasedness: $E(\hat{\mu}) = \mu$
- Variance and its unbiased estimator:
  $$V(\hat{\mu}) = \frac{\sigma^2}{n} \quad \text{and} \quad \hat{\sigma}^2 \equiv \frac{1}{n-1}\sum_{i=1}^n (X_i - \hat{\mu})^2$$
  where $\sigma^2 = V(X_i)$

(Weak) Law of Large Numbers (LLN)

If $\{X_i\}_{i=1}^n$ is a sequence of i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, then
$$\bar{X}_n \overset{p}{\longrightarrow} \mu$$
where "$\overset{p}{\longrightarrow}$" denotes convergence in probability, i.e., $X_n \overset{p}{\longrightarrow} x$ means $\lim_{n \to \infty} \Pr(|X_n - x| > \epsilon) = 0$ for any $\epsilon > 0$.

If $X_n \overset{p}{\longrightarrow} x$, then for any continuous function $f(\cdot)$ we have $f(X_n) \overset{p}{\longrightarrow} f(x)$.

Implication: this justifies the plug-in (sample analogue) principle.

LLN in Action

Four articles in the Journal of Theoretical Biology:
1. "Big and Tall Parents Have More Sons" (2005)
2. "Engineers Have More Sons, Nurses Have More Daughters" (2005)
3. "Violent Men Have More Sons" (2006)
4. "Beautiful Parents Have More Daughters" (2007)

The findings received broad attention in the news media. For example, the popular Freakonomics blog reported:

"A new study by Satoshi Kanazawa, an evolutionary psychologist at the London School of Economics, suggests there are more beautiful women in the world than there are handsome men. Why? Kanazawa argues it's because good-looking parents are 36 percent more likely to have a baby daughter as their first child than a baby son—which suggests, evolutionarily speaking, that beauty is a trait more valuable for women than for men. The study was conducted with data from 3,000 Americans, derived from the National Longitudinal Study of Adolescent Health, and was published in the Journal of Theoretical Biology."

The "law of averages" in action: the percentage of girls among the offspring of People magazine's "most beautiful people," by year:
1995: 57.1%; 1996: 56.6%; 1997: 51.8%; 1998: 50.6%; 1999: 49.3%; 2000: 50.0%. With no duplicates: 47.7%.
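The LLN can be watched in action with simulated draws. A minimal sketch (the uniform distribution, seed, and sample sizes are arbitrary choices):

```python
import random
from statistics import mean

random.seed(0)

# Draw i.i.d. variables with mean mu = 0.5 (uniform on [0, 1])
# and watch the running sample mean settle near mu as n grows.
mu = 0.5
draws = [random.random() for _ in range(100_000)]

for n in (10, 1_000, 100_000):
    print(n, mean(draws[:n]))
```

Early sample means wander, but the deviation from 0.5 shrinks on the order of $1/\sqrt{n}$, which is the same convergence the sex-ratio example below illustrates with real data.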
Population frequency: 48.5%.

Figure 4 (Gelman & Weakliem, American Scientist): the authors performed a sex-ratio study of the offspring of the most beautiful people in the world as selected by People magazine between 1995 and 2000. The girls started strong in 1995, with 32 girls to 24 boys, and continued strong in 1996. However, as the sample size grew, the ratio converged on the population frequency, concluding with 157 girls and 172 boys, or 47.7 percent girls, approximately the same as the population frequency of 48.5 percent.

Publication in a peer-reviewed journal seemed to have removed all skepticism, which is noteworthy given that the authors of Freakonomics are themselves well qualified to judge social science research. In addition, the estimated effect grew during the reporting. As noted above, the 4.7 percent (and not statistically significant) difference in the data became 8 percent in Kanazawa's choice of the largest comparison (most attractive group versus the average of the four least attractive groups). As of 2007, the 50 most beautiful people of 1995 had 32 girls and 24 boys, or 57.1 percent girls, which is 8.6 percentage points higher than the population frequency of 48.5 percent. Removing the duplicates, such as Brad Pitt, People's most beautiful people from 1995 to 2000 have had 157 girls out of 329 children, or 47.7 percent girls (with a standard error of 2.8 percent), a statistically insignificant 0.8 percentage points lower than the population frequency. The data are available for download at http://www.stat.columbia.edu/~gelman/research/beautiful/
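The reported standard error can be reproduced with the usual binomial formula $\sqrt{\hat{p}(1-\hat{p})/n}$, a quick check using only the counts quoted above:

```python
import math

# Check the article's numbers: 157 girls out of 329 children
girls, n = 157, 329
p_hat = girls / n                        # ~ 0.477, the reported 47.7%
se = math.sqrt(p_hat * (1 - p_hat) / n)  # ~ 0.028, the reported 2.8%

# Difference from the population frequency of 48.5%, in standard errors
z = (p_hat - 0.485) / se
print(round(p_hat, 3), round(se, 3), round(z, 2))
```

The difference is well under one standard error, so the "beautiful parents" sample is statistically indistinguishable from the population frequency, exactly as the figure caption says.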
Recommended publications
  • Statistical Inference: How Reliable Is a Survey?
Math 203, Fall 2008: Statistical Inference: How reliable is a survey? Consider a survey with a single question, to which respondents are asked to give an answer of yes or no. Suppose you pick a random sample of n people, and you find that the proportion that answered yes is p̂. Question: How close is p̂ to the actual proportion p of people in the whole population who would have answered yes? In order for there to be a reliable answer to this question, the sample size, n, must be big enough so that the sample distribution is close to a bell-shaped curve (i.e., close to a normal distribution). But even if n is big enough that the distribution is close to a normal distribution, usually you need to make n even bigger in order to make sure your margin of error is reasonably small. Thus the first thing to do is to be sure n is big enough for the sample distribution to be close to normal. The industry standard for being close enough is for n to be big enough so that
$$n > 9\,\frac{1-p}{p} \quad \text{and} \quad n > 9\,\frac{p}{1-p}$$
both hold. When p is about 50%, n can be as small as 10, but when p gets close to 0 or close to 1, the sample size n needs to get bigger. If p is 1% or 99%, then n must be at least 892, for example. (Note also that n here depends on p but not on the size of the whole population.) See Figures 1 and 2 showing frequency histograms for the number of yes respondents if p = 1% when the sample size n is 10 versus 1000 (this data was obtained by running a computer simulation taking 10000 samples).
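The rule of thumb above can be turned into a tiny sample-size calculator. Exact rational arithmetic avoids floating-point rounding at the boundary (a sketch; the function name is ours):

```python
from fractions import Fraction
import math

def min_sample_size(p):
    """Smallest integer n with n > 9(1-p)/p and n > 9p/(1-p),
    the note's rule of thumb for near-normality of the sample proportion."""
    p = Fraction(p)
    bound = 9 * max((1 - p) / p, p / (1 - p))
    return math.floor(bound) + 1   # strict inequality: next integer above

print(min_sample_size(Fraction(1, 2)))    # 10, as stated in the note
print(min_sample_size(Fraction(1, 100)))  # 892
print(min_sample_size(Fraction(99, 100))) # 892
```

With `p = 1/100` the bound is exactly 9 × 99 = 891, so the smallest admissible n is 892, matching the note; Fraction arithmetic keeps that boundary exact where plain floats would not.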
  • Computing the Oja Median in R: the Package Ojanp
Computing the Oja Median in R: The Package OjaNP. Daniel Fischer (Natural Resources Institute Finland (Luke), and School of Health Sciences, University of Tampere, Finland), Karl Mosler (Institute of Econometrics and Statistics, University of Cologne, Germany), Jyrki Möttönen (Department of Social Research, University of Helsinki, Finland), Klaus Nordhausen (Department of Mathematics and Statistics, University of Turku, Finland, and School of Health Sciences, University of Tampere, Finland), Oleksii Pokotylo (Cologne Graduate School, University of Cologne, Germany), Daniel Vogel (Institute of Complex Systems and Mathematical Biology, University of Aberdeen, United Kingdom). June 27, 2016. arXiv:1606.07620v1 [stat.CO] 24 Jun 2016. Abstract: The Oja median is one of several extensions of the univariate median to the multivariate case. It has many nice properties, but is computationally demanding. In this paper, we first review the properties of the Oja median and compare it to other multivariate medians. Afterwards we discuss four algorithms to compute the Oja median, which are implemented in our R package OjaNP. Besides these algorithms, the package also contains functions to compute Oja signs, Oja signed ranks, Oja ranks, and the related scatter concepts. To illustrate their use, the corresponding multivariate one- and C-sample location tests are implemented. Keywords: Oja median, Oja signs, Oja signed ranks, Oja ranks, R, C, C++. 1 Introduction. The univariate median is a popular location estimator. It is, however, not straightforward to generalize it to the multivariate case, since no known generalization retains all properties of the univariate estimator, and therefore different generalizations emphasize different properties of the univariate median.
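For intuition, the bivariate Oja median can be approximated by brute force: it is the point minimizing the total area of the triangles it forms with all pairs of data points. This toy grid search (made-up data) is nothing like the efficient algorithms in OjaNP, but it shows the definition at work:

```python
from itertools import combinations

def oja_objective(theta, data):
    """Sum of areas of triangles formed by theta and every pair of points."""
    tx, ty = theta
    total = 0.0
    for (x1, y1), (x2, y2) in combinations(data, 2):
        # twice the triangle area, via the cross product
        total += abs((x2 - x1) * (ty - y1) - (tx - x1) * (y2 - y1))
    return total / 2

# Made-up, symmetric point cloud centered at (2, 2)
data = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]

# Search over a coarse grid of candidate locations
grid = [(x / 2, y / 2) for x in range(9) for y in range(9)]
best = min(grid, key=lambda t: oja_objective(t, data))
print(best)  # (2.0, 2.0), the center of symmetry
```

The objective is convex and symmetric here, so the minimizer sits at the center of the cloud; the real computational difficulty the paper addresses is that the number of triangle terms grows quadratically (and, in higher dimensions, combinatorially) with the sample size.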
  • Statistical Inferences Hypothesis Tests
STATISTICAL INFERENCES: HYPOTHESIS TESTS. Małgorzata Murat. BASIC IDEAS. Suppose we have collected a representative sample that gives some information concerning a mean or other statistical quantity. There are two main questions for which statistical inference may provide answers. 1. Are the sample quantity and a corresponding population quantity close enough together so that it is reasonable to say that the sample might have come from the population? Or are they far enough apart so that they likely represent different populations? 2. For what interval of values can we have a specific level of confidence that the interval contains the true value of the parameter of interest? STATISTICAL INFERENCES FOR THE MEAN. We can divide statistical inferences for the mean into two main categories. In one category, we already know the variance or standard deviation of the population, usually from previous measurements, and the normal distribution can be used for calculations. In the other category, we find an estimate of the variance or standard deviation of the population from the sample itself. The normal distribution is assumed to apply to the underlying population, but another distribution related to it will usually be required for calculations. TEST OF HYPOTHESIS. We are testing the hypothesis that a sample is similar enough to a particular population so that it might have come from that population. Hence we make the null hypothesis that the sample came from a population having the stated value of the population characteristic (the mean or the variation). Then we do calculations to see how reasonable such a hypothesis is. We have to keep in mind the alternative if the null hypothesis is not true, as the alternative will affect the calculations.
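The first category above (population standard deviation known) corresponds to the z-test, which fits in a few lines. A sketch with made-up measurements, where sigma is assumed known from past data:

```python
import math

def z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mean = mu0 when the population
    standard deviation sigma is known (the note's first category)."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # two-sided p-value from the standard normal CDF (via erf)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Made-up measurements; suppose sigma = 2.0 is known from previous work
sample = [10.2, 11.1, 9.8, 10.5, 10.9, 11.4, 10.0, 10.7]
z, p = z_test(sample, mu0=10.0, sigma=2.0)
print(round(z, 2), round(p, 3))
```

A large p-value means the sample is "close enough" to the hypothesized population mean, in exactly the sense of the abstract's question 1; the second category (sigma estimated from the sample) would swap the normal reference distribution for Student's t.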
  • SAMPLING DESIGN & WEIGHTING in the Original
Appendix A: Sampling Design & Weighting. In the original National Science Foundation grant, support was given for a modified probability sample. Samples for the 1972 through 1974 surveys followed this design. This modified probability design, described below, introduces the quota element at the block level. The NSF renewal grant, awarded for the 1975-1977 surveys, provided funds for a full probability sample design, a design which is acknowledged to be superior. Thus, having the wherewithal to shift to a full probability sample with predesignated respondents, the 1975 and 1976 studies were conducted with a transitional sample design, viz., one-half full probability and one-half block quota. The sample was divided into two parts for several reasons: 1) to provide data for possibly interesting methodological comparisons; and 2) on the chance that there are some differences over time, that it would be possible to assign these differences to either shifts in sample designs, or changes in response patterns. For example, if the percentage of respondents who indicated that they were "very happy" increased by 10 percent between 1974 and 1976, it would be possible to determine whether it was due to changes in sample design, or an actual increase in happiness. There is considerable controversy and ambiguity about the merits of these two samples. Textbook tests of significance assume full rather than modified probability samples, and simple random rather than clustered random samples. In general, the question of what to do with a mixture of samples is no easier solved than the question of what to do with the "pure" types.
  • The Focused Information Criterion in Logistic Regression to Predict Repair of Dental Restorations
THE FOCUSED INFORMATION CRITERION IN LOGISTIC REGRESSION TO PREDICT REPAIR OF DENTAL RESTORATIONS. Cecilia Candolo. ABSTRACT: Statistical data analysis typically has several stages: exploration of the data set; deciding on a class or classes of models to be considered; selecting the best of them according to some criterion and making inferences based on the selected model. The cycle is usually iterative and will involve subject-matter considerations as well as statistical insights. The conclusion reached after such a process depends on the model(s) selected, but the consequent uncertainty is not usually incorporated into the inference. This may lead to underestimation of the uncertainty about quantities of interest and to overoptimistic and biased inferences. This framework has been the aim of research under the terminology of model uncertainty and model averaging, in both frequentist and Bayesian approaches. The former usually uses Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and the bootstrap method. The latter weights the models using the posterior model probabilities. This work considers model selection uncertainty in logistic regression under frequentist and Bayesian approaches, incorporating the use of the focused information criterion (FIC) (Claeskens and Hjort, 2003) to predict repair of dental restorations. The FIC takes the view that a best model should depend on the parameter under focus, such as the mean, or the variance, or the particular covariate values. In this study, the repair or not of dental restorations in a period of eighteen months depends on several covariates measured in teenagers. The data were kindly provided by Juliana Feltrin de Souza, a doctorate student at the Faculty of Dentistry of Araraquara - UNESP.
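The frequentist criteria the abstract mentions (AIC, BIC) and the resulting model weights are easy to compute once each candidate model's log-likelihood and parameter count are known. The values below are hypothetical, and this shows only the AIC/BIC baseline, not the FIC itself:

```python
import math

# Hypothetical candidate logistic regressions: (log-likelihood, # parameters)
models = {
    "intercept only": (-120.0, 1),
    "+ age":          (-112.5, 2),
    "+ age + sex":    (-111.9, 3),
}
n = 200  # hypothetical sample size

aics = {}
for name, (loglik, k) in models.items():
    aics[name] = -2 * loglik + 2 * k          # AIC = -2 logL + 2k
    bic = -2 * loglik + k * math.log(n)       # BIC = -2 logL + k log n
    print(f"{name}: AIC={aics[name]:.1f} BIC={bic:.1f}")

# Akaike weights: relative support for each model given the data
amin = min(aics.values())
raw = {name: math.exp(-(a - amin) / 2) for name, a in aics.items()}
total = sum(raw.values())
weights = {name: w / total for name, w in raw.items()}
print(weights)
```

The FIC replaces the single all-purpose score with one tailored to the focus parameter, but spreading weight across near-tied models (rather than conditioning on one winner) is the shared idea behind the model-averaging approaches the abstract describes.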
  • Summary of Human Subjects Protection Issues Related to Large Sample Surveys
Summary of Human Subjects Protection Issues Related to Large Sample Surveys. U.S. Department of Justice, Bureau of Justice Statistics. Joan E. Sieber. June 2001, NCJ 187692. U.S. Department of Justice, Office of Justice Programs. John Ashcroft, Attorney General. Bureau of Justice Statistics, Lawrence A. Greenfeld, Acting Director. Report of work performed under a BJS purchase order to Joan E. Sieber, Department of Psychology, California State University at Hayward, Hayward, California 94542, (510) 538-5424, e-mail [email protected]. The author acknowledges the assistance of Caroline Wolf Harlow, BJS Statistician and project monitor. Ellen Goldberg edited the document. Contents of this report do not necessarily reflect the views or policies of the Bureau of Justice Statistics or the Department of Justice. This report and others from the Bureau of Justice Statistics are available through the Internet at http://www.ojp.usdoj.gov/bjs
Table of Contents:
1. Introduction (Limitations of the Common Rule with respect to survey research)
2. Risks and benefits of participation in sample surveys (Standard risk issues, researcher responses, and IRB requirements; Long-term consequences; Background issues)
3. Procedures to protect privacy and maintain confidentiality (Standard issues and problems; Confidentiality assurances and their consequences; Emerging issues of privacy and confidentiality)
4. Other procedures for minimizing risks and promoting benefits (Identifying and minimizing risks; Identifying and maximizing possible benefits)
5. Procedures for responding to requests for help or assistance (Standard procedures; Background considerations; A specific recommendation: an experiment within the survey)
  • Annual Report of the Center for Statistical Research and Methodology Research and Methodology Directorate Fiscal Year 2017
Annual Report of the Center for Statistical Research and Methodology, Research and Methodology Directorate, Fiscal Year 2017. [Diagram: CSRM expertise for collaboration and research — missing data, edit, and imputation; survey sampling: estimation and modeling; experimentation and modeling; record linkage; small area estimation; simulation, data visualization, and modeling; time series and seasonal adjustment — serving Decennial, Demographic, Economic, and Field directorate customers, plus other internal and external customers.] Since August 1, 1933: "… As the major figures from the American Statistical Association (ASA), Social Science Research Council, and new Roosevelt academic advisors discussed the statistical needs of the nation in the spring of 1933, it became clear that the new programs—in particular the National Recovery Administration—would require substantial amounts of data and coordination among statistical programs. Thus in June of 1933, the ASA and the Social Science Research Council officially created the Committee on Government Statistics and Information Services (COGSIS) to serve the statistical needs of the Agriculture, Commerce, Labor, and Interior departments … COGSIS set … goals in the field of federal statistics … (It) wanted new statistical programs—for example, to measure unemployment and address the needs of the unemployed … (It) wanted a coordinating agency to oversee all statistical programs, and (it) wanted to see statistical research and experimentation organized within the federal government … In August 1933 Stuart A. Rice, President of the ASA and acting chair of COGSIS, … (became) assistant director of the (Census) Bureau. Joseph Hill (who had been at the Census Bureau since 1900 and who provided the concepts and early theory for what is now the methodology for apportioning the seats in the U.S.
  • Survey Experiments
IU Workshop in Methods – 2019. Survey Experiments: Testing Causality in Diverse Samples. Trenton D. Mize, Department of Sociology & Advanced Methodologies (AMAP), Purdue University.
Contents:
- Introduction: Overview; What is a survey experiment?; What is an experiment?; Independent and dependent variables; Experimental conditions
- Why conduct a survey experiment?: Internal, external, and construct validity
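The core mechanic of a survey experiment, randomly assigning respondents to experimental conditions, can be sketched in a few lines (respondent IDs and condition names below are made up):

```python
import random

# Minimal sketch of random assignment for a survey experiment:
# shuffle respondent IDs and deal them evenly into conditions.
random.seed(42)

respondents = [f"R{i:03d}" for i in range(12)]
conditions = ["control", "treatment A", "treatment B"]

random.shuffle(respondents)
assignment = {r: conditions[i % len(conditions)]
              for i, r in enumerate(respondents)}

# Balanced design: each condition receives the same number of respondents
counts = {c: sum(1 for v in assignment.values() if v == c) for c in conditions}
print(counts)
```

Shuffling before dealing guarantees both randomization (the basis for causal claims about internal validity) and exact balance across conditions, which a per-respondent coin flip would not.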
  • Evaluating Survey Questions Question
Evaluating Survey Questions. Chase H. Harrison, Ph.D., Program on Survey Research, Harvard University.
What respondents do to answer a question: comprehend the question; retrieve information from memory; summarize information; report an answer.
Problems in answering survey questions:
- Failure to comprehend: if respondents don't understand the question, they cannot answer it; if different respondents understand the question differently, they end up answering different questions.
- Failure to recall: questions assume respondents have information; if respondents never learned something, they cannot provide information about it; there are also problems with the researcher putting more emphasis on the subject than the respondent.
- Problems summarizing: if respondents are thinking about a lot of things, they can inconsistently summarize; if the way the respondent remembers something doesn't readily correspond to the question, they may be inconsistent.
- Problems reporting answers: confusing or vague answer formats lead to variability; interactions with interviewers or technology can lead to problems (sensitive or embarrassing responses).
Evaluating survey questions:
- Early stage: focus groups to understand topics or dimensions of measures. Focus groups are a qualitative research tool, used to develop ideas for questionnaires and to understand the scope of issues.
- Pre-test stage: cognitive interviews to understand question meaning; pre-test under typical field
  • The Theory of the Design of Experiments
The Theory of the Design of Experiments. D.R. Cox, Honorary Fellow, Nuffield College, Oxford, UK, and N. Reid, Professor of Statistics, University of Toronto, Canada. Chapman & Hall/CRC, Boca Raton, London, New York, Washington, D.C. Library of Congress Cataloging-in-Publication Data: Cox, D. R. (David Roxbee). The theory of the design of experiments / D. R. Cox, N. Reid. p. cm. — (Monographs on statistics and applied probability; 86). Includes bibliographical references and index. ISBN 1-58488-195-X (alk. paper). 1. Experimental design. I. Reid, N. II. Title. III. Series. QA279 .C73 2000 001.4'34—dc21 00-029529 CIP. This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher. The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W.
  • Statistical Theory
Statistical Theory. Prof. Gesine Reinert. November 23, 2009.
Aim: To review and extend the main ideas in statistical inference, both from a frequentist viewpoint and from a Bayesian viewpoint. This course serves not only as background to other courses, but also provides a basis for developing novel inference methods when faced with a new situation that includes uncertainty. Inference here includes estimating parameters and testing hypotheses.
Overview
- Part 1: Frequentist Statistics
  - Chapter 1: Likelihood, sufficiency and ancillarity. The Factorization Theorem. Exponential family models.
  - Chapter 2: Point estimation. When is an estimator a good estimator? Covering bias and variance, information, efficiency. Methods of estimation: maximum likelihood estimation, nuisance parameters and profile likelihood; method of moments estimation. Bias and variance approximations via the delta method.
  - Chapter 3: Hypothesis testing. Pure significance tests, significance level. Simple hypotheses, Neyman-Pearson Lemma. Tests for composite hypotheses. Sample size calculation. Uniformly most powerful tests, Wald tests, score tests, generalised likelihood ratio tests. Multiple tests, combining independent tests.
  - Chapter 4: Interval estimation. Confidence sets and their connection with hypothesis tests. Approximate confidence intervals. Prediction sets.
  - Chapter 5: Asymptotic theory. Consistency. Asymptotic normality of maximum likelihood estimates, score tests. Chi-square approximation for generalised likelihood ratio tests. Likelihood confidence regions. Pseudo-likelihood tests.
- Part 2: Bayesian Statistics
  - Chapter 6: Background. Interpretations of probability; the Bayesian paradigm: prior distribution, posterior distribution, predictive distribution, credible intervals. Nuisance parameters are easy.
  - Chapter 7: Bayesian models. Sufficiency, exchangeability. De Finetti's Theorem and its interpretation in Bayesian statistics.
  - Chapter 8: Prior distributions. Conjugate priors.
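The conjugate-prior idea of Chapter 8 can be illustrated with the standard Beta-binomial update: a Beta prior for a binomial success probability stays Beta after observing data, with the counts simply added in (the numbers below are made up):

```python
# Conjugate updating sketch: a Beta(a, b) prior for a binomial success
# probability yields a Beta(a + successes, b + failures) posterior.

a, b = 2.0, 2.0             # prior: Beta(2, 2), prior mean 0.5
successes, failures = 7, 3  # hypothetical observed data

a_post, b_post = a + successes, b + failures
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # Beta(9, 5), mean 9/14
```

The posterior mean 9/14 sits between the prior mean (1/2) and the sample proportion (7/10), showing how the conjugate update blends prior information with the data.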
  • What Is Statistic?
What is Statistic? OPRE 6301. In today's world, we are constantly being bombarded with statistics and statistical information. For example: customer surveys, medical news, demographics, political polls, economic predictions, marketing information, sales forecasts, stock market projections, the Consumer Price Index, sports statistics. How can we make sense out of all this data? How do we differentiate valid from flawed claims? What is Statistics?! "Statistics is a way to get information from data." Data: facts, especially numerical facts, collected together for reference or information. Information: knowledge communicated concerning some particular fact. Statistics is a tool for creating an understanding from a set of numbers. Humorous definitions: the science of drawing a precise line between an unwarranted assumption and a foregone conclusion; the science of stating precisely what you don't know. An example: stats anxiety. A business school student is anxious about their statistics course, since they've heard the course is difficult. The professor provides last term's final exam marks to the student. What can be discerned from this list of numbers? Data: the list of last term's marks (95, 89, 70, 65, 78, 57, ...). Information: new information about the statistics class, e.g., the class average, the proportion of the class receiving A's, the most frequent mark, the marks distribution, etc. Key statistical concepts. Population — a population is the group of all items of interest to a statistics practitioner; frequently very large, sometimes infinite. E.g., all 5 million Florida voters (per Example 12.5). Sample — a sample is a set of data drawn from the population.
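The data-to-information step in this example can be sketched directly. The marks below are made up, extending the few visible in the excerpt:

```python
from statistics import mean, mode

# Raw data: a list of last term's marks (made-up values)
marks = [95, 89, 70, 65, 78, 57, 78, 82, 78, 61]

# Information: summaries a student would actually care about
class_average = mean(marks)
most_frequent = mode(marks)
share_of_As = sum(1 for m in marks if m >= 80) / len(marks)  # assuming A = 80+

print(class_average, most_frequent, share_of_As)
```

The same ten numbers that are opaque as a raw list become informative once summarized, which is exactly the "data to information" arrow in the blurb.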