Basic Statistical Inference for Survey Data


Basic Statistical Inference for Survey Data
Professor Ron Fricker
Naval Postgraduate School, Monterey, California

Goals for this Lecture
• Review of descriptive statistics
• Review of basic statistical inference
  – Point estimation
  – Sampling distributions and the standard error
  – Confidence intervals for the mean
  – Hypothesis tests for the mean
• Compare and contrast classical statistical assumptions with survey data requirements
• Discuss how to adapt methods to survey data with basic sample designs

Two Roles of Statistics
• Descriptive: describing a sample or population
  – Numerical (mean, variance, mode)
  – Graphical (histogram, boxplot)
• Inferential: using a sample to infer facts about a population
  – Estimating (e.g., estimating the average starting salary of those with systems engineering Master's degrees)
  – Testing theories (e.g., evaluating whether a Master's degree increases income)
  – Building models (e.g., modeling the relationship between an advanced degree and income)
• A descriptive statistics question: What was the average survey response to question 7?
• An inferential question: Given the sample, what can we say about the average response to question 7 for the population?
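As a quick illustration of the descriptive side, the numerical measures above can be computed directly with Python's standard library (the question-7 responses below are invented purely for illustration):

```python
import statistics

# Hypothetical responses to survey question 7 (1-5 coding, invented for illustration)
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

# Descriptive statistics: these summarize this sample and nothing more
print("mean:    ", statistics.mean(responses))      # measure of location
print("median:  ", statistics.median(responses))    # measure of location
print("mode:    ", statistics.mode(responses))      # measure for categorical data
print("variance:", statistics.variance(responses))  # measure of variability (n-1 denominator)
```

Answering the inferential question — what these numbers say about the whole population — is what the rest of the lecture builds toward.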
Lots of Descriptive Statistics
• Numerical:
  – Measures of location: mean, median, trimmed mean, percentiles
  – Measures of variability: variance, standard deviation, range, inter-quartile range
  – Measures for categorical data: mode, proportions
• Graphical:
  – Continuous: histograms, boxplots, scatterplots
  – Categorical: bar charts, pie charts

Continuous Data: Sample Mean, Variance, and Standard Deviation
• The sample average or sample mean is a measure of location or central tendency:
  x̄ = (1/n) Σ_{i=1..n} x_i
• The sample variance is a measure of variability:
  s² = (1/(n−1)) Σ_{i=1..n} (x_i − x̄)²
• The standard deviation is the square root of the variance: s = √(s²)

Statistical Inference
• The sample mean x̄ and the sample variance s² are statistics calculated from data
• Sample statistics used to estimate true values in the population are called estimators
• Point estimation: estimate a population quantity with a single sample statistic
• Interval estimation: estimate a population quantity with an interval
  – Incorporates the uncertainty in the sample statistic
• Hypothesis tests: test theories about the population based on evidence in the sample data

Classical Statistical Assumptions vs. Survey Practice / Requirements
• Basic statistical methods assume:
  – The population is of infinite size (or so large as to be essentially infinite)
  – The sample size is a small fraction of the population
  – The sample is drawn from the population via simple random sampling (SRS)
• In surveys:
  – The population is always finite (though it may be very large)
  – The sample may be a sizeable fraction of the population ("sizeable" is roughly > 5%)
  – The sampling design may be complex

Point Estimation (1)
• Example: use the sample mean or proportion to estimate the population mean or proportion
• Under SRS or a self-weighting sampling scheme, the usual estimators for the mean calculated by all statistical software packages are generally fine
  – Assuming no other adjustments (e.g., for nonresponse, poststratification, etc.) are necessary
• Except under SRS, the usual point estimates for the standard deviation are almost always wrong

Point Estimation (2)
• Naïve analyses just present sample statistics for the means and/or proportions
  – Perhaps with some intuitive sense that the sample statistics are a measure of the population
  – But often without accounting for the sample design
• Moreover, point estimates alone provide no information about sampling uncertainty
  – If you did another survey, how much might its results differ from the current results?
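The repeated-sampling idea behind that last question can be simulated in a few lines. This sketch uses a single fair die as the population (echoing the lecture's dice example) and shows that the spread of the sample means comes out near σ/√n:

```python
import random
import statistics

random.seed(1)  # reproducible sketch

# Population: a single fair die, with mean mu = 3.5 and sd sigma ≈ 1.708
die = [1, 2, 3, 4, 5, 6]
sigma = statistics.pstdev(die)

n = 5  # observations per sample
# Draw many samples of size n and record each sample's mean
means = [statistics.mean(random.choices(die, k=n)) for _ in range(20_000)]

# The sd of the sample means (the standard error) is close to sigma / sqrt(n)
print("mean of sample means:", round(statistics.mean(means), 3))   # near mu = 3.5
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # near sigma/sqrt(5) ≈ 0.764
```

Rerunning with larger n shrinks the spread of the means by the √n factor, which is exactly the standard-error behavior the next slides formalize.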
• Also, even for the mean, if the sample design is not self-weighting, the software's default estimators need adjustment

Sampling Distributions
• Abstract from people and surveys to random variables and their distributions
• A sampling distribution is the probability distribution of a sample statistic
  [Figure: the distribution of individual observations, with standard deviation σ, alongside the narrower sampling distribution of means of five observations; the standard error of such a mean is σ_x̄ = σ/√5]
• Demonstrations of randomness and of simulated sampling distributions:
  http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html

Central Limit Theorem (CLT) for the Sample Mean
• Let X1, X2, …, Xn be a random sample from any distribution with mean µ and standard deviation σ
• For large sample size n, the distribution of the sample mean is approximately normal
  – with mean µ and standard deviation σ/√n
• The larger the value of n, the better the approximation

Example: Sums of Dice Rolls
  [Figure: frequency histograms of one roll of a single die and of the sums of 2, 5, and 10 dice; as the number of dice grows, the distribution of the sum looks increasingly bell-shaped]

Interval Estimation for µ
• The best estimate for µ is X̄
• But X̄ will never be exactly µ
  – Further, there is no way to tell how far off it is
• BUT we can estimate µ's location with an interval and be right some of the time
  – Narrow intervals: higher chance of being wrong
  – Wide intervals: less chance of being wrong, but also less useful
• AND with confidence intervals (CIs) we can define the probability that the interval "covers" µ!

Confidence Intervals: Main Idea
• Based on the normal distribution, we know X̄ is within 2 standard errors of µ 95% of the time
• Alternatively, µ is within 2 standard errors of X̄ 95% of the time
  [Figure: the (unobserved) distribution of individual observations, the (unobserved) distribution of the sample mean, the unobserved population mean, a sample mean, and the resulting 95% confidence interval for the population mean]

A Simulation
  [Figure: 95% confidence intervals for 50 samples with mean = 50, sd = 10, n = 5; 2 intervals do not include the population mean]

Another Simulation
  [Figure: 95% confidence intervals for 200 samples with mean = 50, sd = 10, n = 95; 10 intervals do not include the population mean]

Confidence Interval for µ (1)
• For X̄ from a sample of size n from a population with mean µ,
  T = (X̄ − µ) / (s/√n)
  has a t distribution with n−1 "degrees of freedom"
  – Exactly, if the population has a normal distribution
  – Approximately, for the sample mean via the CLT
• Use the t distribution to build a CI for the mean:
  Pr(−t_{α/2, n−1} < T < t_{α/2, n−1}) = 1 − α

Review: the t Distribution
  [Figure: densities of the standard normal and of t distributions with 3, 10, and 100 degrees of freedom, plotted against the number of SEs from the mean; as the degrees of freedom increase, the t density approaches the normal]

Confidence Interval for µ (2)
• Flip the probability statement around to get a confidence interval:
  (1) Pr(−t_{α/2, n−1} < T < t_{α/2, n−1}) = 1 − α
  (2) Pr(−t_{α/2, n−1} < (X̄ − µ)/(s/√n) < t_{α/2, n−1}) = 1 − α
  (3) Pr(X̄ − t_{α/2, n−1} · s/√n < µ < X̄ + t_{α/2, n−1} · s/√n) = 1 − α

Example: Constructing a 95% Confidence Interval for µ
• Choose the confidence level: 1 − α
• Remember that the degrees of freedom ν = n − 1
• Find t_{α/2, n−1}
  – Example: if α = 0.05 and df = 7, then t_{0.025, 7} = 2.365
• Calculate X̄ and s/√n
• Then
  Pr(X̄ − 2.365 · s/√n < µ < X̄ + 2.365 · s/√n) = 0.95

Hypothesis Tests
• The basic idea is to test a hypothesis / theory against empirical evidence from a sample
  – E.g., "The fraction of new students aware of the school discrimination policy is less than 75%."
  – Does the data support or refute the assertion?
  [Figure: a sampling distribution centered at the assumed p = 0.75, with the observed p̂ = 0.832 marked; if we assume p = 0.75 is true, the p-value is the probability of seeing this (or something more extreme) — if it is small, we don't believe our assumption]

One-Sample, Two-Sided t-Test
• Hypotheses: H0: µ = µ0 versus Ha: µ ≠ µ0
• Standardized test statistic: t = (X̄ − µ0) / √(s²/n)
• p-value = Pr(T < −|t| or T > |t|) = Pr(|T| > |t|), where T follows a t distribution with n−1 degrees of freedom
• Reject H0 if p < α, where α is the predetermined significance level

One-Sample, One-Sided t-Tests
• Hypotheses: H0: µ = µ0 with Ha: µ < µ0, or H0: µ = µ0 with Ha: µ > µ0
• Standardized test statistic: t = (X̄ − µ0) / √(s²/n)
• p-value = Pr(T < t) or p-value = Pr(T > t), depending on Ha, where T follows a t distribution with n−1 degrees of freedom
• Reject H0 if p < α

Applying Continuous Methods to Binary Survey Questions
• Surveys often include binary questions, where we want to infer the proportion of the population in one category or the other
• Code the binary responses as a 1/0 variable and, for large n, appeal to the CLT
  – A confidence interval for the mean is then a CI on the proportion of "1"s
  – A t-test for the mean is then a hypothesis test on the proportion of "1"s

Applying Continuous Methods to Likert Scale Survey Data
• Likert scale data is inherently categorical
• If we are willing to assume the "distance" between categories is equal, we can code the categories with integers and appeal to the CLT:
  Strongly agree = 1, Agree = 2, Neutral = 3, Disagree = 4, Strongly disagree = 5

Adjusting Standard Errors (for Basic Survey Sample Designs)
• Sample > 5% of the population: apply the finite population correction
  – Multiply the standard error by √((N − n)/N)
  – E.g.
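A minimal sketch of folding the finite population correction into a confidence interval follows. All the numbers (N, n, the responses) are hypothetical, and the normal quantile stands in for the t quantile since n is large:

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical survey: n = 400 respondents drawn by SRS from a population of
# N = 5000, with a binary question coded 1/0 (all numbers invented)
N, n = 5000, 400
ones = 332                       # respondents answering "yes"
x = [1] * ones + [0] * (n - ones)

xbar = statistics.mean(x)        # point estimate of the population proportion
s = statistics.stdev(x)          # sample sd (n-1 denominator)
se_srs = s / math.sqrt(n)        # usual infinite-population standard error

# Sample is 8% (> 5%) of the population, so apply the fpc
fpc = math.sqrt((N - n) / N)
se_adj = se_srs * fpc

# For n = 400 the t quantile is essentially the normal one (~1.966 vs ~1.960)
z = NormalDist().inv_cdf(0.975)
lo, hi = xbar - z * se_adj, xbar + z * se_adj
print(f"estimate {xbar:.3f}, adjusted SE {se_adj:.4f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Note that the fpc shrinks the standard error (here by a factor of about 0.96), so ignoring it makes the interval conservatively wide rather than invalid; the reverse is not true for complex designs, where default software standard errors can be badly wrong in either direction.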