STAT 100 Chapter 3

Richard Lockhart

Simon Fraser University

Fall 2014 — Surrey

Jargon to learn

Population, sample, parameter, statistic. Statistics computed from samples are variable: a different sample gives a different value of the statistic. Bias, variability. Margin of error, 95% confidence, 19 times out of 20, simple random sample (SRS). In an SRS, quick method for MOE for a proportion: 1 over the square root of the sample size. Usual mathematical symbol for sample size is n. Usual mathematical notation for a population proportion: p. For a sample proportion: p̂ (read as “p hat”).

Summary of key ideas from last time

Voluntary surveys are useless. Non-response bias is a systematic tendency for survey results to be wrong in one specific direction due to differences between respondents and non-respondents. Very serious problem in many surveys. Non-response bias does not go away just because you have a big sample.

Parameters and Statistics

Crucial distinction. Parameters are numbers which describe populations. Statistics are numbers which describe samples. The actual fraction of Canadians aged 15 to 24 who are unemployed is a parameter of the population of Canadian working age adults. The number produced by Statistics Canada each month is a statistic. The two numbers are different. We don’t know exactly how different.

Estimation error

The number produced by Statistics Canada is an estimate. This sounds much more professional than guess (and it is much more professional when Statistics Canada does it). Estimates are wrong. The error has two parts: bias (an error that would be the same if you had drawn a different sample), and variability (due to randomness in sample). Other jargon: statistical fluctuation, stochastic variation, random error, and other combinations of these terms. Do some in R. Message is: different sample, different statistic -- variability from one sample to another.
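The "different sample, different statistic" message can be seen in a small simulation. The sketch below is in Python rather than R, and the population (100,000 people, 13% unemployed) is entirely made up for illustration; it is not a Statistics Canada figure.

```python
import random

rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 people; 13% are "unemployed" (coded 1).
# The 13% figure is invented for illustration only.
N = 100_000
population = [1] * 13_000 + [0] * (N - 13_000)
parameter = sum(population) / N  # the population proportion p (a parameter)

def sample_proportion(n):
    """Draw a simple random sample of size n and return the statistic p-hat."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Five separate samples of 500 people give five different estimates.
estimates = [sample_proportion(500) for _ in range(5)]

print("parameter p =", parameter)
for i, p_hat in enumerate(estimates, start=1):
    print(f"sample {i}: p-hat = {p_hat:.3f}")
```

The parameter stays fixed at 0.13 while the statistic moves around from one sample to the next; that movement is the variability part of the error.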

Bias versus Variability

If a merchant puts his hand on the scale when weighing your steak the result is bias. Precision balances have less variability than ordinary bathroom scales. A golfer who always misses 3 cm to the left has bias. When I golf I have lots of variability. And bias, too, because I often hit to the left. I could correct for the bias by aiming right. Survey example: Scottish Independence. Notice that young people are under-represented – so we adjust for this known source of bias. We cannot adjust for random variation.
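Adjusting for a known source of bias, such as too few young people in the sample, can be sketched as simple reweighting. All the numbers below are hypothetical (they are not from any Scottish independence poll): suppose young people are 30% of the population but only 15% of the sample, and the two age groups answer differently.

```python
# Hypothetical survey counts, invented for illustration only.
sample_counts = {
    "young": {"respondents": 150, "yes": 90},   # 15% of the sample
    "older": {"respondents": 850, "yes": 340},  # 85% of the sample
}

# Assumed known population shares (for example, from a census).
population_share = {"young": 0.30, "older": 0.70}

total_respondents = sum(g["respondents"] for g in sample_counts.values())
total_yes = sum(g["yes"] for g in sample_counts.values())

# Unweighted estimate: treats the sample as if it mirrored the population.
unweighted = total_yes / total_respondents

# Reweighted estimate: each group's sample proportion counts in
# proportion to that group's share of the population.
weighted = sum(
    population_share[name] * g["yes"] / g["respondents"]
    for name, g in sample_counts.items()
)

print(f"unweighted estimate: {unweighted:.2f}")
print(f"reweighted estimate: {weighted:.2f}")
```

The reweighting moves the estimate toward what a sample with the right age mix would have given. It does nothing about random variation, which remains in both estimates.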

Reducing and quantifying error

Bias is reduced by random sampling. Bias is reduced by adjustment (reweighting). Random sampling with complete frame, no non-response and no measurement error eliminates bias. Estimates are then unbiased. Variability is reduced by increasing sample size. Increasing sample size makes no change in bias. Multiply sample size by 4 to get variability half as big. Census has no variability or bias except for non-response.
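The "multiply sample size by 4 to get variability half as big" rule can be checked by simulation. This is a rough sketch under assumptions: a made-up population with true proportion 0.40, and the standard deviation of the estimates across repeated samples used as the measure of variability.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical population: 100,000 people, true proportion 0.40.
population = [1] * 40_000 + [0] * 60_000

def spread_of_estimates(n, reps=2000):
    """Standard deviation of p-hat across many simple random samples of size n."""
    estimates = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(estimates)

sd_small = spread_of_estimates(100)   # sample size n
sd_large = spread_of_estimates(400)   # sample size 4n
ratio = sd_small / sd_large

print(f"n = 100: spread of estimates = {sd_small:.4f}")
print(f"n = 400: spread of estimates = {sd_large:.4f}")
print(f"ratio = {ratio:.2f} (the rule predicts about 2)")
```

Neither sample size changes the centre of the estimates; only the spread shrinks, which is why a bigger sample does not help with bias.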

Margin of Error

Good surveys produce a margin of error. If I say “margin of error is 2 percentage points” I mean: for this method of sampling, if I repeated the whole process of selecting a sample many times, the estimate would be within 2 percentage points of the population value 19 times out of 20 (95% of the time).
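The "19 times out of 20" statement describes what happens when the whole sampling process is repeated, so it can be illustrated by repeating it on a computer. The sketch below assumes a made-up population in which exactly half say "yes", and uses the quick rule (1 over the square root of the sample size) with n = 2,500, which gives a margin of 2 percentage points.

```python
import math
import random

rng = random.Random(95)  # fixed seed for reproducibility

# Hypothetical population of 200,000 people, exactly half of whom say "yes".
N = 200_000
population = [1] * (N // 2) + [0] * (N // 2)
p = 0.5

n = 2500                 # sample size chosen so that 1/sqrt(n) = 0.02
moe = 1 / math.sqrt(n)   # quick margin of error: 2 percentage points

reps = 1000              # number of times the whole survey is repeated
hits = 0
for _ in range(reps):
    p_hat = sum(rng.sample(population, n)) / n
    # Tiny tolerance guards against floating-point rounding at the boundary.
    if abs(p_hat - p) <= moe + 1e-12:
        hits += 1

coverage = hits / reps
print(f"margin of error: {100 * moe:.1f} percentage points")
print(f"estimate within the margin in {hits} of {reps} samples ({coverage:.1%})")
```

The fraction of repetitions that land within the margin should come out close to 95%, though any single run of the simulation has its own random variation.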

Look at LFS Standard Errors. LFS gives standard errors for month over month change. Margin of Error is twice the Standard Error. Quick formula: in an SRS, the MOE for a population proportion, for 95% confidence, is 1/√n, where n is the sample size.
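The quick formula is easy to tabulate. The sketch below also compares it with twice the standard error, using the usual textbook standard error of a sample proportion, sqrt(p(1 - p)/n). That formula is an assumption brought in from outside these slides; the function names are made up for this example.

```python
import math

def quick_moe(n):
    """Quick 95% margin of error for a proportion from an SRS of size n."""
    return 1 / math.sqrt(n)

def two_standard_errors(p, n):
    """Twice the textbook SE of a sample proportion (assumption: not on the slides)."""
    return 2 * math.sqrt(p * (1 - p) / n)

print("     n   quick MOE   2*SE at p=0.5   2*SE at p=0.1")
for n in (100, 400, 1000, 2500, 10_000):
    print(
        f"{n:>6}   {100 * quick_moe(n):>6.1f} pts"
        f"   {100 * two_standard_errors(0.5, n):>10.1f} pts"
        f"   {100 * two_standard_errors(0.1, n):>10.1f} pts"
    )
```

At p = 0.5 the two calculations agree; for proportions far from one half the quick formula overstates the margin, so it errs on the cautious side.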

Confidence intervals and statements

Ingredients: margin of error and confidence level. Usually confidence level is 95% = 19/20. Officially approved phrasing: a 95% confidence interval for the number of full time jobs created in the month from July to August is -2,300 plus or minus 57,000. So the interval runs from 59,300 jobs lost (negative number) to 54,700 created. Month after month the media and the Statistics Canada news release ignore the huge uncertainty in this month over month change. One last point – some changes have no standard error. Why not? Always ask: how did they measure that? How do they know?
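The interval arithmetic can be written out in a few lines. The estimate and margin of error are the ones quoted on this slide; the implied standard error uses the rule that the margin of error is twice the standard error.

```python
estimate = -2_300          # estimated change in full time jobs (from the slide)
margin_of_error = 57_000   # 95% margin of error (from the slide)

lower = estimate - margin_of_error
upper = estimate + margin_of_error
standard_error = margin_of_error / 2  # MOE is twice the standard error

print(f"95% confidence interval: {lower:,} to {upper:,}")
print(f"implied standard error: {standard_error:,.0f}")
print("interval includes zero:", lower <= 0 <= upper)
```

Because the interval includes zero, the data are consistent with jobs having been lost, created, or unchanged over the month.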

Importance of Population Size

Practically none. Formula for margin of error uses only sample size, n. Exception to the rule for census, large sampling fraction like NHS. Labour Force Survey sample size is 55,000 households (of about 18 million in Canada). The American equivalent, the Current Population Survey (CPS), has a sample of 60,000 households. Both have margin of error of about 0.2 percentage points. Note: quick formula is inappropriate. Unemployment applies to individuals, not households. Probably about 100,000 individuals covered. That would give 1/√100,000 ≈ 0.3 percentage points. That is how good an SRS would be for a proportion like: fraction of Canadians in labour force. This is not an SRS, it is not about a fraction of all Canadians, and it applies to a difference between two surveys.
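A simulation can show how little the population size matters when the sampling fraction is small. The sketch below is illustrative only: it uses made-up populations of three different sizes, each 50% "yes", draws simple random samples of 400 from each, and then repeats the quick-formula arithmetic for 100,000 individuals.

```python
import math
import random
import statistics

rng = random.Random(10)  # fixed seed for reproducibility

def spread_of_estimates(pop_size, n=400, reps=1000):
    """SD of p-hat over many SRSs of size n from a population that is 50% 'yes'."""
    population = [1] * (pop_size // 2) + [0] * (pop_size - pop_size // 2)
    estimates = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(estimates)

# Same sample size, very different (hypothetical) population sizes.
spreads = {size: spread_of_estimates(size) for size in (10_000, 100_000, 1_000_000)}
for size, sd in spreads.items():
    print(f"population {size:>9,}: SD of estimates = {100 * sd:.2f} percentage points")

# Quick-formula arithmetic from the slide, for about 100,000 individuals.
quick_moe = 1 / math.sqrt(100_000)
print(f"quick MOE for n = 100,000: {100 * quick_moe:.2f} percentage points")
```

The three spreads should be nearly the same even though the populations differ by a factor of 100, because only the sample size of 400 enters the calculation in any important way.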