Outline of Levin and Fox, Chapter 1 (2003)

Topic 3. Inferential Statistics: Probability, the normal distribution, sampling distributions, estimation, and hypothesis testing

Inferential Statistics

- Sample data and population data: we usually have sample data.
- Inferential statistics: we use sample data to make predictions (to draw inferences) about the population.
- When we don't have data for every case, our predictions are probably incorrect (due to sampling error).
- We never know for certain whether we are drawing incorrect conclusions.

Notation for Sample Statistics and Population Parameters

Population parameters are often displayed as Greek letters (or capital Roman letters) and sample statistics as (lowercase) Roman letters or Greek letters with a ‘hat’ (a.k.a. caret):

                      Population   Sample
Mean                  μ_Y          ȳ (or Ȳ, or μ̂_Y)
Standard deviation    σ_Y          s_Y (or S_Y, or σ̂_Y)
Proportion            π (or P)     p (or π̂)

Probability and Probability Distributions

Probability: the ratio of the number of times the desired outcome can occur relative to the total number of all outcomes that can occur, over the long run. Probabilities are often expressed as ratios and/or proportions. Our approach to probabilities falls within the classical or frequentist approach (rather than the Bayesian approach)… for better or worse.

The probability of a ‘heads’ on one flip of an honest coin = 1/2 = 0.5

- The flip of a coin is a purely random event; we cannot predict the outcome with any certainty.
- Elementary outcomes: heads, tails.
- This is an equal-probability process.

The probability of a ‘6’ on one roll of an honest die = 1/6 = 0.1667

- The roll of a die is a purely random event.
- Elementary outcomes: 1, 2, 3, 4, 5, 6.
- This is an equal-probability process.

The ‘or’ rule: the probability of a 1 or a 6 on one roll of an honest die = 1/6 + 1/6 = 2/6 = 0.333. Elementary outcomes: 1, 2, 3, 4, 5, 6.

The ‘and’ rule: the probability of a 6 and a 6 on the roll of two honest dice = 1/6 × 1/6 = 1/36 = 0.02778. There are 6 × 6 = 36 elementary outcomes:

                 Die 1
          1    2    3    4    5    6
Die 2  1  1,1  1,2  1,3  1,4  1,5  1,6
       2  2,1  2,2  2,3  2,4  2,5  2,6
       3  3,1  3,2  3,3  3,4  3,5  3,6
       4  4,1  4,2  4,3  4,4  4,5  4,6
       5  5,1  5,2  5,3  5,4  5,5  5,6
       6  6,1  6,2  6,3  6,4  6,5  6,6
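To make the counting concrete, here is a minimal Python sketch (not part of the original notes) that enumerates the 36 elementary outcomes and verifies both rules by counting:

```python
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes for two honest dice.
outcomes = list(product(range(1, 7), repeat=2))

# 'or' rule on one die: P(1 or 6) = 1/6 + 1/6
p_1_or_6 = sum(1 for face in range(1, 7) if face in (1, 6)) / 6
print(p_1_or_6)  # 0.333...

# 'and' rule on two dice: P(6 and 6) = 1/6 * 1/6
p_6_and_6 = sum(1 for d1, d2 in outcomes if d1 == 6 and d2 == 6) / len(outcomes)
print(p_6_and_6)  # 0.02777...
```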

Probabilities tell us how often things should happen over the long run

Four flips of an ‘honest’ coin and the number of heads:

- This is a binomial process.
- Both outcomes (heads and tails) are equally likely, so there is a uniform distribution for one flip.
- There is NOT a uniform distribution for multiple flips: with 4 flips there are 5 possible outcomes but 16 ways to get them:

Number of possible ‘heads’ on four coin flips:

Number of heads                0       1       2       3       4
Number of ways to get it       1       4       6       4       1

The 16 elementary outcomes:

0 heads: TTTT
1 head:  HTTT, THTT, TTHT, TTTH
2 heads: HHTT, HTHT, HTTH, THHT, THTH, TTHH
3 heads: HHHT, HHTH, HTHH, THHH
4 heads: HHHH

The probability of any one of these ways = 0.5 × 0.5 × 0.5 × 0.5 = 1/16 = 0.0625.

The ‘combined’ probabilities (i.e., combined within the number of ways):

Number of heads    0        1        2        3        4        Sum
Probability        0.0625   0.2500   0.3750   0.2500   0.0625   1.0

[Figure: bar chart of these probabilities; x-axis: number of heads on 4 flips of a coin; y-axis: probability (0.00 to 0.40).]
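As a minimal sketch (not from the original notes), this Python snippet enumerates the 16 elementary outcomes and reproduces the table above:

```python
from itertools import product
from collections import Counter

# Enumerate the 16 equally likely sequences of four flips of an honest coin
# and tally the number of ways to get each count of heads.
flips = ["".join(seq) for seq in product("HT", repeat=4)]  # 16 sequences
ways = Counter(seq.count("H") for seq in flips)

for heads in range(5):
    print(heads, ways[heads], ways[heads] / 16)
# 0 1 0.0625
# 1 4 0.25
# 2 6 0.375
# 3 4 0.25
# 4 1 0.0625
```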

Here is an alternative way to show the elementary outcomes (there are 16 ‘branches’): [tree diagram not reproduced]

Probability distributions, such as the one above, tell us the relative frequency with which we should expect all possible outcomes to occur over a large number of trials. They are based on theory rather than empirical data.

The Normal Distribution (also known as the Gaussian distribution)

The normal distribution is a theoretical probability distribution: it tells us the expected relative frequency of every possible outcome.

Properties of the normal distribution:

- Bell-shaped
- Perfectly symmetric:
  - mean = median = mode
  - half of the cases are less than the mean and half of the cases are greater than the mean

A normal distribution (Ȳ = 0, S_Y = 100):

[Figure: normal curve with mean 0 and standard deviation 100; x-axis from −500 to 500; y-axis: probability density from 0 to 0.0045.]

No empirical distribution ever matches this theoretical distribution perfectly, but some come close. This is a histogram describing the ‘substantive complexity’ of occupations (the 809 cases are occupations): [histogram not reproduced]

Not all normal distributions are shaped exactly the same. The shape (i.e., the peakedness) is determined by the size of the standard deviation. For example, the three normal distributions below have the same mean (0) but different standard deviations (50, 100, and 150). The larger the standard deviation, the flatter the curve. Despite this, they are all normal distributions.

[Figure: ‘3 Normal Distributions’ – three normal curves with mean 0 and standard deviations 50, 100, and 150; y-axis: probability density from 0 to 0.009.]

z Scores (a.k.a. Standard Scores)

A z score (or standard score) is the number of standard deviations that a raw score is above or below the mean. Here is the formula for transforming raw scores into z scores:

z = (y − ȳ) / s_y

The standard normal distribution A standard normal distribution is a normal distribution represented in standard scores. For example, if we displayed a distribution of raw GRE scores, we would have a (nearly) normal distribution. If we converted the raw scores to z scores and then displayed the distribution of GRE z scores, then we would have a standard normal distribution. The standard normal distribution has a mean of 0 and a standard deviation of 1.

[Figure: a normal distribution (mean 0, standard deviation 100; x-axis from −500 to 500) alongside the corresponding standard normal distribution (mean 0, standard deviation 1; x-axis from −5 to 5).]

The normal distribution is useful because there is a constant area (a constant proportion of cases) under the curve lying between the mean and any given distance from the mean when measured in standard deviation units:

- 68.26% within 1 standard deviation of the mean (34.13% on each side)
- 95.44% within 2 standard deviations of the mean (47.72% on each side)
- 99.74% within 3 standard deviations of the mean (49.87% on each side)

One application is that we can determine how unusual any given outcome is:

GRE Scores on Verbal Reasoning (N = 1,421,856; July 1, 2005 – June 30, 2008)
Mean = 457
Standard deviation = 121
Minimum = 200
Maximum = 800

How unusual is a GRE score on verbal reasoning of 699? 699 is 242 points, or 2 standard deviations, above the mean. How many people score 699 or better? 100 − (50 + 47.72) = 2.28%. Only 2.28% of people score higher than 699; this is a rare event.
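Here is a minimal Python sketch (not from the notes) of the same calculation: convert the raw score to a z score, then find the share of people scoring at least that high. The notes read 47.72% from a z table; math.erfc gives the exact upper-tail area for a standard normal.

```python
from math import erfc, sqrt

mean, sd, score = 457, 121, 699
z = (score - mean) / sd                # (699 - 457) / 121 = 2.0
upper_tail = 0.5 * erfc(z / sqrt(2))   # P(Z >= 2.0)
print(z, f"{upper_tail:.2%}")          # 2.0 2.28%
```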

Sampling Distributions

Let's get back to the basic idea of inferential statistics: we use a sample statistic to estimate a population parameter.

- For example, a sample mean is an estimate of a population mean (it is a ‘point estimate’).
- Due to sampling error, we know that the point estimate is probably incorrect.
- How confident can we be in the estimate from any one sample?
- We use a sampling distribution and the standard error to estimate the magnitude of the sampling error, and we take this into account when we draw inferences about the population.

Let’s go over an example (see the Excel file):

The population data (N = 6) and the distribution of scores in the population:

Case   Y    Y − Ȳ   (Y − Ȳ)²
1      0    −1.5    2.25
2      1    −0.5    0.25
3      1    −0.5    0.25
4      2     0.5    0.25
5      2     0.5    0.25
6      3     1.5    2.25
Sum    9     0      5.5

Ȳ (population mean) = 1.50
σ_Y (population standard deviation) = 1.05

[Figure: bar chart of the distribution of scores in the population; x-axis: score (0 to 3); y-axis: percent (0 to 35).]

Now let's pretend that we don't know the population mean and standard deviation. We usually don't know these (if we did, we wouldn't need to do the study). Let's start by focusing on one sample of 3 people (Cases 1, 2, and 3):

ȳ (for Cases 1, 2, and 3) = (0 + 1 + 1) / 3 = 0.67; this sample mean is our point estimate of the population mean (notice that it is incorrect). This, however, is only one sample; 19 other samples of 3 cases are possible.

How many possible samples of size n are there from a population of size N? Here is a helpful formula for finding the answer:

(N choose n) = N! / [n!(N − n)!]

For example, if there are 6 cases in the population (N = 6) and you are drawing samples of 3 (n = 3), then there are 20 possible samples:

(6 choose 3) = 6! / [3!(6 − 3)!] = (6×5×4×3×2×1) / [(3×2×1)(3×2×1)] = (6×5×4) / (3×2×1) = 20

Here are all 20 possible sample means (this is a sampling distribution of means). Notice that the sample means vary across samples:

Sample (n = 3)   Cases (y values)     Sample mean   Sample SD   Estimated SE
1                1, 2, 3 (0, 1, 1)    0.67          0.58        0.33
2                1, 2, 4 (0, 1, 2)    1.00          1.00        0.58
3                1, 2, 5 (0, 1, 2)    1.00          1.00        0.58
4                1, 2, 6 (0, 1, 3)    1.33          1.53        0.88
5                1, 3, 4 (0, 1, 2)    1.00          1.00        0.58
6                1, 3, 5 (0, 1, 2)    1.00          1.00        0.58
7                1, 3, 6 (0, 1, 3)    1.33          1.53        0.88
8                1, 4, 5 (0, 2, 2)    1.33          1.15        0.67
9                1, 4, 6 (0, 2, 3)    1.67          1.53        0.88
10               1, 5, 6 (0, 2, 3)    1.67          1.53        0.88
11               2, 3, 4 (1, 1, 2)    1.33          0.58        0.33
12               2, 3, 5 (1, 1, 2)    1.33          0.58        0.33
13               2, 3, 6 (1, 1, 3)    1.67          1.15        0.67
14               2, 4, 5 (1, 2, 2)    1.67          0.58        0.33
15               2, 4, 6 (1, 2, 3)    2.00          1.00        0.58
16               2, 5, 6 (1, 2, 3)    2.00          1.00        0.58
17               3, 4, 5 (1, 2, 2)    1.67          0.58        0.33
18               3, 4, 6 (1, 2, 3)    2.00          1.00        0.58
19               3, 5, 6 (1, 2, 3)    2.00          1.00        0.58
20               4, 5, 6 (2, 2, 3)    2.33          0.58        0.33
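As a minimal sketch (not from the notes), this Python snippet enumerates all 20 samples of size 3 from the population of 6 scores and reproduces the means, standard deviations, and estimated standard errors in the table above:

```python
from itertools import combinations
from statistics import mean, stdev
from math import sqrt
from collections import Counter

population = [0, 1, 1, 2, 2, 3]   # scores for Cases 1-6

sample_means = []
for sample in combinations(population, 3):   # 20 samples, by position
    m = mean(sample)
    s = stdev(sample)              # sample SD (divides by n - 1)
    se = s / sqrt(len(sample))     # estimated standard error of the mean
    sample_means.append(round(m, 2))
    print(sample, round(m, 2), round(s, 2), round(se, 2))

# Tallying the 20 means gives the sampling distribution of the mean:
print(Counter(sample_means))       # e.g. 1.33 and 1.67 each occur 5 times
```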

Here is the dilemma: if sample estimates vary, how confident can we be in the estimate from any one sample?

The solution to this dilemma is to use a device known as a sampling distribution to estimate how much variability we think there is across all possible samples. A sampling distribution is a theoretical probability distribution that lists all possible sample estimates for the statistic in which we are interested (‘means’ in this example), as well as how often we should expect each estimate to occur. The 20 sample means above make up the sampling distribution of means.

*Note: in reality, it is often impossible to create a sampling distribution. Think of how difficult it would be to draw all possible random samples of 2,000 people from the US population; the number of possible samples is

304,059,724! / [2,000!(304,059,724 − 2,000)!]

(using population estimates from July 1, 2008). We are doing this to illustrate the idea behind how the sampling distribution works.

I also want to point out that we have a sampling distribution of the mean. Sampling distributions exist for all sample statistics.

The sampling distribution is now its own variable with central tendency and variability:

Sample mean   f    %
0.67          1    5
1.00          4    20
1.33          5    25
1.67          5    25
2.00          4    20
2.33          1    5
Sum           20   100

[Figure: bar chart of the sampling distribution of means; x-axis: sample mean (0.67 to 2.33); y-axis: percent (0 to 30).]

The mean of the sampling distribution (i.e., the ‘mean of means’) is equal to the population mean (1.50). This suggests that if we generated all possible samples of the same size, our sample estimates would be correct on average (this is somewhat reassuring).

The standard error (in this example, the standard error of the mean) describes how much variability there is in the estimate from sample to sample. The bigger the standard error, the more the estimates differ from sample to sample. A lot of variability from sample to sample is bad for our confidence, because we only ever have one sample. If we think there is little consistency in the estimate from sample to sample (i.e., high variability), then we have to be less confident in the one sample estimate that we do have. If we think there is consistency in the estimate from sample to sample (i.e., low variability), then we can be more confident in the one sample estimate that we do have.

Here is the formula for calculating the standard error of the mean:

σ_ȳ = σ_Y / √n, where σ_Y is the population standard deviation and n is the sample size.

For our sampling distribution: σ_ȳ = σ_Y / √n = 1.05 / √3 = 0.61

In reality we don’t know the population standard deviation, so we use the sample standard deviation to estimate it (see the table). The logic here is that if there is a lot of variability across cases in the scores in our one sample (as indicated by a larger standard deviation), then there is probably a lot of variability across cases in the scores in the population, which would cause there to be a lot of variability in the estimates across samples. So we use an estimate of the variability within one sample to predict how much variability there is across all samples.

For the first sample of 3 cases: s_ȳ = s_Y / √n = 0.58 / √3 = 0.33

When is the standard error small?

1. When there is little variability across cases in the scores within our one sample (i.e., when there is a small standard deviation).
2. When the sample size is large.

If you generate your sample using an appropriate method, you don’t have control over the amount of variability in your sample. You do, however, have control over your sample size. So if you want to have confidence in the estimate from your one sample, generate a large sample!

Notice that the sampling distribution of the means is more normally distributed than the distribution of scores in the population (to see this, compare the bar graphs). Also, notice that the variability in the sampling distribution (.61 or .33) is smaller than the variability in the scores in the population (1.05).

These basic properties are known as the central limit theorem (closely related to the law of large numbers): if large simple random samples are drawn from a population (with any distribution), then:

1. The sampling distribution of the mean will have a normal distribution.
2. The mean of the sampling distribution will be equal to the population mean.
3. The variance of the sampling distribution will be equal to the variance of the population divided by the sample size.
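A minimal simulation sketch (not from the notes, with an assumed uniform population and illustrative sample sizes) shows all three properties at once:

```python
import random
from statistics import mean, pstdev
from math import sqrt

# Draw many simple random samples from a decidedly non-normal (uniform)
# population and look at the distribution of the sample means.
random.seed(1)
population = [random.uniform(0, 10) for _ in range(10_000)]
n, trials = 100, 5_000

means = [mean(random.sample(population, n)) for _ in range(trials)]

# Mean of the sample means vs. population mean: nearly equal.
print(mean(population), mean(means))
# SD of the sample means vs. sigma / sqrt(n): nearly equal.
print(pstdev(population) / sqrt(n), pstdev(means))
```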

Let’s return to our basic dilemma: if sample estimates vary and if most estimates result in some degree of sampling error, how confident can we be in our estimate from the sample?

Estimation – Let's Put Standard Errors to Good Use

We often use a sample statistic as an estimate of the exact value of a population parameter (this is called a ‘point estimate’). For example, we can use the GSS sample data to calculate the mean number of hours people work in a typical week (ȳ = 41.9 hours). We could then use this as our exact estimate for the population mean.

The problem with this approach is that we don't know how accurate our estimate is (we haven't yet considered how much variability there might be in this estimate from sample to sample). Moreover, because of sampling error, it is not very likely that the true population mean is exactly equal to this value: the sample mean is the best single guess, but the chance that it is exactly correct is quite low (think of the area under any single point of a normal curve).

The second type of estimation is confidence interval estimation. A confidence interval is a range of values in which the population parameter is expected to fall. The width of the range, as we shall see, is determined by (1) the level of confidence we want to have in our estimate and (2) the standard error.

CI = ȳ ± (z × σ_ȳ)   or   CI = ȳ ± (z × s_ȳ)

In the social sciences, we are usually satisfied to be 95% certain that the confidence interval contains the true population parameter. If we want to reduce the risk of calculating an interval that does not contain the population parameter, then we need to increase the width of the confidence interval by selecting a larger z score.

To be 95% certain, we should use a z score of 1.96. To be 99% certain, we should use a z score of 2.58.

σ_ȳ = σ_Y / √n,   s_ȳ = s_Y / √n

A lot of variability from sample to sample in the estimate (represented by a large standard error) forces us to have a wider interval. If we want to increase the precision of the estimate (i.e., generate a narrower interval) without increasing the risk of being wrong, we need to generate a larger sample (this will reduce the standard error). So standard errors allow us to take uncertainty related to sampling error into account when making predictions about the population.
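Here is a minimal Python sketch (not from the notes) of a 95% confidence interval for a mean, using the GSS work-hours figures reported in the SPSS output below (ȳ = 41.90, s = 13.395, n = 1,818). Note that SPSS uses the t distribution, so its bounds differ very slightly from this z-based interval.

```python
from math import sqrt

ybar, s, n = 41.90, 13.395, 1818
z = 1.96                       # 95% confidence
se = s / sqrt(n)               # estimated standard error of the mean
lower, upper = ybar - z * se, ybar + z * se
print(round(se, 3), round(lower, 2), round(upper, 2))  # 0.314 41.28 42.52
```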

Estimation of Population Proportions

Here are the formulas for the confidence interval and standard error:

CI = p ± (z × σ_p)   or   CI = p ± (z × s_p)

σ_p = √[π(1 − π) / n],   s_p = √[p(1 − p) / n]

The standard error for the proportion describes variation in the estimate of the proportion from sample to sample. π is the population proportion and n is the number of cases in the sample. Since we don't know π, we will use p (the sample proportion) as our estimate of π.
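A minimal sketch (not from the notes, with hypothetical numbers: suppose 55% of 1,000 sampled respondents favor some policy) of a 95% confidence interval for a proportion:

```python
from math import sqrt

p, n = 0.55, 1000              # hypothetical sample proportion and size
z = 1.96                       # 95% confidence
se = sqrt(p * (1 - p) / n)     # estimated standard error of the proportion
lower, upper = p - z * se, p + z * se
print(round(se, 4), round(lower, 3), round(upper, 3))  # 0.0157 0.519 0.581
```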

Confidence Intervals in SPSS

Let's use SPSS and the 2000 General Social Survey to estimate the mean number of hours worked last week in the US. Let's estimate confidence intervals at the 95% level.

95% Confidence Level:

- Launch SPSS and open the data file.
- Click Analyze, Descriptive Statistics, and Explore.
- Insert work hours (HRS1) into the “Dependent List” box.
- Click Statistics and notice that the default setting for the confidence interval is 95%.
- Click Continue and OK.

Here is the output:

Case Processing Summary (HRS1, NUMBER OF HOURS WORKED LAST WEEK):
  Valid: N = 1,818 (64.5%); Missing: N = 999 (35.5%); Total: N = 2,817 (100.0%)

Descriptives (HRS1, NUMBER OF HOURS WORKED LAST WEEK):
  Mean: 41.90 (Std. Error = .314)
  95% Confidence Interval for Mean: Lower Bound = 41.28, Upper Bound = 42.51
  5% Trimmed Mean: 41.76
  Median: 40.00
  Variance: 179.430
  Std. Deviation: 13.395
  Minimum: 3; Maximum: 89; Range: 86
  Interquartile Range: 9.00
  Skewness: .212 (Std. Error = .057)
  Kurtosis: 1.668 (Std. Error = .115)

Our point estimate of the population parameter is 41.90 hours. We are 95% confident that the mean number of hours worked last week in the US population (in 2000) is between 41.28 hours and 42.51 hours. We are 95% confident because in 95 out of 100 samples, our confidence interval would contain the population parameter.

Remember that we will get different means and standard deviations in different samples due to sampling error. As a result, we will also get different estimates of the standard error and different confidence intervals. Despite this, 95 samples out of 100 (assuming 95% confidence) should yield a confidence interval that contains the population parameter.

Hypothesis Testing

We also use standard errors in hypothesis testing.

Is the mean number of work hours per week in the US population 40 hours per week?

Null hypothesis, H0: μ_Y = 40

Alternative (research) hypothesis, H1: μ_Y ≠ 40

Sample mean (ȳ) = 41.9; sample standard deviation (s_Y) = 13.395; sample size (valid n) = 1,818

s_ȳ = s_Y / √n = 13.395 / √1,818 = 0.314

z = (ȳ − μ_Y) / s_ȳ = (41.9 − 40.0) / 0.314 = 6.051

How often would we get a z score this large if the null hypothesis were true? p ≈ 0.0000000015, or about 1.5 times per billion trials (two-tailed).

We are either extraordinarily unlucky OR μ_Y = 40 is not a good assumption.

In this example, we used the standard error to take into account uncertainty in our estimate (from sampling error). If we believe that the estimate of the mean might vary quite a bit across samples (which would be represented by a large standard error), then the observed z score would be smaller. This would make it more difficult to reject the null hypothesis (because smaller observed z scores are closer to the middle of the distribution). In other words, by dividing by the standard error, we are able to take sampling error into account.
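Here is a minimal Python sketch (not from the notes) of the one-sample z test above, using math.erfc for the two-tailed p value:

```python
from math import erfc, sqrt

# Work-hours example: H0: mu = 40, with the sample mean, SD, and n
# taken from the GSS output above.
ybar, s, n, mu0 = 41.9, 13.395, 1818, 40.0

se = s / sqrt(n)                        # 0.314
z = (ybar - mu0) / se                   # ~6.05
p_two_tailed = erfc(abs(z) / sqrt(2))   # P(|Z| >= z) under H0
print(round(z, 3), p_two_tailed)        # ~6.05, ~1.5e-09
```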

One-sample t (and z) test in SPSS (you would get the same results listed above):

- Click Analyze, Compare Means, and One-Sample T Test.
- Insert work hours (HRS1) into the “Test Variable(s):” box.
- Insert the hypothesized value (in this example, 40) into the “Test Value:” box.
- Click OK.

In sum, just about every sample statistic that we will calculate has a standard error. All standard errors describe how much variability in the estimate there might be from sample to sample. We use this information to adjust our predictions – for example, by making our confidence intervals wider or our z scores smaller; in both cases, this means being more conservative in our predictions about the population.

