Central Limit Theorem

Understanding the central limit theorem is critical for understanding hypothesis tests of means. The central limit theorem states that when an infinite number of successive random samples are taken from a population, the sampling distribution of the means of those samples will become approximately normally distributed with mean µ and standard deviation σ/√N [~N(µ, σ/√N)] as the sample size becomes larger, irrespective of the shape of the population distribution.

The following tutorial demonstrates the different components of the central limit theorem.

First, let’s choose our population.

This is a population in which each of the five possible scores has the same probability of occurrence. This is called a uniform distribution. The scores in the population range from 1 to 5. The mean of the population is 3.0 and the standard deviation of the population is 1.41 [µ=3.0; σ=1.41].
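The population parameters quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the population can be represented by the five equally likely scores 1 through 5, and the variable names are illustrative.

```python
import statistics

# The population described above: scores 1 through 5, each equally likely.
population = [1, 2, 3, 4, 5]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population (not sample) standard deviation

print(f"mu = {mu}, sigma = {sigma:.2f}")
```

Note that pstdev divides by the number of scores rather than by one less than that number, which is the appropriate choice when the list is the whole population.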

Our next step is to choose successive samples from this population. All samples must have the same sample size. Let’s start small and choose samples of 5.

The distribution of scores in each of these samples is presented below. The red arrow indicates the mean of the sample. As you can see, the sample mean is not the same for these samples. Let’s choose some more samples.
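Drawing samples like these can be sketched in Python. The example assumes sampling with replacement from the five equally likely scores; the seed and the helper name draw_sample are arbitrary choices made for illustration, so the scores it produces will not match the samples pictured in the tutorial.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
population = [1, 2, 3, 4, 5]

def draw_sample(n=5):
    """Draw n scores from the population, with replacement (an assumption)."""
    return rng.choices(population, k=n)

# Three samples of size 5; their means will generally differ from one another.
samples = [draw_sample() for _ in range(3)]
for s in samples:
    print(s, "mean =", statistics.mean(s))
```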

Again, we can see that the distribution is different for each sample and that the means are not the same. Let’s choose some more samples.

We have now taken 25 samples (N=5) from the population and have calculated a mean for each sample. Just as we graphed the distribution of individual scores in each of the samples, we can plot the distribution of the sample means. The distribution of our 25 sample means is presented below.

We can see that this distribution looks different from the population distribution. The sample means do not have an equal probability of occurrence. In fact, the distribution shows us that most of the sample means cluster around a score of 3.0. We can describe the central tendency and variability of distributions of sample means, just as we can for distributions of raw data. The mean of these sample means is 3.14 and the standard deviation of these sample means is .59. The mean is very close to the population mean. The standard deviation, however, is much smaller than the population standard deviation of 1.41. The range of sample means is also smaller than the range of scores in the population.
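The same experiment can be simulated as a rough check. The sketch below assumes sampling with replacement and uses a hypothetical helper, sample_means, with an arbitrary seed. Because the draws are random, the output will not reproduce the 3.14 and .59 reported above, but it should show the same pattern: a mean near 3.0 and a spread well below 1.41. It uses the sample standard deviation (statistics.stdev); the tutorial does not say which formula it used.

```python
import random
import statistics

def sample_means(num_samples, n, seed=0):
    """Return the means of num_samples random samples of size n."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    population = [1, 2, 3, 4, 5]
    return [sum(rng.choices(population, k=n)) / n for _ in range(num_samples)]

means = sample_means(num_samples=25, n=5)

print("mean of sample means: ", round(statistics.mean(means), 2))
print("sd of sample means:   ", round(statistics.stdev(means), 2))
print("range of sample means:", min(means), "to", max(means))
```

Calling sample_means(100, 5) instead gives the 100-sample version of the experiment.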

Let’s see what happens to our distribution of sample means if we take 75 more samples of 5 from the population. This time, we won’t plot all 100 of our sample distributions; we will just graph the values of the sample means. As you can see, the distribution of sample means is almost normally distributed. The mean of this distribution of sample means is 3.0 and the standard deviation is ##. Even with a very small sample size, our distribution of sample means becomes approximately normally distributed with a mean of 3.0 and a standard deviation very close in value to σ/√N (1.41/√5 = .6306).
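The arithmetic behind σ/√N is easy to verify. The short sketch below computes it twice: once with the rounded σ = 1.41 used in the text, and once with the unrounded standard deviation of the scores 1 to 5, which is √2. The variable names are illustrative.

```python
import math

n = 5                        # sample size
sigma_rounded = 1.41         # value quoted in the text
sigma_exact = math.sqrt(2)   # unrounded sd of the scores 1..5

se_rounded = sigma_rounded / math.sqrt(n)
se_exact = sigma_exact / math.sqrt(n)

print(round(se_rounded, 4), round(se_exact, 4))  # prints 0.6306 0.6325
```

The small gap between the two results comes only from rounding σ before dividing.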

If we were to take an infinite number of samples from the population, the sampling distribution would be approximately normally distributed as presented below. The mean of this sampling distribution is equal to 3.0 and the standard deviation is .6306.
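For a population this small, the limiting sampling distribution does not have to be approximated by simulation: every possible sample of 5 can be listed. The sketch below assumes sampling with replacement, so that all 5^5 = 3,125 ordered samples are equally likely. It computes the exact mean and standard deviation of the sample means and prints a rough text histogram. With unrounded σ the standard deviation comes out near .6325; the .6306 in the text reflects rounding σ to 1.41 first.

```python
import itertools
import math
from collections import Counter

population = [1, 2, 3, 4, 5]
n = 5

# Mean of every equally likely ordered sample of size n (5**5 = 3125 samples).
all_means = [sum(s) / n for s in itertools.product(population, repeat=n)]

mean_of_means = sum(all_means) / len(all_means)
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in all_means) / len(all_means)
)
print(f"mean = {mean_of_means:.4f}, sd = {sd_of_means:.4f}")

# Text histogram: one '#' per 10 samples that share the same mean.
counts = Counter(all_means)
for value in sorted(counts):
    print(f"{value:.1f} {'#' * (counts[value] // 10)}")
```

The histogram is symmetric and peaks at 3.0, which is the bell-like shape the tutorial describes.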