STATWAY™ STUDENT HANDOUT

Lesson 1.2.1 Random Sampling

STUDENT NAME DATE

INTRODUCTION

In Lesson 1.1.2, we learned that in order to generalize results from a sample to the population, the sample must be representative of the population.

1 In your own words, explain what you think it means for a sample to be representative of the population.

2 Suppose our college is thinking of ways to increase money in its budget. Since many students like parking spaces close to their classes, the administration is considering implementing a $100 fee per term for a reserved parking space. This would guarantee that the students who paid the fee would have a parking space. The college administration wants to know the percentage of students who would support such a fee.

One way to figure out the proportion of students that would support such a fee would be to conduct a census. A census is a survey of the entire population. The college would ask every student on campus if they would support the fee.

Would this be a reasonable plan? Explain why you think this plan is reasonable or not reasonable.

3 Read each of the following ways to obtain a sample of students at our college. For each method, do the following task(s):

© 2011 THE CARNEGIE FOUNDATION FOR THE ADVANCEMENT OF TEACHING A PATHWAY THROUGH STATISTICS, VERSION 1.5, STATWAY™ - STUDENT HANDOUT

Tell whether the method would produce a sample that is representative of the entire student population. If you think the sample will not be representative, explain how the sample would differ from the population.

A Choose four 8:00 a.m. classes at random. Survey all the students in each class.

B Put a poll on the front page of the college website. A poll is a survey usually with only one question to get people’s opinions. Use the students who answer the question as the sample.

C Stand at the entrance to the Student Center and survey students as they enter.

4 None of the above sampling methods produces a sample that is representative of the population of all the college’s students. What is a better method? Suggest a method to obtain a sample that is representative of all students at the college.

NEXT STEPS

In general, our goal when we select a sample is to give every individual in the population an equal chance of being selected, if possible.

One good way to do this is to select a simple random sample. In a simple random sample, every possible sample of the same size has the same chance of being chosen.

Often, we refer to a “simple random sample of size n.” Here, n means the number of subjects or individuals in the sample and is called the sample size. This helps researchers keep track of how many people or things are in the sample.
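As a small illustration of drawing a simple random sample of size n, here is a minimal sketch in Python. It assumes the population is available as a list; the ID numbers 1 through 20, the function name, and the seed are made up for this example and are not part of the lesson.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every group of n is equally likely."""
    rng = random.Random(seed)  # a seed is optional and only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: 20 students identified by ID numbers 1 to 20.
student_ids = list(range(1, 21))

# A simple random sample of size n = 5.
sample = simple_random_sample(student_ids, 5, seed=1)
print(sample)
```

Because the computer, not the researcher or the participants, decides who is chosen, no student is favored over another.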

5 We call a sample that leaves out certain members of a population and that is not representative a biased sample. The three sampling methods listed above all produced biased samples. There are different types of biased samples.

A One type of biased sample is a voluntary response sample. Good sampling involves the researcher selecting the sample. In a voluntary response sample, the participants are self-selected. In other words, the participants pick themselves to participate. Which sample from (3) above is a voluntary response sample?

B Another biased sample is a convenience sample. Convenience sampling does not use random selection and involves using an easily available or “convenient” group to form a sample. Which sample from (3) above is a convenience sample?

6 Suppose the college has 13,000 students. The college has the names and emails for all students in its database. Suggest a way that the administration could choose a simple random sample of 150 students to survey about the parking fee proposal. After they have chosen their sample, how could they actually conduct the survey?

7 When a researcher does not have a list of the population, it can be difficult or even impossible to get a simple random sample. In such cases, researchers must still try to obtain the sample at random.

Suppose that a writer for the student newspaper wants to do a survey of students about the parking fee proposal. Suggest a method that the writer might use to get a sample that is representative of the student population.

Another thing researchers use to help pick a sample is a good sampling frame. The sampling frame is the set of individuals from which the researcher chooses the sample. Ideally, the sampling frame is the entire population, but often some individuals are left out. For example, many researchers conduct surveys by telephone. The sampling frame consists of all individuals who have phones. Individuals without phones are left out. Since almost the entire population has a phone, this is not a big concern.

8 Look back at each of the sampling methods in question 3. Write down the sampling frame for each method. Helpful hint: think about the kind of students who might end up in the sample.

A Choose four 8:00 a.m. classes at random. Survey all the students in each class.

B Post a poll on the front page of the college website. Use the students who answer the question as the sample.

C Stand at the entrance to the Student Center and survey every 10th student as they enter.

INTRODUCTION

Not What You Might Think…

Researchers are interested in the proportion of students at a particular college who are registered to vote. There is a population of 5,000 students over age 18 who are enrolled at the college. Suppose 3,500 of these students are registered to vote and 1,500 are not registered. Think of the total population as made up of 5,000 values. Here is a special notation to show 3,500 values and 1,500 values, totaling 5,000 values.

In the notation, R stands for students who are registered to vote and N stands for students who are not registered.

Notice that the values in this population are not numbers. In our example, the values stand for individual students. One way to describe non-numerical data graphically is to use a bar chart. In a bar chart, each possible value in the data set (like registered to vote and not registered to vote) is represented by a bar. The height of the bar corresponds to the number of times that value appears in the data set.

For the student population in this study, you could represent the population using a bar chart with two bars, one for registered to vote and one for not registered to vote. See the bar chart below.

Another common way to create a bar chart is to use proportions. A proportion is a decimal between 0 and 1. You compute a proportion by dividing the number of individuals in a particular category by the total number of individuals.

For example, there are 3,500 students who are registered to vote out of 5,000 total students, so the proportion of students who are registered to vote is:

3,500 / 5,000 = 0.70

The proportion for the Not Registered category is:

1,500 / 5,000 = 0.30
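The two category proportions can be checked with a few lines of Python. This is only a sketch; the dictionary and its category names are chosen here for illustration, while the counts come from the example above.

```python
# Counts from the example: 3,500 registered and 1,500 not registered.
counts = {"Registered": 3500, "Not Registered": 1500}

# Total number of individuals in the population.
total = sum(counts.values())

# Proportion = number in the category divided by the total number.
proportions = {category: count / total for category, count in counts.items()}

for category, proportion in proportions.items():
    print(f"{category}: {proportion:.2f}")
# Prints:
# Registered: 0.70
# Not Registered: 0.30
```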

A bar chart using the category proportions is shown here:

Now, we will consider what happens when you take random samples from this population. A computer simulation was used to generate a simple random sample of students. From this sample of students, 32 were registered to vote and 18 were not registered to vote.
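The handout does not say which software produced the simulation. As one possible sketch, the following Python code builds the population of 5,000 values and draws a single simple random sample of 50. The seed value is an arbitrary assumption, so the counts it produces will generally differ from the 32 and 18 reported above.

```python
import random

# Population from the example: 3,500 registered (R) and 1,500 not registered (N).
population = ["R"] * 3500 + ["N"] * 1500

rng = random.Random(2011)  # arbitrary seed, used only so the run can be repeated

# Draw 50 students without replacement; every group of 50 is equally likely.
sample = rng.sample(population, 50)

registered_count = sample.count("R")
not_registered_count = sample.count("N")
sample_proportion = registered_count / len(sample)

print(registered_count, not_registered_count, sample_proportion)
```

Changing the seed gives a different sample, which is the point of the questions that follow.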

9 The computer simulation generated a sample of 50 students. Answer these questions about the proportion of Registered students in this sample:

What proportion of students in the sample is registered to vote?

How does this proportion compare to the actual registered population proportion of 0.70? Are these values equal?

Does this surprise you? Why or why not?

We can create bar charts for both population data and sample data. A bar chart for the simple random sample is shown here:

10 What differences do you notice between this bar chart and the bar chart of the population?

Computer simulation was used to create two sets of simple random samples:

For the first set, the computer created one hundred simple random samples of size 50. This means that the computer drew a simple random sample of 50 students one hundred times. The computer took these simple random samples from the population of 5,000 students.

For the second set, the computer created one hundred simple random samples of size 100.

The proportion of students who were registered to vote was computed for each sample in each set. These sample proportions were used to construct the following dotplots.
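The repeated sampling described here can be sketched in Python as follows. The function name and the seed are assumptions made for this illustration; the population, the two sample sizes, and the one hundred repetitions come from the text.

```python
import random

# Population from the example: 3,500 registered (R) and 1,500 not registered (N).
population = ["R"] * 3500 + ["N"] * 1500

def sample_proportions(population, sample_size, num_samples, rng):
    """Return the proportion of R values in each of num_samples simple random samples."""
    proportions = []
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # one simple random sample
        proportions.append(sample.count("R") / sample_size)
    return proportions

rng = random.Random(0)  # arbitrary seed for repeatability

props_50 = sample_proportions(population, 50, 100, rng)    # first set
props_100 = sample_proportions(population, 100, 100, rng)  # second set

# Each list holds one hundred sample proportions, one dot per value in a dotplot.
print(sorted(props_50))
print(sorted(props_100))
```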

11 Which of the dotplots do you think is for the samples of size 50 and which do you think is for samples of size 100? Explain your answer.

12 How can you see the population proportion of 0.70 in the two dotplots?

When researchers use a sample size of 500 to estimate a population proportion, the estimates they get are about equally accurate whether the population size is 10,000, 100,000, or 1,000,000. To show this, we will analyze the following three populations:

Population A: 10,000 people, 7,000 are registered to vote

Population B: 100,000 people, 70,000 are registered to vote

Population C: 1,000,000 people, 700,000 are registered to vote

13 What proportion is registered to vote in each of the three populations?

Population A:

Population B:

Population C:

Computer simulation was used to select one hundred different simple random samples of size 500 from Population A, and the proportion registered to vote was calculated for each sample. The researchers used the one hundred sample proportions to make the dotplot labeled “Population Size 10,000” in the following graph.

The researchers repeated the process for the other two populations. They took one hundred simple random samples each of size 500 for Population B and Population C.
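A sketch of this process in Python is shown below. It is not the researchers' actual program. To avoid building a list of one million values, it assumes each person is labeled with a number from 0 up to one less than the population size, and treats the lowest labels as the registered voters. The function name, dictionary, and seed are illustrative choices.

```python
import random

def sample_proportion(population_size, num_registered, sample_size, rng):
    """Proportion registered in one simple random sample from a labeled population."""
    # Labels 0 .. num_registered - 1 stand for registered voters (an assumption
    # of this sketch); the remaining labels stand for people not registered.
    chosen = rng.sample(range(population_size), sample_size)
    registered = sum(1 for person in chosen if person < num_registered)
    return registered / sample_size

# Population name -> (population size, number registered), taken from the text.
populations = {
    "A": (10_000, 7_000),
    "B": (100_000, 70_000),
    "C": (1_000_000, 700_000),
}

rng = random.Random(0)  # arbitrary seed for repeatability

# One hundred samples of size 500 from each population.
results = {
    name: [sample_proportion(size, registered, 500, rng) for _ in range(100)]
    for name, (size, registered) in populations.items()
}

for name, props in results.items():
    print(name, min(props), max(props))
```

The smallest and largest sample proportions printed for each population give a rough sense of how spread out each dotplot would be.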

[Figure: three dotplots of sample proportions, labeled “Population Size 10,000,” “Population Size 100,000,” and “Population Size 1,000,000.” Horizontal axis: Proportion Registered to Vote, from 0.56 to 0.80.]

14 We saw in the previous example that a larger sample size gives more accurate results. All of these dotplots are based on the same sample size of 500. Does the population size seem to matter in terms of the accuracy of the results? Explain your reasoning.

TAKE IT HOME

1 Imagine that you want to learn about the average number of hours students at your college spend online in a typical day. You want to select a simple random sample of 75 students from the full-time students at your college. You have a list of all full-time students, whose names are arranged in alphabetical order.

How would you select a simple random sample of 75 students from this population? Describe your process.

2 You want to estimate the average amount of time that students at a particular college spend studying in a typical week. Which of the following sampling methods do you recommend? For each method, explain why you did or did not select it as the recommended method.

Method A: Select 50 students at random from the students at the college.

Method B: Select 200 students at random from the students at the college.

Method C: Select 100 students as they enter the library.

Method D: Select the 300 students enrolled in English literature at the college this semester.

3 In California, a group of people tried to get an initiative voted on by the people of the state. The initiative proposed adding “none of the above” to the list of candidates running for political office. This would mean that voters in California could pick “none of the above” when they did not want to vote for any of the candidates running. The majority of voters in California would have to vote “yes” for the initiative to pass and become law.

A newspaper conducted a poll to see if enough people would vote “yes” to have the initiative pass. The results of the poll were:

55% were against the initiative and would vote “no.”

45% were for the initiative and would vote “yes.”

A spokesperson for the group trying to get the initiative passed questioned the result of the poll because it was based on a random sample of 1,000 registered voters in California. He commented that because the population of California is so large, the sample size of 1,000 was not large enough. There are about 23.5 million registered voters in California. The spokesperson said that a sample of only 1,000 voters could not possibly provide an accurate estimate of the proportion of voters in California who support the ballot initiative. (Associated Press, January 30, 2000)

A Is his criticism valid or true? Explain why or why not.

B Would the criticism be valid if this had been a national initiative and 1,000 people were randomly selected from all adult Americans who were registered to vote?

+++++

This lesson is part of STATWAY™, A Pathway Through College Statistics, which is a product of a Carnegie Networked Improvement Community that seeks to advance student success. Version 1.0, A Pathway Through Statistics, Statway™ was created by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. Version 1.5 and all subsequent versions result from the continuous improvement efforts of the Carnegie Networked Improvement Community. The network brings together community college faculty and staff, designers, researchers, and developers. It is an open-resource research and development community that seeks to harvest the wisdom of its diverse participants in systematic and disciplined inquiries to improve developmental mathematics instruction. For more information on the Statway Networked Improvement Community, please visit carnegiefoundation.org. For the most recent version of instructional materials, visit Statway.org/kernel.

+++++

STATWAY™ and the Carnegie Foundation logo are trademarks of the Carnegie Foundation for the Advancement of Teaching. A Pathway Through College Statistics may be used as provided in the CC BY license, but neither the Statway trademark nor the Carnegie Foundation logo may be used without the prior written consent of the Carnegie Foundation.
