Handout 6: Summarizing Categorical Data with Counts and Relative Frequencies STAT 100 – Spring 2016
In this handout, we will consider some methods that are appropriate for summarizing data measured on one or more categorical variables.

Definition

Categorical Data: A set of measurements, each of which takes a value from one of several possible categories.

Example: Poll on Immigration Reform

Between January 31 and February 2, 2014, a CNN/ORC opinion poll surveyed a random sample of 1,010 adults nationwide. Respondents were asked the following question:

“What should be the main focus of the U.S. government in dealing with the issue of illegal immigration: developing a plan that would allow illegal immigrants who have jobs to become legal U.S. residents, or developing a plan for stopping the flow of illegal immigrants into the U.S. and for deporting those already here?”

FREQUENCIES AND RELATIVE FREQUENCIES

In this example, one categorical variable of interest is the opinion of each respondent (i.e., whether they answered Legal residency, Stop flow/deport, or Unsure). Our first step in organizing these data is to count how often each value occurs. These counts are shown in the following table.

                    Count
Legal residency       548
Stop flow/deport      412
Unsure                 50
Total               1,010

Next, instead of reporting the number of responses that fell in each category, calculate the percentage of responses in each category. Find these percentages and put them in the table below.

                    Count   Percentage
Legal residency       548   __________
Stop flow/deport      412   __________
Unsure                 50   __________
Total               1,010   __________

Counts and percentages of this kind have formal names in statistics:

Definitions

Frequency: This is the number of times that a value occurs in a data set (i.e., the count).

Relative frequency: This is the percentage of all observations in the data set that take on a particular value.
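The definitions above can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original handout; the names `counts` and `relative_frequencies` are illustrative assumptions. It divides each frequency by the total number of observations and expresses the result as a percentage.

```python
# Frequencies (counts) taken from the opinion table in this handout.
counts = {"Legal residency": 548, "Stop flow/deport": 412, "Unsure": 50}


def relative_frequencies(freqs):
    """Return each category's share of all observations, as a percentage.

    `freqs` maps a category name to its frequency (count).
    This helper is a hypothetical name chosen for illustration.
    """
    total = sum(freqs.values())
    return {category: 100 * n / total for category, n in freqs.items()}


percentages = relative_frequencies(counts)

# Print one row per category: name, frequency, relative frequency.
for category, pct in percentages.items():
    print(f"{category:<18}{counts[category]:>6}{pct:>8.1f}%")
```

Running the sketch prints one row per category, which can be compared against the percentages you calculated by hand. Because each percentage is rounded to one decimal place for display, the printed values may not add up to exactly 100.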
Question: A recent blog posting on CNN.com was titled “CNN Poll: Pathway to citizenship trumps border security.” The first sentence of this article reads, “Americans overwhelmingly favor a bill that would give most undocumented immigrants a pathway towards citizenship, according to a new national poll.” Based on the data you have summarized above, do you agree or disagree with the author? Explain your reasoning.

BIVARIATE FREQUENCY TABLES

Investigations such as this often become much more interesting when more than one variable is considered. For example, in the CNN/ORC opinion poll, respondents were also asked a question regarding their political affiliation. To summarize the results of both variables (i.e., opinion and political affiliation), we can use a bivariate frequency table.

              Legal residency   Stop flow/deport   Unsure   Totals
Democrat                  202                 79       12      293
Independent               267                189       29      485
Republican                 79                144        9      232
Totals                    548                412       50    1,010

This table gives us information on the joint counts of each category. For example, 202 Democrats answered that Legal residency should be the main focus of the U.S. government in dealing with the issue of illegal immigration. We can use this table to calculate many different types of percentages, as shown in the following questions.

Questions:

1. What percentage of all respondents were Democrats that answered Legal residency?
2. What percentage of all respondents who answered Legal residency were Democrats?
3. What percentage of all Democrats answered Legal residency?
4. What was different in the calculation of each of the above percentages?

CONDITIONAL RELATIVE FREQUENCIES

Suppose your goal is to look for differences in opinions across political affiliations.
In particular, you want to address the following question: Do more Democrats answer Legal residency than Republicans?

              Legal residency   Stop flow/deport   Unsure   Totals
Democrat                  202                 79       12      293
Independent               267                189       29      485
Republican                 79                144        9      232
Totals                    548                412       50    1,010

Knowing how to pull the correct information from the above table in order to answer this question is a very important skill.

Questions:

1. Your friend Angie argues that the following information from the table is most useful for investigating the question of interest: “There were 202 Democrats that answered Legal residency, and only 79 Republicans that gave this answer. So, obviously more Democrats than Republicans feel this way.” Do you agree or disagree with Angie’s reasoning? Explain.

2. Your friend Samantha says that you should be looking at the following percentages. “Of those that answered Legal residency, 202/548 = 37% were Democrat. On the other hand, only 79/548 = 14% were Republican. So, obviously more Democrats than Republicans feel that the main focus should be on legal residency.” Do you agree or disagree with Samantha’s reasoning? Explain.

3. Your friend Paula says that you should be looking at the following percentages. “Of all Democrats, 202/293 = 69% answered Legal residency. On the other hand, only 79/232 = 34% of Republicans answered this way. So, obviously more Democrats than Republicans feel that the main focus should be on legal residency.” Do you agree or disagree with Paula’s reasoning? Explain.

4. Which of your friends is most correct: Angie, Samantha, or Paula?
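The percentages quoted by Samantha and Paula share a numerator and differ only in the denominator. The following Python sketch, which is not part of the original handout, makes that choice explicit. The function names (`percent_of_all`, `percent_of_column`, `percent_of_row`) and the nested-dictionary layout are assumptions made for illustration.

```python
# Bivariate frequency table from this handout: party -> opinion -> count.
table = {
    "Democrat":    {"Legal residency": 202, "Stop flow/deport": 79,  "Unsure": 12},
    "Independent": {"Legal residency": 267, "Stop flow/deport": 189, "Unsure": 29},
    "Republican":  {"Legal residency": 79,  "Stop flow/deport": 144, "Unsure": 9},
}


def grand_total(t):
    """Number of respondents in the whole table."""
    return sum(sum(row.values()) for row in t.values())


def row_total(t, party):
    """Number of respondents with the given political affiliation."""
    return sum(t[party].values())


def column_total(t, opinion):
    """Number of respondents who gave the given answer."""
    return sum(row[opinion] for row in t.values())


def percent_of_all(t, party, opinion):
    """Joint count divided by the grand total."""
    return 100 * t[party][opinion] / grand_total(t)


def percent_of_column(t, party, opinion):
    """Joint count divided by the column total (Samantha's denominator)."""
    return 100 * t[party][opinion] / column_total(t, opinion)


def percent_of_row(t, party, opinion):
    """Joint count divided by the row total (Paula's denominator)."""
    return 100 * t[party][opinion] / row_total(t, party)


for party in ("Democrat", "Republican"):
    print(party)
    print(f"  of all respondents:       {percent_of_all(table, party, 'Legal residency'):.1f}%")
    print(f"  of Legal residency column: {percent_of_column(table, party, 'Legal residency'):.1f}%")
    print(f"  of {party} row: {percent_of_row(table, party, 'Legal residency'):.1f}%")
```

Keeping one function per denominator is a deliberate choice in this sketch: the same joint count, 202, yields three different percentages depending on which total it is divided by.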
Finally, for illustrative purposes, suppose that the results from the opinion poll had ended up as follows:

              Legal residency   Stop flow/deport   Unsure   Totals
Democrat                  202                 79       12      293
Independent               267                189       29      485
Republican                237                432       27      696
Totals                    706                700       68    1,474

Next, you will investigate each of your friends’ arguments with the above data.

Questions:

1. Your friend Angie argues that the following information from the table is most useful for investigating the question of interest: “There were 202 Democrats that answered Legal residency, and 237 Republicans that gave this answer. So, obviously more Republicans than Democrats feel this way.” Do you agree or disagree with Angie’s reasoning? Explain.

2. Your friend Samantha says that you should be looking at the following percentages. “Of those that answered Legal residency, 202/706 = 29% were Democrat. On the other hand, 237/706 = 34% were Republican. So, obviously more Republicans than Democrats feel that the main focus should be on legal residency.” Do you agree or disagree with Samantha’s reasoning? Explain.

3. Your friend Paula says that you should be looking at the following percentages. “Of all Democrats, 202/293 = 69% answered Legal residency. On the other hand, only 237/696 = 34% of Republicans answered this way. So, obviously more Democrats than Republicans feel that the main focus should be on legal residency.” Do you agree or disagree with Paula’s reasoning? Explain.

4. Which of your friends is most correct: Angie, Samantha, or Paula?
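One way to examine the three arguments is to compute each friend's summary for both versions of the table. The following Python sketch is not part of the original handout; the names `actual`, `hypothetical`, and `legal_residency_summary` are assumptions made for illustration. It is a tool for checking your own reasoning, not an answer key.

```python
# Column order used in every row of the tables below.
OPINIONS = ("Legal residency", "Stop flow/deport", "Unsure")
LEGAL = OPINIONS.index("Legal residency")

# Counts from the original poll table in this handout.
actual = {
    "Democrat": (202, 79, 12),
    "Independent": (267, 189, 29),
    "Republican": (79, 144, 9),
}

# Counts from the illustrative table in this handout (only the Republican row differs).
hypothetical = {
    "Democrat": (202, 79, 12),
    "Independent": (267, 189, 29),
    "Republican": (237, 432, 27),
}


def legal_residency_summary(table, party):
    """Return (count, % of Legal residency column, % of party row) for one party."""
    count = table[party][LEGAL]
    column_total = sum(row[LEGAL] for row in table.values())
    row_total = sum(table[party])
    return count, 100 * count / column_total, 100 * count / row_total


for label, table in (("actual", actual), ("hypothetical", hypothetical)):
    for party in ("Democrat", "Republican"):
        count, col_pct, row_pct = legal_residency_summary(table, party)
        print(f"{label:<13}{party:<11} count={count:>3}  "
              f"of column={col_pct:5.1f}%  of row={row_pct:5.1f}%")
```

The three returned values correspond, in order, to the quantities used by Angie (the count), Samantha (the share of the Legal residency column), and Paula (the share of the party's own row). Comparing the printed rows for the two tables shows which of these summaries depend on how many respondents from each party happened to be surveyed.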