Outline Describing One Categorical Variable Relationships Between Categorical Variables
STAT 113 Describing Categorical Data I
Colin Reimer Dawson
Oberlin College
September 11, 2020
Outline
Describing One Categorical Variable
Relationships Between Categorical Variables
  Contingency Tables
  Conditional Proportions
Describing One Categorical Variable
A data frame with a single categorical variable
      Frequency
49    N/R
65    Daily
25    N/R
74    Weekly
18    Monthly
91    Monthly
47    Weekly
24    N/R
71    Monthly
37    Monthly
Table: Partial Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
Frequency Tables
How can we summarize a categorical variable? One option is simply a frequency table.
Daily   Weekly   Monthly   Semesterly   N/R   Total
    9       28        18           23    13      91
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
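To make the counting concrete, here is a minimal Python sketch (Python is only an illustrative choice; the slides do not tie the method to any software). It tallies the ten responses from the partial data frame above with the standard-library `collections.Counter`; the variable names are hypothetical.

```python
from collections import Counter

# The ten "Frequency" responses from the partial data frame
# (cases 49, 65, 25, 74, 18, 91, 47, 24, 71, 37, in that order).
responses = ["N/R", "Daily", "N/R", "Weekly", "Monthly",
             "Monthly", "Weekly", "N/R", "Monthly", "Monthly"]

# A frequency table is just the number of cases in each category.
freq = Counter(responses)

# Print in the same category order as the full table, plus a total row.
categories = ["Daily", "Weekly", "Monthly", "Semesterly", "N/R"]
for cat in categories:
    print(f"{cat:<11}{freq[cat]:>3}")  # a Counter returns 0 for unseen categories
print(f"{'Total':<11}{sum(freq.values()):>3}")
```

Applied to all 91 responses instead of these 10, the same tally would give the full frequency table.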
Relative Frequency Tables
If we use proportions or percentages, we have a relative frequency table.
Daily   Weekly   Monthly   Semesterly   N/R    Total
 0.10     0.31      0.20         0.25   0.14    1.00
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
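Each relative frequency is a category's count divided by the total number of cases (here 91). A short Python sketch, starting from the counts in the frequency table, reproduces the two-decimal values. Note that the unrounded proportions are not exactly these values; for example, 28/91 ≈ 0.308.

```python
# Counts from the frequency table of all 91 respondents.
counts = {"Daily": 9, "Weekly": 28, "Monthly": 18, "Semesterly": 23, "N/R": 13}
total = sum(counts.values())  # 91

# Relative frequency = count in category / total number of cases.
rel_freq = {cat: n / total for cat, n in counts.items()}

for cat, p in rel_freq.items():
    print(f"{cat:<11}{p:.2f}")
print(f"{'Total':<11}{sum(rel_freq.values()):.2f}")  # proportions sum to 1
```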
Pie Charts
The pie chart is a popular choice for proportion data...
Figure: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000). Slices: Daily (10%), Weekly (31%), Monthly (20%), Semesterly (25%), N/R (14%).
Pie Charts...
Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
• “Pie charts are the Nickelback of data visualization.”
• “Pie charts are the Aquaman of data visualization.”
The One Exception
Bar Charts
Bar charts make it much easier to see differences between categories.
Figure: Two bar plots of the video game data showing frequencies (left; y-axis: # of respondents) and percentages (right; y-axis: % of respondents) for Daily, Weekly, Monthly, Semesterly, and N/R
Bar Charts
What’s this bar chart telling us?
Figure: A Fair and Balanced Bar Chart (from FOX News, 8/9/12)
The Cardinal Rule of Bar Charts
The cardinal rule of bar charts: ratios in area must correspond to ratios in value.
• The y-axis must start at 0!
• Equal numerical differences must take up equal visual space.
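A little arithmetic shows why the y-axis must start at 0. If the axis starts at some baseline b instead, a bar for value v is drawn with height v − b, so the ratio of drawn heights is (v₁ − b)/(v₂ − b) rather than v₁/v₂. The Python sketch below uses made-up values (they are hypothetical, not the numbers from the FOX News chart) and a hypothetical helper `apparent_ratio`.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights when the y-axis starts at `baseline`.

    Each bar is drawn from `baseline` up to its value, so its visible
    height is value - baseline.
    """
    return (a - baseline) / (b - baseline)

# Hypothetical values, chosen only for illustration.
small, large = 35.0, 40.0

true_ratio = apparent_ratio(large, small)                 # axis starts at 0
shown_ratio = apparent_ratio(large, small, baseline=34.0)  # truncated axis

print(f"true ratio of values:       {true_ratio:.2f}")   # 1.14
print(f"ratio of bar heights drawn: {shown_ratio:.2f}")  # 6.00
```

A difference of about 14% in value is drawn as one bar six times taller than the other, which is exactly what the cardinal rule forbids.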
Relationships Between Categorical Variables
Contingency Tables
• With one categorical variable, summarize by counting the observations in each category.
• With more than one variable, count combinations of categories.
• With two variables, we can store the counts in a two-way table (also known as a contingency table).
A Simple Contingency Table
Student   Year   Computer
1         2nd    PC
2         3rd    Mac
3         3rd    PC
4         1st    PC
5         2nd    PC
6         1st    Mac
7         4th    Mac
8         2nd    PC

   ⇒

           Computer
Year       PC    Mac
1st         1      1
2nd         3      0
3rd         1      1
4th         0      1
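Building a contingency table amounts to counting each combination of categories. In this Python sketch, each student is a hypothetical (year, computer) pair taken from the data frame above, and `collections.Counter` counts how often each pair occurs.

```python
from collections import Counter

# (Year, Computer) for students 1-8, in order.
students = [("2nd", "PC"), ("3rd", "Mac"), ("3rd", "PC"), ("1st", "PC"),
            ("2nd", "PC"), ("1st", "Mac"), ("4th", "Mac"), ("2nd", "PC")]

# Count each (year, computer) combination.
table = Counter(students)

years = ["1st", "2nd", "3rd", "4th"]
computers = ["PC", "Mac"]

# Print as a two-way table: rows are years, columns are computers.
print(f"{'Year':<6}" + "".join(f"{c:>5}" for c in computers))
for y in years:
    print(f"{y:<6}" + "".join(f"{table[(y, c)]:>5}" for c in computers))
```

Combinations that never occur (such as 2nd-year Mac users) show up as 0 because a `Counter` returns 0 for missing keys.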
Proportions within a Context
Example: Driving While Black/Brown
Armentrout et al. (2007)¹ report data on traffic stops by the Los Angeles Police Department (LAPD). Two of the variables recorded are the race of the driver and whether or not the vehicle was searched.
Question of interest: Among stops, does the proportion resulting in a search differ by race of driver?
¹ Armentrout, M., Goodrich, A., Nguyen, J., Ortega, L., Smith, L., & Khadjavi, L.S. (2007). Cops and stops: Racial profiling and a preliminary statistics analysis of Los Angeles police department traffic stops and searches. Retrieved from http://www.public.asu.edu/ etcamach/AMSSI/reports/copsnstops.pdf
Proportions in a Context
Question of interest: Among stops, does the proportion resulting in a search differ by race of driver?
               Hisp./Lat.   White   Black   Asian   Others   Total
Searched              510     109     240      16        7     882
Not Searched         1826    2081    1008     486      104    5505
Total                2336    2190    1248     502      111    6387
On Your Own (3 min.):
• Identify the cases and the population the cases are drawn from.
• How would you address this question using this data?
Conditional Proportions
• We can group the cases according to one variable (e.g., driver race) and look at the distribution of the other (searched or not) within each group.
• The resulting proportions are called conditional proportions: they are computed within a context, i.e., only among the cases that satisfy a certain condition.
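In symbols, a conditional proportion divides a count by the size of the group defined by the condition, not by the overall total. Using the LAPD counts for Black drivers (the \hat{p} notation is our own shorthand, not from the source):

```latex
\hat{p}_{\text{searched} \mid \text{Black}}
  = \frac{\#\{\text{Black drivers searched}\}}{\#\{\text{Black drivers stopped}\}}
  = \frac{240}{1248} \approx 0.192,
\qquad
\hat{p}_{\text{searched}}
  = \frac{882}{6387} \approx 0.138 .
```

The first is conditional on driver race; the second is the overall (marginal) proportion, whose denominator is all 6387 stops.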
Conditional Proportions
                        Searched
Driver Race     Yes          No           Total
Hisp./Lat.      510/2336     1826/2336    2336
White           109/2190     2081/2190    2190
Black           240/1248     1008/1248    1248
Asian           16/502       486/502      502
Others          7/111        104/111      111
Total           882/6387     5505/6387    6387
Conditional Proportions
                        Searched
Driver Race     Yes      No       Total
Hisp./Lat.      0.218    0.782    2336
White           0.050    0.950    2190
Black           0.192    0.808    1248
Asian           0.032    0.968    502
Others          0.063    0.937    111
Total           0.138    0.862    6387
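The conditional proportions in this table can be reproduced from the raw counts. The Python sketch below divides each cell by its row total, i.e., it conditions on driver race; `conditional_proportions` is a hypothetical helper, not a library function.

```python
# Counts from the LAPD table: driver race -> (searched, not searched).
counts = {
    "Hisp./Lat.": (510, 1826),
    "White": (109, 2081),
    "Black": (240, 1008),
    "Asian": (16, 486),
    "Others": (7, 104),
}

def conditional_proportions(counts):
    """Within each row (driver race), divide each count by that row's total."""
    result = {}
    for race, (yes, no) in counts.items():
        row_total = yes + no
        result[race] = (yes / row_total, no / row_total, row_total)
    return result

props = conditional_proportions(counts)

# Overall row: pool all stops regardless of race.
total_yes = sum(yes for yes, _ in counts.values())
total_no = sum(no for _, no in counts.values())
grand_total = total_yes + total_no
props["Total"] = (total_yes / grand_total, total_no / grand_total, grand_total)

print(f"{'Driver Race':<12}{'Yes':>7}{'No':>7}{'Total':>7}")
for race, (p_yes, p_no, n) in props.items():
    print(f"{race:<12}{p_yes:>7.3f}{p_no:>7.3f}{n:>7}")
```

Dividing by column totals instead (e.g., 240/882) would condition on the search outcome and answer a different question: among searched vehicles, what proportion of drivers were Black?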
Vietnam War Opinions
From an October 2001 article in The Economist entitled “Treason of the Intellectuals?”:
“Back in Vietnam days, the anti-war movement spread from the intelligentsia into the rest of the population, eventually paralysing the country’s will to fight.”
Source: http://www.economist.com/node/806289
Vietnam War Opinions
January 1971 Gallup Poll:
“A proposal has been made in Congress to require the U.S. government to bring home all U.S. troops before the end of this year. Would you like to have your congressman vote for or against this proposal?”
Vietnam War Opinions
• Opinion proportions about Vietnam withdrawal, within education levels, based on a January 1971 Gallup poll.
                         Education Level
Opinion      Grade    High    College    Overall
“Dove”                                      0.73
“Hawk”                                      0.27
Table: Source: Gelman & Nolan, 2002
• To Ponder: What would you expect the proportions to look like, based on this story?