
Outline Describing One Categorical Variable Relationships Between Categorical Variables

STAT 113 Describing Categorical I

Colin Reimer Dawson

Oberlin College

September 11, 2020


Outline

Describing One Categorical Variable

Relationships Between Categorical Variables
    Contingency Tables
    Conditional Proportions


A data frame with a single categorical variable

      Frequency
49    N/R
65    Daily
25    N/R
74    Weekly
18    Monthly
91    Monthly
47    Weekly
24    N/R
71    Monthly
37    Monthly

Table: Partial Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)

Frequency Tables

How can we summarize a categorical variable? One option is simply a frequency table.

Daily   Weekly   Monthly   Semesterly   N/R   Total
    9       28        18           23    13      91

Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
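As a minimal sketch of how a frequency table is built, the Python snippet below tallies only the ten responses shown in the partial data frame earlier, so its counts are smaller than those in the full table. The variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# The ten responses from the partial data frame (NOT the full survey of 91).
responses = ["N/R", "Daily", "N/R", "Weekly", "Monthly",
             "Monthly", "Weekly", "N/R", "Monthly", "Monthly"]

# Counter tallies how many times each category occurs.
frequency_table = Counter(responses)

# Print one row per category; a category that never occurs counts as 0.
for category in ["Daily", "Weekly", "Monthly", "Semesterly", "N/R"]:
    print(f"{category:<12}{frequency_table[category]:>3}")
print(f"{'Total':<12}{sum(frequency_table.values()):>3}")
```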


Relative Frequency Tables

If we use proportions or percentages, we have a relative frequency table.

Daily   Weekly   Monthly   Semesterly   N/R     Total
0.100   0.310    0.200     0.250        0.140   1.000

Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
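A relative frequency is just a count divided by the total. The sketch below, in Python, applies that to the counts from the frequency table; the variable names are assumptions made for illustration. Rounding to two decimal places appears to match the values shown above.

```python
# Counts from the frequency table of video game playing.
counts = {"Daily": 9, "Weekly": 28, "Monthly": 18, "Semesterly": 23, "N/R": 13}

total = sum(counts.values())  # number of respondents

# Relative frequency = category count / total count.
relative = {category: n / total for category, n in counts.items()}

for category, proportion in relative.items():
    print(f"{category:<12}{proportion:.2f}")
print(f"{'Total':<12}{sum(relative.values()):.2f}")
```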


Pie Charts

The pie chart is a popular choice for proportion data...

[Pie chart with slices: Weekly (31%), Daily (10%), Monthly (20%), N/R (14%), Semesterly (25%)]

Figure: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)


Pie Charts...

Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6

• “Pie charts are the Nickelback of data visualization.”
• “Pie charts are the Aquaman of data visualization.”

The One Exception


Bar Charts

Much easier to see differences between categories.

[Two bar plots over the categories Daily, Weekly, Monthly, Semesterly, N/R; y-axes: # of respondents (left) and % of respondents (right), each running from 0 to 40]

Figure: Two bar plots of Video Game data showing frequency (left) and percentages (right)

Bar Charts

What’s this telling us?

Figure: A Fair and Balanced Bar Chart (from FOX News, 8/9/12)

The Cardinal Rule of Bar Charts

The cardinal rule of bar charts: ratios in area must correspond to ratios in value.

• The y-axis must start at 0!
• Equal visual space for equal numerical differences
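To see why the axis must start at 0, the Python sketch below compares the drawn height ratio of two equal-width bars when the y-axis starts at 0 versus at some other baseline. It uses the Weekly (28) and Daily (9) counts from the video game survey. The function name `visual_ratio` and the baseline of 5 are hypothetical choices for illustration only.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the y-axis starts at `baseline`.

    For bars of equal width, the ratio of heights equals the ratio of areas.
    """
    return (a - baseline) / (b - baseline)

weekly, daily = 28, 9  # counts from the frequency table

# Axis starting at 0: the drawn ratio equals the true ratio of the values.
print(f"true ratio:      {weekly / daily:.2f}")
print(f"axis from 0:     {visual_ratio(weekly, daily):.2f}")

# Axis starting at a hypothetical baseline of 5: the drawn ratio is exaggerated.
print(f"axis from 5:     {visual_ratio(weekly, daily, baseline=5):.2f}")
```

With the truncated axis, the Weekly bar is drawn nearly six times as tall as the Daily bar, although the count is only about three times as large.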



Contingency Tables

• With one categorical variable, summarize by counting the observations in each category.
• With more than one variable, count combinations.
• With two variables, we can store the counts in a two-way table (also known as a contingency table).

A Simple Contingency Table

Student   Year   Computer
1         2nd    PC
2         3rd    Mac
3         3rd    PC
4         1st    PC
5         2nd    PC
6         1st    Mac
7         4th    Mac
8         2nd    PC

⇒

             Computer
             PC   Mac
Year  1st     1     1
      2nd     3     0
      3rd     1     1
      4th     0     1
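Going from the case-by-case data to the two-way table amounts to counting each (Year, Computer) combination. A minimal Python sketch of that step, using the eight students above (the variable names are assumptions for illustration):

```python
from collections import Counter

# One (year, computer) pair per student, in student order 1 through 8.
students = [("2nd", "PC"), ("3rd", "Mac"), ("3rd", "PC"), ("1st", "PC"),
            ("2nd", "PC"), ("1st", "Mac"), ("4th", "Mac"), ("2nd", "PC")]

# Count how many students fall into each combination of categories.
cell_counts = Counter(students)

years = ["1st", "2nd", "3rd", "4th"]
computers = ["PC", "Mac"]

# Print the contingency table: one row per year, one column per computer type.
print(f"{'Year':<6}" + "".join(f"{c:>5}" for c in computers))
for year in years:
    row = "".join(f"{cell_counts[(year, c)]:>5}" for c in computers)
    print(f"{year:<6}{row}")
```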


Proportions within a Context

Example: Driving While Black/Brown

Armentrout et al. (2007)¹ report data on traffic stops by the Los Angeles Police Department (LAPD). Two of the variables recorded are the race of the driver and whether or not the vehicle was searched.

Question of interest: Among traffic stops, does the proportion that result in a search differ by the race of the driver?

¹ Armentrout, M., Goodrich, A., Nguyen, J., Ortega, L., Smith, L., & Khadjavi, L.S. (2007). Cops and stops: Racial profiling and a preliminary analysis of Los Angeles police department traffic stops and searches. Retrieved from http://www.public.asu.edu/ etcamach/AMSSI/reports/copsnstops.pdf

Proportions in a Context

Question of interest: Among traffic stops, does the proportion that result in a search differ by the race of the driver?

               Hisp./Lat.   White   Black   Asian   Others   Total
Searched              510     109     240      16        7     882
Not Searched         1826    2081    1008     486      104    5505
Total                2336    2190    1248     502      111    6387

On Your Own (3 min.):
• Identify the cases and the population the cases are drawn from.
• How would you address this question using this data?


Conditional Proportions

               Hisp./Lat.   White   Black   Asian   Others   Total
Searched              510     109     240      16        7     882
Not Searched         1826    2081    1008     486      104    5505
Total                2336    2190    1248     502      111    6387

• We can group the cases according to one variable (e.g., driver race), and look at the distribution of the other (searched or not) within each group.
• The resulting proportions are called conditional proportions: proportions are computed within a context, i.e., cases that satisfy a certain condition.

Conditional Proportions

                              Searched
                          Yes          No           Total
Driver Race  Hisp./Lat.   510/2336     1826/2336    2336
             White        109/2190     2081/2190    2190
             Black        240/1248     1008/1248    1248
             Asian        16/502       486/502      502
             Others       7/111        104/111      111
             Total        882/6387     5505/6387    6387

Conditional Proportions

                              Searched
                          Yes      No       Total
Driver Race  Hisp./Lat.   0.218    0.782    2336
             White        0.050    0.950    2190
             Black        0.192    0.808    1248
             Asian        0.032    0.968    502
             Others       0.063    0.937    111
             Total        0.138    0.862    6387
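The conditional proportions can be reproduced by dividing each group's searched count by that group's own total, rather than by the grand total. The Python sketch below does this with the counts from the LAPD contingency table; the dictionary and variable names are assumptions for illustration.

```python
# Counts from the contingency table of LAPD traffic stops.
searched = {"Hisp./Lat.": 510, "White": 109, "Black": 240, "Asian": 16, "Others": 7}
not_searched = {"Hisp./Lat.": 1826, "White": 2081, "Black": 1008, "Asian": 486, "Others": 104}

# Proportion searched, conditional on driver race: searched / all stops in that group.
conditional = {}
for race in searched:
    group_total = searched[race] + not_searched[race]
    conditional[race] = searched[race] / group_total
    print(f"{race:<12}{conditional[race]:.3f}  (of {group_total} stops)")

# Overall (unconditional) proportion searched, for comparison.
all_stops = sum(searched.values()) + sum(not_searched.values())
overall = sum(searched.values()) / all_stops
print(f"{'Total':<12}{overall:.3f}  (of {all_stops} stops)")
```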


Vietnam War Opinions

From an October 2001 article in The Economist entitled “Treason of the Intellectuals?”:

“Back in Vietnam days, the anti-war movement spread from the intelligentsia into the rest of the population, eventually paralysing the country’s will to fight.”

Source: http://www.economist.com/node/806289

Vietnam War Opinions

January 1971 Poll

“A proposal has been made in Congress to require the U.S. government to bring home all U.S. troops before the end of this year. Would you like to have your congressman vote for or against this proposal?”

Vietnam War Opinions

• Opinion proportions about Vietnam withdrawal, within education levels, based on a January 1971 Gallup poll.

                      Education Level
                  Grade   High   College   Overall
Opinion  “Dove”                               0.73
         “Hawk”                               0.27

Table: Source: Gelman & Nolan, 2002

• To Ponder: What would you expect the proportions to look like, based on this story?