Advanced Quantitative Reasoning — Teacher's Texas Edition
Part II Probability and Statistical Reasoning

Section 16 Tabulating and Graphing Categorical Data to Make Informed Decisions

Main Ideas
• What Are Categorical Data?
• Organizing and Displaying Categorical Data
    Simple Tables
    Two-Way Tables
    Bar Charts and Pie Charts
• Converting Two-Way Tables Into Charts
    Side-by-Side Bar Chart
    Stacked Bar Chart
    Segmented Percentage Bar Chart
• Making Decisions Using Two-Way Tables and Graphs
• Independent and Associated Variables

What Are Categorical Data?

Investigation 16.1 Collecting and Organizing Categorical Data
Your teacher will ask you to complete a survey. Some of the questions might include:
(a) Can you make the Vulcan greeting with your right hand?
(b) Are you left-handed?
(c) Did you send or receive an e-mail today?
(d) Did you use Twitter today?
(e) In what season were you born?
(f) What color are your eyes?
(g) Do you like to ride roller coasters?

Categorical data are data involving categorical variables, which assume named or coded values. For example, the category “eye color” could take on the values of brown or blue or hazel (or others). The category “political party” could take on the values of Democrat, Republican, or Independent (or others). Categorical data or variables are classified as binary if there are exactly two possible values, such as Yes and No, or Male and Female.

At the end of Part II, you will be asked to conduct a statistical study of a question that interests you. Depending on your question, you may collect categorical data or quantitative data. Categorical data are treated differently from quantitative data. For example, you can compute the average of distance to school (a quantitative variable), but it makes no sense to compute the average model of car driven to school (a categorical variable). In this section, you will organize and display categorical data to make sense of the relationships within the data.
Organizing and Displaying Categorical Data

Did you know Ohio is home to two of the top ten roller coasters in the world? Table 16.1 lists the 24 fastest roller coasters in Ohio. The table includes data on five variables related to these roller coasters: two are quantitative; three are categorical, two of which (Park and Material) are binary.

Simple Tables

Table 16.1 is complicated and contains a great deal of information. It may be difficult to infer any specific or useful information from it. You might have noticed that Cedar Point appears to have a few more coasters than Kings Island, or that there are many more steel coasters than wooden coasters. Most of the coasters have sit-down cars. Some of the coasters have top speeds greater than 60 mph. All of this information is true, but statisticians prefer to be more precise than just using words such as “more” or “most” or “some.” Statisticians prefer to convey the information by organizing it into a table using numbers, or frequencies of observations. Table 16.2 is an example of a frequency table.

Copyright © 2015 by Gregory D. Foley, Thomas R. Butts, Stephen W. Phelps, and Daniel A. Showalter

Table 16.1 Data for Selected Ohio Roller Coasters, in Descending Order by Speed

Roller Coaster         Park          Material  Car Type   Speed (mph)  Height (ft)
Top Thrill Dragster    Cedar Point   Steel     Sit down   120          420
Millennium Force       Cedar Point   Steel     Sit down   93           310
Diamondback            Kings Island  Steel     Sit down   80           230
Magnum XL-200          Cedar Point   Steel     Sit down   72           205
Wicked Twister         Cedar Point   Steel     Inverted   72           215
Maverick               Cedar Point   Steel     Sit down   70           105
Beast                  Kings Island  Wood      Sit down   65           110
Mean Streak            Cedar Point   Wood      Sit down   65           161
Gemini                 Cedar Point   Steel     Sit down   60           125
Mantis                 Cedar Point   Steel     Stand up   60           145
Raptor                 Cedar Point   Steel     Inverted   57           137
Vortex                 Kings Island  Steel     Sit down   55           148
Flight of Fear         Kings Island  Steel     Sit down   54           74
Racer                  Kings Island  Wood      Sit down   53           88
Flight Deck            Kings Island  Steel     Suspended  51           78
Firehawk               Kings Island  Steel     Flying     50           115
Invertigo              Kings Island  Steel     Inverted   50           131
Corkscrew              Cedar Point   Steel     Sit down   48           85
Cedar Creek Mine Ride  Cedar Point   Steel     Sit down   42           48
Backlot Stunt Coaster  Kings Island  Steel     Sit down   40           45
Blue Streak            Cedar Point   Wood      Sit down   40           78
Disaster Transport     Cedar Point   Steel     Bobsled    40           63
Iron Dragon            Cedar Point   Steel     Suspended  40           76
Wildcat                Cedar Point   Steel     Sit down   40           50

Source: Roller Coaster Database

Table 16.2 Numbers of Ohio Roller Coasters, by Car Type

Car Type   Frequency
Bobsled    1
Flying     1
Inverted   3
Sit down   16
Stand up   1
Suspended  2
Total      24

The relative frequency of a category is the frequency of observations of that category compared to the total number of observations; a relative frequency can be represented as a fraction, a decimal, or a percent.
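As an illustration only (the text does not use software here), the tallying behind a frequency table and the division behind a relative frequency table might be sketched in Python as follows. The function names `frequency_table` and `relative_frequency_table` are hypothetical, and the list of car types is an assumption that simply restates the counts in Table 16.2.

```python
from collections import Counter

# Car types for the 24 coasters, one entry per coaster.
# Assumption: grouped by type for brevity, matching the counts in Table 16.2.
car_types = (
    ["Sit down"] * 16 + ["Inverted"] * 3 + ["Suspended"] * 2
    + ["Bobsled", "Flying", "Stand up"]
)

def frequency_table(values):
    """Return {category: count}, sorted alphabetically by category."""
    return dict(sorted(Counter(values).items()))

def relative_frequency_table(values):
    """Return {category: fraction of all observations}."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

freq = frequency_table(car_types)
rel = relative_frequency_table(car_types)

# Print counts and percents side by side, with a totals row.
for category in freq:
    print(f"{category:<10} {freq[category]:>3} {rel[category]:>7.1%}")
print(f"{'Total':<10} {sum(freq.values()):>3} {sum(rel.values()):>7.1%}")
```

Each relative frequency is a count divided by the same total, so the fractions should sum to 1 (100%) apart from rounding in the displayed percents.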
Quick Question 16.2 Creating a Relative Frequency Table
Summarize the data of Table 16.2 by creating a relative frequency table that uses percents totaling 100% (in the second column) instead of counts. What information does the relative frequency table tell you that a frequency table does not?

Exploration 16.3 Mixed Frequencies
Organize the data from one of the Investigation 16.1 questions into a frequency table and a relative frequency table.

Definitions: Explanatory Variable; Response Variable
In statistics, when the value of a variable x may explain or predict the value of another variable y, the variable x is an explanatory variable (predictor variable), and y is a response variable (predicted variable, outcome variable). In such cases in algebra class, you likely called x an independent variable, and y a dependent variable. The terms dependent and independent have other special meanings in statistics.
An explanatory variable may or may not cause changes in a response variable. We will address the issue of causation in Section 19.

Two-Way Tables

A simple frequency or relative frequency table displays one category of data. Statisticians often combine two frequency tables into a single two-way frequency table as shown in Figure 16.1. In the two-way frequency table two categories are rows (Steel or Wood), and two categories are columns (Cedar Point or Kings Island). This resulting two-way table is a “2 by 2” table (written as 2 × 2), where the first number is the number of rows and the second number is the number of columns (just like matrices). In a two-way table involving explanatory and response variables, the explanatory variables form the columns, and the response variables form the rows.
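One way to picture how a two-way table comes out of the raw data is to tally pairs of values, one pair per roller coaster. The Python sketch below is an illustration, not a method prescribed by the text; the function name `two_way_table` is hypothetical, and the list of (Material, Park) pairs is an assumption that groups the coasters of Table 16.1 by their two binary variables.

```python
from collections import Counter

# One (Material, Park) pair per coaster.
# Assumption: grouped for brevity, consistent with Table 16.1.
coasters = (
    [("Steel", "Cedar Point")] * 13 + [("Steel", "Kings Island")] * 7
    + [("Wood", "Cedar Point")] * 2 + [("Wood", "Kings Island")] * 2
)

def two_way_table(pairs):
    """Tally (row, column) pairs into nested dicts with 'Total' margins."""
    cell_counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    table = {}
    for row in rows:
        table[row] = {col: cell_counts[(row, col)] for col in cols}
        table[row]["Total"] = sum(table[row].values())
    # The bottom margin sums each column, including the row totals.
    table["Total"] = {
        col: sum(table[row][col] for row in rows) for col in cols + ["Total"]
    }
    return table

table = two_way_table(coasters)
header = ["Cedar Point", "Kings Island", "Total"]
print(f"{'':<6}" + "".join(f"{name:>14}" for name in header))
for row_name, row in table.items():
    print(f"{row_name:<6}" + "".join(f"{row[name]:>14}" for name in header))
```

The row and column margins of the result are the two simple frequency tables (by material and by park), which is why a two-way table can be thought of as a combination of them.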
              Number of Roller Coasters
Cedar Point   15
Kings Island  9
Total         24

              Number of Roller Coasters
Steel         20
Wood          4
Total         24

       Cedar Point  Kings Island  Total
Steel  13           7             20
Wood   2            2             4
Total  15           9             24

Figure 16.1 Combining simple frequency tables into a two-way frequency table.

Quick Question 16.4 Reading and Interpreting a Two-Way Table
(a) What percent of the roller coasters are steel?
(b) What percent of the roller coasters are at Cedar Point?
(c) What percent of the steel roller coasters are at Cedar Point?
(d) What percent of the Cedar Point roller coasters are steel?
(e) What percent of the wooden roller coasters are at Kings Island?
(f) What percent of the Kings Island roller coasters are wood?

Quick Question 16.5 Creating a Two-Way Table
Organize two pieces of data you collected in Investigation 16.1 (for example, the season born and eye color) in a two-way table.

Bar Charts and Pie Charts

Frequency tables, such as Table 16.2, provide useful numerical information, but visual displays give an instant sense of a distribution. The dot plot in Figure 16.2a shows the distribution of roller coasters across car types; each dot represents one of the roller coasters. A bar graph (or bar chart) of a categorical variable shows the frequencies of the various categories as the lengths of separate bars relative to a numerical scale (Figure 16.2b). A pie chart (or circle graph) shows each frequency as an angular wedge (sector of the circle), thus revealing relative frequency (Figure 16.2c).

More online: An applet designed to help you see the relationship between a dot plot and a pie chart can be found at www.aqrpress.com/sa1601.

Figure 16.2 Car types for the 24 fastest roller coasters in Ohio displayed as (a) a dot plot, (b) a bar graph, and (c) a circle graph.
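To make the link between a pie chart and relative frequency concrete, each wedge's central angle can be taken as that category's share of the full 360 degrees. The short Python sketch below illustrates this using the counts in Table 16.2; the function name `wedge_angles` is hypothetical and is not part of the text.

```python
# Car-type frequencies from Table 16.2.
frequencies = {
    "Bobsled": 1, "Flying": 1, "Inverted": 3,
    "Sit down": 16, "Stand up": 1, "Suspended": 2,
}

def wedge_angles(freqs):
    """Return {category: central angle in degrees} for a pie chart."""
    total = sum(freqs.values())
    return {category: 360 * count / total for category, count in freqs.items()}

angles = wedge_angles(frequencies)
for category, angle in angles.items():
    print(f"{category:<10} {angle:6.1f} degrees")
```

Because every angle is a relative frequency multiplied by 360, the angles add up to a full circle, and a category with twice the frequency gets a wedge twice as wide.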
The roller coaster data in Table 16.1 have been organized into height categories in Table 16.3 and displayed as a bar chart in Figure 16.3. Notice that we turned a quantitative variable (height) into a categorical variable. Each bar is the same width, and the height of each bar is proportional to the frequency in each category. The bar chart in Figure 16.3 foreshadows the quantitative-variable display called a histogram.

Table 16.3 Heights of Ohio Roller Coasters