Quantitative Data Can Be Organized Into Classes and Displayed in a Frequency Distribution


Module 1 - Topic: Graphical displays of Quantitative Data

Quantitative data can be organized into classes and displayed in a frequency distribution table. A histogram is a graphic version of a frequency distribution table.

Example: Ages of some Pierce students

Age Class Frequency  here is the Frequency Distribution Table 10-19 6 20-29 11 Below is a histogram of this data. Notice that the three histograms are identical except for the labels on the 30-39 6 Horizontal axis. There are several ways to label the horizontal axis: 40-49 1 1st graph: lower class limits are labeled 50-59 2 2nd graph: midpoint of each class is labeled 3rd graph: the class range is stated for each class n = frequency = 26

The relative frequency of a class is the frequency of items in that class divided by the total number of data items.

Relative frequency = f/n, where f is the frequency of the class and n is the total number of data items.

The relative frequency may be expressed as a fraction or its equivalent decimal or percent.

Example: Ages of some students (the same data as the prior example)

Determining the relative frequency

Note: this is the "work document" where we calculate the relative frequencies. In the end, only one relative frequency column is displayed in a table.

Age Class    Frequency    Relative Frequency    Relative Frequency    Relative Frequency
                          as fraction           as decimal            as percent
10-19         6           6/26 or 3/13          0.231                  23.1%
20-29        11           11/26                 0.423                  42.3%
30-39         6           6/26 or 3/13          0.231                  23.1%
40-49         1           1/26                  0.038                   3.8%
50-59         2           2/26 or 1/13          0.077                   7.7%
total        26           26/26                 1.000                 100.0%
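The work document can also be reproduced with a few lines of Python. This is only a sketch: the class frequencies are copied from the age example, while the function name `relative_frequencies` and the rounding (three decimals, one decimal for percents) are choices made here to match the table.

```python
from fractions import Fraction

# Class frequencies copied from the age example.
frequencies = {"10-19": 6, "20-29": 11, "30-39": 6, "40-49": 1, "50-59": 2}

def relative_frequencies(freqs):
    """Return {class: (fraction, decimal, percent)} using relative frequency = f/n."""
    n = sum(freqs.values())  # total number of data items
    result = {}
    for label, f in freqs.items():
        frac = Fraction(f, n)  # Fraction reduces automatically, e.g. 6/26 -> 3/13
        result[label] = (frac, round(f / n, 3), round(100 * f / n, 1))
    return result

if __name__ == "__main__":
    for label, (frac, dec, pct) in relative_frequencies(frequencies).items():
        print(f"{label}: {frac} = {dec:.3f} = {pct:.1f}%")
```

Because each relative frequency is a share of the whole, the fractions add up to exactly 1, which is a quick check on the arithmetic.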

Below is the relative frequency table, then the Relative Frequency Histogram, then the Histogram.

I chose to use percents.

Age Class    Relative Frequency
10-19         23.1%
20-29         42.3%
30-39         23.1%
40-49          3.8%
50-59          7.7%
total        100.0%

Notice that the relative frequency histogram and the histogram look identical in shape. The difference is in the vertical axis.

Deciding on classes when making a histogram

There is not "one right way" to make the classes, but there are some general guidelines.
- The first class should contain the minimum data value and the last class should contain the maximum data value. [Exception: if two different data sets are to have the same classes so that the data can be better compared, then a histogram might have an extra, empty class at the beginning or end, for the purpose of matching the other histogram.]
- Histograms are typically easier to interpret if there are about 5 to 8 classes. However, a large data set (e.g., 1000+ data items) might usefully have 20 or so classes.
- If you use technology, it is fairly easy to make a histogram with one set of classes, and then try a different set of classes to see if it provides a better display. A "better display" is one that lets the reader understand the data better.

Here are three histograms of the same data (the same example as above). The first has 5 classes. The second divides each of those classes in half, for 10 classes. The third has 7 classes. Notice how the different choices of classes change the shapes of the graphs, though each one is skewed right.
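Trying different classes is easy to do in code. The sketch below is an illustration, not part of the original handout: the ages are the 26 values shown in the stem and leaf plot later in this module, and `frequency_table` and `text_histogram` are hypothetical helper names. It assumes whole-number data and equal-width classes, and it reproduces only the 5-class and 10-class versions, since the class boundaries of the 7-class histogram are not stated.

```python
# Ages read off the stem and leaf plot given later in this module (26 values).
ages = [16, 17, 18, 19, 19, 19,
        20, 20, 21, 21, 21, 22, 22, 24, 27, 27, 29,
        30, 31, 31, 32, 32, 38,
        43,
        52, 56]

def frequency_table(data, start, width):
    """Count the values in each class: start to start+width-1, and so on.

    Assumes integer data, so the class "10-19" holds 10 <= x <= 19.
    """
    counts = {}
    lower = start
    while lower <= max(data):
        upper = lower + width - 1
        counts[f"{lower}-{upper}"] = sum(lower <= x <= upper for x in data)
        lower += width
    return counts

def text_histogram(table):
    """Return one line per class, with one '*' per data item."""
    return "\n".join(f"{label:>5} | {'*' * f}" for label, f in table.items())

if __name__ == "__main__":
    for width in (10, 5):
        print(f"class width {width}")
        print(text_histogram(frequency_table(ages, 10, width)))
```

Printing both versions side by side makes it easy to judge which set of classes gives the clearer picture of the data.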

Stem and Leaf Plots, also known as Stem Plots

In the example of the three histograms of the same data (but different classes), notice that you cannot determine exactly what the original data values were. For example, you can tell that the minimum age is between 15 and 19, inclusive, but you cannot tell whether there is actually anyone with age 15 or 16 or 17, etc. Once the data is grouped into classes, we cannot see the original data values from a histogram or from a frequency table.

A Stem and Leaf plot is a way to organize data into a display so that each original data value is still apparent.
- Each data value is separated into a "stem" and a "leaf".
- The stem represents one place value, for example the tens, or the hundreds, or the thousands.
- In the diagram, the list of stems begins with the smallest stem required, and continues without any gaps to the largest stem required. (So, if there is data for stems of size 3, 4, 6, and 8, the stems that are listed in the diagram are 3, 4, 5, 6, 7, and 8, since no stems are skipped even if they are not needed.)
- The leaf represents the next smaller place value, the one immediately to the right of the stem (for example, units if the stem is tens).
- Each data value in the data set contributes one "leaf" to the diagram.
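The rules above can be sketched in Python. This is an illustration under stated assumptions, not a method from the handout: `stem_and_leaf` and its `leaf_unit` parameter are hypothetical names, the values are assumed to be non-negative and already rounded to the leaf's place value, and the sample list is made up purely to show a plot with empty stems.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the rows of a stem and leaf plot as strings.

    leaf_unit is the place value of the leaf (1 for units, 10 for tens, ...).
    The stem is everything to the left of the leaf digit. Assumes non-negative
    values already rounded to a multiple of leaf_unit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v // leaf_unit, 10)
        leaves[stem].append(leaf)  # each data value contributes one leaf
    rows = []
    # List every stem from the smallest to the largest, skipping none.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

if __name__ == "__main__":
    sample = [31, 35, 42, 47, 47, 63, 68, 80]  # made-up values, for illustration only
    print("\n".join(stem_and_leaf(sample)))
```

With the made-up sample, stems 5 and 7 have no leaves but are still listed, which is the "no gaps" rule in action.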

The age data displayed in the earlier histograms has this stem and leaf plot.

Stem and Leaf Diagram

Stem (tens) | Leaf (units)
1 | 6 7 8 9 9 9
2 | 0 0 1 1 1 2 2 4 7 7 9
3 | 0 1 1 2 2 8
4 | 3
5 | 2 6

From the stem and leaf plot we can see that the minimum age is 16 and the maximum age is 56. Each of the 26 "leaves" in the display is from one of the 26 data items.

If you turn the stem and leaf plot sideways, the rows of numbers look like columns in a histogram. In this case the columns look like the histogram above with the classes of 10-19, 20-29, etc.

Stem and Leaf Plots - Exercises

1. Make a Stem and Leaf Plot of this data of the high temperature in degrees F on some spring days in Ohio.

56, 63, 61, 65, 56, 49, 47, 63, 74, 68, 72, 80, 72

2. Make a Stem and Leaf Plot of this data of the selling prices of one model of car sold last month at a car dealer. Notice: the stem can represent the number of thousands, and the leaf is the hundreds. First round each price to the nearest hundred.

$26,800 $23,900 $25,740 $26,410 $25,425 $28,250 $24,335 $26,575 $24,500 $26,880

3. The number of students in the primary schools in one county (rounded numbers) is given by this Stem Plot. Use the data to answer the questions.

Stem (hundreds) | Leaf (tens)
1 | 5 7 9
2 | 4 8 8
3 | 1 5 6 6 9
4 |
5 | 2

a) How many primary schools are in the county?
b) How many students are in the smallest size school?
c) How many students are in the largest size school?
d) What is the size of school in the middle? (That is, what is the median?)
e) Are there any outliers? If so, what is it, and why?

Answers:

#1
Stem (tens) | Leaf (ones)
4 | 7 9
5 | 6 6
6 | 1 3 3 5 8
7 | 2 2 4
8 | 0

#2 (values are rounded to the nearest hundred)
Stem (thousands) | Leaf (hundreds)
23 | 9
24 | 3 5
25 | 4 7
26 | 4 6 8 9
27 |
28 | 3

#3
a) 12 [the number of leaves]
b) 150
c) 520
d) 295, or roughly 300 (halfway between the two middle values, 280 and 310)
e) 520 is probably an outlier

Shapes of Distributions of Quantitative Data

A "Normal" Distribution (this is a technical term)
• Is "bell-shaped": the frequencies increase to a maximum and then decrease.
• Is symmetric (the left half of the graph is approximately a mirror image of the right half).

Above: example of a distribution that is approximately normal

Skewed to the Right, also called "skewed positively"

Frequencies are high at first and then lower frequencies trail off in a "tail" to the right.

Skewed to the Left, also called "skewed negatively"

The "tail" where the low frequencies trail off is on the left, while the higher frequencies are on the right.

Uniform

Approximately the same frequency everywhere.

Frequency Polygon (sometimes called a Line Graph)
- Shows the information that is in a Frequency Distribution table and in the Histogram.
- The horizontal axis is marked with the midpoints of the classes.
- A dot is placed above each midpoint at the frequency of the class.
- Ideally, the midpoints before the first class and after the last class are marked with frequency zero (but this is often not done).
- The dots are connected.
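The steps above can be sketched as a short Python function that turns class limits and frequencies into the dots of a frequency polygon. The class limits and frequencies come from the age example in this module. The name `polygon_points` is hypothetical, and the code assumes equal-width classes so that the zero-frequency end points sit one class width beyond the first and last midpoints.

```python
# (lower limit, upper limit, frequency) for each class in the age example.
classes = [(10, 19, 6), (20, 29, 11), (30, 39, 6), (40, 49, 1), (50, 59, 2)]

def polygon_points(classes):
    """Return (midpoint, frequency) pairs, padded with zero-frequency ends."""
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    # Distance between neighbouring midpoints; assumes equal-width classes.
    step = points[1][0] - points[0][0]
    first = (points[0][0] - step, 0)  # midpoint before the first class
    last = (points[-1][0] + step, 0)  # midpoint after the last class
    return [first] + points + [last]

if __name__ == "__main__":
    for midpoint, f in polygon_points(classes):
        print(midpoint, f)
```

Connecting these points in order, left to right, gives the polygon; the two padded points bring the line back down to the horizontal axis.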

Example:

Frequency table

Class Midpoint    Frequency
14.5               6
24.5              11
34.5               6
44.5               1
54.5               2

Module 1 - Practice reading/interpreting a Histogram (as in On-line Homework)

Note: the horizontal axis is for # of items. Ex: "limit exceeded by 3 items" means the "# items over 10" is 3, so look on the horizontal axis for "3" and see that it is the column for 2.5 to 3.5. The frequency of that column is the height of that column, which is labeled as 14 on the vertical axis.

1. What is the frequency of times the limit was exceeded by 1 item?
2. What is the frequency of times the limit was exceeded by at least 5 items?
3. What is the frequency of times the limit was exceeded by more than 5 items?
4. What is the frequency of times the limit was exceeded by less than 3 items?

Answers:
1) The "limit exceeded by 1 item" is found on the horizontal axis; the first column is for 1 item. So the frequency is 5.
2) "At least 5 items" means 5 or 6 or 7 or more, so add the frequencies for those columns: 11 + 7 + 5 = 23
3) "More than 5 items" means 6 or 7 or more, so add the frequencies for those columns: 7 + 5 = 12
4) "Less than 3 items" means 1 or 2 items, so add the frequencies for those columns: 5 + 9 = 14
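The same additions can be checked with a small Python sketch. The column heights below are only the ones quoted in the note and the answers; the height of the column for 4 items is not stated in the text, so it is left out, and the code assumes the column for 7 is the last one shown. The helper name `total_where` is hypothetical.

```python
# Column heights quoted in the note and answers. The height of the column
# for 4 items is not stated in the text, so it is left out here.
frequency = {1: 5, 2: 9, 3: 14, 5: 11, 6: 7, 7: 5}

def total_where(freq, condition):
    """Add the frequencies of every column whose item count meets condition."""
    return sum(f for items, f in freq.items() if condition(items))

if __name__ == "__main__":
    print(total_where(frequency, lambda k: k == 1))  # exceeded by 1 item
    print(total_where(frequency, lambda k: k >= 5))  # at least 5 items
    print(total_where(frequency, lambda k: k > 5))   # more than 5 items
    print(total_where(frequency, lambda k: k < 3))   # less than 3 items
```

Writing each phrase as a comparison (`>=` for "at least", `>` for "more than", `<` for "less than") makes it clear which columns belong in each sum.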
