
Warm-up

Deaths among persons aged 15 to 24 years in the United States in 1997, by the seven leading causes of death for this age group: accidents, 12,958; homicide, 5,793; suicide, 4,146; cancer, 1,583; heart disease, 1,013; congenital defects, 383; AIDS, 276.

Make a bar graph to display these data. What additional information do you need to make a pie chart?

Section 1.2

Describing Distributions with Numbers

Specific Ways to Describe Shape, Center and Spread

Center:

• Mean – the ordinary arithmetic average. Denoted $\bar{x}$, pronounced “x-bar.”

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Σ, pronounced “sigma,” means “the sum of…” In other words, you add up the terms 1 through n.

• Median – the midpoint of the data set. Denoted M.

Bonds vs. Aaron

Home runs per season:

Barry Bonds: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Hank Aaron: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20

Have no fear… Your calculator is here!

You can get all this information from your calculator. Type your data in L1 and L2. Press STAT, choose 1-Var Stats, and enter L1. Do the same thing for L2.

Compare Centers

Find the mean and median of both Bonds’ and Aaron’s home runs.

Bonds: $\bar{x} = 35.4375$, $M_X = 34$
Aaron: $\bar{y} \approx 34.9$, $M_Y = 38$

Bonds has a higher average number of home runs, but this average is affected by the extreme value of 73. The median for Aaron is higher than the median for Bonds, indicating that he hit more home runs than Bonds in a typical season.

Resistant and Non-resistant

The mean is affected by extreme observations, such as Bonds’ single-season record of 73 home runs. It is a non-resistant measure of center. The median, however, is resistant to extreme observations. It is preferable when a data set has outliers.

Think About This

Change Bonds’ single season record from 73 home runs to 100 home runs. How is the mean affected? The median? How do the mean and median compare to each other in a symmetric distribution? In a (unimodal) skewed right distribution? In a (unimodal) skewed left distribution?
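If you'd rather check your answers than do the arithmetic by hand, here is a minimal sketch in Python (not part of the original lesson). It assumes the home run lists as read off the Bonds vs. Aaron slide; the variable names are made up for illustration.

```python
from statistics import mean, median

# Home runs per season, as listed on the Bonds vs. Aaron slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

# Same Bonds data, with the record season of 73 changed to 100.
bonds_100 = [100 if x == 73 else x for x in bonds]

print("Bonds:      ", mean(bonds), median(bonds))
print("Aaron:      ", round(mean(aaron), 1), median(aaron))
print("Bonds (100):", mean(bonds_100), median(bonds_100))
```

Compare the first and last lines of output: one measure of center moves when the extreme value changes and the other does not.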

Introduction to Measures of Spread

Today, we’ll learn about quartiles. Oddly enough, they divide a data set into fourths (25% sections).

• Finding quartiles is like finding the median. You find the midpoint, and average the middle two numbers if there is an even number of data points.

A Visual Representation of Quartiles

[Figure: the data split into four sections, each holding 25% of the observations. Q1 = lower quartile (25th percentile); Q2 = median (50th percentile); Q3 = upper quartile (75th percentile).]

So, there are really only THREE quartiles, and the middle one isn’t usually called a quartile (it’s called the median). We generally refer to Q1, M, and Q3. To find Q1, you find the median of the lower half of the data. To find Q3, you find the median of the upper half of the data.

Try it!

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Find the Range, Median, Q1, and Q3.

Solution

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Q1 = 25
Median = 34
Q3 = 41

So, the Range is 73 – 16 = 57. This gives us a little information about the variability of Bonds’ home runs in a season. The middle 50% of the data lies between 25 and 41, so we see where the spread of the middle half of the data lies.

Interquartile Range and the Outlier Rule

IQR is simply Q3 – Q1. In our Barry Bonds example, IQR = 41 – 25 = 16. The IQR is a suitable measure of spread and is paired with the median. We use the IQR to define what an outlier is. An outlier is any value (or values) that falls more than 1.5*IQR above the upper quartile or below the lower quartile.
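The median-of-halves method can be sketched in Python as below. This is an illustration, not the calculator's exact procedure. The helper name `quartiles` is hypothetical, and leaving the middle value out of both halves when n is odd is an assumption, since the lesson only works an even-sized example. The sketch also assumes at least two data values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # lower half of the sorted data
    upper = values[-half:]   # upper half (middle value left out when n is odd)
    return median(lower), median(values), median(upper)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)
iqr = q3 - q1

print("Q1, M, Q3:", q1, m, q3)
print("Range:", data_range)
print("IQR:", iqr)
```

Software packages use several different quartile rules, so a library function may report slightly different values than this hand method on other data sets.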

“Fences”

• Think of the 1.5*IQR rule as fences. They draw the boundary line beyond which values are outliers.
• Is Barry Bonds’ 73-homer season an outlier?

Recall: Q1 = 25; Q3 = 41; IQR = 16
So, 1.5*IQR = 1.5*16 = 24.
Add 24 to Q3 and subtract 24 from Q1:
Upper boundary = 41 + 24 = 65
Lower boundary = 25 – 24 = 1
Conclusion: 73 falls above the outlier boundary of 65, so it is an outlier!

5 Number Summary

The five number summary consists of the lowest value, Q1, the Median, Q3, and the highest value. It is important because we’ll use it to create a new kind of graph: a boxplot (also called a box-and-whiskers plot).

Bonds’ Boxplot

Recall his 5 number summary: L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73
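As a quick check, the five number summary and the 1.5*IQR fences can be reproduced with a few lines of Python. This is only a sketch under the same median-of-halves assumption used for the quartiles; the variable names are made up for illustration.

```python
from statistics import median

bonds = sorted([16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73])

# Quartiles as medians of the lower and upper halves of the sorted data.
half = len(bonds) // 2
q1, m, q3 = median(bonds[:half]), median(bonds), median(bonds[-half:])

five_number = (min(bonds), q1, m, q3, max(bonds))

# Fences sit 1.5 * IQR beyond the quartiles; values outside them are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in bonds if x < lower_fence or x > upper_fence]

print("Five number summary:", five_number)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```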

[Figure: boxplot of Bonds’ home runs. Horizontal axis: number of home runs in a season, with ticks from 10 to 70.]

Modified Boxplots

• Modified boxplots show outliers as isolated points. Bonds’ 73-home-run season was an outlier, so the upper whisker in a modified boxplot extends only to the last data point that was NOT an outlier. Any outlier is shown as a star (*).
• CAUTION: Many students extend the whisker to the outlier “fence” (i.e., 65). This is WRONG! The whisker should stop at the last actual data point.

So tell me: where should the upper whisker end in a modified boxplot of Bonds’ home runs per season? At 49.

We can look at these in the calculator as well. Go to StatPlot.

It’s Never Too Soon for a Practice AP Question

2005 AP Statistics Problem #1

Question 1 Part a)

• Part a) is graded Essentially Correct, Partially Correct, or Incorrect.
• To receive an Essentially Correct, a student must successfully compare center, shape and spread. Specific numeric values are not required.
• To receive a Partially Correct, a student must successfully compare 2 of the 3 characteristics (center, shape and spread).
• All other responses are graded as Incorrect.

Special Notes

“Compare” means you state which is larger. For example, “the mean of the rural students’ daily caloric intake is greater than the mean for the urban students” is a correct comparison. However, stating “the mean of the rural students’ daily caloric intake is 40.45 while the mean for the urban students is 32.6” is not a COMPARISON.

In Conclusion

Graders were looking for three comparisons:

• Center: the mean caloric intake of the rural students is greater than the mean caloric intake of the urban students.
• Spread: the spread of the rural students’ distribution is larger than the spread of the urban students’ distribution.
• Shape: the rural students’ caloric intakes are roughly symmetric, while the urban students’ caloric intakes are skewed right.

There’s More to Spread than IQR

Section 1.2 Standard Deviation

Describing Data with Numbers

So far, we’ve learned the 5 Number Summary to describe a set of data: Min, Q1, M, Q3, and Max. We’ve also used the mean as another measure of center.

Measuring Spread: Standard Deviation

The most commonly used measure of spread is the standard deviation. Standard deviation tells us, on average, how far the observations are from the mean.

Standard Deviation and Variance

• Variance is the average of the squares of the deviations of the observations from the mean (dividing by n – 1 rather than n).
• WHAT???
• But your calculator can tell you all of this!

$s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$

Properties of Standard Deviation

• $s^2$ is called the variance. The square root of $s^2$ is __.
• s measures spread about the mean and is called the standard deviation.
• s = 0 only when there is NO SPREAD (in other words, all the data values are the same).
• As the observations become more spread out about their mean, s gets larger.
• s is not resistant to skewness or outliers. WHY?
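To see what the calculator is doing behind the scenes, here is a short Python sketch (an illustration, not the calculator's actual routine). It computes the variance from the formula with the n – 1 divisor, cross-checks against the standard `statistics` module, and tries a data set with no spread. The variable names are made up.

```python
from math import sqrt
from statistics import mean, stdev, variance

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

n = len(bonds)
x_bar = mean(bonds)

# Sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in bonds) / (n - 1)
s = sqrt(s2)

print("By hand:   ", s2, s)
print("By library:", variance(bonds), stdev(bonds))

# A data set where every value is the same has no spread at all.
no_spread = [30, 30, 30, 30]
print("No spread: ", stdev(no_spread))
```

Try deleting the 73 from the list and rerunning it to see how much a single extreme season moves s.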

Recap

• Measures of spread: IQR, standard deviation
• Measures of center: Median, Mean
• When to use which? The mean and the standard deviation are not resistant to outliers, so use them only when the distribution is roughly symmetric and there aren’t outliers. Use the 5 Number Summary when the distribution is strongly skewed or has outliers.

How the AP Folks Test Your Ability to Reason

How do the following affect the mean? The median? The standard deviation?

• Adding a certain amount to every value in a data set
• Multiplying each value in a data set by the same number
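Once you have made your predictions, you can test them with a small experiment like the Python sketch below. The amounts used here (adding 10, multiplying by 2) are arbitrary choices for illustration, and the helper name `summary` is hypothetical.

```python
from statistics import mean, median, stdev

def summary(data):
    """Return (mean, median, standard deviation) for a list of numbers."""
    return mean(data), median(data), stdev(data)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

shifted = [x + 10 for x in bonds]   # add the same amount to every value
scaled = [x * 2 for x in bonds]     # multiply every value by the same number

for label, data in [("original", bonds), ("add 10", shifted), ("times 2", scaled)]:
    avg, med, sd = summary(data)
    print(f"{label:>8}: mean={avg:.2f}  median={med:.2f}  sd={sd:.2f}")
```

Swap in other amounts (including negative ones) and see whether the pattern you spotted still holds.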

Homework

Chapter 1 #40, 41, 45, 50, 52