
Finding the Center

• When we think of a typical value, we usually look for the center of the distribution.
• For a unimodal, symmetric distribution, it’s easy to find the center: it’s just the center of symmetry.
• We could average the minimum and maximum data values (called the midrange) as a measure of center, but the midrange is very sensitive to skewed distributions and outliers.

Finding the Center: Median (cont.)

• A more reasonable choice for center than the midrange is the value with exactly half the values below it and half above it. This particular value is called the median.
• The median is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.
• The median has the same units as the data.

Copyright © 2004 Pearson Education, Inc.

Median

The median is the $(n+1)/2$-th largest observation.

If $(n+1)/2$ is not a whole number, the median is the average of the two observations on either side.

Spread: Home on the Range

• When describing a distribution numerically, we always report a measure of its spread along with its center.
• The range of the data is the difference between the maximum and minimum values: Range = max - min.
• A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
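The median rule and the range can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `median_by_rule` and `data_range` and the small data sets are hypothetical, chosen only to show the odd case, the even case, and how one extreme value inflates the range.

```python
def median_by_rule(values):
    """Median via the (n + 1)/2 rule applied to the ordered data."""
    ordered = sorted(values)              # the rule only makes sense on sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2                # 1-based position in the ordered list
    if position.is_integer():
        return ordered[int(position) - 1]
    below = ordered[int(position) - 1]    # observation just below the position
    above = ordered[int(position)]        # observation just above the position
    return (below + above) / 2


def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)


odd = [7, 1, 5, 3, 9]      # hypothetical data, n = 5
even = [7, 1, 5, 3]        # hypothetical data, n = 4

print(median_by_rule(odd))        # 5   (position 3 of 1, 3, 5, 7, 9)
print(median_by_rule(even))       # 4.0 (average of 3 and 5)
print(data_range(odd))            # 8
print(data_range(odd + [100]))    # 99  (one extreme value dominates the range)
```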

The Interquartile Range

• The interquartile range (IQR) allows us to ignore extreme data values and concentrate on the middle of the data.
• To find the IQR, we first need to know what quartiles are…

Quartiles split the data into quarters

• Lower quartile (Q1) divides the bottom half of the data into two
  – median of observations below the median
• Upper quartile (Q3) divides the upper half of the data into two
  – median of observations above the median
• The difference between the quartiles is the IQR, so IQR = upper quartile - lower quartile.
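The "median of each half" definition of the quartiles can be sketched as follows. The helper names and the data are hypothetical. The sketch also assumes that when the median is itself an observation (odd n), it is left out of both halves, which is one reading of "observations below the median"; other conventions exist.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]        # observations below the median
    upper_half = ordered[n - half:]    # observations above the median
    return median(lower_half), median(upper_half)


def iqr(values):
    """IQR = upper quartile - lower quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1


data = [2, 4, 4, 5, 7, 8, 9, 11, 12]        # hypothetical data
extreme = [2, 4, 4, 5, 7, 8, 9, 11, 100]    # same data with one extreme value

print(quartiles(data))                 # (4.0, 10.0)
print(iqr(data), iqr(extreme))         # 6.0 6.0
print(max(data) - min(data), max(extreme) - min(extreme))  # 10 98
```

The last two lines show the point made on the slide: the extreme value changes the range from 10 to 98 but leaves the IQR untouched.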


The Interquartile Range (cont.)

• The lower and upper quartiles are the 25th and 75th percentiles of the data, so…
• The IQR contains the middle 50% of the values of the distribution, as shown in Figure 5.3 from the text.

The Five-Number Summary

• Five-number summary: { Min, Q1, Median, Q3, Max }
• Example: [example omitted]

Boxplots

• A boxplot is a graphical display of the five-number summary. The steps involved in constructing a boxplot can also be found on pages 60-61 of the text.
• Boxplots are particularly useful when comparing groups.

Boxplot

[Figure 2.4.4 Construction of a box plot. Labels: Data; Q1, Med, Q3; whiskers reaching 1.5 IQR on each side (pull back until hit observation); Scale.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.


Construction of Boxplot

Data: breaking strength of wire in kilograms
220 214 222 218 223 210 223 210 227 225 212

Leaf Unit = 1.0 kg

  4   21  0024
  5   21  8
 (4)  22  0233
  2   22  57

• Find Median
• Find Quartiles: Q1 =     Q3 =
• Calculate interquartile range: Q3 - Q1 =
• Calculate whisker length: 1.5 x (Q3 - Q1) =

Comparing Groups With Boxplots

• The following set of boxplots compares the effectiveness of various coffee containers:
• What does this graphical display tell you?
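For checking hand calculations on the wire breaking-strength data, the construction steps (median, quartiles, IQR, whisker length, then pulling each whisker back to the nearest observation) might be sketched like this. The function names are hypothetical, and the quartiles are taken as medians of the halves below and above the median, which is an assumption about the convention intended on the slide.

```python
from statistics import median

# Breaking strength of wire in kilograms, as listed on the slide.
wire = [220, 214, 222, 218, 223, 210, 223, 210, 227, 225, 212]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])          # median of observations below the median
    q3 = median(ordered[n - half:])      # median of observations above the median
    return ordered[0], q1, median(ordered), q3, ordered[-1]


def whisker_ends(values):
    """Reach 1.5 * IQR beyond each quartile, then pull back to an observation."""
    _, q1, _, q3, _ = five_number_summary(values)
    reach = 1.5 * (q3 - q1)
    inside = [x for x in values if q1 - reach <= x <= q3 + reach]
    return min(inside), max(inside)


low, q1, med, q3, high = five_number_summary(wire)
print("five-number summary:", (low, q1, med, q3, high))
print("IQR:", q3 - q1)
print("whisker length:", 1.5 * (q3 - q1))
print("whisker ends:", whisker_ends(wire))
```

Any observation outside the whisker ends would be drawn as a separate point on the boxplot.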

Summarizing Symmetric Distributions

• Medians do a good job of identifying the center of skewed distributions. When we have symmetric data, the mean is a good measure of center.
• We find the mean by adding up all of the data values and dividing by n, the number of data values we have.

Sample mean – average

• The sample mean is denoted by $\bar{x}$.

$$\bar{x} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

[Figure 2.4.1 Mechanical construction representing a dot plot: (a) shows a balanced rod while (b) and (c) show unbalanced rods. The balance point in (a) is labeled Mean.]
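The balanced-rod picture can be checked numerically: the mean is the point about which the signed distances of the data values cancel. A minimal sketch, with a hypothetical function name and hypothetical data:

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


data = [2, 3, 5, 6, 9]                      # hypothetical data
x_bar = sample_mean(data)
deviations = [x - x_bar for x in data]      # signed distances from the mean

print(x_bar)             # 5.0
print(deviations)        # [-3.0, -2.0, 0.0, 1.0, 4.0]
print(sum(deviations))   # 0.0, the rod balances at the mean
```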

Mean or Median?

• Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance.
• In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used.
• For skewed data, though, it’s better to report the median than the mean as a measure of center.

Relationship between mean and median

[Figure 2.4.2 The mean and the median. (a) Data symmetric about P, with Med = $\bar{x}$. (b) Two largest points moved to the right, with Med and $\bar{x}$ marked separately. Grey disks in (b) are the "ghosts" of the points that were moved.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

What About Spread?

• A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.
• A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.
• The sample variance, denoted by $s^2$, is found using the formula

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
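The variance formula translates almost term by term into code. The sketch below uses a hypothetical function name and hypothetical data, and compares the result with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [2, 3, 5, 6, 9]                 # hypothetical data, mean 5
print(sample_variance(data))           # 7.5  = (9 + 4 + 0 + 1 + 16) / 4
print(statistics.variance(data))       # library value for comparison
```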


Sample Standard Deviation

$$s_x = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

• In same units as data
  – So preferable to sample variance
• Equals zero only if all observations identical
• Sensitive to outliers (extreme observations)
• Button on calculator – learn to use it!
  – Much simpler than applying formula

Shape, Center, and Spread

• When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread.
• If the shape is skewed, report the median and IQR.
• If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.
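The listed properties of the sample standard deviation can be illustrated with a short sketch. The function name and data are hypothetical, and `statistics.stdev` stands in here for the calculator button mentioned on the slide.

```python
import math
import statistics


def sample_sd(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 3, 5, 6, 9]                    # hypothetical data
print(sample_sd(data))                    # formula applied by hand
print(statistics.stdev(data))             # the "button": same value
print(sample_sd([4, 4, 4, 4]))            # 0.0, all observations identical
print(sample_sd(data + [60]))             # much larger: one outlier dominates
```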

What About Outliers?

• If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
• Note: The median and IQR are not likely to be affected by the outliers.

What Can Go Wrong?

• Do a reality check: don’t let technology do your thinking for you.
• Don’t forget to sort the values before finding the median or percentiles.
• Don’t compute numerical summaries of a categorical variable.
• Watch out for multiple modes: multiple modes might indicate multiple groups in your data.
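The advice about reporting summaries with and without outliers might look like the sketch below. The data set is hypothetical and the value 95 is simply declared an outlier by hand. The helper `summarize` is also hypothetical, and it takes quartiles as medians of the lower and upper halves of the sorted data.

```python
import statistics


def summarize(values):
    """Mean, SD, median and IQR for one batch of values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])       # median of the lower half
    q3 = statistics.median(ordered[n - half:])   # median of the upper half
    return {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
    }


data = [12, 13, 14, 15, 15, 16, 17, 18, 95]   # hypothetical data
outliers = {95}                                # flagged by eye for this example

with_outliers = summarize(data)
without_outliers = summarize([x for x in data if x not in outliers])

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:>6}: with = {with_outliers[name]:.2f}, "
          f"without = {without_outliers[name]:.2f}")
```

In this example the mean and standard deviation change substantially when 95 is removed, while the median stays at 15 and the IQR moves only from 4 to 3.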


What Can Go Wrong? (cont.)

• Be aware of slightly different methods: different statistics packages and calculators may give you different answers for the same data.
• Beware of outliers.
• Make a picture (make a picture, make a picture).
• Be careful when comparing groups that have very different spreads.

So What Do We Know?

• We describe distributions in terms of shape, center, and spread.
• For symmetric distributions, it’s safe to use the mean and standard deviation; for skewed distributions, it’s better to use the median and interquartile range.
• Always make a picture: don’t make judgments about which measures of center and spread to use by just looking at the data.
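The warning that different packages may give different answers for the same data is easy to reproduce. As one illustration (assuming Python 3.8 or later), the standard library function `statistics.quantiles` offers two interpolation methods, and on the wire breaking-strength data from the boxplot slide they disagree on the lower quartile.

```python
import statistics

# Breaking strength of wire in kilograms, from the boxplot construction slide.
wire = [220, 214, 222, 218, 223, 210, 223, 210, 227, 225, 212]

# Same data, same function, two documented quartile conventions.
exclusive = statistics.quantiles(wire, n=4, method="exclusive")
inclusive = statistics.quantiles(wire, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
print("methods agree on Q1:", exclusive[0] == inclusive[0])
print("methods agree on the median:", exclusive[1] == inclusive[1])
```

Neither answer is wrong; the two methods simply place the quartiles at slightly different positions in the ordered data, so it is worth knowing which convention a tool uses before comparing results.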
