Center: Finding the Median Median Spread: Home on the Range

Center: Finding the Median
• When we think of a typical value, we usually look for the center of the distribution.
• For a unimodal, symmetric distribution, it’s easy to find the center: it’s just the center of symmetry.
• We could average the minimum and maximum data values (called the midrange) as a measure of center, but the midrange is very sensitive to skewed distributions and outliers.

Copyright © 2004 Pearson Education, Inc.

Center: Finding the Median (cont.)
• A more reasonable choice for center than the midrange is the value with exactly half the data values below it and half above it. This particular value is called the median.
• The median is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.
• The median has the same units as the data.

Median
The sample median is the $\frac{n+1}{2}$th largest observation.
If $\frac{n+1}{2}$ is not a whole number, the median is the average of the two observations on either side.

Spread: Home on the Range
• When describing a distribution numerically, we always report a measure of its spread along with its center.
• The range of the data is the difference between the maximum and minimum values: Range = max – min.
• A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.

The Interquartile Range
• The interquartile range (IQR) allows us to ignore extreme data values and concentrate on the middle of the data.
• To find the IQR, we first need to know what quartiles are…

Quartiles
Quartiles split the data into quarters
• Lower quartile (Q1) divides bottom half of data into two
  – median of observations below the median
• Upper quartile (Q3) divides upper half of data into two
  – median of observations above the median
• The difference between the quartiles is the IQR, so IQR = upper quartile – lower quartile.

The Interquartile Range (cont.)
• The lower and upper quartiles are the 25th and 75th percentiles of the data, so…
• The IQR contains the middle 50% of the values of the distribution, as shown in Figure 5.3 from the text.

The Five-Number Summary
• Five-number summary: { Min, Q1, Median, Q3, Max }
• Example:

Boxplots
• A boxplot is a graphical display of the five-number summary. The steps involved in constructing a boxplot can also be found on pages 60-61 of the text.
• Boxplots are particularly useful when comparing groups.

Boxplot
[Figure 2.4.4 Construction of a box plot. The box runs from Q1 to Q3 with the median marked; each whisker reaches out 1.5 IQR from the box and is pulled back until it hits an observation. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.]

Construction of Boxplot
Data: breaking strength of wire in kilograms
220 214 222 218 223 210 223 210 227 225 212

Stem-and-leaf display (Leaf Unit = 1.0 kg):
  4    21  0024
  5    21  8
 (4)   22  0233
  2    22  57

• Find Median
• Find Quartiles: Q1 =     Q3 =
• Calculate Interquartile range: Q3 – Q1 =
• Calculate whisker length: 1.5 x (Q3 – Q1) =

Comparing Groups With Boxplots
• The following set of boxplots compares the effectiveness of various coffee containers:
• What does this graphical display tell you?
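The boxplot steps for the wire data can be sketched in Python. This is a minimal sketch, assuming the rule stated on the Quartiles slide (each quartile is the median of the observations below or above the median); the function names `median`, `quartiles`, `five_number_summary`, and `whisker_ends` are illustrative and do not come from the slides, and other software may use a different quartile rule.

```python
def median(values):
    """Middle ordered value; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the (n + 1) / 2-th ordered observation
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as the medians of the observations below and above the median."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2  # when n is odd, the middle observation belongs to neither half
    return median(ordered[:half]), median(ordered[n - half:])


def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max)."""
    q1, q3 = quartiles(values)
    return min(values), q1, median(values), q3, max(values)


def whisker_ends(values):
    """Reach 1.5 * IQR beyond the box, then pull back to the nearest observation."""
    q1, q3 = quartiles(values)
    reach = 1.5 * (q3 - q1)
    inside = [v for v in values if q1 - reach <= v <= q3 + reach]
    return min(inside), max(inside)


# Breaking strength of wire in kilograms, from the slide.
wire = [220, 214, 222, 218, 223, 210, 223, 210, 227, 225, 212]

print(five_number_summary(wire))  # (210, 212, 220, 223, 227)
q1, q3 = quartiles(wire)
print(q3 - q1)                    # 11
print(1.5 * (q3 - q1))            # 16.5
print(whisker_ends(wire))         # (210, 227)
```

Under this rule no observation lies more than 1.5 IQR beyond the box, so both whiskers stop at the minimum and maximum.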
Summarizing Symmetric Distributions
• Medians do a good job of identifying the center of skewed distributions. When we have symmetric data, the mean is a good measure of center.
• We find the mean by adding up all of the data values and dividing by n, the number of data values we have.

Sample Mean – average
• The sample mean is denoted by $\bar{x}$.

$$\bar{x} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

[Figure 2.4.1 Mechanical construction representing a dot plot: (a) shows a balanced rod while (b) and (c) show unbalanced rods.]

Mean or Median?
• Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance.
• In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used.
• For skewed data, though, it’s better to report the median than the mean as a measure of center.

Relationship between mean and median
[Figure 2.4.2 The mean and the median. (a) Data symmetric about P, with Med = $\bar{x}$. (b) Two largest points moved to the right, with Med and $\bar{x}$ marked separately. Grey disks in (b) are the "ghosts" of the points that were moved. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.]

What About Spread?
• A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.
• A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.

Variance
• The sample variance, denoted by $s^2$, is found using the formula

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$$
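The mean, the deviations, and the sample variance can be checked numerically. The following is a minimal sketch using the wire breaking-strength data from the boxplot slide; the helper names `sample_mean` and `sample_variance` are illustrative, and the 20 kg shift applied to the two largest values is a made-up change that mimics panel (b) of Figure 2.4.2.

```python
import math
import statistics

# Breaking strength of wire in kilograms, from the boxplot slide.
wire = [220, 214, 222, 218, 223, 210, 223, 210, 227, 225, 212]


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sample_mean(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)


xbar = sample_mean(wire)
deviations = [x - xbar for x in wire]

# The raw deviations cancel out (up to rounding), which is why they are squared.
print(sum(deviations))

# The hand-written formula should agree with the standard library.
print(sample_variance(wire), statistics.variance(wire))

# Hypothetical change: move the two largest points 20 kg to the right.
shifted = sorted(wire)
shifted[-1] += 20
shifted[-2] += 20

# The median stays put while the mean is pulled toward the moved points.
print(statistics.median(wire), statistics.median(shifted))
print(sample_mean(wire), sample_mean(shifted))
```

Dividing by n - 1 rather than n is what the slide calls "an average of sorts".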
Sample Standard Deviation

$$s_x = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$$

• In same units as data
  – So preferable to sample variance
• Equals zero only if all observations identical
• Sensitive to outliers (extreme observations)
• Button on calculator – learn to use it!
  – Much simpler than applying formula

Shape, Center, and Spread
• When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread.
• If the shape is skewed, report the median and IQR.
• If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.

What About Outliers?
• If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
• Note: The median and IQR are not likely to be affected by the outliers.

What Can Go Wrong?
• Do a reality check: don’t let technology do your thinking for you.
• Don’t forget to sort the values before finding the median or percentiles.
• Don’t compute numerical summaries of a categorical variable.
• Watch out for multiple modes: multiple modes might indicate multiple groups in your data.

What Can Go Wrong? (cont.)
• Be aware of slightly different methods: different statistics packages and calculators may give you different answers for the same data.
• Beware of outliers.
• Make a picture (make a picture, make a picture).
• Be careful when comparing groups that have very different spreads.

So What Do We Know?
• We describe distributions in terms of shape, center, and spread.
• For symmetric distributions, it’s safe to use the mean and standard deviation; for skewed distributions, it’s better to use the median and interquartile range.
• Always make a picture: don’t make judgments about which measures of center and spread to use by just looking at the data.
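Two of the warnings above can be illustrated with Python's standard `statistics` module. This is a sketch under stated assumptions: the data are the wire breaking strengths from the boxplot slide, the 300 kg observation is a made-up outlier added only for illustration, and `describe` is a hypothetical helper name. The two `method` options of `statistics.quantiles` stand in for "different statistics packages"; neither is claimed to be the rule used in the text.

```python
import statistics

# Breaking strength of wire in kilograms, from the boxplot slide.
wire = [220, 214, 222, 218, 223, 210, 223, 210, 227, 225, 212]

# Hypothetical outlier: 300 kg is invented here, not part of the slide data.
with_outlier = wire + [300]


def describe(values):
    """Report mean and standard deviation alongside median and IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # divides by n - 1, like the slide formula
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


clean = describe(wire)
dirty = describe(with_outlier)

# Mean and standard deviation move a lot; median and IQR barely move.
for name in ("mean", "stdev", "median", "iqr"):
    print(name, clean[name], dirty[name])

# Same data, two quartile conventions: the lower quartile differs.
q_exclusive = statistics.quantiles(wire, n=4, method="exclusive")
q_inclusive = statistics.quantiles(wire, n=4, method="inclusive")
print(q_exclusive)
print(q_inclusive)
```

Reporting both rows of `describe`, with and without the outlier, follows the advice on the "What About Outliers?" slide.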
