Boxplot Is a Visual Representation of the Data Using Min, Q1, Median, Q3, Max and Outliers
Ch2.3 Boxplot
Boxplot is a visual representation of the data using min, Q1, median, Q3, max and outliers.
Ex. (Boxplot without outliers): In the rainfall example, the 5-number summary is (Min, Q1, Med, Q3, Max) = (1, 3.45, 4.6, 5.4, 8.2).
[Figure: boxplot of the rainfall example]
Ex. (Boxplot with outliers): Consider a different dataset whose 5-number summary is (Min, Q1, Med, Q3, Max) = (-3, 3.4, 4.6, 5.4, 9.0).
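As a rough check of the two examples above, the sketch below applies the 1.5 IQR rule (the same rule used in the construction steps later in this section) to each 5-number summary. The function names `iqr_fences` and `extremes_flagged` are illustrative only and do not come from the notes. Note that a 5-number summary can only reveal whether the minimum or maximum lies outside the fences; it cannot say how many outliers there are.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def extremes_flagged(summary):
    """Given (min, Q1, median, Q3, max), report whether the min and the max
    fall outside the 1.5 IQR fences."""
    mn, q1, _med, q3, mx = summary
    low, high = iqr_fences(q1, q3)
    return mn < low, mx > high


rainfall = (1, 3.45, 4.6, 5.4, 8.2)   # fences are about 0.525 and 8.325
other = (-3, 3.4, 4.6, 5.4, 9.0)      # fences are about 0.4 and 8.4

print(extremes_flagged(rainfall))  # (False, False): no outliers
print(extremes_flagged(other))     # (True, True): both extremes are outliers
```

For the second dataset, both the minimum (-3) and the maximum (9.0) fall outside the fences, which is why its boxplot shows outliers.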
Construct a Boxplot:
Ex. Age of the “best actor” Oscar winner (1991-2000): 55, 42, 37, 38, 31, 45, 60, 46, 40, 36
Steps:
(1) Sort the data: 31 36 37 38 40 42 45 46 55 60
(2) Find the 5-number summary: min=31, Q1=37, median=41, Q3=46, max=60
(3) Identify outliers using the 1.5 IQR rule
IQR = 46 - 37 = 9
1.5*IQR = 13.5
Q1 - 1.5*IQR = 37 - 13.5 = 23.5 (no data points below 23.5)
Q3 + 1.5*IQR = 46 + 13.5 = 59.5 (one data point, 60, is above 59.5, so 60 is an outlier)
(4) Draw a central box spanning the quartiles (from Q1 to Q3)
(5) Draw a line in the box marking the median
(6) Outliers are plotted individually
(7) Draw lines from the box out to the smallest and the largest observations that are not outliers (here 31 and 55)
Note: The boxplot is the left one on the next page.
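Steps (1) to (3) and the whisker endpoints from step (7) can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are made up for this example, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces Q1=37 and Q3=46 above. Leaving the median out of both halves when the sample size is odd is an assumption, since the notes only show an even-sized sample. Statistical software often uses other quartile conventions and may give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)                                   # step (1)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n the median itself is left out of both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]  # step (2)


def boxplot_stats(data):
    """Compute the quantities needed to draw a boxplot by hand."""
    mn, q1, med, q3, mx = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr                          # step (3)
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in sorted(data) if x < low_fence or x > high_fence]
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "summary": (mn, q1, med, q3, mx),
        "fences": (low_fence, high_fence),
        "outliers": outliers,
        "whiskers": (min(inside), max(inside)),         # step (7)
    }


ages = [55, 42, 37, 38, 31, 45, 60, 46, 40, 36]
stats = boxplot_stats(ages)
print(stats["fences"])    # (23.5, 59.5)
print(stats["outliers"])  # [60]
print(stats["whiskers"])  # (31, 55)
```

The whiskers stop at 31 and 55 because 60 lies above the upper cutoff of 59.5 and is therefore plotted as an individual point.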
Comment: Boxplots are very useful when presented side-by-side for comparison.
Ex. Side-by-side boxplots of the ages of the best actor (1) and the best actress (2) Oscar winners in the past 10 years
Shape: In each boxplot, Q1 is a little closer to the median (center) than Q3 is, and the minimum is closer to the center than the maximum is, so both distributions are slightly skewed to the right (positively skewed).
Center: The center for actors is about 7 years larger than that for actresses.
Spread: Both distributions have similar spread.
Outliers: There is one outlier in the actor data, but none in the actress data.
Overall: The shapes of both distributions look similar. We can roughly view the actor distribution as the actress distribution shifted upward by about 7 years.