
Describing Quantitative Data (Shape, Center, and Spread) “A picture is worth a thousand words”

To describe quantitative data is to explain how the data is distributed.

• What is the shape of the data?
• Where is the center of the data?
• How spread out is the data?

Think Before You Draw…
• Remember the “Make a picture” rule? Now that we have options for data displays, you need to Think carefully about which type of display to make.
• Before making a stem-and-leaf display, a histogram, or a dotplot, check the
  o Quantitative Data Condition: The data are values of a quantitative variable whose units are known.

Shape, Center, and Spread
• When describing a distribution, always tell about three things: shape, center, and spread.

What is the Shape of the Distribution?
1. Does the histogram have a single, central hump or several separated humps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?

Humps

1. Does the histogram have a single, central hump or several separated humps?
  o Humps in a histogram are called modes.
  o A histogram with one main peak is called unimodal; a histogram with two peaks is bimodal; a histogram with three or more peaks is multimodal.

o A bimodal histogram has two apparent peaks:

Modes are one way to describe the SHAPE of the graph. The humps of the graph are called MODES.

• A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:
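The vocabulary above can be tried out on a list of bar heights. The following is a minimal sketch, not a standard method: the function names, the made-up bar heights, and the 20% tolerance used to decide "approximately the same height" are all assumptions chosen for illustration.

```python
def count_peaks(counts):
    """Count humps: bars (or flat runs of bars) taller than both neighbors."""
    peaks = 0
    n = len(counts)
    i = 0
    while i < n:
        j = i
        # Treat a flat run of equal-height bars as a single candidate hump.
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        left_lower = i == 0 or counts[i - 1] < counts[i]
        right_lower = j == n - 1 or counts[j + 1] < counts[i]
        if left_lower and right_lower and counts[i] > 0:
            peaks += 1
        i = j + 1
    return peaks


def describe_modes(counts, tolerance=0.2):
    """Label a histogram as uniform, unimodal, bimodal, or multimodal.

    `tolerance` is an assumed cutoff: if the shortest bar is within 20%
    of the tallest bar, the bars count as "approximately the same height".
    """
    tallest, shortest = max(counts), min(counts)
    if tallest - shortest <= tolerance * tallest:
        return "uniform"
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"


# Made-up bar heights, for illustration only.
print(describe_modes([1, 3, 6, 9, 6, 3, 1]))     # unimodal
print(describe_modes([2, 7, 9, 4, 1, 5, 8, 3]))  # bimodal
print(describe_modes([5, 5, 6, 5, 5]))           # uniform
```

Real histograms are bumpier than these examples, so a strict count like this one will often report more peaks than you would call "main" humps by eye. Treat it as a check on your reading of the picture, not a replacement for it.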

Symmetry

2. Is the histogram symmetric?
  o If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

• The (usually) thinner ends of a distribution are called the tails.
• If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.
• In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.
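One rough way to check "which tail stretches out farther" is to compare how far the smallest and largest values sit from the middle of the data. The sketch below is only an illustration under stated assumptions: it uses the median as the "middle", the 25% tolerance for calling the tails about equal is arbitrary, and the data values are made up.

```python
from statistics import median


def skew_direction(values, tolerance=0.25):
    """Compare the lengths of the two tails, measured from the median.

    `tolerance` is an assumed cutoff: tails whose lengths differ by no
    more than 25% of the longer tail are treated as about equal.
    """
    middle = median(values)
    left_tail = middle - min(values)
    right_tail = max(values) - middle
    longer = max(left_tail, right_tail)
    if longer == 0 or abs(right_tail - left_tail) <= tolerance * longer:
        return "roughly symmetric"
    # The distribution is skewed to the side of the longer tail.
    return "skewed right" if right_tail > left_tail else "skewed left"


# Made-up data sets, for illustration only.
print(skew_direction([1, 2, 2, 3, 3, 3, 4, 4, 5]))       # roughly symmetric
print(skew_direction([1, 2, 2, 3, 3, 4, 9, 15]))         # skewed right
print(skew_direction([1, 7, 12, 13, 13, 14, 14, 15]))    # skewed left
```

Because this check looks only at the minimum and maximum, a single straggler can make a tail look long. Always confirm the direction of skew by looking at the histogram itself.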

Summary:

Anything Unusual?

3. Do any unusual features stick out?
• Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
• You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
• Are there any gaps in the distribution? If so, we might have data from more than one group.

Are there GAPS in your data?

• The following histogram has outliers (there are three cities in the leftmost bar):

Are there groups or clusters of data?

If so, your data may not all be of the same type, may come from different sources, or contain more than one group.

There are two main clusters of data here: one ranging from approximately 7 to 12 and the other ranging from approximately 25 to 37.
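Gaps and clusters can also be found by sorting the values and looking for large jumps between neighbors. This is a minimal sketch: the function name, the `min_gap` cutoff of 5, and the data values are assumptions. The values are made up to resemble two clusters like the ones described above; they are not the data behind the graph.

```python
def split_into_clusters(values, min_gap):
    """Split sorted data into clusters wherever neighbors differ by min_gap or more."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous >= min_gap:
            clusters.append([current])    # a gap: start a new cluster
        else:
            clusters[-1].append(current)  # no gap: stay in the same cluster
    return clusters


# Made-up values, for illustration only.
values = [7, 8, 9, 9, 10, 12, 25, 27, 30, 31, 34, 37]
for cluster in split_into_clusters(values, min_gap=5):
    print(f"cluster from {cluster[0]} to {cluster[-1]} ({len(cluster)} values)")
```

How large a jump has to be before it counts as a gap depends on the data and its units, so the cutoff is a judgment call, just as it is when reading a histogram.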

Where is the CENTER of the distribution?

It's easy to identify the center of data that is somewhat uniform or unimodal-symmetric:

The center of a skewed or multimodal graph is not as easy to determine or use. Certain measures of center can be meaningless or unhelpful for these types of graphs.

Spread:

How SPREAD out is the distribution?

• Always report a measure of spread along with a measure of center when describing a distribution numerically.
• The range of the data is the difference between the maximum and minimum values:
  Range = max – min
• A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
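The range formula and its weakness are easy to see with a quick calculation. In this sketch the function name and the data values are made up for illustration.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)


# Made-up retrieval times in minutes, for illustration only.
times = [3, 4, 4, 5, 6, 6, 7]
times_with_outlier = times + [40]  # add one extreme value

print(data_range(times))               # 4
print(data_range(times_with_outlier))  # 37
```

Adding one extreme value changes the range from 4 to 37 even though the other seven values have not moved, which is why the range alone can misrepresent how spread out most of the data is.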

Example: The following graph displays the time it takes for a warehouse to retrieve parts for customer orders. Describe the distribution of the data.

Why is the graph shaped like this?

The following graph displays the time it takes for a warehouse to retrieve parts for customer orders.

What real life aspects could account for the shape of this graph?

This graph is clearly bimodal, which suggests that there are two groups of parts. One group takes between 1 and 5 minutes to retrieve and the other takes 6 to 12 minutes.

It is possible that the two groups are separated by weight. Maybe the warehouse employees need to use a machine to retrieve the heavier parts whereas the lighter parts can be retrieved more quickly by hand.

There are other possibilities…

Maybe some parts are kept in a warehouse that is a bit farther away from the customer than the main warehouse.

Try this:

The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot.

a) How many Kentucky Derby winning times were greater than 150 seconds?

b) Approximate the best Kentucky Derby winning time.

c) Why do you think there is a large gap in the middle of the data? (What real life situation may have caused this gap?)