The Histogram on Minitab

(a) MTB > hist c1

[Draws a histogram of the sample in c1; Minitab chooses the classes]

(b) MTB > hist c1;
    SUBC> start m;
    SUBC> incr w.

[Draws a histogram of the sample in c1 with starting mid-point ‘m’ and class width (CW) ‘w’]

Note: The Minitab output gives the class mid-points m_i and the class frequencies (counts) f_i. The class width must be obtained by hand:

CW = the distance between successive mid-points.

The lower and upper boundaries of the ith class are given by

[LCB, UCB) = [m_i - 0.5 CW, m_i + 0.5 CW)
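The boundary rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something Minitab provides: the function name class_boundaries and the mid-points 10, 20, 30 are hypothetical, and the mid-points are assumed to be equally spaced.

```python
from fractions import Fraction

def class_boundaries(midpoints):
    """Return (CW, [(LCB, UCB), ...]) for equally spaced class mid-points.

    Hypothetical helper: mid-points are passed as strings so that
    Fraction can represent them exactly (no floating-point rounding).
    """
    mids = [Fraction(m) for m in midpoints]
    cw = mids[1] - mids[0]            # CW = distance between successive mid-points
    return cw, [(m - cw / 2, m + cw / 2) for m in mids]

# Made-up mid-points, chosen only for illustration
cw, bounds = class_boundaries(["10", "20", "30"])
print("CW =", float(cw))
for lcb, ucb in bounds:
    print(f"[{float(lcb)}, {float(ucb)})")
```

Each class includes its lower boundary and excludes its upper boundary, matching the [LCB, UCB) notation.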

Example (rem.mtw): A part of sleep called REM sleep is associated with dreaming. The data below give the fraction of time spent in REM sleep for a random sample of 40 adults.

0.23 0.18 0.24 0.24 0.27 0.13 0.15 0.35 0.18 0.18

0.17 0.14 0.19 0.21 0.20 0.14 0.21 0.18 0.18 0.28

0.23 0.24 0.17 0.24 0.17 0.17 0.12 0.23 0.18 0.28

0.10 0.20 0.19 0.20 0.31 0.26 0.23 0.17 0.18 0.17

A. MTB > hist c1

Histogram of %REM   N = 40

Midpoint   Count
    0.10       1  *
    0.12       1  *
    0.14       3  ***
    0.16       1  *
    0.18      13  *************
    0.20       5  *****
    0.22       2  **
    0.24       8  ********
    0.26       1  *
    0.28       3  ***
    0.30       0
    0.32       1  *
    0.34       0
    0.36       1  *

B. MTB > hist c1;
   SUBC> start .08;
   SUBC> incr .04.

Histogram of %REM   N = 40

Midpoint   Count
  0.0800       0
  0.1200       3  ***
  0.1600       9  *********
  0.2000      14  **************
  0.2400       8  ********
  0.2800       4  ****
  0.3200       1  *
  0.3600       1  *
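The counts in histogram B can be checked outside Minitab with a short Python sketch. It assumes, as in the boundary rule given earlier, that each class is [m_i - 0.5 CW, m_i + 0.5 CW), so a value on a boundary goes to the upper class. The function name hist_counts is hypothetical, and exact fractions are used so that boundary values such as 0.18 are not misplaced by floating-point rounding.

```python
from fractions import Fraction
from math import floor

# REM sleep fractions from the example, kept as strings so that
# Fraction can represent each value exactly
REM = """
0.23 0.18 0.24 0.24 0.27 0.13 0.15 0.35 0.18 0.18
0.17 0.14 0.19 0.21 0.20 0.14 0.21 0.18 0.18 0.28
0.23 0.24 0.17 0.24 0.17 0.17 0.12 0.23 0.18 0.28
0.10 0.20 0.19 0.20 0.31 0.26 0.23 0.17 0.18 0.17
""".split()

def hist_counts(values, start, incr):
    """Return [(mid-point, count), ...] for classes [m - w/2, m + w/2).

    start is the first mid-point and incr the class width, mirroring the
    START and INCR subcommands (this mirrors the notes, not Minitab's code).
    """
    m0, w = Fraction(start), Fraction(incr)
    lower = m0 - w / 2                          # lower boundary of the first class
    counts = {}
    for v in values:
        k = floor((Fraction(v) - lower) / w)    # index of the class containing v
        counts[k] = counts.get(k, 0) + 1
    first, last = min(0, min(counts)), max(counts)
    return [(m0 + k * w, counts.get(k, 0)) for k in range(first, last + 1)]

for mid, count in hist_counts(REM, "0.08", "0.04"):
    print(f"{float(mid):.4f} {count:3d}  {'*' * count}")
```

Changing the last two arguments shows how the choice of starting mid-point and class width alters the picture of the same sample.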

QUESTIONS:

1. With reference to histogram A answer the following:

(a) What is the class width?

(b) What are the class boundaries?

(c) What percentage of the subjects in the sample spent 23% or more of their sleep in REM sleep?

2. With reference to histogram B answer the following:

(a) Find the boundaries of the class containing the greatest number of sample values.

(b) How many sample values fall in the last four classes?

(c) What is the shape of this histogram?

3. Suppose you wanted to use Minitab to form a histogram with the lower boundary of the first class equal to 1 and a class width equal to 6. What Minitab Commands would you use?

Ideally we want a histogram that gives us as much information as possible. If we have too few classes we lose too much individual information; if we have too many we lose overall information.

MTB > hist c1;
SUBC> start .1;
SUBC> incr .18.

Histogram of %REM   N = 40

Midpoint   Count
   0.100      19  *******************
   0.280      21  *********************

MTB > hist c1;
SUBC> start .1;
SUBC> incr .005.

Histogram of %REM   N = 40

Midpoint   Count
 0.10000       1  *
 0.10500       0
 0.11000       0
 0.11500       0
 0.12000       1  *
 0.12500       0
 0.13000       1  *
 0.13500       0
 0.14000       2  **
 0.14500       0
 0.15000       1  *
 0.15500       0
 0.16000       0
 0.16500       0
 0.17000       6  ******
 0.17500       0
 0.18000       7  *******
 0.18500       0
 0.19000       2  **
 0.19500       0
 0.20000       3  ***
 0.20500       0
 0.21000       2  **
 0.21500       0
 0.22000       0
 0.22500       0
 0.23000       4  ****
 0.23500       0
 0.24000       4  ****
 0.24500       0
 0.25000       0
 0.25500       0
 0.26000       1  *
 0.26500       0
 0.27000       1  *
 0.27500       0
 0.28000       2  **
 0.28500       0
 0.29000       0
 0.29500       0
 0.30000       0
 0.30500       0
 0.31000       1  *
 0.31500       0
 0.32000       0
 0.32500       0
 0.33000       0
 0.33500       0
 0.34000       0
 0.34500       0
 0.35000       1  *

Histograms for Discrete and Categorical Data

For discrete data, the individual data values may in some instances be used as classes.

Example: The table below gives the number of children in a random sample of 40 Canadian families.

Number of Children    0   1   2   3   4
Number of Families    8  12  15   4   1

For categorical data, the categories themselves may be used as classes.

Exercise: The table below gives the final grades of a class of 100 students in an elementary statistics course:

Grades                A   B   C   D   F
Number of students   15  25  30  20  10

(a) What type of data is presented here?

(b) Draw a histogram of this data.

Some Typical Sample Shapes

(1) A LEFT SKEWED sample is one whose histogram (or stem and leaf plot) has a long left tail; the sample values tend to cluster at the right end of the scale and taper off at the lower end.

(2) A RIGHT SKEWED sample is one whose histogram (or stem and leaf plot) has a long right tail; the sample values tend to cluster at the left end of the scale and taper off at the higher end.

(3) A SYMMETRIC sample is one whose histogram (or stem and leaf plot) is distributed approximately the same on each side of some central value.

(4) A particular type of symmetric sample is a BELL-SHAPED sample; all bell-shaped samples are symmetric, but not all symmetric samples are bell-shaped.

The Stem and Leaf Plot

Although a histogram tells us how many observations fall in a particular class, we lose information about the different values within a class. For example, in histogram B of the REM sleep data the class [.26, .30) includes exactly 4 sample values; but unless we go back to the original data, we would not know that these values are .26, .27, .28 and .28. Note that the mid-point for this class is .28 and yet all four sample values in the class are at or below this mid-point.

A different method for displaying data which overcomes some of these difficulties, and which is easy to do, even by hand, is the Stem and Leaf Plot.

Recall that the position of an individual digit in a number tells us the value the digit represents. For example, consider the number 962.78. For this number 9 is the 100’s digit, 6 is the 10’s digit, 2 is the 1’s digit, 7 is the .1’s digit and 8 is the .01’s digit.

The stem and leaf plot uses selected digits (called stems) to group the sample into classes. The individual observations are then represented by the stem and the next significant digit; any remaining digits are dropped. This is called the “truncation” method.

EXAMPLE: Consider the following sample data of English Scholastic Aptitude Test (SAT) scores.

638 574 627 621 705 690 522 612 594 581 640 653 638 760 491

The data here range from 491 to 760. Thus it is reasonable to group the sample using the 100’s digit.

The number “638” is plotted as 63 (the 8 is truncated).

“6” is called the stem. The stem unit is SU = 100, since the stem digit is the 100’s digit.

“3” is called the leaf. The leaf unit is LU = 10, since the leaf digit is the 10’s digit.

“63” represents 63 leaf units; thus 63 = 63(LU) = 63(10) = 630. Thus, although a value read from the plot contains more information about the observation than is available from a histogram, it may only be an approximation to the actual sample value because of truncation. [63 = 630 approximates the actual value of 638.]

Notes:

1. Stem unit = 10 * Leaf unit, i.e. SU = 10*LU.

2. The leaf always consists of a single digit [anything more is truncated].

To illustrate these ideas, we plot the sample as follows:

638 574 627 621 705 690 522 612 594 581 640 653 638 760 491

Initial Plot (leaves in data order):     Final Plot (leaves sorted):

4 | 9                                    4 | 9
5 | 7298                                 5 | 2789
6 | 32291453                             6 | 12233459
7 | 06                                   7 | 06

LU = 10, SU = 100

This plot is called a one leaf category per stem (LCPS) plot. The number of LCPS is simply the number of lines that share the same stem value.

Two leaf category per stem plot:

Stems | Leaves
    4 | 9
    5 | 2
    5 | 789
    6 | 122334
    6 | 59
    7 | 0
    7 | 6

LU = 10, SU = 100

Five leaf category per stem plot:

Stems | Leaves
    4 | 9
    5 |
    5 | 2
    5 |
    5 | 7
    5 | 89
    6 | 1
    6 | 2233
    6 | 45
    6 |
    6 | 9
    7 | 0
    7 |
    7 |
    7 | 6

LU = 10, SU = 100

Note: The increment of a stem and leaf plot is the distance from one line to the next: INCR = SU/#LCPS. Notice that the stem together with the increment represents classes. In the one leaf category per stem plot above the increment is 100. Therefore, that plot represents the following 4 classes:

[400,500), [500,600), [600,700), [700,800)

While a histogram gives the number of sample values in each class, the stem and leaf display plots the (truncated) sample values belonging to each class.
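The truncation method and the role of the number of leaf categories per stem can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Minitab's own algorithm: the function name stem_and_leaf is hypothetical, the data are assumed to be non-negative integers, and the leaf unit is assumed to be a positive integer.

```python
def stem_and_leaf(values, leaf_unit, lcps=1):
    """Return the lines of a stem and leaf plot built by truncation.

    values    : non-negative integers (assumption of this sketch)
    leaf_unit : LU, so that SU = 10 * LU
    lcps      : leaf categories per stem (1, 2 or 5)
    """
    per_line = 10 // lcps                          # leaf digits covered by one line
    cut = sorted(v // leaf_unit for v in values)   # truncate: 638 -> 63 when LU = 10
    rows = {}
    for t in cut:
        stem, leaf = divmod(t, 10)                 # 63 -> stem 6, leaf 3
        rows.setdefault((stem, leaf // per_line), []).append(leaf)
    (stem, part), last = min(rows), max(rows)
    lines = []
    while (stem, part) <= last:                    # include empty lines in between
        leaves = "".join(str(d) for d in rows.get((stem, part), []))
        lines.append(f"{stem} | {leaves}".rstrip())
        part += 1
        if part == lcps:                           # move on to the next stem
            stem, part = stem + 1, 0
    return lines

SAT = [638, 574, 627, 621, 705, 690, 522, 612, 594, 581, 640, 653, 638, 760, 491]

for k in (1, 2, 5):
    print(f"{k} LCPS, INCR = {100 // k}")
    print("\n".join(stem_and_leaf(SAT, leaf_unit=10, lcps=k)))
    print()
```

Sorting the truncated values first means the leaves on each line come out in increasing order, as in a final plot.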

The Stem and Leaf Plot on Minitab

MTB > stem c1 [Draws a stem and leaf plot of the sample in c1]

Note: The Minitab output gives only the leaf unit (LU). The stem unit (SU) can be obtained by the rule SU = 10*LU.

Example: Consider the sample data of English Scholastic Aptitude Test (SAT) scores

638 574 627 621 705 690 522 612 594 581 640 653 638 760 491

MTB > stem c1

Stem-and-leaf of C1   N = 15
Leaf Unit = 10

   1   4  9
   2   5  2
   5   5  789
  (6)  6  122334
   4   6  59
   2   7  0
   1   7  6

Leaf Unit: LU = 10
Stem Unit: SU = 10*LU = 100
#LCPS = 2
INCREMENT = SU/#LCPS = 100/2 = 50

Values are easily read from the plot. For example, the smallest sample value is 49 = 49(LU) = 49*10 = 490 (the actual value, 491, has been truncated).

Notes:

1. Minitab also displays a depth column. The numbers in this column count cumulatively the number of leaves from the top line downward and from the bottom line upward, as long as the count is ≤ n/2. If there is a row remaining (the middle row), the number of leaves in this row is given in parentheses [in the example above, (6)].

2. The scale (i.e. the stem unit and number of leaf categories per stem) can be controlled using the subcommand INCRement. This is illustrated below.
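The depth column described in Note 1 can be reproduced from the number of leaves on each line. The sketch below follows the rule as stated in these notes (cumulative counts from each end while the count is at most n/2, with the remaining middle row shown in parentheses); the function name depth_column is hypothetical, and Minitab's own implementation may treat special cases differently.

```python
def depth_column(row_counts):
    """Return the depth column for a plot whose lines hold row_counts leaves."""
    n = sum(row_counts)
    depths = [None] * len(row_counts)

    total = 0
    for i, c in enumerate(row_counts):              # count down from the top line
        total += c
        if 2 * total > n:                           # stop once the count exceeds n/2
            break
        depths[i] = str(total)

    total = 0
    for i in range(len(row_counts) - 1, -1, -1):    # count up from the bottom line
        total += row_counts[i]
        if 2 * total > n:
            break
        if depths[i] is None:
            depths[i] = str(total)

    # Any row not reached from either end is the middle row
    return [d if d is not None else f"({c})" for d, c in zip(depths, row_counts)]

# Leaves per line in the two LCPS plot of the SAT scores above
print(depth_column([1, 1, 3, 6, 2, 1, 1]))
```

For the SAT plot above this gives 1, 2, 5 from the top, 1, 2, 4 from the bottom, and (6) for the middle row.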

To obtain a stem and leaf plot with stem unit SU and 1, 2 or 5 leaf categories per stem, use the STEM command with the subcommand INCRement, where

INCR = SU/1 for a plot with stem unit SU and 1 LCPS
     = SU/2 for a plot with stem unit SU and 2 LCPS
     = SU/5 for a plot with stem unit SU and 5 LCPS

Example: Consider the English SAT scores of the previous example. Suppose we want a stem and leaf plot with SU = 100.

(a) MTB > stem c1;
    SUBC> incr 100.

Stem-and-leaf of C1   N = 15
Leaf Unit = 10

   1   4  9
   5   5  2789
  (8)  6  12233459
   2   7  06

(b) MTB > stem c1;
    SUBC> incr 50.

Stem-and-leaf of C1   N = 15
Leaf Unit = 10

   1   4  9
   2   5  2
   5   5  789
  (6)  6  122334
   4   6  59
   2   7  0
   1   7  6

(c) MTB > stem c1;
    SUBC> incr 20.

Stem-and-leaf of C1   N = 15
Leaf Unit = 10

   1   4  9
   1   5
   2   5  2
   2   5
   3   5  7
   5   5  89
   6   6  1
  (4)  6  2233
   5   6  45
   3   6
   3   6  9
   2   7  0
   1   7
   1   7
   1   7  6

Example: The weights (in ounces) of 30 major league baseballs are

5.08 5.21 5.26 5.26 5.04 5.21 5.28 5.04 5.17 5.12
5.30 5.07 5.06 5.17 5.24 5.13 5.26 5.16 5.09 5.22
5.06 5.09 5.24 5.22 5.22 5.11 5.23 5.06 5.27 5.13

Construct a stem and leaf plot with stem unit .10 and with two leaf categories per stem.

MTB > stem c1;
SUBC> incr .05.

Stem-and-leaf of C1   N = 30
Leaf Unit = 0.010

   2   50  44
   9   50  6667899
  13   51  1233
  (3)  51  677
  14   52  11222344
   6   52  66678
   1   53  0

QUESTIONS:

1. (a) What value does “52 | 3” on the plot above represent?

(b) Find the sample range R.

2. Suppose you wish to use Minitab to draw a stem and leaf plot. What increment would you use to obtain a plot with the following specifications?

(a) Leaf Unit = 1 and one leaf category per stem.

SU= INCR=

(b) Stem Unit = 10 and 5 leaf categories per stem.

INCR=