
9/5/2017

MODULE 3: SUMMARY TO DESCRIBE DISTRIBUTIONS 3.2: Measures of Variability

Measures of Variability

• It is often desirable to consider measures of variability (dispersion) as well as measures of location.
• For example, in assessing the cost of care provided by a hospital, we might consider not only the cost of each stay but also the variability in cost.

Measures of Variability

1. Range
2. Interquartile Range
3. Variance
4. Standard Deviation
5.


Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Example: Hospital Stay

• Range
Range = largest value - smallest value
Range = 30 - 3 = 27

3 3 3 4 4 5 5 5 5 6 7 7 8 8 9 9 9 10 11 11 11 11 14 17 30
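The range calculation for this example can be sketched in a few lines of Python. This is only an illustration: the names `stays` and `data_range` are chosen here and do not come from the slides.

```python
# Durations of hospital stay in days (the 25 values from the example).
stays = [3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9,
         10, 11, 11, 11, 11, 14, 17, 30]


def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(data_range(stays))  # 27
```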

Using Excel

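• The Excel formula for the range is =MAX(starting cell:ending cell) - MIN(starting cell:ending cell). For example, if the 25 durations were stored in cells A1 to A25, the formula would be =MAX(A1:A25) - MIN(A1:A25).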


Example: Hospital Stay

• Interpretation: The range of 27 days is the difference between the longest stay (30 days) and the shortest stay (3 days). It shows the full extent of variation in the durations of hospital stay.

Interquartile Range

• The interquartile range of a data set is the difference between the third and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

3 3 3 4 4 5 5 5 5 6 7 7 8 8 9 9 9 10 11 11 11 11 14 17 30

3rd Quartile (Q3) = 11

1st Quartile (Q1) = 5

Interquartile Range (IQR) = Q3 - Q1 = 11 - 5 = 6
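As a rough check of the quartiles above, the following Python sketch uses the standard library function `statistics.quantiles`. Software packages use several different quartile conventions, and the slides do not say which one was used. The `method="inclusive"` setting here is an assumption that happens to reproduce Q1 = 5 and Q3 = 11 for this data set.

```python
import statistics

# Durations of hospital stay in days (the 25 values from the example).
stays = [3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9,
         10, 11, 11, 11, 11, 14, 17, 30]


def quartiles(values):
    """Return (Q1, Q3) using the 'inclusive' quantile convention (an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


q1, q3 = quartiles(stays)
iqr = q3 - q1
print(q1, q3, iqr)  # 5.0 11.0 6.0
```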

Example: Hospital Stay

• Interpretation: The interquartile range of 6 days is the spread of the middle 50% of the durations of hospital stay. It ignores the data outside this range (i.e., below the first quartile and above the third quartile).


Identifying Outliers

A data point is an outlier if it is greater than Q3 + 1.5 * IQR or less than Q1 - 1.5 * IQR.

Example:

Q1 = 5, Q3 = 11, IQR = 6
• 11 + 1.5 * 6 = 20 (any stay longer than 20 days is considered a potential outlier)
• 5 - 1.5 * 6 = -4 (a duration cannot be less than zero, so no stay can fall below the lower bound)
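The 1.5 * IQR rule can be applied to the full data set with a short Python sketch. The function name `iqr_fences` and the `method="inclusive"` quartile convention are assumptions made for this illustration. They reproduce Q1 = 5 and Q3 = 11 for this data set.

```python
import statistics

# Durations of hospital stay in days (the 25 values from the example).
stays = [3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9,
         10, 11, 11, 11, 11, 14, 17, 30]


def iqr_fences(values, k=1.5):
    """Return (lower, upper) bounds: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_fences(stays)
# Keep only the values that fall outside the two bounds.
outliers = [x for x in stays if x < lower or x > upper]
print(lower, upper, outliers)  # -4.0 20.0 [30]
```

Under these assumptions, the 30-day stay is the only value flagged as a potential outlier.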

Variance

• The variance is a measure of variability that utilizes all the data.
• It is the average of the squared deviations of each observation (x_i) from the mean.

Variance

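• If the data set is a population, the variance is denoted by σ², “sigma-squared”, and defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where μ is the population mean and N is the population size.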


Variance

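• If the data set is a sample, the variance is denoted by s² and defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where x̄ is the sample mean and n is the sample size.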

The Sample Standard Deviation

The sample standard deviation is simply the square root of the sample variance, i.e., $s = \sqrt{s^2}$.

Note that standard deviation is measured in the same units as the data, making it more easily comparable to the mean than the variance.

Computational Formulas

• To calculate the variance using the above formula:
1. Calculate the mean.
2. Compute the deviation of each observation from the mean.
3. Square each deviation and then sum the squared deviations.
4. Divide the sum of squared deviations by n - 1 to get the sample variance.
• An alternative method to compute the variance is given below.
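The four steps can be followed literally in Python. The sketch below is illustrative only. The function name `sample_variance` is chosen here, and the data are the 25 hospital stay durations from the example.

```python
import math

# Durations of hospital stay in days (the 25 values from the example).
stays = [3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9,
         10, 11, 11, 11, 11, 14, 17, 30]


def sample_variance(values):
    """Sample variance computed step by step from the definition."""
    n = len(values)
    mean = sum(values) / n                         # step 1: the mean
    deviations = [x - mean for x in values]        # step 2: deviations from the mean
    sum_squares = sum(d * d for d in deviations)   # step 3: sum of squared deviations
    return sum_squares / (n - 1)                   # step 4: divide by n - 1


variance = sample_variance(stays)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 32.67 5.72
```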


Computational Formulas

• The following formula can be used for computing the sample variance:

$$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

Example: Hospital Stay

Calculating the Sample Variance

• Both methods are illustrated on the following slides.

Example: Hospital Stay

ID       X     X - X̄    (X - X̄)²     X²
1        5     -3.6      12.96       25
2       10      1.4       1.96      100
3        6     -2.6       6.76       36
4       11      2.4       5.76      121
5        5     -3.6      12.96       25
6       14      5.4      29.16      196
7       30     21.4     457.96      900
8       11      2.4       5.76      121
9       17      8.4      70.56      289
10       3     -5.6      31.36        9
11       9      0.4       0.16       81
12       3     -5.6      31.36        9
13       8     -0.6       0.36       64
14       8     -0.6       0.36       64
15       5     -3.6      12.96       25
16       5     -3.6      12.96       25
17       7     -1.6       2.56       49
18       4     -4.6      21.16       16
19       3     -5.6      31.36        9
20       7     -1.6       2.56       49
21       9      0.4       0.16       81
22      11      2.4       5.76      121
23      11      2.4       5.76      121
24       9      0.4       0.16       81
25       4     -4.6      21.16       16
Total  215      0.0     784        2633


Variance Formula

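• We now calculate the variance of durations of hospital stay as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{784}{25 - 1} = 32.67$$

$$s = \sqrt{32.67} = 5.72 \text{ days}$$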

Alternative Formula for Variance

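Sometimes, a simpler computational formula for variance may be used.

$$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{2633 - \frac{215^2}{25}}{24} = \frac{2633 - 1849}{24} = \frac{784}{24} = 32.67$$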
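The computational formula needs only the sum of the values and the sum of their squares, which are the totals of the X and X² columns in the table. A minimal Python sketch follows. The function name `sample_variance_shortcut` is hypothetical, and the formula used is the common form (Σx² - (Σx)²/n) / (n - 1).

```python
# Durations of hospital stay in days, in ID order as in the table.
stays = [5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8, 8, 5, 5, 7,
         4, 3, 7, 9, 11, 11, 9, 4]


def sample_variance_shortcut(values):
    """Sample variance from the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)                    # total of the X column
    sum_x2 = sum(x * x for x in values)    # total of the X squared column
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


print(sum(stays), sum(x * x for x in stays))       # 215 2633
print(round(sample_variance_shortcut(stays), 2))   # 32.67
```

The result agrees with the definitional formula. With very large values the shortcut can lose precision in floating-point arithmetic, so the definitional form is usually the safer choice in software.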

Example: Hospital Stay

• Interpretation: The standard deviation of 5.7 days measures how spread out the durations are around the mean.
• Note that a low standard deviation indicates that most observations are clustered closely around the mean. A high standard deviation indicates that the observations are more widely spread around the mean.


Using Excel to calculate Standard Deviation


http://www.youtube.com/watch?v=WLpesF9MM98

Summary: Quantifying variation (spread)

• Range: Span of values from smallest to largest. Determined entirely by the most extreme data values.
• Interquartile Range: Span of values from the 25th to the 75th percentile (the middle 50% of the data). Ignores the data outside this range.
• Standard Deviation: Square root of the average squared difference between the data values and their mean. Includes all data values, BUT it is strongly affected by unusually large/small values (called outliers).
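To see how each measure reacts to an extreme value, the sketch below recomputes all three for the hospital stay data with and without the 30-day stay. It is an illustration only. The helper name `spread_summary` and the `method="inclusive"` quartile convention are assumptions, not part of the slides.

```python
import statistics

# Durations of hospital stay in days (the 25 values from the example).
stays = [3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9,
         10, 11, 11, 11, 11, 14, 17, 30]


def spread_summary(values):
    """Return the range, interquartile range, and sample standard deviation."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


with_outlier = spread_summary(stays)
without_outlier = spread_summary([x for x in stays if x != 30])

# Print each measure side by side: with the 30-day stay, then without it.
for name in with_outlier:
    print(name, round(with_outlier[name], 2), round(without_outlier[name], 2))
```

Under these assumptions, dropping the single 30-day stay shrinks the range and the standard deviation considerably, while the interquartile range changes much less.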
