STAT1010 – mean (or average)

Chapter 4: Describing data

• 4.1 Averages and measures of center
  - Describing the center of a distribution
• 4.2 Shapes of distributions
  - Describing the shape
• 4.3 Quantifying variation
  - Describing the spread of a distribution


4.1 What is an average?

• In statistics, we generally use the term mean instead of average, and the mean has a specific formula:

\[ \text{mean} = \frac{\text{sum of all values}}{\text{total number of values}} \]

• The term average can be interpreted in a variety of ways, so we’ll focus on the mean of a distribution or set of numbers.

Example:

Eight grocery stores sell the PR energy bar for the following prices: $1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79

Find the mean of these prices.

Solution: The mean price is $1.41:

\[ \text{mean} = \frac{\$1.09 + \$1.29 + \$1.29 + \$1.35 + \$1.39 + \$1.49 + \$1.59 + \$1.79}{8} = \$1.41 \]
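The same calculation can be sketched in Python. This is only an illustration of the formula above; the helper name `mean` and the variable names are assumptions, not part of the course material.

```python
# Prices of the PR energy bar at the eight stores, in dollars.
prices = [1.09, 1.29, 1.29, 1.35, 1.39, 1.49, 1.59, 1.79]


def mean(values):
    """Return the sum of all values divided by the number of values."""
    return sum(values) / len(values)


mean_price = mean(prices)
print(f"mean = ${mean_price:.2f}")  # prints: mean = $1.41
```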


Example: Octane Rating

n = 40

87.4, 88.4, 88.7, 88.9, 89.3, 89.3, 89.6, 89.7
89.8, 89.8, 89.9, 90.0, 90.1, 90.3, 90.4, 90.4
90.4, 90.5, 90.6, 90.7, 91.0, 91.1, 91.1, 91.2
91.2, 91.6, 91.6, 91.8, 91.8, 92.2, 92.2, 92.2
92.3, 92.6, 92.7, 92.7, 93.0, 93.3, 93.7, 94.4


Example: Octane Rating

Technical Note (short hand formula for the mean):

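Let \(x_1, x_2, \ldots, x_n\) represent n values. Then

\[ \text{mean} = \bar{x} = \frac{\text{Total sum}}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

In the octane example, \(x_1 = 87.4\), \(x_2 = 88.4\), etc., and \(n = 40\):

\[ \text{mean} = \bar{x} = \frac{\text{Total sum}}{n} = \frac{3637.9}{40} = 90.95 \]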

Rounding Rule for Statistical Calculations

State your answers with one more decimal place of precision than is found in the raw data.

Example: The mean of 2, 3, and 5 is 3.3333 . . . , which we round to 3.3. Because the raw data are whole numbers, we round to the nearest tenth.

As always, round only the final answer and not any intermediate values used in your calculations.
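One way to apply the rounding rule in Python is sketched below. The function name `rounded_mean` is hypothetical, the raw data are passed as strings so that their decimal places can be counted, and rounding ties half up is an assumption (the rule above does not say how to break ties).

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_mean(raw_values):
    """Mean of values given as strings, rounded to one more decimal
    place than the most precise raw value (ties rounded half up)."""
    values = [Decimal(v) for v in raw_values]
    # Decimal places in the raw data: an exponent of -2 means two places.
    places = max(max(-v.as_tuple().exponent, 0) for v in values)
    # Compute the exact mean first; round only this final answer.
    exact = sum(values) / len(values)
    quantum = Decimal(10) ** -(places + 1)  # 0.1 for whole-number data
    return exact.quantize(quantum, rounding=ROUND_HALF_UP)


print(rounded_mean(["2", "3", "5"]))  # 3.3
print(rounded_mean(["1.09", "1.29", "1.29", "1.35",
                    "1.39", "1.49", "1.59", "1.79"]))  # 1.410
```

Because the energy bar prices have two decimal places, the second call reports three.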


Mean as a measure of center

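• A histogram of octane rating would balance at the position of its mean.

[Figure: histogram of octane ratings (vertical axis: Count; horizontal axis from 85 to 95), with the balance point marked at the mean, 90.95]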

Mean as a measure of center

• A histogram, in general, will balance at the position of its mean.

Effects of outliers on mean

• Consider the following ordered data set:

110, 111, 116, 117, 118, 122, 123, 125, 126, 175

\[ \bar{x} = \frac{1243}{10} = 124.3 \]

• It turns out that the 175 was incorrectly written down and was actually 135. Then

\[ \bar{x}_{\text{correct}} = \frac{1203}{10} = 120.3 \]

• Did the mistake change the mean very much?


Effects of outliers on mean

• Opinions differ on what constitutes a large change, but in general, outliers can greatly affect the mean.

Definition

An outlier in a data set is a value that is much higher or much lower than almost all others.
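The effect of the single mis-recorded value on the mean can be checked with a short Python sketch. The variable names are assumptions chosen for illustration; the data are the ten values from the example above.

```python
# The data set as originally recorded, with 175 as the last value.
recorded = [110, 111, 116, 117, 118, 122, 123, 125, 126, 175]
# The corrected data set: the last value should have been 135.
corrected = recorded[:-1] + [135]

mean_recorded = sum(recorded) / len(recorded)     # 1243 / 10
mean_corrected = sum(corrected) / len(corrected)  # 1203 / 10

print(f"mean with 175: {mean_recorded:.1f}")
print(f"mean with 135: {mean_corrected:.1f}")
print(f"shift caused by one value: {mean_recorded - mean_corrected:.1f}")
```

Changing one of ten values by 40 shifts the mean by 40/10 = 4 units.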


Effects of outliers on mean

• Here is a visual along the real number line that shows the obvious outlier:

[Figure: the ten data values plotted on a number line (axis label: value, from 110 to 170), with 175 far to the right of the rest]

• More on this topic in part 2…

Weighted Mean

• Sometimes we weight certain data points more heavily than others in computing a mean.
• For instance, final grades are often computed using a weighted mean.

Definition

A weighted mean accounts for variations in the relative importance of data values. Each data value is assigned a weight, and the weighted mean is

\[ \text{weighted mean} = \frac{\text{sum of (each data value} \times \text{its weight)}}{\text{sum of all weights}} \]


Example: Weighted average, Final grades

Category     Weight   Data Value
HW           0.10     95
SchQuiz      0.10     80
UnschQuiz    0.05     65
Discussion   0.05     70
Exams        0.70     78


Example: Weighted average, Final grades

Category     Weight   Data Value   Data Value × Weight
HW           0.10     95            9.50
SchQuiz      0.10     80            8.00
UnschQuiz    0.05     65            3.25
Discussion   0.05     70            3.50
Exams        0.70     78           54.60
Total        1.00                  78.85

\[ \text{weighted mean} = \frac{\text{sum of (each data value} \times \text{its weight)}}{\text{sum of all weights}} = \frac{78.85}{1.00} = 78.85 \]
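The final-grade calculation can be sketched in Python as follows. The function name `weighted_mean` and the dictionary layout are assumptions made for illustration; the weights and scores are those of the example.

```python
def weighted_mean(values, weights):
    """Sum of (each value x its weight) divided by the sum of all weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# category: (weight, data value)
grades = {
    "HW": (0.10, 95),
    "SchQuiz": (0.10, 80),
    "UnschQuiz": (0.05, 65),
    "Discussion": (0.05, 70),
    "Exams": (0.70, 78),
}

weights = [w for w, _ in grades.values()]
scores = [s for _, s in grades.values()]
final_grade = weighted_mean(scores, weights)
print(f"final grade = {final_grade:.2f}")  # prints: final grade = 78.85
```

Dividing by the sum of the weights means the function also works when the weights do not add up to 1, for example when they are given as percentages.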

What about other measures of center?

• Median

• Mode


Median

• A value that divides the data into a lower half and an upper half.
• About half the data values are greater than the median and about half are less than the median.
• Perhaps a better measure of center than the mean for skewed distributions.

More on this later.

Example:

Eight grocery stores sell the PR energy bar for the following prices: $1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79

Find the median of these prices.


Solution: To find the median, we first sort the data in ascending order:

$1.09  $1.29  $1.29   |   $1.35  $1.39   |   $1.49  $1.59  $1.79
3 values below        |  2 middle values |   3 values above

Because there are eight prices (an even number), there are two values in the middle of the list: $1.35 and $1.39. Therefore the median lies halfway between these two values, which we calculate by adding them and dividing by 2:

\[ \text{median} = \frac{\$1.35 + \$1.39}{2} = \$1.37 \]

Using the rounding rule, we could express the median as $1.370.

Copyright © 2009 Pearson Education, Inc.

Median

n is the number of observations in the data set.

• When n is odd:
  - The median IS one of the observations.

    25, 47, 55, 78, 110 (55 is the median for the 5 data points)

• When n is even:
  - The median is BETWEEN the two observations closest to the middle.
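The odd and even cases can be sketched as a small Python function. The name `median` and the use of the midpoint of the two middle observations for even n (as in the energy bar solution) are the only assumptions here.

```python
def median(values):
    """Middle observation if n is odd; midpoint of the two middle
    observations if n is even."""
    ordered = sorted(values)  # the median is defined on the sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: the median IS an observation
    # n even: the median lies BETWEEN the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([25, 47, 55, 78, 110]))  # 55

prices = [1.09, 1.29, 1.29, 1.35, 1.39, 1.49, 1.59, 1.79]
print(f"${median(prices):.2f}")  # $1.37
```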


Median (shown in a stem-n-leaf plot)

n = 12

36  | 2
36* | 5678899
37  | 003
37* | 8

Median = (368 + 369)/2 = 368.5 grams
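To see where 368 and 369 come from, the stem-and-leaf plot can be expanded back into its individual values. The sketch below assumes each leaf is a single final digit attached to its stem (so stem 36 with leaf 2 means 362 grams); the variable names are illustrative only.

```python
# Each row of the plot as (stem, leaves). Stems 36 and 37 each appear
# twice because the plot splits them into two rows (the * rows).
rows = [(36, "2"), (36, "5678899"), (37, "003"), (37, "8")]

# Rebuild the observations: stem * 10 + leaf, in grams.
grams = sorted(stem * 10 + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(grams)

# n is even, so the median is halfway between the two middle values.
lower_middle, upper_middle = grams[n // 2 - 1], grams[n // 2]
median_grams = (lower_middle + upper_middle) / 2

print(n, lower_middle, upper_middle, median_grams)  # 12 368 369 368.5
```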


Recall the earlier example…

• Consider the following ordered data set:

110, 111, 116, 117, 118, 122, 123, 125, 126, 175

\[ \bar{x} = \frac{1243}{10} = 124.3 \]

• It turns out that the 175 was incorrectly written down and was actually 135. Then

\[ \bar{x}_{\text{correct}} = \frac{1203}{10} = 120.3 \]

• Did the mistake change the mean very much?

Effects of outliers on median

• How does it change the median?

110, 111, 116, 117, 118, 122, 123, 125, 126, 175

Median = (118+122)/2=120

110, 111, 116, 117, 118, 122, 123, 125, 126, 135

Median = (118+122)/2=120

• It didn’t change at all!
• Outliers can have a large effect on the mean, but the median is not greatly affected by outliers.
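The comparison can be reproduced with Python's standard `statistics` module, which provides `mean` and `median`. This is a sketch for checking the numbers in the example; the variable names are assumptions.

```python
import statistics

recorded = [110, 111, 116, 117, 118, 122, 123, 125, 126, 175]
corrected = [110, 111, 116, 117, 118, 122, 123, 125, 126, 135]

for label, data in [("with 175", recorded), ("with 135", corrected)]:
    # The mean uses every value, so the outlier pulls it upward.
    # The median uses only the middle values (118 and 122 here).
    print(f"{label}: mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data):.1f}")
```

Only the mean differs between the two lines of output; the median is 120 in both cases.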

Mean or Median?

• The mean is the balance point of the distribution.

• The median divides the distribution into a lower and an upper half.

[Figure: histogram of bird count (horizontal axis: bird count, 0 to 15; vertical axis: Frequency, 0 to 1200). Median = 2.88, Mean = 3.48]


Mode

• The mode is another way to find a representative value in a data set or distribution.

• The mode is the most common value in a data set.
  - If two values are ‘most common’, then we say the distribution is bimodal.

Example:

Eight grocery stores sell the PR energy bar for the following prices: $1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79

Find the mode of these prices.

Solution:

The mode is $1.29 because this price occurs more times than any other price.
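A minimal Python sketch of finding the mode is shown below. The function name `modes` is hypothetical; it returns every value tied for most common, so a bimodal data set yields two values. The second data set is made up purely to illustrate the bimodal case.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order.
    Raises ValueError if the data set is empty."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


prices = [1.09, 1.29, 1.29, 1.35, 1.39, 1.49, 1.59, 1.79]
print(modes(prices))  # [1.29]

# Made-up data in which two values tie for most common (bimodal).
print(modes([1, 1, 2, 3, 3]))  # [1, 3]
```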
