§2: Using a calculator to find measures of central tendency and measures of dispersion around a mean

NB Participants must have a scientific calculator for this workshop. In these notes a CASIO fx-82ES is used.

The material in this workshop covers some of the aspects of the following Core Assessment Standards:

AS 10.4.1 (a) Collect, organise and interpret univariate numerical data in order to determine measures of central tendency of ungrouped data.

AS 11.4.1 (a) Calculate and represent measures of central tendency and dispersion in univariate data by calculating the variance and standard deviation of sets of data manually (for small sets of data) and using available technology (for larger sets of data), and representing results graphically using histograms and frequency polygons.

USING A CALCULATOR TO FIND THE MEAN

Before you can use the calculator to work out the mean of a data set you must first get it into statistical (STAT) mode. Obviously you can find the mean of a data set on a calculator without going into the STAT mode but it is easier to use this mode as it eliminates mistakes made when working out the sum of the data set.

1) Use the [MODE] key when you want to perform statistical calculations. To get into the STAT MODE, press [MODE] [2:STAT]

2) The following Statistical Calculation Types then appear on the display:

1: 1-VAR (Single variable)
2: A + BX (Linear regression)
3: __ + CX^2 (Quadratic regression)
4: ln X (Logarithmic regression)
5: e^X (e exponential regression)
6: A.B^X (ab exponential regression)
7: A.X^B (Power regression)
8: 1/X (Inverse regression)

Press [1: 1-VAR], and the following STAT Editor Screen will appear on the display:

       X
1
2
3

Values are entered by typing the number and pressing [=].

Written by Jackie Scheiber and Meg Dickson © RADMASTE Centre, University of the Witwatersrand – May 2007 §2: Mean, variance and standard deviation FET Data Handling

3) The STAT Calculation Screen is used for performing statistical calculations with the data you input with the STAT editor screen. Pressing the [AC] key while the STAT editor screen is displayed switches to the STAT calculation screen.

4) While the STAT editor screen or STAT calculation screen is on the display, press [SHIFT] [1] (STAT) to display the STAT menu. The following appears on the display:

Menu item     Used when you want to:
1: Type       Display the Statistical Calculation Type
2: Data       Display the STAT editor screen
3: Edit       Display the Edit sub-menu for editing the STAT editor screen
4: Sum        Display the Sum sub-menu of commands for calculating sums
5: Var        Display the Var sub-menu of commands for calculating the mean, standard deviation, etc.
6: MinMax     Display the MinMax sub-menu of commands for obtaining maximum and minimum values

Activity 1

1) Thandi’s marks at the end of the first term are:

English 63%; Biology 31%; History 25%; Geography 63%; Maths 57%; Technology 37%; Zulu 77%

Use your calculator to find her mean mark (average of her marks for the term).

a) Get into the Stats mode by entering [MODE] [2: STAT]

b) Enter [1: 1-VAR]

c) Enter these marks into the calculator: 63 [=] 31 [=]………. Look at the display each time. What does it show?

d) After entering all the items press [AC]. What does the display show?

e) To work out the mean press [SHIFT] [1] (STAT) [5: VAR]. Then press [2: x̄] [=]. What does the display show?

f) To find out how many data items you entered press: [SHIFT] [1] (STAT) [5: VAR] [1: n] [=]. What does the display show?


2) The height and mass of five sprinters in the school’s athletics team are given in the table below. In order to develop a suitable training diet for the athletes you need to know the average height and average mass of the sprinters. Work this out.

Height in cm    170    190    185    178    188
Mass in kg       85     91     74     68     82

3) The table below shows the results of anonymous HIV surveys of women at antenatal clinics taken since 1990 by the Department of Health in South Africa. Figures given are percentage estimated HIV infection.

        W. Cape   E. Cape   N. Cape   Free State   KZN     Mpum.   Limpopo   Gauteng   North West
1996    3,09      8,10      6,47      17,49        19,90   15,77   7,96      15,49     25,13
1997    6,29      12,61     8,63      19,57        26,92   22,55   8,20      17,10     18,10
1998    5,20      15,90     9,90      22,80        32,50   30,02   11,50     22,50     21,30

Taking into consideration that the survey was limited to women of childbearing age, estimates reflect only 15-49 year-olds. (This means that in 1996, 3,09% of women at the antenatal clinics in the Western Cape were HIV positive.)

a) Find the mean percentage of HIV infected women in South Africa in 1996, 1997 and 1998. (Give answers correct to 2 decimal places)

b) What does the average tell you about HIV in South Africa?

4) The table below shows the area of each of the 9 provinces in South Africa in km²:

W. Cape   E. Cape   N. Cape   Free State   KZN      Mpum.    Limpopo   Gauteng   North West
129 370   169 580   361 830   129 480      92 100   79 490   123 910   17 010    116 320

a) Find the average area of the provinces.

b) Does this figure have any meaning? Why/why not?
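If a computer is available, the calculator answer for question 1 can be cross-checked with a few lines of Python. This is only a sketch: Python is not part of the original workshop, and the function name `mean_of` is made up for illustration. The data are Thandi’s marks from question 1.

```python
# Thandi's end-of-term marks (in %), taken from Activity 1, question 1.
marks = [63, 31, 25, 63, 57, 37, 77]


def mean_of(data):
    """Return the mean: the sum of the data items divided by how many there are."""
    return sum(data) / len(data)


n = len(marks)         # corresponds to n on the calculator
total = sum(marks)     # the sum of the data items
mean = mean_of(marks)  # corresponds to the mean on the calculator

print("n =", n)
print("sum =", total)
print("mean =", round(mean, 2))
```

The same function can be applied to any other list in this activity, for example the heights or the masses of the sprinters in question 2.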


MEASURES OF DISPERSION

Measures of central tendency (or averages) are very important as they can give a picture of the group that they represent. However, taken by themselves, they give a very limited view of the whole picture. As well as the average, you need to know how the rest of the data is grouped around the average – whether it is closely grouped or scattered more widely. You need to consider a MEASURE OF THE SPREAD or DISPERSION of data items around the middle values.

You can find a measure of dispersion around a mean or a median. In this workshop we consider dispersion around a mean.

1) VARIANCE

The mean is the balance point of a distribution of data. There are various ways you can measure the spread of a distribution around its mean. A measure that gives an idea of the spread of a data set is the deviation from the mean – i.e. how far away from the mean each data item is.

• The differences from the mean are written $(X - \bar{X})$ or $(x - \bar{x})$, where $x$ is an element of the set of data and $\bar{x}$ is the mean of the set of data.
• The variance is the average of the squares of the deviations of each data item from the mean.

Note:
• This measure of spread takes into account all data items.
• It is a measure of the variability of the data items.
• If the value of the variance is large, then the data items are widely spread. If the value of the variance is small, the data items are closely clustered around the mean.

Activity 2

Suppose two men, Fred and Sipho, each have three sisters.
• The ages of Fred’s sisters (rounded off to the nearest year) are: 22 years, 17 years and 21 years.
• The ages of Sipho’s sisters (rounded off to the nearest year) are: 10 years, 12 years and 38 years.

1) Calculate

a) The mean age of Fred’s sisters

b) The mean age of Sipho’s sisters


2) Work out the deviations from the mean of each data item by completing the table below:

Fred’s sisters
age      deviation          deviation²
$x$      $(x - \bar{x})$    $(x - \bar{x})^2$
22
17
21
Total = $\sum (x - \bar{x})^2$ =

Sipho’s sisters
age      deviation          deviation²
$x$      $(x - \bar{x})$    $(x - \bar{x})^2$
10
12
38
Total = $\sum (x - \bar{x})^2$ =

3) Find the mean (average) of squared deviations. This is called the variance.

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$

where $n$ = the number of terms

Fred’s sisters, variance = ...... / 3 ≈ ...... (2 decimal places)

Sipho’s sisters, variance = ...... / 3 ≈ ...... (2 decimal places)

As you can see these numbers bear no real relation to the ages of the sisters in each case and it is sometimes a little difficult to understand what the variance is saying about a data set.
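The steps of Activity 2 (deviations, squared deviations, then their average) can also be sketched in Python. This is an illustrative sketch only: Python is not used in the original workshop, and the function name `variance` is made up here. It divides by n, as in the formula above.

```python
def variance(data):
    """Variance as defined in these notes: the mean of the squared deviations."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / len(data)   # divide by n


fred = [22, 17, 21]    # ages of Fred's sisters
sipho = [10, 12, 38]   # ages of Sipho's sisters

for name, ages in [("Fred", fred), ("Sipho", sipho)]:
    mean = sum(ages) / len(ages)
    print(name, "mean =", mean)
    for x in ages:
        # one row of the table: age, deviation, squared deviation
        print(x, x - mean, (x - mean) ** 2)
    print("variance =", round(variance(ages), 2))
```

Both sets of sisters have the same mean age, but the variances differ greatly, which is the point of the activity.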

2) STANDARD DEVIATION

The most common measure of dispersion is the standard deviation. It is simply the square root of the variance and is usually represented by the letter s or σ (lower case “sigma”).

To find the standard deviation we use the formula:

$$\text{Standard deviation} = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

For Fred’s sisters, standard deviation = $\sqrt{4{,}66666\ldots} \approx 2{,}16$ years.
And for Sipho’s sisters, the standard deviation = $\sqrt{162{,}66666\ldots} \approx 12{,}75$ years.

Note:
• The standard deviation has the same units as the data items and as the mean.


• A small standard deviation tells you that the data items are closely clustered around the mean, while a large standard deviation tells you that the items are more spread out.
• The standard deviation is the most commonly used measure of dispersion.
• Although it looks as though the standard deviation is complicated to calculate, it really takes little time and is very easy if you use a calculator or a computer spreadsheet.
• The standard deviation can be used to compare different sets of data.
• Two versions of the formula for standard deviation are used in statistics: one divides by n (as in the formula above) and the other divides by (n − 1). Dividing by (n − 1) gives a slightly larger value for the standard deviation, and this version works better when the data are a sample used to make inferences about a larger population.
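The two versions of the formula mentioned in the last note can be compared with Python’s standard `statistics` module. This is a sketch, not part of the original workshop: `statistics.pstdev` divides by n (the version used in these notes), while `statistics.stdev` divides by (n − 1). The data are the sisters’ ages from Activity 2.

```python
import statistics

fred = [22, 17, 21]    # ages of Fred's sisters (Activity 2)
sipho = [10, 12, 38]   # ages of Sipho's sisters (Activity 2)

for name, ages in [("Fred", fred), ("Sipho", sipho)]:
    divide_by_n = statistics.pstdev(ages)          # divides by n
    divide_by_n_minus_1 = statistics.stdev(ages)   # divides by (n - 1)
    print(name, round(divide_by_n, 2), round(divide_by_n_minus_1, 2))
```

For each set of ages the second value is the larger one, as the note says. With only three data items the difference is quite noticeable; it shrinks as the number of data items grows.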

Activity 3

The table below shows the average longevity (in years) of domesticated animals:

Animal                  Cat   Cow   Dog   Donkey   Goat   Guinea pig   Horse   Pig   Rabbit   Sheep
Longevity (in years)    12    15    12    12       8      4            20      10    5        12

1) Find the mean age of the animals listed in the table.

2) Calculate the squares of the deviations from the mean by filling in the following table:

Age      Deviation from the mean    (Deviation)²
$x$      $(x - \bar{x})$            $(x - \bar{x})^2$
12
15
12
12
8
4
20
10
5
12
Total = $\sum (x - \bar{x})^2$ =


3) Find the standard deviation of the data by substituting into the following formula:

$$\text{Standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

4) What do you think the mean and the standard deviation tell you about the data?

3) USING THE CALCULATOR TO FIND THE STANDARD DEVIATION

It is easy to use a calculator to find the standard deviation. Remember to first get into STAT mode on the calculator.

To work out the standard deviation on the calculator, enter the data, press [AC], and then enter:

[SHIFT] [1] (STAT) [5: VAR] [3: xσn] [=]

Look again at Fred’s and Sipho’s sisters.
Fred’s sisters’ ages are: 22; 17 and 21
Sipho’s sisters’ ages are: 10; 12 and 38

To work out the Standard Deviation for Fred’s sisters, press the following keys:
[MODE] [2: STAT] [1: 1-VAR] 22 [=] 17 [=] 21 [=] [AC] [SHIFT] [1] (STAT) [5: VAR] [3: xσn] [=]

You should find that the standard deviation of the ages of Fred’s sisters = 2,16

Similarly you should find the standard deviation of ages of Sipho’s sisters = 12,75

What do these values tell you about the spread of the data?


Activity 4

1) The table shows the lowest temperatures ever recorded in 9 cities around the world.

City            Country         Temperature (°C)
Addis Ababa     Ethiopia        0
Algiers         Algeria         0
Bangkok         Thailand        10
Johannesburg    South Africa    −8
Madrid          Spain           −10
Nairobi         Kenya           5
Sao Paulo       Brazil          0
Warsaw          Poland          −30
Washington      USA             −26

a) Use your calculator to find i) the mean of the temperatures

ii) the standard deviation of the temperatures.

b) What does the standard deviation tell you about the data?

c) What does this information tell you about Johannesburg’s temperature?

2) The figures below show the life expectancy at birth in countries belonging to the Southern African Development Community (SADC)

Country             Life expectancy at birth 1995 - 2000
Angola              45,2
Botswana            40,3
Dem Rep of Congo    51,3
Lesotho             45,7
Malawi              40,0
Mauritius           71,3
Mozambique          39,3
Namibia             44,7
Seychelles          72,7
South Africa        52,1
Swaziland           44,4
Tanzania            51,1
Zambia              41,4
Zimbabwe            42,9

a) Use the calculator to find the mean of the data.

b) Use the calculator to find the standard deviation of the data.

c) What does this tell you about the data?

d) Why do you think there is such a spread in the life expectancy?


4) THE STANDARD DEVIATION AND THE MEAN

Measures of central tendency and spread of a data set help you describe the data more fully. The measures usually combined are either the mean and the standard deviation, or the median and the quartiles. The right choice of summaries depends on the ‘shape’ of the distribution.

The standard deviation and the mean together provide a measure of variability within a single data set (for example, by finding the intervals that lie one, two or three standard deviations on either side of the mean and counting how many data items fall within each interval), as well as a way of contrasting two data sets. They give a way of characterising distributions of data.

This first diagram shows frequency polygons of three distributions having the same mean and varying standard deviations. The second diagram shows three distributions having the same standard deviation and different means:

[Hodge,S & Seed, M (1972) Statistics and probability Blackie & Sons, Glasgow, page 78]

Activity 5

The maths marks of two learners in your class are given below.

Jabu       20   16   10    3   12   10   11   14    5   19
Mmatsie    13   12   11   13   13   11   12   12   11   12

1) Find the mean of each data set

2) Find the standard deviation of each data set

3) The headmaster asks you to write a report comparing the progress of the two learners. Using measures of central tendency and measures of dispersion, what can you say about the learners’ work?
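To check the calculator results for Activity 5, the two sets of marks can be summarised side by side in Python. This is a sketch under stated assumptions: Python is not part of the original workshop, the helper name `summarise` is made up for illustration, and the standard deviation divides by n, as in these notes.

```python
import statistics

# Maths marks from Activity 5.
jabu = [20, 16, 10, 3, 12, 10, 11, 14, 5, 19]
mmatsie = [13, 12, 11, 13, 13, 11, 12, 12, 11, 12]


def summarise(marks):
    """Return (mean, standard deviation), dividing by n as in the notes."""
    return statistics.mean(marks), statistics.pstdev(marks)


for name, marks in [("Jabu", jabu), ("Mmatsie", mmatsie)]:
    mean, sd = summarise(marks)
    print(f"{name}: mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

Printing the mean and the standard deviation together makes it easy to see whether two learners with similar averages are equally consistent.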

5) USING THE STANDARD DEVIATION TO REACH CONCLUSIONS:


Provided that the sample size is reasonably large and the data is not too skewed (that is, it does not have some very large or very small values), it is possible to make the following approximate statements:
• About 68% of the individual observations will lie within one standard deviation of the mean.
• For most data sets, about 95% of the individual observations will lie within two standard deviations of the mean.
• Almost all of the data will lie within three standard deviations of the mean.

ACTIVITY 6

The office manager of a small office wants to get an idea of the number of phone calls made by the people working in the office during a typical day in one week in June. The number of calls on each day of the (5-day) week is recorded. They are as follows:

Monday – 15; Tuesday – 23; Wednesday – 19; Thursday – 31; Friday – 22

1) Determine a) the mean number of phone calls per day

b) the standard deviation (correct to 1 decimal place).

2) On what percentage of the days is the number of calls within one Standard Deviation of the mean?

One Standard Deviation from the mean is:

$\bar{x} \pm \sigma$ = ………………………………

So the interval is (…… − …… ; …… + ……) = ……………………

The phone calls on …………………………………………………………………………. fall within the interval.

…… / …… × 100% = ……………………

So the number of phone calls on ………. of the days lies within one Standard Deviation of the mean.
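The counting step in question 2 can be sketched in Python as a check. This is an illustrative sketch only: Python is not part of the original workshop, the function name `percent_within` is made up here, and the standard deviation divides by n, as in these notes.

```python
import statistics


def percent_within(data, k=1):
    """Percentage of data items within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)              # divides by n, as in the notes
    low, high = mean - k * sd, mean + k * sd  # the interval around the mean
    inside = [x for x in data if low <= x <= high]
    return 100 * len(inside) / len(data)


calls = [15, 23, 19, 31, 22]   # Monday to Friday, from Activity 6

for k in (1, 2, 3):
    print(k, "standard deviation(s):", percent_within(calls, k), "%")
```

With only five data items the percentages should not be expected to match the approximate statements above closely, since those assume a reasonably large sample.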
