
Unit C5 Review The Open University Mathematics/Social Sciences/Science/Technology An inter-faculty second level course MDST242 Statistics in Society

Block C in context

Unit C5 Review Prepared by the course team

The Open University, Walton Hall, Milton Keynes MK7 6AA. First published 1983. Reprinted 1986, 1990, 1993, 1997, 1998. Copyright © 1983 The Open University. All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publisher. Designed by the Graphic Design Group of the Open University. Printed in Great Britain at The Alden Press, Oxford.

This text forms part of an Open University course. The complete list of units in the course appears at the end of this text. For general availability of supporting material referred to in this text, please write to Open University Educational Enterprises Limited, 12 Cofferidge Close, Stony Stratford, Milton Keynes MK11 1BY, Great Britain. Further information on Open University courses may be obtained from the Admissions Office, The Open University, P.O. Box 48, Walton Hall, Milton Keynes MK7 6AB.

Contents

Introduction

1 Block review
1.1 Statistics in experiments
1.2 Analysing experimental data
1.3 The t-tests
1.4 Statistics in decision-making

2 Experiments and energy: the Whitburn project
2.1 Designing the experiment
2.2 Some data
2.3 Learning from the Whitburn experience

3 Experiments and energy: the Milton Keynes project
3.1 Pennyland revisited
3.2 Some data
3.3 Pennyland and Great Linford: relationships between variables
3.4 Why experiment?

Acknowledgements

Introduction

This unit is designed to help you revise and consolidate what you have learnt in the rest of the block, and indeed the rest of the course. Section 1 contains a review of the block, similar to those in Units A5 and B5. You should aim to spend about 1½-2 hours reading through this summary, referring back to earlier units where necessary. The remaining two sections of the unit are based round the television programmes, TV7 and TV8. Both of these programmes are concerned with experiments which investigate ways of saving energy in the home. You should plan your study of these sections to fit round the television programmes. Before watching TV7 you should work through Section 2.1 and part of Section 2.2. Before watching TV8 you should complete all your work on Section 2 and work through Sections 3.1 and 3.2.

1 Block review

The purpose of this section is to remind you of the statistical ideas introduced in Block C, and to give you an opportunity to put them into perspective. This should help you to organize your thoughts before going on to apply these ideas in Sections 2 and 3, and starting your revision for the examination. We shall not give full details of all the techniques here, nor shall we mention every technical term or phrase from the block. Fuller details of the important terms and techniques are in the Handbook. The block has concentrated on two basic areas where statistical ideas are used: in experimentation and in decision-making, both public and private.

1.1 Statistics in experiments

What are experiments, and what has statistics got to do with them? Broadly speaking, an experiment involves making specific observations under specific conditions in order to answer specific questions; a scientific experiment's methods and results should also be open to public scrutiny and verification (C2: 1.1-2). There are many kinds of experiments (C2: 1.2), including

• exploratory (Baconian) experiments, which aim to answer questions such as 'What happens if ...?';
• measurement experiments;
• hypothesis-testing experiments, which aim to test a specific hypothesis, often one about the cause of a phenomenon.

In this course we have concentrated on the third type of experiment. It is not always straightforward to classify an experiment into one of these three types. Some experiments have elements of more than one of the types.

Statistics is relevant to experimentation chiefly because experimental observations are variable (C1: 2.3, C2: 2.6 and 4.7). In a typical experiment, measurements are made on several experimental units, which may be individual people, or plants, or households, or almost anything else. These measurements will vary from one experimental unit to the next, and there may be many different sources of this variability. Statistical techniques can help to deal with variability in two ways.

• Statistics provides ways of analysing the experimental results so that the different sources of variability can be disentangled as far as possible (C1: 4 and C2: 4.7).
• Statistics provides ways of designing the experiment which ensure that the effects of unknown sources of variability are minimized (C1: 3).

The statistical design and statistical analysis of an experiment go hand in hand. In designing an experiment you must ensure that the observations are made in such a way that it is possible for the statistical analysis of the results to answer the questions in which you are interested. Figure 1.1 shows the modelling diagram which we have used throughout the course.
Experiments probably bring the second stage (collect data) to mind, but in fact the use of statistics in experimentation shows the whole of the modelling process in action. An experiment is carried out to investigate a clearly stated question. Data is collected and analysed, conclusions are drawn, and on the basis of these results new questions may arise and new experiments may be performed. In this block we have shown how earlier stages in the process are affected by what might happen at later stages. In designing an experiment, before you collect any data you must consider how you are going to analyse the data you collect. You must design the experiment so that the results are not obscured by irrelevant sources of variation; i.e. you must collect the data in such a way that the results can be interpreted in a manner pertinent to the question you have asked. In short, in designing experiments you must look at the statistical modelling process as a whole. Figure 1.1 Modelling diagram

[Diagram of the modelling cycle. Labels on the diagram: CLARIFY, COLLECT, ANALYSE, INFER, INTERPRET.]

Most of the experiments described in the block have investigated the effect of a particular treatment of some sort on people, animals, plants or some other kind of experimental units (C1: 2.1). This investigation usually involved comparing experimental units which had been exposed to the experimental treatment (the experimental group) with other experimental units (the control group) which had not been exposed to the treatment. The types of design described in the block differed mainly in the nature of the control group involved. The following three types of design were covered in detail.

Crossover design
Each experimental unit acts as its own control: two measurements are made on each unit, one under the experimental treatment and one under the control treatment (C1: 3.1). This design avoids the effect of variability between experimental units but it cannot be used if either of the treatments irreversibly alters the experimental unit or if it is not possible to make two measurements on one unit. It is usually important to ensure that half the experimental units receive the experimental treatment first, whilst half receive the control treatment first.

Matched-pairs design
Each unit in the experimental group is matched with a similar unit in the control group (C1: 3.2). If this can be done effectively then this type of design avoids much of the effect of unwanted variability between experimental units. Unfortunately it is often very difficult to achieve a good matching.

Group-comparative design
Experimental units are allocated at random to either the experimental group or the control group (C1: 3.3). Unwanted variability between units is not eliminated.

In order to reduce the impact of unwanted types of variability between experimental units in such designs, it is important to ensure that the experimental group and control group are dealt with identically (apart, of course, from the experimental manipulation) (C1: 2.1 and C2: 2.6). If this is done then any observed difference between the groups cannot be attributed to systematic differences in the circumstances under which the observations were made. Unfortunately it is not usually possible to treat the experimental and control groups completely identically, so there is always a danger that some differences remain, and these may introduce bias into the results.

Another potential source of bias, both in group-comparative design experiments and (despite the matching) in matched-pairs design experiments, occurs when the experimental treatment and the control treatment are applied to experimental units which are not sufficiently similar. If the variability between units acts in such a way that there is a systematic difference between those receiving the experimental treatment and those receiving the control treatment, then the results will be subject to bias.

One can never be sure that this kind of bias does not exist but one way to try to avoid it is to use randomization (C1: 3.4). For example, in an experiment with a group-comparative design, the units should be allocated to the experimental and control groups at random. This can be done in several ways; e.g. by using a random number table. The randomization ensures that units with any particular feature which might be a source of variability will be likely to be represented more or less equally in both groups. Hence it is probable that the only major difference between the two groups will be the difference, if any, between the two treatments whose effect is being measured. If the analysis of the results indicates that the two groups do differ then you can be fairly confident that the difference is due to a difference between the treatments rather than to an unwanted systematic difference between the individuals in the two groups.

Certain procedures, such as the prospective and retrospective epidemiological studies in Unit C4, also involve comparing two groups of individuals (C4: 2.2). However, although the individuals in the two groups differ (e.g. those in one group may smoke whilst those in the other do not) they are not allocated to the groups at random and so, even if a difference between the groups is found, the possibility remains that it is due to some cause other than that under investigation. Studies of this nature are sometimes called pseudo-experiments (or quasi-experiments).
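Random allocation in a group-comparative design can be sketched in a few lines of Python. This is only an illustration: the house labels, the seed and the function name allocate_at_random are assumptions made for the example, and a pseudo-random number generator stands in for the random number table mentioned above.

```python
import random


def allocate_at_random(units, seed=None):
    """Split the units into an experimental group and a control group at random.

    The units are shuffled and then cut in half, so every unit has the same
    chance of ending up in either group.
    """
    rng = random.Random(seed)      # a seed makes the allocation repeatable
    shuffled = list(units)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical experimental units: 40 houses identified only by a label.
houses = [f"house_{i:02d}" for i in range(1, 41)]
experimental, control = allocate_at_random(houses, seed=242)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

Because the allocation ignores every feature of the houses, any feature that might affect the measurements is likely, though not certain, to be spread fairly evenly across the two groups.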

1.2 Analysing experimental data

The kind of statistical analysis used on experimental results depends on the design of the experiment (C1: 4.1). Most of the experiments in this block were analysed using a hypothesis test of some sort. Unit B5 contained a chart (Figure 1.5) to help you decide which hypothesis test to use in a particular situation. Figure 1.2 is a new version of that chart which takes into account the additional techniques of hypothesis testing which you have learnt in this block. (Of course, the chart can also be used when the data has not been obtained from an experiment.)

Figure 1.2 Which test to use?

[Decision chart. Labels on the chart: type of data (categorical, ordinal, ...); two samples, not paired; sample size; χ² test, sign test, Mood's median test, Mann-Whitney test.]

Notes
1 All the tests given for use with ordinal data can also be applied to interval data.
2 The χ² test can also be applied to ordinal or interval data if suitable categories are defined.

The questions you should ask yourself when using the chart are as follows.

What type of data is involved, i.e. what type of measurements have been made: categorical, ordinal or interval scale? (C1: 4.3)
With categorical measurements, the person taking the measurement does no more than identify to which of several categories each item belongs. The categories need not be ordered in any way. With ordinal measurements, it is possible to arrange the measurements in a meaningful order. With measurements on an interval scale, it makes sense to compare differences between measurements. These are what you might think of as real measurements in definite units.

How many samples are there and, if there are two, are they paired? (C1: 4.4)
The chart treats data from a single sample and data from two paired samples together. This is because tests for paired samples (at least, those which we have described) involve first finding the difference between the two sample values in each pair and then applying a one-sample test to the resulting differences. Experiments with crossover and matched-pairs designs produce two paired samples of data; group-comparative design experiments produce two unpaired samples.

How large is the sample?
Whether the sample counts as large or small may depend on the test being considered. In the context of this course, the maximum sample size for the sign test, the Mann-Whitney test and the Wilcoxon matched-pairs test is determined by the corresponding table of critical values. It is possible to apply tests similar to these to larger samples but different tables must be used.
In choosing between the z- and t-tests, remember that z-tests should not generally be used if the sample size is below 25. However, t-tests can, and if appropriate should, be used even when the sample size is larger than this.

Figure 1.2 does not provide all the answers to the question of which test to use. Some of the points which it does not cover are as follows.

• Almost all of the tests require that the population(s) from which the sample(s) is drawn satisfies certain conditions. (See the 'When to be careful' entries in Section 1.3 of Unit B5 and Section 1.3 of this unit.)
• You still need to decide, for most of the tests, whether to use a one-sided or a two-sided test.
• All the tests given under the headings ordinal and interval are tests whose hypotheses refer to measures of level of the populations involved, i.e. to their medians or means. There are hypothesis tests for other aspects of populations (e.g. to test whether two populations have equal spread) but we have not been able to cover such tests in this course.

1.3 The t-tests

Some new hypothesis tests were introduced in Block C: the one-sample and two-sample t-tests. These tests can be applied only if the populations involved have an approximately Normal distribution. We shall now briefly remind you of some of the details of these tests.

One-sample t-test
This test is applied to a single sample of data in order to investigate the mean of a population (C2: 4.3). In this course we have considered only the two-sided version of this test, so the null and alternative hypotheses are

$H_0: \mu = A$
$H_1: \mu \neq A$

where $A$ is some value of interest to the researcher. Details of how to find the test statistic $t$, and how to decide whether it is in the critical region, are in Unit C2 and in the Handbook.

When to be careful
The following points should be borne in mind when using the one-sample t-test.

• The one-sample t-test can be applied only to data measured on an interval scale, when the population under investigation can be assumed to have an approximately Normal distribution (C2: 4.3).
• The table of critical values does not cover numbers of degrees of freedom larger than 40, which means that the test cannot be applied using this table if the sample size is larger than 41 (C2: 3.3). However, for samples larger than this the z-test can be used.

Matched-pairs t-test
This test is applied to two paired samples (A and B) of data, to investigate whether the means, $\mu_A$ and $\mu_B$, of the populations from which the samples were drawn are equal (C2: 4.4). The null and alternative hypotheses are

$H_0: \mu_A = \mu_B$
$H_1: \mu_A \neq \mu_B.$

The procedure involves replacing these hypotheses by

$H_0: \mu_D = 0$
$H_1: \mu_D \neq 0$

where $\mu_D$ is the population mean of the population of differences between the matched pairs. (Also, $\mu_D = \mu_A - \mu_B$.) If it can be assumed that this population of differences has a Normal distribution then, after calculating the differences between the two sample values in each pair, the one-sample t-test can be applied to this sample of differences.
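The reduction of the matched-pairs test to a one-sample test on the differences can be sketched as follows. The data are invented for illustration, and the function name matched_pairs_t is an assumption. The sketch uses the standard deviation with divisor n - 1 (Python's statistics.stdev); the formula in Unit C2 and the Handbook may be arranged differently, so treat this as one common form of the statistic rather than the course's own layout.

```python
import math
import statistics


def matched_pairs_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for the hypothesis that the mean difference is 0."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    # Step 1: replace the two paired samples by a single sample of differences.
    differences = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(differences)
    # Step 2: apply the one-sample t-test to the differences, with A = 0.
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)        # divisor n - 1
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Invented measurements on five matched pairs.
a = [12.1, 11.4, 13.0, 10.8, 12.6]
b = [11.5, 11.0, 12.1, 10.9, 11.8]
t, df = matched_pairs_t(a, b)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The value of t would then be compared with the critical value for the appropriate number of degrees of freedom in the table of critical values.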

Two-sample t-test
This test is applied to two unpaired samples of data to investigate whether the means $\mu_X$ and $\mu_Y$, of the populations from which the samples were drawn, are equal. The null and alternative hypotheses are

$H_0: \mu_X = \mu_Y$
$H_1: \mu_X \neq \mu_Y.$

Details of how to calculate the test statistic $t$, and how to decide whether it is in the critical region, are in Unit C2 and in the Handbook.

When to be careful
The following points should be borne in mind when using the two-sample t-test.

• The two-sample t-test can be applied only to data measured on an interval scale, when both populations involved can be assumed to have an approximately Normal distribution (C2: 3.3).
• It must be reasonable to assume that the standard deviations (and hence variances) of the two populations involved are equal (C2: 3.3).
• The table of critical values does not cover degrees of freedom above 40, which means that the test cannot be applied using this table if the sum of the sample sizes is greater than 42. For samples larger than this it is often possible to use a z-test. However, if the sample sizes are very unequal then a z-test may not be satisfactory.
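As a reminder of how the pieces of the two-sample t-test fit together, here is a minimal sketch in Python. It assumes the usual pooled-variance form of the statistic, in which the common variance is estimated by dividing the combined sum of squared deviations by n_x - 1 + n_y - 1; the data and the function name two_sample_t are invented for illustration, and the definitive procedure remains the one in Unit C2 and the Handbook.

```python
import math


def two_sample_t(xs, ys):
    """Return (t, degrees of freedom) for two unpaired samples,
    assuming both populations are Normal with equal standard deviations."""
    n_x, n_y = len(xs), len(ys)
    m_x = sum(xs) / n_x
    m_y = sum(ys) / n_y
    # Sums of squared deviations about each sample mean.
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    df = n_x - 1 + n_y - 1
    pooled_variance = (ss_x + ss_y) / df        # estimate of the common variance
    standard_error = math.sqrt(pooled_variance * (1 / n_x + 1 / n_y))
    return (m_x - m_y) / standard_error, df


# Invented data for two small unpaired samples.
xs = [10, 12, 14, 16]
ys = [9, 10, 11]
t, df = two_sample_t(xs, ys)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The degrees of freedom returned here show why the table of critical values limits the sum of the sample sizes to 42: with 40 degrees of freedom, n_x + n_y - 2 = 40.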

Confidence intervals
The ideas behind the one-sample t-test can be extended to provide a 95% confidence interval for the population mean of a Normally distributed population (C2: 4.6). Similarly, the ideas behind the two-sample t-test provide a confidence interval for the difference between the population means of two Normally distributed populations with equal standard deviations.

1.4 Statistics in decision-making

Throughout the course we have seen examples of the use of statistics to help decision-making. For example, in Unit A0 we saw how a house owner could use statistics to help decide whether to pay for cavity-wall insulation. Block C has indicated various ways in which statistics can be used to help make both individual decisions (Unit C3) and public decisions (Unit C4). We shall now look at the whole process of using statistics and how it fits in with decision-making.

Clarifying the question
In applying statistics, the first step is often to clarify the question which we want to answer. This clarification must be done in relation to the data which is available or which can be measured. Statistics alone cannot answer questions such as

• Is my child normal?
• Does smoking cause cancer?
• Should the government spend more on building insulation?

However, statistics can answer questions such as

• How tall is my child compared to other children of her age in Britain?
• How much less heat is lost, on average, from the extra-insulation houses on the Pennyland estate than from the standard-insulation houses?

The answers to questions of this second sort can guide decision-makers in dealing with less specific questions. Yet care must be taken not to confuse statistical answers with answers of other kinds. Statistics might tell me that my child is smaller than all but 3% of children of her age, but statistics on their own cannot tell me whether this is a problem which I should try to do something about.

Collecting data
Block C discussed several important ideas connected with collecting data. As well as problems concerned with data from experiments, those concerned with data from official statistics were again raised. We emphasized that it is important to consider the sources of official data and the means by which they are produced, before drawing conclusions from them.

In Unit C4, different types of data obtained from observational studies were described (C4: 3.2). We distinguished between longitudinal data, which is collected by observing individuals or groups of individuals over time, and cross-sectional data, which is collected from a number of individuals or groups at one particular time. We also pointed out the distinction between individual data (i.e. data about individuals) and aggregated data (i.e. data about groups of individuals) (C4: 3.1).

Another important distinction which we made was between prospective studies, in which individuals are followed up to see whether a particular event (e.g. death from lung cancer) occurs, and retrospective studies, in which the previous experiences of a group of individuals to whom the event had occurred, and another to whom it had not occurred, are investigated (C4: 2.2). These types of study were introduced in the context of epidemiology but they are used in many disciplines (C4: 3).

When analysing data, it is important to consider what type of data it is and the type of study from which it comes. It cannot, for example, be assumed that a relationship observed in one type of data implies that a similar relationship would be observed in a different type of data.

Analysing data
One important new technique for analysing data which we introduced in this block is the t-test. We also described certain other broad approaches to analysing data.

In Unit C4 we used weighted means of death rates to compare mortality in different populations, in much the same way as, in Unit A1, we used weighted means of price ratios to compare how much things cost. We described two methods, direct and indirect age-standardization, of comparing death rates in populations with differing age structures (C4: 1.2).

In Unit C3 we introduced one method of fixing standards for things which vary (C3: 1.2, 2). Children of a given age do not all have the same height - there is natural variation. But we want to identify children whose height is so far from the average that there may be something abnormal about them which requires further investigation by doctors. So we take the height on the growth chart as the FIT for children of a particular age, and identify children whose height is such that the RESIDUAL is too far from zero (Figure 1.3).

Figure 1.3

[Growth chart with a centile curve.]

Drawing centile curves on the growth chart and using them to pick out statistically abnormal children makes this process easier to perform. Of course, the children who are picked out as statistically abnormal may not be abnormal in the sense of being ill, but they are sufficiently far from the FIT (i.e. they have sufficiently large residuals) to merit further investigation.

You may have come across similar charts used in checking industrial processes, where they are known as control charts. For example, a machine may be set to produce articles of a given size. Articles produced by the machine are regularly sampled and their size is measured and plotted as a point on a chart like that in Figure 1.4. The size to which the machine is set is indicated by the central line. Dotted lines above and below indicate limits within which the variation in size is considered acceptable. If the measured sizes stray outside the warning lines, this indicates that something may be going wrong. If the measured sizes go outside the action lines, it is even more likely that something is wrong and the machine must be checked closely to see if it needs to be reset or repaired.

[Figure 1.4: control chart showing the central line, with warning lines and action lines above and below it.]
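The logic of reading a control chart can be sketched in Python. Everything numerical here is hypothetical: the target size, the distances of the warning and action lines from the central line, and the sampled sizes are invented, and in practice the positions of the lines would be derived from the known variability of the process.

```python
def classify_point(size, target, warning, action):
    """Classify one sampled size against the lines of a control chart.

    target  -- the size to which the machine is set (the central line)
    warning -- distance of the warning lines from the central line
    action  -- distance of the action lines from the central line
    """
    deviation = abs(size - target)      # the residual, ignoring its sign
    if deviation > action:
        return "action"                 # check the machine closely
    if deviation > warning:
        return "warning"                # something may be going wrong
    return "ok"


# Hypothetical settings and sampled sizes (in millimetres, say).
TARGET, WARNING, ACTION = 50.0, 0.4, 0.6
samples = [50.1, 49.7, 50.5, 49.3]
results = [classify_point(s, TARGET, WARNING, ACTION) for s in samples]
for size, result in zip(samples, results):
    print(f"{size:5.1f}  {result}")
```

The same FIT-and-RESIDUAL idea underlies the growth chart: the central line plays the part of the FIT and the distance from it is the residual.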

In Unit C3 we also saw that investigating relationships between three variables (e.g. age, weight and height) could not be done simply by looking at relationships between pairs of these variables (C3: 3.2, 3.3).

Interpreting the results
Finally we come to the interpretation of the results. This interpretation can be a decision, or recommendation, which depends on the way in which the decision-maker chooses to relate the statistical data to the real world. Decisions must depend on how the decision-maker evaluates the costs of the various possible outcomes - and this does not just include costs in the monetary sense. Users of statistics may need to communicate their interpretations to others and this communication process is fraught with difficulties.

The modelling process continues as the results of statistical investigations lead to more and more questions. The original sources of research programmes are often hunches or suspicions held by workers in the field but, as the research proceeds, formal statistical investigations enter to a much greater extent. Yet the cycle of statistical investigation is not automatic. Decisions on where to go next must be made at every stage.

2 Experiments and energy: the Whitburn project

In this section and the next we shall round off the course by looking at some examples of the whole process of using statistics; these examples are drawn from the study of energy use in houses. The two sections are based on the television programmes TV7 and TV8. You should work through this section as far as the middle of Section 2.2 before watching TV7. After watching TV7, work through the rest of Section 2 and Sections 3.1 and 3.2 before watching TV8.

2.1 Designing the experiment

Does insulating a house save energy? By this stage in the course you have probably met enough evidence to convince you that it does. But how could you find out how much energy a particular level of insulation saves? One way to do this is to carry out an experiment, and you have seen many such experiments in the course. In this section we shall review what we have said earlier in the course about how such an experiment on house insulation should be planned.

Activity 2.1 Look back at what you learnt about such experiments from the projects on the houses at Fishponds, Bristol and Kemnay (in Unit A0), Derek Whiteside's house (in Unit A5) and the Pennyland houses (in Units A5 and B5).

Solution There are many detailed points which you have probably remembered and many of these will be discussed further in this section and the next. One general point which we hope that you have understood is the importance of controlling for factors, other than insulation, which affect the amount of energy used in a house.

Here, as in Unit B5, we shall be particularly interested in those factors which might affect the amount of energy used for space-heating in a house. Some which have already been discussed in the course are as follows.

• How well-insulated the house is. A house with a lot of cavity-wall and roof insulation will use less energy than a less well-insulated house.
• The location of the house. A house in an exposed country location will tend to use more energy than a house in a sheltered site.
• The design of the house. Big houses generally use more energy but details of the design, such as the positioning of the windows, can also make a difference.
• The weather can affect energy use.
• The life-style of the people living in the house can make a large difference to energy use. If they are at home most of the time and keep the house very warm then they will require a lot of energy for space-heating.

In an experiment which aims to measure the effect of the first of these factors (i.e. how much energy can be saved by insulation) care must be taken to ensure that variability due to these other factors does not introduce bias into the results. As we explained in Unit B5, each factor can be dealt with, i.e. controlled, in two ways.

• It can be kept constant throughout the experiment. If the factor itself does not vary then it cannot cause any of the variability in the measurements. This is sometimes known as experimental control. An example from drug testing is to ensure that all the patients receive their treatment in exactly the same way, regardless of whether that treatment is the drug being tested or a placebo.
• The experiment can be designed in a way that controls for the factor, e.g. by pairing the experimental units. Even when such a design is not feasible, the experimental units should be allocated to the different treatments at random so that differences between units in the experimental group and the control group will tend to be balanced out. This is sometimes known as statistical control. In drug testing, the effect of a drug varies from one person to the next in a way which is usually impossible to predict. If enough people are used in a drug trial and they are allocated to the different treatment groups at random then this variation between people will tend to average out across the treatment groups.

The experiment which we shall describe in this section was designed to measure how much energy can be saved by insulation. It was carried out as part of a series of research projects run by the Department of the Environment: the Better Insulated House Programme. This programme consisted of nine separate projects, carried out at various sites throughout Great Britain (see Figure 2.1) starting in 1974.

(The first part of TV7 is concerned with deciding whether to use experimental or statistical control for the various factors involved in this experiment.)

Figure 2.1 The sites of the projects in the Better Insulated House Programme

At most of the sites, an experiment using a group-comparative design was conducted. Each site had about 40 houses, divided into an experimental, or test, group and a control group. The houses in the control group were insulated only to a standard which was usual for similar houses at that time. Thus on most of the sites the control group houses had uninsulated cavity walls, and roofs with either no insulation or only a 25 mm thickness of insulation. Houses in the experimental groups were more highly insulated.

The main aim of the research programme, at its outset, was to measure how much energy is saved by insulating houses to the standard of those in the experimental groups. From this information, the researchers hoped to learn more about the cost-effectiveness of insulation. Other aims of the research programme were to identify any technical problems related to the extra insulation, and to find out how the people living in the houses reacted to the higher levels of insulation.

We shall concentrate on the Better Insulated House Programme site at Whitburn in Scotland. This experiment included 56 houses and flats, of various sizes. They were heated electrically, using storage heaters or underfloor heating. Half of them were in the control group with no cavity insulation and only a 25 mm thickness of loft insulation, while the others in the experimental group had extra roof insulation (100 mm thick) and their wall cavities were filled with insulating foam. The electricity usage of the houses was monitored, as were the temperatures inside and outside.

2.2 Some data

We shall now look in detail at a small amount of the data from the Whitburn project. This data is in Figure 2.2: it is the total electricity usages of 15 two-bedroom houses for the year June 1975 to June 1976.

(Two further two-bedroom houses were investigated in this experiment but they were unoccupied for part of the year so the data from these is not listed here.)

Figure 2.2 Total electricity consumption (kWh) for two-bedroom houses at Whitburn: June 1975 to June 1976

Control houses Experimental houses

Before watching TV7 you should start the analysis of this data by completing the following activities.

Activity 2.2 Prepare a back-to-back stemplot of the data in Figure 2.2.

Solution The data must be cut to hundreds of kWh to obtain a reasonable number of levels on the stemplot.

[Back-to-back stemplot: control houses and experimental houses, stems 4 to 14; 4 | 3 represents 4300 kWh; $n_x = 8$, $n_y = 7$.]

(We shall refer to the values for the control houses by the letter $x$ and to those for the experimental houses by the letter $y$.)

You may have felt that a squeezed stemplot shows the pattern in the data more clearly.

[Squeezed back-to-back stemplot of the same data, with the stems grouped in pairs; 4 | 3 represents 4300 kWh; $n_x = 8$, $n_y = 7$.]

The stemplot clearly indicates that the experimental houses tend to use less electricity than the control houses, and so the insulation does appear to save energy - but how much?

Activity 2.3 For each batch of data in Figure 2.2, calculate the mean and standard deviation.

Solution

Control houses

$$\frac{\sum x}{n_x} = 10\,976.75, \qquad \frac{\sum x^2}{n_x} = 126\,249\,037.3 \qquad (n_x = 8)$$

The mean is $m_x = \frac{\sum x}{n_x} = 10\,976.75$. We keep the full accuracy for use in later calculations. The variance is

$$s_x^2 = \frac{\sum x^2}{n_x} - m_x^2 = 126\,249\,037.3 - (10\,976.75)^2 = 5\,759\,996.7$$

and so the standard deviation is $s_x \approx 2399.9$ (cut to 1 decimal place).

Experimental houses

The mean is $m_y = \frac{\sum y}{n_y} = 7890.1429$, again keeping the full accuracy for use in later calculations. The variance is

$$s_y^2 = 3\,363\,896.4$$

and so the standard deviation is $s_y \approx 1834.0$ (cut to 1 decimal place).
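The arithmetic in Activity 2.3 can be checked with a short script. This is only a sketch: the helper names `cut` and `variance_from_means` are introduced here for illustration, and the script starts from the summary figures quoted in the solutions because the individual house values are not reproduced in this text. Note that "cut" means truncating, not rounding.

```python
import math


def cut(value, places=1):
    """Cut (truncate, not round) a value to a number of decimal places."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def variance_from_means(mean, mean_square):
    """Variance with divisor n: mean of the squares minus square of the mean."""
    return mean_square - mean ** 2


# Control houses: figures quoted in the solution to Activity 2.3.
var_x = variance_from_means(10976.75, 126249037.3)
sd_x = math.sqrt(var_x)

# Experimental houses: the variance 3 363 896.4 is quoted in Activity 2.4.
sd_y = math.sqrt(3363896.4)

print(cut(var_x), cut(sd_x), cut(sd_y))
```

The control-house variance is very slightly below 5 760 000, so its square root is just under 2400; this is why cutting gives 2399.9 where rounding would give 2400.0.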

So, cut to 1 decimal place, the mean annual electricity consumption in the experimental houses is 7890.1 kWh, which is considerably less than the mean consumption of 10976.7 kWh in the control houses. In fact, on average, the experimental houses use 3086.6 kWh less electricity. But what about houses not included in the Whitburn experiment? Do they behave in a similar way? To help answer this question we need to calculate a confidence interval for the mean saving of electricity. This confidence interval will be based on the t-test so you should now work through some of the necessary calculations. (The calculation of such a confidence interval will be justified in TV7.)

Activity 2.4 Continuing to denote the control house values by x and the experimental house values by y, calculate $\sum (x - m_x)^2$ and $\sum (y - m_y)^2$. Remember, from Section 3.3 of Unit C2, that you do not need to go back to the original data to calculate these sums. Cut your answers to a whole number.

Solution

Control houses

Remember that

$$s_x^2 = \frac{\sum (x - m_x)^2}{n_x}$$

so that

$$\sum (x - m_x)^2 = n_x s_x^2 = 8 \times 5\,759\,996.7 = 46\,079\,973.$$

Experimental houses

$$\sum (y - m_y)^2 = n_y s_y^2 = 7 \times 3\,363\,896.4 = 23\,547\,274.$$

WATCH TV7 NOW

Activity 2.5 Calculate a 95% confidence interval, based on the two-sample t-test, for the mean difference in annual electricity consumption between a control house and an experimental house comparable to these two-bedroom houses at Whitburn.

(This confidence interval was calculated in TV7. Remember that this calculation is valid only if it is reasonable to assume that the populations from which the samples were drawn have Normal distributions with equal standard deviations. In TV7 we showed that these assumptions are reasonable.)

Solution Following the procedure in Section 4.6 of Unit C2, we first need to calculate $s_c^2$, the estimate of the common population variance. Here is the formula

$$s_c^2 = \frac{\sum (x - m_x)^2 + \sum (y - m_y)^2}{n_x - 1 + n_y - 1}$$

You found the two sums in this formula in Activity 2.4. Substituting these two values gives

$$s_c^2 = \frac{46\,079\,973 + 23\,547\,274}{8 - 1 + 7 - 1} = \frac{69\,627\,247}{13} \approx 5\,355\,942.1$$

The other value which is needed is the critical value $t_c$, and this can be found from the table in Figure 3.1 of Unit C2. The number of degrees of freedom is

$$n_x - 1 + n_y - 1 = 13$$

and so, from the table, $t_c = 2.160$.

The confidence interval is

$$\left[\,(m_x - m_y) - t_c s_c \sqrt{\frac{1}{n_x} + \frac{1}{n_y}},\ \ (m_x - m_y) + t_c s_c \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}\,\right]$$

Now,

$$m_x - m_y = 10\,976.7 - 7890.1 = 3086.6$$

(here we cut to one figure more than the data) and

$$t_c s_c \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} = 2.160 \times \sqrt{5\,355\,942.1} \times \sqrt{\frac{1}{8} + \frac{1}{7}} \approx 2587.1$$

(cut to one decimal place, the same accuracy as the means). So the confidence interval is [3086.6 - 2587.1, 3086.6 + 2587.1], that is [499.5, 5673.7].
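The steps of Activity 2.5 can be collected into a short script as a check on the arithmetic. This is a sketch under stated assumptions: the function names `pooled_variance` and `t_interval` are introduced here for illustration, the critical value 2.160 is taken from the text rather than computed, and full accuracy is kept throughout instead of cutting at each step, so the end points differ from the hand calculation by a few hundredths of a kWh.

```python
import math


def pooled_variance(ss_x, ss_y, n_x, n_y):
    """Estimate of the common population variance from two sums of squares."""
    return (ss_x + ss_y) / (n_x - 1 + n_y - 1)


def t_interval(m_x, m_y, s2_c, n_x, n_y, t_c):
    """Two-sample t confidence interval for the difference between two means."""
    difference = m_x - m_y
    half_width = t_c * math.sqrt(s2_c) * math.sqrt(1 / n_x + 1 / n_y)
    return difference - half_width, difference + half_width


# Sums of squared deviations from Activity 2.4; sample sizes 8 and 7.
s2_c = pooled_variance(46079973, 23547274, 8, 7)

# Means from Activity 2.3; t_c = 2.160 is the tabulated value for 13 degrees
# of freedom quoted in the text (it is not computed here).
low, high = t_interval(10976.75, 7890.1429, s2_c, 8, 7, 2.160)

print(round(s2_c, 1), round(low, 1), round(high, 1))
```

Because the interval lies wholly above zero, it agrees with the impression given by the stemplot that the extra insulation saves electricity, although the width of the interval shows how imprecisely the saving is estimated from samples of this size.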

Activity 2.6 In Section 1.2 of Unit C2, three basic types of experiment were described: exploratory, measurement and hypothesis-testing. How does the Whitburn experiment fit into this classification?

Solution The main objective of the experiment was to measure the amount of energy saved by extra insulation. From the point of view of this objective, Whitburn was a measurement experiment. However, there were other objectives, such as identifying any technical problems and finding out the inhabitants' reactions. From the point of view of these objectives, it was an exploratory experiment.

In TV7 a crossover design for the experiment was considered. In this design, uninsulated houses would have their energy use monitored for a time. They would then be insulated and monitored for a further time. This design has the advantage that the same people live in each house throughout, so reducing the variability due to different personal habits and life-style. However, the effects of weather would have to be controlled for statistically, by running the experiment for several years to average out the effect of the weather. (This is not a true crossover design. A true crossover design would start off with some of the houses insulated and would have their insulation removed at the same time as the others were insulated.)

Activity 2.7 In Unit A0 we described an experiment on houses at Fishponds, Bristol (see Section 5.2 of Unit A0). This experiment had exactly this kind of design but it ran for only two years, one without and one with insulation. How did the experimenters at Fishponds deal with variability caused by weather?

Solution They monitored the weather conditions and concluded that the before-insulation values needed to be increased by 12% to allow for the fact that the winter after insulation was colder than the winter before.

This is another method of controlling for an unwanted source of variability in an experiment. If the factor causing the variability can be measured at the same time as the variable of interest (e.g. temperatures can be measured at the same time as energy use) then it may be possible to describe the relationship between the two variables (e.g. by an equation). If such a description can be found then the variable of interest, the energy use, can be corrected for changes in the factor causing the unwanted variability. (It is often more convenient for the relationship between the variable of interest and the unwanted source of variability to be measured in a completely separate experiment.)

2.3 Learning from the Whitburn experience

On the basis of all the data from the Whitburn project, the Department of the Environment researchers concluded that the extra insulation in the experimental houses was cost-effective. But what about other types of houses, in locations other than Whitburn? The researchers conducted similar experiments in different kinds of house at locations all over Great Britain (see Figure 2.1), and they reached broadly the same conclusion from the Better Insulated House Programme as a whole. However, the experience gained in the early projects in the programme, including Whitburn, led to a change of emphasis in later projects.

In the Whitburn project, and in other early projects in the programme, an interesting result came to light, as follows. The properties of the materials used in house construction and insulation were known and from these the researchers calculated how much energy would have been saved by the insulation assuming that

- the insulation had been installed perfectly
- the inside temperatures were the same as those measured before the insulation was installed
- nothing else had changed.

The results of these calculations were much greater than the energy savings achieved in practice, and it was found that the discrepancies were due to the temperatures inside the insulated houses being greater than the temperatures inside the control houses. In other words, something important had changed: people in the insulated houses were, on the whole, warmer (although the insulation was still saving some energy). Now, some of this increase in temperature was voluntary (i.e. the occupants chose to be warmer) but some of the houses were hotter after insulation because the heating systems could not be controlled properly. Observations like this led the Department of the Environment researchers to broaden the objectives of the Better Insulated House Programme somewhat.
It became clear that the heating system should match the house, and that a better insulated house might need a system with different controls, or even a smaller heating system altogether. A further possibility mentioned in TV7 was that some gas boilers are less efficient than they might be when producing the small amounts of heat needed to keep a well-insulated house warm. (This finding was in fact by no means well established by the Better Insulated House Programme, though TV7 may have led you to believe that it was.) In order to estimate how much energy could be saved by insulating a house and altering the heating system, later projects in the Better Insulated House Programme used experimental and control houses with different heating systems.

In the next section we return to the much more complicated experiments being run by the Open University's Energy Research Group at Pennyland and Great Linford in Milton Keynes. In a sense, these experiments carry on from the Better Insulated House Programme: they investigate other aspects of energy saving in addition to the level of insulation and the heating system.

3 Experiments and energy: the Milton Keynes project

This section has two main aims. Firstly, we shall try to fit the Pennyland and Great Linford experiments into context. Secondly, we shall remind you of various statistical ideas and techniques, from the whole course, which you will need to know for the examination. So this section will both help you see the course as a whole and help you with your revision. You should complete Sections 3.1 and 3.2 before watching TV8.

3.1 Pennyland revisited

In Unit B5 we explained that the houses on the Pennyland estate in Milton Keynes are divided into two groups. One group, known as Pennyland 1, has a reasonably high level of insulation called the standard level (S). The other group, Pennyland 2, has a very high level of insulation called the extra level (E). One of the objectives of the researchers on the Pennyland project was to measure how much less energy is used in the very highly insulated houses in comparison with the other houses, just as the researchers did on the Whitburn project. They thus needed to deal with all those factors causing unwanted variability which were discovered by the Whitburn experimenters. However, because there is no systematic difference in heating system between the two groups of Pennyland houses, they were unable to investigate how much effect the type of heating system has on energy consumption. (Houses of the same size have the same type of heating system, regardless of the level of insulation.) They came to essentially the same conclusion about the best design for the experiment and used a group-comparative design.

The Pennyland researchers, however, had additional objectives in mind. The Pennyland houses differ from one another in various ways other than their level of insulation. For example, some of them incorporate energy-saving measures which make direct use of the sun's heat to reduce the amount of gas needed to heat a house. These passive solar energy design features include

- Orientation Most of the houses on the estate face in a direction between south-east and south-west, to catch the sun.
- Aspect (glazing pattern) Many Pennyland houses have the majority of their windows on the south-facing wall (single-aspect houses). Others have their windows more evenly distributed between south and north walls (dual-aspect houses).
In Unit B5 we described how the researchers on the Pennyland project could control for these factors in experiments designed to investigate the effect of insulation on energy consumption. We shall now describe how they investigated the changes in energy use caused by these passive solar design features. One of the factors in which they were particularly interested was the effect of glazing pattern (i.e. whether the house is dual-aspect or single-aspect) on energy consumption.

Activity 3.1 Other things being equal, would you expect single-aspect houses or dual-aspect houses to consume less energy?

Solution Our intuition is that the single-aspect houses would use less energy. Sun coming through a window warms a room. If more of the windows are on the south side then there is more glass for the sun to come through.

Is this intuition valid? Perhaps in real houses the truth is more complicated than this, and in practice the single-aspect houses might use the same amount of energy as the dual-aspect houses, or even more.

Activity 3.2 How would you investigate whether single-aspect houses use less energy for heating than dual-aspect houses?

Solution The first stage is to clarify the question. As has already been mentioned, houses differ from each other in many ways apart from their glazing pattern. Location, orientation, size etc. all affect the energy consumption of a house and so, to obtain meaningful results, it is necessary to ask a more specific question. Other things being equal, do the single-aspect houses at Pennyland require less energy for heating than the dual-aspect houses?

The next stage is to collect some data, and this is where things become awkward. It is fairly easy to find data for the total gas consumption of several single-aspect houses and several dual-aspect houses at Pennyland, and while total gas consumption is not the same as energy used for heating, it would be reasonable data to investigate. The difficulty arises in attempting to separate the energy differences attributable to different glazing patterns from those attributable to the other factors such as level of insulation, orientation and size. In order to make a valid comparison, it is necessary either to control for all these factors experimentally, or to allow for them somehow in the statistical analysis of the results. Controlling for them experimentally by keeping them constant involves finding two groups of houses which differ only in their glazing pattern (i.e. they all need to have the same level of insulation, orientation, size etc.). One difficulty with using such groups is that it may not be possible to find large enough groups and, even then, the houses would be lived in by different people.

When the researchers on the Pennyland project tried to separate out the effects of all the energy-saving measures in their experiment, they were faced with just this choice. They could either have made each comparison on the basis of very small groups of houses, or they could have allowed for many different factors in the analysis of the data.

3.2 Some data

Figure 3.1 contains data collected from two groups of Pennyland houses. They are all Pennyland 1 houses (these have the standard level of insulation), they are all the same size and they all have the same orientation; they differ mainly in that the Group A houses are dual-aspect whilst the Group B houses are single-aspect.

Figure 3.1 Annual gas consumption (kWh) for two groups of Pennyland 1 houses

Group A Group B

Activity 3.3 Let $M_A$ and $M_B$ be the population medians of two populations from which the two groups (Group A and Group B, respectively) in Figure 3.1 are samples. Use the Mann-Whitney test to test the hypotheses

$H_0$: $M_A = M_B$
$H_1$: $M_A > M_B$.

Solution First prepare a back-to-back stemplot of the data, then calculate the number of Group A values below each Group B value, as follows.

Group A | Group B | Number of Group A values below each Group B value
(back-to-back stemplot and counts; 8 | 7 represents 8700 kWh)
Sample size = 13 | Sample size = 17 | 104 = B column total

(Since the alternative hypothesis is one-sided, the other column total is not required.)

The test statistic is the B column total: 104.

From the table of critical values for the one-sided Mann-Whitney test (Figure 3.6 of Unit B2) the critical value for sample sizes 13 and 17 is 70. The test statistic is greater than the critical value, so the null hypothesis cannot be rejected.

You may have wondered what would happen if the original data values, rather than the cut values, had been used in the calculation of the number of Group A values below each Group B value. After all, the Group A value recorded as 12 200 in the stemplot was actually 12 240, whilst the Group B value recorded as 12 200 was actually 12 280. So, in fact, the Group B figure recorded as 12 200 in the stemplot has 1 Group A value below it, not 9 as recorded above. We can go back to the original data and decide which way all the ties in the calculation above should have been recorded; the results are as follows. Group A Group B Number of Group A values below each group B value

103 = B column total
8 | 7 represents 8700 kWh
Sample size = 13 Sample size = 17

This time the value of the test statistic is 103. Whether the value is 103 or 104, it is certainly not less than 70, so it is not in the critical region and the null hypothesis cannot be rejected. Thus, with neither analysis do the samples provide sufficient evidence that the population medians differ.

This activity illustrates the fact that it usually does not make any difference to the conclusion of a hypothesis test whether cut values (e.g. from a stemplot), or the original data values, are used to calculate the test statistic. If the value of the test statistic calculated using the cut values comes out very close to the critical value, or if there is a very large number of ties in the cut values which are not tied in the uncut data, then it may be worth recalculating the test statistic using the uncut values.
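The counting procedure used in Activity 3.3 can be sketched as follows. This is an illustration only: the function names and the two small batches of values are hypothetical (they are not the Pennyland data), counting a tie as one half is the usual convention and is assumed here, and treating a statistic equal to the critical value as lying in the critical region is also an assumption, since the text only deals with a statistic well above the critical value.

```python
def count_a_below_b(group_a, group_b):
    """Total, over all B values, of the number of A values below each one.

    A tie between an A value and a B value is counted as one half
    (an assumption; this is the usual convention).
    """
    total = 0.0
    for b in group_b:
        for a in group_a:
            if a < b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total


def reject_null(statistic, critical_value):
    """One-sided test: small values of the statistic favour the alternative."""
    return statistic <= critical_value


# Hypothetical cut values in thousands of kWh (NOT the Pennyland data).
group_a = [12.2, 9.0, 13.1]
group_b = [12.2, 8.7, 10.5]
print(count_a_below_b(group_a, group_b))

# The values found in the text: 104 (cut data) and 103 (original data),
# each compared with the critical value 70.
print(reject_null(104, 70), reject_null(103, 70))
```

With the alternative hypothesis that Group A values tend to be larger, few Group A values should lie below the Group B values, which is why only a small B column total would count as evidence against the null hypothesis.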

WATCH TV8 NOW

3.3 Pennyland and Great Linford : relationships between variables

As you saw in Activity 3.3, it is not possible to reject the null hypothesis that there is no difference in median annual gas consumption between the populations of dual- and single-aspect Pennyland 1 houses. (This result is described in more … in ….) Does this mean that single-aspect glazing really does not save gas? This is one possibility but another possibility is that a type 2 error has occurred: i.e. there really is a difference between the two types of house but the experiment has failed to detect it. Another possibility is that there are other factors which affect energy consumption but which have not been controlled for in choosing the houses in Groups A and B.

Activity 3.4 The single-aspect houses have a slightly larger area of external wall than the dual-aspect houses. How could this affect the results?

Solution A house with a larger external wall-area will, other things being equal, lose more heat to the outside air than a house with a smaller external wall-area. So, in this respect, the single-aspect houses lose more heat and so will need more gas to keep them warm. This will go against any tendency for single-aspect houses to use less gas because they gain more from direct solar radiation. So, even if it is true that Pennyland single-aspect houses use no less gas than Pennyland dual-aspect houses, this may be due to the extra wall-area cancelling out the effect of the different glazing pattern.

It would be much simpler if all the houses had the same external wall-area. However, any real experiment is a compromise between different constraints and this is particularly true at Pennyland, where the laboratory is a real, and fairly large, housing estate. The houses will stand long after the monitoring has finished and architectural considerations inevitably lead to departures from what might be the scientific ideal. In order to overcome this difficulty of unequal wall-areas, a correction could be made to the data to allow for the difference in external wall-area. The size of this correction can be estimated using other data from the Pennyland project as well as from experiments elsewhere but we shall not describe this procedure here.

Further evidence supporting the effectiveness of single-aspect glazing was found in an experiment using the Open University Energy Research Group's test house at Great Linford, Milton Keynes. There, for a 10-week period from February to May 1982, a constant-heating experiment was run. The house was heated to a constant and uniform 21°C using thermostatically controlled electric fan heaters. The researchers measured, amongst other things, the following three variables.

- Electricity required to keep the house at 21°C (denoted by E, measured in kWh per day).
- Energy from solar radiation falling on a south-facing vertical surface (denoted by S, measured in kWh per square metre per day).
- Temperature difference between the inside and the outside of the house (denoted by TD, measured in °C).

Figure 3.2 gives the data which they collected. Each entry is an average over a whole week.

Figure 3.2 Data from the Great Linford constant-heating experiment

Week S TD E

The Great Linford test house faces south and has most of its glazing on the south side (i.e. it is a single-aspect house). If such a house does collect energy from solar radiation then it would be expected to collect more such energy on sunnier days, i.e. when S is relatively high. So electricity usage, E, should be relatively low when S is relatively high: i.e. there should be a negative relationship between S and E. Figure 3.3 contains a scattergram of E against S. The electricity usage E is on the vertical axis because we are trying to explain changes in E in terms of changes in solar radiation S: i.e. E is the dependent variable and S is the explanatory variable. The line on the scattergram is the sub-batch median FIT line, with equation

$$E = -2.126S + 54.6$$

Figure 3.3 Scattergram of electricity consumption against solar radiation, with a FIT line (horizontal axis: solar radiation, kWh/day/m2)

Activity 3.5 Check the equation of the FIT line in Figure 3.3.

Solution The ordered values of S are in the first column of Figure 3.4, which shows the corresponding values of E and the three sub-batches; it also contains the FIT values and residuals at the two stages of the calculation.

Figure 3.4

S      E-DATA   ORIGINAL FIT   ORIGINAL RESIDUAL   Adjusted NEW FIT   Adjusted NEW RESIDUAL
Left
…
Middle
…
Right
2.79   63.2     45.9            17.3                48.6                14.6
3.27   45.0     44.9             0.1                47.6                -2.6
3.43   31.1     44.6           -13.5                47.3               -16.2

Thus the sub-batch summary points are (2.00, 47.7), (2.69, 54.2) and (3.27, 45.0). For the ORIGINAL FIT line

$$\text{GRADIENT} = \frac{\text{RISE}}{\text{RUN}} = \frac{45.0 - 47.7}{3.27 - 2.00} = \frac{-2.7}{1.27} \approx -2.1259843$$

and the intercept is

$$47.7 - (-2.1259843) \times 2.00 \approx 51.9.$$

Thus the equation of the ORIGINAL FIT line is

$$E = -2.1259843S + 51.9$$

and this can be used to calculate the third and fourth columns in Figure 3.4. Since $r_L = 0.1$ and $r_R = 0.1$, the gradient of the RESIDUAL FIT line is 0. When this happens there is no point in working out a NEW FIT line as its gradient will be the same as that of the ORIGINAL FIT line. It is still, however, necessary to work out the adjusted intercept, as follows:

$$\tfrac{1}{3}\,[47.7 + 54.2 + 45.0 + 2.1259843\,(2.00 + 2.69 + 3.27)] \approx 54.6.$$

Thus the equation of the adjusted NEW FIT line is

$$E = -2.1259843S + 54.6.$$

The residual plot for this FIT line is in Figure 3.5.
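The calculation in Activity 3.5 can be checked from the three sub-batch summary points alone. The sketch below is only an illustration: the function name `fit_from_summary_points` and the helper `cut` are introduced here, and the code performs a single pass (gradient from the left and right summary points, then an intercept averaged over all three); it does not carry out the RESIDUAL FIT stage, which is not needed here because that gradient is 0.

```python
import math


def cut(value, places=1):
    """Cut (truncate, not round) a value to a number of decimal places."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def fit_from_summary_points(left, middle, right):
    """Gradient, original intercept and adjusted intercept of a FIT line."""
    (x_l, y_l), (x_m, y_m), (x_r, y_r) = left, middle, right
    gradient = (y_r - y_l) / (x_r - x_l)            # RISE / RUN
    original_intercept = y_l - gradient * x_l       # line through the left point
    adjusted_intercept = (
        (y_l + y_m + y_r) - gradient * (x_l + x_m + x_r)
    ) / 3                                           # averaged over all three points
    return gradient, original_intercept, adjusted_intercept


# Sub-batch summary points (S, E) quoted in the solution.
gradient, original, adjusted = fit_from_summary_points(
    (2.00, 47.7), (2.69, 54.2), (3.27, 45.0)
)
print(round(gradient, 7), cut(original), cut(adjusted))
```

The adjusted intercept is larger than the original one because the middle summary point lies well above the line through the other two; averaging over all three points moves the line up towards it.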

Figure 3.5 (residual plot for the FIT line; vertical axis: RESIDUAL)

Clearly the FIT is not good. Some of the residuals are very large. This is mainly because variations in the temperature difference, TD, have not been allowed for. The effect of TD can be incorporated by using a slightly more complicated FIT equation. For a straight line FIT equation relating S and E, we found an equation of the form

$$E = aS + c.$$

In Activity 3.5 the gradient a and intercept c of the line were calculated using the sub-batch median method. To include TD as well, we look for a FIT equation of the form

$$E = aS + b(\mathrm{TD}) + c,$$

and the values of a, b and c are calculated to give the equation

$$E = -16.3S + 6.74\,\mathrm{TD} - 5.97.$$

(The method used to find these values is described briefly below.)

As usual, to see whether this is a good FIT, we must look at the residuals. Using the values of S and TD (from Figure 3.2) for Week 1, the FIT value of E is

$$-16.3 \times 1.36 + 6.74 \times 15.1 - 5.97,$$

which comes to 73.6 (cut to 1 decimal place). The observed DATA value of E is 74.5, so

RESIDUAL = DATA - FIT = 74.5 - 73.6 = 0.9.

This is much less than the residual of 22.8 corresponding to Week 1 in Figure 3.4, which was calculated without taking TD into account. In fact, almost all the residuals come out to be much less than those for the FIT line in Figure 3.3 (see Figure 3.5). The residuals for this new equation are pictured in Figure 3.6.
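The Week 1 calculation can be reproduced in a few lines. This is a sketch only: the function names `fit_value` and `cut` are introduced here for illustration, and only the Week 1 figures quoted in the text (S = 1.36, TD = 15.1, E = 74.5) are used, since the rest of Figure 3.2 is not reproduced here.

```python
import math


def cut(value, places=1):
    """Cut (truncate, not round) a value to a number of decimal places."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def fit_value(s, td, a=-16.3, b=6.74, c=-5.97):
    """FIT value of E from the equation E = aS + b(TD) + c."""
    return a * s + b * td + c


# Week 1 values quoted in the text.
week1_fit = cut(fit_value(1.36, 15.1))
week1_residual = 74.5 - week1_fit      # RESIDUAL = DATA - FIT

print(week1_fit, round(week1_residual, 1))
```

Applying the same two lines to each week in Figure 3.2 would give the residuals pictured in Figure 3.6.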

Figure 3.6 (residual plot for the new FIT equation; vertical axis: RESIDUAL)

You may be wondering how the values of a, b and c in this new FIT equation were found. Of all the possible values for a, b and c, the values a = -16.3, b = 6.74 and c = -5.97 are those which make the sum of the squared residuals as small as possible. This method of least squares, as it is known, is a widely used method of calculating the numbers in FIT equations and it can be used to fit straight lines to data. For fitting straight lines it is much less resistant than the sub-batch median method.

This new FIT can be presented graphically if we arrange the FIT equation into the form

$$E - 6.74\,\mathrm{TD} = -16.3S - 5.97.$$

The left-hand side of this equation can now be thought of as a value of E adjusted to take account of the temperature difference TD. Figure 3.7 contains the values of E - 6.74 TD for each of the 10 weeks, together with the corresponding value of S. The Adjusted E values are cut to 1 decimal place.
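To make the idea of minimizing the sum of squared residuals concrete, here is a minimal sketch that finds a, b and c by solving the three "normal equations" with plain Python. Everything in it is illustrative: the function names are introduced here, and the five data points are hypothetical values constructed to lie exactly on the plane E = -16.3S + 6.74 TD - 5.97 (they are not the Great Linford data), so that the method can be seen to recover those coefficients. The course team's own computation is not described in the text.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def least_squares_fit(s_values, td_values, e_values):
    """Return (a, b, c) minimizing the sum of squares of E - (aS + b TD + c)."""
    rows = [[s, td, 1.0] for s, td in zip(s_values, td_values)]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * e for r, e in zip(rows, e_values)) for i in range(3)]
    return tuple(solve_linear(normal, rhs))


# Hypothetical weekly values (NOT the Figure 3.2 data), built to lie exactly
# on the plane E = -16.3 S + 6.74 TD - 5.97.
s_values = [1.0, 2.0, 3.0, 1.5, 2.5]
td_values = [15.0, 12.0, 14.0, 10.0, 16.0]
e_values = [-16.3 * s + 6.74 * td - 5.97 for s, td in zip(s_values, td_values)]

a, b, c = least_squares_fit(s_values, td_values, e_values)
print(round(a, 2), round(b, 2), round(c, 2))
```

With real data the points would not lie exactly on a plane, and the fitted values of a, b and c would be those giving the smallest possible sum of squared residuals rather than residuals of zero.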

Figure 3.7

Adjusted E : Week E - 6.74 TD

Figure 3.8 contains a scattergram of these values, with the line

$$\text{Adjusted } E = -16.3S - 5.97$$

drawn on it. This FIT line clearly describes the data much better than did the previous line.

Figure 3.8 (scattergram; vertical axis: Adjusted E, i.e. E - 6.74 TD)

So the FIT equation appears to describe the data from the Great Linford constant-heating experiment quite well. What is more, it ties in with what one might expect. The value of S is multiplied by a negative number so that when S falls, E rises: when there is less solar radiation, the house uses more electricity. On the other hand, TD is multiplied by a positive number so that when TD rises, E rises: when the house is much warmer inside than out, it uses a lot of electricity. These relationships can be summarized by the arrow diagram in Figure 3.9.

Figure 3.9 (arrow diagram: Temperature difference (TD) and Solar radiation (S) each point to Electricity consumption (E))

Activity 3.6 In Activity 3.3 you were unable to reject the null hypothesis that the population median gas consumptions are the same in the single- and dual-aspect houses. Do the Great Linford findings show that there must nevertheless be a difference between these two population medians (i.e. that a type 2 error must have occurred)?

Solution No. All the Great Linford data shows is that in one single-aspect house there is a negative relationship between the amount of solar radiation and the amount of energy needed to heat the house. The Great Linford house is similar to the single-aspect Pennyland houses; therefore the findings do indicate that, in a house whose design is not too different from the Pennyland single-aspect houses, some use can be made of solar radiation. This does not imply for certain that a dual-aspect Pennyland house makes less use of solar radiation than a single-aspect house, but it certainly makes it seem more plausible. To summarize, the Great Linford findings do not prove that single-aspect glazing saves energy at Pennyland but they do make this seem more plausible.

The major part of the statistical analysis of the data from the Pennyland project began in March 1983. The analysis needed to be more sophisticated than that described above in two major respects.

- It used methods of comparing all the single-aspect houses with all the dual-aspect houses, and made adjustments for differences in insulation, orientation etc. between individual single-aspect houses and between individual dual-aspect houses. This can be done using techniques such as … (see Section 4.7 of Unit C2).
- It took into account detailed differences between houses, such as the differing wall-areas of the single- and dual-aspect houses. This can be done by investigating the relationship between external wall-area and energy use and then adjusting the energy use of each house to allow for its wall-area. (This is what we have just done in adjusting the Great Linford electricity consumption values to take account of temperature difference.)

These relationships can be investigated using data from the Great Linford test house and from many other experiments, as well as data from the Pennyland houses themselves. Such methods, using more complicated statistical analysis, enable compensation to be made for the compromises which were necessary in designing the experiment.

3.4 Why experiment?

In 1981 a total of 17 385 million therms of natural gas (over 500 thousand million kWh) were used in the UK. Of these, 8748 million therms (over 250 thousand million kWh) were used in domestic premises. (These figures come from the Digest of UK Energy Statistics 1982 (HMSO).) Domestic consumers paid about £2.5 thousand million for this gas. That is a lot of gas, and natural gas is a scarce resource. Many recent UK governments have said that they would like to decrease gas consumption.
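The conversions from therms to kWh quoted above can be checked as follows. The sketch assumes the standard conversion factor of about 29.3071 kWh per therm, which is not stated in the text; the function name is introduced here for illustration.

```python
KWH_PER_THERM = 29.3071  # assumed conversion factor: 1 therm is about 29.3071 kWh


def million_therms_to_thousand_million_kwh(million_therms):
    """Convert millions of therms to thousands of millions of kWh."""
    return million_therms * KWH_PER_THERM / 1000


total_uk = million_therms_to_thousand_million_kwh(17385)
domestic = million_therms_to_thousand_million_kwh(8748)
print(round(total_uk, 1), round(domestic, 1))
```

Both results are consistent with the rounded figures in the text: the total is a little over 500 thousand million kWh and the domestic share a little over 250 thousand million kWh, roughly half of the total.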

Activity 3.7 Write down some ways in which a government could reduce gas usage in domestic premises.

Solution Here are some methods.

(a) Increase gas prices.

(b) Ration gas by limiting the amount each customer is allowed to use.

(c) Ration gas by restricting the number of new customers being connected to the supply.

(d) Reduce gas consumption in houses by increasing insulation. This could be done as follows:
- encouraging house insulation by advertising
- encouraging house insulation by giving people grants to insulate their houses
- compelling people to insulate their houses by law.

(e) Encourage research into the production of more efficient gas boilers.

You might well have thought of others.

In fact, recent governments have tried to reduce gas consumption by each of these measures, though methods (b) and (c) have been used only for non-domestic customers. The statistics on gas usage and on the amount of gas left under the North Sea indicate that there is a problem (gas may run out all too soon) but they do not indicate which answer to the problem the government should choose.

Experiments like those in the Better Insulated House Programme and the Pennyland project provide governments with information which is valuable when considering the options listed under Method (d) above. Of course, insulation saves gas but if the cost of the insulation is far more than the value of the gas then the insulation may not be worthwhile. If insulation is being encouraged by raising gas prices, by advertising or by providing grants, then it is up to the consumer to decide how much insulation to put in, if any. On the other hand, if people are being compelled by law to insulate their homes then the compulsory level of insulation must be cost-effective, at least from the government's point of view.

One group of people who have been for some years compelled by law to insulate houses are builders. All new houses constructed in this country must satisfy the Building Regulations, and these include regulations on thermal insulation. The standards of insulation required by the regulations were increased in April 1982 to a level that might have seemed excessive to all but energy conservation lobbyists 10 or 15 years ago, but of course world energy supplies are seen in a much different light nowadays. In fact, the thermal insulation standards for houses built after April 1982 caught up with the Pennyland project, in that the Pennyland 1 houses only just complied with the new regulations. (These houses were designed in the late 1970s as reasonably well-insulated buildings, more highly insulated than the usual house being built at that time.)

This change in building regulations would probably not have occurred had experiments like those in the Better Insulated House Programme not shown that higher standards of insulation are worthwhile. In the future, it may be that new houses will have to be insulated to the level of the Pennyland 2 houses, or even higher, and it will be data and statistical analysis from experiments like those in the Pennyland project which will back up these vital decisions about our country's energy use.

Acknowledgements

We gratefully acknowledge the cooperation of the Department of Energy and the Department of the Environment in providing information for Section 2.

MDST242 Statistics in Society

Block A Exploring the data
Unit A0 Introduction
Unit A1 Prices
Unit A2 Earnings
Unit A3 Relationships
Unit A4 Surveys
Unit A5 Review

Block B
Unit B1 How good are the schools?
Unit B2 Does pay?
Unit B3 Education: does sex matter?
Unit B4 What about large samples?
Unit B5 Review

Block C Data in context
Unit C1 Testing new drugs
Unit C2 Scientific experiments
Unit C3 Is my child normal?
Unit C4 Smoking, statistics and society
Unit C5 Review

The Open University 0335141404