STATISTICS The Open University Mathematics/Social Sciences/Science/Technology An Inter-Faculty Second Level Course MDST242 Statistics in Society
Block C Data in context
Unit C5 Review
Prepared by the course team

The Open University, Walton Hall, Milton Keynes MK7 6AA
First published 1983. Reprinted 1986, 1990, 1993, 1997, 1998
Copyright © 1983 The Open University
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publisher.
Designed by the Graphic Design Group of the Open University.
Printed in Great Britain at The Alden Press, Oxford.
This text forms part of an Open University course. The complete list of units in the course appears at the end of this text.
For general availability of supporting material referred to in this text, please write to Open University Educational Enterprises Limited, 12 Cofferidge Close, Stony Stratford, Milton Keynes MK11 1BY, Great Britain.
Further information on Open University courses may be obtained from the Admissions Office, The Open University, P.O. Box 48, Walton Hall, Milton Keynes MK7 6AB.

Contents
Introduction
Block review
Statistics in experiments
Analysing experimental data
The t-tests
Statistics in decision-making
Experiments and energy: the Whitburn project
Designing the experiment
Some data
Learning from the Whitburn experience
Experiments and energy: the Milton Keynes project
Pennyland revisited
Some data
Pennyland and Great Linford: relationships between variables
Why experiment?
Acknowledgements

Introduction

This unit is designed to help you revise and consolidate what you have learnt in the rest of the block, and indeed the rest of the course. Section 1 contains a review of the block, similar to those in Units A5 and B5.
You should aim to spend about 1½ to 2 hours reading through this summary, referring back to earlier units where necessary.

The remaining two sections of the unit are based round the television programmes, TV7 and TV8. Both of these programmes are concerned with experiments which investigate ways of saving energy in the home. You should plan your study of these sections to fit round the television programmes. Before watching TV7 you should work through Section 2.1 and part of 2.2. Before watching TV8 you should complete all your work on Section 2 and work through Sections 3.1 and 3.2.

1 Block review

The purpose of this section is to remind you of the statistical ideas introduced in Block C, and to give you an opportunity to put them into perspective. This should help you to organize your thoughts before going on to apply these ideas in Sections 2 and 3, and starting your revision for the examination. We shall not give full details of all the techniques here, nor shall we mention every technical term or phrase from the block. Fuller details of the important terms and techniques are in the Handbook.

The block has concentrated on two basic areas where statistical ideas are used: in experimentation and in decision-making, both public and private.

1.1 Statistics in experiments

What are experiments, and what has statistics got to do with them? Broadly speaking, an experiment involves making specific observations under specific conditions in order to answer specific questions; a scientific experiment's methods and results should also be open to public scrutiny and verification. [C2: 1.1-2]

There are many kinds of experiments, including [C2: 1.2]
  exploratory (Baconian) experiments, which aim to answer questions such as 'What happens if ...?';
  measurement experiments;
  hypothesis-testing experiments, which aim to test a specific hypothesis, often one about the cause of a phenomenon.

In this course we have concentrated on the third type of experiment.
It is not always straightforward to classify an experiment into one of these three types. Some experiments have elements of more than one of the types.

Statistics is relevant to experimentation chiefly because experimental observations are variable. [C1: 2.3, C2: 2.6 and 4.7] In a typical experiment, measurements are made on several experimental units, which may be individual people, or plants, or households, or almost anything else. These measurements will vary from one experimental unit to the next, and there may be many different sources of this variability. Statistical techniques can help to deal with variability in two ways.
  Statistics provides ways of analysing the experimental results so that the different sources of variability can be disentangled as far as possible. [C1: 4 and C2: 4.7]
  Statistics provides ways of designing the experiment which ensure that the effects of unknown sources of variability are minimized. [C1: 3]

The statistical design and statistical analysis of an experiment go hand in hand. In designing an experiment you must ensure that the observations are made in such a way that it is possible for the statistical analysis of the results to answer the questions in which you are interested.

Figure 1.1 shows the modelling diagram which we have used throughout the course. Experiments probably bring the second stage (collect data) to mind, but in fact the use of statistics in experimentation shows the whole of the modelling process in action. An experiment is carried out to investigate a clearly stated question. Data is collected and analysed, conclusions are drawn, and on the basis of these results new questions may arise and new experiments may be performed. In this block we have shown how earlier stages in the process are affected by what might happen at later stages. In designing an experiment, before you collect any data you must consider how you are going to analyse the data you collect.
You must design the experiment so that the results are not obscured by irrelevant sources of variation; i.e. you must collect the data in such a way that the results can be interpreted in a manner pertinent to the question you have asked. In short, in designing experiments you must look at the statistical modelling process as a whole.

Figure 1.1 Modelling diagram [stages shown: CLARIFY question about population; COLLECT random sample; ANALYSE sample data; INFER results for population; INTERPRET]

Most of the experiments described in the block have investigated the effect of a particular treatment of some sort on people, animals, plants or some other kind of experimental units. [C1: 2.1] This investigation usually involved comparing experimental units which had been exposed to the experimental treatment (the experimental group) with other experimental units (the control group) which had not been exposed to the treatment. The types of design described in the block differed mainly in the nature of the control group involved. The following three types of design were covered in detail.

Crossover design
Each experimental unit acts as its own control: two measurements are made on each unit, one under the experimental treatment and one under the control treatment. [C1: 3.1] This design avoids the effect of variability between experimental units but it cannot be used if either of the treatments irreversibly alters the experimental unit or if it is not possible to make two measurements on one unit. It is usually important to ensure that half the experimental units receive the experimental treatment first, whilst half receive the control treatment first.

Matched-pairs design
Each unit in the experimental group is matched with a similar unit in the control group. [C1: 3.2] If this matching can be done effectively then this type of design avoids much of the effect of unwanted variability between experimental units. Unfortunately it is often very difficult to achieve a good matching.
Group-comparative design
Experimental units are allocated at random to either the experimental group or the control group. [C1: 3.3] Unwanted variability between units is not eliminated.

In order to reduce the impact of unwanted types of variability between experimental units in such designs, it is important to ensure that the experimental group and control group are dealt with identically (apart, of course, from the experimental manipulation). [C1: 2.1 and C2: 2.6] If this is done then any observed difference between the groups cannot be attributed to systematic differences in the circumstances under which the observations were made. Unfortunately it is not usually possible to treat the experimental and control groups completely identically, so there is always a danger that some differences remain, and these may introduce bias into the results.

Another potential source of bias, both in group-comparative design experiments and (despite the matching) in matched-pairs design experiments, occurs when the experimental treatment and the control treatment are applied to experimental units which are not sufficiently similar. If the variability between units acts in such a way that there is a systematic difference between those receiving the experimental treatment and those receiving the control treatment, then the results will be subject to bias.

One can never be sure that this kind of bias does not exist but one way to try to avoid it is to use randomization. [C1: 3.4] For example, in an experiment with a group-comparative design, the units should be allocated to the experimental and control groups at random. This can be done in several ways; e.g. by using a random number table. The randomization ensures that units with any particular feature which might be a source of variability will be likely to be represented more or less equally in both groups.
Hence it is probable that the only major difference between the two groups will be the difference, if any, between the two treatments whose effect is being measured.
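
As a rough illustration of the random allocation described above, the following Python sketch shuffles a list of experimental units and splits it into an experimental group and a control group. It uses a pseudo-random number generator in place of the random number table mentioned in the text. The function name allocate_at_random, the household labels and the seed value are all invented for this illustration and do not come from the course.

```python
import random


def allocate_at_random(units, seed=None):
    """Split units at random into (experimental group, control group).

    Hypothetical helper for illustration only. If the number of units
    is odd, the control group receives the extra unit.
    """
    rng = random.Random(seed)   # the seed only makes this sketch repeatable
    shuffled = list(units)      # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)       # put the units in a random order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical households acting as experimental units.
households = list(range(1, 21))
experimental, control = allocate_at_random(households, seed=242)

print("Experimental group:", sorted(experimental))
print("Control group:", sorted(control))
```

With a different seed (or none at all) the split changes, but each group still receives half of the units. The same kind of split could also be used in a crossover design to decide which half of the units receive the experimental treatment first.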