Stratified Random Sample

Unit A3 Surveys STdTlSTlCS in SOCIETY An inter-faculty second level course 3 .- Theopen MDST242 Statistics in Society UUniversity Block A Exploring the data Unit A3 Surveys Prepared by the course team An inter-faculty second level course The Open University, Walton Hall, Milton Keynes, MK7 6AA. First published 1983. New edition 1996. Reprinted 1999 Copyright @ 1996 The Open University All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without written permission from the publisher or a licence from the Copyright Licensing Agency Limited. Details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd of 90 Tottenham Court Road, London, W1P SHE. Edited, designed and typeset by the Open University using the Open University 1w System. Printed in the United Kingdom by the Alden Press, Oxford. This text forms part of an Open University Second Level Course. If you would like a copy of Studying with The Open University, please write to the Central Enquiry Service, PO Box 200, The Open University, Walton Hall, Milton Keynes, MK7 6YZ. If you have not already enrolled on the Course and would like to buy this or other Open University material, please write to Open University Educational Enterprises Ltd, 12 Cofferidge Close, Stony Stratford, Milton Keynes, MKll 1BY, United Kingdom. Contents Introduction 1 Surveys and sampling 1.1 Surveys 1.2 Random sampling 1.3 Properties of simple random sampling 2 Random samples 2.1 Choosing some samples 2.2 Systematic random sampling 3 Patterns in the samples 3.1 Population values and sample values 3.2 All possible samples 3.3 Pictures of patterns 3.4 Different sample sizes 4 More sampling methods 4.1 Types of error 4.2 Stratified sampling 4.3 Cluster sampling 4.4 Sampling from the electoral register 4.5 Stratified and cluster sampling 4.6 Quota sampling 4.7 Some more considerations 5 The Family Expenditure Survey Objectives Solutions to the Activities Solutions to the Exercises Appendix: Random Number Table Index Introduction Block A so far has been largely concerned with Stage 3 of the modelling diagram, the analysis of the data. This unit concentrates on Stage 2, collecting the data. You should by now realize the importance of collecting data that 0 can be analysed 0 enable you to answer the question under investigation. Perhaps the most frequent contact that you have with data collection in your everyday life is when you fill up forms or answer questionnaires providing information about yo&self, your home, your job, your car or (almost certainly) your OU studies! These may be for market research companies, for government departments or for your employers. Often you are asked to supply the information because you have been selected as one of a relatively small number of people being surveyed, i.e. a sample. In other cases, such as the ten-yearly Census in the UK, you are part of a large exercise designed to collect information from as many people in the country as it is possible to reach. We shall use the word census for any such complete coverage of a population and the word survey when a sample is selected from the population. You may well have wondered, when you are selected to answer questions in a survey, how the answers you give (about your preferences in toothpaste, or the number of children you have) will affect decisions made by whoever commissioned the survey. You may also have considered the question: if your next-door neighbour had been selected instead of you, how much difference would this have made to any decision based on the survey's results? The results of surveys of one kind or another - opinion polls, advertisers' claims - are never out of the news; but do they mean anything useful? Turning these questions round and looking at them from the statistician's viewpoint leads to the following question. Is it possible to gain useful information about a large population (such as all the people in the UK, or all the employees of a large firm) by collecting data about only a relatively small number (i.e. a sample) of them? The answer, which will be explained in more detail in this unit, is yes, provided that the people to be questioned are selected in the correct way. The population need not be a population of people; it could consist of schools, firms, villages, fish, light bulbs, etc. A similar question can still be asked, and the answer is much the same; but here we shall concentrate on surveys of people. Section 1 of the unit describes the basic principles of how to select the people to be questioned and introduces a method called random selection, or random sampling. Section 2 examines the effects of simple random sampling and introduces a modification of this method, called systematic random sampling, which is of great practical importance. Section 3 looks more closely at the relationship between samples of the population and the population as a whole. This leads to the idea of a sampling distribution, which forms the theoretical basis of all the methods in Block B for deriving information about the whole of a large population from facts about a sample taken from it. Section 4 contains an introduction to some further aspects of survey planning. Finally, Section 5 shows how the ideas introduced in the previous sections of the unit are applied in practice in a major Government-run survey, the Family Expenditure Survey (FES). 1 bumeys and sampling 1.1 Surveys Throughout the previous units of this block, stress has been laid on the importance of collecting data that are both relevant to the investigation in hand and reliable. You have also encountered several published sources of data. Now, many of these published sources were based on data that had been collected in surveys. Here is a list of those surveys that have been referred to, with a brief description of them. 1 The Family Expenditure Survey (FES) which, each year, investigates the See Subsection 5.2 of income and spending patterns of about 7000 households in the UK. Unit Al. 2 The survey of retail prices, carried out each month by the Department of See Subsection 5.3 of Employment; this provides about 150 000 prices used in calculating the Retail Unit Al. Prices Index (RPI). 3 The New Earnings Survey (NES) which, each year, collects information on the See Subsection 1.3 of earnings of about 180 000 people. Unit A2. 4 The Department of Employment's survey which, each month, collects See Subsection 5.2 of information about the earnings of all employees in about 8000 firms for use in Unit A2. calculating the Average Earnings Index (AEI). All these sources of data have one thing in common: they do not collect information about every individual member of the population involved (i.e. they are surveys, not censuses). The whole population of interest is known as the target population. Each of these surveys claims to provide reliable information about the whole of its target population. 1 The target population of the FES is all households in the UK. There are approximately 22 000 000 of these. 2 For the survey of retail prices, the exact size of the whole target population is difficult to assess but it is certainly much larger than the 150 000 prices collected in the survey. 3 The target population of the NES is all members of the PAYE system. There are about 20 000 000 of these. 4 Since the AEI aims to give an overall measure of changes in the earnings of all employees in the UK, the target population is all firms in the UK. Altogether, there are about 1800 000 firms, but the majority of these employ fewer than 25 people and these are not included. The basis for using a survey instead of a census is that, provided the sample is chosen carefully from the target population, the results of the survey can be used to infer the characteristics of the whole targct population. We shall see later how this can be done, but first let us consider some of the advantages. If such a survey does provide reliable information about the whole of its target population, then it is certainly much cheaper than collecting this See Subsection 1.3 of Unit A2. information from every member of the target population. For example, because the target population of the NES is more than 100 times as big as the sample, many of the operations involved in collecting the NES data would take considerably more money and effort if information about every member of the PAYE system were collected. It is true that some of the operations would not be as much as 100 times as costly, but some would certainly become excessively expensive. For example, even with vast resources it would be impossible to ensure that every employee's earnings were accurately recorded. It is also very likely that it would take longer to analyse the larger amount of data, so the results would be more out-of-date when they were published. However, it is possible with care to obtain reasonably accurate information about the selected sample of employees at a reasonable cost of both time and money. For reasons like these, in most investigations the data collected from an appropriately chosen sample give more detailed, more accurate and more useful information about the whole population of interest than could be obtained for the same cost by attempting complete coverage.

Stratified Random Sample

SAMPLING DESIGN & WEIGHTING in the Original

Stratified Sampling Using Cluster Analysis: a Sample Selection Strategy for Improved Generalizations from Experiments

Stratified Random Sampling from Streaming and Stored Data

IBM SPSS Complex Samples Business Analytics

Using Sampling Matching Methods to Remove Selectivity in Survey Analysis with Categorical Data

3 Stratified Simple Random Sampling

Overview of Propensity Score Analysis

Samples Can Vary - Standard Error

Ch7 Sampling Techniques

CHAPTER 5 Sampling

A Study of Stratified Sampling in Variance Reduction Techniques for Parametric Yield Estimation

Chapter 4 Stratified Sampling