Contents Sampling Simple Random Sample

Contents
• Sampling
• Simple random sample
• Statistics
• Sample mean

Sampling
• Sampling is a branch of Statistics concerned with drawing inference about a group of objects (the population) by examining (interviewing, testing, observing, screening) a subset (the sample) of the population. In contrast, a census is an exhaustive examination of all objects in the population.
• Why sampling? Pros: speed; accuracy (fewer chances of mistakes, because there are fewer objects to deal with); and preserving material, since some sampling is destructive by nature (for example, testing the life of a computer card). Cons: inference from the sample carries an error, called the sampling error, because not all objects are in the sample.
• How to select a sample? Selection based on mood, convenience and so on is haphazard sampling. Selection based on probability (randomization) gives a random sample.
• Haphazard sampling will often lead to bias: a systematic error, so that results tend to be too high or too low. More importantly, there is no way to quantify the error from such a scheme, so haphazard sampling should be avoided.
• Random sampling has the major advantage that it is usually unbiased: there is no systematic error. More importantly, because the selection is done by randomization, we can quantify the sampling error using probability.
• We shall do an activity to compare the haphazard and the random sampling schemes.

Simple random sample
• Randomly draw an object from the population so that all objects are equally likely to be selected. Note the value of the object, and call it $X_1$. Replace the object and repeat the preceding procedure to get $X_2$, and so on until we have $n$ X's; $n$ is the sample size.
• Then $X_1, X_2, \cdots, X_n$ are jointly independent.
• What is the distribution of $X_1$?
• Suppose that there are 100 figures (rectangles), of which 20 have unit area, 50 have area 2, 20 have area 3 and 10 have area 4.
  Hence, the frequency distribution of the area of the rectangles is given below:

  area                 1     2     3     4
  relative frequency   0.2   0.5   0.2   0.1

2S39: Class Notes/ October 11, 2000

• The above frequency distribution is called the population distribution (of the area of the 100 rectangles).
• Clearly, $X_1$, the area of the first randomly selected rectangle, must be one of the four numbers $\{1, 2, 3, 4\}$.
• Because all rectangles are equally likely to be selected, $P(X_1 = 1) = 20/100 = 0.2$, the relative frequency of 1 in the population. Similarly, $P(X_1 = x)$ is the relative frequency of the rectangles with area equal to $x$.
• Hence, the probability distribution of $X_1$ is the same as the population distribution!
• What is the distribution of $X_2$? It is the same as the population distribution, and the same holds for all other X's.
• The simple random sample yields $X_1, X_2, \cdots, X_n$, which are jointly independent and identically distributed (i.i.d., or iid) as the population distribution.
• From now on, if we write $X_1, X_2, \cdots, X_n$ as iid, then the common probability distribution is called the population distribution, and the X's may be thought of as arising from simple random sampling.
• The object of sampling is to learn the population distribution! Often it suffices to know some characteristics such as the mean and the variance of the population distribution: the population mean $\mu_X = \mu$ and variance $\sigma_X^2 = \sigma^2$, where $X$ denotes the value of an object randomly chosen from the population.

Statistics
• Based on the sample $X_1, X_2, \cdots, X_n$, we may estimate the population mean $\mu$ by the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.
• A function of a random sample is called a statistic. So the sample mean is a statistic. What other statistics do you know?
• A statistic is a random variable, and hence has a pdf and the associated (probability) distribution function (also known as the sampling distribution). The sampling distribution of a statistic may be derived analytically or by simulation (Monte Carlo study).
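As an illustration of the analytical route, the sampling distribution of the sample mean can be written out exactly for a very small sample. The sketch below is only an example under stated assumptions: it uses the rectangle population above, a sample size of n = 2, and a hypothetical helper name `sampling_distribution_of_mean` that does not come from the notes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution of the rectangle areas (area -> relative frequency).
population = {
    1: Fraction(2, 10),
    2: Fraction(5, 10),
    3: Fraction(2, 10),
    4: Fraction(1, 10),
}


def sampling_distribution_of_mean(pop, n):
    """Exact distribution of the mean of n iid draws from pop (hypothetical helper)."""
    dist = defaultdict(Fraction)
    # Enumerate every ordered sample (x_1, ..., x_n); independence lets us
    # multiply the individual probabilities.
    for draws in product(pop, repeat=n):
        prob = Fraction(1)
        for x in draws:
            prob *= pop[x]
        dist[Fraction(sum(draws), n)] += prob
    return dict(dist)


dist2 = sampling_distribution_of_mean(population, 2)
for value in sorted(dist2):
    print(f"P(sample mean = {float(value)}) = {float(dist2[value])}")
```

Enumeration grows quickly with n (here 4^n ordered samples), which is one reason simulation is the more practical route for larger samples.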
• Simulation works as follows: draw a random sample and compute the statistic, then repeat the procedure, say, 1000 times. The histogram of the 1000 statistics will be close to the sampling distribution of the statistic.
• The following web-site illustrates the simulation approach: http://www.ruf.rice.edu/~lane/rvls.html

Sample mean
• From the preceding simulation exercise, it appears that the sample mean centers at the population mean, that is, $E(\bar{X}) = \mu_X$, and that the sampling distribution is increasingly concentrated with increasing sample size.
• These observations can be quantified as follows: $E(\bar{X}) = \mu_{\bar{X}} = \mu_X$ and $\mathrm{var}(\bar{X}) = \sigma_{\bar{X}}^2 = \sigma_X^2/n$, where $n$ is the sample size.
• Let the population variance be 20. To ensure that the sample mean has a variance equal to 2, we need to choose the sample size $n = 10$, because $\sigma_{\bar{X}}^2 = \sigma_X^2/n = 20/10 = 2$.
• But if it is the population standard deviation that equals 20, and we want the standard deviation of the sample mean to be 2, then $n = 10$ is not enough, as $\sigma_{\bar{X}} = \sigma_X/\sqrt{n} = 20/\sqrt{10} \approx 6.32 \neq 2$. The standard deviation shrinks only like $1/\sqrt{n}$.
• What should the sample size $n$ be to make $\sigma_{\bar{X}} = 1$ if $\sigma_X = 10$?
• In order to prove the results that, for a random sample $X_1, X_2, \cdots, X_n$, $E(\bar{X}) = \mu_X$ and $\mathrm{var}(\bar{X}) = \sigma_X^2/n$, we need to consider two general results and their generalization.
• Two results: Let $c_1$ and $c_2$ be two constants. Then $E(c_1X_1 + c_2X_2) = c_1E(X_1) + c_2E(X_2)$. This result is true whether or not $X_1$ and $X_2$ are independent.
• Proof:
  $$
  \begin{aligned}
  E(c_1X_1 + c_2X_2) &= \sum_{x_1}\sum_{x_2} (c_1x_1 + c_2x_2)\, f_{X_1,X_2}(x_1, x_2) \\
  &= \sum_{x_1}\sum_{x_2} c_1x_1\, f_{X_1,X_2}(x_1, x_2) + \sum_{x_1}\sum_{x_2} c_2x_2\, f_{X_1,X_2}(x_1, x_2) \\
  &= \sum_{x_1} c_1x_1 \sum_{x_2} f_{X_1,X_2}(x_1, x_2) + \sum_{x_2} c_2x_2 \sum_{x_1} f_{X_1,X_2}(x_1, x_2) \\
  &= \sum_{x_1} c_1x_1\, f_{X_1}(x_1) + \sum_{x_2} c_2x_2\, f_{X_2}(x_2) \\
  &= c_1E(X_1) + c_2E(X_2).
  \end{aligned}
  $$
• Example: if $X_1$ and $X_2$ have means 5 and 9, then $E(5X_1 - X_2) = 5E(X_1) - E(X_2) = 5 \cdot 5 - 9 = 16$.
• Recall the result that if $X_1$ and $X_2$ are independent of each other, then $\mathrm{var}(c_1X_1 + c_2X_2) = c_1^2\,\mathrm{var}(X_1) + c_2^2\,\mathrm{var}(X_2)$.
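The expectation and variance rules for a linear combination can be checked exactly on a small case. The sketch below is an illustration only: it assumes two independent draws from the rectangle population used earlier in these notes, takes the constants c1 = 5 and c2 = -1 from the expectation example, and uses a hypothetical helper named `expectation`.

```python
from fractions import Fraction
from itertools import product

# Rectangle population from the notes (area -> relative frequency).
population = {
    1: Fraction(1, 5),
    2: Fraction(1, 2),
    3: Fraction(1, 5),
    4: Fraction(1, 10),
}

# Joint pmf of (X1, X2) under the ASSUMPTION that the two draws are independent.
joint = {
    (x1, x2): population[x1] * population[x2]
    for x1, x2 in product(population, repeat=2)
}


def expectation(g, pmf):
    """E[g(X1, X2)] computed by summing over the joint pmf (hypothetical helper)."""
    return sum(g(x1, x2) * p for (x1, x2), p in pmf.items())


c1, c2 = 5, -1  # constants borrowed from the expectation example

mu = expectation(lambda x1, x2: x1, joint)               # E(X1) = E(X2)
var = expectation(lambda x1, x2: (x1 - mu) ** 2, joint)  # var(X1) = var(X2)

lin_mean = expectation(lambda x1, x2: c1 * x1 + c2 * x2, joint)
lin_var = expectation(lambda x1, x2: (c1 * x1 + c2 * x2 - lin_mean) ** 2, joint)

print("E(c1 X1 + c2 X2)       =", lin_mean, " rule gives", c1 * mu + c2 * mu)
print("var(c1 X1 + c2 X2)     =", lin_var, " rule gives", c1**2 * var + c2**2 * var)
```

Exact fractions are used so that the two sides of each rule can be compared without rounding error.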
• Example: if $X_1$ and $X_2$ are independent with identical mean $\mu_X = 5$ and variance $\sigma_X^2 = 10$, then $Y = (X_1 + X_2)/2$ has mean $(5 + 5)/2 = 5 = \mu_X$ and variance $\mathrm{var}(X_1 + X_2)/4 = [\mathrm{var}(X_1) + \mathrm{var}(X_2)]/4 = 20/4 = 5 = \sigma_X^2/2$.
• The sum $c_1X_1 + c_2X_2$ is called a linear combination of $X_1$ and $X_2$.
• These two results can be extended to the case of more than two random variables. In words, the expectation of a sum of random variables is the sum of the expectations. If the random variables are independent, then the variance of the sum is the sum of the variances.
• Using this generalization, we get the two main results: the sample mean centers at the population mean, and the variance of the (sampling distribution of the) sample mean equals the population variance divided by the sample size.
• Furthermore, if $X_1, X_2, \ldots, X_n$ are independent and normally distributed, but not necessarily identically distributed, then any linear combination of the X's is normally distributed.
• In particular, if the population distribution is $N(\mu_X, \sigma_X^2)$, then the sample mean based on a random sample of size $n$ from the population is $N(\mu_X, \sigma_X^2/n)$.
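The two main results for the sample mean can also be checked with the Monte Carlo procedure described in these notes (draw a sample, compute the statistic, repeat 1000 times). The following is a minimal sketch, assuming the rectangle population from the earlier example; the function name `simulate_sample_means`, the fixed seed and the sample sizes 2, 10 and 50 are illustrative choices, not part of the notes.

```python
import random
import statistics

# Rectangle population from the notes: areas and their relative frequencies.
areas = [1, 2, 3, 4]
weights = [0.2, 0.5, 0.2, 0.1]

# Population mean and variance computed directly from the distribution.
pop_mean = sum(a * w for a, w in zip(areas, weights))
pop_var = sum((a - pop_mean) ** 2 * w for a, w in zip(areas, weights))


def simulate_sample_means(n, reps=1000, seed=0):
    """Return `reps` sample means, each from a simple random sample of size n.

    Sampling is with replacement, so the n draws are iid from the population.
    The seed is fixed only to make the illustration reproducible.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(areas, weights=weights, k=n))
        for _ in range(reps)
    ]


for n in (2, 10, 50):
    means = simulate_sample_means(n)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f} "
        f"(mu={pop_mean:.3f})  variance of sample means="
        f"{statistics.pvariance(means):.4f} (sigma^2/n={pop_var / n:.4f})"
    )
```

With 1000 repetitions the simulated values should be close to, but not exactly equal to, the population mean and the population variance divided by n; a histogram of `means` would approximate the sampling distribution.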
