2.3 Simple Random Sampling

• Simple random sampling without replacement (srswor) of size n is the probability sampling design in which a fixed number n of units is selected from a population of N units without replacement, such that every possible sample of n units has equal probability of being selected. A resulting sample is called a simple random sample or srs.
• Note: I will use SRS to denote a simple random sample and SR as an abbreviation of 'simple random'.
• Some necessary combinatorial notation:
  - (n factorial) $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$: This is the number of unique arrangements or orderings (or permutations) of n distinct items. For example: $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.
  - (N choose n) $\binom{N}{n} = \dfrac{N(N-1)\cdots(N-n+1)}{n!} = \dfrac{N!}{n!\,(N-n)!}$: This is the number of combinations of n items selected from N distinct items (the order of selection doesn't matter). For example, $\binom{6}{2} = \dfrac{6!}{2!\,4!} = \dfrac{(6)(5)(4!)}{2!\,4!} = \dfrac{(6)(5)}{(2)(1)} = 15$.
• There are $\binom{N}{n}$ possible SRSs of size n selected from a population of size N.
• For any SRS of size n from a population of size N, we have $P(S) = 1 / \binom{N}{n}$.
• Unless otherwise specified, we will assume sampling is without replacement.

2.3.1 Estimation of $\bar{y}_U$ and $t$

• A natural estimator for the population mean $\bar{y}_U$ is the sample mean $\bar{y}$. Because $\bar{y}$ is an estimate of an individual unit's y-value, multiplication by the population size N will give us an estimate $\hat{t}$ of the population total t. That is:

$$\widehat{\bar{y}}_U = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad\qquad \hat{t} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i \tag{10}$$

• $\widehat{\bar{y}}_U$ and $\hat{t}$ are design unbiased. That is, the average values of $\bar{y}$ and $N\bar{y}$ taken over all possible SRSs equal $\bar{y}_U$ and $t$, respectively.

Demonstration of Unbiasedness: Suppose we have a population consisting of five y-values:

| Unit $i$ | 1 | 2 | 3 | 4 | 5 |
|----------|---|---|---|---|---|
| $y_i$    | 0 | 2 | 3 | 4 | 7 |

which has the following parameters:

$$N = 5 \qquad t = 16 \qquad \bar{y}_U = 3.2 \qquad S^2 = 6.7 \qquad S \approx 2.588$$

Suppose a SRS of size n = 2 is selected. Then $P(S) = 1 / \binom{5}{2} = 1/10$ for each of the 10 possible SRSs.
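The counting facts and population parameters above can be checked with a short Python sketch. It uses only the standard library; the variable names and the random seed are illustrative assumptions, not part of the notes.

```python
import itertools
import math
import random
import statistics

# Example population from the notes: unit label -> y-value.
population = {1: 0, 2: 2, 3: 3, 4: 4, 5: 7}
N, n = len(population), 2

# Population parameters.
y_values = list(population.values())
t = sum(y_values)                    # population total
ybar_U = t / N                       # population mean
S2 = statistics.variance(y_values)   # population variance with divisor N - 1
S = math.sqrt(S2)

# Every possible srswor of size n, as unordered pairs of unit labels.
all_samples = list(itertools.combinations(population, n))
num_samples = math.comb(N, n)        # N choose n
prob_each = 1 / num_samples          # P(S) for any single SRS

# Draw one SRS: random.sample selects units without replacement.
rng = random.Random(2024)            # seed chosen only for reproducibility
one_srs = sorted(rng.sample(sorted(population), n))

print("N, t, mean, S^2, S:", N, t, ybar_U, S2, round(S, 3))
print("number of SRSs:", num_samples, " P(S):", prob_each)
print("all samples:", all_samples)
print("one random SRS:", one_srs)
```

Listing `all_samples` gives the same ten unit pairs that appear in the table of all possible samples.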
All Possible Samples and Statistics from Example Population

| Sample | Units | y-values | $\sum y_i$ | $\widehat{\bar{y}}_U = \bar{y}$ | $\hat{t} = N\bar{y}$ | $\widehat{S^2} = s^2$ | $\widehat{S} = s$ |
|--------|-------|----------|-----------|------|-------|------|--------|
| S1     | 1,2   | 0,2      | 2         | 1    | 5     | 2    | 1.4142 |
| S2     | 1,3   | 0,3      | 3         | 1.5  | 7.5   | 4.5  | 2.1213 |
| S3     | 1,4   | 0,4      | 4         | 2    | 10    | 8    | 2.8284 |
| S4     | 1,5   | 0,7      | 7         | 3.5  | 17.5  | 24.5 | 4.9497 |
| S5     | 2,3   | 2,3      | 5         | 2.5  | 12.5  | 0.5  | 0.7071 |
| S6     | 2,4   | 2,4      | 6         | 3    | 15    | 2    | 1.4142 |
| S7     | 2,5   | 2,7      | 9         | 4.5  | 22.5  | 12.5 | 3.5355 |
| S8     | 3,4   | 3,4      | 7         | 3.5  | 17.5  | 0.5  | 0.7071 |
| S9     | 3,5   | 3,7      | 10        | 5    | 25    | 8    | 2.8284 |
| S10    | 4,5   | 4,7      | 11        | 5.5  | 27.5  | 4.5  | 2.1213 |
| Column sum |   |          |           | 32   | 160   | 67   | 22.6274 |
| Expected value (sum/10) | | |          | 3.2  | 16    | 6.7  | 2.26274 |
| E(estimator) |  |         |           | $= \bar{y}_U$ | $= t$ | $= S^2$ | $\neq S$ |

The averages for estimators $\widehat{\bar{y}}_U = \bar{y}$, $\hat{t} = N\bar{y}$, and $\widehat{S^2} = s^2$ equal the parameters that they are estimating. This implies that $\bar{y}$, $N\bar{y}$, and $s^2$ are unbiased estimators of $\bar{y}_U$, $t$, and $S^2$.
Notation: $E(\widehat{\bar{y}}_U) = \bar{y}_U$, $E(\hat{t}) = t$, $E(\widehat{S^2}) = S^2$, or $E(\bar{y}) = \bar{y}_U$, $E(N\bar{y}) = t$, $E(s^2) = S^2$.

The average for estimator $\widehat{S} = s$ does not equal the parameter $S$. This implies that $s$ is a biased estimator of $S$.
Notation: $E(\widehat{S}) \neq S$ or $E(s) \neq S$.

• The next problem is to study the variances of $\widehat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$.
• Warning: In an introductory statistics course, you were told that the variance of the sample mean is $V(\bar{Y}) = S^2/n$ $(= \sigma^2/n)$ and its standard deviation is $S/\sqrt{n}$ $(= \sigma/\sqrt{n})$. This is appropriate if the sample is taken from an infinite or extremely large population.
• However, we are dealing with finite populations that often are not considered extremely large. In such cases, we have to adjust our variance formulas by $\dfrac{N-n}{N}$, which is known as the finite population correction (f.p.c.).
• Texts may rewrite the f.p.c. as either $1 - \dfrac{n}{N}$ or $1 - f$, where $f = n/N$ is the fraction of the population that was sampled.
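The table and the need for the finite population correction can both be checked by brute force. The sketch below (standard-library Python; the helper name `expectation` is an assumption made for illustration) averages each statistic over the 10 equally likely samples, then compares the actual design variance of $\bar{y}$ with the uncorrected $S^2/n$ and with the corrected value $\frac{N-n}{N}\cdot\frac{S^2}{n}$.

```python
import itertools
import math
import statistics

y = [0, 2, 3, 4, 7]                  # example population from the notes
N, n = len(y), 2
t = sum(y)
ybar_U = t / N
S2 = statistics.variance(y)          # divisor N - 1
S = math.sqrt(S2)

# All 10 equally likely SRSs of size 2 (as tuples of y-values).
samples = list(itertools.combinations(y, n))

def expectation(stat):
    """Average of a statistic over all equally likely samples."""
    return sum(stat(s) for s in samples) / len(samples)

E_ybar = expectation(statistics.mean)
E_that = expectation(lambda s: N * statistics.mean(s))
E_s2 = expectation(statistics.variance)
E_s = expectation(statistics.stdev)

# Design variance of the sample mean, computed directly from its definition.
V_ybar = expectation(lambda s: (statistics.mean(s) - ybar_U) ** 2)
V_with_fpc = (N - n) / N * S2 / n    # corrected formula
V_no_fpc = S2 / n                    # infinite-population formula

print("E(ybar), E(t-hat), E(s^2):", E_ybar, E_that, E_s2)
print("E(s) vs S:", round(E_s, 5), round(S, 5))
print("V(ybar): direct, with f.p.c., without f.p.c.:",
      round(V_ybar, 4), round(V_with_fpc, 4), round(V_no_fpc, 4))
```

Up to floating-point rounding, the directly computed variance should agree with the corrected formula and not with $S^2/n$, which is larger for this small population.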
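By definition:

$$V(\widehat{\bar{y}}_U) = V(\bar{y}) = \frac{N-n}{N}\,\frac{S^2}{n} \qquad\qquad V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\,\frac{S^2}{n} \tag{11}$$

• Because $S^2$ is unknown, we use $s^2$ to get unbiased estimators of the variances in (11):

$$\widehat{V}(\widehat{\bar{y}}_U) = \widehat{V}(\bar{y}) = \frac{N-n}{N}\,\frac{s^2}{n} \qquad\qquad \widehat{V}(\hat{t}) = N^2 \widehat{V}(\bar{y}) = N(N-n)\,\frac{s^2}{n} \tag{12}$$

• Taking a square root of a variance in (11) yields the standard deviation of the estimator.
• Taking a square root of an estimated variance in (12) yields the standard error of the estimate.
• Thus, for the example population with N = 5 and n = 2, $V(\bar{y}) = \dfrac{N-n}{N}\,\dfrac{S^2}{n} = \dfrac{3}{5}\cdot\dfrac{6.7}{2} = 2.01$ and $V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\,\dfrac{S^2}{n} = (5)(3)\,\dfrac{6.7}{2} = 50.25$.
• Like $\widehat{\bar{y}}_U$ and $\hat{t}$, the variance estimators $\widehat{V}(\widehat{\bar{y}}_U)$ and $\widehat{V}(\hat{t})$ are design unbiased. That is, the averages of $\widehat{V}(\widehat{\bar{y}}_U)$ and $\widehat{V}(\hat{t})$ taken over all possible SRSs equal $V(\widehat{\bar{y}}_U) = 2.01$ and $V(\hat{t}) = 50.25$, respectively.
• For the estimated variances we have $\widehat{V}(\widehat{\bar{y}}_U) = \dfrac{N-n}{N}\,\dfrac{s^2}{n} = \dfrac{3}{5}\cdot\dfrac{s^2}{2} = 0.3\,s^2$ and $\widehat{V}(\hat{t}) = N(N-n)\,\dfrac{s^2}{n} = (5)(3)\,\dfrac{s^2}{2} = 7.5\,s^2$, where $s^2$ is a particular sample variance.

Example: We will use our population from the previous example:

| Unit $i$ | 1 | 2 | 3 | 4 | 5 |
|----------|---|---|---|---|---|
| $y_i$    | 0 | 2 | 3 | 4 | 7 |

which has the following parameters:

$$N = 5 \qquad t = 16 \qquad \bar{y}_U = 3.2 \qquad S^2 = 6.7 \qquad S \approx 2.588$$

Estimated Variances of $\widehat{\bar{y}}_U$ and $\hat{t}$ for All Samples

| Sample | Units | y-values | $s^2$ | $\widehat{V}(\widehat{\bar{y}}_U) = 0.3\,s^2$ | $\widehat{V}(\hat{t}) = 7.5\,s^2$ |
|--------|-------|----------|-------|------|--------|
| S1     | 1,2   | 0,2      | 2     | 0.6  | 15     |
| S2     | 1,3   | 0,3      | 4.5   | 1.35 | 33.75  |
| S3     | 1,4   | 0,4      | 8     | 2.4  | 60     |
| S4     | 1,5   | 0,7      | 24.5  | 7.35 | 183.75 |
| S5     | 2,3   | 2,3      | 0.5   | 0.15 | 3.75   |
| S6     | 2,4   | 2,4      | 2     | 0.6  | 15     |
| S7     | 2,5   | 2,7      | 12.5  | 3.75 | 93.75  |
| S8     | 3,4   | 3,4      | 0.5   | 0.15 | 3.75   |
| S9     | 3,5   | 3,7      | 8     | 2.4  | 60     |
| S10    | 4,5   | 4,7      | 4.5   | 1.35 | 33.75  |
| Column sum |   |          | 67    | 20.1 | 502.5  |

• From the table we have $E(\widehat{V}(\widehat{\bar{y}}_U)) = 20.1/10 = 2.01 = V(\widehat{\bar{y}}_U)$ and $E(\widehat{V}(\hat{t})) = 502.5/10 = 50.25 = V(\hat{t})$. Thus, we see that both variance estimators are unbiased.
• If N is large relative to n, then the finite population correction (f.p.c.) will be close to (but less than) 1. Omitting the finite population correction from the variance formulas (i.e., replacing $(N-n)/N$ with 1) will slightly overestimate the true variance. That is, there is a small positive bias. I personally would not recommend omitting the f.p.c.
• If N is not large relative to n, then omitting the f.p.c.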
from the variance formulas can seriously overestimate the true variance. That is, there can be a large positive bias.
• As $n \to N$, $\dfrac{N-n}{N} \to 0$. That is, as the sample size approaches the population size, the f.p.c. approaches 0. Thus, in (11) and (12) the variances $\to 0$ as $n \to N$.

2.3.2 SRS With Replacement

• Consider a sampling procedure in which a sampling unit is randomly selected from the population, its y-value is recorded, and the unit is then returned to the population. This process of randomly selecting units with replacement after each stage is repeated n times. Thus, a sampling unit may be sampled multiple times. A sample of n units selected by such a procedure is called a simple random sample with replacement.
• The estimators for SRS with replacement are:

$$\widehat{\bar{y}}_U = \bar{y} \qquad\qquad \widehat{V}(\widehat{\bar{y}}_U) = \widehat{V}(\bar{y}) = \frac{s^2}{n}$$

• Suppose we have two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of some parameter $\theta$. $\hat{\theta}_1$ is less efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) > V(\hat{\theta}_2)$. $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$.
• For most situations, the estimator for a SRS with replacement is less efficient than the estimator for a SRS without replacement.
• There will be circumstances (such as sampling proportional to size) where we will consider sampling with replacement. Unless otherwise stated, we assume that sampling is done without replacement.

2.4 Two-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• In an introductory statistics course, you were given confidence interval formulas

$$\bar{y} \pm z^{*}\,\frac{s}{\sqrt{n}} \qquad \text{and} \qquad \bar{y} \pm t^{*}\,\frac{s}{\sqrt{n}} \tag{13}$$

These formulas are applicable if the sample is taken from an infinite or extremely large population. But when we are dealing with finite populations, we adjust our variance formulas by the finite population correction.
• In the finite population version of the Central Limit Theorem, we assume the estimators $\widehat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$ have sampling distributions that are approximately normal.
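That is,

$$\widehat{\bar{y}}_U \;\dot{\sim}\; N\!\left(\bar{y}_U,\; \frac{N-n}{N}\,\frac{S^2}{n}\right) \qquad \text{and} \qquad \hat{t} \;\dot{\sim}\; N\!\left(t,\; N(N-n)\,\frac{S^2}{n}\right)$$

• For large samples, approximate $100(1-\alpha)\%$ confidence intervals for $\bar{y}_U$ ($\mu$) and $t$ ($\tau$) are

$$\text{For } \bar{y}_U:\quad \bar{y} \pm z^{*}\sqrt{\frac{N-n}{N}\,\frac{s^2}{n}} \qquad\qquad \text{For } t:\quad N\bar{y} \pm z^{*}\sqrt{N(N-n)\,\frac{s^2}{n}} \tag{14}$$

or, equivalently,

$$\bar{y} \pm z^{*}\,s\,\sqrt{\frac{N-n}{N\,n}} \qquad\qquad N\bar{y} \pm z^{*}\,s\,\sqrt{\frac{N(N-n)}{n}} \tag{15}$$

where $z^{*}$ is the upper $\alpha/2$ critical value from the standard normal distribution.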
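As a minimal sketch of how the intervals in (14) might be computed, the Python function below applies the finite population correction to the estimated variance and uses the standard normal critical value from `statistics.NormalDist`. The function name, the population size of 400, and the simulated sample are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def srs_confidence_intervals(sample, N, conf_level=0.95):
    """Approximate large-sample CIs for the population mean and total
    from an SRS without replacement, following formula (14)."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s2 = statistics.variance(sample)                 # sample variance, divisor n - 1
    alpha = 1 - conf_level
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 critical value
    se_mean = math.sqrt((N - n) / N * s2 / n)        # standard error of ybar
    se_total = math.sqrt(N * (N - n) * s2 / n)       # standard error of N * ybar
    mean_ci = (ybar - z * se_mean, ybar + z * se_mean)
    total_ci = (N * ybar - z * se_total, N * ybar + z * se_total)
    return mean_ci, total_ci

# Hypothetical data: 40 simulated y-values standing in for an SRS
# drawn from a population of 400 units.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(40)]
N_pop = 400

mean_ci, total_ci = srs_confidence_intervals(sample, N_pop)
print("95% CI for the mean: ", mean_ci)
print("95% CI for the total:", total_ci)
```

Because the standard error of $N\bar{y}$ is N times that of $\bar{y}$, the interval for the total is the interval for the mean scaled by N. If the sample were a full census (n = N), both standard errors would be 0 and the intervals would collapse to single points.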
