
2.3 Simple Random Sampling

• Simple random sampling without replacement (srswor) of size $n$ is the probability sampling design in which a fixed number $n$ of units is selected from a population of $N$ units without replacement, such that every possible sample of $n$ units has an equal probability of being selected. A resulting sample is called a simple random sample or srs.
• Note: I will use SRS to denote a simple random sample and SR as an abbreviation of `simple random'.
• Some necessary combinatorial notation:
  - ($n$ factorial) $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$: This is the number of unique arrangements or orderings (or permutations) of $n$ distinct items. For example: $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.
  - ($N$ choose $n$) $\binom{N}{n} = \dfrac{N(N-1)\cdots(N-n+1)}{n!} = \dfrac{N!}{n!(N-n)!}$: This is the number of combinations of $n$ items selected from $N$ distinct items (the order of selection doesn't matter). For example, $\binom{6}{2} = \dfrac{6!}{2!\,4!} = \dfrac{(6)(5)(4!)}{2!\,4!} = \dfrac{(6)(5)}{(2)(1)} = 15$.
• There are $\binom{N}{n}$ possible SRSs of size $n$ selected from a population of size $N$.
• For any SRS $S$ of size $n$ from a population of size $N$, we have $P(S) = 1/\binom{N}{n}$.
• Unless otherwise specified, we will assume sampling is without replacement.

2.3.1 Estimation of $\bar{y}_U$ and $t$

• A natural estimator for the population mean $\bar{y}_U$ is the sample mean $\bar{y}$. Because $\bar{y}$ is an estimate of an individual unit's y-value, multiplication by the population size $N$ will give us an estimate $\hat{t}$ of the population total $t$. That is:

$$\widehat{\bar{y}_U} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad\qquad \hat{t} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i \tag{10}$$

• $\widehat{\bar{y}_U}$ and $\hat{t}$ are design unbiased. That is, the average values of $\bar{y}$ and $N\bar{y}$ taken over all possible SRSs equal $\bar{y}_U$ and $t$, respectively.

Demonstration of Unbiasedness: Suppose we have a population consisting of five y-values:

    Unit i    1   2   3   4   5
    y_i       0   2   3   4   7

which has the following parameters:

    N = 5    t = 16    $\bar{y}_U$ = 3.2    $S^2$ = 6.7    $S \approx 2.588$

Suppose an SRS of size $n = 2$ is selected. Then $P(S) = 1/\binom{5}{2} = 1/10$ for each of the 10 possible SRSs.
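The count of possible samples and the selection probability for this example can be checked numerically. The following is a minimal Python sketch and not part of the original notes; the helper name `draw_srs` and the seed value are assumptions made only for illustration.

```python
import math
import random
from itertools import combinations

# Example population from the text: unit label -> y-value.
population = {1: 0, 2: 2, 3: 3, 4: 4, 5: 7}
N = len(population)   # population size
n = 2                 # sample size

# Number of possible SRSs (N choose n) and the probability of each one.
num_samples = math.comb(N, n)
prob_each = 1 / num_samples

# Every possible sample, listed as tuples of unit labels.
all_samples = list(combinations(sorted(population), n))


def draw_srs(units, n, seed=None):
    """Draw one SRS without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # random.sample selects n distinct items, each subset equally likely.
    return sorted(rng.sample(sorted(units), n))


sample = draw_srs(population, n, seed=1)
print("number of possible SRSs:", num_samples)
print("P(S) for each sample:", prob_each)
print("all samples:", all_samples)
print("one randomly drawn SRS:", sample)
```

Listing `all_samples` produces the same ten unit pairs that label the rows S1 through S10 of the demonstration.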
All Possible Samples and Statistics from Example Population

    Sample   Units   y-values   $\sum y_i$   $\widehat{\bar{y}_U} = \bar{y}$   $\hat{t} = N\bar{y}$   $\hat{S}^2 = s^2$   $\hat{S} = s$
    S1       1,2     0,2         2           1                                  5                     2                   1.4142
    S2       1,3     0,3         3           1.5                                7.5                   4.5                 2.1213
    S3       1,4     0,4         4           2                                 10                     8                   2.8284
    S4       1,5     0,7         7           3.5                               17.5                  24.5                 4.9497
    S5       2,3     2,3         5           2.5                               12.5                   0.5                 0.7071
    S6       2,4     2,4         6           3                                 15                     2                   1.4142
    S7       2,5     2,7         9           4.5                               22.5                  12.5                 3.5355
    S8       3,4     3,4         7           3.5                               17.5                   0.5                 0.7071
    S9       3,5     3,7        10           5                                 25                     8                   2.8284
    S10      4,5     4,7        11           5.5                               27.5                   4.5                 2.1213
    Column Sum                               32                               160                    67                  22.6274
    Expected value                           32/10 = 3.2                      160/10 = 16            67/10 = 6.7         22.6274/10 = 2.26274
    E(estimator)                             $= \bar{y}_U$                    $= t$                  $= S^2$             $\neq S$

The averages for the estimators $\widehat{\bar{y}_U} = \bar{y}$, $\hat{t} = N\bar{y}$, and $\hat{S}^2 = s^2$ equal the parameters that they are estimating. This implies that $\bar{y}$, $N\bar{y}$, and $s^2$ are unbiased estimators of $\bar{y}_U$, $t$, and $S^2$.

Notation: $E(\widehat{\bar{y}_U}) = \bar{y}_U$, $E(\hat{t}) = t$, $E(\hat{S}^2) = S^2$ or $E(\bar{y}) = \bar{y}_U$, $E(N\bar{y}) = t$, $E(s^2) = S^2$.

The average for the estimator $\hat{S} = s$ does not equal the parameter $S$. This implies that $s$ is a biased estimator of $S$.

Notation: $E(\hat{S}) \neq S$ or $E(s) \neq S$.

• The next problem is to study the variances of $\widehat{\bar{y}_U} = \bar{y}$ and $\hat{t} = N\bar{y}$.
• Warning: In an introductory statistics course, you were told that the variance of the sample mean is $V(\bar{Y}) = S^2/n$ $(= \sigma^2/n)$ and its standard deviation is $S/\sqrt{n}$ $(= \sigma/\sqrt{n})$. This is appropriate if a sample was to be taken from an infinite or extremely large population.
• However, we are dealing with finite populations that often are not considered extremely large. In such cases, we have to adjust our variance formulas by $\dfrac{N-n}{N}$, which is known as the finite population correction (f.p.c.).
• Texts may rewrite the f.p.c. as either $1 - \dfrac{n}{N}$ or $1 - f$, where $f = n/N$ is the fraction of the population that was sampled.
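The unbiasedness demonstration above can be reproduced by brute force. The following is a minimal Python sketch and not part of the original notes; the variable names are illustrative assumptions. It enumerates all 10 samples of size 2 from the example population and averages each estimator over them.

```python
from itertools import combinations
from statistics import mean, stdev, variance

# Example population y-values from the text.
y = [0.0, 2.0, 3.0, 4.0, 7.0]
N, n = len(y), 2

# Population parameters (variance and stdev use the N - 1 divisor).
ybar_U = mean(y)
t = sum(y)
S2 = variance(y)
S = stdev(y)

# All possible SRSs of size n; each has probability 1 / C(N, n),
# so an expected value is a plain average over the samples.
samples = list(combinations(y, n))

E_ybar = mean(mean(s) for s in samples)       # average of the sample means
E_that = mean(N * mean(s) for s in samples)   # average of the estimated totals
E_s2 = mean(variance(s) for s in samples)     # average of the sample variances
E_s = mean(stdev(s) for s in samples)         # average of the sample std. deviations

print(f"E(ybar) = {E_ybar:.4f}   ybar_U = {ybar_U:.4f}")
print(f"E(that) = {E_that:.4f}   t      = {t:.4f}")
print(f"E(s^2)  = {E_s2:.4f}   S^2    = {S2:.4f}")
print(f"E(s)    = {E_s:.4f}   S      = {S:.4f}   (biased: E(s) < S)")
```

The first three averages match their parameters, while the average of $s$ falls below $S$, which is the bias noted in the text.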
• By definition:

$$V(\widehat{\bar{y}_U}) = V(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n} \qquad\qquad V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\frac{S^2}{n} \tag{11}$$

• Because $S^2$ is unknown, we use $s^2$ to get unbiased estimators of the variances in (11):

$$\hat{V}(\widehat{\bar{y}_U}) = \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} \qquad\qquad \hat{V}(\hat{t}) = N^2 \hat{V}(\bar{y}) = N(N-n)\frac{s^2}{n} \tag{12}$$

• Taking a square root of a variance in (11) yields the standard deviation of the estimator.
• Taking a square root of an estimated variance in (12) yields the standard error of the estimate.
• Thus, for the example population with $n = 2$, $V(\bar{y}) = \left(\dfrac{N-n}{N}\right)\dfrac{S^2}{n} = \dfrac{3}{5}\cdot\dfrac{6.7}{2} = 2.01$ and $V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\dfrac{S^2}{n} = (5)(3)\dfrac{6.7}{2} = 50.25$.
• Like $\widehat{\bar{y}_U}$ and $\hat{t}$, the variance estimators $\hat{V}(\widehat{\bar{y}_U})$ and $\hat{V}(\hat{t})$ are design unbiased. That is, the averages of $\hat{V}(\widehat{\bar{y}_U})$ and $\hat{V}(\hat{t})$ taken over all possible SRSs equal $V(\widehat{\bar{y}_U}) = 2.01$ and $V(\hat{t}) = 50.25$, respectively.
• For the estimated variances we have $\hat{V}(\widehat{\bar{y}_U}) = \left(\dfrac{N-n}{N}\right)\dfrac{s^2}{n} = \dfrac{3}{5}\cdot\dfrac{s^2}{2} = 0.3s^2$ and $\hat{V}(\hat{t}) = N(N-n)\dfrac{s^2}{n} = (5)(3)\dfrac{s^2}{2} = 7.5s^2$, where $s^2$ is a particular sample variance.

Example: We will use our population from the previous example:

    Unit i    1   2   3   4   5
    y_i       0   2   3   4   7

which has the following parameters:

    N = 5    t = 16    $\bar{y}_U$ = 3.2    $S^2$ = 6.7    $S \approx 2.588$

Estimated Variances of $\widehat{\bar{y}_U}$ and $\hat{t}$ for All Samples

    Sample   Units   y-values   $s^2$   $\hat{V}(\widehat{\bar{y}_U}) = 0.3s^2$   $\hat{V}(\hat{t}) = 7.5s^2$
    S1       1,2     0,2         2       0.6                                       15
    S2       1,3     0,3         4.5     1.35                                      33.75
    S3       1,4     0,4         8       2.4                                       60
    S4       1,5     0,7        24.5     7.35                                     183.75
    S5       2,3     2,3         0.5     0.15                                       3.75
    S6       2,4     2,4         2       0.6                                       15
    S7       2,5     2,7        12.5     3.75                                      93.75
    S8       3,4     3,4         0.5     0.15                                       3.75
    S9       3,5     3,7         8       2.4                                       60
    S10      4,5     4,7         4.5     1.35                                      33.75
    Column Sum                  67      20.1                                      502.5

• From the table we have $E(\hat{V}(\widehat{\bar{y}_U})) = 20.1/10 = 2.01 = V(\widehat{\bar{y}_U})$ and $E(\hat{V}(\hat{t})) = 502.5/10 = 50.25 = V(\hat{t})$. Thus, we see that both variance estimators are unbiased.
• If $N$ is large relative to $n$, then the finite population correction (f.p.c.) will be close to (but less than) 1. Omitting the finite population correction from the variance formulas (i.e., replacing $(N-n)/N$ with 1) will slightly overestimate the true variance. That is, there is a small positive bias. I personally would not recommend omitting the f.p.c.
• If $N$ is not large relative to $n$, then omitting the f.p.c. from the variance formulas can seriously overestimate the true variance. That is, there can be a large positive bias.
• As $n \to N$, $\dfrac{N-n}{N} \to 0$. That is, as the sample size approaches the population size, the f.p.c. approaches 0. Thus, in (11) and (12) the variances $\to 0$ as $n \to N$.

2.3.2 SRS With Replacement

• Consider a sampling procedure in which a sampling unit is randomly selected from the population, its y-value is recorded, and the unit is then returned to the population. This process of randomly selecting units with replacement after each stage is repeated $n$ times. Thus, a sampling unit may be sampled multiple times. A sample of $n$ units selected by such a procedure is called a simple random sample with replacement.
• The estimators for SRS with replacement are:

$$\widehat{\bar{y}_U} = \bar{y} \qquad\qquad \hat{V}(\widehat{\bar{y}_U}) = \hat{V}(\bar{y}) = \frac{s^2}{n}$$

• Suppose we have two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of some parameter $\theta$. $\hat{\theta}_1$ is less efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) > V(\hat{\theta}_2)$. $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$.
• For most situations, the estimator for an SRS with replacement is less efficient than the estimator for an SRS without replacement.
• There will be circumstances (such as sampling proportional to size) where we will consider sampling with replacement. Unless otherwise stated, we assume that sampling is done without replacement.

2.4 Two-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• In an introductory statistics course, you were given the confidence interval formulas

$$\bar{y} \pm z^* \frac{s}{\sqrt{n}} \qquad \text{and} \qquad \bar{y} \pm t^* \frac{s}{\sqrt{n}} \tag{13}$$

These formulas are applicable if a sample was to be taken from an infinite or extremely large population. But when we are dealing with finite populations, we adjust our variance formulas by the finite population correction.
• In the finite population version of the Central Limit Theorem, we assume the estimators $\widehat{\bar{y}_U} = \bar{y}$ and $\hat{t} = N\bar{y}$ have sampling distributions that are approximately normal.
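That is,

$$\widehat{\bar{y}_U} \;\dot{\sim}\; N\!\left(\bar{y}_U,\; \left(\frac{N-n}{N}\right)\frac{S^2}{n}\right) \qquad \text{and} \qquad \hat{t} \;\dot{\sim}\; N\!\left(t,\; N(N-n)\frac{S^2}{n}\right)$$

• For large samples, approximate $100(1-\alpha)\%$ confidence intervals for $\bar{y}_U$ ($\mu$) and $t$ ($\tau$) are

$$\text{For } \bar{y}_U:\;\; \bar{y} \pm z^* \sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}} \qquad\qquad \text{For } t:\;\; N\bar{y} \pm z^* \sqrt{N(N-n)\frac{s^2}{n}} \tag{14}$$

or, equivalently,

$$\bar{y} \pm z^* s \sqrt{\left(\frac{N-n}{N}\right)\Big/ n} \qquad\qquad N\bar{y} \pm z^* s \sqrt{N(N-n)/n} \tag{15}$$

where $z^*$ is the upper $\alpha/2$ critical value from the standard normal distribution.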
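The intervals in (14) can be computed directly from a sample. The following is a minimal Python sketch and not part of the original notes: the function name `srs_confidence_intervals`, the population size `N = 200`, and the simulated sample are all made-up assumptions used only to exercise the formulas.

```python
import math
import random
from statistics import NormalDist, mean, variance


def srs_confidence_intervals(sample, N, conf=0.95):
    """Approximate large-sample CIs for the mean and total under SRS
    without replacement, using the f.p.c. (hypothetical helper)."""
    n = len(sample)
    ybar = mean(sample)
    s2 = variance(sample)                           # sample variance, n - 1 divisor
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)    # upper alpha/2 critical value
    se_mean = math.sqrt((N - n) / N * s2 / n)       # sqrt of estimated V(ybar), eq. (12)
    se_total = math.sqrt(N * (N - n) * s2 / n)      # sqrt of estimated V(t-hat), eq. (12)
    return {
        "mean": (ybar - z * se_mean, ybar + z * se_mean),
        "total": (N * ybar - z * se_total, N * ybar + z * se_total),
        "se_mean": se_mean,
        "se_total": se_total,
    }


# Made-up illustration: 40 simulated y-values treated as an SRS from N = 200 units.
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(40)]
N = 200

ci = srs_confidence_intervals(sample, N)
naive_se = math.sqrt(variance(sample) / len(sample))   # standard error without the f.p.c.

print("95% CI for the mean: ", ci["mean"])
print("95% CI for the total:", ci["total"])
print("SE with f.p.c.:", ci["se_mean"], " SE without f.p.c.:", naive_se)
```

With $n/N = 0.2$ in this made-up setting, the standard error with the f.p.c. is $\sqrt{0.8} \approx 0.894$ times the uncorrected one, so omitting the correction would give a wider interval than (14) calls for.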