

in the alphabet is associated with the characteristic of interest.

However, systematic sampling is not the same as simple random sampling; it does not have the property that every possible group of n units has the same probability of being the sample. In the preceding example, it is impossible to have students 345 and 346 both appear in the sample. Systematic sampling is technically a form of cluster sampling, as will be discussed in Chapter 5.

Most of the time, a systematic sample gives results comparable to those of an SRS, and SRS methods can be used in the analysis. If the population is in random order, the systematic sample will be much like an SRS. The population itself can be thought of as being mixed. In the quote at the beginning of the chapter, Sorensen reports that President Kennedy used to read a systematic sample of letters written to him at the White House. This systematic sample most likely behaved much like a random sample. Note that Kennedy was well aware that the letters he read, although representative of letters written to the White House, were not at all representative of public opinion.

Systematic sampling does not necessarily give a representative sample, though, if the listing of population units is in some periodic or cyclical order. If male and female names alternate in the list, for example, and k is even, the systematic sample will contain either all men or all women; this cannot be considered a representative sample. In ecological surveys done on agricultural land, a ridge-and-furrow topography may be present that would lead to a periodic pattern of vegetation. If a systematic sampling scheme follows the same cycle, the sample will not behave like an SRS.

On the other hand, some populations are in increasing or decreasing order. A list of accounts receivable may be ordered from largest amount to smallest amount. In this case, estimates from the systematic sample may have smaller (but unestimable) variance than comparable estimates from the SRS.
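The two situations described above (a periodic list and an ordered list) can be illustrated with a small enumeration. This is only a sketch under stated assumptions: the twelve account balances, the alternating M/F list, and the helper name `systematic_samples` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance


def systematic_samples(values, k):
    """Return the k possible 1-in-k systematic samples of a list."""
    return [values[start::k] for start in range(k)]


# Periodic list (hypothetical): names alternate M, F, M, F, ...
alternating = ["M", "F"] * 6
even_k = systematic_samples(alternating, 4)  # k even: each sample is all M or all F
odd_k = systematic_samples(alternating, 3)   # k odd: each sample mixes M and F

# Ordered list (hypothetical balances, largest to smallest), N = 12
ordered = [Fraction(v) for v in range(120, 0, -10)]
k = 4
n = len(ordered) // k  # every systematic sample has n = 3 units

# Means of all possible systematic samples and of all possible SRSs of size n
sys_means = [mean(s) for s in systematic_samples(ordered, k)]
srs_means = [mean(s) for s in combinations(ordered, n)]

# Variability of the sample mean across all possible samples of each kind
var_sys = pvariance(sys_means)
var_srs = pvariance(srs_means)

print("systematic:", var_sys, " SRS:", var_srs)
```

For this particular ordered list the sample means vary less across the four systematic samples than across all possible SRSs of the same size, which matches the claim in the text. The comparison can go the other way for lists with periodic structure.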
A systematic sample from an ordered list of accounts receivable is forced to contain some large amounts and some small amounts. It is possible for an SRS to contain all small amounts or all large amounts, so there may be more variability among the sample means of all possible SRSs than there is among the sample means of all possible systematic samples.

In systematic sampling, we must still have a sampling frame and be careful when defining the target population. Sampling every 20th student to enter the library will not give a representative sample of the student body. Sampling every 10th person exiting an airplane, though, will probably give a representative sample of the persons on that flight. The sampling frame for the airplane passengers is not written down, but it exists all the same.

2.7 Randomization Theory Results for Simple Random Sampling*¹

In this section we show that $\bar{y}$ is an unbiased estimator of $\bar{y}_U$; that is, $\bar{y}_U$ is the average of all possible values of $\bar{y}_S$ if we could examine all possible SRSs $S$ that could be chosen. We also calculate the variance of $\bar{y}$ given in Equation (2.7) and show that the estimator in Equation (2.9) is unbiased over repeated sampling.

¹ An asterisk (*) indicates a section, chapter, or exercise that requires more mathematical background.

No distributional assumptions are made about the $y_i$'s in order to ascertain that $\bar{y}$ is unbiased for estimating $\bar{y}_U$. We do not, for instance, assume that the $y_i$'s are normally distributed with mean $\mu$. In the randomization theory (also called design-based) approach to sampling, the $y_i$'s are considered to be fixed but unknown numbers; any probabilities used arise from the probabilities of selecting units to be in the sample. The randomization theory approach provides a nonparametric approach to inference: we need not make any assumptions about the distribution of random variables.

Let's see how the randomization theory works for deriving properties of the sample mean in simple random sampling. As done in Cornfield (1944), define

$$Z_i = \begin{cases} 1 & \text{if unit } i \text{ is in the sample} \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\bar{y} = \sum_{i \in S} \frac{y_i}{n} = \sum_{i=1}^{N} Z_i \frac{y_i}{n}.$$

The $Z_i$'s are the only random variables in the above equation because, according to randomization theory, the $y_i$'s are fixed quantities. When we choose an SRS of $n$ units out of the $N$ units in the population, $\{Z_1, \ldots, Z_N\}$ are identically distributed Bernoulli random variables with

$$\pi_i = P(Z_i = 1) = P(\text{select unit } i \text{ in sample}) = \frac{n}{N}. \tag{2.18}$$

The probability in (2.18) follows from the definition of an SRS. To see this, note that if unit $i$ is in the sample, then the other $n - 1$ units in the sample must be chosen from the other $N - 1$ units in the population. A total of $\binom{N-1}{n-1}$ possible samples of size $n - 1$ may be drawn from a population of size $N - 1$, so

$$P(Z_i = 1) = \frac{\text{number of samples including unit } i}{\text{number of possible samples}} = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}.$$

As a consequence of Equation (2.18),

$$E[Z_i] = E[Z_i^2] = \frac{n}{N}$$

and

$$E[\bar{y}] = E\left[\sum_{i=1}^{N} Z_i \frac{y_i}{n}\right] = \sum_{i=1}^{N} \frac{n}{N} \frac{y_i}{n} = \sum_{i=1}^{N} \frac{y_i}{N} = \bar{y}_U.$$

The variance of $\bar{y}$ is also calculated using properties of the random variables $Z_1, \ldots, Z_N$. Note that

$$V(Z_i) = E[Z_i^2] - (E[Z_i])^2 = \frac{n}{N} - \left(\frac{n}{N}\right)^2 = \frac{n}{N}\left(1 - \frac{n}{N}\right).$$
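The inclusion probability in (2.18) and the unbiasedness of the sample mean can be checked by listing every possible SRS from a tiny population. The following is a minimal sketch, assuming a hypothetical population of five fixed values; the numbers and variable names are illustrative only. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fixed population values y_1, ..., y_N (not from the text)
y = [Fraction(v) for v in (3, 7, 8, 12, 20)]
N, n = len(y), 3

# Under SRS, every subset of n units is equally likely to be the sample
samples = list(combinations(range(N), n))
prob = Fraction(1, len(samples))

# pi_i = P(Z_i = 1): total probability of the samples that contain unit i
pi = [sum(prob for S in samples if i in S) for i in range(N)]

# E[ybar]: average of the sample means over all possible samples
ybar_S = [sum(y[i] for i in S) / n for S in samples]
expected_ybar = sum(prob * m for m in ybar_S)
ybar_U = sum(y) / N

# V(Z_i) = E[Z_i^2] - (E[Z_i])^2, and E[Z_i^2] = E[Z_i] because Z_i is 0 or 1
var_Z = [p - p ** 2 for p in pi]

print("pi_i:", pi[0], " n/N:", Fraction(n, N))
print("E[ybar]:", expected_ybar, " ybar_U:", ybar_U)
print("V(Z_i):", var_Z[0])
```

Nothing here relies on a distribution for the $y_i$'s: the only randomness is which of the ten equally likely subsets is selected.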


For $i \ne j$,

$$E[Z_i Z_j] = P(Z_i = 1 \text{ and } Z_j = 1) = P(Z_j = 1 \mid Z_i = 1)\,P(Z_i = 1) = \left(\frac{n-1}{N-1}\right)\left(\frac{n}{N}\right).$$

Because the population is finite, the $Z_i$'s are not quite independent: if we know that unit $i$ is in the sample, we do have a small amount of information about whether unit $j$ is in the sample, reflected in the conditional probability $P(Z_j = 1 \mid Z_i = 1)$. Consequently, for $i \ne j$,

$$\text{Cov}(Z_i, Z_j) = E[Z_i Z_j] - E[Z_i]E[Z_j] = \frac{n-1}{N-1}\,\frac{n}{N} - \left(\frac{n}{N}\right)^2 = -\frac{1}{N-1}\left(1 - \frac{n}{N}\right)\left(\frac{n}{N}\right).$$

We use the covariance (Cov) of $Z_i$ and $Z_j$ to calculate the variance of $\bar{y}$; see Appendix B for properties of covariance. The negative covariance of $Z_i$ and $Z_j$ is the source of the fpc.

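$$
\begin{aligned}
V(\bar{y}) &= \frac{1}{n^2}\, V\left(\sum_{i=1}^{N} Z_i y_i\right) \\
&= \frac{1}{n^2}\, \text{Cov}\left(\sum_{i=1}^{N} Z_i y_i,\ \sum_{j=1}^{N} Z_j y_j\right) \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{N} y_i^2\, V(Z_i) + \sum_{i=1}^{N}\sum_{j \ne i} y_i y_j\, \text{Cov}(Z_i, Z_j)\right] \\
&= \frac{1}{n^2}\left[\frac{n}{N}\left(1 - \frac{n}{N}\right)\sum_{i=1}^{N} y_i^2 - \sum_{i=1}^{N}\sum_{j \ne i} y_i y_j\, \frac{1}{N-1}\left(1 - \frac{n}{N}\right)\left(\frac{n}{N}\right)\right] \\
&= \frac{1}{n^2}\,\frac{n}{N}\left(1 - \frac{n}{N}\right)\left[\sum_{i=1}^{N} y_i^2 - \frac{1}{N-1}\sum_{i=1}^{N}\sum_{j \ne i} y_i y_j\right] \\
&= \frac{1}{n}\left(1 - \frac{n}{N}\right)\frac{1}{N(N-1)}\left[(N-1)\sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} y_i\right)^2 + \sum_{i=1}^{N} y_i^2\right] \\
&= \frac{1}{n}\left(1 - \frac{n}{N}\right)\frac{1}{N(N-1)}\left[N\sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} y_i\right)^2\right] \\
&= \left(1 - \frac{n}{N}\right)\frac{S^2}{n}.
\end{aligned}
$$

The sixth line uses $\sum_{i}\sum_{j \ne i} y_i y_j = \left(\sum_i y_i\right)^2 - \sum_i y_i^2$, and the last line uses $N\sum_i y_i^2 - \left(\sum_i y_i\right)^2 = N(N-1)S^2$.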
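The covariance of the indicators and the resulting variance of the sample mean can also be confirmed by brute force. This is a sketch only, assuming a hypothetical five-unit population and illustrative helper names (`expect`, `Z`, `ybar`); it enumerates every SRS and compares the exact results with the closed-form expressions $-\frac{1}{N-1}\left(1-\frac{n}{N}\right)\frac{n}{N}$ and $\left(1-\frac{n}{N}\right)\frac{S^2}{n}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fixed population values (not from the text)
y = [Fraction(v) for v in (3, 7, 8, 12, 20)]
N, n = len(y), 3
samples = list(combinations(range(N), n))  # all equally likely SRSs
prob = Fraction(1, len(samples))


def expect(f):
    """Expected value of f(S) over the randomization distribution."""
    return sum(prob * f(S) for S in samples)


def Z(i, S):
    """Indicator that unit i is in sample S."""
    return 1 if i in S else 0


def ybar(S):
    """Sample mean for sample S."""
    return sum(y[i] for i in S) / n


# Covariance of two distinct indicators, by enumeration and by formula
E_Z0Z1 = expect(lambda S: Z(0, S) * Z(1, S))
cov_enum = E_Z0Z1 - expect(lambda S: Z(0, S)) * expect(lambda S: Z(1, S))
cov_formula = -Fraction(1, N - 1) * (1 - Fraction(n, N)) * Fraction(n, N)

# Variance of the sample mean, by enumeration and by formula
ybar_U = sum(y) / N
S2 = sum((v - ybar_U) ** 2 for v in y) / (N - 1)
var_enum = expect(lambda S: (ybar(S) - ybar_U) ** 2)
var_formula = (1 - Fraction(n, N)) * S2 / n

print("Cov(Z_0, Z_1):", cov_enum, cov_formula)
print("V(ybar):", var_enum, var_formula)
```

The covariance is negative because knowing that one unit is in the sample slightly lowers the chance that another one is, which is where the factor $1 - n/N$ comes from.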

To show that the estimator in (2.9) is an unbiased estimator of the variance, we need to show that $E[s^2] = S^2$. The argument proceeds much like the previous one. Since $S^2 = \sum_{i=1}^{N} (y_i - \bar{y}_U)^2/(N - 1)$, it makes sense when trying to find an unbiased estimator to find the expected value of $\sum_{i \in S}(y_i - \bar{y})^2$ and then find the multiplicative

constant that will give the unbiasedness:

$$
\begin{aligned}
E\left[\sum_{i \in S} (y_i - \bar{y})^2\right] &= E\left[\sum_{i \in S} \{(y_i - \bar{y}_U) - (\bar{y} - \bar{y}_U)\}^2\right] \\
&= E\left[\sum_{i \in S} (y_i - \bar{y}_U)^2 - n(\bar{y} - \bar{y}_U)^2\right] \\
&= E\left[\sum_{i=1}^{N} Z_i (y_i - \bar{y}_U)^2\right] - nV(\bar{y}) \\
&= \frac{n}{N}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2 - \left(1 - \frac{n}{N}\right)S^2 \\
&= \frac{n(N-1)}{N}S^2 - \frac{N-n}{N}S^2 \\
&= (n-1)S^2.
\end{aligned}
$$

Thus,

$$E\left[\frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2\right] = E[s^2] = S^2.$$

2.8 A Model for Simple Random Sampling*

Unless you have studied randomization theory in the design of experiments, the proofs in the preceding section probably seemed strange to you. The random variables in randomization theory are not concerned with the responses $y_i$: they are simply random variables that tell us whether the $i$th unit is in the sample or not. In a design-based, or randomization theory, approach to sampling inference, the only relationship between units sampled and units not sampled is that the nonsampled units could have been sampled had we used a different starting value for the random number generator.

In Section 2.7 we found properties of the sample mean $\bar{y}$ using randomization theory: $y_1, y_2, \ldots, y_N$ were considered to be fixed values, and $\bar{y}$ is unbiased because the average of $\bar{y}_S$ for all possible samples $S$ equals $\bar{y}_U$. The only probabilities used in finding the expected value and variance of $\bar{y}$ are the probabilities that units are included in the sample.

In your basic statistics class, you learned a different approach to inference. Then you had random variables $\{Y_i\}$ that followed some probability distribution, and the actual sample values were realizations of those random variables. Thus you assumed, for example, that $Y_1, Y_2, \ldots, Y_n$ were independent and identically distributed from a normal distribution with mean $\mu$ and variance $\sigma^2$ and used properties of independent random variables and the normal distribution to find expected values of various statistics.

We can extend this approach to sampling by thinking of random variables $Y_1, Y_2, \ldots, Y_N$ generated from some model. The actual values for the finite population,