<<

Chapter 6

Sampling and Distributions

6.1 Definitions

A is a set or collection of all possible observations of some • characteristic.

A is a part or subset of the population. •

A random sample of size  is a sample that is chosen in such a way as to ensure • that every sample of size  has the same probability of being chosen.

A is a number describing some (unknown) aspect of a population. (i.e. • )

A is some function of the sample observations. (i.e. ¯) •

The of a statistic is known as a .(How • is ¯ distributed)

We need to distinguish the distribution of a , say ¯ from the re- • alization of the random variable (ie. we get and calculate some sample say ¯ =42)

1 2 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Populations and Samples

. A Population is the set of all items or individuals of interest . Examples: All likely voters in the next election All parts produced today All sales receipts for November

. A Sample is a subset of the population . Examples: 1000 voters sel ected at random for interview A few parts selected for destructive testing Random receipts selected for audit

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-4

Figure 6.1: 6.1. DEFINITIONS 3

Population vs. Sample

Population Sample

a b c d b c ef gh i jk l m n g i n o p q rs t u v w o r u x y z y

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-5

Figure 6.2: 4 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Note on Statistics

The value of the statistic will change from sample to sample and we can therefore • think of it as a random variable with it’s own probability distribution.

¯ is a random variable •

Repeated sampling and calculation of the resulting statistic will give rise to a dis- • tribution of values for that statistic. 6.1. DEFINITIONS 5

Sampling Distributions

. A sampling distribution is a distribution of all of the possible values of a statistic for a given size sample selected from a population

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-10

Figure 6.3: 6 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Chapter Outline

Sampling Distributions

Sampling Sampling Sampling Distribution of Distribution of Distribution of Sample Sample Sample Mean Proportion Difference in

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-11

Figure 6.4: 6.2. IMPORTANT THEOREMS RECALLED 7 6.2 Important Theorems Recalled

2 Suppose 12   are independent with []= and  []=  = 1 2. ∀

Suppose  = 11 + 22 +  +  + ,then:

[ ]=[  + ]=1[1]+2[2]+ + []+ ··· X

= 11 +  + 

=  +  X

and

2 2 2  [ ]= [  + ]=  [1]+  [2]+ +   []  1 2 ···  X

= 22 + 22 + + 2 2 1 1 2 2 ···   2 2 =   because of independence X

2 and if  is normal  i.e.  (  ) independently : ∀ ∼   ∀

2 2  (  +    ) ∼    X X

6.3 Frequently used Statistics

6.3.1 The sample mean

Let 12 be a random sample of size  from a population with mean  • and 2. The sample mean is: 8 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

1  ¯ =    =1 X

1. The expected value of the sample mean is the population mean:

1  1  1 [¯]=(  )= [ ]= ( +  + )=      =1 =1 X X 2. The variance of the sample mean (   independent): 0

1  1   [¯]= (  )=  ( )   2  =1 =1 X X 1  2 = 2 =  2  =1 X 3. If we do not have independence it can be shown that

2    [¯]= − where  is the population size   1 µ − ¶   − is called the correction factor  1 µ − ¶   ¯ 2 and if  is large relative to  then −1 1 so that  []=  − ⇒ ¡ ¢ Note on Sample Mean

1. The use of the formulas for expected values and of sums of random variablesthatwesawinchapter5.

2. The variance of the sample mean is a decreasing function of the sample size.

3. The of the sample mean (under independence)   ¯ =  √ 6.4. SAMPLING DISTRIBUTION OF THE SAMPLE MEAN 9

6.3.2 The sample variance

Let 12 be a random sample of size  from a population with variance • 2.

Thesamplevarianceis: •

1  2 = ( ¯)2  1  =1 − − X 1. [2]=2 (we omit the proof)

6.4 Sampling distribution of the Sample Mean

Sampling from a Normal Population

Let ¯ be the sample mean of an independent random sample of size  from a • population with mean  and variance 2.

Then we know that [¯]= and  [¯]= 2 . •  If we further specify the population distribution as being normal,then •

2  (  ) for all  ∼ and we can write:

2 ¯    ∼  µ ¶ 6.5 Equation for the Standardized Sample Mean

Since ¯ (  2 ) we can ask what transformation can give us to a standard normal ∼  Generically the approach is ALWAYS • Random Variable-Mean of Random Variable  = Standard Deviation of Random Variable What does that mean for ¯ : • 10 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Random Variable = ¯ •

Mean of Variable = [¯]= •

 Standard Deviation of Variable =  ¯ = •  √

Putitalltogether •

¯   = −  √ 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 11

Z-value for Sampling Distribution of the Mean

. Z-value for the sampling distribution of X :

( X  μ) ( X  μ) Z   σ σ X n

where:X = sample mean μ = population mean σ = population standard deviation n = sample size

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-22

Figure 6.5: 12 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Sampling Distribution Properties (continued)

. For sampling with replacement:

As n increases, Larger sample size σ x decreases

Smaller sample size

μ x

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-26

Figure 6.6:

Example of Standardizing for Sample Mean The lengths of individual machined parts coming off a production line at Morton Metal- works are normally distributed around their mean of  =30centimeters. Their standard deviation around the mean is  = 1 centimeter. An inspector just took a sample of  =4of these parts and found that ¯ forthissampleis29.875centimeters.Whatis the probability of getting a sample mean this low or lower if the process is still producing parts at a mean of  =30?

Answer

Given the population is normally distributed with mean  =30and standard • 2 deviation  = 1, we know that  (301 ) for all , ∼ so

¯ (3012) ∼ Now want to apply our transformation stuff: 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 13

¯  29875 30  (¯ 29875) =  ( − − ) ≤ √ ≤ 1√4

29875 30 =  ( − )= ( 250) = 0062 ≤ 1√4 ≤−

Questions: 7.3.

6.5.1 Sampling from a Non Normal Distribution

We have seen that we can obtain the exact sampling distribution for the sample • mean if the individual  are all independent normal variates.

What happens when the 0 are not normally distributed? •  14 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Developing a Sampling Distribution

. Assume there is a population … D . Population size N=4 A B C . Random variable, X, is age of individuals . Values of X: 18, 20, 22, 24 (years)

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-13

Figure 6.7: 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 15

Developing a Sampling Distribution (continued)

Summary Measures for the Population Distribution:

X P(x) μ   i N .25 18  20 22  24   21 4

2 0 (Xi  μ) 18 20 22 24 x σ    2.236 N A B C D Uniform Distribution

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-14

Figure 6.8: 16 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

6.5.2 The

Developing a Sampling Distribution (continued) Now consider all possible samples of size n = 2 1st 2nd Observation Obs 18 20 22 24 16 Sample Means 18 18,18 18,20 18,22 18,24 1st 2nd Observation 20 20,18 20,20 20,22 20,24 Obs 18 20 22 24 22 22,18 22,20 22,22 22,24 18 18 19 20 21 24 24,18 24,20 24,22 24,24 20 19 20 21 22

16 possible samples 22 20 21 22 23 (sampling with 24 21 22 23 24 replacement)

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-15

Figure 6.9: 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 17

Developing a Sampling Distribution (continued) Sampling Distribution of All Sample Means

16 Sample Means Sample Means Distribution 1st 2nd Observation _ Obs 18 20 22 24 P(X) .3 18 18 19 20 21 .2 20 19 20 21 22 .1 22 20 21 22 23 0 _ 24 21 22 23 24 18 19 20 21 22 23 24 X (no longer uniform) Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-16

Figure 6.10: 18 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Let 12 be an independent random sample having identical distribution from apopulationofany shape with mean  and variance 2.

Then if  is large:

2 ¯ ( ) approximately for large  ∼ 

and similarly we can use the same transformation to standard form:

¯   = − (0 1) approximately for large  √ ∼

Notes on the Central Limit Theorem

1. This result holds only for large n and we refer to such results as holding asymp- totically.Inthiscasewesaythat¯ is asymptotically normally distributed

2. We have a short-hand way to write that the distribution of  is independently and identically (  ) distributed with mean  and variance 2

2  (  ) ∼

2 3. Identically implies that []= and  []= for all . That is the distribution of each observation ( =1)isthesame. 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 19

Central Limit Theorem

the sampling As the n? distribution sample becomes size gets almost normal large regardless of enough… shape of population

x

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-28

Figure 6.11: 20 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

Example

. Suppose a population has mean µ= 8and standard deviation s= 3. Suppose a random sample of size n = 36 is selected.

. What is the probability that the sample mean is between 7.8 and 8.2?

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-31

Figure 6.12:

See ?? 6.5. EQUATION FOR THE STANDARDIZED SAMPLE MEAN 21

Example (continued) Solution (continued):   μ - μ  7.8 - 8 X 8.2- 8  P(7.8  μ X  8.2)  P     3 σ 3   36 n 36   P(-0.5  Z  0.5)  0.3830

Population Sampling Standard Normal Distribution Distribution Distribution .1915 ?? ? ? ?? +.1915 ? ?? ? Sample Standardize ? ? -0.5 0.5 7.8 8.2 μ  0 μ  8 X z Z μ X  8 x

Statistics for Business and Economi cs, 6e © 2007 Pearson Education, Inc. Chap 7-33

Figure 6.13:

Example From Transformation to Standard Form when Sampling from a Non- Normal Distribution

The delay time for inspection of baggage at a border station follows a bimodal • distribution with a mean of  =8minutes and a standard deviation of  =6 minutes. A sample of  =64from a particular minority group has a mean of ¯ =10minutes. Is their evidence that this minority group is being detained longer than usual? • Howlikelyisitthatasamplemeanof¯ 10 if the population mean is 8? • ≥

Answer:

note from the question that this population from which we are sampling is non • normal we know this is from the word bimodal since the normal has one • 22 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

If the  are , then by applying (or invoke) the central limit theorem: •

10 8  (¯ 10)  ( − ) ≥ ≈ ≥ 6√64 =  ( 267) = 0038 ≥ approximately.

Questions: NCT 7.11, 7.18, 7.19, &7.20.

6.6 Sampling Distribution: Differences of Sample Means ¯1 ¯2 − Let ¯1 and ¯2 be the means of two samples from two separate and independent • populations. 2 ¯ ¯ 1 [1]=1  [1]= 1

2 ¯ ¯ 2 [2[= 2  [2]= 2

Since ¯1 and ¯2 are independent:

[¯1 ¯2]=[¯1] [¯2]=  − − 1 − 2

2 2 1 2  [¯1 ¯2]= [¯1]+ [¯2]= + − 1 2

6.6.1 Normal Case for ¯1 ¯2 − 2 2 If 1 are  as (11) , 2 are  .as(22),and1 and 2 are independent, then we have an exact normal result for any sample size:

2 2 ¯ ¯ 1 2 1 2 (1 2 + ) − ∼ − 1 2 Recall a linear combination of normals is normal • 6.6. SAMPLING DISTRIBUTION: DIFFERENCES OF SAMPLE MEANS ¯1 ¯2 23 −

6.6.2 Non Normal Case for ¯1 ¯2 −

2 If 1 are  with mean 1 and variance 1, 2 are  .withmean2 and 2 variance 2,and1 and 2 are independent, then using the central limit theorem, for large  1 ,and 2

2 2 ¯ ¯ 1 2 1 2 (1 2 + ) approximately or asymptotically − ∼ − 1 2

6.6.3 Example of Difference of Means with Non-Normal Popu- lation Suppose that right-handed (RH) students have a mean IQ of 80 units with variance 1,400 and left-handed (LH) students have a mean IQ of 80 with variance 1,320. What is the probability that the sample mean IQ of RH students will be at least 5 units higher than the sample mean IQ of LH students if we take a sample of 100 RH students and 120 LH students?

Answer: Let 1 be the IQ of the ith RH student and 2 be the IQ of the ith LH student.

2 1 =801 =14001 = 100

2 2 =802 =13202 = 120

and we want to find (¯1 ¯2 5). − ≥

Since 1 and 2 are both large we can apply the central limit theorem,

2 2 ¯ ¯ 1 2 1 2 (1 2 + ) approximately − ∼ − 1 2 here:

2 2 1 2 1 2 =80 80 = 0 and + =25 − − 1 2

Note: Formula for the standardizing transformation is:

(¯1 ¯2) (  )  = − − 1 − 2 ¯1 ¯2 − 24 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

where

2 2 1 2 ¯1 ¯2 = +  − s1 2

So that

¯1 ¯2 (0 25)  − ∼

 (¯1 ¯2 5) =  ( 1) = 1587 − ≥ ≥ 6.7. SAMPLING DISTRIBUTION OF SAMPLE PROPORTION 25

6.7 Sampling Distribution of Sample Proportion

Let  be a binomially distributed random variable (the number of successes in  trials). Recallthesampleproportionis •  ˆ = s the fraction of successes in  trials  In Chapter 6 we used the normal approximation to the binomial as the number of • trials got large ((1 ) 9) − ≥ This is another application of the Central Limit Theorem • ()= =  and ()=2 = (1 ) − So we might ask what is the [] and  [] •

•  []  [ˆ]= = = =     ∙ ¸ Recall trials are independent so that •   [] (1 ) (1 )  [ˆ]= = = − = −  2 2  ∙ ¸ So we apply the generic formula •

Random Variable-Mean of Random Variable  = Standard Deviation of Random Variable What is mean for ˆ • Random Variable is ˆ • Mean of Variable is [ˆ]= • (1 ) Standard Deviation of Variable is  = − •  Putitalltogether q •

ˆ   = −  for large  (1 ) − q 26 CHAPTER 6. SAMPLING AND SAMPLING DISTRIBUTIONS

6.8 Sampling Distribution of Sample Proportion: ˆ1 − ˆ2

Consider two independent populations. •

1. Population 1:

1 1 =    1 =    1ˆ1 =  1 Recall [ˆ1]=1 and 1(1 1)  [ˆ1]= −  1 2. Population 2:

2 = number of successes

2 = number in sample 2 2 ˆ2 = 2 2(1 2)  [ˆ2]= − 2

3. Form Difference of Sample Proportion: ˆ1 ˆ2 −

If 1 and 2 are large, i.e. 11(1 1) 9 and 22(1 2) 9 then: • − ≥ − ≥

1(1 1) 2(1 2) ˆ1 ˆ2 (1 2 − + − )  − ∼ − 1 2

This is another application of the central limit theorem •

Questions: NCT 7.21,7.22, 7.27 & 7.36. Omit Sampling Distribution of the Sample Variance