Topic 2. Distributions, Hypothesis Testing, and Sample Size Determination

Topic 2. Distributions, hypothesis testing, and sample size determination

The Student - t distribution (ST&D pg 56 and 77) Consider a repeated drawing of samples of size n = 5 from a normal distribution. Y s For each sample compute , s, Y , and another statistic, t:

s  t (n-1)= (Y - )/ Y (Remember Z = (Y - )/ Y )

The t statistics is the number of standard error that separate Y and its hypothesized mean µ.

df=n-1=4

Critical values for |t|>|Z| -> less sensitivity. This is the price we pay for being uncertain about the population i Fig. 1. Distribution of t (df=4) compared to Z. The t distribution is symmetric, but wider and flatter than the Z distribution, lying under it at the center and above it in the tails.

When N increases the t distribution tend towards the N distribution

2 2. 2. Confidence limits based on sample statistics (ST&D p.77)

The general formula for any parameter  is: Estimated   Critical value * Standard error of the estimated 

So, for a population mean estimated via a sample mean:

  Y  t  sY ,n1 2 The statistic Y is distributed about  according to the t distribution so it satisfies

Y s Y s P{ - t /2, n-1 Y    + t /2, n-1 Y }= 1- 

For a confidence interval of size 1-, use a t value corresponding to  /2.

Therefore the confidence interval is

s s Y - t /2, n-1 Y    Y + t /2, n-1 Y

These two terms represent the lower and upper 1-  confidence limits of the mean. The interval between these terms is called confidence interval (CI).

Example: Data Set 1 of Hordeum 14 malt extraction values

Y s 14 = 75.94 Y = 1.23 / = 0.3279 A table gives the t0.025,13 value of 2.16

95% CI for  = 75.94 ± 2.160 * 0.3279  [75.23- 76.65]

If we repeatedly obtained samples of size 14 from the population and constructed these limits for each, we expect 95% of the intervals to contain the true mean.

True mean

Fig. 2 Twenty 95% confidence intervals. One out of 20 intervals does not include the true mean.

2. 3. Hypothesis testing and power of the test (ST&D 94).

= 75.94, s = sn2 / = 0.3279, t = 2.160, CI: [75.23- 76.65] Example Barley data. Y Y 0.025,13

1) Choose a null hypothesis: Test Ho  = 78 against the H1   78. 2) Choose a significance level: Assign  = 0.05 3) Calculate the test statistic t: Y   75.94  78.00 t    6.28 sY 0.3279 (interpretation: the sample mean is 6.3 SE from the hypothetical mean of 78. Too far!).

4) Compare the absolute value of the test statistic to the critical statistic: | - 6.28 | > 2.16

5) Since the absolute value of the test statistic is larger, we reject H0.

This is equivalent to calculate a 95% confidence interval for the mean.

Since o (78) is not within the CI [75.23- 76.65] we reject Ho.

is called the significance level of the test (<0.05): probability of incorrectly rejecting a true Ho, a Type I error.

is the Type II error: to incorrectly accept Ho when it is false  Null hypothesis Accepted Rejected

True Correct decision Type I error=  Null hypothesis False Type II error= Correct decision= Power= 1-

Power of the test: 1- is the power of the test, and represents the probability of correctly rejecting a false null hypothesis. It is a measure of the ability of the test to detect an alternative mean or a significant difference when it is real

Note that for a given Y and s, if 2 of the 3 quantities , , and n are specified then the third one can be determined. Choose the right number of replications to keep Type I error  and Type II error  under the desired limits (e.g.  <0.05 & <0.20).

Power of a test (ST&D pg 118-119)

 between 2 means in SE 1   0 1   0 Power 1    P(Z  Z  ) or P(t  t  ) units  / 2   / 2 s Y Y

What is the power of a test for Ho: = 74.88 in the barley data set against H1:  = s 75.94. Since  = 0.05, n = 14 (t 0.025,13 = 2.160), and Y = 0.32795.

75.94  74.88 Power  1    P(t  2.160  )  P(t  1.072)  0.85 0.32795

The probability of the Type II error  we are looking for is the shaded area to the left of the lower curve.

/2 Fig. 3. Type I and Type II errors in the Barley Ho is true data set.

Ho is false

 1-

Acceptance Rejection The area 1- in the rejection region = the probability that > 75.588 under H1 = power =

P(t>(75.588-75.94)/0.32795)= P(t>-1.072)=0.85 same as above!

The magnitude of  depends on 1. The Type I error rate  2. The distance between the two means under consideration s 3. The number of observations (n)  s  Y n

5 When the distance between the two means is reduced,  increases

Variation of power as a function of the distance between the alternative hypotheses (Biometry Sokal and Rohlf)

n=35 SE=0.7

n=5 SE=1.7

6 2.3.2. Power of the test for the difference between the means of two samples

Two types of alternative hypothesis H0: 1 - 2=0 versus H1: 1 - 2  0 (two tail test) -> (t value: top t Table) H1: 1 - 2 >0 (one tail test) -> (t value: bottom t Table)

The general power formula for both equal and unequal sample sizes reads as:

| 1   2 | | 1   2 | Power  P(t  t  )  P(t  t  ) s 2 , 2 Y 1Y 2 2 s pooled N

2 2 2 2 (n1 1)s1  (n2 1)s2 where s pooled is a weighted variance given by: s pooled  (n1 1)  (n2 1) n n and N  1 2 . n1  n2

When n1 = n2 = n (equal sample sizes) that the formulas reduce to:

2 2 2 2 2 2 2 (n1 1)s1  (n2 1)s2 (n 1)(s1  s2 ) s1  s2 s pooled    (n1 1)  (n2 1) 2(n 1) 2

2 n1n2 n n N    n1  n2 2n 2

| 1   2 | | 1   2 | Power  P(t  t  )  P(t  t  ) s 2 2 Y 1Y 2 2 2s pooled n The variance of the difference between two random variables is the sum of the variances (error are always added) (ST&D 113-115).

The degrees of freedom for the critical t /2 are

General case: (n1-1) + (n2-1) Equal sample size: 2*(n-1)

7 2. 4. 2 Sample size for estimating µ, when is known. Using the z statistic If the population variance  is known the Z statistic may be used.

Y    Z  so CI = Y  Z  or Y  Z   / 2 Y  / 2 n The formula for d= half-length of the confidence interval for the mean is  d d  Z   Z [ Y d ]  Y  2 2 n

This can be rearranged to estimate the confidence interval in terms of the population variance. For =0.05:

2 2 2 2 2 2 2 2 n = z /2  / d = z /2 (/d) = (1.96) (/d) = 3.8(/d)

So if d=  n 4 d= 0.5   n 16 d= 0.25   n 64

2      The equation may be expressed in terms   2 2    2 CV of the coefficient of variation n  Z  2  Z  2 2  d  2  d           

CV= s / Y (as a proportion not as a %) d/ is the confidence interval as a fraction of the population mean. For example d/ < 0.1 means that the length of the confidence interval should not be larger than two tenth of the population mean.

d/ < 0.1 and so 2d/ < 0.2 Example: The CVs of yield trials in our experimental station are never higher than 15%. How many replications are necessary to have a 95% CI for the true mean of less than 1/10 of the average yield? 2d= 0.1 so d= 0.1/2= 0.05 n= 1.962 0.152/0.052 = 34.6  35

8 2. 4. 3 Sample size for the estimation of the mean Unknown 2. Stein's Two-Stage Sample

Consider a (1 - )% confidence interval about some mean µ:

Y t s Y t s - ,n1 Y    + ,n1 Y 2 2

The half-length (d) of this confidence interval is therefore:

s d  t s  t ,n1 Y ,n1 2 2 n This formula can be rearranged to estimate necessary sample size n

2 2 2 s 2  n  t 2  Z 2 ,n1 2 d 2 d

Stein's Two-Stage procedure involves using a pilot study to estimate s2.

Note that n is now present at both sides of the equation: iterative approach

Example: An experimenter wants to estimate the mean height of certain plants. From a pilot study of 5 plants, he finds that s = 10 cm. What is the required sample size, if he wants to have the total length of a 95% CI about the mean be no longer than 5 cm?

2 2 2 Using n = t /2,n-1 s / d , n is estimated iteratively,

initial-n t5%, df n 5 2.776 (2.776)2 (10)2 /2.52 = 123 123 1.96 (1.96)2 (10)2 / 2.52 = 62 62 2.00 64 64 2.00 64 Thus with 64 observations, he could estimate the true mean with a CI no longer than 5 cm at =0.05.

To accelerate the iteration you can start with Z:

n = z2 s2 / d2 = (1.96)2 (10)2 / 2.52 = 62

9 2. 4. 4 Sample size estimation for a comparison of two means

When testing the hypothesis Ho: o, we can take into account the possibility of a Type I and Type II error simultaneously.

To calculate n we need to known the alternative 1 or at least the minimum difference we wish to detect between the means  = |o - 1 The formula for computing n, the number of observations on each treatment, is:  2 n = 2 ( /  (Z/2 + Z)

2 For = 0.05 and = 0.20, z= 0.8416 and z /2 = 1.96, (Z/2 + Z) =7.85 8 We can define in terms of   If δ = 2σ, n ≈ 4  If δ = 1σ, n ≈ 16  If δ = 0.5σ, n ≈ 64

We rarely know 2 and must estimate it via sample variances:

2 2  s pooled    2 2   s1  s2 n  2  t  t ,n1n22 , where s     ,n1n22  pooled     2  2 n is estimated iteratively. If no estimate of s is available, the equation may be expressed in terms of the CV, and  as a proportion of the mean:

 2 2 2 n  2 [(/) / ( (Z/2 + Z)  2(CV/%) (Z/2 + Z)

2 Example: Two varieties are compared for yield, with a previously estimated s = 2.25 (s=1.5). How many replications are needed to detect a difference of 1.5 tons/acre with a  = 5%, and  = 20%?

 2 2 Approximate: n  2 (/ (Z/2 + Z) = 2 (1.5/1.5) (1.96+0.8416)= 15.7  2 Then use n = 2 (s /  (t/2 + t) to estimate the sample size iteratively.

guesstimate n df = 2(n - 1) t0.025 t0.20 estimated n 16 30 2.0423 0.8538 16.8 17 32 2.0369 0.8530 16.7 The answer is that there should be 17 replications of each variety.

10 2. 4. 5. Sample size to estimate population standard deviation

2 The chi-squared ( ) distribution is used to establish confidence intervals around the sample variance as a way of estimating the true, unknown population variance.

2. 4. 5. 1. The Chi- square distribution (ST&D p. 55)

0 .5 2 df 0 .4 4 df 6 df 0 .3

0 .2

0 .1

0 .0

1 2 3 4 5 6 C hi - sq ua r e Distribution of 2 , for 2, 4, and 6 degrees of freedom.

Relation between the normal and chi-square distributions. The 2 distribution with df = n is defined as the sum of squares of n independent, normally distributed variables with zero means and unit variances. 2 2 2  α, df=1= Z (0,1) α/2  α/2, df= 2 2 2 2  1, 0.05 = 3.84 and Z (0,1), 0.025 = t , 0.025 = 1.96 = 3.84 Note: Z values from both tails go into the upper tail of the χ2 because of the disappearance of the minus sign in the 2 squaring. For this reason we use  for the  and /2 for Z and t. n 2 2 ()Yi   1 2 Zi  22 ()Yi  i1 

If we estimate the parametric mean  with a sample mean, we obtain: 2 112 ()ns n (Y Y )2 n ()YY 2 i  2 2 2  i 2 …due to: s   (Yi Y )  (n 1)s   i1 n 1 i1

This expression, which has a 2n-1 distribution, provides a relationship between the sample variance and the parametric variance.

11 2. 4. 5. 2. Confidence interval for 2 2 2 2 We can make the following statement about the ratio (n-1) s / that has n-1 distribution, 2 2 2 2 P {  1-/2, n-1  (n-1) s /   /2, n-1} = 1 - 

Simple algebraic manipulation of the quantities in the inequality yields

2 2 2 2 P {  1-/2, n-1 /(n-1)  s /   /2, n-1/(n-1)} = 1 -  which is useful when the precision of s2 can be expressed in terms of the % of 2.

Or inverting the ratio and moving (n-1): 2 2 2 2 2 P {(n-1) s /  /2, n-1    (n-1) s /  1-/2, n-1} = 1 -  2 which is useful to construct 95% confidence intervals for  .

Example: What sample size is required to obtain an estimate of  that deviates no more than 20% from the true value of  with 90% confidence?

2 2 Pr {0.8 < s/ < 1.2} = 0.90 = Pr {0.64 < s / < 1.44} = 0.90 thus 2 2  (1 - /2, n-1) / (n-1)= 0.64 and  (/2, n-1) / (n-1)= 1.44

2 Since  is not symmetrical, the above two solutions may not identical for small n. The computation involves an arbitrary initial n and an iterative process:

df 1 - /2 = 95% /2 = 5% 2 2 2 2 n (n-1)  (n-1)  (n-1) /(n-1)  (n-1)  (n-1) /(n-1) 21 20 10.90 0.545 31.4 1.57 31 30 18.50 0.616 43.8 1.46 41 40 26.50 0.662 55.8 1.40 36 35 22.46 0.642 49.8 1.42 35 35 21.66 0.637 48.6 1.43

Thus a rough estimate of the required sample size is ~ 36.