
Department of Mathematics, Faculty of Science and Engineering, City University of Hong Kong

MA3518: Applied Statistics

Chapter 3: Statistical Estimation and Hypothesis Testing

Statistical inference is the process of extracting useful information from the observed data about the unknown underlying mechanism that generates the data. It can be considered the basis of many other branches of statistical science. Three important topics in statistical inference are point estimation, interval estimation and hypothesis testing. This chapter provides a brief introduction to some basic but essential concepts and techniques in statistical inference, for instance the construction of confidence intervals and the one-sample and two-sample t-tests for an unknown population mean. We will also discuss in some detail the use of SAS procedures to perform these tasks. The topics included in this chapter are listed as follows:

Section 3.1: Statistical Inference: One-sample and Two-sample Cases
Section 3.2: Statistical computing by SAS

Extract Useful Information From Data!

Section 3.1: Statistical Inference: One-sample and Two-sample Cases

1. Statistical inference:

• The process of extracting useful information from the observed data about the unknown underlying mechanism that generates the data

• Draw inferences about the values of unknown parameters in a probability distribution for a random sample based on the observed data in the sample

2. Real-life Examples:

• Estimate the average weight of female students in a university based on the weights of a sample of female students chosen from the population

• Estimate the average monthly income of the negative-asset group in Hong Kong and indicate the precision of the estimate based on a sample of negative-asset people

3. Three methods for statistical inference:

• Point Estimation
• Interval Estimation
• Hypothesis Testing

4. Point Estimation:

• Estimate an unknown parameter of a probability distribution by a single number evaluated from sample data

• The single number is called a sample statistic or, simply, a statistic

• A simple example:

Given a random sample of size n with observations {x1, x2,…, xn}

Estimate the population mean μ based on the set of observations

The sample mean or sample average x̄ is given by:

x̄ = (1/n) Σ_{i=1}^n x_i

Use the sample mean x̄ as a point estimate of μ

Prior to sampling, each observation of the sample is unknown and the sample can be represented by a set of random variables {X1, X2, …, Xn}

In this case, the sample mean X̄ is a random variable given by:

X̄ = (1/n) Σ_{i=1}^n X_i

We call X̄ a point estimator of μ

Since X̄ is a random variable, one can determine its statistical properties by a probability distribution

We call the probability distribution for X̄ the sampling distribution for X̄ (see Lecture Note: Chapter One)

If the probability distribution for each of {X1, X2, …, Xn} is normal, then the exact distribution of X̄ is N(μ, σ²/n)

5. Central Limit Theorem (CLT): A fundamental theorem of statistical inference

• Suppose μ and σ² are the population mean and the population variance, respectively

• The sampling distribution for X̄ can be approximated accurately by a normal distribution with mean μ and variance σ²/n when the sample size n is sufficiently large (i.e. n ≥ 30)
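As an illustrative sketch (not part of the original notes, and assuming a SAS release that supports the RAND function): the following program draws 1000 samples of size n = 30 from a skewed exponential distribution and computes the sample mean of each; the histogram of the 1000 sample means is close to a normal curve, as the CLT predicts. The data set and variable names (cltdemo, xbar) are arbitrary.

DATA cltdemo;
   CALL STREAMINIT(123);              /* fix the random seed */
   DO sample = 1 TO 1000;             /* 1000 replications */
      xbar = 0;
      DO i = 1 TO 30;                 /* sample size n = 30 */
         xbar = xbar + RAND('EXPONENTIAL');
      END;
      xbar = xbar / 30;               /* sample mean of this replication */
      OUTPUT;
   END;
   KEEP xbar;
RUN;

PROC UNIVARIATE DATA = cltdemo;
   VAR xbar;
   HISTOGRAM xbar / NORMAL;           /* overlay a fitted normal curve */
RUN;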

6. Interval Estimation:

• Precision of a point estimate:

(a) The reciprocal of the standard deviation of the point estimate

(b) SD(X̄) = σ/√n (i.e. SD(X̄) is the standard error of X̄)

(c) The precision of the estimate increases as the sample size n increases

• Point estimation: cannot tell anything about the precision of the estimator

• Interval estimation: use an interval to estimate an unknown parameter in order to indicate the precision of the estimator

• Case I: σ² is known

Consider a random sample {X1, X2, …, Xn} of size n

For sufficiently large n, X̄ ~ N(μ, σ²/n) holds approximately even if the distribution of each Xi (i = 1, 2, …, n) is not normal

Suppose the population variance σ² is known

Estimate the unknown population mean μ by constructing an interval estimate for μ based on the approximate distribution N(μ, σ²/n) for X̄

In particular, a 95% confidence interval for μ is given by:

(X̄ - 1.96 σ/√n, X̄ + 1.96 σ/√n)

As n becomes large, the confidence interval becomes narrower

=> A higher level of precision is achieved

Interpretation of the confidence interval:

P(X̄ - 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = P(-1.96 ≤ (X̄ - μ)/(σ/√n) ≤ 1.96) = P(-1.96 ≤ Z ≤ 1.96) = 95%, where Z ~ N(0, 1) (from the standard normal table)

The values of the upper limit and the lower limit of the confidence interval (X̄ - 1.96 σ/√n, X̄ + 1.96 σ/√n) can be determined each time after the experiment of random sampling is conducted. If we conduct the random experiment 100 times, we get 100 realizations of the confidence interval (X̄ - 1.96 σ/√n, X̄ + 1.96 σ/√n)

Suppose the experiment of random sampling is repeated many times

Then, the frequentist's interpretation of the 95% confidence interval is that 95% of the realizations of the confidence interval contain the unknown mean μ

In general, a 100(1 - α)% confidence interval for μ is given by:

(X̄ - zα/2 σ/√n, X̄ + zα/2 σ/√n)

where zα/2 is the critical value of the standard normal distribution and is given by:

P(Z > zα/2) = α/2

We can check the value of zα/2 from the standard normal table
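As a simple numerical illustration (with made-up numbers): if α = 0.05, n = 25, σ = 2 and the observed sample mean is x̄ = 10, then zα/2 = z0.025 = 1.96 and the 95% confidence interval is (10 - 1.96 × 2/√25, 10 + 1.96 × 2/√25) = (9.216, 10.784)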

Remarks:

(a) The 100(1 - α)% confidence interval is just an approximation when the probability distribution of Xi (i = 1, 2, …, n) is non-normal

(b) The approximation is reasonably good when the sample size n ≥ 10 for a moderately non-normal distribution and when n ≥ 30 for a highly skewed and/or heavy-tailed distribution

(c) An exact 100(1 - α)% confidence interval can be obtained when each of {X1, X2, …, Xn} is normally distributed

(d) One can check the normality assumption by various tests of normality

• Case II: σ² is unknown

An unbiased point estimator of σ²:

s² = [1/(n - 1)] Σ_{i=1}^n (Xi - X̄)²

An approximate 100(1 - α)% confidence interval for the unknown population mean μ is given by:

(X̄ - tα/2(n - 1) s/√n, X̄ + tα/2(n - 1) s/√n)

where tα/2(n - 1) is the critical value of the Student's t distribution with n - 1 degrees of freedom

7. Hypothesis Testing:

• One-sample test:

Test whether a parameter is different from a given value (or a given range of values)

• Two-sample test:

Test whether the parameters in two different samples are the same as each other

• Statistical hypotheses:

Null hypothesis H0 vs. alternative hypothesis H1

• Significance level α:

(a) Determine the critical value for a test

(b) Specify the reliability of the test

(c) Must be given in advance

(d) Equals the probability of a Type I error (i.e. the probability of rejecting H0 when H0 is true)

• Testing procedures: (One-sample case)

(a) Case I: σ² is known

Given a random sample {X1, X2, …, Xn} of size n from a normal distribution with unknown mean μ but known variance σ²

Test the following hypotheses at significance level 5%:

H0: μ = μ0 vs. H1: μ > μ0 (One-sided test)

Here, μ0 is a given number

Under H0, X̄ ~ N(μ0, σ²/n)

Test statistic under H0:

Z = (X̄ - μ0)/(σ/√n) ~ N(0, 1)

Decision: Reject H0 if Z > z0.05 = 1.645 or the p-value = P(Z > zobs) < 0.05

If n is large (i.e. n  10 for moderately non- normal case or n  30 for highly skewed case), the above result is a reasonably good approximation even when the probability distribution for a random sample {X1, X2, …, Xn} is not normal

Other choices of the alternative hypothesis H1 can be

(i) H1: μ < μ0 (One-sided test)
(ii) H1: μ ≠ μ0 (Two-sided test)

Decision of the test for H1: μ < μ0:

Reject H0 if Z < -z0.05 = -1.645 or the p-value = P(Z < zobs) < 0.05

Decision of the test for H1: μ ≠ μ0:

Reject H0 if |Z| > z0.025 = 1.96 or the p-value = P(|Z| > |zobs|) < 0.05
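As an illustration (with made-up numbers): if n = 36, σ = 3, μ0 = 10 and the observed sample mean is x̄ = 10.9, then zobs = (10.9 - 10)/(3/√36) = 1.8 > 1.645, so H0 is rejected in favour of H1: μ > μ0 at the 5% level (p-value = P(Z > 1.8) ≈ 0.036)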

(b) Case II: σ² is unknown

Consider a random sample {X1, X2, …, Xn} of size n from a normal distribution with unknown population mean μ and unknown population variance σ²

Test the hypotheses at significance level 5%:

H0: μ = μ0 vs. H1: μ > μ0

Replace σ² by the sample variance:

s² = [1/(n - 1)] Σ_{i=1}^n (Xi - X̄)²

Test statistic under H0:

T = (X̄ - μ0)/(s/√n) ~ t(n - 1)

Decision: Reject H0 if T > t0.05 (n – 1) or the p-value = P(T > tobs) < 0.05

Decision for the test when H1: μ < μ0:

Reject H0 if T < -t0.05(n - 1) or the p-value = P(T < tobs) < 0.05

Decision for the test when H1: μ ≠ μ0:

Reject H0 if |T| > t0.025(n - 1) or the p-value = P(|T| > |tobs|) < 0.05
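As an illustration (using the n = 15 observations analysed in Section 3.2 with μ0 = 10): x̄ ≈ 10.333 and s ≈ 4.186, so tobs = (10.333 - 10)/(4.186/√15) ≈ 0.31, which matches the Student's t statistic 0.3084 reported by PROC UNIVARIATE there; since |tobs| < t0.025(14) ≈ 2.145 (and tobs < t0.05(14) ≈ 1.761), H0 is not rejected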

If n is large (i.e. n > 10 for the moderately non-normal case or n > 30 for the highly skewed and/or heavy-tailed case), the above result can be a reasonably good approximation even when the random sample {X1, X2, …, Xn} does not follow a normal distribution

When n is very large, the Student's t distribution with n - 1 degrees of freedom, t(n - 1), tends to the standard normal distribution N(0, 1).

In this case, under H0

T = (X̄ - μ0)/(s/√n) ~ N(0, 1) approximately

Decision for the test at significance level 5% when the alternative hypothesis is H1: μ ≠ μ0:

Decision: Reject H0 if |Z| > z0.025 = 1.96 or the p-value = P(|Z| > |zobs|) < 0.05

Decision for the test when H1: μ > μ0:

Reject H0 if Z > z0.05 = 1.645 or the p-value = P(Z > zobs) < 0.05

Decision for the test when H1: μ < μ0:

Reject H0 if Z < -z0.05 = -1.645 or the p-value = P(Z < zobs) < 0.05

• Large sample size:

(a) The sample average X̄ is approximately normal

(b) The estimate is more precise (i.e. A smaller standard error)

(c) The sample has a higher chance of representing the major characteristics of the population well

• Nonparametric tests for the median

Suppose the data do not follow a normal distribution, or we are not sure whether the data come from a normal distribution. This is particularly the case when we deal with ordinal data

The Central Limit Theorem (CLT) cannot be applied when the sample size is small

In this case, the parametric test based on the normality assumption cannot produce a reasonably good approximation

Adopt the nonparametric tests for the median:

(a) Sign test: uses 'sign' information

(b) Wilcoxon signed-rank test: uses 'sign' information and ordinal 'magnitude' information

Basic assumption (for the Wilcoxon signed-rank test): the data follow a symmetric distribution of unknown form
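In the SAS system, both of these tests are reported automatically: the 'Tests for Location' table produced by PROC UNIVARIATE (see the output in Section 3.2) lists the sign test statistic M and the Wilcoxon signed-rank statistic S, together with their p-values, alongside the one-sample t test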

• Testing procedures: (Two-independent-sample test)

Case I: σ² is known

Consider the following random samples for the variable X and the variable Y:

X1, X2, …, Xn1 ~ N(μ1, σ²)
Y1, Y2, …, Yn2 ~ N(μ2, σ²)

Test the hypotheses at significance level α:

H0: μ1 = μ2 vs. H1: μ1 ≠ μ2

Test statistic under H0:

Z = (X̄ - Ȳ)/[σ √(1/n1 + 1/n2)] ~ N(0, 1)

Decision: Reject H0 if |Z| > zα/2 or the p-value = P(|Z| > |zobs|) < α

Other choices of alternative hypotheses are given by:

(a) H1: μ1 < μ2
(b) H1: μ1 > μ2

Decision for the test when H1: μ1 < μ2:

Reject H0 if Z < -zα or the p-value = P(Z < zobs) < α

Decision for the test when H1: μ1 > μ2:

Reject H0 if Z > zα or the p-value = P(Z > zobs) < α

Case II: σ² is unknown

Test the hypotheses at significance level α:

H0: μ1 = μ2 vs. H1: μ1 ≠ μ2

Estimate σ² by the pooled estimate of the common variance, sp², defined as follows:

sp² := [(n1 - 1) s1² + (n2 - 1) s2²]/(n1 + n2 - 2)

One can interpret sp² as the sample variance of the combined sample of X and Y, which consists of n1 + n2 observations

Test statistic under H0:

T = (X̄ - Ȳ)/[sp √(1/n1 + 1/n2)] ~ t(n1 + n2 - 2)

Decision: Reject H0 if |T| > tα/2(n1 + n2 - 2) or the p-value = P(|T| > |tobs|) < α

Other choices of alternative hypotheses are given by:

(a) H1: μ1 < μ2
(b) H1: μ1 > μ2

Decision for the test when H1: μ1 < μ2:

Reject H0 if T < -tα(n1 + n2 - 2) or the p-value = P(T < tobs) < α

Decision for the test when H1: μ1 > μ2:

Reject H0 if T > tα(n1 + n2 - 2) or the p-value = P(T > tobs) < α

Case III: σ1² and σ2² are different but known

Test the hypotheses at significance level α:

H0: μ1 = μ2 vs. H1: μ1 ≠ μ2

Test statistic under H0:

Z = (X̄ - Ȳ)/√(σ1²/n1 + σ2²/n2) ~ N(0, 1)

Decision: Reject H0 if |Z| > zα/2 or the p-value = P(|Z| > |zobs|) < α

Other choices of alternative hypotheses are given by:

(a) H1: μ1 < μ2
(b) H1: μ1 > μ2

Decision for the test when H1: μ1 < μ2:

Reject H0 if Z < -zα or the p-value = P(Z < zobs) < α

Decision for the test when H1: μ1 > μ2:

Reject H0 if Z > zα or the p-value = P(Z > zobs) < α

Case IV: σ1² and σ2² are different and unknown

Test the hypotheses at significance level α:

H0: μ1 = μ2 vs. H1: μ1 ≠ μ2

Estimate σ1² and σ2² by the sample variances s1² and s2², respectively

Test statistic under H0:

T = (X̄ - Ȳ)/√(s1²/n1 + s2²/n2) ~ t(df)

df = (s1²/n1 + s2²/n2)² / {(s1²/n1)² [1/(n1 - 1)] + (s2²/n2)² [1/(n2 - 1)]}

Decision: Reject H0 if |T| > tα/2(df) or the p-value = P(|T| > |tobs|) < α

Other choices of alternative hypotheses are given by:

(a) H1: μ1 < μ2
(b) H1: μ1 > μ2

Decision for the test when H1: μ1 < μ2:

Reject H0 if T < -tα(df) or the p-value = P(T < tobs) < α

Decision for the test when H1: μ1 > μ2:

Reject H0 if T > tα(df) or the p-value = P(T > tobs) < α

Remarks:

(a) If the random samples for X and Y are not normally distributed, we can only obtain approximate results for all of the two-independent sample tests

(b) If the sample sizes n1 and n2 are sufficiently large, a reasonably good approximation for the two-independent-sample test can be obtained

(c) The two-independent-sample t test can be considered a special case of ANOVA (i.e. the number of groups is two)

• Test for equality of two variances (variance-ratio test or F-test):

Suppose X1, X2, …, Xm iid ~ N(μ1, σ1²) and Y1, Y2, …, Yn iid ~ N(μ2, σ2²)

Assume that the two samples are independent

μ1 and μ2 are unknown nuisance parameters, while σ1² and σ2² are the unknown parameters of interest

Test H0: σ1² = σ2² vs. H1: σ1² > σ2² (one-sided test) at significance level α

Point estimates of σ1² and σ2² are the sample variances s1² and s2², respectively

Test statistic under H0:

F = s1²/s2² ~ F(m - 1, n - 1)

Decision: Reject H0 if F > Fα(m - 1, n - 1) or the p-value = P(F > fobs) < α

Question: What are the testing procedures if we consider the one-sided alternative H1: σ1² < σ2² and the two-sided alternative H1: σ1² ≠ σ2²?

Decision when H1: σ1² < σ2²:

Reject H0 if F < F1-α(m - 1, n - 1) or the p-value = P(F < fobs) < α

Decision when H1: σ1² ≠ σ2² (i.e. two-sided test):

(a) Based on the critical values of the F-distribution, we reject H0 if

F < F1- /2(m-1, n-1) or F > F /2(m-1, n-1)

(b) Based on the p-value, we reject H0 if the p-value

p = P(F > fobs) + P(F < 1/fobs) < α

An important remark:

Before performing the two-sample t test, one can perform the two-sided variance-ratio test to decide whether σ1² = σ2² or σ1² ≠ σ2²

• Testing procedures: (Paired t test)

Given a random sample of n paired observations (x1, y1), (x2, y2),…, (xn, yn) for the variables X and Y which are dependent or highly correlated

We are interested in studying the characteristic of the difference of the two variables X and Y. In this case, the paired t test is a good choice

Define a new variable Z by:

Z = X – Y,

where Z represents the difference of X and Y

Assume that Z ~ N(μz, σz²), where the population variance σz² is unknown

Test H0: μz = 0 vs. H1: μz ≠ 0 (i.e. two-sided test) at significance level α

Observations for Z are then given by:

zi := xi - yi, for i = 1, 2, …, n

Let z̄ and sz denote the sample mean and the sample standard deviation of the observations {z1, z2, …, zn}, respectively

Test statistic under H0:

T = (z̄ - 0)/(sz/√n) ~ t(n - 1)

Decision: Reject H0 if |T| > tα/2(n - 1) or the p-value = P(|T| > |tobs|) < α

Decision for the test when H1: μz > 0:

Reject H0 if T > tα(n - 1) or the p-value = P(T > tobs) < α

Decision for the test when H1: μz < 0:

Reject H0 if T < -tα(n - 1) or the p-value = P(T < tobs) < α

Section 3.2: Statistical computing by SAS

1. Construct confidence intervals for the population mean μ

Try to run the following SAS program

DATA CIExample;
INPUT Y;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC MEANS DATA = CIExample ALPHA = 0.05 N MEAN STD CLM;
VAR Y;
RUN;

Options:
N – display the number of observations in the SAS output
MEAN – display the mean of the data
STD – display the standard deviation of the data
CLM – display the confidence interval of the mean
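An aside (not in the original program): the ALPHA= option controls the confidence level, so specifying, for example, ALPHA = 0.01 in the same PROC MEANS statement would request a 99% confidence interval for the mean instead of a 95% one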

The SAS output is given by:

The SAS System 16:49 Wednesday, August 20, 2003 1

The MEANS Procedure

Analysis Variable : Y

                                      Lower 95%       Upper 95%
 N        Mean          Std Dev       CL for Mean     CL for Mean
 ----------------------------------------------------------------
 15       10.3333333    4.1861449     8.0151235       12.6515431
 ----------------------------------------------------------------

From the SAS output, the 95% confidence interval for the population mean is given by:

(8.0151235, 12.6515431)

Because we are not sure whether the random sample comes from a normal distribution or not, the 95% confidence interval may only be an approximation to the exact one.

The approximation is reasonably good if the underlying probability distribution for the random sample is moderately non-normal since 10 < n < 30

2. Hypothesis testing for the population mean μ:

• Use the PROC UNIVARIATE statement to perform a two-sided t test for the following hypotheses on the unknown population mean μ

Test H0: μ = 0 vs. H1: μ ≠ 0 (by default)

• Test for a non-zero value μ0 for the unknown mean μ

Test H0: μ = μ0 vs. H1: μ ≠ μ0. Use PROC UNIVARIATE on the original variable by adding the option MU0=μ0. Another commonly used option is NORMAL, which tests the normality assumption.
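For instance, a minimal sketch using the CIExample data set from part 1 (the hypothesised value 10 is only for illustration):

PROC UNIVARIATE DATA = CIExample MU0 = 10 NORMAL;
VAR Y;
RUN;

This reports the same Student's t statistic as the alternative approach below, which subtracts μ0 from the data and tests the new variable against zero.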

Alternatively:

Step I: Perform the following data manipulation:

NEW VARIABLE = ORIGINAL VARIABLE - μ0

Step II: Use the PROC UNIVARIATE statement to perform a two-sided t test on whether the population mean of the new variable is equal to zero or not.

• Perform a one-sided t test

Test H0: μ = μ0 vs. H1: μ > μ0

Step I: Use the PROC UNIVARIATE statement to perform a two-sided t test

Step II: Divide the p-value from the SAS output by two

Step III:

Draw the conclusion based on the p-value divided by two

(i.e. reject H0 if the p-value divided by two is less than 0.05 when the significance level is 5%; note that this shortcut presumes the observed t statistic has the sign pointed to by H1)

• Example:

Test H0: μ = 10 vs. H1: μ > 10 at significance level α = 5% based on the dataset specified by the following SAS data step

DATA CIExample;

INPUT Y;
Z = Y - 10;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC UNIVARIATE DATA = CIExample;
VAR Z;
RUN;

The part of SAS output relevant to testing the hypotheses is given by:

The SAS System 12:26 Thursday, August 21, 2003 1

The UNIVARIATE Procedure Variable: Z

Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t    0.308397   Pr > |t|     0.7623
Sign           M    0.5        Pr >= |M|    1.0000
Signed Rank    S    4          Pr >= |S|    0.8054

From the SAS output, the p-value of the Student's t test for the two-sided t test on whether the population mean Mu of the variable Z is equal to zero or not is 0.7623. Hence, the p-value divided by two is 0.381, which is greater than 0.05 (for the one-sided t test)

• Conclusion: Do not reject H0: μ = 10 at significance level 5%

3. SAS Procedure for paired t-test:

• Use the PROC TTEST statement to perform a paired t test in the SAS system

• Example:

Consider the dataset and the variable Z (i.e. the difference of two variables X and Y) defined by the following SAS data step:

DATA Pairexample;
INPUT Y X;
Z = X - Y;
CARDS;
10 11
8 12
12 13
10 11
11 12
9 10
12 8
13 6
7 11
6 10
15 12
5 13
3 6
16 17
18 16
;

Assume that Z ~ N(μz, σz²), where σz² is unknown

Test H0: μz = 0 vs. H1: μz ≠ 0 (two-sided t test) at significance level α = 0.05

The SAS procedure for testing the above hypotheses is

given by:

PROC TTEST DATA = Pairexample;
VAR Z;
RUN;

It performs a two-sided t test for the hypotheses H0: μz = 0 vs. H1: μz ≠ 0
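As an aside (not in the original notes): PROC TTEST also offers a PAIRED statement, so, assuming a SAS release that supports it, an equivalent call that avoids creating Z explicitly would be:

PROC TTEST DATA = Pairexample;
PAIRED X*Y;
RUN;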

The SAS output is given as follows:

The SAS System 16:12 Thursday, August 21, 2003 1

The TTEST Procedure

Statistics

Variable    N    Lower CL Mean    Mean      Upper CL Mean    Lower CL Std Dev    Std Dev    Upper CL Std Dev    Std Err

Z           15   -1.193           0.8667    2.9267           2.7235              3.72       5.8667              0.9605

T-Tests

Variable DF t Value Pr > |t|

Z 14 0.90 0.3822

From the SAS result, the p-value is given by 0.3822 > 0.05

• Hence, we do not reject H0: μz = 0 at significance level α = 0.05
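As a quick arithmetic check of the output: z̄ = 0.8667, sz = 3.72 and n = 15, so the standard error is 3.72/√15 ≈ 0.9605 and T = 0.8667/0.9605 ≈ 0.90 with 14 degrees of freedom, in agreement with the T-Tests table above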

4. SAS Procedure for the two-independent-sample t test and the variance-ratio test

• Use the PROC TTEST statement to perform the two-sided two-independent-sample t test and the two-sided variance-ratio test (or two-sided F-test) in the SAS system

• Example:

Suppose we are interested in the difference between the average monthly income of graduates from university A and that of graduates from university B. The sample data for the monthly incomes (quoted in thousands of dollars) of the graduates from the two universities are summarized as follows:

A: 10 8 9 12 11 16 20 7 21 18
B: 11 7 6 10 12 11 12 8 16 15

We want to test whether there is a significant difference between the average monthly income μA of the graduates from university A and the average monthly income μB of the graduates from university B

Test H0: μA = μB vs. H1: μA ≠ μB at significance level α

Use the following statements to read the data to the SAS system:

Title 'Comparison of Average Monthly Incomes';
DATA MI;
INPUT University $ Income @@;
DATALINES;
A 10 A 8 A 9 A 12 A 11 A 16 A 20 A 7 A 21 A 18
B 11 B 7 B 6 B 10 B 12 B 11 B 12 B 8 B 16 B 15
;
RUN;

Remarks:

(a) The ‘$’ sign in the input statement indicates that University is a character variable

(b) The '@@' sign in the input statement enables the procedure to read more than one observation per line

Use the PROC TTEST statement to perform the t-test and the Variance-ratio test in the SAS system

PROC TTEST COCHRAN CI = EQUAL UMPU;
CLASS University;
VAR Income;
RUN;

Remarks:

(a) The CLASS statement specifies the variable that distinguishes the groups being compared

(b) The VAR statement specifies the response variable used in the computations

(c) The COCHRAN option creates and displays the p-values for the two-sample t-test under the unequal-variance situation, using the Cochran and Cox (1950) approximation to the probability level of the approximate test statistic (i.e. the t-statistic)

(d) The CI= option is used to provide equal-tailed and uniformly most powerful unbiased (UMPU) confidence intervals for the population variances

The SAS output is shown as follows:

Comparison of Average Monthly Incomes 1 21:27 Sunday, September 21, 2003

The TTEST Procedure

Statistics

Variable   University   N    Lower CL Mean   Mean   Upper CL Mean   Lower CL Std Dev (equal)   Lower CL Std Dev (UMPU)   Std Dev   Upper CL Std Dev (UMPU)

Income     A            10   9.5244          13.2   16.876          3.5342                     3.4208                    5.1381    8.9697
Income     B            10   8.493           10.8   13.107          2.2182                     2.147                     3.2249    5.6298
Income     Diff (1-2)        -1.63           2.4    6.4303          3.2412                     3.187                     4.2895    6.2115

Statistics (continued)

Variable   University   Upper CL Std Dev (equal)   Std Err   Minimum   Maximum

Income     A            9.3802                     1.6248    7         21
Income     B            5.8874                     1.0198    6         16
Income     Diff (1-2)   6.3435                     1.9183

T-Tests

Variable Method Variances DF t Value Pr > |t|

Income     Pooled           Equal      18      1.25     0.2269
Income     Satterthwaite    Unequal    15.1    1.25     0.2299

Income Cochran Unequal 9 1.25 0.2424

Equality of Variances

Variable Method Num DF Den DF F Value Pr > F

Income Folded F 9 9 2.54 0.1814
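As a quick arithmetic check of the output (hand calculation, rounded): sp² = [9 × 5.1381² + 9 × 3.2249²]/18 ≈ 18.4, so sp ≈ 4.29 and the standard error of the difference is 4.29 × √(1/10 + 1/10) ≈ 1.918; hence the pooled t statistic is (13.2 - 10.8)/1.918 ≈ 1.25 with 18 degrees of freedom, matching the 'Pooled' row. Likewise, the folded F statistic is 5.1381²/3.2249² ≈ 2.54, and the Satterthwaite degrees of freedom are (2.64 + 1.04)²/[(2.64² + 1.04²)/9] ≈ 15.1, where 2.64 = s1²/n1 and 1.04 = s2²/n2, both as reported above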

Remarks:

• The PROC TTEST statement provides the test statistic under the assumption that the population variances of the two groups are equal

• It can also provide an approximation to the test statistic under the assumption that the population variances of the two groups are not equal

In this case, Satterthwaite's (1946) approximation is used to compute the degrees of freedom associated with the approximate test statistic, while its probability level is computed by the Cochran and Cox (1950) approximation

• The two-sided test for equality of population variances (i.e. the two-sided F-test) is also provided by the PROC TTEST statement

~ End of Chapter 3~
