Department of Mathematics Faculty of Science and Engineering City University of Hong Kong
MA3518: Applied Statistics
Chapter 3: Statistical Estimation and Hypothesis Testing
Statistical inference is the process of extracting useful information from the observed data about the unknown underlying mechanism that generates the data. It can be considered the basis of many other branches of statistical science. Three important topics in statistical inference are point estimation, interval estimation and hypothesis testing. This chapter provides a brief introduction to some basic but essential concepts and techniques in statistical inference, for instance, the construction of confidence intervals and the one-sample and two-sample t-tests for an unknown population mean. We will also discuss in some detail the use of SAS procedures to perform these tasks. Topics included in this chapter are listed as follows:
Section 3.1: Statistical Inference: One-sample and Two-sample Cases
Section 3.2: Statistical computing by SAS
Extract Useful Information From Data!
Section 3.1: Statistical Inference: One-sample and Two-sample Cases
1. Statistical inference:
The process of extracting useful information from the observed data about the unknown underlying mechanism that generates the data
Draw inferences about the values of unknown parameters in a probability distribution for a random sample based on the observed data in the sample
2. Real-life Examples:
Estimate the average weight of female students in a university based on the weights of a sample of female students chosen from the population
Estimate the average monthly income of the negative-asset group in Hong Kong and indicate the precision of the estimate based on a sample of negative-asset people
3. Three methods for statistical inference:
Point Estimation
Interval Estimation
Hypothesis testing
4. Point Estimation:
Estimate an unknown parameter of a probability distribution by a single number evaluated from sample data
The single number is called a sample statistic or a statistic
A simple example:
Given a random sample of size n with observations {x1, x2,…, xn}
Estimate the population mean μ based on the set of observations
The sample mean or sample average x̄ is given by:
x̄ = (1/n)(x1 + x2 + … + xn)
Use the sample mean x̄ as a point estimate of μ
Prior to sampling, each observation of the sample is unknown and the sample can be represented by a set of random variables {X1, X2, …, Xn}
In this case, the sample mean X̄ is a random variable given by:
X̄ = (1/n)(X1 + X2 + … + Xn)
We call X̄ a point estimator of μ
Since X̄ is a random variable, one can determine its statistical properties by a probability distribution
We call the probability distribution for X̄ the sampling distribution for X̄ (see Lecture Note: Chapter one)
If the probability distribution for each of {X1, X2, …, Xn} is normal, then the exact distribution of X̄ is N(μ, σ²/n)
5. Central Limit Theorem (CLT): A fundamental theorem of statistical inference
Suppose μ and σ² are the population mean and the population variance, respectively
The sampling distribution for X̄ can be approximated accurately by a normal distribution with mean μ and variance σ²/n when the sample size n is sufficiently large (i.e. n ≥ 30)
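As an illustration (not part of the original notes), the following minimal SAS sketch simulates 1000 sample means of size n = 30 from a skewed exponential distribution and then checks the normality of the simulated means; the data set name clt_demo and all numbers are chosen only for illustration:

DATA clt_demo;
CALL STREAMINIT(123);                      /* fix the random number seed               */
DO sample = 1 TO 1000;                     /* draw 1000 independent samples            */
   xbar = 0;
   DO i = 1 TO 30;                         /* each sample has n = 30 observations      */
      xbar = xbar + RAND('EXPONENTIAL');   /* a skewed (non-normal) parent distribution */
   END;
   xbar = xbar / 30;                       /* the sample mean of one sample            */
   OUTPUT;
END;
KEEP sample xbar;
RUN;
PROC UNIVARIATE DATA = clt_demo NORMAL;    /* normality tests on the simulated means   */
VAR xbar;
RUN;

Even though the parent distribution is skewed, the distribution of the simulated sample means should look approximately normal, which is what the CLT asserts.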
6. Interval Estimation:
Precision of a point estimate:
(a) The reciprocal of the standard deviation of the point estimate
(b) SD(X̄) = σ/√n (i.e. SD(X̄) is the standard error of X̄)
(c) Precision of estimate increases as the sample size n does
Point estimation: Cannot tell anything about the precision of the estimator
Interval Estimation: Use an interval to estimate an unknown parameter in order to indicate the precision of the estimator
Case I: σ² is known
Consider a random sample {X1, X2, …, Xn} of size n
For sufficiently large n, X̄ ~ N(μ, σ²/n) approximately, even if the distribution of each Xi (i = 1, 2, …, n) is not normal
Suppose the population variance σ² is known
Estimate the unknown population mean μ by constructing an interval estimate for the mean based on the approximate distribution N(μ, σ²/n) for X̄
In particular, a 95% confidence interval for μ is given by:
(X̄ − 1.96σ/√n, X̄ + 1.96σ/√n)
As n becomes large, the confidence interval becomes narrower
=> A higher level of precision is achieved
Interpretation of the confidence interval:
P(X̄ − 1.96σ/√n ≤ μ ≤ X̄ + 1.96σ/√n) = P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = P(−1.96 ≤ Z ≤ 1.96) = 95%, where Z ~ N(0, 1) (from the standard normal table)
The values of the upper limit and the lower limit of the confidence interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n) can be determined each time the experiment of random sampling is conducted. If we conduct the random experiment 100 times, we get 100 realizations of the confidence interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n)
Suppose the experiment of random sampling is repeated many times
Then, the frequentist's interpretation of the 95% confidence interval is that 95% of the realizations of the confidence interval contain the unknown mean μ
In general, a 100(1 − α)% confidence interval for μ is given by:
(X̄ − zα/2 σ/√n, X̄ + zα/2 σ/√n)
where zα/2 is the critical value of the standard normal distribution and is given by:
P(Z ≥ zα/2) = α/2
We can check the value of zα/2 from the standard normal table
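As a computational aside (not in the original notes), the critical value zα/2 and the interval limits can also be obtained in SAS with the QUANTILE function; the summary figures below (x̄ = 10.33, σ = 4, n = 15) are hypothetical:

DATA _NULL_;
alpha = 0.05;
xbar  = 10.33;                            /* hypothetical sample mean             */
sigma = 4;                                /* population SD, assumed known         */
n     = 15;                               /* sample size                          */
z     = QUANTILE('NORMAL', 1 - alpha/2);  /* z_{alpha/2} = 1.96 for alpha = 0.05  */
lower = xbar - z*sigma/SQRT(n);
upper = xbar + z*sigma/SQRT(n);
PUT z= lower= upper=;                     /* prints the critical value and the CI */
RUN;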
Remarks:
(a) The 100(1 − α)% confidence interval is just an approximation when the probability distribution of Xi (i = 1, 2, …, n) is non-normal
(b) The result of the approximation is reasonably good when the sample size n ≥ 10 for a moderately non-normal distribution and when n ≥ 30 for a highly skewed and/or heavy-tailed distribution
(c) An exact 100(1 − α)% confidence interval can be obtained when each of {X1, X2, …, Xn} is normally distributed
(d) One can justify the normality assumption by various tests of normality
Case II: σ² is unknown
An unbiased point estimator of σ²:
s² = [(X1 − X̄)² + (X2 − X̄)² + … + (Xn − X̄)²]/(n − 1)
An approximate 100(1 − α)% confidence interval for the unknown population mean μ is given by:
(X̄ − tα/2(n − 1) s/√n, X̄ + tα/2(n − 1) s/√n)
where tα/2(n − 1) is the critical value of the Student's t distribution with n − 1 degrees of freedom
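A similar sketch (again not from the notes) for the unknown-variance case simply replaces the normal quantile with the t quantile; alternatively, PROC MEANS with the CLM option (see Section 3.2) computes this interval directly from the raw data. The summary figures below are hypothetical:

DATA _NULL_;
alpha = 0.05;
n     = 15;                                 /* hypothetical sample size            */
xbar  = 10.33;                              /* hypothetical sample mean            */
s     = 4.19;                               /* hypothetical sample std deviation   */
t     = QUANTILE('T', 1 - alpha/2, n - 1);  /* t_{alpha/2}(n - 1) critical value   */
lower = xbar - t*s/SQRT(n);
upper = xbar + t*s/SQRT(n);
PUT t= lower= upper=;
RUN;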
7. Hypothesis Testing:
One-sample test:
Test whether a parameter is different from a given value (or a given range of values)
Two-sample test:
Test whether the parameters in two different samples are the same as each other
Statistical hypotheses:
Null hypothesis H0 vs. Alternative hypothesis H1
Significance level α:
(a) Determine the critical value for a test
(b) Specify the reliability of the test
(c) Must be given in advance
(d) α is the probability of a Type I error (i.e. the probability of rejecting H0 when H0 is true)
Testing procedures: (One-sample case)
(a) Case I: σ² is known
Given a random sample {X1, X2, …, Xn} of size n from a normal distribution with unknown mean μ but known variance σ²
Test the following hypotheses at significance level 5%:
H0: μ = μ0 vs. H1: μ > μ0 (One-sided test)
Here, μ0 is a given number
Under H0, X̄ ~ N(μ0, σ²/n)
Test statistic under H0:
Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1)
Decision: Reject H0 if Z > z0.05 = 1.645 or the p-value = P(Z > zobs) < 0.05
If n is large (i.e. n ≥ 10 for the moderately non-normal case or n ≥ 30 for the highly skewed case), the above result is a reasonably good approximation even when the probability distribution for the random sample {X1, X2, …, Xn} is not normal
Other choices of the alternative hypothesis H1 can be
(i) H1: μ < μ0 (One-sided test)
(ii) H1: μ ≠ μ0 (Two-sided test)
Decision of the test for H1: μ < μ0:
Reject H0 if Z < −z0.05 = −1.645 or the p-value = P(Z < −zobs) < 0.05
Decision of the test for H1: μ ≠ μ0:
Reject H0 if |Z| > z0.025 = 1.96 or the p-value = P(|Z| > zobs) < 0.05
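To make the decision rules concrete, here is a minimal sketch (not in the original notes) that computes the z statistic and the p-values for the three alternatives; the summary figures are hypothetical:

DATA _NULL_;
xbar  = 10.33;                             /* hypothetical sample mean       */
mu0   = 10;                                /* hypothesized mean under H0     */
sigma = 4;                                 /* population SD, assumed known   */
n     = 15;
z       = (xbar - mu0) / (sigma/SQRT(n));  /* test statistic                 */
p_upper = 1 - PROBNORM(z);                 /* p-value for H1: mu > mu0       */
p_lower = PROBNORM(z);                     /* p-value for H1: mu < mu0       */
p_two   = 2*(1 - PROBNORM(ABS(z)));        /* p-value for H1: mu not = mu0   */
PUT z= p_upper= p_lower= p_two=;
RUN;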
(b) Case II: σ² is unknown
Consider a random sample {X1, X2, …, Xn} of size n from a normal distribution with unknown population mean μ and unknown population variance σ²
Test the hypotheses at significance level 5%:
H0: μ = μ0 vs. H1: μ > μ0
Replace σ² by the sample variance:
s² = [(X1 − X̄)² + (X2 − X̄)² + … + (Xn − X̄)²]/(n − 1)
Test statistic under H0:
T = (X̄ − μ0)/(s/√n) ~ t(n − 1)
Decision: Reject H0 if T > t0.05 (n – 1) or the p-value = P(T > tobs) < 0.05
Decision for the test when H1: μ < μ0:
Reject H0 if T < - t0.05 (n - 1) or the p-value = P(T < - tobs) < 0.05
Decision for the test when H1: μ ≠ μ0:
Reject H0 if |T| > t0.025 (n − 1) or the p-value = P(|T| > tobs) < 0.05
If n is large (i.e. n > 10 for the moderately non-normal case or n > 30 for the highly skewed and/or heavy-tailed case), the above result can be a reasonably good approximation even when the random sample {X1, X2, …, Xn} does not follow a normal distribution
When n is very large, the Student's t distribution with n − 1 degrees of freedom, say t(n − 1), tends to the standard normal distribution N(0, 1).
In this case, under H0
T = (X̄ − μ0)/(s/√n) ~ N(0, 1) approximately
Decision for the test at significance level 5% when the alternative hypothesis is H1: μ ≠ μ0:
Decision: Reject H0 if | Z | > z0.025 = 1.96 or the p-value = P(|Z| > zobs) < 0.05
Decision for the test when H1: μ > μ0:
Reject H0 if Z > z0.05 = 1.645 or the p-value = P(Z > zobs) < 0.05
Decision for the test when H1: μ < μ0:
Reject H0 if Z < −z0.05 = −1.645 or the p-value = P(Z < −zobs) < 0.05
Large sample size:
(a) The sample average X̄ is approximately normal
(b) The estimate is more precise (i.e. A smaller standard error)
(c) The sample has a higher chance of representing well the major characteristics of the population
Nonparametric tests for median
Suppose the data do not follow a normal distribution, or we are not sure whether the data come from a normal distribution or not. This is particularly the case when we deal with ordinal data
The Central Limit Theorem (CLT) cannot be applied when the sample size is small
In this case, the parametric test based on the normality assumption cannot produce a reasonably good approximation
Adopt the nonparametric tests for median:
(a) Sign test: ‘Sign’ information
(b) Wilcoxon signed-rank test: ‘Sign’ information and Ordinal ‘Magnitude’ information
Basic assumption: The data follow a symmetric distribution with unknown form
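In SAS, both nonparametric tests are reported by PROC UNIVARIATE together with the t test, as the output shown in Section 3.2 illustrates; a minimal sketch, assuming a data set named CIExample with a variable Y (as in the Section 3.2 example) and hypothesized median 10:

PROC UNIVARIATE DATA = CIExample MU0 = 10;
VAR Y;     /* the output includes the Student's t, Sign and Signed Rank tests of location */
RUN;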
Testing procedures: (Two-independent sample test)
Case I: σ² is known
Consider the following random samples for the variable X and the variable Y:
X1, X2, …, Xn1 ~ N(μ1, σ²)
Y1, Y2, …, Yn2 ~ N(μ2, σ²)
Test the hypotheses at significance level α:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
Test statistic under H0:
Z = (X̄ − Ȳ)/[σ(1/n1 + 1/n2)^(1/2)] ~ N(0, 1)
Decision: Reject H0 if |Z| > zα/2 or the p-value = P(|Z| > zobs) < α
Other choices of alternative hypotheses are given by:
(a) H1: μ1 < μ2 (b) H1: μ1 > μ2
Decision for the test when H1: μ1 < μ2:
Reject H0 if Z < −zα or the p-value = P(Z < −zobs) < α
Decision for the test when H1: μ1 > μ2:
Reject H0 if Z > zα or the p-value = P(Z > zobs) < α
Case II: σ² is unknown
Test the hypotheses at significance level α:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
Estimate σ² by the pooled estimate of the common variance sp², defined as follows:
sp² := [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)
One can interpret sp² as a weighted average of the two sample variances, based on the combined sample of X and Y which consists of n1 + n2 observations
Test statistic under H0:
T = (X̄ − Ȳ)/[sp(1/n1 + 1/n2)^(1/2)] ~ t(n1 + n2 − 2)
Decision: Reject H0 if |T| > tα/2(n1 + n2 − 2) or the p-value = P(|T| > tobs) < α
Other choices of alternative hypotheses are given by:
(a) H1: μ1 < μ2 (b) H1: μ1 > μ2
Decision for the test when H1: μ1 < μ2:
Reject H0 if T < −tα(n1 + n2 − 2) or the p-value = P(T < −tobs) < α
Decision for the test when H1: μ1 > μ2:
Reject H0 if T > tα(n1 + n2 − 2) or the p-value = P(T > tobs) < α
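As a worked illustration (not in the original notes), the following sketch computes sp², the pooled t statistic and the two-sided p-value from the summary statistics of the income example in Section 3.2 (n1 = n2 = 10, means 13.2 and 10.8, standard deviations 5.1381 and 3.2249):

DATA _NULL_;
n1 = 10;  xbar1 = 13.2;  s1 = 5.1381;    /* summary statistics, group A */
n2 = 10;  xbar2 = 10.8;  s2 = 3.2249;    /* summary statistics, group B */
sp2 = ((n1-1)*s1**2 + (n2-1)*s2**2) / (n1 + n2 - 2);      /* pooled variance     */
t   = (xbar1 - xbar2) / (SQRT(sp2) * SQRT(1/n1 + 1/n2));  /* pooled t statistic  */
df  = n1 + n2 - 2;
p   = 2*(1 - PROBT(ABS(t), df));                          /* two-sided p-value   */
PUT sp2= t= df= p=;    /* roughly t = 1.25 and p = 0.23, matching the Pooled row in Section 3.2 */
RUN;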
Case III: σ1² and σ2² are different but known
Test the hypotheses at significance level α:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
Test statistic under H0:
Z = (X̄ − Ȳ)/[(σ1²/n1 + σ2²/n2)^(1/2)] ~ N(0, 1)
Decision: Reject H0 if |Z| > zα/2 or the p-value = P(|Z| > zobs) < α
Other choices of alternative hypotheses are given by:
(a) H1: μ1 < μ2 (b) H1: μ1 > μ2
Decision for the test when H1: μ1 < μ2:
Reject H0 if Z < −zα or the p-value = P(Z < −zobs) < α
Decision for the test when H1: μ1 > μ2:
Reject H0 if Z > zα or the p-value = P(Z > zobs) < α
Case IV: σ1² and σ2² are different and unknown
Test the hypotheses at significance level α:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
Estimate σ1² and σ2² by the sample variances s1² and s2², respectively
Test statistic under H0:
T = (X̄ − Ȳ)/[(s1²/n1 + s2²/n2)^(1/2)] ~ t(df)
df = (s1²/n1 + s2²/n2)² / {(s1²/n1)²[1/(n1 − 1)] + (s2²/n2)²[1/(n2 − 1)]}
Decision: Reject H0 if |T| > tα/2(df) or the p-value = P(|T| > tobs) < α
Other choices of alternative hypotheses are given by:
(a) H1: μ1 < μ2 (b) H1: μ1 > μ2
Decision for the test when H1: μ1 < μ2:
Reject H0 if T < −tα(df) or the p-value = P(T < −tobs) < α
Decision for the test when H1: μ1 > μ2:
Reject H0 if T > tα(df) or the p-value = P(T > tobs) < α
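The degrees-of-freedom formula above (Satterthwaite's approximation) can be evaluated with a short sketch (not in the notes), again using the summary statistics of the Section 3.2 income example:

DATA _NULL_;
n1 = 10;  s1 = 5.1381;    /* group A: size and sample SD */
n2 = 10;  s2 = 3.2249;    /* group B: size and sample SD */
v1 = s1**2/n1;  v2 = s2**2/n2;
df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1));   /* Satterthwaite approximation */
PUT df=;    /* about 15.1, matching the Satterthwaite row in the Section 3.2 output */
RUN;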
Remarks:
(a) If the random samples for X and Y are not normally distributed, we can only obtain approximate results for all of the two-independent sample tests
(b) If the sample sizes n1 and n2 are sufficiently large, a reasonably good approximation for the two-independent-sample test can be obtained
(c) The two-independent-sample t test can be considered a special case of ANOVA (i.e. the number of groups is two)
Test for equality of two variances (Variance-Ratio test or F-test):
Suppose X1, X2, …, Xm iid ~ N(μ1, σ1²) and Y1, Y2, …, Yn iid ~ N(μ2, σ2²)
Assume that the two samples are independent
μ1 and μ2 are unknown nuisance parameters, while σ1² and σ2² are the unknown parameters of interest
Test H0: σ1² = σ2² vs. H1: σ1² > σ2² (One-sided test) at significance level α
Point estimates of σ1² and σ2² are the sample variances s1² and s2², respectively
Test statistic under H0:
F = s1²/s2² ~ F(m − 1, n − 1)
Decision: Reject H0 if F > Fα(m − 1, n − 1) or the p-value = P(F > fobs) < α
Question: What are the testing procedures if we consider the one-sided alternative H1: σ1² < σ2² and the two-sided alternative H1: σ1² ≠ σ2²?
Decision when H1: σ1² < σ2²:
Reject H0 if F < F1−α(m − 1, n − 1) or the p-value = P(F < fobs) < α
Decision when H1: σ1² ≠ σ2² (i.e. Two-sided test):
(a) Based on the critical values of the F-distribution, we reject H0 if
F < F1−α/2(m − 1, n − 1) or F > Fα/2(m − 1, n − 1)
(b) Based on the p-value, we reject H0 if the p-value
p = P(F > fobs) + P(F < 1/fobs) < α
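The two-sided p-value above can be evaluated with the PROBF function; a minimal sketch (not in the notes) using the sample standard deviations of the Section 3.2 income example:

DATA _NULL_;
s1 = 5.1381;  m = 10;       /* group A: sample SD and size */
s2 = 3.2249;  n = 10;       /* group B: sample SD and size */
f = s1**2 / s2**2;                                      /* variance ratio     */
p = (1 - PROBF(f, m-1, n-1)) + PROBF(1/f, m-1, n-1);    /* two-sided p-value  */
PUT f= p=;    /* roughly F = 2.54 and p = 0.18, matching the Folded F row in Section 3.2 */
RUN;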
An important remark:
Before performing the two-sample t test, one can perform the two-sided variance-ratio test to decide whether σ1² = σ2² or σ1² ≠ σ2²
Testing procedures: (Paired t test)
Given a random sample of n paired observations (x1, y1), (x2, y2),…, (xn, yn) for the variables X and Y which are dependent or highly correlated
We are interested in studying the characteristics of the difference between the two variables X and Y. In this case, the paired t test is a good choice
Define a new variable Z by:
Z = X – Y,
where Z represents the difference of X and Y
Assume that Z ~ N(μz, σz²), where the population variance σz² is unknown
Test H0: μz = 0 vs. H1: μz ≠ 0 (i.e. two-sided test) at significance level α
Observations for Z are then given by:
zi := xi - yi, for i = 1, 2, …, n
Let z̄ and sz denote the sample mean and the sample standard deviation of the observations {z1, z2, …, zn}, respectively
Test statistic under H0:
T = (z̄ − 0)/(sz/√n) ~ t(n − 1)
Decision: Reject H0 if |T| > tα/2(n − 1) or the p-value = P(|T| > tobs) < α
Decision for the test when H1: μz > 0:
Reject H0 if T > tα(n − 1) or the p-value = P(T > tobs) < α
Decision for the test when H1: μz < 0:
Reject H0 if T < −tα(n − 1) or the p-value = P(T < −tobs) < α
Section 3.2: Statistical computing by SAS
1. Construct confidence intervals for the population mean
Try to run the following SAS program
DATA CIExample;
INPUT Y;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC MEANS DATA = CIExample ALPHA = 0.05 N MEAN STD CLM;
VAR Y;
RUN;

Options:
N – display the number of observations in the SAS output
MEAN – display the mean of the data
STD – display the standard deviation of the data
CLM – display the confidence interval for the mean
The SAS output is given by:
The SAS System 16:49 Wednesday, August 20, 2003 1
The MEANS Procedure
Analysis Variable : Y
N     Mean          Std Dev      Lower 95% CL for Mean    Upper 95% CL for Mean
15    10.3333333    4.1861449    8.0151235                12.6515431

From the SAS output, the 95% confidence interval for the population mean is given by:
(8.0151235, 12.6515431)
Since we are not sure whether the random sample comes from a normal distribution or not, the 95% confidence interval may only be an approximation to the exact one.
The approximation is reasonably good if the underlying probability distribution for the random sample is moderately non-normal since 10 < n < 30
2. Hypothesis testing for the population mean μ:
Use the PROC UNIVARIATE statement to perform a two-sided t test for the following hypotheses on the unknown population mean μ
Test H0: μ = 0 vs. H1: μ ≠ 0 (by default)
Test for a non-zero value μ0 for the unknown mean
Test H0: μ = μ0 vs. H1: μ ≠ μ0
Use PROC UNIVARIATE on the original variable by adding the option MU0 = μ0. Another common option is NORMAL, which tests the normality assumption.
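For instance, a minimal sketch (assuming the CIExample data set from part 1 above) that tests H0: μ = 10 directly and also requests the normality tests:

PROC UNIVARIATE DATA = CIExample MU0 = 10 NORMAL;
VAR Y;
RUN;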
Alternatively:
Step I: Perform the following data manipulation:
NEW VARIABLE = ORIGINAL VARIABLE − μ0
Step II: Use the PROC UNIVARIATE statement to perform a two-sided t test on whether the population mean of the new variable equals zero or not.
Perform a one-sided t test
Test H0: μ = μ0 vs. H1: μ > μ0
Step I: Use the PROC UNIVARIATE statement to perform a two-sided t test
Step II: Divide the p-value from the SAS output by two
Step III:
Draw conclusion based on the p-value divided by two
(i.e. Reject H0 if the p-value divided by two < 0.05 if the significance level is 5%)
Example:
Test H0: μ = 10 vs. H1: μ > 10 at significance level α = 5% based on the dataset specified by the following SAS data step
DATA CIExample;
INPUT Y;
Z = Y - 10;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC UNIVARIATE DATA = CIExample;
VAR Z;
RUN;
The part of SAS output relevant to testing the hypotheses is given by:
The SAS System 12:26 Thursday, August 21, 2003 1
The UNIVARIATE Procedure Variable: Z
Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t   0.308397    Pr > |t|     0.7623
Sign           M        0.5    Pr >= |M|    1.0000
Signed Rank    S          4    Pr >= |S|    0.8054
From the SAS output, the p-value of the Student's t test for the two-sided test of whether the population mean of the variable Z is equal to zero or not is 0.7623. Hence, the p-value divided by two is 0.381, which is greater than 0.05 (for the one-sided t test)
Conclusion: Do not reject H0: μ = 10 at significance level 5%
3. SAS Procedure for paired t-test:
Use the PROC TTEST statement to perform a paired t test in the SAS system
Example:
Consider the dataset and the variable Z (i.e. the difference of two variables X and Y) defined by the following SAS data step:
DATA Pairexample;
INPUT Y X;
Z = X - Y;
CARDS;
10 11
8 12
12 13
10 11
11 12
9 10
12 8
13 6
7 11
6 10
15 12
5 13
3 6
16 17
18 16
;
Assume that Z ~ N(μz, σz²), where σz² is unknown
Test H0: μz = 0 vs. H1: μz ≠ 0 (two-sided t test) at significance level α = 0.05
The SAS procedure for testing the above hypotheses is given by:
PROC TTEST DATA = Pairexample;
VAR Z;
RUN;
It performs a two-sided t test for the hypotheses H0: μz = 0 vs. H1: μz ≠ 0
The SAS output is given as follows:
The SAS System 16:12 Thursday, August 21, 2003 1
The TTEST Procedure
Statistics
Variable    N     Lower CL Mean    Mean      Upper CL Mean    Lower CL Std Dev    Std Dev    Upper CL Std Dev    Std Err
Z           15    -1.193           0.8667    2.9267           2.7235              3.72       5.8667              0.9605
T-Tests
Variable DF t Value Pr > |t|
Z 14 0.90 0.3822
From the SAS result, the p-value is given by 0.3822 > 0.05
Hence, we do not reject H0: μz = 0 at significance level α = 0.05
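As an aside (not in the original notes), recent releases of PROC TTEST also provide a PAIRED statement that forms the differences internally, so the manual step of creating Z = X − Y can be skipped; a minimal sketch using the Pairexample data set above:

PROC TTEST DATA = Pairexample;
PAIRED X*Y;      /* tests H0: the mean of X - Y equals zero */
RUN;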
4. SAS Procedure for the two-independent-sample t test and the variance-ratio test
Use the PROC TTEST statement to perform the two-sided two-independent-sample t test and the two-sided variance-ratio test (or two-sided F-test) in the SAS system
Example:
Suppose we are interested in the difference between the average monthly income of graduates from university A and that of graduates from university B. The sample data for the monthly incomes (quoted in units of a thousand dollars) of the graduates from the two universities are summarized as follows:
A: 10 8 9 12 11 16 20 7 21 18
B: 11 7 6 10 12 11 12 8 16 15
We want to test whether there is a significant difference between the average monthly income μA of the graduates from university A and the average monthly income μB of the graduates from university B
Test H0: μA = μB vs. H1: μA ≠ μB at significance level α
Use the following statements to read the data to the SAS system:
Title 'Comparison of Average Monthly Incomes';
DATA MI;
INPUT University $ Income @@;
DATALINES;
A 10 A 8 A 9 A 12 A 11 A 16 A 20 A 7 A 21 A 18
B 11 B 7 B 6 B 10 B 12 B 11 B 12 B 8 B 16 B 15
;
RUN;
Remarks:
(a) The ‘$’ sign in the input statement indicates that University is a character variable
(b) The '@@' sign in the input statement enables the DATA step to read more than one observation per line
Use the PROC TTEST statement to perform the t-test and the Variance-ratio test in the SAS system
PROC TTEST COCHRAN CI = EQUAL UMPU;
CLASS University;
VAR Income;
RUN;
Remarks:
(a) The CLASS statement specifies the variable that distinguishes the groups being compared
(b) The VAR statement specifies the response variable used in the computations
(c) The COCHRAN option computes and displays the p-values for the two-sample t-test under the unequal-variance situation using the Cochran and Cox (1950) approximation to the probability level of the approximate test statistic (i.e. the t-statistic)
(d) The CI = option is used to provide equal-tailed and uniformly most powerful unbiased (UMPU) confidence intervals for the population standard deviations
The SAS output is shown as follows:
Comparison of Average Monthly Incomes 1 21:27 Sunday, September 21, 2003
The TTEST Procedure
Statistics
Variable    University    N     Lower CL Mean    Mean    Upper CL Mean    Equal Lower CL Std Dev    UMPU Lower CL Std Dev    Std Dev    UMPU Upper CL Std Dev
Income      A             10    9.5244           13.2    16.876           3.5342                    3.4208                   5.1381     8.9697
Income      B             10    8.493            10.8    13.107           2.2182                    2.147                    3.2249     5.6298
Income      Diff (1-2)          -1.63            2.4     6.4303           3.2412                    3.187                    4.2895     6.2115
Statistics
Variable    University    Equal Upper CL Std Dev    Std Err    Minimum    Maximum
Income      A             9.3802                    1.6248     7          21
Income      B             5.8874                    1.0198     6          16
Income      Diff (1-2)    6.3435                    1.9183
T-Tests
Variable Method Variances DF t Value Pr > |t|
Income     Pooled           Equal      18      1.25    0.2269
Income     Satterthwaite    Unequal    15.1    1.25    0.2299
Income Cochran Unequal 9 1.25 0.2424
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
Income Folded F 9 9 2.54 0.1814
Remarks:
The PROC TTEST statement provides the test statistic under the assumption that the population variances of the two groups are equal
It can also provide an approximation to the test statistic under the assumption that the population variances of the two groups are not equal
In this case, Satterthwaite's (1946) approximation is used to compute the degrees of freedom associated with the approximation to the test statistic while the probability level for the approximation to the test statistic is computed by the Cochran and Cox (1950) approximation
The two-sided test for equality of population variances (i.e. the two-sided F-test) is also provided by the PROC TTEST statement
~ End of Chapter 3~