AMS 572 Class Notes
Nov. 7, 2006
Chapter 8: Inference on two population means.
1. Paired samples design
2. Independent sample design
   1) Pooled variance t-test and CI (normal populations, checked using the Shapiro-Wilk test; population variances unknown but equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$)
   2) Unpooled variance (Satterthwaite) t-test and CI (normal populations, population variances unknown and unequal)
3. Comparison of two population variances: both populations normal, the F-test
Two independent random samples
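\[ X_1, X_2, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2), \qquad Y_1, Y_2, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2), \]
with the two samples independent of each other.

\[ H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \iff H_0: \sigma_1^2 = \sigma_2^2 \]
\[ H_a: \frac{\sigma_1^2}{\sigma_2^2} \neq 1 \iff H_a: \sigma_1^2 \neq \sigma_2^2 \]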
Pivotal Quantity
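a) Start with the point estimator for $\sigma_1^2/\sigma_2^2$, namely $S_1^2/S_2^2$. The two scaled sample variances are independent, with
\[ \frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}. \]

Definition (F-distribution): Let $W_1 \sim \chi^2_{k_1}$ and $W_2 \sim \chi^2_{k_2}$, where $W_1$ and $W_2$ are independent. Then
\[ F = \frac{W_1/k_1}{W_2/k_2} \sim F_{k_1,\,k_2}. \]

P.Q.:
\[ F = \frac{\dfrac{(n_1-1)S_1^2}{(n_1-1)\sigma_1^2}}{\dfrac{(n_2-1)S_2^2}{(n_2-1)\sigma_2^2}} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1} \]

Test Statistic:
\[ F_0 = \frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1} \]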
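At significance level $\alpha$, we reject $H_0$ in favor of $H_a$ iff
\[ F_0 \ge F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{upper}} \quad \text{or} \quad F_0 \le F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{lower}}. \]

For $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_a: \sigma_1^2/\sigma_2^2 > 1$: reject $H_0$ iff $F_0 \ge F_{n_1-1,\,n_2-1,\,\alpha,\,\text{upper}}$.

For $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_a: \sigma_1^2/\sigma_2^2 < 1$: reject $H_0$ iff $F_0 \le F_{n_1-1,\,n_2-1,\,\alpha,\,\text{lower}}$.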
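As a numerical illustration of the two-sided rule, the sketch below computes $F_0$ for the concrete-block data from the example later in these notes. The critical value 9.60 is an assumption: it is the tabulated $F_{4,4,0.025,\text{upper}}$ read from a standard F table and hard-coded here, and the lower critical value is taken as its reciprocal, which works in this case because both degrees of freedom are equal. The function name is illustrative only.

```python
from statistics import variance

def f_test_equal_variances(x, y, f_upper, f_lower):
    """Two-sided F-test of H0: sigma1^2 = sigma2^2 for two normal samples.

    f_upper and f_lower are the alpha/2 upper and lower critical values of
    F(n1-1, n2-1); the caller supplies them (for example from a table).
    """
    f0 = variance(x) / variance(y)   # S1^2 / S2^2, using sample variances
    reject = f0 >= f_upper or f0 <= f_lower
    return f0, reject

# Concrete-block data from the example in these notes.
new = [14, 15, 13, 15, 16]
old = [13, 15, 13, 12, 14]

F_UPPER = 9.60          # assumed table value for F(4, 4), upper 2.5% point
F_LOWER = 1 / F_UPPER   # reciprocal relation, valid here since df1 == df2

f0, reject = f_test_equal_variances(new, old, F_UPPER, F_LOWER)
print(f"F0 = {f0:.2f}, reject H0: {reject}")
```

Both samples have sample variance 1.3, so $F_0 = 1$ and the test does not reject equal variances at $\alpha = 0.05$.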
Deriving Confidence Interval:
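\[ P\left( F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{lower}} \le F \le F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{upper}} \right) = 1-\alpha \]
\[ \iff P\left( F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{lower}} \le \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \le F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{upper}} \right) = 1-\alpha \]
\[ \iff P\left( \frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{upper}}} \cdot \frac{S_1^2}{S_2^2} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{lower}}} \cdot \frac{S_1^2}{S_2^2} \right) = 1-\alpha \]

Thus the $100(1-\alpha)\%$ CI for $\sigma_1^2/\sigma_2^2$ is:
\[ \left( \frac{S_1^2/S_2^2}{F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{upper}}},\; \frac{S_1^2/S_2^2}{F_{n_1-1,\,n_2-1,\,\alpha/2,\,\text{lower}}} \right) \]

Since $F \sim F_{k_1,k_2} \Rightarrow \dfrac{1}{F} \sim F_{k_2,k_1}$, we have
\[ P\left( F \le F_{k_1,k_2,\alpha,L} \right) = \alpha \iff P\left( \frac{1}{F} \ge \frac{1}{F_{k_1,k_2,\alpha,L}} \right) = \alpha, \]
so that
\[ F_{n_2-1,\,n_1-1,\,\alpha,\,U} = \frac{1}{F_{n_1-1,\,n_2-1,\,\alpha,\,L}}. \]

Thus the CI is:
\[ \left( \frac{S_1^2/S_2^2}{F_{n_1-1,\,n_2-1,\,\alpha/2,\,U}},\; \frac{S_1^2}{S_2^2} \cdot F_{n_2-1,\,n_1-1,\,\alpha/2,\,U} \right) \]

Nov. 9, 2006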
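4. Large independent samples (both samples are large): $n_1 \ge 30$, $n_2 \ge 30$

P.Q. for $\mu_1 - \mu_2$

\[ X_1, X_2, \ldots, X_{n_1} \overset{iid}{\sim} ?(\mu_1, \sigma_1^2), \qquad Y_1, Y_2, \ldots, Y_{n_2} \overset{iid}{\sim} ?(\mu_2, \sigma_2^2) \]
Here "?" denotes an unspecified distribution, not necessarily normal.

Start with the point estimator $\bar{X} - \bar{Y}$, which is an unbiased estimator for $\mu_1 - \mu_2$:
\[ E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2 \]

Because both samples are large, the central limit theorem gives, approximately,
\[ \bar{X} \overset{\cdot}{\sim} N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \qquad \bar{Y} \overset{\cdot}{\sim} N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right), \qquad \text{independent,} \]
and therefore
\[ \bar{X} - \bar{Y} \overset{\cdot}{\sim} N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \]

• If $W$ and $V$ are independent, then $Var(W+V) = Var(W) + Var(V)$ and $Var(W-V) = Var(W) + Var(V)$.
• For any R.V.s $W$ and $V$, $Var(W+V) = Var(W) + Var(V) + 2\,Cov(W,V)$ and $Var(W-V) = Var(W) + Var(V) - 2\,Cov(W,V)$.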
Proof. Let $X \sim ?(\mu, \sigma^2)$ with p.d.f. $f(x)$; assume $X$ is continuous. Then
\[ \sigma^2 = Var(X) = E[(X - E(X))^2] = E[(X-\mu)^2] = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\,dx = E(X^2) - (EX)^2 = E(X^2) - \mu^2. \]
Applying the definition to $W+V$:
\[ Var(W+V) = E\{[(W+V) - E(W+V)]^2\} \]
\[ = E\{[(W-EW) + (V-EV)]^2\} \]
\[ = E[(W-EW)^2 + (V-EV)^2 + 2(W-EW)(V-EV)] \]
\[ = E(W-EW)^2 + E(V-EV)^2 + 2E[(W-EW)(V-EV)] \]
\[ = Var(W) + Var(V) + 2\,Cov(W,V), \]
where
\[ Cov(W,V) = E[(W-EW)(V-EV)], \qquad Correlation(W,V) = \rho = \frac{Cov(W,V)}{\sqrt{Var(W)\,Var(V)}}. \]

$Cov(W,V) = 0 \iff \rho = 0$: $W$ and $V$ are uncorrelated, meaning they do NOT have a LINEAR relationship. If they are independent, they do not have ANY relationship; in particular, independence implies $Cov(W,V) = 0$.
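The variance identities can be checked numerically. The following is a minimal sketch that treats five made-up $(W, V)$ pairs as an equally likely discrete joint distribution, so that every expectation is a plain average; the data values and helper names are hypothetical and not part of the notes.

```python
def mean(values):
    """Expectation under equal weights."""
    return sum(values) / len(values)

def pvar(values):
    """Var(X) = E[(X - EX)^2]."""
    m = mean(values)
    return mean([(x - m) ** 2 for x in values])

def pcov(a, b):
    """Cov(W, V) = E[(W - EW)(V - EV)]."""
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

# Hypothetical paired values, chosen only for illustration.
w = [2.0, 4.0, 4.0, 7.0, 9.0]
v = [1.0, 3.0, 2.0, 6.0, 5.0]

sum_wv = [a + b for a, b in zip(w, v)]
diff_wv = [a - b for a, b in zip(w, v)]

lhs_sum = pvar(sum_wv)
rhs_sum = pvar(w) + pvar(v) + 2 * pcov(w, v)
lhs_diff = pvar(diff_wv)
rhs_diff = pvar(w) + pvar(v) - 2 * pcov(w, v)
rho = pcov(w, v) / (pvar(w) * pvar(v)) ** 0.5   # correlation

print(lhs_sum, rhs_sum)
print(lhs_diff, rhs_diff)
print(rho)
```

Each printed pair agrees up to floating-point rounding. These values are positively correlated, so the covariance term increases the variance of the sum and reduces the variance of the difference.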
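Slutsky's Theorem: for large samples the unknown variances may be replaced by the sample variances, so that approximately
\[ \bar{X} \overset{\cdot}{\sim} N\left(\mu_1, \frac{S_1^2}{n_1}\right), \qquad \bar{Y} \overset{\cdot}{\sim} N\left(\mu_2, \frac{S_2^2}{n_2}\right), \]
then
\[ \bar{X} - \bar{Y} \overset{\cdot}{\sim} N\left(\mu_1 - \mu_2,\; \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right). \]

Pivotal Quantity
\[ Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \overset{\cdot}{\sim} N(0,1) \]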
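To make the pivotal quantity concrete, here is a small sketch that evaluates the estimated standard error and $Z$ from summary statistics. The function name, the `delta` parameter (the hypothesized value of $\mu_1 - \mu_2$), and all numbers in the usage lines are hypothetical and chosen only for illustration.

```python
from math import sqrt

def z_pivot(xbar, ybar, s1_sq, s2_sq, n1, n2, delta=0.0):
    """Return (standard error, Z) for two large independent samples.

    delta is the hypothesized value of mu1 - mu2.
    """
    se = sqrt(s1_sq / n1 + s2_sq / n2)   # sqrt(S1^2/n1 + S2^2/n2)
    z = (xbar - ybar - delta) / se
    return se, z

# Hypothetical summary statistics with n1, n2 >= 30.
se, z = z_pivot(xbar=52.1, ybar=50.3, s1_sq=16.0, s2_sq=25.0, n1=40, n2=50)
print(f"se = {se:.4f}, Z = {z:.4f}")
```

The approximation relies on both sample sizes being large, which is why the usage lines pick $n_1 = 40$ and $n_2 = 50$.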
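$100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$

\[ P\left( -Z_{\alpha/2} \le Z \le Z_{\alpha/2} \right) = 1-\alpha \]

\[ P\left( -Z_{\alpha/2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \le Z_{\alpha/2} \right) = 1-\alpha \]

\[ P\left( -Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \bar{X} - \bar{Y} - (\mu_1 - \mu_2) \le Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right) = 1-\alpha \]

\[ P\left( (\bar{X} - \bar{Y}) - Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X} - \bar{Y}) + Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right) = 1-\alpha \]

Hypothesis Test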
\[ H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 > 0 \]

Test Statistic
\[ Z_0 = \frac{(\bar{X} - \bar{Y}) - 0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \overset{\cdot}{\sim} N(0,1) \quad \text{under } H_0 \]

At significance level $\alpha$, we reject $H_0$ in favor of $H_a$ iff $Z_0 \ge Z_\alpha$.

For $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$: reject $H_0$ iff $Z_0 \le -Z_\alpha$.

For $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$: reject $H_0$ iff $|Z_0| \ge Z_{\alpha/2}$.

5. At least one population is NOT normal, and at least one sample is small (all cases except those in 1, 2, 4): nonparametric test, the Wilcoxon Rank Sum Test (Mann-Whitney U Test)
6. Sample size determination
7. SAS program and examples

Example: A new method of making concrete blocks has been proposed. To test whether or not the new method increases the compressive strength, five sample blocks are made by each method (in 10 pounds per inch$^2$).

New method: 14 15 13 15 16
Old method: 13 15 13 12 14

a. Get a 95% CI for the mean difference between the compressive strengths of the two methods.
b. At $\alpha = 0.05$, can you conclude the new method is better? Provide the p-value.
Independent samples. SAS Program:
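/* method: 1 = new method, 2 = old method */
data block;
  input method strength;
  datalines;
1 14
1 15
1 13
1 15
1 16
2 13
2 15
2 13
2 12
2 14
;
run;

/* The NORMAL option requests tests for normality within each method */
proc univariate data=block normal;
  class method;
  var strength;
run;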
Result from Shapiro Wilk test:
The UNIVARIATE Procedure
Variable: strength
method = 1

Tests for Normality

Test                  Statistic           p Value
Shapiro-Wilk          W      0.960859     Pr < W       0.8140
Kolmogorov-Smirnov    D      0.23714      Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.03991      Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.231804     Pr > A-Sq   >0.2500

p-value = 0.8140 > 0.1, so we do not reject normality for method 1.

The UNIVARIATE Procedure
Variable: strength
method = 2

Tests for Normality

Test                  Statistic           p Value
Shapiro-Wilk          W      0.960859     Pr < W       0.8140
Kolmogorov-Smirnov    D      0.23714      Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.03991      Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.231804     Pr > A-Sq   >0.2500

p-value = 0.8140 > 0.1, so we do not reject normality for method 2.
SAS program for T-test:
proc ttest data=block;
  class method;
  var strength;
run;
Results:

The TTEST Procedure

Equality of Variances

Variable    Method      Num DF    Den DF    F Value    Pr > F
strength    Folded F    4         4         1.00       1.0000

p-value = 1, so we do not reject $H_0: \sigma_1^2 = \sigma_2^2$.
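With equal variances plausible, the pooled-variance quantities can also be computed by hand. The sketch below assumes the standard pooled formulas, which are not derived in this section: $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ and $T_0 = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}}$ with $n_1 + n_2 - 2$ degrees of freedom. The critical value 2.306 is the tabulated $t_{8,\,0.025}$, hard-coded here as an assumption taken from a standard t table.

```python
from math import sqrt
from statistics import mean, variance

# Concrete-block data from the example.
new = [14, 15, 13, 15, 16]
old = [13, 15, 13, 12, 14]
n1, n2 = len(new), len(old)

diff = mean(new) - mean(old)                       # X-bar minus Y-bar
df = n1 + n2 - 2
sp_sq = ((n1 - 1) * variance(new) + (n2 - 1) * variance(old)) / df
se = sqrt(sp_sq * (1 / n1 + 1 / n2))               # pooled standard error
t0 = diff / se                                     # pooled t statistic

T_CRIT = 2.306   # assumed table value t_{8, 0.025}
ci = (diff - T_CRIT * se, diff + T_CRIT * se)      # 95% CI for mu1 - mu2

print(f"t0 = {t0:.2f} on {df} df")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

This gives $t_0 \approx 1.66$ on 8 degrees of freedom and a 95% CI of roughly $(-0.46, 2.86)$ for part a of the example; the interval contains 0.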
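T-Tests

Variable    Method           Variances    DF    t Value    Pr > |t|
strength    Pooled           Equal        8     1.66       0.1347
strength    Satterthwaite    Unequal      8     1.66       0.1347

So we shall use the pooled method. The reported p-value is two-sided; since the t value is positive, which is the direction of $H_a$ (new method stronger), the one-sided p-value = 0.1347/2 = 0.06735. Because 0.06735 > 0.05, at $\alpha = 0.05$ we cannot conclude that the new method is better.

If at least one population is not normal, we use the Wilcoxon Rank Sum Test. SAS Program: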
proc npar1way data=block wilcoxon;
  class method;
  var strength;
run;
Result:

t Approximation
One-Sided Pr >  Z      0.0980
Two-Sided Pr > |Z|     0.1961
So the one-sided p-value = 0.0980.
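For reference, the rank sum itself can be computed directly. The sketch below is one way to do it, assuming midranks for ties, the tie-adjusted variance $\frac{n_1 n_2}{N(N-1)}\sum_i (R_i - \bar{R})^2$, and a 0.5 continuity correction for the one-sided alternative that the new method gives larger values. It uses the normal approximation, so its p-value is not expected to equal the t-approximation value 0.0980 reported by SAS. The helper name `midranks` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

new = [14, 15, 13, 15, 16]
old = [13, 15, 13, 12, 14]
n1, n2 = len(new), len(old)
N = n1 + n2

ranks = midranks(new + old)
w = sum(ranks[:n1])                      # rank sum for the new method
mean_w = n1 * (N + 1) / 2                # E(W) under H0
rbar = (N + 1) / 2
var_w = n1 * n2 / (N * (N - 1)) * sum((r - rbar) ** 2 for r in ranks)

z = (w - mean_w - 0.5) / sqrt(var_w)     # continuity-corrected statistic
p_one_sided = 1 - NormalDist().cdf(z)    # normal approximation

print(f"W = {w}, Z = {z:.4f}, one-sided p (normal approx.) = {p_one_sided:.4f}")
```

The rank sum for the new method is 34.5 against an expected 27.5 under $H_0$, giving $Z \approx 1.40$. The normal-approximation p-value also exceeds 0.05, so the conclusion agrees with the t-test.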
