Chu Hai College

<p>Chapter 3: Statistical Estimation and Hypothesis Testing Page 1</p>
<p>Department of Mathematics, Faculty of Science and Engineering, City University of Hong Kong</p>
<p>MA3518: Applied Statistics</p>
<p>Chapter 3: Statistical Estimation and Hypothesis Testing</p>
<p>Statistical inference is the process of extracting useful information from observed data about the unknown underlying mechanism that generates the data. It underlies many other branches of statistical science. Three important topics in statistical inference are point estimation, interval estimation and hypothesis testing. This chapter gives a brief introduction to some basic but essential concepts and techniques in statistical inference, such as the construction of confidence intervals and the one-sample and two-sample t tests for an unknown population mean. We also discuss in some detail the use of SAS procedures to perform these tasks. The chapter covers the following topics:</p>
<p>Section 3.1: Statistical Inference: One-sample and Two-sample Cases</p>
<p>Section 3.2: Statistical computing by SAS</p>
<p>Extract Useful Information From Data!</p>
<p>Section 3.1: Statistical Inference: One-sample and Two-sample Cases</p>
<p>1. Statistical inference:</p>
<p>• The process of extracting useful information from the observed data about the unknown underlying mechanism that generates the data</p>
<p>• Draw inferences about the values of unknown parameters of the probability distribution that generated a random sample, based on the observed data in the sample</p>
<p>2. Real-life Examples:</p>
<p>• Estimate the average weight of female students in a university based on the weights of a sample of female students chosen from the population</p>
<p>• Estimate the average monthly income of the negative-asset group in Hong Kong, and indicate the precision of the estimate, based on a sample of negative-asset people</p>
<p>3. 
Three methods for statistical inference:</p>
<p>• Point estimation</p>
<p>• Interval estimation</p>
<p>• Hypothesis testing</p>
<p>4. Point Estimation:</p>
<p>• Estimate an unknown parameter of a probability distribution by a single number evaluated from sample data</p>
<p>• The single number is the value of a sample statistic (or simply a statistic)</p>
<p>• A simple example:</p>
<p>Given a random sample of size \(n\) with observations \(\{x_1, x_2, \ldots, x_n\}\)</p>
<p>Estimate the population mean \(\mu\) based on the set of observations</p>
<p>The sample mean or sample average \(\bar{x}\) is given by:</p>
<p>\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\]</p>
<p>Use the sample mean \(\bar{x}\) as a point estimate of \(\mu\)</p>
<p>Prior to sampling, each observation is unknown, so the sample can be represented by a set of random variables \(\{X_1, X_2, \ldots, X_n\}\)</p>
<p>In this case, the sample mean \(\bar{X}\) is a random variable given by:</p>
<p>\[\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\]</p>
<p>We call \(\bar{X}\) a point estimator of \(\mu\)</p>
<p>Since \(\bar{X}\) is a random variable, its statistical properties are described by a probability distribution</p>
<p>We call the probability distribution of \(\bar{X}\) the sampling distribution of \(\bar{X}\) (see Lecture Note: Chapter one)</p>
<p>If each of \(X_1, X_2, \ldots, X_n\) is normally distributed with mean \(\mu\) and variance \(\sigma^2\), then the exact distribution of \(\bar{X}\) is \(N(\mu, \sigma^2/n)\)</p>
<p>5. Central Limit Theorem (CLT): A fundamental theorem of statistical inference</p>
<p>• Suppose \(\mu\) and \(\sigma^2\) are the population mean and the population variance, respectively</p>
<p>• The sampling distribution of \(\bar{X}\) can be approximated accurately by a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\) when the sample size \(n\) is sufficiently large (as a rule of thumb, \(n \ge 30\))</p>
<p>6. Interval Estimation:</p>
<p>• Precision of a point estimate:</p>
<p>(a) The reciprocal of the standard deviation of the point estimate</p>
<p>(b) \(\mathrm{SD}(\bar{X}) = \sigma/\sqrt{n}\) (i.e. 
\(\mathrm{SD}(\bar{X})\) is the standard error of \(\bar{X}\))</p>
<p>(c) The precision of the estimate increases as the sample size \(n\) increases</p>
<p>• Point estimation: Cannot tell anything about the precision of the estimator</p>
<p>• Interval estimation: Use an interval to estimate an unknown parameter in order to indicate the precision of the estimator</p>
<p>• Case I: \(\sigma^2\) is known</p>
<p>Consider a random sample \(\{X_1, X_2, \ldots, X_n\}\) of size \(n\)</p>
<p>For sufficiently large \(n\), \(\bar{X} \sim N(\mu, \sigma^2/n)\) approximately, even if the distribution of each \(X_i\) (\(i = 1, 2, \ldots, n\)) is not normal</p>
<p>Suppose the population variance \(\sigma^2\) is known</p>
<p>Estimate the unknown population mean \(\mu\) by constructing an interval estimate based on the approximate distribution \(N(\mu, \sigma^2/n)\) of \(\bar{X}\)</p>
<p>In particular, a 95% confidence interval for \(\mu\) is given by:</p>
<p>\[\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)\]</p>
<p>As \(n\) becomes large, the confidence interval becomes narrower</p>
<p>=> A higher level of precision is achieved</p>
<p>Interpretation of the confidence interval:</p>
<p>\[P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = P(-1.96 \le Z \le 1.96) = 95\%\]</p>
<p>where \(Z \sim N(0, 1)\) and the last probability is read from the standard normal table</p>
<p>The values of the upper limit and the lower limit of the confidence interval \((\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})\) can be determined each time after the experiment of random sampling is conducted. If we conduct the random experiment 100 times, we get 100 realizations of the confidence interval</p>
<p>Suppose the experiment of random sampling is repeated many times</p>
<p>Then, the frequentist interpretation of the 95% confidence interval is that about 95% of the realizations of the confidence interval contain the unknown mean \(\mu\)</p>
<p>In general, a \(100(1 - \alpha)\%\) confidence interval for \(\mu\) is given by:</p>
<p>\[\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)\]</p>
<p>
where \(z_{\alpha/2}\) is the critical value of a standard normal distribution, defined by:</p>
<p>\[P(Z > z_{\alpha/2}) = \alpha/2\]</p>
<p>We can look up the value of \(z_{\alpha/2}\) in the standard normal table</p>
<p>Remarks:</p>
<p>(a) The \(100(1 - \alpha)\%\) confidence interval is only an approximation when the probability distribution of \(X_i\) (\(i = 1, 2, \ldots, n\)) is non-normal</p>
<p>(b) The approximation is reasonably good when \(n \ge 10\) for a moderately non-normal distribution and when \(n \ge 30\) for a highly skewed and/or heavy-tailed distribution</p>
<p>(c) An exact \(100(1 - \alpha)\%\) confidence interval is obtained when each of \(X_1, X_2, \ldots, X_n\) is normally distributed</p>
<p>(d) One can check the normality assumption with various tests of normality</p>
<p>• Case II: \(\sigma^2\) is unknown</p>
<p>An unbiased point estimator of \(\sigma^2\):</p>
<p>\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\]</p>
<p>An approximate \(100(1 - \alpha)\%\) confidence interval for the unknown population mean \(\mu\) is given by:</p>
<p>\[\left(\bar{X} - t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}\right)\]</p>
<p>where \(t_{\alpha/2}(n-1)\) is the critical value of Student's t distribution with \(n - 1\) degrees of freedom</p>
<p>7. Hypothesis Testing:</p>
<p>• One-sample test:</p>
<p>Test whether a parameter is different from a given value (or a given range of values)</p>
<p>• Two-sample test:</p>
<p>Test whether the parameters of the populations underlying two different samples are equal</p>
<p>• Statistical hypotheses:</p>
<p>Null hypothesis \(H_0\) vs. alternative hypothesis \(H_1\)</p>
<p>• Significance level \(\alpha\):</p>
<p>(a) Determines the critical value for a test</p>
<p>(b) Specifies the reliability of the test</p>
<p>(c) Must be given in advance</p>
<p>(d) Equals the probability of a Type I error (i.e. 
the probability of rejecting \(H_0\) when \(H_0\) is true)</p>
<p>• Testing procedures: (One-sample case)</p>
<p>(a) Case I: \(\sigma^2\) is known</p>
<p>Given a random sample \(\{X_1, X_2, \ldots, X_n\}\) of size \(n\) from a normal distribution with unknown mean \(\mu\) but known variance \(\sigma^2\)</p>
<p>Test the following hypotheses at significance level 5%:</p>
<p>\(H_0: \mu = \mu_0\) vs. \(H_1: \mu > \mu_0\) (One-sided test)</p>
<p>Here, \(\mu_0\) is a given number</p>
<p>Under \(H_0\), \(\bar{X} \sim N(\mu_0, \sigma^2/n)\)</p>
<p>Test statistic under \(H_0\):</p>
<p>\[Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)\]</p>
<p>Decision: Reject \(H_0\) if \(Z > z_{0.05} = 1.645\) or the p-value \(= P(Z > z_{\mathrm{obs}}) < 0.05\), where \(z_{\mathrm{obs}}\) denotes the observed value of \(Z\)</p>
<p>If \(n\) is large (i.e. \(n \ge 10\) for the moderately non-normal case or \(n \ge 30\) for the highly skewed case), the above result is a reasonably good approximation even when the random sample \(\{X_1, X_2, \ldots, X_n\}\) does not come from a normal distribution</p>
<p>Other choices of the alternative hypothesis \(H_1\) are:</p>
<p>(i) \(\mu < \mu_0\) (One-sided test)</p>
<p>(ii) \(\mu \ne \mu_0\) (Two-sided test)</p>
<p>Decision of the test for \(H_1: \mu < \mu_0\):</p>
<p>Reject \(H_0\) if \(Z < -z_{0.05} = -1.645\) or the p-value \(= P(Z < z_{\mathrm{obs}}) < 0.05\)</p>
<p>Decision of the test for \(H_1: \mu \ne \mu_0\):</p>
<p>Reject \(H_0\) if \(|Z| > z_{0.025} = 1.96\) or the p-value \(= P(|Z| > |z_{\mathrm{obs}}|) < 0.05\)</p>
<p>(b) Case II: \(\sigma^2\) is unknown</p>
<p>Consider a random sample \(\{X_1, X_2, \ldots, X_n\}\) of size \(n\) from a normal distribution with unknown population mean \(\mu\) and unknown population variance \(\sigma^2\)</p>
<p>Test the hypotheses at significance level 5%:</p>
<p>\(H_0: \mu = \mu_0\) vs. 
\(H_1: \mu > \mu_0\)</p>
<p>Replace \(\sigma^2\) by the sample variance:</p>
<p>\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\]</p>
<p>Test statistic under \(H_0\):</p>
<p>\[T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t(n-1)\]</p>
<p>Decision: Reject \(H_0\) if \(T > t_{0.05}(n-1)\) or the p-value \(= P(T > t_{\mathrm{obs}}) < 0.05\), where \(t_{\mathrm{obs}}\) denotes the observed value of \(T\)</p>
<p>Decision for the test when \(H_1: \mu < \mu_0\):</p>
<p>Reject \(H_0\) if \(T < -t_{0.05}(n-1)\) or the p-value \(= P(T < t_{\mathrm{obs}}) < 0.05\)</p>
<p>Decision for the test when \(H_1: \mu \ne \mu_0\):</p>
<p>Reject \(H_0\) if \(|T| > t_{0.025}(n-1)\) or the p-value \(= P(|T| > |t_{\mathrm{obs}}|) < 0.05\)</p>
<p>If \(n\) is large (i.e. \(n > 10\) for the moderately non-normal case or \(n > 30\) for the highly skewed and/or heavy-tailed case), the above result can be a reasonably good approximation even when the random sample \(\{X_1, X_2, \ldots, X_n\}\) does not follow a normal distribution</p>
<p>When \(n\) is very large, Student's t distribution with \(n - 1\) degrees of freedom, \(t(n-1)\), tends to the standard normal distribution \(N(0, 1)\)</p>
<p>In this case, under \(H_0\),</p>
<p>\[T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim N(0, 1) \text{ approximately}\]</p>
<p>Decision for the test at significance level 5% when the alternative hypothesis is \(H_1: \mu \ne \mu_0\):</p>
<p>Reject \(H_0\) if \(|T| > z_{0.025} = 1.96\) or the p-value \(= P(|Z| > |t_{\mathrm{obs}}|) < 0.05\), where \(Z \sim N(0, 1)\)</p>
<p>Decision for the test when \(H_1: \mu > \mu_0\):</p>
<p>Reject \(H_0\) if \(T > z_{0.05} = 1.645\) or the p-value \(= P(Z > t_{\mathrm{obs}}) < 0.05\)</p>
<p>Decision for the test when \(H_1: \mu < \mu_0\):</p>
<p>Reject \(H_0\) if \(T < -z_{0.05} = -1.645\) or the p-value \(= P(Z < t_{\mathrm{obs}}) < 0.05\)</p>
<p>• Large sample size:</p>
<p>(a) The sample average \(\bar{X}\) is approximately normal</p>
<p>(b) The estimate is more precise (i.e. it has a smaller standard error)</p>
<p>(c) The sample has a higher chance of representing the major characteristics of the population well</p>
<p>• Nonparametric tests for the median</p>
<p>Suppose the data do not follow a normal distribution, or we are not sure whether the data come from a normal distribution. 
This is particularly the case when we deal with ordinal data</p>
<p>The Central Limit Theorem (CLT) cannot be relied on when the sample size is small</p>
<p>In this case, the parametric test based on the normality assumption cannot produce a reasonably good approximation</p>
<p>Adopt the nonparametric tests for the median:</p>
<p>(a) Sign test: uses 'sign' information only</p>
<p>(b) Wilcoxon signed-rank test: uses 'sign' information and ordinal 'magnitude' information</p>
<p>Basic assumption (needed for the Wilcoxon signed-rank test): The data follow a symmetric distribution of unknown form</p>
<p>• Testing procedures: (Two-independent-sample test)</p>
<p>Case I: The common variance \(\sigma^2\) is known</p>
<p>Consider the following random samples for the variable \(X\) and the variable \(Y\):</p>
<p>\(X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)\)</p>
<p>\(Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)\)</p>
<p>Test the hypotheses at significance level \(\alpha\):</p>
<p>\(H_0: \mu_1 = \mu_2\) vs. \(H_1: \mu_1 \ne \mu_2\)</p>
<p>Test statistic under \(H_0\):</p>
<p>\[Z = \frac{\bar{X} - \bar{Y}}{\sigma\left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}} \sim N(0, 1)\]</p>
<p>Decision: Reject \(H_0\) if \(|Z| > z_{\alpha/2}\) or the p-value \(= P(|Z| > |z_{\mathrm{obs}}|) < \alpha\)</p>
<p>Other choices of alternative hypotheses are given by:</p>
<p>(a) \(H_1: \mu_1 < \mu_2\)</p>
<p>(b) \(H_1: \mu_1 > \mu_2\)</p>
<p>Decision for the test when \(H_1: \mu_1 < \mu_2\):</p>
<p>Reject \(H_0\) if \(Z < -z_{\alpha}\) or the p-value \(= P(Z < z_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_1 > \mu_2\):</p>
<p>Reject \(H_0\) if \(Z > z_{\alpha}\) or the p-value \(= P(Z > z_{\mathrm{obs}}) < \alpha\)</p>
<p>Case II: The common variance \(\sigma^2\) is unknown</p>
<p>Test the hypotheses at significance level \(\alpha\):</p>
<p>\(H_0: \mu_1 = \mu_2\) vs. 
\(H_1: \mu_1 \ne \mu_2\)</p>
<p>Estimate \(\sigma^2\) by the pooled estimate of the common variance, \(s_p^2\), defined as follows:</p>
<p>\[s_p^2 := \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}\]</p>
<p>One can interpret \(s_p^2\) as a weighted average of the two sample variances \(s_1^2\) and \(s_2^2\), with weights proportional to their degrees of freedom</p>
<p>Test statistic under \(H_0\):</p>
<p>\[T = \frac{\bar{X} - \bar{Y}}{s_p\left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}} \sim t(n_1 + n_2 - 2)\]</p>
<p>Decision: Reject \(H_0\) if \(|T| > t_{\alpha/2}(n_1 + n_2 - 2)\) or the p-value \(= P(|T| > |t_{\mathrm{obs}}|) < \alpha\)</p>
<p>Other choices of alternative hypotheses are given by:</p>
<p>(a) \(H_1: \mu_1 < \mu_2\)</p>
<p>(b) \(H_1: \mu_1 > \mu_2\)</p>
<p>Decision for the test when \(H_1: \mu_1 < \mu_2\):</p>
<p>Reject \(H_0\) if \(T < -t_{\alpha}(n_1 + n_2 - 2)\) or the p-value \(= P(T < t_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_1 > \mu_2\):</p>
<p>Reject \(H_0\) if \(T > t_{\alpha}(n_1 + n_2 - 2)\) or the p-value \(= P(T > t_{\mathrm{obs}}) < \alpha\)</p>
<p>Case III: \(\sigma_1^2\) and \(\sigma_2^2\) are different but known</p>
<p>Test the hypotheses at significance level \(\alpha\):</p>
<p>\(H_0: \mu_1 = \mu_2\) vs. \(H_1: \mu_1 \ne \mu_2\)</p>
<p>Test statistic under \(H_0\):</p>
<p>\[Z = \frac{\bar{X} - \bar{Y}}{\left(\sigma_1^2/n_1 + \sigma_2^2/n_2\right)^{1/2}} \sim N(0, 1)\]</p>
<p>Decision: Reject \(H_0\) if \(|Z| > z_{\alpha/2}\) or the p-value \(= P(|Z| > |z_{\mathrm{obs}}|) < \alpha\)</p>
<p>Other choices of alternative hypotheses are given by:</p>
<p>(a) \(H_1: \mu_1 < \mu_2\)</p>
<p>(b) \(H_1: \mu_1 > \mu_2\)</p>
<p>Decision for the test when \(H_1: \mu_1 < \mu_2\):</p>
<p>Reject \(H_0\) if \(Z < -z_{\alpha}\) or the p-value \(= P(Z < z_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_1 > \mu_2\):</p>
<p>Reject \(H_0\) if \(Z > z_{\alpha}\) or the p-value \(= P(Z > z_{\mathrm{obs}}) < \alpha\)</p>
<p>Case IV: \(\sigma_1^2\) and \(\sigma_2^2\) are different and unknown</p>
<p>Test the hypotheses at significance level \(\alpha\):</p>
<p>\(H_0: \mu_1 = \mu_2\) vs. 
\(H_1: \mu_1 \ne \mu_2\)</p>
<p>Estimate \(\sigma_1^2\) and \(\sigma_2^2\) by the sample variances \(s_1^2\) and \(s_2^2\), respectively</p>
<p>Test statistic under \(H_0\):</p>
<p>\[T = \frac{\bar{X} - \bar{Y}}{\left(s_1^2/n_1 + s_2^2/n_2\right)^{1/2}} \sim t(df) \text{ approximately}\]</p>
<p>\[df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}\]</p>
<p>Decision: Reject \(H_0\) if \(|T| > t_{\alpha/2}(df)\) or the p-value \(= P(|T| > |t_{\mathrm{obs}}|) < \alpha\)</p>
<p>Other choices of alternative hypotheses are given by:</p>
<p>(a) \(H_1: \mu_1 < \mu_2\)</p>
<p>(b) \(H_1: \mu_1 > \mu_2\)</p>
<p>Decision for the test when \(H_1: \mu_1 < \mu_2\):</p>
<p>Reject \(H_0\) if \(T < -t_{\alpha}(df)\) or the p-value \(= P(T < t_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_1 > \mu_2\):</p>
<p>Reject \(H_0\) if \(T > t_{\alpha}(df)\) or the p-value \(= P(T > t_{\mathrm{obs}}) < \alpha\)</p>
<p>Remarks:</p>
<p>(a) If the random samples for \(X\) and \(Y\) are not normally distributed, we can only obtain approximate results for all of the two-independent-sample tests</p>
<p>(b) If the sample sizes \(n_1\) and \(n_2\) are sufficiently large, a reasonably good approximation for the two-independent-sample test can be obtained</p>
<p>(c) The two-independent-sample t test with pooled variance can be considered a special case of ANOVA (i.e. the number of groups is two)</p>
<p>• Test for equality of two variances (Variance-Ratio test or F-test):</p>
<p>Suppose \(X_1, X_2, \ldots, X_m\) are iid \(N(\mu_1, \sigma_1^2)\) and \(Y_1, Y_2, \ldots, Y_n\) are iid \(N(\mu_2, \sigma_2^2)\)</p>
<p>Assume that the two samples are independent</p>
<p>\(\mu_1\) and \(\mu_2\) are unknown nuisance parameters, while \(\sigma_1^2\) and \(\sigma_2^2\) are the unknown parameters of interest</p>
<p>Test \(H_0: \sigma_1^2 = \sigma_2^2\) vs. 
\(H_1: \sigma_1^2 > \sigma_2^2\) (One-sided test) at significance level \(\alpha\)</p>
<p>Point estimates of \(\sigma_1^2\) and \(\sigma_2^2\) are the sample variances \(s_1^2\) and \(s_2^2\), respectively</p>
<p>Test statistic under \(H_0\):</p>
<p>\[F = \frac{s_1^2}{s_2^2} \sim F(m-1, n-1)\]</p>
<p>Decision: Reject \(H_0\) if \(F > F_{\alpha}(m-1, n-1)\) or the p-value \(= P(F > f_{\mathrm{obs}}) < \alpha\)</p>
<p>Question: What are the testing procedures if we consider the one-sided alternative \(H_1: \sigma_1^2 < \sigma_2^2\) and the two-sided alternative \(H_1: \sigma_1^2 \ne \sigma_2^2\)?</p>
<p>Decision when \(H_1: \sigma_1^2 < \sigma_2^2\):</p>
<p>Reject \(H_0\) if \(F < F_{1-\alpha}(m-1, n-1)\) or the p-value \(= P(F < f_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision when \(H_1: \sigma_1^2 \ne \sigma_2^2\) (i.e. two-sided test):</p>
<p>(a) Based on the critical values of the F-distribution, we reject \(H_0\) if</p>
<p>\(F < F_{1-\alpha/2}(m-1, n-1)\) or \(F > F_{\alpha/2}(m-1, n-1)\)</p>
<p>(b) Based on the p-value, we reject \(H_0\) if the p-value</p>
<p>\(p = P(F > f_{\mathrm{obs}}) + P(F < 1/f_{\mathrm{obs}}) < \alpha\)</p>
<p>An important remark:</p>
<p>Before performing the two-sample t test, one can perform the two-sided variance-ratio test to decide whether \(\sigma_1^2 = \sigma_2^2\) or \(\sigma_1^2 \ne \sigma_2^2\)</p>
<p>• Testing procedures: (Paired t test)</p>
<p>Given a random sample of \(n\) paired observations \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) for the variables \(X\) and \(Y\), which are dependent or highly correlated</p>
<p>We are interested in the difference between the two variables \(X\) and \(Y\). In this case, the paired t test is a good choice</p>
<p>Define a new variable \(Z\) by:</p>
<p>\(Z = X - Y\),</p>
<p>where \(Z\) represents the difference between \(X\) and \(Y\)</p>
<p>Assume that \(Z \sim N(\mu_z, \sigma_z^2)\), where the population variance \(\sigma_z^2\) is unknown</p>
<p>Test \(H_0: \mu_z = 0\) vs. \(H_1: \mu_z \ne 0\) (i.e. 
two-sided test) at significance level \(\alpha\)</p>
<p>Observations for \(Z\) are then given by:</p>
<p>\(z_i := x_i - y_i\), for \(i = 1, 2, \ldots, n\)</p>
<p>Let \(\bar{z}\) and \(s_z\) denote the sample mean and the sample standard deviation of the observations \(\{z_1, z_2, \ldots, z_n\}\), respectively</p>
<p>Test statistic under \(H_0\):</p>
<p>\[T = \frac{\bar{z} - 0}{s_z/\sqrt{n}} \sim t(n-1)\]</p>
<p>Decision: Reject \(H_0\) if \(|T| > t_{\alpha/2}(n-1)\) or the p-value \(= P(|T| > |t_{\mathrm{obs}}|) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_z > 0\):</p>
<p>Reject \(H_0\) if \(T > t_{\alpha}(n-1)\) or the p-value \(= P(T > t_{\mathrm{obs}}) < \alpha\)</p>
<p>Decision for the test when \(H_1: \mu_z < 0\):</p>
<p>Reject \(H_0\) if \(T < -t_{\alpha}(n-1)\) or the p-value \(= P(T < t_{\mathrm{obs}}) < \alpha\)</p>
<p>Section 3.2: Statistical computing by SAS</p>
<p>1. Construct confidence intervals for the population mean \(\mu\)</p>
<p>Try to run the following SAS program:</p>
<pre>DATA CIExample;
INPUT Y;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC MEANS DATA = CIExample ALPHA = 0.05 N MEAN STD CLM;
VAR Y;
RUN;</pre>
<p>Options:</p>
<p>N: display the number of observations in the SAS output</p>
<p>MEAN: display the mean of the data</p>
<p>STD: display the standard deviation of the data</p>
<p>CLM: display the confidence interval for the mean</p>
<p>The SAS output is given by:</p>
<pre>The SAS System          16:49 Wednesday, August 20, 2003   1

The MEANS Procedure

Analysis Variable : Y

                             Lower 95%      Upper 95%
 N         Mean    Std Dev  CL for Mean    CL for Mean
------------------------------------------------------
15   10.3333333  4.1861449    8.0151235     12.6515431
------------------------------------------------------</pre>
<p>From the SAS output, the 95% confidence interval for the population mean is given by:</p>
<p>(8.0151235, 12.6515431)</p>
<p>Because we are not sure whether the random sample comes from a normal distribution, the 95% confidence interval may only be an approximation to the exact one. 
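</p>
<p>As a check on the PROC MEANS output above, the interval can be reproduced by hand from the Case II formula of Section 3.1. The sketch below assumes the critical value \(t_{0.025}(14) \approx 2.145\), which is taken from a standard t table and does not appear in the SAS output; the mean, standard deviation and sample size are rounded from the output:</p>
<p>\[\bar{x} \pm t_{0.025}(14)\,\frac{s}{\sqrt{n}} = 10.3333 \pm 2.145 \times \frac{4.1861}{\sqrt{15}} \approx 10.3333 \pm 2.318 = (8.015,\ 12.652)\]</p>
<p>Up to rounding, this agrees with the limits reported by SAS.</p>
<p>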
This approximation to the exact interval is reasonably good if the underlying probability distribution of the random sample is only moderately non-normal, since \(10 < n = 15 < 30\)</p>
<p>2. Hypothesis testing for the population mean \(\mu\):</p>
<p>• Use the PROC UNIVARIATE statement to perform a two-sided t test for the following hypotheses on the unknown population mean \(\mu\)</p>
<p>Test \(H_0: \mu = 0\) vs. \(H_1: \mu \ne 0\) (by default)</p>
<p>• Test for a non-zero value \(\mu_0\) of the unknown mean \(\mu\)</p>
<p>Test \(H_0: \mu = \mu_0\) vs. \(H_1: \mu \ne \mu_0\)</p>
<p>Use PROC UNIVARIATE on the original variable and add the option MU0=\(\mu_0\). Another commonly used option is NORMAL, which requests tests of the normality assumption.</p>
<p>Alternatively:</p>
<p>Step I: Perform the following data manipulation</p>
<p>NEW VARIABLE = ORIGINAL VARIABLE - \(\mu_0\)</p>
<p>Step II: Use the PROC UNIVARIATE statement to perform a two-sided t test of whether the population mean of the new variable is equal to zero.</p>
<p>• Perform a one-sided t test</p>
<p>Test \(H_0: \mu = \mu_0\) vs. \(H_1: \mu > \mu_0\)</p>
<p>Step I: Use the PROC UNIVARIATE statement to perform a two-sided t test</p>
<p>Step II: Divide the p-value from the SAS output by two. This is valid only when the sign of the observed t statistic agrees with \(H_1\) (here, \(t > 0\)); otherwise the one-sided p-value is one minus half of the two-sided p-value</p>
<p>Step III: Draw a conclusion based on the p-value divided by two</p>
<p>(i.e. reject \(H_0\) if the p-value divided by two is less than 0.05 when the significance level is 5%)</p>
<p>• Example:</p>
<p>Test \(H_0: \mu = 10\) vs. 
\(H_1: \mu > 10\) at significance level \(\alpha = 5\%\) based on the dataset specified by the following SAS data step</p>
<pre>DATA CIExample;
INPUT Y;
Z = Y - 10;
CARDS;
10
8
12
10
11
9
12
13
7
6
15
5
3
16
18
;
PROC UNIVARIATE DATA = CIExample;
VAR Z;
RUN;</pre>
<p>The part of the SAS output relevant to testing the hypotheses is given by:</p>
<pre>The SAS System          12:26 Thursday, August 21, 2003   1

The UNIVARIATE Procedure
Variable: Z

Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------
Student's t    t  0.308397    Pr > |t|    0.7623
Sign           M       0.5    Pr >= |M|   1.0000
Signed Rank    S         4    Pr >= |S|   0.8054</pre>
<p>From the SAS output, the p-value of the two-sided Student's t test of whether the population mean of the variable Z is equal to zero is 0.7623. Since the observed t statistic is positive, the one-sided p-value is the two-sided p-value divided by two, namely 0.381, which is greater than 0.05</p>
<p>Conclusion: Do not reject \(H_0: \mu = 10\) at significance level 5%</p>
<p>3. SAS Procedure for paired t test:</p>
<p>• Use the PROC TTEST statement to perform the paired t test in the SAS system</p>
<p>• Example:</p>
<p>Consider the dataset and the variable Z (i.e. the difference of the two variables X and Y) defined by the following SAS data step:</p>
<pre>DATA Pairexample;
INPUT Y X;
Z = X - Y;
CARDS;
10 11
8 12
12 13
10 11
11 12
9 10
12 8
13 6
7 11
6 10
15 12
5 13
3 6
16 17
18 16
;</pre>
<p>Assume that \(Z \sim N(\mu_z, \sigma_z^2)\), where \(\sigma_z^2\) is unknown</p>
<p>Test \(H_0: \mu_z = 0\) vs. \(H_1: \mu_z \ne 0\) (two-sided t test) at significance level \(\alpha = 0.05\)</p>
<p>The SAS procedure for testing the above hypotheses is given by:</p>
<pre>PROC TTEST DATA = Pairexample;
VAR Z;
RUN;</pre>
<p>It performs a two-sided t test for the hypothesis \(H_0: \mu_z = 0\) vs. 
\(H_1: \mu_z \ne 0\)</p>
<p>The SAS output is given as follows:</p>
<pre>The SAS System          16:12 Thursday, August 21, 2003   1

The TTEST Procedure

Statistics

                Lower CL             Upper CL   Lower CL             Upper CL
Variable    N       Mean     Mean        Mean    Std Dev   Std Dev    Std Dev   Std Err
Z          15     -1.193   0.8667      2.9267     2.7235      3.72     5.8667    0.9605

T-Tests

Variable    DF    t Value    Pr > |t|
Z           14       0.90      0.3822</pre>
<p>From the SAS result, the p-value is 0.3822 > 0.05</p>
<p>Hence, we do not reject \(H_0: \mu_z = 0\) at significance level \(\alpha = 0.05\)</p>
<p>4. SAS Procedure for the two-independent-sample t test and the variance-ratio test</p>
<p>• Use the PROC TTEST statement to perform the two-sided two-independent-sample t test and the two-sided variance-ratio test (or two-sided F-test) in the SAS system</p>
<p>• Example:</p>
<p>Suppose we are interested in the difference between the average monthly income of graduates from university A and that of graduates from university B. The sample data for the monthly incomes (in thousands of dollars) of the graduates from the two universities are summarized as follows:</p>
<pre>A   10   8   9  12  11  16  20   7  21  18
B   11   7   6  10  12  11  12   8  16  15</pre>
<p>We want to test whether there is a significant difference between the average monthly income \(\mu_A\) of the graduates from university A and the average monthly income \(\mu_B\) of the graduates from university B</p>
<p>Test \(H_0: \mu_A = \mu_B\) vs. 
\(H_1: \mu_A \ne \mu_B\) at significance level \(\alpha\)</p>
<p>Use the following statements to read the data into the SAS system:</p>
<pre>Title 'Comparison of Average Monthly Incomes';
DATA MI;
INPUT University $ Income @@;
DATALINES;
A 10 A 8 A 9 A 12 A 11 A 16 A 20 A 7 A 21 A 18
B 11 B 7 B 6 B 10 B 12 B 11 B 12 B 8 B 16 B 15
;
RUN;</pre>
<p>Remarks:</p>
<p>(a) The '$' sign in the INPUT statement indicates that University is a character variable</p>
<p>(b) The '@@' sign in the INPUT statement enables the DATA step to read more than one observation per line</p>
<p>Use the PROC TTEST statement to perform the t test and the variance-ratio test in the SAS system</p>
<pre>PROC TTEST COCHRAN CI = EQUAL UMPU;
CLASS University;
VAR Income;
RUN;</pre>
<p>Remarks:</p>
<p>(a) The CLASS statement specifies the variable that distinguishes the groups being compared</p>
<p>(b) The VAR statement specifies the response variable used in the computations</p>
<p>(c) The COCHRAN option computes and displays the p-value for the two-sample t test under unequal variances, using the Cochran and Cox (1950) approximation to the probability level of the approximate test statistic (i.e. 
t statistic)</p>
<p>(d) The CI= option requests equal-tailed and uniformly most powerful unbiased (UMPU) confidence intervals for the population standard deviations</p>
<p>The SAS output is shown as follows:</p>
<pre>Comparison of Average Monthly Incomes                         1
                               21:27 Sunday, September 21, 2003

The TTEST Procedure

Statistics

                                                                    Equal       UMPU                  UMPU
                                Lower CL            Upper CL     Lower CL   Lower CL              Upper CL
Variable  University      N        Mean     Mean       Mean      Std Dev    Std Dev   Std Dev     Std Dev
Income    A              10      9.5244     13.2     16.876       3.5342     3.4208    5.1381      8.9697
Income    B              10       8.493     10.8     13.107       2.2182      2.147    3.2249      5.6298
Income    Diff (1-2)              -1.63      2.4     6.4303       3.2412      3.187    4.2895      6.2115

Statistics

                              Equal
                           Upper CL
Variable  University        Std Dev   Std Err   Minimum   Maximum
Income    A                  9.3802    1.6248         7        21
Income    B                  5.8874    1.0198         6        16
Income    Diff (1-2)         6.3435    1.9183

T-Tests

Variable  Method          Variances     DF   t Value   Pr > |t|
Income    Pooled          Equal         18      1.25     0.2269
Income    Satterthwaite   Unequal     15.1      1.25     0.2299
Income    Cochran         Unequal        9      1.25     0.2424

Equality of Variances

Variable  Method      Num DF   Den DF   F Value   Pr > F
Income    Folded F         9        9      2.54   0.1814</pre>
<p>Remarks:</p>
<p>• The PROC TTEST statement provides the test statistic under the assumption that the population variances of the two groups are equal</p>
<p>• It can also provide an approximation to the test statistic under the assumption that the population variances of the two groups are not equal</p>
<p>In this case, Satterthwaite's (1946) approximation is used to compute the degrees of freedom associated with the approximate test statistic, while the Cochran and Cox (1950) approximation is used to compute its probability level</p>
<p>• The two-sided test for equality of population variances (i.e. the two-sided F-test) is also provided by the PROC TTEST statement</p>
<p>• In this example, taking \(\alpha = 0.05\) for illustration, the folded F test (p-value 0.1814) does not reject equality of the variances, and the pooled t test (p-value 0.2269) does not reject \(H_0: \mu_A = \mu_B\)</p>
<p>~ End of Chapter 3 ~</p>
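<p>Supplementary sketch for Section 3.2: the one-sided test of \(H_0: \mu = 10\) and the paired t test can also be requested from PROC TTEST directly, without creating the shifted or differenced variable Z by hand. The program below is only a sketch under stated assumptions: it assumes the data sets CIExample and Pairexample from the examples above already exist, and that the SAS release is 9.2 or later, since the SIDES= option is not available in the older release that produced the 2003 output shown in this chapter.</p>
<pre>/* One-sided test of H0: mu = 10 vs. H1: mu > 10 on Y in CIExample.   */
/* H0= sets the null value; SIDES=U requests the upper-tailed test.   */
/* Assumption: SAS 9.2 or later, where PROC TTEST supports SIDES=.    */
PROC TTEST DATA = CIExample H0 = 10 SIDES = U;
   VAR Y;
RUN;

/* Paired t test on X and Y in Pairexample.                           */
/* PAIRED X*Y analyses the differences X - Y, i.e. the variable Z     */
/* of the paired example, so no separate Z = X - Y step is needed.    */
PROC TTEST DATA = Pairexample;
   PAIRED X*Y;
RUN;</pre>
<p>With SIDES=U the reported p-value is already one-sided, so it should be compared with \(\alpha\) directly rather than halved. The PAIRED statement tests the same hypothesis as the VAR Z approach used above, so the two are expected to give the same t statistic and p-value.</p>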
