
Confidence Intervals, Testing and ANOVA Summary

1 One–Sample Tests

1.1 One Sample z–test: Mean (σ known)

Let $X_1, \dots, X_n$ be a r.s. from $N(\mu, \sigma)$, or let $n > 30$. Let

H0 : µ = µ0.

The test statistic is
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) $$
for $H_0$. A $(1-\alpha)100\%$ confidence interval for $\mu$ is $\bar{x} \pm z^\star \frac{\sigma}{\sqrt{n}}$. The sample size for margin of error, $m$, is
$$ n = \left( \frac{z^\star \sigma}{m} \right)^2. $$
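As a numerical companion, the statistic, interval, and sample-size formula above can be computed with Python's standard library (the function names below are illustrative, not from any statistics package):

```python
from math import ceil, sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n, alpha=0.05):
    """z statistic, two-sided p-value, and (1 - alpha)100% CI for mu."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    zstar = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z*
    half = zstar * sigma / sqrt(n)
    return z, p, (xbar - half, xbar + half)

def z_sample_size(sigma, m, alpha=0.05):
    """Smallest n whose margin of error is at most m."""
    zstar = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((zstar * sigma / m) ** 2)

z, p, ci = one_sample_z(xbar=4.2, mu0=4.0, sigma=1.0, n=100)
```

With $\bar{x} = 4.2$, $\mu_0 = 4$, $\sigma = 1$ and $n = 100$ this gives $z = 2$, so $H_0$ is rejected at $\alpha = 0.05$.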

1.2 One Sample t–test: Mean (σ unknown)

Let $X_1, \dots, X_n$ be a random sample and assume that either

- the population is normal, or
- $15 \le n < 40$ and there are no outliers or strong skewness, or
- $n \ge 40$.

Let $H_0 : \mu = \mu_0$. The test statistic is
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t(n-1) $$
for $H_0$. A $(1-\alpha)100\%$ confidence interval for $\mu$ is $\bar{x} \pm t^\star(n-1)\frac{s}{\sqrt{n}}$.
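A minimal sketch of the computation using only the standard library (critical values $t^\star$ are taken from a t table, since the stdlib has no t inverse; the helper names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic for H0: mu = mu0; compare against t(n - 1)."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n)), n - 1

def t_ci(data, tstar):
    """(1 - alpha)100% CI for mu, given a table value t* = t*(n - 1)."""
    n = len(data)
    half = tstar * stdev(data) / sqrt(n)
    return mean(data) - half, mean(data) + half
```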

1.3 Matched Pairs

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a r.s. and define $D_j = X_j - Y_j$. Assume $n > 30$, or that the $D_j$'s are normal (or nearly so). Let

H0 : µD = d.

The test statistic is
$$ t = \frac{\bar{d} - d}{s_D/\sqrt{n}} \sim t(n-1) $$
for $H_0$. A $(1-\alpha)100\%$ CI for $\mu_D$ is $\bar{d} \pm t^\star(n-1)\frac{s_D}{\sqrt{n}}$.
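The matched-pairs test is just the one-sample t test applied to the differences, which the following stdlib sketch makes explicit (function name is illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y, d0=0.0):
    """Matched-pairs t statistic on the differences D_j = X_j - Y_j."""
    d = [xj - yj for xj, yj in zip(x, y)]
    n = len(d)
    return (mean(d) - d0) / (stdev(d) / sqrt(n)), n - 1
```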

2 Two Sample Tests

2.1 Two Sample z–test: Mean (σX and σY both known)

Let X1, ··· ,XnX and Y1, ··· ,YnY be independent r.s.’s. Assume nX > 30 and nY > 30, or that both r.s.’s are normal. Let

H0 : µX = µY

The test statistic is
$$ z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0, 1) $$
for $H_0$. A $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$ \bar{x} - \bar{y} \pm z^\star \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}. $$
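A stdlib sketch of the two-sample z statistic and interval (illustrative helper name):

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, ybar, sigma_x, sigma_y, nx, ny, alpha=0.05):
    """z statistic and (1 - alpha)100% CI for mu_X - mu_Y (sigmas known)."""
    se = sqrt(sigma_x ** 2 / nx + sigma_y ** 2 / ny)
    zstar = NormalDist().inv_cdf(1 - alpha / 2)
    diff = xbar - ybar
    return diff / se, (diff - zstar * se, diff + zstar * se)
```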

2.2 Two Sample t–test: Mean (σX and σY both unknown)

Let X1, ··· ,XnX and Y1, ··· ,YnY be independent r.s.’s. Assume nX > 30 and nY > 30, or that both r.s.’s are normal. Let

H0 : µX = µY

The test statistic is
$$ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \sim t(\mathrm{df}) $$
for $H_0$. Welch's t Test lets
$$ \mathrm{df} = \frac{\left( \frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y} \right)^2}{\frac{1}{n_X - 1}\left( \frac{s_X^2}{n_X} \right)^2 + \frac{1}{n_Y - 1}\left( \frac{s_Y^2}{n_Y} \right)^2}. $$
The conservative Welch's t Test lets df be the largest integer that is less than or equal to the df of Welch's Test. An even more conservative test lets $\mathrm{df} = \min(n_X - 1,\, n_Y - 1)$. A $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$ \bar{x} - \bar{y} \pm t^\star(\mathrm{df}) \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}. $$
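The Welch statistic and its fractional degrees of freedom can be sketched with the standard library as follows (illustrative function name):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and (fractional) Welch degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny   # s^2/n for each sample
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df   # conservative: floor(df) or min(nx - 1, ny - 1)
```

Note that when the two samples have equal sizes and equal variances, the Welch df reduces to $n_X + n_Y - 2$.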

2.3 Two Sample t–test: Mean (σX = σY unknown)

Let X1, ··· ,XnX and Y1, ··· ,YnY be independent r.s.’s. Assume nX > 30 and nY > 30, or both r.s.’s are normal. Let

H0 : µX = µY

Define the pooled estimate of $\sigma_X^2 = \sigma_Y^2$ to be
$$ s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}. $$
The test statistic is
$$ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \sim t(n_X + n_Y - 2) $$
for $H_0$. A $(1-\alpha)100\%$ CI for $\mu_X - \mu_Y$ is
$$ (\bar{x} - \bar{y}) \pm t^\star(n_X + n_Y - 2)\, s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}. $$

Note: It is generally difficult to verify that the two variances are equal, so it is safer not to make this assumption unless one is confident the variances are equal.
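The pooled statistic can be sketched with the standard library (illustrative name):

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Pooled two-sample t statistic, assuming equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / nx + 1 / ny))
    return t, nx + ny - 2
```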

2.4 Two Sample f–test: Variance

Let $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$ be independent normal r.s.'s, where the first r.s. is the one with the larger sample variance. Let

H0 : σX = σY

The test statistic is

$$ f = \frac{s_X^2}{s_Y^2} \sim F(n_X - 1,\, n_Y - 1) $$

for $H_0$. Use the right-hand tail for critical values, $f^\star$, for a two–sided test.

Warning: the above f–test is not robust with respect to the normality assumption.
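A stdlib sketch that follows the convention above by putting the larger sample variance in the numerator (illustrative function name):

```python
from statistics import variance

def two_sample_f(x, y):
    """F statistic with the larger sample variance in the numerator."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, (len(x) - 1, len(y) - 1)
    return vy / vx, (len(y) - 1, len(x) - 1)
```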

3 Proportion Tests

3.1 One Sample Large Sample Population Proportion z–test

Let X1, ··· ,Xn be a r.s. from Xj ∼ BIN(1, p),

H0 : p = p0

and assume $np_0 \ge 10$ and $n(1 - p_0) \ge 10$ (some books use 5 instead of 10 here). Let $\hat{p} = \frac{\#\text{ heads}}{n}$ and let the test statistic be
$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim N(0, 1) $$
($\bar{X} = \hat{p}$ is assumed to be normal by the CLT) for $H_0$. When the \# of heads and \# of tails are both $\ge 15$, a $(1-\alpha)100\%$ confidence interval for $p$ is
$$ \hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$
when $\alpha \le 0.1$. The sample size for margin of error, $m$, is
$$ n = \begin{cases} \dfrac{(z^\star)^2\, \hat{p}(1 - \hat{p})}{m^2} & \hat{p} \text{ known} \\[2ex] \dfrac{(z^\star)^2}{4m^2} & \hat{p} \text{ unknown.} \end{cases} $$
A plus four $(1-\alpha)100\%$ confidence interval for $p$ is obtained by using the above procedure, but first adding two heads and two tails to the random sample (increasing the sample size to $n + 4$). Use when the sample size is $\ge 10$ and $\alpha \le 0.1$.
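The test statistic and the plus-four interval can be sketched with the standard library (function names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def prop_z(heads, n, p0):
    """One-sample proportion z statistic for H0: p = p0."""
    phat = heads / n
    return (phat - p0) / sqrt(p0 * (1 - p0) / n), phat

def plus_four_ci(heads, n, alpha=0.05):
    """Plus-four CI: add two heads and two tails before the usual formula."""
    phat = (heads + 2) / (n + 4)
    zstar = NormalDist().inv_cdf(1 - alpha / 2)
    half = zstar * sqrt(phat * (1 - phat) / (n + 4))
    return phat - half, phat + half
```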

3.2 Two Sample Proportions z–test

Let X1, ··· ,XnX and Y1, ··· ,YnY be independent r.s. where Xj ∼ BIN(1, pX ) and Yk ∼ BIN(1, pY ). Let

$$ H_0 : p_X = p_Y = p $$
where $p$ is unknown. Let $\hat{p} = \frac{\#\text{ heads}}{\#\text{ tosses}}$. Assume the number of heads and tails in each sample is at least 5. Define the pooled estimate of $p_X$ and $p_Y$ to be
$$ \bar{p} = \frac{n_X \hat{p}_X + n_Y \hat{p}_Y}{n_X + n_Y} $$
and the test statistic be
$$ z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_X} + \frac{\bar{p}(1 - \bar{p})}{n_Y}}} \sim N(0, 1) $$
for $H_0$. A $(1-\alpha)100\%$ CI for $p_X - p_Y$, when the number of heads and tails is at least 10 for each sample, is
$$ (\hat{p}_X - \hat{p}_Y) \pm z^\star \sqrt{\frac{\hat{p}_X(1 - \hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1 - \hat{p}_Y)}{n_Y}}. $$

A plus four $(1-\alpha)100\%$ confidence interval for $p_X - p_Y$ is obtained by using the above procedure, but first adding one head and one tail to each of the random samples (increasing each sample size by 2). Use when $\alpha \le 0.1$.
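The pooled two-sample proportion statistic in a stdlib sketch (illustrative name):

```python
from math import sqrt

def two_prop_z(heads_x, nx, heads_y, ny):
    """Pooled two-sample proportion z statistic for H0: p_X = p_Y."""
    px, py = heads_x / nx, heads_y / ny
    pbar = (heads_x + heads_y) / (nx + ny)        # pooled estimate
    se = sqrt(pbar * (1 - pbar) * (1 / nx + 1 / ny))
    return (px - py) / se
```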

4 Correlation

The linear correlation coefficient for (x1, y1), ··· , (xn, yn) is

$$ r = \frac{n \sum_{j=1}^n x_j y_j - \left( \sum_{j=1}^n x_j \right)\left( \sum_{j=1}^n y_j \right)}{\sqrt{n \sum_{j=1}^n x_j^2 - \left( \sum_{j=1}^n x_j \right)^2}\, \sqrt{n \sum_{j=1}^n y_j^2 - \left( \sum_{j=1}^n y_j \right)^2}}. $$

The test statistic for $H_0 : \rho = 0$ is
$$ r \sqrt{\frac{n - 2}{1 - r^2}} \sim t(n - 2) $$
for $H_0$.
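Both the computational formula for $r$ and the test statistic can be sketched with the standard library (function names are illustrative):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

def r_test_stat(r, n):
    """t statistic for H0: rho = 0; compare against t(n - 2)."""
    return r * sqrt((n - 2) / (1 - r * r))
```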

5 Chi–Squared Tests

5.1 Chi–Squared Goodness of Fit Test

Let $X_1, \dots, X_n$ be a categorical r.s. where there is a total of $k$ categories and $P(X = j\text{th category}) = p_j$. Let

H0 : p1 = a1, ··· , pk = ak

where the aj’s are given. Define

$$ o_j \overset{\text{def}}{=} \#\text{ of } j\text{th categories observed} $$
$$ e_j \overset{\text{def}}{=} n a_j = \#\text{ of } j\text{th categories expected under } H_0 $$

and assume that ej ≥ 1 for all j’s and that no more than a fifth of the expected counts are < 5. In this case, the test statistic is

$$ \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j} \sim \chi^2(k - 1) $$
under $H_0$, and one rejects $H_0$ for large $\chi^2$ values.
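The goodness-of-fit statistic in a short stdlib sketch (illustrative name):

```python
def chi2_gof(observed, probs):
    """Chi-squared goodness-of-fit statistic; compare against chi2(k - 1)."""
    n = sum(observed)
    stat = sum((o - n * a) ** 2 / (n * a) for o, a in zip(observed, probs))
    return stat, len(observed) - 1
```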

5.2 Chi–Squared Test for Independence

Given a two–way table, oij, of observed outcomes, with r possible row outcomes and c possible column outcomes, let

H0 : there is no relationship between column and row variables.

Define

$$ o_{ij} \overset{\text{def}}{=} \text{cell } ij \text{ total} $$
$$ e_{ij} \overset{\text{def}}{=} \frac{(i\text{th row total})(j\text{th column total})}{\text{table total}} = \text{expected count in cell } ij \text{ under } H_0 $$

and assume that eij ≥ 1 for all cells and that no more than a fifth of the expected counts are < 5. In this case, the test statistic is

$$ \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2((r-1)(c-1)) $$
under $H_0$, and one rejects $H_0$ for large $\chi^2$ values.
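The independence statistic for an $r \times c$ table, sketched with plain Python (illustrative name):

```python
def chi2_independence(table):
    """Chi-squared test of independence for an r x c table of counts."""
    r, c = len(table), len(table[0])
    row = [sum(row_i) for row_i in table]
    col = [sum(col_j) for col_j in zip(*table)]
    total = sum(row)
    stat = sum(
        (table[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
        for i in range(r) for j in range(c))
    return stat, (r - 1) * (c - 1)
```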

6 Simple Regression

Given the bivariate random sample $(x_1, y_1), \dots, (x_n, y_n)$:

Statistical Model of Simple Linear Regression: Given a predictor, $x$, the response, $y$, is
$$ y = \beta_0 + \beta_1 x + \epsilon $$
where $\beta_0 + \beta_1 x$ is the mean response for $x$. The noise terms, the $\epsilon$'s, are assumed to be independent of each other and to be randomly sampled from $N(0, \sigma)$.

Estimating $\beta_0$, $\beta_1$ and $\sigma$: The least–squares regression line, $y = b_0 + b_1 x$, is obtained by letting
$$ b_1 = r \frac{s_y}{s_x} = \frac{n \sum_{j=1}^n x_j y_j - \left( \sum_{j=1}^n x_j \right)\left( \sum_{j=1}^n y_j \right)}{n \sum_{j=1}^n x_j^2 - \left( \sum_{j=1}^n x_j \right)^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}, $$

where $b_0$ is an unbiased estimator of $\beta_0$ and $b_1$ is an unbiased estimator of $\beta_1$. The variance of the observed $y_i$'s about the predicted $\hat{y}_i$'s is

$$ s^2 \overset{\text{def}}{=} \frac{\sum (y_j - \hat{y}_j)^2}{n - 2} = \frac{\sum y_j^2 - b_0 \sum y_j - b_1 \sum x_j y_j}{n - 2}, $$
which is an unbiased estimator of $\sigma^2$. The standard error of estimate (also called the residual standard error) is $s$, an estimator of $\sigma$.

Hypothesis Tests and Confidence Intervals for β0 and β1: Let

$$ SE_{b_1} \overset{\text{def}}{=} \frac{s}{\sqrt{\sum_{j=1}^n (x_j - \bar{x})^2}} \quad \text{and} \quad SE_{b_0} \overset{\text{def}}{=} s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{j=1}^n (x_j - \bar{x})^2}}. $$

$SE_{b_0}$ and $SE_{b_1}$ are the standard errors of the intercept, $b_0$, and the slope, $b_1$, for the least–squares regression line.

To test the hypothesis $H_0 : \beta_1 = 0$ use the test statistic $t = \frac{b_1}{SE_{b_1}} \sim t(n-2)$. A level $(1-\alpha)100\%$ confidence interval for the slope $\beta_1$ is $b_1 \pm t^\star(n-2) \times SE_{b_1}$.

To test the hypothesis $H_0 : \beta_0 = b$ use the test statistic $t = \frac{b_0 - b}{SE_{b_0}} \sim t(n-2)$. A level $(1-\alpha)100\%$ confidence interval for the intercept $\beta_0$ is $b_0 \pm t^\star(n-2) \times SE_{b_0}$.

Accepting H0 : β1 = 0 is equivalent to accepting H0 : ρ = 0.

$(1-\alpha)100\%$ Confidence Interval for a mean response, $\mu_y$: A $(1-\alpha)100\%$ confidence interval for the mean response, $\mu_y$, when $x$ takes on the value $x^\star$ is $\hat{\mu}_y \pm m$, where the margin of error is
$$ m = t_{\alpha/2}(n-2) \underbrace{s \sqrt{\frac{1}{n} + \frac{(x^\star - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2}}}_{SE_{\hat{\mu}}}. $$

The standard error of the mean response is SEµˆ.

$(1-\alpha)100\%$ Prediction Interval for a future observation $y$ given $x = x^\star$: A $(1-\alpha)100\%$ prediction interval for $y$ given $x = x^\star$ is $\hat{y} \pm m$, where $\hat{y} = b_0 + b_1 x^\star$ and the margin of error is
$$ m = t_{\alpha/2}(n-2) \underbrace{s \sqrt{1 + \frac{1}{n} + \frac{(x^\star - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2}}}_{SE_{\hat{y}}}. $$

Test for Correlation: Consider the hypotheses

$$ H_0 : \rho = 0 \quad \text{vs} \quad H_A : \rho \neq 0 $$

The test statistic is
$$ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} \sim t(n - 2) $$
for $H_0$.

The following holds for the sums of squares:
$$ \underbrace{\sum_{j=1}^n (y_j - \bar{y})^2}_{SS_{TOT}} = \underbrace{\sum_{j=1}^n (\hat{y}_j - \bar{y})^2}_{SS_A} + \underbrace{\sum_{j=1}^n (y_j - \hat{y}_j)^2}_{SS_E}. $$
The mean squares equal the sums of squares divided by their corresponding degrees of freedom:
$$ MS_A \overset{\text{def}}{=} \text{Mean Square of Model} = \frac{SS_A}{1} \quad \text{and} \quad MS_E \overset{\text{def}}{=} \text{Mean Square of Error} = s^2 = \frac{SS_E}{n-2}. $$
The coefficient of determination is the portion of the variation in $y$ explained by the regression equation:

$$ r^2 \overset{\text{def}}{=} \frac{SS_A}{SS_{TOT}} = \frac{\sum_{j=1}^n (\hat{y}_j - \bar{y})^2}{\sum_{j=1}^n (y_j - \bar{y})^2}. $$

ANOVA F Test for Simple Linear Regression: Consider $H_0 : \beta_1 = 0$ versus $H_A : \beta_1 \neq 0$.

If $H_0$ holds, $f \overset{\text{def}}{=} \frac{MS_A}{MS_E}$ is from $F(1, n-2)$, and one uses a right–sided test.

The following is an ANOVA Table for Simple Linear Regression:

Source   SS        df      MS     ANOVA F Statistic   p–value
Model    SS_A      1       MS_A   f                   P(F(1, n−2) ≥ f)
Error    SS_E      n − 2   MS_E
Total    SS_TOT    n − 1
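The least-squares estimates, residual standard error, slope SE, and $r^2$ from this section can be sketched with the standard library (illustrative function name):

```python
from math import sqrt

def simple_regression(xs, ys):
    """Least-squares fit with residual standard error, SE of slope, and r^2."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sstot = sum((y - ybar) ** 2 for y in ys)
    s = sqrt(sse / (n - 2))          # residual standard error
    se_b1 = s / sqrt(sxx)            # standard error of the slope
    r2 = 1 - sse / sstot             # coefficient of determination
    return b0, b1, s, se_b1, r2
```

Note that $r^2$ here equals the square of the correlation coefficient $r$ of the same data, as the section states.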

7 Multivariate Regression

Given the multivariate random sample

$$ (x_1^{(1)}, x_2^{(1)}, \dots, x_k^{(1)}, y_1),\ (x_1^{(2)}, x_2^{(2)}, \dots, x_k^{(2)}, y_2),\ \dots,\ (x_1^{(n)}, x_2^{(n)}, \dots, x_k^{(n)}, y_n) $$

Statistical Model of Multivariate Linear Regression: Given a $k$–dimensional multivariate predictor, $(x_1^{(i)}, x_2^{(i)}, \dots, x_k^{(i)})$, the response, $y_i$, is

$$ y_i = \beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_k x_k^{(i)} + \epsilon_i $$

where $\beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_k x_k^{(i)}$ is the mean response. The noise terms, the $\epsilon_i$'s, are assumed to be independent of each other and to be randomly sampled from $N(0, \sigma)$.

Given a multivariate normal sample, $(x_1^{(1)}, \dots, x_k^{(1)}, y_1), \dots, (x_1^{(n)}, \dots, x_k^{(n)}, y_n)$, the least–squares multiple regression equation, $\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k$, is the linear equation that minimizes $\sum_{j=1}^n (\hat{y}_j - y_j)^2$, where $\hat{y}_j \overset{\text{def}}{=} b_0 + b_1 x_1^{(j)} + \cdots + b_k x_k^{(j)}$. There must be at least $k + 2$ points to obtain the estimators $b_0$, the $b_j$'s, and $s^2 \overset{\text{def}}{=} \frac{\sum_{j=1}^n (y_j - \hat{y}_j)^2}{n - k - 1}$ of $\beta_0$, the $\beta_j$'s, and $\sigma^2$, where

- $b_0$, the y–intercept, is the unbiased, least–squares estimator of $\beta_0$.
- $b_j$, the coefficient of $x_j$, is the unbiased, least–squares estimator of $\beta_j$.
- $s^2$ is an unbiased estimator of $\sigma^2$ and $s$ is an estimator of $\sigma$.

Due to computational intensity, computers are used to obtain b0, bj’s and s2.

Hypothesis Tests and Confidence Intervals for the βj's: To test the hypothesis $H_0 : \beta_j = 0$ use the test statistic

$$ t = \frac{b_j}{SE_{b_j}} \sim t(n - k - 1) \quad \text{for } H_0. $$
A level $(1-\alpha)100\%$ confidence interval for $\beta_j$ is $b_j \pm t^\star(n - k - 1)\, SE_{b_j}$. $SE_{b_j}$ is the standard error of $b_j$ (obtained from computer calculations).

Accepting $H_0 : \beta_j = 0$ is accepting that there is no linear association between $X_j$ and $Y$, i.e., that the correlation between $X_j$ and $Y$ is zero.

ANOVA Tables for Multivariate Regression: The following holds for the sums of squares:
$$ \underbrace{\sum_{j=1}^n (y_j - \bar{y})^2}_{SS_{TOT}} = \underbrace{\sum_{j=1}^n (\hat{y}_j - \bar{y})^2}_{SS_A} + \underbrace{\sum_{j=1}^n (y_j - \hat{y}_j)^2}_{SS_E}. $$
The mean squares equal the sums of squares divided by their corresponding degrees of freedom:
$$ MS_A \overset{\text{def}}{=} \text{Mean Square of Model} = \frac{SS_A}{k} \quad \text{and} \quad MS_E \overset{\text{def}}{=} \text{Mean Square of Error} = s^2 = \frac{SS_E}{n - k - 1}. $$
ANOVA F Test for Multivariate Regression: The test statistic for
$$ H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad \text{versus} \quad H_A : \text{not } H_0 $$
is $f = \frac{MS_A}{MS_E}$. The p–value of the above test is $P(F \ge f)$ where $F \sim F(k, n - k - 1)$.

ANOVA Table:

Source   df          Sum of Squares   Mean Square   F              p–value
Model    k           SS_A             MS_A          MS_A/MS_E      P(F(k, n−k−1) ≥ f)
Error    n − k − 1   SS_E             MS_E
Total    n − 1       SS_TOT

Multiple Correlation Coefficient: The squared multiple correlation, $R^2 \overset{\text{def}}{=} \frac{SS_A}{SS_{TOT}}$, measures the portion of the total variation that is explained by the model. The multiple correlation coefficient is just $R = \sqrt{R^2}$. The adjusted coefficient of determination,
$$ R^2_{adj} \overset{\text{def}}{=} 1 - (1 - R^2) \frac{n - 1}{n - k - 1}, $$
is a more accurate $R^2$ for large $k$.
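Given the sums of squares (typically read off software output), the three quantities above are a one-liner each; a stdlib sketch (illustrative name):

```python
from math import sqrt

def multiple_r2(ss_a, ss_tot, n, k):
    """R^2, multiple correlation R, and adjusted R^2 for k predictors."""
    r2 = ss_a / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, sqrt(r2), r2_adj
```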

8 One–Way ANOVA

$$ \begin{aligned} k &= \#\text{ of levels} \\ n_j &= \text{sample size from } j\text{th level population} \\ N &= \sum_{j=1}^k n_j = \text{total } \#\text{ of r.v.'s} \\ \bar{x}_j &= \text{sample mean from } j\text{th level population} \\ s_j^2 &= \text{sample variance from } j\text{th level population} \\ \bar{x} &= \text{sample mean from all level populations} \end{aligned} $$

$$ \begin{aligned} SS_{TOT} &= \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \text{Sum of Squares total} \\ SS_A &= \sum_{j=1}^k n_j (\bar{x}_j - \bar{x})^2 = \text{SS between levels of treatment } A \\ SS_E &= \sum_{j=1}^k (n_j - 1) s_j^2 = \text{SS within levels of treatment } A \\ MS_{TOT} &= \frac{SS_{TOT}}{N - 1} = \text{Mean Squares Total} \\ MS_A &= \frac{SS_A}{k - 1} = \text{Mean Squares Treatment } A \\ MS_E &= \frac{SS_E}{N - k} = \text{Mean Squares Error} \\ f &= \frac{MS_A}{MS_E} \end{aligned} $$

SSTOT = SSA + SSE.

Source    df       SS        MS      F            p
Between   k − 1    SS_A      MS_A    MS_A/MS_E    P(F(k−1, N−k) ≥ f)
Within    N − k    SS_E      MS_E
Total     N − 1    SS_TOT
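The one-way ANOVA F statistic can be assembled directly from the formulas above using the standard library (illustrative function name):

```python
from statistics import mean, variance

def one_way_anova(groups):
    """One-way ANOVA F statistic and (k - 1, N - k) degrees of freedom."""
    k = len(groups)
    big_n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ssa = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    f = (ssa / (k - 1)) / (sse / (big_n - k))
    return f, (k - 1, big_n - k)
```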

9 Two–Way ANOVA (2 treatments)

$$ \begin{aligned} I &\overset{\text{def}}{=} \#\text{ of levels for Treatment } A \\ J &\overset{\text{def}}{=} \#\text{ of levels for Treatment } B \\ SS_A &\overset{\text{def}}{=} \text{Sum of Squares for Treatment } A \\ MS_A &\overset{\text{def}}{=} \frac{SS_A}{I - 1} = \text{Mean Squares of Treatment } A \\ SS_B &\overset{\text{def}}{=} \text{Sum of Squares for Treatment } B \\ MS_B &\overset{\text{def}}{=} \frac{SS_B}{J - 1} = \text{Mean Squares of Treatment } B \\ SS_{AB} &\overset{\text{def}}{=} \text{Sum of Squares of Non–additive part} \\ MS_{AB} &\overset{\text{def}}{=} \frac{SS_{AB}}{(I - 1)(J - 1)} = \text{Mean Squares of Non–additive part} \\ SS_E &\overset{\text{def}}{=} \text{Sum of Squares within treatments} \\ MS_E &\overset{\text{def}}{=} \frac{SS_E}{N - IJ} = \text{Mean Squares within treatments} \\ SS_{TOT} &\overset{\text{def}}{=} \text{Total Sum of Squares} \end{aligned} $$

SSTOT = SSA + SSB + SSAB + SSE.

Source           df               SS       MS      F             p
Treatment A      I − 1            SS_A     MS_A    MS_A/MS_E     P(F(I−1, N−IJ) ≥ observed F)
Treatment B      J − 1            SS_B     MS_B    MS_B/MS_E     P(F(J−1, N−IJ) ≥ observed F)
Interaction AB   (I − 1)(J − 1)   SS_AB    MS_AB   MS_AB/MS_E    P(F((I−1)(J−1), N−IJ) ≥ observed F)
Error            N − IJ           SS_E     MS_E
Total            N − 1            SS_TOT

10 Addendum

The rules for the minimum sample size needed to use a test are human conventions and differ somewhat from statistician to statistician and from book to book.
