UCLA STAT 13 Comparison of Two Independent Samples
UCLA STAT 13
Introduction to Statistical Methods for the Life and Health Sciences
Comparison of Two Independent Samples
Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Teaching Assistants: Fred Phoa, Kirsten Johnson, Ming Zheng & Matilda Hsieh
University of California, Los Angeles, Fall 2005
http://www.stat.ucla.edu/~dinov/courses_students.html

Comparison of Two Independent Samples
• Many times in the sciences it is useful to compare two groups
    Male vs. Female
    Drug vs. Placebo
    NC vs. Disease
    Q: Different?
• What seems like a reasonable way to compare two groups?
• What parameter are we trying to estimate?

Comparison of Two Independent Samples
• Two Approaches for Comparison
    Confidence Intervals (we already know something about CI's)
    Hypothesis Testing (this will be new)
[Figure: Population 1 (mean $\mu_1$, SD $\sigma_1$) with Sample 1 (mean $\bar{y}_1$, SD $s_1$, size $n_1$); Population 2 (mean $\mu_2$, SD $\sigma_2$) with Sample 2 (mean $\bar{y}_2$, SD $s_2$, size $n_2$)]

Comparison of Two Independent Samples
• RECALL: The sampling distribution of $\bar{y}$ was centered at $\mu$ and had a standard deviation of $\sigma / \sqrt{n}$
• We'll start by describing the sampling distribution of $\bar{y}_1 - \bar{y}_2$
    Mean: $\mu_1 - \mu_2$
    Standard deviation: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• What seems like appropriate estimates for these quantities?

Standard Error of $\bar{y}_1 - \bar{y}_2$
• We know $\bar{y}_1 - \bar{y}_2$ estimates $\mu_1 - \mu_2$
• What we need to describe next is the precision of our estimate, $SE_{(\bar{y}_1 - \bar{y}_2)}$
    $SE_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{SE_1^2 + SE_2^2}$

Standard Error of $\bar{y}_1 - \bar{y}_2$
Example: A study is conducted to quantify the benefits of a new cholesterol-lowering medication (e.g., ftp://ftp.nist.gov/pub/dataplot/other/reference/CHOLEST2.DAT). Two groups of subjects are compared: those who took the medication twice a day for 3 years, and those who took a placebo.
Assume subjects were randomly assigned to either group and that both groups' data are normally distributed. Results from the study are shown below:

                Medication   Placebo
    $\bar{y}$   209.8        224.3
    n           10           10
    s           44.3         46.2
    SE          14.0         14.6

Calculate an estimate of the true mean difference between treatment groups and this estimate's precision. First, denote medication as group 1 and placebo as group 2.
    $\bar{y}_1 - \bar{y}_2 = 209.8 - 224.3 = -14.5$
    $SE_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{44.3^2}{10} + \frac{46.2^2}{10}} = 20.24$

Pooled vs. Unpooled
• $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ is known as an unpooled version of the standard error
    there is also a "pooled" SE
• First we describe a "pooled" variance, which can be thought of as a weighted average of $s_1^2$ and $s_2^2$
    $s_{pooled}^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
• Then we use the pooled variance to calculate the pooled version of the standard error
    $SE_{pooled} = \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
NOTE: If ($n_1 = n_2$) and ($s_1 = s_2$) the pooled and unpooled will give the same answer for $SE_{(\bar{y}_1 - \bar{y}_2)}$
It is when $n_1 \neq n_2$ that we need to decide whether to use pooled or unpooled:
    if $s_1 = s_2$ then use pooled (unpooled will give a similar answer)
    if $s_1 \neq s_2$ then use unpooled (pooled will NOT give a similar answer)

Pooled vs. Unpooled
• RESULT: Both methods are similar when $s_1 = s_2$ and $n_1 = n_2$, and the pooled version is not valid when $s_1 \neq s_2$
• Why all the torture? This will come up again in chapter 11.
• Because the df increases a great deal when we do pool the variance.

CI for $\mu_1 - \mu_2$
• RECALL: We described a CI earlier as:
    the estimate ± (an appropriate multiplier) × (SE)
• A 100(1 - α)% confidence interval for $\mu_1 - \mu_2$ (p.227)
    $(\bar{y}_1 - \bar{y}_2) \pm t(df)_{\alpha/2} \, SE_{(\bar{y}_1 - \bar{y}_2)}$
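As a numerical check on the cholesterol example, the sketch below recomputes the mean difference, the unpooled standard error, and the pooled standard error from the summary statistics in the table. The function and variable names are illustrative choices, not part of the course material, and only the Python standard library is assumed.

```python
import math


def unpooled_se(s1, n1, s2, n2):
    """Unpooled SE of (ybar1 - ybar2): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weights (n - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_se(s1, n1, s2, n2):
    """Pooled SE of (ybar1 - ybar2): sqrt(s_pooled^2 * (1/n1 + 1/n2))."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))


# Summary statistics from the table (group 1 = medication, group 2 = placebo)
ybar1, s1, n1 = 209.8, 44.3, 10
ybar2, s2, n2 = 224.3, 46.2, 10

diff = ybar1 - ybar2
se_unpooled = unpooled_se(s1, n1, s2, n2)
se_pooled = pooled_se(s1, n1, s2, n2)

print(f"difference  = {diff:.1f}")
print(f"unpooled SE = {se_unpooled:.2f}")
print(f"pooled SE   = {se_pooled:.2f}")
```

Both standard errors round to 20.24 here. With equal sample sizes the pooled and unpooled formulas reduce to the same expression, so the choice between them only matters when $n_1 \neq n_2$.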
In the confidence interval for $\mu_1 - \mu_2$, the degrees of freedom for the t multiplier are
    $df = \frac{(SE_1^2 + SE_2^2)^2}{\frac{SE_1^4}{n_1 - 1} + \frac{SE_2^4}{n_2 - 1}}$

CI for $\mu_1 - \mu_2$
Example: Cholesterol medication (cont')
Calculate a 95% confidence interval for $\mu_1 - \mu_2$.
We know $\bar{y}_1 - \bar{y}_2$ and $SE_{(\bar{y}_1 - \bar{y}_2)}$ from the previous slides. Now we need to find the t multiplier:
    $df = \frac{(14^2 + 14.6^2)^2}{\frac{14^4}{10 - 1} + \frac{14.6^4}{10 - 1}} = \frac{167411.9056}{9317.021} = 17.97 \approx 17$
Round down to be conservative.
NOTE: Calculating that df is not really that fun; a quick rule of thumb for checking your work is: $n_1 + n_2 - 2$

CI for $\mu_1 - \mu_2$
    $(\bar{y}_1 - \bar{y}_2) \pm t(df)_{\alpha/2} \, SE_{(\bar{y}_1 - \bar{y}_2)}$
    $= -14.5 \pm t(17)_{0.025} \, (20.24)$
    $= -14.5 \pm 2.110 \, (20.24)$
    $= (-57.21, 28.21)$
CONCLUSION: We are highly confident, at the 0.05 level, that the true mean difference in cholesterol between the medication and placebo groups is between -57.21 and 28.21 mg/dL.
Note the change in the conclusion of the parameter that we are estimating. Still looking for the 5 basic parts of a CI conclusion (see slide 38 of lecture set 5).

CI for $\mu_1 - \mu_2$
• What's so great about this type of confidence interval?
• In the previous example our CI contained zero
    This interval isn't telling us much because:
    the true mean difference could be more than zero (in which case the mean of group 1 is larger than the mean of group 2)
    or the true mean difference could be less than zero (in which case the mean of group 1 is smaller than the mean of group 2)
    or the true mean difference could even be zero!
    The ZERO RULE!
Suppose the CI came out to be (5.2, 28.1); would this indicate a true mean difference?

Hypothesis Testing: The independent t test
• The idea of a hypothesis test is to formulate a hypothesis that nothing is going on and then to see if the collected data are consistent with this hypothesis (or if the data show something different)
    Like innocent until proven guilty
• There are four main parts to a hypothesis test:
    hypotheses
    test statistic
    p-value
    conclusion

Hypothesis Testing: #1 The Hypotheses
• There are two hypotheses:
    Null hypothesis (aka the "status quo" hypothesis), denoted by $H_o$
    Alternative hypothesis (aka the research hypothesis), denoted by $H_a$

Hypothesis Testing: #1 The Hypotheses
• If we are comparing two group means, nothing going on would imply no difference
    the means are "the same": $(\mu_1 - \mu_2) = 0$
• For the independent t-test the hypotheses are:
    $H_o$: $(\mu_1 - \mu_2) = 0$ (no statistical difference in the population means)
    $H_a$: $(\mu_1 - \mu_2) \neq 0$ (a statistical difference in the population means)

Hypothesis Testing: #1 The Hypotheses
Example: Cholesterol medication (cont')
Suppose we want to carry out a hypothesis test to see if the data show that there is enough evidence to support a difference in treatment means.
Find the appropriate null and alternative hypotheses.
    $H_o$: $(\mu_1 - \mu_2) = 0$ (no statistical difference in the true means of the medication and placebo groups)
    $H_a$: $(\mu_1 - \mu_2) \neq 0$ (a statistical difference in the true means of the medication and placebo groups; medication has an effect on cholesterol)

Hypothesis Testing: #2 Test Statistic
• A test statistic is calculated from the sample data
    it measures the "disagreement" between the data and the null hypothesis
    if there is a lot of "disagreement" then we would think that the data provide evidence that the null hypothesis is false
    if there is little to no "disagreement" then we would think that the data do not provide evidence that the null hypothesis is false
    $t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{(\bar{y}_1 - \bar{y}_2)}}$
    subtract 0 because the null says the difference is zero

Hypothesis Testing: #2 Test Statistic
Example: Cholesterol medication (cont')
Calculate the test statistic
    $t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{(\bar{y}_1 - \bar{y}_2)}} = \frac{(209.8 - 224.3) - 0}{20.24} = -0.716$

Hypothesis Testing: #2 Test Statistic
• On a t distribution $t_s$ could fall anywhere
    If the test statistic is close to 0, this shows that the data are compatible with $H_o$ (no difference)
        the deviation can be attributed to chance
    If the test statistic is far from 0 (in the tails, upper or lower), this shows that the data are incompatible with $H_o$ (there is a difference)
        the deviation does not appear to be attributed to chance (i.e.