Biost 517 Applied Biostatistics I Lecture Outline

Applied Biostatistics I, AUT 2012
November 26, 2012

Biost 517
Applied Biostatistics I

Scott S. Emerson, M.D., Ph.D.
Professor of Biostatistics
University of Washington

Lecture 13: Two Sample Inference About Independent Proportions
November 26, 2012

© 2002, 2003, 2005 Scott S. Emerson, M.D., Ph.D.


Lecture Outline

• Comparing Independent Proportions
  – Large Samples (Uncensored)
      • Chi Squared Test
  – Small Samples (Uncensored)
      • Fisher’s Exact Test
      • Adjusted Chi Squared or Fisher’s Exact Test


Comparing Independent Proportions
Large Samples (Uncensored)


Summary Measures

• Comparing distributions of binary variables across two groups
• Difference of proportions (most common)
  – Inference based on difference of estimated proportions
• Ratio of proportions
  – Of most relevance with low probabilities
  – Often actually use odds ratio


Data: Contingency Tables

• The cross classified counts

                   Response
                   Yes    No  |  Tot
    Group   0       a      b  |   n0
            1       c      d  |   n1
    Total          m0     m1  |    N


Large Sample Distribution

• With totally independent data, we use the Central Limit Theorem
• Proportions are means
  – Sample proportions are sample means
• Standard error estimates for each group’s estimated proportion are based on the mean-variance relationship


Asymptotic Sampling Distn

• Comparing two binomial proportions

Suppose independent

\[ X_i \sim B(1, p_0), \qquad X = \sum_{i=1}^{n_0} X_i \sim B(n_0, p_0) \]

and

\[ Y_i \sim B(1, p_1), \qquad Y = \sum_{i=1}^{n_1} Y_i \sim B(n_1, p_1) \]

We want to make inference about \( \theta = p_1 - p_0 \)

\[ \hat{p}_0 = \frac{X}{n_0} \mathrel{\dot\sim} N\!\left(p_0, \frac{p_0(1-p_0)}{n_0}\right), \qquad \hat{p}_1 = \frac{Y}{n_1} \mathrel{\dot\sim} N\!\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \]

\[ \hat{\theta} = \hat{p}_1 - \hat{p}_0 \mathrel{\dot\sim} N\!\left(p_1 - p_0, \; \frac{p_1(1-p_1)}{n_1} + \frac{p_0(1-p_0)}{n_0}\right) \]


Asymptotic Confidence Intervals

• Confidence interval for difference between two binomial proportions

We want to make inference about \( \theta = p_1 - p_0 \)

The \( 100(1-\alpha)\% \) confidence interval is

\[ \left( \hat{\theta} - z_{1-\alpha/2}\, \widehat{se}(\hat{\theta}), \;\; \hat{\theta} + z_{1-\alpha/2}\, \widehat{se}(\hat{\theta}) \right) \]

where

\[ \hat{\theta} = \hat{p}_1 - \hat{p}_0 \]

Estimate

\[ \widehat{se}(\hat{\theta}) = \sqrt{ \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_0} } \]


Asymptotic Hypothesis Tests

• Test statistic for difference between two binomial proportions

Suppose we want to test \( H_0 : \theta = p_1 - p_0 = 0 \)

Under the null hypothesis,

\[ Z = \frac{\hat{\theta}}{\widehat{se}(\hat{\theta})} \mathrel{\dot\sim} N(0, 1) \]

Under \( H_0 \) the entire distributions are equal, so estimate

\[ \widehat{se}(\hat{\theta}) = \sqrt{ \frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_0} } \qquad \text{with} \qquad \hat{p} = \frac{X + Y}{n_0 + n_1} \]


Goodness of Fit Test

• An alternative derivation of the asymptotic test of binomial proportions as a special case of the “goodness of fit test”
• For contingency tables of arbitrary size
  – “R by C” table
  – E.g., tumor grade by bone scan score in PSA data set


Goodness of Fit Test

• Given categorical random variables, we often have scientific questions that relate to the frequency distribution for the measurements
• Examples:
  – Checking for Poisson, normal, etc.
  – Distribution of phenotypes to agree with genetic theory
  – Comparing distributions of categorical data across populations
  – Independence of random variables


    . tabulate grade bss, row col cell

               |           bss
         grade |      1       2       3 |   Total
    -----------+------------------------+--------
             1 |      1       3       6 |      10
               |  10.00   30.00   60.00 |  100.00
               |  20.00   27.27   25.00 |   25.00
               |   2.50    7.50   15.00 |   25.00
    -----------+------------------------+--------
             2 |      2       4       9 |      15
               |  13.33   26.67   60.00 |  100.00
               |  40.00   36.36   37.50 |   37.50
               |   5.00   10.00   22.50 |   37.50
    -----------+------------------------+--------
             3 |      2       4       9 |      15
               |  13.33   26.67   60.00 |  100.00
               |  40.00   36.36   37.50 |   37.50
               |   5.00   10.00   22.50 |   37.50
    -----------+------------------------+--------
         Total |      5      11      24 |      40
               |  12.50   27.50   60.00 |  100.00
               | 100.00  100.00  100.00 |  100.00
               |  12.50   27.50   60.00 |  100.00


Aside: Sampling Nomenclature

• We often characterize the sampling scheme according to the total counts that were fixed by design
  – Poisson sampling: none
  – Multinomial sampling: total counts
  – Binomial sampling: either row or column totals
  – Hypergeometric sampling: both row and column totals


Basic Idea

• We compare the observed counts in each cell to the number we might have expected
  – “Observed – Expected”
  – Counts, NOT proportions


Sampling Distribution

• The test is to see whether the data fit the hypotheses (hence “goodness of fit”)
• Observed counts in each cell are assumed to be Poisson random variables
  – Standard error is the square root of the mean
• Test statistic based on sum of Z scores for each cell
  – Actually done as squared Z scores
  – Sum of squared normals has a chi squared distn


Test Statistic

• Pearson’s chi-square goodness of fit statistic in the general case
  – Assuming only one “constraint” on the cells

Given K categories, and for the ith cell \( O_i \) the observed count and \( E_i \) the expected count (typically \( E_i = p_i N \)),

\[ \chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} \;\overset{H_0}{\sim}\; \chi^2_{K-1} \]


Determining the “Expected”

• The scientific null hypothesis usually specifies the probability that a random observation would belong in each cell
• Most often:
  – Testing independence of variables
      • Probabilities in each cell are expected to be the product of the marginal distributions
• Other uses
  – Testing “goodness of fit” to distributions
  – Testing agreement with genetic hypotheses


Test for Independence

• Chi-square test for independence (or association)

For an R × C table for two categorical variables, with \( O_{rc} \) the observed count in the (r,c)th cell and \( E_{rc} = \hat{p}_{r\cdot}\, \hat{p}_{\cdot c}\, N \) the expected count,

\[ \chi^2 = \sum_{r=1}^{R} \sum_{c=1}^{C} \frac{(O_{rc} - E_{rc})^2}{E_{rc}} \;\overset{H_0}{\sim}\; \chi^2_{(R-1)(C-1)} \]

where \( \hat{p}_{r\cdot} \) and \( \hat{p}_{\cdot c} \) are the estimated proportions for the row and column margins, respectively


    . tabulate grade bss, row col exp

               |           bss
         grade |      1       2       3 |   Total
    -----------+------------------------+--------
             1 |      1       3       6 |      10
               |    1.3     2.8     6.0 |    10.0
               |  10.00   30.00   60.00 |  100.00
               |  20.00   27.27   25.00 |   25.00
    -----------+------------------------+--------
             2 |      2       4       9 |      15
               |    1.9     4.1     9.0 |    15.0
               |  13.33   26.67   60.00 |  100.00
               |  40.00   36.36   37.50 |   37.50
    -----------+------------------------+--------
             3 |      2       4       9 |      15
               |    1.9     4.1     9.0 |    15.0
               |  13.33   26.67   60.00 |  100.00
               |  40.00   36.36   37.50 |   37.50
    -----------+------------------------+--------
         Total |      5      11      24 |      40
               |    5.0    11.0    24.0 |    40.0
               |  12.50   27.50   60.00 |  100.00
               | 100.00  100.00  100.00 |  100.00


In 2 x 2 Contingency Tables

• The chi squared test and the Z test comparing binomial proportions are exactly the same test
• The chi squared statistic is just the square of the Z statistic
• The chi squared test P value will be the same as the two-sided P value for the Z test
• Most software packages have a tendency to tell you the value of the chi squared statistic


Elevator Statistics

• Well, nearly elevator statistics

        0 |   a     b  |  n0
        1 |   c     d  |  n1
          |  m0    m1  |   N

\[ \chi^2 = \frac{N\,(ad - bc)^2}{n_0\, n_1\, m_0\, m_1}, \qquad Z = (ad - bc)\sqrt{\frac{N}{n_0\, n_1\, m_0\, m_1}} \]

(Written this way, Z has the sign of \( \hat{p}_0 - \hat{p}_1 \); the chi squared statistic and the two-sided P value do not depend on the sign.)


Need for Large Samples

• Note that because the goodness of fit test is relying on asymptotic properties, it is only valid in “large” samples
• A commonly used rule of thumb is that the expected counts be greater than 5 in the vast majority of the cells


Other Large Sample Tests

• “Likelihood Ratio Test”
• Also has chi squared distribution in large samples
  – But not the same statistic as “chi squared statistic”
• Good large sample properties
  – Often most powerful
• Less commonly used in 2 x 2 tables
  – We will use it more often in logistic regression
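
As a rough check on the goodness of fit arithmetic above, the following Python sketch recomputes the expected counts, Pearson's chi squared statistic, and a likelihood ratio statistic for the grade by bss counts shown in the tabulate output. The function names are hypothetical, and the likelihood ratio form 2 Σ O log(O / E) is the standard textbook expression, which the slides themselves do not spell out.

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def pearson_chi2(table):
    """Sum over cells of (Observed - Expected)^2 / Expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def lr_chi2(table):
    """Likelihood ratio statistic 2 * sum O * log(O / E).

    Assumption: cells with an observed count of 0 contribute 0 to the sum.
    """
    exp = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)

def degrees_of_freedom(table):
    """(R - 1)(C - 1) for an R by C table."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Counts copied from the "tabulate grade bss" output (rows: grade, columns: bss).
grade_by_bss = [[1, 3, 6],
                [2, 4, 9],
                [2, 4, 9]]

x2 = pearson_chi2(grade_by_bss)
g2 = lr_chi2(grade_by_bss)
df = degrees_of_freedom(grade_by_bss)

print(expected_counts(grade_by_bss))
print(f"Pearson chi2 = {x2:.4f}, LR chi2 = {g2:.4f}, df = {df}")
```

With these counts most of the expected values fall below 5, so by the rule of thumb above the chi squared approximation would be doubtful here; the sketch only illustrates the arithmetic.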


Stata Commands: CI

• “cs respvar groupvar, level(#)”
• Both variables must be coded as 0 and 1
• Response will be called “cases” and “noncases”
  – CI can be found under “Risk difference”
  – Chi squared statistic and two-sided P value
• “tabulate respvar groupvar, row col chi2 lr”
  – Row and column percentages
  – Chi squared and likelihood ratio P values


Ex: Stata Commands

• Example: Hepatomegaly by treatment group in PBC data set

    . g tx = 2 - treatmnt
    . cs hepmeg tx

                 |           tx            |
                 |   Exposed    Unexposed  |   Total
    -------------+-------------------------+--------
           Cases |        73           87  |     160
        Noncases |        84           66  |     150
    -------------+-------------------------+--------
           Total |       157          153  |     310
                 |                         |
            Risk |  .4649682     .5686275  | .516129


Ex: Stata Commands (cont.)

• Example: Hepatomegaly by treatment group in PBC data set (cont.)

                  |    Point    |
                  |  estimate   |  [95% Conf. Interval]
    --------------+-------------+----------------------
        Risk diff |    -.1037   |   -.2143       .0070
       Risk ratio |     .8177   |    .6580      1.0161
     Prev frac ex |     .1823   |   -.0161       .3420
    Prev frac pop |     .0923   |

                    chi2(1) = 3.33   Pr>chi2 = 0.0679


Interpretation

• With 95% confidence, the true difference in the prevalence of hepatomegaly at baseline is between .007 higher in treatment group and .214 lower in treatment group.
• (What are these populations? At baseline, we only have samples that are treated and control. Both groups were drawn from the same population.)
• Based on two sided P = .0679, we cannot reject the null hypothesis of equality


Caveat: Sad Fact of Life

• Different variance estimates are typically used for CI and hypothesis tests
• We can see disagreement between the conclusion reached by CI and P value
  – The P value might be less than .05, but the CI contain 0
  – The P value might be greater than .05, but the CI exclude 0


Comparison to t Test

• Hepatomegaly by treatment group using two sample t test with unequal variances
  – Z test uses the standard normal distribution and does not use the sample variance

    . ttest hepmeg, by(tx) unequal

    Two-sample t test; unequal variances
    0: N obs = 157
    1: N obs = 153

    Variable |  Mean   St Err      t    P>|t|     [95% CI]
    ---------+----------------------------------------------
           0 |  .465    .0399   11.6   0.0000   .386   .544
           1 |  .569    .0402   14.2   0.0000   .489   .648
        diff | -.104    .0566  -1.83   0.0682  -.215   .008


Comparison to
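
The figures in the cs output for the hepatomegaly example can be checked by hand. The Python sketch below is one way to do so, assuming (as in the asymptotic formulas earlier in the lecture) that the confidence interval uses the unpooled standard error while the test uses the pooled one. Variable names are illustrative and are not part of the Stata output.

```python
from math import sqrt
from statistics import NormalDist

# Cell counts copied from the cs output: hepatomegaly (cases) by tx.
cases_exp, cases_unexp = 73, 87
noncases_exp, noncases_unexp = 84, 66

n_exp = cases_exp + noncases_exp          # 157
n_unexp = cases_unexp + noncases_unexp    # 153
n_total = n_exp + n_unexp                 # 310

p_exp = cases_exp / n_exp
p_unexp = cases_unexp / n_unexp
risk_diff = p_exp - p_unexp

# Confidence interval: each group's own estimated proportion in the variance.
se_ci = sqrt(p_exp * (1 - p_exp) / n_exp + p_unexp * (1 - p_unexp) / n_unexp)
z_crit = NormalDist().inv_cdf(0.975)
ci_low = risk_diff - z_crit * se_ci
ci_high = risk_diff + z_crit * se_ci

# Hypothesis test: pooled proportion, since under H0 the groups share one p.
p_pool = (cases_exp + cases_unexp) / n_total
se_test = sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_unexp))
z = risk_diff / se_test
chi2 = z ** 2
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# 2 x 2 shortcut: N (ad - bc)^2 / (product of the four margins).
a, b, c, d = cases_exp, cases_unexp, noncases_exp, noncases_unexp
chi2_shortcut = n_total * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d))

print(f"risk diff = {risk_diff:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"Z = {z:.3f}, chi2 = {chi2:.2f}, shortcut = {chi2_shortcut:.2f}, "
      f"two-sided P = {p_two_sided:.4f}")
```

Because the interval and the test use different standard errors (about .0565 unpooled and .0568 pooled here), the two can occasionally disagree near the .05 boundary, which is the caveat raised above.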
