Genetic Association Tests for Binary Traits with An
GENETIC ASSOCIATION TESTS FOR BINARY TRAITS WITH AN APPLICATION

by

SULGI KIM

Submitted in partial fulfillment of the requirements
For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Robert C. Elston

Department of Epidemiology and Biostatistics
CASE WESTERN RESERVE UNIVERSITY

August, 2009

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of
Sulgi Kim
candidate for the Ph.D. degree*.

(Signed)
Robert C. Elston, Ph.D., Department of Epidemiology and Biostatistics (Chair of the committee)
Xiaofeng Zhu, Ph.D., Department of Epidemiology and Biostatistics
Courtney Gray-McGuire, Ph.D., Department of Epidemiology and Biostatistics
Jill S. Barnholtz-Sloan, Ph.D., Case Comprehensive Cancer Center

June 8, 2009

*We also certify that written approval has been obtained for any proprietary material therein.

Table of Contents

List of Tables
List of Figures
Acknowledgements
Abstract
Chapter 1. Introduction
Chapter 2. Association Tests for a Binary Trait with Unrelated Individuals
Chapter 3. An Application to Diabetic Nephropathy Data
Chapter 4. Association Tests for a Binary Trait Measured on Related Individuals
Chapter 5. Conclusions and Areas for Future Study
APPENDIX
BIBLIOGRAPHY

List of Tables

Table 2-I Genetic Association Tests for Case-Control Data
Table 2-II Probabilities for unphased genotypes
Table 2-III Single-marker and two-marker association tests with corresponding models and hypotheses
Table 2-IV Constraints for disease models
Table 2-V Comparisons of Theoretical and Empirical Power of Test 1-2
Table 2-VI Empirical Type I Error of Test 1-2
Table 2-VII Power comparisons of two-marker tests (rare allele frequency)
Table 2-VIII Power comparisons of two-marker tests (common allele frequency)
Table 2-IX Mean of Power over Chromosome 11 of CEU HapMap Data
Table 3-I Details of Tag SNP at the candidate gene
Table 3-II Categorical Covariates used in our analysis
Table 3-III Logistic regression model identifying significant covariates
Table 3-IV The Haplotype Distribution of rs2146098 and rs6659783 in Cases and Controls
Table 4-I Empirical Type I errors with Random Samples
Table 4-II Empirical Power with Random Samples
Table 4-III Empirical Type I errors with Ascertained Samples
Table 4-IV Empirical Power with Ascertained Samples

List of Figures

Figure 2-1 Power of single-marker test vs LD in four disease models (K=0.05)
Figure 2-2 Power of single-marker test vs disease allele frequency in four disease models (K=0.05)
Figure 2-3 Power of single-marker test vs LD in four disease models (K=0.20)
Figure 2-4 Power of single-marker test vs disease allele frequency in four disease models (K=0.20)
Figure 2-5 Power of single-marker test vs LD in four disease models (K=0.005)
Figure 2-6 Power of single-marker test vs disease allele frequency in four disease models (K=0.005)
Figure 3-1 LD Plot of CNDP1 and tag SNPs
Figure 3-2 LD Plot of ELMO1 and tag SNPs
Figure 3-3 Results for CNDP1 and ELMO1 with ARB use covariate (B)
Figure 3-4 Results for the eight other candidate genes
Figure 3-5 Association test results for HMCN1 with covariate configuration (B) (including ARB use as a covariate)
Figure 3-6 LD Plot of HMCN1 and tag SNPs

Acknowledgements

Most of all, I wish to thank my advisor, Dr. Robert Elston, for his time and guidance in making this dissertation a reality. I also thank Dr. Xiaofeng Zhu, Dr. Courtney Gray-McGuire and Dr. Jill Barnholtz-Sloan for serving on the committee and providing valuable advice. I am grateful to Nathan Morris, who has offered useful suggestions.

My work on this dissertation was supported by a U.S. Public Health Service research grant (R01-DK069844) from the NIDDK (P.I.: Dr. Sharon Adler). Computational support was provided by a U.S. Public Health Service resource grant (RR03655) from the National Center for Research Resources (P.I.: Dr. Robert Elston). I am grateful to Dr. Sudha Iyengar and Dr. Sharon Adler for the opportunity to receive this funding and also for their helpful advice. This dissertation includes some work of other FIND collaborators from Perlegen in APPENDIX E.

I also wish to express my appreciation to the other faculty members at Case Western Reserve University for their excellent teaching. Sincere thanks go to my friends Sungho Won, Sung-Gon Yi, Gyungah Jun, Robert Goodloe and Yali Li for their willingness to share what they had learned. Finally, I thank my family for their love and prayers.

Genetic Association Tests for Binary Traits with an Application

Abstract

by

SULGI KIM

Genetic association studies aim to map causal variants for a trait by performing association tests between each of many markers along a chromosomal region and a trait of interest. Therefore, valid and powerful association tests are essential for a genetic association study.
In this dissertation, association tests are considered for binary traits using both unrelated and related individuals.

For unrelated data, this dissertation shows that prospective models may be developed that correspond conceptually to retrospective tests. Two single-marker tests and four two-marker tests are discussed. The true association models are derived, allowing us to understand the effects of marker association patterns. The power of the association tests was investigated by simulation using HapMap data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease; but, for non-additive diseases, the genotypic test has the most power. Among the four two-marker tests, the Allelic-LD contrast test provides the most reliable power overall for the cases studied.

The proposed methods were applied to Diabetic Nephropathy (DN) data. Two genes, Carnosine Dipeptidase 1 (CNDP1) and Engulfment and Cell Motility 1 (ELMO1), have previously shown association with DN. These two genes, along with eight other genes (HMCN1, CFH, AHSG, CASP3, HSPA1A, HSPB1, CASP12, and HMOX1), were examined in a new study of Mexican-Americans. There was no replication of the associations with either CNDP1 or ELMO1. Of the other eight candidate genes, association with DN was found with a SNP pair, rs2146098 and rs6659783, in HMCN1. Association with a rare haplotype in this region was subsequently identified.

Lastly, association tests for related individuals were considered, particularly with genome-wide data. Two versions of the quasi-likelihood score test using a generalized linear mixed model (GLMM-QLS) were proposed. For 100 nuclear families of the same structure, it was shown that the proposed methods maintain nominal Type I error and have power comparable to previously published methods. The main strength of the GLMM-QLSs is their computational efficiency when applied to genome-wide association studies.
Because they are based on the prospective model, they can easily incorporate other covariates.

Chapter 1
INTRODUCTION