ABSTRACT MORRIS, RICHARD WAYNE. Likelihood Ratio Tests For

ABSTRACT MORRIS, RICHARD WAYNE. Likelihood ratio tests for association with multiple disease susceptibility alleles, genotyping errors, or missing parental data. (Under the direction of Norman L. Kaplan.) Multiple disease susceptibility alleles, genotype errors, or missing genotype data can create problems when testing for association between alleles or genotypes at a genetic marker and a dichotomous phenotype. I used likelihood methods to study the impact of each of these factors on detecting association. In the presence of multiple disease susceptibility alleles, I found that power of the likelihood ratio test (LRT) declines less when based on haplotypes made up of tightly linked single nucleotide polymorphisms (SNPs) than when based on individual SNPs. The result suggests that statistical methods based on haplotypes may be useful to identify and locate complex disease genes. Genotype errors can lead to excess type I error in nuclear family (case- parents) studies when errors resulting in Mendelian inconsistent families are corrected but other errors remain in the data. I developed a LRT for single SNPs or haplotypes that incorporates nuisance parameters for genotype errors and showed that type I error rate can be controlled at little cost to power. For nuclear family data in which missing parents and additional siblings create a diversity of family structures, I developed a unified approach to computing LRT power for a test of association. Comparison of LRT power with power of a family-based association test showed that LRT has greater power. 1 Likelihood Ratio Tests for Association with Multiple Disease Susceptibility Alleles, Genotyping Errors, or Missing Parental Data by Richard Wayne Morris A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Biomathematics Raleigh 2003 Approved by Norman L. Kaplan Jeffrey L. Thorne Co-Chair of Advisory Committee Co-Chair of Advisory Committee Bruce S. Weir Dahlia M. Nielsen 2 to Jackie …the adventure continues ii Biography I grew up in San Diego, California, and graduated from San Diego State University with a BS in Zoology. I then attended the University of California, Berkeley, and completed requirements for a PhD in Genetics, except for writing the dissertation. I moved to Raleigh, North Carolina, in 1978 and held various research positions in the Departments of Statistics and Genetics at North Carolina State University. I completed a Master of Statistics at NCSU in 1984 and worked on statistical problems in toxicology in a contract-research environment for 17 years. I entered the PhD program in Biomathematics in 1994 and continued to work full time while pursuing coursework in statistics and mathematics. I joined the Biostatistics Branch at the National Institute of Environmental Health Sciences in 2001 and completed a PhD in Biomathematics in 2003. iii Acknowledgements I have really enjoyed doing the work summarized in this dissertation – in part because it is intellectually satisfying work, but mostly because it has given me an opportunity to interact with some very interesting and talented people. In my early years in the Biomathematics program, when I was taking a course each semester and working full time, Joe Haseman encouraged me in many ways and always supported my educational aspirations. During this period, Rich Cohn fostered an environment in the Division of Statistics and Public Health Research at Analytical Sciences, Inc. that encouraged personal and professional growth. I am grateful to both of these individuals for their support. I thank Ann Etheridge for her enthusiastic help in navigating the administrative requirements of the Biomathematics program. She helped and encouraged me in ways that are not reflected in the technical content of this document, but are nevertheless important to me. I am grateful to Clare Weinberg, who made it possible for me to complete this work at the National Institute of Environmental Health Sciences. Without her efforts, this would not have been possible. I thank the members of my committee for their support. Jeff Thorne has given freely of his time and offered insightful comments at different stages of development of this work. Bruce Weir has provided valuable perspective on statistical problems and has long supported my efforts to achieve closure on this project. Dahlia Nielsen has offered alternative interpretations and has often asked questions that evoked constructive thought about the problems discussed here. My greatest intellectual debt, iv however, is to Norm Kaplan, who worked with me from rough concepts through mathematical details to manuscript submissions. He kept me focused on each problem and provided valuable perspective throughout this work. The two years I spent completing this dissertation under Norm’s direction were, simply put, great fun. v Table of Contents Page List of Tables................................................................................................... vii List of Figures ................................................................................................. viii List of Symbols................................................................................................ ix List of Abbreviations...................................................................................... xi Introduction .................................................................................................... 1 Chapter 1 On the Advantage of Haplotype Analysis in the Presence of Multiple Disease Susceptibility Alleles .................................... 4 Abstract........................................................................................ 5 Introduction ................................................................................. 6 Methods ....................................................................................... 7 Likelihood Ratio Test............................................................ 7 Specification of Alternative Hypotheses............................... 8 Single Historical Scenario ..................................................... 8 Multiple Historical Scenarios................................................ 9 Expected LRT Power ............................................................ 10 Nonuniform Susceptibility Allele Frequencies ..................... 10 Results ......................................................................................... 11 Uniform Distribution of Multiple Susceptibility Mutations: 2, 4, and 8 Marker Alleles ................................................ 11 Uniform Distribution of Multiple Susceptibility Mutations: Haplotypes........................................................................ 12 Nonniform Distribution of Multiple Susceptibility Mutations.......................................................................... 13 Discussion.................................................................................... 14 Acknowledgements ..................................................................... 16 Appendix ..................................................................................... 16 References ................................................................................... 17 Chapter 2 Testing for Association with Nuclear Families in the Presence of Genotyping Error............................................ 18 Abstract........................................................................................ 19 Introduction ................................................................................. 20 vi Page Methods ....................................................................................... 21 Genotype Error Model........................................................... 21 Nuclear Family Model........................................................... 24 Likelihood of Observed Data ................................................ 25 Likelihood Ratio Test............................................................ 26 Data Analysis Strategies........................................................ 27 Simulations............................................................................ 28 Results ......................................................................................... 28 Type I Error Rate for Single SNP.......................................... 28 Type I Error Rate for 2 and 3 SNPs ...................................... 29 Power Loss Under MGE ....................................................... 31 Bias of RME Under Alternative Hypothesis......................... 31 Discussion.................................................................................... 32 Acknowledgements ..................................................................... 37 Appendix A ................................................................................. 38 Appendix B.................................................................................. 38 Likelihood of Complete Data ................................................ 38 Expectation Step.................................................................... 40 Maximization Step................................................................. 41 References ................................................................................... 43 Chapter 3 A Likelihood Approach to Power Computations for a Case-parents Design with Missing Parents and Additional Siblings ...................................................................

ABSTRACT MORRIS, RICHARD WAYNE. Likelihood Ratio Tests For

Identifying Novel Disease-Associated Variants and Understanding The

The Struggle to Find Reliable Results in Exome Sequencing Data

BIOINFORMATICS Pages 1–9

The Clark Phase-Able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS

Integration and Missing Data Handling in Multiple Omics Studies

The NCBI Dbgap Database of Genotypes and Phenotypes

Erciyes Medical Genetics Days 2019

Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data

Genetic Investigation of Kidney Disease

Exploration of Interaction Between Common and Rare Variants in Genetic Susceptibility to Holoprosencephaly Artem Kim

Estimation of Genotype Error Rate Using Samples with Pedigree Information—An Application on the Genechip Mapping 10K Array

Mendelian Error Detection in Complex Pedigrees Using Weighted Constraint Satisfaction Techniques Marti Sanchez, Simon De Givry, Thomas Schiex