Detecting Genomic Imprinting and Maternal Effects in Family-Based Association Studies

Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Fangyuan Zhang, Graduate Program in Biostatistics

The Ohio State University, 2015

Dissertation Committee:
Prof. Shili Lin, Advisor
Prof. Haikady N. Nagaraja
Prof. Christopher W. Bartlett

© Copyright by Fangyuan Zhang 2015

Abstract

Genomic imprinting and maternal effects are two important epigenetic factors that can contribute to phenotypic variation without changing the DNA sequence. They are potential sources of the missing heritability that genome-wide association studies cannot explain. Genomic imprinting and maternal effects have been shown to affect many complex human diseases, including Prader-Willi, Beckwith-Wiedemann, and Angelman syndromes, and childhood cancers. My dissertation focuses on the detection of these two epigenetic factors in family-based association studies.

We first propose a Monte Carlo Pedigree Parental-Asymmetry Test utilizing both Affected and Unaffected offspring (MCPPATu) for detecting imprinting effects. Although genomic imprinting and maternal effects can be confounded, the proposed method is simple and powerful when there is no maternal effect. It utilizes information from both affected and unaffected offspring and allows for missing genotypes through Monte Carlo sampling. Simulation studies demonstrate that MCPPATu controls the empirical type I error rate well under the null hypothesis of no parent-of-origin effects. They also show that the additional information from unaffected offspring and partially observed genotypes can greatly improve statistical power.

The second part of our work offers a practical strategy for selecting the optimum study design for the joint detection of genomic imprinting and maternal effects within a case-control family scheme.
To enable such an investigation, we first derive the asymptotic properties of a recent partial likelihood method (LIME) that we employ for simultaneous effect detection, and we compare the information content of each study design under investigation. Our results show that the optimal study design is mainly determined by the disease prevalence.

Third, we develop a partial Likelihood method for detecting Imprinting and Maternal Effects for a Discordant Sib-Pair design (LIMEDSP) that utilizes all available sibship data without the need to recruit separate control families. By matching affected and unaffected probands and stratifying according to their familial genotypes, a partial likelihood component free of nuisance parameters can be extracted from the full likelihood. Theoretical analysis shows that the partial maximum likelihood estimators based on the LIMEDSP approach are consistent and asymptotically normally distributed. A simulation study demonstrates that LIMEDSP is robust and powerful. To illustrate its practical utility, LIMEDSP is applied to a club foot dataset and to the Framingham Heart Study.

In the last part, we propose a two-step approach to detect the two epigenetic effects jointly when missing genotypes lead to ambiguous parental origin. The first step infers the distribution of the missing genotypes using information from nearby loci. The second step applies a partial likelihood method to the inferred data. To substantiate the validity of the proposed procedure, we carried out a simulation study. The results show that, by borrowing genetic information from nearby loci, the power of the proposed method can approach that of the method based on complete genotype data at the locus of interest. This illustrates that nearby markers in linkage disequilibrium can help resolve parental-origin ambiguity. To illustrate its practical utility, the two-step method is applied to data from an autism study.
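Imprinting and maternal-effect analyses both depend on knowing which parent transmitted which allele. The Python sketch below is an illustration only, not the MCPPATu, LIME, or LIMEhap implementation. It shows when a parent-child trio determines parental origin at a single biallelic locus and when it leaves origin ambiguous. The sketch assumes genotypes coded as 0/1/2 copies of a reference allele A, None for a missing genotype, and no genotyping error. The function names `can_transmit` and `parental_origin` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical coding: a genotype is the number of copies (0, 1, or 2) of a
# reference allele "A" at one biallelic locus; None marks a missing genotype.
Genotype = Optional[int]


def can_transmit(parent: Genotype, allele: int) -> bool:
    """True if a parent with this genotype can pass on `allele`
    (1 = A, 0 = the other allele). A missing parent could pass on either."""
    if parent is None:
        return True
    return parent >= 1 if allele == 1 else parent <= 1


def parental_origin(
    father: Genotype, mother: Genotype, child: int
) -> Optional[Tuple[int, int]]:
    """Return (paternal allele, maternal allele) of the child when the trio
    determines it uniquely, or None when parental origin is ambiguous.

    Raises ValueError if no Mendelian-consistent assignment exists."""
    candidates: List[Tuple[int, int]] = [
        (p, child - p)
        for p in (0, 1)                    # allele received from the father
        if 0 <= child - p <= 1             # maternal allele must be 0 or 1
        and can_transmit(father, p)
        and can_transmit(mother, child - p)
    ]
    if not candidates:
        raise ValueError("Mendelian inconsistency in trio")
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    trios = [(1, 0, 1), (None, 0, 1), (1, 1, 1), (None, 1, 1), (2, 1, 2)]
    for f, m, c in trios:
        print((f, m, c), parental_origin(f, m, c))
    # (1, 0, 1) (1, 0)      heterozygous father, homozygous mother: resolved
    # (None, 0, 1) (1, 0)   father missing, but origin is still resolved
    # (1, 1, 1) None        all three heterozygous: ambiguous
    # (None, 1, 1) None     missing father, heterozygous mother: ambiguous
    # (2, 1, 2) (1, 1)      homozygous child: always resolved
```

In this sketch, a homozygous child is always resolved. A heterozygous child is resolved only when at least one parent is observed to be homozygous. The missing-genotype case in the examples is the kind of ambiguity that the two-step approach described above aims to resolve with information from nearby loci.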
Dedication

Dedicated to my husband and parents, for their everlasting love and support.

Acknowledgments

I would like to express my special appreciation and thanks to my advisor, Dr. Shili Lin. Her advice on both my research and my career has been priceless. Without her supervision and constant help over the past five years, this dissertation would not have been possible.

I would also like to thank Dr. Haikady N. Nagaraja and Dr. Christopher W. Bartlett for serving on my Ph.D. dissertation exam committee, and Dr. Asuman Turkmen for serving on my Ph.D. candidacy exam committee. I appreciate not only their time and extreme patience, but also their intellectual contributions to my development as a researcher.

Vita

May 23, 1987: Born, Harbin, China
2010: B.S. Statistics, Beijing Normal University
2012: M.S. Statistics, The Ohio State University
2011-2014: Graduate Teaching/Research Associate, Department of Statistics, The Ohio State University

Publications

Research Publications

F. Zhang & S. Lin, "Detection of Imprinting Effects for Hypertension Based on General Pedigrees Utilizing All Affected and Unaffected Individuals." BMC Proceedings, 8(Suppl 1):S52, 2014.

H. N. Nagaraja, K. Bharath & F. Zhang, "Spacings Around an Order Statistic." Annals of the Institute of Statistical Mathematics. Published online first: 26 April 2014.

F. Zhang & S. Lin, "Nonparametric method for detecting imprinting effect using all members of general pedigrees with missing data." Journal of Human Genetics, 59, 541-548.

Fields of Study

Major Field: Biostatistics

Studies in:
Nonparametric method for detecting imprinting effect using all members of general pedigrees with missing data (Prof. Shili Lin)
Asymptotic Property and Study Design for Detecting Imprinting and Maternal Effects Based on Partial Likelihood (Prof. Shili Lin)
Imprinting and Maternal Effect Detection Using Partial Likelihood Based on Discordant Sibship Data (Prof. Shili Lin)
Spacings Around an Order Statistic (Prof. Haikady N. Nagaraja)

Table of Contents

Abstract
Dedication
Acknowledgments
Vita

Chapter 1. Introduction
1.1 Genomic Imprinting and Maternal Effects
1.2 Diseases Related to Imprinting Effect or Maternal Effect
1.3 Existing Methods to Detect Imprinting Effect and Maternal Effects
1.4 Organization of This Dissertation

Chapter 2. Detection of Imprinting Effects Based on General Pedigrees Utilizing All Affected and Unaffected Individuals
2.1 Introduction
2.2 Materials and Methods
2.3 Results
2.4 Discussion

Chapter 3. Optimum Study Design for Detecting Imprinting and Maternal Effects Based on Partial Likelihood
3.1 Introduction
3.1.1 The LIME Procedure
3.1.2 Asymptotic Properties
3.1.3 Calculation of Per Family and Per Individual Information Content
3.1.4 Numerical Study: Empirical Versus Asymptotic Variances
3.2 Study Design Consideration
3.2.1 Information Content Per Family
3.2.2 Information Content Per Individual
3.3 Discussion

Chapter 4. Imprinting and Maternal Effect Detection Using Partial Likelihood Based on Discordant Sibship Data
4.1 Introduction
4.2 Partial Likelihood Method - LIMEDSP
4.2.1 Notation and Genetic Model
4.2.2 Ascertainment and Probability Formulation
4.2.3 Organization of Data
4.2.4 Partial Likelihood and Asymptotic Properties
4.2.5 Combining Data From the Two Study Designs
4.3 Theoretical Study of Information Contents
4.4 Simulation
4.5 Real Data Analysis
4.5.1 Analysis of the Club Foot Data
4.5.2 Analysis of the Framingham Heart Study Data
4.6 Discussion

Chapter 5. Incorporating Information from Nearby Loci for Detection of Imprinting and Maternal Effects Based on Partial Likelihood
5.1 Introduction
5.2 Method
5.2.1 Enrichment of Test Locus Information
5.2.2 LIMEhap
5.3 Simulation Study
5.3.1 Type I Error and Power
5.3.2 Position of Test Locus and Number of Additional Loci
5.4 Autism Spectrum Disorder Data
5.5 Discussion

Chapter 6. Contributions and Future Work
6.1 Contributions
6.2 Future Work
6.2.1 Penalized Partial Likelihood
6.2.2 Involving Ancestry in LIMEhap Method
6.2.3 Other Future Work

Bibliography

Appendices

Chapter A. Detection of Imprinting Effects Based on General Pedigrees Utilizing All Affected and Unaffected Individuals
A.1 Show that E(T_MCPPATu) = 0
A.2 Analysis of a Rheumatoid Arthritis Dataset

Chapter B. Optimum Study Design for Detecting Imprinting and Maternal Effects Based on Partial Likelihood
B.1 Technical Details Related to Theorem 3.1
B.1.1 Regularity Conditions
B.1.2 Proof of Theorem 3.1
B.1.3 Calculation of the Constants in the Information Matrix I(θ_0)
B.2 Calculation of Per Family and Per Individual Information Content
B.2.1 Per Family Information Content
B.2.2 Per Individual Information Content
B.3 Verification of Asymptotic Properties
B.4 Sample Size Calculation

Chapter C. Imprinting and Maternal Effect Detection Using Partial Likelihood Based on Discordant Sibship Data
C.1 Calculation of Probabilities in Table 4.1
C.2 Regularity Conditions and Proof of Theorem 4.1
C.3 Estimation of Maternal Effect with the DSP Design without Additional Siblings