ABSTRACT MORRIS, RICHARD WAYNE. Likelihood Ratio Tests For

Total Page:16

File Type:pdf, Size:1020Kb

ABSTRACT MORRIS, RICHARD WAYNE. Likelihood Ratio Tests For ABSTRACT MORRIS, RICHARD WAYNE. Likelihood ratio tests for association with multiple disease susceptibility alleles, genotyping errors, or missing parental data. (Under the direction of Norman L. Kaplan.) Multiple disease susceptibility alleles, genotype errors, or missing genotype data can create problems when testing for association between alleles or genotypes at a genetic marker and a dichotomous phenotype. I used likelihood methods to study the impact of each of these factors on detecting association. In the presence of multiple disease susceptibility alleles, I found that power of the likelihood ratio test (LRT) declines less when based on haplotypes made up of tightly linked single nucleotide polymorphisms (SNPs) than when based on individual SNPs. The result suggests that statistical methods based on haplotypes may be useful to identify and locate complex disease genes. Genotype errors can lead to excess type I error in nuclear family (case- parents) studies when errors resulting in Mendelian inconsistent families are corrected but other errors remain in the data. I developed a LRT for single SNPs or haplotypes that incorporates nuisance parameters for genotype errors and showed that type I error rate can be controlled at little cost to power. For nuclear family data in which missing parents and additional siblings create a diversity of family structures, I developed a unified approach to computing LRT power for a test of association. Comparison of LRT power with power of a family-based association test showed that LRT has greater power. 1 Likelihood Ratio Tests for Association with Multiple Disease Susceptibility Alleles, Genotyping Errors, or Missing Parental Data by Richard Wayne Morris A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Biomathematics Raleigh 2003 Approved by Norman L. Kaplan Jeffrey L. Thorne Co-Chair of Advisory Committee Co-Chair of Advisory Committee Bruce S. Weir Dahlia M. Nielsen 2 to Jackie …the adventure continues ii Biography I grew up in San Diego, California, and graduated from San Diego State University with a BS in Zoology. I then attended the University of California, Berkeley, and completed requirements for a PhD in Genetics, except for writing the dissertation. I moved to Raleigh, North Carolina, in 1978 and held various research positions in the Departments of Statistics and Genetics at North Carolina State University. I completed a Master of Statistics at NCSU in 1984 and worked on statistical problems in toxicology in a contract-research environment for 17 years. I entered the PhD program in Biomathematics in 1994 and continued to work full time while pursuing coursework in statistics and mathematics. I joined the Biostatistics Branch at the National Institute of Environmental Health Sciences in 2001 and completed a PhD in Biomathematics in 2003. iii Acknowledgements I have really enjoyed doing the work summarized in this dissertation – in part because it is intellectually satisfying work, but mostly because it has given me an opportunity to interact with some very interesting and talented people. In my early years in the Biomathematics program, when I was taking a course each semester and working full time, Joe Haseman encouraged me in many ways and always supported my educational aspirations. During this period, Rich Cohn fostered an environment in the Division of Statistics and Public Health Research at Analytical Sciences, Inc. that encouraged personal and professional growth. I am grateful to both of these individuals for their support. I thank Ann Etheridge for her enthusiastic help in navigating the administrative requirements of the Biomathematics program. She helped and encouraged me in ways that are not reflected in the technical content of this document, but are nevertheless important to me. I am grateful to Clare Weinberg, who made it possible for me to complete this work at the National Institute of Environmental Health Sciences. Without her efforts, this would not have been possible. I thank the members of my committee for their support. Jeff Thorne has given freely of his time and offered insightful comments at different stages of development of this work. Bruce Weir has provided valuable perspective on statistical problems and has long supported my efforts to achieve closure on this project. Dahlia Nielsen has offered alternative interpretations and has often asked questions that evoked constructive thought about the problems discussed here. My greatest intellectual debt, iv however, is to Norm Kaplan, who worked with me from rough concepts through mathematical details to manuscript submissions. He kept me focused on each problem and provided valuable perspective throughout this work. The two years I spent completing this dissertation under Norm’s direction were, simply put, great fun. v Table of Contents Page List of Tables................................................................................................... vii List of Figures ................................................................................................. viii List of Symbols................................................................................................ ix List of Abbreviations...................................................................................... xi Introduction .................................................................................................... 1 Chapter 1 On the Advantage of Haplotype Analysis in the Presence of Multiple Disease Susceptibility Alleles .................................... 4 Abstract........................................................................................ 5 Introduction ................................................................................. 6 Methods ....................................................................................... 7 Likelihood Ratio Test............................................................ 7 Specification of Alternative Hypotheses............................... 8 Single Historical Scenario ..................................................... 8 Multiple Historical Scenarios................................................ 9 Expected LRT Power ............................................................ 10 Nonuniform Susceptibility Allele Frequencies ..................... 10 Results ......................................................................................... 11 Uniform Distribution of Multiple Susceptibility Mutations: 2, 4, and 8 Marker Alleles ................................................ 11 Uniform Distribution of Multiple Susceptibility Mutations: Haplotypes........................................................................ 12 Nonniform Distribution of Multiple Susceptibility Mutations.......................................................................... 13 Discussion.................................................................................... 14 Acknowledgements ..................................................................... 16 Appendix ..................................................................................... 16 References ................................................................................... 17 Chapter 2 Testing for Association with Nuclear Families in the Presence of Genotyping Error............................................ 18 Abstract........................................................................................ 19 Introduction ................................................................................. 20 vi Page Methods ....................................................................................... 21 Genotype Error Model........................................................... 21 Nuclear Family Model........................................................... 24 Likelihood of Observed Data ................................................ 25 Likelihood Ratio Test............................................................ 26 Data Analysis Strategies........................................................ 27 Simulations............................................................................ 28 Results ......................................................................................... 28 Type I Error Rate for Single SNP.......................................... 28 Type I Error Rate for 2 and 3 SNPs ...................................... 29 Power Loss Under MGE ....................................................... 31 Bias of RME Under Alternative Hypothesis......................... 31 Discussion.................................................................................... 32 Acknowledgements ..................................................................... 37 Appendix A ................................................................................. 38 Appendix B.................................................................................. 38 Likelihood of Complete Data ................................................ 38 Expectation Step.................................................................... 40 Maximization Step................................................................. 41 References ................................................................................... 43 Chapter 3 A Likelihood Approach to Power Computations for a Case-parents Design with Missing Parents and Additional Siblings ...................................................................
Recommended publications
  • Identifying Novel Disease-Associated Variants and Understanding The
    Identifying Novel Disease-variants and Understanding the Role of the STAT1-STAT4 Locus in SLE A dissertation submitted to the Graduate School of University of Cincinnati In partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Immunology Graduate Program of the College of Medicine by Zubin H. Patel B.S., Worcester Polytechnic Institute, 2009 John B. Harley, M.D., Ph.D. Committee Chair Gurjit Khurana Hershey, M.D., Ph.D Leah C. Kottyan, Ph.D. Harinder Singh, Ph.D. Matthew T. Weirauch, Ph.D. Abstract Systemic Lupus Erythematosus (SLE) or lupus is an autoimmune disorder caused by an overactive immune system with dysregulation of both innate and adaptive immune pathways. It can affect all major organ systems and may lead to inflammation of the serosal and mucosal surfaces. The pathogenesis of lupus is driven by genetic factors, environmental factors, and gene-environment interactions. Heredity accounts for a substantial proportion of SLE risk, and the role of specific genetic risk loci has been well established. Identifying the specific causal genetic variants and the underlying molecular mechanisms has been a major area of investigation. This thesis describes efforts to develop an analytical approach to identify candidate rare variants from trio analyses and a fine-mapping analysis at the STAT1-STAT4 locus, a well-replicated SLE-risk locus. For the STAT1-STAT4 locus, subsequent functional biological studies demonstrated genotype dependent gene expression, transcription factor binding, and DNA regulatory activity. Rare variants are classified as variants across the genome with an allele-frequency less than 1% in ancestral populations.
    [Show full text]
  • The Struggle to Find Reliable Results in Exome Sequencing Data
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE METHODS ARTICLEprovided by Frontiers - Publisher Connector published: 12 February 2014 doi: 10.3389/fgene.2014.00016 The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors Zubin H. Patel 1,2 †, Leah C. Kottyan1,3 †, Sara Lazaro1,3 , Marc S. Williams 4 , David H. Ledbetter 4 , GerardTromp 4 , Andrew Rupert 5 , Mojtaba Kohram 5 , Michael Wagner 5 , Ammar Husami 6 , Yaping Qian 6 , C. Alexander Valencia 6 , Kejian Zhang 6 , Margaret K. Hostetter 7 , John B. Harley 1,3 and Kenneth M. Kaufman1,3 * 1 Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 2 Medical Scientist Training Program, University of Cincinnati College of Medicine, Cincinnati, OH, USA 3 Department of Veterans Affairs, Veterans Affairs Medical Center – Cincinnati, Cincinnati, OH, USA 4 Genomic Medicine Institute, Geisinger Health System, Danville, PA, USA 5 Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 6 Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 7 Division of Infectious Disease, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA Edited by: Next Generation Sequencing studies generate a large quantity of genetic data in a Helena Kuivaniemi, Geisinger Health relatively cost and time efficient manner and provide an unprecedented opportunity to System, USA identify candidate causative variants that lead to disease phenotypes. A challenge to these Reviewed by: studies is the generation of sequencing artifacts by current technologies. To identify and Tatiana Foroud, Indiana University School of Medicine, USA characterize the properties that distinguish false positive variants from true variants, we Goo Jun, University of Michigan, USA sequenced a child and both parents (one trio) using DNA isolated from three sources *Correspondence: (blood, buccal cells, and saliva).
    [Show full text]
  • BIOINFORMATICS Pages 1–9
    Vol. 00 no. 00 2012 BIOINFORMATICS Pages 1–9 DELISHUS: An Efficient and Exact Algorithm for Genome-Wide Detection of Deletion Polymorphism in Autism Derek Aguiar 1;2, Bjarni V. Halldorsson´ 3, Eric M. Morrow ∗;4;5, Sorin Istrail ∗;1;2 1Department of Computer Science, Brown University, Providence, RI, USA 2Center for Computational Molecular Biology, Brown University, Providence, RI, USA 3School of Science and Engineering, Reykjavik University, Reykjavik, Iceland 4Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, USA 5Department of Psychiatry and Human Behavior, Brown University, Providence, RI, USA Received on XXXXX; revised on XXXXX; accepted on XXXXX Associate Editor: XXXXXXX ABSTRACT mutations with severe effects is more commonly being viewed as a Motivation: The understanding of the genetic determinants of com- major component of disease (McClellan and King, 2010). Phenoty- plex disease is undergoing a paradigm shift. Genetic heterogeneity pic heterogeneity – a large collection of individually rare or personal of rare mutations with deleterious effects is more commonly being conditions – is associated with a higher genetic heterogeneity than viewed as a major component of disease. Autism is an excellent previously assumed. This heterogeneity spectrum can be summari- example where research is active in identifying matches between the zed as follows: (i) individually rare mutations collectively explain phenotypic and genomic heterogeneities. A considerable portion of a large portion of complex disease; (ii) a single gene may contain autism appears to be correlated with copy number variation, which many severe but rare mutations in unrelated individuals; (iii) the is not directly probed by single nucleotide polymorphism (SNP) array same mutation may lead to different clinical conditions in different or sequencing technologies.
    [Show full text]
  • The Clark Phase-Able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS
    The Clark Phase-able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS Bjarni V. Halld´orsson1,3,4,,, Derek Aguiar1,2,,, Ryan Tarpine1,2,,, and Sorin Istrail1,2,, 1 Center for Computational Molecular Biology, Brown University [email protected] 2 Department of Computer Science, Brown University [email protected] 3 School of Science and Engineering, Reykjavik University 4 deCODE genetics Abstract. A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of the paper is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent (IBD), in contrast to short shared tracts which may be identical by state (IBS). Here we estimate for populations, using the US as a model, what sample size of genotyped individuals would be necessary to have sufficiently long shared haplotype regions (tracts) that are identical by descent (IBD), at a statistically significant level. These tracts can then be used as input for a Clark-like phasing method to obtain a complete phasing solution of the sample. We estimate in this paper that for a population like the US and about 1% of the people genotyped (approximately 2 million), tracts of about 200 SNPs long are shared between pairs of individuals IBD with high probability which assures the Clark method phasing success.
    [Show full text]
  • Integration and Missing Data Handling in Multiple Omics Studies
    INTEGRATION AND MISSING DATA HANDLING IN MULTIPLE OMICS STUDIES by Zhou Fang MS in Industrial Engineering, University of Pittsburgh, 2014 BS in Theoretical and Applied Mechanics, Fudan University, China, 2012 Submitted to the Graduate Faculty of the Department of Biostatistics Graduate school of Public health in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2018 UNIVERSITY OF PITTSBURGH DEPARTMENT OF BIOSTATISTICS GRADUATE SCHOOL OF PUBLIC HEALTH This dissertation was presented by Zhou Fang It was defended on May 3rd 2018 and approved by George C. Tseng, ScD, Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh Wei Chen, PhD, Associate Professor, Department of Pediatrics, School of Medicine, University of Pittsburgh Gong Tang, PhD, Associate Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh Ming Hu, PhD, Assistant Staff, Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation Ying Ding, PhD, Assistant Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh ii Dissertation Advisors: George C. Tseng, ScD, Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Wei Chen, PhD, Associate Professor, Department of Pediatrics, School of Medicine, University of Pittsburgh iii Copyright c by Zhou Fang 2018 iv INTEGRATION AND MISSING DATA HANDLING IN MULTIPLE OMICS STUDIES Zhou Fang, PhD University of Pittsburgh, 2018 ABSTRACT In modern multiple omics high-throughput data analysis, data integration and missing- ness data handling are common problems in discovering regulatory mechanisms associated with complex diseases and boosting power and accuracy. Moreover, in genotyping problem, the integration of linkage disequilibrium (LD) and identity-by-descent (IBD) information be- comes essential to reach universal superior performance.
    [Show full text]
  • The NCBI Dbgap Database of Genotypes and Phenotypes
    COMMENTARY The NCBI dbGaP database of genotypes and phenotypes Matthew D Mailman, Michael Feolo, Yumi Jin, Masato Kimura, Kimberly Tryka, Rinat Bagoutdinov, Luning Hao, Anne Kiang, Justin Paschall, Lon Phan, Natalia Popova, Stephanie Pretel, Lora Ziyabari, Moira Lee, Yu Shao, Zhen Y Wang, Karl Sirotkin, Minghong Ward, Michael Kholodov, Kerry Zbicz, Jeffrey Beck, Michael Kimelman, Sergey Shevelev, Don Preuss, Eugene Yaschenko, Alan Graeff, James Ostell & Stephen T Sherry The National Center for Biotechnology Information has created the dbGaP public repository for individual-level http://www.nature.com/naturegenetics phenotype, exposure, genotype and sequence data and the associations between them. dbGaP assigns stable, unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations, and groups of study subjects who have given similar consents for use of their data. The technical advances and declining costs of The purposes of this description of dbGaP are collaborating researchers—a model that sup- high-throughput genotyping afford investi- threefold: (i) to describe dbGaP’s functionality ports documentation of protocols, phenotypes, gators fresh opportunities to do increasingly for users and submitters; (ii) to describe dbGaP’s variables and analysis through paper-based complex analyses of genetic associations with design and operational processes for database manuals and forms, personal communication phenotypic and disease characteristics. The methodologists to emulate or improve upon; or, more recently, electronic access privileges to Nature Publishing Group Group Nature Publishing 7 leading candidates for such genome-wide and (iii) to reassure the lay and scientific public a project data coordinating center.
    [Show full text]
  • Erciyes Medical Genetics Days 2019
    ERCIYES MEDICAL JOURNAL International Participated Erciyes Medical Genetics Days 2019 Uluslararası Katılımlı Erciyes Tıp Genetik Günleri 2019 DOI: 10.14744/etd.2019.55631 KARE ERCIYES MEDICAL JOURNAL International Participated Erciyes Medical Genetics Days 2019 21–23 February 2019, Erciyes University, Kayseri, Turkey Honorary Presidents of Congress Mustafa Calis M. Hakan Poyrazoglu Guest Editor Munis Dundar President of the Congress Munis Dundar Organizing Committee Owner Munis Dundar On Behalf of Faculty of Yusuf Ozkul Medicine, Erciyes University Cetin Saatci Dr. M. Hakan POYRAZOĞLU Muhammet E. Dogan Executive Committee of Turkish Society of Medical Genetics Mehmet Ali Ergun KARE Oya Uyguner Ayca Aykut Publisher Mehmet Alikasifoglu Kare Publishing Beyhan Durak Aras Project Assistants Taha Bahsi Eymen İSKENDER Altug Koc Graphics Neslihan ÇAKIR Contact Address: Dumlupınar Mah., Cihan Sok., No: 15, Concord İstanbul, B Blok 162, Kadıköy İstanbul, Turkey Phone: +90 216 550 61 11 Fax: +90 216 550 61 12 E-mail: [email protected] A-II ERCIYES MEDICAL JOURNAL Editor-in-Chief Jordi RELLO Mehmet DOĞANAY Centro de Investigación Biomédica en Red (CIBERES), Department of Infectious Diseases and Clinical Microbiology, Vall d’Hebron Hospital Campus, Barcelona, Spain Faculty of Medicine, Erciyes University, Kayseri, Turkey Jafar SOLTANİ Editors Infectious Diseases Unit, Department of Pediatrics, Niyazi ACER Faculty of Medicine, Kurdistan University of Medical Department of Anatomy, Faculty of Medicine, Sciences, Sanandaj, Iran Erciyes University, Kayseri,
    [Show full text]
  • Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data
    bioRxiv preprint doi: https://doi.org/10.1101/001958; this version posted January 24, 2014. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. JOINT VARIANT AND DE NOVO MUTATION IDENTIFICATION ON PEDIGREES FROM HIGH-THROUGHPUT SEQUENCING DATA JOHN G. CLEARY1,*, ROSS BRAITHWAITE1, KURT GAASTRA1, BRIAN S. HILBUSH2, STUART INGLIS1, SEAN A. IRVINE1, ALAN JACKSON1, RICHARD LITTIN1, SAHAR NOHZADEH-MALAKSHAH2, MINITA SHAH2, MEHUL RATHOD1, DAVID WARE1, LEN TRIGG1,†, AND FRANCISCO M. DE LA VEGA2,‡, † 1Real Time Genomics, Ltd., Hamilton 3240, New Zealand. 2Real Time Genomics, Inc. San Bruno, CA 94066, USA. The analysis of whole-genome or exome sequencing data from trios and pedigrees has being successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next- generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we present a Bayesian network framework that jointly analyses data from all members of a pedigree simultaneously using Mendelian segregation priors, yet providing the ability to detect de novo mutations in offspring, and is scalable to large pedigrees. We evaluated our method by simulations and analysis of WGS data from a 17 individual, 3- generation CEPH pedigree sequenced to 50X average depth. Compared to singleton calling, our family caller produced more high quality variants and eliminated spurious calls as judged by common quality metrics such as Ti/Tv, Het/Hom ratios, and dbSNP/SNP array data concordance.
    [Show full text]
  • Genetic Investigation of Kidney Disease
    Genetic Investigation of Kidney Disease A thesis submitted for the Degree of Doctor of Philosophy 2010 Daniel P. Gale UCL Division of Medicine Declaration I, Daniel Philip Gale, confirm that the work presented in this thesis is my own. Where information has been derived from other sources I confirm that this has been indicated in the thesis. 2 Acknowledgements I would like to thank the patients and families who participated in this study and from whom I have learned so much. Without their trust, hospitality and generosity none of this would have been possible. I am especially glad to have the opportunity to thank my supervisor, Professor Patrick Maxwell whose support, encouragement and wisdom have been so valuable in both the laboratory and the hospital. I would also like to thank the Medical Research Council for funding this work. I also owe a debt of gratitude to my colleagues in the laboratories and hospitals in which I worked – especially Drs Sarah Harten, Tapan Bhattacharyya, Alkis Pierides, Margaret Ashcroft and Deepa Shukla; and to my collaborators, in particular Professors Terry Cook, Ted Tuddenham and Peter Robbins, and Drs Matthew Pickering, Nick Talbot and Elena Goicoechea de Jorge. I would also like to thank Guy Neild, Tom Connor, Tony Segal and Oliver Staples for many stimulating discussions, and my family and friends for their unwavering support and interest. 3 Abstract Kidney disease is an important contributor to the burden of ill-health worldwide. Genetic factors have an important role in determining which people are affected by kidney disease, and this study aimed to identify the genes responsible for disease in families in which unusual kidney diseases were transmitted in a pattern suggesting autosomal dominant inheritance.
    [Show full text]
  • Exploration of Interaction Between Common and Rare Variants in Genetic Susceptibility to Holoprosencephaly Artem Kim
    Exploration of interaction between common and rare variants in genetic susceptibility to holoprosencephaly Artem Kim To cite this version: Artem Kim. Exploration of interaction between common and rare variants in genetic susceptibility to holoprosencephaly. Human health and pathology. Université Rennes 1, 2020. English. NNT : 2020REN1B019. tel-03199534 HAL Id: tel-03199534 https://tel.archives-ouvertes.fr/tel-03199534 Submitted on 15 Apr 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE DE DOCTORAT DE L'UNIVERSITE DE RENNES 1 ECOLE DOCTORALE N° 605 Biologie Santé Spécialité : Génétique, génomique et bio-informatique Par Artem KIM « Prénom NOM » Exploration de l’interaction entre variants rares et communs dans la susceptibilité génétique à holoprosencéphalie Thèse présentée et soutenue à Rennes, le 30/09/2020 Unité de recherche : Institut de Génétique et Développement de Rennes (IGDR), UMR6290 CNRS, Université Rennes 1 Rapporteurs avant soutenance : Composition du Jury : Thomas BOURGERON Thomas BOURGERON Professeur des universités Professeur
    [Show full text]
  • Estimation of Genotype Error Rate Using Samples with Pedigree Information—An Application on the Genechip Mapping 10K Array
    Genomics 84 (2004) 623–630 www.elsevier.com/locate/ygeno Estimation of genotype error rate using samples with pedigree information—an application on the GeneChip Mapping 10K array Ke Hao,a Cheng Li,a,b Carsten Rosenow,c and Wing Hung Wonga,d,* a Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA b Department of Biostatistics, Dana Farber Cancer Institute, Boston, MA, USA c Genomics Collaboration, Affymetrix, Santa Clara, CA, USA d Department of Statistics, Harvard University, Cambridge, MA, USA Received 16 January 2004; accepted 13 May 2004 Available online 13 July 2004 Abstract Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the ‘‘dose–response’’ reference curve between error rate and the observable error number were derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed.
    [Show full text]
  • Mendelian Error Detection in Complex Pedigrees Using Weighted Constraint Satisfaction Techniques Marti Sanchez, Simon De Givry, Thomas Schiex
    Mendelian error detection in complex pedigrees using weighted constraint satisfaction techniques Marti Sanchez, Simon de Givry, Thomas Schiex To cite this version: Marti Sanchez, Simon de Givry, Thomas Schiex. Mendelian error detection in complex pedigrees using weighted constraint satisfaction techniques. Artificial Intelligence Research and Development, 163, IOS Press, pp.22, 2007, Frontiers in Artificial Intelligence and Applications, 978-1-60750-285-2 978-1-58603-798-7. hal-02814634 HAL Id: hal-02814634 https://hal.inrae.fr/hal-02814634 Submitted on 6 Jun 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Mendelian error detection in complex pedigrees using weighted constraint satisfaction techniques Marti Sanchez [email protected] Simon de Givry [email protected] Thomas Schiex [email protected] INRA - UBIA Toulouse, France October 12, 2007 Abstract With the arrival of high throughput genotyping techniques, the detection of likely genotyping errors is becoming an increasingly important problem. In this paper we are interested in errors that violate Mendelian laws. The problem of deciding if a Mendelian error exists in a pedigree is NP-complete [1]. Existing tools dedicated to this problem may offer different level of services: detect simple inconsistencies using local reasoning, prove inconsistency, detect the source of error, propose an optimal correc- tion for the error.
    [Show full text]