Expanding the Utility of Whole Genome Sequencing in the Diagnosis of Rare Genetic Disorders

EXPANDING THE UTILITY OF WHOLE GENOME SEQUENCING IN THE DIAGNOSIS OF RARE GENETIC DISORDERS

by

Phillip Andrew Richmond

B.A., The University of Colorado Boulder, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

October 2020

© Phillip Andrew Richmond, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Expanding the utility of whole genome sequencing in the diagnosis of rare genetic disorders

submitted by Phillip Andrew Richmond in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics

Examining Committee:

Wyeth W. Wasserman, Professor, Medical Genetics, UBC
Supervisor

Dr. Inanc Birol, Professor, Medical Genetics, UBC
Supervisory Committee Member

Dr. Anna Lehman, Assistant Professor, Medical Genetics, UBC
Supervisory Committee Member

Dr. William Gibson, Professor, Medical Genetics, UBC
University Examiner

Dr. Paul Pavlidis, Professor, Psychiatry, UBC
University Examiner

Additional Supervisory Committee Members:

Sara Mostafavi, Assistant Professor, Medical Genetics, UBC
Supervisory Committee Member

Abstract

The emergence of whole genome sequencing (WGS) has revolutionized the diagnosis of rare genetic disorders, advancing the capacity to identify the “causal” gene responsible for disease phenotypes. In a single assay, many classes of genomic variants can be detected, from small single nucleotide changes to large insertions, deletions and duplications. While WGS has enabled a significant increase in the diagnostic rate compared to previous assays, at least 50% of cases remain unsolved.
The lack of a diagnosis is the result of limitations in both variant calling and variant interpretation. As the field of genomic medicine continues to advance, the emergence of novel bioinformatic approaches to variant calling and interpretation heralds promise for the future of undiagnosed cases. In the applied setting, innovation is driven by anecdotes of complex diagnoses, which in turn lead to the development of novel tools and approaches. This is a key theme within this thesis work, where in-depth analysis of a single undiagnosed case leads to an appreciation for a challenging class of variants, short tandem repeats, which in turn leads to the development of novel software for detecting these variants in WGS data. Following the anecdote and novel tool development came an appreciation for the role of simulation, both in enabling the development and in the uptake of bioinformatic innovation for diagnostic analysis pipelines. This appreciation led to the development of a rare disease scenario simulator, which can simulate complex variants in multiple inheritance patterns to emulate challenging cases. Lastly, appreciating the limitations of the linear reference genome, I develop a framework for detecting the presence of user-specified sequences within unmapped read sets. This flexible framework can reproduce microarray-like coverage profiles, and genotype SNPs to identify ancestry and sex, which can inform the choice of personalized reference genomes in emergent analysis pipelines. Together, the novel short tandem repeat discovery, bioinformatic innovation, and increased capacity to simulate rare disease cases expand the utility of whole genome sequencing in the diagnosis of rare genetic diseases.

Lay Summary

Mendelian rare genetic disorders occur when a gene is broken, often by mutations, in the genome of an individual.
New DNA sequencing technology has brought on the genomic revolution, enabling clinicians and researchers to identify with precision the mutations which cause disease. As this technology is adopted into practice, our understanding of the genome expands. In turn, this understanding brings novel insights into disease mechanisms, including the identification of many mutations which were previously unknown. The bioinformatics community (scientists who create software to analyze biological data) has been developing, adapting, and improving software and procedures which better utilize DNA sequencing data. Within the scope of this thesis, I focus on the development of software for the simulation and detection of rare disease mutations; I apply these approaches to undiagnosed cases and define a new rare genetic disease; and lastly I explore an alternative analysis approach for utilizing DNA sequencing data.

Preface

The work presented within this thesis was performed at the BC Children’s Hospital Research Institute, in the Centre for Molecular Medicine and Therapeutics, as part of a PhD program in Bioinformatics within the Faculty of Science at the University of British Columbia. Much of the work in this thesis was done in collaboration with interdisciplinary research teams of scientists and clinicians. Each of the research chapters contains contents from a co-first authored publication that is either published, accepted, or currently under review. The introduction and discussion sections of the thesis are not published elsewhere and were written solely by me. Details of my involvement in the research program for each chapter are provided below.

The work presented in Chapter 2 represents a co-first author work, which is under the first round of revision at Human Mutation, titled: GeneBreaker - Variant simulation to improve the diagnosis of rare Mendelian genetic diseases.
I devised the work presented, wrote the manuscript, developed downstream benchmarking and test cases, and processed the data to show tool efficacy. The core codebase for variant simulation was implemented by the other co-first author, Tamar Av-Shalom, an undergraduate research assistant in the lab. Tamar implemented the variant creation methods, MySQL tables, and online web interface under my supervision, and with my assistance throughout the design and debugging process. I was responsible for the creation of all the figures and tables within this thesis section. A version of this work can be found online at https://www.biorxiv.org/content/10.1101/2020.05.29.124495v1, and a submitted version of this work is currently under revision.

The work presented in Chapter 3 has been published in the New England Journal of Medicine as a co-first author work: “Glutaminase Deficiency Caused by Short Tandem Repeat Expansion in GLS” (van Kuilenburg, Tarailo-Graovac et al. 2019). My role in this collaborative work was to process WES and WGS data for a set of undiagnosed patients with similar and specific biochemical phenotypes. In doing so, I discovered missense variants in the GLS gene and manually identified a repeat expansion in the 5’UTR of the GLS gene. The identification of the repeat expansion was a result of collaborative interactions with Dr. Britt Drogemoller, who was attempting to validate a single nucleotide variant in the five prime untranslated region of GLS using Sanger sequencing. Complications with amplification through PCR led to a thorough manual investigation of the surrounding region in the genome sequence data. Using discordant read signals and extracting full read sequences, I was able to identify a possible short tandem repeat expansion. I then facilitated genotyping of this repeat with the new (at the time) computational method, ExpansionHunter, on a population of PCR-free WGS samples.
The samples come from multiple sequencing consortia, and use of ExpansionHunter was guided by the developers of the tool, Dr. Michael Eberle and Dr. Egor Dolzhenko. Processing of population samples was performed by analysts within the respective consortia. During the mechanistic investigation of the impact of the repeat expansion, I played a central role in interpreting molecular assays and defining the role of the repeat in a non-methylation mediated form of gene repression. The wet lab work to confirm the repeat expansion and two missense variants was performed primarily by Dr. Britt Drogemoller and members of Dr. Karen Usdin’s group. Validation of the impact of missense variants was performed by members of Andre van Kuilenburg’s group. The wet lab work for identifying the mechanism of action for the repeat expansion was performed by groups led by Dr. Andre van Kuilenburg, Dr. Mahmoud Pouladi, and Dr. Karen Usdin. As I did not perform any wet lab work presented in this chapter, the details of the molecular assays performed can be found in the original publication, and the work presented in this chapter focuses on the bioinformatic analysis of WES/WGS data. I wrote the genome analysis portions of the manuscript, and co-wrote the manuscript with Dr. Antoine van Kuilenburg, Dr. Maja Tarailo-Graovac, Dr. Karen Usdin, and Dr. Clara van Karnebeek. I was also responsible for creation of all the figures presented in this thesis section, although their final formatting comes from the NEJM editorial staff. An online version of this work can be found at https://www.nejm.org/doi/10.1056/NEJMoa1806627. Text and figures are copied from “Glutaminase Deficiency Caused by Short Tandem Repeat Expansion in GLS.” André B.P. van Kuilenburg, Maja Tarailo-Graovac, Phillip A. Richmond, Britt I. Drögemöller, et al. 380:1433-1441. Copyright © (2020) Massachusetts Medical Society. Reprinted with permission.
Patients enrolled in this study were consented under the Treatable Intellectual Disability Endeavour (TIDE) protocol, with REB approval number H12-00067, and sub-study