A Big Data Perspective on the Genomics of Hearing Loss
Total Page:16
File Type:pdf, Size:1020Kb
Online publiziert: 03.04.2019 Referat Vona Barbara et al. A Big Data Perspective … Laryngo-Rhino- Otol 2019; 00: 00–00 A Big Data Perspective on the Genomics of Hearing Loss Authors Barbara Vona, Marcus Müller, Saskia Dofek, Martin Holderried, Hubert Löwenheim, Anke Tropitzsch Affiliation use of several big data resources that take the form of clinical Eberhard Karls Universität, Universitäts-Hals-Nasen- Ohren- genetics data repositories, in silico prediction tools, and allele Klinik Tuebingen, Germany frequency databases. These exceptional developments have cultivated high-throughput sequencing technologies that are Key words capable of producing affordable high-quality data ranging from big data, genetics, genomics, GJB2, hearing loss diagnostics, targeted gene panels to exomes and genomes. These new ad- high-throughput sequencing, variant interpretation vancements have revolutionized the diagnostic paradigm of hereditary diseases including genetic hearing loss. Bibliography Dissecting hereditary hearing loss is exceptionally challenging DOI https://doi.org/10.1055/a-0803-6149 due to extensive genetic and clinical heterogeneity. There are Online-Publikation: 2019 presently over 150 genes involved in non-syndromic and com- Laryngo-Rhino-Otol 2019; 98: S58–S81 mon syndromic forms of hearing loss. The mutational spectrum © Georg Thieme Verlag KG Stuttgart · New York of a single hearing loss associated-gene can have several tens ISSN 0935-8943 to hundreds of pathogenic variants. Moreover, variant inter- pretation of novel variants can pose a challenge when conflic- Correspondence ting information is deposited in valuable databases. Harnessing Prof. Dr. med. Hubert Löwenheim the power that comes from detailed and structured phenotypic Univ. HNO-Klinik information has proven promising for some forms of hearing Elfriede-Aulhorn-Str. 5 loss, but may not be possible for all genetic forms due to high- D-72076 Tübingen ly variable clinical presentations. New knowledge in both dia- Germany gnostic and scientific realms continues to rapidly accumulate. [email protected] This overwhelming amount of information represents an incre- asing challenge for medical specialists. As a result, specialist AbstRACT medical care may evolve to take on new tasks and facilitate the The completion of the human genome, the most fundamental interface between the human genetic diagnostic laboratory example of big data in science and medicine, is the remarkab- and the patient. These tasks include genetic counselling and le product of multidisciplinary collaboration and is regarded as the inclusion of genetics results in patient care. one of the largest and most successful undertakings in human This overview is intended to serve as a reference to otolaryn- history. Unravelling the human genome means not only iden- gologists who wish to gain an introduction to the molecular tifying the sequence of its more than 3.2 billion nucleotide genetics of hearing loss. Key concepts of molecular genetic bases, but also understanding disease-associated variations diagnostics will be presented. The complex processes under- and applying this knowledge to patient-tailored precision me- lying the identification and interpretation of genetic variants dicine approaches. Genomics has moved at a remarkable pace, in particular would be inconceivable without the enormous with much of the propelling forces behind this credited to tech- amount of data available. In this respect, "big data" is an indis- nological developments in sequencing, computing, and bioin- pensable prerequisite for filtering genetic data in specific indi- formatics, that have given rise to the term “big genomics data.” vidual cases and making it clear and useful, especially for clini- The analysis of genetics data in a disease context involves the cians in contact with patients. S58 Vona B et al. A Big Data Perspective … Laryngo-Rhino-Otol 2019; 98: S58–S81 Inhaltsverzeichnis 1. Glossary 59 KCNQ4 potassium channel voltage gated KQT-like 2. Big Data in the Age of Genomics 60 subfamily member 4 LOVD Leiden Open Variation Database 2.1. Genetic variation—benign or pathogenic? 60 MAF minor allele frequency 3. Paving a Path for the Genomics Revolution 60 mRNA messenger ribonucleic acid 4. Development of High-throughput Sequencing Technologies 61 MYO1A myosin IA 5. The Genetics of Hearing Loss 63 P postnatal day PCR polymerase chain reaction 6. Altering the Diagnostic Paradigm for Hearing Loss 63 SHIELD Shared Harvard Inner-Ear Laboratory 6.1. Gene panel diagnostics in hearing loss 66 Database 6.2. Exome diagnostics in hearing loss 66 SIFT Sorting Intolerant from Tolerant 6.3. Advantages and disadvantages of gene panels and STRC stereocilin exome-based diagnostics 67 T thymine 6.4. Diagnostic Rates 67 WFS1 wolframin ER transmembrane glycoprotein 7. Computational Resources 69 8. High-throughput sequencing analysis 72 9. An Example of Variant Analysis From GJB2 74 1. Glossary 10. From Genome to Phenome 76 Non-sex chromosome. 11. The Outlook of High-throughput Sequencing 78 Autosome: Baits: Capture probes that are made out of oligonucleotides com- 12. Conclusions for Clinical Practice 78 plimentary to a region of interest for sequencing. 13. Acknowledgements 78 Copy number variation: Deletions or duplications of chromoso- References 78 mal regions that affect the number of gene copies. Coverage: The collection of aligned sequencing reads across a nu- cleotide or region of interest. Dideoxynucleotides: Modified deoxynucleotides that lack a 3′ hy- droxyl group to inhibit chain elongation in Sanger sequencing. DNA library: A collection of amplified DNA fragments for high- ABBReviatiONS throughput sequencing. A adenine Exome: The part of the genome that is composed of exons that are C cytosine translated into proteins. CADD Combined Annotation Dependent Depletion Exome sequencing: Sequencing of all exons in coding genes. COL11A2 collagen type XI, alpha-2 Exon: A region of a gene that encodes a protein. DFNA2A deafness, autosomal dominant 2 A locus Gene panel diagnostics: Sequencing of selected genes relevant DFNA3A deafness, autosomal dominant 3 A locus to a specific disease. DFNA6/14/38 deafness, autosomal dominant 6/14/38 locus Genome: The complete set of DNA in an organism. DFNA13 deafness, autosomal dominant 13 locus Gigabase: 109 nucleotide bases. DFNB1A deafness, autosomal recessive 1 A locus High-throughput sequencing: A scalable and relatively cheap se- DFNB16 deafness, autosomal recessive 16 locus quencing method that can range from gene panels to genome se- ddNTP dideoxynucleotide quencing. dNTP deoxynucleotide Indel: A term for the insertion or deletion of one or more bases in DVD Deafness Variation Database a genome. E embryonic day In silico gene panel: A computational filter applied to exome or EVS Exome Variant Server genome sequencing data that restricts the variants for analysis in ExAC Exome Aggregation Consortium Browser a selected sub-set of genes. G guanine In silico pathogenicity prediction: Computational tools that pre- Gb gigabase dict the pathogenicity of variants. GJB2 gap junction protein beta 2 Intron: A non-coding region of a gene between two coding exons. GJB6 gap junction protein beta 6 Kilobase: 1,000 nucleotide bases. GME Greater Middle Eastern Variome Megabase: 1,000,000 nucleotide bases. gnomAD genome aggregation database Minor allele frequency: The frequency of the less common allele HGMD Human Gene Mutation Database (MAF). HPO Human Phenotype Ontology Missense variant: A nucleotide substitution that changes an amino acid. Vona B et al. A Big Data Perspective … Laryngo-Rhino-Otol 2019; 98: S58–S81 S59 Referat Moore’s law: An observation that the number of transistors on a curring errors in the germline or somatic cells can introduce both dense integrated circuit doubles every two years, thus cutting costs small and large changes, termed genetic variation, into our geno- of transistors in half. mes, much of which can be considered benign or polymorphic, Non-synonymous variant: A nucleotide substitution that alters while other changes are associated with disease states. These chan- the amino acid sequence. ges can affect single nucleotides or bases (adenine (A), thymine (T), Nonsense variant: A nucleotide substitution that results in a pre- guanine (G), and cytosine (C)) or several million nucleotides in the mature stop codon during transcription. genome (e.g. large duplications or deletions). However, changes Phenome: The comprehensive description of the phenotype and can also involve whole chromosomes (e. g. monosomy, trisomy) or course of disease in an individual. involve the exchange of genetic material within different parts of Read: A short fragment of sequence. a single chromosome (intrachromosomal rearrangement) or bet- Sanger sequencing: A type of sequencing that uses a chain-ter- ween different chromosomes (interchromosomal rearrangement). mination method with chemically modified dideoxynucleotides that determines the nucleotide sequence. 2.1. Genetic variation—benign or pathogenic? Secondary findings: A genetic test result that is unrelated to the One of the largest sequencing studies to date analysed human va- primary disease indication. riation in 60,706 individuals and estimated that the protein coding Sequencing gap: A region that is poorly covered or missed during sequence (exome sequence) of human genes harbours the equi- sequencing usually due to technical reasons. valent of one variant every eight nucleotide positions across the Splice site variant: