Whole-Exome Sequencing and Its Impact in Hereditary Hearing Loss
Total Page:16
File Type:pdf, Size:1020Kb
Genet. Res., Camb. (2015), vol. 97, e4. © Cambridge University Press 2015 1 doi:10.1017/S001667231500004X Whole-exome sequencing and its impact in hereditary hearing loss 1,2 1 1 1 TAHIR ATIK , GUNEY BADEMCI , OSCAR DIAZ-HORTA , SUSAN H. BLANTON AND MUSTAFA TEKIN1* 1Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, USA 2Division of Genetics, Department of Pediatrics, Ege University School of Medicine, Izmir, Turkey (Received 30 December 2014; revised 12 February 2015; accepted 13 February 2015) Summary Next-generation sequencing (NGS) technologies have played a central role in the genetic revolution. These technologies, especially whole-exome sequencing, have become the primary tool of geneticists to identify the causative DNA variants in Mendelian disorders, including hereditary deafness. Current research estimates that 1% of all human genes have a function in hearing. To date, mutations in over 80 genes have been reported to cause nonsyndromic hearing loss (NSHL). Strikingly, more than a quarter of all known genes re- lated to NSHL were discovered in the past 5 years via NGS technologies. In this article, we review recent developments in the usage of NGS for hereditary deafness, with an emphasis on whole-exome sequencing. 1. Introduction over the past 10 years. This revolutionary sequencing DNA sequencing has been one of the most important technique is capable of sequencing millions of small frag- techniques in medical research and genetic diagnostics ments covering the whole genome or large regions of since chain termination sequencing was first described interest, such as the entire coding portion of the genome 38 years ago (Sanger et al., 1977). Subsequently, PCR (i.e. exome), at a reasonable cost and reduced runtime et al. a was introduced by Mullis et al. (1986), broadening its compared to Sanger sequencing (Yan , 2013 ; et al. applications. Automated Sanger sequencers based on Rabbani , 2014). capillary electrophoresis were an integral part of The Elucidation of the genetic basis of human diseases is Human Genome Project completed in 2003. The first vital for understanding the underlying pathology, human genome sequenced was the result of over 10 making an early diagnosis, developing prevention year’s effort at an estimated cost of $2.7 billion and/or better treatment regimens, and improving gen- et al. (Lander et al., 2001;McPhersonet al., 2001; etic counselling (Gilissen , 2012). Although Sachidanandam et al., 2001). Automated Sanger traditional laboratory approaches, including karyo- sequencing, also called ‘first-generation DNA sequen- typing and copy number variation (CNV) analysis, cing’, is the most widely used technology worldwide and computational methods, including linkage and today for variant detection through sequencing. association, have led to great insights into human Although currently the gold standard for DNA sequen- diseases over the past few decades, there is still a cing, there are some limitations in using this method for substantial gap between disease phenotype and the et al. high-throughput applications, including read length, underlying genetic involvements (Bamshad , et al. runtime and per base cost (Rizzo & Buck, 2012). 2011; Gilissen , 2011). Limiting elements of tra- Fortunately, a high-throughput, powerful technology ditional gene-discovery strategies, such as the necessity known as next-generation sequencing (NGS) (or mas- of large families, and the existence of reduced sively parallel sequencing (MPS)) has been developed penetrance, variable expressivity and locus heterogen- eity have always been a problem for geneticists (Bamshad et al., 2011). The development of NGS, however, has made the identification of causative * Corresponding author: Mustafa Tekin, 1501 NW 10th Avenue, BRB-610 (M-860), Miami, FL 33136, USA. Tel: 305-243-2381. genes easier, even for small families and diseases E-mail: [email protected] with extensive locus heterogeneity (Duman & Tekin, Downloaded from https://www.cambridge.org/core. IP address: 170.106.35.234, on 30 Sep 2021 at 16:07:49, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S001667231500004X T. Atik et al. 2 2012). Furthermore, NGS can also be used to analyse growing strand, caught and imaged by the sequencer, genomic functions through transcriptome, methylome and then the fluorescent terminator is cleaved off the and chromatin structure studies. nucleotide (Ju et al., 2006; Shearer et al., 2011). The Hearing loss (HL) is the most common sensory SOLiD platform uses sequence-by-ligation (SBL) tech- deficit, affecting millions of people worldwide, with nology. A fluorescently labelled probe hybridizes to the an incidence of 1/1000 newborns in the US. At least DNA template to be sequenced. The probe then is half of congenital deafness is due to genetic factors, joined to the growing strand by DNA ligase, and then the vast majority of which is monogenic. Autosomal re- imaged by the sequencer (Shearer et al., 2011). cessive, autosomal dominant and X-linked inheritance are observed in 77, 22 and 1% of genetic cases, respect- ively (Morton, 1991). Approximately 70% of congeni- 3. Why WES? tal deafness is nonsyndromic (nonsyndromic hearing Nowadays, sequencing the whole genome is becoming loss; NSHL), while the remaining 30% is comprised available for wider usage; however, it was not practi- of many different syndromes in which deafness is a fea- cal until recently because of cost and the amount of ture (Shearer & Smith, 2012;Yanet al., 2013 a). data produced. To discover the genes causing Current research estimates that 1% of all human Mendelian diseases, especially those having genetic genes have a function in hearing (Teek et al., 2013). heterogeneity, WES represents a relatively easy-to-use To date, mutations in over 80 genes, with more than alternative method in the research area (Yan et al., 1000 mutations, have been found to cause NSHL mak- 2013 a). While the coding parts of the human genome, ing hereditary hearing loss one of the most genetically i.e. exons of (most) genes or the exome, constitute only heterogeneous traits (http://hereditaryhearingloss.org). about 1% of the entire human genome, 85% of muta- Except for those recurrently reported in a few genes, tions known to cause Mendelian diseases are located such as GJB2 (MIM 121011) and SLC26A4 (MIM in the coding region or in canonical splice sites 605646), most deafness mutations are extremely rare (Choi et al., 2009). The targeted genomic enrichment and are only seen in either a single or a very few families required for WES has some challenges, such as incom- (Diaz-Horta et al., 2012). plete knowledge about all truly protein-coding exons, Here, we review recent developments on the usage variable efficiencies of capture probes used and target- of NGS for hereditary HL, with an emphasis on ing sequence issues (e.g. microRNAs, promoters, whole-exome sequencing (WES). The extreme genetic pseudogenes, repetitive elements and ultra-conserved heterogeneity of NSHL makes it the ideal disorder to elements) (Bamshad et al., 2011). Despite these limita- use to demonstrate the full potential of NGS. tions, exome sequencing has clearly proven to be a powerful tool for discovering the underlying genetic etiology of known or suspected Mendelian disorders. 2. NGS platforms and techniques Moreover, by increasing read length and coverage, these problems seem relatively likely to be solved. While there are five commercially available NGS plat- forms in the marketplace: Roche/454 FLX, the Illumina HiSeq Series, the Applied Biosystems (ABI) SOLiD Analyzer, the Polonator G.007 and the 4. Data analysis Helicos HeliScope, the first three platforms currently On average, WES identifies approximately 22 000 sin- dominate the area (Yan et al., 2013 a). Although gle nucleotide variants (SNVs) in a sample. More than these platforms each have their own strengths and 95% of these variants are already known polymorph- weaknesses originating from the technologies used, isms. Strategies for identifying causal SNVs vary they all include three main stages: template preparation, depending on a number of factors such as the putative sequencing/imaging and data analysis. Template prep- mode of inheritance of the trait, the pedigree structure aration is the first step and determines the part of the and the presence of locus heterogeneity in the trait genome to be sequenced, i.e. whole-genome sequencing (Bamshad et al., 2011). (WGS), WES or targeted next-generation sequencing Analysis of the WES data begins with filtering out (Yan et al., 2013 a). The first platform introduced, the variants against a set of polymorphisms that are fre- Roche/454 GS FLX, uses pyrosequencing. When quent ( > 0·5%) in public (e.g. dbSNP and 1000 DNA polymerase incorporates a nucleotide into the Genomes Project) or internal databases (e.g. GEM. growing DNA strand and ATP hydrolysis occurs, the App) (Gonzalez et al., 2013). This step narrows the pyrophosphate release is recorded (Shearer et al., number of variants to a manageable fraction (2% on 2011). The Illumina platform, based on cyclic reversible average) of the SNVs identified in an individual. termination (CRT) technology, is the most widely used Variations in conserved regions, their functional class NGS platform. In CRT, fluorescent labelled chain- (frameshifts, nonsense, splice site) and missense are terminating nucleotides are