Genetic Analysis in Neurology: the Next 10 Years
NEUROLOGICAL REVIEW

Alan Pittman, PhD; John Hardy, PhD

In recent years, neurogenetics research has made some remarkable advances owing to the advent of genotyping arrays and next-generation sequencing. These improvements to the technology have allowed us to determine the whole-genome structure and its variation and to examine its effect on phenotype in an unprecedented manner. The identification of rare disease-causing mutations has led to the identification of new biochemical pathways and has facilitated a greater understanding of the etiology of many neurological diseases. Furthermore, genome-wide association studies have provided information on how common genetic variability affects the risk of developing various complex neurological diseases. Herein, we review how these technological advances have changed the approaches being used to study the genetic basis of neurological disease and how the research findings will be translated into clinical utility.

JAMA Neurol. 2013;70(6):696-702. Published online April 9, 2013. doi:10.1001/jamaneurol.2013.2068

Author Affiliations: Department of Molecular Neuroscience and Reta Lila Weston Laboratories, Institute of Neurology, University College London, England.

JAMA NEUROL/VOL 70 (NO. 6), JUNE 2013 WWW.JAMANEURO.COM ©2013 American Medical Association. All rights reserved.

The diploid human genome is around 6 billion base pairs (bp) of DNA stored in 23 chromosome pairs. The Human Genome Project was initiated in 1990 to sequence the entire human genome using DNA from a number of anonymous individuals of predominantly European descent. The culmination of this work was the publication of the draft sequence in 2001, and by 2004, a high-quality reference sequence became available. Work by the Genome Reference Consortium continues to this day to improve the quality and coverage of low-complexity, repetitive, and hard-to-resolve regions.

Following the release of the reference genome, extensive analysis was performed to identify functionally significant regions. Although the exact number of genes is still unknown, it is thought that the human genome contains approximately 21 000 protein-coding genes (1%-2% of the genome). The remainder of the genome consists of RNA genes, regulatory sequences, and repetitive DNA whose function is poorly understood. However, recent work from the Encyclopedia of DNA Elements (ENCODE) project suggests that 80% of the human genome is indeed functionally active.¹,²

Several different classes of DNA variation can occur between the genomes of different individuals. The most common type of variation is the single-nucleotide polymorphism. One would expect to find approximately 3 million such variants in any given individual compared with the reference sequence. These single base substitutions or point mutations arise, on average, every 1000 bp or so, and single-nucleotide polymorphisms that occur in more than 1% of the population are classified as common variants. These are often located in noncoding regions of the genome and tend to have little or no phenotypic effect. The vast majority of these common single-nucleotide polymorphisms have been extensively studied in many ethnically diverse populations by initiatives such as the International HapMap Project³ and constitute a valuable catalog and resource for genome-wide association studies (GWASs) that investigate the effect of common variation on traits such as risk of and susceptibility to common disease (eg, type 2 diabetes mellitus and Alzheimer disease).
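The figures quoted above are mutually consistent, as a rough back-of-the-envelope sketch shows. The code below is illustrative only: it assumes the haploid reference is half of the roughly 6 billion bp diploid genome, takes the "every 1000 bp or so" spacing literally, and uses hypothetical function and constant names that do not come from the article.

```python
# Rough arithmetic check of the numbers quoted in the text.
# Assumption: the haploid reference is half of the ~6 billion bp diploid genome.
DIPLOID_GENOME_BP = 6_000_000_000
HAPLOID_REFERENCE_BP = DIPLOID_GENOME_BP // 2

# "every 1000 bp or so", taken literally for this sketch.
BP_PER_SNP = 1_000

# Variants seen in more than 1% of the population are classed as common.
COMMON_FREQUENCY_THRESHOLD = 0.01


def expected_snp_count(reference_bp: int = HAPLOID_REFERENCE_BP,
                       bp_per_snp: int = BP_PER_SNP) -> int:
    """Estimate how many single-base differences one individual carries."""
    return reference_bp // bp_per_snp


def is_common_variant(population_frequency: float) -> bool:
    """Return True when a variant's population frequency exceeds 1%."""
    return population_frequency > COMMON_FREQUENCY_THRESHOLD


if __name__ == "__main__":
    print(expected_snp_count())     # about 3 million, matching the text
    print(is_common_variant(0.05))  # a 5% variant counts as common
```

Under these assumptions, one substitution per 1000 bp across a 3 billion bp reference gives the approximately 3 million variants per individual cited in the text.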
The single-nucleotide polymorphisms that occur in less than 1% of the population are classified as rare variants, and some of these may have profound phenotypic effects (eg, such base changes can alter the sequence of a protein-coding gene). Genomic variation can also be caused by multiple base changes, as in insertion and deletion variants (ie, insertions and deletions of bases that range in size from 1 to 1000 bp). Such variants can have a substantial effect in coding regions of the genome, where they can result in gross alterations to an amino acid sequence or even a “frameshift” of the sequence resulting in a truncated protein. Larger insertions or deletions are referred to as copy number variants and can be either common or rare. Inversion and translocation events can also occur and can result in gross structural changes affecting many genes. These types of variation can be present in germline cells or may be acquired somatically. Germline variation is either inherited directly or occurs de novo during meiosis or just after fertilization. Variation occurring in somatic cells is acquired and can arise randomly or through external environmental factors. Extensive somatic mutation is a hallmark of cancer but has also been implicated in autoimmune and neurodegenerative diseases.

Variation can also occur that contributes to heritable differences in gene expression; this is termed epigenetic variation. These modifications include DNA methylation and histone modifications; they function without altering the DNA sequence itself and can change over time. Such modifications can have an important effect on disease (eg, the switching off of tumor suppressor genes in cancer).

THE “NEXT GENERATION” OF DNA SEQUENCING

DNA sequencing in the laboratory has been possible since the 1970s, when the Sanger method was first developed, and has steadily improved over time to facilitate automation and throughput. However, the technique remains too laborious and expensive (although >99.9% accurate) for the routine sequencing of whole genomes. Over the past 10 years, a number of new sequencing technologies have been developed that have significantly reduced the cost and time required for sequencing (Figure 1). These post-Sanger technologies are collectively described as next-generation sequencing (NGS) technologies⁵ and have been developed with whole-genome sequencing in mind. This, however, is not their sole purpose; they can be used for a wide range of applications, such as targeted resequencing and RNA sequencing.

Figure 1. Evolution of DNA sequencing technologies (adapted from Stratton et al⁴). PCR indicates polymerase chain reaction. [Plot not reproduced. x-axis: Year, 1975 to 2015; y-axis: Kilobases of DNA per Day per Machine, on a logarithmic scale. Milestones shown: Sanger method, PCR invented, capillary gel electrophoresis, DNA microarrays, short-read next-generation sequencing, single-molecule sequencing. Eras shown: First Generation, Second Generation, Third Generation.]

Next-generation sequencing platforms have allowed for massive parallelization of sequencing reactions. Unlike the Sanger method, in which each sequencing reaction represents a single predefined target, the DNA molecules in second-generation platforms are immobilized on a solid surface and are sequenced in situ. This allows for the sequencing of many millions of target molecules in parallel and for a substantial reduction in cost. Current NGS platforms use the clonal amplification of template DNA to generate “clusters” of identical DNA, followed by sequencing through a stepwise incorporation of fluorescently labeled nucleotides or oligonucleotides. Since the middle of the last decade, there have been 3 main commercial NGS platforms based on different sequencing chemistries.⁶ The technologies now in use in many laboratories are referred to as second-generation sequencing technologies to distinguish them from other technologies in the pipeline, termed third-generation sequencing technologies.

Massively parallel sequencing has now allowed for an unprecedented interrogation of the variation in the human genome. For example, the 1000 Genomes Project, launched in January 2008, is an international collaborative research project involving the Wellcome Trust Sanger Institute (England), the Beijing Genomics Institute (China), and the National Human Genome Research Institute (United States), whose goal is to establish by far the most detailed catalog of human genetic variation.⁷ The plan is to sequence the genomes of 2500 anonymous participants from a number of different ethnic groups worldwide using a combination of methods: low-coverage genome sequencing and targeted resequencing of coding regions. The primary goals of this project are 3-fold: to discover single-nucleotide variants at frequencies of 1% or higher in diverse populations; to uncover variants down to frequencies of 0.1% to 0.5% in functional gene regions; and to reveal structural variants, such as copy number variants, insertions, and deletions. The results of a pilot project comparing different strategies for sequencing have already been published, and the sequencing of more than 1000 genomes was completed in May 2011.⁸ This resource is publicly available and can be used by researchers to identify variants in regions that are suspected of being associated with disease. By identifying and cataloguing most of the common genetic variants in the populations studied, this project has generated data that will serve as an invaluable reference for clinical interpretation of genomic variation.

THIRD-GENERATION SEQUENCING TECHNOLOGIES

Massively parallel sequencing has become the dominant sequencing technology, but other