
NEUROLOGICAL REVIEW

Genetic Analysis in Neurology: The Next 10 Years

Alan Pittman, PhD; John Hardy, PhD

In recent years, neurogenetics research has made some remarkable advances owing to the advent of genotyping arrays and next-generation sequencing. These improvements to the technology have allowed us to determine the whole-genome structure and its variation and to examine its effect on phenotype in an unprecedented manner. The identification of rare disease-causing mutations has led to the identification of new biochemical pathways and has facilitated a greater understanding of the etiology of many neurological diseases. Furthermore, genome-wide association studies have provided information on how common genetic variability affects the risk of developing various complex neurological diseases. Herein, we review how these technological advances have changed the approaches being used to study the genetic basis of neurological disease and how the research findings will be translated into clinical utility.

JAMA Neurol. 2013;70(6):696-702. Published online April 9, 2013. doi:10.1001/jamaneurol.2013.2068

Author Affiliations: Department of Molecular Neuroscience and Reta Lila Weston Laboratories, Institute of Neurology, University College London, England.

CME available online at jamanetworkcme.com

JAMA NEUROL/ VOL 70 (NO. 6), JUNE 2013 WWW.JAMANEURO.COM 696

©2013 American Medical Association. All rights reserved. Downloaded From: https://jamanetwork.com/ on 09/25/2021

The diploid human genome is around 6 billion base pairs (bp) of DNA stored in 23 chromosome pairs. The Human Genome Project was initiated in 1990 to sequence the entire human genome from DNA from a number of anonymous individuals of predominantly European descent. The culmination of this work was the publication of the draft sequence in 2001, and by 2004, a high-quality reference sequence became available. Work by the Genome Reference Consortium continues to this day to improve the quality and coverage of low-complexity, repetitive, and hard-to-resolve regions.

Following on from the release of the reference genome, extensive analysis was performed to identify functionally significant regions. Although today the exact number of genes is still unknown, it is thought that there are approximately 21 000 protein-coding genes (1%-2% of the sequence) contained in the human genome. The remainder of the genome consists of RNA genes, regulatory sequences, and repetitive DNA in which the function is poorly understood. However, recent work from the Encyclopedia of DNA Elements (ENCODE) project suggests that 80% of the human genome is indeed functionally active.1,2

Several different classes of DNA variation can occur between the genomes of different individuals. The most common type of variation is the single-nucleotide polymorphism. One would expect to find approximately 3 million such variants in any given individual compared with the reference sequence. These single base substitutions or point mutations arise, on average, every 1000 bp or so, and single-nucleotide polymorphisms that occur in more than 1% of the population are classified as common variants. These are often located in noncoding regions of the genome and tend to have little or no phenotypic effect. The vast majority of these common single-nucleotide polymorphisms have been extensively studied in many ethnically diverse populations by initiatives such as the International HapMap Project3 and constitute a valuable catalog and resource for genome-wide association studies (GWASs) that investigate the effect of common variation on traits such as risk and susceptibility to common disease (eg, type 2 diabetes mellitus and Alzheimer disease).

The single-nucleotide polymorphisms that occur in less than 1% of the population are classified as rare variants, and some of these may have profound phenotypic effects (eg, such base changes can alter the sequence of a protein-coding gene). Genomic variation can also be caused by multiple base changes in the form of insertion and deletion variants (ie, insertions and deletions of bases that range in size from 1 to 1000 bp). Such variants can have a substantial effect in coding regions of the genome where they can result in gross alterations to a protein sequence or even a “frameshift” of the sequence resulting in a truncated protein. Larger insertions or deletions are referred to as copy number variants and can be both common and rare. Inversion and translocation events can also occur and can result in gross structural changes affecting many genes.

These types of variation can be present in germline cells or may be acquired somatically. Germline variation is either inherited directly or occurs de novo during meiosis or just after fertilization. Variation occurring in somatic cells is acquired and can arise randomly or through external environmental factors. Extensive somatic mutation is a hallmark of cancer but has also been implicated in autoimmune and neurodegenerative diseases.

Variation in DNA can also occur that contributes to heritable differences in gene expression; this is termed epigenetic variation. These modifications to the DNA include methylation and histone modifications, and they function without altering the DNA sequence itself and can change over time. Such modifications can have an important effect on disease (eg, the switching off of tumor suppressor genes in cancer).

“NEXT GENERATION” OF DNA SEQUENCING

DNA sequencing in the laboratory has been possible since the 1970s, when the Sanger method was first developed, and has steadily improved and developed over time to facilitate automation and throughput. However, the technique remains too laborious and expensive (although >99.9% accurate) for the routine sequencing of whole genomes. Over the past 10 years, a number of new sequencing technologies have been developed that have significantly reduced the cost and time required for sequencing (Figure 1). These post-Sanger technologies are collectively described as next-generation sequencing (NGS) technologies5 and have been developed with whole-genome sequencing in mind. This, however, is not their sole purpose; they can be used for a wide range of applications, such as targeted resequencing and RNA sequencing.

Figure 1. Evolution of DNA sequencing technologies (adapted from Stratton et al4). The plot shows kilobases of DNA per day per machine (logarithmic scale, 10 to 1 000 000 000) by year (1975 to 2015) across first-generation (Sanger method, capillary gel electrophoresis), second-generation (short-read next-generation sequencing), and third-generation (single-molecule sequencing) technologies. PCR indicates polymerase chain reaction.

Next-generation sequencing platforms have allowed for massive parallelization of sequencing reactions. Unlike the Sanger method in which each sequencing reaction represents a single predefined target, the DNA molecules in second-generation platforms are immobilized on a solid surface and are sequenced in situ. This allows for the sequencing of many millions of target molecules in parallel and for a substantial reduction in cost. Current NGS platforms use the clonal amplification of template DNA to generate “clusters” of identical DNA followed by sequencing through a stepwise incorporation of fluorescently labeled nucleotides or oligonucleotides. Since the middle of the last decade, there have been 3 main commercial NGS platforms based on different sequencing chemistries.6 The technologies that are being used now in many laboratories are referred to as second-generation sequencing technologies to distinguish them from other technologies in the pipeline termed third-generation sequencing technologies.

Massively parallel sequencing has now allowed for an unprecedented interrogation of the variation in the human genome. For example, the 1000 Genomes Project, launched in January 2008, is an international collaborative research project involving the Wellcome Trust Sanger Institute (England), the Beijing Genomics Institute (China), and the National Human Genome Research Institute (United States), whose goal is to establish by far the most detailed catalog of human genetic variation.7 The plan is to sequence the genomes of 2500 anonymous participants from a number of different ethnic groups worldwide using a combination of methods: low-coverage genome sequencing and targeted resequencing of coding regions. The primary goals of this project are 3-fold: to discover single-nucleotide variants at frequencies of 1% or higher in diverse populations; to uncover variants down to frequencies of 0.1% to 0.5% in functional gene regions; and to reveal structural variants, such as copy number variants, insertions, and deletions. The results of a pilot project comparing different strategies for sequencing have already been published, and the sequencing of more than 1000 genomes was completed in May 2011.8 This resource is publicly available and can be used by researchers to identify variants in regions that are suspected of being associated with disease. By identifying and cataloguing most of the common genetic variants in the populations studied, this project has generated data that will serve as an invaluable reference for clinical interpretation of genomic variation.

THIRD-GENERATION SEQUENCING TECHNOLOGIES

Massively parallel sequencing has become the dominant sequencing technology, but other approaches have


Figure 2. Summary of first-, second-, and third-generation DNA sequencing technologies and of the sequencing chemistries of leading commercial developers. First generation (Sanger method): chain termination. Next-generation sequencing: pyrosequencing (Roche/454 Life Sciences), reversible termination (Illumina HiSeq), and ligation (Life Technologies SOLiD). “Next” next-generation sequencing: tethered polymerase (Pacific Biosciences), reversible termination (Helicos), biological nanopores (Oxford Nanopore), solid-state nanopores (IBM/Roche, NABSys), pH (Ion Torrent/Life Technologies), and microscopy (Halcyon Molecular, ZS Genetics).
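The variant classes introduced earlier (common vs rare at the 1% population frequency, insertions and deletions of 1 to 1000 bp, larger events treated as copy number variants, and frameshifts in coding sequence) can be made concrete with a minimal Python sketch. The `Variant` type, the function names, and the choice to count a frequency of exactly 1% as common are illustrative assumptions rather than anything specified in the article, and the frameshift rule only makes sense for variants that fall within coding sequence.

```python
from dataclasses import dataclass

COMMON_FREQUENCY = 0.01  # 1% cutoff between common and rare variants (from the text)
MAX_INDEL_BP = 1000      # insertions/deletions are described as spanning 1 to 1000 bp


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this illustration."""
    ref: str          # reference allele, e.g. "A"
    alt: str          # alternate allele, e.g. "G"
    frequency: float  # population allele frequency between 0 and 1


def frequency_class(v: Variant) -> str:
    """Common if seen in at least 1% of the population, otherwise rare (assumed boundary)."""
    return "common" if v.frequency >= COMMON_FREQUENCY else "rare"


def size_class(v: Variant) -> str:
    """Classify a variant by the length difference between its two alleles."""
    delta = len(v.alt) - len(v.ref)
    if delta == 0:
        return "single-nucleotide variant" if len(v.ref) == 1 else "multibase substitution"
    if abs(delta) > MAX_INDEL_BP:
        return "copy number variant"
    return "insertion" if delta > 0 else "deletion"


def is_frameshift(v: Variant) -> bool:
    """In coding sequence, an indel shifts the reading frame unless its length is a multiple of 3."""
    delta = abs(len(v.alt) - len(v.ref))
    return delta != 0 and delta % 3 != 0


examples = [
    Variant("A", "G", 0.12),       # common single-base substitution
    Variant("ACT", "A", 0.002),    # rare 2-bp deletion, frame-shifting if coding
    Variant("A", "ATTT", 0.0005),  # rare 3-bp insertion, reading frame preserved
]
for v in examples:
    print(v.ref, v.alt, frequency_class(v), size_class(v), is_frameshift(v))
```

A real pipeline would take allele frequencies from a population catalog and coordinates from a gene annotation; both are left out here to keep the sketch self-contained.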

emerged that avoid amplification of the DNA template prior to sequencing and instead aim to sequence the single DNA molecule in real time. These new technologies are collectively referred to as the “next” NGS or third-generation sequencing (Figure 2). The potential benefits of using single-molecule sequencing are minimal input DNA requirements, elimination of amplification bias, faster turnaround times, and longer read lengths that allow for some haplotyping of sequence information.

ANALYSIS OF DATA AND THE INFORMATICS PIPELINE

The volume of data generated by NGS is enormous, and the workload has shifted away from the laboratory toward the data analysis process.9 Analysis of whole-exome or whole-genome data requires substantial computation, data storage, and informatics tools for interpreting the variant data (Figure 3). This is the least trivial aspect of NGS and represents the true challenge.

Figure 3. Standardized outline of informatics pipeline for processing and analyzing data from next-generation sequencing platforms. Primary analysis comprises intensity analysis and base calling with quality scoring, which yields short-read sequences (fastQ format); secondary analysis comprises mapping the reads to the reference sequence and variant detection; tertiary analysis comprises variant analysis against an annotation database, validation, and interpretation. The fastQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.

The analysis pipeline for NGS technology can be roughly divided into 3 analytical steps:

Primary analysis: base calling (converting the light-signal intensities to nucleotide base calls; usually done by the onboard machine software while running).
Secondary analysis: alignment and variant calling (mapping the short DNA reads to the reference sequence and calling differences or variants between the two).
Tertiary analysis: interpretation (analysis of the variant data with respect to the genetic experiment in question).

Although these are all fairly standardized processes now, there are a few points of note relating to the alignment and annotation of variants. Numerous programs have been developed specifically for alignment and variant calling; however, a major issue for standard alignment programs is the interpretation of small insertions and deletions, which has been partly addressed with new programs for this purpose. Even so, current technology and analysis do not allow for fully confident analysis of insertions and deletions. In the final stages of the alignment phase, the data are annotated with genetic and biological information that can be visually inspected through a graphical interface. The ability to provide accurate and comprehensive genome annotation is critical for interpretation and for the implementation of filtering steps to exclude nonpathogenic and irrelevant variants. Once the majority of variants have been excluded, a small number of potential variants may remain that could be linked to the disease phenotype.

GENETICS AND NEUROLOGICAL DISEASE

The genome of any given individual will contain millions of sequence variants of which the vast majority will have no effect (neutral variation) or will represent normal differences in phenotype (eg, hair color). However, some may be pathogenic mutations that cause or predispose to disease. Determining if a single variant is associated with a disease can be a slow process, especially if the effect is subtle.

Monogenic disorders are usually associated with rare, highly penetrant genetic mutations that have a profound effect on the function of a gene (eg, by changing the coding sequence). However, the severity and penetrance of the phenotype can vary widely, and this could


Table 1. Mendelian Genes for Parkinson Disease

Gene | Location | Protein | Inheritance | Source
LRRK2 | 12q12 | Leucine-rich repeat kinase 2 | Dominant | Zimprich et al,12 Paisán-Ruíz et al13
PARK2 | 6q26 | Parkin | Recessive | Kitada et al14
PARK7 | 1p36.23 | Protein DJ-1-like | Recessive | Bonifati et al15
PINK1 | 1p36.12 | PTEN induced putative kinase 1 | Recessive | Valente et al16
SNCA | 4q22.1 | Synuclein, alpha | Dominant | Polymeropoulos et al,17 Singleton et al18
VPS35 | 16q12 | Vacuolar protein sorting 35 homologue | Dominant | Zimprich et al,19 Vilariño-Güell et al20
EIF4G1 | 3q27.1 | Eukaryotic translation initiation factor 4 gamma, 1 | Dominant | Chartier-Harlin et al21
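As a rough illustration of how the Inheritance column of Table 1 might be used when screening a genotype, the sketch below encodes the table as a lookup and applies a deliberately simplified rule: 1 pathogenic allele is taken as sufficient for a dominant gene and 2 for a recessive gene. That rule, the function name `fits_inheritance`, and the allele-count input are assumptions made for illustration; the sketch ignores variable penetrance, phasing of compound heterozygous variants, and gene dosage changes.

```python
# Inheritance modes transcribed from Table 1 (Mendelian genes for Parkinson disease).
MENDELIAN_PD_GENES = {
    "LRRK2": "Dominant",
    "PARK2": "Recessive",
    "PARK7": "Recessive",
    "PINK1": "Recessive",
    "SNCA": "Dominant",
    "VPS35": "Dominant",
    "EIF4G1": "Dominant",
}

# Simplifying assumption: minimum number of pathogenic alleles expected per mode.
REQUIRED_ALLELES = {"Dominant": 1, "Recessive": 2}


def fits_inheritance(gene: str, pathogenic_alleles: int) -> bool:
    """Return True if the allele count is compatible with the gene's listed inheritance mode.

    Raises KeyError for genes that are not listed in Table 1.
    """
    mode = MENDELIAN_PD_GENES[gene]
    return pathogenic_alleles >= REQUIRED_ALLELES[mode]


for gene, count in [("LRRK2", 1), ("PARK2", 1), ("PARK2", 2)]:
    print(gene, count, fits_inheritance(gene, count))
```

Under this rule a single heterozygous PARK2 variant is not flagged as a sufficient explanation, whereas a single LRRK2 variant is.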

Table 2. Common Risk Loci for Parkinson Disease

Gene/Locus | Location | Protein | Marker | Odds Ratio (95% CI)
ACMSD/TMEM163 | 2q21.3 | Unknown | rs6710823 | 1.40 (1.20-1.63)
BST1 | 4p15.32 | Bone marrow stromal cell antigen 1 | rs11724635 | 1.40 (1.20-1.63)
CCDC62/HIP1R | 12q24.31 | Unknown | rs12817488 | 1.17 (1.09-1.25)
FAM47E/STBD1 | 4q21.1 | Unknown | rs6812193 | 1.12 (1.09-1.25)
GAK/DGKQ | 4p16.3 | Unknown | rs1564282 | 1.29 (1.20-1.38)
GBA | 1q22 | Glucosidase, beta, acid | N370S | 3.51 (2.55-4.83)
GPNMB | 7p15.3 | Glycoprotein (transmembrane) nmb | rs156429 | 1.12 (1.08-1.16)
GWA_8p22/FGF20 | 8p22 | Unknown | rs591323 | 1.12 (1.08-1.17)
HLA II | 6p21.32 | Major histocompatibility complex, class II | HLA locus | 1.33 (1.19-1.48)
LRRK2 | 12q12 | Leucine-rich repeat kinase 2 | rs1491942 | 1.17 (1.13-1.22)
MAPT | 17q21.31 | Microtubule-associated protein tau | 17q21.31 | 1.29 (1.25-1.33)
MCCC1/LAMP3 | 3q27.1 | Unknown | 3q27.1 | 1.18 (1.13-1.24)
PARK16 | 1q32.1 | Unknown | 1q32.1 | 1.26 (1.18-1.34)
SETD1A/STX1B | 16p11.2 | Unknown | 16p11.2 | 1.14 (1.09-1.19)
SNCA | 4q22.1 | Synuclein, alpha | 4q22.1 | 1.30 (1.26-1.35)
SREBF1/RAI1 | 17q11.2 | Unknown | 17p11.2 | 1.18 (1.11-1.25)
STK39 | 2q24.3 | Serine threonine kinase 39 | 2q24.3 | 1.28 (1.19-1.38)
SYT11/RAB25 | 1q22 | Unknown | 1q22 | 1.67 (1.41-1.98)
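The odds ratios and 95% CIs in Table 2 are the standard summary of a case-control comparison. The following Python sketch shows one common way such figures can be derived from a 2 x 2 table of carrier counts, using the usual normal approximation on the log odds ratio (the Woolf method). The article does not state which method the cited studies used, so this is an assumption, and the carrier counts below are invented purely for illustration. The second function runs the calculation in reverse and recovers the standard error of the log odds ratio from a reported interval, here the GBA N370S row of Table 2.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=Z_95):
    """Odds ratio with a Woolf (log-normal) confidence interval from 2 x 2 counts."""
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the standard error of ln(OR) from a reported confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Invented counts: 30 of 100 cases and 15 of 100 controls carry the variant.
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(f"illustrative OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")

# GBA N370S as reported in Table 2: 3.51 (2.55-4.83).
gba_or, gba_lo, gba_hi = 3.51, 2.55, 4.83
gba_se = se_from_ci(gba_lo, gba_hi)
gba_midpoint = math.sqrt(gba_lo * gba_hi)  # interval is symmetric on the log scale
print(f"GBA: SE of ln(OR) = {gba_se:.3f}, geometric midpoint of CI = {gba_midpoint:.2f}")
```

Because the interval is symmetric on the log scale, the geometric mean of its two limits should reproduce the point estimate, which gives a quick consistency check on a transcribed table row.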

be due to the influence of other modifier genes. Such single-gene disorders tend to run in families with a clear inheritance pattern. In addition to rare, highly penetrant mutations, common variants in the population contribute to the susceptibility to common, complex neurological disease. These variants tend to have small effects on risk and are usually found in the noncoding portion of the genome. Assessing disease risk at the individual level based on these variants is challenging and, generally speaking, has limited clinical utility.

Parkinson disease (PD) is the second most common neurological disease of adult onset, with increased incidence with age; approximately 10% of patients report a positive family history.10 Mendelian forms of PD occur with both autosomal dominant and recessive patterns of inheritance. Toxic gain-of-function mutations in SNCA, LRRK2, and VPS35 cause autosomal dominant PD, and, furthermore, common polymorphisms in SNCA and LRRK2 exert a small but significant risk effect on non-Mendelian forms of PD. Conversely, loss-of-function mutations in PARK2, PARK7, and PINK1 cause autosomal recessive PD.11 Collectively, the monogenic forms of PD account for about 30% of familial cases and approximately 5% of sporadic cases11 (Table 1).

Although non-Mendelian forms of PD show a relatively low level of heritability, a larger number of susceptibility loci (including SNCA, MAPT, and LRRK2; Table 2) have recently been identified through GWASs,22,23 and although their exact mode of action has yet to be elucidated, it is most likely that susceptibility acts through subtle changes in the gene expression of the target genes. These susceptibility variants are considered of low penetrance and would be expected to increase risk, on average, by 10%. Recessively transmitted GBA mutations cause Gaucher disease, a lysosomal storage disease, and relatives of patients with Gaucher disease show an increased incidence of PD. Subsequent genetic studies of GBA revealed that rare polymorphisms have a role to play (ie, significantly increasing the risk of PD 5-fold).24

Many other neurodegenerative disorders show an extensive family history. For example, Alzheimer disease, frontotemporal dementia, and amyotrophic lateral sclerosis show rare but significant familial inheritance, Mendelian forms of disease, and lower-penetrance variants associated with the more common sporadic forms of disease.25

HIGH-THROUGHPUT GENETIC APPROACHES TO NEUROLOGICAL DISEASE RESEARCH: THE ROAD AHEAD

In recent years, neurogenetics research has made some remarkable advances owing to the advent of the genotyping arrays and NGS techniques herein described. These new techniques allow increasingly large numbers of samples to be interrogated across the genome at high resolution. These advances have come after 20 years of small candidate gene studies and traditional positional cloning techniques. Since the advent of GWASs in 2005, several well-replicated neurodegenerative studies have identified new disease loci, and these discoveries are set to continue over the next few years with ever-increasing sample sizes under study. Despite the achievements of GWASs, this approach is limited. First, it is only able to study relatively common types of variants, those that occur at a frequency of more than 1% in the general population. Second, the major problem associated with the discovery of such risk-associated variants is the interpretation of the risk in the context of disease pathogenesis.

The results from a typical GWAS highlight disease-associated regions of the genome that can be several kilobases or, indeed, even up to a megabase in size, and because such variants tend to be noncoding, it is not entirely obvious what the target gene or functional consequence is. One possible approach for resolving this issue would be to integrate high-throughput genotype data, sequencing data, and local gene expression data in such a way that the biological basis of the association can be elucidated.

Occasionally, the biological basis for such disease associations can be relatively straightforward to dissect (eg, at the SNCA locus in PD). In rare cases, Mendelian mutations can cause disease by being toxic gain-of-function coding mutations, and (in parallel) common, noncoding variants in the same gene are present in the population and have a more modest effect on disease risk by subtly altering the regulation and expression of the SNCA gene. However, for the vast majority of other GWAS “hits,” further work is needed to dissect out the precise causal variants and their effect on disease.

One such approach is to use genome-wide expression quantitative trait loci (eQTLs) data sets. These eQTLs are genomic loci that regulate the expression levels of messenger RNA (mRNA). The measured mRNA is the product of a single gene with a specific chromosomal location. The eQTLs may act in cis (locally) or in trans (at a distance) relative to a gene, and the abundance of a gene transcript is directly modified by a polymorphism in a regulatory element. The combination of GWAS and the measurement of global gene expression allows for the systematic identification of eQTLs.26 By assaying gene expression and genetic variation simultaneously on a genome-wide basis for a large number of individuals, statistical methods can be used to map the genetic factors that underpin individual differences in the quantitative levels of expression of many thousands of transcripts. Such data sets are becoming widely available; for example, the UK Human Brain Expression Consortium data set,27 generated from 10 distinct brain regions sampled from 134 neuropathologically normal individuals, contains detailed information on the regional expression, splicing, and regulation of genes in physiologically relevant tissue. Already this approach has elucidated GWAS neurological disease hits (eg, the association of the common 17q21.31 H1 MAPT locus with PD). Studies on eQTLs have revealed that this risk polymorphism is associated with an increased expression of exon 3-containing MAPT transcripts in the human brain.28 This biologically significant finding opens up new lines of research into the role of tau protein in PD and related neurodegenerative conditions.

Over the last 2 years, whole-exome sequencing has rapidly become the approach of choice to study rare variations that are not captured by GWASs. It is an inexpensive but still effective alternative to whole-genome sequencing, especially since approximately 85% of Mendelian disease-causing mutations are located within 1 of the 180 000 coding exons, which constitute a mere 30 Mb or 1% of the genome. Nonetheless, whole-exome sequencing is not without its limitations; for example, not all known genes are well captured or well sequenced using this technology. However, it is anticipated that, over the next few years, this approach will be gradually replaced by whole-genome sequencing owing to the decreasing cost of sequencing and the far greater variant information content gleaned from the entire genome.

Owing to recent advances in high-throughput genotyping and sequencing technologies, it is hoped that genetic research will uncover a large number of additional disease-causing and disease-modifying sequence variants over the coming years. There is no doubt that these new discoveries will lay the foundations for initiating new biological research and will open up avenues to new treatment approaches and diagnostics. The genome-wide approaches described herein have, without a doubt, advanced the field of neurogenetics. With the continuing evolution of and improvements to the technology, its rapidly decreasing cost, and the improved data informatics of NGS, it is anticipated that almost all routine genetics research will be conducted in this way.

GENETIC TESTING FOR NEUROLOGICAL DISEASE IN A CLINICAL SETTING

It is expected that this wealth of new information on the genome and its effect on the risk of neurological disease will result in the development of novel diagnostic assays and targeted therapies and in an improved ability to predict the onset, severity, and progression of disease. In other words, it will have a major impact on medical practice. Furthermore, the advances in NGS described herein that have transformed genomic research now have the potential to revolutionize the way in which genetic neurological diseases are diagnosed in the laboratory in a clinically useful way.

It is highly desirable to use NGS in the diagnosis of neurological disease, and many such new tests are beginning to become available and are being offered either commercially or by local health trusts. The development of these new high-throughput methods is advantageous for several reasons. There is wide clinical and genetic heterogeneity of neurological diseases, which means that any given disorder may present with a wide spectrum of clinical phenotypes and that even mutations in the same gene may present with different syndromes. For example, mutations in the FA2H gene may present in the clinic as brain iron accumulation, leukodystrophy, or hereditary spastic paraplegia. Clinical heterogeneity can also


arise from mutations in several genes associated with a clinically typical phenotype (eg, LRRK2 and SNCA multiplications) and, likewise, in the recessive genes PARK2, PINK1, and PARK7. In addition, numerous genes can underlie a Huntington disease phenotype. Thus, it is difficult to assign a specific gene test on clinical grounds alone, and a strong case can be made for multigene testing of a disease area rather than a single phenotype or a possible single underlying gene.

Many important points that can barely be touched on here need to be considered before NGS is routinely incorporated into the diagnostic laboratory, but they would have to include a careful consideration of (1) the selection of the appropriate technology, (2) the ethical and legal issues, and (3) the data analysis and the infrastructure of the information technology.

It is our view that, at this point in time, the most efficient and appropriate strategy is that of clinically targeted analysis, using one of the benchtop sequencers discussed previously, rather than whole-exome sequencing or whole-genome sequencing, such that only genes of relevance to the specific condition are analyzed and only those results are shared with patients. That is not to say that such new tests will focus only on a single gene but, rather, that all known causal genes with the potential to cause a condition will be screened. For example, in the case of PD, a targeted test that screens all possible causal Mendelian genes could include the following typical set: SNCA, LRRK2, PARK2, PARK7, PINK1, and VPS35. However, one would probably wish to include genes that have been associated with other atypical forms of parkinsonism such as ATP13A2, FBXO7, and PLA2G6 and genes such as MAPT and GCH1 associated with phenocopies of PD. Thus, in this case, a specific diagnostic question is whether to “test” the genome sequence for known clinically validated pathogenic variants.

Whole-exome sequencing is unlikely to be routinely adopted by the diagnostic market owing to the limitations in the technology (ie, poorly captured genes). The consequences of not capturing all the known genes can be illustrated by the GBA gene. As previously discussed, heterozygous mutations in GBA are the strongest genetic risk factor for developing PD to date.24 GBA poses difficulties for sequencing because it has a pseudogene with approximately 96% homology a few kilobases downstream. This poses difficulties for capturing the target DNA, for postsequence alignment, and for variant calling.

In the years to follow, it is anticipated that whole-genome sequencing will become a widespread tool for clinical use because of its decreasing cost, the new advances in the technology, and the improved informatics systems that can handle the large volume of data generated. This application of whole-genome sequencing will be particularly important in the diagnoses of the conditions of individuals for whom traditional sequencing approaches have failed to identify the underlying cause of disease. One such new initiative is the National Institutes of Health Undiagnosed Diseases Program.29 In the first year of this program, 160 individuals were enrolled, and for 39 cases, a diagnosis was able to be made. Of the 160 individuals in the cohort, 85 (53%) had a neurological disorder.

Performing whole-genome sequencing has the potential to provide answers to the diagnostic questions about the medical condition and about the potential predispositions to unconsidered conditions in the future, which may have implications for other family members. Thus, the routine implementation of whole-genome sequencing will necessitate a detailed review of the full ethical implications for health care providers and for the patients themselves.

Accepted for Publication: December 6, 2012.
Published Online: April 9, 2013. doi:10.1001/jamaneurol.2013.2068
Correspondence: John Hardy, PhD, Department of Molecular Neuroscience and Reta Lila Weston Laboratories, Institute of Neurology, Queen Square House, University College London, 9th Floor, Queen Square, London WC1N 3BG, England.
Author Contributions: Study concept and design: All authors. Acquisition of data: Pittman. Analysis and interpretation of data: Pittman. Drafting of the manuscript: All authors. Critical revision of the manuscript for important intellectual content: Pittman. Statistical analysis: Pittman. Obtained funding: Hardy. Administrative, technical, and material support: Pittman.
Conflict of Interest Disclosures: None reported.

REFERENCES

1. Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics: ENCODE explained. Nature. 2012;489(7414):52-55.
2. Farnham PJ. Thematic minireview series on results from the ENCODE project: integrative global analyses of regulatory regions in the human genome. J Biol Chem. 2012;287(37):30885-30887.
3. International HapMap Consortium. The International HapMap Project. Nature. 2003;426(6968):789-796.
4. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458(7239):719-724.
5. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31-46.
6. Loman NJ, Misra RV, Dallman TJ, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30(5):434-439.
7. Abecasis GR, Altshuler D, Auton A, et al; 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing [published correction appears in Nature. 2011;473(7348):544]. Nature. 2010;467(7319):1061-1073.
8. Buchanan CC, Torstenson ES, Bush WS, Ritchie MD. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data. J Am Med Inform Assoc. 2012;19(2):289-294.
9. Flicek P, Birney E. Sense from sequence reads: methods for alignment and assembly. Nat Methods. 2009;6(11 suppl):S6-S12.
10. Thomas B, Beal MF. Parkinson’s disease. Hum Mol Genet. 2007;16(R2):R183-R194.
11. Klein C, Westenberger A. Genetics of Parkinson’s disease. Cold Spring Harb Perspect Med. 2012;2(1):a008888.
12. Zimprich A, Biskup S, Leitner P, et al. Mutations in LRRK2 cause autosomal-dominant parkinsonism with pleomorphic pathology. Neuron. 2004;44(4):601-607.
13. Paisán-Ruíz C, Jain S, Evans EW, et al. Cloning of the gene containing mutations that cause PARK8-linked Parkinson’s disease. Neuron. 2004;44(4):595-600.
14. Kitada T, Asakawa S, Hattori N, et al. Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature. 1998;392(6676):605-608.
15. Bonifati V, Rizzu P, van Baren MJ, et al. Mutations in the DJ-1 gene associated with autosomal recessive early-onset parkinsonism. Science. 2003;299(5604):256-259.
16. Valente EM, Abou-Sleiman PM, Caputo V, et al. Hereditary early-onset Parkinson’s disease caused by mutations in PINK1. Science. 2004;304(5674):1158-1160.
17. Polymeropoulos MH, Lavedan C, Leroy E, et al. Mutation in the alpha-synuclein gene identified in families with Parkinson’s disease. Science. 1997;276(5321):2045-2047.
18. Singleton AB, Farrer M, Johnson J, et al. alpha-Synuclein locus triplication causes Parkinson’s disease. Science. 2003;302(5646):841.
19. Zimprich A, Benet-Pagès A, Struhal W, et al. A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease. Am J Hum Genet. 2011;89(1):168-175.
20. Vilariño-Güell C, Wider C, Ross OA, et al. VPS35 mutations in Parkinson disease. Am J Hum Genet. 2011;89(1):162-167.
21. Chartier-Harlin MC, Dachsel JC, Vilariño-Güell C, et al. Translation initiator EIF4G1 mutations in familial Parkinson disease. Am J Hum Genet. 2011;89(3):398-406.
22. Do CB, Tung JY, Dorfman E, et al. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genet. 2011;7(6):e1002141.
23. Simón-Sánchez J, Schulte C, Bras JM, et al. Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat Genet. 2009;41(12):1308-1312.
24. Sidransky E, Nalls MA, Aasly JO, et al. Multicenter analysis of glucocerebrosidase mutations in Parkinson’s disease. N Engl J Med. 2009;361(17):1651-1661.
25. Lill CM, Bertram L. Towards unveiling the genetics of neurodegenerative diseases. Semin Neurol. 2011;31(5):531-541.
26. Hernandez DG, Nalls MA, Moore M, et al. Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain. Neurobiol Dis. 2012;47(1):20-28.
27. Trabzuni D, Ryten M, Walker R, et al. Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J Neurochem. 2011;119(2):275-282.
28. Trabzuni D, Wray S, Vandrovcova J, et al. MAPT expression and splicing is differentially regulated by brain region: relation to genotype and implication for tauopathies. Hum Mol Genet. 2012;21(18):4094-4103.
29. Gahl WA, Markello TC, Toro C, et al. The National Institutes of Health Undiagnosed Diseases Program: insights into rare diseases. Genet Med. 2012;14(1):51-59.
