COMMENTARY

Major changes in our DNA lead to major changes in our thinking

Jonathan Sebat

Variability in the genome has far exceeded expectations. In the course of the past three years, we have learned that much of our naturally occurring genetic variation consists of large-scale differences in genome structure, including copy-number variants (CNVs) and balanced rearrangements such as inversions. Recent studies have begun to reveal that structural variants are an important contributor to risk; however, structural variants as a class may not conform well to expectations of current methods for gene mapping. New approaches are needed to understand the contribution of structural variants to disease.
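The scale of variation summarized above can be checked against the conservative per-base-pair figures quoted later in this commentary: roughly one CNV difference per 800 bp and one SNP difference per 1,200 bp. The following is a minimal sketch of that arithmetic, not part of the original article; the variable and function names are hypothetical.

```python
from fractions import Fraction

# Conservative per-base-pair difference rates quoted in the commentary.
cnv_rate = Fraction(1, 800)    # structural (CNV) differences, about 1 per 800 bp
snp_rate = Fraction(1, 1200)   # nucleotide (SNP) differences, about 1 per 1,200 bp


def as_percent(rate):
    """Convert a per-bp difference rate into a percentage of the genome."""
    return float(rate * 100)


structural_pct = as_percent(cnv_rate)
nucleotide_pct = as_percent(snp_rate)
total_pct = as_percent(cnv_rate + snp_rate)

print(f"structural: {structural_pct:.3f}%")
print(f"nucleotide: {nucleotide_pct:.3f}%")
print(f"total:      {total_pct:.3f}%")
```

Under these assumptions the structural share comes out just above 0.12%, the nucleotide share near 0.08%, and the total just above 0.2%, in line with the figures given in the text.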

A subject that has gained much attention in the field of human genetics has been the discovery that structural variants of the genome, including large insertions and deletions of DNA, collectively termed copy-number variants (CNVs), as well as balanced chromosomal rearrangements, such as inversions, contribute to a major proportion of genetic difference in humans. Following the first studies to report the widespread abundance of CNVs in humans¹,², knowledge of structural variation has grown rapidly, owing to steady improvements in oligonucleotide microarray technology and the development of new sequencing-based³ and SNP-based⁴,⁵ structural variant detection methods, and their use in large-scale projects to map structural variation in different populations⁶,⁷.

It is now recognized that the genomes of any two individuals in the human population differ more at the structural level than at the nucleotide sequence level. Conservative estimates suggest that CNVs between individuals amount to 4 Mb (1/800 bp) of genetic difference³, and less conservative estimates put this figure in the range of 5–24 Mb⁷. By either measure, CNVs account for more nucleotide variation on average than single nucleotide polymorphisms (SNPs), which account for approximately 2.5 Mb (1/1,200 bp)⁸–¹⁰. Therefore, the total genomic variability between individuals is significantly greater than previously thought, amounting to a difference of at least 0.2%, >0.12% at the structural level and 0.08% at the nucleotide level.

In retrospect, perhaps it should not have been so surprising to find our genome riddled with deletions, duplications and inversions. Remarkable genomic plasticity had been observed in model organisms much earlier, for example when cytogenetic studies by Barbara McClintock found that transposition events explained nonmendelian patterns of segregation for certain maize phenotypes¹¹. Later, studies of the human genome revealed the presence of cytogenetically visible polymorphisms in heterochromatin length¹². Nevertheless, this aspect of human variability was not unmistakable. The proverbial lamp post was firmly fixed in the opposite direction because reliable methods did not exist for ascertaining CNVs genome-wide, and because prevailing methods for gene mapping worked best in the context of a static genome.

Technological innovations have opened the door to a fundamental aspect of human genomic variation that was previously unrecognized and have opened a new window into the genetic basis of disease. Methods for detecting CNVs genome-wide have the power to identify risk factors for disease directly, and thereby overcome some key limitations of traditional gene mapping approaches. What these studies have begun to reveal is that structural variants contribute to disease and the risk factors involved often do not conform to the expectations of prevailing association-based methods. This has consequences for what methods should be used to study CNVs, and it also has implications for the respective contribution of common and rare CNVs in disease.

Much of what was previously known about the role of CNVs in disease comes from a rich literature on 'genomic disorders'¹³. Genomic disorders are defined as a diverse group of genetic diseases that are each caused by an alteration in DNA copy number. These mutations can be relatively large, microscopically visible imbalances, such as in Prader-Willi syndrome¹⁴, or they may be much smaller, requiring higher resolution detection methods, such as in Williams syndrome¹⁵. Genomic disorders are typically sporadic in nature because the CNV in most cases is a de novo mutation with nearly complete penetrance, and because the affected individuals have severe developmental problems and are unlikely to have offspring. However, there are notable examples of mendelian disease traits associated with CNVs. For example, duplications of the gene for peripheral myelin protein 22 (PMP22) cause the dominant neuropathy Charcot-Marie-Tooth disease type 1A¹⁶, and deletions of the α-globin gene cluster cause the recessive anemia α-thalassemia¹⁷.

Previous knowledge of genomic disorders was limited by the available methods: that is, limited primarily to disorders that form a distinct clinical entity and where genomic imbalances are often cytogenetically visible or inherited in a dominant fashion. The application of high resolution genome-wide methods to sporadic disorders promises to greatly improve the power to detect CNVs that cause disease¹⁸. In addition, these genetic findings are proving helpful in informing physicians about the clinical features of a disorder. For example, by identifying new clinically relevant CNVs and correlating these changes with phenotypic information, new genomic disorders have been defined that had not been previously recognized as distinct clinical entities¹⁹–²¹.

Because each genomic disorder is a clinically defined syndrome linked with a single locus, and each is nearly 100% penetrant, these diseases are individually quite rare in the human population. However, it is not a great stretch of the imagination to envisage another type of genomic disorder that is similar in many respects to those described above, but is instead a common disease. Consider, for instance, a disorder where the clinically defined phenotype is not associated with a single locus, but is instead associated with the occurrence of a single dominant mutation involving any one of 50 autosomal genes. Assuming a spontaneous CNV mutation rate of 1/10,000 per locus on average, a 'complex genomic disorder' of this kind would be relatively common, with a population prevalence of 1/200.

Spontaneous copy-number mutation has recently emerged as a relevant issue in common disease, for example in autism spectrum disorders (ASD) where the prevalence is estimated at 1/150 (ref. 22). A high frequency of spontaneous copy-number mutation has been reported in ASD²³. In this study, 10% (12/118) of sporadic cases were associated with a de novo CNV, a significantly higher rate than in families with more than one affected individual (3%) or in healthy controls (1%). In a separate study focusing on a subset of individuals with syndromic autism (combined with dysmorphic features and mental retardation), Jacquemont et al. found the frequency of de novo CNVs to be 24% (7/26)²⁴. The frequency of de novo mutation found in these studies is an underestimate. Considering that microarray analysis at a resolution of ≤85,000 probes detects fewer than 10% of all CNVs, the total frequency of de novo copy-number changes in autism could be several-fold higher than what has been reported, raising the possibility that spontaneous structural mutations may contribute to disease in a majority of patients. The mutations identified in these studies occurred at many loci throughout the genome, and no individual CNV was found in more than 1% of cases. This high degree of heterogeneity is consistent with the findings of early genetic studies of autism that found evidence for linkage at many locations in the genome²⁵. An important implication of the recent findings in autism is that the genetic component of certain common disorders may consist largely of a constellation of rare, highly penetrant mutations. This line of evidence also favors the notion that much of the sporadic nature of autism can be attributed to spontaneous mutation at individual loci, in contrast to models that explain the lack of mendelian segregation by the additive or multiplicative effects of alleles at multiple loci²⁶.

A high rate of structural mutation is not a property of autism or other neurodevelopmental disorders; it is a property of the human genome. Therefore, frequent spontaneous copy-number mutation may play a prominent role in adult-onset neuropsychiatric disorders or indeed in any heritable disease whose effect on reproductive fitness and its prevalence in the population seem to defy darwinian logic²⁷. There are several examples of familial genomic disorders²⁸; but one fact that is not well appreciated is that they are invariably a result of spontaneous mutation (occurring in recent ancestry). For example, autosomal dominant and sporadic forms of Charcot-Marie-Tooth disease type 1 are caused by identical duplications of the gene PMP22, and are typically inherited in the dominant pedigrees and de novo in the sporadic cases²⁹. The common α-globin gene deletions found in different isolated populations each occur on a different haplotype background, implying that the deletions arose independently in each group¹⁷, and recently a high rate of spontaneous α-globin mutation in sperm was confirmed by Lam et al.³⁰. Thus, the persistence of some diseases in the global population may be due to a high rate of random mutation and a large number of potential sites in the genome which, when altered, can produce a similar disease.

It is certain that common copy-number polymorphisms (CNPs) will underlie heritable human traits. Deletions are known to underlie some relatively common human traits, such as the Rh-negative blood type³¹ and color blindness³²,³³. More recently, CNPs have been shown to contribute to disease risk. For example, duplications of the gene CCL3L1 have been found to influence susceptibility to infectious disease³⁴, and CNPs of FCGR3B predispose to systemic autoimmune disease³⁵,³⁶.

Although the variation in the above cases is common, for a variety of reasons, SNP-based methods may fail to ascertain much of the structural variation at these and other loci. Population-based studies have shown that CNPs as a class have reduced linkage disequilibrium with neighboring SNPs⁵,⁷. Potential reasons for this effect could include reduced SNP coverage in CNP regions and in regions rich in segmental duplications, or recurrent copy-number mutation at individual loci.

Recurrent mutation is certainly evident at some CNP loci, based on the existence of several common alleles. For example, quantitative PCR measurements of FCGR3B in a cohort of European ancestry showed four distinct distributions of diploid copy number, indicating that at least three distinct genomic structures, consisting of zero, one or two copies per chromosome, are common in the population³⁵. The distribution of CCL3L1 copy-number alleles was found to be greater still, varying between zero and seven copies per genome³⁴. In both of the previous examples, disease risk was associated primarily with the dosage of a gene, rather than with any single allele. Thus, some CNPs constitute common variation that segregates independently of SNPs.

In the past three years, it has become obvious that the structure of the human genome is not static. Furthermore, it is becoming increasingly evident that copy-number variability differs from nucleotide variability in terms of the rate at which copy-number mutations occur spontaneously in the genome³⁷ and the allelic diversity that may occur as a result. Therefore, CNVs require special consideration in large-scale genetic studies of disease³⁸. For loci with the highest mutation rates, linkage disequilibrium-based methods of association are not effective³⁹; therefore, direct methods of CNV detection are required. In addition, for some diseases a family-based study may have advantages over a case-control design because it would allow the identification of de novo mutations. Lastly, confirming the association of candidate genes originally identified from genome-wide CNV scans will surely require methods that are different from conventional approaches for fine-mapping candidate regions identified in whole-genome association studies, and are likely to involve a more comprehensive analysis of CNVs and SNPs, for example using a combination of tiling-resolution oligonucleotide arrays and high-throughput sequencing technology. When candidate loci originally identified from CNV studies are examined more closely, a new surprise may be in store in terms of the number of genes and diversity of causative alleles that contribute to disease.

Jonathan Sebat is at the Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA.
e-mail: [email protected]

Published online 27 June 2007; doi:10.1038/ng2095

ACKNOWLEDGMENTS
Special thanks to M. Wigler, M.-C. King and D. Levy for helpful discussions and to J. Lupski for his critical reading of the manuscript. My laboratory is funded by the Simons Foundation, Lattner Foundation, Stanley Foundation, the US National Institutes of Health (National Institute of Mental Health, National Human Genome Research Institute), Autism Speaks and the Southwest Autism Research and Resource Center.

COMPETING INTERESTS STATEMENT
The author declares no competing financial interests.

Published online at http://www.nature.com/naturegenetics
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
2. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004).
3. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
4. McCarroll, S.A. et al. Common deletion polymorphisms in the human genome. Nat. Genet. 38, 86–92 (2006).
5. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E. & Pritchard, J.K. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38, 75–81 (2006).
6. Eichler, E.E. et al. Completing the map of human genetic variation. Nature 447, 161–165 (2007).
7. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
8. Altshuler, D. et al. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513–516 (2000).
9. Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
10. Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).
11. McClintock, B. The origin and behavior of mutable loci in maize. Proc. Natl. Acad. Sci. USA 36, 344–355 (1950).
12. Jacobs, P.A. Human chromosome heteromorphisms (variants). Prog. Med. Genet. 2, 251–274 (1977).
13. Lupski, J.R. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14, 417–422 (1998).
14. Ledbetter, D.H. et al. Deletions of chromosome 15 as a cause of the Prader-Willi syndrome. N. Engl. J. Med. 304, 325–329 (1981).
15. Ewart, A.K. et al. Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome. Nat. Genet. 5, 11–16 (1993).
16. Lupski, J.R. et al. DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell 66, 219–232 (1991).
17. Higgs, D.R. et al. A review of the molecular genetics of the human α-globin gene cluster. Blood 73, 1081–1104 (1989).
18. Shaw-Smith, C. et al. Microarray based comparative genomic hybridisation (array-CGH) detects submicroscopic chromosomal deletions and duplications in patients with learning disability/mental retardation and dysmorphic features. J. Med. Genet. 41, 241–248 (2004).
19. Koolen, D.A. et al. A new chromosome 17q21.31 microdeletion syndrome associated with a common inversion polymorphism. Nat. Genet. 38, 999–1001 (2006).
20. Shaw-Smith, C. et al. Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability. Nat. Genet. 38, 1032–1037 (2006).
21. Sharp, A.J. et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38, 1038–1042 (2006).
22. Centers for Disease Control and Prevention. Prevalence of autism spectrum disorders–autism and developmental disabilities monitoring network, 14 sites, United States, 2002. MMWR Surveill. Summ. 56, 12–28 (2007).
23. Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007).
24. Jacquemont, M.L. et al. Array-based comparative genomic hybridization identifies high frequency of cryptic chromosomal rearrangements in patients with syndromic autism spectrum disorders. J. Med. Genet. 43, 843–849 (2006).
25. Risch, N. et al. A genomic screen of autism: evidence for a multilocus etiology. Am. J. Hum. Genet. 65, 493–507 (1999).
26. Pickles, A. et al. Latent-class analysis of recurrence risks for complex phenotypes with selection and measurement error: a twin and family history study of autism. Am. J. Hum. Genet. 57, 717–726 (1995).
27. Bassett, A.S., Bury, A., Hodgkinson, K.A. & Honer, W.G. Reproductive fitness in familial schizophrenia. Schizophr. Res. 21, 151–160 (1996).
28. Lee, J.A. & Lupski, J.R. Genomic rearrangements and gene copy-number alterations as a cause of nervous system disorders. Neuron 52, 103–121 (2006).
29. Hoogendijk, J.E. et al. De-novo mutation in hereditary motor and sensory neuropathy type I. Lancet 339, 1081–1082 (1992).
30. Lam, K.W. & Jeffreys, A.J. Processes of copy-number change in human DNA: the dynamics of α-globin gene deletion. Proc. Natl. Acad. Sci. USA 103, 8921–8927 (2006).
31. Blunt, T., Steers, F., Daniels, G. & Carritt, B. Lack of RH C/E expression in the Rhesus D–phenotype is the result of a gene deletion. Ann. Hum. Genet. 58, 19–24 (1994).
32. Vollrath, D., Nathans, J. & Davis, R.W. Tandem array of human visual pigment genes at Xq28. Science 240, 1669–1672 (1988).
33. Nathans, J., Piantanida, T.P., Eddy, R.L., Shows, T.B. & Hogness, D.S. Molecular genetics of inherited variation in human color vision. Science 232, 203–210 (1986).
34. Gonzalez, E. et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 307, 1434–1440 (2005).
35. Aitman, T.J. et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 439, 851–855 (2006).
36. Fanciulli, M. et al. FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity. Nat. Genet. 39, 721–723 (2007).
37. Lupski, J.R. Genomic rearrangements and sporadic disease. Nat. Genet. 39, S43–S47 (2007).
38. McCarroll, S.A. & Altshuler, D.M. Copy-number variation and association studies of human disease. Nat. Genet. 39, S37–S42 (2007).
39. Pritchard, J.K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).
© 2007 Nature Publishing Group

NATURE GENETICS SUPPLEMENT | VOLUME 39 | JULY 2007 S5