Copy Number Variants and Genetic Traits: Closer to the Resolution of Phenotypic to Genotypic Variability
Total Page:16
File Type:pdf, Size:1020Kb
PERSPECTIVES OPINION Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability Jacques S. Beckmann, Xavier Estivill and Stylianos E. Antonarakis Abstract | A considerable and unanticipated plasticity of the human genome, manifested as inter-individual copy number variation, has been discovered. These structural changes constitute a major source of inter-individual genetic variation that could explain variable penetrance of inherited (Mendelian and polygenic) diseases and variation in the phenotypic expression of aneuploidies and sporadic traits, and might represent a major factor in the aetiology of complex, multifactorial traits. For these reasons, an effort should be made to discover all common and rare copy number variants (CNVs) in the human population. This will also enable systematic exploration of both SNPs and CNVs in association studies to identify the genomic contributors to the common disorders and complex traits. The availability of genetic markers has HapMap Project genotyped one million led to extraordinary progress in human SNPs19 and, in a second phase, about genetics in the past 25 years, including the 4 million of the nearly 12 million SNPs that elucidation of the molecular genetic basis were deposited in public databases (see the of many Mendelian disorders or traits. NCBI Single Nucleotide Polymorphism Several landmarks mark this progress. database). These variants constitute the Whereas studies of protein polymorphisms major source of inter-individual genetic in the 1960s predicted a limited amount and phenotypic variation. SNPs and SSRs of genomic variation, analysis of DNA have found additional uses in forensic sequences from several individuals studies, discovery of loss of heterozygosity revealed an extensive and largely in cancers20, population genetic studies21,22, un expected level of variation (TIMELINE). In discovery of uniparental disomies23, 1978, Kan and Dozy were the first to dis- elucidation of the origin of aneuploidies24, cover single nucleotide variants in the diagnostic studies, evolutionary analyses HpaI restriction site that lies downstream and association studies for polygenic com- of the β-globin gene (HBB)1. In 1980, plex traits. This list is not exhaustive, and Wyman and White described a highly the literature is continuously expanding variable restriction fragment length with elegant experimental uses of genetic polymorphism (RFLP)2 and, in 1985, variation. To date, SNPs are the variant Jeffreys et al. reported the minisatellites3,4, type of choice for association studies in soon followed by the description of variable common diseases and complex traits. number of tandem repeats (VNTRs)5. These The results of this massive application to advances paved the way to a systematic case–control studies are just around the genetic exploration of the human genome6. corner25, as testified by the recent impres- As early as 1989, a number of independ- sive publications on age-related macular ent reports7–14 described the widespread degeneration26–29, diabetes30–37, obesity38,39, existence of short sequence repeat (SSR) cardiovascular diseases40,41, prostate variants, also called microsatellites or short cancer42,43 and breast cancer44–46. tandem repeats (STR). In the 1990s, thou- In addition to these by now ‘conven- sands of such markers were used to create tional’ genetic markers, it has been known linkage maps of all human chromosomes15–18. since the 1980s that the human genome These multiallelic markers were used also contains another abundant source extensively by investigators worldwide to of polymorphism, one that involves map disease-causing mutations in more deletions, insertions, duplications and than 2,000 genes. In 2005, the International complex rearrangements of genomic NATURE REVIEWS | GENETICS VOLUME 8 | AUGUST 2007 | 639 © 2007 Nature Publishing Group PERSPECTIVES Timeline | Landmarks in the study of human genetic variation (1960–1980) Analysis Jeffreys et al. reported Independent reports7–14 (1989–1996) Microsatellites of protein sequences Kan and Dozy hypervariability described the widespread became the gold-standard from several discovered single at minisatellite existence of short sequence DNA markers for genetic individuals revealed nucleotide variants in sequences3 and their repeat (SSR) variants, also called studies7–14 and thousands of an extensive and the HpaI restriction use in assessing microsatellites or short tandem microsatellite markers were largely unexpected site downstream of individual genetic repeats (STR) and their used to create linkage maps of The HapMap consortium level of variation. the β-globin gene1. profiles4. application as genetic markers. all human chromosomes15–18. genotyped 1 million single SNPs19. 1960 1975 1978 1980 1985 1987 1989 1991 2002 2004 2005 2007 (1975–1980) Wyman and White described a Nakamura et al. Genomic Identification that a Interrogation of Redon et al57 Description of copy highly variable restriction described the rearrangements large subset of SNPs genomic variability by identified 1,447 copy number variation of the fragment length polymorphism use of variable are identified as are paralogous array hybridization number variable α-globin genes by Kan (RFLP)2. number of tandem the mutational sequence variants methods demonstrated regions showing that and co-workers47,48. repeat (VNTR) mechanism that leads (PSVs) and define the existence of copy at least 12% of the Botstein et al.6 proposed to use markers for human to Charcot–Marie– regions of structural number variants human genome RFLPs to generate linkage maps gene mapping5. Tooth disease type 1A variability85. (CNVs)55,56. contains CNVs. of the human genome. (REF. 53). regions of 1 kb in length or larger. For one single nucleotide pair, their genomic to explain reduced penetrance of some example, Goossens et al. showed that in abundance (over 10 million) makes them disease-causing mutations. Consider the a fraction of cases the α-globin loci are the most frequent source of polymorphic pedigree in FIG. 1a: individual II-4 mani- triplicated47; normally, they are present in changes. By contrast, CNVs are far less fests the disease phenotype because she two copies per haploid genome, although numerous but can affect from one kilobase has only one copy of the normal allele. By in some instances they carry deletions48. to several megabases of DNA per event, contrast, although individual II-2 receives Subsequently, the number of X-linked adding up to a significant fraction of the the mutant allele from his affected mother, pigment genes was found to vary among genome57–59. he is not affected, because he inherited individuals49,50. The Rhesus blood group The discovery of extensive copy number a duplicated (compensatory) allele from gene RHD is another familiar example of a variation in the genomes of normal his father (the father is healthy because a long-established deletion polymorphism51. individuals provides new hypotheses to higher copy number is not harmful). So, Importantly, the existence of intra- account for the phenotypic variability for a dominant loss-of-function mutation, chromosomal duplicons52 (regions of the among inherited (Mendelian and poly- a CNV gain in trans (on the non-carrier genome with identical sequence) predicted genic) disorders and aneuploid syndromes. chromosome) could rescue the phenotype. the possibility of genomic rearrange- Many aspects of the importance of the con- Note that individual II-2 can transmit ments as a result of non-allelic homologous siderable plasticity of the human genome the mutant allele to his offspring, which recombination (NAHR) between these have previously been discussed57,58,60–63, but could — depending on the status of its regions. In 1991, this course of events was their impact on the myriad of phenotypic other allele — manifest the phenotype. identified as the mutational mechanism traits and genetic diseases remains to be In the above example, we assumed that that leads to Charcot–Marie–Tooth elucidated54,64–67. Here we provide examples extra copies of a CNV result in increased neuropathy type 1A (REF. 53). Numerous to show how CNVs might account for gene expression. This is not always the reports describing genomic rearrange- phenotypic variability in genetic disease. case — in ~10% of cases, negative cor- ments have since followed54. In 2004, the relations between CNV and levels of gene interrogation of genomic variability by Dominant traits and penetrance expression were reported68. Consider for array hybridization methods clearly dem- Penetrance, the fraction of individuals with instance, a loss-of-function mutation in onstrated the existence of copy number a particular genotype that show the associ- the context of an expanding CNV that variants55,56. Intense analysis of this type ated phenotype, is an important aspect reduces gene expression. In this case, indi- of genomic variability followed, and the of all genetic disorders. Penetrance is a vidual II-2 could be affected whereas II-4 current conservative estimate from studies complicated issue in genetic counselling, would be an asymptomatic carrier. in a few hundred individuals is that at clinical practice, mapping of disease loci, The mechanism described above least 10% of the genome is subject to copy positional cloning and association might not apply to gain-of-function or number variation57,58 (see the Database of studies for complex, multifactorial and dominant-negative mutations; however, Genomic Variants