Quick viewing(Text Mode)

Copy Number Variants and Genetic Traits: Closer to the Resolution of Phenotypic to Genotypic Variability

Copy Number Variants and Genetic Traits: Closer to the Resolution of Phenotypic to Genotypic Variability

PERSPECTIVES

OPINION Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability

Jacques S. Beckmann, Xavier Estivill and Stylianos E. Antonarakis Abstract | A considerable and unanticipated plasticity of the , manifested as inter-individual copy number variation, has been discovered. These structural changes constitute a major source of inter-individual that could explain variable penetrance of inherited (Mendelian and polygenic) diseases and variation in the phenotypic expression of aneuploidies and sporadic traits, and might represent a major factor in the aetiology of complex, multifactorial traits. For these reasons, an effort should be made to discover all common and rare copy number variants (CNVs) in the human population. This will also enable systematic exploration of both SNPs and CNVs in association studies to identify the genomic contributors to the common disorders and complex traits.

The availability of genetic markers has HapMap Project genotyped one million led to extraordinary progress in human SNPs19 and, in a second phase, about genetics in the past 25 years, including the 4 million of the nearly 12 million SNPs that elucidation of the molecular genetic basis were deposited in public databases (see the of many Mendelian disorders or traits. NCBI Single Nucleotide Polymorphism Several landmarks mark this progress. database). These variants constitute the Whereas studies of polymorphisms major source of inter-individual genetic in the 1960s predicted a limited amount and phenotypic variation. SNPs and SSRs of genomic variation, analysis of DNA have found additional uses in forensic sequences from several individuals studies, discovery of loss of heterozygosity revealed an extensive and largely in cancers20, population genetic studies21,22, unexpected level of variation (TIMELINE). In discovery of uniparental disomies23, 1978, Kan and Dozy were the first to dis- elucidation of the origin of aneuploidies24, cover single nucleotide variants in the diagnostic studies, evolutionary analyses HpaI restriction site that lies downstream and association studies for polygenic com- of the β-globin (HBB)1. In 1980, plex traits. This list is not exhaustive, and Wyman and White described a highly the literature is continuously expanding variable restriction fragment length with elegant experimental uses of genetic polymorphism (RFLP)2 and, in 1985, variation. To date, SNPs are the variant Jeffreys et al. reported the minisatellites3,4, type of choice for association studies in soon followed by the description of variable common diseases and complex traits. number of tandem repeats (VNTRs)5. These The results of this massive application to advances paved the way to a systematic case–control studies are just around the genetic exploration of the human genome6. corner25, as testified by the recent impres- As early as 1989, a number of independ- sive publications on age-related macular ent reports7–14 described the widespread degeneration26–29, diabetes30–37, obesity38,39, existence of short sequence repeat (SSR) cardiovascular diseases40,41, prostate variants, also called or short cancer42,43 and breast cancer44–46. tandem repeats (STR). In the 1990s, thou- In addition to these by now ‘conven- sands of such markers were used to create tional’ genetic markers, it has been known linkage maps of all human chromosomes15–18. since the 1980s that the human genome These multiallelic markers were used also contains another abundant source extensively by investigators worldwide to of polymorphism, one that involves map disease-causing mutations in more deletions, insertions, duplications and than 2,000 . In 2005, the International complex rearrangements of genomic

NATURE REVIEWS | GENETICS VOLUME 8 | AUGUST 2007 | 639 © 2007 Nature Publishing Group

PERSPECTIVES

Timeline | Landmarks in the study of

(1960–1980) Analysis Jeffreys et al. reported Independent reports7–14 (1989–1996) Microsatellites of protein sequences Kan and Dozy hypervariability described the widespread became the gold-standard from several discovered single at minisatellite existence of short sequence DNA markers for genetic individuals revealed nucleotide variants in sequences3 and their repeat (SSR) variants, also called studies7–14 and thousands of an extensive and the HpaI restriction use in assessing microsatellites or short tandem markers were largely unexpected site downstream of individual genetic repeats (STR) and their used to create linkage maps of The HapMap consortium level of variation. the β-globin gene1. profiles4. application as genetic markers. all human chromosomes15–18. genotyped 1 million single SNPs19.

1960 1975 1978 1980 1985 1987 1989 1991 2002 2004 2005 2007

(1975–1980) Wyman and White described a Nakamura et al. Genomic Identification that a Interrogation of Redon et al57 Description of copy highly variable restriction described the rearrangements large subset of SNPs genomic variability by identified 1,447 copy number variation of the fragment length polymorphism use of variable are identified as are paralogous array hybridization number variable α-globin genes by Kan (RFLP)2. number of tandem the mutational sequence variants methods demonstrated regions showing that and co-workers47,48. repeat (VNTR) mechanism that leads (PSVs) and define the existence of copy at least 12% of the Botstein et al.6 proposed to use markers for human to Charcot–Marie– regions of structural number variants human genome RFLPs to generate linkage maps gene mapping5. Tooth disease type 1A variability85. (CNVs)55,56. contains CNVs. of the human genome. (REF. 53).

regions of 1 kb in length or larger. For one single nucleotide pair, their genomic to explain reduced penetrance of some example, Goossens et al. showed that in abundance (over 10 million) makes them disease-causing mutations. Consider the a fraction of cases the α-globin loci are the most frequent source of polymorphic pedigree in FIG. 1a: individual II-4 mani- triplicated47; normally, they are present in changes. By contrast, CNVs are far less fests the disease phenotype because she two copies per haploid genome, although numerous but can affect from one kilobase has only one copy of the normal allele. By in some instances they carry deletions48. to several megabases of DNA per event, contrast, although individual II-2 receives Subsequently, the number of X-linked adding up to a significant fraction of the the mutant allele from his affected mother, pigment genes was found to vary among genome57–59. he is not affected, because he inherited individuals49,50. The Rhesus blood group The discovery of extensive copy number a duplicated (compensatory) allele from gene RHD is another familiar example of a variation in the genomes of normal his father (the father is healthy because a long-established polymorphism51. individuals provides new hypotheses to higher copy number is not harmful). So, Importantly, the existence of intra- account for the phenotypic variability for a dominant loss-of-function mutation, chromosomal duplicons52 (regions of the among inherited (Mendelian and poly- a CNV gain in trans (on the non-carrier genome with identical sequence) predicted genic) disorders and aneuploid syndromes. ) could rescue the phenotype. the possibility of genomic rearrange- Many aspects of the importance of the con- Note that individual II-2 can transmit ments as a result of non-allelic homologous siderable plasticity of the human genome the mutant allele to his offspring, which recombination (NAHR) between these have previously been discussed57,58,60–63, but could — depending on the status of its regions. In 1991, this course of events was their impact on the myriad of phenotypic other allele — manifest the phenotype. identified as the mutational mechanism traits and genetic diseases remains to be In the above example, we assumed that that leads to Charcot–Marie–Tooth elucidated54,64–67. Here we provide examples extra copies of a CNV result in increased neuropathy type 1A (REF. 53). Numerous to show how CNVs might account for . This is not always the reports describing genomic rearrange- phenotypic variability in genetic disease. case — in ~10% of cases, negative cor- ments have since followed54. In 2004, the relations between CNV and levels of gene interrogation of genomic variability by Dominant traits and penetrance expression were reported68. Consider for array hybridization methods clearly dem- Penetrance, the fraction of individuals with instance, a loss-of-function mutation in onstrated the existence of copy number a particular genotype that show the associ- the context of an expanding CNV that variants55,56. Intense analysis of this type ated phenotype, is an important aspect reduces gene expression. In this case, indi- of genomic variability followed, and the of all genetic disorders. Penetrance is a vidual II-2 could be affected whereas II-4 current conservative estimate from studies complicated issue in genetic counselling, would be an asymptomatic carrier. in a few hundred individuals is that at clinical practice, mapping of disease loci, The mechanism described above least 10% of the genome is subject to copy positional cloning and association might not apply to gain-of-function or number variation57,58 (see the Database of studies for complex, multifactorial and dominant-negative mutations; however, Genomic Variants and the UCSC Genome polygenic disorders. Remarkably, many one could argue that the effect of these web site). dominant disorders have variable pen- mutations could be diminished in the This submicroscopic structural type etrance, such as tuberous sclerosis, presence of two copies of normal alleles. of variation has been termed copy number neurofibromatosis type 1 or breast cancer Similarly, a hypomorphic allele (with a low variation (CNV) or copy number polymorphism due to mutations in BRCA1 or BRCA2. We level of expression, for example) could be (CNP). Although a typical SNP affects only propose that CNV could be one determinant masked if the gene was duplicated within

640 | AUGUST 2007 | VOLUME 8 www.nature.com/reviews/genetics © 2007 Nature Publishing Group

PERSPECTIVES

a CNV; its effect could only be manifested of the common trisomies remain largely Trisomy and more than three copies in a state of a single allele on each chromo- unknown. Several attempts to define mini- Most of the phenotypic variability that is some. These arguments could also apply to mum triplicated regions that underlie the associated with trisomies has been attrib- variations in the severity of the resulting trisomy-associated phenotypes have not uted to the allelic contribution to trisomy phenotype. Reduced penetrance has been yielded unequivocal results72. It now seems and threshold effect of gene-expression observed for several diseases that result logical to consider polymorphic triplicated variation72. But some of the phenotypic from CNV, including DiGeorge syndrome regions (that is, CNVs) in normal indi- variability might be due to the presence of and its reciprocal duplication syndrome, as viduals as contributing to the phenotypic more than three copies of certain sequences. well as speech problems in patients with characteristics of trisomies. For example, Consider the pedigree in FIG. 1b: The duplication of the Williams syndrome CNVs involve 3.5 Mb on chromosome 21, trisomy 21 individual has, through an error region. 10.1 Mb on chromosome 13 and in meiosis, inherited two copies of the chro- An estimated 365 Mb of DNA were 6.5 Mb on chromosome 18 — respectively, mosomal segment from the mother and one found to have a variable number of copies 10.1%, 10.6% and 8.6% of the sequenced from the father. Because this segment car- in the genomic DNA of lymphoblastoid cell nucleotides of these (see the ries a CNV with two copies of the ‘red’ gene, lines that were derived from 270 HapMap Database of Genomic Variants). Regions the total number of copies of the red individuals57. Approximately 15% of genes of these chromosomes that vary in copy gene in the trisomy 21 individual is, in fact, within these CNVs are known to underlie number might harbour genes or other five. This genomic imbalance could contrib- Mendelian monogenic disease phenotypes functional genomic elements that, in three ute to her phenotype. Had she inherited two (285 out of the 1961 genes listed in the copies, are insufficient per se to cause the copies of the other maternal allele, she would OMIM Morbid Map). We predict that various phenotypes of trisomy 21, 13 or be trisomic for the whole of chromosome 21, the overall penetrance and/or variations 18, because these micro-triplications are with only three copies of the red gene, and in the severity of the phenotype of dominant present in normal individuals and are therefore would not develop the phenotype disorders that are caused by mutations therefore free of any gene-dosage-dependent that is associated with the red gene. Note that in genes located within CNVs will be phenotypic consequences. As nearly 50% the mother carries three copies of the red modified compared with the penetrance of of CNVs have been detected only once57, gene, and she is not affected. According to dominant traits that are caused by muta- disclosure of the contribution of common the discussion presented above, the region tions in genes that do not map in CNV CNVs to the modulation of the phenotype of the red gene is excluded from contribut- regions. of trisomies will require the study of ing to the trisomy phenotype if triplicated For example, it has been shown that an larger cohorts. because the mother is a normal individual. amino-acid variant (Y402H) in the com- plement factor H and membrane cofactor (CFH) gene predisposes to age-related a macular degeneration26–29,69. As CFH is included in a CNV57, it is conceivable that the risk that is conferred by one such I-1 I-2 change (Y402H) is modified by variability in copy number at the CFH gene or the II-1 II-2 II-3 II-4 nearby genomic region that includes the CFHR1 and CFHR3 genes. In fact, Hughes et al. have reported that a CFHR1 and CFHR3 deletion is protective against age-related macular degeneration70. Similarly, atypical haemolytic–uraemic syndrome might be modified by copy number variation at this , thereby explaining the variability in the clinical presentation of the disorder71. Because both gains and losses of genomic sequences b affect the region in which these genes lie57, studies that combine both SNPs and CNVs might further clarify the contribution of these two types of genomic variation to these diseases.

CNV contribution to trisomy phenotypes Despite the availability of the sequence of the euchromatic portion of the human and other mammalian genomes, and the Figure 1 | Influence of copy number variations (CNVs) on penetrance and variation in severity ongoing functional annotation, the genes of the phenotype. a | A hypothetical example to show how CNVs could be used to determine the and other functional elements that are penetrance of a dominant mutant allele; see text for details. b | A hypothetical example to show responsible for the various phenotypes how CNVs could modify the phenotypic expression of trisomies; see text for details.

NATURE REVIEWS | GENETICS VOLUME 8 | AUGUST 2007 | 641 © 2007 Nature Publishing Group

PERSPECTIVES

a However, the red gene region, although it disease, is likely to enhance the identifica- I does not contribute to the phenotype in tion of the molecular basis of inherited three copies, could contribute if present in monogenic forms of these diseases74–76,83. AT G C five copies, either directly or as a modifier of In addition, the preponderance and overall the phenotype. This hypothesis can be evalu- chromosomal dispersion of CNVs57,58 II ated by association studies and quantitative might also impact on the inter-individual molecular methods of the copy number of differences in drug response84, as well as DNA elements and certain phenotypic char- susceptibility to infection79 or cancer60, acteristics, and CNVs should be considered either directly or by modulating pen- III as potential modifiers of the phenotypic etrance or variability in the expression of variability of trisomies (or aneuploidies the trait examined. in general). SNPs are generally considered in the context of genetic association studies as IV CNVs and genomic disorders powerful markers for the mapping of loci The pathogenic role of CNVs has been that underlie phenotypic variation; these ( )n known since the elucidation of the aetiol- SNPs are mostly proxies for the causal V ogy of Charcot–Marie–Tooth neuropathy variants with which they are in linkage type 1 (REF. 53) and hereditary neuropathy disequilibrium. However, the use of SNPs in with liability to pressure palsies73. Point association studies in CNV-related cases b mutations in the disease-causing genes will fail to identify the causative genomic I within these CNVs have been uncovered. regions, because SNPs in these regions do CG A T However, most patients with genomic not fulfil the criteria for Mendelian inher- disorders have genomic copy number itance or Hardy–Weinberg equilibrium in the abnormalities rather than point mutations, studied samples; this is especially true in II suggesting that the frequency of de novo complex and multiallelic CNVs, probably events for these types of changes, and owing to recurrent expansion or contrac- hence their population prevalence, differ tion57. Furthermore, as many as 20% of significantly in favour of chromosomal the SNPs deposited in NCBI are paralogous III rearrangements54. sequence variants (PSVs)85, and therefore Specific efforts are underway to are a resource for uncover genomic changes that are studies. Redon et al. found that biallelic 57 IV involved in cases of clinical abnormalities CNVs can be tagged by SNPs . However, (see the DECIPHER web site). Although this is particularly difficult for CNVs that copy number changes were initially docu- are complex or that have multiple alleles. ()n mented through the study of inherited This is likely to be due to de novo events V diseases, we now know that CNVs cover that might lead to many occurrences of approximately 12% of the human genome, the same or similar CNV alleles (FIG. 2). Figure 2 | Possible haplotype configurations potentially altering gene dosage, disrupt- This is clearly the case for the complex in a copy number variation (CNV)-prone ing genes or perturbing regulation of their CNVs (containing several variable copies chromosomal region. A simplified schematic expression, even at long-range distances; of the CCL3L1 gene) that are associated representation of various haplotypic scenarios thus, a considerable number of apparently with predisposition to HIV-1 infection57,79 of a CNV-prone chromosomal region (repre- Mendelian disorders might be due to (X.E., unpublished observations). We pro- sented as a horizontal red bar) and its biallelic CNVs. And just as CNVs can affect mono- pose that genome-wide association studies SNPs (represented as vertical lines), with two on genic traits (including monogenic forms of should be systematically complemented either side of the CNV region (represented as common disorders, see REFS 74–76), CNVs by high-density array comparative genomic blue bars above the chromosomal region) are also likely to underlie the aetiology of hybridization (aCGH) studies, which forming the black (in part a) or red (in part b) , and two that are internal to the common disorders as a result of variability could capture the genetic information of 77,78 CNV. Individuals who are homozygous for in gene dosage . CNVs (FIG. 3). a CNV-null allele (I) will not contain the internal SNPs, whereas all other combinations will vary CNVs and complex traits CNVs, SNPs and genetic information in copy number of the corresponding SNPs. Several complex disorders have already The study of SNPs has revealed that any Offspring from a heterozygous I–III combina- been associated with CNVs. For example, two randomly selected human genomes tion will inherit either a null or a double dose of susceptibility to HIV-1, lupus and Crohn differ by 0.1% (see the International the internal SNPs, and thus manifest a disease have been found associated with HapMap Project web site). Remarkably, non-. Furthermore, CNVs involving the CCL3L1 (REF. 79), the study of CNVs revised this estimate: recurrence of CNVs in this chromosomal region FCGR3B64,80 and C4 (REF. 81), and DEFB4 in fact, two randomly selected genomes will blur any between the CNV and its flanking markers. Note also (REF. 82) genes, respectively. The identifica- differ by at least 1%, and most of this 57 that, although markers that flank rare CNVs tion of rare CNVs involved in susceptibility difference is due to CNVs . Both types might be in perfect linkage disequilibrium with to other common disorders, such as of variants, SNPs and CNVs, are widely them, the same haplotype will also be found on chronic pancreatitis, autism spectrum dispersed throughout the genome, the most common CNV allele. disorders, Alzheimer disease or Parkinson although their genomic prevalence may

642 | AUGUST 2007 | VOLUME 8 www.nature.com/reviews/genetics © 2007 Nature Publishing Group

PERSPECTIVES

differ by two orders of magnitude in even flanking SNPs will fail as markers Genetic risk: favour of SNPs. Yet, because common whenever recurrent CNVs fall in different SNPs and SNVs CNVs affect as much as one-tenth of the haplotypes, as defined by their respective human genome57, any variation in copy tagging SNPs (FIG. 2). number will affect a wide spectrum of Even in the case of a recent and rare genomic sequences (from the kilobase CNV, with which all the flanking tag- Phenotype Environmental risk up to the megabase range), and possibly ging SNPs will be in complete linkage many genes. By contrast, although SNPs disequilibrium, these marker alleles will are present throughout the entire genome, usually be poor predictors for this CNV, they involve only a single nucleotide at as they will also be found on the common Genetic risk: a time. Therefore, it might be that only a CNV allele, and therefore be of little use CNV alleles, Copy number dosage minority of SNPs, those that affect func- in association studies (FIG. 2). Several such tional elements, have a causative role in CNVs (common and rare) that are likely Figure 3 | The genetic and environmental phenotype. Considering the higher muta- to be predisposing have been associated risks combined confer the total risk for a tion rate for CNVs versus point muta- with some disorders such as HIV-1 and complex phenotype. The genetic risk could tions86, and the fact that common CNVs AIDS susceptibility79, hereditary pancrea- be subdivided into that contributed by the SNP span at least one-tenth of the human titis75, Crohn disease82 and systemic lupus alleles, and that contributed by copy number genome57, these types of mutational erythematosus64,80,81. A detailed analysis of variation (CNV alleles or copy number dosage). events are likely to be important players the genomic architecture and genetic char- SNV, single nucleotide variant. in the aetiology of common disease traits acteristics of these regions should provide and sporadic birth defects. Furthermore, important clues about the use of CNVs because of their nature, a significant frac- and SNPs in the discovery of molecular tion of CNVs are likely to have functional, causes of complex disorders and traits. with SNP associations, demonstrating that causal consequences rather than being SNP-based case–control studies have these two approaches interrogate differ- linked only to the disorder or trait. limited power when the causal variation ent parts of the genome and arguing in Cataloguing millions of common is distributed over different chromosomal favour of combining both SNP and CNV human SNPs has already yielded impor- backgrounds; that is, no single tagging analysis in such investigations. This study tant insights into human chromosomal SNP allele or haplotype sufficiently represents the first attempt to evaluate the architecture and evolution. It has revealed discriminates affected subjects from impact of both SNPs and CNVs on gene a block-like structure of linkage dis- control subjects. This is in contrast to expression; however, the CNV molecular equilibrium, as well as the existence of causal CNVs, the recurrence of which definition must be refined before general areas of low or high recombination rate, is less of an issue because one need not conclusions about CNVs with respect leading to the identification of so-called follow specific haplotypes of each CNV to the effect of SNPs on phenotype can tagging SNPs — SNPs that can be used to but, rather, relate the overall gene dosage be made. predict with high probability the alleles at to the phenotype (for example, using log2 other co-segregating ‘tagged’ SNPs19. We ratios) (FIG. 3). This is illustrated in two Conclusions are only beginning to identify and cata- recent reports showing that a fraction of The recent demonstration of the consider- logue human CNVs, and our perception autism risk loci is likely to involve rare able plasticity of the human genome might of their impact is constrained by the cur- CNVs rather than SNPs83,87. These studies help to explain phenotypic discrepancies in rently unknown nature of their variability also show how CNVs could rapidly lead to the penetrance of genetic traits and/or and the technology limitations for CNV the identification of the genes that carry in the severity of the resulting phenotype; analysis. Indeed, one might anticipate that disease-causing mutations, many of which it might also provide new leads for the common CNVs are likely to affect the arise de novo87. detection of the molecular basis of recombination landscape in their vicinity, SNP and CNV patterns, together with common complex disorders. and thus the haplotypic relationships to CNV dosage and environmental factors, CNVs could offer some of the missing common genetic markers in these regions. could combine to produce the phenotype clues to the genetic enigma of complex It is therefore legitimate to assume that the of a trait or disease (FIG. 3). In support of disorders, as they could harbour genes or abundance of CNVs and their impact on the need of a combined SNP and CNV other functional elements that, either by chromosomal haplotypic architecture will genotyping approach, Stranger et al.68 copy number alterations or deleterious also be reflected in the informativeness of recently examined RNA levels in lympho- point mutations, cause or predispose to syntenic SNPs. For this reason, a common blastoid cell lines from 210 unrelated these phenotypes. Full exploration of the CNV is not likely to be adequately covered HapMap individuals and studied CNVs impact of CNVs on phenotype will neces- by SNPs, as markers that map within the using BAC arrays57; they concluded that as sitate CGH arrays with higher resolutions CNV (FIG. 2) have most probably been much as 18% of the detected genetic varia- and alternative methods such as multiplex excluded from the HapMap SNP markers tion in expression of around 15,000 genes ligation-dependent probe amplification because of their departure from Mendelian could be explained by variability in CNVs. (MLPA)88, multiplex amplification and probe inheritance or Hardy–Weinberg equi- Furthermore, comparing the efficiency hybridization (MAPH)89,90 or quantitative librium. SNP selection and current SNP of SNPs or CNVs in detecting expres- multiplex PCR of short fluorescent fragment maps are (by design) biased for markers sion QTLs through association studies, (QMPSF)91 for selected regions of the that show Mendelian inheritance. In they reported that fewer than 20% of the genome, allowing the detection of small multiallelic, recurrently occurring CNVs, detected CNV associations overlapped CNVs while providing precise quantitative

NATURE REVIEWS | GENETICS VOLUME 8 | AUGUST 2007 | 643 © 2007 Nature Publishing Group

PERSPECTIVES

estimates of copy numbers. In view of the Currently, our knowledge of CNVs is depart from Mendelian transmission92,93, growing awareness of the need to capture still incomplete, and higher-resolution in cases or controls. This information both types of variation in genome-wide whole-genome tiling arrays are needed to could not only pinpoint additional CNVs studies, the genotyping industry is cur- capture CNVs of smaller size; that is, in but also facilitate the identification of rently adapting genotyping platforms to the range of several kilobases. Meanwhile, chromosomal regions that are associated allow for simultaneous monitoring of granted the availability of high-density, with a given disease phenotype. SNPs and known CNVs. It is therefore whole-genome SNP genotyping arrays highly likely that in the foreseeable future that extract a considerable amount of Note added in proof high-quality diagnostic tools for a system- the genetic information (about 80%), In an elegant recent review, Lupski atic exploration of both types of variations investigators should be encouraged to estimated that the de novo locus-specific will be produced including the emerging report chromosomal regions that contain a mutation rates for rearrangements are cost-effective ultra high-throughput genome series of consecutive markers82 that are not between 100- and 10,000-fold greater than sequencing technologies. in Hardy–Weinberg equilibrium, or that those for point mutations94.

Glossary

Aneuploidy Hardy–Weinberg equilibrium Non-allelic homologous recombination Having more or less than the typical chromosome The binomial distribution of genotypes in a Recombination between non-allelic paralogous segmental number (46 for humans). population, such that frequencies of genotypes AA, duplications (also known as ); a major Aa and aa will be p2, 2pq and q2, respectively, mechanism leading to deletions, duplications and Array comparative genomic hybridization where p is the frequency of allele A, and q is the inversions, as well as complex structural polymorphism A technology in which sampled and reference DNA are frequency of allele a. and rearrangements in the human genome. differentially labelled and hybridized on BAC or oligonucleotide microarrays to show copy number High-resolution tiling path CGH arrays Paralogous sequence variants differences between the sampled genomes. Arrays for comparative genomic hybridization Genetic changes that are not due to polymorphism but (CGH) that offer a resolution in the order of bases to to nucleotide mismatches from paralogous copies of Association study kilobases. The arrays currently use BACs or long duplicated sequences of the genome. About 20% of the A population-based genetic study that examines oligonucleotides. SNPs deposited in databases are not true SNPs but whether a marker allele segregates with a phenotype paralogous sequence variants. (such as disease occurrence or a quantitative trait) at a Hypomorphic significantly higher rate than would be predicted by chance Penetrance Describes an allele that carries a mutation that causes alone. This is ascertained by genotyping variants in both The extent to which a given genotype manifests itself a partial loss of gene function. affected and unaffected or control individuals. in a given phenotype. The penetrance of some genotypes for some diseases is age-related, complicating the BAC Linkage disequilibrium determination of true penetrance. A DNA construct, derived from a fertility plasmid A measure of whether alleles at two loci coexist (or F-plasmid), which usually carries an insert within gametes in a population in a nonrandom fashion. Quantitative multiplex PCR of short fluorescent of 100–300 kb. Complete genomic libraries cloned in Alleles that are in linkage disequilibrium are found fragments BACs (or PACs, which are produced from P1-plasmids) together on the same haplotype more often than Semi-quantitative, high-throughput analysis of targeted have been useful in constructing arrays for array would be expected by chance. genomic alterations using locus-specific primers. comparative genomic hybridization experiments. Microsatellite Restriction fragment length polymorphism Copy number variant or polymorphism A class of repetitive DNA sequences, scattered A fragment length variant of a DNA sequence that A structural genomic variant that results in confined copy throughout the genome, that are made up of is generated through the gain or loss of a site for a number changes in a specific chromosomal region. If its tandemly organized repeats of 2–8 nucleotides in restriction . population allele frequency is less than 1%, it is referred length. They can be highly polymorphic and are to as a variant; if its frequency exceeds 1%, the term frequently used as molecular markers in population Tagging SNPs polymorphism is used. genetics studies. SNPs that are correlated with and therefore can serve as proxies for a set of variants with which they Duplicon Minisatellites are in linkage disequilibrium. A duplication, or portion thereof, of genomic Regions of DNA in which repeat units of 7–100 bp sequence that shows a high level of sequence identity Ultra high-throughput sequencing are arranged in tandem arrays of 0.5–30 kb long. (over 90%) to another region of a reference genome. A compendium of new sequencing technologies with a Also sometimes referred to as a low copy repeat or common aim to accelerate (from years to days or hours) segmental duplication. Multiplex amplification and probe hybridization and reduce the cost (from millions to thousands or A technique in which, following hybridization to hundreds of dollars) of sequencing. Genomic disorder immobilized samples of nucleic acid sequences, A disorder that results from the gain, loss or re-orientation amplification of each oligonucleotide probe yields a Uniparental disomy of a genomic region that often contains dosage-sensitive product of unique size. The copy number of target A state wherein both homologues (alleles) at a locus gene(s). The result is a genomic rearrangement (such sequences is reflected in the relative intensities of the derive from the same parent. Uniparental disomy of some as duplication, deletion and inversion). Segmental amplification products. chromosomal segments generates characteristic syndromes. duplications are often involved in the rearrangement event through non-allelic homologous recombination. Multiplex ligation-dependent probe amplification Variable number of locus A technique involving the ligation of two A locus that contains a variable number of short Haplotype Block adjacent annealing oligonucleotides followed by tandemly repeated DNA sequences that vary in length A chromosomal region in which groups of alleles at quantitative PCR amplification of the ligated products, and are highly polymorphic. different genetic loci are inherited together more often allowing the detection of deletions, duplications and than would be expected by chance. Adjacent blocks are trisomies, and characterization of chromosomal Whole-genome tiling array separated by recombination hotspots (short regions with aberrations in copy number or sequence and SNP A high-density oligonucleotide array that represents the high recombination rates). or mutation detection. majority of DNA sequences of an organism’s genome.

644 | AUGUST 2007 | VOLUME 8 www.nature.com/reviews/genetics © 2007 Nature Publishing Group

PERSPECTIVES

Jacques S. Beckmann is at the Department of 19. The International HapMap Consortium. A haplotype 45. Easton, D. F. et al. Genome-wide association study Medical Genetics, University of Lausanne and map of the human genome. Nature 437, identifies novel breast cancer susceptibility loci. 1299–1320 (2005). Nature 447, 1087–1093 (2007). Centre Hospitalier Universitaire Vaudois, 2 Avenue 20. Dutt, A. & Beroukhim, R. Single nucleotide 46. Stacey, S. N. et al. Common variants on chromosomes Pierre Decker, 1011 Lausanne, Switzerland. polymorphism array analysis of cancer. Curr. Opin. 2q35 and 16q12 confer susceptibility to estrogen Oncol. 19, 43–49 (2007). receptor-positive breast cancer. Nature Genet. 39, Xavier Estivill is at the Genes and Disease Program, 21. Varilo, T. & Peltonen, L. Isolates and their potential 865–869 (2007). Center for Genomic Regulation (CRG), National use in complex gene mapping efforts. Curr. Opin. 47. Goossens, M. et al. Triplicated α-globin loci in humans. Genotyping Center (CeGen), CIBERESP and Pompeu Genet. Dev. 14, 316–323 (2004). Proc. Natl Acad. Sci. USA 77, 518–521 (1980). Fabra University (UPF), Charles Darwin s/n, PRBB 22. Weir, B. S., Anderson, A. D. & Hepler, A. B. 48. Kan, Y. W. et al. Deletion of α-globin genes in Genetic relatedness analysis: modern data and haemoglobin-H disease demonstrates multiple building, E-08003 Barcelona, Catalonia, Spain. new challenges. Nature Rev. Genet. 7, 771–780 α-globin structural loci. Nature 255, 255–256 Stylianos E. Antonarakis is at the Department (2006). (1975). 23. Engel, E. Uniparental disomy revisited: the first twelve 49. Vollrath, D., Nathans, J. & Davis, R. W. Tandem array of Genetic Medicine and Development, years. Am. J. Med. Genet. 46, 670–674 (1993). of human visual pigment genes at Xq28. Science 240, University of Geneva Medical School, and University 24. Antonarakis, S. E. Parental origin of the extra 1669–1672 (1988). Hospital of Geneva, 1 rue Michel-Servet, chromosome in trisomy 21 as indicated by 50. Drummond-Borg, M., Deeb, S. S. & 1211 Geneva, Switzerland. analysis of DNA polymorphisms. Down Syndrome Motulsky, A. G. Molecular patterns of X Collaborative Group. N. Engl. J. Med. 324, chromosome-linked color vision genes among Correspondence to J.S.B., X.E. or S.E.A. 872–876 (1991). 134 men of European ancestry. Proc. Natl Acad. Sci. e-mails: [email protected]; 25. The Wellcome Trust Case Control Consortium. USA 86, 983–987 (1989). Genome-wide association study of 14,000 cases of 51. Wagner, F. F. & Flegel, W. A. RHD gene deletion [email protected]; seven common diseases and 3,000 shared controls. occurred in the Rhesus box. Blood 95, 3662–3668 [email protected] Nature 447, 661–678 (2007). (2000). 26. Edwards, A. O. et al. Complement factor H 52. Ji, Y., Eichler, E. E., Schwartz, S. & Nicholls, R. D. doi:10.1038/nrg2149 polymorphism and age-related macular degeneration. Structure of chromosomal duplicons and their role in Science 308, 421–424 (2005). mediating human genomic disorders. Genome Res. 1. Kan, Y. W. & Dozy, A. M. Polymorphism of DNA 27. Hageman, G. S. et al. A common haplotype in the 10, 597–610 (2000). sequence adjacent to human β-globin structural gene: complement regulatory gene factor H (HF1/CFH) 53. Lupski, J. R. et al. DNA duplication associated with relationship to sickle mutation. Proc. Natl Acad. Sci. predisposes individuals to age-related macular Charcot–Marie–Tooth disease type 1A. Cell 66, USA 75, 5631–5635 (1978). degeneration. Proc. Natl Acad. Sci. USA 102, 219–232 (1991). 2. Wyman, A. R. & White, R. A highly polymorphic locus 7227–7232 (2005). 54. Lee, J. A. & Lupski, J. R. Genomic rearrangements in human DNA. Proc. Natl Acad. Sci. USA 77, 28. Haines, J. L. et al. Complement factor H variant and gene copy-number alterations as a cause of 6754–6758 (1980). increases the risk of age-related macular nervous system disorders. 52, 103–121 3. Jeffreys, A. J., Wilson, V. & Thein, S. L. Hypervariable degeneration. Science 308, 419–421 (2005). (2006). ‘minisatellite’ regions in human DNA. Nature 314, 29. Klein, R. J. et al. Complement factor H polymorphism 55. Iafrate, A. J. et al. Detection of large-scale variation in 67–73 (1985). in age-related macular degeneration. Science 308, the human genome. Nature Genet. 36, 949–951 4. Jeffreys, A. J., Wilson, V. & Thein, S. L. 385–389 (2005). (2004). Individual-specific ‘fingerprints’ of human DNA. Nature 30. Bottini, N. et al. A functional variant of lymphoid 56. Sebat, J. et al. Large-scale copy number 316, 76–79 (1985). tyrosine phosphatase is associated with type I polymorphism in the human genome. Science 305, 5. Nakamura, Y. et al. Variable number of tandem repeat diabetes. Nature Genet. 36, 337–338 (2004). 525–528 (2004). (VNTR) markers for human gene mapping. Science 31. Smyth, D. J. et al. A genome-wide association study of 57. Redon, R. et al. Global variation in copy number 235, 1616–1622 (1987). nonsynonymous SNPs identifies a type 1 diabetes in the human genome. Nature 444, 444–454 6. Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. locus in the interferon-induced helicase (IFIH1) region. (2006). Construction of a genetic linkage map in man using Nature Genet. 38, 617–619 (2006). 58. Wong, K. K. et al. A comprehensive analysis restriction fragment length polymorphisms. Am. J. 32. Grant, S. F. et al. Variant of factor 7-like 2 of common copy-number variations in the Hum. Genet. 32, 314–331 (1980). (TCF7L2) gene confers risk of type 2 diabetes. Nature human genome. Am. J. Hum. Genet. 80, 7. Litt, M. & Luty, J. A. A hypervariable microsatellite Genet. 38, 320–323 (2006). 91–104 (2007). revealed by in vitro amplification of a dinucleotide 33. Sladek, R. et al. A genome-wide association study 59. Eichler, E. E. et al. Completing the map of repeat within the cardiac muscle actin gene. Am. J. identifies novel risk loci for type 2 diabetes. human genetic variation. Nature 447, 161–165 Hum. Genet. 44, 397–401 (1989). Nature 445, 881–885 (2007). (2007). 8. Weber, J. L. & May, P. E. Abundant class of human 34. Scott, L. J. et al. A genome-wide association study of 60. Cho, E. K. et al. Array-based comparative genomic DNA polymorphisms which can be typed using the type 2 diabetes in Finns detects multiple susceptibility hybridization and copy number variation in cancer polymerase chain reaction. Am. J. Hum. Genet. 44, variants. Science 316, 1341–1345 (2007). research. Cytogenet. Genome Res. 115, 262–272 388–396 (1989). 35. Zeggini, E. et al. Replication of genome-wide (2006). 9. Tautz, D. Hypervariability of simple sequences as association signals in UK samples reveals risk 61. Freeman, J. L. et al. Copy number variation: a general source for polymorphic DNA markers. loci for type 2 diabetes. Science 316, 1336–1341 new insights in genome diversity. Genome Res. 16, Nucleic Acids Res. 17, 6463–6471 (1989). (2007). 949–961 (2006). 10. Smeets, H. J., Brunner, H. G., Ropers, H. H. & 36. Saxena, R. et al. Genome-wide association analysis 62. Khaja, R. et al. Genome assembly comparison Wieringa, B. Use of variable simple sequence identifies loci for type 2 diabetes and triglyceride identifies structural variants in the human genome. motifs as genetic markers: application to study of levels. Science 316, 1331–1336 (2007). Nature Genet. 38, 1413–1418 (2006). myotonic dystrophy. Hum. Genet. 83, 245–251 37. Todd, J. A. et al. Robust associations of four new 63. Feuk, L., Carson, A. R. & Scherer, S. W. Structural (1989). chromosome regions from genome-wide analyses variation in the human genome. Nature Rev. Genet. 7, 11. Williamson, R. et al. Report of the DNA committee of type 1 diabetes. Nature Genet. 53, 1884–1889 85–97 (2006). and catalogues of cloned and mapped genes and DNA (2007). 64. Aitman, T. J. et al. Copy number polymorphism in polymorphisms. Cytogenet. Cell Genet. 55, 457–778 38. Frayling, T. M. et al. A common variant in the FTO FCGR3 predisposes to glomerulonephritis in rats and (1990). gene is associated with body mass index and humans. Nature 439, 851–855 (2006). 12. Economou, E. P., Bergen, A. W., Warren, A. C. & predisposes to childhood and adult obesity. 65. Antonarakis, S. E. & Beckmann, J. S. Mendelian Antonarakis, S. E. The polydeoxyadenylate tract of Science 316, 889–894 (2007). disorders deserve more attention. Nature Rev. Genet. Alu repetitive elements is polymorphic in the human 39. Dina, C. et al. Variation in FTO contributes to 7, 277–282 (2006). genome. Proc. Natl Acad. Sci. USA 87, 2951–2954 childhood obesity and severe adult obesity. 66. Feuk, L., Marshall, C. R., Wintle, R. F. & Scherer, S. W. (1990). Nature Genet. 39, 724–726 (2007). Structural variants: changing the landscape of 13. Kashi, Y. et al. Large restriction fragments containing 40. Helgadottir, A. et al. A common variant on chromosomes and design of disease studies. poly-TG are highly polymorphic in a variety of chromosome 9p21 affects the risk of myocardial Hum. Mol. Genet. 15, R57–R66 (2006). vertebrates. Nucleic Acids Res. 18, 1129–1132 infarction. Science 316, 1491–1493 (2007). 67. Vissers, L. E., Veltman, J. A., van Kessel, A. G. & (1990). 41. McPherson, R. et al. A common allele on chromosome Brunner, H. G. Identification of disease genes by 14. Beckmann, J. S. & Weber, J. L. Survey of human and 9 associated with coronary heart disease. whole genome CGH arrays. Hum. Mol. Genet. 14, rat microsatellites. 12, 627–631 (1992). Science 316, 1488–1491 (2007). R215–R223 (2005). 15. Weissenbach, J. et al. A second-generation linkage 42. Gudmundsson, J. et al. Genome-wide association 68. Stranger, B. E. et al. Relative impact of nucleotide map of the human genome. Nature 359, 794–801 study identifies a second prostate cancer and copy number variation on gene expression (1992). susceptibility variant at 8q24. Nature Genet. 39, phenotypes. Science 315, 848–853 (2007). 16. Murray, J. C. et al. A comprehensive human 631–637 (2007). 69. Thakkinstian, A. et al. Systematic review and linkage map with centimorgan density. Cooperative 43. Camp, N. J. et al. Compelling evidence for a prostate meta-analysis of the association between complement Human Linkage Center (CHLC). Science 265, cancer gene at 22q12. 3 by the International factor H Y402H polymorphisms and age-related 2049–2054 (1994). Consortium for Prostate Cancer Genetics. Hum. Mol. macular degeneration. Hum. Mol. Genet. 15, 17. Gyapay, G. et al. The 1993–94 Genethon human Genet. 16, 1271–1278 (2007). 2784–2790 (2006). genetic linkage map. Nature Genet. 7, 246–339 (1994). 44. Hunter, D. J. et al. A genome-wide association 70. Hughes, A. E. et al. A common CFH haplotype, with 18. Dib, C. et al. A comprehensive genetic map of the study identifies alleles in FGFR2 associated with deletion of CFHR1 and CFHR3, is associated human genome based on 5,264 microsatellites. risk of sporadic postmenopausal breast cancer. with lower risk of age-related macular degeneration. Nature 380, 152–154 (1996). Nature Genet. 39, 870–874 (2007). Nature Genet. 38, 1173–1177 (2006).

NATURE REVIEWS | GENETICS VOLUME 8 | AUGUST 2007 | 645 © 2007 Nature Publishing Group

PERSPECTIVES

71. Fremeaux-Bacchi, V. et al. The development of 82. Fellermann, K. et al. A chromosome 8 gene-cluster Acknowledgements atypical haemolytic–uraemic syndrome is polymorphism with low human β-defensin 2 gene copy We thank all the past and present members of our laborato- influenced by susceptibility factors in factor H number predisposes to Crohn disease of the colon. ries and clinics for ideas, debates and discussions. We thank and membrane cofactor protein: evidence from Am. J. Hum. Genet. 79, 439–448 (2006). J. Lupski for critical reading of the manuscript. Work in two independent cohorts. J. Med. Genet. 42, 83. Szatmari, P. et al. Mapping autism risk loci using J.S.B.’s laboratory is funded by grants from the SNF (Swiss 852–856 (2005). genetic linkage and chromosomal rearrangements. National Science Foundation) and the University of Lausanne, 72. Antonarakis, S. E., Lyle, R., Dermitzakis, E. T., Nature Genet. 39, 319–328 (2007). Switzerland. The laboratory of X.E. is supported by: the Reymond, A. & Deutsch, S. Chromosome 21 and 84. Ouahchi, K., Lindeman, N. & Lee, C. Copy number Departament d’Educació i Universitats and the Departament Down syndrome: from genomics to pathophysiology. variants and pharmacogenomics. Pharmacogenomics de Salut of the Catalan Autonomous Government (Generalitat Nature Rev. Genet. 5, 725–738 (2004). 7, 25–29 (2006). de Catalunya); the Ministry of Health and the Ministry of 73. Boerkoel, C. F., Inoue, K., Reiter, L. T., Warner, L. E. & 85. Estivill, X. et al. Chromosomal regions Education and Science of the Spanish Government; the Lupski, J. R. Molecular mechanisms for CMT1A containing high-density and ambiguously mapped European Union Sixth Framework Programme; and Genoma duplication and HNPP deletion. Ann. NY Acad. Sci. putative single nucleotide polymorphisms (SNPs) España. S.E.A.’s laboratory is supported by the SNF, European 883, 22–35 (1999). correlate with segmental duplications in the Union, US National Institutes of Health and the Lejeune and 74. Singleton, A. B. et al. α-Synuclein locus triplication human genome. Hum. Mol. Genet. 11, 1987–1995 ChildCare Foundations (France). causes Parkinson's disease. Science 302, 841 (2003). (2002). 75. Le Marechal, C. et al. Hereditary pancreatitis caused 86. Inoue, K. & Lupski, J. R. Molecular mechanisms for Competing interests statement by triplication of the trypsinogen locus. Nature Genet. genomic disorders. Annu. Rev. Genomics Hum. Genet. The authors declare no competing financial interests. 38, 1372–1374 (2006). 3, 199–242 (2002). 76. Rovelet-Lecrux, A. et al. APP locus duplication causes 87. Sebat, J. et al. Strong association of de novo autosomal dominant early-onset Alzheimer disease copy number mutations with autism. Science 316, DATABASES with cerebral amyloid angiopathy. Nature Genet. 38, 445–449 (2007). The following terms in this article are linked online to: 24–26 (2006). 88. Schouten, J. P. et al. Relative quantification of 40 Gene: http://www.ncbi.nlm.nih.gov/entrez/query. 77. Knight, J. C. Regulatory polymorphisms underlying nucleic acid sequences by multiplex ligation- fcgi?db=gene complex disease traits. J. Mol. Med. 83, 97–109 dependent probe amplification. Nucleic Acids Res. 30, CCL3L1 | CFH | CFHR1 | CFHR3 | C4 | DEFB4 | FCGR3B | HBB | (2005). e57 (2002). RHD 78. Stranger, B. E. & Dermitzakis, E. T. From DNA to RNA 89. Armour, J. A., Sismani, C., Patsalis, P. C. & Cross, G. OMIM: http://www.ncbi.nlm.nih.gov/entrez/query. to disease and back: the ‘central dogma’ of regulatory Measurement of locus copy number by hybridisation fcgi?db=OMIM disease variation. Hum. Genomics 2, 383–390 with amplifiable probes. Nucleic Acids Res. 28, Alzheimer disease | atypical haemolytic–uraemic syndrome | (2006). 605–609 (2000). BRCA1 | BRCA2 | hereditary neuropathy with liability to 79. Gonzalez, E. et al. The influence of CCL3L1 90. Sellner, L. N. & Taylor, G. R. MLPA and MAPH: pressure palsies | neurofibromatosis type 1 | Parkinson disease | gene-containing segmental duplications on new techniques for detection of gene deletions. tuberous sclerosis HIV-1/AIDS susceptibility. Science 307, 1434–1440 Hum. Mutat. 23, 413–419 (2004). FURTHER INFORMATION (2005). 91. Saugier-Veber, P. et al. Simple detection of genomic Centre for Genomic Regulation: http://pasteur.crg.es 80. Fanciulli, M. et al. FCGR3B copy number variation is microdeletions and microduplications using QMPSF Database of Genomic Variants: associated with susceptibility to systemic, but not in patients with idiopathic mental retardation. http://projects.tcag.ca/variation organ-specific, autoimmunity. Nature Genet. 39, Eur. J. Hum. Genet. 14, 1009–1017 (2006). DECIPHER: http://www.sanger.ac.uk/PostGenomics/decipher/ 721–723 (2007). 92. McCarroll, S. A. et al. Common deletion International HapMap Project: http://www.hapmap.org 81. Yang, Y. et al. Gene copy-number variation and polymorphisms in the human genome. Nature Genet. NCBI Single Nucleotide Polymorphism: associated polymorphisms of complement component 38, 86–92 (2006). http://www.ncbi.nlm.nih.gov/SNP C4 in human systemic lupus erythematosus (SLE): low 93. Conrad, D. F. et al. A worldwide survey of haplotype OMIM Morbid Map: copy number is a risk factor for and high copy number variation and linkage disequilibrium in the human http://www.ncbi.nlm.nih.gov/Omim/getmorbid.cgi is a protective factor against SLE susceptibility genome. Nature Genet. 38, 1251–1260 (2006). UCSC Genome Bioinformatics: http://genome.ucsc.edu in European Americans. Am. J. Hum. Genet. 80, 94. Lupski, J. R. Genomic rearrangements and sporadic Access to this links box is available online. 1037–1054 (2007). disease. Nature Genet. 39, S43–S47 (2007).

646 | AUGUST 2007 | VOLUME 8 www.nature.com/reviews/genetics © 2007 Nature Publishing Group