PERSPECTIVES

OPINION

The evolution of isochores

Adam Eyre-Walker and Laurence D. Hurst

One of the most striking features of mammalian chromosomes is the variation in G+C content that occurs over scales of hundreds of kilobases to megabases, the so-called ‘isochore’ structure of the human genome. This variation in base composition affects both coding and non-coding sequences and seems to reflect a fundamental level of genome organization. However, although we have known about isochores for over 25 years, we still have a poor understanding of why they exist. In this article, we review the current evidence for the three main hypotheses.

With sequencing almost complete, it is tempting to forget how large the human genome is (~3.4 × 10⁹ base pairs (bp) in size). However, only a small fraction of this sequence is known to have any function. It is estimated that there are ~30,000 genes in the human genome¹,² that produce mRNAs that are on average 1,500 bp in length; so, less than 2% of the genome codes for proteins. A similar amount of DNA might be involved in gene regulation and chromosome structure³, but most seems to have no function, and so has been called ‘junk’ DNA. However, this junk DNA is not without structure because base composition (that is, the proportions of A, C, T and G) varies along chromosomes over a large scale. For example, the telomeric 10 Mb of 17q is 50% G and C, whereas that of the adjacent 3.9 Mb of the chromosome is only 38% G and C¹. This shows that genomes, like organisms, have an anatomy. But is this anatomy the consequence of selection, or is it a by-product of another cellular process?

This large-scale variation in base composition was discovered almost 30 years ago by Bernardi and colleagues⁴. They separated bovine genomic DNA, which had been sheared into large fragments, according to its G+C content, by ultracentrifugation, and found that there was substantial variation in its composition. Subsequent studies showed that compositional variation was a feature of the genomes of both mammals and birds, and that the G+C content of large (>300-kb) blocks of DNA varied from ~35 to 55% in a typical mammal⁵. It was originally thought that this variation in base composition was arranged in ‘isochores’, large blocks of DNA of homogeneous G+C content that were separated by borders of sharp transition (FIG. 1a). In reality, it seems that only some parts of human chromosomes fit this model. The human major histocompatibility (MHC) locus is a case in point; the MHC class II and III regions have consistent G+C contents of ~40 and ~52%, respectively, separated by a BOUNDARY of sharp transition (FIG. 1b). But this picture breaks down in the MHC class I region, in which G+C content varies between 52 and 42% with no obvious structure. So, although it is clear that much of the genome does not fit the classic isochore model, we use the term ‘isochore’ in this review to refer generally to large regions of the genome that contain local similarities in base content.

Figure 1 | Large-scale variation in G+C content. a | The classic isochore model. b | G+C content across the three classes of human major histocompatibility (MHC) region of chromosome 6 (data from GenBank); G+C content is plotted as a moving average, and the window size is 100 kb, advanced by 10 kb each step. Axes: position (kb) against G+C content; regions labelled in panel b: Class II, Class III, Class I.
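The moving average described in the Figure 1 caption (a 100-kb window advanced by 10 kb each step) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `gc_windows` and the toy sequence are hypothetical, and bases other than A, C, G and T are left out of the denominator, which is a choice made here rather than one stated in the article.

```python
def gc_windows(seq, window=100_000, step=10_000):
    """Return (start, G+C fraction) for each full window along seq.

    Hypothetical helper. Bases other than A, C, G and T (for example N)
    are excluded from the denominator; this is an assumption.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at:  # skip windows that contain no unambiguous bases
            results.append((start, gc / (gc + at)))
    return results


# Toy sequence (made up): a (G+C)-poor block followed by a (G+C)-rich block.
toy = "ATATATATGC" * 3 + "GCGCGCGCAT" * 3
# Small window and step so the toy example produces several points.
profile = gc_windows(toy, window=20, step=10)
```

On the toy sequence the profile steps from a low G+C fraction to a high one, with an intermediate value for the window that straddles the junction, which is the sharp-boundary pattern of the classic isochore model in panel a.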
NATURE REVIEWS | GENETICS VOLUME 2 | JULY 2001 | 549 © 2001 Macmillan Magazines Ltd

Isochores reflect a level of genome organization. This is because gene density and that of short interspersed repetitive DNA elements (SINES), as well as recombination frequency, are higher in the (G+C)-rich parts of the genome, whereas long interspersed repetitive DNA elements (LINES) are almost exclusively restricted to the (G+C)-poor parts of the genome¹,⁵. Furthermore, the two isochore boundaries that have been studied in detail seem to represent a boundary in more than composition: the isochore boundary between the MHC class II and III regions is reflected in the different times at which these regions replicate (with the (G+C)-rich region replicating earlier)⁶; the boundary at the neurofibromatosis (NF1) region is reflected in differing recombination rates (with the (G+C)-rich region showing higher recombination levels)⁷.

The G+C content of a gene is highly correlated to the G+C content of the region of the genome in which it is found⁸,⁹. This is particularly evident at the largely SILENT, third-codon position (denoted GC3, to indicate the proportion of codons that end in G or C) (FIG. […] although the variation in GC3 is much greater than the variation in isochore G+C content (~30–90% for GC3, versus ~35–60% for isochores⁹), the correlation is very strong (FIG. 2). This indicates that whatever causes the variation in GC3 is the main determinant of the variation in isochore G+C. It seems likely that isochore G+C content is less variable than GC3 because of the integration of repetitive DNA elements in non-coding DNA (see below).

The proposed causes

There has been considerable interest over the past 15 years in explaining why there is large-scale variation in base composition along chromosomes. It has been suggested that the variation could be a consequence of three processes: mutation bias¹¹⁻¹³, natural selection¹⁴⁻¹⁶, or BIASED GENE CONVERSION (BGC)¹⁷,¹⁸. These hypotheses can be grouped into two categories: those that involve natural selection and those that do not. However, the effects of BGC, when mathematically modelled, are considered to be equivalent to those of weak DIRECTIONAL SELECTION¹⁹, so natural selection and BGC are often grouped together.

These hypotheses are not mutually exclusive: two or more of the processes could be acting together, or selection could be acting on the pattern of mutation bias or the process of BGC. However, for simplicity we assume that isochores are a consequence of one process, and that the process is responsible for both their formation and subsequent maintenance.

[…] example, G and C nucleotides are preferentially misincorporated into DNA if the DNA is replicated in a pool of free nucleotides rich in G and C (REFS 20–22); second, that free nucleotide concentrations vary during the cell cycle²³,²⁴; and last, that some parts of the genome are generally replicated early, whereas others are replicated late²⁵. So, regions of the genome that replicate at different times should have different mutation patterns and, therefore, different compositions. However, these observations were made in somatic cell lines and not in the germ line (only mutations in the germ line contribute to evolution), and direct evidence for a relationship between G+C content and replication time is ambiguous. Most of the (G+C)-rich chromosome bands replicate in the first half of S phase²⁶, and the class III region of the MHC replicates before the class II region, which is less (G+C)-rich⁶. However, the class III region around the TNFα (tumour-necrosis factor-α) gene replicates before the region around the TNFX gene⁶, despite having a slightly lower G+C content²⁷ than the TNFX region. Furthermore, there is no clear, general relationship between GC3 (or isochore G+C content) and replication time in humans and in mice²⁸.

Filipski¹¹ has suggested that variation in the efficiency of DNA repair might be responsible for the formation and maintenance of isochores, as the efficiency of certain types of DNA repair is known to vary across the genome²⁹, and because some types of repair are biased. For example, base mismatches introduced into human cell lines are preferentially repaired to GC (REF. 30). So, variation in repair efficiency should cause variation in the pattern of mutation. However, theoretical analyses show that variation in base composition is limited according to this model under most conditions³¹, and repair has never been shown to vary over the scales needed to generate isochores.

Recently, Fryxell and Zuckerkandl³² suggested that isochores are a consequence of CYTOSINE DEAMINATION. The deamination of methyl-cytosine and cytosine (that is, C→T and C→U, respectively) is expected to

Figure 2 | Correlation between G+C content of a gene and that of its surrounding region. The correlation between GC3 and the G+C content of the 50 kb that surrounds (~25 kb each side) each of 369 genes on human chromosomes 21 and 22 (data from GenBank). Only annotated genes that are multiples of three and that include a stop codon are included. The correlation coefficient is 0.73, which is highly significant (p < 0.0001). Axes: genomic G+C content against GC3.
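The two quantities behind Figure 2, GC3 for a gene and its correlation with the G+C content of the flanking region, can be sketched as below. This is an illustrative sketch only, not the authors' pipeline: the helper names `gc3` and `pearson` are hypothetical, the toy sequences and flanking values are made up, and counting the stop codon's third position in GC3 is an assumption (the article does not say whether it was counted).

```python
from math import sqrt


def gc3(cds):
    """G+C fraction at third codon positions of a coding sequence.

    Assumption: the stop codon, if present, is counted like any other codon.
    """
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of three")
    thirds = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in thirds) / len(thirds)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Made-up toy data, not taken from the article:
# (coding sequence, G+C fraction of the surrounding region).
toy_genes = [
    ("ATGAAATTTTAA", 0.38),
    ("ATGGCCAAGTGA", 0.45),
    ("ATGGCCCTGTAG", 0.55),
]
gc3_values = [gc3(cds) for cds, _ in toy_genes]
flank_values = [flank for _, flank in toy_genes]
r = pearson(gc3_values, flank_values)
```

The length check in `gc3` mirrors the caption's restriction to annotated genes whose length is a multiple of three. In a real analysis the flanking G+C values would come from the ~25 kb on each side of each gene rather than from hand-entered numbers.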