
652 What controls the length of noncoding DNA? Josep M Comeron Several recent studies of genome evolution indicate that the Eukaryotic genomes consist mostly of DNA sequences that rate of DNA loss exceeds that of DNA gain, leading to an are not part of coding regions, regulatory elements, or RNA underlying mutational pressure towards collapsing the length of genes, and it is this apparently superfluous DNA that noncoding DNA. That such a collapse is not observed suggests contributes to the variation in C-value. Most euchromatic opposing mechanisms favoring longer noncoding regions. The noncoding DNA comprises introns and intergenic regions, presence of transposable elements alone also does not explain with repetitive sequences of different types — satellite and observed features of noncoding DNA. At present, a microsatellite DNA, and different classes of transposable ele- multidisciplinary approach — using population genetics ments (TEs). In this review, I begin by describing several techniques, large-scale genomic analyses, and in silico genomic features of euchromatic noncoding DNA, including evolution — is beginning to provide new and valuable insights intron and intergenic length, and TE and microsatellite into the forces that shape the length of noncoding DNA and, presence, focusing particularly on their relationship with ultimately, genome size. Recombination, in a broad sense, might recombination rates across genomes. Next, I detail results of be the missing key parameter for understanding the observed studies on the mutational tendencies of small indels (inser- variation in length of noncoding DNA in eukaryotes. tions or deletions). Finally, I discuss selective forces that may be influencing the length of noncoding regions. Addresses Department of Ecology and Evolution, University of Chicago, Genome features 1101 East 57th Street, Chicago, Illinois 60637, USA; Intron size e-mail: [email protected] In an innovative study, Lengyel and Penman [7] showed Current Opinion in Genetics & Development 2001, 11:652–659 that the size of hnRNA, but not mature mRNA, increases with genome size in dipterans. They reported significant- 0959-437X/01/$ — see front matter © 2001 Elsevier Science Ltd. All rights reserved. ly longer hnRNA in Aedes than in Drosophila, consistent with Aedes’ 5–6-fold larger genome. This observation, Abbreviations dated before the discovery of the intervening sequences or DB deletion bias DOA dead-on-arrival introns in 1977, was the first indication of a positive rela- Indel insertion or deletion tionship between genome size and total intron length. A IS interference selection significant, although weak, positive relationship between ODB overall deletion bias intron and genome size has now been established for many PDB polymorphic deletion bias other eukaryotes [8–11], both on a large evolutionary time- TE transposable element scale [11] and between Drosophila species [9]. In all cases, however, the differences in intron size alone cannot fully Introduction account for the differences in euchromatic genome size, The total amount of DNA in a haploid genome, known as the indicating that the differences in genome size are not easily C-value, varies enormously among eukaryotic organisms, rang- explained by a single class of noncoding DNA. Variation ing from 1.2× 107 bp (e.g. the yeast Saccharomyces cerevisiae) to in genome size among organisms is usually associated to >6×1011 bp (e.g. Amoeba dubia). More intriguingly, this irregu- congruent changes across different classes of noncoding larity is also frequently observed between closely related DNA (e.g. introns and intergenic regions), suggesting that species (i.e. within the same taxonomic group or genus) with they may be responding to similar, or at least overlapping, similar levels of complexity (morphological, developmental, evolutionary forces. Two factors that can influence the behavioral, etc.), number of genes, and regulatory networks. length of introns and intergenic regions are TE and, to a As emblematic examples, the size of genomes of flowering lesser degree, microsatellite presence. plants varies 1000-fold, and the DNA content can change >200- and 100-fold among vertebrates and species of salaman- Transposable elements ders, respectively [1,2]. The human genome size is not TEs are ubiquitous in all eukaryotes and can be a leading exceptional among vertebrates, with a genome larger than that factor influencing the length of noncoding DNA and of birds but smaller than those of most fishes or amphibians; genome size in some species. For instance, in humans, mammalian genomes vary by >40% in length [3]. Drosophila recognizable TEs represent a fraction as large as 45% of the species provide another interesting case, where differences in euchromatic DNA, and are abundantly present in introns genome size show no obvious relationship to rate of cell [12,13••]. In other organisms, such as Drosophila melanogaster, division or development time, nor do they show clear phylo- TEs have a more limited influence on the overall size of non- genetic trends [4], indicating that the forces involved in coding sequences, with minor effects on intron length; only genome size change may act quite rapidly. This lack of corre- 0.4% of introns in genes with known full-length mRNAs spondence between genome size and biological complexity have detectable remnants of TE sequences (JM Comeron, has been called the C-value paradox or enigma [1,5,6]. unpublished data). The rare detection of TEs in Drosophila What controls the length of noncoding DNA? Comeron 653 introns is not unexpected because they may often alter the recombination rates (i.e. G+C-poor isochores). Moreover, accuracy of transcript processing, mostly with deleterious gene density and the length of intergenic regions also consequences on fitness. In organisms with much longer decrease with recombination [13••], suggesting qualita- introns (e.g. humans), the insertion of TEs may have an tively similar forces shaping the length of both types of attenuated effect on gene expression and would further noncoding DNA across genomes. increase the chance of successive insertions. An equivalent argument can be made for intergenic regions based on the Further, the observed relationship between the length of consequence on gene expression of long terminal repeats in noncoding DNA and recombination in the human genome the proximity of a gene. can be only partially attributed to TE presence. For instance, both median intron length and gene density vary TE invasion and expansion can cause rapid changes in ~10-fold between GC-rich and GC-poor genomic regions genome size, generating differences between very closely but the proportion of the genome comprising TEs varies related species that might share most other forces that much less conspicuously between these regions [13••,35]. influence genome size. An extreme example is the It is possible to argue that the limited variation of TE genome of maize, which has doubled its size as a result of presence is caused by the fact that TEs in regions with low retrotransposon insertions, mostly in the last three million recombination are older than in regions with high recom- years [14]. In Drosophila, TE distribution can also vary bination; they therefore tend to be no longer recognizable among closely related species, or even among populations but still contribute to size. This possibility is unlikely in of the same species [15•]. The recent invasion by the humans, however, where the opposite trend is observed, Penelope TE of D. virilis [16•] is an interesting case in point. with TEs younger in GC-poor isochores [13••], ruling out a simple mutational scenario with rare TE insertions and Microsatellites frequent small deletions [36,37] to explain variation in The main mutational mechanism causing changes in the length of noncoding DNA in humans. number of microsatellite repeats is polymerase template slippage that is not corrected by the mismatch repair In Drosophila, introns also tend to be longer in regions of system [17,18]. In Drosophila, recent studies have revealed low recombination [32••,38••]. This relationship is mostly differences both in density and length of microsatellites caused by genes with long coding sequence (JM Comeron, among species [19•–21•]. These differences concur with unpublished data) and is not explained by TE presence the idea that larger genomes also tend to have more and alone (see above). Congruently, gene density also increas- longer microsatellites [22]. Among distant organisms, es with recombination in D. melanogaster (JM Comeron, Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis unpublished data), paralleling the relationship between thaliana, D. melanogaster, and Homo sapiens show a congruent the length of noncoding regions and recombination increase in overall microsatellite presence with genome size observed in the human genome. A rare case of very when both length and density of microsatellites are long introns can be found also in Drosophila, where genes considered [23–25,26•,27]. This relationship, however, is located in the heterochromatic Y chromosome can have not universal, with the pufferfish Fugu rubripes having a mega-introns (>1 Mb) with large clusters of satellite DNA higher density and longer microsatellites than humans but [39,40•]. This result suggests either that recombination is with a genome eight times smaller [28]. Nevertheless, the required to prevent a one-way growth process for satellite general trend suggests
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages8 Page
-
File Size-