Origin of Noncoding DNA Sequences: Molecular Fossils of Genome Evolution


Proc. Natl. Acad. Sci. USA
Vol. 84, pp. 6195-6199, September 1987
Evolution

(noncoding DNA sequence/primordial genome/intron)

HIROTO NAORA*†, KAORU MIYAHARA*, AND ROBERT N. CURNOW‡

*Research School of Biological Sciences, The Australian National University, Canberra, A.C.T. 2601, Australia; and ‡Department of Applied Statistics, University of Reading, Reading, RG6 2AN, England

Communicated by D. G. Catcheside, April 20, 1987

†To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. We propose a hypothesis for the origin of these noncoding sequences that assumes that (i) an ≈0.55-kilobase (kb)-long reading frame composed the primordial gene and (ii) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The statistical distribution of stop codons allows examination of the probability of generating reading frames of ≈0.55 kb in this primordial polynucleotide. This analysis reveals that with three stop codons, a run of at least 0.55-kb equivalent length of nonstop codons would occur in 4.6% of 20-kb-long polynucleotide molecules. We attempt to estimate the total amount of noncoding sequences that would be present on the chromosomes of contemporary species assuming that present-day chromosomes retain the prototype primordial genome structure. Theoretical estimates thus obtained for most eukaryotes do not differ significantly from those reported for these specific organisms, with only a few exceptions. Furthermore, analysis of possible stop-codon distributions suggests that life on earth would not exist, at least in its present form, had two or four stop codons been selected early in evolution.

Different amounts of chromosomal noncoding sequences are present in various forms (cf. ref. 1, pp. 69-109). For example, most higher eukaryote protein-coding DNA sequences, exons, are interrupted by noncoding intron sequences that are removed from transcripts by RNA splicing (2). The chromosomal genes of higher eukaryotes appear to require surrounding noncoding (territorial) DNA sequences of a certain size for active function (3, 4). Although specific functions have been assigned to some noncoding sequences, most eukaryote cells possess a further excess of noncoding DNA sequences on chromosomes. Some sequences have been intensively characterized (cf. ref. 1, pp. 1-36). It was recently suggested that an excess of noncoding DNA sequences is a consequence of random duplications and deletions (5). However, whether the origin of these sequences is closely associated with gene evolution is yet unsettled.

Any effort to understand the origin of genes must ask whether the gene or the cell-like organization came first. Despite intensive discussions (6-10), this question remains unanswered. This paper is not concerned with this specific question but is an attempt to obtain more information about the origin of genes in relation to observed excesses of noncoding DNA sequences.

In this paper, we consider the possibility that a reading frame of ≈0.55 kilobase (kb) (11) composed the primordial gene; spatial distribution of stop codons is estimated statistically to test the probability that a reading frame of ≈0.55 kb can be generated in a primordial polynucleotide molecule. The analyses reveal (i) that a run of at least 0.55-kb equivalent length of nonstop codons occurs with a frequency of 4.6% in a 20-kb-long polynucleotide molecule, possibly the longest "genome" molecule of single-stranded polynucleotides that could have been polymerized in a primordial soup/cell, and (ii) that most higher eukaryote genomes still retain such a prototype genome structure, even after a series of gene duplications during evolution.

Prerequisite Assumptions

A few assumptions underlie the present analyses: (i) It is assumed that the formation of a primordial polynucleotide did not require a special template system and thus the products formed in the primordial soups/cells were single stranded without any selection of nucleotide sequences. Although various circumstantial observations support the view of RNA as the primordial polynucleotide, this is still controversial (12, 13). That issue is not dealt with here, but it is assumed that the original polynucleotide was single stranded. (ii) In primordial soups/cells, a functional protein of the minimum size was translated from an open reading frame of polynucleotide using a primordial protein-synthesizing machinery in which stop and nonstop codons were involved. However, no involvement of an initiation codon early in evolution is assumed. (iii) The number of stop codons was subject to selection by the primordial protein-synthesizing machinery.

Possible Length of a Primordial Gene

Previous observations showed that most, if not all, of the genes less than ≈0.55 kb in length do not possess any introns and that genes larger than ≈0.55 kb do possess introns (11). This observation has an important implication for the origin of genes because it is possible that an ≈0.55-kb-long open reading frame was the original form of a functional primordial gene. Some observations support this possibility.

First, thermally produced polymers of amino acids, proteinoids, display catalytic activities and are ≈18,000 Da in molecular mass (14); these values fall within the lower end of the molecular mass range of known proteins (15). Second, serine proteinases, e.g., α-lytic proteinase from Myxobacter 495, are regarded as being the oldest of all proteinases (16). This enzyme has 198 amino acid residues (17) and would be encoded by a reading frame of ≈0.6 kb. Third, the majority of proteins possessing >200 amino acid residues appear to contain structurally organized sections or domains (18, 19). Domains are often associated with individual functions (18-20) such as the binding of substrates. The most common domain size falls between 100 and 200 amino acid residues (18), values that correspond to 0.3- to 0.6-kb-long open reading frames. Finally, eukaryote proteins have molecular masses of n × 19,000 Da, where n = 1, 2, ..., whereas prokaryote proteins have molecular masses of n × 14,000 Da (21). Surprisingly, the unit size for eukaryote proteins corresponds to 172-amino acid residues and should be encoded by reading frames 0.5-0.6 kb long.

These observations, though circumstantial, support the idea that an open reading frame of ≈0.55 kb that contained information for biologically useful proteins served as the primordial gene early in genomic evolution.

Possible Length of a Primordial Polynucleotide Molecule

How long were the single-stranded polynucleotide chains that formed in the primordial soup/cell when biologically useful information arose spontaneously from these molecules? It is known that a single-stranded form of polynucleotide possesses a much higher susceptibility to biochemical, chemical, and physical degradation, compared with a double-stranded helical structure that is well stabilized by associative forces between bases (22). A microenvironment favoring polynucleotide degradation is likely to have existed early in evolution, thereby restricting the formation of gigantic single-stranded polynucleotide chains.

Apparent differences in polynucleotide degradation by biochemical and chemical factors between primordial and contemporary microenvironments do not jeopardize these assumptions about the early environment. A contemporary cell is well equipped with a regulatory facility composed of powerful polymerases, nucleases, and also powerful nuclease inhibitors. Conversely, a primordial soup/cell probably had neither powerful enzymes (15) nor inhibitors. Polymerization occurred at a much reduced rate and hence required much more time in the primordial soup/cell. The concentration of polynucleotides formed in a primordial soup/cell might

Table 1. Size ranges of single-stranded polynucleotides found in contemporary organisms

Type          Origin            Hosts         Approximate size, kb
Viral*
  ssDNA       Geminivirus       Plants        21-24
              Inoviridae        Bacteria      6-8
              Parvoviridae      Animals       4.5-6
  ssRNA       Coronaviridae     Animals       16-24
              Paramyxoviridae   Animals       15-20
              Bunyaviridae      Animals       9-15
Cellular
  Pre-mRNA                      Eukaryotes    15†
  mRNA                          Eukaryotes    14-22‡
                                              9.5§
                                              8.5-8.7¶
  Pre-rRNA                      Eukaryotes    14

*Polynucleotides of viral origin longer than 15 kb are included, except for single-stranded DNA (cf. refs. 23 and 24). ssDNA, single-stranded DNA; ssRNA, single-stranded RNA.
†Approximately 15-kb-long specific transcripts that contain globin sequences have been reported (25).
‡Human and monkey apolipoprotein B mRNA prepared from livers or intestines. There are variations of the reported sizes (26-29); the average value is 20 kb.
§c-abl-containing transcripts of chronic myelocytic leukemia (K-562) cells (30).
¶mRNA transcribed from human thyroglobulin gene (>200 kb), the largest gene known to date (31).

data reflects the absolute absence of gigantic single-stranded RNA molecules. During the evolution of large genes, a system may have developed that processes nascent pre-
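The abstract's claim that a run of at least 0.55 kb of nonstop codons occurs in 4.6% of 20-kb molecules when there are three stop codons can be checked with a short calculation. The sketch below is not the authors' own procedure, which this excerpt does not show. It assumes a single reading frame, all 64 codons equally likely and independent, 0.55 kb rounded down to 183 codons, and 20 kb rounded down to 6666 codons. The function name `prob_open_run` and its parameters are hypothetical. Under these assumptions the three-stop-codon case comes out at a few percent, close to the quoted figure, while two stop codons give a much higher frequency and four a much lower one.

```python
def prob_open_run(total_codons, run_codons, n_stop, n_codons=64):
    """Probability that a random codon string of length total_codons
    contains at least one run of run_codons consecutive nonstop codons.

    Assumption (not from the paper): codons are independent and each of
    the n_codons codons is equally likely, read in one fixed frame.
    """
    if total_codons < run_codons:
        return 0.0
    q = (n_codons - n_stop) / n_codons   # chance that one codon is nonstop
    p = 1.0 - q                          # chance that one codon is a stop
    q_run = q ** run_codons              # chance of run_codons nonstops in a row

    # no_run[n] = probability that the first n codons contain no qualifying run.
    no_run = [1.0] * (total_codons + 1)
    no_run[run_codons] = 1.0 - q_run
    for n in range(run_codons + 1, total_codons + 1):
        # A first run ending exactly at codon n needs: no run in the first
        # n - run_codons - 1 codons, then a stop, then run_codons nonstops.
        no_run[n] = no_run[n - 1] - p * q_run * no_run[n - run_codons - 1]
    return 1.0 - no_run[total_codons]


# Assumed conversions: 3 nucleotides per codon, lengths rounded down.
GENOME_CODONS = 20_000 // 3   # 20 kb  -> 6666 codons
GENE_CODONS = 550 // 3        # 0.55 kb -> 183 codons

for stops in (2, 3, 4):
    prob = prob_open_run(GENOME_CODONS, GENE_CODONS, stops)
    print(f"{stops} stop codons: {100 * prob:.2f}% of molecules")
```

The recurrence runs in time linear in the molecule length, so the rounding assumptions (for example 184 instead of 183 codons) are cheap to vary when testing how sensitive the result is to them.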
