
Proc. Natl. Acad. Sci. USA
Vol. 84, pp. 6195-6199, September 1987

Origin of noncoding DNA sequences: Molecular fossils of genome evolution
(noncoding DNA sequence/primordial genome/)

HIROTO NAORA*†, KAORU MIYAHARA*, AND ROBERT N. CURNOW‡

*Research School of Biological Sciences, The Australian National University, Canberra, A.C.T. 2601, Australia; and ‡Department of Applied Statistics, University of Reading, Reading, RG6 2AN, England

Communicated by D. G. Catcheside, April 20, 1987

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

†To whom reprint requests should be addressed.

ABSTRACT The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. We propose a hypothesis for the origin of these noncoding sequences that assumes that (a) an ~0.55-kilobase (kb)-long reading frame composed the primordial gene and (b) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The statistical distribution of stop codons allows examination of the probability of generating reading frames of ≈0.55 kb in this primordial polynucleotide. This analysis reveals that with three stop codons, a run of at least 0.55-kb equivalent length of nonstop codons would occur in 4.6% of 20-kb-long polynucleotide molecules. We attempt to estimate the total amount of noncoding sequences that would be present on the chromosomes of contemporary species assuming that present-day chromosomes retain the prototype primordial genome structure. Theoretical estimates thus obtained for most eukaryotes do not differ significantly from those reported for these specific organisms, with only a few exceptions. Furthermore, analysis of possible stop-codon distributions suggests that life on earth would not exist, at least in its present form, had two or four stop codons been selected early in evolution.

Different amounts of chromosomal noncoding sequences are present in various life forms (cf. ref. 1, pp. 69-109). For example, most higher eukaryote protein-coding DNA sequences are interrupted by noncoding intron sequences that are removed from transcripts by RNA splicing (2). The chromosomal genes of higher eukaryotes appear to require surrounding noncoding (territorial) DNA sequences of a certain size for active function (3, 4). Although specific functions have been assigned to some noncoding sequences, most eukaryote cells possess a further excess of noncoding DNA sequences on chromosomes. Some sequences have been intensively characterized (cf. ref. 1, pp. 1-36). It was recently suggested that an excess of noncoding DNA sequences is a consequence of random duplications and deletions (5). However, whether the origin of these sequences is closely associated with gene evolution is yet unsettled.

Any effort to understand the origin of genes must ask whether the gene or the cell-like organization came first. Despite intensive discussions (6-10), this question remains unanswered. This paper is not concerned with this specific question but is an attempt to obtain more information about the origin of genes in relation to observed excesses of noncoding DNA sequences.

In this paper, we consider the possibility that a reading frame of ~0.55 kilobase (kb) (11) composed the primordial gene; the spatial distribution of stop codons is estimated statistically to test the probability that a reading frame of ≈0.55 kb can be generated in a primordial molecule. The analyses reveal (i) that a run of at least 0.55-kb equivalent length of nonstop codons occurs with a frequency of 4.6% in a 20-kb-long polynucleotide molecule, possibly the longest single-stranded "genome" molecule that could have been polymerized in a primordial soup/cell, and (ii) that most higher eukaryotes still retain such a prototype genome structure, even after a series of gene duplications during evolution.

Prerequisite Assumptions

A few assumptions underlie the present analyses: (i) It is assumed that the formation of a primordial polynucleotide did not require a special template system and thus the products formed in the primordial soups/cells were single stranded without any selection of sequences. Although various circumstantial observations support the view of RNA as the primordial polynucleotide, this is still controversial (12, 13). That issue is not dealt with here, but it is assumed that the original polynucleotide was single stranded. (ii) In primordial soups/cells, a functional protein of the minimum size was translated from an open reading frame of polynucleotide using a primordial protein-synthesizing machinery in which stop and nonstop codons were involved. However, no involvement of an initiation codon early in evolution is assumed. (iii) The number of stop codons was subject to selection by the primordial protein-synthesizing machinery.

Possible Length of a Primordial Gene

Previous observations showed that most, if not all, of the genes less than ~0.55 kb in length do not possess any introns and that genes larger than ≈0.55 kb do possess introns (11). This observation has an important implication for the origin of genes because it is possible that an ~0.55-kb-long open reading frame was the original form of a functional primordial gene. Some observations support this possibility.

First, thermally produced polymers of amino acids, proteinoids, display catalytic activities and are ≈18,000 Da in molecular mass (14); these values fall within the lower end of the molecular mass range of known enzymes (15). Second, serine proteinases, e.g., α-lytic proteinase from Myxobacter 495, are regarded as being the oldest of all proteinases (16). This proteinase has 198 residues (17) and would be encoded by a reading frame of ~0.6 kb. Third, the majority of proteins possessing >200 amino acid residues appear to contain structurally organized sections or domains (18, 19). Domains are often associated with functions (18-20) such as the binding of substrates. The most common domain size falls between 100 and 200 amino acid residues (18), values that correspond to 0.3- to 0.6-kb-long open reading frames. Finally, eukaryote proteins have molecular masses of n × 19,000 Da, where n = 1, 2, . . ., whereas
prokaryote proteins have molecular masses of n × 14,000 Da (21). Surprisingly, the unit size for eukaryote proteins corresponds to 172 amino acid residues and should be encoded by reading frames 0.5-0.6 kb long.

These observations, though circumstantial, support the idea that an open reading frame of ~0.55 kb that contained information for biologically useful proteins served as the primordial gene early in genomic evolution.

Possible Length of a Primordial Polynucleotide Molecule

How long were the single-stranded polynucleotide chains that formed in the primordial soup/cell when biologically useful information arose spontaneously from these molecules? It is known that a single-stranded form of polynucleotide possesses a much higher susceptibility to biochemical, chemical, and physical degradation, compared with a double-stranded helical structure that is well stabilized by associative forces between bases (22). A microenvironment favoring polynucleotide degradation is likely to have existed early in evolution, thereby restricting the formation of gigantic single-stranded polynucleotide chains.

Apparent differences in polynucleotide degradation by biochemical and chemical factors between primordial and contemporary microenvironments do not jeopardize these assumptions about the early environment. A contemporary cell is well equipped with a regulatory facility composed of powerful polymerases, nucleases, and also powerful nuclease inhibitors. Conversely, a primordial soup/cell probably had neither powerful enzymes (15) nor inhibitors. Polymerization occurred at a much reduced rate and hence required much more time in the primordial soup/cell. The concentration of polynucleotides formed in a primordial soup/cell might not differ significantly from the contemporary level. Furthermore, primordial polynucleotide molecules presumably possessed a built-in resistance to physical degradation (e.g., from thermal movement) similar to that of contemporary molecules (22). Thus, the primordial microenvironments would not differ significantly from contemporary microenvironments in the factors limiting the size of single-stranded polynucleotides. Therefore, a hint regarding the longest length of primordial polynucleotide molecules might be given from the sizes of contemporary single-stranded RNA and DNA.

At least two types of polynucleotide are relevant to the present discussion. The sizes of viral single-stranded DNA and RNA genomes have been collected (23, 24). Some larger viral genomes are listed in Table 1. Note that the longest reported length for single-stranded DNA or RNA is generally in the order of 20 kb, although some reach 24 kb (23) or slightly longer. No viruses possessing single-stranded DNA or RNA significantly longer than these latter lengths have been found (23). In single-stranded viral genomes, [...] rate would influence the size limitation. [Obviously, much larger double-stranded forms of viral DNA or RNA have been detected in a wide variety of viruses (23).] These observations are not due to technical difficulties in detection but are the consequence of natural limitations on the size of single-stranded polynucleotides that can survive in the present environment.

Table 1. Size ranges of single-stranded polynucleotides found in contemporary organisms

Type         Origin         Hosts        Approximate size, kb
Viral*
  ssDNA      Geminivirus    [...]        21-24
             Inoviridae     [...]        6-8
             Parvoviridae   [...]        4.5-6
  ssRNA      [...]          Animals      16-24
             [...]          Animals      15-20
             Bunyaviridae   Animals      9-15
Cellular
  Pre-mRNA                  Eukaryotes   15†
  mRNA                      Eukaryotes   14-22‡
                                         9.5§
                                         8.5-8.7¶
  Pre-rRNA                  Eukaryotes   14

*Polynucleotides of viral origin longer than 15 kb are included, except for single-stranded DNA (cf. refs. 23 and 24). ssDNA, single-stranded DNA; ssRNA, single-stranded RNA.
†Approximately 15-kb-long specific transcripts that contain globin sequences have been reported (25).
‡Human and monkey apolipoprotein B mRNA prepared from livers or intestines. There are variations of the reported sizes (26-29); the average value is 20 kb.
§c-abl-containing transcripts of chronic myelocytic leukemia (K-562) cells (30).
¶mRNA transcribed from thyroglobulin gene (>200 kb), the largest gene known to date (31).

This idea is further supported by the fact that cellular single-stranded RNAs known to date do not markedly exceed 20 kb (see Table 1). A number of genes longer than 20 kb in total length have been discovered in various organisms. For example, Ubx and bxd of Drosophila are ~75 and >25 kb long, respectively (32). Yet, no conclusive evidence for pre-mRNA transcripts significantly longer than 20 kb has been reported (for example, ref. 32). This might in part be due to technical difficulties in detecting minute amounts of highly labile pre-mRNA molecules. However, we propose that this lack of data reflects the absolute absence of gigantic single-stranded RNA molecules. During the evolution of genes, a system may have developed that processes nascent pre-mRNA during transcription and avoids the formation of large (>20 kb) pre-mRNA molecules. In fact, recent studies have suggested that RNA splicing may occur while precursor RNA is transcribed on chromosomes (33, 34). RNA splicing has also been observed in prokaryote cells, where coupled transcription-translation takes place (35, 36), indicating the coupling of RNA splicing with transcription before translation. If Drosophila Ubx pre-mRNA were not processed for splicing during transcription for 75 min (32), single-stranded pre-mRNA of >20 kb would have to wait for nearly 55 min for processing in the nuclear microenvironment. The "75S" transcripts of the Chironomus Balbiani ring are probably the largest intracellular RNA species thus far reported (37, 38); however, this figure is almost certainly overestimated. A value of about 37 kb was suggested (39), but the actual size is still uncertain (40, 41).

Furthermore, the extreme scarcity of monomer proteins larger than 700,000 Da that require >20-kb-long mRNA is consistent with the assumption that >20-kb-long RNA cannot safely survive in contemporary microenvironments.

Therefore, no single-stranded contemporary polynucleotides significantly larger than 20 kb may exist. It is consequently proposed that a 20-kb-long single-stranded polynucleotide was the longest molecule polymerized at random without a specific template and was the progenitor genomic form in primordial soup/cells.

Distribution of Stop Codons in a Primordial Polynucleotide Molecule

It is assumed that stop and nonstop codons were randomly and independently distributed over the single-stranded polynucleotide molecule polymerized in primordial soup/cells.

If there are n nonstop codons in a run of L codons, then there are (L − n) stop codons. Therefore, the number of opportunities for runs of nonstop codons is (L − n) + 1. The probability that any particular one of these runs contains r or
more nonstop codons is $p^r$, where p is the probability of a nonstop codon; because L is large, p can be approximated by p = n/L. The probability that at least one of the (L − n + 1) runs is at least r codons long is therefore

$$1 - \left[1 - \left(\frac{n}{L}\right)^{r}\right]^{L-n+1}$$

For large L and r this probability may be approximated by

$$P = 1 - \exp\left[-(L - n + 1)\left(\frac{n}{L}\right)^{r}\right] \qquad [1]$$

(see ref. 42 for details).

Assuming that three stop codons were selected from a total of 64 codons in the primordial soup/cells, it may be argued that for large L, the number of stop codons in the sequence is close to the value of 3L/64. Eq. 1 can then be approximated by

$$P = 1 - \exp\left[-\left(\frac{3L}{64} + 1\right)\left(\frac{61}{64}\right)^{r}\right] \qquad [2]$$

Eq. 2 can be used to calculate the probability P of at least one run of r or more nonstop codons in a sequence of L codons.

The result is shown in Table 2, where L = 1/3 × 20,000. The central column of Table 2 shows that when there are three stop codons, a run of at least 0.55 kb of open reading frame appears at a frequency of 4.6%, that is, neither very rarely nor very frequently.

Table 2. Distribution of longest run, r, of nonstop codons for a given value of polynucleotide size 3L

Probability that      Sizes of r, expressed in kb when L = 1/3 × 20,000 codons
longest run ≥ r,      Stop codons, no.
P                     2        3        4
0.003                 1.053    0.722    0.550
0.01                  0.939    0.647    0.494
0.046                 0.793    0.550    0.422
0.25                  0.622    0.437    0.338
0.46                  0.550    0.389    0.303
0.75                  0.474    0.339    0.265
0.95                  0.401    0.290    0.229

Senapathy (43) gives an equation for the "upper limit in the RFLs [reading frame lengths] in a finite length of a random sequence." There is strictly no upper limit other than the total length of the random sequence; the equation must refer to an expected value or some probability level, but this is not clear. His equation does give an upper limit similar to ours when the length of the random sequence is 10^7 kb, the size of some eukaryotic genomes (see Fig. 1).

Use of Three Stop Codons: A Crucial Event in the Origin of Life

It should be noted here that genomic evolution itself occurred because three stop codons had been originally selected from the total 64 codons.

In Table 2, r values were also computed using a modification of Eq. 2 under the assumption that two or four stop codons had been selected in a primordial coding system. With two stop codons, a 0.55-kb-long gene would have appeared very frequently, that is, at a frequency of 46%. This implies that genes may often have been densely grouped in a single polynucleotide molecule, resulting in a marked alteration of the gene-duplication pattern (5). It is noticeable that many genes possess more than one stop codon, either consecutively or close by each other, in phase or in different phases, at the end of a given reading frame (for example, see refs. 44 and 45). This is interpreted as a secure punctuation system to prevent the formation of unduly long protein molecules. Such a system would have been less likely to evolve if two stop codons rather than three had been selected. These results suggest that with two stop codons, life would not have evolved, or at least not life in the present form.

With four stop codons, the calculation (Table 2) shows that a 0.55-kb-long gene would appear very rarely, that is, at a frequency of 0.3%. Most of the open reading frames thus generated in the primordial genomes would have been too short to be functional. This condition would make the generation of useful primordial genomes in sufficient quantities unlikely. Again, life on Earth would have been less probable under this condition.

Origin of Introns

The results described earlier suggest that reading frames that were aborted because they were much shorter than ≈0.55 kb may have been generated in the flanking regions of a primordial gene at a relatively high frequency. The development of a self-splicing mechanism (35) probably facilitated the interlinking of distant (previously abortive) reading frames with the original function of the primordial gene; the intervening regions between reading frames may have become introns. If introns so originated, the existence of stop codons at the splicing sites is predicted. Indeed, splicing signals often contain stop codons or sequences possibly derived from stop codons in phase or in different phases. Ohno has proposed a prototype splicing sequence containing a stop codon (46). The subsequent utilization of abortive reading frames clearly is not the sole event contributing to the origin of introns because some introns are thought to have arisen by transposition at a much later stage of genome evolution (47).

Estimation of Noncoding DNA Sequences in Various Organisms

Most of the primordial genomes contained only one primordial gene, and this gene is more often estimated as no longer than ≈0.55 kb. In this situation the primordial genome structure consisted of a 0.55-kb-long primordial gene and (20 − 0.55)-kb-long noncoding sequences. Because genes are believed to have evolved by duplication (48, 49), as a first approximation, the prototype structure of the genome was assumed to be in principle maintained despite a serial, random duplication of its sequences through evolution. On this assumption, the total size, m kb, of nuclear noncoding DNA sequences of various organisms may be estimated by

$$m = N \times (20 - 0.55) \qquad [3]$$

where N is the number of genes in a given species of organism. Reasonable estimates have been made of the total number of protein-coding genes in prokaryotes and eukaryotes (cf. ref. 1, pp. 69-109). Genes that are transcribed but not translated are known to exist in contemporary organisms (cf. ref. 2). However, because these genes form a relatively minor fraction of total DNA in larger genomes and, furthermore, as the primordial forms of these genes are currently unknown, such genes are excluded from the calculations; this exclusion does not change the estimates significantly.

In Fig. 1, m values calculated for various species according to Eq. 3 are plotted against the genome sizes. Surprisingly, m values calculated under the above assumption are not significantly different from the total sizes of nonprotein-coding sequences so far reported for most of the higher eukaryotes (see Fig. 1), with the few exceptions described below. The m values estimated for H. sapiens and N. tabacum species are
slightly lower than the total sizes of nonprotein-coding sequences for these species. However, such small differences should not be emphasized because available estimates for protein gene numbers (N) are not sufficient to obtain highly accurate values. The excellent agreement between theoretical and observed estimates in Fig. 1 is not mere coincidence. The majority of the noncoding sequences of contemporary higher eukaryotes in principle retain the primordial genome structure, that is, are composed of a single gene and the (20 − 0.55)-kb-long noncoding sequences.

[Figure 1 plot. Horizontal axis: genome sizes, kb; vertical axis: total sizes of noncoding DNA sequences; both logarithmic.]

FIG. 1. Relation between total sizes of noncoding DNA sequences and genome sizes of various organisms. The total sizes of noncoding DNA sequences (●) reported for chosen species of organisms (derived from publications cited in parentheses below) are plotted against the respective genome sizes; both axes are expressed on a logarithmic scale. The low values shown for prokaryote noncoding sequences do not imply absolute absence of noncoding sequences in prokaryote genomes. Total sizes of noncoding sequences that exclude satellite DNA families (50-52) (triangles), or which represent only unique and middle repetitive sequences (53-55) (×), are also plotted. Open circles (○) represent m values estimated from Eq. 3 under the assumption that a prototype of the primordial genome is in principle retained in the genomes of contemporary species. Vertical lines represent variations in estimation due to uncertainties in the values of N, the number of genes. Points are numbered and correspond to the following species of organisms: 1, rickettsia (56); 2, Escherichia coli (56); 3, [...] (ref. 1, pp. 69-109); 4, [...] (ref. 1, pp. 69-109); 5, Caenorhabditis elegans (ref. 1, pp. 69-109); 6, Chlamydomonas reinhardtii (ref. 1, pp. 69-109); 7, [...] (ref. 1, pp. 69-109); 8, [...] (ref. 1, pp. 69-109); 9, Oxytricha nova (ref. 1, pp. 69-109); 10, Physarum polycephalum (ref. 1, pp. 69-109); 11, Strongylocentrotus purpuratus (ref. 1, pp. 69-109); 12, Gallus domesticus (57); 13, Nicotiana tabacum (ref. 1, pp. 69-109); 14, Homo sapiens (ref. 1, pp. 69-109); 15, Triturus cristatus (ref. 1, pp. 69-109); 16, Gonyaulax polyedra (ref. 1, pp. 69-109); 17, Fritillaria assyriaca (ref. 1, pp. 69-109); and 18, Protopterus aethiopicus (ref. 1, pp. 69-109).

Highly repeated satellite DNA is known to exist in various species of organisms. Because the origin of satellite DNA families is thought to be different from others (50, 51), the amounts of noncoding DNA sequences were calculated, excluding satellite DNA for species for which information is available (50-52). These values are also plotted in Fig. 1, showing excellent and, in some cases, even better agreement with the m values obtained above. Similarly, the total size of nonprotein-coding sequences representing only unique and middle repetitive sequences was also considered. As shown in Fig. 1, these values are in good agreement with theoretical estimates.

Two exceptional groups of organisms appear to contradict the above agreement. One group consists of the special organisms, such as F. assyriaca, where the amount of noncoding DNA sequences constitutes about 99.98% of the genome (ref. 1, pp. 69-109). Another group includes all prokaryotes and most lower eukaryotes of small genomic size. These exceptions are interpreted as follows. In the first group, some special sequences of noncoding regions were selectively amplified for some unknown reason, thus forming a gigantic genome without significant change in gene number. On the other hand, substantial portions, if not all, of the noncoding sequences were selectively deleted from the genomes in the second group. Indeed, specific deletion of noncoding sequences has been observed in some cases, e.g., mitochondrial genomes (47). This interpretation is supported by the view that prokaryotes may have branched off from a common ancestor from which present eukaryotes are thought to have evolved.

Considering all these points together, we propose that the genomes of higher eukaryotes in principle retained the prototype of a primordial genomic structure during long evolutionary periods. Therefore, noncoding sequences on the chromosomes of contemporary species actually represent the molecular fossils of obligatory by-products formed early in evolution. The hypothesis accounts for some data but obviously requires further supportive evidence. It is hoped that this hypothesis will stimulate further research on the evolutionary significance of structural organizations in extragenic DNA sequences.

We thank Drs. E. H. Creaser, E. E. Decruz, and Ms. K. Koishi for comments. R.N.C. acknowledges a Visiting Fellowship at the Department of Statistics, the Australian National University, Canberra. This work was supported by Grant 85-111-001 from the Toyota Foundation.

1. Cavalier-Smith, T., ed. (1985) The Evolution of Genome Size (Wiley, Chichester, U.K.).
2. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-384.
3. Naora, H. & Deacon, N. J. (1982) Differentiation 21, 1-6.
4. Naora, H. (1986) Biol. Forum 79, 345-371.
5. Loomis, W. F. & Gilpin, M. E. (1986) Proc. Natl. Acad. Sci. USA 83, 2143-2147.
6. Oparin, A. J. (1924) Proiskhozdenic Zhizny (Izd. Moskovski Rabochii, Moscow).
7. Haldane, J. B. S. (1929) Rationalist Ann. 148, 3.
8. Eigen, M. (1971) Naturwissenschaften 58, 465-523.
9. Miller, S. L. & Orgel, L. E. (1974) The Origin of Life on the Earth (Prentice-Hall, Englewood Cliffs, NJ).
10. Dyson, F. J. (1985) Kagaku 55, 268-276.
11. Naora, H. & Deacon, N. J. (1982) Proc. Natl. Acad. Sci. USA 79, 6196-6200.
12. Haldane, J. B. S. (1964) in The Origins of Prebiological Systems and of Their Molecular Matrices, ed. Fox, S. W. (Academic, New York), pp. 11-15.
13. Darnell, J. E. & Doolittle, W. F. (1986) Proc. Natl. Acad. Sci. USA 83, 1271-1275.
14. Dose, K. & Zaki, L. (1971) Z. Naturforsch. 26b, 144-148.
15. Dose, K. (1976) in Protein Structure and Evolution, eds. Fox, L., Deyl, Z. & Blažej, A. (Dekker, New York), pp. 149-184.
16. Mikes, O. (1976) in Protein Structure and Evolution, eds. Fox, L., Deyl, Z. & Blažej, A. (Dekker, New York), pp. 273-334.
17. Dayhoff, M. O. (1972) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, DC).
18. Richardson, J. S. (1981) Adv. Protein Chem. 34, 167-339.
19. Blake, C. C. F. (1985) Int. Rev. Cytol. 93, 149-185.
20. Wetlaufer, D. B. (1981) Adv. Protein Chem. 34, 61-91.
21. Savageau, M. (1986) Proc. Natl. Acad. Sci. USA 83, 1198-1202.
22. Saenger, W. (1984) Principles of Nucleic Acid Structure (Springer, New York).
23. Matthews, R. E. F. (1982) Intervirology 17, 4-199.
24. Fenner, F., McAuslan, B. R., Mims, C. A., Sambrook, J. & White, D. O. (1974) The Biology of Animal Viruses (Academic, New York).
25. Reynaud, C.-A., Imaizumi-Scherrer, M.-T. & Scherrer, K. (1980) J. Mol. Biol. 140, 481-504.
26. Knott, T. J., Rall, S. C. J., Innerarity, T. L., Jacobson, S. F., Urdea, M. S., Levy-Wilson, B., Powell, L. M., Pease, R. J., Eddy, R., Nakai, H., Byers, M., Priestley, L. M., Robertson, E., Rall, L. B., Betsholtz, C., Shows, T. B., Mahley, R. W. & Scott, J. (1985) Science 230, 37-43.
27. Huang, L.-S., Bock, S. C., Feinstein, S. I. & Breslow, J. L. (1985) Proc. Natl. Acad. Sci. USA 82, 6825-6829.
28. Deeb, S. S., Disteche, C., Motulsky, A. G., Lebo, R. B. & Kan, Y. W. (1986) Proc. Natl. Acad. Sci. USA 83, 419-422.
29. Cladaras, C., Hadzopoulou-Cladaras, M., Avila, R., Nussbaum, A. L., Nicolosi, R. & Zannis, V. I. (1986) Biochemistry 25, 5351-5357.
30. Grosveld, G., Verwoerd, T., van Agthoven, T., de Klein, A., Ramachandran, K. L., Heisterkamp, N., Stam, K. & Groffen, J. (1986) Mol. Cell. Biol. 6, 607-616.
31. van Ommen, G.-J. B., Arnberg, A. C., Baas, F., Brocas, H., Sterk, A., Tegelaers, W. H. H., Vassart, G. & Vijlder, J. J. M. (1983) Nucleic Acids Res. 11, 2273-2285.
32. Hogness, D. S., Lipshitz, H. D., Beachy, P. A., Peattie, D. A., Saint, R. B., Goldschmidt-Clermont, M., Harte, P. J., Gavis, E. R. & Helfand, S. L. (1985) Cold Spring Harbor Symp. Quant. Biol. 50, 181-194.
33. Beyer, A. L., Bouton, A. H. & Miller, O. L., Jr. (1981) Cell 26, 155-165.
34. Apirion, D. (1983) Prog. Nucleic Acids Res. Mol. Biol. 30, 1-40.
35. Cech, T. R. (1986) Cell 44, 207-210.
36. Belfort, M., Pedersen-Lane, J., West, D., Ehrenman, K., Maley, G., Chu, F. & Maley, F. (1985) Cell 41, 375-382.
37. Daneholt, B. (1972) Nature (London) 240, 229-232.
38. Serfling, E. (1976) Chromosoma 57, 271-283.
39. Case, S. T. & Daneholt, B. (1978) J. Mol. Biol. 124, 223-241.
40. Rydlander, L., Pigon, A. & Edstrom, J.-E. (1980) Chromosoma 81, 101-113.
41. Serfling, E. (1982) in Cell Differentiation, eds. Nover, L., Luckner, M. & Parthier, B. (Springer, New York), pp. 348-375.
42. David, F. N. & Barton, D. E. (1962) Combinatorial Chance (Griffin, London).
43. Senapathy, P. (1986) Proc. Natl. Acad. Sci. USA 83, 2133-2137.
44. Fujita, T., Takaoka, C., Matsui, H. & Taniguchi, T. (1983) Proc. Natl. Acad. Sci. USA 80, 7437-7441.
45. Lemischka, I. & Sharp, P. A. (1982) Nature (London) 300, 330-335.
46. Ohno, S. (1980) Differentiation 17, 1-15.
47. Clark-Walker, G. D. (1985) in The Evolution of Genome Size, ed. Cavalier-Smith, T. (Wiley, Chichester, U.K.), pp. 277-297.
48. Ohno, S. (1970) Evolution by Gene Duplication (Springer, New York).
49. Jeffreys, A. J. (1982) in Genome Evolution, eds. Dover, G. A. & Flavell, R. B. (Academic, London), pp. 157-176.
50. Walker, P. M. B. (1971) Prog. Biophys. Mol. Biol. 23, 147-190.
51. Pardue, M. L. & Gall, J. G. (1972) in Molecular Genetics and Developmental Biology, ed. Sussman, M. (Prentice-Hall, Englewood Cliffs, NJ), pp. 65-99.
52. Coudray, Y., Quetier, F. & Guille, E. (1970) Biochim. Biophys. Acta 217, 259-267.
53. Nover, L. & Reinbothe, H. (1982) in Cell Differentiation, eds. Nover, L., Luckner, M. & Parthier, B. (Springer, New York), pp. 23-74.
54. Davidson, E. H., Galau, G. A., Angerer, R. C. & Britten, R. J. (1975) Chromosoma 51, 253-259.
55. Doolittle, W. F. (1985) in The Evolution of Genome Size, ed. Cavalier-Smith, T. (Wiley, Chichester, U.K.), pp. 443-487.
56. Herdman, M. (1985) in The Evolution of Genome Size, ed. Cavalier-Smith, T. (Wiley, Chichester, U.K.), pp. 37-68.
57. Eden, F. C. & Hendrick, J. P. (1978) Biochemistry 17, 5838-5844.