
J. Cell Sci. Suppl. 7, 00-00 (1987) 213 Printed in Great Britain © The Company of Biologists Limited 1987

STRUCTURE AND REPLICATION OF CAULIMOVIRUS GENOMES

R. HULL, S. N. COVEY and A. J. MAULE John Innes Institute, Colney Lane, Norwich NR4 7UH, UK

SUMMARY

In this paper the current state of knowledge of the replication of cauliflower mosaic virus (CaMV) is reviewed and the DNA intermediates and enzymes involved in replication are discussed. Based on this information a model for the replication complex is developed. In this model it is suggested that replication complexes resemble virus particles and that, in their assembly, there are close interactions between the inclusion body protein, the virus coat protein, the replicase enzyme, the tRNA primer and the 35S RNA template. The similarities between CaMV replication complexes and those of retroviruses are discussed, and we extend this discussion to a comparison between CaMV and other reverse transcribing elements.

INTRODUCTION

Caulimoviruses are the only group of plant viruses whose particles are known to contain double-stranded DNA. Among the other morphological and cytological features which characterize members of this group are isometric particles which are strongly stabilized and which are usually found only within virus-specific, cytoplasmic proteinaceous inclusion bodies. The caulimovirus group has recently been reviewed by Hull (1984). Most of the information on the molecular biology of caulimoviruses comes from studies on the type member of the group, cauliflower mosaic virus (CaMV) (for general reviews see Covey, 1985; Maule, 1985a; Hull & Covey, 1985; Covey & Hull, 1985; Hohn et al. 1985). To introduce this chapter we will describe the salient points of caulimovirus molecular biology. Caulimovirus virion DNA is a double-stranded circular molecule of about 8 kbp containing gaps or discontinuities at specific sites. There is always one strand with one discontinuity (gap 1) which, for CaMV, is the transcribed or (−) strand. The transcripts have the potential to code for 6 or possibly 8 proteins larger than 10 kDa; in carnation etched ring virus (CERV) there are 6 long open reading frames (Hull et al. 1986). The complementary strand has either one, two or three discontinuities depending upon virus or strain of virus (Hull & Howell, 1978; Hull & Donson, 1982; Donson & Hull, 1983; Richins & Shepherd, 1983). At each discontinuity, at least for CaMV, the 5' deoxyribonucleotide is at a fixed position (Hull et al. 1979) and frequently has one or more ribonucleotides attached (Guilley et al. 1983); the 3' terminus extends beyond the 5' terminus by a variable amount (5-40 nucleotides) giving an apparently triple-stranded structure (Francki et al. 1980; Richards et al. 1981). RNA is transcribed from a supercoiled form of DNA which associates with host proteins (probably

histones) to produce a mini-chromosome (Olszewski et al. 1982). There are two major RNA transcripts, the 19S RNA which is the mRNA for the major protein of inclusion bodies (the gene VI product) and the 35S RNA which covers the full genome and has a terminal repeat of 180 nucleotides (see reviews cited above).

The recognition that the CaMV genome has several sequences and organizational features in common with retroviruses, together with the realization that members of another DNA virus group, the hepadnaviruses, involve reverse transcription in their replication (Summers & Mason, 1982), led to the development of a detailed model for CaMV replication (Hull & Covey, 1983a; Pfeiffer & Hohn, 1983). Since then the model has continually been refined. The first phase of the replication cycle is in the nucleus where the input DNA, having been covalently closed by a mechanism not yet determined, is transcribed by host RNA polymerase II (Guilfoyle, 1980) to give the 19S and 35S RNAs. The 19S RNA is translated in the cytoplasm to produce the inclusion body protein. The 35S RNA forms the template for the reverse transcription phase of replication which is considered to take place in replication complexes in association with the inclusion bodies. Priming of (−) strand DNA synthesis is by plant cytoplasmic tRNAmet initiator adjacent to the site that will form gap 1 at about 600 nt from the 5' end of the template. The formation of strong-stop DNA (termed sa-DNA; see below), the first strand switch, the formation of (+) strand primers at purine rich regions (adjacent to the sites of gaps in the plus strand) and the second strand switch are considered to be by mechanisms similar to those of retrovirus replication. The main difference with retrovirus replication arises when the oncoming newly synthesized strand reaches a priming site. In CaMV it appears that only limited strand displacement takes place thus giving the characteristic structures of the gaps.
This completes the mature virion DNA. In this chapter we are going to discuss the nature of replicative intermediates which are important to the elucidation of the replication mechanism, the structure and functioning of the replication complexes, and we will draw attention to the similarities and differences between the replication complexes and retrovirus particles. We will then expand the comparison with retroviruses to include hepadnaviruses and retrotransposons.
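
As an illustration of how a terminally redundant transcript such as the 35S RNA can arise from a circular template, the following minimal Python sketch reads once around a circular sequence and then continues for the length of the terminal repeat. It is not part of the original analysis: the function name, the toy sequence and the start position are hypothetical, and only the approximate genome size (about 8 kb) and the repeat length (180 nucleotides) are taken from the text.

```python
def transcribe_circular(genome: str, start: int, repeat_len: int) -> str:
    """Read a circular sequence once from `start`, then continue for
    `repeat_len` further positions, so that the first and last
    `repeat_len` residues of the product are identical.

    Hypothetical helper for illustration only; the sequence is returned
    in the same alphabet as the input (no DNA-to-RNA conversion).
    """
    n = len(genome)
    if n == 0:
        raise ValueError("genome must not be empty")
    if not 0 <= start < n:
        raise ValueError("start must lie within the genome")
    if not 0 <= repeat_len <= n:
        raise ValueError("repeat_len must be between 0 and the genome length")
    total = n + repeat_len  # one full pass plus the terminal repeat
    return "".join(genome[(start + i) % n] for i in range(total))


if __name__ == "__main__":
    # Toy 8000-residue circle standing in for the ~8 kb genome (assumption).
    toy_genome = "ACGT" * 2000
    transcript = transcribe_circular(toy_genome, start=0, repeat_len=180)
    print(len(transcript))
    print(transcript[:180] == transcript[-180:])
```

The sketch deliberately ignores how termination signals are bypassed on the first pass; it only shows that reading past the start point by the repeat length yields identical 5' and 3' termini.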

REVERSE TRANSCRIPTION REPLICATION INTERMEDIATES

One approach to understanding details of the CaMV replicative process has been to analyse the structure of viral DNA forms isolated from infected leaves or from protoplasts infected in vitro. There are two basic types of replication intermediates, those which are extractable by phenol alone and those which resemble encapsidated DNA in that they are extractable by phenol only after prior protease treatment. A variety of viral DNA forms are found in phenol-extracted tissue (Hull & Covey, 1983b). These forms have been variously termed ‘intracellular’, ‘unencapsidated’ or ‘free’ DNAs and this fraction contains relatively little virion DNA. A fundamental feature of reverse transcription as a means of generating double-stranded DNA from single-stranded RNA is that it proceeds in two phases. First, a (−) strand DNA copy of the RNA template is made and second, this DNA phase becomes the template for (+) strand synthesis. Thus, (+) strand synthesis cannot begin until after (−) strand synthesis has started and passed at least the first priming site. From this it follows that, in theory, it should be possible to detect (−) strands as single-stranded DNAs whilst (+) strands exist only in a double-stranded form in cells. One of the first CaMV replication intermediates to be characterised was a small fragment of viral DNA some 600 nucleotides long that co-purified with poly(A)+ RNA and was termed sa-DNA (Covey et al. 1983). Using strand-specific probes, it was shown that sa-DNA was of (−) strand polarity and it also mapped to a region of the CaMV genome between the 5' terminus of 35S RNA and the position of the (−) strand gap (G1) in virion DNA. Some of the sa-DNA molecules appeared to be base paired with two smaller fragments of (+) strand DNA whilst others were apparently single-stranded.
An unusual feature of sa-DNA was that an RNA molecule approximately 75 nucleotides long (Covey et al. 1983) was covalently attached to its 5' end and this fragment had characteristics of plant cytoplasmic tRNAmet initiator (Turner & Covey, 1984). The structure of sa-DNA was reminiscent of retrovirus (−) strand ‘strong-stop’ DNA, a very early reverse transcript of the genomic RNA 5' end that originates at the primer binding site (pbs). It seems very likely, therefore, that sa-DNA is the CaMV version of this strong-stop DNA. Appreciable amounts of sa-DNA were found associated with CaMV virions in a DNase-resistant form (Turner & Covey, 1984) and this raises the possibility that sa-DNA itself becomes a primer for reverse transcription and accumulates in excess to overcome any pause caused by the first template switch at the termini of 35S RNA. The nature and function of the association of sa-DNA with virions is not yet understood although curiously it is not fully protected from nuclease attack since the RNA moiety was found to be susceptible to RNase (Maule, 1985b). Single-stranded DNAs of (−) sense polarity larger than sa-DNA have also been reported. Marsh et al. (1985) isolated heterogeneous (−) strands, that ranged in size between 0.6 kb and 8.0 kb, from CaMV putative replication complexes. They also found that (+) strands existed only in a duplex form with (−) strands. Hull & Covey (1983b) had previously demonstrated the presence in DNA extracts of infected leaves of double-stranded CaMV DNAs with single-strand extensions of (−) strands; no single-stranded (+) strands were observed. Thomas et al. (1985) showed that isolated CaMV replication complexes were capable of synthesizing both (−) and (+) strands and that this synthesis was asymmetric. (−) strands with a size range similar to that reported by Marsh et al. (1985) were also observed by Thomas et al. (1985).
Thus, several lines of evidence show that CaMV replication exhibits asymmetry as the reverse transcription scheme would suggest. Less is known about the priming and elongation of CaMV (+) strands. It seems clear that (+) strands originate close to the two (+) strand gaps (G2 and G3) although the presence of additional gaps in a sub-population of virion DNA molecules (Maule & Thomas, 1985) suggests that there might be other (+) strand priming sites as well. DNA (+) strand synthesis is probably primed by RNA oligomers since the remnants of these putative primers have been detected in virion DNA by Guilley et al. (1983). The finding of genome-length (−) strands described above suggests that (+) strand priming is a relatively slow event although (−) strands might accumulate if (+) strand priming is prevented by loss of the RNA primers by excessive RNase H activity. In this respect it is interesting to note that Covey & Turner (1986) have discovered a population of double-stranded CaMV hairpin DNAs that originate at the pbs and their structures suggest that synthesis of double-stranded DNA is possible in the absence of a specific (+) strand priming event regulated by RNA oligomers. Quite what function this phenomenon serves in the CaMV replication cycle remains to be seen. Putative replication intermediates isolated from infected plants are shown in Fig. 1.

THE REVERSE TRANSCRIBING ENZYME

Characterisation of the DNA polymerase in CaMV replication complexes in relation to its role as a reverse transcriptase has been based on its specificity for exogenously added artificial template and natural heteropolymeric RNAs as template/primers, its requirement for monovalent and divalent cations, sensitivity to actinomycin D, RNase and DNase, a correlation between the size of the active protein and the theoretical size of the putative pol gene product and an antigenic relationship between the primary pol amino acid sequence and a polypeptide present in replication complexes (see Hohn et al. 1985). That CaMV ORF V product does in fact provide the reverse transcriptase (pol) function is supported by amino acid sequence homology comparisons between known reverse transcriptases and ORF V product (Toh et al. 1983; Patarca & Haseltine, 1984; Volovitch et al. 1984; Toh et al. 1985) and is proven by the demonstration that cloned ORF V is expressed in yeast to give a functional enzyme (Takatsuji et al. 1986). In common with other reverse transcriptases, the enzyme is capable of utilizing poly rA/oligo dT as a template primer (Guilfoyle et al. 1983; Volovitch et al. 1984) and is distinguishable from γ-like polymerases in its utilization of poly rC/oligo dG (Volovitch et al. 1984). The enzyme showed a requirement for Mg2+ (Pfeiffer et al. 1984; Thomas et al. 1985) and monovalent salt at 30-50 mM (Pfeiffer et al. 1984; Volovitch et al. 1984; Thomas et al. 1985). Perhaps the most compelling experimental evidence in favour of the CaMV-specific polymerase being capable of reverse transcription has been its demonstrable capacity to copy exogenously applied heteroribopolymers (Volovitch et al. 1984; Thomas et al. 1985). Thomas et al. (1985) showed that the enzyme copied cowpea mosaic virus RNA primed with oligo dT into cDNA of both (−) and (+) polarity.
Fig. 1. Products of reverse transcription of CaMV genomic RNA. The top line represents the genomic RNA showing the terminal repeat (□), the (−) strand priming site (○) at gap 1 and the (+) strand priming sites (●) at gaps 2 and 3. The next seven lines or pairs of lines show the various forms of DNA intermediates which are transcribed from the RNA and which are described in Hull & Covey (1983b), Covey et al. (1983), Turner & Covey (1984) and Covey & Turner (1986). The genomic DNA is shown at the bottom.

This reaction was dependent upon the replicase complex preparation being subjected to a single freeze-thaw cycle, a treatment which presumably released the endogenous template from the enzyme. It can be predicted that replication by reverse transcription would be sensitive to RNase but insensitive to actinomycin D for the synthesis of (−) strands and sensitive to both DNase and actinomycin D for the synthesis of (+) strands. This is partially substantiated by the published data but the experiments are complicated by the demonstration that nucleic acids in replication complexes have significant resistance to nucleases. For the CaMV DNA polymerase to act as a reverse transcriptase it would, by necessity, require that an RNase H activity be associated with it. With retroviruses this function is carried out by a domain within the pol gene product which, from the analyses of amino acid sequence homologies (Toh et al. 1985; Hull et al. 1986), would appear to exist also in the gene V product of CaMV and CERV. The mechanism by which RNase H degrades the RNA template after synthesis of (−)

DNA strands is less clear. Three reactions occurring during replication through reverse transcription indicate that the RNase H should have at least some endonucleolytic activity: firstly, the 35S RNA template must have the poly(A) tail cleaved from its 3' end; secondly, the tRNA primer should be removed from the 5' end of (−) DNA after the pbs has been copied into (−) strand DNA; thirdly, oligomeric RNA primers for (+) strand synthesis are most likely generated by cutting 35S RNA in hybrids with (−) strand DNA. Accordingly, RNase H activity has been shown to remove the tRNA primer from the natural substrates of the retroviruses, avian myeloblastosis and murine leukaemia viruses, and to produce (+) strand primers in in vitro reactions (see Varmus & Swanstrom, 1982). The dogma of reverse transcription is that the RNA template is degraded in a processive way after being transcribed into (−) strand DNA, although evidence in support of this for caulimoviruses is lacking. Similarly, the relationship between the rate, direction and time of degradation mediated by RNase H and the growing (−) strand has also not yet been resolved for caulimoviruses although the observation that preparations of replicative intermediates from CaMV-infected turnip plants contain genome length RNA/DNA hybrids (Marsh et al. 1985) may point to a 5'→3' activity. The efficiency with which RNase H is able to generate (+) strand primers through its failure to digest polypurine stretches of RNA appears to be variable in CaMV, as preparations of virion DNA which contain a small proportion of molecules with genome-length (+) strands (Maule & Thomas, 1985) and the detection of near genome length hairpin molecules (Covey & Turner, 1986) both suggest that primers have been lost, perhaps through an excessive appetite for the substrate.
Most studies of the in vitro CaMV polymerase have noted the marked asymmetry in activity seen as an overproduction of (−) strands in relation to (+) strands, also recorded for many retrovirus polymerases. Whilst the model for CaMV DNA replication through reverse transcription calls for a temporal asymmetry in synthesis, this disparity leads us to question whether the conditions most appropriate for (−) strand synthesis in vitro are in fact suboptimal for (+) strand synthesis. Furthermore, we believe that the two processes are functions of separate subdomains of the same or separate molecules. Analysis of amino acid homology between CaMV and CERV pol gene products has identified two distinct regions within the polymerase domain which may represent the two activities (Hull et al. 1986). Alternative or further explanations could be that in vitro and in vivo the two activities function at very different rates and/or that the efficiency of priming for (+) strand synthesis is low. The characteristic structure of caulimovirus virion DNA which has a single discontinuity in the (−) strand and, for most isolates, at least two discontinuities in the (+) strand, conforms with the model for DNA replication in which synthesis is continuous for the former and discontinuous for the latter. In this respect, caulimoviruses are similar to avian retroviruses (Varmus & Swanstrom, 1982). However, the existence of virion DNA molecules, for at least one isolate of CaMV, with either several or just one discontinuity in the (+) strand (Maule & Thomas,

1985) indicates that (+) strand synthesis can be either continuous or discontinuous, a factor probably controlled by the availability of primers. Although the proportion of molecules with genome-length (+) strands may be small, their existence will have to be taken into account when deducing the mechanisms involved in reverse transcription through an analysis of replication intermediates.
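
The suggestion that (+) strand primers arise where RNase H fails to digest polypurine stretches of the template can be illustrated with a simple scan for purine runs. The Python sketch below is only a toy under stated assumptions: the function name, the example sequence and the minimum run length of 8 are hypothetical choices and are not values reported for CaMV.

```python
import re


def polypurine_tracts(seq: str, min_len: int = 8):
    """Return (start, end, tract) for every run of at least `min_len`
    consecutive purines (A or G) in `seq`.

    `min_len` is an arbitrary illustrative threshold, not a measured
    property of any (+) strand primer. Coordinates are 0-based, with
    `end` exclusive.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical RNA fragment containing one 10-residue purine run.
    fragment = "CUCUAAGGAGGAGACUCUGAGC"
    for start, end, tract in polypurine_tracts(fragment):
        print(start, end, tract)
```

Candidate tracts found in this way would still have to be compared with the mapped positions of the (+) strand gaps; the scan by itself says nothing about which runs are actually used as primers.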

The replication complex

In vitro studies of CaMV DNA synthesis have been concerned firstly with the identification of the DNA polymerase as a reverse transcriptase and secondly, but equally, with attempts to understand the mechanisms involved, with reference perhaps to what was already known about the amplification of retrovirus genomes. With support from early autoradiographic evidence (Kamei et al. 1969; Favali et al. 1973) in favour of the inclusion body as the site of CaMV DNA synthesis, most workers have opted for subcellular fractions enriched for inclusion bodies as the starting material for the characterization of the DNA replicative activity. In retrospect this was a sound decision since, although the inclusion body itself may not be the primary structural unit in which or on which DNA replication occurs, it is now generally accepted that the process does take place within its environs. Pellet fractions obtained from low speed centrifugation of crude filtrates of tissue extracts made in the presence of non-ionic detergents contain large numbers of inclusion bodies, together with nuclei from which they appear inseparable, but are relatively uncontaminated with other organelles. Further purification of inclusion bodies (Mazzolini et al. 1985) and nuclei (Ansa et al. 1982; Guilfoyle et al. 1983) has been achieved, although preparations of either completely free from the other have not been possible. Although the inclusion body preparations of Mazzolini et al. (1985) contained some nuclear debris they did not contain any cellular DNA. These low speed preparations incorporate labelled precursors into high molecular weight products and more specifically into CaMV DNA (Pfeiffer & Hohn, 1983; Pfeiffer et al. 1984; Guilfoyle et al. 1983; Modjtahedi et al. 1984; Mazzolini et al. 1985; Marsh et al. 1985; Thomas et al. 1985) using endogenous templates, although the proportion of the activity directed towards viral DNA synthesis may be low (e.g.
only 3 % of total DNA synthesis; Modjtahedi et al. 1984). Hypotonic leaching of the replication complexes away from inclusion bodies followed by sucrose density gradient separation yields peak fractions where CaMV DNA synthesis represents 80-90 % of total DNA synthesis (Pfeiffer & Hohn, 1983; Pfeiffer et al. 1984) well separated from CaMV-specific RNA synthesis. However, Marsh et al. (1985) reported that this slowly sedimenting material did not have the properties expected for a reverse transcribing replication complex. An alternative strategy for the preparation of CaMV replication complexes gives a clue as to their structural characteristics. We (Thomas et al. 1985) isolated replication complexes from infected turnip tissue and protoplasts using an extraction at pH 9, in the presence of Triton X-100, EGTA and high salt, and centrifugation through a 60 % sucrose cushion at 250 000 g, conditions sufficient to sediment particulate matter with a sedimentation coefficient in excess of 25S. These preparations, without further purification, exhibited CaMV DNA synthesis as a larger proportion (40-50 %) of total synthesis than previously observed. The main difference between this and other protocols was in the use of a high pH, high salt extraction, a procedure which appears to disrupt inclusion bodies, since protein analysis of replication complexes using antisera to CaMV ORF VI and ORF IV products reveals very little inclusion body protein but abundant coat protein (C. L. Harker, personal communication). The association of replicative activity with viral coat protein in a particulate form has also been noted by Marsh et al. (1985) who demonstrated the cosedimentation of replication complexes from nuclear and microsomal lysates with intact CaMV virions. Hence, a model for replication is suggested in which the virion or provirion is implicated as the basic structural unit in which CaMV DNA synthesis occurs.
Several further pieces of evidence add support to this hypothesis. (1). The detection in purified preparations of CaMV virions of CaMV DNA polymerase activity associated with a protein of 76K, a size which is similar to that of the coding capacity of the CaMV pol gene (Menissier et al. 1984). (2). The feature common to both virion DNA and the products of in vitro replication in that both are released from their nucleoprotein complexes by phenol only when predigested with protease (Thomas et al. 1985; Marsh et al. 1985). (3). The observation that a class of single-stranded CaMV DNA molecules, considered to be replication intermediates, can be precipitated from preparations of replication complexes with antiserum to the 42K coat protein (Marsh et al. 1985). (4). The belief that, for hepadnaviruses, the only other retroid elements known to encapsidate the products of reverse transcription, DNA synthesis and virion maturation are integral processes (see Mason et al. 1987). The relationship between the specificity of the ‘cys’ motif on the gag or equivalent gene products of retroviruses and CaMV and the tRNA used for priming (−) strand DNA synthesis (Covey, 1986) is interesting in this context, since it may indicate that for CaMV there is a physical association between a coat protein subunit and the initial priming event in DNA replication. Coordination of the processes of DNA replication and encapsidation for caulimoviruses could account for the partial insensitivity of the former to nucleases and other inhibitors noted above.

A MODEL FOR THE REPLICATION COMPLEX

There is now enough information accruing for us to propose a model for the replication complex. The data suggest that it resembles virus particles in many respects. Further interpretation of the available information leads us to suggest that there may be interactions between the inclusion body protein, the complete coat protein gene product, the (−) strand primer, the 35S RNA and eventually the viral DNA. The viral coat protein is encoded by gene IV and is initially produced as a precursor of 58K (Franck et al. 1980; Hahn & Shepherd, 1982). The capsid protein is 42K which is further processed to products of 37 and 35K. The 58K precursor is phosphorylated, the 42K protein derived from it is phosphorylated to a certain extent but there is little phosphorylation of the 37 and 35K proteins (Hahn & Shepherd, 1980; Menissier-de Murcia et al. 1986). The latter polypeptides are slightly glycosylated (Hull & Shepherd, 1976; Duplessis & Smith, 1981). From a comparison of the amino acid sequence of gene IV product with the amino acid composition of CaMV coat protein, Franck et al. (1980) suggested that the N-terminal 100 or so amino acids (which are rich in aspartate and glutamate) and the C-terminal 30 or so amino acids (which are also rich in aspartate) are removed in the formation of the 42K protein. These regions of the 58K precursor protein are shown diagrammatically in Fig. 2 in which this gene is compared with the analogous gene, the gag gene, of Moloney murine leukaemia virus (MoMLV). The N-terminal polypeptide of MoMLV (p15) is very hydrophobic and interacts with the virion envelope. We would suggest that the N-terminus of the CaMV 58K protein, the putative phosphoprotein, pp11, interacts with the inclusion body protein via the aspartate-glutamate rich region. The phosphorylated region of the 58K protein might also be involved with the inclusion body protein or the nucleic acid. In MoMLV the polypeptide (pp12) between the hydrophobic p15 and the main core protein, p30, is phosphorylated and has been shown to be associated with the 35S RNA (Sen et al. 1976). But as Dickson et al. (1985) point out, the function of this protein is unclear and it may also interact with other gag products. Hull & Covey (1986) have argued that CaMV inclusion body protein is an analog of the retrovirus env protein. The main capsid proteins of MoMLV (p30) and CaMV (p42) each contain a very basic region which it has been suggested (Franck et al. 1980) is involved in nucleic acid binding.

[Fig. 2 diagram: MoMLV gag product (p15, pp12; basic region 38 % R + K) compared with CaMV gene IV product ((pp11), p42, (p4); regions of 28 % D + E, 44 % K and 52 % D).]

Fig. 2. Comparison of the structure of the gag gene product of a retrovirus, MoMLV, and CaMV. The proteins derived or thought to be derived from each main gene product are listed below as p (protein) or pp (phosphoprotein) with the molecular weight in kDa. Regions rich in aspartate and glutamate (D and E) or in basic amino acids (K = lysine, R = arginine) are identified. A = the ‘cys’ sequence CX2CX9C; H = hydrophobic region; H = major capsid protein.
Adjacent to, and C-terminal of, this in the CaMV 42K protein is the ‘cys’ motif; this motif is in a similar relative position in MoMLV gag protein although in the final processed product it is in an adjacent polypeptide (p10). The ‘cys’ motif is considered to have Zn-mediated nucleic acid binding properties (Berg, 1986). As noted above, Covey (1986) pointed out a loose correlation between the sequence within the ‘cys’ motif and the tRNA used for priming (−) strand DNA synthesis.

This raises the possibility that tRNA might interact with the ‘cys’ motif region and that the upstream basic region interacts with other nucleic acid sequences. From these considerations and those noted earlier in this chapter, we suggest the model shown in Fig. 3. We assume, by analogy with retroviruses (see Varmus et al. 1982 for review), that gene V (the pol gene) is expressed as a polyprotein with gene IV (the gag gene); a gag-pol polyprotein has recently been found in a hepadnavirus (Will et al. 1986) and in a retrotransposon, the yeast Ty element (Mellor et al. 1985; Clare & Farabaugh, 1985; Kingsman et al. this volume). This assumption is not essential for the model but it does help in positioning the polymerase. Also it is difficult to reconcile the homology between the N-terminal part of gene V product with retrovirus protease domain if it is not expressed as a polyprotein. In the model we suggest that, for the polyprotein, a linked association is set up between the inclusion body protein and the N-terminal part of the 58K protein, the basic portion of the 58K protein and the 35S RNA, the ‘cys’ motif and tRNAmet and between the tRNAmet and the primer binding site on the 35S RNA. This could then position the RNA-dependent DNA polymerase subdomain of the gene V product and synthesis of (−) strand DNA could commence. There would have to be a mechanism that ensured that a gene IV + V polyprotein, and not just a gene IV protein, associated with the tRNA. If the putative pp11 was not cleaved from the 58K protein by the protease domain on the gene V product then the gene IV produced on its own without gene V could remain sequestered by inclusion body protein.
Subsequent steps in the assembly of the replication complex would include the release of gene V from gene IV by the action of gene V protease, release of the 42K protein from the association with inclusion body protein as the polymerase reached each molecule attached to the 35S RNA and assembly of the 42K protein into virion-like structures. It is realized that this model is highly speculative but it fits with current observations and explains various features of the virus such as inclusion bodies and the processing of the 58K coat protein precursor. However, if the model is substantiated, there are various problems that will have to be explained. These include the mechanisms of switching the interactions between the coat protein and nucleic acid from single-stranded RNA to double-stranded DNA. The factors involved in ‘maturing’ the virus particles to give the great stability found in purified virions and, at the same time, removing most of the pol gene and the products of RNase H digestion will have to be elucidated. Also to be explained is the occurrence of the ‘unencapsidated’ replicative intermediates described earlier although these could be associated with protein more tenuously.
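
The ‘cys’ sequence invoked in the model is written CX2CX9C in the legend to Fig. 2, that is, a cysteine, any two residues, a cysteine, any nine residues and a cysteine. As a minimal sketch of how such a motif could be located in a one-letter protein sequence, the Python fragment below uses a regular expression. The function name and the test peptide are hypothetical, and the pattern encodes only the spacing given in the legend, not any further sequence preference.

```python
import re

# C, any 2 residues, C, any 9 residues, C (the spacing given as CX2CX9C).
# A lookahead is used so that overlapping occurrences are also reported.
CYS_MOTIF = re.compile(r"(?=(C[A-Z]{2}C[A-Z]{9}C))")


def find_cys_motifs(protein: str):
    """Return (start, motif) for each CX2CX9C occurrence in `protein`.

    `protein` is a one-letter amino acid string; `start` is 0-based.
    Hypothetical helper for illustration only.
    """
    return [(m.start(), m.group(1)) for m in CYS_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Invented peptide, not a real gene IV or gag sequence.
    peptide = "MKCAACGGGGHGGGGCKR"
    print(find_cys_motifs(peptide))
```

A search of this kind only flags candidate positions; whether a given match binds zinc or nucleic acid is an experimental question that the pattern cannot answer.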

Fig. 3. Diagram showing proposed model of the CaMV replication complex. In the top left is an electron micrograph of a cytoplasmic viral inclusion body (bar marker = 0.5 μm). In the top right is a diagrammatic representation of a portion of inclusion body with virus particles embedded in it. Below is a diagram showing further details of the proposed interactions between the inclusion body protein, the gene IV product, the gene V product, the genomic RNA and the tRNA primer for minus-strand DNA synthesis. This model is discussed in the text.

[Fig. 3 diagram labels: inclusion body protein; virus particle; gene IV domains; genomic RNA; gene V RNase H and polymerase domains.]

CaMV AND OTHER REVERSE TRANSCRIBING ELEMENTS

The idea that CaMV replicated by reverse transcription first arose (Covey et al. 1983; Guilley et al. 1983; Hull & Covey, 1983; Pfeiffer & Hohn, 1983) because of the structural similarities its genome exhibited with the genomes of hepatitis B viruses (Summers & Mason, 1982) and retroviruses (see Weiss et al. 1982, 1985). These included the presence of putative (−) and (+) strand priming sites, a terminally redundant genome-length RNA transcript (35S RNA) and the finding of CaMV (−) strand ‘strong-stop’ DNA (sa-DNA). In addition to these similarities there are differences in detailed aspects of the replicative processes that presumably reflect an evolutionary divergence and host adaptation in each element. Both CaMV and the hepadnaviruses encapsidate in virions the DNA phase and each exhibits a curious topology in that the DNA strands are discontinuous. One of the early events in the replication cycles of both of these groups of viruses is thought to be the generation of a supercoiled DNA transcription template which remains extrachromosomal in the nucleus of infected cells. This template produces the greater-than-genome-length RNA replication template as the next phase of the cycle. In contrast, retroviruses package the RNA replication template phase in virions and reverse transcription to produce DNA precedes the transcription phase in the cycle. Moreover, the DNA which arises from reverse transcription of retrovirus RNA is structurally complex in that considerable sequence rearrangement occurs to produce a linear molecule with long terminal repeats (LTRs). This DNA provirus becomes integrated into the host genome where it is transcribed to produce messenger RNA and genomic RNA. In the transcription template of retroviruses and in those of CaMV and hepadnaviruses an essential requirement is to generate RNAs with terminal repeats.
In retroviruses this is from a linear molecule by transcription of the LTRs whilst in CaMV and hepadnaviruses the same sequence in a circular (supercoiled) DNA is transcribed twice. In all cases this necessitates ignoring transcription termination signals on the first pass but recognizing them on the second. It is not understood how this is achieved. A further fundamental difference in the replicative process of hepadnaviruses is that they are believed to

Fig. 4. Genome organization of (CaMV), ground squirrel hepatitis virus (G SH V), Rous sarcoma virus (RSV), Moloney murine leukemia virus (M oMLV), human T cell lymphotropic viruses II and III (H T L V II and H T L V III), yeast Ty element, Drosophila copia-Vike element 17-6 and Drosophila copia. Each is shown as on the genome length RNA (------), the terminal repeats being indicated by (■ ------). The subgenomic mRNA for the env gene or its equivalent is shown below the genomic RNA; * indicates splice. The gag H, pol ■ and env B genes are highlighted, genes I-V II of CaMV, the b gene of GSHV, the src gene of RSV, the x gene of H TLV II and the A and B genes of H T L V III are noted. The extents of the protease (pro), the reverse transcription (R T) and the integrase (int) domains of MoMLV and various features and regions common to all or most genomes are shown: O = hydrophilic, # = hydrophobic, p = phosphoprotein, o = nucleic acid binding domain, V = protease, T = reverse transcriptase, <0> = integrase. References to the information in this Fig. are given in the text. Reproduced from Hull & Covey 1986 with kind permission of the Journal o f general Virology. Structure and replication of caulimovirus genomes 225 prime ( —) strand synthesis with a protein, while CaMV and retroviruses utilize a host tR N A . In addition to the similarities in replicative mechanisms, these elements also share certain basic common features in the organisation of their genes. This extends to a large and diverse group of eukaryotic transposable elements called retrotransposons,

HTLV-II n „

IITLV-III A

copia

^ ^ g a g ' ------RNA o hydrophilic v protease m m p o l' —**— splice •hydrophobic »reverse 1/ 1 / \ env' n------■ repeat p phosphoprotein transcriptase Dother onucleic acid ° i ntegrase binding domain

Fig- 4. 226 R. Hull, S. N. Covey and A. J. Maule the DNA phases of which bear striking similarity to integrated retroviral proviruses. There is good evidence that at least two retrotransposons, the Ty element of yeast and the Drosophila copia element, undergo reverse transcription as part of the transposition process (Garfinkel et al. 1985; Mellor et al. 19856; Shiba & Saigo, 1983). As pointed out by Hull & Covey (1986) the genome organization of reverse transcribing elements can best be compared in the RNA phase as shown in Fig. 4. The archetypal retrovirus RNA genomes (of replication competent retroviruses) comprises three gene domains: gag, pol and env. The gag gene encodes virion structural polypeptides, pol the reverse transcriptase (and RNase H) and an endonuclease activity thought to mediate provirus integration, and env encodes virion envelope glycoproteins that are thought to confer host range specificity. A similar organisation is exhibited by the other reverse transcribing elements but with some modifications. For instance, some retrotransposons (e.g. Ty element and copia) have apparently lost (or have not acquired) the env gene which perhaps reflects their properties as ‘non-infectious entities’. The AIDS retrovirus, now known as HIV (human immunodeficiency virus), has additional ORFs that adapts it to the infection of T 4 helper cells, the genetic organization of hepadnaviruses has been ‘telescoped’ to produce a small genome with overlapping ORFs, and the CaMV genome has additional ORFs that presumably are required for its propagation in and transmission between plant hosts. These variations aside, it would appear that reverse transcribing elements share a common evolutionary origin and diverged at an early stage although essential functions such as reverse transcription have been conserved even at the amino acid sequence level. A similar suggestion has been made by Mellor et al. (1986) and Kingsman et al. (this volume). 
A further similarity between these elements is the rather unusual mechanism adopted to express the gag and pol genes, in that their expression is linked to produce a polyprotein that is subsequently processed into individual polypeptides. In the yeast Ty element and the retrovirus Rous sarcoma virus it has been demonstrated that frame-shifting occurs between these two out-of-frame ORFs (Mellor et al. 1985a; Clare & Farabaugh, 1985; Jacks & Varmus, 1985), and in CaMV there is evidence that expression of at least three of its ORFs is linked (Sieg & Gronenborn, 1982; Dixon & Hohn, 1984). It is interesting to speculate that such a frameshift, which on inspection of Fig. 4 might be a common feature in reverse transcribing elements, might be associated with the interaction of the genomic RNA and the upstream gag product discussed earlier. In conclusion, the reverse transcribing elements appear to constitute a unique group in that, although they have greatly diversified throughout the taxonomic groups of eukaryotes, they have also retained many close similarities. It will be interesting to see if they are wholly a eukaryotic phenomenon or whether their progenitors are to be found amongst the prokaryotes as well.

REFERENCES

Ansa, O. A., Bowyer, J. W. & Shepherd, R. J. (1982). Evidence for replication of cauliflower mosaic virus DNA in plant nuclei. Virology 121, 147-156.
Berg, J. M. (1986). Potential metal-binding domains in nucleic acid binding proteins. Science 232, 485-487.
Clare, J. & Farabaugh, P. (1985). Nucleotide sequence of yeast Ty element: evidence for an unusual mechanism of gene expression. Proc. natn. Acad. Sci. U.S.A. 82, 2829-2833.
Covey, S. N. (1985). Organization and expression of the cauliflower mosaic virus genome. In Molecular Plant Virology, vol. 2 (ed. J. W. Davies), pp. 121-159. Boca Raton, Florida: CRC Press Inc.
Covey, S. N. (1986). Amino acid sequence homology in gag region of reverse transcribing elements and the coat protein gene of cauliflower mosaic virus. Nucl. Acids Res. 14, 623-633.
Covey, S. N. & Hull, R. (1985). Advances in cauliflower mosaic virus research. Oxford Surveys of Plant Molecular and Cellular Biology 2, 339-346. Oxford: Oxford University Press.
Covey, S. N. & Turner, D. S. (1986). Hairpin DNAs of cauliflower mosaic virus generated by reverse transcription in vivo. EMBO J. (in press).
Covey, S. N., Turner, D. S. & Mulder, G. (1983). A small DNA molecule containing covalently linked ribonucleotides originates from the large intergenic region of the cauliflower mosaic virus genome. Nucl. Acids Res. 11, 251-264.
Dickson, C., Eisenman, R. & Fan, H. (1985). Protein biosynthesis and assembly. In RNA tumor viruses, vol. 2 (ed. R. Weiss, N. Teich, H. Varmus & J. Coffin), pp. 135-146. Cold Spring Harbor Laboratory.
Dixon, L. K. & Hohn, T. (1984). Initiation of translation of the cauliflower mosaic virus genome from a polycistronic mRNA: evidence from deletion mutagenesis. EMBO J. 3, 2731-2736.
Donson, J. & Hull, R. (1983). Physical mapping and molecular cloning of caulimovirus DNA. J. gen. Virol. 64, 2281-2288.
Duplessis, D. H. & Smith, P. (1981). Glycosylation of the cauliflower mosaic virus capsid polypeptide. Virology 109, 403-408.
Favali, M. A., Bassi, M. & Conti, G. G. (1973). A quantitative autoradiographic study of intracellular sites for replication of cauliflower mosaic virus. Virology 53, 115-119.
Franck, A., Guilley, H., Jonard, G., Richards, K. & Hirth, L. (1980). Nucleotide sequence of cauliflower mosaic virus DNA. Cell 21, 285-294.
Garfinkel, D. J., Boeke, J. D. & Fink, G. R. (1985). Ty element transposition: reverse transcriptase and virus-like particles. Cell 42, 507-517.
Guilfoyle, T. J. (1980). Transcription of cauliflower mosaic virus genome in isolated nuclei from turnip Brassica rapa cultivar Just Right leaves. Virology 107, 71-80.
Guilfoyle, T., Olszewski, N., Hagen, G., Kuzj, A. & McClure, B. (1983). Transcription and replication of the cauliflower mosaic virus genome: Studies with isolated nuclei, nuclear lysates and extranuclear cell-free leaf extracts. In Plant Molecular Biology, UCLA Symposium (ed. R. Goldberg), pp. 117-136. New York: Alan R. Liss, Inc.
Guilley, H., Richards, K. E. & Jonard, G. (1983). Observations concerning the discontinuous DNAs of cauliflower mosaic virus. EMBO J. 2, 277-282.
Hahn, P. & Shepherd, R. J. (1980). Phosphorylated proteins in cauliflower mosaic virus. Virology 107, 295-297.
Hahn, P. & Shepherd, R. J. (1982). Evidence for a 58-Kilodalton polypeptide as precursor of the coat protein of cauliflower mosaic virus. Virology 116, 480-488.
Hohn, T., Hohn, B. & Pfeiffer, P. (1985). Reverse transcription in CaMV. Trends in biochemical Sciences 10, 205-209.
Hull, R. (1984). Caulimovirus Group. CMI/AAB Descriptions of Plant Viruses No. 295.
Hull, R. & Covey, S. N. (1983a). Does cauliflower mosaic virus replicate by reverse transcription? Trends in biochemical Sciences 8, 119-121.
Hull, R. & Covey, S. N. (1983b). Characterization of cauliflower mosaic virus DNA forms isolated from infected turnip leaves. Nucl. Acids Res. 11, 1881-1895.
Hull, R. & Covey, S. N. (1985). Cauliflower mosaic virus: pathways of infection. BioEssays 3, 160-163.
Hull, R. & Covey, S. N. (1986). Genome organization and expression of reverse transcribing elements: variations and a theme. J. gen. Virol. 67, 1751-1758.
Hull, R., Covey, S. N., Stanley, J. & Davies, J. W. (1979). The polarity of the cauliflower mosaic virus genome. Nucl. Acids Res. 7, 669-677.
Hull, R. & Donson, J. (1982). Physical mapping of the DNAs of carnation etched ring and figwort mosaic viruses. J. gen. Virol. 60, 125-134.
Hull, R. & Howell, S. H. (1978). Structure of the cauliflower mosaic virus genome. II. Variation in DNA structure and sequence between isolates. Virology 86, 482-493.
Hull, R. & Shepherd, R. J. (1976). The coat proteins of cauliflower mosaic virus. Virology 70, 217-220.
Hull, R., Sadler, J. & Longstaff, M. (1986). The sequence of carnation etched ring virus DNA: comparison with cauliflower mosaic virus and retroviruses. EMBO J. 5, 3083-3090.
Jacks, T. & Varmus, H. E. (1985). Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting. Science 230, 1093-1102.
Kamei, T., Rubio-Huertos, M. & Matsui, C. (1969). Thymidine-3H uptake by X-bodies associated with cauliflower mosaic virus infection. Virology 37, 506-508.
Marsh, L., Kuzj, A. & Guilfoyle, T. (1985). Identification and characterization of cauliflower mosaic virus replication complexes - analogy to hepatitis B viruses. Virology 143, 212-223.
Mason, W. S., Taylor, J. M. & Hull, R. (1987). Retroid virus genome replication. Adv. Virus Res. 32, 35-96.
Maule, A. J. (1985a). Replication of caulimoviruses in plants and protoplasts. In Molecular Plant Virology, vol. 2 (ed. J. W. Davies), pp. 161-190. Boca Raton, Florida: CRC Press.
Maule, A. J. (1985b). Partial characterisation of different classes of viral DNA, and kinetics of DNA synthesis in turnip protoplasts infected with cauliflower mosaic virus. Plant molec. Biol. 5, 25-34.
Maule, A. J. & Thomas, C. M. (1985). Evidence from cauliflower mosaic virus virion DNA for additional discontinuities in the plus strand. Nucl. Acids Res. 13, 7359-7373.
Mazzolini, L., Bonneville, J. M., Volovitch, M., Magazin, M. & Yot, P. (1985). Strand-specific viral DNA synthesis in purified viroplasms isolated from turnip leaves infected with cauliflower mosaic virus. Virology 145, 293-303.
Mellor, J., Fulton, S. M., Dobson, M. J., Wilson, W., Kingsman, S. M. & Kingsman, A. J. (1985a). A retrovirus-like strategy for expression of a fusion protein encoded by yeast transposon Ty1. Nature, Lond. 313, 243-246.
Mellor, J., Malim, M. H., Gull, K., Tuite, M. F., McCready, S., Dibbayawan, T., Kingsman, S. M. & Kingsman, A. J. (1985b). Reverse transcription activity and Ty RNA are associated with virus-like particles in yeast. Nature, Lond. 318, 563-586.
Mellor, J., Kingsman, A. J. & Kingsman, S. M. (1986). Ty, an endogenous retrovirus of yeast? Yeast 2, 145-152.
Menissier, J., Laquel, P., Lebeurier, G. & Hirth, L. (1984). A DNA polymerase activity is associated with cauliflower mosaic virus. Nucl. Acids Res. 12, 8769-8778.
Menissier-de Murcia, J., Geldreich, A. & Lebeurier, G. (1986). Evidence for a protein kinase activity associated with purified particles of cauliflower mosaic virus. J. gen. Virol. 67, 1885-1891.
Modjtahedi, N., Volovitch, M., Sossountzov, L., Habricot, Y., Bonneville, J. M. & Yot, P. (1984). Cauliflower mosaic virus-induced viroplasms support viral DNA synthesis in a cell-free system. Virology 133, 289-300.
Olszewski, N., Hagen, G. & Guilfoyle, T. J. (1982). A transcriptionally active, covalently closed minichromosome of cauliflower mosaic virus DNA isolated from infected turnip leaves. Cell 29, 395-402.
Patarca, R. & Haseltine, W. A. (1984). Sequence similarity among retroviruses. Nature, Lond. 309, 728.
Pfeiffer, P. & Hohn, T. (1983). Involvement of reverse transcription in the replication of cauliflower mosaic virus: a detailed model and test of some aspects. Cell 33, 781-789.
Pfeiffer, P., Laquel, P. & Hohn, T. (1984). Cauliflower mosaic virus replication complexes: characterization of the associated enzymes and of the polarity of the DNA synthesized in vitro. Plant molec. Biol. 3, 261-270.
Richards, K. E., Guilley, H. & Jonard, G. (1981). Further characterization of the discontinuities in cauliflower mosaic virus DNA. FEBS Lett. 134, 67-70.
Richins, R. D. & Shepherd, R. J. (1983). Physical maps of the genomes of dahlia mosaic virus and mirabilis mosaic virus - two members of the caulimovirus group. Virology 124, 208-214.
Sen, A., Sherr, C. J. & Todaro, G. J. (1976). Specific binding of the type C viral core protein p12 with purified viral RNA. Cell 10, 91-99.
Shiba, T. & Saigo, K. (1983). Retrovirus-like particles containing RNA homologous to the transposable element copia in Drosophila melanogaster. Nature, Lond. 302, 119-124.
Sieg, K. & Gronenborn, B. (1982). Introduction and propagation of foreign DNA in plants using cauliflower mosaic virus as a vector. Abstract NATO/FEBS Course on Structure and Functions of Plant Genomes, p. 154.
Summers, J. & Mason, W. S. (1982). Replication of the genome of a hepatitis B-like virus by reverse transcription of an RNA intermediate. Cell 29, 403-415.
Takatsuji, H., Hirochika, H., Fukushi, T. & Ikeda, J-E. (1986). Expression of cauliflower mosaic virus reverse transcriptase in yeast. Nature, Lond. 319, 240-243.
Thomas, C. M., Hull, R., Bryant, J. A. & Maule, A. J. (1985). Isolation of a fraction from cauliflower mosaic virus infected protoplasts which is active in the synthesis of (+) and (-) strand viral DNA and reverse transcription of primed RNA templates. Nucl. Acids Res. 13, 4557-4576.
Toh, H., Hayashida, H. & Miyata, T. (1983). Sequence homology between retroviral reverse transcriptase and putative polymerases of hepatitis B virus and cauliflower mosaic virus. Nature, Lond. 305, 827-829.
Toh, H., Kikuno, R., Hayashida, H., Miyata, T., Kugimiya, W., Inouye, S., Yuki, S. & Saigo, K. (1985). Close structural resemblance between putative polymerase of a Drosophila transposable element 17-6 and a pol gene product of Moloney murine leukaemia virus. EMBO J. 4, 1267-1272.
Turner, D. S. & Covey, S. N. (1984). A putative primer for the replication of cauliflower mosaic virus by reverse transcription is virion associated. FEBS Lett. 165, 285-289.
Varmus, H. E. & Swanstrom, R. (1982). Replication of retroviruses. In RNA tumour viruses, vol. 1 (ed. R. Weiss, N. Teich, H. Varmus & J. Coffin), pp. 369-512. New York: Cold Spring Harbor Laboratory.
Varmus, H. E. & Swanstrom, R. (1985). Replication of retroviruses. In RNA tumour viruses, vol. 2 (ed. R. Weiss, N. Teich, H. Varmus & J. Coffin), pp. 75-134. New York: Cold Spring Harbor Laboratory.
Volovitch, M., Modjtahedi, N., Yot, P. & Brun, G. (1984). RNA-dependent DNA polymerase activity in cauliflower mosaic virus-infected leaves. EMBO J. 3, 309-314.
Weiss, R., Teich, N., Varmus, H. & Coffin, J. (ed.) (1982). RNA Tumour Viruses, vol. 1. New York: Cold Spring Harbor Laboratory.
Weiss, R., Teich, N., Varmus, H. & Coffin, J. (ed.) (1985). RNA Tumour Viruses, vol. 2. New York: Cold Spring Harbor Laboratory.
Will, H., Salfeld, J., Pfaff, E., Manso, C., Theilmann, L. & Schaller, H. (1986). Putative reverse transcriptase intermediates of human hepatitis B virus in primary liver carcinomas. Science 231, 594-596.