VIROLOGY 219, 115–124 (1996) ARTICLE NO. 0228

The Escherichia coli Retrons Ec67 and Ec86 Replace DNA between the cos Site

View metadata, citation and similarand papers a Transcription at core.ac.uk Terminator of a 186-Related Prophage brought to you by CORE provided by Elsevier - Publisher Connector IAN B. DODD and J. BARRY EGAN1

Department of Biochemistry, University of Adelaide, Adelaide, South Australia 5005, Australia Received December 29, 1995; accepted February 28, 1996

Retrons are unusual, -encoding elements found in . Although there are a number of indica- tions that retrons are mobile elements, their transposition has not been observed. The Escherichia coli retrons Ec67 and Ec86 are different retrons inserted at the same site and we have further characterized this site in search of clues to the mechanism of retron transposition. We confirm, by extending previous sequence analysis, that Ec67 and Ec86 are inserted into prophages related to coliphage 186. Comparison with the recently published sequence of the 186 96–2% region indicates that the retrons have replaced Ç180 bp of DNA between the phage cohesive end site (cos) and the transcription terminator of a phage DNA-packaging . These features—DNA replacement at the insertion site and the location of retron junctions near transcription terminators or DNA cleavage sites—are shared with other retrons and suggest ways in which retron transposition might have occurred. ᭧ 1996 Academic Press, Inc.

INTRODUCTION 1990). The Ec67 insert is 2.1 kb while the Ec86 insert is 2.2 kb and includes an extra open reading frame (orf- A novel type of genetic element called a retron is re- 223) upstream of the msr-msd-ret sequences (Lim, 1991). sponsible for production of reverse transcriptase (RT) in The insertion position of Ec67 has been shown to be many bacterial strains (Inouye and Inouye, 1991, 1993). part of an even larger insert relative to the E. coli K12 Retrons consist of a reverse transcriptase gene (ret) pre- chromosome (Hsu et al., 1990). This 34-kb element is ceded by a short DNA region (msr-msd) involved in the flanked by 26-bp direct repeats of a sequence found at synthesis of an unusual DNA–RNA molecule termed the insertion site in the E. coli chromosome and contains msDNA. The small, single-stranded DNA moiety in this homologues of two of coliphage 186 among the molecule is encoded by msd and is produced by reverse 14 open reading frames identified, leading Hsu et al. to transcription of a portion of an msr-msd-ret RNA. An un- suggest as one possibility that the 34-kb element might usual self-priming mechanism results in the 5؅ end of be a prophage related to 186, with the 26-bp flanking this DNA being covalently linked to the 2؅ OH group of sequences being prophage attachment sites (Hsu et al., an internal residue of the RNA moiety. This short RNA 1990). 186 has a chromosome of 30 kb and forms an moiety is encoded by msr and appears to be that part of SOS-inducible prophage at 57 min in E. coli K12 (Bertani the msr-msd-ret RNA that remains after degradation of and Six, 1988; Woods and Egan, 1972, 1974). Data com- the RNA:DNA hybrid by RNase H activity. The function patible with the proposal that Ec86 and Ec67 are inserted of msDNA is not known. into a prophage were provided by Kirchner et al. (1992), Retrons were originally found in Myxobacteria, in who found that retron Ec86 was associated with an SOS- which they are widespread. More recently, retrons have inducible lethality. They suggested that the killing is also been found in some strains of Escherichia coli. The caused by SOS induction of the prophage-like element codon usage of these retrons is not typical of E. coli, and subsequent expression of cell lethal phage genes. suggesting that these elements have been acquired rela- Retronless derivatives were obtained and were found to tively recently (Inouye and Inouye, 1991). Retrons have have lost the SOS-inducible lethality and also to have now been found in a wide variety of bacterial species lost DNA sequences around the retron, consistent with (Rice et al., 1993). curing of the prophage (Kirchner et al., 1992; Lim, 1991). The first two E. coli retrons to be characterized, retrons Kirchner et al. named this presumptive retron-bearing Ec86 and Ec67, are inserted into the same DNA se- phage retronphage Ec86 (fR86) and concluded that it quence at the 19-min position of the chromosome in two was defective since no cell lysis or phage production E. coli strains (Hsu et al., 1990; Lim, 1991; Lim and Maas, could be detected. A similar situation is found with a third E. coli retron, 1 To whom correspondence and reprint requests should be ad- Ec73, which was found to lie in a P4-related prophage dressed. Fax: 61 8 303 4348. E-mail: [email protected]. (Sun et al., 1991). However, this phage, named fR73, is

0042-6822/96 $18.00 115 Copyright ᭧ 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

AID VY 7869 / 6a15$$$101 04-12-96 14:16:08 vira AP: Virology 116 DODD AND EGAN viable, being able to produce phage particles and infect to the left of the end of orf-i, the homologue of 186 cI other strains, carrying the retron and the capability for (see below). The 26-bp sequence is not related to the msDNA production with it (Inouye et al., 1991). A fourth 186 att sequence and, unlike 186 att (Reed et al., manu- E. coli retron, Ec83, may also be inserted in a phage script in preparation), does not appear to be part of a (Lim, 1992). However, a fifth retron, Ec107, lies in the tRNA gene. Interestingly, there is a 17/19 base match short region between the convergently transcribed E. coli between part of the fR67 common sequence and a se- pyrE and ttk genes and is not phage associated (Herzer quence 60 bp to the left of the 186 common att sequence, et al., 1992). The location of another E. coli retron, Ec78 suggestive of some relationship between the two att (Maas et al., 1994), has not been defined. sites. Although direct evidence for retron transposition has fR67 appears to have an intact 186-like lytic–lyso- not yet been obtained, it is probable from the above data genic control region. Three 186 genes, cI, apl, and cII, that the retron is a mobile element—that is able either are involved in a transcriptional switch controlling the to transpose or be transposed. Based on the properties activity of the pR (lytic) and pL (lysogenic) promoters. CI of reverse transcriptase-encoding mobile elements in eu- and Apl are repressors, CII is a transcriptional activator, karyotes (Boeke and Corces, 1989), retron transposition and the Apl protein also acts as an excisionase (Dodd presumably involves reverse transcription. However, no et al., 1990, 1993; Neufing et al., manuscript in prepara- model for retron retrotransposition has been proposed. tion; Reed et al., manuscript in preparation). fR67 con- The Ec86 and Ec67 retrons are particularly interesting tains homologues of these genes (orf-i, orf-a, and orf- with regard to retrotransposition because they are differ- b) as well as equivalent potential promoters, pa and pi, ent retrons inserted into the same locus (Lim, 1991). It detected by a computer program for promoter prediction is likely that there is some feature of this location that (Kalionis et al., 1986). 186 pR and pL are sited between favors retron acquisition and therefore further character- the divergent cI and apl genes and are arranged face- ization of this locus may provide insights into the mecha- to-face, with their transcripts overlapping by 62 bases. nism of retron transposition. fR67 pa and pi are also face-to-face, with a 40-base Here we demonstrate additional homologies between overlap between their predicted transcripts. Like Apl and the 34-kb element and coliphage 186, confirming that CII, the Orf-a and Orf-b proteins contain potential HTH the Ec67 and Ec86 retrons are inserted into 186-related DNA binding motifs [2.3 and 4.4 SD, respectively, with prophages to form the retronphages fR67 and fR86. the method of Dodd and Egan (1990)]. Orf-i also contains Comparison with the sequence of the 186 96–2% region, a likely HTH motif (4.0 SD) and, like 186 CI, does not which is presented by Brumby et al. (1996), indicates that appear to contain sequence determinants for RecA-pro- the above retrons appear to have replaced some 180 bp moted self-cleavage displayed by LexA, the lambdoid of target DNA between the cohesive end site (cos) and repressors, and other proteins (Lewis et al., 1994; Sauer .(the 3؅ end of a DNA-packaging gene of the retronphage. et al., 1982; Slilaty and Little, 1987 This and other features of the insertion points of Ec67, In the area between the switch region and the repli- Ec86, and other retrons suggest possible steps in retro- case gene, fR67 contains three potential genes that are transposition. strongly homologous with 186 genes (Fig. 1, Table 1). The fil gene product inhibits cell division (Richardson RESULTS and Egan, 1989); the functions of CP79 and CP80 are unknown. In this region there are four 186 genes not Similar genetic organization of 186, fR67, and fR86 present in fR67 and three putative fR67 genes, includ- The similarities in genetic organization and protein se- ing dam, not present in 186. None of the 186 genes in quence between 186 and the fR67 sequence of Hsu et this region are essential (R. Jarvinen, unpublished data). al. (1990) are summarized in Fig. 1 and Table 1 and A similar sharing of subsets of genes is found between are discussed below. The independently obtained DNA 186 and its relative P2 in this region and homologies sequence data for fR67 of Lim (1991) lies within the Hsu between P2 and fR67 have also been noted (Liu et al., et al. (1990) sequence. Lim has also determined the DNA 1993). sequence of retron fR86 and its surrounding regions. Homology with the 186 replicase gene, A, is dispersed Outside of the retrons, the sequences of fR67 and fR86 between four reading frames in fR67 (Fig. 2). are almost identical, so that the two phages appear to homology between fR67 orf-2 and orf-3 reading frames be very close strains of the same virus. and the 186 and P2 A proteins has been noted previously The 26-bp sequence found at the left boundary of fR67 (Ilyina and Koonin, 1992; Liu et al., 1993; essentially re- with the host chromosome, at the right boundary (25/26 gions b and c in Fig. 2). These homologies, and additional identity), and again (23/26 identity) at the 19-min locus homology to 186 A in an orf of 45 amino acids (orf-45) on the K12 chromosome is consistent with a phage at- and in orf-e of Hsu et al. (1990), suggest that an intact tachment site (Hsu et al., 1990). From the data of Hsu et fR67 replicase gene could be constructed by three al., this presumptive att site of fR67 is located Ç1.9 kb frameshift mutations linking the four frames in the order

AID VY 7869 / 6a15$$$102 04-12-96 14:16:08 vira AP: Virology RETRONS IN A 186-LIKE 117

FIG. 1. Similar genetic organization of 186 and fR67. Known and predicted genes in 186 (Brumby et al., 1996; Kalionis et al., 1986; Richardson et al., 1989; Sivaprasad et al., 1990) and fR67 (Hsu et al., 1990) are indicated by shaded boxes. Slashes within a gene indicate frameshift mutations required to maintain the predicted reading frame (see text). Known and predicted promoters are shown as bent arrows. The presumptive att site of fR67 has been mapped 1.9 kb to the left of orf-i. The point of insertion of Ec67 into the phage is shown to the right of the cohesive end (cos). The Ec86 retron is inserted into the same locus in the phage fR86, a phage very closely related to fR67 (Lim, 1991). The Ç1kbof fR86 sequence known on each side of the Ec86 insertion point shows only Ç100 base pair differences with fR67 (Lim, 1991). Amino acid homology between 186 and fR67 genes (Table 1) is indicated by light shading. orf-45/orf-2/orf-3/orf-e. It is likely that these four orfs rep- also link regions of E. coli-like codon usage (Gribskov et resent a disrupted replicase gene because only orf-45 al., 1984; Staden and McLachlan, 1982) within the orfs. is preceded by a likely ribosome binding site (RBS; In our laboratory, we routinely use the existence of Stormo et al., 1982) and because the frameshifts would frameshifts in codon usage in overlapping orfs to flag the

TABLE 1 Homology between fR67 and 186 Genes

Protein homology fR67 186 Gaps/ Gene Positiona aab Gene aa % Identity lengthc Functiond orf-i 625–32 198 cl 192 30 3/191 Lysogenic repressor orf-a e 721–939 73 apl 87 26f 4/83 Excisionase/repressor orf-b 975–1481 169 cll 169 69 0/169 Establishment of lysogeny orf-66 1492–1689 66 fil 75 65 1/75 Inhibits cell division orf-c 1656–1994 113 — orf-77 2065–2295 77 CP79 77 44 0/77 ?—Nonessential orf-75 2298–2522 75 CP80 75 68 0/75 ?—Nonessential orf-d 2522–2821 100 — dam 2821–3675 285 — Dam methylase orf-45 3675–3809 805g A 694 See Fig. 2 Replicase orf-2 3734–5134 orf-3 5017–5625 orf-e 5625–6086 orf-40 6243–6362 143g tum 146 See Fig. 3 Antirepressor for SOS orf-33 6329–6427 induction of prophage orf-f 6369–6671 orf-g 7203–6889 105 — orf-52 9644–9799 341g orf2 ú190h DNA packaging? orf-5 10667–9690

a Nucleotide position on sequence of Hsu et al. (1990). b Number of amino acids in deduced protein. c Number of gaps and length of the BESTFIT alignment (GCG, 1994). d Function of 186 protein (see text) except for fR67 dam. e Starts taken from a likely RBS within the orf-a of Hsu et al. (1990). f This alignment scores 5.9 SD higher than the mean scores of 100 BESTFIT alignments of randomized Apl and Orf-a sequences. g Length of protein assuming frameshifts to link the listed orfs (Figs. 2 and 3 and text). .h orf2 sequence is of 3؅ end only

AID VY 7869 / 6a15$$$102 04-12-96 14:16:08 vira AP: Virology 118 DODD AND EGAN

virtually identical to the fR67 sequence but contains a different frameshift [position 6542 of Hsu et al. (1990)] that would remove the last 40 codons. Kirchner et al. (1992) have suggested that SOS induction of fR86 may explain the UV sensitivity of strains carrying the pro- phage. However, for the tum-like gene to be involved in this lethality, we would expect an intact open reading frame under LexA control. FIG. 2. fR67 contains a disrupted replicase gene. Homology to the fR67 and fR86 do not show any homology with the 186 replicase protein A (Sivaprasad et al., 1990) is present in four second gene orf97 of the LexA controlled of 186 overlapping reading frames in the fR67 sequence. The open reading (Brumby et al., 1996). In its place, in both fR67 and fR86, frames orf-2, orf-3, and orf-e (Hsu et al., 1990) are shown as boxes. is a leftward reading orf [called orf-g (Hsu et al., 1990), Each orf has been extended to the first upstream stop codon, with the position of the first ATG codon indicated by an open triangle. An orf see Fig. 1; or orf-111 (Lim and Maas, 1990)]. The orf-111 of 45 codons (orf-45) starting from a likely ribosome binding site region encodes a function able to cure a cell of the fR86 (GAGGT-6-ATG; solid triangle) overlaps orf-2. Its putative start codon prophage (Lim, 1991). This activity is puzzling, since exci- overlaps the end of the fR67 dam gene (Table 1). E. coli-like codon sion of 186 is catalyzed by Apl and Int, yet Orf-111 is not usage within the frames (see text) is indicated by light shading. Using homologous to either of these proteins. Perhaps Orf-111 the GCG programs TFASTA and BESTFIT (GCG, 1994), six blocks of significant amino acid similarity to the 186 replicase (dark shading) acts by derepressing the defective fR86 prophage. were found in the four frames. The following list gives for each pair of An alignment of cos sequences from P2-related bacte- sequences the amino acid positions in 186 A, the nucleotide positions riophages with similar sequences in fR67 and fR86 is in the fR67 sequence, the % aa identity, and the gaps/length of the shown in Fig. 4. The P4 and fR73 cos sites are included, alignment: a/a؅, 7–17, 3726–3758, 64%, 0/11; b/b؅, 204–361, 4457– since these phages use the terminase systems of P2- –c/c؅, 370–505, 4957–5358, 48%, 1/136; d/d؅, 509 ;0/138 ,37% ,4930 e/e؅, 541–621, 5493–5729, 35%, 2/83; f/f؅, related phages (Bertani and Six, 1988). It is clear that ;0/29 ,31% ,5484–5398 ,537 623–653, 5799–5891, 45%, 0/31. fR67 and fR86 have cos-like sequences. Within the 53- bp minimal region shown to specify cos function in P2 and P4 (Ziermann and Calendar, 1990; shown overlined possibility of sequencing errors, and this is one possible in Fig. 4), there are 43 positions fully conserved between explanation of the data. Alternatively, if the sequence is correct and the fR86 replicase gene is similarly dis- rupted, then it explains the inability to isolate viable fR86 (Kirchner et al., 1992), since an intact replicase gene would be essential for phage production. fR67 also appears to contain a disrupted tum-like gene. The 186 tum gene encodes a protein that causes 186 prophage induction (Brumby et al., 1996). The pro- moter for tum, p95, is under LexA control and is over- lapped by a LexA binding sequence (Lamont et al., 1989). Inactivation of LexA during the SOS response thus leads FIG. 3. fR67 contains a disrupted prophage induction antirepressor to Tum production and prophage induction. Amino acid gene. Three overlapping reading frames in fR67 show similarity to the homology to Tum is apparent in three reading frames in 186 antirepressor gene tum that is responsible for SOS induction of the fR67 sequence (Fig. 3), one of which is the orf-f of prophage (Brumby et al., 1996; Lamont et al., 1989). The open reading Hsu et al. (1990). Two frameshifts would be required to frames orf-f (Hsu et al., 1990), orf-40, and orf-33 are shown as boxes. produce an intact tum-like gene of 143 codons, beginning orf-f and orf-33 have been extended to the first upstream stop codon, with the position of the first ATG codon in orf-f indicated by an open with a likely RBS and displaying E. coli-like codon usage. triangle. Likely ribosome binding sites are shown by solid triangles: There is also a likely internal translation start site in this the orf-40 RBS is GGAGG-4-ATG; the internal start in the orf-f frame is gene, equivalent to an established internal translation GAGGT-7-ATG. E. coli-like codon usage within the frames (see text) is initiation site within tum (orf95.5; Brumby et al., 1996; Fig. indicated by light shading. Three blocks of amino acid similarity to 186 3). A potential promoter, p , is predicted upstream of the Tum were found in the three frames (indicated by dark shading). The f following list gives for each pair of sequences, the amino acid positions disrupted tum-like gene by our computer program and in 186 Tum, the nucleotide positions in the fR67 sequence, the % aa ,this promoter is overlapped by a likely LexA binding se- identity, and the gaps/length of the alignment: a/a؅, 7–26, 6261–6320 –quence (Fig. 3). Again, it is not clear whether the 25%, 0/20; b/b؅, 32–54, 6359–6421, 26%, 1/23; c/c؅, 75–145, 6453 frameshifts in the tum-like gene are caused by sequenc- 6665, 45%, 0/71. As with 186 tum, the fR67 tum-like gene is preceded ing errors or mutations. The sequence of fR67 of Lim by a potential promoter overlapped by a LexA binding site. The pro- moter, pf, has the 035 and 010 sequences TTTACA-17-TACTGT. The (1991) starts in the middle of orf-f of Hsu et al. (1990) putative LexA binding sequence TACTGTAAATATAAACAGTG has a and thus does not overlap the predicted frameshifts. The lEN20 score of 5.89, placing it in the top scoring half of established sequence of fR86 in the same region (Lim, 1991) is LexA sites (Lewis et al., 1994).

AID VY 7869 / 6a15$$$102 04-12-96 14:16:08 vira AP: Virology RETRONS IN A 186-LIKE BACTERIOPHAGE 119

FIG. 4. The retron junctions in fR67 and fR86. Sequences at the left and right junctions of the Ec67 and Ec86 retrons (Hsu et al., 1990; Lim, 1991) are shown aligned with homologous sequences from 186-related : 186 (Brumby et al., 1996), P2 (Hagga˚ rd-Ljungquist et al., 1989; Linderoth et al., 1991), PSP3 (Bullas et al., 1991), P4 (Halling et al., 1990), and fR73 (Sun et al., 1991). The predicted junctions are indicated by bent arrows, with the retron sequences shown in boldface. The first base of Ec67 msr and the initiation codon of Ec86 orf-223 are boxed. At the left side, cohesive end (cos) sequences from the five viable phages are aligned with similar sequences from fR67 and fR86. The thin line shows the minimal region needed for cos function in P2 and P4 (Ziermann and Calendar, 1990) and the triangles indicate the cos cleavage points. Bases shared between all five viable phages or between all seven sequences are shaded. At the right side, the similarity in DNA and amino acid sequence between the leftward P2 DNA packaging gene Q and its homologues in 186, fR67, and fR86 is shown. Bases conserved in all four phages and amino acids in fR67 and fR86 Orf-341 that are found in 186 Orf2 or P2 Q are shaded. Between cos and the DNA packaging genes, most of the retron sequences have been omitted. Homology between 186 and P2 and the partial PSP3 DNA sequence in this region is shown. The stop codons of the ret and Q/orf2/orf-341 genes are indicated by three-sided boxes and the putative transcription terminator sequences are indicated with arrows and dashed lines. The open circle marks the position predicted by (Lim, 1991) to be the right retron junction. the five viable phages and only 5 of these are not con- homologue, orf2 (Brumby et al., 1996). This, and further served in fR67/fR86. This suggests that the fR67/fR86 homology between fR67 and the P2 P gene, has been cos sites would be functional. noted by Linderoth et al. (1991). Again, however, the Q/ To the right of the retron insertion point in fR67 and orf2 gene equivalent, which we term orf-341, is disrupted fR86 there is homology to the leftward reading P2 gene in the fR67 and fR86 sequences. The 326-codon fR67 Q needed for packaging DNA into phage heads (Linder- orf-5 gene of Hsu et al. (1990) starts with a likely RBS, (oth et al., 1991) and to the partial sequence of its 186 and a single frameshift near its 3؅ end (position 9776

AID VY 7869 / 6a15$$$102 04-12-96 14:16:08 vira AP: Virology 120 DODD AND EGAN would restore orf-341. This portion of the frame is intact tracts, are indicated in Fig. 4. In this same region of P2 in the fR67 sequence of Lim (1991). However, in Lim’s and 186 likely terminators for the leftward Q/orf2 tran- sequence of fR67 and fR86, a different frameshift is scripts are found. Interestingly, the putative msd-msr-ret required to fuse the start of orf-341 to his orf-336. This terminators in fR67 and fR86 also seem capable of portion of the frame is intact in the Hsu et al. (1990) terminating leftward orf-341 transcription, since there are sequence. The intact Orf-341 proteins show Ç60% amino A-rich tracts to the left of the inverted-repeat sequences acid identity to the 186 and P2 equivalents. (Fig. 4). From these data the most likely position of the right retron boundary is within this transcription termina- The retron insertion point tor region, just to the left of the conserved T (Fig. 4). On the basis of the 186 and P2 sequences, this place- As with all known retrons, the lack of knowledge of ment of the retronphage junctions suggests that approxi- the target site sequence immediately before and after mately 180 bp of DNA has been replaced upon retron the insertion event precludes precise and confident anal- insertion. The removed sequences have no known func- ysis of the retron insertion point. Nevertheless, the simi- tion; rather, insertion appears to have left intact the larities between 186 and the retronphages allows us to known essential functions in this region, the cos site and make confident predictions about the sequences prior the DNA packaging gene. 186 and P2 heads are able to to retron insertion. Our examination suggests that inser- accommodate with insertions of at least 1.3 kb tion of the retrons involved replacement of some 180 bp (Bertani and Six, 1988). Thus, retron insertion itself may of sequence between cos and the DNA packaging gene. not have inactivated the recipient bacteriophage. Figure 4 shows the regions around the left and right boundaries of the retron. As pointed out by Lim (1991), DISCUSSION: CONCERNING THE MECHANISM the left boundary of each retron is very clear, with fR67 OF RETRON TRANSPOSITION and fR86 diverging from each other 7 bp to the right of the rightmost cos cleavage point. The retron sequences Three features of the retron insertion point in fR67 also diverge from the phage sequences at this point. and fR86 seem potentially significant with regard to any mechanism of retron transposition. These are the appar- Similarity between the P2 and 186 sequences (and the PSP3 sequence as far as it is known) continues beyond ent replacement of sequences at the target site, the ap- parent location of one junction within transcription termi- cos and into the Q/orf2/orf-341 genes. In contrast to the left retron boundary, the right junction nator sequences, and the location of a DNA processing site close to the other junction. These characteristics, of each retron is more difficult to define. Lim (1991) has suggested that the junctions lie approximately 13 codons which are shared with some other retrons, led us to some speculations about their possible roles in retrotransposi- before the end of the orf-341 gene, at a point where the tion. Another important feature of retrons is their apparent nucleotide similarity between fR67 and fR86 becomes particularly strong (marked by an open circle in Fig. 4). ability to transpose sequences upstream of the msr-msd- ret RNA. We discuss these four features below. Comparison with the 186 and P2 sequences suggests that the boundary is further left than this. Figure 4 shows Replacement of target site DNA that there is considerable nucleotide and amino acid homology between the four sequences right up to the 3؅ Previously, replacement of DNA at the target site was end of orf-341. We would place the right junctions no apparent only for retron Ec107, which appears to have further right than the conserved T within the stop codon removed 34 bp between the E. coli pyrE and ttk genes of orf-341 (Fig. 4). There is no compelling evidence from (Herzer et al., 1992). With our analysis of Ec67 and Ec86, comparison of the sequences of the Q, Orf2, and Orf-341 target site DNA replacement is now likely for all three proteins (Fig. 4) that retron insertion has inactivated the retrons whose left and right junctions can be reasonably DNA packaging gene. The sequence differences be- precisely predicted. In fR73 the right retron junction is tween fR67 and fR86 to the right of this point are ex- not clear (Sun et al., 1991); however, removal of the mini- plainable as due to natural variation between the fR67 mal retron insert leaves Ç140 bp less DNA between the and the fR86 phages prior to retron acquisition or to a gene and the att site in fR73 than there is in its relative divergence occurring since retron insertion. To the left P4, suggesting DNA replacement for this retron also. of this position, the retronphage sequences diverge from Only the left junction is known for Ec83 (Lim, 1992) and each other and the 186 and P2 sequences. The stop neither junction is known for Ec78. Removal of target site codons of the ret genes in fR67 and fR86 lie 42 and DNA is not a normal feature of known retroviral and 18 bp, respectively, to the left of this conserved T. Imme- retrotransposable elements or DNA transposable ele- diately to the left of this T are sequences that are likely ments such as transposons and insertion sequences, all to function as transcriptional terminators for the of which integrate their replication intermediates by a rightward msd-msr-ret mRNA. These sequences, which single-site recombination event mediated by an element- could form RNA stem loops followed by short poly(U) encoded DNA endonuclease (Polard and Chandler,

AID VY 7869 / 6a15$$$102 04-12-96 14:16:08 vira AP: Virology RETRONS IN A 186-LIKE BACTERIOPHAGE 121

transcript. The most straightforward possibility is that transposition requires a longer transcript initiating out- side of the retron. Another possibility, which is attractive because it would make the retron independent of exter- nal promoters, is that the left junction sequences are provided by the activity of a leftward promoter within the msr-msd-ret region. If this ret؅ transcript overlapped the ret transcript, then the two transcripts would be able to provide a complete retron sequence. Figure 5 shows a speculative scheme by which the activities of reverse transcriptase, RNase H, and DNA-dependent DNA poly- merases could produce a single dsDNA retron fragment -from overlapping retron . Whether such ret؅ tran scripts are produced by retrons is not known, although -we have detected potential ret؅ promoters in the se quences of a number of retrons. We believe that further investigation of retron transcription is essential.

Transcription terminator-mediated junction formation

How might the two junction formation events required for retron insertion be achieved? Common features of retron junctions in Ec67, Ec86, and Ec107 (Herzer et al., 1992) suggest one possible mechanism (Fig. 6). The right junction of each of these retrons displays features ex- -pected for a 3؅ end of an mRNA, with a potential tran scription terminator sequence following the ret stop co- FIG. 5. Use of a leftward retron RNA to transfer upstream sequences. don. In these three cases, the retron has inserted just to A hypothetical scheme whereby retrons could use reverse transcription the left of a leftwardly transcribed gene (orf-341 or ttk) to copy sequences upstream of the promoter (pret) for the msr-msd-ret such that, after insertion, ret transcription converges with tRNA, without depending on promoters outside of the retron. DNA is transcription of this gene. The potential transcription ter- denoted by thick lines, and RNA by thin lines with a bullet at the 5؅ end. Promoters are indicated by bent arrows. minator for the rightward ret mRNA in these three retrons seems likely to act also as the terminator for the leftward transcript, since the stem loop sequence of the ret mRNA 1995). Removal of target site DNA by retrons suggests terminator is preceded by an A-rich region that could that junction formation events at two separate sites are encode the poly(U) sequence for a leftward termination required for retron insertion. This may explain in part the signal (Fig. 4; Herzer et al., 1992). We suggest that hybrid- apparent rarity of retron transposition. ization of the terminator sequence at the end of ret mRNA from the donor retron site with the terminator sequence Transposition of left junction sequences of the target site orf341 or ttk mRNA could have formed these junctions (Fig. 6). In such a complex, the 3؅ end The right junctions of retrons with target site DNA lie of one mRNA could function as the primer for reverse downstream of the ret gene. It is likely that the DNA transcription of the other mRNA. The initial base pairing sequences brought by the retron to this junction are de- need not be very strong, provided that a transient interac- rived by reverse transcription of a rightward ret mRNA tion was cemented by RT and extension of at least one produced from the donor retron site (Fig. 5). Transcription of the 3؅ ends (as shown in Fig. 6). RNase H degradation of retrons has not been examined extensively but, in a of most of the RNA in the resultant DNA:RNA duplex -number of cases, a rightward transcript is known to begin could then provide a 3؅ OH group to prime DNA-depen close to the start of msr and to extend into the ret gene dent DNA polymerase synthesis of the second DNA (Herzer et al., 1992; Inouye and Inouye, 1991; Lim and strand. Incomplete RNase H digestion is a feature of

Maas, 1990). It is thought that the promoter (pret) for this msDNA synthesis (Inouye and Inouye, 1991). Once the msr-msd-ret mRNA lies just upstream of msr and within second DNA strand reached the RNA–RNA hybrid, poly- the sequences introduced at the target site. The left ret- merization would presumably stop. Extension from this ron junction lies between 60 and 980 bp to the left of DNA 3؅ end by RT would then copy the second mRNA msr. Thus, there are sequences introduced at the left into DNA (Fig. 6). Further RNase H and DNA-dependent junction that cannot be provided by the known retron DNA polymerase activity could produce a dsDNA frag-

AID VY 7869 / 6a15$$$103 04-12-96 14:16:08 vira AP: Virology 122 DODD AND EGAN

tered by RT. Shimamoto et al. (1993) have shown that msDNA synthesis by RT is sensitive to the presence of certain secondary structures in the template RNA. An abil- ity of an RT complex to simultaneously bind two RNA stem loops could stimulate pairing of terminator structures. This terminator-mediated model for junction formation may also explain a fourth junction, the left junction of the Ec107 retron, where there is another terminator-like sequence downstream of the rightwardly transcribed pyrE gene (Herzer et al., 1992). Formation of this junction by this mechanism would require hybridization of the pyrE mRNA terminator with a terminator at the end of a -ret؅ RNA. Thus, terminator-mediated insertion is a possi ble explanation of four of the nine known junctions of E. coli retrons (two junctions are known for Ec67, Ec86, Ec73, and Ec107; one is known for Ec83). Such a mecha- nism of insertion would seem to be advantageous for retrons because it would tend to minimize disruption of .

Association with DNA processing sites

The other characterized junctions, including the left junctions of the retrons in fR67 and fR86, do not seem associated with transcription terminators. It seems more than a coincidence that four of these five junctions are known to be or are likely to be located close to sites of sequence-specific endonuclease action, cos sites in Ec67 and Ec86 (Fig. 4) and att sites in Ec73 and Ec83 (Lim, 1992; Sun et al., 1991). A role for DNA cleavage sites in retron insertion is suggested by a model proposed by FIG. 6. Terminator-mediated junction formation. A hypothetical Eickbush and co-workers for retrotransposition of the scheme to explain the association of retron junctions with transcription terminator-like sequences. For formation of a right retron junction (as non-LTR element R2Bm in the silkworm Bombyx mori. in Ec67, Ec86, and Ec107), rightward ret mRNA from the donor retron R2Bm encodes a sequence-specific endonuclease activ- site would hybridize to a leftward RNA from the target site. A similar ity which nicks the target site DNA (Xiong and Eickbush, scheme could form a left retron junction (as in Ec107), with hybridization 1988). The resulting DNA 3؅ OH group is then used to -between a leftward (ret؅) RNA from the donor retron site (see Fig. 5) prime reverse transcription with the R2Bm RNA as tem hybridizing to a rightward mRNA from the target site. plate (Luan et al., 1993). This priming reaction does not seem to require DNA–RNA hybridization and it is thought ment containing the retron–target junction. This fragment that DNA and RNA are brought together by the endonu- would then be available for general RecA-mediated re- clease and RT protein moieties. If retrons can utilize such combination with the homologous sequences in the tar- a mechanism but do not encode their own dsDNA endo- get chromosome. nuclease activity, then the association of retron junctions The most unlikely part of this model is the initial RNA with DNA processing sites may reflect a requirement for hybridization event. We propose that the A-rich region foreign endonucleases, such as integrases or termi- preceding the ret terminator could hybridize to the U tract nases, to produce nicks in DNA. However, it is clear that at the end of the orf-341 or ttk mRNA (Fig. 4). The ttk the retrons have not inserted at the normal site of att or terminator is also preceded by an A-rich region, which cos cleavage by the integrase or terminase proteins (see could hybridize to the U tract at the end of the ret mRNA, Fig. 4). Such insertions would harm the phage and would providing a double interaction. However, neither of these thus be likely to be selected against. Perhaps the activity interactions is likely to be strong, and further pairing, in- of these integrase and terminase proteins is not error- volving the terminator stems, would be in competition with free and may result in occasional accidental DNA nicks the intramolecular pairing of the stems. The improbability localized near their normal sites of action which could of the pairing event is perhaps consistent with the infre- be utilized by the retron without harming the phage. We quency of retron transposition. Nevertheless, we propose note that the DNA sequence just 5؅ to the left retron that the terminator–terminator interactions may be fos- junction point of Ec67 and Ec86 is similar to a conserved

AID VY 7869 / 6a15$$$103 04-12-96 14:16:08 vira AP: Virology RETRONS IN A 186-LIKE BACTERIOPHAGE 123 tion of a ret؅ RNA from Ec67 and Ec86 (Fig. 7b). The events depicted in Fig. 7b would constitute the other junction formation event needed for insertion of Ec67 and Ec86. The lack of transcription terminator-like sequences at these DNA processing site-type junctions implies that in these cases the 3؅ end of the ret؅ RNA is formed by RNA processing rather than by termination.

Prospects Although the models described above are very specu- lative, they serve at least to highlight shared features of retrons that may be important in retron transposition. A number of aspects of the models are currently testable, particularly those concerning proposed activities of the retron RTs. The crucial requirement for investigation of retron transposition is a system that allows observation of transposition events. We hope that our description of the Ec67 and Ec86 insertion points provides a stimulus to the development of such a system.

ACKNOWLEDGMENTS

The authors are grateful to Masayori Inouye and Dongbin Lim for communication of unpublished data and to the members of the Egan laboratory for comments on the manuscript. Research in the Egan laboratory is supported by the Australian Research Council.

REFERENCES

Bertani, L. E., and Six, E. W. (1988). ‘‘The P2-like Phages and Their Parasite P4’’ (R. Calendar, Ed.), Vol. II. Plenum, New York. Boeke, J. D., and Corces, V. G. (1989). Transcription and reverse tran- scription of . Annu. Rev. Microbiol. 43, 403. Brumby, A. M., Lamont, I., Dodd, I. B., and Egan, J. B. (1996). Defining the SOS operon of coliphage 186. Virology 219, 105–114. Bullas, L. R., Mostaghimi, A. R., Arensdorf, J. J., Rajadas, P. T., and Zuccarelli, A. J. (1991). Salmonella phage PSP3, another member of FIG. 7. DNA nick-mediated junction formation. (a) The left retron the P2-like phage group. Virology 185, 918–921. junction of Ec67 and Ec86 may be a site of infrequent, accidental Dodd, I. B., and Egan, J. B. (1990). Improved detection of helix-turn- DNA nicking by the phage terminase protein. Sequences conserved helix DNA-binding motifs in protein sequences. Nucleic Acids Res. at the normal terminase cleavage positions (from the five viable 18, 5019–5026. phages in Fig. 4) are represented as a consensus (thick box; S Å G Dodd, I. B., Kalionis, B., and Egan, J. B. (1990). Control of gene expres- or C) with the cleavages indicated by solid triangles. It is not known sion in the temperate coliphage 186. VIII. Control of lysis and lysog- whether these sequences are important in specifying terminase cleav- eny by a transcriptional switch involving face-to-face promoters. J. age. A hypothetical, rare terminase cleavage is indicated by the open Mol. Biol. 214, 27–37. triangle along with the consensus of the sequences in fR67 and Dodd, I. B., Reed, M. R., and Egan, J. B. (1993). The Cro-like Apl repressor fR86 (thin box) that lie just to the left of the retron junction (see Fig. of coliphage 186 is required for prophage excision and binds near 4). Matches of the fR67 and fR86 sequences to the normal cleavage the phage attachment site. Mol. Microbiol. 10, 1139–1150. site sequence consensus are shown in boldface. (b) A hypothetical GCG (1994). ‘‘Program Manual for the Wisconsin Package, Version 8.’’ scheme, based on the mechanism of R2Bm insertion (see text) to GCG, 575 Science Drive, Madison, WI 53711. explain the association of retron junctions with DNA processing sites Gribskov, M., Devereux, J., and Burgess, R. R. (1984). The codon prefer- (cos and att sites). The scheme relies on a DNA nick created near ence plot: Graphic analysis of protein coding sequences and predic- the cos or att site by a terminase or integrase protein. For formation tion of gene expression. Nucleic Acids Res. 12, 539–549. of the left retron junction of Ec67, Ec86, and Ec83, the retron RNA Hagga˚ rd-Ljungquist, E., Barreiro, V., Calendar, R., Kurnit, D. M., and would be a leftward (ret؅) RNA from the retron donor site (see Fig. 5). Cheng, H. (1989). The P2 phage old gene: Sequence, transcription For formation of the right retron junction of Ec73, the retron RNA and translational control. Gene 85, 25–33. would be a rightward ret mRNA. Halling, C., Calendar, R., Christie, G. E., Dale, E. C., Deho, G., Finkel, S., Flensberg, J., Ghisotti, D., Kahn, M., Lane, M. L., Lin, C.-S., Lindqvist, B. H., Pierson, L. H., III, Six, E. W., Sunshine, M. G., and Ziermann, sequence just 5؅ of the cos cleavage sites (Fig. 7a). Acci- R. (1990). DNA sequence of satellite bacteriophage P4. Nucleic Acids dental cleavage of this site by the terminase might pro- Res. 18, 1649. vide a 3؅ OH for R2Bm-like priming of reverse transcrip- Herzer, P., Inouye, S., and Inouye, M. (1992). Retron-Ec107 is inserted

AID VY 7869 / 6a15$$$103 04-12-96 14:16:08 vira AP: Virology 124 DODD AND EGAN

into the Escherichia coli genome by replacing a palindromic 34bp mosomal target site: A mechanism for non-LTR transposition. Cell intergenic sequence. Mol. Microbiol. 6, 345–354. 72, 595–605. Hsu, M.-Y., Inouye, M., and Inouye, S. (1990). Retron for the 67-base Maas, W. K., Wang, C., Lima, T., Zubay, G., and Lim, D. (1994). Multicopy multicopy single-stranded DNA from Escherichia coli: A potential single-stranded with mismatched base pairs are mutagenic transposable element encoding both reverse transcriptase and Dam in Escherichia coli. Mol. Microbiol. 14, 437–441. methylase functions. Proc. Natl. Acad. Sci. USA 87, 9454–9458. Polard, P., and Chandler, M. (1995). Bacterial transposases and retrovi- Ilyina, T. V., and Koonin, E. V. (1992). motifs in ral integrases. Mol. Microbiol. 15, 13–23. the initiator proteins for rolling circle replication encoded by diverse Rice, S. A., Bieber, J., Chun, J.-Y., Stacey, G., and Lampson, B. C. (1993). replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Diversity of retron elements in a population of Rhizobia and other Acids Res. 20, 3279–3285. gram-negative bacteria. J. Bacteriol. 175, 4250–4254. Inouye, M., and Inouye, S. (1991). msDNA and bacterial reverse tran- Richardson, H., and Egan, J. B. (1989). DNA replication studies with scriptase. Annu. Rev. Microbiol. 45, 163–186. coliphage 186: II. Depression of host replication by a 186 gene. J. Inouye, S., and Inouye, M. (1993). The retron: A bacterial retroelement Mol. Biol. 206, 59–68. required for the synthesis of msDNA. Curr. Opin. Genet. Dev. 3, 713– Richardson, H., Puspurs, A., and Egan, J. B. (1989). Control of gene 718. expression in the P2-related temperate coliphage 186. VI. Sequence Inouye, S., Sunshine, M. G., Six, E. W., and Inouye, M. (1991). Retron- analysis of the early lytic region. J. Mol. Biol. 206, 251–255. phage FR73: An E. coli phage that contains a retroelement and Sauer, R. T., Yocum, R. R., Doolittle, R. F., Lewis, M., and Pabo, C. O. integrates into a tRNA gene. Science 252, 969–971. (1982). Homology among DNA-binding proteins suggests use of a Kalionis, B., Dodd, I. B., and Egan, J. B. (1986). Control of gene expres- conserved super-secondary structure. Nature 298, 447–451. sion in the P2-related temperate coliphages. III. DNA sequence of Shimamoto, T., Hsu, M.-Y., Inouye, S., and Inouye, M. (1993). Reverse the major control region of phage 186. J. Mol. Biol. 191, 199–209. transcriptase from bacterial retrons require specific secondary struc- .Kirchner, J., Lim, D., Witkin, E. M., Garvey, N., and Roegner-Maniscalco, tures at the 5؅ end of the template for the cDNA priming reaction. J V. (1992). An SOS-inducible defective retronphage (fR86) in Esche- Biol. Chem. 268, 2684–2692. richia coli strain B. Mol. Microbiol. 6, 2815–2824. Sivaprasad, A. V., Jarvinen, R., Puspurs, A., and Egan, J. B. (1990). DNA Lamont, I., Brumby, A. M., and Egan, J. B. (1989). UV induction of coli- replication studies with coliphage 186. III. A single phage gene is phage 186: Prophage induction as an SOS function. Proc. Natl. Acad. required for phage 186 replication. J. Mol. Biol. 213, 449–463. Sci. USA 86, 5492–5496. Slilaty, S. N., and Little, J. W. (1987). Lysine-156 and serine-119 are Lewis, L. K., Harlow, G. R., Gregg-Jolly, L. A., and Mount, D. W. (1994). required for LexA repressor cleavage: A possible mechanism. Proc. Identification of high affinity binding sites for LexA which define new Natl. Acad. Sci. USA 84, 3987–3991. damage-inducible genes in Escherichia coli. J. Mol. Biol. 241, 507– Staden, R., and McLachlan, A. D. (1982). Codon preference and its use 523. in identifying protein coding regions in long DNA sequences. Nucleic Lim, D. (1991). Structure of two retrons of Escherichia coli and their Acids Res. 10, 141–156. common chromosomal insertion site. Mol. Microbiol. 5, 1863–1872. Stormo, G. D., Schneider, T. D., and Gold, L. M. (1982). Characterization Lim, D. (1992). Structure and biosynthesis of unbranched multicopy of translational initiation sites in E. coli. Nucleic Acids Res. 10, 2971– single-stranded DNA by reverse transcriptase in a clinical Esche- 2996. richia coli isolate. Mol. Microbiol. 6, 3531–3542. Sun, J., Inouye, M., and Inouye, S. (1991). Association of a retroelement Lim, D., and Maas, W. (1990). Mapping of the msDNA operon in the with a P4-like cryptic prophage (retronphage fR73) integrated into chromosome of Escherichia coli B. Mol. Microbiol. 4, 2201–2204. the selenocystyl tRNA gene of Escherichia coli. J. Bacteriol. 173, Linderoth, N. A., Ziermann, R., Hagga˚rd-Ljungquist, E., Christie, G. E., 4171–4181. and Calendar, R. (1991). Nucleotide sequence of the DNA packaging Woods, W. H., and Egan, J. B. (1972). Integration site of noninducible and capsid synthesis genes of bacteriophage P2. Nucleic Acids Res. coliphage 186. J. Bacteriol. 111, 303–307. 19, 7207–7214. Woods, W. H., and Egan, J. B. (1974). Prophage induction of noninduc- Liu, Y., Saha, S., and Hagga˚rd-Ljungquist, E. (1993). Studies of bacterio- ible coliphage 186. J. Virol. 14, 1349–1356. phage P2 replication: The DNA sequence of the cis-acting gene A Xiong, Y., and Eickbush, T. H. (1988). Functional expression of a se- and ori region and construction of a P2 minichromosome. J. Mol. quence-specific endonuclease encoded by the Biol. 231, 361–374. R2Bm. Cell 55, 235–246. Luan, D. D., Korman, M. H., Jakubczak, J. L., and Eickbush, T. H. (1993). Ziermann, R., and Calendar, R. (1990). Characterization of the cos sites Reverse transcription of R2Bm RNA is primed by a nick at the chro- of bacteriophages P2 and P4. Gene 96, 9–15.

AID VY 7869 / 6a15$$$103 04-12-96 14:16:08 vira AP: Virology