Proc. Nati. Acad. Sci. USA Vol. 80, pp. 5271-5275, September 1983 Biochemistry

Sequence coding for the nonstructural proteins is interrupted by an opal termination codon (Sindbis /nucleotide sequence/Middelburg virus) ELLEN G. STRAUSS, CHARLES M. RICE, AND JAMES H. STRAUSS Division of Biology, California Institute of Technology, Pasadena, California 91125 Communicated by Ray D. Owen, May 31, 1983 ABSTRACT We have obtained the nucleotide sequence of the Sequence of Sindbis RNA. Sindbis 49S RNA was transcribed genomic of two , and Middelburg into DNA by using reverse transcriptase and random priming virus, over an extensive region encoding the nonstructural (repli- as described (9). The single-stranded (ss) cDNA was cut with case) proteins. In both in an equivalent position an opal restriction enzymes Hae III or Taq I and the resulting frag- (UGA) termination codon punctuates a long otherwise open read- ments were subjected to sequence analysis by the chemical ing frame. The nonstructural proteins are translated as polypro- methods of Maxam and Gilbert (10), as described (9). All but tein precursors that are processed by posttranslational cleavage one of the Hae III fragments longer than 30 nucleotides and into four polypeptide chains; the sequence data presented here about 30 Taq I fragments that contained non-26S sequence were indicate that the COOH-terminal polypeptide, ns72, may be pro- subjected to sequence analysis. At this stage, together with the duced by read-through of this opal codon. The high degree of amino sequence of the 26S RNA (5), about 90% of the 49S sequence acid homology between the ns72 polypeptides of the two viruses, had been obtained. These sequences were aligned by restric- in contrast to the lack of conserved sequence upstream from the tion enzyme analysis of double-stranded (ds) cDNA made to read-through site, suggests that ns72 plays an important role in 49S RNA by using infrequently cutting enzymes such as EcoRI viral replication, possibly modulating the action of other replicase (6). At this point fragments encompassing the entire 49S se- components. quence could be aligned but the sequence still possessed gaps The alphavirus genus of the family Togaviridae contains about and ambiguities due to compression artifacts. Using this infor- 20 members, many of which are important human or veterinary mation to devise a cloning strategy, we obtained the entire ge- pathogens. These viruses contain an infectious -RNA of about nome as four nonoverlapping clones in a plasmid derivative of 11,700 nucleotides, which sediments at 49 S (reviewed in ref. pBR322 (unpublished data). Cloned DNA was used to se- 1). In the virus this RNA is surrounded by a single species of quence selected regions of the genome by the chemical method. basic protein, the protein, to form a nucleocapsid with A detailed description of the methods used and the entire se- cubic symmetry. As the final event in maturation this capsid quence of the Sindbis 49S RNA will be presented elsewhere. buds through the plasma membrane of the infected cell and Sequence Analysis of Middelburg RNA. A large number of acquires an envelope containing lipids of host cell origin and Hae III fragments from ss cDNA made to Middelburg 49S RNA two virus-encoded transmembrane glycoproteins, termed E2 were subjected to sequence analysis as previously described and El. (9). These sequences were translated in all three reading frames The alphaviruses produce two messenger RNAs after infec- and the deduced amino acid sequences were compared to those tion (1). The first, 49S RNA, appears to be identical to the RNA of Sindbis virus by using computer homology routines. In this in the virion and is translated as polyproteins that are processed way most of the sequences of the Middelburg fragments ob- to form the nonstructural proteins (2) required to replicate the tained could be positioned relative to the Sindbis 49S RNA se- virus RNA and to transcribe a subgenomic 26S RNA. This 26S quence and a partial restriction map was generated by com- RNA is the major messenger found on host cell polysomes and puter. Long ds cDNA was cut with infrequently cutting enzymes is translated into the structural proteins found in virions. The and the resulting fragments were subjected to sequence anal- sequences of the 26 RNAs of three alphaviruses, Semliki Forest ysis to confirm the alignment and obtain additional sequence. virus (3, 4), Sindbis virus (5), and Ross River virus (6), have been Only two EcoRI sites were found to be present in Middelburg determined recently and the details of the encoded proteins RNA, bracketing the region containing the stop codon. ds Mid- established. We have previously reported that the translation delburg cDNA was cut with EcoRI and a 2.5-kilobase fragment of the nonstructural polyproteins from alphavirus 49S RNA ter- was cloned into the EcoRI site of a plasmid derived originally minates with multiple in-phase stop codons positioned about from pBR322. This cloned fragment was subjected to sequence two-thirds of the distance from the 5' end (7). In this com- analysis in its entirety. A sequence analysis schematic for both munication we report that translation of the nonstructural pro- viruses is shown in Fig. 1. teins is modulated by the presence of an opal codon within the RNA. RESULTS AND DISCUSSION During sequence analysis of Sindbis 49S RNA we discovered MATERIALS AND METHODS the presence of an opal stop codon within the region encoding Preparation of RNA. Growth of Sindbis virus or Middel- the nonstructural proteins. A simple map of the Sindbis ge- burg virus and preparation of 49S RNA from infected cells or nome (Fig. 2) illustrates the location of this opal codon in re- from purified virions have been described (8). lation to the regions encoding the nonstructural proteins as well as the structural polypeptides of the virion. To demonstrate the frame for the nonstructural The publication costs of this article were defrayed in part by page charge generality of an open reading pro- payment. This article must therefore be hereby marked "advertise- ment" in accordance with 18 U.S.C. §1734 solely to indicate this fact. Abbreviations: ss, single-stranded; ds, double-stranded. 5271 Downloaded by guest on September 29, 2021 5272 Biochemistry: Strauss et al. Proc. Natl. Acad. Sci. USA 80 (1983) SIN nt4344 nt7601 -4- J-- RNA * SS cDNA = CLONED DNA

MID EcoRl EcoRi c- ZI- RNA * = SS cDNA DS cDNA CLONED DNA 0 1000 2000 3000 4000

FIG. 1. Sequence analysis schematic for Sindbis virus (SIN) RNA and Middelburg virus (MID) RNA. The Sindbis sequence is numbered from the 5' terminus of 49S RNA. The two EcoRI sites define the cloned segment of the Middelburg genome. The regions sequenced as ss cDNA, as ds cDNA, and as cloned DNA, respectively, are noted for each virus. The open boxes are the regions of sequence presented in Fig. 3; the asterisk marks the opal codon. nt, Nucleotide.

teins punctuated with a termination codon as an integral mech- produced in rabbits to the synthetic polypeptide Arg-Gly-Glu- anism of alphavirus replication, we obtained the nucleotide se- Ile-Lys-His-Leu-Tyr-Gly-Gly-Pro-Lys, which is the deduced quence of a second alphavirus, Middelburg virus, in the region amino acid sequence at the COOH terminus of this polypeptide shown by a solid box in Fig. 2. The translated sequences for for Sindbis virus (overlined in Fig. 3). This antipeptide anti- both Sindbis and Middelburg viruses in this region are shown body precipitates ns72 as well as several larger polypeptides, in Fig. 3. In the case of Sindbis virus, nucleotides are num- presumably precursors, from Sindbis-infected cells (S. Lopez bered from the 5' terminus of the RNA and amino acids from and J. Bell, personal communication). Polypeptide ns72 could the NH2 terminus of the nonstructural polyprotein precursor. be produced by several mechanisms, including translation from The sequence of Middelburg is not complete and its sequence a spliced message, internal initiation, or read-through synthe- is aligned by homology with the corresponding Sindbis se- sis. The presence of a minor spliced message seems unlikely in quence. Both viruses have an opal codon at the same position view of the fact that spliced mRNAs have not been detected for in the sequence (boxed in Fig. 3). For Sindbis virus this stop animal RNA viruses whose messages are produced solely in the codon interrupts an otherwise for 2,513 cytoplasm and no evidence for such a RNA has been found for amino acids and is found at position 1,897 in this sequence (un- alphaviruses (12). Similarly, no indication of internal initiation published data). For both viruses the sequence through the stop has been seen for alphaviruses either in vivo (2) or in vitro (13). codon was obtained with both ss cDNA as well as cloned DNA Moreover, the first methionine residue downstream of the opal (Fig. 1). As noted earlier, the sequence of ss cDNA gives the codon is located at residue 1,957 and would result in a poly- majority nucleotide at any position for the population of reverse peptide only 556 amino acids long. Because the open reading transcriptase copies produced from the population of RNA mol- frames upstream and downstream of the termination signal are ecules and thus cloning artifacts, reverse transcriptase errors, in phase, the most likely mechanism for producing ns72 that fits or minor components in the RNA population do not influence all of the existing data is read-through synthesis to produce a the results. Furthermore, in the case of Sindbis virus both strands full-length polyprotein precursor of 2,513 amino acids, fol- of the cloned cDNA were subjected to sequence analysis in both lowed by posttranslational proteolytic cleavage. Indeed, minor directions (i.e., both 5' to 3' and 3' to 5') to rule out sequence amounts of such a precursor have been detected during in vitro analysis artifacts. The existence of the identical stop codon in translation of 49S RNA by Collins et aL (13), who suggested on the same position in two different alphaviruses makes it clear the basis of the synthesis kinetics that this polypeptide might that this opal codon is a fundamental landmark in the alphavirus be produced by read-through of a stop codon. Moreover, the genome. largest polypeptide precipitated from infected cells by the an- Downstream of the terminator, there is a long open reading tibody to the COOH-terminal peptide is sufficiently large to be frame encoding highly conserved proteins in the two viruses the 2,513 amino acid protein encoded by the entire nonstruc- (Fig. 3 and also see below), indicating that this sequence must tural region (S. Lopez and J. Bell, personal communication). be translated in vivo to produce a protein important for virus Thus, we believe that translation of Sindbis virus 49S RNA replication. Direct evidence for the existence of such a poly- produces two polyproteins, which contain the nonstructural peptide, called ns72, comes from experiments with an antibody proteins, both of which are initiated at the same location near

NONSTRUCTURAL PROTEINS VIRION PROTEINS 5' * 3' a ns60l nsi9 ns76 E3 F K CAP POlY (A)

FIG. 2. Map of the Sindbis virus genome. Boxes indicate translated regions. The solid box corresponds to the open box in Fig. 1 and to the se- quences shown in Fig. 3; the asterisk marks the opal codon. The crosshatched box is the region translated from the 26S subgenomic RNA into the structural proteins. Downloaded by guest on September 29, 2021 Biochemistry: Strauss et al. Proc. Natl. Acad. Sci. USA 80 (1983) 5273

1429 L K LL O N AY H A V A D L V N E HN I K S V A I P L L S T G I Y AA G K D R L E V S L N C L T T A SINN 4344 UUGAAAUUGCUACAAAACGCCUACCAUGCAGUGGCAGACUUAGUAAAUGAACAUAACAUCAAGUCUGUCGCCAUUCCACUGCUAUCUACAGGCAUUUACGCAGCCGGAAAAGACCGCCUUGAAGUAUCACUUAACUGCUUGACAACCGCG Y R T T R M I D GAUGCCGACCUAGCAGCUGUGUAUAGAGCCGUGGCGUCCUUGGCUGACGAG0 A 0 L AA V AV AS LA 0 E ACAGUCCGCACAAUGGCCAUACCACUCCUGUCAACGGGGACGUUCGCUGSGGGAAAGGACCGCGUGUUGCAGUCGUUGAACCACCUAUUUACGGCCT V R T M A I P L L S G F A G G K 0 V L O S L NH LF T A

L D R T O A V T I Y C L O K K W K E R I D AA L O LK ES V T E L K E M E I D E L V W I H P SINS1479)CUAGACAGAACUGACGCGGACGUAACCAUCUAUUGCCUGGAUAAGAAGUGGAAGGAAAGAAUCGACGCGGCACUCCAACUUAAGGAGUCUGUAACAGAGCUGAAGGAUGAAGAUAUGGAGAUCGACGAUGAGUUAGU4UGGAUCCAUCCA4494 D I D D L 0 TT 0 V 0 V T I Y C R 0 K S W E K K I D E A I 0 M R T A T E L L D 0 0 T T V M K E L T R V H P CUGGACACCACGGACGUCGAUGUGACGAUAUACUGCCGGGAUAASUC uGSSAA AAGAAAAUCCAAGAGGCCAUUGAUAUGAGGACGGCAACCGAACUGCUAGAUGACSACACAACGGUUAUG AA AGACUAACCAGGGUGCAUCCU

A Y Y L K G K G F S T T K G K L S F E G T K F H A A K D M A E I KV L F P N OD E S N E D L S INSW44411529) GACASGUUCGCUUGAAGGGAAGAAAGGGAUUCAGUACUACD SC AAAAGGAAAAUUGUAUUCGUACUUCGAAGGCACCA AAUUCCAUCD AAGCAGCAAAAGAC AUGGCGGAGAUAAAGGUCCUGUUCCCUAAUGACCAGGAAAGUAAUGAACAACUG C L V G S G F S T G R L H S Y L E G T R F H T A V A E R P T L P R A E E A N E I MIDMID GAUAGCUGCCUAGUGGGGCGCAGUGGAUUCAGCACGGUGGACGGACGGCUGCAUUCAUACCUUGAD S R V D AGGAACUAGGUUCCACCAGACUGCUGUCGACGUGGCD D V AGAAAGGCCAACUUUGUGGCCW AAGGAGAGAGGAAGCGAACGAGCAGAUAD

1579 A IL G E T M E A I R E K C P V H N P SS S P P K T L P C L C M Y A M T P E R V H R L R S N N SINS14794) UGUGCCUACAUAUUGGGUGAGACCAUGGAAGCAAUCCGCGAAAAGUGCCCGGUCGACCAUAACCCGUCGUCUAGCCCGCCCAAAACGUUGCCGUGCCUUUGCAUGUAUGCCAUGACGCCAGAAAGGGUCCACAGACUUAGAAGCAAUAACC S T H Y L G E S M E A I R T K C P V T SS A PP C T V P C L C R Y A M T P E R V H R L R AA MIDMID ACACACUACGUCCUUGGCGAAUCCAUGGAGGCCAUAAGAACCAAAUGCCCGGUGGAUGAUACCGACUCGUCV 0 0 0 AGC ACC ACC AUGC ACCGUCCCGUGCCUUUGCCGCUACGCC AUGACCCCCGAGCGCGUACAUAGAUUGCGCGCCGCGCAGD

VKEV T V CS S T P L P K H K I K N V K V C T K V V L F N P H T P A F V P A R K Y I E V P E D S IN 14944)N1629) GUCAAAGAAGUUACAGUAUGCUCCUCCACCCCCCUUCCUAAGCACAAAAUUAAGAAUGUUCAGAD AGGUUCAGUGCI ACGAAAGUAGUCCUGUUUAAUCCGC AC ACUCCCGC AUUCGUUCCCGCCCGUAAGUACAUAGAAGUGCCAGAACAG V K F T C S F P L P K K I P G V R V A C S A V M L F N H V P A L V S P A K Y E P S I S MIDMID GUGAAGCAGUUCACAGUCUGCUCCUCGUUCCCGCUGCCAAAGUACAAGAUACCAGGCGUGCD V S Y D AGAGAGUGGCGUGUUCGGCUGUAAUGUUGUUUAAUC ACGACGUUCC0 AGCGCUGGUA AGCCCUCGCAAGUAC RAGGGAACCGAGCAUUAGC

T A PP A D A E E A P E V V A T P S P S T A O N T S L O V T D I S L O M O O S S E G S L F SS F S IN(1679)PN5094)I CCUACCGCUCCUCCUGCAC AGGCCGAGGAGGCCCCCGAAGUUGUAGCGACACCGUCACC AUCUACAGCUGAUAAC ACCUCGCUUGAUGUC ACAGACAUCUCACUGGAUAUGGAUGAC AGUAGCGAAGGCUC ACUUUUUUCGAGCUUUAGC S E SS S S G L S V F L GS S E Y E P M E P V P E P L I L A VV EE T A P V R L R V A MIDMu AGCGAGUCGUCAUCAUCUGGACUGUCUGUGUUCGACCUGGACAUAGGCUCUGAUUCAGAGUACGAACCA0 0 I 0 AUGGAACCCGUGCD AGCCCGAACCGCUGAUUGACUUGGCAGUCGUAGAGGAGACGGCCCCCGUCAGACUGGAACGAGUGGCC0 E

1729) G S N S I T S M S W SS G P S S L E I V R R V V V A V H AV E P A P I P P PR K K M A SIN k244) GGAUCGGACAACUCUAUUACUAGUAUGGACAGUUGGUCGUCAGGACCUAGUUCACUAGAGAUAGUAGACCGAAGGC0 D D D AGGUGGUGGUGGCUGACGUUCD AUGCCGUCCD AAGAGCCUGCCCCUAUUCCACCGCCAAGGCUAAAGAAGAUGGCCL P V A A PRR A A T P F T L E VVA P V P A P T M R T MID CCUGUGGCUGCACCUCGCAGAGCCCGCGCGACACCCUUUACUUUGGAGR CAGCGGGUUGUAGCACCAGUUCCUGCGCCGCGUACGAUGCCAGUCAGACCCCCUCGCCGGAAGAAAGCGGCGACCD 9 R P V R PP R K K AA

S D D SIN (1779) R L A A A R K E P T P P A SN S S E S L H L S F GG V S M S L G S I F G ET A R AA V P L A T 5394 CGCUGGCAGCGGCAAGAAAAGAGCCCACUCCACCGGCAAGCAAUAGCUCUGAGUCCCUCCACCUCUCUUUGGUGGGGUAUCCAUGUCCCUCGGAUC AUUUUCGACGGAGAGACGGCCCGCCAGGC AGCGGUACAACCCCUGGCAACA R T P E R I S F G O L O A E C M A I I N OO L T F G O F G A G E F MIDrlsv AGAACACCUG^AAGGAUCUCGUUCGGCGAUUUAGAUGCCGAGUGCAUGGCCAUC AUAAAUGACGACCUGACUUUCGGGGACUUCGGCGCGGGCGAGUUC II A SIN (1829) G P T V P M S F S F S G E I E L S R V T E SEP V L F G S F E P G E V N S II SS R S A V (5544) GGCCCCACGGAUGUGCCUAUGUCUUUCGGAUCGUUUUCCGACGGAGAGAUUGAUGAGCUGAGCCGCAGAGUAACUGAGUCCGAACCCGUCCUGUU0 MID

SIN (1879) S F P L R K 0 R R R R S R R T E T G V G G Y I F S T 0 T G P G H L O KK S V L N L T E P (5694) UCUUUUCCACUACGCAAGCAGAGACGUAGACGCAGGAGCAGGAGGACUGAAUA UGA UAACCGGGGUAGGUGGGUACAUAUUUUCGACGGACAC-ASSGGCCCUGGGCACUUGCAAAAGAAGUCCGUUCUGCAGAACCAGCUUACAGAACCG3

MID E R L T S A R A G A Y I F S S O T G P G H L O OM S v R O T R LA D C GAACGUUUAACGUCAGC A UAGACCGGGCGGGGGCCUACAUAUUCUCAUCGGAUAC

L E N V L HA 0 0 0 M SIN (1929) T R E R I P V L T S K E E L K L Y P T E A N K S A YIDRU K V E N K A I (5644) ACCUUGGAGCGCAAUGUCCUGGAAAGAAUUCAUGCCCCGGUGCUCGACACGUCGAAAGAGGAACAACUCAAACUCAGGUACCAGAUGAUGCCCAC(CGAAGCCAACAAAAGUAGGUACCAGUCUCGUAAAGUAGAAAAUCAGAAAGCCAU MID v AE V H E E R V F A P KC 0 K EK E L L LL 0 M 0 M A P T EA N K S R Y S R K V E N M K A E A UUUUA.AAAU .AAUUUUALLLAAAAULL'Ar.AArrAALAAr AAArarrtrIA .rrAA rF r.AA Ar.1M. Ar. A Ar A .o;.AArnr.r-A (1979)11 E 9 L L S S.L.P.I.L N SAE.C1 0 KIA0 IAAP IAAPAUUKUI P AAAULULUASLAAULLAAAAUGGAGAACSAUGAAGCAGAG T R SIN ~1979) T E L L S G L R L r N S A T OO P E C Y K I T Y P K P L Y S SS V P A N YIS P O F A V A V C N 5994) ACCACUGAGCGACUACUGUCAGGACUACGACUGUAUAACUCUGCCACAGAUCAGCCAGAAUGCUAUAAGAUCACCUAUCCGAAACCAUUGUACUC CAGUAGCGUACCGGCGAACUACUCCGAUCCACAGUUCGCUGUAGCUGUCUGUAA1C I 0 MIDMID v L L G G A K L F V T P TT 0 C R Y V T H K H P K P M Y S ccST S V A F Y L S S AK T A V A AC N GUUAUCGACAGACUGUUGGGCGGAGCGAAACUGUUCGUGACACCCACAACCGACUGCCGAUACGUGACACACAAGCACCCAAAGCCGAUGUACUCGACUAGUGUAGCAUUCUACCUAAGUUCGGCUAAGACUGCAGUGGCGGCAUGCAAU..l....rA

SIN (2029) N Y L H E N Y A 1Y 0 A T0 Tfl lY L SM V D T V A C L O T A T F CUP A K L R S Y P K K H (6144) AACUAUCUGCAUGAGAACUA IUCCGACASUASCAUCUUAUCAGiUUiCUGACGAGUACGAUGCUUACUUGGAUAUGGUAGACGGIGACAGUCGCCUGCCUGGAUACUGCAACCUUCUGCCCCGCUAAGCUUAGAAGUUACCCGAAAAAACA ku E F N PD T vV T c v1 n I T n1 C I . v .. -a a, - rainMMn _L S Y T S Y I T D E Y D A L D M v DG S E S C L A A F C P S K L R S F P K K H D-AALIGAAUUCUUAAGUAGGAACUACCCAACUGUGACAUCCUAUCAGAUCACGGAUGAGUACGACGCCUACtUGbACAUGGUG6ACdGUUCC.AGAGCUGUEUA.ACAGA'GCA6CC'UUU6GU'CCGOCCiAA6UA6GCiGC'UUU.CGiAG.AG.AC11 aIr IvA~lA.Ar-r-AAI. lAIII A.l I- AIrvv ....1aF1 ., 11~~. -o,-.o V, T

SIN 209) Y R S V S A M N I L N V L I AA T K R N C N V T M R E L PWT L O S A T F N V E GAGUAUAGAGCCCCGAAUAUUCCGCAGUGCGGUUCCAUCAGCGAUGCAGAACACGCUAC AAAAUGUGCUCAUUGCCGCA ACUAAAGAUUGCA ACGUCACGCAGAUGCGUGAACUGCC AAC ACUGGACUC AGCGAC AUUCAUGUCGAA MID Y R S A V P S P F O N T L O N V L A AA T K R N C N V T O M R E L P T L D S A V F N V E AGCUACCAUCGAGCCGAAAUUAAGAAGUGCCGUUCCGUCUCCUUUCC AkAAUACACUGCAGAACGUGUUGGC CGCUGCCACGAAACGCAACUGCAACGUGACGCAGAUGCGGGAGCUGCCGA^CCUUGGAUUCAGCGGUGUUUAAUGUCGAG

SIN 29 R C D E Y W E E F A R K P I R I r r E F v T A Y v A R L K G P KA A A L F A K T Y N L V UGCUUUCGA^AAUAUGCAUGUAAUGACGAGUAUUGGGAGGAGUUCGCUCGGAAGCCAAUUAGGAUUACCACUGAGUUUGUC ACCGCAUAUGUAGCUAGACUGAAAGGCCCUAAGGCCGCCGC ACUAUUUGCAA^AGACGUAUAAUUUGGUC MID Y C N Y W D E F A o K P I R L TT E N I T S Y V T R L K G P KAA A L F AK T Y L K UGCUUUAAGAAGUACGCAUGCAAUAACGACUACUGGGACGAAUUUGCUCA^AAGCCCAUCAGGU)UGACGACGGAAAACAUCACAUCCUACGUGACUAGACUGAAGGGCCCGAAGGCAGCCGCUCUAUUCGC A^AAAACCUACGACWUGAAG

SIN ) P L O E v D R F V M O M K R v K v T P G T KH T E E R P K v O v I A A E P L A T A Y L C G I CCAUUGCAAGAAGUGCCUAUGGAUAGAUUCGUCAUGGACAUGA^AAGAGACGUGAAAGUUACACCAGGCACGAAACAC ACAGAAG^AAGACCG^AAGUACAAGUGAUACAA^GCCGCAGAACCCCUGGCGACUGCUUACWUAUGCGGGAUU MID P L O E v M O R F V V D M K DVK T P G T K H T E E R P K V O V I A A E P L A T A Y L C G I CCGUUGCAAGAGGUACCGAUGGACCGCUUCGUCGUGGAUAUGAAGAGAGACGUGAAGGUGACACCCGGC ACCAAGCAuACAGAAGAAAGGCC^AAGGUGCAA^GUCAUACAGGCAGCCGAACCCCUAGCCACCGCCUACCUGUGCGGUAUA

R E L v R R L T A V L L P N I H T L F D M S A E D F D A I I A E H F K 0 P L D A F 0 SIN 129J CACCGGGAAUUAGUGCGUAGGCUUACGGCCGUCUUGCUUCCAAACAUUCAC ACGCUUUUUGACAUGUCGGCGGAGGAUuuGAUGCAA^UCAUAGCAGAACACUUCA^AGCAA^GGCGACCCGGUACUGGAGACGGAUAUCGCAUCAUUCGACG O v E T I S MID R E L v RR L N A v L L P N V H T L F o M S A E D F D A I I S E H F RP G IA v L E T D I A S F CAUAGAGAACUAGUUAGAAGGUUAAAUGCAGUUCUGCUGCCGAACGUCCACACCCUGUUUGACAUGUCGGCGGAAGACUUUGAUGCCAUAAUACCGAGCACUUCCGCCCUGGGGACGCUGUACUAGAAA CAGACAUCGCAUCGUUCGAC

D SIN 12179) A^AAGCCAAGACGACGCUAUGGCGUUAACCGGUCUGAUGAUCUUGGAGGACCUGGGUGUGGAUCAACCACUACUCGACUUGAUCGAGUGCGCCUUUGGAGAAAUAUCAvUCCACCCAUCUACCUACGGGUACUCGUUUUAAAUUCGGGGCGOD A M A L T G L M I LE O L G v D O P L L D L I E C A F G E I S S T H L P T G T R F K F G A MID O D D S L A Y T G L M L L E D L G v D O P L L E L I E A S F G E I T S T H L P T G T R F K F G A AAGAGCCAAGACGACUCGCUGGCGUACACGGGGCUGAUGUUGCUAGAGGACCUCGGCGUCGACCAGCCUUUGCUCGAGUUGAUCGAAGCGUCGUUCGGGGAA^AUAACAAGC ACGCAUUUACCAACGGGAACUAGGUUCAAGUUCGGGGCC

SIN 2229 M M K G MF L LF N L N I A S L EEL K S C A F I G N I I H G MID MAUGAUGKALUCCGGAAUGUUCCUCACCUUUUUGUCAACLCLGUUUSGAAUGUCGUUAUCGCCAGCM K SG M F L T L F v N r M L N M T I A S AAGLGUACuEGEIGSGCGGCUUACACGUCCAGFUGUGCv L E E R L T N S K C AAAGCGUUC2UUGGCG0CGF I G D D N VAKI VAUCAUGG5GUAGUAUCUGACH G v K S D AUGAUGAAAUCUGGCAUGUUCCUGACGCUCUUCGUGAACACGAUGCUCAACAUGACUAUAGCUAGCAGAGUGUUAGAAGAACGGCUGACCAA^UUCU^AAUGUGCCGCCUUUAUCGGCGAUGAUAACAUUGUGC AUGGAGUGAA AUCUGAU

SIN (2379 K E M A E R C A T W L N M E v K I I A V I G E R P P F C GG F I L O O S v T S T A C V A P L (71t94) AAAG^AAUGGCUGAGAGGUGCGCCACCUGGCUCAAC AUGGAGGUUAAGAUCAUCGACGCAGUC AUCGGUGAGAGACCACCUUACUUCUGCGGCGGAUUUAUCUUGCAA^GAUUCGGUUACUUCCACAGCGUGCCGCGUGGCGGAUCCCCUG MID R E v v R Y G G V D 0 v T G T R v A L AAACUGCUGGCUGAGAGAUGCGCCGCAUGGAUGAACAUGGAAGUGAAGAUCAUCGAUGCAGUCAUGUGCGAGCGCCCCCCUUACUUCUGCGGAGGGUUAUCGUGUUUGACCAAGUUACAGGCACCUGUUGC AGAGUGGC AGACCCGCUG

0 SIN 24291 K R L F K L G K P L P A E E R A L L E K A W F R V I G L A R E V (7344 AAAAGGCUGUUUUAGUUGGGUAAACCGCUCCCAGCCGACGACGAGCAAGACGAAGACAGAAGACGCGCUCUGCUAGAUGAAACAAGGCGUGGUUUGAGUAGGUAUAACAGGCACUUUAGCAGUGGCCGUGACGACCCGGUAUGAGGUA A 0 A 0 R G O R R R R R v G 0 A M S E v AAGAGACUCUUUAAGCUCGGAAAACCGCUGCCUGCUGAAGACAAAC AGGACGAGGACCGCAGAAGGGCGUUGGCCGACGAGGC ACA^ACGGUGGAACCGCGUAGGUAUCCAAGCAGAUUUGGAGGCCGCAAUGAAC AGCCGUUACGAGGUC

SIN(2479) 0 N IUUT P v L L A L R T F A O S K R A FD A I R G E I K H L G G P K 7494 GACAAAUA ACACCUGUCCUACUGGCAUUGAGAACUUUUGCCCAGAGCAAAAGAGCAUUCCAAGCCAUCAGAGGGGAAAUAAAGCAUCUCUACGGUGGUCCUAAA UAG MID E G I N V I T A L T T L S N Y N F H L P I 0 L M1DGAGGGGAUCCGA AACGUCAUCACGGCGUUAACCACGCUGUC ACGGAAUUAUCACAAUUUCCGACR H R AUCUAAGAGGACCCGUUUUuGACCUCUACGGCGGUCCUAA^R G v G G P K UGA

FIG. 3. Translated nucleotide sequence of Sindbis virus (SIN) and Middelburg virus (MID) RNAs in the region containing the opal stop codon. For Sindbis virus nucleotides are numbered from the 5' terminus of 49S RNA and amino acids are numbered from the NH2 terminus (11) of the nonstructural polyprotein precursor. The Middelburg sequence is aligned by homology with the Sindbis sequence and the two underlined Hae III sites (G-G-C-C) were not directly sequenced; the positioning ofthe Middelburg sequence between nucleotides 5,394 and 5,729 of Sindbis is arbitrary because of the lack of homology and the large deletion in the Middelburg sequence relative to the Sindbis sequence. In other regions gaps have been introduced as necessary to maintain the homologous alignment. The opal stop codons are boxed; the wavy overline indicates the peptide synthesized to produce antibodies. The single-letter amino acid code is used in the translation of the RNA and the amino acid is positioned above the first nu- cleotide of the corresponding codon. A = Ala, C = Cys, D = Asp, E = Glu, F = Phe, G = Gly, H = His, I = Ile, K = Lys, L = Leu, M = Met, N = Asn, P = Pro, Q = Gln, R = Arg, S = Ser, T = Thr, V = Val, W = Trp, and Y = Tyr. Downloaded by guest on September 29, 2021 5274 Biochemistry: Strauss et al. Proc. Nad Acad. Sci. USA 80 (1983)

1429 SINDBIS VIRUS. 2513 tained'in Fig. 3 are compared with those of Middelburg virus 1. .; ...... nine residues at a time. A dot is placed in the matrix whenever \ . five residues in the string of nine match. The diagonal lines show regions of protein homology between the two viruses; off- .A. diagonal marks are due to random matches.,It is obvious that the two viruses are fairly closely related and Fig; 4--also illus- trates the used for en principle alignment of Middelburg sequence D data with that of Sindbis virus. Downstream from the opal ter- mination codon (arrow in Fig. 4) the two sequences are highly cc A\ homologous and upstream from this stop codon, between amino ~-j> N acids 1,429 and 1,676 in the Sindbis sequence, the two se- quences also demonstrate considerable homology. However, between amino acids 1,676 and 1,896 in the Sindbis sequence LL there is virtually no detectable homology. Furthermore, as the CD0 CD shift in the diagonal makes clear, there is a discontinuity in the alignment and the Sindbis sequence is longer in this region by 88 amino acids. Thus, the COOH-terminal sequence of the polyprotein preceding the opal termination codon is not ho- mologous in these two alphaviruses and differs in length so that the third nonstructural protein of Sindbis virus, ns76 (Figs. 2 and 3), is larger than the corresponding Middelburg protein. FIG. 4. Dot matrix comparing the proteins encoded by Sindbis RNA The function of this region of protein is unknown but the vari- and Middelburg RNA around the opal termination codon. The protein ation in size and sequence might imply that it is not vital for the sequences are compared nine amino acids at a time and a dot is placed virus replication cycle. in the matrix whenever five match. The position of the opal termina- In contrast, the extreme COOH-terminal nonstructural pro- tion codon is indicated by the arrow. tein, ns72, is highly conserved between Middelburg and Sind- bis (Figs. 4 and 5). Beginning just before the opal codon and the 5' end. One precursor is 1,896 amino acids long, terminates continuing to the series of in-phase stop codons in the junction at the opal codon, and is cleaved to produce ns6O, ns89, and region (7), which begin with an amber codon corresponding to ns76. The second precursor is produced in relatively smaller nucleotides 2-4 of 26S RNA, the two sequences are identical amounts by read-through of the opal codon, is 2,513 amino acids in length, exhibit perfect alignment with no gaps or insertions, long, terminates at the multiple in-phase termination codons in and show an overall homology of 73%. The start of ns72 is un- the junction region, and is cleaved to produce a fourth protein, clear but estimates of its molecular weight based on migration ns72, presumably in addition to the three products mentioned in acrylamide gels.and the fact that there is no conservation in above (Fig. 2). ns6O, ns89, and ns76 have been observed in both length or sequence upstream from the opal codon makes it likely lysates of infected cells (reviewed in ref. 2) and as the products that ns72 begins very near this stop codon, a likely start being of in vitro translation of 49S RNA (13) but the small amounts the glycine six residues downstream (unpublished observa- of ns72 produced have only been visualized by immunopre' tions). If we assume they begin at the stop codon, then the two cipitation. ns72s have the same amino acid at 448 out of 616 positions. In The protein-encoding regions on either side of the opal ter- addition, 33 out of the 168. substitutions that occur are con- mination codon have several interesting features, which are il- servative. There is one stretch of 98 amino acids (amino acids lustrated in Fig. 4. Fig. 4 shows a dot matrix in which the amino 2,161-2,259 in the Sindbis sequence) in which there are only acid sequences of Sindbis virus throughout the region con- five differences (Fig. 5). The COOH-terrminal regions show less

SIN (1429)LKLLONAYHAVAOLVNEHNIKSVAIPLLSTGIYAAGKORLEVSLNCLTTALDRTOAOVTIYCLDKKWKERIDAALOLKESVTELKDEDMEI DELVWIHP MID DAD.AAV.R ... .S.AD. TVRTM. TF.G...... T. V . R. .S.E KKIOEAIDMRTA ... LL.D.TTVMK...... SIN (1529) DSCLKGRKGFSTTKGKLYSYFEGTKFHOAAKDNAE IKVLFPNOOESNEOLCAYILGETMEAIREKCPVDHNPSSSPPKTLPCLCMYAMTPERVHRLRSNN MID ... V...... VO.R.H-...... LT.V.V ..RPT..RRE..... V...... ITH.V:. ...T.DTD. .A...... R AAO II SIN (1629) VKEVTVCSSTPLPKHK IKNVOKVOCTKVVLFNPHTPAFVPARKYIEVPEOPTAPPAOAEEAPE VVATPSPSTAONTSLDVTOISLDMDDSSEGSLFSSFS MID ..OF f. Y PG. .R...... HDV. ..SP... RR.PSISSESSSSGLSVFDLDIGSD.EYEPMEPVOPEPL.O.AVVEETAPVRLERVA II SIN (1729) GSDNSI TSMDSWSSGPSSLEIVDRROVVVADVHAVOEPAPIPPPRLKKMARLAAARKEPTPPASNSSESLHLSFGGVSMSLGSIFOGETAROAAVOPLAT MID PVAAPRRARATPFTLE .R ... PP.P.PRTMPVR. .R.K.AAT.TPERISFGDLD.ECMAIINDDLTF.DFGAGEF SIN (1629) GPTDVPMSFGSFSDGEIDELSRRVTESEPVLFGSFEPGEVNSI ISSRSAVSF'PLRKORRRRRSRRTE- YX LTGVGGYIFSTDTGPGHLOKKSVLONOLTEP MID E.L.SAU.ORA.A. ...S OR. .R.TR.AOC SIN (1929) TLERNVLERIHAPVLDTSKEEOLKL-RYOMMPTEANKSRYOSRKVENOKAI TTERLLSGLRLYNSATDGPECYKIT YPKPLYSSSVPANYSDPOFAVAVCN MID VA,.DVHE..VF..KC. KE..RL.L.QM..A...... M..EVID...G.A.K.FVTP.TDCRYVTHKH...M..T..AFYL.SAKT... A.. III SIN (2029) NYLHENYPTVASYOI TDEYDAYLDMVDG1VACLO'ATFCPAKLRSYPKKHEYRAPNIRSAVPSAMONTLONVLIAATKRNCNVTOMRELPTLDSATFNVE MID EF.SR...S.T...... R.A.. S... F... S.HRAE. PF A .. ... I ~ . SIN (2129) CFRKYACNOEYWEEFARKPIRI TTEFVT-AYVARLKGPKAAALFAKTYNLVPLOEVPMORFVMOMKROVKVTPGTKHTEERPKVOVIOAAEPLATAYLCGI MID .K.. ND. .OD . S... .L.. NI.S. . . . K . V I.II SIN (2229) HRELVRRLTAVLLPNIHTLfDSAEDFDOAI IAEHFKOGDPVLETOIASFOKSOODAMALTGLMILEDLGVDOPLLDLIECAFGEISSTHLPTGTRFKFGA MID N. V .S... RP. .A SL.Y .... .L .E... AS ...E....S.T. II I~ II SIN (2329)0MKSGMFLTLfVNTVLNVVIASRVLEERLKTSRCAAFIGOONIIHGVVSDKEMAERCATWLNME VKII AVIGERPPYFCGGF ILODSVTSTACRVADPL MIDN.... M T...... TN.K ...... LL ..... A.M.MC .VF.O. G.C. SIN (2429) KRLFKLGKPLPADDEODEORRRALLDETKAWFRVGIT.GTLAVAVTTRYEVONITPVLLALRTFAOSKRAFOAIRGEIKHLYGGPK MID ...... A.K...... AAR.N .... .OAD.EA.MNS .. EG.RN.IT ..T.LSFNYHN.RHL. .PVID . FIG. 5. Direct comparison of the amino acid sequences encoded by Sindbis RNA and Middelburg RNA around the opal termination codon. The sequence of Sindbis is given in the first line. Amino acids are numbered starting with the NH2 terminus ofthe polyprotein precursor. The Middelburg sequence is shown below; a dot means the amino acid is the same as in the Sindbis sequence. Gaps have been introduced in the Middelburg sequence for the purpose of alignment. The single-letter code is used as in Fig. 3; the boxed X is the position of the opal codon. Downloaded by guest on September 29, 2021 Biochemistry: -Strauss et al. Proc. Natl. Acad. Sci. USA 80 (1983) 5275

homology (Fig. 5). We have postulated that components of the inant message in infected cells insures that most of the virus- alphavirus replicase recognize specific nucleotide sequences' of specific protein synthesis is devoted to production of the struc- about 20 nucleotides in length in the RNA as promoter sites (7, tural proteins. Translation of the genomic RNA up to the opal 11, 14) and it is tempting to speculate that the highly conserved codon produces three nonstructural polypeptides in lesser core of ns72 has such a function. In any event, the high degree amounts and read-through may produce a fourth polymerase of homology between these two viruses indicates an important component in very small quantities for specific regulation of function for ns72 in the virus replication cycle. As has been pre- replication. viously noted (7), the alphaviruses have diverged sufficiently such that the codon used for any particular amino acid has been The computer programs used in this work, including production of figures, were written by Tim Hunkapiller and we are grateful to him randomized, even in highly homologous segments. In ns72 dif- for access to these programs and instructions in their use. The com- ferent codons are used between Sindbis and Middelburg for the puter work was performed on the computer facility of Lee Hood and same amino acid in 57% of the cases in which conservation oc- we are grateful for the time on these facilities. Edith Lenches gave ex- curs (Fig. 3). The slight difference from complete randomiza- -pert technical assistance in the maintenance and preparation of virus tion (66% difference) probably arises because some codons are stocks and cell lines. This work was supported by Grants AI 10793 and infrequently used by alphaviruses (5, 6). Thus, conservation of GM 06965 from the National Institutes of Health and by Grant PCM amino acid sequence is highly significant and the translation *8022830 from the National Science Foundation. frame can be deduced by comparison of conserved amino acid .1. Strauss, J. H. & Strauss, E. G. (1977) in The Molecular Biology sequences; because of the divergence-in codon usage the high- of Animal Viruses, ed. Nayak, D. P. (Dekker, New York), pp. 111- est degree of protein sequence homology is observed only when 166. the proper translation frames are being compared. 2. Schlesinger, M. J. & Kaarsinen, L. (1980) in The Togaviruses, ed. Although termination codons have been located within the Schlesinger, R. W. (Academic, New York), pp. 371-392. translated regions of a number of viruses, at the current time 3. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, the alphaviruses appear unique among animal viruses in using H. (1980) Proc. Natl Acad. Sci. USA 77, 6376-6380. An 4. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, read-through to modulate the synthesis of their proteins. H. (1980) Nature (London) 288, 236-241. amber codon punctuates the genome of the murine retrovi- 5. Rice, C. M. & Strauss, J. H. (1981) Proc.Natl Acad. Sci. USA 78, ruses between the gag and pol genes (15) and it was originally .2062-2066. believed that read-through was the mechanism for producing 6. Dalgarno, L., Rice, C. M. & Strauss, J. H. (1983) Virology, in press. relatively minor amounts of the polymerase products (16, 17). 7. Ou, J.-H., Rice, C. M., Dalgarno, L., Strauss, E. G. & Strauss, However, recent results with an avian retrovirus, Rous sarcoma J. H. (1982) Proc. Natl Acad. Sci. USA 79, 5235-5239. 8. Ou; J.-H., Strauss, E. G. & Strauss, J. H. (1981) Virology 109, virus, have revealed that in this case the downstream, open 281-289. reading frame for pol is out of phase with the gag gene (18) and 9. Rice,.C. M. & Strauss, J. H. (1981)J. Mal. Biol. 150, 315-340. thus the polymerase moieties may be produced from a minor 10. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499- species of messenger produced by splicing in which the amber 560. codon could have been deleted. Amber codons are also present 11. Ou, J.-H., Strauss, E. G. & Strauss, J. H. (1983)J. Mol. Biol., in in the genomes of a variety of- RNA plant viruses [including press. 12. Strauss, E..G. & Strauss, J. H. (1983) Curr. Top. Microbiol. Im- tobamoviruses (19) and tymoviruses (20)] and t.read-through munol 105, 1-97. products have been obtained by translation of these and other 13. Collins, P. L., Fuller, F. J., Marcus, P. I., Hightower, L. E. & plant virus RNAs in in vitro protein synthesis systems (20-24). Ball, L. A. (1982) Virology 118, 363-379. Whether these experiments accurately reflect the conditions of 14. Ou, J.-H., Trent, D. G. & Strauss, J. H. (1982)J. Mol. Biol. 156, replication during normal infection is unclear at this time. 719-730. The mechanism by which read-through of an opal codon might 15. Shinnick, T. M., Lerner, R. A. & Sutcliffe, J. G. (1981) Nature (London) 293, 543-548. occur during alphavirus infection is unknown. Specific sup- 16. Philipson, L., Anderson, P., Olshevsky, U., Weinberg, R., Bal- pression of the opal codon is unlikely because in the case of timore, D. & Gesteland, D. (1978) Cell 13, 189-199. Sindbis virus the translation of the structural proteins is ter- 17. Oppermann, H., Bishop, J. M., Varmus, H. E. & Levintow, L. minated with an opal codon (5). The-simplest mechanism pre- (1977) Cell 12, 993-1005. dicts that read-through is a probablistic event of relatively low 18. Schwartz, D., Tizard, R. & Gilbert, W. (1983) Cell 32, 853-869. frequency, perhaps due to wobble in the first anticodon posi- 19. Goelet, P., Lomonossoff, G. P., Butler, P. J. G., Akam, M. E., Gait, M. J. & Karn, J. (1982) Proc. Natl. Acad. Sci. USA 79, 5818- tion to insert a tryptophan. If so, accumulation of ns72 would 5822. be relatively slow in the infected cell and functionally ns72 would 20. Morch, M.-D., Drugeon, G. & Benicourt, C. (1982) Virology 119, appear as. a late protein. An attractive hypothesis for the func- 193-198. -tion of such a polypeptide is as a controlling or modifying factor 21. Pelham, H. R. B. (1978) Nature (London) 272, 469-471. for the polymerase, perhaps to effect the shutoff of minus-strand 22. Mang, K.-G., Ghosh, A. & Kaesberg, P. (1982) Virology 116, 264- 274. template synthesis that occurs at 3-4 hr postinfection (25, 26). 23. Dougherty, W. G. & Kaesberg, P. (1981) Virology 115, 45-56. Thus, alphaviruses, with a small and relatively simple ge- 24. Bisaro, D. M. & Siegel, A. (1982) Virology 118, 411-418. nome, are capable of regulating the synthesis of their virus-en- 25. Sawicki, D. L. & Sawicki, S. G. (1980)J. Virol. 34, 108-118. coded products at three different levels by two different mech- 26. Sawicki, D. L., Sawicki, S. G., Keranen, S. & Kaariinen, L. (1981) anisms. Production of a subgenomic (26S) RNA as the predom- J. Virol. 39, 348-358. Downloaded by guest on September 29, 2021