Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 5997-6001, August 1988
Evolution

Western equine encephalitis virus is a recombinant virus
(RNA recombination//evolution of RNA )

CHANG S. HAHN, SHLOMO LUSTIG*, ELLEN G. STRAUSS, AND JAMES H. STRAUSS†
Division of Biology, 156-29, California Institute of Technology, Pasadena, CA 91125

Communicated by James Bonner, April 14, 1988

ABSTRACT The alphaviruses are a group of 26 mosquito-borne viruses that cause a variety of human diseases. Many of the New World alphaviruses cause encephalitis, whereas the Old World viruses more typically cause fever, rash, and arthralgia. The genome is a single-stranded nonsegmented RNA molecule of + polarity; it is about 11,700 nucleotides in length. Several alphavirus genomes have been sequenced in whole or in part, and these sequences demonstrate that alphaviruses have descended from a common ancestor by divergent evolution. We have now obtained the sequence of the 3'-terminal 4288 nucleotides of the RNA of the New World Alphavirus western equine encephalitis virus (WEEV). Comparisons of the nucleotide and amino acid sequences of WEEV with those of other alphaviruses clearly show that WEEV is recombinant. The sequences of the capsid protein and of the (untranslated) 3'-terminal 80 nucleotides of WEEV are closely related to the corresponding sequences of the New World Alphavirus eastern equine encephalitis virus (EEEV), whereas the sequences of glycoproteins E2 and E1 of WEEV are more closely related to those of an Old World virus, Sindbis virus. Thus, WEEV appears to have arisen by recombination between an EEEV-like virus and a Sindbis-like virus to give rise to a new virus with the encephalogenic properties of EEEV but the antigenic specificity of Sindbis virus. There has been speculation that recombination might play an important role in the evolution of RNA viruses. The current finding that a widespread and successful RNA virus is recombinant provides support for such an hypothesis.

The 26 members of the Alphavirus genus of the family Togaviridae are mosquito-borne viruses that form an important group of disease agents (1-3). The New World alphaviruses include western equine encephalitis virus (WEEV) and eastern equine encephalitis virus (EEEV), both of which are capable of causing encephalitis in humans and causing severe disease in horses. WEEV has a wide geographic distribution, being found from western Canada to Mexico and, discontinuously, to Argentina. WEEV is transmitted in the western United States by the mosquito Culex tarsalis; birds serve as an important vertebrate reservoir. In the eastern United States, WEEV is replaced by Highlands J virus (HJV), whose primary vector is Culiseta melanura. From serological studies (3, 4) and from limited sequencing studies (5, 6), WEEV and HJV are known to be very closely related, and HJV can be considered to be a strain of WEEV (2). In the eastern United States, the range of HJV overlaps that of EEEV, whose primary vector is also Cs. melanura. Other New World alphaviruses include Venezuelan equine encephalitis virus (VEEV), found in Central and South America; Fort Morgan virus, found in Colorado; and Aura virus, found in South America.

The Old World alphaviruses include Sindbis virus, the prototype alphavirus; Semliki Forest virus; Chikungunya virus; O'Nyong-nyong virus; and Ross River virus. Sindbis and Semliki Forest viruses have been intensively studied as models for alphavirus replication (7). Sindbis virus is widely distributed, being found in Europe, India, southeast Asia, Australia, and Africa. Close relatives of this virus, such as Ockelbo virus in Europe (8) and Babanki virus in Africa, cause disease in humans characterized by fever, rash, and arthritis. Chikungunya and O'Nyong-nyong viruses have caused large epidemics in Africa of a dengue-like disease also characterized by fever, rash, and arthralgia. Ross River virus is the causative agent of epidemic polyarthritis in Australia and the South Pacific.

Complete or partial RNA sequences have been obtained for Sindbis virus (9), Semliki Forest virus (10-12), Ross River virus (13), EEEV (14), and VEEV (15). Comparison of these nucleotide sequences and their encoded amino acid sequences has demonstrated that the alphaviruses are related by linear descent from a common ancestor (7). The relationships found are compatible, for the most part, with those derived from studies of serological cross-reactivity, which depends only upon antigenic epitopes in the structural proteins. In serological studies, however, WEEV has always been something of a puzzle. It is a New World virus that often causes encephalitis, but serologically it is most closely related to Sindbis virus, an Old World alphavirus not normally associated with encephalitis. To explore the relationship of WEEV to other alphaviruses, we have obtained the sequence of the 3'-terminal 4288 nucleotides of the WEEV genome‡ and found that WEEV appears to have arisen by recombination between an EEEV-like virus and a Sindbis-like virus.

Abbreviations: WEEV, western equine encephalitis virus; EEEV, eastern equine encephalitis virus; VEEV, Venezuelan equine encephalitis virus; HJV, Highlands J virus.
*Present address: Israel Institute for Biological Research, P.O. Box 19, Ness-Ziona, 70450, Israel.
†To whom reprint requests should be addressed.
‡The sequence reported in this paper is being deposited in the EMBL/GenBank data base (IntelliGenetics, Mountain View, CA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03854).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Virus RNA Preparation. WEEV RNA [strain BFS1703, isolated from Cx. tarsalis in July 1953 in Kern County, California (16)] was obtained from Mark Stanley and James Hardy (University of California, Berkeley). The virus had been passed twice by i.c. inoculation of suckling mice and four times (including three plaque isolations) in VERO cells. For RNA preparation, virus grown in VERO cells was purified by pelleting onto a 30% sucrose cushion followed by isopycnic banding in Nycodenz (Nyegaard, Oslo). After pelleting and dissociation in NaDodSO4, the RNA was extracted by phenol/chloroform treatment, precipitated with ethanol, purified on a discontinuous sucrose gradient, and concentrated by ethanol precipitation.

Cloning and Sequencing. Clones containing the 3'-terminal 4288 nucleotides of WEEV RNA were obtained by using an oligo(dT)-tailed vector as a primer as described (17). Clones were sequenced by using the chemical sequencing method (18, 19).

RESULTS

Partial Sequence of WEEV RNA. The translated sequence of the 3'-terminal 4170 nucleotides of the WEEV genome is shown in Fig. 1. This sequence begins in the region encoding the carboxyl terminus of nonstructural protein 4, continues through the junction region between the nonstructural and structural proteins containing the start of the subgenomic mRNA that is translated to give the structural proteins (20), and progresses through the coding sequence of the three structural proteins of the virus (a nucleocapsid protein, C, and two envelope glycoproteins, E2 and E1) and finally through the 3'-terminal untranslated sequence, which ends in a poly(A) tract.

[Fig. 1 sequence listing: nucleotide sequence of the 3'-terminal region of WEEV RNA with the deduced amino acid sequence above it; not recoverable from the extracted text.]

FIG. 1. Sequence of the 3'-terminal 4170 nucleotides of WEEV RNA (strain BFS1703). The start points of the structural proteins are indicated. Asterisks indicate the termination codons for the nonstructural and structural polyproteins. Two independent clones were sequenced and only one clonal difference was found: the GAC encoding Asp-72 of E2 was replaced by a UAC encoding tyrosine in the second clone.

We have previously sequenced the amino termini of the three structural proteins of the McMillan strain of WEEV (isolated in 1941 in Canada from the brain of a fatal human case) and thus established the start points of the structural proteins (21). Comparison of the amino acid sequence of the McMillan strain with that deduced here for the BFS1703 strain (isolated from Cx. tarsalis in 1953 in California) reveals four amino acid differences in 142 amino acids for which comparison is possible (one in C, one in E2, and two in E1). However, reevaluation of the original data for the McMillan strain suggests that the apparent difference in the capsid proteins may result from a miscall in the McMillan sequence and that there are no differences between the capsid proteins in the residues compared. The amino acid sequence divergence between the two strains is 2.8% (or 2.1% when the apparent difference in the capsid proteins is discounted). We also have reported the sequence of the 3'-terminal 351 nucleotides of McMillan RNA (6). Comparison with that for BFS1703 shows three nucleotide substitutions and one deletion (in McMillan) between these two strains, a divergence of 1.1%. These comparisons establish the fact that the widely studied McMillan strain (the prototype WEEV virus) and the BFS1703 strain are the same virus. Since these two strains were isolated 12 years apart in different geographic areas, the rate of divergence of WEEV in nature is at most 0.1-0.2% per year, which is low in comparison to rates that have been established for several RNA viruses (22, 23).

WEEV Is a Recombinant. The amino acid sequences of the WEEV structural proteins are compared to those of EEEV and of Sindbis virus in Fig. 2. Inspection of this figure clearly reveals that the WEEV capsid protein C is most closely related to that of EEEV, whereas the glycoproteins E2 and E1 are more closely related to the corresponding proteins of Sindbis virus.

The relationships among the proteins of these viruses are summarized in Table 1.

Table 1. Percent sequence identity among WEEV, EEEV, VEEV, and Sindbis virus (SINV) proteins

                      WEEV   WEEV   EEEV   EEEV   SINV   WEEV
                      EEEV   SINV   SINV   VEEV   VEEV   VEEV
nsP4 (C terminus)      70     35     40
Capsid
  N terminus*          78     39     36     42     27     49
  C terminus*          91     69     64     76     61     77
  Overall              85     53     50     59     44     63
Envelope
  E3                   50     58     42     59     49     56
  E2                   44     68     42     46     40     41
  6K                   44     67     45     54     40     40
  E1                   49     76     51     58     51     50
  Overall              47     71     46     53     46     46

*N terminus refers to amino acids 1-132 of the Sindbis capsid protein or the corresponding positions in the aligned files in Fig. 2. C terminus includes the remaining amino acids in the capsid proteins in the aligned files. Unusually high identity values are shown in boldface type.
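The percentages quoted for the strain comparison, and the pattern in Table 1, can be rechecked with simple arithmetic. The Python sketch below is an illustration added for that purpose and is not part of the original analysis. The helper names (`percent`, `closer_relative`) are hypothetical, the deletion is counted as one change alongside the three substitutions (an assumption that reproduces the quoted 1.1%), and only the WEEV/EEEV and WEEV/SINV columns of Table 1 are used.

```python
def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Strain comparison from the text: McMillan (1941) versus BFS1703 (1953).
YEARS_APART = 1953 - 1941             # 12 years between the two isolations
aa_divergence = percent(4, 142)       # all four amino acid differences
aa_divergence_adj = percent(3, 142)   # capsid difference discounted
nt_divergence = percent(3 + 1, 351)   # three substitutions plus one deletion

# Upper bounds on divergence per year, assuming all change arose in nature.
nt_rate = nt_divergence / YEARS_APART
aa_rate = aa_divergence_adj / YEARS_APART

# Selected Table 1 rows as (WEEV/EEEV, WEEV/SINV) percent identities.
TABLE1 = {
    "nsP4 (C terminus)": (70, 35),
    "Capsid (overall)": (85, 53),
    "E3": (50, 58),
    "E2": (44, 68),
    "6K": (44, 67),
    "E1": (49, 76),
}


def closer_relative(weev_eeev, weev_sinv):
    """Name the comparison virus with the higher identity to WEEV."""
    if weev_eeev == weev_sinv:
        return "tie"
    return "EEEV" if weev_eeev > weev_sinv else "SINV"


if __name__ == "__main__":
    print(f"amino acid divergence: {aa_divergence:.1f}% "
          f"(adjusted {aa_divergence_adj:.1f}%)")
    print(f"nucleotide divergence: {nt_divergence:.1f}%")
    print(f"rate bounds per year: {nt_rate:.2f}% (nt), {aa_rate:.2f}% (aa)")
    for region, (eeev, sinv) in TABLE1.items():
        print(f"{region:18s} closer to {closer_relative(eeev, sinv)}")
```

Read this way, the nsP4 and capsid rows fall with EEEV and the E2, 6K, and E1 rows fall with Sindbis virus. E3 is the least decisive row (50% versus 58%).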
The amino-terminal and carboxyl-terminal domains of the capsid protein are considered separately because of the fact that the carboxyl termini of all alphavirus capsid proteins are closely related. The capsid proteins of WEEV and EEEV are much more closely related (85% sequence identity) than are those of WEEV and Sindbis virus (53% identity). The relationships are reversed in the case of the envelope proteins. The envelope proteins of WEEV and Sindbis virus are much more closely related (71% identity overall) than are those of WEEV and EEEV (46% identity). Figures for the carboxyl-terminal domain of nsP4 are also included. Although this protein is highly conserved among alphaviruses, its carboxyl-terminal domain is more variable, and WEEV and EEEV are much more closely related in this region than are WEEV and Sindbis virus.

Also included in Table 1 are comparisons with another alphavirus, VEEV, to illustrate that alphaviruses in general differ from one another in a uniform and consistent way. Sequence data for Semliki Forest virus or Ross River virus lead to similar results (not shown). WEEV is exceptional in that it is closely related to Sindbis virus in the region of the genome encoding E1 and E2, but to EEEV in other regions.

[Fig. 2 alignment: aligned amino acid sequences of the structural proteins of WEE, EEE, and SIN; not recoverable from the extracted text.]

FIG. 2. Comparison of the amino acid sequences of the structural proteins of WEEV (WEE), EEEV (EEE), and Sindbis virus (SIN). A dot in the EEE or SIN sequence means that the amino acid is the same as that of WEE on the first line. Gaps have been introduced for alignment. Potential glycosylation sites are boxed and cysteines are highlighted with dotted overlay.

Nucleotide sequences in the carboxyl-terminal region of nsP4 and in the junction region between structural and nonstructural proteins, which are believed to contain important regulatory elements for transcription of a subgenomic mRNA translated to produce the structural proteins (20), are compared for the three viruses in Fig. 3a. The EEEV and WEEV nucleotide sequences are very similar to one another and the sequences flanking the start of the subgenomic 26S RNA are identical. The sequence of Sindbis virus in this region is similar but not identical. The nsP4 proteins of EEEV and WEEV terminate at the same residue, whereas the Sindbis virus protein terminates downstream.

The sequences at the 3' termini of WEEV, EEEV, and Sindbis virus are shown in Fig. 3b. The 3'-terminal 19 nucleotides have been proposed to form an important element in Alphavirus RNA replication because they are highly conserved among members of this genus (6), and this sequence element (underlined in Fig. 3b) is invariant among these three viruses with the exception of the sixth nucleotide from the end. The nucleotides upstream of this are A/U rich and not particularly conserved among alphaviruses, but in this domain the sequences of WEEV and EEEV are almost identical, whereas that of Sindbis virus is more variable.

a
[Fig. 3a: aligned junction-region nucleotide sequences of WEE, EEE, and SIN with the deduced carboxyl-terminal residues of nsP4 and the start of the 26S RNA marked; not recoverable from the extracted text.]

b
WEE  UAAUUUUUCUUUU  GUUUUUAUUUUGUUUUUAAAAUUUC poly(A)
EEE  UAAUUUUUCUUUUAUGUUUUUAUUUUGUUUUUAAUAUUUC poly(A)
     *************  ******************* *****
SIN  UUUCUUUUAUUAAUCAACAAAAUUUUGUUUUUAACAUUUC poly(A)
     *   **** **          ************* *****

FIG. 3. Comparison of the nucleotide sequences in the junction regions of EEEV (EEE), WEEV (WEE), and Sindbis virus (SIN) (a) or at the 3' end of the RNAs (b). Asterisks denote conserved nucleotides. The heavy underlines denote conserved nucleotide sequences in the alphaviruses that are believed to form important regulatory elements for RNA transcription (7). The termination codons that end the nonstructural open reading frames are marked with black circles.

These results show that within the region examined, the WEEV nucleotide sequence is recombinant, with both the 5' and 3' ends derived from an EEEV-like virus and the intervening glycoprotein genes derived from a Sindbis-like virus. We presume that the 5'-terminal two-thirds of the genome, which has not yet been sequenced, is also derived from the EEEV-like virus. Partial support for this comes from our previous finding that the 5' terminal sequence of HJV is similar to that of EEEV (5).

The Recombination Events. Our interpretation of the sequence information is shown schematically in Fig. 4, which is included in part to illustrate the structure of the alphavirus genome. In this model, close inspection of the aligned sequences in Fig. 2 suggests that the 5' crossover occurred in E3. Gaps must be introduced into the amino acid sequences to align them, and the two gaps of three amino acids each in E3 are of particular interest. The first gap, following residue 1, is shared by WEEV and EEEV; upstream of this WEEV and EEEV are in almost perfect alignment (only one gap of one amino acid must be introduced into each sequence to maintain alignment), whereas several gaps must be introduced to keep the Sindbis virus sequence aligned. Conversely, the gap following residue 21 of WEEV E3 is shared by Sindbis virus and WEEV; downstream of this, the Sindbis virus and WEEV sequences are in almost perfect register (only one gap of one amino acid is required to maintain alignment), whereas numerous gaps are required to keep the EEEV sequence in register. This suggests that the recombination event occurred between these two gaps in E3, which is compatible with the sequence similarities exhibited by the capsid proteins and the glycoproteins in Table 1.

[Fig. 4 schematic: genome maps (nsP1, nsP2, nsP3, nsP4, C, E3, E2, 6K, E1) of SIN, EEE, and WEE with the crossover points marked; drawing not recoverable from the extracted text.]

FIG. 4. Schematic representation of the recombination event that produced WEEV (WEE). The crossover points to produce WEE are indicated. SIN, Sindbis virus; EEE, EEEV.

The 3' crossover appears to have occurred in the 3' untranslated region. The 60 nucleotides of WEEV RNA following the structural protein stop codon are very similar to the Sindbis virus sequence, whereas the last 80 nucleotides of the RNA are similar to EEEV, with no sequence similarity detectable in between. Although a double crossover seems inherently less likely than a single crossover, the presence of important replication signals at the 3' end may require such an event to produce viable (or at least efficiently replicating) virus (6, 7).

There is a formal possibility that WEEV is one of the parental viruses in a cross that resulted in the reciprocal recombinants Sindbis virus and EEEV. Because RNA recombination is believed to occur by a copy-choice mechanism, however, in which reciprocal recombinants are not produced (24), and because of the apparent rarity of viable recombinant viruses, this possibility appears remote.

Interaction of the Nucleocapsid and Glycoproteins During Virus Budding. Alphaviruses mature when preassembled nucleocapsids, which are icosahedral structures consisting of 180 copies of the nucleocapsid protein and one molecule of the virus RNA, acquire an envelope by budding through the plasma membrane (25, 26). The envelope consists of a lipid bilayer derived from the host cell in which are embedded two virus-encoded glycoproteins, E2 and E1. The nucleocapsid and the glycoproteins are thought to interact specifically with one another, so as to exclude nonvirus proteins from the structure; the free energy for driving virus budding is derived from these specific interactions. During evolution, certain domains of the glycoproteins of a particular virus must have been selected for maximal specific interaction with the capsid of that virus. In a recombinant virus that contains the capsid protein from one virus and the glycoproteins from another, the interactions during budding might not be optimal. During passage of such a recombinant virus, selection pressure would favor variants in which the nucleocapsid and glycoprotein interactions were improved. It is thus of considerable interest that there are only seven amino acid differences between WEEV and EEEV in the carboxyl-terminal 104 amino acids of the capsid protein, and for 6 of these WEEV has the Sindbis virus amino acid (Fig. 2). This suggests that this domain of the capsid protein interacts with the glycoproteins during virus assembly and that, after the recombination event, selection has led to some of the EEEV capsid amino acids being replaced with Sindbis virus amino acids to allow more efficient interaction with the Sindbis virus glycoproteins. Conversely, in the carboxyl-terminal 16 amino acids of E2, there are 6 amino acid differences between WEEV and Sindbis virus, and for 4 of these WEEV has the EEEV amino acid, suggesting by the same logic that this domain of E2 interacts with the capsid during budding. Other examples can be found in other regions of the structural proteins. The hypothesis that these domains are involved in capsid-glycoprotein interactions can be tested by site-specific mutagenesis (27).

DISCUSSION

The Origin of WEEV. The two parents of WEEV and the time of the recombination event cannot be determined at the current time. As described earlier, the McMillan strain of WEEV isolated in 1941 in Canada and the BFS1703 strain isolated in 1953 in California are clearly strains of the same virus. They have nearly identical capsid proteins, glycoproteins E2 and E1, and 3'-terminal sequences. Thus, the recombination event could not have occurred during passage of the virus in culture, as this would have required the identical recombination event to have occurred twice, in different laboratories. By the same logic, the recombination event must have predated the isolation of the McMillan strain of WEEV in 1941. Furthermore, all of the sequence information obtained is compatible with the hypothesis that the recombinant virus arose before the separation of WEEV and HJV. On the other hand, the amino-terminal portions of the capsid proteins of WEEV and EEEV are very similar, a lysine- and arginine-rich domain not well conserved among alphaviruses (28). Thus, the similarity in the WEEV and EEEV sequences, together with the fact that RNA viruses diverge rapidly (21), suggests that the recombination event must be relatively recent. We propose that one of the parents was EEEV itself. The sequence similarities with Sindbis virus in the envelope protein regions are not as pronounced and suggest that the second parent was not Sindbis virus itself but a relative of it. Because WEEV and EEEV are New World viruses, we propose that the recombination event occurred in the New World between EEEV or an immediate ancestor of it and a Sindbis-like virus that has yet to be identified. It seems most likely that the recombination event took place in the mosquito vector, in which the virus sets up a persistent life-long infection. EEEV and HJV overlap in geographic ranges and mosquito vector. Thus HJV might represent the ancestral recombinant virus that radiated to produce WEEV.

Recombination in RNA Virus Evolution. There has been much speculation about the importance of recombination in the evolution of RNA viruses (29, 30). In segmented RNA viruses, reassortment of individual genome segments during mixed infection, a form of recombination equivalent to the shuffling of chromosomes in diploid creatures, is readily demonstrated in cell culture. Reassortment is a major mechanism for generating new strains of influenza virus (31, 32), and it may be that the ability to undergo ready recombination conveys significant selective advantage. Among the nonsegmented RNA viruses, recombination has been in general more difficult to demonstrate, but it has been shown to occur in the picornaviruses (33, 34), the coronaviruses (35, 36), and the bromoviruses (37), although not before now in the alphaviruses. In poliovirus, recombination occurs by a copy-choice mechanism during RNA replication (24), and it is assumed that all RNA recombination (as opposed to reassortment) occurs by this mechanism. Although well established in principle, evidence for the importance of recombination in nature as a mechanism that leads to successful new strains is limited. In the case of poliovirus, recombination has been shown to occur in vaccinees that

2. Chamberlain, R. W. (1980) in The Togaviruses, ed. Schlesinger, R. W. (Academic, New York), pp. 175-227.
3. Calisher, C. H., Shope, R. E., Brandt, W., Casals, J., Karabatsos, N., Murphey, F. A., Tesh, R. B. & Wiebe, M. E. (1980) Intervirology 14, 229-232.
4. Hayes, C. G. & Wallis, R. C. (1977) Adv. Virus Res. 21, 37-83.
5. Ou, J.-H., Strauss, E. G. & Strauss, J. H. (1983) J. Mol. Biol. 168, 1-15.
6. Ou, J.-H., Trent, D. W. & Strauss, J. H. (1982) J. Mol. Biol. 156, 719-730.
7. Strauss, E. G. & Strauss, J. H. (1986) in The Togaviridae and Flaviviridae, eds. Schlesinger, S. & Schlesinger, M. (Plenum, New York), pp. 35-90.
8. Niklasson, B., Espmark, A., LeDuc, J. W., Gargan, T. P., Ennis, W. A., Tesh, R. B. & Main, A. J. (1984) Am. J. Trop. Med. Hyg. 33, 1212-1217.
9. Strauss, E. G., Rice, C. M. & Strauss, J. H. (1984) Virology 133, 92-110.
10. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, H. (1980) Nature (London) 288, 236-241.
11. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, H. (1980) Proc. Natl. Acad. Sci. USA 77, 6376-6380.
12. Takkinen, K. (1986) Nucleic Acids Res. 14, 5667-5682.
13. Dalgarno, L., Rice, C. M. & Strauss, J. H. (1983) Virology 129, 170-187.
14. Chang, G.-J. J. & Trent, D. W. (1987) J. Gen. Virol. 68, 2129-2142.
15. Kinney, R. M., Johnson, R. J. B., Brown, V. C. & Trent, D. W. (1986) Virology 152, 400-413.
16. Hardy, J. L., Reeves, W. C., Rush, W. A. & Nir, Y. D. (1974) Infect. Immun. 10, 553-564.
17. Lindqvist, B. H., DiSalvo, J., Rice, C. M., Strauss, J. H. & Strauss, E. G. (1986) Virology 151, 10-20.
18. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
19. Smith, D. R. & Calvo, J. M. (1980) Nucleic Acids Res. 8, 2225-2274.
20. Ou, J.-H., Rice, C. M., Dalgarno, L., Strauss, E. G. & Strauss, J. H. (1982) Proc. Natl. Acad. Sci. USA 79, 5235-5239.
21. Bell, J. R., Bond, M. W., Hunkapiller, M. W., Strauss, E. G., Strauss, J. H., Yamamoto, K. & Simizu, B. (1983) J. Virol. 45, 708-714.
22. Steinhauer, D. A. & Holland, J. J. (1987) Annu. Rev. Microbiol. 41, 409-433.
23. Smith, D. B. & Inglis, S. C. (1987) J. Gen. Virol. 68, 2729-2740.
24. Kirkegaard, K. & Baltimore, D. (1986) Cell 47, 433-443.
25. Strauss, E. G. & Strauss, J. H. (1985) in Virus Structure and Assembly, ed. Casjens, S. (Jones & Bartlett, Boston), pp. 205-234.
26. Fuller, S. D. (1987) Cell 48, 923-934.
27. Rice, C. M., Levis, R., Strauss, J. H. & Huang, H. V. (1987) J. Virol. 61, 3809-3819.
28. Rice, C. M. & Strauss, J. H. (1981) Proc. Natl. Acad. Sci. USA 78, 2062-2066.
29. Strauss, J. H. & Strauss, E. G. (1988) Annu. Rev. Microbiol. 42, 657-683.
30. Hodgman, T. C. & Zimmern, D. (1988) in RNA Genetics, eds. Domingo, E., Holland, J. J. & Ahlquist, P. (CRC Press, Boca Raton, FL), Vol. 3, in press.
31. Desselberger, U., Nakajima, K., Alfino, P., Pederson, F.
S., have simultaneously received high doses of three attenuated Haseltine, W. A., Hannoun, C. & Palese, P. (1978) Proc. Natl. viruses (34), but this is not a natural system. The finding that Acad. Sci. USA 75, 3341-3345. WEEV, a virus with a wide geographic range, is a naturally 32. Webster, R. G., Laver, W. G., Air, G. M. & Schild, G. C. occurring recombinant lends support to the hypothesis that (1982) Nature (London) 296, 115-121. RNA recombination is an important force in the evolution of 33. Cooper, P. D. (1977) in Comprehensive Virology, eds. RNA viruses. In this particular case, it has given rise to a new Fraenkel-Conrat, H. & Wagner, R. R. (Plenum, New York), virus that combines the of EEEV Vol. 9, pp. 133-207. disease-causing potential 34. Kew, 0. M. & Nottay, B. K. (1984) in Modern Approaches to with new antigenic properties from a Sindbis virus-like virus. : Molecular and Chemical Basis of Virus Virulence, We thank Drs. M. Stanley and J. Hardy for the WEEV RNA used ed. Channock, R. M. (Cold Spring Harbor Lab., Cold Spring in this project. This work was supported by Grants A120612 and Harbor, NY), pp. 357-362. A110793 from the National Institutes of Health and Grant DMB86- 35. Lai, M. M. C., Baric, R. S., Makino, S., Keck, J. G., Egbert, J., 17372 from the National Science Foundation. Leibowitz, J. L. & Stohlman, S. A. (1985) J. Virol. 56, 449-456. 36. Makino, S., Keck, J. G., Stohlman, S. A. & Lai, M. M. C. 1. Griffin, D. E. (1986) in The Togaviridae and Flaviviridae, eds. (1986) J. Virol. 57, 729-737. Schlesinger, S. & Schlesinger, M. (Plenum, New York), pp. 37. Bujarski, J. J. & Kaesberg, P. (1986) Nature (London) 321, 209-249. 528-531. Downloaded by guest on October 1, 2021