
Proc. Natl. Acad. Sci. USA Vol. 90, pp. 10539-10543, November 1993 Microbiology

RNA sequence of astrovirus: Distinctive genomic organization and a putative retrovirus-like ribosomal frameshifting signal that directs the viral replicase synthesis

BAOMING JIANG*†, STEPHAN S. MONROE*, EUGENE V. KOONIN‡, SARAH E. STINE*, AND ROGER I. GLASS*

*Viral Gastroenteritis Section, Centers for Disease Control and Prevention, Atlanta, GA 30333; and ‡National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894

Communicated by Bernard Fields, July 27, 1993

ABSTRACT The genomic RNA of astrovirus was sequenced and found to contain 6797 nt organized into three open reading frames (1a, 1b, and 2). A potential ribosomal frameshift site identified in the overlap region of open reading frames 1a and 1b consists of a "shifty" heptanucleotide and an RNA stem-loop structure that closely resemble those at the gag-pro junction of some retroviruses. This translation frameshift may result in the suppression of in-frame amber termination at the end of open reading frame 1a and the synthesis of a nonstructural, fusion polyprotein that contains the putative protease and RNA-dependent RNA polymerase. Comparative sequence analysis indicated that the protease and polymerase of astrovirus are only distantly related to the respective enzymes of other positive-strand RNA viruses. The astrovirus polyprotein lacks the RNA helicase domain typical of other positive-strand RNA viruses of similar size. The genomic organization and expression strategy of astrovirus, with the protease and the polymerase brought together by predicted frameshift, most closely resembled those of plant luteoviruses. Specific features of the sequence and genomic organization support the classification of astroviruses as an additional family of positive-strand RNA viruses, designated Astroviridae.

Astroviruses were originally identified from the feces of infants with gastroenteritis on the basis of distinctive ultrastructural features: five- or six-pointed surface stars are characteristic of this agent (1). These nonenveloped agents were subsequently determined to be positive-strand RNA viruses (2, 3). Five serotypes have been defined to date based on their distinct antigenicity (4). Astroviruses cause acute gastroenteritis in children and adults worldwide (5), but the disease burden has been difficult to determine because of the lack of sensitive diagnostic assays. Recent studies have shown that astroviruses are more frequently found in children with gastroenteritis than was previously thought (6). In addition, astroviruses have been detected in the diarrheal feces of various animals (5).

Studies of the biochemical properties of purified particles have provided divergent results on the number and size of proteins in astroviruses; two to six polypeptides have been reported, ranging in size from 5.5 kDa to 42 kDa (5). Although the fastidious growth of astroviruses in vitro has hindered characterization of the genome, several investigators (3, 7) have reported partial sequence information from both internal regions and the 3' end of human astrovirus serotype 1 (H-Ast1), and we have recently sequenced and characterized the subgenomic RNA of serotype 2 (H-Ast2; ref. 8). However, the complete sequence and the genomic organization of astroviruses remained unknown, and their classification was tentative.

In the present study, we have sequenced and analyzed the entire genomic RNA of H-Ast2§ and compared the sequence and genomic structure with those of other positive-strand RNA viruses. The results highlight the specific genomic organization of astroviruses and support their classification in a separate family.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: H-Ast2, human astrovirus serotype 2; RdRp, RNA-dependent RNA polymerase; NLS, nuclear localization signal; ORF, open reading frame; RHDV, rabbit hemorrhagic disease virus.

†To whom reprint requests should be addressed at: Mailstop G04, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA 30333.

§The sequence reported in this paper has been deposited in the GenBank data base (accession no. L13745).

MATERIALS AND METHODS

Cells and Virus. LLCMK2 cells (ATCC CCL 7.1) were propagated in Earle's minimal essential medium (MEM) supplemented with antibiotics and 10% fetal bovine serum. H-Ast2 was obtained from John Kurtz (Oxford, England) and used to infect LLCMK2 cells in MEM/trypsin at 5 µg/ml as described (2). Virions were partially purified from infected cell lysates by centrifuging through a 30% (wt/vol) sucrose cushion, suspension in TNE buffer [0.05 M Tris (pH 7.5)/0.1 M NaCl/5 mM EDTA]/1% SDS, and extraction with phenol/chloroform. Virion RNA was precipitated with 2 M LiCl and used for both sequencing and PCR assays.

cDNA Synthesis and Sequencing. Single-stranded cDNA was synthesized from virion RNA with Super reverse transcriptase (Molecular Genetics Resources, Tampa, FL) by using primers derived originally from cDNA sequence (8) and subsequently from sequences determined by directly sequencing virion RNA, using a "primer walking" technique. DNA fragments of various length were amplified by the PCR assay with Taq polymerase (Perkin-Elmer) and virus-specific primers. Sequences were determined from three sources: virion RNA, PCR DNA, and cDNA clones (8). Virion RNA was directly sequenced by using an RNA sequencing kit (Boehringer Mannheim). Both the PCR DNA and the cloned cDNA were sequenced by using the Sequenase version 2.0 DNA sequencing kit (United States Biochemical). Sequences on both strands of DNA were determined with each base sequenced at least four times. Sequences were assembled and aligned by using the Genetics Computer Group sequence-analysis package (9), and a consensus sequence was derived.

Sequences of the 5' and 3' ends of the genomic RNA were determined by following the procedure of Lambden et al. (10). Briefly, a synthetic primer 1 was ligated to the 3' ends of virion RNA or cDNA corresponding to the 5' end of virion RNA with T4 RNA ligase (GIBCO/BRL). cDNA fragments (400-600 bp) spanning either the 5' or the 3' ends were produced by PCR amplification using a primer 2 complementary to the primer 1 and virus-specific primers and were sequenced by using internal primers.

Comparative Sequence Analysis. Both nucleotide and deduced amino acid sequences were compared by using the BLAST program (11) and the BLOSUM62 matrix (12). Multiple alignments were done by using the OPTAL or MACAW programs (13, 14). A phylogenetic tree was constructed by using clustering unweighted pairwise group maximum averages (UPGMA), neighbor-joining, least-square (Fitch-Margoliash), and protein-parsimony algorithms as implemented in the PHYLIP package (15).

RESULTS AND DISCUSSION

The genomic RNA of H-Ast2 is 6797 nt in length, excluding 31 adenines [poly(A) tail] at the 3' end. The genome possesses three overlapping open reading frames (ORFs 1a, 1b, and 2; Fig. 1). The sequences surrounding the first AUG codons of ORFs 1a and 2 are predicted to be optimal for initiating translation (16). ORF 1a is preceded by 82 untranslated nucleotides and encodes a polypeptide of 920 aa. Interestingly, ORF 1b, which overlaps ORF 1a by 70 nt, is in reading frame +1, and its first AUG codon, which is predicted to be weak, is located 380 nt downstream of the ORF 1a termination codon. ORF 2, present also in the subgenomic RNA, overlaps ORF 1b by 5 nt, begins with an initiation codon at nt 4325, and ends with a stop codon 82 bases from the 3' end. ORF 2 codes for a capsid-protein precursor of 796 aa with a predicted molecular mass of 88 kDa (8).

[Fig. 1 schematic. Labels legible in the drawing: genomic RNA, nt 1-6797, with 3' poly(A); subgenomic RNA, nt 4314-6797; ORF 1a, ending at nt 2842, with MB, Pro, NLS, and RFS marked; ORF 1b, nt 2773-4329, with Pol marked; ORF 2, nt 4325-6712; scale in nucleotides, 0-6,000.]

FIG. 1. Genomic organization of human astrovirus. The locations of three ORFs, predicted transmembrane helices (MB), protease (Pro), nuclear localization signal (NLS), ribosomal frameshift structure (RFS), and RNA-dependent RNA polymerase (Pol) are indicated. ORFs 1a and 1b encode a putative nonstructural polyprotein, and ORF 2 codes for a capsid-protein precursor.

The existence of two separate ORFs (1a and 1b) located in two different reading frames prompted us to examine the 70-nt overlap region in more detail. A potential ribosomal frameshift signal was identified, consisting of the "shifty" heptanucleotide (AAAAAAC) from position 2791 to 2797, followed by a stem-loop structure that may form a pseudoknot with a downstream sequence (Fig. 2A). The putative frameshift signal of the astrovirus showed a striking resemblance to those at the gag-pro junction of some retroviruses, such as mouse mammary tumor virus (Fig. 2B), and fit perfectly the simultaneous tRNA slippage model of -1 frameshifting described for the synthesis of the gag-related polyproteins (18). Ribosomal frameshifting recently has been shown to be a normal expression mechanism in several groups of positive-strand RNA viruses, namely, animal coronaviruses and arteriviruses, and plant luteoviruses and dianthoviruses (19-22). However, the putative frameshifting signal of astrovirus was much less similar to the frameshift regions of these viruses than to those of some retroviruses (data not shown). The ribosomal frameshifting during translation of astrovirus RNA probably directs the synthesis of an ORF 1a/1b fusion nonstructural polyprotein of 1416 aa with a predicted molecular mass of 161 kDa.

The nucleotide sequence of the astrovirus genomic RNA and the deduced amino acid sequences of the nonstructural polyprotein and the capsid protein of H-Ast2 were compared with partial sequences available for H-Ast1. Between serotypes, the ribosomal frameshifting region was completely conserved (data not shown), and the amino acid sequence was highly conserved (>90% identical) in a portion of the nonstructural polyproteins¶ (3) but was less conserved (51-56% identical) in the predicted capsid-protein regions (3, 7).

¶Willcocks, M. M. & Carter, M. J., Third International Symposium on Positive Strand RNA Viruses, Sept. 19-24, 1992, Clearwater, FL, pp. 2-47 (abstr.).

Of interest, a region in the C-terminal portion of the nonstructural polyprotein was significantly similar to the putative RNA-dependent RNA polymerases (RdRps) of plant bymoviruses (P = 0.015) and potyviruses (P = 0.095). This region contained the eight conserved motifs typical of the positive-strand RNA virus RdRps, indicating that it belongs to the so-called supergroup I, which includes the polymerases of picornaviruses, caliciviruses, potyviruses, and several other groups of plant viruses (Fig. 3A; refs. 29 and 30).

Comparison of the protein sequence of astrovirus with a data base of sequences of other positive-strand RNA viruses identified a region of similarity with RHDV that included the putative catalytic cysteine of the RHDV protease. Using the previously published alignments of chymotrypsin-related proteases of positive-strand RNA viruses, we identified, in the putative protease domain of astrovirus, the conserved segments surrounding the three catalytic amino acid residues and a fourth distal segment implicated in substrate binding (Fig. 3B; ref. 31). In addition, sequences of the putative proteases of H-Ast2, RHDV, and [...] were
aligned. For a 118-aa residue overlap, an adjusted alignment score of 5.4 SDs above random expectation with an evolutionary distance of 214 was observed, values indicating a genuine evolutionary and functional relationship, given that additional evidence (e.g., conservation of specific functional motifs) is available (32).

An important feature of the putative protease of H-Ast2 is the substitution of serine for the catalytic cysteine found in most positive-strand RNA-virus proteases of superfamily I. Previously, an analogous substitution was found in the putative proteases of sobemoviruses, luteoviruses, and arteriviruses (Fig. 3B; refs. 20, 31, 33, 34). However, the putative protease of H-Ast2 showed less similarity to these viral proteases than to cysteine proteases of the caliciviruses.

An extensive search of the astrovirus nonstructural polyprotein sequence for motifs defining other conserved domains of positive-strand RNA viruses failed to identify candidate regions for an RNA helicase, methyltransferase, or papain-like protease (35-37). Absence of the helicase domain is remarkable because this domain has always been identified in positive-strand RNA viruses with genomes of >6000 nt (38). Absence of the methyltransferase domain suggested that astrovirus may encode VPg, a viral protein covalently linked to the 5' end of the viral genome (39, 40), a conjecture compatible with the affinity of the putative H-Ast2 polymerase with supergroup I RdRps, which mostly belong to VPg-containing viruses (29, 30).

Additional features detected by computer analysis of the nonstructural polyprotein of H-Ast2 included four transmembrane α-helices and an NLS (Fig. 1). The transmembrane helices were located in the region upstream of the protease and may be involved in anchoring the viral RNA replication complex in the membrane, as described for the 3A or 3AB proteins of poliovirus (41). In all positive-strand RNA viruses for which the VPg domain has been localized, it is found within a short region between a (putative) transmembrane segment and the protease (E.V.K., unpublished data) and is linked to the 5' end of the viral RNA by a tyrosine or a serine residue (39, 40). This region of the H-Ast2 polyprotein has no appropriately located tyrosines and has only one serine (Ser-420), suggesting that this serine may be the RNA-linking amino acid of VPg. The NLS, spanning aa 666-682, is identical to that of H-Ast1. This signal may be involved in transport of astrovirus proteins to the nucleus, as substantiated by the fact that astrovirus products were detected by immunofluorescence in the nucleus of bovine astrovirus-infected cells (42). The astrovirus NLS perfectly fits the consensus for the bipartite-signal motif comprising two clusters of basic amino acid residues separated by a 10-aa spacer region (43). In a curious analogy, both the protease and the RdRp of potyviruses contain similar NLSs and are accumulated in the nuclei of infected plant cells (44, 45).

Although the data base screening failed to detect other sequences significantly similar to the capsid protein of H-Ast2, direct comparison of this capsid sequence with the sequences of other positive-strand RNA virus-capsid proteins identified a conserved domain (from position 107 to 286) with hepatitis E virus (from position 159 to 337), an agent phylogenetically remote from astrovirus and other supergroup I viruses in terms of the comparison of RdRps and the other principal nonstructural domains (data not shown; ref. 46). Because both astrovirus and hepatitis E virus replicate in the human gut, this conserved domain might have resulted from a recombinational event during coinfection. Of interest, astrovirus-like particles have been reported (47) in association with fatal hepatitis in ducklings, suggesting a possible hepatic tropism for this virus.

To gain further insight into the evolutionary relationship of astroviruses, we generated a tentative phylogenetic tree (15) for the supergroup I RdRps, including the H-Ast2 sequence. The result showed that astroviruses constitute a distinct evolutionary lineage not closely associated with any other group of viruses (Fig. 4). We are inclined to interpret the relatively high similarity with the RdRps of bymoviruses and potyviruses (see above) as conservation of ancestral features rather than direct evidence of common origin.

Our data show that astrovirus has no close relatives among other viruses, as demonstrated by comparative sequence

[Fig. 2 drawing. Sequence and translations legible in the panels:
A. Astrovirus: GCCCCAAAAAACUACAAA, followed by the stem-loop; ORF 1a: A P K N Y K; ORF 1b: P K K L Q; 1a-1b: A P K K L Q.
B. MMTV: AAUUCAAAAAACUUG, followed by the stem-loop; gag: N S K N L *; pro: F K K L L; gag-pro: N S K K L L.]

FIG. 2. The putative ribosomal frameshifting signal in the astrovirus genome. (A) Nucleotide sequence and predicted RNA secondary structure in the overlap region of astrovirus ORFs 1a and 1b. The putative frameshift site ("shifty" heptanucleotide sequence) is underlined, and the termination codon for ORF 1a is boxed. A potential pseudoknot structure was predicted by searching the region downstream of the stem-loop structure for sequences complementary to the loop sequence. Three base pairs may be sufficient for the pseudoknot formation (17), but the formation of a larger "secondary" stem with a noncanonical GA pair (shown by a dotted line) and two additional canonical base pairs is also possible. The deduced amino acid sequences of ORFs 1a, 1b, and 1a-1b surrounding the frameshift site are shown. (B) Nucleotide sequence and predicted RNA secondary structure in the gag-pro overlap region of mouse mammary tumor virus (MMTV) (18) are shown for comparison. The frameshift site, the termination codon, and the RNA pseudoknot are indicated or described as in A.
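The three reading frames shown in Fig. 2A, and the 1416-aa length quoted for the ORF 1a/1b fusion, can be checked with a few lines of Python. This is a minimal sketch, not the authors' procedure. It assumes that the 18-nt window legible in the figure starts at nt 2786 (so that AAAAAAC falls at nt 2791-2797), that ORF 1a starts at nt 83 (82 untranslated nucleotides precede it), and that nt 4329 is the last coding nucleotide of ORF 1b. The function names are illustrative. `conceptual_fusion` reproduces the conceptual 1a-1b translation printed in the figure by re-reading the first nucleotide of the heptanucleotide in the -1 frame; it does not model which tRNAs actually slip.

```python
# Standard genetic code, built in UCAG order (RNA alphabet).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

SLIPPERY = "AAAAAAC"  # "shifty" heptanucleotide reported at nt 2791-2797


def translate(rna):
    """Translate the complete codons of `rna`, ignoring a trailing 1-2 nt."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def conceptual_fusion(rna):
    """Read frame 0 up to the slippery site, then the -1 frame from its first nt."""
    slip = rna.index(SLIPPERY)
    upstream = rna[:slip + 1]  # ends on the first A of the heptanucleotide
    if len(upstream) % 3:
        raise ValueError("slippery site is not positioned X_XXY_YYZ in frame 0")
    return translate(upstream) + translate(rna[slip:])


def fusion_length(orf1a_start, slip_start, orf1b_end):
    """Frame-0 codons through slip_start plus -1 frame codons from slip_start."""
    upstream = slip_start - orf1a_start + 1
    downstream = orf1b_end - slip_start + 1
    if upstream % 3 or downstream % 3:
        raise ValueError("coordinates are not consistent with a -1 frameshift")
    return upstream // 3 + downstream // 3


# 18-nt window legible in Fig. 2A (assumed to span nt 2786-2803).
window = "GCCCCAAAAAACUACAAA"
print(translate(window))              # ORF 1a frame  -> APKNYK
print(translate(window[2:]))          # ORF 1b frame  -> PKKLQ
print(conceptual_fusion(window))      # 1a-1b fusion  -> APKKLQ
print(fusion_length(83, 2791, 4329))  # -> 1416
```

Under these assumptions the fusion consists of 903 codons read in the ORF 1a frame and 513 codons read in the ORF 1b frame, which agrees with the 1416 aa stated in the text.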
A. Motifs I-IV (numbers are distances, in residues, from the N terminus and between motifs)

Virus   N-term    I       gap  II                               gap  III           gap  IV
cons              ..K.E. .&.& R&& . &. &G . . . .&. .D&..@D....
H-Ast   1086      FLKKEI  13   IVCADPIYTRIGA-CLEAHQNAL-MK-QHTD  5    CGWSPMEGGFKK  15   FDWTRYDGTIP
BaYMV   197       SLKAEL  12   VFTASPITSLFAM-KFYVDDFNK-KF-YATN  6    VGINKFGRGWEK  14   GDGSRFDSSID
TEV     169       SLKAEL  12   TFTAAPIDTLLAG-KVCVDDFNN-QF-YDLN  6    VGMTKFYQGWNE  13   ADGSQFDSSLT
FCV     14051 92  GLKDEL  12   MIWGCDVGVATVCAAA-FKGVSDAITANHQY  8    MDSPSVEALFQR  9    VDYSKWDSTQS
RHDV              GLKDEL  12   LLWGCDVGVA-VCAAAVFHNICYKLKMVARF  8    MTSRDVDVIINN  10   LDYSKWDSTMS
SRSV    1391      ALKDEL  13   LLWGADLGTV-VRAARAFGPFCDAIKSHTIK  7    NSIEDGPLIYAE  9    ADYTAWDSTQN
PV      156       YVKDEL  12   LIEASSLNDSVAM-RMAFGNLYA-AFHKNPG  6    VGCDP-DLFWSK  11   FDYTGYDASLS
EMCV    156       FLKDEL  12   IVDVPPFEHCILG-RQLLGKFAS-KFQTQPG  6    IGCDP-DVHWTA  13   VDYSNFDSTHS
FMDV    161       FLKDEI  12   IVDVLPVEHILYT-RMMIGRFCA-QMHSNNG  6    VGCNP-DVDWQR  13   VDYSAFDTNHC
ECHO22  163       CLKDEL  12   CIEACEVDYCIVY-RMIMMEIYD-KIYQTPC  6    VGINP-YKDWHF  11   MDYSQYDGSLS
HAV     164       CPKDEL  12   AIDACPLDYSILC-RMYWGPAIS-YFHLNPG  6    IGIDP-DRQWDE  14   LDFSAFDASLS
PLRV    284       FVKGEP  12   LIMSVSLVDQLVA-RVLFQNQNKREISLWRS  6    FGLST-DTQTAE  27   TDCSGFDWSVA
SBMV    215       FVKQEP  12   LISSVSIVDQLVE-RMLFGAQNELEIAEWQS  6    MGLSV-IHQADA  16   ADISGFDWSVQ

A (continued). Motifs V-VIII (the last number is the distance from the C terminus)

Virus   gap  V                             gap  VI          gap  VII        gap  VIII        C-term
cons         ....SG ...... NS.& ..&.&. .& . &GDD. && ....@~U... .U... T S C
H-Ast   49   TRGNPSGQFSTTMDNNMVNFWLQAFEF   15   TVVYGDDRLS  36   VGLSFCGFT  13   KLMASLLKPY  71
BaYMV   45   NVGNNSGQPSTVVDNTLVLMTAFLYAY   17   FVCNGDDNKF  34   CENPYMSLT  10   SLPVERIIAI  80
TEV     47   HKGNNSGQPSTVVDNTLMVIIAMLYT-   12   YYVNGDDLLI  33   TQLWFMSHR  10   KLEEERIVSI  98
FCV     43   SSGLPSGMPLTSVINSLNHCLYVGCAI   20   MMTYGDDGVY  39   DSVVFLKRT  10   LLDRSSILRQ  102
RHDV    43   KRGLPSGMPFTSVINSICHWLLWSAAV   19   FYTYGDDGVY  39   NKISFLKRT  10   KLDKSSILRQ  5
SRSV    43   KEGLPSGFPCTSQVNSINHWLITLCAL   16   FSFYGDDEIV  38   DGLVFLRRT  10   RLDRASIERQ  95
PV      40   KGGMPSGCSGTSIFNSMINNLIIRTLL   13   MIAYGDDVIA  36   ENVTFLKRF  13   VMPMKEIHES  61
EMCV    43   TGGLPSGCAATSMLNTIMNNIIIIRAGL  13   VLSYGDDLLV  37   EDVVFLKRK  10   VMNREALEAM  57
FMDV    43   EGGMPSDCSATGIINTILNNIYVLYAL   13   MISYGDDIVV  38   TDVTFLKRH  12   VMASKTLEAI  59
ECHO22  42   HGGMPSGSPCTTVLNSLCNLMMCIYTT   10   PIVYGDDVIL  37   MEVEFLKRK  13   LLDTENMIQH  63
HAV     43   CGSMPSGSPCTALLNSIINNVNLYYVF   16   ILCYGDDVLI  42   SELTFLKRS  10   AISEKTIWSL  69
PLRV    50   PGVQKSGSYNTSSSNSRIR-VMAAYHC   4    AMAMGDDALE  19   -ELEFCSHI  9    VNTNKMLYKL  51
SBMV    49   PGIMKSGSYCTSSTNSRIR-CLMAELI   4    CIAMGDDSVE  31   YAVEFCSHV  8    TSWPKTLYRF  100

B. Putative chymotrypsin-like proteases (first number, distance from the N terminus; last number, distance from the C terminus)

Virus   Protease  N-term  Segment 1  gap  Segment 2   gap  Segment 3                      C-term
H-Ast   3C1       453     NDIVTAAHV  25   KDIAFITCPG  47   RTQDGMSGAPVC-DKYG---RVLAVHQTN  847
FCV     3C1       1102    GVYASVAHV  18   GEFCCFRSTK  47   ETHPGDCGLPYI-DDNG---RVTGLHTGS  552
RHDV    3C1       1127    GLYISNTHT  14   TDLCLVKGES  45   QTTHGDCGLPLY-DSSG---KIVAIHTGK  ?
SRSV    3C1       1070    TVFITTTHV  21   GEFTQFRFSK  70   GTIPGDCGAPYV-HKRGNDWVVCGVHAAA  529
PLRV    3C1       39      NALVTAEH-  29   NDISILVGPP  53   NTGPGYSGTGFW-SSKN----LLGVLKGF  28
SBMV    3C1       41      DVLMVPHHV  31   IDFVLVKVPT  53   PTAKGWSGTPLY-TRDG----IVGMHTGY  24
BaYMV   NIa       220     DWILVPGHL  34   -DVIAIRRPA  51   STVLGMCGCQFWTLER----QIDGIHVAT  74
TEV     NIa       226     PFIITNKHL  34   -DMIIIRMPK  53   QTKDGQCGSPLVSTRDG---FIVGIHSAS  72
PV      3C        32      NVAILPTHA  28   LEITIITLKR  61   PTRAGQCGG-VITCT-G---KVIGMHVGG  19
EMCV    3C        38      RTLVVNRHM  31   TDVSFIRLSS  64   NTRKGWCGSALLADL-GGSKKILGIHSAG  25
FMDV    3C        38      TAYLVPRHL  35   SDAALMVLHR  64   ATRAGYCGGAVLAKD-GADTFIVGTHSAG  29
HAV     3C        38      DWLLVPSHA  37   QDVVLMKVPT  69   AWRPGMCGGALVSSNQSIQNAILGIHVAG  23
ECHO22  3C        39      DEIILHGHS  35   MDLAILKCKL  62   KSCKGMCGGLLISKVEG-NFKILGMHIAG  20

FIG. 3. Amino acid sequence alignment of the predicted functional domains of astrovirus with related domains of other positive-strand RNA viruses. (A) Putative RNA-dependent RNA polymerases. The designation of the motifs has been described (23). The consensus shows amino acid residues that are conserved in at least 80% of the polymerases of supergroup I (23-25). U, bulky aliphatic residue (I, L, M, V); @, aromatic residue (F, Y, W); &, bulky hydrophobic residue (aliphatic or aromatic); and *, any residue. Residues conserved in the (putative) polymerases of all positive-strand RNA viruses of eukaryotes are highlighted by boldface type. Stars denote identical residues, and colons denote similar residues in the sequences of the (putative) polymerases of H-Ast2 and barley yellow mosaic virus (BaYMV). The alignment was generated by the MACAW program (14) using the available information on conserved motifs in viral polymerases (26). Distances between the aligned conserved motifs and from the protein (or the polyprotein for astroviruses and caliciviruses) termini are indicated. The sequences were from GenBank (26-28). TEV, tobacco etch virus; RHDV, rabbit hemorrhagic disease virus; FCV, feline calicivirus; SRSV, small round structured virus; FMDV, foot-and-mouth-disease virus (type O1K); EMCV, encephalomyocarditis virus; PV, poliovirus (type 1); HAV, hepatitis A virus; ECHO22, echovirus (type 22); SBMV, southern bean mosaic sobemovirus; and PLRV, potato leafroll virus. (B) Putative chymotrypsin-like proteases. The same set of sequences was used as in A, but the sequences were regrouped to show the closer similarity between the putative proteases of the astrovirus and the caliciviruses. *, Putative catalytic residues; !, residues implicated in substrate binding (13); 3C1, 3C-like protease (after 3C protease of picornaviruses); NI, nuclear inclusion protein. Other designations, abbreviations, the procedure for alignment generation, and the sources of the sequences are as in A.
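The residue classes defined in this legend map directly onto regular-expression character classes, which gives a quick way to test whether a sequence segment fits a consensus string. The following is only a sketch. The helper name `consensus_to_regex` is illustrative. The consensus strings for motifs I and IV are taken as they appear in the extracted consensus row, where dots mark unconstrained positions, so treating "." as "any residue" is an assumption. The segments tested are those printed in panel A.

```python
import re

# Residue classes from the Fig. 3 legend.
CLASSES = {
    "U": "[ILMV]",     # bulky aliphatic
    "@": "[FYW]",      # aromatic
    "&": "[ILMVFYW]",  # bulky hydrophobic (aliphatic or aromatic)
    ".": ".",          # any residue (assumption: dots in the consensus row)
}


def consensus_to_regex(consensus):
    """Turn a consensus string into a compiled regex; other letters match literally."""
    parts = [CLASSES.get(symbol, re.escape(symbol)) for symbol in consensus]
    return re.compile("".join(parts))


motif_i = consensus_to_regex("..K.E.")
motif_iv = consensus_to_regex(".D&..@D....")

# Motif I segments from panel A all carry the K and E of the consensus.
for segment in ("FLKKEI", "SLKAEL", "GLKDEL", "FVKQEP"):
    print(segment, bool(motif_i.fullmatch(segment)))

# Motif IV: the H-Ast2 and poliovirus segments fit; the BaYMV segment has G
# where the consensus calls for a bulky hydrophobic residue.
for segment in ("FDWTRYDGTIP", "FDYTGYDASLS", "GDGSRFDSSID"):
    print(segment, bool(motif_iv.fullmatch(segment)))
```

A mismatch such as the BaYMV motif IV segment is compatible with the legend, because the consensus only requires conservation in at least 80% of the supergroup I polymerases.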
analysis, and that its genomic organization is distinctive among animal viruses. Astrovirus can be distinguished from other positive-strand, nonenveloped RNA viruses (Picornaviridae, Caliciviridae, and hepatitis E virus) by the presence of the ribosomal frameshift and the lack of a helicase domain. Otherwise, the closest similarity in genomic organization is with caliciviruses, which differ from astrovirus in that ORF 2 encoding the capsid protein is separated from the 3' end by a small ORF 3; picornaviruses differ by the 5' terminal localization of the capsid-protein genes and by the absence of subgenomic RNA; and hepatitis E virus has a distinct array of nonstructural genes, even though the gene encoding the structural protein is similarly localized at the 3' end (46, 48-50). It is remarkable, however, that astroviruses combine features typical of several very different groups of positive-strand RNA viruses and even retroviruses (the frameshift signal). Of special interest is the similarity of the genomic organization and expression strategy of astrovirus and plant luteoviruses (51). Both groups of viruses lack the helicase domain, whereas the protease and the polymerase domains are apparently fused via ribosome frameshifting. Moreover, both share the substitution of serine for the catalytic cysteine in the viral proteases.

The present findings strongly support the classification of astroviruses in another family, Astroviridae. The availability of sequence information will be useful in the development of sensitive diagnostic assays to further our understanding of the importance of this group of viruses as a cause of disease in humans and animals.

We thank Daniel Bradley, Jon Gentsch, and John O'Connor for critical reading of the manuscript.

1. Madeley, C. R. & Cosgrove, B. P. (1975) Lancet ii, 124.
2. Monroe, S. S., Stine, S. E., Gorelkin, L., Herrmann, J. E., Blacklow, N. R. & Glass, R. I. (1991) J. Virol. 65, 641-648.
3. Matsui, S. M., Kim, J. P., Greenberg, H. B., Young, L. V. M., Smith, L. S., Lewis, T. L., Herrmann, J. E., Blacklow, N. R., Dupis, K. & Reyes, G. R. (1993) J. Virol. 67, 1712-1715.
4. Kurtz, J. B. & Lee, T. W. (1984) Lancet i, 1405.
5. Greenberg, H. B. & Matsui, S. M. (1992) Infect. Agents Dis. 1, 71-91.
6. Herrmann, J. E., Taylor, D. N., Echeverria, P. & Blacklow, N. R. (1991) N. Engl. J. Med. 324, 1757-1760.
7. Willcocks, M. M. & Carter, M. J. (1992) Arch. Virol. 124, 279-289.
8. Monroe, S. S., Jiang, B., Stine, S. E., Koopmans, M. & Glass, R. I. (1993) J. Virol. 67, 3611-3614.
9. Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
10. Lambden, P. R., Cooke, S. J., Caul, E. O. & Clarke, I. N. (1992) J. Virol. 66, 1817-1822.
11. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410.
12. Henikoff, S. & Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. USA 89, 10915-10919.
13. Gorbalenya, A. E., Blinov, V. M., Donchenko, A. P. & Koonin, E. V. (1989) J. Mol. Evol. 28, 256-268.
14. Schuler, G. D., Altschul, S. F. & Lipman, D. J. (1991) Proteins 9, 180-190.
15. Felsenstein, J. (1989) Cladistics 5, 164-166.
16. Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870.
17. Pleij, C. W. (1990) Trends Biochem. Sci. 15, 143-147.
18. Chamorro, M., Parkin, N. & Varmus, H. E. (1992) Proc. Natl. Acad. Sci. USA 89, 713-717.
19. Brierley, I., Digard, P. & Inglis, S. C. (1989) Cell 57, 537-547.
20. den Boon, J. A., Snijder, E. J., Chirnside, E. D., de Vries, A. A., Horzinek, M. C. & Spaan, W. J. (1991) J. Virol. 65, 2910-2920.
21. Xiong, Z., Kim, K. H., Kendall, T. L. & Lommel, S. A. (1993) Virology 193, 213-221.
22. Prufer, D., Tacke, E., Schmitz, J., Kull, B., Kaufmann, A. & Rohde, W. (1992) EMBO J. 11, 1111-1117.
23. Eggen, R. & van Kammen, A. (1988) in RNA Genetics, eds. Ahlquist, P., Holland, J. & Domingo, E. (CRC, Boca Raton, FL), Vol. 1, pp. 49-69.
24. Hellen, C. U., Krausslich, H. G. & Wimmer, E. (1989) Biochemistry 28, 9881-9890.
25. Palmenberg, A. C. (1990) Annu. Rev. Microbiol. 44, 603-623.
26. Koonin, E. V. (1991) J. Gen. Virol. 72, 2197-2206.
27. Hyypia, T., Horsnell, C., Maaronen, M., Khan, M., Kalkkinen, N., Auvinen, P., Kinnunen, L. & Stanway, G. (1992) Proc. Natl. Acad. Sci. USA 89, 8847-8851.
28. Meyers, G., Wirblich, C. & Thiel, H. J. (1991) Virology 184, 664-676.
29. Dolja, V. V. & Carrington, J. C. (1992) Semin. Virol. 3, 315-326.
30. Koonin, E. V. & Dolja, V. V. (1993) Crit. Rev. Biochem. Mol. Biol. 28, 375-430.
31. Gorbalenya, A. E., Donchenko, A. P., Blinov, V. M. & Koonin, E. V. (1989) FEBS Lett. 243, 103-114.
32. Doolittle, R. F. (1986) Of URFs and ORFs (University Science Books, Mill Valley, CA).
33. Gorbalenya, A. E., Koonin, E. V., Blinov, V. M. & Donchenko, A. P. (1988) FEBS Lett. 236, 287-290.
34. Bazan, J. F. & Fletterick, R. J. (1989) FEBS Lett. 249, 5-7.
35. Gorbalenya, A. E., Koonin, E. V., Donchenko, A. P. & Blinov, V. M. (1989) Nucleic Acids Res. 17, 4713-4730.
36. Gorbalenya, A. E., Koonin, E. V. & Lai, M. M. (1991) FEBS Lett. 288, 201-205.
37. Rozanov, M. N., Koonin, E. V. & Gorbalenya, A. E. (1992) J. Gen. Virol. 73 (8), 2129-2134.
38. Gorbalenya, A. E. & Koonin, E. V. (1989) Nucleic Acids Res. 17, 8413-8440.
39. Vartapetian, A. B. & Bogdanov, A. A. (1987) Prog. Nucleic Acid Res. Mol. Biol. 34, 209-251.
40. Wimmer, E. (1982) Cell 28, 199-201.
41. Giachetti, C., Hwang, S. S. & Semler, B. L. (1992) J. Virol. 66, 6045-6057.
42. Aroonprasert, D., Fagerland, J. A., Kelso, N. E., Zheng, S. & Woode, G. N. (1989) Vet. Microbiol. 19, 113-125.
43. Dingwall, C. & Laskey, R. A. (1991) Trends Biochem. Sci. 16, 478-481.
44. Carrington, J. C., Freed, D. D. & Leinicke, A. J. (1991) Plant Cell 3, 953-962.
45. Li, X. H. & Carrington, J. C. (1993) Virology 193, 951-958.
46. Koonin, E. V., Gorbalenya, A. E., Purdy, M. A., Rozanov, M. N., Reyes, G. R. & Bradley, D. W. (1992) Proc. Natl. Acad. Sci. USA 89, 8259-8263.
47. Gough, R. E., Collins, M. S., Borland, E. & Keymer, L. F. (1984) Vet. Rec. 114, 279.
48. Lambden, P. R., Caul, E. O., Ashley, C. R. & Clarke, I. N. (1993) Science 259, 516-519.
49. Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, J. J., van der Werf, S., Anderson, C. W. & Wimmer, E. (1981) Nature (London) 291, 547-553.
50. Tam, A. W., Smith, M. M., Guerra, M. E., Huang, C.-C., Bradley, D. W., Fry, K. E. & Reyes, G. R. (1991) Virology 185, 120-131.
51. Bazan, J. F. & Fletterick, R. J. (1990) Semin. Virol. 1, 311-322.

FIG. 4. Tentative phylogenetic relationship for RNA-dependent RNA polymerases of supergroup I positive-strand RNA viruses. The same set of sequences was analyzed as for Fig. 3 A and B. The dendrogram was derived by majority-rule consensus of 100 bootstrapped trees generated using the KITSCH program of the PHYLIP package, which implements the distance-matrix algorithm of Fitch and Margoliash under the evolutionary-clock assumption. The other algorithms applied (15) produced very similar tree topologies, except for the protein-parsimony algorithm, which suggested grouping of the astrovirus polymerase with the bymovirus and potyvirus polymerases. For abbreviations, see Fig. 3A legend.
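The Fig. 4 legend refers to 100 bootstrapped trees. The resampling step behind such an analysis can be illustrated as follows: each replicate is built by drawing alignment columns with replacement, and a tree would then be inferred from every replicate. This is a generic sketch of that one step, not the PHYLIP implementation the authors used, and no tree building is attempted. The function names, the seed, and the three-sequence toy alignment (motif I segments from Fig. 3A) are illustrative assumptions.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    picks = [rng.randrange(length) for _ in range(length)]
    # The same column picks are applied to every row, so columns stay intact.
    return ["".join(row[i] for i in picks) for row in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate `n_replicates` resampled alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_columns(alignment, rng) for _ in range(n_replicates)]


# Toy alignment: motif I of H-Ast2, BaYMV, and FCV as printed in Fig. 3A.
toy = ["FLKKEI", "SLKAEL", "GLKDEL"]
replicates = bootstrap_replicates(toy)
print(len(replicates), replicates[0])
```

In a full analysis, the fraction of replicate trees that contain a given grouping is what a majority-rule consensus summarizes.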