Proc. Natl. Acad. Sci. USA Vol. 90, pp. 8595-8599, September 1993 Microbiology

Giardiavirus double-stranded RNA genome encodes a capsid polypeptide and a gag-pol-like fusion protein by a translation frameshift (RNA-dependent RNA polymerase/pseudoknot)

ALICE L. WANG*‡, HUNG-MIN YANG†, KATHY A. SHEN*, AND CHING C. WANG*†

*Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143-0446; and †Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan 11529

Communicated by Stanley N. Cohen, May 26, 1993

ABSTRACT Giardiavirus is a small, nonenveloped virus comprising a monopartite double-stranded RNA genome, a major protein of 100 kDa, and a less abundant polypeptide of 190 kDa. It can be isolated from the culture supernatant of Giardia lamblia, a parasitic flagellate in humans and other mammals, and efficiently infects other virus-free G. lamblia. A single-stranded copy of the viral RNA can be electroporated into uninfected G. lamblia cells to complete the viral replication cycle. Giardiavirus genomic cDNA of 6100 nt was constructed and its sequence revealed the presence of two large open reading frames that are separated by a -1 frameshift and share an overlap of 220 nt. The 3' open reading frame contains all consensus RNA-dependent RNA polymerase sequence motifs. A heptamer-pseudoknot structure similar to those found at ribosomal slippage sites in retroviruses and yeast killer virus was identified within this overlap. Immunostudies using antisera against synthesized peptides from four regions in the two open reading frames indicated that the 100- and 190-kDa viral proteins share a common domain in the amino-terminal region. But the 190-kDa protein makes a -1 switch of its reading frame beyond the presumed slippage heptamer and is therefore a -1 frameshift fusion protein similar to the gag-pol fusion protein found in retroviruses.

Abbreviations: GLV, giardiavirus; ORF, open reading frame; RDRP, RNA-dependent RNA polymerase; ScV, yeast killer virus; LRV, Leishmania RNA virus; BSA, bovine serum albumin; ds, double-stranded; ss, single-stranded.
‡To whom reprint requests should be sent at the * address.
§The sequence reported in this paper has been deposited in the GenBank data base (accession no. L13218).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Giardiavirus (GLV) is a double-stranded (ds) RNA virus of the family Totiviridae that specifically infects Giardia lamblia (1), an anaerobic flagellate that causes giardiasis, a prevalent diarrheal disease in man that affects underdeveloped as well as developed regions of the world. The protozoan parasite has two nuclei of similar size and function but lacks mitochondria, endoplasmic reticulum, or well-formed Golgi (2). Phylogenetic mapping employing small subunit rRNA sequences placed Giardia at the first diverging branch of eukaryote descent and identified it as one of the most ancient eukaryotes (3). Understanding the regulation of gene expression in this organism is therefore of particular interest. However, relatively little is known about this important organism, partly for lack of a genetic transfection system to study the expression of artificially constructed genes.

GLV is a 36-nm nonenveloped icosahedron comprising one dsRNA of about 7 kb, one major polypeptide of 100 kDa (p100), and a less abundant polypeptide of 190 kDa (p190) (4, 5). The molar ratio of p100 to GLV dsRNA in purified virion has been determined and found to be consistent with the 5:3:2:1 symmetry required by an icosahedral particle (4). In addition, antisera raised against purified virus particles react strongly with p100 but not with the dsRNA. Virus preincubated with this antiserum is no longer infectious (6). We therefore believe that p100 is most likely the viral capsid protein. On the other hand, RNA-dependent RNA polymerase (RDRP) activity resistant to α-amanitin or rifampicin has been found to be associated with the isolated virion (7). Whether this polymerase activity can be attributed to p190 has not yet been determined.

Unlike the uninfectious yeast killer virus (ScV) (8), Leishmania RNA virus (LRV) (9, 10), or other fungal viruses (11), GLV is shed into the culture supernatant and is highly infectious in its purified form (12). We have also isolated a single-stranded full-length copy (SS) of the viral RNA (13) and demonstrated that SS can be electroporated into virus-free G. lamblia cells to generate infectious progeny virus particles (14). It is thus possible that the GLV genomic transcript may provide a means for the introduction of foreign genes into G. lamblia.

Toward this goal, we have sequenced the viral genome and obtained the nucleotide sequence of a contiguous 6.1-kb fragment, which encodes two large open reading frames (ORFs), both located in the SS strand.§ From the deduced amino acid sequences in the ORFs, four peptides were synthesized and used to immunize rabbits. Results from Western blots of the viral proteins that reacted with the antisera indicate that (i) ORF1 encodes p100, the putative capsid protein, and (ii) p190 is an ORF1/ORF2 (-1) frameshift fusion protein that is also most likely the viral RDRP.

MATERIALS AND METHODS

Cell Culture and Preparation of RNAs. G. lamblia WB strain and culturing conditions have been described (1). Total RNA from GLV-infected WB cells was prepared by the hot phenol extraction method (1). GLV dsRNA or single-stranded (ss) RNA was then isolated from the total RNA through a CF-11 column (15) or by elution from electrophoresed agarose gel slices.

Synthesis of Partial cDNAs from GLV RNA. See Fig. 1 for the strategies that were used to obtain the overlapping cDNA clones.

Preparation of Antisera and Western Blots. The following peptides, double-underlined in Fig. 2, were synthesized by the ABI system at the Institute of Molecular Biology (Academia Sinica, Taipei), conjugated to keyhole limpet hemocyanin (21), and used to immunize rabbits: (i) the capsid peptide C, NHITTTYATEEEVTMAIK (encoded by nt 1883-1936, in ORF1); (ii) the terminal peptide T, YSIRVSTLHHLMTTRAK (encoded by nt 2952-3002, at the carboxyl terminus of
ORF1); (iii) the sphinx peptide S, LVLRHNLR (encoded by nt 2809-2832, at the beginning of ORF2; it is in-frame with peptide R but upstream from the presumed frameshift site at nt 2836-2842, see below); and (iv) the RDRP peptide R, MSYLEYIMADTIVDKAFTTT (encoded by nt 4246-4305, in ORF2). Titers of antisera were monitored in ELISA using bovine serum albumin (BSA)-conjugated peptides (1 mg/ml) as antigens. After the third monthly booster injection, all four antisera showed very high titers in ELISA, each giving positive results beyond 50,000-fold dilution.

GLV was partially purified by lysing the infected G. lamblia cells in 1% Nonidet P-40 in TSE buffer (10 mM Tris·HCl, pH 7.6/0.1 M NaCl/1 mM EDTA). The virus was pelleted twice from the cleared supernatant at 65,000 rpm for 45 min in a Beckman TL100.2 rotor. The final virus suspension was then fractionated in 6% SDS/PAGE (22), blotted to nitrocellulose, and incubated with the primary antiserum at the dilution of 1:1000. Immune reactions were visualized with the ECL horseradish peroxidase chemiluminescence system (Amersham).

FIG. 1. Partial cDNA clones of the GLV genome and their overlaps. Several strategies were used to obtain the first strand cDNA of these clones. pRP4, pRP5, and pG30 were synthesized using random hexamers, avian myeloblastosis virus reverse transcriptase, and 1 µg of gel-purified GLV dsRNA as template (16). pM2, pMN1, pRBW, and pNFP were synthesized using gel-purified SS as template and primed with antisense primers derived from the sequences of pRP4, pRP5, and pMN1. The second strands of these cDNA clones were synthesized according to the method of Gubler (17). pGGT2, pLDN, and pDBT were synthesized using primers derived from previously verified cDNA and the reverse transcriptase/polymerase chain reaction (PCR) method (18). pDON27, pSAN27, pCT4, and pTPT were synthesized similarly by the reverse transcriptase/PCR method, except that the template was poly(A)-tailed GLV RNA synthesized from methylmercuric hydroxide-denatured (19) GLV dsRNA. After the synthesis of the second strand cDNA, the DNA fragments were spliced into the EcoRV site of the pBluescript vector (Stratagene). Viral origin of inserts was verified by Southern and Northern hybridizations and by S1 nuclease protection studies (16) and their nucleotide sequences were determined by the dideoxy chain-termination method (20). Each insert was sequenced in its entirety from both directions. [Clone map with a 1-kb scale bar not reproduced.]

RESULTS

Nucleotide Sequence of the Giardiavirus Genomic cDNA and the Start Site of GLV Polypeptides. The complete sequence comprising 6100 nt of the GLV dsRNA genome, constructed through extensive verifications and overlaps, is shown in Fig. 2. Although the size of GLV dsRNA has been previously estimated as 7 kb based on its mobility on denaturing gels (1), the present sequence data suggest that the dsRNA genome has a smaller size of 6.1 kb. We believe that this fragment is the full-length viral genome, because its outermost cDNA clones were generated from poly(A)-tailed GLV dsRNA and the sequences adjoining the added As were reproducible in repeated experiments. On the 5' side, we also reproduced with several internal primers the same 5' terminal sequence of this 6.1-kb fragment that is complementary to the 3' terminus of the antisense strand.

Computer-assisted analysis of this 6.1-kb cDNA revealed the presence of two large ORFs, ORF1 (nt 356-3028) and ORF2 (nt 2806-5977), separated by a -1 frameshift and sharing an overlap of 220 nt (Fig. 2). Hybridization studies using strand-specific probes indicated that both ORFs are from the SS strand. No other ORF longer than 100 amino acid residues was found in all six reading frames of this cDNA. ORF1 begins at nt 356 with the first codon for amino acid Trp. Since there is a stop codon in each of the three reading frames within 30 nt upstream from this point, the first possible translation start for ORF1 is therefore the Met encoded by nt 368-370. Translation commencing at this position would terminate at nt 3026, producing a polypeptide of 887 amino acid residues with predicted molecular mass of 98.4 kDa and pI of 6.4. The next in-frame AUG in ORF1 is at nt 719-721. A polypeptide starting from this site would have the predicted molecular mass of 85.5 kDa. Based on its mobility in SDS/PAGE, the size of p100 is estimated to be closer to 99.5 kDa, rendering the first Met at nt 368-370 the most likely translation start site. Unfortunately, both amino termini of p100 and p190 are found to be blocked, precluding proof of the start site with direct peptide sequencing data. Upstream of the presumed ORF1 start site Met, there is an untranslated region of 367 nt that may contain the promoter elements of the viral genome.

ORF2 (nt 2806-5977) encodes a polypeptide of 1057 amino acid residues with a predicted molecular mass of 120.1 kDa and pI of 9.65. Visual inspection of its amino acid sequence revealed the presence of all known consensus sequence motifs of RDRP (23), which are underlined in Fig. 2.

Identification of Ribosomal Slippage Heptanucleotide and a Downstream Pseudoknot. Inspection of the 220-nt ORF overlap revealed a ribosomal slippage heptamer CCCUUUA (nt 2836-2842) plus a downstream pseudoknot (Fig. 3) that agree with all of the structural features in the slippage frameshift sites found in a large number of RNA viruses (24). The constraints required for frameshifting in these viruses, such as retroviruses, coronaviruses, and two dsRNA yeast viruses, include (i) a heptamer consisting of a run of 3 A, U, or G residues followed by the tetranucleotide UUUA, UUUU, or AAAC and (ii) the presence of a stem-loop or pseudoknot on the 3' side 4-9 nt from the heptamer. The structure of heptamer-pseudoknot in GLV conforms to all of these rules with the only exception that the first three residues of the heptamer are C instead of A, U, or G.

Identification of the Translational Products from ORF1 and ORF2 with Specific Antisera. Since neither GLV ORF1 nor ORF2 alone is large enough to encode the 190-kDa viral polypeptide, and there is no evidence for alternately spliced viral RNA, we synthesized four peptides to test if GLV adopts a fusion protein strategy for the synthesis of its two viral proteins from a single, unmodified mRNA (24). Peptide C represents a small segment in the midportion of ORF1. Antiserum against C should recognize the 100-kDa viral capsid as well as p190 if the latter is a fusion protein. Peptide T is at the carboxyl terminus of ORF1, downstream from the hypothetical ribosomal slippage site. Antiserum against T should recognize only the 100-kDa viral capsid protein. Peptide S is at the beginning of ORF2, upstream from the slippage heptamer. If the ribosomes switch frames at the putative slippage site, the S sequence would be absent from the viral capsid and the fusion protein. Consequently, antiserum against peptide S should not recognize either viral polypeptide. Peptide R is from the midregion of ORF2. Antiserum to R should recognize only the p190 protein.

Immunoreactions between these antisera and the viral polypeptides on Western blots are shown in Fig. 4. The lower band in Fig. 4G is perhaps a degradative product of p100 often detected in GLV samples prepared without the final CsCl-banding step (1). The polypeptide p190 is present in quantities
2-5% of p100 and is not always visible in the Coomassie blue-stained gel. It is evident from Fig. 4W that (i) anti-peptide C reacts strongly with p100 and clearly with p190; (ii) anti-peptide T reacts only with p100; (iii) anti-peptide S fails to react with either p100 or p190; and (iv) anti-peptide R reacts only with p190. These results confirm the prediction from the -1 frameshift model and suggest strongly that the GLV RDRP is a gag-pol-like fusion protein similar to that found in the retroviruses (25).

Sequence Homology with Other Polypeptides. Using the program FASTA, we did not find any sequence with homology scores above the noise level in the GenBank (release 75, last updated on February 15, 1993) for either ORF1 or ORF2, suggesting that GLV is not very closely related to any of the known viruses. However, since all of the consensus RDRP motifs have been identified in the middle third of ORF2 (Fig. 2), the latter may still retain many additional features of RDRP shared by RNA viruses. Previous studies have shown that GLV is very similar to ScV and LRV1 in its biological characteristics, such as genome organization and sizes of virion, genome, and capsid polypeptide (1, 8-10). The program LFASTA was used to align the three viral peptides pairwise until the known RDRP motifs were correctly aligned. Subsequences were then taken from these paired segments and further aligned by the algorithm of PIMA (26). Fig. 5 shows that within the region of 352 amino acid residues, sequence homology among the three viruses extends beyond the known RDRP motifs. The percentages of identical residues among the LRV:GLV, ScV:GLV, and LRV:ScV pairs are, however, very low. They are 16.43%, 16.71%, and 27.11%, respectively, suggesting that GLV is equally distant from LRV and ScV, whereas the latter two might have shared a more recent common ancestor.

FIG. 2. Nucleotide sequence of GLV genomic cDNA. ORF1 and ORF2 are shown in the amino acid single-letter code. Consensus RDRP motifs are underlined; ribosomal slippage heptamers are in boldface type and dotted-underlined, whereas the synthetic oligopeptides are double-underlined. The lowercase residues at the 3' terminus are probably introduced by the tailing step in cDNA construction. [Sequence listing, nt 1-6100 with the deduced ORF1 and ORF2 translations, not reproduced; the sequence is deposited under GenBank accession no. L13218.]

FIG. 3. Ribosomal slippage heptamer and its downstream pseudoknot in GLV RNA. The sequence of the heptamer is underlined. [Secondary-structure drawing, labeled with nt 2828 and nt 2897, not reproduced.]

FIG. 4. Western blots of immunosera to synthetic peptides reacting with GLV viral proteins. (G) Coomassie blue-stained 6% SDS/PAGE gel. p190 is too faint to be detectable in stained gels. (W) Corresponding strips reacting with antiserum to peptide C, peptide R, peptide T, or peptide S. [Gel image with molecular mass markers in kDa not reproduced.]

DISCUSSION

GLV was one of the first protozoan viruses characterized biochemically (1). All protozoan viruses known to date turn out to have nonsegmented dsRNA genomes ranging from 4.5 to 7 kb (5). Among them, GLV remains the only one that is infectious as isolated virion, and the infectivity of its RNA transcript has been demonstrated (5, 14). Using a combination of cloning methods, we have obtained the sequence of a contiguous 6100-nt GLV cDNA that consists of two large overlapping ORFs in addition to 368 and 123 nt of untranslated flanking regions at its 5' and 3' terminus, respectively. cDNA cloning data from 3' tailed viral RNA template suggest that this fragment also includes both 3' termini of the GLV dsRNA. The previous higher estimate of 7 kb for this viral dsRNA was probably due to the anomalous mobility of GLV dsRNA in agarose gel electrophoresis.

There are several strategies for a virus to make more than one polypeptide from the same mRNA. The possibility of RNA splicing was ruled out in the case of GLV because there are only the plus and the minus strands of the viral genome in the virus-infected cell extract. No subgenomic viral RNA was observed in Northern blots (13). We have also ruled out RNA editing because one would expect to find RNAs of preedited and edited versions during cDNA synthesis if such an event took place. Outcomes from our repeated primer-extension and PCR experiments have been consistent with the conclusion that GLV dsRNA is neither spliced nor edited. Our Western blot results also show that the fusion protein p190 is formed via translational frameshift instead of suppression of the stop codon, thus ruling out the read-through model used by some viruses (24).

The apparent presence of a slippage heptamer-pseudoknot in the 220-nt ORF overlap in the viral genome and the results from studying the antisera to synthesized peptides strongly suggest a -1 frameshift model for the synthesis of GLV RDRP similar to the one found in retroviruses or ScV (25, 27, 28), but not LRV1, which probably uses a +1 frameshift to synthesize the putative fusion protein (29).

The heptamer-pseudoknot structure in the GLV ORF overlap contains all of the features required for frameshifting (24) except for the first three C residues in the heptamer. Mutagenesis studies have shown that CCC at this position functions at wild-type level in yeast (30). To our knowledge, however, CCC has not been identified previously in the slippage heptamer in nature.

The predicted molecular mass for GLV capsid protein of 98.4 kDa from the cDNA sequence agrees well with that of the viral polypeptide observed. The predicted molecular mass for the fusion protein commencing from the same start site as the capsid is, however, 210 kDa, slightly larger than our estimate of the viral RDRP of 190 kDa from its mobility in SDS/PAGE. It is possible that this discrepancy is due to anomalous movement of the polypeptide in gel electrophoresis in the high molecular mass region. However, other possibilities such as post-translational processing by proteases (31) or initiation of translation at internal sites, such as in the case of poliovirus (32) or encephalomyocarditis virus (33), cannot be ruled out at this moment.

Within the 368-nt stretch at the 5' region of GLV sense strand, there are other possible translational initiation codons. Five ATGs are found at nt 82, 170, 204, 323, and 330, respectively. Additionally, an ATGATG repeat is found at nt 234-239. These ATGs are each followed by a small ORF ranging from 3 to 46 amino acid residues, many of which overlap one another. It is not known if any of these small ORFs is expressed. However, no peptides of such sizes have been noted in SDS/PAGE analysis of purified virus particles. A search in this same upstream region did not find any sequence motifs of known promoters or any consensus
translational initiation sequences (34), including those used in vertebrates, Drosophila (35), or yeast (36), surrounding the putative GLV capsid start site. Future translational studies may verify whether this 368-nt sequence, or even smaller fragments therein, can function as promoter for the expression of foreign genes in GLV-infected G. lamblia cells.

FIG. 5. Alignment of computer-deduced amino acid sequences around conserved RDRP domains from LRV, ScV, and GLV. Identical amino acids are shaded. lrv, Leishmania RNA virus 1; scv, yeast killer virus ScV-L; glv, giardiavirus. [Alignment rows not reproduced; the row-end residue numbers are 347, 423, 495, 564, and 614 (lrv); 369, 447, 515, 593, and 643 (scv); and 415, 484, 558, 638, and 689 (glv).]

We are indebted to Dr. Ming F. Tam for helpful discussions in designing the synthetic peptides and Leslie Taylor for assistance in the PIMA alignment. This work was supported by Grant AI-30475 from the National Institutes of Health (U.S.) as well as the Academia Sinica and the National Science Council of Taiwan.

1. Wang, A. L. & Wang, C. C. (1986) Mol. Biochem. Parasitol. 21, 269-276.
2. Adam, R. D. (1991) Microbiol. Rev. 55, 706-732.
3. Sogin, M. L., Gunderson, J. H., Elwood, H. J., Alonso, R. A. & Peattie, D. A. (1989) Science 243, 75-77.
4. Miller, R. L., Wang, A. L. & Wang, C. C. (1988) Mol. Biochem. Parasitol. 28, 189-1%.
5. Wang, A. L. & Wang, C. C. (1991) Annu. Rev. Microbiol. 45, 251-263.
6. Wang, A. L., Miller, R. L. & Wang, C. C. (1988) Mol. Biochem. Parasitol. 30, 225-232.
7. White, T. C. & Wang, C. C. (1990) Nucleic Acids Res. 18, 553-559.
8. Bruenn, J. A. (1980) Annu. Rev. Microbiol. 34, 49-68.
9. Tarr, P. I., Aline, R. F., Jr., Smiley, B. L., Scholler, J. & Stuart, K. D. (1988) Proc. Natl. Acad. Sci. USA 85, 9572-9575.
10. Widmer, G., Comeau, A. M., Furlong, D. B., Wirth, D. & Patterson, J. L. (1989) Proc. Natl. Acad. Sci. USA 86, 5979-5982.
11. Lemke, P. A. (1976) Annu. Rev. Microbiol. 30, 105-145.
12. Miller, R. L., Wang, A. L. & Wang, C. C. (1988) Exp. Parasitol. 66, 118-123.
13. Furfine, E. S., White, T. C., Wang, A. L. & Wang, C. C. (1989) Nucleic Acids Res. 17, 7453-7467.
14. Furfine, E. & Wang, C. C. (1990) Mol. Cell. Biol. 10, 3659-3663.
15. Franklin, R. M. (1966) Proc. Natl. Acad. Sci. USA 55, 1504-1511.
16. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY), 2nd Ed.
17. Gubler, U. (1987) Methods Enzymol. 152, 330-335.
18. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Science 239, 487-491.
19. Payvar, F. & Schimke, R. T. (1979) J. Biol. Chem. 254, 7636-7642.
20. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
21. Harlow, E. & Lane, D. (1988) Antibodies: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY).
22. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
23. Lundblad, V. & Blackburn, E. (1990) Cell 60, 529-530.
24. ten Dam, E. B., Pleij, C. W. A. & Bosch, L. (1990) Virus Genes 4, 121-136.
25. Jacks, T., Madhani, H. D., Masiarz, F. R. & Varmus, H. E. (1988) Cell 55, 447-458.
26. Smith, R. F. & Smith, T. F. (1992) Prot. Eng. 5, 35-41.
27. Icho, T. & Wickner, R. B. (1989) J. Biol. Chem. 264, 6716-6723.
28. Diamond, M. E., Dowhanick, J. J., Nemeroff, M. E., Pietras, D. F., Tu, C. & Bruenn, J. A. (1989) J. Virol. 63, 3983-3990.
29. Stuart, K. D., Weeks, R., Guilbride, L. & Myler, P. J. (1992) Proc. Natl. Acad. Sci. USA 89, 8596-8600.
30. Dinman, J. D., Icho, T. & Wickner, R. B. (1991) Proc. Natl. Acad. Sci. USA 88, 174-178.
31. Palmenberg, A. C. (1990) Annu. Rev. Microbiol. 44, 603-623.
32. Pelletier, J. & Sonenberg, N. (1988) Nature (London) 334, 320-325.
33. Jang, S. K., Davies, M. V., Kaufman, R. J. & Wimmer, E. (1989) J. Virol. 63, 1651-1660.
34. Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870.
35. Cavener, D. R. (1987) Nucleic Acids Res. 15, 1353-1361.
36. Cigan, A. M. & Donahue, T. F. (1987) Gene 59, 1-18.
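
The coordinate arithmetic behind the -1 frameshift model described in RESULTS can be checked with a short script. The following is a minimal sketch in Python and is not part of the original analysis: the positions and peptide sequences are those quoted in the text, while the function names (span_length, relative_frame, slip_minus_one, fits_rule_i) are illustrative only. Two assumptions are made: rule (i) is read as three identical A, U, or G residues followed by one of the listed tetranucleotides, and the 220-nt overlap is counted from the first base of ORF2 to the last base before the ORF1 stop codon at nt 3026.

```python
# Positions are 1-based and inclusive, as quoted in the text.
ORF1_START = 356         # first codon of ORF1
ORF1_STOP_CODON = 3026   # first base of the ORF1 stop codon
ORF2_START = 2806        # first codon of ORF2
HEPTAMER_SEQ = "CCCUUUA"
HEPTAMER_START = 2836    # heptamer occupies nt 2836-2842

# Synthetic peptides: (sequence, first nt, last nt) as quoted in the text.
PEPTIDES = {
    "C": ("NHITTTYATEEEVTMAIK", 1883, 1936),
    "T": ("YSIRVSTLHHLMTTRAK", 2952, 3002),
    "S": ("LVLRHNLR", 2809, 2832),
    "R": ("MSYLEYIMADTIVDKAFTTT", 4246, 4305),
}


def span_length(first, last):
    """Number of nucleotides in an inclusive range."""
    return last - first + 1


def relative_frame(pos, orf_start):
    """Frame of a codon starting at pos relative to orf_start: 0, +1, or -1."""
    return ((pos - orf_start + 1) % 3) - 1


def slip_minus_one(heptamer):
    """Codon pairs read over X XXY YYZ before and after a -1 slip."""
    before = (heptamer[1:4], heptamer[4:7])
    after = (heptamer[0:3], heptamer[3:6])
    return before, after


def fits_rule_i(heptamer):
    """Rule (i), assumed to mean three identical A, U, or G residues
    followed by UUUA, UUUU, or AAAC."""
    run, tail = heptamer[:3], heptamer[3:]
    return (
        len(heptamer) == 7
        and len(set(run)) == 1
        and run[0] in "AUG"
        and tail in ("UUUA", "UUUU", "AAAC")
    )


def peptide_span_report():
    """For each peptide, does the quoted nt range equal 3 nt per residue?"""
    return {
        name: span_length(first, last) == 3 * len(seq)
        for name, (seq, first, last) in PEPTIDES.items()
    }


if __name__ == "__main__":
    print("peptide spans consistent:", peptide_span_report())
    print("ORF2 frame relative to ORF1:", relative_frame(ORF2_START, ORF1_START))
    print("codons before/after slip:", slip_minus_one(HEPTAMER_SEQ))
    print("heptamer fits rule (i):", fits_rule_i(HEPTAMER_SEQ))
    print("overlap length:", span_length(ORF2_START, ORF1_STOP_CODON - 1))
```

Under these assumptions, each quoted peptide range spans exactly three nucleotides per residue, ORF2 and peptides S and R share one reading frame that lies at -1 relative to ORF1, and CCCUUUA fails rule (i) only because its first three residues are C.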