Proc. Nati. Acad. Sci. USA Vol. 82, pp. 732-736, February 1985 Biochemistry

Molecular cloning and complete sequence determination of RNA of human rhinovirus type 14 (recombinant DNA/translation frame/sequence comparisons/proteolytic cleavage sites) PIA L. CALLAHAN, SATOSHI MIZUTANI, AND RICHARD J. COLONNO Department of and Cell Biology Research, Merck Sharp & Dohme Research Laboratories, West Point, PA 19486 Communicated by Edward M. Scolnick, October 3, 1984

ABSTRACT The genomic RNA of human rhinovirus type Tris Cl/1 mM EDTA/0.5% NaDodSO4, pH 7.5, and layered 14 was cloned in Escherichia coli and the complete on a 12-ml preformed 15-30% (wt/wt) sucrose gradient in sequence was determined. The RNA genome is 7212 nucleo- the same buffer but containing 0.1 M NaCl. RNA was sedi- tides long. A single large open reading frame of 6536 nucleo- mented at 23,000 rpm for 17 hr at 4°C in a Beckman SW40 tides was identified, which starts at nucleotide 678 and ends 47 rotor. Gradients were fractionated (0.4 ml) with an ISCO from the 3' end of the RNA genome. Comparisons gradient fractionator and the A260 of each fraction was mea- of the specified with those of other sured. A major peak of viral RNA sedimenting at 35 S, rela- showed a striking homology (44-65%) between rhinovirus and tive to the position of marker 18S and 28S in a parallel . The rhinovirus genomic RNA is rich in adenosine gradient, was pooled and precipitated with ethanol. (32.1%) and strongly favors an adenosine or uridine in the Construction of Double-Stranded cDNA. The procedures third position of codons. The predicted map locations of all the used for making cDNA to the genomic RNA and converting rhinovirus structural and non-structural proteins and their the cDNA to oligo(dC)-tailed, double-stranded cDNA were proposed proteolytic cleavage sites are described. as described by Maniatis et al. (10). To insert the oligo(dC)- tailed cDNA into pBR322, 25 ng of pBR322-oligo(dG) clon- Rhinoviruses are the most important ing vector was annealed with 13, 26, or 39 ng of oligo(dC)- known. The name "rhinovirus" reflects the prominent nasal tailed cDNA and used to transform calcium-treated Esche- involvement seen in with these viruses. Human richia coli strain RRI (10). The transformants were selected rhinoviruses (HRVs) are members of the fam- on tetracycline-containing agar plates. ily, which also contains the entero- (polio, coxsackie), car- Characterization of Selected Clones. Colonies (354) were dio- [encephalomyocarditis (EMC), Mengo], and aphthovir- picked and each was used to inoculate 2 ml of L broth con- uses [foot-and-mouth disease virus (FMDV)] (1). As many as taining 12 mg of Na2HPO4, 6 mg of KH2PO4, 1 mg of NaCl, 2 115 HRV serotypes have been identified (2). Similar to other mg of NH4Cl, and 4 mg ofglucose. The cultures were shaken picornaviruses, the rhinoviruses contain a single-stranded at 37°C for 16 hr, then 1.5 ml of each culture was pelleted by RNA genome that serves as a monocistronic mRNA for the centrifugation. Plasmid DNA was isolated by using the mini- synthesis of all viral structural and non-structural proteins. prep protocol of Holmes and Quigley (11). The resulting Studies with poliovirus have indicated that a (VPg) is plasmid DNA preparations were each suspended in 50 ,ul of attached by a tyrosine-04-phosphodiester bond to the 5' pUp H20, 10 ,u1 of each was digested with the restriction endonu- of the RNA genome. In addition, all the picornaviruses con- clease Pst I, and the size of its cDNA insert was determined tain a short poly(A) stretch at the 3' end of their genomic by electrophoresis in 1% agarose gel. Clones containing in- RNA (3). A variety of experimentql results have suggested serts of at least 1500 base pairs (bp) were mapped by colony- that the genomic RNA of picornaviruses is translated into a hybridization studies (12). single polypeptide from which functional viral proteins are Construction of Deletion Subclones. Deletion clones were derived by proteolysis. constructed as described by Hong (13). Briefly, 1 ,tg of iso- The fact that the RNA genome of picornaviruses is infec- lated cDNA insert from either clone 7 or clone 186 was ligat- tious, illustrates the importance of knowing the sequence of ed to 2.8 ,ug of linearized, Pst I-digested pUC9 DNA (10). E. the RNA genome. The complete nucleotide sequences of the coli strain HB101 was transformed with each of the ligation genomic RNAs of polio, EMC, and FMD viruses recently mixtures and transformants were selected on agar plates have been determined (4-7). We now report the sequence of containing ampicillin at 100 ,g/ml. pUC9 plasmids contain- the RNA genome of a single HRV serotype. ing clone-7 or clone-186 inserts were linearized with DNase I and digested with Sal I, and the resulting overhanging ends MATERIALS AND METHODS were filled in with DNA polymerase (10). After blunt-end ligation, the DNA was used to transform E. coli strain Virus Growth and Purification. HRV type 14 (HRV-14) HB101 cells. Transformants were screened for size of in- was obtained from the American Type Culture Collection serts, and clones were selected that represented a distribu- and plaque-purified by standard techniques. The growth and tion of sizes. purification of HRVs have been described recently (8). Nucleic Acid Sequencing of cDNA Inserts. Digestion of Genomic RNA Isolation. Purified HRV-14 was digested clones 7, 57, and 186 and their subsets of deletion clones with proteinase K (0.5 mg/ml) in 10 mM Tris Cl/1 mM with various commercially available restriction EDTA/0.3% NaDodSO4, pH 7.5, for 30 min at 37°C. Viral generated a set of overlapping DNA fragments. These frag- genomic RNA was separated from digested protein by oli- ments were 32P-labeled as described (10). Labeled DNA go(dT)-cellulose chromatography (9). After precipitation fragments were asymmetrically cut with a second restriction with ethanol, the RNA was suspended in 0.3 ml of 10 mM to generate fragments with only one radioactively

The publication costs of this article were defrayed in part by page charge Abbreviations: HRV, human rhinovirus; EMCV, encephalomyocar- payment. This article must therefore be hereby marked "advertisement" ditis virus; FMDV, foot-and-mouth disease virus; PV-1, poliovirus in accordance with 18 U.S.C. §1734 solely to indicate this fact. type 1; bp, base pair(s). 732 Downloaded by guest on September 30, 2021 Biochemistry: Callahan et aL Proc. Natl. Acad. Sci. USA 82 (1985) 733 labeled end. After isolation by electrophoresis in 5% acryla- RNA. Sequence data (see below) indicated that clone 198 mide/bisacrylylcystamine gels (14), labeled DNA fragments represents a sequence starting at genomic RNA nucleotide were sequenced by using the chemical methods described by 13. Maxam and Gilbert (15). Alternatively, synthetic deoxynu- Construction of Deletion Clones. To aid in the rapid se- cleotide primers and Sanger dideoxy sequencing techniques quencing of cDNA clones 7 and 186, a subset of deletion (16) were used to complete the determination of the nucleo- clones representing various deletions from the 5' and 3' ends tide sequence. Both strands of cDNA were sequenced in of clones 186 and 7, respectively, were constructed. Inserts most instances, and all sequence reactions were done at least of clones 7 and 186 were put into pUC9 plasmids that had twice. A continuous DNA sequence of 7227 nucleotides was been linearized with DNase I and cut with the restriction obtained using the Intelligenetics software package. All 32p_ endonuclease Sal I. No Sal I or Pst I sites were found in the labeled triphosphates were purchased from Amersham. HRV-14 sequence. Following ligation and transformation of DNA-modifying enzymes and restriction endonucleases E. coli, clones were selected to give a set of clones that had were purchased from Boehringer Mannheim. inserts corresponding to various lengths of the original clones 7 and 186 (Fig. 1). RESULTS Nucleotide Sequence of the Rhinovirus Genome. The se- quence of cDNA clone inserts were determined predomi- Cloning of Rhinovirus Double-Stranded cDNA. Genomic nantly (>95%) by using the Maxam-Gilbert chemical se- RNA from plaque-purified HRV-14 was extracted and then quencing method (15). The 5' end of the RNA genome and purified by oligo(dT)-cellulose chromatography and sucrose regions lacking convenient restriction sites were sequenced gradient centrifugation. The 35S genomic RNA was used as by using dideoxy sequencing methods (16) in which synthet- a template for the synthesis of cDNA, using reverse tran- ic primers were synthesized from sequenced regions and scriptase and an oligo(dT) primer. Following second-strand used to prime cDNA synthesis on the genomic RNA tem- synthesis with DNA polymerase I, the cDNA was trimmed plate. Deletion clones were labeled at the EcoRI or BamHI with nuclease S1 and tailed with oligo(dC) prior to insertion sites of the pUC9 polylinker and sequenced by using the into the Pst I site of oligo(dG)-tailed pBR322. The newly Maxam-Gilbert method. The first 6 nucleotides at the 5' end constructed plasmids were then used to transform compe- were confirmed by sequential single-nucleotide extension of tent E. coli RRI cells. Each resulting cDNA clone was char- a 32P-labeled synthetic oligodeoxynucleotide representing acterized for the size of its cDNA insert. Clones containing genome nucleotides 7-27. The results of this experiment inserts of >1500 bp were analyzed by restriction endonucle- confirmed the 5' sequence U-U-A-A-A-A for the HRV-14 ase mapping and colony hybridization to determine their re- RNA genome, since addition of only dTTP extended the lationship to each other. Clones 7 (3230 bp), 57 (1910 bp), primer by 4 nucleotides and addition of dATP and dTTP ex- and 186 (3223 bp) were found to overlap and represent a tended the primer by 6 nucleotides, whereas addition of 6784-nucleotide segment of the genomic RNA. Clone 186, dCTP and dGTP had no effect (data not shown). The first 10 which contained a 3' poly(A) of 15 nucleotides hybridized to nucleotides of poliovirus type 1 (PV-1) and HRV-14 are iden- total cellular RNA from infected but not uninfected cells, tical. and thus contained viral-encoded sequences (data not The Intelligenetics software package was used to merge shown). The precise map location of each clone was deter- sequence gel readings into a single DNA sequence contain- mined from restriction digests and is shown in Fig. 1. Since ing 7227 nucleotides including the poly(A) tail (Fig. 2). Com- clone 7 did not appear to extend far enough to represent the puter-generated translation of the DNA sequence in all three 5'-terminal region, a synthetic primer complementary to vi- reading frames (Fig. 3) revealed a single long open reading ral genome nucleotides 864-878 was used to generate a new frame 6537 nucleotides long. The open reading frame initi- set of cDNA clones as above. A new cDNA clone designated ates at nucleotide 629, following four unused AUG codons in 198 was isolated, which overlapped cDNA clone 7 and ex- the same reading frame at nucleotide positions 59, 332, 395, tended 430 nucleotides toward the 5' end of the genome and 419. In addition, there are nine AUG codons in the other two reading frames prior to the AUG at position 629. 2 3 4 5 6 7 kb Base composition of the HRV-14 genomic RNA, minus | ~ ~~~ A A A 7 the poly(A), showed a high adenine content (32.1%), fol- I14 1 57 lowed by uracil (27.3%), guanine (20.5%) and cytosine ,A71 11861 (20.1%). Codon frequency analysis supported the high ade- a 195 nine content by showing a strong preference for adenosine A55 -H and uridine in the third position of codons. The triplet CGG 184 h6A|^571A53 does not occur at all, and GCG and UCG occur only two and A-I38 192 three times, respectively. The predicted polyprotein con- A47 ,_, _44 tains only 1.1% tryptophan, 2.0% cysteine, 2.4% histidine, A62 ___21_A_ and 2.4% methionine, but it is rich in leucine (8.9%) and thre- A13 a23 onine (8.0%). The computer-generated lengths and molecu- F-~~a9~~~~~~~~~~~~1 lar weights for all the viral proteins are listed in Table 1. This 3B assignment ofprotein map positions was done solely by anal- 1A lB IC ID 2A 2B 2C 3AI 3C 3D ogy to the PV-1 sequence. The postulated proteolytic cleav- ' ' ' la1 age sites (Table 2) must be considered tentative until protein sequences are obtained. FIG. 1. Mapping of HRV-14 cDNA clones to the viral RNA genome. The topmost line represents the HRV-14 RNA genome; numbers above the line indicate sequence positions in kilobases DISCUSSION (kb), and AAA, the poly(A) tail. The positions of cDNA clones 198, HRVs are the leading cause of the common cold in hu- 7, 57, and 186 were determined by restriction endonuclease map- mans ping. Deletion clones (designated by A) were constructed from (18). Although up to 115 antigenically distinct sero- cDNA clones 7 and 186 as described in Materials and Methods. The types have been identified, little is known at the molecular HRV-14 protein map is shown at the bottom and was determined by level about any of the viruses. A knowledge of the RNA analogy to poliovirus. Protein nomenclature is that of Rueckert and genome sequence is necessary for a full understanding of the Wimmer (17). functions of the encoded proteins involved in virus structure Downloaded by guest on September 30, 2021 734 Biochemistry: Callahan et aL Proc. NatL Acad. Sci. USA 82 (1985)

20 40 60 80 100 TTAAMACAGC GGATOGGTAT CCCACCATTC GACCCATTGG GTGTAGTACT 120 CTGGTACTAT GTACCTTTGT ACGCCTGTTT CTCCCCAACC ACCCTTCCTT AAAATTCCCA 140 160 180 200 220 240 260 280 CCCATGAMAC GTTAGMAGCT TGACATTAMA GTACMATAGG TGGCGCCATA TCCAATGGTG TCTATGTACA AGCACTTCTG TTTCCCAGGA GCGAGGTATA GGCTGTACCC 300 ACTGCCAAAA GCCTTTMACC GTTATCCGCC AACCMACTAC GTMACAGTTA GTACCATCTT GTTCTTGACT 320 340 360 380 400 420 440 460 GGACGTTCGA TCAGGTGGAT TTTCCCTCCA CTAGTTTGGT CGATGAOGGT AGGAATTCCC CACGGOTGAC CGTGTCCTAG CCTGCGTGGC GGCCMACCCA GCTTATGCTG GGACGCCCTT TTMOOGACAT GGTGTGAAGA CTCGCATGTG CTTGGTTGTG AGTCCTCCGG CCCCTGAATG 480 500 520 540 560 580 600 620 CGGCTAACCT TAACCCTAGA GCCTTATGCC ACGATCCAGT GGTTGTAAGG TCGTAATGAG CAATTCCGGG ACGGGACCGA CTACTTTGGG TGTCCGTGTT TCTCATTTTT CTTCATATTG TCTTATGGTC ACAGCATATA TATACATATA CTGTGATC 1A(VP4) 643 673 703 733 763 JATG 000 OCT CAG OTT TCT 404 CAl AAA AGT 004 TCT CAC GMA MT CMA MC ATT TTG ACC MAT GGA TCA MAT CAl ACT TTC ACA OTT ATA MAT TAC TAT OAT GCA GCA 407 ACA TCA TCA OCT GOT TCA TCA GAO MET Alo Oln Vol Ser MOG CMA CTG ATG CCA Gly Thr Oln Lye Ser Gly Ser Hio Glu Asn Oln Ass Ile Lou Thr Ass Gly Ser Asn Oln Thr Phse Thr Vol Ile Ass Tyr Tyr Lye Asp Alo Alo Ser Thr Ser Ser Ala 01y Ole Ser Leou Ser MET Asp Pro 793 823 13VP2) 853 883 913 TCT MOG TTT ACA GMA CCA OTT AAA OAT CTC ATG CTT MOG GT GCA CCA GCA TTG MATIT1CA CCC MAT OTT GAO 0CC TOT GOT TAT AlT OAT AlA GTA CMA CMA ATC ACA CTC 000 MAT TCA ACA ATA ACA ACA CMA GMA GCA 0CC MAC OCT See Lys Phe Thr Glu Pro Vol Lye Asp Leou MOT Leou Lye Giy Alo Pro Alo Leou Ase 5r Pro Asn Vol Gbu Ala Cys Gly Tyr 8er Asp Arg Vol Ole Ole Ile Thr Leou Gy Ass Ser Thr Ile Thr Thr Ole Glb Alo Ala Ass Alo 943 973 1003 1033 1063 OTT ITO TOT TAT OCT GMA TOO COA GAO TAO OTT CCA OAT ITO GAC OCT AlT OAT ITO MAT AMA ACT TCA AAA CCA GAC ACT TCT GTC TOT AGO TTT TAC ACA OAT AlT 404 TOO ACA ACA GOT TCT GOC TGO TOO TOO Vol Vol AlaoGlb OTT MIG MAA MAA Cys Tyr Trp Pro Glb Tyr Leou Pro Asp Vol Asp Alo Sep Asp Vol Ass Lye Thr See Lye Pro Asp Thr Ser Vol Cys Arg Phe Tyr Thr Leou Asp Sep Lys Thr Trp Thr Thr Gly Ser Lye Gly Trp Cys Trp Lys 1093 1023 1153 1183 1213 TTA CCA OAT GCA CTC MOG OAT ATG GOT ITO TTC 000 CMA MC ATl TTT TTC CAC TCA CTA GOA AlA TCA GOT TAC ACA GTA CAC OTT CAl TOO MAT ICC ACA MAA TTC OAT AGO GOT TOT CTA OTT GTA OTT GTA ATA CCA GMA CAC CMA Leo Pro Asp Alo Leou Lye Asp MET Gly Vsl Phe Gly Ole Ass SET Phe Phe Hle 5cr Lou Gly Arg Ser Gly Tyr Thr Vol His Vol Ole Cys Ass Alo Thr Lye Pbs His Ser Gly Cys Leou Leou Vol Vol Vol Iles Pro Glb His Ole 1243 1273 1303 1333 1363 CTG OCT TCA OAT GAO 000 GOT M~f OTT TCA OTT MAA TAO ACA TTC ACl OAT 004 GOT GMA COT GOT ATA OAT TTA TCA TOT 004 MAT GMA ITO GGA 000 OCT ITO MOG OAT 070 ATA TAO MAT ATG MAT GOT ACT TTA TTA GGA MAT CTG Leou Alo Ser His Glb Gly Gly Ass Vol Sep Vol Lye Tyr Thr Phs Thr His Pro Gly Glb Arg Gly Ile Asp Leou Ser See Ala Ass Glb Vol Gly Gly Pro Vol Lye Asp Vol Ole Tyr Ass SET Ass Gly Thr Leou Leou Gy Ass Leou 1393 1423 1453 0483 1513 CTC ATT TTC OCT 040 040 TTC ATT MAT CTA 404 400 MAT MAT 404 000 404 ATA ITO ATA 004 TAO ATA MOC TOA 074 CCC ATT OAT TCA ATG 404 COT 040 MOC MT TCA 070 470 ITO 470 OCT ATT 000 COT OTT 404 004 Lso Ile Pbs Pro His Ole Pbs GTC~ GTA Ile Ass Leou Arg Thr Ass Ass Thr Alo Tbr Ile Vol Ole Pro Tyr Ile Ass Ser Vol Pro Ile Asp Sep SET Thr Arg His Aoe Ass Vol SeP Leou SOT Vol Ole Pro Ile Ala Pro Leou Thr Vol Pro 1543 1573 1603 1C(VP3) 1633 1663 ACT 004 004 ACT COO TCA CTC COT ATA 404 ITO 404 ATA 004 COT ATG TOO 407 040 TTC TOT GOG ATA AGO TOO MOG TCA ATT ITO CCA TTG 004 ACT 404 ACT 000 004 400 404 OAT GAO Thr Alo Thr Pro ('+G TTG GGG TCA CMA TTC TTG AGO Gly Ser Leou Pro Ile Thr Vol Thr Ile Alo Pro SET Cys Thr Glu Pbs Sep Gly Ile Arg Ser Lye Ocr Ile Vol Pro GIn Gly Leou Pro Thr Thr Thr Leo Pro Gly SeP Gly Ole Phe Leou Thr Thr Asp Asp Arg 1693 1723 1753 1783 1813 CMA TOO COO AlT 004 070 004 MAT TAT 040 004 A07 004 404 474 040 474 074 GGG MAA OTT OAT MAC TTG 074 GAM ATT 474 040 OTT GAT 404 070 ATT 007 470 MO 400 CAT 404 047 040 OTT MO AlT 740 070 474 Ole MAC MAA Ser Pro Ser Ala Leou Pro Ass Tyr Glu Pro The Pro Arg Ile His Ile Leou ly Lye Vol Hle Ass Leou Leou lb Ile Ilie lie V AsiGp Thr Leou Ile Pro M07 Ass Ass Thr His Thr Lys Asp Glb Vol Ass Ser Tyr Leou Ile 1843 1873 1903 1933 1963 004 074 MAT 004 MAC 400 CMA MT 040 040 OT TT77 GGG 404 MOC 070 TTT 477 GOT OAT GIG ITO TTC MAA ACT ACT OTT 070 GOT GMA OTT OTT 040 740 TAT 404 047 TGO 707 004 704 OTT 404 770 TOT 700 470 747 A07 GOT Pro Leou Ass Ala Ass Arg lie Ass Glb lbs Vol Phs Gly The Ass Leou Phe Ile Gly Asp Gly Vol Pbs Lye Thr Thr Leou Leou Gly Glb lle Vol Ole Tyr Tyr The His Tsp See Gly See Leou Arg Pbs Ser Sos SET Tyr The Gly 1993 2023 2053 2083 2113 007 000 TTG TOO 407 GOT MAA 070 A07 074 004 TAO 400 000 007 007 OCT 007 007 004 040 040 400 404 GMA 004 470 074 GOT ACT OAT OTT ITO 700 OAT ATT GOT 070 CMA 700 400 474 074 470 404 474 004 TOO 404 704 Pro Ala Leou Sep Ser Ala Lye Leou The Leou Ala Tyr Thr Pro Pro Gly Ala Arg Gly Pro lie Asp Arg Arg Glb Ala SET Leo Gly Thr His Vol Vol Trp Asp Ile Gly Leou Ole Ser Tr Ile Vol MET Thr Ile Pro Trp Thr See 2143 2173 2203 2233 2263 GIG ITO CAl 777 404 TAT A07 047 004 OAT 404 TAO 400 407 GOT 000 707 074 704 707 700 747 CMA 407 707 OTT 474 OTT CCC 004 GMA 400 400 000 040 070 TAO 774 074 704 770 474 AlT 004 707 004 047 TTT MAG OTT Gly Vol Ole Pbs Arg Tyr Thr Asp Pro Asp Thr Tyr Thr Sep Ala Cly Phe Leou Ser Cys Tsp Tyr lie Thr Ser Leou Ile Leou Pro Pro Glb Thr Thr Gly lie Vol Tyr Leou Lou Sep Pbs Ile See Ala Cys Pro Asp Phe Lye Leou 2293 2323 I 1D(VP1) 2353 2383 2413 400 070 470 MAA 047 A07 CMA 407 470 704 040 407 077 004 070 ACT GAA$OOO 774 007 047 GMA TTA GM GAA 070 470 OTT GAO AMA 400 MAA 040 400 070 000 704 470 704 707 007 004 MAA 040 404 CM MAA 070 000 474 Arg Leou SET Lys Asp Thr lie Tbr Ile Ser Ole Thr Vol Ala Leou Thr Glb Gly Leou Gy Asp lb Leou Gy Glb Vol Ole Vol Gbu Lye Thr Lye lbs Thr Vol Ala See Ole Ser Ser Gly Pro Lye His Thr lie Lye Vol Pro Ile 2443 2473 2503 2533 2563 CTA A07 004 MAC GMA 404 000 000 404 470 007 077 OTT 004 704 GAO 400 474 GMA 400 404 A07 400 740 470 040 TTT MAT GOT 704 GMA ACT 047 074 GM TOO 707 TTG 007 007 004 007 707 070 047 074 ACT GMA 474 CM Leo Thr Ala Ass Glb Thr Gly Ala Thr SET Pro Vol Leou Pro Sep Asp Ser Ile Glb Thr Arg Thr Thr Tyr SET His Phs Ass Gly Sep Gbu Thr Asp Vol Gbu Cys Phs Leou Gly Arg Ala Ala Cys Gab His Vol Thr Glb Ile lie 2593 2623 2653 2683 2713 MO MAA 047 007 407 004 474 047 MAT 040 404 GM 004 MAA TTG OTT MT OAT TOG AMA 470 MC 070 700 400 OTT 070 CMA OTT 404 MI MAA 070 GM 070 770 407 747 077 400 TTT OAT 707 040 747 400 474 070 000 407 Ass Lye Asp Ala Thr Gly Ile Asp Asn His Arg Glb Ala Lye Leou Ps Ass Asp Trp Lye Ile Ace Leou See Sep Leo Vol Ole Leou Arg Lye Lye Leou lb Leou Pe Thr Tyr Vol Arg Phe Asp See Glb Tyr Tr Ile Leou Ala Thr 2743 2773 2803 2833 2863 004 707 CMA 007 04T 704 004 MC 747 704 AGO MAT TTG 070 070 CMA 000 470 747 OTT 004 047 007 000 CCI AMA TOO MAA 404 070 000 047 740 404 700 CMA 407 00 704 MC CCC AlT 074 770 770 MlG ITO GGG 047 404 Ale Ser Ole Pro Asp Ser Ala Ass Tyr See Sep Ass Leou Vol Gal lbs Ala SET Tyr Vol Pro His Gly Ala Pro Lye Ser Lys Arg Vol Gly Asp Tyr Thr Tsp lbn See Ala Ser Ass Pro Ser Vol Pbs Phe Lye Vol bly Asp Thr 2893 2923 2953 2983 3015 704 AGO 777 407 070 007 TA7 074 004 TTG 004 704 004 TAT MAT 707 777 TAT OAT 007 740 704 047 047 047 004 GMA A07 040 747 000 474 407 OTT 074 MAC 047 470 GOT 407 470 004 770 404 474 074 MT GMA OAT OAT See Arg Pbs Sep Vol Pro Tyr Vol Gly Lou Ala See Ala Tyr Ass Cys Pbe Tyr Asp Gly Tyr Ser His Asp Asp Ala Glb Thr lie Tyr Gly Ile Thr Vol Leou Ass Hle SET Gly Ser SET Ala Phs Arg Ile Vol Ass Glb His Asp 3043 3073 3103 3133 5163 GM 040 MAA ACT OTT 070 440 470 AlA 077 747 040 AGO 004 MlG 070 OTT GMA 004 700 ATT 004 404 004 000 AlA GOP 074 CCC 740 404 704 474 GGG 000 404 MAT TAT COT MAG MT 404 GMA 004 074 477 Ml MG AGO MAA blo His Lye The Leou Vol Lye Ole Aeg Vol Tyr His Arg Ala Lye Leo Vol Gbu Ala Trp Ile Pro Arg Ala Pro Arg Ala Leou Pro Tyr The See Ile Gly Arg Thr Ass Tyr Pro Lye Ass Thr Glb Pee Vol Ole Lye Lye Org Lye 31953 2A 3223 3253 3283 3313 007 040 477 AMATCOO 747900 074 004 007 400 740 007 GGG 477 TAT 404 704 MAT ITT MAA 474 470 MAT 740 040 770 470 404 004 GM 040 040 CAT MAT 070 474 004 000 TAT 004 MT 404 OAT 774 004 474 070 704 404 bly Asp Ile Lye See Tyr Gly Leou Gly Pro Aeg Tyr Gly Gly Ile Tyr The Gee Ace Vol Lye Ile SET Ass Tyr His Leou SET Thr Pro Glb Asp His His Ass Leo Ibs Ala Pro Tyr Pro Ass Arg Asp Leou Ale Ilie Vol Gee The 3343 3373 3403 3433 3463 004 004 OAT GOT 004 GMA 404 474 004 040 TOT MAC 007 404 704 007 007ACOA747 700 404 74774 AlA MAG TAT TAO 000 474 ATT TOO GMA MO CCC 400 AC 470 700 407 GMA 004 400 COT 747 740 004 407 404 707 CMA Cly bly His bly Ala Glb The Ile Pro His Cys Asn Arg The See bly Vol Tye Tyr See The Tyr Tyr Arg Lye Tyr Tyr Pro Ile Ile Cys Glb Lye Pro The Ass Ile Trp Ile blo bly See Pro Tyr Tyr Peo See Arg Phe lie 3493 3523 3553 3583 3613 004 004 070 470 MAA GIG OTT Ill CCI 004 040 074 004 GAO TOO GOT 000 477 770 404 700 474 CA7 007 CCC 477 004 070 774 404 007 GMA 007 407 004 747 077 TOT 077 007 GAO 474 004 040 070 GAG TOT 470 004 GAG Ala Gly Vol SET Lye Cly Vol Cly Pro Ala Clo Leou Gly Asp Cys Cly Gly Ilbe Leou Org Cys Ilie His Gly Pro bbs lp Leou Leou The Ala Glb Cly See Cly Tyr Vol Cys Phs Ala Asp Ile Arg lie Leou Co Cys Ilie Ala Glb 2B~ 3643 3675 3703 .3733 3763 GMA CA G 070 AlT OAT TAO 470 404 007 770 GOT 404 007 777 007 070 CC TC770 C0 GAO CMA 470 704 400 AMA ITO ACA GM 074 CMA GM 070 CCI MAA CAT 770 070 400 404 MAA 007 770 700 MAA ITO ITOC A47 0 Clo lie Cly Leo See Asp Tyr Ile The Gly Leou Gly Org Ala Phs Cly Vol Cly Phe The Asp lie Ile See The Lye Vol Thr Cia Leou Ole lb Vol Ala Lye Asp Phe Leo The The Lye Vol Leou See Lye Gal Vol Lye SET Vol 3793 3823 3853 3883 3913 2C 704 GOT 774 070 070 407 700 404 MAT OAT CAT GAO 770 070 407 077 400 CCC 407 074 004 074 077 004 707 OAT 004 707 COT 700 404 707 070 MC 470 740 477 700 MAA 040 777 040 ITO COT TAO 477 CMA AbA 04400 See Ala Leou Vol Ile Ile Cys Aeg Ace His Asp Asp Leou Vol Thr Gob The Ala The Leo Ala Leou Lcou ly Cys Asp Gly See Pro Trp Arg Phs Leou Lye MET Tyr Ole See Lye His Phs Ole Vol Pro Tyr Ile Glb Org lie Ala 3943 3973 4003 4033 4063 MAT OAT 004 700 070 AbA MCG 707 MT 047 004 707 MAT 007 000 MlG 004 770 CM 700 407 007 MT Ml 477 700 MAA 070 407 CM 700 474 MAA MO MAA 074 007 000 CM COO MAA GMAA 074MAAC 707 707 407 MAA 070 Ass Asp Gly Trp Phs Arg Lye Phe Ass Asp Ala Cys Ass Ala Ala Lye Gly Lou Clo Trp Ile Ala Ass Lye Ole See Lye Leou Ilbs lb Tp Ile Lye Asn Lye Gal Leou Pro lbs Ala Lye Glb Lye Leou lb Phe Cys See Lye Leou 4093 4123 4153 4103 4213 MA AAM 07704T 474 074 040 404 CMA ATA 400 400 470 OAT A70 700 MAT 004 40404MGC MAAA CIA lAO 040 070 070 MAT MAC 074 070 700 070 CM CM ATG 700 CMA MC 777 000 004 777 TA7 000 077 CM 704 MAA AGO Lye lie Leou Asp Ile Leou lb Arg lbs Ilie The The SET His Ile Scr Ass Pro The lie Clo Lye Arg Gbu lbs Leou Ps Ass Ass Gob Leou Tsp Lou GblbsSiET See Oln Lye Phe Ala Pro Phs Tyr Ala Gal Glb See Lye Org 4243 4273 4303 4333 4363 ATC AGOACMT070AMCAMC MAA 470 074 MAT TAT ATG CMA 707 MA ACTMTMAAACCM0400G A A G C GACAGGTTGAA0 7 7 470 047 007 404 000 007 70 GOT AMA TCA 774 ACA ACA TOO 407 070 004 007 bOA 477 bOA CMA 040 070 Ile Org Clo Leou Lye Ass Lye SET Vol Ass Tyr SET lie Phs Lye See Lye lie Arg The Glb Pro Vol Cys Vol Leou Ile His Gly Thr Pro Cly See Gly Lye See Leou The The See Ile Vol Cly Org Ala Ile Ala Clo His Phs 4393 4423 4453 4483 4513 MAT 704 004 074 747 704 007 CCA 004 047 CCC MAGCA04 077 047 007 TAT 040 CMA04CMGOATT 070 477 470 047 047 070 AAC CA MT CCA 047 004 040 047 474 AGO ATG 707 707 CMA ATG ITT TOT TCA ITO CAT 070 770 Ass See Ala Vol Tyr See Leou Pro Pro Asp Pro Lye His Phs Asp Cly Tyr lin lin lie Clo Vol Vol Ole MET Asp Asp Leou Ass lie Ace Pro Asp Cly lie Asp Ile See SET Phs Cys lie SET Vol See See Vol Asp Ps Leou 4543 4573 4603 4633 4663 007 004 470 GOT 407 774 047 MAC MC 000 470 774 770 ACC ACT MAT 777 007 070 000 700 404 MT TOT MOC ACA 074 AGO 000 004 404 470 070 MAT COT CM 007 770 070 400 AlA 707 007 777 040 074 047 ATA 707 770 Pro Pro SET Ala See Leou Asp Ass Lye Gly SET Leou Phs The See Ass Phe Vol Leou Ala See The Ass See Ass Chr Leou See Pro Pro The TIce Leou Ass Pro Clu Ala Leou Vol Arg Org Phe Cly Phe Asp Leo Asp Ibs Cys Leou 4693 4723 4753 4/OS 4813 OAT ACT ACC TAO 404 MlG MT CIA MAA 070 MAT 004 000 470 TCA ACC MCG ACA TOO MAA OAT TOO CAT CMA 004 TOT MAT 770TCMAAA 707 700 CCC CTA 070 707 000 AAA 007 007 400 770 074 040 AlA ACT ACC MAC 077 AGO His The The Tyr The Lye Ass Gly Lye Leou Ass Ala Cly SET See The Lye The Cys Lye Asp Cys His lbs Pro Ser Ass Phs Lye Lye Cys Cys Pro Leou Vol Cys Gly Lys Ala Ile Scr Leou Vol Asp Org The The Ass Gal Org 4843 4873 4903 3 A4933 4963 TAT 407 070 OAT CMA 070 070 ACl 007 407 474 407 047 070 MC AGO MAA 070 CMA 477 404 047 700 TA GMA ACA 070 077 0449004 CCA 070 TAT AAA CAT 774 040 07 OAT 007 700 MAC 404 004 007 TOOCM 707 070MTCA Tyr See Vol Asp lbs Leou Vol Thr Ala Ibs Ile See Asp Phe Lye See Lye SET lie Ile The Asp See Leou Co The Leou Phe lbs Cy Pro Gal Tyr Lye Asp Leo Clo Ile Asp Gal Cys Ass The Pro Pro See Glb Cys Ile Ass 4993 5023 5053 5083 5113 CAT TTA 070 MAA 707 074 G07A 0 CM 040GA 477 000 CMA T07 707 Ml MG MAG MAA 700 477 ATA 007 CMA 007 007 400 MO 074 CM AGO 007 470 MAT CMA 000 000 070 077 077 MT 007 077 070 470 777 070 007 404 074 Asp Leou Leou Lye See Vol Asp See lb Glb Ile Org Clo Tyr Cys Lye Lye Lye Lye Tp TIle Ile Pro Clo Ibs Pro The Ass Ile blo Org Ala SET Ass lbs Ala See SET Ile Ile Ass The Ile Leou SET Ps Vol See The Leou 5143 3B(VPu 5203 5233 5263 007 407 077 TAT 070 077 747 AMA 070 777 007 CMA 007 0409000COATATCTGTACCGCTACATAATAAACCCACTTOTGTMT0A04 TAM00M 0 0 0 7 000 000 007 077 070 04400 CCO MOACA A40CM 77700 074 700 070 774 400 MAA Ile Vol Vol Ibs Ala bbs Ole +3C, Cly Tyr Tyr Lye Leou Pe The Cly Pro Tyr See Cly Ass Pro Pro His Ass Lye Leou Lye Ala Pro The Leou Org Pro Vol Vol Gal Gle Cly Pro Ass The Gbu Phe Ala Leo See Leou Leo Org Lye 5293 5323 5353 5383 5413 MAC 074 470 00707 404AC 400 704 MlG 000 00007 404CCC 770 000 470 007CA 7 007 070 707 070 474AT CCC ACA 040 bOA 040 007 007 G07 G07 070 074 070 MT 007 040 AAA 077 404 007 MCG GOT MG TAO MAA 774 074 Ass Ile SET The Ile The The See Lye Cly Clo Phe The Cly Leou Gly Ile His Asp Org Vol Cys Vol Ole Pro The His Ala bbs Pro Gly Asp Asp GaS Leou Vol Ass Cly Gle Lye Ole Org Gal Lye Asp Lye Tyr Lye Leou GaS 5443 5473 5503. 5533 5563 G07 004 000 MAC 407 MT 070 040 077 004 070 070 ACT 774 007 004 MAT GMA AMA 770 404 047 470 400 004 777 470 TOOCM 007 070CMA 007 070 007G 00 40777T 070 070 007 700 M T MAC 777 000 MC 007 470 070 Asp Pro Clo Ass Ilie Ass Leo Clo Leo The Vol Leou The Leou Asp Org Ass Clo Lye Phe Org Asp Ile Org Gly Phe Ilbs See Glb 0sp Leou Glb ly Gob Asp Ala The Leou Gal Vol His See Ass Ass Pe The Ass The Ile Leou 5593 5623 5653 5683 5713 CMA 007 000 007 070 404 470 004 000 077 477 MT 070 007 000 400 000 007 MAC 404 470 077 007 TAT OAT TAT COO ACA AMA ACT GGG 000 707 000 007 070 070 707 007 A07 007 MCG 470 707 CIT 407 007 007 000 007 Gbu Vol Cly Pro Vol The SET Ala Cly Leou Ile Ass Leou See See The Pro The Ass Arg SET Ibs Arg Tyr Asp Tyr Ala The Lys The Cly lbs Cys Cly Cly Vol Leo Cys Ala The Gly Lye Ile Phe Cly Ile His Gal Cly Cly 5743 ~~~~~~5773 i3 5803 5033 5863 MAT 004 404 CM 000 707 704 GOT CM 007 AAA MAA 044 747 077 070 000 MA OaGC0 044 074 070 007 400 047 MlG 007 400 000 777 MC 474 MT 000 070 MAC 000 000AC007 M C704 AA 070 047 CCC 007 070 777 747 Ass Gly Org lbs Gly Phe See Aba lbs Leou Lye Lye Gbs Tyr Phe Vol Ilo Lye lbs Cly lbs Vol Ole Ala Akg His Lye Vol Arg Clo Phe Ass Ile Ass Pro Gal Ass The Ala The Lye See Lye Leo His Pro See Vol Phe Tyr 5893 5923 5953 5903 6013 047 007 707 004 007 040 MlGGACM CT GOT 070 070 407 000 MAT G47 000 A0007MGC IATT MAA 070 407CMGAA0C 070 770 707 Ml TAO C 000MG T ITO AlCM 000 407 070 077 070 007 070 000 047 707 Gob MAT ACMMAAT Asp Phe Pee Cly Asp Lye Glb Pro Ala Vol Leou See Asp Ass Asp Pro Org Leou 01 Vol Lye Leou The Clo See Leou Ps See Lye Tyr Lye Gly Ass yes Ass The Glu Pro The lb Ass SET Leou Vol Ala Gal Asp His Tyr 6043 6073 6103 6133 6163 004 000 CMA 074 074 704 074 OAT 470 CCC ACT 707 CMA OTT ACA CTA MAA CMA 000 770 747 GGA 074 047 00007CMAA 007 470 007 477 400 000 407 004 000 777 000 TOT 070 407 077 000 470 MAA MC A00 000 077 070 Ala GO1y Tie Leou Leou See Leou Asp Ile Pro The See Clo Leou Tr Leou Lye Glb Ala Leou Tyr Gly Vol Asp Gly Leou Co Pro bbs Asp Ile The The See Ala Cly Phe Pro Tyr Vol See Laou Cy Ile Lye Lye Org Asp Ile Leo 6193 6223 6253 6283 6313 MAT AAA TOG 400 040 040 404 CMA MlG 470 Ml 707 T07 070 000 Ml T47 000 077 040 070 007 070 007 ACA TAT OTT MCG OAT GM 774 000 407 007 040 MAA 070 000 774 000 MAA 407 404 070 OTT CM 000 700 407 770 Ass Lv- G/oIThe lie Asp The Clo Lye SET Lye Phe Tyr Leou Asp Lys Tyr Cly Ile Asp Leou Pro Leou Vol Thr Tyr Ole Lye Asp Clo Leou Org See Vol Asp Lye Vol Org Leou Gly Lye See Org Leou Ile Clo Ala See See Leou 6343 6373 .6403 6433 6463 MAT 0AT TCT OTT MO 470 404 470 MAA 074 CCC MO OTT TAO MAA 004 770 CA7 CMA MT CCC 007 077 070 ACT 004 700 004 ITO 007 707 047 007 007 070 777 700 707 070 470 007 700 074 070 CCC000 070 p 047GG 07000 Ass Asp1 Se Vol Ass SET Arg SET Lye Leou Cy Ass Leou Tyr Lye Ala Phe His lie Ass Pro Cly Vol Leou The Cly See Ala Vol Cly Cys Asp Pro Asp Vol Phe Trp See Vol Ole Pro Cys Leo SET Asp G1y His Leou SET Ala 6493 6523 6553 6583 6613 077 047 -00 70-T-70700TT70-00-0-0-7 7 TT0400M 0 7 0 M 7 0 0 0 0 07707007C 7 TTT GAT TAC TCT MAT TTT GAT GCC TCT TTG TCA CCA GTT TGG TTT GTC TGT CTA GAG MAG GTT TTG ACC MAG TTA GGC TTT GCA GGC TCT TCA TTA ATT CMA TCA ATT TGT AAT ACC CAT CAT ATC TTT AGG GAT GMA ATA TAT GTG GTT Phe Asp Tyr Ser Asn Phe Asp Ala Ser Leu Ser Pro Val Trp Phe Val Cys Leu Glu Lys Val Leu Thr Lys Leu Gly Phe Ala Gly Ser Ser Leu Ile Gln Ser Ile Cys Asn Thr His His Ile Phe Arg Asp Glu Ile Tyr Val Val 6643 6673 6703 6733 0 6763 GM GGT GGC ATG CCC TCA GGG TGT0TTCA GGA ACC AGC ATA TTC TCC ATG ATC CM C ATA ATC ATT AGG ACT TTG ATA TTA GAT GCA TAT MA GGA ATA GAT TTA GAC AAA CTT AAA ATC TTA GCT TAC GGT GAT GAT TTG ATT GTT Glu Gly Gly MET Pro Ser Gly Cys Ser Gly Thr Ser Ile Phe Asn Ser MET Ile Asn Asn Ile Ile Ile Arg Thr Leu Ile Leu Asp Ala Tyr Lys Gly Ile Asp Leu Asp Lys Leu Lys Ile Leu Ala Tyr Gly Asp Asp Leu Ile Val 6793 6823 6853 6883 6913 TCT TAT CCT TAT GM CTG GAT CCA CAA GTG TTG GCA ACT CTT GGT AM MT TAT GGA CTA ACC ATC ACA CCC CCA GAC TCT77 GM ACT TTT ACA A ATG ACA TGG GCM MC TTG ACA TTT TTA MG AGA TAC TTC MG CCT GAT CM Ser Tyr Pro Tyr Glu Leu Asp Pro Gln Val Leu Ala Thr Leu Gly Lys Asn Tyr Gly Leo Thr Ile Thr Pro Pro Asp Lys Ser Glu Thr Phe Thr Lys MET Thr Trp Glu Asn Leu Thr Phe Leu Lys Arg Tyr Phe Lys Pro Asp Gln 6943 6973 7003 7033 7063 CM TTT CCC TTT TTG GTT CAC CCA GTT ATG CCC ATG MM GAT ATA CAT GAG TCA ATC AGA TGG ACA MlG GAT CCT MA C ACA CAG GAT CAC GTC CGA TCA TTA TGC ATG TTA GCA TGG CAC TCA GGA GM MA GAG TAC MT GM TTC Gln Phe Pro Phe Lau Val His Pro Val MET Pro MET Lys Asp Ile His Glu Ser Ile Arg Trp Thr Lys Asp Pro Lys Asn Thr Gln Asp His Val Arg Ser Leu Cys MET Leu Ala Trp His Ser Gly Glu Lys Glu Tyr Asn Glu Phe 7093 7123 7153 7185 7205 0 7225 ATT CAG G ATC AGA ACT ACT GAC ATT GGA MA TGT CTA ATT CTC CCA GM TAC AGC GTA CTT AGG AGG CGC TGG 7TG GAC CTC TTT TAGGTTMCA ATATAGACAC TTMTTTGAG TAGMGTAGG AGTTTATMM AAAAA4AAA4 M Ile Gln Lys Ile Arg Thr Thr Asp Ile Gly Lys Cys Leu Ile Leu Pro Glu Tyr Ser Val Leu Arg Arg Arg Trp Leu Asp Leu Phe FIG. 2. Nucleotide and derived protein sequence of HRV-14. The complete nucleotide sequence of the HRV-14 RNA genome is presented. Putative cleavage sites (arrows) and assignment of viral proteins to specific locations are indicated. Downloaded by guest on September 30, 2021 Biochemistry: Callahan et aL Proc. Natl. Acad. Sci. USA 82 (1985) 735 2 3 4 5 6 7 kb Table 2. Predicted HRV-14 polyprotein cleavage sites

II I III II lIIIHi I1 11111111 11 1111 I IIIIhItIm I Ihhl II 111111 I111111 I M 11111 Iol 111111 Proteins Cleavage site POLYPROTEIN 1 2 11 iii H lA/lB -Pro-Ala-Leu-Asn/Ser-Pro-Asn-Val- 3 11111 1111111 11 IN 111111 I111 11111111 I OII I 111 111 111Ill11 IhII 1hfh 1I 1111110hl is111111111i lB/lC -Ile-Val-Pro-Gln/Gly-Leu-Gly-Asp- FIG. 3. Translation of HRV-14 genomic RNA. Computer trans- lC/lD -Ala-Leu-Thr-Glu/Gly-Leu-Gly-Asp- lation of the sequence in Fig. 2 was performed in all three reading frames, numbered 1-3 on the left. The vertical lines designate the occurrence of stop codons. kb, Kilobases. 1D/2A -Ile-Lys-Ser-Tyr/Gly-Leu-Gly-Pro- 2A/2B -Ala-Glu-Glu-Gln/Gly-Leu-Ser-Asp- and replication. In addition, determination of the genome se- quence allows definitive comparisons between the four fam- 2B/2C -Ile-Glu-Arg-Gln/Ala-Asn-Asp-Gly- ilies of picornaviruses to determine sequence conservation in viral proteins and evolutionary relationships between vi- 2C/3A -Thr-Leu-Phe-Gln/Gly-Pro-Val-Tyr- ruses. We have cloned and sequenced the entire RNA genome of 3A/3B -Ala-Gln-Thr-Gln/Gly-Pro-Tyr-Ser- HRV-14. Four cDNA clones, designated 198 (866 bp, nucle- otides 13-878), 7 (3231 bp, nucleotides 443-3673), 57 (1911 3B/3C -Val-Val-Val-Gln/Gly-Pro-Asn-Thr- bp, nucleotides 2624-4534), and 186 (3224 bp, nucleotides 4004-7227), were isolated which overlapped and represented 3C/3D -Val-Glu-Lys-Gln/Gly-Gln-Val-Ile- 99.8% of the entire genome sequence. Random deletion of the 3' and 5' ends of cDNA clones 7 and 186, respectively, yielded a subset of clones used to facilitate rapid sequencing sequence identical to that of PV-1 1A at positions 1-6 and at of the cDNA. The map positions of all these clones are 13 of the first 17 amino acids. shown in Fig. 1. Alignment of the HRV-14 5' noncoding sequence to the Translation of the cDNA in all three reading frames shows first 600 bases of the PV-1 sequence shows an RNA homolo- a single large open reading frame 6637 nucleotides long with gy in the range of 75%, depending on how the alignment is initiation 34 codons downstream at position 629. This would done. Interestingly, the major difference between them oc- predict a 5' 628-nucleotide nontranslated region, which con- curs just 5' of the proposed initiation codon where HRV-14 tains four unused AUG codons upstream and in the same is missing 114 bases that are present in the PV-1 5' noncod- reading frame. The four unused AUG codons would initiate ing region. This is the precise location where each of the peptides with amino acid lengths of only 2, 8, 10, and 24. different poliovirus serotypes differs dramatically (20). The Analysis of the other two reading frames in the 5' noncoding function of this region in picornaviruses remains unknown. region shows nine additional initiation codons. Six of these In addition, computer programs predict a stem and loop would result in proteins <29 amino acids long. The remain- structure at the 5' end of the HRV-14 genome sequence that ing three (nucleotide locations 496, 526, and 595) fall within is very similar to that found in PV-1 (4). the first reading frame (Fig. 3) and could initiate an amino Protein-to-protein comparisons between HRV-14, PV-1, acid sequence of up to 77 amino acids, depending on which EMCV, and FMDV sequences show that PV-1 and HRV-14 AUG was utilized as a start codon. Although these starts are closely related, whereas sequence homology to EMCV cannot be ruled out completely, they seem highly unlikely or FMDV is not apparent. The map positions of the viral for the following reasons: First, they would result in relative- proteins were determined solely by analogy to PV-1, since ly short peptides; second, they lack efficient flanking se- no amino acid sequence was determined directly from isolat- quences for ribosome recognition as described by Kozak ed proteins. However, based on the strong homology with (19), which are present around the AUG at position 629; PV-1 proteins and similarity in the predicted size of the third, comparative studies among the three serotypes of po- HRV-14 viral proteins, the assignment of amino acid se- liovirus demonstrated that the AUG codons found in similar quence to proteins appears to be correct. The general geno- positions in the 5' noncoding region are not conserved, in mic organization is similar to those of other picornaviruses. contrast to the strong homology found within the true coding Protein nomenclature used is the 4-3-4 system recently de- sequence (20). In addition, comparison of the 1A (first cap- scribed by Rueckert and Wimmer (17). Comparisons of ho- sid protein to be translated) sequences of HRV-14 and PV-1 mologies for non-structural proteins between HRV-14 and shows that initiation at position 629 of HRV-14 gives a 1A PV-1 ranged from 44% in the protease gene 3C to 65% in the replicase gene 3D. Recent studies have postulated that the Table 1. Size of predicted HRV-14 proteins protease of picornaviruses is a cysteine protease with the Number of cysteine at position 146 of the PV-1 3C within the active site Protein Nucleotides amino acids Predicted M, (21). The protease gene of HRV-14 encodes the same con- Polyprotein 629-7165 2178 242,593 served amino acid sequence (Gly-Gln-Cys-Gly-Gly-Val at 1A (VP4) 629-835 69 7,310 residues 154-159) that is found in PV-1. A similar sequence 1B (VP2) 836-1621 262 28,506 (Gly-Trp-Cys-Gly-Ser-Ala) is found in EMCV protease (6). 1C (VP3) 1622-2329 236 26,198 Comparison ofthe four structural proteins gave interesting 1D (VP1) 2330-3196 289 32,384 results. 1A (VP4), which is an internal structural protein, and 2A 3197-3634 146 16,090 1B (VP2) show a 60% amino acid homology to 1A and 1B of 2B 3635-3925 97 10,7% PV-1. Although 1A shows conserved sequences throughout 2C 3926-4915 330 37,400 its length, 1B contains a very different region (amino acids 3A 4916-5170 85 9,751 134-184), which corresponds to a PV-1 neutralization epi- 3B (VPg) 5171-5239 23 2,470 tope (22) and is probably the same in HRV-14. Comparisons 3C 5240-5785 182 19,999 of 1D (VP1) and 1C (VP3), which contain most of the 3D 5786-7165 460 52,217 immunodominant neutralizing of PV-1, show only 47% and 44% homology, respectively. Comparisons of these Downloaded by guest on September 30, 2021 736 Biochemistry: Callahan et al. Proc. NatL Acad ScL USA 82 (1985)

sequences have also allowed us to predict possible neutraliz- We thank Nancy Dunn and Peter Kniskern for synthesis of DNA ing sites. For example, the amino acid sequences 62-95 of 1C primers and thank Ann Palmenberg (University of Wisconsin) for (VP3) and 91-95 of 1D (VP1) are unique to HRV-14 and map helpful discussions. to known neutralizing sites in PV-1 (ref. 23; E. Emini, per- sonal communication). No open reading frame large enough to code for the leader proteins found in EMCV is 1. Cooper, P. D., Agol, V. I., Bachrach, H. L., Brown, F., and FMDV Ghendon, Y., Gibbs, A. J., Gillespie, J. H., Lonberg-Holm, found 5' of the amino-terminal methionine codon of 1A K., Mandel, B., Melnick, J. L., Mohanty, S. B., Povey, (VP4) in HRV-14. R. C., Rueckert, R. R., Schaffer, F. L. & Tyrrell, D. A. J. The viral-encoded protease of HRV-14, like those of other (1978) Intervirology 10, 165-180. picornaviruses, is presumably crucial for proper proteolytic 2. Melnick, J. L. (1980) Prog. Med. Virol. 26, 214-232. cleavage of the viral polyprotein and for effective replication 3. Agol, V. I. (1980) Prog. Med. Virol. 26, 119-157. of the virus. The predicted proteolytic cleavage sites are 4. Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., shown in Table 2. Like other picornaviruses, there is a pre- Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, dominance of Gln-Gly sites (6 of 10) in addition to a Glu-Gly, J. J., van der Werf, S., Anderson, C. W. & Wimmer, E. (1981) Nature (London) 291, 547-553. a Tyr-Gly, an Asn-Ser, and a Gln-Ala. Eight of the cleavage 5. Racaniello, V. R. & Baltimore, D. (1981) Proc. Natl. Acad. dipeptides are identical in both sequence and location to Sci. USA 78, 4887-4891. those in PV-1. The Glu-Gly cleavage site between 1C and 1D 6. Palmenberg, A. C., Kirby, E. M., Janda, M. R., Drake, is not found in either PV-1 (4) or EMCV, but five are present N. L., Duke, G. M., Potratz, K. F. & Collett, M. S. (1984) in FMDV (7). Only one uncleaved Gln-Gly sequence exists Nucleic Acids Res. 12, 2969-2985. in the HRV-14 sequence (within the protease region), but 7. Carroll, A. R., Rowlands, D. T. & Clarke, B. E. (1984) Nucle- multiple uncleaved dipeptides identical to the other dipep- ic Acids Res. 12, 2461-2472. tide cleavage sites are found. The Gln-Ala cleavage site, if 8. Abraham, G. & Colonno, R. J. (1984) J. Virol. 51, 340-345. utilized, is unique to HRV and not found in PV-1, EMCV, or 9. Aviv, H. & Leder, P. (1972) Proc. NatI. Acad. Sci. USA 69, FMDV. As in other picornaviruses, the flanking amino acid 1408-1412. sequences around the cleavage sites do not fall into a specific 10. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular pattern but do contain a proline either amino- or carboxyl- Cloning: A Laboratory Manual (Cold Spring Harbor Labora- terminal to the cleaved amino acid pair in half the sites, tory, Cold Spring Harbor, NY). which is similar to the situation in EMCV (6). This lack of 11. Holmes, D. S. & Quigley, M. (1981) Anal. Biochem. 114, 193- uniformity of sequence surrounding the cleavage sites in all 197. four picornaviruses clearly indicates involvement of second- 12. Grunstein, M. & Hogness, D. S. (1975) Proc. Natl. Acad. Sci. ary or tertiary structure in the precursor molecule. USA 72, 3961-3965. The 3' noncoding region is 47 nucleotides long in HRV-14. 13. Hong, G. F. (1982) J. Mol. Biol. 158, 539-549. This compares to 72 in PV-1, 96 in FMDV, and 126 in EMCV 14. Hansen, J. N., Pheiffer, B. H. & Boehnert, J. A. (1980) Anal. (4-7). Both Biochem. 105, 192-201. HRV-14 and FMDV have single stop codons, 15. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. whereas PV-1 and EMCV have double stops. All four picor- USA 74, 560-564. naviruses lack the polyadenylylation signal sequence A-A- 16. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. U-A-A-A, which supports the finding that the poly(A) tail is Acad. Sci. USA 74, 5463-5467. transcribed from a minus-strand template. 17. Rueckert, R. R. & Wimmer, E. (1984) J. Virol. 50, 957-959. Further sequence comparisons to other HRV serotypes 18. Gwaltney, J. M. (1975) Yale J. Biol. Med. 48, 17-45. will help to generate a genomic map of common and unique 19. Kozak, M. (1981) Nucleic Acids Res. 9, 5233-5252. regions. This information will aid in mapping neutralization 20. Toyoda, H., Kohara, M., Kataoka, Y., Suganuma, T., Omata, sites, active sites of the protease and the replicase, and viral T., Imura, N. & Nomoto, A. (1984) J. Mol. Biol. 174, 561-585. receptor binding sites. Recent competition binding studies 21. Argos, P., Kamer, G., Nicklin, M. J. H. & Wimmer, E. (1984) have shown clearly that HRV-14 competes with the vast ma- Nucleic Acids Res. 12, 7251-7267. jority of other HRV serotypes for attachment to HeLa cell 22. Emini, E. A., Jameson, B. A. & Wimmer, E. (1984) J. Virol. receptors (8). Since this receptor is distinct from other picor- 52, 719-721. naviruses, determination of this highly conserved region in 23. Emini, E. A., Jameson, B. A. & Wimmer, E. (1983) Nature HRV serotypes will be of major importance. (London) 304, 699-703. Downloaded by guest on September 30, 2021