<<

Proc. Natl. Acad. Sci. USA Vol. 85, pp. 7872-7876, November 1988 Viral are homologous to the -like family of proteases: Structural and functional implications (/ prediction/sequence alignment) J. FERNANDO BAZAN*t AND ROBERT J. FLETTERICKt *Department of Biophysics, University of California, Berkeley, CA 94720; and tDepartment of Biochemistry and Biophysics, University of California, San Francisco, CA 94143 Communicated by William J. Rutter, June 28, 1988 (received for review April 8, 1988)

ABSTRACT Proteases that are encoded by animal picor- into a single large polyprotein (two in the case of the naviruses and plant como- and potyviruses form a related segmented CPMV). The precursor is proteolytically pro- group of cysteine-active-center that are essential for cessed at preferred Gln-Gly and Tyr-Gly sites to release a maturation. We show that these proteins are homologous number of mature proteins needed for virus replication, to the family of trypsin-like serine proteases. In our model, the structure, and assembly (22). The 3c gene segment in picor- active-site of the trypsin catalytic triad, Ser-195, is naviruses encodes a Cys (on average 185 amino changed to a Cys residue in these viral proteases. The other two ) that is specific for the Gln-Gly cleavages (22, 23). A residues of the triad, His-57 and Asp-102, are otherwise second, N-terminal, 2a gene segment present only in rhino- absolutely conserved in all the viral protease sequences. Sec- and enterovirus members of the picornavirus family [exclud- ondary structure analysis of aligned sequences suggests the ing HAV (12)] encodes a smaller (average 150 amino acids) location of the component strands of the twin fl-barrel trypsin Cys protease that is similar in sequence to the 3c proteases fold in the viral proteases. Unexpectedly, the 2a and 3c in the vicinity of the putative Cys, but is specific subclasses of viral cysteine proteases are, respectively, homol- for the Tyr-Gly sites (22, 24). The three plant carry ogous to the small and large structural subclasses oftrypsin-like gene regions (termed 24K in CPMV and NIa in TEV-TVMV) serine proteases. This classification allows the molecular map- (19-22, 25, 26) that encode Cys proteases similar to the ping of residues from viral sequences onto related tertiary picornavirus 3c class. structures; we precisely identify amino acids that are strong The catalytic role of a C-terminal Cys in both 2a and 3c determinants of specificity for both small and large viral proteases has been inferred by the high degree of local cysteine proteases. primary sequence conservation (16-18, 22) and by site- specific mutagenesis experiments, which showed that replac- The naturally occurring proteases can be grouped into four ing the Cys inactivates the (27). Inactivation by the classes according to the prominent functional groups at their classical Cys protease inhibitors iodoacetamide, N-ethylma- active sites: Ser, Cys, Asp, and Zn (1). The Ser protease leimide, and para-chloromercuribenzoate (28) and the pro- trypsin and subclasses are distinguished by an tein cystatin (29) confirms this assignment. These results identical spatial arrangement of catalytic His, Asp, and Ser have prompted classification of these enzymes as another residues in radically different ,8/,3 (trypsin) and a//3 (subtil- Cys protease structural class, distinct from the family isin) protein scaffolds, an example of locally convergent (with which there is no discernable sequence similarity) (16- evolution (1, 2). Within the trypsin subclass of molecules, 18, 22). Our studies demonstrate that the tertiary fold ofthese divergent proteins exhibit widely varying degrees of se- to the of quence similarity; analysis of tertiary structures, however, viral Cys proteases is similar bilobal /-barrel motif reveals a high level of structural conservation (2, 3). This the trypsin family of Ser proteases (2), a structure that is precedence of tertiary form over primary sequence is ob- different from that of papain (1). Conserved His, Asp, and served in other structurally characterized families of proteins Cys residues equivalent to the trypsin catalytic triad (2) are such as the immunoglobulins (4) and globins (5). found at structurally similar positions in the picorna- and Cys-active-center viral proteases have been identified in plant viral proteases. Other conserved residues in an align- the sequenced genomes of four genera of the picornavirus ment of viral and cellular proteases fulfill essential structural family: the rhinoviruses [ rhinovirus strains 2-14 roles or contribute directly to the homologous catalytic and (HRV2-14)] (6, 7), the enteroviruses [human poliovirus -binding properties of the Ser proteases. (HPV1), echovirus strain 9 (EV9), coxsackievirus (CXV), bovine enterovirus (BEV), and hepatitis A virus (HAV)] (8- METHODS 12), cardioviruses [encephalomyocarditis virus (EMCV) and Theiler's murine encephalomyelitis virus (TMEV)] (13, 14), The viral protease sequences were selected from the pub- and aphthoviruses [foot-and-mouth disease virus (FMDV)] lished literature and the National Biomedical Research Foun- (15). Two different classes of plant viruses also encode Cys dation (NBRF) data (ref. 30, release 15.0). The cellular proteases that are homologous to the picornaviral proteases (16-18): a comovirus with a bipartite genome, cowpea mosaic Abbreviations: HRV2-14, human rhinovirus, strains 2-14; HPV1, virus (CPMV) (19), and two human poliovirus, strain Sabin vaccine P3/Leon/37; EV9, echovirus potyviruses with monopartite strain 9; CXV, coxsackievirus, B3 strain Nancy; BEV, bovine genomes, tobacco-etch virus (TEV) (20) and tobacco-vein- enterovirus; HAV, hepatitis-A virus; EMCV, encephalomyocarditis mottling virus (TVMV) (21). virus; TMEV, Theiler's murine encephalomyelitis virus; FMDV, The animal and plant viruses discussed above have in foot-and-mouth disease virus, strain 0[1]K; TVMV, tobacco-vein- common a positive-strand RNA genome that is translated mottling virus; TEV, tobacco etch virus; CPMV, cowpea , B genome segment; SAP, (strain V8) protease); SGT, Streptomyces griseus trypsin; TRP, trypsin; CHT, The publication costs of this article were defrayed in part by page charge ; ELA, ; SGPA and SGPB, Streptomyces payment. This article must therefore be hereby marked "advertisement" griseus protease A and B; ALP, Lysobacter enzymogenes a-lytic in accordance with 18 U.S.C. §1734 solely to indicate this fact. protease; NBRF, National Biomedical Research Foundation.

7872 Downloaded by guest on September 26, 2021 Biochemistry: Bazan and Fletterick Proc. Natl. Acad. Sci. USA 85 (1988) 7873

Ser protease sequences [trypsin (TRP), chymotrypsin RESULTS (CHT), and elastase (ELA)] are found in the alignments of Craik et al. (3). Several bacterial enzymes are considered: The viral Cys protease sequences were aligned in an effort to Streptomyces griseus trypsin (SGT) (31), Staphylococcus extend the similarities that were noted in the C-terminal third aureus protease (SAP) (32), Streptomyces griseus proteases of the proteins (16-18). The picorna- and plant virus se- A and B (SGPA and SGPB) (33), and Lysobacter enzymo- quences were aligned in order of their closer similarity until genes a-lytic protease (ALP) (33). Crystallographic coordi- all sequences had been incorporated into the profile (36). Two nates for representative members of the various cellular absolutely conserved residues in the N-terminal half of the protease families are available from the Protein Data Bank viral 3c proteases, His-40 and Asp-85, accompanied the (34). Tertiary structure modeling of the 2a-3c viral protease conserved C-terminal residues Cys-147 and His-161 (Fig. 1). folds based on the homologous known structures of small- A parallel profile ofthe smaller 2a proteases showed absolute large trypsin-like Ser proteases was carried out on an Evans conservation of His-20, Asp-38, and Cys-109 residues (Fig. and Sutherland PS330 graphics work station (linked to a VAX 1). Merging ofthe 3c and 2a alignments (introducing a number 8650 computer) with the Biosym INSIGHT software pack- of gaps in the shorter 2a sequences) showed that the three age. conserved 2a residues were equivalent to the conserved His- The University of Wisconsin Genetic Computer Group 40, Asp-85, and Cys-147 of the 3c proteases. (ref. 35; release 5.2) program PROFILE (36) was used in Profile searches of the NBRF data bank (ref. 30; release compiling the alignments of distantly related proteins. Se- 15.0) utilizing the 2a, 3c, and merged profiles directed quence templates that incorporated the disparate and con- attention to the trypsin-like Ser protease family. Notably, the served amino patterns of both viral and cellular prote- 2a profile retrieved small bacterial protease sequences, while ases were used to search the (approximately) 6800 sequences the 3c and merged profiles matched several large trypsin-like of the NBRF data base release 15.0 (30). The discriminatory Ser proteases, the bacterial SAP (32) scoring highest. Five of capability of a template was refined until it could consistently the 3c proteases (HRV2, HPV1, EV9, CXV, and BEV) (6- retrieve all known viral Cys protease and/or trypsin-like Ser 11) and the five available 2a sequences (6-8, 10, 11) were then protease sequences. GENALIGN (37) produced statistical used to generate pairwise alignment statistics with GENA- alignment scores of both pairwise and multiple matchings LIGN (37) in matches between themselves, a test group of based on the method of Needleman and Wunsch (38). trypsin-like Ser proteases (SAP, SGT, TRP, CHT, and ELA) Structural constraints were incorporated into the alignments (3, 31, 32) and respective randomized sequences. The 3c/2a by allowing gaps to be located only between elements of matches produced comparable identity scores (13 on aver- known or predicted secondary structure. The latter predic- age) to the large/small Ser protease pairings (average of 14) tions combined the Garnier-Robson algorithm (39) with (33). Viral-2a/small Ser protease alignments scored an aver- amphipathic a-strand search techniques (40) to best locate age of 13 identities, while the typical viral-3c/large Ser the twelve 8-strands of the trypsin-like Ser protease fold in protease alignments were segregated into matches with SAP the viral proteases and the bacterial SAP. Accurate location (scoring x15 identities) and all others (4 identities). This low of turns was facilitated by the MATCH algorithm (41). overall identity between viral and cellular proteases must be

Box 1 Box 2 Box 3 Box 4 -10 12 2 9 3 6 48 87 117 118 129 150 Viral 2a 11 ....- .... Cys .- I T~ Protease T124 Tyr Gly 2 0 3 8 C109 Gln Gly

Box I Box 2 Box 3 Box 4 ix2644 62 74 124 15415x 6 167 93 Small Ser Protease H 3 4 D6 4 s1 4 6 S162

Box I Box 2 Box 3 Box 4 1 32 51 83 93 130 155 156 166 183 Viral 3c Cys Protease HAL' At~~~~~~~~~~~~~~~~~~~~~~JI I j i

H D C V/S12 4 0 85 14 7 G1..nGGly Qin iy

Box 1 Box 2 Box 3 Box 4 16 49 69 100 110 177 203 208 218 245 Large I I I I l I t--717-771 Ir" bmw I Ser 9 MN,1 I I Protease L%*JXXISmm m. M

D S S214 57 102 195

FIG. 1. Schematic comparison of the polypeptide chains of typical viral Cys and cellular Ser proteases that are aligned in Fig. 2. Boxes 1- 4 correspond to sequence-similar regions ofthe chains that contribute to formation of the catalytic site or substrate binding pocket in determined tertiary structures. The different box-shading schemes distinguish sequence patterns of the two known structural subclasses of cellular trypsin-like Ser proteases (labeled small and large) homologous to the viral 2a and 3c Cys protease subclasses. Proteolytic processing sites are identified at the N and C termini of the viral proteases by vertical arrows. Cellular proteases have analogous N-terminal sites, as the enzymes are processed from longer, inactive precursors. The broken-outline box (numbered -10 to 1) at the N terminus of the 2a protease represents sequences N-terminal to the presumed Tyr-Gly cleavage site that display significant similarity to mature protein sequences in 3c and cellular proteases. The numbering of the aligned viral 2a-3c and small Ser protease sequences is absolute; the large Ser protease sequence is numbered according to the chymotrypsinogen scheme (3). H20, His-20; D38, Asp-38, etc. Downloaded by guest on September 26, 2021 7874 Biochemistry: Bazan and Fletterick Proc. Natl. Acad. Sci. USA 85 (1988)

qualified by the realization that the few conserved residues considering sequence/structure relationships in widely di- are known to play essential structural roles in and vergent families of proteins. We sought to distinguish substrate binding for trypsin-like Ser proteases. In particular, whether the low similarity that is observed in viral/cellular the three absolutely conserved residues in the viral 2a/3c sequence comparisons was indeed indicative of structural merged alignment (His-20/40, Asp-38/85, Cys-109/147; see similarity and evolutionary relationship or suggestive merely Fig. 1) superimposed on His-57, Asp-102, Ser-195 of the of analogous structure. To this effect, available structural trypsin-like Ser protease catalytic triad (Fig. 1) (2, 3). No information [from Ser protease crystal structures (2) and viral significant alignment was possible with the active site triad protease secondary structure predictions (39-41)] was incor- (Asp-32, His-64, Ser-220) of subtilisin-like Ser proteases (1). porated into the viral/cellular protease alignments in an effort The lead of Taylor (4) and Bashford et al. (5) is followed in to refine the profiles and increase their ability to retrieve

a Vla, a I 0 2 0 30 4 0 5 0 so0 CPMV O24K 01SS VAIMSKCRANLOVFGT NLOIVMVPGRR. IF, LACKHFFT IK.. TKL RVEI VMDtRRRYYHOFDPANI Y. D T E V NIII a TI GMAVPV AYDOL PPKNEDLTFEGESL F. KGPRDYNPI SSTI HLTNESD TT. SLYGI GFGPFII ITNKHL FRRNNOTL LVO T V M V NI a T/GPTETL...PFDALPPEKOEVAFESKALL KGVRDFNPISACV LLLENSSD SE.. RLFGI GFGPYIIIANOHL FRRNNGEL TIK F MiDV_ E/ISAPPTDLOKMVMGNTKPVELILDGKT VAICCATGVF... GTAYLVP LF. AEKY KI MVD FRA MT DS DY RV FE F. E T ttE V O AGNPVMUD FELFCAKNI VAPI TFYYP.....DKAEVTOSCLLL. GAHLFVVN VA. ETDWTAFKLKDV VRHERHTVALN .... SV E MCV 3o O/GPNPVMD..FEKrVAKHVTAPIGFVYP ...... TGVSTOTCLLV.. ON:TLVONK MA...ESDNTSI VVNSOG VTHARSTVKI L .. Al HAV O/ S T L . E GALVRKNL VMOAFGVGEKNGC .. VRWVMN L G V K .. D DWLLVPS AYKF. EKDYEMMEFYFNNRGGTYYSI SAGNVVI B E V 3o O/ GPL ...... FDF VSLLKKNI RTVKTG ...... AGEFTALGVY DTVVVLPR AM...PGK TIEMNOKK DI E V L D A Y D L N C X V 3_ 01/GP FEFAVAMMKRNSSTVKTE YGEFTMLGI Y. DRWAVLPR A K .....PGPT L NDCOEVGVLDAKEL...V OEVGVLDAKEL ..... V EVe GPA FEFAVAMMKRNASTVKTE YGEFTLGIY DRWAVLPR AK.....PGPSI LMNDIO K L D A L E H P V I 3o O/ GPO F DYAVAMAKRNI VTATTOS KGEFTML GOVH DNVAI L PT AS . PGESI VI D K EVE EA HRVt 4 3_ 0/ GPN ..... TEFALSLLRKNI MTI TTS ...... K EFTGLI DRVCVI PT AO. PGDDVLVNGO OK IRVK DKYK L ...... V H RV 2 _o 0/ GPE ...... EEFGMSLI KHNSCVI TTE ...... NGKFT LGVY ... D RF VV PT AD. P OK E OV DD I ITT K VI DS Y D L V..Y Co p .I a v a zkzr n p v & .1k g g y dVP r a.. p g . s v v .g qq n v d I. S A P ..VI LPNNDRHOI TDTTNGHYAPVTYV OVEAP. TGTFI ASGVVV.. KDTLLTNK VVD. .ATHGDPHALKAFFPSAISDIN .O.DON.. S GT VVGG. TRAAOGEFPFMVRLSM GCGGALYA. ODI VLTAA CVSGSGNNTSI TAT GSG V VDL OSGS ...... T RP ..I.VGG YTCGANTVPYOVSLNS GYHFCGGSLIN.. SOWVVSAA CYKS .....I OVRLGCODNI NV V E...... C H T I VN G EEAVPGSWPWOVSLODKT...GFHFCGGSLIN.N ENWVVTAA CGVT .. .. TSDVVVAGEEF DOGSSSE ...... E L A .. VGG TEAORNSWPSOI SLOYRSGSSWANTCGGTLI R. ONWVMTAA CVDRE. LTFRVVVGEEHNL NO. NN...... Soo StI, b bbbbbbbb bbbbbb bb bbbbbbbb bbbbbbb bb bbbbbb St r *nd A B C D Chy a I 6 2 0 3 0 4 0o 7 0

9 0 00oo 1 101 2 0 IPDSELVLYSHPSLEDVSHS.... C 'A O L F CWDP DK ELPSV FGADFLSCKYOEFSSFYEAOYN ADIKVRTKXECLTIO T E V III a SL HGVF KVKNTT TL OOHL D GR O 11IRPK DFPPF OKLKFREPPOEEI CLVTT FOTKSMSS S T II a TM ..... G M A DFPPFP ...OKLKFROPTIKDNVCMVST NFOOKSVSSLVS V Lt V 4HGEFKVKNSTOLOMKPVE RDOI: IVIK A TKK. F MDV 3Ie KVKGODMLSDAALMVLHRGNNR.ANMIA...... V T FNDT TPVVGVINNADVGRLIFS ...... G EALTYKDI VVCM T M E V 3o NRSGAKTDLT F KVTKGPL ..... FE NV F CSN KDDFP ARNDTVTGI MNTGLAFVYS GNFLI GNOPVNT E MlC V _3 AKAGKETDVSFI RLSSGPL ..... FN NTSKFOKA TGAAPVTGI NTDI PMMYT GTFLKAGVSVPVE H A V Its SL DVGFODVVLMKVPT PK..... F D, TOHFI KK. R. ALNRLATLVTTVNGTPML ...... I SEGPLKMEEKA e E V Its DKTDTSLELTIVKLKMNEK ...... FN 'AM P D TDYN EAVVVVNTSYYPOLFTCVG RVK DYGFLNLAG C X V DK DGT NL ELT L LELNRNEK ...... F ...... EVEVN. EAVLAI NTSKFPNMYI PVG ...... OVTEYGFLNLGG _3 K E T LL L E K . tDIGGFVAKE D DO NL L ONOR E V S So DKDSINLELTLLKLNRNEK.... FR NGFLARE EVEVN EAVLAI NTSKFPNMYI PVG OVTDYGFLNLGG H PV I 3e DOAGTNLEITIITLKRNEK ...... FNRDI ROH PT ...... I TETN .. DGVLI VNTSKYPNMYVPVG ...... AVTEOOYLNLGG HRVI 4_ 3 DPENINLELTVLTLDRNEK...... F RGFI SED...... LEGVD. ATLVVHSNNFTNTI LEVG .. P.NPLSST A LI H RV 2 Its SKNOI KLEI TVL KL DRNEK ...... F A ODI R RYI N EDDYP NCNLALLANOPEPTI INVG DVVSYGNI LLSG Con d ks d g q * kx It n r n * It ...... r D Ir k .kq ... p v a v a 9 wgp p Iv g s n v 9 n 9 SSAP P YPNGGFTAEOI TKYSS...... EK LALVKFSP. NEONKH GEVVKPATMSNNAETOVNO TVTGYPGDKP S 0CT AVKVRSTKVLOAPGYNGT G KO WA LAIKO^IPIN.... OPTLKI A.. TTTAYNOSTFTVAGWGANRE. GGSOORYLLKANVPFVSD T RP .GNOOFISASKSIVHPSYNSLTI.. 00N D L I K L KS AASLNSNVASISLP..TSCASAGTOCLISGNSNTKSSGTSYPDVLKCLKAPILSN CH T .KIOKLKI AKVFKNSKYNSLTI.. N N O T L L K L T .A A S F SOT V S AV C L P S A S D D F A A GT T CV T T NWO L T RY T N A N T P D R L OO A S L P L L S N E LA .GT EOY VG VOKI V VHPY WNT DDVAA _ IOALLNRLA SVTL NSYVOL GVL PRAGT LANNSPCYI TGWGL TNTN. GOLAOTLOOAYL PT VDY SS5.a St bbbbb bb bb bbb bb b b b bbbbb bbbb bbbbbbb b SItraodst Stndr E a H

Cb y m a I 0 0 t 0 2 1 1 0 1 20 130 1 40 1 50 1 60 I 7 1 a 1 3 0 1 4 0 1 5 0 6O O O

C 24 SG NYV EAPTI PEDC GKI GCASLLPPLEPI AOAOI 0...... PMWV K fNKVSRYL EY ... SLVI AH G C IVGVHVAGIO0 00 T EVNI a DTSC...TFPS S DG0I FWKHW .I.. OTKD CSPLVSTNC D TT NNY F T SVP KN F MEL L TNO E A OO WV T V tV- III a ESSH OVH E DT S F WOHW. T T K D CSPLV S I CDG0 4I Lt NHSL THT TNGSNYFVEFPEK FVATYL DAADGWCK ...... F M DV3aS DTMPGLFAY RAAT KA Y CGGAVL AK GG A T GVGYCSCVSRSMLLKMKAHIDPEPHHEIG D...... D IV HTSAGGN T M E V 3c .ACFNCLH. .... NAOTN COSAO CNVNN G K K GL AAATt TKELI EAAEKSMLALEPO/ G E MCV 3 T...... OTFNHCI HY.. KANTRK CGCSALLADLGG G S K Lt 1HSAGSS GI. AAASI VSOEMI NAVVNAFEPOI G. H A V 3 TYOHEKK. NDSTT rvDLTVDOAW..RGKOGEGLP MCGIGALVSSNC 0S ON Lt |IHVAGG0 SI. LVAKLVTOEMFONI DKKI El S. BE V o3 TPTKVL0 .MMY EFPTKA CO4CVIGVI SM It VIVHVGGNC 50 FAASLLNNRYFTAEO/ CXV TPTK 'IML . FPTFtAG C450LM ST 0 KVLSI HVGGNC H 0 GFSAALLKLNFLK DEO .. TPTKNLMT. .... NFPTRA OCOSVLMSTO SVLGtIH0000 4HGF.SFSAALLRHYFNEEOI R H P V I _ 3o ROTARI LMY .. . N F P T A C IGVI T C T SH ...... GFAAALKRSYFTOSOG.O...G.. HRVI 4530 TPTNRMI RY . DYATKT CGSVLCATGT KV FG11HVGGNG RO GFSAOLKKOYFVEKOI O.l. H R V 2 _3 e NOTARMLKY . SY PT K S YCIOVLYKIG. OVLKV I St HVG1NOH NOGSG RD..GFSAMLLRSYFTDVOI Co 1...... 9q .. s a*v r v a " I.rcl0. pka Ggpls9lg. g 1 s N9 g n 9q y q S A rTLKGEAMOY. ... DLSTTG NSGSPVFNEKN WGGVP N E FNOAVF NENVRNFLKONI EDI HF A D EVGI11H S GT iAANEENEKICAGTPDTGGVDTCO DSSPVMFCKDIIS G P FtRK N D E V OV|I VSWGYG ..CARP. GYPOVYTEVSTFASAI ASAARTL.. T RP NDOTI TSNMFCAGYLO. GGKDSCM D S PVVCS. ..CAOK. NKPOVYTKVCNYVSWI KOTI ASN.. C H T TOCEK8. KDAMI CAGA. SGVSSCM DSCOPLVCKKNNG. AV TCSTS. TPOVYARVTALVNWVOOTLAAN. E LA Al CSSSSTNOSTS iKNSMVCAGGD. SVNG C C D OPLHCLVI N G. ON TV |IVTSFVSR LOCNVT. RKPTVFTRVSAYI SNI IASN.. ** b See-St r bbbbb b bbbb b b b b b b *b b b b b b a a a a a a a a a a a a a I J K IbL1

Ch Yon a 2 2 0 2 30 24 0 VI r al IF by0 m

10 1 10 20 Box 1 30 40 Box 2 50

0EVV 2a CSNRASLTSYYGPFGOOOGAAYV ...... GSYK IL L. TYADWEN ...... EVWOSY ..... ORDLLVTRVDAHC . DI RCN TTTROSITTMTNT.GAIWTTIRGSV ... CGDYRVVN SO TSADNON. CVWESTY ..... N LLVSTTTAHIC ...... DIIARCO H P V I 2a| LTPLSTKDLTTTOFGHONKAVYTT....AGYKtICOT LA. TODDLON ....AVNVMN ...ISRI0LLVTESRAOT SI ARNN HRV1 40 2a VILPSKGDI LGTTYGGHYT..VNVPLLVSFO|T0.GYKICNYLM.TPODDLN PRCN H R V2 _ 2a T A VT RPI T T A. .GPSDMYVHV..P.GSNMLI... YRNL LF. NSEMHE ...... S L OS Lt T N0T0V D ...... D Y PS C D C00 n 9 9 t .C aV y k . 9 s t a .d n v 9 y . t t a h d a , COOP A ..I AGGEAI TTG GSRCSLGFNVSVNVAHALTA CTS SSASW GTRTGTSFPNN oI NGRHSNPAAARVYTLYNGSYODI TTAGN ..I SGGDAI YSS ... TGRCSLGF VRS G STYYFLTA CTDGATGTW NSARTTVLGTTSGSSFPNN TD IVRTYT TTI PKDGTV.G ...0. TSAAN A L P AN IVGGI EYSI NN.. ASLCSVGFSVTRGATKGFVTAI CG. TVNATAR. GGAVVGTFAARVFPGN RAWVSLTSA TLLPRVAN. GSSFVT V RGSTE tr b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b A C 5t ra| a O E F So m 2In aEa0.2 5 12 27 34 37 47 57 62 64 72 a3 C h y m 10 30 40 50 S7 60 00 90 100 102 110 120

Vl r el s0 70 00 90 Box 100 110 120 ox 4 130 140 *I 01_I B E V_ 2a| CRSGIYYCKSTAKHYPIVVTPPSITYKIEANDYYPERMOTHILLGIGFAEPODCOOLLRCEH. V ILTOGGO HV..GFADVDLLWIE DDAME OIG. C XV 2az CNAGVYYCESRRKYYPVSFVGPTFOYMEANNYYPVRYOSMLI GHGFESPODCG LRCHH. GVIIITA GEGLV. AFSDI RDLYAYEEAMEOI G. HRV1 42 H R V 2 2a| CTOATTTCKHEONRFPITVTSHDWYEIOESEYYPKHIOYNLLIGEGPCEPODOCOOLLCKH.GVtOIOTAGGDNHV..AFIDLRHFHC.AEEOiG C00 n c t V V y,y c 9 r k y P a 9 t P n e n Y y P q y g m 9 n 9 e p DN1]G GiIj c h V G t 19g 9 gfad m a q ATVGMAVTRRGSTTGTHSGSVTALNAT V. NYGGGDVVTYGM IRTNVCAEPODSOGLLYST. RIGLLTSIGSGNCSS. GGTTFFOPVTEAL SVYGASVY L AAVGAAVC 00000P, AFVGOAVORSGSTTGLRSGSVTGLNATV..N0GSSGIVTGMIOTNVCAGP00DSSCSLFAGS.TALGLTSb Ibb bb bNSGRTTGb b bbTOCGTITAKNVT0A..b bbbb S.O bbbbST b b bIbO0SWISb b bIb bOO bb b bVMSGONVOSN.SORSSLFERLOPILSOTGLSLV b b b GSSSCRT.GGTTFYOPVTEALSATGATVLb Ibb b bbb bb b a a a a a a a S a H I __U~GJ .A .. K .. iLL I Sm IS I 9 6 10 1 14 1 24 1 34 t 3 9 1 46 t S I 50a 9 I100 19 1 Chy . 1 30 1 40 1 60 1 70 10a0 1 90 1 95 2 00 2 10 2 20 2 3 24 0

FIG. 2. Alignment of the viral Cys protease sequences with representative trypsin-like Ser protease sequences from and bacteria. The numbering schemes of Fig. 1 are followed. (a) Viral 3c/large Ser protease alignment. (b) Picornavirus 2a/small bacterial Ser protease alignment. Consensus (Cons) sequences highlight identical (uppercase) and most frequently occurring (lowercase) residues at any given position. The 12 core 3-strands (labeled A-L) and short C-terminal a-helix (labeled 1) characteristic of all trypsin-like Ser proteases are located under the profiles. Gaps in the alignments map to length- and sequence-variable surface loops. The similarity boxes (labeled 1-4) of Fig. 1 encase the "conserved" catalytic triads of His, Asp, and Ser/Cys residues (#) as well as the few other identical matches (boxed and in bold). Other chemically conserved residues are just in bold lettering. Notably, residues 189-190, 213-216, and 226 (labeled by @) are important in forming the substrate binding pocket in large Ser proteases. Residue 213 in the 2a proteases is not a conserved His as was the case in the 3c and SAP enzymes. The * in the ALP sequence marks a small insertion (3). Downloaded by guest on September 26, 2021 Biochemistry: Bazan and Fletterick Proc. Natl. Acad. Sci. USA 85 (1988) 7875 homologous sequences from the data bank. The chymotryp- 2a protease alignment can be accommodated by shortening sinogen numbering scheme (3) was adopted for all cellular loops in the three bacterial structures. We can identify and viral sequence referrals (Figs. 1 and 2). specificity-determining residues in the 2a proteases by com- The alignment in Fig. 2a illustrates the evolutionary rela- parison to the homologous SGPA, SGPB, and ALP struc- tionship between the 3c viral proteases and the large-trypsin tures. The substrate-binding pocket in these small bacterial family of proteins. While the overall degree of proteases is not well developed and can be better described identity is poor, blocks of significant similarity surround the as a groove on the surface of the enzyme (42). SGPA and three active site residues (boxes 1-4). These regions form SGPB are characterized as having a chymotryptic specificity either the hydrophobic core of the 8-barrels or the loops that for aromatic residues (42). The 2a proteases are similar to define the active site and binding pocket ofthe enzyme. Gaps SGPA and B in their specificity for Tyr at the S1 site (22). that must be introduced to align the shorter 3c proteases map to sequence and length-variable surface loops (3) connecting the 12 p-strands of trypsin. One loop stands out regarding DISCUSSION both conservation in length and acidic amino acid composi- Inspection of the blocks of sequence similarity in Figs. 1 and tion in both viral and cellular enzymes: residues 70-79 have 2 suggests that the 2a and 3c viral proteases are structurally been shown to bind calcium in eukaryotic trypsin molecules similar, from which we infer that they diverged from a (2). The proposed D-E loop in viral proteases may also bind c2 +. common ancestor. Their separate alignment with small bac- Ca2 terial and large trypsin-like proteases, respectively, serves to On the basis of known three-dimensional structures, we emphasize the parallel evolutionary relationship with the two can identify the residues that will form the specificity (Si) homologous classes of trypsin-like molecules. A correct pocket in the 3c proteases. In particular, residues 189-190 of alignment (as regards structure) between the 2a and 3c the large-trypsin class are located at the bottom of the S1 classes will be best attained by structural superposition of pocket and are a major determinant of specificity (2,3). These spatially equivalent residues that will require large gaps in the positions are typically Ala/Pro-Thr in the viral 3c proteases, smaller 2a sequences. In lieu of the actual crystal structures, similar to the Ser-Thr pairing of SAP (32). Residues 216 and this structural alignment is made possible by the separate 226 (which lie on opposite sides of the pocket) are well grouping of the 3c and 2a classes of viral proteases with the conserved in character between the viral and cellular en- large and small trypsin-like molecules (Figs. 1 and 2). We zymes. The striking differences between the viral 3c and observe that the 2a/3c viral proteases are both 35-45 amino trypsin-like enzymes (besides the active site nucleophile) acids shorter than their corresponding small/large cellular map to residues 213 and 215, respectively positioned at the homologs (Fig. 1). By conventional reasoning (42), the viral side and top of the specificity pocket (2). Residue 213 is molecules could be judged to antedate the cellular molecules typically a small hydrophobic amino acid and residue 215 a the greater economy of large aromatic amino acid in the sequenced trypsin homologs in evolutionary history. However, (3). The 3c proteases have instead a conserved His-213 and Ala/Gly at position 215. Molecular modeling of this pair of residues in the pocket of a trypsin-inhibitor complex struc- ture (2) reveals possible hydrogen-bonding interactions be- tween viral His-213, Thr-190, and the S1-bound Gln side chain (not shown). Accurate positioning of the His-213 ring may be made possible by the Tyr-228 -* Leu viral . We postulate that residues in this collection are the primary determinants of the Gln-Gly cleavage specificity of the 3c proteases. These amino acids are different in all known trypsin-like enzymes (3) with the exception of SAP (32), which has a homologous Thr-190/His-213/Gly-215/Val-228 complement of residues. This bacterial Ser protease recog- nizes a Glu in the S1 pocket (32). The 2a picornaviral proteases are aligned with three small trypsin-like Ser proteases of bacterial origin in Fig. 2b. Tertiary structures have been determined by x-ray crystal- lography for SGPA-B and ALP (33). These bacterial enzymes show clear core with the large trypsin-like class, differing significantly only in the economy and surface packing of the loops connecting the p-strands (42). The poor sequence identity (typically -15%) compounded with the differences in chain length made alignments between bacte- FIG. 3. Illustration of important residues in an alignment of small rial and pancreatic enzymes difficult prior to structural viral/cellular proteases mapped onto the a-carbon framework of determinations (33, 42). Fig. 2b shows that there is a SGPA (43). Residues of the catalytic triad (residues 57, 102, and 195), significant degree of sequence identity in the active site and conserved amino acids at positions 211, 214, 216, and 171 are far residue boxes between bacterial and viral 2a proteases (10 apart in sequence but contiguous in space. Residue 142 is a conserved identities in the alignment of 8 sequences). Identical residues Lys/Arg in the 2a proteases, and is located in a position to interact map to the interface between p-barrel domains in a small with the conserved Asp-102 in a manner analogous to the bacterial bacterial structure and are catalytically important (such as ion pair ofArg-138-Asp-102 (42). Loop residue 66 in SGPA is the site His-57 and Asp-102 accompanying Cys-195), structurally of a seven-amino acid insertion in SGPB. The corresponding D-E loop in viral 2a proteases is similar to that in SGPA. The E-F loop vital glycines (such as the tightly buried Gly-211, or Gly-216 is abbreviated in the viral proteases in a manner not seen in any with stressed dihedral angles), or particularly functional, as bacterial protein and requires the joining of marked residues 91-94. in the case of the buried Tyr-171 that hydrogen bonds to Residues 122-144 denote an abbreviated viral F-G loop (the hinge residue 214 (a bacterial Ser, viral Thr, that interacts with between p-barrel domains), hypervariable in sequence and length in Asp-102) (2). These residues are highlighted on the a-carbon known Ser enzymes. Both of these loops are spatially distant from frame ofthe SGPA crystal structure (43) in Fig. 3. Gaps in the the conserved active site in the viral 2a structural model. Downloaded by guest on September 26, 2021 7876 Biochemistry: Bazan and Fletterick Proc. Natl. Acad. Sci. USA 85 (1988) viral proteases mayjust be a reflection ofthe tightly budgeted 5. Bashford, D., Chothia, C. & Lesk, A. M. (1987) J. Mol. Biol. 196, 199- 216. viral genome. 6. Skern, T., Sommergruber, W., Blaas, D., Gruendler, P., Fraundorfer, F., Mechanistically the His-57/Asp-102/Cys-195 active site Pieler, C., Fogy, I. & Kuechler, E. (1985) Nucleic Acids Res. 13, 2111- geometry of the viral proteases may function homologously 2126. 7. Stanway, G., Hughes, P. J., Mountford, R. C., Minor, P. D. & Almond, to that oftrypsin and analogously to that ofpapain. The active J. W. (1984) Nucleic Acids Res. 12, 7859-7875. site geometry of the catalytic His-159/Asn-175/Cys-25 triad 8. Stanway, G., Hughes, P. J., Mountford, R. C., Reeve, P., Minor, P. D., of papain is surprisingly similar to that of the trypsin/ Schild, G. C. & Almond, J. W. (1984) Proc. Nadl. Acad. Sci. USA 81, subtilisin groups (44) but displays no other structural simi- 1539-1543. 9. Werner, G., Rosenwirth, B., Bauer, E., Seifert, J.-M., Werner, F.-J. & larity in the rest of the protein (1, 44). We emphasize that the Besemer, J. (1986) J. Virol. 57, 1084-1093. trypsin, subtilisin, and papain families of enzymes have 10. Lindberg, A. M., Stalhandske, P. 0. K. & Pettersson, U. (1987) Virol- independently evolved a similar active-site constellation of ogy 156, 50-63. functional residues but share no common ancestor (1, 2). The 11. Earle, J. A. P., Skuce, R. A., Fleming, C. S., Hoey, E. M. & Martin, S. J. (1988) J. Gen. Virol. 69, 253-263. viral 3c protease model is further inconsistent with a subtil- 12. Najarian, R., Caput, D., Gee, W., Potter, S. J., Renard, A., Merrywea- isin or papain secondary and tertiary structure. ther, J., Van Nest, G. & Dina, D. (1985) Proc. Nail. Acad. Sci. USA 82, Experimental simulations of the papain triad in a trypsin- 2627-2631. 13. Palmenberg, A. C., Kirby, E. M., Janda, M. R., Drake, N. I., Potratz, like Ser protease have used direct chemical modification (45) K. F. & Collett, M. C. (1984) Nucleic Acids Res. 12, 2969-2985. or mutagenic (46) means to change the active site Ser to a 14. Pevear, D. C., Calenoff, M., Rohzon, E. & Lipton, H. L. (1987)J. Virol. Cys. Higaki et al. (46) expressed a mutant trypsin with a 61, 1507-1516. Ser-195 -- an enzyme less active than 15. Forss, S., Strebel, K., Beck, E. & Schaller, H. (1984) Nucleic Acids Res. Cys change, producing 12, 6587-6601. native trypsin by a factor of 105. This mutant theoretically 16. Argos, P., Kamer, P., Nicklin, M. J. H. & Wimmer, E. (1984) Nucleic mimics the 3c viral Cys protease active site. However, the Acids Res. 12, 7251-7267. Cys-195 trypsin mutant may be a poorer enzyme because the 17. Franssen, H., Leunissen, J., Goldbach, R., Lomonossoff, G. & Zimm- electronic charge distribution of the native active site is ern, D. (1984) EMBO J. 3, 855-861. 18. Domier, L. L., Shaw, J. G. & Rhoads, R. E. (1987) Virology 158,20-27. evolutionarily designed for Ser-195 as nucleophile. The 19. Lomonossoff, G. P. & Shanks, M. (1983) EMBO J. 2, 2253-2258. observed viral modification of the nucleophilic Ser -+ Cys 20. Domier, L. L., Franklin, K. M., Shahabuddin, M., Hellmann, G. M., may be accompanied by other changes that contribute to a Overmeyer, J. H., Hiremath, S. T., Siaw, M. F. E., Lomonossoff, new active site environment. G. P., Shaw, J. G. & Rhoads, R. E. (1986) Nucleic Acids Res. 14, 5417- 5430. This study proposes a compelling model for the tertiary 21. Allison, R., Johnston, R. E. & Dougherty, W. G. (1986) Virology 154, fold and active-site geometry of the viral Cys proteases that 9-20. clearly differentiates them from the papain-like class of 22. Wellink, J. & van Kammen, A. (1988) Arch. Virol. 98, 1-26. cellular Cys proteases. Previous analyses (16-19, 22, 24) 23. Hanecak, R., Semler, B. L., Ariga, H., Anderson, C. W. & Wimmer, E. (1984) Cell 37, 1063-1073. have focused on two other conserved viral residues (besides 24. Toyoda, H., Nicklin, M. J. H., Murray, M. G., Anderson, C. W., Dunn, the putative active-site Cys) as important to catalysis. We J. J., Studier, F. W. & Wimmer, E. (1986) Cell 45, 761-770. have argued that His-213 plays an important specificity- 25. Garcia, J. A., Schrijvers, L., Tan, A., Vos, P., Wellink, J. & Goldbach, determining role in the 3c viral proteases and bacterial SAP. R. W. (1987) Virology 159, 67-75. 26. Carrington, J. C. & Dougherty, W. C. (1987) J. Virol. 61, 2540-2548. His-203 (conserved in the 2a class) is predicted to be in a 27. Ivanoff, L. A., Towatari, T., Ray, J., Korant, B. D. & Petteway, S. R. surface loop connecting strands J and K (Fig. 3). Others (47) (1986) Proc. Nail. Acad. Sci. USA 83, 5392-5396. have postulated that (p/,a-type trypsin) Ser enzymes and 28. Konig, H. & Rosenwirth, B. (1988) J. Virol. 62, 1243-1250. (a + papain) Cys enzymes are ancestrally related 29. Korant, B. D., Brzin, J. & Turk, V. (1985) Biochem. Biophys. Res. ,-type Commun. 127, 1072-1076. because ofa limited sequence similarity in the vicinity ofboth 30. Sidman, K. E., George, D. G., Barker, W. C. & Hunt, L. T. (1988) Cys-25 (papain) and Ser-195 (trypsin) to some members ofthe Nucleic Acids Res. 16, 1869-1871. 3c viral class of proteases. A comparison of the determined 31. Read, R. J., Brayer, G. D., Jurasek, L. & James, M. N. G. (1984) crystal structures of papain and trypsin enzymes (1, 44) does Biochemistry 23, 6570-6575. 32. Drapeau, G. R. (1978) Can. J. Biochem. 56, 534-544. not support this proposal. 33. Delbaere, L. T. J., Brayer, G. D. & James, M. N. G. (1979) Can. J. Asp active-center proteases from have re- Biochem. 57, 135-144. cently been shown to be evolutionarily related to the cellular 34. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr., Asp protease family (48). Molecular modeling of the human Brice, M. D., Rogers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 535-542. immunodeficiency virus protease by using the x-ray coordi- 35. Devereaux, J., Haeberli, P. & Smithies, 0. (1984) Nucleic Acids Res. 12, nates of bacterial structures has resulted in a useful 387-395. structure for the design of therapeutic inhibitors (48). Anal- 36. Gribskov, M., Homyak, M., Edenfield, J. & Eisenberg, D. (1988) Comp. ogously, the twin "small" and "large" trypsin-based struc- Appl. Biol. Sci. 4, 61-66. 37. Martinez, H. (1988) Nucleic Acids Res. 16, 1683-1691. tural models proposed for the 2a and 3c classes of proteases 38. Needleman, S. & Wunsch, C. (1970) J. Mol. Biol. 48, 443-453. are sufficiently detailed to allow testing prior to actual 39. Garnier, J., Osguthorpe, J. D. & Robson, B. (1978) J. Mol. Biol. 120, 97- structural determination. We expect that analysis of these 120. viral "Cys trypsins" will provide valuable information on the 40. Finer-Moore, J. & Stroud, R. M. (1984) Proc. Natl. Acad. Sci. USA 81, 155-159. evolution of catalytic function and protein structure in the 41. Cohen, F. E., Abarbanel, R. M., Kuntz, I. D. & Fletterick, R. J. (1986) trypsin superfamily. Biochemistry 25, 266-275. 42. James, M. N. G. (1976) in andPhysiologicalRegulation, eds. We thank Drs. C. Craik, T. Jukes, J. Higaki, and C.-B. Stewart for Ribbons, D. W. & Brew, K. (Academic, New York), pp. 125-142. stimulating discussions and critical readings of this manuscript. 43. Sielecki, A. R., Hendrickson, W. A., Broughton, C. G., Delbaere, L. T. J., Brayer, G. D. & James, M. N. G. (1979) J. Mol. Biol. 134, 781- J.F.B. acknowledges the lead of L. Evnin in setting in motion 793. experiments to test the viral Cys protease structural model. The 44. Argos, P., Garavito, R. M., Eventoff, W., Rossmann, M. G. & Branden, support of Tina S. Bazan (to J.F.B.) is gratefully recognized. C. I. (1978) J. Mol. Biol. 126, 141-158. 45. Yokosawa, H., Ojima, S. & Ishii, S. (1977) J. Biochem. 82, 869-876. 1. Neurath, H. (1984) Science 224, 350-357. 46. Higaki, J. N., Gibson, B. W. & Craik, C. S. (1987) Cold Spring Harbor 2. Kraut, J. (1977) Annu. Rev. Biochem. 46, 331-358. Symp. Quant. Biol. 52, 615-621. 3. Craik, C. S., Rutter, W. J. & Fletterick, R. J. (1983) Science 220, 1125- 47. Gorbalenya, A. E., Blinov, V. M. & Donchenko, A. P. (1986) FEBS 1129. Lett. 194, 253-257. 4. Taylor, W. R. (1986) J. Mol. Biol. 188, 233-258. 48. Pearl, L. H. & Taylor, W. R. (1987) (London) 329, 351-354. Downloaded by guest on September 26, 2021