Insect Biochemistry and Molecular Biology 40 (2010) 835–846


Molecular cloning of a multidomain cysteine protease inhibitor precursor from the tobacco hornworm (Manduca sexta) and functional expression of the cathepsin F-like domain

Takayuki Miyaji a, Satoshi Murayama a, Yoshiaki Kouzuma a, Nobutada Kimura b, Michael R. Kanost c, Karl J. Kramer d, Masami Yonekura a,*

a Laboratory of Food Molecular Functionality, College of Agriculture, Ibaraki University, 3-21-1, Chuo, Ami-machi, Inashiki-gun, Ibaraki 300-0393, Japan
b Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8566, Japan
c Department of Biochemistry, Kansas State University, Manhattan, KS 66506, USA
d Center for Grain and Animal Health Research, Agricultural Research Service, United States Department of Agriculture, 1515 College Avenue, Manhattan, KS 66502, USA

Article history: Received 3 June 2010; received in revised form 10 August 2010; accepted 10 August 2010.

Keywords: Cysteine protease inhibitor; Cystatin; Cathepsin F-like protease; Manduca sexta; Multifunctional domains

Abstract: A Manduca sexta (tobacco hornworm) cysteine protease inhibitor, MsCPI, purified from larval hemolymph has an apparent molecular mass of 11.5 kDa, whereas the size of the mRNA is very large (~9 kilobases). MsCPI cDNA consists of 9,273 nucleotides that encode a polypeptide of 2,676 amino acids, which includes nine tandemly repeated MsCPI domains, four cystatin-like domains and one procathepsin F-like domain. The procathepsin F-like domain was expressed in Escherichia coli and processed to its active mature form by incubation with pepsin. The mature enzyme hydrolyzed Z-Leu-Arg-MCA, Z-Phe-Arg-MCA and Boc-Val-Leu-Lys-MCA rapidly, whereas hydrolysis of Suc-Leu-Tyr-MCA and Z-Arg-Arg-MCA was very slow. The protease was strongly inhibited by MsCPI, egg-white cystatin and sunflower cystatin with Ki values in the nanomolar range. The MsCPI tandem protein, in which two MsCPI domains are linked, was degraded by the cathepsin F-like protease. However, tryptic digestion converted the MsCPI tandem protein to an active inhibitory form. These data support the hypothesis that the mature MsCPI protein is produced from the MsCPI precursor protein by trypsin-like proteases. The resulting mature MsCPI protein probably plays a role in the regulation of the activity of endogenous cysteine proteases. © 2010 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel./fax: +81 29 888 8683. E-mail address: [email protected] (M. Yonekura).

0965-1748/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.ibmb.2010.08.003

1. Introduction

Proteinaceous cysteine protease inhibitors (CPIs) have been found in various animal and plant tissues, and many of them have been isolated and characterized in terms of their protein structure and inhibitory activity. According to the MEROPS listing (Rawlings et al., 2004), more than 10 families of CPIs have been classified based on the amino acid sequences of inhibitory domains. Cystatins (family I25) are the best characterized CPI group of mammalian and plant origins. The cystatin superfamily is divided into three subfamilies, I25A (cystatin A and sarcocystatin), I25B (cystatin C and phytocystatin) and I25C (kininogens). Cystatins are believed to be involved in the regulation of endogenous cysteine protease activity and/or in defensive mechanisms against invading organisms such as bacteria, viruses and parasites (Dubin, 2005). Cystatins of animal origin target lysosomal cysteine proteases, generally known as cathepsins. These cysteine proteases are involved both in physiological protein breakdown (cathepsins B, C, F, H, L, O and X) and in specific functions correlated with their tissue distribution (cathepsins K, S and V). They are optimally active in the slightly acidic, reducing milieu found in lysosomes. They comprise a group of papain-related enzymes, sharing similar amino acid sequences and folds (Turk et al., 2001).

Generally, cystatins have three conserved sequence motifs: a Gly in the vicinity of the N-terminal region, Q-X-V-X-G in the first hairpin loop, and P-W in the second hairpin loop. Structural analysis revealed that the amino-terminal segment and two loops of cystatins form a tripartite wedge-shaped edge structure that interacts with the active-site cleft of cognate cysteine proteases such as papain (Rzychon et al., 2004; Jenko et al., 2003). Among the three regions, the Q-X-V-X-G motif is critical for inhibition of cysteine proteases, as revealed by mutational analysis of egg-white cystatin (Auerswald et al., 1992).

In insects, cystatins include sarcocystatin from Sarcophaga peregrina and a cystatin-like protein from Drosophila melanogaster (Suzuki and Natori, 1985; Delbridge and Kelly, 1990). Saito et al. (1989) suggested that sarcocystatin A from S. peregrina is involved in morphogenesis of larval and adult structures of Sarcophaga. Besides cystatins, two other types of CPIs are found in insects: the propeptide region of a cysteine protease homologous protein from Bombyx mori (family I29) and an inhibitor of apoptosis protein from D. melanogaster (family I32) (Yamamoto et al., 1999; Hay, 2000). However, relatively little is known about CPIs from insects compared with those from mammals and plants. To obtain a better understanding of the structures and functions of novel CPIs from insects, we investigated a CPI from the tobacco hornworm (Manduca sexta).

Previously, a CPI from M. sexta, MsCPI, was purified from larval hemolymph of the tobacco hornworm and the partial nucleotide sequence was determined (Miyaji et al., 2007). Although the MsCPI protein purified from the hemolymph had a small molecular mass of 11,400–11,600 Da, northern blot analysis indicated that the message was extremely large (~9 kb). Molecular cloning of the MsCPI cDNA revealed that the encoded protein belonged to the cystatin family and was composed of at least six tandemly repeated MsCPI segments.

To determine the complete structure of the MsCPI gene, we performed genomic and cDNA cloning of the full-length gene, and functionally expressed the procathepsin F-like domain that is a part of the MsCPI precursor. In addition, we determined the conceptual amino acid sequence of the MsCPI precursor protein and the interaction that occurs between MsCPI and the cathepsin F-like protease.

2. Materials and methods

2.1. Screening of MsCPI cDNA from M. sexta fat body cDNA library

An M. sexta fat body cDNA library that was previously constructed (Jiang et al., 2003) was screened by plaque hybridization using a partial MsCPI cDNA (324 bp) labeled with digoxigenin as a probe (Miyaji et al., 2007). Six positive clones were isolated and phage DNAs were prepared from them. The DNA with the longest insert was digested with EcoRI and XhoI, and the DNA fragment was subcloned into the plasmid vector pUC18 or pBluescript SK. The nucleotide sequence was determined by the dideoxy chain termination method using the BigDye Terminator v3.1 cycle sequencing Ready Reaction Kit and an automated DNA sequencer, ABI Prism 3130xl (Applied Biosystems Inc.).

2.2. Isolation of genomic DNA from an M. sexta larva

Genomic DNA was prepared from a fifth instar M. sexta larva using the method of Blin and Stafford (1976), as follows. The larva was frozen in liquid nitrogen and ground to a powder. The powder was then dissolved in extraction buffer (20 µg/ml RNase A, 0.5% SDS, 100 mM EDTA and 10 mM Tris–HCl, pH 8.0). After incubation at 37 °C for 1 h, the solution was gently mixed with proteinase K at a final concentration of 100 µg/ml at 50 °C for 3 h, and then incubated for 10 min with an equal volume of phenol equilibrated with TE (1 mM EDTA, 10 mM Tris–HCl, pH 8.0). After three extractions with phenol, the supernatant was mixed thoroughly with 0.1 volume of 3 M sodium acetate, pH 4.6 and 2 volumes of ethanol. The precipitated DNA was recovered and washed twice with 70% ethanol. The purity and concentration of the genomic DNA thus obtained were confirmed by agarose gel electrophoresis and by measuring the absorbance at 260 nm.

2.3. M. sexta genomic library construction and screening

An M. sexta genomic library was constructed using the CopyControl™ Fosmid Library Production Kit (EPICENTRE) according to the manufacturer's instructions. M. sexta genomic DNA was incubated with the End-Repair Enzyme Mix at room temperature for 45 min. End-repaired DNAs of approximately 40 kb were resolved by 1% low melting point agarose gel electrophoresis. The 40 kb fragments were recovered from the gel using GELase (45 °C for 1 h). The size-fractionated DNA was ligated with the CopyControl pCC1FOS vector at room temperature for 2 h and then the mixture was incubated with the MaxPlax Lambda Packaging Extracts at 37 °C for 180 min. The packaged CopyControl fosmid clones were incubated with Escherichia coli EPI300 cells at 37 °C for 20 min, after which the infected EPI300 cells were spread on LB plates containing 12.5 µg/ml chloramphenicol and incubated at 37 °C overnight. The M. sexta fosmid library thus constructed was screened by colony hybridization using a 324-bp MsCPI gene fragment labeled with digoxigenin as a probe. After the second and third screening, one positive clone was isolated and fosmid DNA was prepared from it. To confirm the insert position of the MsCPI precursor gene, the fosmid DNA was digested with various restriction enzymes and Southern hybridization was performed. The DNA fragments obtained by EcoRI, HindIII or PstI digestions were subcloned into the plasmid vector pUC18 and the nucleotide sequences were determined.

2.4. Southern hybridization

The fosmid DNA from the positive clone was digested with BglII, EcoRI, HindIII, PstI, SacI, SalI, SphI, XbaI, or XhoI. The DNA fragments were resolved by agarose gel electrophoresis and transferred to a Hybond-N+ nylon membrane (GE Healthcare), after which the membrane was baked at 80 °C for 2 h. A 324-bp MsCPI gene fragment was labeled with alkaline phosphatase (AlkPhos Direct Labeling and Detection System with CDP-Star, GE Healthcare) for use as a probe. Hybridization was performed in hybridization buffer (4% blocking reagent/0.5 M NaCl) at 55 °C overnight. The membrane was washed twice for 10 min at 55 °C with 50 mM Na phosphate buffer, pH 7.0 containing 0.2% blocking reagent, 0.1% SDS, 1 mM MgCl2, 150 mM NaCl and 2 M urea, and then twice for 5 min at room temperature with 50 mM Tris–HCl buffer, pH 10 containing 2 mM MgCl2 and 100 mM NaCl. The membrane was then incubated with CDP-Star detection reagent for 5 min and visualized using an LAS 3000 Mini Imaging System (Fuji Film).

2.5. 5′-RACE and RT-PCR of MsCPI gene

Isolation of total RNA from fat body of M. sexta was performed as previously reported (Miyaji et al., 2007). 5′-RACE and RT-PCR were conducted using the GeneRacer™ Kit (Invitrogen) according to the manufacturer's instructions. Briefly, full-length mRNA with a 5′-cap structure was decapped by tobacco acid pyrophosphatase and then ligated with RNA Oligo. First-strand cDNA was synthesized by SuperScript® III Reverse Transcriptase using the mRNA and MsCPI-specific primers. PCR was performed using the first-strand cDNA as a template and gene-specific primers, and the PCR products were purified using the Wizard® SV Gel and PCR Clean-Up System (Promega). The purified PCR products were cloned into the pGEM T-Easy vector (Promega) and then sequenced.

2.6. Expression of procathepsin F-like protein in E. coli

MsCPI cDNA prepared from a positive clone from the M. sexta fat body cDNA library was digested with BamHI and the DNA fragment including the region encoding the procathepsin F-like domain

(Msprocat F) was ligated into pUC18 (pUC-MsCPI). Msprocat F gene-specific primers (F1, 5′-CCAAGGAGGCACTGTGTGGCGC-3′; F2, 5′-CATATGGCCGAGGTGTATCACCATTTG-3′; B1, 5′-AAGAATACATAATTCACTCTGG-3′ and B2, 5′-GAATTCTCATACGACAGCACTGCTCGCC-3′) were constructed and PCR was performed using a primer set of F1 and B1 with pUC-MsCPI as the template. A second PCR was performed using a primer set of F2 and B2 with the first PCR mixture as the template. The amplified Msprocat F gene was ligated into the pGEM T-Easy vector. The NdeI site in the Msprocat F gene was mutated without amino acid substitution by the mutation primers, F3, 5′-GTACAAAACACATGGGTATGAAAGCTAGTC-3′ and B3, 5′-GACTAGCTTTCATACCCATGTGTTTTGTAC-3′, using the QuikChange Site-Directed Mutagenesis Kit (Stratagene). After confirmation of the nucleotide sequence, the DNA fragment encoding the Msprocat F protein was excised by digestion with NdeI and SalI, and then ligated into the expression vector pET-22b previously digested with the same enzymes. The resulting plasmid, pET-Msprocat F, was introduced into the E. coli BL21 CodonPlus(DE3) RIL strain and the E. coli was incubated in Circle Grow medium (Qbiogene). After induction with 1 mM IPTG, the culture was incubated overnight at 37 °C, after which the cells were harvested. The cells were lysed by sonication in 30 mM Tris–HCl buffer, pH 7.5 containing 30 mM NaCl. The lysate was centrifuged at 8000 × g for 10 min and the pellet (inclusion bodies) was collected and washed with 1 M sucrose, followed by 1% Triton X-100/10 mM EDTA.

2.7. Refolding of Msprocat F protein

Refolding of Msprocat F protein was performed using the method of Brömme et al. (2004). The inclusion bodies were dissolved in 50 mM Tris–HCl buffer, pH 8.0 containing 5 mM EDTA, 5 mM cysteine and 8 M urea to a protein concentration of 1 mg/ml. The protein solution was diluted using 50 mM Tris–HCl buffer, pH 8.0 containing 5 mM cysteine to a final concentration of 20 µg/ml and stirred at 4 °C overnight. After refolding, the Msprocat F solution was dialyzed against 50 mM Tris–HCl buffer, pH 8.0 and concentrated by ultrafiltration using a YM-10 filter (Millipore).

Protein concentration was determined using the BCA protein assay kit (Thermo Fisher Scientific) with bovine serum albumin as the standard protein. SDS-PAGE was carried out using a 15% polyacrylamide gel as described by Laemmli (1970).

2.8. Activation of Msprocat F and purification

Activation of Msprocat F was performed using pepsin, which was successfully used previously (Brömme et al., 2004; Wang et al., 1998; Fonovic et al., 2004). The refolded Msprocat F solution was dialyzed against 20 mM Na-acetate buffer, pH 4.5 containing 1 mM EDTA. After dialysis, the Msprocat F solution was supplemented with DTT (final concentration 5 mM) and porcine pepsin (Sigma) at a molar ratio of 1/10. The solution was incubated at 37 °C for 5 h, and activation was monitored by measuring peptidase activity using Z-Leu-Arg-MCA as the substrate and by western blotting using a polyclonal antibody elicited against Msprocat F.

After activation, the solution was applied to an SP-Sepharose FF column (GE Healthcare) equilibrated with 20 mM Na-acetate buffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA, and the proteins were eluted with a linear gradient of increasing NaCl concentration from 0 to 0.5 M.

2.9. Western blotting

Western blotting was carried out with a murine anti-Msprocat F serum obtained according to the method of Young and Davis (1983), and a horseradish peroxidase-conjugated rabbit IgG fraction generated against mouse IgG (Funakoshi). The protein was visualized by addition of the peroxidase substrate solution: 0.5 mg/ml 3,3′-diaminobenzidine tetrahydrochloride (Dojindo), 0.15% H2O2, 5 mM imidazole and 100 mM Tris–HCl, pH 7.5.

2.10. Assay of Mscat F activity

Peptidase (cathepsin F-like protease, Mscat F) activity was measured using the method of Barrett and Kirschke (1981). The fluorogenic substrates t-Butyloxycarbonyl-L-Leucyl-L-Lysyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Boc-Leu-Lys-Arg-MCA), Benzyloxycarbonyl-L-Arginyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-Arg-Arg-MCA), Benzyloxycarbonyl-L-Leucyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-Leu-Arg-MCA), Benzyloxycarbonyl-L-Phenylalanyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-Phe-Arg-MCA; Peptide Institute, Inc) or N-Succinyl-L-Leucyl-L-Tyrosine 4-Methyl-Coumaryl-7-Amide (Suc-Leu-Tyr-MCA, Sigma) were dissolved in DMSO to a concentration of 10 mM and diluted 1000-fold in 100 mM Na-acetate buffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA. One milliliter of substrate solution was preincubated at 37 °C for 10 min. One hundred microliters of enzyme solution were added and incubated at 37 °C for 10 min. After incubation, 200 µl of 100 mM Na-acetate buffer, pH 5.0 containing 10 mM iodoacetic acid was added to stop the reaction. Amounts of the liberated product, 7-amino-4-methylcoumarin (AMC), were determined by measurement of the fluorescence intensity at 440 nm (excitation at 380 nm) using a Shimadzu spectrofluorophotometer (model RF-1500). One unit of enzyme activity was defined as the amount of enzyme liberating 1 nmol AMC in 1 min. The concentration of active Mscat F was determined using E-64 (L-trans-epoxysuccinyl-leucylamido (4-guanidino) butane) as the active site titrant (Barrett et al., 1982). The kinetic parameters, Vmax and Km, were determined using the Lineweaver–Burk double reciprocal plot.

2.11. Effects of pH and temperature on Mscat F activity

The effect of pH on Mscat F activity was determined using Z-Leu-Arg-MCA as the substrate at 37 °C for 10 min in 100 mM buffers containing 1 mM DTT and 1 mM EDTA. Buffers used were sodium formate (pH 3.0–4.0), sodium acetate (pH 4.0–5.5) and sodium phosphate (pH 5.5–7.0).

The effect of temperature on Mscat F activity was determined using Z-Leu-Arg-MCA as the substrate at 20–50 °C for 10 min in 100 mM sodium acetate buffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA. Mscat F was also treated at 20–50 °C for 30 min in advance, and then the heat stability was investigated by measuring activity at 37 °C.

2.12. Inhibition assay

Inhibition of Mscat F was assayed using recombinant MsCPI (Miyaji et al., 2007), egg-white cystatin (Sigma) and sunflower cystatin, Sca (Kouzuma et al., 2001). Inhibition constants (Ki) toward Mscat F were determined by the method of Henderson (1972).

2.13. Action of Mscat F on the MsCPI tandem protein

The expression in E. coli of the MsCPI tandem protein, which contains the MsCPI-5 and MsCPI-6 domains, was performed according to previously reported methods (Miyaji et al., 2007). Purified MsCPI tandem protein and Mscat F or trypsin were incubated at an enzyme/inhibitor molar ratio of 1/9 in 1 mM Na-acetate buffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA at 37 °C for 1 h. The proteolytic processing of the MsCPI tandem protein was confirmed by SDS-PAGE and measurement of the inhibitory activity against papain.
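The Lineweaver–Burk analysis named in Section 2.10 can be illustrated with a minimal Python sketch. It is not the authors' procedure or data: the substrate concentrations and rates below are synthetic, noise-free values generated from hypothetical parameters (`TRUE_KM`, `TRUE_VMAX`), and the function name `lineweaver_burk` is chosen here for illustration. The sketch fits the double reciprocal line 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by ordinary least squares and reads Km and Vmax from the slope and intercept.

```python
def lineweaver_burk(substrate_conc, rates):
    """Estimate (Km, Vmax) from a least-squares fit of 1/v against 1/[S]."""
    xs = [1.0 / s for s in substrate_conc]   # 1/[S]
    ys = [1.0 / v for v in rates]            # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                        # equals Km / Vmax
    intercept = mean_y - slope * mean_x      # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Hypothetical parameters used only to generate synthetic test data.
TRUE_KM = 2.0      # uM (assumed, not a value from the paper)
TRUE_VMAX = 10.0   # nmol AMC per min (assumed, not a value from the paper)

conc = [0.5, 1.0, 2.0, 5.0, 10.0]  # uM, illustrative substrate concentrations
rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in conc]  # Michaelis-Menten rates

km_fit, vmax_fit = lineweaver_burk(conc, rates)
print(f"Km = {km_fit:.3f} uM, Vmax = {vmax_fit:.3f} nmol/min")
```

With real fluorescence data the reciprocal transform weights low-rate points heavily, so a direct nonlinear fit of the Michaelis–Menten equation is often used as a cross-check; that choice is outside what the text describes.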

Fig. 1. Gene organization and nucleotide sequence of MsCPI. A. Gene organization and domains of the MsCPI precursor protein. Introns in the MsCPI gene are indicated by horizontal lines. 5′-UTR (nucleotides 1–45), signal sequence (46–105), four cystatin-like domains (domain 1, 505–903; 2, 1651–1995; 3, 5338–5700; 4, 6769–7119), nine MsCPI domains (domain 1, 2392–2703; 2, 2704–3018; 3, 3019–3333; 4, 3334–3648; 5, 3649–3963; 6, 3964–4278; 7, 4279–4593; 8, 4594–4908; 9, 4909–5223), procathepsin F-like domain (pro-region, 7120–7416; mature enzyme, 7417–8076) and 3′-UTR (8077–9273) are present. B. Nucleotide and deduced amino acid sequences of MsCPI cDNA. Putative signal sequence is boxed. MsCys-1–4, MsCPI-1–9 and procathepsin F-like domains are indicated by underlines, dotted lines and broken lines, respectively. Arrowheads indicate boundaries between MsCPI domains. Stop codon is indicated by an asterisk.
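The coordinates in the Fig. 1 legend can be checked against the domain sizes quoted in the text. The following Python sketch is an illustration only, not part of the original analysis. It assumes the coordinates are 1-based and inclusive, and that the mature-enzyme range (7417–8076) ends with the stop codon, so one codon is subtracted when counting residues. The helper name `codon_count` and the `REGIONS` layout are hypothetical.

```python
# Nucleotide ranges copied from the Fig. 1 legend (assumed 1-based, inclusive).
REGIONS = {
    "signal sequence": (46, 105),
    "MsCys-1": (505, 903),
    "MsCys-2": (1651, 1995),
    "MsCys-3": (5338, 5700),
    "MsCys-4": (6769, 7119),
    "MsCPI-1": (2392, 2703),
    "MsCPI-2": (2704, 3018),
    "MsCPI-3": (3019, 3333),
    "MsCPI-4": (3334, 3648),
    "MsCPI-5": (3649, 3963),
    "MsCPI-6": (3964, 4278),
    "MsCPI-7": (4279, 4593),
    "MsCPI-8": (4594, 4908),
    "MsCPI-9": (4909, 5223),
    "pro-region": (7120, 7416),
    "mature enzyme": (7417, 8076),  # assumed to include the stop codon
}


def codon_count(start, end):
    """Number of codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


codons = {name: codon_count(*span) for name, span in REGIONS.items()}

# Open reading frame: start of the signal sequence to the end of the mature enzyme.
orf_codons = codon_count(46, 8076)
precursor_residues = orf_codons - 1  # assumption: the final codon is the stop

for name, n in codons.items():
    print(f"{name}: {n} codons")
print(f"precursor: {precursor_residues} residues")
```

Under these assumptions the open reading frame spans 2677 codons, that is, 2676 residues plus a stop codon, which agrees with the precursor length given in the text. The MsCys-3 range as printed gives 121 codons, whereas Section 3.3 quotes 125 residues for that domain; the sketch cannot tell which value is intended.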


3. Results

3.1. cDNA cloning and nucleotide sequence of MsCPI gene

MsCPI cDNA clones were screened from 50,000 plaques of an M. sexta fat body cDNA library using a partial MsCPI cDNA labeled with digoxigenin as a probe. Six positive clones were isolated and the nucleotide sequence of the longest insert DNA was determined. The cDNA consists of 5665 nucleotides and encodes a protein with at least 1492 amino acids. Analysis of the deduced amino acid sequence suggested that the protein is composed of multifunctional domains including at least five MsCPI domains, two cystatin-like domains and one procathepsin F-like domain. However, because the cDNA was not a full-length cDNA, we determined the nucleotide sequence of the MsCPI gene obtained from genomic DNA.

3.2. Genomic cloning and nucleotide sequence of MsCPI gene

Approximately 5000 colonies from the M. sexta genomic library were screened using the MsCPI gene as a probe. One positive clone

[Fig. 2, panels A and B: multiple sequence alignments. The alignment columns did not survive text extraction and are not reproduced here. Panel A aligns MsCPI-1 to MsCPI-9 with Bmsarco-type 1, sarcocystatin and Dmsarco-type; its Identity (%) column reads 90, 98, 99, 98, –, 100, 96, 99, 97, 49, 31 and 25 in that order. Panel B aligns MsCys-1 to MsCys-4 with BmCys-6, TcCys-1, cystatin C, human cathepsin F, stefin A, MsCPI-5 and kininogen domain 3.]

Fig. 2. Comparison of the amino acid sequences of MsCPI domains and MsCys-1e4. A. Comparison of the amino acid sequences of MsCPI domains and insect sarcocystatin-type cystatins. The amino acid sequences were aligned using MAFFT software (http://mafft.cbrc.jp/alignment/software/). Identical amino acids with MsCPI domains are enclosed in black boxes. The three conserved regions of a cystatin (G, QXVXG and PW) are underlined. MsCPI-1e9, MsCPI domains 1e9; Bmsarco-type 1, sarcocystatin-type protein 1 in MsCPI precursor homologous protein from B. mori (KAIKO base Gene No. BGIBMGA005131); Sarcocystatin, sarcocystatin A from Sarcophaga peregrina (GenBank accession no. J02847) and Dmsarco-type, Cystatin-like protein from Drosophila melanogaster (GenBank accession no. AAF55073). B. Comparison of the amino acid sequences of MsCys-1e4 and several cystatins. Cystatin-like domains were aligned together with several cystatins using MAFFT software. Two or more identical amino acid residues among MsCys-1e4 are enclosed in black boxes. The three conserved regions of the cystatins are underlined. MsCys-1e4, Manduca sexta cystatin-like domains 1e4; BmCys-6, cystatin-like domain 6 in MsCPI precursor homologous protein from B. mori; TcCys-1, cystatin-like domain 5 in cathepsin F-like cysteine protease from Tribolium castaneum (GenBank accession no. XP_973607); cystatin C, human cystatin C (GenBank accession no. CAA29096); human cathepsin F, cystatin-like domain from human cathepsin F; stefin A, human stefin A (GenBank accession no. BAA87858); MsCPI-5, MsCPI domain 5; kininogen domain 3, human low molecular weight kininogen domain 3 (GenBank accession no. AAA35497). T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 841 was isolated and Southern hybridization using a 324-bp MsCPI like domains. 
In other insect species, the procathepsin F-like gene fragment as a probe suggested that EcoRI, HindIII and PstI protein from Apis mellifera has only a single cystatin-like domain. In digestion of the fosmid DNA was effective for subcloning of MsCPI D. melanogaster and human, a cystatin-like domain with the precursor gene. Therefore, the fosmid DNA was digested by EcoRI, QeXeVeXeG motif is not present in their procathepsin F-like HindIII and/or PstI and the nucleotide sequence of the insert DNA proteins. However, it has been predicted that the N-terminal region (about 38 kbp) was determined (GenBank accession no. AB524031). of human procathepsin F has a cystatin-like fold (Nägler et al., To clarify the positions of the introns, RT-PCR using gene-specific 1999). Cathepsin F from the human liver fluke, Opisthorchis viver- primers was performed. Furthermore, the positions of the initiation rini, lacks a region corresponding to the cystatin-like domain codons and 50-UTR were determined by 50-RACE. As shown in (Pinlaor et al., 2009). Fig. 1A, the MsCPI gene (27,515 bp) was organized into eight exons with seven introns (1.2 kbp, 6.7 kbp, 71 bp, 0.6 kbp, 5.8 kbp, 2.7 kbp and 1.8 kbp). Combining the partial cDNA and genomic DNA 3.4. Expression and activation of Msprocat F sequences, the full-length MsCPI transcript (GenBank accession no. AB242821) consists of 9273 nucleotides that encode a polypeptide To investigate the function of procathepsin F-like domain of the of 2676 amino acids (Fig. 1B). MsCPI precursor (Msprocat F), the recombinant protein was produced in E. coli. The recombinant Msprocat F (35 kDa) accu- 3.3. Primary structure analysis of MsCPI precursor protein mulated in cells was insoluble (Fig. 5A). 
Analysis of the amino acid sequence of the 2676-amino acid polypeptide revealed the presence of another two cystatin-like domains and four MsCPI domains. Consequently, the MsCPI protein is apparently synthesized from a very large protein precursor that has four cystatin-like domains, nine MsCPI domains and one procathepsin F-like domain (Fig. 1B).

The MsCPI domains (MsCPI-2–9) are composed of 315 nucleotides encoding 105 amino acids, except for MsCPI-1, which consists of only 312 nucleotides encoding 104 amino acids. The MsCPI domains share 88–100% identical amino acid residues. Several amino acids in the MsCPI domains differ from a previously reported RT-PCR product of the MsCPI gene (Miyaji et al., 2007). However, all MsCPI domains share the three conserved motifs of known cystatins (glycine in the N-terminal region, Q-X-V-X-G in the middle region, and P-W in the C-terminal region) and have two cysteine residues, suggesting that they belong to the I25A family, of which sarcocystatin from S. peregrina is also a member (Fig. 2A).

Four cystatin-like domains (MsCys-1–4) are composed of 133, 115, 125 and 117 amino acids, respectively (Fig. 2B). They share 17–21% identity with each other and have at least four cysteine residues. They exhibit 15–20% identical residues with cystatin C from Homo sapiens. Therefore, it is proposed that MsCys-1–4 belong to the I25B family. They share completely or partially conserved motifs of cystatins. MsCys-1 has three conserved motifs. On the other hand, MsCys-2 and 4 lack two motifs (glycine and PW) and one motif (glycine), respectively. In MsCys-3, Q-X-V-X-G in the middle region was replaced by K-X-V-X-G.

The procathepsin F-like domain consists of 318 amino acids that include a 99-amino acid putative pro-region and a 219-amino acid mature enzyme. The mature region contains the three conserved active site residues of a cysteine protease, Cys25, His159 and Asn175 (papain numbering). It shared 52% and 36–38% identity with human procathepsin F and other cathepsins (K, L and S), respectively (Fig. 3).

When we searched for proteins homologous to the procathepsin F-like domain, several procathepsin F-like proteins with one or more cystatin-like domains in the molecule were identified (Fig. 4). Hence, we define a cystatin-like domain as a domain with the sequence Q-X-V-X-G or a related sequence because of the importance of this motif in cysteine protease inhibition. Proteins from B. mori (KAIKObase Gene No. BGIBMGA005131) and Tribolium castaneum (GenBank accession no. XP_973607) have eight cystatin-like domains, although their molecular sizes are smaller than that of the MsCPI precursor. The MsCPI precursor and the B. mori protein have two types of cystatin-like domains, the I25A type such as that of sarcocystatin and MsCPI, and the I25B type such as cystatin C. In contrast, the T. castaneum protein has only the I25B type of cystatin-like domain.

Therefore, the recombinant protein was dissolved in a solution containing 8 M urea and then diluted with renaturation buffer to facilitate refolding of the protein to its biologically active structure.

After dialysis against an acidic buffer (pH 4.5), activation of the refolded Msprocat F protein was achieved by proteolysis with pepsin. When Msprocat F was incubated at 37 °C for 2 h in the presence of pepsin, peptidase activity (Z-Leu-Arg-MCA hydrolyzing activity) increased drastically, reaching its highest level after 3 h and then gradually decreasing (Fig. 5B). Western blot analysis indicated that three proteins (bands i, ii, and iii) of different masses were derived from the Msprocat F protein. Under the acidic condition (pH 4.5), protein i was apparently converted to protein ii (Fig. 5C, 0 h). During incubation with pepsin at 37 °C, the smaller protein (band iii) was produced (Fig. 5C, 2 h), and the intensity of band iii was highest at 3 h of incubation and then gradually decreased concomitant with the activity. The N-terminal amino acid sequence of band i was Ala-Glu-Val-Tyr-His, indicating that i is an intact form of Msprocat F. The N-terminal amino acid sequences of ii and iii were Lys-Ala-Val-Ile and Ala-Pro-Asp-Ser, respectively. These data suggested that Msprocat F was converted under acidic conditions to an intermediate form by autolysis at the C-terminal side of Arg91p (Msprocat F numbering in Fig. 3), and then by cleavage at the C-terminal side of Thr99p by pepsin to an enzymatically active mature form.

Activated Mscat F protein was purified by ion-exchange chromatography using an SP-Sepharose FF column. The purity of the Mscat F protein was confirmed by SDS-PAGE (Fig. 5A). The yield of the Mscat F protein was 0.3 mg per liter of LB broth.

3.5. Action of Mscat F on synthetic MCA substrates

The substrate specificity of Mscat F was determined using seven synthetic substrates, Boc-Glu-Lys-Lys-MCA, Boc-Leu-Lys-Arg-MCA, Boc-Val-Leu-Lys-MCA, Suc-Leu-Tyr-MCA, Z-Phe-Arg-MCA, Z-Leu-Arg-MCA and Z-Arg-Arg-MCA (Table 1). Mscat F was most active toward Z-Leu-Arg-MCA, with Km and kcat values of 1.12 μM and 0.589 s⁻¹, respectively. Boc-Glu-Lys-Lys-MCA, Boc-Leu-Lys-Arg-MCA and Z-Arg-Arg-MCA were hydrolyzed very slowly by Mscat F, indicating that substitution at the P2 site substantially reduced catalytic efficiency (kcat/Km). The kcat/Km of Mscat F toward Z-Phe-Arg-MCA (kcat/Km = 215,000 M⁻¹ s⁻¹) was higher than that of human cathepsin F expressed in a baculovirus system (kcat/Km = 106,000 M⁻¹ s⁻¹, Fonovic et al., 2004) and that of a cathepsin F-like cysteine protease from Clonorchis sinensis (kcat/Km = 10,100 M⁻¹ s⁻¹, Na et al., 2008), and lower than that of human cathepsin F produced in Pichia pastoris (kcat/Km = 5,682,000 M⁻¹ s⁻¹, Wang et al., 1998).
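The kinetic comparison above is simple arithmetic, and a minimal sketch in Python may make it easier to follow. The constants are the ones reported in the text and in Table 1; the function names are hypothetical, and the Km unit is assumed to be micromolar (the reading consistent with the reported kcat/Km values). Recomputing kcat/Km from the published kcat and Km gives values close to, but not always identical with, the tabulated ones, presumably because the published constants are rounded.

```python
# Kinetic constants for Mscat F as reported in Table 1 (pH 5.5, 37 degrees C).
# Assumed units: kcat in s^-1, Km in micromolar.
TABLE1 = {
    "Z-Leu-Arg-MCA": (0.589, 1.12),
    "Z-Phe-Arg-MCA": (0.310, 1.45),
    "Boc-Val-Leu-Lys-MCA": (1.13, 8.65),
    "Suc-Leu-Tyr-MCA": (0.284, 13.9),
    "Z-Arg-Arg-MCA": (0.00853, 39.7),
}

# kcat/Km values (M^-1 s^-1) toward Z-Phe-Arg-MCA quoted in the text.
REPORTED_EFFICIENCY = {
    "Mscat F": 215_000,
    "human cathepsin F (baculovirus)": 106_000,
    "C. sinensis cathepsin F-like protease": 10_100,
    "human cathepsin F (P. pastoris)": 5_682_000,
}


def catalytic_efficiency(kcat_per_s: float, km_micromolar: float) -> float:
    """Return kcat/Km in M^-1 s^-1, converting Km from micromolar to molar."""
    return kcat_per_s / (km_micromolar * 1e-6)


def relative_activity(efficiency: float, reference: float) -> float:
    """Return an efficiency as a percentage of a reference efficiency."""
    return 100.0 * efficiency / reference


if __name__ == "__main__":
    for substrate, (kcat, km) in TABLE1.items():
        print(f"{substrate}: kcat/Km = {catalytic_efficiency(kcat, km):.3g} M^-1 s^-1")
    mscat_f = REPORTED_EFFICIENCY["Mscat F"]
    for enzyme, value in REPORTED_EFFICIENCY.items():
        print(f"Mscat F / {enzyme}: {mscat_f / value:.3g}")
```

With the tabulated kcat/Km values (527 and 215, in units of 10³ M⁻¹ s⁻¹), `relative_activity(215, 527)` reproduces the 40.8% relative activity listed for Z-Phe-Arg-MCA.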

[Fig. 3 alignment panel: Msprocat F, Bmprocat F, hcat F, hcat K, hcat L and hcat S, aligned over pro-region positions 10p–90p and mature-enzyme positions 1–210. Identity with Msprocat F: Bmprocat F, 82%; hcat F, 52%; hcat K, 38%; hcat L, 37%; hcat S, 36%.]

Fig. 3. Comparison of the amino acid sequence of the procathepsin F-like domain encoded by the MsCPI cDNA with those of human cathepsins F, K, L and S and the B. mori procathepsin F-like domain. The sequences were aligned using MAFFT software. Amino acids identical to those of the procathepsin F-like domain from the MsCPI precursor are enclosed in black boxes. Catalytic residues and residues forming the S2 binding pocket are indicated by asterisks and dots, respectively. The cleavage sites for activation of the enzyme are indicated by arrowheads. Bmprocat F, procathepsin F-like domain in the MsCPI precursor homologous protein from B. mori; hcat F, Homo sapiens cathepsin F; hcat K, Homo sapiens cathepsin K (GenBank accession no. CAA57649); hcat L, Homo sapiens cathepsin L (GenBank accession no. AAA66974); and hcat S, Homo sapiens cathepsin S (GenBank accession no. AAB22005).
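The identity percentages given with the alignment can be illustrated with a short sketch. The paper does not state which denominator was used, so the convention below (matches divided by the number of columns in which neither sequence has a gap) is an assumption, as are the function name and the toy sequences, which are not taken from the figure.

```python
def percent_identity(reference: str, other: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumed convention: columns where either row has a gap ('-') are
    skipped, and identity is matches / compared columns * 100.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for ref_residue, other_residue in zip(reference, other):
        if ref_residue == "-" or other_residue == "-":
            continue  # gap column: not counted
        compared += 1
        if ref_residue == other_residue:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(round(percent_identity("ACD-EFG", "ACDKEYG"), 1))
```

Other conventions (for example, dividing by the full alignment length or by the length of the shorter sequence) give somewhat different numbers, which is worth keeping in mind when comparing identity values across papers.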

3.6. Effects of pH and temperature on Mscat F activity

To characterize the enzymatic properties of Mscat F, the peptidase activity was investigated using Z-Leu-Arg-MCA as the substrate under various conditions. The optimum pH of Mscat F was pH 5.5–6.0 (Fig. 6A). More than 40% of maximum activity remained at pH 3.5 and 6.5, and activity nearly disappeared at pH 3.0 and 7.0. When the reaction temperature was varied, maximum activity occurred at 37 °C and there was no activity at 50 °C (Fig. 6B). Mscat F was relatively stable up to 35 °C after an incubation time of 30 min (Fig. 6B).

3.7. Inhibition of Mscat F by various cystatins

Inhibition of Mscat F was investigated using three cystatins, MsCPI (insect), egg-white cystatin (animal) and sunflower cystatin Sca (plant). Although all three cystatins inhibited Mscat F, they differed in effectiveness (Table 2). The inhibitory activity of MsCPI (Ki = 1.2 nM) was ~20-fold stronger than that of egg-white cystatin (25.3 nM) and ~260-fold stronger than that of Sca (310 nM).

3.8. Action of Mscat F on the MsCPI tandem protein

To investigate whether Mscat F can function as a processing protease of the MsCPI precursor, recombinant MsCPI tandem protein was produced in E. coli. After the MsCPI tandem protein was incubated with Mscat F at 37 °C for 1 h, both SDS-PAGE and a papain inhibition assay were performed (Fig. 7). SDS-PAGE of the digest showed a ladder of protein bands, suggesting that the tandem protein was degraded nonspecifically by Mscat F. The inhibitory activity of the digest against papain decreased to only 15% relative to the tandem protein. In contrast, when the MsCPI tandem protein was treated with trypsin, the digest yielded a single band with mobility similar to that of the mature MsCPI protein (Fig. 7A). The inhibitory activity against papain of the tryptic digest was four times higher than that of the MsCPI tandem protein (Fig. 7B).

[Fig. 4 diagram: domain maps for M. sexta, B. mori, T. castaneum, A. mellifera, D. melanogaster, O. viverrini and H. sapiens. Legend: signal sequence; cystatin-like domain having complete conserved sequences; cystatin-like domain having partial conserved sequences; MsCPI-type domain; procathepsin F-like domain; unknown function. Scale bar, 500 aa.]

Fig. 4. Comparison of domain structures of the MsCPI precursor protein and procathepsin F-like proteins. Among the three conserved regions of a cystatin (G, Q-X-V-X-G, and PW), domains with the motif Q-X-V-X-G or a related sequence are defined as cystatin-like domains. M. sexta, MsCPI precursor protein (GenBank accession no. AB242821); B. mori, MsCPI precursor homologous protein from B. mori (KAIKObase Gene No. BGIBMGA005131); T. castaneum, cathepsin F-like cysteine protease from Tribolium castaneum (GenBank accession no. XP_973607); A. mellifera, CG12163-PA isoform A from Apis mellifera (GenBank accession no. XP_392381); D. melanogaster, CG12163 isoform A from D. melanogaster (GenBank accession no. NP_730901); H. sapiens, Homo sapiens procathepsin F (GenBank accession no. AAF13146); and O. viverrini, cysteine protease from O. viverrini (GenBank accession no. AAV69023).

4. Discussion

Proteinaceous cysteine protease inhibitors inactivate cysteine proteases from various organisms. In mammals, CPIs are recognized as regulatory proteins for endogenous cysteine proteases and also as defensive proteins that inhibit exogenous cysteine proteases that facilitate infections caused by bacteria and viruses. Although it is thought that CPIs from insects also have similar functions, little is known about the structures and functions of insect CPI proteins.

To date, the CPI from larval hemolymph of M. sexta is the third CPI protein to be purified from insects. MsCPI has an apparent molecular mass of 11.5 kDa, whereas the size of the mRNA is unexpectedly large (~9 kb) and is inconsistent with the molecular mass of the inhibitor protein. Therefore, we searched for a gene encoding an MsCPI precursor protein whose mass would be consistent with the size of the mRNA.

cDNA and genomic cloning indicated that there was indeed an MsCPI precursor protein (2676 amino acid residues) that contained multifunctional domains, including four cystatin-like domains, nine tandemly repeated MsCPI domains and one procathepsin F-like domain. In insects, genes encoding proteins with multi-CPI domains and a cysteine protease domain (cathepsin F), such as the MsCPI precursor, are found in another lepidopteran (B. mori) and a coleopteran (T. castaneum). In contrast, such a gene has not so far been found in a dipteran (D. melanogaster) or a hymenopteran (A. mellifera). It is unclear whether the gene was acquired or lost during the evolutionary process. However, that question will be answered when a larger number of genome sequences have been determined in the future.

The nine MsCPI domains share 88–100% identity with each other and contain three conserved motifs. Therefore, it is hypothesized that the MsCPI domains play similar physiological roles. The cystatin-like domain MsCys-1 has three conserved motifs, but the other cystatin-like domains are less conserved. We have attempted to express the MsCys-1–4 proteins in E. coli and to investigate their cysteine protease inhibitory activities.

Although both Z-Leu-Arg-MCA and Z-Phe-Arg-MCA are effective substrates for Mscat F, the kcat/Km toward Z-Phe-Arg-MCA was only 40% of that toward Z-Leu-Arg-MCA. This preference resembles those of human cathepsins K and S, but differs from that of human cathepsin F, which hydrolyzed substrates with Leu and Phe residues in the P2 position equally (Wang et al., 1998). Based on X-ray crystallographic studies of papain-like proteases, such as cruzain and cathepsins B, L, and K (McGrath et al., 1997; Kamphuis et al., 1985; Musil et al., 1991; McGrath et al., 1995), the S2 subsite, which binds the P2 amino acid residue of substrates, defines the primary substrate specificity of cathepsins. The crystal structure of human cathepsin F indicates that its S2 pocket is formed by several amino acid residues, such as Leu67, Pro68, Ala133, Ile157, Asp158, Ala160 and Met205 (Fig. 3). Among these residues, three of them, Ala133, Ile157 and Ala160, are replaced by Gly133, Leu159, and Gly162 in Mscat F, respectively, suggesting that these substitutions affect the substrate preference of Mscat F.

Mscat F was inhibited by MsCPI (Ki = 1.2 nM) more strongly than by egg-white cystatin (Ki = 25.3 nM) and Sca (Ki = 310 nM). Thus, MsCPI is an effective inhibitor of the cognate cysteine protease, Mscat F. This tendency was also observed with two other cystatins, one of animal origin (egg-white cystatin and human cathepsins) and one of plant origin (sunflower cystatin and papain), suggesting that many cystatins inhibit their cognate cysteine proteases effectively (Table 2).
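The fold differences behind this comparison follow directly from the reported Ki values toward Mscat F. The following is a minimal sketch; the dictionary and function names are hypothetical, and the Ki values are those given in the text.

```python
# Ki values (nM) toward Mscat F, as reported in the text and Table 2.
KI_TOWARD_MSCAT_F_NM = {
    "MsCPI": 1.2,
    "egg-white cystatin": 25.3,
    "Sca": 310.0,
}


def fold_weaker(ki_nm: float, reference_ki_nm: float) -> float:
    """How many times larger (weaker) a Ki is than a reference Ki."""
    return ki_nm / reference_ki_nm


if __name__ == "__main__":
    reference = KI_TOWARD_MSCAT_F_NM["MsCPI"]
    for inhibitor, ki in KI_TOWARD_MSCAT_F_NM.items():
        print(f"{inhibitor}: Ki = {ki} nM, {fold_weaker(ki, reference):.1f}-fold relative to MsCPI")
```

The ratios come out at roughly 21 and 258, in line with the approximately 20-fold and 260-fold differences quoted in Section 3.7.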

Fig. 6. Effects of pH and temperature on Mscat F activity. A. Effect of pH on Mscat F activity, assayed using Z-Leu-Arg-MCA as the substrate at 37 °C for 10 min in buffers containing 1 mM DTT and 1 mM EDTA. Buffers used were 100 mM sodium formate buffer (pH 3.0–4.0, circles), 100 mM sodium acetate buffer (pH 4.0–5.5, triangles) and 100 mM sodium phosphate buffer (pH 5.5–7.0, squares). B. Effect of temperature on Mscat F activity. The optimum temperature (closed circles) of Mscat F was investigated at 20–50 °C for 10 min in 100 mM sodium acetate, pH 5.5, containing 1 mM DTT and 1 mM EDTA. Heat stability (open circles) of Mscat F was investigated after preincubation at 20–50 °C for 30 min.

Fig. 5. Overexpression and activation of the Msprocat F protein. A. SDS-PAGE of expressed proteins in E. coli cells. Lane 1, total cell extracts of E. coli cells without induction; lane 2, total cell extracts after induction by IPTG; lane 3, soluble fraction of E. coli cells after sonication; lane 4, insoluble fraction after sonication; and lane 5, purified Mscat F protein. M, molecular weight markers: bovine serum albumin (67.0 kDa), ovalbumin (45.0 kDa), α-chymotrypsinogen (25.6 kDa), soybean trypsin inhibitor (20.1 kDa) and lysozyme (14.3 kDa). B. Mscat F activity measured using Z-Leu-Arg-MCA as the substrate. The assay was performed at 37 °C for 10 min in 100 mM sodium acetate buffer, pH 5.5, after incubation of Msprocat F with porcine pepsin at 37 °C for 0–5 h. C. Western blotting of rMsprocat F using mouse anti-rMsprocat F antiserum. Western blotting was performed after incubation of Msprocat F with porcine pepsin at 37 °C for 0–5 h. Three protein bands (i, ii, and iii) derived from Msprocat F were detected. R indicates Msprocat F after refolding and dialysis against 50 mM Tris-HCl buffer, pH 8.0.

Table 1
Hydrolysis of synthetic substrates by Mscat F.
Substrates  kcat (s⁻¹)  Km (μM)  kcat/Km (×10³ s⁻¹ M⁻¹)  Relative activity (%)
Z-Leu-Arg-MCA  0.589  1.12  527  100
Z-Phe-Arg-MCA  0.310  1.45  215  40.8
Boc-Val-Leu-Lys-MCA  1.13  8.65  131  24.9
Suc-Leu-Tyr-MCA  0.284  13.9  20.4  3.87
Z-Arg-Arg-MCA  0.00853  39.7  0.215  0.0408
Boc-Leu-Lys-Arg-MCA  n.d.  n.d.  n.d.
Boc-Glu-Lys-Lys-MCA  n.d.  n.d.  n.d.
The kinetic constants were determined at pH 5.5, 37 °C. Relative activities are shown with the kcat/Km value for Z-Leu-Arg-MCA taken as 100%. n.d. = no activity detected.

Table 2
Inhibition constants (Ki) of various cystatins toward Mscat F and five other cysteine proteases.
Cysteine proteases  M. sexta MsCPI  Chicken egg-white cystatin  Sunflower Sca
M. sexta
Cathepsin F-like protease  1.2  25.3  310
Human
Cathepsin B  6.8  1.7
Cathepsin F  0.06
Cathepsin H  3.0  0.064
Cathepsin L  0.87  0.019
Papaya
Papain  3.1  4.2
The Ki values (nM) toward Mscat F were determined using the method of Henderson (1972). The Ki values toward cathepsins F, B, H and L for chicken egg-white cystatin and MsCPI are from the references (Barrett and Kirschke, 1981; Fonovic et al., 2004; Miyaji et al., 2007). The Ki value toward papain for sunflower cystatin Sca is from Kouzuma et al. (2001).

So far the physiological functions of cathepsin F in mammals have been reported. One of the functions is in antigen processing and presentation (Shi et al., 2000). The major histocompatibility complex (MHC) class II-associated invariant chain (Ii), which binds to the peptide binding groove of newly synthesized MHC class II peptides, regulates intracellular trafficking and peptide loading of MHC class II molecules. In general, peptide loading occurs after degradation of Ii molecules by a series of endosomal and lysosomal proteases, cathepsins L and S (Nakagawa et al., 1998; Driessen et al., 1999). However, Shi et al. (2000) indicated that cathepsin F is involved in Ii peptide processing, since macrophages from mice that were deficient in both cathepsins L and S were still capable of processing Ii. Thus, in higher organisms with an acquired immune system, cathepsin F functions as an enzyme responsible for antigen processing and presentation. However, since insects apparently lack a similar system for acquired immunity, the role(s) of cathepsin F in insects remains unclear. Therefore, we investigated the function of Mscat F.

In the boundary sequence between two MsCPI domains, there are two basic amino acid motifs, KR and RXXR (Fig. 2A). Generally, prohormones with these motifs undergo proteolytic cleavage at the N- or C-terminal side of those residues or between the two basic residues by cathepsin L or prohormone proteases (Hook et al., 2004). Since cathepsin F belongs to the cathepsin L family and Mscat F can cleave at the C-terminal side of an arginine residue, a possible role of Mscat F is in the processing of the MsCPI precursor protein. To assess this possibility, a recombinant "MsCPI tandem protein" with two MsCPI domains linked by the motif RAKR was constructed. The inhibitory activity of the MsCPI tandem protein toward papain was lower than that of a single domain (MsCPI), possibly because of steric hindrance caused by the proximity of the two domains or for other reasons (data not shown). Hence, the MsCPI tandem protein was incubated with Mscat F at an enzyme/inhibitor molar ratio of 1/9, which corresponds to the MsCPI/Msprocat F domain ratio in the MsCPI precursor molecule. However, the tandem protein was not processed specifically by Mscat F and the digest did not inhibit papain. On the other hand, trypsin did convert the MsCPI tandem protein to a single domain of MsCPI with inhibitory activity against papain. Thus, the MsCPI tandem protein was inactivated by Mscat F digestion but converted to an active protein by trypsin. These results suggest that the processing in vivo of the MsCPI precursor protein is catalyzed not by Mscat F but by some trypsin-like or prohormone-processing protease.

In insects, several cysteine proteases are involved in assorted physiological processes (Reeck et al., 1999). In S. peregrina, cathepsin B is involved in dissociation of the fat body during metamorphosis (Takahashi et al., 1993). Cathepsin L is essential for the differentiation of imaginal discs in S. peregrina cells (Homma et al., 1994) and is involved in larval molting and metamorphosis in Helicoverpa armigera (Wang et al., 2010). More recently, several cathepsin-like proteases including cathepsins B and L were found in M. sexta, and cysteine protease activity increased after parasitism by the parasitoid wasp, Cotesia congregata (Serbielle et al., 2009). Although the functions of Mscat F remain undetermined, Mscat F may play roles similar to those of these cysteine proteases. On the other hand, MsCPI can strongly inhibit Mscat F and various other cysteine proteases including animal cathepsins (Miyaji et al., 2007). Therefore, it is proposed that MsCPI is involved in the regulation of endogenous protease activity and/or in the inhibition of cysteine proteases derived from invading organisms. To further investigate these speculations, we need to clarify the processing mechanism of the MsCPI-cat F precursor protein in various tissues and during development of M. sexta. Elucidation of these regulatory systems will provide new insights for future biological research on insect proteolytic enzymes and their inhibitors.

Fig. 7. Action of Mscat F and trypsin on the MsCPI tandem protein. MsCPI tandem protein and Mscat F or trypsin were incubated at an enzyme/inhibitor molar ratio of 1/9 in 1 mM Na-acetate buffer, pH 5.5, containing 1 mM DTT and 1 mM EDTA at 37 °C for 1 h. A. SDS-PAGE of MsCPI tandem protein digested with Mscat F or trypsin. Lane 1, MsCPI tandem protein; lane 2, MsCPI tandem protein digested with Mscat F; lane 3, MsCPI tandem protein digested with trypsin; and lane 4, rMsCPI. M, marker. B. Inhibitory activity toward papain of MsCPI tandem protein digested with Mscat F or trypsin. 1, MsCPI tandem protein; 2, MsCPI tandem protein digested with Mscat F; and 3, MsCPI tandem protein digested with trypsin.

References

Auerswald, E.A., Genenger, G., Assfalg-Machleidt, I., Machleidt, W., Engh, R.A., Fritz, H., 1992. Recombinant chicken egg white cystatin variants of the QLVSG region. Eur. J. Biochem. 209, 837–845.
Barrett, A.J., Kirschke, H., 1981. Cathepsin B, cathepsin H, and cathepsin L. Methods Enzymol. 80, 535–561.
Barrett, A.J., Kembhavi, A.A., Brown, M.A., Kirschke, H., Knight, C.G., Tamai, M., Hanada, K., 1982. L-trans-Epoxysuccinyl-leucylamido(4-guanidino)butane (E-64) and its analogues as inhibitors of cysteine proteinases including cathepsins B, H and L. Biochem. J. 201, 189–198.
Blin, N., Stafford, D.W., 1976. A method for the preparation of DNA from calf thymus nuclei in maximum purity and yield, analysis of this DNA compared with that obtained by other methods. Nucleic Acids Res. 3, 2303–2308.
Brömme, D., Nallaseth, F.S., Turk, B., 2004. Production and activation of recombinant papain-like cysteine proteases. Methods 32, 199–206.
Delbridge, M.L., Kelly, L.E., 1990. Sequence analysis, and chromosomal localization of a gene encoding a cystatin-like protein from Drosophila melanogaster. FEBS Lett. 274, 141–145.
Driessen, C., Bryant, R.A., Lennon-Duménil, A.M., Villadangos, J.A., Bryant, P.W., Shi, G.P., Chapman, H.A., Ploegh, H.L., 1999. Cathepsin S controls the trafficking and maturation of MHC class II molecules in dendritic cells. J. Cell Biol. 147, 775–790.
Dubin, G., 2005. Proteinaceous cysteine protease inhibitors. Cell. Mol. Life Sci. 62, 653–669.
Fonovic, M., Brömme, D., Turk, V., Turk, B., 2004. Human cathepsin F: expression in baculovirus system, characterization and inhibition by protein inhibitors. Biol. Chem. 385, 505–509.
Hay, B.A., 2000. Understanding IAP function and regulation: a view from Drosophila. Cell Death Differ. 7, 1045–1056.
Henderson, P.J., 1972. A linear equation that describes the steady-state kinetics of enzymes and subcellular particles interacting with tightly bound inhibitors. Biochem. J. 127, 321–333.
Homma, K., Kurata, S., Natori, S., 1994. Purification, characterization, and cDNA cloning of procathepsin L from the culture medium of NIH-Sape-4, an embryonic cell line of Sarcophaga peregrina (flesh fly), and its involvement in the differentiation of imaginal discs. J. Biol. Chem. 269, 15258–15264.
Hook, V., Yasothornsrikul, S., Greenbaum, D., Medzihradszky, K.F., Troutner, K., Toneff, T., Bundey, R., Logrinova, A., Reinheckel, T., Peters, C., Bogyo, M., 2004. Cathepsin L and Arg/Lys aminopeptidase: a distinct prohormone processing pathway for the biosynthesis of peptide neurotransmitters and hormones. Biol. Chem. 385, 473–480.
Jenko, S., Dolenc, I., Guncar, G., Dobersek, A., Podobnik, M., Turk, D., 2003. Crystal structure of Stefin A in complex with cathepsin H: N-terminal residues of inhibitors can adapt to the active sites of endo- and exopeptidases. J. Mol. Biol. 326, 875–885.
Jiang, H., Wang, Y., Yu, X.Q., Kanost, M.R., 2003. Manduca sexta serpin-3 regulates prophenoloxidase activation in response to infection by inhibiting prophenoloxidase-activating proteinases. J. Biol. Chem. 278, 3552–3561.
Kamphuis, I.G., Drenth, J., Baker, E.N., 1985. Thiol proteases. Comparative studies based on the high-resolution structures of papain and actinidin, and on amino acid sequence information for cathepsins B and H, and stem bromelain. J. Mol. Biol. 182, 317–329.
Kouzuma, Y., Tsukigata, K., Inanaga, H., Doi-Kawano, K., Yamasaki, N., Kimura, M., 2001. Molecular cloning and functional expression of cDNA encoding the cysteine proteinase inhibitor Sca from sunflower seeds. Biosci. Biotechnol. Biochem. 65, 969–972.
Laemmli, U.K., 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680–685.
McGrath, M.E., Eakin, A.E., Engel, J.C., McKerrow, J.H., Craik, C.S., Fletterick, R.J., 1995. The crystal structure of cruzain: a therapeutic target for Chagas' disease. J. Mol. Biol. 247, 251–259.
McGrath, M.E., Klaus, J.L., Barnes, M.G., Brömme, D., 1997. Crystal structure of human cathepsin K complexed with a potent inhibitor. Nat. Struct. Biol. 4, 105–109.
Miyaji, T., Kouzuma, Y., Yaguchi, J., Matsumoto, R., Kanost, M.R., Kramer, K.J., Yonekura, M., 2007. Purification of a cysteine protease inhibitor from larval hemolymph of the tobacco hornworm (Manduca sexta) and functional expression of the recombinant protein. Insect Biochem. Mol. Biol. 37, 960–968.
Musil, D., Zucic, D., Turk, D., Engh, R.A., Mayr, I., Huber, R., Popovic, T., Turk, V., Towatari, T., Katunuma, N., Bode, W., 1991. The refined 2.15 Å X-ray crystal structure of human liver cathepsin B: the structural basis for its specificity. EMBO J. 10, 2321–2330.
Na, B.K., Kang, J.M., Sohn, W.M., 2008. CsCF-6, a novel cathepsin F-like cysteine protease for nutrient uptake of Clonorchis sinensis. Int. J. Parasitol. 38, 493–502.
Nägler, D.K., Sulea, T., Ménard, R., 1999. Full-length cDNA of human cathepsin F predicts the presence of a cystatin domain at the N-terminus of the cysteine protease zymogen. Biochem. Biophys. Res. Commun. 257, 313–318.
Nakagawa, T., Roth, W., Wong, P., Nelson, A., Farr, A., Deussing, J., Villadangos, J.A., Ploegh, H., Peters, C., Rudensky, A.Y., 1998. Cathepsin L: critical role in Ii degradation and CD4 T cell selection in the thymus. Science 280, 450–453.
Pinlaor, P., Kaewpitoon, N., Laha, T., Sripa, B., Kaewkes, S., Morales, M.E., Mann, V.H., Parriott, S.K., Suttiprapa, S., Robinson, M.W., To, J., Dalton, J.P., Loukas, A., Brindley, P.J., 2009. Cathepsin F cysteine protease of the human liver fluke, Opisthorchis viverrini. PLoS Negl. Trop. Dis. 3, e398.
Rawlings, N.D., Tolle, D.P., Barrett, A.J., 2004. Evolutionary families of peptidase inhibitors. Biochem. J. 378, 705–716.
Reeck, G., Oppert, B., Denton, M., Kanost, M., Baker, J.E., Kramer, K.J., 1999. Insect proteinases. In: Turk, V. (Ed.), Proteases: New Perspectives. Birkhauser Verlag, Basel, Switzerland, pp. 125–148.
Rzychon, M., Chmiel, D., Stec-Niemczyk, J., 2004. Modes of inhibition of cysteine proteases. Acta Biochim. Pol. 51, 861–873.
Saito, H., Suzuki, T., Ueno, K., Kubo, T., Natori, S., 1989. Molecular cloning of cDNA for sarcocystatin A and analysis of the expression of the sarcocystatin A gene during development of Sarcophaga peregrina. Biochemistry 28, 1749–1755.
Serbielle, C., Moreau, S., Veillard, F., Voldoire, E., Bézier, A., Mannucci, M.A., Volkoff, A.N., Drezen, J.M., Lalmanach, G., Huguet, E., 2009. Identification of parasite-responsive cysteine proteases in Manduca sexta. Biol. Chem. 390, 493–502.
Shi, G.P., Bryant, R.A., Riese, R., Verhelst, S., Driessen, C., Li, Z., Brömme, D., Ploegh, H.L., Chapman, H.A., 2000. Role for cathepsin F in invariant chain processing and major histocompatibility complex class II peptide loading by macrophages. J. Exp. Med. 191, 1177–1186.
Suzuki, T., Natori, S., 1985. Purification and characterization of an inhibitor of the cysteine protease from the hemolymph of Sarcophaga peregrina larvae. J. Biol. Chem. 260, 5115–5120.
Takahashi, N., Kurata, S., Natori, S., 1993. Molecular cloning of cDNA for the 29 kDa proteinase participating in decomposition of the larval fat body during metamorphosis of Sarcophaga peregrina (flesh fly). FEBS Lett. 334, 153–157.
Turk, V., Turk, B., Turk, D., 2001. Lysosomal cysteine proteases: facts and opportunities. EMBO J. 20, 4629–4633.
Wang, B., Shi, G.P., Yao, P.M., Li, Z., Chapman, H.A., Brömme, D., 1998. Human cathepsin F. Molecular cloning, functional expression, tissue localization, and enzymatic characterization. J. Biol. Chem. 273, 32000–32008.
Wang, L.F., Chai, L.Q., He, H.J., Wang, Q., Wang, J.X., Zhao, X.F., 2010. A cathepsin L-like proteinase is involved in moulting and metamorphosis in Helicoverpa armigera. Insect Mol. Biol. 19, 99–111.
Yamamoto, Y., Watabe, S., Kageyama, T., Takahashi, S.Y., 1999. A novel inhibitor protein for Bombyx cysteine proteinase is homologous to propeptide regions of cysteine proteinases. FEBS Lett. 448, 257–260.
Young, R.A., Davis, R.W., 1983. Efficient isolation of genes by using antibody probes. Proc. Natl. Acad. Sci. U.S.A. 80, 1194–1198.