Turkish Journal of Biology Turk J Biol (2016) 40: 69-83 http://journals.tubitak.gov.tr/biology/ © TÜBİTAK Research Article doi:10.3906/biy-1412-88

Proteomics comparison of aspartic protease in insects

Samin SEDDIGH 1,*, Maryam DARABI 2
1 Department of Plant Protection, Varamin-Pishva Branch, Islamic Azad University, Varamin, Iran
2 Department of Agronomy and Plant Breeding Sciences, College of Aboureihan, University of Tehran, Tehran, Iran

Received: 25.12.2014 Accepted/Published Online: 27.04.2015 Final Version: 05.01.2016

Abstract: Aspartic proteases (APs; EC: 3.4.23) are a catalytic type of protease that uses an activated water molecule, bound to one or more aspartate residues, for catalysis of their peptide substrates. In this study, bioinformatic analyses of AP enzymes were performed on insect protein sequences from nineteen species of eleven different families. According to the conserved motifs obtained with the MEME and MAST tools, three motifs were common to all insects. The structural and functional analyses of five selected insects from different orders were performed with ProtParam, SOPMA, SignalP 4.1, TMHMM 2.0, and ProDom tools in the ExPASy database. The tertiary structure of the Apis mellifera AP, taken as a representative insect AP, was predicted by the Phyre2 server using the “1qdm” model (PDB accession code: 1qdm) and was compared with Swiss-Model predictions. The quality of the 3D structure was verified by the PROCHECK server. MegAlign was used for protein sequence alignment, and a phylogenetic tree was constructed with MEGA 6.06 software using the neighbor-joining method. In protein–protein interaction analysis with STRING 9.1, zero and twenty-seven significant protein interaction groups were identified in A. mellifera and other species, respectively. The obtained data provide a background for bioinformatic studies of the function and evolution of APs in other insects and organisms.

Key words: Aspartic proteases, bioinformatics, insects, phylogenetic analysis, protein–protein interaction, structural analysis, tertiary structure

* Correspondence: [email protected]

1. Introduction
Proteases (peptidases or proteinases) are a large category of enzymes that catalyze the hydrolysis of peptide bonds. Proteases that cleave peptide bonds within the polypeptide chain are classified as endopeptidases, and those that cleave peptide bonds at the N or C termini of polypeptide chains are known as exopeptidases (López-Otín and Bond, 2008). Thus, the term “protease” includes both “endopeptidases” and “exopeptidases”, while the term “proteinase” is used to describe only “endopeptidases” (Ryan, 1990). Proteases are present in all organisms and are involved in various physiological processes, including the production of nutrients for cell growth and proliferation (Lin et al., 2011), protein degradation (Ciechanover, 2005), and the regulation of diverse physiological functions (Ehrmann and Clausen, 2004; Oikonomopoulou et al., 2006). Proteases include aspartic, cysteine, glutamic, serine, and threonine proteases, depending on the amino acids present in the active site, or the metalloproteases, if a metal ion is required for catalytic activity (Alagarsamy et al., 2006). Microbial-sourced proteases are invaluable commercial enzymes and account for approximately 60% of the total worldwide sales of industrial enzymes (Rao et al., 1998). They are used in the food, detergent, pharmaceutical, and biotechnology industries (Najafi et al., 2005; López-Otín and Bond, 2008; Yegin et al., 2011; Sabotič and Kos, 2012; Zhao et al., 2013).

Aspartic proteases (APs; EC: 3.4.23), commonly known as acidic proteases, are one of the major catalytic classes of proteases, which are widely distributed not only in plants but also among vertebrates, yeasts, nematode parasites, fungi, and viruses (Davies, 1990; Rawlings and Barrett, 1995). They are also found in insects; for instance, several digestive aspartic peptidases were reported in larval Cyclorrhapha, such as in Musca domestica (Greenberg and Paretsky, 1955), Stomoxys calcitrans (Lambremont et al., 1959), Calliphora vomitoria (Fraser et al., 1961), M. domestica and Sarcophaga ruficornis (Sinha, 1975), and C. vicina (Pendola and Greenberg, 1975). In M. domestica, it was proposed that the midgut aspartic peptidase is a cathepsin D-like enzyme (Lemos and Terra, 1991). Other insects have digestive aspartic peptidases in their guts, including some species of Hemiptera (Houseman and Downe, 1983; Knop Wright et al., 2006) and Coleoptera (Thie and Houseman, 1990; Wolfson and Murdock, 1990; Silva and Xavier-Filho, 1991). In species of six families of the order Hemiptera, APs (cathepsin D-like proteinases) were found along with cysteine proteinases (Houseman and Downe, 1983). The temporal activity profile of an AP is associated with fat body histolysis during the early metamorphosis of Ceratitis capitata (Rabossi et al., 2004). A lysosomal AP from Aedes aegypti has been purified and characterized (Cho et al., 1991). Two AP-encoding complementary deoxyribonucleic acids were characterized from the small intestine (posterior midgut) of Triatoma infestans (Balczun et al., 2012). AP was also cloned from Chilo suppressalis (Walker) with the PCR method, and its sequence characteristics and variations in the Chinese populations were analyzed (Li et al., 2014). Two aspartic proteases were found in the fecal fluid of the leaf-cutting ant Acromyrmex echinatior (Kooij et al., 2014). AP activity has also been detected in recombinant proteins of bacterial origin (Hill and Phylip, 1997). APs are known to play a crucial role in a wide range of biological functions (Zhao et al., 2013), and they have received considerable attention because of their significant role in human diseases (Umezawa et al., 1970). APs are a catalytic type of protease enzymes on which comprehensive reviews have been published (Fruton, 1971; Tang, 1979). They are endopeptidases that depend on aspartic acid residues for their catalytic activity. The active-site residue of APs is located within the motif Asp-Xaa-Gly, in which Xaa can be Ser or Thr (Blundell et al., 1991; Sielecki et al., 1991).

APs use an activated water molecule, bound to one or more aspartate residues, for catalysis of their peptide substrates (Baldwin et al., 1993) (Figure 1). APs contain certain important enzymes such as , , , cathepsin D, and proteases isolated from different fungi. These enzymes are homologous in their amino acid sequences, particularly around the active-site residues (Fruton, 1971; Tang, 1979; Fruton, 1987). According to the MEROPS (http://merops.sanger.ac.uk/) database, created by Rawlings and Barrett (1999), APs are grouped into 16 different families on the basis of their amino acid sequence homology, which in turn are assembled into six different clans based on their evolutionary relationship and tertiary structure (Rawlings and Barrett, 1995). MEROPS contains a full listing of all sequences related to the AP family. Most APs show maximal activity at a low pH (pH 3–4) and have isoelectric points in the range of pH 3–4.5. Their molecular masses are in the range of 30–45 kDa (Blundell et al., 1991; Sielecki et al., 1991). The low pH of the midguts of many members of Coleoptera and Hemiptera provides more favorable environments for APs (pH optimum ~3–5) than the high pH of most insect guts (pH optimum ~8–11) (Houseman et al., 1987), where the aspartic and cysteine proteinases would not be active. Several insect species have comparable pH optima for AP: for example, 3.5 in Rhodnius prolixus (Terra et al., 1988), 4.5 in Leptinotarsa decemlineata (Thie and Houseman, 1990), 3.3 in Callosobruchus maculatus (Silva and Xavier-Filho, 1991), 3.0 in A. aegypti (Cho et al., 1991), 3.0–3.5 in larval M. domestica (Lemos and Terra, 1991), 4.0 in Parasarcophaga surcoufi (Dorrah et al., 2000), and 4.0 in Creontiades dilutus and Parasarcophaga hertipes (Colebatch et al., 2001; Elmelegi et al., 2006).

Figure 1. Aspartic protease mechanism.

The effect of some peptidase inhibitors on the activity of APs is important in plant protection. Inhibition of aspartic peptidases by thiol compounds has been reported previously in other insects such as R. prolixus (Houseman and Downe, 1982), P. hertipes (Barrett, 1977), and L. decemlineata (Thie and Houseman, 1990). Inhibition assays indicated that cysteine, aspartyl (cathepsin D), serine (trypsin, chymotrypsin-like) proteases, and metalloproteases act in cereal leaf beetle digestion (Wielkopolan et al., 2015). The trypsin inhibitors present in soybean were subsequently shown to be toxic to the larvae of the flour beetle Tribolium confusum (Lipke et al., 1954).

Bioinformatics plays a fundamental role in the analysis and interpretation of genomic and proteomic data. It uses methods and technologies from mathematics, statistics, computer sciences, physics, biology, and medicine (Romano et al., 2011). In this study, we applied bioinformatics analysis to APs for elucidation of structural, functional, and phylogenetic relationships in insects. Analyses of multiple alignment, molecular mass, isoelectric point, signal peptide, motifs, transmembrane domain, secondary and spatial structure, and evolution were performed to provide an understanding of the evolutionary mechanisms of the AP family.

2. Materials and methods
2.1. Aspartic proteases sequence retrieval
The AP protein reference sequences (RefSeq) belonging to different insect species were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov; accessed October 2014) in FASTA format. Due to the large number of sequences available for AP, it was not feasible to analyze all of them. Thus, the data set consisted of AP sequences from 19 insect species, which are listed in Table 1 with their accession numbers. One species from each order of insects (Coleoptera, Diptera, Hymenoptera, Lepidoptera, and Phthiraptera) was selected as the representative of that order for structural analysis.
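As an illustration of handling sequences retrieved in FASTA format, the following minimal Python sketch reads FASTA text into a dictionary keyed by accession number. The function name `parse_fasta` is hypothetical, and the residues in the toy input are placeholders rather than the real RefSeq sequences; only the two accession numbers are taken from Table 1.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record identifier: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            records[current_id] += line.upper()
    return records


# Toy input: accession numbers are from Table 1, but the residues are
# placeholders and NOT the real protein sequences.
example = """>XP_392857 placeholder header
MKTLLVAL
DTGSS
>NP_001037351 placeholder header
MGKAILL
""".splitlines()

records = parse_fasta(example)
lengths = {accession: len(seq) for accession, seq in records.items()}
print(lengths)
```

Keying the records by accession number makes it straightforward to select the representative species for the structural analyses.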

70 SEDDIGH and DARABI / Turk J Biol

Table 1. The nineteen insect species analyzed in the current study.

Index  Scientific name                         Abbreviation  Family         Accession number
1      Apis dorsata                            Ad-AP         Apidae         XP_006607509
2      Apis florea                             Af-AP         Apidae         XP_003693293
3      Apis mellifera                          Am-AP         Apidae         XP_392857
4      Bombus impatiens                        Bi-AP         Apidae         XP_003489428
5      Bombyx mori                             Bm-AP         Bombycidae     NP_001037351
6      Bombus terrestris                       Bt-AP         Apidae         XP_003403066
7      Ceratitis capitata                      Cc-AP         Tephritidae    XP_004531541
8      Culex quinquefasciatus                  Cq-AP         Culicidae      XP_001867326
9      Drosophila ananassae                    Da-AP         Drosophilidae  XP_001955129
10     Drosophila persimilis                   Dpe-AP        Drosophilidae  XP_002013092
11     Drosophila pseudoobscura pseudoobscura  Dp-AP         Drosophilidae  XP_001358331
12     Drosophila willistoni                   Dw-AP         Drosophilidae  XP_002064935
13     Drosophila yakuba                       Dy-AP         Drosophilidae  XP_002098017
14     Megachile rotundata                     Mr-AP         Megachilidae   XP_003705085
15     Musca domestica                         Md-AP         Muscidae       NP_001273807
16     Microplitis demolitor                   Mde-AP        Braconidae     XP_008560602
17     Nasonia vitripennis                     Nv-AP         Pteromalidae   XP_001600858
18     Pediculus humanus corporis              Ph-AP         Pediculidae    XP_002427417
19     Tribolium castaneum                     Tc-AP         Tenebrionidae  XP_966517

2.2. Multiple alignment and conserved motif analysis
There are many tools for determining the presence or absence of noticeable domains; however, they are unable to recognize smaller individual motifs and more divergent patterns. Accordingly, the motifs of the protein sequences were identified using the Multiple EM for Motif Elicitation program (MEME; version 4.9.1) and the Motif Alignment and Search Tool (MAST; version 4.9.1), available at http://meme.nbcr.net/meme (Bailey et al., 2009), to study the variation of AP protein in insects. The program was set to report the 10 most robust motifs of 7–30 amino acids, occurring zero times or once per sequence, among the APs of the nineteen insect species. MegAlign from the DNASTAR package (version 12.1) was used for multiple alignment analysis of APs. Sequences were aligned using the ClustalW method with default parameters.

2.3. Structural and functional analysis
Several online web services and software programs were used for the analysis of AP in insects. Comparative and bioinformatics analyses were carried out online on the ExPASy website (http://expasy.org/tools). Molecular weight, isoelectric points, and secondary structures of AP proteins were estimated using Compute pI/Mw (http://web.expasy.org/compute_pi/) (Bjellqvist et al., 1993, 1994; Gasteiger et al., 2005) and SOPMA (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html) (Geourjon and Deleage, 1995) in each representative species. The presence and location of signal peptide cleavage sites in AP sequences were predicted by the SignalP 4.1 server (Petersen et al., 2011). TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/) (Moller et al., 2001) and ProDom (http://prodom.prabi.fr/prodom/current/html/form.php) (Servant et al., 2002) were used to identify transmembrane helices and functional domains in AP protein, respectively.

2.4. Tertiary structure prediction and evaluation of the model
The tertiary structure prediction and visualization of AP were performed with the Protein Homology/analogY Recognition Engine V2.0 (PHYRE2) server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) (Kelley and Sternberg, 2009) and compared with Swiss-Model predictions (Biasini et al., 2014).


In order to evaluate the stereochemical quality and accuracy of the 3D model, PROCHECK 3.5 analysis (http://www.ebi.ac.uk/thornton-srv/software/PROCHECK) was used and a Ramachandran plot was drawn (Laskowski et al., 1993, 1996).

2.5. Protein–protein interactions
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING 9.1) database (http://string-db.org/) was used to predict the interacting proteins (Franceschini et al., 2013). The database contains information from various sources, including experimental repositories, computational prediction methods, and public text collections.

2.6. Phylogenetic analysis
Phylogenetic analyses of APs based on amino acid sequences were carried out using Molecular Evolutionary Genetics Analysis (MEGA; version 6) (Tamura et al., 2013). Sequences were aligned with the ClustalW method, and a phylogenetic tree was obtained by neighbor-joining (NJ) with complete gap deletion using the Poisson substitution model, uniform rates among sites, the same pattern among lineages (homogeneous), and 1000 bootstrap replications.

3. Results
In the current study, a total of nineteen AP protein sequences from eleven different insect families were analyzed with bioinformatics tools. Five insects were chosen as representatives of their orders for structural analysis: Apis mellifera (Hymenoptera), Bombyx mori (Lepidoptera), Musca domestica (Diptera), Pediculus humanus corporis (Phthiraptera), and Tribolium castaneum (Coleoptera).

3.1. Multiple alignment and conserved motif analysis
A conserved motif is a sequence pattern that occurs repeatedly in a group of related protein sequences. Motif analyses of AP proteins were performed in order to find patterns of conserved motifs, and the MEME program was run to detect sequence motifs in the insect samples. MEME represents motifs as position-dependent letter-probability matrices, which designate the probability of each possible letter at each position in the pattern, whereas motifs in MAST are represented as position-dependent scoring matrices, which describe the score of each possible letter at each position in the pattern. Ten conserved motifs of APs were identified by MEME (Figure 2) and are listed in Table 2. These results also revealed that only three motifs (1, 8, and 10) were shared by all insects. The alignment of the protein sequences by MegAlign showed high homology among the various insects. The three shared conserved motifs are shown in the alignment report (Figure 3).

3.2. Structural and functional analysis
Molecular weight and the isoelectric point of each representative sample were predicted. The AP proteins of the representative insects ranged from 383 to 390 amino acids in length. The theoretical isoelectric points and molecular weights were in the ranges of 5.64–8.70 and 41,622.63–42,509.98 Da, respectively (Table 3).
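To make the molecular weight estimate concrete, the sketch below sums average residue masses and adds one water molecule, which is the general principle behind sequence-based molecular weight calculators. It is an illustrative approximation and not the Compute pI/Mw implementation: the function name `molecular_weight` is hypothetical, and the mass table (standard average residue masses) is an assumption, since the article does not list the values used by the server. For a protein of about 385 residues, a sum of this kind lies near 42,000 Da, that is, about 42 kDa.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
# Assumed standard values; not taken from the article.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight of a peptide in daltons."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Example on one variant of the 16-residue motif 8 pattern (Table 2),
# which is a short fragment and not a full-length AP.
fragment = "GCEAIADTGTSLIAGP"
mw_da = molecular_weight(fragment)
print(f"{len(fragment)} residues: {mw_da:.2f} Da = {mw_da / 1000:.3f} kDa")
```

The isoelectric point is not reproduced here because it additionally depends on the set of pK values chosen for the ionizable groups.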

Figure 2. Motifs for AP proteins. The MAST motifs are shown as different-colored boxes. The E-value of a sequence is the expected number of sequences in a random database of the same size that would match the motifs as well as the sequence.


Table 2. Motif sequences identified with MEME.

Motif  Width  E-value   Sites  Sequence
1      23     1.2e-272  19     P[PK]Q[DK]F[KR]V[IV]FDTGSSNLWVP[SG]KKC
2      24     6.1e-251  18     ILGDVFIG[RK][YF]YT[EV]FDMGNNRVGFA
3      30     5.5e-286  18     QTFAEALSEPGL[AV]FVAAKFDGILG[ML][AG]YS[KS]I
4      30     1.1e-284  18     NIAC[KL]LH[NR]KY[DN][SN][ST]KSSTY[KV]KNGTDFAI[RQ]YG
5      30     2.8e-241  12     Y[VI]L[KR]V[TA]Q[FM]GKT[VI]CLSGFMG[MI]DIPPPNGPLW
6      30     1.4e-225  18     HY[ES]G[SD]FTY[VL]PV[DT][RK]K[GA]YWQFKMDS[IV]S[VI]GSDK
7      22     5.4e-177  15     P[QT]PEPLSNYLDAQY[YF]G[VP]I[ST]IGT
8      16     1.2e-149  19     GC[EQ]AIAD[TS]GTSLIAGP
9      22     2.2e-180  18     P[VP]FYNM[VYC]KQ[GN]L[VI]P[QS][PC][VI]FSFYLN
10     22     8.4e-144  19     SGS[LV]SGYLSTDTV[TDN][IV][AG]G[LM][KT][IV][KS][DN]
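The bracket notation used for the motif sequences in Table 2 is compatible with regular-expression character classes, so a motif can be located in a sequence with a simple pattern search. The following Python sketch illustrates this for motifs 1 and 8 and for the Asp-Xaa-Gly active-site pattern (Xaa = Ser or Thr). It is a simplification: MAST scores sequences against position-dependent scoring matrices rather than matching exact patterns. The function name `find_motif` and the toy sequence are hypothetical and do not correspond to any sequence in Table 1.

```python
import re
from typing import List, Tuple

# Patterns copied from Table 2; brackets list the residues allowed at a position.
MOTIF_1 = "P[PK]Q[DK]F[KR]V[IV]FDTGSSNLWVP[SG]KKC"
MOTIF_8 = "GC[EQ]AIAD[TS]GTSLIAGP"
ACTIVE_SITE = "D[TS]G"  # Asp-Xaa-Gly with Xaa = Thr or Ser


def find_motif(pattern: str, sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) tuples using 1-based, inclusive positions."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in re.finditer(pattern, sequence.upper())
    ]


# Synthetic sequence built to contain one instance of each motif.
toy_sequence = (
    "MAAA" + "PKQDFKVIFDTGSSNLWVPSKKC" + "LLLL" + "GCQAIADSGTSLIAGP" + "KK"
)

print(find_motif(MOTIF_1, toy_sequence))
print(find_motif(MOTIF_8, toy_sequence))
print(find_motif(ACTIVE_SITE, toy_sequence))
```

In this toy sequence both active-site matches fall inside motifs 1 and 8, which mirrors the fact that the two motif patterns in Table 2 each contain a D[TS]G triplet.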

The Self-Optimized Prediction Method with Alignment (SOPMA) server calculates the percentage of alpha helix, extended strand, beta turn, and random coil in the secondary structure of protein sequences. The prediction of the secondary structure of APs based on hierarchical neural network analysis (Combet et al., 2000) showed that random coils and extended strands were the most abundant secondary structure elements, whereas alpha helices and beta turns were less frequent (Table 3).

Secretory proteins are synthesized in the endoplasmic reticulum and are identified by their signal peptides. A signal peptide is a short peptide (5–30 amino acids long) that exists at the N-terminus of most newly synthesized proteins that are destined for the secretory pathways (Blobel and Dobberstein, 1975). Signal peptides are extremely heterogeneous, and many prokaryotic and eukaryotic signal peptides are functionally interchangeable even between different species; however, the efficiency of protein secretion is strongly determined by the signal peptide (Von Heijne, 1985; Kober et al., 2013). The SignalP 4.1 server predicts the existence and location of signal peptide cleavage sites in protein sequences. The method includes a prediction of cleavage sites and a signal/nonsignal peptide prediction based on a combination of several artificial neural networks. The results of SignalP indicated that all the members of AP possess a signal peptide, which is synthesized in the cytoplasm. The signal peptides of the insect APs are 17–20 amino acid residues long (Table 3). The longest signal peptide belonged to Md-AP (20 amino acid residues), while Am-AP and Ph-AP had the shortest N-terminal signal peptides (17 amino acid residues) among the representative insects (Table 3). Signal peptide cleavage site predictions for AP are shown in Figure 4.

Proteins can end up as integral membrane components. These proteins are said to be transmembrane. Transmembrane helix predictions for APs are shown in Table 3. The number of transmembrane helices in the representative insects varied between zero and one.

Pattern and profile searches were performed with the Protein Domain Database (ProDom) server. ProDom is constructed automatically by clustering homologous segments. This database consists of domain family entries. Each entry provides a multiple sequence alignment of homologous domains and a family consensus sequence. The analysis of functional domains of the AP samples showed different domains (Figure 5). A comparison of ProDom domains showed that there were nineteen similar domains among the samples, which are listed in Table 4 with their positions and E-values.

3.3. Tertiary structure prediction and assessment of the model
The three-dimensional structure of Am-AP was modeled with the Protein Homology/analogY Recognition Engine 2.0 (PHYRE2) (Figure 6a). PHYRE2 uses profile–profile matching and secondary structure for the prediction of tertiary structure. This server also uses the alignment of hidden Markov models via HHsearch (Söding, 2005) to significantly improve the accuracy of alignment and the detection rate. The model of Am-AP by PHYRE2 was predicted using the “d1qdma2” model (PDB accession code: 1qdm), with 100% confidence and 56% identity. The PHYRE2 result was compared with the models predicted by Swiss-Model (Figures 6b–6d). Swiss-Model (http://swissmodel.expasy.org/) is an automated system for modeling the 3D structure of


Figure 3. Multiple alignment of the AP amino acid sequences of insects by MegAlign in the DNASTAR package using the ClustalW method. Residues shaded in blue match the consensus; residues shaded in yellow differ from the consensus. The conserved motifs (1, 8, and 10) are marked on the alignment.


Table 3. Predictions of primary and secondary structures, transmembrane domains, and signal peptides of five aspartic protease proteins using Compute pI/Mw, SOPMA, TMHMM, and SignalP 4.1.

Column groups: Compute pI/Mw prediction results (pI, MW); SOPMA prediction results (random coil, extended strand, beta turn, alpha helix); TMHMM prediction result (predicted TM helix, number of predicted TMHs); SignalP 4.1 prediction results (signal peptide position).

Name   Length  pI    MW (Da)    Random coil (%)  Extended strand (%)  Beta turn (%)  Alpha helix (%)  Predicted TM helix  Number of predicted TMHs  Signal peptide position
Am-AP  385     5.90  42,222.21  40.26            32.73                5.71           21.30            -                   0                         1–17
Bm-AP  384     6.25  41,622.63  40.10            34.11                6.25           19.53            A5-A24              1                         1–18
Md-AP  390     5.64  42,425.89  40.77            31.79                7.69           19.74            -                   0                         1–20
Ph-AP  383     8.70  42,509.98  43.70            32.13                6.17           17.99            -                   0                         1–17
Tc-AP  384     7.53  41,635.69  41.41            32.03                5.47           21.09            A5-A24              1                         1–18

a protein from its amino acid sequence using homology modeling techniques (Biasini et al., 2014). The 3D models from both servers adopted very similar tertiary structures; therefore, these results suggested that the designated PDB code is the best model of AP.

Verification of the stereochemical quality of the Am-AP structure was performed by the Programs to Check the Stereochemical Quality of Protein Structures (PROCHECK) server, and the Ramachandran plot was drawn. The Ramachandran plot shows the phi–psi torsion angles for all residues in the structure, except those at the chain termini. The coloring/shading on the plot represents the different regions (Morris et al., 1992). The plot revealed that the tertiary structure of Am-AP has good chain stereochemistry (Figure 7).

3.4. Phylogenetic analysis
The multiple sequence alignment of full-length protein sequences was used to construct an unrooted phylogenetic tree with the NJ method. The NJ algorithm is commonly applied in distance-based tree building regardless of the optimization criterion. This method is comparatively rapid. The phylogenetic tree showed that all AP proteins were separated into 3 different groups, designated as Groups I–III (Figure 8). Most of the designated groups were supported by bootstrap values above 50, shown on the branches. Based on the tree, Cc-AP was excluded from the groups because of low bootstrap support. Each group contained six members of different families; consequently, all Apoidea insects were assigned to Group II. All the members of the Drosophilidae family were classified in Group I with a hymenopteran species, Nv-AP. Group III contained different insects from several families. Although similarities are clear within a family, there is not enough knowledge about the relationships and similarities between different families. However, these results suggested that, in insects, APs are very similar to each other and are probably inherited from a common ancestor.

3.5. Functional interaction network analysis of aspartic proteases
In order to predict the interacting proteins, Am-AP was submitted to the STRING 9.1 tool as a representative of insects. STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations. According to the results, no significant protein interaction groups were predicted in A. mellifera, although twenty-seven of these existed in other organisms. Ten functional partners were identified in the network analyses, listed in Table 5. According to the protein–protein interaction analysis, seven enriched KEGG pathways were obtained: mTOR signaling pathway, pyrimidine metabolism, DNA replication, lysosome, regulation of autophagy, drug metabolism, and metabolic pathways. The K-means algorithm was used for protein clustering in twelve different colors (Figure 9).

4. Discussion
In this study, bioinformatics methods were used to identify the characteristics of AP in nineteen species from eleven different insect families.

Before APs were found in some insects such as Coleoptera, it was shown that pepstatin, a powerful and specific inhibitor of APs, strongly inhibited the activity of the midgut enzymes of the Colorado potato beetle, L. decemlineata, demonstrating that an AP was present in the midgut extracts (Wolfson and Murdock, 1987). In corn, a protease has been shown to confer resistance to fall armyworm (Spodoptera frugiperda) via degradation of the peritrophic membrane of this chewing insect, which interferes with nutrient acquisition, ultimately killing the insect (Jiang et al., 1995; Pechan et al., 2002; Mohan et al., 2006). Therefore, characterizing this enzyme can be helpful in insect pest control and integrated pest management.


Figure 4. Prediction of signal peptide cleavage sites in five selected species of insects by SignalP 4.1 server. C-score (raw cleavage site score), S-score (signal peptide score), and Y-score (combined cleavage site score) are shown.

Figure 5. Graphical results of functional domains of five different insect species detected by ProDom.


Figure 6. Comparing tertiary structure prediction of AP protein in Am-AP (accession number: XP_392857; PDB accession code: 1qdm): (a) intensive model by Phyre2 server; (b), (c), and (d) three models of Swiss-Model predictions.

Table 4. Functional domains of AP proteins from five different insect orders, detected by ProDom.

Index  ProDom domain  Am-AP (Position, Score, E-value)  Bm-AP (Position, Score, E-value)  Md-AP (Position, Score, E-value)  Ph-AP (Position, Score, E-value)  Tc-AP (Position, Score, E-value)

1 PD198173 18–60 112 0.0003 5–61 297 8e-26 3–66 277 2e-23 19–59 148 2e-08 9–59 174 1e-11

2 PD522343 61–103 223 3e-17 62–104 235 1e-18 67–109 235 1e-18 60–106 220 6e-17 60–102 229 7e-18

3 PDB2A9G7 67–290 124 1e-05 69–289 118 4e-05 75–239 142 7e-08 68–284 122 1e-05 74–296 161 5e-10

4 PDA7L942 69–174 108 0.0008 52–175 137 3e-07 56–183 106 0.001 49–176 119 3e-05 40–173 131 1e-06

5 PD000182 105–161 148 2e-08 108–158 262 9e-22 113–163 195 5e-14 106–156 154 3e-09 106–156 212 6e-16

6 PDC0Q4U8 107–382 223 3e-17 109–384 149 1e-08 114–387 177 6e-12 146–380 152 5e-09 107–381 199 2e-14

7 PDA923K1 140–385 188 4e-13 124–383 152 5e-09 146–390 175 9e-12 122–382 208 2e-15 125–379 187 4e-13

8 PD933308 168–192 119 4e-05 169–193 128 4e-06 174–197 120 3e-05 167–191 118 5e-05 167–191 122 1e-05

9 PDB8S246 169–333 175 1e-11 170–333 167 1e-10 175–338 161 4e-10 168–331 139 2e-07 168–332 190 2e-13

10 PDA5R6K7 185–298 160 7e-10 190–297 99 0.008 202–303 141 8e-08 202–296 139 2e-07 184–297 150 8e-09

11 PD890699 193–252 250 2e-20 194–255 286 2e-24 198–262 318 3e-28 194–258 256 5e-21 192–254 278 1e-23

12 PDB3O874 226–299 142 8e-08 226–294 123 1e-05 232–294 115 0.0001 225–285 124 8e-06 225–298 167 1e-10

13 PDA1D2U8 261–293 144 4e-08 259–292 169 5e-11 263–299 159 8e-10 256–293 136 4e-07 259–294 144 4e-08

14 PDB0S9U8 265–354 145 3e-08 264–349 134 7e-07 270–359 107 0.0007 263–347 131 2e-06 264–353 136 4e-07

15 PD876079 267–347 147 2e-08 256–333 144 4e-08 273–354 134 6e-07 265–345 137 3e-07 258–333 150 1e-08

16 PD777487 267–372 110 0.0004 266–371 106 0.001 272–390 124 9e-06 265–370 113 0.0002 266–371 111 0.0003

17 PDA9Z9E1 298–327 123 1e-05 297–323 145 4e-08 303–331 149 9e-09 296–324 101 0.005 297–325 114 0.0001

18 PD058906 301–347 154 3e-09 302–346 140 1e-07 315–352 158 9e-10 308–345 153 5e-09 309–346 167 9e-11

19 PDA1D277 348–384 187 5e-13 347–383 205 4e-15 353–389 190 2e-13 346–383 183 1e-12 347–384 199 2e-14


Figure 7. Ramachandran plot of Am-AP (PDB accession code: 1qdm). The Ramachandran plot shows the phi–psi torsion angles for all residues in the structure (except those at the chain termini). Glycine residues are separately identified by triangles, as these are not restricted to the regions of the plot appropriate to the other side chain types. The coloring/shading on the plot represents the different regions as follows: the darkest areas (shown in red) correspond to the ‘core’ regions representing the most favorable combinations of phi–psi values. Ideally, one would hope to have over 90% of the residues in these ‘core’ regions. The percentage of residues in the ‘core’ regions is one of the better guides to stereochemical quality. The regions are labeled as follows: A-Core alpha, L-Core left-handed alpha, a-Allowed alpha, l-Allowed left-handed alpha, ~a-Generous alpha, ~l-Generous left-handed alpha, B-Core beta, p-Allowed epsilon, b-Allowed beta, ~p-Generous epsilon, and ~b-Generous beta.

Figure 8. The phylogenetic tree of AP protein from insects using the CLUSTALW (MEGA 6) program. The NJ method was used to construct the tree. Numbers associated with branches show bootstrap support values for NJ analyses higher than 50. Three major groups, designated from I to III, are marked with different colors.
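The distances underlying an NJ tree built with the Poisson model and complete gap deletion can be sketched as follows. For two aligned sequences, p is the proportion of sites at which the residues differ after all gap-containing columns have been removed, and the Poisson-corrected distance is d = -ln(1 - p). The Python code below is a minimal illustration under these assumptions and is not the MEGA implementation; the function names and the three toy sequences are hypothetical.

```python
import math
from typing import List


def complete_deletion(alignment: List[str]) -> List[str]:
    """Remove every alignment column that has a gap ('-') in any sequence."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(length) if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), p = fraction of differing sites."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    if differences == 0:
        return 0.0
    p = differences / len(seq_a)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


# Toy alignment of three short, made-up sequences (not taken from Table 1).
toy_alignment = ["DTGSSNLWVP", "DTGSS-LWVP", "DSGTSNLWIP"]
clean = complete_deletion(toy_alignment)

# Pairwise distance matrix of the kind a neighbor-joining step starts from.
matrix = [[poisson_distance(a, b) for b in clean] for a in clean]
for row in matrix:
    print(["%.3f" % value for value in row])
```

Because the correction grows faster than p itself, it partly compensates for multiple substitutions at the same site, which the raw proportion of differences underestimates.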


Table 5. Characteristics of input protein Am-AP (cathepsin D XP_392857) and functional partners predicted with STRING 9.1.

Index  Name     Functions                                                                                                                           Score
1      Atg1     Autophagy-specific gene 1                                                                                                           0.812
2      GB17311  Similar to CG8444-PA                                                                                                                0.755
3      GB14710  Venom serine                                                                                                                        0.643
4      GB11983  Similar to angiotensin-converting enzyme CG8827-PA, isoform A                                                                       0.623
5      GB16675  Similar to angiotensin-converting enzyme, testis-specific isoform precursor (ACE-T) (Dipeptidyl carboxypeptidase I) (Kininase II)  0.623
6      GB20065  Similar to angiotensin I-converting enzyme (peptidyl- A) 1 isoform 1                                                                0.623
7      GB19369  Similar to rudimentary-like CG3593-PA (468 aa)                                                                                      0.627
8      GB11271  Similar to proprotein convertase subtilisin/kexin type 1 preproprotein                                                              0.609
9      Fur1     Furin-like protease 1                                                                                                               0.609
10     Mcm5     Minichromosome maintenance 5                                                                                                        0.624
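The scores in Table 5 can be ranked and filtered with a few lines of Python, as sketched below. The names and scores are copied from Table 5; the cutoffs of 0.4 (medium confidence) and 0.7 (high confidence) are the conventional STRING thresholds and are an assumption here, since the article does not state which cutoff was applied. The function name `rank_partners` is hypothetical.

```python
from typing import List, Tuple

# (name, score) pairs copied from Table 5.
PARTNERS: List[Tuple[str, float]] = [
    ("Atg1", 0.812), ("GB17311", 0.755), ("GB14710", 0.643), ("GB11983", 0.623),
    ("GB16675", 0.623), ("GB20065", 0.623), ("GB19369", 0.627), ("GB11271", 0.609),
    ("Fur1", 0.609), ("Mcm5", 0.624),
]

MEDIUM_CONFIDENCE = 0.4  # assumed conventional STRING cutoff
HIGH_CONFIDENCE = 0.7    # assumed conventional STRING cutoff


def rank_partners(
    partners: List[Tuple[str, float]], min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep partners scoring at least min_score, sorted by descending score."""
    kept = [partner for partner in partners if partner[1] >= min_score]
    # sorted() is stable, so partners with equal scores keep their table order.
    return sorted(kept, key=lambda partner: partner[1], reverse=True)


ranked = rank_partners(PARTNERS, MEDIUM_CONFIDENCE)
high = [name for name, _ in rank_partners(PARTNERS, HIGH_CONFIDENCE)]
print(ranked)
print(high)
```

Under these assumed cutoffs, all ten partners pass the medium threshold, while only Atg1 and GB17311 exceed the high-confidence threshold.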

Figure 9. The interactive network view of predicted proteins of protein–protein interactions using STRING 9.1 tool. Network nodes are proteins, and the edges represent the predicted functional associations. Intercluster edges are represented by dashed lines.


Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. It is a powerful tool for predicting the structure and function of a protein from its amino acid sequence by means of its similarity to a sequence of known structure or function. This method plays a key role in guiding the experimental characterization of a genome (Martin et al., 2004). Because of its effectiveness and low cost, bioinformatics has been widely used to predict antigenic epitopes on proteins (Bai et al., 2012). Thus, it seems that these tools have revolutionized the study of genetics (Darabi et al., 2012; Darabi and Farhadi-Nejad, 2013; Kesici et al., 2013; Seddigh and Darabi, 2014). In conclusion, our study employed bioinformatics methods to explain the structure, function, and molecular mechanism of AP protein in insects. We used several different software packages and online services to predict protein features such as 3D structure, the protein–protein interaction network, and the evolutionary relationships of AP in insects. Bioinformatics analysis has allowed us to predict the primary and secondary structures of the representative insect samples. The 3D prediction revealed that the tertiary structure of insect APs is similar to that of other enzymes belonging to the AP family whose crystallographic structure has been determined (Kervinen et al., 1999). Topology predictions were analyzed and motifs were obtained.

The obtained data of the bioinformatics analysis indicated that AP is a secretory protein, and the positions of the signal peptides in each sample from the representative insects were predicted by SignalP analysis. The nineteen similar domains found by ProDom suggest that the function of AP may be the same among insects.

All aspartate proteases have a highly conserved sequence of Asp-Thr-Gly. In general, these enzymes are monomeric enzymes consisting of two domains, and it is thought that these domains may have arisen through ancestral gene duplication. Since phylogenetic analyses can be the basis of molecular and biochemical analyses of protein families, this analysis was performed on the AP protein family in insects. Based on their phylogenetic relationships, AP proteins were divided into 3 groups. Phylogenetic analysis showed that evolutionary relationships among different groups of AP proteins were evident. Moreover, within each group, similar amino acid sequences suggested strong evolutionary relationships among different families.

In this research, bioinformatics analyses of insect APs showed similarities between this protein in different families, and the obtained data provide a background for bioinformatics studies on the function and evolution of APs in other insects and organisms.

Acknowledgment
This research was supported by the Varamin-Pishva Branch of the Islamic Azad University. We express our sincere appreciation.

References

Alagarsamy S, Larroche C, Pandey A (2006). Microbiology and industrial biotechnology of food-grade proteases: a perspective. Food Technol Biotech 44: 211–220.

Bai Y, He S, Zhao G, Chen L, Shi N, Zhou H, Cong H, Zhao Q, Zhu XQ (2012). Toxoplasma gondii: bioinformatics analysis, cloning and expression of a novel protein TgIMP1. Exp Parasitol 132: 458–464.

Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37: W202–208.

Balczun C, Siemanowski J, Pausch JK, Helling S, Marcus K, Stephan C, Meyer HE, Schneider T, Cizmowski C, Oldenburg M et al. (2012). Intestinal aspartate proteases TiCatD and TiCatD2 of the haematophagous bug Triatoma infestans (Reduviidae): sequence characterisation, expression pattern and characterisation of proteolytic activity. Insect Biochem Molec 42: 240–250.

Baldwin ET, Bhat TN, Gulnik S, Hosur MV, Sowder RC, Cachau RE, Collins J, Silva AM, Erickson JW (1993). Crystal structures of native and inhibited forms of human cathepsin D: implications for lysosomal targeting and drug design. P Natl Acad Sci USA 90: 6796–6800.

Barrett A (1977). Cathepsin D and other carboxyl proteinases. In: Barrett AJ, editor. Proteinases in Mammalian Cells and Tissues. Amsterdam, the Netherlands: Elsevier, pp. 209–248.

Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L et al. (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res 42: W252–258.

Bjellqvist B, Basse B, Olsen E, Celis JE (1994). Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15: 529–539.

Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, Frutiger S, Hochstrasser D (1993). The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14: 1023–1031.

Blobel G, Dobberstein B (1975). Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol 67: 835–851.


Blundell T, Cooper J, Sali A, Zhu Z (1991). Comparison of the sequences, 3-D structures and mechanisms of pepsin-like and retroviral aspartic proteinases. In: Dunn BM, editor. Structure and Function of the Aspartic Proteases. New York, NY, USA: Plenum Press, pp. 443–453.

Cho WL, Dhadialla TS, Raikhel AS (1991). Purification and characterization of a lysosomal aspartic protease with cathepsin D activity from the mosquito. Insect Biochem 21: 165–176.

Ciechanover A (2005). Proteolysis: from the lysosome to ubiquitin and the proteasome. Nat Rev Mol Cell Bio 6: 79–87.

Colebatch G, East P, Cooper P (2001). Preliminary characterisation of digestive proteases of the green mirid, Creontiades dilutus (Hemiptera: Miridae). Insect Biochem Molec 31: 415–423.

Combet C, Blanchet C, Geourjon C, Deleage G (2000). NPS@: network protein sequence analysis. Trends Biochem Sci 25: 147–150.

Darabi M, Farhadi-Nejad H (2013). Study of the 3-hydroxy-3-methylglotaryl-coenzyme A reductase (HMGR) protein in Rosaceae by bioinformatics tools. Caryologia 66: 351–359.

Darabi M, Masoudi-Nejad A, Nemat-Zadeh G (2012). Bioinformatics study of the 3-hydroxy-3-methylglotaryl-coenzyme A reductase (HMGR) gene in Gramineae. Mol Biol Rep 39: 8925–8935.

Davies DR (1990). The structure and function of the aspartic proteinases. Annu Rev Biophys Bio 19: 189–215.

Dorrah MA, Yousef HA, Bassal TT (2000). Partial characterization of acidic proteinase in the midgut of Parasarcophaga surcoufi larvae (Diptera: Sarcophagidae). J Egypt Soc Parasitol 30: 643–653.

Ehrmann M, Clausen T (2004). Proteolysis as a regulatory mechanism. Annu Rev Genet 38: 709–724.

Elmelegi M, Dorrah M, Bassal T (2006). Aspartic and serine-peptidase activity in the midgut of the larval Parasarcophaga hertipes (Sarcophagidae, Cyclorrahapha, Diptera). Efflatounia 6: 1–9.

Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C et al. (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41: D808–815.

Fraser A, Ring RA, Stewart RK (1961). Intestinal proteinases in an insect, Calliphora vomitoria L. Nature 192: 999–1000.

Fruton JS (1971). Pepsin. In: Boyer PD, editor. The Enzymes. New York, NY, USA: Academic Press, pp. 119–164.

Fruton JS (1987). Aspartyl proteinases. In: Neuberger A, Brocklehurst K, editors. New Comprehensive Biochemistry. Amsterdam, the Netherlands: Elsevier, pp. 1–38.

Gasteiger E, Hoogland C, Gattiker A, Wilkins MR, Appel RD, Bairoch A (2005). Protein identification and analysis tools on the ExPASy server. In: Walker JM, editor. The Proteomics Protocols Handbook. Totowa, NJ, USA: Humana Press, pp. 571–607.

Geourjon C, Deleage G (1995). SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci 11: 681–684.

Greenberg B, Paretsky D (1955). Proteolytic enzymes in the house fly, Musca domestica (L.). Ann Entomol Soc Am 48: 46–50.

Hill J, Phylip LH (1997). Bacterial aspartic proteinases. FEBS Lett 409: 357–360.

Houseman JG, Campbell FC, Morrison PE (1987). A preliminary characterization of digestive proteases in the posterior midgut of the stable fly Stomoxys calcitrans (L.) (Diptera: Muscidae). Insect Biochem 17: 213–218.

Houseman JG, Downe AER (1982). Characterization of an acidic proteinase from the posterior midgut of Rhodnius prolixus Stal (Hemiptera: Reduviidae). Insect Biochem 12: 651–655.

Houseman JG, Downe AER (1983). Cathepsin D-like activity in the posterior midgut of Hemipteran insects. Comp Biochem Phys B 75: 509–512.

Jiang B, Siregar U, Willeford KO, Luthe DS, Williams WP (1995). Association of a 33-kilodalton cysteine proteinase found in corn callus with the inhibition of fall armyworm larval growth. Plant Physiol 108: 1631–1640.

Kelley LA, Sternberg MJ (2009). Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4: 363–371.

Kervinen J, Tobin GJ, Costa J, Waugh DS, Wlodawer A, Zdanov A (1999). Crystal structure of plant aspartic proteinase prophytepsin: inactivation and vacuolar targeting. EMBO J 18: 3947–3955.

Kesici K, Tüney İ, Zeren D, Güden M, Sukatar A (2013). Morphological and molecular identification of pennate diatoms isolated from Urla, İzmir, coast of the Aegean Sea. Turk J Biol 37: 530–537.

Knop Wright M, Brandt S, Coudron T, Wagner R, Habibi J, Backus E, Huesing J (2006). Characterization of digestive proteolytic activity in Lygus hesperus Knight (Hemiptera: Miridae). J Insect Physiol 52: 717–728.

Kober L, Zehe C, Bode J (2013). Optimized signal peptides for the development of high expressing CHO cell lines. Biotechnol Bioeng 110: 1164–1173.

Kooij PW, Rogowska-Wrzesinska A, Hoffmann D, Roepstorff P, Boomsma JJ, Schiøtt M (2014). Leucoagaricus gongylophorus uses leaf-cutting ants to vector proteolytic enzymes towards new plant . ISME J 8: 1032–1040.

Lambremont EN, Fish FW, Ashrafi S (1959). Pepsin-like enzyme in larvae of stable flies. Science 129: 1484–1485.

Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26: 283–291.

Laskowski RA, Rullmann JA, MacArthur MW, Kaptein R, Thornton JM (1996). AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR 8: 477–486.


Lemos FJ, Terra WR (1991). Properties and intracellular distribution of a cathepsin D-like proteinase active at the acid region of Musca domestica midgut. Insect Biochem 21: 457–465.

Li XC, Luo GH, Han ZJ, Fang JC (2014). Molecular cloning and analysis of aspartic protease (AP) gene in Ty3/gypsy in different geographical populations of Chilo suppressalis (Lepidoptera: Pyralidae) in China. Acta Entomol Sinica 57: 530–537.

Lin JS, Lee SK, Chen Y, Lin WD, Kao CH (2011). Purification and characterization of a novel extracellular from Rhizopus oligosporus. J Agric Food Chem 59: 11330–11337.

Lipke H, Fraenkel G, Liener I (1954). Growth inhibitors: effect of soybean inhibitors on growth of Tribolium confusum. J Agric Food Chem 2: 410–414.

López-Otín C, Bond JS (2008). Proteases: multifunctional enzymes in life and disease. J Biol Chem 283: 30433–30437.

Martin DM, Berriman M, Barton GJ (2004). GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics 5: 178–195.

Mohan S, Ma P, Pechan T, Bassford E, Williams W, Luthe D (2006). Degradation of the S. frugiperda peritrophic matrix by an inducible maize cysteine protease. J Insect Physiol 52: 21–28.

Moller S, Croning MD, Apweiler R (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17: 646–653.

Morris AL, MacArthur MW, Hutchinson EG, Thornton JM (1992). Stereochemical quality of protein structure coordinates. Proteins 12: 345–364.

Najafi MF, Deobagkar D, Deobagkar D (2005). Potential application of protease isolated from Pseudomonas aeruginosa PD100. Electron J Biotechno 8: 79–85.

Oikonomopoulou K, Hansen KK, Saifeddine M, Vergnolle N, Tea I, Diamandis EP, Hollenberg MD (2006). Proteinase-mediated cell signalling: targeting proteinase-activated receptors (PARs) by kallikreins and more. Biol Chem 387: 677–685.

Pechan T, Cohen A, Williams WP, Luthe DS (2002). Insect feeding mobilizes a unique plant defense protease that disrupts the peritrophic matrix of caterpillars. P Natl Acad Sci USA 99: 13319–13323.

Pendola S, Greenberg B (1975). Substrate-specific analysis of proteolytic enzymes in the larval midgut of Calliphora vicina. Ann Entomol Soc Am 68: 341–345.

Petersen TN, Brunak S, von Heijne G, Nielsen H (2011). SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8: 785–786.

Rabossi A, Stoka V, Puizdar V, Turk V, Quesada-Allué LA (2004). Novel aspartyl proteinase associated to fat body histolysis during Ceratitis capitata early metamorphosis. Arch Insect Biochem Physiol 57: 51–67.

Rao MB, Tanksale AM, Ghatge MS, Deshpande VV (1998). Molecular and biotechnological aspects of microbial proteases. Microbiol Mol Biol R 62: 597–635.

Rawlings ND, Barrett AJ (1995). Families of aspartic peptidases, and those of unknown catalytic mechanism. Methods Enzymol 248: 105–120.

Rawlings ND, Barrett AJ (1999). MEROPS: the peptidase database. Nucleic Acids Res 27: 325–331.

Romano P, Giugno R, Pulvirenti A (2011). Tools and collaborative environments for bioinformatics research. Brief Bioinform 12: 549–561.

Ryan CA (1990). Protease inhibitors in plants: genes for improving defenses against insects and pathogens. Annu Rev Phytopathol 28: 425–449.

Sabotič J, Kos J (2012). Microbial and fungal protease inhibitors: current and potential applications. Appl Microbiol Biot 93: 1351–1375.

Seddigh S, Darabi M (2014). Comprehensive analysis of beta-galactosidase protein in plants based on Arabidopsis thaliana. Turk J Biol 38: 140–150.

Servant F, Bru C, Carrère S, Courcelle E, Gouzy J, Peyruc D, Kahn D (2002). ProDom: automated clustering of homologous domains. Brief Bioinform 3: 246–251.

Sielecki A, Fujinaga M, Read R, James M (1991). Refined structure of porcine pepsinogen at 1.8 Å resolution. J Mol Biol 219: 671–692.

Silva CP, Xavier-Filho J (1991). Comparison between the levels of aspartic and cysteine proteinases of the larval midguts of Callosobruchus maculatus (F.) and Zabrotes subfasciatus (BOH.) (Coleoptera: bruchidae). Comp Biochem Phys B 99: 529–533.

Sinha M (1975). Pepsin-like activity in the midgut of Sarcophaga ruficornis and Musca domestica. Appl Entomol Zool 10: 313–315.

Söding J (2005). Protein homology detection by HMM–HMM comparison. Bioinformatics 21: 951–960.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013). MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30: 2725–2729.

Tang J (1979). Evolution in the structure and function of carboxyl proteases. Mol Cell Biochem 26: 93–109.

Terra WR, Ferreira C, Garcia ES (1988). Origin, distribution, properties and functions of the major Rhodnius prolixus midgut . Insect Biochem 18: 423–434.

Tie NM, Houseman JG (1990). Identification of cathepsin B, D and H in the larval midgut of Colorado potato beetle, Leptinotarsa decemlineata Say (Coleoptera: Chrysomelidae). Insect Biochem 20: 313–318.

Umezawa H, Aoyagi T, Morishima H, Matsuzaki M, Hamada M (1970). Pepstatin, a new pepsin inhibitor produced by Actinomycetes. J Antibiot 23: 259–262.

Von Heijne G (1985). Signal sequences: the limits of variation. J Mol Biol 184: 99–105.


Wielkopolan B, Walczak F, Podleśny A, Nawrot R, Obrępalska-Stęplowska A (2015). Identification and partial characterization of proteases in larval preparations of the cereal leaf beetle (Oulema melanopus, Chrysomelidae, Coleoptera). Arch Insect Biochem 88: 192–202.

Wolfson J, Murdock L (1987). Suppression of larval Colorado potato beetle growth and development by digestive proteinase inhibitors. Entomol Exp Appl 44: 235–240.

Wolfson JL, Murdock LL (1990). Diversity in digestive proteinase activity among insects. J Chem Ecol 16: 1089–1102.

Yegin S, Fernandez-Lahore M, Salgado AJG, Guvenc U, Goksungur Y, Tari C (2011). Aspartic proteinases from Mucor spp. in cheese manufacturing. Appl Microbiol Biotechnol 89: 949–960.

Zhao G, Zhou A, Lu G, Meng M, Sun M, Bai Y, Han Y, Wang L, Zhou H, Cong H (2013). Identification and characterization of Toxoplasma gondii aspartic protease 1 as a novel vaccine candidate against toxoplasmosis. Parasite Vector 6: 175–188.
