Data-Mining Approaches Reveal Hidden Families of Proteases in The
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Letter Data-Mining Approaches Reveal Hidden Families of Proteases in the Genome of Malaria Parasite Yimin Wu,1,4 Xiangyun Wang,2 Xia Liu,1 and Yufeng Wang3,5 1Department of Protistology, American Type Culture Collection, Manassas, Virginia 20110, USA; 2EST Informatics, Astrazeneca Pharmaceuticals, Wilmington, Delaware 19810, USA; 3Department of Bioinformatics, American Type Culture Collection, Manassas, Virginia 20110, USA The search for novel antimalarial drug targets is urgent due to the growing resistance of Plasmodium falciparum parasites to available drugs. Proteases are attractive antimalarial targets because of their indispensable roles in parasite infection and development,especially in the processes of host e rythrocyte rupture/invasion and hemoglobin degradation. However,to date,only a small number of protease s have been identified and characterized in Plasmodium species. Using an extensive sequence similarity search,we have identifi ed 92 putative proteases in the P. falciparum genome. A set of putative proteases including calpain,metacaspase,and s ignal peptidase I have been implicated to be central mediators for essential parasitic activity and distantly related to the vertebrate host. Moreover,of the 92,at least 88 have been demonstrate d to code for gene products at the transcriptional levels,based upon the microarray and RT-PCR results,an d the publicly available microarray and proteomics data. The present study represents an initial effort to identify a set of expressed,active,and essential proteases as targets for inhibitor-based drug design. [Supplemental material is available online at www.genome.org.] Malaria remains one of the most dangerous infectious diseases metalloprotease (falcilysin; Eggleson et al. 1999). The success- in the world. It kills 1–2 million people each year, and is ful crystallization of plasmepsin II and the expression of re- responsible for enormous economic burdens in endemic re- combinant plasmepsin I/II and falcipain-2 represented a sig- gions. The development of new antimalarial drugs is urgently nificant advance towards a functional understanding and a needed due to the continuing high mortality and morbidity rational design of inhibitors of these enzymes (Silva et al. caused by malaria and the increasing prevalence of drug- 1996; Bernstein et al. 1999; Tyas et al. 1999; Shenai et al. 2000; resistance in the pathogenic parasite Plasmodium falciparum. Dua et al. 2001). Malarial proteases have long been considered potential The recent completion of the P. falciparum genome pro- targets for chemotherapy due to their crucial roles in the para- vides a basis on which to identify new proteases. The first pass site life cycle, and the feasibility of designing specific inhibi- annotation has predicted 25 proteases that belong to ten tors (for reviews, see McKerrow et al. 1993; Rosenthal 1998; families of five catalytic classes (Table 1). Despite this initial Blackman 2000; Rosenthal 2002). Efforts to identify func- progress, direct evidence from protease inhibition assays and tional proteases targeted by inhibition assays are ongoing. independent comparisons with other genomes suggest that in Subtilase-1 and Subtilase-2, two homologous serine proteases, addition to the limited number of characterized and predicted are demonstrated to be involved in schizont rupture and proteases, many important proteolytic enzymes remain un- merozoite invasion (Blackman et al. 1998; Barale et al. 1999; characterized (Olaya and Wasserman 1991; Southan 2001). Hackett et al. 1999). Cysteine proteases have also been impli- The following six sets of experimental data suggest that uni- cated in the rupture/invasion process (Salmon et al. 2001). A dentified proteases are responsible for additional critical hy- cluster of Serine Repeat Antigens (SERAs) exhibit limited se- drolytic activities: (1) A calpain-type protease, which appears quence similarity to cysteine proteases, though their proteo- to be involved in merozoite invasion of red blood cells (Olaya lytic activity remains undocumented (Delplace et al. 1988; and Wasserman, 1991); (2) an entire group of threonine pro- Miller et al. 2002). A zinc-metallo-aminopeptidase has also teases in the proteasome complex (Gantt et al. 1998); (3) pro- been demonstrated to possess enzymatic activity (Florent et teases that catalyze the primary processing of Merozoite Sur- al. 1998). Meanwhile, three classes of proteases have been face Protein (MSP-1; David et al. 1984), Apical Merozite Anti- identified to be involved in hemoglobin degradation: (1) Four gen-1 (AMA-1; Narum and Thomas 1994), and the precursor aspartic proteases (plasmepsin I, II, IV, and HAP) (see Banerjee of SERA (Li et al. 2002); (4) the gp76 and gp68 GPI-anchored et al. 2002 for a review); (2) three cysteine proteases (falcipain- serine proteases that cleave host erythrocyte surface proteins 1, -2, and -3) (see Rosenthal 2002 for a review); and (3) one in P. falciparum and P. chabaudi, respectively (Braun-Breton et al. 1988); (5) a 75-kD merozoite serine protease (Rosenthal et al. 1987); and (6) a neutral aminopeptidase essential in he- 4Present address: Malaria Vaccine Development Unit, National moglobin digestion (Curley et al 1994). Additional supportive Institute of Allergy and Infectious Diseases, National Institutes of evidence that the majority of malarial proteases are unex- Health, Bethesda, MD 20892, USA. plored comes from a comparison with the number of prote- 5 Corresponding author. ases found in other organisms. According to the statistics in E-MAIL [email protected]; FAX (703) 365-2740. Article and publication are at http://www.genome.org/cgi/doi/10.1101/ the protease database Merops (http://www.merops.ac.uk) as gr.913403. released on March 18, 2002, all the model organisms possess 13:601–616 ©2003 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/03 $5.00; www.genome.org Genome Research 601 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Wu et al. Table 1. Ninety-two (92) P. falciparum Protease Homologs Predicted From Comparative Genomic Analysis Protease homolog with highest BLAST score Pfam domain structure Catalytic Protease Protease Accession (species, class family nomenclature Gene ID protease name) E-score Domain ID (name) E-score APa Aspartic A1 PM I PF14_0076 P39898 (P. falciparum, PM I) 0 PF00026 (eukaryotic 1.5e-59 + asp protease) מ PM II PF14_0077 P46925 (P. falciparum, PM II) 0 PF00026 (eukaryotic 8.9e-52 asp protease) HAP(PM III) PF14_0078 CAB40630 (P. falciparum, HAP) 0 PF00026 (eukaryotic 3.3e-30 + asp protease) מ PM IV PF14_0075 AAC15794 (P. malariae, PM) 0 PF00026 (eukaryotic 5.1e-56 asp protease) PM V PF13_0133 Q05744 (Chicken, Cathepsin D) 9e-08 PF00026 (eukaryotic 0.00037 + asp protease) מ PM VI PFC0495w CAC20153 (Eimeria tenella, 2e-99 PF00026 (eukaryotic 1.1e-40 eimepsin) asp protease) PM VII PF10_0329 CAC20153 (Eimeria tenella, 8e-28 PF00026 (eukaryotic 1.3e-11 + eimepsin) asp protease) מ PM VIII PF14_0623 CAC20153 (Eimeria tenella, 1e-35 PF00026 (eukaryotic 7.5e-11 eimepsin) asp protease) מ PM IX PF14_0281 AAD56283 (Pseudopleuronectes 1e-33 PF00026 (eukaryotic 4.5e-26 americanus, pepsinogen A asp protease) form IIa) PM X PF08_0108 AAA31096 (Pig, pepsinogen A) 1e-42 PF00026 (eukaryotic 2.5e-26 + asp protease) מ Cysteine C1 Falcipain-1 PF14_0553 P25805 (P. falciparum, 0 PF00112 (Papain 9.5e-92 falcipain-1) family) Falcipain-2 PF11_0165 AAF63497 (P. falciparum, 0 PF00112 (Papain 3.0e-84 + falcipain 2) family) מ Falcipain-3 PF11_0162 AAF86352 (P. falciparum, 0 PF00112 (Papain 4.8e-79 falcipain-3) family) Papain PF11_0161 AAF97809 (P. falciparum, 1e-134 PF00112 (Papain 8.8e-84 + Falcipain 2) family) מ DPP I PFL2290wAAD02704 (Dog, DPP I) 5e-30 PF00112 (Papain 9.9e-16 family) DPP I PFD0230c AAD02704 (Dog, DPP I) 6e-06 PF00112 (Papain 2.7e-05 + family) Cathepsin C PF11_0174 AAL48191 (Human, 5e-23 PF00112 (Papain 3.3e-14 + Cathepsin C) family) מ SERA PFB0360c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.1e-51 family) מ SERA PFB0325c F71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.9e-10 family) מ SERA PFB0330c G71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.9e-46 family) מ SERA PFB0335c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 3.4e-16 family) מ SERA PFB0340c B71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.5e-53 family) מ SERA PFB0345c C71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.2e-47 family) מ SERA PFB0350c D71617 (P. falciparum, SERA) 0 PF00112 (Papain 8.8e-51 family) מ SERA PFB0355c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.5e-49 family) מ Papain PFI0135c G71617 (P. falciparum, SERA) 1e-177 PF00112 (Papain 2.2e-34 family) מ C2 Calpain MAL13P1.310 NP_497964 (C. elegans, 2e-35 PF00648 (Calpain 8.0e-13 Calpain) family) מ C12 UCH1 PF14_0577 BAB47136 (Rat, UCH 13) 5e-31 PF01088 (UCH 1) 1.3e-37 מ UCH1 PF11_0177 NP_062508 (Mouse, UCH37) 6e-30 PF01088 (UCH 1) 1.5e-17 C13 GPI8p PF11_0298 CAB96076 (P. falciparum, 1e-179 PF01650 (Peptidase 2.7e-13 + transamidase GPI8p transamidase) C13) ממ C14 Metacaspase PF13_0289 CAD24804 (T. brucei, 5e-32 No hits metacaspase) מ C19 UCH2 PFA0220wNP_566486 (Arabidopsis, 1e-12 PF00442 (UCH2) 1.2e-09 UBP25) (continued) 602 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Prediction of Novel Malaria Proteases Table 1. Continued Protease homolog with highest BLAST score Pfam domain structure Catalytic Protease Protease Accession (species, class family nomenclature Gene ID protease name) E-score Domain ID (name) E-score APa מ UCH2 PFD0165w O57429 (Chicken, UCH2) 5e-13 PF00442 (UCH2) 1.9e-06 PF00443 (UCH2) 9.2e-07 מ UCH2 PFD0680c AAG42755 (Arabidopsis, 1e-34 PR00442 (UCH2) 8.5e-11 UBP14) מ UCH2 PFE1355c NP_566680 (Arabidopsis, 3e-30 PF00442 (UCH2) 7.5e-07 UBP7) PF00443 (UCH2) 1.9e-14 מ UCH2 PFE0835w094966 (Human, UBP19) 3e-51 PF00442 (UCH2) 1.8e-11 PF00443 (UCH2) 1.7e-20 מ UCH2 MAL7P1.147 NP_568171 (Arabidopsis, 1e-54 PF00442 (UCH2) 3.7e-10 UBP12) PF00443 (UCH2) 1.9e-16 מ UCH2 PFI0225wAAC68865 (Chicken, UBP66) 1e-18 PF00442 (UCH2) 9.9e-23 מ UCH2 PF13_0096 NP_494298 (C.