<<

Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Letter Data-Mining Approaches Reveal Hidden Families of in the Genome of Parasite Yimin Wu,1,4 Xiangyun Wang,2 Xia Liu,1 and Yufeng Wang3,5 1Department of Protistology, American Type Culture Collection, Manassas, Virginia 20110, USA; 2EST Informatics, Astrazeneca Pharmaceuticals, Wilmington, Delaware 19810, USA; 3Department of Bioinformatics, American Type Culture Collection, Manassas, Virginia 20110, USA

The search for novel antimalarial drug targets is urgent due to the growing resistance of falciparum parasites to available drugs. Proteases are attractive antimalarial targets because of their indispensable roles in parasite infection and development,especially in the processes of host e rythrocyte rupture/invasion and degradation. However,to date,only a small number of s have been identified and characterized in Plasmodium . Using an extensive sequence similarity search,we have identifi ed 92 putative proteases in the P. falciparum genome. A set of putative proteases including ,,and s ignal peptidase I have been implicated to be central mediators for essential parasitic activity and distantly related to the host. Moreover,of the 92,at least 88 have been demonstrate d to code for products at the transcriptional levels,based upon the microarray and RT-PCR results,an d the publicly available microarray and proteomics data. The present study represents an initial effort to identify a set of expressed,active,and essential proteases as targets for inhibitor-based drug design. [Supplemental material is available online at www.genome.org.]

Malaria remains one of the most dangerous infectious diseases metalloprotease (falcilysin; Eggleson et al. 1999). The success- in the world. It kills 1–2 million people each year, and is ful crystallization of II and the expression of re- responsible for enormous economic burdens in endemic re- combinant /II and falcipain-2 represented a sig- gions. The development of new antimalarial drugs is urgently nificant advance towards a functional understanding and a needed due to the continuing high mortality and morbidity rational design of inhibitors of these (Silva et al. caused by malaria and the increasing prevalence of drug- 1996; Bernstein et al. 1999; Tyas et al. 1999; Shenai et al. 2000; resistance in the pathogenic parasite Plasmodium falciparum. Dua et al. 2001). Malarial proteases have long been considered potential The recent completion of the P. falciparum genome pro- targets for chemotherapy due to their crucial roles in the para- vides a basis on which to identify new proteases. The first pass site cycle, and the feasibility of designing specific inhibi- annotation has predicted 25 proteases that belong to ten tors (for reviews, see McKerrow et al. 1993; Rosenthal 1998; families of five catalytic classes (Table 1). Despite this initial Blackman 2000; Rosenthal 2002). Efforts to identify func- progress, direct evidence from protease inhibition assays and tional proteases targeted by inhibition assays are ongoing. independent comparisons with other genomes suggest that in -1 and Subtilase-2, two homologous serine proteases, addition to the limited number of characterized and predicted are demonstrated to be involved in schizont rupture and proteases, many important proteolytic enzymes remain un- merozoite invasion (Blackman et al. 1998; Barale et al. 1999; characterized (Olaya and Wasserman 1991; Southan 2001). Hackett et al. 1999). proteases have also been impli- The following six sets of experimental data suggest that uni- cated in the rupture/invasion process (Salmon et al. 2001). A dentified proteases are responsible for additional critical hy- cluster of Serine Repeat Antigens (SERAs) exhibit limited se- drolytic activities: (1) A calpain-type protease, which appears quence similarity to cysteine proteases, though their proteo- to be involved in merozoite invasion of red blood cells (Olaya lytic activity remains undocumented (Delplace et al. 1988; and Wasserman, 1991); (2) an entire group of threonine pro- Miller et al. 2002). A zinc-metallo- has also teases in the proteasome complex (Gantt et al. 1998); (3) pro- been demonstrated to possess enzymatic activity (Florent et teases that catalyze the primary processing of Merozoite Sur- al. 1998). Meanwhile, three classes of proteases have been face (MSP-1; David et al. 1984), Apical Merozite Anti- identified to be involved in hemoglobin degradation: (1) Four gen-1 (AMA-1; Narum and Thomas 1994), and the precursor aspartic proteases (plasmepsin I, II, IV, and HAP) (see Banerjee of SERA (Li et al. 2002); (4) the gp76 and gp68 GPI-anchored et al. 2002 for a review); (2) three cysteine proteases (falcipain- serine proteases that cleave host erythrocyte surface 1, -2, and -3) (see Rosenthal 2002 for a review); and (3) one in P. falciparum and P. chabaudi, respectively (Braun-Breton et al. 1988); (5) a 75-kD merozoite (Rosenthal et al. 1987); and (6) a neutral aminopeptidase essential in he- 4Present address: Malaria Vaccine Development Unit, National moglobin (Curley et al 1994). Additional supportive Institute of and Infectious Diseases, National Institutes of evidence that the majority of malarial proteases are unex- Health, Bethesda, MD 20892, USA. plored comes from a comparison with the number of prote- 5 Corresponding author. ases found in other . According to the statistics in E-MAIL [email protected]; FAX (703) 365-2740. Article and publication are at http://www.genome.org/cgi/doi/10.1101/ the protease database Merops (http://www.merops.ac.uk) as gr.913403. released on March 18, 2002, all the model organisms possess

13:601–616 ©2003 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/03 $5.00; www.genome.org Genome Research 601 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Table 1. Ninety-two (92) P. falciparum Protease Homologs Predicted From Comparative Genomic Analysis

Protease homolog with highest BLAST score structure Catalytic Protease Protease Accession (species, class family nomenclature Gene ID protease name) E-score Domain ID (name) E-score APa

Aspartic A1 PM I PF14_0076 P39898 (P. falciparum, PM I) 0 PF00026 (eukaryotic 1.5e-59 + asp protease) מ PM II PF14_0077 P46925 (P. falciparum, PM II) 0 PF00026 (eukaryotic 8.9e-52 asp protease) HAP(PM III) PF14_0078 CAB40630 (P. falciparum, HAP) 0 PF00026 (eukaryotic 3.3e-30 + asp protease) מ PM IV PF14_0075 AAC15794 (P. malariae, PM) 0 PF00026 (eukaryotic 5.1e-56 asp protease) PM V PF13_0133 Q05744 (Chicken, D) 9e-08 PF00026 (eukaryotic 0.00037 + asp protease) מ PM VI PFC0495w CAC20153 (Eimeria tenella, 2e-99 PF00026 (eukaryotic 1.1e-40 eimepsin) asp protease) PM VII PF10_0329 CAC20153 (Eimeria tenella, 8e-28 PF00026 (eukaryotic 1.3e-11 + eimepsin) asp protease) מ PM VIII PF14_0623 CAC20153 (Eimeria tenella, 1e-35 PF00026 (eukaryotic 7.5e-11 eimepsin) asp protease) מ PM IX PF14_0281 AAD56283 (Pseudopleuronectes 1e-33 PF00026 (eukaryotic 4.5e-26 americanus, pepsinogen A asp protease) form IIa) PM X PF08_0108 AAA31096 (Pig, pepsinogen A) 1e-42 PF00026 (eukaryotic 2.5e-26 + asp protease) מ Cysteine C1 Falcipain-1 PF14_0553 P25805 (P. falciparum, 0 PF00112 ( 9.5e-92 falcipain-1) family) Falcipain-2 PF11_0165 AAF63497 (P. falciparum, 0 PF00112 (Papain 3.0e-84 + falcipain 2) family) מ Falcipain-3 PF11_0162 AAF86352 (P. falciparum, 0 PF00112 (Papain 4.8e-79 falcipain-3) family) Papain PF11_0161 AAF97809 (P. falciparum, 1e-134 PF00112 (Papain 8.8e-84 + Falcipain 2) family) מ DPP I PFL2290wAAD02704 (Dog, DPP I) 5e-30 PF00112 (Papain 9.9e-16 family) DPP I PFD0230c AAD02704 (Dog, DPP I) 6e-06 PF00112 (Papain 2.7e-05 + family) PF11_0174 AAL48191 (, 5e-23 PF00112 (Papain 3.3e-14 + Cathepsin C) family) מ SERA PFB0360c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.1e-51 family) מ SERA PFB0325c F71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.9e-10 family) מ SERA PFB0330c G71617 (P. falciparum, SERA) 0 PF00112 (Papain 2.9e-46 family) מ SERA PFB0335c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 3.4e-16 family) מ SERA PFB0340c B71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.5e-53 family) מ SERA PFB0345c C71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.2e-47 family) מ SERA PFB0350c D71617 (P. falciparum, SERA) 0 PF00112 (Papain 8.8e-51 family) מ SERA PFB0355c H71617 (P. falciparum, SERA) 0 PF00112 (Papain 1.5e-49 family) מ Papain PFI0135c G71617 (P. falciparum, SERA) 1e-177 PF00112 (Papain 2.2e-34 family) מ C2 Calpain MAL13P1.310 NP_497964 (C. elegans, 2e-35 PF00648 (Calpain 8.0e-13 Calpain) family) מ C12 UCH1 PF14_0577 BAB47136 (Rat, UCH 13) 5e-31 PF01088 (UCH 1) 1.3e-37 מ UCH1 PF11_0177 NP_062508 (Mouse, UCH37) 6e-30 PF01088 (UCH 1) 1.5e-17 C13 GPI8p PF11_0298 CAB96076 (P. falciparum, 1e-179 PF01650 (Peptidase 2.7e-13 + transamidase GPI8p transamidase) C13) ממ C14 Metacaspase PF13_0289 CAD24804 (T. brucei, 5e-32 No hits metacaspase) מ C19 UCH2 PFA0220wNP_566486 (Arabidopsis, 1e-12 PF00442 (UCH2) 1.2e-09 UBP25)

(continued)

602 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

Table 1. Continued

Protease homolog with highest BLAST score Pfam domain structure Catalytic Protease Protease Accession (species, class family nomenclature Gene ID protease name) E-score Domain ID (name) E-score APa

מ UCH2 PFD0165w O57429 (Chicken, UCH2) 5e-13 PF00442 (UCH2) 1.9e-06 PF00443 (UCH2) 9.2e-07 מ UCH2 PFD0680c AAG42755 (Arabidopsis, 1e-34 PR00442 (UCH2) 8.5e-11 UBP14) מ UCH2 PFE1355c NP_566680 (Arabidopsis, 3e-30 PF00442 (UCH2) 7.5e-07 UBP7) PF00443 (UCH2) 1.9e-14 מ UCH2 PFE0835w094966 (Human, UBP19) 3e-51 PF00442 (UCH2) 1.8e-11 PF00443 (UCH2) 1.7e-20 מ UCH2 MAL7P1.147 NP_568171 (Arabidopsis, 1e-54 PF00442 (UCH2) 3.7e-10 UBP12) PF00443 (UCH2) 1.9e-16 מ UCH2 PFI0225wAAC68865 (Chicken, UBP66) 1e-18 PF00442 (UCH2) 9.9e-23 מ UCH2 PF13_0096 NP_494298 (C. elegans, UBP) 5e-58 PF00442 (UCH2) 4.3e-08 PF00443 (UCH2) 0.00048 מ UCH2 PF14_0145 T41069 (Fission yeast, UCH) 1e-20 PF00442 (UCH2) 4.8e-05 PF00443 (UCH2) 3.1e-08 מ C48 Sumo1 protease PFL1635wXP_128236 (Mouse, SUMO 7e-33 PF02902 (Ulp1 2.3e-38 protease) protease) מ Ulp2 peptidase MAL8P1.157 O13769 (Fission yeast, Ulp2) 1e-06 PF02902 (Ulp1 9.2e-07 protease) מ C56 DJ-1 peptidase MAL6P1.153 BAB79527 (Chicken, DJ-1 6e-16 PF01965 (DJ-1/PfpI 8.1e-18 protease) protease) מ Metallo M1 AMPN MAL13P1.56 O96935 (P. falciparum, 0 PF01433 (Peptidase 6.2e-74 M1 peptidase) family M1) מ M3 Dcp PF10-0058 NP_601500 (Corynebacterium 1e-04 PF01432 (Peptidase 6.4e-10 glutamicum, Zn-dependent family M3) peptidases) מ Neurolysin MAL13P1.184 Q02038 (Pig, neurolysin) 4e-09 PF01432 (.8e-07 family M3) מ M16 MPPa PFE1155c AAF00541 (Toxoplama gondii, 1e-109 PF00675 (Insulinase, 6.4e-19 MPPa) family M16) מ MPPb PFI1625c AAK51086 (Avicennia 1e-108 PF00675 (Insulinase, 1.3e-55 marina, MPPb) family M16) מ M16 peptidase PF11_0189 NP_593544 (yeast 1e-29 PF00675 (Insulinase, 0.0076 metallopeptidase) family M16) מ Insulysin PF11_0226 P35559 (Rat, Insulysin) 3e-10 PF00675 (Insulinase, 1.0e-05 family M16) מ Falcilysin PF13_0322 AAF06062 (P. falciparum, 0 PF00675 (Insulinase, 0.19 falcilysin) family M16) מ Pitrilysin PF14_0382 T06521 (Pea pitrilysin) 5e-27 PF00675 (Insulinase, 0.0018 family M16) מ M17 AMPL PF14_0439 NP_194821 (Arabidopsis 2e-83 PF00883 (Peptidase 9e-150 AMPL) family M17) מ M18 DNPE PFI1570c AAL16034 (Coccidioides 7e-55 PF02127 (Peptidase 3.6e-31 immitis, DNPE) family M18) מ M22 GCP PF10_0299 NP_194003 (Arabidopsis 4e-54 PF00814 (Glycopro- 1.3e-53 GCP) tease family) מ M24A AMPM PFE1360c AAG33975 (Arabidopsis, 3e-61 PF00557 (Peptidase 2.5e-48 AMPM) family M24) AMPM MAL8P1.140 P53582 (Human, AMP1) 1e-26 PF00557 (Peptidase 9.1e-06 + family M24) מ AMPM PF10_0150 P53582 (Human, AMP1) 1e-104 PF00557 (Peptidase 5.4e-67 family M24) מ AMPM PF14_0327 AAL76285 (P. falciparum, 0 PF00557 (Peptidase 8.7e-50 AMPM2) family M24) מ M24B AMPP PF14_0517 CAC59823 (Tomato, AMPP) 3e-85 PF00557 (Peptidase 2.8e-07 family M24) מ M41 Ftsh peptidase PF11_0203 NP_006787 (Human, AFG3) 1e-163 PF01434 (Peptidase 2.6e-80 family M41) 4.0e-80 PF00004 (AAA) מ Ftsh peptidase PFL1925wNP_422020 (Caulobacter 1e-122 PF01434 (Peptidase 3.2e-93 crescentus, cell division family M41) 5.7e-90 protein FtsH) PF00004 (AAA)

(continued)

Genome Research 603 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Table 1. Continued

Protease homolog with highest BLAST score Pfam domain structure Catalytic Protease Protease Accession (species, class family nomenclature Gene ID protease name) E-score Domain ID (name) E-score APa

מ Ftsh peptidase PF14_0616 NP_568787 (Arabidopsis, 1e-141 PF01434 (Peptidase 2.7e-69 FtsH) family M41) PF00004 (AAA 9.9e-85 מ Serine S1 DegP protease MAL8P1.126 NP_568577 (Arabidopsis, 4e-52 PF00089 () 1.8e-07 DegP protease) מ Neurotypsin-like PF14_0067 BAA23986 (Mouse, 7e-17 PF01477 (PLAT/LH2 1.4e-09 neurotrypsin) domain) מ S8 Subtilase-1 PFE0370c CAA05627 (P. falciparum, 0 PF00082 (Subtilase 2.6e-15 subtilase-1) family) מ Subtilase-2 PF11_0381 CAB43592 (P. falciparum, 0 PF00082 (Subtilase 1.6e-35 Subtilase-2) family) מ Subtilase-like PFE0355c CAA05627 (P. falciparum, 2e-22 PF00082 (Subtilase 5.6e-19 subtilase-1) family) מ S9 ACPH PFC0950c P13676 (Rat, ACPH) 1e-17 PF00561 (␣/␤ 0.021 fold) S14 Clp PFC0310c P54416 (Synechocystis sp 6e-42 PF00574 (Clp protease) 1.8e-65 + Clp1) Clp PF14_0348 NP_567521 (Arabidopsis 2e-29 PF00574 (Clp protease) 3.0e-37 + clp) מ ClpB PF08_0063 AAA88777 (P. berghei, 0 PF00574 (Clp protease) 1.0-16 ClpB) ClpB PF14_0063 NP_439019 (Haemophilus 3e-95 PF00004 (AAA) 7.1e-06 + influenzae ClpB) מ ClpC PF11_0175 T07807 (Soybean clp) 1e-154 PF00574 (Clp protease) 0.0026 מ S16 Lon PF14_0147 AAA61616 (Human, Lon) 4e-53 PF00004 (AAA) 1.4e-17 מ S26A SP1 PF13_0118 T40251 (Fission yeast, 2e-10 PF00461 (Signal 0.055 IMP) peptidase) מ S26B signalase MAL13P1.167 AAD19813 (Drosophila, 3e-46 PF00461 (Signal 3.7e-19 signalase SPC21) peptidase) מ S54 Rhomboid PFE0340c NP_523536 (Drosophila, 6e-07 PF01694 (Rhomboid 5.5e-27 rhomboid-5) family) מ Rhomboid MAL8P1.16 NP_654179 (Bacillus 2e-07 PF01694 (Rhomboid 3.8e-27 anthracis rhomboid) family) מ Threonine T1 Proteasome ␣1 PF14_0716 P92188 (Trypanosoma 8e-57 PF00227 (Proteasome) 7.7e-39 cruzi, ␣1) מ Proteasome ␣2 MAL6P1.88 O9LSU2 (Rice, ␣2) 3e-73 PF00227 (Proteasome) 1.4e-49 מ Proteasome ␣3 PFC0745c O24362 (Spinach, ␣3) 3e-44 PF00227 (Proteasome) 2.0e-21 מ Proteasome ␣4 PF13_0282 O81148 (Arabidopsis, ␣4) 1e-64 PF00227 (Proteasome) 7.4e-47 מ Proteasome ␣5 PF07_0112 Q95083 (Drosophila, ␣5) 4e-70 PF00227 (Proteasome) 2.1e-47 מ Proteasome ␣6 MAL8P1.128 Q9LSU3 (Rice, ␣6) 4e-52 PF00227 (Proteasome) 2.8e-30 מ Proteasome ␣7 MAL13P1.270 O24616 (Arabidopsis, ␣7) 3e-66 PF00227 (Proteasome) 8.5e-43 מ Proteasome ␤1 PFE0915c P42742 (Arabidopsis, ␤1) 3e-44 PF00227 (Proteasome) 1.2e-39 מ Proteasome ␤2 MAL8P1.142 Q9LST6 (Rice, ␤2) 1e-35 PF00227 (Proteasome) 2.9e-17 מ Proteasome ␤3 PFA0400c P25451 (Yeast, ␤3) 8e-35 PF00227 (Proteasome) 1.5e-18 מ Proteasome ␤4 PF14_0676 XP_079788 (Drosophila, 4e-36 PF00227 (Proteasome) 1.4e-22 ␤4) מ Proteasome ␤6 PFI1545c O43063 (Fission yeast, ␤6) 4e-26 PF00227 (Proteasome) 1.1e-10 מ Proteasome ␤7 PF13_0156 Q99436 (Human, ␤7) 1e-74 PF00227 (Proteasome) 5.5e-35

The cut-off criteria of E-score Յ 1e-04 was employed to define protease homologs. The nomenclature of the protease family is: A1 (), C1 (papain), C2 (calpain), C12 (ubiquitin carboxyl-terminal hydrolase, family 1, UCH1), C13 (hemoglobinase), C14 (), C19 (ubiquitin C-terminal hydrolase family 2, UCH2), C48 (Ubiquitin-like protease, Ulp), C56 (DJ-1 peptidase), M1 (alanyl aminopeptidase, AMPN), M3 (thimet ), M16 (pitrilysin and mitochondrial processing peptidase, MPP), M17 (, AMPL), M18 (aspartyl aminopeptidase, DNPE), M22 (O-sialoglycoprotein , GCP), M24A (methionyl aminopeptidase, AMPM), M24B (X-Pro dipepti- dase, AMPP), M41 (FtsH endopeptidase), S1 (trypsin), S8 (subtilsin), S9 (acylaminoacyl-peptidase, ACPH), S14 (clp), S16 (Lon protease, La), S26A (prokarytotic I, SP1), S26B (signalase), S54 (rhomboid), and T1 (threonine endopeptidase). Abbreviations for proteases include: PM, plasmepsin; DPPI, dipeptidyl-peptidase I; UBP, ubiquitin-specific protease; IMP, mitochondrial inner membrane peptidase; SPC21, microsomal signal peptide 21 kDa subunit. Previously characterized proteases with proteolytic activity are highlighted in bold. The 23 proteases predicted by the official annotation published in PlasmoDB are highlighted in italic. Potential candidate proteases Calpain, Metacaspase, and (SP1) are highlighted in bold italic. .indicate the gene is predicted to contain/not contain an transit peptide עa

604 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

a large number of predicted and characterized proteases (hu- (Rawlings and Barrett 1993). Using the comparative database man, 493; mouse, 431; Drosophila melanogaster, 529; Cae- search, we detected a total of 59 new protease homologs, in norhabditis elegans, 360; Arabidopsis thaliana, 568; Baker’s addition to 12 characterized proteases with proteolytic activ- yeast, 112; Escherichia coli, 127; Bacillus subtilis, 119). An av- ity and 21 predicted by official annotation (Table 1). More- erage of 2.21% of the gene products belong to the protease over, a spectrum of conserved core characteristic domains/ superfamily in 77 completed genomes. Hence, given the ob- motifs for specific catalytic classes has been detected in most servation that the number of predicted proteases appears to be of the predicted proteases, indicating their potential activity. positively correlated with organismal complexity, one might The 92 putative proteases belong to 26 families of five envisage that a considerable number of malaria proteases clans, compared to the previously reported 12 proteases that have yet to be identified in the ∼23 Mbp Plasmodium falciparum belong to six families of four clans (Rosenthal 2002). The dis- genome that encodes for approximately 5300 gene products. tribution (11% aspartic, 36% cysteine, 22% metallo, 17% ser- Here, we report a complete survey of protease homologs ine, and 14% threonine) resembles those in other model or- in the predicted and annotated P. falciparum genome (Gard- ganisms, supporting the fundamental premise that a proto- ner et al. 2002). Our initial comparative sequence search iden- type protease system is conserved throughout tified 92 putative malaria proteases, including potentially an (Rawlings and Barrett 1993; Southan 2001). Our speculation interesting calpain, a metacaspase, and a signal peptidase I. that a large number of potential proteases remain unexplored Their expressions have been evaluated by microarray and RT- in the P. falciparum genome appears justified. Undoubtedly, PCR assays. This study helps to develop an integrated view of some of the uncharacterized proteases will perform crucial a number of novel malarial proteases within an organismal, functions in the parasite life cycle as discussed below. evolutionary, and functional context, and offers an intriguing opportunity to further target expressed and active proteases Examples of Potentially Important Proteases for chemotherapy. Calpain RESULTS AND DISCUSSION Calpain is a group of intracellular cysteine proteases that me- diate a wide variety of physiological and pathophysiological Ninety-Two Putative Proteases Are Predicted processes, including , cell motility, apop- by Comparative Genomic Analysis tosis, and cell cycle regulation (Sorimachi et al. 1997; Glading To gain further insight into the proteolytic machinery of the et al. 2002). In P. falciparum, a calpain, yet unidentified, was malaria parasite, the protein sequences in the annotated P. believed to be essential in merozoite invasion, based on the falciparum genome were subjected to an exhaustive search observation that Calpain inhibitors I and II strongly blocked against the Merops protease database, which has a catalog and invasion (Olaya and Wasserman 1991). a structure-based classification of proteases. We adopted a We have identified a putative calpain (MAL13P1.310) in relatively stringent threshold of E Յ 1e-04 for BLASTP to en- the P. falciparum genome, which exhibits high sequence simi- sure the high coverage with low false-positives. Redundant larity to C. elegans calpain-7 (E=2e-35). Moreover, its ortholog hits and partial sequences were excluded, resulting in a total (accession no. EAA19663) has been identified in the newly of 92 protease homologs (Table 1). As highlighted in the Pro- released genome of the model rodent malaria parasite Plasmo- tease nomenclature column in Table 1, all twelve previously dium yoelii yoelii. It possesses a catalytic domain (985–1453) characterized proteases with proteolytic activity are included. detected by the Hidden Markov Model in the pfam search, In addition, as highlighted in the Gene ID column, 23 out of with E = 8.0e-13 (Fig. 1). The most intriguing aspect of this 25 proteases predicted by first-pass annotation published in domain is the presence of three active sites (Cys1035, PlasmoDB are included, among which 1 and 2 have His1371, and Asn1391) that constitute a cleft crucial for cata- been demonstrated to possess proteolytic activity; PFI0660c is lytic activity (Arthur et al. 1995). A multiple alignment of the not included because the E-score (0.39) of its closest homolog catalytic regions was produced for the putative plasmodial (Bacillus anthracis CAAX amino terminal protease, accession calpain and the representative human . In addition to number NP_655263) is far below the cut-off 1e-04; PF11_0314 the invariable Cys-His-Asn triad, a high degree of identity is is not included because it is more likely to possess ATP hy- also observed in its vicinity, reflecting stringent functional drolytic and regulatory function than protoelytic function and mechanistic conservation (Fig. 1). Indeed, the experimen- based on . tal demonstration that a single catalytic subunit in rat and The domain/motif organization of predicted proteases chicken calpains possesses a full bona fide proteolytic activity was revealed by the InterPro Search. For each putative malaria (Yoshizawa et al. 1995) reinforces the potential processing protease, the known protease sequence or protease domain of capacity of the putative plasmodial calpain. the highest similarity was used as a reference for annotation; Our further phylogenetic analysis of the putative P. fal- the catalytic type and protease family were predicted in ac- ciparum calpain revealed its striking origin, which might have cordance with the classification in the protease database Me- attributed to an alternative Ca2+-independent regulatory rops (http://www.merops.co.uk/merops/merops.htm), and mechanism. Figure 2 shows the evolutionary tree inferred by the was named in accordance with the SWISS-PROT the neighbor-joining (NJ) method using Poisson corrected enzyme nomenclature (http://www.expasy.ch/cgi-bin/ distance (Saitou and Nei 1987). Evolutionary trees based on lists?peptidas.txt) and literatures. Parsimony (PAUP4.0) and Maximum Likelihood (PHYLIP) also yielded topologies and clade structures congruent with New Catalytic Types and Families NJ (data not shown). Apparently, two putative plasmodial Proteases are classified into five major clans (Aspartic, Cyste- calpains belong to a novel monophyletic group of ine, Metallo, Serine, and Threonine) based on their catalytic calpain-7 proteases, with 61% bootstrap support. They share mechanisms. They can be further grouped into distinct fami- the common domain architecture in the calpain-7 clade: lack- lies and subfamilies by intrinsic evolutionary relationships ing any significant similarity to the C-terminal EF-hand Ca2+-

Genome Research 605 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Figure 1 Multiple alignment of the catalytic domains of the putative P. falciparum calpain (MAL13P1.310) and the representative human calpains using T-coffee program followed by manual correction. The catalytic domain region is predicted to be from residue 985 to 1453 by pfam HMM algorithm. The three conserved amino acids, C(1035), H(1371), N(1391), that are part of the active sites are highlighted with arrowheads. Graphic presentation of the alignment and the consensus sequence were obtained by the program BOXSHADE 3.21. Conserved residues are shaded with black and gray. The accession numbers of calpain protein sequences used for alignment refer to Figure 2. binding domain present in most of the essential Ca2+- fungi PalB, the nearest neighbor of calpain- dependent mammalian calpain subtypes (calpains -1, -2, -3, 7, contains a PBH domain resembling the Ca2+-binding do- -9, -11, and Mu/M-type) (Franz et al. 1999). Provided that main (Denison et al. 1995), one could speculate that the loss

606 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

Figure 2 The of the calpains, inferred by the neighbor-joining method based on the amino acid sequences with Poisson corrected distance. The option of complete deletion of gaps was used for tree construction. 1000 bootstrap replicates were used to infer the reliability of branching points. Bootstrap values of >50% are presented. The scale bar indicates the number of amino acid substitutions per site. The parasite sequences are underlined. The putative P. falciparum calpain and P. yoelii yoelii ortholog are highlighted in red and blue, respectively. The accession number for each sequence is included in the parentheses after the species name. of Ca2+ dependency in calpain-7 subtype had been derived Ca2+-independent plasmodial calpain may serve as a good an- from evolutionary events such as domain shuffling, which timalarial target for two reasons. First, it may be the central might be associated with the divergence of mRNA splicing component of crucial signal transduction pathways that af- sites (Craik et al. 1983). Such events appear to have occurred fect parasite and host-parasite interactions. Second, close to or prior to the origin of the animal (Fig. 2). because it is evolutionarily divergent from the essential sub- The identification of plasmodial calpain has also impli- types of host calpains, its specific inhibitor may have minimal cated the existence of calpain-mediated pathways. Its poten- effects on the host. tial cognate targets include host cytoskeletal proteins such as spectrin, integrin, and ezrin. Moreover, the recent discovery Metacaspase of a typical endogenous substrate of calpain, Protein Kinase C Metacaspase (PF13_0289) is another interesting hypothetical (PFL1110c; PFI1685w) in P. falciparum, has provided the sup- protease. In , a cascade of (cysteine aspar- port of a parasite-controlled signaling cascade (Doerig et al. tyl proteases) is the major modulator of (pro- 2002). grammed cell death) (Thornberry and Lazbnik 1998; Aravind It is conceivable that the putative protease-active and et al. 1999). Two families of ancient caspase-like proteins

Genome Research 607 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

(paracaspases and ) have been found in metazo- yeast metacaspase and the plasmodial homolog exhibit dis- ans, fungi, and . As shown in the phylogenetic tree tinct sequence profile to other vertebrate caspases and human (Fig. 3), the putative plasmodial metacaspase occupies a dis- paracaspase. Previously, Uren et al. (2000) have postulated tinct clade constituting paracaspases and metacaspases, that ancient (paracaspases and metacaspases) and vertebrate which are likely to be the primordial form of 14 subfamilies of subtypes differ in substrate-specificity. We have demonstrated vertebrate caspases (bootstrap value = 100%). Interestingly, that the experimentally confirmed differential substrate- human paracaspase is capable of interacting with the onco- specificity in major vertebrate subtypes is largely determined gene Bcl10 and triggering NF-kB activation, indicative of the by the chemical property and configuration of residues situ- prone-to-apoptosis property of the ancestral caspase (Uren et ated in the caspase fold (Wang and Gu 2001). Thus, the ob- al. 2000). Moreover, yeast metacaspase has been demon- served distinct configuration of residues in the strated as an effective executor for apoptosis, suggesting the proximity could account for parasite-specific substrate- root of apoptosis dates back to unicellular organisms (Madeo preference. et al. 2002). The multiple alignment clearly reveals that the In Plasmodium, the physiological process of apoptosis has putative plasmodial metacaspase retains the typical caspase never been reported, nor the critical components identified. fold, which is centered with the His (404)-Cys (460) catalytic Nevertheless, the detection of the metacaspase homolog will dyad conserved in all representative proteolytically active allow us to investigate the role, if any, of apoptosis and/or caspases (Fig. 4). Conversely, considerable sequence diversity analogous signal transduction pathway in the parasite. In ad- is observed in the vicinity of this active site cleft. In particular, dition, since metacaspases have only been found in protozo-

Figure 3 The phylogenetic tree of the caspases, inferred by the neighbor-joining method based on the amino acid sequences with Poisson corrected distance. The option of complete deletion of gaps was used for tree construction. 1000 bootstrap replicates were used to infer the reliability of branching points. Bootstrap values of >50% are presented. The scale bar indicates the number of amino acid substitutions per site. The protozoan sequences are underlined.

608 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

Figure 4 Multiple alignment of the catalytic regions of the putative P. falciparum metacaspase (PF13_0289) and the representative proteolytically active caspases using Clustal X1.8 followed by manual correction. The catalytic dyad H (404) and C (460), which are part of the active sites, is highlighted with arrowheads. Graphic presentation of the alignment and the consensus sequence were obtained by the program BOXSHADE 3.21. Conserved residues are shaded with black and gray. The accession numbers of caspase protein sequences used for alignment refer to Figure 3. ans, yeasts, and possibly in , and are phylogenetically the typical Ser/His/Asp triad system in other serine proteases. distinct from other caspase subtypes (Fig. 3), the putative plas- It seems plausible that the putative plasmodial SP1 has a fun- modial metacaspase may serve as a potential chemotherapeu- damental role yet to be determined, and represents a promis- tic target. ing target given its distant relatedness to the host.

Signal Peptidase 1 (SP1) Important Protease-Mediated Pathways Implicated in P. falciparum Signal peptidases (SP) play indispensable roles in protein traf- Our findings suggested at least five new protease-mediated ficking and sorting by removing signal peptides from precur- activities: (1) an ATP-dependent ubiquitin-proteasome- sors of secretary proteins. This serine protease family consists mediated cell-cycle control and stress-response system of two subtypes, SP1 and signalase, based on their distinct (Verma et al. 2002). Although the mechanism by which pro- structural, functional, and evolutionary features. To date, SPs teasomes function in P. falciparum is poorly understood, their have been found in , , fungi, plants, and ani- importance was suggested by the observed irreversible inhibi- mals; however, SP has never been reported previously in pro- tion on the growth and development of the hepatic and tists, despite the fact that the dynamic parasite life cycle re- erythrocytic stages of three different Plasmodium species by flects a need of specific peptidase(s) to process proteins that Lactacystin, a specific threonine protease inhibitor (Gantt et are translocated across host and parasite membranes. Using al. 1998). The identification of the clade of threonine prote- the comparative genomic search, we first identified two ho- ases ␣ and ␤, and a series of ubiquitinyl (UCH1 and mologs of signal peptidase, PF13_0118 (SP1) and MAL13P1.167 UCH2) brings new insight into this universally conserved pro- (signalase) in P. falciparum. teasome machinery (Table 1). (2) A lysosomal . Between two subtypes, SP1 has generated extensive re- This selective pathway to degrade cytosolic proteins may in- search interest because it represents a novel antibiotic target volve a number of with versatile functions, which for its distinct prokaryotic origin and essential functions are assisted by cytosolic and lysosomal molecular chaperones (Paetzel et al. 2000). We have also identified an ortholog of P. and receptor proteins in the lysosomal membrane. (3) A cal- falciparum SP1 in the rodent parasite P. yoelii yoelii genome. pain-activated signal transduction cascade, which may work Our evolutionary analysis revealed that the two putative plas- in conjunction with upstream modulator and downstream modial SP1 have three clusters of homologs: (1) Bacteria SP1; effectors of host or parasite origin. (4) A caspase-mediated (2) an Arabidopsis chloroplast thylakoidal processing pepti- apoptosis or apoptosis-like signal transduction pathway. Al- dase; and (3) mitochondrial inner membrane peptidases though yeast metacaspase has been confirmed to induce (Imp) found in , which appear to be the nearest apoptosis, the classical apoptosis regulators appear to be miss- neighbor to plasmodial SP1 (Fig. 5). Given the proposed pro- ing in the yeast genome. Thus, it is desirable yet challenging karyotic origin of the chloroplast and , ma- to identify the key components in this pathway, which may larial SP1 is likely to have evolved via the prokaryotic-specific be conserved across organisms, or be parasite-specific. (5) A lineage. Moreover, the potential of its catalytic activity can be signal peptidase-initiated precursor protein processing path- inferred from the comparative sequence analysis. The puta- way. tive SP1 contains the catalytic dyad (Ser175, Lys274) that is invariable across representative SP1 proteins with confirmed Evolutionary Implications signal peptidase activity (Fig. 6). Most notably, this Ser/Lys Studying the origin and the evolutionary mechanisms behind catalytic dyad mechanism is unique in SP1, compared with plasmodial proteases will contribute to the selection of target

Genome Research 609 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Figure 5 The phylogenetic tree of the Signal peptidases I (SP1), inferred by the neighbor-joining method based on the amino acid sequences with Poisson corrected distance. The option of complete deletion of gaps was used for tree construction. 1000 bootstrap replicates were used to infer the reliability of branching points. Bootstrap values of >50% are presented. The scale bar indicates the number of amino acid substitutions per site. The putative P. falciparum calpain and P. yoelii yoelii ortholog are highlighted in red and blue, respectively. The SP1 homolog in Arabidopsis is termed Chloroplast thylakoidal processing peptidase. Imp is the abbreviation for mitochondrial inner membrane peptidase. proteases to be studied in detail, for which specific inhibitors may evolve at different rates and thus achieve novel functions with no or minimal effect on the host can be designed. A (Fast et al. 2001). The first target is the apicoplast, complex evolutionary scenario including gene duplication, the apicomplexan-specific plastid derived by secondary re- domain shuffling, and lateral gene transfer has been impli- duction of a red alga endosymbiont. Since the plastid- cated in the preliminary analysis of the predicted proteolytic encoded gene is of prokaryotic origin, its inhibitor may have machinery in P. falciparum. Gene duplication is believed to only a minor, if any, effect on the vertebrate host and there- play important roles in the evolution of multigene families by fore may represent a promising antimalarial target. Our pre- providing raw material for the novel functionality under dif- liminary analysis shows that the putative clpC gene ferential evolutionary constraints (Ohno 1970; Li 1983; Fried- “PF11_0175” matches one apicoplast-encoded gene (Wilson man and Hughes 2001; Gu et al. 2002; McLysaght et al. 2002). et al. 1996). Moreover, 14 predicted proteases may contain an In P. falciparum, well-characterized falcipains (-1, -2, -3), ples- apicoplast transit peptide, among 511 identified in the mepsins (I-IV), and subtilases (-1, -2) exemplify the multigene entire parasite genome by pattern-recognition program PATS families that arise from gene duplications (Coombs 2001; (Predict Apicoplast-Targeted Sequences) (Zuegge et al. 2001). Rosenthal 2002). We have identified a series of putative pro- From the population perspective, we would antici- teases that may comprise multigene families (Table 1). Some pate detecting a certain level of polymorphism among puta- reflect tandem gene duplications in adjacent tive proteases, due to the ancient origin of P. falciparum as loci. For example, eight SERA homologs aggregate as a cluster revealed by chromosome-wide SNP analysis (Verra and in chromosome 2 contig 11953 (Miller et al. 2002). In con- Hughes 1999; Mu et al. 2002; Wootton et al. 2002). However, trast, some potential paralogs are located in remote chromo- the alternative Malaria’s eve hypothesis of a severe recent some regions. For instance, the UCH2 family with the con- population bottleneck may still be valid (Rich et al. 1998; sensus domain is sparsely distributed over seven chromo- Volkman et al. 2001). More detailed analysis of the genomics somes. This suggests that ancient gene duplications and and proteomics of plasmodial proteases will help resolve these subsequent functional divergence may result in an extensive fundamental questions about P. falciparum evolution. repertoire of the present multigene families. In addition to gene duplication, domain shuffling coupled with the splice- Eighty-Three Putative Proteases Are Actively site variation, intron loss, and are proposed to be important modes in the evolution of aspartic Transcribed in the Intraerythrocytic Stage, proteases in the parasite genus , including P. fal- and Sixty-Seven Are Actively Translated ciparum (Jean et al. 2001). The proteases encoded by or des- in the Life Cycle tined to parasite are of particular interest because We are bearing in mind that genome analysis based solely on organelles represent microenvironments in which proteases sequence similarity clearly predicts many unknown putative

610 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

Figure 6 Multiple alignment of the catalytic regions of the putative P. falciparum SP1 (PF13_0118) and the representative proteolytically active signal peptidase I using T-coffee program followed by manual correction. The catalytic dyad Ser (S175) and Lys (K274), which are part of the active sites, is highlighted with arrowheads. Graphic presentation of the alignment and the consensus sequence were obtained by the program BOXSHADE 3.21. Conserved residues are shaded with black and gray. The accession numbers of SP1 protein sequences used for alignment refer to Figure 5. malaria proteases, however, these are only predictions. Which rifin, and stevor) as internal references (Hayward et al. 2000; of the 59 newly predicted proteases, in addition to the 12 Ben Mamoun et al. 2001). As anticipated, all gametocyte-specific characterized proteases and 21 proteases annotated previ- genes, 39 of 45 var genes, 99 of 118 rifin genes, and 12 of 14 ously, are true protein-encoding genes expressed in the para- stevor genes displayed signal intensities below the level of the site life cycle? This important question was first addressed by negative controls (data not shown). These data further support analyzing an en masse profile using microar- our conclusion that 75 predicted proteases are actively tran- ray chips, and then followed by RT-PCR confirmation. scribed during the erythrocytic stage. Interestingly, the putative multigene families such as SERA and UCH2 exhibit variable ex- Microarray pression levels across paralogous members, reflecting a certain We focused on the parasite expression profiles of the asexual level of functional divergence after gene duplication events. erythrocyte stage not only because this stage is responsible for We also analyzed two microarray datasets published in malaria clinical manifestations, but also because of the acces- the PlasmoDB. The first dataset includes the expression pro- sibility of the research materials. In order to obtain all genes file of two erythrocyte stages (Trophozoites and Schizonts) transcribed throughout the erythrocyte stage of the parasite, using the Oligo Microarray (Hayward et al. 2000). The result we extracted and pooled mRNAs from P. falicarpum 3D7 cul- of 66 predicted proteases transcribed in at least one stage sup- ture samples collected at four 12-h intervals. Figure 7 shows ported our finding that the majority of the predicted prote- the temporal development of parasites that includes rings, ases were actively transcribed during the erythrocyte stage trophozoites, schizonts, and merozoites, indicating that an (Table 2). The second dataset represents the first proof-of- asynchrony was successfully achieved. Probes were labeled concept experiment of using cDNA microarray to explore the with fluorescent dyes using mRNAs purified from the asyn- expression profile of five erythrocytic forms and stages (Ben chronous culture as a template, and then hybridized to the Mamoun et al. 2001). Among 944 elements or gene fragments microchip arrayed with 6239 Malaria Genome Array Oligo- (317 genes of identifiable homology) included in the probe mers (Operon Technologies). design, eight corresponded to predicted proteases. The posi- Results, summarized in Table 2, clearly demonstrated tive signals of seven genes are consistent with our result from that 75 predicted proteases have signal intensities higher than asynchronous culture. The stage-specific profile also con- those of negative controls. Being aware that the cut-off value firmed the ubiquitous expression of the putative proteosome for signal intensity is controversial, and that using the average ␤6 (PFI1545c), which does not have corresponding 70-mer in intensity of the negative controls may be somewhat arbitrary, the Oligo Microarray. we selected the gametocyte-specific proteins (CS protein TRAP-related protein, Pfs25, Pfs48/45, Pfg377, and a gameto- Reverse Transcription Polymerase Chain Reaction cyte-specific var) and three large gene families in which the Among the 17 remaining predicted proteases that are not de- majorities are silent due to clonal expression switching (var, tected using microarray hybridization, seven showed signal

Figure 7 Four P. falciparum 3D7 culture samples were collected at 12-h intervals, and pooled to achieve a total asynchrony.

Genome Research 611 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Table 2. Expression Profiles of Putative Plasmodial Proteases intensity below the negative con- trols. One possibility is that some of b c Oligo Microarray Proteomics them are expressed in stages other than the asexual erythrocytic ones. Gene ID Oligo Microarray (asyn)a TSTMGS This could be further investigated PF14_0076 21,915 7595 6935 + + + by using RNAs extracted from the PF14_0077 6544 6084 3034 + + + intraerythrocytic and extraerythro- PF14_0078 2981 16,185 7152 + + + cytic stages. The remaining ten pre- PF14_0075 20,151 20,979 21,090 + + + dicted proteases were not included PF13_0133 1521 5636 4810 in the oligomer set printed on the ממ PFC0495w* 454 + ∼ array slides because only 90% of ממ PF10_0329 76 the P. falciparum genome data was ממ PF14_0623 215 PF14_0281 No oligo available when the oligomers were designed. To examine whether + מ PF08_0108 5561 2478 PF14_0553 3187 4947 4954 + these predicted proteases were also PF11_0165 8073 17,895 4771 + expressed in the erythrocyte stage, PF11_0162 9302 4559 5676 + + PF11_0161 2436 7318 2025 + we designed specific primers and performed RT-PCR using the RNA++ממ PFL2290w* 467 PFD0230c 8342 2473 25,913 + extracted from the asynchronous PF11_0174 19,855 36,201 19,570 + + culture (Fig. 7) as templates. Data ממ PFB0360c 376 shown in Figure 8A clearly suggests + ממ PFB0325c* 179 that all ten predicted genes were ac- PFB0330c 2415 2238 5476 + + PFB0335c 3428 1695 4803 tively transcribed. PFB0340c 28,613 13,254 59,511 + + As mentioned above, P. falcipa- PFB0345c 2273 3115 10,053 + rum calpain, metacaspase, and -signal peptidase1 are of particular in + + 6320 מ PFB0350c 4572 ממ PFB0355c 1401 terest due to the potential biological PFI0135c 3655 2747 3516 + MAL13P1.310 676 1635 53,119 roles they may play. The microarray analysis suggested that the predicted + 2060 מ PF14_0577 1008 PF11_0177 781 2058 3604 + genes for these proteases were ac- tively transcribed (Table 2). We also + 2514 מ PF11_0298 700 מ PF13_0289 643 1550 performed RT-PCR to further con- PFA0220w3831 5734 10,100 + firm their expression (Fig. 8B). + ממ PFD0165w2208 PFD0680c 894 1696 2387 + The microarray and RT-PCR PFE1355c 2840 2251 2414 + + + data only indicated the active tran- .scription of 85 predicted proteases + 1831 מ PFE0835w2422 MAL7P1.147 6182 3575 3628 + + + + In order to examine expression at PFI0225w3780 1998 2117 + the level of translation, we analyzed PF13_0096 2105 3554 3789 the proteomics data published in + מ PF14_0145 1826 2428 PFL1635w No oligo ++ PlasmoDB (Florens et al. 2002). It MAL8P1.157 2311 2226 1946 + appeared that 67 out of 92 pre- MAL6P1.153 No oligo ++ + dicted proteases are translated at MAL13P1.56 2788 3250 1588 + + + + some point during the life cycle PF10_0058 1765 3672 4309 (Table 2). Some proteases are ubiq- ממ MAL13P1.184 324 PFE1155c 3316 3099 2134 + + + uitous, whereas others show stage- PFI1625c 2018 4583 4239 + + specific expression. It was notable PF11_0189 4006 4829 3862 + + that the three predicted proteases -that did not have detectable tran+++ממ PF11_0226 1002 PF13_0322 4279 8553 8017 + + + + scription from the microarray assay ממ PF14_0382 790 did show positive translation in PF14_0439 4828 10,517 4692 + + + -specific stages including intraeryth ++++ מ PFI1570c 2424 2650 PF10_0299 1326 2972 1991 + + rocytic stages. PFE1360c 857 1919 1579 + Combining the complemen- ,tary results from microarray, RT-PCR + מ MAL8P1.140 3379 2033 מ PF10_0150 2880 2103 +++ and proteomics analysis, we found +++ממ PF14_0327 4130 that of the 92 putative proteases PF14_0517 7889 15,722 6630 + + + + ,identified by scanning the genome מ PF11_0203 2440 1695 PFL1925w No oligo + 88 were transcribed and 67 were translated at some stage in the life מ PF14_0616 5347 1855 -cycle. The remaining four may be ex ממ MAL8P1.126 1030 ממ PF14_0067 735 + pressed at extraerythrocytic stages + 5371 מ PFE0370c 3384 or may be pseudogenes, a result due to the frameshift in the open read- (continued) ing frame (Triglia et al. 2001).

612 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

strated to be actively transcribed Table 2. Continued proteins by the microarray, RT-PCR Oligo Microarrayb Proteomicsc data, and proteomics. This study is an initial attempt at the sys- Gene ID Oligo Microarray (asyn)a TSTMGStematic identification of novel ma- laria proteases that have essential PF11_0381 1071 2229 10,369 + functions and assessment of their מ PFE0355c 840 4153 evolutionary relationship to the PFC0950c No oligo + PFC0310c 3426 2308 3281 vertebrate host. By combining in PF14_0348 640 2518 2312 silico genomics-based predictions PF08_0063 3901 3233 1715 + + + + with experimental confirmation, PF14_0063 4125 2130 3154 there is an increased likelihood PF11_0175 8264 2856 4534 + + + + of identifying new therapeutic tar- ממ PF14_0147 1023 gets. PF13_0118 3050 2405 1676 MAL13P1.167 2758 2514 4077 METHODS 10,279 מ PFE0340c 6507 MAL8P1.16 No oligo PF14_0716 8791 9053 6476 + + + Genome Sequences,Homology MAL6P1.88 No oligo + PFC0745c 5944 9387 7742 + + Search,and Comparative PF13_0282 7652 7248 5541 + + Sequence Analysis PF07_0112 4279 8258 5730 + + + + A total of 5865 nonredundant MAL8P1.128 No oligo +++ MAL13P1.270 1487 6809 5159 + + + + query sequences of characterized PFE0915c 3396 5939 5761 + + and predicted proteases from 1066 MAL8P1.142 5571 5814 5579 + + + organisms were obtained from the PFA0400c No oligo ++++ Merops database (http://www. merops.ac.uk, release 5.8 of March + ממ PF14_0676 6984 PFI1545c No oligo + 19, 2002), which has a catalog and a PF13_0156 5240 9539 13,684 + + + structure-based classification of proteases. The BLASTP searches aExpression profile of asynchronous culture using Oligo Microarrays. The microarray slide was with default setting were targeted printed with 6239 70-mers mapped to 4407 predicted open reading frames. Probes were labeled to the predicted and annotated with fluorescent dyes using mRNAs purified from an asynchronous culture as a template (http:// Plasmodium falciparum genome that derisilab.ucsf.edu/). Briefly, messenger RNAs were purified using oligo T cellulose and reverse was published in the PlasmoDB transcription was conducted to incorporate aminoallyl dUTP into the cDNAs. The Cy3 and Cy5 (http://plasmodb.org/; Kissinger et NHS esters were then coupled to amine groups of the cDNA, and dye-labeled probes were hy- al. 2002). A cut-off criteria of E- bridized with the microarray slides under standard condition (3X SSC, 50% formamide, 0.1% SDS, value <1e-04 was adopted to define 10 mg/ml salmon sperm DNA, 68°C). The slide was scanned with a GenePix 4000B (Axon Instru- protease homologs. Partial se- ment) at default PMT settings, 100% power. The array data were analyzed initially with GenePixPro quences (<80% of full-length) and software (Axon Instrument), then with global normalization. The expression level is indicated by redundant sequences were ex- the mean signal intensity of all corresponding oligomers in triplicates on the microarray slides cluded. Conserved domains/motifs (MRA-452) obtained from Malaria Research and Reference Resource Center (http:// in P. falciparum sequences were www.malaria.mr4.org/). Ten predicted proteases without corresponding oligomers are highlighted identified by searching InterPro re- in bold. Two sets of negative controls were included in the DeRisi design: (1) 20 oligomers from lease 5.1, which integrates Pfam yeast intergenic region with the mean intensity 529; (2) 33 P. falciparum genes cloned in the plasmid, including 16 ribosomal proteins, 17 tRNA genes, LSU, Clp, and tufA. Their mean intensity 7.3, PRINTS 33.0, PROSITE 17.5, was 598. The percentiles of expression level over all the spots are 297 (30%), 394 (35%), 512 ProDom 2001.3, SMART 3.1, TI- (40%), 646 (45%), and 795 (50%), respectively. The genes that showed signal intensity belowthe GRFAMs 1.2, and the current mean of negative controls are highlighted in italic. An asterisk (*) indicates the gene was reported SWISS-PROT + TrEMBL data. to be expressed in the parasitic life cycle from proteomics data (Florens et al. 2002). Multiple alignments were ob- bExpression profile in the erythrocytic stage using cDNA microarray chip containing 944 elements tained by the program T-coffee (317 genes of identiable homology; Ben Mamoun et al. 2001). The average intensities were (Notredame et al. 2000), followed indicates signal not detected or belowthe by manual editing according to the (מ) extracted from PlasmoDB. R, ring. A minus sign cut-off (35% percentile over all the spots on the array). structure information. Graphic pre- cExpression profile in the parasite life cycle using MUDPIT proteomics technology (Florens et al. sentation of the alignment and 2002), extracted from PlasmoDB. A plus sign (+) indicates at least one peptide of the protein was consensus sequences were deduced detected by Mass Spectrum. T, trophozoites; M, merozoites; G, gametocytes; S, sporozoites. by the program BOXSHADE 3.21 (http://www.ch.embnet.org/ software/BOX_form.html). Phylo- genetic trees were inferred by the neighbor-joining method (Saitou Conclusions and Nei 1987) using MEGA2.0 (http://www.megasoft- The exhaustive homology search and comparative sequence ware.net/). Unweighted Maximum Parsimony (as imple- analysis have resulted in the delineation of 92 putative pro- mented in PAUP 4.0) and Maximum Likelihood (as imple- mented in PHYLIP) were used to examine whether the in- teases, including 59 that had not been previously recognized ferred phylogeny is sensitive to any tree-making method. The in the P. falciparum genome. This set includes potentially im- bootstrap resampling with 1000 pseudoreplicates was carried portant proteases such as calpain, metacaspase, and signal out to assess support for each individual branch. Bootstrap peptidase, and indicates protease-mediated activity that may values of <50% were collapsed and treated as unresolved po- be vital for parasite life cycle. Furthermore, 88 are demon- lytomies.

Genome Research 613 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

normalization. The expression level is indicated by the mean signal intensity of all corresponding oligomers in triplicates on the microarray slides (MRA-452) obtained from Malaria Research and Reference Resource Center (http://www. malaria.mr4.org/). Two sets of negative controls were in- cluded in the DeRisi design: (1) 20 oligomers from yeast intergenic region with the mean intensity 529, (2) 33 P. fal- ciparum genes cloned into a plasmid, including 16 ribosomal proteins, 17 tRNA genes, LSU, Clp, and tufA. Their mean in- tensity was 598. Reverse Transcription Polymerase Chain Reaction RT-PCR was performed using the same mRNA described above as template. Reverse transcription was conducted using Super- Script II (Invitrogen). The PCR cycle: 95°C 1 min; (95°C 1 min, 54°C 30 sec, 52°C 30 sec, 65°C 1 min) x 35, 65°C 10 min, hold at 4°C. The primer sequences used to amplify 10 target genes without corresponding oligomers in the array set, and puta- tive calpain, metacaspase, and signal peptidase I are included in the Supplemental Table 1.

ACKNOWLEDGMENTS We thank Lois Blaine, David Emerson, and Thomas Nerad for their critical comments during the manuscript preparation, Truc Nguyen for computational support. This study is sup- Figure 8 (A) Expression of ten putative proteases without corre- ported by an ATCC start-up fund to Y.W., and an NIH-grant sponding oligomers in the microarray set. RT-PCR was conducted to (1R21AI49300) to Y.W. We thank the scientists and funding examine the transcriptional expression of the ten putative proteases agencies comprising the International Malaria Genome using specific primers based on the prediction. Lane a in each sample Project for making sequence data from the genome of P. fal- represents the negative control in which RT-PCR was conducted with- ciparum (3D7) public prior to publication of the completed out reverse transcriptase. Lane 1b: PF14_0281; Lane 2b: PFL1635w; sequence. The Sanger Centre (UK) provided sequence data for Lane 3b: MAL6P1.153; Lane 4b: PFL1925w; Lane 5b: PFC0950C; Lane 1, 3–9, and 13, with financial support from the 6b: MAL8P1.16; Lane 7b: MAL6P1.88; Lane 8b: MAL8P1.128; Lane Wellcome Trust. A consortium composed of The Institute for 9b: PFA0400c; Lane 10b: PFI1545c.M indicates 1 kb DNA ladder. (B) Expression of putative calpain, caspase, and SP1 genes using two Genome Research, along with the Naval Medical Research pairs of primers. RT-PCR was conducted to confirm the transcriptional Center (USA), sequenced chromosomes 2, 10, 11 & 14, with expression of putative calpain, metacaspase, and SP-1 genes, using 2 support from NIAID/NIH, the Burroughs Wellcome Fund, and pairs of specific primers for each gene. Lanes “a” represent negative the Department of Defense. The Stanford Genome Technol- controls in which RT-PCR was conducted without reverse transcrip- ogy Center (USA) sequenced chromosome 12, with support tase.M indicates 1 kb DNA ladder. Lanes 1b and 2b: MAL13P1.310; from the Burroughs Wellcome Fund. The Plasmodium Ge- Lanes 3b and 4b: PF13_0289; Lanes 5b and 6b: PF13_0118. See nome Database is a collaborative effort of investigators at the Suppl. Table 1 for the predicted size of RT-PCR products. University of Pennsylvania (USA) and Monash University (Melbourne, Australia), supported by the Burroughs Well- come Fund. The publication costs of this article were defrayed in part Microarray Expression Analysis Using Asynchronous by payment of page charges. This article must therefore be Erythrocytic P. falciparum Culture hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact. An en masse gene expression profile was obtained using mi- croarray chips arrayed with 6239 Malaria Genome Array Oligomers (Operon Technologies), designed by Dr. Joe DeRisi REFERENCES of the University of California at San Francisco (http:// Aravind, L., Dixit, V.M., and Koonin, E.V. 1999. The domains of derisilab.ucsf.edu/). These 6239 70-mers mapped to 4407 pre- death: Evolution of the apoptosis machinery. Trends Biochem. Sci. dicted open reading frames which covered >90% of the avail- 24: 47–53. able P. falciparum genome sequences. In order to obtain all Arthur, J.S., Gauthier, S., and Elce, J.S. 1995. Active site residues in genes transcribed throughout the erythrocyte stage of the m-calpain: Identification by site-directed mutagenesis. FEBS Lett. parasite, we extracted and pooled mRNAs from P. falciparum 368: 397–400. 3D7 culture samples (Trager and Jensen 1976) collected at Banerjee, R., Liu, J., Beatty, W., Pelosof, L., Klemba, M., and four 12-h intervals to achieve an asynchrony (shown in Fig. Goldberg, D.E. 2002. Four plasmepsins are active in the Plasmodium falciparum food vacuole, including a protease with 3). Probes were labeled with fluorescent dyes using mRNAs an active-site histidine. Proc. Natl. Acad. Sci. 99: 990–995. purified from the asynchronous culture as a template. Mes- Barale, J.C., Blisnick, T., Fujioka, H., Alzari, P.M., Aikawa, M., sager RNAs were purified using oligo T cellulose, and reverse Braun-Breton, C., and Langsley, G. 1999. Plasmodium falciparum transcription was conducted to incorporate aminoallyl dUTP -like protease 2, a merozoite candidate for the into the cDNAs. The Cy3 and Cy5 NHS esters were then merozoite surface protein 1–42 maturase. Proc. Natl. Acad. Sci. coupled to amine groups of the cDNA, and dye-labeled probes 96: 6445–6450. were hybridized with the microarray slides under standard Ben Mamoun, C., Gluzman, I.Y., Hott, C., MacMillan, S.K., ,.SSC, 50% formamide, 0.1% SDS, 10 mg/mL Amarakone, A.S., Anderson, D.L., Carlton, J.M., Dame, J.Bןconditions (3 Chakrabarti, D., Martin, R.K., et al. 2001. Coordinated salmon sperm DNA, 68°C). The slide was scanned with a programme of gene expression during asexual intraerythrocytic GenePix 4000B (Axon Instrument) at default PMT settings, development of the human malaria parasite Plasmodium 100% power. The array data were analyzed initially with falciparum revealed by microarray analysis. Mol. Microbiol. GenePixPro software (Axon Instrument), then with global 39: 26–36.

614 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Prediction of Novel Malaria Proteases

Bernstein, N.K., Cherney, M.M., Loetscher, H., Ridley, R.G., and Hackett, F., Sajid, M., Withers-Martinez, C., Grainger, M., and James, M.N. 1999. Crystal structure of the novel aspartic Blackman, M.J. 1999. PfSUB-2: A second subtilisin-like protein in proteinase zymogen proplasmepsin II from Plasmodium Plasmodium falciparum merozoites. Mol. Biochem. Parasitol. falciparum. Nat. Struct. Biol. 6: 32–37. 103: 183–195. Blackman, M.J. 2000. Proteases involved in erythrocyte invasion by Hayward, R.E., Derisi, J.L., Alfadhli, S., Kaslow, D.C., Brown, P.O., the malaria parasite: Function and potential as chemotherapeutic and Rathod, P.K. 2000. Shotgun DNA microarrays and targets. Curr. Drug Targets 1: 59–83. stage-specific gene expression in Plasmodium falciparum malaria. Blackman, M.J., Fujioka, H., Stafford, W.H., Sajid, M., Clough, B., Mol. Microbiol. 35: 6–14. Fleck, S.L., Aikawa, M., Grainger, M., and Hackett, F. 1998. A Jean, L., Long, M., Young, J., Pery, P., and Tomley, F. 2001. Aspartyl subtilisin-like protein in secretory organelles of Plasmodium proteinase genes from apicomplexan parasites: Evidence for falciparum merozoites. J. Biol. Chem. 273: 23398–23409. evolution of the gene structure. Trends Parasitol. 17: 491–498. Braun-Breton, C., Rosenberry, T.L., and da Silva, L.P. 1988. Kissinger, J.C., Brunk, B.P., Crabtree, J., Fraunholz, M.J., Gajria, B., Induction of the proteolytic activity of a membrane protein in Milgram, A.J., Pearson, D.S., Schug, J., Bahl, A., Diskin, S.J., et al. Plasmodium falciparum by phosphatidyl inositol-specific 2002. The Plasmodium genome database. Nature 419: 490–492. phospholipase C. Nature 332: 457–459. Li, J., Matsuoka, H., Mitamura, T., and Horii, T. 2002. Coombs, G.H., Goldberg, D.E., Klemba, M., Berry, C., Kay, J., and Characterization of proteases involved in the processing of Mottram, J.C. 2001. Aspartic proteases of Plasmodium falciparum Plasmodium falciparum serine repeat antigen (SERA). Mol. and other parasitic protozoa as drug targets. Trends Parasitol. Biochem. Parasitol. 120: 177–186. 17: 532–537. Li, W-H. 1983. Evolution of duplicate genes and pseudogenes. In Craik, C.S., Rutter, W.J., and Fletterick, R. 1983. Splice junctions: Evolution of genes and proteins (eds. M. Nei and R.K. Keohn), pp. Association with variation in protein structure. Science 14–37. RK Sinauer Associates, Sunderland, MA. 220: 1125–1129. Madeo, F., Herker, E., Maldener, C., Wissing, S., Lachelt, S., Herlan, Curley, G.P., O’Donovan, S.M., McNally, J., Mullally, M., O’Hara, H., M., Fehr, M., Lauber, K., Sigrist, S.J., Wesselborg, S., et al. 2002. A Troy, A., O’Callaghan, S.A., and Dalton, J.P. 1994. caspase-related protease regulates apoptosis in yeast. Mol. Cell. from Plasmodium falciparum, Plasmodium 9: 911–917. chabaudi chabaudi and Plasmodium berghei. J. Eukaryot. McKerrow, J.H., Sun, E., Rosenthal, P.J., and Bouvier, J. 1993. The Microbiol. 41: 119–123. proteases and pathogenicity of parasitic protozoa. Annu. Rev. David, P.H., Hadley, T.J., Aikawa, M., and Miller, L.H. 1984. Microbiol. 47: 821–853. Processing of a major parasite surface during the McLysaght, A., Hokamp, K., and Wolfe, K.H. 2002. Extensive ultimate stages of differentiation in Plasmodium knowlesi. Mol. genomic duplication during early chordate evolution. Nat. Genet. Biochem. Parasitol. 11: 267–282. 31: 200–204. Delplace, P., Bhatia, A., Cagnard, M., Camus, D., Colombet, G., Miller, S.K., Good, R., Drew, D.R., Delorenzi, M., Sanders, P.R., Debrabant, A., Dubremetz, J.F., Dubreuil, N., Prensier, G., Fortier, Hodder, A.N., Speed, T.P., Cowman, A.F., De Koning-Ward, T.F., B., et al. 1988. Protein p126: A antigen and Crabb, B.S. 2002. A subset of Plasmodium falciparum SERA associated with the release of Plasmodium falciparum merozoites. genes are expressed and appear to play an important role in the Biol. Cell 64: 215–221. erythrocytic cycle. J. Biol. Chem. 277: 47524–47532. Denison, S.H., Orejas, M., and Arst Jr., H.N. 1995. Signaling of Mu, J., Duan, J., Makova, K.D., Joy, D.A., Huynh, C.Q., Branch, ambient pH in Aspergillus involves a cysteine protease. J. Biol. O.H., Li, W.H., and Su, X.Z. 2002. Chromosome-wide SNPs Chem. 270: 28519–28522. reveal an ancient origin for Plasmodium falciparum. Nature Doerig, C., Meijer, L., and Mottram, J.C. 2002. Protein kinases as 418: 323–326. drug targets in parasitic protozoa. Trends Parasitol. 18: 366–371. Narum, D.L. and Thomas, A.W. 1994. Differential localization of Dua, M., Raphael, P., Sijwali, P.S., Rosenthal, P.J., and Hanspal, M. full-length and processed forms of PF83/AMA-1 an apical 2001. Recombinant falcipain-2 cleaves erythrocyte membrane membrane antigen of Plasmodium falciparum merozoites. Mol. and protein 4.1. Mol. Biochem. Parasitol. 116: 95–99 Biochem. Parasitol. 67: 59–68. Eggleson, K.K., Duffin, K.L., and Goldberg, D.E. 1999. Identification Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A and characterization of falcilysin, a metallopeptidase involved in novel method for fast and accurate multiple . hemoglobin catabolism within the malaria parasite Plasmodium J. Mol. Biol. 302: 205–217. falciparum. J. Biol. Chem. 274: 32411–32417. Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin Fast, N.M., Kissinger, J.C., Roos, D.S., and Keeling, P.J. 2001. Olaya, P. and Wasserman, M. 1991. Effect of calpain inhibitors on Nuclear-encoded, plastid-targeted genes suggest a single common the invasion of human erythrocytes by the parasite Plasmodium origin for apicomplexan and plastids. Mol. Biol. falciparum. Biochim. Biophys. Acta. 1096: 217–221. Evol. 18: 418–426. Paetzel, M., Dalbey, R.E., and Strynadka, N.C. 2000. The structure Florens, L., Washburn, M.P., Raine, J.D., Anthony, R.M., Grainger, and mechanism of bacterial type I signal peptidases. A novel M., Haynes, J.D., Moch, J.K., Muster, N., Sacci, J.B., Tabb, D.L., et antibiotic target. Pharmacol. Ther. 87: 27–49. al. 2002. A proteomic view of the Plasmodium falciparum life Rawlings, N.D. and Barrett, A.J. 1993. Evolutionary families of cycle. Nature 419: 520–526. peptidases. Biochem. J. 290: 205–218. Florent, I., Derhy, Z., Allary, M., Monsigny, M., Mayer, R., and Rich, S.M., Licht, M.C., Hudson, R.R., and Ayala, F.J. 1998. Malaria’s Schrevel, J. 1998. A Plasmodium falciparum aminopeptidase gene Eve: Evidence of a recent population bottleneck throughout the belonging to the M1 family of zinc-metallopeptidases is world populations of Plasmodium falciparum. Proc. Natl. Acad. Sci. expressed in erythrocytic stages. Mol. Biochem. Parasitol. 95: 4425–4430. 97: 149–160. Rosenthal, P.J. 1998. Proteases of malaria parasites: New targets for Franz, T., Vingron, M., Boehm, T., and Dear, T.N. 1999. Capn7: A chemotherapy. Emerg. Infect. Dis. 4: 49–57. highly divergent vertebrate calpain with a novel C-terminal Rosenthal, P.J. 2002. Hydrolysis of erythrocyte proteins by proteases domain. Mamm. Genome. 10: 318–321. of malaria parasites. Curr. Opin. Hematol. 9: 140–145. Friedman, R., and Hughes, A.L. 2001. Pattern and timing of gene Rosenthal, P.J., Kim, K., McKerrow, J.H., and Leech, J.H. 1987. duplication in animal genomes. Genome Res. 11: 1842–1847. Identification of three stage-specific proteinases of Plasmodium Gantt, S.M., Myung, J.M., Briones, M.R., Li, W.D., Corey, E.J., falciparum. J. Exp. Med. 166: 816–821. Omura, S., Nussenzweig, V., and Sinnis, P. 1998. Proteasome Saitou, N. and Nei, M. 1987. The neighbor-joining method: A new inhibitors block development of Plasmodium spp. Antimicrob. method for reconstructing phylogenetic trees. Mol. Biol. Evol. Agents Chemother. 42: 2731–2738 4: 406–425. Gardner, M.J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, Salmon, B.L., Oksman, A., and Goldberg, D.E. 2001. Malaria parasite R.W., Carlton, J.M., Pain, A., Nelson, K.E., Bowman, S., et al. exit from the host erythrocyte: A two-step process requiring 2002. Genome sequence of the human malaria parasite extraerythrocytic proteolysis. Proc. Natl. Acad. Sci. 98: 271–276. Plasmodium falciparum. Nature 41: 498– 511. Shenai, B.R., Sijwali, P.S., Singh, A., and Rosenthal, P.J. 2000. Glading, A., Lauffenburger, D.A., and Wells, A. 2002. Cutting to the Characterization of native and recombinant falcipain-2, a chase: Calpain proteases in cell motility. Trends Cell Biol. principal trophozoite cysteine protease and essential 12: 46–54. hemoglobinase of Plasmodium falciparum. J. Biol. Chem. Gu, X., Wang, Y., and Gu, J. 2002. Age distribution of human gene 275: 29000–29010. families shows significant roles of both large- and small-scale Silva, A.M., Lee, A.Y., Gulnik, S.V., Maier, P., Collins, J., Bhat, T.N., duplications in vertebrate evolution. Nat Genet. 31: 205–209. Collins, P.J., Cachau, R.E., Luker, K.E., Gluzman, I.Y., et al. 1996.

Genome Research 615 www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Wu et al.

Structure and inhibition of plasmepsin II, a prediction. Genetics 158: 1311–1320. hemoglobin-degrading enzyme from Plasmodium falciparum. Proc. Wilson, R.J., Denny, P.W., Preiser, P.R., Rangachari, K., Roberts, K., Natl. Acad. Sci. 93: 10034–10039. Roy, A., Whyte, A., Strath, M., Moore, D.J., Moore, P.W., et al. Sorimachi, H., Ishiura, S., and Suzuki, K. 1997. Structure and 1996. Complete gene map of the plastid-like DNA of the malaria physiological function of calpains. Biochem. J. 328: 721–732. parasite Plasmodium falciparum. J. Mol. Biol. 261: 155–172. Southan, C. 2001. A genomic perspective on human proteases. FEBS Wootton, J.C., Feng, X., Ferdig, M.T., Cooper, R.A., Mu, J., Baruch, Lett. 498: 214–218. D.I., Magill, A.J., and Su, X.Z. 2002. Genetic diversity and Thornberry, N.A. and Lazbnik, L. 1998. Caspases: Enemies within. chloroquine selective sweeps in Plasmodium falciparum. Nature Science 281: 1312–1316. 418: 320–323. Trager, W. and Jensen, J.B. 1976. Human malaria parasites in Yoshizawa, T., Sorimachi, H., Tomioka, S., Ishiura, S., and Suzuki, K. continuous culture. Science. 193: 673–675. 1995. A catalytic subunit of calpain possesses full proteolytic Triglia, T., Thompson, J.K., and Cowman, A.F. 2001. An EBA175 activity. FEBS Lett. 358: 101–103. homolog which is transcribed but not translated in erythrocytic Zuegge, J., Ralph, S., Schmuker, M., McFadden, G.I., and Schneider, stages of Plasmodium falciparum. Mol. Biochem. Parasitol. G. 2001. Deciphering apicoplast targeting signals—feature 116: 55–63. extraction from nuclear-encoded precursors of Plasmodium Tyas, L., Gluzman, I., Moon, R.P., Rupp, K., Westling, J., Ridley, falciparum apicoplast proteins. Gene 280: 19–26. R.G., Kay, J., Goldberg, D.E., and Berry, C. 1999. Naturally- occurring and recombinant forms of the aspartic proteinases plasmepsins I and II from the human malaria parasite Plasmodium falciparum. FEBS Lett. 454: 210–214. WEB SITE REFERENCES Uren, A.G., O’Rourke, K., Aravind, L.A., Pisabarro, M.T., Seshagiri, S., http://www.merops.ac.uk; a catalogue and structure-based Koonin, E.V., and Dixit, V.M. 2000. Identification of classification of proteases. paracaspases and metacaspases: Two ancient families of http://www.expasy.ch/cgi-bin/lists?peptidas.txt; classification of caspase-like proteins, one of which plays a key role in MALT peptidase (protease) families in SWISS-PROT. lymphoma. Mol. Cell 6: 961–967. http://plasmodb.org; official database of the malaria parasite genome Verma, R., Aravind, L., Oania, R., McDonald, W.H., Yates III, J.R., project. Koonin, E.V., and Deshaies, R.J. 2002. Role of Rpn11 http://www.ch.embnet.org/software/BOX_form.html; software for Metalloprotease in deubiquitination and degradation by the 26S printing and shading of multiple alignment files. proteasome. Science 298: 611–615. http://www.megasoftware.net/; software package for molecular Verra, F. and Hughes, A.L. 1999. Natural selection on apical evolutionary genetics analysis. membrane antigen-1 of Plasmodium falciparum. Parassitologi. http://derisilab.ucsf.edu/; microarray resources provided by 41: 93–95. Dr. Joseph DeRisi at University of California, San Francisco. Volkman, S.K., Barry, A.E., Lyons, E.J., Nielsen, K.M., Thomas, S.M., http://www.malaria.mr4.org/; Malaria Research and Reference Choi, M., Thakore, S.S., Day, K.P., Wirth, D.F., and Hartl, D.L. Reagent Resource Center. 2001. Recent origin of Plasmodium falciparum from a single progenitor. Science 293: 482–484. Wang, Y. and Gu, X. 2001. Functional divergence in caspase gene family and altered functional constraints: Statistical analysis and Received October 16, 2002; accepted in revised form January 28, 2003.

616 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Data-Mining Approaches Reveal Hidden Families of Proteases in the Genome of Malaria Parasite

Yimin Wu, Xiangyun Wang, Xia Liu, et al.

Genome Res. 2003 13: 601-616 Access the most recent version at doi:10.1101/gr.913403

Supplemental http://genome.cshlp.org/content/suppl/2003/08/27/13.4.601.DC1 Material

References This article cites 62 articles, 19 of which can be accessed free at: http://genome.cshlp.org/content/13/4/601.full.html#ref-list-1

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Cold Spring Harbor Laboratory Press