
Stage-specific proteomic expression patterns of the human filarial parasite Brugia malayi and its endosymbiont Wolbachia

Sasisekhar Bennuru^a,1, Zhaojing Meng^b, José M. C. Ribeiro^c, Roshanak Tolouei Semnani^a, Elodie Ghedin^d, King Chan^b, David A. Lucas^b, Timothy D. Veenstra^b, and Thomas B. Nutman^a

^aLaboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892; ^bLaboratory of Proteomics and Analytical Technologies, Advanced Technology Program, SAIC–Frederick, Frederick, MD 21702; ^cLaboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892; and ^dCenter for Vaccine Research, Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261

Edited by Paul B. Rainey, Massey University, Auckland, New Zealand, and accepted by the Editorial Board May 1, 2011 (received for review August 5, 2010)

Author contributions: S.B. and T.B.N. designed research; S.B., Z.M., J.M.C.R., K.C., and D.A.L. performed research; J.M.C.R., E.G., T.D.V., and T.B.N. contributed new reagents/analytic tools; S.B., Z.M., J.M.C.R., R.T.S., E.G., T.D.V., and T.B.N. analyzed data; and S.B. and T.B.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. P.B.R. is a guest editor invited by the Editorial Board.

Data deposition: Detailed database and extensive annotation of the genome-wide proteins identified from Brugia malayi and its endosymbiont Wolbachia is available for download from the National Institutes of Health server mentioned in the manuscript.

1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011481108/-/DCSupplemental.

Global proteomic analyses of pathogens have thus far been limited to unicellular organisms (e.g., protozoa and bacteria). Proteomic analyses of most eukaryotic pathogens (e.g., helminths) have been restricted to specific organs, specific stages, or secretomes. We report here a large-scale proteomic characterization of almost all the major mammalian stages of Brugia malayi, a causative agent of lymphatic filariasis, resulting in the identification of more than 62% of the products predicted from the Bm draft genome. The analysis also yielded much of the proteome of Wolbachia, the obligate endosymbiont of Bm that also expressed proteins in a stage-specific manner. Of the 11,610 predicted Bm gene products, 7,103 were definitively identified from adult male, adult female, blood-borne and uterine microfilariae, and infective L3 larvae. Among the 4,956 gene products (42.7%) inferred from the genome as "hypothetical," the present study was able to confirm 2,336 (47.1%) as bona fide proteins. Analysis of protein families and domains coupled with stage-specific expression highlights the important pathways that benefit the parasite during its development in the host. Gene set enrichment analysis identified extracellular matrix proteins and those with immunologic effects as enriched in the microfilarial and L3 stages. Parasite sex- and stage-specific protein expression identified those pathways related to parasite differentiation and demonstrated stage-specific expression by the Bm endosymbiont Wolbachia as well.

filaria

Disease associated with infection by Brugia malayi (Bm) and Wuchereria bancrofti, the two major causative organisms of human lymphatic filariasis, is the second leading cause of morbidity/disability worldwide, in large part because of the parasites' ability to alter the structural and functional integrity of the lymphatics, leading to lymphedema and elephantiasis. Invasion, establishment of infection within the host, and development are essential processes within the complex parasite life cycle (SI Appendix, Fig. S1), with many of the parasitic stages being targets for therapeutic intervention or vaccines. Each of the filarial life cycle stages has characteristics that are shared and others that are stage-specific.

Filarial infections are often characterized by a series of discrete host responses directed at the parasite and its endosymbiont Wolbachia that evolve during the course of infection. Because proteins are usually the effectors of most biological functions, proteomic data enable a more direct understanding of these important processes compared with those inferred from genomic studies. Absolute quantification of genome-wide expressed proteins is not yet within our reach for most eukaryotes. However, spectral counts of massive MS-based data (e.g., observed frequencies of each peptide) allow for relative quantification. Proteomic data also allow for clearer genomic curation by improving annotation and the identification of translational sites, stop codon read-throughs, frame shifts, and predicted orphan genes. These data also can help delineate the expression status of known and predicted/hypothetical genes.

Proteomic analyses of most eukaryotic pathogens (e.g., helminths) have been restricted to specific organs, specific stages, or secretomes. Previously, we and others have described the secretomes of Bm (1–3). We report here large-scale proteomic analyses of almost all the major mammalian stages of Bm, resulting in the identification of more than 62% of the products predicted from the Bm draft genome (4). We also report the identification of the majority of the expressed proteins of the Bm–Wolbachia (wBm), the obligate endosymbiont of Bm that also appears to express proteins in a stage-specific manner.

Results

Overview of Bm Proteome. To assemble a high-density proteome map of Bm, proteins from the adult male (AM) and adult female (AF) parasites, microfilariae (MF), L3 larvae (L3), and the immature (i.e., uterine) MF (UTMF) were extracted. After having been digested into tryptic peptides, each stage was analyzed independently by using reverse-phase liquid chromatography–tandem MS (RPLC-MS/MS). The spectra were searched against the genomic databases for Bm and its endosymbiont Wolbachia (wBm). A total of 72,318 unique peptides were matched to 6,981 proteins (3,653, 3,688, 3,135, 2,672, and 4,843 proteins) from AM, AF, MF, L3 larvae, and UTMF, respectively (SI Appendix, Table S1). Combining these data with those from a study performed previously on the Bm secretome (1) (that included 122 proteins not found in the current analyses) and 164 additional proteins (based on peptide matches that identified more than one protein; SI Appendix, Table S2) resulted in the definitive identification of a total of 7,103 proteins of the 11,610 proteins (∼61%) predicted from the genome (4) [Fig. 1A and SI Appendix, Table S2; Brugia Proteome Database (http://exon.niaid.nih.gov/transcriptome/brugia/Brugia_Proteome.zip)].

Genomic analysis predicted that 4,956 (42.7%) of the 11,610 potential proteins were hypothetical proteins; the present study

www.pnas.org/cgi/doi/10.1073/pnas.1011481108 PNAS | June 7, 2011 | vol. 108 | no. 23 | 9649–9654

Fig. 1. (A) Venn diagrams illustrating the overview of the Brugia proteome. The total numbers of proteins definitively identified (N = 7,103) include the excretory-secretory products and the somatic proteins. In addition, the 164 proteins for which a definitive identification could not be made are depicted separately as nonunique. (B) Functional annotation of the total Brugia proteome. Pie chart representing the percentage of proteins within each functional category as a function of the total proteome. Only a single annotation was assigned to a given protein. All unknown and hypothetical proteins have been classified as uncharacterized. Note that metabolism includes amino acid, carbohydrate, nuclear, and energy metabolism. A complete list is given with additional annotation and embedded links in the Brugia Proteome Database (http://exon.niaid.nih.gov/transcriptome/brugia/Brugia_Proteome.zip).

was able to confirm 2,336 (47.1%) of these 4,956 predicted proteins as bona fide proteins. Interestingly, 594 of these 2,336 hypothetical proteins are classified as "conserved hypothetical proteins." Although the function of these "conserved" proteins is not completely known, approximately 30% of these could be assigned probable functions based on a conserved sequence motif or subtle similarities to other characterized functional and structural features. Moreover, there appears to be some stage-specific enrichment/abundance of many of the conserved hypothetical proteins (SI Appendix, Fig. S2) that may fill in the gaps believed to be missing in specific metabolic pathways or that may act as mediators with activities that have not been recognized previously (reviewed in ref. 5).

Stage-Specific Expression. Among the identified proteins from each of the stages (Fig. 1A), 31% (2,255 of 7,267) were expressed exclusively in one of the five stages analyzed. Indeed, only 15% were common to all the stages. Moreover, within a given parasite stage, the percentage of the total number of proteins identified in each stage that was specific for that stage ranged from 9.3% to 18.7%. Although the function of most of these stage-specific proteins is largely unknown, their stage-specific expression is an indicator of the developmental regulation of particular processes, only some of which are defined at the present time.

To corroborate and extend the analysis of the stage-specific expression in a subset of the proteomic data, the proteomic profiles of the AM and AF Bm (using a female:male ratio of spectral counts) were compared with transcriptional profiling (fold change, female:male). A significant relationship (r = 0.4657; P < 0.0001; SI Appendix, Fig. S3) between gene expression and protein production was observed. Interestingly, apart from the expected sperm-associated proteins, AM parasites appear to expend more energy on the production of cytoskeletal proteins [e.g., myosins (Bm1_40715), troponins (Bm1_35060), and intermediate filament proteins (Bm1_45215)], whereas the AF worm protein production was skewed toward proteins [e.g., major filarial sheath proteins (Bm1_19100), CHD4 (Bm1_47050), and cullin (Bm1_45370)] involved in embryogenesis or the production of MF.

Amino Acid Composition, Codon Use, and CAI. The LC-MS/MS–based proteomic analysis resulted in the identification of proteins across a broad pI range. Comparison of the pI and MW of all the predicted proteins (SI Appendix, Fig. S4, blue) with that of the proteins identified by LC-MS/MS (SI Appendix, Fig. S4, red) suggests a close overlap between the two sets of data (Fig. S4), but the larger proteins were more readily detected by LC-MS/MS than those with lower molecular weight (MW). Indeed, the median length of the proteins detected using LC-MS/MS was 353 residues, whereas that of the nondetected proteins (inferred from the genome) was 168 residues (SI Appendix, Fig. S5). The LC-MS/MS identification of a greater number of higher-MW proteins (compared with lower-MW proteins) could be a result of Bm using multidomain proteins in its parasitic lifestyle or because smaller proteins generate fewer tryptic peptides available for identification. Although the latter hypothesis seems to be supported by the increasing number of peptides detected in relation to size (SI Appendix, Fig. S6), analysis of the number of peptides identified as a proportion of the total theoretical tryptic peptides (SI Appendix, Fig. S7) does not suggest that there was a differential detectability of the peptides across a wide range of MWs. The presence of relatively large proteins in the proteomes of parasitic organisms compared with free-living species may also be related to parameters such as temperature, environment, and other stresses (6). With only 65% of the detected proteins encoded by complete ORFs [defined by the presence of an ATG start and a stop codon (Bm draft genome (4); Brugia Proteome Database)], this suggests that either there was an incomplete/partial annotation of the Bm genome or that Brugia can read through in-frame stop codons (TGA and TAG) to code for selenocysteine and pyrrolysine.

The "butterfly" pattern distribution with large acidic and basic "wings" of the Bm proteome is a phenomenon that has been observed in a number of genomes (7). This bimodality in pI distribution may be related to the difficulty in maintaining protein structural stability and solubility near physiological/cytoplasmic pH or the preponderance of highly acidic and basic residues (i.e., D, E, K, T, R) versus the ones with pKa values close to 7 (i.e., H, C). Indeed, only approximately 3% of the genome encodes for neutral proteins (based on calculated pI). Interestingly, the MW/pI plot for Caenorhabditis elegans, the only other nematode for which this type of analysis has been done, does not exhibit a clearly distinct butterfly pattern (8), suggesting that this distribution may have relevance to how parasitic (e.g., Bm) and nonparasitic, free-living nematodes differ biologically.

Codon usage tables (SI Appendix, Table S4) were generated by using the 25 most abundant proteins (SI Appendix, Table S5), and the codon adaptation index (CAI) was analyzed. CAI scores

not only indicate that the vast majority of the proteins have moderate expression levels (CAI > 0.5 and < 0.7), but also highlight the identification of proteins with relatively low abundance (CAI < 0.5; SI Appendix, Fig. S8A). Moreover, when the CAI was analyzed by stage, there were no obvious differences among the different stages (SI Appendix, Fig. S8B). Analysis indicates that, of the 16 predicted proteins (comprising entirely hypothetical proteins) with a CAI value greater than 0.9, only one was detectable in the proteome. This could also be attributed to the fact that all the predicted proteins in the CAI range of 0.9 to 1.0 were small proteins or peptides. Despite this, the proportion of peptides identified (relative to the number of proteins within a given CAI range) increased with increasing CAI values (SI Appendix, Fig. S8C).

Amino Acid Repeats. Amino acid composition, especially single amino acid repeats (SAARs) and tandem repeats (TRs), is an important feature of proteomic analyses. The most prominent (>5 aa) repeats were Gln, Ser, Asp, Ala, Pro, Thr, and Glu (SI Appendix, Fig. S9 and Tables S6 and S7). SAARs account for 12% to 14% of the amino acid content of proteomes from eukaryotes, archaea, and bacteria, but are not present in every protein class (e.g., they are absent from metabolic enzymes and heat shock proteins) (9–11). SAARs also tend to appear primarily in the flexible regions and loops of transcription factors and protein kinases (12, 13).

Similar to that of Plasmodium knowlesi and Plasmodium vivax, the filarial genome is highly A- and T-rich (∼70%) (4, 14). In comparison, the ORFome is closer to 60% A- and T-rich. Under conditions of codon frequencies that control the dependency of repeat expansion, an A–T-rich genome should show relatively equal distributions of lysine, asparagine, phenylalanine, and isoleucine. Although the A–T content certainly influences the number of Lys and Asn repeats, the underrepresentation of Phe and Ile (SI Appendix, Fig. S9) indicates a selective determinant at play.

TRs are common in structural proteins such as collagens, keratins, and antifreeze proteins. Approximately 250 InterPro entries have been characterized as repeats that do not fold into a globular domain of their own, such as ankyrin, kelch, TPR, and armadillo repeats. The occurrence of TRs in proteins from parasitic organisms such as Plasmodium spp. (15), Leishmania spp. (16), and Trypanosoma spp. (17, 18) is widespread. Although their exact function is still unclear, it has been suggested that they are involved in protein–protein interactions, binding to host-cell receptors, or other processes. A relevant feature concerning protein repeats is the presence of substantial humoral responses that barely confer protective immunity (19, 20), which leads to the speculation that these could be decoy or immunomodulatory moieties (15, 18, 21, 22). Although the Bm-predicted proteome (11,610 proteins) has an overall occurrence of TRs of approximately 15% (i.e., using a threshold of six or more residues occurring twice or more in a given protein; SI Appendix, Table S7), it does not appear that the presence of TRs in proteins is related to the humoral responses they engender. Only a few known immunologically active proteins (LL20, ALT, major microfilarial sheath protein) were found to contain repetitive domains, the functions of which are still speculative (20).

Functional Classification. The identified proteins were classified into functional categories based broadly on the KOG classification of C. elegans (www.wormbase.org), with some adaptations. Similar to our previous classification (1), hypothetical, uncharacterized conserved, and unknown proteins were grouped as uncharacterized, whereas metabolic processes of carbohydrates, amino acids, lipids, nuclear, and energy were grouped within a single category termed metabolism. Of all the proteins identified to date as part of the Brugia proteome characterization, 45% of the proteins have no known function (Fig. 1B). To account for the relative abundance of specific functional processes, the proteomic data from each stage were normalized to fructose 1,6 bisphosphate aldolase (Bm1_15350). Stage-specific analysis of the functional annotation (as percentage of total proteins identified per stage) reveals that none of the stages was biased toward any particular functional process (SI Appendix, Fig. S10). However, enrichment analysis suggests that immunologically relevant (Fig. 2 and SI Appendix, Fig. S11) and extracellular matrix (ECM)-related proteins (SI Appendix, Fig. S12) are enriched in the microfilaria and L3 stages compared with the other stages studied (SI Appendix, Table S8). Some of these immunologically relevant proteins have been described previously (23–33) and relate to those molecules that are highly immunogenic in human infections or that have activity on the mammalian host immune response (through mimicry or other mechanisms). These data corroborate a number of previous studies that demonstrate stage-specific expression and/or serologic reactivity of ALT-2 and the larval allergens (in the L3 stage) (34) and BmR1 (35), BmMIF (36), TGF-β homologue (30), SXP-1 (37), galectins (38), and microfilarial sheath proteins (in the microfilarial stage) (39). Particular sets of ECM-associated proteins were common and highly enriched in both the L3s (SI Appendix, Fig. S12A) and the MF (SI Appendix, Fig. S12B).

Fig. 2. Stage-specific enrichment in Bm proteome. Microfilarial stage-specific enrichment of immunologically relevant Bm proteins. GSEA analysis was performed on proteins ranked based on their relative abundance in each stage. The green curve shows the enrichment score and reflects the degree to which each protein (represented by the vertical lines) is represented at the top or bottom of the ranked protein list. The dotted black line specifies the maximum enrichment score. The heat map depicts the relative abundance (red to blue) of the proteins specifically enriched in the microfilarial stage and compared with the expression in the other stages. SI Appendix includes information on L3 enriched proteins and additional analyses.

Approximately 73% of the proteins had matches to the InterPro, Prosite, ProDom, or Pfam databases. Among the top 25 protein domains detected in the proteome, the most abundant were domains associated with protein kinases (PF00069), WD

(PF00400), zinc finger C2H2 type (PF00096), RNA recognition motif (PF00076), and collagen triple helix (PF01391)-containing proteins. Although this type of analysis examines the relative diversity rather than the overall activity within a given protein family domain, we have been able to demonstrate that MFs have a marked enrichment of C2H2 domain-containing zinc finger proteins, proteins that generally bind DNA or act as transcription factors (SI Appendix, Fig. S13). The infective L3 larvae similarly contained a large number of collagen protein family members, whereas Ser/Thr phosphatase and DnaJ protein families were prominent within the AM proteome.

Gene sets specific to the L3 (293) and UTMF (905) and those common with the AF indicate possible roles as developmental proteins, whereas the core genes (n = 1,093) implicate necessary components required at all stages of the lifecycle. Further in-depth functional analysis of subsets of proteins should help in identifying target molecules that are crucial for the development of the parasite and establishment of infection.

Wolbachia. The genome of wBm is represented by a single circular chromosome that is approximately 66% A- and T-rich with an extremely low density of predicted functional genes (40). Proteomic analysis of the various stages of Bm resulted in the identification of 557 of the 805 wBm-predicted proteins [based on peptides matching a single Wolbachia protein, "unique peptides"; SI Appendix, Table S9; and Wbm Proteome Database (http://exon.niaid.nih.gov/transcriptome/brugia/Wbm_Proteome.zip)], many of which are expressed in a stage-specific manner (Fig. 3A). Among the most abundantly (relatively) detected wBm proteins were the outer surface protein WSP (Wbm0432), probable outer membrane protein (Wbm0010), outer membrane protein-pal like (Wbm0152) protein, chaperonin GroEL, HSP60 (Wbm0350), and the molecular chaperone DnaK, HSP70 (Wbm0495; Fig. 3B). A total of 96 of the 166 hypothetical proteins could be validated as bona fide proteins, of which Wbm0253 and Wbm0603 happen to be among the proteins that were identified with the most abundant peptide counts (Fig. 3B).
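The coverage figures quoted in the Results can be re-derived from the raw counts. The short Python sketch below is an illustrative check and not part of the original analysis; the labels are ours, and the denominator 7,267 is assumed to be the 7,103 definitive identifications plus the 164 nonunique ones.

```python
# Counts quoted in the Results, as (numerator, denominator) pairs.
# The labels are descriptive names chosen for this sketch.
COUNTS = {
    "Bm proteins identified / predicted": (7103, 11610),
    "Bm hypothetical / predicted": (4956, 11610),
    "Bm hypothetical proteins confirmed": (2336, 4956),
    # Assumption: 7,267 = 7,103 definitive + 164 nonunique identifications.
    "Bm proteins exclusive to one stage": (2255, 7103 + 164),
    "wBm proteins identified / predicted": (557, 805),
    "wBm hypothetical proteins confirmed": (96, 166),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


def coverage_table(counts: dict) -> dict:
    """Map each label to its percentage."""
    return {label: percent(n, d) for label, (n, d) in counts.items()}


if __name__ == "__main__":
    for label, pct in coverage_table(COUNTS).items():
        print(f"{label}: {pct}%")
```

Under these counts the wBm coverage comes to roughly 69% of the predicted proteins and roughly 58% of the hypothetical ones, percentages that the text gives only as raw counts.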

Fig. 3. Overview and functional features of Wolbachia proteome. (A) Venn diagrams illustrating the overview of Wolbachia proteins identified from each stage of the filarial lifecycle. The total number of proteins identified (N = 557) includes the excretory-secretory products and somatic proteins from each stage. A complete list is given with additional annotation and embedded links in the wBm Proteome Database (http://exon.niaid.nih.gov/transcriptome/brugia/Wbm_Proteome.zip). (B) Heat map showing the most abundantly identified Wolbachia proteins within a given parasite stage. (C) Functional annotation of the total Wolbachia proteome. Pie chart representing the percentage of proteins within each functional category as a function of the total Wolbachia proteome. (D) The percentage of the total predicted proteome contributed by each parasite stage within a functional category. Only a single annotation was assigned to a given protein. All unknown and hypothetical proteins have been classified as uncharacterized. Note that metabolism includes amino acid, carbohydrate, nuclear, and energy metabolism.

Interestingly, the ribosomal protein S18 (Wbm0501) that was found in the excretory–secretory products of MF (1) was not detected in any of the somatic proteomes. Functionally, genes involved in translation, ribosomal structure, and biogenesis were the second most abundant (12%) after uncharacterized/hypothetical proteins (27%; Fig. 3C). Immunologically, the major responses to Wolbachia following infection with L3 larvae have been to the WSP protein, a Wolbachia protein that has been suggested to be expressed in a stage-specific manner (41). Stage-specific analysis of this endosymbiont also indicates a bias toward proteins involved in posttranslational modifications, protein turnover, and chaperones in the adult worms compared with the other stages. On a comparative basis, there seem to be low numbers of Wolbachia in the immature stages of the MF (i.e., UTMF), or this stage has Wolbachia that are metabolically less active (Fig. 3D).

The presence of complete pathways for nucleotide and heme biosynthesis in Wolbachia and their partial absence in the filarial parasites, coupled with the observation of a loss of viability and reproductive capacity of the filarial organisms following elimination of Wolbachia by antibiotics (42–44), suggests that the endosymbiont provides crucial signals and pathway components critical for parasite survival. Interestingly, all the members of the heme biosynthetic pathway, except for the ferrochelatase (Wbm0719) and the protoporphyrinogen oxidase, which is absent from the Wolbachia genome, were detected in the present proteomic analysis. Although wBm is devoid of a cell wall, the functional cellular machinery for the synthesis of lipid-II exists (45). Proteomic analysis identified each of the components involved in lipid-II synthesis.

Conclusion

The primary challenge in systems biology is to understand and integrate genomic, transcriptomic, proteomic, and metabolomic data. Attaining complete proteomic coverage of any organism is not inherently possible because of the limits of the technology as well as the dynamic nature of any proteome. Although depleting the most abundant proteins can enable lower abundance proteins to be identified, it is still impossible to know a priori the complete set of genes that are being translated within any cell under a specific set of conditions. Recent studies point out that the core proteome represents only a small fraction of the full proteome (20.7% in yeast and 7.6% in humans) (46).

Proteomic analysis by LC-MS/MS does not detect the protein expression levels with similar sensitivities as seen with isotope labeling. Nevertheless, it provides empirical evidence of protein expression and allows for high-throughput comparisons. Differences in protein recovery from the various stages could have resulted in the proteins being under- or oversampled. The immature uterine microfilarial stage protein recovery (from pooled samples) was relatively higher compared with the adult worms, MF, and L3 larvae. Therefore, the conclusions on the stage-specific identification of the proteins should be considered tentative. To account for the protein recovery from various stages, the data were normalized to fructose 1,6 bisphosphate aldolase as a common housekeeping gene product. This approach provides provisional evidence for relative protein abundance and the presence or absence of a particular protein in any given stage. Nevertheless, the proteome maps of Bm and its endosymbiont wBm provide a detailed and stage-specific picture that complements genome annotation and gene prediction. Biological function arises in part from the concerted actions of interacting proteins in specialized networks. The library of stage-specific polypeptides now allows expansion to high-resolution surveys of metabolic or regulatory pathways, and thus the binary interaction networks that will help in understanding host–parasite interactions.

Materials and Methods

Parasites and in Vitro Culture. Adult Bm male (BmAM) and female (BmAF) parasites, MF (BmMF), and the L3 larvae were obtained from the Research Reagent Repository Center (Athens, GA). The immature forms of MF (i.e., UTMF) shed by the AFs in vitro were collected every 24 h. The procedures were conducted in accordance with the animal care and use committee guidelines at the National Institutes of Health and the University of Georgia.

Protein Isolation. The parasite stages were lysed in lysis buffer, dialyzed, desalted, and digested with trypsin. Strong cation-exchange liquid chromatography fractionation of tryptic peptides was performed.

Nanobore RPLC-MS/MS. Fractions collected from the strong cation-exchange column were pooled, lyophilized, and reconstituted in 20 μL of 0.1% TFA before analysis by nanobore RPLC-MS/MS, by using an Agilent 1100 Nanoflow LC system coupled online with a linear ion trap–Fourier transform mass spectrometer.

LC-MS/MS Data Analysis. Proteins were identified by searching the LC-MS/MS data using SEQUEST against the Bm database downloaded from The Institute for Genomic Research and the Wolbachia database from New England Biolabs. Methionine oxidation and phosphorylation on serine, threonine, and tyrosine were included as dynamic modifications in the database search. Only tryptic peptides with as many as two missed cleavage sites that met the criteria (delta correlation ≥ 0.08 and charge state-dependent cross-correlation scores ≥ 1.9 for [M+H]1+, ≥ 2.2 for [M+2H]2+, and ≥ 3.1 for [M+3H]3+) were considered legitimately identified. Further evaluation of the peptide identifications was also performed by searching a subset of the data against a decoy reversed database generated from the sequences in The Institute for Genomic Research database. Functional analysis and annotations were carried out by using various bioinformatic tools (SI Appendix). Peptides were assigned to proteins only if they could be matched to a single protein (termed unique peptides). Peptides that matched more than one protein (as in protein families or related proteins) were noted as nonunique peptides. Proteins identified based on matches to nonunique peptides (except for being enumerated and listed in SI Appendix, Table S2) were not included in any other analyses.

Gene Set Enrichment Analysis. Gene Set Enrichment Analysis (GSEA), a method for analyzing molecular profiling data, examines the clustering of a predefined group of genes or proteins (gene set) across the entire database to determine whether the gene set has biased expression in one condition (or stage) versus another (47). For this analysis, the entire list of Brugia proteins was sorted on their relative abundance (i.e., spectral counts). The distribution of proteins from an a priori defined set throughout this ranked list was then determined by using GSEA. Sets of genes encoding for proteins in each functional category (Fig. 1B and SI Appendix) were analyzed by using GSEA for specific enrichment of genes/proteins (SI Appendix).

ACKNOWLEDGMENTS. This project was funded primarily by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), and in part with federal funds from the National Cancer Institute, NIH, under Contract HHSN261200800001E.
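The peptide acceptance criteria and the unique/nonunique assignment rule described under LC-MS/MS Data Analysis can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions and not the authors' pipeline: the thresholds come from the text, whereas the `PeptideMatch` record, the function names, the protein accessions in the usage note, and the "K/R not followed by P" cleavage rule are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds taken from the Methods text.
MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.1}  # cross-correlation by charge state
MIN_DELTA_CN = 0.08                   # delta correlation
MAX_MISSED_CLEAVAGES = 2


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    proteins: tuple  # accessions of every protein containing the peptide


def count_missed_cleavages(sequence: str) -> int:
    """Count internal K/R residues not followed by P.

    Assumption: the common 'no cleavage before proline' trypsin rule;
    the text does not say which rule was applied.
    """
    return sum(
        1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    )


def passes_criteria(match: PeptideMatch) -> bool:
    """Apply the delta-correlation, Xcorr, and missed-cleavage filters."""
    threshold = MIN_XCORR.get(match.charge)
    if threshold is None:
        # Charge states other than 1+ to 3+ are not covered by the text.
        return False
    return (
        match.delta_cn >= MIN_DELTA_CN
        and match.xcorr >= threshold
        and count_missed_cleavages(match.sequence) <= MAX_MISSED_CLEAVAGES
    )


def assign_peptides(matches):
    """Split accepted peptides into unique and nonunique protein evidence."""
    unique = defaultdict(set)
    nonunique = defaultdict(set)
    for match in matches:
        if not passes_criteria(match):
            continue
        target = unique if len(match.proteins) == 1 else nonunique
        for accession in match.proteins:
            target[accession].add(match.sequence)
    return dict(unique), dict(nonunique)
```

As a usage example, a doubly charged match with Xcorr 2.5 and delta correlation 0.10 that maps only to a hypothetical accession `protA` would be kept as unique evidence, whereas an accepted peptide shared by `protB` and `protC` would be recorded only in the nonunique table, mirroring how such proteins were listed in SI Appendix, Table S2 but excluded from other analyses.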

1. Bennuru S, et al. (2009) Brugia malayi excreted/secreted proteins at the host/ 4. Ghedin E, et al. (2007) Draft genome of the filarial nematode parasite Brugia malayi. parasite interface: Stage- and gender-specific proteomic profiling. PLoS Negl Trop Science 317:1756–1760. Dis 3:e410. 5. Galperin MY (2001) Conserved ‘hypothetical’ proteins: New hints and new puzzles. 2. Hewitson JP, et al. (2008) The secretome of the filarial parasite, Brugia malayi: Comp Funct Genomics 2:14–18. Proteomic profile of adult excretory-secretory products. Mol Biochem Parasitol 160: 6. Brocchieri L, Karlin S (2005) Protein length in eukaryotic and prokaryotic proteomes. – 8–21. Nucleic Acids Res 33:3390 3400. 3. Moreno Y, Geary TG (2008) Stage- and gender-specific proteomic analysis of Brugia 7. Knight CG, Kassen R, Hebestreit H, Rainey PB (2004) Global analysis of predicted proteomes: Functional adaptation of physical properties. Proc Natl Acad Sci USA 101: malayi excretory-secretory products. PLoS Negl Trop Dis 2:e326. 8390–8395.

8. Mawuenyega KG, et al. (2003) Large-scale identification of Caenorhabditis elegans proteins by multidimensional liquid chromatography-tandem mass spectrometry. J Proteome Res 2:23–35.
9. Depledge DP, Dalby AR (2005) COPASAAR—a database for proteomic analysis of single amino acid repeats. BMC Bioinformatics 6:196.
10. Depledge DP, Lower RP, Smith DF (2007) RepSeq—a database of amino acid repeats present in lower eukaryotic pathogens. BMC Bioinformatics 8:122.
11. Mar Albà M, Santibáñez-Koref MF, Hancock JM (1999) Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J Mol Evol 49:789–797.
12. Loire E, Praz F, Higuet D, Netter P, Achaz G (2009) Hypermutability of genes in Homo sapiens due to the hosting of long mono-SSR. Mol Biol Evol 26:111–121.
13. Mularoni L, Veitia RA, Albà MM (2007) Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. Genomics 89:316–325.
14. Ghedin E, Wang S, Foster JM, Slatko BE (2004) First sequenced genome of a parasitic nematode. Trends Parasitol 20:151–153.
15. Kemp DJ, Coppel RL, Anders RF (1987) Repetitive proteins and genes of malaria. Annu Rev Microbiol 41:181–208.
16. McKean PG, Trenholme KR, Rangarajan D, Keen JK, Smith DF (1997) Diversity in repeat-containing surface proteins of Leishmania major. Mol Biochem Parasitol 86:225–235.
17. Hoft DF, et al. (1989) Trypanosoma cruzi expresses diverse repetitive protein antigens. Infect Immun 57:1959–1967.
18. Ibañez CF, et al. (1988) Multiple Trypanosoma cruzi antigens containing tandemly repeated amino acid sequence motifs. Mol Biochem Parasitol 30:27–33.
19. Goto Y, Carter D, Reed SG (2008) Immunological dominance of Trypanosoma cruzi tandem repeat proteins. Infect Immun 76:3967–3974.
20. Schofield L (1991) On the function of repetitive domains in protein antigens of Plasmodium and other eukaryotic parasites. Parasitol Today 7:99–105.
21. Fehr T, et al. (1997) Role of repetitive antigen patterns for induction of antibodies against antibodies. J Exp Med 185:1785–1792.
22. Wrightsman RA, Dawson BD, Fouts DL, Manning JE (1994) Identification of immunodominant epitopes in Trypanosoma cruzi trypomastigote surface antigen-1 protein that mask protective epitopes. J Immunol 153:3148–3154.
23. Maizels RM, Gomez-Escobar N, Gregory WF, Murray J, Zang X (2001) Immune evasion genes from filarial nematodes. Int J Parasitol 31:889–898.
24. Maizels RM, Blaxter ML, Scott AL (2001) Immunological genomics of Brugia malayi: Filarial genes implicated in immune evasion and protective immunity. Parasite Immunol 23:327–344.
25. Harnett W, Harnett MM, Byron O (2003) Structural/functional aspects of ES-62—a secreted immunomodulatory phosphorylcholine-containing filarial nematode glycoprotein. Curr Protein Pept Sci 4:59–71.
26. Lobos E, Nutman TB, Hothersall JS, Moncada S (2003) Elevated immunoglobulin E against recombinant Brugia malayi gamma-glutamyl transpeptidase in patients with bancroftian filariasis: Association with tropical pulmonary eosinophilia or putative immunity. Infect Immun 71:747–753.
27. Falcone FH, et al. (2001) A Brugia malayi homolog of macrophage migration inhibitory factor reveals an important link between macrophages and eosinophil recruitment during nematode infection. J Immunol 167:5348–5354.
28. Turner DG, Wildblood LA, Inglis NF, Jones DG (2008) Characterization of a galectin-like activity from the parasitic nematode, Haemonchus contortus, which modulates ovine eosinophil migration in vitro. Vet Immunol Immunopathol 122:138–145.
29. Zang X, Maizels RM (2001) Serine proteinase inhibitors from nematodes and the arms race between host and pathogen. Trends Biochem Sci 26:191–197.
30. Gomez-Escobar N, Gregory WF, Maizels RM (2000) Identification of tgh-2, a filarial nematode homolog of Caenorhabditis elegans daf-7 and human transforming growth factor beta, expressed in microfilarial and adult stages of Brugia malayi. Infect Immun 68:6402–6410.
31. Ghosh I, Eisinger SW, Raghavan N, Scott AL (1998) Thioredoxin peroxidases from Brugia malayi. Mol Biochem Parasitol 91:207–220.
32. Manoury B, Gregory WF, Maizels RM, Watts C (2001) Bm-CPI-2, a cystatin homolog secreted by the filarial parasite Brugia malayi, inhibits class II MHC-restricted antigen processing. Curr Biol 11:447–451.
33. Murray J, Gregory WF, Gomez-Escobar N, Atmadja AK, Maizels RM (2001) Expression and immune recognition of Brugia malayi VAL-1, a homologue of vespid venom allergens and Ancylostoma secreted proteins. Mol Biochem Parasitol 118:89–96.
34. Gregory WF, Atmadja AK, Allen JE, Maizels RM (2000) The abundant larval transcript-1 and -2 genes of Brugia malayi encode stage-specific candidate vaccine antigens for filariasis. Infect Immun 68:4174–4179.
35. Rahmah N, et al. (2001) A recombinant antigen-based IgG4 ELISA for the specific and sensitive detection of Brugia malayi infection. Trans R Soc Trop Med Hyg 95:280–284.
36. Pastrana DV, et al. (1998) Filarial nematode parasites secrete a homologue of the human cytokine macrophage migration inhibitory factor. Infect Immun 66:5955–5963.
37. Dissanayake S, Xu M, Piessens WF (1992) A cloned antigen for serological diagnosis of Wuchereria bancrofti microfilaremia with daytime blood samples. Mol Biochem Parasitol 56:269–277.
38. Pou-Barreto C, et al. (2008) Galectin and aldolase-like molecules are responsible for the specific IgE response in humans exposed to Dirofilaria immitis. Parasite Immunol 30:596–602.
39. Hirzmann J, et al. (2002) Cloning and expression analysis of two mucin-like genes encoding microfilarial sheath surface proteins of the parasitic nematodes Brugia and Litomosoides. J Biol Chem 277:47603–47612.
40. Foster J, et al. (2005) The Wolbachia genome of Brugia malayi: Endosymbiont evolution within a human pathogenic nematode. PLoS Biol 3:e121.
41. Fenn K, Blaxter M (2004) Are filarial nematode Wolbachia obligate mutualist symbionts? Trends Ecol Evol 19:163–166.
42. Chirgwin SR, et al. (2003) Removal of Wolbachia from Brugia pahangi is closely linked to worm death and fecundity but does not result in altered lymphatic lesion formation in Mongolian gerbils (Meriones unguiculatus). Infect Immun 71:6986–6994.
43. Hoerauf A, et al. (2000) Targeting of Wolbachia endobacteria in Litomosoides sigmodontis: Comparison of tetracyclines with chloramphenicol, macrolides and ciprofloxacin. Trop Med Int Health 5:275–279.
44. Smith HL, Rajan TV (2000) Tetracycline inhibits development of the infective-stage larvae of filarial nematodes in vitro. Exp Parasitol 95:265–270.
45. Henrichfreise B, et al. (2009) Functional conservation of the lipid II biosynthesis pathway in the cell wall-less bacteria Chlamydia and Wolbachia: Why is lipid II needed? Mol Microbiol 73:913–923.
46. Weiss M, Schrimpf S, Hengartner MO, Lercher MJ, von Mering C (2010) Shotgun proteomics data from multiple organisms reveals remarkable quantitative conservation of the eukaryotic core proteome. Proteomics 10:1297–1306.
47. Subramanian A, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–15550.

Bennuru et al. PNAS | June 7, 2011 | vol. 108 | no. 23 | 9653

9654 | www.pnas.org/cgi/doi/10.1073/pnas.1011481108 Bennuru et al.