UCSF UC San Francisco Electronic Theses and Dissertations

Title Predictions of Protein-Protein Interactions in Schistosoma Mansoni

Permalink https://escholarship.org/uc/item/84h5f2z1

Author White Bear, Javona

Publication Date 2012

Peer reviewed|Thesis/dissertation



Dedication and Acknowledgements

I would like to acknowledge and thank the following collaborators for their time and contribution to this work:

Andrej Sali 1: primary investigator, leading researcher on protein structure prediction, advising, revision, and research contributions.

James H. McKerrow 4: McKerrow lab, leading researcher on S. mansoni, advising, and research contributions.

David T. Barkan 1,2: Sali lab member, collaborative research, prior work, revision and research contributions.

Fred P. Davis 3: Sali lab member, prior work and research contributions.

1 Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158

2 Graduate Group in Bioinformatics, University of California, San Francisco, CA 94158, USA

3 Janelia Farm, Howard Hughes Medical Institute, Ashburn, VA 20147

4 Department of Pathology and Sandler Center for Basic Research in Parasitic Diseases, University of California at San Francisco, San Francisco, California 94158, USA

*To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231; Email: [email protected]


Predictions of Protein-Protein Interactions in Schistosoma Mansoni

Javona White Bear

Abstract

Background: Schistosoma mansoni invasion of the human host involves a variety of cross-species protein-protein interactions. The pathogen expresses a diverse arsenal of proteins that facilitate the breach of physical and biochemical barriers present in skin, evasion of the immune system, and digestion of human hemoglobin, allowing schistosomes to reside in the host for years. However, only a small number of specific interactions between S. mansoni and human proteins have been identified. We present and apply a protocol that generates testable predictions of S. mansoni-human protein interactions.

Methods: In this study, we first predict S. mansoni-human protein interactions based on similarity to known protein complexes. Putative interactions were then scored and assessed using several contextual filters, including annotation automatically derived from literature using a simple natural language processing methodology. Our method predicted 7 out of the 10 previously known cross-species interactions.

Conclusions: Several predictions that warrant experimental follow-up are presented and discussed, including interactions involving potential vaccine candidate antigens, protease inhibition, and immune evasion. The application framework provides an integrated methodology for investigation of host-pathogen interactions and an extensive source of orthogonal data for experimental analysis. We have made the predictions available online for community perusal.

Table of Contents

PRELIMINARY PAGES
  TITLE
  COPYRIGHT
  ACKNOWLEDGMENTS
  ABSTRACT
  TABLE OF CONTENTS
  LIST OF TABLES
  LIST OF FIGURES

THESIS
  PREDICTION OF PROTEIN-PROTEIN INTERACTIONS IN SCHISTOSOMA MANSONI
  ABSTRACT
  AUTHOR SUMMARY
  INTRODUCTION
  RESULTS AND DISCUSSION
  MATERIALS AND METHODS
  BIBLIOGRAPHY
  FIGURES
  TABLES

SUPPLEMENTAL DATA
  CERCARIAL ELASTASE INTERACTIONS
  VENOM ALLERGY PROTEIN INTERACTIONS
  CALPAIN INTERACTIONS
  CYSTATIN INTERACTIONS
  TETRASPANIN INTERACTIONS
  HUMAN IMMUNOGLOBULIN INTERACTIONS

RELEASE FORM
  LIBRARY RELEASE FORM


List of Tables

Table  Title
1   Network Filtered Interactions
2   Biological Filtered Interactions
3   Biological Filtered Interactions Correlated to S. mansoni Lifecycle
4   Comparison of known and predicted S. mansoni protein interactions
5   Potential S. mansoni Protein Interactions
S1  Cercarial Elastase Interactions
S2  Venom Allergy Protein Interactions
S3  Calpain Interactions
S4  Cystatin Interactions
S5  Tetraspanin Interactions
S6  Human Immunoglobulin Interactions


List of Figures

Figure  Title
1  Prediction Framework
2  Retrospective Predictions: Collagen & C3
3  Prospective Predictions: Elafin & CD59


Author Summary

The S. mansoni parasite is the etiological agent of the disease Schistosomiasis. However, few protein-protein interactions that relate to pathogenesis and establishment of infection have been experimentally characterized. As with many pathogens, the understanding of these interactions is a key component for the development of new vaccines. In this project, we have applied a computational whole-genome comparative approach to aid in the prediction of interactions between S. mansoni and human proteins and to identify important proteins involved in infection. The results of applying this method recapitulate several previously characterized interactions, as well as suggest additional ones as potential therapeutic targets.

Introduction

Etiological Agents and Effects of Schistosomiasis

Schistosoma are dioecious parasitic trematodes (flukes) that cause the chronic disease Schistosomiasis, affecting over 230 million people worldwide and causing more than 200,000 deaths a year. They are digenetic organisms with six life cycle stages, four of which take place in the human host. [1] Schistosoma mansoni, one of the major etiological agents of chronic Schistosomiasis in Africa and South America, releases eggs that become trapped in host tissues, triggering an unsuccessful immune response and causing a host granulomatous reaction that, in its most severe forms, causes fibrosis, scarring, and portal hypertension. The host granulomatous reaction is a primary cause of mortality associated with Schistosomiasis. [2,3]

Infection during childhood frequently results in growth retardation and anemia. The parasite persists in the host for up to 40 years with a high possibility of reinfection in endemic areas. [4] Standard methods for treating Schistosomiasis are not cost effective or affordable in most endemic populations, are ineffective as a prophylaxis against newly acquired infections (i.e. the cercarial and schistosomula stages of the life cycle), and are becoming increasingly less effective against established infections due to parasite resistance. [5–7] Therefore, there is a need for improved and affordable treatments. [7, 8]

S. mansoni-Human Protein Interactions Involved in Pathogenesis and Infection Across Life Cycle Stages

S. mansoni infection involves parasite-human protein interactions over four of the six parasite life cycle stages [1]. Infection begins during the cercarial stage of the life cycle when the freshwater-dwelling larval cercariae contact the human host. Invasion of the skin is achieved through degradation of the extracellular matrix by cercarial elastase; this same enzyme may help avoid the host immune response through cleavage of human C3 Complement [9]. S. mansoni sheds its immunogenic tail to progress to the more mobile schistosomula life cycle stage, where it is carried by blood flow to the lungs and hepatic portal system.

Using proteomic analysis, cercarial elastase was implicated in the cleavage of an extensive list of human proteins, with follow-up experiments confirming its cleavage of at least seven dermal proteins [10].

After schistosomula entry, maturation to the adult life cycle stage occurs in the inferior mesenteric blood vessels, where a number of proteins aid in immune evasion and digestion of human hemoglobin proteins. Among the proteins expressed in the adult stage is the adult tegument surface protein Sm29, a potential Schistosomiasis vaccine candidate antigen. Sm29 interacts with unknown human immune proteins [11]. The final life cycle stage in humans is the egg phase; mated adults produce hundreds of eggs per day to facilitate transmission back to fresh water. The immune reaction to eggs leads to schistosome pathogenesis. [12]

Large-Scale Computational Prediction of Protein-Protein Interactions

Ongoing efforts to address Schistosomiasis include the development of new vaccines. Knowledge of the specific protein-protein interactions between the pathogen and human host can greatly facilitate this effort. However, a comprehensive literature review revealed only eleven confirmed interactions (Table 4), indicating the characterization effort is still in its infancy. These interactions were identified by experiments such as in vitro Edman degradation [10], fluorescence end point assay [13], crystallography [14], and measurement of released radioactivity from a suspension [15]. Further low-throughput experiments of this kind could be guided by hypotheses about specific predicted protein interactions [10].

While many methods have been developed to predict intraspecies protein-protein interactions, few have focused specifically on interspecies interactions, where knowledge of the biological context of pathogenesis can be used to refine predictions. Previously, our lab developed a protocol to predict interactions and applied it in the host-pathogen context [16,17]. Host-pathogen protein complexes were identified using comparative modeling based on similarity to protein complexes with experimentally determined structures. The binding interfaces of the resulting models were assessed by a residue contact statistical potential, and filtered to retain the pairs known to be expressed in specific pathogen life cycle stages and human tissues (i.e. Biological Context Filter), thus increasing confidence in the predictions. The host-pathogen prediction protocol was benchmarked and applied to predict interactions between human and ten different pathogenic organisms [16, 17].

Informing Computational Predictions

A crucial step for the construction of the Biological Context Filter is to annotate pathogen proteins by the life cycle stages in which they are expressed. This step is especially informative for S. mansoni, a digenetic organism with life cycle stages and protein expression specific to both the molluscan and human hosts. While the human genome has been extensively annotated and made generally available, pathogen genome and expression annotation can be more elusive. Pathogens such as Plasmodium falciparum have been sequenced and extensively annotated [18]. In comparison, the S. mansoni genome, while much larger than that of P. falciparum (11,809 S. mansoni proteins vs 5,628 P. falciparum proteins), was only recently sequenced and assigned a full set of accession identifiers in GeneDB [19]. The sequencing effort is ongoing and there is limited annotation of corresponding proteins and structural information available. Thus, it is challenging even to cross-reference S. mansoni proteins described in various reports, particularly in those published prior to full genome sequencing [20]. Additionally, life cycle stage annotation is difficult to access or even absent in most databases.

The best source of annotation is the primary literature itself. However, accessions in primary references were often embedded in portable document format (PDF) files and other file formats that make extraction challenging. Furthermore, the context of an extracted accession and the life cycle stage to which it applies must be isolated from each reference and be verifiable for accuracy and further study. We address these challenges by designing a simple natural language processing (NLP) engine that accomplishes the data extraction, accuracy, verifiability, and correlation between disparate data sets required to construct the Biological Context Filter.

In Results below, we describe the benchmarking of our host-pathogen prediction protocol against experimentally characterized interactions, followed by a discussion of prospective prediction of novel host-pathogen interactions that may warrant experimental follow-up.

Results & Discussion

Protein Interaction Prediction

The protocol begins with the initial set of 3,052 S. mansoni and 8,784 human protein sequences (Fig. 1) for which high-quality models could be created using MODTIE [17]. The initial interaction predictions (Initial Predictions) were obtained by assessing host-pathogen interactions for which comparative models could be constructed. In previous work, the fraction of pathogen proteins that aligned to a protein template of previously observed solved complex structures averaged 21% [17]. Here, only 13.9% of S. mansoni proteins could be modeled using such a template. Human proteome interaction template coverage remained consistent with previous work at 34%. Overall, the protocol predicted 528,719 cross-species initial potential interactions (Table 1).

Next, three network-level filters were applied to prune the initial predictions. The first Network Filter removed interactions based on promiscuous domains, defined as templates used for more than 1% of the total predictions. Promiscuous domains, while present in many interacting complexes, lack specificity and are overrepresented in the predicted data set, making them less desirable as vaccine candidate antigens.

For example, a domain in the crystal structure of HIV Capsid Protein (p24) bound to FAB13B5 (PDB 1E6J) is a frequently used template in potential interactions.

However, immunoglobulin is conserved in mammals and not specific to human interactions. Many variations will either not be applicable to S. mansoni and human interactions or be conserved across species for binding similar epitopes. While most of the Fab (fragment antigen binding) region, which forms the paratope, is highly variable in sequence, it is relatively small, comprising fewer than 22 amino acids. Because of these short sequence lengths, high variability, and conservation across species, many templates will score above the alignment threshold for this portion of the paratope and for the short protein sequence acting as the epitope in the binding site, resulting in a disproportionate number of potential interactions. Templates similar to p24 were too generalized to draw conclusions with any degree of confidence and were removed as promiscuous domains in the first Network Filter.

In the second Network Filter, predictions based on homodimer complexes were removed. This step removes instances where highly conserved interacting dimers form similar complexes in both S. mansoni and humans due to speciation events [21, 22]. An example is the FGFR2 tyrosine kinase domain (PDB 1E6J). FGFR2 has high similarity to tyrosine kinases in S. mansoni, but these kinases are generally conserved in eukaryotes and thus introduce a bias toward the homodimerization interaction of the catalytic subunits [23].

In the final Network Filter step, interactions scoring below the 97% confidence level were removed to narrow the focus to higher-confidence potential interactions. This filtering left 34,164 interactions.

Next, the Biological Context Filter isolates potential interactions according to various life cycle stages of S. mansoni and likelihood of in vivo occurrence in human tissues. In the first step of the Biological Context Filter, interactions passing the Network Filter were enriched with data from a simple natural language processing (NLP) literature search and databases using listed nomenclature and functional information (Methods). However, there is limited life cycle stage expression information using database annotation.

An NLP engine was designed to address this limitation and accomplished the following: (1) characterization of 12,720 S. mansoni proteins automatically from primary references; (2) recording of contextual, life cycle stage, and citation information into a customized database; and (3) programmatic correlation of this data with existing database annotations. This resulted in annotation of 96.6% of S. mansoni sequences, greatly exceeding existing annotation from any single database, which topped out at 62%. NLP annotation further extended this coverage with life cycle stage and characterization information not readily available in database annotation.

Next, the Life Cycle / Tissue Filter refines interactions for likelihood of in vivo interaction based on biological context derived from NLP of the component proteins in each interacting complex and their expression in the four life cycle stages of S. mansoni and in different human tissues (Materials & Methods). A list of pathogen life cycle stage and human tissue pairs was generated (Table 2).
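The stage and tissue pairing can be illustrated with a short Python sketch. It is not the thesis implementation: the stage names, the tissue names, and the pairs in `STAGE_TISSUES` are hypothetical placeholders loosely based on the life cycle described in the Introduction, and the function name is invented for this example. The pairs actually used are those referenced in Table 2.

```python
# Hypothetical stage-to-tissue pairs (assumption of this sketch, not from Table 2).
STAGE_TISSUES = {
    "cercaria": {"skin"},
    "schistosomulum": {"lung", "liver"},
    "adult": {"blood"},
    "egg": {"liver", "intestine"},
}


def passes_life_cycle_tissue_filter(parasite_stages, human_tissues):
    """Return True if some stage of the parasite protein is paired with a
    tissue in which the human protein is expressed."""
    tissues = set(human_tissues)
    return any(
        STAGE_TISSUES.get(stage, set()) & tissues
        for stage in parasite_stages
    )
```

A predicted interaction is retained only when this check succeeds for the annotated stages of the parasite protein and the annotated tissues of the human protein.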

The third Biological Context Filter applies a targeted post-process analysis of potential interactions.

In this step, NLP parameters were used to rank the prediction based on number of occurrences and the assigned weight of the literature where the observation occurred (Materials & Methods).
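One way to picture this ranking is the following Python sketch. The text states only that occurrence counts and literature weights were used; summing the weights, the tie-breaking order, and the function name are assumptions made for illustration.

```python
def rank_predictions(observations):
    """Rank predictions by summed literature weight, then by occurrence count.

    `observations` maps a prediction identifier to a list of literature
    weights, one per paper in which the observation occurred. Summing the
    weights is an assumption of this sketch.
    """
    scored = [
        (pred_id, sum(weights), len(weights))
        for pred_id, weights in observations.items()
    ]
    # Highest total weight first; ties broken by more occurrences, then by identifier.
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return scored
```

Each returned tuple holds the prediction identifier, its total weight, and its number of occurrences, so the evidence behind a rank stays visible.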

In previous work predicting host-pathogen protein interactions, filters resulted in a wide range of reductions for different pathogen genomes due to varying levels of biological annotation available for each genome. The majority of the biological annotations in Davis et al. 2007 [17] were not relevant in a pathogenic context and therefore did not pass the filtering, while pathogen proteins had limited life cycle stage annotation, resulting in multiple host-pathogen data sets with no interactions [17].

In the current framework, 22,743 (Table 2) interactions passed both biological and network-level filters, which was 51.5% more than the average of the ten pathogens in the previous work despite a below average model coverage. This increase is largely due to the NLP annotation, which produced a large number of pathogen proteins with a defined life cycle stage. Overall, the Biological Context Filter resulted in 1,345 annotated interactions in likely in vivo S. mansoni life cycle stage and human tissue interaction sites (Table 2).

Assessment I: Comparison of predicted and known S. mansoni-human protein interactions.

To assess the predictions, we first compared the predicted set with the set of known S. mansoni-human protein interactions. There are 10 confirmed interactions between S. mansoni and human proteins. Among the 10, only one has a structure available in the PDB (Table 4). The host-pathogen application framework recovered 7 of the 10 known interactions. The majority (7/10) of experimentally characterized S. mansoni-human protein interactions involve the serine peptidase cercarial elastase. Several experiments have characterized the cleavage by cercarial elastase of extracellular membrane and complement proteins [10, 12, 24–26].

Our method recapitulated several of these interactions. First, a retrospective prediction was made between the enzyme and human collagen based on the template structure of tick tryptase inhibitor in complex with bovine trypsin (PDB 2UUY) (Fig 2). Previous studies indicate that the enzyme has a role in suppressing host immune response (Table 3) [10,27]; its similarity to tryptase, which has been used as an indicator of mast cell activation and is an important mechanism of host defense against pathogens [28], is consistent with this suggested role of cercarial elastase in pathogenesis.

In addition to cercarial elastase's cleavage of extracellular proteins, several important protein-protein interactions involved in S. mansoni immune evasion have been characterized, including its cleavage of Complement C3 (Table 3). Our method retrospectively predicted this interaction based on the structure of fire ant chymotrypsin in complex with the PMSF inhibitor (PDB 1EQ9) (Fig 2). Fire ant chymotrypsin, which is similar to elastases in many species, degrades proteins for digestion and is a known target for blocking growth from the ant larval stage to adult in ant-infested areas [29].

Assessment II: Comparison of Vaccine Candidate Antigens and S. mansoni-human protein interactions.

Next, we assessed predictions against experimentally characterized vaccine candidate antigens where the mechanism, specificity, and interacting human proteins are still undetermined. Currently, there are 9 S. mansoni proteins considered as vaccine candidate antigens and 5 protein groups viewed as vaccine candidate antigens. We predicted interactions with 5 of the current vaccine candidate antigens and all of the potential vaccine candidate antigens (Table 5).

We now describe two specific examples of predicted interactions involving S. mansoni protein vaccine candidate antigens that may warrant experimental follow-up; these interactions are consistent with literature hypotheses. As noted, cercarial elastase is known to cleave several human proteins (Table 4), and it is considered a vaccine candidate antigen due to its abundance in S. mansoni cercarial secretions. Functionally, it has been indicated as the primary means of pathogen entry across the human dermal barriers, the first stage of pathogenesis [24].

We predicted novel interactions between cercarial elastase, its isoforms, and other human proteins including calpains, cystatins, tetraspanins, and immune and complement proteins (Table 5). An example is the interaction between cercarial elastase and the elastase-specific inhibitor elafin. This prediction is based on the template crystal structure of elafin complexed with porcine pancreatic elastase (PDB 1FLE) (Fig 3). Elafin plays a wound-healing role in the dermal immune response in humans and is an antimicrobial against other pathogens such as Pseudomonas aeruginosa and Staphylococcus aureus [30]. Elafin has been demonstrated to bind with high affinity to both human leukocyte elastase and porcine pancreatic elastase. If a similar affinity can be characterized for its binding to cercarial elastase, elafin could act as a barrier to S. mansoni's ability to establish infection. Experimentally validated binding with cercarial elastase could show that increased concentrations of elafin may be sufficient to prevent onset of infection [31].

The Schistosoma protein Sm29, another vaccine candidate antigen implicated in pathogenic immune evasion, was involved in several predictions. The predicted interaction with the human CD59 protein, involved in the complement membrane attack complex (MAC), would aid S. mansoni's ability to disable immune response. This prediction was based upon the template of ATF-urokinase and its receptor (PDB 2I9B) (Fig 3), which is involved in multiple patho-physiological processes. Sm29, an uncharacterized transmembrane protein, is an S. mansoni surface protein in both the schistosomula and adult life cycle stages that has been implicated in several immune response interactions, making it an important vaccine candidate antigen.

CD59, protectin, regulates complement, inhibits the membrane attack complex (MAC), prevents lysis and is exploited as an established immune evasion tactic used by viruses [32, 33]. Murine experiments indicate immunization with rSm29 reduces S. mansoni parasite burdens and offers protective immunity; however, the exact mechanism has not been characterized. Experimental validation of the predicted interaction between Sm29 and CD59 could provide further insight on how S. mansoni inhibits the MAC and additional strategies for preventing this inhibition. Additional interactions involving vaccine candidate antigens and key targets are referenced in the supplement.

Limitations

The S. mansoni genome was only recently sequenced [20], and there are fewer than 39 validated crystal structures of pathogen proteins available, with only one of these in complex with a human protein. Initial predictions rely on sequence and structure comparison to known interacting complexes; thus the lack of available protein structures in complex limits the coverage of the protocol. Additional experimental efforts will increase coverage and accuracy by identifying more S. mansoni and human protein interactions, solving more structures of interacting proteins in complex, and further characterizing the biology for comparative analysis.

Furthermore, template coverage is primarily restricted to domain-mediated interactions, although peptide-mediated interactions are also known to contribute to protein interaction networks [34]. Peptide motifs that mediate protein interactions were identified through a combination of computational and experimental methods [35,36], and application of these motif-based methods will likely expand the coverage of host-pathogen protein interactions.

Prediction Errors

Several factors affect the accuracy of the method. These include errors in the comparative modeling process [37], the coarse-grained nature of the statistical potential used to assess the interface residue contacts [16], and consideration of only interactions between individual domains, which could lead to predicted interactions that are unfavorable in the context of the full-length proteins. Additionally, both S. mansoni and humans are eukaryotic species, which means core cellular components, such as translation machinery, metabolic enzymes, and ubiquitin-signaling components, are conserved and comprise many of the initially predicted interactions.

We address the similarities in conserved structures using the Biological Context Filter to remove complexes where there was a low possibility of in vivo occurrence, homodimer complexes that clearly involve conserved machinery, and high frequency template domains that could indicate both conserved sequences and structures as well as sequence-structure bias due to lack of interacting template coverage.

For example, S. mansoni has been shown to secrete chemokine binding proteins as a decoy mechanism that modulates the host immune response. These proteins would be difficult to identify and characterize using known proteomic analysis; they would likely be homologous to human proteins and would therefore introduce noise into the detection and isolation of these types of interactions [38].

Future Work

Computational prediction and identification of protein-protein interactions is an important aspect in the development of new vaccines and vaccine candidate antigens. A variety of approaches exist, such as genomic proximity, fission/fusion, phylogenetic tree similarity, gene co-occurrence, co-localization, and co-expression, but many rely on features that only make sense or are currently feasible in the context of a single genome [39]. Comparative approaches offer a broad spectrum analysis of protein-protein interactions based on previously observed interactions.

The results of the approach and targeted analysis used on S. mansoni-human protein interactions here suggest that sequence and structure-based methods are an applicable approach [16,40]. This method has several possible extensions, including those that identify peptide motifs [34] and sequence signatures [41] that mediate interactions, and further analysis and interpretation strategies such as analysis of the genetic polymorphisms at loci encoding the proposed interacting proteins.

Potential impact

We developed a computational whole-genome method to predict potential host-pathogen protein interactions between S. mansoni and humans. Our results show seven validated predictions already experimentally characterized and numerous potential interactions involving proteins indicated as vaccine candidate antigens or potential vaccine candidate antigens. Despite limitations in S. mansoni structural coverage, our results demonstrate that broad-spectrum data enrichment and analysis is an effective method for protein-protein interaction prediction and highlight several potential immunization targets against S. mansoni. A list of high-confidence predictions from targeted analysis is available online at http://salilab.org/hostpathogen. Additionally, in the tradition of open source efforts of the biomedical scientific community, the application framework is available for download. In closing, we expect our method to complement experimental methods and provide insight into the basic biology of S. mansoni-human protein interactions.

Materials and Methods

The initial predictions of S. mansoni-human protein interactions were generated based on a protocol described in [17], briefly reviewed here. First, genome-wide S. mansoni and human protein structure models were calculated by MODPIPE [42], an automated software pipeline for large-scale protein structure modeling [43]. MODPIPE uses MODELLER [44] to perform the canonical comparative modeling steps of fold assignment, target-template alignment, model construction, and model assessment. High-scoring models were deposited in MODBASE [45], a publicly accessible database of comparative models. Next, resulting models were aligned to SCOP domain sequences, and if a model aligned to a SCOP sequence with more than 70% identity, it was assigned that SCOP domain identifier. These annotations were used as the basis for a search in PIBASE, a database of domain-domain interactions. In this search, those models assigned a SCOP domain that was part of a PIBASE interaction were structurally aligned to the conformation of that domain in the complex. In cases where a human model was aligned to one domain in a PIBASE interaction and an S. mansoni model was aligned to the other domain, a putative modeled complex resulted. This complex was then assessed with the MODTIE potential, which outputs a Z-score approximating the statistical likelihood of the individual domain interface residues forming a complex across the two proteins. A detailed description of the full protocol is available in [16]. We refer to the resulting set of predictions as Initial Predictions.
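The domain assignment and pairing logic described above can be sketched in Python as follows. This is a minimal illustration, not the MODTIE implementation: the tuple layouts, function names, and identifiers are assumptions made for this example, and structural alignment and scoring are omitted. Only the 70% identity cutoff comes from the text.

```python
from itertools import product

IDENTITY_CUTOFF = 70.0  # percent identity needed to assign a SCOP domain (from the text)


def assign_domains(alignments):
    """Group model identifiers by the SCOP domain they align to above the cutoff.

    `alignments` is an iterable of (model_id, scop_id, percent_identity)
    tuples; this layout is hypothetical.
    """
    assigned = {}
    for model_id, scop_id, identity in alignments:
        if identity > IDENTITY_CUTOFF:
            assigned.setdefault(scop_id, set()).add(model_id)
    return assigned


def putative_complexes(parasite_alignments, human_alignments, template_interactions):
    """Pair parasite and human models whose domains form a template interaction."""
    parasite = assign_domains(parasite_alignments)
    human = assign_domains(human_alignments)
    complexes = set()
    for dom_a, dom_b in template_interactions:
        # A template can be used in either orientation.
        for p_dom, h_dom in ((dom_a, dom_b), (dom_b, dom_a)):
            for p_model, h_model in product(parasite.get(p_dom, ()), human.get(h_dom, ())):
                complexes.add((p_model, h_model, dom_a, dom_b))
    return sorted(complexes)
```

Each resulting tuple names one parasite model, one human model, and the template domain pair; in the protocol, every such pair would then be scored by the statistical potential.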

Filtering Interactions

Two sets of filters were applied to the resulting interactions. The first filter, referred to as the Network Filter, was based on aspects of the modeling and scoring process. The second filter, referred to as the Biological Context Filter, was based on the stages of the life cycle and tissue pairs (Figure 1).

Application of Network Filters

Predictions based on templates used for more than 1% of the total number of S. mansoni and human interactions were considered promiscuous and removed. A total of 242,677 (45.9%) interactions (Table 1) met this criterion, due to the overall similarity in eukaryotic organisms for network-level machinery [46] and to the lack of known structure information for S. mansoni proteins. High-confidence interactions were isolated based on previous work demonstrating an optimal statistical potential Z-score threshold of -1.7, which gave true-positive and false-positive rates of 97% and 3%, respectively [17]. The homodimer complex filter removed predicted interactions based on template complexes formed by protein domains from the same SCOP family, excluding highly conserved eukaryotic pathways. These predictions primarily consisted of multimeric enzyme complexes formed by host and pathogen proteins, as well as core cellular components such as ribosome and proteasome subunits [17]. In total, 143,065 homodimer complexes were removed from the filter set based on this criterion (Table 1).
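The three network-level criteria can be combined in a short Python sketch. The field names and the function name are hypothetical, the sketch assumes that a more negative Z-score indicates a better interface (so predictions scoring above -1.7 are dropped), and it does not model the exception for highly conserved eukaryotic pathways.

```python
from collections import Counter

PROMISCUITY_FRACTION = 0.01  # templates used for more than 1% of predictions
Z_SCORE_THRESHOLD = -1.7     # statistical potential cutoff reported in [17]


def apply_network_filters(predictions):
    """Apply the three network-level filters to a list of prediction dicts.

    Each prediction is assumed to carry the keys 'template',
    'scop_family_a', 'scop_family_b' and 'z_score' (hypothetical names).
    """
    total = len(predictions)
    usage = Counter(p["template"] for p in predictions)
    kept = []
    for p in predictions:
        if usage[p["template"]] > PROMISCUITY_FRACTION * total:
            continue  # promiscuous template
        if p["scop_family_a"] == p["scop_family_b"]:
            continue  # template formed by domains from the same SCOP family
        if p["z_score"] > Z_SCORE_THRESHOLD:
            continue  # interface scores worse than the cutoff (assumed direction)
        kept.append(p)
    return kept
```

Template usage is counted over the full prediction set before any filtering, which matches the definition of promiscuity as a share of the total number of predictions.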

Application of Biological Context Filters

Interactions that pass the Network filtering are then filtered for biological context using the following methods.

Natural Language Processing (NLP) & Enrichment Filters

S. mansoni Protein Annotation.

In preparation for applying the Life Cycle / Tissue Filter, a Natural Language Processing (NLP) protocol was created to automatically identify from the literature which S. mansoni proteins were expressed in different pathogen life cycle stages. S. mansoni protein database identifiers and their amino acid sequences were extracted from the GeneDB [19], National Center for Biotechnology Information (NCBI), TIGR [47], and Uniprot/TrEMBL databases [48]. A literature search identified experiments indicating proteins expressed in different S. mansoni life cycle stages and categorized each literature reference into corresponding life cycle stages. All literature was then mined using NLP to derive accessions and context information. Accessions were derived from the text with regular expression searches corresponding to the specifications of the database (for example, a word in the text matching the regular expression [A-Z][0-9]{5} indicates a Uniprot accession).

Thus, for each paper, a list of protein accessions was obtained. All protein accessions were then mapped by comparing sequences to Smp accession, Uniprot accession, and NCBI accession, in that order of priority. The final result of NLP processing was therefore a list of all accessions of proteins expressed in life cycle stages of S. mansoni. S. mansoni protein sequence data from these initial interactions were enriched with biological annotation obtained from MODBASE [45], GeneDB [19], NCBI, Uniprot [48], and primary references in the literature. The annotations included protein names, links to referenced resources, and any available functional annotation.
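The accession mining and priority mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Uniprot pattern is the simplified form quoted in the text (one capital letter followed by five digits), the `Smp_` pattern and the helper names `extract_accessions` and `canonical_accession` are hypothetical, and sequence comparison is not shown.

```python
import re

# Simplified Uniprot-style pattern from the text: a capital letter plus five digits.
UNIPROT_LIKE = re.compile(r"\b[A-Z][0-9]{5}\b")
# Hypothetical pattern for S. mansoni identifiers of the form Smp_NNNNNN[.N].
SMP_LIKE = re.compile(r"\bSmp_[0-9]{6}(?:\.[0-9]+)?\b")

# Order of priority stated in the text: Smp, then Uniprot, then NCBI.
PRIORITY = ("smp", "uniprot", "ncbi")


def extract_accessions(text):
    """Return the unique accession-like tokens found in one paper's text."""
    return {
        "uniprot": sorted(set(UNIPROT_LIKE.findall(text))),
        "smp": sorted(set(SMP_LIKE.findall(text))),
    }


def canonical_accession(accessions_by_db):
    """Pick one accession for a protein, following the stated priority order."""
    for db in PRIORITY:
        accession = accessions_by_db.get(db)
        if accession:
            return db, accession
    return None
```

A real protocol would also need to reject false positives, since many six-character tokens in running text match the simplified pattern without being accessions.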

Human Protein Annotation.

Human proteins were annotated for tissue expression (GNF Tissue Atlas [49]), known expression on the cell surface, and known immune system involvement (ENSEMBL [50]). Functional annotation for each protein was obtained from Gene Ontology Annotation (GOA) [51]. Human protein sequences were correlated with predicted interacting sequences to determine involvement [16].

Life Cycle/ Tissue Filter

Next, the Biological Context Filter was applied to S. mansoni and human protein interactions in the four life cycle stages associated with pathogenesis and infections in humans. S. mansoni proteins were filtered by life cycle stage, known expression and excretion, using NLP and database annotation. An interaction had to be present in the host tissue associated with the specific stage of pathogenesis and that S. mansoni life cycle stage to be included in the resulting interactions. The following life cycle stage and tissue pairs were applied to filter interactions: (1) cercariae proteins and human proteins expressed in skin, (2) schistosomula proteins and human proteins expressed in skin, lungs, bronchial, liver, endothelial cells, immune cells, red blood cells, blood, T-cells, early erythroid cells, NK cells, myeloid cells, and B-cells, (3) adult S. mansoni and human proteins expressed in liver, endothelial cells, immune cells, red blood cells, blood, T-cells, early erythroid cells, NK cells, myeloid cells, and B-cells, and (4) eggs and human proteins expressed in liver, endothelial cells, immune cells, red blood cells, blood, T-cells, early erythroid cells, NK cells, myeloid cells, and B-cells.
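The stage and tissue pairing listed above amounts to a lookup followed by a set intersection. The sketch below shows one way to express it, assuming each S. mansoni protein carries a set of annotated life cycle stages and each human protein a set of annotated tissues. The names `STAGE_TISSUES`, `SHARED_TISSUES`, and `passes_life_cycle_tissue_filter` are hypothetical, and the tissue labels are copied from the list in the text.

```python
# Tissues shared by the schistosomula, adult, and egg stages in the text.
SHARED_TISSUES = frozenset({
    "liver", "endothelial cells", "immune cells", "red blood cells", "blood",
    "T-cells", "early erythroid cells", "NK cells", "myeloid cells", "B-cells",
})

# Life cycle stage -> human tissues in which an interaction is considered plausible.
STAGE_TISSUES = {
    "cercariae": frozenset({"skin"}),
    "schistosomula": frozenset({"skin", "lungs", "bronchial"}) | SHARED_TISSUES,
    "adult": SHARED_TISSUES,
    "egg": SHARED_TISSUES,
}


def passes_life_cycle_tissue_filter(pathogen_stages, human_tissues):
    """Keep an interaction if some annotated stage matches some annotated tissue."""
    human_tissues = set(human_tissues)
    return any(
        STAGE_TISSUES[stage] & human_tissues
        for stage in pathogen_stages
        if stage in STAGE_TISSUES
    )
```

For example, a cercarial protein paired with a human protein expressed only in liver would be rejected, while the same human protein paired with an adult-stage protein would be kept.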

Targeted Filter

The final step in the Biological Context Filter uses a targeted post-processing analysis based on NLP and database annotations, with two additional data mining steps. For each of the interacting protein complex pairs, three parameters were analyzed: pairwise expression in both a known human tissue target and an S. mansoni life cycle stage, as indicated by the Life Cycle / Tissue Filter; expression or involvement in known human immunogenic responses; and S. mansoni protein expression or involvement with human proteins targeted by other parasites.

Parameters for additional data mining in the target analysis include the following criteria: investigator-selected proteins of interest and NLP-derived key terms that were used to target annotation data in protein names and functional annotation (Uniprot [48], GeneDB [19], Gene Ontology [GO] [52]). Proteins selected as targets were assigned weights composed of two factors: (1) an average weight given by the number of citations across all references relative to the number of references actually used and (2) an investigator-assigned rank (1-3) based on the significance and scope of the primary references and experiments of the NLP sources. The names of proteins and functional annotation were mined for the weighted key terms. All investigator-selected proteins of interest were presumed to pass the filter criteria, and the remaining interactions were ordered based on key term weights, rank, and Z-score.
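The ordering step can be sketched as follows. This is a simplified illustration, not the scoring used in this work: it assumes that each key term already carries a single combined weight (citation-based weight and investigator rank folded together), that key terms are matched as case-insensitive substrings of the annotation, and that a lower Z-score is better. The `Candidate` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one interaction entering the Targeted Filter."""
    name: str
    annotation: str            # protein name plus functional annotation text
    z_score: float
    investigator_selected: bool = False


def key_term_score(annotation, term_weights):
    """Sum the weights of all key terms that occur in the annotation text."""
    text = annotation.lower()
    return sum(w for term, w in term_weights.items() if term.lower() in text)


def order_candidates(candidates, term_weights):
    """Investigator-selected proteins first; the rest by key term score, then Z-score."""
    selected = [c for c in candidates if c.investigator_selected]
    rest = [c for c in candidates if not c.investigator_selected]
    rest.sort(key=lambda c: (-key_term_score(c.annotation, term_weights), c.z_score))
    return selected + rest
```

How the citation-based weight and the investigator rank are combined is not specified in the text, so the single weight per key term here is a placeholder for that combination.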

Assessments

Predictions were benchmarked against confirmed S. mansoni-human interactions, which were compiled from the literature. Prospective interactions were assessed using vaccine candidate antigens and hypothesized vaccine candidate antigens for which interactions have not been confirmed, although several potential human protein binders have been experimentally identified (Table 4). Orthogonal biological information implemented in the filters provided significant enrichment of observed interactions (97% of predicted complexes were enriched). The number of protein pairs was reduced by about three orders of magnitude, and assessment against previously characterized interactions (63% of known interactions predicted) suggests that the method is applicable to genome-wide predictions of protein complexes.

References

1. World Health Organization. WHO Schistosomiasis, June 2012. URL http://www.who.int/topics/schistosomiasis/en/.

2. ML Burke, MK Jones, GN Gobert, and YS Li. Immunopathogenesis of human schistosomiasis. Parasite Immunology, 31(4):163–176, April 2009.

3. Thomas A Wynn, Robert W Thompson, Allen W Cheever, and Margaret M Mentink-Kane. Immunopathogenesis of schistosomiasis. Immunological Reviews, 201:156–167, October 2004.

4. Center for Disease Control. CDC - Schistosomiasis - Biology, June 2012. URL http://www.cdc.gov/parasites/schistosomiasis/biology.html.

5. Gabriela Ribeiro-dos Santos, Sergio Verjovski-Almeida, and Luciana C C Leite. Schistosomiasis - a century searching for chemotherapeutic drugs. Parasitology Research, 99(5):505–521, April 2006.

6. Jennifer Keiser, Xiao Shuhua, Marcel Tanner, Burton H Singer, and Jürg Utzinger. Combination Chemotherapy of Schistosomiasis in Laboratory Studies and Clinical Trials. Antimicrobial Agents and Chemotherapy, 47(5):1487, May 2003.

7. Allen G.P. Ross, Paul B. Bartley, Adrian C. Sleigh, G. Richard Olds, Yuesheng Li, Gail M. Williams, and Donald P. McManus. Schistosomiasis. New England Journal of Medicine, 346(16):1212–1220, 2002.

8. M Ismail, S Botros, A Metwally, S William, A Farghally, L F Tao, T A Day, and J L Bennett. Resistance to praziquantel: direct evidence from Schistosoma mansoni isolated from Egyptian villagers. The American Journal of Tropical Medicine and Hygiene, 60(6):932–935, June 1999.

9. Elizabeth Hansell, Simon Braschi, Katalin F Medzihradszky, Mohammed Sajid, Moumita Debnath, Jessica Ingram, K.C. Lim, and James H McKerrow. Proteomic Analysis of Skin Invasion by Blood Fluke Larvae. PLoS Neglected Tropical Diseases, 2(7):e262, July 2008.

10. Jessica Ingram, Giselle Knudsen, K.C. Lim, Elizabeth Hansell, Judy Sakanari, and James McKerrow. Proteomic Analysis of Human Skin Treated with Larval Schistosome Peptidases Reveals Distinct Invasion Strategies among Species of Blood Flukes. PLoS Neglected Tropical Diseases, 5(9):e1337, September 2011.

11. Fernanda C Cardoso, Gilson C Macedo, Elisandra Gava, Gregory T Kitten, Vitor L Mati, Alan L de Melo, Marcelo V Caliari, Giulliana T Almeida, Thiago M Venancio, Sergio Verjovski-Almeida, and Sergio C Oliveira. Schistosoma mansoni Tegument Protein Sm29 Is Able to Induce a Th1-Type of Immune Response and Protection against Parasite Infection. PLoS Neglected Tropical Diseases, 2(10):e308, October 2008.

12. L.A.L. Quezada, M. Sajid, K.C. Lim, and J.H. McKerrow. A Blood Fluke Serine Protease Inhibitor Regulates an Endogenous Larval Elastase. Journal of Biological Chemistry, 287(10):7074–7083, March 2012.

13. Akhmed Aslam, Phyllis Quinn, Richard S McIntosh, Jianguo Shi, Ashfaq Ghumra, James H McKerrow, Karen A Bunting, David W Dunne, Michael J Doenhoff, Sherie L Morrison, Ke Zhang, and Richard J Pleass. Proteases from Schistosoma mansoni cercariae cleave IgE at solvent exposed interdomain regions. Molecular Immunology, 45(2):567–574, January 2008.

14. Rui Zhao, Craig McFarland, Jeffrey Kieft, Anna Niedzwiecka, Marzena Jankowska-Anyszka, Janusz Stepinski, Edward Darzynkiewicz, David N M Jones, Richard E Davis, and Weizhi Liu. Structural Insights into Parasite eIF4E Binding Specificity for m7G and m2,2,7G mRNA Caps. The Journal of Biological Chemistry, 284(45):31336, November 2009.

15. S Tzeng, J.H. McKerrow, K Fukuyama, K Jeong, and W L Epstein. Degradation of purified skin keratin by a proteinase secreted from Schistosoma mansoni cercariae. The Journal of Parasitology, 69(5):992–994, 1983.

16. F P Davis. Protein complex compositions predicted by structural similarity. Nucleic Acids Research, 34(10):2943–2952, May 2006.

17. Fred P Davis, David T Barkan, Narayanan Eswar, James H McKerrow, and Andrej Sali. Host-pathogen protein interactions predicted by comparative modeling. Protein Science, 16(12):2585–2596, December 2007. URL http://pibase.janelia.org/modtie/v1.11/index.html.

18. Malcolm J Gardner, Neil Hall, Eula Fung, Owen White, Matthew Berriman, Richard W Hyman, Jane M Carlton, Arnab Pain, Karen E Nelson, Sharen Bowman, Ian T Paulsen, Keith James, Jonathan A Eisen, Kim Rutherford, Steven L Salzberg, Alister Craig, Sue Kyes, Man-Suen Chan, Vishvanath Nene, Shamira J Shallom, Bernard Suh, Jeremy Peterson, Sam Angiuoli, Mihaela Pertea, Jonathan Allen, Jeremy Selengut, Daniel Haft, Michael W Mather, Akhil B Vaidya, David M A Martin, Alan H Fairlamb, Martin J Fraunholz, David S Roos, Stuart A Ralph, Geoffrey I McFadden, Leda M Cummings, G Mani Subramanian, Chris Mungall, J Craig Venter, Daniel J Carucci, Stephen L Hoffman, Chris Newbold, Ronald W Davis, Claire M Fraser, and Bart Barrell. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419(6906):498–511, October 2002.

19. F J Logan-Klumpler, N De Silva, U Boehme, M B Rogers, G Velarde, J A McQuillan, T Carver, M Aslett, C Olsen, S Subramanian, I Phan, C Farris, S Mitra, G Ramasamy, H Wang, A Tivey, A Jackson, R Houston, J Parkhill, M Holden, O S Harb, B P Brunk, P J Myler, D Roos, M Carrington, D F Smith, C Hertz-Fowler, and M Berriman. GeneDB–an annotation database for pathogens. Nucleic Acids Research, 40(D1):D98–D108, December 2011.

20. Matthew Berriman, Brian J Haas, Philip T LoVerde, R Alan Wilson, Gary P Dillon, Gustavo C Cerqueira, Susan T Mashiyama, Bissan Al-Lazikani, Luiza F Andrade, Peter D Ashton, Martin A Aslett, Daniella C Bartholomeu, Gaelle Blandin, Conor R Caffrey, Avril Coghlan, Richard Coulson, Tim A Day, Art Delcher, Ricardo DeMarco, Appolinaire Djikeng, Tina Eyre, John A Gamble, Elodie Ghedin, Yong Gu, Christiane Hertz-Fowler, Hirohisha Hirai, Yuriko Hirai, Robin Houston, Alasdair Ivens, David A Johnston, Daniela Lacerda, Camila D Macedo, Paul McVeigh, Zemin Ning, Guilherme Oliveira, John P Overington, Julian Parkhill, Mihaela Pertea, Raymond J Pierce, Anna V Protasio, Michael A Quail, Marie-Adèle Rajandream, Jane Rogers, Mohammed Sajid, Steven L Salzberg, Mario Stanke, Adrian R Tivey, Owen White, David L Williams, Jennifer Wortman, Wenjie Wu, Mostafa Zamanian, Adhemar Zerlotini, Claire M Fraser-Liggett, Barclay G Barrell, and Najib M El-Sayed. The genome of the blood fluke Schistosoma mansoni. Nature, 460(7253):352–358, July 2009.

21. I Ispolatov. Binding properties and evolution of homodimers in protein-protein interaction networks. Nucleic Acids Research, 33(11):3629–3635, June 2005.

22. S K Hanks and T Hunter. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. Journal of the Federation of American Societies for Experimental Biology, 9:576–596, May 1995.

23. Diana Bahia, Luiza Freire Andrade, Fernanda Ludolf, Renato Arruda Mortara, and Guilherme Oliveira. Protein tyrosine kinases in Schistosoma mansoni. Memórias do Instituto Oswaldo Cruz, 101:137–143, 2006.

24. P Jones, H Sage, S Pino-Heiss, and J H McKerrow. Proteinases from invasive larvae of the trematode parasite Schistosoma mansoni degrade connective-tissue and basement-membrane macromolecules. Biochemical Journal, 231(1):47, October 1985.

25. J.H. McKerrow, S Pino-Heiss, R Lindquist, and Z Werb. Purification and characterization of an elastinolytic proteinase secreted by cercariae of Schistosoma mansoni. Journal of Biological Chemistry, 260(6):3703–3707, March 1985.

26. Rachel S Curwen, Peter D Ashton, Shobana Sundaralingam, and R Alan Wilson. Identification of novel proteases and immunomodulators in the secretions of schistosome cercariae that facilitate host entry. Molecular & Cellular Proteomics, 5(5):835–844, May 2006.

27. R Correa-Oliveira, E J Pearce, G C Oliveira, D B Golgher, N Katz, L G Bahia, O S Carvalho, G Gazzinelli, and A Sher. The human immune response to defined immunogens of Schistosoma mansoni: elevated antibody levels to paramyosin in stool-negative individuals from two endemic areas in Brazil. Transactions of the Royal Society of Tropical Medicine and Hygiene, 83(6):798–804, November 1989.

28. S He, M D Gaça, and A F Walls. A role for tryptase in the activation of human mast cells: modulation of histamine release by tryptase and inhibitors of tryptase. The Journal of Pharmacology and Experimental Therapeutics, 286(1):289–297, July 1998.

29. Istvan Botos, Erik Meyer, Myhanh Nguyen, Stanley M Swanson, John M Koomen, David H Russell, and Edgar F Meyer. The structure of an insect chymotrypsin. Journal of Molecular Biology, 298(5):895–901, May 2000.

30. A J Simpson, A I Maxwell, J R W Govan, C Haslett, and J M Sallenave. Elafin (elastase-specific inhibitor) has anti-microbial activity against Gram-positive and Gram-negative respiratory pathogens. FEBS Letters, 452(3):309–313, June 1999.

31. O Wiedow, J Lüdemann, and B Utecht. Elafin is a potent inhibitor of proteinase 3. Biochemical and Biophysical Research Communications, 174(1):6–10, January 1991.

32. Jiusheng Deng, Daniel Gold, Philip T LoVerde, and Zvi Fishelson. Inhibition of the Complement Membrane Attack Complex by Schistosoma mansoni Paramyosin. Infection and Immunity, 71(11):6402–6410, November 2003.

33. Zvi Fishelson. Novel mechanisms of immune evasion by Schistosoma mansoni. Memórias do Instituto Oswaldo Cruz, 90(2):289–292, 1995.

34. Victor Neduva and Robert B Russell. Peptides mediating interaction networks: new leads at last. Current Opinion in Biotechnology, 17(5):465–471, October 2006.

35. Amy Hin Yan Tong, Becky Drees, Giuliano Nardelli, Gary D Bader, Barbara Brannetti, Luisa Castagnoli, Marie Evangelista, Silvia Ferracuti, Bryce Nelson, Serena Paoluzi, Michele Quondam, Adriana Zucconi, Christopher W V Hogue, Stanley Fields, Charles Boone, and Gianni Cesareni. A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science, 295(5553):321, January 2002.

36. Victor Neduva, Rune Linding, Isabelle Su-Angrand, Alexander Stark, Federico de Masi, Toby J Gibson, Joe Lewis, Luis Serrano, and Robert B Russell. Systematic discovery of new recognition peptides mediating protein interaction networks. PLoS Biology, 3(12):e405, December 2005.

37. M A Marti-Renom, A C Stuart, A Fiser, R Sánchez, F Melo, and A Sali. Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 29(1):291–325, 2000.

38. Philip Smith, Rosie E Fallon, Niamh E Mangan, Caitriona M Walsh, Margarida Saraiva, Jon R Sayers, Andrew N J McKenzie, Antonio Alcami, and Padraic G Fallon. Schistosoma mansoni secretes a chemokine binding protein with antiinflammatory activity. Journal of Experimental Medicine, 202(10):1319–1325, November 2005.

39. Benjamin A Shoemaker and Anna R Panchenko. Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners. PLOS Computational Biology, 3(4):e43, April 2007.

40. Haiyuan Yu, Nicholas M Luscombe, Hao Xin Lu, Xiaowei Zhu, Yu Xia, Jing-Dong J Han, Nicolas Bertin, Sambath Chung, Marc Vidal, and Mark Gerstein. Annotation Transfer Between Genomes: Protein–Protein Interologs and Protein–DNA Regulogs. Genome Research, 14:1107–1118, 2004.

41. E Sprinzak and H Margalit. Correlated sequence-signatures as markers of protein-protein interaction. Journal of Molecular Biology, 311(4):681–692, 2001.

42. U Pieper, B M Webb, D T Barkan, D Schneidman-Duhovny, A Schlessinger, H Braberg, Z Yang, E C Meng, E F Pettersen, C C Huang, R S Datta, P Sampathkumar, M S Madhusudhan, K Sjolander, T E Ferrin, S K Burley, and A Sali. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Research, 39(Database):D465–D474, December 2010.

43. Narayanan Eswar, Bino John, Nebojsa Mirkovic, Andras Fiser, Valentin A Ilyin, Ursula Pieper, Ashley C Stuart, Marc A Marti-Renom, M S Madhusudhan, Bozidar Yerkovich, and Andrej Sali. Tools for comparative protein structure modeling and analysis. Nucleic Acids Research, 31(13):3375–3380, 2003.

44. Andrej Sali and Tom L Blundell. Comparative Protein Modelling by Satisfaction of Spatial Restraints. Journal of Molecular Biology, 234(3):779–815, December 1993.

45. M Shen and A Sali. Statistical potential for assessment and prediction of protein structures. Protein Science, 15(11):2507–2524, 2006.

46. M K Basu, L Carmel, I B Rogozin, and E V Koonin. Evolution of protein domain promiscuity in eukaryotes. Genome Research, 18(3):449–461, 2008.

47. Feng Liang, Ingeborg Holt, Geo Pertea, Jonathan Upton, and John Quackenbush. The TIGR Gene Indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Research, 28(1):141, January 2000.

48. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Research, 40(D1):D71–D75, December 2011.

49. Andrew I Su, Tim Wiltshire, Serge Batalov, Hilmar Lapp, Keith A Ching, David Block, Jie Zhang, Richard Soden, Mimi Hayakawa, Gabriel Kreiman, Michael P Cooke, John R Walker, and John B Hogenesch. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences, 101(16):6062–6067, April 2004.

50. T J P Hubbard, B L Aken, K Beal, B Ballester, M Caccamo, Y Chen, L Clarke, G Coates, F Cunningham, T Cutts, T Down, S C Dyer, S Fitzgerald, J Fernandez-Banet, S Graf, S Haider, M Hammond, J Herrero, R Holland, K Howe, K Howe, N Johnson, A Kahari, D Keefe, F Kokocinski, E Kulesha, D Lawson, I Longden, C Melsopp, K Megy, P Meidl, B Ouverdin, A Parker, A Prlic, S Rice, D Rios, M Schuster, I Sealy, J Severin, G Slater, D Smedley, G Spudich, S Trevanion, A Vilella, J Vogel, S White, M Wood, T Cox, V Curwen, R Durbin, X M Fernandez-Suarez, P Flicek, A Kasprzyk, G Proctor, S Searle, J Smith, A Ureta-Vidal, and E Birney. Ensembl 2007. Nucleic Acids Research, Database issue, 35:D610–D617, 2007.

51. Evelyn Camon, Michele Magrane, Daniel Barrell, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, Rodrigo Lopez, and Rolf Apweiler. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Research, Database issue, 32:D262–D266, 2004.

52. David Botstein, J Michael Cherry, Michael Ashburner, Catherine A Ball, Judith A Blake, Heather Butler, Allan P Davis, Kara Dolinski, Selina S Dwight, Janan T Eppig, Midori A Harris, David P Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna Lewis, John C Matese, Joel E Richardson, Martin Ringwald, Gerald M Rubin, and Gavin Sherlock. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, May 2000.

53. Marc H Dresden and Harold L Asch. Proteolytic enzymes in extracts of Schistosoma mansoni Cercariae. Biochimica et Biophysica Acta (BBA) - Enzymology, 289(2):378–384, December 1972.

54. J P Salter. Schistosome Invasion of Human Skin and Degradation of Dermal Elastin Are Mediated by a Single Serine Protease. Journal of Biological Chemistry, 275(49):38667–38673, September 2000.

55. G. Gazzinelli and J. Pellegrino. Elastolytic Activity of Schistosoma Mansoni Cercarial Extract. Journal of Parasitology, 50:591–592, August 1964.

56. William Castro-Borges, Adam Dowle, Rachel S Curwen, Jane Thomas-Oates, and R Alan Wilson. Enzymatic Shaving of the Tegument Surface of Live Schistosomes for Proteomic Analysis: A Rational Approach to Select Vaccine Candidates. PLoS Neglected Tropical Diseases, 5(3):e993, March 2011.

57. A A Da’dara, P J Skelly, M M Wang, and D A Harn. Immunization with plasmid DNA encoding the integral membrane protein, Sm23, elicits a protective immune response against schistosome infection in mice. Vaccine, 20(3-4):359–369, November 2001.

58. Giselle M Knudsen, Katalin F Medzihradszky, Kee-Chong Lim, Elizabeth Hansell, and James H McKerrow. Proteomic analysis of Schistosoma mansoni cercarial secretions. Molecular & Cellular Proteomics, 4(12):1862–1875, December 2005.

59. Melaine Delcroix, Katalin Medzihradsky, Conor R Caffrey, Richard D Fetter, and James H McKerrow. Proteomic analysis of adult S. mansoni gut contents. Molecular & Biochemical Parasitology, 154(1):95–97, July 2007.

60. M Tendler, C A Brito, M M Vilar, N Serra-Freire, C M Diogo, M S Almeida, A C Delbem, J F Da Silva, W Savino, R C Garratt, N Katz, and A S Simpson. A Schistosoma mansoni fatty acid-binding protein, Sm14, is the potential basis of a dual-purpose anti-helminth vaccine. Proceedings of the National Academy of Sciences of the United States of America, 93(1):269–273, January 1996.

61. Maged Al-Sherbiny, Ahmed Osman, Rashida Barakat, Hala El Morshedy, Robert Bergquist, and Richard Olds. In vitro cellular and humoral responses to Schistosoma mansoni vaccine candidate antigens. Acta Tropica, 88(2):117–130, October 2003.

62. Cristina T Fonseca, Edécio Cunha-Neto, Anna C Goldberg, Jorge Kalil, Amélia R de Jesus, Edgard M Carvalho, Rodrigo Correa-Oliveira, and Sergio C Oliveira. Human T cell epitope mapping of the Schistosoma mansoni 14-kDa fatty acid-binding protein using cells from patients living in areas endemic for schistosomiasis. Microbes and Infection / Institut Pasteur, 7(2):204–212, February 2005.

63. E J Pearce, S L James, S Hieny, D E Lanar, and A Sher. Induction of protective immunity against Schistosoma mansoni by vaccination with schistosome paramyosin (Sm97), a nonsurface parasite antigen. Proceedings of the National Academy of Sciences of the United States of America, 85(15):5678–5682, August 1988.

64. Kamal A Shalaby, Lei Yin, Arvind Thakur, Linda Christen, Edward G Niles, and Philip T LoVerde. Protection against Schistosoma mansoni utilizing DNA vaccination with genes encoding Cu/Zn cytosolic superoxide dismutase, signal peptide-containing superoxide dismutase and glutathione peroxidase enzymes. Vaccine, 22(1):130–136, December 2003.

65. Afzal A Siddiqui, Troy Phillips, Hugues Charest, Ron B Podesta, Martha L Quinlin, Justin R Pinkston, Jenny D Lloyd, Janet Pompa, Rachael M Villalovos, and Michelle Paz. Enhancement of Sm-p80 (large subunit of calpain) induced protective immunity against Schistosoma mansoni through co-delivery of interleukin-2 and interleukin-12 in a DNA vaccine formulation. Vaccine, 21(21-22):2882–2889, June 2003.

66. Sophia J Parker-Manuel, Alasdair C Ivens, Gary P Dillon, and R Alan Wilson. Gene Expression Patterns in Larval Schistosoma mansoni Associated with Infection of the Mammalian Host. PLoS Neglected Tropical Diseases, 5(8):e1274, August 2011.

67. M H Tran, M S Pearson, J M Bethony, and D J Smyth. Tetraspanins on the surface of Schistosoma mansoni are protective antigens against schistosomiasis. Nature Medicine, 2006.

68. Andreas Ruppel, Diane J McLaren, Hans Jochen Diesfeld, and Ursula Rother. Schistosoma mansoni escape from complement-mediated parasiticidal mechanisms following percutaneous primary infection. European Journal of Immunology, 14(8):702–708, 1984.

Figure Legends

[Figure 1 panels. A: Modtie & ModPipe (S. mansoni and human proteomes, template interactions, template structures; detect sequence similarities; detect structure similarities; identify pairs with similarity to known complexes; assess sequence or structural basis for putative interaction). B: Initial Predictions (Network Filter: Promiscuous Domain Filter, Homo-dimer Filter, High Confidence Filter; Biological Context Filter: NLP Enrichment Filter, Life Cycle/Tissue Filter, NLP Target Filter). C: Predicted S. mansoni-H. sapiens protein-protein interactions.]

Figure 1. Prediction Framework. A. Modtie & ModPipe protocol for detecting sequence and structure similarity: 1. The protocol begins with the set of human and S. mansoni proteins. 2. Sequence matching procedures are then used to identify similarities between these proteins and proteins with known structure or interactors. 3. A structure-based statistical potential assessment, or a sequence similarity score in the absence of structure, is then used to predict interacting partners and yield the Initial Predictions. B. Initial Predictions: 1. Network Filter: interactions based on promiscuous domains and homo-dimer complexes are removed, and high confidence interactions are extracted from the initial set of predictions. 2. The Biological Context Filter is applied to the remaining set of predictions, weighing and ranking NLP enriched predictions, isolating life cycle stage and tissue interactions between S. mansoni and humans, and applying the Targeted Filter. C. Predicted Interactions are the output of the Biological Context Filter. Figure 1B illustrates the reduction in interactions obtained after each step, from 528,719 to the remaining 1,345 interactions. The framework reduces the number of potential S. mansoni-human protein interactions by about three orders of magnitude (Tables 1-3).

Figure 2. (Retrospective Predictions): Collagen & C3. Examples of validated interactions. (A) Cercarial elastase (purple) and human collagen (blue) based on the template structure of tick tryptase inhibitor in complex with bovine trypsin (PDB 2UUY) (B) Cercarial elastase (purple) and human Complement C3 (precursor C3b) (blue) based on the template structure (PDB 1EQ9) of fire ant chymotrypsin complexed with PMSF, an inhibitor. Figures were generated by PyMOL (http://www.pymol.org).

Figure 3. (Prospective Predictions): Elafin & CD59. Examples of predicted interactions. (A) Cercarial elastase (purple) was predicted to interact with the human elastase-specific inhibitor elafin (blue). This prediction is based on the template crystal structure of elafin complexed with porcine pancreatic elastase (PDB 1FLE). Elafin is implicated in the dermal immune response in humans, in wound healing and as an antimicrobial against other pathogens. (B) Sm29 (purple) was predicted to interact with human CD59 (blue), a protein involved in the membrane attack complex; this prediction corroborates the hypothesis that S. mansoni is able to disable the immune response. The prediction was based upon the template of ATF-urokinase and its receptor (PDB 2I9B). Sm29, an uncharacterized transmembrane protein, is an S. mansoni surface protein in both the schistosomula and adult life cycle stages that has been implicated in several immune response interactions, making it an important vaccine candidate antigen. Figures were generated by PyMOL (http://www.pymol.org).

Tables

Network Filtered Interactions

Filter                 | Number of Interactions Removed | Interactions Remaining
Unfiltered             | -                              | 528,719
Promiscuous Domains    | 242,677                        | 286,042
Homo-dimer Complexes   | 143,065                        | 142,977
High Confidence        | 108,813                        | 34,164

Table 1. Network Filter: Interactions removed during each application of the indicated filter. Unfiltered interactions from the initial result set are shown in the first row. The numbers of interactions removed and remaining after each filtering step are shown for the Promiscuous Domains, Homo-dimer Complexes, and High Confidence filters. The remaining 34,164 interactions are used as input to the next step, the Biological Context Filter.

Biological Context Filter Interactions.

Filter               | Interactions Removed | Interactions Remaining
Network              | -                    | 34,164
NLP & Enrichment     | 10,929               | 23,235
Life cycle/ Tissue   | 492                  | 22,743
Targeted             | 21,398               | 1,345

Table 2. Biological Context Filter: Interactions removed during each application of indicated filter. Unfiltered interactions from the Network Filter result set are shown in the first row. The number of interactions found from the total resulting data set are shown in each category column including interactions enriched in the NLP filter, Life Cycle/Tissue Filter, and Targeted Filter.
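The counts in Tables 1 and 2 can be checked with a few lines of arithmetic. The sketch below replays the removals reported in the two tables and computes the overall fold reduction; the `funnel` helper and the variable names are illustrative only.

```python
import math

START = 528_719  # unfiltered initial predictions (Table 1)

# (filter name, interactions removed), in the order reported in Tables 1 and 2.
STEPS = [
    ("Promiscuous Domains", 242_677),
    ("Homo-dimer Complexes", 143_065),
    ("High Confidence", 108_813),
    ("NLP & Enrichment", 10_929),
    ("Life cycle/Tissue", 492),
    ("Targeted", 21_398),
]


def funnel(start, steps):
    """Return (filter name, interactions remaining) after each removal step."""
    remaining = start
    history = []
    for name, removed in steps:
        remaining -= removed
        history.append((name, remaining))
    return history


history = funnel(START, STEPS)
final_count = history[-1][1]
fold_reduction = START / final_count
orders_of_magnitude = math.log10(fold_reduction)

for name, remaining in history:
    print(f"{name:22s} {remaining:>9,d}")
print(f"fold reduction: {fold_reduction:.0f}x (log10 = {orders_of_magnitude:.2f})")
```

The running totals reproduce the "Interactions Remaining" columns of both tables, and the overall reduction is roughly 393-fold, which is the basis for the "about three orders of magnitude" description.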

Biological Context Filter Interactions Correlated to S. mansoni Life Cycle Stage.

Life cycle stage | Interactions   | Life cycle-Tissue | Human Tissues
Cercariae        | 460 (460/1345) | 1 (skin)          | 21,545 (21,545/1.5e6)
Schistosomula    | 442 (442/1345) | 15                | 22,743 (22,743/1.5e6)
Adult            | 329 (329/1345) | 13                | 22,576 (22,576/1.5e6)
Egg              | 114 (114/1345) | 11                | 22,164 (22,164/1.5e6)

Table 3. Protein interactions involved in S. mansoni life cycle stages that are directly involved in human pathogenesis, from the resulting targeted predictions. Shown are the targeted life cycle stage, the number of predicted interactions for the corresponding life cycle stage, the number of human tissues targeted in that stage, and the associated human tissues involved in the interaction. The similar number of human tissues listed in columns two and three represents the overlap in targeted S. mansoni-human protein interaction sites in vivo. Numbers in parentheses represent a subset of interactions from the entire set of filtered interactions. Human tissue expression data were obtained from the GNF Tissue Atlas [49] and GO [52] functional annotation unless noted otherwise.

Comparison of known and predicted S. mansoni protein interactions.

S. mansoni Protein               | Human Protein           | Predicted | Reference    | PDB
Smp 001500 EIF4E                 | EIF4E-binding protein 1 | -         | [14]         | 3HXG
SmCe Cercarial elastase          | Collagen (I, IV, VIII)  | *         | [15, 24, 25] | 2CHA
SmCe Cercarial elastase          | IgE                     | -         | [13]         | -
SmCe Cercarial elastase          | Complement C3 (C3b)     | *         | [10]         | 1EQ9
SmCe Cercarial elastase          | Laminin                 | *         | [15, 24, 25] | 2CHA
SmCe Cercarial elastase          | Fibronectin             | *         | [15, 24, 25] | 2CHA
SmCe Cercarial elastase          | Keratin                 | *         | [53]         | -
SmCe Cercarial elastase          | Elastin                 | *         | [54, 55]     | 1FON
SmCB2 (Smp 141610) Cathepsin B   | Collagen (I) (nidogen)  | *         | [10]         | 1STF
SmCB2 (Smp 141610) Cathepsin B   | Complement C3           | -         | [10]         | -

Table 4. Confirmed protein-protein interactions between S. mansoni and human proteins. The application framework predicted 7 of the 10 known interactions. Predicted interactions are indicated by (*). The PDB column indicates the template PDB structure used to predict the interaction.

Potential S. mansoni Protein Interactions.

S. mansoni Protein | Human Protein | Predicted | Reference | PDB
Sm-TSP-1 (Smp 095630) Tetraspanin | IgG1/IgG3 Immune Response | - | [56] | -
Sm-TSP-2 (Smp 181530) Tetraspanin | IgG1/IgG3 Immune Response | - | [56] | -
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 67782326 (GI) TGF-beta receptor type-2 isoform A precursor | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 67782324 (GI) TGF, beta receptor II | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 42716302 (GI) CD59 glycoprotein precursor, MAC | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 9966907 (GI) SLURP-1 | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 4505865 (GI) PLAU | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 53829381 (GI) PLAU | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 53829379 (GI) PLAU | * | [11, 56] | 2I9B, 1YWH
Sm 29 (Smp 072190) Unknown, C-terminal transmembrane domain | 4504033 (GI) Glycosyl-phosphatidylinositol-anchored molecule-like protein | * | [11, 56] | 2I9B, 1YWH
Sm 23 (Smp 017430) Tetraspanin | IgG3, MAP-3 Immune Response | - | [11, 56–58] | -
Sm 14 (Smp 095360.1/.2/.3/.4) FABP | IgG1, IgG3 Immune Response | - | [11, 26, 56, 58–62] | -
Sm 97 (P06198 UP) Paramyosin | IgG, IgE Immune Response | - | [58, 61, 63] | -
Sm 28 (Smp 054160) GST | IL-5 to MAP-4, IgG2 to MAP-4 Immune Response | - | [11, 58, 61] | -
SOD (Smp 176200.1) SOD [Cu-Zn], Cytosolic | 4507149 (GI) SOD1 [Cu-Zn] | * | [11, 58, 59, 64] | 2AF2, 1JK9
SOD (Smp 176200.1) SOD [Cu-Zn], Cytosolic | 118582275 (GI) SOD3 Extracellular | * | [11, 58, 59, 64] | 2AF2
SOD (Smp 176200.1) SOD [Cu-Zn], Cytosolic | 4826665 (GI) Copper chaperone for SOD | * | [11, 58, 59, 64] | 2AF2, 1JK9
Sm-p80 (Smp 173580) Katanin p80 WD40 | C3 Complement Immune Response | - | [56, 65] | -
Smp 006510 Cercarial Elastase | 4505787 (GI) Elafin (*Supplemental Table 1) | * | [31] | 1FLE
VAL Venom Allergen Proteins | *Supplemental Table 2 | * | [26, 56, 66] | *
Calpain | *Supplemental Table 3 | * | [11, 12, 58, 65, 66] | *
Cystatin | *Supplemental Table 4 | * | [58, 59] | -
Tetraspanin | *Supplemental Table 5 | * | [11, 56, 66, 67] | -
Immune evasion | Immunoglobin Proteins *Supplemental Table 6 | * | [5, 11, 13, 32, 59, 68] | -

Table 5. Prospective protein-protein interactions between S. mansoni and human proteins. Prospective interactions are currently under investigation as candidate antigens or potential vaccine candidate antigens. The application framework predicted interactions between several proteins and suggested S. mansoni and human interactions with further variations on these interactions listed in Supplemental Tables (1-6). Cercarial Elastase Interactions

| S. mansoni | Human (GI) | Human Protein |
|---|---|---|
| Smp_112090 | 148613878 | acrosin precursor [Homo sapiens] |
| SMP_006510 | 148613878 | acrosin precursor [Homo sapiens],Acrosin precursor (EC 3.4.21.10) [Contains: Ac |
| SMP_119130 | 38504669 | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 8 [Homo |
| SMP_006510 | 38504669 | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 8 [Homo |
| SMP_006510 | 21359871 | SPARC-like 1 [Homo sapiens] |
| Smp_006520 | 134268640 | tectorin alpha precursor [Homo sapiens] |
| SMP_006510 | 134268640 | tectorin alpha precursor [Homo sapiens], |
| SMP_112090 | 134268640 | tectorin alpha precursor [Homo sapiens], |
| Smp_119130 | 21614538 | (O43464) Splice isoform 2 of O43464 |
| SMP_006510 | 31563528 | (O95925) Splice isoform 2 of O95925,AF286369_1 eppin-2,serine protease inhibitor |
| Smp_112090 | 58533163 | (P14210) Splice isoform 2 of P14210 |
| SMP_006510 | 58533163 | (P14210) Splice isoform 2 of P14210,Hepatocyte growth factor, heavy chain precur |
| Smp_112090 | 58533170 | (P14210) Splice isoform 3 of P14210 |
| SMP_006510 | 58533170 | (P14210) Splice isoform 3 of P14210,HEPATOCYTE GROWTH FACTOR |
| Smp_006520 | 5453652 | (P19883) Splice isoform 2 of P19883 |
| SMP_006510 | 5453652 | (P19883) Splice isoform 2 of P19883,follistatin precursor |
| Smp_006510 | 21536452 | (P35030) Splice isoform C of P35030,(P35030) Splice isoform C of P35030,Protease |
| Smp_006510 | 9665236 | (Q9UKR0) Splice isoform 2 of Q9UKR0,(Q9UKR0) Splice isoform 2 of Q9UKR0,kallikre |
| Smp_006510 | 153791781 | 48 kDa protein [Homo sapiens],48 kDa protein [Homo sapiens],Uncharacterized prot |
| Smp_006520 | 42544239 | Adipsin/complement factor D precursor (EC 3.4.21.46) (Complement factor D, p |
| SMP_006510 | 42544239 | Adipsin/complement factor D precursor (EC 3.4.21.46) (Complement factor D, p |
| Smp_119130 | 89142741 | AF243527_5 serine protease |
| SMP_006510 | 89142741 | AF243527_5 serine protease,serine protease prostase,Q9Y5K2_chr19:56102005-561058 |
| SMP_006510 | 54873613 | Agrin |
| SMP_006510 | 55743100 | alpha 3 type VI collagen isoform 2 precursor [Homo sapiens] |
| SMP_006510 | 55743104 | alpha 3 type VI collagen isoform 4 precursor [Homo sapiens]. |
| SMP_006510 | 4502067 | Alpha-1-microglobulin/bikunin (Growth-inhibiting protein 19),,AMBP protein precu |
| SMP_112090 | 115583663 | Alpha-2-antiplasmin precursor (Alpha-2-plasmin inhibitor) (Alpha-2-PI) (Alph |
| SMP_006510 | 115583663 | Alpha-2-antiplasmin precursor (Alpha-2-plasmin inhibitor) (Alpha-2-PI) (Alph |
| Smp_112090 | 41406055 | Amyloid beta (A4) protein (Peptidase nexin-II, Alzheimer disease) |
| SMP_006510 | 41406055 | Amyloid beta (A4) protein (Peptidase nexin-II, Alzheimer disease),Amyloid beta A |

SMP_119130 4502147 gi Amyloid-like protein 2 precursor (Amyloid protein homolog) (APPH)DE (CDEI box- 30 SMP_006510 4502147 gi Amyloid-like protein 2 precursor (Amyloid protein homolog) (APPH)DE (CDEI box- SMP_119130 4557287 gi Angiotensinogen (Serine (Or cysteine) proteinase inhibitor, clade ADE (Alpha-1 SMP_006510 4557287 gi Angiotensinogen (Serine (Or cysteine) proteinase inhibitor, clade ADE (Alpha-1 SMP_006510 4507065 gi Antileukoproteinase 1 precursor (ALP) (HUSI-1) (Seminal proteinaseDE inhibitor Smp_119130 4507065 gi Antileukoproteinase 1 precursor (ALP) (HUSI-1) (Seminal proteinaseDE inhibitor Smp_006510 13346502 gi Apolipoprotein (A) related gene C (Lipoprotein, Lp(A)-like 2,),Apolipoprotein (A Smp_006520 41350214 gi Asporin (LRR class 1) Smp_112090 29244926 gi atrial natriuretic peptide-converting enzyme [Homo sapiens] Smp_006510 11342670 gi Azurocidin 1, preproprotein,Azurocidin 1, preproprotein,Azurocidin precursor (Ca Smp_006510 11545839 gi Brain-specific serine protease-4 (Serine protease PRSS22),Brain-specific serine Smp_006510 22208982 gi Breast normal epithelial cell associated serine protease (KallikreinDE 10),Bre Smp_006510 62526043 gi Caldecrin precursor (EC 3.4.21.2) (Chymotrypsin C),Caldecrin precursor (EC 3.4.2 Smp_006520 4502997 gi Carboxypeptidase A1 precursor (EC 3.4.17.1) SMP_119130 4502997 gi Carboxypeptidase A1 precursor (EC 3.4.17.1),Carboxypeptidase A1 (Pancreatic),Car Smp_006520 4502999 gi CARBOXYPEPTIDASE A2 PRECURSOR (EC 3.4.17.15) Smp_119130 126273559 gi carboxypeptidase B2 (plasma, carboxypeptidase U), isoform CRA_a SMP_006510 22202611 gi Carboxypeptidase D,,Similar to carboxypeptidase D (CPD protein),Carboxypeptidase Smp_112090 4503149 gi Cathepsin G precursor (EC 3.4.21.20) (CG) SMP_006510 4503149 gi Cathepsin G precursor (EC 3.4.21.20) (CG),Cathepsin G precursor (EC 3.4.21.20) ( SMP_006510 58218970 gi cDNA FLJ55455, highly similar to Polyserase-2 (EC 3.4.21.-) Smp_006510 24308541 gi CDNA FLJ90724 fis, clone PLACE1009279, 
weakly similar toDE Physiologically act SMP_006510 10190748 gi CEGP1 protein Smp_112090 4502907 gi Chymase precursor (EC 3.4.21.39) (Mast cell protease I) SMP_006510 4502907 gi Chymase precursor (EC 3.4.21.39) (Mast cell protease I),Chymase precursor (EC 3. Smp_006510 4503137 gi Chymotrypsin-like protease CTRL-1 precursor (EC 3.4.21.-),Chymotrypsin-like prot Smp_006520 118498341 gi chymotrypsinogen B precursor [Homo sapiens] Smp_112090 118498350 gi chymotrypsinogen B2 precursor [Homo sapiens] Smp_006510 4503649 gi Coagulation factor IX (Plasma thromboplastic component, ChristmasDE disease, h Smp_119130 10518503 gi Coagulation factor VII (Serum prothrombin conversion accelerator) SMP_006510 10518503 gi Coagulation factor VII (Serum prothrombin conversion accelerator),Coagulation fa Smp_119130 4503645 gi Coagulation factor VII (Serum prothrombin conversion accelerator)DE (FVII coag SMP_006510 4503645 gi Coagulation factor VII (Serum prothrombin conversion accelerator)DE (FVII coag Smp_006510 4503625 gi Coagulation factor X,Coagulation factor X,Coagulation factor X precursor (EC 3.4 SMP_006510 145275213 gi coagulation factor XII precursor [Homo sapiens] Smp_006520 4502961 gi COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LCDE COLLAGEN) SMP_006510 4502961 gi COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LCDE COLLAGEN),Co 31 SMP_006510 55743098 gi collagen alpha-3(VI) chain isoform 1 precursor [Homo sapiens] SMP_006510 55743102 gi collagen alpha-3(VI) chain isoform 3 precursor [Homo sapiens] SMP_006510 55743106 gi collagen alpha-3(VI) chain isoform 5 precursor [Homo sapiens] Smp_119130 93141047 gi Collagen type XII alpha 1 Smp_006510 7706083 gi Complement C1r-like proteinase,Complement C1r-like proteinase,Q9NZP8_chr12:71401 Smp_006510 4502495 gi Complement C1s subcomponent precursor (EC 3.4.21.42) (C1 esterase)DE [Contains Smp_112090 119392081 gi Complement factor I precursor (EC 3.4.21.45) (C3B/C4B inactivator)[Contains: Com SMP_006510 
119392081 gi Complement factor I precursor (EC 3.4.21.45) (C3B/C4B inactivator)[Contains: Com Smp_006510 21264359 gi Complement factor MASP-3 (Mannan-binding lectin serine protease 1,DE isoform 2 Smp_119130 92110053 gi CUB and Sushi multiple domains 2 Smp_119130 10092639 gi Cysteine-rich motor neuron 1 SMP_112090 10092639 gi Cysteine-rich motor neuron 1,Q9NZV1_chr2:36557971-36750376_G9V Cysteine-rich rep SMP_006510 10092639 gi Cysteine-rich motor neuron 1,Q9NZV1_chr2:36557971-36750376_G9V Cysteine-rich rep Smp_006520 110735443 gi DJ894D12.3 (Delta-like 1 (Mouse) homolog) SMP_119130 110735443 gi DJ894D12.3 (Delta-like 1 (Mouse) homolog),O00548_chr6:170448474-170455736_R502 D SMP_006510 4505787 gi Elafin precursor (Elastase-specific inhibitor) (ESI) (Skin-derivedDE antileuko Smp_006520 4505787 gi Elafin precursor (Elastase-specific inhibitor) (ESI) (Skin-derivedDE antileuko Smp_006520 58331209 gi Elastase 1, pancreatic SMP_006510 58331209 gi Elastase 1, pancreatic,Elastase-1 precursor (EC 3.4.21.36), Smp_006510 4503549 gi Elastase 2, neutrophil,Elastase 2, neutrophil,Leukocyte elastase precursor (EC 3 Smp_112090 58331211 gi Elastase 2B, preproprotein SMP_006510 58331211 gi Elastase 2B, preproprotein,Elastase 2B,Elastase 2B.,Pancreatic elastase IIB Smp_112090 6679625 gi Elastase 3B, pancreatic SMP_006510 6679625 gi Elastase 3B, pancreatic, Smp_006510 15559207 gi Elastase-2A precursor (EC 3.4.21.71),Elastase-2A precursor (EC 3.4.21.71),Elasta Smp_006510 58331214 gi Elastase-3A precursor (EC 3.4.21.70) (Elastase IIIA) (Protease E),Elastase-3A pr SMP_006510 50659100 gi ELGC699 Smp_006510 4506151 gi Enteropeptidase precursor (EC 3.4.21.9) (Enterokinase) (Serineprotease 7) [Conta Smp_112090 117956391 gi EOS protein (Hypothetical protein) SMP_006510 117956391 gi EOS protein (Hypothetical protein),PRSS33 protein (Protease, serine, 33),PRSS33 Smp_006520 21614533 gi eosinophil serine protease 1 splicing variant SMP_006510 21614533 gi eosinophil serine protease 1 splicing 
variant,(Q9Y6M0) Splice isoform 3 of Q9Y6M Smp_006510 12408682 gi Epididymal sperm-binding protein 1 (Epididymal secretory protein 12)DE (hE12), SMP_119130 10732863 gi Eppin precursor (Epididymal protease inhibitor) (Serine proteaseDE inhibitor-l SMP_006510 10732863 gi Eppin precursor (Epididymal protease inhibitor) (Serine proteaseDE inhibitor-l Smp_006520 14211875 gi Esophagus cancer-related gene-2 32 SMP_112090 14211875 gi Esophagus cancer-related gene-2,,Esophagus cancer-related gene-2 protein precurs SMP_006510 14211875 gi Esophagus cancer-related gene-2,,Esophagus cancer-related gene-2 protein precurs Smp_112090 71040111 gi Fibromodulin, SMP_006510 5901956 gi Follistatin-related protein 1 precursor (Follistatin-like 1), SMP_119130 5901956 gi Follistatin-related protein 1 precursor (Follistatin-like 1),,Follistatin-relate SMP_006510 5031701 gi Follistatin-related protein 3 precursor (Follistatin-like 3)DE (Follistatin-re SMP_006510 54792136 gi follistatin-related protein 4 precursor [Homo sapiens] SMP_006510 50363221 gi Full-length cDNA clone CS0DE007YP21 of Placenta of Homo sapiensDE (human) (Ful Smp_112090 5453676 gi Granzyme A precursor (EC 3.4.21.78) (Cytotoxic T-lymphocyte proteinase1) (Hanukk SMP_006510 5453676 gi Granzyme A precursor (EC 3.4.21.78) (Cytotoxic T-lymphocyte proteinase1) (Hanukk Smp_112090 4758494 gi Granzyme B precursor (EC 3.4.21.79) (T-cell serine protease 1-3E)(Cytotoxic T-ly SMP_006510 4758494 gi Granzyme B precursor (EC 3.4.21.79) (T-cell serine protease 1-3E)(Cytotoxic T-ly Smp_006510 15529990 gi Granzyme H precursor (EC 3.4.21.-) (Cytotoxic T-lymphocyte proteinase)DE (Cath Smp_006510 4504235 gi Granzyme K precursor (EC 3.4.21.-) (Granzyme-3) (NK-tryptase-2) (NK-DE TRYP-2) Smp_112090 4885369 gi Granzyme M precursor (EC 3.4.21.-) (Met-ase) (Natural killer cellgranular protea SMP_006510 4885369 gi Granzyme M precursor (EC 3.4.21.-) (Met-ase) (Natural killer cellgranular protea Smp_006510 4826762 gi 
Haptoglobin,Haptoglobin,P00738_chr16:71864828-71871062_S243P_G248R_T372A_D397H H SMP_006510 45580723 gi haptoglobin-related protein precursor [Homo sapiens] SMP_006510 169217813 gi hCG2002962, trypsin temporary entryhCG2002962" Smp_119130 126012571 gi Heparan sulfate proteoglycan 2 (Perlecan) SMP_006520 73858566 gi Heparin cofactor 2 precursor (Heparin cofactor II) (HC-II) (ProteaseDE inhibit SMP_006510 73858566 gi Heparin cofactor 2 precursor (Heparin cofactor II) (HC-II) (ProteaseDE inhibit Smp_006510 4504383 gi Hepatocyte growth factor activator,Hepatocyte growth factor activator,Hepatocyte Smp_006510 74027265 gi hepatocyte growth factor activator inhibitor,hepatocyte growth factor activator Smp_112090 33859835 gi Hepatocyte growth factor precursor (Scatter factor) (SF)DE (Hepatopoeitin A) [ SMP_006510 33859835 gi Hepatocyte growth factor precursor (Scatter factor) (SF)DE (Hepatopoeitin A) [ Smp_112090 4758502 gi HGF activator like protein (Hyaluronan binding protein 2) SMP_006510 4758502 gi HGF activator like protein (Hyaluronan binding protein 2),Q14520_chr10:114977468 Smp_119130 26787982 gi Homo sapiens clone 23888 mRNA sequence (IL15RA protein) SMP_006510 93141001 gi HSAJ1454,AAQ89332_chr4:168351965-168851214_R97H_I112V,,HSAJ1454 SMP_112090 54607080 gi Hypothetical 47.4 kDa protein,Unknown (protein for MGC:21282,Carboxypeptidase B SMP_006520 21361302 gi Hypothetical 48.5 kDa protein, serine (or cysteine) proteinase inhibitor, clade SMP_006510 21361302 gi Hypothetical 48.5 kDa protein, serine (or cysteine) proteinase inhibitor, clade Smp_119130 157738643 gi Hypothetical protein DKFZp434G0625 SMP_119130 4507171 gi Hypothetical protein DKFZp468G1528,P09486_chr5:151071645-151084259_P19S_T128I_R1 33 SMP_006510 4507171 gi Hypothetical protein DKFZp468G1528,P09486_chr5:151071645-151084259_P19S_T128I_R1 Smp_006510 38505209 gi Hypothetical protein DKFZp686A06175,Hypothetical protein DKFZp686A06175, SMP_006510 38679890 gi Hypothetical protein FLJ46651 (Organic anion 
transporter OATP-DE M1) Smp_112090 4506115 gi Hypothetical protein PROC SMP_006510 4506115 gi Hypothetical protein PROC,Vitamin K-dependent protein C precursor (EC 3.4.21.69) Smp_006520 5454114 gi Hypothetical protein TFPI SMP_119130 5454114 gi Hypothetical protein TFPI,P10646_chr2:188534209-188571038_V292M TISSUE FACTOR PA SMP_006510 5454114 gi Hypothetical protein TFPI,P10646_chr2:188534209-188571038_V292M TISSUE FACTOR PA SMP_112090 61743916 gi Hypothetical protein,CPA4 protein,Carboxypeptidase A4 (Carboxypeptidase A4, isof SMP_006510 122937420 gi inactive serine protease 54 precursor [Homo sapiens] SMP_112090 4504619 gi Insulin-like growth factor-binding protein 7 precursor (IGFBP-7) (IBP-DE 7) (I SMP_006510 4504619 gi Insulin-like growth factor-binding protein 7 precursor (IGFBP-7) (IBP-DE 7) (I SMP_006520 20127446 gi Integrin beta-5 precursor,P18084_chr3:125803374-125926748_T193A_L333F_L428V_R431 Smp_119130 4504649 gi Interleukin 15 receptor, alpha Smp_119130 117306176 gi Kallikrein 5 splice variant 2 (Kallikrein 5) (Kallikrein 5 spliceDE variant 1) SMP_006510 117306176 gi Kallikrein 5 splice variant 2 (Kallikrein 5) (Kallikrein 5 spliceDE variant 1) Smp_112090 61744424 gi Kallikrein 6 SMP_006510 61744424 gi Kallikrein 6,Kallikrein 6.,Kallikrein-6 precursor (EC 3.4.21.-) (Protease M) (Ne Smp_006510 29366812 gi Kallikrein 9,Kallikrein 9,Kallikrein 9.,Kallikrein-9 precursor (EC 3.4.21.-) (Ka Smp_006520 5803199 gi Kallikrein-11 precursor (EC 3.4.21.-) (Hippostasin) (Trypsin-likeDE protease) SMP_006510 5803199 gi Kallikrein-11 precursor (EC 3.4.21.-) (Hippostasin) (Trypsin-likeDE protease), Smp_112090 22208987 gi Kallikrein-12 precursor (EC 3.4.21.-) (Kallikrein-like protein 5)DE (KLK-L5) SMP_006510 22208987 gi Kallikrein-12 precursor (EC 3.4.21.-) (Kallikrein-like protein 5)DE (KLK-L5),K Smp_112090 11496281 gi Kallikrein-13 precursor (EC 3.4.21.-) (Kallikrein-like protein 4)DE (KLK-L4) SMP_006510 11496281 gi Kallikrein-13 precursor (EC 3.4.21.-) (Kallikrein-like 
protein 4)DE (KLK-L4),K Smp_119130 20302143 gi Kallikrein-15 precursor (EC 3.4.21.-) (ACO protease) SMP_006510 20302143 gi Kallikrein-15 precursor (EC 3.4.21.-) (ACO protease),Kallikrein-15 precursor (EC Smp_112090 5031829 gi Kallikrein-2 precursor (EC 3.4.21.35) (Tissue kallikrein-2) (GlandularDE kalli SMP_006510 5031829 gi Kallikrein-2 precursor (EC 3.4.21.35) (Tissue kallikrein-2) (GlandularDE kalli Smp_006510 21327705 gi Kallikrein-7 precursor (EC 3.4.21.-) (hK7) (Stratum corneumDE chymotryptic enz Smp_006510 91823048 gi Kallikrein-related peptidase 14,Kallikrein-related peptidase 14,Kallikrein-relat SMP_006520 19923632 gi Kazal-type serine protease inhibitor domain 1 (FKSG28) (Novel kazal-DE type se SMP_006510 19923632 gi Kazal-type serine protease inhibitor domain 1 (FKSG28) (Novel kazal-DE type se Smp_006520 5901992 gi Keratocan precursor (KTN) (Keratan sulfate proteoglycan keratocan) SMP_006510 20302147 gi KLK15 splice variant 3 SMP_112090 32313599 gi Kunitz-type protease inhibitor 1 precursor (Hepatocyte growth factorDE activat 34 SMP_006510 32313599 gi Kunitz-type protease inhibitor 1 precursor (Hepatocyte growth factorDE activat Smp_119130 38045910 gi Laminin alpha 3 splice variant b1 Smp_006510 116292750 gi Lipoprotein, Lp(A),Lipoprotein, Lp(A),Lipoprotein, Lp(A).,OTTHUMP00000017543 SMP_006520 93102379 gi Low-density lipoprotein receptor-related protein 1B precursor (Low-DE density Smp_006510 93102379 gi Low-density lipoprotein receptor-related protein 1B precursor (Low-DE density Smp_112090 31543212 gi Macrophage stimulating 1 (Hepatocyte growth factor-like) SMP_006510 31543212 gi Macrophage stimulating 1 (Hepatocyte growth factor-like),Macrophage stimulating SMP_006510 21264357 gi mannan-binding lectin serine protease 1 isoform 1 precursor [Homo sapiens] Smp_112090 21264363 gi Mannan-binding lectin serine protease 2 SMP_006510 21264363 gi Mannan-binding lectin serine protease 2,OTTHUMP00000044363, Smp_112090 34098976 gi Marapsin 2 precursor SMP_006510 
34098976 gi Marapsin 2 precursor,Marapsin 2 precursor (Marapsin 2), Smp_006510 4503001 gi Mast cell carboxypeptidase A precursor (EC 3.4.17.1) (MC-CPA)(Carboxypeptidase A SMP_006510 68508970 gi mast cell tryptase beta III SMP_006510 151301154 gi mucin-6 precursor [Homo sapiens] SMP_006510 28212222 gi Multivalent protease inhibitor protein, SMP_006520 28212222 gi Multivalent protease inhibitor protein,,Multivalent protease inhibitor protein Smp_006510 71361688 gi Myeloblastin precursor (EC 3.4.21.76) (Leukocyte proteinase 3) (PR-3)DE (PR3) Smp_112090 6005844 gi Neuropsin precursor (EC 3.4.21.-) (NP) (Kallikrein-8) (Ovasin) (SerineDE prote SMP_006510 6005844 gi Neuropsin precursor (EC 3.4.21.-) (NP) (Kallikrein-8) (Ovasin) (SerineDE prote Smp_006510 21464127 gi neuropsin type2,neuropsin type2,(O60259) Splice isoform 2 of O60259 Smp_006520 4506143 gi Neurotrypsin precursor (EC 3.4.21.-) (Serine protease 12) (Motopsin)(Leydin) SMP_006510 4506143 gi Neurotrypsin precursor (EC 3.4.21.-) (Serine protease 12) (Motopsin)(Leydin),Neu Smp_119130 55770876 gi Notch4 Smp_006520 122937343 gi Novel protein similar to mouse Jedi soluble isoform 736 protein SMP_006510 19923780 gi organic anion transport polypeptide 2 SMP_006510 7019531 gi Organic anion transporter OATP-D SMP_112090 39777594 gi Organic anion transporter OATP-E (Colon organic anion transporter)DE (Organic SMP_006510 39777594 gi Organic anion transporter OATP-E (Colon organic anion transporter)DE (Organic Smp_006510 4506121 gi OTTHUMP00000018738,OTTHUMP00000018738,Protein Z, vitamin K-dependent plasma glyc SMP_119130 4506121 gi OTTHUMP00000018738,Protein Z, vitamin K-dependent plasma glycoprotein,P22891_chr Smp_112090 110815798 gi ovochymase-1 precursor [Homo sapiens] Smp_112090 148231605 gi ovochymase-2 precursor [Homo sapiens] SMP_006520 45505132 gi Pancreatic secretory trypsin inhibitor precursor (Tumor-associatedDE trypsin i Smp_112090 45505132 gi Pancreatic secretory trypsin inhibitor precursor (Tumor-associatedDE 
trypsin i SMP_006510 45505132 gi Pancreatic secretory trypsin inhibitor precursor (Tumor-associatedDE trypsin i 35 SMP_006510 145309328 gi papilin precursor [Homo sapiens] SMP_112090 20336242 gi PC1,,PC1 Smp_119130 126273569 gi Plasma carboxypeptidase B2, isoform a preproprotein (CarboxypeptidaseDE B2) (P Smp_006520 4505881 gi Plasminogen SMP_006510 4505881 gi Plasminogen,Plasminogen precursor (EC 3.4.21.7) [Contains: Plasmin heavy chain A SMP_006510 87196600 gi Polyserase-3 SMP_006510 169172917 gi PREDICTED: hypothetical protein LOC346702 [Homo sapiens] SMP_006510 169163549 gi PREDICTED: hypothetical protein LOC646960 [Homo sapiens] Smp_006520 169217813 gi PREDICTED: hypothetical protein [Homo sapiens]. SMP_006510 169163744 gi PREDICTED: similar to ACR [Homo sapiens] SMP_006510 169173186 gi PREDICTED: similar to hCG1643218 [Homo sapiens] SMP_006510 169165225 gi PREDICTED: similar to hCG1786642 [Homo sapiens] PREDICTED: similar to hCG1786642" SMP_006510 169164530 gi PREDICTED: similar to hCG1818432 [Homo sapiens] SMP_006510 169179025 gi PREDICTED: similar to hCG31140 [Homo sapiens] SMP_006510 169202626 gi PREDICTED: similar to mucin 5, partial [Homo sapiens] SMP_006510 169163027 gi PREDICTED: similar to Putative acrosin-like protease [Homo sapiens] Smp_112090 4504875 gi preprokallikrein (AA -24 to 238) SMP_006510 4504875 gi preprokallikrein (AA -24 to 238),AF243527_1 renal kallikrein, Smp_119130 26006869 gi Probable protease inhibitor WAP12a precursor SMP_006510 26006869 gi Probable protease inhibitor WAP12a precursor,Q8IUB3_chr20:44998901-45000060_L8P_ Smp_112090 26006867 gi Probable protease inhibitor WAP12b precursor SMP_006510 26006867 gi Probable protease inhibitor WAP12b precursor, Smp_112090 22779936 gi Probable protease inhibitor WAP6 SMP_006510 22779936 gi Probable protease inhibitor WAP6, Smp_112090 22129776 gi Probable serine protease HTRA3 precursor (EC 3.4.21.-) (High-DE temperature re SMP_006510 22129776 gi Probable serine protease HTRA3 precursor (EC 
3.4.21.-) (High-DE temperature re Smp_112090 4506153 gi Prostasin precursor (EC 3.4.21.-) (Serine protease 8) [Contains:Prostasin light SMP_006510 4506153 gi Prostasin precursor (EC 3.4.21.-) (Serine protease 8) [Contains:Prostasin light Smp_119130 71834857 gi Prostate specific antigen precursor SMP_006510 71834857 gi Prostate specific antigen precursor,Prostate specific antigen precursor.,Q8NCW4_ Smp_119130 71834855 gi prostate-specific antigen isoform 4 preproprotein [Homo sapiens] Smp_112090 4502173 gi Prostate-specific antigen precursor (EC 3.4.21.77) (PSA) (Kallikrein-DE 3) (Se SMP_006510 4502173 gi Prostate-specific antigen precursor (EC 3.4.21.77) (PSA) (Kallikrein-DE 3) (Se Smp_112090 21618357 gi Prostate-type hippostasin SMP_006510 21618357 gi Prostate-type hippostasin,Kallikrein-11 precursor (EC 3.4.21.-) (Hippostasin) (T 36 Smp_119130 68299793 gi Protease inhibitor H SMP_006510 68299793 gi Protease inhibitor H,Q8N5P0_chr5:147611170-147623020_P36 Hypothetical protein, Smp_006510 13994276 gi Protease serine 27 precursor (EC 3.4.21.-) (Marapsin) (Pancreasin)DE (Channel- SMP_006510 56606135 gi protease, serine, 37 precursor [Homo sapiens] SMP_119130 154759291 gi Protein Z-dependent protease inhibitor precursor (PZ-dependentDE protease inhi Smp_006520 154759291 gi Protein Z-dependent protease inhibitor precursor (PZ-dependentDE protease inhi SMP_006510 154759291 gi Protein Z-dependent protease inhibitor precursor (PZ-dependentDE protease inhi Smp_112090 4503635 gi Prothrombin precursor (EC 3.4.21.5) (Coagulation factor II) [Contains:Activation SMP_006510 4503635 gi Prothrombin precursor (EC 3.4.21.5) (Coagulation factor II) [Contains:Activation Smp_006510 11321636 gi PUTATIVE MAST CELL MMCP-7-LIKE II TYPTASE,PUTATIVE MAST CELL MMCP-7-LIKE II TYPT SMP_112090 16751923 gi Putative multivalent protease inhibitor WFIKKN (WAP,DE follistatin/kazal, immu SMP_006510 16751923 gi Putative multivalent protease inhibitor WFIKKN (WAP,DE follistatin/kazal, immu SMP_006510 
116812593 gi putative solute carrier organic anion transporter family member 1B7 [Homo sapien SMP_006510 66347875 gi R subcomponent of complement component 1 Smp_006510 19743898 gi Receptor tyrosine kinase-like orphan receptor 2,Receptor tyrosine kinase-like or Smp_006510 50659098 gi Regeneration associated muscle protease, isoform a,Regeneration associated muscl SMP_006510 11863156 gi Reversion-inducing-cysteine-rich protein with kazal motifs,Reversion-inducing cy Smp_119130 32454741 gi Rheumatoid arthritis related antigen RA-A47 SMP_006510 32454741 gi Rheumatoid arthritis related antigen RA-A47,P50454_chr11:75003703-75009433_L6P_A SMP_006510 4506117 gi S plasma protein Smp_006510 50363237 gi S39329_11 glandular kallikrein-1,S39329_11 glandular kallikrein-1, Smp_119130 33598937 gi scavenger receptor class F member 2 isoform 2 precursor [Homo sapiens] Smp_119130 24586659 gi Scavenger receptor class F member 2 precursor (Scavenger receptorDE expressed SMP_006510 78190498 gi Secreted modular calcium-binding protein 1 SMP_112090 110225349 gi serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, anti SMP_006510 110225349 gi serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, anti Smp_006520 27777657 gi Serine (Or cysteine) proteinase inhibitor, clade A (Alpha-1DE antiproteinase, SMP_006510 27777657 gi Serine (Or cysteine) proteinase inhibitor, clade A (Alpha-1DE antiproteinase, SMP_112090 8393956 gi Serine (Or cysteine) proteinase inhibitor, clade B (Ovalbumin), memberDE 13,HU SMP_006510 8393956 gi Serine (Or cysteine) proteinase inhibitor, clade B (Ovalbumin), memberDE 13,HU Smp_006510 5902072 gi Serine (Or cysteine) proteinase inhibitor, clade B (Ovalbumin), memberDE 3,Ser SMP_119130 5902072 gi Serine (Or cysteine) proteinase inhibitor, clade B (Ovalbumin), memberDE 3,squ Smp_006510 40254871 gi,14602453Serine protease gi Smp_112090 47551347 gi Serine protease 1-like protein 1 SMP_006510 47551347 gi Serine protease 1-like 
protein 1,AAQ88957_chr19:636719-646430_L143P,GLGL782 SMP_006510 148806923 gi serine protease 45 [Homo sapiens] 37 Smp_006520 148806900 gi serine protease 48 precursor [Homo sapiens] Smp_006520 33695155 gi Serine protease hepsin (EC 3.4.21.-) (Transmembrane protease, serineDE 1) [Con SMP_006510 33695155 gi Serine protease hepsin (EC 3.4.21.-) (Transmembrane protease, serineDE 1) [Con Smp_112090 4506141 gi Serine protease HTRA1 precursor (EC 3.4.21.-) (L56) SMP_119130 4506141 gi Serine protease HTRA1 precursor (EC 3.4.21.-) (L56),Serine protease HTRA1 precur SMP_006510 4506141 gi Serine protease HTRA1 precursor (EC 3.4.21.-) (L56),Serine protease HTRA1 precur Smp_112090 7019477 gi Serine protease HTRA2, mitochondrial precursor (EC 3.4.21.-) (HighDE temperatu SMP_006510 7019477 gi Serine protease HTRA2, mitochondrial precursor (EC 3.4.21.-) (HighDE temperatu SMP_006510 94536774 gi Serine protease inhibitor Kazal type 9 SMP_006510 92110017 gi serine protease inhibitor Kazal-type 13 precursor [Homo sapiens] SMP_112090 7657453 gi Serine protease inhibitor Kazal-type 4 precursor (Peptide PEC-60DE homolog),Se SMP_006510 7657453 gi Serine protease inhibitor Kazal-type 4 precursor (Peptide PEC-60DE homolog),Se SMP_006510 74027261 gi serine protease inhibitor Kazal-type 5 isoform b precursor [Homo sapiens] SMP_006510 122937482 gi serine protease inhibitor Kazal-type 8 precursor [Homo sapiens] Smp_119130 110626171 gi Serine protease inhibitor Kunitz type 3 SMP_006510 110626171 gi Serine protease inhibitor Kunitz type 3,Serine protease inhibitor Kunitz type 3 SMP_112090 10863909 gi Serine protease inhibitor, Kunitz type, 2,O43291_chr19:43447373-43474483_Q3H_C47 SMP_006510 10863909 gi Serine protease inhibitor, Kunitz type, 2,O43291_chr19:43447373-43474483_Q3H_C47 SMP_006510 14602453 gi serine protease,(P57727) Splice isoform B of P57727 SMP_006510 40254871 gi Serine protease,Transmembrane protease, serine 11E precursor (EC 3.4.21.-) (Seri SMP_006510 122937297 gi Serpin A11 
precursor
SMP_006510  156071456 gi  serpin B11 [Homo sapiens]
SMP_112090  28076869 gi  Serpin B4 (Squamous cell carcinoma antigen 2) (SCCA-2) (Leupin),SQUAMOUS CELL CA
SMP_006510  28076869 gi  Serpin B4 (Squamous cell carcinoma antigen 2) (SCCA-2) (Leupin),SQUAMOUS CELL CA
SMP_119130  5453886 gi  Serpin I2 precursor (Myoepithelium-derived serine protease inhibitor) (Pancp
Smp_006510  5453886 gi  Serpin I2 precursor (Myoepithelium-derived serine protease inhibitor) (Pancp
SMP_006510  110225347 gi  SERPINA9 protein,Serpin peptidase inhibitor, clade A (Alpha-1 antiproteinase,
Smp_112090  31377629 gi  Similar to carboxypeptidase A5
Smp_006520  24111255 gi  Similar to carboxypeptidase A6
Smp_006510  14702169 gi  Similar to plasminogen activator, tissue,Similar to plasminogen activator, tissu
SMP_006510  39725934 gi  Similar to serine (or cysteine) proteinase inhibitor, clade F (alpha-2 antip
SMP_006520  9790233 gi  Solute carrier organic anion transporter family member 1B3 (Solute carrier f
SMP_006510  9790233 gi  Solute carrier organic anion transporter family member 1B3 (Solute carrier f
SMP_006510  8394291 gi  Solute carrier organic anion transporter family member 1C1 (Solute carrier f
Smp_006520  5032095 gi  Solute carrier organic anion transporter family member 2A1 (Solute carrier family
SMP_006510  5032095 gi  Solute carrier organic anion transporter family member 2A1 (Solute carrier family
Smp_006520  13569932 gi  Solute carrier organic anion transporter family member 5A1 (Solute carrier family
SMP_006510  13569932 gi  Solute carrier organic anion transporter family member 5A1 (Solute carrier family
SMP_006510  93277099 gi  solute carrier organic anion transporter family member 6A1 [Homo sapiens]
SMP_119130  11545873 gi  SPARC-related modular calcium-binding protein 1 precursor (Secreted modular
SMP_006510  11545873 gi  SPARC-related modular calcium-binding protein 1 precursor (Secreted modular
Smp_119130  10863911 gi  SPINK2 protein (Fragment)
SMP_006510  10863911 gi  SPINK2 protein (Fragment),,Serine protease inhibitor Kazal-type 2 precursor (Acr
Smp_006520  6005882 gi  SPUVE protein
Smp_006520  11415040 gi  Suppressor of tumorigenicity 14 (EC 3.4.21.-) (Serine protease 14) (Matripta
SMP_006510  11415040 gi  Suppressor of tumorigenicity 14 (EC 3.4.21.-) (Serine protease 14) (Matripta
Smp_119130  148886654 gi  sushi, von Willebrand factor type A, EGF and pentraxin domain-containing protein
SMP_006510  113431326 gi  temporary entry
SMP_006510  55743098 gi,55743102 gi,55743106 gi,55743104 gi  temporary entry,
SMP_006510  118498341 gi  temporary entry, chymotrypsinogen B1 [Homo sapiens]
SMP_006510  29244926 gi  temporary entry, corin [Homo sapiens],
SMP_006510  148806900 gi  temporary entry, epidermis-specific serine protease-like protein [Homo sapiens]
SMP_006510  148231605 gi  temporary entry, ovochymase 2 [Homo sapiens]
SMP_006510  116256363 gi  temporary entry, transmembrane protease, serine 13 [Homo sapiens],transmembrane
SMP_006510  118498350 gi  temporary entry,cDNA FLJ77335, highly similar to Homo sapiens chymotrypsinogen B
SMP_006510  71834855 gi  temporary entry,kallikrein 3, (prostate specific antigen), isoform CRA_i
SMP_006510  110815798 gi  temporary entry,Ovochymase 1
Smp_119130  4759164 gi  Testican-1 precursor (SPOCK protein)
SMP_006510  4759164 gi  Testican-1 precursor (SPOCK protein),
SMP_006510  93141003 gi  Testican-3 precursor (SPARC/osteonectin, CWCV, and Kazal-like domains proteo
Smp_006510  33186882 gi  Testis serine protease 2 precursor,Testis serine protease 2 precursor,Testis ser
Smp_112090  7019563 gi  Testis-specific protease-like protein 50 precursor
SMP_006510  7019563 gi  Testis-specific protease-like protein 50 precursor,
Smp_006510  21614531 gi  testisin,testisin,(Q9Y6M0) Splice isoform 2 of Q9Y6M0
Smp_112090  5803197 gi  Testisin precursor (EC 3.4.21.-) (Eosinophil serine protease 1) (ESP-1)
SMP_006510  5803197 gi  Testisin precursor (EC 3.4.21.-) (Eosinophil serine protease 1) (ESP-1),Tes
SMP_006510  4507377 gi  thyroxine-binding globulin precursor
SMP_119130  5730091 gi  Tissue factor pathway inhibitor 2,P48307_chr7:93128164-93132019_V102A_R231Q TISS
SMP_006510  5730091 gi  Tissue factor pathway inhibitor 2,P48307_chr7:93128164-93132019_V102A_R231Q TISS
SMP_006510  73760409 gi  Tissue factor pathway inhibitor beta,
SMP_119130  73760409 gi  Tissue factor pathway inhibitor beta,,Tissue factor pathway inhibitor beta
Smp_006510  4505861 gi  Tissue-type plasminogen activator precursor (EC 3.4.21.68) (tPA) (t-PA) (t-
Smp_112090  12383051 gi  TMEFF2 protein precursor (Transmembrane protein TENB2) (TPEF) (Transmembrane
SMP_006510  12383051 gi  TMEFF2 protein precursor (Transmembrane protein TENB2) (TPEF) (Transmembrane
Smp_119130  145701030 gi  TMPRSS3
SMP_006510  145701030 gi  TMPRSS3,AAQ88894_chr11:117485670-117526277_R172Q_K193E_G203V
Smp_006510  169209600 gi  TPSAB1 protein,TPSAB1 protein,TPSAB1 protein.,TPS1 protein
Smp_119130  32698841 gi  Transmembrane protease, serine 11B (EC 3.4.21.-)
SMP_006510  32698841 gi  Transmembrane protease, serine 11B (EC 3.4.21.-),Putative uncharacterized protei
Smp_119130  4758508 gi  Transmembrane protease, serine 11D precursor (EC 3.4.21.-) (Airway trypsin-l
SMP_006510  4758508 gi  Transmembrane protease, serine 11D precursor (EC 3.4.21.-) (Airway trypsin-l
Smp_112090  32698940 gi  Transmembrane protease, serine 12 (EC 3.4.21.-)
SMP_006510  32698940 gi  Transmembrane protease, serine 12 (EC 3.4.21.-),Q86WS5_chr12:49523015-49567560_Y
Smp_006520  14602459 gi  Transmembrane protease, serine 2 precursor (EC 3.4.21.-) [Contains: Transmem
SMP_006510  14602459 gi  Transmembrane protease, serine 2 precursor (EC 3.4.21.-) [Contains: Transmem
Smp_112090  13173471 gi  Transmembrane protease, serine 3 (EC 3.4.21.-) (Serine protease TADG-12) (T
SMP_006510  13173471 gi  Transmembrane protease, serine 3 (EC 3.4.21.-) (Serine protease TADG-12) (T
Smp_006510  15451940 gi  Transmembrane protease, serine 4 (EC 3.4.21.-) (Membrane-type serine protease 2)
Smp_006520  13540535 gi  Transmembrane protease, serine 5 (EC 3.4.21.-) (Spinesin)
SMP_006510  13540535 gi  Transmembrane protease, serine 5 (EC 3.4.21.-) (Spinesin),Transmembrane protease
SMP_006510  29568105 gi  Transmembrane protein with EGF-like and two follistatin-like domains 1,
SMP_006520  29568105 gi  Transmembrane protein with EGF-like and two follistatin-like domains 1,,Tran
SMP_006510  94400921 gi,94400923 gi  Trypsin domain containing 1
SMP_006510  94400923 gi  Trypsin domain containing 1,
Smp_006510  48255915 gi  Trypsin X3 (KFIL2540),Trypsin X3 (KFIL2540),
Smp_112090  4506145 gi  Trypsinogen A
SMP_006510  4506145 gi  Trypsinogen A,Trypsinogen A (Protease, serine, 1) (Trypsin 1),Trypsin-1 precurso
Smp_112090  4506147 gi  Trypsinogen E (Protease, serine, 2) (Trypsin 2) (Anionic trypsinogen)
SMP_006510  4506147 gi  Trypsinogen E (Protease, serine, 2) (Trypsin 2) (Anionic trypsinogen),Trypsinoge
Smp_112090  13775595 gi  Tryptase beta-1 precursor (EC 3.4.21.59) (Tryptase-1) (Tryptase I)
SMP_006510  13775595 gi  Tryptase beta-1 precursor (EC 3.4.21.59) (Tryptase-1) (Tryptase I),Tryptase beta
Smp_112090  110578663 gi  tryptophan/serine protease [Homo sapiens]
SMP_006510  110578663 gi  tryptophan/serine protease [Homo sapiens], hypothetical protein LOC203074 [Homo
SMP_006510  110431331 gi  Type II transmembrane serine protease 7 precursor (Hypothetical protein FLJ1
Smp_006510  6005820 gi  unnamed protein product,unnamed protein product,Solute carrier organic anion tra
Smp_112090  4505863 gi  Urokinase-type plasminogen activator (Plasminogen activator, urokinase)
SMP_006510  4505863 gi  Urokinase-type plasminogen activator (Plasminogen activator, urokinase),Plas
Smp_119130  5032223 gi  VESPR
Smp_006520  39930511 gi  VPPP1921
SMP_006510  39930511 gi  VPPP1921,AAQ88713_chr18:55252112-55513543_V193G
Smp_006520  21703706 gi  WAP four-disulfide core domain 10A
SMP_006510  21703706 gi  WAP four-disulfide core domain 10A,,WAP four-disulfide core domain protein 10A p
Smp_119130  20069858 gi  WAP four-disulfide core domain 12
SMP_006510  20069858 gi  WAP four-disulfide core domain 12,Q8WWY7_chr20:44437919-44438517_G23R,,WAP four-
Smp_006510  21717822 gi  WAP four-disulfide core domain 5,WAP four-disulfide core domain 5,probable prote
Smp_119130  153946389 gi  WAP four-disulfide core domain 8
SMP_006510  153946389 gi  WAP four-disulfide core domain 8,Q8IUA0_chr20:44866097-44893315_N137 WAP four-di
Smp_119130  56699495 gi  WAP four-disulfide core domain protein 2 precursor (Major epididymis-specif

VAL Venom Allergen Proteins Interactions

S. mansoni  Human  Human Protein
Smp_123550  25121984 gi  BA719J20.1.2 (Acidic epididymal glycoprotein-like 1n),(P54107) Splice isoform Sh
Smp_123550  5174675 gi  Cysteine-rich secretory protein 3 precursor (CRISP-3) (SGP28 protein),Cysteine-r
Smp_123550  110825980 gi  Glioma pathogenesis-related protein,Glioma pathogenesis-related protein 1 precur
Smp_123550  70780384 gi  HGSC289 (OTTHUMP00000016309),Peptidase inhibitor 16,,HGSC289
Smp_123550  30425410 gi  OTTHUMP00000031047 (R3H domain (Binds single-stranded nucleic acids) contain
Smp_123550  13899303 gi  Putative secretory protein (CocoaCrisp) (Trypsin inhibitor Hl) (Hypothetical
Smp_123550  4507671 gi  Testis-specific protein TPX1 e isoform (Testis-specific protein TPX1 b isofo

Calpain Interactions

S. mansoni  Human  Human Protein
Smp_157500  27765074 gi  (P20807) Splice isoform II of P20807
SMP_137410  27765074 gi  (P20807) Splice isoform II of P20807,AF127765_1 calpain 3; calcium activated neu
Smp_137410  27765076 gi  (P20807) Splice isoform IV of P20807
Smp_137410  27765072 gi  AF127764_1 calpain 3; calcium activated neutral protease; CAPN3; CL1
Smp_157500  13186302 gi  Calpain 10 (EC 3.4.22.17) (Calcium-activated neutral proteinase 10) (CANP 10
Smp_157500  27765078 gi  Calpain 3, isoform e (CAPN3 protein)
Smp_157500  4502565 gi  Calpain small subunit 1 (CSS1) (Calcium-dependent protease small subunit 1)
Smp_157500  41152101 gi  Calpain-13 (EC 3.4.22.-)
Smp_157500  4557405 gi  Calpain-3 (EC 3.4.22.54) (Calpain L3) (Calpain p94) (Calcium-activated neutral pr
Smp_157500  13186316 gi  Calpain-6 (Calpamodulin) (CalpM) (Calpain-like protease X-linked)
SMP_137410  13186316 gi  Calpain-6 (Calpamodulin) (CalpM) (Calpain-like protease X-linked),Q9Y6Q1_chrX:10
Smp_157500  7656959 gi  Calpain-7 (EC 3.4.22.-) (PalB homolog) (PalBH)
SMP_157500  5729758 gi  Calpain-9 (EC 3.4.22.-) (Digestive tract-specific calpain) (nCL-4) (CG36 pro
Smp_137410  12408656 gi  Cell proliferation-inducing protein 30
Smp_157500  20149675 gi  EF hand domain containing 2
Smp_157500  21361462 gi  EH-domain containing 2 (EH domain-containing protein-2)
Smp_137410  30240932 gi  EH-domain-containing protein 1 (Testilin) (hPAST1)
Smp_157500  4503593 gi  Epidermal growth factor receptor pathway substrate 15
Smp_157500  10864047 gi  Epidermal growth factor receptor substrate EPS15R
Smp_137410  7705383 gi  GC36 protein
SMP_089460.2  7705383 gi  GC36 protein,
Smp_157500  157389005 gi  Hypothetical protein
Smp_137410  4507207 gi  Hypothetical protein DKFZp459G0314
Smp_157500  38707981 gi  Hypothetical protein FLJ45489
SMP_089460.1  38707981 gi  Hypothetical protein FLJ45489,
Smp_137410  6912388 gi  Hypothetical protein GCA
Smp_137410  48476342 gi  Mitochondrial ATP-Mg/Pi carrier (Mitochondrial Ca2+-dependent solute carrier
Smp_157500  113865939 gi  Novel S100 calcium-binding protein (S100 calcium binding protein A7-like 2)
Smp_157500  6912582 gi  Peflin (PEF protein) (Penta-EF hand domain containing 1) (CDNA FLJ10558 fis,
Smp_137410  113411704 gi  PREDICTED: calpain 8 [Homo sapiens]
Smp_137410  88943801 gi  PREDICTED: similar to calpain 8 [Homo sapiens]
Smp_157500  88952798 gi  PREDICTED: similar to calpain isoform 4 [Homo sapiens]
Smp_137410  7019485 gi  Programmed cell death 6
Smp_157500  28827815 gi  S100 calcium binding protein A15 (S100 calcium binding protein A7-like 1)
Smp_137410  14161692 gi  Similar to RIKEN cDNA 2310005G05 gene
Smp_157500  5032105 gi  Small optic lobes homolog
SMP_089460.2  5032105 gi  Small optic lobes homolog,O75808_chr16:536840-543514_G19C SMALL OPTIC LOBES HOMO
Smp_137410  38679884 gi  sorcin isoform B [Homo sapiens]
Smp_157500  47078247 gi  temporary entry
SMP_089460.1  47078247 gi  temporary entry,
Smp_137410  31377581 gi  Uncharacterized protein C17orf57
Smp_137410  37577157 gi  Unknown (protein for MGC:9155
SMP_089460.2  37577157 gi  Unknown (protein for MGC:9155,Calpain-5 (EC 3.4.22.-) (nCL-3) (htra-3),,Calpain
Smp_157500  89353291 gi,5901916 gi  unnamed protein product

Cystatin Interactions

S. mansoni  Human  Human Protein
SMP_006390  4503139 gi  Cathepsin B, preproprotein,Cathepsin B precursor (EC 3.4.22.1) (Cathepsin B1) (A
SMP_006390  6042196 gi  Cathepsin F precursor (EC 3.4.22.41) (CATSF),Q9UBX1_chr11:66106767-66111317_Q153
SMP_006390  4503151 gi  Cathepsin K precursor (EC 3.4.22.38) (Cathepsin O) (Cathepsin X) (Cathepsin
SMP_006390  23110960 gi  Cathepsin L2,CATHEPSIN L2 PRECURSOR (EC 3.4.22.43) (CATHEPSIN V) (CATHEPSIN U),,
SMP_006390  4557501 gi  Cathepsin O precursor (EC 3.4.22.42),Cathepsin O precursor (EC 3.4.22.42).,
SMP_006390  23110962 gi  Cathepsin S,P25774_chr1:147922107-147953821_S161 CATHEPSIN S PRECURSOR (EC 3.4.2
SMP_006390  22538442 gi  Cathepsin Z precursor (EC 3.4.22.-) (Cathepsin X) (Cathepsin P),
SMP_006390  4503155 gi  CTSL protein (Cathepsin L) (Hypothetical protein DKFZp686A18159),CTSL protein (H
SMP_006390  23110955 gi  pro-cathepsin H preproprotein [Homo sapiens]

Tetraspanin Interactions

S. mansoni  Human  Human Protein
Smp_100080  5032207 gi  (Q96QS1) Splice isoform 2 of Q96QS1, tetraspanin
Smp_181530  4757944 gi  CD81 antigen (26 kDa cell surface protein TAPA-1) (Target of the antiprolife
Smp_100080  21264580 gi  Hypothetical 24.1 kDa protein, tetraspanin
Smp_100080  145580605 gi  tetraspanin
Smp_181530  145580605 gi  tetraspanin

Human Immunoglobulin Interactions

S. mansoni  S. mansoni Protein  Human  Human Protein
SMP_027400  Ankyrin, kinase,Serine/threonine,SH2  13994242 gi  C-Cbl-interacting protein  ige
SMP_031880  Immunoglobulin-like  21361212 gi  CTLA4,CYTOTOXIC T-LYMPHOCYTE PROTEIN 4
SMP_032970  Calcium-binding EF-hand,Myoc  156616292 gi  OTTHUMP00000016605  ige
SMP_043990.3  basigin  146229333 gi  allergin-1 precursor [Homo sapiens]
SMP_055500  DNA-directed DNA polymerase,Importin-beta,N-terminal  21361212 gi  CTLA4  IgG Proteins
SMP_082490  Cyclin, A/B/D/E,Cyclin, C-terminal  112382241 gi  Proto-oncogene tyrosine-protein kinase
SMP_115130  Immunoglobulin-like  23312371 gi  CD40 type II isoform,
SMP_127420  Cystathionine beta-synthase,Chloride channel  156104869 gi  Chloride channel protein  ige
SMP_129430  Transcription factor, MADS-box  19923215 gi  Myocyte-specific enhancer factor 2C  IgG Proteins
SMP_133590  C2 calcium-dependent membrane targeting,Filaggrin  60097902 gi  Filaggrin  ige
SMP_141410  Carbohydrate kinase, FGGY,Immunoglobulin  11231177 gi  AF109683_1 leukocyte-associated Ig-like receptor 1b  ige
SMP_148450  Chitinase II,Glycoside hydrolase, catalytic core  144226251 gi  (cartilage glycoprotein-39)  ige
SMP_153170  Fibronectin, type III  63055063 gi  Fc gamma receptor I  IgG Proteins
SMP_156630  Protein coding gene  50511926 gi  FCGR2B protein (Fragment)  IgG Proteins
SMP_169240  SH2 motif,Src,Variant SH3 Signal transducer/activator  21618338 gi  (P40763) Splice isoform Del-701 of P40763  ige
SMP_172070  Protein coding gene  23312371 gi  CD40 type II isoform  IgG Proteins

Library Release

Publishing Agreement It is the policy of the University to encourage the distribution of all theses, dissertations, and manuscripts. Copies of all UCSF theses, dissertations, and manuscripts will be routed to the library via the Graduate Division. The library will make all theses, dissertations, and manuscripts accessible to the public and will preserve these to the best of their abilities, in perpetuity.

I hereby grant permission to the Graduate Division of the University of California, San Francisco to release copies of my thesis, dissertation, or manuscript to the Campus Library to provide access and preservation, in whole or in part, in perpetuity.

______Author Signature

______Date

(This page must be signed and dated by the author and include the correct pagination – as the last numbered page number of your document.)