Proteomics 2007, 7, 1775–1785 DOI 10.1002/pmic.200601006 1775

RESEARCH ARTICLE Systematic identification of SH3 domain-mediated human protein–protein interactions by peptide array target screening

Chenggang Wu1*, Mike Haiting Ma1*, Kevin R. Brown2, Matt Geisler1, Lei Li1, Eve Tzeng1, Christina Y. H. Jia1, Igor Jurisica2 and Shawn S.-C. Li1

1 Department of Biochemistry and the Siebens-Drake Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada
2 Ontario Cancer Institute, Northeast Structural Genomics Consortium, Toronto, Ontario, Canada

Systematic identification of direct protein–protein interactions is often hampered by difficulties in expressing and purifying the corresponding full-length proteins. By taking advantage of the modular nature of many regulatory proteins, we attempted to simplify protein–protein interactions to the corresponding domain–ligand recognition and employed peptide arrays to identify such binding events. A group of 12 Src homology (SH) 3 domains from eight human proteins (Swiss-Prot ID: SRC, PLCG1, P85A, NCK1, GRB2, FYN, CRK) was used to screen a peptide target array composed of 1536 potential ligands, which led to the identification of 921 binary interactions between these proteins and 284 targets. To assess the efficiency of the peptide array target screening (PATS) method in identifying authentic protein–protein interactions, we examined a set of interactions mediated by the PLCγ1 SH3 domain by coimmunoprecipitation and/or affinity pull-downs using full-length proteins and achieved a 75% success rate. Furthermore, we characterized a novel interaction between PLCγ1 and hematopoietic progenitor kinase 1 (HPK1) identified by PATS and demonstrated that the PLCγ1 SH3 domain negatively regulates HPK1 kinase activity. Compared to protein interactions listed in the online predicted human interaction protein database (OPHID), the majority of interactions identified by PATS are novel, suggesting that, when extended to the large number of peptide interaction domains encoded by the human genome, PATS should aid in the mapping of the human interactome.

Received: December 12, 2006 / Revised: February 1, 2007 / Accepted: February 23, 2007

Keywords: HPK1 / Interactome / Peptide array / PLCγ1 / SH3 domain

Correspondence: Dr. Shawn S.-C. Li, Department of Biochemistry and the Siebens-Drake Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada N6A 5C1. E-mail: [email protected]. Fax: +1-519-661-3175

Abbreviations: co-IP, coimmunoprecipitation; HEK, human embryonic kidney; HPK1, hematopoietic progenitor kinase 1; OPHID, online predicted human interaction protein database; PATS, peptide array target screening; PIDs, peptide or protein interaction domains; SH, Src homology; Y2H, yeast two-hybrid

* Both these authors contributed equally to this work.

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

1 Introduction

A description of the global protein connectivity in a cell is of enormous value to our understanding of all essential cellular functions and disease mechanisms [1–3]. Until recently, genome-wide mapping of protein interactions, or the interactome, has focused on model organisms such as Saccharomyces cerevisiae [4–6], Caenorhabditis elegans [7], and Drosophila melanogaster [8]. By taking advantage of the high-throughput capability of the affinity purification coupled with MS (AP-MS) approach and of systematic yeast two-hybrid (Y2H) screening [9], interactome frameworks have been established for these organisms, which in turn have provided unprecedented insights into many important biological processes [10]. Recent work on large-scale identification of human protein interactions by Y2H has, however, drastically changed the landscape of the human interactome [11, 12]. These studies outlined a skeleton for part of the human interactome and served as an impetus for more comprehensive mapping of the human interactome [13].

Although the current state of knowledge does not allow a thorough comparison between the human interactome and that of a model organism such as S. cerevisiae, for which a plethora of genetic, mass spectrometric, and Y2H data are available [14–19], the human interactome is believed to be more complex than that of yeast. On one hand, the two species differ significantly in proteome size, cellular diversity, and compartmentalization; on the other hand, protein structures vary considerably from S. cerevisiae to Homo sapiens [20–22]. Compared to yeast, the human genome shows a drastic expansion of modular domains, especially the so-called peptide or protein interaction domains (PIDs). For instance, an estimated 120 copies of the Src homology (SH) 2 domain are encoded by the human genome, whereas no functional SH2 domain has so far been identified in yeast [23]. Consequently, signaling events regulated by tyrosine phosphorylation and dephosphorylation do not take place in yeast. Similarly, a human cell harbors approximately 300 SH3 domains, whereas yeast contains only 28 copies of this domain family [24].

Protein–protein interactions that occur through recognition of a motif in one protein by a modular domain in another are a common means by which a cell assembles the complex protein networks seen in signal transduction, and they underlie specific substrate recognition by protein kinases and phosphatases [25–27]. Several important features of interaction domains also make them ideal targets for large-scale human interactome mapping. First, interaction domains are found in thousands of human proteins. Second, they range from 30 to 150 amino acids in size, approximately one-third of a typical human protein, and generally fold into stable 3-D structures in solution. These characteristics render interaction domains particularly amenable to biochemical and biophysical manipulation. Third, many PIDs bind short peptide motifs as well as they do the corresponding native proteins [28]. Therefore, by identifying the specific peptides to which a PID binds, one can often deduce potential interacting proteins for that domain [29].

An SH3 domain-mediated protein interaction network was first identified in yeast using a strategy that combined specificity determination by phage display libraries with experimental screening for binary protein–protein interactions by Y2H [16]. Peptides synthesized on NC membranes were recently employed to map the binding partners for a group of yeast SH3 domains and to scan the human proteome [30]. Similarly, a proteome-wide phosphopeptide membrane array was used to discover potential Cdc4 substrates in yeast [31]. However, owing to significant differences in genome size and protein architecture between humans and yeast, most human protein interactions, especially those occurring in signal transduction and cell–cell communication, may not be directly inferred from the yeast interactome. Indeed, a comparative analysis of over 70 000 binary interactions identified to date for yeast, worm, fly, and human revealed that only 16 interactions are common to all four species [32], suggesting that networks of protein–protein interactions differ significantly from one species to another.

Because most regulatory proteins in humans have a modular architecture, domain-mediated interactions may constitute a significant part of the regulatory networks found in a human cell. Thus, high-throughput strategies that exploit this modularity provide attractive means to experimentally map regulatory networks in humans. For instance, MS-aided protein identification was applied to the mammalian WW and 14-3-3 domain families to identify their respective interaction networks [33, 34]. These studies revealed that modular domains connect multiple proteins in a network and are involved in regulating a wide range of cellular processes. Recently, domain macroarrays were employed to quantitatively characterize a protein interaction network mediated by the ErbB receptor [35]. In the present study, we applied a peptide array target screening (PATS) strategy to map interactions stemming from a group of 12 SH3 domains taken from eight human proteins. The resulting interaction network linked 8 "bait" proteins to 284 "target" proteins through 921 binary interactions. Using PLCγ1 as an example, we confirmed a number of PATS-derived interactions by in-solution peptide binding and by affinity pull-down or coimmunoprecipitation (co-IP) assays carried out on intact proteins. Furthermore, we demonstrate that the PLCγ1 SH3 domain binds to hematopoietic progenitor kinase 1 (HPK1) in vivo and thereby negatively regulates its kinase activity.

2 Materials and methods

2.1 Expression, purification, and fluorescein labeling of SH3 proteins

SH3 domains were expressed and purified as previously reported [36]. To facilitate specific attachment of fluorescein, each SH3 domain was engineered to contain a C-terminal Gly–Gly–Cys triad sequence. FPLC-purified (His)6-SH3 proteins were labeled with fluorescein-5-maleimide according to the manufacturer's protocol (Pierce).

2.2 Synthesis of peptide spot arrays on cellulose membranes and probing of the peptide arrays by SH3 domains

Peptide arrays were assembled on a functionalized cellulose membrane following essentially the same procedure as
reported elsewhere [36]. Probing of the peptide array using purified SH3 proteins also followed established procedures [36]. After background fluorescence (mainly from Trp residues in the peptides) was recorded, 1.0 µM fluorescein-labeled SH3 domain was added directly to the blocking solution and incubated with the membrane for 1 h at room temperature (RT). The membrane was then washed three times in TBS-T and once in TBS before imaging. The fluorescent signals resulting from SH3 binding on the peptide array were captured and analyzed on a Fluor-S MultiImager (BioRad).

2.3 Peptide synthesis and fluorescence polarization measurements

Individual peptides were synthesized at 0.1 mmol scale on a 433A Peptide Synthesizer (Applied Biosystems) using standard Fmoc chemistry. Fluorescein labeling, purification of the peptides, and fluorescence polarization measurements were performed essentially as previously described [36].

2.4 Cell culture, GST pull-down, co-IP, and Western blot

Human embryonic kidney (HEK) 293T cells (from ATCC) were cultured in DMEM supplemented with 10% v/v FBS. Cells were transiently transfected with expression plasmids encoding the appropriate proteins. GST pull-down and immunoprecipitation experiments followed established procedures [37]. Jurkat cells were cultured according to standard procedures (ATCC). Cells were stimulated with 10 µg/mL anti-CD3 antibody (clone OKT, eBioscience) for 2 min at 37°C, followed by lysis and immunoprecipitation with an anti-PLCγ1 antibody. The IP complex was extensively washed and resolved by SDS-PAGE, followed by detection using an anti-HPK1 antibody.

2.5 Kinase assay

HA-HPK1 expressed in HEK293T cells was immunoprecipitated with anti-HA antibody as described above. IP complexes were washed four times with NP-40 buffer and twice with kinase buffer [38]. Kinase assays were carried out as described by Kiefer et al. [38] in the absence or presence of different amounts of the PLCγ1 SH3 domain, with or without competing peptides. SDS Laemmli sample buffer was added to stop the kinase reaction. Samples were heated and then resolved on 15% SDS-PAGE. Data acquisition and analysis were carried out using an Amersham Storm phosphoimager system.

2.6 Comparison of PATS to the online predicted human interaction protein database (OPHID)

We compared PATS protein interactions to those found in OPHID. The OPHID version used in this study contains 44 800 unique interactions, of which 1989 include at least one SH3 domain protein identified by Pfam-A [39]. The eight human SH3 domain-containing proteins analyzed by PATS (Swiss-Prot ID: SRC, PLCG1, P85A, NCK1, GRB2, FYN, CRK) have 626 interactions with 390 different proteins in OPHID. When these interactions were compared to the 921 interactions discovered by PATS, 35 direct matches were identified, involving 21 different interacting proteins. An additional 28 SH3 domain-binding proteins were identified in interactions covered by both PATS and OPHID, but with different SH3 partners. The OPHID database included 8590 different human proteins, of which 724 were in common with the PATS "preys" and thus represent a common "prey" set. If we consider only the eight "bait" proteins used in PATS and the 724 common "prey" proteins, there are 8 × 724 = 5792 possible interactions, of which 566 were identified by PATS and 173 by OPHID. To assess statistical significance, we considered the following model: if all the possible interactions were balls in a bag, and we randomly drew out 566 balls, marked them, and returned them to the bag, then the probability of randomly choosing a marked ball is 566/5792 for the first draw, 566/(5792 − 1) for the second draw, and so on. We thus calculated the probability distribution for 173 sequential draws, which peaked at 17 marked balls (the expected overlap), and found that the probability of drawing 35 or more marked balls (the observed overlap between OPHID and PATS) is less than 1.92 × 10⁻⁵. Thus the observed intersection between the OPHID and PATS data sets is significant and cannot be explained by chance alone.

3 Results

3.1 Identification of an SH3 domain signaling network by PATS

The PATS approach is similar to the WISE method developed by Landgraf et al. [30] for identifying a protein interaction network in yeast. Here we employed PATS to systematically identify protein interactions mediated by a group of human SH3 domains. Because most SH3 domains select for an [R/K]xψPxxP (Class-I) motif, where ψ represents a hydrophobic residue and x denotes any amino acid, and/or a PxψPx[R/K] (Class-II) motif [40], we used these sequences as inputs to scan the Swiss-Prot protein database for peptides that contain such motifs. Some proteins, such as collagens, that are enriched in prolines for structural reasons were excluded from the list of retrieved peptides. A motif-scanning program developed in-house retrieved 768 Class-I peptides (each containing 13 residues) from 698 proteins and 768 Class-II peptides (each containing 14 residues) from 670 proteins. For each peptide, the consensus motif was extended by three (Class I) or four (Class II) residues at both the N- and C-terminus to eliminate possible end effects. A total of 1536 peptides were synthesized in an array format on functionalized NC membranes. Multiple copies of the same array were produced and probed for binding, respectively, to a group of 12 human SH3
domains. The SH3 domains were taken from proteins known to function in cellular signal transduction: tyrosine kinases (i.e. Src, Abl, and Fyn), a phosphoinositide-specific kinase (i.e. PI3K), a lipase (i.e. PLCγ1), and adaptor proteins (i.e. Grb2, NCK, and Crk). Because significant background signals were observed when a GST-fused SH3 domain was used as a probe (data not shown), we tagged each SH3 domain with a (His)6 sequence at the N-terminus to facilitate protein purification on an Ni-NTA column and with a Gly–Gly–Cys triad at the C-terminus to allow specific attachment of a fluorescein moiety via the side chain of the Cys residue. Most SH3 domains contain either a single internal Cys residue or none (data not shown). In the former case, we verified by an in-solution binding assay that attachment of a fluorescein moiety did not affect the affinity of the SH3 domain for its peptide target (Supporting Fig. S1). Fluorescent labeling allowed us to develop a simple assay for SH3–peptide interactions: an SH3 domain is incubated with the peptide array, and the resulting SH3–peptide complexes are then analyzed on a fluorescence imager.

As shown in Fig. 1A, the PLCγ1 SH3 domain exhibited strong binding to a number of spots on the peptide array, as indicated by the corresponding fluorescent signals. Selective binding of the PLCγ1 SH3 domain to only a subset of peptides on the array is consistent with the notion that the Class-I and Class-II motifs are necessary, but not sufficient, for SH3 binding. Because the dissociation constant (Kd) of a typical SH3–ligand complex falls within the range of 1–100 µM [40], we applied the SH3 domain at 1 µM to avoid signal saturation. The fluorescent signal produced by each peptide spot on the array was quantified and normalized against the average signal across the entire array. When the normalized signals were plotted against the peptide spots, the curve followed an exponential decay (Fig. 1B), implying that binding was significant for only a small fraction of the peptides. This observation provided a rationale for experimentally identifying SH3-binding partners instead of relying solely on computer-aided ligand prediction based on consensus motifs [29]. Using the transition point as a cutoff (Fig. 1B), we identified a group of peptides that bound significantly to the PLCγ1 SH3 domain (Supporting Table S1). The same procedure was repeated for the remaining 11 SH3 domains.

Figure 1. PATS identifies candidate binding proteins for the SH3 domains. (A) Binding profile of the PLCγ1 SH3 domain to a PRR-targeted peptide array containing 1536 peptides. The array was probed with the fluorescein-labeled PLCγ1 SH3 domain. Bright (fluorescent) spots indicate positive binding. (B) The signal produced by each peptide spot was measured and normalized against the average signal across the entire array. Normalized binding signals were plotted from strongest to weakest versus spot. The resulting curve (thick line) was smoothed and fitted to a double-exponential function, f(x) = a·exp(b·x) + c·exp(d·x). The fitted curve (thin line) and transition point were generated in MATLAB.

Using the PATS strategy, we identified 1165 binary SH3–ligand interactions for the 12 human SH3 domains studied herein. The large number of potential interactions mediated by this group of SH3 domains is reminiscent of data generated on certain yeast SH3 domains [16] and reinforces the notion that the SH3 domain is a promiscuous interaction module [40]. Because a single SH3 domain could bind to multiple sites on a given prey protein and, conversely, multiple SH3 domains could bind to the same site, the number of SH3-binding proteins collapsed to 284 when these factors were taken into account. The majority of these binding proteins (184 of 284) are known or predicted to function in cell communication and/or signal transduction based on their Gene Ontology (GO) annotations and protein descriptors [34] (Fig. 2A). This finding is consistent with the established roles of the eight SH3 proteins that were used as "baits" in the present study. A complete list of binding partners for the 12 SH3 domains is available on the web (http://129.100.30.140/PATS.htm). A list of SH3-interacting proteins that have an annotated function in signal transduction and/or cell–cell communication is given in Supporting Table S2.

Based on the PATS data and functional classification, we constructed a first version of the human SH3 domain-mediated signaling network (Fig. 2B). This network comprises 700 binary interactions linking the 12 SH3 domains to 184 binding partners that are classified as functioning in signal transduction and/or cell communication. Inspection of this network revealed the following characteristics. First, each SH3 domain is connected to multiple binding partners
through binary interactions. Second, significant overlaps are observed between the sets of interacting proteins for different SH3 domains. Indeed, most of the SH3 domains depicted in the network have only a handful of unique interactors, whereas the majority of binding proteins are shared by two or more SH3 domains (Fig. 2B). The overall low selectivity of the SH3 domain-mediated interaction network, which likely stems from the promiscuous specificity of this protein module family and from the abundance of proline-rich sequences in the human proteome, is also observed for networks mediated by yeast SH3 domains and by other PIDs such as the WW and 14-3-3 domains [30, 33, 34, 41].

Figure 2. An interaction network mediated by 12 human SH3 domains. (A) Distribution of molecular function categories for 284 SH3-interacting proteins uncovered by PATS, according to their GO annotations and protein descriptors. (B) An SH3 domain-mediated interaction network showing baits and preys that share related GO components in signal transduction and/or cell communication. SH3 domains are depicted as ovals and interacting proteins as solid circles. Connecting blue edges denote binary interactions. The diagram was generated using NAViGATOR (http://ophid.utoronto.ca/navigator).
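The motif scan described in Section 3.1 can be sketched with overlapping regular-expression searches. This is a minimal illustration under stated assumptions, not the authors' in-house program: the residue set used for ψ (A, V, L, I, M, F, W, Y), the function name `scan_sh3_motifs`, and the choice to skip sites that cannot carry a full flank at either terminus are all assumptions. The flank lengths follow the peptide lengths given in the text (7 + 2 × 3 = 13 residues for Class I, 6 + 2 × 4 = 14 for Class II).

```python
import re

# Assumed hydrophobic set for the psi position (illustrative, not from the paper).
HYDROPHOBIC = "AVLIMFWY"

# Zero-width lookaheads let overlapping motif occurrences all be reported.
# Each entry: (pattern for the core motif, residues of flank added per side).
MOTIFS = {
    # Class I: [R/K]-x-psi-P-x-x-P (7-residue core + 3 per side = 13-mer)
    "I": (re.compile(rf"(?=([RK].[{HYDROPHOBIC}]P..P))"), 3),
    # Class II: P-x-psi-P-x-[R/K] (6-residue core + 4 per side = 14-mer)
    "II": (re.compile(rf"(?=(P.[{HYDROPHOBIC}]P.[RK]))"), 4),
}


def scan_sh3_motifs(sequence):
    """Return (motif_class, core_start, peptide) for each fully extendable site."""
    hits = []
    for motif_class, (pattern, flank) in MOTIFS.items():
        for match in pattern.finditer(sequence):
            core_start = match.start(1)
            core_end = match.end(1)
            start, end = core_start - flank, core_end + flank
            # Skip sites too close to either terminus to carry the full flank.
            if start < 0 or end > len(sequence):
                continue
            hits.append((motif_class, core_start, sequence[start:end]))
    return hits
```

Applied to the HPK1 binding site listed in Table 1, `scan_sh3_motifs("EDTPPPLPPKPKFR")` returns a single Class-II hit covering the whole 14-mer. The lookahead form is used so that overlapping sites in proline-rich stretches are not silently merged.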
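The normalization and cutoff step behind Fig. 1B can likewise be sketched. The paper derives its transition point from a double-exponential fit in MATLAB without stating the exact criterion, so this sketch swaps in a different, simpler heuristic: the point of the ranked curve lying farthest below the straight line joining its first and last points (a Kneedle-style knee). The function names and the toy spot intensities are hypothetical.

```python
def transition_index(ranked):
    """Return the rank at which a descending curve bends most sharply.

    Heuristic stand-in (Kneedle-style): the point lying farthest below the
    straight chord joining the first and last points. The paper instead took
    its transition point from a MATLAB double-exponential fit.
    """
    n = len(ranked)
    if n < 3:
        return n  # too few points to define a bend; keep everything
    first, last = ranked[0], ranked[-1]
    best_i, best_gap = 0, float("-inf")
    for i, y in enumerate(ranked):
        chord_y = first + (last - first) * i / (n - 1)
        gap = chord_y - y  # how far the curve sags below the chord at rank i
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


def call_binders(spot_signals):
    """Normalize spots to the array mean, rank strongest first, keep those above the knee."""
    mean = sum(spot_signals.values()) / len(spot_signals)
    ranked = sorted(
        ((value / mean, spot) for spot, value in spot_signals.items()),
        reverse=True,
    )
    knee = transition_index([value for value, _ in ranked])
    return [spot for _, spot in ranked[:knee]]


# Toy intensities (hypothetical): three strong spots among seventeen weak ones.
spots = {"s1": 100.0, "s2": 80.0, "s3": 60.0}
spots.update({f"w{i}": 5.0 for i in range(17)})
binders = call_binders(spots)
```

On this toy array only the three strong spots (s1 to s3) are called as binders. A fitted curve, as used in the paper, would be less sensitive to noise in individual spots than this direct knee search.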
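The reduction from domain-level hits to protein-level interactions, and the unique-versus-shared pattern noted for Fig. 2B, amount to simple set operations. The sketch below assumes hits are available as (domain, peptide) pairs plus lookup tables mapping each domain to its bait protein and each peptide to its source protein; the function names and all identifiers in the example are hypothetical toy values, not PATS data.

```python
from collections import defaultdict


def collapse_hits(hits, domain_to_bait, peptide_to_prey):
    """Collapse (SH3 domain, peptide) hits into unique (bait, prey) protein pairs.

    Several domains of one bait, or several peptides of one prey, can each give
    a hit; mapping both sides to proteins and using a set removes the redundancy.
    """
    return {(domain_to_bait[d], peptide_to_prey[p]) for d, p in hits}


def unique_and_shared_preys(pairs):
    """Split preys into those bound by exactly one bait and those bound by several."""
    baits_of = defaultdict(set)
    for bait, prey in pairs:
        baits_of[prey].add(bait)
    unique = {prey for prey, baits in baits_of.items() if len(baits) == 1}
    shared = {prey for prey, baits in baits_of.items() if len(baits) > 1}
    return unique, shared


# Toy identifiers (hypothetical, for illustration only).
hits = [("GRB2_N", "SOS1_a"), ("GRB2_C", "SOS1_a"), ("GRB2_N", "SOS1_b"),
        ("PLCG1_SH3", "SOS1_b"), ("PLCG1_SH3", "HPK1_a")]
domain_to_bait = {"GRB2_N": "GRB2", "GRB2_C": "GRB2", "PLCG1_SH3": "PLCG1"}
peptide_to_prey = {"SOS1_a": "SOS1", "SOS1_b": "SOS1", "HPK1_a": "HPK1"}
pairs = collapse_hits(hits, domain_to_bait, peptide_to_prey)
```

Here five domain–peptide hits collapse to three protein pairs, with SOS1 shared by two baits and HPK1 unique to one. Passing an identity mapping (each domain to itself) as `domain_to_bait` gives the domain-level view drawn in Fig. 2B.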
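The overlap statistic in Section 2.6 describes drawing 173 OPHID interactions without replacement from 5792 possible bait–prey pairs, 566 of which are marked as PATS hits. Our reading is that this corresponds to a one-sided hypergeometric tail probability; the sketch below computes it exactly with Python integers under that assumption (the function names are illustrative).

```python
from math import comb


def hypergeom_pmf(i, population, marked, draws):
    """P(X = i) for X ~ Hypergeometric(population, marked, draws)."""
    return comb(marked, i) * comb(population - marked, draws - i) / comb(population, draws)


def hypergeom_sf(k, population, marked, draws):
    """P(X >= k): chance of drawing at least k marked balls."""
    tail = sum(
        comb(marked, i) * comb(population - marked, draws - i)
        for i in range(k, min(marked, draws) + 1)
    )
    # Exact integer arithmetic; Python's int true division returns a correctly rounded float.
    return tail / comb(population, draws)


# Counts quoted in Section 2.6.
population = 8 * 724  # 8 baits x 724 common preys = 5792 possible bait-prey pairs
marked = 566          # pairs reported by PATS
draws = 173           # pairs reported by OPHID
observed = 35         # pairs reported by both

expected = draws * marked / population
mode = max(range(draws + 1), key=lambda i: hypergeom_pmf(i, population, marked, draws))
p_value = hypergeom_sf(observed, population, marked, draws)
```

The expected overlap is 173 × 566 / 5792 ≈ 16.9 and the most probable overlap is 17, in line with the peak reported in Section 2.6. The paper reports P < 1.92 × 10⁻⁵ for 35 or more shared interactions; the exact value from this sketch depends on whether the authors' procedure matches this model.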

3.2 Identification of a PLCγ1 interaction network

The phosphoinositide-specific phospholipase C family of isozymes catalyzes the conversion of phosphatidylinositol 4,5-bisphosphate (PIP2) to two second messengers, inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG) [42–46], which are involved in intracellular calcium release and protein kinase C activation, respectively [47]. The ubiquitously expressed PLCγ1 subtype differs from the β and δ types in that it contains an insertion in its catalytic domain. This insertion, consisting of two SH2 domains and one SH3 domain flanked by a split PH domain [42], is thought to play an important part in coupling PLCγ1 to receptor activation [43]. Interestingly, the SH3 domain has been shown to function independently of the holoenzyme. For instance, agonist-induced Ca²⁺ entry into PC12 cells requires the SH3 domain rather than the lipase domain of PLCγ1 [48]. The mitogenic effect of PLCγ1 has also been ascribed to the SH3 domain, because expressing the SH3 domain alone could induce mitogenesis in quiescent NIH3T3 cells [49, 50]. How does a small peptide-binding module act as a mitogen? Because the SH3 domain lacks the catalytic activity of a typical enzyme, it must exert its regulatory functions by associating with other molecules in the cell.

Using PATS, we identified 63 proteins that could potentially interact with PLCγ1 through its SH3 domain (Supporting Table S1). These interacting proteins include 7 GTPases (e.g. R-Ras) and their regulators (e.g. GEFs and GAPs such as DDEF1/2 and SOS1/2), eight kinases or phosphatases (e.g. M4K1, Abl2, Jak2, DUS15), nine receptors (e.g. FcγR2C, IR3Ra), ten transcription factors or DNA/RNA-binding proteins (e.g. RAG-1, DLX4, AIRE), four transporters (e.g. RyR-1, SNX17), six other signaling molecules (e.g. SHAN2/3), and 19 unclassified proteins (Fig. 3). Importantly, seven disease proteins were also identified as potential targets of the PLCγ1 SH3 domain. These include CUB and sushi multiple domains protein 2 (CSMD2), dystrobrevin binding protein 1 (DTBP1), autoimmune polyendocrinopathy candidiasis ectodermal dystrophy protein (AIRE), colonic and hepatic tumor overexpressed protein (CHTOG), Fanconi anemia group A protein (FANCA), fragile X mental retardation 2 protein (FMR2), and glioma tumor suppressor candidate region gene 1 protein (GSCR1; Fig. 3). This result suggests that the PLCγ1 SH3 domain may exert its function through a wide spectrum of regulatory pathways and that deregulation of one or more of these pathways may contribute to various diseases. Despite the numerous novel binding proteins revealed here, it should be noted that not all known interactors of PLCγ1 were identified by PATS. For instance, PIKE, a nuclear GTPase specific to neuronal cells [51], and SLP-76, a docking protein in T cell signaling [52], are known targets of the PLCγ1 SH3 domain but eluded PATS. The SH3-binding region in PIKE does not contain a conventional motif, whereas the SLP-76 proline-rich region contains multiple potential SH3-binding sites that apparently function in a concerted manner in binding a given SH3 domain [40]. These modes of SH3–ligand interaction were not addressed by PATS in the current study.

Figure 3. An interaction network for PLCγ1 identified by PATS. Interacting proteins for PLCγ1 are separated into different groups according to GO annotation and functional descriptors. Representative examples in each group are given under that category and are color-coded: blue, a protein that exhibited positive binding to PLCγ1 (or its SH3 domain) in solution; yellow, negative binding; black, not examined; red, disease-related.

3.3 Validation of PATS interactions by in-solution binding assays

Because PATS measures binary domain–peptide interactions on a solid support, PATS-derived interactions must be confirmed in solution using full-length proteins. Since it was impractical to confirm all the interactions identified by PATS, we concentrated on validating interactions mediated by the PLCγ1 SH3 domain. Subject to the availability of cDNA clones, a representative group of 12 proteins identified by PATS as binding partners for the PLCγ1 SH3 domain was tested for binding by GST pull-down or co-IP. These included five proteins (DDEF1/2, SOS1/2, and RIN3) in the "GTPase/GEF/GAP" category, two (M4K1 or HPK1, and Abl2) from "kinases/phosphatases", one (FCG2B) from "receptors", one (SNX17) from "transporters", and two (SHAN2/3) representing "other signaling molecules" (Fig. 3). When expressed in HEK293T cells, nine of the twelve proteins (75%) exhibited robust binding to either the SH3 domain or intact PLCγ1 (Fig. 4 and Table 1), demonstrating that PATS is capable of identifying authentic protein–protein interactions. Remarkably, eight of the nine proteins that displayed positive binding in solution (all except SOS1) are previously unknown interactors of PLCγ1, suggesting that PATS is a powerful method for uncovering novel protein functions. In the case of SOS1, although it was demonstrated to interact
with PLCγ1 previously, the corresponding binding site was not identified [53]. The biological relevance of these interactions is further underscored by the observation that proteins belonging to the same family, such as SOS1 and SOS2, Shank2 and Shank3, and PAG3 and AGAP1, all exhibited binding to PLCγ1 or its SH3 domain, implying that the corresponding interaction is conserved within each protein family. Moreover, binding of PLCγ1 to its partners appeared to depend solely on the SH3 domain, as mutation of Trp828 to Ala at the ligand-combining site led to a complete loss of binding (Fig. 4A). In complementary experiments, we examined whether peptide binding seen on the array could also take place in solution. To this end, we synthesized soluble peptides corresponding to the SH3-binding sites in the 12 proteins and measured their respective affinities for the purified PLCγ1 SH3 domain by fluorescence polarization (Fig. 4C). As seen in Table 1, all peptides bound the SH3 domain with dissociation constant (Kd) values smaller than 70 µM. In contrast, peptides below the transition point (Fig. 1C) generally exhibited weak or no detectable binding (Kd > 50 µM) to the SH3 domain in solution (data not shown). Thus peptide affinities in solution correlated with the corresponding signal strength on the array. With the exception of the DUSP15 peptide, the affinity of a peptide appears to be a good indicator of binding between the corresponding full-length protein and PLCγ1: peptides displaying high affinity (i.e. Kd < 50 µM) for the SH3 domain in solution were also capable of mediating the interaction between the corresponding proteins (Table 1). Surprisingly, the DUSP15 peptide, which exhibited the greatest affinity for the PLCγ1 SH3 domain in solution of the 12 peptides tested, represented a false positive. It is possible that the SH3-binding site is inaccessible in the context of intact DUSP15, although proline-rich segments rarely occur in the interior of a folded protein. In the case of FcγRIIb, besides its low affinity, the binding site is located in the extracellular domain of the receptor, which likely accounts for its inability to co-IP PLCγ1 (data not shown). These results suggest that the reliability of PATS-derived interactions could be further improved by including additional filters, such as site accessibility and protein folding parameters, in addition to GO annotations.

Table 1. Binding of the PLCγ1 SH3 domain to peptides or the corresponding intact proteins identified by PATS

| Protein ID (Alias)a) | Description | Binding site | Kd (µM)b) | GST pull-down or co-IP |
|---|---|---|---|---|
| DUS15_HUMAN (DUSP15) | Dual specificity protein phosphatase 15 | VADTPEVPIKKHFK | 2.2 | − |
| DDEF2_HUMAN (PAG3) | Paxillin-associated protein with ARFGAP activity 3 | TTSAPPLPPRNVGK | 22.2 | + |
| DDEF1_HUMAN (AGAP1) | PIP2-dependent ARF GTPase-activating protein 1 | TTEAPPLPPRNAGK | 15.0 | + |
| M4K1_HUMAN (HPK1) | Hematopoietic progenitor kinase 1 | EDTPPPLPPKPKFR | 14.1 | + |
| ABL2_HUMAN (ARG) | Tyrosine protein kinase ABL2 (ARG) | SSGSPALPRKQRDK | 11.0 | + |
| SHAN2_HUMAN (Shank2) | SH3 and multiple ankyrin repeat domains protein 2 | MVDKPPVPPKPKMK | 10.8 | + |
| SOS1_HUMAN (SOS-1) | Son of sevenless protein homolog 1 | PKPLPRFPKKYSYP | 24.4 | + |
| SOS2_HUMAN (SOS-2) | Son of sevenless protein homolog 2 | CKQPPRFPRKSTFS | 31.9 | + |
| RIN3_HUMAN (RIN-3) | Ras and Rab interactor 3 | QVPAPPLPAKKNLP | 42.3 | + |
| SHAN3_HUMAN (Shank3) | SH3 and multiple ankyrin repeat domains protein 3 | LLEKPPVPPKPKLK | 38.1 | + |
| FCGB_HUMAN (FcγRIIb) | IgG Fc receptor II-b | VAGTPAAPPKAVLK | 59.2 | − |
| SNX17_HUMAN (SNX-17) | Sorting nexin-17 | ANVLPAFPPKKLFS | 68.8 | − |

a) Protein names follow the Swiss-Prot convention, with the commonly used alias given in parentheses.
b) Dissociation constants (Kd) were determined by fluorescence polarization using labeled peptides and purified GST-SH3 proteins.

3.4 A novel interaction between HPK1 and PLCγ1 identified by PATS

To establish the physiological relevance of interactions revealed by PATS, one needs to carry out more focused studies on the corresponding protein pairs in a biological setting. To this end, we chose to investigate a novel interaction between PLCγ1 and HPK1 (or M4K1) in more detail. HPK1 is an STE20-like serine/threonine kinase expressed only in hematopoietic cells and tissues [38], and the mechanism of its regulation is not well understood. We first established that HPK1 and PLCγ1 could interact in vivo. As shown in Fig. 5A, FLAG-tagged PLCγ1, but not a W828A mutant that disrupts the binding site of the SH3 domain, co-immunoprecipitated with HA-tagged HPK1 when both proteins were expressed in HEK293T cells. Since crosslinking of the T or B cell receptors leads to HPK1 activation in lymphocytes, we next examined whether the interaction between the two proteins is coupled to receptor activation. Jurkat T cells were thus treated with an anti-CD3 antibody to activate the T cell receptor (verified by SLP-76 and LAT phosphorylation, data not shown), and endogenous PLCγ1 was precipitated from both treated and untreated cells. As seen in Fig. 5B, endogenous HPK1 and PLCγ1 formed a complex in T cells regardless of TCR activation, suggesting that the two proteins interact constitutively.
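The claim in Section 3.3 that a peptide Kd below 50 µM predicts binding of the intact protein (DUSP15 excepted) can be checked directly against Table 1. This sketch assumes the last column of Table 1 records binding (+) or no binding (−) in the pull-down or co-IP assays; the dictionary keys are the aliases from the table, and the function name `tabulate` is illustrative.

```python
# (Kd in µM, binding of the intact protein in pull-down/co-IP), from Table 1.
TABLE_1 = {
    "DUSP15": (2.2, False),
    "PAG3": (22.2, True),
    "AGAP1": (15.0, True),
    "HPK1": (14.1, True),
    "ARG": (11.0, True),
    "Shank2": (10.8, True),
    "SOS-1": (24.4, True),
    "SOS-2": (31.9, True),
    "RIN-3": (42.3, True),
    "Shank3": (38.1, True),
    "FcγRIIb": (59.2, False),
    "SNX-17": (68.8, False),
}


def tabulate(table, kd_cutoff=50.0):
    """Compare 'peptide Kd below cutoff' with observed full-length binding."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    false_positives = []
    for name, (kd, bound) in table.items():
        predicted = kd < kd_cutoff
        if predicted and bound:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
            false_positives.append(name)
        elif bound:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts, false_positives


counts, false_positives = tabulate(TABLE_1)
# Fraction of the 12 tested proteins that bound in solution.
success_rate = sum(bound for _, bound in TABLE_1.values()) / len(TABLE_1)
```

With the 50 µM cutoff the tally is nine true positives, one false positive (DUSP15), two true negatives (FcγRIIb and SNX17), and no false negatives; the overall validation rate is 9/12 = 75%, matching the figure quoted in the text.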
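The Kd values in Table 1 were determined by fluorescence polarization (Section 2.3 and Fig. 4C), but the fitting model is not spelled out in this section. The sketch below assumes a single-site hyperbolic model with negligible ligand depletion, a common simplification rather than the authors' documented procedure, and fits it by a log-spaced grid search over Kd with linear least squares for the baseline and amplitude. The function name, grid bounds, and synthetic titration values are illustrative.

```python
def fit_single_site(conc, signal, kd_min=0.01, kd_max=1000.0, steps=2000):
    """Fit signal = base + amp * c / (Kd + c) by a log-spaced grid search over Kd.

    For a fixed Kd the model is linear in (base, amp), so each grid point is
    solved by ordinary least squares; the Kd with the smallest residual wins.
    Assumes free SH3 concentration ~ total (negligible ligand depletion).
    """
    n = len(conc)
    s_mean = sum(signal) / n
    best = None
    for step in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (step / steps)
        theta = [c / (kd + c) for c in conc]  # fraction of labeled peptide bound
        t_mean = sum(theta) / n
        sxx = sum((t - t_mean) ** 2 for t in theta)
        sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(theta, signal))
        amp = sxy / sxx
        base = s_mean - amp * t_mean
        sse = sum((base + amp * t - s) ** 2 for t, s in zip(theta, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, base, amp)
    return best[1], best[2], best[3]


# Synthetic titration (hypothetical): SH3 concentration in µM vs. polarization in mP.
conc = [0.5, 1, 2, 5, 10, 20, 50, 100, 200]
signal = [50.0 + 150.0 * c / (14.1 + c) for c in conc]
kd, base, amp = fit_single_site(conc, signal)
```

On these noiseless synthetic data, generated with Kd = 14.1 µM (the value listed for the HPK1 peptide in Table 1), the search returns a Kd within one grid step of the true value (roughly 0.6% per step). Real titrations would also call for replicate measurements and, at high probe concentrations, a quadratic binding model that accounts for ligand depletion.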

Figure 4. Verification of PATS interactions by pull-down and co-IP assays. (A) In vitro pull-down assays. The GST-fused PLCγ1 SH3 domain or a variant containing the mutation of Trp828 to Ala was used to pull down PAG3, Abl2, Rin3, Shank2, Shank3, and HPK1, respectively, from lysates of cells expressing these proteins with appropriate tags (i.e. GFP, FLAG, or HA) for detection on Western blots. Equivalent amounts of GST and GST-SH3 proteins were used in each binding assay. WCL: whole cell lysate loaded at 20% of the amount used in the pull-down lanes. (B) Co-IP assay. An anti-PLCγ1 antibody was used to precipitate PLCγ1 from Jurkat T cells, and the associated proteins were identified using an anti-SOS2 or anti-AGAP1 antibody on Western blots. The GST-PLCγ1 SH3 domain was also employed to pull down the endogenous proteins. (C) Representative binding curves of fluorescein-labeled peptides (shown here for SOS1 and HPK1) to purified (His)6-PLCγ1 SH3 domain based on fluorescence polarization measurements.

Figure 5. PLCγ1 binds to HPK1 in vivo and negatively regulates its kinase activity. (A) HA-tagged HPK1 was coexpressed with FLAG-tagged PLCγ1 or an SH3-defective mutant in HEK293T cells, and their interaction was probed by co-IP using an anti-FLAG antibody followed by Western blotting using an anti-HA antibody. Lysate (20% input) is shown as a control. (B) The interaction between endogenous HPK1 and PLCγ1 in Jurkat T cells was independent of T cell receptor activation by an anti-CD3 antibody. (C) HA-HPK1 was overexpressed in HEK293T cells and IP'ed using anti-HA antibodies. Kinase assays were performed on histone H2A using the immunoprecipitates (equally divided) in the presence of 10 μM purified (His)6-tagged PLCγ1 SH3 domain or the W828A mutant.

What could be a functional consequence of the HPK1–PLCγ1 interaction? Since HPK1 is a Ser/Thr kinase, we next investigated whether SH3 binding could modulate the activity of HPK1. We performed kinase assays in the presence of either the wild-type (WT) PLCγ1 SH3 domain or the W828A mutant using histone H2A as a substrate [38], and found that the WT PLCγ1 SH3 domain inhibited the kinase activity of HPK1 whereas the mutant did not. This inhibition appeared to be dose-dependent, since the kinase activity of HPK1 was progressively reduced in the presence of increasing amounts of the PLCγ1 SH3 domain (Fig. 6A). Furthermore, the inhibitory effect of the WT SH3 domain on HPK1 was essentially reversed by an HPK1-derived peptide shown to bind the PLCγ1 SH3 domain (Table 1), suggesting that the inhibition was caused by binding of the PLCγ1 SH3 domain to the proline-rich region of HPK1. The specificity of this novel function of the PLCγ1 SH3 domain was demonstrated by the observation that a peptide derived from an internal SH3-binding site of the p85 subunit of PI3 kinase (data not shown) failed to block the inhibitory effect of the SH3 domain even when applied at 250 μM (Fig. 6B).

3.5 Comparison of the PATS SH3 interaction network with OPHID

To evaluate the gross information content of the PATS SH3 subnetwork, we compared it to OPHID, a comprehensive database of human protein–protein interactions that comprises interactions curated from the literature as well as those from mammalian high-throughput screens and inferred interologs (orthologous interactions) from yeast, worm, fly, and mouse (available online at http://ophid.utoronto.ca) [54]. The version of OPHID (ophid_interactions_070406) used in this study includes 44 800 interactions for 9098 human proteins. However, the potential SH3 domain–protein interaction network in OPHID is considerably smaller. Of the 610 human SH3 domain-containing proteins (identified by the PFAM-A database, version 19.0), 164 are found in OPHID. These 164 SH3 proteins mediate 1989 interactions with 86 SH3-domain proteins (interactions between two SH3-containing proteins) and 904 different non-SH3 proteins. For the eight SH3 proteins included in the current study, a total of 921 interactions with 284 unique proteins were identified by PATS. In contrast, the same eight proteins mediate a total of 626 interactions with 390 different proteins in OPHID. This corresponds to approximately 30% of all interactions identified to date by OPHID for the 164 SH3 domain-containing proteins, suggesting that the eight SH3 proteins analyzed by PATS are highly connected in the human interactome.

When comparing the PATS and OPHID interaction datasets, we identified 35 "exact matches" (both the SH3 protein and the binding protein matched) involving 21 unique proteins, and an additional 28 cross-matched interactions in which the binding partners were linked to different SH3 proteins in the two datasets (Table 2). Notably, all the matched interactions were mediated by the set of 184 proteins classified as functioning in signal transduction and/or cell communication (Fig. 2), indicating that functional correlation underpins protein networking. Of the 35 exact matches, 22 have been shown in the literature to involve direct SH3–ligand binding (Supporting Table S3). Although interactions mediated by SH3 proteins constitute only a small part of the OPHID database at present, the overlap between OPHID and PATS is highly significant (p = 1.92 × 10⁻⁵). Apart from these matches, the majority of interactions identified by PATS were novel. In particular, 550 binary interactions not listed in OPHID were identified in this study, suggesting that PATS is capable not only of recapitulating known interactions but also of uncovering novel ones, and is therefore highly complementary to existing high-throughput methods of network mapping. Nevertheless, a significant number of SH3-mediated interactions listed in OPHID were likely overlooked by PATS. This false negative rate is, however, difficult to evaluate, since OPHID (and, for that matter, any current large protein–protein interaction database) lists only interacting proteins, with no or limited information on whether the interaction involves a particular domain or where the binding occurs. In this regard, PATS may help annotate current protein–protein interaction databases by providing binding site information.

Figure 6. The PLCγ1 SH3 domain negatively regulates the kinase activity of HPK1 through binding to its proline-rich motif. (A) Kinase assays similar to those in Fig. 5C were carried out in the presence of 0, 2, 10, or 50 μM PLCγ1 SH3 domain. (B) Kinase assays carried out at 0 or 50 μM SH3 domain in the presence of 0, 50, 200, or 250 μM competing HPK1 peptide (DKPPLLPPKKEKMKRKY) or 250 μM of a PI3K p85 peptide (sequence WRQPAPALPPKPPKPTT).

4 Discussion

We described the application of a peptide-array-based method to systematically identify potential protein–protein interactions mediated by a group of human SH3 domains. PATS, or WISE [30], simplifies protein–protein interaction to domain–ligand recognition, and thereby represents a versatile method that can be used to identify binary interactions and the corresponding binding sites simultaneously. Combining the PATS and OPHID datasets, the eight proteins examined in this study generated a total of 1517 binary protein–protein interactions. Notably, in our limited case study with PLCγ1, 75% of the interactions tested could be recapitulated with full-length proteins, suggesting that PATS is a useful method for identifying authentic protein–protein interactions. With proper modifications, PATS can be readily adapted to map the interaction networks mediated by various peptide interaction domains and to identify global substrate pools for protein kinases and phosphatases. It is conceivable that PATS may be extended to the remainder of the human SH3 domain family to identify the corresponding SH3-mediated interaction network, or to other PIDs, such as the WW and EVH1 domains, that recognize proline-rich sequences [40]. Although PATS could potentially be used to map networks mediated by phosphopeptide-binding domains such as SH2, PTB, FHA, and 14-3-3 [25], the corresponding peptide sequences may not be as readily accessible as a proline-rich motif in a folded protein. In addition, the phosphorylation potential of each peptide has to be evaluated [29] before PATS can be applied; otherwise the number of potential targets to be screened would be too large to handle cost-effectively.

It should be recognized that not all protein–protein interactions are mediated by domain binding to short peptide sequences. Indeed, many SH3 domains recognize amino acids far removed from the consensus binding site, and some SH3–ligand interactions depend on 3-D surface–surface contacts instead of motif recognition [55]. Moreover, certain SH3–ligand interactions are rather weak and take place only in specific biological settings or in cooperation with other domains present in the same molecule [40]. Such interactions will inevitably be overlooked by the current design of PATS.

Despite these considerations, we expect PATS to play an important part in deciphering the human interactome. It is possible that interactions mediated by modular domains (such as SH3) and unmodified peptides (such as Pro-rich motifs) provide a skeleton of basal protein–protein interaction networks onto which dynamic interactions elicited by PTMs (such as phosphorylation) can be added in response to specific environmental cues. Large-scale mapping of networks mediated by interaction domains, together with related bioinformatics analysis, should help establish the fundamental principles of protein network organization in humans and other metazoan species. Finally, it should be pointed out that PATS is an in vitro method that maps potential, static protein–protein interactions. Its full potential will be realized only when it is combined with other orthogonal methods of network mapping and with characterization of network dynamics.

We thank the following scientists for providing cDNA expression constructs used in this work: Qi Chen, John Colicelli, Toshiaki Katada, Friedemann Kiefer, Jane McGlade, Paul Randazzo, Paul Worley, Hisataka Sabe, David Schubert, Thomas Schlueter, and Morgan Sheng. This work was supported by grants (to S. S. C. L. and I. J.) from Genome Canada through the Ontario Genome Institute, Cancer Research Society, Inc. (to S. S. C. L.), the Canadian Institute of Health Research (CIHR) (to S. S. C. L.), US Army DOD (to I. J.), and IBM (to I. J.). C. W. was supported by an Ontario Graduate Scholarship (OGS). S. S. C. L. holds a Canada Research Chair in Functional Genomics and Cellular Proteomics.

Table 2. Comparison of SH3 interactions discovered by PATS with OPHID

(a) Summary
Dataset    SH3 proteins    SH3 interactions    Overlap
PATS       8               921 (585) a)        35 (28) c)
OPHID      164             1989 (626) b)

(b) Overlapping interactions between PATS and OPHID (SH3 protein: interacting proteins)
CRK: SOS1, RPGF1, M4K1, DOCK1, ABL2
FYN: SOS1, FYB, CD2AP, BCA1
GRB2: WASL, SOS2, SOS1, SF3B4, RPGF1, P3C2B, M4K1, JAK2, GHR, DYN1, DAG1, ABL1, DAB2
NCK1: WASL, SOS1, RRAS, ABL1
P85A: SH3K1, JAK2, GHR, CSF2R, ABL1
PLCG1: SOS2, SOS1, ABL1
SRC: ABL1

Cross-SH3 interactions c) (SH3 protein in PATS: interacting protein (SH3 protein in OPHID))
ABL1: FLNA (TRIO), PKD1 (SRC), DYN2 (SHAN1), EFS (Q16248), FLNC (NPHP1), SYNJ2 (ITSN2), LRP2 (DLG1), EP15 (CRK), VP13A (MYO1E), DGLP1 (DLG1), DGLP2 (DLG1), DGLP4 (ABI1)
CRK: BMX (SRC), FLNB (NPHP1), FCG2B (LYN), DTX1 (GRB2), SHAN2 (BAIP2), NMDE4 (ABL1), FANCA (SPTA2)
FYN: WASIP (NCK1), M4K5 (CRKL), TRAIP (ARHG7), MYPT1 (ARHG7)
GRB2: MYLK (SRC), M4K3 (SH3G2), PLCG2 (PLCG1)
PLCG1: ADNP (Q6TME4), SC24B (DLG2)

a) The eight SH3 proteins were predicted by PATS to mediate 921 binary interactions with 284 different proteins, of which 585 interactions with 184 proteins having the same annotated functions were considered "high-confidence" (listed in Supporting Table S3); these form the basis of the SH3 signaling network depicted in Fig. 2.
b) Number of OPHID interactions considering only the eight SH3 proteins tested by PATS.
c) Cross-SH3 interactions: the same interacting protein appears in both datasets (and not in the overlap set), but with different SH3 protein partners.
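Section 3.5 separates "exact matches" (same SH3 protein and same partner in both datasets) from cross-matched interactions (same partner, different SH3 protein), and reports an overlap p-value without naming the test used. The sketch below expresses both categories as set operations over (SH3 protein, interacting protein) pairs, following our reading of footnote c of Table 2, and adds a hypergeometric upper-tail probability as one common way to score overlap significance. The hypergeometric test, the choice of background population, and all function names (`exact_matches`, `cross_matches`, `hypergeom_upper_tail`) are assumptions for illustration; the snippet is not claimed to reproduce p = 1.92 × 10⁻⁵. The toy pairs are a small subset of rows taken from Table 2.

```python
from math import comb
from typing import Set, Tuple

Pair = Tuple[str, str]          # (SH3 protein, interacting protein)
Triple = Tuple[str, str, str]   # (SH3 in PATS, interacting protein, SH3 in OPHID)


def exact_matches(pats: Set[Pair], ophid: Set[Pair]) -> Set[Pair]:
    """Same SH3 protein and same partner in both datasets."""
    return pats & ophid


def cross_matches(pats: Set[Pair], ophid: Set[Pair]) -> Set[Triple]:
    """Partner present in both datasets but linked to different SH3 proteins,
    ignoring pairs that are already exact matches (our reading of footnote c)."""
    exact = pats & ophid
    ophid_rest = ophid - exact
    found: Set[Triple] = set()
    for sh3_pats, partner in pats - exact:
        for sh3_ophid, partner_ophid in ophid_rest:
            if partner_ophid == partner and sh3_ophid != sh3_pats:
                found.add((sh3_pats, partner, sh3_ophid))
    return found


def hypergeom_upper_tail(k: int, population: int, marked: int, draws: int) -> float:
    """P(X >= k) when `draws` items are taken without replacement from
    `population` items, of which `marked` are 'marked' (e.g. listed in OPHID)."""
    top = min(marked, draws)
    hits = sum(comb(marked, i) * comb(population - marked, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


if __name__ == "__main__":
    # Toy subset of rows from Table 2 (not the full datasets).
    pats = {("PLCG1", "SOS1"), ("GRB2", "SOS1"), ("CRK", "FLNB"), ("FYN", "M4K5")}
    ophid = {("PLCG1", "SOS1"), ("GRB2", "SOS1"), ("NPHP1", "FLNB"), ("CRKL", "M4K5")}
    print(sorted(exact_matches(pats, ophid)))  # [('GRB2', 'SOS1'), ('PLCG1', 'SOS1')]
    print(sorted(cross_matches(pats, ophid)))  # [('CRK', 'FLNB', 'NPHP1'), ('FYN', 'M4K5', 'CRKL')]
```

The background population (for example, all possible pairs between the tested SH3 proteins and the proteins represented in OPHID) strongly influences any such p-value, so it would need to be stated explicitly alongside the result.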


5 References

[1] Hood, L., Heath, J. R., Phelps, M. E., Lin, B., Science 2004, 306, 640–643.
[2] Hood, L., Perlmutter, R. M., Nat. Biotechnol. 2004, 22, 1215–1217.
[3] Butcher, E. C., Berg, E. L., Kunkel, E. J., Nat. Biotechnol. 2004, 22, 1253–1259.
[4] Ho, Y., Gruhler, A., Heilbut, A., Bader, G. D. et al., Nature 2002, 415, 180–183.
[5] Gavin, A.-C., Bosche, M., Krause, R., Grandi, P. et al., Nature 2002, 415, 141–147.
[6] Uetz, P., Giot, L., Cagney, G., Mansfield, T. A. et al., Nature 2000, 403, 623–627.
[7] Li, S., Armstrong, C. M., Bertin, N., Ge, H. et al., Science 2004, 303, 540–543.
[8] Giot, L., Bader, J. S., Brouwer, C., Chaudhuri, A. et al., Science 2003, 302, 1727–1736.
[9] Phizicky, E., Bastiaens, P. I. H., Zhu, H., Snyder, M., Fields, S., Nature 2003, 422, 208–215.
[10] Gunsalus, K. C., Ge, H., Schetter, A. J., Goldberg, D. S. et al., Nature 2005, 436, 861–865.
[11] Rual, J.-F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T. et al., Nature 2005, 437, 1173–1178.
[12] Stelzl, U., Worm, U., Lalowski, M., Haenig, C. et al., Cell 2005, 122, 957–968.
[13] Ghavidel, A., Cagney, G., Emili, A., Cell 2005, 122, 830–832.
[14] Davierwala, A. P., Haynes, J., Li, Z., Brost, R. L. et al., Nat. Genet. 2005, 37, 1147–1152.
[15] Tong, A. H. Y., Lesage, G., Bader, G. D., Ding, H. et al., Science 2004, 303, 808–813.
[16] Tong, A. H. Y., Drees, B., Nardelli, G., Bader, G. D. et al., Science 2002, 295, 321–324.
[17] Gavin, A.-C., Aloy, P., Grandi, P., Krause, R. et al., Nature 2006, 440, 631–636.
[18] Krogan, N. J., Cagney, G., Yu, H., Zhong, G. et al., Nature 2006, 440, 637–643.
[19] Measday, V., Baetz, K., Guzzo, J., Yuen, K. et al., PNAS 2005, 102, 13956–13961.
[20] Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W. et al., Science 2001, 291, 1304–1351.
[21] Chervitz, S. A., Aravind, L., Sherlock, G., Ball, C. A. et al., Science 1998, 282, 2022–2028.
[22] Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C. et al., Nature 2001, 409, 860–921.
[23] Liu, B. A., Jablonowski, K., Raina, M., Arce, M. et al., Mol. Cell 2006, 22, 851–868.
[24] Karkkainen, S., Hiipakka, M., Wang, J. H., Kleino, I. et al., EMBO Rep. 2006, 7, 186–191.
[25] Pawson, T., Cell 2004, 116, 191–203.
[26] Pawson, T., Nash, P., Science 2003, 300, 445–452.
[27] Seet, B. T., Dikic, I., Zhou, M.-M., Pawson, T., Nat. Rev. Mol. Cell Biol. 2006, 7, 473–483.
[28] Songyang, Z., Prog. Biophys. Mol. Biol. 1999, 71, 359–372.
[29] Obenauer, J. C., Cantley, L. C., Yaffe, M. B., Nucleic Acids Res. 2003, 31, 3635–3641.
[30] Landgraf, C., Panni, S., Montecchi-Palazzi, L., Castagnoli, L. et al., PLoS Biol. 2004, 2, e14.
[31] Tang, X., Orlicky, S., Liu, Q., Willems, A., in: Deshaies, R. J. (Ed.), Methods in Enzymology, Ubiquitin and Protein Degradation, Academic Press, New York 2005, pp. 433–458.
[32] Gandhi, T. K. B., Zhong, J., Mathivanan, S., Karthick, L. et al., Nat. Genet. 2006, 38, 285–293.
[33] Ingham, R. J., Colwill, K., Howard, C., Dettwiler, S. et al., Mol. Cell Biol. 2005, 25, 7092–7106.
[34] Jin, J., Smith, F. D., Stark, C., Wells, C. D. et al., Curr. Biol. 2004, 14, 1436–1450.
[35] Jones, R. B., Gordus, A., Krall, J. A., MacBeath, G., Nature 2006, 439, 168–174.
[36] Jia, C. Y. H., Nie, J., Wu, C., Li, C., Li, S. S. C., Mol. Cell. Proteomics 2005, 4, 1155–1166.
[37] Li, C., Iosef, C., Jia, C. Y. H., Han, V. K. M., Li, S. S.-C., J. Biol. Chem. 2003, 278, 3852–3859.
[38] Kiefer, F., Tibbles, L. A., Anafi, M., Janssen, A. et al., EMBO J. 1996, 15, 7013–7025.
[39] Finn, R. D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S. et al., Nucleic Acids Res. 2006, 34, D247–D251.
[40] Li, S., Biochem. J. 2005, 390, 641–653.
[41] Hu, H., Columbus, J., Zhang, Y., Wu, D. et al., Proteomics 2004, 4, 643–655.
[42] Katan, M., Williams, R. L., Semin. Cell Dev. Biol. 1997, 8, 287–296.
[43] Rhee, S. G., Annu. Rev. Biochem. 2001, 70, 281–312.
[44] Bootman, M. D., Lipp, P., Berridge, M. J., J. Cell Sci. 2001, 114, 2213–2222.
[45] Jordan, M. S., Singer, A. L., Koretzky, G. A., Nat. Immunol. 2003, 4, 110–116.
[46] Putney, J. W., Nat. Cell Biol. 2002, 4, E280–E281.
[47] Patterson, R. L., van Rossum, D. B., Nikolaidis, N., Gill, D. L., Snyder, S. H., Trends Biochem. Sci. 2005, 30, 688–697.
[48] Patterson, R. L., van Rossum, D. B., Ford, D. L., Hurt, K. J. et al., Cell 2002, 111, 529–541.
[49] Smith, M., Liu, Y., Matthews, N., Rhee, S. et al., PNAS 1994, 91, 6554–6558.
[50] Smith, M. R., Liu, Y.-l., Kim, S. R., Bae, Y. S. et al., Biochem. Biophys. Res. Commun. 1996, 222, 186–193.
[51] Ye, K., Snyder, S. H., J. Cell Sci. 2004, 117, 155–161.
[52] Gonen, R., Beach, D., Ainey, C., Yablonski, D., J. Biol. Chem. 2005, 280, 8364–8370.
[53] Kim, M. J., Chang, J. S., Park, S. K., Hwang, J. I. et al., Biochemistry 2000, 39, 8674–8682.
[54] Brown, K. R., Jurisica, I., Bioinformatics 2005, 21, 2076–2082.
[55] Zarrinpar, A., Bhattacharyya, R. P., Lim, W. A., Sci. STKE 2003, 179, RE8.
