Network Reconstruction and Significant Pathway Extraction Using Phosphoproteomic Data from Cancer Cells Marion Buffard, Aurélien Naldi, Ovidiu Radulescu, Peter Coopman, Romain Larive, Gilles Freiss

To cite this version:

Marion Buffard, Aurélien Naldi, Ovidiu Radulescu, Peter Coopman, Romain Larive, et al..Net- work Reconstruction and Significant Pathway Extraction Using Phosphoproteomic Data from Cancer Cells. Proteomics, Wiley-VCH Verlag, 2019, 19 (21-22), pp.1800450. ￿10.1002/pmic.201800450￿. ￿hal- 03049205￿

HAL Id: hal-03049205 https://hal.archives-ouvertes.fr/hal-03049205 Submitted on 9 Dec 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Network reconstruction and significant pathway extraction using phosphoproteomic data from cancer cells

BUFFARD Marion1,2, NALDI Aurélien3, RADULESCU Ovidiu2,#, COOPMAN Peter J1,#, LARIVE 5 Romain Maxime4,#,¶ and FREISS Gilles1,#,¶

# Equal contributing authors

1 IRCM, Univ Montpellier, ICM, INSERM, Montpellier, France. 10 2 DIMNP, Univ Montpellier, CNRS, Montpellier, France. 3 Computational Systems Biology Team, Institut de Biologie de l'École Normale Supérieure, Centre National de la Recherche Scientifique UMR8197, INSERM U1024, École Normale Supérieure, PSL Université, Paris, France. 4 IBMM, Univ Montpellier, CNRS, ENSCM, Montpellier, France. 15 ¶ To whom correspondence should be addressed at: Gilles FREISS, IRCM, INSERM U1194, 208 rue des Apothicaires, F-34298 Montpellier cedex 5, France. Tel: + 33 467 61 31 91. Fax: + 33 4 67 61 37 87. E-Mail: [email protected] ; Romain LARIVE, Faculté des Sciences Pharmaceutiques et Biologiques, Laboratoire de Toxicologie du Médicament - 20 Bâtiment K - 1er étage, 15 avenue Charles Flahault - BP 14491, 34093 Montpellier Cedex 5. Tel: + 33 411 75 97 50. Fax: + 33 411 75 97 59. E-Mail: [email protected]

Short title

Reconstructing signaling networks in cancer cells by phosphoproteomic data 25

Abbreviations

SRMS, Src-related tyrosine kinase lacking C-terminal regulatory tyrosine and N-terminal myristoylation sites PIK3CA, phosphatidyl-inositol 3-kinase 30 KEGG, Kyoto Encyclopedia of and Genomes

Keywords

Data processing and analysis; Phosphoproteomics; Oncogenic signaling; SRMS; PIK3CA

35 Total number of words: 5825

1 Abstract

Protein phosphorylation acts as an efficient switch controlling deregulated key signaling pathways in cancer. Computational biology aims to address the complexity of reconstructed networks but overrepresents well-known and lacks information on less-studied 5 proteins. We developed a bioinformatic tool to reconstruct and select relatively small networks that connect signaling proteins to their targets in specific contexts. It enabled us to propose and validate new signaling axes of the Syk kinase. To validate the potency of our tool, we applied it to two phosphoproteomic studies on oncogenic mutants of the well-known PIK3CA kinase and the unfamiliar SRMS kinase. By combining network reconstruction and signal 10 propagation, we built comprehensive signaling networks from large-scale experimental data and extracted multiple molecular paths from these kinases to their targets. We retrieved specific paths from two distinct PIK3CA mutants, allowing us to explain their differential impact on the HER3 receptor kinase. In addition, to address the missing connectivities of the SRMS kinase to its targets in interaction pathway databases, we integrated phospho-tyrosine and 15 phospho-serine/threonine proteomic data. The resulting SRMS-signaling network comprised casein kinase 2, thereby validating its currently suggested role downstream of SRMS. Our computational pipeline is publicly available, and contains a user-friendly graphical interface (http://dx.doi.org/10.5281/zenodo.3333687).

2 Statement of significance of the study

This study applies and validates a novel bioinformatic pathway extraction and analysis tool on two phosphoproteomic studies on the signaling of oncogenic mutants of the PIK3CA kinase and the SRMS kinase. By combining network reconstruction and signal propagation analysis, 5 we build comprehensive cell signaling networks from substantial experimental data and extract multiple molecular pathways from a kinase to its targets. These various alternatives, ranked by their biological significance, enable us to conceive of molecular hypotheses requiring experimental validation. The results of this study demonstrate here that our framework can be applied to explore substantial amounts of phosphoproteomic data at the 10 network level.

3 1 Introduction

Aberrant phosphorylation contributes to tumor initiation and progression. Despite the development of targeted kinase inhibitors, it remains difficult to predict how tumors will respond to them; which inhibitors to combine; and how to overcome acquired drug resistance. A major 5 shortcoming is the poor molecular understanding of the kinase signaling networks and remains a challenging bioinformatical task. Pathway-oriented databases, such as KEGG [1–3], Pathway Commons [4] and Reactome [5], contain regulatory relations between proteins, allowing large-scale reconstruction of signaling networks. These databases rely on curation and updating of the interactions. These databases 10 suffer from the overrepresentation of well-studied proteins and the lack of information on less- known proteins. Discovery of signaling pathways and molecular cross-talk is based on experiments and efficient bioinformatic tools that are able to exploit new experimental data and correct the extant biases in the databases. Several tools, such as Netwalker and Pathlinker, were used for the analysis of large-scale networks [6,7]. Netwalker is a software 15 application suite with random walk-based network analysis methods for network-based comparative interpretations of genome-scale data. Pathlinker computes the k-shortest simple paths in a network from a source to a target with an option for weighting the edges in the network. We recently developed a new bioinformatic pipeline that combines the advantages of these 20 two existing methods. Additionally, we integrated our methodology with the reconstruction of a large network composed of the elements of existing database pathways, which are enriched in targets previously identified by phosphoproteomic experiments. This step, prior to network analysis by subnetwork-extraction, avoids the major drawback of the aforementioned over- or underrepresentation. This methodology was applied to the reconstruction and signal 25 propagation analysis of the Syk kinase signaling network in cells [8]. The method allows reconstruction of a kinase-related network from the global phosphoproteomic data obtained by mass spectrometry. The input to our method was a list of Syk-dependent differentially tyrosine-phosphorylated proteins [9]. We selected the pathways from existing databases, enriched in Syk-targets, to recreate a global network of signaling proteins. This 30 large network still contains numerous unessential proteins, and we developed a reduction algorithm by selecting the most appropriate potential paths from Syk to its targets. We first associated weights to the interaction network edges. These weights promoted network- directed edges coming from a protein kinase or phosphatase to an identified target and demoted edges with no biological relevance. We then refined these weights by taking into 35 account the topology of the network and optimizing signal propagation, by a random walk with restart (RWR). Subnetworks, related to specific biological processes and based on the Syk- target Ontology, were then extracted. This workflow generated valuable results and allowed us to validate the involvement of Syk in -mediated adhesion and motility via cortactin and ezrin. 40 In this study, we further develop the functionality of our bioinformatic tool by adapting and applying it to two phosphoproteomic studies on the signaling of oncogenic PIK3CA (phosphatidyl-inositol 3-kinase) mutants and the SRMS (Src-related tyrosine kinase lacking C-terminal regulatory tyrosine and N-terminal myristoylation sites) kinase. We optimized the automation of our initial Python code to facilitate the implementation of our bioinformatic 45 method. This approach enables us to retrieve specific signaling molecular paths from two

4 distinct PIK3CA mutants to the HER3 receptor. We also generated the proximal and distal signaling networks of the SRMS protein tyrosine kinase comprising secondary signaling intermediates by integrating phospho-tyrosine and phospho-serine/threonine proteomic data. We integrate these improvements into our workflow and propose a graphical interface allowing 5 one to apply this bioinformatic pipeline to other phosphoproteomic analyses.

2 Materials and Methods

2.1 Phosphoproteomic data for the bioinformatic workflow

The bioinformatic workflow input is a list of the UniProt Accession Numbers (AC) of proteins that have been identified as differentially phosphorylated (named “targets”) between 10 experimental conditions perturbing the concerned kinase (named “source”). Identification of specific paths from PIK3CA mutants to the receptor tyrosine kinase HER3 (section 3.2) involved the following: The quantitative phosphoproteomic analyses comparing the control or isogenic breast cancer cell lines that express the E545H or H1047R PIK3CA mutants were performed as reported [10]: After protein extraction, trypsin digestion, and anti- 15 phosphotyrosine immuno-affinity chromatography enrichment, the SILAC-labeled peptides were identified and quantified by LC-MS/MS. The datasets of the protein targets of the E545H or H1047R PIK3CA mutants are displayed in Supplementary Tables S1-4 and were obtained from Supplementary Tables S1-4 of the original work [10]. The sources are the E545H or H1047R PIK3CA mutants. 20 Reconstruction of the SRMS signaling network by integration of multiple phosphoproteomic data sets (section 3.3) involved: The label-free quantitation-based phosphoproteomic analysis using cells expressing GFP alone (the empty vector control) or cells expressing wild-type GFP- SRMS was performed as described [11]. After protein extraction, the proteins were digested by dual enzymatic digestion (Trypsin/Lys-C) and the phosphopeptides were enriched using TiO2 25 resin. The dataset containing the indirect targets of SRMS (proteins differentially phosphorylated on serine and threonine) is displayed in Supplementary Table S5 and was obtained from Supplementary Tables S4-5 of the original work [11]. The dataset containing the direct substrates of SRMS is displayed in Supplementary Table S6 and was obtained from Supplementary Table S8 of the original work [12]. The source is SRMS to search the paths from 30 SRMS to the CK2 subunits. The sources are the four CK2 subunits to search the paths from CK2 to the indirect targets of SRMS.

2.2 Online databases

UniProt AC mapping from UniProt.org/downloads (2017/02) 35 HGNC dataset from genenames.org/cgi-bin/statistics (2017/02) GO ontology from geneontology.org/page/download-ontology (go-basic.obo, 2017/02) GO annotation from geneontology.org/page/download-annotations (goa_human.gaf, 2017/02) KEGG: www.kegg.jp, release 84 (2017/10)

5 Pathway commons: pathwaycommons.org release 8 (2018/01)

2.3 Pathway database selection

We used pathways from the KEGG [1] and Pathway Commons [4] databases. For PIK3CA reconstruction, we first selected the more enriched pathways in the lists of targets (using a 5 Fisher exact test) and included the pathways containing targets not covered by significantly overrepresented pathways from the same database [8]. As SRMS signaling has not been characterized, we kept all the pathways without selection and added the links from SRMS to its identified direct substrates [12]. The selected pathways were combined, resulting in a larger directed network forming the prior-knowledge network. Each node corresponds to a unique 10 protein and edges to all its interactors in the different selected pathways.

2.4 Functional protein annotations

For PIK3CA reconstruction, the components of the network with tyrosine kinases (GO:0004713) and tyrosine phosphatases (GO:0004725) GO terms were annotated as phospho-tyrosine modifiers and we extended the list of phospho-tyrosine modifiers from 123 15 proteins to 207 manually verified proteins. For SRMS, those proteins present in the serine/threonine kinase activity list (GO:0004674) were annotated as kinase proteins.

2.5 Search path from source to targets

The reconstructed, embedded, large network contains thousands of nodes and edges and 20 billions of path possibilities. We constrained the path research by using an ad hoc distance and edge weights. The path research is based on a weighted near-shortest-path analysis that employs a modified version of Dijkstra’s shortest path algorithm. To define the edge weights, we combine functional annotation with random walk for weight refinement. 25 As targets are differentially phosphorylated, we promote edges from a kinase or phosphatase, in correlation with the functional annotation for each dataset studied, to a target by adding a smaller weight to the corresponding edges (adapted from [8]). Conversely, we demote edges reaching a target identified as differentially phosphorylated but that did not originate from a kinase or phosphatase (for the complete list of ad hoc weights used in this study, see the Supp 30 Figure S1). Random walk analysis allows the weights to be refined by taking into account network topology. This analysis allows the avoidance of multiple paths with exactly the same length and favors plausible paths containing crossroad proteins. We simulated a random walk with return on the network twice; firstly using equal weights for all edges and a secondly using the 35 ad hoc weights. The equilibrium node probabilities, in the two cases, are used to modulate the ad hoc weights and eliminate biases created by topology (for details, see [8]). Contrary to its usual implementation [14], we do not use the random walk method to prune the network but to refine its initial weights. The final path selection was performed using Dijkstra’s algorithm. The Dijkstra algorithm identifies the shortest path from source to every target. As alternative paths 40 can also be interesting, we slightly modified this algorithm from its original form.

6

In this modified algorithm, not only the shortest paths but also longer paths are accepted. The “overflow”, defined as the extra distance measured as percentage of the shortest path, necessary to include near-shortest paths, is a parameter of the method. The overflow is zero 5 for the shortest paths. The choice of shortest paths is sufficient on the first analysis to test that all targets were connected to the source. To refine the analysis for specific targets, the overflow value should be set empirically, by continuously increasing it from zero until new, alternative paths are selected.

10

2.6 Subnetwork extraction

Sets of alternative paths define subnetworks in the prior large network. Finally, it is also possible to extract subnetworks according to groups of GO terms representing relevant processes and functions, for example cell adhesion and motility [8] or a selected subset of 15 targets. This approach was used to separate networks, by processes and functions, or to reduce the network size to explore, more deeply, alternative paths (e.g. HER3 in the PIK3CA mutant study).

2.7 Network visualization, comparison and analysis

20 Cytoscape 3.7 (http://www.cytoscape.org/) was used to visualize and explore the networks and to generate figures [15]. For alignment and comparison of the networks obtained for PIK3CA mutants we used DyNet, the Cytoscape plug-in [16]. The parameters were set as follows. Initial layout: Prefuse Force Directed Layout; Treat networks as: Directed networks; Find corresponding nodes by: name; Find corresponding edges by: interaction. 25 To retrieve information about the putative in vivo kinases and the functional effects of phosphorylation on the activity of the target proteins that were identified with differentially phosphorylated peptides, we manually consulted the site-specific annotation database PhosphoSitePlus [17].

7 3 Results and Discussion

3.1 Improvement of network reconstruction and shortest path

analysis

The original Python code (https://github.com/aurelien-naldi/NetworkReconstruct) was 5 modified to optimize the automatic network reconstruction and extraction of all subnetworks within the same improved application program. We also integrated all the pathways from Pathway Commons (Reactome, Panther, and PID) with the KEGG’s pathways (Figure 1). The selection of signaling pathways in databases can be expanded as much as is necessary to maximize the path possibilities (step 1). This selection could be necessary when the resulting 10 prior-knowledge network lacks connectivity from source to targets. We then embedded the pathways to create a directed network (step 2). If no selection is applied to step 1, a large network will be generated containing all known database pathways. We also included the possibility of adding protein interactions from experimental data directly to the prior-knowledge network, which is particularly useful if there is a lack of connectivity of the source to the prior- 15 knowledge network (e.g., when the source is poorly described in pathway databases). We applied both options in the case of the SRMS kinase, adding the interactions from SRMS to its substrates (see subsection 3.3 for more details). To detect the most relevant paths and to eliminate unnecessary interactions (step 3), we used the strategy of the weighted shortest path search. This added the possibility of promoting edges according to the type of 20 phosphoproteomic data (see the material and methods). The topology of the network is still taken into account by refining the weight of the protein interactions with the random walk procedure. The subnetwork extraction has also been integrated within the script and can be applied to retrieve those paths from the source to all experimentally identified targets and to those involved in specific cellular processes (based on their ), or to a particular 25 list of proteins of interest (step 4). The overflow, admitting the shortest paths and allowing the inclusion of alternative paths, has also been made modular. This option is important in network biology to understand the etiology of drug mechanisms and drug resistance. The application of this bioinformatic methodology is illustrated below and applied to two phosphoproteomic studies. 30

3.2 Identification of specific paths from PIK3CA mutants to the

receptor tyrosine kinase HER3

As a first validation of our improved method for analyzing the molecular paths from a protein to its targets, we selected a study published in Proteomics that uses a mass spectrometry- 35 based phosphoproteomic approach to identify unique mediators of the oncogenic PIK3CA signaling [10]. PIK3CA is an attractive target for cancer therapy because its activity is often dysregulated in cancer. The p110α catalytic subunit of PI3K, encoded by the PIK3CA gene, is one of the most frequently mutated oncogenes in breast cancer [18]. Two recurrent oncogenic

8 “hotspot” mutations, E545H and H1047R, occur in the helical and the kinase domains of the PIK3CA protein [19]. Although the implications of PIK3CA mutations in cell transformation have been established [20], the mechanisms of how they lead to increased oncogenic features have not been determined to date . Figure 1 of Blair and colleagues (2015) depicts the experimental 5 strategy applied to analyze the impact of each of these two mutations on the cell signaling of human breast epithelial cells. A differential phosphorylation pattern was observed between the two PIK3CA mutants. The authors focused on their distinct impact on HER3 that is specifically phosphorylated on the Y1328 residue in the presence of the H1047R mutant or on the Y1159 residue in the presence of the E545H mutant. Thus, HER3 could be a molecular 10 intermediate that conducts the signal from the H1047R mutant, but not the E545H mutant, to the MAP kinase pathway. To identify the specific paths from each of the PIK3CA mutants to the HER3 receptor, we reconstructed the signaling networks of each PIK3CA mutant using quantitative phosphoproteomic comparison with the control condition. To explain the different 15 consequences of the two mutants, we considered the same source, but with different targets, in our network reconstruction process, leading to two distinct prior networks (Supp Tables S1- 2). We then applied our analytic method to select the more reliable paths linking the PIK3CA mutants to their targets in each network. The superposition of these two “shortest path” networks revealed paths specific for the E545H PIK3CA (red edges and nodes) or H1047R 20 PIK3CA mutant (green edges and nodes) (Figure 2A). Among the shared targets (white diamonds), some were reachable by paths specific to the E545H PIK3CA mutant (red edges to white diamonds) or to the H1047R PIK3CA mutant (green edges to white diamonds). These properties highlight differences between the signaling networks of each PIK3CA mutant and allow us to formulate molecular hypotheses that can be experimentally verified. Next, we 25 focused on the paths linking PIK3CA to HER3 in each network and found the same path for the two mutants, linking PIK3CA to HER3 through the tyrosine kinases PTK2 (focal adhesion kinase 1) and FYN (Figure 2B). Although this result suggests the indirect impact of PIK3CA on the tyrosine phosphorylation of HER3, it did not explain the differential regulation of HER3 by the two PIK3CA mutants. Enlarging our selection to the near shortest paths, an alternative 30 path through the SRC kinase was detected (Supp Figure S2A-B) but was still shared by the two mutants. According to Blair and colleagues (2015), comparing quantitative differences between the E545H and H1047R PIK3CA mutants allows one to focus on the unique signaling alterations of these two mutations. We therefore refined our analysis by selecting only the 35 phosphoproteomic differences between the E545H and H1047R mutants (Supp Tables 3-4). We reconstructed the PIK3CA mutant networks and searched for paths leading to HER3. The path from the E545H mutant to HER3 remained identical, but the path from the H1047R mutant was profoundly modified, with the MET receptor kinase serving as the final component linked to HER3 (Supp Figure S2C). MET was experimentally identified in the 40 phosphoproteomic screen and the interaction between MET and HER3 has been shown to confer resistance of cancer cells to EGFR pharmacological inhibitors [21]. Nevertheless, the previous step of this path was the interaction between the ligand for the receptor-type KIT kinase (KITLG) and MET, and KITLG has not been described as an activator of MET. This interaction was retrieved from three KEGG pathways as a general mechanism describing the 45 activation of receptor tyrosine kinases by extracellular growth factors (RAS, PI3K-AKT and RAP1). We enlarged the selection to the near shortest paths and, searching for more relevant interactions upstream of MET, we identified the hepatocyte growth factor (HGF) MET ligand that is linked to the H1047R PIK3CA mutant by STAT3, MAPK1 and PTK2 (Figure 2C and

9 Supp Figure S3). The majority of components in this path were experimentally identified in the phosphoproteomic screen (diamond nodes). We retained this hypothetical path that describes an autocrine and/or paracrine signaling mechanism with the production of extracellular HGF leading to phosphorylation of HER3 by MET. This example clearly demonstrates that our 5 method is useful to retrieve the molecular signaling pathways from protein kinases to their targets, identified by a phosphoproteomic screening, at a level of detail and plasticity for which interesting biological hypotheses may be generated, explored, and tested.

3.3 Reconstruction of the SRMS signaling network by integration

10 of multiple phosphoproteomics data

SRMS is a nonreceptor protein-tyrosine kinase that belongs to the BRK family kinases. While discovered in 1994, little information on the biochemical, cellular and physiopathological roles of SRMS has been reported [22]. SRMS is highly expressed in breast cancers compared to normal mammary cell lines and tissues and SRMS is a candidate serum biomarker for gastric 15 cancer [23,24]. Recently, Goel and colleagues [11] attempted to uncover SRMS-regulated signaling by identifying differentially phosphorylated peptides on serine or threonine by mass spectrometry. The phosphorylation of these SRMS-indirect targets indicates the regulation of protein-serine/threonine kinase signaling intermediates by SRMS. Phosphorylation motif analysis suggested that casein kinase 2 (CK2) may represent a key downstream target of 20 SRMS [11]. Interestingly, CK2 has been characterized as a crucial player in cancer biology and an attractive target for anticancer drug design [25,26]. To identify the molecular paths linking SRMS to its indirect targets through CK2, we applied our methodological workflow to reconstruct the SRMS-associated network using the proteins differentially phosphorylated on serine and threonine (Supp Table S5). Despite a resulting 25 network of a consistent size (5216 edges and 760 nodes), SRMS was isolated from the major region of this network and only linked to the BRK/PTK6 kinase (Figure 3A and Supp Figure S4). We assumed that this lack of connectivity from SRMS to its signaling network was a consequence of its underrepresentation in signaling databases and searched to add direct protein interactions of SRMS to the network. Goel and colleagues [11] identified novel candidate 30 SRMS substrates using phosphotyrosine antibody-based immunoaffinity purification in large- scale, label-free, quantitative phosphoproteomics and validated a subset of the SRMS candidate substrates by high-throughput peptide arrays [12]. We enriched the set of SRMS targets used for the pathway selection step of the network reconstruction with the SRMS- candidate substrates (steps 1-2 of our methodological workflow) (Supp Table S6). We also 35 added the direct interactions, from SRMS to its substrates, to the set of network interactions, increasing the size of the resulting protein interaction network (6321 edges and 1307 nodes) (Supp Figure S5). Consequently, we searched for the molecular paths from SRMS to its indirect targets. Despite the reconnection of SRMS to the major part of the directed network, only two of its indirect targets were reachable from SRMS (Figure 3B). 40 Our network reconstruction procedure is based on the generation of a prior-knowledge interaction network composed of the components and interactions described in the public databases of signaling pathways. While such networks are often assembled using complete pathway or interaction databases, we select only the enriched pathways in the list of phosphoproteomic data. Consequently, this restriction reduces the number of irrelevant

10 interactions and better assesses the relevance of the identified pathways. In the case of SRMS, however, we did not obtain enough coverage to connect SRMS to its indirect targets. For this reason, we enlarged the prior-knowledge interaction network to the components and interactions described in the public databases of all signaling pathways without selecting those 5 enriched in the list of phosphoproteomic data. Then, we searched for the molecular paths from SRMS to its indirect targets in the resulting prior-knowledge interaction network (111736 edges and 8313 nodes). Among the 60 indirect targets of SRMS, 29 were present in this network and 16 were now reachable from SRMS. Since CK2 was identified as one of the major potential SRMS-secondary signaling intermediates, we included CK2 in the list of SRMS- 10 indirect targets and searched for the most relevant molecular paths from SRMS to its indirect targets. In agreement with Goel and colleagues [11], we retrieved CK2 as a candidate intermediate protein-serine/threonine kinase to propagate the signal to the SRMS-indirect targets (Figure 3C). CK2 is composed of two α and two β subunits that appear in our network as CSNK2A1, CSNK2A2, CSNK2A3 and CSNK2B. These subunits are reachable from SRMS 15 through CDK2 (cyclin-dependant kinase 2), a potential SRMS substrate, and propagate the signal to PSMA3 and RAD23A, two indirect SRMS targets, PSMA3 being a known substrate of CK2 [27]. Three other SRMS-candidate substrates are involved in propagating the signal to the SRMS-indirect targets; the CDK1 (cyclin-dependant kinase 1), the GEFs VAV2, and DOK1. Interestingly, CDK1 was also retrieved as a candidate intermediate kinase by Goel and 20 colleagues [11] and DOK1 has been described as an SRMS substrate [23]. To test whether CK2 could propagate the signal from SRMS to all of its indirect targets, we searched the paths from the four CK2 subunits to the SRMS-indirect targets. All of the targets were reachable from each CK2 subunit, suggesting that the role of CK2 as a downstream intermediate of SRMS could be even more prominent. We merged these paths with the path from SRMS to CK2 to 25 obtain a SRMS-signaling network with paths that could confirm the role of CK2 as an intermediate of SRMS (Supp Figure S6). These networks now allow us to formulate molecular hypotheses that require experimental validation to explore the functional role of each of the CK2, CDK2, CDK1 and DOK1 kinases as signaling intermediates of the SRMS kinase.

30 3.4 Graphical interface

The bioinformatic workflow presents compelling results and provides valuable indications to study protein signaling pathways based on phosphoproteomic data. Used first to study Syk kinase signaling in breast cancer, we demonstrate here that this workflow can be more widely applied to other kinases and other types of data (direct substrates, serine/threonine 35 phosphorylations) with appropriate modifications. Moreover, we have developed a graphical interface that combines the different options and adaptations that were included in this study (Supp Figures S7-8 and Supplementary text) (http://dx.doi.org/10.5281/zenodo.3333687).

11 4 Concluding Remarks

In this study, we demonstrate that our recently developed bioinformatic pipeline can be generally adapted to other phosphoproteomic datasets and allow us the discovery of candidate mechanisms that explain how signals propagate in large networks of signaling 5 proteins. Further improvements, such as the consideration of the phosphorylated sites and the quantitative phosphoproteomic data, would be necessary to advance towards spatiotemporal dynamic models of signaling and behavior. Taking into account the site of phosphorylation rather than the entire protein would lead to the possibility of predicting the protein kinase upstream of each detected phosphorylation, by analyzing the phosphorylation 10 motifs. Additionally, the impact of phosphorylation on the activity of the identified targets could be retrieved from phosphorylation databases and used to refine the inference of signal propagation. Finally, introducing the quantitative dimension of the phosphoproteomic data would permit the quantification of the static response of the signaling network to specific perturbations, by the bias of, for instance, modular response [28] or static response analysis 15 methods [29]. The results of this study may open the path towards a dynamic description of signaling that uses detailed representations of the interaction mechanisms and can integrate temporal fluctuations at the system level [30].

12 Acknowledgments

This work was supported by grants from the Plan Cancer (ASC14021FSA), the Ligue Régionale Contre le Cancer (Hérault R18024FF) and the INCa-Cancéropôle GSO (Emergence program, N°2018-E01). MB is a recipient of the Labex EpiGenMed PhD 5 fellowship (an “Investissements d’avenir” program, reference ANR-10-LABX-12-01). The authors declare no conflict of interest.

13 5 References

[1] M. Kanehisa, S. Goto, Nucleic Acids Res. 2000, 28, 27. [2] M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, K. Morishima, Nucleic Acids Res. 2017, 45, D353. 5 [3] M. Kanehisa, Y. Sato, M. Furumichi, K. Morishima, M. Tanabe, Nucleic Acids Res. 2019, 47, D590. [4] E. G. Cerami, B. E. Gross, E. Demir, I. Rodchenkov, Ö. Babur, N. Anwar, N. Schultz, G. D. Bader, C. Sander, Nucleic Acids Res. 2011, 39, D685. [5] A. Fabregat, S. Jupe, L. Matthews, K. Sidiropoulos, M. Gillespie, P. Garapati, R. Haw, 10 B. Jassal, F. Korninger, B. May, M. Milacic, C. D. Roca, K. Rothfels, C. Sevilla, V. Shamovsky, S. Shorser, T. Varusai, G. Viteri, J. Weiser, G. Wu, L. Stein, H. Hermjakob, P. D’Eustachio, Nucleic Acids Res. 2018, 46, D649. [6] K. Komurov, S. Dursun, S. Erdin, P. T. Ram, BMC Genomics 2012, 13, 282. [7] A. Ritz, C. L. Poirel, A. N. Tegge, N. Sharp, K. Simmons, A. Powell, S. D. Kale, T. M. 15 Murali, Npj Syst. Biol. Appl. 2016, 2, 16002. [8] A. Naldi, R. M. Larive, U. Czerwinska, S. Urbach, P. Montcourrier, C. Roy, J. Solassol, G. Freiss, P. J. Coopman, O. Radulescu, PLoS Comput. Biol. 2017, 13, e1005432. [9] R. M. Larive, S. Urbach, J. Poncet, P. Jouin, G. Mascré, A. Sahuquet, P. H. Mangeat, P. J. Coopman, N. Bettache, Oncogene 2009, 28, 2337. 20 [10] B. G. Blair, X. Wu, M. S. Zahari, M. Mohseni, J. Cidado, H. Y. Wong, J. A. Beaver, R. L. Cochran, D. J. Zabransky, S. Croessmann, D. Chu, P. V. Toro, K. Cravero, A. Pandey, B. H. Park, PROTEOMICS 2015, 15, 318. [11] R. K. Goel, M. Meyer, M. Paczkowska, J. Reimand, F. Vizeacoumar, F. Vizeacoumar, T. T. Lam, K. E. Lukong, Proteome Sci. 2018, 16, 16. 25 [12] R. K. Goel, M. Paczkowska, J. Reimand, S. Napper, K. E. Lukong, Mol. Cell. Proteomics MCP 2018, 17, 925. [13] R. K. Goel, M. Meyer, M. Paczkowska, J. Reimand, F. Vizeacoumar, F. Vizeacoumar, T. T. Lam, K. E. Lukong, Proteome Sci. 2018, 16, 16. [14] K. Komurov, M. A. White, P. T. Ram, PLoS Comput. Biol. 2010, 6, DOI: 30 10.1371/journal.pcbi.1000889. [15] P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, T. Ideker, Genome Res. 2003, 13, 2498. [16] I. H. Goenawan, K. Bryan, D. J. Lynn, Bioinformatics 2016, 32, 2713. [17] P. V. Hornbeck, B. Zhang, B. Murray, J. M. Kornhauser, V. Latham, E. Skrzypek, 35 Nucleic Acids Res. 2015, 43, D512. [18] K. E. Bachman, P. Argani, Y. Samuels, N. Silliman, J. Ptak, S. Szabo, H. Konishi, B. Karakas, B. G. Blair, C. Lin, B. A. Peters, V. E. Velculescu, B. H. Park, Cancer Biol. Ther. 2004, 3, 772. [19] B. Karakas, K. E. Bachman, B. H. Park, Br. J. Cancer 2006, 94, 455. 40 [20] A. G. Bader, S. Kang, P. K. Vogt, Proc. Natl. Acad. Sci. 2006, 103, 1475. [21] J. A. Engelman, K. Zejnullahu, T. Mitsudomi, Y. Song, C. Hyland, J. O. Park, N. Lindeman, C.-M. Gale, X. Zhao, J. Christensen, T. Kosaka, A. J. Holmes, A. M. Rogers, F. Cappuzzo, T. Mok, C. Lee, B. E. Johnson, L. C. Cantley, P. A. Jänne, Science 2007, 316, 1039. 45 [22] N. Kohmura, T. Yagi, Y. Tomooka, M. Oyanagi, R. Kominami, N. Takeda, J. Chiba, Y.

14 Ikawa, S. Aizawa, Mol. Cell. Biol. 1994, 14, 6915. [23] R. K. Goel, S. Miah, K. Black, N. Kalra, C. Dai, K. E. Lukong, FEBS J. 2013, 280, 4539. [24] M.-W. Yoo, J. Park, H.-S. Han, Y.-M. Yun, J. W. Kang, D.-Y. Choi, J. won Lee, J. H. Jung, K.-Y. Lee, K. P. Kim, PROTEOMICS 2017, 17, 1600332. 5 [25] J. H. Trembley, G. Wang, G. Unger, J. Slaton, K. Ahmed, Cell. Mol. Life Sci. CMLS 2009, 66, 1858. [26] I. M. Hanif, I. M. Hanif, M. A. Shazib, K. A. Ahmad, S. Pervaiz, Int. J. Biochem. Cell Biol. 2010, 42, 1602. [27] S. Bose, F. L. L. Stratford, K. I. Broadfoot, G. G. F. Mason, A. J. Rivett, Biochem. J. 10 2004, 378, 177. [28] B. N. Kholodenko, A. Kiyatkin, F. J. Bruggeman, E. Sontag, H. V. Westerhoff, J. B. Hoek, Proc. Natl. Acad. Sci. 2002, 99, 12841. [29] Radulescu Ovidiu, Lagarrigue Sandrine, Siegel Anne, Veber Philippe, Le Borgne Michel, J. R. Soc. Interface 2006, 3, 185. 15 [30] M. Buffard, O. O. Ortega, C. F. Lopez, O. Radulescu, JOBIM Meet. Abstr. A34 2017.

15 Figure legends

Figure 1. Workflow of the network construction and signal

propagation analysis.

This workflow allows us to uncover potential signaling paths, from a kinase of interest to a list 5 of proteins identified by phosphoproteomic experiments. Step1: Select pathways from KEGG and Pathway Commons databases. Step 2: Embed the selected pathways to create a prior- knowledge interaction network. Step 3: Search for paths from the source to its experimentally identified targets by a combination of weighted shortest paths and random walk methods. Step 4: Focus on the more biologically relevant paths to a subset of targets or to a unique target. 10

Figure 2. Identification of specific paths from the PIK3CA mutants to

the receptor tyrosine kinase HER3.

The protein interaction networks are composed of nodes and edges. Nodes represent the proteins whose diamond or rounded rectangle shape correspond to the experimentally 15 identified targets or to the proteins of the pathway databases, respectively. The edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown consequence). (A) Alignment and comparison of the signaling networks of the E545H and H1047R PIK3CA 20 mutants obtained from the quantitative phosphoproteomic comparison of each mutant with the control condition. The source of the signal (PIK3CA) is displayed in yellow. Red edges and nodes are specific for the E545H mutant. Green edges and nodes are specific for the H1047R mutant. White nodes and gray edges are common to both networks. (B) Subnetwork of the signal propagation from the PIK3CA mutants to HER3 extracted from 25 the signaling networks of the E545H and H1047R mutants obtained from the quantitative phosphoproteomic comparison of each PIK3CA mutant with the control condition. (C) A subset of the near shortest paths from the H1047R to HER3 extracted from the signaling network of the H1047R mutants obtained from the quantitative phosphoproteomic differences between the E545H and H1047R mutants. 30

Figure 3. Reconstruction of the SRMS-signaling network by

integration of multiple phosphoproteomics data

The protein interaction networks are composed of nodes and edges. Green nodes with rounded rectangle shapes represent the proteins experimentally identified as potential direct

16 substrates of SRMS (phosphorylated on tyrosine residues). Red diamond-shaped nodes represent the experimentally identified indirect targets of SRMS (phosphorylated on serine/threonine residues). The edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; 5 T, negative interaction; Circle, unknown consequence). (A) SRMS subnetwork isolated from the prior-knowledge network obtained from embedding the database pathways enriched in the list of SRMS-indirect targets. (B) Subnetwork of the signal propagation from SRMS to its direct substrates (green round rectangles) and to its indirect targets (red diamonds). This subnetwork is extracted from the 10 prior-knowledge network enriched with the -direct SRMS substrates. (C) Subnetwork of the signal propagation from SRMS to its direct substrates (green round rectangles) and to its indirect targets (red diamonds). This subnetwork is extracted from the prior-knowledge network enlarged to the components and interactions described in the public databases of all signaling pathways. The CK2 subunits are light blue in color.

17 Buffard, figure 1. Target proteins (from phosphoproteomic analyses) 1. Selection of the pathways from databases

2. Embed pathways

3. Search for path from source to targets

4. Extract subnetwork corresponding to GO terms or specific target Buffard, Figure 2

A BC

Buffard, Figure 3 A B

C

Buffard, figure S1

Target Target Target Kinase/ Protein Protein protein phosphatase protein protein

D=1 D=2 D=3 D=5 D=6 D=8

Target Target Target Target Protein Protein protein protein protein protein

: Differentially phosphorylated proteins = target proteins

: Kinase or phosphatase protein

: Protein from databases not identified in dataset with no kinase or phosphatase activity Buffard, Figure S2

A B C

PIK3CA

PRKCB

PRKCA PRKCE PRKCD PRKCZ PTK2 INSR

KITLG MAPK3 CBL CBLC CBLB

JAK2

MAPK1

FGF22 FGF16 FGF4 FGF10 FGF17 FGF9 FGF20 FGF8 STAT3 FGF3 FGF7 FGF19 FGF6 FGF23 FGF1 FGF18 SPRY2

FGF2 INS VEGFB VEGFC PGF FIGF VEGFA HGF

MET

HER3 CREB1 CDKN1B TSC2 GSK3A

PRR5 MTOR FOXO1 NR4A1

CHUK RPS6KB2 AKT1S1 FOXO4

CDKN1A MLST8 MAPKAP1 BAD AKT1

RICTOR TP53 None CASP9 MDM2 CDKN2A AKT3 ILK FOXO3 KIF20A PIK3CD PDPK1 KIF23

PIAS2 PIK3CA PIK3CB UBE2I HIC1 PIK3R3 PIK3R5 PIK3CG PIK3R2 PIK3R1 PIAS3 SUMO1 MITF PLK1 PIAS1 EPHA5 EPHA2 EGFR EPHA3 EPHA7 PIAS4 SPDL1 EPHA4 EPHA1 EPHA6 EFNA2 SKA2 DYNC1I2 KIF18A HDAC8 EGF EFNA5 SMC1A FOXL2 MTA1 B9D2 NUDC LYN FYN PPP2R5D EFNA4 CLASP2 CLIP1 DYNC1H1 SMC3 YES1 STAG2 SP3 NGEF MAD2L1 GSK3B PPP1CC CENPT MYH14 PPP2R5CDYNC1I1 AURKB RAD21 HSP90AA1 PAFAH1B1 PDS5B TP53BP1 RHOA EPHA10 CENPP DYNC1LI1 EFNA3 CENPH ERCC6L MYL12B DYNLL2 STAG1 PDS5A MYL9 NDC80 MAD1L1 CENPE BUB1 BUB3 BIRC5 MYL6 EFNA1 CENPC RANGAP1 KDR MYH11 CENPF CENPI ROCK1 EPHA8 AHCTF1 SPC25 ZWILCH MAPRE1 DYNC1LI2 NUF2 SKA1 ROCK2 NDEL1 PPP2R5B RCC2 SHB MYH9 ZW10 CENPO CENPK NSL1 TAOK1 SEC13 CENPA CENPU CENPQ MYH10 SPC24 VEGFA CDCA5 ESPL1 PTK2 PMF1 DYNLL1 GRB2 KIF2A CDCA8 KIF2C PPP2R1B CLASP1 ITGB3BP INCENP VCL DSN1 CENPM PPP2CB BUB1B CKAP5 SRC CENPN KSR2 ITGB3 KIF2B IQGAP1 RAP1B ZWINT PPP2R5E ATM KNTC1 NDE1 SOS2 KSR1 APBB1IP MIS12 PPP2CA PPP2R5A MARK3 NRASARAFPEBP1 ATR CENPL PPP2R1A SOS1 MAPK1 XPO1 GML KRAS HRAS SIRT1 ARRB2 BRAF MRPL18 MAP2K2 MAPKAPK2 BAG3 RAF1 ELK1 MAPK3 CRYBA4 HSPB1 MAP2K1 ST13 BAG4 YWHAB ARRB1 CSK TNFRSF21 DNAJB1 HSPH1 FGB FGG BAG2DNAJB6 HSPA1L HSF1 CNKSR2 ITGA2BRAP1AFGA FN1 NUP54 CNKSR1 CCAR2 NUP188 TLN1VWF HSPB2 BAG5 HSPA5 SEH1L RPS19BP1 HSPA12BHSPA2 HSPA14 HSPA1AHSPA4L NUP160NUP155NUP210 HSPA7 NUP88 COL4A6 DEDD2 BAG1 HSPA8 NDC1POM121C HSPA13 NUP214NUP85 NUP62 NUP35 HSPA9HSPA6 NUP37 RLN1 SNUPN DDX20 HSPA12A RANBP2NUP98 SNRPBGEMIN7 DNAJC2 TPRNUP43 WDR77 SERPINH1 HSPA1B NCBP1 NUP205 NUP93 TNF FKBP4 CDK8 NUPL2 RAE1 SNRPG NCOA6 AAAS GEMIN8SNRPD2 GEMIN5 HSPA4 NUP107 SNRPE CLNS1A CHD9 NUP50DNAJC7 GEMIN4 SMN2GEMIN2 WNT1 PHAX TBL1X MED20 NUP133 POM121 PRMT5 NCOA3 SNRPD1SNRPD3 TGS1 NUP153 GEMIN6 MED9MED7 CCND3 EGR2 PPARA MED17 NCBP2SNRPF MED28NCOA2 MED15 MED19 NCOA1 MED23 MED31 MED10 HELZ2 MED13 MED21 CDK4 SMARCD3 CEBPB KLF4 MED12 MED16 LEP UBB CD36 CREBBP LPLADIPOQ ANGPTL4 FABP4 PCK1 MED25FAM120BTHRAP3 EBF1 CEBPA KLF5 PPARGC1A PLIN1

MED6 MED11 CDK19 MED22 PPARG CEBPD NFKB1 U2AF1L4 SRRM1 MED13L MED14 CCNC HNRNPH2 HNRNPL EP300 MED24 MED30 GTF2F2 CD2BP2 MED8 MED26 SRSF5 CARM1 MED27 MED1 MED4 ZNF638 HNRNPC MED18 MED29 SREBF1 ALYREF PCBP2 TBL1XR1 RXRA SUGP1 HNRNPA0 HNRNPKSNRPA CCAR1 PSMC2 YBX1 SRRM2 ADIRF ZNF467 TGFB1 PSME1 SRSF7 HNRNPDHNRNPUL1 WNT10B HNRNPR SRSF10 RELA PSMB2 PSMC3PSMB3 HNRNPU HNRNPF NR2F2 HNRNPA2B1 PSMB8PSMB1 HNRNPH1 SREBF2 PSMA2PSMA4PSMA7 PSMD6 PSMB11 SRSF9 DDX5 PSMB6 PSMD7 HNRNPM SRSF2 SRSF3 PSMB7 CDC20 PSMD4 HNRNPA3 U2AF2 PRPF40A PSMB10CDC26 ANAPC2 ANAPC7 SRSF6 RPS6 NOC4L UBE2C PCBP1 HNRNPA1 FUS DHX37 RPS2 PSMA3 CDC23 SRSF1 DDX47 ANAPC10 UBE2D1 EMG1 PSMD13 CDC27 PSME3 SRRT FBLWDR3NOP14HEATR1 PSMC4 RBM5 RRP9 UTP20 ANAPC5ANAPC11ANAPC1 PSMD11 SF1 PTBP1 DHX9 PSMD12 UBE2E1PSMD14 SNRPC GTF2F1 WDR43 ANAPC4 PSMA1 RPS11 CDC16 PSMD8 RBMX UTP3 RPS27 NOP58 ANAPC16 SNRNP70 NOL6 EXOSC8 SKIV2L2 PSMC6 PSMD2PSMD10 PSMA6 DDX49 ANAPC15 PSMB9 EXOSC2 RPS13RRP1 PSMA5 U2AF1 RNPS1 RPS15ARPS28 PSMD9 PSMC5 EXOSC1 RPS27L BOP1RBM28 EXOSC7RPS3ANIP7 PDCD11RPS8PWP2 PSME2 PSMD3 MPHOSPH6 EXOSC4 PSMB4 PSMA8 POLR2L RPS24 PSME4 POLR2H FTSJ3EXOSC10RPS16RPS23 PSMD5 POLR2K MPHOSPH10 EXOSC9EXOSC6 NOP56UTP6 PSMC1 PSMF1 POLR2E C1D RPS5EXOSC3 PSMB5 RRP7A POLR2JPOLR2I POLR2D WDR36 NCL NOL12UTP18WDR75 NOL11 RPS14 IMP4 RPS9 POLR2C POLR2F UTP14AUTP14C PSMD1 POLR2B POLR2G DCAF13 BMS1 RPS7 DDX52FCF1 IMP3 EXOSC5TBL3 WDR46 KRR1RRP36 XRN2 TSR1 UTP15 DIEXF RCL1 RPS18 PRPF19 RIOK3 ERCC3 POLR2A RPS27A PNO1 ERCC6 RIOK2 BYSL RPS15 MNAT1 UBC RPS17 GTF2H5 RPSA UBA52 RPS3 RPS26 RPS4Y1 CDK7 LTV1 ERCC8 RIOK1 RPS21 GTF2H2 GTF2H1 RPS25 GTF2H4 PPIE RPS10 RPS20 CSNK1E HMGN1 FAU RPS12 TCEA1 RPS4Y2 GTF2H3 NOB1 CCNH ERCC2 USP7 RPS29 CSNK1D ZNF830 RPS4X RPS19 XPA AQR XAB2 UVSSA

CUL4B ISY1

DDB1 CUL4A

RBX1COPS7B INO80C TFPTINO80E INO80 ACTL6A INO80B ACTR8

RUVBL1 INO80DNFRKB COPS2 ACTR5 RAD23A COPS5 XPC ACTB COPS7A GPS1

COPS4 RAD23BCOPS3 COPS6 CETN2COPS8 MCRS1 DDB2 YY1

TERT DKC1

NHP2

RUVBL2 WRAP53

CNOT8 PARN LSM4 PLCB3 NOL9 MYC ARPP19 GANAB PSIP1 PTPN1 LSM7 TCF7 CNOT6 LSM2 CNOT1 PAN2 MARCKS WDR18 TCF7L1 ZCCHC11 CNOT6L BANF1 CNOT4 GNA11 CANX EIF4E CCND1 PTK6 EIF4A1 KPNA1 DCP1B GNA15 LEF1 TEX10 CCNB1 CALR EDC4 GNL3 PAIP1 EIF4B DDX6 DCP2 DDX21 TCF7L2 CNOT10 CNOT2 MASTL EIF4A3 PABPC1 CTNNB1 PDIA3 DCP1A PRKCA PAN3 PLCB1 SRMS EDC3 XRN1 EBNA1BP2 SENP3 ENSA CNOT7 LSM6 EIF4G1 LSM3 CHRM3 PRKCSH HMGA1 EIF4A2 CNOT11 LAS1L CNOT3 PATL1 LSM5 ZCCHC6 PELP1 LSM1 PLCB2 CDK1 TNKS1BP1 GNAQ GNA14 ARHGEF18 ARHGEF39 FGD2 ARHGEF17 ARHGEF16 ARHGEF10L FGD3

ARHGEF37 PLEKHG5 FGD4 TRIO OBSCN FGD1 ARHGEF15 ABR FAS ARHGEF40 ARHGEF19 ARHGEF35 CCNE1 ARHGEF2 ITSN1 CCND2 ARHGEF9 TIAM2 ARHGEF12 ARHGEF10 ARHGEF5 SOS1 CCNE2 ARHGEF4 KALRN ARHGEF7 ARHGEF1 ORC2 CDC6 MCM5 ARHGEF3 ECT2 PLEKHG2 ORC3 MCM6 MCM2 ARHGEF26 RAC1 ORC5 FASLG ORC6 RASGRF2 HIF1A ORC1 ARHGEF11ARHGEF33 RHOU ARHGEF6 GNA13 MCM8 IL17F CCNA2 MCM4 MMP9 RORA LIF CDK2 MCF2 CCNA1 NET1 ARHGEF38 ORC4 CDT1 MCF2L JUNB FZR1RB1 LBP RORC OSM POU2F1 TIMP1 ICAM1 IRF4 PREX1 AKAP13 CDKN1A CDKN1B PAK1 SAA1 MCM3 SOS2 HGF BCL2L1 TNFRSF1B NDN MYC HSP90B1 MMP2 BCL6 CCND1 IFNA21 NOS2 RCHY1 IL23R MMP1 SOX2 TWIST1 SOCS3 IFNA14 IFNG VAV1 EFNA4 BATF PPP1R12A EPHA7 IFNA1 IFNA17 PIM1 FGF2 LCN2 NANOG VAV3 MMP3 S1PR1 SHB RAC2 IFNA16 IL17A IFNA6 IL6R YES1 MUC1 IFNB1 MYH14 MYL6TIAM1 EFNA1 EPHA3 IL23A MCM7 RAC3 IFNGR1 ZEB1 TSC2 IFNA7 EPHA10 IFNA2 MYL9 LYN POMC STAT3 CBLB MCL1 FOXO4 VEGFA EFNA2 IFNAR1 IFNGR2 MYH11PTK2 NGEF CD28 IFNA5 IL6 GSK3AFOXO3 ROCK1 IFNA8 CD3E CD3D MLLT4 EFNA5 IFNAR2 HSP90AA1 MYL12B RICTOR ROCK2 EPHA2 EPHA1 TP53 STAT5A PIK3R2 MYH9 JAK2 AKT1S1 MYH10 SPG7 TYK2 CD3G EPHA5 IFNA4 IFNA10 IL10 FOXO1 AFG3L2 IL13RA1 PIK3CA EPHA4 EPHA6 YME1L1 JAK1 PIK3R1PIK3R3 PIK3R5 KDR SOCS1 IL2RA ARAP3 CCL11 EPHA8 PHB JAK3 CASP9 IQGAP1 EFNA3 IGHG1 VAV2 RHOA IL2RG IL2RB PIK3CD RAP1A PARL PIK3CB RAP1B PMPCA IGHE PLCE1 RRAS2 FOXL2 STAT2 NR4A1 PLD2 STAT1 IL4R PIK3CG ITGB3 MARK3 GATA3 AKT1 CCND3 RRAS None PMPCB SMDT1 RAPGEF4 STAT6 IL2 SRC SP3 HIC1 STAT5B RPS6KB2 TLN1 STOML2 PTGS2 AKT3 FYN BCL2L11 None CSK None CDKN2A ARAF NR2F2 OPRD1 MDM2 MAPKAP1 ADIRF PRR5 KSR1 MAP2K1 CEBPD KLF4 FN1 FGB ZNF638 HRAS MAPK8 IRF9 MTOR MAP2K2KRAS NGF TP53BP1 YWHAB SREBF2 GNB2L1 PEBP1 EGR2 CASP12 CNKSR2 KLF5 PIAS1 IGHG4 PDPK1 FGG NRAS RAF1 ZNF467 CCL22 MLST8 VWF FGA MTA1 PIAS2 ARRB2 MAPK9 TGFB1 SH2D1A PHB2 OPRM1 FCER2 ITGB1 BRAF MAPK10 SREBF1 FOS KSR2 CEBPB EBF1 PIAS4 ALOX5 MAPK3 MAPK1 MITF CAPN1 IL4 COL1A2 ITGA2BAPBB1IP CNKSR1 NGFR CAPN2 FSCN1 ARRB1 SLAMF1 MED28 PIAS3 IL13 VCL NCOA6 ANXA1 VIM BIRC5 HMOX1 MED10 CDK8 MED21TBL1X SMARCD3 GSK3B CREB1 TGS1 WNT1 UBE2I ITGB2 MED6 CCNC NCOA1PPARG ALOX15 CREB5 MAP2K7 CDK19 MED24 SUMO1 CREB3L1 MED4 MED23 MED7 BAK1 LAMA5 CREB3 MED11 ITGAX MAOA CREB3L2 MED9 WNT10B F13A1 CREB3L3 MED15 MED16 RGPD5RGPD1 BAX CREB3L4 PRKCQ BDNF TBL1XR1MED29 LEP MED17MED13L RGPD4 IL18 PTK6 MAGED1 CEBPA VCAM1 CD36 FABP4 CREBBP RGPD2 PPARGC1A ANGPTL4 RGPD3 NCOA2 MED20 JUN PPP1CB BAD PLIN1PCK1ADIPOQ TNF RGPD8 ITGAM MED12 PPP1CA DOK1 MED19 SUMO4 CBL ACOX1 LPL PPP1CC MED8 MED18 MED31 MED14 MED1 SUMO2 FAM120B MED27 MED30MED26 SUMO3 CDK4 ARAP1 BCL2 THRAP3 PTPN1 MED13 PPP2R1B RANGAP1 PLK1 CENPO KRT18 CFTR CACNA1D PPP1R1B GRIA3 MED22 NCOA3 NSL1 RXRA MED25 ITGB3BP MAP3K5 CHUK GRIN3B CACNA1F PPARA HELZ2 KIF18A PPP2R1A SKA1 RYR2 GRIA4 MAD2L1 PPP2CB CHD9 CENPE GRIN2A CENPM CKAP2 ATP2B1 UGT1A1 NUF2 CACNA1S EP300 DYNC1I2 BUB3 NFKB1 UGT1A5 GRIN2C UGT1A8 CENPI CENPN DYNLL1 UCK2 UGT1A9 CENPC BUB1 CDK2/3 PRKACA RELA SRMS GNL3L PLN CYP2A6 TK1 UGT1A4 SPDL1 CARM1 DYNC1LI2 CENPH PRKACB ACOX3 KNTC1 INCENP CLIP1 NFATC1 ORAI1 UGT2A1 NDUFA11 PRKACG SOX9 NDEL1 RANBP2 UGT1A10 PPP2R5B STAG2 CLASP1 UBXN6 NFKBIA UGT2B28 KIF2B UMPS UPP1 TYMP CENPL CKAP5 F2R VCP PPP2R5D MIS12 DNAJC7 PPP2R5E AAASNUP93 GRIN2BGRIN2D CDA CES1 UGT2B7 BUB1B TNFRSF21 TK2 GUSB DYNC1LI1 PDS5B IFT81 ATP2B3GRIA2 NFKBIB UPP2 TAOK1 POM121C MRPL18 ATP2B2 UGT1A3 SMC3 HSF1 ATM ATP1B3 PTCH1 ATP2B4 AMH UCKL1 CES2 NUP107 BAG3 CACNA1C SVIP MAD1L1 CENPP NUP205 UGT1A7 HDAC8 STAG1 NDC80 NPLOC4 HSPH1NUP54 ATP1B4 GRIN1 SLC9A1 DPYD UGT2B10 ZWILCH CENPA GRIA1 ZWINT ESPL1 HSPA8 PDS5A NUP133 HSPA1L FKBP4 HSPB2 TRAF2 FXYD1 UCK1 DYNLL2 BAG5 HSPA9HSPA12B GRB10 GLI3 UGT2B11 RCC2 HSPA1B NSFL1C DNAJB6 ATP1B2 GRIN3A UGT2B17UGT1A6 NUP37 HSPA5 BAG2 COL4A6 DEDD2 INS PPP2R5A CENPK HSPA4LHSPA4 POM121 SMC1A HSPA7 HSPA1A PDE4DPDE4CPDE4B UGT2A3 UGT2B4 KIF2C NUP98 HSPA13 ATP1A4 PDE3B UFD1L DPYS DSN1 CENPU DYNC1H1 NUP160 NUP88 HSPA2 NUP188 CRYBA4 CDCA8 HSPA14 MAPKAPK2 RAD21 MAVS UGT2B15 KIF2A SEH1L HSPA6 SIRT1 FXYD2 PDE3A IFIH1 PMF1 MAPRE1 HSPA12A RPS19BP1 CYP3A4 SKA2 NUP35 DNAJB1 RLN1 IRS2 PDE4A DYNC1I1 NUP210NUP43 NUP50 TPR ST13 ATP1A1 DDX58 UPB1 CENPT CENPQ NUP85 DNAJC2 CCAR2 ATP1B1 PPP2R5C NUP155 NDC1 HSPB1 NDE1 AURKB INSR ZW10 NUP214 IRS1 HHIP CLASP2 BAG1 BAG4 SERPINH1 NUP153 GML ATP1A3 SPC25 AHCTF1NUDC B9D2 NUP62 ATP1A2 GLI1 CENPF NoneCAMK2BCAMK2D SEC24B SPC24 CDCA5 RAE1 CAMK4 SEC13 SEC24C ERCC6L NUPL2 CAMK2G SAR1A PREB PPP2CA PAFAH1B1 ERN1 ADCY8 SAR1B CALML4 SEC23A ADCY5 ATR ADCY7 CALML6 CALML3 ADORA2A SEC31B NUPL1 ADCY9 SEC23BSEC24D EDNRA ADCY6 ADCY2 PTGER2 IRF7 ADCY1 CALML5 GLP1R SEC31A SENP2 ADCY3ADCY4 ADRB1 TSHR SEC24A XPO1 CALM2 HCAR2 GHSR GNAS MC2R ADORA1 XBP1 HTR1D GNAI2GNAI1 SSTR2 NPR1 GHRL FSHB PSMA4PSMB7 HCAR3 ADRB2NPY1R DRD1 PSMB10 PSMD13 PSMA6PSME1 GNAI3 GABBR2 FSHR CDC16PSMF1 DRD2 HTR1B VIPR2 PSMA8 PSMB11 CHRM2 NXF1 PSMC5PSME2 NPY NXF2 PSMD4PSMB8UBE2C GPR119ADCYAP1R1 DDIT3 NXF5 PSMB2 PSMA5 SUCNR1 HCAR1 HTR4 PSMB1 PSMB5UBC PABPC1L PSMB9 PSMD8 HTR1E GABBR1 HTR1F NXF3 PSMA2UBBCDC20 ANAPC5 SSTR5 CHRM1 DRD5 CNOT1 CNOT6 ANAPC4CDC27CDC23 PSMB3 HTR1A HTR6 PABPC3 RPS3 GIPR ZCCHC6 EIF4A3 PSMC3 SSTR1 OXTR UBE2E1 ANAPC10 PTGER3 CNOT3 ANAPC15ANAPC1UBE2D1 UBA52 FFAR2 TNKS1BP1 PSMA7 PSMD6PSMD7 CNOT10 ANAPC2 PABPC4 PSME4 ANAPC11CDC26 ANAPC7 PSMC4 UPF3B NXT2 NXT1 PSMD9 CNOT8 PAN2 EIF4G1 ATF6 PSMC2 PSMD10 PABPC5 PAIP1 PSMD3 PSMD14 HNRNPM EIF4A2 PARN PSMD12 ANAPC16RPS27A PSMD1 UPF3A PSMB4 CNOT4 ATF6B PSMC1 PSMD2 FXR1 EIF4B PSMA1 PSMD5PSMD11 PSMA3 PABPC1L2B THOC5 PSME3 EIF4E PABPC1 CNOT7 CNOT2 MAGOHB PSMB6 CYFIP2 PSMC6 EIF4A1 IRAK4 None CNOT11 IRAK1 FXR2 MAGOH CYFIP1 PAN3 ZCCHC11 EIF4G2 CNOT6L WFS1 UPF2 RBM8A EIF4E2 FMR1 EIF4G3 RPS29 RPS17 ALYREF EIF4E1B RPS12 PNO1 RPS10 FAU DDX39B RPS20 BYSL RIOK2 TRAF6 RPS4X PPIE UPF1 LTV1 RPS18 RPS15 XAB2 UVSSA WIBG RPSA

CASC3 RIOK1 RPS25 ISY1 TCEA1 NFE2L2 ZNF830 THOC1 RPS4Y2 NOB1 PRPF19 ATF4 RPS9 RIOK3 TBL3 RPS6 PWP2 AQR UTP6NOL6UTP15 MYD88 DHX37 RPS14 EEF1A2 CSNK1D RPS26 ERCC6 HMGN1 EXOSC6BOP1 TNFAIP3 THOC2 RPS27LRRP1 XRN2DIEXF KRR1 FCF1 C1DRPS8RPS27 GTF2H1 RAN NOP58 RPS24EXOSC1UTP14A CSNK1E USP7 RPS15A NOC4L TSR1 XPOT RRP9 GTF2H5 EIF2AK2 RPS28 EXOSC2 CCNH EIF2AK3 NOL11BMS1 EXOSC5RPS23 WDR43 GTF2H2 THOC6 PDCD11 RPS16EXOSC4RRP36 RPS4Y1 THOC7 UTP20 CDK7 RPS11EXOSC3 GTF2H4 RBM28HEATR1 RPS3A RPS5 WDR46 RPS19 IMP4 FBLDDX52SKIV2L2EXOSC9NOL12 ERCC2 EIF2AK1 EIF2AK4 PHAX DDX47 TLR9 MNAT1 XPO5 UTP14CWDR36UTP18EXOSC8 GTF2H3 EXOSC7 WDR3 ERCC8 PPP1R15A MPHOSPH6NCL RPS21 WDR75 RPS13 TLR4 EMG1 RCL1 POLR2K XPA NIP7EXOSC10RPS7 UTP3 DDX49 IMP3 NOP14 TLR2 ERCC3 NMD3 DCAF13 MPHOSPH10 POLR2G POLR2E FTSJ3 POLR2A EEF1A1 NOP56RPS2 TLR7 POLR2J POLR2D RRP7A DDB1 POLR2B POLR2L CUL4A EIF2S1 POLR2C INO80CTFPT DDB2 NCBP1 POLR2I EIF3G HNRNPA1 EIF3A U2AF2 POLR2H INO80E XPC CETN2 EIF3F EIF2B4 POLR2F NCBP2 COPS7A COPS2 EIF3H EIF3I RBX1 EIF2S2 ACTL6AYY1 INO80RAD23B SAP18 CUL4B INO80DNFRKB COPS5 SRSF2 DDX5 ACTB EIF3B SNRPB EIF2S3 EIF2B2 SNRPD2 INO80B EIF1AX EIF3CL HNRNPL COPS3 RUVBL1 SRRM1 GPS1 ACTR8 EIF1AY EIF2B3 EIF3D SRRM2 HNRNPA0 EIF5 EIF2B1 ACTR5 RBMX COPS4MCRS1 EIF1 EIF3C RNPS1 SRSF10 COPS6 EIF1B EIF3E EIF2B5 HNRNPA3 SRSF5 PCBP2 COPS8 COPS7B ACIN1 EIF3J HNRNPD HNRNPU SNRPA RAD23A SNRPF SRSF9 SRSF6 U2AF1L4 PNN SNRPG GTF2F1 YBX1 SF1 GTF2F2 SNRNP70 SNRPD1 DKC1 HNRNPH2 PRPF40A SRSF7 SNRPD3 TERT SRRT HNRNPH1 PCBP1 HNRNPUL1 WRAP53 HNRNPK HNRNPA2B1SNRPE HNRNPF SRSF3 DHX9 HNRNPR CCAR1 RUVBL2 SUGP1 PTBP1 SNRPC FUS NHP2 RBM5 HNRNPC CD2BP2 SRSF1 U2AF1

LSM4 PLCB3 GEMIN5 PELP1 GMPS TXNDC5 TNFRSF10B ARPP19 GANAB PSIP1 IL12A SNUPN LSM7 TEX10 ERP29 LSM2 GEMIN7 TPMT MARCKS BANF1 XDH CANX GNA11 SMN2 ERO1L TNFRSF10C KPNA1 DCP1B GNA15 EBNA1BP2 SENP3 IMPDH1 CCNB1 CALR FCGR2B EDC4 GEMIN6 HPRT1 DDX6 DCP2 IMPDH2 PDIA6 ERO1LB TNFSF10 NOL9 GNL3 DDX21 MASTL KPNB1 PDIA3 DCP1A PRKCA DDX20 PDIA4 PLCB1 EDC3 XRN1 ENSA LSM6 GEMIN2 TNFRSF10A LSM3 CHRM3 ITPA PRKCSH HMGA1 GEMIN8 P4HB IL12B WDR18 PATL1 LSM5 LAS1L TNFRSF10D LSM1 PLCB2 CDK1 GNAQ STRAP GNA14 GEMIN4 SRMS

CDK2

CSNK2A1

DVL1

DVL2 CSNK2A3 CSNK2A2 CSNK2B

CTNNB1 EGFR PTEN CSNK1E SNCA CSNK1D ACTB ACTG1 TP53

SRC VEGFA HIST3H3 PRKCE PTK2 PRKD2 AKT1 MAPK1 FOXO3 PSMA3 SMO SEC13 RAD23A SFN SESN2 CDKN1B

ROCK1 PTK2B MAPK14 CHEK2 CDC42 PEBP1 BAD ATM PRKCA CCNB1 CDK1 CDKN1A NUP133 SEH1L CIT

HSP90AA1 CBX8 PHC1 SCMH1 RNF2 RING1 PHC3 CBX2 PHC2 TP53BP1 MARCKS RPS27 MASTL NUDC CLIP1 HMGA1 PHAX MYH9

HNRNPK RRP9 ENSA Graphical User Interface Inputs Workflow Buffard, figure S7

Target proteins (from phosphoproteomic a analyses) 1. Select pathways from databases

b 2. Embed pathways

c

3. Search for paths from source to targets d

4. Extract subnetwork corresponding to GO terms or specific target

e Outputs Cytoscape readable and f modifiable files Buffard, figure S8 Supporting Information to Buffard, et al.

Supplementary Text

Graphical interface

In this section, we briefly present the different implemented options for the graphical interface (Suppl Figure 7). First, the user selects the data file (Browse button) containing the UniProt AC of the differentially phosphorylated proteins (step a) and the output folder. Then, the user chooses (1) the pathway databases by ticking the boxes allowing the choice of KEGG, Pathway Commons or both; (2) the pathway selection mode, by ticking radio buttons (the “all pathways” option will embed all pathways from the selected databases) (step b). To avoid the lack of connectivity of the source in the prior-knowledge network, we also included the possibility of directly adding protein interactions (e.g. with a kinase and its substrates). The protein components of these added interactions will be taken into account for the pathway selection only if the “enriched pathways” mode has been selected (step c). Next, the user sets all the parameters for the shortest path search from a source to the list of the differentially phosphorylated proteins: the source node (e.g., UniProt AC Q9H3Y6 for SRMS), the list of proteins for promoted edges (a personalized list and/or a selected pre-established list of kinase/phosphatase). The “overflow option” allows one to retrieve the near-shortest paths instead of the “strict” shortest paths to generate a set of alternatives, selecting all paths for which the total distance is up to xx% higher than that of the shortest path (step d). Finally, the “shortest path” extraction can be tuned by selecting a subset of targets and/or categories (regrouped GO terms) (step e). We have already included some GO term groups representing relevant processes and functions in cancer. All generated networks are able to be explored and manipulated using Cytoscape.

Legends to supplementary figures

Supplementary Figure 1. List of ad hoc weights of edges

An ad hoc weight is introduced on each edge based on the nature of the source and target node. “Normal” edges have a distance of 5 (d = 5), edges coming out of identified proteins (d = 3), edges reaching an identified protein while coming out of a tyrosine kinase or phosphatase (d = 2) or combining these two conditions (d = 1). Edges reaching a target identified as differentially phosphorylated, but which did not come from a kinase or phosphatase (d = 8), even if they came out of another identified protein (d = 6).

Supplementary Figure 2. Identification of specific paths from the PIK3CA mutants to the receptor tyrosine kinase HER3.

The protein interaction networks are composed of nodes and edges. The nodes represent the proteins whose diamond or rounded rectangle shape correspond, respectively, to the experimentally identified targets or to the proteins of the pathway databases. The edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown sign). (A) Subnetwork of the signal propagation from the E545H mutant to HER3, extracted from the signaling network obtained from the quantitative phosphoproteomic comparison between the E545H mutant and the control condition. This subnetwork contains all the near-shortest paths allowing a 20% overflow. (B) Subnetwork of the signal propagation from the H1047R mutant to HER3, extracted from the signaling network obtained from the quantitative phosphoproteomic comparison between the H1047R mutant and the control condition. This subnetwork contains all the near-shortest paths allowing a 20% overflow. (C) Subnetwork of the signal propagation from the H1047R mutant to HER3, extracted from the signaling network of this mutant obtained from the quantitative phosphoproteomic differences between the E545H and H1047R mutants.

Supplementary Figure 3. Signal propagation from the H1047R PIK3CA mutant to HER3.

Subnetwork of the signal propagation from the H1047R mutant to HER3, extracted from the signaling network of this mutant obtained from the quantitative phosphoproteomic differences between the E545H and H1047R mutants. This subnetwork contains all the near-shortest paths allowing a 20% overflow. The protein interaction networks are composed of nodes and edges. Nodes represent the proteins whose diamond or rounded rectangle shape correspond, respectively, to the experimentally identified targets or to the proteins of the pathway databases. Edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown consequence).

Supplementary Figure 4. SRMS prior-knowledge network obtained from embedding the database pathways enriched in the list of SRMS indirect targets.

The protein interaction networks are composed of nodes and edges. The red diamond-shaped nodes represent the experimentally identified indirect targets of SRMS (phosphorylated on serine/threonine residues). The edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown consequence).

Supplementary Figure 5. SRMS prior-knowledge network enriched with the

SRMS-indirect targets and the SRMS direct substrates.

The protein interaction networks are composed of nodes and edges. The green rounded rectangle- shaped nodes represent the proteins experimentally identified as potential direct tyrosine- phosphorylated SRMS substrates. Red diamond-shaped nodes represent the experimentally identified indirect targets of SRMS (phosphorylated on serine/threonine residues). Edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown consequence).

Supplementary Figure 6. Subnetwork of the paths from SRMS to its indirect targets and through the four CK2-subunits.

The protein interaction networks are composed of nodes and edges. The green rounded rectangle- shaped nodes represent the proteins experimentally identified as potential direct tyrosine- phosphorylated SRMS substrates. The red diamond-shaped nodes represent the experimentally identified indirect targets of SRMS (phosphorylated on serine/threonine residues). The CK2 subunits are light blue in color. Edges of the networks represent the protein interactions whose target arrow shape corresponds to the sign of the interaction (Delta, positive interaction; T, negative interaction; Circle, unknown consequence).

Supplementary Figure 7. Graphical interface

Outline of the graphical interface steps and options. Steps (a) to (e) must be entered by the user as inputs for the different corresponding steps of the workflow (black arrows). Step (f) represents the workflow output results, creating files corresponding to the different subnetworks created by the user’s entries and choices (in step (e)). These files can be visualized and manipulated with Cytoscape (cytoscape.org).

Supplementary Figure 8. Diagrammatic representation of the algorithm

The algorithm is dependent on the user input. Input and output are represented by parallelogram, alternatives by diamond. Optional inputs and outputs are represented by dashed lines.