<<

bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

INKA, an integrative data analysis pipeline for phosphoproteomic

inference of active phosphokinases

Thang V. Pham1,2, Robin Beekhof1,2, Carolien van Alphen1,2, Jaco C. Knol1, Alex A.

Henneman1, Frank Rolfs1, Mariette Labots1, Evan Henneberry1, Tessa Y.S. Le Large1, Richard

R. de Haas1, Sander R. Piersma1, Henk M.W. Verheul1, Connie R. Jimenez1 *

1 OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam,

VU University Medical Center, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands

2 These authors contributed equally to this work

* Correspondence should be addressed to C.R.J. ([email protected])

Running Title: Integrative tool ranks active

Character Count: 50,356

1 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Abstract

2

3 Identifying (hyper)active kinases in cancer patient tumors is crucial to enable individualized

4 treatment with specific inhibitors. Conceptually, activity can be gleaned from global

5 phosphorylation profiles obtained with mass spectrometry-based phosphoproteomics. A

6 major challenge is to relate such profiles to specific kinases to identify (hyper)active kinases that

7 may fuel growth/progression of individual tumors. Approaches have hitherto focused on

8 phosphorylation of either kinases or their substrates. Here, we combine kinase-centric and

9 substrate-centric information in an Integrative Inferred Kinase Activity (INKA) analysis. INKA

10 utilizes label-free quantification of phosphopeptides derived from kinases, kinase activation

11 loops, kinase substrates deduced from prior experimental knowledge, and kinase substrates

12 predicted from sequence motifs, yielding a single score. This multipronged, stringent analysis

13 enables ranking of kinase activity and visualization of kinase-substrate relation networks in a

14 biological sample. As a proof of concept, INKA scoring of phosphoproteomic data for different

15 oncogene-driven cancer cell lines inferred top activity of implicated driver kinases, and relevant

16 quantitative changes upon perturbation. These analyses show the ability of INKA scoring to

17 identify (hyper)active kinases, with potential clinical significance.

18

19

20 Keywords

21 cancer / single-sample analysis / kinase-substrate phosphorylation network / drug selection /

22 computational tool

23

2 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24 Introduction

25

26 Cancer is associated with aberrant kinase activity (Hanahan & Weinberg, 2011), and among

27 recurrently altered some 75 encode kinases that may “drive” tumorigenesis and/or

28 progression (Vogelstein et al, 2013). In the last decade, multiple kinase-targeted drugs including

29 small-molecule inhibitors and have been approved for clinical use in cancer treatment

30 (Knight et al, 2010). However, even when selected on the basis of extensive genomic

31 knowledge, only a subpopulation of patients experiences clinical benefit (Huang et al, 2014;

32 Valabrega et al, 2007; Flaherty et al, 2010), while invariably resistance also develops in

33 responders. Resistance cannot only result from mutations in the targeted kinase or downstream

34 pathways, but also from alterations in more distal pathways (Al-Lazikani et al, 2012; Trusolino

35 & Bertotti, 2012; Ramos & Bentires-Alj, 2015). This complexity calls for tailored therapy based

36 on detailed knowledge of the individual tumor's biology, including a comprehensive profile of

37 (hyper)active kinases. MS-based phosphoproteomics enables global protein phosphorylation

38 profiling of cells and tissues (Jimenez & Verheul, 2014; Casado et al, 2016), but to arrive at a

39 prioritized list of actionable (combinations of) active kinases, a dedicated analysis pipeline is

40 required as the data are massive and complex. Importantly, a prime prerequisite for personalized

41 treatment requires that the analysis is based on a single sample.

42 Different kinase ranking approaches have been described previously. Rikova and colleagues

43 sorted kinases on the basis of the sum of the spectral counts (an MS correlate of abundance) for

44 all phosphopeptides attributed to a given kinase, and identified known and novel oncogenic

45 kinases in lung cancer (Rikova et al, 2007). This type of analysis can be performed in individual

46 samples, but is limited by a focus on phosphorylation of the kinase itself, rather than the (usually

3 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

47 extensive) set of its substrates. Instead, several substrate-centric approaches also exist, including

48 KSEA (Casado et al, 2013; Terfve et al, 2015; Wilkes et al, 2015), pCHIPS (Drake et al, 2016),

49 and IKAP (Mischnik et al, 2016).

50 Neither a kinase-centric nor a substrate-centric phosphorylation analysis may suffice by itself to

51 optimally single out major activated (driver) kinase (s) of cancer cells. To achieve an optimized

52 ranking of inferred kinase activities based on MS-derived phosphoproteomics data for single

53 samples, we propose a multipronged, rather than a singular approach. In this study, we devised a

54 phosphoproteomics analysis tool for prioritizing active kinases in single samples, called

55 Integrative Inferred Kinase Activity (INKA) scoring. INKA combines direct observations on

56 phosphokinases (either all kinase-derived phosphopeptides or activation loop peptides

57 specifically), with observations on phosphoproteins that are known or predicted substrates for the

58 pertinent kinase. To demonstrate its utility, we analyzed cancer cell line 'cases' with known

59 driver kinases in a single-sample manner, such as would be required in a personalized medicine

60 setting and explored dynamic changes upon perturbation.

61

62 Results

63

64 INKA: integration of kinase-centric and substrate-centric evidence to infer kinase activity

65 from single-sample phosphoproteomics data

66 To infer kinase activity from phosphoproteomics data of single samples, we developed a

67 multipronged data analysis approach. Figure 1 summarizes the data collection (Fig 1A) and

68 analysis workflows (Fig 1B) of the current study. For in-house data generation, we utilized

69 phosphotyrosine (pTyr)-based phosphoproteomics of cancer cell lines and tumor needle biopsies

4 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

70 (Supplementary Table 1). Kinases covered by individual analysis approaches are detailed in

71 Supplementary Table 2.

72 First, phosphopeptides derived from established protein kinases (KinBase, http://kinase.com)

73 (Manning et al, 2002) were analyzed. Kinase hyperphosphorylation is commonly associated with

74 increased kinase activity. This is the rationale for using the sum of spectral counts (the number of

75 identified MS/MS spectra) for all phosphopeptides derived from a kinase as a proxy for its

76 activity, and to rank kinases accordingly, as pioneered by Rikova et al. (Rikova et al, 2007)

77 Second, kinase activation loop phosphorylation was analyzed. Most kinases harbor an activation

78 segment, residing between highly conserved Asp-Phe-Gly and Ala-Pro-Glu motifs.

79 Phosphorylation of residues in the activation loop counteracts the positive charge of a critical

80 arginine in the catalytic loop, eliciting conformational changes and consequent kinase activation

81 (Nolen et al, 2004). To identify phosphopeptides that are derived from a kinase activation

82 segment, we used the Phomics toolbox (http://phomics.jensenlab.org) (Munk et al, 2016).

83 Subsequently, kinases were ranked after spectral count aggregation as described above.

84 Third, as a substrate-centric complement to the kinase-centric analyses above, and similar to a

85 key ingredient in KSEA analysis (Casado et al, 2013), one can backtrack phosphorylation of

86 substrates to responsible kinases as an indirect way to monitor kinase activity. Experimentally

87 established kinase-substrate relationships listed by PhosphoSitePlus (Hornbeck et al, 2015) were

88 used to link substrate-associated spectral counts to specific kinases, followed by kinase ranking.

89 Fourth, another substrate-centric analysis was included to complement the previous step. To

90 date, databases logging experimental kinase-substrate relationships are far from complete,

91 leaving a large proportion of phosphopeptides that cannot be mapped as a kinase substrate

92 through that avenue. Therefore, we applied the NetworKIN prediction algorithm (Linding et al,

5 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

93 2007; Horn et al, 2014) to observed phosphosites to generate a wider scope of kinase-substrate

94 relationships. NetworKIN uses phosphorylation sequence motifs and protein-protein network

95 (path length) information to predict and rank kinases that may be responsible for phosphorylation

96 of specific substrate phosphosites. After applying score cutoffs to restrict the NetworKIN output

97 to the most likely kinase-substrate pairs, kinases were ranked by the sum of all spectral counts

98 associated with their predicted substrates.

99 Finally, we devised a method to integrate the four analyses as described above and to provide a

100 single metric for pinpointing active kinases in biological samples analyzed by

101 phosphoproteomics (Fig 1B, Materials and Methods). For a given kinase, associated values in

102 either of the two kinase-centric analyses are summed, and the same is done for the two substrate-

103 centric analyses. Subsequently, the geometric mean of both sums is taken as an integrated

104 inferred kinase activity, or INKA score. A non-zero INKA score requires both kinase-centric and

105 substrate-centric evidence to be present. Furthermore, a skew parameter is calculated (0 for

106 exclusively kinase-centric, 1 for exclusively substrate-centric, and 0.5 for equal contribution; see

107 Materials and Methods) indicating to which extent the INKA score is derived from kinase-

108 centric or from substrate-centric evidence. For kinases that are missing from PhosphoSitePlus

109 and can not be inferred by NetworKIN prediction as well, a separate kinase-centric ranking is

110 performed to include these MS-observed in the analysis. The latter involves 172 out of

111 538 established protein kinases considered in our analyses (Supplementary Fig 1 and

112 Supplementary Table 2). For kinases inferred through PhosphoSitePlus/NetworKIN but not

113 observed by MS, the reciprocal analysis is not performed as kinases display overlapping

114 substrate specificities precluding unequivocal assignment of a substrate to a specific kinase.

6 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

115 Results are visualized through different plots (Fig 1B). Individual analyses each result in a bar

116 graph with top 20 kinases. Integration by INKA scoring results in a scatter plot for all kinases

117 with an INKA score that is at least 10% of the top scoring kinase (vertical: INKA score,

118 horizontal: skew towards kinase-centric or substrate-centric evidence). For the top 20 kinases (by

119 INKA score), a ranked bar graph and a network of all inferred kinase-substrate connections are

120 visualized as well (Fig 1B). This pipeline is available to users who can upload their data to

121 www.inkascore.org for INKA scoring.

122

123 INKA analysis of cancer cell line use cases

124 To assess performance of the INKA approach, we carried out pTyr-based phosphoproteomics

125 with four well-studied cell lines for which oncogenic driver kinases are known: K562 chronic

126 myeloid leukemia (CML) cells harboring a BCR-ABL fusion (Klein et al, 1976; Heisterkamp et

127 al), SK-Mel-28 melanoma cells carrying mutant BRAF (Carey et al, 1976; Davies et al, 2002),

128 HCC827-ER3 lung carcinoma cells with mutant EGFR (Girard et al, 2000; Zhang et al, 2012),

129 and H2228 lung carcinoma cells with an EML4-ALK fusion (Soda et al, 2007; Rikova et al,

130 2007). Figure 2 displays, on a single row per cell line, bar graphs with the top 20 kinases for each

131 of the four basic analyses (Kinome, Activation Loop, PhosphoSitePlus, and NetworKIN) as well

132 as combined score analysis (INKA). Bars for known driver kinases are highlighted by coloring

133 for each cell line except SK-Mel-28, which is driven by BRAF, a serine/threonine kinase that

134 cannot be detected by pTyr-based phosphoproteomics. In the latter case, downstream driver

135 targets in the MEK-ERK pathway (MAP2K1, MAP2K2, MAPK1, MAPK3) are highlighted (Fig

136 2B). Underlying data can be found in Supplementary Table 3.

7 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

137 From the first column of bar graphs in Fig 2 it is evident that kinome analysis, based on all

138 phosphopeptides identified for a kinase, is a strong indicator for (hyper)active kinases, as was

139 found previously (Rikova et al, 2007; Guo et al, 2008). All drivers rank among

140 the top two. Activation loop analysis (Fig 2, second column) follows a similar trend, but suffers

141 from a focus on a small protein segment. With respect to substrate-centric analyses, both

142 PhosphoSitePlus and NetworKIN analyses (Fig 2, third and fourth columns, respectively) rank

143 driver kinases among the top 10. One of the two analyses often outperforms the other, but not in

144 a consistent pattern, underscoring our choice to use a combination of these approaches. Since

145 substrate-centric inference attributes data from multiple, possibly numerous, substrate

146 phosphosites to a single kinase, bar segments may coalesce into a black stack in more extreme

147 cases.

148 From the above it is clear that individual analysis modules harbor complementary layers of

149 information about phosphokinases and phosphosubstrates that can be integrated using INKA

150 scoring into a more balanced and accurate pattern of inferred kinase activity (Fig 2, last column).

151 Importantly, all driver (-related) kinases rank among the INKA top 3 for the tested cell lines

152 (with MAPK3 serving as a surrogate for the BRAF Ser/Thr kinase in SK-Mel-28).

153 In order to explore the statistical significance of the INKA score, and especially the impact of

154 kinase inference through substrate-centric PhosphoSitePlus and NetworKIN analyses, we

155 permuted both phosphopeptide-spectral count links (for a sample) and kinase-substrate links (for

156 all PhosphoSitePlus and NetworKIN data), followed by INKA score calculation. After 105

157 permutations, sample- and kinase-specific INKA score distributions were obtained and used to

158 derive p-values for each actual INKA score. Based on this calculation, almost all top 20 INKA

8 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

159 scores were significant (Fig 2). Higher INKA scores clearly correlated with lower p-values

160 (Supplementary Fig 2), underscoring the relevance of the INKA score.

161

162 Linking INKA analysis to cell biology and treatment selection

163 To assess to what extent (hyper)active phosphokinases identified by the above INKA analysis

164 represent actionable drug targets, we investigated public cell line drug sensitivity data from the

165 'Genomics of Drug Sensitivity in Cancer' resource (Yang et al, 2013) (GDSC,

166 http://www.cancerrxgene.org, Supplementary Table 4).

167 K562. Clinically, treatment of BCR-ABL-positive CML patients with kinase inhibitors such as

168 nilotinib, imatinib, and dasatinib has shown great benefit (Kantarjian et al, 2010; Saglio et al,

169 2010; Larson et al, 2012), in line with addiction of tumor cells to the BCR-ABL fusion kinase.

170 BCR/ABL activity was inferred in all basic analyses of K562, causing ABL1 to stand out as a

171 prime candidate in the INKA analysis (Fig 2A, Fig 3A). The integrated analysis also inferred

172 phosphorylation of downstream signaling partners of ABL1 such as SRC-family kinases and

173 MAPK1/3 among all members of the kinase-substrate relation network for top 20 INKA kinases

174 in K562 (Fig 3A, enlarged in Supplementary Fig 3). As expected, GDSC data indicate K562

175 cells to be sensitive to various ABL inhibitors (Supplementary Table 4).

176 SK-Mel-28. A vast majority of patients with BRAF-mutant melanoma carry a mutation encoding

177 an activating V600E substitution in the BRAF kinase domain (Forbes et al, 2015; Lovly et al,

178 2012; Rubinstein et al, 2010). Treatment with small-molecule inhibitors of BRAF (Chapman et

179 al, 2011; Hauschild et al, 2012; Flaherty et al, 2012) and/or downstream MEKs (MAP2K1/2)

180 (Flaherty et al, 2012; Robert et al, 2015) often induces strong initial responses, but both single-

181 agent and combination treatments are commonly hampered by development of resistance. The

9 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

182 BRAF Ser/Thr kinase cannot be directly observed with pTyr-based phosphoproteomics, but

183 INKA analysis of SK-Mel-28 revealed high- and medium-ranking activity of its downstream

184 pathway members MAPK3 and MAPK1, respectively, in addition to high activity of focal

185 adhesion kinase (PTK2) and SRC-family members (Fig 2B, Fig 3B). Of the latter, SRC is a

186 central node in the inferred kinase-substrate network of top 20 INKA-scoring kinases in SK-Mel-

187 28 (Fig 3B, Supplementary Fig 4). GDSC data indicate that inhibition of BRAF and downstream

188 MEK1/2 is successful in reducing SK-Mel-28 growth, while inhibition more downstream of

189 MAPK1/3 is less effective (Supplementary Table 4). Single-agent inhibition was not effective

190 for PTK2, and undocumented for SRC-family kinases FYN and YES1. Based on INKA data, one

191 could test 'combination therapy' with a PTK2 or SRC inhibitor and a BRAF or MEK1/2

192 inhibitor. Interestingly, a previous study showed the importance of SRC in BRAF inhibitor

193 resistance in BRAF-mutant melanoma cells and patient-derived tissues (Girotti et al, 2013).

194 HCC827-ER3. Non-small cell lung carcinoma (NSCLC)-derived HCC827 cells expressing

195 mutant EGFRE746-A750 are highly sensitive to EGFR inhibitors (GDSC, Supplementary Table 4).

196 HCC827-ER3 is a sub-line of HCC827 with in vivo acquired resistance to the EGFR inhibitor

197 erlotinib (Zhang et al, 2012), and cross-resistance to sunitinib (van der Mijn et al, 2016). INKA

198 analysis suggests that HCC827-ER3 still contains hyperactive EGFR, with its score ranking

199 second (Fig 2C, Fig 3C), indicating driver potential. Network visualization of inferred kinase-

200 substrate relations (Fig 3C, Supplementary Fig 5) shows that both EGFR and MET are highly

201 connected hyperactive nodes. Based on INKA scoring, acquired resistance of HCC827-ER3 to

202 erlotinib (Zhang et al, 2012; van der Mijn et al, 2016) could potentially be overcome by co-

203 targeting MET, which ranks first. However, high MET phosphorylation was already found in

204 parental, erlotinib-sensitive HCC827 cells (Zhang et al, 2012; van der Mijn et al, 2016)

10 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

205 (Supplementary Table 3E, Supplementary Fig 6), that appear relatively insensitive to MET

206 inhibitors (Supplementary Table 4). Treating HCC827 with gefitinib, which inhibits EGFR but

207 not MET (Davis et al, 2011), not only reduces hyperphosphorylation but also overall levels of

208 MET (Guo et al, 2008). Thus, high INKA ranking of MET may be the result of EGFR-MET

209 crosstalk, rather than indicating driver activity. Interestingly, growth of HCC827-ER3 is

210 inhibited after co-targeting of EGFR and AXL (Guo et al, 2008; van der Mijn et al, 2016).

211 Unfortunately, AXL (hyper)activity cannot be pinpointed by INKA, as both experimental

212 (PhosphoSitePlus) and predicted (NetworKIN) kinase-substrate relationship data are lacking for

213 AXL. However, in a kinase-centric ranking devised for kinases that suffer from such a problem,

214 phospho-AXL ranks second with relatively high counts (Fig 2C, Fig 3C left panel) whereas it

215 was not identified in parental HCC827 cells (Supplementary Fig 6), in line with AXL being a

216 resistance hub in HCC827-ER3 (Guo et al, 2008; van der Mijn et al, 2016).

217 H2228. In NSCLC-derived H2228 cells, an EML4-ALK fusion underlies oncogenic ALK kinase

218 activity. Indeed, several ALK inhibitors have been approved for the treatment of ALK fusion-

219 positive NSCLC. ALK ranked third in INKA analysis (Fig 2D, Fig 3D), which may be due to

220 paucity of information for this kinase (as for AXL, see above) or indicate involvement of other

221 kinases in fueling . GDSC data indicate that effective inhibitors of H2228 growth are

222 rare, even when targeting ALK (alectinib: IC50 4.4 µM, Z-score -1.92; Supplementary Table 4).

223 The kinase-substrate network for top 20 INKA-scoring kinases in H2228 (Fig 3D,

224 Supplementary Fig 7) shows that in addition to ALK there are multiple hyperactive, highly

225 connected kinases (e.g., PTK2, SRC, EGFR) as candidates for combination treatment. Inhibitors

226 targeting these kinases individually do not have a significant effect on H2228 proliferation

227 (Supplementary Table 4). PTK2 and SRC are inhibited by several ALK inhibitors at IC50

11 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

228 concentrations (Davis et al, 2011), ruling these kinases out as co-targets. Interestingly, EGFR, a

229 high-ranking kinase in INKA and a central node in the network, is implicated in reduced

230 sensitivity to ALK inhibition in H2228, while dual inhibition of ALK and EGFR results in highly

231 increased apoptosis (Voena et al, 2013). Furthermore, both simultaneous knockdown and

232 combined drug inhibition of these kinases inhibit H2228 cell proliferation and tumor xenograft

233 growth more potently than targeting either kinase alone (Li et al, 2016).

234

235 Testing the INKA approach with literature data

236 To further test our strategy for prioritizing targeting candidates, we also examined

237 phosphoproteome data from the literature (Guo et al, 2008; Bai et al, 2012) (Fig 4,

238 Supplementary Table 3). INKA analysis of data on EGFR-mutant NSCLC cell line H3255 (Guo

239 et al, 2008) uncovered major EGFR activity in these cells, with EGFR ranking first, followed by

240 MET (Fig 4A). In another study, the rhabdomyosarcoma-derived cell line A204 was associated

241 with PDGFR signaling (Bai et al, 2012), and INKA scoring of the underlying data accordingly

242 ranks PDGFR in second place (Fig 4B). In the same study, osteosarcoma-derived MNNG/HOS

243 cells were shown to be dependent on MET signaling and sensitive to MET inhibitors (Bai et al,

244 2012). In line with this, INKA analysis clearly pinpointed MET as the major driver candidate in

245 this cell line (Fig 4C).

246 Altogether, analysis of public datasets illustrates the capacity of INKA scoring to identify

247 kinases that are relevant oncogenic drivers in diverse cancer cell lines at baseline.

248

249 Testing the INKA approach in differential settings

250 To explore the discriminative power of INKA scoring, we analyzed pTyr-phosphoproteomic data

12 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

251 from wild-type versus mutant cells, and from untreated versus drug-treated cells as genetic and

252 pharmacological dichotomies (Fig 5, Supplementary Table 3). First, we utilized raw data from

253 our laboratory (van der Mijn et al, 2014) to compare wild-type U87 glioblastoma cells with

254 isogenic U87-EGFRvIII cells overexpressing a constitutively active EGFR mutant. The EGFR

255 INKA score was significantly higher and dominating in U87-EGFRvIII relative to U87 (Fig 5A).

256 Some other kinases also exhibited higher-ranking INKA scores, including MET and EPHA2, for

257 which enhanced phosphorylation in U87-EGFRvIII was previously documented (Huang et al,

258 2007; Stommel et al, 2007), as well as SRC-family members. In a treatment setting, INKA

259 analysis clearly revealed a specific drug effect after targeting EGFR in U87-EGFRvIII, with the

260 high, first-rank INKA score for EGFR at baseline being halved after treatment with erlotinib (Fig

261 5B).

262 Second, in order to extend these findings to a clinical framework, INKA scoring was applied to

263 pTyr-phosphoproteomic data on patient tumor biopsies (Fig 5C,D). Biopsies were collected both

264 before and after two weeks of erlotinib treatment (standard dose, clinical trial NCT01636908;

265 Labots et al., submitted for publication). Importantly, the on-treatment biopsy from a patient

266 with advanced head and neck squamous cell carcinoma showed a reduced INKA score and rank

267 for EGFR as well as -associated kinases (Fig 5C). Interestingly, in a pancreatic cancer

268 patient, no residual EGFR activity could be inferred by INKA in a tumor biopsy after erlotinib

269 treatment (Fig 5D).

270 Third, we assessed performance of INKA analysis using published phosphoproteome data

271 (Bensimon et al, 2010) of human G361 melanoma cells after induction of genotoxic stress with

272 neocarzinostatin, a radiomimetic that induces double-strand breaks (Fig 5E,F). As these data

273 were generated using an earlier-generation mass spectrometer that is less sensitive than current

13 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

274 orbitrap-based instruments, count data did not work well, and MS intensity data were analyzed

275 instead. As expected in the context of DNA damage signaling, INKA scores for ATM and

276 PRKDC/DNA-PK exhibited a time-dependent increase after addition of neocarzinostatin (Fig

277 5E,F and Supplementary Fig 8). Moreover, the ATM INKA score was significantly reduced after

278 the addition of ATM inhibitor KU55933 (Fig 5E). INKA scoring suggests that the inhibitor

279 influences PRKDC as well (Fig 5F).

280 In summary, the above differential analyses of phosphoproteomes show that the INKA pipeline

281 can pinpoint target activation and inhibition after perturbation in both cell lines and clinical

282 samples.

283

284 Discussion

285

286 Present-day cancer treatment is increasingly shifting towards individualized therapy by specific

287 targeting of hyperactive kinases in patient tumors. In this context, with heterogeneity and

288 plasticity of kinase signaling in a specific tumor at a specific time, it is essential to have an

289 overview of (hyper)active kinases and prioritize ones that (help) drive malignancy to maximize

290 therapeutic success and minimize expensive failures and unnecessary burden for the patient.

291 Here, we present a novel pipeline, Integrative Inferred Kinase Activity (INKA) scoring, to

292 investigate phosphoproteomic data from a single sample and identify (hyper)active kinases as

293 candidates for (co-)targeting with kinase inhibitors. In a first demonstration of its application, we

294 have performed INKA analyses of established cancer cell lines with known oncogenic drivers.

295 We analyzed both data from tyrosine phosphoproteomics in our laboratory and similar data

296 described in the literature (five cell lines in each case). Furthermore, INKA could distinguish

14 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

297 relevant differences between closely related mutant and wild type cells and reveal drug

298 perturbation effects in both tyrosine and global (TiO2) phosphoproteomics data. We also applied

299 INKA scoring to tumor needle biopsies of two patients before and after kinase inhibitor

300 treatment.

301 This study shows that data from label-free MS-based phosphoproteomics can be harnessed in a

302 multipronged analysis to infer and rank kinase activity in individual biological samples. In

303 kinase-centric analyses, phosphorylation of the kinase itself (either considering all sites or

304 focusing on the ones located in the activation loop) is used as a proxy for its activation. In

305 substrate-centric analyses, instead, substrate phosphorylation is used to deduce kinase activity

306 indirectly through kinase-substrate relationships (based on either experimental knowledge or

307 motif-based prediction). Here we developed INKA that combines kinase-centric and substrate-

308 centric evidence in a stringent meta-analysis, to yield an integrated metric for inferred kinase

309 activity. The results of INKA highlight kinases that are in line with known cancer biology and

310 show that INKA scoring clearly outperforms substrate-centric analyses alone, and also holds a

311 slight edge over phosphokinase ranking pioneered by Rikova et al12. In particular, kinases

312 driving tumor growth rank high in INKA scoring, illustrating the power of applying an

313 integrative analysis of in-depth phosphoproteomics data.

314 Meaningful substrate-centric inference of kinase activity is pivotal to INKA scoring. It depends

315 on the availability of comprehensive curated data on experimentally observed kinase-substrate

316 relationships or reliable predictions thereof. To date, this requirement has only been partly

317 fulfilled, and more substantial inferences could be made with further population of resources

318 (such as PhosphoSitePlus), especially when the latter also cover cancer-associated aberrations

319 such as fusion genes BCR-ABL and EML4-ALK (Medves & Demoulin, 2012; Lee et al, 2017).

15 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

320 To overcome current limitations in information, our INKA pipeline provides a separate kinase-

321 centric ranking of kinases that are not yet covered by PhosphoSitePlus and NetworKIN, but do

322 show up as phosphoproteins in a sample. This reduces the chance of missing important kinases,

323 as illustrated by the case of AXL in the HCC827-ER3 cell line. Moreover, the INKA pipeline

324 generates network visualizations of all kinase-substrate relationships inferred for the top 20

325 kinases in an experiment. This provides a more instructive overview of phosphoproteomic

326 biology in a sample than mere scoring and ranking alone.

327 A next stage is to apply INKA to more advanced cancer models and, especially, clinical samples.

328 To analyze limited amounts of patient tumor tissue in a clinical practice setting, we have recently

329 downscaled tyrosine phosphoproteomics to clinical needle-biopsy levels56. Using this workflow,

330 we performed proof-of-concept analysis of patient samples from a clinical molecular profiling

331 study with kinase inhibitors (Labots et al., submitted), showing relevant effects.

332 In summary, INKA scoring can infer and rank kinase activity in a single biological sample, and

333 display differences between closely related yet genetically distinct cells and after drug

334 intervention. INKA analysis of phosphoproteomic data on tumor biopsies collected in kinase

335 inhibitor trials can pave the way for future clinical application. The ultimate goal would be

336 tailoring treatment selection for the individual patient.

337

338 Materials and Methods

339

340 Cell lines and culture

341 H2228 non-small cell lung cancer cells carry an EML4-ALK fusion , SK-Mel-28 melanoma

342 cells harbor a V600E BRAF mutation, and K562 chronic myelogenous leukemia cells carry a

16 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

343 BCR-ABL1 fusion gene (philadelphia ). HCC827 and HCC827-ER3 are non-small

344 cell lung cancer cell lines with a deletion of EGFR residues E746-A750, the latter cell line being

345 a subclone of HCC827 after selection for acquired resistance to erlotinib in a murine xenograft

346 setting (Zhang et al, 2012). HCC827, H2228, SK-Mel-28 and K562 were obtained from the

347 American Type Culture Collection (ATCC, Rockville, MD) and HCC827-ER3 was provided by

348 Prof. Balazs Halmos, Division of Hematology/Oncology, Columbia University Medical Center,

349 New York, USA.

350 H2228 and SK-Mel-28 were cultured in 175-cm2 flasks containing DMEM supplemented with

351 10% fetal bovine serum and 2mM L-glutamine. K562 was cultured as a suspension in RPMI

352 1640 containing 2 mM L-glutamine and supplemented with 10% fetal bovine serum. HCC827

353 and HCC827-ER3 were grown in 15-cm dishes with the latter medium. Cell lines tested negative

354 for mycoplasma.

355

356 Cell lysis and phosphoproteomics

357 Cells were grown to 70-80% confluency (adherent cell lines) or to a 1 x 106 cells/ml suspension

358 (K562), washed with phosphate-buffered saline (PBS), lysed at approximately 2-3 x 107 cells/ml

359 in 5 ml lysis buffer (9 M urea, 20 mM HEPES pH 8.0, 1 mM sodium orthovanadate, 2.5 mM

360 sodium pyrophosphate, 1mM ß-glycerophosphate), sonicated (3 cycles of 30 s), and extracts

361 were stored at -80 oC.

362 For phosphoproteomics, lysate aliquots equivalent to 10 mg total protein were used as described

363 before (van der Mijn et al, 2015). were reduced by incubation in 4.5 mM dithiothreitol

364 for 30 min at 55 oC, alkylated in 10 mM iodoacetamide for 15 min at room temperature in the

365 dark, and digested overnight at room temperature with 10 µg/ml trypsin in a fourfold increased

17 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

366 volume containing 20 mM HEPES pH 8.0. After acidification (trifluoroacetic acid to 1% final

367 concentration), tryptic digests were desalted on Sep-Pak C18 cartridges (Waters

368 Chromatography, Etten-Leur, Netherlands) and lyophilized. Peptides were taken up in 700 µl

369 immunoprecipitation buffer (50 mM MOPS pH 7.2, 10 mM sodium phosphate, 50 mM NaCl)

370 and transferred at 4 oC to a microcentrifuge tube containing 40 µl of a 50% (v/v) slurry of

371 agarose beads harboring P-Tyr-1000 anti-phosphotyrosine monoclonal antibodies (Cell Signaling

372 Technologies, Danvers, USA) that had been washed and taken up in PBS. Following a 2-h

373 incubation at 4 oC on a rotator, beads were washed twice with cold PBS, and three times with

374 cold HPLC-grade water. Bound peptides were eluted with a total of 50 µl 0.15% trifluoroacetic

375 acid in two steps. After desalting with custom-made C18 stage tips, drying, and solubilization in

376 20 µl 4% acetonitrile/0.5% trifluoroacetic acid, samples were subjected to LC-MS/MS analysis.

377 For tumor biopsy phosphoproteomics, tumor needle biopsies from a patient with advanced head

378 and neck squamous cell carcinoma, obtained in an Institutional Review Board-approved

379 molecular profiling study before and after 2 weeks of treatment with erlotinib (NCT clinical

380 trials identifier 01636908; www.clinicaltrials.gov), were cut and processed as described

381 elsewehere (Labots et al, 2017). Peptide preparation from tumor biopsy lysates was performed

382 with 2.5 mg protein input for both pre- and on-treatment biopsies.

383

384 LC-MS/MS

385 Peptides were separated on an Ultimate 3000 nanoLC-MS/MS system (Dionex LC-Packings,

386 Amsterdam, The Netherlands) equipped with a 20-cm, 75-μm inner diameter fused silica

387 column, custom packed with 1.9-μm ReproSil-Pur C18-AQ silica beads (120-Å pore size; Dr.

388 Maisch, Ammerbuch-Entringen, Germany). After injection, peptides were trapped at 6 μl/min on

18 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

389 a 10-mm, 100-μm inner diameter trap column packed with 5-μm ReproSil-Pur C18-AQ silica

390 beads (120-Å pore size) in buffer A (buffer A: 0.5% acetic acid, buffer B: 80% acetonitrile, 0.5%

391 acetic acid), and separated at 300 nl/min with a 10–40% buffer B gradient in 90 min (120 min

392 inject-to-inject). Eluting peptides were ionized at a potential of +2 kV and introduced into a Q

393 Exactive mass spectrometer (Thermo Fisher, Bremen, Germany). Intact masses were measured

394 in the orbitrap with a resolution of 70,000 (at m/z 200) using an automatic gain control (AGC)

395 target value of 3 × 106 charges. Peptides with the top 10 highest signals (charge states 2+ and

396 higher) were submitted to MS/MS in the higher-energy collision cell (4-Da isolation width, 25%

397 normalized collision energy). MS/MS spectra were acquired in the orbitrap with a resolution of

398 17,500 (at m/z 200) using an AGC target value of 2 × 105 charges and an underfill ratio of 0.1%.

399 Dynamic exclusion was applied with a repeat count of 1 and an exclusion time of 30 s.

400

401 Peptide identification and quantification

402 MS/MS spectra were searched against theoretical spectra based on a UniProt complete human

403 proteome FASTA file (release January 2014, no fragments; 42104 entries) using MaxQuant

404 1.4.1.2 software (Cox & Mann, 2008). specificity was set to trypsin, and up to two

405 missed cleavages were allowed. Cysteine carboxamidomethylation (+57.021464 Da) was treated

406 as fixed modification and serine, threonine and tyrosine phosphorylation (+79.966330 Da),

407 methionine oxidation (+15.994915 Da) and N-terminal acetylation (+42.010565 Da) as variable

408 modifications. Peptide precursor ions were searched with a maximum mass deviation of 4.5 ppm,

409 and fragment ions with a maximum mass deviation of 20 ppm. Peptide and protein

410 identifications were filtered at a false discovery rate of 1% using a decoy database strategy. The

411 minimal peptide length was set at 7 amino acids, the minimum Andromeda score for modified

19 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

412 peptides was 40, and the corresponding minimum delta score was 17. Proteins that could not be

413 differentiated based on MS/MS spectra alone were clustered into protein groups (default

414 MaxQuant settings). Peptide identifications were propagated across samples using the 'match

415 between runs' option. Phosphopeptide MS/MS spectral counts (Liu et al, 2004) were calculated

416 from the MaxQuant evidence file using R.

417

418 INKA analysis

419 The INKA analysis pipeline was implemented in R, utilizing custom tables with data extracted

420 from web resources including UniProt (UniProt Consortium, 2015) (for mapping attributes of

421 UniProt accessions; www..org, mapping date 8 June 2016), PhosphoSitePlus (Hornbeck

422 et al, 2015) (for experimentally observed phosphorylation sites and kinase-substrate

423 relationships; www.phosphosite.org, Phosphorylation_site_dataset and

424 Kinase_Substrate_Dataset, versions of 3 July 2016), KinBase (Manning et al, 2002) (for

425 currently recognized protein kinases; kinase.com/web/current/kinbase, mapping date 20 July

426 2016), and HGNC (Yates et al, 2017) (for mapping to official gene symbols of the HUGO Gene

427 Nomenclature Committee; www.genenames.org). Furthermore, a tool for the identification of

428 kinase activation loop peptides (Munk et al, 2016), provided on the Phomics website

429 (phomics.jensenlab.org/ activation_loop_peptides), and a locally running version of the

430 NetworKIN algorithm (Horn et al, 2014) were used. For the latter NetworKIN engine,

431 NetworKIN version 3.0 code was used in combination with a UniProt human reference proteome

432 FASTA file derived from release 2014_01 filtered for "no fragments", and containing 21849

433 TrEMBL entries and 39703 Swiss-Prot entries.

20 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

434 Data filtering and annotation. For phosphopeptide data, we used data from the MaxQuant

435 'modificationSpecificPeptides' table, combined with spectral count data calculated above.

436 Although INKA analysis can be performed with intensity-based quantification, we favor spectral

437 count-based quantification as it is less sensitive to peptides with outlier intensities and is more

438 robust for the analysis of aggregated data for multiple peptides, some of which may exhibit

439 dominantly high intensities. For our QExactive data, spectral counting outperformed intensity-

440 based quantification for INKA-based kinase ranking of known drivers (data not shown), yet for

441 the low level LTQ-FTMS data the intensity data worked better (Fig 5E,F; count data not shown).

442 Table rows with data linked to multiple UniProt gene symbols were deconvoluted into separate

443 rows with a single gene symbol. For phosphosite data, we used data from the MaxQuant

444 'Phospho (STY)Sites' table, filtering for so-called class I sites (localization probability > 0.75).

445 Table rows with data linked to multiple UniProt accessions, and those linked to multiple

446 phosphopeptides, were deconvoluted into separate rows. Data from the web resources mentioned

447 above were used to prioritize rows linking the same phosphosite to the same gene, only retaining

448 the row with the best annotated accession. Subsequently, the phosphosite data were merged with

449 pertinent phosphopeptide data in a single, non-redundant class-I phosphosite-phosphopeptide

450 table.

451 Plot data generation. (1) Kinome analysis: A table with data for kinome analysis was generated

452 from the deconvoluted phosphopeptide table obtained above by removing rows with different

453 UniProt gene symbols linking the same phosphopeptide to the same HGNC-mapped symbol, and

454 then filtering for rows with phosphopeptides derived from protein kinases (HGNC-mapped

455 symbol among those for established protein kinases catalogued in KinBase). (2) Activation Loop

456 analysis: the table generated for kinome analysis was further filtered for phosphopeptides that

21 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

457 harbor activation loop sites as indicated by the Phomics web resource. (3) PhosphoSitePlus

458 (PSP) analysis: the non-redundant class-I phosphosite-phosphopeptide table was merged with a

459 table with data from PhosphoSitePlus detailing experimentally observed human kinase-substrate

460 relationships. (4) NetworKIN (NWK) analysis: based on phosphosite data in the non-redundant

461 class-I phosphosite-phosphopeptide table, kinase-substrate relationships were predicted (outside

462 R) using the NetworKIN algorithm in combination with the same FASTA file that was used for

463 peptide and protein identification. The results were imported in our R pipeline and restricted to

464 predictions associated with a NetworKIN score of at least 2 that, in addition, exceeded 90% of

465 the score of the top prediction for the same phosphosite. The non-redundant class-I phosphosite-

466 phosphopeptide table was then merged with the filtered NetworKIN results table.

467 From the four analysis tables, lists with plot data tables for individual samples were generated as

468 follows. For the kinase-centric kinome and activation loop analyses, per sample, kinase gene

469 symbols and associated spectral count values were extracted from all rows (corresponding to

470 phosphopeptides) in the respective tables, and spectral counts were multiplied by the number of

471 phosphorylations of the pertinent peptide so as to account linearly for all phosphorylation

472 activity impinging on the protein. For the substrate-centric PSP and NWK analyses, the same

473 was done with the corresponding analysis tables, but multiplied spectral counts were further

474 divided by the number of phosphosites inferred by MaxQuant in the 'Phospho (STY)Sites' table.

475 As the latter number may exceed the number of actual phosphomodifications of the

476 phosphopeptide, this makes sure the signal is not exaggerated.

477 Kinase bar graph plotting. To visualize kinase phosphorylation levels (kinase-centric analyses),

478 or substrate phosphorylation levels attributed to specific kinases (substrate-centric analyses), the

479 above plot data lists were used to produce, per sample, bar graphs for the top 20 kinases in each

22 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

480 of the four analyses. Specifically, all signals (spectral counts) assigned to a specific kinase were

481 represented by stacked bar segments. For kinase-centric analyses, bar segments correspond to

482 individual phosphopeptides; for substrate-centric analyses, they represent individual

483 phosphosites (aggregated signal of one or more phosphopeptides).

484 INKA analysis and plotting. To infer kinase activity within a single sample, we integrated data

485 from the four analysis tables relevant to the sample (visualized separately in kinase bar graphs

486 above), and calculated, for each kinase, a single INKA score metric. First, from the different

487 analyses we aggregated cumulative signals (corresponding to bars above) assigned to a given

488 kinase, summing values from the kinome analysis and those from the activation loop analysis to

489 get a kinome-centric measure (퐶푘푖푛) according to equation (1), and summing values from the

490 PSP and NWK analyses to get a substrate-centric measure (퐶푠푢푏) according to equation (2):

491 퐶푘푖푛 = 퐶푘푖푛표푚푒 + 퐶푎푐푡푖푣푎푡푖표푛푙표표푝 (1)

492 퐶푠푢푏 = 퐶푃푆푃 + 퐶푁푊퐾 (2)

493 Subsequently, the geometric mean of these two measures was taken to obtain the final, integrated

494 INKA score via equation (3), and a 'skew' parameter was defined in equation (4) to indicate the

495 relative contribution of kinase-centric versus substrate-centric evidence to the INKA score:

496 푆푐표푟푒 = √퐶푘푖푛 ⋅ 퐶푠푢푏 (3)

2 퐶 497 푆푘푒푤 = 푎푟푐푡푎푛 ( 푠푢푏) (4) 휋 퐶푘푖푛

498 The skew parameter is a goniometric correlate of the ratio of substrate-centric and kinase-centric

499 evidence, with the corresponding angle normalized on [0, 휋] to give a value in [0,1]. The value is 2

500 0 when all evidence is kinase-centric, 1 when all evidence is substrate-centric, and 0.5 when

501 there is equal contribution from both sides. To produce an intuitive, visual representation of the

502 inferred (hyper)active kinases in a given experimental sample, for each kinase, the combination

23 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

503 of the INKA score and skew parameter values were taken as vertical and horizontal coordinates,

504 respectively, to create a scatter plot.

505 Kinases for which there is no available data on experimentally observed substrates (PSP), nor

506 any reliable sequence motif with which to predict the kinase's substrates (NWK), will always get

507 a zero 퐶푠푢푏 score, and therefore a zero INKA score. Therefore, for these kinases we performed a

508 separate ranking solely based on the kinase-centric side of the INKA score, i.e, the sum of

509 signals in the kinome and activation loop analyses, 퐶푘푖푛. Flanking the INKA scatter plot, a bar

510 graph was plotted for the top 20 of such 'out-of-scope' kinases, only considering kinases

511 associated with a total of at least 2 spectral counts. As a more simplified visualization, INKA

512 scores for the top 20 kinases were also plotted in a bar graph for each sample.

513 Kinase-substrate relationship network plotting. For network visualization of inferred kinase-

514 substrate relations annotated with INKA scores, the 'network' R package (Butts, 2008) was used.

515 Per sample, relevant data were extracted from a merge of the PSP and NWK analysis tables

516 generated above. Edge lists were created with kinases and substrates as tail and head nodes,

517 respectively, of directed edges. Other data were stored as node and edge attributes in the network

518 object. Nodes were depicted as a hexagon for observed kinases (one or more phosphopeptides

519 derived from the kinase identified), as a pentagon for inferred kinases (no direct observation, but

520 linked to phosphorylation of one or more substrate phosphopeptides), or as a circle for non-

521 kinase substrates. INKA scores were used to define kinase node colors in a white to red gradient,

522 and kinases with at least one phosphorylated activation loop phosphosite were indicated by a

523 thicker node border. Edge widths were made to correlate with the associated substrate site

524 phosphorylation level, and edge colors to the analysis on which the kinase-substrate relationship

525 was based (PSP: coral, NWK: blue, both: green). Node coordinates were calculated using a

24 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

526 Fruchterman-Reingold layout algorithm for force-directed graph drawing with empirically

527 chosen parameter settings (niter=100N, area=N1.8, repulse.rad=N1.5, ncell=N3; N: number of

528 nodes).

529 Statistical significance assessment. Random INKA scores for each kinase in each sample were

530 generated using a two-fold randomization procedure. First, all measured spectral count values

531 were permuted in a sample-wise fashion. Second, as the kinase responsible for phosphorylation

532 of a specific phosphosite, a kinase was randomly selected from the pool of all kinases that could

533 possibly be inferred. To this end, all kinase-substrate links in the PhosphoSitePlus data table and

534 in a proteome-wide NetworKIN analysis table (obtained for all phosphosites in all protein

535 sequences contained in the same FASTA file as used in the identification stages) were

536 randomized. The result of this procedure was then used as input for the calculation of INKA

537 scores in the same way as described above. Iterating this procedure 100,000 times gave sample-

538 and kinase-specific null distributions which enabled the calculation of p-values for each of the

539 original INKA scores associated with each sample.

540

541 Acknowledgements

542 543 SK-Mel-28 cells from ATCC were kindly provided by Dr. Elisa Giovannetti, AIRC/Start-Up

544 Unit, Pisa University, Italy. HCC827 ER3 cells were kindly provided by Prof. Balazs Halmos,

545 Division of Hematology/Oncology, Columbia University Medical Center, New York, USA. Raw

546 label-free phosphoproteomic data on human melanoma G361 cells after radiomimetic treatment

547 were kindly provided by Dr. A. Bensimon, ETH Zürich, Switzerland. Inge de Reus is

548 acknowledged for experimental assistance. Dr. Frank Koopman is thanked for critically reading

549 the manuscript. VitrOmics Healthcare Services (VHS), Cancer Center Amsterdam, and NWO-

25 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

550 Middelgroot (project number 91116017) are acknowledged for support of the mass spectrometry

551 infrastructure, and VHS also for support of RB.

552

553 Author Contributions

554

555 CRJ designed the experiments. TVP, RB, JCK, AH wrote the R script and performed the data

556 analysis. AH proposed the combination score and developed the web server. CvA and RdH

557 performed the experiments. CRJ, CvA and TLL gave input on the analysis. EH and SRP

558 optimized NetworKIN settings. SRP performed nanoLC-MS/MS, database searching and

559 prepared figures. HV and ML provided clinical expertise. CRJ, SRP, TVP, RB, CvA, JCK, AH,

560 FR, ML and HV wrote the paper.

561

562 Conflict Of Interest

563 The authors declare no conflicts of interest.

564

565 Data Availability

566 The mass spectrometry data from this publication have been deposited to the ProteomeXchange

567 Consortium via the PRIDE partner repository [www.ebi.ac.uk/pride/archive] and assigned the

568 identifier PXD006616.

569

570 Code Availability

571 Researchers can analyze their data by the INKA pipeline via www.inkascore.org.

572

26 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

573 References

574

575 Al-Lazikani B, Banerji U & Workman P (2012) Combinatorial drug therapy for cancer in the

576 post-genomic era. Nat. Biotechnol. 30: 679–92

577 Bai Y, Li J, Fang B, Edwards A, Zhang G, Bui M, Eschrich S, Altiok S, Koomen J & Haura EB

578 (2012) Phosphoproteomics identifies driver tyrosine kinases in sarcoma cell lines and

579 tumors. Cancer Res. 72: 2501–11

580 Bensimon A, Schmidt A, Ziv Y, Elkon R, Wang S-Y, Chen DJ, Aebersold R & Shiloh Y (2010)

581 ATM-Dependent and -Independent Dynamics of the Nuclear Phosphoproteome After DNA

582 Damage. Sci. Signal. 3: rs3

583 Butts CT (2008) network : A Package for Managing Relational Data in R. J. Stat. Softw. 24: 1–

584 36

585 Carey TE, Takahashi T, Resnick LA, Oettgen HF & Old LJ (1976) Cell surface antigens of

586 human malignant melanoma: mixed hemadsorption assays for humoral immunity to

587 cultured autologous melanoma cells. Proc. Natl. Acad. Sci. U. S. A. 73: 3278–82

588 Casado P, Hijazi M, Britton D & Cutillas PR (2016) Impact of phosphoproteomics in the

589 translation of kinase-targeted therapies. Proteomics: 1600235

590 Casado P, Rodriguez-Prados J-C, Cosulich SC, Guichard S, Vanhaesebroeck B, Joel S & Cutillas

591 PR (2013) Kinase-Substrate Enrichment Analysis Provides Insights into the Heterogeneity

592 of Signaling Pathway Activation in Leukemia Cells. Sci. Signal. 6: rs6-rs6

593 Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, Dummer R, Garbe C,

594 Testori A, Maio M, Hogg D, Lorigan P, Lebbe C, Jouary T, Schadendorf D, Ribas A,

595 O’Day SJ, Sosman JA, Kirkwood JM, Eggermont AMM, et al (2011) Improved survival

27 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

596 with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 364: 2507–

597 16

598 Cox J & Mann M (2008) MaxQuant enables high peptide identification rates, individualized

599 p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26:

600 1367–72

601 Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, Teague J, Woffendin H, Garnett

602 MJ, Bottomley W, Davis N, Dicks E, Ewing R, Floyd Y, Gray K, Hall S, Hawes R, Hughes

603 J, Kosmidou V, Menzies A, et al (2002) Mutations of the BRAF gene in human cancer.

604 Nature 417: 949–954

605 Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK &

606 Zarrinkar PP (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat.

607 Biotechnol. 29: 1046–51

608 Drake JM, Paull EO, Graham NA, Lee JK, Smith BA, Titz B, Stoyanova T, Faltermeier CM,

609 Uzunangelov V, Carlin DE, Fleming DT, Wong CK, Newton Y, Sudha S, Vashisht AA,

610 Huang J, Wohlschlegel JA, Graeber TG, Witte ON & Stuart JM (2016) Phosphoproteome

611 Integration Reveals Patient-Specific Networks in Prostate Cancer. Cell 166: 1041–1054

612 Flaherty KT, Puzanov I, Kim KB, Ribas A, McArthur GA, Sosman JA, O’Dwyer PJ, Lee RJ,

613 Grippo JF, Nolop K & Chapman PB (2010) Inhibition of mutated, activated BRAF in

614 metastatic melanoma. N. Engl. J. Med. 363: 809–19

615 Flaherty KT, Robert C, Hersey P, Nathan P, Garbe C, Milhem M, Demidov L V, Hassel JC,

616 Rutkowski P, Mohr P, Dummer R, Trefzer U, Larkin JMG, Utikal J, Dreno B, Nyakas M,

617 Middleton MR, Becker JC, Casey M, Sherman LJ, et al (2012) Improved survival with

618 MEK inhibition in BRAF-mutated melanoma. N. Engl. J. Med. 367: 107–14

28 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

619 Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S,

620 Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U &

621 Campbell PJ (2015) COSMIC: exploring the world’s knowledge of somatic mutations in

622 human cancer. Nucleic Acids Res. 43: D805-11

623 Girard L, Zöchbauer-Müller S, Virmani AK, Gazdar AF & Minna JD (2000) Genome-wide

624 allelotyping of lung cancer identifies new regions of allelic loss, differences between small

625 cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res. 60: 4894–

626 906

627 Girotti MR, Pedersen M, Sanchez-Laorden B, Viros A, Turajlic S, Niculescu-Duvaz D, Zambon

628 A, Sinclair J, Hayes A, Gore M, Lorigan P, Springer C, Larkin J, Jorgensen C & Marais R

629 (2013) Inhibiting EGF receptor or signaling overcomes BRAF inhibitor

630 resistance in melanoma. Cancer Discov. 3: 158–67

631 Guo A, Villén J, Kornhauser J, Lee KA, Stokes MP, Rikova K, Possemato A, Nardone J,

632 Innocenti G, Wetzel R, Wang Y, MacNeill J, Mitchell J, Gygi SP, Rush J, Polakiewicz RD

633 & Comb MJ (2008) Signaling networks assembled by oncogenic EGFR and c-Met. Proc.

634 Natl. Acad. Sci. U. S. A. 105: 692–7

635 Hanahan D & Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144: 646–74

636 Hauschild A, Grob J-J, Demidov L V, Jouary T, Gutzmer R, Millward M, Rutkowski P, Blank

637 CU, Miller WH, Kaempgen E, Martín-Algarra S, Karaszewska B, Mauch C, Chiarion-Sileni

638 V, Martin A-M, Swann S, Haney P, Mirakhur B, Guckert ME, Goodman V, et al (2012)

639 Dabrafenib in BRAF-mutated metastatic melanoma: a multicentre, open-label, phase 3

640 randomised controlled trial. Lancet 380: 358–365

641 Heisterkamp N, Stam K, Groffen J, de Klein A & Grosveld G Structural organization of the bcr

29 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

642 gene and its role in the Ph’ translocation. Nature 315: 758–61

643 Horn H, Schoof EM, Kim J, Robin X, Miller ML, Diella F, Palma A, Cesareni G, Jensen LJ &

644 Linding R (2014) KinomeXplorer: an integrated platform for kinome biology studies. Nat.

645 Methods 11: 603–4

646 Hornbeck P V, Zhang B, Murray B, Kornhauser JM, Latham V & Skrzypek E (2015)

647 PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43: D512-

648 20

649 Huang M, Shen A, Ding J & Geng M (2014) Molecularly targeted cancer therapy: some lessons

650 from the past decade. Trends Pharmacol. Sci. 35: 41–50

651 Huang PH, Mukasa A, Bonavia R, Flynn RA, Brewer ZE, Cavenee WK, Furnari FB & White

652 FM (2007) Quantitative analysis of EGFRvIII cellular signaling networks reveals a

653 combinatorial therapeutic strategy for glioblastoma. Proc. Natl. Acad. Sci. U. S. A. 104:

654 12867–72

655 Jimenez CR & Verheul HMW (2014) Mass spectrometry-based proteomics: from cancer biology

656 to protein biomarkers, drug targets, and clinical applications. Am. Soc. Clin. Oncol. Educ.

657 Book: e504-10

658 Kantarjian H, Shah NP, Hochhaus A, Cortes J, Shah S, Ayala M, Moiraghi B, Shen Z, Mayer J,

659 Pasquini R, Nakamae H, Huguet F, Boqué C, Chuah C, Bleickardt E, Bradley-Garelik MB,

660 Zhu C, Szatrowski T, Shapiro D & Baccarani M (2010) Dasatinib versus Imatinib in Newly

661 Diagnosed Chronic-Phase Chronic Myeloid Leukemia. N. Engl. J. Med. 362: 2260–2270

662 Klein E, Ben-Bassat H, Neumann H, Ralph P, Zeuthen J, Polliack A & Vánky F (1976)

663 Properties of the K562 cell line, derived from a patient with chronic myeloid leukemia. Int.

664 J. cancer 18: 421–31

30 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

665 Knight ZA, Lin H & Shokat KM (2010) Targeting the cancer kinome through

666 polypharmacology. Nat. Rev. Cancer 10: 130–7

667 Labots M, van der Mijn JC, Beekhof R, Piersma SR, de Goeij-de Haas RR, Pham T V., Knol JC,

668 Dekker H, van Grieken NCT, Verheul HMW & Jiménez CR (2017) Phosphotyrosine-

669 based-phosphoproteomics scaled-down to biopsy level for analysis of individual tumor

670 biology and treatment selection. J. Proteomics 162: 99–107

671 Larson RA, Hochhaus A, Hughes TP, Clark RE, Etienne G, Kim D-W, Flinn IW, Kurokawa M,

672 Moiraghi B, Yu R, Blakesley RE, Gallagher NJ, Saglio G & Kantarjian HM (2012)

673 Nilotinib vs imatinib in patients with newly diagnosed Philadelphia chromosome-positive

674 chronic myeloid leukemia in chronic phase: ENESTnd 3-year follow-up. Leukemia 26:

675 2197–2203

676 Lee M, Lee K, Yu N, Jang I, Choi I, Kim P, Jang YE, Kim B, Kim S, Lee B, Kang J & Lee S

677 (2017) ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome

678 and literature data mining. Nucleic Acids Res. 45: D784–D789

679 Li C, Kern JG, Huang S, Armstrong EA, Werner LR & Harari PM (2016) Abstract 2102A:

680 Simultaneous targeting of EGFR and ALK in EML4-ALK positive lung cancer to inhibit

681 tumor cell proliferation and migration. Cancer Res. 76: DOI: 10.1158/1538-7445.AM2016-

682 2102A

683 Linding R, Jensen LJ, Ostheimer GJ, van Vugt MATM, Jørgensen C, Miron IM, Diella F,

684 Colwill K, Taylor L, Elder K, Metalnikov P, Nguyen V, Pasculescu A, Jin J, Park JG,

685 Samson LD, Woodgett JR, Russell RB, Bork P, Yaffe MB, et al (2007) Systematic

686 discovery of in vivo phosphorylation networks. Cell 129: 1415–26

687 Liu H, Sadygov RG & Yates JR (2004) A model for random sampling and estimation of relative

31 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

688 protein abundance in shotgun proteomics. Anal. Chem. 76: 4193–201

689 Lovly CM, Dahlman KB, Fohn LE, Su Z, Dias-Santagata D, Hicks DJ, Hucks D, Berry E, Terry

690 C, Duke M, Su Y, Sobolik-Delmaire T, Richmond A, Kelley MC, Vnencak-Jones CL,

691 Iafrate AJ, Sosman J & Pao W (2012) Routine multiplex mutational profiling of melanomas

692 enables enrollment in genotype-driven therapeutic trials. PLoS One 7: e35309

693 Manning G, Whyte DB, Martinez R, Hunter T & Sudarsanam S (2002) The

694 Complement of the . Science (80-. ). 298: 1912–1934

695 Medves S & Demoulin J-B (2012) Tyrosine kinase gene fusions in cancer: translating

696 mechanisms into targeted therapies. J. Cell. Mol. Med. 16: 237–248

697 van der Mijn JC, Broxterman HJ, Knol JC, Piersma SR, De Haas RR, Dekker H, Pham T V, Van

698 Beusechem VW, Halmos B, Mier JW, Jiménez CR & Verheul HMW (2016) Sunitinib

699 activates Axl signaling in renal cell cancer. Int. J. cancer 138: 3002–10

700 van der Mijn JC, Labots M, Piersma SR, Pham T V, Knol JC, Broxterman HJ, Verheul HM &

701 Jiménez CR (2015) Evaluation of different phospho-tyrosine antibodies for label-free

702 phosphoproteomics. J. Proteomics 127: 259–263

703 van der Mijn JC, Sol N, Mellema W, Jimenez CR, Piersma SR, Dekker H, Schutte LM, Smit EF,

704 Broxterman HJ, Skog J, Tannous BA, Wurdinger T & Verheul HMW (2014) Analysis of

705 AKT and ERK1/2 protein kinases in extracellular vesicles isolated from blood of patients

706 with cancer. J. Extracell. Vesicles 3: 25657

707 Mischnik M, Sacco F, Cox J, Schneider H-C, Schäfer M, Hendlich M, Crowther D, Mann M &

708 Klabunde T (2016) IKAP: A heuristic framework for inference of kinase activities from

709 Phosphoproteomics data. Bioinformatics 32: 424–31

710 Munk S, Refsgaard JC, Olsen J V & Jensen LJ (2016) From Phosphosites to Kinases. Methods

32 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

711 Mol. Biol. 1355: 307–21

712 Nolen B, Taylor S & Ghosh G (2004) Regulation of protein kinases; controlling activity through

713 activation segment conformation. Mol. Cell 15: 661–75

714 Olsen J V, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P & Mann M (2006) Global, in

715 vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127: 635–48

716 Ramos P & Bentires-Alj M (2015) Mechanism-based cancer therapy: resistance to therapy,

717 therapy for resistance. Oncogene 34: 3617–26

718 Rikova K, Guo A, Zeng Q, Possemato A, Yu J, Haack H, Nardone J, Lee K, Reeves C, Li Y, Hu

719 Y, Tan Z, Stokes M, Sullivan L, Mitchell J, Wetzel R, Macneill J, Ren JM, Yuan J,

720 Bakalarski CE, et al (2007) Global survey of phosphotyrosine signaling identifies oncogenic

721 kinases in lung cancer. Cell 131: 1190–203

722 Robert C, Karaszewska B, Schachter J, Rutkowski P, Mackiewicz A, Stroiakovski D, Lichinitser

723 M, Dummer R, Grange F, Mortier L, Chiarion-Sileni V, Drucis K, Krajsova I, Hauschild A,

724 Lorigan P, Wolter P, Long G V, Flaherty K, Nathan P, Ribas A, et al (2015) Improved

725 overall survival in melanoma with combined dabrafenib and trametinib. N. Engl. J. Med.

726 372: 30–9

727 Rubinstein JC, Sznol M, Pavlick AC, Ariyan S, Cheng E, Bacchiocchi A, Kluger HM, Narayan

728 D & Halaban R (2010) Incidence of the V600K mutation among melanoma patients with

729 BRAF mutations, and potential therapeutic response to the specific BRAF inhibitor

730 PLX4032. J. Transl. Med. 8: 67

731 Saglio G, Kim D-W, Issaragrisil S, le Coutre P, Etienne G, Lobo C, Pasquini R, Clark RE,

732 Hochhaus A, Hughes TP, Gallagher N, Hoenekopp A, Dong M, Haque A, Larson RA,

733 Kantarjian HM & ENESTnd Investigators (2010) Nilotinib versus Imatinib for Newly

33 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

734 Diagnosed Chronic Myeloid Leukemia. N. Engl. J. Med. 362: 2251–2259

735 Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, Fujiwara S, Watanabe H,

736 Kurashina K, Hatanaka H, Bando M, Ohno S, Ishikawa Y, Aburatani H, Niki T, Sohara Y,

737 Sugiyama Y & Mano H (2007) Identification of the transforming EML4–ALK fusion gene

738 in non-small-cell lung cancer. Nature 448: 561–566

739 Stommel JM, Kimmelman AC, Ying H, Nabioullin R, Ponugoti AH, Wiedemeyer R, Stegh AH,

740 Bradner JE, Ligon KL, Brennan C, Chin L & DePinho RA (2007) Coactivation of receptor

741 tyrosine kinases affects the response of tumor cells to targeted therapies. Science 318: 287–

742 90

743 Terfve CDA, Wilkes EH, Casado P, Cutillas PR & Saez-Rodriguez J (2015) Large-scale models

744 of signal propagation in human cells derived from discovery phosphoproteomic data. Nat.

745 Commun. 6: 8033

746 Trusolino L & Bertotti A (2012) Compensatory Pathways in Oncogenic Kinase Signaling and

747 Resistance to Targeted Therapies: Six Degrees of Separation. Cancer Discov. 2: 876–880

748 UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res. 43: D204-

749 12

750 Valabrega G, Montemurro F & Aglietta M (2007) Trastuzumab: mechanism of action, resistance

751 and future perspectives in HER2-overexpressing breast cancer. Ann. Oncol. 18: 977–84

752 Voena C, Di Giacomo F, Panizza E, D’Amico L, Boccalatte FE, Pellegrino E, Todaro M,

753 Recupero D, Tabbò F, Ambrogio C, Martinengo C, Bonello L, Pulito R, Hamm J, Chiarle

754 R, Cheng M, Ruggeri B, Medico E & Inghirami G (2013) The EGFR family members

755 sustain the neoplastic phenotype of ALK+ lung adenocarcinoma via EGR1. Oncogenesis 2:

756 e43

34 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

757 Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz L a & Kinzler KW (2013) Cancer

758 genome landscapes. Science 339: 1546–58

759 Wilkes EH, Terfve C, Gribben JG, Saez-Rodriguez J & Cutillas PR (2015) Empirical inference

760 of circuitry and plasticity in a kinase signaling network. Proc. Natl. Acad. Sci. 112: 7719–

761 7724

762 Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith

763 JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C,

764 McDermott U & Garnett MJ (2013) Genomics of Drug Sensitivity in Cancer (GDSC): a

765 resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41: D955-

766 61

767 Yates B, Braschi B, Gray KA, Seal RL, Tweedie S & Bruford EA (2017) Genenames.org: the

768 HGNC and VGNC resources in 2017. Nucleic Acids Res. 45: D619–D625

769 Zhang Z, Lee JC, Lin L, Olivas V, Au V, LaFramboise T, Abdel-Rahman M, Wang X, Levine

770 AD, Rho JK, Choi YJ, Choi C-M, Kim S-W, Jang SJ, Park YS, Kim WS, Lee DH, Lee J-S,

771 Miller VA, Arcila M, et al (2012) Activation of the AXL kinase causes resistance to EGFR-

772 targeted therapy in lung cancer. Nat. Genet. 44: 852–60

773

35 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

774 Figure Legends

775

776 Figure 1. Generic phosphoproteomics workflow and data analysis strategy.

777 A Overview of an MS-based phosphoproteomics experiment. Proteins from a biological

778 sample are digested with trypsin, and phosphopeptides are enriched for analysis by

779 (orbitrap-based) LC-MS/MS. Phosphopeptides can be captured with various affinity resins;

780 here, data were analyzed of phosphopeptides enriched with anti-phosphotyrosine antibodies

781 and TiOx. Database-based phosphopeptide identification, and phosphosite localization and

782 quantification is performed using a tool like MaxQuant.

783 B Scheme of INKA analysis for identification of active kinases in a single biological sample.

784 Quantitative phosphodata for established kinases are taken as direct (kinase-centric)

785 evidence, using either all phosphopeptides attributed to a given kinase ('Kinome'), or only

786 those from the kinase activation loop segment ('Activation Loop'). Phosphosites are filtered

787 for class I phosphosites (localization probability > 75%) (Olsen et al, 2006), coupled to

788 phosphopeptide spectral count data, and used for substrate-centric inference of kinases on

789 the basis of kinase-substrate relationships that are either experimentally observed (provided

790 by PhosphoSitePlus, 'PSP') or predicted by an algorithm using sequence motif and protein-

791 protein network information (NetworKIN, 'NWK'). All evidence lines are integrated in a

792 kinase-specific INKA score using the geometric mean of combined spectral count data ('C')

793 for kinase-centric and substrate-centric modalities. Results are visualized in a scatter plot of

794 INKA scores for kinases scoring ≥10% of the maximum ('INKA Plot '; horizontal shifts

795 from the middle indicate evidence being more kinase-centric or more substrate-centric). For

796 top 20 INKA-scoring kinases, a score bar graph ('INKA Ranking'), and a kinase-substrate

36 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

797 relation network for pertinent kinases and their observed substrates ('INKA Network') are

798 also produced.

799

800 Figure 2. Ranking of top 20 kinases in four cell line use cases by each of four lines of

801 evidence and integrative INKA scoring.

802 A K562 chronic myelogenous leukemia cells with a BCR-ABL fusion. INKA score ranking

803 indicates that ABL1/BCR-ABL (orange bars) exhibits principal kinase activity in this cell

804 line, in line with a role as an oncogenic driver.

805 B SK-Mel-28 melanoma cells with mutant BRAF. In the 'Kinome' analysis, CDK1, CDK2 and

806 CDK3 share a second place, based on phosphopeptides that cannot be unequivocally

807 assigned to either of them. INKA scoring implicates MAPK3 as the number one activated

808 kinase. As SK-Mel-28 is driven by BRAF, a serine/threonine kinase that is missed by pTyr-

809 based phosphoproteomics, downstream targets in the MEK-ERK pathway are highlighted by

810 blue coloring.

811 C Erlotinib-resistant HCC827-ER3 NSCLC cells with mutant EGFR. INKA scores reveal the

812 driver EGFR (pink coloring) as second-highest ranking and MET as highest ranking kinase,

813 respectively.

814 D H2228 NSCLC cells with an EML4-ALK fusion. The driver ALK (purple coloring) is ranked

815 as a top 3 kinase by INKA score, slightly below PTK2 and SRC.

816

817 Data information: For each cell line, bar graphs depict kinase ranking based on kinase-centric

818 analyses (panel "Kinase phosphopeptides"), substrate-centric analyses (panel "Substrate

819 phosphopeptides"), and combined scores (panel "INKA"). Bar segments represent the number

37 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

820 and contribution of individual phosphopeptides (kinase-centric analyses) or phosphosites

821 (substrate-centric analyses). P-values flanking INKA score bars were derived through a

822 randomization procedure with 105 permutations of both peptide-spectral count links and kinase-

823 substrate links.

824

825 Figure 3. INKA plots and kinase-substrate relation networks for four cell line use cases.

826 A K562 CML cells with a BCR-ABL fusion. ABL1 is the most activated kinase, with relatively

827 equal contributions from both analysis arms. It is a highly connected, central node in the

828 network.

829 B SK-Mel-28 melanoma cells with mutant BRAF. Downstream MEK-ERK pathway members

830 are highlighted in lieu of BRAF which is missed by the current pTyr-based workflow.

831 MAPK3 is the top activated kinase. The network includes two clusters with highly

832 connected activated kinases, MAPK1/3 and SRC, respectively.

833 C Erlotinib-resistant HCC827-ER3 NSCLC cells with mutant EGFR. EGFR and MET are the

834 most active, highly connected kinases. AXL, inactive in parental cells (see Supplementary

835 Fig 6), but associated with erlotinib resistance in this sub-line, can only be analyzed through

836 the kinase-centric arm (pink bar highlighting).

837 D H2228 NSCLC cells with an EML4-ALK fusion. ALK is a high-ranking kinase with roughly

838 equal evidence from both analysis arms. Multiple highly active and connected nodes imply

839 relative insensitivity to ALK inhibition, in line with previous functional data. Larger

840 networks are shown in Supplementary Figs 3-5 and Supplementary Fig 7.

841

38 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

842 Data information: In INKA plots proper, the vertical position of kinases (drivers in red) is

843 determined by their INKA score, whereas the horizontal position is determined by the

844 (im)balance of evidence from kinase-centric and substrate-inferred arms of the analysis. Kinases

845 not covered by PhosphoSitePlus (PSP) and NetworKIN (NWK) are visualized in a flanking bar

846 graph.

847

848 Figure 4. INKA analysis of three cancer cell line cases from the literature.

849 A H3255 NSCLC cells with an EGFR mutation. INKA analysis shows that EGFR is a hyper-

850 activated kinase (top 2 in all branches), together with MET.

851 B A204 rhabdomyosarcoma cells with documented PDGFRA signaling. PDGFRA exhibits

852 variable ranking (top-intermediate) in individual analysis types for A204, but integrated

853 INKA analysis (right-most bar graph) infers it as a highly active, rank-2 kinase, after

854 EPHA1.

855 C MNNG/HOS osteosarcoma cells with documented MET signaling. MET consistently ranks

856 among the top 2 in all analysis arms, culminating in a first rank in the integrative INKA

857 analysis.

858

859 Data information: See the legend of Figure 2 for basic explanation.

860

861 Figure 5. INKA analysis in differential genetic and pharmacological settings.

862 A Effect of a monogenetic change in a cancer cell line use case. Comparison of U87

863 glioblastoma cells ("wild type") with isogenic U87-EGFRvIII cells overexpressing a

864 consitutively active EGFR variant ("mutant") grown under baseline conditions.

39 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

865 B Effect of drug treatment in a cancer cell line use case. Comparison of U87-EGFRvIII cells at

866 baseline with U87-EGFRvIII cells treated with 10 µM erlotinib for 2 h shows a clearly

867 reduced INKA score for EGFR.

868 C Effect of drug treatment in a patient with hypopharyngeal cancer. Tumor biopsies were

869 taken both before and after two weeks of erlotinib treatment.

870 D Same as panel C, but for a patient with pancreatic cancer.

871 E Time-dependent effect of radiomimetic treatment in a cancer cell line use case. MS

872 intensity-based INKA analysis of TiO2-captured phosphoproteomes from G361 melanoma

873 cells at different time points following treatment with the DNA damage-inducing drug

874 neocarzinostatin (NCS) in the absence or presence of ATM inhibitor KU55933 (ATM).

875 Plotted is the INKA score for ATM, exhibiting a time-dependent increase, which is not

876 observed with ATM blocking. Full INKA score bar graphs are shown in Supplementary Fig

877 8.

878 F Same as panel E, but plotting of the INKA score for PRKDC/DNA-PK, which exhibits

879 similar behavior. Raw data for panels A&B are from Van der Mijn et al. (van der Mijn et al,

880 2014). Raw data for panels E&F are from Bensimon et al. (Bensimon et al, 2010) and

881 averaged for replicate treatment conditions.

882

883 Supplementary Figure Legends

884

885 Supplementary Figure 1. Coverage of 538 unique protein kinases from KinBase by

886 resources providing kinase-substrate relationships.

40 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

887 The Venn diagram shows that 172 kinases in KinBase are missing from both PhosphoSitePlus

888 (PSP) and NetworKIN (NWK), 31 are only covered by NWK, 172 are only covered by PSP,

889 while 163 are covered by both NWK and PSP. Additionally, 14 proteins annotated by PSP are

890 not present in KinBase, but mostly involve small-molecule kinases or proteins with spurious

891 (electronic) annotation, with the exception of two fusion protein species with established protein

892 kinase activity (BCR-ABL and NPM-ALK).

893

894 Supplementary Figure 2. Correlation between kinase-specific INKA scores and estimated

895 p-values.

896 Log10-transformed INKA scores are plotted against (sign-switched) log10-transformed p-values

897 for each of four cell line use cases. Highly significant, positive Spearman rank correlation

898 coefficients for each of the cell lines demonstrate a monotonic trend: higher INKA scores are

899 associated with lower p-values. (a) K562 chronic myelogenous leukemia cells. (b) SK-Mel-28

900 melanoma cells. (c) HCC827-ER3 lung cancer cells. (d) H2228 lung cancer cells.

901

902 Supplementary Figure 3. Kinase-substrate relation network for top 20 INKA-scoring

903 kinases and their observed substrates in K-562 chronic myelogenous leukemia cells with a

904 BCR-ABL fusion. ABL1 is a highly connected and central node. Kinases downstream of BCR-

905 ABL signaling, such as SRC, are also active, albeit to a lower extent.

906

907 Supplementary Figure 4. Kinase-substrate relation network for top 20 INKA-scoring

908 kinases and their observed substrates in SK-Mel-28 melanoma cells with BRAFV600E.

41 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

909 Two clusters of activated kinases are observed, one containing BRAF targets MAPK1 and

910 MAPK3, and the other containing SRC as highly connected nodes.

911

912 Supplementary Figure 5. Kinase-substrate relation network for top 20 INKA-scoring

913 kinases and their observed substrates in HCC827-ER3 non-small cell lung carcinoma cells.

914 EGFR and MET are central and highly connected nodes. AXL, associated with erlotinib

915 resistance of HCC827-ER3 cells, is missed by INKA score-based analysis as no substrate-centric

916 data is available for this kinase.

917

918 Supplementary Figure 6. INKA plot and top 20 INKA-scoring kinase bar graph for

919 HCC827 cells with mutant EGFR.

920 In the INKA plot proper, the vertical position of kinases (driver in red) is determined by their

921 INKA score, whereas the horizontal position is determined by the (im)balance of evidence from

922 kinase-centric and substrate-inferred arms of the analysis. EGFR and MET are implicated as the

923 most active kinases. Kinases not covered by PhosphoSitePlus (PSP) and NetworKIN (NWK) are

924 visualized in a flanking bar graph on the left. Note absence from the bar graph of the AXL kinase

925 that is responsible for erlotinib resistance in the HCC827-ER3 sub-line (see Fig 3c).

926

927 Supplementary Figure 7. Kinase-substrate relation network for top 20 INKA-scoring

928 kinases and their observed substrates in H2228 non-small cell lung carcinoma cells with an

929 EML4-ALK fusion.

930 Multiple highly active and connected nodes are present in the network for H2228, implying

931 relative insensitivity to inhibition of ALK alone, in line with previous functional data. Dual

42 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

932 inhibition of the number-5 hyperactive node, EGFR, and ALK results in significant reduction of

933 proliferation (Voena et al, 2013).

934

935 Supplementary Figure 8. MS intensity-based INKA analysis of data published by

936 Bensimon et al. on TiO2-captured phosphoproteomes from G361 melanoma cells following

937 radiomimetic treatment.

938 A INKA score bar graphs for G361 at baseline.

939 B-E INKA score bar graphs for G361 after 10 min (B), 30 min (C), 120 min (D), or 360 min

940 (E) treatment with 200 ng/ml neocarzinostatin (NCS).

941 F,G INKA score bar graphs for G361 after 30 min (F) or 120 min (G) treatment with 200 ng/ml

942 neocarzinostatin in the presence of 10 µM KU55933 (ATM inhibitor, ATMi). DNA

943 damage-induced kinases ATM, ATR, and PRKDC/DNA-PK are highlighted in red.

944

945 Data information: raw data (Bensimon et al, 2010) were normalized and averaged for replicate

946 treatment conditions.

947

948 Supplementary Table Legends

949

950 Supplementary Table 1. Mass-spectrometric and annotation data for phosphopeptides that

951 were captured and identified in phosphotyrosine-based immunoprecipitation experiments

952 on cell lines and patient tumors.

953 Pair-wise tables "phosphopeptides..." and "phosphosites..." harbor information from selected

954 columns of the MaxQuant modificationSpecificPeptides.txt and Phospho (STY)Sites.txt export

43 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

955 files, respectively. The latter list phosphosites inferred by MaxQuant on the basis of mass-

956 spectrometric evidence. Raw intensities were median normalized, and MS/MS spectral counts

957 were obtained with the help of the MaxQuant evidence.txt export file.

958

959 Supplementary Table 2. Coverage of protein kinases by various resources used in this

960 study.

961 A 538 established protein kinases from KinBase (http://kinase.com/web/current/kinbase)

962 B 489 kinases for which activation loop segment annotation is available from Phomics

963 (http://phomics.jensenlab.org)

964 C 349 kinases linked to substrate phosphorylation in experimental studies, provided by

965 PhosphoSitePlus (http://www.phosphosite.org/staticDownloads.action)

966 D 194 kinases linked to substrate phosphorylation through prediction, provided by NetworKIN

967 (http://networkin.info)

968 E 172 protein kinases that are not covered by both PhosphoSitePlus and NetworKIN

969 F 14 entries from PhosphoSitePlus that are not covered by KinBase.

970

971 Supplementary Table 3. Quantitative result tables for INKA analyses.

972 A-O Detailed are analyses for (A) K562 chronic myeloid leukemia cells, (B) SK-Mel-28

973 melanoma cells, (C) HCC827-ER3 non-small cell lung carcinoma cells; sub-line of

974 HCC827, (D) H2228 non-small cell lung carcinoma cells. (E) HCC827 non-small cell

975 lung carcinoma cells; parental line of HCC827-ER3, (F) H3255 non-small cell lung

976 carcinoma cells (Guo et al, 2008), (G) A204 rhabdomyosarcoma cells (Bai et al, 2012),

977 (H) MNNG/HOS osteosarcoma cells (Bai et al, 2012), (I) U87 glioblastoma cells (van

44 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

978 der Mijn et al, 2014), (J) U87-EGFRvIII glioblastoma cells (van der Mijn et al, 2014),

979 (K) U87-EGFRvIII glioblastoma cells treated with erlotinib (van der Mijn et al, 2014),

980 (L) tumor tissue biopsy from a hypopharyngeal cancer patient (Pt23) before treatment,

981 (M) tumor tissue biopsy from a hypopharyngeal cancer patient (Pt23) after two weeks of

982 erlotinib treatment, (N) tumor tissue biopsy from a pancreatic cancer patient (Pt24) before

983 treatment, (O) tumor tissue biopsy from a pancreatic cancer patient (Pt24) after two

984 weeks of erlotinib treatment.

985

986 Data information: For each given kinase, metrics are given for individual analyses (Kinome,

987 ActLoop, PSP, and NWK), accompanied by kinase-centric and substrate-centric INKA score

988 components (Kin = Kinome + ActLoop, Sub = PSP + NWK), INKA scores (Score = √ (Kin ∗

989 Sub)), relative INKA scores (Rel.Score) and skew parameters indicating relative contribution

990 from kinase-centric versus substrate-centric evidence (Skew = arctan (Sub/Kin) ∗ 2/휋). Skew

991 ranges from 0 (kinase-centric evidence only) to 1 (substrate-centric evidence only), so that a

992 skew of 0.5 indicates equal contribution.

993

994 Supplementary Table 4. Drug sensitivity data for top 5 INKA-scoring kinases in four

995 cancer cell line use cases.

996 A K562 driven by BCR-ABL. The data show high sensitivity of the cell line only to inhibitors

997 of ABL.

998 B SK-Mel-28 driven by BRAF. Drug inhibition data are also listed for BRAF and ERK-

999 activating MEKs in addition to top INKA-scoring kinases. Among the latter, ERK1 and

45 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1000 ERK2 exhibit limited sensitivity to inhibitors. Direct targeting of BRAF, or (upstream of

1001 ERK1/2) MEK1/2, is much more successful, in line with BRAF driver function.

1002 C HCC827 driven by EGFR. HCC827, the erlotinib-sensitive parental line of the erlotinib-

1003 resistant HCC827-ER3 sub-line, exhibits high sensitivity to EGFR inhibitors, but not MET

1004 inhibitors.

1005 D H2228 associated with EML4-ALK. The data show moderate sensitivity of H2228 to ALK

1006 inhibitors, but little efficacy of inhibitors targeting top INKA-scoring kinases.

1007

1008 Data information: data were derived from the Genomics of Drug Sensitivity in Cancer website

1009 (GDSC, http://www.cancerrxgene.org) and include cell line-specific IC50 values and Z-scores

1010 for relative sensitivity to a given drug compared to all other cell lines.

46 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A Lysate Tryptic digest Desalt Dry p-peptide capture Desalt LC-MS/MS Ti or O O m/z database search B Phosphopeptide Phosphosite PSP NWK Activation Upstream Upstream Kinome Loop Kinase Kinase

BCR PRPF4B SRC MET ABL1 MAPK1 ABL1 CDK1 ABL1 LYN LYN CDK3 MAPK14 ABL1 ABL2 CDK2 ABL2 FYN BTK PRPF4B MAPK3 HCK TTK ABL2 DYRK1B PKMYT1 KIT MAPK1 DYRK1A ERBB2 EGFR MAPK14 YES1 ABL2 NTRK1 LYN SRC RET MAP2K1 PEAK1 HIPK2 BCR ERBB2 MAPK3 HIPK1 JAK2 TYK2 BTK FYN MAP2K1 TEC

Top 20 kinases Top DYRK1B LYN ITK DYRK1A LCK SYK FLT1 TNK2 HCK MET MAPK1 YES1 PTK2 ALK FGR FYN GSK3B MAPK3 YES1 HIPK1 GSK3A MAP2K4 INSR DDR1 DDR1 EGFR PDGFRB

0 50 100 150 0 10 20 30 40 50 60 0 50 100 150 200 250 0 50 100 150 200 Spectral counts Spectral counts Spectral counts Spectral counts

INtegrative INferred Kinase Activity score (INKA) Statistics

permutation = ( + ) × ( + ) and randomization

∑INKA Plot∑ ∑ INKA Ranking∑ INKA Network

ABL1 CBL FYB ABL2 ITK SH2D2A BCR PLCG1 1 ABL1 250 FYN LYN INPP5D FCGR2C DOK1 DOK2 SYK LCK CRK MAPK1 BTK BMX 0.9 225 BCR−ABL FYN FCGR2A TEC BCR 0.8 200 SRC NTRK2 WAS ABL2 TNK2 PEAK1 ABL1 BTK MAP2K2 PPP1CA ABI1 FGR 0.7 175 MAPK3 LYN HCK GAB2 HCK STAT5A EGFR JAK2 MAPK3 NCK1 0.6 150 YES1 PRKCD PTK2 PTTG1IP MPZL1 MAP2K1 NTRK1 LCK G6PD ABL2 Score MAPK1 CDK2 SHC1 0.5 125 JAK2 CDK1 SRC BCR KIT INPPL1 STAT5B

Relative Score Relative TEC GAB1 0.4 100 PDGFRB TNK2 RET TYK2 LYN CDK3 TTK HIPK2 TTK CTTN PFN1 PRKACA 0.3 75 MAP2K6 FYN SYK MAPK14 PIK3R3 PDGFRA MAPK1 MAPK14 CBLB CD46 SRC MAP3K6 PTPRA 0.2 BTK 50 PIK3R2 MAPK3 HCK HIPK2 MAP2K4

YES1 FGR MAP3K5 MAP2K3 YES1 0.1 25 ARHGAP35 JAK2 ACTN1 AURKC TNK2 SYKTECTTK ALK MAPK14 GSK3AGSK3BHIPK2 PTK2 FGRTYK2 Kinase−CentAKT3 AURKBEPHB4 ricFER CDK5AURKA Equal SubstrBMXAKT1ate−Cent AKT2 ric 0 50 100 150 200 250 AXL Contribution BLK INKA score Kinase phosphopeptides Substrate phosphopeptides INKA

Kinome Activation loop PhosphositePlus NetworKin Combined Score p-value BCR PRPF4B SRC MET ABL1 1.0e-05 ABL1 MAPK1 WEE1 ABL1 ABL2 1.0e-05 A CDK1 ABL1 LYN LYN BCR 2.0e-05 CDK3 MAPK14 ABL1 ABL2 LYN 1.0e-05 bioRxiv preprintCDK2 doi: https://doi.org/10.1101/259192; thisABL2 version posted February 2, 2018. The copyrightFYN holder for this preprint (which was notBTK MAPK1 5.0e-05 PRPF4B certified by peer review) is the author/funder.MAPK3 All rights reserved. No reuse allowedHCK without permission. TTK FYN 3.7e-04 ABL2 DYRK1B PKMYT1 KIT SRC 1.7e-04 MAPK1 DYRK1A ERBB2 EGFR BTK 5.3e-04 MAPK14 YES1 ABL2 NTRK1 MAPK3 2.2e-04 LYN SRC RET MAP2K1 HCK 2.2e-04 PEAK1 HIPK2 BCR ERBB2 YES1 1.7e-02 MAPK3 HIPK1 JAK2 TYK2 LCK 3.4e-03 BTK FYN MAP2K1 TEC JAK2 5.6e-03 DYRK1B LYN LCK ITK TEC 3.5e-02 DYRK1A LCK SYK FLT1 TNK2 9.9e-02 TNK2 HCK MET MAPK1 TTK 1.1e-02 YES1 PTK2 ALK FGR SYK 6.9e-02 FYN K562 GSK3B K562 MAPK3 K562 YES1 K562 MAPK14 K562 1.7e-01 HIPK1 Kinome GSK3A ActLoop MAP2K4 PSP INSR NWK HIPK2 6.8e-02 DDR1 DDR1 EGFR PDGFRB FGR INKA 3.2e-02

0 50 100 150 0 10 20 30 40 50 60 0 50 100 150 200 250 0 50 100 150 200 0 50 100 150 200 250

p-value PRPF4B PRPF4B SRC ABL1 MAPK3 2.2e-04 CDK3 MAPK1 WEE1 MAP2K1 SRC 4.0e-04 B CDK2 PTK2 RET LYN PTK2 4.3e-03 CDK1 MAPK3 JAK2 TTK FYN 9.9e-03 MAPK1 DYRK1B MAP2K1 EGFR YES1 1.0e-02 PTK2 DYRK1A PKMYT1 ERBB2 JAK2 4.5e-03 MAPK3 MAPK14 ERBB2 FGR LYN 8.7e-03 DYRK1B HIPK2 BMX KDR INSR 5.7e-03 DYRK1A HIPK1 MET MAP2K2 MAPK14 1.5e-02 TNK2 YES1 PTK2 YES1 MAPK1 1.3e-02 MAPK14 SRC MAPK3 INSR LCK 1.5e-02 PEAK1 LCK FGR MAP2K3 HIPK2 1.9e-02 YES1 GSK3B MAP2K2 FYN IGF1R 1.9e-02 FYN GSK3A MAP2K4 MAP2K6 CDK1 5.0e-02 SYK FYN ABL1 KIT GSK3A 2.9e-02 HIPK2 INSR MAP2K6 ITK GSK3B 2.9e-02 HIPK1 IGF1R MAP3K6 MAPK3 FGR 1.3e-02 AXL SK−Mel−28 MAPK9 SK−Mel−28 MAP3K5 SK−Mel−28 SRMS SK−Mel−28 CDK5 SK−Mel−28 1.7e-02 GSK3B Kinome DYRK4 ActLoop MAPK14 PSP MET NWK EPHA2 1.1e-01 GSK3A DYRK2 CDK5 MAP2K5 TYK2 INKA 4.3e-02

0 10 20 30 40 50 0 10 20 30 40 50 0 20 40 60 80 100 0 10 20 30 40 50 60 70 0 5 10 15 20 25 30 35

p-value MET MET SRC MET MET 1.0e-05 EGFR PTK2 MET EGFR EGFR 1.0e-05 C NRK GSK3B ABL1 ERBB2 EPHA2 1.0e-05 EPHA2 GSK3A MST1R PRKCE MST1R 4.1e-04 PTK2 EPHA2 EGFR FYN ERBB2 1.0e-05 PEAK1 EGFR BMX EPHA2 PTK2 2.4e-02 FRK MST1R FYN MAPK3 FYN 5.7e-04 CDK1 PRPF4B RET INSR SRC 4.0e-04 MINK1 EPHB4 LYN YES1 EPHB2 1.1e-02 ROR1 AXL EPHA2 ABL1 GSK3B 3.5e-04 TNK2 MAPK8 INSR EPHB2 EPHB4 8.5e-03 AXL MAPK10 PTK2 TYK2 GSK3A 4.4e-04 GSK3B HIPK2 WEE1 EPHB4 MAPK3 5.8e-04 GSK3A HIPK1 MAPK3 NTRK1 LYN 2.2e-03 CDK3 EPHB2 MAPK1 EPHB1 YES1 6.0e-03 CDK2 DYRK1B FGR ITK ABL1 1.7e-03 EPHB4 DYRK1A LCK ERBB4 CDK1 2.3e-02 EPHB2 HCC827_ER3 MAPK12 HCC827_ER3 CDK1 HCC827_ER3 MAP2K1 HCC827_ER3 MAPK1 HCC827_ER3 3.2e-03 PRKCD Kinome MAPK1 ActLoop ALK PSP EPHA4 NWK HIPK2 INKA 4.4e-03 LYN JAK1 MAP2K1 ADRBK1 FRK 1.4e-02

0 50 100 150 200 0 20 40 60 80 100 0 100 200 300 400 0 100 200 300 400 500 0 100 200 300 400 500

p-value PRPF4B PRPF4B SRC EGFR PTK2 1.6e-03 ALK PTK2 RET MET SRC 1.1e-04 D CDK1 MAPK14 WEE1 ERBB2 ALK 7.0e-04 CDK3 GSK3B MET INSR LYN 7.3e-04 CDK2 GSK3A PKMYT1 ABL1 EGFR 4.5e-04 PTK2 MAPK3 ERBB2 LYN MET 4.0e-04 MAPK14 MAPK1 ALK EPHA1 EPHA2 6.3e-03 LYN DYRK1B BMX MAP2K1 FYN 2.3e-03 GSK3B DYRK1A MAP2K1 YES1 ABL1 1.4e-03 GSK3A EPHA2 JAK2 TTK MAPK3 9.1e-04 EPHA2 YES1 PTK2 FGR YES1 3.4e-03 TNK2 SRC ABL1 FYN GSK3A 1.3e-03 MAPK3 LYN FGR KDR GSK3B 1.5e-03 MAPK1 LCK EGFR KIT MAPK14 5.6e-03 DYRK1B HIPK2 INSR EPHA2 INSR 2.1e-03 DYRK1A HIPK1 MAP2K4 ITK MAPK1 8.0e-03 YES1 HCK MAP3K6 TEC LCK 1.0e-02 FYN H2228 FYN H2228 MAP3K5 H2228 MAP2K6 H2228 ERBB2 H2228 3.8e-03 PEAK1 Kinome ABL2 ActLoop MAP2K6 PSP MAP2K7 NWK ABL2 3.8e-02 EGFR ABL1 MAP2K3 FLT1 HCK INKA 1.6e-02

0 10 20 30 40 50 0 10 20 30 40 50 0 50 100 150 200 0 50 100 150 0 10 20 30 40 50 Spectral counts Spectral counts Spectral counts Spectral counts INKA score CBL Not in PSP/NWK K562 FYB ITK PEAK1 SH2D2A 1 250 PLCG1 ABL1 FYN DDR1 INPP5D FCGR2C DOK1 DOK2

A SYK HIPK3 LCK CRK 0.9 225 BTK BMX CDK17 BCR−ABL FCGR2A TEC AXL BCR DYRK4 0.8 200 NTRK2 WAS ABL2 TNK2 PEAK1 ABL1 FAM20B MAP2K2 PPP1CA ABI1 FGR TRIM28 0.7 175 LYN HCK GAB2 STAT5A EGFR JAK2 MAPK3 NCK1 0.6 150 PRKCD PTK2 PTTG1IP

MPZL1 MAP2K1

Score NTRK1 ABL2 G6PD MAPK1 0.5 125 CDK2 SHC1 CDK1 BCR SRC KIT

Relative Score Relative INPPL1 STAT5B GAB1 0.4 100 PDGFRB RET LYN TYK2 CDK3 TTK HIPK2 CTTN PFN1 PRKACA 0.3 75 MAP2K6

Top-20 Kinases Top-20 FYN MAPK14 MAPK1 PIK3R3 PDGFRA SRC CBLB CD46 0.2 BTK 50 MAP3K6 PTPRA MAPK3 HCK PIK3R2 YES1 LCK MAP2K4 0.1 25 MAP3K5 MAP2K3 YES1 JAK2 ARHGAP35 0 5 10 15 AURKC TNK2 SYKTECTTK ALK ACTN1 MAPK14 GSK3AGSK3BHIPK2 PTK2 FGRTYK2 AKT3 AURKBEPHB4 FER CDK5AURKA BMXAKT1 AKT2 AXL BLK Not in PSP/NWK SK−Mel−28 MAP2K4 MAP3K6 TEC AXL 1 MAPK3 ● 35 MAP2K1 PEAK1 STAT3 JAK2 B MAPK1 DYRK4 SRC GSK3B 0.9 32 MAP2K6 SGK223 MAPK14 PTK2 CDK5 MAP2K3 CDK17 MAPK3 0.8 28 ABL1 HIPK3 PXN YES1

MAP2K2 WEE1 RET 0.7 25 PKMYT1 MAP3K5

VCL

SYK TTK 0.6 21 GSK3A FGR CDK1

Score CDK2 ERBB2 bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not TYK2 0.5 18 MET certified by peer review) is the author/funder. All rights reserved. No reuse allowed without FYNpermission. LYN PTK2

Relative Score Relative PRKACA MAPK14 YES1 LCK 0.4 JAK2 14 INSR BMX LYN PEAK1 BCAR1 Top-20 Kinases Top-20 MAPK1 MPZL1 IGF1R 0.3 11 SRC ● LCK ITK EGFR HIPK2 IGF1R FYN CDK5 INSR KDR CDK1 PTPRA 0.2 FGR 7 TNK2 GSK3A/B EPHA2 TYK2 PDGFRA EPHA2 0.1 4 HIPK2 ABL2 PDGFRB 0 2 4 6 SYK HCK CAV1 AURKA ALK AKT2 AKT3 ABL1 AKT1 AURKB BMX

MAP2K2 Not in PSP/NWK HCC827−ER3 IRS2 PDK1 MST1R ACP1 PAK4 CRKL JAK2 PEAK1 GSK3A DAG1 1 MET 530 PRKCE LRP6 KDR ITGB4 AXL STK11 GSK3B C RET ROR1 FGR EPHA2 ITK GIT1 PDGFRB BMX 0.9 477 BCAR1 SGK223 TYRO3 ARHGAP35 MAPK3 INPPL1 TYRO3 PTK2B PTPN11 ADRBK1 CTNND1 TNK2 GPRC5A 0.8 424 RASA1 MET MAPK1 DYRK4 ACTN1 EPN2 HGS EPS8 JUP CBLB PTK2 GAK MAP2K1 RIPK1 SGK223 ERRFI1 0.7 371 FYN PXN CAMK2A BMPR2 YES1 CTTN ITGB5 YBX1 CBL STAM2 SHC1 PKN2 VCL USP6NL GAB1 STAT3 ABI1 BMP2K 0.6 318 GAB2 EGFR EGFR APP OSMR HSP90AA1 PKM OCLN PAK1 FAM20B ERBB2 PAG1 EPB41

Score CAV1 ERBIN COPS5 YAP1 BCR 0.5 265 CD46 STAT1 ABL1 PLCG1 NOTCH1 TJP1 CDK4

Relative Score Relative DOCK7 HSP90AA2P PTPN6 RAB5A LYN TNS3 CRK STAM 0.4 212 SRC SPRY2 ERBB3 TTK G6PD PEAK1 CDK1 VIM FRK NCK1 ABI2 MPZL1 0.3 159 PFN1 ARHGAP5 CDK2 EMD TEC PRKCD RAC1 CFL1 RGS3 PTPRA CABLES1 Top-20 Kinases Top-20 KIT 0.2 106 PBK EPHA2 GJA1 LAT2 EFNB2 CCDC88A WEE1 ERBB2 SDCBP PTK2 EPHB4 MST1R PKMYT1 EPHB2 TFRC HIPK2 0.1 GSK3AGSK3B EPHB4EPHB2 FYNSRC 53 PTTG1IP MAPK3YES1LYNABL1 EFNB1 0 10 20 30 FRK CDK1 HIPK2MAPK1ERBB4 WASL ARAF CDK2 MINK1 ERBB3EPHB3EPHA4EPHB1TTKLCKTYK2INSRALK AKT2 ALPK3MERTKTGFBR2JAK1MAPK10 MELKCDC42BPBMAPK14PDPK1 MAP4K4NEK1 EPHA3FERPAK2 IGF1RABL2PRKCGPRKCACSNK1A1PTK2BHCK JAK2EPHA5ADRBK1CDK5PRKCIPAK4FGR AKT1 ATR AURKA

Not in PSP/NWK ITGB1 H2228 ITK NCK1 CTNND1

PEAK1 CTNNA2 LCK 1 PTK2 50 TEC SGK223 SRC CDK2 CTNNA1 PEAK1 D DOCK4 NAMPT IRS2 AXL ABL2 ALK FYN 0.9 45 KIT HIPK3 PIK3R3 SGK223 PAG1 LYN HCK CDK17 GSK3A ABL1 MPZL1 PTPRA

0.8 40 LYN IRS1

EPHA2 PTTG1IP INSR MAP3K6 CAV1 EGFR CRK FGR 0.7 35 MAP2K6 BCAR1 PIK3R1 EPHA2 PXN PDGFRB INPP5D MET JAK2 MUC1 0.6 MAPK3 30 KDR GAB2 MAP2K3 CDK1 SDCBP FYN ABL1 SRC AP2B1 MAPK14 MAPK1 YES1 Score EGFR GAB1 GSK3A/B INPPL1 HIPK2 0.5 25 PTK2 MAPK3 MAP2K4 PDGFRA MAPK14 SHC1 Relative Score Relative INSR RET CD46 G6PD 0.4 20 MET GPRC5A MAP3K5 MAP2K1 TNK2 CTTN PRKACA BMX

Top-20 Kinases Top-20 LCK MAP2K2 0.3 MAPK1 ABL2 ERBB2 15 ERBB2 EML4 STAT3 CBL GSK3B YES1 HCK EPHA1 ITGB4 0.2 HIPK2 IGF1R JAK2 10 ARHGAP35 PTPRE PTPN11 FGR MST1R ALK TYK2 HSP90AA1 EPHA3 EPHB3 HSP90AA2P EPHB4 VCL 0.1 CDK5 5 MAP2K7 ACTN1 VASP TNK2 EPHA4 VIM 0 2 4 6 8 FGFR1 ERBB4 TNS3 AURKC TGFBR2 FRK SYK AKT3 AURKA AURKB AKT1 AKT2 Kinase−Centric Kinase−CentAXL ric Equal Substrate−CentBCR−ABL ric Counts Evidence Only Contribution Evidence Only Observed Inferred Non kinase Activation-loop Rel. INKA Score Substrate kinase kinase substrate phosphorylated NWK PSP Both

0 Color Key 1 Kinase bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Kinase phosphopeptides Substrate phosphopeptides INKA

EGFR PTK2 SRC ERBB2 EGFR PTK2 MET EGFR EGFR MET MET MAPK14 MET MET PTK2 EPHA2 GSK3B RET YES1 EPHA2 A MAPK14 GSK3A BMX FYN ERBB2 GSK3B PTK2B ABL1 EPHA2 SRC GSK3A PRPF4B MAP2K1 ABL1 GSK3A EPHB4 MAPK13 PTK2 MAP2K6 GSK3B EPHB3 EPHA2 GSK3B LYN FYN YES1 DYRK1B GSK3A ERBB4 YES1 PTK2B DYRK1A FYN MAP2K1 LCK PRKCD YES1 FGR KIT EPHB4 ERBB2 SRC MST1R FGR LYN EPHA4 LCK SYK PDGFRB EPHB3 DYRK1A FYN LCK KDR EPHA4 LYN EPHA1 EPHA2 TYK2 INSR FYN MAPK3 MAP3K5 NTRK1 ERBB4 CDK3 MAPK1 MAP2K6 ERBB3 ERBB3 H3255 H3255 H3255 H3255 H3255 MAPK3 CDK2 EPHB4 MAP2K4 EPHB4 INKA ranking CDK1 Kinome EGFR Activation Loop MAP2K3 PhosphoSitePlus LCK NetworKIN EPHA1

0 10 20 30 40 0 5 10 15 20 0 20 40 60 80 100 0 20 40 60 80 100 0 20 40 60 80

FGFR1 PTK2 SRC EGFR EPHA2 PDGFRA DDR2 RET ERBB2 PDGFRA CDK1 FGFR1 BMX LYN FGFR1 EPHA2 GSK3A WEE1 EPHA4 FYN B PTK2 DDR2 EPHA2 PKMYT1 ITK FYN DYRK1A ERBB2 EPHA2 GSK3A PTK2 PRPF4B MAP2K1 PDGFRA ABL1 PEAK1 HIPK1 FGFR1 MET JAK2 GSK3A MAPK14 JAK2 ABL1 EPHB4 EPHB4 MAPK1 EPHA2 YES1 EGFR DYRK1A MAPK3 MET MAP2K1 MAPK3 PRPF4B FYN PDGFRA TEC EPHB3 TNK2 AXL GSK3B FYN TYK2 HIPK1 TYRO3 GSK3A NTRK1 EPHA7 MAPK14 EPHB4 FYN MAP2K6 TNK2 JAK2 EPHA7 ABL1 EPHB4 FER EPHB3 DDR1 PTK2 TYK2 EPHA3 MAPK1 JAK1 EGFR EPHB2 MELK A204 A204 A204 A204 A204 ABL2 AXL HIPK3 MAP2K2 EPHB6 INKA ranking ABL1 Kinome ABL1 Activation Loop FGR PhosphoSitePlus ABL2 NetworKIN ALK

0 10 20 30 40 0 5 10 15 20 25 0 20 40 60 80 120 0 10 20 30 40 50 0 10 20 30 40 50 60 70

MET PTK2 SRC EGFR MET PTK2 MET MET MET PTK2 PDGFRA GSK3A BMX ERBB2 FYN C CDK1 EPHA2 RET FYN EGFR GSK3A PTK2 FGR EPHA2 MAPK14 FYN ABL1 ABL1 ABL1 EPHA2 HIPK1 MST1R LYN PDGFRA PRKDC DYRK1A EGFR EPHA2 GSK3A MAPK14 AXL FGR PDGFRA FGFR1 HIPK1 PRPF4B WEE1 KDR EPHB4 EGFR HIPK3 PKMYT1 ITK TEC DYRK1A FYN FYN ERBB3 TYK2 EPHB4 FGFR1 ERBB2 LCK ABL2 ALK AXL DYRK2 MAP2K1 MAP2K1 PRPF4B INSR FLT1 AXL ABL1 PEAK1 GSK3B SRMS BMX HIPK3 MAPK1 GSK3A PDGFRB BTK TEC FGFR1 MNNG MNNG ALK MNNG YES1 MNNG CDK1 MNNG DYRK2 MAPK9 LYN ERBB4 DDR2 ABL1 Kinome JAK1 Activation Loop EPHA2 PhosphoSitePlus ABL2 NetworKIN DYRK1A INKA ranking

0 10 20 30 40 50 0 10 20 30 40 0 20 40 60 80 0 20 40 60 80 100 0 20 40 60 80 100 120 Spectral Counts Spectral Counts Spectral Counts Spectral Counts INKA Score bioRxiv preprint doi: https://doi.org/10.1101/259192; this version posted February 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A U87 / U87-EGFRvIII B U87-EGFRvIII C Hypopharyngeal cancer patient 23 D Pancreatic cancer patient 24 E ATM SRC 7 30’ EGFR EGFR SRC 5x10 120’ MET EPHA2 YES1 MET 4x107 +ATMi SRC SRC PRKCD GSK3A 7 FYN FYN MAPK3 FRK 3x10

YES1 YES1 YES1 EPHA2 7 INKA score INKA 2x10 10’ GSK3A GSK3A FYN FYN 360’ 7 ABL1 ABL1 CDK2 EGFR 1x10 0’ 120’ 30’ MAPK3 MAPK3 EGFR LCK 0 LCK LCK LCK GSK3B GSK3B GSK3B GSK3A MAPK3 control ABL1 NCS +10miNCnS +30min NCS +30min EPHA2 EPHA2 PRKCD NCS +120miNCSn +360min NCS +120min MAPK1 MAPK1 ERBB2 FGR INSR INSR CDK1 LYN F PRKDC/DNA-PK CDK1 CDK1 GSK3B MET 5x107 CDK2 CDK2 LYN JAK2 120’ 4x107 +ATMi PTK2 PTK2 FGR PTK6 30’ JAK2 FRK 3x107 JAK2 CDK2 360’ FGR FGR JAK2 PTK2 7 10’ INKA score INKA 2x10 mutant Untreated ABL2 ABL2 ABL2 Untreated CDK1 Untreated wild type Erlotinib Erlotinib 7 MAPK14 MAPK14 INSR TYK2 Erlotinib 1x10 0’ 120’ 30’ 0 0 50 100 150 200 0 50 100 150 200 0 20 40 60 0 5 10 15 20 control

INKA score INKA score INKA score INKA score NCS +10miNCnS +30min NCS +30min NCS +120miNCSn +360min NCS +120min