bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 A multi-tiered map of EMT defines major

2 transition points and identifies vulnerabilities

3 Indranil Paul1, Dante Bolzan3, Heather Hook4, Ahmed Youssef1, Gopal Karemore5, Michael UJ 4 Oliphant6, Weiwei Lin1, Qian Liu7, Sadhna Phanse1, Carl White1, Dzmitry Padhorny10, Sergei 5 Kotelnikov10, Guillaume Andrieu8, Pingzhao Hu7, Gerald V. Denis9, Dima Kozakov10, Brian 6 Raught12, Trevor Siggers4, Stefan Wuchty3,11†, Senthil Muthuswamy6†, Andrew Emili1,2†*

7 Affiliations

8 1 Department of Biochemistry, Boston University School of Medicine, Boston University, 72 East 9 Concord Street, Boston, MA 02118, USA

10 2 Department of Biology, Charles River Campus, Boston University, Life Science & Engineering 11 (LSEB-602), 24 Cummington Mall, Boston, MA 02215 USA

12 3 Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 13 33146, USA

14 4 (a) Department of Biology, Boston University, 24 Cummington Mall, Boston, MA 02115, USA; (b) 15 Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA 02215

16 5 Advanced Analytics, Novo Nordisk A/S, 2760 Måløv, Denmark

17 6 Cancer Research Institute, Department of Medicine, Beth Israel Deaconnes Medical Center, 18 Boston MA 02115, USA

19 7 Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba 20 R3E 0J9, Canada

21 8 Team 2 - Institut Necker-Enfants Malades, INSERM U1151, Paris, France

22 9 Boston Medical Center Cancer Center, Boston University, Boston University, 72 East Concord 23 Street, Boston, MA 02118, USA

24 10 (a) Department of Applied Mathematics and Statistics, Stony Brook University, 11794 Stony 25 Brook, NY. (b) Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony 26 Brook, New York, 11794, United States

27 11 (a) Department of Biology, University of Miami, 1301 Memorial Drive, Coral Gables, FL 33146, 28 USA; (b) Sylvester Comprehensive Cancer Center, 1475 NW 12th Ave, Miami, FL 33136, USA

29 12 Discovery Tower (TMDT), 101 College St, Rm. 9-701A, University of Toronto, Toronto, Ontario, 30 M5G 1L7, Canada 31 32 † Co-communicating authors

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33 *Correspondence to: [email protected] 34

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

35 Summary

36 TGFβ mediated epithelial to mesenchymal transition (EMT) proceeds through hybrid ‘E/M’ 37 states. A deeper understanding of these states and events which regulate entry to and exit from 38 the E/M states is needed for therapeutic exploitation. We quantified >60,000 molecules across 39 ten time points and twelve omic layers in mammary epithelial cells. Proteomes of whole cells, 40 phosphoproteins, nucleus, extracellular vesicles, secretome and membrane resolved major shifts, 41 E®E/M and E/M®M during EMT, and defined state-specific signatures. Metabolomics identified 42 early activation of arachidonic acid pathway and an -mediated switch from Cytochrome 43 P450 to Cyclooxygenase / Lipoxygenase branches during E®E/M. Single-cell transcriptomics 44 identified GLIS2 as an early modulator of EMT. Integrative modeling-predicted combinatorial 45 inhibition of AURKB, PP2A and SRC exposed vulnerabilities at E®E/M juncture. Covariance 46 analysis revealed remarkable discordance between and transcripts, and between 47 proteomic layers, implying insufficiency of current approaches. Overall, this dataset provides an 48 unprecedented resource on TGFβ signaling, EMT and cancer. 49 50 Short title

51 Multi-level time-resolved deconstruction of EMT program 52 53

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

54 Introduction

55 Epithelial to mesenchymal transition (EMT) regulates cell plasticity during embryonic 56 development, tissue homeostasis and cancer, where polarized epithelial (E) cells dedifferentiate, 57 transition through intermediate E/M hybrid states (E/M) and acquire mesenchymal (M) 58 properties (Nieto et al., 2016). These E/M share several clinically-relevant attributes of 59 circulating tumor cells (CTCs) and are responsible for EMT-associated stemness, 60 chemoresistance, immune evasion and metastasis (Dongre and Weinberg, 2019). Complete 61 molecular characterization of E/M states, and the mechanisms driving E®E/M and E/M®M 62 transitions will enable development of refined mechanistic models and discovery of new 63 therapeutic targets. 64 Approximately 150 are currently described as hallmarks of EMT (MSigDB) 65 (http://www.gsea-msigdb.org/gsea/msigdb/) that were identified from studies measuring the 66 expression of ‘endpoint’ markers (e.g. CDH1, MUC1, VIM, FN1) to track the process (Sha et al., 67 2019). The prevalent view on EMT as a transcriptionally-driven program is debatable (Yang et 68 al., 2020). In fact, the poor correlation between genomic alterations or mRNA levels and 69 (sub)cellular proteoforms during EMT has been noted (Califano and Alvarez, 2017; Liu et al., 70 2016). This is crucial because EMT is likely an emergent property where the shifts in cell 71 physiology and phenotype are orchestrated by intra- and extra-cellular signaling (Sigston and 72 Williams, 2017), extensive receptor-ligand crosstalk, relocalization (Hung and Link, 73 2011) and metabolic adaptations (Tabassum and Polyak, 2015; Thomson et al., 2019). Attempts 74 made to model EMT primarily invoke -regulatory networks (GRNs) controlled by key 75 transcription factors (TFs) and miRNAs. These models are usually based on a restricted number 76 of factors and thus do not capture the multi-layered architecture of signaling during EMT (Hong 77 et al., 2015; Zhang et al., 2014). To date, no EMT-focused studies have simultaneously measured 78 metabolite and changes at different functional levels (e.g., mRNA, total protein, 79 nucleus, secretome, etc.). 80 Consequentially, several aspects of EMT remain unclear. This includes dependencies between 81 molecular layers, secreted molecules, and specific signatures at various stages of EMT, kinetics 82 and scope of metabolic reprogramming, dynamics of subcellular protein localizations and ligand- 83 receptor mediated intercellular crosstalk. To bridge these gaps, we employed multiple high- 84 throughput platforms including microarray, scRNAseq and precision mass spectrometry (MS) to 85 quantify molecules spanning 12 distinct layers of biological information. Our ability to integrate 86 this information allowed us to define signatures of E/M states and identify molecular regulators 87 of key transition points. 88

89

90

91

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

92 RESULTS

93 A comprehensive resource on TGFβ-induced EMT

94 The human mammary epithelial cell line MCF10A has been widely used to investigate TGFβ 95 (transforming growth factor β) induced EMT (Liu et al., 2019b; Wei et al., 2015). To generate 96 extensive temporally resolved quantitative expression maps of the evolving EMT landscape, cells 97 were treated with TGFβ (TGF-β1; 10 ng/mL) over 12 days (0, 4 hrs, 1–6, 8 & 12 days), spanning 98 multiple ‘omics’ layers and employing complementary technology platforms (Fig. 1A-B, Fig. S1A- 99 C). Across all time points, after stringent quality control (see STAR Methods), we report nanoLC- 100 MS/MS-based quantifications (Fig. 1C, Table S1) of 6,540 whole cell (WC), 4,198 nuclear (Nuc), 101 2,223 plasma membrane (Mem), 1,209 extracellular vesicle (EV) and 1,133 secreted (Sec) 102 proteins. Using a serial enrichment workflow (see STAR Methods) we also quantified the total 103 phosphoproteome (Phos; 8,741 high-confidence sites on 2,254 proteins; including 6975 Ser, 962 104 Thr and 140 Tyr residues), N-glycoproteome (Glyco; 549 proteins), acetylome (Acet; 349 sites on 105 165 proteins) and peptidome (Pep; 547 peptides from 202 proteins). For proteomics, samples for 106 each point were multiplexed using isobaric tandem mass-tag (TMT-10), enabling higher 107 throughput and robust comparisons. In addition to proteins, we tracked cellular metabolism in 108 the same samples by nanoLC-MS/MS-based untargeted metabolomics, quantifying 8,325 HMDB- 109 indexed endogenous small molecules (Metabol). Furthermore, we measured 23,787 gene 110 transcripts (mRNA) and 2,578 microRNAs (miRNA) using microarrays. To assess cellular 111 heterogeneity, we employed scRNAseq to quantify transcriptomes of 1,913 individual cells (>200 112 cells per time point) undergoing EMT. 113 In total, we provide temporal quantifications of >60,000 proteins, phosphosites, mRNAs, miRNAs, 114 and metabolites combined, in addition to 9,785 mRNAs in our scRNAseq dataset (Fig. 1C). 115 Subcellular enrichments were performed using previously established MS-compatible protocols 116 (see STAR Methods), yielding high purity determined through keyword matching against a 117 cellular compartment annotation database (Fig. S1D). Quantitative reproducibility across the 3 118 biological replicates was excellent (Fig. S1E). The expression profiles in Fig. 1D provide a 119 snapshot of concurrent changes of a given gene over various layers during EMT. We reproduced 120 expected expression behavior for many established markers of EMT, including an increase of M 121 markers VIM, CDH2 and concomitant decrease of E markers SCRIB, MUC1. Using strict criteria for 122 differential expression (Benjamini-Hochberg adj. p-value < 0.05; r2 ≥ 0.6 & |log2FC| ≥ 1; see STAR 123 Methods) we identified >10,000 significantly regulated molecules (Table S2). As such, this study 124 is the largest experimental description of EMT regulome induced by TGFβ, till date. 125 We found that molecular abundances were highly variable across layers and time-points (Fig. 126 S1F) and all layers contributed significantly to the overall variation in the system (Fig. 1E). While 127 each layer showed substantive alterations, the magnitudes (Fig. S1G), response profiles (Fig. 1F) 128 and fraction of differentially expressed molecules (Fig. 1G) of each omic data set during the time 129 course varied significantly. These findings provide direct evidence for global reorganization of 130 the cellular architecture during EMT. For example, Phos and Sec showed the highest fractional 131 change (36% & 40%, respectively) (Fig. 1G), suggesting extensive intra- and extra-cellular 132 signaling during EMT (Scheel et al., 2011). These results add new depths to our current

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

133 understanding of TGFβ signaling and EMT (Fig. 1H, I).

134

135 Integrative multi-tiered topology of EMT reveals key transition points and 136 identifies molecular drivers

137 Although E®M progression is considered a continuum, distinct stages have been reported for 138 MCF10A cells (Zhang et al., 2014), and cancer tissues (Liu et al., 2019a; Pastushenko et al., 2018). 139 Using a phylogenetic clustering approach (Hughes and Friedman, 2009) we estimated ‘distances’ 140 between time steps to understand the transition kinetics (Fig. 2A). We observed that up to 24 141 hours cells maintained their parental E phenotype, while day 2 marked a switch-like exit from 142 the E state, suggesting that at day 2 the E/M began and continued through Day 6, after which cells 143 gradually entered the M state. Cells in day 2 to 6 can be further sub-divided into E/M-1 (Day 2/3, 144 late E) and E/M-2 (Day 4/5/6, early M) states. These cellular reconfigurations were consistent 145 with principal component analysis of individual layers, such as Mem, Phos and WC (Fig. S2A). 146 Our observations suggest that each molecular layer has a distinct and complementary role in 147 shaping the transition (Fig 2B, Fig 1E-G, Fig S1F, G). Although, the topology was generally 148 correlated (Fig. 2C), pairwise coefficients of determination (adj. R2) revealed a strikingly low 149 concordance between total proteins and other proteomic layers (mean R2 ranging from 0.015 for 150 Acet to 0.299 for Glyco) (Fig. 2D, Upper panel). The discordance was even stronger between 151 mRNA and various proteomic layers (mean R2 ranging from 0.011 for Pep to just 0.109 for WC) 152 (Fig. 2D, Lower panel), which increased further with EMT progression. Collectively, these 153 observations illustrate that at a systems level mRNA quantity is a poor proxy of protein 154 abundance. Total protein or mRNA quantities are also unreliable predictors of a proteins PTMs 155 or subcellular trafficking, which ultimately determines signaling output. 156 Next, to model the regulatory patterns of molecules, we employed a 3-step computational 157 workflow. First, we used a regression strategy for time-course measurements (Conesa et al., 158 2006) to remove residual noise (time dimension) and non-reliable (replicates) components. 159 Second, an unsupervised machine learning approach (self-organizing maps, SOMs) (Wirth et al., 160 2012) was applied on the retained molecules to generate time-point specific SOM portraits (Fig. 161 2E, Fig. S2B-D, see STAR Methods for details). SOMs are a powerful integration tool for large 162 and diverse global datasets to extract fundamental underlying patterns of co-regulation (Lu et al., 163 2009; Tamayo et al., 1999). Third, we performed pathway enrichments (Ulgen et al., 2019) of 164 these SOMs to elucidate the functional themes at each time-step of EMT (Fig. 2F). 165 SOMs traced the temporal unfolding of the transition and provided molecular fingerprints of each 166 time-step. Molecules ranking high (top 1% i.e., rank ≤ 250 & log2FC ≥ 1) with SOMs for control to 167 day 1 (430 unique molecules) primarily included proteins characteristic of an E state (Fig. 2E, 168 Table S3), such as, ACTG1 (Mem; establishes cell junctions and shape) and EPHB2 (Glyco; 169 regulates angiogenesis, contact dependent adhesion and migration; tumor suppressor). SOMs for 170 days 6–12 (252 unique molecules) contained established M markers (e.g., CALD1, CDH2, FLNA, 171 FN1, FSTL1, LGALS1, NT5E, TAGLN, TPM1, TPM2, TPM4, VIM). This group included molecules 172 with established roles in cancer and EMT, including miR-5189 (miR; targets ARF6 which

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

173 internalizes CDH1), CALR (Acet; Lys 62; calcium homeostasis, promotes metastasis, imparts 174 resistance to anoikis), ALCAM (Nuc; prognostic marker in multiple cancers linked to nuclear 175 translocation of β-catenin and stemness), ARHGAP33 (Sec; regulates intracellular trafficking), 176 RAPGEF5 (Sec; promotes nuclear translocation of β-catenin) and SOCS3 (Phos; Thr 3/13; E3 177 ligase, E3 ligase inhibiting TGFβ signaling). The onset of the major regulatory switch at day 2 is 178 likely driven by molecules peaking at its corresponding SOM (95 molecules) such as miR-675- 179 5p/H19 (miR; induces HIF1α, SNAIL activity), LTBP1 (Sec; master regulator of integrin- 180 dependent TGFβ activation) and CD44 (Sec; signal transduction). Notably, many of the 181 genes/molecules identified in the SOM, to our knowledge, have not been previously linked to 182 TGFβ signaling or EMT, presumably because they do not show clear changes in mRNA or WC but 183 in ‘other’ molecular layers, such as EV or Sec (Fig. S2E). This contrasts with most MSigDB 184 hallmarks of EMT for which we captured clear transcriptional profiles (Fig. S2F). 185 Enrichment analysis of active subnetworks (Ulgen et al., 2019) identified 237 significant 186 pathways (Fig. 2F, Table S3), discretized across sequential steps of EMT. For example, ‘Beta 187 oxidation of hexanoyl-CoA to butanoyl-CoA’ declined as cells leave E and enter E/M, indicating 188 reprogramming of mitochondrial fatty acid β-oxidation, consistent with a metastatic phenotype 189 (Ma et al., 2018). Conversely, ‘RHO GTPases mediated activation of ROCKs/PAKs/IQGAPs’ 190 increased as cells leave E/M and enter M, suggestive of their key role at this stage of EMT 191 (Ungefroren et al., 2018). The E/M specifically were associated with migration-associated 192 pathways such as ‘anchoring fibril formation’, ‘ECM proteoglycans’ and ‘laminin interactions’, 193 consistent with their shared property with CTCs. 194 Overall, we catalogued complex kinetics of thousands of molecules spanning multiple molecular 195 layers during EMT. Importantly, we predict signatures specific to each stage of EMT, e.g., the E/M, 196 with potential clinical value.

197

198 Metabolomics reveals kinetics and predicts novel enzyme-metabolite associations

199 TGFβ is known to regulate Warburg effect in cancer cells, but may also regulate other metabolic 200 pathways with implications for cancer management (Hua et al., 2019). Active subnetwork 201 analysis of WC and Phos datasets identified 13 enriched KEGG metabolic pathways during TGFβ- 202 induced EMT (Fig. S3A). Notably, significant enrichment (p–value ≤ 0.05) was only observed post 203 day 2 for pathways such as Steroid biosynthesis (SHB), Sphingolipid metabolism (SLM) and 204 Glycosaminoglycan degradation (GGM). To evaluate these gene-centric inferences of metabolic 205 phenotypes, we directly profiled intra-cellular small molecules by applying an optimized 206 untargeted metabolomics workflow to the same set of samples. Using stringent criteria (see STAR 207 Methods), we quantified >8,000 putative HMDB-compounds covering a wide range of chemical 208 classes (Fig. 3A). 209 Phylogenetic clustering (Fig. 3B) and SOM analysis (Fig. 3C) revealed kinetics of metabolic 210 remodeling prompted by TGFβ. In particular, a distinct temporal separation from parental E cells 211 was observed as early as 4 hours indicating rapid modulation of cellular metabolism by TGFβ, 212 preceding changes in most other layers. To glean further insights, we performed integrative

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

213 network analysis of metabolite SOMs with differential molecules in WC and Phos using 214 MetaboAnalyst (Chong et al., 2019) (Fig. 3D). This analysis reiterated several enriched pathways 215 predicted with protein expression alone, e.g., glycerophospholipid metabolism (GPLM) and lysine 216 degradation (LD). However, integration of metabolite and protein data within the framework of 217 metabolite SOMs revealed pathway activities representative of key transition steps of EMT driven 218 primarily by corresponding metabolite signatures (Fig. S3B-D). Indeed, we observed that 219 arachidonic acid metabolism (AAM), GPLM and LD pathways were activated within 4 hours of 220 TGFβ stimulation, which was not captured by gene enrichment. Consistent with observations in 221 Fig 2F, processes such as fatty acid metabolism, SHB and SLM appear after day 2, as cells prepare 222 for a metastatic phenotype (Koundouros and Poulogiannis, 2020). 223 Pairwise-correlation based integration of metabolite profiles with proteomics measurements 224 from the same samples could enable mechanistic predictions and aid discovery of novel players. 225 To explore this, we chose AAM as an example. Arachidonic acid is an omega-6 fatty acid stored as 226 membrane phosphoglycerolipid. Its cytosolic release enables stoichiometric chain reactions and 227 results in >100 functionally diverse compounds (Hanna and Hafez, 2018) impacting processes 228 such as redox state, proliferation, apoptosis and chemotaxis (Tallima and El Ridi, 2018). We 229 computed Pearson’s correlations between abundance of annotated , in WC and Phos, and 230 metabolites of AAM pathway, in Metabol, quantified at SOM for 4 hours to day 1 (Fig. 3E, F). 231 Overall, we found that metabolites mapping to AAM either increased linearly (CYP450 branch 232 and thromboxane) or initially decreased before rising (Lipoxygenase branch and prostaglandin) 233 (Fig. 3G), suggesting fine-tuned regulation of the different branches. Encouragingly, our enzyme- 234 metabolite association map identified PLA2G4A, the rate-limiting phospholipase of AAM pathway 235 (Hanna and Hafez, 2018), among the top candidates (Fig. 3F). Interestingly, correlation and 236 expression profiles of PLA2G15, another phospholipase, (Fig. 3E-G) indicated a potential 237 enzyme-mediated switch from Cytochrome P450 to Cyclooxygenase/Lipoxygenase branches 238 during EàE/M. 239 Overall, these observations provide direct evidence for the kinetics of metabolic reprogramming 240 during TGFβ induced EMT. We found metabolites and protein signatures coordinating processes 241 such as AAM, GPLM and LD during key stages of the transition. We also demonstrate how our 242 enzyme-metabolite correlation map could be used as a resource to predict novel enzymes for 243 observed metabolic changes.

244

245 scRNAseq analysis reveals heterogeneous responses to TGFβ and novel 246 transcriptional regulators of EMT

247 scRNAseq studies in murine epithelial Py2T cells treated with TGFβ (Krishnaswamy et al., 2018), 248 MCF10A cells undergoing confluence-dependent EMT (McFaline-Figueroa et al., 2019) and LPS- 249 induced EMT in alveolar epithelial cells (Riemondy et al., 2019), provided critical insights into 250 EMT. However, a temporal analysis of EMT to understand the transition states has not been 251 reported. 252 After quality control, we retained 1,913 single cells with a combined depth of 9,785 genes (Fig.

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

253 S4A, Table S1). As anticipated, many of the top expressing genes (TGFB1, TPT1, KRT6A, TMSB10, 254 MT2A) are key players in EMT (Fig. S4B). Interestingly, similar to many primary human tumors 255 (Aiello et al., 2018; Puram et al., 2018), we did not observe an explicit loss of several classical E 256 markers at the transcript level (Fig. 4A, Fig. S4C), indicating post-transcriptional regulation. This 257 may be energetically economical for tumor cells by conferring invasive properties without losing 258 capacity for metastatic seeding (Lambert et al., 2017). 259 To understand the stages of cell differentiation, we took advantage of the multiple time points in 260 our dataset (as opposed to a ‘pseudo-time’), using Monocle3 (see STAR Methods) and identified 261 20 cell clusters in 3 disjoint partitions (Fig. 4B, Fig. S4D). Partition P2 (12 clusters) represents 262 the primary EMT axis while P1/P3 predominantly expressed genes related to cell cycle (Fig. S4E) 263 and were ignored for further analysis. We observed that C3 responded strongly to TGFβ (Fig. 4C), 264 C4/6 resisted EMT, C5/C8 are the ‘transition’ states and C13/14 represented terminal M cells (in 265 terms of hallmark M markers; Fig. 4C, right panel). Examining clusters C9/18/19 which were 266 composed of cells from nearly all time points, suggests presence of stable M-type cells in MCF10A 267 populations and appears to be transcriptional impasse to TGFβ signaling. 268 To explore the underlying gene expression program, we used hierarchical clustering to group the 269 individual clusters into 6 subtypes (Fig. 4C) and employed SCENIC (Aibar et al., 2017) to infer 270 TFs and GRNs underlying these subtypes (Fig. D-F; Table S5). For each subtype, we identified 271 several unique (Fig. 4E) or highly active TFs (Fig. 4F), including both established and novel 272 players. Several TFs implicated in EMT (TWIST2, FOXK2, ZEB1, ID2, MSX1, ING4) were over- 273 represented in S4-S6, corresponding to later stages of EMT. In contrast, direct evidence of 274 mechanistic links of TFs enriched in early stages of EMT (i.e., S3, S4) are lacking, indicating gaps 275 in current EMT models. Using human TF–binding arrays (see STAR Methods, Fig. 4G) we 276 confirmed elevated activity of three S3 TFs, GLIS2, SP1 and ZNF266 upon TGFβ induction (Fig. 277 4H). In addition to providing experimental evidence to our predictions, the TF-binding array also 278 identified several other TFs potentially playing important roles at the early stages of EMT (Fig. 279 4I). 280 Overall, we provide a high-resolution temporal map of gene expression programs of individual 281 cells as they respond to TGFβ signaling and undergo EMT. More generally, our data suggests that 282 most transcriptional changes occur at early time points (also Fig. 2D, Lower panel), followed by 283 further adaptations driven predominantly by secondary or post-transcriptional mechanisms.

284

285 Spatial regulation of proteins and inter-cellular communication during EMT

286 Regulation of protein distribution is a crucial signaling mechanism (Ferrell, 1998), but remains 287 insufficiently understood in EMT. The average pairwise Pearson’s correlation between proteins 288 quantified in various cellular compartments (CCs) ranged between r = 0.12 to 0.58 indicating fine- 289 tuned regulation of protein distributions (Fig. 5A). We found 3,965 proteins localizing to ≥2 CCs 290 (Fig. S5A), which we categorized into 2 classes as follows: Class I proteins (1,424) displayed a 291 correlated trend (r ≥ 0.4) consistently across all CC pairs suggesting regulation primarily at or 292 before the translation step (Fig. 5B, C; Table S5). Class II proteins (1,205) displayed anti-

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

293 correlated trends (r ≤ –0.4) between any two CCs, implying active post-translational control of 294 their asymmetric distributions (Fig. 5D, E; Table S5). 295 The Bromodomain and ExtraTerminal (BET) cofactors, BRD2 and BRD4, were identified as Class 296 II proteins, with contrasting kinetic profiles (Fig. 5F). Using TF-binding arrays, we verified 297 enhanced global recruitment of BRD4, but not two other cofactors p300 and LSD1, (Fig. 5G) 298 indicating its importance during EMT. Indeed, treatment with a pan-BET inhibitor JQ-1 299 suppressed EMT (Fig. 5H). Another notable Class II protein was SCRIB (Fig. 5I), which regulates 300 apical-basal polarity and directional migration by acting as a molecular scaffold through protein- 301 protein interactions (Bonello and Peifer, 2019). Using an in vivo proximity ligation (BioID) screen 302 of SCRIB ((Fig. S5B, see STAR Methods), we identified multiple novel interactors (Table S6) of 303 which many have known roles in EMT (Fig. 5J). Using immunoprecipitation, we verified 304 interactions of SCRIB with SNAP23 and ARHGEF7 (Fig. 5K). 305 Regulation of extracellular signals is important for EMT (Scheel et al., 2011). Subtypes within a 306 cell population can differ in their capacity to send and receive signals, with implications for 307 metastasis and drug resistance (Kim et al., 2018; Tabassum and Polyak, 2015). To map the inter- 308 cellular communication between subgroups of cells during TGFβ-induced EMT, we integrate 309 proteomics and scRNAseq data to perform a systems-wide survey of ligand-receptor (L-R) pair 310 mediated crosstalk (Fig. 5L). 311 First, using a database of >2,500 curated binary L-R interactions (Ramilowski et al., 2015), we 312 searched for pairs of L and R in our Sec and Mem/Glyco datasets, respectively, assuming that co- 313 directional expression changes in L and/or R of a pair (FDR adj. p-value < 0.05 and combined L-R 314 |log2FC| ≥1) can indicate biological role. Currently, at least two L-R pairs are implicated in TGFβ- 315 induced EMT in several tissue models (Heldin et al., 2012). Our analysis detected 67 upregulated 316 and 12 downregulated L-R pairs at any given time point (Fig. 5M, Table S7) following TGFβ 317 treatment. Notably, none of these pairs have been directly implicated in TGFβ signaling or EMT, 318 although individually many of the identified L or R occur frequently in the context of EMT and/or 319 cancer. For instance, LAMC2 with its 7 receptors (CD151, COL17A1, ITGA2, ITGA3, ITGA6, ITGB1, 320 ITGB4) exhibit significant alteration during EMT. LAMC2 is overexpressed in cancers (Garg et al., 321 2014) and its silencing can reverse EMT (Moon et al., 2015; Pei et al., 2019). The cognate 322 receptors, CD151, ITGA3, ITGA6 and ITGB1 synergize with TGFβ signaling to promote metastatic 323 behavior (Pellinen et al., 2018; Sadej et al., 2010; Shirakihara et al., 2013; Zhang et al., 2017). 324 Next, by systematically comparing the expression patterns of L & R (identified above) among the 325 16 clusters (identified in our scRNAseq dataset), we obtained cell-cell communication networks 326 (sender ® receiver) (Fig. 5N, Fig. S5C, Table S7). For example, C13 cells, which appeared at day 327 3 and showed highest M genes expression (Fig. 4C), produced the receptor CD44 to its cognate 328 ligand MMP7 expressed by C17 (Fig. S5D). Together, this suggests that communication between 329 cell subgroups (C17:MMP7 ® C13:CD44) may exist during EMT which can have potential 330 ramifications for tumor growth (Tabassum and Polyak, 2015). Encouragingly, several of these 331 identified L-R pairs showed strong correlation (Fig. 5O) in human breast invasive carcinoma 332 samples (Cancer Genome Atlas Network, 2012; Cerami et al., 2012). 333 Our study provides a comprehensive analysis of TGFβ-triggered subcellular trafficking as cells

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

334 undergo EMT. Such translocations, potentially driven by differential PPI, could mediate the tight 335 coordination between functional modules (e.g., SCRIB complex) and EMT phenotypes such as 336 cytoskeletal rearrangement (Huang et al., 2012). We also uncovered novel cell-cell 337 communication pathways via L-R interactions in driving EMT, representing an untapped clinical 338 opportunity.

339

340 Modeling phosphoproteome dynamics during EMT reveals susceptibilities

341 Phosphoregulatory mechanisms are a key aspect of TGFβ signaling and EMT. We confidently 342 quantified 8,741 phosphosites (p-sites; 6,975 Ser, 962 Thr and 140 Tyr residues) (Fig. S6A-C) 343 over a dynamic range of 106 orders of magnitude (Fig. S6D), and phospho–STY frequencies (Fig. 344 S6E) in line with previous reports (D’Souza et al., 2014) mapping to 2,254 proteins (Fig. S6F). Of 345 all p-sites, 3,138 (35.8%) were differentially regulated in at least one time point (Fig. S6G). At 346 protein level, different patterns of regulation were noted; some proteins, such as DEK, VIM and 347 MISP, were regulated at ~90% of detected sites, some proteins, such as CAV1, CAMK2 and 348 GOLGA1, showed a ~50% mixture of regulated and unregulated sites, while others such as 349 AHNAK, PML and BCLAF1 showed ~2% differential sites (Fig. 6A). Interestingly, the fraction of 350 regulated p-sites in several proteins, e.g., VIM, increased with EMT progression (Fig. 6B). 351 We observed that in ~50%, p-sites dynamics were not explained (r ≤ 0.4) by a corresponding 352 change at the protein level (Fig. 6C, D). Interestingly, for ~26%, a directionally opposite change 353 between phosphorylation and the corresponding protein abundance was noted (r ≤ –0.1), 354 suggesting effects on protein stability. Phospho-regulated proteins were enriched for ‘nucleus’, 355 ‘’ and ‘focal adhesion’ annotations (Fig. 6E) reflecting the importance of CC 356 remodeling during EMT. 357 We also computed correlations between nuclear localization profiles (Nuc) and phosphorylation 358 kinetics (Phos) of individual proteins (Fig. 6F, G). For instance, phosphorylation of MICAL3 at 359 T684 and S685 regulates CSCs by promoting symmetric division (Tominaga et al., 2019). The 360 pattern of phosphorylation at residues T684, S685 and S687 (Fig. 6D, MICAL3) which are located 361 at the consensus NLS motif of MICAL3 suggests a role in regulating nuclear translocation and/or 362 retention at E/M®M transition. Indeed, the bipartite NLS motif of MICAL3 interacts with 363 Importin-α and the p-sites T684, S685, S687 are directly adjacent to the binding interface (Fig. 364 6H). 365 To analyze stage-specific kinase activities we generated a ternary model which distinguishes 366 active into 3 broad stages (i.e., E, E/M, and M) of EMT (Fig. 6I, Fig. S6H). An example that 367 illustrates the utility of this model are AKT isoforms that have distinct and opposing roles during 368 cancer development (Hinz and Jücker, 2019). We predict AKT1 is strongly associated with the E- 369 stage, which is consistent with its role in maintaining the E phenotype (Li et al., 2016). In fact, 370 depletion of AKT1 in MCF10A cells promoted TGFβ-induced E®E/M transition (Iliopoulos et al., 371 2009). Our model further predicts key roles for AKT2 and AKT3 at E/M®M transition. Indeed, 372 AKT2 and AKT3 were associated with tumor invasiveness, stemness and sensitivity to drug 373 treatment (Chin et al., 2014), key characteristics of the E/M populations (Dongre and Weinberg,

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

374 2019). Among several other kinases (Fig. S6J), our ternary model predicts critical roles for PRKCA 375 and AURKB at the junction of E®E/M and E/M®M transitions. PRKCA is reportedly a hub and 376 therapeutic target for EMT-induced breast CSCs (Tam et al., 2013). Similarly, inhibition of AURKB 377 was found to reverse EMT and reduce breast cancer metastasis in vivo (Zhang et al., 2020). 378 Overall, we reveal the rich intricacies of the phosphoregulome during EMT, identified functional 379 p-sites, predict novel kinase susceptibilities, and provide a mechanistic framework to enhance 380 understanding of the signaling mechanisms during EMT.

381

382 Integrative systems causal model of EMT identifies mechanistic vulnerabilities

383 Systems biology approaches that combine multiple molecular types (proteins, miRNAs, 384 metabolites) into a framework of established knowledge allow for a rich assessment of a 385 biological context (Hawe et al., 2019; Karczewski and Snyder, 2018). Using experimentally 386 validated functional priors (compiled from ENCODE, PhosphoSitePlus, SignaLink 2.0, SIGNOR 2.0, 387 HINT, miRTarBase and MetaBridge), we combined causal inference and PCSF (Prize Collecting 388 Steiner Forest) (Akhmedov et al., 2017) to construct hierarchical mechanistic models of the EMT 389 program (Fig. 7A, see STAR Methods). The final ‘EMT network’ comprised of 3,255 edges 390 connecting 2,217 molecules, including 723 kinase/phosphatase–substrate, 1,407 TF–target, 746 391 miRNA–target and 31 metabolite–gene interactions. 392 One of the potential applications of the EMT network is to discover signaling paths from 393 TGFBR1/2 to any gene(s) of interest within the network. As a demonstration, we queried several 394 genes (FN1, MMP7, CD44, SCRIB, TWISTNB, ZEB1, SNAI2) and recovered previously known and 395 unknown paths to them putatively active at multiple stages of EMT (Fig. 7B). Next, to identify key 396 factors driving EMT, we extracted ‘controller’ nodes (Liu et al., 2011; Vinayagam et al., 2016) 397 exerting a significant influence on EMT network topology (see STAR Methods). Survival analysis 398 against a large publicly available dataset of primary breast cancers with long-term patient 399 outcomes (Cancer Genome Atlas Network, 2012) showed a significant association between 400 tumors with altered expression of these controllers and shortened overall survival (Fig. 7C). Not 401 surprisingly, a few of them are established key regulators of metastasis (Fig. 7D), providing 402 independent support to our model. Unlike controllers, however, the non-controller nodes were 403 poorly represented in EMT literature (Fig. 7E), again highlighting gaps, and potentially 404 identifying new regulatory processes. 405 We queried the EMT network to identify the signaling context in which these controllers are 406 active at various key stages of EMT, which could also provide clues into mechanistic 407 vulnerabilities (Fig. 7F). As cells are stimulated with TGFβ, the TFs SMAD2 and SMAD3 are 408 activated, as expected. Another early responder was RHO GTPase RAC1, an effector of both KRAS 409 (Wu et al., 2014) and TGFβ signaling (Ungefroren et al., 2018), suggesting potential crosstalk. The 410 downstream effector of RAC1, MAPK14 (p38 MAPK), was also regulated early in EMT, suggesting 411 cooperation between RAC1 and MAPK pathways (Santibáñez et al., 2010). Our model suggests 412 SMAD3 regulates two other TF hubs, CEBPB (CCAAT/enhancer-binding protein β) and FOXA1. 413 Loss of CEBPB reportedly switches TGFβ signaling from growth-inhibiting to EMT-inducing

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

414 (Johansson et al., 2013), while FOXA1 is reportedly a key TF during EMT (Wang et al., 2013). We 415 observed that STAT3 is suppressed at later stages of EMT. A recent study in KRAS-driven lung 416 and pancreatic cancer found that STAT3 is required for maintaining the E state and is lost during 417 acquisition of M phenotypes (D’Amico et al., 2018). 418 The EMT network directly predicts novel avenues for blocking EMT. To assess this, we performed 419 a morphometry-based screening where we treated MCF10A cells with TGFβ in combination with 420 drugs which were predicted to inhibit several of the controller nodes active at E®E/M transition 421 (Fig. 7G, see STAR Methods). Our analysis using a custom-built and publicly accessible image 422 analysis software (Ochs et al., 2019) (Fig. 7G) showed noticeable efficacy of LB100+Barasertib, 423 LB100+PP1 and Sonidegib+Autocamtide in reverting the elongated (= ‘eccentric’) phenotype of 424 EMT-induced cells (Fig. 7H), thus providing direct experimental evidence to our predictions. 425 Overall, our EMT network recapitulates known signaling pathways, uncovers novel routes of 426 information flow to known regulators of EMT, identifies new signaling players and pathways, 427 makes substantial and provocative novel predictions and reveals cohesive time-resolved 428 regulatory patterns and mechanistic links between both controllers and non-controllers. 429

430 DISCUSSION

431 EMT and cancer are emergent systems (Sigston and Williams, 2017) wherein progression through 432 various stages is regulated by intricate networks of intra- and extra-cellular signaling within and 433 between cells. The key to understanding such complex biological phenomena are establishing 434 experimental workflows that integrate multiple tiers of biological information (Karczewski and 435 Snyder, 2018). 436 Theoretical discussions on EMT are often guided by the Waddington metaphor of a ball (=cell) 437 rolling over a phenotypic landscape, which is dynamically shaped by multiple parameters: 438 topologies of signaling networks, molecular stochasticity, extraneous cooperating and opposing 439 forces (e.g., EMT inducing and/or inhibitory ligands, interaction with other cells) (Li and Balazsi, 440 2018). By integrating several molecular layers, SOMs and ‘neighbor joining’ approaches revealed 441 the kinetics of cell-fate transitions and major phenotypic switch points driven by TGFβ. Several 442 studies have indicated molecular and phenotypic granularity in EMT continuum and suggested 443 existence of discrete metastable E/M (Sha et al., 2019). Many current EMT markers are biased 444 toward the later stages of EMT, when the process is approaching completion (Song et al., 2019). 445 Consequently, the molecular nature of E/M is still poorly understood. Our time-course integrative 446 SOM allowed us to trace the temporal unfolding and uncover the molecular nature of the E®M 447 transition. 448 While previous omics studies have measured either one or two molecular layers to describe EMT, 449 our multi-tiered datasets enabled the discovery of several new aspects of EMT which were not 450 captured by previous approaches. For example, the correlation between mRNA and proteins were 451 weak. Strikingly, correlations between the various proteomics layers were also found to be quite 452 poor, indicating that systems behavior cannot readily be extrapolated by any single layer (e.g., 453 mRNA or total proteome), but instead needs an integrated analysis of several molecular layers.

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

454 We show that TGFβ-induced EMT is only partially driven by transcription, where several E genes 455 are repressed only post-transcriptionally, while transcripts of M genes are upregulated but 456 mostly during earlier time points. At later stages, post-translational mechanisms become more 457 prominent in driving the process, suggesting that the regulatory control of EMT may be more 458 flexible than previously appreciated. Similarly, our results also predict the mechanistic 459 importance of protein subcellular localization during EMT. A comparison of proteins detected in 460 EV, Sec, Glyco, Mem and Nuc indicated extensive regulation of protein localizations. Further, 461 previous studies on EMT have largely focused on cell-autonomous signaling, whereas multiple 462 inter-cellular signaling mechanisms are evident from our integrative analysis. Indeed, we found 463 both EV and Sec to be extensively regulated during EMT. Once considered as cellular ‘garbage 464 bins’, their active participation in signaling and crosstalk is increasingly recognized (H. Rashed et 465 al., 2017). Our results demonstrating limited qualitative overlap between EV and Sec hint at a 466 potentially overlooked mechanistic distinction and provides opportunities for new biological 467 insights. We utilized our Mem, Glyco and Sec datasets to provide a repertoire of 79 putative new 468 L-R pairs with potential roles during EMT, which should be validated in future studies. By 469 combining these results with the scRNAseq profiles, we were able to define an extensive network 470 of inter-cellular communications. 471 In conclusion, we have established a comprehensive multi-tiered molecular landscape of TGFβ- 472 induced EMT. This study aims to provide a valuable resource which is accessible through an 473 interactive website (https://www.bu.edu/dbin/cnsb/emtapp/) (Fig. S7) and will strongly 474 complement hypothesis-driven research with direct implications for epithelial cancers. 475

476

477

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

478 REFERENCES

479 Aibar, S., González-Blas, C.B., Moerman, T., Huynh-Thu, V.A., Imrichova, H., Hulselmans, G., 480 Rambow, F., Marine, J.-C., Geurts, P., Aerts, J., et al. (2017). SCENIC: Single-cell regulatory 481 network inference and clustering. Nat. Methods 14, 1083–1086.

482 Aiello, N.M., Maddipati, R., Norgard, R.J., Balli, D., Li, J., Yuan, S., Yamazoe, T., Black, T., Sahmoud, 483 A., Furth, E.E., et al. (2018). EMT Subtype Influences Epithelial Plasticity and Mode of Cell 484 Migration. Dev. Cell 45, 681-695.e4.

485 Akhmedov, M., Kedaigle, A., Chong, R.E., Montemanni, R., Bertoni, F., Fraenkel, E., and Kwee, I. 486 (2017). PCSF: An R-package for network-based interpretation of high-throughput data. PLOS 487 Comput. Biol. 13, e1005694.

488 Bonello, T.T., and Peifer, M. (2019). Scribble: A master scaffold in polarity, adhesion, 489 synaptogenesis, and proliferation. J. Cell Biol. 218, 742–756.

490 Califano, A., and Alvarez, M.J. (2017). The recurrent architecture of tumour initiation, 491 progression and drug sensitivity. Nat. Rev. Cancer 17, 116–130.

492 Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast 493 tumours. Nature 490, 61–70.

494 Cerami, E., Gao, J., Dogrusoz, U., Gross, B.E., Sumer, S.O., Aksoy, B.A., Jacobsen, A., Byrne, C.J., 495 Heuer, M.L., Larsson, E., et al. (2012). The cBio Cancer Genomics Portal: An Open Platform for 496 Exploring Multidimensional Cancer Genomics Data: Figure 1. Cancer Discov. 2, 401–404.

497 Chin, Y.R., Yoshida, T., Marusyk, A., Beck, A.H., Polyak, K., and Toker, A. (2014). Targeting Akt3 498 Signaling in Triple-Negative Breast Cancer. Cancer Res. 74, 964–973.

499 Chong, J., Wishart, D.S., and Xia, J. (2019). Using MetaboAnalyst 4.0 for Comprehensive and 500 Integrative Metabolomics Data Analysis. Curr. Protoc. Bioinforma. 68, e86.

501 Conesa, A., Nueda, M.J., Ferrer, A., and Talón, M. (2006). maSigPro: a method to identify 502 significantly differential expression profiles in time-course microarray experiments. 503 Bioinforma. Oxf. Engl. 22, 1096–1102.

504 D’Amico, S., Shi, J., Martin, B.L., Crawford, H.C., Petrenko, O., and Reich, N.C. (2018). STAT3 is a 505 master regulator of epithelial identity and KRAS-driven tumorigenesis. Genes Dev. 32, 1175– 506 1187.

507 Dongre, A., and Weinberg, R.A. (2019). New insights into the mechanisms of epithelial– 508 mesenchymal transition and implications for cancer. Nat. Rev. Mol. Cell Biol. 20, 69–84.

509 D’Souza, R.C.J., Knittle, A.M., Nagaraj, N., Dinther, M. van, Choudhary, C., Dijke, P. ten, Mann, M., 510 and Sharma, K. (2014). Time-resolved dissection of early phosphoproteome and ensuing 511 proteome changes in response to TGF-β. Sci. Signal. 7, rs5–rs5.

512 Ferrell, J.E. (1998). How regulated protein translocation can produce switch-like responses. 513 Trends Biochem. Sci. 23, 461–465.

514 Garg, M., Braunstein, G., and Koeffler, H.P. (2014). LAMC2 as a therapeutic target for cancers. 515 Expert Opin. Ther. Targets 18, 979–982.

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

516 H. Rashed, M., Bayraktar, E., K. Helal, G., Abd-Ellah, M.F., Amero, P., Chavez-Reyes, A., and 517 Rodriguez-Aguayo, C. (2017). Exosomes: From Garbage Bins to Promising Therapeutic Targets. 518 Int. J. Mol. Sci. 18.

519 Hanna, V.S., and Hafez, E.A.A. (2018). Synopsis of arachidonic acid metabolism: A review. J. Adv. 520 Res. 11, 23–32.

521 Hawe, J.S., Theis, F.J., and Heinig, M. (2019). Inferring Interaction Networks From Multi-Omics 522 Data. Front. Genet. 10.

523 Heldin, C.-H., Vanlandewijck, M., and Moustakas, A. (2012). Regulation of EMT by TGFβ in 524 cancer. FEBS Lett. 586, 1959–1970.

525 Hinz, N., and Jücker, M. (2019). Distinct functions of AKT isoforms in breast cancer: a 526 comprehensive review. Cell Commun. Signal. CCS 17.

527 Hong, T., Watanabe, K., Ta, C.H., Villarreal-Ponce, A., Nie, Q., and Dai, X. (2015). An Ovol2-Zeb1 528 Mutual Inhibitory Circuit Governs Bidirectional and Multi-step Transition between Epithelial 529 and Mesenchymal States. PLOS Comput. Biol. 11, e1004569.

530 Hua, W., ten Dijke, P., Kostidis, S., Giera, M., and Hornsveld, M. (2019). TGFβ-induced metabolic 531 reprogramming during epithelial-to-mesenchymal transition in cancer. Cell. Mol. Life Sci.

532 Huang, R.Y.-J., Guilford, P., and Thiery, J.P. (2012). Early events in cell adhesion and polarity 533 during epithelial-mesenchymal transition. J. Cell Sci. 125, 4417–4422.

534 Hughes, A.L., and Friedman, R. (2009). A phylogenetic approach to gene expression data: 535 evidence for the evolutionary origin of mammalian leukocyte phenotypes. Evol. Dev. 11, 382– 536 390.

537 Hung, M.-C., and Link, W. (2011). Protein localization in disease and therapy. J. Cell Sci. 124, 538 3381–3392.

539 Iliopoulos, D., Polytarchou, C., Hatziapostolou, M., Kottakis, F., Maroulakou, I.G., Struhl, K., and 540 Tsichlis, P.N. (2009). MicroRNAs differentially regulated by Akt isoforms control EMT and stem 541 cell renewal in cancer cells. Sci. Signal. 2, ra62.

542 Johansson, J., Berg, T., Kurzejamska, E., Pang, M.-F., Tabor, V., Jansson, M., Roswall, P., Pietras, K., 543 Sund, M., Religa, P., et al. (2013). MiR-155-mediated loss of C/EBPβ shifts the TGF-β response 544 from growth inhibition to epithelial-mesenchymal transition, invasion and metastasis in breast 545 cancer. Oncogene 32, 5614–5624.

546 Karczewski, K.J., and Snyder, M.P. (2018). Integrative omics for health and disease. Nat. Rev. 547 Genet. 19, 299–310.

548 Kim, E., Kim, J.-Y., Smith, M.A., Haura, E.B., and Anderson, A.R.A. (2018). Cell signaling 549 heterogeneity is modulated by both cell-intrinsic and -extrinsic mechanisms: An integrated 550 approach to understanding targeted therapy. PLoS Biol. 16.

551 Koundouros, N., and Poulogiannis, G. (2020). Reprogramming of fatty acid metabolism in 552 cancer. Br. J. Cancer 122, 4–22.

553 Krishnaswamy, S., Zivanovic, N., Sharma, R., Pe’er, D., and Bodenmiller, B. (2018). Learning time- 554 varying information flow from single-cell epithelial to mesenchymal transition data. PLoS ONE 555 13.

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

556 Lambert, A.W., Pattabiraman, D.R., and Weinberg, R.A. (2017). Emerging Biological Principles of 557 Metastasis. Cell 168, 670–691.

558 Li, C., and Balazsi, G. (2018). A landscape view on the interplay between EMT and cancer 559 metastasis. Npj Syst. Biol. Appl. 4, 1–9.

560 Li, C.-W., Xia, W., Lim, S.-O., Hsu, J.L., Huo, L., Wu, Y., Li, L.-Y., Lai, C.-C., Chang, S.-S., Hsu, Y.-H., et 561 al. (2016). AKT1 Inhibits Epithelial-to-Mesenchymal Transition in Breast Cancer through 562 Phosphorylation-Dependent Twist1 Degradation. Cancer Res. 76, 1451–1462.

563 Liu, X., Li, J., Cadilha, B.L., Markota, A., Voigt, C., Huang, Z., Lin, P.P., Wang, D.D., Dai, J., Kranz, G., et 564 al. (2019a). Epithelial-type systemic breast carcinoma cells with a restricted mesenchymal 565 transition are a major source of metastasis. Sci. Adv. 5, eaav4275.

566 Liu, Y., Beyer, A., and Aebersold, R. (2016). On the Dependency of Cellular Protein Levels on 567 mRNA Abundance. Cell 165, 535–550.

568 Liu, Y., Xue, M., Du, S., Feng, W., Zhang, K., Zhang, L., Liu, H., Jia, G., Wu, L., Hu, X., et al. (2019b). 569 Competitive endogenous RNA is an intrinsic component of EMT regulatory circuits and 570 modulates EMT. Nat. Commun. 10, 1637.

571 Liu, Y.-Y., Slotine, J.-J., and Barabási, A.-L. (2011). Controllability of complex networks. Nature 572 473, 167–173.

573 Lu, R., Markowetz, F., Unwin, R.D., Leek, J.T., Airoldi, E.M., MacArthur, B.D., Lachmann, A., Rozov, 574 R., Ma’ayan, A., Boyer, L.A., et al. (2009). Systems-level dynamic analyses of fate change in 575 murine embryonic stem cells. Nature 462, 358–362.

576 Ma, Y., Temkin, S.M., Hawkridge, A.M., Guo, C., Wang, W., Wang, X.-Y., and Fang, X. (2018). Fatty 577 acid oxidation: An emerging facet of metabolic transformation in cancer. Cancer Lett. 435, 92– 578 100.

579 McFaline-Figueroa, J.L., Hill, A.J., Qiu, X., Jackson, D., Shendure, J., and Trapnell, C. (2019). A 580 pooled single-cell genetic screen identifies regulatory checkpoints in the continuum of the 581 epithelial-to-mesenchymal transition. Nat. Genet. 51, 1389–1398.

582 Moon, Y.W., Rao, G., Kim, J.J., Shim, H.-S., Park, K.-S., An, S.S., Kim, B., Steeg, P.S., Sarfaraz, S., 583 Changwoo Lee, L., et al. (2015). LAMC2 enhances the metastatic potential of lung 584 adenocarcinoma. Cell Death Differ. 22, 1341–1352.

585 Nieto, M.A., Huang, R.Y.-J., Jackson, R.A., and Thiery, J.P. (2016). EMT: 2016. Cell 166, 21–45.

586 Ochs, F., Karemore, G., Miron, E., Brown, J., Sedlackova, H., Rask, M.-B., Lampe, M., Buckle, V., 587 Schermelleh, L., Lukas, J., et al. (2019). Stabilization of chromatin topology safeguards genome 588 integrity. Nature 574, 571–574.

589 Pastushenko, I., Brisebarre, A., Sifrim, A., Fioramonti, M., Revenco, T., Boumahdi, S., Van 590 Keymeulen, A., Brown, D., Moers, V., Lemaire, S., et al. (2018). Identification of the tumour 591 transition states occurring during EMT. Nature 556, 463–468.

592 Pei, Y.-F., Liu, J., Cheng, J., Wu, W.-D., and Liu, X.-Q. (2019). Silencing of LAMC2 Reverses 593 Epithelial-Mesenchymal Transition and Inhibits Angiogenesis in Cholangiocarcinoma via 594 Inactivation of the Epidermal Growth Factor Receptor Signaling Pathway. Am. J. Pathol. 189, 595 1637–1653.

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

596 Pellinen, T., Blom, S., Sánchez, S., Välimäki, K., Mpindi, J.-P., Azegrouz, H., Strippoli, R., Nieto, R., 597 Vitón, M., Palacios, I., et al. (2018). ITGB1-dependent upregulation of -1 switches TGFβ 598 signalling from tumour-suppressive to oncogenic in prostate cancer. Sci. Rep. 8, 1–14.

599 Puram, S.V., Parikh, A.S., and Tirosh, I. (2018). Single cell RNA-seq highlights a role for a partial 600 EMT in head and neck cancer. Mol. Cell. Oncol. 5, e1448244.

601 Ramilowski, J.A., Goldberg, T., Harshbarger, J., Kloppmann, E., Lizio, M., Satagopam, V.P., Itoh, M., 602 Kawaji, H., Carninci, P., Rost, B., et al. (2015). A draft network of ligand–receptor-mediated 603 multicellular signalling in human. Nat. Commun. 6, 1–12.

604 Riemondy, K.A., Jansing, N.L., Jiang, P., Redente, E.F., Gillen, A.E., Fu, R., Miller, A.J., Spence, J.R., 605 Gerber, A.N., Hesselberth, J.R., et al. (2019). Single-cell RNA sequencing identifies TGF-β as a key 606 regenerative cue following LPS-induced lung injury. JCI Insight 4, e123637.

607 Sadej, R., Romanska, H., Kavanagh, D., Baldwin, G., Takahashi, T., Kalia, N., and Berditchevski, F. 608 (2010). Tetraspanin CD151 Regulates Transforming Growth Factor β Signaling: Implication in 609 Tumor Metastasis. Cancer Res. 70, 6059–6070.

610 Santibáñez, J.F., Kocić, J., Fabra, A., Cano, A., and Quintanilla, M. (2010). Rac1 modulates TGF-β1- 611 mediated epithelial cell plasticity and MMP9 production in transformed keratinocytes. FEBS 612 Lett. 584, 2305–2310.

613 Scheel, C., Eaton, E.N., Li, S.H.-J., Chaffer, C.L., Reinhardt, F., Kah, K.-J., Bell, G., Guo, W., Rubin, J., 614 Richardson, A.L., et al. (2011). Paracrine and Autocrine Signals Induce and Maintain 615 Mesenchymal and Stem Cell States in the Breast. Cell 145, 926–940.

616 Sha, Y., Haensel, D., Gutierrez, G., Du, H., Dai, X., and Nie, Q. (2019). Intermediate cell states in 617 epithelial-to-mesenchymal transition. Phys. Biol. 16, 021001.

618 Shirakihara, T., Kawasaki, T., Fukagawa, A., Semba, K., Sakai, R., Miyazono, K., Miyazawa, K., and 619 Saitoh, M. (2013). Identification of integrin α3 as a molecular marker of cells undergoing 620 epithelial-mesenchymal transition and of cancer cells with aggressive phenotypes. Cancer Sci. 621 104, 1189–1197.

622 Sigston, E.A.W., and Williams, B.R.G. (2017). An Emergence Framework of Carcinogenesis. Front. 623 Oncol. 7.

624 Song, J., Wang, W., Wang, Y., Qin, Y., Wang, Y., Zhou, J., Wang, X., Zhang, Y., and Wang, Q. (2019). 625 Epithelial-mesenchymal transition markers screened in a cell-based model and validated in lung 626 adenocarcinoma. BMC Cancer 19, 680.

627 Tabassum, D.P., and Polyak, K. (2015). Tumorigenesis: it takes a village. Nat. Rev. Cancer 15, 628 473–483.

629 Tallima, H., and El Ridi, R. (2018). Arachidonic acid: Physiological roles and potential health 630 benefits – A review. J. Adv. Res. 11, 33–41.

631 Tam, W.L., Lu, H., Buikhuisen, J., Soh, B.S., Lim, E., Reinhardt, F., Wu, Z.J., Krall, J.A., Bierie, B., Guo, 632 W., et al. (2013). Protein Kinase C α Is a Central Signaling Node and Therapeutic Target for 633 Breast Cancer Stem Cells. Cancer Cell 24, 347–364.

634 Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., and Golub, 635 T.R. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and 636 application to hematopoietic differentiation. Proc. Natl. Acad. Sci. 96, 2907–2912.

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

637 Thomson, T.M., Balcells, C., and Cascante, M. (2019). Metabolic Plasticity and Epithelial- 638 Mesenchymal Transition. J. Clin. Med. 8, 967.

639 Tominaga, K., Minato, H., Murayama, T., Sasahara, A., Nishimura, T., Kiyokawa, E., Kanauchi, H., 640 Shimizu, S., Sato, A., Nishioka, K., et al. (2019). Semaphorin signaling via MICAL3 induces 641 symmetric cell division to expand breast cancer stem-like cells. Proc. Natl. Acad. Sci. 116, 625– 642 630.

643 Ulgen, E., Ozisik, O., and Sezerman, O.U. (2019). pathfindR: An R Package for Comprehensive 644 Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. 645 10.

646 Ungefroren, H., Witte, D., and Lehnert, H. (2018). The role of small GTPases of the Rho/Rac 647 family in TGF-β-induced EMT and cell motility in cancer. Dev. Dyn. 247, 451–461.

648 Vinayagam, A., Gibson, T.E., Lee, H.-J., Yilmazel, B., Roesel, C., Hu, Y., Kwon, Y., Sharma, A., Liu, Y.- 649 Y., Perrimon, N., et al. (2016). Controllability analysis of the directed human protein interaction 650 network identifies disease genes and drug targets. Proc. Natl. Acad. Sci. U. S. A. 113, 4976–4981.

651 Wang, H., Meyer, C.A., Fei, T., Wang, G., Zhang, F., and Liu, X.S. (2013). A systematic approach 652 identifies FOXA1 as a key factor in the loss of epithelial traits during the epithelial-to- 653 mesenchymal transition in lung cancer. BMC Genomics 14, 680.

654 Wei, S.C., Fattet, L., Tsai, J.H., Guo, Y., Pai, V.H., Majeski, H.E., Chen, A.C., Sah, R.L., Taylor, S.S., 655 Engler, A.J., et al. (2015). Matrix stiffness drives Epithelial-Mesenchymal Transition and tumour 656 metastasis through a TWIST1-G3BP2 mechanotransduction pathway. Nat. Cell Biol. 17, 678– 657 688.

658 Wirth, H., von Bergen, M., and Binder, H. (2012). Mining SOM expression portraits: feature 659 selection and integrating concepts of molecular function. BioData Min. 5, 18.

660 Wu, C.-Y.C., Carpenter, E.S., Takeuchi, K.K., Halbrook, C.J., Peverley, L.V., Bien, H., Hall, J.C., 661 DelGiorno, K.E., Pal, D., Song, Y., et al. (2014). PI3K regulation of RAC1 is required for KRAS- 662 induced pancreatic tumorigenesis in mice. Gastroenterology 147, 1405-1416.e7.

663 Yang, J., Antin, P., Berx, G., Blanpain, C., Brabletz, T., Bronner, M., Campbell, K., Cano, A., 664 Casanova, J., Christofori, G., et al. (2020). Guidelines and definitions for research on epithelial– 665 mesenchymal transition. Nat. Rev. Mol. Cell Biol. 1–12.

666 Zhang, J., Tian, X.-J., Zhang, H., Teng, Y., Li, R., Bai, F., Elankumaran, S., and Xing, J. (2014). TGF-β– 667 induced epithelial-to-mesenchymal transition proceeds through stepwise activation of multiple 668 feedback loops. Sci Signal 7, ra91–ra91.

669 Zhang, J., Lin, X., Wu, L., Huang, J.-J., Jiang, W.-Q., Kipps, T.J., and Zhang, S. (2020). Aurora B 670 induces epithelial–mesenchymal transition by stabilizing Snail1 to promote basal-like breast 671 cancer metastasis. Oncogene 39, 2550–2567.

672 Zhang, K., Myllymäki, S.-M., Gao, P., Devarajan, R., Kytölä, V., Nykter, M., Wei, G.-H., and 673 Manninen, A. (2017). Oncogenic K-Ras upregulates ITGA6 expression via FOSL1 to induce 674 anoikis resistance and synergizes with αV-Class integrins to promote EMT. Oncogene 36, 5681– 675 5694.

676 677

19 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

678 Author contributions 679 Indranil Paul: Conceptualization, Methodology, Software, Data curation, Writing – Original draft, 680 Visualization, Investigation, Formal analysis 681 Dante Bolzan: Software, Formal analysis 682 Heather Joan Hook: Investigation 683 Ahmed Youssef: Software 684 Gopal Karemore: Software 685 Michael Uretz John Oliphant: Investigation 686 Weiwei Lin: Investigation 687 Qian Liu: Formal analysis 688 Sadhna Phanse: Software 689 Dzmitry Padhorny: Software 690 Sergei Kotelnikov: Software 691 Carl White: Software 692 Andrieu Guillaume: Investigation 693 Pingzhao Hu: Formal analysis 694 Gerald Denis: Investigation 695 Dima Kozakov: Software 696 Brian Raught: Investigation 697 Trevor Siggers: Investigation 698 Stefan Wuchty: Software, Formal analysis 699 Senthil Muthuswamy: Conceptualization, Investigation 700 Andrew Emili: Conceptualization, Supervision, Writing – Original draft, Resources, Funding 701 acquisition, Formal analysis

702 Acknowledgements 703 We dedicate our sincere gratitude to Stefano Monti, Xarelabos Varelas, Valentina Perissi and 704 Matthew Layne for providing critical feedback on the manuscript. We thank the Microarray and 705 Sequencing Core Facility at Boston University School of Medicine for acquisition of mRNA, miRNA 706 and scRNAseq data.

707 Funding 708 S.M., G.D., and A.E. acknowledge joint funding (UO1CA243004) from the NCI program Research 709 Projects in Cancer Systems Biology. The CNSB has generous ongoing support from Boston 710 University.

711 Competing interests 712 The authors declare no competing financial interest. 713

20 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

714 Main figure legends

715 Figure 1. A multi-dimensional resource on TGFβ-induced EMT 716 (A) MCF10A cells were exposed to TGFβ for the indicated time points. A major premise of the 717 study was to characterize various cell-states and the transition points through which EMT 718 occurs. 719 (B) Samples from 3 biological replicates were aliquoted and multiple technology platforms 720 were employed to quantify various molecular layers. 721 (C) An overview of numbers of molecules quantified in various layers. 722 (D) Expression snapshots of some well-known EMT markers. Heatmaps show log2FC values 723 (relative to Control, FDR = 0.05). 724 (E) Variance explained by each layer over 4 principal components. 725 (F) Ridge plot showing differential molecules, as % of total quantified, for each layer. 726 (G) Pie chart showing the overall fraction (in %) of differential molecules (yellow portion) 727 relative to all molecules quantified in each layer. 728 (H) Overlap between established EMT databases (MSigDB, www.gsea-msigdb.org; dbEMT2.0, 729 http://dbemt.bioinfo-minzhao.org) and differential proteins and miRNAs (adj. p-value < 730 0.05; r2 ≥ 0.6 & |log2FC| ≥ 1) from this study. 731 (I) Differential molecules (proteins, miRNAs, metabolites) from this study were used to 732 assess the number of coherent functional modules, i.e., biological interactions or network 733 edges, by employing the Prize-Collecting Steiner Forest algorithm on an interaction 734 network compiled from PathwayCommons, miRTarBase and STITCH.

735 Figure 2. The topological architecture of EMT 736 (A) Neighbor-joining was applied as a distance metric to reveal the similarities between 737 timepoints, considering all bulk measurements, excluding mRNA (see text). 738 (B) Combined pseudo-eigenvalues space of all datasets, indicating the contribution of each 739 dataset to the eigenvalue (variance). 740 (C) Matrix correlations between each pair of datasets. 741 (D) Line plots show the distribution of adjusted coefficient of determination (R2) values 742 between layers as a function of time points. 743 (E) Left panel. SOM-portraits characterizing expression landscape of the samples. Color 744 gradient refers to over- or under-expression of metagenes (=cluster of genes) in each 745 time-point compared to the mean expression level of the metagene in the pool of all time- 746 points: red = high, yellow/green = intermediate levels and blue = low (see STAR Methods 747 for details). Middle panel. Representative examples that appeared among the highest- 748 ranking features (top 1%) in each time-point. Right panel. Barplot shows the number of 749 features that were contributed by each layer among the highest-ranking features (top 750 1%) in time-point. 751 (F) Heatmap depicts sample-wise pathway scores, which were derived from subnetwork- 752 based pathway enrichment analysis using the highest-ranking molecules (top 1%) for 753 each SOM.

754

755

21 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

756 Figure 3. Dynamics of TGFβ-induced metabolic adaptations 757 (A) Barplot showing the ‘class’ distribution of quantified metabolite features as defined in 758 HMBD. Also shown are the relative proportions of differentially expressed features for 759 each class (p-value ≤ 0.01, absolute log2FC ≥ 1). 760 (B) Neighbor-joining tree for the Metabol dataset. 761 (C) SOM analysis of the Metabol dataset. 762 (D) Top 10% of metabolite features of each SOM were grouped based on clusters in `B` and 763 used for “Network analysis” using MetaboAnalyst (https://www.metaboanalyst.ca). 764 (E) Identified metabolites of AAM pathway in SOM for 4 hours to day 1 were taken and 765 Pearson’s correlation computed with known metabolic enzymes (KEGG) quantified in the 766 Phos & Prot datasets. 767 (F) The plot shows enzymes ranked according to their Pearson’s correlation with metabolites 768 of AAM pathway, as detected in this study. 769 (G) Schematics of information flow from TGFβ signaling to AAM pathway, mediated by known 770 enzymes. Heatmaps and line plots display ‘standardized’ expression values over the time 771 points.

772 Figure 4. scRNAseq analysis reveals cellular dynamics and novel TFs for EMT 773 (A) Violin plots showing expression of well-known EMT hallmarks in each time point. 774 (B) UMAP of scRNAseq by Monocle3. Dots represent single cells and are colored by inferred 775 clusters, while trajectories depict cells during EMT. 776 (C) Heatmap showing the number of cells (in log2 scale) in each cluster of partition P2. Mean 777 expression of some well-known E and M markers in each cluster are also shown. 778 (D) The plot shows all TFs ranked according to their SCENIC score. TF names are shown for 779 the 5 TFs with the highest scores and some well-known EMT associated TFs. 780 (E) Tree displays unique TFs identified by SCENIC for each subtype. TFs highlighted in ‘bold’ 781 are known players in EMT as of dbEMT2.0. The genes in the outer circle are 782 representative examples used by SCENIC to infer TF activity. 783 (F) Barplot shows the inferred activities of top 15 TFs based on SCENIC scores in each 784 subtype. 785 (G) Schematics of the human TF-binding array workflow (see STAR Methods for details). 786 (H) & (I) Density plots of ∂B-scores of indicated TFs. Two-sided Kolmogorov-Smirnov test 787 was performed to evaluate the significance of distribution differences between Control 788 and TGFβ treated conditions.

789 Figure 5. Spatial regulation of proteins and intercellular communication 790 (A) The schematic summarizes correlations based on Pearson’s coefficients between the 791 indicated layers. Each pie chart depicts the fraction of differential proteins (orange slice) 792 with respect to all proteins quantified in the layer. The dotted lines indicate correlation 793 of overlapping proteins between layers. 794 (B) The plot displays top 25 Class I proteins. Each pair is represented by a different shape, as 795 shown. 796 (C) Expression profiles of top 3 Class I proteins. Each colored line represents a molecular 797 layer, as shown. 798 (D) The plot displays top 25 Class II proteins. Each pair is represented by a different shape, 799 as shown.

22 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

800 (E) Expression profiles of top 3 Class II proteins. Each colored line represents a molecular 801 layer, as shown. 802 (F) Expression profiles of BRD2 and BRD4 in various layers, as indicated. Legend as in E 803 (G) Density plots of ∂B-scores of indicated co-factors. Two-sided Kolmogorov-Smirnov test 804 was performed to evaluate the significance of difference in distributions. 805 (H) Phase-contrast images after 6 days of TGFβ treatment in presence or absence of active JQ- 806 1 (100 nM) or its inactive analogue. 807 (I) Expression profile of SCRIB in various layers, as indicated. Legend as in E 808 (J) PPI network of SCRIB interactors identified using BioID. 809 (K) Immunoblots showing interactions of a few SCRIB partners identified using BioID. 810 (L) Schematics of analysis pipeline for discovering active L-R pairs (see STAR Methods). 811 (M) Heatmap showing combined log2FCs of L-R pairs in the Sec and Mem/Glyco datasets. 812 (N) Network plot showing L-R interactions detected between different P2 cell clusters. 813 (O) Scatter plot of correlations between indicated gene-pairs in Breast invasive carcinoma 814 samples. Regression line is shown in red.

815 Figure 6. Phosphoproteome dynamics during EMT 816 (A) Fraction of detected p-sites on a protein that are regulated during EMT. 817 (B) Detected p-sites on VIM and their expression during EMT. The gray lines indicate all p- 818 sites that are catalogued in PhosphoSitePlus database. 819 (C) Distribution of Pearson’s correlation between expression of proteins and the p-sites 820 detected on them. 821 (D) Schematic showing two examples each where expression of proteins and p-sites showed 822 either low (≤–0.1; CDS2, CBX1) or high (≥0.4; MISP, MICAL3) correlation. 823 (E) enrichment of genes with at least a single regulated p-site at any time point. 824 (F) Distribution of Pearson’s correlation between expression of proteins detected in Nuc 825 layer and the p-sites detected on them in the Phos layer. A few EMT hallmarks are 826 highlighted. 827 (G) Heatmaps of expression profiles of indicated molecules in Nuc and Phos layers. 828 (H) Structural model of MICAL3 p-sites, NLS and Importin-α. 829 (I) Ternary plot of kinase activity scores binned into 3 broad stages of EMT, i.e., E, ICS, and 830 M. 831 (J) Top. Expression of PRKCA as detected in various layers. Bottom. Pathway enrichment of 832 all differential substrates of PRKCA detected in our dataset. 833 (K) Top. Expression of AURKB as detected in various layers. Bottom. Pathway enrichment of 834 all differential substrates of AURKB detected in our dataset.

835 Figure 7. Integrative systems causal model of EMT 836 (A) Schematics of network and causal inference workflow. Non-redundant genes with most 837 significant expression (Hotelling’s T2 statistic) profiles were used for the analysis. 838 CausalPath-estimated logical networks were used to augment a custom-built confidence- 839 weighted scaffold interactome which was then used to solve the Steiner forest problem 840 using the OmicsIntegrator software. Only differential molecules (relative to Control, FDR 841 adj.p-value ≤ 0.05, |log2FC| ≥ 1) were considered as ‘prizes’. 842 (B) A subgraph showing the signaling context of well-known EMT players ‘queried’ on the 843 EMT network.

23 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

844 (C) Kaplan-Meier plots comparing prognostic performance of MSigDB hallmarks and 845 ‘controllers’ identified in this study, as indicated. 846 (D) Overlap between established EMT databases (as in Figure 1) and the ‘controllers’ 847 identified in this study. 848 (E) Overlap between the EMT databases and the ‘non-controllers’ identified in this study. 849 (F) A simplified schematic showing the hierarchical relationships between several bottleneck 850 genes at different stages of EMT, as indicated. 851 (G) Workflow for morphometric screening of drug combinations. The tool GENIMASEG is 852 available here: https://figshare.com/articles/software/GenImaSeg_Tool_zip/14371778 853 (H) The barplot of results of the above experiment indicates varying degrees of synergy or 854 antagonism between inhibitors, in influencing EMT-associated changes in cell shape 855 (eccentricity). Error bars indicate spread of datapoints across all quantified cells in each 856 condition.

857 Supplementary figure legends

858 Figure S1. Related to Figure 1. A multi-dimensional resource on TGFβ-induced EMT 859 (A) TGFβ treatments were staggered, at defined time periods, such that all plates were 860 harvested at the same time. Cells were serum starved for 16 hours before harvesting. 861 (B) Phase-contrast images of MCF10A cells at different time points. Scale bar = 100 µm. 862 (C) For maximizing efficiency while maintaining compatibility with technology-specific 863 protocols, after harvesting cells were collected in 4 aliquots per replicate, as shown. 864 Conditioned media were also collected. 865 (D) Upper panel – Barplots showing percentage of quantified proteins with annotation in the 866 ‘Cellular Component’ category. Lower panel – Boxplots showing intensity values (log2) of 867 proteins (= markers) commonly used to assess sub-cellular fractionation purity. Box 868 edges correspond to 25th and 75th percentiles, whiskers include extreme data points. 869 (E) Cumulative distribution (%) of Pearson’s coefficients across the samples (= time points). 870 Significant overlap between the 3 biological replicates, shown as 3 colors, indicates high 871 reproducibility. 872 (F) Heatmaps of molecular expression profiles of indicated layers. The mean of all quantified 873 molecules across the 3 replicates was used. 874 (G) Violin plots showing the spread of log2FCs of molecules at each time point (relative to 875 Control) for each layer.

876 Figure S2. Related to Figure 2. The topological architecture of EMT 877 (A) PCA of the various molecular layers. Time points are shown with different shapes and 878 colors. Points with similar shape/color indicate biological replicates. 879 (B) The population map presents the number of genes mapped to each individual metagene. 880 (C) The plot summarizes co-variance structure of datasets at the metagene level. 881 (D) The plot shows correlation of expression patterns of individual genes and the metagenes 882 in which they are contained. 883 (E) Temporal expression profiles of a few example genes identified in SOM analysis. Each 884 colored line represents a molecular layer, as shown. 885 (F) Temporal expression profiles of a few example genes enlisted as ‘EMT hallmarks’ in 886 MSigDB database. Each colored line represents a molecular layer, as shown.

24 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

887

888 Figure S3. Related to Figure 3. Dynamics of TGFβ-induced metabolic adaptations 889 (A) The heatmap displays the results of active subnetwork analysis using differential 890 molecules in Prot and Phos datasets. 891 (B) – (D) Metabolite-metabolite interaction networks, using MetaboAnalyst, for indicated 892 clusters. High connectivity indicates co-regulation of functionally related metabolites.

893 Figure S4. Related to Figure 4. scRNAseq analysis reveals cellular dynamics and novel TFs for 894 EMT 895 (A) An outline of the QC pipeline employed for scRNAseq data analysis. 896 (B) The plot shows top 25 most expressed genes. Each row corresponds to a gene and each 897 bar corresponds to the expression of the gene in single cells. 898 (C) Expression of indicated genes in mRNA layer. 899 (D) Developmental trajectories of MCF10A cells in response to TGFβ, inferred by Monocle3. 900 Clusters are indicated by colors. 901 (E) Heatmap showing aggregate expression of groups of genes (=Modules) with similar 902 expression pattern across the partitions, by Monocle3. Modules 9/12 were highly 903 expressed in P1/P3 and were enriched for ‘Cell cycle’ related GO annotations.

904 Figure S5. Related to Figure 5. Spatial regulation of proteins and intercellular communication 905 (A) The plot shows number of common genes (intersection size, y axis) between layers as 906 indicated. Only differential genes in each layer (‘Set size’, FDR ≤ 0.05; adj. p-value ≤ 0.05; 907 r2 ≥ 0.6 & |log2FC|, relative to Control, of ≥ 1) were considered for the analysis. 908 (B) Schematics of the BioID experiment performed to discover novel SCRIB partners induced 909 by TGFβ signaling. 910 (C) Circos plot showing L-R interactions between the P2 clusters. 911 (D) UMAP plots of scRNAseq data highlighting the co-expression patterns of MMP7 and CD44.

912 Figure S6. Related to Figure 6. Phosphoproteome dynamics during EMT 913 (A) An outline of QC pipeline for Phos data analysis.

914 (B) Enrichment efficiency for TiO2 workflow employed for the study. 915 (C) About 74% of all detected p-sites were reliably localized (=Class I) by MaxQuant. 916 (D) Quantifications of p-sites were achieved with a dynamic range of 106 orders of magnitude. 917 (E) Proportions of number of phosphate moieties in each detected p-site. 918 (F) Number of phosphoproteins regulated over the time course. 919 (G) Magnitudes of log2FC values for p-sites over the time course. 920 (H) The plot shows all kinases ranked according to their KSEA enrichment. Mean of |z-score| 921 values over all time points were taken.

922 Figure S7. Related to Figure 7. Integrative systems causal model of EMT 923 (A) A snapshot of the companion website which can be interactively and freely accessed at 924 (https://srphanse.shinyapps.io/EMTAPP_2/). 925

25 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 1

A Experimental design and premise C Datasets (this study, 10 time points)

MCF10A cells: TGFβ (10 ng/mL) Proteomics Genes PTM sites * 1 2 3 4 5 6 8 12 Control 4 Component Peptides (signif. up/dn) ** (signif. up/dn) ** hrs days Whole cell proteome (WC) 552,716 6,540 (403/470) -- Nuclear region (Nuc) 286,162 4,198 (298/259) -- Membrane (Mem) 35,870 2,223 (149/131) -- Epithelial Intermediate hybrids Mesenchymal Peptidome (Pep) 735 202 (31/31) -- Secretome (Sec) 27,136 1,133 (247/200) -- Epithelial E Extracellular vesicles (EV) 20,438 1,209 (149/123) -- layers Protein Hybrid E/M Metabolites Glycosylated proteins (Glyco) 6696 590 (17/26) -- Mesenchymal M miRNA Phosphorylation (Phos) 149,469 2,254 mRNA 8,741 (1,536/1,602) Transition points Topological Transition Molecular landscape Total unique proteins -- 7,632 -- relations kinetics Discriminants Transcriptomics Genes B Data acquisition, technologies and platforms Single cell Matrix Total RNA (signif. up/dn) Single-cell Bulk Number of cells *** 1,913 mRNA 23,787 (2,441/1,993) RNA sequencing Transcriptome, Metabolites & Proteome mRNA 9,785 mi-RNA 2,574 (178/118)

WC proteome † Metabolomics Features (MS1 peaks) HMDB indexed KEGG indexed Phosphorylation Nuclear region Features (Level 2) **** 27,759 8,325 (610/468) 1,118 (65/59) * Unique PTM sites remaining after QC steps; ** relative to `Control`; absolute log2FC ≥ 1, FDR ≤ 0.05, r2 ≥ 0.6; N-Glycosylation * *** Number of cells & genes remaining after QC steps **** Level 2 identifications (Salek et al., 2013) Membrane Secretome Exosome Peptidome Barcode cells mRNA D MultiOmic profiles of EMT markers miRNA Metabolome Epithelial markers Mesenchymal markers

CDH13 SCRIB CDH2 CALD1 EV Sec 10-plex (TMT labelling) Glyco Library preparation Mem Phos Acet * † Pep WC Nuc Microarray mRNA ddSeq SC seq y 1 y 3 y 8 y 2 y 5 y 6 y 4 a a a a a a a basic RP y 12

ABLIM2 MUC1 VIM PDLIM4 a D D D D D D fractionation D D 4 hours

Cells nLC/Q-Exactive HF-X log2FC Down Up Time points M Q Genes Genes Time points Data matrices Data matrix Ion spectrum E F G Variance contribution Differential molecules over time points Differential molecules as % of total detected

3 Layers Acet EV Glyco Mem EV Mem Differential Metabol 7% 23% 8% 13% miRNA No change mRNA 77% Nuc 93% 92% 87% Pep 2 Phos Prot Metabol miRNA mRNA Nuc Sec WC Sec iance 13% 11% 13% r 20% a Phos V Pep 87% 89% 80% 87% 1 Nuc mRNA miRNA Pep Phos WC Sec Metabol Mem 19% 14% 36% 40% Glyco 0 64% 60% EV 81% 86% PC 1 PC 2 PC 3 PC 4 Principal components (PC) Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 Day 8 Day 12 4 hours

H Comparison to current knowledge I Enhanced discovery of coherent functional modules Proteomics microRNAs 1500 Day 12 This study dbEMT2 Day 8 dbEMT2 Day 6 703 Day 5 Day 4 1000 This study 1,066 93 56 Day 3 1,667 213 Day 2 Day 1 4 hours 500 126 46 MIR106B MIR193A MIR30D MIR130A MIR193A MIR30D 365 73 MIR130A MIR200A MIR331 Prize−Collecting Steiner Forest MSigDB MIR130B MIR200A MIR331 Number of biological interactions MIR130B MIR200B MIR345 MIR132 MIR20A MIR345 0 ACTN4 FN1 LOXL2 S100A8 MIR141 MIR221 MIR424 CD82 FOXO1 MMP2 SCEL MIR141 MIR221 MIR452 WC Phos

CDH1 IGFBP7 MUC1 SERPINE1 MIR145 MIR222 MIR491 ) CDH2 IRF6 NUMB SERPINF1 MIR149 MIR23B MIR491 Nuc CLDN7 ITGB1 OCLN SKP1 MIR149 MIR23B MIR506 Sec CLU ITGB4 PCDH9 SLC9A3R1 MIR150 MIR29C MIR506 Mem

EV Current CTNND1 JUN PKP3 SPARC MIR150 MIR301A MIR612 MSigDB ( F11R LGALS1 PLA2G4A TIMP3 MIR15A MIR30A MIR675 Pep knowledge F3 LOX ROCK1 TNC MIR17 MIR30B MIR675 Glyco bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2

A B D Neighbor joining Relative pseudoeigengene variance R2 between layers Prot Glyco 0.25 Phos ... with total protein Day 12 Glyco Nuc Nuc 0.20 0.40 EVs Sec Mem 0.15 Day 8 ) 2 miRNA mRNA R 0.20

Pseudoeig 2 Sec 0.10 Mem Day 6 Pep Phos Metab Acet EVs 0.05 Day 5 mination ( Pep 2 r 0.3 0.4 0.5 0.6 0.00 Acet Day 3 Pseudoeig 1 ... with mRNA Prot Day 4 Glyco Day 2 C Matrix correlations 0.15 1 Mem Mem Sec Pep 0.10 mRNA 0.8 Coefficient of dete Nuc 1 Phos EVs Phos Sec 0.05 0.6 miRNA Acet EVs Pep Acet 0.4 0.00 Nuc Prot

4 hrs y 1 y 3 y 5 y 8 y 2 y 6 0.2 y 4 a a a a a a a Cont Glyco y 12 a D D D D D D Day 1 D

Metab D 4 hours

E F SOMs Features Contribution Active subnetwork analysis of upregulated genes Cooperation of PDCL (PhLP1) and TRiC/CCT in G−protein beta folding ACTG1_Mem 50.0 Formation of tubulin folding intermediates by CCT/TriC STC2_Sec 40.0 Association of TriC/CCT with target proteins during biosynthesis SLC43A3_Prot 30.0 Chaperonin−mediated protein folding S100A2_Sec Interaction between L1 and Ankyrins hsa-miR-3175_miR 20.0 Control Cytosolic tRNA aminoacylation ABCF1_Prot 10.0 Metal sequestration by antimicrobial proteins PROM2_T816_Phos 0.0 Activation of BAD and translocation to mitochondria LRR FLII−interacting protein 1 (LRRFIP1) activates type I IFN production ACTG1_Mem 60.0 Loss of MECP2 binding ability to 5mC−DNA ZNF827_S487_Phos rRNA modification in the nucleus and cytosol SLC43A3_Prot 40.0 Telomere Extension By Telomerase S100A2_Sec Chk1/Chk2(Cds1) mediated inactivation of Cyclin B:Cdk1 complex ABCF1_Prot 4 hours 20.0 Cytosolic sensors of pathogen−associated DNA STC2_Sec IRF3−mediated induction of type I IFN SRSF7_S227_Phos 0.0 Beta oxidation of hexanoyl−CoA to butanoyl−CoA PLCG1 events in ERBB2 signaling PROM2_T816_Phos Activation of C3 and C5 EPHB2_Glyco 40.0 Loss of MECP2 binding ability to 5mC−DNA CBX1_Prot Detoxification of Reactive Oxygen Species ABCF1_Prot Respiratory electron transport, ATP synthesis Day 1 hsa-miR-34c-5p_miR 20.0 Respiratory electron transport H3F3B_Prot SUMO E3 ligases SUMOylate target proteins HPR_Sec 0.0 LRR FLII−interacting protein 1 (LRRFIP1) activates type I IFN production 1 PLCG1 events in ERBB2 signaling hsa-miR-675-5p_miR 40.0 Scavenging of heme from plasma NBL1_Sec 30.0 Retinoid metabolism and transport A0A3B3ITY6_Sec Anchoring fibril formation ITM2B_Sec 20.0 ECM proteoglycans Day 2 FSTL3_Sec Chondroitin sulfate/dermatan sulfate metabolism SERPINA3_EV 10.0 Non−integrin membrane−ECM interactions LTBP1_Sec 0.0 Laminin interactions Propionyl−CoA catabolism LTBP1_Sec 30.0 Defective HLCS causes multiple carboxylase deficiency hsa-miR-30c-5p_miR Anchoring fibril formation IGFL1_Prot 20.0 Fibronectin matrix formation SERPINA3_EV ECM proteoglycans

Day 3 NBL1_Sec 10.0 Laminin interactions NKPD1_T551_Phos Chondroitin sulfate/dermatan sulfate metabolism Time FSTL3_Sec 0.0 Negative regulation of TCF−dependent signaling by DVL Defective HLCS causes multiple carboxylase deficiency hsa-miR-21-5p_miR 40.0 Integrin cell surface interactions LOX_EV 30.0 Fibronectin matrix formation SHARPIN_Prot Golgi Associated Vesicle Biogenesis SQSTM1_S277_Phos 20.0 Regulation of TNFR1 signaling ARHGAP33_Sec Day 4 TP53 Regulates Transcription of Cell Cycle Genes MT.CO1_Prot 10.0 Signaling by Non−Receptor Tyrosine Kinases MMP9_Sec 0.0 p130Cas linkage to MAPK signaling for integrins Phosphorylation: G1/S transition by active Cyclin E:Cdk2 complexes HMGN1_Sec 40.0 Cell−cell junction organization ARHGAP33_Sec 30.0 CDO in myogenesis SHARPIN_Prot Adherens junctions interactions RAPGEF5_Sec 20.0 RHO GTPases activate ROCKs Day 5 TMSB4X_Sec RHO GTPases activate PAKs hsa-miR-3162-5p_miR 10.0 RHO GTPases activate IQGAPs CRIP2_Sec 0.0 Uptake and function of diphtheria toxin 2 Activation of ATR in response to replication stress ARHGAP33_Sec DNA strand elongation hsa-miR-5189-3p_miR 40.0 Orc1 removal from chromatin ALCAM_Nuc Synthesis of DNA TPM2_Nuc Mitotic G1−G1/S phases

Day 6 NASP_S480_Phos 20.0 Unwinding of DNA HMGN1_Sec Assembly of the pre−replicative complex ACKR2_Prot 0.0 Regulation of TNFR1 signaling Phosphorylation: G1/S transition by active Cyclin E:Cdk2 complexes ZNF827_S487_Phos 60.0 CDO in myogenesis CALR_K62__1_Acet Adherens junctions interactions RAPGEF5_Sec 40.0 RHO GTPases activate IQGAPs DNAH11_Prot Uptake and function of diphtheria toxin Day 8 ARHGAP33_Sec 20.0 Calnexin/calreticulin cycle hsa-miR-21-5p_miR Antigen Presentation: Folding, assembly and peptide loading of class I MHC DDX60L_Prot 0.0 Assembly of Viral Components at the Budding Site G0 and Early G1 ZNF827_S487_Phos 80.0 Smooth Muscle Contraction SOCS3_S13_Phos 60.0 Striated Muscle Contraction CCDC171_S121_Phos Unwinding of DNA ATF7IP_Prot 40.0 Insulin receptor recycling TPM2_Nuc Day 12 20.0 Iron uptake and transport DDX60L_Prot Transferrin endocytosis and recycling SSR3_Prot 0.0 TAK1 activates NFkB by phosphorylation and activation of IKKs complex EV Sec Nuc Prot Acet

Mem Time Phos Glyco miRNA Metabol bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3

A Metabolite classes B Neighbor joining E Metabolite-Enzyme correlation Total detected Downregulated Upregulated Day 12 (AAM; SOM 4 hours & day 1) Phenylpropanoids and polyketides Organosulfur compounds Day 8 Day 6 Organooxygen compounds Organonitrogen compounds PLA2G15 Organoheterocyclic compounds Day 5 Organic compounds Organic acids and derivatives Nucleosides, nucleotides, and analogues Day 2 Day 4 Lipids and lipid−like molecules 4,259 putative metabolites Lignans, neolignans and related p-value ≤ 0.05 Hydrocarbon derivatives Day 3

Homogeneous non−metal compounds r² ≥ 0.6 ) Homogeneous metal compounds q-value ≤ 0.05 Benzenoids 113 Alkaloids and derivatives Day 1 0 3 6 9 0 3 6 9 0 3 6 9 Number (log2) Cont 4 hrs C SOMs D Integrative network analysis (for metabolites) (SOM metabolites + Differential Prot & Phos) KEGG enzymes (n= PLA2G4A_S437 4 hours & day 1 Pearson's Contribution to correlation enrichments 1 PLA2G4A −33.12 Arachidonic acid metabolism 0.5 CYP4F2 Proteins 0 Control −0.5 −1 −8.62 Glycerophospholipid metabolism Metabolites

−7.19 Lysine degradation p−value (log2) 5(S)-HETE 5(S)-HPETE 12(S)-HPETE −4.34 Tryptophan metabolism 15(S)-HPETE Leukotriene A4 Leukotriene B4 Prostaglandin E2 4 hours 4 6 8 10 F 0-OH-Leukotriene B4 Days 2–4 Enzymes ranked by correlation (Pearson's r ≥ 0.7)

−22.93 Glycerophospholipid metabolism GSR PXDN DDX21 HARS PTPN14 PLA2G15 HK2 Day 1 −11.86 Fatty acid metabolism MICAL2 CSS2 CHD3 A GLUL −11.28 Lysine degradation MICAL3 CES2 SSH3 DDX52 HSD17B6 NSD1 DDX41 −10.18 Steroid hormone biosynthesis PCYT1A LIG1 DNMT3A P4HA1 GPHN Enzymes −7.88 Sphingolipid metabolism PLA2G4A DDT ICMT Day 2 PDE4D MCM6 MGST3 LY −7.6 ASS1 p−value (log2) Biosynthesis of unsaturated fatty acids C A TRX MCM3 A CYP4F2 −7.3 Other types of O−glycan biosynthesis SETD1A GALE PTPRE PCMT1 −7.05 Inositol phosphate metabolism 0.7 0.8 0.9

Day 3 Correlation −6.52 Ether lipid metabolism Time G 2.5 5.0 7.5 10.0 12.5 Dynamics of metabolites & enzymes in AAM pathway TGFβ Days 5–12

−19.64 Lysine degradation Day 4 Phos PLA2G4A PLA2G15 −18.38 Glycerophospholipid metabolism WC mRNA Scaled −16.18 Fatty acid metabolism expression

−14.69 Retinol metabolism 2 Free AA 1 0 −12.16 Arachidonic acid metabolism −1

Day 5 −2 −11.46 Glycolysis / Gluconeogenesis

−11 Glutathione metabolism CYP450 Cyclooxygenases Lipoxygenases

−10.09 N−glycan biosynthesis CYP4F2 COX2 LOX −8.94 Mucin type O−glycan biosynthesis

−8.88 Other types of O−glycan biosynthesis Day 6 −8.72 Biosynthesis of unsaturated fatty acids 8,9-EET 5,6-DH-PGJ2 5-HPETE

p−value (log2) 5,6-EET 2,3-Dinor TXB2 8-HPETE −8.37 Drug metabolism − other enzymes 11,12-EET 11-HPETE 14,15-EET 12-HPETE −7.73 Glycerolipid metabolism 15-HETE 15-HPETE 16-HETE 11,12-EETA −7.64 Glycosaminoglycan degradation 19-HETE 14,15-EETA 20-HETE Metabolite profiles 20-OH-LTB4 Day 8 −7.32 Glycosaminoglycan biosynthesis − keratan sulfate Hepoxilin A3 Hepoxilin B3 −7.21 Steroid hormone biosynthesis Scaled −7.04 Chondroitin / dermatan sulfate expression Time

−6.82 Ether lipid metabolism

−6.71 Inositol phosphate metabolism Day 12 5 10 15 Number of hits EMT progression bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4

A Expression of hallmark transcripts (scRNAseq) B Differentiation trajectory

CDH1 EPCAM KRT14 MUC1 Control 4 hours Day 1 Day 3

30.0 100 ●●● ● ● ● ●●●●●●●●●●●● ● ● ●●● ● ●●●●●●●●● 100 ● ● ●●●●● ●●●●● ● 10.0 ●●● ●●● ● ●●● ● ● ●● ●● ●●●●●●●●●●● ● 10.0 ●● ●● ● ● ●●●●●●●●●● ●●● ●● ●● ●● ●●●● ●●●● ●●●●●●●● ● ●●●●● ● ● ● ● ●●●●●●● ●●● ●●● ●● ●●●● ●●5 ● ●●●●● ●5 ● ● ●● ●5● ●●● ●●5●●● 10 ●●●●● ● ●● ● ●● ● 3.0 ●●●●●● ● ●● ●● ● ● ●●●● ● 10 ● ● ● ●●● 3.0 ●● ● ●●● ● ●●●● ●●● ●● ●●●● ●9 ●● ●9 ●●●● ●●●●● ●●●9●●●● ●● ●●9●● ●●●●●● ●● ●●●●●● ●●● ● ● ●6 ●● ●● ●6 ●●●●●●●●●● ●6 ●● ●●6● ● ●32 ● ● ●32 ●●●●●● ● ●●●32●● ●●32●● 1.0 ● ● ● ●●●●●●●●●●● ●●●●●●●● ● ●●●● ●●●●●● 1.0 ●●●● 8● ●●●●●●●● ●8●● ●●●●●●●●●●●●● ● ●●8●●●● ● ●● ● ●8●● 1 ●●●●●●●●●●●● ●● 7 ●●●●●●●●●●● ● ● 7 ●●● ● ● ● ●●●7 ● ●●● ● ● ●7 1 ●●●●●●●●●●●●●●●● ● ●3●●●●1●●1 ● ●●●●●●●●●●●●●● ●● ●●●3●●11 ●3 11 ● ●3 11 ● ●●●●●●●4●●●●●●●● ●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●4 ●●●●●●● ●● ●4 ●● ● ●4 ● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●● ● ● ● ●●●●●●● ●●●●●●2●●●●● 2● 2 2 ●●●●●●●● FAP FN1 LOX VIM Day 4 Day 5 Day 6 Day 8 30.0 ●●●●●● ●●●● ●●●●● ● ●●●● ●●●●●●●●●●● ● ● ●●● ● ●●●●● ●●●●●●●●● Expression ● ●● ●● ● ● ●●●● ●●● ●● ● 30 ●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●● ●●● ●●● ●●●●●●●●●● 100 30.0 ● ●● ● ● ● ●●●● ●●●● ● ●● ● ● ●●●●●●●●●●●●● ●●● ●●●● ● ●●●●●●●●● ● ● ●● ● 10.0 ●●●●●●● ● ● ● ●●●●●●●●●●●● ● ●●●●●●●●● ●● ●●●●●●●● ●●●●●●●●●● ● ● ●●●●●●● ● ●●●●●●●●●●●●●● ●●●●●●●● ● ●●●●●● ● ● ●●●●●●●●●● ●●● ●●●●●●●● ●●●●●● ● ● ●●●● ●●●●● 10.0 ●●●●●● ●●●●●● ● ●●● ●●●●●●●● ●●● ● ●●●●●●● ●●●●●●● ●●●●● ●●●●●● ●●●●● ●●●●● 10 ●●●●● ● ●5 ●● ● ●5●●●●● ● ●●●●●●●●● ●●5●●●●●● ●●●●●● ●●● ●5●●●●●●●●● ●●●● ●● ●●●●●● ● ●●●●●● ● ● ●●●●●● ●●● ● ●●●●●●● ●●●● ●● ● ●●●● ●●●●●●●● ● ●●● ●●●●● ●●●●● 3.0 10 ●●●● ● ● ●● ●●● ● ●● ● ● ● ● 3.0 ● ●●9●●●● ● ●●9●● ●●9●●● ●●● ● ●●●●● ●●● ● ● 3 ●●●3●●● ● ●●●●63 ●●●6●3●●●● ●●●●9 ● 1.0 ● ●●●●●2●●●●● ●●●●2●●●● ●●●2●●●●● ●●●● ● ●8●●●●●● ●8●●●●●● ●8 ● ●●63 1.0 ● ● ●●●●7●● ● ●●●7●●●● ● ●●7●●● ●2● ●1 ● ●1 ●●● ●1 ●●●●● ●1 ●●●●●●● 1 ●4 ●3 ●1 ●4 ●3 ●1 ●4 ●3 ●1 ●4 ●3 ●1 ●8●●7●● 1 ●●●●●● 0.3 0.3 ●2 ●2 ●2 ●2 Time points Clusters: 3 4 5 6 8 12 13 14 16 17 18 19 20

C Cell dynamics E Unique TFs for each subtype 20 ● ● ● ● ● ● ● ● CLPX 17 CD59 CALD1CALU AM200B AP1AR DAB2IP ● ● ● ● ● ● ● ● DCBLD2 C1orf21 F KIAA0101 13 S6 KLHL2 14 ● ● ● ● ● ● ● ● BTBD MED25 1 BTF3 PFKFB3 14 PLEKHA2 ● ● ● ● ● ● ● ● POM121C BNIP1 RNF 12 11 ● ● ● ● ● ● ● ● BICC1 RNF207 SLC7A6 8 A BAG4 19 ZNF567 TP2B1 ZNF300 ABCC5 ● ● ● ● ● ● ● ● ZNF236 6 18 ARFGEF1ARFIP2 ABHD12 ● ● ● ● ● ● ● ● S5 ZHX1 ABHD2 16 MYPOP ZEB1 NFIB 4 ● ● ● ● ● ● ● ● CITED2 9 AGTRAPAPC GRHL3ING4 POLI RFXANK COL8A1 ● ● ● ● ● ● ● ● ZNF124 2 S4 E ZNF358 CTBP1 8 ACOX1 P ZNF398 11 ● ● ● ● ● ● ● ● AS1 CUX2 ZNF581 EIF3E 0 5 S3 ZNF587 ● ● ● ● ● ● ● ● AEBP2 PDP2 4 ZNF6 ● ● ● ● ● ● ● ● S2 ZSCAN30 ZNF692 POC5 6 POLR2B ● ● ● ● ● ● ● ● ZNF740 3 S1 CBFB RMND1 ● ● ● ● ● ● ● ● ZNF544 EXOSC3T2A ADNP2 KA PDLIM5 ZNF35 G6S6 ALDH3B1

VIM MECP2 FN1

FAP TG2A LOX G1 A PALLD ZNF33B S1

Day 1 Day 3 Day 4 Day 5 Day 6 Day 8 MIOS CDH1 MUC1 ATP1A1 KRT14 Control 4 hours LUZP1

EPCAM 1A Clusters VDR NUAK1 BCL1 KLHL5 ZNF100 E M TRPS1 C1orf226 K ZNF248 ATNAL1 SOX7 CHCHD3 Mean expression ZNF264 KATNA1 NCOR1 G2 CSTF1 S2 ZNF37A FILIP1L MBD4 G5S5 ZNF436 Cumulative TF activity across all cell clusters DSE D LRRFIP1 Unique ZNF75A

CLASP2 root LMO2 TFs ZNF766 POLR2A CALD1 RAD21 KDM4B ZNF85 ARHGEF17 KDM5B BCLAF1 ID2 ASF1A ATXN7 S3 CIC CNOT1 7.5 HKR1 G3 HDAC2 ATXN1 GLIS2 GRB7 GOT1 HMGB2 NEDD4L ATP13A3 FOXK2 S4 MXD3 TSHZ1 ELF1 ARIH1 G4 SF1 ZFHX3 ALCAM ZNF71 SOX12 AKAP1 5.0 EGR1 TBPL1 CHD9 ZNF548 ZBTB JUN ZNF48 ZBTB24 KLF9 ZNF148 11 NR4A1 ZNF217 ZNF266 ZNF430 PHF21A ZNF195 ZNF75D SCENIC score ZFX RNF10 ZNF133 ENO1 2.5 HMBOX1 IRX5 TPM1 MBD1 METTL14 UBE2K SP1 ZBTB45 FOXK2 ING4 VEZF1 VPS54 NNT NF1 ZNF266 SP100 CD59 NF1 TWIST2 SNAI2 RREB1 MSX1 EPAS1 MSX1 GLIS2 ID2 0.0 ZEB1 Known EMT–associated CALM1 CALM2 TFs in dbEMT 2.0 CAPN2 100 200 300 400 500 LIMA1 PFN1 PSMB7 RPL37A S100A16 SRSF2 Ranked TFs TNFRSF12A

F Top 15 enriched TFs in each subtype G ATF3 BCLAF1 BHLHE40 CEBPB CEBPG EGR1 ELF1 ELK3 0.75 Schematic of human TF–binding array 0.50 0.25 ɑ-COF 0.00 ELK4 ENO1 ESRRA ETS2 FOS GTF2F1 HDAC2 JUNB 0.75 Apply 0.50 Cells nuclear 0.25 extract 0.00 to array KDM5B MAZ MXI1 MYC NR3C1 PML POLR2A RAD21 TGFβ 0.75 treatment SCENIC score 0.50 for 24 hours 0.25 0.00 RCOR1 RELA SMARCA4 SP1 TAF7 UQCRB YBX1 ZMIZ1 Prepare 0.75 Nuclear extract 0.50 COF recruitment motif 0.25 S2 S4 0.00 S1 S3 S5 S6 Infer TF-binding using database (=δB−scores) H I δB−score distributions δB−score distributions

p-val ≤ 0.001 p-val ≤ 0.006 D ≥ 0.32 D ≥ 0.27 actors actors f ZNF263 f PAX1 p-val ≤ 0.007 p-val ≤ 0.120 D ≥ 0.34 D ≥ 0.25 iption iption r SP1 r GCM2 p-val ≤ 0.000 p-val ≤ 0.092 ansc ansc

r D ≥ 0.45 r D ≥ 0.23 T T GLIS2 EGR1

−2.5 0.0 2.5 Recovered Consensus −2 0 2 4 Recovered Consensus δB−scores δB−scores bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 5

A Spatial B Class I (top 25) C Expression D Class II (top 25) E Expression 0.4 tion regulation rrela co FAP CALD1 EIF4G1 PLOD3 's SERPINH1 DIAPH1 of on 2 rs 0.20 S100A8 CUL3 ● a 1 proteins e CALD1 1 TMEM131 P 0 MAOA 0 ACTB PLOD2 MRPL12 −1 FN1 −1 TIPRL TGM1 −2 LDHA −2 TAGLN EIF4G2 ● 0.17 FNDC3B FAP OLA1 SND1 EVs Secretome GALNT2 2 VPS26A 0.12 1 CDH2 1 RCC1 S100A9 TEX10 0

0.19 xpression (log2) 0 xpression (log2)

SOAT1 e GLUL e 0.44 −1 CYBRD1 −1 EPRS

ed ed −2 P4HA2 z −2 LSM2 z Glycosylated IGF1R PTRF S100A8 UFL1 0.18 LARP1 PGK1 PADI2 ● 2 RRBP1 0.26 1 Membrane PDIA5 1 FSCN1

0.16 Standardi Standardi CALU SLC25A3 0 0.58 MCAM 0 FARSB TGM2 −1 SND1 −1 C12orf75 −2 PLOD3 −2 FLNA UFL1

0.96 0.97 0.98 0.99 y1 y2 y3 y4 y5 y6 y8 −0.90 −0.93 −0.96 −0.99 y1 y2 y3 y4 y5 y6 y8 a a a a a a a a a a a a a a y 12 y 12 a a D D D D D D D D D D D D D D D D Control Control Pearson's correlation 4 hours Pearson's correlation 4 hours

Nucleus EV–Mem ● EV–Nuc EV–Sec Glyco–Mem Glyco–Nuc Mem–Nuc Mem–Sec Nuc–Sec EV Sec Glyco Mem Acet Phos Pep Nuc WC mRNA

F Expression profiles G hTF array of 3 major co-factors H BRD proteins regulate EMT

BRD2 BRD4 JQ1 inactive JQ1 Untreated 2 Pearson's r = 0.48 Pearson's r = –0.42 +TGFβ

p-val ≤ 0.018 1 D ≤ 0.017

p300 – TGFβ p-val ≤ 0.000 xpression (log2)

0 actors f e

− D ≤ 0.039 ed Co LSD1 z SP1 KLF14 −1 WC p-val ≤ 0.000 PLAG1 KLF16 D ≥ 0.086 GLIS2 EGR1 Nuc SP4 ZNF263 mRNA BRD4 Standardi −2 + TGFβ y1 y2 y3 y4 y5 y6 y8 y1 y2 y3 y4 y5 y6 y8 a a a a a a a a a a a a a a y 12 y 12 −3 −2 −1 0 1 2 3 a a D D D D D D D D D D D D D D D D Control Control 4 hours 4 hours δB−scores

I Expression of SCRIB J BioID'ed SCRIB interactors with known role in EMT K IP of SCRIB interactors SCRIB Pearson's r = –0.21 SCRIB – – + + 2 TGFβ – + – + SCRIB 1

TJP1 LAMC2 TANC1 FASN EIF4G1 MKL2 UTRN SLC16A3 SLC4A7 COL17A1 CAPZA1 ARHGEF7

xpression (log2) 0 e SNAP23 ed IP: Streptavidin z −1 EEF1D SNAP23 WC CTNNA1 TJP2 ITGA6 WWOX SCRIB Mem −2 Input Standardi Phos GAPDH SCRIB BioID (this study) Enriched pathways p–value Known PPI y1 y2 y3 y4 y5 y6 y8 ITGB1 a a a a a a a y 12 Up-regulated (Mem layer) a D D D D D D D Cell-Cellcommunication 2.55E-24 D Control

4 hours Down-regulated (Mem layer) Integrin cell surface interactions 2.94E-18 Not detected (Mem layer) Cell junction organization 6.60E-15

ITGA2 ITGA3 TLN1 FLNA ITGA5

L M Analysis pipeline Ligand-Receptor (L–R) pairs (Sec, Mem and Glyco datasets) CRK Database of UR

L-R pairs AV (>2500) A logFC ITGB5 ITG LRP1 PL AV LRP1 T1 4 − − − − − AV R ITGA1 ITGB1 UR UR LRP1 LDLR 2 AV ST14 − − ITGB4 ITGB1 ITGA2 SDC1 OBO1 COL17A1 ITGB1 ITGA2 ITGA6 ITGA3 CD151 V1 EGFR ITG A EPHB2 EPHB2 EPHB2 EPHA2 ERBB2 − − Proteomics A F11R EPHB2 PVRL2 PVRL1 ITGB5 ITGB1 CD63 ITGA3 − EPHB2 CD44 LRP1 CD44 CD151 − − − − − − − − − − A ITGA5 LRP1 IGF1R

− − 0 R CELSR1 LRP1 SO SDC1 GPC1 ITG LRP1 ITGA3 − − − − − − − − − − − − − − − − − − IGF2R CD44 F3 LRP1 − − − − − − − − − − − − NCSTN C F3 ITGB1 PL Sec -- Ligand NT5E ITG ITGB6 CD44 ITGB1 PL ITGA5 ITGA2 ITGA3 ITGA6 −2 IGF1R − − − − − − − − − − − − − − − − − − − T4 T4 T4 T4 AP1 AP1 Mem/Glyco -- Receptor − L L L L −4 scRNAseq P P TBP1 MMP9 SERPINE1 SERPINE1 L SERPINE1 LAMC2 LAMC2 SERPINE1 CALR SEMA7A SEMA7A SLIT1 TIMP2 MMP9 MMP9 FN1 FN1 MMP7 CALR ML FN1 LR FN1 FN1 FN1 FN1 LAMC2 EFNB2 ML EFNA5 EFNB1 ANXA1 ML ML APP AZGP1 TIMP1 CALR MMP7 PKM PSAP PSAP PSAP IGF2 FN1 FN1 FN1 LR APP TIMP2 TFPI HSPG2 SLIT1 EFNA1 PLG CTGF EFNB1 TFPI HSPG2 CTGF SLIT1 SPINT1 INS LAMC2 LAMC2 PLG PLG HSPG2 LAMC2 SERPINA1 LAMC2 CDH1 Day 12 Day 8

Cells Day 6 Genes Day 5 Time points Genes Day 4 Day 3 Day 2 Day 1 4 hours Identify L-R pairs N O Crosstalk (scRNAseq) 17● Expression correlation in BrCa clinical samples Clusters ) LR pairs 1 Time points MMP7–CD44 Time points ●12 Protein 2 ●16 1 29 LAMC2–ITGB4 Select L-R pairs iTALK CPTAC ( (combined log2FC ≥ ±1) ●13 compare ) Annotation 5 1

LR pairs 5 3 Time points mRNA 1 6 CD44

14 CD44 MMP9–CD44 LAMC2 4 ●1 ● ● 1 2 TCGA ( 1 20 2 1 ● 1 MMP7 MMP9 ITGB4 1 ●3 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 6

A Fraction of regulated p−sites B Regulation of VIM p–sites C Correlation: Prot and Phos 4 hours Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 Day 8 Day 12 ~50% ● ● ● ~26% S5 S10 S39 S72 S73 S214 S325 S339 S459 ● 1.0 Quantified 75 M Regulated sites (%) − 50 ● E/M

Density 0.5 E 25 ●

Regulated p 0.0 0 VIM 1 50 100 150 200 250 300 350 400 450 −1.0 −0.5 0.0 0.5 1.0 Proteins (ranked by fraction of p–site regulation) E E/M M Down Up IF rod [IL]−x−C−x−x−[DE] motif Correlation

D Regulation of Prot and Phos E Enriched "Cellular components" Prot S21 S33 S37 M CDS2 Phos MISP T377 S394 S395 S397 S400 S543 S544 S675 ICS E

0 100 200 300 400 500 600 700 800 900 1000 0 100 200 300 400 500 600 GYF T684 S685 S687 S977 CBX1 S89 S91 MICAL3 Adj.P.value A 2.0–20 0 100 0 100 300 500 700 900 1100 1300 1500 1700 1900 A –03 Chromo 1 Chromo 2 CH domain LIM Zn-binding bMERB Nuclear localization signal 9.9

F Correlation: Nuc and Phos G Nuc and Phos H Structural model of MICAL3 p-sites, NLS and Importin-α MICAL3_T684 MICAL3_S685 0.8 MICAL3 NLS EMT hallmarks MICAL3_S687 MICAL3_Nuc

0.6 VCAN_S1129 VCAN_Nuc ≥ 0.4 r FLNA_S2131 FLNA_Nuc 0.4 PVR

Density SDC1 TNC_S72 VCAN TNC_Nuc T684 0.2 VIM TNC TOPORS_S98 TOPORS COL4A2 VCAN TOPORS_S567 Importin-α CD44 ITGAV DST BASP1 FLNA MICAL3 TOPORS_S570 S685 ≤ –0.1

r TOPORS_Nuc 0.0 S687 y 1 y 3 y 8 y 2 y 5 y 6 y 4 Z-scores a a a a a a a −1.0 −0.5 0.0 0.5 1.0 y 12 a D 5 D D D D D D D Control 0 Correlation 4 hours E E/M M −5

I EMT stage-dependent kinase activities J Expression profiles K Expression profiles

E/M PRKCA AURKB AKT1 100 CDK4 CAMK2A AKT3 CDK6 CAMK2B CHEK1 1 CLK2 CDK2 CSNK1G1 1 HIPK1 20 CSNK1A1 CHUK EIF2AK2 GSK3B MAP2K1 80 0 IKBKB MAP2K5 xpression (log2) xpression (log2) MAP2K4 CSNK1G3 e e AKT2 MAPK8 MAPK13 Mem 0 MAPK11 GRK5 ed −1 ed MAPK9 MAPK14 z Nuc z MARK1 MAP2K6 40 GRK1 MAPKAPK3 PDHK4 WC Mem PRKAA2 CAMK1 60 PDK2 PKD1 −2 mRNA mRNA RPS6KA1 −1 RPS6KA2 PRKAA1 PRKCD PRKCB PRKCE RPS6KB2 ATM PLK4 Standardi Standardi PRKCT PRKCG y1 y2 y3 y4 y5 y6 y8 y1 y2 y3 y4 y5 y6 y8 a a a a a a a a a a a a a a y 12 y 12 60 a a PRKCA TTK PRKCH D D D D D D D D D D D D D D D D Control Control PIM1 40 PRKCI 4 hours 4 hours CSNK1G2 NEK2 AURKB CDC42BPA ROCK1 ARAF PLC beta mediated events ATR PRKDC CAMK2G ROCK2 RHO GTPases activate IQGAPs AURKA MAK Acetylcholine regulates insulin secretion CAMK4 RPS6KB1 Apoptotic cleavage of cellular proteins 80 NEK4 PLK1 CSNK1E The NLRP3 inflammasome CSNK2A2 Initiation of Nuclear Envelope Reformation MARK4 20 PDK1 Ca−dependent events CASP−mediated cleavage of cytoskeletal proteins CDK7 NOSTRIN mediated eNOS trafficking RAF1 DAG and IP3 signaling Sema3A PAK dependent Axon repulsion XBP1(S) activates chaperone genes 100 CSNK2A1 CLK1 Platelet sensitization by LDL Inflammasomes PAK1 RPS6KA3 mTORC1−mediated signalling Regulation of insulin secretion Cell Cycle, Mitotic E 20 40 60 80 M Synthesis of IP3 and IP4 in the cytosol 100 Cell Cycle CaM pathway Pathways associated VEGFR2 mediated vascular permeability

Effects of PIP2 hydrolysis with substrates 0 500 1000 1500 0

250 500 750 Enrichment score 1000 Enrichment score bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 7

A An overview of network analysis pipeline B PathLinks: TGFBR1 to key EMT players Phos Phos miRNA Metabol TGFBR1 Exos Sec Glyco Mem Prot t1;t2;t3 t1;t2;t3;t6 Nuc t1;t2;t3;t4; t1;t2;t3 t5;t6;t7;t8

SMAD2 SMAD3 SHC1 RAC1 Feature selection Hotelling's T² statistic t1; t1; t1; t3 t2; t3; t2; t1; Differential expression analysis t3 t1; t2; t4 t3; t2; t 3 t2; p-value ≤ 0.05 t4; t 3 t3 t3 CausalPath absolute log2FC ≥ 1 t1;t2; t3; t8 t6 Causal modeling t3;t5; using prior mechanistic t8 knowledge Determine Prizes MEF2A CEBPB MYC MAPK3 CDC25A MAPK14 MAPK1 ROCK1 PAK1 t3; t1; Augmentation of the OmicsIntegrator t5; scaffold network t1; t3 t6; Generates forest of trees t2; t6; t2; t 3 t7 t3 t7; (=biological networks) t3 t6; t8; with highly t7; t 8 t 4 t 8 t9 connected branches t8 (=signaling pathways) SERPINB5 TP63 TAL1 POLR2E POLR2A CDK2 MAX CSNK2A1 *SCRIB ... t5; Time 2 EMT network t6; t 9 Time 10 t6; t7; t6; t 4 t5; A hierarchical causal t8 TMEM104 UHRF1BP1 PROCR t7; FXR1 t7 t8 EFR3A THOC5 NSMCE4A ATP2C1 SYVN1 ZC3H4 ELAVL2 PPAPDC2 signaling network LYRM2 t 7 TAF6 AGFG2 HMGB3 TRIP12 MRPS25 FRMD4B RAPH1 MBD1 t8; RAP2B CD9 CAST PLXNB1 MTFR1

LDHA DOCK7 GOLGA5 t9 hsa-miR-125a-3p UBE2J1 ANPEP CDK14 hsa-miR-338-5p PACS2 FAM134C LIPA ERBB2 TOPBP1 ESYT1 RALY GPR137BELOVL1 PHF23 UBN1 WARS GTF3C1 MAP4 MYCBP2 RAVER1 TRIM23 CWC25 HS2ST1 TAGLN2 PAQR7 CTR9 FRY SLC46A3 PRRC2B SLC25A5 CCNY RAB3D SEC63 IGF2 RRP12 EIF5 ERBB2IP SLC39A9 RSBN1L JUN PERP MTA1 NCOR1 OCRL TBC1D4 SCAF11 ATP11AKRT72 HSP90B1 SOD2 COTL1 SDHAF1 LIG1CEP170B OSTM1 BUD13 ATP6V1B2 CTDP1 NEXN PRRC2C HIGD1A INTS3 EFNB2 SETD1B PHF8 GBP1 DCAF5 ZNF638 MINK1 ACKR3 GAS2L1 CAP1 SP110 TMEM65 METAP2 PPIC ADD3 MME MRC2 CPD LETM1 HMGN1 SLTM KRT77 SETD7 ATM ZNF462 INF2 RBM26PRR12 ZC3HAV1 STEAP3 CBX3 PLXDC2 ARHGAP32 CHSY1 ASXL2 MED13 APBB2 SP100TRIM22 RANBP3 HN1 SMG7 CHST3 GALNT2 SCD5 TJP2 RRBP1 PDIA3 RABGAP1 ITGA6 SETX hsa-miR-125a-5p HP1BP3 MOCS1 KIAA1217 NDUFAB1 SF3A3 ZNF629 TESK1 RAD9A PALLD CELSR2 DGCR8 CYBRD1 ARHGAP17 BASP1 LUC7L2 TTC7B ISG15 PRICKLE2 ARGLU1 DDX60L BTBD9 YTHDF1 TMBIM6 SNX6 GPC1 OGT TBC1D15 IRF2BP2 AHCYL1 ACTR2 PLEKHA7 RAB11FIP1 MIER1 STAU1 GANAB SRGAP2 LUC7L3 VAPB DUSP3 HOXC13 FOXK1 LAMC1 WAPAL CST6 SNX30 LAP3 OAS3 DLC1 PAPOLA JPH1 FBN2 hsa-miR-1 t5;t6;t7; LLGL2 FARP2 HADHATLN2 SNCG OSBPL8 MRE11A MTSS1 HBB SLC7A5 ANXA9 RANBP10 MGAT1 ATG2B FAM69A YIPF6 NDUFA11 CLIC4 CSNK2B LUZP1 CXCL12 TOMM20 SMYD5 CHEK1 C2orf54 VSNL1 DUOX1 REEP3 CNNM2 LRP11 GABARAPL1 KIF13A GPATCH4 NDUFB5 hsa-miR-137 ATP1B1 ATP2B4 SLC25A20 FBLN1 GPR56 LFNG CCT4 KLC4 CDCA3 IARSGOLGA1 VSTM2L IGBP1 ATP5J2 ZNF185 USP54 LAMP1 UBA52 ZNF263 hsa-miR-2110 MMGT1 LDLR CLSPN RPS21 DKK1 TM9SF3MAP3K7 XPNPEP1 CPEB4 PPL ESYT2 PDS5B TET3 TBX5 t8;t9 ACTN4 ACTN1 DMWD CUEDC1TNRC6A NBR1hsa-miR-1253 MRPS11TRAM2 UBE2V1 G3BP2 SQSTM1 STAT2 HPR ACOT13 MAP3K14 t 4 hsa-miR-338-3p t 3 UFC1 FXYD3 IGF2R HPX TJAP1 TFE3 KCNH8 TXNDC15 PLG IKBKG CBX8 RPL8 OAT PRKCSH PRCC hsa-miR-615-3p PDLIM1 PEF1 LRWD1 PEX13 hsa-miR-1469 NKPD1 ETS2 TANC2 PHC2 MRPL52 GALNT1 TNFRSF12ADAGLB CLDN7 KIF3B SPRY4FTH1 RSRC2 C1GALT1hsa-miR-300 GUSBAVL9 FAM171A1PPP1R14B NONO STRAP MT1X NUDT5 hsa-miR-1208 TNS3 PDAP1 GDI2 HSPBP1 BCL10 PTPRE ADAM19 PRMT8 t1; HK2 PPP6C ABCC3 BAG1CLUSORBS2 JPH2 PDCD4 PTGES2 ACSL5 CCNL1 ALDH3A1 C1orf174 RHBDF2 SHROOM2 CECR5 RAPGEF5 YWHAG PCDH1 TOR1B WDR70 ZDHHC21 AGFG1 GCHFR APOA1BP F11R TMEM184B MED25 HNRNPA2B1 NDUFS1 hsa-miR-217 ADCY6 SLC35B2 hsa-miR-648 PPP1R14CUBE2N BCL7C RTN3 KIDINS220 GCSH AFAP1L2CARHSP1 TXNDC11 NFIA GABARAP FZD6 SAFB2 NDUFV3 BTF3L4 TIMM8A B4GALT1 LSR QSOX1 FZD2 CREBBPTIMP1 hsa-miR-199b-5p TACC2 GOLIM4 TAF15 RPLP0 DVL1 CD82 MRPL47 SLC6A6 C11orf58 YTHDF2 POLDIP3 ESR1 CNBP BCAR1 PPP5C XRN1 STRA6 PGP t6; CIZ1 TAF5 CBL TGOLN2 FXR2 PTK2 TRAF2 CERS2 LAMA2 ABCC5 PPA2 MAP3K5 HNRNPL SERPINH1 CCK t2; SRP9 C1R APOB CALCOCO2 GLYR1 SUZ12 NUDT3 GADD45GIP1 PCMT1 IL1RAP SRGAP3 OTUD7B DBN1SLC2A5 DRAP1 MAFK EPHB4 UBA3 SIPA1 HMGN3 IFIT1CTSH CDV3 ALPP PACSIN3 FILIP1L PARL AKAP2 MATR3 LGALS9 TOMM6 CRLF3 ACSL1 HDLBPC6orf203 EIF3G TRAF6 HPS6 RELT SMPD1 COL6A1 WDR82 EPB41L4A PLEKHG1 t 4 PPM1F CDK13HNF4A DTL POGZ NOTCH1 PSIP1 PHLDA2 LYRM7 NHP2L1 GRAMD1AYY1 CMBL ARPC5L CCPG1 SMARCAD1 CHD3 NR3C1 RAP1B U2AF2 ELMOD2 GALNT7 CDS2 ANTXR1 SCARB2 STK16 UPF1 AKR1C1 C10orf35 PPP1CC PGBD5 hsa-miR-554 PER1 FAM3C ITGA3 NEK9FDX1 VPS4A t7; PCBP1 PI4K2A DNAJB4 hsa-miR-484 SNRPA1 GABARAPL2 ZDHHC5 IMPAD1 GOT2 BIRC6 HDAC1 ECM1 MZT2B EIF5B TTC39B ETFB TLK2 BDH1 CMAS t7; TSN FOSL2 ZNRF3 SKP1 CTSF NAV2 PRKAG2ZC3H14 CHST15 WDR48 FKBP5 DAAM1LMNA RPP30 ATXN1 SON TRPV3DDX17 hsa-miR-1224-3p THUMPD1 DNAJB11 ABHD14B IL1B E2F6 DSC3 NT5E CDC42BPA t3; ARHGEF11 PLOD1 KIAA1161 TRIM28 SELM SLC20A1BCL9L SOX15 TPST1 EML2 PCCB HMGB1 ATP1B3 SSBP1 CARD9 MYO9A t 4 CTU2 A2ML1 PCDH9 ARL1 TNFRSF10ASULT1A4 BCL2L12 FLRT3 USP47 LYSMD2 PAX5 ALDH1A3 SMCR7L PEX16 CCNT2 EGR1 OTUB1 TMC4 TLK1 RANBP2 COPS2 SYNPO PTPN3 RNF214 GPX8 LTBR SNAI2 C5 MFAP1 ZNF652 CTCFL ETFDH SPG20 IFFO1 SERPINE1 TFAP4HACL1 t 4 t 3 CADM3 PDLIM7 PIK3IP1 SHMT1 BLVRB TRIAP1SDF2L1 AGRN PARP12SPARC NHLRC2 MPP7 RPE MYO1E CPNE8 YEATS4 GRAMD3 hsa-miR-483-3p TTYH3 CD44 AAK1 TRIM24 PKP3 CD47 PLEKHM2 C12orf65CRIP2 MINPP1 PAWR BRMS1L PLBD2 ABCD4 SERPINA1AFF1 HSP90AA1 PGM2 COX18 CHD4 TAF10 t8; FN1 PTGES3 hsa-miR-661 TRIM38 EZR PRPF40A SLC38A10 FSTL3 FUT5 SLK MYO18A PPIL4 S100A6 STK10 CBX1OPTN EEF1B2 STC2 TFAP2C MFSD6 LYST UBAP2L t8 PCDHA13 LACTB GCDH PSMB6 P2RY2hsa-miR-595 S100A13 KCNIP3 FMNL1 SSFA2 PRPF4B TGM1 VPS13D FSCN1 TNFSF10 SH3BGRL3 EIF2A POLR3B NHLRC3 ICMT SLC35A2 CLK4 t9 EIF3K t 4 KPNA1 ARHGEF2 HINT1 PTPRK GDF15 LRP10 CSTA FZD8 SLC1A4 MEPCE SRF TBC1D25 JUNB CA12 hsa-miR-1279HIF1A REST MSRB3 ENDOD1 HSDL2 MDK FAM83A DDX20 CLIC1 ATF1RCOR1GEMIN4LAMB1 EBF1 SCEL MAP7D1 SLC43A3 DBT TSTA3 SCP2 PGRMC1 RAD23A TGFBI APOA1 EFNA1 KPNA6hsa-miR-548c-3p hsa-miR-1224-5p TMEM63B BLOC1S1 SP1CA2 ZC3H18 PA2G4 PPP2R5A VARS FOSL1 CORO1A VPS26A WHSC1 WDR44 PTPN23 ANP32B SEC16A OSBP MORC2hsa-miR-657 REPS1 TIMP3 SRP72 FAM177A1 GNB1 UNC5BCAV2 LEPREL2 PTP4A3 TTR S100P LTA CASP3 SMARCA2 CD63 TWF1 BCKDHA EIF2S2 NPC2 H3F3B SLC37A2 FOXL2 ZBTB7A ATG7 PPAN CXCL16TGFB2 TGFB1EPRS PHLDB2 GLUL H2AFY ZRANB2 USP8 t9 RPS7 FUBP1 hsa-miR-330-3p ABCB10 TARDBP GPHN COL17A1HP TPM4 HIST1H1A FAM129B RNF121 CMPK1 PCDHA12 SLC1A3 MAPRE2 RRP15 C17orf85 GLIPR1 SPTAN1 NUFIP2 FOXK2 GIT1 DIRC2 GLB1 PRR15L GBA TFPI HSPG2 HIST1H1DCNSTARHGAP27 FAM162A ZMYND8 TGFBR2ADRM1 EMP3 EIF4H EFNB1 SELENBP1 TRPS1VCP PAM OSTC LY96 RANBP1 HIBCH ARID4B FKBP10 PTER YAP1 hsa-miR-636GATA1 MAT2A PML HSP90AB1 NASP RPS13 RICTOR RNF41 JMJD1C SH3KBP1 CUX1 EIF2AK2 CLTC TAB1 hsa-miR-641 RBMS1 ENAH YWHAH ANGPTL4 LAMTOR5 HEG1 NRCAM ARF6 CTCF INPP4B SORL1 TSC1 TBC1D17 VDAC1 GOLT1B TAGLN ATF3 NECAP2 MAP1B RBM8A FAM193A FAM120A APEX1 ABCF1 FOXA1 CLK2 PBRM1 RERE PODXL USP6NL KRT18 TNPO2 LGALS1 CHD9 GPRC5A EFEMP1 PRRG1 SERPINE2 MECP2 VHL ATF4 CANX PDIA4 HSPA5 GLCCI1KANK4 SMARCC2 FUS KLF5 PPM1K LRRC41 FOXA2 NNMT RFX5 S100A4 LRRC1 WT1 ENGCHMP4ANCOA5 LOX SIN3ACSNK2A1ENO1 SSB NPLOC4 PITPNA PANX1 NFIC NANOGRFTN1CENPE AIM1 DAK SNRPD3 ARL15 ARF4 USP31CD99ATP5C1 ADNP DIXDC1 MBOAT2 INS B2M SLC29A1SMARCA4 HNRNPUL2GATAD2A BTN2A1 REEP1 DHX16 MIA2 RAB7A RCBTB1 FCHO2 BUD31 SERPINA3 SMAD3 LAD1 MYH14 CDC5L TBC1D2B C3 MMP3 STOM LARP7 TRIM59 SMC3 UQCR10SH3PXD2B FOXO1CHMP2B IGSF3 * FMNL2 CD40 hsa-miR-663b TAPBP PDX1 XRCC6 SLC38A7 TGFBR1 DAG1 FKBP9 EPS8L2 CASC3 ZFYVE26 SEC62 SULF2 CP AASDHPPT ZFC3H1 DAXX CCDC6 NUDT21 FNDC3B hsa-miR-142-3p CTNNA1 GAPVD1PRNP KLF13 CGREF1 CD79ARPS18 APTX LPAR2 RAB2ALASP1 PPP1R14AITGB5 MCFD2 CAMK2G FADS3 COX7B ITPR3 TNC FRMD6 BCAP31 hsa-miR-129-5p DNAJB5 KIAA1468 ATF7IPMAP3K6B4GALT4 PRKG1 IGFBP7 FNDC3A ILKhsa-miR-626 KMT2D BIK USP24 USP48 SNRNP200 BRWD1 SCRN2 MTFMT DMXL1 ARHGEF1 PRPF38A PLCG1hsa-miR-612 TNRC18 WDR43 FGFBP1 PROM2 PYGL ARPC1B WTAP P4HA2 PGRMC2 SDCBP2 RFFL PDE4D MEF2A ANXA3 GSS MMP14 MBTPS2 MLLT4 RAD21 FLVCR1 MED1PARD6A BCL2L13GATA2 PTPRS GYS1 ATL1 SERPINC1 CTGF PSMD6 NFE2 FAT2 ZHX1 LARP1SETD5 MPO PMEPA1 HARS MDH2 CEP350 SIRT1 COX6A1 HIST1H4ZFP36L1ARPLP2 DDX21 P4HA1 ICAM1 SNRNP27 MRPL49 FAS SDC4TIMM10 TBL1XR1 EPB41L1 ACBD5 CTSC GNS PRDM1 ANXA8 YIPF3 ARFGAP2 NCKIPSDHIST1H1E JUN RNF4 POLR2A ST14TIPRL hsa-miR-1248 SH2D3A ZNF281 TMEM165 ERP29 NCOR1 RBM5 DKC1 MRPS21 TRAPPC4 RRP9 ITGB4 GNG2 NFKB1 GOLM1 MIA3 WDR37 AFAP1 USP6 ITGA5 DDRGK1 SRC MYOFTAL1 ATP6V0C CLCA2 SMAD2 SRPK1 YIPF2 TOM1 MYCLPCAT2 C1S UTP14A IFI16 RPIASNRPE KRR1 NOP14 CRELD1RAB1B HIST1H1C hsa-miR-1205CNOT3G6PC3 SDPR SHC1 TMEM154 PTTG1 SLC27A1 GJB3 EDC4 FOSL2 MMP7 SPI1 EBF1 BCLAF1 NFKB1 SSR3 NHEJ1 HTRA1 MAX PLA2G15GABPA SLC15A4 TMEM2 FXN SF1 XRCC5 TMEM126B AATF PHB2OCIAD2 CYLD FAM83H AGAP3 TCF20 RPS19 PRKDC TAF7 BCLAF1 ATMPPP1R7 PKN2 MBP PFKM ADIPOQPAK4 SETDB1 hsa-miR-1207-3p PPIL3 SNAI2 hsa-miR-1207-5pINPP5D HIST1H1B DNMT3A UGP2 SMO POLR1D FAM160B1 ETFA BRCA1 ABLIM3 TRPC6 ETV6 ENSA CDKN1BFAM134A LYN DNAJC2 HM13CDC25A PPFIBP1 MRPL3 * FSTL1 SBSN CEP192 hsa-miR-339-5p WDR83 CASC4 CBX6 SPG11 RRAGC DNAJC9 TIMP2 STAT1 TUFM CD55LCN2 NIPBL NDUFA6 RBM10 SDR39U1 GTF2BCEBPB FAHD1 BCL3PAK1 ATXN3 CCDC102A XRCC1 PSAP TSPAN13 MMP10IGFL1 ZEB1 PPHLN1 TPM3 C18orf21 ADAM9 CCDC80TMEM120A USF1 SLC3A2 MCM4 GLOD4 GSK3A ATP5F1 RABIF HMGA2 FLNBSNAP23 RAC1 NCOR2 PRKAR2A PCYT1A APP PAFAH1B2 EPB41L4B TRPM4 IKZF3 NDUFA4 SNIP1 CD81 RPS2 FARSBPSME4 SMEK1 CCNK STK38L NOL12 AZGP1 CICSERPINB5 VCL SEMA7A CYR61 BCL2L1 CHD2 BMPR2HNRNPA1 INTS1 TOPORS SRRTATP1A1 IVL UNC93B1 FLNA SRD5A1 RPL10 NCL LCP1 TCHH MMP2 ZNF598 EIF4A1 AVENZNF292 BAIAP2L1C7orf50RNASE7 SARSARSK SLC35A4 CLPTM1 C4orf32MMP19 WASF2 TGM2 SMARCD3 E2F4 TOP2B CD59 DTNB POLR2E PTBP1 ZNF609 THRAP3 JUND SPEGC1QBP COX8A VANGL1 SLC39A7 GTPBP1 ITPKB ITGAV PRODH TFPTARID4A SP2 CAMKK2 RNPS1 OVCA2 ORMDL1 MGRN1 RBM25 WBSCR17MAPK8 FAM175A WDR20 IMPACT CALD1 CAMK2D PAK6 KRT6BSSRP1 PDE4DIP ATP6V0A1 CANT1 PRKCA MYL6 CROCCMARCKSL1 CACYBP ACO1E4F1 HMGA1 SLC7A2 BTD PVR NUCB1 CD97 AP3B1 CKLF ATP5O SDHA UIMC1 PTPN1 SIRT2 PDXDC1 NAPG PTTG1IP GPAM GUK1 PRKAR1A LIPG MAPK3 RPL37 MRPL15 TBC1D2 RNF20 NRF1 DOK1 FH NOM1 ARRB1 SNX16 PSEN2 hsa-miR-1257 CTSL PPM1B MMP7LTBP2 NBN MAPK1 DYNLRB1 KLK8 CEP152 IDH1 AGTRAP CDK12 RBP4 BLVRA SCAMP4 EIF4E TRA2A NOP56 AKT1 NOLC1 UTP18 VIM PAK2 ETF1 SPTBN1 NME1 GFPT1 SRSF1 MRPL45 XPC ATF2 COX7A2 ACIN1 ELL KRI1 PEAK1 CDH1 SP3 TP63 RABGEF1 L1CAM BRD4 TARS ATP5I DYNC1H1 RAB21 SDC1GNG7 COL4A2EZH2 SMAD4 ASPHD2 MOV10 CTTN TOE1 DNPEP PABPC4 LMBRD1 NFYB LAMA3 EAF1 UBTD2 LEPRE1 BTK ZNRD1PRKAA1 LAMC2 VDAC2 STEAP4 DYNC1LI1 GOLGA7 ZNHIT1 SDC2 ARSB RYBP PIAS1 PERP INTS7 HIP1 t 8 NUDCD2 TMEM9 RPTOR SUB1ALYREF MTDH ACTG1 hsa-miR-650 MLPH KRT1 NIFK CHD8 t4; KIF3A TCF4 S100A8 MYLKDPF2 ARID1A hsa-miR-147b MLLT1 GSTK1 NDST2 MTHFSD IRGQ TTN MAPK14 FDXR PRKACA LGALS3BP MDM2 PSD3 RAD23B EIF4G1 RPL3 PGAP2 NDRG1 CAV1 MYOD1 KATLAMB35 hsa-miR-760 TMEM44 SAFB PADI2 hsa-miR-330-5p EVPL FOS HMGB2 PLEKHA6 OCIAD1 CLIP1 ILF3 IRF6 ENOPH1TAPBPL PI4KB CDK1 DUT PAFAH1B1 MED14 ARPP19 ABLIM2 MTA1 PTRF CTSBACAP2CCDC88A SPEN SRSF2 BRIP1 PPP3CARALBP1 POLG2 KDM1A HIST1H3A FAF1 AK2 PCYT2 CPNE1 TMSB4XPRKCB SETD3 SAMHD1SQLEASS1 MUC1 POFUT1 ERCC6 CLK1 STIM1 TCOF1 TPR hsa-miR-1262 RUNX1SPI1 DLG1 PEA15 PLEKHG3 PARVA TAX1BP3 YBX1 IGHG1 CDK2 SERPINB2 LTB4R2 PHGDH BANF1 STX4 MAOA SNRNP70 BABAM1HTRA2 PKD2 TPPP TMEM201 RAB4A POLR2I F3 ARHGEF12 MYL9 KDM5A PRAF2 Legend NDEL1ITGB6 ZBTB16 CYCS RHOA NBL1 AJUBA PSMD1 t4; CTSZ KLC1 SSH3 SREBF1 CLSTN1 KCMF1 AKAP8 EP300 t 5 POFUT2 RAB5A ARHGAP1 PFN1 DSC2 AKAP13ESRRA APC t6; NPM1 DYNLL1 TPD52L1 PNKP TES FKBP8ACTC1 SVIL CDKN3 HDGFRP2 CES2 ARID1B MCM2 RBM39 LMNB2RPS6 CTNND1 ING1 SOCS3 HIST2H2AB PCK2 POU2F1 CEP120 ATR SHARPIN CNPY3 NUP153 HYI HEXB TCEAL3 OCLN PRKD1 PJA1 C2CD2 PITPNM1 CEP68 TP53 S100A9 CCT3 MDC1 RASSF1 BAD ATPIF1 AHCTF1 OSBPL5 VCAN PPP3R1 ZFP36 TOP2AYIPF4 AURKA RECQL PNPO SPECC1L t 9 PEBP1 FAM169A PPP2R2D PPARG DDX41 DNM2 SLC9A1 ABCC2 ACP1 ABCC1 CD2BP2 SRSF9 NDE1 SPRR1B MYOZ1 CDS1 PPM1G KIF14 URI1 EFHD2 GOSR2IMPDH2 BRAP PRPF19 TRAF4 INCENP RANGAP1 CHERP HIST2H3A BTF3 EIF4EBP1 PTPN6RUNX3 DIAPH1 HNRNPA0 POLR3A SLC12A6 EIF4A3 GALE LMO7 LIMA1 CGN THOC1LEMD2 SARNP CTSA USP16 NEB LRBA FOXC2 LRRC8E CBFB UBR4SRRM1 SIX5 RPS27L SNX17 SRPK2 PGD TMEM14C MCAM CS AHNAK NDUFS4 CST3 t5; AGPAT4 ANG ESF1 STK4 ORC3 OGFR TNIK SPTBN2 GEMIN5 PCNP PLAUACSS2SCD STAT3 DSPTTC1 ZNF687 CASP1 SLPI t9 CRYAB HDGF SYMPK SUMO2 HUWE1 AGPAT6 TOR1AIP1 SSH1 ARPC1A DNAH11 KCNN4 CABLES2 RPA1 SERPINB3 MAPKBP1 NUP214 PPA1 GLT8D1 NUCKS1 MARCKS IGFBP6 PPP3R2 MCM6 TP53BP1 HIST1H2BNFLAD1 RSF1 DHCR24 KIF22 RBM4B NCCRP1 KIF17 DDB2SAA1 PLAUR CAPZA2 SIRPA BCL2 ANO6 PTEN PBX3 CTSD RBMXL1 MCM3GM2A CLMNTOMM34 RAF1 PPP1CA RPS3A S100A11 ADK TUBA1C NARS NUP50 PHRF1 Increases expression RAB2B RBMX CCAR2 KMT2A STMN1CXCL8 MAPKAPK5 TMPO MPZL1 PAFAH1B3 USP36 NOB1 MAOB ATP5D HIST3H2BB FBN1 TFEB WC or Steiner CDH4 IQGAP2 EFNA5 PAX3 TPX2 BAG3KAT7 ARPC2 HNRNPC NGDNhsa-miR-944 PCM1 t6; CEP170 SLC25A3 PSMA7 WBP11 ROCK1 DIDO1 ZYX SCRIB C7orf55 BIRC5 SMC1A LRPAP1 IRF3 PNN PTDSS1 ABL1SF3B2 GGH SNAPIN DDB1 SFTPD SLC9A3R1 SERPINB13 HNRNPDTXN hsa-miR-142-5p CENPJ GLIPR2 EPHA2 CD14 TREX1MBD4 MTOR SLC16A1 TNKS1BP1 CHI3L1 UBE2I FABP5 VRK1 RAB12 PPP1R12A CASP4 CFL1TIMM9 NEDD4L FIG4 RIN1 TPM1 PRRC2ARBL2 MVK EPHB2 MAP4K3 TRAFD1 WBP4 DESUQCRH TBP CORO1B SFMBT1 RTKN ZBTB33 ITGB3 HSPB1 SPAG9 STAT6 PPP2CB PI3 FYN FNBP1L CDH2 ZC3H3 HSF1 CD2AP SF3B1 MAML1 FNBP4 CNN2 EPHA4 SUPT16H PRDX1 EIF4B SLC26A2 MDH1 NFE2L2 SERPINB4 PLA2G4A H2AFX DIABLO SCAF1 t7; FBLN5 EIF2B4 ACTBL2 HSPE1 PLD1 GPR108 NPC1 KDELR1 RPL36AL FAP DEK hsa-miR-645 MYF5 TCEB1 TMEM11 PPP2R1B SNCA ULBP2MVP Decreases expression APAF1 BBX EV or Sec HSPD1 ABLIM1 ANKLE2 GPR155 HMGN4 YTHDC1 KBTBD6 NOC2L SYTL1 PDLIM5 CHPF2 MCOLN1 ZMAT5CDON PPP1R10 ASB7 KCNJ15 SLC38A1 ACYP1 ST13 ZNF768 TXN2 RHEB SUDS3NUMA1 MPHOSPH10 ST5 PCNT t 5 PPME1 CCDC61 UBA2 SYTL2 RBPJ ACSL4 t4;t7 GPX1 FAM83B HSD17B6 ASPH MFSD3 SLC20A2 LGALS3 RPRD1BLAS1L RPRD2 SERPINB1MYBBP1A CALU CEBPG TRA2B GFAP TUSC1 DCTD CCDC132 RTN4 EXT1 UBP1 FRMD5 t8 HSPH1 PPP2CA GPR180 TJP1 ARHGAP5 RAB6B ANXA1 CDH3 BNIP3 SPCS2 NUBPL ADH5 BET1 GALNT6 TIA1 PSEN1 SMN2 RALGAPA1 MTMR12 ZNRF2 MPRIP TTL GPATCH8 SACSSETD1A NOP58 MAK16 ABI1 FAM120C SRRM2 RAB11FIP5 MAT2B G3BP1 ZDHHC17 DNAJB1 ZBTB1 XPNPEP3 AURKB BAZ2A CHM ATRX ASH2L RACK1 ELF1 RCN2 SUSD5 Glyco CRK FBXO38 COG3 CUL4A RBBP6 MAGI3 CEP135 HSPA4 PHF20L1 miRNA – Gene ARHGAP35 IGHG3 ALG9 MYL12A CDK8 TUBB4A ALCAM MTMR10 E2F1 TWISTNB ARVCF RPS26 IRF2BP1FGFR1OP TOM1L1 CNN3 DOCK5 NHS PRDX5 t 8 PTPN14 AATK ARHGAP33 AKT2 RACGAP1 LTF CASP9 CLPTM1L TGFB1I1 MAP2 VPS4B TMEM159 IPO7 CTTNBP2NL PARN CIR1 (De)Phosphorylation TUBA1A NELFE PTK7 SDF4 ATXN7 MAN2A1 C1orf131 RBM14 MKI67 SART1TFIP11 MUT Mem VPS53 COX5A NENF CBX5 RCC2 CCDC86 MGST3 DNAJB6 STXBP2 PLEC LMAN2 C18orf25 KIF23 DDX52 SYNE2 UTP3 VKORC1 MYO9B SNRPB2 CDCA8 TMEM179TBM7SF3 TOP3A RPS10 CSRP1 MAN2B2 CHCHD5 CASP7 KIF20A COBLL1 NUDC FAM195B SEC31A HGSNAT PUS1 MAPRE1 TF – Gene/miRNA GSR MISP DNAJA3 SPG21 RSL1D1 LSM14B ARPC5 C2CD5 HMGN2 ACLY NOL8 CLN6 Phos SDHAF2 STK39 CDK11A ATP7A MAFK hsa-miR-142-3p hsa-miR-142-5p LBR Signaling interaction Bottleneck Nuc

SDHB analysis CDK11B t5; t 9 Nuc and Phos Queried EMT players t6; t 5 * Properties of the EMT network t7; miR t1;tn Time 1; Time n Total number of nodes: 2217 t8 Total number of edges: 3255 Note: The highest logFC value Average number of neighbor: 2.970 * FN1 TWISTNB* *ZEB1 Low High and the corresponding layer is shown for each node.

C EMT network-derived signatures predict patient survival D Overlap with E Overlap with 27 controllers 2191 non-controllers E E/M M MSigDB Non-

+++++++++++++++++++ dbEMT2 ++++++++ ++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++ ++++ +++++++++ ++ ++++ +++++++++++ ++ +++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++ +++++++++ ++++++++++++++++++++++++++++++++++ +++ ++ + +++++ ++++++++++++++++++++++++++++++++++++ + ++++++++++ ++++ +++++++++++++++++++++++++++ ++++++ controllers +++++++++++++++++++++++++++++++++++++++++ ++ + ++ ++++++++++ + +++++++++ +++++++++++++++++++++++++++++++++++++++++++++++ + +++++ +++ ++++ +++++ + +++++ ++++++++++++++ +++++++++++++++++++++++++++++++++++ ++ +++++++++++++ ++++++++++++++++++++++++++++++ +++ ++++++++++++++++++++ ++++++++++++++++ + +++ +++ +++ ++ ++++++++ + ++++++++++++++ ++ +++++++++++++++ ++++ +++ ++++++ ++++ +++++ +++++ +++++ ++++++++++ ++++++ +++++++++ ++ +++ ++++++++ + ++++ +++++++++ +++ ++++++++++ +++ + + +++++++++++ +++ + + ++++ ++++ ++ ++++++++++++++++++ +++++++++++++++ +++ + +++++ +++++++++++++ ++++++++++++++++ ++++++++ ++ +++++++ ++++++ ++++++++++++ +++++++ +++ +++++++ +++ +++ +++ ++++ ++ + ++ 1.00 +++ +++++++++++++++ ++++++++++ ++++ +++ +++ ++++++++++++++ +++ +++++ +++ +++++++ ++ ++++++++++++++ + +++++++++++++ ++++ ++++ + + +++++++++ + +++++++++++++++ ++++ + +++++++++ ++ +++++++++++ +++++++ ++++ +++++ + +++++++++ +++ +++++++++ ++ + ++++ ++ ++++++++++++ ++++++++++++ ++++ +++ + +++++ ++ + ++++ ++++++ ++++++++++++++++++ ++++ +++++++++++++++++++++++ + + +++++++ ++ +++++ ++++ +++++ ++++++ + + + +++ ++ ++++ + + ++ ++ ++++++ +++ +++ + +++ +++ + + +++ ++++ ++++ +++ High +++++ + ++++++ +++++++++++++++ + ++ ++++++ ++++++ ++++ ++++ + +++ +++ +++++ + ++++ + + ++++ ++ +++ + ++++ ++ +++ +++++ + +++ ++++ +++++++++++++++++++++++ + +++++ + + +++ +++ +++ +++++++++++++++++ + + ++++++++++ + + ++ +++++++ ++++ +++++++ +++++++++++++++++ ++ ++ +++++ +++ ++ ++++++ ++++ ++++++ ++ +++ ++++++ + + + ++ ++ +++ ++ ++++ ++ ++++++ +++ ++ + ++ + +++ ++++ +++ + + ++ ++ + ++ + +++++++ ++++++ +++++++ +++ ++ +++++++ ++++++++++++++++ ++ ++ +++ + ++++++++++++++++ ++++ – +++ + + ++ ++ + + + +++ +++ + ++ +++ + +++ + + ++ ++ +++ +++ ++ ++ ++ +++++++ ++ +++++++ +++++++++++++++++++++++ +++ + + ++++ + + Low ++++++++++++ + + + ++ + ++ +++ + ++ + ++ + + + + ++ + + ++ ++ ++ ++ +++ + ++++ + ++ +++ + + + ++++ + + ++ + + + + +++ + + ++++ + + ++ ++ + + 0.75 + + + + +++ ++ – + + ++ +++ +++++++ ++ + + ++ ++ + + + ++ ++ + + + + ++ + ++++ ++++ ++++ +++ + + + +++ + + ++++ MSigDB +++++ ++ + ++ ++ + ++ ++ +++ + ++ ++ +++ + +++ + + ++ + +++ ++ + ++ ++ ++++ + + ++ ++ + + + + + + + + 239

++ + + ++ ++ + + ++++ + + + + +++

+ + +

+ + +++++++ + + + + + + dbEMT2 1,881 ++ + 1,098 + + +++ + ++ 67 ++ ++ + 0.50 +++ ++ + + ++ ++ + + + ++ ++

+ + + +

+ + + + + + + + + + + + + +

+ ++

++ +

++ + + + + + 132 + +

+ + 1 0.25 + + + 18 + 877 35

+ + + p < 0.0001 p < 0.0001+ p < 0.0001 p < 0.0001 7 36 Survival probability 33 0.00 Controllers 0 0 0 0 MEF2A SPI1 CSNK2A1 ABL1 96 Time: MSigDB 1000 2000 3000 4000 5000 6000 7000 8000

1000 2000 3000 4000 5000 6000 7000 8000 1000 2000 3000 4000 5000 6000 7000 8000 1000 2000 3000 4000 5000 6000 7000 8000 BCLAF1 GATA2 CDK1

F A simplified schematic of links between several controllers during EMT

E2F4 INCENP AURKB

SMAD2 MEF2A MAPK3

NFKB1 TP63S455 PERP SHC1 MAPK14 SAA1 STAT3 JUN S36, T29

130 RUNX1 CEBPB CEBPBS237 BCL3 TGFBR1 SP1 PRKCA EGFR RAC1 MYC SPI1 FN1 EBF1 miR-142-5p HIP1 REST ITGB5 PTK2

SMAD3 S47, S51, S460 PI4K2A PRKAA1S508 FOXA1 S126 TAL1 ACP1 AKT1 GATA2 CDK2T14, Y15 E E/M M 1 2 4 hours & Day 1 Transition 1 Days 2 – 5 Transition 2 Days 6 – 12

G Workflow for morphometric analysis of inhibitor screen H Exposed combinatorial vulnerabilities Controllers Selected Inhibitors Experiment Analysis 0.09 ● 0.06 ● ● Inhibitor Target ● ● ● ● EMT inhibitor 1 Known EMT blocker 0.03 ● ● ● ● Sonidegib Hh inhibitor TGFβ/Control ● ● TGFβ LB-100 PP2A inhibitor 0.00 ● ● + ● ● ● Barasertib AURKB inhibitor Inhibitor ● ● EMT inhibitor 1 + + + + + + + + + + + + + + + + + + + + + + + ● Autocamtide CaMK-II inhibitor permutations Sonidegib + + + + + + + + + + + + + + + + + + + + + + + ● ● ● ● ● ● PP1 SRC inhibitor LB-100 ● + + + + + + + + + + + + + + + + + + + + + + + ● ● APCP NT5E inhibitor Barasertib + + + + + + + + + + + + + + + + + + + + + + + ● ● ● Original Segmented Mask Autocamtide + + + + + + + + + + + + + + + + + + + + + + + ● PP1 + + + + + + + + + + + + + + + + + + + + + + + Control Day 3 Measured: Circularity of cells APCP + + + + + + + + + + + + + + + + + + + + + + + bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S1. Related to Figure 1

A C 10 time points Treatment scheme Biological triplicates TGFβ treatment Cont Day 12 Day 8 Day 6 Extracellular vesicles Day 5 Secretome Day 4 Conditioned media Day 3 Cells Day 2 Day 1 aliquots 4 hour Serum Metabolome Plasma membrane mRNA Single-cell RNA remove Harvest Nucleus miRNA sequencing Peptidome Phospho-STY Proteome Acetylome N-Glycosylation B Morphological changes over time points Control TGFβ (4 days) TGFβ (8 days) TGFβ (12 days)

100 µm

D Enrichment performance E Quantitative accuracy across replicates

Acet EV Glyco 100 100 100 100 1.00 1.00 1.00 s Cor s Cor s Cor 75 75 ' 0.75 ' 0.75 ' 0.75 75 75 0.50 0.50 0.50 0.25 0.25 0.25 50 50 50 50 0.00 0.00 0.00 earson earson earson

P 0 25 50 75 100 P 0 25 50 75 100 P 0 25 50 75 100 25 25 25 25 Cumulative% Cumulative% Cumulative%

ercent proteins 0 P 0 0 0 Mem Metabol miRNA no yes no yes no yes no yes 1.00 1.00 1.00 s Cor s Cor s Cor Annotated Annotated Annotated Annotated ' 0.75 ' 0.75 ' 0.75 nuclear region? 0.50 0.50 0.50 exosome? secretome? membrane? 0.25 0.25 0.25 0.00 0.00 0.00 earson earson earson

P 0 25 50 75 100 P 0 25 50 75 100 P 0 25 50 75 100 15 15.5 Cumulative% Cumulative% Cumulative% 14 17 15.0 mRNA Nuc Pep alues

v 13 16 1.00 1.00 1.00

14.5 s Cor s Cor s Cor ' 0.75 ' 0.75 ' 0.75 12

(Log2) 0.50 0.50 0.50 15 11 14.0 0.25 0.25 0.25 0.00 0.00 0.00 Intensity 10 14 earson earson earson P 0 25 50 75 100 P 0 25 50 75 100 P 0 25 50 75 100 Neg Pos Neg Pos Neg Pos Cumulative% Cumulative% Cumulative%

GOGA2 HSPA8 HSP90 ATP1A1 HPRT PCNA Phos WC Sec CLX YWHAZ VDAC2 TUBA1A HDAC1 BCL2 ENO1 TUFM MT-CYB 1.00 1.00 1.00 s Cor s Cor s Cor

' 0.75 ' 0.75 ' 0.75 ENPL PGK1 ITGB1 MT-CO1 0.50 0.50 0.50 CYC CFL1 0.25 0.25 0.25 H12 TSG101 0.00 0.00 0.00 earson earson earson

SUV91 ALB P 0 25 50 75 100 P 0 25 50 75 100 P 0 25 50 75 100 Markers tested COX41 ALDOA Cumulative% Cumulative% Cumulative% TGON2 LDHA F G Expression profiles (scaled) Magnitudes of differential expression WC Nuc Sec mRNA Acet EV Glyco Mem Metabol miRNA

4 EV

0

Pep −4 miRNA

Mem mRNA Nuc Pep Phos Sec WC Phos Log2FC

4

Metabol 0 Acet

−4

Glyco fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 fc_T1 fc_T2 fc_T3 fc_T4 fc_T5 fc_T6 fc_T7 fc_T8 fc_T9 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S2. Related to Figure 2

A PCA analysis Acet EV Glyco Mem 4 Control 10 2 4 hours 2 5 5 1 day 0 2 days 0 0 0 −2 3 days PC2 (8%)

−5 PC2 (5.36%) 4 days PC2 (11.51%) PC2 (11.42%) −2 −4 −5 5 days −10 6 days 0 5 0 0 4 8 10 0 5 10 −5 8 days −4 −10 −5 −10 PC1 (15.95%) PC1 (10.61%) PC1 (19.96%) PC1 (12.93%) 12 days

Metabol miRNA mRNA Nuc 10 20 20 5 10 10 0 0 0 0 −10 −10 PC2 (9.87%) −10 PC2 (11.78%) PC2 (13.79%) PC2 (11.13%) −5 −20 −20 −30 −20 0 0 5 0 0 10 25 10 −5 −40 −20 −15 −10 −50 −25 −20 −10 PC1 (13.26%) PC1 (19.83%) PC1 (25.62%) PC1 (17.92%)

Pep Phos WC Sec 20 20 10 4 10

0 0 0 −10 0 −20 PC2 (9.53%) PC2 (14.19%) PC2 (10.14%) PC2 (11.08%) −4 −20 −10 −40 −30 0 5 0 0 0 25 50 10 20 10 −5 −10 −50 −25 −30 −20 −10 −10 PC1 (16.94%) PC1 (19.55%) PC1 (15.2%) PC1 (18.65%)

B C D Population Metagene variance map Gene-Metagene covariance map

41 Map

1.26 0.99

0 0.18

1

Number of genes in metagene Metagene variance Correlation (genes-metagene)

E SOMs for Cont to day 1 SOMs for days 6-12 SOM for day 2 F EMT hallmarks in SOMs for days 6-12 ACTG1 CALR LTBP1 CALD1 CDH2 FLNA Key 2 Acet 1 Exo

0 Glc Mem −1 mRNA −2 Nuc Pep EPHB2 ALCAM CD44 FSTL1 TAGLN FN1 Phos

Standardized 2 Prot expression (log2) 1 Sec

0

−1

−2 y 1 y 2 y 3 y 4 y 5 y 6 y 8 y 1 y 2 y 3 y 4 y 5 y 6 y 8 y 1 y 2 y 3 y 4 y 5 y 6 y 8 y 1 y 2 y 3 y 4 y 5 y 6 y 8 y 1 y 2 y 3 y 4 y 5 y 6 y 8 y 1 y 2 y 3 y 4 y 5 y 6 y 8 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a y 12 y 12 y 12 y 12 y 12 y 12 a a a a a a D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D Control Control Control Control Control Control 4 hours 4 hours 4 hours 4 hours 4 hours 4 hours bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S3. Related to Figure 3

A B GeneSet enrichment of metabolic pathways Metabolite-metabolite interaction network using active subnetworks of metabolites identified in SOM for 4 hours to day 1

Other glycan degradation Glycosaminoglycan degradation Sphingolipid metabolism Lysine degradation Arginine and proline metabolism Steroid biosynthesis Biosynthesis of unsaturatedf atty acids Phenylalanine metabolism Tyrosine metabolism Glycosphingolipid biosynthesis− ganglio series Glycerophospholipid metabolism Ether lipid metabolism Glycerolipid metabolism Day 2 Day 3 Day 4 Day 5 Day 6 Day 8 Day 12

Enrichment scores 15 10 5 0 −5 −10

C Metabolite-metabolite interaction network D Metabolite-metabolite interaction network of metabolites identified in SOM for day 2 to day 4 of metabolites identified in SOM for day 5 to day 12 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S4. Related to Figure 4

A scRNAseq QC pipeline C Bulk mRNA expression

CDH1 EPCAM Data matrices 26,364 genes • 2125 cells 1 0 Genes not expressed in any cell 16,666 genes • 2125 cells −1 −2 Outlier cells (MAD ≥ 3) 16,666 genes • 1913 cells xpression (log2)

e FAP FN1 ed

z 1 Cells with ≤ 200 genes 16,666 genes • 1913 cells 0 −1 Standardi Genes detected in ≤ 4 cells 14,442 genes • 1913 cells −2

Genes with average UMI count ≤ 0.1 9785 genes • 1913 cells

B Top 25 expressing genes

Mean expression

Distribution of expression across cells

D Differentiation trajectory 5.0 ●● ●●●●●●● ●●● ● ●●● ●● ●●●● ●●●●●● ●●● ●●●●●●●● ● ●●● ●●●●●●● 20● ● ●●● ● ● ●●●●● ●●●●● ●● ● ●●●●● ●● ●●●●●● ●●●●● ●●●●●● ● ● ●●●●●●● ● ● ●● ●● ● ● ● ●●●● ● ●●●●●● ●●●● 17● ●●● ● ●● ●●● ●● ●●● ● ●●●● ● ● ●●●●●●● ●●●●●● ●● ●●● ● ●● ●●●● ●●●● ●● ● ● ●● ● ● ● ● ● ● ●●● ●● ●●●● ● ● ● ● ●● ●●●● ●●●●●● ● ● ● ● ●●●●● ● ● ● ●●●● ● ● ● ● ● ●●● ●● ● ● ●●● ●14●● ●●●● ● ● ●●●● ● ●● ● ●●● ● ●● ●● ● ● ● ● ●● ●● ● ●● ● ● ●●●● ● ●● ●●● ● ● ● ● ● ● ● ●● ●●●● ●●●●●●●● ● ● ● ●●●●●●● ●●●●●●●● ● ●●●● ● ● ● ● ●● 2.5 ● ● ●●●●●●●●●●● ●●●● ●●● ● ● ● ● ●● ●●●●● ●●● ●●●●●● ● ● ● ● ●●● ●● ● ● ● ● ●●● ● ● ●● ● ● ● ● ●●●●●●● ●●● ● ●● ●● ●●● ● ●●●● ● ● ●● ● ●●●● ● ●● ●● ● ●●● ●●● ●● ●●●●●● ●●● ●● ●●● ● ●● ●● ●●●● ●●●●● ●●●● ● ● ●● ●● ●● ●●● ● ●●● ●●●●●● ●● ●●● ●● ● 13● ●●● ●● ● 12 ●●●● ●●●●●● ●●● ●●●●● ●● 19 ●●●●●●●●● ●●●●●● ●●●●●●●●● ● ● ● ●● ● ● ●● ●● ● ●●●●● ● ●● ● ●●●●●●● ● ● ●●●●●● ●● 6 ●● ● ●●●● ●● ●●● ●●●● ● ● ●● ●●●●● ●●●●●● ●●●● ●●●●●●●●● ● ● ● ●● ● ● ● ●●●● ●●● ● ● ● ● ●● ● ● ●● ●●●●●●●●● ● ●● ●● ● ●● ●● ● ●●●● ●●●●●●●●● 9 ●● ●● ●● ● ●●● ● ●●● ●●●● ● ● ●●●●●●●●● ●●●●● ● ●● ●●●●●●● ● ●●● ●●●● ●● ● ●● ●●●●● ● ●●●● 16● ● 18 0.0 ●●● ●● ● ● ●●● ● ●●● ●● ●● ● ●●●● 7● ● ● ●●●● ● ●● ●●● ●●●● ●●● ● ●●● ●● ●●●●● ●●● UMAP 2 ●●●● ●● ●●●● ● ●●●●●●● ●● ●●●● ● ●● ●●●● ●●●●●● ● ● ●● ● ● ●●● ●● ●●●● ●● ● ● ● ●●●● ● ●●● ●● ● ● ● 4 ● ●●● 11● ●●● ●● ● ●●● ● ●●●●●●● ● ● ●●●● ●●●● ●● ● ●● ● ● ● ●● ● ●●● ●●●●●●●●● ● ● ● ●● ●●●●●● ● ●●●●●●●●●●●● ●●● ● ●● ●● ●●●● ● ●●●● ●●●●●●● ● ● ●● ● ●● ● ●● ●●●● ● ●●●●● ●●●●● ● ● ●● ● ● ● ● ● ● ●●● ●●● ● ● ● ●●● ●● ●●●●● ●●●●●●●●●● ●●● ● ● ● ● ● ● ●●●● ● ●●●●● 8●●● ● ●● ● ● ● ●● ●●●●●● ●● ●●●●●● ●● ●●● ● ●●●●●● ● ●● ● ● ● ●●●● ●● ● ●●● ● ●●● ●● ●●10 ●●●●●●● ●●●●●● −2.5 ● ●● ●●●● ● ● ●●●●●●●●●● ●● ● ● ● ●●●●● ●● ● ●●●●● ● ● ●●●●●●● ●● ● ●●●●● ●●● ●●●●●●● ● ● ● ● ● ●●● ● ●● ●● ●●● ●●●●●● ●● ●● ●●● ●●●●● ●● ● ●●●●●●●● ●● ●●●●●●● ●●●● ●● ● ● ●●● ●● ●●● ● ●● ●●●●●●● ●3● ● ● ●●●● ●1● ●● 15 ●●●●●●●●● ●● ●●●● ● ● ●●●● ● ●●●●●●● ● ● 5 ●●●● ●●●●●●● ● ●●● ●●●●●● ●● ● ●●●●●●●●●● ●●●●● ● ●●●●●● ● ●●●● ● ●●●●● ● ●●● ●●● ● ●● ● ●●●●●2 ● ●●●● ● ●●●●●●●● ● ● ● ●●●●●●●● ●●●●●● ● ●●● ●●●●● ●●●●●●●●● ● ●● ● ●●●●● ●●●● ●●● ● ●● ●●● ● ●●●● ●●●●●●

−5 0 5 10 UMAP 1

E Module and pathway enrichment of the partitions GO Biological process Mitotic cell cycle Cell division Microtubule cytoskeleton organization segregation Mitotic nuclear division Organelle localization DNA metabolic process Cellular response to DNA damage stimulus Meiotic cell cycle Positive regulation of transferase activity Gamete generation Ciliary basal body−plasma membrane docking Anaphase−promoting complex− dependent catabolic process Retrograde vesicle−mediated transport, Golgi to ER Antigen processing and presentation ofe xogenous peptide antigen via MHC class II Protein localization to chromosome, centromeric region Nuclear envelope organization Regulation of ubiquitin protein ligase activity

0 30 60 90 Module 12 Module 9 DNA metabolic process Module 4 Chromosome organization Module 1 DNA repair Response tor adiation Module 7 1.5 DNA replication−independentn ucleosome organization 1 Module 3 DNA replication−independentn ucleosome assembly Module 6 Telomere maintenance via semi−conservative replication 0.5 Ciliary basal body−plasma membrane docking 0 Module 2 Deoxyribonucleoside monophosphate biosynthetic process −0.5 Module 10 Module 11 0 50 100 Module 8 Positive regulation of biological process Module 5 Organonitrogen compound metabolic process System development Immune system process Tissue development

tition 3 tition 1 tition 2 Locomotion r r r

a a a Cell proliferation P P P Response to cytokine Cell adhesion Cell migration Leukocyte activation Cytokine−mediated signaling pathway Regulation of endopeptidase activity Extracellular matrix organization Regulation of peptidyl−tyrosine phosphorylation Cornification Negative regulation of viral life cycle 0 100 200 300 Overlap size (%) bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S5. Related to Figure 5

A B Overlap between various molecular layers BioID of SCRIB –/+TGFβ

+SCRIB –SCRIB

Protein extraction

Glyco Sec LC-MS/MS EV Streptavidin Mem Experiment pull down Nuc Control Day 4

Biotinylated interactor Biotinylated non-specific Non-biotinylated protein C L–R crosstalk between cell clusters 17 16 20

14

MMP7 EPHA2

CD63 NCSTN 3 CDH1 LRP1 MMP9 UR SPINT1 A T1 R LR SEMA7APL F3 P SO HSPG2AP1 ST14 PVRL2 4 PSAP EGFR

ITGA1 TIMP1 PVRL1 TFPI

6 EPHB2 ANXA1 LDLR AZGP1 SDC1 F11R IGF1R EFNA1 CELSR1 EFNA5 CAV1 ITGB4 NT5E CD151 ITGA5 GPC1 ERBB2 12 CD44 ITGB6 LAMC2 ITGB5 ITGA2 SERPINE1

ITGA3

FN1 ITGB1 AV APP CALR PKM

TIMP2 ITG CTGF

ITGA6 T4 L

TBP1 COL17A1EFNB1 EFNB2 L

ML

13

D Expression in scRNAseq dataset MMP7 CD44 Clusters 20 ●● 17 ●● 13 ●● 14 ●● 12 ●● 19 ●● 18 ●● 16 ●● 9 ●● 8 ●● 5 ●● 4 ●● 6 ●● 3 ●● bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S6. Related to Figure 6

A B C Total detected Enrichment efficiency Phosphosite localization probability 11,215 Phosphopeptides− 77% Class I sites− 74% Poetntial contaminant Reverse hits Missing values Localization prob < 0.75 Other sites 8741 All other peptides −23% −26%

D Dynamic range E Freq of pSTY per PSM F Regulated proteins G FC values 2000 7.5 100 76812 Dynamic range ~106 80000 Up 1000 5.0 75 56274 60000 2.5 ered 0 v

50 0.0

40000 Log2 FC

No. of proteins 1000

Number of PSMs −2.5

% sites co 25 20000 Down 5712 2000 209 1 −5.0 99.33% 0 0

10 15 20 1 2 3 4 5 Day 1 Day 1 Day 2 Day 2 Day 4 Day 4 Day 6 Day 6 Day 3 Day 3 Day 8 Day 8 Day 5 Day 5 4 hour 4 hour Phosphosite intensity (Log2) Number of pSTY per PSM Day 12 Day 12 (All samples)

H Kinase activity

4 OCK1 R CLK1

3 NEK2 MAPK1 URKB PRKG1 A PRKCH

2 OCK2 R PDK2 URKA PKD1 A PRKCE CDC42BPB AKT2 MAPK13 MAP2K5 MAK PRKCG MAPK14 PA CDK7 CAMK2B MAPK3 NEK4 PIM1 PRKCI GRK5 CDC42B CHUK PRKCD CAMK1 PRKDC AK2 P PDHK4 GRK1 RPS6KB1 PRKCA CSNK1G3 1 PLK1 Mean of KSEA scores across all time points TR MARK4 A MAPK7 CSNK1E GSK3B CSNK2A1 CSNK1A1 TTK CDK2 RPS6KA2 PRKAA2 PLK4 AKT3 IKBKB MAP2K4 RPS6KB2 MAPKAPK3 CDK6 PRKCB TM CSNK1G2 RPS6KA1 A MAP2K6 AK1 RPS6KA3 CLK2 ARAF PRKAA1 P RAF1 MAPK11 AKT1 MAPK9 EIF2AK2 MARK1 CAMK4 CAMK2G MAP2K1 CSNK2A2 CSNK1G1 PRKCT MAPK8 CAMK2A PDK1 CHEK1 HIPK1 CDK4

0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 Ranked kinases bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.446492; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S7. Related to Figure 7

EMT Website

Interface for querying the EMT network