medRxiv preprint doi: https://doi.org/10.1101/2021.06.15.21258890; this version posted June 21, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license .

Rafael Stroggilos1,3, Maria Frantzi2, Jerome Zoidakis1, Emmanouil Mavrogeorgis1÷, Marika Mokou2, Maria G Roubelakis3,4, Harald Mischak2, Antonia Vlahou1*

1. Systems Biology Center, Biomedical Research Foundation of the Academy of Athens, Athens, Greece.

2. Mosaiques Diagnostics GmbH, Hannover, Germany.

3. Laboratory of Biology, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece.

4. Cell and Therapy Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens, Greece.

÷current address: Mosaiques Diagnostics GmbH, Hannover, Germany

* Corresponding author:

Antonia Vlahou, PhD

Biomedical Research Foundation, Academy of Athens

Soranou Efessiou 4,

11527, Athens, Greece

Tel: +30 210 6597506

Fax: +30 210 6597545

E-mail: [email protected]

TITLE

and coexpression alterations marking evolution of bladder

ABSTRACT

Despite advancements in therapeutics, Bladder Cancer (BLCA) constitutes a major clinical burden, with locally advanced and metastatic cases facing poor survival rates. Aiming at expanding our knowledge of BLCA molecular pathophysiology, we integrated 1,508 publicly available, primary, well-characterized BLCA transcriptomes and investigated alterations in gene expression with stage (T0-Ta-T1-T2-T3-T4). We identified 157 genes and several pathways related prominently with the cell cycle, showing a monotonically up- or down-regulated trend with higher disease stage. Genome-wide coexpression across stages further revealed intrinsic and microenvironmental gene rewiring programs that shape BLCA evolution. Novel associations between epigenetic factors (CBX7, ZFP2) and BLCA survival were validated in external data. T0 together with advanced stages were heavily infiltrated with immune cells, but of distinct populations. We found AIF1 to be a novel driver of macrophage-based immunosuppression in T4 tumors. Our results suggest a continuum of alterations with increasing malignancy.

INTRODUCTION

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Bladder Cancer (BLCA) accounts for approximately 200,000 annual deaths worldwide and is considered the most expensive cancer to manage [1]. Advances in imaging technologies and drug discovery have improved patient survival and quality of life [1]. However, early-stage cases [classified as non-muscle invasive (NMI)] still suffer from high rates of disease recurrence, whereas advanced-stage [classified as muscle invasive (MI)] and metastatic cases face poor outcomes [2].

Advances in state-of-the-art molecular profiling technologies have enabled deeper investigations of BLCA, expanding our current understanding of its molecular pathology. According to the dual track model of bladder carcinogenesis [3], papillary-NMI and MI disease develop from different sets of molecular alterations. Studies performing mutational profiling suggest that low grade Ta tumors arise from activating mutations in either FGFR3 or HRAS, which typically result in over-activation of the downstream Akt/PIK3CA/mTOR and RTK/MAPK growth pathways [3]. In contrast, MIBC tumors are thought to develop from dysplastic Tis lesions with non-functional (mutated) TP53 or RB1 tumor suppressive pathways [3]. However, it remains unclear how these mutational signatures translate to different gene expression routes. Moreover, the dual track model cannot adequately explain the molecular events driving transformation of a papillary-NMI tumor to MI, nor, in the case of MIBC, the alterations happening before the dissemination to detrusor muscle.

In an effort to better understand molecular pathogenesis, various BLCA molecular subtypes have been described [4-15]. Data supporting a continuum of alterations that likely drive bladder carcinogenesis came from MI patients having tumors with a mosaic of both intraepithelial (Tis) and papillary growth patterns, and from MI cancers having traits of papillary-NMI related mutations. Approximately 22% of MI tumors present with activating mutations in PIK3CA and homozygous deletion of CDKN2A, respectively [9], while CDKN2A deletion in MIBC has been observed to occur more frequently in FGFR3 mutated than wild type tumors [16]. Interestingly, comparative mutational analysis between low grade NMI, high grade NMI and MIBC revealed smooth increments or declines in the frequency of mutations in driver genes (FGFR3, KDM6A, TP53, CDKN2A) with increasing malignancy [17]. Additionally, multi-omics analysis of NMIBC identified dysfunctional TP53 and RB1 pathways in about 25% of both Ta and T1 tumors [15].

The clinical distinction between NMI and MI disease and the current understanding of their molecular determinants cannot adequately describe the events driving tumor evolution. On the other hand, molecular subtypes may have important utility in diagnosis, prognosis, and decision making, but unfortunately they represent static entities and their dynamics can only be studied between baseline diagnosis and future recurrences/progressions. In contrast, stage, as an ordinal variable reflecting tumor size and depth of invasion, offers a better opportunity for studying the progressive alterations marking tumor initiation, growth, dissemination to detrusor muscle and metastasis.

Gene expression studies comparing the stage profiles of BLCA typically involve small sample sizes and are often limited to comparing NMI and MI. To obtain insight into the trajectories of cancer evolution in BLCA, we collected publicly available transcriptomes from bulk tissue samples and performed a comprehensive stage analysis of 1,508 subjects with primary BLCA, including a novel integrated pathway-to-network analysis. As the disease progresses gradually to higher stages, our results indicate tumor dependencies on concerted alterations of gene expression, most prominently those involved in cell cycle regulation.

METHODS

Dataset mining

A comprehensive data mining strategy was employed to retrieve studies applying -omics technologies in BLCA. The overall workflow is summarized in Figure 1. All genomic urothelial cancer data from cBioportal (including The Cancer Genome Atlas) were downloaded (5/1/2020). Gene Expression Omnibus (GEO) was queried for transcriptomics, additional genomics or array datasets using the search terms “bladder cancer” and “urothelial carcinoma”. We also queried ArrayExpress using the special filter “Array express data only” to obtain any additional datasets missing from GEO. All cohort data published or updated between 2010 and 2019, annotated as Homo sapiens, coming from tissue samples with sample size >10, were initially retrieved (25/1/2020). All datasets used were published and downloaded in anonymized form.

Integration of the transcriptomes

Microarray data were summarized to gene level with the package oligo [18]. Affymetrix data were normalized with the RMA method, while Illumina data were filtered based on detection p value < 0.05, followed by quantile normalization, addition of 1, and transformation to the natural logarithmic scale. Data were annotated using biomaRt (v2.42.1) [19] and microarray probes matching to multiple genes were excluded from downstream analysis. The probe with the highest mean across arrays was selected as representative in cases where multiple probes matched to the same gene. Merging of expression matrices was based on the HUGO gene symbol, using the intersection of genes between studies.
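The Illumina preprocessing steps (quantile normalization, addition of 1, natural log) and the choice of one representative probe per gene can be illustrated with a short sketch in plain Python. This is not the authors' R pipeline: the function names, probe identifiers and toy matrix are hypothetical, ties within a sample are broken by row order rather than averaged, and the detection p-value filter and RMA are left out.

```python
import math
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Simplification: ties within a sample are broken by row order.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[i][j] = reference[rank]
    return out


def log_plus_one(matrix):
    """Add 1 and move to the natural logarithmic scale."""
    return [[math.log(v + 1.0) for v in row] for row in matrix]


def representative_probes(expr_by_probe, probe_to_gene):
    """For each gene, keep the probe with the highest mean across arrays."""
    best = {}
    for probe, values in expr_by_probe.items():
        gene = probe_to_gene[probe]
        avg = mean(values)
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _) in best.items()}


# Toy data: 4 probes (rows) x 3 samples (columns); all values are made up.
raw = [[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 5.0, 6.0], [4.0, 2.0, 8.0]]
normalized = log_plus_one(quantile_normalize(raw))
chosen = representative_probes(
    {"p1": normalized[0], "p2": normalized[1], "p3": normalized[2]},
    {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"},
)
print(chosen)
```

After quantile normalization every sample shares the same set of values, which is what makes arrays from one platform comparable before the per-gene probe selection.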

Adjustment for batch effects

To correct for batch effects across different studies in the discovery data, ComBat [20], removeBatchEffect [21], and naiveRandRuv [22] were evaluated. ComBat performed best and was chosen for further use (manuscript in preparation). The quality of corrected data was assessed with BatchQC [23] (Supplementary Figures 1-2), with boxplots of expression distribution per sample and a principal component analysis plot of sample relationships, with gene expression comparisons of housekeeping to other genes, and with a set of 12 BLCA markers with known regulation across normal-NMI-MI or across normal-low grade-high grade disease (Supplementary Figure 3).

Differential expression, monotonicity and functional annotation

To identify genes that form a continuum of changes across BLCA stages, each disease stage was compared against normal samples. Genes that were significantly affected (Mann-Whitney p < 0.05) and had the same orientation of change in all comparisons were extracted (n = 3,108 genes). We refer to this set as Concordantly Differentially Expressed Genes (CDEGs). Monotonicity for a gene was defined as being a CDEG and additionally having a continuously larger/smaller fold change with increasing stage (fold change as defined in each disease stage versus normal control). Functional annotation and enrichment were performed using PubMed and the online tool GeneCards (https://ga.genecards.org/), respectively.
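The CDEG and monotonicity definitions can be expressed as two small predicates. The sketch below is only an illustration under stated assumptions: fold changes are taken to be signed (for example a log ratio or a difference of means versus T0), "continuously larger/smaller" is read as strictly increasing or decreasing along Ta to T4, and the function names and example values are hypothetical.

```python
STAGES = ["Ta", "T1", "T2", "T3", "T4"]  # each stage is compared against T0


def is_cdeg(p_values, fold_changes, alpha=0.05):
    """Significant in every stage-vs-T0 comparison, always in the same direction."""
    all_significant = all(p_values[s] < alpha for s in STAGES)
    all_up = all(fold_changes[s] > 0 for s in STAGES)
    all_down = all(fold_changes[s] < 0 for s in STAGES)
    return all_significant and (all_up or all_down)


def monotonic_direction(p_values, fold_changes, alpha=0.05):
    """Return 'up' or 'down' for a monotonically changing CDEG, otherwise None."""
    if not is_cdeg(p_values, fold_changes, alpha):
        return None
    series = [fold_changes[s] for s in STAGES]
    steps = list(zip(series, series[1:]))
    if series[0] > 0 and all(a < b for a, b in steps):
        return "up"
    if series[0] < 0 and all(a > b for a, b in steps):
        return "down"
    return None


def by_stage(values):
    """Helper: attach stage labels to a list of five values."""
    return dict(zip(STAGES, values))


# Made-up example values, for illustration only.
p_ok = by_stage([0.01, 0.01, 0.01, 0.01, 0.01])
print(monotonic_direction(p_ok, by_stage([0.5, 0.8, 1.1, 1.5, 2.0])))       # up
print(monotonic_direction(p_ok, by_stage([-0.2, -0.4, -0.9, -1.3, -1.8])))  # down
print(monotonic_direction(p_ok, by_stage([0.5, 0.4, 0.9, 1.0, 1.2])))       # None
```

The third example is still a CDEG (significant and consistently upregulated) but fails the monotonicity condition because the fold change dips between Ta and T1.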

Monotonicity in pathway activation and analysis for gene coexpression

We utilized CDEGs to infer pathway activation scores and to create stage coexpression networks. Pathway activation scores per sample were calculated with the ssGSEA-GSVA method [24], using the Molecular Signatures Database libraries of Hallmark, Canonical Pathways (Reactome subset), C5 (GO biological processes subset) and C3 (GTRD subset of transcription factor targets). Dorothea (https://github.com/saezlab/dorothea) [25] was utilized to assess regulon activity. Activation scores across stages were compared with Mann-Whitney tests, and direction of change was defined based on fold change (= mean of stage – mean of T0). Pathways showing a monotonic change in the activation scores with increasing stage were extracted. Monotonicity for a pathway was defined similarly to the previous section, i.e. being significantly different in all stage comparisons to T0 and also having a continuously larger/smaller fold change with increasing stage. Stromal infiltration scores were imputed with the ESTIMATE algorithm [26].

Gene-pair coexpression weights (non-negative) were approximated with ensemble learning, using GENIE3 [27], and direction of coexpression (positive/negative) was determined by Spearman’s coefficient. Stage-specific networks were constructed with igraph using the top 5,600 positively correlated gene pair interactions with the highest weights for each of the stages. Networks were analysed with Louvain clustering [28] to identify local modules of significantly higher coexpression relationships (communities) within each stage. The top 5 communities of coexpressed genes per stage, ranked by size (= number of genes), were analysed for Gene Ontology Biological Processes [29]. Significance was thresholded at p<0.05 after FDR adjustment according to Benjamini-Hochberg. Hub genes in the monotonic networks were defined based on the betweenness centrality metric.
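Two steps of this network construction lend themselves to a compact sketch: keeping the top-weighted, positively correlated gene pairs, and ranking genes by betweenness centrality. The example below is a simplified stand-in written in plain Python, not the igraph-based analysis itself. It assumes precomputed pair weights and Spearman coefficients (here toy values for hypothetical genes A to D), treats edges as undirected and unweighted when computing betweenness (Brandes' algorithm), and omits Louvain clustering entirely.

```python
from collections import deque


def top_positive_edges(weights, correlations, k):
    """Keep the k highest-weight gene pairs whose correlation is positive."""
    positive = [(w, pair) for pair, w in weights.items() if correlations[pair] > 0]
    positive.sort(reverse=True)
    return [pair for _, pair in positive[:k]]


def betweenness(edges):
    """Unnormalized betweenness centrality of an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        visited_order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visited_order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adj}
        while visited_order:
            w = visited_order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected shortest path was counted from both of its endpoints.
    return {v: s / 2.0 for v, s in score.items()}


# Toy inputs; gene names, weights and correlations are made up.
weights = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7, ("C", "D"): 0.95}
correlations = {("A", "B"): 0.6, ("B", "C"): 0.5, ("A", "C"): 0.4, ("C", "D"): -0.7}

edges = top_positive_edges(weights, correlations, k=2)
scores = betweenness(edges)
hub = max(scores, key=scores.get)
print(edges, scores, hub)
```

In the toy network the pair C-D has the highest weight but is dropped because its correlation is negative, and gene B becomes the hub because it lies on the only path between A and C.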

Statistical analysis

For continuous variables, significance was defined at Mann-Whitney p < 0.05, unless stated otherwise. Categorical variables were investigated for significance with the Chi-squared test and were adjusted for multiple hypothesis testing (CRAN package RVAideMemoire). All statistical tests were two sided. Gene Ontology and Reactome pathway enrichment analysis was conducted with functions from the package ClusterProfiler, which performs a two-sided hypergeometric test. All reported correlation scores correspond to Spearman’s rank coefficient. Kaplan-Meier analysis and Cox proportional hazards regression were performed with the CRAN packages survminer and survival, and statistical significance was determined with the log-rank method. CIBERSORT analysis was conducted in the web platform https://cibersort.stanford.edu/, and only samples with successful deconvolution (p < 0.05, n = 350 samples) were further used for the comparisons of relative immune populations among stages. All processing, analyses and visualizations were performed in the R programming language (version 4.0.2), using predominantly Bioconductor libraries.
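The Benjamini-Hochberg adjustment named above for the enrichment analyses can be written in a few lines. The sketch below is a generic textbook version in plain Python with made-up p-values. It is not taken from the packages cited here, and the text does not state which adjustment method was applied to the categorical tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical enrichment terms.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.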

Technology  Platform  Dataset ID    T0   Ta   T1   T2   T3   T4   Tis  Study size
Affymetrix  GPL17586  GSE121711     10   3    2    3    -    -    -    18
Affymetrix  GPL17586  GSE93527      -    79   -    -    -    -    -    79
Affymetrix  GPL570    E-MTAB-1940   4    41   41   -    -    -    -    86
Affymetrix  GPL570    GSE31684      -    5    10   16   41   18   -    90
Affymetrix  GPL6244   GSE104922     -    19   7    10   5    0    -    41
Affymetrix  GPL6244   GSE128959     -    13   39   -    -    -    1    53
Affymetrix  GPL6244   GSE83586      -    13   44   241  1    1    1    301
Illumina    GPL14951  GSE48276      -    -    2    6    23   6    -    37
Illumina    GPL14951  GSE52219      -    -    -    15   7    1    -    23
Illumina    GPL14951  GSE69795      -    -    3    7    27   -    -    37
Illumina    GPL6102   GSE13507      67   23   80   31   19   12   -    232
Illumina    GPL6947   GSE48075      -    33   34   42   23   8    2    142
Stage size                          81   229  262  371  146  46   4    1139

Table 1: Stage distribution across the microarray datasets used for the discovery phase.

RESULTS

Cohort description and data quality controls

To perform a comprehensive investigation of publicly available omics studies, we collected 105 datasets comprising more than 8,000 individuals (Figure 1). Minimum inclusion criteria for datasets were defined as having at least stage information per subject. To maintain high integrity, we selected microarray data quantified on single-color channel platforms from commercial vendors (Affymetrix and Illumina). Samples or datasets collected after administration of neoadjuvant chemotherapy (NAC) or not clearly annotated as primary were excluded, generating a final dataset corresponding to 1,508 patients. This included a total of 12 microarray studies, which were employed as a discovery set (Table 1); in addition, the TCGA 2017 BLCA project, based on RNA-seq analysis, was used as a validation set (Figure 1).

The compiled discovery cohort comprised 1,058 primary bladder cancer tumor transcriptomes of treatment-naïve patients without prior cancer history, along with 81 samples of bladder tissue adjacent to the cancer, for a total of 1,139 gene expression profiles (Table 1). The ratio of men:women was 3.5:1, with equal distribution among NMI and MI disease (p = 0.99) and similar mean ages at baseline diagnosis (68 years, p = 0.81). Percentages of normal samples, NMI, and MI in the dataset were 7.1%, 43.5%, and 49.4%, respectively (Table 1). The grade distribution was 16.5% low grade and 48.8% high grade disease, with the remaining samples lacking grade information. Detailed histological records were missing for 71.5% of the cohort; among the available records, the most frequently reported histology was urothelial/papillary (23.3%), and squamous differentiation was the most frequent variant (1.3%). Annotations included mutation status for FGFR3, PIK3CA, RAS, RB1 and TP53.
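The reported group percentages can be checked against the stage totals of Table 1 with a few lines of arithmetic. The only assumption in this sketch is that Tis samples are counted as NMI together with Ta and T1; with that grouping, the totals reproduce the figures quoted above.

```python
# Stage totals from the "Stage size" row of Table 1.
stage_sizes = {"T0": 81, "Ta": 229, "T1": 262, "T2": 371, "T3": 146, "T4": 46, "Tis": 4}

total = sum(stage_sizes.values())
groups = {
    "normal": stage_sizes["T0"],
    # Assumption: Tis is grouped with the non-muscle invasive stages.
    "NMI": stage_sizes["Ta"] + stage_sizes["T1"] + stage_sizes["Tis"],
    "MI": stage_sizes["T2"] + stage_sizes["T3"] + stage_sizes["T4"],
}
percentages = {name: round(100 * n / total, 1) for name, n in groups.items()}
tumors = groups["NMI"] + groups["MI"]

print(total, tumors, percentages)
```

The same totals also recover the 1,058 tumor transcriptomes and the 1,139 profiles overall.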

Removal of batch effects with ComBat was assessed with Relative Log Expression and Principal Component Analysis plots (Figure 2A), with comparative analysis of expression levels between housekeeping and non-housekeeping genes (Figure 2B), with the algorithm BatchQC (Supplementary Figures 1 and 2) [23], and with a set of 12 positive BLCA markers with known regulation across normal-NMI-MI or normal-low-high grade disease (Supplementary Figure 3). ComBat successfully eliminated batch effects while maintaining the biological variability (Figures 2A-C). BatchQC reports indicated that the variability in the corrected data was generally explained by stage rather than the batch variable (Supplementary Figure 2), allowing for an in-depth downstream analysis.


Figure 1: Study design and workflow for the integrative analysis of primary BLCA transcriptomes.

Monotonically changing genes in BLCA regulate growth and cell-cycle progression

For initial assessment of the gene expression relations between normal adjacent urothelium (referred to as T0) and cancerous samples, we performed differential expression analysis of T0 versus NMI and T0 versus MI cancers. A strong positive correlation (r=0.83, p < 2.1x10^-16) was observed between the two sets of log fold change values relative to T0, suggesting mutual concordance in abundance change and directionality between NMI and MI tumors. To investigate transcriptional changes associated with increasing malignancy, CDEGs were extracted from the comparisons Ta vs T0, T1 vs T0, T2 vs T0, T3 vs T0 and T4 vs T0. A total of 157 genes were identified as having a monotonic (i.e. continuously increasing or decreasing in comparison to T0) change in expression across stages, of which 118 were up- and 39 were downregulated with increasing stage (Table 2). Functional annotation revealed that for 46 of these genes, experimental evidence on mediating cell cycle progression exists (Table 2; Supplementary Table 1). Upregulated cell-cycle associated genes (n = 44) were not phase specific and included cyclins, DNA polymerases, regulators of the cohesin complex and kinetochore components. The two downregulated cell-cycle associated genes, BTG2 and CDC14B, are both tumor suppressors. The list included 23 genes involved in signal transduction (Table 2), 6 of which (ARHGAP11A, AURKA, CDKN3, PBK, PLK1, RRM2) promote cell-cycle progression and were all upregulated with increased stage. The data also indicated an overactivation of the Wnt pathway with increasing disease stage, with its upstream inhibitor APCDD1 being downregulated and its activating ligand WNT2 upregulated (Supplementary Table 1). Fourteen of the 157 genes were transcriptional or translational regulators (Table 2), including genes with known upregulation in bladder cancer (transcription factors E2F1, DEPDC1 [30, 31]). Based on the monotonic changes with increasing stage, increased androgen receptor activity may be predicted, as both its translational enhancer BUD31 [32] and its downstream transcription factor ELK1 [33] were upregulated. Four (HTR2C, LRP8, NENF, NMU) of the 157 genes are involved in neurotransmission or neuronal development, all upregulated (Table 2). Among the 157 genes, 21 were of poorly described or unknown function (Supplementary Table 1), including the oncogenic factor TRIM65 [34], found upregulated with bladder cancer stage. Additionally, upregulation was detected for the cisplatin resistance gene CLPTM1L, as well as for SRD5A1, which catalyzes the conversion of testosterone and progesterone to their 5-alpha forms [35]. Further functional enrichment using GeneCards for the 157 genes verified their involvement in cell-cycle pathways, with top hits being related to the regulation of the Anaphase-promoting complex (APC) (score = 31.53), to PLK1 (score = 24.47) and Aurora B (score = 20.95) signaling, as well as to TP53 (score = 19.06) and RB1 (score = 17.85) cell cycle checkpoint control (Supplementary Table 2). Using the same tool, analysis for compounds with potentially therapeutic benefit provided as top hit the DNA alkylating agent Bendamustine (score = 19.23) (Supplementary Table 3) [36]. Univariate Cox regression analysis indicated 29 genes with potentially prognostic impact at p < 0.01 (Supplementary Table 4).
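The concordance check at the start of this section, a rank correlation between the NMI-vs-T0 and MI-vs-T0 log fold changes, can be illustrated with a plain-Python Spearman coefficient. This is a generic sketch using average ranks for ties; the function names and the five example fold-change values are made up and are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical log fold changes versus T0 for five genes.
lfc_nmi = [0.5, 1.2, -0.3, 2.0, -1.1]
lfc_mi = [0.4, 1.5, -0.1, 1.8, -0.9]
rho = spearman(lfc_nmi, lfc_mi)
print(rho)
```

A coefficient close to 1 indicates that genes are ordered similarly by their change in NMI and in MI tumors, which is the sense in which the two sets of fold changes are described as concordant.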

Figure 2: ComBat corrected expression data of the discovery cohort and respective differential gene expression analysis. A) Relative Log Expression (right) and Principal Component Analysis plots (left) showing gene expression levels per sample and sample relationships, respectively; the different colors denote the different datasets; B) dispersion-to-expression plots illustrating preservation of housekeeping gene properties (higher mean expression and lower median absolute deviation (MAD) compared to non-housekeeping genes) in the corrected data, regardless of microarray (top) or RNAseq (bottom) defined housekeeping genes; C) numbers of differentially expressed genes (DEGs) between disease stages and normal adjacent urothelium (top), along with respective group sizes (bottom); E) heatmap of the top (based on fold change) 20 downregulated and top 20 upregulated monotonically changing genes with increasing disease stage.

Function: Genes

Cell Cycle: ANLN, ARHGAP11A, AURKA, CDC20, CDCA3, CDCA5, CDCA8, CDK2AP2, CDKN3, CENPA, CENPN, CEP72, KIF23, KIFC1, NEIL3, OIP5, PRC1, PTTG1, RAE1, RUVBL2, SAC3D1, SMC4, FBXO5, CCNA2, CCNB1, CCDC124, CEP55, MKI67, MND1, GINS2, MCM10, MCM4, KIF11, KIF14, NDC80, POLD1, POLE2, PBK, PLK1, RRM2, E2F1, DAXX, E2F7, DTL, BTG2, CDC14B

Signal transduction: ARHGAP11A, AURKA, CDKN3, HTR2C, ITPR3, LRP8, MELK, NENF, NMU, NOX4, PBK, PLK1, PPP1R14C, RRM2, WNT2, AKAP7, APCDD1, IGFBP2, KIT, MTMR9, PPP1R1B, PTCH1, TAPT1

Transcriptional regulation: DAXX, DEPDC1, E2F1, E2F7, ELK1, FOXA3, HOXC6, HOXC9, MED19, TEAD4, AEBP2, CBX7

DNA damage response: CDCA5, EXO1, NDC80, NTHL1, POLD1, POLE2, RAD51AP1, UBE2T

mediated: DTL, FBXW2, PSMB3, PSMC1, TRAF2, DET1

Immunity: ISG15, SPP1, GBP6, IFNAR1, PIGR

Metabolism: CAD, DAGLA, ENO1, IMPDH1, NUDT1, SRD5A1, ADHFE1, ALDH7A1, ASS1, CAT, CHST9, CYP4V2, MGST1

Other roles: ARF5, ARPC5L, BUD31, CLPTM1L, EIF4EBP1, FAM91A1, GP6, PACSIN3, XPO5, CIRBP, GALNT12, RSL1D1, RWDD3, SLC25A27, UST

Homeostasis: ARHGDIA, BOP1, CTSA, GTPBP4, LMNB2, LYAR, MRPL37, NIP7, NXT1, POLR2D, POP7, RBM28, SRPRB, CTSO

Uncertain/Unknown: FAM50A, IGFL2, KIAA2013, MLF2, PAQR4, RECQL4, RP9, SBSN, SLC22A15, SNX8, TRIM65, YIF1A, FYCO1, GARNL3, ICA1L, ITM2C, LONRF1, NHS, NNAT, ZFP2, ZNF181

Table 2: Classification of the 157 monotonically changing genes with increasing BLCA stage to functional categories. Red and blue colors correspond to upregulation and downregulation, respectively. More details per gene are provided in Supplementary Table 1.

The expanded dataset of 3,108 CDEGs links cell-cycle, ECM remodelling and metabolic alterations with increasing disease stage

To increase coverage of the BLCA alterations, we further extracted Concordantly Differentially Expressed Genes (defined in Methods; CDEGs, n = 3,108) from the comparisons of cancerous stages to normal adjacent urothelium and investigated the differentially activated pathways they represent, using ssGSEA. Similar to the analysis of monotonicity, we sought to identify pathways with monotonically up- or down-regulated activation scores across T0 and BLCA stages. Results indicated gradually stronger activations of several mitotic processes, positive regulation of the canonical Wnt pathway, mTORC1 signaling, expression of MYC targets, degradation of anaphase inhibitors, metabolism of nucleotides, mobility of formins, and the TNFR2/non-canonical NF-kB pathway (Figure 3, Supplementary Table 5). Conversely, consistently reduced expression versus controls was observed in genes involved in lipid and fatty acid catabolic processes, in the metabolism of heme, in pathways promoting muscle cell differentiation and, interestingly, in genes regulating the circadian clock (Figure 3). Regulon activity per sample was estimated and respective scores between cancerous stages and normal tissue were compared. GATA3 and GLI2 were the only regulons significantly associated with increasing malignancy, with their activity being downregulated (Figure 3).

Figure 3: Excerpt of the pathways showing a monotonic increase or decline in their activation scores with higher stage.

To further investigate genome-wide alterations and coexpression patterns associated with cancer progression, an integrated network-pathway analysis of the different disease stages was performed, using the 3,108 CDEGs. Out of 9,659,664 weighted gene pair interactions, the top 5,600 were used as the most representative to build the corresponding networks, applying Louvain clustering to identify topologically connected communities (modules). Retrieved gene clusters have a high degree of coexpression, and thus changes in their composition across stages reflect gains/losses in pathway rewiring. We used the Gene Ontology – Biological Process library to identify molecular processes affected by changes in coexpression inside the 5 largest communities (based on number of genes) (Figure 4, Supplementary Tables 6-11). The analysis revealed large differences in gene coexpression between T0 and cancerous samples, with 4 out of the 5 largest communities associating clearly with specific biological processes.

Three out of the top 5 communities were consistently detected in all BLCA stage networks. Based on examination of their enriched processes, these were labeled as 1) the cell-cycle community, 2) the ECM and developmental community, and 3) the metabolic and translational community (Figures 4A, 4B).

When investigating the cell cycle community and specifically the processes of G1/S transition of mitotic cell cycle, cell-cycle G2/M phase transition, regulation of cell cycle phase transition, regulation of organization and nuclear division, most of the participating genes were found to be coexpressed in all BLCA stages (Figure 4B). However, the size (number of genes) of the cell cycle community increased with higher stage (Ta n = 118, T1 n = 148 and MIBC n = 168-170), and interestingly, the proportion of the genes with unknown or unclear function (lacking GO annotation) also increased (11.9% for Ta, 29.8% for T4). Among the genes with cell cycle annotation (n=184), 80 genes were coexpressed in all cancerous stages and were also upregulated in malignancy compared to T0, possibly forming the backbone of cell cycle progression in BLCA (Supplementary Table 13). However, only 36 of them had a monotonic increase in expression levels with higher stage (Supplementary Table 13). To detect potential drivers of variation in the cell cycle co-expression profiles of stages, we looked into the betweenness centrality scores of the genes inside each of the stage cell cycle communities (Figure 4C), and found CDCA5 and KIF2C to be hub genes in Ta tumors, FOXM1 and AURKB in T1, CDT1 and SMC4 in T2, CCNB1 and RRM2 in T3 and KIF14 in T4 tumors (Figure 4C).

The community of ECM and developmental processes was enriched in cell-cell communication and cell-matrix interactions, in responses to microenvironmental stress, as well as in differentiation programs of epithelial, mesenchymal and stem cells (Figure 5B). The biological process of extracellular matrix organization included coexpressions of 15-36 genes, of which COL13A1, FGFR4, FOXF2 and SCUBE1 were coexpressed only in T0 compared to disease stages, whereas 26 genes (including COL6A1/A2, COL16A1, MFAP5, MMP11) were coexpressed in malignancy but not in T0. In line with recent observations [37], we noticed that T0 presented with an active ECM remodeling profile. Sixteen of the ECM associated genes were coexpressed both in T0 and in the NMIBC stages, including genes promoting ECM degradation and tumor cell migration (MMP2, CTSK, PDPN), fibrotic collagens (COL1A2, COL6A3, COL14A1, COL15A1), and pro-angiogenic factors (PDGFRA, RECK). Notably, coexpression in the T0 samples was predicted to be driven by the cancer stem cell marker ALDH1A2 and by MFAP4 (Figure 4C), while in Ta tumors by COL16A1 and CLIP3. Since CLIP3 interacts with both AKT1 and AKT2 [38], CLIP3 may have an important role in the early AKT/PI3K/mTOR axis of carcinogenesis. For the same process of extracellular matrix organization, compared to Ta, T1 tumors had gains in ADAMTSL2, while compared to T1, T2 tumors had gains in MMP11 and LRP1. Similarly, compared to T2, T3 tumors had coexpression gains in ITGA10 and in genes of the endocytic pathway (ABL1, CYP1B1), whereas compared to T3, T4 tumors had gains in DDR2 and JAM2.

The community of metabolism and translation encompassed mitochondrial, translational and multiple metabolic processes being activated during carcinogenesis, and was more pronounced in T1 and more advanced tumors. The processes of cellular respiration, translational initiation, mRNA catabolic process, nonsense mediated decay and protein targeting to ER were consistently enriched in most BLCA stages. The results highlighted a set of 12 genes commonly co-expressed across stages for these processes, namely COX7B, DLD, NDUFS4, UQCRFS1, PAIP2, RPL15, RPL30, RPL7, RPS23, RPS27, RPS27A and RPS4X, with RPL36A and YTHDF3 being T2-specific, COQ10B, EIF4E3, MTIF2 and NCPB1 being T3-specific, and EIF4E1, ISCU and GSPT2 being T4-specific.

Besides the abovementioned consistently detected communities in BLCA, a community enriched in processes of immune cell activation and cytokine secretion was identified in the T0 and the MIBC stages and involved both innate and adaptive responses, as well as processes of immune cell adhesion and migration. It was the least variable community with respect to the processes and respective genes coexpressed across the different stages. Strikingly, T0 and the MIBC stages had similar profiles. Out of the 17 genes of the process of T-cell activation that were commonly coexpressed at the MIBC stages, 15 were also coexpressed in the T0 samples. To further investigate these observations, the transcriptome data per sample were deconvoluted into relative abundances of immune cell populations using CIBERSORT, and cell fractions were compared between disease stages. Significant results were obtained for the following populations: CD8+, activated CD4+, activated NK, Monocytes, Macrophages M2 and activated Dendritic cells (Supplementary Figure 4). Results indicated differential commitment of immune cells to T0 and BLCA stages. T0 samples (n = 37) were significantly more infiltrated with CD8+ cells (p = 0.046) and monocytes (p = 4.6E-4) than tumor samples (n = 313), verifying the presence of the immune community in the T0 samples. However, compared to tumor, T0 samples had significantly lower abundance of activated CD4+ cells (p = 5.28E-3), of macrophages (p = 16.8E-5), of activated dendritic cells (p = 0.0002) and of activated natural killer cells (p = 0.015). Generally, NMIBC had lower infiltration burden than MIBC, while compared to other BLCA stages, Ta tumors (n = 34) were significantly more infiltrated with activated dendritic cells (p = 0.024). Abundance of CD8+ cells, of activated NK cells, and of M2 macrophages increased linearly with higher malignancy. Interestingly, AIF1, a gene that promotes macrophage survival [39], was found to be a driver of immune coexpression in the T4 tumors.

Figure 4: Biological process analysis of the largest in size coexpressed communities identified in each BLCA stage, using the 3,108 CDEGs between T0 and cancerous stages. A) Coherent communities identified and characterized across T0 and disease stages. Numbers in parentheses show the fraction of genes with existing Biological Process annotation with respect to the total number of genes found to be coexpressed in each community. B) Barplots of the most significantly enriched biological processes per community, depicting the number of coexpressed genes for each. C) Hub genes identified across the studied conditions based on the betweenness centrality scores (y axis).
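
Panel C ranks hub genes by betweenness centrality. As a minimal sketch of how such a score can be computed, assuming an unweighted, undirected coexpression graph (the networks of this study may be weighted or directed), Brandes' algorithm can be written as follows. The labels HUB and G1 to G4 are hypothetical placeholders, not genes from this study.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency: dict mapping each node to a set of neighbouring nodes.
    Returns unnormalised scores (each unordered node pair counted once).
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes inward.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every undirected pair was visited from both of its endpoints.
    return {node: score / 2.0 for node, score in scores.items()}


# Hypothetical toy network: HUB links G1, G2 and G3; G4 hangs off G3.
toy_network = {
    "HUB": {"G1", "G2", "G3"},
    "G1": {"HUB"},
    "G2": {"HUB"},
    "G3": {"HUB", "G4"},
    "G4": {"G3"},
}
toy_scores = betweenness_centrality(toy_network)
```

In this toy network HUB lies on the shortest paths of five node pairs and G3 on three, so HUB would be reported as the top hub.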

RNA-seq independent analysis validates presence of communities and prognostic genes

In the absence of another dataset covering the full spectrum of disease stages of primary BLCA, the alterations observed in the discovery set were investigated for their reproducibility in the TCGA-BLCA publication [9]. Although the validation set is not perfectly suited, since it includes only stages T2-T3-T4, we used it because its RNA-seq technology lends good reliability to the validated findings. Differential expression analysis among the available stage comparisons (T3 vs T2 and T4 vs T3) in the TCGA data validated 74 of the 157 monotonic genes in the comparison T3 vs T2, and none in the comparison T4 vs T3 (Supplementary Table 13). Cox regression analysis of the 157 genes in the TCGA data validated 12 genes as having prognostic value (Supplementary Table 4), including IMPDH1, MRPL37, MED19, ENO1, ANLN, GTPBP4 and MLF2, higher levels of which were associated with worse survival, and CBX7, ZFP2, AKAP7, ICA1L and CDC14B, higher levels of which were associated with better survival probability. To validate the coexpression analysis findings as far as possible, stage-specific coexpression networks were also created using the TCGA data and were clustered with the Louvain algorithm. GO Biological Process analysis of the communities validated the differential segregation of the cell-cycle, extracellular matrix and immune activation processes to distinct communities (Supplementary Figure 5).
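
The Louvain algorithm used for clustering searches for a partition of the network that maximizes modularity. The sketch below only evaluates that objective for a given partition (it does not perform the Louvain search itself) and assumes an unweighted, undirected network; the node labels and community names are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    edges: iterable of (node_a, node_b) pairs, each edge listed once.
    community_of: dict mapping every node to a community label.
    Q = sum over communities of [ L_c / m - (d_c / 2m)^2 ], where L_c is the
    number of edges inside community c, d_c the summed degree of its nodes
    and m the total number of edges.
    """
    edges = list(edges)
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal_edges[ca] = internal_edges.get(ca, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_communities = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
one_community = {node: "all" for node in "abcdef"}

q_split = modularity(toy_edges, two_communities)
q_merged = modularity(toy_edges, one_community)
```

Splitting the toy network into its two triangles gives a higher modularity than keeping all nodes in one community, which is the kind of improvement a modularity-based clustering step looks for.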

DISCUSSION

Early integration of raw BLCA data has been previously performed in the context of characterizing molecular subtypes [40], or of validating results of either scRNA-seq [41] or RNA-seq re-analysis [42]. In this study we performed an early integration meta-analysis of normal-looking urothelium and BLCA stages, aiming to identify continuous gene expression alterations with increasing malignancy. To our knowledge, this is the first attempt to associate molecular alterations with clinical classification by analyzing more than one thousand well-characterized, primary samples. Instead of focusing on molecular subtypes, we increased power and addressed the disease as a continuum, under the assumption that individual samples reflect different snapshots of the whole process. Our novel design, based on the hypothesis of continuous evolution through stages, was successfully applied here and resulted in novel findings on gene regulation associated with cancer progression.

First, we identified a set of 157 genes with a monotonic change in expression across T0 and BLCA stages. Almost half of these genes were components of the cell cycle machinery, kinases signaling positively for it, or transcription factors responsible for the expression of cell cycle genes. Twelve of the 157 genes also had prognostic value in external RNA-seq data. Of these, apart from ENO1, higher levels of which had been previously linked with worse BLCA outcome [43], all the remaining 11 associations with survival are novel. IMPDH1 plays a role in cancer progression by promoting cell growth, and higher expression levels of this gene have been observed in MIBC compared to normal tissue [44]. Similarly, MED19, a component of the mediator complex that regulates transcription by RNA polymerase II, was found overexpressed by IHC in BLCA compared to normal tissues, and its knockdown in the T24 and 5637 bladder cell lines resulted in cell-cycle arrest at the G0/G1 checkpoint and also in attenuation of cell growth [45]. The involvement of MRPL37, GTPBP4 and MLF2 in BLCA development has not been characterized, but oncogenic properties have been attributed to these genes in other contexts. ANLN and AKAP7 are thought to regulate bladder cell growth and apoptosis in a TP53-independent manner [46]. ICA1L is naturally expressed in sperm cells and its implication in BLCA has not been investigated. The CDC14B gene is located on chromosome 9q, a region that is often deleted in BLCA. This might also explain its downregulation in malignancy, as observed in the discovery set. CDC14B is believed to dephosphorylate TP53 [47], but the functional consequence on the mitotic or DNA damage repair pathways is not yet well clarified [48]. CBX7 is a component of the modifier PRC1-complex and is required for the propagation of the transcriptionally repressive state of multiple genes through cell division during embryonic development [49], including Hox genes [50]. Expectedly, we noticed that HOXC6 and HOXC9 were both monotonically upregulated with increasing malignancy (Supplementary Table 1). ZFP2 is a probable transcription factor and evidence suggests an epigenetic role as well [51], while recently, a high load of mutations in another member of the ZFP family (ZFP36) was associated with upper tract urothelial carcinoma [52].
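
As an illustration of what a monotonic change across stages means operationally, the following sketch labels a gene by comparing per-stage medians. This is an assumed, simplified criterion for illustration only; it is not claimed to be the selection procedure used in this study, and the expression values are made-up toy numbers.

```python
from statistics import median

STAGES = ["T0", "Ta", "T1", "T2", "T3", "T4"]


def monotonic_trend(expression_by_stage, stages=STAGES):
    """Return 'up', 'down' or 'none' from the per-stage median expression.

    expression_by_stage: dict mapping a stage label to a list of expression
    values of one gene in the samples of that stage.
    """
    medians = [median(expression_by_stage[stage]) for stage in stages]
    steps = [later - earlier for earlier, later in zip(medians, medians[1:])]
    if all(step > 0 for step in steps):
        return "up"
    if all(step < 0 for step in steps):
        return "down"
    return "none"


# Toy values (hypothetical): a steadily rising gene, its mirror image,
# and a gene whose T2 median breaks the trend.
toy_up = {
    "T0": [5.0, 5.2], "Ta": [5.4, 5.6], "T1": [5.9, 6.1],
    "T2": [6.3, 6.5], "T3": [6.8, 7.0], "T4": [7.2, 7.4],
}
toy_down = {stage: [10.0 - v for v in values] for stage, values in toy_up.items()}
toy_mixed = dict(toy_up, T2=[5.0, 5.1])

trend_up = monotonic_trend(toy_up)
trend_down = monotonic_trend(toy_down)
trend_mixed = monotonic_trend(toy_mixed)
```

A real analysis would also need a statistical test of each step or of the overall trend; the median comparison here only conveys the shape of the criterion.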

Our analysis highlighted expected landmark pathways, such as the mTORC1 pathway [53] and MYC targets [54], which were upregulated, but also novel downregulated pathways such as the circadian clock and the metabolism of heme. These results associate for the first time BLCA progression with the disruption of circadian homeostasis and with iron metabolism deficiencies, events that are thought to be tumorigenic [55, 56] but whose exact mechanism of action is not well understood. We detected an “incomplete” Warburg effect in which the monotonic downregulation (or deactivation) of lipid and fatty acid metabolism was not accompanied by any upregulation of glycolysis. In addition to the GATA3 regulon, which is a known driver of luminal biology, we detected a novel progressive downregulation of the GLI2 regulon. GLI proteins are transcription factors of the Sonic hedgehog (Shh) pathway, and although GLI2 expression levels positively correlate with more invasive BLCA cell lines, Shh genes do not behave accordingly [57]. Our results validate these observations, as the entire regulon of the GLI2 TF was inactivated with increasing BLCA malignancy, suggesting no potential therapeutic effect of its inhibition.

A change in the expression levels of a single gene may not always have a functional consequence, mainly due to the complementary action of other genes of that pathway. Instead, a concerted change in the expression levels of multiple genes is often required for the cell to successfully transit between phenotypes. We investigated this type of change using a network coexpression analysis followed by intra-stage graph clustering, and were able to detect both intrinsic and microenvironmental gene rewiring programs that shape tumor evolution. These gene programs were found to be organized into distinct biological communities (which were validated in the TCGA 2017 study), and consisted of genes stably coexpressed among all BLCA stages, but also of genes specifically coexpressed (gained or lost) in a particular stage only. We assume that ECM remodeling relies mostly on this last property of gains and losses in coexpression, since the bladder consists of heterogeneous anatomical tissues with stiff biological barriers (basement membrane, muscle layer), which in turn require specialized cell functions to be dissolved. The immune activation community was present in both T0 and MIBC. Results of the CIBERSORT analysis showed that inside the bladder tumor and with increasing stage, most monocytes preferentially differentiate into macrophages with M2 polarization. This is in line with the recent findings of Chen et al. [41], in which the authors analyzed scRNA-seq data of BLCA patients and observed a similar pattern of differentiation for monocytes. As the CIBERSORT algorithm has not been designed to recognize signatures of T-cell exhaustion, CD8+ cells appeared in higher numbers in T0 samples compared to tumor; however, we cannot exclude the existence of biological barriers in the normal adjacent urothelium that prevent CD8+ cells from reaching the tumor. T4 samples had the highest percentage of M2-macrophages, which could partially explain their immunosuppressive state, and as a novel finding, T4 coexpression in the immune cells was driven by AIF1, a gene that ensures the survival and proliferation of macrophages. We hypothesize that immune evasion in BLCA is likely promoted by M2-macrophages that actively express AIF1, but further work is required to validate these observations.
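
The distinction drawn above between genes stably coexpressed across all stages and genes gained or lost at a particular stage reduces to set operations on per-stage gene lists. A minimal sketch, assuming each stage's community is available as a set of gene symbols; GENE_A to GENE_E and their stage memberships are hypothetical toy data, not results of this study.

```python
def coexpression_changes(genes_by_stage, stages):
    """Split per-stage gene sets into stable genes and stage-wise gains/losses.

    genes_by_stage: dict mapping a stage label to the set of genes
    coexpressed in a given community at that stage.
    stages: stage labels in order of increasing disease stage.
    Returns (stable, changes), where `stable` holds genes present at every
    stage and `changes[stage]` holds the genes gained or lost relative to
    the preceding stage.
    """
    stable = set.intersection(*(genes_by_stage[stage] for stage in stages))
    changes = {}
    for previous, current in zip(stages, stages[1:]):
        changes[current] = {
            "gained": genes_by_stage[current] - genes_by_stage[previous],
            "lost": genes_by_stage[previous] - genes_by_stage[current],
        }
    return stable, changes


# Hypothetical toy memberships for one community across four stages.
toy_stages = ["T0", "Ta", "T1", "T2"]
toy_genes = {
    "T0": {"GENE_A", "GENE_B"},
    "Ta": {"GENE_A", "GENE_B", "GENE_C"},
    "T1": {"GENE_A", "GENE_C", "GENE_D"},
    "T2": {"GENE_A", "GENE_C", "GENE_D", "GENE_E"},
}
toy_stable, toy_changes = coexpression_changes(toy_genes, toy_stages)
```

Here GENE_A is the only stably coexpressed gene, while the step from Ta to T1 gains GENE_D and loses GENE_B.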

Our study has its limitations, which include the retrospective nature of the analysis and the use of batch correction methods to integrate different microarrays. The batch correction method unavoidably eliminates some of the biological variability, leading to an increased false negative rate; our validation of the applied method via checking the expression of known genes, in combination with focusing on positive signals, suggests that sufficient biological variability is maintained. In addition, restrictions in the validation set imposed by the lack of samples from all disease stages did not allow validating the observations, particularly at the NMIBC stages. Finally, clinical stage is known to have varying rates of misdiagnosis. However, the high number of samples used in each of the stage categories is expected to balance out misdiagnosed cases to some extent, while increasing the power of the results.

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

All custom R scripts created for analysis and visualizations are available from the first and the corresponding author upon reasonable request.

Acknowledgements

MM is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 898260 (H2020-MSCA-IF-2019, ReDrugBC, Grant agreement ID: 898260).

Contributions

HM and AV conceived and designed the investigation; RS, MF, MZ and EM processed and analyzed data; RS and EM wrote software for analysis and visualizations; RS, MF, MZ, HM and AV interpreted results; RS wrote the manuscript; MM, MR, HM and AV provided critical revisions.

Ethics declarations/ Competing interests

HM is the co-founder and co-owner of Mosaiques Diagnostics. MF, EM, and MM are employed by Mosaiques Diagnostics.

References

[1] Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians. 2018;68:394-424.
[2] Knowles MA, Hurst CD. Molecular biology of bladder cancer: new insights into pathogenesis and clinical diversity. Nature reviews Cancer. 2015;15:25-41.
[3] Czerniak B, Dinney C, McConkey D. Origins of Bladder Cancer. Annual review of pathology. 2016;11:149-74.
[4] Dyrskjot L, Kruhoffer M, Thykjaer T, Marcussen N, Jensen JL, Moller K, et al. Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification. Cancer research. 2004;64:4040-8.
[5] Sjodahl G, Lauss M, Lovgren K, Chebil G, Gudjonsson S, Veerla S, et al. A molecular taxonomy for urothelial carcinoma. Clinical cancer research : an official journal of the American Association for Cancer Research. 2012;18:3377-86.
[6] Damrauer JS, Hoadley KA, Chism DD, Fan C, Tiganelli CJ, Wobker SE, et al. Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breast cancer biology. Proceedings of the National Academy of Sciences of the United States of America. 2014;111:3110-5.
[7] Choi W, Porten S, Kim S, Willis D, Plimack ER, Hoffman-Censits J, et al. Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy. Cancer cell. 2014;25:152-65.
[8] Rebouissou S, Bernard-Pierrot I, de Reynies A, Lepage ML, Krucker C, Chapeaublanc E, et al. EGFR as a potential therapeutic target for a subset of muscle-invasive bladder cancers presenting a basal-like phenotype. Science translational medicine. 2014;6:244ra91.
[9] Robertson AG, Kim J, Al-Ahmadie H, Bellmunt J, Guo GW, Cherniack AD, et al. Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer. Cell. 2017;171:540-+.
[10] Hedegaard J, Lamy P, Nordentoft I, Algaba F, Hoyer S, Ulhoi BP, et al. Comprehensive Transcriptional Analysis of Early-Stage Urothelial Carcinoma. Cancer cell. 2016;30:27-42.
[11] Hurst CD, Alder O, Platt FM, Droop A, Stead LF, Burns JE, et al. Genomic Subtypes of Non-invasive Bladder Cancer with Distinct Metabolic Profile and Female Gender Bias in KDM6A Mutation Frequency. Cancer cell. 2017;32:701-15 e7.
[12] Sjodahl G, Eriksson P, Liedberg F, Hoglund M. Molecular classification of urothelial carcinoma: global mRNA classification versus tumour-cell phenotype classification. The Journal of pathology. 2017;242:113-25.
[13] Mo Q, Nikolos F, Chen F, Tramel Z, Lee YC, Hayashi K, et al. Prognostic Power of a Tumor Differentiation Gene Signature for Bladder Urothelial Carcinomas. Journal of the National Cancer Institute. 2018;110:448-59.
[14] Kamoun A, de Reynies A, Allory Y, Sjodahl G, Robertson AG, Seiler R, et al. A Consensus Molecular Classification of Muscle-invasive Bladder Cancer. European urology. 2020;77:420-33.
[15] Lindskrog SV, Prip F, Lamy P, Taber A, Groeneveld CS, Birkenkamp-Demtroder K, et al. An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer. Nature communications. 2021;12:2301.
[16] Rebouissou S, Herault A, Letouze E, Neuzillet Y, Laplanche A, Ofualuka K, et al. CDKN2A homozygous deletion is associated with muscle invasion in FGFR3-mutated urothelial bladder carcinoma. Journal of Pathology. 2012;227:315-24.
[17] Nassar AH, Umeton R, Kim J, Lundgren K, Harshman L, Van Allen EM, et al. Mutational Analysis of 472 Urothelial Carcinoma Across Grades and Anatomic Sites. Clinical Cancer Research. 2019;25:2458-70.
[18] Carvalho BS, Irizarry RA. A framework for oligonucleotide microarray preprocessing. Bioinformatics. 2010;26:2363-7.
[19] Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature protocols. 2009;4:1184-91.
[20] Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118-27.
[21] Smyth GK, Speed T. Normalization of cDNA microarray data. Methods. 2003;31:265-73.
[22] Jacob L, Gagnon-Bartsch JA, Speed TP. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Biostatistics. 2016;17:16-28.
[23] Manimaran S, Selby HM, Okrah K, Ruberman C, Leek JT, Quackenbush J, et al. BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics. 2016;32:3836-8.
[24] Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. Bmc Bioinformatics. 2013;14.
[25] Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 2019;29:1363-75.
[26] Yoshihara K, Shahmoradgoli M, Martinez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nature communications. 2013;4.
[27] Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PloS one. 2010;5.
[28] Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech-Theory E. 2008.
[29] Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics : a journal of integrative biology. 2012;16:284-7.
[30] Lee SR, Roh YG, Kim SK, Lee JS, Seol SY, Lee HH, et al. Activation of EZH2 and SUZ12 Regulated by E2F1 Predicts the Disease Progression and Aggressive Characteristics of Bladder Cancer. Clinical cancer research : an official journal of the American Association for Cancer Research. 2015;21:5391-403.
[31] Kanehira M, Harada Y, Takata R, Shuin T, Miki T, Fujioka T, et al. Involvement of upregulation of DEPDC1 (DEP domain containing 1) in bladder carcinogenesis. Oncogene. 2007;26:6448-55.
[32] Hsu CL, Liu JS, Wu PL, Guan HH, Chen YL, Lin AC, et al. Identification of a new androgen receptor (AR) co-regulator BUD31 and related peptides to suppress wild-type and mutated AR-mediated prostate cancer growth via peptide screening and X-ray structure analysis. Molecular oncology. 2014;8:1575-87.
[33] Kawahara T, Shareef HK, Aljarah AK, Ide H, Li Y, Kashiwagi E, et al. ELK1 is up-regulated by androgen in bladder cancer cells and promotes tumor progression. Oncotarget. 2015;6:29860-76.
[34] Wei WS, Chen X, Guo LY, Li XD, Deng MH, Yuan GJ, et al. TRIM65 supports bladder urothelial carcinoma cell aggressiveness by promoting ANXA2 ubiquitination and degradation. Cancer letters. 2018;435:10-22.
[35] Pham H, Ziboh VA. 5 alpha-reductase-catalyzed conversion of testosterone to dihydrotestosterone is increased in prostatic adenocarcinoma cells: suppression by 15-lipoxygenase metabolites of gamma-linolenic and eicosapentaenoic acids. The Journal of steroid biochemistry and molecular biology. 2002;82:393-400.
[36] Leoni LM, Hartley JA. Mechanism of action: the unique pattern of bendamustine-induced cytotoxicity. Seminars in hematology. 2011;48 Suppl 1:S12-23.
[37] Wullweber A, Strick R, Lange F, Sikic D, Taubert H, Wach S, et al. Bladder Tumor Subtype Commitment Occurs in Carcinoma In Situ Driven by Key Signaling Pathways Including ECM Remodeling. Cancer research. 2021;81:1552-66.
[38] Ding J, Du K. ClipR-59 interacts with Akt and regulates Akt cellular compartmentalization. Molecular and cellular biology. 2009;29:1459-71.
[39] Egana-Gorrono L, Chinnasamy P, Casimiro I, Almonte VM, Parikh D, Oliveira-Paula GH, et al. Allograft inflammatory factor-1 supports macrophage survival and efferocytosis and limits necrosis in atherosclerotic plaques. Atherosclerosis. 2019;289:184-94.
[40] Tan TZ, Rouanne M, Tan KT, Huang RY, Thiery JP. Molecular Subtypes of Urothelial Bladder Cancer: Results from a Meta-cohort Analysis of 2411 Tumors. European urology. 2019;75:423-32.
[41] Chen Z, Zhou L, Liu L, Hou Y, Xiong M, Yang Y, et al. Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma. Nature communications. 2020;11:5077.
[42] Chen X, Chen H, He D, Cheng Y, Zhu Y, Xiao M, et al. Analysis of Tumor Microenvironment Characteristics in Bladder Cancer: Implications for Immune Checkpoint Inhibitor Therapy. Frontiers in immunology. 2021;12:672158.
[43] Ji M, Wang Z, Chen J, Gu L, Chen M, Ding Y, et al. Up-regulated ENO1 promotes the bladder cancer cell growth and proliferation via regulating beta-catenin. Bioscience reports. 2019;39.
[44] Lee SJ, Lee EJ, Kim SK, Jeong P, Cho YH, Yun SJ, et al. Identification of pro-inflammatory cytokines associated with muscle invasive bladder cancer; the roles of IL-5, IL-20, and IL-28A. PloS one. 2012;7:e40267.
[45] Zhang H, Jiang H, Wang W, Gong J, Zhang L, Chen Z, et al. Expression of Med19 in bladder cancer tissues and its role on bladder cancer cell growth. Urologic oncology. 2012;30:920-7.
[46] da Silva GN, Evangelista AF, Magalhaes DA, Macedo C, Bufalo MC, Sakamoto-Hojo ET, et al. Expression of genes related to apoptosis, cell cycle and signaling pathways are independent of TP53 status in urinary bladder cancer cells. Molecular biology reports. 2011;38:4159-70.
[47] Li L, Ljungman M, Dixon JE. The human Cdc14 phosphatases interact with and dephosphorylate the tumor suppressor protein p53. The Journal of biological chemistry. 2000;275:2410-4.
[48] Dietachmayr M, Rathakrishnan A, Karpiuk O, von Zweydorf F, Engleitner T, Fernandez-Saiz V, et al. Antagonistic activities of CDC14B and CDK1 on USP9X regulate WT1-dependent mitotic transcription and survival. Nature communications. 2020;11:1268.
[49] Moussa HF, Bsteh D, Yelagandula R, Pribitzer C, Stecher K, Bartalska K, et al. Canonical PRC1 controls sequence-independent propagation of Polycomb-mediated gene silencing. Nature communications. 2019;10:1931.
[50] Grossniklaus U, Paro R. Transcriptional silencing by polycomb-group proteins. Cold Spring Harbor perspectives in biology. 2014;6:a019331.
[51] Sobocinska J, Molenda S, Machnik M, Oleksiewicz U. KRAB-ZFP Transcriptional Regulators Acting as Oncogenes and Tumor Suppressors: An Overview. International journal of molecular sciences. 2021;22.
[52] Su X, Lu X, Bazai SK, Comperat E, Mouawad R, Yao H, et al. Comprehensive integrative profiling of upper tract urothelial carcinomas. Genome biology. 2021;22:7.
[53] Ching CB, Hansel DE. Expanding therapeutic targets in bladder cancer: the PI3K/Akt/mTOR pathway. Lab Invest. 2010;90:1406-14.
[54] Mahe M, Dufour F, Neyret-Kahn H, Moreno-Vega A, Beraud C, Shi M, et al. An FGFR3/MYC positive feedback loop provides new opportunities for targeted therapies in bladder cancers. EMBO molecular medicine. 2018;10.
[55] Fu L, Kettner NM. The circadian clock in cancer development and therapy. Progress in molecular biology and translational science. 2013;119:221-82.
[56] Chen Y, Fan Z, Yang Y, Gu C. Iron metabolism and its contribution to cancer (Review). International journal of oncology. 2019;54:1143-54.
[57] Mechlin CW, Tanner MJ, Chen M, Buttyan R, Levin RM, Mian BM. Gli2 expression and human bladder transitional carcinoma cell invasiveness. The Journal of urology. 2010;184:344-51.