<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A meta-learning approach for genomic survival analysis

Yeping Lina Qiu1,2, Hong Zheng2, Arnout Devos3, Olivier Gevaert2,4,∗ 1Department of Electrical Engineering, Stanford University 2Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University 3School of Computer and Communication Sciences, Swiss Federal Institute of Technology Lausanne (EPFL) 4Department of Biomedical Data Science, Stanford University ∗To whom correspondence should be addressed: [email protected]

Abstract

RNA sequencing has emerged as a promising approach in cancer prognosis as sequencing data becomes more easily and affordably accessible. However, it remains challenging to build good predictive models especially when the sample size is limited and the number of features is high, which is a common situation in biomedical settings. To address these limitations, we propose a meta-learning framework based on neural networks for survival analysis and evaluate it in a genomic cancer research setting. We demonstrate that, compared to regular transfer- learning, meta-learning is a significantly more effective paradigm to leverage high-dimensional data that is relevant but not directly related to the problem of interest. Specifically, meta-learning explicitly constructs a model, from abundant data of relevant tasks, to learn a new task with few samples effectively. For the application of predicting cancer survival outcome, we also show that the meta- learning framework with a few samples is able to achieve competitive performance with learning from scratch with a significantly larger number of samples. Finally, we demonstrate that the meta-learning model implicitly prioritizes based on their contribution to survival prediction and allows us to identify important pathways in cancer.

1 Introduction

Cancer is a leading cause of death in the world. Accurate prediction of its survival outcome has been an interesting and challenging problem in cancer research over the past decades. Quantitative methods have been developed to model the relationship between multiple explanatory variables and survival outcome, including fully parametric models (Hosmer et al., 2008; Klein and Moeschberger, 2006) and semi-parametric models such as the Cox proportional hazards model (Cox, 1972). The Cox model makes a parametric assumption about how the predictors affect the hazard function, but makes no assumption about the baseline hazard function itself (Harrell et al., 1982). In most real world scenarios, the form of the true hazard function is either unknown or too complex to model, making the Cox model the most popular method in survival analysis (Kleinbaum and Klein, 2012). In clinical practice, historically, survival analysis has relied on low-dimensional patient characteristics such as age, sex, and other clinical features in combination with histopathological evaluations such as stage and grade (Louis et al., 2016). With advances in high-throughput sequencing technology, a greater amount of high-dimensional genomic data is now available and more molecular biomarkers can be discovered to determine survival and improve treatment. With the cost of RNA sequencing coming down significantly, from an average of $100M per genome in 2001 to $1k per genome in bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2015 (Park and Kim, 2016), it is becoming more feasible to use this technology to prognosticate. Such genomic data often has tens of thousands of variables which requires the development of new algorithms that work well with data of high dimensionality. To address these challenges, several implementations of regularized Cox models have been proposed (Goeman, 2010; Park and Hastie, 2007; Wong et al., 2018). A regularized model adds a model complexity penalty to the Cox partial likelihood to reduce the chance of overfitting. More recently, the increasing modeling power of deep learning networks has aided in developing suitable survival analysis platforms for high dimensional feature spaces. For example, autoencoder architectures have been employed to extract features from genomic data for liver cancer prognosis prediction (Chaudhary et al., 2018). The Cox model has also been integrated in a neural network setting to allow greater modeling flexibility (Cheerla and Gevaert (2019); Ching et al. (2018); Luck et al. (2017); Yousefi et al. (2017). In studying a specific rare cancer’s survival outcome, one interesting problem is whether it is possible to make use of the abundant data that is available for more common relevant cancers and leverage that information to improve the survival prediction. This problem is commonly approached with transfer-learning (Pratt, 1993), where a model which has been trained on a single task (e.g., 1 or more abundant cancers) is used to fine-tune on a related target task (rare cancer). In survival analysis, transfer-learning has shown to significantly improve prediction performance (Li et al., 2016). Deep neural networks used to analyze biomedical imaging data can also take advantage of information transfer from data in other settings. For example, multiple studies show that convolutional neural networks pretrained on ImageNet data can be used to build performant survival models with histology images (Deng et al., 2009; Mobadersany et al., 2018). In this context, meta-learning is an area in deep learning research that has gained much attention in recent years which addresses the problem of “learning to learn” (Finn et al., 2017; Vilalta and Drissi, 2002). A meta-learning model explicitly learns to adapt to new tasks quickly and efficiently, usually with a limited exposure to the new task environment. Such a framework may potentially adapt better than the traditional transfer-learning setting where there is no explicit adaptation incorporated in the pre-training algorithm. This problem setting with limited exposure to a new task is also known as few-shot learning: learn to generalize well, given very few examples (called shots) of a new task (Li et al., 2016). Recent advances in meta-learning have shown that, compared to transfer-learning, it is a more effective approach to few-shot classification (Chen et al., 2017; Devos and Grossglauser, 2019), regression (Finn et al., 2017), and reinforcement learning (Duan et al., 2016; Finn et al., 2017). In this study, we propose a meta-learning framework based on neural networks for survival analysis applied in a cancer research setting. Specifically, for the application of predicting survival outcome, we demonstrate that our method is a preferable choice compared to regular transfer-learning pre-training and other competing methods on three cancer datasets when the number of training samples from the specific target cancer is very limited. Finally, we demonstrate that the meta-learning model implicitly prioritizes genes based on their contribution to survival prediction and allows us to uncover biological pathways associated with cancer survival outcome.

2 Materials and methods

2.1 Datasets

We use the RNA sequencing data from The Cancer Genome Atlas (TCGA) pan-cancer datasets (Tomczak et al., 2015). We remove the genes with NA values and normalized the data by log transformation and z-score transformation. The feature dimension is 17176 genes after preprocessing. The data contains 9707 samples from 33 cancer types. The outcome is the length of survival time in months. 78% of the patients are censored, which means that the subject leaves the study before an event occurs or the study terminates before an event occurs to the subject.

2.2 Meta-learning

Our proposed survival prediction framework is based on a neural network extension of the Cox regression model that relies on semi-parametric modeling by using a Cox loss (Ching et al., 2018). The model consists of two modules: the feature extraction network and the Cox loss module (Figure 1). We use a neural network with two hidden layers to extract features from the RNA sequencing

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

data input, which yields a lower dimensional feature vector for each patient. The features are then fed to the Cox loss module, which performs survival prediction by doing a Cox regression with the features as linear predictors of the hazard (Cox, 1972). The parameters of the Cox loss module β are optimized by minimizing the negative of the partial log-likelihood:

  

X X Zj β L(β) = − Ziβ − log  e  (1)

Yiuncensored Yi≥Yi

where Yi is the survival length for patient i, Zi is the extracted feature for patient i, and β is the coefficient weight vector between the feature and the output. Since Zi is the output of the feature extraction module, it can be further represented by:

Zi = fϕ (Xi) (2)

where Xi is the input predictor of patient i, f denotes a nonlinear mapping the neural network learns to extract features form the predictor, and ϕ denotes the model parameters including the weights and biases of each neural network layer. The feature extraction module parameters ϕ and Cox loss module parameters β are jointly trained in the model. For convenience in the following discussion we denote the combined parameters as θ. The optimization of parameters θ consists of two stages: a meta-learning stage, and a final learning stage. The meta-learning stage is the key process, where the model aims to learn a suitable parameter initialization for the final learning stage, so that during final learning the model can adapt efficiently to previously unseen tasks with a few training samples (Nichol et al., 2018). In order to reach the desired intermediate state, a first order gradient-based meta-learning algorithm is used to train the network during the meta-learning stage (Finn et al., 2017; Nichol et al., 2018). Specifically, at the beginning of meta-learning training, the model is randomly initialized with parameter θ. Consider that the training samples for the meta-learning stage consist of n tasks Tτ , τ = 1, 2 . . . n. A task is defined as a common learning theme shared by a subgroup of samples. Concretely, these samples come from a distribution on which we want to carry out a classification task, regression task or reinforcement learning task. The algorithm continues by sampling a task Tτ and using samples of Tτ to update the inner-learner. The inner-learner learns Tτ by taking k steps of k stochastic gradient descent (SGD) and updating the parameters to θτ :

0 θτ = θ 1 0 0 0 θτ ← θτ − αLτ,0 θτ 2 1 0 1 θτ ← θτ − αLτ.1 θτ (3) ... k k−1 0 k−1 θτ ← θτ − αLτ,k−1 θτ

k th where θτ is the model parameter at step k for learning task τ, Lτ,k−1 is the loss computed on the k minibatch sampled from task τ, Lτ,1 is the loss computed on the second minibatch sampled from task τ and so on. The ‘prime’ symbol denotes differentiation, and α is the inner learner step size. Note that this learning process is the separate for all tasks, starting from the same initialization θ. After an arbitrary m number of tasks are independently learnt by the above k-step SGD process, and k obtaining θτ , τ =1, 2, . . . m, we make one update across all tasks with the meta-learner to get a better initialization θ:

m 1 X θ ← θ + γ (θk − θ) (4) m τ τ=1

1 Pm k where γ is the learning step of the meta-learner. The term m τ=1(θτ − θ) can be considered as a gradient, so that for example a popular optimization algorithm like Adam (Kingma and Ba, 2014) can be used by the meta-learner to self-adjust learning rates for each parameter. The entire process

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Input Features Predicted Hazard

Cox Loss Cox Loss Module

Feature Extraction Module

Figure 1: Schematic showing the survival prediction model architecture

of inner-learner update and meta-learner update is repeated until a chosen maximum number of meta-learning epochs is reached. This algorithm is shown to encourage the gradients of different minibatches of a given task to align in the same direction, thereby improving generalization and efficient learning later on (Nichol et al., 2018). In the final learning stage, the model is provided with a few-sample dataset of a new task. First, the model is initialized with the meta-learnt parameters θ, which are then fine-tuned with the new k0 task training data to θτ and finally the fine-tuned model is evaluated with testing data from the new task. The training procedure of final learning does not require a special algorithm, and can be conducted with regular mini-batch stochastic gradient descent. This final learning stage is equal to the inner-learning loop for a single task in Equation (3) without any outer loop.

2.3 Experimental setup

In order to assess the meta-learning method’s performance, we compare it with several alternative training schemes based on the same neural network architecture: regular pre-training, combined learning, and direct learning. First, meta-learning initially learns general knowledge from a dataset containing tasks that are relevant but not directly related to the target, and then learns task-specific knowledge from a very small target task dataset. We define the first dataset as the “multi-task training data”, and the second as the “target task training data”. Secondly, regular pre-training also has a two-stage learning process on the same datasets, but unlike meta-learning without explicitly focusing on learning to reach an initialization that is easy to adapt to new tasks. Thirdly, combined learning does not involve a two-stage learning process, but also leverages knowledge from the relevant tasks by combining the multi-task training data and the target task training data together in one dataset to train a prediction model. Direct learning on the other hand, only uses the target task training samples. To illustrate the effectiveness of the methods with few samples, we consider three cases of direct learning: a large sample size, a medium sample size, and a small sample size which is the same size as the “target task training data” used for the other methods (i.e. regular pre-training, combined learning and meta-learning) (Figure 2). In our experiments, the “multi-task training data” is the pan-cancer RNA sequencing data containing samples from any cancer sites except one cancer site that we define as the target cancer site. The associated target cancer data is considered as the “target task data”. This target task dataset is split into training data (80%) and testing data (20%), stratified by disease sub-type and censoring status.

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Multi-task training Direct data Combined Learning Meta Pre- Learning Learn train Combined Target task data Learn Final Fine- Target task training data Direct (n=20) Learn tune

Learn Meta- Regular Target task training data learning Pre-training (n=150)

25 Target task training data 25 25 25 trials (n=250) trials trials trials

Target task testing data (20% of target task data)

Figure 2: Data flow for meta-learning, regular pre-training, and combined learning frameworks

For meta-learning, regular pre-training, and combined learning we will not use all 80% of the training set for the target task, as we want to assess the performances when the algorithm is exposed to only a small number of target task training samples. Therefore, we will randomly draw 20 samples from the training dataset as one “target task training data”. We choose a small sample size of 20 because it is a possible case in real life situations where the target task is the study of rare diseases (Hee et al., 2017), or where new technologies are used to produce data which only have the capacity to produce a small sample. For direct learning, we randomly draw three different sizes from the training datasets to form training data, 20 for the small size, 150 for the medium size, and 250 for the large size. All methods are evaluated on the common testing data of the target task. Finally, as a linear baseline, we use the combined learning training sample (multi-task training data and target task training data) to train a linear cox regression model. We conduct 25 experiment trials for each method, where each trial is trained with a randomly drawn “target task training dataset” as described above.

2.4 Evaluation

We evaluate the survival prediction model performances with two commonly used evaluation metrics: the concordance index (C-index) (Harrell et al., 1982) and the integrated Brier score (IBS) (Brier, 1950). Fistly, the C-index is a standard performance measure of a model’s predictive ability in survival analysis. It is calculated by dividing the number of all pairs of subjects whose predicted survival times are correctly ordered, by the number of pairs of subjects that can possibly be ordered. A pair cannot be ordered if the earlier time in the pair is censored or both events in the pair are censored. A C-index value of 1.0 indicates perfect prediction where all the predicted pairs are correctly ordered, and a value of 0.5 indicates random prediction. Secondly, the integrated Brier score is used to evaluate the error of survival prediction and is represented by the mean squared differences between observed survival status and the predicted survival probability at a given time point. The IBS provides an overall calculation of the model performance at all available times. An IBS value of 0 indicates perfect prediction, while 1 indicates entirely inaccurate prediction. We select cancer sites from TCGA with the following two inclusion criteria: (1) a minimum of 450 samples, providing enough training samples for different benchmarking training schemes and (2) a minimum of 30% non-censoring samples, enabling more accurate evaluation than more heavily censored cohorts. This results in the following cancers: glioma, including glioblastoma (GBM) and

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Hyper-parameter Value Task-level optimizer SGD Task-level learning rate 0.01 Task-level gradient steps 5 Task-level Batch size 100 Meta-level optimizer ADAM Meta-level learning rate 0.0001 Meta-level tasks batch size 10 L2 regularization scale 0.1

Table 1: Selected hyper-parameters for meta-learning’s meta-learning stage

low-grade-glioma (LGG); non-small cell lung cancer, including lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), and head-neck squamous cell carcinoma (HNSC). In addition, these three types of cancers are of clinical interest, because gliomas are the most common type of malignant brain tumor, and lung cancer is the deadliest cancer in the world (Ceccarelli et al., 2016; Herbst and Lippman, 2007). HNSC, on the other hand, is a less widely studied type of cancer, which nonetheless attracted growing attention in the recent decade since the release of the publicly available largest dataset in HNSC by TCGA (Brennan et al., 2017; Tonella et al., 2017). We evaluate the C-index and IBS in 25 experimental trials for each method.

2.5 Hyper-parameter selection

To avoid overfitting, we do not conduct a separate hyper-parameter search for each of the cancer datasets. Instead, we search for hyper-parameters on one type of cancer and apply the chosen parameters to all experiments. We select the largest cancer cohort, glioma, and use 5-fold cross- validation for hyper-parameter selection. For each given set of hyper-parameters, we average the results from five validation sets (each is 20% of training data). Since there is similarity in the algorithm between methods (combined learning, direct learning, and regular pre-training), we share hyper-parameters between experiments when it makes sense, as detailed below. All methods use the same neural network architecture with two hidden fully connected layers of size 6000 and 2000, and an output fully connected feature layer of size 200. Each layer uses the ReLU activation function (Nair and Hinton, 2010). Initially we experiment with 4 different structures: 1 or 2 hidden layers with feature size of 200 or 50, respectively. We chose the optimal structure detailed before and use it as the architecture for all methods in our subsequent discussion. For the regular pre-training model, we search for hyper-parameters for the pre-training stage and fine-tuning stage separately. For both stages, we test the mini-batch gradient descent and Adam optimizers, and determine learning rates with grid search on a grid of [.1, .05, .01, .005, .001] for SGD and a grid of [.001, .0005, .0001,.00005, .00001] for Adam. We test batch sizes of 50, 100, 200, and 800 for pre-training. The selected parameters for the pre-train stage are: an SGD optimizer with learning rate of .001, L2 regularization scale of 0.1 and batch size of 800. The selected parameters for the fine-tune stage are: an SGD optimizer with learning rate of .001, L2 regularization scale of 0.1, and batch size of 20 which is the size of each target cancer training dataset. For the combined learning model and direct learning model, since the algorithm is very similar to the regular pre-training model’s pre-train stage, we use the same parameters selected for the pre-train. The batch sizes for direct learning is half of the size of training data. For the meta-learning model, we search for hyper-parameters for the meta-learning only. For the final learning stage, we use the same hyper-parameters as in the fine-tune stage of the regular pre-training model, as both methods can use similar algorithms in the last stage of training. From our previous discussion, in the meta-learning stage an SGD optimizer and an Adam optimizer are suitable for the inner learner and meta-learner respectively. For the learning rates, we perform grid search on a grid of [.1, .05, .01, .005, .001] for SGD and a grid of [.001, .0005, .0001,.00005, .00001] for Adam. Batch size is searched from [50, 100, 200, 800], the number of tasks for averaging one meta-learner update is searched from [5, 10, 20], and the number of gradient descent steps for the inner-learner is

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Method Glioma Lung cancer HNSC Direct (250 samples) 0.24 ± 0.02 0.19 ± 0.01 0.20 ± 0.01 Direct (150 samples) 0.25 ± 0.01 0.19 ± 0.01 0.21 ± 0.01 Direct 0.30 ± 0.02 0.24 ± 0.02 0.30 ± 0.02 Combined 0.29 ± 0.02 0.21 ± 0.02 0.26 ± 0.02 Pre-training 0.31 ± 0.02 0.23 ± 0.02 0.26 ± 0.02 Meta-learning 0.28 ± 0.01 0.16 ± 0.01 0.16 ± 0.00

Table 2: Integrated Brier scores (IBS) with 95% confidence intervals (n=X trials) for target cancer survival prediction with 20 samples, unless specified otherwise. Lower value is better. Best performing method in bold.

searched from [3, 5, 10, 20]. The selected parameters for the meta-learning stage are shown in Table 1. Finally, in order to evaluate the effect of fluctuations of the meta-learning hyper-parameters, and ensure that our results reflect the average performance over fluctuations, we conduct a series of tests on the validation data where in each experiment we vary one of the five unique meta-learning hyper-parameters from the chosen value by tuning it up or down by one grid, obtaining ten sets of varied hyper-parameters. We do 5-fold cross-validation for each set of varied hyper-parameters and compute the concordance index from the resulting fifty experiments. We also do fifty random experiments using the selected hyper-parameters and compare the average results of varied versus selected hyper-parameters. We conduct a two-sample t-test on the two results, and conclude that the results obtained by varied parameters do not have a significant difference from the results obtained by the chosen parameters (mean concordance index difference of 0.005 with p value 0.50). Therefore, our results are robust with respect to fluctuations of the hyper-parameters and our conclusions are not based on excessive hyper-parameter tuning.

2.6 Interpretation of the genes prioritized by the meta-learning model

We apply risk score backpropagation (Yousefi et al., 2017) to the meta-learned models to investigate the feature importance of genes for each of the three target cancer sites. For a given sample, each input feature is assigned a risk score by taking the partial derivatives of the risk with respect to the feature. A positive risk score with high absolute value means the feature is important in poor prediction (high risk), and a negative risk score with high absolute value means the feature is important in good prediction (low risk). The features are ranked by the average of risk score across all samples. Two approaches were adopted for annotating the genes with ranked risk scores generated by the meta-learning model. Firstly, the top 10% high-risk genes (genes with positive risk scores) and the top 10% low-risk genes (genes with negative risk scores) from each cancer type were subjected to set over-representation analysis, by comparing the genes against the gene sets annotated with well-defined biological functions and processes. We model the association between the genes and each gene set using a hypergeometric distribution and Fisher’s exact test. Secondly, instead of arbitrary thresholding in the first approach, all the genes, together with their ranked risk scores were incorporated in the gene set enrichment analysis with the fgsea R package (Sergushichev, 2016) which calculates a cumulative gene set enrichment statistic value for each gene set. The gene set databases used in this analysis include Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), The Reactome Pathway Knowledgebase (Croft et al., 2014) and WikiPathways (Slenter et al., 2018).

3 Results

3.1 Meta-learning outperforms regular pre-training and combined learning

For all of the target cancer sites, meta-learning achieves similar or better performance than regular pre-training or combined learning (Figure 3; Table 2). For the glioma cohort, the mean C-index for meta-learning is 0.86 (0.85-0.86 95% CI), compared to 0.84 (0.83-0.85 95% CI) for regular

7 2). For the 4; Table Combined learning Pre−training Meta−learning . The copyright holder for this preprint (which preprint this for holder copyright The

Lung cancer Glioma HNSC

8 Meta−learning this version posted April 23, 2020. 2020. 23, April posted version this ;

CC-BY-NC-ND 4.0 International license International 4.0 CC-BY-NC-ND Pre−training

available under a under available Combined learning Combined

0.55 0.50 0.45 0.85 0.80 0.75 0.70 0.65 0.60 0.65 0.60 0.55 0.50 0.70 https://doi.org/10.1101/2020.04.21.053918 Concordance index (C−index) index Concordance doi: doi: mean C-indices for large sample,95% medium sample CI), and 0.85 small (0.85-0.86 sample 95%the direct CI), learning mean and are C-index 0.82 0.86 (0.81-0.84 (0.86-0.87 is 95% 0.86 CI) (0.85-0.86 respectively, and 95% fordirect CI). meta-learning learning, For 0.53 (0.49-0.56 the 95% HNSC CI) cohort, for small the sample mean direct C-index learning, and is 0.61 0.62 (0.59-0.63 95% training can compensate for suchdata lack explicitly of and information implicitly, by respectively.prediction transferring We performance show knowledge from that than the meta-learning large-samplecomparable pan-cancer achieves direct performance similar training with or in medium-sample better direct lunglung training cancer cancer in cohort, and glioma the HNSC, (Figure mean0.54 and C-index (0.52-0.56 reaches is 95% 0.57 (0.56-0.58 CI)sample 95% for direct CI) medium learning, for and sample large 0.65 sample direct (0.65-0.66 direct learning, 95% learning, CI) 0.53 for (0.50-0.55 meta-learning. 95% For CI) the glioma for cohort, small the Next, we compare our meta-learningsamples approach with with different regular cohort direct sizes.the learning on number The the performance of target of training taskamount direct training of samples learning information decreases drops is from significantly lost and when 250 the to model 20, can hardly which learn is well. anticipated However, meta-learning because and a pre- great 0.62 (0.61-0.64 95% CI) foracross combined 25 training. random trials Note also that,glioma tends the cohorts. to variance be In of the addition, the smallest, meta-learning eachon which results of average is than these most a multi-layer observable linear neural forlung baseline networks the cancer model. also lung (0.60-0.62 shows The cancer 95% better linear and CI),95% performance baseline 0.77 CI). model for achieves glioma a (0.74-0.80 C-index 95% of CI), 0.61 and for 0.59 for HNSC (0.58-0.62 3.2 Meta-learning achieves competitive predictive performance compared to direct learning pre-training and 0.81 (0.81-0.82mean 95% C-index CI) is for 0.65 combined (0.65-0.66pre-training, training. 95% and 0.62 CI) For (0.62-0.63 for the 95%is meta-learning, lung CI) 0.61 0.60 cancer for (0.59-0.63 (0.58-0.61 cohort, combined 95% 95% training. the CI) CI) For for for the meta-learning, regular HNSC 0.59 cohort, (0.57-0.61 the 95% result CI) for regular pre-training and Figure 3: C-Indexpre-training for and meta-learning target cancer survival prediction, comparing combined learning, regular (0.60-0.64 95% CI) for large sample direct learning, 0.57 (0.54-0.59 95% CI) for medium sample was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made made It is perpetuity. in preprint the display to license a bioRxiv has granted who author/funder, is the review) peer by certified not was bioRxiv preprint preprint bioRxiv bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

CI) for meta-learning. Thus, in all three cancer sites, meta-learning reaches competitive performances as large sample direct learning and can outperform it in certain cases.

0.65 Lung cancer 0.60 0.55 0.50 0.45 0.40 direct 20

0.85 Glioma direct 150 0.80 direct 250 0.75 Meta−learning 0.70 0.65 0.60 HNSC

Concordance index (C−index) Concordance index 0.55 0.50 0.45 0.40

direct 20 direct 150 direct 250 Meta−learning

Figure 4: C-Index for target cancer survival prediction, comparing direct learning with large (250), medium (150) and small (20) size samples and meta-learning.

3.3 Risk score ranked genes are enriched in key cancer pathways

Next, we investigated for each cancer site what genes are most important in the meta-learning model (Figure 5, Supplementary Tables S1- S6). In gliomas, the high-risk genes are associated with viral carcinogenesis (p value 0.002), Herpes simplex infection (p value 0.007), (p value 0.03), (p value 0.03), DNA damage response (p value 0.04), all of which are also enriched in gene set enrichment analysis with positive enrichment scores (Figure 5a, Table S2). The low-risk genes are associated with HSF1 activation (p value 0.02) which is involved in hypoxia pathway, and tryptophan metabolism (p value 0.04), the latter is also enriched in gene set enrichment analysis with negative enrichment score. Tryptophan catabolism has been increasingly recognized as an important microenvironmental factor in anti-tumor immune responses (Platten et al., 2012) and it is a common therapeutic target in cancer and neurodegeneration diseases (Platten et al., 2019). In head and neck cancer, the high-risk genes are associated with PTK6 signaling (p value 0.01), which regulates cell cycle and growth, and and inflammatory response (p value 0.009). The low-risk genes are associated with autophagy (p value 0.02), which is also enriched in gene set enrichment analysis. Other enriched pathways include B cell signaling pathway, cell cycle, and interleukin 1 signaling pathway (Figure 5b, Table S4). Interleukin 1 is an inflammatory which plays a key role in carcinogenesis and tumor progression (Mantovani et al., 2018). In lung cancer, the top high-risk genes are associated with "non-small cell lung cancer" pathway (p value 0.01), tuberculosis (p value 0.008), Hepatitis B and C virus infection (p value 0.03), and many pathways implicated previously in cancer. These pathways are also enriched in gene set enrichment analysis (Figure 5c, Table S6). Pulmonary tuberculosis has been shown to increase the risk of lung cancer (Wu et al., 2011; Yu et al., 2011). The low-risk genes are associated with energy metabolism (p value 0.03), ferroptosis (p value 0.037) and AMPK signaling pathway (p value 0.046), all related to energy metabolism, particularly lipid metabolism. AMPK signaling pathway activation by an AMPK agonist was shown to suppresses non-small cell lung cancer through inhibition of lipid metabolism (Chen et al., 2017). AMPK signaling and energy metabolism are also enriched in gene set enrichment

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Pathway Gene ranks NES pval Renin−angiotensin system,KEGG 1.71 1.0e−02 Viral carcinogenesis,KEGG 1.53 1.4e−03 Hepatitis C,KEGG 1.51 4.6e−03 Phototransduction,KEGG 1.51 4.2e−02 Epstein−Barr virus infection,KEGG 1.49 3.4e−03 Small cell lung cancer,KEGG 1.40 3.0e−02 Herpes simplex infection,KEGG 1.38 2.1e−02 signaling pathway,KEGG 1.37 4.9e−02 Cushing syndrome,KEGG 1.30 5.0e−02 Tryptophan metabolism,KEGG −1.55 2.3e−02 TP53 Regulates of Cell Death Genes 1.92 1.1e−03 Mitotic G1−G1−S phases 1.65 1.7e−02 Apoptosis 1.65 2.5e−03 Integrated Breast Cancer Pathway 1.62 6.3e−03 Prostaglandin Synthesis and Regulation 1.60 1.2e−02 Oxidative Damage 1.57 1.7e−02 Interferon alpha−beta signaling 1.53 3.9e−02 G1 to S cell cycle control 1.50 2.0e−02 Preimplantation Embryo 1.50 2.7e−02 IL17 signaling pathway 1.47 4.7e−02 BDNF−TrkB Signaling 1.46 4.4e−02 DNA Damage Response 1.42 3.3e−02 Cell Cycle 1.34 3.9e−02 alkylation leading to liver fibrosis −1.42 4.5e−02 Peptide GPCRs −1.42 4.2e−02 Tryptophan metabolism −1.45 4.0e−02 regulation in adipogenesis −1.57 2.8e−02 0 4000 8000 12000 16000 (a) Glioma Pathway Gene ranks NES pval African trypanosomiasis,KEGG 1.71 6.4e−03 ,KEGG 1.61 1.1e−02 Cell cycle,KEGG 1.52 6.4e−03 beta−Alanine metabolism,KEGG 1.50 3.8e−02 Pathogenic Escherichia coli infection,KEGG 1.48 2.8e−02 Mineral absorption,KEGG 1.45 4.0e−02 Proteoglycans in cancer,KEGG 1.35 1.8e−02 Protein processing in ,KEGG 1.34 2.9e−02 Hepatocellular carcinoma,KEGG 1.30 4.0e−02 cAMP signaling pathway,KEGG −1.32 3.1e−02 Toxoplasmosis,KEGG −1.36 3.6e−02 FoxO signaling pathway,KEGG −1.39 2.2e−02 Autophagy,KEGG −1.48 9.7e−03 Fc epsilon RI signaling pathway,KEGG −1.54 1.4e−02 B cell receptor signaling pathway,KEGG −1.56 9.6e−03 Autophagy,KEGG −1.48 9.7e−03 NAD+ biosynthetic pathways 1.68 1.4e−02 Cell Cycle 1.56 4.0e−03 Cytokines and Inflammatory Response 1.52 4.4e−02 Photodynamic therapy−induced NF−kB survival signaling 1.50 3.5e−02 Tumor suppressor activity of SMARCB1 1.50 3.7e−02 Pathogenic Escherichia coli infection 1.48 2.8e−02 Interleukin−4 and Interleukin−13 signaling 1.44 2.1e−02 Proteasome Degradation 1.44 3.1e−02 Parkin− Proteasomal System pathway 1.37 5.0e−02 Structural Pathway of Interleukin 1 (IL−1) −1.42 4.8e−02 MicroRNAs in cardiomyocyte hypertrophy −1.49 1.6e−02 Hematopoietic Stem Cell Gene Regulation by GABP alpha−beta Complex −1.51 4.4e−02 miRNA regulation of prostate cancer signaling pathways −1.53 3.1e−02 IL−1 signaling pathway −1.54 1.4e−02 Differentiation of white and brown adipocyte −1.56 2.9e−02 Type II diabetes mellitus −1.67 1.6e−02 Fatty Acid Biosynthesis −1.84 3.0e−03 0 4000 8000 12000 16000 (b) Head and neck cancer Pathway Gene ranks NES pval Pentose phosphate pathway,KEGG 1.99 5.6e−04 Chronic myeloid leukemia,KEGG 1.73 1.3e−03 Glioma,KEGG 1.70 2.0e−03 Bladder cancer,KEGG 1.68 6.3e−03 Melanoma,KEGG 1.68 3.2e−03 Tuberculosis,KEGG 1.52 2.9e−03 Phospholipase D signaling pathway,KEGG 1.50 5.6e−03 MicroRNAs in cancer,KEGG 1.45 9.7e−03 Type I diabetes mellitus,KEGG −1.67 7.7e−03 Mammary gland development pathway − Pregnancy and lactation (Stage 3 of 4) 1.97 5.6e−04 Retinoblastoma Gene in Cancer 1.89 4.0e−05 Non−genomic actions of 1,25 dihydroxyvitamin D3 1.88 3.2e−04 LTF danger signal response pathway 1.75 8.9e−03 Fluoropyrimidine Activity 1.71 7.2e−03 Signaling of Hepatocyte Growth Factor Receptor 1.67 8.3e−03 Non−small cell lung cancer 1.58 9.0e−03 Nanoparticle triggered autophagic cell death −1.83 3.7e−03 0 4000 8000 12000 16000 (c) Lung cancer Figure 5: Gene set enrichment analysis of the ranked gene list in (a) Glioma (b) Head and neck cancer (c) Lung cancer. The gene set databases used in this analysis included Kyoto Encyclopedia of Genes and Genomes (KEGG) and WikiPathways. Pval, enrichment p-value; NES, normalized enrichment score. In (a) and (b) the enrich pathways with p value below 0.05 were displayed. In (c) the enrich pathways with p value below 0.01 were displayed. Detailed information can be found in supplementary tables S2, S4 and S6.

analysis (Figure 5C). Other enriched pathways include Notch signaling, interleukin signaling, ErbB signaling, and signaling pathways regulating pluripotency of stem cells.

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4 Discussion

Previous studies have shown that when analyzing high-dimensional genomic data, deep learning survival models can achieve comparable or superior performance compared to other methods (e.g., Cox elastic net regression, random survival forests) (Kim et al., 2019). However, the performance of deep learning is often limited by the relatively small amount of available data (Yousefi et al., 2017). To address this issue, our work investigates different deep learning paradigms to improve the performance of deep survival models, especially in the setting of small size training data. In previous studies the most common way to build deep survival models is to train neural networks with a large number of target task training samples from scratch, a process we call direct training. Direct training with a large sample size (e.g. n = 250) can thus be considered as a baseline. As expected, the performance of direct training drops when the number of training samples decreases (e.g. from n = 150 to n = 20). On the other hand, combined learning, regular pre-training, and meta-learning all leverage additional data from other sources, thereby enabling them to achieve better performances when the training sample size is small. We use a small number of target cancer site training samples (e.g. n = 20) with these methods and investigate their performance. It is important to note that combined learning, regular pre-training and meta-learning are exposed to exactly the same information, but differ only in their algorithms. Combined learning is a one-stage learning process, whereas pre-training and meta-learning are two-stage learning methods. Meta- learning shows better predictive performance than combined or regular pre-training, indicating that it is able to adapt to a new task more effectively due to the improved optimization algorithm targeting the few-sample training environment. It has been shown that methods which use only target task data (direct learning with different size samples) and methods which use additional information (combined, pre-training and meta-learning) perform differently, and one type of approach may be better than the other on different cancer sites. For example, on glioma, direct learning tends to do better overall; whereas on lung cancer, the other methods outperform direct learning. This may be due to that fact that the amount of information that can be learnt from related data versus from the target data is different for each cancer site. If there is significant information within the target cancer samples alone, then direct training will be more effective than learning from other cancer samples. On all three cancer sites we observe that meta-learning achieves similar or better performance than medium-size direct training, and outperforms large-size direct training in some cases. However, the advantage of meta-learning may not generalize to every cancer site. Certain cancers may have very unique characteristics so that transfer of information from other cancers may not help in prediction regardless of improved adaptivity. For the three cancer sites, the affinity of each target cancer to other types of cancers in the pan-cancer data aids the performance of meta-learning, which efficiently transfers the information from other cancers to the target cancers. On the other hand, some cancers are more dissimilar from other cancer sites which makes information transfer difficult. For example, for another cancer site, kidney cancer, specifically kidney renal clear cell carcinoma (KIRC) and kidney renal papillary cell carcinoma (KIRP), both meta-learning and pre-training do not produce good survival prediction. This can be visualized in Figure S1, comparing the affinity between different target cancers with the rest of the cancers on a t-distributed stochastic neighbor embedding (t-SNE) graph. Therefore, in order for meta-learning to achieve good performance, the related tasks training data need to contain a reasonable amount of transferable information to the target task. The performance of meta-learning can be explained by the learned learning algorithm at the meta- learning stage where the model learns from related tasks. We further investigate how to optimize meta-learning performance. We examine results from two sampling approaches when forming one task, where we either draw samples only from one cancer, or draw samples from multiple types of cancer. It is a more natural choice to consider each cancer type as a separate task, but we found that the latter leads to improved performance. To explain this improvement, we examine the gradient of the meta-learning loss function. It can be shown that the gradient of the loss function contains a term that encourages the gradients from different minibatches for a given task to align in the same direction (Appendix A). If the two minibatches contain samples from the same type of cancer, their gradient might already be very similar and thus this higher order term would not have a large effect. On the other hand, if the second minibatch contains samples from a different type of cancer than the first, the algorithm will learn something that is common to both of them and thereby help to improve generalization.

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

The gene set enrichment analysis results validate our model for prioritizing the genes for survival predication. In the three cancer types investigated, the resulting gene lists are enriched in key pathways in cancer including cell cycle regulation, DNA damage response, cell death, interleukin signaling, and NOTCH signaling pathway, etc. Apart from the well-recognized cancer pathways, our results also reveal potential players affecting cancer development and prognosis, that are not well-studied yet. Viruses have been linked to the carcinogenesis of several cancers, including human papilloma virus in cervical cancer, hepatitis B and C viruses in liver cancer, and Epstein-Barr virus in several lymphomas and nasopharyngeal carcinoma (Martin and Gutkind, 2008). Our results further suggest that viruses might also play a role in glioma and lung cancer, where the high-risk genes are enriched in several viral carcinogenesis pathways. In gliomas, the enriched pathways that are unfavorable for survival include Epstein-Barr virus and herpes simplex infection. In lung cancer, both hepatitis B and C virus infection pathways are enriched. This suggests that that hepatotropic viruses may affect the respiratory system, including the association with lung cancer. For example, hepatitis B virus infection has been associated with poor prognosis in patients with advanced non-small cell lung cancer (Peng et al., 2015). The role of Epstein-Barr virus in gliomagenesis have also been studied but the results remain inconclusive (Akhtar et al., 2018). Whether these viruses do play a role in carcinogenesis and further affect cancer prognosis, or the association we observed reflects an abnormal immune system that is unfavorable for the survival of cancer patients remains to be investigated. As for the enriched pathways that are favorable for cancer survival, we identified pathways related to metabolism, in particular, lipid metabolism, in all the three cancer types investigated. In glioma, the top enriched pathway favorable for cancer survival is adipogenesis regulation. In head and neck cancer, differentiation of adopocyte and fatty acid biosynthesis are top enriched favorable pathways. In lung cancer, ferroptosis and AMPK signaling pathway are both related to energy metabolism. Ferroptosis is a process driven by accumulated iron-dependent lipid ROS that leads to cell death. Small molecules-induced ferroptosis has a strong inhibition of tumor growth and enhances the sensitivity of chemotherapeutic drugs, especially in drug resistance (Lu et al., 2018). AMPK plays a central role in the control of cell growth, proliferation and autophagy through the regulation of mTOR activity and lipid metabolism (Chen et al., 2017; Han et al., 2013). The link between cancer and metabolism is worth investigating in future studies.

5 Conclusion

In survival analyses one problem that researchers have encountered is the insufficient amount of training samples for machine learning algorithms to achieve good performances. We address this problem by adapting a meta-learning approach which learns effectively with only a small number of target task training samples. We show that the meta-learning framework is able to achieve similar performance as learning from a significantly larger number of samples by using an efficient knowledge transfer. Moreover, in the context of limited training sample exposure, we demonstrate that this framework achieves superior predictive performance over both regular pre-training and combined learning methods on two types of target cancer sites. Finally, we show that meta-learning models are interpretable and can be used to investigate biological phenomena associated with cancer survival outcome. The problem of small data size may be a limiting factor in many biomedical analyses, especially when studies are conducted with data that is expensive to produce, or in the case of multi-modal data (Cheerla and Gevaert, 2019). Our work shows the promise of meta-learning for biomedical applications to alleviate the problem of limited data. In future work, we intend to extend this approach to analysis with medical imaging data, such as histopathology data and radiology data, for building predictive models on multi-modal data with limited sets of patients.

6 Acknowledgements

AD acknowledges funding from the European Union’s Horizon 2020 research and innovation pro- gramme under the Marie Skłodowska-Curie grant agreement No. 754354.

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

References Saghir Akhtar, Semir Vranic, Farhan Sachal Cyprian, and Ala-Eddin Al Moustafa. Epstein–barr virus in gliomas: cause, association, or artifact? Frontiers in oncology, 8:123, 2018. K Brennan, JL Koenig, AJ Gentles, JB Sunwoo, and O Gevaert. Identification of an atypical etiological head and neck squamous carcinoma subtype featuring the cpg island methylator phenotype. EBioMedicine, 17:223–236, 2017. Glenn W Brier. Verification of forecasts expressed in terms of probability. Monthly weather review, 78(1):1–3, 1950. Michele Ceccarelli, Floris P Barthel, Tathiane M Malta, Thais S Sabedot, Sofie R Salama, Bradley A Murray, Olena Morozova, Yulia Newton, Amie Radenbaugh, Stefano M Pagnotta, et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell, 164(3):550–563, 2016. Kumardeep Chaudhary, Olivier B Poirion, Liangqun Lu, and Lana X Garmire. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research, 24(6): 1248–1259, 2018. Anika Cheerla and Olivier Gevaert. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics, 35(14):i446–i454, 2019. Xi Chen, Chun Xie, Xing-Xing Fan, Ze-Bo Jiang, Vincent Kam-Wai Wong, Jia-Hui Xu, Xiao-Jun Yao, Liang Liu, and Elaine Lai-Han Leung. Novel direct ampk activator suppresses non-small cell lung cancer through inhibition of lipid metabolism. Oncotarget, 8(56):96089, 2017. Travers Ching, Xun Zhu, and Lana X Garmire. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS computational biology, 14(4):e1006076, 2018. David R Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972. David Croft, Antonio Fabregat Mundo, Robin Haw, Marija Milacic, Joel Weiser, Guanming Wu, Michael Caudy, Phani Garapati, Marc Gillespie, Maulik R Kamdar, et al. The reactome pathway knowledgebase. Nucleic acids research, 42(D1):D472–D477, 2014. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. Arnout Devos and Matthias Grossglauser. Subspace networks for few-shot classification. arXiv preprint arXiv:1905.13613, 2019. Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. Rl2: Fast reinforcement learning via slow reinforcement learning. arxiv, 2016. URL https://arxiv. org/abs/1611.02779, 2016. Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1126–1135. JMLR. org, 2017. Jelle J Goeman. L1 penalized estimation in the cox proportional hazards model. Biometrical journal, 52(1):70–84, 2010. Dong Han, Shao-Jun Li, Yan-Ting Zhu, Lu Liu, and Man-Xiang Li. Lkb1/ampk/ signaling pathway in non-small-cell lung cancer. Asian Pacific Journal of Cancer Prevention, 14(7):4033– 4039, 2013. Frank E Harrell, Robert M Califf, David B Pryor, Kerry L Lee, and Robert A Rosati. Evaluating the yield of medical tests. Jama, 247(18):2543–2546, 1982.

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Siew Wan Hee, Adrian Willis, Catrin Tudur Smith, Simon Day, Frank Miller, Jason Madan, Martin Posch, Sarah Zohar, and Nigel Stallard. Does the low prevalence affect the sample size of interventional clinical trials of rare diseases? an analysis of data from the aggregate analysis of clinicaltrials. gov. Orphanet journal of rare diseases, 12(1):44, 2017. Roy S Herbst and Scott M Lippman. Molecular signatures of lung cancer–toward personalized therapy. New England Journal of Medicine, 356(1):76–78, 2007. David W Hosmer, Stanley Lemeshow, and S May. Applied survival analysis: regression modeling of time to event data (wiley series in probability and statistics), 2008. Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1):27–30, 2000. Dong Wook Kim, Sanghoon Lee, Sunmo Kwon, Woong Nam, In-Ho Cha, and Hyung Jun Kim. Deep learning-based survival prediction of oral cancer patients. Scientific reports, 9(1):1–10, 2019. Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. John P Klein and Melvin L Moeschberger. Survival analysis: techniques for censored and truncated data. Springer Science & Business Media, 2006. David G Kleinbaum and Mitchel Klein. The cox proportional hazards model and its characteristics. In Survival analysis, pages 97–159. Springer, 2012. Yan Li, Lu Wang, Jie Wang, Jieping Ye, and Chandan K Reddy. Transfer learning for survival analysis via efficient l2, 1-norm regularized cox regression. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 231–240. IEEE, 2016. David N Louis, Arie Perry, Guido Reifenberger, Andreas Von Deimling, Dominique Figarella- Branger, Webster K Cavenee, Hiroko Ohgaki, Otmar D Wiestler, Paul Kleihues, and David W Ellison. The 2016 world health organization classification of tumors of the central nervous system: a summary. Acta neuropathologica, 131(6):803–820, 2016. Bin Lu, Xiao Bing Chen, Mei Dan Ying, Qiao Jun He, Ji Cao, and Bo Yang. The role of ferroptosis in cancer development and treatment response. Frontiers in pharmacology, 8:992, 2018. Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, and Yoshua Bengio. Deep learning for patient-specific kidney graft survival analysis. arXiv preprint arXiv:1705.10245, 2017. Alberto Mantovani, Isabella Barajon, and Cecilia Garlanda. Il-1 and il-1 regulatory pathways in cancer progression and therapy. Immunological reviews, 281(1):57–61, 2018. D Martin and JS Gutkind. Human tumor-associated viruses and new insights into the molecular mechanisms of cancer. Oncogene, 27(2):S31–S42, 2008. Pooya Mobadersany, Safoora Yousefi, Mohamed Amgad, David A Gutman, Jill S Barnholtz-Sloan, José E Velázquez Vega, Daniel J Brat, and Lee AD Cooper. Predicting cancer outcomes from histology and genomics using convolutional networks. Proceedings of the National Academy of Sciences, 115(13):E2970–E2979, 2018. Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010. Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018. Mee Young Park and Trevor Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659–677, 2007. Sang Tae Park and Jayoung Kim. Trends in next-generation sequencing and a new era for whole genome sequencing. International neurourology journal, 20(Suppl 2):S76, 2016.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Jie-Wen Peng, Dong-Ying Liu, Gui-Nan Lin, JJ Xiao, and Zhong-Jun Xia. Hepatitis b virus infection is associated with poor prognosis in patients with advanced non small cell lung cancer. Asian Pac J Cancer Prev, 16(13):5285–5288, 2015. Michael Platten, Wolfgang Wick, and Benoît J Van den Eynde. Tryptophan catabolism in cancer: beyond ido and tryptophan depletion. Cancer research, 72(21):5435–5440, 2012. Michael Platten, Ellen AA Nollen, Ute F Röhrig, Francesca Fallarino, and Christiane A Opitz. Tryptophan metabolism as a common therapeutic target in cancer, neurodegeneration and beyond. Reviews Drug Discovery, 18(5):379–401, 2019. Lorien Y Pratt. Discriminability-based transfer between neural networks. In Advances in neural information processing systems, pages 204–211, 1993. Alexey Sergushichev. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. BioRxiv, page 060012, 2016. Denise N Slenter, Martina Kutmon, Kristina Hanspers, Anders Riutta, Jacob Windsor, Nuno Nunes, Jonathan Mélius, Elisa Cirillo, Susan L Coort, Daniela Digles, et al. Wikipathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic acids research, 46(D1): D661–D667, 2018. Katarzyna Tomczak, Patrycja Czerwinska,´ and Maciej Wiznerowicz. The cancer genome atlas (tcga): an immeasurable source of knowledge. Contemporary oncology, 19(1A):A68, 2015. Luca Tonella, Marco Giannoccaro, Salvatore Alfieri, Silvana Canevari, and Loris De Cecco. signatures for head and neck cancer patient stratification: are results ready for clinical application? Current treatment options in oncology, 18(5):32, 2017. Ricardo Vilalta and Youssef Drissi. A perspective view and survey of meta-learning. Artificial intelligence review, 18(2):77–95, 2002. Kin Yau Wong, Cheng Fan, Maki Tanioka, Joel S Parker, Andrew B Nobel, Donglin Zeng, Dan-Yu Lin, and Charles M Perou. An integrative boosting approach for predicting survival time with multiple genomics platforms. bioRxiv, page 338145, 2018. Chen-Yi Wu, Hsiao-Yun Hu, Cheng-Yun Pu, Nicole Huang, Hsi-Che Shen, Chung-Pin Li, and Yiing-Jeng Chou. Pulmonary tuberculosis increases the risk of lung cancer: a population-based cohort study. Cancer, 117(3):618–624, 2011. Safoora Yousefi, Fatemeh Amrollahi, Mohamed Amgad, Chengliang Dong, Joshua E Lewis, Con- gzheng Song, David A Gutman, Sameer H Halani, Jose Enrique Velazquez Vega, Daniel J Brat, et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Scientific reports, 7(1):1–11, 2017. Yang-Hao Yu, Chien-Chang Liao, Wu-Huei Hsu, Hung-Jen Chen, Wei-Chih Liao, Chih-Hsin Muo, Fung-Chang Sung, and Chih-Yi Chen. Increased lung cancer risk among patients with pulmonary tuberculosis: a population cohort study. Journal of Thoracic Oncology, 6(1):32–37, 2011.

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Appendix A

We compute the gradient of the meta-learning loss function. Suppose that, for each task Tτ , the 2 inner-learner takes 2 steps of stochastic gradient descent and updates the parameters to θτ . From Equation (3), we can write the gradient using a Taylor expansion:

θ0 − θ2 g = τ τ = L0 θ0 + L0 θ0 − αL00 θ0 L0 θ0 + O α2 (5) α τ,0 τ τ,1 τ τ,1 τ τ,0 τ

Lτ,0 is the loss computed on the first minibatch sampled from task τ, and Lτ,1 is the loss computed on the second minibatch sampled from task τ. The expectation of the first two terms in Equation (5) corresponds to the gradient of expected loss, and the expectation of the third term can be written as: 1  ∂  [L00(θ)L0 (θ)] = (L0 (θ) ·L0 (θ)) . (6) Eτ,0,1 1 0 2Eτ,0,1 ∂θ 1 0 This term increases the inner product of the gradient of the first minibatch and the gradient of the second minibatch, which means it encourages the gradients from different minibatches for a given task to align in the same direction.

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary information

40 40

ident ident

gbm luad 0 0 y y lgg lusc others others

−40 −40

−80 −40 0 40 −80 −40 0 40 x x (a) LGG/GBM (b) LUAD/LUSC

80

40

40

ident ident kirc 0 y hnsc y kirp 0 others others

−40 −40

−40 0 40 80 −80 −40 0 40 x x (c) HNSC (d) KIRC/KIRP Figure S1: Mapping of 33 types of cancers’ gene expression data with a t-distributed stochastic neigh- bor embedding (t-SNE), highlighting (S1a) LGG/GBM versus the rest of cancers; (S1b) LUAD/LUSC versus the rest of cancers; (S1c) HNSC versus the rest of cancers; (S1d) KIRC/KIRP versus the rest of cancers. Renal cell carcinomas (i.e. KIRC and KIRP) are farther apart from other types of cancers, and also more heterogeneous within, thus information transfer from other cancers is considered more difficult.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S1: Glioma gene set over-representation

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping

Protein export - Homo sapiens (human) 23 0.018223195 1 0.003460208 SPCS1; SEC62; SRPRB; IMMP1L; SEC61B; 6 lowriskgenes KEGG SEC11C Sulfur relay system - Homo sapiens (human) 8 0.038146719 1 0.001741149 URM1; MPST; TST 3 lowriskgenes KEGG Sulfur metabolism - Homo sapiens (human) 9 0.038146719 1 0.001741149 BPNT1; MPST; TST 3 lowriskgenes KEGG Legionellosis - Homo sapiens (human) 55 0.039741108 1 0.005675369 IL12B; HSPA1B; HSPA1A; IL6; ARF1; 10 lowriskgenes KEGG HSPD1; NAIP; CXCL3; CASP8; CLK1 RNA transport - Homo sapiens (human) 171 0.040939816 1 0.011924119 AAAS; NCBP1; SUMO3; EIF3H; EIF3A; 22 lowriskgenes KEGG NUP210; RPP40; ACIN1; FXR2; THOC2; THOC6; XPO5; NUP133; EIF2B4; RBM8A; EIF1; RGPD1; GEMIN5; SMN1; FMR1; NUP160; EIF4E2 Tryptophan metabolism - Homo sapiens (hu- 40 0.041726265 1 0.004013761 IDO1; INMT; ALDH3A2; GCDH; OGDH; 7 lowriskgenes KEGG man) AADAT; ALDH9A1 Carbohydrate digestion and absorption - Homo 44 0.048176123 1 0.004011461 SLC37A4; PIK3CA; CACNA1D; PRKCB; 7 lowriskgenes KEGG sapiens (human) HK1; ATP1B2; ATP1B1 NOSTRIN mediated eNOS trafficking 6 0.008571329 1 0.001744186 NOSTRIN; WASL; CAV1 3 lowriskgenes REACTOME Disinhibition of SNARE formation 5 0.008571329 1 0.001744186 STXBP3; PRKCB; STX4 3 lowriskgenes REACTOME RUNX1 regulates transcription of genes in- 6 0.008571329 1 0.001744186 PRKCB; CBFB; CREBBP 3 lowriskgenes REACTOME volved in differentiation of myeloid cells cycle for steroid hormone re- 19 0.008593 1 0.003466205 NR3C2; NR3C1; HSPA1A; HSP90AA1; 6 lowriskgenes REACTOME ceptors (SHR) HSPA1B; FKBP4 Dectin-2 family 29 0.008593 1 0.003466205 CLEC10A; MUC20; MUC5B; FCER1G; 6 lowriskgenes REACTOME MUC4; SYK PAOs oxidise polyamines to amines 2 0.010013404 1 0.001164144 PAOX; SMOX 2 lowriskgenes REACTOME TWIK-releated acid-sensitive K+ channel 2 0.010013404 1 0.001164144 KCNK9; KCNK3 2 lowriskgenes REACTOME (TASK) Attenuation phase 28 0.011863238 1 0.004029937 FKBP4; HSPA1B; HSPA1A; HSBP1; 7 lowriskgenes REACTOME HSP90AA1; SERPINH1; CREBBP Crosslinking of collagen fibrils 10 0.012809503 1 0.002320186 TLL1; LOXL3; PCOLCE; PXDN 4 lowriskgenes REACTOME Lysine catabolism 12 0.012809503 1 0.002320186 AADAT; PIPOX; OGDH; GCDH 4 lowriskgenes REACTOME Pyrimidine salvage 11 0.012809503 1 0.002320186 UPP1; UCK1; TK2; DCK 4 lowriskgenes REACTOME RUNX1 regulates mediated 6 0.015871668 1 0.001743173 GPAM; KCTD6; CBFB 3 lowriskgenes REACTOME transcription HSF1 activation 31 0.021616644 1 0.004022989 FKBP4; HSPA1B; HSPA1A; HSBP1; RPA3; 7 lowriskgenes REACTOME HSP90AA1; SERPINH1 Metabolism of polyamines 40 0.023498306 1 0.004581901 GAMT; PAOX; CKMT1A; SMOX; CKM; 8 lowriskgenes REACTOME AMD1; OAZ2; OAZ3 Activation of RAC1 14 0.025669753 1 0.002317497 NCK2; ROBO1; SOS2; PAK4 4 lowriskgenes REACTOME Interconversion of polyamines 3 0.028037765 1 0.001163467 PAOX; SMOX 2 lowriskgenes REACTOME Synthesis of glycosylphosphatidylinositol 18 0.028222393 1 0.002888504 PIGB; PIGH; PIGO; PIGW; DPM2 5 lowriskgenes REACTOME (GPI) Metabolism of amino acids and derivatives 339 0.030727025 1 0.020263425 GCAT; OGDH; PIPOX; CKM; NAALAD2; 40 lowriskgenes REACTOME PDHX; HAL; SERINC5; RPL11; RPL18; AMD1; RPL4; SERINC1; IYD; TXNRD1; DDO; OAZ2; IVD; GAMT; INMT; PAOX; RPL35A; IDO1; MPST; RPL7; QDPR; ALDH9A1; ACADSB; RPSA; GLUD1; RPS15; CKMT1A; TST; GCDH; RPS13; SARDH; AADAT; SMOX; OAZ3; SECISBP2 Cohesin Loading onto 10 0.038146719 1 0.001741149 RAD21; PDS5A; PDS5B 3 lowriskgenes REACTOME HSF1-dependent transactivation 36 0.048176123 1 0.004011461 FKBP4; HSPA1B; HSPA1A; HSBP1; 7 lowriskgenes REACTOME HSP90AA1; SERPINH1; CREBBP Model for regulation of MSMP expression in 3 0.028037765 1 0.001163467 CCR2; CTCF 2 lowriskgenes Wikipathways cancer cells and its proangiogenic role in ovar- ian tumors Viral carcinogenesis - Homo sapiens (human) 201 0.00190752 0.612313978 0.017094017 IL6ST; IKBKG; GSN; DLG1; STAT3; 32 highriskgenes KEGG PIK3CB; PIK3CD; ATP6V0D2; CREB1; KRAS; HLA-C; GTF2E2; EGR3; CCNE1; HIST1H2BL; HIST1H2BH; TBPL1; TBP; CDC20; HDAC5; HDAC6; CDKN1B; C3; RANBP1; MRPS18B; KAT2B; EP300; BAD; HIST2H2BE; YWHAB; TRAF3; ATF6B Herpes simplex infection - Homo sapiens (hu- 185 0.007380866 1 0.013579576 IKBKG; EP300; OAS2; CSNK2A1; IFNAR2; 25 highriskgenes KEGG man) SOCS3; RNASEL; IFIT1; GTF2IRD1; TN- FRSF1A; TBP; TAF10; MAPK10; STAT2; SKP1; OAS3; HLA-C; C3; TBPL1; CYCS; IL15; HLA-DOA; TRAF3; POU2F3; CFP Epstein-Barr virus infection - Homo sapiens 200 0.013975163 1 0.014973262 IFNAR2; IKBKG; GADD45A; PIK3CD; 28 highriskgenes KEGG (human) OAS3; OAS2; ; IRAK4; HLA-C; CDKN1B; CCNE1; PSMD12; PIK3CB; MAPK14; MAPK10; NCOR2; STAT3; STAT2; CR2; CYCS; HLA-DOA; PSMD8; CD58; PSMD2; BID; PSMC3; TRAF3; MAP3K14 Necroptosis - Homo sapiens (human) 162 0.018658053 1 0.012008734 CHMP2A; CAPN1; CAMK2A; IFNAR2; 22 highriskgenes KEGG MAPK10; FTH1; TNFRSF1A; TNFRSF10B; TNFRSF10A; STAT3; STAT2; HIST2H2AA3; CHMP4B; VPS4B; GLUD2; PPIA; HIST1H2AL; HIST1H2AG; HIST1H2AC; BID; PLA2G4A; CASP1

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S1: Glioma gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping Circadian rhythm - Homo sapiens (human) 30 0.021491434 1 0.004027618 CRY1; PRKAA1; NR1D1; SKP1; BHLHE40; 7 highriskgenes KEGG CREB1; FBXL3 RNA degradation - Homo sapiens (human) 79 0.025072681 1 0.007323944 ZCCHC7; DCP2; PARN; PABPC4; EX- 13 highriskgenes KEGG OSC3; EXOSC7; EXOSC10; TOB1; PABPC1; PATL1; PAPD7; PAPD5; PNLDC1 Tuberculosis - Homo sapiens (human) 178 0.026606605 1 0.012965964 LSP1; CALM3; IRAK2; IRAK4; CAMK2A; 24 highriskgenes KEGG CREB1; TLR1; IL10RB; CARD9; RFXAP; TNFRSF1A; MAPK14; MAPK10; C3; ATP6V0D2; EP300; HLA-DOA; TGFB2; FCGR3B; BAD; CYCS; CEBPG; BID; FCGR2B Human T-cell leukemia virus 1 infection - 219 0.028932505 1 0.015822785 IKBKG; DLG1; RAN; MAP3K3; PIK3CB; 30 highriskgenes KEGG Homo sapiens (human) PIK3CD; MYC; KRAS; ADCY7; MAPK10; HLA-C; CRTC1; CCNE1; CHEK2; TBPL1; TNFRSF1A; TBP; CDC20; FOSL1; RANBP1; KAT2B; EP300; IL2RB; IL15; HLA-DOA; TGFB2; CREB1; CDKN2C; MAP3K14; ATF6B Cell cycle - Homo sapiens (human) 124 0.045580947 1 0.009911894 GADD45A; ; CDKN2C; CDKN1B; 18 highriskgenes KEGG CCNE1; CHEK2; CDC25B; STAG1; CDC14B; CDC14A; CDC20; ; SKP1; EP300; MYC; TGFB2; YWHAB; MCM5 Cytokine Signaling in Immune system 458 0.00015883 0.299552844 0.031476998 IL6ST; IFNAR2; PIK3CB; PIK3CD; IRAK2; 65 highriskgenes REACTOME IRAK4; CAMK2A; SNAP25; TNFSF13; TN- FSF15; IFIT1; HCK; SIGIRR; FLNB; PELI2; CISH; BRWD1; IL2RB; HLA-C; MAP3K14; IFITM3; IFITM1; CREB1; TNFRSF25; UBA52; IL10RB; TRIM8; MEF2A; SKP1; TNFSF9; TNFRSF6B; TNFSF4; HIST1H3H; OSM; IL27; MAP3K8; MAP3K3; TRIM29; INPP5D; IKBKG; TNFRSF18; GBP2; GBP7; PTPN2; PTPN6; STAT3; STAT2; TNFRSF1A; IL15; IL1RAP; CASP1; S100A12; OAS3; OAS2; RNASEL; SOCS3; EIF4G1; IL20RB; UBE2L6; TRIM46; MAPK14; MAPK10; PSMB8; ADAM17; TRAF3 Signaling by NOTCH 119 0.001166907 1 0.012195122 MOV10; FURIN; DLGAP5; MYC; CREB1; 22 highriskgenes REACTOME HDAC5; UBA52; YBX1; PRKCI; ARRB2; KAT2B; NCSTN; HDAC6; TLE1; NCOR2; POFUT1; SKP1; LFNG; EP300; ADAM17; ST3GAL4; PBX1 Regulation of mRNA stability by that 37 0.002709399 1 0.005737235 TNFSF13; PABPC1; MAPK14; ANP32A; EX- 10 highriskgenes REACTOME bind AU-rich elements OSC3; EXOSC7; PARN; YWHAB; DCP2; EIF4G1 Interferon alpha/beta signaling 70 0.002974208 1 0.007390563 HLA-C; IFNAR2; IFIT1; STAT2; OAS3; 13 highriskgenes REACTOME OAS2; IFITM3; IFITM1; PTPN6; SOCS3; RNASEL; PSMB8; GBP2 Immune System 1827 0.003342285 1 0.059848734 AGL; UBE2Q2; CD226; PIK3CB; PIK3CD; 182 highriskgenes REACTOME MUC1; ACLY; FBXL12; FBXL19; PSMD12; SIGIRR; PELI2; ACTR10; CD93; CFP; ASB3; CNPY3; IL10RB; TRIM8; MEF2A; SH3RF1; PRDX6; WIPF3; WIPF1; EP300; ATP6V0D2; CUL2; TRIM29; ARHGAP9; LAIR2; C4BPA; PTPN2; PTPN6; PTPN4; C3; FBXO41; SLPI; FCGR3B; MAPKAP1; IL1RAP; SEC61A1; UBR2; KRAS; SOCS3; ARPC1A; EIF4G1; IL20RB; ARPC2; RNF123; MAPK14; MAPK10; DAPP1; P2RX7; TRAF3; IL6ST; COMMD9; CAMK2A; CISH; PSMC3; SEC61G; MAP3K14; IKBKG; TMEM173; GNLY; VAT1; UBA52; NCSTN; SKP1; LCN2; HIST1H3H; MAP3K8; MAP3K3; INPP5D; CARD9; PSMB8; UBA6; UBE2C; IL15; ATP11A; RNASEL; FBXL3; TLR1; FAF2; CDC20; DTX3L; ADAM17; MUC16; RASGRP3; TANK; HCK; GZMM; SPSB2; BRWD1; C5AR1; CDH1; IFITM3; IFITM1; RNF19A; UNC93B1; VNN1; UBE2E3; TN- FRSF6B; YWHAB; AIM2; VAMP8; ABI1; IL27; PECAM1; PLD1; RNF217; RNF213; DOCK2; SMURF1; FTH1; PPIA; WASF2; CASP4; CASP1; HECTD3; S100A12; OAS3; OAS2; CAPN1; PI3; RNF220; TRIM46; LRMP; HLA-C; UBE2G1; IFNAR2; IRAK2; OSM; IRAK4; IL2RB; SNAP25; TNFSF13; TNFSF15; IFIT1; NFASC; FLNB; GGH; TN- FAIP6; PSMD2; ALDH3B1; NOS3; CREB1; PRCP; TNFRSF25; C1QA; ZBTB16; TN- FSF9; TNFSF4; TRIM69; RNF34; XRCC5; CLEC4G; CLEC4A; TNFRSF18; GBP2; GBP7; HERC1; HERC6; CR2; TNFRSF1A; STAT3; STAT2; CD68; HLA-DOA; CHIT1; LAT2; GSN; GOLGA7; APEH; UBE2L6; PTPRN2; FCGR2B; CD58; DOK3; SLC11A1; SIGLEC9; SIGLEC7; FOLR3

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S1: Glioma gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping TNFs bind their physiological receptors 29 0.004906229 1 0.004608295 TNFSF13; TNFRSF25; TNFSF15; TN- 8 highriskgenes REACTOME FRSF18; TNFSF9; TNFRSF6B; TNFSF4; TNFRSF1A Voltage gated Potassium channels 44 0.005076968 1 0.005169443 KCNG2; KCNA2; KCNS1; KCND2; KCNQ3; 9 highriskgenes REACTOME KCNQ1; KCNF1; KCNH7; KCNH2 Collagen degradation 35 0.006191992 1 0.004605642 MMP15; MMP10; ADAM17; FURIN; 8 highriskgenes REACTOME COL18A1; COL9A2; COL16A1; COL17A1 TNFR2 non-canonical NF-kB pathway 50 0.007903028 1 0.00627138 TNFSF13; TNFRSF25; TNFSF15; TN- 11 highriskgenes REACTOME FRSF18; SKP1; TNFSF9; TNFSF4; TRAF3; TNFRSF1A; TNFRSF6B; MAP3K14 HATs acetylate 142 0.00981636 1 0.011043622 DR1; YEATS2; YEATS4; KAT2B; EP400; 20 highriskgenes REACTOME TADA2A; HIST1H2BL; HIST1H2BH; PHF20; TAF10; ATXN7L3; EP300; HIST2H2AA3; HIST2H2BE; HIST1H2AL; HIST1H2AG; HIST1H2AC; HIST1H3H; MEAF6; RUVBL2 Synthesis of PA 40 0.011593687 1 0.004597701 LPCAT4; LIPH; PLA2G2D; PLA2G4A; PLD6; 8 highriskgenes REACTOME PLD1; AGPAT4; GPD1L SCF-beta-TrCP mediated degradation of Emi1 10 0.012757504 1 0.00232288 SKP1; CDC20; UBA52; FBXO5 4 highriskgenes REACTOME LGI-ADAM interactions 14 0.012757504 1 0.00232288 STX1B; ADAM23; ADAM11; ADAM22 4 highriskgenes REACTOME UCH proteinases 102 0.013842122 1 0.009470752 UBA52; PSMD12; ACTR5; INO80C; PSMA1; 17 highriskgenes REACTOME PSMA4; HIST2H2AA3; PSMD2; PSMB8; PSMB6; PSMD8; HIST1H2AL; HIST1H2AG; HIST1H2AC; PSMC3; USP15; BAP1 Growth signaling 19 0.016938149 1 0.002895194 PTPN6; SOCS3; CISH; ADAM17; STAT3 5 highriskgenes REACTOME KSRP (KHSRP) binds and destabilizes mRNA 16 0.016938149 1 0.002895194 DCP2; PARN; MAPK14; EXOSC3; EXOSC7 5 highriskgenes REACTOME Polo-like mediated events 16 0.016938149 1 0.002895194 EP300; CENPF; MYBL2; WEE1; FOXM1 5 highriskgenes REACTOME Pre-NOTCH Expression and Processing 42 0.019850024 1 0.004589788 PRKCI; MOV10; KAT2B; POFUT1; FURIN; 8 highriskgenes REACTOME LFNG; ST3GAL4; EP300 Degradation of the extracellular matrix 105 0.021458182 1 0.008923592 FURIN; ADAMTS4; CAPN1; CDH1; KLK7; 16 highriskgenes REACTOME BMP1; MMP15; MMP10; NCSTN; COL16A1; SPOCK3; ACAN; COL9A2; COL17A1; ADAM17; COL18A1 Metalloprotease DUBs 36 0.021491434 1 0.004027618 KAT2B; HIST1H2AL; HIST1H2AG; 7 highriskgenes REACTOME HIST1H2AC; HIST2H2AA3; EP300; UBA52 Interferon Signaling 158 0.021717868 1 0.011995638 TRIM46; IFNAR2; OAS3; OAS2; IFITM3; 22 highriskgenes REACTOME IFITM1; TRIM29; CAMK2A; RNASEL; SOCS3; EIF4G1; HLA-C; UBE2L6; IFIT1; GBP7; PTPN2; PTPN6; TRIM8; FLNB; STAT2; PSMB8; GBP2 Transcriptional activation of mitochondrial bio- 27 0.022501875 1 0.003462204 CYCS; SOD2; SIRT5; SIRT4; CREB1; 6 highriskgenes REACTOME genesis GLUD2 Signaling by Interleukins 254 0.025564278 1 0.017223382 IL6ST; OSM; IL27; S100A12; MAP3K8; 33 highriskgenes REACTOME MAP3K3; PIK3CB; PIK3CD; IRAK2; IRAK4; CREB1; INPP5D; SNAP25; SOCS3; UBA52; IL10RB; IL15; IL20RB; STAT3; PTPN6; SKP1; MEF2A; SIGIRR; IL1RAP; MAPK10; HCK; MAPK14; PELI2; BRWD1; IL2RB; IK- BKG; HIST1H3H; CASP1 Regulation of IFNA signaling 26 0.025570448 1 0.002320186 IFNAR2; STAT2; PTPN6; SOCS3 4 highriskgenes REACTOME Toxicity of botulinum toxin type C (BoNT/C) 3 0.027974833 1 0.001164822 SNAP25; STX1B 2 highriskgenes REACTOME GABA A receptor activation 13 0.027974833 1 0.001164822 GABRB3; GABRB2 2 highriskgenes REACTOME The AIM2 inflammasome 3 0.027974833 1 0.001164822 CASP1; AIM2 2 highriskgenes REACTOME mTORC1-mediated signalling 23 0.028095548 1 0.002891845 EEF2K; AKT1S1; YWHAB; RRAGD; EIF4G1 5 highriskgenes REACTOME Budding and maturation of HIV virion 30 0.03325781 1 0.003458213 CHMP4B; CHMP2A; VPS28; VPS4B; 6 highriskgenes REACTOME UBA52; PPIA Glyoxylate metabolism and glycine degradation 32 0.03325781 1 0.003458213 GRHPR; GNMT; LIPT1; ALDH4A1; BCK- 6 highriskgenes REACTOME DHA; DBT Regulation of TP53 Activity through Associa- 14 0.03407681 1 0.002318841 BANP; TP73; PPP1R13L; PHF20 4 highriskgenes REACTOME tion with Co-factors Mitochondrial biogenesis 53 0.035666601 1 0.004020678 CYCS; MAPK14; SOD2; SIRT5; SIRT4; 7 highriskgenes REACTOME CREB1; GLUD2 NCAM signaling for neurite out-growth 59 0.039471083 1 0.005681818 SPTB; CACNA1I; NRTN; ARTN; COL9A2; 10 highriskgenes REACTOME CREB1; KRAS; PSPN; GDNF; COL6A1 Disease 508 0.039528833 1 0.027738599 FGFR1OP2; PIK3CB; PIK3CD; SNAP25; 59 highriskgenes REACTOME CDKN1B; KSR2; HCK; VWF; TAF10; APOBEC3G; CHMP4B; BAD; VPS28; RNMT; MAP3K11; RAN; CDH1; UBA52; NUP85; NCSTN; GTF2E2; CHMP2A; DERL1; DERL3; SHH; TBP; SKP1; HDAC5; HDAC6; RANBP1; EP300; PRDX1; YWHAB; FURIN; ERBB3; ERLEC1; CREB1; DOCK2; KAT2B; XRCC5; XRCC4; PAPD7; STX1B; ARRB2; STAT3; VPS4B; PPIA; MAPKAP1; CAPN1; MYC; KRAS; AKT1S1; CDC25B; POLR2G; TAF7; NUP153; NCOR2; FGF19; ADAM17 Interleukin-1 family signaling 77 0.039638193 1 0.00676819 IKBKG; PELI2; MAP3K8; S100A12; SKP1; 12 highriskgenes REACTOME MAP3K3; IRAK2; IRAK4; IL1RAP; CASP1; UBA52; SIGIRR NOTCH1 Intracellular Domain Regulates Tran- 47 0.040852221 1 0.005131129 HDAC5; HDAC6; TLE1; KAT2B; NCOR2; 9 highriskgenes REACTOME scription SKP1; UBA52; MYC; EP300

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S1: Glioma gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping NCAM1 interactions 37 0.041501243 1 0.00401837 CACNA1I; NRTN; ARTN; COL9A2; PSPN; 7 highriskgenes REACTOME GDNF; COL6A1 p53-Dependent G1 DNA Damage Response 21 0.043039667 1 0.002888504 CCNE1; CDKN1B; UBA52; CHEK2; PHF20 5 highriskgenes REACTOME p53-Dependent G1/S DNA damage checkpoint 21 0.043039667 1 0.002888504 CCNE1; CDKN1B; UBA52; CHEK2; PHF20 5 highriskgenes REACTOME Assembly Of The HIV Virion 17 0.044031253 1 0.002317497 VPS28; UBA52; PPIA; FURIN 4 highriskgenes REACTOME Eicosanoid ligand-binding receptors 15 0.044031253 1 0.002317497 TBXA2R; LTB4R; PTGIR; CYSLTR1 4 highriskgenes REACTOME Antigen processing: Ubiquitination & Protea- 263 0.044259631 1 0.016675352 UBE2Q2; ASB3; CUL2; UBR2; SOCS3; 32 highriskgenes REACTOME some degradation FBXL3; UBA52; RNF213; CDC20; UBE2L6; FBXL12; UBE2E3; HERC1; HERC6; UBA6; UBE2C; SKP1; DTX3L; SPSB2; ZBTB16; FBXO41; SH3RF1; RNF220; TRIM69; HECTD3; SMURF1; RNF34; FBXL19; RNF217; UBE2G1; RNF19A; RNF123 MAPK family signaling cascades 236 0.045869684 1 0.015287296 SPTB; RASAL3; GDNF; KRAS; ERBB3; 29 highriskgenes REACTOME CAMK2A; IL6ST; UBA52; RASGRP3; MMP10; KSR2; ANGPT1; NRTN; RAG1; CDC14B; CDC14A; VWF; IL17RD; ARRB2; FGF19; DUSP2; MYC; PSPN; IL2RB; ARTN; KALRN; MAP3K11; YWHAB; ACTN2 mTOR signalling 40 0.047921055 1 0.004016064 PRKAA1; EEF2K; AKT1S1; YWHAB; 7 highriskgenes REACTOME RRAGD; EIF4G1; STRADA Signaling by NOTCH1 73 0.047955595 1 0.006760563 HDAC5; HDAC6; TLE1; KAT2B; NCSTN; 12 highriskgenes REACTOME ADAM17; NCOR2; EP300; SKP1; UBA52; MYC; ARRB2 Mammary gland development pathway - Invo- 10 0.001626496 0.977524312 0.002905288 STAT3; SOCS3; IL6ST; CDH1; MYC 5 highriskgenes Wikipathways lution (Stage 4 of 4) Integrated Breast Cancer Pathway 66 0.006464185 1 0.007369615 RASGRP3; NUP85; XRCC3; ODC1; EP300; 13 highriskgenes Wikipathways RAD54L; GADD45A; FOSL1; CDH1; MYC; CREB1; KRAS; AHR Oxidative Damage 40 0.013030469 1 0.005154639 CDKN1B; CYCS; BAD; C5AR1; MAPK10; 9 highriskgenes Wikipathways GADD45A; TRAF3; CR2; C1QA Interferon alpha-beta signaling 21 0.014371125 1 0.003466205 IFITM3; GBP2; IFITM1; RNASEL; PSMB8; 6 highriskgenes Wikipathways IFIT1 Regulation of Apoptosis by Parathyroid 21 0.014371125 1 0.003466205 BID; BCL2L12; PTHLH; BCL2L15; MYC; 6 highriskgenes Wikipathways Hormone-related Protein PIK3CG NO-cGMP-PKG mediated Neuroprotection 47 0.015361467 1 0.005151689 CYCS; CNGB1; NPPA; NOS3; GUCY1B2; 9 highriskgenes Wikipathways CAMK2A; CREB1; ACTN2; BAD Glucocorticoid and Mineralcorticoid 8 0.015820701 1 0.001745201 CYP11A1; CYP21A2; HSD11B2 3 highriskgenes Wikipathways Metabolism mir-124 predicted interactions with cell cycle 6 0.015820701 1 0.001745201 SIX4; PRKAA1; PTBP1 3 highriskgenes Wikipathways and differentiation The human immune response to tuberculosis 22 0.018126628 1 0.003464203 IFNAR2; IFIT1; STAT2; IFITM1; PTPN2; 6 highriskgenes Wikipathways PSMB8 Prostaglandin Synthesis and Regulation 44 0.020919085 1 0.005145798 PTGS1; S100A10; PLA2G4A; CYP11A1; 9 highriskgenes Wikipathways HSD11B2; TBXA2R; ANXA3; PTGIR; ABCC4 Apoptosis Modulation and Signaling 91 0.027414659 1 0.008384572 TNFRSF25; BLK; TNFRSF10B; TN- 15 highriskgenes Wikipathways FRSF10A; TNFRSF6B; TNFRSF1A; BAD; BIRC7; CYCS; BID; TRAF3; CASP6; CASP4; CASP1; MAP3K14 Structural Pathway of Interleukin 1 (IL-1) 44 0.027792027 1 0.00513992 MAP3K8; TANK; MAP3K3; IRAK2; MYC; 9 highriskgenes Wikipathways IRAK4; MAPK14; IL1RAP; MAP3K14 Fluoropyrimidine Activity 34 0.030400651 1 0.004022989 SLC29A1; RRM2; RRM1; XRCC3; GGH; 7 highriskgenes Wikipathways SMUG1; ABCC4 Apoptosis 87 0.031966146 1 0.007847534 TNFRSF25; IKBKG; TNFRSF1A; TN- 14 highriskgenes Wikipathways FRSF10B; MAPK10; MYC; TP73; CYCS; BAD; BID; TRAF3; CASP6; CASP4; CASP1 Cell Cycle 120 0.033932118 1 0.009933775 GADD45A; MYC; CDKN2C; CDKN1B; 18 highriskgenes Wikipathways CCNE1; CHEK2; CDC25B; STAG1; CDC14B; CDC14A; CDC20; CDC6; SKP1; EP300; WEE1; TGFB2; YWHAB; MCM5 Mitotic G1-G1-S phases 20 0.035077937 1 0.002890173 CCNE1; FBXO5; RRM2; MYBL2; CDC6 5 highriskgenes Wikipathways Apoptosis Modulation by 19 0.035077937 1 0.002890173 CASP6; TNFRSF1A; BID; CYCS; MAPK10 5 highriskgenes Wikipathways Parkin-Ubiquitin Proteasomal System pathway 73 0.035883432 1 0.006772009 HSPA4; PSMD2; UBE2L6; PSMD12; 12 highriskgenes Wikipathways TUBA1A; CCNE1; UBE2G1; PSMC3; RNF19A; CASP1; PSMD8; HSPA14 TNF alpha Signaling Pathway 93 0.038990875 1 0.008365867 IKBKG; MAP3K8; MAP3K3; CSNK2A1; 15 highriskgenes Wikipathways KRAS; TANK; KSR2; TNFRSF1A; SKP1; SMPD2; PTPRCAP; BAD; PSMD2; BID; MAP3K14 DNA Damage Response (only ATM dependent) 110 0.039497097 1 0.009407858 WNT10A; PIK3CD; PIK3CG; MYC; SOD2; 17 highriskgenes Wikipathways PIK3C2B; PIK3C2A; KRAS; TCF7L1; CDKN1B; PIK3CB; WNT3A; MAPK10; FOSL1; TP73; WNT4; BAD Microglia Pathogen Phagocytosis Pathway 40 0.041661964 1 0.004576659 PIK3CB; PIK3CD; PIK3CG; PIK3C2A; HCK; 8 highriskgenes Wikipathways PTPN6; C1QA; SIGLEC7 Interleukin-1 family signaling 15 0.044031253 1 0.002317497 PTPN2; PTPN7; PTPN6; PTPN4 4 highriskgenes Wikipathways

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S2: Glioma gene set enrichment analysis (GSEA)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme

Tryptophan metabolism - Homo sapiens (hu- 0.023100107 0.979820681 -0.432518002 -1.550736633 1141 33 AADAT; ALDH9A1; ALDH3A2; IDO1; OGDH; KEGG man) INMT; GCDH Cushing syndrome - Homo sapiens (human) 0.049814344 0.979820681 0.274721105 1.298534716 2548 138 AHR; CCNE1; TCF7L1; WNT10A; ADCY7; KEGG CACNA1F; CDKN2C; CAMK2A; STAR; PBX1; WNT3A; CDKN1B; CYP11A1; CACNA1I; USP8; WNT4; CREB1; CYP21A2; ATF6B; GNAI3; FZD3; NCEH1; ADCY9; CDKN2B; CACNA1C; RAP1A; APC2; WNT16; PRKACA; KCNK2; CDK6 Renin-angiotensin system - Homo sapiens (hu- 0.010151278 0.74865674 0.55331802 1.713839711 511 19 PRCP; KLK1; THOP1; PREP; ANPEP; CTSA; EN- KEGG man) PEP; CPA3; CTSG; AGT; NLN Small cell lung cancer - Homo sapiens (human) 0.029809018 0.979820681 0.316900196 1.400774125 1513 92 CCNE1; BIRC7; TRAF3; CYCS; CDKN1B; KEGG PIK3CB; MYC; IKBKG; GADD45A; PIK3CD; ITGA3; SKP2; PTEN; COL4A2; ITGA2; NOS2; DDB2; CDKN2B; BCL2L1; ITGAV; MAX; TP53; PTK2; TRAF1; CDK6; LAMB1; BAX; ITGB1; LAMB2; LAMC1; ; TRAF2; PIK3R1; TRAF6; NFKBIA p53 signaling pathway - Homo sapiens (human) 0.048804487 0.979820681 0.327513728 1.372478184 2479 69 CCNE1; CD82; BID; CYCS; TP73; TNFRSF10B; KEGG RRM2; CHEK2; GADD45A; TP53I3; PTEN; DDB2; IGFBP3; BCL2L1; FAS; TP53; CCNB1; ATR; CDK6; BAX; PPM1D; PMAIP1; RCHY1; BBC3; SESN2 Viral carcinogenesis - Homo sapiens (human) 0.001408837 0.415606778 0.309546872 1.528384025 71 188 CCNE1; HIST1H2BL; GTF2E2; HDAC6; YWHAB; KEGG KRAS; TBP; CDC20; C3; HIST1H2BH; TRAF3; RANBP1; EP300; CDKN1B; STAT3; IL6ST; HIST2H2BE; ATP6V0D2; DLG1; TBPL1; BAD; PIK3CB; HDAC5; GSN; IKBKG; HLA-C; CREB1; KAT2B; PIK3CD; ATF6B; EGR3; MRPS18B; HLA- B; SKP2; HLA-A; EIF2AK2; CDKN2B; TP53; RHOA; PRKACA; POLB; TRAF1; CDK6; BAX; TRADD; MAPKAPK2; SNW1; PMAIP1; GRB2; ACTN4; JAK3; CREB3L2; TRAF2; NRAS; PIK3R1; GTF2E1; NFKBIA; HDAC7; GTF2H3; IRF9; PRKACB; YWHAZ; NFKB1; DDB1; SCRIB; BAK1 Hepatitis C - Homo sapiens (human) 0.004599636 0.452297534 0.321690611 1.513911872 234 134 YWHAB; KRAS; IFNAR2; TRAF3; BID; RNASEL; KEGG CYCS; OAS2; TNFRSF1A; STAT3; OAS3; SOCS3; STAT2; BAD; PIK3CB; MYC; IFIT1; IKBKG; PIK3CD; CLDN11; EIF2AK1; EIF2AK2; FAS; PPP2R2B; EIF2AK3; TP53; PPP2R1A; CDK6; BAX; TRADD; PSME3; GRB2; CLDN15; E2F3; TRAF2; CLDN16; NRAS; PIK3R1; TRAF6; CXCL10; NFKBIA; NR1H3; IRF9; YWHAZ; NFKB1 Epstein-Barr virus infection - Homo sapiens 0.003443486 0.452297534 0.302236718 1.4859104 175 182 CCNE1; HLA-DOA; CD58; IFNAR2; TRAF3; CR2; KEGG (human) MAPK10; BID; CYCS; OAS2; CDKN1B; PSMD2; STAT3; OAS3; NCOR2; MAP3K14; PSMC3; STAT2; PIK3CB; MYC; IKBKG; PSMD12; HLA-C; GADD45A; IRAK4; PIK3CD; MAPK14; PSMD8; HLA-B; CALR; SKP2; NFKBIE; PSMD4; SIN3A; HLA-A; DDB2; EIF2AK2; PLCG2; PSMD3; FAS; TP53; ENTPD8; CDK6; BAX; TRADD; PSMC4; SNW1; NFKBIB; ITGAL; JAK3; E2F3; TRAF2; PIK3R1; TRAF6; CXCL10; NFKBIA; BTK; HLA- DRB1; SAP30; IRF9 Phototransduction - Homo sapiens (human) 0.041858717 0.979820681 0.475625998 1.513433595 2115 21 CNGB1; CALM3; CALML4; GNB1; PDE6A; KEGG GNAT2; CALML3; PDE6G; GNGT1 Herpes simplex infection - Homo sapiens (hu- 0.020837817 0.979820681 0.287566634 1.375665052 1064 150 IL15; CFP; POU2F3; HLA-DOA; TBP; C3; IF- KEGG man) NAR2; TRAF3; MAPK10; RNASEL; EP300; CYCS; OAS2; TAF10; TNFRSF1A; OAS3; SOCS3; TBPL1; STAT2; IFIT1; IKBKG; SKP1; GTF2IRD1; CSNK2A1; HLA-C; HLA-B; SKP2; SRPK1; HLA- A; EIF2AK1; EIF2AK2; TAF6; FAS; EIF2AK3; TP53; TRAF1; NXF1; IFIH1; PER1; NFKBIB; TRAF2; TRAF6; NXF3; NFKBIA; TAF4; HLA- DRB1; IRF9; MCRS1

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Neutrophil degranulation 0.007895092 0.836411609 0.239588011 1.304953275 409 448 FTH1; PLD1; PRDX6; COMMD9; LCN2; VAT1; Reactome ATP11A; S100A12; LRMP; ALDH3B1; CFP; TMEM173; AGL; PRCP; GGH; C3; CD58; SLPI; DOCK2; GOLGA7; CHIT1; XRCC5; APEH; ACLY; FOLR3; SNAP25; CAPN1; VNN1; PPIA; PSMD2; SLC11A1; NFASC; NCSTN; PSMC3; PECAM1; CD68; DOK3; SIGLEC9; GSN; ARHGAP9; PSMD12; VAMP8; HLA-C; C5AR1; PTPN6; CD93; ACTR10; FAF2; TNFAIP6; MAPK14; FCGR3B; PTPRN2; HLA-B; CYFIP1; RHOG; GALNS; TM- BIM1; VCL; CAP1; HLA-A; OSTF1; CCT2; HPSE; ANPEP; COTL1; PLAU; CTSA; PYGB; PLEKHO2; RAB10; C1orf35; CD36; SIGLEC5; CTSZ; AGA; RAP2C; ANXA2; RAP1A; PSMD3; XRCC6; TXNDC5; ITGAV; FCGR2C; RHOA; DNAJC5; BPI; ENPP4; RAB27A; FUCA2; ANO6; ATP11B; GRN; ITGAM; CD300A; CYB5R3; LILRB3; RNASET2; HK3; CLEC4D; ATP8B4; MMP9; ALDOA; SERPINB1; NPC2; NIT2; CDA; LILRA3; IT- GAL; TBC1D10C; SERPINA3; S100A9; HRNR; SIRPA; PAFAH1B2; MLEC; HGSNAT; OLR1; NRAS; CYBA; CTSG; IQGAP2; RAB3A; LAMP1; CLEC5A; ALOX5; LTA4H; SLC2A5; SERPINA1; PA2G4; LAMP2; NFKB1; DSN1; PSMA2; RAB6A; SYNGR1; PGAM1; FPR1; TMEM179B; S100A11; EPX; ARL8A; NME1-NME2; ATG7; SLCO4C1; RNASE3; FABP5; CD59; DSP; CCT8; RAB37 Recycling pathway of L1 0.031622634 0.88702983 -0.449249346 -1.532444194 1561 27 NUMB; RPS6KA5; RPS6KA2; MSN; RPS6KA3; Reactome AP2B1; EZR; DNM2; RDX; RPS6KA1; DPYSL2; CLTA Butyrate Response Factor 1 (BRF1) binds and 0.041723843 0.88702983 0.509530904 1.53051629 2097 17 YWHAB; EXOSC3; EXOSC7; DCP2; XRN1; MAP- Reactome destabilizes mRNA KAPK2; EXOSC5; DCP1A Cytokine Signaling in Immune system 0.00922317 0.836411609 0.241344672 1.304909082 477 414 IL15; MEF2A; PELI2; TRIM29; S100A12; OSM; Reactome IL2RB; CAMK2A; HCK; IFNAR2; TRAF3; GBP2; GBP7; MAPK10; UBE2L6; EIF4G1; PTPN2; RNASEL; FLNB; OAS2; TNFSF4; TNFRSF1A; SNAP25; TNFRSF18; INPP5D; IL20RB; CISH; TNFRSF6B; TNFRSF25; STAT3; OAS3; TNFSF9; IRAK2; MAP3K14; IL6ST; BRWD1; SOCS3; STAT2; ADAM17; PIK3CB; TRIM8; TNFSF13; IFIT1; IKBKG; PSMB8; SIGIRR; SKP1; IL27; MAP3K8; HIST1H3H; HLA-C; TRIM46; IL10RB; CREB1; IRAK4; PIK3CD; PTPN6; MAPK14; MAP3K3; CASP1; IFITM3; UBA52; IFITM1; IL1RAP; TNFSF15; CD27; HLA-B; SAA1; EGR1; IL18; UBE2E1; HIST1H3J; HERC5; IRF1; ARIH1; HLA-A; EIF2AK2; IL18BP; IL1A; IFI30; PELI1; GBP6; FYN; TNFRSF13C; PPP2R1A; PRKACA; HIST2H3D; IL18R1; LIF; IL10; SDC1; HIST1H3I; CSF3; EIF4A2; MAPKAPK2; NCAM1; RIPK2; TN- FSF8; EIF4G3; NFKBIB; GRB2; IL2RG; IL12RB2; TAB3; JAK3; VRK3; UBA7; USP18; TRAF2; SUMO1; CNTFR; PIK3R1; TRAF6; NFKBIA; GAB2; IL1RN; HLA-DRB1; IRF9; TRIM38; YW- HAZ; CD4; NFKB1; TRIM5; CLCF1; HAVCR2; TMEM189 Translocation of ZAP-70 to Immunological 0.026126833 0.88702983 -0.530111039 -1.598754197 1298 17 PTPN22; HLA-DQB2; CD247; HLA-DPB1; HLA- Reactome synapse DPA1; ZAP70; HLA-DQA1 HSP90 chaperone cycle for steroid hormone re- 0.040996962 0.88702983 -0.49024243 -1.52459203 2037 19 FKBP4; HSPA1A; NR3C2; HSP90AA1; HSPA1B; Reactome ceptors (SHR) NR3C1; PGR; HSPA1L Collagen degradation 0.005869449 0.836411609 0.499213775 1.730725335 296 29 FURIN; COL18A1; COL16A1; ADAM17; MMP15; Reactome COL9A2; COL17A1; MMP10; TMPRSS6; MMP12 TP53 Regulates Transcription of Cell Death 0.019492294 0.884904638 0.407372139 1.541179624 988 42 RABGGTB; BID; CASP6; TP73; TNFRSF10B; Reactome Genes TNFRSF10A; CASP1; TP53I3; TNFRSF10D; TP53BP2; IGFBP3; FAS; TP53; BAX; PMAIP1; BBC3; TMEM219; BCL6; STEAP3; TP53AIP1 Endogenous sterols 0.009686916 0.836411609 0.532156676 1.717629863 486 22 AHR; CYP46A1; CYP7B1; CYP11A1; CYP21A2; Reactome FDXR; CYP27A1; PTGIS Polo-like kinase mediated events 0.037114999 0.88702983 0.524954793 1.550798269 1868 16 FOXM1; EP300; CENPF; MYBL2; WEE1; CCNB1; Reactome LIN9; RBBP4; LIN54; CDC25C RHO GTPases Activate WASPs and WAVEs 0.004305492 0.836411609 0.477078151 1.73230666 217 35 ABI1; ARPC1A; WIPF3; WASF2; WIPF1; ARPC2; Reactome CYFIP1; ACTR3; PTK2; ACTB; ACTG1; GRB2; WASF1; BTK SCF(Skp2)-mediated degradation of p27/p21 0.047345739 0.918893834 0.491828902 1.502615817 2383 18 CCNE1; CDKN1B; SKP1; UBA52; SKP2 Reactome E associated events during G1/S transi- 0.020869153 0.88702983 0.455670233 1.579764134 1055 29 CCNE1; CDKN1B; MYC; WEE1; SKP1; UBA52; Reactome tion SKP2; MAX Activation of target genes at G1/S 0.012868122 0.836411609 0.575808822 1.701028997 647 16 CCNE1; RRM2; CDC6; FBXO5; DHFR; TFDP2 Reactome G1/S-Specific Transcription 0.012868122 0.836411609 0.575808822 1.701028997 647 16 CCNE1; RRM2; CDC6; FBXO5; DHFR; TFDP2 Reactome Mitotic G1-G1/S phases 0.026535135 0.88702983 0.314030359 1.404619144 1352 98 CCNE1; POLE2; CDKN2C; CDKN1B; RRM2; Reactome MYBL2; CDC6; FBXO5; MYC; MCM5; WEE1; SKP1; UBA52; ; SKP2; DHFR; CDKN2B; TFDP2; MAX; PRIM2; CCNB1; PPP2R1A; LIN9; CDK6; RBBP4 Cyclin A:Cdk2-associated events at S phase en- 0.044584099 0.88702983 0.424354953 1.471197121 2255 29 CCNE1; CDC25B; CDKN1B; WEE1; SKP1; Reactome try UBA52; SKP2

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Metabolism of steroid hormones 0.001403689 0.836411609 0.569680106 1.903149269 70 25 HSD11B2; STAR; STARD3NL; CYP11A1; Reactome CYP21A2; HSD17B2; HSD11B1; FDXR Signaling by NOTCH 0.021931113 0.88702983 0.308373207 1.406268922 1119 110 LFNG; TLE1; FURIN; PRKCI; HDAC6; DLGAP5; Reactome EP300; MOV10; PBX1; ARRB2; NCOR2; NCSTN; POFUT1; ADAM17; HDAC5; MYC; SKP1; CREB1; ST3GAL4; YBX1; KAT2B; UBA52; JAG1 UCH proteinases 0.030116224 0.88702983 0.312156043 1.391053503 1533 96 PSMA1; USP15; HIST1H2AG; PSMB6; ACTR5; Reactome PSMD2; PSMC3; INO80C; PSMB8; PSMD12; PSMA4; HIST1H2AL; HIST2H2AA3; HIST1H2AC; PSMD8; UBA52; BAP1; PSMD4; PSMB2; PSMD3; HIST1H2AJ; INO80E; PSMC4; YY1; ACTB; INO80D; PSME3; ACTL6A; TGFB1; MBD6; PSMB10; SENP8; MCRS1; HIST1H2AH; PSMA2 Cell surface interactions at the vascular wall 0.042538163 0.88702983 0.287973161 1.32922636 2167 119 KRAS; CD99L2; CD58; ANGPT1; INPP5D; PPIA; Reactome THBD; TNFRSF10B; PECAM1; PIK3CB; IGLL1; TNFRSF10A; PTPN6; ITGA3; SLC16A8; CD48; PROC; GRB14; SLC7A7; SDC4; TNFRSF10D; L1CAM; SLC16A3; APOB; SLC7A11; ITGAV; FYN; PF4V1; CD244; SDC1; ITGB1; ITGAM; GRB2; ITGAL; TGFB1; SIRPA; OLR1; NRAS; PIK3R1 Ovarian tumor domain proteases 0.018941713 0.884904638 0.421710851 1.560663791 958 38 TRAF3; OTUB2; IKBKG; ZRANB1; UBA52; Reactome UBE2D1; PTEN; OTUD7B; TP53; RHOA; OTUD7A; RIPK2; IFIH1; OTUD3; TNIP1; TRAF6 Deubiquitination 0.044164467 0.88702983 0.243746284 1.251046765 2274 254 HIST1H2BL; PSMA1; USP15; HIST1H2AG; Reactome CDC20; PSMB6; HIST1H2BH; TRAF3; RNF123; ACTR5; EP300; TAF10; ARRB2; PSMD2; HIST2H2BE; PSMC3; MYC; OTUB2; IKBKG; INO80C; USP8; PSMB8; PSMD12; PSMA4; HIST1H2AL; HIST2H2AA3; KAT2B; ZRANB1; HIST1H2AC; PSMD8; UBA52; BAP1; SKP2; UBE2D1; USP24; PSMD4; PTEN; PSMB2; OTUD7B; DDB2; ATXN3; PSMD3; HIST1H2AJ; TP53; RHOA; USP13; USP10; POLB; INO80E; OTUD7A; MAT2B; PSMC4; YY1; ACTB; RIPK2; INO80D; IFIH1; OTUD3; PSME3; ACTL6A; SNX3; TNIP1; USP9X; BECN1; TGFB1; USP18; IDE; TRAF2; USP42; MBD6; TRAF6; NFK- BIA; PSMB10; BRCC3; SENP8; CFTR; MCRS1; RAD23A; HIST1H2AH; PSMA2 G1/S Transition 0.008919762 0.836411609 0.370382713 1.558707807 451 70 CCNE1; POLE2; CDKN1B; RRM2; CDC6; FBXO5; Reactome MYC; MCM5; WEE1; SKP1; UBA52; SKP2; DHFR; TFDP2; MAX; PRIM2; CCNB1; PPP2R1A Regulation of TP53 Activity through Methyla- 0.031415692 0.88702983 0.534796562 1.57987239 1581 16 EP300; EHMT2; CHEK2; UBA52; TP53; TTC5; Reactome tion SMYD2 Synthesis of PIPs at the plasma membrane 0.019476039 0.884904638 0.381848951 1.51517898 987 52 PIK3C2A; PIK3CG; PIK3C2B; MTM1; INPP5D; Reactome PLEKHA8; PIK3CB; PIP4K2A; PIK3CD; RAB4A; PIP5K1B; PTEN Regulation of TP53 Activity 0.027457495 0.88702983 0.282017677 1.348191886 1404 150 PRKAA1; MEAF6; RNF34; TBP; PRDM1; Reactome PPP1R13L; DNA2; PHF20; TAF7; EP300; TAF10; EHMT2; TP73; MAPKAP1; CHEK2; PIP4K2A; CSNK2A1; MAPK14; BANP; BRD7; UBA52; TP53BP2; BRPF3; AURKA; TAF6; BRPF1; TP53; RICTOR; PPP2R1A; ATR; TTC5; SUPT16H; SMYD2; PDPK1; RBBP4; SSRP1; ZNF385A; PPP2R5C; PRR5; TAF12; PRKAB2; RAD9B; TAF4; TOPBP1 Transcriptional Regulation by TP53 0.008834161 0.836411609 0.254178091 1.341368606 456 325 CCNE1; POLR2G; PRKAA1; RABGGTB; CDK12; Reactome MEAF6; RNF34; YWHAB; TBP; PRDM1; PPP1R13L; DNA2; PHF20; E2F7; PRDX1; BID; TAF7; EP300; CYCS; TAF10; MOV10; EHMT2; CASP6; CDKN1B; TP73; TNFRSF10B; MAPKAP1; CHEK2; RRAGD; PIP4K2A; DDIT4; TNFRSF10A; CSNK2A1; GADD45A; MAPK14; BANP; CASP1; BRD7; TP53I3; UBA52; CCNK; PLK2; FANCC; PTEN; CDK9; TNFRSF10D; DDB2; TP53BP2; IGFBP3; BRPF3; AURKA; TAF6; BRPF1; TFDP2; PMS2; ALK; FAS; TP53; RICTOR; CCNB1; PPP2R1A; ATR; TTC5; CNOT6; BAX; SUPT16H; SMYD2; PDPK1; CCNT1; RBBP4; SSRP1; PMAIP1; MLH1; CNOT1; ZNF385A; PPP2R5C; G6PD; PRR5; TAF12; RRAGA; BBC3; CDC25C; PRKAB2; GSR; SESN2; RAD9B; TAF4; COX6B1; PRDX5; TOPBP1; GTF2H3; POLR2D; TNRC6B; RHEB; COX11; YWHAZ; EXO1 Phospholipid metabolism 0.033944847 0.88702983 0.261235641 1.297459942 1742 197 PLD1; LPCAT4; PHOSPHO1; PIK3C2A; PN- Reactome PLA2; AGPAT4; PIK3CG; PLD6; PIK3C2B; MTM1; INPP5D; LPIN2; PLEKHA8; PITPNM3; PIK3CB; PIP4K2A; STARD7; CSNK2A1; PEMT; DGAT2; PLA2G2D; PIK3CD; LPCAT3; LIPH; PLA2G4A; GPD1L; CEPT1; PLBD1; PITPNM2; PLA2G3; RAB4A; OSBPL10; NDUFS3; PIP5K1B; PTEN; TNFAIP8L1; TNFAIP8; DDHD1; PNPLA3; PCYT2 The phototransduction cascade 0.041476476 0.88702983 0.43702527 1.488698054 2098 27 CNGB1; RGS9BP; PPEF1; FNTA; GNB1; PDE6A; Reactome PDE6G; GNGT1; METAP2; NMT2

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Aflatoxin activation and detoxification 0.013192612 0.836411609 -0.550534601 -1.685703373 654 18 DPEP2; CYP3A5; GGT6; AKR7A2; GGT7; ACY3; Reactome CYP3A4; DPEP3 APC/C-mediated degradation of cell cycle pro- 0.033248385 0.88702983 0.39855507 1.483460988 1682 39 CDC14A; NEK2; CDC20; UBE2C; FBXO5; SKP1; Reactome teins UBA52; UBE2D1; UBE2E1; ANAPC1; CDC26; AU- RKA; CCNB1 Regulation of mitotic cell cycle 0.033248385 0.88702983 0.39855507 1.483460988 1682 39 CDC14A; NEK2; CDC20; UBE2C; FBXO5; SKP1; Reactome UBA52; UBE2D1; UBE2E1; ANAPC1; CDC26; AU- RKA; CCNB1 Cytosolic tRNA aminoacylation 0.00762633 0.836411609 0.525207006 1.7343984 384 24 AIMP2; QARS; YARS; AARS; EEF1E1; RARS; Reactome GARS; EPRS; PPA1 Regulation of mRNA stability by proteins that 0.01500099 0.884904638 0.432597239 1.590864699 757 37 YWHAB; EIF4G1; ANP32A; EXOSC3; TNFSF13; Reactome bind AU-rich elements EXOSC7; DCP2; PARN; MAPK14; PABPC1; XRN1; ZFP36; MAPKAPK2 Interferon alpha/beta signaling 0.006126872 0.836411609 0.404884881 1.631392666 310 56 IFNAR2; GBP2; RNASEL; OAS2; OAS3; SOCS3; Reactome STAT2; IFIT1; PSMB8; HLA-C; PTPN6; IFITM3; IFITM1; HLA-B; EGR1; IRF1; HLA-A Serotonin Neurotransmitter Release Cycle 0.02909532 0.88702983 0.527192939 1.583569073 1462 17 PPFIA2; SNAP25; UNC13B; PPFIA3; SYT1; SYN3; Reactome RIMS1; RAB3A cGMP effects 0.030534351 0.88702983 -0.514105346 -1.574159217 1515 18 KCNMA1; PRKG1; PDE1A; PDE11A; KCNMB4; Reactome PRKG2 Norepinephrine Neurotransmitter Release Cy- 0.031534841 0.88702983 0.53463497 1.57939502 1587 16 PPFIA2; SNAP25; UNC13B; PPFIA3; SYT1; Reactome cle SLC22A1; RIMS1; RAB3A Interferon Signaling 0.018124707 0.884904638 0.294380964 1.393777097 928 140 TRIM29; CAMK2A; IFNAR2; GBP2; GBP7; Reactome UBE2L6; EIF4G1; PTPN2; RNASEL; FLNB; OAS2; OAS3; SOCS3; STAT2; TRIM8; IFIT1; PSMB8; HLA-C; TRIM46; PTPN6; IFITM3; IFITM1; HLA- B; EGR1; UBE2E1; HERC5; IRF1; ARIH1; HLA-A; EIF2AK2 ISG15 antiviral mechanism 0.037860327 0.88702983 0.426422988 1.489971563 1910 30 UBE2L6; EIF4G1; FLNB; IFIT1; UBE2E1; HERC5; Reactome ARIH1; EIF2AK2; EIF4A2; EIF4G3; UBA7 Antiviral mechanism by IFN-stimulated genes 0.037860327 0.88702983 0.426422988 1.489971563 1910 30 UBE2L6; EIF4G1; FLNB; IFIT1; UBE2E1; HERC5; Reactome ARIH1; EIF2AK2; EIF4A2; EIF4G3; UBA7 TCF dependent signaling in response to WNT 0.044771222 0.88702983 -0.262438651 -1.285680698 2182 170 TNKS; KLHL12; CREBBP; LGR5; CXXC4; Reactome LGR4; HIST1H4J; KAT5; SOX6; MEN1; TLE4; WNT3; HIST1H3B; USP34; HIST1H4H; RNF43; HIST1H3F; FRAT2; TRRAP; SOX13; CTNNBIP1; HIST1H4E; FRAT1; BCL9; HIST1H4B; HIST1H3A; LEF1; SMURF2; CTNNB1; PPP2R5E; LRP5; TLE3; CSNK2B; LGR6; SOX9; DKK1; DACT1; HIST1H3G; HIST1H2BI; H2AFZ; HIST1H2BO; RNF146; PPP2R1B; UBC; SFRP1 Steroid hormones 0.00383331 0.836411609 0.485293294 1.749989429 193 34 HSD11B2; STAR; STARD3NL; CYP11A1; Reactome CYP21A2; HSD17B2; HSD11B1; CUBN; FDXR Cell Cycle 0.039460775 0.784765749 0.292631447 1.343666031 2010 114 CCNE1; CDC14A; YWHAB; CDC20; CDKN2C; Wikipathways CDC25B; EP300; TGFB2; CDKN1B; CHEK2; CDC6; MYC; MCM5; WEE1; STAG1; SKP1; CDC14B; GADD45A; E2F5; SKP2; ANAPC1; ESPL1; SMC1A; CDKN2B; STAG2; TFDP2; TP53; CCNB1; ATR; CDK6 Apoptosis 0.002525053 0.431784108 0.380687215 1.649678682 127 82 CASP4; TRAF3; MAPK10; BID; CYCS; TN- Wikipathways FRSF1A; CASP6; TNFRSF25; TP73; TNFRSF10B; BAD; MYC; IKBKG; CASP1; NFKBIE; IRF1; DFFB; BCL2L1; FAS; TP53; TRAF1; BAX; TRADD; PMAIP1; NFKBIB; HELLS; BCL2L2; BBC3; TRAF2; PIK3R1; NFKBIA Integrated Breast Cancer Pathway 0.006332746 0.721933071 0.39496863 1.6158355 319 61 AHR; KRAS; CDH1; EP300; FOSL1; ODC1; MYC; Wikipathways RAD54L; RASGRP3; XRCC3; GADD45A; CREB1; NUP85; ANXA1; GDI1; IMPA1; AURKA; RAP1A; BAX; TFPI; GRN Interferon alpha-beta signaling 0.038744395 0.784765749 0.480759919 1.529793683 1943 21 GBP2; RNASEL; IFIT1; PSMB8; IFITM3; IFITM1; Wikipathways EGR1 Mitotic G1-G1-S phases 0.016970371 0.784765749 0.532948874 1.652437416 853 19 CCNE1; RRM2; MYBL2; CDC6; FBXO5; DHFR Wikipathways IL17 signaling pathway 0.046633872 0.784765749 0.427732478 1.468976466 2350 28 IL17RA; TRAF3; STAT3; MAP3K14; IKBKG; Wikipathways IL17RD; NFKBIB; TRAF6; CEBPB; NFKB1; IL17C Peptide GPCRs 0.042102716 0.784765749 -0.362677639 -1.423763055 2087 49 CCR7; CXCR1; TACR1; TAC4; MC1R; SSTR3; ED- Wikipathways NRA; ATP8A1; CCR2; CCKBR; GNRHR; CCR9; GALR2; CCR10; FPR2 Preimplantation Embryo 0.027191051 0.784765749 0.394865977 1.500993518 1370 43 TCF7L1; SIX3; MXD1; CDH1; PBX1; FOXD1; Wikipathways IRX5; E2F5; EGR1; ; POU5F1; AQP9; ZFP36L2; ZFP36 Transcription factor regulation in adipogenesis 0.028016939 0.784765749 -0.491467316 -1.571775394 1395 21 SLC2A4; FOXO1; IL6; NR3C1; CEBPA; LPIN1; Wikipathways PPARG; PCK2 BDNF-TrkB Signaling 0.044264314 0.784765749 0.407704852 1.456766535 2227 33 PIK3CG; KRAS; HOMER1; EEF2K; TRPC6; Wikipathways CREB1; GRB2; MKNK1; NRAS; BDNF; GAB2; RHEB; GAB1; RPS6KB1 TP53 Regulates Transcription of Cell Death 0.001051295 0.359542984 0.558943936 1.919600523 52 28 BID; CASP6; TNFRSF10B; TNFRSF10A; CASP1; Wikipathways Genes TP53I3; TNFRSF10D; IGFBP3; FAS; BAX; PMAIP1; BBC3; BCL6; STEAP3; TP53AIP1 Oxidative Damage 0.017136178 0.784765749 0.422129083 1.56995052 864 39 C1QA; TRAF3; CR2; MAPK10; CYCS; CDKN1B; Wikipathways BAD; GADD45A; C5AR1; BAG4; NFKBIE; TRAF1; TRAF2; TRAF6; MAP3K9; NFKB1; BAK1

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Protein alkylation leading to liver fibrosis 0.044603069 0.784765749 -0.360719175 -1.416074714 2211 49 PDGFRA; SMAD3; COL4A5; IL6; RELB; SMAD2; Wikipathways COL4A4; COL4A1; NFKB2; COL4A3; MAPK12; SMAD1; PPARG; TIMP2; TIMP1; MMP1; MMP2; COL4A6; MAPK13; PARP1; TNFSF10; FASLG; MAPK3 G1 to S cell cycle control 0.019794925 0.784765749 0.370655465 1.501831771 999 58 CCNE1; POLE2; CDKN2C; CDKN1B; MYC; Wikipathways MCM5; WEE1; GADD45A; CREB1; ATF6B; CDKN2B; TFDP2; PRIM2; TP53; CCNB1; CDK6 Tryptophan metabolism 0.040490649 0.784765749 -0.38724215 -1.452814372 2006 40 AADAT; ALDH9A1; ALDH3A2; IDO1; OGDH; Wikipathways INMT; GCDH DNA Damage Response 0.033421353 0.784765749 0.342399972 1.423833769 1695 66 CCNE1; BID; CYCS; CDKN1B; TNFRSF10B; Wikipathways CHEK2; MYC; TLK1; GADD45A; CREB1; RAD51; DDB2; SMC1A; FAS; TP53; CCNB1; ATR; CDK6; BAX; PMAIP1; BBC3; CDC25C; H2AFX; CDK4; TP53AIP1; GADD45G; CHEK1; CCNB2; TLK2; RRM2B; E2F1; PRKDC Prostaglandin Synthesis and Regulation 0.012317274 0.784765749 0.424236849 1.604102106 620 42 PTGIR; ABCC4; HSD11B2; TBXA2R; CYP11A1; Wikipathways ANXA3; PTGS1; S100A10; PLA2G4A; ANXA1; HSD11B1; PTGIS; ANXA2; EDN1

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S3: HNSC gene set over-representation

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping

Nicotine addiction - Homo sapiens (human) 40 0.018223195 1 0.003460208 GRIA3; CACNA1B; GRIN3A; GRIN1; 6 lowriskgenes KEGG GABRB2; GRIN2A Glutamatergic synapse - Homo sapiens (hu- 114 0.018652582 1 0.009444444 DLGAP1; GRIN1; SLC38A3; GNG2; 17 lowriskgenes KEGG man) ADCY2; GRM5; PLCB4; ITPR2; ITPR3; GNG11; SHANK2; GRIA3; GLS2; GRIN3A; PPP3CC; GRIN2A; SLC1A1 Autophagy - other - Homo sapiens (human) 32 0.021616644 1 0.004022989 GABARAPL1; GABARAPL2; ATG16L1; 7 lowriskgenes KEGG ATG7; ATG5; ATG9A; ATG2B Butanoate metabolism - Homo sapiens (human) 28 0.02261948 1 0.003458213 ACSM1; ACSM3; HMGCS1; HMGCL; 6 lowriskgenes KEGG HADH; ECHS1 Long-term potentiation - Homo sapiens (hu- 67 0.038039107 1 0.006221719 RAF1; EP300; GRM5; GRIN1; PLCB4; 11 lowriskgenes KEGG man) ITPR2; ITPR3; RPS6KA2; GRIN2A; PPP3CC; CREBBP Base excision repair - Homo sapiens (human) 33 0.041726265 1 0.004013761 POLB; POLL; OGG1; POLD1; LIG3; PCNA; 7 lowriskgenes KEGG PARP3 Beta oxidation of butanoyl-CoA to acetyl-CoA 5 0.00370472 1 0.001745201 ECHS1; ACSM3; HADH 3 lowriskgenes REACTOME Regulation of TP53 Activity through Acetyla- 30 0.004941329 1 0.004602992 HDAC1; AKT1; MAP2K6; EP300; PIP4K2B; 8 lowriskgenes REACTOME tion ING5; BRD1; ING2 PI5P Regulates TP53 Acetylation 9 0.005029165 1 0.00232288 MAP2K6; EP300; ING2; PIP4K2B 4 lowriskgenes REACTOME Miscellaneous substrates 12 0.012809503 1 0.002320186 CYP2D6; CYP4B1; CYP2S1; CYP4F3 4 lowriskgenes REACTOME Synthesis of pyrophosphates in the cytosol 11 0.012809503 1 0.002320186 NUDT3; NUDT4; PPIP5K2; PPIP5K1 4 lowriskgenes REACTOME Mitochondrial Fatty Acid Beta-Oxidation 38 0.014087781 1 0.004589788 MCAT; PCCA; ACAD11; ECHS1; ACSM3; 8 lowriskgenes REACTOME ACOT9; HADH; MMAA CYP2E1 reactions 11 0.015871668 1 0.001743173 CYP2D6; CYP2A6; CYP2S1 3 lowriskgenes REACTOME PIWI-interacting RNA (piRNA) biogenesis 14 0.018556833 1 0.002318841 TDRD1; TDRD6; MAEL; TDRD9 4 lowriskgenes REACTOME Inositol phosphate metabolism 50 0.018708496 1 0.005694761 MIOX; PLCH2; PLCB4; NUDT3; NUDT4; 10 lowriskgenes REACTOME INPP5A; INPP5D; INPP5J; PPIP5K2; PPIP5K1 Signaling by NOTCH3 44 0.024346622 1 0.005136986 WWP2; HEY1; EGF; DLL1; EP300; PBX1; 9 lowriskgenes REACTOME MAMLD1; UBC; CREBBP APEX1-Independent Resolution of AP Sites via 7 0.025727636 1 0.00174216 POLB; LIG3; OGG1 3 lowriskgenes REACTOME the Single Nucleotide Replacement Pathway Toxicity of botulinum toxin type G (BoNT/G) 3 0.028037765 1 0.001163467 VAMP1; VAMP2 2 lowriskgenes REACTOME -enabled inhibition of pre-replication com- 9 0.028037765 1 0.001163467 CDK1; MCM8 2 lowriskgenes REACTOME plex formation TRAF3-dependent IRF activation pathway 13 0.03420592 1 0.002316155 IFIH1; IRF7; EP300; CREBBP 4 lowriskgenes REACTOME Synaptic adhesion-like molecules 21 0.03523289 1 0.002886836 GRIA3; LRFN3; GRIN2A; LRFN4; GRIN1 5 lowriskgenes REACTOME Intraflagellar transport 41 0.03662046 1 0.004574042 DYNLL2; IFT20; TNPO1; IFT74; TTC26; 8 lowriskgenes REACTOME IFT46; IFT81; TTC21B NOTCH3 Intracellular Domain Regulates Tran- 20 0.043225634 1 0.00288517 HEY1; EP300; MAMLD1; CREBBP; PBX1 5 lowriskgenes REACTOME scription DNA Mismatch Repair 9 0.008339762 1 0.002321532 PCNA; MSH6; RFC1; POLD1 4 lowriskgenes Wikipathways Hematopoietic Stem Cell Gene Regulation by 19 0.008593 1 0.003466205 DNMT3B; GZMB; SMAD4; GABPB1; 6 lowriskgenes Wikipathways GABP alpha-beta Complex EP300; CREBBP Senescence and Autophagy in Cancer 106 0.016982469 1 0.009449694 TNFSF15; IGF1; SLC39A2; ING1; ING2; 17 lowriskgenes Wikipathways VTN; SMAD4; RAF1; IRF7; GABARAPL1; GABARAPL2; ATG16L1; ATG7; ATG5; PCNA; PLAT; SQSTM1 Caloric restriction and aging 9 0.038146719 1 0.001741149 SIRT1; IGF1; AKT1 3 lowriskgenes Wikipathways Sudden Infant Death Syndrome (SIDS) Suscep- 159 0.045005202 1 0.010917031 DEAF1; SCN4B; GRIN1; SLC25A4; GATA2; 20 lowriskgenes Wikipathways tibility Pathways SCN3B; GAPDH; TSPYL1; LMX1B; ESR2; MAOA; CREBBP; HDAC1; EP300; SOX2; SPTBN1; VIPR2; NOS1AP; PBX1; VAMP2 Initiation of transcription and translation elon- 32 0.04710214 1 0.003450259 HDAC1; NFKBIA; SUPT5H; EP300; PPP3CC; 6 lowriskgenes Wikipathways gation at the HIV-1 LTR CREBBP Pathways Affected in Adenoid Cystic Carci- 63 0.049386532 1 0.005668934 FOXP2; RAF1; EP300; MAGI2; BRCA1; 10 lowriskgenes Wikipathways noma BRD1; AKT1; HIST1H2AL; IL17RD; CREBBP Sulfur relay system - Homo sapiens (human) 8 0.005018461 0.818003735 0.00232423 MOCS3; MPST; CTU1; TST 4 highriskgenes KEGG African trypanosomiasis - Homo sapiens (hu- 34 0.005096596 0.818003735 0.005166475 IDO1; APOA1; PRKCA; IFNG; HPR; THOP1; 9 highriskgenes KEGG man) HBA1; HBA2; IL6 beta-Alanine metabolism - Homo sapiens (hu- 31 0.017851169 1 0.004027618 AOC3; ALDH3A1; ABAT; HADHA; GAD1; 7 highriskgenes KEGG man) ALDH3B1; ALDH1B1 Hepatocellular carcinoma - Homo sapiens (hu- 168 0.018840372 1 0.013484358 SMARCC1; NQO1; ARAF; NFE2L2; SHC3; 25 highriskgenes KEGG man) ACTB; WNT7B; GADD45A; ARID2; TERC; FZD2; FZD5; TGFBR1; LEF1; WNT10A; LRP5; WNT3; WNT1; GSTM3; CDK4; CDK6; PLCG2; ACTL6A; PRKCA; E2F1 Salmonella infection - Homo sapiens (human) 86 0.02651946 1 0.007851935 IFNG; CXCL1; CXCL2; ARPC3; ACTB; 14 highriskgenes KEGG FLNB; FLNC; MYH9; ARPC5L; IL6; ARPC1B; PKN2; DYNC1H1; CASP1 Malaria - Homo sapiens (human) 49 0.027884803 1 0.005136986 HBA1; HBA2; THBS2; IL6; IFNG; COMP; 9 highriskgenes KEGG LRP1; CSF3; TLR2 Mineral absorption - Homo sapiens (human) 51 0.031868233 1 0.005134056 SLC26A9; MT1X; MT1M; SLC34A2; TRPV6; 9 highriskgenes KEGG FTL; ATP1A3; STEAP2; SLC8A1 Tyrosine metabolism - Homo sapiens (human) 36 0.035764986 1 0.00401837 AOC3; ALDH3A1; FAHD1; GOT1; ADH1B; 7 highriskgenes KEGG ALDH3B1; COMT

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S3: HNSC gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping Oocyte meiosis - Homo sapiens (human) 125 0.049580177 1 0.009387079 ESPL1; CAMK2A; CDC20; CDK2; ANAPC7; 17 highriskgenes KEGG RPS6KA1; CALML4; SPDYA; PPP2R5E; SKP1; PTTG2; CPEB1; SPDYE1; CPEB2; YWHAB; YWHAG; CCNB2 Synthesis of PE 14 0.004319961 1 0.002900232 CPT1B; PHOSPHO1; PCYT2; ETNK2; 5 highriskgenes REACTOME ETNK1 Scavenging of heme from plasma 12 0.004319961 1 0.002900232 LRP1; HPR; HBA1; HBA2; APOA1 5 highriskgenes REACTOME RUNX1 and FOXP3 control the development of 10 0.008322419 1 0.00232288 CTLA4; IL2RA; IFNG; CBFB 4 highriskgenes REACTOME regulatory T lymphocytes (Tregs) G2 Phase 5 0.008557166 1 0.001745201 CCNA2; CDK2; E2F1 3 highriskgenes REACTOME Reelin signalling pathway 5 0.008557166 1 0.001745201 DAB1; RELN; VLDLR 3 highriskgenes REACTOME Glycerophospholipid biosynthesis 133 0.010844308 1 0.01103144 HADHA; GPAM; LPCAT4; ETNK2; 20 highriskgenes REACTOME PCYT1A; ETNK1; PCYT2; CSNK2A2; CPT1B; PITPNB; PTDSS1; LCLAT1; PITPNM3; CPNE1; DDHD2; LPCAT3; PHOSPHO1; SLC44A5; PEMT; CPNE3 Collagen formation 92 0.012195197 1 0.008417508 COL11A1; COL11A2; P4HA3; COL23A1; 15 highriskgenes REACTOME PCOLCE; COL9A3; SERPINH1; PPIB; COL10A1; LOXL1; PLOD3; PLOD1; CTSB; ITGB4; COL6A2 Collagen biosynthesis and modifying 68 0.014463281 1 0.006798867 COL10A1; COL11A1; COL11A2; PCOLCE; 12 highriskgenes REACTOME P4HA3; PLOD3; PLOD1; COL9A3; SER- PINH1; COL23A1; COL6A2; PPIB Binding and Uptake of Ligands by Scavenger 41 0.015415948 1 0.005148741 STAB2; APOA1; MSR1; HPR; HBA1; HBA2; 9 highriskgenes REACTOME Receptors FTL; LRP1; HSP90B1 PTK6 Regulates Cell Cycle 6 0.015846172 1 0.001744186 PTK6; CDK2; CDK4 3 highriskgenes REACTOME RHO GTPases Activate WASPs and WAVEs 37 0.019913623 1 0.004587156 CYFIP2; ARPC3; ACTB; ARPC1B; WIPF2; 8 highriskgenes REACTOME WIPF1; ACTR3; BTK Neutrophil degranulation 485 0.024465713 1 0.02752729 HSPA8; DBNL; ORM1; SNAP23; TRAPPC1; 58 highriskgenes REACTOME PRG2; FAF2; FTL; PSMD12; SIRPB1; GLA; RAP1B; IGF2R; S100A9; TMEM63A; MME; PGLYRP1; GYG1; ALDH3B1; CD44; C5AR1; TMEM173; DYNLT1; METTL7A; TMEM179B; SLC11A1; GCA; PGAM1; ATP6V0C; PRDX6; SVIP; IQ- GAP1; DNAJC13; DYNC1H1; TRIM24; CTSB; CXCL1; XRCC5; PSMD11; KRT1; PSMA5; DSG1; PLAUR; MOSPD2; ATP11B; RNASE3; PA2G4; ADAM8; TLR2; ATP8B4; FUCA2; GLB1; CD58; DEFA1B; CPNE3; CPNE1; MMP8; GLIPR1 2-LTR circle formation 7 0.025687478 1 0.001743173 BANF1; XRCC5; XRCC4 3 highriskgenes REACTOME Erythrocytes take up oxygen and release carbon 9 0.025687478 1 0.001743173 CA4; HBA1; HBA2 3 highriskgenes REACTOME dioxide Immune System 1827 0.026871103 1 0.05670272 HSPA8; ORM1; TRAPPC1; CLEC2B; 173 highriskgenes REACTOME CLEC2D; LILRB4; ANAPC7; RPS6KA1; PSMD11; PSMD12; IGF2R; PELI1; PG- LYRP1; CBLB; CD44; EIF4E2; CFI; TRIM11; TRIM10; TMEM179B; GCA; ATP11B; CD300LF; PRDX6; SVIP; WIPF2; WIPF1; IFI27; TRIM29; TRIM24; CD209; C4BPB; EGR1; HLA-DQB1; RNF138; PTPN4; PSMA5; DSG1; LGALS9; PLAUR; SIRPB1; IFI35; PA2G4; SOCS3; ARPC1B; ADAM8; CCNF; IL4R; ARPC3; ATP8B4; HIST1H3J; NFKB2; DEFA1B; TRAF2; CUL5; CAMK2A; DBNL; CD1B; HSP90B1; TNFRSF4; GLMN; GLA; RAP1B; CD200R1; IL2RA; MME; PLCG2; CTSB; NCR1; SEC61B; FUCA2; TMEM173; NFATC3; SKP1; DNAJC13; UBE2J2; MAP3K8; CXCL1; ATG12; UBE2M; KRT1; NLRX1; IL19; IL6; MO- SPD2; IL23R; RNASE3; TLR2; RNF7; HLA-G; FAF2; SEC23A; CDC20; ACTR3; GLB1; CPNE3; CPNE1; USP18; IL27RA; COLEC10; RASGRP4; PRG2; HCK; S100A9; TMEM63A; AP1S2; GYG1; BTK; C5AR1; DYNLT1; IFITM2; RNF19B; RNF19A; CSF1; CSF3; ACTB; PROS1; UNC93B1; UBE2E1; FBXO22; IQGAP1; NOD1; YWHAB; DYNC1H1; MMP8; CTLA4; KIR3DL1; PPP2R5E; CYFIP2; YES1; CASP1; STUB1; KLHL5; BTN3A2; RIPK1; TXK; GLIPR1; IL2RB; SNAP23; ATP6V1F; ATP6V1A; FTL; FLNB; EIF4A1; ITGB1; ALDH3B1; CREB1; METTL7A; ITGB7; IL9R; PGAM1; BTLA; ATP6V0C; DUSP7; CD101; CD247; XRCC5; PRR5; TNFRSF17; GBP3; TNFRSF1A; ICAM4; PPL; GAN; UBOX5; ASB16; DNM2; IFNG; BTBD6; CD58; SEC31A; SEC22B; SLC11A1 PTK6 Down-Regulation 3 0.028006291 1 0.001164144 SRMS; PTK6 2 highriskgenes REACTOME Synthesis of glycosylphosphatidylinositol 18 0.028158923 1 0.002890173 PIGG; PIGN; PIGP; PIGX; PIGY 5 highriskgenes REACTOME (GPI)

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S3: HNSC gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping SCF(Skp2)-mediated degradation of p27/p21 18 0.028158923 1 0.002890173 CCNA2; PTK6; SKP1; CDK4; CDK2 5 highriskgenes REACTOME Signaling by PTK6 54 0.031364096 1 0.005685048 PTK6; GPNMB; ERBB4; SRMS; BCAR1; 10 highriskgenes REACTOME EPAS1; SOCS3; CDK2; CDK4; EREG Signaling by Non-Receptor Tyrosine 54 0.031364096 1 0.005685048 PTK6; GPNMB; ERBB4; SRMS; BCAR1; 10 highriskgenes REACTOME EPAS1; SOCS3; CDK2; CDK4; EREG FGFR4 ligand binding and activation 14 0.038088888 1 0.00174216 FGF18; FGFR4; FGF17 3 highriskgenes REACTOME CHL1 interactions 9 0.038088888 1 0.00174216 ITGB1; HSPA8; CHL1 3 highriskgenes REACTOME Mismatch repair (MMR) directed by 14 0.044112555 1 0.002316155 RPA1; MLH1; RPA3; LIG1 4 highriskgenes REACTOME MSH2:MSH3 (MutSbeta) TP53 Regulates Transcription of Genes In- 14 0.044112555 1 0.002316155 CCNA2; E2F1; CDK2; E2F8 4 highriskgenes REACTOME volved in G1 Cell Cycle Arrest Early Phase of HIV Life Cycle 15 0.044112555 1 0.002316155 LIG1; BANF1; XRCC5; XRCC4 4 highriskgenes REACTOME Mismatch repair (MMR) directed by 14 0.044112555 1 0.002316155 MLH1; RPA1; RPA3; LIG1 4 highriskgenes REACTOME MSH2:MSH6 (MutSalpha) Synthesis of PC 29 0.04698885 1 0.003452244 CPT1B; PHOSPHO1; SLC44A5; CSNK2A2; 6 highriskgenes REACTOME PCYT1A; PEMT Kennedy pathway from Sphingolipids 13 0.000916315 0.550705476 0.003480278 PCYT1A; PTDSS1; PEMT; PCYT2; ETNK2; 6 highriskgenes Wikipathways ETNK1 Interleukin-10 signaling 38 0.007625464 1 0.00516055 CCR1; CCL20; CCL22; CXCL1; CXCL2; TN- 9 highriskgenes Wikipathways FRSF1A; CSF1; CSF3; IL6 Tumor suppressor activity of SMARCB1 31 0.007744666 1 0.004600345 SMARCC1; EED; H3F3A; H3F3B; GLI4; 8 highriskgenes Wikipathways CDK6; CDK4; ACTL6A Cytokines and Inflammatory Response 28 0.008568904 1 0.003468208 CSF1; IFNG; CXCL1; CXCL2; CSF3; IL6 6 highriskgenes Wikipathways Signaling by PTK6 2 0.010001747 1 0.001164822 SOCS3; PTK6 2 highriskgenes Wikipathways Alanine and aspartate metabolism 12 0.012783485 1 0.002321532 ABAT; GPT; GOT1; GAD1 4 highriskgenes Wikipathways RUNX1 and FOXP3 control the development of 7 0.015846172 1 0.001744186 IL2RA; IFNG; CTLA4 3 highriskgenes Wikipathways regulatory T lymphocytes (Tregs) Transcriptional Regulation by MECP2 30 0.022560624 1 0.003460208 GAD1; PVALB; MOBP; CREB1; PTPN4; 6 highriskgenes Wikipathways TRPC3 Purine metabolism 7 0.025687478 1 0.001743173 ATIC; ADA; AMPD1 3 highriskgenes Wikipathways Secretion of Hydrochloric Acid in Parietal Cells 4 0.028006291 1 0.001164144 CCKBR; HRH2 2 highriskgenes Wikipathways ApoE and miR-146 in inflammation and 8 0.038088888 1 0.00174216 TRAF4; TLR2; NFKB2 3 highriskgenes Wikipathways atherosclerosis Spinal Cord Injury 117 0.039683084 1 0.009402655 GADD45A; RTN4R; AQP4; IFNG; CXCL1; 17 highriskgenes Wikipathways CXCL2; IL6; EGR1; CDK2; PRKCA; GJA1; EFNB2; VCAN; CDK4; FOXO3; SOX9; E2F1 NAD+ biosynthetic pathways 22 0.043132587 1 0.002886836 CD38; IDO1; SIRT3; SIRT6; SIRT5 5 highriskgenes Wikipathways Breast cancer pathway 154 0.044118246 1 0.011425462 ARAF; GADD45A; SHC3; FZD2; CSNK2A2; 21 highriskgenes Wikipathways WNT7B; RAD51; SKP1; WNT10A; FZD5; NFKB2; FGF17; LEF1; LRP5; WNT3; WNT1; CDK4; CDK6; FGF18; E2F1; DLL4 Genotoxicity pathway 64 0.049223554 1 0.00567215 CBLB; TNFRSF17; NLRX1; B3GNT2; 10 highriskgenes Wikipathways FBXO22; SEL1L; MEX3B; BRMS1L; E2F8; GADD45A

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S4: HNSC gene set enrichment analysis (GSEA)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme

B cell receptor signaling pathway - Homo sapi- 0.009615951 0.622169841 -0.372038015 -1.563666442 489 69 PPP3CC; NFKBIA; RAF1; SYK; INPP5D; KEGG ens (human) FCGR2B; CD79A; CD19; RASGRP3; AKT1; RAC2; PIK3R2; MAP2K2; MAPK1; CARD11; PIK3CB; BLNK; MALT1; SOS2; PIK3CA; PPP3CA Fc epsilon RI signaling pathway - Homo sapi- 0.014185095 0.622169841 -0.376578882 -1.538260898 720 60 PDPK1; RAF1; MAP2K6; SYK; INPP5D; FCER1A; KEGG ens (human) MS4A2; AKT1; RAC2; MAPK10; PIK3R2; PLA2G4D; MAP2K2; MAPK1; ALOX5AP; MAPK8; PIK3CB; JMJD7-PLA2G4B; SOS2; PIK3CA; MAP2K7; AKT3 Cell cycle - Homo sapiens (human) 0.00644264 0.622169841 0.325408317 1.516089026 314 118 CDK6; CCNA2; CCNB2; PTTG2; ESPL1; YWHAG; KEGG E2F1; GADD45A; MCM6; ANAPC7; YWHAB; CCNB3; TTK; CDK2; SKP1; CDC20; CDK4; CDKN2B; CHEK2; FZR1; CUL1; PRKDC; CDC7; E2F5; WEE1; CDKN1C; HDAC2; ANAPC10; CCNH; ; YWHAZ; TP53 beta-Alanine metabolism - Homo sapiens (hu- 0.037678517 0.734661535 0.434460516 1.499564472 1859 28 ALDH3B1; ALDH1B1; AOC3; HADHA; ABAT; KEGG man) ALDH3A1; GAD1; ALDH6A1; SMOX Protein processing in endoplasmic reticulum - 0.028814642 0.734661535 0.276750889 1.339975858 1397 153 TRAF2; EDEM3; STT3B; HSPA8; EIF2AK4; KEGG Homo sapiens (human) MAN1A2; HSPA4L; SVIP; STUB1; SEC31A; SEC23A; UBE2J2; SSR2; HSP90B1; NFE2L2; SEC61B; DNAJC10; PLAA; PPP1R15A; SKP1; SEL1L; UBQLN2; DERL3; PDIA4; CUL1; RNF185; DNAJB1; MAN1A1; SSR4; RPN1; DNAJC5B; CAPN1; HSP90AA1; UBE2D1; EIF2AK1; BAK1; DNAJC5; RPN2; STT3A; TRAM1; DNAJB11; BCL2; PREB; SEC24B; RAD23A Autophagy - animal - Homo sapiens (human) 0.009747426 0.622169841 -0.321369757 -1.479444734 498 115 PDPK1; ATG2B; GABARAPL2; RAF1; KEGG GABARAPL1; STK11; ATG9A; ATG7; ATG5; ATG16L1; AKT1; MAPK10; PIK3R2; RRAGD; MAP2K2; MAPK1; RRAGB; MAPK8; PIK3CB; SNAP29; ATG4A; RPS6KB1; MAP3K7; DDIT4; ATG2A; IRS1; PIK3CA; CAMKK2; GABARAP; RB1CC1; ITPR1; AKT3; PPP2CA; PPP2CB; TRAF6; NRBF2; PRKACA; NRAS; BAD; MTOR FoxO signaling pathway - Homo sapiens (hu- 0.022368961 0.733204828 -0.299966974 -1.394288531 1144 122 PDPK1; EGF; GABARAPL2; RAF1; SMAD4; KEGG man) IGF1; GABARAPL1; STK11; EP300; SIRT1; FBXO25; CREBBP; AKT1; MAPK10; PIK3R2; FOXO4; CDKN1B; MAP2K2; MAPK1; FOXO1; TGFB3; MAPK8; PIK3CB; ATM; FASLG; IRS1; PRKAG1; SOS2; PIK3CA; G6PC3; SLC2A4; AGAP2; GABARAP; PRMT1; AKT3; HOMER2; BCL2L11; CCND2 African trypanosomiasis - Homo sapiens (hu- 0.006396588 0.622169841 0.472828815 1.711557555 314 34 PRKCA; HBA1; THOP1; HBA2; APOA1; IDO1; KEGG man) IFNG; HPR; IL6; IL18; ICAM1; IL12B; GNAQ; FAS; APOL1; PRKCG; IL1B; SELE; PLCB3; VCAM1; TNF; PLCB2; HBB Autophagy - other - Homo sapiens (human) 0.014763352 0.622169841 -0.470333594 -1.630410796 747 29 ATG2B; GABARAPL2; GABARAPL1; ATG9A; KEGG ATG7; ATG5; ATG16L1; ATG4A; ATG2A; GABARAP; PPP2CA; PPP2CB Hepatocellular carcinoma - Homo sapiens (hu- 0.039846049 0.734661535 0.266959778 1.303578308 1935 162 PRKCA; WNT3; CDK6; FZD2; WNT1; NQO1; KEGG man) TERC; ARAF; LRP5; WNT10A; ACTL6A; ACTB; E2F1; GADD45A; ARID2; WNT7B; FZD5; NFE2L2; GSTM3; SHC3; SMARCC1; LEF1; TGFBR1; PLCG2; CDK4; PLCG1; WNT9A; WNT10B; FZD8; IGF2; GSTA4; MGST1; TP53; BAK1; WNT2B; APC2; SMAD3; CSNK1A1L; ACTG1; PRKCG; WNT7A; MAPK3; WNT16; MGST3; TGFB2 cAMP signaling pathway - Homo sapiens (hu- 0.031108092 0.734661535 -0.270346575 -1.323492499 1602 172 VIPR2; PDE4B; GRIA3; PTCH1; NFKBIA; RAF1; KEGG man) PDE3B; ATP1B1; PDE10A; EP300; GRIN3A; GRIN2A; ADCY2; ADORA1; CNGA4; CREBBP; ATP2A2; CHRM1; AKT1; RAC2; MAPK10; ATP2B3; GRIN1; PIK3R2; ROCK1; FFAR2; MYL9; MAP2K2; MAPK1; MAPK8; PIK3CB; PT- GER2; TIAM1; GLI1; RYR2; GHRL; PIK3CA; CALM2; ADCYAP1R1; CAMK2G; FXYD1; AKT3; CACNA1C; ATP2B4; CAMK2D; ATP1B2; ACOX3; GABBR2; PRKACA; BAD; RAPGEF4 Mineral absorption - Homo sapiens (human) 0.03957623 0.734661535 0.3745293 1.445757807 1949 45 TRPV6; FTL; SLC26A9; SLC8A1; SLC34A2; KEGG ATP1A3; MT1M; STEAP2; MT1X; MT1F; TRPM6; CYBRD1 Toxoplasmosis - Homo sapiens (human) 0.035961542 0.734661535 -0.297539113 -1.358098979 1843 109 PDPK1; LAMB2; XIAP; NFKBIA; PIK3R5; KEGG MAP2K6; HSPA1B; IL12A; HLA-DMB; BIRC7; LAMB1; HSPA1L; AKT1; MAPK10; CIITA; MAPK1; LAMA4; TGFB3; MAPK8; HLA-DOB; MAP3K7; PIK3R6; TYK2; HLA-DOA; HSPA1A; CYCS; IL10RA; AKT3; TLR4; GNAO1; IFNGR2 Pathogenic Escherichia coli infection - Homo 0.028096261 0.734661535 0.370263637 1.476426685 1386 52 ARPC3; PRKCA; TUBA1A; ACTB; ITGB1; KEGG sapiens (human) ARPC5L; ARPC1B; CTTN; NCK1; HCLS1; YW- HAZ; TUBB2B; TUBB; ACTG1; EZR; TUBA4A; WASL; ARPC4; ROCK2; CDC42; TUBA3D; ARPC2

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Proteoglycans in cancer - Homo sapiens (hu- 0.017969898 0.662640002 0.270706229 1.353055352 867 192 PRKCA; WNT3; KDR; FZD2; WNT1; ERBB4; KEGG man) ARAF; VEGFA; WNT10A; ACTB; ITGB1; SHH; TLR2; TWIST2; IQGAP1; FLNC; WNT7B; FZD5; IHH; CD44; CAMK2A; CTTN; PLAUR; HSPB2; FLNB; PLCG2; PLCG1; HSPG2; WNT9A; MMP2; WNT10B; HCLS1; FZD8; ARHGEF12; PPP1R12C; MDM2; IL12B; HIF1A; IGF2; TP53; RRAS; MSN; WNT2B; ITGB5; VAV2; ACTG1; PPP1CC; FAS; THBS1; PTPN11; PRKCG; ITGB3; WNT7A; CAV1; ARHGEF1; EZR; MAPK3; WNT16; ERBB3; TGFB2 Proteasome - Homo sapiens (human) 0.011155508 0.622169841 0.422994592 1.608623215 549 42 PSMA4; PSMD11; PSMA5; PSMD12; PSMB10; KEGG IFNG; PSMA7; PSMD7; PSME2; PSMD2; PSMC3; PSMA1; PSMB5; PSMD13; PSMC1 Neutrophil degranulation 0.001029866 0.489701339 0.255683343 1.407114861 48 448 CD58; PGAM1; SNAP23; S100A9; XRCC5; Reactome ALDH3B1; DYNLT1; MMP8; FUCA2; MME; PRG2; TRAPPC1; HSPA8; DBNL; PRDX6; PSMD11; FTL; TMEM63A; METTL7A; RNASE3; ATP11B; TRIM24; TMEM179B; ATP6V0C; ATP8B4; SVIP; IGF2R; ORM1; CPNE1; PSMA5; DSG1; TLR2; GLA; IQGAP1; DYNC1H1; DNAJC13; GYG1; DEFA1B; RAP1B; PSMD12; GLIPR1; CPNE3; PA2G4; MOSPD2; SLC11A1; GLB1; PGLYRP1; GCA; FAF2; CD44; C5AR1; PLAUR; SIRPB1; CTSB; CXCL1; ADAM8; TMEM173; KRT1; PSMD7; RAB9B; C1orf35; LAMP1; SELL; RAB5B; MAN2B1; FPR2; CD68; PYGB; NCSTN; A1BG; FCGR3B; PSMD2; PPBP; AGA; NME1-NME2; ANO6; TICAM2; CAB39; PNP; TCIRG1; PSMC3; ALOX5; CXCR1; CAPN1; DOK3; HSP90AA1; COTL1; MGST1; PTPRC; NFAM1; CYFIP1; APRT; DNAJC5; S100A8; ARL8A; QSOX1; TUBB; CR1; RHOF; OSCAR; PLD1; STK10; ATAD3B; RHOG; PPIE; PSMD13; RAB10; STBD1; PYGL; PSEN1; EEF2; CD93; SLCO4C1; MNDA; ALDOC; PTX3; CTSC p75NTR signals via NF-kB 0.0492891 0.939960889 -0.511101082 -1.511559311 2495 16 NFKBIA; UBC; SQSTM1; PRKCI; UBA52 Reactome Synthesis of Leukotrienes (LT) and Eoxins 0.009007588 0.939960889 -0.555523767 -1.724026397 456 19 ALOX15; CYP4F3; GGT5; CYP4B1; ALOX5AP; Reactome (EX) DPEP1; GGT1; MAPKAPK2; CYP4F22; DPEP3 Metabolism of polyamines 0.031602892 0.939960889 0.40786959 1.501391018 1555 36 CPS1; NQO1; ODC1; SLC6A7; GOT1; PAOX; Reactome ADI1; CKM; ASL; GATM; AZIN1; SMOX; CKB Collagen formation 0.039773659 0.939960889 0.316265709 1.376092502 1939 80 COL10A1; PCOLCE; COL23A1; PLOD3; Reactome COL11A1; LOXL1; ITGB4; P4HA3; COL11A2; COL6A2; PLOD1; PPIB; COL9A3; CTSB; SER- PINH1; COL5A2; BMP1; ADAMTS14; PLEC; COL28A1; PXDN; P4HA1; COL3A1; COL6A1; ITGA6; COL8A2; COL4A4; MMP1; TLL1 Xenobiotics 0.017827111 0.939960889 -0.573694865 -1.664943148 900 15 CYP2S1; CYP2D6; CYP2A6; CYP3A4; CYP3A7; Reactome ARNT2; CYP2C8; CYP2J2 Mitochondrial Fatty Acid Beta-Oxidation 0.011525103 0.939960889 -0.458875791 -1.64222078 583 33 MCAT; ACSM3; MMAA; HADH; PCCA; ECHS1; Reactome ACAD11; ACOT9; ACAA2 Fc epsilon receptor (FCERI) signaling 0.026078117 0.939960889 -0.342288788 -1.444041568 1333 70 TAB3; PDPK1; MAP3K1; NFKBIA; SYK; ITPR3; Reactome LAT2; AHCYL1; ITPR2; MAPK10; ITK; PIK3R2; MAPK1; CARD11; MAPK8; PIK3CB; MAP3K7; MALT1; PIK3CA; PPP3CA; MAP2K7; ITPR1; GRAP2 Mitotic Spindle Checkpoint 0.024735561 0.939960889 0.309686101 1.404496566 1208 101 SPC24; KIF2C; ZW10; CLIP1; UBE2E1; PPP2R5E; Reactome ANAPC7; DYNC1H1; RANGAP1; CDCA8; ZWILCH; CENPQ; CDC20; CKAP5; NSL1; AHCTF1; SKA1; NDE1; SPC25; ERCC6L; ANAPC10; CENPO; AURKB; UBE2D1; UBE2C; BIRC5; PPP1CC; INCENP; RPS27; NDC80; CLASP1; KNTC1; ANAPC11 Cell Cycle Checkpoints 0.040547233 0.939960889 0.25051819 1.272096593 1964 216 CCNA2; SPC24; CCNB2; KIF2C; HIST1H4E; Reactome MCM10; ZW10; CLIP1; UBE2E1; RAD9B; YWHAG; PPP2R5E; MCM6; ANAPC7; RPA3; DYNC1H1; YWHAB; RANGAP1; CDCA8; ZWILCH; RPA1; HIST1H4A; CDK2; TP53BP1; CENPQ; CDC20; CKAP5; RBBP8; NBN; CHEK2; NSL1; AHCTF1; SKA1; TOPBP1; HIST1H4D; CDC7; NDE1; SPC25; WEE1; ERCC6L; ANAPC10; MDM2; CENPO; YWHAZ; AURKB; TP53; UBE2D1; PHF20; HIST3H2BB; UBE2C; BIRC5; PPP1CC; HIST1H2BO; CDC25A; INCENP; RPS27; NDC80; BARD1; HUS1; CLASP1; KNTC1; RPA2; HIST1H2BF; ANAPC11; HIST1H2BD EPHB-mediated forward signaling 0.019582457 0.939960889 0.44202519 1.582257976 967 32 ARPC3; ACTR3; ACTB; YES1; ARPC1B; LIMK2; Reactome ACTG1; CFL1; WASL; ARPC4; RASA1; ROCK2; CDC42; ARPC2; SDC2 G alpha (s) signalling events 0.01497082 0.939960889 -0.311660368 -1.435912002 766 115 VIPR2; PDE4B; RAMP3; GNAL; GNG2; GPR15; Reactome PDE3B; AVPR2; PDE10A; GNG11; ADM; ADCY2; MC1R; REEP3; REEP1; GRK6; ADCYAP1; GPR27; INSL3; PTGER2; REEP6; GNB2; CALCRL; OR2A7; VIPR1; ADCYAP1R1; HTR7; PTGER4; PDE8A; GPR84; PDE7B; REEP5; PTGDR

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Biosynthesis of DHA-derived SPMs 0.04782256 0.939960889 -0.523833477 -1.520238392 2416 15 ALOX15; CYP2D6; GSTM4; HPGD Reactome Biosynthesis of specialized proresolving media- 0.019611325 0.939960889 -0.544865464 -1.639949165 993 17 ALOX15; CYP2D6; GSTM4; HPGD; ALOX5AP; Reactome tors (SPMs) CYP3A4 DNA Replication 0.029718777 0.939960889 0.332906198 1.420138441 1451 72 LIG1; CCNA2; MCM10; UBE2E1; GINS2; MCM6; Reactome ANAPC7; RPA3; RPA1; CDK2; SKP1; FZR1; RPA4; CUL1; PRIM1; CDC7; POLE3; ANAPC10; POLA1; UBE2D1; UBE2C; FEN1; CDT1; POLD2; GINS1; PRIM2; RPA2; ANAPC11; ANAPC2; DNA2 NOTCH3 Intracellular Domain Regulates Tran- 0.007386049 0.939960889 -0.554602994 -1.74419649 373 20 PBX1; MAMLD1; EP300; HEY1; CREBBP; Reactome scription MAML3; SNW1; NOTCH3; MAML2; KAT2A Signaling by NOTCH3 0.044282478 0.939960889 -0.377169493 -1.435944029 2247 43 PBX1; EGF; DLL1; MAMLD1; EP300; HEY1; Reactome WWP2; UBC; CREBBP Striated Muscle Contraction 0.015764444 0.939960889 0.464731928 1.622506601 778 29 DES; TNNI2; TNNI3; TPM2; TMOD2; VIM; Reactome ACTN3; TMOD3; MYH3; MYBPC3 Regulation of KIT signaling 0.046898424 0.939960889 0.510831564 1.517383448 2314 16 PRKCA; SH2B3; YES1 Reactome Regulation of Activity at G2/M Transi- 0.042299971 0.939960889 0.315595016 1.370125745 2064 79 TUBA1A; NEK2; CCNB2; YWHAG; PRKAR2B; Reactome tion CSNK1E; DYNC1H1; ODF2; CEP72; SKP1; CEP152; CKAP5; SDCCAG8; CEP135; CUL1; CEP192; CEP250; SSNA1; NDE1; CSNK1D; HSP90AA1 Loss of Nlp from mitotic 0.033305511 0.939960889 0.343269053 1.42815318 1635 63 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E; Reactome DYNC1H1; ODF2; CEP72; CEP152; CKAP5; SDCCAG8; CEP135; CEP192; CEP250; SSNA1; NDE1; CSNK1D; HSP90AA1 Loss of proteins required for interphase micro- 0.033305511 0.939960889 0.343269053 1.42815318 1635 63 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E; Reactome tubule organization from the DYNC1H1; ODF2; CEP72; CEP152; CKAP5; SDCCAG8; CEP135; CEP192; CEP250; SSNA1; NDE1; CSNK1D; HSP90AA1 Sema3A PAK dependent Axon repulsion 0.01120295 0.939960889 0.579954407 1.722707207 552 16 PLXNA1; SEMA3A; HSP90AA1; PLXNA4; CFL1; Reactome NRP1; PAK2; PAK3; PLXNA3 AURKA Activation by TPX2 0.04087249 0.939960889 0.333095955 1.398178779 2004 66 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E; Reactome DYNC1H1; ODF2; CEP72; CEP152; CKAP5; SDCCAG8; CEP135; CEP192; CEP250; SSNA1; NDE1; CSNK1D; HSP90AA1 Common Pathway of Fibrin Clot Formation 0.005393678 0.939960889 0.599478446 1.811376587 265 17 SERPINC1; PROCR; PROS1; PF4V1; PROC; F2; Reactome F13A1 Formation of Fibrin Clot (Clotting Cascade) 0.03314783 0.939960889 0.434728921 1.517757877 1637 29 SERPINC1; PROCR; A2M; PROS1; PF4V1; PROC; Reactome F2; F13A1 Regulation of TP53 Activity through Acetyla- 0.016126801 0.939960889 -0.472019643 -1.621441316 815 28 ING5; BRD1; MAP2K6; EP300; PIP4K2B; HDAC1; Reactome tion ING2; AKT1; RBBP4; CHD4; AKT3; MTA2; GATAD2A; PIP4K2A; PML Mitotic Prometaphase 0.005201801 0.939960889 0.300752591 1.471925804 252 165 NCAPG; TUBA1A; SPC24; NEK2; CCNB2; KIF2C; Reactome ZW10; CLIP1; YWHAG; PRKAR2B; PPP2R5E; CSNK1E; DYNC1H1; RANGAP1; CDCA8; ZWILCH; ODF2; CEP72; CENPQ; CEP152; CDC20; CSNK2A2; CKAP5; NSL1; AHCTF1; SDCCAG8; CEP135; SKA1; CEP192; CEP250; SMC2; SSNA1; NDE1; SPC25; ERCC6L; CSNK1D; CENPO; HSP90AA1; AURKB; TUBG2 M Phase 0.022609993 0.939960889 0.243127544 1.285007054 1090 300 NCAPG; PRKCA; TUBA1A; SPC24; NEK2; Reactome CCNB2; H3F3B; ESPL1; KIF2C; H3F3A; HIST1H4E; ZW10; CLIP1; UBE2E1; BANF1; YWHAG; HIST1H3J; PRKAR2B; PPP2R5E; CSNK1E; ANAPC7; NUP153; DYNC1H1; RAN- GAP1; CDCA8; ZWILCH; ODF2; NUP214; HIST1H4A; CEP72; CENPQ; H2AFJ; CEP152; CDC20; CSNK2A2; CKAP5; NSL1; AHCTF1; SDCCAG8; CEP135; NCAPD3; SKA1; CEP192; KIF23; CEP250; HIST1H4D; SMC2; SSNA1; NDE1; SPC25; ERCC6L; TMPO; NEK6; ANAPC10; CSNK1D; CENPO; HSP90AA1; AURKB; UBE2D1; TUBG2; RAB1A; HIST3H2BB; TUBB; MCPH1; UBE2C; BIRC5; MASTL; PPP1CC WNT ligand biogenesis and trafficking 0.02404509 0.939960889 0.476837221 1.584745034 1185 24 WNT3; WNT1; SNX3; WNT10A; WNT7B; Reactome WNT9A; WNT10B; WNT2B; WNT7A; WNT16 APC/C-mediated degradation of cell cycle pro- 0.045564574 0.939960889 0.383674179 1.437702677 2239 39 CCNA2; NEK2; UBE2E1; ANAPC7; CDK2; Reactome teins SKP1; CDC20; FZR1; CUL1; ANAPC10; AURKB; UBE2D1; UBE2C Regulation of mitotic cell cycle 0.045564574 0.939960889 0.383674179 1.437702677 2239 39 CCNA2; NEK2; UBE2E1; ANAPC7; CDK2; Reactome SKP1; CDC20; FZR1; CUL1; ANAPC10; AURKB; UBE2D1; UBE2C Anchoring of the basal body to the plasma mem- 0.02712322 0.939960889 0.318816093 1.410924951 1325 88 TUBA1A; NEK2; TMEM67; YWHAG; PRKAR2B; Reactome brane CSNK1E; DYNC1H1; ODF2; TMEM216; CEP72; CEP152; CKAP5; SDCCAG8; CEP135; CEP192; CEP250; SEPT2; SSNA1; NDE1; AHI1; CSNK1D; HSP90AA1; CEP97; TUBB; TCTN3; MARK4 ISG15 antiviral mechanism 0.017670503 0.939960889 0.453998516 1.596648378 868 30 UBE2E1; EIF4E2; EIF4A1; FLNB; PLCG1; DDX58; Reactome MX1; EIF4E; PPM1B; ISG15; MAPK3; TRIM25; IFIT1; EIF4G2; NEDD4; STAT1; MX2; IRF3 Antiviral mechanism by IFN-stimulated genes 0.017670503 0.939960889 0.453998516 1.596648378 868 30 UBE2E1; EIF4E2; EIF4A1; FLNB; PLCG1; DDX58; Reactome MX1; EIF4E; PPM1B; ISG15; MAPK3; TRIM25; IFIT1; EIF4G2; NEDD4; STAT1; MX2; IRF3

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Intraflagellar transport 0.000511398 0.486339765 -0.517400034 -1.926267214 25 39 IFT81; TTC26; TNPO1; IFT74; DYNLL2; IFT46; Reactome IFT20; TTC21B; TRIP11; CLUAP1; IFT122; TC- TEX1D2; TCTEX1D1; KIF3B; DYNLRB1; IFT172; WDR34; KIF3A Cell Cycle 0.004008222 0.685405961 0.337713106 1.563630653 194 114 CDK6; CCNA2; CCNB2; PTTG2; ESPL1; YWHAG; Wikipathways E2F1; GADD45A; MCM6; ANAPC7; YWHAB; CCNB3; TTK; CDK2; SKP1; CDC20; CDK4; CDKN2B; CHEK2; FZR1; CUL1; PRKDC; CDC7; E2F5; WEE1; CDKN1C; HDAC2; ANAPC10; CCNH; MDM2; YWHAZ; TP53 MicroRNAs in cardiomyocyte hypertrophy 0.01597737 0.875280385 -0.341910037 -1.485536233 818 82 PDPK1; EGF; RAF1; IGF1; MAP2K6; LIF; RCAN1; Wikipathways MYEF2; GATA4; AKT1; PIK3R2; CTF1; ROCK1; MAP2K2; MAPK1; MAPK8; PIK3CB; FZD1; NFATC4; MYLK; PIK3CA; HDAC5; PPP3CA; MAP2K7; AGT; MAP3K14; IKBKE Type II diabetes mellitus 0.016466505 0.875280385 -0.544795494 -1.6667906 834 18 PIK3R5; MAPK1; MAPK8; IRS1; SLC2A4; Wikipathways CACNA1A; PRKCZ; INS-IGF2; SOCS4; MTOR IL-1 signaling pathway 0.014160261 0.875280385 -0.384964932 -1.544912971 723 55 TAB3; MAP3K1; NFKBIA; MAP3K3; MAP2K6; Wikipathways SQSTM1; AKT1; PIK3R2; MAP2K2; MAPK1; MAPK8; IRAK3; ECSIT; MAP3K7; IL1A; MAP2K7; MAP3K14; PELI2; PRKCZ; MAP- KAPK2; IRAK2; TRAF6 Proteasome Degradation 0.031333251 0.875280385 0.34492225 1.43605414 1528 64 PSMA4; PSMD11; PSMA5; PSMD12; PSMB10; Wikipathways IFNG; HLA-G; PSMA7; PSMD7; PSME2; UBA1; PSMD2; RPN1; PSMC3; PSMD9; UBE2D1; PSMA1; PSMB5; PSMD13; PSMC1; RPN2 Pathogenic Escherichia coli infection 0.028055578 0.875280385 0.370263637 1.476113637 1368 52 ARPC3; PRKCA; TUBA1A; ACTB; ITGB1; Wikipathways ARPC5L; ARPC1B; CTTN; NCK1; HCLS1; YW- HAZ; TUBB2B; TUBB; ACTG1; EZR; TUBA4A; WASL; ARPC4; ROCK2; CDC42; TUBA3D; ARPC2 Parkin-Ubiquitin Proteasomal System pathway 0.049528639 0.875280385 0.324952116 1.369635292 2421 68 TUBA1A; RNF19A; HSPA8; PSMD11; CASP1; Wikipathways STUB1; UBE2J2; PSMD12; GPR37; PSMD7; CUL1; UBA1; PSMD2; HSPA7; PSMC3; CASP8; PSMD9; TUBB2B; TUBB; PSMD13; PSMC1; TUBA4A Structural Pathway of Interleukin 1 (IL-1) 0.048421218 0.875280385 -0.370206006 -1.416644582 2471 44 TAB3; MAP3K1; SAFB; MAP3K3; MAP2K6; Wikipathways IRF7; MBP; MAP2K2; MAPK1; MAP3K7; IL1A; MAP2K7; MAP3K14; MAPKAPK2; IRAK2; TRAF6 Differentiation of white and brown adipocyte 0.029121635 0.875280385 -0.47684147 -1.559342403 1481 23 PPARG; BMP7; BMP4; EBF3; PPARGC1B; Wikipathways CEBPA; HOXC9; SLC7A10; BMP2; SMAD9 Fatty Acid Biosynthesis 0.00303078 0.685405961 -0.569256695 -1.838913595 153 22 HADH; ECHDC2; ECH1; SCD; ECHS1; ECHDC3; Wikipathways ACAA2; ACLY; MECR Photodynamic therapy-induced NF-kB survival 0.035345617 0.875280385 0.423296372 1.500017557 1735 31 CXCL2; VEGFA; TNFRSF1A; NFKB2; IL6; Wikipathways signaling ICAM1; MMP2; BIRC5; IL1B; EGLN2; SELE; MMP1; VCAM1; TNF; PTGS2; RELA NAD+ biosynthetic pathways 0.013747102 0.875280385 0.531102312 1.678044213 675 20 SIRT6; CD38; SIRT3; SIRT5; IDO1; TNKS; TNKS2; Wikipathways QPRT; PARP2; NADSYN1; SIRT2; SIRT7; NAMPT Hematopoietic Stem Cell Gene Regulation by 0.044246392 0.875280385 -0.487433096 -1.512889891 2243 19 GZMB; SMAD4; EP300; GABPB1; CREBBP; Wikipathways GABP alpha-beta Complex DNMT3B miRNA regulation of prostate cancer signaling 0.030626487 0.875280385 -0.436269553 -1.525391842 1557 30 NFKBIA; RAF1; PDGFA; CREBBP; CDKN1B; Wikipathways pathways MAP2K2; MAPK1; FOXO1; PIK3CA; AKT3; BAD; MTOR; CCND1; CDKN1A; CREB3L1; AR; SOS1; GSK3B Interleukin-4 and Interleukin-13 signaling 0.020518514 0.875280385 0.325636153 1.441366796 997 88 IL23R; BCL6; IL4R; VEGFA; HSPA8; CCL11; Wikipathways FOXO3; CCL22; SOCS3; ITGB1; PIM1; IL6; GATA3; IL18; ICAM1; VIM; IRF4; MMP2; IL6R; F13A1; OSM; ALOX5; IL12B; HIF1A; HSP90AA1; TP53; MCL1; BIRC5 Tumor suppressor activity of SMARCB1 0.036534978 0.875280385 0.426008872 1.498118789 1794 30 CDK6; H3F3B; H3F3A; ACTL6A; GLI4; EED; Wikipathways SMARCC1; CDK4; EZH2 Cytokines and Inflammatory Response 0.043501197 0.875280385 0.488153519 1.522703351 2143 19 CSF3; CXCL2; IFNG; IL6; CXCL1; CSF1; IL11; Wikipathways IL12B; IL7; IL15; IL1B

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S5: Lung cancer gene set over-representation

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping

Glycosphingolipid biosynthesis - globo and 15 0.034141328 1 0.002317497 B3GALT5; GLA; ST8SIA1; FUT1 4 lowriskgenes KEGG isoglobo series - Homo sapiens (human) Ferroptosis - Homo sapiens (human) 40 0.036511049 1 0.004576659 ALOX15; VDAC2; SLC11A2; ATG7; 8 lowriskgenes KEGG MAP1LC3A; MAP1LC3B; LPCAT3; ACSL1 Glycosphingolipid biosynthesis - lacto and neo- 27 0.03979562 1 0.003454231 B3GNT2; B3GALT5; B3GALT2; ST8SIA1; 6 lowriskgenes KEGG lacto series - Homo sapiens (human) FUT5; FUT1 AMPK signaling pathway - Homo sapiens (hu- 120 0.046105873 1 0.009392265 CRTC2; IRS1; MAP3K7; PPP2R3A; CREB5; 17 lowriskgenes KEGG man) STRADA; STRADB; PIK3R3; PPARG; EIF4EBP1; PPP2R5A; PPP2R5B; TSC2; TSC1; RAB2A; PFKFB2; PFKFB4 Oxygen-dependent asparagine hydroxylation of 3 0.000999475 1 0.001747234 HIF1AN; HIF1A; EPAS1 3 lowriskgenes REACTOME Hypoxia-inducible Factor Alpha PTK6 Expression 5 0.008557166 1 0.001745201 EPAS1; HIF1A; PELP1 3 lowriskgenes REACTOME GABA A (rho) receptor activation 3 0.010001747 1 0.001164822 GABRR1; GABRR2 2 lowriskgenes REACTOME Interleukin-20 family signaling 15 0.018520034 1 0.002320186 IL19; STAT3; IL22RA2; IL22RA1 4 lowriskgenes REACTOME Intra-Golgi traffic 45 0.020991086 1 0.005142857 MAN2A1; NAPA; CUX1; MAN1C1; CYTH1; 9 lowriskgenes REACTOME TRIP11; CYTH3; RAB33B; GOSR1 NrCAM interactions 7 0.025687478 1 0.001743173 DLG3; ANK1; NRP2 3 lowriskgenes REACTOME Nucleobase catabolism 37 0.025756807 1 0.004022989 NT5M; NT5C; NT5C1B; ENTPD6; ENTPD5; 7 lowriskgenes REACTOME ENTPD3; NUDT18 Inhibition of TSC complex formation by PKB 3 0.028006291 1 0.001164144 TSC2; TSC1 2 lowriskgenes REACTOME Basigin interactions 26 0.033341329 1 0.003456221 SLC7A8; L1CAM; BSG; ITGA3; ATP1B1; 6 lowriskgenes REACTOME MAG LDL clearance 19 0.035155359 1 0.002888504 LIPA; NCEH1; AP2B1; NPC2; CLTA 5 lowriskgenes REACTOME Phosphate bond hydrolysis by NTPDase pro- 8 0.038088888 1 0.00174216 ENTPD6; ENTPD5; ENTPD3 3 lowriskgenes REACTOME teins Nef and 8 0.038088888 1 0.00174216 DOCK2; ELMO1; 3 lowriskgenes REACTOME Pyruvate metabolism 30 0.04698885 1 0.003452244 LDHAL6B; PDPR; BSG; LDHC; PDP1; PDK4 6 lowriskgenes REACTOME Exercise-induced Circadian Regulation 48 0.005708783 1 0.006274957 STBD1; CBX3; PSMA4; CLDN5; KLF9; 11 lowriskgenes Wikipathways G0S2; HERPUD1; UCP3; ETV6; GSTM3; GFRA1 Nanoparticle triggered autophagic cell death 22 0.011223673 1 0.003466205 TSC2; TSC1; ATG7; MAP1LC3A; ULK2; 6 lowriskgenes Wikipathways ATG4A Globo Sphingolipid Metabolism 20 0.028158923 1 0.002890173 B3GALT5; ST8SIA1; FUT1; ST6GALNAC1; 5 lowriskgenes Wikipathways ST6GALNAC2 Energy Metabolism 47 0.031868233 1 0.005134056 TFB1M; UCP3; MYBBP1A; SIRT3; PPARG; 9 lowriskgenes Wikipathways EP300; PPP3CC; MEF2A; NRF1 Ferroptosis 40 0.036511049 1 0.004576659 ALOX15; VDAC2; SLC11A2; LPCAT3; 8 lowriskgenes Wikipathways MAP1LC3A; ATG7; MAP1LC3B; ACSL1 Bladder cancer - Homo sapiens (human) 41 0.004980754 0.582276332 0.005730659 EGFR; MAP2K1; RB1; THBS1; MAPK1; 10 highriskgenes KEGG HRAS; MDM2; FGFR3; ; E2F3 One carbon pool by folate - Homo sapiens (hu- 20 0.006359388 0.582276332 0.003474233 ALDH1L2; MTHFD1L; AMT; FTCD; GART; 6 highriskgenes KEGG man) DHFR Central carbon metabolism in cancer - Homo 65 0.006432965 0.582276332 0.007373795 MAP2K1; HRAS; AKT2; PFKP; PFKM; HK3; 13 highriskgenes KEGG sapiens (human) LDHA; EGFR; FGFR3; G6PD; PDK1; MET; MAPK1 Tuberculosis - Homo sapiens (human) 178 0.008120665 0.582276332 0.014069264 AKT2; MRC1; IRAK1; RAB5C; CAMK2A; 26 highriskgenes KEGG CAMK2B; TLR2; TLR1; VDR; CD209; CARD9; TNFRSF1A; STAT1; MAPK1; IL18; FCGR3A; FCGR3B; BAD; CEBPB; CEBPG; BID; NOD2; MALT1; NOS2; FCGR2B; CASP3 Pentose phosphate pathway - Homo sapiens (hu- 30 0.009382451 0.582276332 0.004039238 H6PD; RBKS; DERA; TKTL1; PFKM; PFKP; 7 highriskgenes KEGG man) G6PD MicroRNAs in cancer - Homo sapiens (human) 299 0.01088367 0.582276332 0.013057671 HDAC4; DDIT4; MDM2; FGFR3; THBS1; 24 highriskgenes KEGG CD44; PRKCG; HRAS; HMOX1; MAP2K1; SPRY2; FZD3; MAPK1; IGF2BP1; SLC7A1; CASP3; DICER1; MET; EGFR; PLCG2; PLAU; E2F3; E2F2; CCNE2 Homologous recombination - Homo sapiens 41 0.013948896 0.603046601 0.004597701 XRCC2; RAD54B; RAD54L; POLD3; 8 highriskgenes KEGG (human) BARD1; BLM; RAD51; TOP3B ErbB signaling pathway - Homo sapiens (hu- 85 0.015029199 0.603046601 0.008417508 HRAS; EGFR; MAP2K1; BAD; ERBB4; 15 highriskgenes KEGG man) MAPK1; CAMK2A; CAMK2B; JUN; PRKCG; PLCG2; AKT2; STAT5A; EREG; BTC Melanoma - Homo sapiens (human) 72 0.018403282 0.645613478 0.006798867 EGFR; BAD; RB1; MAPK1; HRAS; FGF5; 12 highriskgenes KEGG MAP2K1; E2F3; E2F2; AKT2; MET; MDM2 Glioma - Homo sapiens (human) 71 0.020112569 0.645613478 0.007336343 MAP2K1; EGFR; AKT2; RB1; MAPK1; 13 highriskgenes KEGG HRAS; CAMK2A; CAMK2B; PRKCG; MDM2; E2F3; E2F2; PLCG2 Hepatitis C - Homo sapiens (human) 155 0.025040284 0.717658621 0.011487965 AKT2; OAS1; RSAD2; CLDN2; TNFRSF1A; 21 highriskgenes KEGG CLDN11; CLDN15; CLDN18; MAP2K1; TICAM1; STAT1; MAPK1; HRAS; E2F2; EGFR; RIPK1; BAD; BID; RB1; E2F3; CASP3 Non-small cell lung cancer - Homo sapiens (hu- 66 0.029026771 0.717658621 0.006783493 MAP2K1; PLCG2; BAD; RB1; MAPK1; 12 highriskgenes KEGG man) HRAS; EGFR; PRKCG; E2F3; E2F2; AKT2; STAT5A

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S5: Lung cancer gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping Hepatitis B - Homo sapiens (human) 144 0.029064056 0.717658621 0.010970927 PTK2B; AKT2; TLR2; JUN; MAP2K1; 20 highriskgenes KEGG VDAC3; TICAM1; STAT1; MAPK1; HRAS; E2F2; STAT5A; BAD; CCNA2; RB1; PRKCG; E2F3; CASP3; ATF6B; CCNE2 Cushing syndrome - Homo sapiens (human) 154 0.033390117 0.76558769 0.011462882 MEN1; CDKN2C; CAMK2A; CAMK2B; 21 highriskgenes KEGG PDE8A; ADCY4; PLCB2; ORAI1; CCNE2; ATF6B; RAP1B; FZD2; FZD3; MAPK1; EGFR; MAP2K1; RB1; DVL2; E2F3; E2F2; WNT11 Chronic myeloid leukemia - Homo sapiens (hu- 76 0.037176772 0.79558292 0.007311586 GAB2; MAP2K1; BAD; RB1; MAPK1; 13 highriskgenes KEGG man) HRAS; MDM2; E2F3; E2F2; PTPN11; AKT2; STAT5A; CTBP1 Toll-like receptor signaling pathway - Homo 104 0.049311336 0.817489614 0.007829978 AKT2; CXCL9; TLR2; TLR8; JUN; TLR1; 14 highriskgenes KEGG sapiens (human) TICAM1; STAT1; MAPK1; IRAK1; SPP1; RIPK1; MAP2K1; TAB1 Melanogenesis - Homo sapiens (human) 101 0.049507574 0.817489614 0.008356546 TYRP1; CAMK2A; CAMK2B; ADCY4; 15 highriskgenes KEGG HRAS; PLCB2; FZD2; FZD3; MAPK1; MC1R; MAP2K1; DVL2; ASIP; PRKCG; WNT11 Signaling by cytosolic FGFR1 fusion mutants 18 0.004625502 1 0.003476246 ZMYM2; STAT1; TRIM24; GAB2; STAT5A; 6 highriskgenes REACTOME FGFR1OP Phase 0 - rapid depolarisation 44 0.006216062 1 0.005169443 SCN7A; CACNB4; SCN2A; FGF12; 9 highriskgenes REACTOME CAMK2A; CAMK2B; CACNA2D3; CACNA2D4; SCN5A Signaling by FGFR in disease 47 0.009134022 1 0.005163511 HRAS; GAB2; STAT1; FGFR3; ZMYM2; 9 highriskgenes REACTOME TRIM24; FGFR1OP; FGF5; STAT5A ERBB2 Regulates Cell Motility 15 0.009168191 1 0.002900232 ERBB4; EGFR; EREG; BTC; MEMO1 5 highriskgenes REACTOME FGFR1 mutant receptor activation 31 0.009382451 1 0.004039238 GAB2; STAT1; ZMYM2; TRIM24; FGFR1OP; 7 highriskgenes REACTOME FGF5; STAT5A Resolution of D-loop Structures through Holli- 33 0.009382451 1 0.004039238 GEN1; XRCC2; DNA2; BARD1; BLM; 7 highriskgenes REACTOME day Junction Intermediates RAD51; EME2 Signaling by FGFR1 in disease 38 0.011554828 1 0.004600345 HRAS; GAB2; STAT1; ZMYM2; TRIM24; 8 highriskgenes REACTOME FGFR1OP; FGF5; STAT5A GRB2 events in ERBB2 signaling 16 0.012638263 1 0.002898551 ERBB4; HRAS; EGFR; EREG; BTC 5 highriskgenes REACTOME Establishment of Sister Chromatid Cohesion 11 0.012731557 1 0.00232423 ESCO2; ESCO1; STAG1; PDS5A 4 highriskgenes REACTOME SHC1 events in ERBB2 signaling 22 0.014332229 1 0.003468208 HRAS; EGFR; ERBB4; PTPN12; EREG; BTC 6 highriskgenes REACTOME Resolution of D-Loop Structures 35 0.014531594 1 0.004034582 GEN1; XRCC2; DNA2; BARD1; BLM; 7 highriskgenes REACTOME RAD51; EME2 Formation of the Editosome 9 0.015795255 1 0.001746217 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOME mRNA Editing: C to U Conversion 9 0.015795255 1 0.001746217 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOME TLR3-mediated TICAM1-dependent pro- 6 0.015795255 1 0.001746217 TICAM1; RIPK1; RIPK3 3 highriskgenes REACTOME grammed cell death Zinc efflux and compartmentalization by the 7 0.015795255 1 0.001746217 SLC30A5; SLC30A7; SLC30A3 3 highriskgenes REACTOME SLC30 family Nitric oxide stimulates guanylate cyclase 25 0.022443232 1 0.003464203 PDE3A; PRKG1; PDE5A; NOS2; KCNMB3; 6 highriskgenes REACTOME GUCY1B2 ERBB2 Activates PTK6 Signaling 13 0.025520888 1 0.002321532 ERBB4; EGFR; EREG; BTC 4 highriskgenes REACTOME Homology Directed Repair 139 0.026858093 1 0.01046832 SUMO2; BARD1; RNF4; EME2; XRCC2; 19 highriskgenes REACTOME TIMELESS; CHEK1; PARP2; HIST1H2BD; BLM; POLE; POLD3; PIAS4; GEN1; DNA2; CCNA2; RAD51; TP53BP1; TIPIN Insulin-like Growth Factor-2 mRNA Binding 3 0.027943391 1 0.001165501 IGF2BP1; IGF2BP3 2 highriskgenes REACTOME Proteins (IGF2BPs/IMPs/VICKZs) bind RNA Cholesterol biosynthesis via desmosterol 4 0.027943391 1 0.001165501 DHCR7; DHCR24 2 highriskgenes REACTOME Cholesterol biosynthesis via lathosterol 4 0.027943391 1 0.001165501 DHCR7; DHCR24 2 highriskgenes REACTOME Signaling by Hippo 20 0.028032267 1 0.002893519 CASP3; DVL2; LATS1; AMOTL2; AMOT 5 highriskgenes REACTOME HDR through Homologous Recombination 133 0.031241616 1 0.009944751 SUMO2; BARD1; RNF4; EME2; XRCC2; 18 highriskgenes REACTOME (HR) or Single Strand Annealing (SSA) TIMELESS; CHEK1; HIST1H2BD; BLM; POLE; POLD3; PIAS4; GEN1; DNA2; CCNA2; RAD51; TP53BP1; TIPIN G0 and Early G1 25 0.033174432 1 0.003460208 MYBL2; CCNA2; CCNE2; E2F5; LIN9; 6 highriskgenes REACTOME RBBP4 N-Glycan antennae elongation 15 0.034012368 1 0.002320186 ST3GAL4; MGAT4B; B4GALT4; B4GALT6 4 highriskgenes REACTOME Regulation of TLR by endogenous ligand 16 0.034012368 1 0.002320186 CD36; TLR2; TLR1; S100A8 4 highriskgenes REACTOME SHC1 events in ERBB4 signaling 14 0.034012368 1 0.002320186 HRAS; EREG; BTC; ERBB4 4 highriskgenes REACTOME mRNA Editing 11 0.037973374 1 0.001744186 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOME SEMA3A-Plexin repulsion signaling by inhibit- 14 0.043950039 1 0.002318841 SEMA3A; PLXNA3; PLXNA4; TLN1 4 highriskgenes REACTOME ing Integrin adhesion S Phase 103 0.045740468 1 0.008361204 AKT2; PDS5A; CCNE2; STAG1; ANAPC5; 15 highriskgenes REACTOME GINS2; GINS3; POLE; POLD3; DNA2; CCNA2; ESCO2; ESCO1; RB1; MCM5 Downregulation of ERBB2 signaling 29 0.046762797 1 0.003456221 EGFR; AKT2; ERBB4; PTPN12; EREG; BTC 6 highriskgenes REACTOME Signaling of Hepatocyte Growth Factor Recep- 34 0.0050574 0.850185102 0.005172414 HRAS; RAP1B; PTK2B; MET; JUN; MAPK1; 9 highriskgenes Wikipathways tor RASA1; PTPN11; MAP2K1 ErbB Signaling Pathway 92 0.005787334 0.850185102 0.009518477 AKT2; CAMK2A; CAMK2B; MDM2; HRAS; 17 highriskgenes Wikipathways JUN; BTC; MAPK1; PLCG2; MAP2K1; BAD; STAT5A; EREG; EGFR; ERBB4; PRKCG; PDK1 Osteopontin Signaling 13 0.006415057 0.850185102 0.002901915 MAP2K1; MAPK1; SPP1; PLAU; MAP3K14 5 highriskgenes Wikipathways

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S5: Lung cancer gene set over-representation (continued)

Pathway Gene set p value p value Jaccard In- Overlapping Number Genes Source size (adjusted) dex of overlap- ping Retinoblastoma Gene in Cancer 89 0.010481201 0.850185102 0.008963585 BARD1; MDM2; RBBP4; CCNE2; CHEK1; 16 highriskgenes Wikipathways FAF1; POLE; FANCG; POLD3; DHFR; MSH6; RRM1; CCNA2; RB1; E2F3; E2F2 Network in Prostate Cancer 72 0.011073971 0.850185102 0.007357102 HRAS; MAP2K1; RAP1B; SPRY2; 13 highriskgenes Wikipathways SMARCA4; JUN; MAPK1; HSD17B3; SMARCD3; RASA1; PTPN11; PTK2B; FOXA1 EPO Receptor Signaling 26 0.011754847 0.850185102 0.004036909 MAP2K1; STAT1; MAPK1; RASA1; EPOR; 7 highriskgenes Wikipathways PDK1; STAT5A Non-small cell lung cancer 66 0.012563889 0.850185102 0.007352941 HRAS; MAP2K1; PLCG2; RB1; MAPK1; 13 highriskgenes Wikipathways EGFR; PRKCG; AKT2; E2F3; E2F2; BAD; PDK1; STAT5A Nanoparticle triggered regulated necrosis 10 0.012731557 0.850185102 0.00232423 TICAM1; TNFRSF1A; RIPK1; RIPK3 4 highriskgenes Wikipathways IL-2 Signaling Pathway 42 0.017922081 0.922700962 0.005151689 HRAS; GAB2; MAP2K1; STAT1; JUN; 9 highriskgenes Wikipathways MAPK1; PTPN11; PTK2B; STAT5A Melatonin metabolism and effects 34 0.021429024 0.922700962 0.004029937 AANAT; CYP2D6; IRAK1; CAMK2A; PER1; 7 highriskgenes Wikipathways ACHE; MAOA Differentiation of white and brown adipocyte 25 0.022443232 0.922700962 0.003464203 CEBPB; CEBPG; ZIC1; HSPB7; BMP7; 6 highriskgenes Wikipathways SLC7A10 Factors and pathways affecting insulin-like 30 0.025610088 0.922700962 0.004027618 NEB; ACVR2B; PRKAB1; IGFBP5; TRIM63; 7 highriskgenes Wikipathways growth factor (IGF1)-Akt signaling TNFRSF1A; PDK1 Mammary gland development pathway - Preg- 33 0.030315433 0.922700962 0.004025302 EGFR; ELF5; CEBPB; GAL; ORAI1; ERBB4; 7 highriskgenes Wikipathways nancy and lactation (Stage 3 of 4) PNCK IL-7 Signaling Pathway 25 0.033174432 0.922700962 0.003460208 BAD; STAT1; PTK2B; MAPK1; MAP2K1; 6 highriskgenes Wikipathways STAT5A Senescence and Autophagy in Cancer 106 0.033262463 0.922700962 0.008903728 MDM2; HRAS; JUN; THBS1; MAP1LC3C; 16 highriskgenes Wikipathways IGFBP5; SPARC; MAPK1; MLST8; MAP2K1; CEBPB; GABARAPL2; RB1; ATG5; CD44; PLAU Serotonin Receptor 2 and ELK-SRF-GATA4 20 0.035000627 0.922700962 0.002891845 HRAS; MAP2K1; HTR2B; MAPK1; GATA4 5 highriskgenes Wikipathways signaling Bladder Cancer 40 0.036292907 0.922700962 0.004581901 HRAS; EGFR; MAP2K1; RB1; THBS1; 8 highriskgenes Wikipathways MAPK1; MDM2; FGFR3 Vitamin D Metabolism 10 0.037973374 0.922700962 0.001744186 VDR; DHCR7; CYP27A1 3 highriskgenes Wikipathways Nifedipine Activity 8 0.037973374 0.922700962 0.001744186 PTK2B; MAP2K1; MAPK1 3 highriskgenes Wikipathways fig-met-1-last-solution 14 0.043950039 0.922700962 0.002318841 PRKAB1; G6PD; PDK1; LDHA 4 highriskgenes Wikipathways Hepatitis C and Hepatocellular Carcinoma 56 0.045850477 0.922700962 0.005131129 PODXL; MYOF; JUN; CXCR1; NOS2; CD44; 9 highriskgenes Wikipathways CASP3; E2F2; PTPN11 Cardiac Progenitor Differentiation 53 0.04726475 0.922700962 0.004576659 ISL1; SOX2; NCAM1; ACTC1; POU5F1; 8 highriskgenes Wikipathways GATA4; FOXA2; SCN5A Amyotrophic lateral sclerosis (ALS) 35 0.047793856 0.922700962 0.00401837 CAT; BAD; BID; CCS; CASP3; TNFRSF1A; 7 highriskgenes Wikipathways PRPH MET in type 1 papillary renal cell carcinoma 59 0.048898709 0.922700962 0.005678592 RAP1B; MAP2K1; HRAS; MAPK1; STRN; 10 highriskgenes Wikipathways AKT2; JUN; PTPN11; BAD; MET Toll-like Receptor Signaling Pathway 102 0.049311336 0.922700962 0.007829978 AKT2; IRAK1; CXCL9; TLR2; TLR8; JUN; 14 highriskgenes Wikipathways TLR1; STAT1; TICAM1; MAPK1; SPP1; RIPK1; MAP2K1; TAB1

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme

Non-small cell lung cancer - Homo sapiens (hu- 0.010898204 0.321497006 0.374045141 1.557379546 545 66 E2F2; RB1; HRAS; PLCG2; E2F3; AKT2; KEGG man) MAP2K1; EGFR; MAPK1; BAD; PRKCG; STAT5A; PDPK1; NRAS; ALK; PIK3R2; GADD45A; TP53; RXRB Chronic myeloid leukemia - Homo sapiens (hu- 0.001316629 0.186477295 0.405078284 1.734856777 65 76 E2F2; PTPN11; RB1; GAB2; HRAS; E2F3; KEGG man) AKT2; MAP2K1; CTBP1; MAPK1; MDM2; BAD; STAT5A; NRAS; BCL2L1; RUNX1; PIK3R2; GADD45A; TP53 Fc epsilon RI signaling pathway - Homo sapi- 0.037671233 0.44360956 0.347892645 1.421121415 1891 60 LCP2; GAB2; HRAS; PLCG2; AKT2; MAP2K1; KEGG ens (human) MAPK1; PDPK1; NRAS; VAV1; LYN; PLA2G4D; MAP2K3; PIK3R2; MAP2K4; PLA2G4A; INPP5D; MAPK13; PIK3R1; MAPK11; FYN; MAP2K2; VAV3; PIK3CD; JMJD7-PLA2G4B Central carbon metabolism in cancer - Homo 0.031231926 0.400583398 0.351556535 1.440756657 1565 61 PFKM; HRAS; AKT2; PDK1; MAP2K1; EGFR; KEGG sapiens (human) FGFR3; PFKP; MAPK1; MET; LDHA; G6PD; HK3; NRAS; GCK; PIK3R2; PGAM2; TP53; PTEN; SLC1A5 Melanoma - Homo sapiens (human) 0.003160632 0.186477295 0.409587644 1.682858843 157 62 E2F2; FGF5; RB1; HRAS; E2F3; AKT2; MAP2K1; KEGG EGFR; MAPK1; MET; MDM2; BAD; NRAS; PDGFB; PDGFC; PIK3R2; GADD45A; TP53; PTEN; CDH1; PDGFA; CDK6; FGF19; PIK3R1 Allograft rejection - Homo sapiens (human) 0.042946752 0.44360956 -0.412774741 -1.466925282 2142 32 FASLG; PRF1; GZMB; IL12B; HLA-A; HLA-C; KEGG HLA-F; HLA-DPA1; HLA-DOA; CD40LG; HLA- DQA2; FAS; CD28; HLA-DRB1; CD80 Graft-versus-host disease - Homo sapiens (hu- 0.022933665 0.369265734 -0.424909347 -1.54239649 1143 35 FASLG; PRF1; GZMB; KLRC1; KLRD1; HLA- KEGG man) A; HLA-C; HLA-F; HLA-DPA1; HLA-DOA; HLA- DQA2; FAS; KIR3DL1; CD28; HLA-DRB1; CD80 Cushing syndrome - Homo sapiens (human) 0.024663633 0.369265734 0.28906144 1.370608778 1240 138 E2F2; PDE8A; CCNE2; ADCY4; MEN1; CAMK2B; KEGG RB1; E2F3; WNT11; MAP2K1; ATF6B; FZD3; EGFR; RAP1B; MAPK1; FZD2; CAMK2A; ORAI1; PLCB2; DVL2; CDKN2C; ARNT; ADCY3; DVL3; CREB3; ADCY7; CACNA1H; WNT10A; WNT10B; CACNA1G; CDK2; CYP11A1; GNAI3 Bladder cancer - Homo sapiens (human) 0.006262965 0.263939228 0.450013135 1.68471731 313 40 E2F2; RB1; THBS1; HRAS; E2F3; MAP2K1; KEGG EGFR; FGFR3; MAPK1; MDM2; NRAS; TYMP; MMP9; TP53; HBEGF; CDH1 Longevity regulating pathway - multiple 0.022824054 0.369265734 0.369150302 1.493315141 1146 57 ADCY4; HRAS; AKT2; FOXA2; ATG5; PRKAB1; KEGG species - Homo sapiens (human) CAT; NRAS; ADCY3; ADCY7; RPS6KB1; PIK3R2; CLPB; CRYAB; PIK3R1; IRS2; IGF1R; EIF4EBP2 Autoimmune thyroid disease - Homo sapiens 0.02827712 0.397226206 -0.429783078 -1.527369773 1410 32 FASLG; PRF1; GZMB; TPO; HLA-A; HLA-C; KEGG (human) HLA-F; HLA-DPA1; HLA-DOA; CD40LG; HLA- DQA2; FAS; CD28; HLA-DRB1; CD80 GnRH signaling pathway - Homo sapiens (hu- 0.04962412 0.44360956 0.308654456 1.346055389 2481 84 JUN; ADCY4; CAMK2B; PTK2B; HRAS; KEGG man) MAP2K1; EGFR; MAPK1; GNRH1; CAMK2A; PLCB2; NRAS; ADCY3; PLA2G4D; ADCY7; MAP2K3; LHB; HBEGF; CALML3; MAP2K4; PLA2G4A; MAPK13; PLCB1; CALML6 Breast cancer - Homo sapiens (human) 0.04747314 0.44360956 0.275960724 1.305509134 2385 136 E2F2; JUN; FGF5; RB1; HRAS; E2F3; AKT2; KEGG WNT11; DLL1; MAP2K1; FZD3; EGFR; MAPK1; FZD2; DVL2; NRAS; BRCA2; CSNK1A1L; DVL3; HES5; TNFSF11; HEYL; WNT10A; RPS6KB1; PIK3R2; GADD45A; WNT10B; TP53; PTEN; CSNK1A1; DLL4; DLL3; FRAT1; SHC2; CDK6; FGF19; TCF7L2; PIK3R1 Type I diabetes mellitus - Homo sapiens (hu- 0.00769014 0.283573902 -0.4521162 -1.669986296 381 38 FASLG; PRF1; PTPRN; GZMB; IL12B; HLA-A; KEGG man) HLA-C; HLA-F; HLA-DPA1; HLA-DOA; GAD1; HLA-DQA2; FAS; ICA1; CD28; HLA-DRB1; PT- PRN2; CD80; LTA; TNF; HLA-DPB1 One carbon pool by folate - Homo sapiens (hu- 0.03057396 0.400583398 0.514629501 1.573020565 1531 18 MTHFD1L; DHFR; ALDH1L2; FTCD; AMT; KEGG man) GART Axon guidance - Homo sapiens (human) 0.048569779 0.44360956 0.263061805 1.283049144 2439 168 SEMA4B; PTPN11; ROBO3; PLXNA4; LIMK1; KEGG CAMK2B; SEMA3E; HRAS; PLCG2; SSH1; RASA1; PDK1; FZD3; SEMA6A; MAPK1; MET; SEMA3A; RYK; CAMK2A; BMP7; PLXNA3; ROBO1; NRAS; TRPC3; SRGAP2; EFNB1; DPYSL5; NTNG2; TRPC6; ROBO2; DPYSL2; RRAS; PIK3R2; NFATC2; SEMA6B; NGEF; EPHB2; SEMA3B; GNAI3; TRPC1; EFNA2; UNC5A; EPHA2 AMPK signaling pathway - Homo sapiens (hu- 0.041262282 0.44360956 -0.293033377 -1.34157647 2061 110 STRADB; TSC1; RAB2A; CREB5; PFKFB4; KEGG man) STRADA; EIF4EBP1; PPP2R5B; IRS1; PFKFB2; PPP2R3A; TSC2; MAP3K7; PIK3R3; CRTC2; PPARG; PPP2R5A; RAB14 Other types of O-glycan biosynthesis - Homo 0.047304577 0.44360956 0.502863349 1.512858902 2370 17 OGT; LFNG; MFNG; POMT1; ST3GAL3; POFUT1; KEGG sapiens (human) B4GALT1; ST6GAL1; B4GALT3; GXYLT2 Signaling pathways regulating pluripotency of 0.015129497 0.343323191 0.304191843 1.426586927 759 128 OTX1; SOX2; INHBE; POU5F1; HRAS; ISL1; KEGG stem cells - Homo sapiens (human) AKT2; WNT11; MAP2K1; FZD3; FGFR3; MAPK1; FZD2; SKIL; ACVR2B; DVL2; NRAS; DLX5; DVL3; PCGF1; ACVR1B; ACVR1; JAK2; WNT10A; PIK3R2; ESRRB; REST; WNT10B; TBX3; MEIS1; SMAD1; NODAL

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Glioma - Homo sapiens (human) 0.00201031 0.186477295 0.404008132 1.703994205 100 70 E2F2; CAMK2B; RB1; HRAS; PLCG2; E2F3; KEGG AKT2; MAP2K1; EGFR; MAPK1; MDM2; CAMK2A; PRKCG; NRAS; PDGFB; PIK3R2; GADD45A; TP53; PTEN; CALML3; PDGFA; SHC2; CDK6; PIK3R1; CALML6 VEGF signaling pathway - Homo sapiens (hu- 0.024662115 0.369265734 0.368339576 1.483868068 1238 56 HRAS; PLCG2; AKT2; MAP2K1; SH2D2A; KEGG man) MAPK1; BAD; PRKCG; NRAS; PLA2G4D; PIK3R2; NFATC2; MAPKAPK3; HSPB1; SHC2; PLA2G4A; MAPK13; PIK3R1; MAPK11; PTGS2; MAP2K2; PIK3CD; PRKCB; JMJD7-PLA2G4B C-type lectin receptor signaling pathway - 0.025034965 0.369265734 0.3141687 1.410882593 1252 99 PTPN11; CD209; JUN; HRAS; PLCG2; STAT1; KEGG Homo sapiens (human) AKT2; CARD9; MALT1; MAPK1; MDM2; MAP3K14; RELB; NRAS; STAT2; BCL3; IL1B; PYCARD; RRAS; PIK3R2; NFATC2; RRAS2; IL10; PLK3; CALML3; MAPK13; PAK1; NLRP3; PIK3R1; CALML6; EGR2; MAPK11; MRAS DNA replication - Homo sapiens (human) 0.047535422 0.44360956 0.395344119 1.444286718 2381 36 MCM5; DNA2; POLD3; POLE; PRIM1; POLD1; KEGG POLE4; PRIM2; POLD4; RNASEH2B; RFC1; POLA2; RPA2; POLA1; POLE3; RFC4; LIG1; MCM4; MCM2 Tuberculosis - Homo sapiens (human) 0.002909468 0.186477295 0.314787174 1.522957884 145 159 CEBPB; CD209; CAMK2B; TLR1; VDR; BID; KEGG IRAK1; FCGR3A; STAT1; FCGR3B; AKT2; CARD9; CEBPG; NOS2; FCGR2B; MALT1; RAB5C; CASP3; MAPK1; MRC1; IL18; CAMK2A; TLR2; BAD; TNFRSF1A; NOD2; CD14; RFX5; JAK2; IRAK4; IL1B; CYP27B1; TCIRG1; ATP6V1H; RAB7A; LBP; IL10; C3; PLK3 MAPK signaling pathway - Homo sapiens (hu- 0.041069685 0.44360956 0.241246726 1.249739286 2060 268 RASGRP3; RASGRP4; JUN; FGF5; MAPK8IP2; KEGG man) IRAK1; CACNB4; HRAS; RASA1; AKT2; DUSP10; MAP2K1; EGFR; RAP1B; FGFR3; CASP3; MAPK1; CACNA2D3; MET; TAB1; CACNA2D4; ERBB4; DUSP8; TAOK3; IL1RAP; PRKCG; EREG; MAP3K14; TNFRSF1A; RPS6KA3; RELB; CD14; RPS6KA2; NRAS; MAP3K12; RASGRF1; MEF2C; PLA2G4D; IRAK4; MAP2K3; TAOK2; CACNA1H; IL1B; CACNA2D2; PDGFB; RRAS; NLK; PDGFC; GADD45A; TP53; RAPGEF2; MAPKAPK3; MAP4K2; RRAS2; MAPK8IP1; HSPB1; CACNA1G; RPS6KA1; EFNA2; EPHA2; PDGFA; TAB2; MAP2K4; NTF3; PLA2G4A; MAPK13; FGF19; NGFR; PAK1; MAP4K4 Phospholipase D signaling pathway - Homo 0.00558147 0.263939228 0.318209685 1.501945651 279 134 PTPN11; ADCY4; GRM3; AGPAT1; GAB2; KEGG sapiens (human) PTK2B; HRAS; PLCG2; PIP5K1A; AKT2; MAP2K1; F2R; EGFR; MAPK1; CXCR1; RALA; PLCB2; AGPAT5; DGKA; NRAS; ADCY3; AVPR1A; DGKQ; LPAR4; PLA2G4D; ADCY7; CYTH4; AGPAT4; PDGFB; DNM1; RRAS; LPAR2; PDGFC; PIK3R2; CYTH2; RRAS2; PTGFR; DGKG; GRM8; AGPAT2; PDGFA; LPAR6; SHC2; PLA2G4A; DGKD; PIK3R1; PLCB1; RALB Homologous recombination - Homo sapiens 0.012344446 0.331055597 0.457031152 1.634852561 617 33 RAD54B; BLM; POLD3; BARD1; XRCC2; KEGG (human) RAD54L; TOP3B; RAD51; BRCA2; RAD50; POLD1; XRCC3; TOP3A; PALB2; POLD4; BRCC3; RPA2; TOPBP1 MicroRNAs in cancer - Homo sapiens (human) 0.009658188 0.316573927 0.301807035 1.445153351 485 147 E2F2; SPRY2; IGF2BP1; CCNE2; DDIT4; HDAC4; KEGG THBS1; HRAS; PLCG2; E2F3; MAP2K1; FZD3; EGFR; FGFR3; CASP3; HMOX1; MAPK1; MET; SLC7A1; DICER1; CD44; MDM2; PLAU; PRKCG; NRAS; SERPINB5; MARCKS; CDCA5; ZFPM2; MMP9; PDGFB; PIK3R2; TP53; PTEN; TNXB; BMF Hepatitis C - Homo sapiens (human) 0.032611729 0.400852503 0.284472207 1.342705187 1635 134 E2F2; CLDN2; BID; RB1; RSAD2; HRAS; STAT1; KEGG E2F3; CLDN15; AKT2; CLDN18; MAP2K1; EGFR; TICAM1; CASP3; RIPK1; MAPK1; OAS1; CLDN11; BAD; TNFRSF1A; NRAS; STAT2; PSME3; PPP2R2C; PIK3R2; IRF7; TP53; CDK2; PPARA; CDK6; PPP2R2A; MAVS; TRAF3; CD81; PIK3R1 Hepatitis B - Homo sapiens (human) 0.018812335 0.369265734 0.299394939 1.404090596 944 128 E2F2; VDAC3; CCNE2; JUN; CCNA2; RB1; KEGG PTK2B; HRAS; STAT1; E2F3; AKT2; MAP2K1; ATF6B; TICAM1; CASP3; MAPK1; TLR2; BAD; PRKCG; STAT5A; NRAS; STAT2; CREB3; MMP9; PIK3R2; IRF7; TP53; PTEN; NFATC2 Pentose phosphate pathway - Homo sapiens (hu- 0.000559016 0.164909759 0.596920902 1.993460357 27 25 PFKM; DERA; TKTL1; PFKP; RBKS; H6PD; KEGG man) G6PD; PGM1 Epstein-Barr virus infection - Homo sapiens 0.018406621 0.369265734 0.275739831 1.360089822 922 182 E2F2; PSMD7; NEDD4; CCNE2; JUN; BID; KEGG (human) CCNA2; RB1; IRAK1; PLCG2; STAT1; E2F3; AKT2; CASP3; RIPK1; PSMD3; OAS1; TAB1; PSMC5; PSMD4; CD44; MDM2; TLR2; MAP3K14; RELB; STAT2; TAP2; LYN; NFKBIE; IRAK4; MAP2K3; SKP2; NCOR2; PIK3R2; IRF7; HLA-B; USP7; GADD45A; TP53; CDK2; CD40; BCL2L11; TAB2; HLA-E; MAP2K4; CDK6; MAPK13; MAVS; TRAF3

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Pancreatic cancer - Homo sapiens (human) 0.014815258 0.343323191 0.35077173 1.498679661 742 75 E2F2; RB1; STAT1; E2F3; AKT2; RALBP1; KEGG MAP2K1; EGFR; MAPK1; RALA; BAD; RAD51; BCL2L1; BRCA2; RPS6KB1; PIK3R2; GADD45A; TP53 SHC1 events in ERBB2 signaling 0.049867714 0.693910801 0.464684132 1.484522511 2487 21 PTPN12; HRAS; EGFR; BTC; ERBB4; EREG; Reactome NRAS; NRG4; HBEGF FRS-mediated FGFR2 signaling 0.030219285 0.693910801 0.536081572 1.586811851 1508 16 PTPN11; FGF5; HRAS; NRAS Reactome HDR through Homologous Recombination 0.043549287 0.693910801 0.289907882 1.331498575 2185 113 CHEK1; SUMO2; RNF4; CCNA2; EME2; PIAS4; Reactome (HR) or Single Strand Annealing (SSA) DNA2; BLM; POLD3; BARD1; POLE; TP53BP1; XRCC2; TIPIN; HIST1H2BD; GEN1; TIMELESS; RAD51; BRCA2; RAD50; POLD1; XRCC3; TOP3A; POLE4; HIST1H4J; PALB2; HERC2; CDK2; POLD4; RFC1; BRCC3; UBE2V2; UBC; UBB; RNF168; RNF8; RPA2; HIST1H2BJ; HIST1H4E; TOPBP1; CCNA1; POLE3; RFC4; HIST1H4C FRS-mediated FGFR3 signaling 0.003178665 0.494390184 0.642618712 1.868421535 158 15 PTPN11; FGF5; HRAS; FGFR3; NRAS Reactome Downstream signaling of activated FGFR3 0.020473039 0.693910801 0.515134035 1.625519272 1023 20 PTPN11; FGF5; HRAS; FGFR3; NRAS Reactome Negative regulation of FGFR3 signaling 0.038239466 0.693910801 0.457902363 1.516222337 1913 24 SPRY2; PTPN11; FGF5; FGFR3; MAPK1 Reactome Epigenetic regulation of gene expression 0.049662672 0.693910801 -0.272053667 -1.294149107 2465 141 MYBBP1A; SAP18; HIST4H4; HIST1H2BC; Reactome GTF2H5; ACTB; TET2; HIST1H4A; HIST1H2BE; DNMT1; UHRF1; EP300; HIST1H4I; CBX3; HIST1H3C; RRP8; KAT2B; SIN3B; POLR2F; H2AFB1; HIST1H2BO; TWISTNB; CHD4; ERCC6; HIST2H2BE; HIST1H2BL; CD3EAP; HIST2H4A; MBD2; POLR2H; SAP30; HIST1H3I; DNMT3B; POLR2L; HIST1H3D; HIST2H2AC; CCNH; HIST1H3J; HIST1H4B; PHF1 TBC/RABGAPs 0.015726605 0.693910801 -0.406640483 -1.567474319 785 45 TSC1; MAP1LC3B; RAB33B; TBC1D24; TSC2; Reactome TBC1D10C; RAB5B; TBC1D17; TBC1D14; TBC1D3B; TBC1D10B; RAB11A; RAB5A; TBC1D15; SYTL1; RABGAP1; RAB6B; RAB33A; RAB4A; RABGEF1; GABARAP; TBC1D20; TBC1D16; RAB35 Formation of Senescence-Associated Hete- 0.031981576 0.693910801 0.533205784 1.578299462 1596 16 EP400; RB1; HIST1H1E; H1F0; HMGA1; Reactome rochromatin Foci (SAHF) HIST1H1D; TP53; HIST1H1B; LMNB1 DNA Damage/ Stress Induced Senes- 0.010010569 0.595003191 0.500076038 1.691868331 501 26 CCNE2; EP400; CCNA2; RB1; RAD50; HIST1H1E; Reactome cence H1F0; HMGA1; HIST1H1D; TP53; CDK2; CCNA1; HIST1H1B; LMNB1 Nucleobase catabolism 0.04347044 0.693910801 -0.420826818 -1.472772591 2174 30 NT5C1B; NT5M; NUDT18; ENTPD5; ENTPD6; Reactome ENTPD3; NT5C Toll-Like Receptors Cascades 0.021043838 0.693910801 0.28681624 1.371154075 1057 147 PTPN11; JUN; TLR1; IRAK1; S100A12; TLR8; Reactome PLCG2; MAP2K1; CD36; TICAM1; TMEM189; RIPK1; MAPK1; RIPK3; TAB1; TLR2; S100A8; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; S100A1; IRAK4; MAP2K3; DNM1; IRF7; AGER; UNC93B1; MAPKAPK3; LBP; RPS6KA1; HSP90B1; VRK3; UBE2D3; TAB2; MAP2K4; CTSS; CNPY3; TRAF3; TLR6; UBC; UBB; ATF1; CTSK; MAPK11; TLR5; RIPK2; TLR7; MAP3K8; UBE2V1; NKIRAS2; ELK1; SAA1; CTSB; NOD1; SIGIRR; APOB; RPS27A Homology Directed Repair 0.040099751 0.693910801 0.288549124 1.335932347 2009 119 CHEK1; SUMO2; RNF4; PARP2; CCNA2; Reactome EME2; PIAS4; DNA2; BLM; POLD3; BARD1; POLE; TP53BP1; XRCC2; TIPIN; HIST1H2BD; GEN1; TIMELESS; RAD51; BRCA2; RAD50; POLD1; XRCC3; TOP3A; POLE4; HIST1H4J; PALB2; HERC2; CDK2; POLD4; RFC1; BRCC3; UBE2V2; UBC; UBB; RNF168; RNF8; RPA2; POLQ; HIST1H2BJ; HIST1H4E; TOPBP1; CCNA1; POLE3; RFC4; HIST1H4C Metabolism of porphyrins 0.049059395 0.693910801 0.520588921 1.51361847 2453 15 COX15; COX10; CPOX; HMOX1; PPOX; ALAS1; Reactome ALAS2 Signaling by PDGF 0.025909009 0.693910801 0.372675253 1.483203733 1298 52 PTPN12; PTPN11; SPP1; THBS1; HRAS; BCAR1; Reactome STAT1; RASA1; STAT5A; NRAS; COL6A6; PDGFB; PDGFC; PIK3R2; GRB7 Fc epsilon receptor (FCERI) signaling 0.048137329 0.693910801 0.324900889 1.371130902 2405 70 RASGRP4; JUN; LCP2; GAB2; HRAS; PLCG2; Reactome MALT1; MAPK1; TAB1; PDPK1; NRAS; VAV1; LYN; PIK3R2; NFATC2; TAB2; MAP2K4; PAK1; PIK3R1 Association of TriC/CCT with target proteins 0.0301135 0.693910801 0.40083201 1.495593668 1506 39 CCT7; CCNE2; AP3M1; FBXL3; NOP56; DCAF7; Reactome during biosynthesis GBA; XRN2; CCT8; FBXW10; WRAP53; TP53; FBXW9 Cyclin D associated events in G1 0.047890308 0.693910801 0.370686061 1.420415209 2395 44 E2F2; RB1; E2F3; E2F5; CDKN2C; MNAT1; Reactome PPP2R3B; LYN; JAK2; SKP2; CDK6; PPP2R2A; UBC; UBB G1 Phase 0.047890308 0.693910801 0.370686061 1.420415209 2395 44 E2F2; RB1; E2F3; E2F5; CDKN2C; MNAT1; Reactome PPP2R3B; LYN; JAK2; SKP2; CDK6; PPP2R2A; UBC; UBB

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Mitotic G1-G1/S phases 0.000398153 0.378643096 0.387808644 1.739909886 19 98 E2F2; RBBP4; MCM5; CCNE2; FBXO5; CCNA2; Reactome RB1; DHFR; E2F3; POLE; AKT2; MYBL2; E2F5; LIN9; CDKN2C; PRIM1; MNAT1; CCNB1; PPP2R3B; POLE4; LYN; JAK2; SKP2; PRIM2; CDK2; POLA2; CDK6; PPP2R2A; RRM2; UBC; UBB; RPA2; MCM10; POLA1; CCNA1; POLE3 Neuronal System 0.032953331 0.693910801 0.239610601 1.257606989 1664 300 GNG5; KCNJ5; KCNMB3; LRFN3; ADCY4; Reactome SLC18A3; CAMK2B; GLRA2; GRIK3; CHRNB2; KCNN3; KCNG2; CACNB4; KCNK17; LR- RTM4; HRAS; KCNH2; KCNF1; ACHE; KCNB1; KCNA6; LRFN1; MAPK1; SHANK3; CACNA2D3; PANX1; GNAL; MDM2; CAMK2A; ABCC9; IL1RAP; AP2A2; MAOA; PRKCG; PLCB2; KCNA3; RPS6KA3; KCNN4; RPS6KA2; PDPK1; KCNK10; ADCY3; RASGRF1; KCNN1; SYT2; KCNAB2; GRIK1; ADCY7; LRFN2; GNB5; KCND2; CACNA2D2; CHRNG; RRAS; COMT; GJC1; NLGN3; KCNV2; KCNQ3; KCNQ5; KCNJ10; AP2M1; DLGAP2; KCNJ9; RPS6KA1; KCNH1; GNAI3; SLC1A2; SLC38A1; GABBR2; GNB4; CHRNA5; ACTN2; KCNH8; EPB41L2; KCNK9; ALDH5A1; GABRB2; PLCB1; PTPRD; DNAJC5; SYN1; UNC13B; RTN3; APBA2; GNG10; GALNT8; GRM1; NRXN2; SLC6A1; HTR3A Orc1 removal from chromatin 0.039286643 0.693910801 0.485087392 1.530706284 1964 20 MCM5; CCNA2; SKP2; CDK2; UBC; UBB; Reactome CCNA1; MCM4; MCM2; RPS27A DNA Replication 0.005616406 0.494390184 0.377161398 1.600913733 280 72 MCM5; CCNE2; CCNA2; DNA2; POLD3; POLE; Reactome ANAPC5; GINS2; GINS3; PRIM1; POLD1; POLE4; SKP2; PRIM2; CDK2; POLD4; RFC1; POLA2; UBE2C; ANAPC1; UBC; UBB; RPA2; MCM10; POLA1; CCNA1; POLE3; RFC4; LIG1; MCM4; CDC7; MCM2; CDC23 DNA strand elongation 0.024281417 0.693910801 0.436005842 1.553868982 1218 32 MCM5; DNA2; POLD3; GINS2; GINS3; PRIM1; Reactome POLD1; PRIM2; POLD4; RFC1; POLA2; RPA2; POLA1; RFC4; LIG1; MCM4; MCM2 Synthesis of DNA 0.005718498 0.494390184 0.384786885 1.611552406 286 67 MCM5; CCNE2; CCNA2; DNA2; POLD3; POLE; Reactome ANAPC5; GINS2; GINS3; PRIM1; POLD1; POLE4; SKP2; PRIM2; CDK2; POLD4; RFC1; POLA2; UBE2C; ANAPC1; UBC; UBB; RPA2; POLA1; CCNA1; POLE3; RFC4; LIG1; MCM4; MCM2; CDC23 S Phase 0.000997865 0.474484603 0.377374196 1.680267365 49 94 MCM5; CCNE2; CCNA2; RB1; DNA2; POLD3; Reactome POLE; AKT2; ESCO2; ANAPC5; PDS5A; ESCO1; STAG1; GINS2; GINS3; PRIM1; MNAT1; POLD1; CDCA5; POLE4; SKP2; STAG2; PRIM2; CDK2; POLD4; RFC1; POLA2; PDS5B; UBE2C; ANAPC1; UBC; UBB; RPA2; POLA1; CCNA1; POLE3; RFC4; LIG1; MCM4; MCM2; CDC23; PTK6 Meiotic synapsis 0.041102204 0.693910801 -0.393306518 -1.456675064 2050 38 HIST4H4; HIST1H2BC; HIST1H4A; REC8; Reactome HIST1H2BE; SYCP2; HIST1H4I; HIST1H2BO; HIST2H2BE; HIST1H2BL; HIST2H4A EGFR downregulation 0.046340053 0.693910801 0.442271868 1.481488775 2323 25 SPRY2; PTPN12; EGFR; SH3GL2; EPN1; Reactome SH3KBP1; HGS; STAM2; UBC; UBB Signaling by EGFR 0.004887976 0.494390184 0.446018221 1.700639482 244 43 SPRY2; PTPN12; PTPN11; HRAS; EGFR; Reactome ADAM17; SH3GL2; NRAS; EPN1; ADAM10; PAG1; SH3KBP1; HGS; STAM2; PIK3R1; UBC; UBB Phase 0 - rapid depolarisation 0.012621573 0.666839747 0.445744637 1.622450036 631 35 FGF12; CAMK2B; SCN2A; CACNB4; SCN5A; Reactome CACNA2D3; SCN7A; CACNA2D4; CAMK2A Telomere C-strand (Lagging Strand) Synthesis 0.019878928 0.693910801 0.485981376 1.609198551 994 24 DNA2; POLD3; POLE; PRIM1; POLD1; POLE4; Reactome PRIM2; POLD4; RFC1; POLA2; RPA2; POLA1; POLE3; RFC4; LIG1 Extension of 0.026056676 0.693910801 0.44073632 1.543616374 1301 30 DNA2; POLD3; POLE; PRIM1; POLD1; POLE4; Reactome WRAP53; PRIM2; POLD4; RFC1; POLA2; RPA2; POLA1; POLE3; RFC4; LIG1 activated TAK1 mediates p38 MAPK activation 0.041845772 0.693910801 0.481891353 1.520621099 2092 20 IRAK1; TMEM189; TAB1; NOD2; MAP2K3; MAP- Reactome KAPK3; TAB2; MAPK11; RIPK2; UBE2V1; NOD1; IRAK2; MAP2K6; IKBKG Ub-specific processing proteases 0.018451665 0.693910801 -0.273912588 -1.355853169 919 184 WDR20; USP20; POLB; IL33; PSMA4; CDC25A; Reactome VDAC2; FKBP8; HIST1H2BC; HIST1H2AH; SMAD7; PSMA6; ADRB2; PSMD10; USP33; HIST2H2BF; HIST1H2BE; CYLD; HIF1A; FOXO4; MAP3K7; USP2; USP30; RNF123; SMURF2; IFIH1; USP37; HIST1H2BO; TAF10; RCE1; AR; WDR48; USP47; MUL1; PSMA3; TAF9B; TNKS2; PSMD12; PSMD14; HIST2H2BE; USP25; USP22; PSMB3; HIST1H2BL; PSMB10; HIST3H2A; PSMA5

40 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme regulation of trafficking 0.016865438 0.693910801 -0.310814143 -1.429285834 839 113 TSC1; DENND2D; DENND1C; MON1B; Reactome MAP1LC3B; RAB13; TRAPPC5; TRAPPC4; RAB33B; TBC1D24; TSC2; DENND3; SBF2; TBC1D10C; RAB14; RAB5B; DENND5B; TBC1D17; RAB3IL1; GDI2; TBC1D14; TBC1D3B; TBC1D10B; RAB11A; TRAPPC6B; RAB5A; TBC1D15; RINL; GDI1; CHML; DENND4B; CHM; TRAPPC2L; SYTL1; DENND2C; RABGAP1; RAB6B; TRAPPC3; RAB33A; RAB4A; RAB27B; RAB18; RAB27A; RABGEF1; GABARAP; TBC1D20; TBC1D16; RAB1A; RAB35 GRB2 events in ERBB2 signaling 0.0290478 0.693910801 0.5474155 1.591617065 1452 15 HRAS; EGFR; BTC; ERBB4; EREG; NRAS; NRG4; Reactome HBEGF; NRG1 Deubiquitination 0.025772574 0.693910801 -0.250968153 -1.293757621 1280 254 WDR20; USP20; POLB; IL33; PSMA4; CDC25A; Reactome OTUD5; VDAC2; FKBP8; HIST1H2BC; HIST1H2AH; ACTB; SMAD7; PSMA6; ADRB2; PSMD10; USP33; HIST2H2BF; HIST1H2BE; CYLD; FOXK2; HIF1A; EP300; TNFAIP3; FOXO4; MAP3K7; USP2; KAT2B; ACTR8; STAM; USP30; RNF123; RHOA; SMURF2; IFIH1; USP37; HIST1H2BO; TAF10; RCE1; AR; WDR48; RAD23A; USP47; MUL1; PSMA3; TAF9B; TNKS2; PSMD12; PSMD14; BAP1; KDM1B; HIST2H2BE; USP25; USP22; MCRS1; PSMB3; HIST1H2BL; TNIP3; PSMB10; HIST3H2A; PSMA5 G1/S Transition 0.008623104 0.546704814 0.370644315 1.564175082 430 70 MCM5; CCNE2; FBXO5; CCNA2; RB1; DHFR; Reactome POLE; AKT2; PRIM1; MNAT1; CCNB1; PPP2R3B; POLE4; SKP2; PRIM2; CDK2; POLA2; RRM2; UBC; UBB; RPA2; MCM10; POLA1; CCNA1; POLE3; MCM4; CDC7; MCM2; PTK6 Regulation of expression of SLITs and ROBOs 0.011538385 0.645470811 0.557460704 1.707824344 576 18 ROBO3; ISL1; MSI1; LHX4; ROBO1; HOXA2; Reactome ROBO2; DAG1; LHX2; LDB1; UBC; UBB; SLIT1 Neurotransmitter receptors and postsynaptic 0.035910224 0.693910801 0.291106071 1.347770564 1799 119 GNG5; KCNJ5; ADCY4; CAMK2B; GLRA2; Reactome signal transmission GRIK3; CHRNB2; HRAS; MAPK1; GNAL; MDM2; CAMK2A; AP2A2; PRKCG; PLCB2; RPS6KA3; RPS6KA2; PDPK1; ADCY3; RASGRF1; GRIK1; ADCY7; GNB5; CHRNG; RRAS; KCNJ10; AP2M1; KCNJ9; RPS6KA1; GNAI3; GABBR2; GNB4; CHRNA5; ACTN2; GABRB2; PLCB1 Signaling by SCF-KIT 0.002292435 0.494390184 0.468311909 1.776418915 114 42 CHEK1; PTPN11; SOCS6; GAB2; HRAS; SH2B3; Reactome STAT1; STAT5A; NRAS; VAV1; LYN; JAK2; MMP9; PIK3R2; GRAP; GRB7 Signaling by FGFR in disease 0.002600676 0.494390184 0.479804168 1.768849487 129 37 FGF5; GAB2; HRAS; STAT1; TRIM24; FGFR1OP; Reactome FGFR3; ZMYM2; STAT5A; NRAS; ERLIN2 DNA Replication Pre-Initiation 0.042837992 0.693910801 0.416569436 1.471162121 2140 31 MCM5; POLE; PRIM1; POLE4; PRIM2; CDK2; Reactome POLA2; UBC; UBB; RPA2; MCM10; POLA1; POLE3; MCM4; CDC7; MCM2; RPS27A; RPA1; GMNN M/G1 Transition 0.042837992 0.693910801 0.416569436 1.471162121 2140 31 MCM5; POLE; PRIM1; POLE4; PRIM2; CDK2; Reactome POLA2; UBC; UBB; RPA2; MCM10; POLA1; POLE3; MCM4; CDC7; MCM2; RPS27A; RPA1; GMNN Post NMDA receptor activation events 0.047347768 0.693910801 0.408297073 1.455118479 2376 32 CAMK2B; HRAS; MAPK1; CAMK2A; RPS6KA3; Reactome RPS6KA2; PDPK1; ADCY3; RASGRF1; RRAS; RPS6KA1; ACTN2 Toll Like Receptor 4 (TLR4) Cascade 0.0301014 0.693910801 0.291253991 1.360552593 1510 126 PTPN11; JUN; TLR1; IRAK1; S100A12; PLCG2; Reactome MAP2K1; CD36; TICAM1; TMEM189; RIPK1; MAPK1; RIPK3; TAB1; TLR2; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4; MAP2K3; DNM1; IRF7; AGER; MAPKAPK3; LBP; RPS6KA1; VRK3; UBE2D3; TAB2; MAP2K4; TRAF3; TLR6; UBC; UBB; ATF1 Signaling by cytosolic FGFR1 fusion mutants 0.02521699 0.693910801 0.533694057 1.606635804 1257 17 GAB2; STAT1; TRIM24; FGFR1OP; ZMYM2; Reactome STAT5A; FGFR1OP2; CPSF6; PIK3R1 FGFR1 mutant receptor activation 0.008454468 0.546704814 0.515043237 1.725252787 423 25 FGF5; GAB2; STAT1; TRIM24; FGFR1OP; Reactome ZMYM2; STAT5A; ERLIN2; FGFR1OP2; CPSF6; PIK3R1 Signaling by FGFR1 in disease 0.003983826 0.494390184 0.494024464 1.76063992 199 32 FGF5; GAB2; HRAS; STAT1; TRIM24; FGFR1OP; Reactome ZMYM2; STAT5A; NRAS; ERLIN2; FGFR1OP2; CPSF6; PIK3R1 Gastrin-CREB signalling pathway via PKC and 0.005151643 0.494390184 0.602598536 1.814066261 256 17 HRAS; EGFR; MAPK1; RPS6KA3; RPS6KA2; Reactome MAPK NRAS; RPS6KA1; HBEGF MyD88:Mal cascade initiated on plasma mem- 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36; Reactome brane TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4; MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3; TAB2; MAP2K4; TLR6; UBC; UBB; ATF1; MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2; ELK1; SAA1; NOD1; SIGIRR; RPS27A

41 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme Toll Like Receptor TLR1:TLR2 Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36; Reactome TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4; MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3; TAB2; MAP2K4; TLR6; UBC; UBB; ATF1; MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2; ELK1; SAA1; NOD1; SIGIRR; RPS27A Toll Like Receptor TLR6:TLR2 Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36; Reactome TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4; MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3; TAB2; MAP2K4; TLR6; UBC; UBB; ATF1; MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2; ELK1; SAA1; NOD1; SIGIRR; RPS27A Toll Like Receptor 2 (TLR2) Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36; Reactome TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4; MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3; TAB2; MAP2K4; TLR6; UBC; UBB; ATF1; MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2; ELK1; SAA1; NOD1; SIGIRR; RPS27A FRS-mediated FGFR1 signaling 0.020411427 0.693910801 0.563807735 1.639277683 1020 15 PTPN11; FGF5; HRAS; NRAS Reactome Resolution of D-loop Structures through Holli- 0.007078623 0.546704814 0.521156029 1.745728954 354 25 EME2; DNA2; BLM; BARD1; XRCC2; GEN1; Reactome day Junction Intermediates RAD51; BRCA2; RAD50; XRCC3; TOP3A; PALB2 Resolution of D-loop Structures through 0.014896684 0.693910801 0.505722889 1.655319564 743 23 DNA2; BLM; BARD1; XRCC2; RAD51; BRCA2; Reactome Synthesis-Dependent Strand Annealing RAD50; XRCC3; TOP3A; PALB2 (SDSA) Resolution of D-Loop Structures 0.008340317 0.546704814 0.50189357 1.714001031 417 27 EME2; DNA2; BLM; BARD1; XRCC2; GEN1; Reactome RAD51; BRCA2; RAD50; XRCC3; TOP3A; PALB2 NOTCH1 regulation of human endothelial cell 0.012947907 0.429593065 0.565655774 1.703454736 644 17 SOX6; DLL1; FGFR3; DLL4; DLL3; MGP; ALPL; Wikipathways calcification JAG2; ITGA1; CALU; SAT1 Energy Metabolism 0.01705587 0.429593065 -0.404783261 -1.55838458 855 45 SIRT3; MYBBP1A; UCP3; MEF2A; PPP3CC; Wikipathways TFB1M; EP300; NRF1; PPARG; PPP3CB; PRKAA2; ATF2; PRMT1; MEF2B Fluoropyrimidine Activity 0.007188338 0.383959044 0.4843776 1.70963535 358 31 UPP2; DHFR; RRM1; GGH; UMPS; FPGS; XRCC3; Wikipathways TYMP; CES2; ABCC5; TP53; ABCG2; RRM2; ABCC3; TDG; UPP1 IL-1 signaling pathway 0.02181235 0.429593065 0.374172869 1.503218196 1087 55 PTPN11; JUN; IRAK1; MAP2K1; MAPK1; TAB1; Wikipathways IL1RAP; MAP3K14; IRAK3; IRAK4; MAP2K3; IL1B; PIK3R2; TAB2; MAP2K4; PIK3R1 IL-7 Signaling Pathway 0.023391696 0.429593065 0.474438774 1.586739115 1168 25 PTK2B; STAT1; MAP2K1; MAPK1; BAD; STAT5A; Wikipathways IL7; BCL2L1; PIK3R2 Androgen Receptor Network in Prostate Cancer 0.044706917 0.490576187 0.334425835 1.390117319 2236 65 SPRY2; PTPN11; FOXA1; JUN; PTK2B; HRAS; Wikipathways SMARCA4; RASA1; MAP2K1; RAP1B; MAPK1; HSD17B3; SMARCD3 Retinoblastoma Gene in Cancer 4e-05 0.013691501 0.431472262 1.891761227 1 86 FAF1; E2F2; RBBP4; CHEK1; FANCG; CCNE2; Wikipathways CCNA2; RB1; DHFR; POLD3; E2F3; BARD1; POLE; MSH6; RRM1; MDM2; PRIM1; CCNB1; ANLN; HMGB2; SKP2; TP53; DCK; ZNF655; CDK2; CDK6; MAPK13; RBBP7; RRM2 Nanoparticle triggered autophagic cell death 0.003728145 0.318756355 -0.581397567 -1.832634063 186 20 TSC1; ATG7; ULK2; MAP1LC3A; ATG4A; TSC2; Wikipathways CHAF1A; SH3GLB1; BCL2 Aryl Hydrocarbon Receptor 0.013704952 0.429593065 0.416211196 1.586078175 682 43 HPGDS; RB1; HRAS; MAP2K1; EGFR; CD36; Wikipathways MAPK1; NRAS; ARNT; GCLC; NCOR2; CDC37; CDK2 Notch Signaling 0.049067576 0.490576187 0.370069984 1.417864534 2446 44 KCNJ5; LFNG; MFNG; DLL1; CTBP1; ADAM17; Wikipathways DVL2; DVL3; HES5; NCOR2; DTX3; DLL4; DLL3; MAML3 Mammary gland development pathway - Preg- 0.00056065 0.06391414 0.556819695 1.965323405 27 31 CEBPB; GAL; EGFR; ELF5; PNCK; ERBB4; Wikipathways nancy and lactation (Stage 3 of 4) ORAI1; BCL2L1; TNFSF11; JAK2; PRLR; ATP2C2 Bladder Cancer 0.031270032 0.486106862 0.400376699 1.492783629 1560 39 RB1; THBS1; HRAS; MAP2K1; EGFR; FGFR3; Wikipathways MAPK1; MDM2; NRAS; TYMP; MMP9; PIK3R2; TP53; HBEGF; CDH1 IL-3 Signaling Pathway 0.034509033 0.490576187 0.375656486 1.459043847 1720 47 PTPN11; JUN; GAB2; HRAS; MAP2K1; MAPK1; Wikipathways BAD; STAT5A; BCL2L1; VAV1; LYN; JAK2; IL3RA; PIK3R2; CSF2RB IL1 and megakaryocytes in obesity 0.024731937 0.429593065 0.478149273 1.582769407 1233 24 TLR1; IRAK1; F2R; IL18; TLR2; IL1B; MMP9; Wikipathways HBEGF; NLRP3; TIMP2 Kit receptor signaling pathway 0.04258049 0.490576187 0.34477382 1.405720859 2123 59 PTPN11; SOCS6; GAB2; HRAS; STAT1; MAP2K1; Wikipathways MAPK1; BAD; STAT5A; RPS6KA3; VAV1; LYN; JAK2; RPS6KB1; PIK3R2; GRB7; RPS6KA1 Signaling of Hepatocyte Growth Factor Recep- 0.008262308 0.383959044 0.463582723 1.672953779 411 34 PTPN11; JUN; PTK2B; HRAS; RASA1; MAP2K1; Wikipathways tor RAP1B; MAPK1; MET Melatonin metabolism and effects 0.042413309 0.490576187 0.426013246 1.478593933 2115 29 IRAK1; AANAT; ACHE; CAMK2A; CYP2D6; Wikipathways MAOA; PER1; CRY1; PER2 Preimplantation Embryo 0.025122401 0.429593065 0.396990647 1.512833405 1251 43 AQP3; SOX2; CDX2; POU5F1; SMARCA4; Wikipathways BATF3; E2F5; ZFP36; MYBL1; FOXD1; HMGA1; TBX3; AQP9; CDH1; GATA2; H2AFY2; GATA3; SOX11 Exercise-induced Circadian Regulation 0.016795994 0.429593065 -0.397994164 -1.546332431 841 47 HERPUD1; GFRA1; PSMA4; ETV6; STBD1; Wikipathways UCP3; KLF9; G0S2; CLDN5; CBX3; GSTM3; ARNTL; PPP2CB; BTG1; PURA; GSTP1; NR1D2; NCKAP1

42 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.21.053918; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value ES NES n More Ex- Size Leading Edge Genes Source (adjusted) treme MET in type 1 papillary renal cell carcinoma 0.028346014 0.461635087 0.363248835 1.464864955 1411 56 PTPN11; JUN; HRAS; AKT2; MAP2K1; RAP1B; Wikipathways STRN; MAPK1; MET; BAD; NRAS; ALK; TFE3; RPL11 Phosphodiesterases in neuronal function 0.040324845 0.490576187 0.377085467 1.444743246 2010 44 PDE8A; GUCY1B2; ADCY4; PDE5A; PDE3A; Wikipathways PDE4A; ADCY10; PDE4B; ADCY3; ADCY7 Non-small cell lung cancer 0.008981498 0.383959044 0.37789964 1.576064839 449 66 E2F2; RB1; HRAS; PLCG2; E2F3; AKT2; PDK1; Wikipathways MAP2K1; EGFR; MAPK1; BAD; PRKCG; STAT5A; NRAS; ALK; PIK3R2; GADD45A; TP53; RXRB Breast cancer pathway 0.047455598 0.490576187 0.273185514 1.30107725 2377 142 E2F2; JUN; FGF5; RB1; HRAS; E2F3; AKT2; Wikipathways WNT11; DLL1; MAP2K1; FZD3; EGFR; MAPK1; FZD2; DVL2; RAD51; NRAS; BRCA2; RAD50; CSNK1A1L; DVL3; HES5; TNFSF11; HEYL; WNT10A; RPS6KB1; PIK3R2; GADD45A; WNT10B; TP53; PTEN; CSNK1A1; DLL4; DLL3; FRAT1; SHC2; CDK6; FGF19; TCF7L2; PIK3R1 LTF danger signal response pathway 0.008870384 0.383959044 0.590021326 1.746165632 440 16 IRAK1; MAPK1; TLR2; CD14; IRAK4; IL1B; Wikipathways AGER; TREM1 G1 to S cell cycle control 0.0226521 0.429593065 0.365984997 1.487035992 1129 58 E2F2; MCM5; CCNE2; RB1; E2F3; POLE; ATF6B; Wikipathways MDM2; CDKN2C; PRIM1; MNAT1; CCNB1; CREB3; GADD45A; TP53; PRIM2; CDK2; POLA2; CDK6 DNA Replication 0.01926508 0.429593065 0.427640186 1.564624613 960 36 MCM5; POLD3; POLE; PRIM1; POLD1; PRIM2; Wikipathways CDK2; POLD4; RFC1; POLA2; UBC; RPA2; MCM10; POLA1; RFC4; MCM4; CDC7; MCM2 Hedgehog Signaling Pathway 0.047153273 0.490576187 -0.52375165 -1.521369023 2366 15 SAP18; KIF7; GLI3 Wikipathways IL-2 Signaling Pathway 0.045674089 0.490576187 0.380180057 1.433483893 2278 41 PTPN11; JUN; GAB2; PTK2B; HRAS; STAT1; Wikipathways MAP2K1; MAPK1; STAT5A Type II interferon signaling (IFNG) 0.02016339 0.429593065 0.429058527 1.55989128 1006 35 PTPN11; PSMB9; STAT1; NOS2; CXCL9; OAS1; Wikipathways IRF8; STAT2; JAK2; IL1B; HLA-B ErbB Signaling Pathway 0.018199656 0.429593065 0.330621667 1.455654347 908 88 JUN; CAMK2B; HRAS; PLCG2; AKT2; PDK1; Wikipathways MAP2K1; EGFR; MAPK1; BTC; MDM2; ERBB4; CAMK2A; BAD; PRKCG; EREG; STAT5A; NRAS; ABL2; RPS6KB1; PIK3R2; TP53; NRG4; HBEGF; BCL2L11; MAP2K4; SHC2; PAK1; PIK3R1 Serotonin Receptor 2 and ELK-SRF-GATA4 0.048044558 0.490576187 0.483773588 1.503377097 2397 19 HRAS; GATA4; MAP2K1; MAPK1; HTR2B; Wikipathways signaling NRAS; RASGRF1 Toll-like Receptor Signaling Pathway 0.04829932 0.490576187 0.307545254 1.351237917 2413 87 JUN; TLR1; IRAK1; SPP1; TLR8; STAT1; AKT2; Wikipathways MAP2K1; TICAM1; RIPK1; MAPK1; CXCL9; TAB1; TLR2; CD14; IRAK4; MAP2K3; IL1B; PIK3R2; IRF7; LBP; CD40; TAB2; MAP2K4; MAPK13; TRAF3; TLR6; PIK3R1

43