MASTER THESIS

B.Sc. Martin Zerbst

Transcriptome progression in community acquired pneumonia

2018

Faculty of Applied Computer Sciences and Biosciences


Author: Martin Zerbst

Study Programme: Molecular Biology/

Seminar Group: Mo15

First Referee: Prof. Dr. rer. nat. Dirk Labudde

Second Referee: PD Dr. rer. nat. habil. Hans Binder

Mittweida, January 2018

Bibliographic Information

Zerbst, Martin: Transcriptome progression in community acquired pneumonia, 75 pages, 13 figures, Hochschule Mittweida, University of Applied Sciences, Faculty of Applied Computer Sciences and Biosciences

Master Thesis, 2018

Abstract

Community acquired pneumonia (CAP) is a very common, infectious and sometimes lethal disease. It is therefore associated with high costs of diagnosis and treatment. To reduce these health care costs, diagnosis and treatment must become cheaper without any loss in predictive accuracy. One effective way of doing so would be the identification of easily detectable and highly specific transcriptomic markers, which would reduce the amount of work required for laboratory tests and possibly enhance diagnostic capability. Transcriptomic whole blood data derived from the PROGRESS study were combined with several documented features such as age, smoking status or the SOFA score. The analysis pipeline included processing by self organizing maps for dimensionality and noise reduction, as well as diffusion pseudotime (DPT). Pseudotime enabled modelling a disease course of CAP, where each sample represented a state/time in the modelled course. Both methods combined resulted in a proposed disease course of CAP, described by 1476 marker genes. An additional geneset analysis provided information about the immune related functions of these marker genes.

Contents

Contents

List of Figures

List of Tables

Nomenclature

1 Introduction

2 Methods & Materials
2.1 Origin of data
2.2 SOFA score
2.3 Used software
2.4 Self organizing maps
2.5 Geneset analysis
2.6 Correlation of documented features with metagenes
2.7 Pseudotime analysis
2.8 Identification of markers and features
2.9 Determination of transcriptomic changes
2.10 Describing progression of CAP

3 Results
3.1 Relation of features and pneumonia
3.2 Pseudotime analysis
3.3 Identification of possible marker spots
3.4 Describing progression of CAP

4 Discussion
4.1 Relations of features and CAP
4.2 Functional analysis of spots and marker genes
4.3 Description of CAP progression

5 Summary

A Supplementary data

List of Figures

3.1 DC of DPT analysis, using 2500 metagene expression profiles, with descriptive colour overlays

3.2 DC of DPT analysis, performed using all 48,107 gene expression profiles, with descriptive colour overlays

3.3 DC of DPT analysis, performed only with 1476 marker gene expression profiles, with descriptive coloured overlays

3.4 Sample-wise expression profiles of metagenes with respective significance, ordered by pseudotime

3.5 Significant, sufficient variant metagenes and their positions within the SOM

3.6 Spreading of CAP to other organs with increasing pulmonal subSOFA score by plotting mean subSOFA scores of each organ against the respective pulmonal subSOFA score

3.7 Modelled, pseudotime dependent likelihood of metagene expression, sorted spotwise by turning point of likelihood distribution model

3.8 Modelled, pseudotime dependent likelihood for marker gene expression, sorted spotwise and by switch point

A.1 Starting parameters and sorting of preprocessed/normalized data

A.2 Definition of function: fetch (sub-)SOFA scores from files

A.3 Calculation and visualization of pseudotime

A.4 Function definition for the generalized additive model describing relations between pseudotime and meta-/genes

A.5 Function for distributing metagenes into separate spots

List of Tables

1.1 Most common pathogens causing CAP

2.1 Determination of subSOFA scores

2.2 Packages used within R/Rstudio, without dependencies

3.1 Additional documented features for each patient correlated with metagene expressions

3.2 Selection of Chaussabel’s genesets and their enrichment in spot A and B

3.3 Frequencies of most relevant keywords from geneset titles by ’oposSOM’

3.4 Results of the geneset analysis performed with Chaussabel’s genesets for the 1476 marker genes determined

A.1 Transcriptional switches of meta-/genes

I. Nomenclature

CAP     Community acquired pneumonia
CNS     Central nervous system
DC      Diffusion components
DN      Downregulation/Downregulated at
DPT     Diffusion pseudotime
GPCR    G-protein coupled receptor
GTP     Guanosine triphosphate
ICU     Intensive care unit
LTα     Lymphotoxin α
MAP     Mean arterial pressure
mmHg    Millimetre mercury
ROS     Reactive oxygen species
SOFA    Sequential organ failure assessment
SOM     Self organizing maps
TNF     Tumor necrosis factor
UP      Upregulated/Upregulation at
WHO     World Health Organization

Chapter 1: Introduction 1

1 Introduction

Community acquired pneumonia (CAP) is one of the most common infectious and potentially serious diseases worldwide. Children and elderly persons are affected most frequently. Considering a rising life expectancy, the incidence of CAP will increase in the future [1], leading to higher treatment and health care costs ($8.4 billion annually in the USA) [2]. In Germany, 16,000 people died of CAP in 2014, and the number increased to 20,000 in 2015 [3]. CAP thereby was the eighth leading cause of death with a frequency of 2.1%, compared to chronic ischemic heart failure, the leading cause of death with a frequency of 8.2%. Acute myocardial infarction, categorized according to ICD-10 by the WHO [4], is the second leading cause of death with a frequency of 5.3%. To reduce mortality, several aetiologic studies were conducted, showing that CAP is usually caused by a bacterial infection, a viral infection or a combination of both. The most common pathogens are listed in Table 1.1 [5] [6] [1]. Typically, S. pneumoniae causes CAP in children and seniors, whereas young adults suffer more often from atypical infections (e.g. with Mycoplasma pneumoniae [6]) including other pathogens [2].

Table 1.1: Most common pathogens causing CAP

Pathogen                         Frequency [%]
Streptococcus pneumoniae         25 - 43.6
Coxiella burnetii                6 - 18.5
Haemophilus influenzae           5 - 11
Virus in general                 10 - 14.4
Legionella sp.                   8
Chlamydia spp.                   7 - 10.6
Gram negative enteric bacilli    6
Pseudomonas aeruginosa           5
Mycoplasma pneumoniae            15.9

The diagnosis of CAP consists of three steps: physical examination (auscultation), radiological examination (actual establishment of the diagnosis) and finally laboratory tests (e.g. leukocyte count, sputum Gram stain, blood cultures and urine antigens) [2]. With the identification of specific transcriptomic markers, the diagnosis of CAP would become more reliable and cheaper. Such markers would help in choosing the right treatment, for example in deciding whether antibiotics are required or whether an enhanced risk of death exists. Knowing which genes are expressed during which phase of the disease would greatly reduce the costs of tests that usually check a broad range of criteria [7]. So far, most studies focus on the influence of single nucleotide polymorphisms in CAP regarding TNF and LTα gene polymorphisms, pattern recognition molecules (esp. mannose binding lectins), inflammatory molecules and the coagulation system [8] [9].

The main objective of this thesis is to sort the blood transcriptome data from 392 CAP patients, taken from the PROGRESS study, by degree of CAP progression in order to find relations between disease progression and gene expression. Mostly anamnestic features and the SOFA score, a typical scoring system for sepsis and organ failure, are taken into account as well. The marker genes resulting from this procedure shall form a proposed model, applicable as a tool for diagnosing stages of CAP. With the help of diffusion pseudotime, this model will also contain markers for different courses of CAP, if they exist.

2 Methods & Materials

2.1 Origin of data

Expression data including additional anamnestic patient data were obtained from the PROGRESS study. This study deals with adults who suffer from CAP and receive invasive or non-invasive artificial respiration. The 796 samples contained RNA from stabilized whole blood serum, partially taken at different time points during the treatment of the 392 patients [10]. Expression values were determined with Illumina HT12v4 chips. The data were preprocessed and provided by Dr. Holger Kirsten and Prof. Markus Scholz from the Institute for Medical Informatics, Statistics and Epidemiology (IMISE) at Leipzig University. The methods of the PROGRESS study are described in the NIH database for clinical trials [11] and in the respective paper [10].

2.2 SOFA score

The SOFA score (sequential organ failure assessment score) was introduced to assess organ failure, sepsis and organ dysfunction in general. It is one of the most relevant scoring systems for estimating stages of pneumonia [12] and is usually applied to patients in the intensive care unit (ICU). To obtain the SOFA score of a patient, several criteria depending on the organ or organ system are tested (see Table 2.1). For every organ, a subSOFA score is determined, and the subSOFA scores add up to the total SOFA score [13]. This score, as well as the individual subSOFA scores, may provide information about progressing CAP and about which samples belong to which states or scores.

Table 2.1: Determination of subSOFA scores

SOFA score                 1            2                 3                    4
Respiration:
  PaO2/FiO2 [mmHg]         <400         <300              <220                 <100
  SaO2/FiO2                221-331      142-220           67-141               <67
Coagulation:
  Platelets x 10^-3/ml     <150         <100              <50                  <20
Liver:
  Bilirubin [mg/dl]        1.2-1.9      2.0-5.9           6.0-11.9             >12
Cardiovascular:
  Hypotension              MAP          Dopamine          Dopamine             Dopamine
                           < 70 mmHg    ≤ 5 µg/kg/min     > 5 µg/kg/min        > 15 µg/kg/min
                                        or dobutamine     or norepinephrine    or norepinephrine
                                        detected          ≤ 0.1 µg/kg/min      > 0.1 µg/kg/min
CNS:
  Glasgow coma score       13-14        10-12             6-9                  <6
Renal:
  Creatinine [mg/dl]       1.2-1.9      2.0-3.4           3.5-4.9              >5.0
  Urine output [ml/d]      -            -                 <500                 <200

2.3 Used software

The packages listed in Table 2.2 were used with RStudio v.1.0.143 on the 64-bit version of Fedora 25 as operating system. For the package ’oposSOM’, the most recent and not yet published ’in-house’ version was used with approval of Dr. Henry Löffler-Wirth. A ’light’ version is available from the Bioconductor Project. The package ’DPT’, currently not available from Bioconductor or the CRAN repository, was obtained from the Institute of Computational Biology at the German Research Centre for Environmental Health ’Helmholtz-Zentrum München’. Starting parameters for oposSOM are given in the supplementary data (see Figure A.1), as are the self-written R functions.

Table 2.2: Packages used within R/Rstudio, without dependencies

Package    Version
oposSOM    in-house
VGAM       1.0
DPT        0.6.0
pscl       1.5.2

2.4 Self organizing maps

When dealing with large amounts of data, several strategies for optimal processing can be applied. This thesis uses the unsupervised learning approach of self organizing maps (SOM) as implemented in the R package ’oposSOM’ [14] from the Bioconductor Project [15]. The concept of SOM was introduced in 1982 by Kohonen. It provides the means to reduce multidimensional data to spaces of much lower dimension [16]. In this thesis a two dimensional, matrix like map is calculated. 48,107 genes in 796 different samples from 392 patients were used as input data for oposSOM, training a 50 x 50 SOM, which forms a visualizable matrix like structure. Each sample contains one expression value for each of the 48,107 genes, which means that every gene shows 796 different expression values, depending on the sample. These 796 values are considered the gene expression profile, one vector for each of the 48,107 genes.

The SOM algorithm tries to find representative vectors that are most similar to the gene profiles. The gene profiles are clustered by similarity to these representatives, the so called metagenes, with minimal to no loss of information. The metagenes are then adapted to their clustered genes in a definable number of training cycles (see Figure A.1), resulting in 2500 metagenes. Each metagene thereby represents a cluster of genes that show very similar expression patterns. Metagenes themselves can be quite similar and are therefore sorted by similarity into the SOM grid, so that the most similar metagenes are always found next to each other, whereas dissimilar metagenes are separated by large distances within the SOM grid. In addition, metagenes containing genes with a narrow expression range are positioned in the center of the SOM, while metagenes containing genes with a wide range of expression values usually appear at the border of the SOM. Owing to this similarity, adjacent metagenes can be associated with certain functions.

The ’oposSOM’ pipeline uses a variety of statistical and analytical tools to identify metagenes in all generated SOMs (796 in this thesis) that fit special requirements, like overexpression, underexpression or correlation with each other. This way, an overall summary for the analyzed feature is generated in one SOM grid. The coordinates of metagenes meeting the conditions are called spots, and the conditions used to identify such spots can be defined at will. The package ’oposSOM’ builds a working pipeline around the SOM algorithm and provides many additional functions, e.g. for visualizing the metadata/-genes provided by the SOM as portraits [17]. One of the most relevant tools within the ’oposSOM’ pipeline was the geneset analysis described in Section 2.5.
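The training principle described above can be illustrated with a minimal sketch. It is not the ’oposSOM’ implementation: the function names, the toy grid size, the learning rate schedule and the Gaussian neighbourhood are assumptions chosen for illustration, and the thesis itself used a 50 x 50 grid with 48,107 gene profiles of length 796.

```python
import math
import random


def best_matching_unit(codebook, profile):
    """Return the grid position (row, col) of the metagene closest to a gene profile."""
    best, best_dist = None, math.inf
    for pos, weights in codebook.items():
        dist = sum((w - x) ** 2 for w, x in zip(weights, profile))
        if dist < best_dist:
            best, best_dist = pos, dist
    return best


def update(codebook, profile, bmu, learning_rate, radius):
    """Pull the best matching metagene and its grid neighbours towards the profile."""
    for pos, weights in codebook.items():
        grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        # Gaussian neighbourhood (assumed): influence is 1 at the BMU and decays with grid distance
        influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
        step = learning_rate * influence
        codebook[pos] = [w + step * (x - w) for w, x in zip(weights, profile)]


def train_som(profiles, size=3, epochs=20, seed=0):
    """Train a size x size map; returns {(row, col): metagene profile}."""
    rng = random.Random(seed)
    # Assumed initialisation: every metagene starts as a randomly chosen gene profile
    codebook = {(r, c): list(rng.choice(profiles))
                for r in range(size) for c in range(size)}
    total_steps = epochs * len(profiles)
    step_no = 0
    for _ in range(epochs):
        order = profiles[:]
        rng.shuffle(order)
        for profile in order:
            frac = 1.0 - step_no / total_steps           # shrinks from 1 towards 0
            learning_rate = 0.5 * frac + 0.01            # hypothetical schedule
            radius = max(size / 2.0 * frac, 0.5)         # hypothetical schedule
            bmu = best_matching_unit(codebook, profile)
            update(codebook, profile, bmu, learning_rate, radius)
            step_no += 1
    return codebook
```

After training, each gene is assigned to the metagene returned by `best_matching_unit`, which corresponds to the clustering of gene profiles around their metagenes described above.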

2.5 Geneset analysis

One of the numerous features of the ’oposSOM’ pipeline is the geneset analysis based on Fisher’s exact test. With this analysis, given spots can be tested for enriched genesets; the test does not use the contained metagenes themselves, but the genes summarized by them. A geneset contains a number of genes associated with each other. The association can originate from network analysis, observed and deduced pathways or simply from genes being expressed under the same circumstances [18]. Thus, a pathway can be understood as a geneset, too. The genesets used in this thesis are described by Chaussabel and were defined with the main aim of finding disease related marker genes. The genes in these genesets are functionally associated with each other, representing processes or cell types [19]. Geneset analysis in the ’oposSOM’ pipeline was performed geneset-wise via Fisher’s exact test, followed by the determination of adjusted p-values. Since these steps were performed by the ’oposSOM’ pipeline, genes from the genesets as well as genes contained in the analyzed data had to be annotated with Ensembl IDs. To find spots within the SOM associated with a certain function or vice versa, geneset analysis based on Fisher’s exact test appeared to be the best choice.
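The enrichment test can be sketched as follows. This is an illustration only and not the code of the ’oposSOM’ pipeline: the one-sided test is written out as the upper tail of the hypergeometric distribution, and the Benjamini-Hochberg procedure is an assumption, since the text does not state which p-value adjustment was used. All function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(spot_genes, geneset, universe_size):
    """One-sided Fisher exact test: P(overlap >= observed) under random sampling.

    Assumes that all genes of the spot and of the geneset belong to a
    universe of `universe_size` annotated genes.
    """
    spot, gs = set(spot_genes), set(geneset)
    overlap = len(spot & gs)
    n_spot, n_set = len(spot), len(gs)
    denominator = comb(universe_size, n_spot)
    upper = min(n_spot, n_set)
    # Hypergeometric upper tail, summed over all overlaps at least as large as observed
    numerator = sum(comb(n_set, i) * comb(universe_size - n_set, n_spot - i)
                    for i in range(overlap, upper + 1))
    return numerator / denominator


def benjamini_hochberg(p_values):
    """Adjust p-values for multiple testing (assumed method, see lead-in)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A geneset would then be called enriched in a spot if its adjusted p-value falls below the chosen threshold.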

2.6 Correlation of documented features with metagenes

To evaluate whether the SOFA score, the subSOFA scores or other documented features are an appropriate tool for describing CAP, they were correlated with the metagene expression values of the respective samples. Due to the categorical type of the data, every feature was correlated using Spearman correlation. The only exception was ’age’, for which Pearson correlation was used.
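The two correlation measures and the resulting correlation range per feature can be sketched as below. The helper names are hypothetical and the code is a plain re-implementation for illustration, not the R calls used in the thesis. Spearman correlation is computed as the Pearson correlation of average ranks, which handles the many ties that occur in categorical features.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_range(metagene_profiles, feature, method):
    """Smallest and largest correlation of a feature over all metagene profiles."""
    values = [method(profile, feature) for profile in metagene_profiles]
    return min(values), max(values)
```

For a feature such as ’age’, `correlation_range(profiles, age, pearson)` would be used; for all other features, `method=spearman`.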

2.7 Pseudotime analysis

Pseudotime is a concept for determining the chronological order of experimental data. It can be used to analyze data from measurements of heterogeneous populations like cells/tissues or progressions of different diseases. Most of the results usually cannot be assigned directly to certain times (e.g. myeloid lineage) because of their nonlinear behaviour [20] or heterogeneity in e.g. cellular and molecular properties [21]. Methods like hierarchical clustering, principal component analysis or independent component analysis are designed to detect discrete subgroups but cannot preserve a continuous trajectory of development [22]. The recently proposed ’Wanderlust’ algorithm [23] was a first approach to this problem but was only able to detect non-branching trajectories through the data space [22]. Since CAP is currently not known to differ in anything except its aetiologic causes, using the Wanderlust algorithm could result in missed subtypes and courses of CAP.

For this reason, pseudotime analysis was performed using the ’Diffusion pseudotime’ (DPT) algorithm. This algorithm is based on the concept that each ’state’ in a dataset is able to diffuse into another state with a certain likelihood. Here, these are the probabilities for transitions between each of the 796 samples. From these probabilities, a so called transition matrix is formed, whose eigenvalues and eigenvectors enable a pseudotemporal ordering of the given data. The two eigenvectors associated with the largest non-trivial eigenvalues, further referred to as diffusion components (DC), provide a simple visualization of the sorted data. The n-th elements of those two DC thereby act as the coordinates of the respective sample in the plot [22]. Calculating a disease progression based on each provided sample and sorted in chronological order would greatly enhance the possibilities of describing CAP. Such a model makes it possible to associate features with numerically defined sequences of CAP.

The biggest advantage of the DPT algorithm is its ability to detect ’branches’ within the analyzed data. As for cells of the myeloid lineage, a branching point would mark the time when e.g. common lymphoid progenitor cells differentiate into either natural killer cells or small lymphocytes [24]. Thus, three branches would be observed: one for the common path of differentiation and two branches for the respective, separated differentiation into natural killer cells or small lymphocytes. Most importantly, if the disease progression of CAP shows different courses, a branching could be observed in this case as well. In this thesis, meta-/genes were used as the required ’states’, for which the transition matrix and the pseudotime were calculated. To compare the DPT results for metagenes with those for selected genes, the pseudotimes of both results were normalized.
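The idea of a transition matrix whose eigenvectors order the samples can be illustrated with a strongly simplified sketch. It is not the ’DPT’ package and performs no branch detection: it builds a transition matrix from an assumed Gaussian kernel, obtains the first non-trivial diffusion component by power iteration and uses the rescaled component as a stand-in for pseudotime. Function names, the kernel width and the iteration count are hypothetical.

```python
import math
import random


def first_diffusion_component(samples, sigma=1.0, n_iter=1000, seed=0):
    """First non-trivial right eigenvector of a Gaussian-kernel transition matrix."""
    n = len(samples)
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
               for q in samples] for p in samples]
    degree = [sum(row) for row in kernel]
    # Row-normalised kernel: probability of diffusing from sample i to sample j
    transition = [[kernel[i][j] / degree[i] for j in range(n)] for i in range(n)]
    total = sum(degree)
    stationary = [d / total for d in degree]
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        # Remove the trivial constant eigenvector (eigenvalue 1) before each step
        mean = sum(w * v for w, v in zip(stationary, vec))
        vec = [v - mean for v in vec]
        vec = [sum(transition[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def pseudotime_from_component(component):
    """Rescale a diffusion component to [0, 1] (simplified stand-in for DPT)."""
    lo, hi = min(component), max(component)
    return [(c - lo) / (hi - lo) for c in component]
```

The sign of an eigenvector is arbitrary, so the direction of the resulting ordering has to be fixed externally, for example by choosing a root sample; the rescaling to the range from 0 to 1 corresponds to the normalization of pseudotimes mentioned above.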

2.8 Identification of markers and features

To derive markers from the best pseudotime model, the meta-/gene profiles were tested for significance regarding pseudotime. For this purpose, a generalized additive model estimating relations between pseudotime and meta-/genes (see Figure A.4) was applied. The DPT model with the fewest significant meta-/gene profiles was discarded. Samples from the metagene DPT model were sorted by pseudotime in increasing order, forming a matrix in which each row shows the expression behaviour of one metagene over time. Samples assigned to neither ’branch 1’ nor ’branch 2’ were discarded as too unspecific. This way, the metagene profiles contain only expression values from samples most clearly associated with certain parts of CAP progression. The remaining metagenes that passed the p-value threshold were correlated with pseudotime, and those with an absolute Pearson correlation below 0.7 were removed. Metagenes with a variance below 0.02 were assumed to be genetically inactive and were removed as well. The remaining neighbouring metagenes were grouped into separate spots (see Figure A.5).

To make these results comparable to other studies, the steps from Section 2.7 were repeated with the genes contained in the metagenes of all identified spots. The remaining genes were again correlated with pseudotime and filtered by variance. Those with an absolute Pearson correlation below 0.7 or a variance below 0.02 were removed. Identified spots within the SOM were checked for potential specific functions by a general geneset analysis with all genesets contained in the ’oposSOM’ pipeline. The pipeline-internal descriptions of all significant genesets with a p-value below 0.05 were checked for keywords. Afterwards, a geneset analysis was conducted with Chaussabel’s genesets only, taking into account only those genesets whose enrichment was significant with a p-value below 0.05.
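The filtering and grouping steps can be sketched as follows. The thresholds are those given in the text; everything else is an assumption for illustration, in particular the data layout (a dictionary from SOM grid position to metagene profile), the use of the sample variance and the 8-neighbourhood used to decide which metagenes count as neighbouring. The function shown in Figure A.5 may differ.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def variance(x):
    """Sample variance (n - 1 in the denominator)."""
    mean_x = sum(x) / len(x)
    return sum((a - mean_x) ** 2 for a in x) / (len(x) - 1)


def select_marker_metagenes(profiles, pseudotime, min_abs_corr=0.7, min_var=0.02):
    """Keep metagenes that vary enough and correlate strongly with pseudotime.

    profiles: {(row, col): [expression per sample, ordered like pseudotime]}
    Returns {(row, col): Pearson correlation with pseudotime}.
    """
    kept = {}
    for pos, profile in profiles.items():
        if variance(profile) < min_var:       # assumed inactive, removed
            continue
        r = pearson(profile, pseudotime)
        if abs(r) >= min_abs_corr:
            kept[pos] = r
    return kept


def group_into_spots(positions):
    """Group grid positions into spots of mutually adjacent metagenes."""
    remaining = set(positions)
    spots = []
    while remaining:
        stack = [remaining.pop()]
        spot = set(stack)
        while stack:
            row, col = stack.pop()
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        spot.add(neighbour)
                        stack.append(neighbour)
        spots.append(sorted(spot))
    return sorted(spots)
```

The sign of the stored correlation indicates whether a metagene is up- or downregulated along pseudotime, which can be used to label the resulting spots.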

2.9 Determination of transcriptomic changes

When dealing with biological data, especially expression values, these values often behave in a non-linear way. Normalized to a range from 0 to 1 and plotted against the pseudotime, these values form a curve described by the logit relation (compare Equation 2.1), which gives the log-odds of a meta-/gene being expressed. In Equation 2.1, p represents the probability of a gene being expressed. In order to properly process these data, logistic regression was applied [25] [26].

\[ f(p) = \ln\left(\frac{p}{1-p}\right) \tag{2.1} \]

To do so, it is assumed that meta-/genes are divided into the binary states of either being or not being expressed. Therefore, normalized expression values exceeding 0.5 were set to 1 (expressed), while values ≤ 0.5 were set to 0 (not expressed). The function describing the probability that the respective gene is expressed at a given pseudotime is called the sigmoid function and is the inverse of the logit function. A generalized linear ’logit’ model was then applied to the binarized expression data. This model relates a linear predictor (see Equation 2.3) to the expression probability via the logit function, whose inverse is the sigmoid function (see Equation 2.2).

\[ f(x) = \frac{e^{x}}{1+e^{x}} \tag{2.2} \]

The generalized linear model was fed with the binarized expression data. The resulting coefficients β0 and β1 were used as coefficients in Equation 2.3, referred to as f(p), which describes the linearized model of the respective meta-/gene.

\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot x_1 + \ldots + \beta_n \cdot x_n \tag{2.3} \]

The linear function f(p) was inserted into Equation 2.2 for x, which resulted in Equation 2.4 and led to a model describing the respective meta-/gene-wise logistic problem [25] [26].

\[ \mathrm{sig}(f(p)) = \frac{e^{f(p)}}{1+e^{f(p)}} \tag{2.4} \]

Finally, each model was tested using McFadden’s pseudo R^2 [27] from the pscl package. Values above 0.2 were expected for excellent models [28].
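The procedure of this section can be sketched for a single meta-/gene with pseudotime as the only predictor. The sketch is a plain re-implementation for illustration and not the R code of the thesis: the coefficients are fitted by Newton-Raphson iterations instead of a call to a GLM routine, McFadden’s pseudo R^2 is computed directly as 1 minus the ratio of the model and null log-likelihoods, and all function names are hypothetical. The pseudotime at which the modelled probability equals 0.5 is x = -β0/β1.

```python
import math


def binarize(values, threshold=0.5):
    """Scale to [0, 1]; values above the threshold count as expressed (1)."""
    lo, hi = min(values), max(values)
    return [1 if (v - lo) / (hi - lo) > threshold else 0 for v in values]


def sigmoid(z):
    """Numerically stable inverse of the logit function (Equation 2.2)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(x, y, n_iter=25):
    """Fit ln(p / (1 - p)) = beta0 + beta1 * x by Newton-Raphson.

    Assumes the two classes overlap along x; for perfectly separated
    data the maximum likelihood estimate does not exist.
    """
    beta0 = beta1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(beta0 + beta1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p                 # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                     # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        beta0 += (h11 * g0 - h01 * g1) / det
        beta1 += (h00 * g1 - h01 * g0) / det
    return beta0, beta1


def log_likelihood(probabilities, y):
    eps = 1e-12
    return sum(yi * math.log(min(max(p, eps), 1 - eps))
               + (1 - yi) * math.log(min(max(1 - p, eps), 1 - eps))
               for p, yi in zip(probabilities, y))


def mcfadden_r2(x, y, beta0, beta1):
    """1 - logL(model) / logL(intercept-only model); y must contain both classes."""
    ll_model = log_likelihood([sigmoid(beta0 + beta1 * xi) for xi in x], y)
    p_null = sum(y) / len(y)
    ll_null = log_likelihood([p_null] * len(y), y)
    return 1.0 - ll_model / ll_null


def switch_point(beta0, beta1):
    """Pseudotime at which the modelled expression probability equals 0.5."""
    return -beta0 / beta1
```

A positive β1 corresponds to a meta-/gene that is switched on along pseudotime, a negative β1 to one that is switched off.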

2.10 Describing progression of CAP

The first approach to describing the progress of CAP was based on the subSOFA scores under the assumption that progressing CAP is represented by a rising pulmonal subSOFA score. For each pulmonal subSOFA score, the mean of every other subSOFA score in samples with the same pulmonal subSOFA score was determined and plotted against the pulmonal subSOFA score. The behaviour of the mean subSOFA scores depending on the pulmonal subSOFA score was expected to show trends indicating which organs are affected by progressing CAP.

The second approach to describing the disease progress relied on the switch points of the modelled meta-/gene expression. Since the assignment of genes to metagenes depends on the data used and the use of metagenes is not a broadly established state of the art, a spot-wise conversion of metagenes back into genes was necessary. In order to obtain only relevant genes, a spot-wise geneset analysis as described in Section 2.5 was performed. For each geneset a p-value was calculated, and genesets with a p-value > 0.05 were discarded as most likely having been found by chance. From the remaining genesets, only genes that showed an absolute Pearson correlation of at least 0.7 with the DPT were taken into account. Consequently, only genes for which the oposSOM pipeline provided an Ensembl ID could be taken into account. To compare these genes with metagenes, the DPT calculation was repeated as described in Section 2.7 with the selected genes from the respective metagenes. Genes whose p-value again exceeded 0.05 or whose absolute Spearman correlation with the newly calculated DPT was below 0.7 were discarded. To determine the time at which meta-/genes are up- or downregulated, logistic regression of the expression values was applied. The turning points of the resulting models were used as ’switch points’, i.e. the pseudotime values at which the respective meta-/gene was up- or downregulated.
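The first approach amounts to a grouped mean, which can be sketched as follows. The record layout (one dictionary of subSOFA scores per sample) and the key names are assumptions for illustration; the scores in the usage example are invented and not taken from the PROGRESS data.

```python
from collections import defaultdict


def mean_subsofa_by_pulmonal(samples, pulmonal_key="pulmonal"):
    """Mean of every non-pulmonal subSOFA score per pulmonal subSOFA level.

    samples: list of dicts such as {"pulmonal": 2, "renal": 1, "liver": 0}
    Returns {pulmonal level: {organ: mean subSOFA score}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        level = sample[pulmonal_key]
        for organ, score in sample.items():
            if organ != pulmonal_key:
                grouped[level][organ].append(score)
    return {level: {organ: sum(scores) / len(scores)
                    for organ, scores in organs.items()}
            for level, organs in sorted(grouped.items())}


# Hypothetical usage with invented scores
example = [
    {"pulmonal": 1, "renal": 0, "liver": 1},
    {"pulmonal": 1, "renal": 2, "liver": 1},
    {"pulmonal": 3, "renal": 4, "liver": 0},
]
means = mean_subsofa_by_pulmonal(example)
```

Plotting the resulting means per organ against the pulmonal level gives the kind of trend plot described above.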

3 Results

3.1 Relation of features and pneumonia

Correlating all 2500 metagene expression profiles obtained from SOM training with the given features alone led to a correlation range from -0.45 to 0.40. These values were achieved by correlating the metagenes solely with the total SOFA score of the respective samples. Any other feature listed in Table 3.1 correlated worse with the metagenes. The Pearson correlation of metagene expression with permuted combinations of subSOFA scores ranged from -0.47 to 0.46. The highest correlation value (0.46) was achieved when a combination of all subSOFA scores except the CNS and coagulation subSOFA scores was correlated with the metagene expression profiles. The lowest correlation value (-0.47) occurred when the metagenes were correlated with a combination of the pulmonal, renal, liver and cardiovascular subSOFA scores. Evidently, the documented features show no simple relation with metagenes. However, it cannot be excluded that a linear combination of features or even a linear model exists that shows better correlation with some metagenes. Only in this way could a potential course of CAP be evaluated with these features.

Table 3.1: Additional documented features for each patient correlated with metagene expressions

Feature                               Correlation type    Correlation range
Age                                   Pearson             -0.24 ... 0.32
Sex                                   Spearman            -0.21 ... 0.12
Died within 28d                       Spearman            -0.15 ... 0.17
Died at all                           Spearman            -0.24 ... 0.22
Applied antibiotics                   Spearman            -0.23 ... 0.21
Additional chronic disease present    Spearman            -0.20 ... 0.28
Smoker                                Spearman            -0.21 ... 0.15
Vaccinated against Influenza          Spearman            -0.12 ... 0.18
Vaccinated against Pneumococcus       Spearman            -0.10 ... 0.14
Bacterial infection                   Spearman            -0.12 ... 0.14
Viral infection                       Spearman            -0.10 ... 0.10
Fungal infection                      Spearman            -0.12 ... 0.15
SOFA score                            Spearman            -0.45 ... 0.40
Pulmonal subSOFA                      Spearman            -0.37 ... 0.33
Cardiovascular subSOFA                Spearman            -0.25 ... 0.32
CNS subSOFA                           Spearman            -0.20 ... 0.25
Liver subSOFA                         Spearman            -0.28 ... 0.29
Renal subSOFA                         Spearman            -0.27 ... 0.25
Coagulation subSOFA                   Spearman            -0.18 ... 0.23

3.2 Pseudotime analysis

With the results from the DPT analysis, a disease progress for CAP was modelled, where each time point in this model is represented by a sample. Depending on the input data (metagenes or all genes), the visualized progression of CAP over time (compare Figures 3.1, 3.2 and 3.3) appeared with different densities of samples. The model created using all available 48,107 genes appeared the least dense in the plot of the DC (see Figure 3.2). Calculating the DPT from metagenes led to a denser sample distribution, seen in Figure 3.1. The densest sample distribution, leading to the sharpest DPT trajectory, was obtained with the help of the 1476 marker genes, which were identified as described in Section 2.8. All three models have in common that no distinct third branch could be identified. Furthermore, all visualized samples appeared more or less arranged in a similar curved structure.

The results of the pseudotime analysis performed with metagenes are shown in Figure 3.1 (a) and (b). Figure 3.1 (a) shows the DCs plotted against each other to form a simple pseudotemporal course of samples. The samples were coloured by their respective branch assignment through the algorithm. 557 samples belonged to ’branch 1’ (coloured in green), 189 samples to ’branch 2’ (coloured in blue) and only 5 samples to ’branch 3’ (coloured in pink). Although a third branch was detected, the samples assigned to it represent less than 1% of all samples. Such a branch most likely bears no relevant information and was thus excluded from further discussion. 44 samples lay between ’branch 1’, ’branch 2’ and ’branch 3’. These samples could not be clearly assigned to one of those branches by the DPT algorithm and were therefore defined as ’uncertain 1,2,3’ (coloured in orange). Only one sample seemed more problematic for assignment, which is why the DPT algorithm declared it as ’unassigned 1,2,3’ (coloured in yellow).

Figure 3.1 (b) shows the same plot of DC, but with a different colour overlay (blue scale) and the calculated pseudotime trajectory (red). Each sample is coloured by its calculated pseudotime, showing that the pseudotime begins at ’branch 1’ and ends at ’branch 2’.

(a) Branch assignment of samples with all metagene expression profiles
(b) Trajectory of DPT through all samples with metagene expression profiles

Figure 3.1: DC of DPT analysis, using 2500 metagene expression profiles, with descriptive colour overlays.

The approach of modelling a pseudotemporal disease progression based on the expression of all genes showed worse results than the approach with metagenes or selected genes. Pseudotime calculated from samples with all 48,107 genes (see Figure 3.2 (a) and (b)) appeared with similar branch assignments and a similar spatial arrangement of samples, but far less dense than in Figure 3.1. ’Branch 1’ contained 367 samples, ’branch 2’ 384 samples, ’branch 3’ 2 samples, and 43 samples were summarized as ’uncertain 1,2,3’. Thus, using all gene expression profiles instead of metagene expression profiles for pseudotime calculation resulted in fewer samples in ’branch 1’ (coloured in green), more samples in ’branch 2’ (coloured in blue) and only two samples in ’branch 3’ (coloured in pink). The two samples in ’branch 3’ are even fewer than the five in ’branch 3’ of the DPT analysis with metagenes. The number of uncertain samples (coloured in red) remained nearly the same. The DPT analysis with all genes showed no samples assigned to ’unassigned 1,2,3’. As shown in Figure 3.2 (b), DPT progresses from ’branch 1’ to ’branch 2’, as in the DPT model for metagenes.

(a) Branch assignment of samples with gene expression profiles
(b) Trajectory of pseudotime through all samples with gene expression profiles

Figure 3.2: DC of DPT analysis, performed using all 48,107 gene expression profiles, with descriptive colour overlays

Repeating the DPT calculation with the 1476 marker genes described in Section 3.3 led to the results seen in Figure 3.3. ’Branch 1’ (coloured in green) contained 278 samples, ’branch 2’ (coloured in blue) contained 497 samples and 21 samples were declared ’uncertain 1,2,3’ (coloured in orange). Again, no samples were declared ’unassigned 1,2,3’. In addition, no samples were assigned to ’branch 3’.

(a) Branch assignment of samples only containing marker gene expression profiles
(b) Pseudotime trajectory through samples containing only marker gene expression profiles

Figure 3.3: DC of DPT analysis, performed only with 1476 marker gene expression profiles, with descriptive coloured overlays

3.3 Identification of possible marker spots

Before spots and markers of interest could be identified, the two DPT models calculated from all meta-/gene expression profiles were checked for how well the respective expression profiles are related to DPT. To use the sharpest expression profiles, only samples from ’branch 1’ and ’branch 2’ of the respective DPT model were taken into account. Applying the generalized additive model to the DPT model calculated with all 2500 metagene expression profiles and to the DPT model calculated with all 48,107 gene expression profiles showed very different results. Testing the metagene profiles from the remaining 746 samples for significance regarding the pseudotime led to 2361 metagenes with a p-value smaller than 0.05, leaving 139 metagenes as insignificant. These 139 metagenes account for about 6% of all metagenes. For the DPT model calculated using gene expression profiles, 751 samples from ’branch 1’ and ’branch 2’ could be used for the generalized additive model. Testing all 48,107 gene expression profiles regarding their relation to the calculated DPT resulted in 18,795 genes whose adjusted p-value exceeded 0.05, which is about 39%. Comparing the proportion of insignificant expression profiles in both DPT models, the pseudotime calculated using metagenes performed far better with only about 6%. Therefore, this DPT model was used to identify marker genes for describing the progress of CAP.

Figure 3.4: Sample-wise expression profiles of metagenes with respective significance, ordered by pseudotime.

The first step in finding appropriate markers of pseudotemporal CAP progress required the significant metagenes identified in the respective DPT model. Figure 3.4 shows all 746 samples, ordered column-wise by the pseudotime obtained from DPT analysis with metagenes. The pseudotemporal course of the 2500 metagene expression profiles, coloured by expression value, can therefore be read row-wise. On the right, a significance plot shows the adjusted p-values of the metagene profiles with respect to their relation to DPT. Metagenes whose genes show mostly invariant expression appear white. Such metagenes are usually found in the centre area of SOM grids.

(a) Remaining metagene profiles, clustered by similarity
(b) Identified positions of relevant metagenes in the SOM

Figure 3.5: Significant, sufficiently variant metagenes and their positions within the SOM

After applying a Pearson correlation filter (absolute correlation > 0.7) and a variance filter (variance > 0.02), only 315 metagenes fulfilled both criteria (see Fig. 3.5(a)). Each row there represents a metagene profile, coloured by expression value at the respective pseudotime. Ordered by pseudotime, the expression values of these metagene profiles followed a sigmoid (logistic) curve. Separating these metagenes into distinct areas led to two spots, ’A’ and ’B’, shown in Figure 3.5(b). All 190 metagenes in spot A are downregulated over time, while all 125 metagenes in spot B are upregulated over time. Converted into genes, 6246 genes belong to spot A and 2610 genes belong to spot B. All genes, including their assigned metagenes and the respective spot, are listed in Table A.1 in the supplementary data. Checking the ’oposSOM’ geneset titles for keywords gave first hints regarding possible functions associated with the marker metagenes in both spots. The frequencies of the chosen terms are listed in Table 3.3. The most frequent relevant keywords were ’immune’/’immunome’ and ’differentiation’. All other documented keywords can be associated with these two terms. Most likely, the disease progression is described by genes related to the immune system, since ’immune’ was one of the most frequent words in Table 3.3. ’Differentiation’ would then relate especially to cells of the myeloid lineage, several of which are activated during the immune response and then differentiate, in part rapidly, into more specific cells, like T-helper cells.
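As a minimal sketch, the correlation and variance filters described at the start of this paragraph could be applied as follows. The function names and the toy profiles are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since the text does not specify the estimator.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_variance(x):
    """Sample variance with n - 1 denominator (assumed estimator)."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / (len(x) - 1)


def select_profiles(profiles, pseudotime, min_abs_r=0.7, min_var=0.02):
    """Return the names of profiles that pass both filters."""
    kept = []
    for name, values in profiles.items():
        if sample_variance(values) <= min_var:
            continue  # too flat; also avoids a zero denominator in pearson()
        if abs(pearson(values, pseudotime)) > min_abs_r:
            kept.append(name)
    return kept


# Made-up toy profiles, ordered by pseudotime (not data from the study).
pseudotime = [0.0, 0.25, 0.5, 0.75, 1.0]
profiles = {
    "up": [-0.4, -0.3, 0.0, 0.3, 0.4],
    "down": [0.5, 0.2, 0.0, -0.2, -0.5],
    "flat": [0.01, 0.0, 0.01, 0.0, 0.01],
    "noisy": [0.3, -0.3, 0.3, -0.3, 0.3],
}
selected = select_profiles(profiles, pseudotime)
print(selected)
```

In this toy example, the flat profile fails the variance filter and the oscillating profile fails the correlation filter, so only the two monotone profiles remain.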

Table 3.2: Selection of Chaussabel’s genesets and their enrichment in spot A and B

Geneset               Spot enriched   p-value
2,8_T-cells           A               1.35e-74
2,4_Ribosomal         A               1.89e-54
2,1_Cytotoxic cells   A               2.65e-28
2,6_Myeloid lineage   B               1.22e-86
3,3_Inflammation II   B               1.00e-99
1,5_Myeloid lineage   B               6.19e-44

Table 3.3: Frequencies of most relevant keywords from geneset titles by ’oposSOM’

Terms                                   Frequency
Immunome, immune system, -regulation    56
Differentiation                         58
Actin                                   17
Ubiquitin                               35
B-cell/-s                               5
Dendritic                               14
Granulocytes                            1
Megakaryocytes                          2
Monocytes                               21
Macrophage/-s, Macroautophagy           21
Myeloid cells                           17
Natural killer cells                    18
Platelet                                13
T-cells                                 12
T-helper cells                          11
Thymocytes                              1

A proper geneset analysis of the spots for possibly enriched functions resulted in the small selection listed in Table 3.2. T-cells and cytotoxic cells can be associated with the genesets 2,8_T-cells and 2,1_Cytotoxic cells, both enriched in spot A. Both cell types are important members of the humoral/adaptive immune response. Expression of the related genes becomes important once the humoral/adaptive immune response is initiated [19] [29]. Geneset 2,6_Myeloid lineage mainly contains genes associated with granulocytes and monocytes. The similar geneset 1,5_Myeloid lineage is associated with monocytes as well, and additionally with dendritic cells. These cell types are typically activated during the innate immune response. In summary, spot A most likely represents genes important for the humoral and/or adaptive immune response, while genes contained in spot B are more likely part of the innate immune response [19] [29]. Geneset 3,3_Inflammation II showed the lowest p-value, which is nearly zero. Inflammation, however, is too general a process to be associated with particular phases of the immune response.
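To illustrate how an enrichment p-value of the kind listed in Table 3.2 can arise, the following Python sketch computes a one-sided overrepresentation p-value. The statistical test used by the ’oposSOM’ pipeline is not stated in this section, so the hypergeometric upper tail used here is an assumption, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric distribution.

    population: number of genes in the background
    successes:  number of background genes that belong to the geneset
    draws:      number of genes in the spot
    observed:   number of spot genes that belong to the geneset
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Made-up example: 10 background genes, 4 in the geneset, spot of 3 genes,
# 2 of which belong to the geneset.
p_value = hypergeom_upper_tail(population=10, successes=4, draws=3, observed=2)
print(p_value)
```

A small p-value indicates that the overlap between spot and geneset is larger than expected for a random gene selection of the same size.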

3.4 Describing progression of CAP

Plotting the means of the other subSOFA scores against the respective pulmonary subSOFA score (Figure 3.6) showed a linear increase in the renal and liver subSOFA scores. The coagulation subSOFA score showed a general increase, but fluctuated with progressing pulmonary subSOFA score. An exponential increase of the mean subSOFA score was noticeable for the CNS and the heart. At a pulmonary subSOFA score of 1, every documented organ/system can already be affected by progressed CAP. 31 out of 211 samples with a pulmonary subSOFA score of 1 showed an increased renal subSOFA score, with a mean of 0.19. Likewise, 32 samples showed an increased coagulation subSOFA score, with a mean of 0.18. Only four samples showed an increased subSOFA score for both systems. The heart was affected in 18 samples, while the liver was affected in 13 samples. Only one sample was shared by both. Finally, the CNS was affected in only three cases, with no effect on other systems occurring. As seen in Fig. 3.6, the CNS and cardiovascular scores are affected exponentially with progressing disease. The coagulation score shows a slight overall increase but varies with increasing pulmonary subSOFA score. The mean subSOFA scores of kidney and liver increased linearly, with the kidney subSOFA score showing the higher means in general.

Figure 3.6: Spreading of CAP to other organs with increasing pulmonary subSOFA score, shown by plotting the mean subSOFA score of each organ against the respective pulmonary subSOFA score.
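The curves in Figure 3.6 are group means: samples are grouped by their pulmonary subSOFA score, and the other subSOFA scores are averaged within each group. A minimal Python sketch of this step is given below. The record layout, the function name and the toy samples are hypothetical and do not reproduce PROGRESS data.

```python
from collections import defaultdict


def mean_scores_by_pulmonary(samples, organs):
    """Map each pulmonary subSOFA score to the mean subSOFA score per organ."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["pulmonary"]].append(sample)
    result = {}
    for level in sorted(grouped):
        group = grouped[level]
        result[level] = {
            organ: sum(s[organ] for s in group) / len(group) for organ in organs
        }
    return result


# Made-up toy samples (one dict per sample, one key per organ system).
samples = [
    {"pulmonary": 1, "renal": 0, "liver": 0},
    {"pulmonary": 1, "renal": 1, "liver": 0},
    {"pulmonary": 2, "renal": 1, "liver": 1},
    {"pulmonary": 2, "renal": 2, "liver": 0},
]
means = mean_scores_by_pulmonary(samples, ["renal", "liver"])
for level, organ_means in means.items():
    print(level, organ_means)
```

Plotting each organ's means against the pulmonary subSOFA score then gives one curve per organ system.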

Figure 3.7: Modelled, pseudotime dependent likelihood of metagene expression, sorted spot-wise by turning point of likelihood distribution model

The first result of trying to describe CAP progression is shown in Figure 3.7, using metagene expression probability distributions. Logistic regression of each sorted metagene expression profile resulted in 315 modelled probability distributions, which are plotted row-wise in Figure 3.7 and coloured by the pseudotime dependent likelihood. The models range from 0 (downregulated, coloured blue) to 1 (upregulated, coloured red). In addition, the models were sorted row-wise within each spot by their respective turning point (coloured white). Metagenes from spot A are thus likely upregulated at the beginning of CAP and become downregulated over time. Metagenes from spot B, on the other hand, were likely downregulated at the beginning and upregulated later. The pseudotimes of the turning points (coloured white) of the 315 metagene likelihood distributions are listed in Table A.1 in the supplementary data. With the time points known at which metagenes change their expression patterns, the current state of CAP can be narrowed down to a short period of pseudotime. This period depends on which metagenes are expressed and which are not. Since the turning points of the models mark the time points at which expression patterns most likely change, they will be referred to as ’switches’ or ’switch points’. Since SOMs are not yet a widely established method, the 315 marker metagenes were converted back into genes to enable comparability with the results of other studies and papers. The 315 metagenes summarized 6544 genes, of which only 5755 genes had an Ensembl ID. Since geneset analysis in the ’oposSOM’ pipeline requires genes annotated with an Ensembl ID, only these 5755 genes were taken into further account. After geneset enrichment analysis, 4134 of the 9828 curated genesets within the ’oposSOM’ pipeline appeared significant with a p-value below 0.05.
Each of the significant genesets shared at least 14.4% of its genes with the remaining 5738 genes obtained from converting metagenes back into genes with an Ensembl ID. Nearly every gene was found in multiple genesets. Pearson correlation of the 5738 gene expression profiles with DPT, calculated using metagene profiles, ranged from -0.91 to 0.89. Taking into account only genes that reach an absolute correlation of at least 0.70 left 1476 genes. Since none of these genes showed a variance below 0.02, all 1476 were used as marker genes. Repeating the DPT calculation with these 1476 marker genes led to the results shown in Figure 3.3 (a) and (b). Compared to the structures in Figure 3.1 (a) and (b), those in Figure 3.3 (a) and (b) appeared slimmer. The Pearson correlation of gene expression with DPT this time ranged from -0.93 to 0.90. When tested for significance regarding the course of DPT, every sample had a p-value < 0.05, with the lowest p-value being 3.05·10^-98. For the determined marker genes, switch points were calculated with logistic regression as well. Comparing the switch pseudotimes of metagenes with those of their summarized genes revealed several differences. Of the 1476 genes, 788 have switch times that deviate by more than 0.05 from their respective metagene switch point. Increasing this threshold to 0.1 leaves only 312 deviating genes, and a threshold of 0.2 leaves only 51. The likelihood models finally used for the marker genes are shown in Figure 3.8. As in Figure 3.7, the gene expression likelihood models are plotted row-wise, sorted by spot and by turning point of the model. They are likewise coloured by the probability of being upregulated, with likely upregulated states coloured red and likely downregulated states coloured blue. Switch/turning points of the likelihood models lie in the white coloured areas.
This shows clearly that converting metagenes back into their contained genes does not change the expression behaviour. Spot A therefore contains all genes that are upregulated at the beginning and downregulated over time, whereas spot B contains all genes that are downregulated at the beginning of CAP and upregulated over time. All marker metagene and gene switch points are listed in Table A.1 in the supplementary data. The McFadden test showed that metagenes are more appropriate for expression likelihood modelling, because the mean pseudo R² of the metagene models was 0.57, compared to a mean pseudo R² of 0.39 for the gene models. The lowest pseudo R² was 0.25 for metagenes and 0.19 for genes, which is only slightly below the 0.2 recommended for an excellent model [28]. 135 of the 1476 Ensembl IDs are referred to by 288 Illumina IDs, meaning that some genes are divided among different metagenes.
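The switch points and the McFadden pseudo R² discussed above can be illustrated with a small Python sketch. It assumes that each expression profile is first binarized (1 = upregulated, 0 = downregulated) and then modelled as P(up | t) = 1 / (1 + exp(-(b0 + b1·t))), so that the switch point is the pseudotime at which this probability equals 0.5, i.e. t = -b0 / b1. The binarization, the Newton-Raphson fitting routine and all names are assumptions for illustration and are not taken from the thesis pipeline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(t, y, b0, b1):
    """Bernoulli log-likelihood of the logistic model."""
    ll = 0.0
    for ti, yi in zip(t, y):
        p = min(max(sigmoid(b0 + b1 * ti), 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def fit_logistic(t, y, iterations=25):
    """Fit b0, b1 by Newton-Raphson (data must not be perfectly separable)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ti, yi in zip(t, y):
            p = sigmoid(b0 + b1 * ti)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. b0
            g1 += (yi - p) * ti     # gradient w.r.t. b1
            h00 += w                # entries of X'WX
            h01 += w * ti
            h11 += w * ti * ti
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def switch_point(b0, b1):
    """Pseudotime at which the modelled probability equals 0.5."""
    return -b0 / b1


def mcfadden_r2(t, y, b0, b1):
    """1 - LL(model) / LL(intercept-only); y must contain both classes."""
    p_null = sum(y) / len(y)
    ll_null = sum(
        yi * math.log(p_null) + (1 - yi) * math.log(1.0 - p_null) for yi in y
    )
    return 1.0 - log_likelihood(t, y, b0, b1) / ll_null


# Made-up binarized profile of a gene that is upregulated over pseudotime.
t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(t, y)
switch = switch_point(b0, b1)
r2 = mcfadden_r2(t, y, b0, b1)
print(round(switch, 3), round(r2, 2))
```

A positive slope b1 corresponds to a profile that is upregulated over time (as described for spot B), a negative slope to one that is downregulated over time (spot A).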

Figure 3.8: Modelled, pseudotime dependent likelihood for marker gene expression, sorted spotwise and by switch point
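The comparison of gene and metagene switch points reported above amounts to counting genes whose switch point differs from that of their metagene by more than a threshold. A minimal Python sketch follows. The function name, the dictionary layout and the toy values are hypothetical and do not correspond to entries of Table A.1.

```python
def count_deviating(gene_switch, metagene_switch, gene_to_metagene, threshold):
    """Count genes deviating by more than threshold from their metagene."""
    return sum(
        abs(gene_switch[g] - metagene_switch[gene_to_metagene[g]]) > threshold
        for g in gene_switch
    )


# Made-up switch points in pseudotime units.
gene_switch = {"g1": 0.50, "g2": 0.58, "g3": 0.75, "g4": 0.30}
metagene_switch = {"m1": 0.52, "m2": 0.60}
gene_to_metagene = {"g1": "m1", "g2": "m1", "g3": "m2", "g4": "m2"}

counts = {
    thr: count_deviating(gene_switch, metagene_switch, gene_to_metagene, thr)
    for thr in (0.05, 0.1, 0.2)
}
print(counts)
```

As in the text, the number of deviating genes shrinks as the threshold is increased.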

Table 3.4 contains the results of the geneset analysis performed with the 1476 identified marker genes. Each of Chaussabel’s relevant genesets shared at least 32% of its genes with the marker genes. The function represented by each geneset was assigned to a pseudotemporal range, given in the column ’DPT range’, depending on the switch points of the contained genes. Since the switch points of marker genes were not equally distributed, especially within genesets, the mean of these switch points is given as well (mean DPT, last column). These values mark the pseudotemporal main focus of the respective genesets. With the exception of ’1,5_Myeloid lineage’, every geneset had its main focus after 0.50. The overall range of likely changes in expression behaviour extended from 0.19 to 0.78. Genesets related to the innate immune response were especially ’1,5_Myeloid lineage’, ’1,7_MHC Ribosomal proteins’ and ’2,6_Myeloid lineage’. These genesets were enriched in spot B, and the contained genes were therefore upregulated over time. Genesets more related to the adaptive and/or humoral immune response were ’1,3_B-cells’, ’2,1_Cytotoxic cells’, ’2,8_T-cells’ and ’3,7_Spliceosome’. These genesets were enriched in spot A, and the contained genes were therefore downregulated over time. Other genesets enriched in both spots can more or less be associated with the immune response in general. Inflammatory processes are confirmed by the enriched genesets ’3,2_Inflammation I’ and ’3,3_Inflammation II’. Switches of genes contained in both genesets cover nearly the whole pseudotime range, which means that inflammatory processes very likely span most of CAP progression. This is plausible, since inflammation is typical of diseases.

Table 3.4: Results of the geneset analysis performed with Chaussabel’s genesets for the 1476 marker genes determined.

Geneset                               p-value   Intersecting genes  Share  DPT range    Mean DPT
1,3_B-cells                           4.58e-10  25                  0.51   0.54 - 0.58  0.56
1,4_Replication                       1.08e-10  42                  0.38   0.47 - 0.63  0.57
1,5_Myeloid lineage                   7.09e-24  60                  0.54   0.19 - 0.65  0.47
1,7_MHC Ribosomal proteins            3.9e-48   91                  0.69   0.42 - 0.77  0.64
1,8_Metabolism Biosynthesis           5.01e-08  43                  0.32   0.39 - 0.66  0.57
2,1_Cytotoxic cells                   3.81e-20  47                  0.57   0.54 - 0.71  0.64
2,11_Replication                      3.1e-08   39                  0.34   0.31 - 0.70  0.57
2,4_Ribosomal proteins                2.84e-40  75                  0.69   0.36 - 0.69  0.59
2,6_Myeloid lineage                   8.68e-50  97                  0.67   0.22 - 0.70  0.54
2,8_T-cells                           6.83e-57  91                  0.79   0.49 - 0.78  0.61
2,9_Cytosceleton                      1.37e-11  53                  0.35   0.34 - 0.66  0.55
3,2_Inflammation I                    1.28e-27  100                 0.42   0.34 - 0.70  0.56
3,3_Inflammation II                   1.31e-53  132                 0.56   0.22 - 0.70  0.54
3,4_Protein phosphatases              7.95e-20  103                 0.34   0.48 - 0.70  0.59
3,5_Hemoglobin genes                  6.89e-4   7                   0.54   0.42 - 0.69  0.57
3,6_Mitochondrial ribosomal proteins  1.53e-20  91                  0.37   0.32 - 0.70  0.56
3,7_Spliceosome                       2.13e-32  113                 0.44   0.39 - 0.65  0.55
3,8_Enzymes                           2.91e-27  95                  0.43   0.44 - 0.71  0.59
3,9_Kinases                           9.92e-20  88                  0.37   0.44 - 0.71  0.58
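The columns ’Intersecting genes’, ’Share’, ’DPT range’ and the mean DPT of Table 3.4 can be derived from the marker gene switch points, as the following Python sketch suggests. It assumes that the share is the number of intersecting genes divided by the geneset size. The function name, gene names and values are hypothetical.

```python
def summarise_geneset(geneset_genes, marker_switch):
    """Return (n_intersecting, share, (earliest, latest), mean switch point).

    Assumes at least one gene of the geneset is among the marker genes.
    """
    hits = [g for g in geneset_genes if g in marker_switch]
    times = [marker_switch[g] for g in hits]
    share = len(hits) / len(geneset_genes)
    return len(hits), share, (min(times), max(times)), sum(times) / len(times)


# Made-up switch points of marker genes and a made-up geneset.
marker_switch = {"geneA": 0.49, "geneB": 0.61, "geneC": 0.78, "geneD": 0.30}
geneset = ["geneA", "geneB", "geneC", "geneX"]

n_hits, share, dpt_range, mean_dpt = summarise_geneset(geneset, marker_switch)
print(n_hits, share, dpt_range, round(mean_dpt, 2))
```

Because switch points are not evenly distributed within a geneset, the mean can lie well away from the centre of the range, which is why both values are reported.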

4 Discussion

4.1 Relations of features and CAP

The correlations between the documented medical features and the metagene expression values, ranging from -0.45 to 0.40, show that most likely none of these features is closely correlated with the disease progression of CAP. Each feature may contain aspects that are more likely to correlate with the disease, but these were neither documented nor likely to be easily detected. However, at least some features, like the SOFA score and the pulmonary subSOFA score, appear sufficiently dependent on CAP to provide hints as to which areas of the SOM, and thus which genes and metagenes, may play a role in the progression of CAP. Linear combinations of subSOFA scores showed that higher correlation values with metagenes, in the range of -0.47 to 0.46, are achievable. Still, none of the documented features contains enough information to properly understand and describe the transcriptomic progression of CAP. In the search for a method suitable for describing disease progression, diffusion pseudotime was used. In the first approach, when DPT was calculated with metagene and gene expression data, the metagenes from the calculated SOM proved more reliable. Calculating the pseudotime with ’oposSOM’ metagenes showed far less noise and a sharper trajectory. This was illustrated especially by the number of significant metagenes and genes in both procedures. DPT analysis with metagenes resulted in about 6% of metagenes with a p-value over 0.05, whereas DPT analysis with genes required removing about 39% of the genes because of too high a p-value. Besides, both calculations showed effectively no branching in the disease progression, considering that only five and three samples, respectively, were assigned to ’branch 3’.
The number of samples within ’branch 1’ (557 with metagenes, 367 with genes) and ’branch 2’ (189 with metagenes, 384 with genes) differed considerably. In total, however, the 746 usable samples after pseudotime calculation with metagenes were only five fewer than the 751 usable samples after pseudotime calculation with genes.

4.2 Functional analysis of spots and marker genes

The most common key terms from geneset titles in the ’oposSOM’ pipeline, listed in Table 3.3, indicated that the most relevant functions of these genes are connected to cells of the immunome and to processes of the immune system. This includes cells like monocytes, dendritic cells or T-cells, which typically undergo differentiation processes during the innate and/or adaptive immune response [29]. However, these key terms alone are not sufficient to describe either the progress of CAP or the concrete functions represented within the spots. The question also arises whether the expression behaviour of the genes is CAP dependent or a general reaction to pathogen invasion. To answer this, a comparison with the progression of other diseases is recommended. Such a comparison would make a differential expression analysis possible and may also reveal additional marker genes that are not directly or indirectly related to the immune response. In general, the geneset analysis with Chaussabel’s genesets showed that the spots and marker genes are heavily related to the immune response (see Table 3.4). Spot A appeared most likely to represent the adaptive and humoral immune response, while spot B most likely represented the innate immune response. For diagnostic purposes, genes from spot B therefore appear more appropriate. For determining the state of CAP, spot A should be taken into account as well, since parts of the immune response occur in later stages of disease.

4.3 Description of CAP progression

Using only subSOFA scores as a means of description, different progressions of CAP in each systemic component of the body were visible (see Figure 3.6). Assuming that the pulmonary subSOFA score increases as CAP progresses, Figure 3.6 showed linear trends for the renal and liver subSOFA scores, while the cardiovascular subSOFA score and the score for the CNS showed exponential trends. The subSOFA score for blood and coagulation oscillates but increases overall. The subSOFA scores themselves are evidence that CAP affects not only the lung. Possibly dangerous trends are seen when the pulmonary subSOFA score exceeds 3. Crossing this border means that PaO2/FiO2 and/or SaO2/FiO2 fall below roughly half of the values found in healthy people. Under these hypoxic conditions, an exponential increase of the cardiovascular subSOFA score, which is also determined by norepinephrine concentrations, appears plausible [30]. The Glasgow Coma Score, which is the criterion for the CNS subSOFA score, is also influenced by hypoxia [31]. So when the disease affects the CNS, the exponential course can be explained by the combined damage from hypoxia and from the pathogens. When trying to describe the disease progression based on the switch points of the 1476 identified genes, it became clear that the time interval between 0 and 1 is not covered completely. While the first switch point was observed for NCF2 (upregulated at (UP) 0.19), the last one appeared at 0.80 with EGR1. NCF2 is expressed in neutrophil macrophages and forms a complex with NCF1 (UP 0.68) and NCF4 (UP 0.34), which subsequently binds cytochrome b558 and activates the latent phagocyte NADPH oxidase. This would lead to the oxidative burst, an essential step in phagocytosis [32]. EGR1 acts as a transcriptional regulator responding to DNA damage, growth factors, hypoxia and ischemia. Being upregulated at 0.80 would mean that these effects have occurred or are still occurring.
Since EGR1 participates in tissue repair following ischemia, it can be assumed that the body is trying to repair damage from inflammation, respiratory burst, reactive oxygen species (ROS) and invading pathogens. Without data from healthy people or from people assumed to be freshly cured of CAP, it cannot be verified whether the immune response is over at this point or not.

The time between 0.19 and 0.80 appears more difficult to describe and deserves more attention for the proper identification of possible marker genes. But since 1476 markers are still a large number, not all of them can be discussed. The genes mentioned here stood out by interacting with each other [32], being contained in the same geneset [19] or being one of very few genes within a certain range of pseudotime. Between 0.20 and 0.53 several receptor proteins are expressed, like SIRPA (UP 0.22 and 0.37), CXCR1 (UP 0.28), FPR1 (UP 0.30) and several Toll-like receptors (TLR1, UP 0.38; TLR4, UP 0.53; TLR5, UP 0.53; TLR6, UP 0.51; and TLR8, UP 0.52). SIRPA, a receptor for CD47, belongs to the immunoglobulin superfamily. By binding CD47, dendritic cells avoid phagocytosis through signaling [33]. CXCR1 is a G-protein coupled receptor (GPCR) that binds IL8 with high affinity and thus mediates neutrophil chemotaxis [34][32]. Chemotaxis requires remodelling of the cytoskeleton in order to migrate through the tissue. Genes participating in this process and the respective signaling are CXCR1 (UP 0.28), RHOG (UP 0.51), DOCK5 (UP 0.57), CRK (UP 0.58) and LIMK2 (UP 0.65). CXCR1 is a receptor for IL8, signaling chemotaxis to nearby neutrophils. RHOG is a small GTPase of the Rho family that works as a switch in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape. This enables the enhanced cell migration required for the immune response. DOCK5 is a guanine nucleotide exchange factor for small Rho G proteins and interacts with the adapter CRK. Together they regulate intestinal epithelial cell spreading and migration on collagen IV. Finally, LIMK2 is thought to contribute to Rho-induced reorganization of the actin cytoskeleton and is found downstream in this pathway. MYD88 (UP 0.51) is involved in the Toll-like receptor and IL1 receptor signaling pathways. It can lead to NFκB activation, cytokine secretion and inflammatory response.
Generally speaking, the expression of these and other receptor proteins appears important in preparing the required regulatory reactions. MYD88 is one of the first expressed genes that can actively lead to inflammation. Toll-like receptors are expressed at an early stage of the immune response. They play a fundamental role in mediating pathogen recognition and in the innate immune response. TLR8, for example, is predominantly expressed in lung and peripheral blood leukocytes. In the long term, TLRs lead to the production of ROS, which may also cause NETosis [34][32]. MSRB1 (UP 0.44) belongs to the MsrB family, which protects proteins from oxidative stress. This upregulation probably happens as a precautionary measure against ROS. Similar to NETs, METs are formed by macrophages. Although not fully proven, PADI2, expressed from 0.66 onwards, seems to be the main cause of MET formation by macrophages [35]. HCK (UP 0.34) is a tyrosine kinase, expressed particularly in the myeloid and B-lymphoid lineages. It may be involved in the activation of the respiratory burst by neutrophils and phagocytes [36], and in the migration and degranulation of neutrophils. Thus, it may directly or indirectly activate the later upregulated NCF4. It acts downstream of receptors binding the Fc region of immunoglobulins, like CSF3R (downregulated at (DN) 0.22 and 0.4). With HCK upregulated at 0.34, it seems that the differentiation of monocytes into macrophages is already under way. Possibly, some of the B-cells had even been activated and were preparing to differentiate as well. FCGR2C (UP 0.35) is a low affinity immunoglobulin γ Fc receptor that appears to be involved in phagocytosis, clearing of immune complexes and modulation of antibody production in B-cells. Its activation right after HCK supports the assumption that B-cells are already preparing for differentiation.
With FPR1 (UP 0.30), C5AR1 (UP 0.34), CSF2RA (UP 0.48 and 0.51) and FGR (UP 0.46 and 0.49), it seems that the differentiation of monocytes into macrophages occurs between 0.30 and 0.51. FPR1 is a G-protein coupled receptor in mammalian phagocytic cells and mediates their response to host invasion, which is important for inflammation and host defense. C5AR1 is a receptor for the chemotactic, inflammatory peptide anaphylatoxin C5a. Anaphylatoxins are released after the early complement activation, and C5a itself enhances phagocytosis [29]. CSF2RA is the α-subunit of the heterodimeric receptor for CSF2. This receptor controls the production, differentiation and function of macrophages and granulocytes. FGR contributes to the regulation of the immune response, including neutrophil, monocyte, macrophage and mast cell functions, cytoskeleton remodelling, phagocytosis, cell adhesion and migration. It acts downstream of receptors binding the Fc regions of immunoglobulins and is involved in several pathways regarding the mentioned regulations. Since FGR contributes to a large number of regulatory processes in different cell types, its upregulation at 0.46 and 0.49 marks a particularly interesting time in CAP progression. Unfortunately, because this gene was detected by different probes and in different metagenes at slightly different times, the exact moment is blurred. Considering the accumulation of genes involved in phagocytosis, neutrophils, monocyte differentiation and antigen presentation, the time from 0.20 until about 0.50 very likely covers the innate immune response. With ongoing antigen presentation by dendritic cells and activated macrophages, T-lymphocytes become activated, initiating the adaptive immune response [29]. Some of the first genes associated with the adaptive immune response are AQP9 (UP 0.39) and BCL6 (UP 0.43). AQP9 may play a role in the leukocyte response to pathogens.
Since macrophages secrete cytokines that let fluids pass from blood vessels into the tissue [29], AQP9 may be involved in this process. Furthermore, it appears to be a first sign of leukocyte activation. BCL6 is a transcriptional repressor, mainly required for antibody affinity maturation and the formation of germinal centers in lymph nodes. It enhances B-cell proliferation in response to T-cell dependent antigens and suppresses macrophage proliferation by competing with STAT5 at STAT-binding motifs. Since monocytes differentiate quite fast into macrophages [29], BCL6 appears to repress the maturation of too many macrophages. Genes with a more direct relation to T-cells were SEMA4A (UP 0.49) and CCDC14 (DN 0.49). SEMA4A is an immunomodulatory membrane protein with an Ig-like C2-type domain that activates T-cell mediated immunity. It also suppresses VEGF-mediated endothelial cell migration, proliferation and angiogenesis. CCDC14 (DN 0.49) negatively regulates centriole duplication [32]. It is so far the earliest indicator of T-cell division and was found as a downregulated exon in maturing T-cells [37]. One of Chaussabel’s genesets directly covers genes associated with T-cells. The problem here was that most of these genes appear in spot A, meaning that they are all downregulated over time. Activation or inhibition of gene expression and processes usually requires expressed proteins, whether they are activated or not. Thus, the activation of T-cells could not be traced very well. At least, by examining the downregulated genes within Chaussabel’s T-cell geneset, some times of inactivation or of possibly finished processes can be determined. T-cells are usually activated by dendritic cells or other antigen presenting cells, like macrophages. B-cells need armed effector T-cells in addition to presented antigens. Activation is then followed by differentiation [29]. The main reason for downregulation is most likely either that the protein is no longer required in general or that sufficient protein has been synthesized. In general, genes from Chaussabel’s T-cell set range from 0.49 to 0.78 in their switch points and show a mean downregulation at 0.61. Given the range of this geneset, the earliest activation of T-cells would be expected at 0.49, which is the same time at which CCDC14 and SEMA4A change their expression patterns. So these genes most likely mark the activation of T-cells and the following differentiation. While antigens are still presented by dendritic cells and, at this point, possibly also by macrophages, B-cells can be activated with the help of the now most likely active T-cells. One small but important regulatory signal transduction was found around CD27 (DN 0.63) with TRAF5 (DN 0.64) and SIVA1 (DN 0.54 and 0.59). SIVA1 binds with its N-terminal tail to the cytoplasmic antigen tail of CD27, which itself is a receptor in T-cells and is also assumed to be a receptor in memory B-cells [38]. SIVA1 induces apoptosis through binding CD27 [32]. Its downregulation before that of CD27 may be a genetic safety measure for a reduced rate of apoptosis. BCL2 (DN 0.62) would be expected to be overexpressed in cells that underwent apoptosis, but was mostly downregulated instead [39]. PJA1 (DN 0.60 and 0.66) is a ubiquitin ligase related to MHC1-mediated antigen processing and presentation. Its downregulation at a later stage could mean that no further antigen presentation is required.
The downregulation of CD2 (DN 0.69), CD96 (DN 0.70), LY9 (DN 0.69) and CD6 (DN 0.71) could then signal the end of T-cell differentiation processes. CD2 is a typical T-cell marker, while CD96, part of the Ig superfamily, is involved in adhesive interactions of activated T-cells and NK cells [32]. Moreover, CD96 is assumed to help in presenting antigens. Considering that PJA1 is downregulated, the assumption arises that antigen presentation is carried out by different cell types over time. LY9 is an immunomodulatory receptor, activated by homo- or heterotypic cell-cell interactions, that enables the activation and differentiation of immune cells. It therefore plays a role in the innate and adaptive immune response. LY9 may act as a negative regulator of the immune response, e.g. by disabling autoantibody responses or by negatively regulating the development of invariant natural killer cells. The last gene, CD6, is found mainly on T-lymphocytes and is required for T-cell activation through adhesion of activated T-lymphocytes. With these genes being downregulated at nearly the same time, the main period of differentiation seems to be over. There are still some genes from Chaussabel’s T-cell geneset, like LEF1, that are downregulated at 0.72, but they are more scattered. B-cell activation and differentiation happens after T-cell activation and requires armed effector T-cells [29]. Since the earliest possible activation of T-cells was dated to 0.49, the earliest possible activation of B-cells would be expected shortly afterwards. FCGR2C (UP 0.35), which was mentioned earlier, is one of the first signs of beginning B-cell differentiation. BRD9 (DN 0.52), BMS1 (DN 0.61) and ACTR (DN 0.67) act as possible markers for stages within B-cell differentiation and activation. BRD9, most likely required for chromatin remodelling and transcription, would be downregulated after the condensation of the chromatin has been reduced.
With reduced condensation, protein biosynthesis can start, which requires the assembly of ribosomes. BMS1 can, by similarity, assemble these ribosomes, and most likely it is mainly antibody production that takes place. After the assembly, BMS1 is no longer required and can be downregulated. Finally, ACTR, associated with transcription-coupled nucleotide excision repair and DNA double strand break repair, helps in DNA maintenance [32]. Typically, double strand breaks of DNA occur during antibody formation [29][40]. Afterwards, ACTR would be downregulated. Assuming that Chaussabel’s geneset 3,7_Spliceosome represents the differentiation of immune cells, this process can be placed between 0.39 and 0.65, with a mean time of 0.54. With the assumed end of B-cell differentiation at 0.69 - 0.70, the latest switch time of 0.65 within geneset 3,7_Spliceosome is not far away. Similar to the genes associated with T-cells and the respective geneset 2,8_T-cells, these genes are mostly downregulated and therefore mainly positioned in spot A. The same holds for genes from Chaussabel’s ribosomal geneset, which is strongly connected to MHC1 expressing cells. Genes from this geneset are mostly downregulated and positioned in spot A as well. During maturation of B-cells in the bone marrow, each cell ’constructs’ its own variable region of antigen receptors by DNA recombination. The same general procedure takes place in the formation of T-cell antigen receptors as well. The activation of naive lymphocytes is followed by enlargement and a reduction in chromatin density. After 4-5 days, the differentiation into an effector cell is complete. When no further antigen is presented, most cells undergo apoptosis, leaving only memory cells [29]. A geneset likely covering these processes is 2,4_Ribosomal proteins, whose transcriptomic switch points range from 0.36 to 0.69, with a mean time of 0.58.
The best marker gene from this geneset appeared to be PEBP1 (DN 0.61), which modulates pathways like NFκB, MAPK and GSK-3 [32]. Its downregulation may result in a reduced mediating ability, which would shorten the duration of inflammation. Another candidate appears to be SET (DN 0.59), whose downregulation enables apoptosis induced by cytotoxic T-lymphocytes. Depending on the aetiology, the absence of expression of this gene could mark an important step within the immune response.

During maturation and differentiation of cells, metabolism of course does not stand still. Several metabolic genes appeared to be important as well. They show that between about 0.40 and 0.65 many processes take place that require a lot of energy and substrates as building blocks. Even if these processes are common in mitotically active cells, the fact that these genes appeared significant and correlate with pseudotime means that they play an essential role. For example, BOLA3 (DN 0.39 and 0.44) is required for assembling the mitochondrial respiratory chain, while AK3 (DN 0.44) maintains the homeostasis of cellular nucleotides and may be involved in platelet production. Of course, at some point amino acids must be newly synthesized, which can be linked to WDR9 (DN 0.57), part of the subcomplex GATOR2. The protein complex GATOR inhibits mTORC1 and the TORC1 signaling pathway when amino acids are present. The downregulation of WDR9 would lead to activation of mTORC1 and the TORC1 signaling pathway, synthesizing new amino acids.

Two very similar genesets enriched in spot B are M 2.6 and M 1.5, with switch points ranging from 0.54 to 0.70 (M 2.6) and from 0.19 to 0.65 (M 1.5). They are associated with the myeloid lineage, including monocytes, dendritic cells and granulocytes. The ranges of their switch points appear congruent with the time ranges of monocyte, T-cell and B-cell differentiation and maturation, which were discussed beforehand.
Again, genes like ASAH1 (UP 0.70), FGD4 (UP 0.70) and ACVR1B (UP 0.65) presumably signal an end of this phase a few pseudotime units earlier. ASAH1 degrades ceramides into sphingosine and fatty acids. Ceramides would otherwise mediate differentiation signals, cell growth, apoptosis and the biosynthesis/secretion of cytokines [41]. The activation of ASAH1 at 0.70 therefore suggests activated cleanup processes. FGD4 is involved in the regulation of the actin cytoskeleton and cell shape. Since most immune processes seem to be over by this time, it may contribute to migrated cells returning to their original shape. A little earlier, ACVR1B is upregulated. It is a transmembrane Ser/Thr kinase activin type 1 receptor involved in the regulation of e.g. wound healing, production of extracellular matrix and immunosuppression (tissue homeostasis), and it is activated by ligands of the TGFβ family and by BMPs. Since the expression of this protein contributes to the regulation of inflammatory processes and tissue homeostasis, it may act as another marker for the winding down of inflammatory processes and the start of damage repair within the tissue.

When trying to describe a disease progression, inflammatory processes are important as well. Chaussabel covers two genesets regarding inflammatory processes, with switch points ranging from 0.37 to 0.70 for M 3.2 and from 0.22 to 0.70 for M 3.3. Both genesets are enriched in spot B, so the contained genes are upregulated over time. Inflammation is the most visible sign of invading pathogens. Both genesets from Chaussabel show their last switch points at 0.70, another result indicating that inflammatory processes are either finished or wound down afterwards. Two interesting genes and even better candidates for marker genes are MAP3K3 (UP 0.47) and MAPKAPK2 (UP 0.60). MAP3K3 is a MAP kinase kinase kinase that mediates the signal cascade leading to the activation of NFκB transcriptional regulators. Later, BCL3 (UP 0.60) becomes important by participating in an autoregulatory loop for NFκB signaling and expression.
MAPKAPK2 is directly regulated by p38 MAPK and is involved in more general processes like the inflammatory response, regulation of gene expression and cell proliferation. MYD88 (UP 0.51) binds to the cytosolic tail of TLR4, which activates NFκB signaling. IRAK3 (UP 0.62), mainly expressed in macrophages and monocytes, interferes with TLR signaling and negatively regulates NFκB signaling, which would reduce inflammation. These two genes might indicate the core time of inflammation. A more general marker in the later phase of inflammation would be CR1 (UP 0.60 and 0.63), a complement activation receptor found on erythrocytes, leukocytes, glomerular podocytes and splenic follicular dendritic cells. Its upregulation this late could point to a replenishment of the capacity to react to complement signaling. A last and possibly more important marker would be ALOX5 (UP 0.35), an enzyme forming a precursor of the proinflammatory leukotriene A4. For a neutrophil to end inflammation, lipoxins are necessary as braking signals. To this end, neutrophils switch from the synthesis of LTB4 to the synthesis of lipoxins. Within the results, only LTB4R was found to be upregulated, at 0.56. Anti-inflammatory secretion of lipoxins would be expected with the initiation of regenerative pathways, after the successful defeat of the pathogens.

Some marker genes should be mentioned with particular attention: RNF135 (UP 0.59 and 0.63) and TLR4. These markers are associated with specific reactions to invading pathogens and are thus candidates for improving laboratory diagnosis. RNF135 acts as an E2-dependent E3 ubiquitin-protein ligase, which is involved in the innate immune defense against viruses. TLR4 (UP 0.53) most likely reacts to lipopolysaccharides found in most gram-negative bacteria. It acts via MYD88 (UP 0.51) to activate NFκB, cytokine secretion and the inflammatory response. Yet, their usage should be tested very carefully.
Although they were found to be appropriate markers, one should consider that the input data contained all kinds of aetiologic causes mixed together. As long as the samples remaining after pseudotime analysis still include different detected pathogens (bacteria, viruses or fungi), the quality of pathogen-specific marker genes cannot be guaranteed. Because of the nature of SOM, meta-/gene profiles of such markers would always contain expression values caused by different pathogens.

Finally, some genes marking the pseudotemporal range from 0.71 to 0.79 should be mentioned. This interval represents the last occurring transcriptional changes and is not described in total by Chaussabel's genesets. They only provided switch points until 0.78, leaving out SNRNP40 (DN 0.73 and 0.79). This protein removes introns from pre-mRNA and was detected twice. It is part of a multiprotein complex of the spliceosome [32]. Not being covered by geneset 3,7_Splicosome may point to it being excluded from spliceosomal processes for ubiquitin ligases [19]. However, this gene appears too unspecific not to be part of the general spliceosome. GIMAP5 (DN 0.78) is an antiapoptotic protein of the GTP-binding superfamily that affects T-cell survival. By similarity, it is assumed that GIMAP5 may play a role in T-cell quiescence. Since it was downregulated, a decline of T-cells would be expected. CCND2 (DN 0.68 and 0.76) belongs to the highly conserved cyclin family, whose members are regulators of CDK kinases. It forms a complex with either CDK4 (DN 0.34) or CDK6 (DN 0.31). With CDK4, CCND2 regulates progression from G1 phase, whereby its activity is restricted to the G1-S phase. CDK6 activation during the mid-G1 phase is necessary for the transition into S phase. Considering the switch times, CDK4 and CDK6 must be highly persistent against degradation and inhibition in order to be present for the whole time CCND2 is expressed.
Most likely, cell cycle progression could still be maintained after the downregulation of CDK4/6. After the immune response has finished, CCND2 would no longer be required for immune cell division, which explains its downregulation. SPOCK2 (DN 0.74) binds glycosaminoglycans to form parts of the ECM. Due to its ability to bind Ca in addition to forming parts of the ECM, SPOCK2 can presumably help in maintaining tissue homeostasis during inflammation. Through its downregulation, SPOCK2 would act as another marker for the final stages of pathogen invasion. CD3D (DN 0.72) is part of the T-cell receptor and is involved in signal transduction and T-cell development. ST6GALNAC3 (UP 0.72) is involved in the biosynthesis of gangliosides, which are part of cell membranes. Although cell division processes, which require gangliosides, were mostly performed beforehand, ST6GALNAC3 was upregulated afterwards. This means that either a reservoir of gangliosides is being built up, or steps are coming up in which, due to differentiation, gangliosides are required more than in cell division. IL1R1 (UP 0.72) is an important receptor in mediating cytokine-induced inflammatory or immune responses. The expression of this receptor protein does not necessarily contradict the assumption of a finished immune response at this time. It appears more likely that this receptor protein is upregulated this late on purpose, to prevent accidental signal transduction with still present cytokines and IL1. ITGB7 (DN 0.71) is an integrin that, when dimerized with ITGA7, mediates lymphocyte migration. It can therefore be added to the markers for the shutdown of the immune response, as can DOCK10 (DN 0.71). Since this guanine nucleotide exchange factor was downregulated, B-cell lymphopoiesis was no longer sustained [32]. BCL11B (DN 0.71) is a transcriptional repressor and a key regulator of differentiation and survival in thymocyte development.
It is essential for controlling the responsiveness of hematopoietic stem cells to chemotactic signals.

With all the respective genes and genesets, a general disease progression could be estimated. The values mentioned act as the most likely times at which one state/phase of the disease progression changes into another. The specific processes occurring in each phase were mostly considered in combination, if not already described by Chaussabel's genesets in Table 3.4. From 0 to 0.19, the as yet undetected pathogen invasion presumably takes place, since no transcriptional changes were detected. The complement cascade may have just begun, then leading to the first transcriptional responses to the detected pathogens. At 0.20 the innate immune response begins, in which antigens are detected and presented by dendritic cells. Monocytes begin differentiating into macrophages, and at 0.49 the first signs of activated T-cells were seen, marking the transition to the adaptive immune response. The humoral immune response, typically carried out by B-cells [29], begins shortly afterwards at 0.54, when the first signs of activated B-cells were detected. The final steps of CAP appeared to overlap more, since many processes were active. Because some genesets 'end' around 0.70, especially inflammation-, myeloid lineage- and replication-associated genesets, this value marks the probable end of the immune response. Genes and genesets referring to the time afterwards appeared to be relevant for breaking up signal transduction in order to shut down the inflammatory response and cycles. Tissue homeostasis and apoptotic processes also played a role, indicating that tissue damaged by pathogens, ROS or ischemia still requires repair and enhanced maintenance. Apoptotic processes were required to reduce the number of e.g. B-cells, keeping mostly B-memory cells alive [29]. The last transcriptional changes occurred up to 0.80. Afterwards, no changes were detected.
The reason is either missing data on the complete recovery process or that the limits of pseudotime calculation were reached. In order to properly identify markers that apply only to pneumonia itself and not to pathogen invasion in general, a comparison with blood transcriptomics from other diseases is highly recommended.
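The phase boundaries proposed above can be written down as a small lookup. The following is a minimal sketch and not part of the original analysis: it assumes that pseudotime is scaled to the interval [0, 1], and the function name assign_phase as well as the phase labels are chosen here for illustration only.

```r
# Pseudotime boundaries taken from the discussion; labels are shorthand
# chosen for this sketch (hypothetical names, not from the analysis pipeline)
phase_breaks <- c(0, 0.20, 0.49, 0.54, 0.70, 0.80, 1)
phase_labels <- c("undetected invasion", "innate response",
                  "adaptive (T-cell) response", "humoral (B-cell) response",
                  "shutdown and repair", "no detected changes")

assign_phase <- function(pseudotime) {
  # right = FALSE gives intervals [lower, upper); include.lowest = TRUE
  # closes the last interval so that a pseudotime of exactly 1 is labelled
  cut(pseudotime, breaks = phase_breaks, labels = phase_labels,
      right = FALSE, include.lowest = TRUE)
}

example_pt <- c(0.10, 0.35, 0.50, 0.60, 0.75, 0.95)
phases <- as.character(assign_phase(example_pt))
print(data.frame(pseudotime = example_pt, phase = phases))
```

Since the boundaries are estimates of the most likely transition times, samples close to a boundary should not be read as firmly belonging to one phase.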

5 Summary

SOFA and subSOFA scores are helpful for describing medical conditions but lack the means for a transcriptomic or pseudotemporal description of CAP. The same holds for anamnestic and medical features alone. Those features alone were found to correlate with an absolute value of at most 0.45, far too low for any considerable association. However, there are indications that linear combinations of features might be suitable for description, since combinations of subSOFA scores showed absolute Spearman correlation values of up to 0.47. The more appropriate processing, however, appeared to be a combination of SOMs and the concept of pseudotime, using 'oposSOM' and 'DPT', the diffusion pseudotime. DPT made it possible to order the samples pseudotemporally by forming the most likely disease progression, and helped to uncover that the progression of CAP appears to be a linear one. There was no proof or even a hint that CAP would occur with different disease progressions, regardless of the invading pathogen(s). Using oposSOM to reduce the number of genes to 2500 metagenes greatly improved the determination of the pseudotemporal ordering, creating a sharper trajectory through a more condensed sample space. The significance improved as well, with about 39% insignificant genes compared with about 6% insignificant metagenes. Visualizing the significant metagenes that fulfil the conditions of sufficient correlation with pseudotime and an adequately large range of expression values showed two anticorrelated spots in opposite corners of the SOM. Reprocessing the genes from these spots under the same conditions under which the marker metagenes had been obtained left 1476 possible gene markers able to describe a general progression of CAP. Unfortunately, geneset analysis with Chaussabel's genesets showed that the disease progression is heavily blood related and that the expression behaviour of the genes appears to describe only immune processes.
Without a comparison of CAP transcriptomics with the transcriptomics of other diseases, no statement can be made as to whether there are marker genes specific to CAP. For upcoming research, these marker genes should at least be evaluated in clinical trials in order to link pseudotime with actual time. A verification of the actual applicability of these 1476 markers and of their qualification should be included as well. Only in this way can a proper improvement with reduced costs of laboratory tests be achieved. Also, the data used in this thesis and the obtained markers should be compared with other diseases: first, to exclude markers that are usable only for sickness in general; second, to identify those markers that are specifically applicable to CAP. An algorithm that enables the comparison of SOMs might be of help here. Since the position of the metagenes depends on the gene expression within the samples/data, the data used can influence the outcome of SOM training as well. The resulting SOM also depends on the chosen starting parameters, which at least can be made equal.
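The two selection conditions summarised above (sufficient correlation with pseudotime and an adequately large range of expression values) can be illustrated with a short sketch. It is not the code used in this thesis: the function name select_markers, the thresholds min_abs_rho and min_range and the toy expression matrix are assumptions made purely for illustration.

```r
# Hypothetical helper: keep features (rows) that correlate with pseudotime
# and vary over a sufficiently large expression range
select_markers <- function(expr, pseudotime, min_abs_rho = 0.6, min_range = 0.5) {
  rho <- apply(expr, 1, function(x) cor(x, pseudotime, method = "spearman"))
  rng <- apply(expr, 1, function(x) diff(range(x)))
  data.frame(feature = rownames(expr), rho = rho, range = rng,
             direction = ifelse(rho > 0, "UP", "DN"),
             keep = abs(rho) >= min_abs_rho & rng >= min_range,
             row.names = NULL)
}

# Toy data: one rising, one falling and one flat profile over 40 samples
pseudotime <- seq(0, 1, length.out = 40)
expr <- rbind(
  up   = 2 * pseudotime + 0.05 * sin(1:40),
  down = 1 - 2 * pseudotime + 0.05 * cos(1:40),
  flat = rep(c(0.05, -0.05), times = 20)
)

markers <- select_markers(expr, pseudotime)
print(markers)
```

The sign of the correlation corresponds to the UP/DN annotation used in the discussion; the actual thresholds have to be taken from the methods chapter.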

Appendix A: Supplementary data

Figure A.1: Starting parameters and sorting of preprocessed/normalized data

library(oposSOM)
load('progress_gx.RData')
samples = read.table('04_samples.txt', header=TRUE)

# preferences are passed as a named list
env <- opossom.new(list(dataset.name =
    'Aug17_CAPSys_all_samples_SOFAs_with_QN_with_Zentr01',
  error.model = 'all.samples',
  dim.1stLvlSom = 'auto',
  dim.2ndLvlSom = 20,
  training.extension = 1,
  rotate.SOM.portraits = 2,
  flip.SOM.portraits = F,
  database.dataset = 'hsapiens_gene_ensembl',
  database.id.type = 'illumina_humanht_12_v4',
  database.biomart = 'ENSEMBL_MART_ENSEMBL',
  database.host = 'www.ensembl.org',
  standard.spot.modules = 'dmap',
  spot.threshold.samples = 0.65,
  spot.coresize.modules = 3,
  spot.threshold.modules = 0.95,
  spot.coresize.groupmap = 5,
  spot.threshold.groupmap = 0.75,
  feature.centralization = T,
  sample.quantile.normalization = T,
  activated.modules = list(
    'reporting' = TRUE,
    'primary.analysis' = TRUE,
    'sample.similarity.analysis' = TRUE,
    'geneset.analysis' = TRUE,
    'geneset.analysis.exact' = FALSE,
    'group.analysis' = TRUE,
    'difference.analysis' = TRUE),
  pairwise.comparison.list = list()
  )
)
env$group.labels = 'auto'
env$indata = gx
opossom.run(env)

Figure A.2: Definition of function: fetch (sub-)SOFA scores from files

getSOFAs <- function(){
  # sub-SOFA scores and the sample table, both tab-separated
  sofas <- read.csv('./04_subsofas.txt', header=T, sep='\t')
  map <- read.csv('./04_samples.txt', header=T, sep='\t')
  map <- na.omit(map)[c(1,2,3,5)]
  full <- merge(map, sofas, by='PatID')
  indices <- as.matrix(full['EVENT'])

  table = NULL
  for(i in as.integer(rownames(full))){
    # select the columns whose names contain the event of sample i
    line <- which(grepl(indices[i], colnames(full)))
    line <- do.call(c, full[i, line])
    names(line) = NULL
    table <- rbind(table, line)
  }
  colnames(table) <- colnames(full)[5:10]
  colnames(table) <- substring(colnames(table), 9)
  rownames(table) <- full$sampleID
  # append the total SOFA score as seventh column
  table <- cbind(table,
                 map[match(rownames(table), map[,1]),][,4])
  colnames(table)[7] <- 'SOFA'
  return(table)
}

Figure A.3: Calculation and visualization of pseudotime

library(oposSOM)
library(dpt)
library(ggplot2)
load('./Aug17_CAPSys_all_samples_SOFAs_with_QN_with_Zentr01.RData')

ts <- Transitions(t(env$metadata))
pt <- dpt(ts, branching = T)
ev <- eigen(as.matrix(ts@transitions), T)$vectors
data <- as.data.frame(ev[, -1])
colnames(data) <- paste0('DC_', seq_len(ncol(data)))

qplot(DC_1, DC_2, data=data, colour=pt$Branch)
plot_dpt(ts, pt, 1:2)
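To make the eigen decomposition in Figure A.3 more tangible, the following sketch builds a simple diffusion map from scratch in base R. It is a simplified analogue and not the computation performed by the 'dpt' package: the Gaussian kernel, the median-distance kernel width and the simulated data are assumptions made for this illustration.

```r
set.seed(42)
n <- 60
truth <- sort(runif(n))                      # hidden ordering of the samples
x <- cbind(truth, truth^2) + matrix(rnorm(2 * n, sd = 0.01), n, 2)

d2 <- as.matrix(dist(x))^2                   # squared Euclidean distances
sigma <- median(sqrt(d2[upper.tri(d2)]))     # kernel width (heuristic, assumed)
k <- exp(-d2 / (2 * sigma^2))                # Gaussian kernel between samples
deg <- rowSums(k)

# Symmetric normalisation keeps eigen() real-valued; dividing row i of the
# eigenvectors by sqrt(deg[i]) yields the right eigenvectors of the
# row-normalised transition matrix
m <- k / sqrt(outer(deg, deg))
es <- eigen(m, symmetric = TRUE)
dc <- es$vectors / sqrt(deg)
dc <- dc[, -1, drop = FALSE]                 # first eigenvector is constant
colnames(dc) <- paste0("DC_", seq_len(ncol(dc)))

# The first diffusion component should follow the hidden ordering
rho <- cor(dc[, "DC_1"], truth, method = "spearman")
print(rho)
```

As in Figure A.3, the first eigenvector is dropped because it carries no information about the ordering of the samples.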

Figure A.4: Function definition for the generalized additive model describing relations between pseudotime and meta-/genes

require(VGAM)
sign_pt <- function(data, pseudotime){
  # data: meta-/gene from oposSOM environment to be tested
  # pseudotime: calculated DPT, same order as samples in data
  p.values <- apply(data, 1, function(x){
    fullModel <- vgam(x ~ sm.ns(pseudotime, df=3),
                      tobit(Lower = log10(0.1), lmu = 'identitylink'))
    redModel <- vgam(x ~ 1, tobit(Lower = log10(0.1),
                                  lmu = 'identitylink'))
    lrt <- lrtest(fullModel, redModel)
    return(lrt@Body['Pr(>Chisq)'][2,])
  })
  q.values <- p.adjust(p.values, method='fdr')
  return(q.values)
}
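The idea behind Figure A.4, a likelihood ratio test of a spline model against an intercept-only model followed by FDR correction, can be reproduced in simplified form with base R. The sketch below replaces the tobit model from VGAM with an ordinary linear model, so it ignores the censoring at the lower limit; the function name lrt_q_values and the toy data are assumptions for illustration.

```r
library(splines)

# Simplified analogue of sign_pt: Gaussian linear models instead of tobit
lrt_q_values <- function(data, pseudotime, df = 3) {
  p_values <- apply(data, 1, function(x) {
    full <- lm(x ~ ns(pseudotime, df = df))   # natural spline in pseudotime
    reduced <- lm(x ~ 1)                      # intercept only
    stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
    # the models differ by df spline coefficients
    pchisq(stat, df = df, lower.tail = FALSE)
  })
  p.adjust(p_values, method = "fdr")
}

set.seed(7)
pseudotime <- seq(0, 1, length.out = 50)
toy <- rbind(trend = 3 * pseudotime + rnorm(50, sd = 0.2),
             noise = rnorm(50, sd = 0.2))
q_values <- lrt_q_values(toy, pseudotime)
print(q_values)
```

A small q-value indicates that the expression profile changes along pseudotime; for expression values censored at a detection limit, the tobit formulation of Figure A.4 remains the appropriate choice.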

Figure A.5: Function for distributing metagenes into separate spots

distr_spots <- function(integers_list){
  spotlist <- list()
  compare <- integers_list
  growth <- T
  while(length(compare) != 0){
    # open a new spot, seeded with the first remaining metagene
    lc <- length(spotlist) + 1
    spotlist[[lc]] <- compare[1]
    compare <- compare[-1]
    while(growth == T){
      growth <- F
      for(p in compare){
        for(found in spotlist[[lc]]){
          # neighbours differ by 1 or by 50 (next row, assuming a 50 x 50 SOM grid)
          if((abs(p-found) == 1 || abs(p-found) == 50) && !(p %in% spotlist[[lc]])){
            spotlist[[lc]] <- append(spotlist[[lc]], p)
            growth <- T
          }
        }
      }
      spotlist[[lc]] <- as.numeric(names(table(spotlist[[lc]])))
      compare <- compare[!(compare %in% spotlist[[lc]])]
    }
    growth <- T
  }
  return(spotlist)
}
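The grouping of metagenes into spots can also be expressed as a flood fill over the SOM grid. The following variant is a sketch under stated assumptions and not the function used in this thesis: it assumes that metagenes are numbered row by row on a grid of 50 columns, the name group_spots is hypothetical, and, unlike Figure A.5, it does not treat the last metagene of one row and the first metagene of the next row as neighbours.

```r
group_spots <- function(indices, width = 50) {
  remaining <- sort(unique(indices))
  spots <- list()
  while (length(remaining) > 0) {
    spot <- remaining[1]           # seed of a new spot
    frontier <- spot
    remaining <- remaining[-1]
    while (length(frontier) > 0) {
      # a metagene is adjacent if it lies directly beside (same row)
      # or directly above/below any metagene of the current frontier
      is_adjacent <- vapply(remaining, function(p) {
        d <- abs(p - frontier)
        same_row <- (p - 1) %/% width == (frontier - 1) %/% width
        any((d == 1 & same_row) | d == width)
      }, logical(1))
      frontier <- remaining[is_adjacent]
      spot <- c(spot, frontier)
      remaining <- remaining[!is_adjacent]
    }
    spots[[length(spots) + 1]] <- sort(spot)
  }
  spots
}

spots <- group_spots(c(1, 2, 51, 52, 2450, 2499, 2500, 50))
print(spots)
```

In this example, metagene 50 forms a spot of its own, because it sits at the end of the first row and has no selected neighbour in that row or in the rows next to it.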

Table A.1: Transcriptional switches of meta-/genes

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1343291 298 ENSG00000116701 NCF2 0.63 0.19 B ILMN_1651228 148 ENSG00000198053 SIRPA 0.62 0.22 B ILMN_1651262 146 ENSG00000163932 PRKCD 0.59 0.22 B ILMN_1651405 147 ENSG00000119535 CSF3R 0.60 0.22 B ILMN_1651557 2101 ENSG00000124787 RPP40 0.61 0.25 A ILMN_1651628 47 ENSG00000163464 CXCR1 0.59 0.28 B ILMN_1651680 2204 ENSG00000047230 CTPS2 0.65 0.28 A ILMN_1651826 48 ENSG00000171051 FPR1 0.62 0.30 B ILMN_1651828 2101 ENSG00000106305 AIMP2 0.61 0.31 A ILMN_1651850 48 ENSG00000008516 MMP25 0.62 0.31 B 36 Appendix A: Supplementary data

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1651899 2213 ENSG00000263464 PPIAL4C 0.44 0.31 A ILMN_1652073 2203 ENSG00000105810 CDK6 0.66 0.31 A ILMN_1652085 2161 ENSG00000102054 RBBP7 0.55 0.32 A ILMN_1652185 147 ENSG00000077420 APBB1IP 0.60 0.32 B ILMN_1652333 48 ENSG00000244682 0.62 0.33 B ILMN_1652379 2212 ENSG00000166226 CCT2 0.50 0.34 A ILMN_1652394 2103 ENSG00000135446 CDK4 0.65 0.34 A ILMN_1652594 45 ENSG00000197405 C5AR1 0.56 0.34 B ILMN_1652753 197 ENSG00000254087 LYN 0.61 0.34 B ILMN_1652768 247 ENSG00000181274 FRAT2 0.61 0.34 B ILMN_1653292 47 ENSG00000101336 HCK 0.59 0.34 B ILMN_1653367 46 ENSG00000275990 NCF4 0.57 0.34 B ILMN_1653469 149 ENSG00000275565 ALOX5 0.63 0.35 B ILMN_1653529 46 ENSG00000142405 NLRP12 0.57 0.35 B ILMN_1653618 2103 ENSG00000133142 TCEAL4 0.65 0.35 A ILMN_1653709 46 ENSG00000197249 SERPINA1 0.57 0.35 B ILMN_1653871 50 ENSG00000162551 ALPL 0.65 0.36 B ILMN_1654445 2154 ENSG00000100814 CCNB1IP1 0.65 0.36 A ILMN_1654516 48 ENSG00000274587 LOC107987462 0.62 0.36 B ILMN_1654560 2054 ENSG00000159685 CHCHD6 0.63 0.36 A ILMN_1654630 2213 ENSG00000183527 PSMG1 0.44 0.37 A ILMN_1654685 140 ENSG00000135926 TMBIM1 0.59 0.37 B ILMN_1654690 2259 ENSG00000074657 ZNF532 0.62 0.37 A ILMN_1654737 48 ENSG00000198053 SIRPA 0.62 0.37 B ILMN_1655154 45 ENSG00000141480 ARRB2 0.56 0.37 B ILMN_1655307 2051 ENSG00000163468 CCT3 0.59 0.38 A ILMN_1655497 150 ENSG00000174125 TLR1 0.64 0.38 B ILMN_1655684 2164 ENSG00000163918 RFC4 0.31 0.38 A ILMN_1656129 2001 ENSG00000213619 NDUFS3 0.58 0.38 A ILMN_1656145 2213 ENSG00000213370 0.44 0.38 A ILMN_1656186 100 ENSG00000105835 NAMPT 0.65 0.39 B ILMN_1656287 2261 ENSG00000128059 PPAT 0.59 0.39 A ILMN_1656327 2202 ENSG00000113048 MRPS27 0.66 0.39 A ILMN_1656335 150 ENSG00000103569 AQP9 0.64 0.39 B ILMN_1656399 2309 ENSG00000006451 RALA 0.63 0.39 A ILMN_1656540 45 ENSG00000144711 IQSEC1 0.56 0.39 B ILMN_1656621 48 ENSG00000135842 FAM129A 0.62 0.39 B ILMN_1656662 2164 ENSG00000099901 RANBP1 0.31 0.39 A 
ILMN_1656818 2163 ENSG00000163170 BOLA3 0.36 0.39 A ILMN_1656934 47 ENSG00000197249 SERPINA1 0.59 0.39 B ILMN_1656962 2162 ENSG00000159063 ALG8 0.45 0.40 A ILMN_1657204 47 ENSG00000119535 CSF3R 0.59 0.40 B Appendix A: Supplementary data 37

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1657612 2101 ENSG00000179958 DCTPP1 0.61 0.40 A ILMN_1657754 2101 ENSG00000145912 NHP2 0.61 0.41 A ILMN_1657864 2163 ENSG00000249193 0.36 0.41 A ILMN_1657868 2304 ENSG00000136261 BZW2 0.66 0.42 A ILMN_1657873 2263 ENSG00000113643 RARS 0.51 0.42 A ILMN_1657892 47 ENSG00000243646 IL10RB 0.59 0.42 B ILMN_1658289 2455 ENSG00000214975 0.65 0.42 A ILMN_1658464 49 ENSG00000143226 FCGR2C 0.64 0.42 B ILMN_1658884 2207 ENSG00000070882 OSBPL3 0.63 0.42 A ILMN_1658928 2110 ENSG00000083123 BCKDHB 0.59 0.42 A ILMN_1659047 2201 ENSG00000128050 PAICS 0.66 0.42 A ILMN_1659058 194 ENSG00000178458 0.61 0.42 B ILMN_1659077 2163 ENSG00000257727 CNPY2 0.36 0.42 A ILMN_1659095 2212 ENSG00000124207 CSE1L 0.50 0.43 A ILMN_1659285 100 ENSG00000151726 ACSL1 0.65 0.43 B ILMN_1659343 2213 ENSG00000108384 RAD51C 0.44 0.43 A ILMN_1659544 46 ENSG00000066336 SPI1 0.57 0.43 B ILMN_1659550 2101 ENSG00000048162 NOP16 0.61 0.43 A ILMN_1659659 96 ENSG00000123405 NFE2 0.58 0.43 B ILMN_1659725 45 ENSG00000275118 MBOAT7 0.56 0.43 B ILMN_1659771 100 ENSG00000113916 BCL6 0.65 0.43 B ILMN_1659976 2211 ENSG00000250471 0.56 0.43 A ILMN_1660021 2305 ENSG00000140391 TSPAN3 0.65 0.43 A ILMN_1660193 2454 ENSG00000213862 0.66 0.43 A ILMN_1660376 2164 ENSG00000110660 SLC35F2 0.31 0.43 A ILMN_1660426 45 ENSG00000066336 SPI1 0.56 0.43 B ILMN_1660577 2262 ENSG00000136824 SMC2 0.56 0.43 A ILMN_1660624 42 ENSG00000186635 ARAP1 0.57 0.43 B ILMN_1660629 96 ENSG00000168610 STAT3 0.58 0.44 B ILMN_1660691 46 ENSG00000198736 MSRB1 0.57 0.44 B ILMN_1660847 2153 ENSG00000164818 DNAAF5 0.65 0.44 A ILMN_1661170 2201 ENSG00000128050 PAICS 0.66 0.44 A ILMN_1661196 2257 ENSG00000147853 AK3 0.63 0.44 A ILMN_1661264 2151 ENSG00000163170 BOLA3 0.64 0.44 A ILMN_1661306 2213 ENSG00000105185 PDCD5 0.44 0.44 A ILMN_1661346 95 ENSG00000130724 CHMP2A 0.57 0.45 B ILMN_1661432 96 ENSG00000160796 NBEAL2 0.58 0.45 B ILMN_1661439 2053 ENSG00000189046 ALKBH2 0.64 0.45 A ILMN_1661695 300 
ENSG00000133961 LOC101928143 0.63 0.45 B ILMN_1661886 89 ENSG00000117676 RPS6KA1 0.55 0.45 B ILMN_1661917 248 ENSG00000100330 MTMR3 0.62 0.45 B ILMN_1662049 192 ENSG00000177156 TALDO1 0.64 0.45 B 38 Appendix A: Supplementary data

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1662065 46 ENSG00000100365 NCF4 0.57 0.45 B ILMN_1662192 2112 ENSG00000112118 MCM3 0.42 0.45 A ILMN_1662334 2101 ENSG00000145912 NHP2 0.61 0.45 A ILMN_1662364 2355 ENSG00000108561 C1QBP 0.65 0.46 A ILMN_1662417 2212 ENSG00000198056 PRIM1 0.50 0.46 A ILMN_1662524 2310 ENSG00000124767 GLO1 0.62 0.46 A ILMN_1662658 2253 ENSG00000138796 HADH 0.66 0.46 A ILMN_1662839 196 ENSG00000067992 PDK3 0.60 0.46 B ILMN_1662905 2151 ENSG00000182307 C8orf33 0.64 0.46 A ILMN_1662973 2161 ENSG00000213585 VDAC1 0.55 0.46 A ILMN_1663132 98 ENSG00000000938 FGR 0.62 0.46 B ILMN_1663160 2002 ENSG00000163832 ELP6 0.61 0.46 A ILMN_1663422 2151 ENSG00000226950 DANCR 0.64 0.46 A ILMN_1663538 2109 ENSG00000237804 0.61 0.46 A ILMN_1663618 2103 ENSG00000235962 0.65 0.46 A ILMN_1663799 2101 ENSG00000243678 NME2 0.61 0.47 A ILMN_1663954 2356 ENSG00000197498 RPF2 0.63 0.47 A ILMN_1664012 2212 ENSG00000265354 TIMM23 0.50 0.47 A ILMN_1664167 2306 ENSG00000173726 TOMM20 0.64 0.47 A ILMN_1664369 2259 ENSG00000170364 SETMAR 0.62 0.47 A ILMN_1664706 2258 ENSG00000160131 VMA21 0.63 0.47 A ILMN_1665066 2255 ENSG00000218574 0.65 0.47 A ILMN_1665117 2254 ENSG00000124207 CSE1L 0.66 0.47 A ILMN_1665205 2213 ENSG00000154640 BTG3 0.44 0.47 A ILMN_1665217 2155 ENSG00000085415 SEH1L 0.64 0.47 A ILMN_1665483 2163 ENSG00000091483 FH 0.36 0.47 A ILMN_1665761 2304 ENSG00000140743 LOC101060399 0.66 0.47 A ILMN_1665797 2002 ENSG00000074582 BCS1L 0.61 0.47 A ILMN_1665887 2104 ENSG00000214113 LYRM4 0.64 0.47 A ILMN_1665943 2154 ENSG00000133706 LARS 0.65 0.47 A ILMN_1666178 2152 ENSG00000165724 ZMYND19 0.64 0.47 A ILMN_1666399 2059 ENSG00000158079 PTPDC1 0.60 0.47 A ILMN_1666444 2051 ENSG00000171307 ZDHHC16 0.59 0.47 A ILMN_1666635 44 ENSG00000124126 PREX1 0.57 0.47 B ILMN_1666932 45 ENSG00000198909 MAP3K3 0.56 0.47 B ILMN_1667050 2203 ENSG00000103319 LOC101930123 0.66 0.47 A ILMN_1667081 49 ENSG00000135636 DYSF 0.64 0.47 B ILMN_1667222 2051 ENSG00000124562 SNRPC 
0.59 0.47 A ILMN_1667418 2151 ENSG00000224156 TUBB 0.64 0.47 A ILMN_1667449 2059 ENSG00000106477 CEP41 0.60 0.47 A ILMN_1667510 2008 ENSG00000196284 SUPT3H 0.60 0.47 A ILMN_1667519 2101 ENSG00000174547 MRPL11 0.61 0.47 A Appendix A: Supplementary data 39

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1667577 2102 ENSG00000135624 CCT7 0.62 0.47 A ILMN_1667883 2105 ENSG00000214022 REPIN1 0.64 0.47 A ILMN_1667925 48 ENSG00000272556 0.62 0.47 B ILMN_1668090 2356 ENSG00000213619 NDUFS3 0.63 0.47 A ILMN_1668540 47 ENSG00000279072 0.59 0.47 B ILMN_1668865 47 ENSG00000237632 0.59 0.47 B ILMN_1668996 2207 ENSG00000058056 USP13 0.63 0.47 A ILMN_1669070 2305 ENSG00000229534 0.65 0.47 A ILMN_1669180 2260 ENSG00000155189 AGPAT5 0.61 0.48 A ILMN_1669317 2209 ENSG00000037474 NSUN2 0.62 0.48 A ILMN_1669394 2205 ENSG00000148187 MRRF 0.64 0.48 A ILMN_1669484 46 ENSG00000182511 FES 0.57 0.48 B ILMN_1669635 2112 ENSG00000167881 SRP68 0.42 0.48 A ILMN_1669696 100 ENSG00000101916 TLR8 0.65 0.48 B ILMN_1669927 2201 ENSG00000178035 IMPDH2 0.66 0.48 A ILMN_1670272 90 ENSG00000277111 PLEKHM1 0.57 0.48 B ILMN_1670302 45 ENSG00000101236 RNF24 0.56 0.48 B ILMN_1670420 2101 ENSG00000166557 TMED3 0.61 0.48 A ILMN_1670518 2101 ENSG00000100092 SH3BP1 0.61 0.48 A ILMN_1670723 44 ENSG00000177885 GRB2 0.57 0.48 B ILMN_1670807 96 ENSG00000167470 MIDN 0.58 0.48 B ILMN_1670901 2203 ENSG00000178921 PFAS 0.66 0.48 A ILMN_1670931 2251 ENSG00000179222 MAGED1 0.67 0.48 A ILMN_1671217 2205 ENSG00000198690 FAN1 0.64 0.48 A ILMN_1671257 2110 ENSG00000131778 CHD1L 0.59 0.48 A ILMN_1671314 2201 ENSG00000235036 0.66 0.48 A ILMN_1671554 2101 ENSG00000171960 PPIH 0.61 0.48 A ILMN_1671568 46 ENSG00000134686 PHC2 0.57 0.48 B ILMN_1671911 47 ENSG00000137642 SORL1 0.59 0.48 B ILMN_1671932 250 ENSG00000163162 RNF149 0.64 0.48 B ILMN_1672042 198 ENSG00000107738 VSIR 0.61 0.48 B ILMN_1672114 2212 ENSG00000117450 PRDX1 0.50 0.48 A ILMN_1672174 46 ENSG00000198223 CSF2RA 0.57 0.48 B ILMN_1672446 2054 ENSG00000217835 0.63 0.48 A ILMN_1672755 2306 ENSG00000173726 TOMM20 0.64 0.48 A ILMN_1672834 2356 ENSG00000227176 0.63 0.48 A ILMN_1673111 148 ENSG00000176788 BASP1 0.62 0.49 B ILMN_1673252 47 ENSG00000151948 GLT1D1 0.59 0.49 B ILMN_1673586 2254 ENSG00000130826 DKC1 
0.66 0.49 A ILMN_1673711 2454 ENSG00000215492 0.66 0.49 A ILMN_1673738 2306 ENSG00000175455 CCDC14 0.64 0.49 A ILMN_1673820 2101 ENSG00000148300 REXO4 0.61 0.49 A 40 Appendix A: Supplementary data

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1673941 2212 ENSG00000262473 GART 0.50 0.49 A ILMN_1673962 2411 ENSG00000276371 FAM60A 0.68 0.49 A ILMN_1674160 192 ENSG00000002919 SNX11 0.64 0.49 B ILMN_1674302 147 ENSG00000173757 STAT5B 0.60 0.49 B ILMN_1674698 2051 ENSG00000167085 PHB 0.59 0.49 A ILMN_1674703 198 ENSG00000172216 CEBPB 0.61 0.49 B ILMN_1674983 195 ENSG00000159840 ZYX 0.61 0.49 B ILMN_1675460 47 ENSG00000196189 SEMA4A 0.59 0.49 B ILMN_1675523 2106 ENSG00000196155 PLEKHG4 0.63 0.49 A ILMN_1675721 2152 ENSG00000168906 MAT2A 0.64 0.49 A ILMN_1675866 197 ENSG00000100731 PCNX1 0.61 0.49 B ILMN_1676026 2201 ENSG00000213730 0.66 0.49 A ILMN_1676091 2304 ENSG00000278845 MRPL45 0.66 0.49 A ILMN_1676254 249 ENSG00000161533 ACOX1 0.63 0.49 B ILMN_1676358 147 ENSG00000169180 XPO6 0.60 0.49 B ILMN_1676515 2151 ENSG00000167747 C19orf48 0.64 0.49 A ILMN_1676548 193 ENSG00000213246 SUPT4H1 0.62 0.49 B ILMN_1676846 2201 ENSG00000167325 RRM1 0.66 0.49 A ILMN_1676946 2010 ENSG00000132676 DAP3 0.59 0.49 A ILMN_1677239 48 ENSG00000000938 FGR 0.62 0.49 B ILMN_1677452 2001 ENSG00000143222 UFC1 0.58 0.49 A ILMN_1677768 2213 ENSG00000004779 NDUFAB1 0.44 0.49 A ILMN_1677919 2152 ENSG00000100814 CCNB1IP1 0.64 0.49 A ILMN_1677953 2009 ENSG00000273784 0.59 0.49 A ILMN_1677962 2101 ENSG00000145912 NHP2 0.61 0.49 A ILMN_1678087 2214 ENSG00000198856 OSTC 0.36 0.49 A ILMN_1678140 2404 ENSG00000229417 0.66 0.49 A ILMN_1678494 2151 ENSG00000037897 METTL1 0.64 0.49 A ILMN_1678775 248 ENSG00000169891 REPS2 0.62 0.50 B ILMN_1678934 200 ENSG00000166900 STX3 0.64 0.50 B ILMN_1679071 2052 ENSG00000077348 EXOSC5 0.61 0.50 A ILMN_1679133 2053 ENSG00000126756 UXT 0.64 0.50 A ILMN_1679134 147 ENSG00000139370 SLC15A4 0.60 0.50 B ILMN_1679177 2112 ENSG00000169857 AVEN 0.42 0.50 A ILMN_1679185 246 ENSG00000189067 LITAF 0.62 0.50 B ILMN_1679209 2001 ENSG00000128626 MRPS12 0.58 0.50 A ILMN_1679382 2051 ENSG00000100216 TOMM22 0.59 0.50 A ILMN_1679476 46 ENSG00000254470 AP5B1 0.57 0.50 B 
ILMN_1679555 47 ENSG00000275736 OSCAR 0.59 0.50 B ILMN_1679731 2203 ENSG00000136045 PWP1 0.66 0.50 A ILMN_1679796 2203 ENSG00000133142 TCEAL4 0.66 0.50 A ILMN_1679949 97 ENSG00000129355 CDKN2D 0.59 0.50 B Appendix A: Supplementary data 41

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1680129 45 ENSG00000035862 TIMP2 0.56 0.50 B ILMN_1680208 99 ENSG00000142166 IFNAR1 0.64 0.50 B ILMN_1680437 200 ENSG00000130066 SAT1 0.64 0.50 B ILMN_1680618 46 ENSG00000142405 NLRP12 0.57 0.50 B ILMN_1680675 2103 ENSG00000091732 ZC3HC1 0.65 0.50 A ILMN_1680996 2151 ENSG00000117395 EBNA1BP2 0.64 0.50 A ILMN_1681520 2051 ENSG00000169689 CENPX 0.59 0.50 A ILMN_1681846 2363 ENSG00000156162 DPY19L4 0.64 0.50 A ILMN_1682404 2156 ENSG00000112394 SLC16A10 0.63 0.50 A ILMN_1682694 2260 ENSG00000137513 NARS2 0.61 0.50 A ILMN_1682792 195 ENSG00000148841 ITPRIP 0.61 0.50 B ILMN_1682873 150 ENSG00000197208 SLC22A4 0.64 0.50 B ILMN_1683120 2210 ENSG00000049883 PTCD2 0.60 0.50 A ILMN_1683250 2257 ENSG00000133818 RRAS2 0.63 0.50 A ILMN_1683475 99 ENSG00000073331 ALPK1 0.64 0.50 B ILMN_1683498 2354 ENSG00000213412 0.66 0.50 A ILMN_1683597 2059 ENSG00000262333 0.60 0.50 A ILMN_1683664 47 ENSG00000279072 0.59 0.50 B ILMN_1683817 2405 ENSG00000263353 PPIAL4A 0.65 0.50 A ILMN_1683950 50 ENSG00000274148 LILRA6 0.65 0.50 B ILMN_1684034 197 ENSG00000114268 PFKFB4 0.61 0.51 B ILMN_1684054 2301 ENSG00000005448 WDR54 0.68 0.51 A ILMN_1684210 2101 ENSG00000169627 LOC107984053 0.61 0.51 A ILMN_1684258 46 ENSG00000198223 CSF2RA 0.57 0.51 B ILMN_1684289 248 ENSG00000174007 CEP19 0.62 0.51 B ILMN_1684321 2359 ENSG00000140395 WDR61 0.64 0.51 A ILMN_1684585 2201 ENSG00000167543 TP53I13 0.66 0.51 A ILMN_1684943 2060 ENSG00000027001 MIPEP 0.60 0.51 A ILMN_1685005 2161 ENSG00000149547 EI24 0.55 0.51 A ILMN_1685009 2054 ENSG00000125450 NUP85 0.63 0.51 A ILMN_1685088 294 ENSG00000100938 GMPR2 0.65 0.51 B ILMN_1685289 2303 ENSG00000137054 POLR1E 0.67 0.51 A ILMN_1685413 2101 ENSG00000112578 BYSL 0.61 0.51 A ILMN_1685661 2405 ENSG00000225185 0.65 0.51 A ILMN_1685722 2154 ENSG00000156471 PTDSS1 0.65 0.51 A ILMN_1685824 46 ENSG00000196923 PDLIM7 0.57 0.51 B ILMN_1685854 43 ENSG00000173638 SLC19A1 0.57 0.51 B ILMN_1686135 2156 ENSG00000110944 
IL23A 0.63 0.51 A ILMN_1686367 92 ENSG00000180901 KCTD2 0.59 0.51 B ILMN_1686555 2161 ENSG00000078668 VDAC3 0.55 0.51 A ILMN_1686954 43 ENSG00000100599 RIN3 0.57 0.51 B ILMN_1687080 46 ENSG00000172936 MYD88 0.57 0.51 B

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1687538 45 ENSG00000177105 RHOG 0.56 0.51 B ILMN_1687609 2152 ENSG00000143621 ILF2 0.64 0.51 A ILMN_1687921 300 ENSG00000174130 TLR6 0.63 0.51 B ILMN_1687922 96 ENSG00000163191 S100A11 0.58 0.51 B ILMN_1688089 200 ENSG00000137393 RNF144B 0.64 0.51 B ILMN_1688098 2308 ENSG00000169964 TMEM42 0.63 0.51 A ILMN_1688127 2156 ENSG00000168792 ABHD15 0.63 0.51 A ILMN_1688299 2408 ENSG00000129197 RPAIN 0.61 0.51 A ILMN_1688526 47 ENSG00000272645 0.59 0.51 B ILMN_1688639 47 ENSG00000181444 ZNF467 0.59 0.51 B ILMN_1688749 2204 ENSG00000198131 ZNF544 0.65 0.51 A ILMN_1688753 2101 ENSG00000270617 URGCP-MRPS24 0.61 0.51 A ILMN_1688853 2206 ENSG00000109920 FNBP4 0.64 0.51 A ILMN_1688959 2254 ENSG00000113048 MRPS27 0.66 0.51 A ILMN_1688971 2062 ENSG00000167862 MRPL58 0.42 0.51 A ILMN_1689001 1851 ENSG00000184047 DIABLO 0.58 0.51 A ILMN_1689097 92 ENSG00000101150 TPD52L2 0.59 0.51 B ILMN_1689110 46 ENSG00000100330 MTMR3 0.57 0.51 B ILMN_1689445 2155 ENSG00000037757 MRI1 0.64 0.51 A ILMN_1689652 295 ENSG00000240809 0.64 0.51 B ILMN_1689836 2405 ENSG00000217094 0.65 0.51 A ILMN_1689953 2404 ENSG00000229887 0.66 0.51 A ILMN_1690063 196 ENSG00000148841 ITPRIP 0.60 0.51 B ILMN_1690125 2213 ENSG00000213370 0.44 0.51 A ILMN_1690252 1952 ENSG00000028310 BRD9 0.61 0.52 A ILMN_1690386 2104 ENSG00000136891 TEX10 0.64 0.52 A ILMN_1690494 150 ENSG00000101916 TLR8 0.64 0.52 B ILMN_1690546 2357 ENSG00000122034 GTF3A 0.63 0.52 A ILMN_1690586 2159 ENSG00000084092 NOA1 0.62 0.52 A ILMN_1690625 2051 ENSG00000100347 SAMM50 0.59 0.52 A ILMN_1691117 2311 ENSG00000071794 HLTF 0.61 0.52 A ILMN_1691341 100 ENSG00000169896 ITGAM 0.65 0.52 B ILMN_1691567 2453 ENSG00000136111 TBC1D4 0.68 0.52 A ILMN_1691843 2355 ENSG00000163882 POLR2H 0.65 0.52 A ILMN_1692225 2052 ENSG00000205937 RNPS1 0.61 0.52 A ILMN_1692620 2153 ENSG00000107815 TWNK 0.65 0.52 A ILMN_1692651 2207 ENSG00000050405 LIMA1 0.63 0.52 A ILMN_1693014 147 ENSG00000166188 ZNF319 0.60 0.52 B 
ILMN_1693108 50 ENSG00000282607 MGAM 0.65 0.52 B ILMN_1693145 2262 ENSG00000138709 LARP1B 0.56 0.52 A ILMN_1693650 197 ENSG00000120709 FAM53C 0.61 0.52 B ILMN_1693664 2463 ENSG00000176055 MBLAC2 0.65 0.52 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1694106 2257 ENSG00000198522 GPN1 0.63 0.52 A ILMN_1694233 2158 ENSG00000185875 THNSL1 0.62 0.52 A ILMN_1694398 2202 ENSG00000198242 RPL23A 0.66 0.52 A ILMN_1694479 2366 ENSG00000151135 TMEM263 0.57 0.52 A ILMN_1694502 1851 ENSG00000149600 COMMD7 0.58 0.52 A ILMN_1694548 196 ENSG00000160785 SLC25A44 0.60 0.52 B ILMN_1694587 2058 ENSG00000116984 MTR 0.61 0.52 A ILMN_1694589 2101 ENSG00000180992 MRPL14 0.61 0.52 A ILMN_1694686 2101 ENSG00000196419 XRCC6 0.61 0.52 A ILMN_1695025 46 ENSG00000102921 N4BP1 0.57 0.52 B ILMN_1695034 2454 ENSG00000214223 HNRNPA1P10 0.66 0.52 A ILMN_1695157 2161 ENSG00000152234 ATP5A1 0.55 0.52 A ILMN_1695261 250 ENSG00000215114 UBXN2B 0.64 0.52 B ILMN_1695386 2255 ENSG00000188242 PP7080 0.65 0.52 A ILMN_1695598 2455 ENSG00000245910 0.65 0.52 A ILMN_1695821 2162 ENSG00000197451 HNRNPAB 0.45 0.53 A ILMN_1695962 49 ENSG00000077238 IL4R 0.64 0.53 B ILMN_1696046 2101 ENSG00000169253 0.61 0.53 A ILMN_1696187 2213 ENSG00000035141 FAM136A 0.44 0.53 A ILMN_1696407 2161 ENSG00000243667 WDR92 0.55 0.53 A ILMN_1696463 248 ENSG00000141298 SSH2 0.62 0.53 B ILMN_1696466 196 ENSG00000274129 TSEN34 0.60 0.53 B ILMN_1696828 49 ENSG00000141934 PLPP2 0.64 0.53 B ILMN_1697348 200 ENSG00000144118 RALB 0.64 0.53 B ILMN_1697469 2206 ENSG00000171817 ZNF540 0.64 0.53 A ILMN_1697493 146 ENSG00000067182 TNFRSF1A 0.59 0.53 B ILMN_1697736 45 ENSG00000187534 0.56 0.53 B ILMN_1698258 200 ENSG00000136869 TLR4 0.64 0.53 B ILMN_1698463 200 ENSG00000165030 NFIL3 0.64 0.53 B ILMN_1698728 50 ENSG00000187554 TLR5 0.65 0.53 B ILMN_1698940 2001 ENSG00000089053 ANAPC5 0.58 0.53 A ILMN_1698996 145 ENSG00000100503 NIN 0.58 0.53 B ILMN_1699160 2261 ENSG00000112996 MRPS30 0.59 0.53 A ILMN_1699496 1951 ENSG00000139546 TARBP2 0.58 0.53 A ILMN_1699521 2355 ENSG00000108559 NUP88 0.65 0.53 A ILMN_1700306 2103 ENSG00000280071 LOC102724023 0.65 0.53 A ILMN_1700584 147 ENSG00000135766 EGLN1 0.60 0.53 B ILMN_1700628 2111 
ENSG00000170779 CDCA4 0.54 0.53 A ILMN_1700660 2155 ENSG00000050426 LETMD1 0.64 0.53 A ILMN_1700674 2208 ENSG00000164024 METAP1 0.63 0.53 A ILMN_1700855 2152 ENSG00000110104 CCDC86 0.64 0.53 A ILMN_1700915 2101 ENSG00000110717 NDUFS8 0.61 0.53 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1701243 2206 ENSG00000059588 TARBP1 0.64 0.53 A ILMN_1701374 2209 ENSG00000115504 EHBP1 0.62 0.53 A ILMN_1701482 50 ENSG00000171236 LRG1 0.65 0.53 B ILMN_1701603 2210 ENSG00000083720 OXCT1 0.60 0.53 A ILMN_1701731 2212 ENSG00000137168 PPIL1 0.50 0.53 A ILMN_1701837 2106 ENSG00000152465 NMT2 0.63 0.53 A ILMN_1701854 46 ENSG00000171223 JUNB 0.57 0.53 B ILMN_1701875 2454 ENSG00000240370 0.66 0.53 A ILMN_1702009 2311 ENSG00000086475 SEPHS1 0.61 0.53 A ILMN_1702177 2162 ENSG00000101868 POLA1 0.45 0.53 A ILMN_1702301 2262 ENSG00000112110 MRPL18 0.56 0.53 A ILMN_1702396 2108 ENSG00000067113 PLPP1 0.61 0.53 A ILMN_1702787 2151 ENSG00000249590 0.64 0.53 A ILMN_1702806 2304 ENSG00000144381 HSPD1 0.66 0.53 A ILMN_1703279 44 ENSG00000132510 KDM6B 0.57 0.53 B ILMN_1703433 147 ENSG00000169891 REPS2 0.60 0.53 B ILMN_1703471 2254 ENSG00000051596 THOC3 0.66 0.53 A ILMN_1703524 299 ENSG00000188895 MSL1 0.63 0.53 B ILMN_1703565 2156 ENSG00000162521 RBBP4 0.63 0.53 A ILMN_1703697 2262 ENSG00000081307 UBA5 0.56 0.53 A ILMN_1704055 2411 ENSG00000196950 SLC39A10 0.68 0.54 A ILMN_1704315 2061 ENSG00000175792 RUVBL1 0.54 0.54 A ILMN_1704369 44 ENSG00000172354 GNB2 0.57 0.54 B ILMN_1705047 2256 ENSG00000133818 RRAS2 0.64 0.54 A ILMN_1705093 1801 ENSG00000115694 STK25 0.60 0.54 A ILMN_1705114 2053 ENSG00000178252 WDR6 0.64 0.54 A ILMN_1705151 2110 ENSG00000171723 GPHN 0.59 0.54 A ILMN_1705213 2251 ENSG00000125648 SLC25A23 0.67 0.54 A ILMN_1705515 2151 ENSG00000130204 TOMM40 0.64 0.54 A ILMN_1705594 2208 ENSG00000103018 CYB5B 0.63 0.54 A ILMN_1705737 100 ENSG00000158470 B4GALT5 0.65 0.54 B ILMN_1705743 2453 ENSG00000206228 0.68 0.54 A ILMN_1705871 2151 ENSG00000085998 POMGNT1 0.64 0.54 A ILMN_1705892 2456 ENSG00000114942 EEF1B2 0.63 0.54 A ILMN_1706149 2001 ENSG00000184990 SIVA1 0.58 0.54 A ILMN_1706217 49 ENSG00000145491 ROPN1L 0.64 0.54 B ILMN_1706238 2251 ENSG00000135372 NAT10 0.67 0.54 A ILMN_1706304 97 ENSG00000132589 FLOT2 0.59 
0.54 B ILMN_1706357 2012 ENSG00000150456 EEF1AKMT1 0.43 0.54 A ILMN_1706583 348 ENSG00000235082 0.64 0.54 B ILMN_1706859 149 ENSG00000139178 C1RL 0.63 0.54 B ILMN_1707077 147 ENSG00000101307 SIRPB1 0.60 0.54 B

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1707156 2156 ENSG00000176225 RTTN 0.63 0.54 A ILMN_1707312 48 ENSG00000268500 0.62 0.54 B ILMN_1707326 2257 ENSG00000154174 TOMM70 0.63 0.54 A ILMN_1707339 296 ENSG00000108306 FBXL20 0.63 0.54 B ILMN_1707534 50 ENSG00000112053 SLC26A8 0.65 0.54 B ILMN_1708537 2101 ENSG00000172053 QARS 0.61 0.54 A ILMN_1708619 199 ENSG00000100485 SOS2 0.63 0.54 B ILMN_1708906 46 ENSG00000196663 TECPR2 0.57 0.54 B ILMN_1709549 144 ENSG00000068323 TFE3 0.59 0.54 B ILMN_1710216 2455 ENSG00000213228 0.65 0.54 A ILMN_1710514 2103 ENSG00000143436 MRPL9 0.65 0.54 A ILMN_1710885 147 ENSG00000162065 TBC1D24 0.60 0.54 B ILMN_1711023 2257 ENSG00000105849 TWISTNB 0.63 0.54 A ILMN_1711030 2202 ENSG00000136628 EPRS 0.66 0.54 A ILMN_1711361 2011 ENSG00000104980 TIMM44 0.54 0.54 A ILMN_1711414 2305 ENSG00000096092 TMEM14A 0.65 0.54 A ILMN_1711543 2310 ENSG00000138050 THUMPD2 0.62 0.54 A ILMN_1711573 2157 ENSG00000126870 WDR60 0.62 0.54 A ILMN_1711786 46 ENSG00000178719 GRINA 0.57 0.54 B ILMN_1711878 2408 ENSG00000165512 ZNF22 0.61 0.54 A ILMN_1712088 144 ENSG00000123810 B9D2 0.59 0.54 B ILMN_1712155 2259 ENSG00000165209 STRBP 0.62 0.54 A ILMN_1712236 46 ENSG00000064932 SBNO2 0.57 0.54 B ILMN_1712390 2154 ENSG00000160299 PCNT 0.65 0.54 A ILMN_1712431 2153 ENSG00000115866 DARS 0.65 0.54 A ILMN_1712452 2253 ENSG00000234012 MDC1 0.66 0.54 A ILMN_1712636 2102 ENSG00000232956 0.62 0.54 A ILMN_1712944 2051 ENSG00000183978 COA3 0.59 0.54 A ILMN_1712999 2360 ENSG00000162851 TFB2M 0.65 0.54 A ILMN_1713086 144 ENSG00000100284 TOM1 0.59 0.54 B ILMN_1713143 2405 ENSG00000164346 NSA2 0.65 0.54 A ILMN_1713189 2408 ENSG00000165512 ZNF22 0.61 0.54 A ILMN_1713369 300 ENSG00000133812 SBF2 0.63 0.54 B ILMN_1713934 2307 ENSG00000113460 BRIX1 0.64 0.54 A ILMN_1714444 2361 ENSG00000137601 NEK1 0.66 0.54 A ILMN_1714515 2260 ENSG00000149100 EIF3M 0.61 0.54 A ILMN_1714623 2306 ENSG00000197933 ZNF823 0.64 0.54 A ILMN_1714700 2202 ENSG00000059573 ALDH18A1 0.66 0.54 A 
ILMN_1714737 195 ENSG00000013563 DNASE1L1 0.61 0.54 B ILMN_1715068 45 ENSG00000166579 NDEL1 0.56 0.54 B ILMN_1715173 2104 ENSG00000138674 SEC31A 0.64 0.54 A ILMN_1715179 2211 ENSG00000068438 FTSJ1 0.56 0.54 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1715181 297 ENSG00000156675 RAB11FIP1 0.63 0.54 B ILMN_1715384 2454 ENSG00000213755 0.66 0.54 A ILMN_1715603 2307 ENSG00000225357 0.64 0.54 A ILMN_1715926 2151 ENSG00000145912 NHP2 0.64 0.54 A ILMN_1715947 2354 ENSG00000227638 0.66 0.54 A ILMN_1716105 47 ENSG00000135083 CCNJL 0.59 0.54 B ILMN_1716564 47 ENSG00000227080 0.59 0.54 B ILMN_1716730 47 ENSG00000279072 0.59 0.54 B ILMN_1717207 2302 ENSG00000148019 CEP78 0.68 0.54 A ILMN_1717594 2454 ENSG00000232054 0.66 0.54 A ILMN_1717636 2207 ENSG00000198040 ZNF84 0.63 0.54 A ILMN_1717809 2455 ENSG00000186468 RPS23 0.65 0.55 A ILMN_1717855 149 ENSG00000175471 MCTP1 0.63 0.55 B ILMN_1717868 2055 ENSG00000187601 MAGEH1 0.63 0.55 A ILMN_1718069 2151 ENSG00000090861 AARS 0.64 0.55 A ILMN_1718070 40 ENSG00000171302 CANT1 0.55 0.55 B ILMN_1718132 95 ENSG00000197324 LRP10 0.57 0.55 B ILMN_1718672 2061 ENSG00000167130 DOLPP1 0.54 0.55 A ILMN_1718900 2201 ENSG00000096384 HSP90AB1 0.66 0.55 A ILMN_1719316 2456 ENSG00000235552 0.63 0.55 A ILMN_1719392 2313 ENSG00000081307 UBA5 0.58 0.55 A ILMN_1719403 2207 ENSG00000258890 CEP95 0.63 0.55 A ILMN_1719656 1956 ENSG00000006194 ZNF263 0.59 0.55 A ILMN_1719661 149 ENSG00000100504 PYGL 0.63 0.55 B ILMN_1719906 2259 ENSG00000126698 DNAJC8 0.62 0.55 A ILMN_1719986 2305 ENSG00000115365 LANCL1 0.65 0.55 A ILMN_1720053 2110 ENSG00000113068 PFDN1 0.59 0.55 A ILMN_1720124 2051 ENSG00000127884 ECHS1 0.59 0.55 A ILMN_1720158 2260 ENSG00000083093 PALB2 0.61 0.55 A ILMN_1720270 2451 ENSG00000184613 NELL2 0.72 0.55 A ILMN_1720442 2260 ENSG00000184900 SUMO3 0.61 0.55 A ILMN_1720531 2157 ENSG00000168661 ZNF30 0.62 0.55 A ILMN_1720708 2204 ENSG00000037757 MRI1 0.65 0.55 A ILMN_1720745 2357 ENSG00000144895 EIF2A 0.63 0.55 A ILMN_1721138 2404 ENSG00000148303 RPL7A 0.66 0.55 A ILMN_1721563 2255 ENSG00000107937 GTPBP4 0.65 0.55 A ILMN_1721978 50 ENSG00000007237 GAS7 0.65 0.55 B ILMN_1722066 2109 ENSG00000103356 EARS2 0.61 0.55 A ILMN_1722218 50 
ENSG00000183019 MCEMP1 0.65 0.55 B ILMN_1722292 143 ENSG00000078902 TOLLIP 0.60 0.55 B ILMN_1722491 2051 ENSG00000134684 YARS 0.59 0.55 A ILMN_1722838 2101 ENSG00000105135 ILVBL 0.61 0.55 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1722981 248 ENSG00000182831 C16orf72 0.62 0.55 B ILMN_1723158 2006 ENSG00000125485 DDX31 0.61 0.55 A ILMN_1723177 2251 ENSG00000184787 UBE2G2 0.67 0.55 A ILMN_1723436 2255 ENSG00000148229 POLE3 0.65 0.55 A ILMN_1723662 50 ENSG00000140563 MCTP2 0.65 0.55 B ILMN_1723729 2201 ENSG00000262788 UTP4 0.66 0.55 A ILMN_1723793 2357 ENSG00000234608 MAPKAPK5-AS1 0.63 0.55 A ILMN_1723874 2154 ENSG00000165689 SDCCAG3 0.65 0.55 A ILMN_1724009 246 ENSG00000165006 UBAP1 0.62 0.55 B ILMN_1724236 2209 ENSG00000132313 MRPL35 0.62 0.55 A ILMN_1724489 2157 ENSG00000089775 ZBTB25 0.62 0.55 A ILMN_1724668 2153 ENSG00000198276 UCKL1 0.65 0.55 A ILMN_1724822 2303 ENSG00000117868 ESYT2 0.67 0.55 A ILMN_1725175 2161 ENSG00000020426 MNAT1 0.55 0.55 A ILMN_1725417 100 ENSG00000163235 TGFA 0.65 0.55 B ILMN_1725471 2405 ENSG00000137720 C11orf1 0.65 0.55 A ILMN_1725642 2158 ENSG00000167840 ZNF232 0.62 0.55 A ILMN_1726169 2055 ENSG00000130177 CDC16 0.63 0.55 A ILMN_1726222 2056 ENSG00000156239 N6AMT1 0.62 0.55 A ILMN_1726392 2011 ENSG00000113734 BNIP1 0.54 0.55 A ILMN_1726520 2362 ENSG00000089022 MAPKAPK5 0.65 0.55 A ILMN_1726547 2409 ENSG00000092208 GEMIN2 0.62 0.55 A ILMN_1726743 2203 ENSG00000114544 SLC41A3 0.66 0.55 A ILMN_1726981 2003 ENSG00000087087 SRRT 0.63 0.55 A ILMN_1727134 347 ENSG00000278943 0.64 0.55 B ILMN_1727184 195 ENSG00000141526 SLC16A3 0.61 0.55 B ILMN_1728132 247 ENSG00000274129 TSEN34 0.61 0.55 B ILMN_1728228 2267 ENSG00000171311 EXOSC1 0.40 0.55 A ILMN_1728230 42 ENSG00000102882 MAPK3 0.57 0.55 B ILMN_1728605 2201 ENSG00000109534 GAR1 0.66 0.55 A ILMN_1728830 2056 ENSG00000114982 KANSL3 0.62 0.55 A ILMN_1729533 2453 ENSG00000233380 0.68 0.55 A ILMN_1729767 2354 ENSG00000237804 0.66 0.55 A ILMN_1729816 2456 ENSG00000241352 0.63 0.55 A ILMN_1730077 2158 ENSG00000198040 ZNF84 0.62 0.55 A ILMN_1730082 2354 ENSG00000255717 0.66 0.55 A ILMN_1730101 47 ENSG00000279072 0.59 0.55 B ILMN_1730260 95 ENSG00000271412 0.57 0.55 
B ILMN_1730622 2456 ENSG00000230897 0.63 0.55 A ILMN_1730631 2455 ENSG00000232341 0.65 0.55 A ILMN_1731014 2454 ENSG00000213621 0.66 0.55 A ILMN_1731358 2455 ENSG00000116251 RPL22 0.65 0.55 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1731568 46 ENSG00000100266 PACSIN2 0.57 0.55 B ILMN_1731589 295 ENSG00000120656 TAF12 0.64 0.56 B ILMN_1731644 2256 ENSG00000100575 TIMM9 0.64 0.56 A ILMN_1731736 2163 ENSG00000120053 GOT1 0.36 0.56 A ILMN_1732080 198 ENSG00000143622 RIT1 0.61 0.56 B ILMN_1732089 93 ENSG00000168591 TMUB2 0.59 0.56 B ILMN_1732176 297 ENSG00000168461 RAB31 0.63 0.56 B ILMN_1732216 2151 ENSG00000182199 SHMT2 0.64 0.56 A ILMN_1732328 2355 ENSG00000225695 0.65 0.56 A ILMN_1732537 2205 ENSG00000138095 LRPPRC 0.64 0.56 A ILMN_1732923 2356 ENSG00000111142 METAP2 0.63 0.56 A ILMN_1733110 2004 ENSG00000182979 MTA1 0.62 0.56 A ILMN_1733288 2053 ENSG00000164610 RP9 0.64 0.56 A ILMN_1733305 46 ENSG00000197818 SLC9A8 0.57 0.56 B ILMN_1733390 2360 ENSG00000163281 GNPDA2 0.65 0.56 A ILMN_1733421 50 ENSG00000138772 ANXA3 0.65 0.56 B ILMN_1733443 2312 ENSG00000161547 SRSF2 0.60 0.56 A ILMN_1733616 2151 ENSG00000071462 BUD23 0.64 0.56 A ILMN_1733667 247 ENSG00000100266 PACSIN2 0.61 0.56 B ILMN_1733696 2255 ENSG00000164815 ORC5 0.65 0.56 A ILMN_1733956 46 ENSG00000069399 BCL3 0.57 0.56 B ILMN_1733997 2154 ENSG00000125246 CLYBL 0.65 0.56 A ILMN_1734312 2104 ENSG00000143748 NVL 0.64 0.56 A ILMN_1734486 2203 ENSG00000131876 SNRPA1 0.66 0.56 A ILMN_1734810 2157 ENSG00000174684 B4GAT1 0.62 0.56 A ILMN_1734826 2404 ENSG00000220157 0.66 0.56 A ILMN_1734867 2354 ENSG00000171490 RSL1D1 0.66 0.56 A ILMN_1735143 2155 ENSG00000103037 SETD6 0.64 0.56 A ILMN_1735360 47 ENSG00000276805 0.59 0.56 B ILMN_1735495 50 ENSG00000198814 GK 0.65 0.56 B ILMN_1735552 197 ENSG00000197442 MAP3K5 0.61 0.56 B ILMN_1735658 2101 ENSG00000163382 NAXE 0.61 0.56 A ILMN_1736180 143 ENSG00000169692 AGPAT2 0.60 0.56 B ILMN_1736242 45 ENSG00000165879 FRAT1 0.56 0.56 B ILMN_1736441 2310 ENSG00000163104 SMARCAD1 0.62 0.56 A ILMN_1736500 2109 ENSG00000116455 WDR77 0.61 0.56 A ILMN_1736814 2206 ENSG00000135108 FBXO21 0.64 0.56 A ILMN_1736929 46 ENSG00000213903 LTB4R 0.57 0.56 B 
ILMN_1737074 2105 ENSG00000148468 FAM171A1 0.64 0.56 A ILMN_1737084 2160 ENSG00000111361 EIF2B1 0.59 0.56 A ILMN_1737157 2207 ENSG00000112941 PAPD7 0.63 0.56 A ILMN_1737298 48 ENSG00000188786 MTF1 0.62 0.56 B

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1737314 2258 ENSG00000181038 METTL23 0.63 0.56 A ILMN_1737398 2203 ENSG00000171453 POLR1C 0.66 0.56 A ILMN_1737574 2356 ENSG00000137038 TMEM261 0.63 0.56 A ILMN_1737588 2456 ENSG00000220583 0.63 0.56 A ILMN_1737627 2251 ENSG00000007255 TRAPPC6A 0.67 0.56 A ILMN_1737705 2308 ENSG00000096872 IFT74 0.63 0.56 A ILMN_1737738 195 ENSG00000082701 GSK3B 0.61 0.56 B ILMN_1738220 45 ENSG00000165879 FRAT1 0.56 0.56 B ILMN_1738356 2105 ENSG00000119723 COQ6 0.64 0.56 A ILMN_1738383 2252 ENSG00000282171 LINC02210 0.67 0.56 A ILMN_1738523 2251 ENSG00000120896 SORBS3 0.67 0.56 A ILMN_1738529 2358 ENSG00000133641 C12orf29 0.62 0.56 A ILMN_1738656 1952 ENSG00000100335 MIEF1 0.61 0.56 A ILMN_1739257 49 ENSG00000129450 SIGLEC9 0.64 0.56 B ILMN_1739274 50 ENSG00000182885 ADGRG3 0.65 0.56 B ILMN_1739641 2201 ENSG00000125166 GOT2 0.66 0.56 A ILMN_1739792 2306 ENSG00000066651 TRMT11 0.64 0.56 A ILMN_1739847 2455 ENSG00000137154 RPS6 0.65 0.56 A ILMN_1740010 2453 ENSG00000111678 C12orf57 0.68 0.56 A ILMN_1740160 49 ENSG00000121060 TRIM25 0.64 0.56 B ILMN_1740170 2403 ENSG00000260615 0.68 0.56 A ILMN_1740298 2260 ENSG00000111581 NUP107 0.61 0.56 A ILMN_1740490 2353 ENSG00000069275 NUCKS1 0.67 0.56 A ILMN_1740493 2309 ENSG00000167635 ZNF146 0.63 0.56 A ILMN_1740749 2311 ENSG00000162851 TFB2M 0.61 0.56 A ILMN_1740819 2151 ENSG00000149823 VPS51 0.64 0.56 A ILMN_1740864 2209 ENSG00000166881 NEMP1 0.62 0.56 A ILMN_1740927 150 ENSG00000111261 MANSC1 0.64 0.56 B ILMN_1741200 198 ENSG00000117151 CTBS 0.61 0.56 B ILMN_1741917 2151 ENSG00000226492 CUTA 0.64 0.56 A ILMN_1741976 2354 ENSG00000100823 APEX1 0.66 0.56 A ILMN_1742166 2206 ENSG00000082512 TRAF5 0.64 0.56 A ILMN_1742238 200 ENSG00000155926 SLA 0.64 0.56 B ILMN_1742521 2053 ENSG00000160818 GPATCH4 0.64 0.56 A ILMN_1742577 2311 ENSG00000114062 UBE3A 0.61 0.56 A ILMN_1742798 2455 ENSG00000234728 C6orf48 0.65 0.56 A ILMN_1742889 47 ENSG00000085117 CD82 0.59 0.56 B ILMN_1743049 50 
ENSG00000198814 GK 0.65 0.56 B ILMN_1743065 2454 ENSG00000167526 RPL13 0.66 0.56 A ILMN_1743933 2454 ENSG00000224261 0.66 0.56 A ILMN_1744068 2455 ENSG00000213216 0.65 0.56 A ILMN_1744347 2455 ENSG00000146677 0.65 0.56 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1744508 2451 ENSG00000197457 STMN3 0.72 0.56 A ILMN_1745110 2455 ENSG00000224032 0.65 0.56 A ILMN_1745112 2307 ENSG00000181827 RFX7 0.64 0.56 A ILMN_1745152 2103 ENSG00000109534 GAR1 0.65 0.56 A ILMN_1745172 2455 ENSG00000233565 0.65 0.56 A ILMN_1745887 2403 ENSG00000219470 0.68 0.56 A ILMN_1745994 2455 ENSG00000240376 0.65 0.56 A ILMN_1746252 2256 ENSG00000178202 KDELC2 0.64 0.57 A ILMN_1746393 2456 ENSG00000235065 0.63 0.57 A ILMN_1746408 2107 ENSG00000122484 RPAP2 0.62 0.57 A ILMN_1746426 2102 ENSG00000135390 ATP5G2 0.62 0.57 A ILMN_1746457 2051 ENSG00000166136 NDUFB8 0.59 0.57 A ILMN_1746485 2354 ENSG00000100823 APEX1 0.66 0.57 A ILMN_1746565 2057 ENSG00000198105 ZNF248 0.61 0.57 A ILMN_1746588 2003 ENSG00000133597 ADCK2 0.63 0.57 A ILMN_1746686 2102 ENSG00000125901 MRPS26 0.62 0.57 A ILMN_1746784 2453 ENSG00000163519 TRAT1 0.68 0.57 A ILMN_1747251 2409 ENSG00000173418 NAA20 0.62 0.57 A ILMN_1748018 2304 ENSG00000154473 BUB3 0.66 0.57 A ILMN_1748476 2101 ENSG00000065268 WDR18 0.61 0.57 A ILMN_1748625 147 ENSG00000146828 SLC12A9 0.60 0.57 B ILMN_1748650 44 ENSG00000169220 RGS14 0.57 0.57 B ILMN_1748770 2410 ENSG00000168283 BMI1 0.65 0.57 A ILMN_1748883 2201 ENSG00000183431 SF3A3 0.66 0.57 A ILMN_1748894 2205 ENSG00000085788 DDHD2 0.64 0.57 A ILMN_1749078 197 ENSG00000134698 AGO4 0.61 0.57 B ILMN_1749287 200 ENSG00000093167 LRRFIP2 0.64 0.57 B ILMN_1749868 2109 ENSG00000089123 TASP1 0.61 0.57 A ILMN_1749892 198 ENSG00000168214 RBPJ 0.61 0.57 B ILMN_1750008 93 ENSG00000156639 ZFAND3 0.59 0.57 B ILMN_1750101 140 ENSG00000135956 TMEM127 0.59 0.57 B ILMN_1750158 45 ENSG00000169891 REPS2 0.56 0.57 B ILMN_1750429 2455 ENSG00000116251 RPL22 0.65 0.57 A ILMN_1751072 50 ENSG00000116991 SIPA1L2 0.65 0.57 B ILMN_1751143 2161 ENSG00000184752 NDUFA12 0.55 0.57 A ILMN_1751378 2261 ENSG00000168291 PDHB 0.59 0.57 A ILMN_1751400 2103 ENSG00000100353 EIF3D 0.65 0.57 A ILMN_1751589 2255 ENSG00000084090 STARD7 0.65 0.57 A 
ILMN_1752249 2202 ENSG00000171453 POLR1C 0.66 0.57 A ILMN_1752269 2409 ENSG00000170584 NUDCD2 0.62 0.57 A ILMN_1752285 145 ENSG00000147459 DOCK5 0.58 0.57 B ILMN_1752394 2259 ENSG00000115946 PNO1 0.62 0.57 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1752455 2310 ENSG00000132341 0.62 0.57 A ILMN_1752526 2155 ENSG00000078070 MCCC1 0.64 0.57 A ILMN_1752923 2201 ENSG00000100138 SNU13 0.66 0.57 A ILMN_1753183 2159 ENSG00000083123 BCKDHB 0.62 0.57 A ILMN_1753249 2258 ENSG00000085721 RRN3 0.63 0.57 A ILMN_1753342 49 ENSG00000182885 ADGRG3 0.64 0.57 B ILMN_1753534 50 ENSG00000097033 SH3GLB1 0.65 0.57 B ILMN_1753607 198 ENSG00000100030 MAPK1 0.61 0.57 B ILMN_1753639 197 ENSG00000140199 SLC12A6 0.61 0.57 B ILMN_1753716 2251 ENSG00000254206 NPIPB11 0.67 0.57 A ILMN_1753745 2211 ENSG00000005022 SLC25A5 0.56 0.57 A ILMN_1754149 2102 ENSG00000146066 HIGD2A 0.62 0.57 A ILMN_1754234 198 ENSG00000221869 CEBPD 0.61 0.57 B ILMN_1754489 2158 ENSG00000050426 LETMD1 0.62 0.57 A ILMN_1754865 95 ENSG00000101152 DNAJC5 0.57 0.57 B ILMN_1754990 2006 ENSG00000103091 WDR59 0.61 0.57 A ILMN_1755235 2209 ENSG00000124784 RIOK1 0.62 0.57 A ILMN_1755364 2260 ENSG00000071994 PDCD2 0.61 0.57 A ILMN_1755391 248 ENSG00000100330 MTMR3 0.62 0.57 B ILMN_1755843 2355 ENSG00000180817 PPA1 0.65 0.57 A ILMN_1755862 2201 ENSG00000112118 MCM3 0.66 0.57 A ILMN_1756220 49 ENSG00000182885 ADGRG3 0.64 0.57 B ILMN_1756355 193 ENSG00000131061 ZNF341 0.62 0.57 B ILMN_1756501 2054 ENSG00000187713 TMEM203 0.63 0.57 A ILMN_1756793 2105 ENSG00000187713 TMEM203 0.64 0.57 A ILMN_1757129 2305 ENSG00000115365 LANCL1 0.65 0.57 A ILMN_1757262 2255 ENSG00000145220 LYAR 0.65 0.57 A ILMN_1757317 2457 ENSG00000114942 EEF1B2 0.62 0.57 A ILMN_1757384 2056 ENSG00000156521 TYSND1 0.62 0.57 A ILMN_1757408 2202 ENSG00000132768 DPH2 0.66 0.57 A ILMN_1757627 2201 ENSG00000112667 DNPH1 0.66 0.57 A ILMN_1757730 2105 ENSG00000185252 ZNF74 0.64 0.57 A ILMN_1758146 2053 ENSG00000147955 SIGMAR1 0.64 0.57 A ILMN_1758371 2360 ENSG00000257043 0.65 0.57 A ILMN_1758474 2454 ENSG00000233487 0.66 0.57 A ILMN_1758735 2454 ENSG00000244722 0.66 0.57 A ILMN_1758750 2306 ENSG00000213592 0.64 0.57 A ILMN_1758827 2456 ENSG00000214736 TOMM6 0.63 
0.57 A ILMN_1759184 2211 ENSG00000179889 LOC102724985 0.56 0.57 A ILMN_1759419 1952 ENSG00000266066 0.61 0.57 A ILMN_1760174 2453 ENSG00000139239 0.68 0.57 A ILMN_1760201 2403 ENSG00000236459 0.68 0.57 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1760245 2454 ENSG00000240522 0.66 0.57 A ILMN_1760280 2304 ENSG00000124383 MPHOSPH10 0.66 0.58 A ILMN_1761031 2252 ENSG00000141560 FN3KRP 0.67 0.58 A ILMN_1761131 2308 ENSG00000172340 SUCLG2 0.63 0.58 A ILMN_1761159 2455 ENSG00000219932 0.65 0.58 A ILMN_1761479 2411 ENSG00000036549 ZZZ3 0.68 0.58 A ILMN_1761566 2201 ENSG00000142230 SAE1 0.66 0.58 A ILMN_1762167 194 ENSG00000206489 PPP1R10 0.61 0.58 B ILMN_1762713 2151 ENSG00000160917 CPSF4 0.64 0.58 A ILMN_1762725 48 ENSG00000223654 FLOT1 0.62 0.58 B ILMN_1762747 2258 ENSG00000254004 ZNF260 0.63 0.58 A ILMN_1762899 244 ENSG00000232258 TMEM114 0.64 0.58 B ILMN_1763080 195 ENSG00000100296 THOC5 0.61 0.58 B ILMN_1763460 2109 ENSG00000244119 0.61 0.58 A ILMN_1763568 50 ENSG00000160883 HK3 0.65 0.58 B ILMN_1763640 2209 ENSG00000121892 PDS5A 0.62 0.58 A ILMN_1763641 2401 ENSG00000134324 LPIN1 0.71 0.58 A ILMN_1763828 2204 ENSG00000243725 TTC4 0.65 0.58 A ILMN_1763875 2403 ENSG00000249353 0.68 0.58 A ILMN_1764166 2304 ENSG00000136997 MYC 0.66 0.58 A ILMN_1764323 2251 ENSG00000076248 UNG 0.67 0.58 A ILMN_1764414 50 ENSG00000233461 0.65 0.58 B ILMN_1764456 2107 ENSG00000117602 RCAN3 0.62 0.58 A ILMN_1764549 194 ENSG00000101457 DNTTIP1 0.61 0.58 B ILMN_1764721 2053 ENSG00000113716 HMGXB3 0.64 0.58 A ILMN_1764826 2205 ENSG00000130713 EXOSC2 0.64 0.58 A ILMN_1764851 2153 ENSG00000115539 PDCL3 0.65 0.58 A ILMN_1764871 2310 ENSG00000226478 0.62 0.58 A ILMN_1765109 2451 ENSG00000183918 SH2D1A 0.72 0.58 A ILMN_1765401 2362 ENSG00000150768 DLAT 0.65 0.58 A ILMN_1765523 2259 ENSG00000186416 NKRF 0.62 0.58 A ILMN_1765880 2054 ENSG00000158435 CNOT11 0.63 0.58 A ILMN_1765941 200 ENSG00000173281 PPP1R3B 0.64 0.58 B ILMN_1766010 2358 ENSG00000138182 KIF20B 0.62 0.58 A ILMN_1766045 2458 ENSG00000114686 MRPL3 0.61 0.58 A ILMN_1766125 195 ENSG00000141551 CSNK1D 0.61 0.58 B ILMN_1766245 2360 ENSG00000164284 GRPEL2 0.65 0.58 A ILMN_1766435 2307 ENSG00000259494 MRPL46 0.64 0.58 A 
ILMN_1767193 47 ENSG00000180549 FUT7 0.59 0.58 B ILMN_1767219 2253 ENSG00000196305 IARS 0.66 0.58 A ILMN_1767320 345 ENSG00000185973 TMLHE 0.65 0.58 B ILMN_1767360 48 ENSG00000184106 0.62 0.58 B

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1767811 46 ENSG00000112195 TREML2 0.57 0.58 B ILMN_1767816 2310 ENSG00000136527 TRA2B 0.62 0.58 A ILMN_1767960 2203 ENSG00000074696 HACD3 0.66 0.58 A ILMN_1767992 99 ENSG00000121297 TSHZ3 0.64 0.58 B ILMN_1768077 394 ENSG00000068697 LAPTM4A 0.71 0.58 B ILMN_1768127 2207 ENSG00000154743 TSEN2 0.63 0.58 A ILMN_1768251 2109 ENSG00000115207 GTF3C2 0.61 0.58 A ILMN_1768311 2151 ENSG00000144867 SRPRB 0.64 0.58 A ILMN_1768958 2101 ENSG00000154930 ACSS1 0.61 0.58 A ILMN_1769092 2253 ENSG00000132661 NXT1 0.66 0.58 A ILMN_1769383 2256 ENSG00000185808 PIGP 0.64 0.58 A ILMN_1769390 49 ENSG00000103005 USB1 0.64 0.58 B ILMN_1769409 2107 ENSG00000083817 ZNF416 0.62 0.58 A ILMN_1769449 243 ENSG00000103111 MON1B 0.65 0.58 B ILMN_1769451 2261 ENSG00000107949 BCCIP 0.59 0.58 A ILMN_1769545 2201 ENSG00000178605 GTPBP6 0.66 0.58 A ILMN_1769634 2155 ENSG00000092201 SUPT16H 0.64 0.58 A ILMN_1769637 2054 ENSG00000108773 KAT2A 0.63 0.58 A ILMN_1769886 2160 ENSG00000080839 RBL1 0.59 0.58 A ILMN_1769911 2257 ENSG00000147419 CCDC25 0.63 0.58 A ILMN_1770053 2051 ENSG00000160803 UBQLN4 0.59 0.58 A ILMN_1770206 2311 ENSG00000136243 NUPL2 0.61 0.58 A ILMN_1770339 2001 ENSG00000128185 DGCR6L 0.58 0.58 A ILMN_1770356 2454 ENSG00000237793 0.66 0.58 A ILMN_1770641 195 ENSG00000167193 CRK 0.61 0.58 B ILMN_1770768 100 ENSG00000163235 TGFA 0.65 0.58 B ILMN_1771003 2260 ENSG00000135521 LTV1 0.61 0.58 A ILMN_1771139 2003 ENSG00000115241 PPM1G 0.63 0.58 A ILMN_1771223 2059 ENSG00000151353 TMEM18 0.60 0.58 A ILMN_1771593 2258 ENSG00000134146 DPH6 0.63 0.58 A ILMN_1771651 2403 ENSG00000198034 RPS4X 0.68 0.58 A ILMN_1771689 2305 ENSG00000075336 TIMM21 0.65 0.58 A ILMN_1771734 2357 ENSG00000173145 NOC3L 0.63 0.58 A ILMN_1771966 2256 ENSG00000011295 TTC19 0.64 0.58 A ILMN_1772131 248 ENSG00000100030 MAPK1 0.62 0.58 B ILMN_1772302 2201 ENSG00000176978 DPP7 0.66 0.58 A ILMN_1772316 2053 ENSG00000055950 MRPL43 0.64 0.58 A ILMN_1772329 2002 ENSG00000168569 
TMEM223 0.61 0.58 A ILMN_1772645 345 ENSG00000205352 PRR13 0.65 0.58 B ILMN_1772713 293 ENSG00000165861 ZFYVE1 0.67 0.58 B ILMN_1772719 146 ENSG00000189159 JPT1 0.59 0.58 B ILMN_1772876 2107 ENSG00000119599 DCAF4 0.62 0.58 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1772888 50 ENSG00000112053 SLC26A8 0.65 0.58 B ILMN_1773407 297 ENSG00000178458 0.63 0.58 B ILMN_1773716 2454 ENSG00000235776 0.66 0.58 A ILMN_1773760 2454 ENSG00000139239 0.66 0.58 A ILMN_1774062 2454 ENSG00000234589 0.66 0.58 A ILMN_1774334 2454 ENSG00000226360 0.66 0.58 A ILMN_1774547 2258 ENSG00000166881 NEMP1 0.63 0.58 A ILMN_1774844 149 ENSG00000146067 FAM193B 0.63 0.59 B ILMN_1774890 2158 ENSG00000119401 TRIM32 0.62 0.59 A ILMN_1775243 2053 ENSG00000011304 PTBP1 0.64 0.59 A ILMN_1775522 2455 ENSG00000236976 0.65 0.59 A ILMN_1775542 2008 ENSG00000089163 SIRT4 0.60 0.59 A ILMN_1775573 194 ENSG00000173171 MTX1 0.61 0.59 B ILMN_1775677 194 ENSG00000134830 C5AR2 0.61 0.59 B ILMN_1775703 2402 ENSG00000168672 FAM84B 0.71 0.59 A ILMN_1776080 2204 ENSG00000006695 COX10 0.65 0.59 A ILMN_1776147 2405 ENSG00000224993 0.65 0.59 A ILMN_1776260 145 ENSG00000106348 IMPDH1 0.58 0.59 B ILMN_1776487 2303 ENSG00000070718 AP3M2 0.67 0.59 A ILMN_1776582 2256 ENSG00000177034 MTX3 0.64 0.59 A ILMN_1776679 97 ENSG00000062282 DGAT2 0.59 0.59 B ILMN_1776993 2304 ENSG00000130935 NOL11 0.66 0.59 A ILMN_1777066 2307 ENSG00000119640 ACYP1 0.64 0.59 A ILMN_1777139 2354 ENSG00000175390 EIF3F 0.66 0.59 A ILMN_1777342 2306 ENSG00000124193 SRSF6 0.64 0.59 A ILMN_1777449 2301 ENSG00000232931 0.68 0.59 A ILMN_1777519 245 ENSG00000181481 RNF135 0.63 0.59 B ILMN_1777740 2404 ENSG00000212932 0.66 0.59 A ILMN_1777811 2257 ENSG00000147231 CXorf57 0.63 0.59 A ILMN_1777853 144 ENSG00000173171 MTX1 0.59 0.59 B ILMN_1778111 40 ENSG00000127663 KDM4B 0.55 0.59 B ILMN_1778173 2151 ENSG00000198931 APRT 0.64 0.59 A ILMN_1778734 2201 ENSG00000111641 NOP2 0.66 0.59 A ILMN_1779010 146 ENSG00000075426 FOSL2 0.59 0.59 B ILMN_1779015 2001 ENSG00000107223 EDF1 0.58 0.59 A ILMN_1779376 2257 ENSG00000188785 ZNF548 0.63 0.59 A ILMN_1779886 2155 ENSG00000163026 WDCP 0.64 0.59 A ILMN_1780197 2309 ENSG00000136169 SETDB2 0.63 0.59 A ILMN_1780368 2251 
ENSG00000099849 RASSF7 0.67 0.59 A ILMN_1780546 2401 ENSG00000124181 PLCG1 0.71 0.59 A ILMN_1781060 2155 ENSG00000181007 ZFP82 0.64 0.59 A ILMN_1781121 2305 ENSG00000119335 SET 0.65 0.59 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot ILMN_1781155 2257 ENSG00000139737 SLAIN1 0.63 0.59 A ILMN_1781174 2355 ENSG00000055044 NOP58 0.65 0.59 A ILMN_1781198 2202 ENSG00000175216 CKAP5 0.66 0.59 A ILMN_1781386 2205 ENSG00000178105 DDX10 0.64 0.59 A ILMN_1781416 2357 ENSG00000111906 HDDC2 0.63 0.59 A ILMN_1781516 2304 ENSG00000088205 DDX18 0.66 0.59 A ILMN_1781680 2058 ENSG00000172315 TP53RK 0.61 0.59 A ILMN_1781721 2002 ENSG00000099821 POLRMT 0.61 0.59 A ILMN_1781996 2451 ENSG00000146021 KLHL3 0.72 0.59 A ILMN_1782034 2053 ENSG00000156521 TYSND1 0.64 0.59 A ILMN_1782050 2356 ENSG00000196290 NIF3L1 0.63 0.59 A ILMN_1782247 295 ENSG00000135315 CEP162 0.64 0.59 B ILMN_1782504 2059 ENSG00000198205 ZXDA 0.60 0.59 A ILMN_1782579 2161 ENSG00000132305 IMMT 0.55 0.59 A ILMN_1782688 2307 ENSG00000155438 NIFK 0.64 0.59 A ILMN_1782743 2404 ENSG00000188846 RPL14 0.66 0.59 A ILMN_1782745 2305 ENSG00000126226 PCID2 0.65 0.59 A ILMN_1782938 2151 ENSG00000130520 LSM4 0.64 0.59 A ILMN_1783546 2456 ENSG00000225573 0.63 0.59 A ILMN_1783695 2207 ENSG00000198556 ZNF789 0.63 0.59 A ILMN_1783985 49 ENSG00000132589 FLOT2 0.64 0.59 B ILMN_1784031 45 ENSG00000165886 UBTD1 0.56 0.59 B ILMN_1784428 2458 ENSG00000008324 SS18L2 0.61 0.59 A ILMN_1784467 150 ENSG00000096063 SRPK1 0.64 0.59 B ILMN_1784785 2359 ENSG00000109184 DCUN1D4 0.64 0.59 A ILMN_1784884 45 ENSG00000177674 AGTRAP 0.56 0.59 B ILMN_1785005 2157 ENSG00000275835 TUBGCP5 0.62 0.59 A ILMN_1785095 148 ENSG00000058799 YIPF1 0.62 0.59 B ILMN_1785161 2056 ENSG00000166669 ATF7IP2 0.62 0.59 A ILMN_1785179 50 ENSG00000087157 PGS1 0.65 0.59 B ILMN_1785191 2361 ENSG00000136243 NUPL2 0.66 0.59 A ILMN_1785198 2410 ENSG00000036549 ZZZ3 0.65 0.59 A ILMN_1785202 2156 ENSG00000138161 CUZD1 0.63 0.59 A ILMN_1785424 1955 ENSG00000155229 MMS19 0.60 0.59 A ILMN_1785795 2353 ENSG00000109452 INPP4B 0.67 0.59 A ILMN_1786016 2358 ENSG00000240857 RDH14 0.62 0.59 A ILMN_1786024 2351 ENSG00000124243 BCAS4 0.69 0.59 A ILMN_1786046 2205 
ENSG00000130177 CDC16 0.64 0.59 A ILMN_1786050 2456 ENSG00000125743 SNRPD2 0.63 0.59 A ILMN_1786189 2411 ENSG00000215256 DHRS4-AS1 0.68 0.59 A ILMN_1786242 2403 ENSG00000213851 0.68 0.59 A ILMN_1786658 2455 ENSG00000224631 0.65 0.59 A

Probe ID Metagene Ensembl ID Protein name Ptm Ptg Spot
ILMN_1787511 2455 ENSG00000235552 0.65 0.59 A
ILMN_1788024 2454 ENSG00000226327 0.66 0.59 A
ILMN_1788099 2404 ENSG00000232938 RPL23AP87 0.66 0.59 A
ILMN_1788213 2155 ENSG00000186687 LYRM7 0.64 0.59 A
ILMN_1788607 2454 ENSG00000240418 0.66 0.59 A
ILMN_1788742 2153 ENSG00000275945 0.65 0.59 A
ILMN_1788768 2051 ENSG00000184990 SIVA1 0.59 0.59 A
ILMN_1789001 2353 ENSG00000154511 FAM69A 0.67 0.59 A
ILMN_1789171 346 ENSG00000104388 RAB2A 0.64 0.60 B
ILMN_1789266 2056 ENSG00000133065 SLC41A1 0.62 0.60 A
ILMN_1789338 49 ENSG00000272196 HIST2H2AA4 0.64 0.60 B
ILMN_1789349 2455 ENSG00000244002 0.65 0.60 A
ILMN_1789364 2206 ENSG00000186020 ZNF529 0.64 0.60 A
ILMN_1789492 2151 ENSG00000230202 0.64 0.60 A
ILMN_1789616 2352 ENSG00000125246 CLYBL 0.69 0.60 A
ILMN_1789642 2053 ENSG00000231115 RING1 0.64 0.60 A
ILMN_1789809 2155 ENSG00000130921 C12orf65 0.64 0.60 A
ILMN_1790549 50 ENSG00000173262 SLC2A14 0.65 0.60 B
ILMN_1790781 96 ENSG00000131389 SLC6A6 0.58 0.60 B
ILMN_1791328 2204 ENSG00000085511 MAP3K4 0.65 0.60 A
ILMN_1791388 50 ENSG00000183621 ZNF438 0.65 0.60 B
ILMN_1791396 2253 ENSG00000113013 HSPA9 0.66 0.60 A
ILMN_1791771 2307 ENSG00000067533 RRP15 0.64 0.60 A
ILMN_1791896 2309 ENSG00000182700 IGIP 0.63 0.60 A
ILMN_1791925 2101 ENSG00000171861 MRM3 0.61 0.60 A
ILMN_1792671 2111 ENSG00000175792 RUVBL1 0.54 0.60 A
ILMN_1792681 198 ENSG00000135365 PHF21A 0.61 0.60 B
ILMN_1792682 50 ENSG00000278563 MGAM2 0.65 0.60 B
ILMN_1793203 2251 ENSG00000085662 AKR1B1 0.67 0.60 A
ILMN_1793290 246 ENSG00000174021 GNG5 0.62 0.60 B
ILMN_1793651 2201 ENSG00000126249 PDCD2L 0.66 0.60 A
ILMN_1793743 2309 ENSG00000139620 KANSL2 0.63 0.60 A
ILMN_1794132 2355 ENSG00000144713 RPL32 0.65 0.60 A
ILMN_1794165 2211 ENSG00000156261 CCT8 0.56 0.60 A
ILMN_1794213 2051 ENSG00000204316 MRPL38 0.59 0.60 A
ILMN_1794260 2258 ENSG00000114503 NCBP2 0.63 0.60 A
ILMN_1794914 2104 ENSG00000042088 TDP1 0.64 0.60 A
ILMN_1795158 2301 ENSG00000173511 VEGFB 0.68 0.60 A
ILMN_1795236 242 ENSG00000198873 GRK5 0.66 0.60 B
ILMN_1795428 2058 ENSG00000181191 PJA1 0.61 0.60 A
ILMN_1795467 50 ENSG00000054523 KIF1B 0.65 0.60 B
ILMN_1796138 48 ENSG00000089351 GRAMD1A 0.62 0.60 B

ILMN_1796210 2353 ENSG00000074696 HACD3 0.67 0.60 A
ILMN_1796235 348 ENSG00000111647 UHRF1BP1L 0.64 0.60 B
ILMN_1796407 2455 ENSG00000223718 0.65 0.60 A
ILMN_1796490 2454 ENSG00000215905 0.66 0.60 A
ILMN_1796642 2451 ENSG00000167106 FAM102A 0.72 0.60 A
ILMN_1796762 2409 ENSG00000135185 TMEM243 0.62 0.60 A
ILMN_1796976 2453 ENSG00000015171 ZMYND11 0.68 0.60 A
ILMN_1797181 2258 ENSG00000102931 ARL2BP 0.63 0.60 A
ILMN_1797332 2158 ENSG00000168795 ZBTB5 0.62 0.60 A
ILMN_1797534 50 ENSG00000120306 CYSTM1 0.65 0.60 B
ILMN_1797684 2360 ENSG00000114062 UBE3A 0.65 0.60 A
ILMN_1797903 148 ENSG00000138613 APH1B 0.62 0.60 B
ILMN_1798256 2060 ENSG00000119977 TCTN3 0.60 0.60 A
ILMN_1798533 192 ENSG00000142694 EVA1B 0.64 0.60 B
ILMN_1798619 100 ENSG00000259132 0.65 0.60 B
ILMN_1798706 142 ENSG00000162889 MAPKAPK2 0.61 0.60 B
ILMN_1798728 2105 ENSG00000135775 COG2 0.64 0.60 A
ILMN_1798804 2454 ENSG00000234118 RPL13AP6 0.66 0.60 A
ILMN_1798957 243 ENSG00000142409 ZNF787 0.65 0.60 B
ILMN_1798977 48 ENSG00000142657 PGD 0.62 0.60 B
ILMN_1799516 2259 ENSG00000094880 CDC23 0.62 0.60 A
ILMN_1799688 2208 ENSG00000025156 HSF2 0.63 0.60 A
ILMN_1799814 2253 ENSG00000129187 DCTD 0.66 0.60 A
ILMN_1800311 2252 ENSG00000153107 ANAPC1 0.67 0.60 A
ILMN_1800787 2201 ENSG00000133316 WDR74 0.66 0.60 A
ILMN_1801105 2158 ENSG00000160208 RRP1B 0.62 0.60 A
ILMN_1801348 2151 ENSG00000135972 MRPS9 0.64 0.60 A
ILMN_1801403 2202 ENSG00000004487 KDM1A 0.66 0.60 A
ILMN_1801710 2154 ENSG00000019485 PRDM11 0.65 0.60 A
ILMN_1801795 2210 ENSG00000147140 NONO 0.60 0.60 A
ILMN_1801869 2151 ENSG00000141385 AFG3L2 0.64 0.60 A
ILMN_1801913 2453 ENSG00000116251 RPL22 0.68 0.60 A
ILMN_1802157 2455 ENSG00000122406 RPL5 0.65 0.60 A
ILMN_1802456 2452 ENSG00000215252 GOLGA8B 0.71 0.60 A
ILMN_1802458 199 ENSG00000123643 SLC36A1 0.63 0.60 B
ILMN_1802553 298 ENSG00000124788 ATXN1 0.63 0.60 B
ILMN_1802615 2101 ENSG00000136718 IMP4 0.61 0.60 A
ILMN_1803036 2210 ENSG00000196531 NACA 0.60 0.60 A
ILMN_1803045 1951 ENSG00000239927 TRIM39-RPP21 0.58 0.60 A
ILMN_1803302 2156 ENSG00000163607 GTPBP8 0.63 0.60 A
ILMN_1803312 95 ENSG00000141551 CSNK1D 0.57 0.60 B
ILMN_1803317 2451 ENSG00000136717 BIN1 0.72 0.60 A

ILMN_1803348 2159 ENSG00000276618 RAD17 0.62 0.60 A
ILMN_1803564 44 ENSG00000275031 METRNL 0.57 0.60 B
ILMN_1803676 2201 ENSG00000129103 SUMF2 0.66 0.60 A
ILMN_1803925 2261 ENSG00000084090 STARD7 0.59 0.60 A
ILMN_1803997 2156 ENSG00000198721 ECI2 0.63 0.60 A
ILMN_1804150 50 ENSG00000203710 CR1 0.65 0.60 B
ILMN_1804445 47 ENSG00000087903 RFX2 0.59 0.60 B
ILMN_1804611 2454 ENSG00000237506 0.66 0.60 A
ILMN_1804798 195 ENSG00000156931 VPS8 0.61 0.60 B
ILMN_1804812 2357 ENSG00000226144 0.63 0.60 A
ILMN_1805007 2261 ENSG00000125944 HNRNPR 0.59 0.60 A
ILMN_1805175 2404 ENSG00000214988 0.66 0.60 A
ILMN_1805192 2454 ENSG00000232341 0.66 0.60 A
ILMN_1805225 2307 ENSG00000226723 0.64 0.60 A
ILMN_1805228 2403 ENSG00000216866 0.68 0.60 A
ILMN_1805481 2455 ENSG00000240376 0.65 0.60 A
ILMN_1805658 47 ENSG00000279072 0.59 0.60 B
ILMN_1805827 2254 ENSG00000117174 ZNHIT6 0.66 0.60 A
ILMN_1806015 2151 ENSG00000169100 SLC25A6 0.64 0.60 A
ILMN_1806293 2109 ENSG00000119977 TCTN3 0.61 0.60 A
ILMN_1806818 2251 ENSG00000204272 NBDY 0.67 0.60 A
ILMN_1806867 2454 ENSG00000258245 0.66 0.60 A
ILMN_1806999 2255 ENSG00000240489 0.65 0.60 A
ILMN_1807596 2254 ENSG00000132953 XPO4 0.66 0.61 A
ILMN_1808047 244 ENSG00000105355 PLIN3 0.64 0.61 B
ILMN_1808299 2352 ENSG00000130684 ZNF337 0.69 0.61 A
ILMN_1808783 2205 ENSG00000147224 PRPS1 0.64 0.61 A
ILMN_1808811 2208 ENSG00000168806 LCMT2 0.63 0.61 A
ILMN_1808837 2451 ENSG00000163600 ICOS 0.72 0.61 A
ILMN_1808846 2253 ENSG00000155561 NUP205 0.66 0.61 A
ILMN_1808939 2451 ENSG00000136717 BIN1 0.72 0.61 A
ILMN_1809708 2451 ENSG00000234498 RPL13AP20 0.72 0.61 A
ILMN_1809866 296 ENSG00000107341 UBE2R2 0.63 0.61 B
ILMN_1809963 95 ENSG00000100330 MTMR3 0.57 0.61 B
ILMN_1810334 2105 ENSG00000087263 OGFOD1 0.64 0.61 A
ILMN_1810420 2356 ENSG00000107672 NSMCE4A 0.63 0.61 A
ILMN_1810423 2209 ENSG00000247626 MARS2 0.62 0.61 A
ILMN_1810514 2455 ENSG00000179460 0.65 0.61 A
ILMN_1810584 2251 ENSG00000089220 PEBP1 0.67 0.61 A
ILMN_1810725 2206 ENSG00000152642 GPD1L 0.64 0.61 A
ILMN_1810810 50 ENSG00000167434 CA4 0.65 0.61 B
ILMN_1810922 2004 ENSG00000115073 ACTR1B 0.62 0.61 A

ILMN_1811104 2453 ENSG00000145247 OCIAD2 0.68 0.61 A
ILMN_1811991 146 ENSG00000159164 SV2A 0.59 0.61 B
ILMN_1812191 46 ENSG00000165801 ARHGEF40 0.57 0.61 B
ILMN_1812278 2454 ENSG00000240087 0.66 0.61 A
ILMN_1812777 2004 ENSG00000101391 CDK5RAP1 0.62 0.61 A
ILMN_1812973 2201 ENSG00000226492 CUTA 0.66 0.61 A
ILMN_1812976 2401 ENSG00000071575 TRIB2 0.71 0.61 A
ILMN_1813091 294 ENSG00000182175 RGMA 0.65 0.61 B
ILMN_1813207 2360 ENSG00000167842 MIS12 0.65 0.61 A
ILMN_1813544 293 ENSG00000132906 CASP9 0.67 0.61 B
ILMN_1813625 2258 ENSG00000134987 WDR36 0.63 0.61 A
ILMN_1813669 2453 ENSG00000111716 LDHB 0.68 0.61 A
ILMN_1813836 2352 ENSG00000182670 TTC3 0.69 0.61 A
ILMN_1813840 2254 ENSG00000189369 GSPT2 0.66 0.61 A
ILMN_1814011 2361 ENSG00000134440 NARS 0.66 0.61 A
ILMN_1814122 2153 ENSG00000089154 GCN1 0.65 0.61 A
ILMN_1814859 50 ENSG00000112062 MAPK14 0.65 0.61 B
ILMN_1815054 2304 ENSG00000167699 GLOD4 0.66 0.61 A
ILMN_1815063 2151 ENSG00000105447 GRWD1 0.64 0.61 A
ILMN_1815107 2161 ENSG00000115806 GORASP2 0.55 0.61 A
ILMN_1815141 2005 ENSG00000156502 SUPV3L1 0.62 0.61 A
ILMN_1815500 2456 ENSG00000226225 RPS18 0.63 0.61 A
ILMN_1815759 2357 ENSG00000231767 0.63 0.61 A
ILMN_1815924 2259 ENSG00000180228 PRKRA 0.62 0.61 A
ILMN_1868047 2451 ENSG00000139641 ESYT1 0.72 0.61 A
ILMN_1869109 2303 ENSG00000279738 0.67 0.61 A
ILMN_1886515 2201 ENSG00000179409 GEMIN4 0.66 0.61 A
ILMN_1898124 2251 ENSG00000165733 BMS1 0.67 0.61 A
ILMN_2038774 150 ENSG00000132825 PPP1R3D 0.64 0.61 B
ILMN_2043728 2360 ENSG00000156469 MTERF3 0.65 0.61 A
ILMN_2044617 2258 ENSG00000089050 RBBP9 0.63 0.61 A
ILMN_2044832 2206 ENSG00000198648 STK39 0.64 0.61 A
ILMN_2047618 2411 ENSG00000177889 UBE2N 0.68 0.61 A
ILMN_2047676 200 ENSG00000138463 DIRC2 0.64 0.61 B
ILMN_2048326 50 ENSG00000091106 NLRC4 0.65 0.61 B
ILMN_2048982 2306 ENSG00000115368 WDR75 0.64 0.61 A
ILMN_2050255 2209 ENSG00000120805 ARL1 0.62 0.61 A
ILMN_2050911 2303 ENSG00000213782 DDX47 0.67 0.61 A
ILMN_2051867 249 ENSG00000168297 PXK 0.63 0.61 B
ILMN_2052163 2357 ENSG00000143947 RPS27A 0.63 0.61 A
ILMN_2052208 2161 ENSG00000146701 MDH2 0.55 0.61 A
ILMN_2052790 2259 ENSG00000171792 RHNO1 0.62 0.61 A

ILMN_2053546 2457 ENSG00000142676 RPL11 0.62 0.61 A
ILMN_2054392 2457 ENSG00000090266 NDUFB2 0.62 0.61 A
ILMN_2054442 50 ENSG00000170525 PFKFB3 0.65 0.61 B
ILMN_2057573 148 ENSG00000226479 TMEM185B 0.62 0.61 B
ILMN_2060115 45 ENSG00000177674 AGTRAP 0.56 0.61 B
ILMN_2062620 2355 ENSG00000164163 ABCE1 0.65 0.61 A
ILMN_2064898 2409 ENSG00000139793 MBNL2 0.62 0.61 A
ILMN_2066124 2151 ENSG00000128626 MRPS12 0.64 0.61 A
ILMN_2067656 199 ENSG00000073711 PPP2R3A 0.63 0.61 B
ILMN_2067708 2301 ENSG00000176953 NFATC2IP 0.68 0.61 A
ILMN_2067709 2362 ENSG00000135297 MTO1 0.65 0.61 A
ILMN_2072603 100 ENSG00000005302 MSL3 0.65 0.61 B
ILMN_2073010 2107 ENSG00000161791 FMNL3 0.62 0.61 A
ILMN_2073012 2055 ENSG00000177192 PUS1 0.63 0.61 A
ILMN_2075051 2151 ENSG00000229806 0.64 0.61 A
ILMN_2075189 2051 ENSG00000213293 0.59 0.61 A
ILMN_2077094 2455 ENSG00000230383 0.65 0.61 A
ILMN_2077623 2355 ENSG00000156508 EEF1A1 0.65 0.61 A
ILMN_2078697 2261 ENSG00000249858 0.59 0.61 A
ILMN_2079004 2454 ENSG00000244722 0.66 0.61 A
ILMN_2079386 2454 ENSG00000214016 0.66 0.61 A
ILMN_2081988 2455 ENSG00000235552 0.65 0.61 A
ILMN_2082314 2303 ENSG00000230280 0.67 0.61 A
ILMN_2083243 150 ENSG00000138190 EXOC6 0.64 0.62 B
ILMN_2083946 2255 ENSG00000063046 EIF4B 0.65 0.62 A
ILMN_2086064 2101 ENSG00000104835 SARS2 0.61 0.62 A
ILMN_2086077 2358 ENSG00000180964 TCEAL8 0.62 0.62 A
ILMN_2086238 2401 ENSG00000102245 CD40LG 0.71 0.62 A
ILMN_2087080 2354 ENSG00000213862 0.66 0.62 A
ILMN_2088612 2454 ENSG00000177350 RPL13AP3 0.66 0.62 A
ILMN_2091412 50 ENSG00000090376 IRAK3 0.65 0.62 B
ILMN_2092118 2251 ENSG00000177192 PUS1 0.67 0.62 A
ILMN_2092333 244 ENSG00000273213 0.64 0.62 B
ILMN_2097793 148 ENSG00000153250 RBMS1 0.62 0.62 B
ILMN_2098325 2156 ENSG00000180884 ZNF792 0.63 0.62 A
ILMN_2101885 2401 ENSG00000121310 ECHDC2 0.71 0.62 A
ILMN_2103295 2457 ENSG00000142676 RPL11 0.62 0.62 A
ILMN_2103547 2054 ENSG00000124608 AARS2 0.63 0.62 A
ILMN_2103591 2306 ENSG00000164163 ABCE1 0.64 0.62 A
ILMN_2105923 2454 ENSG00000254772 EEF1G 0.66 0.62 A
ILMN_2108938 2153 ENSG00000108439 PNPO 0.65 0.62 A
ILMN_2109156 2054 ENSG00000141101 NOB1 0.63 0.62 A

ILMN_2110167 2352 ENSG00000120910 PPP3CC 0.69 0.62 A
ILMN_2110281 2201 ENSG00000129351 ILF3 0.66 0.62 A
ILMN_2112524 2455 ENSG00000212802 0.65 0.62 A
ILMN_2114876 2301 ENSG00000273761 DDX24 0.68 0.62 A
ILMN_2115154 2159 ENSG00000163002 NUP35 0.62 0.62 A
ILMN_2116366 2104 ENSG00000272835 SMDT1 0.64 0.62 A
ILMN_2117330 2455 ENSG00000235552 0.65 0.62 A
ILMN_2117904 2451 ENSG00000118922 KLF12 0.72 0.62 A
ILMN_2121437 2403 ENSG00000219582 0.68 0.62 A
ILMN_2122103 2304 ENSG00000165526 RPUSD4 0.66 0.62 A
ILMN_2123665 2209 ENSG00000120798 NR2C1 0.62 0.62 A
ILMN_2123871 2356 ENSG00000074201 CLNS1A 0.63 0.62 A
ILMN_2124471 2454 ENSG00000174444 RPL4 0.66 0.62 A
ILMN_2125675 46 ENSG00000167874 TMEM88 0.57 0.62 B
ILMN_2127379 2404 ENSG00000161016 RPL8 0.66 0.62 A
ILMN_2128967 2108 ENSG00000144741 SLC25A26 0.61 0.62 A
ILMN_2129927 146 ENSG00000242324 0.59 0.62 B
ILMN_2130180 2351 ENSG00000111371 SLC38A1 0.69 0.62 A
ILMN_2131022 2210 ENSG00000081177 EXD2 0.60 0.62 A
ILMN_2131336 50 ENSG00000115590 IL1R2 0.65 0.62 B
ILMN_2133360 2362 ENSG00000152382 TADA1 0.65 0.62 A
ILMN_2134062 50 ENSG00000099985 OSM 0.65 0.62 B
ILMN_2135798 2107 ENSG00000213995 NAXD 0.62 0.62 A
ILMN_2137536 2051 ENSG00000157593 SLC35B2 0.59 0.62 A
ILMN_2138435 2101 ENSG00000248487 ABHD14A 0.61 0.62 A
ILMN_2139351 2152 ENSG00000148840 PPRC1 0.64 0.62 A
ILMN_2140207 2205 ENSG00000079134 THOC1 0.64 0.62 A
ILMN_2140799 2304 ENSG00000086189 DIMT1 0.66 0.62 A
ILMN_2141444 2305 ENSG00000106049 HIBADH 0.65 0.62 A
ILMN_2141452 146 ENSG00000111684 LPCAT3 0.59 0.62 B
ILMN_2141453 2451 ENSG00000096433 ITPR3 0.72 0.62 A
ILMN_2142752 95 ENSG00000060069 CTDP1 0.57 0.62 B
ILMN_2144574 2205 ENSG00000132953 XPO4 0.64 0.62 A
ILMN_2151579 2351 ENSG00000174282 ZBTB4 0.69 0.62 A
ILMN_2153280 96 ENSG00000236438 0.58 0.62 B
ILMN_2153332 2054 ENSG00000183060 LYSMD4 0.63 0.62 A
ILMN_2154566 2203 ENSG00000125944 HNRNPR 0.66 0.62 A
ILMN_2155172 2310 ENSG00000198890 PRMT6 0.62 0.62 A
ILMN_2156982 2251 ENSG00000063587 LOC105373378 0.67 0.62 A
ILMN_2159859 2451 ENSG00000171791 BCL2 0.72 0.62 A
ILMN_2162234 2104 ENSG00000011304 PTBP1 0.64 0.62 A
ILMN_2166506 2451 ENSG00000101224 CDC25B 0.72 0.62 A

ILMN_2166831 150 ENSG00000153250 RBMS1 0.64 0.62 B
ILMN_2167617 2251 ENSG00000224078 0.67 0.62 A
ILMN_2168217 249 ENSG00000151553 FAM160B1 0.63 0.62 B
ILMN_2174394 2454 ENSG00000214016 0.66 0.62 A
ILMN_2175601 2453 ENSG00000206147 0.68 0.62 A
ILMN_2175894 2454 ENSG00000224858 0.66 0.62 A
ILMN_2176768 2456 ENSG00000242299 0.63 0.62 A
ILMN_2179018 2455 ENSG00000142937 0.65 0.62 A
ILMN_2182198 2103 ENSG00000113716 HMGXB3 0.65 0.62 A
ILMN_2182531 2455 ENSG00000233045 0.65 0.62 A
ILMN_2183687 2454 ENSG00000242314 0.66 0.62 A
ILMN_2186061 2454 ENSG00000234785 0.66 0.62 A
ILMN_2186597 2152 ENSG00000175575 PAAF1 0.64 0.63 A
ILMN_2187727 292 ENSG00000198663 C6orf89 0.68 0.63 B
ILMN_2191436 2107 ENSG00000174500 GCSAM 0.62 0.63 A
ILMN_2191759 2352 ENSG00000117174 ZNHIT6 0.69 0.63 A
ILMN_2192385 195 ENSG00000258741 0.61 0.63 B
ILMN_2192693 144 ENSG00000173535 TNFRSF10C 0.59 0.63 B
ILMN_2194649 2301 ENSG00000001461 NIPAL3 0.68 0.63 A
ILMN_2198878 2453 ENSG00000130255 RPL36 0.68 0.63 A
ILMN_2200917 41 ENSG00000181409 AATK 0.56 0.63 B
ILMN_2201347 2451 ENSG00000139193 CD27 0.72 0.63 A
ILMN_2201966 48 ENSG00000146094 DOK3 0.62 0.63 B
ILMN_2204545 2301 ENSG00000170915 PAQR8 0.68 0.63 A
ILMN_2205211 243 ENSG00000160888 IER2 0.65 0.63 B
ILMN_2205882 200 ENSG00000109466 KLHL2 0.64 0.63 B
ILMN_2205963 2301 ENSG00000187838 PLSCR3 0.68 0.63 A
ILMN_2218277 148 ENSG00000087903 RFX2 0.62 0.63 B
ILMN_2219134 249 ENSG00000164066 INTU 0.63 0.63 B
ILMN_2220283 150 ENSG00000157557 ETS2 0.64 0.63 B
ILMN_2221564 2309 ENSG00000104442 ARMC1 0.63 0.63 A
ILMN_2222880 394 ENSG00000105778 AVL9 0.71 0.63 B
ILMN_2222984 47 ENSG00000140526 ABHD2 0.59 0.63 B
ILMN_2223805 2251 ENSG00000166133 RPUSD2 0.67 0.63 A
ILMN_2224143 2302 ENSG00000198301 SDAD1 0.68 0.63 A
ILMN_2225709 2455 ENSG00000234882 0.65 0.63 A
ILMN_2228710 2204 ENSG00000101019 UQCC1 0.65 0.63 A
ILMN_2230624 50 ENSG00000079277 MKNK1 0.65 0.63 B
ILMN_2230672 2305 ENSG00000152454 ZNF256 0.65 0.63 A
ILMN_2231020 193 ENSG00000167566 NCKAP5L 0.62 0.63 B
ILMN_2231021 2411 ENSG00000142556 ZNF614 0.68 0.63 A
ILMN_2234229 2451 ENSG00000215788 TNFRSF25 0.72 0.63 A

ILMN_2235283 50 ENSG00000203710 CR1 0.65 0.63 B
ILMN_2243308 2451 ENSG00000172575 RASGRP1 0.72 0.63 A
ILMN_2243553 2106 ENSG00000101654 RNMT 0.63 0.63 A
ILMN_2246956 2301 ENSG00000103264 FBXO31 0.68 0.63 A
ILMN_2247594 2101 ENSG00000001497 LAS1L 0.61 0.63 A
ILMN_2252309 2103 ENSG00000159079 C21orf59 0.65 0.63 A
ILMN_2256359 2251 ENSG00000130193 THEM6 0.67 0.63 A
ILMN_2256765 248 ENSG00000132405 TBC1D14 0.62 0.63 B
ILMN_2258774 2451 ENSG00000125245 GPR18 0.72 0.63 A
ILMN_2261416 2251 ENSG00000132182 NUP210 0.67 0.63 A
ILMN_2262288 2451 ENSG00000171130 ATP6V0E2 0.72 0.63 A
ILMN_2263144 2252 ENSG00000099904 ZDHHC8 0.67 0.63 A
ILMN_2264011 2306 ENSG00000147654 EBAG9 0.64 0.63 A
ILMN_2274199 345 ENSG00000100379 KCTD17 0.65 0.63 B
ILMN_2286514 2056 ENSG00000135622 SEMA4F 0.62 0.63 A
ILMN_2286870 45 ENSG00000198933 TBKBP1 0.56 0.63 B
ILMN_2289849 245 ENSG00000188997 KCTD21 0.63 0.63 B
ILMN_2302716 2154 ENSG00000136444 RSAD1 0.65 0.63 A
ILMN_2307656 2203 ENSG00000163389 POGLUT1 0.66 0.63 A
ILMN_2309245 2207 ENSG00000186532 SMYD4 0.63 0.63 A
ILMN_2310589 2103 ENSG00000184162 NR2C2AP 0.65 0.63 A
ILMN_2311989 2306 ENSG00000205581 HMGN1 0.64 0.63 A
ILMN_2315569 2451 ENSG00000169508 GPR183 0.72 0.63 A
ILMN_2315979 2451 ENSG00000203896 LIME1 0.72 0.63 A
ILMN_2316540 2309 ENSG00000152382 TADA1 0.63 0.63 A
ILMN_2318725 2453 ENSG00000033867 SLC4A7 0.68 0.63 A
ILMN_2318733 49 ENSG00000110080 ST3GAL4 0.64 0.63 B
ILMN_2319344 2104 ENSG00000106246 PTCD1 0.64 0.63 A
ILMN_2319996 2451 ENSG00000213626 LBH 0.72 0.63 A
ILMN_2320250 196 ENSG00000069974 RAB27A 0.60 0.63 B
ILMN_2321634 2451 ENSG00000262599 KIAA1147 0.72 0.63 A
ILMN_2322935 2105 ENSG00000165689 SDCCAG3 0.64 0.63 A
ILMN_2323048 50 ENSG00000112062 MAPK14 0.65 0.63 B
ILMN_2323172 2451 ENSG00000167106 FAM102A 0.72 0.63 A
ILMN_2323633 2255 ENSG00000105176 URI1 0.65 0.63 A
ILMN_2325185 295 ENSG00000181481 RNF135 0.64 0.63 B
ILMN_2325506 2351 ENSG00000171603 CLSTN1 0.69 0.63 A
ILMN_2325837 2454 ENSG00000219470 0.66 0.63 A
ILMN_2328378 2455 ENSG00000245205 0.65 0.63 A
ILMN_2329735 95 ENSG00000214212 C19orf38 0.57 0.63 B
ILMN_2329773 195 ENSG00000188933 0.61 0.63 B
ILMN_2330267 2406 ENSG00000187472 0.63 0.63 A

ILMN_2330341 2152 ENSG00000260518 0.64 0.63 A
ILMN_2333319 2354 ENSG00000235508 0.66 0.64 A
ILMN_2335704 2052 ENSG00000233762 0.61 0.64 A
ILMN_2338116 343 ENSG00000172409 CLP1 0.71 0.64 B
ILMN_2338323 2154 ENSG00000120253 NUP43 0.65 0.64 A
ILMN_2338452 2303 ENSG00000276726 DNAJA3 0.67 0.64 A
ILMN_2339796 2451 ENSG00000107679 PLEKHA1 0.72 0.64 A
ILMN_2341363 2152 ENSG00000101158 NELFCD 0.64 0.64 A
ILMN_2341793 346 ENSG00000181192 DHTKD1 0.64 0.64 B
ILMN_2342066 2307 ENSG00000261208 0.64 0.64 A
ILMN_2343010 2451 ENSG00000104814 MAP4K1 0.72 0.64 A
ILMN_2343278 2455 ENSG00000164587 RPS14 0.65 0.64 A
ILMN_2344002 50 ENSG00000021355 SERPINB1 0.65 0.64 B
ILMN_2344204 2359 ENSG00000140367 UBE2Q2 0.64 0.64 A
ILMN_2345872 2251 ENSG00000140688 C16orf58 0.67 0.64 A
ILMN_2345898 2101 ENSG00000161999 JMJD8 0.61 0.64 A
ILMN_2347349 148 ENSG00000163625 WDFY3 0.62 0.64 B
ILMN_2355033 149 ENSG00000118217 ATF6 0.63 0.64 B
ILMN_2355665 2106 ENSG00000170468 RIOX1 0.63 0.64 A
ILMN_2356111 2453 ENSG00000142541 RPL13A 0.68 0.64 A
ILMN_2358202 2356 ENSG00000145919 BOD1 0.63 0.64 A
ILMN_2358382 2101 ENSG00000250479 CHCHD10 0.61 0.64 A
ILMN_2358540 2451 ENSG00000082512 TRAF5 0.72 0.64 A
ILMN_2358541 2154 ENSG00000134014 ELP3 0.65 0.64 A
ILMN_2360730 2061 ENSG00000130347 RTN4IP1 0.54 0.64 A
ILMN_2363058 2205 ENSG00000084463 WBP11 0.64 0.64 A
ILMN_2363065 2201 ENSG00000133030 MPRIP 0.66 0.64 A
ILMN_2364022 2256 ENSG00000198042 MAK16 0.64 0.64 A
ILMN_2364272 2053 ENSG00000106608 URGCP 0.64 0.64 A
ILMN_2365544 2451 ENSG00000099204 ABLIM1 0.72 0.64 A
ILMN_2366388 293 ENSG00000132906 CASP9 0.67 0.64 B
ILMN_2367782 92 ENSG00000235568 NFAM1 0.59 0.64 B
ILMN_2368292 2255 ENSG00000170846 0.65 0.64 A
ILMN_2368318 2451 ENSG00000169508 GPR183 0.72 0.64 A
ILMN_2369785 2453 ENSG00000105176 URI1 0.68 0.64 A
ILMN_2371280 2352 ENSG00000132199 ENOSF1 0.69 0.64 A
ILMN_2371964 50 ENSG00000111052 LIN7A 0.65 0.64 B
ILMN_2372082 192 ENSG00000181396 OGFOD3 0.64 0.64 B
ILMN_2372974 2461 ENSG00000225858 0.68 0.64 A
ILMN_2373566 2359 ENSG00000156469 MTERF3 0.64 0.64 A
ILMN_2374249 2159 ENSG00000128694 OSGEPL1 0.62 0.64 A
ILMN_2375418 197 ENSG00000058799 YIPF1 0.61 0.64 B

ILMN_2376455 2403 ENSG00000137054 POLR1E 0.68 0.64 A
ILMN_2376520 2454 ENSG00000197958 RPL12 0.66 0.64 A
ILMN_2377109 2451 ENSG00000134954 ETS1 0.72 0.64 A
ILMN_2379080 2054 ENSG00000166199 ALKBH3 0.63 0.64 A
ILMN_2379469 50 ENSG00000179299 NSUN7 0.65 0.64 B
ILMN_2380588 2201 ENSG00000115268 RPS15 0.66 0.64 A
ILMN_2380605 197 ENSG00000226479 TMEM185B 0.61 0.64 B
ILMN_2380740 2201 ENSG00000165271 NOL6 0.66 0.64 A
ILMN_2380850 2053 ENSG00000089248 ERP29 0.64 0.64 A
ILMN_2380967 150 ENSG00000153250 RBMS1 0.64 0.64 B
ILMN_2381397 147 ENSG00000105971 CAV2 0.60 0.64 B
ILMN_2383305 2451 ENSG00000182866 LCK 0.72 0.64 A
ILMN_2383871 248 ENSG00000236018 0.62 0.64 B
ILMN_2384536 394 ENSG00000283654 0.71 0.64 B
ILMN_2384591 2253 ENSG00000261914 0.66 0.64 A
ILMN_2385278 2203 ENSG00000279738 0.66 0.64 A
ILMN_2386818 2403 ENSG00000227694 0.68 0.64 A
ILMN_2387285 2051 ENSG00000161999 JMJD8 0.59 0.64 A
ILMN_2387742 294 ENSG00000268799 0.65 0.64 B
ILMN_2388090 2453 ENSG00000225093 0.68 0.64 A
ILMN_2388112 2454 ENSG00000241556 0.66 0.64 A
ILMN_2390453 2302 ENSG00000235581 0.68 0.64 A
ILMN_2391141 2404 ENSG00000236887 0.66 0.64 A
ILMN_2391765 2455 ENSG00000242262 0.65 0.64 A
ILMN_2392043 150 ENSG00000274536 0.64 0.64 B
ILMN_2392274 45 ENSG00000189077 TMEM120A 0.56 0.65 B
ILMN_2392352 50 ENSG00000182541 LIMK2 0.65 0.65 B
ILMN_2392546 2455 ENSG00000226836 0.65 0.65 A
ILMN_2393296 48 ENSG00000127948 POR 0.62 0.65 B
ILMN_2394210 2410 ENSG00000156976 EIF4A2 0.65 0.65 A
ILMN_2395214 2303 ENSG00000109971 HSPA8 0.67 0.65 A
ILMN_2395711 2254 ENSG00000213399 0.66 0.65 A
ILMN_2396287 2104 ENSG00000236515 ZBTB9 0.64 0.65 A
ILMN_2396648 2357 ENSG00000235552 0.63 0.65 A
ILMN_2397199 2410 ENSG00000111875 ASF1A 0.65 0.65 A
ILMN_2397521 343 ENSG00000114735 HEMK1 0.71 0.65 B
ILMN_2398489 296 ENSG00000149177 PTPRJ 0.63 0.65 B
ILMN_2399896 2451 ENSG00000065675 PRKCQ 0.72 0.65 A
ILMN_2401779 2109 ENSG00000101452 DHX35 0.61 0.65 A
ILMN_2401822 2302 ENSG00000177971 IMP3 0.68 0.65 A
ILMN_2402341 2360 ENSG00000198860 TSEN15 0.65 0.65 A
ILMN_2402936 2003 ENSG00000084652 TXLNA 0.63 0.65 A

ILMN_2403446 2254 ENSG00000143390 RFX5 0.66 0.65 A
ILMN_2404154 2201 ENSG00000103335 PIEZO1 0.66 0.65 A
ILMN_2404385 2253 ENSG00000090621 PABPC4 0.66 0.65 A
ILMN_2405797 2001 ENSG00000196365 LONP1 0.58 0.65 A
ILMN_2406892 2401 ENSG00000117090 SLAMF1 0.71 0.65 A
ILMN_2407529 250 ENSG00000136371 MTHFS 0.64 0.65 B
ILMN_2407589 345 ENSG00000093167 LRRFIP2 0.65 0.65 B
ILMN_2409596 2210 ENSG00000262327 AGK 0.60 0.65 A
ILMN_2410965 297 ENSG00000143797 MBOAT2 0.63 0.65 B
ILMN_2411190 1954 ENSG00000144524 COPS7B 0.61 0.65 A
ILMN_2411559 2003 ENSG00000100413 POLR3H 0.63 0.65 A
ILMN_2411723 2352 ENSG00000089094 KDM2B 0.69 0.65 A
ILMN_2412549 50 ENSG00000185031 0.65 0.65 B
ILMN_2412624 2052 ENSG00000101361 NOP56 0.61 0.65 A
ILMN_2413084 2304 ENSG00000204677 0.66 0.65 A
ILMN_2413278 147 ENSG00000182197 EXT1 0.60 0.65 B
ILMN_2415157 2209 ENSG00000127463 EMC1 0.62 0.65 A
ILMN_2415170 2404 ENSG00000255513 0.66 0.65 A
ILMN_2415179 200 ENSG00000135503 ACVR1B 0.64 0.65 B
ILMN_2415926 2451 ENSG00000154016 GRAP 0.72 0.65 A
ILMN_3176040 195 ENSG00000143369 ECM1 0.61 0.65 B
ILMN_3176790 2401 ENSG00000127334 DYRK2 0.71 0.65 A
ILMN_3177271 2151 ENSG00000106263 EIF3B 0.64 0.65 A
ILMN_3177285 2251 ENSG00000106608 URGCP 0.67 0.65 A
ILMN_3182275 45 ENSG00000175274 TP53I11 0.56 0.65 B
ILMN_3187648 2304 ENSG00000109971 HSPA8 0.66 0.65 A
ILMN_3187771 2203 ENSG00000238172 0.66 0.65 A
ILMN_3187852 2304 ENSG00000256356 0.66 0.65 A
ILMN_3191695 2303 ENSG00000215840 0.67 0.65 A
ILMN_3198247 2455 ENSG00000223668 0.65 0.65 A
ILMN_3199780 2255 ENSG00000166881 NEMP1 0.65 0.65 A
ILMN_3199798 2451 ENSG00000275342 PRAG1 0.72 0.65 A
ILMN_3199916 196 ENSG00000203496 0.60 0.65 B
ILMN_3200330 189 ENSG00000128011 LRFN1 0.61 0.65 B
ILMN_3200384 48 ENSG00000225450 0.62 0.65 B
ILMN_3201445 2258 ENSG00000233476 0.63 0.66 A
ILMN_3201480 2404 ENSG00000204677 0.66 0.66 A
ILMN_3201517 2454 ENSG00000230734 0.66 0.66 A
ILMN_3201843 50 ENSG00000159496 RGL4 0.65 0.66 B
ILMN_3202432 2052 ENSG00000235605 0.61 0.66 A
ILMN_3202673 346 ENSG00000112183 RBM24 0.64 0.66 B
ILMN_3204210 2303 ENSG00000234851 0.67 0.66 A

ILMN_3205162 244 ENSG00000153558 FBXL2 0.64 0.66 B
ILMN_3206343 2351 ENSG00000110651 CD81 0.69 0.66 A
ILMN_3207060 2455 ENSG00000218426 0.65 0.66 A
ILMN_3207933 50 ENSG00000178814 OPLAH 0.65 0.66 B
ILMN_3208056 2351 ENSG00000104964 AES 0.69 0.66 A
ILMN_3208973 2152 ENSG00000179051 RCC2 0.64 0.66 A
ILMN_3210917 2401 ENSG00000198286 CARD11 0.71 0.66 A
ILMN_3211079 2309 ENSG00000087448 KLHL42 0.63 0.66 A
ILMN_3211132 194 ENSG00000105514 RAB3D 0.61 0.66 B
ILMN_3211935 150 ENSG00000173597 SULT1B1 0.64 0.66 B
ILMN_3212833 42 ENSG00000112561 TFEB 0.57 0.66 B
ILMN_3213568 45 ENSG00000130775 THEMIS2 0.56 0.66 B
ILMN_3213573 2059 ENSG00000204628 RACK1 0.60 0.66 A
ILMN_3213792 2202 ENSG00000167658 EEF2 0.66 0.66 A
ILMN_3216125 2252 ENSG00000130299 GTPBP3 0.67 0.66 A
ILMN_3217172 2401 ENSG00000073849 ST6GAL1 0.71 0.66 A
ILMN_3217815 2303 ENSG00000018699 TTC27 0.67 0.66 A
ILMN_3218248 2001 ENSG00000110107 PRPF19 0.58 0.66 A
ILMN_3218820 195 ENSG00000117115 PADI2 0.61 0.66 B
ILMN_3219340 2258 ENSG00000164828 SUN1 0.63 0.66 A
ILMN_3219643 2301 ENSG00000186918 ZNF395 0.68 0.66 A
ILMN_3220792 148 ENSG00000070540 WIPI1 0.62 0.66 B
ILMN_3220888 2301 ENSG00000135736 CCDC102A 0.68 0.66 A
ILMN_3222402 50 ENSG00000183696 UPP1 0.65 0.66 B
ILMN_3223798 2401 ENSG00000131378 RFTN1 0.71 0.66 A
ILMN_3224907 2001 ENSG00000198917 SPOUT1 0.58 0.66 A
ILMN_3224952 296 ENSG00000143337 TOR1AIP1 0.63 0.66 B
ILMN_3225505 50 ENSG00000187037 GPR141 0.65 0.66 B
ILMN_3225591 2104 ENSG00000133422 MORC2 0.64 0.66 A
ILMN_3225784 2453 ENSG00000226360 0.68 0.66 A
ILMN_3225941 2306 ENSG00000137818 RPLP1 0.64 0.66 A
ILMN_3226291 2155 ENSG00000181191 PJA1 0.64 0.66 A
ILMN_3226392 46 ENSG00000140526 ABHD2 0.57 0.66 B
ILMN_3226807 2303 ENSG00000227337 0.67 0.66 A
ILMN_3227732 2355 ENSG00000229638 0.65 0.66 A
ILMN_3227811 2454 ENSG00000234618 0.66 0.66 A
ILMN_3227912 2354 ENSG00000213988 ZNF90 0.66 0.66 A
ILMN_3228294 2054 ENSG00000204946 ZNF783 0.63 0.66 A
ILMN_3228822 2454 ENSG00000229638 0.66 0.66 A
ILMN_3229210 2055 ENSG00000101442 ACTR5 0.63 0.67 A
ILMN_3231390 2204 ENSG00000069998 HDHD5 0.65 0.67 A
ILMN_3231621 46 ENSG00000085117 CD82 0.57 0.67 B

ILMN_3232573 2306 ENSG00000144713 RPL32 0.64 0.67 A
ILMN_3233442 150 ENSG00000005302 MSL3 0.64 0.67 B
ILMN_3234436 293 ENSG00000141480 ARRB2 0.67 0.67 B
ILMN_3234841 2455 ENSG00000213885 0.65 0.67 A
ILMN_3235065 2352 ENSG00000105373 NOP53 0.69 0.67 A
ILMN_3235113 50 ENSG00000134243 SORT1 0.65 0.67 B
ILMN_3235148 2360 ENSG00000145293 ENOPH1 0.65 0.67 A
ILMN_3236061 2455 ENSG00000166441 RPL27A 0.65 0.67 A
ILMN_3236156 2204 ENSG00000099810 MTAP 0.65 0.67 A
ILMN_3236259 2051 ENSG00000105135 ILVBL 0.59 0.67 A
ILMN_3236556 2351 ENSG00000130816 DNMT1 0.69 0.67 A
ILMN_3236713 2303 ENSG00000198721 ECI2 0.67 0.67 A
ILMN_3236945 294 ENSG00000102125 TAZ 0.65 0.67 B
ILMN_3237926 50 ENSG00000102010 BMX 0.65 0.67 B
ILMN_3237977 2105 ENSG00000123064 DDX54 0.64 0.67 A
ILMN_3238001 200 ENSG00000064999 ANKS1A 0.64 0.67 B
ILMN_3238623 2252 ENSG00000115053 NCL 0.67 0.67 A
ILMN_3238818 94 ENSG00000060069 CTDP1 0.57 0.67 B
ILMN_3238931 46 ENSG00000196843 ARID5A 0.57 0.67 B
ILMN_3239181 394 ENSG00000105778 AVL9 0.71 0.67 B
ILMN_3239629 2454 ENSG00000230592 0.66 0.67 A
ILMN_3239925 2455 ENSG00000249264 0.65 0.67 A
ILMN_3240222 2454 ENSG00000226624 0.66 0.67 A
ILMN_3240538 2253 ENSG00000243964 0.66 0.67 A
ILMN_3240740 2207 ENSG00000277149 TYW1B 0.63 0.67 A
ILMN_3240838 200 ENSG00000058063 ATP11B 0.64 0.68 B
ILMN_3241051 50 ENSG00000170525 PFKFB3 0.65 0.68 B
ILMN_3241169 148 ENSG00000163625 WDFY3 0.62 0.68 B
ILMN_3241234 95 ENSG00000117298 ECE1 0.57 0.68 B
ILMN_3241257 2309 ENSG00000149474 KAT14 0.63 0.68 A
ILMN_3241462 2403 ENSG00000244485 0.68 0.68 A
ILMN_3242459 2451 ENSG00000179715 PCED1B 0.72 0.68 A
ILMN_3243297 2058 ENSG00000105643 ARRDC2 0.61 0.68 A
ILMN_3243441 2451 ENSG00000100100 PIK3IP1 0.72 0.68 A
ILMN_3243471 2451 ENSG00000196405 EVL 0.72 0.68 A
ILMN_3243700 50 ENSG00000204634 TBC1D8 0.65 0.68 B
ILMN_3243705 2354 ENSG00000255513 0.66 0.68 A
ILMN_3243943 2454 ENSG00000174748 RPL15 0.66 0.68 A
ILMN_3244117 2102 ENSG00000206490 ABCF1 0.62 0.68 A
ILMN_3244526 100 ENSG00000179299 NSUN7 0.65 0.68 B
ILMN_3244583 293 ENSG00000158517 NCF1 0.67 0.68 B
ILMN_3245000 2353 ENSG00000102409 BEX4 0.67 0.68 A

ILMN_3245625 2451 ENSG00000118971 CCND2 0.72 0.68 A
ILMN_3245973 2301 ENSG00000181035 SLC25A42 0.68 0.68 A
ILMN_3246292 2454 ENSG00000254772 EEF1G 0.66 0.68 A
ILMN_3247006 2354 ENSG00000178464 0.66 0.68 A
ILMN_3247064 294 ENSG00000213148 0.65 0.68 B
ILMN_3247569 2154 ENSG00000234080 0.65 0.68 A
ILMN_3247597 2356 ENSG00000250144 0.63 0.68 A
ILMN_3247645 2204 ENSG00000244073 0.65 0.68 A
ILMN_3247653 2253 ENSG00000233927 RPS28 0.66 0.69 A
ILMN_3248412 2205 ENSG00000105193 RPS16 0.64 0.69 A
ILMN_3248443 2355 ENSG00000136149 0.65 0.69 A
ILMN_3248882 2255 ENSG00000175886 0.65 0.69 A
ILMN_3249366 2205 ENSG00000241741 0.64 0.69 A
ILMN_3250585 2306 ENSG00000162980 ARL5A 0.64 0.69 A
ILMN_3251085 2451 ENSG00000116824 CD2 0.72 0.69 A
ILMN_3251472 344 ENSG00000186866 POFUT2 0.67 0.69 B
ILMN_3251526 2451 ENSG00000162894 FCMR 0.72 0.69 A
ILMN_3251658 100 ENSG00000157152 SYN2 0.65 0.69 B
ILMN_3251737 2451 ENSG00000138378 STAT4 0.72 0.69 A
ILMN_3256712 2451 ENSG00000122224 LY9 0.72 0.69 A
ILMN_3260070 2451 ENSG00000269469 0.72 0.69 A
ILMN_3262025 2153 ENSG00000105640 RPL18A 0.65 0.69 A
ILMN_3263329 2153 ENSG00000105640 RPL18A 0.65 0.69 A
ILMN_3263918 2103 ENSG00000063177 RPL18 0.65 0.69 A
ILMN_3265365 2007 ENSG00000232573 0.60 0.69 A
ILMN_3266128 2304 ENSG00000233913 0.66 0.69 A
ILMN_3266197 2203 ENSG00000240489 0.66 0.69 A
ILMN_3266482 2455 ENSG00000242262 0.65 0.69 A
ILMN_3266666 199 ENSG00000005302 MSL3 0.63 0.69 B
ILMN_3267760 445 ENSG00000104763 ASAH1 0.72 0.70 B
ILMN_3268028 2104 ENSG00000227097 0.64 0.70 A
ILMN_3268914 2457 ENSG00000249617 0.62 0.70 A
ILMN_3269719 2451 ENSG00000134954 ETS1 0.72 0.70 A
ILMN_3274601 2451 ENSG00000168685 IL7R 0.72 0.70 A
ILMN_3275696 248 ENSG00000139132 FGD4 0.62 0.70 B
ILMN_3276794 2451 ENSG00000153283 CD96 0.72 0.70 A
ILMN_3276990 2254 ENSG00000243927 MRPS6 0.66 0.70 A
ILMN_3278627 148 ENSG00000115590 IL1R2 0.62 0.70 B
ILMN_3278906 249 ENSG00000121964 GTDC1 0.63 0.70 B
ILMN_3279017 50 ENSG00000116717 GADD45A 0.65 0.70 B
ILMN_3279675 2009 ENSG00000140988 RPS2 0.59 0.70 A
ILMN_3280019 2103 ENSG00000105640 RPL18A 0.65 0.70 A

ILMN_3280952 344 ENSG00000163812 ZDHHC3 0.67 0.70 B
ILMN_3281599 2307 ENSG00000214199 0.64 0.70 A
ILMN_3282174 2454 ENSG00000236552 RPL13AP5 0.66 0.70 A
ILMN_3282436 2454 ENSG00000236552 RPL13AP5 0.66 0.70 A
ILMN_3283155 150 ENSG00000187037 GPR141 0.64 0.71 B
ILMN_3283573 2451 ENSG00000127152 BCL11B 0.72 0.71 A
ILMN_3283680 2451 ENSG00000010810 FYN 0.72 0.71 A
ILMN_3284119 2451 ENSG00000113263 ITK 0.72 0.71 A
ILMN_3284584 2401 ENSG00000135905 DOCK10 0.71 0.71 A
ILMN_3285198 2451 ENSG00000013725 CD6 0.72 0.71 A
ILMN_3287068 2451 ENSG00000139626 ITGB7 0.72 0.71 A
ILMN_3289100 247 ENSG00000198585 NUDT16 0.61 0.71 B
ILMN_3289650 2151 ENSG00000148832 PAOX 0.64 0.71 A
ILMN_3290385 344 ENSG00000233050 LOC100133267 0.67 0.71 B
ILMN_3291098 2154 ENSG00000058600 LOC101060521 0.65 0.71 A
ILMN_3293049 2204 ENSG00000243964 0.65 0.71 A
ILMN_3294335 2451 ENSG00000138795 LEF1 0.72 0.72 A
ILMN_3295847 2305 ENSG00000230076 0.65 0.72 A
ILMN_3297317 2104 ENSG00000237039 0.64 0.72 A
ILMN_3297996 197 ENSG00000215244 LOC399715 0.61 0.72 B
ILMN_3298037 200 ENSG00000115594 IL1R1 0.64 0.72 B
ILMN_3298694 200 ENSG00000184005 ST6GALNAC3 0.64 0.72 B
ILMN_3299407 2451 ENSG00000167286 CD3D 0.72 0.72 A
ILMN_3299478 2451 ENSG00000167286 CD3D 0.72 0.72 A
ILMN_3300358 2405 ENSG00000228502 0.65 0.72 A
ILMN_3300972 2405 ENSG00000214199 0.65 0.73 A
ILMN_3301440 50 ENSG00000123836 PFKFB2 0.65 0.73 B
ILMN_3302177 50 ENSG00000163803 PLB1 0.65 0.73 B
ILMN_3304022 2110 ENSG00000060688 SNRNP40 0.59 0.73 A
ILMN_3304519 2451 ENSG00000107742 SPOCK2 0.72 0.74 A
ILMN_3305339 150 ENSG00000180509 KCNE1B 0.64 0.74 B
ILMN_3305949 2401 ENSG00000118971 CCND2 0.71 0.76 A
ILMN_3306215 2307 ENSG00000156508 EEF1A1 0.64 0.76 A
ILMN_3306440 2356 ENSG00000156508 EEF1A1 0.63 0.77 A
ILMN_3306997 2451 ENSG00000141293 SKAP1 0.72 0.78 A
ILMN_3307772 2451 ENSG00000196329 GIMAP5 0.72 0.78 A
ILMN_3307791 2153 ENSG00000060688 SNRNP40 0.65 0.79 A
ILMN_3308936 347 ENSG00000120738 EGR1 0.64 0.80 B

Bibliography

[1] Simone M. C. Spoorenberg et al. “Microbial aetiology, outcomes, and costs of hospitalisation for community-acquired pneumonia; an observational analysis”. In: BMC infectious diseases 14 (2014), p. 335. ISSN: 1471-2334. DOI: 10 . 1186 / 1471-2334-14-335. [2] M. Nawal Lutfiyya et al. “Diagnosis and treatment of community-acquired pneu- monia”. In: American family physician 73.3 (2006), pp. 442–450.

[3] WHO, ed. Mortality Database. 1979. URL: http://apps.who.int/healthinfo/statistics/mortality/whodpms/.

[4] World Health Organization. International Statistical Classification of Diseases and Related Health Problems 10th Revision. 2016. URL: https://www.cdc.gov/nchs/icd/icd10.htm.

[5] Alberto Capelastegui et al. “Etiology of community-acquired pneumonia in a population-based study: Link between etiology and patients characteristics, process-of-care, clinical evolution and outcomes”. In: BMC infectious diseases 12 (2012), p. 134. ISSN: 1471-2334. DOI: 10.1186/1471-2334-12-134.

[6] M. Ruiz et al. “Etiology of community-acquired pneumonia: Impact of age, comorbidity, and severity”. In: American journal of respiratory and critical care medicine 160.2 (1999), pp. 397–405. DOI: 10.1164/ajrccm.160.2.9808045.

[7] Mirjam Christ-Crain and Steven M. Opal. “Clinical review: The role of biomarkers in the diagnosis and management of community-acquired pneumonia”. In: Critical care (London, England) 14.1 (2010), p. 203. DOI: 10.1186/cc8155.

[8] Richard G. Wunderink and Grant W. Waterer. “Genetics of community-acquired pneumonia”. In: Seminars in respiratory and critical care medicine 26.6 (2005), pp. 553–562. ISSN: 1069-3424. DOI: 10.1055/s-2005-925522.

[9] J. M. Ferrer Agüero et al. “Community acquired pneumonia: Genetic variants influencing systemic inflammation”. In: Medicina Intensiva (English Edition) 38.5 (2014), pp. 315–323. ISSN: 2173-5727. DOI: 10.1016/j.medine.2013.08.001.

[10] Peter Ahnert et al. “PROGRESS - prospective observational study on hospitalized community acquired pneumonia”. In: BMC pulmonary medicine 16.1 (2016), p. 108. ISSN: 1471-2466. DOI: 10.1186/s12890-016-0255-8.

[11] Petra Creutz and Peter Ahnert. Study of Progression of Community Acquired Pneumonia in the Hospital (PROGRESS). 2016. URL: https://www.clinicaltrials.gov/ct2/show/study/NCT02782013.

[12] Antonino Gullo, ed. Anaesthesia, Pain, Intensive Care and Emergency Medicine — A.P.I.C.E. Milano: Springer Milan, 2003. ISBN: 978-88-470-0194-7. DOI: 10.1007/978-88-470-2215-7.

[13] J. L. Vincent et al. “The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine”. In: Intensive care medicine 22.7 (1996), pp. 707–710. ISSN: 0342-4642.

[14] Henry Löffler-Wirth, Martin Kalcher, and Hans Binder. “oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor”. In: Bioinformatics (Oxford, England) 31.19 (2015), pp. 3225–3227. ISSN: 1367-4811. DOI: 10.1093/bioinformatics/btv342.

[15] Wolfgang Huber et al. “Orchestrating high-throughput genomic analysis with Bioconductor”. In: Nature methods 12.2 (2015), pp. 115–121. ISSN: 1548-7105. DOI: 10.1038/nmeth.3252.

[16] Teuvo Kohonen. “Self-organized formation of topologically correct feature maps”. In: Biological Cybernetics 43.1 (1982), pp. 59–69. ISSN: 0340-1200. DOI: 10.1007/BF00337288.

[17] Henry Wirth. “Analysis of large-scale molecular biological data using self-organizing maps”. Dissertation. Universität Leipzig, 2012. URL: http://www.qucosa.de/fileadmin/data/qucosa/documents/10129/Dissertation%20Henry%20Wirth.pdf.

[18] Michael A. Mooney and Beth Wilmot. “Gene set analysis: A step-by-step guide”. In: American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 168.7 (2015), pp. 517–527. DOI: 10.1002/ajmg.b.32328.

[19] Damien Chaussabel et al. “A modular analysis framework for blood genomics studies: Application to systemic lupus erythematosus”. In: Immunity 29.1 (2008), pp. 150–164. ISSN: 1097-4180. DOI: 10.1016/j.immuni.2008.05.012.

[20] Dmitry R. Bandura et al. “Mass cytometry: Technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry”. In: Analytical chemistry 81.16 (2009), pp. 6813–6822. ISSN: 1520-6882. DOI: 10.1021/ac901049w.

[21] John Stingl et al. “Purification and unique properties of mammary epithelial stem cells”. In: Nature 439.7079 (2006), pp. 993–997. ISSN: 1476-4687. DOI: 10.1038/nature04496.

[22] Laleh Haghverdi, Florian Buettner, and Fabian J. Theis. “Diffusion maps for high-dimensional single-cell analysis of differentiation data”. In: Bioinformatics (Oxford, England) 31.18 (2015), pp. 2989–2998. ISSN: 1367-4811. DOI: 10.1093/bioinformatics/btv325.

[23] Sean C. Bendall et al. “Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development”. In: Cell 157.3 (2014), pp. 714–725. ISSN: 1097-4172. DOI: 10.1016/j.cell.2014.04.005.

[24] Motonari Kondo. “Lymphoid and myeloid lineage commitment in multipotent hematopoietic progenitors”. In: Immunological reviews 238.1 (2010), pp. 37–46. DOI: 10.1111/j.1600-065X.2010.00963.x.

[25] Y. Li et al. “Extension of the Peters-Belson method to estimate health disparities among multiple groups using logistic regression with survey data”. In: Statistics in medicine 34.4 (2015), pp. 595–612. ISSN: 1097-0258. DOI: 10.1002/sim.6357.

[26] L. Zhao, Y. Chen, and D. W. Schaffner. “Comparison of logistic regression and linear regression in modeling percentage data”. In: Applied and environmental microbiology 67.5 (2001), pp. 2129–2135. ISSN: 0099-2240. DOI: 10.1128/AEM.67.5.2129-2135.2001.

[27] Thomas A. Domencich and Daniel McFadden. Urban travel demand: A behavioral analysis. Vol. 93. Contributions to economic analysis. Amsterdam: North-Holland Publ. Co, 1975. ISBN: 0444108300.

[28] David A. Hensher and Peter R. Stopher, eds. Behavioural travel modelling. London: Croom Helm, 1979. ISBN: 9780856648199.

[29] Charles Janeway. Immunobiology: The immune system in health and disease ; [animated CD-ROM inside]. 5. ed. New York, NY: Garland Publ, 2001. ISBN: 978-0815336426. URL: http://www.ncbi.nlm.nih.gov:80/books/bv.fcgi?call=bv.View..ShowSection&rid=imm.

[30] C. Lundby et al. “Heart rate response to hypoxic exercise: Role of dopamine D2-receptors and effect of oxygen supplementation”. In: Clinical science (London, England : 1979) 101.4 (2001), pp. 377–383. ISSN: 0143-5221.

[31] Mar Ariza et al. “Influence of extraneurological insults on ventricular enlargement and neuropsychological functioning after moderate and severe traumatic brain injury”. In: Journal of neurotrauma 21.7 (2004), pp. 864–876. ISSN: 0897-7151. DOI: 10.1089/0897715041526203.

[32] M. Rebhan. “GeneCards: Integrating information about genes, proteins and diseases”. In: Trends in Genetics 13.4 (1997), p. 163. ISSN: 0168-9525. DOI: 10.1016/S0168-9525(97)01103-7.

[33] Siamon Gordon. “Phagocytosis: An Immunobiologic Process”. In: Immunity 44.3 (2016), pp. 463–475. ISSN: 1097-4180. DOI: 10.1016/j.immuni.2016.02.026.

[34] Gretchen S. Selders et al. “An overview of the role of neutrophils in innate immunity, inflammation and host-biomaterial integration”. In: Regenerative biomaterials 4.1 (2017), pp. 55–68. ISSN: 2056-3418. DOI: 10.1093/rb/rbw041.

[35] M. J. Kaplan and M. Radic. NETosis: At the Intersection of Cell Biology, Microbiology, and Immunology. Frontiers Research Topics. Frontiers E-books, 2013. ISBN: 9782889191581. URL: https://books.google.de/books?id=F233OULwmeMC.

[36] C. Dahlgren and A. Karlsson. “Respiratory burst in human neutrophils”. In: Journal of immunological methods 232.1-2 (1999), pp. 3–14. ISSN: 0022-1759.

[37] Nicole M. Martinez et al. “ networks regulated by signaling in human T cells”. In: RNA (New York, N.Y.) 18.5 (2012), pp. 1029–1040. ISSN: 1469-9001. DOI: 10.1261/.032243.112.

[38] K. Agematsu. “Memory B cells and CD27”. In: Histology and histopathology 15.2 (2000), pp. 573–576. ISSN: 0213-3911.

[39] B. Py et al. “Siva-1 and an Alternative Splice Form Lacking the Death Domain, Siva-2, Similarly Induce Apoptosis in T Lymphocytes via a Caspase-Dependent Mitochondrial Pathway”. In: The Journal of Immunology 172.7 (2004), pp. 4008–4017. ISSN: 0022-1767. DOI: 10.4049/jimmunol.172.7.4008.

[40] L. Bross et al. “DNA double-strand breaks in immunoglobulin genes undergoing somatic hypermutation”. In: Immunity 13.5 (2000), pp. 589–597. ISSN: 1097-4180.

[41] L. R. Ballou et al. “Ceramide signalling and the immune response”. In: Biochimica et biophysica acta 1301.3 (1996), pp. 273–287. ISSN: 0006-3002.

Declaration

I hereby declare that I have written this thesis independently, have used no sources or aids other than those stated, and have not previously submitted the thesis elsewhere for examination purposes.

Passages taken verbatim or in substance from sources are marked as such.

Mittweida, 02.01.2018
