Identification of key genes related to the mechanism and prognosis of lung squamous cell carcinoma using bioinformatics analysis

Miaomiao Gaoa,#, Weikaixin Kongb,#, Zhuo Huangb,* and Zhengwei Xiea,*

# These authors contributed equally to this work

a Department of Pharmacology and Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, 38 Xueyuan Lu, Haidian district, Beijing 100191, China

b Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, 38 Xueyuan Lu, Haidian district, Beijing 100191, China

* To whom correspondence should be addressed.

Zhengwei Xie

Tel: 86-10-82802798; Fax: 86-10-82802798; Email: [email protected].

Zhuo Huang

E-mail: [email protected]

Abstract

Objectives

Lung squamous cell carcinoma (LUSC) is often diagnosed at an advanced stage and has a poor prognosis. The mechanisms of its pathogenesis and prognosis require urgent elucidation. This study was performed to screen potential biomarkers related to the occurrence, development and prognosis of LUSC and to reveal unknown physiological and pathological processes.

Materials and Methods

Using bioinformatics analysis, lung squamous cell carcinoma expression datasets from the GEO and TCGA databases were analyzed to identify differentially expressed genes (DEGs). Furthermore, PPI and WGCNA network analyses were integrated to identify the key genes closely related to the process of LUSC development. In addition, survival analysis was performed to build a prognostic model with a high level of prediction accuracy.

Results and Conclusion

Eighty-five up-regulated and 39 down-regulated genes were identified, on which functional and pathway enrichment analysis was conducted. GO analysis demonstrated that up-regulated genes were principally enriched in epidermal development and DNA unwinding in DNA replication. Down-regulated genes were mainly involved in cell adhesion, signal transduction and positive regulation of inflammatory response. After PPI and WGCNA network analysis, eight genes (AURKA, RAD51, TTK, AURKB, CCNA2, TPX2, KPNA2 and KIF23) were found to play a vital role in LUSC development. The prognostic model contained 20 genes, 18 of which were detrimental to prognosis. The AUC of the established prognostic model for predicting the survival of patients at 1, 3, and 5 years was 0.828, 0.826 and 0.824, respectively. To conclude, this study identified a number of biomarkers of significant interest for additional investigation of the therapies and methods of prognosis of lung squamous cell carcinoma.

Keywords: lung squamous carcinoma, bioinformatics, prognosis

1. Introduction

Lung cancer is among the deadliest malignancies globally and can be categorized into two main types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC, accounting for approximately 85% of lung cancer cases, is principally divided into lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) depending on pathogenesis and histological morphology [1-3]. Many therapeutic methods are currently used to treat lung cancer, such as surgical resection, chemotherapy, radiotherapy and targeted therapy. Because NSCLC is insensitive to radiotherapy and resistant to anticancer drugs, targeted therapy has become the most effective way to treat patients whose tumors cannot be surgically removed [4,5]. In recent years, targeted therapies have undergone considerable development, and some effective molecular targets have been identified, such as epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK). Therapies against these targets have proved successful in lung adenocarcinoma but not in lung squamous cell carcinoma, because the two major subtypes have different mutation profiles [6-8]. In order to better diagnose and treat patients with LUSC, an investigation of novel biomarkers is required. LUSC, which accounts for 30% of cases of NSCLC, is more common in middle-aged and older men and has a high rate of metastasis and recurrence [9]. NSCLC patients are mostly diagnosed at an advanced stage, and their 5-year survival rate is lower than that of early-stage patients [4,10,11]. Thus, further study of prognostic markers is required so as to personalize cancer treatment. Although a number of studies have reported on LUSC-related genes and prognostic markers, the specific molecular mechanisms in the pathogenesis and progression of LUSC have not been systematically evaluated, which constrains the potential for early diagnosis and treatment [12,13]. An in-depth understanding of the molecular mechanisms involved in the occurrence and development of lung cancer may provide a more effective strategy for early detection and subsequent clinical treatment. Therefore, novel promising biomarkers or potential drug treatments are urgently required.

Genomic microarrays and high-throughput sequencing technology combined with bioinformatics analysis have gradually become a powerful tool for the discovery of disease biomarkers and related pathways. In the present study, three RNA microarray datasets from the GEO database were analyzed to identify differentially expressed genes (DEGs) in lung squamous cell carcinoma compared with normal tissues. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to identify key genes and pathways that possibly influence the pathogenesis of LUSC. In parallel, RNAseq data from the TCGA database were used to screen DEGs for weighted gene co-expression network analysis (WGCNA) so as to obtain hub genes associated with the occurrence and development of LUSC. Finally, the key genes shared by the analyses of the two databases were taken as genes critical to the disease. In addition, we combined patient clinical information with the DEGs from TCGA in univariate Cox regression analysis, lasso regression analysis and multivariate Cox regression analysis to identify prognosis-related key genes. Together, these analyses were intended to improve understanding of the mechanism and prognosis of lung squamous cell carcinoma.

2. Material and methods

2.1 Data Collection and data processing

The Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database from the National Center for Biotechnology Information was searched for publicly available studies and samples that fulfilled the following criteria for analysis: (1) the gene expression data series contained LUSC tissue and normal tissue samples; (2) the species of the samples was Homo sapiens. Three gene expression profiles (GSE2088, GSE6044, GSE19188) were collected for further analysis [14]. These three datasets were downloaded, and their details are shown in Sup.Table 1. GEO2R was used to calculate the fold change and P-value of every gene in each dataset. Any gene for which |log2FC| > 1 and P-value < 0.05 was defined as a differentially expressed gene [15]. If multiple probes corresponded to the same gene, the maximum fold change was taken as the value for that gene.
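The selection rule described above can be sketched in Python as follows. This is a minimal illustration rather than the GEO2R workflow itself: the function name `select_degs`, the input layout and the gene labels are hypothetical, and "maximum fold change" is interpreted here as the probe with the largest absolute log2FC, which is an assumption.

```python
from collections import defaultdict


def select_degs(probe_rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Pick DEGs from probe-level results.

    probe_rows: iterable of (gene, log2fc, p_value), one entry per probe.
    Returns a dict mapping each DEG to "up" or "down".
    """
    by_gene = defaultdict(list)
    for gene, log2fc, p_value in probe_rows:
        by_gene[gene].append((log2fc, p_value))

    degs = {}
    for gene, probes in by_gene.items():
        # Assumption: collapse probes by keeping the largest |log2FC|.
        log2fc, p_value = max(probes, key=lambda row: abs(row[0]))
        # Thresholds from the text: |log2FC| > 1 and P-value < 0.05.
        if abs(log2fc) > lfc_cutoff and p_value < p_cutoff:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


# Hypothetical probe-level rows: (gene, log2FC, P-value).
rows = [
    ("GENE_A", 0.4, 0.2),
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.5, 0.01),
    ("GENE_C", 3.0, 0.2),
    ("GENE_D", 1.0, 0.001),
]
print(select_degs(rows))
```

Both thresholds are strict inequalities, so a gene sitting exactly on a cutoff (GENE_D here) is not selected.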

2.2 TCGA data validation

To ensure accuracy, further validation analysis was performed to confirm the results. LUSC RNAseq data, acquired using the Illumina HiSeq platform, were downloaded from The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga) database. The dataset included 502 LUSC samples and 49 normal samples. The expression data were normalized using the FPKM method [16]. For genes with duplicate entries, the largest value was used as the expression level. Genes with a mean expression of less than 0.5 were excluded so that only appreciably expressed genes were evaluated. P-values for differential expression were calculated using the Wilcoxon test. As with the GEO data, DEGs in the LUSC samples compared with normal samples were identified. Log2FC values were calculated, and the raw P-values obtained from the Wilcoxon test were adjusted using the Benjamini-Hochberg (BH) method [17]. Thresholds of |log2FC| > 1.0 and an adjusted P-value of < 0.05 were selected. The process above was conducted using the R package "limma".
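For readers unfamiliar with the BH (Benjamini-Hochberg) adjustment mentioned above, the following Python sketch shows the standard step-up procedure. It is an illustration only; the function name `bh_adjust` is hypothetical, and the authors' own computation was done in R.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

A gene would then be retained when its adjusted P-value is below 0.05 and |log2FC| exceeds 1.0, matching the thresholds stated in the text.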

2.3 Functional and pathway enrichment analysis

The GO database has a large collection of gene annotation terms, allowing genome annotation using consistent terminology. GO enrichment analysis, covering molecular function, cellular component and biological process, identifies which GO terms are over- or under-represented within a given set of genes [18]. The KEGG knowledge database, an integrated database resource, is generally used to identify functional and metabolic pathways [19]. GO and KEGG analyses were both conducted using the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/), a set of analysis tools for extracting meaningful biological information from multiple gene and protein collections [20,21]. Up- and down-regulated genes were analyzed separately, with a P-value < 0.05 considered the threshold value.
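The idea of over-representation can be illustrated with a one-sided hypergeometric test. Note that this is a swapped-in simplification: DAVID reports a modified Fisher exact P-value (the EASE score), which is more conservative than the plain test sketched here. The function name `enrichment_p` and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the submitted list, k: list genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# Toy example: 10 background genes, 3 annotated, 4 in the list, 2 hits.
print(enrichment_p(k=2, n=4, K=3, N=10))
```

A term would be called enriched when this P-value falls below the chosen threshold (0.05 in the text).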

2.4 PPI network construction and analysis of modules

Protein-protein interaction (PPI) networks can assist in identifying key genes and pivotal gene modules involved in the development of LUSC at the interaction level. The Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/) was used to construct PPI networks. STRING is a database which provides critical assessment and integration of protein–protein interactions, including physical and functional associations [22]. DEGs were mapped to STRING to evaluate the PPI information, with a confidence score > 0.4 set as the cut-off. The PPI network was visualized using Cytoscape, a practical open-source software tool for the visual exploration of biomolecule interaction networks consisting of protein, gene and other types of interaction [23]. Module analysis was conducted using the plug-in Molecular Complex Detection (MCODE) in Cytoscape to display the biological significance of gene modules [24].
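Ranking genes by their degree of connectivity in a PPI network, after applying the confidence cut-off described above, can be sketched as follows. The edge-list layout, the function name `hub_genes` and the node labels are hypothetical; the authors performed this step in STRING and Cytoscape.

```python
from collections import Counter


def hub_genes(edges, min_score=0.4, top_n=15):
    """Rank nodes by degree after filtering edges on confidence score.

    edges: iterable of (gene_a, gene_b, score) with score in [0, 1].
    Returns the top_n (gene, degree) pairs, highest degree first.
    """
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        # Skip self-loops, low-confidence edges and duplicate pairs.
        if a == b or score <= min_score or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties are stable.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


toy_edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.7),
    ("B", "A", 0.9),  # duplicate of A-B, ignored
    ("C", "D", 0.3),  # below the 0.4 cut-off, ignored
    ("A", "D", 0.5),
]
print(hub_genes(toy_edges, top_n=2))
```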

2.5 Weighted correlation network analysis of DEGs

In the WGCNA algorithm [25], the correlation coefficients between genes are raised to a power to give a weighted measure of the connection between genes. The power value β was selected such that the connections between genes followed a scale-free network distribution. A topological overlap matrix, which incorporates information on neighboring genes, was then calculated and converted into a distance d:

$$S_{ij} = \mathrm{cor}(i, j)$$

$$\alpha_{ij} = |S_{ij}|^{\beta}$$

$$\omega_{ij} = \frac{l_{ij} + \alpha_{ij}}{\min(k_i, k_j) + 1 - \alpha_{ij}}$$

$$d_{ij} = 1 - \omega_{ij}$$

where $l_{ij} = \sum_u \alpha_{iu}\alpha_{uj}$, $k_i = \sum_u \alpha_{iu}$ represents node connectivity, and $\omega_{ij}$ is the topological overlap in its standard WGCNA form. The samples were first clustered using hierarchical clustering, with the threshold set to 40,000 to eliminate outliers. d was then calculated, and the dynamic pruning method was applied after setting the gene module parameters (maxBlockSize = 7000, deepSplit = 2, minModuleSize = 80, mergeCutHeight = 0.25). After the modules were obtained, Pearson correlation coefficients between the gene modules and the phenotype (cancer tissue or paracancerous tissue samples) were calculated.

This allowed selection of gene modules closely related to tumorigenesis. GO and KEGG analyses of the genes in each module were used to characterize the function of the module. Next, two indicators were calculated for the genes in these modules: one was the Pearson correlation coefficient between the gene and the first principal component of its module, termed Module Membership (MM); the other was the Pearson correlation coefficient between the gene and the phenotype across samples, termed Gene Significance (GS). If a gene within a module had both large MM and GS, then the gene was considered to be a key gene in the module. In this study, genes whose MM and GS were both in the upper quartile of all genes in the module were identified as key genes. The key genes in key modules (modules with a high correlation coefficient for phenotype) were compared with key genes derived from PPI to obtain decisive genes in the development of LUSC. The WGCNA calculations were accomplished using the "WGCNA" package in R [26].
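The soft-thresholded adjacency and topological-overlap distance that underlie the WGCNA step can be sketched in plain Python as below. This is a toy illustration under stated assumptions, not the "WGCNA" package: the function names are hypothetical, the diagonal of the adjacency matrix is set to zero, and the topological overlap uses the standard unsigned form.

```python
def adjacency(cor, beta):
    """Soft-thresholded adjacency |cor|**beta with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def tom_dissimilarity(adj):
    """Distance d_ij = 1 - omega_ij from the topological overlap matrix."""
    n = len(adj)
    k = [sum(row) for row in adj]  # node connectivity k_i
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: connection strength shared through common neighbours.
            l_ij = sum(adj[i][u] * adj[u][j] for u in range(n))
            omega = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            d[i][j] = 1.0 - omega
    return d


# Toy correlation matrix for three genes; beta = 1 keeps the numbers simple.
cor = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.7],
    [0.8, 0.7, 1.0],
]
dist = tom_dissimilarity(adjacency(cor, beta=1))
print(dist[0][1])
```

In the analysis described in the text the exponent would be the selected soft threshold rather than 1, and the resulting distance matrix would be passed to hierarchical clustering to define modules.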

2.6 Survival analysis

Patient clinical data were downloaded from the TCGA website, and samples missing overall survival (OS) data were removed, leaving a total of 493 samples for survival analysis. Univariate Cox proportional hazards regression analysis was used to screen for genes significantly associated with prognosis (P < 0.05) [27]. Lasso regression analysis was then used to eliminate collinearity between genes. After 1000 rounds of 10-fold cross-validation, the λ value at which the error was minimized was selected as the optimum λ parameter value [28]. Multivariate Cox proportional hazards regression analysis was then used to find key genes for the establishment of a prognostic model [29]. The model used disease risk scores as predictors of prognostic status. The disease risk score was determined by the coefficient β from the multivariate Cox proportional hazards regression analysis and the expression level of each gene in the sample, using the following formula:

$$\text{Risk score} = \sum_{i=1}^{N} \mathrm{Exp}_i \times \beta_i$$

where $\mathrm{Exp}_i$ and $\beta_i$ are the expression level of gene i in a particular patient and the coefficient of gene i in the multivariate Cox regression analysis, respectively [30]. The risk score for each patient was calculated, and patients were then categorized as high- or low-risk by comparison with the median value. Kaplan-Meier survival curves were then plotted, and a log-rank test was used to evaluate whether the difference in survival between the groups was significant (p < 0.05). The predictive performance of the model at different endpoints (1, 3 or 5 years) was assessed using time-dependent receiver operating characteristic (ROC) curves [31]. The R packages used in the survival analysis procedure were "survival", "caret", "glmnet", "survminer" and "survivalROC".

3. Results

3.1 Identification of DEGs

Prior to GEO2R analysis, hierarchical clustering was employed to detect sample groups and remove samples deviating from their group. After this quality control, there were 97 normal lung samples and 84 LUSC samples in total. A total of 1013, 675 and 3398 DEGs were identified from the three datasets. Among these DEGs, 128 genes were found in all three sets, including 88 significantly up-regulated DEGs and 40 significantly down-regulated DEGs. In the TCGA dataset containing 49 normal samples and 499 LUSC samples, 3348 up-regulated and 3387 down-regulated genes were identified (information on the DEGs of the three GEO datasets and the TCGA dataset, and the overlap of genes between the datasets, is displayed in the Supplementary Table). The intersection is shown in Figure 1 and includes 85 significantly up-regulated and 39 down-regulated genes whose direction of change in TCGA was consistent with that in the GEO datasets. These genes were used to perform subsequent functional and pathway enrichment analysis.

3.2 GO term and KEGG pathway enrichment analysis of DEGs

The function and mechanism of the DEGs identified in this way were further evaluated by GO term and KEGG pathway analysis. Up- and down-regulated genes were uploaded to DAVID to conduct the analysis. For biological processes, up-regulated DEGs were significantly associated with epidermis development, collagen catabolic process and collagen fibril organization. Down-regulated DEGs were predominantly involved in cell adhesion, signal transduction, positive regulation of inflammatory response and regulation of cell growth. For molecular function, significantly up-regulated DEGs were associated with protein binding and structural molecule activity, while significantly down-regulated DEGs were involved in flavin adenine dinucleotide binding, thrombospondin receptor activity and protein binding. For cellular components, significantly up-regulated DEGs were enriched in cytosol, extracellular matrix and desmosome, and those that were significantly down-regulated were found in extracellular exosome, extracellular space and membrane raft. Figure 2 presents additional details regarding GO term analysis. In addition, KEGG pathway analysis identified six enriched pathways, of which three were over-represented in up-regulated genes: cell cycle, ECM-receptor interaction and amoebiasis. Down-regulated genes were enriched in three metabolic pathways, namely drug metabolism by cytochrome P450, tyrosine metabolism and phenylalanine metabolism (Sup.Table 2).

3.3 PPI network analysis of DEGs

The PPI network was constructed in Cytoscape based on the STRING database and consisted of 105 nodes and 483 edges, including 76 up- and 29 down-regulated genes (Figure 3A). The 15 DEGs with the greatest degree of connectivity were selected as key genes of LUSC. These genes were RFC4, AURKA, MAD2L1, TPX2, CCNA2, AURKB, TTK, TOP2A, RAD51, CDKN3, TK1, KPNA2, KIF23, PBK and UBE2C, which may play an important role in LUSC progression. MCODE in Cytoscape was used to perform module analysis, which identified four significant modules. The most significant module (MCODE score = 23.217) included 24 nodes and 267 edges (Figure 3B). Remarkably, genes in this module were all up-regulated. Functional and pathway enrichment analysis of the DEGs in this module was also conducted using DAVID. GO term enrichment analysis demonstrated that, for biological processes, genes in this module were principally enriched in cell division and mitotic nuclear division. Cellular component analysis indicated that genes were significantly enriched in nucleoplasm, spindle and kinetochore. Molecular function analysis demonstrated that the genes were principally involved in the binding of ATP and protein. KEGG analysis suggested that the genes were mainly involved in cell cycle (Table 1). The other three modules are shown in Sup. Figure 1.
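As a rough consistency check on the most significant module, and assuming that the MCODE score is the cluster density multiplied by the number of nodes (the convention of the Cytoscape plug-in, stated here as an assumption rather than taken from the text), the reported node count, edge count and score agree with one another:

$$\text{density} = \frac{267}{24 \times 23 / 2} = \frac{267}{276} \approx 0.9674, \qquad \text{score} \approx 0.9674 \times 24 \approx 23.217$$

A density this close to 1 means that almost every pair of genes in the module is connected, which fits the description of a tightly linked cell cycle cluster.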

3.4 Weighted gene correlation network analysis of DEGs

Based on the results of hierarchical clustering, we first removed two samples: TCGA.63.5128.01 and TCGA.92.8065.01 (Sup. Figure 2). A value of β=5 was selected as the soft threshold to establish a gene regulatory network (Figure 4A). After obtaining the gene modules using the dynamic pruning method, it was found that the correlation coefficients between module and phenotype were greatest in absolute value for the blue, yellow and turquoise modules, at 0.538, -0.542 and -0.870, respectively (Figure 4C). In addition, the first principal component of the genes in each module and the Pearson correlation coefficients between the modules were calculated and used for clustering. From these results (Figure 4B), we can see that the turquoise, yellow and red modules were the most consistent with one another. The correlation coefficients of these three modules with the phenotype were negative. The GO and KEGG analysis results for the blue, yellow and turquoise modules are shown in Sup. Figure 3. The genes in the yellow module were more related to inflammatory and immune response processes; modules of this kind also commonly appear as key modules in WGCNA analyses of other disease and control groups [32]. The blue module was more closely related to cell cycle, structural changes, etc., possibly reflecting the excessive proliferation of cells in cancer. These processes are targeted by many classic anticancer drugs such as paclitaxel and vinblastine [33,34]. Thus, genes in this module are important for drug development. The results of GO and KEGG analysis of the turquoise module were more complex, principally including cell migration and adhesion, response to hormones, tissue development, cell-growth pathways (TGF-beta signaling pathway, etc.) and immune processes. Among the 20 key genes obtained by PPI, 8 genes were located in the blue module and belonged to the key genes of the blue module (Sup.Table 3), while the other 11 genes were not present among the key genes of any module.

3.5 Survival analysis

In order to establish an effective model for predicting prognostic status, univariate Cox proportional hazards regression analysis, lasso regression analysis and multivariate Cox proportional hazards regression analysis were used in turn to screen genes. In the univariate Cox proportional hazards regression analysis, 111 genes with significant effects on prognosis (p<0.01, Supplementary Table) were identified. In lasso regression, following 10-fold cross-validation with 1000 repeats, λ was 0.0007079 (Figure 5D), where the partial likelihood deviance was smallest. At this value of λ, 33 of the 111 genes had non-zero coefficients (Figure 5E). A total of 20 genes were obtained by the forward-backward selection technique in multivariate Cox proportional hazards regression analysis to establish a prognostic risk score model. Eighteen of these, namely KLK8, POPDC3, IGLL1, PLEKHA6, PTGIS, ANKFN1, TM4SF19, JAG1, OR2W3, RALGAPA2, RPH3AL, PCDHGA7, C10orf55, ACPT, MMP12, H1FOO, CT45A10 and UGT2A1, were genes whose low expression was associated with improved prognosis, whereas for LYSMD1 and ONECUT3 high expression was associated with improved prognosis (Figure 6A).

Risk score = (0.00943 × KLK8) + (0.01536 × POPDC3) + (0.04825 × IGLL1) + (−0.04272 × LYSMD1) + (0.01822 × PLEKHA6) + (0.01318 × PTGIS) + (0.14178 × ANKFN1) + (0.02925 × TM4SF19) + (0.00116 × JAG1) + (0.11625 × OR2W3) + (−0.47624 × ONECUT3) + (0.01810 × RALGAPA2) + (0.03305 × RPH3AL) + (0.06967 × PCDHGA7) + (0.06217 × C10orf55) + (0.08721 × ACPT) + (0.00082 × MMP12) + (0.58094 × H1FOO) + (0.01348 × CT45A10) + (0.02889 × UGT2A1).
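The risk score is a plain weighted sum, which can be sketched in Python with the coefficients reported above. The function name `risk_score` and the example expression profile are hypothetical, and the expression values are assumed to be on the same scale as the data used to fit the model (FPKM-normalized TCGA values, according to the methods).

```python
# Coefficients of the 20-gene model as reported in the text.
COEFFICIENTS = {
    "KLK8": 0.00943, "POPDC3": 0.01536, "IGLL1": 0.04825,
    "LYSMD1": -0.04272, "PLEKHA6": 0.01822, "PTGIS": 0.01318,
    "ANKFN1": 0.14178, "TM4SF19": 0.02925, "JAG1": 0.00116,
    "OR2W3": 0.11625, "ONECUT3": -0.47624, "RALGAPA2": 0.01810,
    "RPH3AL": 0.03305, "PCDHGA7": 0.06967, "C10orf55": 0.06217,
    "ACPT": 0.08721, "MMP12": 0.00082, "H1FOO": 0.58094,
    "CT45A10": 0.01348, "UGT2A1": 0.02889,
}


def risk_score(expression):
    """Sum of expression x coefficient over the 20 model genes.

    expression: dict mapping gene symbol to expression level for one patient.
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS)


# Hypothetical patient: every gene at 0 except two, to show the sign effect.
patient = {gene: 0.0 for gene in COEFFICIENTS}
patient["ONECUT3"] = 1.0  # protective gene lowers the score
patient["H1FOO"] = 2.0    # risk gene raises the score
print(risk_score(patient))
```

Because the absolute coefficients differ by nearly three orders of magnitude, genes such as H1FOO and ONECUT3 can dominate the score for a given change in expression.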

The distribution of risk scores and groupings is shown in Figure 5A. Patients were ranked by risk score and their gene expression plotted as a heatmap (Figure 5C). It can be seen from the heatmap that expression levels of genes with a negative coefficient gradually decreased as risk score increased, whereas expression levels of genes with a positive coefficient increased. As risk scores increased, the number of patients who died increased and the duration of survival gradually decreased, supporting the validity of the risk model (Figure 5B). Kaplan–Meier curves were plotted for the groups defined by risk score, and the prognosis of the low-risk group was significantly better than that of the high-risk group (Figure 5F). For predicting the survival of patients at 1, 3, and 5 years, the area under the curve (AUC) of the ROC curves obtained from the risk-based prediction model was 0.828, 0.826 and 0.824, respectively (Figure 6B).
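The grouping by median risk score and the Kaplan-Meier estimate used to compare the two groups can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data; assigning scores equal to the median to the low-risk group is an assumption, and the log-rank test and time-dependent ROC analysis are not reproduced here.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient "high" or "low" relative to the median score."""
    cut = median(risk_scores)
    # Assumption: scores equal to the median are placed in the low group.
    return ["high" if score > cut else "low" for score in risk_scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    Returns [(time, survival probability)] at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


groups = split_by_median([0.1, 0.5, 0.9, 0.3])
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(groups)
print(curve)
```

In practice one curve would be estimated per risk group and the two compared with a log-rank test, as described in the methods.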

4. Discussion

Although many relevant studies of LUSC have been performed, early diagnosis, efficacy of treatment and prognosis for LUSC remain poorly resolved. For diagnosis and treatment, it is necessary to further understand the molecular mechanisms resulting in occurrence and development. Due to the development of high-throughput sequencing technology, the genetic alterations due to disease progression can be detected, indicating gene targets for diagnosis, therapy and prognosis of specific diseases.

In this study, the integration of GEO and TCGA data for LUSC resulted in the identification of 124 significant DEGs in lung squamous cell carcinoma compared with normal samples, including 85 up-regulated DEGs and 39 that were down-regulated. GO analysis demonstrated that the up-regulated genes were principally involved in epidermal development and DNA unwinding in DNA replication, while the down-regulated genes were mainly enriched in cell adhesion, signal transduction and positive regulation of inflammatory response. KEGG analysis indicated that these differentially expressed genes were involved in cell cycle regulation, cancer-related pathways, ECM receptor interaction and various metabolic pathways. In addition, 15 genes of high significance, namely RFC4, AURKA, MAD2L1, TPX2, CCNA2, AURKB, TTK, TOP2A, RAD51, CDKN3, TK1, KPNA2, KIF23, PBK and UBE2C, were identified by constructing a PPI network. Through module analysis we identified four significant gene modules, of which the most significant contained 24 nodes and 267 edges and was strongly associated with cell division and related activities. Interestingly, the genes in this module were all up-regulated, and the fifteen key genes of highest significance were contained in this module. Through WGCNA analysis, three significant gene modules were obtained, of which the blue module was positively correlated with the development of tumors, while the other two were negatively correlated. Key genes identified in the PPI network were mostly contained in the blue module, including AURKA, RAD51, TTK, AURKB, CCNA2, TPX2, KPNA2 and KIF23. GO analysis indicated that the blue module was associated with cell cycle, consistent with PPI submodule analysis. WGCNA constructs a network based on the correlation between genes, whereas the PPI network is based on protein interactions reported in the literature. Combining the WGCNA and PPI methods to identify key genes therefore seems appropriate, and these eight genes were selected as hub genes for the occurrence and development of LUSC. All eight are involved in cell cycle progression and regulate different mitotic events. A large number of studies have reported that these genes are abnormally expressed in multiple types of human malignancy and have potential as anticancer therapeutic targets [35-40].

AURKA and AURKB are members of the aurora kinase family, and both play central roles in regulating cell-cycle progression from G2 onward. AURKA is involved in many mitotic events, including centrosome maturation, mitotic entry, mitotic spindle formation and cytokinesis [41]. AURKB, as the catalytic protein of the chromosomal passenger complex (CPC), exerts its function by regulating chromosomal alignment, segregation and cytokinesis [40]. Yu et al. identified a functional interaction between AURKA and AURKB that helps protect their stability and partially explains their persistently high expression and activity in cancers [43]. Ma et al. verified that up-regulation of miR-32 down-regulated AURKA and thus inhibited the occurrence and development of NSCLC [44]. Jin et al. performed a bioinformatics analysis and proposed that AURKB may be a key gene in LUAD and could result in poor prognosis [45].

RAD51 is the central protein in the homologous recombination (HR) pathway and is critical for DNA replication and repair. Recent studies have reported that RAD51 is also implicated in the suppression of innate immunity [46]. Hu et al. pointed out that RAD51 expression was elevated by a KRAS mutation which predominantly occurred in lung adenocarcinomas. Furthermore, they found that the high expression of RAD51 was associated with poor survival in lung adenocarcinoma through analysis of data from the TCGA database [47]. Moreover, expression levels of RAD51 were also linked to resistance to chemotherapy. Takenaka et al. reported that the combined expression of RAD51 and another DNA repair enzyme, ERCC1 (excision repair cross‐complementation group 1), is associated with resistance to platinum agents, indicating the potential role of RAD51 in prognosis [48].

Monopolar spindle 1 (Mps1), also known as TTK, is a dual-specificity kinase that phosphorylates serine/threonine and tyrosine residues. Like AURKA and AURKB, TTK is a key mitotic checkpoint protein. It functions as a critical component of the spindle assembly checkpoint (SAC) signaling cascade, ensuring correct chromosome segregation [49]. High expression of TTK in LUAD and LUSC has been revealed through analysis of TCGA data, and inhibition of TTK has been reported to eliminate lung cancers by augmenting apoptosis and polyploidy [50].

Cyclin family member CCNA2 is involved in cytoskeletal dynamics, epithelial–mesenchymal transition (EMT) and metastasis [51]. Chen et al. used two lung adenocarcinoma cell lines to verify that miR-137 can induce G1/S cell cycle arrest and dysregulate mRNA expression of cell cycle-associated genes, including CCNA2 [52].

TPX2, a microtubule-associated protein, is required for the formation of normal bipolar spindles and chromosome separation. Overexpression of TPX2 can induce abnormal centrosome amplification, leading to aneuploidy formation and malignant transformation of cells. It can also promote cell proliferation and participate in cell cycle and apoptosis regulation [53]. It is overexpressed in LUSC, as has been specifically verified [54]. Remarkably, TPX2 can specifically interact with AURKA to carry out a series of biological functions, including targeting of AURKA to the spindle microtubules and assembly of spindles of the correct length that segregate chromosomes faithfully [55]. Co-expression of TPX2 and AURKA has been verified, and depletion of either causes defects in spindle formation [35,55]. In LUSC, up-regulation of AURKA and TPX2 is highly correlated in tumor tissues [56].

KPNA2 regulates the transportation of a variety of important proteins from the cytoplasm into the nucleus. Increasing evidence has shown that KPNA2 contributes to the process of gene regulation, including that of oncogenes and tumor-suppressor genes, further affecting biological functions, such as cell proliferation, differentiation, cell-matrix adhesion, colony formation and migration [57]. Wang et al. found that KPNA2 was overexpressed in the nuclei of tumor cells in NSCLC, through analysis of the cancer cell secretome and tissue transcriptome, suggesting that KPNA2 is a potential biomarker for NSCLC [58].

Nuclear protein KIF23 is reported to interact with the aurora kinase family to regulate cell division and cytokinesis [59]. A reduction in KIF23 gene expression was able to suppress cell proliferation [60]. A number of experimental and bioinformatics studies have pointed to high levels of KIF23 expression observed in the majority of metastatic lung cancer tissues, indicating that KIF23 plays a role in NSCLC occurrence and development [36,61-63].

Importantly, abnormal behavior of these genes in lung cancer has been widely reported. However, their roles and molecular mechanisms in LUSC require additional investigation.

We have established a prognostic model that predicts patient survival. This model contains 20 key genes, some of which have not previously been reported in LUSC studies, including POPDC3, IGLL1, TM4SF19, RALGAPA2, RPH3AL, PCDHGA7 and ACPT. Previous studies have found that KLK8 (kallikrein 8) is involved in the development of a variety of tumors, including cervical, ovarian and breast cancer. KLK8 is expressed in different tissues and cells and is regulated by steroid hormones [64-66]. KLK8 is distributed principally in normal tissues such as the kidneys, breast, skin, tonsil and testis, but is expressed less in the lungs, which may be a reason for the small coefficient (0.00943) for KLK8 in the risk score [67]. A previous study demonstrated that KLK8 is highly expressed in ovarian cancer and can be used as a marker for tumorigenesis. Whether KLK8 can be used as a biomarker for squamous cell carcinoma deserves further investigation [68]. In our model, high expression of LYSMD1 and ONECUT3 had a protective effect in squamous cell carcinoma. LYSMD1 is a LysM domain-containing protein. The LysM domain is widely found in plants, bacteria, fungi and animals [69,70] and can be combined with a variety of other proteins. At present, few studies have reported on its role in mammalian tissues, but the results of the survival analysis suggest that high expression of LYSMD1 may have a protective effect in keratinized squamous cell carcinoma.

The ONECUT transcription factor family has three members in mammalian cells, namely hepatocyte nuclear factor 6 (ONECUT-1), ONECUT-2 (OC-2) and ONECUT-3 (OC-3), whose roles overlap [71]. ONECUT participates in the biological functions of cell cycle and proliferation regulation, cell differentiation, organ formation, cell migration, cell adhesion and metabolism by regulation of gene expression [72]. Current studies on the regulation of cell proliferation by OC transcription factors focus mainly on ONECUT-1. It is worth noting that the regulation of cell proliferation by ONECUT-1 shows a degree of tissue specificity. For example, overexpression of ONECUT-1 in the colonic adenocarcinoma cell line Caco-2 inhibits cell cycle progression [73]. At present, there are few reports on the regulation of the cell cycle by ONECUT-3, but in this study high expression of this gene had a significant prognostic effect. In lung tissue, the expression of ONECUT-1 is reduced, so ONECUT-3 may regulate cell cycle and adhesion, but this requires further experimental verification. Related studies have shown that the occurrence of lung cancer and the metastasis of cancer cells can significantly change the patient's olfactory function, which is consistent with the identification of an olfactory receptor gene in our model. Thus, OR2W3 (olfactory receptor family 2 subfamily W member 3) could potentially be used as a tumor marker for squamous cell carcinoma [74,75].

H1FOO is the H1.8 linker histone. Its expression is strongly time-dependent and differs from that of other H1 protein subtypes. It not only functions in the oocyte-embryo developmental transition, but also plays a crucial role in the genome reprogramming process. When the expression of the H1FOO gene in oocytes increases, the pluripotency of the cells also increases, indicating that H1FOO is involved in chromosome remodeling and gene reprogramming. This process is closely related to the occurrence of cancer, so to some extent, the levels of H1FOO expression reflect the degree of cancer cell proliferation [76,77].

Ankyrin repeats play an important role in the progression of inflammation-associated tumors. At present, ankyrin repeats have been found to be highly expressed in cervical cancer, liver cancer, cholangiocarcinoma, breast cancer, etc., and their expression levels are closely related to depth of tumor invasion, tumor grade and stage [78] and tumor metastasis. Studies have suggested that ankyrin repeats may play a role in cell proliferation by regulating the function of the important tumor suppressors p53 and Rb, hypoxia inducible factor-1 (HIF-1) and the IL-8 pathway [79]. Fibronectin is a glycoprotein secreted by tumor-associated fibroblasts and tumor cells. Deposition of fibronectin stimulates cancer cells to bind to insulin-like growth factor binding protein-3 (IGFBP-3) and epidermal growth factor (EGF) [80]. When fibronectin increases in concentration, IGFBP-3 and EGF promote cancer cell proliferation. Conversely, if fibronectin is deleted, cancer cell proliferation is inhibited. This is consistent with the large positive coefficient (0.14178) for the ANKFN1 (ankyrin repeat and fibronectin type III domain containing 1) gene obtained in the present study.
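The two coefficients quoted above can be related to the hazard ratios (HR) in the forest plot through HR = e^β, as stated in the Figure 6 caption. The following Python snippet is only a minimal sketch of that conversion; the function and variable names are illustrative and are not taken from the original analysis code.

```python
import math

# Cox regression coefficients quoted in the text for two signature genes.
coefficients = {"KLK8": 0.00943, "ANKFN1": 0.14178}


def hazard_ratio(beta: float) -> float:
    """Hazard ratio per one-unit increase in expression: HR = exp(beta)."""
    return math.exp(beta)


hazard_ratios = {gene: hazard_ratio(beta) for gene, beta in coefficients.items()}

for gene, hr in hazard_ratios.items():
    print(f"{gene}: HR = {hr:.2f}")
```

Rounded to two decimals, the values are 1.01 for KLK8 and 1.15 for ANKFN1, which agrees with the hazard ratios listed for these genes in the forest plot.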

5. Conclusions

In the present study, microarray data from the GEO database were integrated with RNA sequencing data from TCGA to identify key genes and, among them, the most important hub genes. We identified eight hub genes associated with the pathogenesis and progression of LUSC. Additionally, we performed survival analysis and built a Cox proportional hazards model to identify prognostic biomarkers. A prognostic gene signature consisting of 20 genes was constructed and showed good performance in predicting overall survival. These results will serve as a reference for future research on the pathogenesis and drug treatment of LUSC. Nevertheless, the lack of experimental verification is a limitation of this study. The predictions obtained from the bioinformatics analysis remain to be verified by future experimental studies.
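How a Cox-based gene signature turns expression values into risk groups can be illustrated with a short Python sketch. It assumes the usual linear predictor (sum of coefficient times expression) and the median split described in the Figure 5 caption. Only the two coefficients quoted in the Discussion are used, the patient identifiers and expression values are made up for illustration, and the code is not the authors' implementation.

```python
from statistics import median

# Coefficients quoted in the text; the full signature contains 20 genes.
COEFFICIENTS = {"KLK8": 0.00943, "ANKFN1": 0.14178}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def split_by_median(scores):
    """Label a patient 'high' if score >= median, otherwise 'low'."""
    cutoff = median(scores.values())
    return {pid: "high" if score >= cutoff else "low"
            for pid, score in scores.items()}


# Hypothetical expression values for three made-up patients.
patients = {
    "P1": {"KLK8": 10.0, "ANKFN1": 1.0},
    "P2": {"KLK8": 50.0, "ANKFN1": 4.0},
    "P3": {"KLK8": 5.0, "ANKFN1": 0.5},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

With these made-up values, P3 falls below the median score and is labeled low risk, while P1 (exactly at the median) and P2 are labeled high risk.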

Data availability

Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/)

The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga)

Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/)

The Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/)

Abbreviations

SCLC Small Cell Lung Cancer

NSCLC Non-Small Cell Lung Cancer

LUAD Lung Adenocarcinoma

LUSC Lung Squamous Cell Carcinoma

DEGs Differentially Expressed Genes

PPI Protein-protein interaction

FC Fold Change

GEO Gene Expression Omnibus

TCGA The Cancer Genome Atlas

GO Gene Ontology

KEGG Kyoto Encyclopedia of Genes and Genomes

DAVID Database for Annotation, Visualization and Integrated Discovery

STRING Search Tool for the Retrieval of Interacting Genes

MCODE Molecular Complex Detection

WGCNA Weighted Gene Co-Expression Network Analysis

GS Gene Significance

MM Module Membership

OS Overall Survival

Gene Abbreviation

RFC4 Replication Factor C Subunit 4

MAD2L1 Mitotic Arrest Deficient 2 Like 1

AURKA Aurora Kinase A

CDKN3 Cyclin Dependent Kinase Inhibitor 3

RAD51 DNA Repair Protein RAD51 Homolog 1

TOP2A DNA Topoisomerase II Alpha

TTK Threonine and Tyrosine Kinase

AURKB Aurora Kinase B

CCNA2 Cyclin A2

TPX2 Targeting Protein for Xenopus Kinesin-Like Protein 2

TK1 Thymidine Kinase 1

PBK PDZ Binding Kinase

KPNA2 Karyopherin α2

KIF23 Kinesin Family Member 23

KLK8 Kallikrein 8

OR2W3 Olfactory Receptor Family 2 Subfamily W Member 3

H1FOO H1.8 Linker Histone

ANKFN1 Ankyrin Repeat and Fibronectin Type III Domain Containing 1

Funding

This work was supported by the National Key Research and Development Program of China (2018YFA0900200), National Natural Science Foundation of China grants (31771519, 31871083), Beijing Natural Science Foundation grants (5182012, 7182087) and Ministry of Science and Technology of China grant 2015CB559200. Funding for open access charge: National Key Research and Development Program of China.

Author Contributions Statement

All authors contributed to the work presented in this paper. Miaomiao Gao and Zhengwei Xie identified the research topic. Miaomiao Gao performed the selection of GEO datasets, the screening of differentially expressed genes in the GEO data, the construction of the PPI network, the GO and KEGG analysis of DEGs and the plotting of differentially expressed genes, and wrote the relevant content. Weikaixin Kong performed the screening of differentially expressed genes in the TCGA data, WGCNA analysis, survival analysis and GO and KEGG analysis of gene modules, and wrote the relevant content. Zhuo Huang and Zhengwei Xie gave suggestions on the choice of experimental methods and the writing of the paper. The manuscript was reviewed and approved by all authors.

Conflicts of interest

None.

References

[1] J. Xu, Y. Shu, T. Xu, W. Zhu, T. Qiu, J. Li, M. Zhang, J. Xu, R. Guo, K. Lu, L. Zhu, Y.

Yin, Y. Gu, L. Liu, P. Liu, R. Wang, Microarray expression profiling and bioinformatics

analysis of circular RNA expression in lung squamous cell carcinoma, Am. J. Transl. Res.

10 (2018) 771-783.

[2] L. Gao, Y.N. Guo, J.H. Zeng, F.C. Ma, J. Luo, H.W. Zhu, S. Xia, K.L. Wei, G. Chen,

The expression, significance and function of cancer susceptibility candidate 9 in lung

squamous cell carcinoma: A bioinformatics and in vitro investigation, Int. J. Oncol. 54

(2019) 1651-1664. https://doi.org/10.3892/ijo.2019.4758.

[3] W.J. Chen, R.X. Tang, R.Q. He, D.Y. Li, L. Liang, J.H. Zeng, X.H. Hu, J. Ma, S.K. Li,

G. Chen, Clinical roles of the aberrantly expressed lncRNAs in lung squamous cell

carcinoma: A study based on RNA sequencing and microarray data mining, Oncotarget 8

(2017) 61282-61304. https://doi.org/10.18632/oncotarget.18058.

[4] C.H.Y. Cheung, H.F. Juan, Quantitative proteomics in lung cancer, J. Biomed. Sci. 24

(2017) 37. https://doi.org/10.1186/s12929-017-0343-y.

[5] Y. Zhang, Q. Yang, S. Wang, MicroRNAs: A new key in lung cancer, Cancer

Chemother. Pharmacol. 74 (2014) 1105-1111. https://doi.org/10.1007/s00280-014-2559-9.

[6] F. Oberndorfer, L. Müllauer, Molecular pathology of lung cancer: Current status and

perspectives, Curr. Opin. Oncol. 30 (2018) 69-76.

https://doi.org/10.1097/CCO.0000000000000429.

[7] B. Piperdi, A. Merla, R. Perez-Soler, Targeting angiogenesis in squamous non-small cell

lung cancer, Drugs 74 (2014) 403-413. https://doi.org/10.1007/s40265-014-0182-z.

[8] K.H. Yu, C. Zhang, G.J. Berry, R.B. Altman, C. Ré, D.L. Rubin, M. Snyder, Predicting

non-small cell lung cancer prognosis by fully automated microscopic pathology image

features, Nat. Commun. 7 (2016) 12474. https://doi.org/10.1038/ncomms12474.

[9] L. Qian, L. Lin, Y. Du, X. Hao, Y. Zhao, X. Liu, MicroRNA-588 suppresses tumor cell

migration and invasion by targeting GRN in lung squamous cell carcinoma, Mol. Med.

Rep. 14 (2016) 3021-3028. https://doi.org/10.3892/mmr.2016.5643.

[10] Y. Mao, D. Yang, J. He, M.J. Krasna, Epidemiology of lung cancer, Surg. Oncol. Clin.

N. Am. 25 (2016) 439-445. https://doi.org/10.1016/j.soc.2016.02.001.

[11] Q. Wang, S. Liu, X. Zhao, Y. Wang, D. Tian, W. Jiang, MiR-372-3p promotes cell

growth and metastasis by targeting FGF9 in lung squamous cell carcinoma, Cancer Med.

6 (2017) 1323-1330. https://doi.org/10.1002/cam4.1026.

[12] Y. Matsuoka, Y. Takagi, K. Nosaka, T. Sakabe, T. Haruki, K. Araki, Y. Taniguchi, T.

Shiomi, H. Nakamura, Y. Umekita, Cytoplasmic expression of maspin predicts

unfavourable prognosis in patients with squamous cell carcinoma of the lung,

Histopathology 69 (2016) 114-120. https://doi.org/10.1111/his.12921.

[13] F. Guo, K. Hiroshima, D. Wu, M. Satoh, M. Abulazi, I. Yoshino, T. Tomonaga, F.

Nomura, Y. Nakatani, Prohibitin in squamous cell carcinoma of the lung: Its expression

and possible clinical significance, Hum. Pathol. 43 (2012) 1282-1288.

https://doi.org/10.1016/j.humpath.2011.10.006.

[14] R. Edgar, M. Domrachev, A.E. Lash, Gene Expression Omnibus: NCBI gene expression

and hybridization array data repository, Nucleic Acids Res. 30 (2002) 207-210.

https://doi.org/10.1093/nar/30.1.207.

[15] T. Barrett, S.E. Wilhite, P. Ledoux, C. Evangelista, I.F. Kim, M. Tomashevsky, K.A.

Marshall, K.H. Phillippy, P.M. Sherman, M. Holko, A. Yefanov, H. Lee, N. Zhang, C.L.

Robertson, N. Serova, S. Davis, A. Soboleva, NCBI GEO: Archive for functional

genomics data sets – Update, Nucleic Acids Res. 41 (2013) D991-D995.

https://doi.org/10.1093/nar/gks1193.

[16] K. Song, L. Li, G. Zhang, Bias and correction in RNA-seq data for marine species, Mar.

Biotechnol. (NY) 19 (2017) 541-550. https://doi.org/10.1007/s10126-017-9773-5.

[17] E.A. Gehan, A generalized Wilcoxon test for comparing arbitrarily singly-censored

samples, Biometrika 52 (1965) 203-223. https://doi.org/10.2307/2333825.

[18] Gene Ontology Consortium, The Gene Ontology (GO) database and informatics

resource, Nucleic Acids Res. 32 (2004) D258-D261. https://doi.org/10.1093/nar/gkh036.

[19] H. Ogata, S. Goto, K. Sato, W. Fujibuchi, H. Bono, M. Kanehisa, KEGG: Kyoto

encyclopedia of genes and genomes, Nucleic Acids Res. 27 (1999) 29-34.

https://doi.org/10.1093/nar/27.1.29.

[20] D.W. Huang, B.T. Sherman, R.A. Lempicki, Systematic and integrative analysis of large

gene lists using DAVID bioinformatics resources, Nat. Protoc. 4 (2009) 44-57.

https://doi.org/10.1038/nprot.2008.211.

[21] D.W. Huang, B.T. Sherman, R.A. Lempicki, Bioinformatics enrichment tools: Paths

toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res. 37

(2009) 1-13. https://doi.org/10.1093/nar/gkn923.

[22] D. Szklarczyk, A.L. Gable, D. Lyon, A. Junge, S. Wyder, J. Huerta-Cepas, M.

Simonovic, N.T. Doncheva, J.H. Morris, P. Bork, L.J. Jensen, C.V. Mering, STRING

v11: Protein-protein association networks with increased coverage, supporting functional

discovery in genome-wide experimental datasets, Nucleic Acids Res. 47 (2019) D607-D613. https://doi.org/10.1093/nar/gky1131.

[23] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, N. Amin, B.

Schwikowski, T. Ideker, Cytoscape: A software Environment for integrated models of

biomolecular interaction networks, Genome Res. 13 (2003) 2498-2504.

https://doi.org/10.1101/gr.1239303.

[24] G.D. Bader, C.W.V. Hogue, An automated method for finding molecular complexes in

large protein interaction networks, BMC Bioinformatics 4 (2003) 2.

https://doi.org/10.1186/1471-2105-4-2.

[25] B. Zhang, S. Horvath, A general framework for weighted gene co-expression network

analysis, Stat. Appl. Genet. Mol. Biol. 4 (2005) Article 17. https://doi.org/10.2202/1544-6115.1128.

[26] P. Langfelder, S. Horvath, WGCNA: An R package for weighted correlation network

analysis, BMC Bioinformatics 9 (2008) 559. https://doi.org/10.1186/1471-2105-9-559.

[27] M. Lunn, D. McNeil, Applying Cox regression to competing risks, Biometrics 51 (1995)

524-532. https://doi.org/10.2307/2532940.

[28] H.R. Shahraki, A. Salehi, N. Zare, Survival prognostic factors of male breast cancer in

Southern Iran: A LASSO-Cox regression approach, Asian Pacific J. Cancer Prev. 16

(2015) 6773-6777. https://doi.org/10.7314/APJCP.2015.16.15.6773.

[29] H. Liang, X.S. Hao, P. Wang, X.N. Wang, J.W. Li, J.C. Wang, D.C. Wang, Multivariate

Cox analysis on prognostic factors after surgery for rectal carcinoma, Zhonghua Zhong

Liu Za Zhi, 26 (2004) 688-691. (in Chinese)

[30] R. Huang, X. Liao, Q. Li, Identification and validation of potential prognostic gene

biomarkers for predicting survival in patients with acute myeloid leukemia, Onco.

Targets Ther. 10 (2017) 5243-5254. https://doi.org/10.2147/OTT.S147717.

[31] M. Schemper, R. Henderson, Predictive accuracy and explained variation in Cox

regression, Biometrics 56 (2000) 249-255. https://doi.org/10.1111/j.0006-341X.2000.00249.x.

[32] J.D. Morrow, X. Zhou, T. Lao, Z. Jiang, D.L. DeMeo, M.H. Cho, W. Qiu, S. Cloonan, V.

Pinto-Plata, B. Celli, N. Marchetti, G.J. Criner, R. Bueno, G.R. Washko, K. Glass, J.

Quackenbush, A.M. Choi, E.K. Silverman, C.P. Hersh, Functional interactors of three

genome-wide association study genes are differentially expressed in severe chronic

obstructive pulmonary disease lung tissue, Sci. Rep. 7 (2017) 44232.

https://doi.org/10.1038/srep44232.

[33] T.S. Mok, Y.L. Wu, S. Thongprasert, C.H. Yang, D.T. Chu, N. Saijo, P. Sunpaweravong,

B. Han, B. Margono, Y. Ichinose, Y. Nishiwaki, Y. Ohe, J.J. Yang, B. Chewaskulyong,

H. Jiang, E.L. Duffield, C.L. Watkins, A.A. Armour, M. Fukuoka, Gefitinib or

carboplatin-paclitaxel in pulmonary adenocarcinoma, N. Engl. J. Med. 361 (2009) 947-957. https://doi.org/10.1056/NEJMoa0810699.

[34] J.R. Germá, A.F. Sagarra, M.A. Izquierdo, M.A. Seguí, Sequential trials of cisplatin,

vinblastine, and bleomycin and etoposide and cisplatin in disseminated

nonseminomatous germ cell tumors of the testis with a good prognosis at a single

institution, Cancer 71 (1993) 796-803. https://doi.org/10.1002/1097-0142(19930201)71:3<796::AID-CNCR2820710323>3.0.CO;2-1.

[35] M. Yan, C. Wang, B. He, M. Yang, M. Tong, Z. Long, B. Liu, F. Peng, L. Xu, Y. Zhang,

D. Liang, H. Lei, S. Subrata, K.W. Kelley, E.W. Lam, B. Jin, Q. Liu, Aurora-A Kinase:

A potent oncogene and target for cancer therapy, Med. Res. Rev. 36 (2016) 1036-1079.

https://doi.org/10.1002/med.21399.

[36] T. Kato, H. Wada, P. Patel, H.P. Hu, D. Lee, H. Ujiie, K. Hirohashi, T. Nakajima, M.

Sato, M. Kaji, K. Kaga, Y. Matsui, M.S. Tsao, K. Yasufuku, Overexpression of KIF23

predicts clinical outcome in primary lung cancer patients, Lung Cancer 92 (2016) 53-61.

https://doi.org/10.1016/j.lungcan.2015.11.018.

[37] B. Budke, W. Lv, A.P. Kozikowski, P.P. Connell, Recent developments using small

molecules to target RAD51: How to best modulate RAD51 for anticancer therapy?,

ChemMedChem 11 (2016) 2468-2473. https://doi.org/10.1002/cmdc.201600426.

[38] R. Dachineni, G. Ai, D.R. Kumar, S.S. Sadhu, H. Tummala, G.J. Bhat, Cyclin A2 and

CDK2 as novel targets of aspirin and salicylic acid: A potential role in cancer prevention,

Mol. Cancer Res. 14 (2016) 241-252. https://doi.org/10.1158/1541-7786.MCR-15-0360.

[39] A. Christiansen, L. Dyrskjøt, The functional role of the novel biomarker karyopherin α 2

(KPNA2) in cancer, Cancer Lett. 331 (2013) 18-23.

https://doi.org/10.1016/j.canlet.2012.12.013.

[40] Y. Xie, A. Wang, J. Lin, L. Wu, H. Zhang, X. Yang, X. Wan, R. Miao, X. Sang, H.

Zhao, Mps1/TTK: a novel target and biomarker for cancer, J. Drug Target 25 (2017)

112-118. https://doi.org/10.1080/1061186X.2016.1258568.

[41] T. Otto, P. Sicinski, Cell cycle proteins as promising targets in cancer therapy, Nat. Rev.

Cancer 17 (2017) 93-115. https://doi.org/10.1038/nrc.2016.138.

[42] Y. Liao, Y. Liao, J. Li, J. Li, Y. Fan, B. Xu, Polymorphisms in AURKA and AURKB are

associated with the survival of triple-negative breast cancer patients treated with taxane-based adjuvant chemotherapy, Cancer Manag. Res. 10 (2018) 3801-3808.

https://doi.org/10.2147/CMAR.S174735.

[43] Z. Yu, Y. Sun, X. She, Z. Wang, S. Chen, Z. Deng, Y. Zhang, Q. Liu, Q. Liu, C. Zhao, P.

Li, C. Liu, J. Feng, H. Fu, G. Li, M. Wu, SIX3, a tumor suppressor, inhibits astrocytoma

tumorigenesis by transcriptional repression of AURKA/B, J. Hematol. Oncol. 10 (2017)

115. https://doi.org/10.1186/s13045-017-0483-2.

[44] Z.L. Ma, B.J. Zhang, D.T. Wang, X. Li, J.L. Wei, B.T. Zhao, Y. Jin, Y.L. Li, Y.X. Jin,

Tanshinones suppress AURKA through up-regulation of miR-32 expression in non-small

cell lung cancer, Oncotarget 6 (2015) 20111-20120.

https://doi.org/10.18632/oncotarget.3933.

[45] X. Jin, X. Liu, X. Li, Y. Guan, Integrated analysis of DNA methylation and mRNA

expression profiles data to identify key genes in lung adenocarcinoma, Biomed. Res. Int.

2016 (2016) 4369431. https://doi.org/10.1155/2016/4369431.

[46] S. Bhattacharya, K. Srinivasan, S. Abdisalaam, F. Su, P. Raj, I. Dozmorov, R. Mishra,

E.K. Wakeland, S. Ghose, S. Mukherjee, A. Asaithamby, RAD51 interconnects between

DNA replication, DNA repair and immunity, Nucleic Acids Res. 45 (2017) 4590-4605.

https://doi.org/10.1093/nar/gkx126.

[47] J. Hu, Z. Zhang, L. Zhao, L. Li, W. Zuo, L. Han, High expression of RAD51 promotes

DNA damage repair and survival in KRAS-mutant lung cancer cells, BMB Rep. 52

(2019) 151-156. https://doi.org/10.5483/BMBRep.2019.52.2.213.

[48] T. Takenaka, I. Yoshino, H. Kouso, T. Ohba, T. Yohena, A. Osoegawa, F. Shoji, Y.

Maehara, Combined evaluation of Rad51 and ERCC1 expressions for sensitivity to

platinum agents in non-small cell lung cancer, Int. J. Cancer 121 (2007) 895-900.

https://doi.org/10.1002/ijc.22738.

[49] S.T. Pachis, G.J.P.L. Kops, Leader of the SAC: Molecular mechanisms of Mps1/TTK

regulation in mitosis, Open Biol. 8 (2018) 180109. https://doi.org/10.1098/rsob.180109.

[50] L. Zheng, Z. Chen, M. Kawakami, Y. Chen, J. Roszik, L.M. Mustachio, J.M. Kurie, P.

Villalobos, W. Lu, C. Behrens, B. Mino, L.M. Solis, J. Silvester, K.L. Thu, D.W.

Cescon, J. Rodriguez-Canales, I.I. Wistuba, T.W. Mak, X. Liu, E. Dmitrovsky,

Threonine tyrosine kinase inhibition eliminates lung cancers by augmenting apoptosis

and polyploidy, Mol. Cancer Ther. 18 (2019) 1775-1786. https://doi.org/10.1158/1535-7163.mct-18-0864.

[51] C.T. Cheung, N. Bendris, C. Paul, A. Hamieh, Y. Anouar, M. Hahne, J.M. Blanchard, B.

Lemmers, Cyclin A2 modulates EMT via β-catenin and phospholipase C pathways,

Carcinogenesis 36 (2015) 914-924. https://doi.org/10.1093/carcin/bgv069.

[52] R. Chen, Y. Zhang, C. Zhang, H. Wu, S. Yang, miR-137 inhibits the proliferation of

human non-small cell lung cancer cells by targeting SRC3, Oncol. Lett. 13 (2017) 3905-3911. https://doi.org/10.3892/ol.2017.5904.

[53] B. Liang, C. Jia, Y. Huang, H. He, J. Li, H. Liao, X. Liu, X. Liu, X. Bai, D. Yang, TPX2

level correlates with hepatocellular carcinoma cell proliferation, apoptosis, and EMT,

Dig. Dis. Sci. 60 (2015) 2360-2372. https://doi.org/10.1007/s10620-015-3730-9.

[54] D.M. Lin, Y. Ma, T. Xiao, S.P. Guo, N.J. Han, K. Su, S.Z. Yi, J. Fang, S.J. Cheng, Y.N.

Gao, TPX2 expression and its significance in squamous cell carcinoma of lung,

Zhonghua Bing Li Xue Za Zhi 35 (2006) 540-544. (in Chinese)

[55] G. Garrido, I. Vernos, Non-centrosomal TPX2-dependent regulation of the Aurora A

kinase: Functional implications for healthy and pathological cell division, Front. Oncol. 6

(2016) 88. https://doi.org/10.3389/fonc.2016.00088.

[56] M.A. Schneider, P. Christopoulos, T. Muley, A. Warth, U. Klingmueller, M. Thomas,

F.J. Herth, H. Dienemann, N.S. Mueller, F. Theis, M. Meister, AURKA, DLGAP5,

TPX2, KIF11 and CKAP5: Five specific mitosis-associated genes correlate with poor

prognosis for non-small cell lung cancer patients, Int. J. Oncol. 50 (2017) 365-372.

https://doi.org/10.3892/ijo.2017.3834.

[57] L.N. Zhou, Y. Tan, P. Li, P. Zeng, M.B. Chen, Y. Tian, Y.Q. Zhu, Prognostic value of

increased KPNA2 expression in some solid tumors: A systematic review and meta-analysis, Oncotarget 8 (2017) 303-314. https://doi.org/10.18632/oncotarget.13863.

[58] C.I. Wang, C.L. Wang, C.W. Wang, C.D. Chen, C.C. Wu, Y. Liang, Y.H. Tsai, Y.S.

Chang, J.S. Yu, C.J. Yu, Importin subunit alpha-2 is identified as a potential biomarker

for non-small cell lung cancer by integration of the cancer cell secretome and tissue

transcriptome, Int. J. Cancer 128 (2011) 2364-2372. https://doi.org/10.1002/ijc.25568.

[59] A.L. Vikberg, T. Vooder, K. Lokk, T. Annilo, I. Golovleva, Mutation analysis and copy

number alterations of KIF23 in non-small-cell lung cancer exhibiting KIF23 over-expression, Onco. Targets Ther. 10 (2017) 4969-4979.

https://doi.org/10.2147/OTT.S138420.

[60] L. Sun, C. Zhang, Z. Yang, Y. Wu, H. Wang, Z. Bao, T. Jiang, KIF23 is an independent

prognostic biomarker in glioma, transcriptionally regulated by TCF-4, Oncotarget 7

(2016) 24646-24655. https://doi.org/10.18632/oncotarget.8261.

[61] F. Iltzsche, K. Simon, S. Stopp, G. Pattschull, S. Francke, P. Wolter, S. Hauser, D.J.

Murphy, P. Garcia, A. Rosenwald, S. Gaubatz, An important role for Myb-MuvB and its

target gene KIF23 in a mouse model of lung adenocarcinoma, Oncogene 36 (2017) 110-121. https://doi.org/10.1038/onc.2016.181.

[62] K. Välk, T. Vooder, R. Kolde, M.A. Reintam, C. Petzold, J. Vilo, A. Metspalu, Gene

expression profiles of non-small cell lung cancer: Survival prediction and new

biomarkers, Oncology 79 (2011) 283-292. https://doi.org/10.1159/000322116.

[63] L. Ye, H.J. Li, F. Zhang, T.F. Lv, H.B. Liu, Y. Song, Expression of KIF23 and its

prognostic role in non-small cell lung cancer: Analysis based on the data-mining of

oncomine, Zhongguo Fei Ai Za Zhi 20 (2017) 822-826.

https://doi.org/10.3779/j.issn.1009-3419.2017.12.05. (in Chinese)

[64] M. Paliouras, E.P. Diamandis, Intracellular signaling pathways regulate hormone-dependent kallikrein gene expression, Tumor Biol. 29 (2008) 63-75.

https://doi.org/10.1159/000135686.

[65] G.M. Yousef, G.M. Yacoub, M.E. Polymeris, C. Popalis, A. Soosaipillai, E.P.

Diamandis, Kallikrein gene downregulation in breast cancer, Br. J. Cancer 90 (2004)

167-172. https://doi.org/10.1038/sj.bjc.6601451.

[66] C.A. Borgono, T. Kishi, A. Scorilas, N. Harbeck, J. Dorn, B. Schmalfeldt, M. Schmitt,

E.P. Diamandis, Human kallikrein 8 protein (hK8) in ovarian cancer cytosols: A new

marker of favorable prognosis, Cancer Res. 64 (2004) 1025.

http://cancerres.aacrjournals.org/content/64/7_Supplement/1025.2.abstract.

[67] KLK8 kallikrein related peptidase 8 [Homo sapiens (human)]

https://www.ncbi.nlm.nih.gov/gene/11202, updated on 11-Sep-2019.

[68] N. Ahmed, J. Dorn, R. Napieralski, E. Drecoll, M. Kotzsch, P. Goettig, E. Zein, S. Avril,

M. Kiechle, E.P. Diamandis, M. Schmitt, V. Magdolen, Clinical relevance of kallikrein-related peptidase 6 (KLK6) and 8 (KLK8) mRNA expression in advanced serous ovarian

cancer, Biol. Chem. 397 (2016) 1265-1276. https://doi.org/10.1515/hsz-2016-0177.

[69] M. Cacaci, C. Giraud, L. Leger, R. Torelli, C. Martini, B. Posteraro, V. Palmieri, M.

Sanguinetti, F. Bugli, A. Hartke, Expression profiling in a mammalian host reveals the

strong induction of genes encoding LysM domain-containing proteins in Enterococcus

faecium, Sci. Rep. 8 (2018) 12412. https://doi.org/10.1038/s41598-018-30882-z.

[70] C. Gough, L. Cottret, B. Lefebvre, J.J. Bono, Evolutionary history of plant LysM

receptor proteins related to root endosymbiosis, Front. Plant Sci. 9 (2018) 923.

https://doi.org/10.3389/fpls.2018.00923.

[71] V. Vanhorenbeeck, P. Jacquemin, F.P. Lemaigre, G.G. Rousseau, OC-3, a novel

mammalian member of the ONECUT class of transcription factors, Biochem. Biophys.

Res. Commun. 292 (2002) 848-854. https://doi.org/10.1006/bbrc.2002.6760.

[72] Y. Tan, Y. Yoshida, D.E. Hughes, R.H. Costa, Increased expression of hepatocyte

nuclear factor 6 stimulates hepatocyte proliferation during mouse liver regeneration,

Gastroenterology 130 (2006) 1283-1300. https://doi.org/10.1053/j.gastro.2006.01.010.

[73] F. Lehner, U. Kulik, J. Klempnauer, J. Borlak, Inhibition of the liver enriched protein

FOXA2 recovers HNF6 activity in human colon carcinoma and liver hepatoma cells,

PLoS One 5 (2010) e13344. https://doi.org/10.1371/journal.pone.0013344.

[74] L. Ovesen, M. Sørensen, J. Hannibal, L. Allingstrup, Electrical taste detection thresholds

and chemical smell detection thresholds in patients with cancer, Cancer 68 (1991) 2260-2265. https://doi.org/10.1002/1097-0142(19911115)68:10<2260::AID-CNCR2820681026>3.0.CO;2-W.

[75] K. Belqaid, Y. Orrevall, J. McGreevy, E. Månsson-Brahme, W. Wismer, C. Tishelman,

B.M. Bernhardson, Self-reported taste and smell alterations in patients under

investigation for lung cancer, Acta Oncol. 53 (2014) 1405-1412.

https://doi.org/10.3109/0284186X.2014.895035.

[76] S. Gao, Y.G. Chung, M.H. Parseghian, G.J. King, E.Y. Adashi, K.E. Latham, Rapid H1

linker histone transitions following fertilization or somatic cell nuclear transfer: Evidence

for a uniform developmental program in mice, Dev. Biol. 266 (2004) 62-75.

https://doi.org/10.1016/j.ydbio.2003.10.003.

[77] T. Teranishi, M. Tanaka, S. Kimoto, Y. Ono, K. Miyakoshi, T. Kono, Y. Yoshimura,

Rapid replacement of somatic linker histones with the oocyte-specific linker histone

H1foo in nuclear transfer, Dev. Biol. 266 (2004) 76-86.

https://doi.org/10.1016/j.ydbio.2003.10.004.

[78] L. Gao, H. Xie, L. Dong, J. Zou, J. Fu, X. Gao, L. Ou, S. Xiang, H. Song, Gankyrin is

essential for hypoxia enhanced metastatic potential in breast cancer cells, Mol. Med.

Rep. 9 (2014) 1032-1036. https://doi.org/10.3892/mmr.2013.1860.

[79] S. Dawson, H. Higashitsuji, A.J. Wilkinson, J. Fujita, R.J. Mayer, Gankyrin: a new

oncoprotein and regulator of pRb and p53, Trends Cell Biol. 16 (2006) 229-233.

https://doi.org/10.1016/j.tcb.2006.03.001.

[80] J. McIntosh, G. Dennison, J.M.P. Holly, C. Jarrett, A. Frankow, E.J. Foulstone, Z.E.

Winters, C.M. Perks, IGFBP-3 can either inhibit or enhance EGF-mediated growth of

breast epithelial cells dependent upon the presence of fibronectin, J. Biol. Chem. 285

(2010) 38788-38800. https://doi.org/10.1074/jbc.M110.177311.

Figure captions

Figure 1. Overview of microarray signatures. (A-D) Volcano plots of DEGs in the GSE2088, GSE6044, GSE19188 and TCGA datasets. Red and blue points represent statistically significant up-regulated and down-regulated DEGs, respectively. (E) Venn diagrams of DEGs across the three GEO datasets and the TCGA dataset.

Figure 2. GO enrichment analysis of DEGs. (A) Up-regulated genes. (B) Down-regulated genes. The y-axis shows significantly enriched GO terms, and x-axis represents significance of enrichment.

Figure 3. Protein-protein interaction network analysis of DEGs. (A) PPI network of 124 significantly differentially expressed genes. Red: up-regulated genes; Green: significantly down-regulated genes; (B) The most significant module of the PPI network. Node size is positively related to degree of expression.

Figure 4. WGCNA analysis of DEGs within the TCGA data. (A) Left panel: the abscissa is the power value; the ordinate is the R² value after linear regression of -log10(k) and log10(P(k)), where k is the connectivity of the gene nodes and P(k) is the probability of such a node. Horizontal line: 0.9. Right panel: the abscissa is the power value; the ordinate is the mean connectivity of gene nodes. (B) Interrelationship between different gene modules. Colors in the heat map represent Pearson correlation coefficients between gene modules. Expression levels of gene modules are represented by the first principal component. (C) Correlation coefficients between gene modules and phenotypes, represented as a heat map. Phenotypes were quantified using Boolean variables (1 represents occurrence, 0 represents no occurrence). p-values are displayed in brackets. (D) Gene correlation scatter plots for the blue, yellow and turquoise modules. The x-axis represents module membership (MM), i.e. the Pearson correlation coefficient between gene and module. The y-axis represents gene significance (GS), i.e. the importance of the gene for the phenotype, measured as the Pearson correlation coefficient between gene and phenotype (the phenotype is represented by a Boolean variable). In the yellow, blue and turquoise modules, the upper quartile value of MM was 0.756, 0.683 and 0.708, respectively; the upper quartile value of GS was 0.431, 0.390 and 0.631, respectively.

Figure 5. Results of Lasso regression and multivariate Cox regression. (A) Distribution of risk scores. Patients are arranged along the abscissa in order of risk score, and the y-axis represents the risk score; scores greater than 10 are plotted as 10. Patients with a score less than the median are plotted in green, those with a score equal to or greater than the median in red. (B) Distribution of survival time. Patients are arranged along the abscissa in order of risk score, and the ordinate represents patient survival time. (C) Heat map of the expression levels of the 20 most significant genes. Patient risk scores increase from left to right. (D) Results of Lasso regression with 10-fold cross-validation repeated 1000 times. λ was chosen where the partial likelihood deviance was smallest. (E) Coefficient curves. Different colored lines represent the coefficient sizes of individual genes at different values of λ. The abscissa represents log(λ), and the numbers at the top give the number of non-zero coefficients under that penalty factor. (F) Survival curves for patients with different risk scores. Patients were divided into two groups according to the median risk score. Blue represents patients with a lower risk score. p-value < 0.001.

Figure 6. (A) Forest plot for the multivariate Cox regression. The HR value is e^β. The figure displays the HR of each gene with its 95% confidence interval and associated p-value. An HR of less than 1 indicates that high gene expression improves prognosis, and an HR greater than 1 indicates the opposite. (B) ROC curves for the model representing 1-, 3- and 5-year predictions, from left to right. The values in brackets are the areas under the curve.
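The module membership (MM) and gene significance (GS) measures defined in the Figure 4 caption are both Pearson correlation coefficients, so they can be sketched in a few lines of Python. In this sketch the module eigengene (first principal component of the module) is assumed to be already computed, and all numeric values are made up for illustration; the original analysis used the WGCNA R package rather than this code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical data for six samples (three normal, three tumor).
phenotype = [0, 0, 0, 1, 1, 1]                   # Boolean trait, as in the caption
gene = [1.0, 1.2, 0.9, 2.1, 2.4, 2.0]            # expression of one gene
eigengene = [-1.0, -0.8, -1.1, 0.9, 1.2, 0.8]    # assumed module eigengene

mm = pearson(gene, eigengene)   # module membership of the gene
gs = pearson(gene, phenotype)   # gene significance for the phenotype
print(f"MM = {mm:.3f}, GS = {gs:.3f}")
```

A gene whose MM and GS both lie above the upper quartile values reported for its module would, under this reading of the caption, be a candidate for the hub gene list.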

Figures

Figure 1


Figure 2

Figure 3

Figure 4

Figure 5

Forest plot of hazard ratios (multivariate Cox regression; N = 473 for every gene; x-axis: hazard ratio, 0.5 to 2.5)

Gene        HR      95% CI         P-value
KLK8        1.01    (1.00-1.02)    0.01 *
POPDC3      1.02    (1.01-1.03)    0.002 **
IGLL1       1.05    (1.02-1.08)    <0.001 ***
LYSMD1      0.96    (0.93-0.98)    0.001 **
PLEKHA6     1.02    (1.00-1.03)    0.019 *
PTGIS       1.01    (1.00-1.03)    0.086
ANKFN1      1.15    (1.05-1.26)    0.002 **
TM4SF19     1.03    (1.01-1.05)    <0.001 ***
JAG1        1.00    (1.00-1.00)    <0.001 ***
OR2W3       1.12    (1.01-1.25)    0.031 *
ONECUT3     0.62    (0.40-0.96)    0.032 *
RALGAPA2    1.02    (1.00-1.04)    0.08
RPH3AL      1.03    (1.01-1.06)    0.015 *
PCDHGA7     1.07    (0.99-1.16)    0.073
C10orf55    1.06    (1.01-1.12)    0.028 *
ACPT        1.09    (1.01-1.18)    0.031 *
MMP12       1.00    (1.00-1.00)    0.033 *
H1FOO       1.79    (1.31-2.44)    <0.001 ***
CT45A10     1.01    (1.00-1.02)    0.009 **
UGT2A1      1.03    (1.02-1.04)    <0.001 ***

Events: 150; Global p-value (Log-Rank): 1.0846e-19; AIC: 1379.95; Concordance Index: 0.78

Figure 6

ROC curves (x-axis: false positive rate; y-axis: true positive rate; both from 0.0 to 1.0): 1-year ROC curve (AUC = 0.828); 3-year ROC curve (AUC = 0.826); 5-year ROC curve (AUC = 0.824).

Figure 7

Supplementary materials

Table 1. Detailed information of the GEO datasets.

GEO accession    Platform    LUSC samples    Normal samples    Total
GSE2088          GPL962      48              28                76
GSE6044          GPL201      9               4                 13
GSE19188         GPL570      27              65                92
Total            -           84              97                181

Table 2. KEGG pathway enrichment analysis of DEGs.

Term                                         Count    %         P-value     Genes

Up-regulated
hsa04110: Cell cycle                         7        8.2353    0.0002      MAD2L1, TTK, PRKDC, PTTG1, SFN, CCNA2, MCM6
hsa04512: ECM-receptor interaction           6        7.0588    0.0004      SDC1, COL3A1, COL1A1, COL11A1, THBS2, SPP1
hsa05146: Amoebiasis                         4        4.7059    0.0413      COL3A1, COL1A1, SERPINB3, COL11A1

Down-regulated
hsa00982: Drug metabolism-cytochrome P450    5        0.0872    3.03E-05    FMO5, FMO3, MAOB, ADH1B, ALDH3B1
hsa00350: Tyrosine metabolism                3        0.0523    0.0041      MAOB, ADH1B, ALDH3B1
hsa00360: Phenylalanine metabolism           2        0.0349    0.0460      MAOB, ALDH3B1

Table 3. MM and GS scores of eight significant genes.

Score    AURKA    CCNA2    TPX2     TTK      AURKB    RAD51    KIF23    KPNA2
MM       0.732    0.699    0.69     0.738    0.7      0.74     0.8      0.776
GS       0.469    0.445    0.398    0.412    0.43     0.475    0.5      0.461
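In WGCNA, module membership (MM) is commonly computed as the Pearson correlation between a gene's expression profile and the module eigengene, and gene significance (GS) as the absolute correlation between the gene's expression and the trait of interest. The sketch below illustrates that calculation on made-up numbers. The helper `pearson`, the toy vectors and the binary trait coding are hypothetical and are not taken from the study's data, and the exact GS definition used for Table 3 is assumed rather than confirmed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data for six samples (not from the study).
gene_expression = [2.1, 3.4, 2.9, 5.0, 4.2, 5.6]
module_eigengene = [-0.40, -0.10, -0.20, 0.30, 0.10, 0.30]
trait = [0, 0, 0, 1, 1, 1]  # assumed coding: 0 = normal, 1 = tumour

# MM: correlation of the gene with the module eigengene.
mm = pearson(gene_expression, module_eigengene)
# GS: absolute correlation of the gene with the trait.
gs = abs(pearson(gene_expression, trait))
print(f"MM = {mm:.3f}, GS = {gs:.3f}")
```

Genes that score high on both measures are the usual candidates for hub genes within a trait-associated module.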

Sample clustering dendrogram (y-axis: Height, 0 to 60000; leaves: TCGA sample barcodes).

Figure 2. Dendrogram of the hierarchical clustering results before any samples were removed.

Bubble plots (x-axis: Fold enrichment; y-axis: Term; bubble size: Gene number; bubble color: -log10(P.value)).

Top panel terms: Ubiquitin mediated proteolysis; Spliceosome; Pyrimidine metabolism; Purine metabolism; Proteasome; Progesterone-mediated oocyte maturation; p53 signaling pathway; Oocyte meiosis; One carbon pool by folate; Mismatch repair; Glyoxylate and dicarboxylate metabolism; DNA replication; Citrate cycle (TCA cycle); Cell cycle; Base excision repair; Aminoacyl-tRNA biosynthesis.

Middle panel terms: Vascular smooth muscle contraction; Toll-like receptor signaling pathway; Tight junction; TGF-beta signaling pathway; Sphingolipid metabolism; SNARE interactions in vesicular transport; Renal cell carcinoma; PPAR signaling pathway; Pathways in cancer; Neurotrophin signaling pathway; MAPK signaling pathway; Lysosome; Long-term potentiation; Leukocyte transendothelial migration; Jak-STAT signaling pathway; Insulin signaling pathway; Glycosaminoglycan degradation; Focal adhesion; Fatty acid metabolism; Endocytosis; Complement and coagulation cascades; Adherens junction.

Bottom panel terms: Viral myocarditis; Type I diabetes mellitus; Toll-like receptor signaling pathway; T cell receptor signaling pathway; Systemic lupus erythematosus; Prion diseases; Primary immunodeficiency; Other glycan degradation; Natural killer cell mediated cytotoxicity; Lysosome; Leukocyte transendothelial migration; Intestinal immune network for IgA production; Hematopoietic cell lineage; Graft-versus-host disease; Endocytosis; Cytosolic DNA-sensing pathway; Cytokine-cytokine receptor interaction; Chemokine signaling pathway; Cell adhesion molecules (CAMs); Autoimmune thyroid disease; Asthma; Antigen processing and presentation; Allograft rejection.

Figure 3. Bubble plots of the KEGG pathway analysis for the three gene modules. The top, middle and bottom panels show the results for the blue, turquoise and yellow modules, respectively. Bubble size represents the number of genes, and bubble color represents the level of significance. The abscissa is the fold enrichment, and the ordinate lists the enriched pathway terms.
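The two quantities plotted for each bubble, fold enrichment and -log10(P.value), can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: enrichment tools differ in the exact test they apply, so the plain one-sided hypergeometric test used here is an assumption, and the function names and all gene counts are hypothetical.

```python
import math

def hypergeom_upper_tail(k, n_list, n_pathway, n_background):
    """P(X >= k) when n_list genes are drawn from n_background genes,
    n_pathway of which are annotated to the pathway."""
    total = math.comb(n_background, n_list)
    upper = min(n_list, n_pathway)
    hits = sum(
        math.comb(n_pathway, i) * math.comb(n_background - n_pathway, n_list - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def bubble_coordinates(k, n_list, n_pathway, n_background):
    """Return (fold enrichment, -log10 p) for one pathway term."""
    # Fold enrichment: pathway fraction in the gene list over that in the background.
    fold = (k / n_list) / (n_pathway / n_background)
    p_value = hypergeom_upper_tail(k, n_list, n_pathway, n_background)
    return fold, -math.log10(p_value)

# Hypothetical counts: 12 of 200 module genes fall in a pathway that holds
# 120 of 20000 background genes.
fold, neg_log_p = bubble_coordinates(k=12, n_list=200, n_pathway=120, n_background=20000)
print(f"fold enrichment = {fold:.1f}, -log10(p) = {neg_log_p:.2f}")
```

In a plot of this kind, `k` would set the bubble size, `fold` the horizontal position and `neg_log_p` the color.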

GO biological process bar plot (x-axis: Gene count, 0 to 100; bar color: -log10(P.value)). Terms listed include GO:0007049 cell cycle, GO:0022402 cell cycle process, GO:0022403 cell cycle phase, GO:0000278 mitotic cell cycle, GO:0000279 M phase and GO:0051301 cell division.

GO biological process bar plot (bar color: -log10(P.value)). Terms listed include GO:0007242 intracellular signaling cascade, GO:0042127 regulation of cell proliferation, GO:0010033 response to organic substance, GO:0009611 response to wounding and GO:0043067 regulation of programmed cell death.
regulation of cell growth GO:0008064~regulation of actin polymer ization or depolymerization GO:0007585~respiratory gaseous exchange GO:0006956~complement activ ation GO:0006898~receptor−mediated endocytosis GO:0002541~activation of plasma proteins in volved in acute inflammator y response GO:0001819~positive regulation of cytokine production GO:0001763~morphogenesis of a branching structure GO:0001570~vasculogenesis GO:0070663~regulation of leuk ocyte proliferation GO:0060538~skeletal muscle organ development GO:0060326~cell chemotaxis GO:0052547~regulation of peptidase activity GO:0051495~positive regulation of cytoskeleton organization GO:0051099~positive regulation of binding GO:0050870~positive regulation of T cell activ ation GO:0050795~regulation of beha vior GO:0050730~regulation of peptidyl−tyrosine phosphor ylation GO:0046456~icosanoid biosynthetic process GO:0045785~positive regulation of cell adhesion GO:0045667~regulation of osteob last differentiation GO:0043407~negative regulation of MAP kinase activity GO:0043244~regulation of protein comple x disassembly GO:0033559~unsaturated fatty acid metabolic process GO:0033273~response to vitamin GO:0032944~regulation of monon uclear cell proliferation GO:0032570~response to progesterone stim ulus GO:0031100~organ regeneration GO:0030100~regulation of endocytosis GO:0016049~cell growth GO:0010638~positive regulation of organelle organization GO:0009266~response to temperature stimulus GO:0007519~skeletal muscle tissue development GO:0007259~JAK−STAT cascade GO:0006690~icosanoid metabolic process GO:0006643~membrane lipid metabolic process GO:0006636~unsaturated fatty acid biosynthetic process GO:0002683~negative regulation of immune system process GO:0001818~negative regulation of cytokine production GO:0050921~positive regulation of chemotaxis GO:0050920~regulation of chemotaxis GO:0048520~positive regulation of behavior GO:0045936~negative regulation of phosphate metabolic process GO:0045765~regulation of 
angiogenesis GO:0045088~regulation of innate imm une response GO:0043433~negative regulation of transcription factor activity GO:0043392~negative regulation of DNA binding GO:0043388~positive regulation of DNA binding GO:0043242~negative regulation of protein comple x disassembly GO:0042326~negative regulation of phosphor ylation GO:0030833~regulation of actin filament polymer ization GO:0030595~leukocyte chemotaxis GO:0030162~regulation of proteolysis GO:0010565~regulation of cellular k etone metabolic process GO:0010563~negative regulation of phosphorus metabolic process GO:0007588~excretion GO:0006958~complement activation, classical pathway GO:0006953~acute−phase response GO:0006937~regulation of m uscle contraction GO:0006800~oxygen and reactive oxygen species metabolic process GO:0002455~humoral immune response mediated by circulating immunoglobulin GO:0001936~regulation of endothelial cell prolif eration GO:0001569~patterning of blood vessels GO:0070507~regulation of microtub ule cytoskeleton organization GO:0055088~lipid homeostasis GO:0051592~response to calcium ion GO:0051591~response to cAMP GO:0051092~positive regulation of NF−kappaB tr anscription factor activity GO:0050818~regulation of coagulation GO:0050729~positive regulation of inflammator y response GO:0048008~platelet−der ived growth factor receptor signaling pathway GO:0045669~positive regulation of osteoblast differentiation GO:0045639~positive regulation of myeloid cell differentiation GO:0034329~cell junction assemb ly GO:0033189~response to vitamin A GO:0032886~regulation of microtub ule−based process GO:0032582~negative regulation of gene−specific tr anscription GO:0031110~regulation of microtub ule polymerization or depolymerization GO:0030301~cholesterol tr ansport GO:0022406~membrane docking GO:0019217~regulation of fatty acid metabolic process GO:0017015~regulation of tr ansforming growth factor beta receptor signaling pathw ay GO:0016050~vesicle organization GO:0015918~sterol 
transport GO:0015718~monocarboxylic acid transport GO:0008360~regulation of cell shape GO:0008277~regulation of G−protein coupled receptor protein signaling pathw ay GO:0006944~membrane fusion GO:0002761~regulation of myeloid leukocyte differentiation GO:0002673~regulation of acute inflammator y response GO:0001824~blastocyst development GO:0055092~sterol homeostasis GO:0051492~regulation of stress fiber f ormation GO:0051017~actin filament b undle formation GO:0046677~response to antibiotic GO:0046425~regulation of JAK−STAT cascade GO:0045807~positive regulation of endocytosis GO:0045766~positive regulation of angiogenesis GO:0045621~positive regulation of lymphocyte diff erentiation GO:0045582~positive regulation of T cell diff erentiation GO:0045428~regulation of nitr ic oxide biosynthetic process GO:0045216~cell−cell junction organization GO:0043450~alkene biosynthetic process GO:0043449~cellular alkene metabolic process GO:0042632~cholesterol homeostasis GO:0042509~regulation of tyrosine phosphor ylation of STAT protein GO:0033993~response to lipid GO:0032663~regulation of inter leukin−2 production GO:0032526~response to retinoic acid GO:0032272~negative regulation of protein polymer ization GO:0032231~regulation of actin filament b undle formation GO:0031348~negative regulation of defense response GO:0031333~negative regulation of protein comple x assembly GO:0030837~negative regulation of actin filament polymer ization GO:0030834~regulation of actin filament depolymer ization GO:0030510~regulation of BMP signaling pathw ay GO:0019370~leukotriene biosynthetic process GO:0010883~regulation of lipid stor age GO:0010743~regulation of foam cell differentiation GO:0010594~regulation of endothelial cell migr ation GO:0007566~embryo implantation GO:0006957~complement activ ation, alternative pathway GO:0006691~leukotriene metabolic process GO:0002763~positive regulation of myeloid leukocyte differentiation GO:0002687~positive regulation of leukocyte migration 
GO:0002685~regulation of leuk ocyte migration GO:0001933~negative regulation of protein amino acid phosphor ylation GO:0000188~inactivation of MAPK activity GO:0051693~actin filament capping GO:0050873~brown fat cell differentiation GO:0048286~lung alveolus development GO:0045861~negative regulation of proteolysis GO:0045429~positive regulation of nitric oxide biosynthetic process GO:0042531~positive regulation of tyrosine phosphor ylation of STAT protein GO:0042517~positive regulation of tyrosine phosphor ylation of Stat3 protein GO:0042516~regulation of tyrosine phosphor ylation of Stat3 protein GO:0033135~regulation of peptidyl−ser ine phosphorylation GO:0032273~positive regulation of protein polymer ization GO:0030835~negative regulation of actin filament depolymer ization GO:0022409~positive regulation of cell−cell adhesion GO:0022407~regulation of cell−cell adhesion GO:0015669~gas transport GO:0010888~negative regulation of lipid stor age GO:0010595~positive regulation of endothelial cell migr ation GO:0010332~response to gamma r adiation GO:0007043~cell−cell junction assemb ly GO:0007041~lysosomal tr ansport GO:0006471~protein amino acid ADP−r ibosylation GO:0002690~positive regulation of leukocyte chemotaxis GO:0002688~regulation of leuk ocyte chemotaxis GO:0002548~monocyte chemotaxis GO:0001937~negative regulation of endothelial cell prolif eration GO:0001910~regulation of leuk ocyte mediated cytotoxicity GO:0001825~blastocyst formation GO:0070613~regulation of protein processing GO:0070102~interleukin−6−mediated signaling pathw ay GO:0060416~response to growth hormone stimulus GO:0060396~growth hormone receptor signaling pathway GO:0060395~SMAD protein signal tr ansduction GO:0060389~pathway−restricted SMAD protein phosphor ylation GO:0060324~face development GO:0051895~negative regulation of focal adhesion formation GO:0051893~regulation of focal adhesion formation GO:0051482~elevation of cytosolic calcium ion concentr ation during G−protein signaling, 
coupled to IP3 second messenger (phospholipase C activ ating) GO:0048745~smooth muscle tissue development GO:0048010~vascular endothelial growth factor receptor signaling pathway GO:0046785~microtubule polymerization GO:0046457~prostanoid biosynthetic process GO:0045651~positive regulation of macrophage differentiation GO:0045649~regulation of macrophage diff erentiation GO:0044252~negative regulation of multicellular organismal metabolic process GO:0043536~positive regulation of blood vessel endothelial cell migr ation GO:0035313~wound healing, spreading of epider mal cells GO:0033138~positive regulation of peptidyl−ser ine phosphorylation GO:0032673~regulation of inter leukin−4 production GO:0032365~intracellular lipid transport GO:0022614~membrane to membrane docking GO:0014910~regulation of smooth m uscle cell migration GO:0010955~negative regulation of protein matur ation by peptide bond cleavage GO:0010953~regulation of protein matur ation by peptide bond cleavage GO:0010885~regulation of cholesterol stor age GO:0010812~negative regulation of cell−substr ate adhesion GO:0010745~negative regulation of foam cell differentiation GO:0010713~negative regulation of collagen metabolic process GO:0010712~regulation of collagen metabolic process GO:0006929~substrate−bound cell migration GO:0002675~positive regulation of acute inflammator y response GO:0002456~T cell mediated imm unity GO:0001960~negative regulation of cytokine−mediated signaling pathw ay GO:0001953~negative regulation of cell−matr ix adhesion GO:0001909~leukocyte mediated cytotoxicity GO:0001516~prostaglandin biosynthetic process GO:0043320~natural killer cell degranulation GO:0030185~nitric oxide transport GO:0002384~hepatic imm une response GO:0001869~negative regulation of complement activ ation, lectin pathway GO:0001868~regulation of complement activ ation, lectin pathway GO:0001765~membrane raft formation

0 20 40 60 80 Gene count

GO:0006955~immune response GO:0006952~defense response GO:0009611~response to wounding GO:0007242~intracellular signaling cascade GO:0016265~death GO:0008219~cell death GO:0006954~inflammator y response GO:0006508~proteolysis GO:0022610~biological adhesion GO:0007155~cell adhesion GO:0042127~regulation of cell prolif eration GO:0012501~programmed cell death GO:0006915~apoptosis GO:0043067~regulation of programmed cell death GO:0042981~regulation of apoptosis GO:0019882~antigen processing and presentation GO:0010941~regulation of cell death GO:0010033~response to organic substance GO:0002684~positive regulation of immune system process GO:0001775~cell activation GO:0045321~leukocyte activation GO:0007267~cell−cell signaling GO:0046649~lymphocyte activ ation GO:0048584~positive regulation of response to stim ulus GO:0016192~vesicle−mediated transport GO:0007610~behavior GO:0050865~regulation of cell activ ation GO:0007626~locomotory behavior GO:0042330~taxis GO:0007010~cytoskeleton organization GO:0006935~chemotaxis GO:0070271~protein complex biogenesis GO:0051249~regulation of lymphocyte activ ation GO:0050778~positive regulation of immune response GO:0048534~hemopoietic or lymphoid organ de velopment GO:0042110~T cell activ ation GO:0006928~cell motion GO:0006461~protein complex assembly GO:0002694~regulation of leuk ocyte activation GO:0002520~immune system development GO:0043068~positive regulation of programmed cell death GO:0043065~positive regulation of apoptosis GO:0030097~hemopoiesis GO:0010942~positive regulation of cell death GO:0008284~positive regulation of cell prolif eration GO:0008283~cell proliferation GO:0050867~positive regulation of cell activ ation GO:0050863~regulation of T cell activ ation GO:0010647~positive regulation of cell comm unication GO:0009967~positive regulation of signal tr ansduction GO:0009617~response to bacter ium GO:0002252~immune effector process GO:0048002~antigen processing and presentation of peptide antigen 
GO:0030036~actin cytoskeleton organization GO:0030029~actin filament−based process GO:0016044~membrane organization GO:0012502~induction of progr ammed cell death GO:0009615~response to vir us GO:0006917~induction of apoptosis GO:0002696~positive regulation of leukocyte activation GO:0002504~antigen processing and presentation of peptide or polysacchar ide antigen via MHC class II GO:0051251~positive regulation of lymphocyte activ ation GO:0051094~positive regulation of developmental process GO:0010324~membrane invagination GO:0006897~endocytosis GO:0002521~leukocyte differentiation GO:0001817~regulation of cytokine production GO:0050870~positive regulation of T cell activ ation GO:0045597~positive regulation of cell diff erentiation GO:0045087~innate immune response GO:0043086~negative regulation of catalytic activity GO:0030098~lymphocyte differentiation GO:0007264~small GTPase mediated signal transduction GO:0002460~adaptive immune response based on somatic recombination of imm une receptors built from immunoglobulin superfamily domains GO:0002449~lymphocyte mediated imm unity GO:0002443~leukocyte mediated immunity GO:0002250~adaptive immune response GO:0045619~regulation of lymphocyte diff erentiation GO:0010876~lipid localization GO:0010740~positive regulation of protein kinase cascade GO:0010627~regulation of protein kinase cascade GO:0006968~cellular defense response GO:0006916~anti−apoptosis GO:0055074~calcium ion homeostasis GO:0055065~metal ion homeostasis GO:0045580~regulation of T cell diff erentiation GO:0030217~T cell differentiation GO:0030005~cellular di−, tr i−valent inorganic cation homeostasis GO:0019724~B cell mediated imm unity GO:0016064~immunoglobulin mediated immune response GO:0008015~blood circulation GO:0006959~humoral immune response GO:0006875~cellular metal ion homeostasis GO:0006874~cellular calcium ion homeostasis GO:0003013~circulator y system process GO:0070663~regulation of leuk ocyte proliferation GO:0050670~regulation of 
lymphocyte prolif eration GO:0045621~positive regulation of lymphocyte diff erentiation GO:0042060~wound healing GO:0032944~regulation of monon uclear cell proliferation GO:0019884~antigen processing and presentation of e xogenous antigen GO:0006869~lipid transport GO:0002697~regulation of imm une effector process GO:0002474~antigen processing and presentation of peptide antigen via MHC class I GO:0002253~activation of immune response GO:0051604~protein maturation GO:0045582~positive regulation of T cell diff erentiation GO:0043123~positive regulation of I−kappaB kinase/NF−kappaB cascade GO:0043122~regulation of I−kappaB kinase/NF−kappaB cascade GO:0016485~protein processing - log10(P.value) GO:0007015~actin filament organization GO:0002822~regulation of adaptiv e immune response based on somatic recombination of imm une receptors built from immunoglobulin superfamily domains GO:0002819~regulation of adaptiv e immune response GO:0002699~positive regulation of immune effector process 50 GO:0002478~antigen processing and presentation of e xogenous peptide antigen GO:0002237~response to molecule of bacter ial origin GO:0070665~positive regulation of leukocyte proliferation 40 GO:0051605~protein maturation by peptide bond cleavage GO:0050871~positive regulation of B cell activ ation GO:0050864~regulation of B cell activ ation GO:0050727~regulation of inflammator y response 30 GO:0050671~positive regulation of lymphocyte prolif eration GO:0048871~multicellular organismal homeostasis GO:0046634~regulation of alpha−beta T cell activ ation GO:0045058~T cell selection 20 GO:0042129~regulation of T cell prolif eration GO:0042108~positive regulation of cytokine biosynthetic process GO:0042035~regulation of cytokine biosynthetic process 10 GO:0033077~T cell differentiation in the thymus GO:0032946~positive regulation of mononuclear cell proliferation GO:0002824~positive regulation of adaptive immune response based on somatic recombination of imm une receptors built from 
immunoglobulin superfamily domains GO:0002821~positive regulation of adaptive immune response GO:0002708~positive regulation of lymphocyte mediated imm unity GO:0002706~regulation of lymphocyte mediated imm unity GO:0002705~positive regulation of leukocyte mediated immunity GO:0002703~regulation of leuk ocyte mediated immunity GO:0051384~response to glucocor ticoid stimulus GO:0050730~regulation of peptidyl−tyrosine phosphor ylation GO:0046635~positive regulation of alpha−beta T cell activ ation GO:0045061~thymic T cell selection GO:0042102~positive regulation of T cell prolif eration GO:0032663~regulation of inter leukin−2 production GO:0032496~response to lipopolysacchar ide GO:0031960~response to cor ticosteroid stimulus GO:0022415~viral reproductive process GO:0019886~antigen processing and presentation of e xogenous peptide antigen via MHC class II GO:0019883~antigen processing and presentation of endogenous antigen GO:0019835~cytolysis GO:0016032~viral reproduction GO:0009306~protein secretion GO:0007229~integrin−mediated signaling pathw ay GO:0006909~phagocytosis GO:0002683~negative regulation of immune system process GO:0002495~antigen processing and presentation of peptide antigen via MHC class II GO:0001894~tissue homeostasis GO:0070661~leukocyte proliferation GO:0051346~negative regulation of hydrolase activity GO:0050848~regulation of calcium−mediated signaling GO:0046890~regulation of lipid biosynthetic process GO:0046651~lymphocyte proliferation GO:0045577~regulation of B cell diff erentiation GO:0045086~positive regulation of inter leukin−2 biosynthetic process GO:0045076~regulation of inter leukin−2 biosynthetic process GO:0045059~positive thymic T cell selection GO:0043368~positive T cell selection GO:0032943~mononuclear cell proliferation GO:0031343~positive regulation of cell killing GO:0031341~regulation of cell killing GO:0030301~cholesterol transport GO:0019915~lipid storage GO:0019885~antigen processing and presentation of endogenous peptide 
antigen via MHC class I GO:0019218~regulation of steroid metabolic process GO:0019058~viral infectious cycle GO:0018904~organic ether metabolic process GO:0015918~sterol transport GO:0008360~regulation of cell shape GO:0008154~actin polymer ization or depolymer ization GO:0007159~leukocyte adhesion GO:0007033~vacuole organization GO:0006958~complement activ ation, classical pathway GO:0006956~complement activ ation GO:0006662~glycerol ether metabolic process GO:0006641~triglyceride metabolic process GO:0006639~acylglycerol metabolic process GO:0006638~neutral lipid metabolic process GO:0006518~peptide metabolic process GO:0006487~protein amino acid N−link ed glycosylation GO:0002768~immune response−regulating cell surface receptor signaling pathway GO:0002757~immune response−activating signal transduction GO:0002711~positive regulation of T cell mediated imm unity GO:0002709~regulation of T cell mediated imm unity GO:0002700~regulation of production of molecular mediator of imm une response GO:0002541~activation of plasma proteins in volved in acute inflammator y response GO:0002483~antigen processing and presentation of endogenous peptide antigen GO:0002455~humoral immune response mediated by circulating immunoglobulin GO:0002429~immune response−activating cell surface receptor signaling pathway GO:0001912~positive regulation of leukocyte mediated cytotoxicity GO:0001910~regulation of leuk ocyte mediated cytotoxicity GO:0001816~cytokine production GO:0065005~protein−lipid comple x assembly GO:0050850~positive regulation of calcium−mediated signaling GO:0046849~bone remodeling GO:0046640~regulation of alpha−beta T cell prolif eration GO:0046638~positive regulation of alpha−beta T cell diff erentiation GO:0046637~regulation of alpha−beta T cell diff erentiation GO:0045885~positive regulation of sur vival gene product expression GO:0045884~regulation of sur vival gene product expression GO:0045579~positive regulation of B cell diff erentiation GO:0045453~bone 
resorption GO:0045445~myoblast differentiation GO:0045060~negative thymic T cell selection GO:0043383~negative T cell selection GO:0043372~positive regulation of CD4−positive, alpha beta T cell diff erentiation GO:0043370~regulation of CD4−positiv e, alpha beta T cell diff erentiation GO:0034377~plasma lipoprotein par ticle assembly GO:0033344~cholesterol efflux GO:0032760~positive regulation of tumor necrosis f actor production GO:0032374~regulation of cholesterol tr ansport GO:0032371~regulation of sterol tr ansport GO:0032020~ISG15−protein conjugation GO:0030890~positive regulation of B cell prolif eration GO:0030101~natural killer cell activ ation GO:0019059~initiation of vir al infection GO:0009405~pathogenesis GO:0007040~lysosome organization GO:0006026~aminoglycan catabolic process GO:0002702~positive regulation of production of molecular mediator of imm une response GO:0002637~regulation of imm unoglobulin production GO:0002285~lymphocyte activ ation during immune response GO:0001916~positive regulation of T cell mediated cytoto xicity GO:0001914~regulation of T cell mediated cytoto xicity GO:0048627~myoblast development GO:0034447~very−low−density lipoprotein par ticle clearance GO:0034382~chylomicron remnant clearance GO:0006923~cleavage of cytoskeletal proteins during apoptosis GO:0006922~cleavage of lamin GO:0002481~antigen processing and presentation of e xogenous protein antigen via MHC class Ib , TAP−dependent GO:0002477~antigen processing and presentation of e xogenous peptide antigen via MHC class Ib GO:0002475~antigen processing and presentation via MHC class Ib GO:0002428~antigen processing and presentation of peptide antigen via MHC class Ib

0 25 50 75 Gene count

Figure 4. GO analysis of the three gene modules. The top, middle and bottom panels show the results for the blue, turquoise and yellow modules, respectively. The y-axis shows significantly enriched GO terms, and the x-axis represents the significance of enrichment.