<<

Available online at www.jbr-pub.org.cn

Open Access at PubMed Central

The Journal of Biomedical Research, 2021 35(1): 21–35

Original Article

Identification of four novel prognosis biomarkers and potential therapeutic drugs for human colorectal cancer by bioinformatics analysis

Zhen Sun1,2, Chen Liu1, Steven Y. Cheng1,3,✉

1Department of Medical Genetics, 2Department of Pathology and Pathophysiology, 3Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China.

Abstract Colorectal cancer (CRC) is one of the most deadly cancers in the world with few reliable biomarkers that have been selected into clinical guidelines for prognosis of CRC patients. In this study, mRNA microarray datasets GSE113513, GSE21510, GSE44076, and GSE32323 were obtained from the Gene Expression Omnibus (GEO) and analyzed with bioinformatics to identify hub genes in CRC development. Differentially expressed genes (DEGs) were analyzed using the GEO2R tool. Gene ontology (GO) and KEGG analyses were performed through the DAVID database. STRING database and Cytoscape software were used to construct a protein-protein interaction (PPI) network and identify key modules and hub genes. Survival analyses of the DEGs were performed on GEPIA database. The Connectivity Map database was used to screen potential drugs. A total of 865 DEGs were identified, including 374 upregulated and 491 downregulated genes. These DEGs were mainly associated with metabolic pathways, pathways in cancer, cell cycle and so on. The PPI network was identified with 863 nodes and 5817 edges. Survival analysis revealed that HMMR, PAICS, ETFDH, and SCG2 were significantly associated with overall survival of CRC patients. And blebbistatin and sulconazole were identified as candidate drugs. In conclusion, our study found four hub genes involved in CRC, which may provide novel potential biomarkers for CRC prognosis, and two potential candidate drugs for CRC.

Keywords: colorectal cancer, Gene Expression Omnibus, biomarkers, bioinformatics analysis

Introduction deaths were reported each year in the world[1]. With the understanding of pathophysiology of the disease, Nowadays, colorectal cancer (CRC) is one of the different treatment options to improve the survival most deadly cancers and almost 900 000 CRC-related rates of CRC patients have been developed in the

✉Corresponding author: Steven Y. Cheng, Department of Medical 25 August 2020; Published online: 22 October 2020 Genetics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and CLC number: R735.3, Document code: A Treatment, Collaborative Innovation Center for Cancer The authors reported no conflict of interests. Personalized Medicine, Nanjing Medical University, 101 Longmian This is an open access article under the Creative Commons Attribu- Avenue, Nanjing, Jiangsu 211166, China. Tel/Fax: +86-25- tion (CC BY 4.0) license, which permits others to distribute, remix, 86869463, E-mail: [email protected]. adapt and build upon this work, for commercial use, provided the Received: 13 February 2020; Revised: 12 August 2020; Accepted: original work is properly cited.

© 2021 by the Journal of Biomedical Research. https://doi.org/10.7555/JBR.34.20200021 22 Sun Z et al. J Biomed Res, 2021, 35(1) world. The 5-year survival rate of CRC patients was between CRC and normal tissues, we searched the >90% when the patients were diagnosed at early NCBI-GEO database to collect enough and adequate stages[2]. However, due to lacking early detection tissues. A total of 4 GEO datasets were selected, methods, many CRC patients were diagnosed at an including GSE113513, GSE21510, GSE44076, and advanced stage or in the metastasis status. And the 5- GSE32323. These mRNA profiles were based on year survival rate for those diagnosed with metastasis platform GPL15207 (GSE113513), GPL570 [3] was at approximately 12% . Recently, a new kind of (GSE21510 and GSE32323), and GPL13667 analysis method has been used to identify the (GSE44076). A total of 253 CRC samples and 203 differential expression genes between CRC and normal samples were chosen for this study, including normal tissues based on the high-throughput 14 pairs of cancer and normal samples in GSE113513, sequencing platforms, such as microarrays. This is a promising tool with extensive clinical applications, 124 CRC samples and 24 normal samples in including molecular diagnosis, prognosis prediction, GSE21510, 98 pairs of cancer and normal samples new drug targets discovery, etc[4–6]. Furthermore, plus 50 healthy donor tissues in GSE44076, and 17 microarray assay combining bioinformatics analysis pairs of cancer and normal samples in GSE32323. made it possible to analyze the gene expression on Identification of DEGs and data preprocessing mRNA level in CRC progression. For example, several studies have used this method to identify key To identify the DEGs, we used the NCBI-GEO2R genes in CRC development through comparing with online tool to analyze these datasets. Subsequently, normal samples, and showed that the key genes were adjusted P-value <0.05 and |log2(fold change)| >1 were involved in different signal pathways, biological set as the cutoff criteria to screen the significant DEGs processes, and molecular functions[7–12]. of each dataset. Finally, Venn diagrams were However, with a relatively limited degree of performed to get the overlap significant DEGs of the overlap, we still can not find reliable biomarkers or 4 datasets. drug targets. Therefore, the discovery of novel biomarkers for early detection and prognosis GO and KEGG pathway analyses of DEGs prediction of CRC is urgently required. To find the biological functional roles of DEGs, GO In the present study, we targeted to find key genes and KEGG pathway analyses were performed through to develop novel biomarkers or drug targets for CRC. DAVID database. Significant results of molecular Therefore, we chose four Gene Expression Omnibus function (MF), biological process (BP), cellular (GEO) datasets, GSE113513, GSE21510, GSE44076, component (CC), and biological pathways were and GSE32323, and used bioinformatics methods selected with P-value <0.05. to screen the significant differentially expressed genes (DEGs) between CRC tissues and normal PPI network construction and module analysis tissues. Gene ontology (GO) and KEGG Pathway The DEGs profiles were submitted to STRING analyses were used to find the biological roles of these database for exploring their potential interactions. DEGs through DAVID database. Furthermore, the PPI The interactions with a combined score >0.4 were network of DEGs was constructed and key modules or considered significant. Subsequently, the interaction hub genes were selected with Molecular Complex Detection (MCODE) plugin of Cytoscape software. files were downloaded and imported into Cytoscape And the clinical significance was validated by GEPIA software to construct the PPI network. The MCODE database. Finally, small active candidate molecules plugin was used to find key modules of the whole PPI were identified to develop new drugs through network with a degree cutoff=2, node score Connectivity Map (CMap) database. In brief, we cutoff=0.2, K-core=2, and max depth=100. The found four hub genes involved in CRC, which may hub genes were then selected with connectivity provide novel potential biomarkers for CRC degree >10. Furthermore, KEGG pathway analyses of prognosis, and two potential candidate drugs for CRC. the significant modules were performed with P-value <0.05. Materials and methods Analysis and validation of hub genes Data resources To verify the hub genes we found, we used GEPIA To explore the differential gene expression profiles database to analyze their expression and clinical Novel prognosis biomarkers and drugs for colorectal cancer 23 prognostic information in 270 CRC patients. And the edges (Fig. 3). Based on degree scores using the survival curve, stage analysis and box plot were MCODE plugin, two key modules were detected from performed to show the clinical implications of the whole PPI network complex. Module 1 contained hub genes. 61 nodes and 1648 edges, and DEGs were enriched in cell cycle, oocyte meiosis, progesterone-mediated Identification of small molecules oocyte maturation, DNA replication and p53 signaling To find potential small active molecules to develop pathway (Fig. 4A and B). Module 2 had 55 nodes and new drugs for treating CRC, we uploaded DEGs 625 edges, and these DEGs were enriched in probe profiles into the CMap database. This database chemokine signaling pathway, ribosome biogenesis in can help to predict small molecules that induce or eukaryotes, cytokine-cytokine receptor interaction, reverse gene expression signature with a score from pathways in cancer, purine metabolism, RNA −1 to 1. And the molecules which value from 0 closer polymerase, retrograde endocannabinoid signaling, TNF signaling pathway, legionellosis, regulation of to −1 were functioned as reversing the cancer cell status. lipolysis in adipocytes, NOD-like receptor signaling Results pathway, cytosolic DNA-sensing pathway and gastric acid secretion (Fig. 4C and D). Additionally, the top Identification of DEGs in CRC 20 hub genes, CDK1, CCNB1, MYC, CCNA2, MAD2L1, AURKA, TOP2A, CDC6, UBE2C, CHEK1, Analyzed with the GEO2R online tool, a total of RRM2, BUB1B, TTK, TRIP13, TPX2, BUB1, NCAPG, 1763, 4411, 2428, and 2276 DEGs were extracted KIF2C, KIF23, and MCM4 were identified with from GSE113513, GSE21510, GSE44076, and higher degrees of connectivity. These hub genes were GSE32323, respectively, using adjusted P-value <0.05 enriched in cell cycle, progesterone-mediated oocyte and |log2(fold change)| >1 as cutoff criteria. The maturation, oocyte meiosis, p53 signaling pathway, volcano plots of DEGs in each dataset were shown in and HTLV-I infection (Fig. 4E and F). Fig. 1A. And the Venn diagrams showed that 865 overlap DEGs were identified from these four Analysis and validation of hub genes datasets, including 374 significantly upregulated genes To validate the hub genes we got from this study, and 491 downregulated genes (Fig. 1B and we uploaded the hub genes list into GEPIA database Supplementary Table 1, available online). and explored the correlation between hub genes Enrichment analysis of DEGs expression and the clinical characteristics of CRC. It was found that HMMR, PAICS, ETFDH, and SCG2 To explore the biological functional roles of the were significant DEGs in 270 CRC samples from overlap DEGs, GO and KEGG analyses were GEPIA (Fig. 5A). And these four genes could performed on DAVID database. And the top 20 terms represent the important prognostic biomarkers for were listed in the charts (Fig. 2A–D and predicting the survival of CRC patients (Fig. 5B). Supplementary Table 2, available online). The GO Meanwhile, PAICS and SCG2 were related to the analysis results consist of three functional categories, stages of CRC progression (Fig. 5C). The summaries including BP, CC, and MF. In the BP group, DEGs of four hub genes were shown in Table 1. were mainly enriched in cell proliferation (Fig. 2A). Identification of related active small molecules In the CC group, DEGs were enriched in cytoplasm (Fig. 2B). And in the MF group, DEGs were enriched To search candidate small molecules for developing in protein binding (Fig. ). KEGG pathway analysis potential drugs to treat CRC, we uploaded DEGs showed that DEGs were enriched in metabolic probe profiles into the CMap database. And the pathways, pathways in cancer and cell cycle (Fig. 2D). predicted results were download and filtered with The details of the top 20 terms were listed in enrichment score <0 and P-value <0.05. The results Supplementary Table 2. were shown in Table 2. And Fig. 5D listed the top 20 small molecules with their enrichment scores and PPI network construction and module analysis P-values. Therefore, these small molecules may be the Using the STRING online database and Cytoscape targets to develop new drugs or therapies of CRC. software, a total of 865 DEGs were filtered into the Among these molecules, Blebbistatin and Sulconazole PPI network complex, containing 863 nodes and 5817 may be selected for new clinical trials. 24 Sun Z et al. J Biomed Res, 2021, 35(1)

A GSE113513 GSE21510 11.82 55.83

9.46 44.66

7.09 33.50 -value) -value) P P ( ( 10 10 4.73 22.33 –log –log

2.36 11.17

0 0 –5.41 –3.63 –1.86 –0.08 –1.69 –3.46 –5.24 –7.91 –5.22 –2.53 0.16 2.84 5.53 8.22

log2(fold change) log2(fold change)

GSE44076 GSE32323 118.01 118.01

94.41 94.41

70.81 70.81 -value) -value) P P ( ( 10 10 47.2 47.2 –log –Log

23.6 23.6

0 0 –7.81 –5.58 –3.34 –1.11 1.12 3.36 5.59 –4.91 –3.17 –1.42 0.32 2.06 3.81 5.55

log2(fold change) log2(fold change)

B GSE21510 GSE44076

GSE113513 2006 406 GSE32323 56 294 39

205 149 343 197

256 865 600 76 98 58

GSE21510 GSE44076 GSE21510 GSE44076

GSE113513 1263 270 GSE32323 GSE113513 770 142 GSE32323 22 149 20 27 141 20

50 76 181 111 166 69 161 90

112 374 405 146 491 195 28 53 48 44

4 53

Fig. 1 Identification of the DEGs in CRC. A: Volcano plots of gene expression profiles between CRC and normal samples from GSE113513, GSE21510, GSE44076 and GSE32323. Red dots: significantly upregulated genes in CRC; Green dots: remarkably downregulated genes in CRC. Adjusted P value <0.05 and |log2(fold change)| >1 were considered as significant criteria. B: Venn diagrams show that 865 overlap DEGs were found through GEO2R in the four datasets, including 374 upregulated DEGs and 491 downregulated DEGs. DEGs: differentially expressed genes; CRC: colorectal cancer. Novel prognosis biomarkers and drugs for colorectal cancer 25

A GO: 0006508~proteolysis GO: 0043066~negative regulation of apoptotic process GO: 0045893~positive regulation of transcription: DNA-templated GO: 0001666~response to hypoxia Count GO: 0006629~lipid metabolic process 15 GO: 0030198~extracellular matrix organization 20 GO: 0008284~positive regulation of cell proliferation 25 GO: 0010629~nagative regulation of gene expression 30 GO: 0042493~response to drug 35 GO: 0006260~DNA replication 40 GO: 0043065~positive regulation of apoptotic process

GO: 0000086~G2/M transition of mitotic cell cycle –log10 (P-value) GO: 0042127~regulation of cell proliferation 5 GO: 0006364~rRNA processing GO: 0000082~G1/S transition of mitotic cell cycle 4 GO: 0008285~nagative regulation of cell proliferation 3 GO: 0030855~epithelial cell differentiation 2 GO: 0008283~cell proliferation GO: 0051301~cell division GO: 0007067~mitotic nuclear division

0 0.01 0.02 0.03 0.04 Rich factor

B GO: 0045121~membrane raft GO: 0005886~plasma membrane GO: 0031965~nuclear membrane GO: 0048471~perinuclear region of cytoplasm GO: 0009986~cell surface GO: 0005576~extracellular region Count GO: 0005813~centrosome 100 GO: 0005819~spindle 200 GO: 0005654~nucleoplasm

GO: 0030496~midbody –log10 (P-value) GO: 0005737~cytoplasm

GO: 0005730~nucleolus 10 GO: 0016020~membrane

GO: 0005887~integral component of plasma membrane 5 GO: 0005578~proteinaceous extracellular matrix GO: 0031012~extracellular matrix GO: 0016324~apical plasma membrane GO: 0005829~cytosol GO: 0005615~extracellular space GO: 0070062~extracellular exosome

0 0.01 0.02 Rich factor (Continued) 26 Sun Z et al. J Biomed Res, 2021, 35(1)

(Continued)

C GO: 0016491~oxidoreductase activity GO: 0016887~AT pase activity GO: 0003712~transcription cofactor activity GO: 0005200~structural constituent of cytoskeleton GO: 0005215~transporter activity Count GO: 0046983~protein dimerization activity 100 GO: 0003682~chromatin binding 200 GO: 0005179~hormone activity 300 GO: 0019901~protein kinase binding 400 GO: 0050660~flavin adenine dinucleotide binding

GO: 0001786~phosphaticlylserine binding –log10 (P-value)

GO: 0005201~extracellular matrix structural constituent 4 GO: 0005254~chloride channel activity GO: 0016301~kinase activity 3 GO: 0008009~chemokine activity 2 GO: 0008201~heparin binding GO: 0004435~phosphatidylinositol activity GO: 0005524~ATP binding GO: 0017080~sodium channel regulator activity GO: 0005515~protein binding

0 0.01 0.02 0.03 0.04 0.05 Rich factor D hsa00240: Pyrimidine matebolism hsa04640: Hematopoietic cell lineage hsa04512: ECM-receptor interaction hsa03008: Ribosome biogenesis in eukaryotes hsa04970: Salivary secretion Count hsa04911: Insulin secretion 20 hsa04310: Wnt signaling pathway 40 hsa05146: Amoebiasis 60 hsa04270: Vascular smooth muscle contraction 80 hsa01100: Metabolic pathways

hsa04115: p35 signaling pathway –log10 (P-value) hsa03320: PPAR signaling pathway 4 hsa04919: Thyroid hormone signaling pathway hsa05200: Pathways in cancer 3 hsa00230: purine metabolism 2 hsa04971: Gastric acid secretion hsa04924: Renin secretion hsa04972: Pancreatic secretion hsa04978: Mineral absorption hsa04110: Cell cycle 0 0.01 0.02 0.03 0.04 Rich factor

Fig. 2 GO and KEGG analysis of the overlap differentially expressed genes in colorectal cancer through DAVID online-tools. Top 20 terms of biological processes (A), cellular components (B), molecular functions (C), and KEGG signaling pathways (D) were shown in the charts, and P-value <0.05 was considered as selection criteria. Novel prognosis biomarkers and drugs for colorectal cancer 27

Fig. 3 PPI interaction network of the overlap differentially expressed genes by STRING database.

In summary, we chose GSE113513, GSE21510, Discussion GSE44076, and GSE32323 GEO datasets and found 865 significant DEGs between CRC tissues and In our study, we chose four GEO datasets and used normal tissues. Subsequently, the biological roles of bioinformatics methods to get 865 DEGs (374 these DEGs were confirmed with enrichment pathway upregulated and 491 downregulated). KEGG pathway analysis. Furthermore, the four hub genes, HMMR, analysis showed that the key modules were mainly PAICS, ETFDH, and SCG2 were identified as metabolic pathways, pathways in cancer, cell cycle, important prognostic biomarkers for predicting the purine metabolism, pancreatic secretion, thyroid survival of CRC patients based on the GEPIA hormone signaling pathway and Wnt signaling database. Finally, blebbistatin and sulconazole were pathway. The PPI network was constructed including picked out to develop new drugs through CMap 863 nodes and 5817 edges. The four hub genes, database (Fig. 6). HMMR, PAICS, ETFDH, and SCG2 were remarkably 28 Sun Z et al. J Biomed Res, 2021, 35(1)

A B hsa04115: p35 signaling pathway Count 4 hsa03030: 6 DNA replication 8 hsa04914: 10 Progesterone-mediated 12 oocyte maturation –log10(P-value) 12.5 hsa04114: Oocyte meiosis 10.0 7.5 hsa04110: 5.0 Cell cycle

0 0.0005 0.0010 0.0015 Rich factor

C D hsa04971: Gastric acid secretion hsa04623: Cytosolic DNA-sensing pathway hsa04923: Regulation of lipolysis in adipocytes hsa04621: NOD-like receptor signaling pathway Count hsa05134: 5.0 Legionellosis hsa04668: 7.5 TNF signaling pathway 10.0 hsa04723: Retrograde endocannabinoid signaling hsa03020: RNA polymerase hsa00230: –log10(P-value) Purine metabolism hsa05200: 7.5 Pathways in cancer hsa04060: Cytokine-cytokine receptor interaction 5.0 hsa03008: Ribosome biogenesis in eukaryotes 2.5 hsa04062: Chemokine signaling pathway 0 0.01 0.02 0.03 0.04 Rich factor

(Continued) Novel prognosis biomarkers and drugs for colorectal cancer 29

(Continued)

E F hsa05166: HTLV- I infection Count 4 hsa04115: 6 p53 signaling pathway 8 10 hsa04114: Oocyte meiosis –log10 (P-value)

hsa04914: Progesterone-mediated oocyte maturation 10

hsa04110: 5 Cell cycle

0 0.005 0.010 Rich factor

Fig. 4 Module and KEGG analyses of the hub genes. A and C: Two top modules screened from the whole PPI network of DEGs analyzed by MCODE plugin in Cytoscape. E: Top 20 hub genes with a higher degree of connectivity of DEGs were selected with MCODE plugin. B, D, and F: KEGG pathway analysis of the two top modules and top 20 hub genes. DEGs: differentially expressed genes.

related to the prognosis of patients. Furthermore, two study, we have identified four hub genes as new small molecules, blebbistatin and sulconazole, also potential biomarkers to predict the prognosis of CRC have been identified as potential candidates to develop patients and two new small molecules. Further studies new drugs. are needed to develop new drugs to treat CRC. Recently, findings about DEGs or molecular Several studies have reported that these hub genes biomarkers of CRC have been increasingly reported. play important roles in cancer development. For Based on integrated analysis of GSE32323, instance, HMMR expression level was remarkably GSE74602, and GSE113513 datasets, and TCGA correlated with the progression and prognosis of databases, CCL19, CXCL1, CXCL5, CXCL11, breast cancer[13], bladder cancer[14], prostate CXCL12, GNG4, INSL5, NMU, PYY, and SST were cancer[15–16], lung cancer[17–19], hepatocellular carcinoma identified as hub genes. And 9 genes including (HCC)[20–22], and gastric cancer[23]. Furthermore, SLC4A4, NFE2L3, GLDN, PCOLCE2, TIMP1, HMMR was confirmed to maintain its oncogenic CCL28, SCGB2A1, AXIN2, and MMP1 was related to properties and resistance to chemotherapy through predicting overall survivals of CRC patients[8]. activating TGF-β/Smad-2 signaling pathway[24]. And Moreover, TOP2A, MAD2L1, CCNB1, CHEK1, HMMR was highly expressed in glioblastoma and CDC6, and UBE2C were indicated as hub genes, and related to support the self-renewal and tumorigenic TOP2A, MAD2L1, CDC6, and CHEK1 may serve as potential of glioblastoma stem cells[25]. PAICS was prognostic biomarkers in CRC[10]. In addition, also upregulated in several kinds of cancer tissues and CEACAM7, SLC4A4, GCG, and CLCA1 genes were it promotes cancer cells proliferation, migration, and associated with unfavorable prognosis in CRC[11]. invasion[26–31]. The expression level of ETFDH was According to analysis of GEO datasets and survival found significantly decreased in HCC tissues, and this analysis by GEPIA database, AURKA, CCNB1, low expression was related to poor overall survival in CCNF, and EXO1 were significantly associated with patients[32]. However, the role of SCG2 in cancer longer overall survival. Moreover, CMap predicted remains unclear. that DL-thiorphan, repaglinide, MS-275, and In the present study, HMMR, PAICS, ETFDH, and quinostatin have the potential to treat CRC[9]. In this SCG2 were significantly up or down regulated in CRC 30 Sun Z et al. J Biomed Res, 2021, 35(1)

A HMMR PAICS ETFDH SCG2 6 5 8 5 5 4 6 4 4 3 3 4 3 2 2 2 2 1 1 1

0 0 0 0 Tumor Normal Tumor Normal Tumor Normal Tumor Normal n=275 n=349 n=275 n=349 n=275 n=349 n=275 n=349 B Overall survival Overall survival 1.0 Low HMMR TPM 1.0 Low PAICS TPM Hign HMMR TPM Hign PAICS TPM Log-rank P=0.019 Log-rank P=0.009 7 0.8 HR(high)=0.56 0.8 HR(high)=0.52 P(HR)=0.02 P(HR)=0.011 n(hign)=135 n(hign)=135 0.6 n(low)=135 0.6 n(low)=135

0.4 0.4 Percent survival Percent survival 0.2 0.2

0 0 0 50 100 150 0 50 100 150 Months Months Overall survival Overall survival 1.0 Low ETFDH TPM 1.0 Low SCG2 TPM Hign ETFDH TPM Hign SCG2 TPM Log-rank P=0.028 Log-rank P=0.005 0.8 HR(high)=0.58 0.8 HR(high)=2 P(HR)=0.03 P(HR)=0.006 n(hign)=135 n(hign)=135 0.6 n(low)=134 0.6 n(low)=135

0.4 0.4 Percent survival Percent survival 0.2 0.2

0 0 0 50 100 150 0 50 100 150 Months Months C 6 9 F value=2.81 F value=2.97 Pr(>F)=0.039 8 5 Pr(>F)=0.0324 8 4 7 3

6 2

5 1 0 Stage Ⅰ Stage Ⅱ Stage Ⅲ Stage Ⅳ Stage Ⅰ Stage Ⅱ Stage Ⅲ Stage Ⅳ PAICS SCG2

(Continued) Novel prognosis biomarkers and drugs for colorectal cancer 31

(Continued)

D DL-thiorphan Quinostatin 1, 4-chrysenequinone Menadione Count Blebbistatin 2.0 2.5 Thioguanosine 3.0 Sulconazole 3.5 0297417-0002B 4.0 Etacrynic acid Latamoxef –log10 (P-value) GW-8510 2.8 Triflusal 2.4 Piperidolate Daunorubicin 2.0 Repaglinide 1.6 Camptothecin Norcyclobenzaprine Ronidazole Ellipticine

–0.9 –0.8 –0.7

Fig. 5 Analysis and validation of hub genes and identification of related active small molecules. A and B: The expression level and prognostic value of four hub genes based on the GEPIA database. C: PAICS and SCG2 were related with the stages of CRC progression through GEPIA database. D: Top 20 potential small active molecules reverse DEG of CRC predicted by CMap database. TPM: transcripts per million.

GEO datasets selection

A total of four GEO datasets

DEGs identification

DEGs among each GEO dataset

Overlap DEGs identification

865 overlap DEGs of the four GEO datasets Functional characteristics analysis GO and KEGG pathway enrichment analysis

PPI network establishment

STRING database and module analysis

Hub genes identification

Functional characteristics TCGA database validation CMap database analysis KEGG pathway Expression and survival analysis Two potential drugs enrichment analysis

Fig. 6 Workflow model of the present study. 32 Sun Z et al. J Biomed Res, 2021, 35(1)

Table 1 Gene summaries of the four hub genes

Microarray datasets [P-value, log2(fold change)] Gene Summary GSE113513 GSE21510 GSE44076 GSE32323

The protein encoded by this gene is involved in cell motility. It is expressed in breast tissue and together with other proteins, it forms a complex with BRCA1 and HMMR 6.82e−04, 1.11 3.84e−35, 3.22 5.32e−22, 1.20 2.63e−04, 1.63 BRCA2, thus is potentially associated with higher risk of breast cancer. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene.

This gene encodes a bifunctional enzyme containing phosphoribosylaminoimidazole carboxylase activity in its N-terminal region and phosphoribosylaminoimidazole succinocarboxamide synthetase in its C-terminal region. It catalyzes steps 6 and 7 of purine biosynthesis. The gene is PAICS closely linked and divergently transcribed with a locus 4.77e−08, 1.31 5.71e−47, 2.48 2.14e−55, 1.49 1.38e−06, 1.73 that encodes an enzyme in the same pathway, and transcription of the two genes is coordinately regulated. The human genome contains several pseudogenes of this gene. Multiple transcript variants encoding different isoforms have been found for this gene.

This gene encodes a component of the electron-transfer system in mitochondria and is essential for electron transfer from a number of mitochondrial flavin-containing ETFDH dehydrogenases to the main respiratory chain. Mutations 1.78e−08, −1.47 1.67e−27, −1.92 1.80e−71, −1.85 3.14e−08, −1.78 in this gene are associated with glutaric acidemia. Alternatively spliced transcript variants that encode distinct isoforms have been observed.

The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. Studies in rodents suggest that the full- length protein, secretogranin II, is involved in the SCG2 packaging or sorting of peptide hormones and2.37e−07, −2.50 1.13e−17, −1.86 4.83e−25, −1.90 2.91e−07, −2.28 neuropeptides into secretory vesicles. The full-length protein is cleaved to produce the active peptide secretoneurin, which exerts chemotaxic effects on specific cell types, and EM66, whose function is unknown.

tissues compared with those in normal samples, and proliferation and formation of breast cancer stem cells the survival rate of CRC patients was positively through blocking the NF-κB/IL-8 signaling correlated with the expression of these genes. Besides, pathway[35]. Although these two molecules have several small molecules with potential therapeutic significant antitumor activity, their specific roles in efficacy were identified through bioinformatics CRC development need to be further clarified. analyses, including blebbistatin and sulconazole. Acknowledgments Blebbistatin has been reported to inhibit cell migration and invasiveness of pancreatic adenocarcinoma[31], and This work was supported by grants from the decrease spreading and migration of breast cancer National Natural Science Foundation of China (No. cells[33]. Moreover, blebbistatin has shown its 81672748 and No. 81871936). antitumorigenic properties in HCC cells[34]. Another small molecule, sulconazole, also inhibited the Novel prognosis biomarkers and drugs for colorectal cancer 33

Table 2 The significant small active molecules that may reverse the DEGs of CRC predicted by CMap

CMap name Count Enrichment P-value CMap name Count Enrichment P-value

DL-thiorphan 2 −0.976 0.001 27 8-azaguanine 4 −0.644 0.038 11

Quinostatin 2 −0.878 0.029 90 Acepromazine 4 −0.640 0.040 08

1,4-chrysenequinone 2 −0.866 0.035 75 4 −0.639 0.040 48

Menadione 2 −0.852 0.043 24 Mefloquine 5 −0.637 0.016 28

Blebbistatin 2 −0.844 0.048 59 Trifluridine 4 −0.637 0.041 87

trazodone 3 −0.813 0.013 20 4 −0.636 0.042 51

Thioguanosine 4 −0.809 0.002 59 Trioxysalen 4 −0.633 0.044 24

Sulconazole 4 −0.805 0.002 82 4 −0.629 0.046 73

0297417-0002B 3 −0.792 0.018 27 Etofenamate 4 −0.624 0.049 27

Etacrynic acid 3 −0.776 0.022 99 Phthalylsulfathiazole 5 −0.616 0.022 71

Latamoxef 3 −0.759 0.028 76 Oxetacaine 5 −0.610 0.025 23

GW-8510 4 −0.757 0.007 12 Amiodarone 5 −0.601 0.029 00

Triflusal 3 −0.755 0.030 11 Meticrane 5 −0.599 0.029 62

Piperidolate 3 −0.743 0.034 63 Vorinostat 12 −0.596 0.000 18

Daunorubicin 4 −0.733 0.010 19 5 −0.592 0.032 64

Repaglinide 4 −0.727 0.011 42 5 −0.573 0.042 46

Camptothecin 3 −0.725 0.042 85 Zimeldine 5 −0.572 0.043 38

Norcyclobenzaprine 4 −0.723 0.012 02 Clemizole 5 −0.568 0.045 48

Ronidazole 3 −0.711 0.049 19 Astemizole 5 −0.567 0.046 14

Ellipticine 4 −0.708 0.015 14 Meclozine 5 −0.565 0.047 08

Gliclazide 4 −0.707 0.015 28 Procaine 5 −0.565 0.047 46

Phenoxybenzamine 4 −0.704 0.015 76 Cloperastine 6 −0.551 0.031 56

0175029-0000 6 −0.701 0.001 63 6 −0.543 0.036 21

Tyloxapol 4 −0.694 0.018 52 Dipyridamole 6 −0.537 0.039 63

Skimmianine 4 −0.690 0.020 01 6 −0.523 0.047 85

Bisacodyl 4 −0.690 0.020 03 16 −0.504 0.000 20

Resveratrol 9 −0.684 0.000 04 16 −0.434 0.002 99

Medrysone 6 −0.680 0.002 92 LY-294002 61 −0.428 0.000 00

Bepridil 4 −0.674 0.025 22 Trichostatin A 182 −0.425 0.000 00

Pyrvinium 6 −0.672 0.003 34 18 −0.406 0.003 80

Nortriptyline 4 −0.669 0.027 07 20 −0.388 0.003 27

Apigenin 4 −0.667 0.027 65 Alpha-estradiol 16 −0.357 0.025 35

Methylergometrine 4 −0.664 0.028 71 Geldanamycin 15 −0.349 0.039 26

Bufexamac 4 −0.660 0.030 10 Sirolimus 44 −0.335 0.000 08

Prestwick-1084 4 −0.656 0.032 41 19 −0.329 0.024 17

Deptropine 4 −0.651 0.034 95 Wortmannin 18 −0.314 0.046 91

Dextromethorphan 4 −0.647 0.036 86 Tanespimycin 62 −0.311 0.000 00

Protriptyline 4 −0.646 0.037 36 Fulvestrant 40 −0.225 0.029 99 34 Sun Z et al. J Biomed Res, 2021, 35(1)

References [16] Wang YP, Chen L, Ju LG, et al. Novel biomarkers associated with progression and prognosis of bladder cancer identified by [1] Dekker E, Tanis PJ, Vleugels JLA, et al. Colorectal cancer[J]. co-expression analysis[J]. Front Oncol, 2019, 9: 1030. Lancet, 2019, 394(10207): 1467–1480. [17] Song YJ, Tan J, Gao XH, et al. Integrated analysis reveals key [2] Brenner H, Kloor M, Pox CP. Colorectal cancer[J]. Lancet, genes with prognostic value in lung adenocarcinoma[J]. 2014, 383(9927): 1490–1502. Cancer Manag Res, 2018, 10: 6097–6108. [3] Siegel RL, Miller KD, Jemal A. Cancer statistics, 2015[J]. CA [18] Zhang L, Zhang Z, Yu ZL. Identification of a novel glycolysis- Cancer J Clin, 2015, 65(1): 5–29. related gene signature for predicting metastasis and survival in [4] Kulasingam V, Diamandis EP. Strategies for discovering novel patients with lung adenocarcinoma[J]. J Transl Med, 2019, cancer biomarkers through utilization of emerging 17(1): 423. technologies[J]. Nat Clin Pract Oncol, 2008, 5(10): 588–599. [19] Liu C, Li YY, Wei MJ, et al. Identification of a novel [5] Nannini M, Pantaleo MA, Maleddu A, et al. Gene expression glycolysis-related gene signature that can predict the survival profiling in colorectal cancer using microarray technologies: of patients with lung adenocarcinoma[J]. Cell Cycle, 2019, results and perspectives[J]. Cancer Treat Rev, 2009, 35(3): 18(5): 568–579. 201–209. [20] Zhou ZY, Li YZ, Hao HY, et al. Screening hub genes as [6] Bustin SA, Dorudi S. Gene expression profiling for molecular prognostic biomarkers of hepatocellular carcinoma by staging and prognosis prediction in colorectal cancer[J]. Expert bioinformatics analysis[J]. Cell Transplant, 2019, 28(1S): Rev Mol Diagn, 2004, 4(5): 599–607. 76S–86S. [7] Zhao B, Baloch Z, Ma YH, et al. Identification of potential key [21] Ni WK, Zhang SQ, Jiang B, et al. Identification of cancer- genes and pathways in early-onset colorectal cancer through related gene network in hepatocellular carcinoma by combined bioinformatics analysis[J]. Cancer Control, 2019, 26(1): bioinformatic approach and experimental validation[J]. Pathol 1073274819831260. Res Pract, 2019, 215(6): 152428. [8] Chen LB, Lu DW, Sun KK, et al. Identification of biomarkers [22] Shen S, Kong JJ, Qiu YW, et al. Identification of core genes associated with diagnosis and prognosis of colorectal cancer and outcomes in hepatocellular carcinoma by bioinformatics patients based on integrated bioinformatics analysis[J]. Gene, analysis[J]. J Cell Biochem, 2019, 120(6): 10069–10081. 2019, 692: 119–125. [23] Zhang HZ, Ren LL, Ding Y, et al. Hyaluronan-mediated [9] Chen J, Wang ZH, Shen XJ, et al. Identification of novel motility receptor confers resistance to chemotherapy via biomarkers and small molecule drugs in human colorectal TGFβ/Smad-2-induced epithelial-mesenchymal transition in cancer by microarray and bioinformatics analysis[J]. Mol gastric cancer[J]. FASEB J, 2019, 33(5): 6365–6377. Genet Genomic Med, 2019, 7(7): e00713. [24] Tilghman J, Wu H, Sang YY, et al. HMMR maintains the [10] Yu C, Chen FQ, Jiang JJ, et al. Screening key genes and stemness and tumorigenicity of glioblastoma stem-like Cells[J]. signaling pathways in colorectal cancer by integrated Cancer Res, 2014, 74(11): 3168–3179. bioinformatics analysis[J]. Mol Med Rep, 2019, 20(2): [25] Ye S, Liu Y, Fuller AM, et al. TGFβ and Hippo pathways 1259–1269. cooperate to enhance sarcomagenesis and metastasis through [11] Bian QL, Chen JX, Qiu WQ, et al. Four targeted genes for the hyaluronan-mediated motility receptor (HMMR)[J]. Mol predicting the prognosis of colorectal cancer: a bioinformatics Cancer Res, 2020, 18(4): 560–573. analysis case[J]. Oncol Lett, 2019, 18(5): 5043–5054. [26] Chakravarthi BVSK, Del Carmen Rodriguez Pena M, Agarwal [12] Sun GW, Li YL, Peng YJ, et al. Identification of differentially S, et al. A role for de novo purine metabolic enzyme PAICS in expressed genes and biological characteristics of colorectal bladder cancer progression[J]. Neoplasia, 2018, 20(9): cancer by integrated bioinformatics analysis[J]. J Cell Physiol, 894–904. 2019, 234(9): 15215–15224. [27] Meng MJ, Chen YL, Jia JB, et al. Knockdown of PAICS [13] Yeh MH, Tzeng YJ, Fu TY, et al. Extracellular matrix-receptor inhibits malignant proliferation of human breast cancer cell interaction signaling genes associated with inferior breast lines[J]. Biol Res, 2018, 51(1): 24. cancer survival[J]. Anticancer Res, 2018, 38(8): 4593–4605. [28] Goswami MT, Chen GA, Chakravarthi BVSK, et al. Role and [14] Yang D, Ma Y, Zhao PC, et al. Systematic screening of regulation of coordinately expressed de novo purine protein-coding gene expression identified HMMR as a potential biosynthetic enzymes PPAT and PAICS in lung cancer[J]. independent indicator of unfavorable survival in patients with Oncotarget, 2015, 6(27): 23445–23461. papillary muscle-invasive bladder cancer[J]. Biomed [29] Zhou SY, Yan YL, Chen X, et al. Roles of highly expressed Pharmacother, 2019, 120: 109433. PAICS in lung adenocarcinoma[J]. Gene, 2019, 692: 1–8. [15] Rizzardi AE, Rosener NK, Koopmeiners JS, et al. Evaluation [30] Chakravarthi BVSK, Goswami MT, Pathi SS, et al. Expression of protein biomarkers of prostate cancer aggressiveness[J]. and role of PAICS, a de novo purine biosynthetic gene in BMC Cancer, 2014, 14: 244. prostate cancer[J]. Prostate, 2017, 77(1): 10–21. Novel prognosis biomarkers and drugs for colorectal cancer 35

[31] Duxbury MS, Ashley SW, Whang EE. Inhibition of pancreatic 2006, 66(9): 4725–4733. adenocarcinoma cellular invasiveness by blebbistatin: a novel [34] Doller A, Badawi A, Schmid T, et al. The cytoskeletal myosin II inhibitor[J]. Biochem Biophys Res Commun, 2004, inhibitors latrunculin A and blebbistatin exert antitumorigenic 313(4): 992–997. properties in human hepatocellular carcinoma cells by [32] Wu YX, Zhang XS, Shen R, et al. Expression and significance interfering with intracellular HuR trafficking[J]. Exp Cell Res, of ETFDH in hepatocellular carcinoma[J]. Pathol Res Pract, 2015, 330(1): 66–80. 2019, 215(12): 152702. [33] Betapudi V, Licate LS, Egelhoff TT. Distinct roles of [35] Choi HS, Kim JH, Kim SL, et al. Disruption of the NF-κB/IL-8 nonmuscle myosin II isoforms in the regulation of MDA-MB- signaling axis by sulconazole inhibits human breast cancer 231 breast cancer cell spreading and migration[J]. Cancer Res, stem cell formation[J]. Cells, 2019, 8(9): 1007.

RECEIVE IMMEDIATE NOTIFICATION FOR EARLY RELEASE ARTICLES PUBLISHED ONLINE

To be notified by e-mail when Journal early release articles are published online, sign up at jbr-pub.org.cn.