3467

Original Article

Identification of biomarkers for the diagnosis and treatment of primary colorectal cancer based on microarray technology

Zhi Gang Zheng1, Bao Qing Ma2, Yu Xiao3, Tian Xi Wang4, Tian Yu5, Yu Hu Huo6, Qing Qing Wang7, Meng Jie Shan8, Ling Bing Meng9, Jing Han10

1The First Ward of the General Surgery Department, The First Hospital of Shijiazhuang, Shijiazhuang 050011, China; 2Department of Gastrointestinal Surgery, Dezhou People’s Hospital, Dezhou 253000, China; 3School of Basic Medicine, Peking University, Beijing 100191, China; 4School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; 5Department of Surgery, Beijing Hospital, National Center of Gerontology, Beijing 100730, China; 6Department of Breast and Thyroid Surgery, Shanghai Tenth People’s Hospital, Tongji University, School of Medicine, Shanghai 200072, China; 7Department of Biotherapy, Tianjin Medical University Cancer Institute, and Hospital, Tianjin 300060, China; 8Graduate School, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China; 9Neurology Department, Beijing Hospital, National Center of Gerontology, Beijing 100730, China; 10Department of Medical Oncology, The Fourth Hospital of Hebei Medical University, Shijiazhuang 050000, China Contributions: (I) Conception and design: ZG Zheng, Y Xiao; (II) Administrative support: BQ Ma, J Han; (III) Provision of study materials or patients: LB Meng, YH Huo; (IV) Collection and assembly of data: TX Wang, QQ Wang; (V) Data analysis and interpretation: T Yu, MJ Shan; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors. Correspondence to: Jing Han. Department of Medical Oncology, The Fourth Hospital of Hebei Medical University, 12 Jiankang Road, Shijiazhuang 050000, China. Email: [email protected].

Background: Primary colorectal cancer (PCRC) is one of the most common malignant tumors in clinic, and is characterized by high heterogeneity occurring between tumors and intracellularly. Therefore, this study aimed to explore potential targets for the diagnosis and treatment of PCRC via bioinformatic technology. Methods: Gene Expression Omnibus (GEO) was used to download the data used in this study. Differently expressed (DEGs) were identified with GEO2R, and the gene set enrichment analysis (GSEA) was implemented for enrichment analysis. Then, the researchers constructed a -protein interaction (PPI) network, a significant module, and a hub genes network. Results: The GSE81558 dataset was downloaded, and a total of 97 DEGs were found. There were 23 up- regulated DEGs and 74 down-regulated DEGs in the PCRC samples, compared with the control group. The PPI network included a total of 42 nodes and 63 edges. One module network consisted of 11 nodes and 25 edges. Another module network consisted of 4 nodes and 6 edges. The hub genes network was created by cytoHubba using GCG, GUCA2B, CLCA4, ZG16, TMIGD1, GUCA2A, CHGA, PYY, SST, and MS4A12. Conclusions: Ten hub genes were found from the genomic samples of patients with PCRC and normal controls by bioinformatics analysis. The hub genes might provide novel ideas and evidence for the diagnosis and targeted therapy of PCRC.

Keywords: Primary colorectal cancer (PCRC); differentially expressed genes; gene set enrichment analysis (GSEA); hub genes

Submitted Oct 25, 2019. Accepted for publication Mar 24, 2020. doi: 10.21037/tcr-19-2290 View this article at: http://dx.doi.org/10.21037/tcr-19-2290

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3454 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC

Introduction intestinal tissues. Analysis of DEGs was completed with (GO) and Kyoto Encyclopedia of Genes Primary colorectal cancer (PCRC) is a colorectal cancer that and Genomes (KEGG). At the same time, all genes in the includes colon and rectal cancer. PCRC is a common tumor 2 groups of samples were used for gene set enrichment of the digestive system. In the USA, from 2007 to 2017, analysis (GSEA), and DEGs were used to construct a the incidence rate of PCRC ranked third among all tumors, protein-protein interaction (PPI) network. We screened and the mortality of PCRC ranks second and third for both a significant module of the PPI network and found 10 male and female tumors, respectively (1). To date, the cause significant genes. The role of differentially expressed genes of colorectal cancer is still unclear, but it may be related in PCRC was then analyzed. to a malignant transformation of colon polyps, chronic inflammatory stimulation of colonic mucosa, a high-fat diet with insufficient dietary fiber, genetics, and other factors. Methods Early stage of colorectal cancer is characterized by insidious Download data onset, with only fecal occult blood being positive. With the progress of the lesion, patients may have hematochezia, Gene Expression Omnibus (GEO) (https://www.ncbi.nlm. diarrhea, constipation, abdominal pains, abdominal masses, nih.gov/geo/) is a public repository for high throughput and other symptoms. Patients with advanced disease may gene expression data that can be stored and freely also show progressive emaciation, cachexia, and anemia. distributed (6). Currently, GEO stores approximately 1 The incidence and mortality of colorectal cancer in most billion individual gene expression data from more than countries of the world are on the rise. In China, the 100 organisms, covering a wide range of biological issues. incidence and mortality of colorectal cancer also have shown This volume of data can be effectively mined, retrieved, an increasing trend. Among them, the incidence of colon and visualized using user-friendly web-based tools. cancer has increased significantly, but early detection is still GSE81558 [GPL15207 (PrimeView) Affymetrix Human a challenge, and most patients are already in the middle or Gene Expression Array] was downloaded from the GEO late stages when the cancer is discovered (2). Although the database. A total of 32 samples, which consisted of 9 control current treatment of colorectal cancer has progressed, the colorectum samples and 23 PCRC samples, were selected prognosis is still unsatisfactory. Therefore, it has become from the GSE81558 database. critical to explore the mechanisms of PCRC development and to identify new molecules for the improved treatment DEGs and prognosis of PCRC. Bioinformatics is a branch of life science research that GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) is a uses the computer as a tool to collect, process, store, system for online analysis of data in GEO. This tool system disseminate, analyze, and interpret biological information. runs in the R language. To be exact, GEO2R is based on It is also a new subject formed by the combination of life two R packages, GEOquery, and limma. The former is science and computer science. used for data reading, and the latter is used for calculation. In recent years, high-quality microarrays and high- GEO2R was performed to identify the differentially throughput sequencing techniques have achieved excellent expressed genes between the control group and the PCRC results in detecting the development and progress of group. The cut-off criteria were the adjusted P values (adj. P) colorectal cancer; furthermore, good results have been <0.001, and logFC ≥2 or ≤−2. achieved in the screening of biomarkers in the diagnosis, treatment, and prognosis of colorectal cancer (3). An Functional enrichment of GO and KEGG analysis increasing number of scholars are using bioinformatics technology to study differently expressed genes (DEGs) GO is a widely used biological database, which consists of 2 in various cancer processes and their roles in biological aspects: one is the ontology itself, namely the terms defined processes (BP), molecular functions (MF), and signaling by biologists and the structural relations between them; pathways (4,5). the second is the relationship between the gene products This research screened DEGs from PCRC tumor and entries, namely gene ontology annotation (7). The tissues of PCRC through a comparison with normal KEGG is a systematic analysis of the functional genome

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3455

Volcano Plot network, respectively, from the PPI network (11). 15.3

The hub gene analysis 12.24 The heatmap analysis presented the expression level of 9.18 hub genes between the normal and PCRC groups. The cBioPortal (http://www.cbioportal.org) could construct 6.12 the co-expression network of hub genes. The University -log10(P-value) of California, Santa Cruz (UCSC) Xena (https://xena.

3.06 ucsc.edu/welcome-to-ucsc-xena/) was used to integrate the public genomic data sets to analyze and visualize the expression level of hub genes. Also, the effects of hub gene 0 −1.33 −0.91 −0.49 −0.07 0.35 0.77 1.19 expressions for the pathological stage were displayed with log2 (Fold Change) Gene Expression Profiling Interactive Analysis (GEPIA, Figure 1 The volcano plot presents the differently expressed genes http://gepia.cancer-pku.cn/). Furthermore, the overall (DEGs) between primary colorectal cancer and normal samples. survival (OS) of the PCRC patients was analyzed by the Kaplan-Meier plotter (http://kmplot.com/analysis/index. php?p=background). information database, which helps researchers integrate genes and expression information as a whole network (8). The Database for Annotation, Visualization, and Identification of hub genes associated with cancer and Integrated Discovery (DAVID) (version: v6.8, https:// inflammation david.ncifcrf.gov/) was able to identify enriched biological The comparative toxicogenomics database (CTD) (http:// themes, mainly GO terms, and visualize genes on BioCarta ctdbase.org/) was used to explore the relationships between & KEGG pathway maps. gene products and cancer and inflammation. Metascape (http://metascape.org/gp/index.html#/main/ step1) was also used to complete the function and pathway enrichment in the research. Results GSEA is an advanced algorithm for evaluating gene- The DEGs between control and PCRC samples specific probes based on data from microarrays. GSEA is used by users to classify gene probes based on the co- In the GSE81558 dataset, a total of 97 DEGs were found expression data of relevant biological pathways and when the adj. P value <0.001 and the logFC ≥2 or ≤−2. experiments published in authoritative journals and to There were 23 up-regulated DEGs and 74 down-regulated determine whether the probe set can reveal the distribution DEGs in the PCRC samples after comparison with the mode of relevant genome phenotypes through a series of control group, and these genes are presented in the volcano operations based on the correlation. Therefore, GSEA was map (Figure 1). performed to complete GO and KEGG analysis. GO and KEGG functional annotation for DEGs via Construction of the PPI network, significant modules, and DAVID and Metascape hub genes network Through DAVID analysis, the results of the GO analysis Search Tool for the Retrieval of Interacting Gene showed that variations in DEGS were linked with BP (STRING) database (https://string-db.org) was used to and were mainly enriched in the bicarbonate transport, construct the PPI network (9). Also, Cytoscape (version cell surface receptor signaling pathways, chloride 3.6.1) was used to perform the data visualization of the PPI transmembrane transport, one-carbon metabolic process, network (10). Molecular Complex Detection (MCODE) cartilage development, collagen catabolic process, (version 1.5.1) and cytoHubba, 2 plug-ins of Cytoscape, xenobiotic transport, sodium-ion transmembrane transport, were used to identify the significant modules and hub genes body fluid secretion, among many others (Figure 2A).

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3456 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC

A B

C D

E

F G

Figure 2 The enrichment analysis of DEGs by Database for Annotation, Visualization, and Integrated Discovery (DAVID) and Metascape. Detailed information relating to changes via DAVID in the (A) cell component (CC), (B) biological processes (BP), (C) molecular function (MF), and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. (E) Heatmap of enriched terms via the Metascape. (F) The network of enriched terms colored by cluster identity. (G) The network of enriched terms colored by P value. DEGs, differently expressed genes.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3457

Table 1 Functional enrichment analysis of DEGs in PCRC using GSEA Gene set name Size ES NES P value

Downregulated

GO_PHOTORECEPTOR_OUTER_SEGMENT 68 0.593 1.909 0.000

GO_REGULATION_OF_LYASE_ACTIVITY 84 0.567 1.872 0.000

GO_POSITIVE_REGULATION_OF_LYASE_ACTIVITY 59 0.614 1.868 0.000

GO_SUBSTANTIA_NIGRA_DEVELOPMENT 42 0.617 1.823 0.000

GO_RESPIRATORY_CHAIN 78 0.668 1.805 0.006

GO_NEURONAL_ACTION_POTENTIAL 28 0.712 1.786 0.002

Upregulated

GO_TELOMERASE_HOLOENZYME_COMPLEX 19 −0.688 −1.793 0.006

GO_NEGATIVE_REGULATION_OF_DNA_RECOMBINATION 16 −0.760 −1.770 0.002

GO_RNA_METHYLTRANSFERASE_ACTIVITY 38 −0.687 −1.769 0.004

GO_TRNA_MODIFICATION 56 −0.691 −1.727 0.013

GO_SOMATIC_DIVERSIFICATION_OF_IMMUNOGLOBULINS 27 −0.682 −1.724 0.011

GO_RNA_MODIFICATION 109 −0.631 −1.723 0.012 PCRC, primary colorectal cancer; ES, enrichment score; NES, normalized enrichment score; DEGs, differently expressed genes; GSEA, gene set enrichment analysis.

Variations in DEGs linked with cell component (CC) many others, (P<0.05) (Figure 2E,F,G). were significantly enriched in extracellular space, proteinaceous extracellular matrix, extracellular region, GO and KEGG pathway enrichment analysis of DEGs in integral component of the plasma membrane, apical plasma PCRC using GSEA membrane, plasma membrane, anchored component of membrane, among others (Figure 2B). Concerning MF, GSEA was used to perform GO and KEGG analysis DEGs were significantly enriched in hormone activity, to explore the function and pathways of DEGs. GO carbonate dehydratase activity, receptor binding, chloride enrichment analysis showed that 2,658/4,564 gene sets were channel activity, xenobiotic-transporting ATPase activity, downregulated in PCRC, 324 gene sets were significantly arylesterase activity, extracellular matrix structural enriched at nominal P value <0.05, and 69 gene sets were constituent, among others (Figure 2C). Analysis of KEGG significantly enriched at nominal P value <0.01. Also, pathways indicated that the top canonical pathways 1,906/4,564 gene sets were upregulated in PCRC, 136 associated with DEGs were nitrogen metabolism, bile gene sets were significantly enriched at nominal P value secretion, proximal tubule bicarbonate reclamation, and <0.05, and 18 gene sets were significantly enriched at pancreatic secretion (Figure 2D). nominal P value <0.01. The most significant enrichments Furthermore, the functional enrichment analysis with for both down- and up-regulated gene sets in the significant Metascape indicated that the DEGs between the normal order (size of NES) are listed in Table 1. Six significant and PCRC samples were significantly enriched in the enrichment plots are shown in Figure 3, such as “GO_ transport of small molecules, erythrocyte take-up of oxygen PHOTORECEPTOR_OUTER_SEGMENT”, “GO_ and release of carbon dioxide, non-integrin membrane- REGULATION_OF_LYASE_ACTIVITY”, “GO_ ECM interactions, positive regulation of peptidyl-threonine POSITIVE_REGULATION_OF_LYASE_ACTIVITY”, phosphorylation, cell-cell adhesion via plasma-membrane “GO_SUBSTANTIA_NIGRA_DEVELOPMENT”, adhesion molecules, cation homeostasis, second-messenger- “GO_RESPIRATORY_CHAIN”, “GO_NEURONAL_ mediated signaling, cellular response to metal ion, among ACTION_POTENTIAL”. GO enrichment analysis

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3458 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC

A B Enrichment plot: Enrichment plot: GO_PHOTORECEPTOR_OUTER_SEGMENT GO_REGULATION_OF_ LYASE ACTIVITY 0.6

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1 Enrichment score (ES) Enrichment score Enrichment score (ES) Enrichment score 0.0 0.0

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1

0 Zero cross at 7059 0 Zero cross at 7059

−1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 Rank in Ordered Dataset Rank in Ordered Dataset Ranked list metric (Signal2Noise) Ranked list metric (Signal2Noise) Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores C D

Enrichment plot: Enrichment plot: GO_POSITIVE_REGULATION_OF_LYASE_ACTIVITY GO_TELOMERASE_HOLOENZYME_COMPLEX

0.6 0.0 0.5 −0.1 0.2 0.4 − −0.3 0.3 −0.4 0.2 −0.5 0.1 0.6

Enrichment score (ES) Enrichment score − Enrichment score (ES) Enrichment score 0.0 −0.7

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1

0 Zero cross at 7059 0 Zero cross at 7059

−1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 Rank in Ordered Dataset Rank in Ordered Dataset Ranked list metric (Signal2Noise) Ranked list metric (Signal2Noise)

Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores E F Enrichment plot: Enrichment plot: GO_NEGATIVE_REGULATION_OF_DNA_RECOMBINATION GO_RNA_METHYLTRANSFERASE_ACTIVITY

0.0 0.0 −0.1 −0.1 −0.2 0.2 − −0.3 −0.3 −0.4 −0.4 −0.5 −0.5 0.6 −0.6 (ES) Enrichment score − Enrichment score (ES) Enrichment score −0.7 −0.7

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1 0 Zero cross at 7059 0 Zero cross at 7059 −1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000

Rank in Ordered Dataset Ranked list metric (Signal2Noise) Rank in Ordered Dataset Ranked list metric (Signal2Noise) Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores

Figure 3 Significant Gene Ontology (GO) enrichment plots of DEGs using gene set enrichment analysis (GSEA). DEGs, differently expressed genes.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3459

Table 2 Pathway enrichment analysis of DEGs in PCRC using GSEA Gene set name Size ES NES P value

Downregulated

KEGG_PARKINSONS_DISEASE 113 0.630 1.769 0.002

KEGG_LONG_TERM_POTENTIATION 70 0.566 1.741 0.002

KEGG_ALZHEIMERS_DISEASE 158 0.511 1.677 0.014

KEGG_OXIDATIVE_PHOSPHORYLATION 117 0.535 1.676 0.025

KEGG_TASTE_TRANSDUCTION 52 0.474 1.661 0.010

KEGG_PHOSPHATIDYLINOSITOL_SIGNALING_SYSTEM 76 0.502 1.628 0.010

Upregulated

KEGG_AMINOACYL_TRNA_BIOSYNTHESIS 41 −0.590 −1.615 0.057

KEGG_HOMOLOGOUS_RECOMBINATION 26 −0.662 −1.579 0.021

KEGG_DNA_REPLICATION 36 −0.738 −1.536 0.016

KEGG_RNA_POLYMERASE 27 −0.648 −1.534 0.033

KEGG_MISMATCH_REPAIR 22 −0.646 −1.523 0.037

KEGG_CELL_CYCLE 123 −0.546 −1.420 0.111 PCRC, primary colorectal cancer; ES, enrichment score; NES, normalized enrichment score; DEGs, differently expressed genes; GSEA, gene set enrichment analysis.

revealed that downregulated gene sets in PCRC were signaling system. Upregulated gene sets participated in mainly associated with regulation of lyase activity. And aminoacyl tRNA biosynthesis, homologous recombination, the upregulated gene sets frequently correlated with DNA replication, RNA polymerase, mismatch repair, and telomerase holoenzyme complex, negative regulation of cell cycle. DNA recombination, and RNA methyltransferase activity. Furthermore, KEGG enrichment analysis indicated PPI network, module analysis, and hub genes that 131/178 gene sets were downregulated in PCRC compared to normal colorectal samples, 33 gene sets The PPI network included a total of 42 nodes and 63 edges, were significantly enriched at nominal P value <0.05, which shows that there were closed interactions between all and 10 gene sets are were enriched at nominal P value DEGs (Figure 5A). Two significant modules were identified <0.01. Furthermore, 47/178 gene sets were upregulated from the PPI network. One module network consisted of in PCRC, and 4 gene sets were significantly enriched at 11 nodes and 25 edges (Figure 5B), and the other module nominal P value <0.05. We respectively display the top network consisted of 4 nodes and 6 edges (Figure 5C). 12 gene sets correlated with PCRC according to NES The hub gene network was created by cytoHubba: GCG, in Table 2. Six significant enrichment plots are shown in GUCA2B, CLCA4, ZG16, TMIGD1, GUCA2A, CHGA, Figure 4, including “KEGG_PARKINSONS_DISEASE”, PYY, SST, and MS4A12 (Figure 5D). “KEGG_LONG_TERM_POTENTIATION”, “KEGG_ALZHEIMERS_DISEASE”, “KEGG_ The analysis of hub genes AMINOACYL_TRNA_BIOSYNTHESIS”, “KEGG_ HOMOLOGOUS_RECOMBINATION” and “KEGG_ The expression profile of all the hub genes is presented DNA_REPLICATION”. According to KEGG pathway in the heatmap, which shows that the expression levels of enrichment analysis, downregulated gene sets in PCRC all the hub genes was lower in the PCRC group than the were involved in the pathway of long-term potentiation, control group (Figure 5E). The co-expression network of oxidative phosphorylation, and phosphatidylinositol the hub genes was constructed using the cBioPortal, as

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3460 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC

A Enrichment plot: KEGG_PARKINSONS_DISEASE B Enrichment plot: KEGG_LONG_TERM_POTENTIATION

0.6 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 Enrichment score (ES) Enrichment score Enrichment score (ES) Enrichment score 0.0 0.0

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1

0 Zero cross at 7059 0 Zero cross at 7059

−1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 Ranked list metric (Signal2Noise) Rank in Ordered Dataset Ranked list metric (Signal2Noise) Rank in Ordered Dataset

Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores C D Enrichment plot: KEGG_ALZHEIMERS_DISEASE Enrichment plot: 0.5 KEGG_AMINOACYL_TRNA_BIOSYNTHESIS

0.4 0.0

0.3 −0.1 −0.2 0.2 −0.3 0.1

Enrichment score (ES) Enrichment score −0.4

0.0 −0.5 Enrichment score (ES) Enrichment score −0.6

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1 0 Zero cross at 7059 0 Zero cross at 7059

−1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 Ranked list metric (Signal2Noise)

Rank in Ordered Dataset Ranked list metric (Signal2Noise) Rank in Ordered Dataset

Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores E F Enrichment plot: KEGG_DNA_REPLICATION Enrichment plot: KEGG_HOMOLOGOUS_RECOMBINATION 0.0

0.0 −0.1 −0.1 −0.2 −0.2 −0.3 0.4 −0.3 − 0.5 −0.4 −

Enrichment score (ES) Enrichment score −0.6 −0.5 Enrichment score (ES) Enrichment score −0.7 −0.6

'healthy' (positively correlated) 'healthy' (positively correlated) 1 1

0 Zero cross at 7059 0 Zero cross at 7059

−1 −1 'tumor' (negatively correlated) 'tumor' (negatively correlated) 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 Ranked list metric (Signal2Noise)

Ranked list metric (Signal2Noise) Rank in Ordered Dataset Rank in Ordered Dataset

Enrichment profile Hits Ranking metric scores Enrichment profile Hits Ranking metric scores

Figure 4 Significant KEGG pathway enrichment plots of DEGs using GSEA. KEGG, Kyoto Encyclopedia of Genes and Genomes; DEGs, differently expressed genes; GSEA, gene set enrichment analysis.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3461

A B C

D

E

F

G

Figure 5 The protein-protein interaction (PPI) network, module analysis, and hub genes analysis. (A) The PPI network. (B) One significant module network. (C) Another significant module network. (D) The hub genes network. (E) The expression profile of all hub genes. (F) The co-expression network of the hub genes was constructed by the cBioPortal. (G) The University of California, Santa Cruz (UCSC) Xena analysis presented the expression of hub genes. PPI, protein-protein interaction.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3462 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC shown in Figure 5F. The UCSC Xena analysis showed that problems, such as difficulty in early diagnosis, metastasis of the expression of hub genes in the PCRC group was lower tumors, postoperative recurrence, and a low 5-year survival than the control group (Figure 5G). The expression of hub rate (14). The key to the treatment of colorectal cancer is genes was not related to the pathological stage of PCRC early detection and early diagnosis, which is conducive to a (Figure 6). Kaplan-Meier analysis showed that the PCRC radical cure. In one study of PCRC patients, approximately patients with high expression levels of GCG had poorer OS 22% had distant metastases of the liver or other organs than those with high expression levels (P<0.05, Figure 7A). at the time of the first diagnosis, but most of them (75% There was no statistically significant effect on OS associated to 90%) were not suitable for surgical resection (15). with the expression of GUCA2B (P>0.05; Figure 7B). PCRC Therefore, early detection of PCRC can improve the patients with low expression levels of CLCA4 had poorer resection rate and prolong the survival time of patients. overall survival time than those with low expression levels Exploring the molecular mechanisms of the occurrence (P<0.05; Figure 7C). PCRC patients with high expression levels and development of PCRC will play an essential role in the of ZG16 had a poorer OS than those with high expression screening, diagnosis, and treatment of patients with PCRC. levels (P<0.05, Figure 7D). There was no statistically significant Bioinformatics has been widely used in exploring genetic effect on OS associated with the expression of TMIGD1 changes such as gene changes and variations and GUCA2A (P>0.05; Figure 7E,F). PCRC patients with in the course of disease occurrence and development. It high expression levels of CHGA had poorer OS than those is also valuable in searching for critical genes for disease with high expression levels (P<0.05, Figure 7G). There was development, which may provide a reliable basis and no statistically significant effect on OS associated with the method for finding therapeutic targets of diseases. In expression of PYY (P>0.05; Figure 7H). PCRC patients with our study, through using GSEA to complete GO and high expression levels of SST had poorer OS than those with KEGG analysis, we found 12 gene sets and 10 distinctly high expression levels (P<0.05, Figure 7I). PCRC patients with differentially expressed hub genes between PCRC and low expression levels of MS4A12 had poorer overall survival normal tissues. Bioinformatics analysis revealed that these times than those with low expression levels (P<0.05; Figure 7J). DEGs (SM4A12, CLCA4, TMIGD1, GUCA2A, GYCA2B, CHGA, SST, ZG16, GCG, and PYY) were downregulated in patients with PCRC. Identification of hub genes CLCA regulator is a kind of protein, which is The CTD showed that hub genes targeted cancer and characterized by symmetrical multiple cysteine sequences inflammation, and the data are shown inFigure 8. at its amino terminal. The human CLCA gene is found on chromosome 1p31-1p22. CLCA is a chloride channel regulator for outward rectification. CLCA protein is Discussion activated by calcium ions and plays a role in controlling Colorectal cancer is one of the most common malignant chloride outflow in epithelial cells (16). Studies have shown tumors in clinic, and is characterized by high heterogeneity that CLCA genes are involved in a variety of BP, including occurring between tumors and intracellularly (12). the development of processes such as cell differentiation, According to worldwide statistics, the incidence rate adhesion, and apoptosis (17). CLCA4, as a member of of colorectal cancer ranks 4th among all malignant the CLCA family, is a tumor suppressor, which has been tumors as of 2018. In recent years, the incidence rate shown to contribute to the progress of some tumor diseases. has increased yearly, and it has showed a trend of being However, its role in PCRC remains poorly studied. Studies diagnosed in younger people. Although surgical treatment have shown that the human brain, testis, small intestine, significantly reduces mortality in colorectal cancer patients, colon, and lung tissues have elevated levels of CLCA4 postoperative patients still face problems such as intestinal mRNA expression. Lack of expression of CLCA4 can function problems and direct affects to their quality of life reduce the inhibition of tumor cells (17). In this study, we due to related psychological stress (13). Gene mutation found that the expression level of CLCA4 in PCRC cancer and microenvironment changes are closely related to the cells was significantly lower than that in normal control occurrence and development of PCRC. In recent years, tissues. CLCA4 may be a biomarker for early diagnosis considerable progress has been made in the screening, and metastasis of colorectal cancer. Early detection of diagnosis, and treatment of PCRC, but there are still some this marker can assist in early diagnosis, predict patient

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3463

A GCG F value=1.5 B 7 GUCA2B F value=0.642 Pr (>F)=0.589 8 Pr (>F)=0.214 6

6 5 4

4 3

2 2 1

0 0 C Stage I Stage II Stage III Stage IV D Stage I Stage II Stage III Stage IV 10 CLCA4 F value=0.116 10 ZG16 F value=0.228 Pr (>F)=0.951 Pr (>F)=0.877

8 8

6 6

4 4

2 2

0 0 Stage I Stage II Stage III Stage IV Stage I Stage II Stage III Stage IV E 6 F TMIGD1 F value=1.38 10 GUCA2A F value=1.24 Pr (>F)=0.249 Pr (>F)=0.296 5 8 4 6 3

4 2

1 2

0 0 Stage I Stage II Stage III Stage IV Stage I Stage II Stage III Stage IV G H CHGA F value=1.06 5 PYY F value=1.18 10 Pr (>F)=0.365 Pr (>F)=0.317

8 4

6 3

4 2

2 1

0 0 Stage I Stage II Stage III Stage IV Stage I Stage II Stage III Stage IV

I 7 SST F value=1.08 J MS4A12 F value=0.759 Pr (>F)=0.358 Pr (>F)=0.518 6 8

5 6 4

3 4

2 2 1

0 0 Stage I Stage II Stage III Stage IV Stage I Stage II Stage III Stage IV

Figure 6 The relationship between expression of hub genes and pathological stage. prognosis, and serve as a basis for the diagnosis of tumor containing immunoglobulin domain. TMIGD protein recurrence. is a cell surface glycoprotein, that consists of 3 main The transmembrane and immunoglobulin domain domains: the extracellular domain containing one or more (TMIGD) is a kind of cell surface adhesion molecule immunoglobulin-like domains, the single transmembrane

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3464 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC

A GCG B GUCA2B C CLCA4 1.0 HR=1.44 (1.03−2.01) 1.0 HR=1.25 (0.89−1.76) 1.0 HR=0.66 (0.46−0.94) log rank P =0.031 log rank P =0.02 log rank P =0.021 0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Probability Probability Probability

0.2 0.2 0.2 Expression Expression Expression low low low 0.0 high 0.0 high 0.0 high 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 Time (months) Time (months) Time (months) Number at risk Number at risk Number at risk low 181 126 67 37 16 13 11 low 252 176 97 51 24 15 11 low 95 59 28 11 3 3 3 high 190 125 67 33 19 9 4 high 119 75 37 19 11 7 4 high 276 192 106 59 32 19 12

D ZG16 E TMIGD1 F GUCA2A 1.0 HR=1.42 (1.01−1.98) 1.0 HR=0.73 (0.51−1.06) 1.0 HR=1.25 (0.87−1.8) log rank P =0.04 log rank P =0.098 log rank P =0.23

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Probability Probability Probability

0.2 0.2 0.2 Expression Expression Expression low low low 0.0 high 0.0 high 0.0 high 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 Time (months) Time (months) Time (months) Number at risk Number at risk Number at risk low 168 118 66 40 18 14 11 low 97 59 27 12 1 1 1 low 276 192 107 55 26 18 13 high 203 133 68 30 17 8 4 high 274 192 107 58 34 21 14 high 95 59 27 15 9 4 2

G CHGA H PYY I SST 1.0 HR=1.69 (1.12−2.56) 1.0 HR=1.33 (0.93−1.9) 1.0 HR=1.48 (1.02−2.14) log rank P =0.011 log rank P =0.12 log rank P =0.038

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Probability Probability Probability

0.2 0.2 0.2 Expression Expression Expression low low low 0.0 high 0.0 high 0.0 high 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 Time (months) Time (months) Time (months) Number at risk Number at risk Number at risk low 99 69 42 24 10 6 5 low 277 187 102 57 27 19 13 low 122 87 50 24 9 5 3 high 272 182 92 46 25 16 10 high 94 64 32 13 8 3 2 high 249 164 84 46 26 17 12 J MS4A12 1.0 HR=0.64 (0.45−0.91) log rank P =0.013

0.8

0.6

0.4 Probability

0.2 Expression low 0.0 high 0 10 20 30 40 50 60 Time (months) Number at risk low 108 58 27 14 4 2 1 high 263 193 107 56 31 20 14

Figure 7 The overall survival of Kaplan-Meier analysis of 6 hub genes.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3465

A GCG B GUCA2B

C CLCA4 D ZG16

E TMIGD1 F GUCA2A

G CHGA H PYY

I SST J MS4A12

Figure 8 Relationship to cancer and inflammation related to hub genes.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 3466 Zheng et al. Biomarkers for the diagnosis and treatment of PCRC domain, and the intracellular domain of C-terminal (18). for comprehensive verification to further understand the At present, 3 members of this family have been identified: molecular mechanism of PCRC. TMIGD1, TMIGD2, and TMIGD3. TMIGD1 is In conclusion, we identified 12 gene sets and 10 hub expressed in the brain, kidney, stomach, small intestine, genes from the genomic samples of patients with PCRC and and colon epithelium, while the expression in other tissues normal controls by bioinformatics analysis. The key genes is significantly lower than the above. Cattaneo et al. found in DEGs may provide novel insights and critical evidence a low expression of TMIGD1 in colorectal cancer (19). for the diagnosis and targeted therapy of PCRC. TMIGD1 can form homologous dimers between cells through its immunoglobulin-like domain, which mediates Acknowledgments intercellular adhesion, stabilizes cell membrane structure, and inhibits cell proliferation and migration. Iyer et al. Funding: None. found that TMIGD3 can play a role as a tumor suppressor in osteosarcoma and inhibit the proliferation, migration, Footnote invasion, and tolerance to adverse stimuli of osteosarcoma cells by inhibiting the activation of the PKA-AKT-NF-KB Conflicts of Interest: All authors have completed the ICMJE pathway (20). In this study, we found that the expression uniform disclosure form (available at http://dx.doi. level of TMIGD in PCRC cells was significantly reduced, org/10.21037/tcr-19-2290). The authors have no conflicts indicating its tumor-suppressing effect which could provide of interest to declare. references and ideas for targeted therapy of colorectal cancer. Ethical Statement: The authors are accountable for all Go-Respiratory-Chain, also called electron transfer aspects of the work in ensuring that questions related chain, is an energy conversion system consisting of a series to the accuracy or integrity of any part of the work are of electron transporters located in the mitochondrial inner appropriately investigated and resolved. This research membrane in a standard oxidation-reduction potential, was approved by the Ethics Review Committee of Fourth arranged from low to high. It transfers the paired hydrogen Hospital of Hebei Medical University. The approval atoms from the metabolites to oxygen and produces number is 2017MEC115. water and ATP, with the latter being the primary source of energy for life activities. Tumor cells are in a state of Open Access Statement: This is an Open Access article infinite proliferation and need a large energy supply, but distributed in accordance with the Creative Commons the free radical reactive oxygen species (ROS) produced Attribution-NonCommercial-NoDerivs 4.0 International by respiration has an important impact on the normal License (CC BY-NC-ND 4.0), which permits the non- growth of cells, and a too-low ROS level can inhibit cell commercial replication and distribution of the article with proliferation (21). However, our analysis found that Go- the strict proviso that no changes or edits are made and the Respiratory-Chain is weakly expressed in PCRC. We original work is properly cited (including links to both the hypothesized that the expression of related genes in the formal publication through the relevant DOI and the license). mitochondrial respiratory chain is out of control, enhances See: https://creativecommons.org/licenses/by-nc-nd/4.0/. mitochondrial activity and energy ATP production, and triggers cell proliferation and cancerization. Energy is References the limiting factor for cell growth and proliferation, and regulation of the mitochondrial respiratory chain may serve 1. Donovan MG, Selmin OI, Doetschman TC, et al. as a potential therapeutic target. Mediterranean Diet: Prevention of Colorectal Cancer. Although the study conducted a rigorous bioinformatics Front Nutr 2017;4:59. analysis, there are still some shortcomings: (I) the sample 2. Dekker E, Tanis PJ, Vleugels JLA, et al. Colorectal cancer. size in the data set was small, so it is necessary to further Lancet 2019;394:1467-80. expand the sample size to obtain more accurate results. (II) 3. Falzone L, Scola L, Zanghi A, et al. Integrated analysis This paper only conducted bioinformatics data analysis but of colorectal cancer microRNA datasets: identification of did not conduct experimental verification. A large number microRNAs associated with tumor development. Aging of clinical samples and animal experiments should be used (Albany NY) 2018;10:1000-14.

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290 Translational Cancer Research, Vol 9, No 5 May 2020 3467

4. Zhao Y, Chung M, Johnson BA, et al. Hierarchical of colorectal cancer: a review. Updates Surg 2016;68:7-11. Feature Selection Incorporating Known and Novel 14. Díaz-Tasende J. Colorectal cancer screening and survival. Biological Information: Identifying Genomic Features Rev Esp Enferm Dig 2018;110:681-3. Related to Prostate Cancer Recurrence. J Am Stat Assoc 15. Clancy C, Burke JP, Barry M, et al. A meta-analysis to 2016;111:1427-39. determine the effect of primary tumor resection for stage 5. Yan L, You WQ, Sheng NQ, et al. A CREB1/miR- IV colorectal cancer with unresectable metastases on 433 reciprocal feedback loop modulates proliferation patient survival. Ann Surg Oncol 2014;21:3900-8. and metastasis in colorectal cancer. Aging (Albany NY) 16. Mura CV, Delgado R, Delgado MG, et al. A CLCA 2018;10:3774-93. regulatory protein present in the chemosensory cilia of 6. Wang Z, Monteiro CD, Jagodnik KM, et al. Extraction olfactory sensory neurons induces a Ca(2+)-activated Cl(- and analysis of signatures from the Gene Expression ) current when transfected into HEK293. BMC Neurosci Omnibus by the crowd. Nat Commun 2016;7:12846. 2017;18:61. 7. The Gene Ontology Consortium. The Gene Ontology 17. Liu CL, Shi GP. Calcium-activated chloride channel Resource: 20 years and still GOing strong. Nucleic Acids regulator 1 (CLCA1): More than a regulator of chloride Res 2019;47:D330-D338. transport and mucus production. World Allergy Organ J 8. Lu J, Zhang YH, Wang S, et al. Analysis of four types 2019;12:100077. of leukemia using Gene Ontology term and Kyoto 18. Meyer RD, Zou X, Ali M, et al. TMIGD1 acts as a tumor Encyclopedia of Genes and Genomes pathway enrichment suppressor through regulation of p21Cip1/p27Kip1 in scores. Comb Chem High Throughput Screen 2018. [Epub renal cancer. Oncotarget 2017;9:9672-84. ahead of print]. 19. Cattaneo E, Laczko E, Buffoli F, et al. Preinvasive 9. A P, Xu X, Wang C, et al. A Bioinformatic Profile of Gene colorectal lesion transcriptomes correlate with endoscopic Expression of Colorectal Carcinoma Derived Organoids. morphology (polypoid vs. nonpolypoid). EMBO Mol Med Biomed Res Int 2018;2018:2594076. 2011;3:334-47. 10. Sun G, Li Y, Peng Y, et al. Identification of differentially 20. Iyer SV, Ranjan A, Elias HK, et al. Genome-wide RNAi expressed genes and biological characteristics of colorectal screening identifies TMIGD3 isoform1 as a suppressor cancer by integrated bioinformatics analysis. J Cell Physiol of NF-κB and osteosarcoma progression. Nat Commun 2019. [Epub ahead of print]. 2016;7:13561. 11. Yang G, Zhang Y, Yang J. A Five-microRNA Signature 21. Kim SM, Hwang KA, Choi DW, et al. The cigarette as Prognostic Biomarker in Colorectal Cancer by smoke components induced the cell proliferation and Bioinformatics Analysis. Front Oncol 2019;9:1207. epithelial to mesenchymal transition via production of 12. Siegel RL, Miller KD, Fedewa SA, et al. Colorectal cancer reactive oxygen species in endometrial adenocarcinoma statistics, 2017. CA Cancer J Clin 2017;67:177-93. cells. Food Chem Toxicol 2018;121:657-65. 13. Favoriti P, Carbone G, Greco M, et al. Worldwide burden

Cite this article as: Zheng ZG, Ma BQ, Xiao Y, Wang TX, Yu T, Huo YH, Wang QQ, Shan MJ, Meng LB, Han J. Identification of biomarkers for the diagnosis and treatment of primary colorectal cancer based on microarray technology. Transl Cancer Res 2020;9(5):3453-3467. doi: 10.21037/tcr-19-2290

© Translational Cancer Research. All rights reserved. Transl Cancer Res 2020;9(5):3453-3467 | http://dx.doi.org/10.21037/tcr-19-2290