Downloaded from the Cancer Genome Atlas Genes and Genomes (KEGG) Enrichment Analyses Utilizing the R (TCGA) Database (
Total Page:16
File Type:pdf, Size:1020Kb
ORIGINAL RESEARCH published: 22 March 2021 doi: 10.3389/fonc.2021.633899 High BLM Expression Predicts Poor Clinical Outcome and Contributes to Malignant Progression in Human Cholangiocarcinoma Xiaolong Du 1†, Chen Zhang 1†, Chuanzheng Yin 1, Wenjie Wang 1, Xueke Yan 1, Dawei Xie 2, Xichuan Zheng 1, Qichang Zheng 1, Min Li 1 and Zifang Song 1* 1 Department of Hepatobiliary Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China, 2 State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China Molecular mechanisms underlying the tumorigenesis of a highly malignant cancer, cholangiocarcinoma (CCA), are still obscure. In our study, the CCA expression profile data were acquired from The Cancer Genome Atlas (TCGA) database, and differentially Edited by: expressed genes (DEGs) in the TCGA-Cholangiocarcinoma (TCGA-CHOL) data set were Dawit Kidane, University of Texas at Austin, utilized to construct a co-expression network via weighted gene co-expression network United States analysis (WGCNA). The blue gene module associated with the histopathologic grade of Reviewed by: CCA was screened. Then, five candidate hub genes were screened by combining the co- Zehua Bian, expression network with protein–protein interaction (PPI) network. After progression and Affiliated Hospital of Jiangnan University, China survival analyses, bloom syndrome helicase (BLM) was ultimately identified as a real hub Lu Zhang, gene. Moreover, the receiver operating characteristic (ROC) curve analysis suggested Henan University, China that BLM had a favorable diagnostic and predictive recurrence value for CCA. The gene *Correspondence: Zifang Song set enrichment analysis (GSEA) results for a single hub gene revealed the importance [email protected] of cell cycle-related pathways in the CCA progression and prognosis. Furthermore, we †These authors have contributed detected the BLM expression in vitro, and the results demonstrated that the expression equally to this work level of BLM was much higher in the CCA tissues and cells relative to adjacent non-tumor samples and normal bile duct epithelial cells. Additionally, after further silencing the BLM Specialty section: This article was submitted to expression by small interfering RNA (siRNA), the proliferation and migration ability of CCA Gastrointestinal Cancers, cells were all inhibited, and the cell cycle was arrested. Altogether, a real hub gene (BLM) a section of the journal and cell cycle-related pathways were identified in the present study, and the gene BLM Frontiers in Oncology may be involved in the CCA progression and could act as a reliable biomarker for potential Received: 26 November 2020 Accepted: 09 February 2021 diagnosis and prognostic evaluation. Published: 22 March 2021 Keywords: cholangiocarcinoma, prognosis, progression, co-expression network analysis, bioinformatics Citation: Du X, Zhang C, Yin C, Wang W, Yan X, Xie D, Zheng X, Zheng Q, Li M and INTRODUCTION Song Z (2021) High BLM Expression Predicts Poor Clinical Outcome and Contributes to Malignant Progression Cholangiocarcinoma (CCA) is one of the most frequent primary biliary duct malignancies and in Human Cholangiocarcinoma. accounts for 3% of all gastrointestinal neoplasias (1, 2). Since CCA possesses the characteristics Front. Oncol. 11:633899. of high malignancy, insidious onset, and rapid progression, the prognosis of patients with CCA is doi: 10.3389/fonc.2021.633899 often poor (3, 4). Large-scale clinical research revealed that the 5-year survival rates were 5 and 17% Frontiers in Oncology | www.frontiersin.org 1 March 2021 | Volume 11 | Article 633899 Du et al. Biomarker BLM Identification in Cholangiocarcinoma for intrahepatic CCA (iCCA) and extrahepatic CCA (eCCA), adjacent non-cancerous samples. These two data sets were used respectively, for the cases diagnosed between 2000 and 2007 in to further validate our candidate hub genes. Europe (5). In the USA alone, CCA accounts for approximately 5,000 deaths per year (6). Moreover, since its early symptoms are not obvious and CCA lacks a highly efficient disease screening Differentially Expressed Gene Screening biomarker, many patients have frequently lost the chance of and Principal Component Analysis radical surgery upon the first diagnosis (7). Furthermore, as The “TCGAbiolinks” R package (15) was used to identify the the pathogenesis of CCA is rather complicated, there is still DEGs between adjacent normal tissues and CCA samples in no satisfying targeted treatment option available for CCA in the CHOL data set. Significance analysis with a false discovery < the age of individualized medicine (8). Therefore, there is an rate (FDR) 0.01 and |log2 fold change (FC)| ≥ 1 was urgent need to explore the molecular pathogenesis of CCA and adopted to choose the genes for a network construction. To reveal the biomarkers closely related to the diagnosis, occurrence, explore gene expression patterns between the CCA samples and progression, and prognosis of CCA. adjacent tissues, the top 500 DEGs were included in the principal Due to advances in sequencing and microarray technologies, component analysis (PCA) utilizing the R package “pca3d” bioinformatics has begun to play an increasingly important role (https://cran.r-project.org/web/packages/pca3d/). The first three in various fields of life sciences (9, 10). The development of large- principal components were utilized and plotted to show the scale “omics” research methods such as genomics, proteomics, expression pattern of the two groups. and transcriptomics has brought about a vast amount of biological information data (11). Gene co-expression networks Weighted Gene Co-expression Network can describe and cluster multiple genes sharing high-expression correlations in microarray data or high-throughput sequencing Analysis data and ultimately be used to establish key regulatory genes(12). R package of “WGCNA”(16) was utilized to build a co-expression A weighted gene co-expression network analysis (WGCNA), as network for the qualified expression data profiles of filtered one of the most representative analytical methods of a gene DEGs with the available clinical data. First, we retained the network, has provided significant biological information for the RNA-seq data of the filtered CCA tissues if they proved to be research on multiple species such as humans, mice, and yeast good samples. Second, outlier samples were culled after a sample (13, 14). In this study, a WGCNA network was constructed, and clustering via correlation analysis. Third, based on a Pearson genes with similar expression patterns were incorporated into correlation analysis, a matrix of similarity for all pairs of kept the identical modules. By relating the results of the module to genes was constructed. Then, appropriate soft-thresholding was the corresponding clinical data, the modules that were mostly chosen to achieve a scale-free co-expression network. Next, the associated with the CCA progression were found. Finally, after adjacency matrix was converted to a topological overlap matrix a range of screening and validation tests, we identified a bloom (TOM). In line with the TOM-based dissimilarity method, genes syndrome helicase (BLM) gene that could indeed promote the were assigned to the different modules. Here, we set the soft- 2 tumor progression and predict the prognosis of CCA. thresholding power to 9 (scale-free R = 0.913), the cut height to 0.25, and the minimal module size to 30 to identify key modules. The module most relevant to clinical traits was chosen to screen MATERIALS AND METHODS co-expression hub genes. Data Sets and Study Design RNA-sequencing (RNA-seq) data (Illumina RNA Seq V2, Function Enrichment Analyses Illumina, San Diego, CA) and the corresponding clinical data on We performed gene ontology (GO) and Kyoto Encyclopedia of CCA samples were downloaded from The Cancer Genome Atlas Genes and Genomes (KEGG) enrichment analyses utilizing the R (TCGA) database (http://cancergenome.nih.gov/). This CHOL package “clusterprofiler” (17). GO terms or KEGG pathways with data set included 36 CCA samples and 9 corresponding adjacent p < 0.05 were regarded as statistically significant and visualized non-cancerous samples, and its clinical data were composed by R package “GOplot” (18). of tumor histologic grade, pathological stage, and numerous follow-up information. The messenger RNA (mRNA) expression profile and clinical data were employed to search for differentially Candidate Hub Genes Identification expressed genes (DEGs) and construct co-expression networks. In this research, a module of interest was determined, and co- The microarray data sets GSE76297 and GSE132305 were expression hub genes were specified as high module connectivity acquired from the gene expression omnibus (GEO) database (module membership > 0.8) and high clinical trait significance of the National Center for Biotechnology Information (NCBI) (gene significance > 0.4). Moreover, the gene list of this key database (https://www.ncbi.nlm.nih.gov/). These two data sets module was submitted to the STRING database to build a were generated on the platforms of the Affymetrix Human protein–protein interaction (PPI) network (19), which was then Transcriptome Array 2.0 and Affymetrix Human Genome visualized by Cytoscape software (20). PPI network hub genes U219 Array, respectively. The data set GSE76297 contained 91 were