Weighted Gene Coexpression Analysis Indicates That PLAGL2 and POFUT1 Are Related to the Differential Features of Proximal and Distal Colorectal Cancer
Total Page:16
File Type:pdf, Size:1020Kb
ONCOLOGY REPORTS 42: 2473-2485, 2019 Weighted gene coexpression analysis indicates that PLAGL2 and POFUT1 are related to the differential features of proximal and distal colorectal cancer YIMING LV1,2*, BINBIN XIE3*, BINGJUN BAI1,2, LINA SHAN1,2, WENQIAN ZHENG1,2, XUEFENG HUANG1,2 and HONGBO ZHU1,2 1Department of Colorectal Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University; 2Key Laboratory of Biotherapy of Zhejiang Province; 3Department of Medical Oncology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310016, P.R. China Received May 18, 2019; Accepted August 9, 2019 DOI: 10.3892/or.2019.7368 Abstract. In the current era of precision medicine, there is location, may serve as biomarkers and therapeutic targets for a general consensus that the anatomical site is an important the precise diagnosis and treatment of CRC in the future. factor in the management of colorectal cancer (CRC). To investigate the underlying molecular mechanisms between Introduction proximal and distal CRC and to identify the responsible genes, we analyzed the gene expression patterns of colorectal tumors Colorectal cancer (CRC), which accounted for approximately from two microarray datasets, GSE39582 and GSE14333, 1.8 million new cases and more than 860,000 deaths in 2018, on the NCBI Gene Expression Omnibus and the RNA-seq ranks as the fourth most commonly diagnosed malignancy data from TCGA. Weighted coexpression network analysis and the second leading cause of cancer-related deaths world- (WGCNA) was applied to construct a gene coexpression wide (1). The incidence and mortality rates of CRC are still network. The red module in GSE39582 and the dark-gray increasing rapidly in many developing countries around the module from the TCGA dataset were found to be highly corre- world, causing a considerable public health issue (2). lated with the anatomical site of CRC. A total of 12 hub genes Nearly three decades ago, J.A. Bufill proposed sub‑classi- were found in two datasets, 2 of which PLAG1 like zinc finger fying CRC depending on the anatomical site, either proximal 2 (PLAGL2) and protein O-fucosyltransferase 1 (POFUT1) (right) or distal (left) to the splenic flexure (3). Subsequent were common and upregulated in tumor samples in CRC. research has observed distinct differences in epidemiology The module with the highest correlation provided references and pathological features according to primary tumor loca- that will help to characterize the difference between left-sided tion in CRC. In 2000, H. Elsaleh found that the tumor site is and right-sided CRC. The survival analysis of PLAGL2 and associated with survival benefit from adjuvant chemotherapy POFUT1 expression revealed differences between proximal in CRC (4). This researcher discovered that patients with and distal CRC. Gene set enrichment analysis based on those right‑sided tumors have better survival benefits from adjuvant two genes provided similar results: GPI anchor biosynthesis chemotherapy than patients with left-sided tumors. In addition, and peroxisome and selenoamino acid metabolism. PLAGL2 the frequency of MSI was much higher in right-sided tumors and POFUT1, which have the highest correlation with tumor than in left-sided tumors (5,6). It is now well established by a variety of studies that primary tumor location affects the outcome of the chemotherapy and immunotherapy of CRC patients in a large-scale population, and tumor location is a high‑risk parameter for prognosis in specific stages. There Correspondence to: Dr Hongbo Zhu, Department of Colorectal is a general consensus that primary tumor location plays an Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang important role in CRC development. We could even define University, 3 Qingchun East Road, Hangzhou, Zhejiang 310016, P. R. Ch i na right-sided and left-sided tumors as two different diseases that E‑mail: [email protected] need different treatments (7). This influence of tumor loca- tion may be due to differences in embryological development. *Contributed equally Specifically, the right side of the colon has historically been understood to be derived from the embryological midgut, Key words: bioinformatics analysis, colorectal cancer, tumor and the left colon arises from the embryological hindgut. The location, weighted gene coexpression analysis, WGCNA, PLAGL2, transverse colon is composed of parts of both structures. These POFUT1 different origins could result in various clinical traits. However, the underlying molecular mechanism governing those different behaviors and outcomes has not been fully 2474 LV et al: COEXPRESSED GENES RELATED TO TUMOR SITES IN CRC elucidated to date. With the popularization of next-generation weight for each observation. Thus, the WGCNA workflows sequencing technology, we currently have abundant published originally developed for microarray analysis can be used on research describing the use of the Chip-seq or RNA-seq the RNA-seq data. Further preprocessing steps included the method to investigate problems related to cancer. In the last removal of control samples and the genes with zero counts in decade, a considerable number of studies have been published more than 80% of samples. As mentioned before, genes that on the distinct gene expression between left- and right-sided are not differentially expressed between samples must be CRC (8,9). The generalizability of much of the published excluded; thus, we chose the top 12,000 genes with the highest research on this issue has been restricted to the analysis of MAD for the network building. Fig. 1 depicts a flow chart for differential gene expression, while few previous studies have the bioinformatic analysis. investigated this problem from the perspective of expression patterns. Weighted gene coexpression analysis (WGCNA) is a Construction of weighted gene coexpression networks. The powerful tool to describe the correlation patterns among genes R package ‘WGCNA’ was used in our study to construct a across microarray or RNA-seq samples (10). This method gene coexpression network (10). After data collection and has been widely used to identify modules of tightly corre- normalization, it is crucial that outliers be excluded. However, lated genes and summarize such modules using the module it was difficult to distinguish outlying samples in a dendro- eigengene or intramodular hub genes. After the modules are gram when the number of samples was large. To solve this identified, we can easily evaluate the association between the problem, we used the standardized connectivity (Z. K) method modules and external clinical traits using eigengene network recommended by WGCNA authors with the default threshold, methodology. This approach has been generally acknowledged Z. K score £2. After filtering out the outlying samples, expres- and successfully applied to various cancer studies. sion data were tested to determine whether the samples and In this study, we aimed to utilize the gene expression data genes were good using the integrated function in the WGCNA from the public genomic database to explore the inner connec- package. tions and genetic difference between proximal and distal CRC After filtering out the outliers and bad samples in the dataset, and to use weighted gene coexpression analysis (WGCNA) to the next step of WGCNA is to build a scale-free network. In a search for the responsible genes. scale-free network, several nodes, which are called hub nodes, are highly connected to other nodes in the network (18). In Materials and methods our study, we use the unsigned coexpression measure, which means that the positive correlation and negative correlation are Data collection. The raw expression data of GSE39582 (11) equal. We constructed the gene coexpression network using and GSE14333 (12) were retrieved from the Gene Expression the following steps. Omnibus database (http://www.ncbi.nlm.nih.gov/geo/), both First, we need a soft thresholding power β to which based on the platform of GPL570 Affymetrix Human Genome coexpression similarity is raised to calculate adjacency. By U133 Plus 2.0 Array. We used the Affy package in R to trans- raising the absolute value of the correlation to a power β≥1 form the CEL files of the tumor samples into an expression (soft thresholding), the weighted gene coexpression network matrix (13). To improve the data quality, we used the k-nearest construction emphasizes high correlations at the expense neighbors algorithm (k-NN) from the impute package in R of low correlations. To determine the best soft threshold to impute the missing expression data (14). Meanwhile, the power, scale independence and average connectivity degree robust multiarray average algorithm (RMA) was utilized to of modules with different power values were calculated by adjust the data for potential batch effects and for background the gradient method. We selected the power β to ensure that correcting (15). Prior to WGCNA analysis, we filtered out the coexpression network was a ‘scale-free’ network, which the probes that were absent in all samples. The probe infor- was biologically close to reality. Moreover, to minimize the mation was then transformed into the official gene symbols effects of noise and spurious associations, we subsequently using Bioconductor in R. If multiple probes were applied constructed the Topology Overlap Matrix (TOM) from the to detect the same mRNA, the average value of the probes adjacency matrix and calculated the corresponding dissimi- was used.