Expression Profile Analysis Identifies Key
Total Page:16
File Type:pdf, Size:1020Kb
Guan et al. Cancer Cell Int (2020) 20:104 https://doi.org/10.1186/s12935-020-01179-x Cancer Cell International PRIMARY RESEARCH Open Access Expression profle analysis identifes key genes as prognostic markers for metastasis of osteosarcoma Xiaoqing Guan1*†, Zhiyuan Guan2,3† and Chunli Song2,3* Abstract Background: OS is the most common malignant tumor of bone which was featured with osteoid or immature bone produced by the malignant cells, and biomarkers are urgently needed to identify patients with this aggressive disease. Methods: We downloaded gene expression profles from GEO and TARGET datasets for OS, respectively, and per- formed WGCNA to identify the key module. Whereafter, functional annotation and GSEA demonstrated the relation- ships between target genes and OS. Results: In this study, we discovered four key genes—ALOX5AP, HLA-DMB, HLA-DRA and SPINT2 as new prognostic markers and confrmed their relationship with OS metastasis in the validation set. Conclusions: In conclusion, ALOX5AP, HLA-DMB, HLA-DRA and SPINT2 were identifed by bioinformatics analysis as possible prognostic markers for OS metastasis. Keywords: Biomarker, Osteosarcoma, Metastasis, Molecular classifer, Prognosis, Gene expression Background leading cause of death in patients with OS, compared Osteosarcoma (OS) is the most common type of cancer with 70% of patients with localized disease, and only that arises in bones and most people diagnosed with OS about 20% becoming long-term survivors. are under the age of 25 [1]. Te incidence of OS in the Previous studies have investigated mutational altera- general population is 2–3/million/year and the peak at tions or gene factors in an attempt to identify candidate the age of 15–19 is 8–11/million/year [2]. OS is charac- OS driver oncogenes or tumors suppressors [4–6]. So terized by early metastasis, poor prognosis without treat- far, for patients with metastatic OS, neither prognostic ment [3], and more than 90% of patients die from lung factors nor optimal treatment methods have been well metastasis before multiple chemotherapy. OS is currently established. Terefore, more attention must be paid to undergoing multidisciplinary treatment, with approxi- more precise risk assessment, not only for patient consul- mately 15–20% of patients showing signs of metastasis tation, but also for determining treatment options based at diagnosis, most in the lungs. Metastasis remains the on reliable stratifed criteria. In order to detect pulmo- nary metastasis OS early and improve poor survivorship, it is important to further explore more efective prognos- *Correspondence: [email protected]; [email protected] tic biomarkers and therapeutic targets. †Xiaoqing Guan and Zhiyuan Guan contributed equally to this work 1 Center for Cancer Bioinformatics, Key Laboratory of Carcinogenesis Although research on biomarkers for metastasis within and Translational Research (Ministry of Education), Peking University OS has recently expanded [7, 8], the targets after any Cancer Hospital & Institute, Beijing, China OS diagnosis within the clinic and suitable for various 2 Department of Orthopaedics, Peking University Third Hospital, Beijing, China sequencing platforms remain sparse. Recent develop- Full list of author information is available at the end of the article ment of gene chips and high-throughput sequencing © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creat iveco mmons .org/publi cdoma in/ zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Guan et al. Cancer Cell Int (2020) 20:104 Page 2 of 10 technology, have enabled the identifcation of key genes Te results may reveal the prognostic value of the gene related to tumor progression and prognosis based on big signature for OS (Fig. 1). data integration and bioinformatics. Weighted gene co- expression network analysis (WGCNA) is a systematic Methods biological method that could identify highly synergisti- Data sources and data preprocessing cally altered gene sets and screen out therapeutic targets We downloaded standardized matrix profle (*series or candidate biomarkers based on the inherent char- matrix.txt) of GSE21257 (a microarray dataset) and acteristics of the gene sets and the correlation between obtained patient information from GEO database gene sets and phenotypes. (Table 1) [9]. Te platform of the dataset is the GPL10295 Aiming at identifying and validating key genes in OS Illumina human-6 v2.0 expression beadchip. We removed metastasis, the present study frstly identifed associated probes not mapping to the Gene symbol using platform module by WGCNA according to the gene expression annotation fle. For diferent probes corresponding to the profles from Gene Expression Omnibus (GEO) data- same gene, their median expression values were taken as sets and determined the diferentially expressed genes the fnal gene expression value. Diferentially expressed between metastatic OS samples and non-metastatic genes (DEGs) between OS samples with metastasis samples. Subsequently, Gene Ontology (GO) and Kyoto and those without metastasis were identifed using the Encyclopedia of Genes and Genomes (KEGG) pathway “limma” (linear models for microarray data) R package analyses were performed to determine the most sig- (False discovery rate (FDR) < 0.05 and absolute of log- nifcant pathways associated with OS metastasis. Addi- 2fold change (FC) > 1) [10]. tionally, we constructed Kaplan–Meier (KM) curves Te OS RNA-seq expression data and the correspond- and receiver operating characteristic (ROC) curves and ing clinical follow-up data were obtained from the pub- screened the key genes related to OS prognosis. Moreo- licly available website of the National Cancer Institute ver, univariate and multivariate Cox regression analysis TARGET Data Matrix (https ://ocg.cance r.gov/progr were conducted to evaluate the predictive efect of the ams/targe t/data-matri x). To meet the requirement for gene signature. Finally, we validated the gene signature data analysis, we excluded the samples with incomplete using an external RNA sequencing (RNA-Seq) expres- information, then 84 OS expression data were remained. sion data obtained from Te Terapeutically Applicable Genes that have average expression (Transcripts Per Research to Generate Efective Treatments (TARGET). Million (TPM) > 1) between samples were deemed as Fig. 1 Flow chart of study design Guan et al. Cancer Cell Int (2020) 20:104 Page 3 of 10 Table 1 Clinical features of patients in the training set and validation set Training set p Validation set p Metastasis Non‑metastasis Metastasis Non‑metastasis Age 0.41 0.56 Median 16 18 14.05 14.37 Gender 0.11 0.25 Female 9 10 12 25 Male 25 9 9 38 Grade 0.33 – – – 1 11 2 2 9 7 3 7 6 4 3 2 expressed. Te expression value was processed as log2 criteria of MM > 0.8 and GS > 0.2, hub genes in the blue (TPM + 1) for subsequent analysis. module were screened. Cytoscape version 3.7.2 was used for network visualization. Te above analysis is imple- Constructing dynamic weighted gene co‑expression mented using the R package “WGCNA”. network We chose the 3000 most-varying genes for network con- Functional annotation and gene set enrichment analysis struction and module detection. Specifcally, the median (GSEA) absolute deviation (MAD) was used as a robust measure R package clusterProfler was used to conduct GO Bio- of variability. Te network was built based on the proto- logical Process (BP) [13] and KEGG biological pathway cols of R package WGCNA [11, 12]. We frstly clustered over representation analysis for interesting module genes the samples to detect outliers. It appeared there was one [14]. GO terms and KEGG pathways with adjust p < 0.05 outlier and we removed it by hand (Additional fle 1: were considered statistically signifcant pathways. Te Figure S1A). Use PickSoftTreshold function to select enrichment analysis was implemented in command line β = 7 (scale-free R2 = 0.89) to build an adjacency matrix of GSEA [15, 16]. An expression dataset and phenotype to make our gene distribution conform to scale-free net- labels in the GSE21257 dataset were used to conduct works based on connectivity for training set (Additional GSEA analysis according to metastasis status (metasta- fle 1: Figure S1B, C). Next, we used a blockwiseModules sis vs. non-metastasis). Te data was then interrogated function to build a gene co-expression network in one against Reactome gene sets (1499 gene-sets) from Te step and a dynamic tree-cutting algorithm detected the Molecular Signatures Database (MSigDB) version 6.2 [17, modules. Te parameters used for blockwiseModules 18]. We set the cut-of criteria as gene set size > 15, Num- function in WGCNA included a minimum module size ber of enriched gene sets that are signifcant, as indicated of 30, and the dendrogram cut height for module detec- by a FDR of less than 25%. tion set to 0.25 to defne modules of co-expressed probe- sets. Meanwhile we calculated module eigengene of each Cox‑regression based survival analysis module by measuring the frst principal component of a Univariate cox regression analysis was frstly performed specifc module, which represented the overall level of to screen survival related genes.