Integration Analysis for Novel Lncrna Markers Predicting Tumor Recurrence in Human Colon Adenocarcinoma Fangyao Chen1, Zhe Li2, Changyu Deng3 and Hong Yan1*
Total Page:16
File Type:pdf, Size:1020Kb
Chen et al. J Transl Med (2019) 17:299 https://doi.org/10.1186/s12967-019-2049-2 Journal of Translational Medicine RESEARCH Open Access Integration analysis for novel lncRNA markers predicting tumor recurrence in human colon adenocarcinoma Fangyao Chen1, Zhe Li2, Changyu Deng3 and Hong Yan1* Abstract Background: Numerous evidence has suggested that long non-coding RNA (lncRNA) acts an important role in tumor biology. This study focuses on the identifcation of novel prognostic lncRNA biomarkers predicting tumor recurrence in human colon adenocarcinoma. Methods: We obtained the research data from The Cancer Genome Atlas (TCGA) database. The interaction among diferent expressed lncRNA, miRNA and mRNA markers between colon adenocarcinoma patients with and without tumor recurrence were verifed with miRcode, starBase and miRTarBase databases. We established the lncRNA– miRNA–mRNA competing endogenous RNA (ceRNA) network based on the verifed association between the selected markers. We performed the functional enrichment analysis to obtain better understanding of the selected lncRNAs. Then we use multivariate logistic regression to identify the prognostic lncRNA markers with covariates. We also gener- ated a nomogram predicting tumor recurrence risk based on the identifed lncRNA biomarkers and clinical covariates. Results: We included 12,727 lncRNA, 1881 miRNA and 47,761 mRNA profling and clinical features for 113 colon adenocarcinoma patients obtained from the TCGA database. After fltration, we used 37 specifc lncRNAs, 60 miRNAs and 148 mRNAs in the ceRNA network analysis. We identifed fve lncRNAs as prognostic lncRNA markers predicting tumor recurrence in colon adenocarcinoma, in which four of them were identifed for the frst time. Finally, we gener- ated a nomogram illustrating the association between the identifed lncRNAs and the tumor recurrence risk in colon adenocarcinoma. Conclusions: The four newly identifed lncRNA biomarkers might be potential prognostic biomarkers predicting tumor recurrence in colon adenocarcinoma. We recommend that further clinical and fundamental researches be conducted on the identifed lncRNA markers. Keywords: lncRNA, Colon adenocarcinoma, Tumor recurrence, Integrative analysis, ceRNA network Background 24 months after surgery [2] and tumor recurrence still Colorectal carcinoma (CRC) is among the most com- acts as one of the most severe risk factors to overall sur- mon cancers with high morbidity and mortality among vival of CA patients [3]. Tus, the issue of tumor recur- all malignancies and the most common CRC is colon rence following a primary CA becomes very important adenocarcinoma (CA) [1]. It was reported that over 70% [4]. It is essential to identify prognostic markers in order of CA patients would develop tumor recurrence within to study the biological mechanism in CA and identify the candidate targets for therapy. Long non-coding RNAs (lncRNAs), with lengths of at *Correspondence: [email protected] least 200 nucleotides, modulate gene expression at the 1 Department of Epidemiology and Biostatistics, School of Public Health, post-transcriptional level [5]. With the innovations in Xi’an Jiaotong University Health Science Center, 76 Yanta Xilu Road, Xi’an 710061, Shaanxi, China RNA sequencing technologies and computational biol- Full list of author information is available at the end of the article ogy, recent fndings suggest that lncRNAs are involved in © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chen et al. J Transl Med (2019) 17:299 Page 2 of 13 the biological process of cancer development [6]. Numer- correct their positively skewed distributions. Suppose xij ous studies on the role of lncRNAs in various types of is the expression for jth lncRNA or miRNA expression cancers have been performed, and several lncRNA bio- of ith individual, the transformed value of jth lncRNA markers have been identifed to be related to the devel- or miRNA expression of ith individual would equal to opment, diagnosis and overall survival of various cancers ln(xij) (if xij > 0) or 0 (if xij = 0). Te transformation can be [7–11]. Recently, lncRNA HOTAIR has been identifed expressed with: to be related to the overall survival in CA [12]. LncRNA ln xij , if xij > 0 ATB is associated with poor prognosis of CRC [13]. Transformed_expression = 0, if xij = 0 LncRNA CC AT1 is reported to be of clinical value in the diagnosis of CA [14]. All these studies suggest a potential Ten, before incorporating into multi-variate analysis, value of lncRNAs in the prognosis and diagnosis of CA. to obtain a better explanation to the coefcient obtained Te competing endogenous RNA (ceRNA) hypoth- in regression analysis, we transform the log-transformed esis was presented as a new model demonstrating the lncRNA expressions into binary variables according to association between non-coding and coding RNAs and whether the log-transformed expression was higher (up- accepted as one of the most efcient tools in lncRNAs regulated) or lower (down-regulated) than its log-trans- research [10]. It has been widely utilized in the identif- formed mean. cation of diagnostic and prognostic lncRNA markers in Te mRNA profling obtained from the TCGA data- various cancers [15–18]. base were normally distributed, therefore, no pre-pro- Few studies have focused on the ceRNA network cessing was performed on the mRNA expressions. related to tumor recurrence in CA [19–22]. Tus, in this study, we aim to establish the lncRNA–miRNA–mRNA Statistical analysis ceRNA network for the tumor recurrence of CA to iden- All analyses were performed through R (version 3.4.4, tify novel prognostic lncRNA biomarkers for the predic- the R Foundation for Statistical Computing, Vienna, tion of tumor recurrence in CA and to achieve better Austria). Clinical and demographic characteristics were understanding of the role of lncRNAs in CA based on tested (with α = 0.05) by Chi-square test (gender, pathol- the RNA sequencing data obtained from Te Cancer ogy stage and tumor site), Mann–Whitney test (TNM Genome Atlas (TCGA) database. stage) and t-test (age at diagnosis). We also used t-test to select lncRNAs, miRNAs (log-transformed) and mRNAs Materials with diferent expression levels between CA patients Data profle with and without tumor recurrence (with α = 0.01), the We obtained the RNA sequencing (including miRNA, p-value obtained were adjusted with the BH method lncRNA and mRNA) measurements and clinical charac- [23]. Prognostic lncRNA markers were identifed based teristics of CA patients from Te Cancer Genome Atlas on the adjusted ORs obtained from multi-variate logistic (TCGA) database (https ://cance rgeno me.nih.gov/), an regression. open source of information to identify novel biomarkers in cancer research, using R package “TCGAbiolinks” in Establishment of lncRNA–miRNA–mRNA ceRNA network June 2018. As a result, 12,727 lncRNA, 1881 miRNA and We constructed the lncRNA–miRNA–mRNA ceRNA 47,761 mRNA profling were obtained. network to identify miRNAs associated mRNAs based on We excluded patients with missing information in the interaction among lncRNA, miRNA and mRNA that tumor recurrence status, tumor location, venous sta- were verifed based on the miRcode (http://www.mirco tus, lymphatic invasion status, histology type, pathology de.org/) [24], starBase (http://starb ase.sysu.edu.cn/) [25], stage, TNM stage or age at diagnosis. We also excluded and miRTarBase (http://mirta rbase .mbc.nctu.edu.tw/ records without lncRNA, miRNA or mRNA meas- php/index .php) databases [26]. First, we use t-test (with urements. Finally, 113 records were included in this BH correction) to select lncRNAs, miRNAs and mRNAs study. Te maximum follow-up time was 10.37 years with diferent expression levels between cases with (3780 days) with medium follow-up time equal to and without tumor recurrence. Ten, the diferentially 1.34 years (488 days). During the follow-up, 6 out of 113 expressed lncRNAs, miRNAs and mRNAs, which have individuals were recorded as deceased. been verifed in the miRcode, starBase and miRTanBase databases, were incorporated into the construction of Data pre‑processing lncRNA–miRNA–mRNA ceRNA network. Te lncRNA– Since the obtained lncRNA and miRNA expres- miRNA–mRNA ceRNA network was conducted using sion measurements were not normally distributed, the Cytoscape software (version 3.6.1, National Institute we performed a log transformation to normalize and of General Medical Science, Bethesda, MD, US) [27]. We Chen et al. J Transl Med (2019) 17:299 Page 3 of 13 also used the “clusterMaker2” [28] application within in Table 1 Clinical characteristics of included CA patients the Cytoscape software to identify subnetworks through Factor Categories Tumor status Total p‑value the Markov Cluster Algorithm (MCL clustering) [28]. Free Recurred Functional annotation analysis Gender Female 10 53 63 0.361a We performed the functional annotation analysis using Male 5 45 50 “clueGO” [29] application within the Cytoscape software. Venous invasion No 11 73 84 0.924a Gene Ontology (GO) analysis was performed based on Yes 4 25 29 the GO database (http://www.geneo ntolo gy.org/) [30]. Lymphatic inva- No 9 48 57 0.427a Pathway enrichment analysis was performed based on sion Yes 6 50 56 the Kyoto Encyclopedia of Genes and Genomes database Pathology stage I/II 10 57 67 0.053a (KEGG) (https ://www.genom e.jp/kegg/pathw ay.html) III/IV 5 41 46 [31] and the Reactome (https ://www.react ome.org/) T stage T1 0 3 3 0.151b databases [32].