Wang et al. Cancer Cell Int (2018) 18:27 https://doi.org/10.1186/s12935-018-0513-3 Cancer Cell International

PRIMARY RESEARCH Open Access Integrated TCGA analysis implicates lncRNA CTB‑193M12.5 as a prognostic factor in lung adenocarcinoma Xuehai Wang, Gang Li, Qingsong Luo, Jiayong Xie and Chongzhi Gan*

Abstract Background: Lung cancer is a malignant tumor with the highest incidence and mortality around the world. Recent advances in RNA sequencing technology have enabled insights into long non-coding RNAs (lncRNAs), a previously largely overlooked species in dissecting lung cancer pathology. Methods: In this study, we used a comprehensive bioinformatics analysis strategy to identify lncRNAs closely associ- ated with lung adenocarcinoma, using the RNA sequencing datasets collected from more than 500 lung adenocarci- noma patients and deposited at The Cancer Genome Atlas (TCGA) database. Results: Diferential expression analysis highlighted lncRNAs CTD-2510F5.4 and CTB-193M12.5, both of which were signifcantly upregulated in cancerous specimens. Moreover, network analyses showed highly correlated expression levels of both lncRNAs with those of diferentially expressed -coding , and suggested central regulatory roles of both lncRNAs in the co-expression network. Importantly, expression of CTB-193M12.5 showed strong negative correlation with patient survival. Conclusions: Our study mined existing TCGA datasets for novel factors associated with lung adenocarcinoma, and identifed a largely unknown lncRNA as a potential prognostic factor. Further investigation is warranted to character- ize the roles and signifcance of CTB-193M12.5 in lung adenocarcinoma biology. Keywords: Lung cancer, lncRNA, Prognostic factor, TCGA datasets

Background problem, and is therefore under intensive biomedical and Lung cancer is the leading cause of cancer-related mortal- clinical research. ity worldwide, with a particularly low 5-year survival rate Breakthroughs in ‘omics’ technologies, such as genom- for patients sufering from this disease at its advanced ics, transcriptomics, and proteomics, have opened ave- stages. In the US, lung cancer is estimated to account nues for a systematic approach for understanding and for approximately one quarter (26%) of all cancer-related treating cancer [3, 4]. In particular, a furry of recent deaths in the year 2017 [1]. In China, which currently cancer profling studies have focused on RNA sequenc- hosts the largest population in the world, 730,000 new ing (RNA-Seq), a rapidly maturing development of the cases of lung cancer were estimated for the year 2015, next-generation sequencing technology. Compared with along with more than 610,000 deaths [2]. Across the microarray analysis, RNA-Seq profling allows for larger globe, as incidence and mortality generally continue dynamic range, and higher sensitivity and throughput [5]. with rise, lung cancer has become a major public health As a result, RNA-Seq profling has been used in several recent studies of lung cancer molecular pathogenesis, including discovery of novel mutations in key oncogenes *Correspondence: [email protected] and genomic rearrangements in squamous cell lung can- Department of Thoracic Surgery, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, 32 West Second Section First cer [6] and adenocarcinoma [7], identifcation of potential Ring Road, Chengdu 610072, Sichuan, People’s Republic of China

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Wang et al. Cancer Cell Int (2018) 18:27 Page 2 of 16

biomarkers in non-small cell lung cancer (NSCLC) [8], validation and characterization by “wet bench” and clini- and quantifcation of expression of marker genes [9]. cal research. One revelation largely enabled by high-throughput sequencing analysis was that non-coding RNAs make up Methods the majority (approx. 85%) of transcriptome. Based on Data collection and preprocessing transcript length, non-coding RNAs can be divided into Te data used in this study were obtained from Te Can- short non-coding RNAs (sncRNAs, < 200 nucleotide) cer Genome Atlas database (https://portal.gdc.cancer. and long non-coding RNAs (lncRNAs, > 200 nucleotide) gov/), including protein-coding transcript and lncRNA [10]. Deregulation of lncRNAs has been well recognized profles of lung adenocarcinoma specimens and the cor- in cancer, and has been suggested to modulate tumor responding patient clinical follow-up data. RNA-Seq data development at chromosomal, transcriptional, and post- (presented as Fragments Per Kilobase Million) were col- transcriptional levels [10, 11]. In lung cancer, the list of lected on Illumina HiSeq platforms. implicated lncRNAs is expanding rapidly [11]. How- Te two datasets came from a total of 592 specimens, ever, much still remains unknown about the mechanics which consisted of 59 normal and 533 cancerous tissues. and signifcance of lncRNAs in many aspects of this dis- Notably, there are 57 pairs of cancerous and the corre- ease, such as carcinogenesis, development, metastasis, sponding adjacent tissue in the datasets. Before further response to anti-cancer treatment, and prognosis. processing, quantile normalization was performed on the In this study, we took advantage of large-scale expres- ‘Level-3’ read counts to standardize the data. sion profles and a systems biology strategy to identify Next, we selected lncRNAs and PCGs whose normal- lncRNAs that were signifcantly regulated in lung cancer ized FPKM values were larger than 1 (in RNA-seq analy- specimens, and were strongly co-expressed with a large ses, genes with a FPKM value no great than 1 are typically pool of protein-coding genes (PCGs). In order to detect considered as not expressed) in more than 50% of all 57 co-expression pattern among the lncRNAs and PCGs in specimen pairs, and extracted the expression of these our TCGA datasets, weighted gene co-expression net- retained lncRNAs and PCGs from all 593 specimens for work analysis (WGCNA) was applied. WGCNA has been further analysis. established as an efective data mining method for fnd- ing clusters or modules of highly correlated biomolecules Screening for diferentially expressed lncRNAs and identifying intramodular “hubs”, including genes and protein‑coding genes [12], miRNAs [13], and metabolites [14]. Consequently, Expression profles of lncRNAs and PCGs were analyzed WGCNA has been successfully applied in several lung separately, in order to identify the diferential expression cancer profling investigations, such as identifcation of these genes in normal and cancerous tissue samples. A of diferential mRNA expression [12] and lncRNAs previously reported approach was used in screening for expression profle signature [15] in lung squamous cell diferentially expressed genes [16]. Briefy, for lncRNAs carcinoma. and PCGs with an expression of 0 in more than 30% of In the present study, we used RNA-Seq datasets from either normal or cancerous tissues, Filter B was applied, Te Cancer Genome Atlas (TCGA) database to identify while Filter A was applied for the remaining lncRNAs novel lncRNAs associated with lung cancer. LncRNA and PCGs. profling and protein-coding transcript profles of lung Filter A. fold_change > 2 or fold_change < 0.5 and sta- cancer were extracted from TCGA. Afterwards, these tistically signifcant (p < 0.01, paired Student’s t test), datasets were subjected to a battery of analyses, includ- where fold_change values calculated as indicated in ing diferential expression analysis, co-expression net- Table 1. A fold_change value of greater than two indi- work and cluster analyses, KEGG pathway enrichment, cates that compared with normal specimen, expression and survival analysis. After several rounds of screening, two largely uncharacterized lncRNAs, CTD-2510F5.4 Table 1 Calculation of fold_change values for lncRNAs and CTB-193M12.5, were identifed. Both lncRNAs were and PCGs screened with Filter A signifcantly upregulated in cancerous specimens and Cancerous Normal co-expressed with 304 protein-coding genes, suggest- sample sample ing a wide spectrum of target PCGs under the modula- tion of these two lncRNAs. More importantly, expression Number of samples A1 B1 where expression is 0 levels of CTB-193M12.5 also showed signifcant nega- Number of samples A2 B2 tive correlation with the prognosis of the patients from where expression is not 0 whom the RNA-seq datasets were derived. Together, our fold_change (A2 * (B1 B2)/B2 * (A1 A2)) > 2 results provide a promising lncRNA candidate for further = + + Wang et al. Cancer Cell Int (2018) 18:27 Page 3 of 16

of the gene is upregulated in the cancerous specimens, and those with low level of correlation. Te former whereas a fold_change of less than 0.5 indicates down- group was then selected for KEGG pathway enrichment regulated expression in cancerous specimens. analysis. Filter B. fold_change > 2 or fold_change < 0.5 and statistically signifcant (p < 0.01, Fisher’s exact test), Kaplan–Meier survival analysis where fold_change values were calculated as fold_ To investigate the impact of the expression levels of two change = non-zero expression in cancerous specimen/ candidate lncRNAs on prognostic survival of patients, non-zero expression in normal specimen. Kaplan–Meier survival analysis was performed using Subsequently, we performed hierarchical cluster analy- GEPIA (http://gepia.cancer-pku.cn/), a web-based sis using the R package heatmaps. Based on the z-scores interactive toolkit for analyzing gene expression profl- derived for the expression levels of selected genes in all ing datasets [18]. We compared the prognostic survival samples, we calculated the Euclidean distances of all gene of patients groups based on the expression level of the pairs, which were then used to detect gene clusters. either lncRNA. Briefy, patients were assigned into either the high expression or low expression group based on the Co‑expression network analysis of the expression expression level of each lncRNA in their specimens, and of lncRNAs and PCGs the prognostic survival was analyzed using the survival For detection of gene co-expression modules, co-expres- analysis feature with default parameters. sion network analysis was performed on both expression To further validate the results, another web-based profles using an R package WGCNA [17]. interactive toolkit, Kaplan–Meier plotter was applied Briefy, following FPKM normalization, the Pearson’s [19]. Kaplan–Meier plotter (http://kmplot.com/analysis/) correlation coefcient (PCC) cor(i, j) was calculated for is a comprehensive online platform that ofers assess- each pair of retained lncRNAs and PCGs from the corre- ment of the efect of 54,675 genes on survival based on sponding expression levels. Next, a similarity co-expres- 10,293 cancer samples. In particular, we focused on the sion matrix was computed as follows: dataset of 2437 lung cancer patients with a mean follow- up of 49 months. We selected all databases related to β aij = 0.5 × 1 + cor i, j NSCLC (GSE1918, GSE29013, GSE30219, GSE31210, GSE3141, GSE37745, GSE50081), which included a total where aij represents connection strength between nodes of 673 patients, to assess the prognostic value of the two i and j. candidate lncRNAs in lung adenocarcinoma carcinoma. Afterwards, the similarity matrix was transformed to LncRNA function predication and target gene enrichment an adjacency matrix (AM) using a power β = 14, based on the scale-free topology criterion described in the analysis WGCNA package documents [17]. Ten, a topological Next, we used RIblast, an RNA–RNA interaction predic- overlap matrix (TOM) was derived from the AM, and tion algorithm package to predict target mRNAs [20]. was in turn converted into a dissimilarity TOM, from Using a seed-and-extension approach, RIblast discov- which a dendrogram was mapped via hierarchical clus- ers seed regions using sufx arrays, and extends these tering. By applying the dynamic tree cutting technique, regions based on an RNA secondary structure energy clusters were obtained from the dendrogram. Te result- model. We used 27,519 mRNA sequences obtained from ing clusters are co-expression modules containing lncR- Te RefGene database (http://varianttools.sourceforge. NAs and PCGs that are considerably interconnected. net/Annotation/RefGene) to establish the RIblast dataset. Te predicted target genes were sorted by sum_energy, Analysis of correlation between co‑expression modules and the top 100 genes were selected for GO enrichment and clinical status analysis. After identifying co-expression modules, we selected the Blue module, a co-expression of fve lncRNAs and 304 Results PCGs, as our evidence suggested that it was the mod- Preprocessing of the datasets ule most positively correlated with lung cancer. Ten, We used RNA-seq datasets (presented as FPKMs) col- through clustering of PCCs, two lncRNAs closely cor- lected from 592 specimens consisting of 533 cancerous related to PCGs were selected from the Blue module. In and 59 normal specimens, including 57 pairs of matched parallel, we also screened for PCGs in the module based cancerous and normal adjacent tissue samples. Moreover, on the strength of their correlation with the fve lncR- the datasets contain expression levels of 14,448 lncRNAs NAs. Te PCGs were classifed into two groups, namely and 19,069 protein-coding genes. Upon obtaining the those with high level of correlation to the fve lncRNAs, expression profles, we performed quantile normalization Wang et al. Cancer Cell Int (2018) 18:27 Page 4 of 16

to standardize the datasets. Afterwards, we selected between normal and cancerous states in lung cancer. lncRNAs and PCGs whose expression levels are greater Moreover, this higher level of specifcity lends lncRNAs than 1 in more than 50% of the 57 matched specimen to be more suitable targets for targeted therapy of lung pairs. After the selection, a total of 679 lncRNAs and adenocarcinoma. 12,040 PCGs were used for further analysis. Analysis of diferential expressions of PCG and lncRNAs Expression analysis of lncRNAs and PCG As described in a previous section, 679 lncRNAs and We compared the expression levels of lncRNA against 12,040 PCGs were retained for diferential expression those of PCGs in both normal (Fig. 1a) and in cancerous analysis, during which fold change of the expression level tissue samples (Fig. 1b). In both types of tissues, expres- of each PCG or lncRNA was calculated as aforemen- sion levels of lncRNAs are much lower than those of tioned. A total of 119 diferentially expressed lncRNAs PCGs, which is consistent with previous reports [21, 22]. and 1934 PCGs were identifed. Table 2 presents an over- Furthermore, we compared global expression diferences view of the numbers of diferentially expressed lncRNAs in the expression of PCGs (Fig. 1c) and lncRNAs (Fig. 1d) and PCGs. Interestingly, while comparable numbers of in both types of specimens. As shown in these fgures, diferentially expressed PCGs showed signifcant up- or both PCG and lncRNA showed signifcant diferences. In down-regulation, the majority of diferentially expressed particular, the expression of PCGs and lncRNAs in can- lncRNAs was downregulated in cancerous specimens as cerous specimens are generally low. In terms of expres- compared with normal ones, suggesting that in lung can- sion levels between normal and cancerous specimens, cer, lncRNAs are more inclined to be downregulated. lncRNAs showed grater variance than PCGs, suggest- Next, we performed clustering analysis on these dif- ing an interesting possibility that lncRNA expression is ferentially expressed lncRNAs and PCGs. As shown in more specifc, whereas PCGs are expressed more stably, the resulting heatmaps, diferentially expressed lncRNAs

a Normal samples b Cancer Samples p=3.45e−20 p=3.46e−20 6 6 e FPKM e FPKM maliz maliz 45 23 Quantiles nor 2345 Quantiles nor

PCG LncRNA PCGLncRNA

c PCG d LncRNA p=5.70e−11 6.5 3.0 e FPK M e FPK M 2.5 6.0 maliz maliz 2.0 5.5 1.5 Quantiles nor Quantiles nor p=1.37e−13

Cancer Normal Cancer Normal Fig. 1 Comparison between global expression levels of PCGs and lncRNAs in normal and lung cancer biopsies. PEG and lncRNA expression levels are shown in a normal and b cancerous samples. Expression levels of c PCGs and d lncRNAs between normal and cancerous samples were also compared Wang et al. Cancer Cell Int (2018) 18:27 Page 5 of 16

Table 2 An overview of diferentially expressed lncRNAs (Fig. 2a) and PCGs (Fig. 2b) consistently distinguished and PCGs normal specimens from cancerous ones. lncRNAs PCGs KEGG enrichment analysis Upregulated in lung cancer 29 899 After identifying diferentially expressed PCGs, we per- Downregulated in lung cancer 90 1035 formed a KEGG pathway enrichment analysis using the Total 119 1934 R package clusterProfler for overview of the biologi- cal signifcance of these genes [23]. As shown in Fig. 3,

a SampleClass b Cancer Normal

3

2

1

0

−1

−2

−3

Fig. 2 Heatmaps showing clustering patterns of diferentially expressed a PCGs and b lncRNAs between normal and lung cancer specimens

ab

Cell cycle Rap1 signaling pathway

Biosynthesis of amino acids Count p.adjust 10 0.04 Carbon metabolism 15 Osteoclast differentiation 0.03 20 ECM−receptor interaction 0.02 0.01 Protein digestion and absorption p.adjust Complement and coagulation cascades 0.03 Glycolysis / Gluconeogenesis Count 0.02 Malaria 15 20 DNA replication 0.01 25 Fructose and mannose metabolism Staphylococcus aureus infection One carbon pool by folate

0.02 0.03 0.04 0.05 0.03 0.04 0.05 GeneRatio GeneRatio Fig. 3 KEGG enrichment analysis of protein-encoding genes that are signifcantly a upregulated and b downregulated in lung cancer samples as compared with normal samples Wang et al. Cancer Cell Int (2018) 18:27 Page 6 of 16

nine KEGG pathways were enriched among signifcantly and identifed co-expression modules. Although the upregulated genes, which encompass a variety of cellular WGCNA approach has been highly automated through processes, including cell cycle, DNA replication, ECM- continued algorithm optimization, several key param- receptor interaction, and several metabolism-related eters still needed to be fne-tuned empirically in order to pathways. On the other hand, fve KEGG pathways were ensure that the co-expression network to be constructed enriched among downregulated genes, including sign- is scale-free [17]. To this end, we fnally determined a β aling cascades such as Rap1 signaling pathway and the value of 14 (Fig. 4a, b). complement and coagulation cascades. Subsequently, the expression matrix was transformed a topological overlap matrix, to which we applied the Identifcation of co‑expression modules average-linkage method for sequence clustering. Next, lncRNAs have been known to regulate gene expression in we employed a dynamic tree cutting procedure to detect a number of ways, assuming roles including decoys, scaf- co-expression clusters (i.e. modules). Again, after opti- folds, guides, and signals [10]. We postulate that for an mization, the minimal number of genes in each cluster lncRNA to regulate the expression of a PCG, their expres- was set at 30 in order to fulfll the criteria of dynamic tree sion profles are expected to exhibit similar patterns. cutting. Afterwards, another round of clustering analysis Terefore, using the R package WGCNA, we mapped a (height = 0.25) was performed, where closely associated weighted co-expression network of lncRNAs and PCGs modules were merged into larger ones.

a Scale independence bdMean connectivity 14 20 1 Module−trait relationships 10111213 15 171819 6 7 8 9 16 0. 8 5 −0.92 0.92 1 turquoise (2e−48) (2e−48) 60 0

.6 4 2 −0.78 0.78 brown (2e−24) (2e−24) 3 0.5 0. 40

00 0.62 −0.62 green (3e−13) (3e−13) .2 2 signed R^ 0 2 −0.4 0.4

SF T, yellow (1e−05) (1e−05) Mean Connectivity 0. 00 20 04 3 0.74 −0.74 −0.5 4 blue (3e−21) (3e−21) 5 6 7 1 0 8 91011121314151617181920 0.55 −0.55 red (3e−10) (3e−10) −0.4 −1 5101520 5101520 Soft Threshold (power) Soft Threshold (power) rmal Tumor No

c Cluster Dendrogram 0.9 0.7 Height 0.5 0.3

Module colors

Fig. 4 An overview of co-expression modules identifed from the lung cancer RNA-seq datasets used in this study. a Scale independence and b mean network connectivity for diferent soft-thresholding powers (β). A soft-thresholding power of 14 was selected to achieve maximal model ft. c Cluster dendrogram of the identifed lncRNA-PEG co-expression modules. Each of the diferential expressed lncRNAs and PCGs is represented by a tree leaf, and each of the six modules by a tree branch. The lower panel shows colors designated for each module. Note that the leaves presented in the color gray indicates unassigned lncRNAs and PCGs. d Module-trait weighted correlations and corresponding p values (in parenthesis) between the identifed modules and clinical status (normal and tumor). The color scale on the right shows module-trait correlation from 1 (green) to 1 − (red), where green represents a perfect negative correlation, and red a perfect positive one Wang et al. Cancer Cell Int (2018) 18:27 Page 7 of 16

In the end, WGCNA analysis identifed six co-expres- 89 lncRNAs was constructed. Notably, we also computed sion modules (Fig. 4c). Table 3 summarizes the distri- and plotted the correlation of each module with the clini- bution of lncRNAs and PCGs among these modules. cal status of the corresponding samples, as a measure the Altogether, a co-expression network of 1303 PCGs and strength of correlation between the lncRNAs and PCGs in that module and lung cancer. As shown in Fig. 4d, the Blue module showed the strongest positive correlation (module-trait weighted correlation = 0.74) with cancer- Table 3 A summary of the six co-expression modules ous specimens, and the turquoise module a negative cor- revealed with WGCNA analysis relation that is close to perfect (module-trait weighted Module color No. lncRNAs No. PCGs correlation = − 0.92). Next, we performed a second KEGG enrichment Blue 5 304 analysis for each co-expression module with clusterPro- Brown 13 141 fler. From the 1303 PCGs in the six modules, a total of Green 1 63 26 KEGG pathways were enriched. As shown in Fig. 5, a Red 1 39 distinct set of pathways were enriched from each mod- Yellow 4 67 ule, with no overlapping, suggesting largely independent Turquoise 45 709 sets of functions exerted by genes in each co-expression

Cell cycle ECM-receptor interaction Amoebiasis Oocyte meiosis p53 signaling pathway Fanconi anemia pathway DNA replication AGE-RAGE signaling pathway in Progesterone-mediated oocyte green diabetic complications maturation

Cellular senescence blue Antifolate resistance Protein digestion and absorption

Base excision repair Biosynthesis of amino acids Focal adhesion One carbon pool by folate

Glycolysis / Gluconeogenesis Mismatch repair Carbon metabolism

Vascular smooth muscle contraction Phagosome red Staphylococcus aureus infection

turquoise

brown Amino sugar and nucleotide sugar metabolism

Complement and coagulation cascades PPAR signaling pathway

Module Osteoclast differentiation KEGG Pathway Fig. 5 KEGG pathway analysis of the six modules Wang et al. Cancer Cell Int (2018) 18:27 Page 8 of 16

module. Furthermore, by comparing with the KEGG Analysis of the expression level of lncRNAs in the Blue pathways enriched from diferentially expressed genes, module we noticed that six out of the nine pathways enriched We analyzed the expression levels of the fve lncRNAs in in upregulated genes were also enriched from the Blue the Blue module between cancerous and normal tissues module. As described above, our correlation analy- (Fig. 7). All fve lncRNAs were signifcantly upregulated sis indicates that this co-expression module shows the in lung cancer samples as compared with normal sam- strongest positive correlation with cancerous specimens. ples (p < 0.001 for all, Mann–Whitney test). In particular, Moreover, among the 14 KEGG pathways enriched from CTD-2510F5.4 and CTB-193M12.5 showed most intense this module, many have been established as key cascades upregulation. closely related to the initiation, growth, and dissemina- tion of lung cancer, including cell cycle and senescence, Construction of an lncRNA‑PCG regulatory network DNA damage repair, and p53 signaling [24–26]. Tese Next, we set out to construct an lncRNA-PCG regula- enriched KEGG pathways suggest high relevance of the tory network of the fve lncRNAs in the Blue module and genes in the Blue module to lung cancer. the 178 highly correlated PCGs selected in the previous section. Protein–protein interaction data were retrieved Analysis of the correlation between lncRNAs and PCGs from Human Integrated Protein–Protein Interaction in the Blue module rEference database and visualized with Cytoscape. A Due to its strong positive correlation with cancerous regulatory network with 683 connections and 182 nodes specimens, as well as the versatile enriched KEGG path- was constructed. As shown in Fig. 8, the majority of ways of signifcance in lung cancer, we looked further into the connections concentrated on a few nodes, suggest- the Blue module. For all fve lncRNAs and the 304 PCGs ing signifcant roles of the corresponding lncRNAs and in the module, we extracted the expression level PCC for PCGs. Notably, two lncRNAs, CTD-2510F5.4 and CTB- each lncRNA-PCG pair, and performed cluster analysis 193M12.5, had 172 and 81 connections, respectively based on these PCCs. As shown in Fig. 6, two lncRNAs, (Table 5). Tese connections constituted approx. 25 and namely CTD-2510F5.4 and CTB-193M12.5, showed the 12% of all connections in the regulatory network, which strongest overall co-expression with the PCGs, suggest- pointed to the centrality of these two ‘hub’ lncRNAs. ing central roles for these lncRNAs in this co-expression To highlight the highly connected genes, we selected module. only nodes with more than 15 connections. Tese nodes Next, we examined the annotated functions of 304 corresponded to three lncRNAs and ten PCGs (Table 5). PCGs in the module. Based on overall strength with the From the ten PCGs, six KEGG pathways were enriched fve lncRNAs, we selected 178 PCGs (PCC > 0.6) and (Fig. 9a), which encompass essential aspects of cancer performed KEGG pathway enrichment analysis (Table 4). biology, including cell cycle and senescence, DNA repli- Notably, out of the 15 KEGG pathways enriched, a pre- cation, and viral carcinogenesis [24, 27, 28]. dominant majority (13 pathways) overlapped with those A closer analysis of the correlation between the lncR- enriched from all PCGs in the module, suggesting that NAs and ten PCGs revealed that all ten PCGs were these 178 PCG are representative of the major functions signifcantly correlated to CTD-2510F5.4, nine to CTB- of the protein-coding genes in the Blue module. 193M12.5, and one to RP11-467L13.7 (Fig. 9b). Te

CTD−2510F5.4 0.8

CTB−193M12.5 0.6

SNHG1 0.4

RP11−467L13.7 0.2 MAFG−AS1 RU ALDO ILF2 BRIX1 INTS1 3 CKS1B CCT 3 HDGF SHMT2 KIF22 SFXN1 PGK1 SUV39H2 DUS4L F HSPD1 ZC3HA DHFR NUP155 CDCA7 DDX11 PFKP E2F1 CCNF CENP W FA KIF3C GAPDH TMSB10 TRIP13 CCT 5 TPI1 PKM H2AFX APOBEC3B TUBG1 UBE2C RA BOP 1 TBRG4 RPL39L AIMP2 TK 1 FIGNL 1 CENP L RECQL BIRC5 CDC20 CHAF1B ASF1B CDKN3 CENP M ENO1 AT AD3A POC1A DTYM K JPT1 PCLAF DKC1 PTTG MRGBP ASNS UBE2T P FA P PAT DBF 4 RFC5 AT AD2 CKAP 2 ZWILC CDCA5 PUS7 T BUB1 CCNA2 KIF11 FBXO5 CDCA4 RFWD 3 CENP N NE TO TUBA1C RA MKI67 KIF20B INCENP MTHFD2 F ECT2 PR TIMELES PRC1 NCAPD3 TA CCNB2 NUSAP1 KIF2C CDT 1 KIF20A MIS18A CCNB1 A A CEP5 5 CENP O KPNA2 RRM 2 MYBL 2 SP RBL 1 LMNB1 MCM7 EZH SPDL 1 MCM8 DSN1 CHTF1 8 GMNN NDC1 PRR11 CHAF1A WDR76 DT L FO TPX2 NCAP H ZWINT MCM6 NCAPG2 RAD51AP MY CENP U DONSON CENP H KIFC1 FEN1 SASS6 MCM4 NDE 1 TPBG TRIB3 PA RNASEH2A KNOP1 SLC5A6 FBXL19 CHEK2 CDC7 CDCA8 MSH2 GPI BARD1 PPIF UBE2S TYMS PSRC1 C19orf48 MCM2 XPO5 WA TCF1 9 DSCC1 RMI2 RFC4 LMNB2 ANL N MTHFD1L NCAPD2 HMGA1 BORA KNSTRN F PSA PMAIP NL N MRPL15 GPSM 2 POL B NECTIN1 DHRS13 STMN 1 PCNA MARCKSL 1 FO MEST FBXO32 EIF4EBP1 SBK1 PERP EY CKAP 5 CDK1 SKP2 SSX2IP SO PRKDC KIF2A E2F3 B3GALNT UCK2 AD TMEM132A CDH24 DBN1 SM OX RCC2 TXNRD1 UCHL 1 MAP1B MANEAL SCAMP5 ARNTL2 SLC16A1 PPM1H HO CP OX STMN 3 BCL 9 NSUN2 CEP7 2 SLC6A8 LRP8 CIB2 NIPSNAP1 NPM 3 TMEM177 SO EL CAD EPOP TWN T D SIX4 ZBTB12 LRFN 4 S PAT CABLES HYLS1 BRI3BP SELENOI SLFN13 PA PFN2 RMI1 AT AD3B CEP8 5 HL P TNNT DSG2 EPRS MOCOS DSC2 HMGB3 SLC7A5 MMP12 IDH2 IRS1 SLC2A1 DLG5 TWF 1 ENTPD7 DCBLD2 EPHB2 ETNK2 HOMER 3 CCDC34 MD K GGH PODXL 2 PRR36 ISG15 CDCP1 FSCN1 SLC39A4 MR DNAAF5 TMEM79 GARS ZP 3 SLC52A2 PYCR3 FLAD1 VPS37D HMBS DENND5B FA ANP32E NET1 DSP NUP210 TNFRSF18 CNF N TDRKH T EFNA4 L NAXE NUDT 8 FO PFDN2 CRABP 2 AICS ARP 1 YSMD1 ANCG ANCI AAP24 AF1A ARS 2 OP2A URKA URKB ARS 2 AG QR4 F CC3 TFDC1 M83D M136A M60A XM 1 XRED2 X4 X12 XA1 OV A2 TF SC 1 C3 CGAP1 XB7 AM19 O19 OH6 SF 1 AH1B3 T1 2 S2 K 5 1 L6 2 1 A 1 V1L H 4 2 S 1 2 Fig. 6 Pearson’s correlation coefcients (PCCs) of expression levels of each lncRNA-PEG pair within the Blue module were computed and clustered. In the resulting heatmap, the PCCs of each of the fve lncRNAs with the 304 PCGs were plotted in a separate row. The color scale on the right shows PCC value from 0 (blue) to 1 (red), where blue represents no correlation, and red a perfect positive one. Euclidean distances were used for the clustering analysis Wang et al. Cancer Cell Int (2018) 18:27 Page 9 of 16

Table 4 KEGG pathway enrichment analysis of PCGs with a high overall correlation with the fve lncRNAs in the Blue co- expression module Description GeneRatio p value q value Count

Cell cycle 15/79 2.71E 12 2.56E 10 15 − − DNA replication 8/79 3.18E 09 0.000000151 8 − Biosynthesis of amino acids 9/79 7.29E 08 0.0000023 9 − Carbon metabolism 10/79 0.000000378 0.00000895 10 Glycolysis/Gluconeogenesis 8/79 0.000000524 0.00000992 8 One carbon pool by folate 5/79 0.00000181 0.0000286 5 Fanconi anemia pathway 5/79 0.000302021 0.003766033 5 Cellular senescence 8/79 0.000318021 0.003766033 8 Oocyte meiosis 7/79 0.000369272 0.003887073 7 Progesterone-mediated oocyte maturation 6/79 0.000679229 0.0064348 6 p53 signaling pathway 5/79 0.000810879 0.006983646 5 Mismatch repair 3/79 0.00186508 0.014724313 3 Pentose phosphate pathway 3/79 0.004048799 0.029505417 3 Antifolate resistance 3/79 0.004447883 0.030098454 3 Fructose and mannose metabolism 3/79 0.005315566 0.033571995 3

30 p<0.001

p<0.001 25

p<0.001 20 15 p<0.001

p<0.001 10 Gene Expression 05

r mal mo mal mal mor mal mal umor umor Nor Tu Nor T Nor Tu Nor T Nor Tumor CTB−193M12.5 RP11−467L13.7 CTD−2510F5.4CTD−2510F5.4 MAFG−AS1 SNHG1 Fig. 7 All fve lncRNAs in the Blue module showed signifcant upregulation in lung cancer samples (p < 0.001 for all, Mann–Whitney test)

strong correlation of CTD-2510F5.4 and CTB-193M12.5 dysregulated expression in and close correlation with the to these genes suggested strongly roles of these two lncR- disease. To assess the clinical relevance of these lncRNAs, NAs in regulating the expression of these PCGs, which we performed prognostic survival analysis to examine in turn, modulate the cancer initiation and development whether the expression levels of these lncRNAs signif- through a host of cellular processes, such as cell cycle and cantly correlate to the survival of patients who provided death, and DNA replication. the specimens. As shown in Fig. 10, high expression of both lncRNAs were signifcantly negatively correlated Prognostic analysis of CTD‑2510F5.4 and CTB‑193M12.5 with patient overall survival (OS; Logrank p = 0.0013 for expression levels and patient survival CTD-2510F5.4; Logrank p = 0.0053 for CTB-193M12.5), Evidence so far suggest high relevance of CTD- suggesting potentials of both lncRNAs as prognostic 2510F5.4 and CTB-193M12.5 in lung cancer, including indicators. Wang et al. Cancer Cell Int (2018) 18:27 Page 10 of 16

TPBG

CDKN3

PPAT CENPU APOBEC3B ZWILCH SLC5A6 MTHFD2 CDCA5 CENPO TRIB3 INTS13 CENPL RMI2NCAPG2

UBE2C CENPW CCNF CDC7 WASF1CHAF1A CENPH CKAP2 TYMS SASS6 RFC4 FBXL19ZC3HAV1LDBF4 MYO19 MYBL2 ATAD2 PAQR4 ANLN HDGF CENPM DKC1PTTG1 FAM136A NUSAP1 MIS18A CENPN CDCA4 POC1CHTF18A SPDL1 KPNA2 CDCA8 SFXN1 RRM2 INCENP MRGBPDDX11 XPO5 NCAPH RBL1 TCF1PSAT19 DUS4L CHAF1B E2F1 BOP1 MCM7 CTB-193M12.5 CDCA7 FAM83D UBE2T MCM8 RECQL4 CDT1 RFC5 KIF2RNASEH2A2 HMGA1 KNOP1 RP11-467L13.CCNB7 1 NDE1 FIGNL1 BRIX1 AURKB MCM4 MSH2 TOP2A ASF1BPCLAF CCNA2 PRR11 DSN1 PGK1 CTD-2510F5.4 SHMT2 DSCC1 CHEK2 RAC3NCAPD3 KIF2C PMAIP1 PRC1 ILF2 BORA TPI1 SNHG1 DONSON MAFG-AS1 ASNGMNS N CDC20 BUB1 H2AFX ENO1 MCM6 BIRC5 TIMELESS PAICS CKS1B FOXM1 KIF20A SUV39H2 CCT3 TBRG4 ALDOA MCM2 EZH2 CEP55 LMNB1C19orf48 PSRC1 RAD51AP1 GAPDH TPX2 KNSTRN AURKA NDC1 FEN1 CCNB2 DTL GPI TRIP13 RUSC1 LMNB2 PPIF HSPD1 TUBA1CTACC3 PRTFDC1 NETO2TUBG1PUS7 TK1 ECT2 NUP155 MKI67 FBXO5 CCT5 BARD1 ATAD3APKM ZWINT DTYMK NCAPD2 JPT1 RACGAP1 RFWD3 RPL39L FANCI NLN KIF11 UBE2S KIFC1 WDR76 GPSM2

TMSB10 DHFR

PFKPMRPL15 MTHFD1L SPAG5 FANCG KIF20B lncRNA FAAP24 KIF3C Gene Fig. 8 An interaction network constructed with the fve lncRNAs and 178 highly correlated PCGs (Pearson’s correlation coefcient > 0.6) in the Blue module

To further validate the prognostic value of CTD- CTB‑193M12.5 target prediction and function analysis 2510F5.4 and CTB-193M12.5 in lung cancer, an inde- Target genes of CTB-193M12.5 were predicted with pendent dataset consisting of 673 lung adenocarcinoma RIblast, an RNA–RNA interaction prediction algorithm patients from seven GEO datasets was subjected to [20]. Based on the levels of intramolecular and intermo- Kaplan–Meier survival analysis. As shown in Fig. 11, lecular free energy between lncRNA-mRNA sequence, the two lncRNAs showed opposite direction of corre- a list of target PCGs were generated (Additional fle 1: lation of OS. Of note, in this analysis, CTD-2510F5.4 Table S1). After sorting by sum_energy, the top 100 genes expression showed positive correlation with prog- were subjected to GO enrichment analysis. No molecular nosis (Logrank p = 0.00087), which was inconsistent function pathway was signifcantly enriched. Biological with our results. However, CTB-193M12.5 expression process pathway and cellular component pathway terms level was negatively correlated to patient OS (Logrank (false discovery rate < 0.05, gene number > 20) were p = 3e−07), which was in accordance with our analysis sorted by signifcance, and the top ten enriched terms of the TCGA profles. were retained (Fig. 12). Wang et al. Cancer Cell Int (2018) 18:27 Page 11 of 16

Table 5 Nodes with more than 15 connections in the net- the genome. More than 3000 lncRNAs have been iden- work shown in Fig. 8 tifed so far; however, functions and biological roles for Node Degree Type only 1% of them have been proposed, much fewer char- acterized [10]. Insights into the function of the few char- CTD-2510F5.4 172 lncRNA acterized lncRNAs suggest a surprising diverse variety of CTB-193M12.5 81 LncRNA cellular processes, from chromatin modifcation, tran- PCLAF 33 PCG scription, splicing, and translation to cellular diferentia- MCM2 28 PCG tion, cell cycle regulation, and stem cells reprogramming CDC20 24 PCG [10]. RP11-467L13.7 24 LncRNA Recent emergence and maturation of the RNA AURKA 21 PCG sequencing technology has greatly facilitated identifying MCM7 18 PCG lncRNAs associated with various diseases. Traditional AURKB 16 PCG hybridization-based approaches such as DNA microar- CCNA2 16 PCG ray sufer from several limitations, including reliance on CCNB1 16 PCG sequenced genomes, high background levels, and a rela- CDT1 16 PCG tively narrow dynamic range. More importantly, compar- ENO1 16 PCG ison of expression profles across diferent experiments is often difcult and requires complex data processing. In contrast, RNA-Seq enjoys a number of advantages, Te most signifcantly enriched term was ‘cellular mac- including very low background signal and large dynamic romolecule metabolic process’, also with the greatest gene range of detection. Furthermore, RNA-seq enables high- number. Tis term refers to chemical reactions and path- throughput sequencing of transcriptomes at single-base ways involving macromolecules, including essential meta- resolution, whose quantifcation across experiments bolic processes of DNA and glycoprotein. It is known that can also be performed with simple normalization algo- an important hallmark of cancer cells is a profound change rithms. Together, these factors have made RNA-seq in metabolism. Most tumor cells are characterized by higher an ideal choice for screening for lncRNAs with clinical rates of glycolysis, lactate production, and biosynthesis of signifcance. lipids and other macromolecules [29]. Tese results hint Consequently, databases of publicly available RNA-seq at possible roles of CTB-193M12.5 in regulating lncRNAs profles have been constructed and showing continu- implicated in DNA and/or glycoprotein metabolism. ous growth, although many of the datasets remain to be mined with comprehensive bioinformatics tools in order Discussion to reveal identifes of potential key master regulators that Recent investigations have provided good evidence that could provide hints for validation and clinical applica- opens avenues to the largely unknown roles of lncRNAs, tion. In this study, we used transcriptome datasets col- which are estimated to make up for approximately 85% of lected with RNA-seq to screen for potential lncRNAs

ab MCM2 Cell cycle

p.adjust

0.04 Oocyte meiosis CDC20 PCLAF 0.03

0.02

Progesterone−mediated oocyte maturation 0.01 MCM7 AURKA CTD-2510F5.4

Viral carcinogenesis Count 2 3 CCNA2 CTB-193M12.5 CDT1 Cellular senescence 4 5 RP11-467L13.7

ENO1 DNA replication CCNB1 AURKB 0.30.4 0.50.6 0.7 GeneRatio Fig. 9 a KEGG pathways enriched from the ten PCGs selected from the lncRNA-PEG interaction network. b An interaction network between these ten PCGs and all fve lncRNAs in the Blue module Wang et al. Cancer Cell Int (2018) 18:27 Page 12 of 16

a Overall Survival b Overall Survival

1. 0 Low CTD−2510F5.4 TPM

1. 0 Low CTB−193M12.5 TPM High CTD−2510F5.4 TPM High CTB−193M12.5 TPM Logrank p=0.0013 Logrank p=0.0053 HR(high)=1.6 .8 HR(high)=1.5

p(HR)=0.0014 0. 8 p(HR)=0.0056 n(high)=239 n(high)=239 n(low)=239 n(low)=239 al al .6 iv 0. 60 iv 0. 4 0. 40 rcent su rv rcent su rv Pe Pe .2 .2 0. 00 0. 00

050100 150200 250 050100 150200 250 Months Months Fig. 10 Kaplan–Meier analysis of lung adenocarcinoma-specifc overall survival of patient (TCGA Datasets) with tumors expressing diferent levels of a CTD-2510F5.4 and b CTB-193M12.5. Dotted line indicates 95% confdence interval

Fig. 11 Kaplan–Meier analysis of lung adenocarcinoma-specifc overall survival of 673 patients with tumors expressing diferent levels of a CTD- 2510F5.4 and b CTB-193M12.5

markers associated with lung cancer. Te expression cancerous specimens), 679 lncRNAs and 12,040 PCGs profles were analyzed with a series of analytical tools. were selected for diferential expression analysis, and As a frst step, lncRNAs and protein-coding genes that 119 lncRNAs and 1934 PCGs were found to be difer- showed signifcant up- or down-regulation were iden- entially expressed. Te large number of diferentially tifed (Fig. 1). From 592 specimens (59 normal and 533 expressed lncRNAs is consistent with the versatile roles Wang et al. Cancer Cell Int (2018) 18:27 Page 13 of 16

a transcription, DNA−templated

nucleic acid−templated transcription

RNA biosynthetic process count 30 35

s protein modification process 40 45 cellular protein modification process

-log10(FDR) macromolecule biosynthetic process 2.5

biological proces 2.4 cellular macromolecule biosynthetic process 2.3

2.2 system development

cellular macromolecule metabolic process

macromolecule metabolic process

0.0060.007 0.008 0.009 0.010 Gene Ratio b nucleoplasm

non−membrane−bounded organelle

intracellular non−membrane−bounded organelle -log10(FDR)

2.4 nuclear lumen 2.2

organelle lumen 2.0

membrane−enclosed lumen count 30 cellular component intracellular organelle lumen 40 50 membrane−bounded organelle 60

intracellular organelle

organelle

0.0050.006 0.007 Gene Ratio Fig. 12 Dot plot of ten most signifcant pathway enriched from the predicted targets of CTB-193M12.5 Wang et al. Cancer Cell Int (2018) 18:27 Page 14 of 16

and regulatory mechanisms of lncRNAs unveiled thus novel onco-lncRNAs of clinical relevance to lung cancer, far, and suggests a vast unchartered territory of the roles although further research is warranted for validation. of these biomolecules in lung cancer biology [10, 11, 30]. As for the 304 PCGs, KEGG pathway analysis showed Te next step was to detect similar patterns of expres- that they were enriched in processes closely related to sion among these diferentially expressed lncRNAs and lung cancer biology, such as p53 signaling, cellular senes- PCGs. Tere were two purposes to this analysis, namely cence, DNA replication, and metabolism [24, 25]. Tese to identify lncRNAs and PCGs that may function in enriched pathways may be used as a basis for gaining pathways in the same cellular processes, and to iden- deeper insights into the fve lncRNAs. tify lncRNAs (hubs) that potentially play central roles Following detection of co-expression networks, we in modulating the expression of targets within the co- chose the Blue module due to its strong correlation with expression module [17, 31]. lung cancer, and determined the hub genes in this mod- Unlike sncRNAs, lncRNAs are poorly conservative and ule. To suppress background noise, 178 PCGs with strong highly versatile in modulating biomolecules. A plethora over correlation with the fve lncRNAs (PCC > 0.6) were of mechanisms by which lncRNAs regulate gene expres- selected and subjected to regulatory network analysis. Two sion have been reported [10]. Due to their large size and lncRNAs, namely CTD-2510F5.4 and CTB-193M12.5, therefore the ability to adopt complex conformations, were identifed as hubs of the resulting network. In addi- lncRNAs can bind to DNAs, RNAs, and . Tese tion, both lncRNAs also showed the strongest overall cor- interactions, in turn, enable lncRNAs to act as activa- relation with all 304 PCGs in the Blue module, further tors, blockers, and scafolds of their interacting partners, supporting their centrality in this co-expression module. including DNA, mRNAs, miRNAs, transcription factors, Moreover, survival analysis showed signifcant correlation and chromatin regulators [11]. At the transcriptional between expression of either lncRNA and poor prognos- level, transcription of lncRNA upstream of a target can tic overall survival, suggesting CTD-2510F5.4 and CTB- facilitate or impede that of the latter through modulating 193M12.5 as potential prognostic indicators. DNA conformation, RNA Pol III activity, or the associa- Currently very little is known about either lncRNA. As tion of transcription factors and promoters. In addition, a result, neither has an ofcial Nomen- lncRNAs also regulate alternative splicing, or serve as clature Committee symbol. CTD-2510F5.4 (GenBank mRNA stabilizers and a sncRNA repertoire. Further- accession AC099850.7) is the transcript of the gene more, lncRNAs can modulate genome activity through ENSG00000265415, which is located to afecting histone modifcation, DNA methylation, and 17 (chromosome 17: 59,065,973–59,264,225). CTD- chromatin structure [10, 11, 32]. 2510F5.4 has been reported to show consistent increased Of the 119 lncRNAs and 1934 PCGs that showed dif- expression in relation to p53 mutations in lung adenocar- ferential expression between normal and cancerous spec- cinomas [33]. Moreover, CTD-2510F5.4 was also found imens, six co-expression modules were detected with to be diferentially expressed in another study that used weighted co-expression network analysis. Among these RNA-seq data from TCGA and two independent experi- modules, the Blue module showed the strongest positive ments of more than 60 lung adenocarcinoma specimens, correlation with lung cancer (Fig. 4d). Te fve lncRNAs which supports the validity of our results. in this module, despite brief mentioning as part of signif- Functions of CTD-2510F5.4 remain to be char- cantly regulated genes in a handful of previous reports acterized. Proline rich 11, a gene neighboring [33–38], remain almost entirely uncharacterized. Inter- ENSG00000265415, was recently suggested as a weak estingly, all fve lncRNAs showed upregulation in lung prognostic factor in non-mucinous invasive lung adeno- cancer specimens, suggesting potential tumor-promoting carcinoma [43], suggesting a possible mechanism by roles. which elevated CTD-2510F5.4 expression contributes to Similar to protein-coding genes, lncRNAs can be clas- poor prognosis. As suggested by KEGG pathway analy- sifed into two major groups, tumor suppressor lncRNAs sis, CTD-2510F5.4 may also be implicated in key lung and onco-lncRNAs [39]. Several lncRNAs have been pro- cancer-related cellular processes such as senescence. posed as oncogenic in lung cancer, including MALAT1 Upon induction of cellular senescence with overexpres- (a diagnostic and prognostic biomarker in NSCLC) [40], sion of oncogene B-RAF, CTD-2510F5.4 was shown to be AK126698 (mediates cisplatin resistance in NSCLC) downregulated as compared with control cells [34]. Since [41], and lncRNA-DQ786227 (implicated in chemical oncogene-induced senescence (OIS) is an important carcinogenesis) [42]. All three onco-lncRNAs showed defense mechanism against lung cancer initiation [44], a upregulation in lung cancer, similar to the fve lncRNAs hypothesis could be proposed, in which aberrant overex- in the Blue module. Conceivably, these lncRNAs may be pression of CTD-2510F5.4 contributes to survival of cells overexpressing the tumor-promoting B-RAF despite OIS, Wang et al. Cancer Cell Int (2018) 18:27 Page 15 of 16

and thereby exert oncogenic functions. More research Abbreviations TCGA: The Cancer Genome Atlas; lncRNA: long non-coding RNA; RNA-Seq: is, obviously, needed for validation of this hypothetical RNA sequencing; NGS: next-generation sequencing; NSCLC: non-small cell mechanism. lung cancer; PCG: protein-coding genes; WGCNA: weighted gene co-expres- Te other hub lncRNA, CTB-193M12.5 (GenBank sion network analysis; FPKM: Fragments Per Kilobase Million; PCC: Pearson’s correlation coefcient; KEGG: Kyoto Encyclopedia of Genes and Genomes. accession AC026401.7), is the product of the gene ENSG00000280206, which is located to chromosome Authors’ contributions 16 (chromosome 16: 15,570,622–15,708,653). CTB- XW and CG conceived and designed the experiments; XW analyzed the data; QL, GL and JX wrote the paper. All authors read and approved the fnal 193M12.5 was found to be upregulated in lung squamous manuscript. cell carcinomas in a recent report analyzing RNA-seq profles [37], which is consistent with our fnding of the Acknowledgements overexpression of this lncRNA in lung cancer specimens. We thank all the doctors in our department for their help in patients care so In addition, expression of this lncRNA was reported to that we can have enough time to conduct this study. be dramatically increased in gastric cancer tissues [37] Competing interests and in triple negative breast cancer cell lines and pri- The authors declare that they have no competing interests. mary tumors (Cancer RNA-seq Nexus database, analysis title GSE58135) [45]. We also tried to gain insights into Availability of data and materials The data of this manuscript can be download from The Cancer Genome Atlas the potential functions of CTB-193M12.5 by predict- database (https://portal.gdc.cancer.gov/). ing its target PCGs and enriched pathways. Te most signifcantly enriched term suggests the roles of CTB- Content of publication Not applicable. 193M12.5 in DNA and/or glycoprotein metabolism, both are known to be crucial in cancer progression [29]. Ethics approval and consent to participate In summary, starting from TCGA gene transcript pro- Not applicable. fles collected from 592 lung cancer specimens, through Funding integrated bioinformatics analyses, we identifed two Not applicable. largely unknown lncRNAs CTD-2510F5.4 and CTB- 193M12.5. Expression levels of both lncRNAs were Publisher’s Note signifcantly increased in lung cancer specimens, and Springer Nature remains neutral with regard to jurisdictional claims in pub- lished maps and institutional afliations. showed strong correlation with those of more than 300 diferentially expressed protein-coding genes. Moreover, Received: 12 December 2017 Accepted: 23 January 2018 further analysis placed these lncRNAs in the center of the regulatory network consisting of the lncRNAs and PCGs in a co-expression module that showed the strongest pos- itive correlation with lung cancer. Most importantly, high References expression of CTD-2510F5.4 and CTB-193M12.5 signif- 1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2017. CA Cancer J Clin. cantly correlated to poor overall prognostic patient sur- 2017;67:7–30. 2. Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, Jemal A, Yu XQ, He J. vival, and the prognostic value of the latter was further Cancer statistics in China, 2015. CA Cancer J Clin. 2016;66:115–32. supported by an independent validation. 3. Yoo BC, Kim KH, Woo SM, Myung JK. Clinical multi-omics strategies for the Altogether, these results provide evidence that, for the efective cancer management. J Proteom. 2017. https://doi.org/10.1016/j. jprot.2017.08.010. frst time, correlate CTB-193M12.5 with prognosis of 4. Shukla HD. Comprehensive analysis of cancer-proteogenome to identify lung cancer patients, and thereby can be used as the basis biomarkers for the early diagnosis and prognosis of cancer. Proteomes. for further investigation towards elucidating its biological 2017;5(4):28. 5. Oshlack A, Robinson MD, Young MD. From RNA-seq reads to diferential signifcance and clinical applications. expression results. Genome Biol. 2010;11:220. 6. Cancer Genome Atlas Research N. Comprehensive genomic characteriza- Conclusions tion of squamous cell lung cancers. Nature. 2012;489:519–25. 7. Cancer Genome Atlas Research N. Comprehensive molecular profling of Trough mining existing TCGA datasets for novel fac- lung adenocarcinoma. Nature. 2014;511:543–50. tors, this study identifed and validated a largely unknown 8. Han SS, Kim WJ, Hong Y, Hong SH, Lee SJ, Ryu DR, Lee W, Cho YH, Lee S, lncRNA CTB-193M12.5 as a promising prognostic factor Ryu YJ, et al. RNA sequencing identifes novel markers of non-small cell lung cancer. Lung Cancer. 2014;84:229–35. for lung adenocarcinoma. 9. Wang L, Zhan C, Zhang Y, Ma J, Xi J, Jiang W, Shi Y, Wang Q. Quantifying the expression of tumor marker genes in lung squamous cell cancer with Additional fle RNA sequencing. J Thorac Dis. 2014;6:1380–7. 10. Yang G, Lu X, Yuan L. LncRNA: a link between RNA and cancer. Biochim Additional fle 1: Table S1. CTB-193M12.5 target gene predicted by Biophys Acta. 2014;1839:1097–109. RIblast. 11. Xie W, Yuan S, Sun Z, Li Y. Long noncoding and circular RNAs in lung cancer: advances and perspectives. Epigenomics. 2016;8:1275–87. Wang et al. Cancer Cell Int (2018) 18:27 Page 16 of 16

12. Tian F, Zhao J, Fan X, Kang Z. Weighted gene co-expression network 30. Wu T, Yin X, Zhou Y, Wang Z, Shen S, Qiu Y, Sun R, Zhao Z. Roles of non- analysis in identifcation of metastasis-related genes of lung squamous coding RNAs in metastasis of nonsmall cell lung cancer: A mini review. J cell carcinoma based on the Cancer Genome Atlas database. J Thorac Dis. Cancer Res Ther. 2015;11(Suppl 1):C7–10. 2017;9:42–53. 31. Yang Y, Han L, Yuan Y, Li J, Hei N, Liang H. Gene co-expression network 13. Giulietti M, Occhipinti G, Principato G, Piva F. Identifcation of candidate analysis reveals common system-level properties of prognostic genes miRNA biomarkers for pancreatic ductal adenocarcinoma by weighted across cancer types. Nat Commun. 2014;5:3231. gene co-expression network analysis. Cell Oncol (Dordr). 2017;40:181–92. 32. Bohmdorfer G, Wierzbicki AT. Control of chromatin structure by long 14. DiLeo MV, Strahan GD, den Bakker M, Hoekenga OA. Weighted correlation noncoding RNA. Trends Cell Biol. 2015;25:623–32. network analysis (WGCNA) applied to the tomato fruit metabolome. PLoS 33. Ashouri A, Sayin VI. Pan-cancer transcriptomic analysis associates long ONE. 2011;6:e26683. non-coding RNAs with key mutational driver events. 2016;7:13197. 15. Tang RX, Chen WJ, He RQ, Zeng JH, Liang L, Li SK, Ma J, Luo DZ, Chen G. 34. Montes M, Nielsen MM, Maglieri G, Jacobsen A, Hojfeldt J, Agrawal-Singh Identifcation of a RNA-Seq based prognostic signature with fve lncRNAs S, Hansen K, Helin K. The lncRNA MIR31HG regulates p16(INK4A) expres- for lung squamous cell carcinoma. Oncotarget. 2017;8:50761–73. sion to modulate senescence. Nat Commun. 2015;6:6967. 16. Kaczkowski B, Tanaka Y, Kawaji H, Sandelin A, Andersson R, Itoh M, Lass- 35. Wang L, He Y, Liu W, Bai S, Xiao L, Zhang J, Dhanasekaran SM, Wang Z, mann T, Hayashizaki Y, Carninci P, Forrest AR, Consortium F. Transcriptome Kalyana-Sundaram S, Balbin OA, et al. Non-coding RNA LINC00857 is analysis of recurrently deregulated genes across multiple cancers identi- predictive of poor patient survival and promotes tumor progression via fes new pan-cancer biomarkers. Cancer Res. 2016;76:216–26. cell cycle regulation in lung cancer. Oncotarget. 2016;7:11487–99. 17. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation 36. Chen WM, Huang MD, Sun DP, Kong R, Xu TP, Xia R, Zhang EB, Shu YQ. network analysis. BMC Bioinformatics. 2008;9:559. Long intergenic non-coding RNA 00152 promotes tumor cell cycle 18. Tang Z, Li C, Kang B, Gao G, Li C, Zhang Z. GEPIA: a web server for cancer progression by binding to EZH2 and repressing p15 and p21 in gastric and normal gene expression profling and interactive analyses. Nucleic cancer. Oncotarget. 2016;7:9773–87. Acids Res. 2017. https://doi.org/10.1093/nar/gkx247. 37. Chen WJ, Tang RX, He RQ, Li DY, Liang L, Zeng JH, Hu XH, Ma J, Li SK, Chen 19. Gyorfy B, Surowiak P, Budczies J, Lanczky A. Online survival analysis soft- G. Clinical roles of the aberrantly expressed lncRNAs in lung squamous ware to assess the prognostic value of biomarkers using transcriptomic cell carcinoma: a study based on RNA-sequencing and microarray data data in non-small-cell lung cancer. PLoS ONE. 2013;8:e82241. mining. Oncotarget. 2017;8:61282–304. 20. Fukunaga T, Hamada M. RIblast: an ultrafast RNA-RNA interaction predic- 38. Li J, Casteels T, Frogne T, Ingvorsen C, Honore C, Courtney M, Huber tion system based on a seed-and-extension approach. Bioinformatics. KV, Schmitner N, Kimmel RA, Romanov RA, et al. Artemisinins tar- 2017;33:2666–74. get GABAA receptor signaling and impair alpha cell identity. Cell. 21. Popadin K, Gutierrez-Arcelus M, Dermitzakis ET, Antonarakis SE. Genetic 2017;168(86–100):e115. and epigenetic regulation of human lincRNA gene expression. Am J Hum 39. Chen J, Wang R, Zhang K, Chen LB. Long non-coding RNAs in non-small Genet. 2013;93:1015–26. cell lung cancer as biomarkers and therapeutic targets. J Cell Mol Med. 22. Cabili MN, Trapnell C, Gof L, Koziol M, Tazon-Vega B, Regev A, Rinn 2014;18:2425–36. JL. Integrative annotation of human large intergenic noncoding 40. Weber DG, Johnen G, Casjens S, Bryk O, Pesch B, Jockel KH, Kollmeier J, RNAs reveals global properties and specifc subclasses. Genes Dev. Bruning T. Evaluation of long noncoding RNA MALAT1 as a candidate 2011;25:1915–27. blood-based biomarker for the diagnosis of non-small cell lung cancer. 23. Yu G, Wang LG, Han Y, He QY. clusterProfler: an R package for comparing BMC Res Notes. 2013;6:518. biological themes among gene clusters. OMICS. 2012;16:284–7. 41. Yang Y, Li H, Hou S, Hu B, Liu J, Wang J. The noncoding RNA expression 24. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. profle and the efect of lncRNA AK126698 on cisplatin resistance in non- 2011;144:646–74. small-cell lung cancer cell. PLoS ONE. 2013;8:e65309. 25. Robles AI, Linke SP, Harris CC. The p53 network in lung carcinogenesis. 42. Gao L, Mai A, Li X, Lai Y, Zheng J, Yang Q, Wu J, Nan A, Ye S, Jiang Y. Oncogene. 2002;21:6898–907. LncRNA-DQ786227-mediated cell malignant transformation induced by 26. Dufy MJ, Synnott NC, McGowan PM, Crown J, O’Connor D, Gallagher benzo(a)pyrene. Toxicol Lett. 2013;223:205–10. WM. p53 as a target for the treatment of cancer. Cancer Treat Rev. 43. Sakai Y, Ohbayashi C, Yanagita E, Jimbo N, Kajimoto K, Sakuma T, Hirose 2014;40:1153–60. T, Yoshimura M, Maniwa Y, Itoh T. PRR11 immunoreactivity is a weak 27. Syrjanen KJ. HPV infections and lung cancer. J Clin Pathol. prognostic factor in non-mucinous invasive adenocarcinoma of the lung. 2002;55:885–91. Pathologica. 2017;109:133–9. 28. de Freitas AC, Gurgel AP, de Lima EG, Sao Marcos BD, do Amaral CM. 44. Reddy JP, Li Y. Oncogene-induced senescence and its role in tumor sup- Human papillomavirus and lung cancinogenesis: an overview. J Cancer pression. J Mammary Gland Biol Neoplasia. 2011;16:247–56. Res Clin Oncol. 2016;142:2415–27. 45. Li JR, Sun CH, Li W, Chao RF, Huang CC, Zhou XJ, Liu CC. Cancer RNA-Seq 29. Fajas L. Metabolic control in cancer cells. Ann Endocrinol (Paris). Nexus: a database of phenotype-specifc transcriptome profling in 2013;74:71–3. cancer cells. Nucleic Acids Res. 2016;44:D944–51.

Submit your next manuscript to BioMed Central and we will help you at every step: • We accept pre-submission inquiries • Our selector tool helps you to find the most relevant journal • We provide round the clock customer support • Convenient online submission • Thorough peer review • Inclusion in PubMed and all major indexing services • Maximum visibility for your research

Submit your manuscript at www.biomedcentral.com/submit