<<

Cancer Gene Therapy (2014) 21, 74–82 © 2014 Nature America, Inc. All rights reserved 0929-1903/14 www.nature.com/cgt

ORIGINAL ARTICLE

Abnormal gene expression and gene fusion in lung adenocarcinoma with high-throughput RNA sequencing

Z-H Yang, R Zheng, Y Gao, Q Zhang and H Zhang

To explore the universal law of abnormal gene expression and the structural variation of genes related to lung adenocarcinoma, the gene expression profile GSE37765 was downloaded from the Gene Expression Omnibus database. The differentially expressed genes (DEGs) were analyzed with the t-test and the NOISeq tool, and the core DEGs were screened out by combining with another RNA-seq data set containing a total of 77 pairs of samples from 77 patients with lung adenocarcinoma. Moreover, functional annotation of the core DEGs was performed using the Database for Annotation, Visualization and Integrated Discovery, followed by selection of oncogenes and tumor suppressors by combining with the tumor suppressor genes and Cancer Genes databases, and motif finding for the core DEGs was performed with the motif-finding algorithm Seqpos. We also used the Tophat-fusion tool to further explore the fusion genes. In total, 850 downregulated DEGs and 206 upregulated DEGs were screened out in lung adenocarcinoma tissues. Next, we selected 543 core DEGs, including 401 downregulated and 142 upregulated genes; vasculature development (P = 1.89E-06) was significantly enriched among downregulated core genes, and mitosis (P = 6.26E-04) among upregulated core genes. On the basis of the cellular localization analysis of core genes, Wnt-1-induced secreted protein 1 (WISP1) and receptor (G protein-coupled) activity modifying protein 1 (RAMP1) were identified as mainly located in the extracellular region and extracellular space. We also screened one oncogene, v-myb avian myeloblastosis viral oncogene homolog-like 2 (MYBL2). Moreover, GATA2 was mined by motif-finding analysis. Finally, four fusion genes belonged to the human leukocyte antigen (HLA) family. WISP1, RAMP1, MYBL2 and GATA2 could be potential targets of treatment for lung adenocarcinoma and the fusion of HLA family genes might have important roles in lung adenocarcinoma.

Cancer Gene Therapy (2014) 21, 74–82; doi:10.1038/cgt.2013.86; published online 7 February 2014

Keywords: lung adenocarcinoma; high-throughput mRNA sequencing; tumor suppressor genes; oncogenes; fusion gene

INTRODUCTION

Lung cancer is the leading cause of cancer deaths in both men and women worldwide,1,2 and is also the leading cause of cancer-related death in the United States.3 Approximately 1.2 million new cases are diagnosed each year and their prognoses are poor.4 Despite advances in treatment, such as combination chemotherapy and chemoradiation, the survival rate has improved very little over the past few decades.5 Lung adenocarcinoma is the most common form of lung cancer and has an average 5-year survival rate of 15%, mainly because of late-stage detection and a paucity of late-stage treatments.6

Several studies have so far focused on the potential mechanisms of lung adenocarcinoma. For example, Beer et al. reported that a set of genes screened out by microarray analysis could predict survival in early-stage lung adenocarcinoma, such as vascular endothelial growth factor, cystatin C epidermal 2 erbB2 and oncogene crk and so on.7 In addition, frequently mutated genes, including tyrosine kinases, multiple receptor genes EPHA3 and receptor homologue ERBB4, were detected in human lung adenocarcinomas.6 Moreover, the overexpression of antioxidant enzyme AOE372, ATP synthase subunit d and others was identified with two-dimensional polyacrylamide gel electrophoresis and mass spectrometry in lung adenocarcinoma.8 However, the specific mechanism in the development of lung adenocarcinoma has not been confirmed, and the abnormalities related to carcinogenesis urgently need detailed research.

Recently, microarray analysis has been considered the most comprehensive method for detecting gene expression and has yielded significant advances in lung adenocarcinoma.9 However, microarray analysis also has several limitations, including probe hybridization kinetics, probe selection and background hybridization, which may limit the ability to accurately estimate low-level transcripts and cross-platform comparability.10 Although transcriptome sequencing has been shown to be comparable to microarrays,11 it has potential advantages, such as a larger dynamic range, the ability to detect all expressed transcripts as a function of depth of read coverage and the ability to detect the structure of transcripts.12,13 Besides, transcriptome sequencing analysis has commonly been used to investigate genetic variation,14 transcription factor binding sites15 and DNA methylation.16

In this study, we used high-throughput mRNA sequencing (RNA-Seq) to characterize the differences and similarities of transcriptome expression in patients with lung adenocarcinoma by comparing with healthy controls, to determine the universal law of abnormal gene expression in lung adenocarcinoma tissues. In addition, based on the high resolution of the RNA-Seq data, we also detected the fusion genes in each lung adenocarcinoma sample, attempting to explore the structural variation of genes related to lung adenocarcinoma.

Department of Respiratory Medicine, Shengjing Hospital of China Medical University, Shenyang, China. Correspondence: Dr R Zheng, Department of Respiratory Medicine, Shengjing Hospital of China Medical University, No. 36 Sanhao Street, Heping District, Shenyang 110004, China. E-mail: [email protected]
Received 23 August 2013; revised 10 December 2013; accepted 21 December 2013; published online 7 February 2014

MATERIALS AND METHODS

Sample data source
The gene expression profile GSE37765 (ref. 17) was downloaded from the public functional genomics database Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The samples were paired: each pair of lung adenocarcinoma sample and normal lung sample was obtained from the primary lung adenocarcinoma tumor and the adjacent noncancerous lung tissue of one patient, respectively. All patients were diagnosed with lung cancer for the first time. None of the patients had distant metastasis or a family history of lung cancer. Next, six lung adenocarcinoma samples (GSM927309, GSM927311, GSM927313, GSM927315, GSM927317 and GSM927319) and six normal lung samples (GSM927308, GSM927310, GSM927312, GSM927314, GSM927316 and GSM927318) were selected from this data set. The Illumina Genome Analyzer IIx platform (Illumina, San Diego, CA) was used for sequencing with the paired-end method.

Comparing the RNA-seq data and calculating the expression values of genes
The comparison (alignment) of reads was performed with the Tophat software18 based on the hg19 reference sequences for RNA-Seq downloaded from the UCSC Genome Browser (http://genome.ucsc.edu). During the comparison, a unique alignment for each read and <2 base mismatches were required, and the other parameters were left at the default settings of the Tophat software. After the comparison, the transcriptome of each sample was assembled and the expression values of genes were calculated with the Cufflinks and Cuffdiff tools,19 in combination with the reference sequence gene annotation. The expression values were given as fragments per kilobase of transcript per million fragments mapped.20

Analysis of the differentially expressed genes
We used two methods, the t-test and NOISeq, to explore the differential transcripts between the lung adenocarcinoma samples and normal lung samples. First, the data of all 12 samples were integrated, and a paired t-test was applied to identify the transcripts that were differentially expressed in the lung adenocarcinoma samples; the thresholds of significance were set as P<0.01 and fold change ≥2. Second, NOISeq, a novel nonparametric approach for the identification of differentially expressed genes (DEGs),21 was used to detect the differentially expressed transcripts in single pairs of lung adenocarcinoma sample and normal lung sample; a NOISeq q-value ≥0.8 and a fold change ≥2 were required for each differentially expressed transcript.

To further explore the core DEGs among the DEGs in lung adenocarcinoma, we combined another RNA-seq data set,22 which contained a total of 77 pairs of samples from 77 patients with lung adenocarcinoma. In addition, the specifically and highly expressed factors among the core DEGs of lung adenocarcinoma were screened out with cellular localization analysis. To assess whether these highly expressed factors could serve as markers for identifying the occurrence of lung adenocarcinoma, the Gene Pattern database was used to observe the expression levels of these factors in different types of normal tissue.

Functional annotation of the DEGs
DAVID (Database for Annotation, Visualization and Integrated Discovery)23 was chosen to analyze the function of the DEGs with the DAVID default settings (Count = 2, EASE = 0.1), covering three aspects: biological process, molecular function and cellular component. Next, the DEGs with a transcriptional regulation function were screened out and marked. Finally, the known oncogenes and tumor suppressor genes among the DEGs were selected by combining with the relevant cancer databases: tumor suppressor genes24 and cancer genes.25

Predicting the upstream regulatory elements of the DEGs
The DEGs obtained above were divided into upregulated and downregulated genes. Here we defined the region within 1 kb upstream of the transcription start site of a gene as the promoter region. Next, motif finding was performed in the promoter regions of the upregulated and downregulated genes with a new motif-finding algorithm, Seqpos,26 to predict the transcription factors with a regulatory function on the upregulated and downregulated DEGs, with the criterion of P<0.00001 in the motif-finding analysis of each promoter.

Detecting fusion genes
Tophat-fusion is an algorithm designed to discover transcripts representing fusion gene products.27 We used the Tophat-fusion tool to further explore

Table 1. The functional analysis of significantly downregulated and upregulated genes in lung adenocarcinoma tissues

GO term | Gene counts | P-value

Downregulated genes
BP GO:0001944~vasculature development | 41 | 1.07E-12
BP GO:0007155~cell adhesion | 64 | 3.93E-08
BP GO:0042127~regulation of cell proliferation | 69 | 5.26E-08
BP GO:0030036~actin cytoskeleton organization | 30 | 2.06E-07
BP GO:0016477~cell migration | 31 | 4.39E-06
CC GO:0005886~plasma membrane | 249 | 5.13E-12
CC GO:0044421~extracellular region part | 84 | 7.32E-09
CC GO:0005912~adherens junction | 24 | 5.27E-07
MF GO:0005509~calcium ion binding | 81 | 2.16E-09
MF GO:0008092~cytoskeletal protein binding | 50 | 1.45E-07
MF GO:0019838~growth factor binding | 15 | 1.97E-04

Upregulated genes
BP GO:0000278~mitotic cell cycle | 12 | 3.59E-04
BP GO:0005996~monosaccharide metabolic process | 8 | 0.003146273
BP GO:0006260~DNA replication | 7 | 0.006124068
BP GO:0006793~phosphorus metabolic process | 15 | 0.041116248
BP GO:0006887~exocytosis | 4 | 0.077718839
CC GO:0005794~Golgi apparatus | 18 | 0.002188962
CC GO:0044427~chromosomal part | 10 | 0.008479627
MF GO:0001882~nucleoside binding | 26 | 0.002333555
MF GO:0005524~ATP binding | 23 | 0.007159416
MF GO:0004672~protein kinase activity | 12 | 0.015412651

Abbreviations: BP, biological process; CC, cellular component; GO, Gene Ontology; MF, molecular function.

the fusion genes among the DEGs, requiring that the numbers of spanning reads, supporting mate pairs and supporting mate pairs with spanning reads were all ≥20 in the predicted fusions, and that the number of contradicting reads was 0. Besides, the fusion genes predicted in normal human tissue were regarded as false-positive results, and the false-positive results in lung adenocarcinoma samples were estimated as far as possible. For fusion genes located on one chromosome, the 10-kb minimum distance was considered a compromise that allows Tophat-fusion to detect intrachromosomal rearrangements; inversions might also be included in intrachromosomal fusions.

RESULTS

Screening of DEGs in lung adenocarcinoma samples
In total, 1169 differentially expressed transcripts were identified in lung adenocarcinoma samples by the paired design in the 12 samples of the two groups, comprising 6 lung adenocarcinoma samples and 6 normal lung samples; there were 951 downregulated transcripts and 218 upregulated transcripts, which corresponded to 850 downregulated DEGs and 206 upregulated DEGs, respectively.

Function annotation of the DEGs
Gene Ontology functional enrichment analysis was performed for all DEGs. The result showed that the downregulated genes were enriched in vasculature development (P = 1.07E-12), cell adhesion (P = 3.93E-08) and cell proliferation (P = 5.26E-08). For the cellular component, 35% of the downregulated genes were located in the plasma membrane (P = 5.13E-12) and extracellular region (P = 7.32E-09, Table 1), whereas the upregulated genes in lung adenocarcinoma samples were significantly clustered in the mitotic cell cycle (P = 3.59E-04). For the location of these genes, Golgi apparatus (P = 0.002188962) and chromosomal part (P = 0.008479627) were the most significant locations. In addition, the function of the upregulated genes mainly focused on nucleoside binding (P = 0.002333555) and ATP binding (P = 0.007159416, Table 1).

Selection of core DEGs and functional analysis
By combining with another RNA-seq data set,22 a total of 77 pairs of samples from 77 patients with lung adenocarcinoma were included. In total, 401 genes were clearly downregulated in the lung adenocarcinoma tissues; meanwhile, another 142 genes were significantly activated in lung adenocarcinoma tissues. These 543 genes, which were differentially expressed both in the lung adenocarcinoma tissues and in the other lung adenocarcinoma samples, were defined as core DEGs associated with lung adenocarcinoma. Through functional enrichment analysis, the downregulated core genes were also enriched in vasculature development, cell adhesion and regulation of cell proliferation. Besides, new biological processes were enriched, such as response to wounding (P = 1.17E-04) and enzyme-linked receptor protein signaling pathway (P = 3.03E-04, Table 2), whereas among upregulated core genes mitosis was the significantly enriched biological process (P = 6.26E-04). For the molecular function, these genes were also associated with nucleoside binding (P = 1.81E-01) and ATP binding (P = 4.18E-01).
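The selection of core DEGs described above can be read as keeping the genes that are called differentially expressed in both data sets. A minimal sketch follows, assuming (the text does not say so explicitly) that a gene must also change in the same direction in both sets; the function name and the gene calls are hypothetical placeholders, not results from the study.

```python
def core_degs(primary, validation):
    """Return genes called differentially expressed in the same direction
    in both data sets. Each argument maps a gene symbol to "up" or "down".
    """
    return {
        gene: direction
        for gene, direction in primary.items()
        if validation.get(gene) == direction
    }


# Hypothetical calls for illustration only (not data from the paper).
primary_calls = {"GENE_A": "down", "GENE_B": "up", "GENE_C": "up", "GENE_D": "down"}
validation_calls = {"GENE_A": "down", "GENE_B": "up", "GENE_C": "down", "GENE_E": "up"}

core = core_degs(primary_calls, validation_calls)
n_down = sum(1 for d in core.values() if d == "down")
n_up = sum(1 for d in core.values() if d == "up")
print(core, n_down, n_up)
```

GENE_C is dropped because its direction disagrees between the two sets, and GENE_D and GENE_E because each appears in only one set.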

Table 2. The functional enrichment of downregulated and upregulated core genes

GO term | Gene counts | P-value | FDR | Benjamini-adjusted P-value

Downregulated core genes
BP GO:0001944~vasculature development | 25 | 9.52E-10 | 1.63E-06 | 1.89E-06
BP GO:0007155~cell adhesion | 39 | 1.31E-07 | 2.26E-04 | 5.20E-05
BP GO:0009611~response to wounding | 32 | 4.12E-07 | 7.08E-04 | 1.17E-04
BP GO:0042127~regulation of cell proliferation | 40 | 9.03E-07 | 0.001551043 | 2.24E-04
BP GO:0007167~enzyme-linked receptor protein signaling pathway | 24 | 1.38E-06 | 0.002365966 | 3.03E-04
CC GO:0005886~plasma membrane | 146 | 6.50E-12 | 8.51E | 1.72E-09
CC GO:0044421~extracellular region part | 100 | 7.32E-09 | 2.04E | 2.04E-06
MF GO:0005509~calcium ion binding | 46 | 4.63E-07 | 6.71E-04 | 2.41E-04
MF GO:0019838~growth factor binding | 12 | 2.21E-05 | 0.031991201 | 5.71E-03

Upregulated core genes
BP GO:0007067~mitosis | 11 | 1.94E-06 | 0.002893052 | 6.26E-04
CC GO:0005794~Golgi apparatus | 15 | 0.00172056 | 2.049064829 | 2.36E-01
CC GO:0044427~chromosomal part | 8 | 0.014241662 | 15.84024702 | 3.61E-01
MF GO:0001882~nucleoside binding | 15 | 0.002786757 | 3.474708664 | 1.81E-01
MF GO:0005524~ATP binding | 18 | 0.012578033 | 14.82046705 | 4.18E-01
MF GO:0004672~protein kinase activity | 10 | 0.016176 | 18.6715761 | 3.93E-01

Abbreviations: BP, biological process; CC, cellular component; FDR, false discovery rate; GO, Gene Ontology; MF, molecular function.
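Table 2 reports a raw and a Benjamini-adjusted P-value for each term. As a rough illustration of how such pairs of values arise, the sketch below computes a one-sided hypergeometric enrichment probability and then applies the Benjamini-Hochberg adjustment. The term names and counts are hypothetical, and DAVID's EASE score is a more conservative variant of this test, so the values in Table 2 would not be reproduced exactly; the adjustment also depends on the full set of terms tested, which is not listed in the article.

```python
from math import comb


def enrichment_p(hits, list_size, annotated, background):
    """One-sided hypergeometric P(X >= hits) when `list_size` genes are drawn
    from `background` genes, of which `annotated` carry the term."""
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(hits, min(list_size, annotated) + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is
    # capped by the adjusted values of all larger P-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (hits, list size, annotated genes, background size).
terms = {
    "term_1": (8, 40, 50, 1000),
    "term_2": (3, 40, 60, 1000),
    "term_3": (2, 40, 45, 1000),
}
raw = [enrichment_p(*counts) for counts in terms.values()]
for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
    print(f"{name}: P={p:.3g}, BH-adjusted={q:.3g}")
```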

Table 3. Cellular localization and annotation of specifically and highly expressed factors

GO term | Gene counts | Genes

CC GO:0005615~extracellular space | 4 | PRDX4, TGFA, RAMP1, WFDC2
CC GO:0044421~extracellular region part | 5 | SPINT2, PRDX4, TGFA, RAMP1, WFDC2
CC GO:0005576~extracellular region | 8 | WISP1, SPINT2, CST4, MUC20, PRDX4, TGFA, RAMP1, WFDC2
CC GO:0031012~extracellular matrix | 1 | SPINT2

Abbreviation: CC, cellular component.

In addition, a total of eight specifically and highly expressed factors of lung adenocarcinoma were found with cytological position analysis, including WFDC2 (WAP four-disulfide core domain protein 2), WISP1 (Wnt-1-induced secreted protein 1), CST4 (cystatin S), MUC20 (mucin 20), PRDX4 (peroxiredoxin 4), RAMP1 (receptor (G protein-coupled) activity modifying protein 1), SPINT2 (serine protease inhibitor Kunitz-type 2) and transforming growth factor-α (Table 3). The comparative result of expression

Figure 1. Heat map of the differentially expressed genes in 83 patients with lung adenocarcinoma. Red represents gene expression in the 6 sets of primary data and blue represents gene expression in the additional 77 sets of data. The red arrow indicates the upregulated core genes and the green arrow indicates the downregulated core genes.

Figure 2. Comparison of the expression levels of serine protease inhibitor Kunitz-type 2 (SPINT2), Wnt-1-induced secreted protein 1 (WISP1) and receptor (G protein-coupled) activity modifying protein 1 (RAMP1) in normal tissues of various types. (a) Expression level of SPINT2; (b) expression level of WISP1; (c) expression level of RAMP1.

Table 4. The known tumor suppressor genes and oncogenes in abnormally expressed core genes

Direction | Putative tumor suppressor counts | Tumor suppressor | Putative oncogene counts | Oncogene

Down | 33 | AKAP12, BMP2, CAV1, CAV2, CBFA2T3, CDH13, CDH19, CMTM5, CNTN6, CSRNP1, CYGB, DAB2IP, DLC1, FBLN1, HIC1, HYAL1, ID4, KL, KLF10, , LATS2, NDRG2, NR4A1, NR4A3, PECAM1, SASH1, SEMA3B, SLIT2, SPN, STARD8, TENC1, THSD1, TNS1 | 13 | CXCL3, EGR1, ERG, FOSB, GAS6, ID1, JUNB, JUND, MAFF, PDGFB, RAB40A, RECK, TIMP3
Up | 6 | AURKB, ESRP1, LRIG3, SPINT2, UHRF1, XDH | 3 | MUC20, MYBL2, NME1
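The Fisher's exact test reported for these counts (P = 0.7097) can be re-derived from Table 4 alone. The sketch below is a plain-Python two-sided test, assuming the test was applied to the 2x2 table of tumor suppressor and oncogene counts in downregulated versus upregulated core genes; the function name is illustrative and no statistics package is assumed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts from Table 4: rows are downregulated and upregulated core genes,
# columns are tumor suppressor genes and oncogenes.
p_value = fisher_exact_two_sided(33, 13, 6, 3)
print(f"P = {p_value:.4f}")
```

Under that assumption the computed value agrees with the reported P = 0.7097 to four decimal places.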

Table 5. The TFs in abnormally expressed core genes

Direction | TF counts | TF | Oncogene TF | Tumor suppressor TF

Down | 26 | EGR1, ERG, FOSB, FOXF1, FOXF2, GATA2, HIC1, HOXA4, ID1, ID2, JUNB, JUND, KLF13, KLF2, KLF4, LDB2, MAFF, NR4A1, NR4A2, NR4A3, SOX18, TBX4, TCF15, TCF21, TEF, TWIST2 | EGR1, ERG, FOSB, JUNB, JUND, MAFF | KLF4, NR4A1, NR4A3
Up | 1 | MYBL2 | MYBL2 |

levels of these specifically and highly expressed factors showed that SPINT2 was highly expressed in the colon, hypothalamus, kidney, lung and testis; therefore, SPINT2 could not be an independent candidate blood test indicator for lung adenocarcinoma (Figure 1 and Figure 2). It is worth noting that the expression levels of WISP1, mucin 20 and RAMP1 were very low in normal lung tissue, and WISP1 and RAMP1 were scarcely expressed. Hence, we thought that WISP1 and RAMP1 could be candidate biopsy indicators to distinguish normal lung tissue from lung adenocarcinoma tissue.

Tumor suppressor genes and oncogenes in the core DEGs
To detect the cancer-related genes among the core DEGs, the Cancer Gene database was used to find the ratio of known tumor suppressor genes and oncogenes among the differentially upregulated and downregulated genes, respectively. Among the downregulated genes, a total of 33 tumor suppressor genes and 13 oncogenes were detected (Table 4), and their ratios to the downregulated core genes were 8.2% (33/401) and 3.2% (13/401), respectively, whereas among the upregulated genes the ratios of tumor suppressor genes and oncogenes were 4.2% (6/142) and 2.1% (3/142), respectively. There was no significant difference in tumor suppressor genes and oncogenes between upregulated and downregulated genes (Fisher's exact test, P = 0.7097): tumor suppressor genes were not significantly enriched in downregulated genes and oncogenes were not clearly enriched in upregulated genes. This result suggested that tumor suppressor genes and oncogenes had no specific pattern, and the abnormal expression of genes in lung adenocarcinoma tissues was possibly associated with the dysfunction of individual genes.

Furthermore, the core DEGs with transcription factor function were screened out by combining with the transcription factor database. Finally, 26 transcription factors were found among the downregulated genes, but only one gene, v-myb avian myeloblastosis viral oncogene homolog-like 2 (MYBL2), possessed transcriptional regulatory function among the 142 upregulated genes (Table 5). MYBL2 was also a known oncogene, and the patients with differentially upregulated expression of MYBL2 in cancer tissues accounted for 51.8% (43/83) of all included patients.

Prediction of the upstream regulatory elements for these DEGs
Motif finding for transcription factors in the promoter regions of the core genes showed that three transcription factors were predicted for the upregulated genes and eight transcription factors for the downregulated core genes (Table 6). Combined with the gene expression profile analysis, the transcription factor GATA2 was differentially expressed in lung adenocarcinoma tissues. Besides, the statistics indicated that GATA2 was abnormally expressed in the cancer tissues of 67.4% (56/83) of all included samples, whereas motif finding in the core downregulated genes identified a known tumor suppressor gene, homeobox A5 (HOXA5), whose motif feature was clearly enriched in these core downregulated genes. However, no other transcription factors were differentially expressed in lung adenocarcinoma tissues.

Detection of fusion genes
Gene fusion is a widespread phenomenon in tumor tissues and the recombination of exon fragments of different genes could result in dysfunction.28 The Tophat-fusion tool was used to detect the fusion genes in all RNA-seq data of the six lung adenocarcinoma samples. In total, seven fusion gene loci were defined and are shown in Table 7; these seven fusion loci were distributed in only three patients, and four corresponding fusion genes belonged to the human leukocyte antigen (HLA) family according to the functions of the fusion region. The HLA members involved in gene fusion included HLA-A, HLA-B, HLA-C, HLA-G, HLA-DRB5, HLA-DRB6 and HLA-DQB1. In addition, gene fusion of short fragments was found in both the LA_4 and LA_5 patients, and both pointed to collagen-α1 type 1 (COL1A1), whose product is collagen, a skeleton component of the extracellular matrix. In addition, the fusion of FCGR2A (Fc fragment of IgG low-affinity IIa receptor)–FCGR2C existed only in the tumor tissue of the LA_5 patient; FCGR2A and FCGR2C belong to the immunoglobulin Fc receptor family, which mainly participates in phagocytosis and the clearing of immune complexes. Compared with the information on fusion genes in the Catalogue of Somatic Mutations in Cancer database,29 only the fusion records of COL1A1-USP6 (ubiquitin-specific peptidase 6) and COL1A1-PDGFB (platelet-derived growth factor β-polypeptide) were found,

Table 6. Predicting the binding sites of TFs in promoter region of upregulated core genes and downregulated core genes in lung adenocarcinoma tissues

[Motif column: sequence logo images, not reproduced in this text]

TF | Hits | Function and annotation

Upregulated core genes
ITGAL | 86 |
GATA2 | 62 | Upregulated in LA
ZFP36L1 | 48 |

Downregulated core genes
HOXA5 | 684 | Tumor suppressor
IRF7 | 641 |
MYEF2 | 474 |
STAT4 | 383 |
STAT4 | 288 |
RUNX1 | 161 |
NR1H4 | 149 |
NR1I2 | 119 |
DMRT3 | 117 |

Abbreviation: TF, transcription factor.
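The motif search behind Table 6 was run on promoter regions defined as the 1 kb upstream of each transcription start site (TSS). A minimal sketch of that coordinate step is given below, assuming 1-based inclusive coordinates and strand-aware handling; the function name and the example positions are hypothetical, and clipping at the chromosome end is ignored for simplicity.

```python
def promoter_interval(tss, strand, upstream=1000):
    """Return the 1-based inclusive (start, end) of the promoter region.

    The promoter is taken as `upstream` bases immediately upstream of the
    TSS, which lies at lower coordinates on the "+" strand and at higher
    coordinates on the "-" strand.
    """
    if strand == "+":
        # Clip at the first base of the chromosome.
        return max(1, tss - upstream), tss - 1
    if strand == "-":
        return tss + 1, tss + upstream
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical TSS positions for illustration only.
print(promoter_interval(5000, "+"))
print(promoter_interval(5000, "-"))
```

Sequences extracted from such intervals on the "-" strand would normally be reverse-complemented before motif scanning; that step is left out here.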

Table 7. The fusion genes in lung adenocarcinoma samples

Donor genes | Acceptor genes | Chromosome | Sample ID | Distance (bp) | Function

COL1A1 | COL1A1 | Chr17–Chr17 | LA_4 | 89 | Group I collagen
COL1A1 | COL1A1 | Chr17–Chr17 | LA_5 | 37 | Group I collagen
FCGR2A | FCGR2C | Chr1–Chr1 | LA_5 | 81656 | Immunoglobulin Fc receptor
HLA-G | HLA-C | Chr6–Chr6 | LA_4 | 1443034 | Human leukocyte antigen
HLA-A | HLA-B | Chr6–Chr6 | LA_4 | 1413821 | Human leukocyte antigen
HLA-DRB5 | HLA-DRB6 | Chr6–Chr6 | LA_5 | 81856 | Human leukocyte antigen
HLA-DQB1 | - | Chr6–Chr6 | LA_6 | 96211 | Human leukocyte antigen
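The read-support criteria given in the Methods for fusion detection (at least 20 spanning reads, supporting mate pairs and supporting mate pairs with spanning reads, and no contradicting reads) can be expressed as a simple filter. The sketch below is an illustration only: the record type, its field names and the example candidates are hypothetical, Table 7 does not report read counts, and how the authors combined the 10-kb intrachromosomal distance with the other thresholds is not fully specified, so it is exposed as a parameter.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    donor: str
    acceptor: str
    same_chromosome: bool
    distance_bp: int
    spanning_reads: int
    mate_pairs: int
    mate_pairs_with_spanning: int
    contradicting_reads: int


def passes_filters(c, min_support=20, min_intrachrom_distance=10_000):
    """Apply the read-support and distance thresholds to one candidate."""
    supported = (
        c.spanning_reads >= min_support
        and c.mate_pairs >= min_support
        and c.mate_pairs_with_spanning >= min_support
    )
    # The distance threshold only applies to intrachromosomal candidates.
    far_enough = (not c.same_chromosome) or c.distance_bp >= min_intrachrom_distance
    return supported and c.contradicting_reads == 0 and far_enough


# Invented candidates for illustration only (not data from the paper).
candidates = [
    FusionCandidate("GENE_A", "GENE_B", True, 85_000, 31, 25, 22, 0),
    FusionCandidate("GENE_C", "GENE_D", True, 85_000, 31, 12, 22, 0),
    FusionCandidate("GENE_E", "GENE_F", False, 0, 40, 40, 40, 1),
]
kept = [f"{c.donor}-{c.acceptor}" for c in candidates if passes_filters(c)]
print(kept)
```

The second candidate fails on mate-pair support and the third on contradicting reads, so only the first is kept.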

and integration of COL1A1 with itself leading to a missing exon fragment was not observed.

Moreover, the functional analysis of fusion genes was performed and the products of all fusion genes were located on the cell membrane. Meanwhile, all genes were involved in cellular immune responses except for COL1A1. This result showed that the antigenic characteristics of the lung adenocarcinoma cell surface changed and prevented lung adenocarcinoma cells from being detected by the immune system of the organism, thereby allowing the proliferation of lung adenocarcinoma cells.

DISCUSSION

In this research, we screened out the DEGs and core DEGs in the lung adenocarcinoma tissues by comparing with control samples and performed the functional analysis. The downregulated DEGs were closely related to vasculature development, cell adhesion and cell proliferation, whereas the upregulated DEGs were significantly involved in the regulation and activation of mitosis. This suggested that cell mitosis was inhibited in lung adenocarcinoma, while cell proliferation and vasculature development were positively related to lung adenocarcinoma.

Furthermore, 543 core DEGs were screened out by combining with another RNA-seq data set, and their potential functions were annotated. After cytological position analysis, eight specifically and highly expressed genes were screened out; among them, WISP1 and RAMP1 were hardly expressed in normal tissue, but both were specifically expressed in lung adenocarcinoma tissue. In fact, WISP1 is a cysteine-rich secreted factor belonging to the CCN family and it could participate in the inhibition of metastasis30 and in the development and progression of primary lung cancers.31 It was also essential for the metastatic process and specifically for pulmonary metastases.32 For RAMP1, its expression was increased in cancer cells,33 and RAMP1 was reported to be a promoter of prostate tumorigenesis and a novel biomarker and possible therapeutic target in prostate cancer.34 Thus, WISP1 and RAMP1 could be potential candidate biopsy indexes and potential therapy targets for lung adenocarcinoma.

In addition, in this research, the functional analysis of core genes showed that MYBL2 was not only an oncogene, but also had transcription regulatory function, and its expression was upregulated in lung adenocarcinoma tissues. MYBL2, a member of g-MYB-related oncogenes, was independently amplified in a small fraction of breast cancers.35,36 It was identified in the cell cycle pathway in lung cancer.37,38 Thus, MYBL2 might be associated with cell proliferation in lung adenocarcinoma and could regulate the expression of relevant genes, although its specific function was unclear.

Besides, after motif finding for the core DEGs, we found that only GATA2 and HOXA5 were differentially expressed in lung adenocarcinoma tissues among these transcription factors. We supposed that GATA2 was a transcription inhibitor: GATA2 was upregulated in the normal state and inhibited the transcription of downstream genes by binding to the promoter region of target genes, whereas in lung adenocarcinoma downregulated GATA2 led to the abnormal activation of downstream genes, which further promoted the cell cycle genes and enhanced cell proliferation. In previous studies, GATA2 loss dramatically reduced tumor development in a Kras-driven non-small cell lung cancer mouse model,39 as well as in prostate cancer.40 In addition, loss of expression or overexpression of GATA was associated with a variety of cancers in humans, such as breast cancer and gastrointestinal cancers.41 Hence, GATA2 could be a new therapy target for lung adenocarcinoma. HOXA5 was downregulated among the core genes, while previous research indicated that HOXA5 was upregulated in primary tumors.42 Besides, expression levels of HOXA5 in squamous cell carcinoma tissues and adenocarcinoma tissues were also significantly higher than in the non-cancerous tissues.43 Moreover, no significant decrease of HOXA5 expression occurred during lung development and HOXA5 remained highly expressed in the lungs of newborn animals.44 Hence, the potential effects of abnormal expression of HOXA5 in lung adenocarcinoma were uncertain.

Moreover, the detection of fusion genes revealed that four fusion genes were identified in the HLA family. In previous studies, HLA-G expression in lung cancer was supposed to be one of the ways in which the tumor downregulates the host immune response.45 Moreover, HLA class I antigen defects, which were frequently present in head and neck squamous cell carcinoma cells, may provide the tumor with an escape mechanism from immune surveillance.46 Thus, we thought the fusion of HLA family genes might lead to abnormal characteristics of HLA proteins and further trigger defects of the immune mechanism.

In conclusion, based on the DEGs and core genes obtained using RNA-seq data analysis, some DEGs were identified as possessing crucial functions in the development of lung adenocarcinoma, such as WISP1, RAMP1, MYBL2 and GATA2. These genes could be potential targets for treating lung adenocarcinoma and the fusion of HLA family genes might have important roles in lung adenocarcinoma. However, these results need to be further confirmed with more experiments.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

REFERENCES

1 Parkin DM, Bray F, Ferlay J, Pisani P. Estimating the world cancer burden: Globocan 2000. Int J Cancer 2001; 94: 153–156.
2 Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin 2005; 55: 74–108.
3 Molina JR, Yang P, Cassivi SD, Schild SE, Adjei AA. Non-small cell lung cancer: epidemiology, risk factors, treatment, and survivorship. Mayo Clin Proc 2008; 83: 584–594.
4 Juergens RA, Brahmer JR. Adjuvant treatment in non-small cell lung cancer: where are we now? J Natl Compr Canc Netw 2006; 4: 595–600.
5 Schiller JH. Current standards of care in small-cell and non-small-cell lung cancer. Oncology 2001; 61(Suppl 1): 3–13.

6 Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008; 455: 1069–1075.
7 Beer DG, Kardia SL, Huang C-C, Giordano TJ, Levin AM, Misek DE et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002; 8: 816–824.
8 Chen G, Gharib TG, Huang C-C, Thomas DG, Shedden KA, Taylor JM et al. Proteomic analysis of lung adenocarcinoma identification of a highly expressed set of proteins in tumors. Clin Cancer Res 2002; 8: 2298–2305.
9 Jiang H, Deng Y, Chen H-S, Tao L, Sha Q, Chen J et al. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics 2004; 5: 81.
10 Beane J, Vick J, Schembri F, Anderlind C, Gower A, Campbell J et al. Characterizing the impact of smoking and lung cancer on the airway transcriptome using RNA-Seq. Cancer Prev Res 2011; 4: 803–817.
11 Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009; 10: 57–63.
12 Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 2009; 6: 377–382.
13 Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008; 18: 1509–1517.
14 Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 2007; 318: 420–426.
15 Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 2007; 448: 553–560.
16 Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008; 452: 215–219.
17 Kim SC, Jung Y, Park J, Cho S, Seo C, Kim J et al. A high-dimensional, deep-sequencing study of lung adenocarcinoma in female never-smokers. PLoS One 2013; 8: e55596.
18 Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25: 1105–1111.
19 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010; 28: 511–515.
20 Li C, Feng W, Qiu L, Xia C, Su X, Jin C et al. Characterization of skin ulceration syndrome associated microRNAs in sea cucumber Apostichopus japonicus by deep sequencing. Fish Shellfish Immunol 2012; 33: 436–441.
21 Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res 2011; 21: 2213–2223.
28 Cheung A, Deng W. Telomere dysfunction, genome instability and cancer. Front Biosci 2008; 13: 2075–2090.
29 Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res 2011; 39(suppl 1): D945–D950.
30 Soon LL, Yie T-A, Shvarts A, Levine AJ, Su F, Tchou-Wong K-M. Overexpression of WISP-1 down-regulated motility and invasion of lung cancer cells through inhibition of Rac activation. J Biol Chem 2003; 278: 11465–11470.
31 Chen P-P, Li W-J, Wang Y, Zhao S, Li D-Y, Feng L-Y et al. Expression of Cyr61, CTGF, and WISP-1 correlates with clinical features of lung cancer. PLoS One 2007; 2: e534.
32 Margalit O, Eisenbach L, Amariglio N, Kaminski N, Harmelin A, Pfeffer R et al. Overexpression of a set of genes, including WISP-1, common to pulmonary metastases of both mouse D122 Lewis lung carcinoma and B16-F10.9 melanoma cell lines. Br J Cancer 2003; 89: 314–319.
33 Zudaire E, Martínez A, Cuttitta F. Adrenomedullin and cancer. Regul Peptides 2003; 112: 175–183.
34 Logan M, Saab ST, Hameed O, Anderson PD, Abdulkadir SA. RAMP1 is a direct NKX3.1 target gene up-regulated in prostate cancer that promotes tumorigenesis. Am J Pathol 2013; 183: 951–963.
35 Tanner MM, Grenman S, Koul A, Johannsson O, Meltzer P, Pejovic T et al. Frequent amplification of chromosomal region 20q12-q13 in ovarian cancer. Clin Cancer Res 2000; 6: 1833–1839.
36 Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA 2005; 102: 13550–13555.
37 Hosgood HD, Menashe I, Shen M, Yeager M, Yuenger J, Rajaraman P et al. Pathway-based evaluation of 380 candidate genes and lung cancer susceptibility suggests the importance of the cell cycle pathway. Carcinogenesis 2008; 29: 1938–1943.
38 Sarlomo-Rikala M, Andersson LC, Knuutila S, Miettinen M. DNA sequence copy number changes in gastrointestinal stromal tumors: tumor progression and prognostic significance. Cancer Res 2000; 60: 3899–3903.
39 Kumar MS, Hancock DC, Molina-Arcas M, Steckel M, East P, Diefenbacher M et al. The GATA2 transcriptional network is requisite for RAS oncogene-driven non-small cell lung cancer. Cell 2012; 149: 642–655.
40 Böhm M, Locke W, Sutherland R, Kench J, Henshall S. A role for GATA-2 in transition to an aggressive phenotype in prostate cancer through modulation of key androgen-regulated genes. Oncogene 2009; 28: 3847–3856.
41 Zheng R, Blobel GA. GATA transcription factors and cancer. Genes Cancer 2010; 1: 1178–1188.
42 Plowright L, Harrington K, Pandha H, Morgan R. HOX transcription factors are potential therapeutic targets in non-small-cell lung cancer (targeting HOX genes in lung cancer). Br J Cancer 2009; 100: 470–475.
22 Seo J-S, Ju YS, Lee W-C, Shin J-Y, Lee JK, Bleazard T et al.
The transcriptional 43 Abe M, Hamada J-i, Takahashi O, Takahashi Y, Tada M, Miyamoto M et al. Dis- landscape and mutational profile of lung adenocarcinoma. Genome Res 2012; 22: ordered expression of HOX genes in human non-small cell lung cancer. Oncol Rep 2109–2119. 2006; 15: 797–802. 23 Dennis Jr G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC et al. DAVID: database 44 Grier D, Thompson A, Kwasniewska A, McGonigle G, Halliday H, Lappin T. for annotation, visualization, and integrated discovery. Genome Biol 2003; 4:P3. The pathophysiology of HOX genes and their role in cancer. J Pathol 2005; 205: 24 Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes. 154–171. Nucleic Acids Res 2013; 41: D970–D976. 45 Urosevic M, Kurrer MO, Kamarashev J, Mueller B, Weder W, Burg G et al. Human 25 Higgins ME, Claremont M, Major JE, Sander C, Lash AE. CancerGenes: a gene leukocyte antigen G up-regulation in lung cancer associates with high-grade selection resource for cancer genome projects. Nucleic Acids Res 2007; 35(suppl 1): histology, human leukocyte antigen class I loss and -10 production. Am D721–D726. J Pathol 2001; 159: 817–824. 26 Liu T, Ortiz JA, Taing L, Meyer CA,RETRACTED Lee B, Zhang Y et al. Cistrome: an integrative 46 Meissner M, Reichert TE, Kunkel M, Gooding W, Whiteside TL, Ferrone S et al. platform for transcriptional regulation studies. Genome Biol 2011; 12: R83. Defects in the human leukocyte antigen class I antigen processing machinery in 27 Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion head and neck squamous cell carcinoma: association with clinical outcome. Clin transcripts. Genome Biol 2011; 12: R72. Cancer Res 2005; 11: 2552–2560.

Cancer Gene Therapy (2015), 1 © 2015 Nature America, Inc. All rights reserved 0929-1903/15 www.nature.com/cgt

RETRACTION Abnormal gene expression and gene fusion in lung adenocarcinoma with high-throughput RNA sequencing

ZH Yang, R Zheng, Y Gao, Q Zhang and H Zhang

Cancer Gene Therapy advance online publication, 18 December 2015; doi:10.1038/cgt.2015.69

Retraction to: Cancer Gene Therapy (2014) 21, 74–82; doi:10.1038/cgt.2013.86

The Publisher and Editor retract this article in accordance with the recommendations of the Committee on Publication Ethics (COPE). After a thorough investigation we have strong reason to believe that the peer review process was compromised.

The original article was published online on 7 February 2014.