Published OnlineFirst November 9, 2015; DOI: 10.1158/0008-5472.CAN-15-0484

Cancer Research
Integrated Systems and Technologies

Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers

Bogumil Kaczkowski1, Yuji Tanaka1,2, Hideya Kawaji1,2,3, Albin Sandelin4, Robin Andersson4, Masayoshi Itoh1,3, Timo Lassmann1,5, the FANTOM5 consortium, Yoshihide Hayashizaki3, Piero Carninci1, and Alistair R.R. Forrest1,6

1RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, Japan. 2RIKEN Advanced Center for Computing and Communication, Preventive Medicine and Applied Genomics unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Japan. 3RIKEN Preventive Medicine & Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan. 4The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark. 5Telethon Kids Institute, the University of Western Australia, Perth, Western Australia, Australia. 6Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands, Western Australia, Australia.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Corresponding Authors: Bogumil Kaczkowski, RIKEN Center for Life Science Technologies (CLST), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan. Phone: 81-45-503-9222; Fax: 81-45-503-9216; E-mail: [email protected]; and Alistair R.R. Forrest, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, 6 Verdun Street, Nedlands, WA 6009, Australia. Phone: 61-8-6151-0780; Fax: 61-8-6151-0701; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-15-0484
©2015 American Association for Cancer Research.

Abstract

Genes that are commonly deregulated in cancer are clinically attractive as candidate pan-diagnostic markers and therapeutic targets. To globally identify such targets, we compared Cap Analysis of Gene Expression profiles from 225 different cancer cell lines and 339 corresponding primary cell samples to identify transcripts that are deregulated recurrently in a broad range of cancer types. Comparing RNA-seq data from 4,055 tumors and 563 normal tissues profiled in The Cancer Genome Atlas and FANTOM5 datasets, we identified a core transcript set with theranostic potential. Our analyses also revealed enhancer RNAs, which are upregulated in cancer, defining promoters that overlap with repetitive elements (especially SINE/Alu and LTR/ERV1 elements) that are often upregulated in cancer. Lastly, we documented for the first time upregulation of multiple copies of the REP522 interspersed repeat in cancer. Overall, our genome-wide expression profiling approach identified a comprehensive set of candidate biomarkers with pan-cancer potential, and extended the perspective and pathogenic significance of repetitive elements that are frequently activated during cancer progression. Cancer Res; 76(2); 1–11. ©2015 AACR.

Introduction

Successful cancer treatment depends heavily on early detection and diagnosis. Despite decades of research, relatively few biomarkers are routinely used in clinics (e.g., CA-125 and PSA in ovarian and prostate cancers, respectively; refs. 1, 2). There is a need for reliable and clinically applicable new cancer biomarkers for early detection. Cancers originating in the same tissue can be very heterogeneous, often being derived from different cell types and having drastically different mutation profiles (3). At the same time, cancers from different tissues can share some common features; for example, The Cancer Genome Atlas (TCGA) has found genes and pathways, DNA copy number alterations, mutations, methylation, and transcriptome changes that recur across 12 different primary tumor types (4).

Here, using Cap Analysis of Gene Expression (CAGE) data collected for the Functional ANnoTation Of Mammalian genome (FANTOM5) project (5), we identified mRNAs, long-noncoding RNAs (lncRNA), enhancer RNAs (eRNA), and RNAs initiating from within repeat elements, which are recurrently perturbed in cancer cell lines. To confirm that these transcripts are relevant to tumors, we compared their expression in 4,055 primary tumors and 563 matching tissue sets RNA-seq profiled by the TCGA (6) and in a set of colorectal tumor (7) samples profiled proteomically. Finally, for the most promising biomarker candidates we performed qRT-PCR validations in cancer cell lines and tumor cDNA panels. Taken together, our analyses allowed for identification of a set of robust pan-cancer biomarker candidates, which have the potential for development as blood biomarkers for early detection and for histological screening of biopsies.

This work is part of the FANTOM5 project. Data download, genomic tools, and copublished manuscripts have been summarized at the FANTOM5 website (8).

Materials and Methods

FANTOM5 data

We used the cap analysis of gene expression (CAGE) data from the FANTOM5 project (libraries sequenced to a median depth of 4 million mapped tags; ref. 5). We used 564 CAGE profiles: 225 cancer cell lines and 339 primary cell samples. We split the data into three data sets: (i) matched solid, (ii) unmatched solid, and
Figure 1. Summary of comparisons carried out to identify recurrently perturbed transcripts in the FANTOM5 cell line dataset. A, differential expression (DE) pipeline applied to the FANTOM5 data. B, examples of differentially expressed promoters showing expression switching (ON and OFF) and expression shift (UP and DOWN). C, comparison between promoter and gene level differential expression (based on CAGE data). Note: Although the majority of differentially expressed promoters reflect gene-wise differential expression, a significant fraction behave differently, for example, MPP2 or BCAT1. D, table summarizing the number of promoters and genes showing differential expression. Numbers in parentheses indicate numbers of unique genes.

[Figure 1 image not reproduced. Panel A labels: FANTOM5 data, 225 cancer cell lines and 339 primary cells, split into solid tumors with 10 matched origins (72 cancer cell lines, 65 primary cells), solid cancers with unmatched origins (102 cancer cell lines, 200 primary cells), and blood cancers with 2 matched origins (51 cancer cell lines, 74 primary cells); each set enters a DE pipeline (edgeR and ON/OFF analysis); output labels are "solid cancer only differential expression", "overlapping features", and "pan-cancer differential expression". Panel B examples: upregulated, p1@TERT and p1@POLQ; downregulated, p1@NAALADL1 and p1@C13orf15; samples are grouped as blood matched (lymphoid, myeloid) and solid matched (bone, brain, kidney, liver, lung, breast, prostate, ovary, melanocytes, mesothelium). Panel C axes: promoter-level log2 FC and gene-level log2 FC.]

Cancer Res; 76(2) January 15, 2016
(iii) matched blood (Supplementary Table S1A and S1C for list of cancer types and sample annotation). The CAGE tag counts under 184,827 robust decomposition-based peak identification (DPI) clusters (5) were used to represent promoter-level expression. For the enhancer activity, we used the CAGE tag counts under 43,011 enhancer regions identified in ref. 9.

Table 1. The numbers of differentially expressed promoters (DPI clusters) and the type of genomic region they overlap, based on Gencode 19

                          Upregulated               Downregulated
Type of genomic region    Pan cancer   Solid only   Pan cancer   Solid only   Total   %
Protein coding            434          455          92           354          1,335   63
LincRNA                   45           38           2            7            92      4
Antisense                 37           28           0            4            69      3
Pseudogene                12           9            2            4            27      1
Other ncRNAs              20           33           0            5            58      3
Unannotated               233          251          3            40           527     25
Total                     781          814          99           414          2,108   100

upper quartile normalized RSEM count estimates with expression profiles of 20,531 genes in 4,618 samples. The counts were log2 transformed and used as input expression data for LIMMA. The cancer versus normal comparison was performed using equal weight for each solid cancer type, each type contributing equally to the overall comparison. The P-values were adjusted for multiple testing by the Benjamini–Hochberg method. The thresholds of fold change >2 and FDR <0.01 were used.

Enrichment for cancer-related genes

We tested for the enrichment by applying a hypergeometric test, using the significance threshold of P < 0.05. The list of

FANTOM5 differential expression analysis

To identify up- and downregulated transcripts in cancer cell lines versus normal primary cells, we used Genewise Negative Binomial Generalized Linear Models as implemented in edgeR (10). The cancer versus normal comparison was performed using the glmLRT function. In the matched solid comparison, we set equal weight for each solid cancer type, each type contributing