Published OnlineFirst November 9, 2015; DOI: 10.1158/0008-5472.CAN-15-0484
Cancer Research: Integrated Systems and Technologies

Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers

Bogumil Kaczkowski1, Yuji Tanaka1,2, Hideya Kawaji1,2,3, Albin Sandelin4, Robin Andersson4, Masayoshi Itoh1,3, Timo Lassmann1,5, the FANTOM5 consortium, Yoshihide Hayashizaki3, Piero Carninci1, and Alistair R.R. Forrest1,6

Abstract

Genes that are commonly deregulated in cancer are clinically attractive as candidate pan-diagnostic markers and therapeutic targets. To globally identify such targets, we compared Cap Analysis of Gene Expression profiles from 225 different cancer cell lines and 339 corresponding primary cell samples to identify transcripts that are deregulated recurrently in a broad range of cancer types. By comparing these FANTOM5 datasets with RNA-seq data from 4,055 tumors and 563 normal tissues profiled by The Cancer Genome Atlas, we identified a core transcript set with theranostic potential. Our analyses also revealed enhancer RNAs that are upregulated in cancer, and defined promoters overlapping repetitive elements (especially SINE/Alu and LTR/ERV1 elements) that are often upregulated in cancer. Lastly, we documented for the first time the upregulation of multiple copies of the REP522 interspersed repeat in cancer. Overall, our genome-wide expression profiling approach identified a comprehensive set of candidate biomarkers with pan-cancer potential, and extended the perspective and pathogenic significance of repetitive elements that are frequently activated during cancer progression. Cancer Res; 76(2); 1–11. ©2015 AACR.

1RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, Japan. 2RIKEN Advanced Center for Computing and Communication, Preventive Medicine and Applied Genomics unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Japan. 3RIKEN Preventive Medicine & Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan. 4The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark. 5Telethon Kids Institute, the University of Western Australia, Perth, Western Australia, Australia. 6Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands, Western Australia, Australia.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Corresponding Authors: Bogumil Kaczkowski, RIKEN Center for Life Science Technologies (CLST), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan. Phone: 81-45-503-9222; Fax: 81-45-503-9216; E-mail: [email protected]; and Alistair R.R. Forrest, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, 6 Verdun Street, Nedlands, WA 6009, Australia. Phone: 61-8-6151-0780; Fax: 61-8-6151-0701; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-15-0484

©2015 American Association for Cancer Research.

Introduction

Successful cancer treatment depends heavily on early detection and diagnosis. Despite decades of research, relatively few biomarkers are routinely used in clinics (e.g., CA-125 and PSA in ovarian and prostate cancers, respectively; refs. 1, 2). There is a need for reliable and clinically applicable new cancer biomarkers for early detection. Cancers originating in the same tissue can be very heterogeneous, often being derived from different cell types and having drastically different mutation profiles (3). At the same time, cancers from different tissues can share common features; for example, The Cancer Genome Atlas (TCGA) has found genes and pathways, DNA copy number alterations, mutations, methylation, and transcriptome changes that recur across 12 different primary tumor types (4).

Here, using Cap Analysis of Gene Expression (CAGE) data collected for the Functional ANnoTation Of Mammalian genome (FANTOM5) project (5), we identified mRNAs, long noncoding RNAs (lncRNA), enhancer RNAs (eRNA), and RNAs initiating from within repeat elements that are recurrently perturbed in cancer cell lines. To confirm that these transcripts are relevant to tumors, we compared their expression in 4,055 primary tumors and 563 matching normal tissues profiled by RNA-seq in TCGA (6), and in a set of colorectal tumor samples profiled proteomically (7). Finally, for the most promising biomarker candidates, we performed qRT-PCR validations in cancer cell lines and tumor cDNA panels. Taken together, our analyses identified a set of robust pan-cancer biomarker candidates that have the potential for development as blood biomarkers for early detection and for histological screening of biopsies.

This work is part of the FANTOM5 project. Data downloads, genomic tools, and copublished manuscripts have been summarized at the FANTOM5 website (8).

Materials and Methods

FANTOM5 data

We used the CAGE data from the FANTOM5 project (libraries sequenced to a median depth of 4 million mapped tags; ref. 5). We used 564 CAGE profiles: 225 cancer cell lines and 339 primary cell samples. We split the data into three datasets: (i) matched solid, (ii) unmatched solid, and

www.aacrjournals.org OF1

Downloaded from cancerres.aacrjournals.org on September 27, 2021. © 2015 American Association for Cancer Research. Published OnlineFirst November 9, 2015; DOI: 10.1158/0008-5472.CAN-15-0484

Kaczkowski et al.

[Figure 1 graphic. A, FANTOM5 data (225 cancer cell lines, 339 primary cells) split into 10 matched-origin solid tumors (72 cancer cell lines, 65 primary cells), unmatched-origin solid cancers (102 cancer cell lines, 200 primary cells), and 2 matched-origin blood cancers (51 cancer cell lines, 74 primary cells); each set passes through the edgeR and ON/OFF differential expression (DE) pipeline, and overlapping features define solid-cancer-only and pan-cancer differential expression. B, expression switching (ON/OFF) and expression shift (UP/DOWN) across cell origins; upregulated examples p1@TERT and p1@POLQ, downregulated examples p1@NAALADL1 and p1@C13orf15. C, promoter-level versus gene-level log2 fold change. D, counts of differentially expressed promoters and genes for the blood and solid matched comparisons.]

Figure 1. Summary of comparisons carried out to identify recurrently perturbed transcripts in the FANTOM5 cell line dataset. A, differential expression (DE) pipeline applied to the FANTOM5 data. B, examples of differentially expressed promoters showing expression switching (ON and OFF) and expression shift (UP and DOWN). C, comparison between promoter-level and gene-level differential expression (based on CAGE data). Note: Although the majority of differentially expressed promoters reflect gene-wise differential expression, a significant fraction behave differently, for example, MPP2 or BCAT1. D, table summarizing the number of promoters and genes showing differential expression. Numbers in parentheses indicate numbers of unique genes.

OF2 Cancer Res; 76(2) January 15, 2016 Cancer Research

Pan-Cancer Transcriptome

Table 1. The numbers of differentially expressed promoters (DPI clusters) and the type of genomic region they overlap, based on GENCODE 19

                          Upregulated               Downregulated
Type of genomic region    Pan cancer   Solid only   Pan cancer   Solid only   Total    %
Protein coding            434          455          92           354          1,335    63
LincRNA                   45           38           2            7            92       4
Antisense                 37           28           0            4            69       3
Pseudogene                12           9            2            4            27       1
Other ncRNAs              20           33           0            5            58       3
Unannotated               233          251          3            40           527      25
Total                     781          814          99           414          2,108    100

(iii) matched blood (see Supplementary Tables S1A and S1C for the list of cancer types and sample annotation). The CAGE tag counts under 184,827 robust decomposition-based peak identification (DPI) clusters (5) were used to represent promoter-level expression. For enhancer activity, we used the CAGE tag counts under the 43,011 enhancer regions identified in ref. 9.

FANTOM5 differential expression analysis

To identify up- and downregulated transcripts in cancer cell lines versus normal primary cells, we used genewise negative binomial generalized linear models as implemented in edgeR (10). The cancer versus normal comparison was performed using the glmLRT function. In the matched solid comparison, we gave equal weight to each solid cancer type, so that each type contributed equally to the overall comparison. In the unmatched solid and matched blood datasets, a simple cancer versus normal comparison was performed. The P values were adjusted for multiple testing by the Benjamini–Hochberg method. Thresholds of fold change >4 and FDR <0.01 were used.

ON/OFF analysis

For each feature, we determined the expression status in binary fashion: ON (expressed, count > 0) or OFF (not detected, count = 0). We then calculated the frequency of expression in cancer and normal samples. Features expressed four times more frequently in cancer than in normal samples were selected as "ON in cancer," whereas features not expressed (lost) four times more often in cancer than in normal samples were selected as "OFF in cancer." The procedure was applied to each dataset (matched solid, unmatched solid, and matched blood). The significance of the association between ON/OFF status and cancer/normal status was tested by a two-sided Fisher exact test with adjustment for multiple testing by the Benjamini–Hochberg method, using a threshold of FDR < 0.01. The differential expression pipeline described above was applied separately to the DPI/promoter counts and to the enhancer counts. Features found differentially expressed in all three datasets were selected as "pan-cancer" features, whereas features differentially expressed in the matched and unmatched solid datasets only were selected as "solid-only" cancer features.

TCGA RNA-seq data

We obtained the RNA-seq profiling data of 4,055 cancer samples and 563 normal tissues from The Cancer Genome Atlas (TCGA) Data Portal (data status as of Aug 5, 2013; origin listed in Supplementary Table S1B; ref. 6). The profiles represented 14 solid cancer types for which both tumor and normal tissue samples were available. We downloaded level 3 RNASeqV2 upper-quartile-normalized RSEM count estimates, giving expression profiles of 20,531 genes in 4,618 samples. The counts were log2 transformed and used as input expression data to LIMMA. The cancer versus normal comparison was performed with equal weight for each solid cancer type, each type contributing equally to the overall comparison. The P values were adjusted for multiple testing by the Benjamini–Hochberg method. Thresholds of fold change >2 and FDR <0.01 were used.

Enrichment for cancer-related genes

We tested for enrichment by applying a hypergeometric test with a significance threshold of P < 0.05. The list of oncogenes was the union of oncogenes listed in the MSigDB (11) and UniProt (12) databases. For tumor suppressors, we considered genes listed in at least two of three sources: MSigDB (11), UniProt (12), and the TSGene (13) tumor suppressor list. For the list of genes frequently mutated in cancer, we used the high-confidence drivers mutated across 12 cancer types from Tamborero and colleagues (14). We also tested for enrichment of cancer-related genes listed in the COSMIC Cancer Gene Census (15).

Chromatin Interaction Analysis Paired-End Tag enhancer–promoter pairs

We obtained the Chromatin Interaction Analysis Paired-End Tag (ChIA-PET) data from the ENCODE/GIS-Ruan project (GSE39495, April 21, 2014). Data files from 15 experiments covered five cell lines (Hct-116, Helas3, K562, Mcf7, and Nb4) and three transcription factors (Pol2, Ctcf, and ERalpha). We merged the interactions from all experiments. We then extracted the ChIA-PET interaction clusters overlapping the genomic locations of enhancers and checked whether the linked genomic locations overlapped promoters of annotated genes.

Quantitative PCR for cell line samples and human cancer/normal tissue cDNA

Five hundred nanograms of total RNA from K562, HepG2, MCF7, and HDF was reverse-transcribed using oligo dT primer, and the product was then diluted 12.5 times with DNA/RNA-free water. Primers for real-time PCR were designed with the Primer3 web tool (Supplementary Table S12). The housekeeping gene ACTB was used to normalize the expression levels. Quantitative PCR (qPCR) was carried out on an ABI 7500 Fast Real-Time PCR System using Power SYBR Green PCR Master Mix. For validation in tumor samples, we performed qPCR reactions on the TissueScan Cancer Survey Panel 96 – I cDNA panel (CSRT501, OriGene, MD). Each reaction was run in triplicate for cell line samples, and as a singlet for the human cancer/normal tissue cDNA panel.
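The ON/OFF analysis combines a per-sample binary call (ON if count > 0), a two-sided Fisher exact test on the cancer/normal by ON/OFF 2 × 2 table, Benjamini–Hochberg adjustment, and the fourfold frequency filter. The Python sketch below illustrates these steps under stated assumptions: the input is a hypothetical dict mapping feature IDs to raw per-sample tag counts, "four times more frequently" is read as a frequency ratio of at least 4, and both statistical routines are textbook re-implementations using only the standard library, not the authors' actual pipeline.

```python
"""Sketch of a binary ON/OFF differential-expression test.

Assumptions (not taken from the paper's code): inputs map a hypothetical
feature ID to raw per-sample tag counts; ON means count > 0; the 4x rule
is read as "ratio >= 4"; Fisher and Benjamini-Hochberg are re-implemented.
"""
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer, normal; columns: ON, OFF. Sums the hypergeometric
    probabilities of all tables with the same margins that are no more
    likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so that ties are not lost to rounding error.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def on_off_calls(cancer, normal, ratio=4.0, fdr=0.01):
    """Return {feature: "ON" | "OFF"} for features passing both filters.

    cancer, normal -- dict mapping feature ID to a list of per-sample counts
    """
    rows = []
    for feat in cancer:
        c_on = sum(1 for x in cancer[feat] if x > 0)
        n_on = sum(1 for x in normal[feat] if x > 0)
        c_n, n_n = len(cancer[feat]), len(normal[feat])
        p = fisher_two_sided(c_on, c_n - c_on, n_on, n_n - n_on)
        rows.append((feat, c_on / c_n, n_on / n_n, p))

    qvals = benjamini_hochberg([r[3] for r in rows])
    calls = {}
    for (feat, c_freq, n_freq, _), q in zip(rows, qvals):
        if q >= fdr:
            continue
        if c_freq > 0 and c_freq >= ratio * n_freq:
            calls[feat] = "ON"    # expressed >= 4x more often in cancer
        elif c_freq < 1 and (1 - c_freq) >= ratio * (1 - n_freq):
            calls[feat] = "OFF"   # not detected >= 4x more often in cancer
    return calls


if __name__ == "__main__":
    cancer = {"up": [3] * 9 + [0], "flat": [1] * 5 + [0] * 5, "down": [2] + [0] * 9}
    normal = {"up": [1] + [0] * 9, "flat": [1] * 5 + [0] * 5, "down": [4] * 10}
    print(on_off_calls(cancer, normal))  # {'up': 'ON', 'down': 'OFF'}
```

In a full analysis, these per-dataset calls would then be merged with the edgeR calls and intersected across the three datasets to separate pan-cancer from solid-only features. Where SciPy and statsmodels are available, `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` could replace the hand-written routines.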

[Figure 2 graphic: heat maps of cancer versus normal log2 fold change (color scale from −10 to 10) for the pan-cancer biomarker candidates, with FANTOM5 CAGE panels (Blood, Bone, Brain, Breast, Kidney, Liver, Lung, Melanocyte, Mesothelium, Ovary, Prostate) and TCGA RNA-seq panels (BLCA, BRCA, COAD, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, READ, THCA, UCEC); rows are gene symbols such as PRAME, TERT, TOP2A, and GPX3.]

Figure 2. Pan-cancer biomarker candidates. Genes aberrantly expressed in both FANTOM5 cancer cell lines (>4-fold change or four times expression gain/loss, FDR < 0.01) and in the TCGA primary tumors (fold change > 2, FDR < 0.01). The heat maps show the expression fold changes from cancer versus normal comparisons. The right panels represent 10 solid tumor origins and blood cancer based on FANTOM5 cancer cell lines. The left panels represent 14 tumor types from TCGA. The TCGA tumor type abbreviations are BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; COAD, colon adenocarcinoma; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; THCA, thyroid carcinoma; UCEC, uterine corpus endometrial carcinoma.

Results

Identification of transcripts recurrently up- or downregulated in cancer cell lines

Using CAGE data collected for the FANTOM5 project (5, 9), we compared expression levels of transcripts from 184,827 promoter and 43,011 enhancer regions between a panel of 225 cancer cell lines and a panel of 339 primary cell samples (sample IDs and their annotation are listed in Supplementary Table S1C).

First, the cancer cell line and primary cell datasets were divided into three subsets (see Supplementary Table S1A): cell lines and primary cells from solid tissues or blood lineages that could be matched are referred to as matched-solid or matched-blood, and the remaining samples from solid tissues are referred to as unmatched-solid.

In each subset, we identified promoters that were differentially expressed between cancer and normal (edgeR; ref. 10; >4-fold change, FDR < 0.01). We also performed an alternative binary analysis (which we refer to as the ON/OFF analysis) to identify transcripts that were consistently switched on or switched off in cancer, that is, expressed (switched ON) or not detected (switched OFF) four times more often in the cancer group than in the normal group, at a significance level of FDR < 0.01 by Fisher exact test (examples in Fig. 1B). The results of the ON/OFF and edgeR analyses were then merged to obtain a final selection of up- and downregulated promoters (Fig. 1A and Supplementary Table S2).

In total, 2,108 promoters were differentially expressed in cancer cell lines. Seven hundred and eighty-one were consistently upregulated in all three comparisons, and a further 814 were upregulated only in solid cancers. Conversely, 99 were consistently downregulated in all three datasets, and a further 414 were downregulated only in solid cancers (Table 1). Sixty-three percent of the differentially expressed peaks overlapped protein-coding genes, 12% overlapped long noncoding genes (GENCODE v19; ref. 16), and 25% were not associated with any known gene (Supplementary Table S3).

In some cases, the CAGE analysis identified alternative promoters. Comparing gene-wise differential expression (total CAGE signal for the same gene) with the differential expression of individual promoters (Fig. 1C), we found that for 23% of differentially expressed protein-coding genes at least one alternative promoter behaved differently from the gene as a whole, whereas for lncRNAs (which have fewer alternative promoters) this was the case for only 5% (Fig. 1D).

Differentially expressed protein coding genes are enriched in cancer-associated genes

Focusing on CAGE peaks unambiguously at the 5′ end of protein-coding genes (within 500 bp of the 5′ end of annotated

[Figure 3 graphic: ZENBU genome browser views (hg19) with tracks for Entrez genes, GENCODE v19 transcripts, FANTOM5 CAGE Phase 1 CTSS, UCSC RepeatMasker repeats (2011-02-02), MiTranscriptome transcript assembly, FANTOM5 permissive enhancers (Andersson and colleagues), and ENCODE ChIA-PET ditags. A, RP11-124N14.3 (ENST00000456355.1) is downregulated in cancer and antisense to VIM (EMT marker); downregulation confirmed by qPCR. B, REP522-initiated bidirectional transcription of CCDC144NL and CCDC144NL-AS1 (nearby: ABHD17AP6, RNASEH1P1); both upregulated, confirmed by qPCR. C, REP522-initiated bidirectional transcription of BRCAT95 and T091486 near TNFRSF19 and MIPEP; one transcript upregulated and confirmed by qPCR, no qPCR signal for the other. D, enhancer chr1:46575103-46575175 with a ChIA-PET chromatin interaction to the promoter of PIK3R3 (view chr1:46482587..46621931, + strand, 139.3 kb; genes MAST2, LOC100133124, LOC100132269, PIK3R3).]

transcripts or located in 5′ UTRs), we identified 911 promoters corresponding to 656 unique genes that were differentially expressed: 435 upregulated and 221 downregulated (Supplementary Table S9). The gene set was significantly enriched for oncogenes (hypergeometric test P = 7.5e−05, 33 genes), tumor suppressors (P = 0.0043, 13 genes), genes frequently mutated in cancer (P = 0.034, 18 genes; ref. 14), and genes listed in the Cancer Gene Census (P = 0.01, 28 genes; see Supplementary Table S3F; ref. 15). Interestingly, eight oncogenes were downregulated and five tumor suppressors were upregulated, that is, they changed in the direction opposite to the one expected (Supplementary Fig. S1A–S1C). This may be caused by regulatory feedback loops responding to neoplastic changes.

We next performed an analogous cancer versus normal analysis on RNA-seq data from 14 tumor–normal pairs (4,055 primary cancer samples and 563 normal tissue samples; Supplementary Table S1B) from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/). The fold changes observed in the TCGA analysis were considerably weaker than those seen in the FANTOM5 analysis (Fig. 2), presumably because the mixture of cells in a tumor dilutes the cancer cell signal. To recover similar numbers of genes from the TCGA and FANTOM5 analyses, we therefore applied a weaker threshold (abs FC > 2, FDR < 0.01) and identified 490 upregulated genes and 1,661 downregulated genes (Supplementary Table S4). The upregulated genes were enriched for those listed in the Cancer Gene Census (hypergeometric test P = 0.03, 18 genes). Of particular note, many more genes were downregulated in the tumor–normal comparison than in the cancer cell line–primary cell comparison.

Potential pan-cancer biomarkers

We found that 76 (17%) of the upregulated genes identified in the cancer cell line analysis were also upregulated in primary tumors from TCGA (Fig. 2). Among them are oncogenes (HOXC13, MYEOV, MNX1, and CASC5), cancer antigens (PRAME, CD70, CASC5, and IGF2BP3) and, somewhat unexpectedly, tumor suppressors (TP73, BLM, and BUB1B). The upregulated genes were also enriched for genes involved in the cell cycle, DNA metabolism, and biopolymer metabolism, and for homeobox genes involved in development. They included well-known pan-cancer genes such as TERT, PRAME, and TOP2A (17, 18), MYEOV and MNX1, which are implicated in blood malignancies (19, 20), and FAM111B, implicated in prostate cancer (21).

For the downregulated genes, 52 (19%) genes from the FANTOM5 cancer cell line analysis were also downregulated in primary tumors (Fig. 2). Interestingly, the list was enriched for genes related to oxidoreductase activity (five genes: AOX1, PTGS1, ACOX2, COX7A1, and the tumor suppressor GPX3; ref. 22). Because the downregulation is seen in both cancer cell lines and primary tumors, we infer that the changes are caused by a permanent reprogramming of metabolism in cancer cells rather than by a response to the tumor microenvironment or to cell culture conditions. We also observed seven discrepancies: CDKN2A, COL1A1, COL5A2, GJB2, HIST1H2BH, MMP9, and TNFRSF6B were downregulated in the cancer cell lines but upregulated in the primary tumor analysis.

Finally, we used recent proteome data from 90 colorectal cancers and 30 normal tissues published by Zhang and colleagues (7). Spectral count data were available for 239 of our 656 differentially expressed genes. Twenty mRNAs/proteins were upregulated in both the cancer cell lines (CAGE) and colorectal tumors (mass spectrometry data), whereas 16 were upregulated in both the RNA-seq and mass spectrometry data (Supplementary Fig. S2A and Supplementary Table S9C). Notably, four genes were robustly upregulated in all three comparisons: MCM2, TOP2A, ASNS, and MKI67.

There were 108 genes that were downregulated at the protein level and in at least one transcriptome analysis (CAGE or RNA-seq). Strikingly, the top 10 enriched terms within those genes were all related to metabolic processes, either oxidative processes or lipid metabolism (Supplementary Fig. S2B), confirming the metabolic pathway changes that we observed in the RNA data.

Pan-cancer long-noncoding RNAs

From the cancer cell line analysis, we identified 271 differentially expressed lncRNAs (181 lncRNAs annotated in GENCODE 19, plus a further 90 with the miTranscriptome annotation; ref. 23). The majority (247 lncRNAs) were upregulated, whereas 24 were downregulated (Supplementary Table S10A). In total, 39 and five of these were up- and downregulated, respectively, in both the cancer cell line analysis and at least one tumor type in the miTranscriptome study (23). Of those, 21 were consistently upregulated and two consistently downregulated in cancer cell lines and at least two tumor types (Supplementary Table S10B).

For two of these lncRNAs (ENST00000448869 and FOXP4-AS1), we performed qRT-PCR validation in cancer cell lines versus primary cells and also in a cDNA panel covering eight tumor types and matching normal tissues. In both cases, the targets were highly significantly upregulated in both cancer cell lines and tumors (Fig. 4).

We also compared our list of pan-cancer lncRNAs with the 229 "onco-lncRNAs" identified by Cabanski and colleagues (24), which allowed us to confirm three additional upregulated lncRNAs (Supplementary Table S10B and S10D).

Our analysis of preprocessed TCGA RNA-seq data also allowed us to confirm the deregulation of four lncRNAs: two already confirmed by the miTranscriptome study and by Cabanski and colleagues (MEG3 and DGCR5), and two that were missed by other reports, namely the downregulation of the MT1L pseudogene and, most notably, the upregulation of PVT1, a well-known lncRNA oncogene (25).

Figure 3. Genomic neighborhood of pan-cancer–associated noncoding transcripts. A, the RP11-124N14.3 lncRNA is consistently downregulated in cancer cell lines and is positioned antisense to VIM (an EMT marker). The downregulation was confirmed by qPCR (Fig. 4A) and in the miTranscriptome data (Supplementary Table S10A). B, the REP522 repeat becomes activated in cancer, giving rise to bidirectional transcription. The example shows the promoters of the protein-coding CCDC144NL and its antisense CCDC144NL-AS1 overlapping a REP522 element. The upregulation of both genes was validated in cancer cell lines by qPCR (Fig. 4A). C, in another example, BRCAT95, which is upregulated in breast cancer (Supplementary Table S10A), is initiated from a REP522 element, and its upregulation was confirmed by qPCR. The other transcript could not be confirmed by qPCR, likely because of its low expression level and/or incorrect transcript assembly. Interestingly, the promoter pair overlaps a FANTOM5-defined enhancer region. D, ChIA-PET data show that the pan-cancer enhancer chr1:46575103-46575175 is physically associated with the promoter of the PIK3R3 gene, which is reported to increase tumor migration and invasion when overexpressed in colorectal cancer (29) and has been identified as a potential therapeutic target in ovarian cancer (41). The visualizations were performed in the ZENBU genome browser (42).
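The enhancer–promoter linking described in Materials and Methods (and shown for PIK3R3 in panel D) amounts to an interval join: a ChIA-PET interaction links an enhancer to a gene when one anchor overlaps the enhancer and the other anchor overlaps the gene's promoter. The sketch below assumes 0-based, half-open coordinates and uses hypothetical enhancer, promoter, and anchor names and coordinates. It is a naive nested-loop version; genome-scale data would normally be handled with an interval tree or a tool such as bedtools.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def link_enhancers(enhancers, promoters, interactions):
    """Return the set of (enhancer, gene) pairs connected by a ChIA-PET interaction.

    enhancers, promoters -- dict mapping a name to an Interval
    interactions         -- list of (anchor_a, anchor_b) Interval pairs
    """
    links = set()
    for anchor_a, anchor_b in interactions:
        # An interaction has no direction, so test both anchor orientations.
        for enh_side, prom_side in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            for enh_name, enh in enhancers.items():
                if not overlaps(enh_side, enh):
                    continue
                for gene, prom in promoters.items():
                    if overlaps(prom_side, prom):
                        links.add((enh_name, gene))
    return links


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    enhancers = {"enh_A": Interval("chr1", 1_000, 1_100)}
    promoters = {
        "GENE_X": Interval("chr1", 5_000, 5_500),
        "GENE_Y": Interval("chr1", 9_000, 9_500),
    }
    interactions = [(Interval("chr1", 950, 1_200), Interval("chr1", 4_900, 5_100))]
    print(link_enhancers(enhancers, promoters, interactions))  # {('enh_A', 'GENE_X')}
```

Testing both anchor orientations matters because ChIA-PET pairs carry no inherent enhancer-to-promoter direction.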

Deregulated long-noncoding RNAs located near cancer-related genes

We next looked at the genomic neighborhood of the differentially expressed lncRNAs. For 27 of the 181 (GENCODE 19) differentially expressed lncRNAs, we found 33 cancer-related genes within 100 kb (Table 2; example in Supplementary Fig. S3). For example, PVT1 neighbors MYC, and the two are consistently cogained in cancer. We also observed RP11-1070N10.5 neighboring the TCL6 (lincRNA), TCL1A, and TCL1B oncogenes (located in a breakpoint cluster region on 14q32 in T-cell leukemia; ref. 26), and HOXA11-AS neighboring HOXA13 and HOXA9 and overlapping the HOXA11 oncogene. Notably, five of the six cancer-related genes located within 1 kb of upregulated lncRNAs were also upregulated; these include the MCF2L, GATA2, and MNX1 oncogenes and the BSG and CSAG1 cancer antigens (Table 2). Possibly linked to cancer metabolism, the upregulated PCAT7 is located antisense to fructose-1,6-bisphosphatase-2 (FBP2; Supplementary Fig. S3), whose decreased expression promotes glycolysis and growth in gastric cancer cells (27).

Activation of repeat elements in cancer

Globally, about 20% of all FANTOM5 promoters initiate from within repetitive elements and low-complexity DNA sequences annotated by RepeatMasker. We observed a simple relationship for promoters that overlapped a repetitive element: the higher the fold change (upregulation in cancer), the higher the probability that the promoter overlapped a repetitive element (Supplementary Fig. S5; see Supplementary Table S13 for the promoter–repeat associations). A more detailed analysis revealed that the upregulated promoters are enriched in retrotransposons (SINE/Alu, LINE/L1, LTR/ERV1, and LTR/ERVL). The SINE/Alu- and LINE/L1-overlapping promoters tended to be located in intronic regions of protein-coding genes (49% and 32%, respectively) and were not associated with known RNA transcripts, whereas the upregulated promoters overlapping LTR/ERV1 often initiated the expression of lncRNAs (31 GENCODE lncRNAs and 48 miTranscriptome lncRNAs; Table 3).

In contrast, the majority of promoters overlapping simple repeats and low-complexity sequences were associated with protein-coding transcripts. Simple repeats were enriched among upregulated promoters, whereas low-complexity sequences were enriched among downregulated promoters (Table 3).

Bidirectional transcription from REP522 satellite repeat is activated in cancer

Interestingly, a specific repeat family, REP522, was strongly enriched in the most upregulated promoters. REP522, originally called a telomeric satellite, is a largely palindromic, unclassified interspersed repeat 1.8 kb in size (28). Of the 72 promoters overlapping REP522, 25 were upregulated in cancer (odds ratio, 62.05). Twenty of these 25 promoters were associated with a known transcript (five pseudogenes, nine lncRNAs, and one protein-coding gene), including the pseudogene BAGE2 (B melanoma antigen family, member 2) and the lncRNAs PCAT7 and BRCAT95, which were previously implicated in cancer (23). In most cases, transcription is initiated bidirectionally, and in five cases it overlaps regions previously annotated as enhancers. To show that the observed activation of REP522 elements was not due to a mapping artifact, we performed qPCR validation for 11 upregulated, REP522-initiated transcripts from different genomic regions in three cancer cell lines and in dermal fibroblast cells as a control. For eight of these, we confirmed upregulation in the cancer cell lines compared with normal fibroblasts (Fig. 4A). In one case, we confirmed the bidirectional transcription of CCDC144NL and CCDC144NL-AS1 from one REP522 element (Fig. 3B). The three transcripts for which qPCR validation did not yield any signal were very lowly expressed, novel, computationally assembled transcripts from miTranscriptome, suggesting that they were either expressed below the detection limit or not correctly assembled (Fig. 3C). To our knowledge, this is the first report implicating REP522 activation in cancer.

Enhancer activation in cancer

Taking advantage of the fact that CAGE data can be used to estimate the activity of enhancers from balanced bidirectional capped transcription (9), we performed differential expression analysis based on CAGE tag counts under 43,011 CAGE-defined enhancers (9), using the same differential expression pipeline as for the promoter regions. We found 28 pan-cancer enhancers upregulated in solid and blood cancers and a further 62 upregulated in solid cancers only (Supplementary Table S5). Enhancers tend to be highly cell-type specific (9); accordingly, we found no broadly downregulated enhancers in cancer.

We found that 23 of the 90 upregulated enhancers could be associated with a miTranscriptome transcript (5′ end within 500 bp of the enhancer; Supplementary Table S11A) and that four of those transcripts were reported to be upregulated in at least one cancer type (Supplementary Table S11B).

We next used Chromatin Interaction Analysis Paired-End Tag (ChIA-PET) data from the ENCODE project to associate these pan-cancer enhancers with their target genes. We found that 55 of the 90 upregulated enhancers can be physically linked to promoters of known genes (228 unique enhancer–gene links; Supplementary Table S6). Seventeen of the enhancers were linked to cancer-related genes, including seven oncogenes (BCL9, CREB1, ZNF384, SALL4, TFRC, BTG1, and the oncomir MIR21), two tumor suppressors (ING4 and KCTD11), and five Mut-Drivers (PIK3CB, CLIP1, KIFC3, GPS2, and CARM1; Supplementary Table S6). In addition, eight of the upregulated enhancers were linked to promoters found to be upregulated in cancer cell lines, including cancer-linked genes such as TNFSF12 and PIK3R3 (Fig. 3D; ref. 29).

Discussion

By using both the FANTOM5 CAGE expression data from cancer cell lines and primary cells and the TCGA RNA-seq and proteome datasets from tumors and normal tissues, we have built an overview of recurrent expression changes in cancer.

These datasets have their own strengths and weaknesses. In the TCGA analysis, both tumors and normal tissues are complex mixtures of cell types (cancer cells, infiltrating lymphocytes, stroma, and blood vessels), which complicates the interpretation of differential expression between normal and cancer: differences in gene expression may simply reflect differences in cell composition. To minimize this issue, TCGA (3) required that profiled tumor samples contain at least 60% tumor cells and less than 20% necrosis. The FANTOM5 cell line and primary cell data avoid this complication, as relatively homogeneous, pure cell


[Figure 4 panels. B: log2 fold change (ΔΔCt) relative to the mean of normal samples, Normal vs. Tumor, for BLM (mean normal Ct = 35.7; log2FC = 2.6, P = 0.0000002), PRAME (mean normal Ct = 33.2; log2FC = 1.3, P = 0.12), ENST00000448869 (mean normal Ct = 33.3; log2FC = 2.6, P = 0.0001), and FOXP4-AS1 (mean normal Ct = 31.6; log2FC = 2.2, P = 0.00000007). C: ENST00000448869, Normal vs. Tumor, in breast, colon, kidney, thyroid, liver, lung, ovarian, and prostate samples. Legend: statistical significance (two-sided t test), * P < 0.05, ** P < 0.01, *** P < 0.001; • qPCR validation in cDNA tumor panel; ˚ bidirectional transcription from the same REP522 element.]

Figure 4. Validation of pan-cancer biomarkers by qRT-PCR. A, summary of the qRT-PCR validations in three cancer cell lines, with dermal fibroblasts as the normal reference. The table shows significant upregulation of eight REP522-associated transcripts and six potential biomarkers, and downregulation of three lncRNAs (potential tumor suppressors). B, for the most promising candidates, we performed qRT-PCR validation across a cDNA panel of 65 tumors, seven lesions, and 24 normal tissues. BLM, a known tumor suppressor, and two selected lncRNAs (ENST00000448869 and FOXP4-AS1) showed highly significant upregulation in cancer. C, as in B, but showing the upregulation of ENST00000448869 across multiple cancer types.

cultures were profiled. Conversely, artifacts from the long-term culture of cell lines and their artificial in vitro culture conditions could affect our FANTOM5 analysis. The TCGA avoids this by directly profiling fresh tissue.

As expected, there are differences in the gene sets identified by the two datasets. Despite this, we identified a core set of 128 markers that are consistently perturbed in both the FANTOM5 cell and TCGA tissue analyses. Four of these markers, TOP2A, MKI67, MCM2, and ASNS, are also upregulated at the protein level in a colon cancer dataset, and they are among the most studied cancer biomarkers and drug targets. TOP2A is targeted by etoposide (30). ASNS is targeted in asparaginase therapy of acute lymphoblastic leukemia (31), and both MKI67 and MCM2 have been studied as biomarkers (32, 33) and potential drug targets (34, 35). Targeting these genes is likely to bring therapeutic value to many patients, as they are recurrently upregulated across many cancer types. Our pan-cancer markers also appear to be mostly novel: comparison with prior work [Rhodes and colleagues, a multicancer meta-signature of 67 genes upregulated in cancer from a meta-analysis of 40 published microarray experiments (18); Xu and colleagues, 46 genes upregulated across 21 microarray datasets (36)] found little overlap (Supplementary Table S8).
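The two operations described in this paragraph, building a core set of markers whose direction of change agrees between the FANTOM5 and TCGA comparisons, and asking how much a marker list overlaps a published signature, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the per-gene call dictionaries, the function names `core_markers` and `overlap_pvalue`, the placeholder gene labels, and the use of a hypergeometric test to quantify overlap are all hypothetical.

```python
from math import comb
from typing import Dict, Set


def core_markers(fantom5: Dict[str, str], tcga: Dict[str, str]) -> Set[str]:
    """Genes called in both datasets with the same direction ("up" or "down").

    Hypothetical input format: gene symbol -> recurrent direction of change
    in that dataset's cancer-versus-normal comparison.
    """
    shared = fantom5.keys() & tcga.keys()
    return {gene for gene in shared if fantom5[gene] == tcga[gene]}


def overlap_pvalue(k: int, n1: int, n2: int, universe: int) -> float:
    """Hypergeometric upper tail P(X >= k): the chance that two random gene
    lists of sizes n1 and n2, drawn from `universe` genes, share >= k genes."""
    hits = sum(comb(n1, x) * comb(universe - n1, n2 - x)
               for x in range(k, min(n1, n2) + 1))
    return hits / comb(universe, n2)


# Toy calls for illustration only (GENE_X and GENE_Y are placeholders).
fantom5_calls = {"TOP2A": "up", "MKI67": "up", "GENE_X": "up", "GENE_Y": "down"}
tcga_calls = {"TOP2A": "up", "MKI67": "up", "GENE_X": "down"}

print(sorted(core_markers(fantom5_calls, tcga_calls)))  # ['MKI67', 'TOP2A']
```

GENE_X is excluded because its direction disagrees between the datasets, and GENE_Y because it is called in only one. Calling `overlap_pvalue(k, 128, 67, universe)` with an observed overlap `k` and an assumed gene-universe size would give the probability of sharing at least `k` genes with a 67-gene signature by chance; the universe size is an analysis choice not specified here.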


Table 2. The differentially expressed lncRNAs located within 100 kb from known cancer-related genes

lncRNA | lncRNA DE summary | Neighbor gene | Neighbor DE summary | Neighbor gene info | Distance from lncRNA | Overlap | Strand
MCF2L-AS1 | Solid UP | MCF2L | Solid UP | Oncogene | <1 kb | Yes | Opposite
RP11-475N22.4 (GATA2-AS1) | Solid ON | GATA2 | Solid ON | Oncogene, cancer gene census | <1 kb | Yes | Opposite
MNX1-AS1 | Pan ON | MNX1 | Solid ON | Oncogene | <1 kb | No | Opposite
AC009005.2 | Solid ON | BSG | Pan UP | Cancer antigen | <1 kb | Yes | Opposite
CSAG4 | Pan ON | CSAG1 | Pan ON | Cancer antigen | <1 kb | No | Opposite
HOXA11-AS | Pan UP | HOXA11 | | Oncogene, cancer gene census | <1 kb | Yes | Opposite
RHPN1-AS1 | Pan UP | MAFA | Solid UP | Tumor suppressor, oncogene | <100 kb | No | Same
LINC00624 | Pan ON | BCL9 | Solid UP | Oncogene, cancer gene census | <100 kb | No | Opposite
IFITM9P | Solid DOWN | MYEOV | Pan UP | Oncogene | <100 kb | Yes | Opposite
RP11-435O5.2 | Pan UP | PTCH1 | Pan ON | Tumor suppressor | <100 kb | No | Same
LIFR-AS1 | Solid ON | LIFR | | Oncogene, cancer gene census, Mut Driver | <100 kb | Yes | Opposite
AC079767.4 | Pan ON | CREB1 | | Oncogene, cancer gene census, Mut Driver | <100 kb | No | Same
RP11-460N16.1 | Pan ON | MITF | | Oncogene, cancer gene census | <100 kb | No | Same
RP11-1070N10.5 | Solid ON | TCL6 | | Oncogene, cancer gene census | <100 kb | No | Same
RP11-1070N10.5 | Solid ON | TCL1A | | Oncogene, cancer gene census | <100 kb | No | Opposite
HOXA11-AS | Pan UP | HOXA9 | | Oncogene, cancer gene census | <100 kb | No | Opposite
PVT1 | Solid UP | MYC | | Oncogene, cancer gene census | <100 kb | No | Same
LAMTOR5-AS1 | Solid UP | RBM15 | | Oncogene, cancer gene census | <100 kb | No | Same
RNU6-781P | Pan UP | ZNF384 | | Oncogene, cancer gene census | <100 kb | No | Opposite
RP11-284F21.7 | Pan UP | PRCC | | Oncogene, cancer gene census | <100 kb | No | Opposite
HOXA11-AS | Pan UP | HOXA13 | | Oncogene, cancer gene census | <100 kb | No | Opposite
RP11-1070N10.5 | Solid ON | TCL1B | | Oncogene | <100 kb | No | Same
TSPY26P | Solid OFF | HCK | | Oncogene | <100 kb | No | Opposite
RP5-884M6.1 | Solid ON | PIK3CG | | Mut Driver | <100 kb | No | Same
RP11-405F3.4 | Pan ON | KIFC3 | | Mut Driver | <100 kb | No | Same
RP5-991G20.1 | Pan UP | ZFHX3 | | Mut Driver | <100 kb | Yes | Opposite
CTA-714B7.5 | Pan UP | TOM1 | | Mut Driver | <100 kb | No | Opposite
LINC00243 | Pan UP | MDC1 | | Mut Driver | <100 kb | No | Same
CTA-714B7.5 | Pan UP | HMGXB4 | | Mut Driver | <100 kb | No | Opposite
CTC-338M12.9 | Pan ON | TRIM7 | | Mut Driver | <100 kb | No | Opposite
SCARNA14 | Solid ON | MAP2K1 | | Cancer gene census | <100 kb | No | Opposite
AC034193.5 | Solid UP | FANCD2 | | Cancer gene census | <100 kb | No | Same
RNU6-781P | Pan UP | ING4 | | Tumor suppressor | <100 kb | No | Opposite

NOTE: For the complete list of genes that are located near differentially expressed lncRNAs, see Supplementary Table S7.
Abbreviation: DE, differentially expressed.
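The neighborhood attributes reported in Table 2 (distance bin, whether the lncRNA overlaps the gene, and whether the two lie on the same or opposite strand) can be illustrated with a small helper. This is a sketch under assumptions that the table does not spell out: distance is measured between transcription start sites (TSS), coordinates are 0-based and half-open, and the `Feature` type, the `classify_neighbor` function, and the example coordinates are hypothetical. A TSS-to-TSS distance, rather than the gap between intervals, is consistent with the rows in which an overlapping pair still falls in the <100 kb bin.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The 5' end is the leftmost base on "+" and the rightmost base on "-".
        return self.start if self.strand == "+" else self.end - 1


def classify_neighbor(lnc: Feature, gene: Feature) -> Optional[Tuple[str, str, str]]:
    """Return (distance bin, overlap, strand) in the style of Table 2,
    or None if the pair is on different chromosomes or >= 100 kb apart."""
    if lnc.chrom != gene.chrom:
        return None
    distance = abs(lnc.tss - gene.tss)
    if distance >= 100_000:
        return None
    distance_bin = "<1 kb" if distance < 1_000 else "<100 kb"
    overlap = "Yes" if lnc.start < gene.end and gene.start < lnc.end else "No"
    strand = "Same" if lnc.strand == gene.strand else "Opposite"
    return distance_bin, overlap, strand


# Hypothetical antisense lncRNA whose 5' end lies close to the gene's TSS.
antisense = Feature("chr1", 10_000, 12_000, "-")
gene = Feature("chr1", 11_500, 30_000, "+")
print(classify_neighbor(antisense, gene))  # ('<1 kb', 'Yes', 'Opposite')
```

Returning `None` for distant or cross-chromosome pairs mirrors the 100 kb cutoff in the table title; in practice such a helper would be run over candidate pairs pre-filtered by chromosome and window.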

The FANTOM5 CAGE data also allowed us to examine transcript types rarely studied in prior efforts (long noncoding RNAs, enhancer RNAs, and repetitive element-derived RNAs). We report 271 pan-cancer lncRNAs, including well-known cancer-associated lncRNAs such as PVT1 and many novel cases. Public datasets confirmed the upregulation of 25 and the downregulation of three of these lncRNAs in at least two primary tumor types (23, 24), and we further validated the upregulation of two novel lncRNAs by qRT-PCR in a cDNA panel covering eight tumor types. We also identified 90 enhancer RNA-producing regions that are recurrently activated in cancer cell lines. For four of them, a corresponding lncRNA transcript model is upregulated in the TCGA dataset.

The observation that promoters overlapping repetitive elements are often upregulated in cancer is notable, and the link between the little-known REP522 element and cancer is novel. One instance of REP522 near the B melanoma antigen gene (BAGE) has been reported to be marked with H3K9me3 and actively

Table 3. Promoters overlapping repetitive elements

Repeat family | Total repeat-overlapping promoters | # down | Down odds ratio | Down P-value | # up | Up odds ratio | Up P-value | Protein coding: 5′UTR | Protein coding: intron | Protein coding: exon | Protein coding: 3′UTR | lncRNA | Pseudogene | Not annotated
REP522 | 72 | 0 | 0 | 1 | 25 | 62.05 | 2.2E−16 | 1 | 0 | 0 | 0 | 9 | 3 | 12
Low_complexity | 2,013 | 13 | 2.37 | 4.7E−03 | 18 | 1.04 | 0.81 | 15 | 2 | 2 | 6 | 2 | 0 | 4
Simple_repeat | 11,982 | 44 | 1.35 | 0.06 | 204 | 2.13 | 2.2E−16 | 86 | 70 | 4 | 7 | 17 | 1 | 63
SINE/Alu | 3,961 | 0 | 0 | 2.4E−05 | 138 | 4.44 | 2.2E−16 | 5 | 67 | 1 | 1 | 11 | 3 | 50
LINE/L1 | 3,426 | 1 | 0.1 | 1.5E−03 | 67 | 2.35 | 1.8E−09 | 2 | 22 | 0 | 0 | 12 | 0 | 32
LINE/L2 | 3,220 | 2 | 0.22 | 0.01 | 25 | 0.9 | 0.7 | 2 | 4 | 0 | 0 | 4 | 0 | 17
LTR/ERVL-MaLR | 3,613 | 0 | 0 | 7.8E−05 | 31 | 0.99 | 1 | 6 | 4 | 0 | 0 | 10 | 0 | 11
LTR/ERV1 | 3,932 | 2 | 0.18 | 3.0E−03 | 133 | 4.3 | 2.2E−16 | 7 | 12 | 0 | 0 | 31 | 2 | 83
LTR/ERVL | 1,488 | 0 | 0 | 0.04 | 20 | 1.57 | 0.049 | 2 | 2 | 0 | 0 | 8 | 0 | 8

NOTE: The table shows the numbers of upregulated and downregulated promoters that overlap nine families of repetitive elements (≥20 promoters), as well as Fisher exact test statistics for the enrichments (odds ratios and P-values, two-sided test). The columns from 5′UTR to Not annotated show the available annotation of those differentially expressed promoters.


transcribed (37), perhaps suggesting that REP522 transcriptional activation is responsible for the upregulation of BAGE in cancer. Other, better-studied elements such as LTR elements have previously been reported to act as alternative promoters of host genes in mouse embryos (38) and to contribute to the complexity of the transcriptome of iPS and stem cells (39). Thus, the reactivation of these elements and the eRNAs identified above suggests the acquisition of stem cell-like properties by cancer cells. This may occur because repetitive sequences, which are usually suppressed by methylation in somatic cells, are frequently hypomethylated in cancer (40).

In conclusion, our results, which highlight transcriptome changes in cancer across protein-coding genes, noncoding transcripts, unannotated promoters, and enhancer RNAs, represent an important step toward the discovery of potentially useful cancer biomarkers and therapeutic targets. Technologies to detect and target these molecules have great potential for application to a broad range of cancers. One last note: the molecules we identify are consistently up- or downregulated in cancer-versus-normal comparisons, but are not necessarily higher in all cancers relative to all normal tissues (only a subset are). Such molecules may not be suitable for plasma/serum-based diagnostics but would be useful for screening biopsies in a histopathologic setting.

Disclosure of Potential Conflicts of Interest
P. Carninci is founder and CEO for TransSINE Technologies. No potential conflicts of interest were disclosed by the other authors.

Authors' Contributions
Conception and design: B. Kaczkowski, Y. Hayashizaki, P. Carninci, A.R.R. Forrest
Development of methodology: B. Kaczkowski, M. Itoh, P. Carninci
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Y. Tanaka, M. Itoh, The FANTOM5 Consortium, P. Carninci, A.R.R. Forrest
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Kaczkowski, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann
Writing, review, and/or revision of the manuscript: B. Kaczkowski, R. Andersson, P. Carninci, A.R.R. Forrest
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y. Tanaka, H. Kawaji, A. Sandelin, T. Lassmann, The FANTOM5 Consortium, P. Carninci
Study supervision: A.R.R. Forrest
Other (supported experimental design of qPCR validation of pan-cancer marker candidates and data analysis, as well as performing experiments): Y. Tanaka

Acknowledgments
The authors thank Erik Arner, Efthymios Motakis, Kosuke Hashimoto, Dave Tang, Chung-Chau Hon, Jordan Ramilowski, and Giovani Pascarella for valuable discussions and comments on the manuscript, and Yuri Ishizu for technical assistance.

Grant Support
B. Kaczkowski was supported by the Postdoctoral Fellowship Program from the Japan Society for the Promotion of Science (JSPS) and the Foreign Postdoctoral Researcher (FPR) program from RIKEN, Japan. Y. Tanaka was supported by Grants-in-Aid for Scientific Research (KAKENHI) from the Ministry of Education, Culture, Sports, Science, and Technology. R. Andersson was supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (grant agreement No. 638273). A. Sandelin was supported by the Novo Nordisk Foundation and the Lundbeck Foundation. FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to Y. Hayashizaki and a Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to Y. Hayashizaki. This study is also supported by Research Grants from the Japanese Ministry of Education, Culture, Sports, Science and Technology through the RIKEN Preventive Medicine and Diagnosis Innovation Program to Y. Hayashizaki and the RIKEN Centre for Life Science, Division of Genomic Technologies to P. Carninci. A.R.R. Forrest is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust and funds raised by the MACA Ride to Conquer Cancer.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received February 18, 2015; revised September 21, 2015; accepted October 4, 2015; published OnlineFirst November 9, 2015.

References
1. Felder M, Kapur A, Gonzalez-Bosquet J, Horibata S, Heintz J, Albrecht R, et al. MUC16 (CA125): tumor biomarker to cancer therapy, a work in progress. Mol Cancer 2014;13:129.
2. Makarov DV, Loeb S, Getzenberg RH, Partin AW. Biomarkers for prostate cancer. Annu Rev Med 2009;60:139–51.
3. Cancer Genome Atlas Research Network. Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, et al. Integrated genomic characterization of endometrial carcinoma. Nature 2013;497:67–73.
4. Cancer Genome Atlas Research Network. Genome Characterization Center. Chang K, Creighton CJ, Davis C, Donehower L, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
5. FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 2014;507:462–70.
6. The Cancer Genome Atlas - Data Portal [Internet]. [cited 2015 Jul 14]. Available from: https://tcga-data.nci.nih.gov/
7. Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature 2014;513:382–7.
8. FANTOM5 project [Internet]. [cited 2015 Jul 14]. Available from: http://fantom.gsc.riken.jp/5/
9. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature 2014;507:455–61.
10. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139–40.
11. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
12. Magrane M, Consortium U. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009.
13. Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes. Nucleic Acids Res 2013;41:D970–6.
14. Tamborero D, Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Kandoth C, Reimand J, et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 2013;3:2650.
15. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.
16. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference annotation for The ENCODE Project. Genome Res 2012;22:1760–74.
17. Fratta E, Coral S, Covre A, Parisi G, Colizzi F, Danielli R, et al. The biology of cancer testis antigens: putative function, regulation and therapeutic potential. Mol Oncol 2011;5:164–82.
18. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A 2004;101:9309–14.
19. Janssen JW, Vaandrager JW, Heuser T, Jauch A, Kluin PM, Geelen E, et al. Concurrent activation of a novel putative transforming gene, myeov, and cyclin D1 in a subset of multiple myeloma cell lines with t(11;14)(q13;q32). Blood 2000;95:2691–8.
20. Taketani T, Taki T, Sako M, Ishii T, Yamaguchi S, Hayashi Y. MNX1-ETV6 fusion gene in an acute megakaryoblastic leukemia and expression of the MNX1 gene in leukemia and normal B cell lines. Cancer Genet Cytogenet 2008;186:115–9.
21. Akamatsu S, Takata R, Haiman CA, Takahashi A, Inoue T, Kubo M, et al. Common variants at 11q12, 10q26 and 3p11.2 are associated with prostate cancer susceptibility in Japanese. Nat Genet 2012;44:426–9–S1.
22. Barrett CW, Ning W, Chen X, Smith JJ, Washington MK, Hill KE, et al. Tumor suppressor function of the plasma glutathione peroxidase gpx3 in colitis-associated carcinoma. Cancer Res 2013;73:1245–55.
23. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 2015;47:199–208.
24. Cabanski CR, White NM, Dang HX, Silva-Fisher JM, Rauck CE, Cicka D, et al. Pan-cancer transcriptome analysis reveals long noncoding RNAs with conserved function. RNA Biol 2015;12:628–42.
25. Tseng Y-Y, Moriarity BS, Gong W, Akiyama R, Tiwari A, Kawakami H, et al. PVT1 dependence in cancer with MYC copy-number increase. Nature 2014;512:82–6.
26. Saitou M, Sugimoto J, Hatakeyama T, Russo G, Isobe M. Identification of the TCL6 genes within the breakpoint cluster region on chromosome 14q32 in T-cell leukemia. Oncogene 2000;19:2796–802.
27. Li H, Wang J, Xu H, Xing R, Pan Y, Li W, et al. Decreased fructose-1,6-bisphosphatase-2 expression promotes glycolysis and growth in gastric cancer cells. Mol Cancer 2013;12:110.
28. Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res 2013;41:D70–82.
29. Wang G, Yang X, Li C, Cao X, Luo X, Hu J. PIK3R3 induces epithelial-to-mesenchymal transition and promotes metastasis in colorectal cancer. Mol Cancer Ther 2014;13:1837–47.
30. Johnson CA, Padget K, Austin CA, Turner BM. Deacetylase activity associates with topoisomerase II and is necessary for etoposide-induced apoptosis. J Biol Chem 2001;276:4539–42.
31. Hawkins DS, Park JR, Thomson BG, Felgenhauer JL, Holcenberg JS, Panosyan EH, et al. Asparaginase pharmacokinetics after intensive polyethylene glycol-conjugated L-asparaginase therapy for children with relapsed acute lymphoblastic leukemia. Clin Cancer Res 2004;10:5335–41.
32. Dudderidge TJ, Stoeber K, Loddo M, Atkinson G, Fanshawe T, Griffiths DF, et al. Mcm2, Geminin, and KI67 define proliferative state and are prognostic markers in renal cell carcinoma. Clin Cancer Res 2005;11:2510–7.
33. Wharton SB, Chan KK, Anderson JR, Stoeber K, Williams GH. Replicative Mcm2 protein as a novel proliferation marker in oligodendrogliomas and its relationship to Ki67 labelling index, histological grade and prognosis. Neuropathol Appl Neurobiol 2001;27:305–13.
34. Liu Y, He G, Wang Y, Guan X, Pang X, Zhang B. MCM-2 is a therapeutic target of Trichostatin A in colon cancer cells. Toxicol Lett 2013;221:23–30.
35. Rahmanzadeh R, Rai P, Celli JP, Rizvi I, Baron-Lühr B, Gerdes J, et al. Ki-67 as a molecular target for therapy in an in vitro three-dimensional model for ovarian cancer. Cancer Res 2010;70:9234–42.
36. Xu L, Geman D, Winslow RL. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics 2007;8:275.
37. Ward MC, Wilson MD, Barbosa-Morais NL, Schmidt D, Stark R, Pan Q, et al. Latent regulatory potential of human-specific repetitive elements. Mol Cell 2013;49:262–72.
38. Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 2004;7:597–606.
39. Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet 2014;46:558–66.
40. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene 2002;21:5400–13.
41. Zhang L, Huang J, Yang N, Greshock J, Liang S, Hasegawa K, et al. Integrative genomic analysis of phosphatidylinositol 3′-kinase family identifies PIK3R3 as a potential therapeutic target in epithelial ovarian cancer. Clin Cancer Res 2007;13:5314–21.
42. Severin J, Lizio M, Harshbarger J, Kawaji H, Daub CO, Hayashizaki Y, et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nat Biotechnol 2014;32:217–9.


Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers

Bogumil Kaczkowski, Yuji Tanaka, Hideya Kawaji, et al.

Cancer Res Published OnlineFirst November 9, 2015.

Updated version: Access the most recent version of this article at doi:10.1158/0008-5472.CAN-15-0484

Supplementary Material: Access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2015/11/10/0008-5472.CAN-15-0484.DC1
