Cancer Gene Therapy (2015) 22, 278–284 © 2015 Nature America, Inc. All rights reserved 0929-1903/15 www.nature.com/cgt

ORIGINAL ARTICLE

Identification of commonly dysregulated genes in colorectal cancer by integrating analysis of RNA-Seq data and qRT-PCR validation

WH Xiao1,3, XL Qu1,3, XM Li1, YL Sun2, HX Zhao1, S Wang1 and X Zhou1

The progression of colorectal cancer (CRC) is a multistep process, and metastatic CRC is generally incurable; consequently, CRC is one of the leading causes of cancer-related death. There is therefore an urgent need for biomarkers with sufficient sensitivity and specificity to detect this disease at an early stage, which would substantially reduce its mortality. In this study, we performed an integrating analysis of different RNA-Seq data sets to find new candidate biomarkers for diagnosis and prognosis and new therapeutic targets, and to elucidate the molecular mechanisms of CRC carcinogenesis. By combining five RNA-Seq data sets, we identified 883 differentially expressed genes (DEGs) between CRC and normal control (NC) tissues. Gene function analysis showed that these DEGs are closely associated with carcinogenesis. The 10 most significant DEGs were further evaluated by quantitative real-time polymerase chain reaction (qRT-PCR) in both rectal cancer (RC) and colon cancer (CC), and the results agreed well with the integrated data, suggesting that integrating analysis of different RNA-Seq data sets is a reasonable approach. Integrating analysis of different RNA-Seq data sets may therefore be a useful way to overcome the small sample size of a single RNA-Seq study. In addition, our study showed that some genes, such as SIM2, ADAMTS6, FOXD4L4 and DNAH5, may have an important role in the development of CRC and could be explored as diagnostic and prognostic biomarkers and as therapeutic targets. Our findings may also help to clarify the pathology of CRC.

Cancer Gene Therapy (2015) 22, 278–284; doi:10.1038/cgt.2015.20; published online 24 April 2015

INTRODUCTION

Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide.1 At early stages, CRC is potentially curable by surgical resection and (neo)adjuvant chemotherapy, with a relatively high 5-year survival rate of >90%. When CRC is diagnosed at an advanced stage, however, the tumour has already spread to other organs, and the 5-year survival rate is generally <10%.2 There is therefore an urgent need for biomarkers with sufficient sensitivity and specificity to detect this disease as early as possible, which would significantly reduce its mortality.

Accumulation of genetic mutations and epigenetic alterations in colonic cells drives the development of CRC.3–5 Circulating carcinoembryonic antigen levels and tumor-associated genes such as APC, KRAS, …, MSI,6 SOCS2 and SOCS6 (Letellier et al.7) have been proposed as CRC biomarkers because of their prognostic or predictive value.

Differential expression analysis between cancer and normal cells may identify biomarkers for diagnostic and prognostic use and shed further light on the molecular mechanisms underlying tumorigenesis. Recently, high-throughput mRNA sequencing (RNA-Seq) has emerged as an attractive alternative to traditional microarrays for measuring genome-wide expression.8,9 Previous studies that compared RNA-Seq and microarray data in parallel have reported that RNA-Seq has advantages over microarrays in identifying differentially expressed genes (DEGs) because of its greater efficiency and higher resolution.10

RNA-Seq has been used to determine DEGs between cancer and normal tissues. However, results are inconsistent among studies because of differences in sample sources, platforms and analysis techniques.11 In addition, because of their cost, RNA-Seq experiments are currently performed on very few biological replicates, so analyses of differential expression between cancer and normal tissues tend to lack statistical power. To address these challenges, integrating analysis has recently been developed to increase the statistical power for detecting DEGs.12–14

Several integrating analyses based on CRC microarray data15,16 have identified important sub-network signatures for CRC, such as the TP53, PCNA and IL8 sub-networks. However, to our knowledge, no integrating study based on RNA-Seq data has been reported in CRC. In this study, we integrated multiple RNA-Seq data sets to detect candidate CRC biomarkers and to investigate the pathology of CRC, with the aim of guiding the development of new therapies. Validation by quantitative real-time polymerase chain reaction (qRT-PCR) not only verified the results of the integrating analysis but also assessed the diagnostic potential of some genes in both rectal cancer (RC) and colon cancer (CC).

MATERIALS AND METHODS

Identification of eligible RNA-Seq data sets of CRC

We searched the PubMed database and the Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo)17 to identify CRC

1Department of Oncology, The First Affiliated Hospital of PLA General Hospital, Beijing, China and 2Beijing Yangshen Bioinformatic Technology, Beijing, China. Correspondence: Dr W Xiao, Department of Oncology, The First Affiliated Hospital of PLA General Hospital, Beijing 100048, China.
3These authors contributed equally to this work.
Received 20 January 2015; revised 9 March 2015; accepted 12 March 2015; published online 24 April 2015

Integrating analysis of different RNA-seq data sets WH Xiao et al 279

Table 1. The clinicopathological features of five CRC patients for qRT-PCR validation

| No. | Pathological diagnosis | Site | Pathological TNM staging | Age/gender | Size (cm) |
|---|---|---|---|---|---|
| C01C01 | Adenocarcinoma | Descending colon | T3N0M0, IIA | 70/M | 4.9 × 3.4 × 1.1 |
| C02C01 | Adenocarcinoma | Sigmoid colon | T3N0M0, IIA | 69/M | 4.0 × 2.5 × 0.8 |
| C07C01 | Adenocarcinoma | Sigmoid colon | T3N0M0, IIA | 41/M | 3.4 × 3.2 × 2.0 |
| R01C01 | Adenocarcinoma | Rectum | T3N2M0, IIIC | 75/M | 4.5 × 3.8 × 1.0 |
| R02C01 | Adenocarcinoma | Rectum | T3N2M0, IIIC | 68/M | 3.4 × 2.5 × 0.8 |

Abbreviations: CRC, colorectal cancer; qRT-PCR, quantitative real-time polymerase chain reaction; M, male; TNM, tumor node metastases.

Table 2. Characteristics of individual studies for integrating analysis

| GEO ID | Samples (cancer:normal) | Platform | Sample source | Country | Year |
|---|---|---|---|---|---|
| GSE46622 | 4:4 | GPL10999, Illumina Genome Analyzer IIx | In vivo | Germany | 2013 |
| GSE41586 | 3:0 | GPL11154, Illumina HiSeq 2000 | In vitro | United States | 2013 |
| GSE39068 | 1:0 | GPL10999, Illumina Genome Analyzer IIx | In vitro | Netherlands | 2012 |
| GSE33782 | 2:1 | GPL10999, Illumina Genome Analyzer IIx | In vivo | China | 2012 |
| GSE29580 | 2:2 | Illumina Genome Analyzer | In vivo | Taiwan | 2011 |

expression profiling studies performed by RNA-Seq. The following key words and their combinations were used: 'colorectal cancer, gene expression, human, RNA-Seq and genetics'. We retained only original experimental studies containing gene expression profiles of both CRC and normal control (NC) samples. Non-human studies, review articles and integrating analyses of expression profiles were excluded.

CRC samples for qRT-PCR validation

To validate the findings of the integrating analysis, we selected the 10 most significant DEGs and determined their expression in both RC and CC, which together are referred to as CRC. The five patients with RC or CC included in this study were treated at the First Affiliated Hospital of PLA General Hospital. The clinicopathological features of each patient are summarized in Table 1. All protocols and consent forms were approved by the Medical Ethics Committee of the hospital. Written informed consent was obtained from the patients or, as appropriate, their legal guardians. Each tumor sample was matched with adjacent, apparently normal mucosa removed during the same operation. Frozen sections were made from these tissues and examined independently by senior pathologists. Paired normal and cancer samples were frozen immediately after the operation and stored at −135 °C until RNA extraction.

Statistical analysis

To assess the transcript abundance of each gene across the different RNA-Seq studies, Cufflinks v1.0.318 was used to process the aligned reads from each sample. The GTF annotation file used in this analysis was retrieved from the University of California Santa Cruz (UCSC) database. The expression level of each transcript was normalized as fragments per kilobase of exon model per million mapped fragments (FPKM). Cuffdiff was then used to process the original alignment files produced by TopHat,19 together with the GTF genome annotation file, to determine the DEGs.
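The FPKM normalization described above can be illustrated with a minimal sketch. It is not the Cufflinks implementation: it assumes raw per-gene fragment counts and annotated exon lengths as inputs (the function name `fpkm` and its arguments `counts`, `lengths_bp` and `total_mapped` are hypothetical), whereas Cufflinks assigns fragments to isoforms probabilistically and uses effective transcript lengths.

```python
from typing import Dict, Optional


def fpkm(counts: Dict[str, int],
         lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Fragments per kilobase of exon model per million mapped fragments.

    FPKM_g = counts_g / (length_g / 1e3) / (total_mapped / 1e6)

    Simplified sketch: raw counts and annotated exon lengths are used instead
    of the likelihood-based fragment assignment and effective lengths that
    Cufflinks applies.
    """
    if total_mapped is None:
        # Assumption: every mapped fragment is represented in `counts`.
        total_mapped = sum(counts.values())
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    millions = total_mapped / 1e6  # library size in millions of fragments
    result = {}
    for gene, n in counts.items():
        kilobases = lengths_bp[gene] / 1e3  # exon-model length in kb
        result[gene] = n / kilobases / millions
    return result


if __name__ == "__main__":
    # Hypothetical toy data: geneB has 3x the fragments of geneA but is also
    # 3x longer, so both genes receive the same FPKM.
    counts = {"geneA": 500, "geneB": 1500}
    lengths = {"geneA": 1000, "geneB": 3000}
    print(fpkm(counts, lengths, total_mapped=1_000_000))
    # prints {'geneA': 500.0, 'geneB': 500.0}
```

Dividing by both exon length and library size places genes of different lengths and libraries of different depths on a common scale; it does not, however, remove platform or batch effects between studies.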
After Benjamini–Hochberg correction for multiple testing, a false discovery rate (FDR) <0.01 was used as the criterion for significant differences. Hierarchical clustering of DEGs among samples was performed with R software.20

Functional classification of DEGs

To interpret the biological functions of the DEGs, we performed a Gene Ontology (GO) enrichment analysis to characterize their functional distribution in CRC. GO terms are organized into three categories, biological process, cellular component and molecular function, which were examined separately in our analysis.

In addition, we studied the enrichment of DEGs in particular pathways on the basis of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The online tool GeneCodis (http://genecodis.cnb.csic.es)21 was used to perform these functional enrichment analyses of the DEGs.

PPI network construction

Research on protein–protein interactions (PPIs) can help to interpret the functions of proteins at the molecular level and to uncover the rules governing cellular activities, including growth, development, metabolism, differentiation and apoptosis.22 Identifying protein interactions on a genome-wide scale is important for interpreting regulatory mechanisms.23 In this analysis, we used the Biological General Repository for Interaction Datasets (BioGRID, http://thebiogrid.org/) to construct the PPI network and visualized the distribution of the top 20 most significant DEGs in the network using Cytoscape software.24

RNA isolation and qRT-PCR

Total RNA was extracted from each sample using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's protocol. Primers for SYBR Green assays were designed from template sequences with PrimerPlex 2.61 (PREMIER Biosoft, Palo Alto, CA, USA), and an ABI 7500 real-time PCR system (Applied Biosystems, Carlsbad, CA, USA) was used. For each replicate, complementary DNA (cDNA) was synthesized from 1 to 5 μg of RNA using SuperScript II Reverse Transcriptase (Invitrogen/Life Technologies, Carlsbad, CA, USA). Each qRT-PCR reaction contained 6.25 μl of Power SYBR Green PCR Master Mix (Applied Biosystems/Life Technologies), 50 ng of diluted cDNA and 1 μM of each primer in a total volume of 12.5 μl. Reactions were performed in triplicate to ensure consistent technical replication and were run in 96-well plates under the following conditions: 50 °C for 2 min, 95 °C for 10 min, and 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Melting curves (60–95 °C) were generated for every reaction to confirm a single product. Relative gene expression was evaluated with DataAssist Software version 3.0 (Applied Biosystems/Life Technologies), using a human reference gene as the endogenous control for RNA load and gene expression. Figures were produced with GraphPad Prism version 6.0 (GraphPad Software, San Diego, CA, USA).

RESULTS

Differential expression analysis by integrating analysis

We collected a total of five RNA-Seq data sets of CRC that met the inclusion criteria. Selected details of the individual studies are summarized in Table 2. As CRC gene expression data generated by RNA-Seq are limited, gene expression data of both


Figure 1. Heat map visualization of the gene expression patterns among samples across different data sets.

tissues and cell lines were included in our analysis. Combining the five RNA-Seq data sets yielded a total of 43 107 genes. At an FDR <0.01, 883 genes showed altered expression in CRC samples compared with NC tissues. Figure 1 shows the hierarchical clustering of DEGs among samples, and the top 20 most significant DEGs are listed in Table 3. The qRT-PCR primer list and the full list of DEGs between CRC and NC tissues are given in Supplementary Tables S1 and S2, respectively.

Table 3. The top 20 most significant DEGs

| Gene ID | Gene symbol | Official full name | FDR |
|---|---|---|---|
| 124783 | SPATA32 | Spermatogenesis associated 32 | 0.0000 |
| 155185 | AMZ1 | Archaelysin family metallopeptidase 1 | 0.0000 |
| 349334 | FOXD4L4 | Forkhead box D4-like 2; forkhead box D4-like 4 | 0.0001 |
| 3892 | KRT86 | Keratin 86 | 0.0001 |
| 349334 | FOXD4L2 | Forkhead box D4-like 2 | 0.0001 |
| 11174 | ADAMTS6 | ADAM metallopeptidase with thrombospondin type 1 motif, 6 | 0.0002 |
| 150221 | RIMBP3C | RIMS-binding protein 3C | 0.0002 |
| 1767 | DNAH5 | Dynein, axonemal, heavy chain 5 | 0.0002 |
| 27125 | AFF4 | AF4/FMR2 family, member 4 | 0.0002 |
| 323 | APBB2 | Amyloid beta (A4) precursor protein-binding, family B, member 2 | 0.0002 |
| 337874 | HIST2H2BD | Histone cluster 2, H2bd | 0.0002 |
| 6493 | SIM2 | Single-minded homolog 2 (Drosophila) | 0.0002 |
| 440804 | RIMBP3B | RIMS-binding protein 3B | 0.0002 |
| 124359 | CDYL2 | Chromodomain protein, Y-like 2 | 0.0003 |
| 152877 | FAM53A | Family with sequence similarity 53, member A | 0.0003 |
| 23762 | OSBP2 | Oxysterol binding protein 2 | 0.0003 |
| 337873 | HIST2H2BC | Histone cluster 2, H2bc | 0.0003 |
| 653427 | FOXD4L5 | Forkhead box D4-like 5 | 0.0003 |
| 85376 | RIMBP3 | RIMS-binding protein 3 | 0.0003 |
| 220107 | DLEU7 | Deleted in lymphocytic leukemia, 7 | 0.0004 |

Abbreviations: DEGs, differentially expressed genes; FDR, false discovery rate.

Functional annotation

Genes with P<0.01 were selected and tested against a background set of all genes with GO annotations. The most significantly enriched GO terms for biological processes were transcription (GO:0006350, P = 0.0001) and regulation of transcription (GO:0045449, P = 1.07E-03); those for molecular functions were DNA binding (GO:0003677, P = 0.0001) and transcription regulator activity (GO:0030528, P = 0.0001); and those for cellular components were cell projection (GO:0042995, P = 0.0002) and intrinsic to the plasma membrane (GO:0031226, P = 0.0069) (Table 4). A hypergeometric test with P<0.05 was used as the criterion for pathway detection. The most significantly enriched pathway was melanoma (P = 0.0168); neuroactive ligand–receptor interaction (P = 0.0252) was also highly enriched.

PPI network

PPI networks of the top 20 most significant DEGs were constructed using Cytoscape software and comprised 50 nodes and 58 edges; network data for some genes were not available in BioGRID. In a PPI network, nodes with a high degree are defined as hub proteins, where the degree of a node is the number of neighbors it directly connects to. The significant hub proteins were AFF4 (degree = 30), SIM2 (degree = 10) and KRT86 (degree = 4) (Figure 2).

qRT-PCR validation

We performed qRT-PCR for the 10 most significant DEGs, ADAMTS6, AFF4, AMZ1, APBB2, DNAH5, FOXD4L4, KRT86, RIMBP3, RIMBP3C and SPATA32, in both CC and RC. In the qRT-PCR assay, most of the genes showed expression patterns similar to those observed in the integrating analysis (Figure 3). SPATA32 and ADAMTS6 were downregulated in CC (−16.38-fold and −4.89-fold, respectively) but upregulated in RC (3.3-fold and 3.65-fold, respectively) compared with NC tissues. AMZ1 (−9.43-fold and −1.70-fold in CC and RC, respectively), FOXD4L4 (−4.96-fold and −2.06-fold), KRT86 (−4.98-fold and −1.03-fold), RIMBP3C (−15.01-fold and −1.04-fold), DNAH5 (−14.35-fold and −1.07-fold), AFF4 (−1.32-fold and −1.33-fold) and RIMBP3 (−11.35-fold and −2.67-fold) were downregulated in both CC and RC compared with NC tissues, with KRT86, RIMBP3C and DNAH5 appearing more markedly downregulated in CC. APBB2 (1.59-fold and 1.99-fold) was upregulated in both CC and RC compared with NC tissues (Figure 3).

DISCUSSION

In this integrating analysis, we combined five RNA-Seq data sets and found that 883 genes were differentially expressed between CRC and NC tissues. In line with previous findings, some of these DEGs, such as TGFB2,25 TP53,26 MET,27 BRAF28 and SIM2, have been implicated in the tumorigenesis of CRC. TGFB2, an important member of the TGF-β signaling pathway that is frequently overexpressed in CRC and other malignant tumors,29 has a key role in carcinogenesis, especially in immunosuppression and metastasis. Loss of tumor suppressor gene expression (TP53) and activation of oncogenes (MET and BRAF) drive carcinogenesis by regulating cellular activities.30 DeYoung et al.31 found that SIM2 was expressed in solid tumors such as colon, prostate and pancreatic carcinomas, but not in breast, lung or ovarian carcinomas, nor in most normal tissues. In colon tumors, SIM2-s expression was observed at early stages. More importantly, antisense inhibition of SIM2 expression in a colon cancer cell line caused growth inhibition and apoptosis, so SIM2 could be considered a molecular target for cancer therapeutics. The important role of SIM2 in CRC was also reflected in our PPI analysis of the top 20 most significant DEGs, in which SIM2 was one of the significant hub proteins. These results, which agree with previous findings, suggest that the integrating analysis approach used in our study is reasonable.

Gene function analysis of the DEGs showed that transcription was the most significantly enriched GO biological process and that melanoma was the most significantly enriched pathway. Malignant transformation of a cell initiates the transcription of large numbers of genes to enable clonal proliferation, uncontrolled growth and, finally, invasion. Melanoma is among the most common cancers in humans and is the major cause of skin cancer deaths. Melanoma and CRC share similar morphological and cytological changes triggered by dysregulation of tumor suppressor genes and oncogenes such as MET, BRAF and TP53.

To support the findings of the integrating analysis, we selected the top 10 most significant DEGs for qRT-PCR validation in CC and RC. Consistent with the integrating analysis, the 10 genes ADAMTS6, AFF4, AMZ1, APBB2, DNAH5, FOXD4L4, KRT86, RIMBP3, RIMBP3C and SPATA32 were differentially expressed between CRC and NC tissues. SPATA32 and ADAMTS6 were downregulated in CC but upregulated in RC compared with NC tissues. SPATA32 is expressed in a stage-specific manner in the acrosome of spermatids and may be involved in acrosome formation during spermiogenesis;32 its role in CRC has not yet been investigated. ADAMTS6 has been implicated in the development of myeloma and prolactin tumors,33,34 and it may be involved in colorectal carcinogenesis. The differential expression of ADAMTS6 between CC and RC suggests that the molecular components and regulatory mechanisms differ between CC and RC. Li et al.35 also confirmed this by comparing the gene expression profiles of primary CC and RC tumor tissues.

AMZ1, FOXD4L4, KRT86, RIMBP3C, DNAH5, AFF4 and RIMBP3 were downregulated in both CC and RC compared with NC tissues, and KRT86, RIMBP3C and DNAH5 appeared more markedly downregulated in CC. Some of these genes are closely associated with carcinogenesis. FOXD4L4 is hypermethylated in ovarian tumors and may have important roles in the development of ovarian neoplasms.36 DNAH5 is frequently mutated in patients with primary ciliary dyskinesia37 and myeloma.38 AFF4, a component of the positive transcription elongation factor b (P-TEFb) complex, is a key regulator in the pathogenesis of leukemia through many of the MLL partners.39 KRT86, a member of the keratin gene family, is involved in hair and nail formation, and a mutation of KRT86 has been observed in patients with the rare dominant hair disease monilethrix.40 Research on the functions of AMZ1, RIMBP3C and RIMBP3 is very limited, and their specific biological roles are unknown. APBB2 was upregulated in both CC and RC compared with NC tissues; it is involved in the regulation of βAPP processing, which has a central role in the pathogenesis of Alzheimer's disease,41 but its association with cancer has not been reported.

In summary, integrating analysis of multiple RNA-Seq data sets may be a useful way to overcome the small sample size of a single RNA-Seq study. This integrating analysis of five RNA-Seq data sets in CRC detected a novel gene expression profile associated with CRC, and gene function analysis showed a strong association with carcinogenesis. The expression of DEGs identified in the integrating analysis was verified in CRC by qRT-PCR. In addition, some genes, such as SIM2, ADAMTS6, FOXD4L4 and DNAH5, may have an important role in CRC and could also be considered as


Table 4. The enriched GO categories of DEGs

| Category | GO ID | GO term | No. of genes | FDR |
|---|---|---|---|---|
| Biological process | GO:0006350 | Transcription | 98 | 0.0001 |
| Biological process | GO:0045449 | Regulation of transcription | 105 | 0.0001 |
| Biological process | GO:0006355 | Regulation of transcription, DNA-dependent | 75 | 0.0001 |
| Biological process | GO:0051252 | Regulation of RNA metabolic process | 75 | 0.0001 |
| Biological process | GO:0045892 | Negative regulation of transcription, DNA-dependent | 19 | 0.0003 |
| Biological process | GO:0051253 | Negative regulation of RNA metabolic process | 19 | 0.0004 |
| Biological process | GO:0031327 | Negative regulation of cellular biosynthetic process | 24 | 0.0011 |
| Biological process | GO:0016481 | Negative regulation of transcription | 21 | 0.0011 |
| Biological process | GO:0009890 | Negative regulation of biosynthetic process | 24 | 0.0014 |
| Biological process | GO:0010629 | Negative regulation of gene expression | 22 | 0.0014 |
| Biological process | GO:0010558 | Negative regulation of macromolecule biosynthetic process | 23 | 0.0017 |
| Biological process | GO:0045934 | Negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 22 | 0.0017 |
| Biological process | GO:0015711 | Organic anion transport | 6 | 0.0019 |
| Biological process | GO:0048705 | Skeletal system morphogenesis | 9 | 0.0020 |
| Biological process | GO:0051172 | Negative regulation of nitrogen compound metabolic process | 22 | 0.0021 |
| Biological process | GO:0001501 | Skeletal system development | 16 | 0.0022 |
| Biological process | GO:0045793 | Positive regulation of cell size | 6 | 0.0028 |
| Biological process | GO:0000122 | Negative regulation of transcription from RNA polymerase II promoter | 14 | 0.0030 |
| Biological process | GO:0006357 | Regulation of transcription from RNA polymerase II promoter | 27 | 0.0035 |
| Biological process | GO:0003007 | Heart morphogenesis | 7 | 0.0036 |
| Biological process | GO:0009880 | Embryonic pattern specification | 5 | 0.0038 |
| Biological process | GO:0046777 | Protein amino-acid autophosphorylation | 7 | 0.0075 |
| Molecular function | GO:0003677 | DNA binding | 94 | 0.0001 |
| Molecular function | GO:0030528 | Transcription regulator activity | 54 | 0.0001 |
| Molecular function | GO:0003700 | Transcription factor activity | 38 | 0.0002 |
| Molecular function | GO:0016564 | Transcription repressor activity | 18 | 0.0003 |
| Molecular function | GO:0008270 | Zinc ion binding | 70 | 0.0005 |
| Molecular function | GO:0004674 | Protein serine/threonine kinase activity | 21 | 0.0006 |
| Molecular function | GO:0043167 | Ion binding | 113 | 0.0007 |
| Molecular function | GO:0043565 | Sequence-specific DNA binding | 26 | 0.0007 |
| Molecular function | GO:0043169 | Cation binding | 111 | 0.0009 |
| Molecular function | GO:0046872 | Metal ion binding | 110 | 0.0010 |
| Molecular function | GO:0046914 | Transition metal ion binding | 78 | 0.0022 |
| Molecular function | GO:0004672 | Protein kinase activity | 24 | 0.0033 |
| Cellular component | GO:0042995 | Cell projection | 25 | 0.0002 |
| Cellular component | GO:0031226 | Intrinsic to the plasma membrane | 31 | 0.0069 |
| Cellular component | GO:0043005 | Neuron projection | 13 | 0.0071 |
| Cellular component | GO:0005887 | Integral to the plasma membrane | 30 | 0.0092 |

Abbreviations: DEGs, differentially expressed genes; FDR, false discovery rate; GO, gene ontology.

Figure 2. The PPI networks of the top 20 dysregulated differentially expressed genes (DEGs). Nodes represent proteins and edges represent interactions between two proteins. The larger the node, the higher its degree of connection. Red nodes represent the products of the top 20 dysregulated DEGs, and blue nodes denote the products of genes predicted to interact with these DEGs.


Figure 3. Quantitative real-time polymerase chain reaction (qRT-PCR) analysis of the top 10 most significant differentially expressed genes (DEGs) in CC and RC. Three technical replicates were performed for each sample. The height of each box represents the mean of the sample-specific 2^(−ΔΔCt) values, and the associated error bars denote the s.e.m. Fold changes are shown in parentheses.

potential candidate biomarkers for diagnosis and prognosis and as therapeutic targets for this malignancy. Furthermore, our study may shed light on the molecular mechanisms underlying the tumorigenesis of CRC.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

REFERENCES

1 Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010; 127: 2893–2917.
2 Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin 2013; 63: 11–30.
3 Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 2011; 43: 964–968.
4 Issa JP. CpG island methylator phenotype in cancer. Nat Rev Cancer 2004; 4: 988–993.
5 Markowitz SD, Bertagnolli MM. Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med 2009; 361: 2449–2460.
6 Zoratto F, Rossi L, Verrico M, Papa A, Basso E, Zullo A et al. Focus on genetic and epigenetic events of colorectal cancer pathogenesis: implications for molecular diagnosis. Tumour Biol 2014; 35: 6195–6206.
7 Letellier E, Schmitz M, Baig K, Beaume N, Schwartz C, Frasquilho S et al. Identification of SOCS2 and SOCS6 as biomarkers in human colorectal cancer. Br J Cancer 2014; 111: 726–735.
8 Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008; 18: 1509–1517.
9 Oshlack A, Robinson MD, Young MD. From RNA-seq reads to differential expression results. Genome Biol 2010; 11: 220.
10 Xu X, Zhang Y, Williams J, Antoniou E, McCombie WR, Wu S et al. Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets. BMC Bioinformatics 2013; 14(Suppl 9): S1.
11 Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJ, Marra MA. Sequence biases in large scale gene expression profiling data. Nucleic Acids Res 2006; 34: e83.
12 Feichtinger J, Thallinger GG, McFarlane RJ, Larcombe LD. Microarray meta-analysis: from data to expression to biological relationships. In: Computational Medicine. Springer: Vienna, Austria, 2012, pp 59–77.
13 Ramasamy A, Mondry A, Holmes CC, Altman DG. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 2008; 5: e184.
14 Rau A, Marot G, Jaffrezic F. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics 2014; 15: 91.
15 Sonachalam M, Shen J, Huang H, Wu X. Systems biology approach to identify gene network signatures for colorectal cancer. Front Genet 2012; 3: 80.
16 Shi M, Beauchamp RD, Zhang B. A network-based gene expression signature informs prognosis and treatment for colorectal cancer patients. PLoS One 2012; 7: e41292.
17 Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 2013; 41: D991–D995.
18 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010; 28: 511–515.
19 Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25: 1105–1111.
20 Reimers M, Carey VJ. Bioconductor: an open source framework for bioinformatics and computational biology. Methods Enzymol 2006; 411: 119–134.
21 Tabas-Madrid D, Nogales-Cadenas R, Pascual-Montano A. GeneCodis3: a non-redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Res 2012; 40: W478–W483.
22 Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y et al. A protein interaction map of Drosophila melanogaster. Science 2003; 302: 1727–1736.
23 Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M et al. A map of the interactome network of the metazoan C. elegans. Science 2004; 303: 540–543.
24 Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003; 13: 2498–2504.
25 Fleming NI, Jorissen RN, Mouradov D, Christie M, Sakthianandeswaren A, Palmieri M et al. SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res 2013; 73: 725–735.
26 Shin YJ, Kim MS, Kim MS, Lee J, Kang M, Jeong JH. High-mobility group box 2 (HMGB2) modulates radioresponse and is downregulated by p53 in colorectal cancer cell. Cancer Biol Ther 2013; 14: 213–221.
27 Xie T, D'Ario G, Lamb JR, Martin E, Wang K, Tejpar S et al. A comprehensive characterization of genome-wide copy number aberrations in colorectal cancer reveals novel oncogenes and patterns of alterations. PLoS One 2012; 7: e42001.
28 Joyce T, Oikonomou E, Kosmidou V, Makrodouli E, Bantounas I, Avlonitis S et al. A molecular signature for oncogenic BRAF in human colon cancer cells is revealed by microarray analysis. Curr Cancer Drug Targets 2012; 12: 873–898.
29 Lee S, Bang S, Song K, Lee I. Differential expression in normal-adenoma-carcinoma sequence suggests complex molecular carcinogenesis in colon. Oncol Rep 2006; 16: 747–754.
30 Gutschner T, Hammerle M, Pazaitis N, Bley N, Fiskin E, Uckelmann H et al. Insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1) is an important protumorigenic factor in hepatocellular carcinoma. Hepatology 2014; 59: 1900–1911.
31 DeYoung MP, Tress M, Narayanan R. Identification of Down's syndrome critical locus gene SIM2-s as a drug therapy target for solid tumors. Proc Natl Acad Sci USA 2003; 100: 4760–4765.
32 Lee KF, Tam YT, Zuo Y, Cheong AW, Pang RT, Lee NP et al. Characterization of an acrosome protein VAD1.2/AEP2 which is differentially expressed in spermatogenesis. Mol Hum Reprod 2008; 14: 465–474.
33 Bret C, Hose D, Reme T, Kassambara A, Seckinger A, Meissner T et al. Gene expression profile of ADAMs and ADAMTSs metalloproteinases in normal and malignant plasma cells and in the bone marrow environment. Exp Hematol 2011; 39: 546–557.e8.
34 Wierinckx A, Auger C, Devauchelle P, Reynaud A, Chevallier P, Jan M et al. A diagnostic marker set for invasion, proliferation, and aggressiveness of prolactin pituitary tumors. Endocr Relat Cancer 2007; 14: 887–900.
35 Li JN, Zhao L, Wu J, Wu B, Yang H, Zhang HH et al. Differences in gene expression profiles and carcinogenesis pathways between colon and rectal cancer. J Dig Dis 2012; 13: 24–32.
36 Huang YW, Jansen RA, Fabbri E, Potter D, Liyanarachchi S, Chan MW et al. Identification of candidate epigenetic biomarkers for ovarian cancer detection. Oncol Rep 2009; 22: 853–861.
37 Hornef N, Olbrich H, Horvath J, Zariwala MA, Fliegauf M, Loges NT et al. DNAH5 mutations are a common cause of primary ciliary dyskinesia with outer dynein arm defects. Am J Respir Crit Care Med 2006; 174: 120–126.
38 Walker BA, Wardell CP, Melchor L, Hulkki S, Potter NE, Johnson DC et al. Intraclonal heterogeneity and distinct molecular mechanisms characterize the development of t(4;14) and t(11;14) myeloma. Blood 2012; 120: 1077–1086.
39 Lin C, Smith ER, Takahashi H, Lai KC, Martin-Brown S, Florens L et al. AFF4, a component of the ELL/P-TEFb elongation complex and a shared subunit of MLL chimeras, can link transcription elongation to leukemia. Mol Cell 2010; 37: 429–437.
40 Wu J, Lin Y, Xu W, Li Z, Fan W. A mutation in the type II KRT86 gene in a Han family with monilethrix. J Biomed Res 2011; 25: 49–55.
41 Golanska E, Sieruta M, Gresner SM, Hulas-Bigoszewska K, Corder EH, Styczynska M et al. Analysis of APBB2 gene polymorphisms in sporadic Alzheimer's disease. Neurosci Lett 2008; 447: 164–166.

Supplementary Information accompanies the paper on Cancer Gene Therapy website (http://www.nature.com/cgt)
