The EXPANDER Integrated Platform for Transcriptome Analysis

Article

Tom Aharon Hait 1,2, Adi Maron-Katz 3, Dorit Sagir 1, David Amar 4, Igor Ulitsky 5, Chaim Linhart 1, Amos Tanay 5,6, Roded Sharan 1, Yosef Shiloh 2, Ran Elkon 2,7 † and Ron Shamir 1,†

1 - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel 2 - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel 3 - Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA 4 - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA 94305, USA 5 - Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel 6 - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel 7 - Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel

Correspondence to Ran Elkon and Ron Shamir:Correspondence to: R. Elkon, Department of Human Molecular Genetics & Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.Correspondence to: R. Shamir, Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel. [email protected], [email protected]. https://doi.org/10.1016/j.jmb.2019.05.013

Abstract

Genome-wide analysis of cellular transcriptomes using RNA-seq or expression arrays is a major mainstay of current biological and biomedical research. EXPANDER (EXPression ANalyzer and DisplayER) is a comprehensive software package for analysis of expression data, with built-in support for 18 different organisms. It is designed as a “one-stop shop” platform for transcriptomic analysis, allowing for execution of all analysis steps starting with gene expression data matrix. Analyses offered include low-level preprocessing and normalization, differential expression analysis, clustering, bi-clustering, supervised grouping, high-level functional and pathway enrichment tests, and networks and motif analyses. A variety of options is offered for each step, using established algorithms, including many developed and published by our laboratory. EXPANDER has been continuously developed since 2003, having to date over 18,000 downloads and 540 citations. One of the innovations in the recent version is support for combined analysis of gene expression and ChIP-seq data to enhance the inference of transcriptional networks and their functional interpretation. EXPANDER implements cutting-edge algorithms and makes them accessible to users through user-friendly interface and intuitive visualizations. It is freely available to users at http://acgt.cs.tau.ac.il/expander/. © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction most vastly used techniques in biomedical research, being routinely applied for multiple goals, including The ability to profile entire cellular transcriptomes, the elucidation of transcriptional networks that drive first by expression microarrays and subsequently by different biological processes and cellular responses RNA-sequencing (RNA-seq) has transformed bio- to challenges and the identification of expression logical research over the last two decades, by signatures for multiple pathological conditions [2–4]. turning the paradigm of systems-level analysis The utility of transcriptomic experiments greatly from a formidable task to one that is readily depends on the availability of advanced, yet easy accessible to most experimental laboratories [1]. to use, bioinformatics tools for mining meaningful Consequently, transcriptome profiling is one of the biological knowledge out of the raw data. 0022-2836/© 2019 The Author. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Journal of Molecular Biology (2019) 431, 2398–2406 The Expander Platform for Transcriptome Analysis 2399

ChIP-seq Expression data data matrix

Preprocessing / Map to target integraon genes Normalizaon GSEA

Diﬀerenally Network expressed genes data

Network-guided Bi-clustering Clustering grouping

Cytoscape Enrichment tests interface

GO Chr Pathways Mofs miRNAs categories locaon

Fig. 1. A flowchart summarizing the main analysis modules that are implemented in EXPANDER 8.0. (analysis modules involving ChIP-seq data are detailed in Fig. 3). Gene expression (GE) data analysis is a multi-step using ChIP-seq [7]. The true power of this method for process, ranging from low-level analyses that delineating transcriptional networks lies in its inte- generate a normalized data matrix to high-level grated analysis with corresponding transcriptomic network-guided analyses that dissect the transcrip- data sets. To meet this challenge, one of the tional response into functional modules that carry out innovations in EXPANDER 8.0 is the support for specific biological endpoints. While there are many combined analysis of GE and ChIP-seq data. tools and software packages that perform dedicated Another aspect that significantly enhances inference tasks, we reasoned that an integrative platform that of biological working models from transcriptomic covers the entire analysis pipeline would greatly data is its analysis in the context of gene and protein boost the ability of nonspecialist experimental networks [8]. Therefore, in EXPANDER 8.0,we laboratories to effectively mine their transcriptomic substantially augmented this analysis feature. The data. With this guiding principle, we are continuously current version allows streamlined export of developing EXPANDER (EXPression ANalyzer and network-based results into the highly popular Cytos- DisplayER) since 2003 [5,6] as a “one-stop shop” cape tool [9]. Once imported into Cytoscape, users software tool for GE data analysis. Here we describe can benefit from the multitude of analysis plugins the current version of our tool—EXPANDER 8.0. and network visualization utilities provided by this Analysis modules provided by EXPANDER 8.0 platform. include low-level preprocessing and normalization, EXPANDER is freely available to users at http:// differential expression analysis, clustering, bi- acgt.cs.tau.ac.il/expander/. clustering, supervised grouping, high-level functional and pathway enrichment tests, and networks and cis-regulatory motif analyses. A variety of estab- Results and Discussion lished algorithms, including many developed and published by our lab, are offered for many steps. EXPANDER covers, under one roof, all the steps These algorithms are made accessible to users of GE data analysis, starting with GE data matrix through user-friendly interface, and results are recorded using either GE arrays or RNA-seq. The displayed in intuitive and interactive visualizations. analysis modules provided by EXPANDER are As an interactive tool, by design, EXPANDER avoids summarized in Fig. 1, and the tools/algorithms computationally heavy processes. Hence, the ma- implemented in each module are summarized in jority of the implemented analysis modules typically Supplementary Table S1. Some of these modules take less than a minute or just a few minutes to require species-specific annotation files [e.g., Gene complete on a standard PC. Ontology (GO) annotations, promoter sequences], Recent years also witnessed great advance in the and EXPANDER 8.0 has a built-in support for 18 application of epigenomics techniques, including different organisms (Table S2). Here, we demon- genome-wide profiling of physical interactions be- strate EXPANDER's analysis pipeline by its appli- tween transcription factors (TFs) and the chromatin cation to a data set that profiled GE, using RNA-seq, 2400 The Expander Platform for Transcriptome Analysis in cells treated by ionizing irradiation (IR) at three data set, 637 DE genes were identified based on time-points (0, 4, and 8 h) using two biological fold-change criteria. replicates (of CAL51 breast cancer cell line) [10] (Table S3). We focus here on the typical analysis Gene grouping based on expression patterns pipeline. Description of alternative ones is given in EXPANDER's user manual. DE genes in a data set exhibit various expression patterns (induction or repression, different response Data preprocessing and low-level analysis kinetics, etc.), where genes sharing an expression pattern (co-expressed genes) are expected to func- An EXPANDER session typically starts by upload- tion together towards achieving certain biological inganexpressiondatamatrix(geneorprobesin endpoints [20]. Therefore, typically, the next step in rows and samples in columns), which contains the analysis pipeline is the partition of DE genes into estimates for either absolute or relative expression groups according to their expression patterns. EX- levels. A matrix of read counts is supported for PANDER provides three approaches for this task: (a) analysis of RNA-seq data. The user also indicates at clustering, (b) network-based grouping, and (c) this stage the organism of the analyzed data set. As biclustering. EXPANDER's species annotation files use specific Clustering divides the DE genes into distinct gene accession IDs (e.g., Entrez Gene IDs for Homo clusters such that genes assigned to the same sapiens), the user should provide a conversion file cluster show highly similar patterns, while those that maps the IDs used in the data file to IDs assigned to different clusters show dissimilar pat- expected by EXPANDER for that organism. (Con- terns. EXPANDER implements three popular clus- version files from probe ids to gene ids are provided tering algorithms: K-means [21], self-organizing by EXPANDER for selected widely used expression maps [22], and CLICK, which was developed in our arrays). The example data file we analyzed here laboratory [23]. In the example data set, CLICK contains 14,992 genes measured over six biological divided the set of DE genes into four clusters (Figs. samples. As initial preprocessing, EXPANDER S1 and 2A). offers multiple normalization schemes. For RNA- Network-based grouping seeks groups of genes seq data, trimmed mean of M values [11], relative log that show highly correlated expression pattern and expression [12], and upper quartile normalization are located in close proximity to each other in some [13] are offered. For microarray expression data or global gene network (e.g., protein–protein interac- processed RNA-seq data (e.g., RPKM data) that tion network). For this task, EXPANDER provides requires further normalization, EXPANDER offers protein–protein interaction network files for selected two normalization schemes: quantile [14] and low- organisms (Table S2) and contains an implementa- ess [15]. The overall structure of the data can then tion of MATISSE, our graph-theoretic algorithm that be inspected using hierarchical clustering and detects significant co-expressed connected subnet- principal component analysis. Subsequent analy- works (network modules) [24]. Such modules, ses are usually performed on a subset of genes that whose definition is based on the combined analysis showed significant expression variation across the of network and expression data, are often more biological conditions [termed differentially biologically meaningful than co-expression clusters, expressed (DE) genes]. EXPANDER offers both as they account also for the network relationships generic tests for DE genes (e.g., moderated T test between the genes. Furthermore, the genes in such implemented by limma [16]) and methods specifi- a module are more likely to be functionally related cally developed for data obtained by expression and act together in the same pathway. Figure 2B arrays (SAM [17]) or RNA-seq read counts data shows a network module of 30 genes detected by (edgeR [18] and DESeq2 [12,19]). In the analyzed MATISSE in our data set. Each module is composed of “forward” and “back” nodes. The forward nodes

Fig. 2. Expression-based gene grouping and enrichment tests. (A) Average expression patterns of the cluster of 367 genes, detected by CLICK in our example data set, showing a sustained induction of expression upon irradiation. In EXPANDER's display, each cluster is represented by the average expression pattern calculated over all the genes in the cluster. Error bars indicate ±SD. Conditions are labeled over the X axis. Y axis shows standardized expression levels. (B) A network module detected by MATISSE in our example data set. This module contains 30 genes, consisting of 17 genes that showed a transient decrease in expression upon irradiation (“forward” nodes colored in blue) and 13 genes (“back” nodes colored in pink) added by the algorithm to induce connectivity of the forward nodes. (C) Results of GO enrichment analysis (using TANGO) for the cluster shown in panel A. This cluster is enriched for genes that function in the regulation of cell proliferation and cell death. (D) Results of motif enrichment analysis (using PRIMA) for the cluster shown in panel A. The promoters of the genes assigned to this cluster were found to be enriched for motifs of several TFs, including p53, which is a well-known key regulator of the transcriptional responses to irradiation. Each bar corresponds to an enriched TF. Clicking on a bar opens a window that lists the genes in the cluster whose promoters contain a hit for the TF motif. h xadrPafr o rncitm Analysis Transcriptome for Platform Expander The (a) (c)

(b)

(d)

Fig. 2. (legend on previous page) 2401 2402 The Expander Platform for Transcriptome Analysis show highly correlated expression pattern in the data [27] and WikiPathways [28]), which examine, re- set, while the back nodes are added by the algorithm spectively, whether each gene set is enriched for to keep high connectivity among the module's particular GO terms or canonical pathways (Fig. 2C genes. Note that the robustness of network-based shows the results of GO enrichment analysis for the grouping building on correlated expression patterns cluster of genes that were induced in response to IR tends to increase with the number of samples. in our data set, and Fig. S3A shows the results for all To further augment network-based analysis, EX- the clusters; Fig. S3B shows enriched GO terms PANDER 8.0 implements a streamlined interface found on MATISSE's module; Fig. S4 shows an with the highly popular Cytoscape platform [9], which enriched KEGG pathway detected in our data set); provides numerous visualization and analytical (c) Chromosome location tests, evaluating if the utilities for the combined analysis of expression genes in any gene set are enriched for any specific and network data. Figure S2 shows the MATISSE chromosomal region (this test could, for example, module imported to Cytoscape. Once loaded into detect mega-base scale deletions and amplifications Cytoscape, users can benefit from the plethora of in cancer cells); (d) Motif analysis aimed at detecting plugins offered by this rich resource. cis-regulatory DNA motifs that are over-represented Clustering assumes that co-expressed genes in the promoter sequences of the co-expressed show a global similarity over all the examined genes, under the assumption that co-expression conditions. This assumption is less likely to hold implies transcriptional co-regulation. EXPANDER when a data set contains a large number of different implements both our PRIMA method [29], which conditions (e.g., above 20). In such cases, bicluster- uses prior information on TF PWMs, and our ing is a preferred alternative for grouping genes AMADEUS algorithm for de novo motif analysis according to expression pattern. A bicluster is a set [30]. Over the last decade, we demonstrated the of genes that show high similarity over a subset of power of this reverse engineering approach to reveal the conditions. A biclustering algorithm can detect a key regulators of numerous biological processes in collection of biclusters, where genes or conditions human and mouse, including cell-cycle progression can take part in more than one bicluster. EXPAND- [29], DNA damage response [31], immune re- ER implements the iterative signature algorithm [25] sponses [32], and cellular differentiation [33] and our SAMBA biclustering algorithm [26], which (Fig. 2D shows the over-represented motifs detected can handle data sets with hundreds to thousands of by PRIMA in the cluster of IR-induced genes); and conditions. (e) miRNA analysis that searches for enriched To increase flexibility, users can upload their own miRNA target sites in 3′ UTRs of co-expressed gene sets and use this partition for subsequent genes (reasoning that co-expression could stem not analyses. only from transcriptional co-regulation but also from co-regulation of mRNA stability controlled by miR- Enrichment tests NAs). EXPANDER implements our FAME algorithm [6], which uses TargetScan predictions [34], utilizing Once sets of co-expressed genes are identified, target site context scores and accounting for the next challenge is to ascribe each set with some possible biases owing to 3′-UTR sequence length. biological function and to discover key regulators The above enrichment tests are applied to clusters that control the observed co-expression. To this end, of genes that passed some test for differential EXPANDER implements five enrichment tests (Fig. expression. However, transcriptomic experiments 1): (a) GO tests using our TANGO algorithm [6], typically include only a small number of replicate which takes into account the dependencies between samples (mostly 1–3 replicates per condition), which GO terms that stem from the hierarchical structure of limits the statistical power of tests for DE genes. An this ontology; (b) Pathway DB tests (including KEGG alternative statistical approach provided by

Fig. 3. EXPANDER's modules for ChIP-seq analysis. (A) A flowchart summarizing the modules implemented in EXPANDER for analysis of ChIP-seq data. (B) Results of de novo motif analysis (using AMADEUS) applied to the 1830 peaks detected in the p53 ChIP-seq data set. Reassuringly, the top enriched motif corresponds to the known binding motif of p53 (accession M00272 in Transfac DB). (C) Mapping of the p53 peaks to their nearest genes, within a distance limit, defined a set of 376 putative p53 target genes. Gene sets derived from ChIP-seq peaks can be tested for differential expression between two selected conditions. Shown here is the fold change in expression upon irradiation (IR 8 h) compared to the control untreated cells (C0). The putative p53 target genes (n = 367) show significant induction compared to a background (Bg) gene set in the data set (n = 11,697; p-value calculated using one-sided Wilcoxon's test). (D) Gene sets derived from ChIP-seq peaks can also be utilized for enrichment tests applied to gene groups defined by EXPANDER based on their expression patterns. In the example shown here, the cluster of 367 genes that showed a sustained induction of expression upon irradiation (shown in Fig. 2A) was significantly enriched for the 376 putative p53 target genes (93 overlapping genes; enrichment factor = 8.35; p = 2.75 × 10−61 (hypergeometric test)). h xadrPafr o rncitm Analysis Transcriptome for Platform Expander The (a) (b)

(d)

(c)

Fig. 3. (legend on previous page) 2403 2404 The Expander Platform for Transcriptome Analysis

EXPANDER for functional interpretation of transcrip- mapped to its closest k genes located within a tomic data is based on implementation of gene set distance of L bp for it (k and L are set by the user; by enrichment analysis (GSEA) [35]. Instead of focus- default, k = 1 and L = 1 Mbp) (Fig. S6A). ing on the set of DE genes, GSEA considers all the The mapping of ChIP-seq peaks to genes pro- genes that are expressed in a data set and ranks duces a gene set of putative targets of the analyzed them based on a score of differential expression TF. This gene set can be combined into the analysis between the compared samples (e.g., fold-change of GE data using two modes (Fig. 3A): (a) differential or T score calculated between treated and control expression analysis, which tests if the genes in the samples). The ranked gene list is then tested against set are significantly up- or down-regulated in the a large number of curated gene sets, seeking those comparison between two conditions selected by the whose genes are significantly concentrated at either user (Fig. 3C), and (b) enrichment tests, which end of the expression list (these ends represent, examine if the gene-set defined by the ChIP-seq respectively, induced and repressed genes). This peaks is enriched for any gene cluster detected sensitive method builds on the amplification of weak based on expression data (as described in the gene signals, achieved by considering the coordinated grouping section above) (Fig. 3D). Collectively, response of many genes that function in the same these features enhance users' ability to gain insights process, where individually most of them show only into the function of TFs and delineate the pathways mild change in expression, not sufficient to reach by which these biological endpoints are mediated statistical significance in per-gene tests. Figure S5 and executed. shows a few results of GSEA on our data set. In conclusion, the widespread scope of the analysis modules provided by EXPANDER, the strength of the Analysis of ChIP-seq data algorithms it implements, and the ease of their access to users make our package a unique software suit for One of the main novel features implemented in analysis of transcriptome data. EXPANDER 8.0 is the support for integrated analysis of GE and ChIP-seq data (Table S2). For this task, users should upload ChIP-seq peaks Methods (detected by a peak calling tool, e.g., MACS2 [36]) in the standard .bed format, and indicate the version EXPANDER is implemented in Java programming of the genome assembly in which peak coordinates language and makes use of components of the R are given (e.g., hg19 or hg38 for H. sapiens). As an statistical language. EXPANDER works under Win- example, we uploaded ChIP-seq results for p53 dows and Linux operating systems (OSs). profiled 2 h after irradiation in the same cell line that EXPANDER requires at least 2 GB RAM in order was used in the expression data set described to functionally work and offers two options to launch above (Table S4) [10]. A total of 1830 p53 binding the application: EXPANDER_2GB.bat (2GB RAM) sites were detected in this ChIP-seq experiments. and EXPANDER_4GB.bat (4GB RAM). Users are Once the peaks .bed file is uploaded, EXPANDER expected to install the following tools according to provides three utilities (Figs. 3A; S6A): (a) Enrich- their OS and 32/64-bit computer architecture: (1) ment for genomic regions. Based on genome Java run-time environment for running Java appli- annotation files, EXPANDER provides statistical cations (https://www.oracle.com/technetwork/java/ summary for the prevalence of peaks located in javase/downloads/jre8-downloads-2133155.html), different genomic categories (e.g., intergenic, up- (2) R statistical language tool (https://cran.r-project. stream TSS/promoter, introns) and tests for enrich- org/), and (3) Cytoscape tool (https://cytoscape.org/ ment of the peaks for any category (Fig. S6B–C). (b) download-platforms.html). Additional information Motif analysis. Enriched DNA motifs are sought in can be found in our online user manual: http://acgt. the peaks' sequences (compared to flanking se- cs.tau.ac.il/expander/help/ver8.0Help/html/. quences) using AMADEUS. (The sequences of the Supplementary data to this article can be found peaks and their control flanking regions are extract- online at https://doi.org/10.1016/j.jmb.2019.05.013. ed from the corresponding genome using our implementation of twoBitToFa utility from UCSC (https://github.com/ENCODE-DCC/kentUtils)). This forms an important quality control step for TF ChIP- seq experiments, as the called peaks are expected Acknowledgments to be enriched for the motif of the examined TF (and potentially, also for motifs of its cofactors) (Fig. 3B). The study is supported in part by the DIP German– (c) Map peaks to nearest genes. To allow integrated Israeli Project cooperation (to R.S. and R.E.), Israel analysis with GE data, the peaks are mapped to their Science Foundation (Grant No. 1339/18 to R.S.), putative target genes. Currently, EXPANDER imple- and Len Blavatnik and the Blavatnik Family founda- ments a naïve approach in which each peak is tion (to R.S). T.A.H. is supported in part by a The Expander Platform for Transcriptome Analysis 2405 fellowship from the Edmond J. Safra Center for cistrome, and epigenome in the cellular response to ionizing Bioinformatics at Tel-Aviv University. R.E. is a radiation, Sci. Signal. 7 (2014) rs3. Faculty Fellow of the Edmond J. Safra Center for [11] M.D. Robinson, A. Oshlack, A scaling normalization method Bioinformatics at Tel Aviv University. for differential expression analysis of RNA-seq data, Genome Biol. 11 (2010) R25. [12] S. Anders, W. Huber, Differential expression analysis for Received 7 February 2019; sequence count data, Genome Biol. 11 (2010) R106. Received in revised form 7 May 2019; [13] J.H. Bullard, E. Purdom, K.D. Hansen, S. Dudoit, Evaluation Available online xxxx of statistical methods for normalization and differential expression in mRNA-Seq experiments, BMC Bioinformatics KeywordsGene expression analysis; 11 (2010) 94. Software package; [14] B.M. Bolstad, R.A. Irizarry, M. Astrand, T.P. Speed, A Transcriptomic analysis; comparison of normalization methods for high density RNA-seq analysis oligonucleotide array data based on variance and bias, Bioinformatics 19 (2003) 185–193. [15] J.A. Berger, S. Hautaniemi, A.K. Jarvinen, H. Edgren, S.K. Mitra, Equal contribution. J. Astola, Optimized LOWESS normalization parameter selec- tion for DNA microarray data, BMC Bioinformatics 5 (2004) 194. [16] M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, G. K. Smyth, limma powers differential expression analyses for Abbreviations used: RNA-sequencing and microarray studies, Nucleic Acids Res. RNA-seq, RNA-sequencing; GE, gene expression; 43 (2015) e47. EXPANDER, EXPression ANalyzer and DisplayER; TF, [17] V.G. Tusher, R. Tibshirani, G. Chu, Significance analysis of transcription factor; IR, ionizing irradiation; DE, differen- microarrays applied to the ionizing radiation response, Proc. – tially expressed; GO, Gene Ontology; GSEA, gene set Natl. Acad. Sci. U. S. A. 98 (2001) 5116 5121. [18] M.D. Robinson, D.J. McCarthy, G.K. Smyth, edgeR: a Biocon- enrichment analysis. ductor package for differential expression analysis of digital gene expression data, Bioinformatics 26 (2010) 139–140. [19] M.I. Love, W. Huber, S. Anders, Moderated estimation of fold References change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15 (2014) 550. [20] W. Tian, L.V. Zhang, M. Tasan, F.D. Gibbons, O.D. King, J. Park, [1] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary Z. Wunderlich, J.M. Cherry, F.P. Roth, Combining guilt-by- tool for transcriptomics, Nat. Rev. Genet. 10 (2009) 57–63. association and guilt-by-profiling to predict Saccharomyces [2] M. Cieslik, A.M. Chinnaiyan, Cancer transcriptome profiling cerevisiae gene function, Genome Biol. 9 (Suppl. 1) (2008) S7. at the juncture of clinical translation, Nat. Rev. Genet. 19 [21] S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho, G.M. (2018) 93–109. Church, Systematic determination of genetic network archi- [3] K.F. Au, V. Sebastiano, The transcriptome of human pluripo- tecture, Nat. Genet. 22 (1999) 281–285. tent stem cells, Curr. Opin. Genet. Dev. 28 (2014) 71–77. [22] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. [4] T.G. Rubin, J.D. Gray, B.S. McEwen, Experience and the Dmitrovsky, E.S. Lander, T.R. Golub, Interpreting patterns of ever-changing brain: what the transcriptome can reveal, gene expression with self-organizing maps: methods and Bioessays 36 (2014) 1072–1081. application to hematopoietic differentiation, Proc. Natl. Acad. [5] R. Shamir, A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, Sci. U. S. A. 96 (1999) 2907–2912. R. Sharan, Y. Shiloh, R. Elkon, EXPANDER—an integrative [23] R. Sharan, A. Maron-Katz, R. Shamir, CLICK and EXPAND- program suite for microarray data analysis, BMC Bioinfor- ER: a system for clustering and visualizing gene expression matics 6 (2005), 232. data, Bioinformatics 19 (2003) 1787–1799. [6] I. Ulitsky, A. Maron-Katz, S. Shavit, D. Sagir, C. Linhart, R. [24] I. Ulitsky, R. Shamir, Identification of functional modules Elkon, A. Tanay, R. Sharan, Y. Shiloh, R. Shamir, Expander: using network topology and high-throughput data, BMC Syst. from expression microarrays to networks and functions, Nat. Biol. 1 (2007) 8. Protoc. 5 (2010) 303–322. [25] S. Bergmann, J. Ihmels, N. Barkai, Iterative signature algorithm [7] T.S. Furey, ChIP-seq and beyond: new and improved for the analysis of large-scale gene expression data, Phys. methodologies to detect and characterize protein-DNA Rev. E Stat. Nonlinear Soft Matter Phys. 67 (2003), 031902. interactions, Nat. Rev. Genet. 13 (2012) 840–852. [26] A. Tanay, R. Sharan, M. Kupiec, R. Shamir, Revealing [8] K. Mitra, A.R. Carvunis, S.K. Ramesh, T. Ideker, Integrative modularity and organization in the yeast molecular network approaches for finding modular structure in biological by integrated analysis of highly heterogeneous genomewide networks, Nat. Rev. Genet. 14 (2013) 719–732. data, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 2981–2986. [9] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, [27] M. Kanehisa, Y. Sato, M. Kawashima, M. Furumichi, M. D. Ramage, N. Amin, B. Schwikowski, T. Ideker, Cytos- Tanabe, KEGG as a reference resource for gene and protein cape: a software environment for integrated models of annotation, Nucleic Acids Res. 44 (2016) D457–D462. biomolecular interaction networks, Genome Res. 13 (2003) [28] D.N. Slenter, M. Kutmon, K. Hanspers, A. Riutta, J. Windsor, 2498–2504. N. Nunes, J. Melius, E. Cirillo, S.L. Coort, D. Digles, et al., [10] S. Rashi-Elkeles, H.J. Warnatz, R. Elkon, A. Kupershtein, Y. WikiPathways: a multifaceted pathway database bridging Chobod, A. Paz, V. Amstislavskiy, M. Sultan, H. Safer, W. metabolomics to other omics research, Nucleic Acids Res. 46 Nietfeld, et al., Parallel profiling of the transcriptome, (2018) D661–D667. 2406 The Expander Platform for Transcriptome Analysis

[29] R. Elkon, C. Linhart, R. Sharan, R. Shamir, Y. Shiloh, [33] R. Elkon, B. Milon, L. Morrison, M. Shah, S. Vijayakumar, M. Genome-wide in silico identification of transcriptional regula- Racherla, C.C. Leitch, L. Silipino, S. Hadi, M. Weiss-Gayet, tors controlling the cell cycle in human cells, Genome Res. 13 et al., RFX transcription factors are essential for hearing in (2003) 773–780. mice, Nat. Commun. 6 (2015) 8549. [30] C. Linhart, Y. Halperin, R. Shamir, Transcription factor and [34] V. Agarwal, G.W. Bell, J.W. Nam, D.P. Bartel, Predicting effective microRNA motif discovery: the Amadeus platform and a microRNA target sites in mammalian mRNAs, Elife 4 (2015). compendium of metazoan target sets, Genome Res. 18 [35] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B. (2008) 1180–1189. L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. [31] R. Elkon, S. Rashi-Elkeles, Y. Lerenthal, C. Linhart, T. Tenne, N. Golub, E.S. Lander, J.P. Mesirov, Gene set enrichment Amariglio, G. Rechavi, R. Shamir, Y. Shiloh, Dissection of a analysis: a knowledge-based approach for interpreting DNA-damage-induced transcriptional network using a combina- genome-wide expression profiles, Proc. Natl. Acad. Sci. U. tion of microarrays, RNA interference and computational S. A. 102 (2005) 15545–15550. promoter analysis, Genome Biol. 6 (2005) R43. [36] T. Liu, Use model-based analysis of ChIP-Seq (MACS) to [32] R. Elkon, C. Linhart, Y. Halperin, Y. Shiloh, R. Shamir, analyze short reads generated by sequencing protein–DNA Functional genomic delineation of TLR-induced transcrip- interactions in embryonic stem cells, Methods Mol. Biol. 1150 tional networks, BMC Genomics 8 (2007) 394. (2014) 81–95.