GenomeSpace: An Environment for Frictionless Bioinformatics

Michael Reich, Ted Liefeld, Marco Ocana, Dongkeun Jang, Jon Bistline, James Robinson, Peter Carr, Barbara Hill, Judith McLaughlin, Nathalie Pochet, Diego Borges-Rivera, Thorin Tabor, Helga Thorvaldsdóttir, Aviv Regev, Jill P. Mesirov

www.genomespace.org

Background

Genomic research increasingly involves the generation and analysis of data across multiple modalities, e.g. sequence variation, gene expression, epigenetics, proteomics. These efforts are limited, however, by the difficulty of analyzing and integrating results from these multiple modalities. Each mode has its own tools, and the tools are seldom designed to work together.

To address these challenges, we have developed GenomeSpace, a lightweight, cloud-based infrastructure to allow genomics tools to share data seamlessly. GenomeSpace aims to knock down the barriers between tools, freeing researchers to perform analyses and investigate hypotheses that previously were too difficult to consider.

Features

GenomeSpace makes it easy for biologists to use the tools they already know to perform analyses and to find other tools that can help them extend their research into new areas. GenomeSpace features include:

Seamless transfer of data between tools
  GenomeSpace automatically converts file formats, removing the need to write scripts and “glue” code.

Easy import of data from public repositories
  Users can transfer data directly from Web-based resources to their genomics tools without the need to download first.

Connect your own cloud storage accounts
  Add your own Dropbox, Amazon or (coming soon) Google Drive accounts easily.

Connect, Collaborate, Combine
  Organize and manage your cloud-based project folders just like files on your desktop.
  Share folders or files with your selected individuals, the public or specific groups of users.

Interface

[Screenshot of the GenomeSpace interface with numbered callouts; icons shown include Table Browser, IGV, ArrayExpress and MSigDB]

1. Easily manage your files and directories and preview files resident in the cloud.
2. One-click launching of analysis tools on your datasets.
3. Customize your display according to the types of analyses you wish to perform.
4. Organize and create groups of users (e.g. your project team) and add your local tools to GenomeSpace.
5. Get help with the analysis tools and GenomeSpace itself.
6. Manage your account information.
7. Context menus on files and directories just like desktop applications.
8. One-click launching of analysis tools on datasets.

Recipes

A collection of "recipes" provides quick guides to accomplishing tasks using the GenomeSpace tools:

  Find differentially expressed subnetworks
  Find differentially expressed genes in RNA-Seq data
  Preprocess and quality check RNA-Seq data
  Identify and visualize expressed transcripts in RNA-Seq data
  Identify and annotate coding variants from whole exome sequencing (WES) data
  Identify biological functions for genes in copy number variation (CNV) regions
  Identify an up- or down-regulated pathway from expression data

Driving Biological Projects

GenomeSpace development is done in collaboration with two Driving Biological Projects (DBPs), which provide scientific direction as well as a collection of target research scenarios and analytic workflows.

Dissection of regulatory networks in cancer stem cells by comparative network analysis with embryonic stem cells. Regulatory networks of embryonic, induced pluripotent, and induced cancer stem cells are compared to find key differing regulatory networks. (Chan Lab, Stanford)

Functional characterization of lincRNAs in mammalian genomes by integrating epigenomic, transcription, RNA sequencing, RNAi, WGAS, and other modalities. (Regev Lab, Broad)

Seed Tools

  Cytoscape (UCSD)
  Galaxy (Penn State University)
  GenePattern (Broad Institute)
  Genomica (Weizmann Institute)
  IGV (Broad Institute)
  UCSC Browser (UCSC)

New Tools

  Cistrome (Dana-Farber)
  InSilico DB (University of Brussels)
  ArrayExpress (EMBL-EBI)
  geWorkbench (Columbia University)
  Synapse (Sage Bionetworks) - Available soon
  ISAcreator (U. of Oxford)
  Reactome (Ontario Institute for Cancer Research) - Available soon
  MSigDB (The Broad Institute)

How You Can Participate

We are seeking genomic researchers, bioinformatics tool developers, and data repository providers who are interested in joining and expanding the GenomeSpace community. See www.genomespace.org

GenomeSpace in Action: Finding transcription factor regulators of human hematopoiesis

This example GenomeSpace scenario reproduces part of the DMap analysis from the Regev lab paper in Cell, Novershtern et al., 2010. At each step, GenomeSpace performs all data conversions and transfers between tools.

1. Extract transcription factors
   a. Load expression data containing 200 samples and 8000 genes.
   b. Load a gene set containing Gene Ontology (GO) transcription factors.
   c. Save the expression data from only the GO transcription factors to the GenomeSpace Data Manager.

2. Compute differentially expressed transcription factors (GenePattern)
   a. Perform differential expression analysis to determine genes that significantly distinguish human embryonic stem cells (hESCs) versus differentiated cells.

3. Identify module networks (Genomica)
   a. Compute module networks to determine “modules” of coexpressed genes within the original expression dataset.
   b. Load the lineage-specific transcription factors generated by GenePattern.
   c. Use these two datasets to generate a list of potential regulators.

4. Compute overlaps (Galaxy)
   a. Upload annotation tracks for the genomic locations of the putative regulators, a set of previously published SNPs and a set of linkage regions from a genome-wide association study.
   b. Run an overlap analysis to determine the intersection of putative regulators, SNPs, and linkage regions.

5. Visualize data (IGV)
   a. Load annotation tracks for the 3 types of data in step 4 into IGV.
   b. View the concordance between the locations of the analytically identified potential regulators and the previously published SNPs and linkage regions.

Data flow through GenomeSpace (workflow diagram captions):

1. User saves the expression data from the GO transcription factors to GenomeSpace.
2. User performs differential expression using the expression data loaded from GenomeSpace.
3. User loads the lineage-specific transcription factors generated in GenePattern to Genomica through GenomeSpace.
4. User uploads BED annotation tracks to Galaxy and IGV through GenomeSpace.

[Figure label: Hematologic disorders]
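
To make two of the steps in this scenario concrete, here is a minimal Python sketch of what "save the expression data from only the GO transcription factors" (step 1c) and "run an overlap analysis" (step 4b) amount to computationally. It is an illustration only, not the GenePattern or Galaxy implementation. The function names, the in-memory data layout, the gene names and the coordinates are all hypothetical. It assumes BED-style 0-based, half-open intervals, and it reads "intersection" as "regulators that overlap at least one SNP and at least one linkage region", which is one possible interpretation.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (chromosome, start, end), assumed BED-style: 0-based, half-open.
Interval = Tuple[str, int, int]


def filter_to_gene_set(
    expression: Dict[str, List[float]], gene_set: Iterable[str]
) -> Dict[str, List[float]]:
    """Keep only the expression rows whose gene name is in gene_set (cf. step 1c)."""
    wanted: Set[str] = set(gene_set)
    return {gene: values for gene, values in expression.items() if gene in wanted}


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_tracks(
    regulators: Dict[str, Interval],
    snps: List[Interval],
    linkage_regions: List[Interval],
) -> List[str]:
    """Names of regulators overlapping both a SNP and a linkage region (cf. step 4b)."""
    hits = []
    for name, locus in regulators.items():
        near_snp = any(overlaps(locus, snp) for snp in snps)
        in_linkage = any(overlaps(locus, region) for region in linkage_regions)
        if near_snp and in_linkage:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up toy data; not taken from the DMap analysis.
    expression = {"TF_A": [1.0, 2.5], "TF_B": [0.3, 0.1], "GENE_X": [4.0, 4.2]}
    tf_only = filter_to_gene_set(expression, ["TF_A", "TF_B"])
    print(sorted(tf_only))

    regulators = {"TF_A": ("chr1", 100, 200), "TF_B": ("chr2", 50, 80)}
    snps = [("chr1", 150, 151), ("chr2", 500, 501)]
    linkage = [("chr1", 0, 1000)]
    print(intersect_tracks(regulators, snps, linkage))
```

The pairwise scan is quadratic in the number of intervals, which is acceptable for a handful of candidate regulators; genome-scale tracks would normally be sorted or indexed first.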

IGV multi-locus view