Ted Liefeld Cancer Informatics Development Broad Institute of MIT and Harvard Need: Insights through integra ve studies
Gene mutation causing Leigh Syndrome Discovery of 3 new genes involved in French Canadian Type (LFSC) and 8 Glioblastoma Multiforme (NF1, ERRB2, other mitochondrial diseases PIK3R1); Confirmation of TP53, PTEN, Integrate: candidate genomic region, EGFR, RB1, PIK3CA mitochondrial proteomic data, and cancer Integrate: DNA sequence, copy number, expression compendium methylation aberrations and expression Authors: Mootha et al. 2003, Calvo et al. 2006 profiles in 206 glioblastomas Authors: TCGA Research Network 2008 Subtle repression of oxphos genes in diabetic muscle: role for mitochondrial ~3000 novel, large non-coding RNAs with dysregulation in diabetes pathology; new functions in development, the immune computational approach Gene Set response and cancer Enrichment Analysis Integrate: Genome sequences from 21 Integrate: Gene sets/pathways & processes mammals, epigenomic maps, and with expression profiles expression profiles Authors: Mootha et al. 2003, Subramanian & Tamayo Authors: Guttman, Rinn et al. 2009 et al. 2005 Characterization of disease subtypes and IKBKE as a new breast cancer oncogene improved risk stratification for Integrate: RNAi screens, transformation medulloblastoma patients of activated kinases, and copy number Integrate: Copy number, expression, from SNP arrays of cell lines clinical data for 96 medulloblastoma Authors: Boehm et al. 2007 patients Authors: Tamayo et al., Cho et al. 2011 The Challenges
• Flood of high throughput biological data genomic sequence, global mRNA expression profiles, copy number and LOH, epigene c data, protein level and modifica on status, metabolite profiles
• Prolifera on of tools Databases, visualiza on, and analysis Difficulty of ge ng tools to work together Access, analyze, visualize each data type separately Integra ve Genomics Use Case
GenePattern Cytoscape IGV/UCSC Genomica CMAP iii
1 Compendium Differentially Expressed Genes GCT GXP Load compendium Arrests SIF ODF GMT Show module G2/M 3 map Show i Network 2 GSEA test network GXC GRP enrichment 5 Show Extract HTML 4 module Expand +1 Chromosome Idea (include NA gene list ii Alterations neighbors) 6 Add Transcription Learn p53 atcgcgtttattcgataagg! Factor track site/score on atcgcgttttttcgataagg! Added to GenePattern from UCSC promoter Pathway iv activation v Expression 8 7 Test for Looks closeGFF GXAsimilarity of Conclusion to p53 site p53 and gene location vi 12 steps, 6 tools, 7 transi ons
1 i GenePattern Cytoscape IGV Genomica 2 3 ii iii CMAP UCSC Browser
Analysis step 4 5 iv v Analysis conclusion Within tool Across tools
6 8 The Need
• A lightweight “connec on layer” for a wide variety of integra ve genomic analyses o Support for all types of resource: Web-‐based, desktop, etc. o Automa c conversion of data formats between tools o Easy access to data from any loca on o Any tool that joins is automa cally connected to the community of tools o Ease of entry into the environment Cloud-based storage
API connectivity layer Online community to share diverse computational tools
3 Driving Biological 6 Seed Tools Projects
Cytoscape lincRNAs Galaxy GenePa ern Cancer stem cells Genomica IGV Pa ent Stra fica on UCSC Browser
Outreach: new Outreach: new tools www.genomespace.org DPBs GenomeSpace Principles
• Aimed towards non-‐programming users
• Support interoperability through automa c cross-‐tool data transfer
• Requires minimal changes to tools
Open Source, LGPL @genomespace www.genomespace.org GenomeSpace Components
Galaxy GenePattern Cytoscape Integrative Genomics Viewer
GS Enabled Tools
1 Authentication and Authorization
2 3 Analysis and Data Manager Tool Manager
Genome Space GenomeSpace Server Project Data …
External Data Sources & Tools Seed Tools
Cytoscape Galaxy GenePattern
Genomica IGV UCSC Genome Browser Using GenomeSpace
Tools and Data Sources Actions
Cloud- based filesystem GenomeSpace Ac ons GenomeSpace Tool Enablement: IGV GenomeSpace Tool Enablement: Galaxy GenomeSpace Data Source Integration: InSilico DB Other collabora ng projects
• Cistrome (Dana Farber Cancer Ins tute) • Reactome (Ontario Ins tute for Cancer Research) • geWorkbench (Columbia University) • Taverna/MyExperiment (University of Manchester) • ArrayExpress (European Bioinforma cs Ins tute) • Na onal Center for Biomedical Ontology (Stanford University) • ISA-‐TOOLS (Oxford University) • Archon Genomics X-‐Prize DBP3: Studying the regulatory control of human hematopoiesis DBP3: Studying the regulatory control of human hematopoiesis – Overview
Part 1: Data pre-processing Part 3: Studying the Part 4: cis-regulatory and quality control transcriptional program site analysis Genomica From 2 From part 3 3 2 4 part 2 GenePattern 2 3 4 1 IGV To part 2 2 2 1 1 1 From part 2 1 Cytoscape Galaxy 1 1 1 1 1 2 2 Manual step 3 2 To part 5 1 2 Analysis section 1 Part 5: Finding new 4 Analysis step (# steps) transcription factor regulators 1 1 Analysis conclusion From part 4 Optional choices Part 2: Basic analysis 1 Currently integrated From part 1 To part 3 3 1 2 3 2 Not yet integrated 2 2 1 2 2 1 3+ 2 1 2 1
To part 4 Part 5: Finding new transcription factor regulators
From part 4
Step 1 2 3 Step 4
Step 2 2
Step 3 2 Step 1: Create transcription factor dataset in Genomica and save to GenomeSpace Step 2: Send transcription factor datasets into GenePattern Step 2: Perform differential expression analysis in GenePattern Step 3: Send differentially expressed genes to Genomica Step 3: perform module network analysis in Genomica Step 4: Visualize regulators with known SNPs and linkage regions Step 4: Visualize regulators with known SNPs and linkage regions GenomeSpaceDeployment Architecture Architecture
Clients Amazon REST GS UI Clustered
Identity
REST Service IGV (OpenID)
Analysis REST CDK Gene- Task
Pattern Manager https (ATM) REST
Galaxy Simple DB
History REST UCSC REST Data Manager (DM) External REST Data Sources
S3 File transfers 28 Adding it to your Galaxy h p://wiki.g2.bx.psu.edu/GenomeSpace
1. Pull the changesets 2. Add GenomeSpace to your toolbox 3. Enable OpenID in your config file 4. Restart Join the GenomeSpace community
We’re looking for Researchers with driving biological projects Developers & Data providers – Add your tools – Contribute format converters – Build new infrastructure Acknowledgements
GenomeSpace Collaborators Broad Institute Cytoscape: Trey Ideker Lab, UCSD Michael Reich Galaxy: Anton Nekrutenko Lab, Penn State University Helga Thorvaldsdottir Genomica: Eran Segal Lab, Weizmann Institute Jim Robinson UCSC Browser Team Marco Ocana GenePattern Team Jill Mesirov, PI IGV Team
Driving Biological Projects Howard Chang Lab – Stanford University Aviv Regev Lab – Broad Institute Funding
[email protected] www.genomespace.org @genomespace Ques ons?