© 2016 Nature America, Inc. All rights reserved. ‘data highway’ that handles transfers between tools through through tools between transfers a handles and added, that seamlessly highway’ be ‘data can tools which into pad’ launch ( methods analysis and types data across easily work to scientists non-programming enable to resource community and platform protocols. analytic for manual’ ‘laboratory accepted an of absence the in follow,to particularly recipes right the identify to and up tools keep of all with available the biologists—to oriented computationally less for especially but experts, for ficult—even to Moreover,work together. sources it is dif different from tools getting of complications the and tools various visualization and analysis use to need the between gap ever-growing an is There biologists. many for challenge enormous an remains tools ware soft multiple with types data diverse of analysis integrative The investigators through high-utility analysis tasks. it offers a growing set of ‘recipes’, short workflowsto guide T interaction of 20 bioinformatics tools and data resources. community resource that currently supports the streamlined ( the biomedical research community. We introduce Genome software tools in concert and remain challenging for much of C Howard Y Chang Cambridge, Massachusetts, USA. Genomics Institute, University of SantaCalifornia, SantaCruz, USA. California, Cruz, of Rehovot,Science, Israel. 6 Massachusetts, USA. USA. 1 Eran Segal Brian T Lee Gil Ben-Artzi James T Robinson Diego Borges-Rivera Helga Thorvaldsdottir Kun Qu in Genome of bioinformatics tools analysis by interoperation I Received Received 15 Ap Correspondence should be addressed to J.P.M. ( Department Department of Computer andScience Applied Mathematics, Weizmann Institute of Rehovot,Science, Israel. Program in Epithelial Biology, Stanford University School of Medicine, Stanford, USA. California, h o o facilitate integrative analysis by non-programmers, h omplex biomedical analyses require the use of multiple ntegrative genomic t Here, we present , an open-source interoperability t t t p p 3 : Department Department of Medicine, University of San California, Diego, La Jolla, USA. California, : / / / / w w w 1 w , w 12 w 6 r , Trey Ideker . . g , , Sara Garamszegi il il 2015; g 9 e e , Robert , M Robert Kuhn n n 6 5 o o , Program in Translational NeuroPsychiatric Genomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA. 7 m m , , Daniel Blankenberg a e e 1 ccepted ccepted 4 s 8 s 2 , Department Department ofand Biochemistry Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA. 11 p p , 3 a a , Barry Demchak , Barry & Jill P Mesirov 4 c c , , Nathalie Pochet e e 3 S 2 11 . . , Michael Reich , Ted Liefeld o o Howard Hughes Medical Institute, Stanford University, Stanford, USA. California, r r pace D g g ecembe ), ), a cloud-based, cooperative ). GenomeSpace provides a ‘tool ‘tool a provides GenomeSpace ). 2 9 , , , Anton Nekrutenko 12 r [email protected] 2015; published online 18 J , , Felix Wu 2 , 3 2 2 8 , Marco Ocana 3 , , 3 , , Galt P Barber 3 , , Tim Hull 2 , Aviv Regev , 5 ,

2 , 12 ). ,

3

,

S 2 8

, pace pace 4 , 2 a , 10 , nu 9 3 , , - - , ar

y y 2016; 10 recapitulated the steps and recapitulated of results analyses published and to design For development. drive GenomeSpace example, we Our consortium labs provided resources. projects biological and needs analytical data and tools 20 connects now GenomeSpace (IGV) Viewer Genomics tools bioinformatics popular six ( of teams development and and developers. tool users of community the for help and contact of point and analysis tasks. The website as serves a base, knowledge newsstand to guides as quick and serve tools multiple how to strate leverage demon that cases use high-utility of set growing a is Resource Recipe GenomeSpace The conversions. the scripting and identi of fying burden the of scientists relieving converters, format steps required to perform a genomic analysis—a challenging task steps to required a challenging perform genomic analysis—a genom storage. local or accounts cloud processed within data additional collect and operations, launching simple data files, move analysis results input into it other tools interface feed as needed through simultaneously web and tool a the launch can from researcher a summary, In interface. web unifying launched outside of GenomeSpace); and when (iv) a as lightweight, simple, functionality and interface user same the presents and environment native its retains tool (each converters format file by ‘behind-the-scenes’ facilitated all tools, between analyses and data move to and tools launch to ability the (iii) S3)); Amazon Drive, Google (Dropbox, accounts cloud other to connections supports also GenomeSpace and storage, cloud of allocation an receive holders account (all capabilities data-sharing alongside types, storage cloud of variety a in management dataset easy (ii) ( applications of range broad a spanning tools resident of collection the (i) entry: user to barrier a low with analysis tive has GenomeSpace features several that together facilitate integra it. within tools and GenomeSpace platform the only using biologists non-programming by originally While tools. performed be now can work this scripting, between substantial requiring transfers data multiple and methods and steps analytical types, data diverse required study ( cells stem cancer human in networks regulatory gene the visualizing ( GenomeSpace Howard Hughes Medical Institute, Massachusetts Institute of Technology, Initially seeded by a consortium of biology research labs labs research biology of consortium a by seeded Initially We the developed GenomeSpace Recipe Resource ( ( user’sperspective a From d o 4 Supplementary Note 1 Note Supplementary Department Department of Biology, Massachusetts Institute of Technology, Cambridge, i : espace.org/recipe 1 0 2 . The The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 1 0 1 3 , 2 8 , , Galaxy / n nature methods m Supplementary Figs. 1 Figs. Supplementary e 7 t Department Department of Molecular Biology,Cell Weizmann Institute h . 3 7 3 3 , , GenePattern 2 12 s brief communications These These authors contributed equally to this work. ) to aid researchers in identifying the the identifying in researchers aid to ) 6 n te CC al Browser Table UCSC the and Fig. Fig. , ,

Supplementary Figs. 2 Figs. Supplementary | ADVANCE 1 and and 4 , , Genomica Supplementary Fig. 6 Fig. Supplementary and and

ONLINE 2 ), dissecting and and dissecting ), 9 5 UC UC Santa Cruz

, , the Integrative PUBLICATION http://www. 8 Table – , 9 5 within within ). The The ). | 1 7

), ), ); ); ), ),  - - -

© 2016 Nature America, Inc. All rights reserved. prefer one tothe other. a Reactome Multiple Myeloma Genomics Portal (MMGP) cBioPortal for Cancer Genomics Cancer Cell Line Encyclopedia (CCLE) Project Achilles I UCSC Table Browser Synapse InSilicoDB ArrayExpress D ISAcreator Integrative Genomics Viewer (IGV) Gitools Molecular Signatures Database (MSigDB) Online Tools Genomica GenePattern Galaxy Cytoscape 2 Cytoscape Cytoscape 3 Cistrome A T T workflow. the through users walk help that videos and guides screenshot include Recipes results. of the interpretation possible problem, a relevant example dataset, detailed recipe steps and one ago. decades three biology molecular ratize guide lab classical A Laboratory Manual the after modeled is Resource Recipe our of notion The analyses. complex more part of as employ can investigators that tasks research important commoditize but tools, three or two involving short, generally hensive of descriptions cross-tool analysis areworkflows. Recipes compre i.e., recipes: of ‘cookbook’ or collection a providing by forflexible exploratory research. We took an alternative approach the embody entire workflow of a study, they may be insufficiently can pipelines preconstructed Although analyses. short for even  bioinformatics tools. bioinformatics F Cytoscape 3and Cytoscape2havedifferent underlying architectures and different userinterfaces. Bothversions are made availablethrough GenomeSpace toaccommodate userswho may ntegrated portals (data and analysis) ool name platforms foruserdata igure igure able able 1

ata ata resources nalysis and tools visualization brief communications Cloud-based storage | Each GenomeSpace recipe contains a motivating biological biological motivating a contains recipe GenomeSpace Each ADVANCE Google Drive Amazon S3 Dropbox 1 | | GenomeSpace provides access to a diverse set of bioinformatics tools and resources The GenomeSpace environment for interoperation of interoperation for environment GenomeSpace The a a

ONLINE

PUBLICATION 1 0 GenomeSpace component , , which used a similar approach to democ transfer manager connection an Tool anddata | nature methods d

s Molecular Cloning: Cloning: Molecular Institute and New York University Medical Center Ontario Institute for Cancer Research, European Bioinformatics Multiple Myeloma Research Consortium, Broad Institute and Memorial Sloan Kettering Cancer Center Broad Institute and Novartis Institutes for Biomedical Research Dana-Farber Dana-Farber Cancer Institute and Broad Institute University of California Santa Cruz Sage Bionetworks InSilico Genomics University University Pompeu Fabra, Barcelona European Bioinformatics Institute Broad Institute and UCSD University of Oxford Broad Institute and UCSD Columbia University Weizmann Institute of Science Broad Institute and University of California, San Diego (UCSD) Pennsylvania State University and Johns Hopkins University Cytoscape Consortium Cytoscape Consortium Dana-Farber Cancer Institute Bioinformatics tools and dataresources Translational Genomics Research Institute (TGen) Tool Tool Tool sourc sourc sourc Data Data Data e e e - -

functions for genes in copy number variation (CNV) regions” regions” ( (CNV) variation number copy in genes for functions biological “Identify example, recipe second a describe also We by differentiation hematopoietic of ulation ( SMAD1 phenotype leukemic a to a normal from transformation with are correlated that processes of an oncogene, cells were transformed into leukemia stem cells by progenitor the introduction granulocyte-macrophage which in study a from data is gene expression recipe this with provided dataset The example ( (GO) Ontology Gene the via networks expressed genes and annotates the biological functions within sub differentially between interactions network identifies recipe this ated biological functions.” Briefly, given a gene expression dataset, subnetworks of expressed differentially genes and associ identify recipes. existing improve to ideas and recipes multitool new for crowdsourced, collaborative effort, and we encourage suggestions a collection recipe make to vehicles media social Weadding are ( itself GenomeSpace for using utilities basic as well as analyses genomic diverse cover recipes Current one of several approaches: aggregators host a large number of of number command line tools (Galaxy, architectures plug-in GenePattern); large a host aggregators approaches: several of one one each used have to efforts interoperability cross-tool connect Recent individually. to need the circumventing and sources, tools data GenomeSpace-connected all to access simultaneous tools developers’ independent of giving community. also capabilities while the developer GenomeSpace extending by the benefit by mutual provides contributed This tools diverse of O Supplementary Note 2 Supplementary rganization An example from the Recipe Resource illustrates how to “Find illustrates Resource Recipe the from An example An important design goal was to facilitate rapid addition addition rapid facilitate to was goal design important An -dependent signaling, a process associated with the reg the with associated process a signaling, -dependent MLL-AF9 and and (ref.

Supplementary Figs. 9 Figs. Supplementary upeetr Fg 8 Fig. Supplementary 1 1 ). ). Applying the recipe identifies

h h h h h h h h h h h h h h h h h h h h t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t p p p p p p p p p p p p p p p p p p p p : : : : : : : : : : : : : : : : : : : : / / / / / / / / / / / / / / / / / / / / Supplementary Table 1 Supplementary / / / / / / / / / / / / / / / / / / / / Supplementary Fig. 7 Fig. Supplementary w b w b b g s i w w w w w w g w w w w w n y e e r r r w w w w w w w w w w w w w s o o o Project website n n n i w w w w w w w w w w w w w a a a a o o l i d d d p ...... m m TGF- c r c e m i i g g g g c c c i i i s g s o e b y y i n n n b e e i a e i e a s s a v t d c t t i s s s w n i l . . i t - o . o o a o c . a u b o t t t g o e r t a o o t i i i x p . s s c o . r d o p r t t t c w o c r g c c y l s o g m u u u b o k s a . β o m a a c p

e r u

. . t t t b l t e . m p p t o r i o s and and e e e e k t e e a o z . . e e r e . . . r . / and and o

d o m n l g j o o o g . . o r ), such as as such ), . a e r u o o r c

n o r r r

r g a r g c g g g

h r r g r . r n

g g t

o a g / / /

. . n

o m c a y BMP

r o g . 10 r c e c r a g m

l g h x e c

p

i g .

). l i r p l l e e

1 s s 2 ). ). ). ). s

- - - .

© 2016 Nature America, Inc. All rights reserved. o Note: Any Information Supplementary and Source Data files are available in the v and are Methods references any in available the associated M tools. and methods analysis new to investigators introduce also can and scenarios analysis complex more into assembled be that can provide workflows short recipes reproducibility. Fourth, and robustness for findings their test to alternatives use and iar that so are the they tool can with which investigators choose most famil capabilities similar with tools multiple of inclusion the encourage we Third, analyses. integrative of interpretation the enriching tool, single any with in than diversity examined and be depth greater to data allows tools connected of set large the Second, biologists. many for barrier insurmountable an scripts, conversion custom for need the obviate and tools between data moving and launching as such tasks speed converters format file gists. First, it tools. transition allows Automaticbetween seamless biolo to accessible options of universe the expand and analysis GenomeSpace. of outside developed are expand the set of supported formats by converters leveraging that can we model, data GenomeSpace-specific a on rely not do and Moreover, are to converters independent GenomeSpace. because connecting ones, legacy supporting especially tools, for and models data central defining of burden development the obviates ( formats of file pairs between converting directly for converters file of range a offers Methods). (Online platform the to connect to the amount of minimizing and required while effort tools, web-based desktop diverse among this Moreover, interoperation supports providers. approach storage cloud multiple to interface security resources; data mechanisms and and user-controlled levels of sharing; and a common tools GenomeSpace for sign-on ( platform The resulting systems. and aggregating messaging both of aspects combines approach hybrid lightweight, open-source, (MeDICi tools between instructions and data package basic a of functionality (Cytoscape, geWorkbench the extend to way a provide Supplementary Fig. 11 Fig. Supplementary n e ethods l GenomeSpace thus provides several benefits that help ease ease help that benefits several provides thus GenomeSpace GenomeSpace interoperability, cross-tool facilitate Tofurther r i n s e i

o v n e r

s o i o f n

t

o h f e

t

h p e a

p p a e p r e . r . Supplementary Note 3 Supplementary and Online Methods) provides single single provides Methods) Online and 1 3 , MeV 1 4 ); and messaging systems send ). Direct conversion conversion Direct ). 1 5 , Gaggle , 1 6 o ). Our Our ). n l i n e - -

R The authors declare no competing financial interests. COM manuscript as submitted. and J.P.M. wrote the manuscript. All authors reviewed and approved the final J.P.M. supervised the GenomeSpace project. K.Q., S.G., F.W., H.T., M.R., H.C., A.R. also consulted on the GenomeSpace architecture. H.T., M.R., A.R., H.Y.C. and by J.T.R., B.D., T.H., G.B.-A., D.B., G.P.B., B.T.L., R.M.K., A.N., E.S. and T.I., who by S.G., F.W. and D.B.-R. The GenomeSpace seed tools were added to the system supervision and input from A.R., H.Y.C. and J.P.M. The recipes were implemented and N.P. implemented the driving biological projects within GenomeSpace with M.R. designed and implemented the GenomeSpace software. K.Q., S.G., F.W. M.R., A.R. and J.P.M. conceived of the GenomeSpace concept. T.L., M.O. and AUTHOR Amazon Web Services (AWS). Institute P01 HG005062 and U41 HG007517, with additional initial support from supported by US National Institutes of Health–National Human Genome Research citations and figures, and L. Gaffney for help with the figures. This work has been stages of the GenomeSpace project. We thank J. Bistline for help with the J. Kent (University of California, Santa Cruz) for their involvement in the nascent M. Smoot (University of California, San Diego). Special thanks to D. Haussler and Institute of MIT and Harvard); J. Zhang (Stanford University); and H. Carter and contributions and input: P. Carr, B. Hill-Meyers, S.H. Lee and T. Tabor (Broad We thank other members of the GenomeSpace and GenePattern Teams for their A 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. c o eprints and permissions information is available online at ckno

m

/ P 7 Society,2008). Computer (IEEE 95–104 E.) Woods, & D. P.,Garlan, Kruchten, (eds. Conference IEEE/IFIP Working Seventh 2008. WICSA 2008. Architecture, 26 Manual (2004). Shannon, P.T., Reiss, D.J., Bonneau, R. & Baliga, N.S. Baliga, & R. Bonneau, P.T.,D.J., Shannon,Reiss, in J. Chatterton, & J. Almquist, A., Wynne, I., Gorton, A.I. Saeed, A. Califano, & J. Watkinson, Z., Ji, K., Smith, A., Floratos, S. Karlsson, & J. Larsson, A.V.Krivtsov, T.Maniatis, & E.F. Fritsch, J., Sambrook, D.J. Wong, I. Ben-Porath, D. Karolchik, J.T.Robinson, Regev,A. & Koller,D. N., Friedman, E., Segal, M. Reich, Giardine,B. P.Shannon, B. Demchak, r ETIN , 176 (2006) 176 , e w , 1779–1780 (2010). 1779–1780 , p

r led CONTRIBUTIONS i n G G vol. 3 (Cold Spring Harbor Laboratory Press, 1989). Press, Laboratory Harbor Spring (Cold 3 vol. t FINANCIAL s g / ments i et al. et n et al. et et al. et d et al. et et al. et et al. et e et al. et et al. et . x nature methods et al. et et al. et

. h Nat. Genet. Nat.

t

Cell Stem Cell Stem Cell

Biotechniques

m Genome Res. Genome

Genome Res. Genome

F1000Res. INTERESTS Nucleic Acids Res. Acids Nucleic Nature

l . Nat. Biotechnol. Nat. Nat. Genet. Nat. brief communications

Oncogene

442 3

8 3

, 500–501 (2006). 500–501 ,

, 151 (2014). 151 , , 818–822 (2006). 818–822 , 1 2 13

3 5 40 , 333–344 (2008). 333–344 , 4

, 1451–1455 (2005). 1451–1455 , , 2498–2504 (2003). 2498–2504 , , 374–378 (2003). 374–378 , |

, 499–507 (2008). 499–507 , ADVANCE 24

29 3 , 5676–5692 (2005). 5676–5692 , 2 , 24–26 (2011). 24–26 , , D493–D496 (2004). D493–D496 , Molecular Cloning: A Laboratory A Cloning: Molecular

Nat. Genet. Nat. ONLINE BMC Bioinformatics BMC

h Software PUBLICATION t t

p 3 Bioinformatics : 6 / / , 1090–1098 , w w w . n a

t u |

r e

 . © 2016 Nature America, Inc. All rights reserved. connection layer includes a collection of web services with with services web of collection a includes GenomeSpace. layer to connection tools Connecting a smooth script-free connection between tools. Dropbox), data sharing, and the file format conversions that transferprovide to/from the cloud (including Amazon S3, Google toolsDrive, from within a tool; and(3) A Data Manager handles data storage, launches,tool includingability launchtheto otherGenomeSpace coordinatesand dependenciescapabilities and tool aboutmation ToolsAnalysismaintainsManagerinfor An (2) users. ofgroups default, but users may share directories or files with other by users or private are data All agencies. government and organizations compliantarerequirementswhich the withmany fromstandards AWSmechanisms,Amazon security the leverages GenomeSpace including single sign-on to the GenomeSpace tools, and data access. components. three (1) An Identity manages of Service sign-on credentials, consists It (EC2). Cloud Compute Elastic Amazon the in (AMI) InstanceMachine Amazon an as runs currently server acts with the server through these entry points. The GenomeSpace tionality ( points to the GenomeSpace server that provideslayer’ the thatcore includessystem func a collection of webGenomeSpace architecture. services with well-defined entry ONLINE nature methods

METHODS Supplementary Fig. 11 GenomeSpace presents a ‘connection ). The web user interface also inter The GenomeSpace GenomeSpace The - - -

under the GNU LGPL version 2.1 license. Technical resources resources Technical license. 2.1 version LGPL GNU the under on available is code. source GenomeSpace members— aggregator current Galaxy. and its GenePattern of either GenomeSpace via the join community can do interface that user own tools their have command-line not that note We it GenomeSpace. that to source data a as reported portal team web-based this connect to hour an development took the and Center, Cancer ( cBioPortal was community the to join tool recent the most The tool. on takes of type on file the typically depending or less, days programmer of two order resources, including to these tool a using tasks, Adding panels. GenomeSpace, user authentication common and dialogs for chooser available are widgets interface that user of Developers number a of language. advantage take any also can for (API) interface programming application RESTful a with services in web as or developed languages, tools those for Java kits as development that available client is JavaScript It server and functionality. GenomeSpace system core the the to provides points entry well-defined at at for software developers are available on the GenomeSpace website h t h t t p t : p / : / / w / g w e n w o . c m b e i o s h p p t t a o p c r : e t / a . / o b l . r i o g t r b / g f u o ) from Memorial Sloan Kettering Kettering Sloan Memorial from ) c r The GenomeSpace source code code source GenomeSpace The k - d e e t . v o e r l g o / p G e e r s n / o . doi: m e 10.1038/nmeth.3732 S p a c e / c o m b i n e d ,