Software for Systems Biology: from Tools to Integrated Platforms

REVIEWS | STUDY DESIGNS

Samik Ghosh*, Yukiko Matsuoka*‡, Yoshiyuki Asai§, Kun-Yi Hsin§ and Hiroaki Kitano*§||

*The Systems Biology Institute, 5F Falcon Building, 5‑6‑9 Shirokanedai, Minato, Tokyo 108‑0071, Japan. ‡JST ERATO Kawaoka Infection-induced Host Response Project, 4‑6‑1 Shirokanedai, Minato, Tokyo 108‑8639, Japan. §Okinawa Institute of Science and Technology, 1919‑1, Tancha, Onna-son, Kunigami, Okinawa 904‑0412, Japan. ||Sony Computer Science Laboratories, Inc., 3‑14‑13 Higashi-Gotanda, Shinagawa, Tokyo 141‑0022, Japan.
Correspondence to S.G. and H.K. e-mails: [email protected]; [email protected]
doi:10.1038/nrg3096
Published online 3 November 2011
NATURE REVIEWS | GENETICS VOLUME 12 | DECEMBER 2011 | 821 © 2011 Macmillan Publishers Limited. All rights reserved

Abstract | Understanding complex biological systems requires extensive support from software tools. Such tools are needed at each step of a systems biology computational workflow, which typically consists of data handling, network inference, deep curation, dynamical simulation and model analysis. In addition, there are now efforts to develop integrated software platforms, so that tools that are used at different stages of the workflow and by different researchers can easily be used together. This Review describes the types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers. We also discuss the challenges and prospects for modelling the effects of genetic changes on physiology and the concept of an integrated platform.

Systems biology emerged in the mid‑1990s with the aim of achieving a system-level understanding of living organisms and applying this knowledge in various fields, including medicine and biotechnology1–4. Early applications included modelling cell cycle dynamics5–7, such as a computational model that explained the effects of over 120 knockout mutations on cell cycle dynamics in yeast7. Significant progress has also been made in the analysis of signalling pathways (for example, in understanding the dynamics of mitogen-activated protein kinase (MAPK) signalling8) and in cancer drug discovery applications, in which a reagent that was developed using modelling-based computational analysis is now in clinical trials9,10.

System-level studies are often built on molecular and genetic findings and 'omics' studies, such as genomics, proteomics and metabolomics. The main challenges in systems biology are the complexity of the systems, the vast quantities of data and the scattered pieces of knowledge, all of which have to be integrated. Systematic computational tools are therefore crucially important in systems biology. Software platforms have transformed industries such as aviation, entertainment and electronics by drastically improving productivity and by offering new capabilities11. Biological sciences are no different. In particular, the success of systems biology, and its application in areas such as systems drug design, requires sophisticated data handling, modelling, integrated computational analysis and knowledge integration. For example, the creation of computational models enables us to predict the behaviours of biological systems, thereby helping us to understand the underlying molecular mechanisms and to predict the impact of perturbations, such as drug treatments, on these biological systems.

Software tools and resources for systems biology need to be tailored to their intended applications in order to achieve objectives such as novel biological discoveries, drug design and answering life-science research questions. A typical workflow for computational analysis is a cyclical process involving data acquisition, modelling and analysis. Prediction and explanation capabilities are associated with this cycle, and the integration and sharing of knowledge help to sustain these capabilities (FIG. 1). Here we describe the principles of each stage in this workflow and some examples of current tools. Links to the tools and resources mentioned in this Review are provided in Supplementary information S1 (table), along with information about their type and access policy. TABLE 1 provides a matrix to help users choose appropriate tools and resources. We provide a perspective on the current challenges facing systems biology software tools, and we describe our view that integrated software platforms will help to address future research problems in biology and medicine.

[Figure 1 image. Panel a: problem definition, experimental design, experiments, data management, annotated experimental data sets, network inference, inferred networks, deep curation (from various publications, pathways and other databases), molecular interaction map, parameter optimization, dynamical model, model analysis and verification, new hypothesis, hypotheses to explain system behaviours. Panel b: time-course and multiple perturbation experiments; microarray and SILAC proteomics data; PubMed search and PathText-based text mining; Reactome and Panther pathway databases; Bayesian inference on microarray data; deep curation using CellDesigner; a molecular interaction map based on EGFR, mTOR, AKT and oestrogen receptor-related pathways; parameter optimization using genetic algorithms; dynamical simulation using MATLAB; design of experiments to verify a hypothesis on a drug resistance mechanism.]

Figure 1 | Workflow of computational tasks in systems biology. A research cycle showing the computational modelling and analyses that are involved in the workflow. a | The workflow starts from the 'problem definition' of the research project (shown in the green box). One stream of the workflow starts with experimental design, followed by the execution of experiments, data management and network inference. A parallel stream of the workflow consists of deep curation, parameter optimization, dynamical model analysis and model verification using experimental data. Outputs are shown in red boxes. Discrepancies between simulation results from the computational model and experimental data indicate that some of the underlying hypotheses need to be modified; the simulation should then be tested again when these new hypotheses are incorporated into the model. Transformation of a network that is inferred from large-scale data into a precise, mechanism-based model is an important step. However, this step is not yet fully achievable in practice, as indicated by the dotted arrow in the figure. b | An example biological application of the workflow from part a; in this case, research aiming to understand mechanisms of drug resistance in breast cancer. After the definition of the problem, time-series, multiple perturbation experiments would be designed, followed by data annotation, data analysis and network inference. Results from the data analysis would be used to define the scope of deep curation. However, in some cases, a molecular interaction map would be created before the experiment is designed, so that the experiments could be designed based on existing knowledge. When moving from the molecular interaction map to dynamical simulation, often only a part of the deep-curation-based molecular interaction map would be used for dynamical modelling, from which possible hypotheses for drug resistance mechanisms could be generated. This is an iterative process involving both 'dry' and 'wet' research. EGFR, epidermal growth factor receptor; mTOR, mammalian target of rapamycin; SILAC, stable isotope labelling with amino acids in cell culture.

Data management
The proper acquisition and handling of data is crucially important for both the generation and verification of hypotheses. The rapid development of high-throughput experimental techniques is transforming life-science research into 'big data' science12, and although numerous data-management systems exist13–16, the heterogeneity of semantic annotation of data. Various specialized ontologies for biology are in development; for example, the Gene Ontology (GO) and the Systems Biology Ontology (SBO) (see Supplementary information S1 (table) for a comprehensive list of biomedical ontologies).

Data-management and data-analysis tools. Current data-management systems can be broadly classified as spreadsheet-based or Web-based, or as laboratory information management systems (LIMS). Spreadsheet programs have historically been the most popular mode of data storage and communication in the life-science community, owing mainly to their ease of use and sharing; examples include template-based spreadsheet formats such as MAGE-TAB (a spreadsheet-based, MIAME-supportive format for microarray data) and the Investigation–Study–Assay (ISA)-TAB format. However, their integration with analysis tools and computational workflows requires custom-built interfaces that are not supported on all software platforms. In addition, a standardized practice for filling the spreadsheet is required. More recently, online wiki-based document and project management has become a popular mode of exchange
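The two requirements raised for spreadsheet templates (a custom interface to analysis code and a standardized way of filling in the sheet) can be made concrete with a small sketch. It assumes a tab-delimited sheet with one header row and a hypothetical set of required columns; the column names loosely echo ISA-TAB/MAGE-TAB-style headers but are not taken from either specification, and the function is not a validator for either format.

```python
import csv
import io

# Hypothetical columns that a lab's own template requires. The names only
# echo ISA-TAB/MAGE-TAB-style headers; they are NOT from either specification.
REQUIRED_COLUMNS = ("Sample Name", "Characteristics[organism]", "Protocol REF")


def load_sample_sheet(text, required=REQUIRED_COLUMNS):
    """Parse a tab-delimited sample sheet into a list of dicts.

    Raises ValueError if a required column is missing or a required cell
    is left empty, i.e. if the sheet was not filled in the agreed way.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = []
    # Line 1 of the sheet is the header, so data rows start on line 2.
    for line_no, row in enumerate(reader, start=2):
        # Short rows give None for absent cells; treat those as empty too.
        empty = [col for col in required if not (row[col] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty required cells {empty}")
        records.append(row)
    return records


if __name__ == "__main__":
    sheet = (
        "Sample Name\tCharacteristics[organism]\tProtocol REF\n"
        "s1\tHomo sapiens\tgrowth\n"
        "s2\tHomo sapiens\tgrowth\n"
    )
    rows = load_sample_sheet(sheet)
    print(len(rows), rows[0]["Sample Name"])  # prints: 2 s1
```

Failing loudly on a missing column or an empty required cell is one simple way to enforce a filling convention before the data reach downstream analysis tools.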

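In the workflow of Figure 1, parameter optimization fits a dynamical model to experimental data before model analysis and verification. A genetic algorithm is one common choice for this search. The sketch below illustrates that step only and rests on several assumptions. The model is a hypothetical one-species production-degradation equation, dx/dt = k_syn - k_deg * x, simulated with forward Euler. The "data" are synthetic values generated from known parameters, not measurements. The GA (tournament selection, blend crossover, log-normal mutation, elitism) is a minimal design written for this example and is not the implementation used in the Review's example.

```python
import math
import random


def simulate(k_syn, k_deg, t_end=10.0, dt=0.05, sample_every=20):
    """Forward-Euler simulation of the hypothetical model
    dx/dt = k_syn - k_deg * x with x(0) = 0; returns x at t = 0, 1, ..., t_end."""
    x = 0.0
    samples = [x]
    for step in range(1, round(t_end / dt) + 1):
        x += dt * (k_syn - k_deg * x)
        if step % sample_every == 0:
            samples.append(x)
    return samples


def sse(predicted, observed):
    """Sum of squared errors between simulated and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_ga(data, bounds, pop_size=40, generations=100, mutation_scale=0.05, seed=0):
    """Minimal real-valued genetic algorithm searching for (k_syn, k_deg)
    that minimize the SSE between the simulation and `data`."""
    rng = random.Random(seed)

    def clip(params):
        # Keep every parameter inside its (low, high) bound.
        return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(params, bounds))

    def cost(params):
        return sse(simulate(*params), data)

    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=cost)  # lowest cost (best fit) first

        def pick():
            # Tournament of 3: the lowest rank index among the draws wins.
            return ranked[min(rng.randrange(pop_size) for _ in range(3))]

        next_pop = ranked[:2]  # elitism: carry the two best over unchanged
        while len(next_pop) < pop_size:
            a, b = pick(), pick()
            w = rng.random()
            # Blend crossover: a random convex combination of the two parents.
            child = tuple(w * ai + (1 - w) * bi for ai, bi in zip(a, b))
            # Log-normal mutation keeps rate constants positive.
            child = tuple(v * math.exp(rng.gauss(0.0, mutation_scale)) for v in child)
            next_pop.append(clip(child))
        pop = next_pop
    return min(pop, key=cost)


if __name__ == "__main__":
    # Synthetic "measurements" from known parameters (2.0, 0.5), not real data.
    data = simulate(2.0, 0.5)
    k_syn, k_deg = fit_ga(data, bounds=[(0.01, 5.0), (0.01, 5.0)])
    print(f"k_syn = {k_syn:.3f}, k_deg = {k_deg:.3f}")
```

Because the synthetic data come from known parameters, the recovered values can be checked directly. With real data, a remaining mismatch between simulation and experiment is the signal, described in the Figure 1 caption, that the model's underlying hypotheses need revising.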