A Fully Featured COMBINE Archive of a Simulation Study
Total Page:16
File Type:pdf, Size:1020Kb
F1000Research 2016, 5:2421 Last updated: 26 MAR 2020 DATA NOTE A fully featured COMBINE archive of a simulation study on syncytial mitotic cycles in Drosophila embryos [version 1; peer review: 1 approved, 2 approved with reservations] Martin Scharm, Dagmar Waltemath Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany First published: 29 Sep 2016, 5:2421 ( Open Peer Review v1 https://doi.org/10.12688/f1000research.9379.1) Latest published: 29 Sep 2016, 5:2421 ( https://doi.org/10.12688/f1000research.9379.1) Reviewer Status Abstract Invited Reviewers COMBINE archives are standardised containers for data files related to a 1 2 3 simulation study in computational biology. This manuscript describes a fully featured archive of a previously published simulation study, including (i) the version 1 original publication, (ii) the model, (iii) the analyses, and (iv) metadata 29 Sep 2016 report report report describing the files and their origin. With the archived data at hand, it is possible to reproduce the results of the original work. The archive can be used for both, educational and research purposes. Anyone may reuse, extend and update the archive to make it a valuable resource for the 1 Lars Juhl Jensen, University of Copenhagen, scientific community. Copenhagen, Denmark Keywords 2 Laurence Calzone, Institut Curie, Paris, France COMBINE , data , containers Mihaly Koltai, Institut Curie, Paris, France 3 Alan Garny, University of Auckland, Auckland, New Zealand This article is included in the Data: Use and Reuse Any reports and responses or comments on the collection. article can be found at the end of the article. Corresponding authors: Martin Scharm ([email protected]), Dagmar Waltemath ([email protected]) Competing interests: No competing interests were disclosed. Grant information: This work has been funded by the German Federal Ministry of Education and Research (BMBF) as part of the e:Bio programs SEMS (FKZ 0316194) and SBGN-ED+ (FKZ 031 6181). Copyright: © 2016 Scharm M and Waltemath D. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication). How to cite this article: Scharm M and Waltemath D. A fully featured COMBINE archive of a simulation study on syncytial mitotic cycles in Drosophila embryos [version 1; peer review: 1 approved, 2 approved with reservations] F1000Research 2016, 5:2421 ( https://doi.org/10.12688/f1000research.9379.1) First published: 29 Sep 2016, 5:2421 (https://doi.org/10.12688/f1000research.9379.1) Page 1 of 13 F1000Research 2016, 5:2421 Last updated: 26 MAR 2020 Introduction automatically retrieved from an online resource (initial archive). In systems biology and systems medicine, the steadily increasing size Secondly, the data files were organised into subdirectories, fol- and complexity of simulation studies pose additional challenges to lowing the different aspects of a simulation study (documentation, sharing reproducible results1. Repeated mentions of problems with model, experiment, result). Thirdly, missing files were manually replication and reproducibility2–4 led to new standards, tools, and retrieved from web resources or created using COMBINE-compliant methods for the transfer of reproducible simulation studies5–9. Sev- software tools. The three steps are described in the following. eral projects and initiatives already deal with reproducibility issues, such as COMBINE (co.mbine.org), FAIRDOM (fair-dom.org), Retrieving an initial COMBINE archive and the Reproducibility Initiative (reproducibilityinitiative.org). The initial version of the COMBINE archive was generated using the web-based software tool M2CAT18 Version 0.1 (m2cat.sems.uni- The Computational Modeling in Biology Network (COMBINE) rostock.de). Among the suggested archives for the work by Calzone coordinates the development of standard formats for various aspects et al., we chose the simulation study containing a CellML model of a simulation study: The Systems Biology Markup Language and a visualisation of the model in three different formats (PNG, (SBML)10 and CellML11 encode the mathematical models; the SVG, AI). M2CAT automatically generated the initial COMBINE Systems Biology Graphical Notation (SBGN)12 encodes the visual archive from these files. It also added metadata to the archive, such representation of models; the Simulation Experiment Description as annotations to creators, contributors, and modification times. Markup Language (SED-ML)13 encodes the simulation recipes; and M2CAT retrieved this metadata from the corresponding GIT project the Systems Biology Result Markup Language (SBRML)14 encodes in the Physiome Model Repository (git log). numerical data and simulation results. Organising the COMBINE archive Today’s studies consist of multiple, heterogeneous, and sometimes For convenience, the files inside the COMBINE archive distributed data files, leading to the challenge of exchanging were structured in subfolders. The initial archive was therefor complete and thus reproducible results. To close this gap, the imported into the CombineArchiveWeb application (WebCAT,9) COMBINE community developed the COMBINE archive8. A Version 0.4.13 (webcat.sems.uni-rostock.de). WebCAT is a web COMBINE archive is a single file that aggregates all data files and interface to display and modify the files contained in an archive, information necessary to reproduce a simulation study in compu- together with metadata and file structures. The files inside the tational biology. The skeleton of a COMBINE archive consists of archive were organised in four directories, which reflect the a manifest and a metadata file, specified by the Open Modeling different aspects of a simulation study: EXchange format (OMEX). • documentation/: files that describe and document the model and/or experiment (empty) Here we describe a fully featured COMBINE archive, which encodes an investigation of the syncytial mitotic cycles in Drosophila embryos15. The study published by Calzone et al. • model/: files that encode and visualise the biological system proposes a dynamical model for the molecular events underlying (4 files) rapid, synchronous, syncytial nuclear division cycles in Drosophila embryos. This particular study was chosen for several reasons. • experiment/: files that encode the in silico setup of the Firstly, the paper, the documentation, and the related data are experiment (empty) openly accessible. Secondly, the model is available in two standard formats: The CellML encoding is available from the Physi- • result/: files that result from running the experiment ome Model Repository16 at models.cellml.org/exposure/ (empty) 1a3f36d015121d5596565fe7d9afb332 and the SBML encoding is available from BioModels17 at www.ebi.ac.uk/biomodels-main/ All files in the initial archive were stored in the model/ directory. BIOMD0000000144. Thirdly, both model files are already curated, However, these files alone are not sufficient to reproduce the study. which increases the level of trust. Fourthly, the model describes a common biological system (cell cycle). Thus, the basic Extending the COMBINE archive mechanisms of the encoded biology should be familiar to To make the encoded study reproducible, the COMBINE archive many researchers, reducing the effort of understanding the needs to be extended with additional files. example. The article is typically the central object of a research study. This archive contains files that are openly available for download, For this study, the original publication by Calzone et al., together as well as previously unpublished files that were generated using with available supplementary information, was retrieved from COMBINE-compliant software tools (see Section Materials and the homepage of the journal Molecular Systems Biology (msb. methods). When executed, it reproduces the original findings by embopress.org/content/3/1/131). Using WebCAT, the files were Calzone et al. uploaded to the documentation/directory of the archive. The automatically added metadata was adjusted to attribute the authors Materials and methods of the publication and to state when and where the files were The fully featured COMBINE archive was created in three subse- downloaded. In the background, WebCAT encoded the metadata in quent steps. Firstly, all available materials relating to the study were RDF/XML and added it to the archive. Page 2 of 13 F1000Research 2016, 5:2421 Last updated: 26 MAR 2020 The model is not only available in CellML format, but also in open repositories, an initial version was created using the SED-ML SBML format. The SBML file was retrieved from BioModels Web Tools (SWT) Version 2.1 (bqfbergmann.dyndns.org/SED- (www.ebi.ac.uk/biomodels-main/download?mid=BIOMD0000 ML_Web_Tools). SWT takes the model files and creates a default 000144, SBML Level 2 Version 1) and uploaded to the model/ simulation description with standard settings. For this study, a directory. Again, the metadata was corrected to attribute the origi- default SED-ML file encodes instructions to generate 66 plots nal authors, curators, and contributors, as stated on the BioModels and a data table. Each plot describes the change of concentration website