COMBINE Archive Specification
Frank T. Bergmann
Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, US

Nicolas Rodriguez
Babraham Institute
Babraham Campus Cambridge
Cambridge, UK

Nicolas Le Novère
Babraham Institute
Babraham Campus Cambridge
Cambridge, UK

Version 1
September 15, 2014

This release of the specification is available at http://co.mbine.org/documents/archive

Contents

1 Document conventions
2 Background
  2.1 Motivation
  2.2 History
3 Proposed syntax and semantics
  3.1 The archive format
  3.2 COMBINE archive extensions
  3.3 Content of the archive
  3.4 Namespace URI and other declarations necessary
  3.5 Primitive data types
  3.6 The manifest.xml file and the OmexManifest class
  3.7 The Content class
  3.8 Advised format for the archive metadata
4 Illustrative examples of the syntax
5 Future development
  5.1 Linking to external documents
  5.2 Cross References between entries
  5.3 Alternative versions of the archive metadata
  5.4 Convergence with other archive formats
Acknowledgments
References

1 Document conventions

Following the precedent set by other COMBINE specification documents, we use UML 1.0 (Unified Modeling Language; Eriksson and Penker (1998); Oestereich (1999)) class diagram notation to define the constructs provided by this package.

We also use the following typographical conventions to distinguish the names of objects and data types from other entities:

Class: Names of ordinary (concrete) classes begin with a capital letter and are printed in an upright, bold, sans-serif typeface. In electronic document formats, the class names are also hyperlinked to their definitions in this specification document.

SomeThing, otherThing: Attributes of classes, data type names, literal XML, and generally all tokens other than UML class names are printed in an upright typewriter typeface. Primitive types defined in this specification begin with a capital letter. The COMBINE archive also makes use of primitive types defined by XML Schema 1.0 (Biron and Malhotra, 2000; Fallside, 2000; Thompson et al., 2000), but unfortunately XML Schema does not follow any capitalization convention, so primitive types drawn from the XML Schema language may or may not start with a capital letter.

2 Background

2.1 Motivation

Computational modeling is an increasingly interdisciplinary field. Different aspects of the modeling activity come together, and the associated information needs to be stored in a cohesive unit. When a model is shared, failing to exchange all relevant files can cause problems. Several approaches have been tried to solve this issue, such as folder-based project structures or special versions of version control systems. Unfortunately, these approaches are not as easy for tool authors to support as a single-file solution would be.
This specification describes the COMBINE archive, a file that contains all the information needed to describe a modeling and simulation experiment. The COMBINE archive is encoded with the Open Modeling Exchange format (OMEX). A COMBINE archive is a single file containing the various documents (and possibly, in the future, references to documents) necessary for the description of a model and all associated datasets and procedures. This includes, for instance, but is not limited to, simulation experiment descriptions in SED-ML, all models needed to run the simulations in SBML or CellML, and their graphical representations in SBGN-ML.

The COMBINE archive aims to augment these efforts by standardizing a manifest that describes all files in the archive, guidance for encoding metadata, and a convention for bundling the information together. Such an approach also allows resources using a distributed version control system (such as PMR2, http://models.cellml.org/) to export COMBINE archives by generating a manifest file during export. Details of earlier independent proposals are provided below.

2.2 History

The COMBINE archive grew out of the SED-ML archive, which was first introduced on the SED-ML mailing list in 2009 and briefly described in SED-ML Level 1 Version 1 (Waltemath et al., 2011). The SED-ML archive addressed the need to include all relevant models used in a SED-ML simulation experiment along with the experiment specification. The SED-ML archive is essentially a convention: a ZIP file (with extension .sedx) contains a SED-ML document with the same base name as the archive, together with all models listed in the SED-ML file.

This specification has been created out of the many discussions that took place on the combine-discuss mailing list.
The primary threads are located here:

■ http://listserver.ebi.ac.uk/pipermail/combine-discuss/2011-October/thread.html
■ http://listserver.ebi.ac.uk/pipermail/combine-discuss/2011-November/000016.html
■ http://listserver.ebi.ac.uk/pipermail/combine-discuss/2012-January/thread.html

The discussion continued at subsequent hackathons. At HARMONY 2012, a larger audience discussed for the first time a preliminary proposal put together by Richard Adams, Frank T. Bergmann and Nicolas Le Novère (Adams et al., 2012). Further discussions were held at HARMONY 2013 (https://docs.google.com/document/d/1ji2mOiWzXXl9ON6sU6Svd3E_IufowUCz24hklqci0iM/edit).

Discussions about the development of OMEX and the COMBINE archive are now taking place on the combine-archive Google group (https://groups.google.com/forum/?#!forum/combine-archive).

3 Proposed syntax and semantics

In this section, we define the syntax and semantics of the COMBINE archive. We expound on the various data types and constructs defined; then, in Section 4, we provide complete examples of archives.

3.1 The archive format

The COMBINE archive is encoded using the "Open Modeling EXchange format" (OMEX). The archive itself is a "zip" file (Wikipedia, 2014). Zip is a file format used for data compression and archiving. A zip file contains one or more files that have either been compressed, to reduce file size, or stored as is. The technical specification of the ZIP format is available from the PKWARE website (PKWARE Inc., 2012).

3.2 COMBINE archive extensions

The extension for the COMBINE archive is .omex.

Additional extensions are available to indicate the main standard format used within the archive. This helps users choose between different archives and select appropriate software tools, in particular parsers.
As such, the purpose of the extensions differs from that of the master attribute of the manifest file, described in Section 3.7, which is read by the processing software once the archive is loaded and parsed. Currently, the following extensions are specified:

■ .sedx - SED-ML archive (Waltemath et al., 2011)
■ .sbex - SBML archive (Hucka et al., 2003)
■ .cmex - CellML archive (Cuellar et al., 2003)
■ .neux - NeuroML archive (Gleeson et al., 2010)
■ .phex - PharmML archive (Moodie et al., 2013)
■ .sbox - SBOL archive (Galdzicki et al., 2014)

Note that a COMBINE archive may contain files in several different standard formats. The specific extension is therefore only an indication. For instance, a database of models could distribute archives of the same models, encoded in SBML with the extension .sbex and encoded in CellML with the extension .cmex. However, archives containing models in both formats could also be distributed with the extension .omex. If the archives contain SED-ML files, the extension .sedx could be used and the selection between the model formats left to the SED-ML file.

3.3 Content of the archive

The archive contains:

1. a mandatory manifest file, called manifest.xml, always located at the root of the archive, that describes the location and the type of each data file contained in the archive, plus an entry describing the archive itself. The location of those files is defined by a relative path. In the current version of the COMBINE archive, all the files described must be included in the archive itself. It is envisioned that in the future the manifest might list files located elsewhere, using valid and resolvable http URIs.

2. metadata files containing clerical information about the various files contained in the archive, and the archive itself.
   A best practice is to include only one metadata file per metadata format, called metadata.* (where * stands for the suitable file extension).

3. all remaining files necessary to the model and simulation project.
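
The content rules listed above lend themselves to a simple mechanical check. The following Python sketch is an illustration only and is not part of the specification. It uses the standard zipfile module to confirm that manifest.xml sits at the root of the archive, to look up the extension hint from Section 3.2, and to separate metadata.* files from the remaining files. The names inspect_archive and build_demo_archive, the keys of the returned report, and the placeholder file contents are all hypothetical. Matching metadata files by base name at any depth is an assumption, since the text above does not fix their location. The sketch does not parse or validate the manifest itself.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions listed in Section 3.2; ".omex" is the generic extension.
KNOWN_EXTENSIONS = {
    ".omex": "COMBINE archive",
    ".sedx": "SED-ML archive",
    ".sbex": "SBML archive",
    ".cmex": "CellML archive",
    ".neux": "NeuroML archive",
    ".phex": "PharmML archive",
    ".sbox": "SBOL archive",
}


def inspect_archive(name, data):
    """Report on a candidate COMBINE archive given its file name and bytes.

    Hypothetical helper: it only looks at entry names inside the zip file.
    """
    extension = PurePosixPath(name).suffix.lower()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Skip directory entries, which end with a slash.
        entries = [n for n in zf.namelist() if not n.endswith("/")]

    # Assumption: a metadata file is any entry whose base name is metadata.*
    metadata = [n for n in entries
                if PurePosixPath(n).name.startswith("metadata.")]
    others = [n for n in entries
              if n != "manifest.xml" and n not in metadata]
    return {
        # None when the extension is not one of those in Section 3.2.
        "extension_hint": KNOWN_EXTENSIONS.get(extension),
        # The manifest must be at the root, so compare the full entry name.
        "has_manifest": "manifest.xml" in entries,
        "metadata_files": sorted(metadata),
        "other_files": sorted(others),
    }


def build_demo_archive():
    """Build a tiny in-memory zip with placeholder (not valid) contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", "<!-- placeholder, not a real manifest -->")
        zf.writestr("metadata.rdf", "<!-- placeholder metadata -->")
        zf.writestr("model/model.xml", "<!-- placeholder model -->")
    return buffer.getvalue()


if __name__ == "__main__":
    print(inspect_archive("demo.sbex", build_demo_archive()))
```

Because the extension is only an indication, the sketch reports it as a hint and never rejects an archive on that basis; only the presence of manifest.xml at the root is treated as a hard requirement.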