An Introduction to Whole-Cell Modeling Release 0.0.1
Total Page:16
File Type:pdf, Size:1020Kb
An introduction to whole-cell modeling Release 0.0.1 Jonathan Karr Arthur Goldberg Jun 26, 2020 Contents 1 Introduction 3 1.1 Motivation for WC modeling......................................4 1.1.1 Biological science: understand how genotype influences phenotype.............4 1.1.2 Medicine: personalize medicine for individual genomes....................5 1.1.3 Synthetic biology: rationally design microbial genomes....................5 1.2 The biology that WC models should aim to represent and predict...................5 1.2.1 Phenotypes that WC models should aim to predict.......................5 1.2.2 Physics and chemistry that WC models should aim to represent................6 1.3 Fundamental challenges to WC modeling................................7 1.3.1 Integrating molecular behavior to the cell level over several spatiotemporal scales......7 1.3.2 Assembling a unified molecular understanding of cells from imperfect data.........8 1.3.3 Selecting, calibrating and validating high-dimensional models................ 10 1.4 Feasibility of WC models........................................ 10 1.4.1 Experimental methods, data, and repositories......................... 10 1.4.2 Modeling and simulation tools................................. 18 1.4.3 Models of individual pathways and model repositories.................... 23 1.4.4 Models of multiple pathways.................................. 26 1.5 Emerging principles and methods for WC modeling.......................... 28 1.5.1 Principles of WC modeling................................... 29 1.5.2 Methods for WC modeling................................... 30 1.6 Latest WC models and their limitations................................. 34 1.6.1 Coarse-grained models..................................... 34 1.6.2 Genomically-centric bottom-up fine-grained models...................... 34 1.6.3 Physiologically-centric top-down fine-grained models..................... 35 1.6.4 Spatially-centric bottom-up fine-grained models........................ 35 1.6.5 Hybrid models......................................... 35 1.7 Bottlenecks to more comprehensive and predictive WC models.................... 36 1.7.1 Inadequate experimental methods and data repositories.................... 36 1.7.2 Incomplete, inconsistent, scattered, and poorly annotated pathway models.......... 37 1.7.3 Inadequate software tools for WC modeling.......................... 37 1.7.4 Inadequate model formats................................... 38 1.7.5 Lack of coordination among the cell modeling community.................. 38 1.8 Technologies needed to advance WC modeling............................. 38 1.8.1 Experimental methods for characterizing cells......................... 38 1.8.2 Tools for aggregating, standardizing, and integrating heterogeneous data........... 38 1.8.3 Tools for scalably designing models from large datasets.................... 38 i 1.8.4 Rule-based format for representing models.......................... 39 1.8.5 Scalable network-free, multi-algorithmic simulator...................... 39 1.8.6 Scalable tools for calibrating models.............................. 39 1.8.7 Scalable tools for verifying models............................... 39 1.8.8 Additional tools that would help accelerate WC modeling................... 39 1.9 A plan for achieving comprehensive WC models as a community................... 40 1.9.1 Phase I: Piloting the core technologies and concepts of WC modeling............ 41 1.9.2 Phase II: Piloting collaborative WC modeling......................... 41 1.9.3 Phase III: Community modeling and model validation..................... 42 1.10 Ongoing efforts to advance WC modeling................................ 42 1.10.1 Genomically-centric models.................................. 42 1.10.2 Physiologically-centric, spatially-centric, and hybrid models................. 43 1.10.3 Technology development.................................... 43 1.11 Resources for learning about WC modeling............................... 45 1.11.1 Summer schools......................................... 45 1.11.2 Online forum.......................................... 45 1.12 Outlook.................................................. 46 2 Foundational concepts and skills for computational biology 47 2.1 Typing.................................................. 47 2.2 Software engineering........................................... 47 2.2.1 An introduction to Python................................... 47 2.2.2 Numerical computing with NumPy ............................... 54 2.2.3 Plotting data with matplotlib ................................ 57 2.2.4 Developing database, command line, and web-based programs with Python......... 59 2.2.5 Writing code for Python 2 and 3................................ 67 2.2.6 Organizing Python code into functions, classes, and modules................. 67 2.2.7 Structuring Python projects................................... 67 2.2.8 Revisioning code with Git, GitHub, and Meld......................... 68 2.2.9 Testing Python code with unittest, pytest, and Coverage.................... 71 2.2.10 Debugging Python code using the PyCharm debugger..................... 75 2.2.11 Documenting Python code with Sphinx............................ 75 2.2.12 Continuously testing Python code with CircleCI, Coveralls, Code Climate, and the Karr Lab’s dashboards........................................ 78 2.2.13 Distributing Python software with GitHub, PyPI, Docker Hub, and Read The Docs..... 83 2.2.14 Recommended Python development tools........................... 89 2.2.15 Comparison between Python and other languages....................... 91 2.3 Linux................................................... 91 2.3.1 How to build a Linux Mint virtual machine with Virtual Box................. 91 2.3.2 How to build a Ubuntu Linux image with Docker....................... 92 2.3.3 An introduction to Linux Mint................................. 95 2.4 Version and sharing data with Quilt................................... 96 2.4.1 Overview............................................ 96 2.4.2 Using Quilt........................................... 96 2.5 Scientific communication: papers, presentations, graphics....................... 96 2.5.1 Writing manuscripts...................................... 96 2.5.2 Reviewing manuscripts..................................... 99 2.5.3 Making posters......................................... 100 2.5.4 Making presentations...................................... 102 2.5.5 Visualizing data......................................... 103 2.5.6 Formatting textual documents with LaTeX........................... 104 2.5.7 Drawing vector graphics with Adobe Illustrator and Inkscape................. 105 2.5.8 Editing raster graphics with Gimp............................... 110 ii 3 Fundamentals of cell modeling 113 3.1 Data aggregation............................................. 113 3.1.1 Common data types and data sources.............................. 113 3.1.2 Finding data sources...................................... 114 3.1.3 Finding relevant data for models................................ 114 3.1.4 Data aggregation tools..................................... 115 3.1.5 Determining the consensus of multiple observations...................... 115 3.1.6 Exercise............................................. 115 3.2 Input data organization.......................................... 115 3.2.1 Schema............................................. 116 3.2.2 Software tools.......................................... 116 3.2.3 Exercises............................................ 117 3.3 Model design............................................... 117 3.3.1 Software tools.......................................... 117 3.3.2 Exercises............................................ 118 3.4 Model calibration............................................. 119 3.4.1 Key concepts.......................................... 119 3.4.2 Calibration data......................................... 119 3.4.3 Approximate, multi-stage parameter estimation........................ 120 3.4.4 Exercise............................................. 121 3.5 Model representation........................................... 121 3.5.1 Custom numerical simulation code............................... 121 3.5.2 Standard numerical simulation packages............................ 121 3.5.3 Enumerated modeling languages................................ 122 3.5.4 Ruled-based modeling languages................................ 122 3.5.5 Rule-based modeling API.................................... 123 3.5.6 High-level rule-based modeling language........................... 123 3.6 Model annotation............................................. 123 3.6.1 Component-level semantic annotations of species, reactions, and parameters......... 123 3.6.2 Model-level semantic annotations............................... 124 3.6.3 Model provenance annotations................................. 124 3.6.4 Simulation algorithm annotations................................ 124 3.6.5 Exercises............................................ 125 3.7 Model composition............................................ 125 3.7.1 Model composition procedure................................. 126 3.7.2 Software tools.......................................... 126 3.7.3 Exercises............................................ 127 3.8 Mathematical representations and simulation algorithms........................ 129 3.8.1 Boolean/logical models..................................... 129 3.8.2