Development and Dissemination of Computational Methods for Genome-Scale Modeling

UC San Diego UC San Diego Electronic Theses and Dissertations Title Development and Dissemination of Computational Methods for Genome-scale Modeling Permalink https://escholarship.org/uc/item/1w98620m Author Ebrahim, Ali Publication Date 2016 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO Development and Dissemination of Computational Methods for Genome-scale Modeling A Dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Bioengineering by Ali Ebrahim Committee in charge: Professor Bernhard Palsson, Chair Professor Nuno Bandeira Professor Christian Metallo Professor Glenn Tesler Professor Kun Zhang 2016 Copyright Ali Ebrahim, 2016 All rights reserved. The Dissertation of Ali Ebrahim is approved, and it is ac- ceptable in quality and form for publication on microfilm and electronically: Chair University of California, San Diego 2016 iii DEDICATION My heart breaks for the refugees who have been driven from their homes in Syria. Though it must mean very little, I dedicate this thesis to them, and I will try to apply some of what I have learned to help give them some of the opportunities I have had. iv EPIGRAPH What I cannot create, I do not understand. |Richard P. Feynman I like to call it low-input, high-throughput, no-output biology. |Sydney Brenner v TABLE OF CONTENTS Signature Page................................... iii Dedication...................................... iv Epigraph......................................v Table of Contents.................................. vi List of Figures................................... xi List of Tables.................................... xiii Acknowledgements................................. xiv Vita......................................... xix Abstract of the Dissertation............................ xxi Chapter 1 COBRApy: COnstraints-Based Reconstruction and Analysis for Python................................1 1.1 Abstract............................1 1.1.1 Background......................1 1.1.2 Results.........................2 1.1.3 Conclusion.......................2 1.2 Background..........................2 1.3 Implementation........................4 1.4 Results and discussion....................4 1.4.1 Core classes: model, metabolite, reaction, & gene.6 1.4.2 Key capabilities....................8 1.4.3 Advanced capabilities.................9 1.5 Conclusions.......................... 10 1.6 Acknowledgements...................... 10 Chapter 2 Distribution and Portability of Validated Genome-Scale Constraint-Based Models...................... 12 2.1 Introduction and Problem Statement............ 12 2.2 A Potential Solution..................... 13 2.3 Acknowledgments....................... 15 Chapter 3 Model-driven elucidation of transcriptional regulatory network in bacteria............................... 16 3.1 Abstract............................ 16 3.2 Introduction.......................... 17 vi 3.3 Results............................. 19 3.3.1 Model-driven prediction of activation conditions for TFs........................... 19 3.3.2 Experimental confirmation of predicted conditions. 21 3.3.3 Confirmation of ChIP-exo binding sites with motif analysis and comparison to known sites....... 23 3.3.4 Reconstruction of NtrC and Nac regulons...... 25 3.3.5 Association of TF with σ-factors........... 26 3.3.6 Contrasting functions of NtrC and Nac regulons.. 28 3.3.7 Primary response of the NtrC regulon........ 29 3.3.8 Carbon flux rebalancing by the Nac regulon.... 32 3.3.9 Lower activation of Nac on cytidine......... 33 3.3.10 Coupling of nitrogen and carbon metabolism.... 35 3.4 Discussion........................... 37 3.4.1 Selection of optimal experimental conditions by model-driven experimental design.......... 37 3.4.2 Model-guided regulon elucidation and characterization 39 3.4.3 Application of ChIP-exo method in bacterial studies 40 3.4.4 TF binding sites in non-regulatory regions..... 41 3.4.5 Nitrogen starvation is coupled to other responses.. 42 3.5 Materials and Methods.................... 43 3.6 Author Contributions..................... 44 3.7 Acknowledgements...................... 44 Chapter 4 Multi-omic data integration enables discovery of hidden biological regularities.............................. 52 4.1 Abstract............................ 52 4.2 Results and Discusssion.................... 53 4.3 Acknowledgements...................... 60 Chapter 5 Construction of Genome-scale models of Metabolism and Gene Expression for the Escherichia coli family............ 64 5.1 Abstract............................ 64 5.2 Introduction.......................... 65 5.3 Results and Discussion.................... 67 5.3.1 The \miniME" Reformulation of ME models.... 67 5.3.2 Global Parameter Characterization......... 69 5.3.3 Extending from E. coli K12 MG1655 to 40 related strains......................... 70 5.3.4 Conclusions...................... 72 5.4 Materials and Methods.................... 72 5.4.1 Optimization Procedure............... 72 5.4.2 Multiple ME Model Construction.......... 74 5.5 Acknowledgements...................... 75 vii Appendix A Documentation for COBRApy................... 79 A.1 Getting Started........................ 79 A.1.1 Reactions....................... 80 A.1.2 Metabolites...................... 82 A.1.3 Genes......................... 83 A.2 Building a Model....................... 85 A.3 Reading and Writing Models................. 89 A.3.1 SBML......................... 89 A.3.2 JSON......................... 90 A.3.3 MATLAB....................... 91 A.3.4 Pickle......................... 91 A.4 Simulating with FBA..................... 92 A.4.1 Running FBA..................... 92 A.4.2 Changing the Objectives............... 94 A.4.3 Running FVA..................... 95 A.4.4 Running pFBA.................... 98 A.5 Simulating Deletions..................... 98 A.5.1 Single Deletions.................... 98 A.5.2 Double Deletions................... 100 A.6 Phenotype Phase Plane.................... 101 A.7 Mixed-Integer Linear Programming............. 105 A.7.1 Ice Cream....................... 105 A.7.2 Restaurant Order................... 106 A.7.3 Boolean Indicators.................. 107 A.8 Quadratic Programming................... 110 A.9 Loopless FBA......................... 113 A.10 Gapfillling........................... 117 A.10.1 GrowMatch...................... 117 A.10.2 SMILEY........................ 119 A.11 Solver Interface........................ 119 A.11.1 Attributes and functions............... 120 A.11.2 Example with FVA.................. 124 A.12 Using the COBRA toolbox with cobrapy.......... 125 Appendix B Distributing Validated Constraint-Based Models in SBML... 128 B.1 Systems Biology Ontology Terms and COBRA models... 128 B.2 Strict Models......................... 130 B.3 Defining Flux Bounds..................... 130 B.4 Reaction Reversibility..................... 132 B.5 Gene Reaction Rules..................... 133 B.6 Biomass Objective Reactions................. 134 B.7 Conventions for Identifiers.................. 135 viii Appendix C Model-driven elucidation of transcriptional regulatory network in bacteria - Supplementary Information............... 136 C.1 Advantages of ChIP-exo over previous methods...... 136 C.2 Locally acting TFs regulated by Nac............. 137 C.3 Supplementary Experimental Procedures.......... 138 C.3.1 Prediction of activation conditions for transcription factor.......................... 138 C.3.2 Bacterial strains, media, and growth conditions.. 139 C.3.3 ChIP-exo experiment................. 140 C.3.4 RNA-seq expression profiling............. 141 C.3.5 Peak calling for ChIP-exo dataset.......... 143 C.3.6 Classification of regulatory and non-regulatory binding sites........................ 143 C.3.7 Motif search from ChIP-exo peaks.......... 144 C.3.8 Calculation of differentially expressed gene..... 144 C.3.9 COG functional enrichment............. 145 C.3.10 Measuring growth rate and glucose uptake rates on different nitrogen sources............... 145 C.3.11 FBA analysis with glucose uptake rates....... 146 C.3.12 Comparison of in silico growth with experimental evidence........................ 146 C.3.13 Conservation analysis of nitrogen-related genes... 147 C.4 Supplementary Figures.................... 148 Appendix D Multi-omic data integration enables discovery of hidden biological regularities - Supplementary Information............. 163 D.1 Methods............................ 163 D.1.1 mRNA seq methods.................. 163 D.1.2 Ribosome profiling.................. 164 D.1.3 Genome-wide Secondary Structure Annotations.. 165 D.1.4 Tertiary Structure/ Protein Domain annotations.. 167 D.1.5 Identification of Shine-Dalgarno-like codons.... 167 D.1.6 Ribosome density/pause site accounting (Figures D.2- D.3).......................... 168 D.1.7 Computational Method for Predicting keff parameters 169 D.1.8 Predictions of mRNA expression under identical conditions to proteomic data............... 172 D.1.9 Predicting Differential Gene Expression with iOL1650-ME...................... 173 D.1.10 Sampling of M-model flux states in iJO1366.... 174 D.2 Structure of ME models................... 175 D.3 ME model coupling parameters............... 178 D.4 Simulation of Batch Growth with ME............ 180 ix D.5 Simulating ME with estimated parameters......... 180 D.6 Ribosome profiling pause site analysis............ 182 D.7 Supplementary Figures.................... 184

Load more