UC San Diego UC San Diego Electronic Theses and Dissertations

Title Utilizing genome-scale models to enhance high-throughput data analysis : : from pathways to dynamics

Permalink https://escholarship.org/uc/item/3h54t9fb

Author Bordbar, Aarash

Publication Date 2014

Peer reviewed|Thesis/dissertation

eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO

Utilizing genome-scale models to enhance high-throughput data analysis: from pathways to dynamics

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

Bioengineering

Aarash Bordbar

Committee in charge:

Professor Bernhard Ø. Palsson, Chair Professor Shankar Subramaniam, Co-Chair Professor Kelly Frazer Professor Alexander Hoﬀmann Professor Kun Zhang

Co-Chair

Chair

University of California, San Diego

2014

iii DEDICATION

To maman and baba.

iv EPIGRAPH

I’ve failed over and over and over again in my life and that is why I succeed. —Michael Jordan

v TABLE OF CONTENTS

Signature Page ...... iii

Dedication ...... iv

Epigraph ...... v

Table of Contents ...... vi

List of Figures ...... xi

List of Tables ...... xiii

Acknowledgements ...... xiv

Vita ...... xvii

Abstract of the Dissertation ...... xx

Chapter 1 Constraint-based models predict metabolic and associated cellular function ...... 1 1.1 Introduction ...... 1 1.2 Foundational Developments ...... 7 1.2.1 Initial Studies (1986-1998) ...... 7 1.2.2 Building Genome-Scale Networks (1999-2004) . 8 1.2.3 Integrating Omics Data (2005-2009) ...... 8 1.2.4 Maturing to Predictive Practice (2010-) . . . . . 8 1.3 Reﬁning objectives ...... 9 1.3.1 Do cells maximize growth rate? ...... 11 1.3.2 Moving beyond the assumption of growth optimality ...... 11 1.4 Contextualizing omic data ...... 12 1.4.1 Why are some enzymes speciﬁc and some promiscuous? ...... 12 1.4.2 What is the role of metabolism in pathogenesis? 13 1.5 Characterizing interaction networks ...... 16 1.5.1 Genetic interaction networks ...... 16 1.5.2 Transcriptional regulatory networks ...... 17 1.6 Targeted metabolic knowledge expansion ...... 18 1.6.1 Discovering new human metabolic capabilities . 18 1.6.2 Discovering enzyme functions in E. coli . . . . . 19 1.7 Designing metabolic phenotypes ...... 19 1.7.1 Production of non-natural, commodity chemicals 21

vi 1.8 Discovering Drug Targets ...... 21 1.8.1 Exploiting deﬁciencies in cancer metabolism . . 22 1.8.2 Essential metabolites guide antibiotic discovery 23 1.8.3 Increasing antibiotic eﬃcacy through ROS production ...... 24 1.9 Coupling with other cellular processes ...... 25 1.9.1 Modeling transcription, translation, and metabolism ...... 25 1.9.2 Merging statistics with mechanistic networks . . 27 1.9.3 Structural Systems Biology ...... 28 1.10 From pathways to dynamics ...... 29 1.11 Acknowledgements ...... 31

Chapter 2 Minimal metabolic pathway structure is consistent with associated biomolecular interactions ...... 32 2.1 Abstract ...... 32 2.2 Introduction ...... 33 2.3 Results ...... 34 2.3.1 Defining a minimal network pathway structure for metabolism ...... 34 2.3.2 MinSpan pathways are supported by independent biological datasets ...... 37 2.3.3 MinSpan pathways are more biologically supported than human-defined pathways ...... 40 2.3.4 Global Comparison of MinSpan and Human- Derived Pathways ...... 41 2.3.5 Key Examples of MinSpan Differences . . . . . 46 2.3.6 MinSpan pathways predict transcriptional regulation ...... 49 2.3.7 MinSpan pathways aid in experimental discovery of novel regulation ...... 54 2.4 Discussion ...... 56 2.5 Materials and Methods ...... 58 2.5.1 MinSpan Formulation ...... 58 2.5.2 Correlation Analysis ...... 61 2.5.3 Comparing Pathway Databases ...... 62 2.5.4 Determining Reaction Fluxes and Transcription Factor Activities ...... 63 2.5.5 Growth conditions, RNA isolation, and RNA-seq 64 2.5.6 Transcript Quantification from RNA-seq reads . 64 2.5.7 Analysis workflow for dual perturbations . . . . 64 2.6 Acknowledgements ...... 65

vii Chapter 3 Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions ...... 66 3.1 Abstract ...... 66 3.2 Introduction ...... 67 3.3 Results ...... 68 3.3.1 Constructing the HM: human alveolar macrophage, iAB-AMØ-1410 ...... 69 3.3.2 Characterizing the HM, iAB-AMØ-1410 . . . . 71 3.3.3 Infection in silico, the HPM: iAB-AMØ-1410-Mt- 661 ...... 74 3.3.4 Characterizing the metabolism of the HPM . . . 80 3.3.5 Changes in reaction activity of iAB-AMØ-1410- Mt-661 ...... 82 3.3.6 Gene Essentiality Predictions ...... 84 3.3.7 Host-pathogen interactions in diﬀerent infectious states ...... 88 3.4 Discussion ...... 93 3.5 Materials and methods ...... 96 3.5.1 Construction of iAB-AMØ-1410 ...... 96 3.5.2 Biomass Composition ...... 97 3.5.3 Integration of iAB-AMØ-1410 and iNJ661 . . . 97 3.5.4 Flux-balance analysis ...... 98 3.5.5 Characterizing the macrophage’s metabolic functions ...... 99 3.5.6 Reaction activity tests ...... 99 3.5.7 Monte Carlo sampling ...... 100 3.5.8 Infectious state M. tb objective function formulation ...... 101 3.5.9 Gene essentiality tests ...... 101 3.5.10 Building M. tb infection speciﬁc models . . . . . 101 3.6 Acknowledgements ...... 102

Chapter 4 Model-driven multi-omic data analysis elucidates metabolic immunomodulators of macrophage activation ...... 103 4.1 Abstract ...... 103 4.2 Introduction ...... 104 4.3 Results ...... 106 4.3.1 RAW 264.7 metabolic network reproduces experimentally measured flux rates ...... 106 4.3.2 Deletion analysis differentiates critically essential reactions of M1 and M2 activation phenotypes . 107 4.3.3 Sensitivity analysis identifies immunomodulatory metabolites of macrophage activation ...... 110

viii 4.3.4 Sampling analysis identiﬁes potential intracellular mechanisms linked to immunomodulatory metabolites ...... 117 4.3.5 Predicted metabolic activation phenotypes enhance mechanistic interpretation of experimental data for activated RAW 264.7 cells ...... 120 4.4 Discussion ...... 124 4.5 Materials and methods ...... 127 4.5.1 Tailoring the global human metabolic network to RAW 264.7 ...... 127 4.5.2 Functional characterization using ﬂux balance analysis ...... 128 4.5.3 Reaction deletion analysis ...... 129 4.5.4 Sensitivity and sampling analysis of metabolite exchanges ...... 129 4.5.5 Quantitative analysis of stimulated RAW 264.7 macrophages ...... 130 4.6 Acknowledgements ...... 131

Chapter 5 iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states ...... 132 5.1 Abstract ...... 132 5.2 Background ...... 133 5.3 Results and Discussion ...... 135 5.3.1 Proteomic based erythrocyte reconstruction . . 135 5.3.2 Functional assessment of iAB-RBC-283 . . . . . 139 5.3.3 Metabolite connectivity ...... 140 5.3.4 The human erythrocyte’s potential as a biomarker 143 5.3.5 Utilizing iAB-RBC-283 to develop biomarker studies ...... 147 5.4 Conclusion ...... 152 5.5 Methods ...... 154 5.5.1 Reconstructing the comprehensive erythrocyte network ...... 154 5.5.2 Constraint-based modeling and functional testing 155 5.5.3 Calculating metabolite connectivity ...... 157 5.5.4 Analyzing iAB-RBC-283 as a functional biomarker 157 5.6 Acknowledgements ...... 158

Chapter 6 Multi-omic data-driven construction of personalized kinetic models of metabolism ...... 159 6.1 Abstract ...... 159

ix 6.2 Introduction ...... 160 6.3 Results ...... 161 6.3.1 Personalized kinetic modeling workflow . . . . . 161 6.3.2 Model parameters reflect the genotype, and vice- versa ...... 168 6.3.3 Temporal Decomposition ...... 172 6.3.4 Dynamic simulations for understanding ribavirin susceptibility ...... 178 6.4 Discussion ...... 180 6.5 Materials and Methods ...... 181 6.5.1 Study sample ...... 181 6.5.2 Data Preparation ...... 181 6.5.3 Metabolite profiling: standards, instrumentation, acquisition, and quantification ...... 182 6.5.4 Constraint-based model construction and flux calculations ...... 184 6.5.5 Kinetic model construction ...... 184 6.5.6 Temporal decomposition ...... 186 6.5.7 Statistical analysis for targeted associations . . 186 6.5.8 Statistical analysis for dynamic simulations . . . 187 6.6 Acknowledgements ...... 187

Chapter 7 Conclusions ...... 189 7.1 Recapitulation ...... 189 7.2 The Prospect of Genome-Scale Science ...... 189 7.3 Future ...... 192 7.4 Acknowledgements ...... 193

Bibliography ...... 194

x LIST OF FIGURES

Figure 1.1: Constraint-based modeling: motivation and deﬁnition . . . . . 4 Figure 1.2: Constraint-based modeling: introduction to methods for analysis 5 Figure 1.3: The multiple uses of high-throughput data in constraint-based models...... 10 Figure 1.4: Predictive case-studies in understanding underlying principles of interaction networks...... 15 Figure 1.5: Predictive case studies in metabolic engineering and drug targeting...... 20 Figure 1.6: Expanding predictive scope through integrative modeling. . . 26 Figure 1.7: Outline of this thesis dissertation ...... 30

Figure 2.1: Overview of the MinSpan algorithm...... 36 Figure 2.2: Correlation analysis shows MinSpan pathways are biologically relevant...... 38 Figure 2.3: Global comparison of MinSpan pathways with databases of human-defined pathways...... 42 Figure 2.4: The three differences between MinSpan and human-defined pathways...... 48 Figure 2.5: MinSpan pathways help predict transcription factor activity. . 50 Figure 2.6: MinSpan TRN predictions suggest informative dual perturbation experiments that led to discovery of novel E. coli transcriptional regulation...... 53

Figure 3.1: Workflow of building the cell-specific model iAB-AMØ-1410. . 70 Figure 3.2: Comparison of reactions and network capabilities of iAB-AMØ- 1410 and the global human network (Recon 1)...... 72 Figure 3.3: Workflow of building the host-pathogen model, iAB-AMØ-1410- Mt-661...... 76 Figure 3.4: Schematic of the integration and results of the alveolar macrophage (iAB-AMØ-1410) and Mycobacterium tuberculosis (iNJ661) reconstructions...... 81 Figure 3.5: Topological map of the metabolites and reactions in the central metabolism of M. tb...... 83 Figure 3.6: Reaction comparison of the three infection-specific models of iAB-AMØ-1410-Mt-661 for latent, pulmonary, and meningeal tuberculosis...... 90

Figure 4.1: Reaction deletion analysis diﬀerentiates metabolic diﬀerences observed for M1 and M2 activation...... 109 Figure 4.2: Network sensitivity analysis recapitulates literature-supported immunomodulatory metabolites...... 112

xi Figure 4.3: Randomized sampling elucidates intracellular mechanisms for observed macrophage activation and suppression...... 118 Figure 4.4: High-throughput data support in-silico predictions...... 121

Figure 5.1: Workﬂow for building a comprehensive in silico erythrocyte metabolic network...... 137 Figure 5.2: Topological map of the human erythrocyte metabolic network (iAB-RBC-283)...... 138 Figure 5.3: The metabolite connectivity of iAB-RBC-283, Recon 1, and the similarly sized organelles in Recon 1...... 142 Figure 5.4: To test the potential use of erythrocytes as biomarkers, we iden- tiﬁed the known morbid SNPs and drug targets of erythrocyte enzymes account for in the reconstructed network...... 144 Figure 5.5: Flux variability of the exchange reactions can be used to detect metabolic signatures of simulated morbid SNPs and drug treated erythrocytes...... 148

Figure 6.1: The Mass Action Stoichiometric Simulation (MASS) approach for building high-throughput personalized kinetic models. . . 163 Figure 6.2: Overview of the human erythrocyte kinetic model and metabolite measurements...... 165 Figure 6.3: Personalized kinetic metabolic can be used for genomic discoveries...... 169 Figure 6.4: Time scale decomposition of the ensemble kinetic models elucidated personalized systemic variations at speciﬁc time scales. 173 Figure 6.5: Dynamic simulations with the ensemble personalized kinetic models predisposition factors to ribavirin (RBV) induced hemolytic anemia...... 176

Figure 7.1: Physical and biological models to describe macroscopic properties of microscopic measurements...... 191

xii LIST OF TABLES

Table 1.1: A comparison of modeling and analysis techniques for high- throughput data ...... 3

Table 2.1: Pathway numbers and lengths for MinSpan and pathway databases in E. coli and S. cerevisiae ...... 44

Table 3.1: Modiﬁcations of the original iNJ661 biomass objective function to produce the ﬁnal infectious state Mycobacterium tuberculosis objective function ...... 79 Table 3.2: Genes predicted to be essential in iAB-AMØ-1410-Mt-661. . . 85 Table 3.3: Genes predicted to be unessential in iAB-AMØ-1410-Mt-661. . 86 Table 3.3: ...... 87

Table 5.1: Morbid SNPs with non-erythrocyte related pathologies . . . . 146 Table 5.2: Uniqueness of metabolic signatures ...... 151

xiii ACKNOWLEDGEMENTS

I am indebted to numerous individuals for the successes I have had at the Systems Biology Research Group (SBRG) and UC San Diego. First and foremost, I would like to thank my thesis advisor, mentor, and friend: Bernhard Palsson. He has made graduate school a lively and educational experience for which I could not have asked for a better advisor. He has provided a “semi-infinite” amount of intellectual and financial resources for me to study systems biology, develop professionally, and develop personally. His undying optimism and interest in my work has pushed me through the tougher times of graduate school. I have also had a few other key mentors throughout my undergraduate and graduate experience at UC San Diego. As an undergraduate student, I worked with a graduate student, Ines Thiele, in the SBRG. Her enthusiasm and direction is what got me excited about graduate school and to stay at UC San Diego for my Ph.D. As a graduate student, I have received a lot of direction from Neema Jamshidi. This thesis would not have been possible without him. He is the senior author of my first manuscript and the corresponding author for the last manuscript that this thesis is comprised of. He is also a dear friend. During the first few months as a graduate student, I was lucky to consult for GT Life Sciences and Iman Famili, an exposure that taught me a great deal about translating ideas into industry and the importance of presentation. Iman was the senior author of a manuscript that we co-authored together in 2011. In 2012, Monica Mo and I co-first authored a manuscript together. Her mentorship and fascination of immunology drove the project and I learned a great deal from her senior experience. I owe a great deal of gratitude to the senior members of the lab throughout my time at the SBRG, who have also been important mentors, in particular: Nathan Lewis, Harish Nagarajan, Jan Schellenberger, Niko Sonnenschein, Daniel Hyduke, and Karsten Zengler. In addition, my work would not have been possible without the help of my colleagues and co-authors: Daniel Zielinski, Joshua Lerman, Steve Federowicz, Haythem Latif, Ali Ebrahim, Jonathan Monk, Douglas McCloskey, Zachary King, and Miguel Campodonico. I am a chatter box and I apologize to those around me for delaying their thesis’ as I am probably a con-

xiv stant distraction. The collaborative environment in the SBRG is unique and truly amazing. I hope it continues. BΩΠ for life. Graduate school is an arduous and frustrating experience at times. I owe my emotional and spiritual well-being to my close friends, in particular: Ranvir Mangat, Eric Chehab, Alex Fortenko, Michael Grubic, Nikhil Rao, and Adam Young. Finally, I would like to thank my loving and beautiful parents, Ardeshir and Roshanak. The successes in this thesis are more a testament of their ability to raise a child than my own accomplishments. In 1991, they moved to the United States and provided a four year old Iranian boy limitless opportunities. I am grate- ful for their undying love, support, and teachings. This thesis is dedicated to them.

Chapter 1 in part is a reprint of the material Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Gen. (2014) 15:107-120. The dissertation author was the primary author.

Chapter 2 in part is from a manuscript that is under review for publication Bordbar A, Nagarajan H, Lewis NE, Latif H, Ebrahim A, Federowicz S, Schel- lenberger J, Palsson BO. Minimal metabolic pathway structure is consistent with associated biomolecular interactions. The dissertation author was the primary author.

Chapter 3 in part is a reprint of the material Bordbar A, Lewis NE, Schel- lenberger J, Palsson BO, Jamshidi N. Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions. Mol Syst Biol. (2010) 6:422. The dissertation author was the primary author.

Chapter 4 in part is a reprint of the material Bordbar A*, Mo ML*, Nakayasu ES, Schrimpe-Rutledge AC, Kim Y, Metz TO, Jones MB, Frank BC, Smith RD, Peterson SN, Hyduke DR, Adkins JN, Palsson BO. Model drive

xv multi-omics analysis elucidates metabolic immunomodulators of macrophage activation. Mol Syst Biol. (2012) 8:558. The dissertation author was one of the two primary authors.

Chapter 5 in part is a reprint of the material Bordbar A, Jamshidi N, Palsson BO. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol. (2011) 5:110. The dissertation author was the primary author.

Chapter 6 in part is from a manuscript that is being prepared for publication Bordbar A, McCloskey D, Zielinski D, Sonnenschein N, Jamshidi N, Palsson BO. Multi-omic data-driven construction of personalized kinetic models. The dissertation author was the primary author.

Chapter 7 in part is a reprint of the material Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Gen. (2014) 15:107-120. The dissertation author was the primary author.

xvi VITA

2008 B.S. in Bioengineering: Biotechnology, University of Califor- nia, San Diego

2014 Ph.D. in Bioengineering, University of California, San Diego

PUBLICATIONS

1. Zielinski DC, Corbett AJ, Bordbar A, Thomas A, Jamshidi N, Palsson BO. Characteristic ﬂux states and molecular drivers of cancer cell metabolism. In Preparation 2. Ebrahim A, O’Brien EJ, Lerman JA, Kim D, Bordbar A, Feist AM, Palsson BO. Estimating parameters in a genome-scale metabolic and expression model of Escherichia coli. In Preparation 3. Kim D, Ebrahim A, Seo SW, Bordbar A, Palsson BO. Elucidating the transcriptional regulation of nitrogen metabolism with systems approaches. In Preparation 4. Hefzi H, Bordbar A, Nagarajan H, Baycin-Hizal D, Paglia G, Betenbaugh MJ, Lewis NE, Palsson BO. Using a genome-scale metabolic model elucidates deleterious mutations in the Chinese hamster ovary cell line CHO-K1. In Preparation 5. Bordbar A, McCloskey D, Zielinski D, Sonnenschein N, Jamshidi N, Palsson BO. Multi-omic data-driven construction of personalized kinetic models of metabolism. In Preparation 6. Sonnenschein N, Zielinski D, De Bree J, Palsson S, Thomas A, Bordbar A, Jamshidi N, Palsson BO. The MASS Toolbox: Accessible dynamic modeling. In Preparation 7. Zielinski D, Filipp F, Bordbar A, Smith J, Herrgard M, Mo ML*, Palsson BO*. Identifying drug response signatures using genome-scale metabolic network analysis of human cell line expression data. In Preparation *Authors Contributed Equally to Work 8. Bordbar A, Nagarajan H, Lewis NE, Latif H, Ebrahim A, Federowicz S, Schellenberger J, Palsson BO. Minimal metabolic pathway structure is consistent with associated biomolecular interactions. Under Review 9. Nam H, Campodonico M, Bordbar A, Hyduke DR, Kim S, Palsson BO. A systems approach to predict oncometabolites via context-speciﬁc genome- scale metabolic networks. Under Review

xvii 10. Thomas A, Rahmanian S, Bordbar A, Palsson BO, Jamshidi N. Systems analysis of human platelet metabolism identiﬁes metabolic signature for as- pirin resistance. Scientiﬁc Reports. (2014) 4:3925. 11. Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Gen. (2014) 15:107- 120. 12. Lewis NE*, Liu X*, Li Y*, Nagarajan H*, Yerganian G, O’Brien E, Bordbar A, Roth AM, Rosenbloom J, Bian C, Xie M, Chen W, Li N, Baycin-Hizal D, Latif H, Forster J, Betenbaugh MJ, Famili I, Xu X, Wang J, Palsson BO. The genomic divergence of CHO cell lines is revealed by the genomic sequence of the Chinese Hamster, Cricetulus griseus. Nat Biotech. (2013) 31(8):759-765. *Authors Contributed Equally to Work 13. Thiele I, Fleming RMT, Que R, Bordbar A, Diep D, Palsson BO. Multi- scale modeling of metabolism and macromolecular synthesis in E. coli and its application to the evolution of codon usage. PLoS ONE. (2012) 7(9):e45635. 14. Bordbar A*, Mo ML*, Nakayasu ES, Schrimpe-Rutledge AC, Kim Y, Metz TO, Jones MB, Frank BC, Smith RD, Peterson SN, Hyduke DR, Adkins JN, Palsson BO. Model drive multi-omics analysis elucidates metabolic immunomodulators of macrophage activation. Mol Syst Biol. (2012) 8:558. *Authors Contributed Equally to Work 15. Bordbar A and Palsson BO. Using the global human metabolic reconstruction to study physiology and pathology. J Int Med. (2012) 271(2):131-141. 16. Bordbar A, Feist AM, Usaite-Black R, Woodcock J, Palsson BO, Famili I. A multi-tissue type genome-scale metabolic network for analysis of whole-body systems biology. BMC Syst Biol. (2011) 5:180. 17. Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, Kang J, Hyduke DR, Palsson BO. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v 2.0. Nat Protocols. (2011) 6:1290-1307. 18. Bordbar A, Jamshidi N, Palsson BO. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol. (2011) 5:110. 19. Lewis NE, Schramm G, Bordbar A, Schellenberger S, Andersen MP, Cheng JK, Patel N, Yee A, Lewis RA, Eils R, Konig R, Palsson BO. Large-scale in silico modeling of metabolic interactions between cell types in the human brain. Nat Biotech. (2010) 28:1279-1285. 20. Bordbar A, Lewis NE, Schellenberger J, Palsson BO, Jamshidi N. In- sight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions. Mol Syst Biol. (2010) 6:422.

xviii 21. Thiele I, Fleming RMT, Bordbar A, Schellenberger J, Palsson BO. Func- tional characterization of multiple equivalent states of Escherichia coli’s transcriptional and translational machinery. Biophys J. (2010) 98(10):2072-2081.

xix ABSTRACT OF THE DISSERTATION

Utilizing genome-scale models to enhance high-throughput data analysis: from pathways to dynamics

Aarash Bordbar

Doctor of Philosophy in Bioengineering

University of California, San Diego, 2014

Professor Bernhard Ø. Palsson, Chair Professor Shankar Subramaniam, Co-Chair

Technological advances have revolutionized the life sciences. The cost of biological data generation has decreased exponentially in the past decade allowing simultaneous measurement of various biomolecules, which has enabled biologists to study cells as systems of interacting components. However, it is becoming clear that the true bottleneck to biological understanding is not data generation, but data analysis, leaving many data sets incompletely analyzed. To cope with this data deluge, diverse and sophisticated data analysis techniques have been developed and implemented. In this thesis, methodological advances for genome-scale metabolic models are developed and applied to interpret high-throughput data sets

xx to better understand cellular processes at the pathway, network, and dynamic levels. First, an un-biased, algorithmic approach to enumerate biological pathways is presented. The computed pathways are more consistent with the biomolecular interactions of the associated proteins and genes than even classical, human-defined pathways. Further, the computed pathways are utilized to discover novel and non-intuitive transcriptional regulatory interactions in Escherichia coli. Second, mRNA and protein expression omic data are used to build genome-scale metabolic models of the human alveolar macrophage and the murine macrophage-like cell line, RAW 264.7. Model simulations and prospective high-throughput experiments determined metabolism’s role in macrophages during infection and inflammation at the network level. Finally, personalized data-driven kinetic models of human erythrocyte metabolism are presented. Tailored to 24 individuals’ plasma and intracellular metabolite levels, the kinetic models better represent the underlying genetic makeup of the individuals than the metabolite levels alone. Further, the personalized models were used to predict and interpret side effects on an individual basis.

xxi Chapter 1

Constraint-based models predict metabolic and associated cellular function

1.1 Introduction

Understanding the genotype-phenotype relationship is at the core of the life sciences. For the latter half of the 20th century, the reductionist approaches of genetics, biochemistry, and molecular biology focused on the elucidation of biological components underlying this fundamental relationship. These approaches have provided detailed understanding of individual components, but they do not address the systemic interactions of biological and environmental components that underlie phenotypes. Technological advances have now enabled high-throughput methods to comprehensively characterize biological components simultaneously. The cost of such data generation has decreased exponentially and the amount of data generated has become more abundant, which enables biologists to view and study cells as systems of interacting components. To cope with the rapidly growing number of high-dimensional datasets, sophisticated data analysis methods are needed. Diverse approaches that range from stochastic kinetic models to statistical Bayesian networks have been ap-

1 2 plied, each with differing rationales and advantages (Table 1.1). One of these approaches is constraint-based reconstruction and analysis that is applied to genome- scale metabolic networks. Reconstructed genome-scale metabolic networks contain curated and systematized information about the known small metabolites and metabolic reactions of a cell type, which is based on its annotated genome and on experimental literature [1, 2]. Genome-scale metabolic networks can be converted to a mathematically consistent format, which is known as the stoichiometric matrix (Fig. 1.1). This matrix is the central component of a constraint-based model (CBM), which can be queried by an ever-growing set of modeling methods [3] (Fig. 1.2). CBMs have been primarily built for metabolic networks, including multicel- lular metabolic interactions [4, 5, 6, 7, 8]. CBMs have also been built for signaling [9, 10], transcriptional regulation [11], and macromolecule synthesis [12]. 3 Refs [13] [14] [3, 15] [16] [17, 18] [19, 20] [21] Disadvantages Computationally intensive, needs parameters Computationally intensive, needs parameters Noregulation, dynamics, nor concentrations Biologicaltems sys- arediscrete rarely Statistical, over fittingrequires training issues, data Dynamicsnot are represented explicitly Biased toman hu- pathways defined Advantages Mechanistic, dynamic,stochasticity captures Mechanistic, dynamic Mechanistic, large-scale, no kinetic parameters Can modelnamics and dy- regu- lation Unbiased,cludes in- disparate data/associations previous Incorporates prior biologicalencompasses most data, cellular processes Simple and quick, takes prior knowledge into account Typical prediction type Fluxes, Concen- trations, Regula- tory states Fluxes, Concen- trations, Regula- tory states Metabolicstates, flux essentiality Global gene states, activity states of genes on/off Probability distribution score Enriched clusters of genes andteins pro- Enrichedways path- Parameter- ization Detailednetic ki- parame- ters Detailednetic ki- parame- ters Network topology, exchange rates Rule-based interaction network High- throughput datasets Interaction network based ondata biological Pathway databases : A comparison of modeling and analysis techniques for high-throughput data Model systems Small-scale biological processes Small-scale biological processes Genome-scale metabolism Signaling/ transcriptional regulatory networks Gene regulatory/ signaling networks Protein- protein, geneticteraction in- networks Metabolic/ signaling networks Table 1.1 Method Stochastic kinetic modeling Determinis- tic kinetic modeling Constraint- based modeling Logical, Boolean, rule-based formalisms Bayesian approaches Graph/ Interaction networks Pathway Analysis 4

Figure 1.1: Constraint-based modeling: motivation and definition The functional capabilities of biological systems are constrained by their genetics and environment, and physico-chemical laws. For example, most natural environments are limited in nitrogen or phosphate. In addition, the rate of photosynthesis is a function of latitude as the incident flux of photons changes. In the 1960s, Daniel Atkinson realized that solvent capacity was a limitation in all cells, as cells tend to consist of 70% water and 30% biomass [22]. In 1973, Paul Weisz showed that most intracellular processes operate at rates that are close to the limits of diffusion [23]. These and other myriad constraints under which cells operate and evolve have been summarized [24].

We can now systematically reconstruct metabolic and other biochemical reaction networks. Metabolic networks are analogous to flow networks where metabolites (shown as circles) ‘flow’ through the network similar to liquids flowing in a pipe. These flows, and thus the state of a network are subject to myriad constraints. The network can be converted into a mathematical format known as the stoichiometric matrix for computation. Rather than derive a single solution, constraint-based models have an associated solution space (shown as a box) where all feasible phenotypic states exist given the imposed constraints. This allows one for simultaneous accounting for the many processes that act on and in cells.

Metabolite flow is constrained by, among other things, the network topology (for example, the connection of metabolites) and a steady-state assumption (for example, the assumption that internal metabolites must be produced and consumed in a mass-balanced manner). It also constrained by the known upper (also known as capacities; for example v1,max) and lower bounds of individual reaction fluxes. Im- posing such constraints ‘shrinks’ the solution space to a more biologically relevant region. The challenges in constraint-based modeling lie in identifying and imposing the necessary and dominant constraints to define a solution space, as well as in probing the solution space in a manner such that physiologically relevant fluxes or phenotypes, are determined. 5

Figure 1.2: Constraint-based modeling: introduction to methods for analysis Constraint-based models (CBMs) have been widely deployed; see Lewis et al. [3] for an extensive description of the developed methods. However, most of these techniques are based on two key components: the constraints on the biological system and the analysis method to predict ﬂuxes.

Constraining metabolic models: The toy model (part a) contains metabolites (shown as circles) that are converted by reactions (shown as arrows). Each reaction has a range of potential flux values, which can be constrained (shown as sliders). The imposition of constraints defines the associated solution space of the CBM. Most methods modify the metabolic reaction bounds for model parameterization. Simple constraints include fixing cellular input and output ranges on the basis of uptake and secretion of metabolites [25], as well as carrying out genetic knockouts by setting the associated bounds of the reactions to zero [26]. More advanced techniques include modifying reaction bounds on the basis of mRNA and protein expression data, either by setting the bounds to zero for reactions that correspond to absent transcripts and proteins [27, 28] or by linearly adjusting the bounds based on transcript and protein abundances [29].

Determining flux distributions: After model parameterization, fluxes are calculated. A simplified solution space is depicted with two reactions (v1 and v2) and an objective function (vobj) (part, b). The standard approach of flux-balance analysis [15] either maximizes or minimizes the flux of a user-defined reaction (that is, the objective function) using linear programming (shown by the green circle). An- other common approach is flux-variability analysis [30], in which the the maximum and minimum fluxes through each reaction are iteratively computed when the flux of the objective function is typically constrained to its maximum value (shown by yellow circles). Finally, Markov Chain Monte Carlo (MCMC) sampling computes many candidate flux distributions (shown as red dots) that provide a probability distribution for the fluxes. This approach is unbiased, as no assumption of an objective is required. 6 7

This introductory chapter begins with a brief description of the four-phase history of the development of CBM applications. Next, studies are presented that show the recent progress of integrating high-throughput data sets with the mechanistic and functional context of CBMs to predict metabolic phenotype. Finally, a roadmap is presented of the remaining chapters, where constraint-based modeling and kinetic models are utilized to interpret high-throughput data.

1.2 Foundational Developments

Constraint-based analysis has been applied to biochemical reaction networks for more than 25 years. To put these developments into context, we exhaustively searched the literature using Web of Knowledge to collect research articles that utilize CBMs for interpreting and predicting biological phenotypes. We collected 645 articles from 1986 to June 15, 2013. The articles are available with short descriptions of their contributions to the research ﬁeld at the literature on model- driven analysis website hosted by the University of California, San Diego Systems Biology Research Group (http://sbrg.ucsd.edu/cobra-predictions). An analysis of this literature shows that the history of CBMs can be divided into four phases.

1.2.1 Initial Studies (1986-1998)

CBMs were initially used to determine theoretical pathway yields and metabolite overflows [31, 32]. Experimental metabolic fluxes and growth rates were shown to be consistent with fluxes that were computed on the basis of optimization of cellular objective functions, including minimal production of reactive oxygen species (ROS) for hybridoma cells [33] and maximal growth rate for laboratory strains of Escherichia coli [34]. Concurrently, algorithms - such as Elementary Flux Modes [35] and Extreme Pathways [36] - were developed [37] to exhaustively calculate metabolic pathways in CBMs for analysis of network topology [38] and for uses in metabolic engineering [39] uses. The quantitative match between CBM predictions and measured cellular behavior opened up the possibility of predicting phenotypes from a biochemically reconstructed network. 8

1.2.2 Building Genome-Scale Networks (1999-2004)

The ability to sequence whole genomes [40] made it possible to formulate CBMs at the genome-scale and allowed representation of the complete metabolic gene content in the assessment of phenotypic functions [41]. Importantly, metabolic reactions in a CBM could be directly linked to the genotype of the target cell, which allowed for prediction of the consequences of gene knockouts [26, 42]. These genome-scale models facilitated the study of the global organization of cellular behavior: such as pathway structure [43], adaptive evolution end points [44], metabolic ﬂuxes [45], and bacterial evolution [46, 47].

1.2.3 Integrating Omics Data (2005-2009)

As the generation of ‘omics’ data became cheaper and as larger data sets appeared, researchers began to incorporate these data sets into CBMs [48] (Fig. 1.3). Initially, the metabolic network was used as a scaﬀold to interpret transcriptional changes [49, 50], in a manner similar to pathway enrichment analysis (Fig. 1.3a). Subsequently, omic data were used more directly by further constraining individual metabolic reactions to increase the context speciﬁcity of CBMs [27, 28] (Fig. 1.3b).

1.2.4 Maturing to Predictive Practice (2010-)

These efforts resulted in highly curated and validated genome-scale models that are now enabling the research community to obtain meaningful predictions of biological functions. The following examples are focused on this most recent phase in the field of CBM development. We first discuss the latest evaluations of the assumptions of constraint- based modeling . Second, we discuss the integration of genome-scale data sets - specifically, omic data and biomolecular interaction data - with CBMs. Third, we focus on how discrepancies between model predictions and experimental data allow targeted experimentation that leads to biochemical discovery. Fourth, translational applications of constraint-based modeling, including metabolic engineering 9 and drug target discovery, are discussed. Finally, we focus on recent advances of integrating CBMs with other modeling approaches to increase predictive scope.

1.3 Reﬁning objectives

The first constraint-based method for biological predictions was flux balance analysis (FBA). Its formulation is rooted in the hypothesis that a cell is “striving” to achieve a metabolic objective (Fig. 1.2). Studies have shown that optimizing the assumed cellular objectives of growth [44] and energy usage [51, 52], one can predict metabolic fluxes in microbes. Other studies have questioned the universality of the biomass growth objective function for predicting relevant metabolic fluxes [53, 54, 55]. 10

Figure 1.3: The multiple uses of high-throughput data in constraint- based models. Constraint-based modeling can be used to interpret and augment omic data sets by using an underlying cellular network that has been biochemically validated. Metabolites are represented by circles. (A) Similar to pathway enrichment analysis and interaction networks, high-throughput data can be integrated with the metabolic network topology to determine enriched regions and even significantly perturbed metabolites [49]. (B) Omic data add an additional layer of constraints for reaction fluxes. One study [56] integrated expression profiling data to determine context-specific flux distributions (pathway shown in red), which increases the fidelity of the data (represented as bars) as well as the accuracy of flux predictions (upper panel). In addition, two other stuides [57, 58] used omic data to build cell- and tissue- specific models of human metabolism by removing unexpressed reactions (shown as discolored reactions) from the global human metabolic network (lower panel). Differences in these networks can be exploited to learn unique features of each network. (C) Constraint-based analysis predictions can be compared and validated against high-throughput data sets. One study [59] compared flux-balance analysis solutions of different objectives against 13C fluxomics data to find a combination of objectives that best fit the in vivo fluxes. Abbreviations: HT - High-throughput 11

1.3.1 Do cells maximize growth rate?

To identify the objective function that best predicts experimental data on growing cells, one study [59] greatly expanded the initial assessment of appropriate objective functions for FBA by compiling 44 metabolic flux analysis data sets of in vivo flux distributions for E. coli, and the researchers evaluated the ability of a reduced CBM to predict these measurements using dozens of single and combined candidate cellular objective functions (Fig. 1.3c). The best representation for the in vivo fluxomic data sets was a Pareto surface that is defined by a combination of three objectives: maximizing biomass generation, maximizing ATP generation, and minimizing reaction fluxes across the network; that is, the minimization is a proxy for the most efficient utilization of the proteome [60]. Utilizing flux variability analysis (FVA) (Fig. 1.2), the authors found that there is some ‘slack’ in metabolic reaction fluxes when the cell is operating close but not on the Pareto surface. In fact, they also observed that the in vivo flux distributions were slightly sub-optimal. The authors showed that this sub-optimality is most likely an evolutionary adaptation that allows rapid adjustment to environmental perturbations. In this study, metabolic flux analysis simulations were limited to central carbon metabolism. Future studies are therefore needed to determine whether the optimality principles that have been derived in this study will hold for other metabolic subsystems that are studied by different labelled metabolites. Nonetheless, CBMs can use optimality principles to predict the approximate growth state and the ‘hedging’ functions that keep the cells from fully reaching the predicted optimal states, which are yet to be delineated. Such hedging functions are expected to vary from strain to strain and organism to organism on the basis of their evolutionary history.

1.3.2 Moving beyond the assumption of growth optimality

Although the prediction of optimal growth rates has historically received much attention, most of the recent studies that are highlighted in this Introduction do not assume optimality of cellular growth. Researchers have been increasingly adopting alternative unbiased approaches - such as Markov Chain Monte Carlo 12

(MCMC) sampling, omic data integration, and metabolic pathway analyses [3] - that are not subject to assumptions of optimality.

1.4 Contextualizing omic data

The constraint-based modeling framework is amenable to simultaneous integration of a range of omic data types [48] (Fig. 1.3). In particular, omic data have been used to constrain calculated ﬂux distributions and as a comparison and validation tool for model predictions. Such omic data integration has enabled context-speciﬁc studies of the metabolism of an organism and, in the following cases, the studies of enzyme promiscuity and pathogenesis.

1.4.1 Why are some enzymes speciﬁc and some promiscuous?

It is thought that ancestral enzymes were promiscuous and inefficient, and that they have evolved to become catalytically efficient and specific [61]. However, it is not well understood why such evolution took place for some enzymes, but not for others. To address this question, one study [62] classified proteins in the E. coli CBM [63] into two groups (that is, specialist and generalist enzymes) on the basis of the number of reactions that are catalyzed by each enzyme. The authors showed that reactions that are associated with specialist enzymes are more likely to be essential on the basis of growth phenotypes of knockout strains, and that these reactions are more likely to carry a high and variable flux across hun- dreds of in silico environmental conditions on the basis of MCMC sampling (Fig. 1.2). Large, disparate data sets were used to validate simulations. Gene essentiality predictions were validated by comparison with a gene deletion collection [64]. An analysis of kinetic parameters from the Braunschweig enzyme database (BRENDA) [65] showed that specialist enzymes have higher in vitro catalytic activity (that is, higher turnover number (kcat)) and higher substrate affinity (that is, lower Michaelis constant (Km)). Omic data sets revealed that specialist enzymes are more tightly regulated at multiple levels, which is indicated by transcriptional 13 and post-translation modifications, as well as by small molecule mediated control. Although enzyme promiscuity has not been fully elucidated and might not be fully captured in the model, the CBM is nevertheless the best self-consistent representation of known metabolic reactions and enzymes in E. coli. Consequently, these predictions provided a direction to integrate and interpret disparate data sets with the CBM and thereby validating a genome-scale hypothesis that the evolution of an enzyme towards specificity and catalytic efficiency is dependent both on the function of the enzyme in its metabolic network context and its evolutionary response to selection pressures.

1.4.2 What is the role of metabolism in pathogenesis?

Intracellular pathogens adapt metabolism to their host environment during pathogenesis. One study [56] generated transcription profiling data of pathogenic intracellular growth to investigate the relationship between metabolism and pathogenesis of Listeria monocytogenes. The researchers analyzed the data both through traditional pathway enrichment analysis (Table 1.1) and also integration with a CBM of L. monocytogenes. They utilized the iMAT algorithm [27] that computes a flux distribution which best uses reactions that are associated with upregulated genes and which avoids using reactions that are associated with downregulated genes (Fig. 1.3b), thereby predicting differential reaction use between conditions. By comparing pathway enrichment analysis of the transcription data with and without iMAT, the authors found that iMAT increased accuracy in representing known changes in intracellular growth because the CBM and the computed flux states contextualize the expression data. In this way, incorrectly upregulated transcripts - either due to a false-positive measurement or post-transcriptional regulation - are algorithmically ‘corrected’ if the rest of the associated pathway is inactive (Fig. 1.3b) and vice versa. The higher predictive accuracy helped the authors focus their experiments on highly active pathways, which were then experimentally confirmed by generating conditional knockout strains. Prospective experiments that were based on the identified pathways showed that limiting concentrations of branched-chain amino acids induced virulence activator genes and elucidated 14 the role of amino acid metabolism in pathogenesis. Thus, analysis of transcription data is often hindered by the low signal-to-noise ratio and the limitation that post-transcriptional regulation is not captured in these datasets. These limitations can be ameliorated for metabolic transcripts by the contextualization of the iMAT algorithm and CBMs. 15

Figure 1.4: Predictive case-studies in understanding underlying principles of interaction networks. Many network types are used to represent cellular behavior. Recent studies have compared the properties of interaction networks against constraint-based models (CBMs) to learn global principles. (A) One study [66] compared an experimental set of genetic interactions for metabolic genes against interactions that were predicted by flux-balance analysis (FBA). The CBM was able to recapitulate many of the in vivo principles. However, there were a high number of incorrect model predictions. Using machine-learning techniques, key changes to the metabolic network that would improve model accuracy were identified. Using growth screens, the authors validated that the synthesis of NAD+ from amino acids was only possible from L-tryptophan (L-trp) buts not L-aspartate (L-asp). ∆bna refers to any of the genes that are related to the kyurinene pathway, including bna1, bna2, bna4, and bna5. (B) Another study [67] calculated metabolic pathways - Elementary Flux Patterns - for the network. Elementary Flux Patterns decompose the metabolic network into distinct functional pathways (shown by different colors). The degree of co-regulation of each pathway’s genes was calculated, revealing that some pathways are highly correlated, whereas others are not. Variation in co-regulation was attributed to the ‘cost’ that is needed for building the proteins in a particular pathway. 16

1.5 Characterizing interaction networks

Recent work shows that CBMs can be used to place interaction networks of diverse biological components into context and to interpret these networks. In- teraction networks describe the phenomenological interactions between diﬀerent biomolecules, including: genes [68], proteins [69], and transcription factors [70]. Such interaction networks are information dense and highly valuable, but they cannot generally be formulated into a modeling framework for prediction of physiological functions. In the following examples, the mechanistic information in CBMs is used in conjunction with biomolecular interaction networks to derive principles underlying cellular organization.

1.5.1 Genetic interaction networks

The theoretical aspects of genetic interactions of metabolic genes in Saccha- romyces cerevisiae that were derived using CBMs has been studied [71, 72, 73]. A recent study [66] used a CBM and experimental data to discover mechanistic principles that underlie global properties of S. cerevisiae genetic interaction data (Fig. 1.4a). First, the authors experimentally and computationally quantified genetic interactions for the genes in the S. cerevisiae CBM [25]. Both the experimental data and computational predictions showed a global property that genes which are associated with low-fitness single mutants share many genetic interactions. They then used the CBM to propose a mechanistic explanation of this phenomenon. The researchers showed that these deleterious gene deletions directly disrupt the production of multiple metabolite precursors that are necessary for cellular growth. Thus, these genes share genetic interactions with other genes that contribute to the various aspects of their functionality. The same researchers found that FBA underpredicts genetic interactions, which can be attributed to the optimality assumption of FBA, the inherent inability of FBA to capture regulation and data on toxic intermediates, or to an incomplete or incorrectly annotated metabolic network. To determine whether modifying the CBM could increase its predictive power, a machine learning method 17 was implemented to reconcile the two networks by removing reactions, modifying reaction reversibility, and altering the biomass function in the CBM. Model re- finement identified one of the two NAD+ biosynthetic pathways from amino acids in the CBM as a source of inaccurate predictions. Through growth experiments using mutant strains, the researchers confirmed that the second biosynthetic pathway was not present in S. cerevisiae. This study shows that CBMs can suggest the mechanistic underpinnings of genetic interaction networks and that the comparison of the metabolic and genetic interaction networks can lead to targeted improvements in biochemical knowledge.

1.5.2 Transcriptional regulatory networks

CBMs have aided the characterization of underlying principles of transcriptional regulatory networks for E. coli metabolism [67] (Fig. 1.4b). Previous studies have shown a moderate link between metabolic topology and transcriptional regulation [43, 74]. To provide a more detailed analysis, one study [67] calculated potential pathways through metabolic subsystems of the E. coli CBM. Metabolic pathway structure has been of great interest [43] because a full enumeration of pathways can describe all possible steady-state metabolic phenotypes. However, the difficulty in computing Elementary Flux Modes for genome-scale networks has hindered their widespread use. A recently developed alternative - Elementary Flux Patterns [75] - calculates metabolic pathways in individual subsystems. This method ignores pathways that traverse multiple metabolic subsystems but it is computationally tractable for genome-scale networks. By comparing Elementary Flux Patterns with transcription profiling data sets [76], the authors showed that pathways were only moderately co-expressed, but the degree of such expression varied greatly from perfect co-expression to no co-expression. They then showed that transcriptional regulation of pathways is dependent on the ‘cost’ of producing the associated enzymes and on the required response time. In pathways that contain ‘expensive’ proteins (that is, proteins that are larger in size), transcriptional regulation typically occurs for all enzymes in the pathway, whereas pathways with ‘low-cost’ proteins are typically transcriptionally regulated only at the first 18 and last enzymes of the pathway. This categorization explained some cases of low co-expression. Thus, by pairing the CBM network topology with transcriptional regulatory networks, this study was able to outline key principles of metabolic regulation for different types of metabolic pathways.

1.6 Targeted metabolic knowledge expansion

The studies discussed above focus on integrating CBMs with large-scale data sets to gain mechanistic understanding. However, incomplete knowledge of the metabolism of the target cell leads to inaccurate predictions. One feature of computational models is that incorrect predictions can identify missing or incorrect metabolic knowledge. Thus, the discrepancies between CBM predictions and experimental data have been used to design targeted experiments that correct such inaccuracy in metabolic knowledge [77, 78].

1.6.1 Discovering new human metabolic capabilities

The initial reconstruction of the global human metabolic network - Recon 1 [79] - is incomplete owing to gaps in our knowledge of human metabolism. Thus Recon 1 is missing metabolic reactions. Using an established protocol [78], one study [80] identified such gaps in our knowledge by simulating either production or consumption of every metabolite to assess whether the metabolite was fully connected to the rest of the network. For the metabolites that were not fully connected, a universal database of metabolic reactions [81] was used to predict the fewest reactions required to fully connect them. The authors found 73 candidate ‘gap-filling’ solutions that fully connected the disconnected metabolites, 47 which were supported by the literature. Focusing on gluconate, which is a disconnected metabolite, the authors experimentally characterized open reading frame 103 on chromosome 9 (C9orf103) as the gene that encodes gluconokinase. This study illustrates how a self-consistent model of metabolism guides researchers to refine experimentals to fill in missing gaps in our current knowledge. 19

1.6.2 Discovering enzyme functions in E. coli

Constraint-based modeling has been used to discover biochemical knowledge about the well-characterized metabolic network of E. coli. Through systematic genetic perturbation of E. coli central carbon metabolism, one study [82] discovered a novel pathway and previously uncharacterized enzymatic functions. Single, double, and triple knockout strains of central metabolic genes were grown on 13 different growth conditions with various carbon sources to determine positive and negative genetic interactions. Concurrently, genetic interactions were predicted using the E. coli CBM [63]. After careful removal of false predictions that were due to model assumptions (for example, the inability of FBA to differentiate major and minor isozymes as enzyme abundance and kinetic activity are not captured), it was observed that discrepancies related to talAB interactions in the pentose phosphate pathway could not be reconciled. To determine the cause, the authors generated transcription and metabolite profiling data for the wild type and knockout strains. A metabolomic analysis identified a new metabolite - sedoheptulose-1,7-bisphosphate - that had not previously been characterized in E. coli, which suggests the existence of a novel reaction. Utilizing metabolic flux analysis and in vitro enzyme assays, they confirmed that phosphofructokinase car- ries out the reaction and glycolytic aldolase can split the seven-carbon sugar into three- and four-carbon sugars. Thus the detailed analysis of the CBM against data discrepancies found two new catalytic functions of classical glycolytic enzymes.

1.7 Designing metabolic phenotypes

CBMs have been used for translational applications, including the design of metabolic phenotypes. In the past ten years, many algorithms have been developed for predicting useful genetic manipulation strategies for metabolic engineering [83, 84]. They have also been important in assessing the net energy balance and level of greenhouse gas emission for bioethanol and biodiesel production [85]. Here, we discuss one recent CBM success in this ﬁeld of research. 20

Figure 1.5: Predictive case studies in metabolic engineering and drug targeting. Constraint-based models have been used for answering important questions in translational research. (A) One study [86] used multiple computational and experimental tools to design an Escherichia coli strain that produces 1,4- butanediol (BDO). An unengineered wild-type (WT) strain trades off metabolite production with cellular growth (shown by the solid line in the solution space). Using the OptKnock algorithm, BDO production was ‘coupled’ with the growth objective of the cell by forcing the synthetic BDO pathway to be the sole route for E. coli to maintain redox balance (shown by black arrows). Thus, the solution space is modified such that BDO production is linked to cellular growth (shown by the dashed line in the solution space). (B) In one study, [87] researchers took an alternative, metabolite centric approach to drug targeting which computationally removes consuming reactions of a particular metabolite. The approach was experimentally confirmed for V. vulnificus by a structural analogue of the endogenous metabolite, which also acts as a small-molecule inhibitor. (C) Metabolic reactions in the E. coli model were augmented to capture reactive oxygen species (ROS) generation, which allowed the use of flux-balance analysis to predict ROS production in one study [88]. In follow-up experiments, the authors show that it is possible to predict drug target strategies to enhance endogenous ROS production to increase the efficacy of other antibiotics. TCA - tricarboxylic acid cycle 21

1.7.1 Production of non-natural, commodity chemicals

There has been a push to use biotechnology to produce commodity chemicals. To this end, one study[86] designed an E. coli strain that produces 1,4- butanediol (BDO) at high yields. Two key hurdles were overcome using computational methods. First, BDO is not a naturally occurring compound in any organism. The authors used a pathway prediction algorithm [89] that determines the necessary biochemical transformations to convert an endogenous E. coli metabolite to BDO. A final pathway was chosen on the bases of thermodynamic feasibility [90], the theoretical yield of BDO (which was determined using FBA), the number of known enzymes for the biochemical transformations (which was determined using pathway databases [21]), and the topological distance of the pathway from central carbon metabolism. Second, when the pathway was introduced, the organism did not produce BDO at high rates; thus a ‘rational’ approach to producing a metabolic design was pursued using the E. coli CBM [63] and the OptKnock algorithm [91]. A four-knockout strategy that blocked the production of natural fermentation products was chosen to force the strain to balance redox and channel all carbon flux through BDO production (Fig. 1.5a). Further genetic manipula- tions were needed to create the final strain, which included modifying transcription factors, swapping E. coli metabolic enzymes with non-native enzymes, and optimizing codons. There are many hurdles to designing a production strain, but this study shows that CBMs can have a vital role in accelerating and completing the industrial strain design pipeline to produce non-natural metabolites.

1.8 Discovering Drug Targets

The capability of constraint-based modeling to predict the eﬀects of gene knockouts provides an important tool for drug targeting studies [92]. Three recent experimentally validated studies have discovered novel cancer drug targets and antibiotics. 22

1.8.1 Exploiting deﬁciencies in cancer metabolism

There has been renewed interest in studying metabolic alterations that occur in cancer cells [93]. In two studies [57, 58], researchers hypothesized that they could use CBMs to determine and exploit the metabolic auxotrophies of cancer cells. The first study [57] used a model building algorithm [94] that uses cues from transcriptomic data to prune metabolic reactions from Recon 1 [79] in order to build a ‘generic’ cancer model (Fig. 1.3b). They then used FBA predicted knockout phenotypes to determine ‘cytostatic’ drug targets that selectively block growth of the cancer model but that do not affect ATP generation and growth of the ‘healthy’ Recon 1 model . Interestingly, even though cancer cells exhibit heterogeneous genotypes and phenotypes, it was found that approved or experimental cancer drugs exist for 40% of the determined cytostatic drug targets. Analyses of growth phenotypes using CBMs focus on the capacity for growth under single knockout conditions. The surprising agreement of computational predictions and experimental results for a generic model suggests that the metabolic capabilities of cancer cells are starkly different from those of healthy human cells, which allows for drug combinations to be detected. In a follow-up study [58], the researchers experimentally investigated a fumarate hydratase deficiency that can cause hereditary leiomyomatosis and renal- cell cancer. At the time, no mechanism for reduced NADH regeneration was known for fumarate hydratese-deficient cells. The researchers immortalized and constructed two cell lines, one of which expressed fumarate hydratase and one that was deficient in fumarate hydratase. Starting from Recon 1, transcription data were used to build two cell line-specific models. By comparing predicted knockout growth phenotypes, they identified a selectively essential pathway for heme biosynthesis and degradation in fumarate hydratase-deficient cells, which represented a potential mechanism for NADH regeneration. Heme oxygenase (HMOX1) was experimentally inhibited in both cell lines and fumarate hydratase-deficient cells were selectively killed, showing that fumarate hydratese and HMOX1 are in fact synthetically lethal as predicted Interestingly, the heme pathway ranked only 39th in terms of overexpressed 23 pathways in fumarate hydratase-deficient cells, which meant that the predictions were only possible through the combination of expression data with the CBM. These results are a step toward identifying effective anticancer drugs using genome- scale metabolic knowledge. As the CBM predictions focus on differential metabolic capacities, there exists a potential for false-negative predictions, as additional layers of differences are not taken into account. In addition, it will be interesting to see whether these methods can be extended to other cancer types where germ-line mutations are unknown or absent. The identification of the heme pathway as synthetically lethal with fumarate hydratase represents a key success in using CBM predictions for prospective experimentation for studying human disease.

1.8.2 Essential metabolites guide antibiotic discovery

Gene knockout simulations in CBMs are accomplished by constraining the gene-associated reaction bounds to zero (Fig. 1.1). Moving past a gene-centric approach, an alternative approach for drug targeting is metabolite essentiality analysis [95] (Fig. 1.5b). To ‘remove’ a metabolite in a CBM, the bounds of the reactions that consume the metabolite are constrained to zero, and the steady-state constraint for that particular metabolite is relaxed to allow internal metabolite accumulation. One study [87] reconstructed a genome-scale metabolic network for Vibrio vulnificus, which is a Gram-negative pathogen. By applying metabolite essentiality analysis, the authors found 193 metabolites that are essential to cellular growth. They narrowed the list down to five essential metabolites that represented promising targets for drug development by removing metabolites that are found in humans to lower the potential toxic side-effect and by removing metabolites that have a single consuming reaction for a more robust effect on the pathogen. The identified metabolites typically affected a single gene, meaning that traditional reaction knockouts could have been used. However, using a metabolite-centric approach has its advantages. It allowed the authors to search for structural analogues of the essential metabolites to inhibit the enzymes that relied on them as substrates. They screened the inhibitory 24 capability of 352 compounds that were structurally similar to the predicted essential metabolites, and the most potent compound was chosen for further evaluation as an antibiotic. The compound was confirmed to bind to the target enzyme in folate biosynthesis, which validated the CBM and chemoinformatic predictions. Furthermore, they found that the compound was more effective than current antibacterials. By using a CBM to analyze metabolism, antibiotic discovery can be approached from multiple perspectives (for example, from the perspective of a gene, reaction, or metabolite).

1.8.3 Increasing antibiotic eﬃcacy through ROS production

Reactive oxygen species (ROS) can weaken and kill pathogens, and modu- lation of ROS production could therefore be used as part of an antimicrobial strategy. As a proof of concept, one study [88] predicted genetic engineering strategies in E. coli to increase internal ROS generation in order to increase antibiotic efficacy. The current E. coli CBM [63] does not account for the major sources of ROS production. Thus, 133 metabolic reactions with potential for ROS generation were augmented in the E. coli CBM (Fig. 1.5c). With the updated CBM, computed single knockout flux distributions included a quantitative readout of ROS generation. Thus, gene knockouts that increase the endogenous ROS production were predicted, many of which increased inefficiencies in production or usage of ATP. For validation, the researchers experimentally knocked out genes that were either predicted to increase endogenous ROS production, as well as genes that were predicted to have no effect as negative controls. There was high qualitative con- cordance of the predictions with experimental measurements of ROS production, which suggests that a CBM could be used to tune ROS production. These results are striking because little quantitative information was necessary in the coupling of flux with ROS production and because a statistical ensemble approach was used to account for unknown parameters. This study was able to predict genetic engineering strategies that were proven to increase ROS production and to potentiate oxidative attack from oxidants and antibacterials, which provides a novel approach 25 for antibiotic discovery.

1.9 Coupling with other cellular processes

Constraint-based modeling has been mainly applied to metabolism. How- ever, researchers have recently extended the scope of CBMs and combined them with diﬀerent modeling methods to address questions beyond metabolism. Two approaches that have emerged are extending CBMs of metabolism to include additional cellular processes and connecting diﬀerent modeling methods.

1.9.1 Modeling transcription, translation, and metabolism

CBMs have been reconstructed for cellular processes other than metabolism [10, 9, 11, 12]. However, until recently, CBMs of different cellular processes had not been integrated. One study [96] integrated a CBM of Thermotoga maritima metabolism [97] with a CBM for transcription and translation [96] (Fig. 1.6a). By adding information about the transcription and translation machinery, the CBM accounts for mRNA transcription, protein translation, necessary protein post-translational modifications, and use of the protein complex to catalyze metabolic reactions in T. maritima (Fig. 1.6a). To couple the necessary machinery for a particular metabolic reaction, the authors used coupling constraints [98] that mathematically link a metabolic reaction flux with its required molecular and enzymatic machinery in formulating the linear programming problem. The result is an integrated network reconstruction that contains the molecular biology and metabolism of T. maritima at the genome-scale and that allows the computation of the functional proteome that is needed to express a given phenotype. The in- corporation of new cellular processes in the constraint-based modeling framework is exciting but requires additional parameterization of enzyme efficiencies under different biological conditions. A key challenge for the improvement and the use of these new models is the development of parametrization techniques that are driven by proteomic and transcriptomic data. 26

Figure 1.6: Expanding predictive scope through integrative modeling. The predictive scope of constraint-based modeling has been extended beyond metabolism either by explicitly accounting for non-metabolic components in the constraint-based modeling approach or by coupling with other modeling frameworks. Metabolites are represented by circles. (A) The transcription and translation of the necessary mRNA, proteins, and cofactors have been explicitly represented in a constraint-based modeling framework alongside the metabolism of Thermotoga maritima [96]. This allows simultaneous computation of metabolic fluxes, mRNA transcript expression, and proteome levels (lower panel). (B) Metabolic models have also been coupled with other modeling frameworks. The probability of metabolic gene activation and repression by transcription factors (TFs) can be computed using a probabilistic transcriptional regulatory network based on high-throughput data sets (upper panel). The calculated probabilities are then relayed into the constraints of the metabolic reaction fluxes in the constraint-based model [99], which allows prediction of TF-knockout phenotypes (lower panel). (C) Structural systems biology can predict biophysical properties of proteins. One study [100] calculated the individual activity changes of each metabolic enzyme during temperature shift. The combined effect of all the metabolic enzymes on the cell was computed by integrating the individual enzyme changes into the flux constraints of the Escherichia coli constraint-based model (upper panel), allowing prediction of growth rate as a function of temperature (lower panel). Enz, enzyme; NTP, nucleoside 5’-triphosphate; P, probability. 27

The integrated model hopes to address some of the crucial challenges that have limited metabolic CBMs. First, the integrated model takes into account the variability of cellular composition at different growth rates, while metabolic CBMs only use one biomass function for growth rate optimization. Cellular composition is dependent on growth rate, and metabolic CBMs have previously accounted for variations in growth with different cellular compositions [101]. However, an integrated model explicitly represents nucleotide and protein demand as a function of growth rate, so that a traditional biomass function is no longer necessary. Second, by coupling transcript and protein synthesis with active metabolic reactions, the authors quantitatively predicted differential experimental transcriptome and proteome levels across varying conditions. They used upstream genomic sequences of the differentially expressed genes to determine putative consensus sequences for transcription factor binding. The newly derived sequences helped to identify a candidate metabolite transporter, which was subsequently verified experimentally [102]. Finally, by incorporating the required demands for the machinery for metabolism, the integrated CBM unifies the three-objective Pareto surface that was discussed above into a single objective [103]. Thus, as the content of these models increases, the ability of CBMs to explain and predict biological functions grow in scope.

1.9.2 Merging statistics with mechanistic networks

Statistical approaches are useful when there is limited knowledge of the underlying biological networks. Unlike metabolic networks, the functional states of transcriptional regulatory networks (TRNs) are harder to define mechanistically because the underlying biochemistry and biophysics are often unknown. One study modeled the cellular processes of metabolism and transcriptional regulation using two different modeling formulations, which included a CBM for metabolism and a probability metric that is based on omic data for the TRN [99] (Fig. 1.6b). For E. coli and Mycobacterium turberculosis, the authors amassed the available transcriptional profiling data sets and the existing transcriptional regulatory interaction networks. Rather than using a Boolean formulation for the TRN [104], 28 they calculated the probabilities of activation and repression on the basis of the collected expression data sets for each pair of transcription factor and target gene. Similar to how basic constraints are added (Fig. 1.2), the TRN was combined with metabolism by adjusting upper and lower bounds of individual metabolic reactions in the CBM on the basis of both the calculated probabilities of activation of associated target metabolic genes and the allowable flux states (which are determined by FVA) (Fig. 1.6b). The integrated E. coli metabolic regulatory model was more accurate in predicting transcription factor-knockout phenotypes than previous attempts that used integrated models [104]. The newly developed integrated M. tuberculosis network predicted drug targets and aided in the identification of novel regulatory roles of transcription factors. Although the TRN modifies the CBM of metabolism, the calculated flux distributions and the metabolites that are present do not feedback to parameterize the TRN. A further improvement of integrated modeling between transcriptional and metabolic networks will be to include feedback mechanisms from metabolism.

1.9.3 Structural Systems Biology

One study expanded the E. coli metabolic network by including the experimentally derived protein structures (where available) and the computationally predicted protein structures for the metabolic enzymes in the network [100] (Fig. 1.6c). Using structural bioinformatics [105] techniques, changes in enzyme activity were predicted and were used as constraints on the activity of individual metabolic reactions. The researchers focused on thermostability of E. coli’s enzymes to study growth rate as a function of temperature. With this approach, they were able to computationally predict growth rates at varying temperatures, which were consistent with experimental data. The growth-limiting enzymes were then determined on the basis of temperature-dependent flux constraints. Although other temperature dependent parameters, such as cellular composition, were not considered [106], the predicted growth-limiting enzymes significantly overlapped with mutated genes from a previous study [107] on adaptive evolution of E. coli to higher temperatures. For direct experimental validation, the growth-constraining 29 enzymes were bypassed by supplementing growth media with the metabolic product of the enzyme or of the pathway to which it belongs. Such supplementation was beneficial for E. coli that was grown at superoptimal temperatures, which supported the predictive capability of CBMs to account for enzyme thermosensitivity. These promising results raise the prospect of the substantial effects that structural modeling might have on improving CBM predictions in the future.

1.10 From pathways to dynamics

The successes of the fourteen studies discussed above demonstrate that constraint-based modeling is an approach that enables genome-scale science for metabolism. The remaining chapters of this dissertation utilize metabolic models (both constraint-based models and kinetic models) for a variety of biological applications starting from the level of pathways all the way up to cell-scale dynamics. In each chapter, a biological question is posed, novel computational methods or models are developed, and then used to analyze high-throughput data (Fig. 1.7). In Chapter 2, I focus on biological systems at the pathway level. An automated, un-biased approach to defining metabolic pathways is introduced and shown to be more consistent with large-scale biological datasets than traditional, human-derived pathways. In Chapters 3 and 4, I focus on biological systems at the network level to study macrophages under infection and inflammation. In Chap- ters 5 and 6, I focus on the dynamics of biological systems. A cell-scale metabolic reconstruction of the human erythrocyte is first presented. Then, high-throughput metabolite profiling data from individuals is used with the developed network to build high-throughput kinetic models for applications in personalized genomics and pharmacodynamics. 30

Figure 1.7: Outline of this thesis dissertation. Constraint-based and kinetic models are built and deployed to interpret various types of complex biological datasets. First, biological systems are studied at the pathway level to develop novel algorithms for data analysis. Second, biological systems are studied at the network level to study “immunometabolism” of macrophages during infection and inﬂam- mation. Finally, the dynamics of the human erythrocyte are studied for individuals, with potential applications for personalized genomics and pharmacodynamics. 31

1.11 Acknowledgements

We wish to thank Daniel Zielinski, Joshua Lerman, Nathan E. Lewis, and Harish Nagarajan for their insightful criticisms and comments on the manuscript. This work was supported by NIH grants GM068837 and GM057089 and the Novo Nordisk Foundation. ZAK is supported through the NSF Graduate Research Fel- lowship (DGE-1144086). Chapter 1 in part is a reprint of the material Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Gen. (2014) 15:107-120. The dissertation author was the primary author. Chapter 2

Minimal metabolic pathway structure is consistent with associated biomolecular interactions

2.1 Abstract

Pathways are a universal paradigm for functionally describing cellular processes. Even though advances in high-throughput data generation have transformed biology, the core of our biological understanding, and hence data interpretation, is still predicated on human-defined pathways. Here, we introduce an unbiased, pathway structure for genome-scale metabolic networks defined based on principles of parsimony that do not mimic canonical human-defined textbook pathways. Instead, these minimal pathways better describe multiple independent pathway-associated biomolecular interaction datasets suggesting a functional organization for metabolism based on parsimonious use of cellular components. We use the inherent predictive capability of these pathways to experimentally discover novel transcriptional regulatory interactions in Escherichia coli metabolism for three transcription factors, effectively doubling the known regulatory roles for

32 33

Nac and MntR. This study suggests an underlying and fundamental principle in the evolutionary selection of pathway structures; namely that pathways may be minimal, independent, and segregated.

2.2 Introduction

Historically, biochemical experimentation has defined pathways or functional groupings of biomolecular interactions. Such pathways are foundational to human curated databases, such as KEGG [81], BioCyc [108], and Gene Ontology [109], are the basis for education in biochemistry, and are broadly deployed for analyzing and conceptualizing complex biological datasets [21]. However, the order of discovery and perceived importance of cellular components has unavoidably introduced a man-made bias. Pathway organization is thus often defined in a universal (rather than organism-specific) manner, missing potential organism-specific physiology. It is unclear if the currently used pathway structures correctly account for observed interactions between the macromolecules needed to carry out their function. Systems biology has led to the elucidation and analysis of multiple cellular networks, representing metabolism [25, 110], transcriptional regulation [70], protein-protein interactions [111], and genetic interactions [68]. These networks provide the opportunity to build unbiased pathway structures using statistical or mechanistic algorithms. Statistical approaches have been employed to high- throughput data and interaction networks to reconstruct the Cellular Component ontology of Gene Ontology [112]. However, such approaches were not meant to reconstruct the Biological Processes ontology and build pathways [113]. Mechanistic approaches include utilizing convex analysis with metabolic networks to automatically define pathways. Genome-scale metabolic networks contain curated and systematized information about all known biochemical moieties (metabolites) and transformations (reactions) of a particular cell’s metabolism encoded on its genome and described in experimental literature [114]. The stoichiometric matrix (S) is a mathematical description of a genome-scale metabolic 34 network, which can be queried by many available modeling methods [3]. These models and the calculated reaction fluxes are typically studied under a steady- state assumption (Fig. 2.1a). Thus, all potential steady-state reaction fluxes of a metabolic network are contained in the associated null space of S [115]. The basis vectors of the null space have been previously shown to correspond to biochemical pathways providing a fundamental connection between mathematical and biological concepts [116]. This connection has generated many attempts to characterize the null space’s contents using convex analysis [37]. Though readily applicable to small networks, it has been recognized for some time that convex pathway definitions (e.g. extreme pathways [36] and elementary flux modes [43]) cannot be globally applied to genome-scale networks as the enumeration of all such pathway vectors is not computationally feasible [117]. More recently, approaches have been developed to define subsets of metabolic pathways [75, 118, 119], though these pathways do not describe the totality of phenotypic states. In this study, we present a mixed-integer linear optimization algorithm (MinSpan) that can for the first time define the shortest, functional pathways for metabolism at the genome-scale using metabolic networks thereby describing the totality of steady-state phenotypes. We find that (1) the minimal pathways are biologically supported by independent biomolecular interaction networks, (2) the minimal pathways have stronger biological support than traditional human- defined metabolic pathways, and (3) the minimal pathways guided experimental discovery of novel regulatory roles for E. coli transcription factors.

2.3 Results

2.3.1 Deﬁning a minimal network pathway structure for metabolism

In this study, we introduce a network-based pathway framework called MinSpan that calculates the set of shortest pathways (based on reaction number) that are linearly independent from each other (Fig. 2.1). The MinSpan 35 pathways are the sparsest linear basis of the null space of S that maintains the biological and thermodynamic constraints of the network. The MinSpan pathways have a couple notable properties. First, unlike convex analysis approaches [120], MinSpan pathways can be computed for genome-scale metabolic networks. Second, the sparsest basis (Figure 1B) maximally segregates the network into clusters of reactions, genes, and proteins that function together. This property allows for an unbiased functional segregation of cellular metabolism into biologically meaningful pathways. The mathematical derivation of MinSpan is provided in the Materials and Methods. Here, we begin with an illustrative example of MinSpan for a metabolic model that contains a simplified representation of glycolysis and the TCA cycle (Fig. 2.1c). In this example, S has dimensions (m x n) where m = 14 metabolites and n = 18 reactions. The linear basis for the null space (N) has dimensions (n x n-r) where r is the rank of S. This S has rank (r = 14), meaning that the null space is four dimensional (e.g. = 18-14). Thus, a set of four linearly independent pathways through the network represents a linear basis for the null space of S. There are numerous potential sets of linearly independent pathways for a metabolic network as the linear basis of the null space is not unique. MinSpan chooses a set representing the shortest independent pathways and we later show that this linear basis is more biologically relevant than other linear bases. Running MinSpan on the simplified model converts the linear basis matrix (N) to a MinSpan pathway matrix (P) that contains the four shortest, linearly independent reaction pathways (Fig. 2.1b). The resulting pathways are presented on the network map (Fig. 2.1d). MinSpan pathways 1 and 3 are similar to traditional metabolic pathways (e.g. pathways that look like glycolysis and TCA cycle in this simplified network), while the last two MinSpan pathways do not mimic traditional pathways. In the supplementary material, we contrast MinSpan with past convex analysis methods (e.g. Extreme Pathways and Elementary Flux Modes) and also present another illustrative but more complex example for E. coli core metabolism. 36

Figure 2.1: Overview of the MinSpan algorithm. (A) A metabolic network is mathematically represented as a stoichiometric matrix (S). Reactions fluxes (v) are determined assuming steady-state. All potential flux states lie in the null space (N). (B) The MinSpan algorithm determines the shortest, independent pathways of the metabolic network by decomposing the null space of the stoichiometric matrix to form the sparsest basis. (C) A simplified model for glycolysis and TCA cycle is presented with 14 metabolites, 18 reactions, and a 4 dimensional null space. Reversible reactions are shown. (D) The four pathways calculated by MinSpan for the simplified model are presented, two of which recapitulate glycolysis and the TCA cycle, while the other two represent other possible metabolic pathways. The flux directions of a pathway through reversible reactions are shown as irreversible reactions. 37

2.3.2 MinSpan pathways are supported by independent biological datasets

MinSpan pathways are a fundamental and unbiased attempt to deﬁne pathways for metabolism. We next determined whether MinSpan pathways have biological relevance. By deﬁnition, pathways represent a grouping of biochemical transformations that can concurrently function. The biomolecular machinery (e.g. genes and proteins) of metabolic pathways have been previously shown to preferentially share interactions compared to components outside of pathways. Thus, the genes within pathways preferentially contain positive genetic interactions [121] and are co-regulated [67]. Furthermore, the proteins within pathways preferentially contain protein-protein interactions (see Supplementary Material). Thus, we compared calculated MinSpan pathways of genome-scale metabolic networks to the independent genome-scale networks of protein-protein interactions (PPI) [122], genetic interactions [68], and transcriptional regulation (TRN) [70]. We computed MinSpan pathways for the genome-scale metabolic networks of Escherichia coli [110] and Saccharomyces cerevisiae [25]. They contain 750 and 332 pathways respectively, representing the dimensions of the two null spaces. Each reaction in a metabolic network has a “gene-protein-reaction” association, which is a set of Boolean rules describing the required genes, transcripts, and proteins required to catalyze a metabolic reaction (Fig. 2.2a). For each MinSpan pathway, we grouped the genes and proteins of all the metabolic reactions within a particular pathway providing 750 and 332 sets of genes and proteins for each organism (Fig. 2.2b). 38

Figure 2.2: Correlation analysis shows MinSpan pathways are biologically relevant. (A) “Gene-protein-reaction” (GPR) associations describe the necessary genes and proteins required for the catalysis of a metabolic reaction. Pyruvate dehydrogenase in E. coli is shown as an example. (B) We grouped genes and proteins in the GPRs for each MinSpan pathway to check consistency with data sets on pathway associated biomolecular interactions. (C) Correlation analysis of the gene and protein sets shows MinSpan pathways are biologically consistent with three different biomolecular interaction networks: (D) Protein-protein interactions in S. cerevisiae (yeast two-hybrid data), (E) Positive genetic interactions in S. cerevisiae (p < 0.05 and > 0.16 as defined by Costanzo et al.), and (F) Transcriptional regulation in E. coli. MinSpan pathways are more consistent with data-driven protein interaction, genetic interaction, and transcriptional regulatory networks than human-defined pathways (KEGG, BioCyc, and GO), a least sparse null space (MaxSpan), and randomly generated null spaces (RandSpan). Accuracy (y-axis) is determined by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Coverage (x-axis) is determined by the number of interactions the method made a prediction for. The dotted circle for RandSpan represents the mean plus one standard deviation of the 100 random null spaces. x,y axes values are in the Supplementary Material. 39

We compared the MinSpan pathway protein and gene sets against three independent and different biomolecular interaction networks (Fig. 2.2c). We hypothesized that a highly correlated co-occurrence or co-absence of two proteins across all the MinSpan protein sets of a particular organism was an indication that the proteins share a PPI and that a co-occurrence or co-absence of two genes implies that the genes positively interact or that they are co-regulated by the same transcription factor (TF). We compared MinSpan pathways to PPI and genetic interactions in S. cerevisiae and the TRN of E. coli as the datasets are most complete for those particular organisms. By testing for significant Spearman correlation coefficients of co-occurrence or co-absence of two proteins across the S. cerevisiae MinSpan pathway protein sets, we were able to determine a priori roughly 80% of known yeast two-hybrid PPIs in metabolism [122] with high accuracy and coverage (Fig. 2.2d). Similarly, positive genetic interactions [68] in metabolism were accurately predicted by significant correlations of genes across the S. cereivisae MinSpan pathway gene sets (Fig. 2.2e). We also used correlation analysis with E. coli pathway gene sets to predict co-regulation by the same TF (Fig. 2.2f). We predicted over 6700 co-regulated gene pairs within E. coli metabolism. Our analysis quantitatively revealed two levels of regulation. First, local regulation (by TFs regulating at most 30 genes, which accounts for 90% of E. coli TFs regulating metabolism) is pathway based with TFs acting directly on linearly independent, minimal pathways (Fig. 2.2f). Second, global regulation (TFs with more than 30 regulated metabolic genes) involves many simultaneous cellular functions that are not just metabolic, and does not necessarily mimic the metabolic scaffold. Hence, MinSpan pathways recapitulate local and intermediate regulatory mechanisms, but do not capture the less-specific roles of global regulators. 40

2.3.3 MinSpan pathways are more biologically supported than human-deﬁned pathways

MinSpan pathways are highly consistent with PPI, positive genetic interactions, and local transcriptional regulation implying their biological relevance. However, a key question arises: are there other pathway structures (human or network-defined) that are equally or better suited at representing pathway- associated biomolecular data types? To answer this question, we compared the biological relevance of MinSpan pathways to other network pathway structures derived from the null space of the stoichiometric matrix. We calculated the “MaxSpan” or a null space basis matrix with the least number of non-zero entries (e.g. the longest pathways) and generated “RandSpan” or randomly generated null space bases (n =100) that had random criteria for the sparsity of the matrix. We also compared MinSpan to commonly used human-defined pathway databases including: KEGG Modules [81], BioCyc (EcoCyc [123] for E. coli and YeastCyc [124] for S. cerevisiae), and the Biological Processes ontology of Gene Ontology [109] for both organisms. Surprisingly, repeating the correlation analysis for these alternative network and human-defined pathways, we found that MinSpan pathways were generally more consistent with recapitulating biomolecular interactions (Fig. 2.2d, e, & f). For PPIs, MinSpan was marginally more consistent than other methods. As most PPIs occur between proteins within the same metabolic complex or adjacent metabolic reactions, most metabolic pathway structures should conserve PPIs. However, for the more difficult predictions of positive genetic interactions and local/intermediate transcriptional regulation, MinSpan was by far the most accurate pathway structure examined. None of the pathway structures were highly consistent with global regulation. These finding have two important implications. First, MinSpan pathways have more underlying support from biomolecular data types than human-defined pathways suggesting an alternative and fundamental modular organization of cellular metabolism. Defining pathways by human intuition and interpretation is less representative of the biomolecular interactions. Second, a minimal pathway struc- 41 ture is more biologically relevant than other potential linear bases of the null space confirming the principle underlying its use.

2.3.4 Global Comparison of MinSpan and Human-Derived Pathways

How different are the MinSpan pathways from other sources of pathway definitions? We can delineate the coverage and similarity of MinSpan pathways against traditional pathway databases for E. coli and S. cerevisiae metabolism (Fig. 2.3) to answer this question. First, we determined the number of pathways in each database covering the metabolic genes in the E. coli and S. cerevisiae metabolic models (see Table 2.1). There are widely varying numbers of pathways between all the databases. KEGG is the smaller of the two metabolic pathway databases. Gene Ontology contains many other genetic classifications and is larger than KEGG and BioCyc. MinSpan was the largest pathway database. Second, we calculated pair-wise connection specificity indices (CSI) [125, 126] for pathways across all databases (MinSpan and human-defined) based on their gene products and hierarchically clustered them (Fig. 2.3a). The CSI provides both a metric of how similar two pathways are, and how specific their similarity is compared to the rest of the available pathways. For each pathway definition (MinSpan, KEGG, BioCyc, and GO), we determined how many of their pathways are captured in other pathway definitions by whether or not they shared a high CSI value (Fig. 2.3b). As the number of MinSpan pathways is much larger than the number of pathways for other databases for E. coli, MinSpan captures most of their information, while KEGG, EcoCyc, and GO capture much less of MinSpan. For S. cerevisiae, Gene Ontology has the highest coverage. It has two fewer pathways than MinSpan (Table 1), but fully captures MinSpan pathways. 42

Figure 2.3: Global comparison of MinSpan pathways with databases of human-defined pathways. (A) The pairwise connection specificity index (CSI) was calculated for all pathway definitions (from four sources: MinSpan, KEGG, BioCyc, and Gene Ontology) for E. coli and S. cerevisiae as a measure of pathway similarity. The CSI matrix was hierarchically clustered and the database that the pathway originates from is color-coded to the left and above the heatmap to illustrate the clustering. (B) The percentage of pathways that share a high CSI value (top 15% of interactions) between pathway databases is presented. For example, MinSpan pathways are similar to and capture roughly 88% of all pathways in KEGG for E. coli. Conversely, KEGG is much smaller and only captures 26% of MinSpan pathways. (C) A K-nearest neighbor search was done to see how pathways classify into other databases. KEGG and BioCyc pathways have the closest resemblance with Gene Ontology being the next similar. MinSpan is significantly different than human-defined pathways. (* - significant enrichment, † - significant depletion). (D) 533 MinSpan pathways are similar to 582 traditional pathways. There is not a one-to-one mapping as similar pathways may exist in multiple human-defined databases. For E. coli, there are 204 unique MinSpan pathways. In S. cerevisiae, there are none as Gene Ontology captures all the pathways from the metabolic model. 43 44 13 7.1 298 S. cerevisiae Minspan (filtered) and 7.9 6.1 296 Gene E. coli Ontology S. cerevisiae 7.1 2.2 121 YeastCyc 74 3.9 1.4 KEGG 8.9 737 13.6 (filtered) MinSpan 7.3 2.4 348 Gene Ontology E. coli 6.6 2.7 199 EcoCyc 91 5.4 1.6 KEGG : Pathway numbers and lengths for MinSpan and pathway databases in Number of Pathways Average Pathway Length (Number of Genes) Average Gene Usage Table 2.1 45

Third, we used a K-nearest neighbor search to assign the individual pathways of one pathway definition into the other three pathway definitions (Fig. 2.3c) to determine if certain pathway definitions are more similar than others. The number of pathways that were similar between KEGG and BioCyc pathways and BioCyc and GO was statistically significant (p < 0.05, binomial distribution, Bonferroni correction) for both organisms. The number of pathways that shared similarities with MinSpan was significantly depleted (p < 0.05, binomial distribution, Bonferroni correction), or dissimilar, from most pathway definitions in both organisms. This suggests that KEGG and BioCyc have the most similar pathways, followed with Gene Ontology, which has more similarities with BioCyc than KEGG. MinSpan pathways are significantly different from human-defined pathway databases and for E. coli contain many unique pathways (Fig. 2.3c). Fourth, we determined what caused the significant difference in MinSpan and human-defined databases by looking at the individual pathways that were similar or dissimilar between the pathway definitions (Fig. 2.3d). For both E. coli and S. cerevisiae, MinSpan captured traditional pathways in carbon metabolism (e.g. glycolysis, pentose phosphate pathway, TCA cycle), amino acid metabolism, and nucleotide metabolism. However, 26 of the 56 pathways missed by MinSpan for E. coli were related to fatty acid metabolism. A MinSpan pathway operates under the steady-state assumption, leading to full flux balance of the metabolic network (e.g. all metabolites and cofactors in the pathway must be produced, consumed, and/or recycled). Traditional fatty acid pathway representations do not include all necessary components and fatty acid pathways typically require the most precursors and cofactors. Conversely, 99 of the 204 MinSpan pathways missed by traditional pathways dealt with pathways that contained the necessary cofactors and precursors, mainly for fatty acid metabolism. The second representative difference was that a few traditional pathways (representing gluconeogenesis and deoxyribonucleotide biosynthesis) were broken up into smaller MinSpan pathways. Third, MinSpan pathways for E. coli contained 54 novel pathways related to ion transport, alternate carbon metabolism, and electron transfer. For S. cerevisiae, Gene Ontology has a larger coverage of metabolism than 46

MinSpan. This difference is due to two reasons: 1) the metabolic model of S. cerevisiae is not as comprehensive as the model for E. coli, and 2) the Gene Ontology for S. cerevisiae is relatively more comprehensive than the one for E. coli. There were 32 pathways missed by MinSpan due to differing representations than traditional pathway databases. Seven missed pathways dealt with fatty acid metabolism and their MinSpan counterparts took cofactors and precursors into account. Five traditional pathways for tyrosine biosynthesis and triglyceride biosynthesis were broken up into smaller pathways by the MinSpan algorithm. The specific pathways that are missed by the MinSpan algorithm are provided in the Supplementary Data.

2.3.5 Key Examples of MinSpan Diﬀerences

As discussed from a global perspective, there are three representative differences between MinSpan and traditional pathways (Fig. 2.4). First, MinSpan enumerates pathways not already described in databases. We found 54 metabolic pathways in E. coli that were not described in KEGG, Gene Ontology, or EcoCyc. For example, one such pathway involves the degradation of shikimate, an aromatic compound, to L-tryptophan (Fig. 2.4a). The pathway consists of eight metabolic reactions, six of which are co-regulated by TrpR in E. coli, lending support to the pathway’s biological relevance. Second, MinSpan pathways are mass-balanced and are functionally independent units that take into account systemic requirements. For example, the traditional human-defined pathway for purine biosynthesis starts from phosphoribosyl pyrophosphate (prpp) and L-glutamine (Fig. 2.4b). Purine biosynthesis consists of 11 metabolic reactions that lead to IMP, which is further modified to other purines. This traditional pathway exists in KEGG, Gene Ontology, and EcoCyc and is biologically relevant as all 11 reactions are co-regulated by PurR in E. coli. The third metabolic reaction for purine biosynthesis (phosphoribosyl- glycinamide formyltransferase) requires 10-formyltetrahydrofolate (10fthf). Thus, the MinSpan pathway for purine biosynthesis also includes tetrahydrofolate (THF) recycling which contains three reactions. The gene for the first reaction in THF 47 recycling is transcriptionally regulated by PurR, while the two other reactions? genes are not transcriptionally regulated. MinSpan elucidates a coupling of THF recycling to IMP biosynthesis that is independently verified by the co-regulation of the necessary genes. Third, the minimalist decomposition of MinSpan is especially useful for complex metabolic network topologies where pathway enumeration is manually difficult. For example, threonine and methionine metabolism in S. cerevisiae is a small but complex network consisting of 12 metabolic reactions that involve multiple amino acids (Fig. 2.4c). KEGG contains two pathways for this region: L-aspartate to L-threonine and L-aspartate to L-methionine (Fig. 2.4c). This ignores another potential path to L-methionine from L-threonine. YeastCyc and GO cover all reactions in the example by containing many more pathways, seven and five pathways respectively. Similar to KEGG, both YeastCyc and GO describe L-methionine synthesis through L-aspartate. On the other hand, MinSpan decomposes the network into L-threonine production through L-aspartate and L- methionine production through L-threonine (Figure 4C). These two functional units contain the shortest possible connection between the major metabolites. In the process, this decouples L-aspartate from L-methionine production. Genetic interactions are consistent with the parsimonious approach. From the correlation analysis (at a False Positive Rate of 20%), both MinSpan and human-defined pathways correctly identified four positive genetic interactions in the L-threonine synthesis pathway (Fig. 2.4c) suggesting a functional metabolic pathway. However, there were no positive genetic interactions in the traditional L-methionine synthesis from L-aspartate, suggesting no functional pathway and leading to five false positive predictions. In fact, YDR158W and YER091C interact negatively, further supporting that the two genes are not in the same pathway. Conversely, MinSpan separates L-aspartate from L-methionine and hence correctly predicts no genetic interactions. In addition, YCL064C negatively interacts with YER052C, YDR158W, YJR139C, and YCR053W lending support to L-threonine and L-methionine production being decoupled. 48

Figure 2.4: The three differences between MinSpan and human-defined pathways. (A) MinSpan automates the enumeration of biologically relevant pathways. (B) MinSpan includes all required components of a pathway to be independent. The additional pathway components not found in human-defined pathways, such as THF recycling, are often co-regulated and thus a part of a coherent pathway functioning as a “module” in a network. (C) MinSpan decomposes complex topology into the simplest representation. For example, there is a shorter route to L-methionine production through L-threonine than from L-aspartate. Note: for MinSpan pathways, only the representative genes of the pathway are shown. Abbreviations: 10fthf: 10-formyltetrahydrofolate; 2obut: 2-oxobutanoate; gar: glycinamide ribonucleotide; L-cysta: L-cystathionine; methf: 5,10-methenyltetrahydrofolate; mlthf: 5,10-methylenetetrahydrofolate; prpp: phosphoribosyl pyrophosphate; skm: shikimate; thf: tetrahydrofolate. 49

2.3.6 MinSpan pathways predict transcriptional regulation

MinSpan is an inherent property of metabolic networks, unlike human- defined pathways, and offers the direct ability to predict pathway associated biomolecular properties from flux distributions calculated by constraint-based modeling. From the above correlation analysis, we observed that genes within a MinSpan pathway are often co-regulated by the same TF. Thus, we tested whether TFs act directly on the MinSpan pathway structure during metabolic shifts to coordinate expression of the metabolic genes needed to implement a fully functional pathway. Constraint-based models can be used with Monte Carlo sampling methods to compute candidate reaction flux states through the metabolic network [127]. Comparing the significantly changed reaction fluxes between two sampled metabolic conditions has been previously shown to be consistent with experimental datasets [128, 5, 62] (Fig. 2.5a). However, these predicted differences are on an individual reaction basis, not for coordinated changes in flux states that might reflect the actions of the TRN. A reaction flux state can be decomposed into its constituent pathways. As the MinSpan pathway matrix (P) is a linear basis for the null space of S, any sampled flux distribution (v) can be decomposed into linear weights (α) of P (Wiback et al., 2003). Thus, metabolic reactions (v) determined from Monte Carlo sampling can be converted into changes in pathway flux loads (α) (Fig. 2.5a). As MinSpan pathways maintain the transcriptional regulatory hierarchy, the MinSpan pathways can then be associated to TFs based on enrichment of known regulatory gene targets [70] (p < 0.01, hypergeometric test). Thus, a significant change in the flux load of a MinSpan pathway (α) is a direct predictor of pathway-associated TF activity. 50

Figure 2.5: MinSpan pathways help predict transcription factor activity. (A) Constraint-based models can determine reaction activity, or flux states (v), using Monte Carlo sampling. Decomposing sampled flux states into linear weight- ings of MinSpan pathways (α) allows prediction of TF activity. For example, metabolic reaction fluxes are sampled under glucose minimal media and glucose minimal media + L-arginine supplementation. Typical analysis would yield a list of reactions (including vi) that are significantly changed. With MinSpan pathways, the flux distributions can be converted into significant changes in pathway activity (including αj). TFs are associated with pathways based on enrichment of regulated genes. Predicting TF activity is based on which TFs are associated with the significantly changed pathways, in this case αj is associated with ArgR. (B) The TF activity of 51 nutrient shifts was predicted and can be hierarchically clustered by nutrient shift type. TF activity for the heatmap is defined as the percentage of differential MinSpan pathways that are associated with that TF. 36% of the TF-environment associations predicted are not known, providing numerous predictions for experimentation. Experimentally tested TF-environment associations are highlighted. 51 52

Metabolic reaction fluxes were computed by Monte Carlo sampling [127] for minimal, aerobic glucose conditions, as well as 51 nutritional shifts due to changes in carbon, nitrogen, phosphorus, and sulfur sources, as well as supplementation of amino acids and nucleotide precursors and removal of oxygen. Sampled metabolic reaction fluxes (v) were decomposed into MinSpan pathway flux loads (α) to determine significantly changed pathways across nutritional conditions (Fig. 2.5a). TFs associated with significantly changed pathways (p < 0.05, Wilcoxon signed-rank test) were then used as predictors of transcriptional regulation. Predicted TF activities (Fig. 2.5b) substantially agreed with known regulatory changes detailed in EcoCyc [123] and primary literature (see Supplementary Material). As the TRN is not completely known, we focused only on true positive predictions. Transcriptional regulatory changes for 37 of the 51 nutrient shifts matched known associations (true positives) and eight other shifts partially matched known TF-environment associations (see Supplementary Material). Fo- cusing only on these two groups, 36% of the predicted TF activities are not known to be associated with the corresponding shift, providing numerous novel transcriptional regulation predictions. Hierarchically clustering nutrient shifts based on predicted TRN response stratifies key classes of shifts (Fig. 2.5b). Nucleotide precursor supplementation is characterized by PurR and GcvA activity. Alternate sulfur sources as well as L-cysteine and L-methionine supplementation clustered by CysB activity. Sugar carbon sources clustered by Cra. Organic acid carbon sources, including glycerol, were systemic, characterized by Fnr, Lrp, and Cra activity. Other systemic shifts included the response to the lack of oxygen and alternate nitrogen sources. Fi- nally, predicted transcriptional regulatory changes of well-studied shifts [129, 130] of amino acid supplementation (L-arginine, L-leucine, L-tryptophan), nucleotide supplementation (adenine), and oxygen depletion are described in greater detail enumerating specific MinSpan pathway changes (see Supplementary Material). 53

Figure 2.6: MinSpan TRN predictions suggest informative dual perturbation experiments that led to discovery of novel E. coli transcriptional regulation. (A) Nac plays a role in purine metabolism and a larger role in nitrogen metabolism than previously known. Nac also regulates Lrp through gcvB. (B) Cra regulates tnaCAB, which is also subject to Crp regulation. (C) MntR plays a regulatory role for four genes that are heavily regulated by ArcA/Fnr/Fur/PdhR and are utilized during anaerobic conditions. 54

2.3.7 MinSpan pathways aid in experimental discovery of novel regulation

MinSpan pathways not only accurately predict known TF activities but also offer an opportunity to discover novel regulation. We chose three novel TF- environment predictions to experimentally validate that are non-obvious, in the sense that little to no literature links the TF with the predicted associated environment. To be rigorous in the experimental design, we chose environmental shifts that have been well-studied; where discovering novel experimental findings would be more difficult. The three tested associations were: Nac with adenine supplementation (ade/Nac), Cra with L-tryptophan supplementation (L-trp/Cra), and MntR with shift to anaerobic conditions (OO/MntR). The chosen shifts represent three distinct magnitudes of dual perturbations. In the ade/Nac case, the environment and genetic perturbations are both relatively minor. In the L-trp/Cra case, Cra is a broad acting TF and dominates, while the environmental perturbation to the absence of oxygen dominates in the O2/MntR case. We generated RNA-seq data from dual perturbation experiments [104, 131] for the three cases consisting of perturbations in the environment (media supplementation) and genetics (TF knockout) of E. coli. For each case, we determined the gene set that is uniquely differentially expressed because of the genetic perturbation during environmental shift (see Materials and Methods) in order to analyze whether the TF plays a role in the environmental shift. Global analysis of the gene sets, based on enrichment of regulatory interactions with known TF associations [70], suggested that predictions for ade/Nac and O2/MntR were correct; and the L-trp/Cra prediction was indeterminate. In the ade/Nac case, the gene set was enriched with GcvA, Lrp, and PurR (p = 9.5e- 6, 1.6e-4, 1.8e-4, hypergeometric test) suggesting that Nac (Nitrogen assimilation control) regulates similar genes during the shift or even regulates the corresponding TFs. In the L-trp/Cra case, there were no enriched TFs suggesting no global consensus. This discrepancy might be due to: 1) Cra knockout causing a large genetic shift that might have changed how E. coli responds to L-trp, and 2) MinSpan 55

is inaccurate for predicting global regulation. In the O2/MntR case, TFs known to be associated with the anaerobic shift (including ArcA and Fnr) were enriched as a whole (p = 3.6e-3, hypergeometric test). Through differential expression and detection of high confidence binding sequence motifs, we identified novel regulatory roles for all three tested TFs (Fig. 2.6). For the ade/Nac case, we identified potential regulation of genes involved in purine metabolism, involved in nitrogen assimilation, and regulated by Lrp (Fig. 2.6a). The transcription units (TUs) gcvTHP and gcvB are known to be regulated by GcvA and PurR and are potentially regulated by Nac. Nac also seems to regulate gcvB, which is a small regulatory RNA of Lrp [132]. Using FIMO [133], we detected a significant Nac binding sequence motif for gcvB (- 173 bp of transcription start site (TSS), p = 1.73e-6). A significant increase in gcvB suggests a repression of Lrp and genes typically repressed by Lrp should have higher expression and genes activated by Lrp should have lower expression. This trend was observed in all significantly changed expression of Lrp regulated genes (Fig. 2.6a). We also identified novel regulation and high confidence binding sequence motifs for nitrogen assimilation genes: nirB (-87 bp of TSS, p = 7.63e-5) and nrdHIEF (-170 bp of TSS, p = 2.2e-5). Though there was no global trend for the L-trp/Cra case, we did find that Cra potentially regulates the L-trp symporter and tryptophanase (tnaCAB, Fig. 2.6b). Crp is a known regulator of this TU [134] but our data suggests that Cra also plays a role, possibly by in-direct regulation through Crp [135].

Finally in the O2/MntR case, MntR potentially regulates four TUs highly regulated during the anaerobic shift including the TF GadX (-120 bp of TSS, p = 2.6e-5) (Fig. 2.6c). GadX regulates pH-inducing genes and the GAD system that play roles during fermentation [136]. MntR also potentially regulates a subunit of pyruvate formate lyase (yﬁD), NADH:ubiquinone oxidoreductase II (ndh), and molybdenum biosynthesis (moaABC). It is important to note that potential regulatory sites were only detected during dual perturbation, suggesting that experiments to elucidate TRNs under one environmental condition underestimate non-intuitive regulatory events. The 56 additional potential binding sites nearly double the known potential binding sites for Nac and MntR [123]. Dual perturbation predictions were more accurate for local TFs (Nac/MntR) than global TFs (Cra), which is also consistent with the correlation analysis. The analysis presented to predict TF-environment associations is only possible with MinSpan pathways, as opposed to pathway databases, as MinSpan pathways directly link ﬂux simulations from constraint-based models to pathway biomolecular properties.

2.4 Discussion

High-throughput technologies have transformed biological data generation and experimentation. However, a major remaining challenge is analysis and interpretation of large datasets for achieving biological understanding [137, 138]. Though data analysis is steadily improving, the underlying interpretation is still often relies upon historically determined, human-defined pathways [21]. In this study, we introduce an unbiased genome-scale method to define pathways based on whole network function and a principle of parsimonious use of cellular components. We find that the MinSpan pathways are not only biologically relevant in their ability to recapitulate independent datasets on biomolecular interactions, but are surprisingly more accurate than traditional pathway databases such as KEGG, EcoCyc, YeastCyc, and Gene Ontology. The results have three implications. First, the results suggest that the traditional approach to defining metabolic pathways is not complete and an unbiased alternative might be more representative of the underlying pathway structure. Traditional pathway enumeration focuses predominantly on biochemical reactions. By incorporating a minimal criterion of the number of reactions used, the MinSpan approach indirectly introduces the requirements for biomolecular machinery usage into the pathway definition. There are two fundamental features of MinSpan pathways that differentiate them from traditional pathways. 1) MinSpan pathways account for all necessary components to make a pathway fully functionally independent (e.g. they are network-based). 2) MinSpan pathways represent the simplest pathway structure in a given network 57 context. The improved consistency of MinSpan with biomolecular interactions suggests that the coordinated regulation and usage of biological components in the cell have evolved to be minimal and independent in order to adapt to perturbation with as little cost to the cell as possible. Biologically meaningful pathways may be minimal, independent, and segregated. Future delineation of metabolic pathways, both network and human-defined, should take into consideration the cost of biomolecular machinery and systemic functional requirements for metabolic function. Second, the MinSpan pathways provide an alternative, complementary, and potentially more powerful approach for investigators to analyze their generated data. Current pathway databases are tremendously important in conceptualizing biological function and are used by numerous investigators for data analysis. As MinSpan pathways are more biologically relevant in terms of the underlying biomolecular interactions, the theory presented here for an unbiased pathway structure opens up the potential for a whole new suite of pathways to be used with tools such as Gene Set Enrichment Analysis (GSEA) [139]. Third, MinSpan pathways can guide the difficult process of reconstructing and determining TRNs. The best characterized TRN is that in E. coli. Though the E. coli TRN is not complete, our approach of coupling metabolic models with MinSpan pathways identifies novel associations between TFs and environmental shifts, providing a rational method to design context-specific dual perturbation experiments. We have experimentally validated three of the 84 novel predictions allowing us to double the known potential regulatory sites for Nac and MntR. Our analysis shows that experiments under varying environmental conditions are required to elucidate novel regulatory roles. The remaining 81 MinSpan predictions provide a roadmap for future experimentation to help discover numerous new regulatory roles in E. coli and the overall method can applied to any of the over 100 other organisms with a genome-scale metabolic network. There are also some limitations and areas for further research with regards to the MinSpan algorithm. First, the MinSpan algorithm is dependent on the quality of the genome-scale metabolic model utilized, in the same way the quality 58 of pathway databases for particular organisms is dependent on the biochemical knowledge available. The MinSpan pathways are more comprehensive in E. coli than S. cerevisiae as iJO1366 is much more complete than iMM904. This difference is a reflection of the biochemistry of E. coli being better studied than S. cerevisiae. Second, just as there are multiple pathway databases for the same organism, there are sometimes multiple metabolic network reconstructions for the same organism. Further research is needed to assess the differences in the calculated MinSpan pathways for different metabolic models of the same organism. Third, human-defined pathways are often defined in a universal, rather than organism-specific, manner. This can be a strength, particularly for educational purposes as it provides a common “language” to describe the function of many organisms. However, universal pathways can also be a weakness. Human-defined pathways focus on the topology of gene products, while ignoring the organism-specific functional context of metabolic pathways. For example, isotopomer metabolic flux profiling has shown that metabolic functionality can often be quite different than the gene products present [140]. To assess the conservation of MinSpan pathways, a preliminary analysis comparing E. coli and S. cerevisiae MinSpan pathways is provided in the Supplementary Material. However, further research with several reconstructions of different organisms is needed to fully assess whether MinSpan pathways are conserved across species.

2.5 Materials and Methods

2.5.1 MinSpan Formulation

The MinSpan algorithm determines the shortest, linearly-independent pathways for a stoichiometric matrix (S) with dimensions m x n and a rank of r. The input is a metabolic model (variables S, lb, ub) and outputs a MinSpan pathway matrix (P). P is the sparsest null space of S that maintains biological and thermodynamic constraints (lb and ub). Coleman and Pothen defined the mathematical problem for the sparsest linear basis of the null space as the “sparse null space basis problem” and then proved that a greedy algorithm 59 must find the globally optimal sparsest null space (least number of non-zero entries) [141]. More recently Gottlieb and Neylon showed that a similar problem, “the matrix sparsification problem”, is equivalent to the “sparse null space basis problem” [142]. We formulated “the matrix sparsification problem” as a Mixed Integer Linear Programming (MILP) problem. The MILP is boxed in the pseudo-code below. Simply put, the orthonormal null space (N) of S is initially defined by singular value decomposition. Then, the vectors of the orthonormal null space are iteratively replaced by the shortest pathways that span the removed vector’s subspace. This process is continuously repeated until the number of non-zero entries in P has converged to a minimum. Before running the algorithm, all reactions that cannot carry a flux are removed, all exchanges are opened, and the biomass function is removed. The algorithm is summarized below:

N = null(S) P = N while(true) P0 = P for j = 1:n-r ( 0 Pa,b if b 6= j Pa,b = 0 if b = j x = N · null(ΘT) where N · Θ = P P min bi where b ∈ {0, 1} S · v = 0

lbi ≤ vi ≤ ubi

− 1000bi ≤ vi ≤ 1000bi xT · v 6= 0 ( 0 Pa,b if b 6= j Pa,b = v if b = j end P = P0 if nnz(P) == nnz(P0), break, end end 60

The null() operator deﬁnes the orthonormal null space using singular value decomposition. The nnz() operator determines the number of non-zero entries in a matrix. Vectors are in bold and matrices are capitalized. P’ is similar to P, but without the vector pj, and x is a vector that spans the space of pj and is linearly independent from P’. This is proved by contradiction. If P’ and x are linearly dependent then multiples of x and the vectors of P’ should linearly combine to zero:

P0Λ + xλ = 0 where Λ, λ 6= 0 P0Λ = −N · null(ΘT)λ N-1P0Λ = −null(ΘT)λ (2.1) ΘΛ = −null(ΘT)λ ΘTΘΛ = −ΘTnull(ΘT)λ ΘTΘ = 0

ΘTΘ is a positive semideﬁnite matrix by deﬁnition and cannot equal zero. Thus P and x are linearly independent. The xT·v 6= 0 constraint ensures that the calculated pathway spans the proper dimension of the null space. For a MILP problem, the constraint is formulated as below, where is an arbitrarily small value (set to 0.1, various other choices yield similar results) and f+ and f- are binary variables required to formulate a “not-equal” constraint:

f + + f − = 1 where f ∈ {0, 1} xT · v − 1000(1 − f +) ≥ f + (2.2) −xT · v + 1000(1 − f −) ≥ f −

The termination criteria for the branch and bound method for each MILP iteration is a relative gap tolerance of 1e-3 or time limit of 30 minutes. These criteria were developed based on convergent properties of solutions as the two parameters were varied. As the MinSpan algorithm is a MILP problem, there can 61 exist alternative optimal solutions. The total number of non-zero entries in the matrix is unique, but the pathways may not be. Rerunning the correlation analysis to determine biological relevance with alternate MinSpan pathways for both E. coli and S. cerevisiae yielded little changes to the results (see Supplementary Material). MinSpan is available for COBRApy at (https://github.com/sbrg/minspan). MaxSpan and RandSpan were generated similarly with slight modiﬁcations. For MaxSpan, the optimization was switched to a maximization problem to make the densest null space. For RandSpan, we assigned a random value from -0.5 to 0.5 to the optimization c vector to randomly minimize and maximize the use of reactions while constructing each null space. For maximization, the following constraints are added to link the binary and continuous variables:

vi − bi ≥ 1000(gi − 1) − (1000 − ) (2.3) vi + bi ≥ −1000gi − (1000 − )

Where is an arbitrarily small value (set to 0.1, various other choices yield similar results) and gi are dummy binary variables to allow an “OR” statement in linear programming. If gi is either 1 or 0, one of the two above constraints is oﬀ.

2.5.2 Correlation Analysis

The metabolic reactions in MinSpan pathways were converted to gene and protein sets based on the gene-protein-reaction associations. Pairwise Spearman rank correlations of the co-occurrence or co-absence of genes and proteins across the pathways were calculated. Correlations that were not significant (p ≥ 0.05, permutation test) were filtered from further analysis. The total number of correlations remaining after filtering is the Coverage criteria in the x-axes of Fig. 2.2. The correlation coefficient was the varied discrimination threshold for generating the receiver operating characteristics (ROC) curve. KEGG modules [81] and GO Biological Processes ontology [109] were downloaded from their respective websites on 01/27/2013 for comparison. EcoCyc [123] and YeastCyc [124] pathways were downloaded on 02/12/2013. Only distinct pathways with two or more genes from 62 the metabolic networks were considered. The same correlation analysis was used for the human curated pathways, MaxSpan, and RandSpan.

2.5.3 Comparing Pathway Databases

We calculated the connection specificity index (CSI) [125, 126] between pathways based on their gene products across all pathway definitions (MinSpan, KEGG, BioCyc, and Gene Ontology). The CSI is a metric that determines the similarity between two vectors by ranking the Pearson correlation coefficient of the two vectors based on the correlations of all other vectors versus the two vectors in question. A CSI between pathways A and B is defined as:

# pathways correlated with A and B that PCC < PCCAB - t CSIAB = (2.4) ny

where PCC is the correlation coefficient of gene products, PCCAB is the correlation between A and B, t is an empirically derived threshold, and ny is the total number of pathways. Further explanation of CSI and software tools for its use are available [126]. The threshold for CSI was set based on the distribution of correlations (E. coli - 0.0350, S. cerevisiae - 0.0562, see Supplementary Figures S7). Pathways were considered similar if their CSI ranked in the top 15% of CSI values. We also employed a k-nearest neighbors search to find the most similar pathways across the pathway databases. The Pearson correlation was used as the distance metric. The closest hit that also had a high CSI value (top 15%) was used as the nearest neighbor. If the pathway did not have a high CSI value, then the pathway was deemed unique compared to the other databases. MinSpan pathways contain many gene products related to the mass- balancing of the network, such as transporters, that are active in nearly every pathway. In order for a meaningful comparison between MinSpan and human- defined databases, the genes in each MinSpan pathway were filtered to only the representative genes of that pathway. To do so, we used a conservative filter to 63 remove genes that were in nearly every pathway (p > 0.85, hypergeometric, empirically derived).

2.5.4 Determining Reaction Fluxes and Transcription Fac- tor Activities

Monte Carlo sampling [127] was used to determine 10000 reaction flux distributions for the E. coli metabolic network for each of the 52 nutrient conditions. For glucose minimal media conditions, exchange constraints were taken from [104]. For anaerobic conditions, rate of oxygen input was set to zero. For amino acid and nucleotide simulations, rate of amino acid or nucleotide input was set to the minimum rate that would allow the biomass constituent, at the wild type growth rate, to be generated solely from exogenous substrate. Six nutrient shifts were not considered (L-alanine, L-asparagine, L-aspartate, L-glutamate, L-glutamine, and uracil), as they are involved with type 3 loops (physiologically infeasible pathways that are artifacts of pathway enumeration algorithms [115]). For alternate carbon, nitrogen, phosphorus, and sulfur sources, the original input source from glucose minimal media conditions was set to zero and an equal rate of atom flow was set as the input. All nutrient conditions had the basal biomass flux rate set to 90% of the optimal value. To compare different conditions, sampled fluxes within a particular condition were normalized by median growth rate for comparison. The specific lower bounds for exchanges used with the E. coli model are detailed in the Supplementary Data. All upper bounds are set to 1000. Assignment of TFs to MinSpan pathways was by hypergeometric enrichment (p < 0.01) of the TF regulated genes as determined by RegulonDB [70]. Determination of whether or not a TF plays a role in an environmental shift was determined by hypergeometric enrichment (p < 0.05) of the number of significantly changed TF-associated pathways. TFs regulating only one metabolic reaction or appearing in one pathway were removed due to a lack of statistical power. 64

2.5.5 Growth conditions, RNA isolation, and RNA-seq

KEIO collection knock out strains were used [64]. Strains were grown to mid exponential phase under conditions specified in Table S1. Spinner flasks were used for aerobic culture and serum bottles for anaerobic cultures. One volume of mid exponential sample was mixed with two volumes of RNA-Protect (Qiagen). Cell pellets were lysed for 30 min at 37C using Readylyse Lysozyme (Epicentre), SuperaseIn (Ambion), Proteinase K (Invitrogen) and SDS. Following cell lysis, total RNA was isolated using RNeasy columns (Qiagen) following vendor procedures with on column DNAse treatment for 30 min at room temperature. Paired-end, strand-specific RNAseq was performed using the dUTP method [143] with the following modifications. rRNA was removed with Epicentre’s Ribo-Zero rRNA Removal Kit. Subtracted RNA was fragmented for 3 min using Ambion’s RNA Fragmentation Reagents. cDNA was generated using Invitrogen’s SuperScript III First-Strand Synthesis protocol with random hexamer priming.

2.5.6 Transcript Quantiﬁcation from RNA-seq reads

RNA-seq reads were aligned to the genome sequence of E. coli (RefSeq: NC 000913) using Bowtie [144] with 2 mismatches allowed per read alignment. To estimate transcript abundances, FPKM values were calculated using Cufflinks [145] with appropriate parameters set for the strand-specific library type and upper- quartile normalization. EcoCyc annotations were used for transcript quantification. Differential expression analysis was done using cuffdiff with upper-quartile normalization and appropriate parameters set for strand-specific library type. A fold change of greater than 2 fold and a false discovery rate cutoff of 0.05 was used to determine significant differential expression.

2.5.7 Analysis workﬂow for dual perturbations

Dual perturbation experiments consisted of four RNA-seq experiments: 1) wild type (WT) on glucose minimal media, 2) WT + nutrient shift, 3) TF knockout (KO) on glucose minimal media, and 4) TF KO + nutrient shift. We de- 65

fined differentially expressed gene sets between the four conditions as: E1 (WT vs WT+nutrient), E2 (KO vs KO+nutrient), G1 (WT vs KO), and G2 (WT+nutrient vs KO+nutrient). TF regulation of genes during nutrient shift was defined by the union of the sets: {{E2∆E1}\G1} and G2\G1. The gene sets for the three experiments are provided in the Supplementary Material. To globally determine prediction accuracy, we used RegulonDB to determine whether a gene set was enriched (p < 0.05, hypergeometric test, Bonferroni correction) in genes of a particular transcription factor. A prediction was deemed correct if the enriched transcription factors had known associations with the environmental shift.

2.6 Acknowledgements

We would like to thank Gabriela Guzman for aiding HL with experiments and Dr. Niko Sonnenschein, Dr. Neema Jamshidi, and Daniel Zielinski for valuable discussions. This work was supported by NIH grant GM068837 and the Novo Nordisk Foundation. HL is supported through the National Science Foundation Graduate Research Fellowship (DGE1144086). RNA-seq data is available at Gene Expression Omnibus (GSE48324). Chapter 2 in part is from a manuscript that is being prepared for publication Bordbar A, Nagarajan H, Lewis NE, Latif H, Ebrahim A, Federowicz S, Schellenberger J, Palsson BO. Minimal metabolic pathway structure is consistent with associated biomolecular interactions. The dissertation author was the primary author. Chapter 3

Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions

3.1 Abstract

Metabolic coupling of Mycobacterium tuberculosis to its host is foundational to its pathogenesis. Computational genome-scale metabolic models have shown utility in integrating -omic as well as physiologic data for systemic, mechanistic analysis of metabolism. To date, integrative analysis of host-pathogen interactions using in silico mass-balanced, genome-scale models has not been performed. We, therefore, constructed a cell-speciﬁc alveolar macrophage model, iAB-AMØ-1410, from the global human metabolic reconstruction, Recon 1. The model successfully predicted experimentally veriﬁed ATP and nitric oxide production rates in macrophages. This model was then integrated with an M. tuberculosis H37Rv model, iNJ661, to build an integrated host-pathogen genome-scale reconstruction, iAB-AMØ-1410-Mt-661. The integrated host-pathogen network enables simulation of the metabolic changes during infection. The resulting reaction activity and

66 67 gene essentiality targets of the integrated model represent an altered infectious state. High-throughput data from infected macrophages were mapped onto the host-pathogen network and were able to describe three distinct pathological states. Integrated host-pathogen reconstructions thus form a foundation upon which understanding the biology and pathophysiology of infections can be developed.

3.2 Introduction

Mycobacterium tuberculosis (M. tb) is a highly persistent pathogen that primarily affects the third world. About one-third of the world’s population is infected, with 9.27 million new cases and 1.76 million deaths in 2007 [146]. Further- more, the development of multidrug-resistant tuberculosis and extensively drug- resistant tuberculosis, which are infections that cannot be treated with first-line and second-line drugs, respectively, continue to keep M. tb a concern in developed countries. A key aspect of M. tb’s pathogenicity is its ability to dramatically change its metabolism in different states. After infecting an alveolar macrophage, M. tb shifts into an infectious state, stopping biomass accumulation. M. tb also accumulates mycolic and fatty acids on its cell wall in order to survive a very hostile phagosome environment that is nutrient poor, hypoxic, nitrosative, and oxidative [147] The mycolic and fatty acids are essential to M. tb’s resistance to drug therapies [148]. Under a latent state, the pathogen cannot be spread and the patient is not in danger. However, latent tuberculosis can activate in the lungs (75% of cases) or other parts of the body. The lack of proper experimental M. tb models that provide a mechanistic understanding of in vivo conditions hinders a better understanding of the infection process [149]. It is not only difficult to work with M. tb because of its slow growth rate, but most in vitro models are inaccurate simulations of in vivo conditions. In vivo animal models better characterize the disease, but there is less control over experimental conditions. Genome-scale metabolic reconstructions are useful for increasing the under- 68 standing of the genotype/phenotype relationship in organisms [150]. These networks are built in a bottom-up manner in which the nodes (metabolites) are connected by links (biochemical transformations and reactions) as defined by genetic and physiological data [151, 152]. There are now genome-scale reconstructions for numerous prokaryotes [153, 63], including two such networks for M. tb [154, 155] and eukaryotes [156, 157]. In addition, the global human metabolic network, Recon 1, has been reconstructed [79] containing all the annotated biochemical reactions for human cells. Recon 1 has sparked interest in building specific networks for different human tissues and cells. As the number of genome-scale reconstructions increases, there has been interest in building networks that characterize the metabolic interaction between multiple organisms. In this study, we looked to increase the understanding of the metabolic changes in both the host (alveolar macrophage) and the pathogen (M. tb) during infection through the use of a host-pathogen genome-scale reconstruction. We did this by first building a manually curated cell-specific human network for the alveolar macrophage. We then integrated the host model (HM) with the pathogen model (PM) (iNJ661) to form the host-pathogen model (HPM). We used established constraint-based analysis methods [158] and published gene expression data for M. tb infections to further our understanding of macrophage and M. tb metabolic functions.

3.3 Results

Interrogation of the interactions between an HM and PM required completing four steps: (1) construction of the alveolar macrophage HM, (2) adaptation of the M. tb PM, (3) integration of the two into an HPM, and (4) analysis of the resulting HPM in diﬀerent contexts. These analyses led to three general observations,

1. Integration of the PM into the HM both reduced the size and altered the solution space of both models. The changes in the PM were primarily because of the interfacial constraints set on the network. 69

2. Analysis of the PM portion of the HPM improved gene essentiality predictions.

3. The HPM enabled analysis of complex data sets from complex experimental conditions and was able to highlight diﬀerences in metabolism through the comparison of the three infectious states.

3.3.1 Constructing the HM: human alveolar macrophage, iAB-AMØ-1410

Reconstruction of the global alveolar macrophage network was a time intensive two-step process beginning with algorithmic tailoring of Recon 1 followed by a set of manual curation steps (Fig. 3.1). Using gene expression data from inactive macrophages, maximum enzyme fluxes, and exchange data for the macrophage (see Supplementary information), two preliminary models were built using two different previously published algorithms [28, 27]. These algorithms produced context- specific, not global, models that were the best fit of reaction flux and pathway length to the gene expression and exchange data. All other inactive and dead- end reactions were removed by these algorithms. Unfortunately, macrophages are known to have varied gene expression states, such as during infection and inflammation; therefore, the algorithms do not take into account other potential states. In fact, both algorithms failed to include nitric oxide synthase, which is critical for nitric oxide production. This is due to the initial gene expression data used for inactive alveolar macrophages. Hence, a global, consensus model was constructed through manual curation and reconciliation of the two models. Manual curation was conducted using primary literature, enzyme, and immunohistological databases, and network structure and requirements. The global model included all inactive and dead-end reactions to provide opportunities for further research with different exchange constraints and to allow for future gap filling, respectively. The resulting HM was named iAB-AMØ-1410 for i (in silico), AB (the primary author’s initials), AMØ(alveolar macrophage), 1410 (number of open reading frames). 70

Figure 3.1: Workflow of building the cell-specific model iAB-AMØ-1410. Gene expression data for alveolar macrophages and macrophage specific exchanges were fed into two model building algorithms (GIMME and Shlomi-NBT-08) to build two preliminary context-specific alveolar macrophage networks. Using enzyme databases (BRENDA and HPRD), immunohistological staining databases (Human Protein Atlas), transporter databases (HMTD), primary literature (see Supplementary information), and network features, the preliminary models were reconciled and manually curated into the final iAB-AMØ-1410. 71

3.3.2 Characterizing the HM, iAB-AMØ-1410

The iAB-AMØ-1410 model has many of the capabilities of Recon 1 because of its high number of reactions (intracellular: 3012) and active genes (1410) (Fig. 3.2a). In addition, the HM maintains a high reaction count from all major subsystems (Fig. 3.2b). Macrophages are known to be highly metabolically active as well as have many different metabolic states because of varied patterns of gene expression [159]. A biomass maintenance function was formulated (see Materi- als and Methods) and used to constrain the HM. The metabolic components of the biomass maintenance function are detailed in the Supplementary information. As macrophages do not readily multiply, the biomass maintenance function used here represents cellular maintenance requirements, such as lipid, protein, mRNA turnover, DNA repair, and ATP maintenance. A series of tests were used to validate the macrophage model. We first tested ATP production of the macrophage model. We optimized the HM for ATP production yielding a flux of 0.6843 mmol/h/g cell DW (Fig. 3.4c). In vitro experiments with macrophages yielded an ATP generation rate of 0.7121 mmol/h/g cell DW [160]. This puts the HM at a production accuracy of 96.1%. Second, we simulated nitric oxide production, which is another important metabolic capacity of macrophages. Maximum nitric oxide production in silico was 0.0359 mmol/h/g cell DW, which is 1.7% lower than the maximum in vitro rate of 0.0365 mmol/h/g cell DW [161]. The macrophage is also highly anaerobic in its growth despite its uptake of oxygen [160]. Macrophages have high production rates of lactate [162] because of high glucose oxidation very similar to the Warburg effect [163]. Consistent with this phenotype, the HM predicts a high glucose oxidation rate and produces 0.3093 mmol/h/g cell DW of lactate. 72

Figure 3.2: Comparison of reactions and network capabilities of iAB- AMØ-1410 and the global human network (Recon 1). (A) The overall gene and reaction counts of Recon 1 and iAB- AMØ-1410 are provided. The majority of genes and reactions are preserved. (B) Most reactions were included in the ﬁnal macrophage network. The largest reductions in number of reactions by subsystem were in amino-acid, lipid, and polysaccharide metabolism. Carbohydrate and central metabolism were well preserved as expected. (C) In stark contrast to the number of reactions included, the network capabilities of iAB-AMØ-1410 are much reduced as compared with Recon 1. Large reductions of metabolic functions occurred in polysaccharide, cofactor, amino-acid, and nucleotide metabolism. This is due in part to removing key reactions from Recon 1, but more importantly to the exchange constraints of the model. 73 74

Recon 1 was characterized and validated by showing its potential in accom- plishing 288 metabolic functions with the proper exchange constraints. A similar assessment of the HM was performed using the same 288 metabolic functions of Recon 1 (see Supplementary information). Unlike the reaction count (Fig. 3.2b), the HM does not preserve the comprehensive metabolic functions of Recon 1 (Fig. 3.2c), as would be expected. Important metabolic functions dealing with carbohydrate and central metabolism are well preserved. However, several peripheral metabolic functions such as amino-acid, lipid, cofactor, and nucleotide metabolism were not preserved. There are two reasons for this. First, removing a few reactions from Recon 1 to build the HM does not significantly affect the reaction count, but it can have significant effects on functional pathways. Second, the 288 metabolic functions were originally characterized in Recon 1 using open exchange constraints. In this study, we ran all simulations using the macrophage exchange constraints, which can have substantive effects on output. For example, most metabolic functions in the miscellaneous category dealt with growth under different substrates. As there was no literature support for the macrophage to import some of these metabolites, the simulations were deemed as failures.

3.3.3 Infection in silico, the HPM: iAB-AMØ-1410-Mt-661

The alveolar macrophage network, iAB-AMØ-1410, was integrated with the M. tb network, iNJ661. The integration involved three steps: mathematical integration, setting interfacial constraints, and revision of the objective function (Fig. 3.3). Integration had a few preparatory steps. We renamed reactions and metabolites for proper compartmentalization, added transporter reactions that linked the alveolar macrophage cytosol with a new phagosome compartment, and modified iNJ661 to uptake metabolites from the phagosome rather than the extracellular compartment. M. tb blocks certain signaling pathways to hinder maturation of the phagosome into the phagolysosome [164]. Without the formation of the phagolysosome, M. tb is able to survive within the phagosome. After these preparatory steps, the two stoichiometric matrices were mathematically combined. In doing so, we created a modularized network with M. tb residing in the macrophage phagosome, 75 using resources from its host (Fig. 3.4a). The final model was named iAB-AMØ- 1410-Mt-661, which added the first letter of the genus and species name (Mt) of the pathogen and 661 for the number of open reading frames. 76

Figure 3.3: Workﬂow of building the host-pathogen model, iAB-AMØ- 1410-Mt-661. Integration involved combining the two stoichiometric matrices, re- compartmentalizing the metabolites and reactions, and creating a relevant phagosome environment through transport and sink reactions. In order to characterize the model under in vivo conditions, the biomass objective function was revised. Using gene expression data from infectious states in vitro and in vivo, the objective function was iteratively modiﬁed to match gene expression tests completed by sampling the solution space. The new objective function better represents the metabolic activity of the pathogen under in vivo conditions. 77

Upon integration, it was readily apparent that a proper infectious state could not be simulated unless interfacial constraints were set between the HM and PM, as the PM would take up any metabolite from the HM. Thus, we set interfacial exchange constraints based on literature. In the phagosome, M. tb is under hypoxic conditions [165] and oxidative stress from metabolites such as nitric oxide [147]. In addition, glucose is depleted and the pathogen is dependent on glycerol and fatty acids [166, 147] as a carbon source. Interestingly, the PM would not grow in silico unless there was a minimal oxygen uptake, consistent with functional hypoxia as opposed to anoxia. M. tb biomass production was compared with M. tb oxygen uptake showing a linear dependence on oxygen until an uptake of 0.129 mmol/h/g cell DW was reached, in which additional oxygen did not increase the biomass value. The uptake represented 13.2% of the in vitro uptake rate [155], which was used for simulations. The oxygen M. tb biomass curve is consistent with existing knowledge that reduced oxygen is associated with slowed M. tb growth and disease progression [167]. In addition, in vitro studies lowering partial pressure of oxygen below 1% of normal conditions have shown to fully halt M. tb replication [168]. A detailed listing of the exchange constraints of in vitro (iNJ661) and in vivo (iAB-AMØ-1410-Mt-661) networks for M. tb is provided in the Supplementary information. The third integration step involved revision of the iNJ661 objective function (biomass) to better represent an acute infection. The original biomass function for the PM was formulated to represent rapid growth in mid-log phase. In mid-log phase, the pathogen tries to accumulate biomass by dividing rapidly. However, under in vivo conditions, the pathogen survives in the phagosome’s harsh environment, resulting in an altered metabolic state. In order to build a new objective function, we used M. tb gene expression data derived from in vivo mouse model studies [169, 170] as well as infection simulated in vitro studies [171, 172, 173]. Though the in vivo expression profiling studies came from mouse and in vitro simulated models, the studies represent the best available transcriptional profiling of M. tb during an infectious state. There have been some noted differences, especially in nitric oxide production mechanisms between murine and human macrophages 78

[174], but the human alveolar macrophage is able to produce nitric oxide in a similar manner to murine macrophages [175]. The biomass reaction is a demand function that pulls resources from the metabolic model. Each metabolite in the biomass affects reaction activity and hence, the final solution space. We hypothesized that by using the known differential reaction activity determined by differentially expressed genes, an altered infection-specific solution space could be defined by tailoring the PM’s biomass objective function. Using randomized sampling, we characterized the solution space for each metabolic component in the biomass function, individually. Using linear regression in an iterative manner, we added and removed metabolic components of the biomass objective function to develop a new objective function (Table 3.1) with a solution space that best fit the gene expression data. Particularly, we removed phenolic glycolipids from the M. tb infection objective function. These computationally driven revisions were confirmed with established experimental data that showed that M. tb H37Rv clinical strains lack biosynthesis of phenolic glycolipids [176]. 79

Table 3.1: Modiﬁcations of the original iNJ661 biomass objective function to produce the ﬁnal infectious state Mycobacterium tuberculosis objective function Original Increased Reduced Removed RNA Amino Acids ATP maintenance Peptidoglycans Mycobactin DNA Phenolic glycolipids Mycocerosates Fatty Acids Mycolic Acids Phospholipids Sugars 80

3.3.4 Characterizing the metabolism of the HPM

The macrophage component of the integrated HPM is similar to iAB-AMØ- 1410, with an M. tb component, which is starkly different from iNJ661. The integration radically affected the major objective function fluxes of the macrophage and M. tb components (Fig. 3.4c). The maximum macrophage biomass was slightly reduced (˜12%), while maximum ATP, nitric oxide, and NADH production were significantly reduced (˜75, ˜55, and ˜70%, respectively). Such a large reduction in ATP, nitric oxide, and NADH production points toward a weakened or compro- mised alveolar macrophage. The M. tb biomass function was also greatly reduced. We then compared the size of the macrophage solution space in the HM and the HPM (Fig. 3.4c). There was a significant reduction in the size of the solution space (51% reduction of the mean reaction flux span). By adding the M. tb model to the alveolar macrophage reconstruction, we were able to substantially decrease the solution space without adding any additional constraints on the internal reactions of the macrophage portion of the network. Only two reactions had a larger flux span: the exchanges of sulfate and phosphate. The reactions with the largest flux span reduction (>70%) dealt with fatty acid, lipid, and proteoglycan metabolism as well as transport reactions. The M. tb network tightened the flexibility of these reactions because of its need for fatty acids and glycerol. 81

Figure 3.4: Schematic of the integration and results of the alveolar macrophage (iAB-AMØ-1410) and Mycobacterium tuberculosis (iNJ661) reconstructions. (A) Metabolic links between the extracellular space (e), alveolar macrophage (am), phagosome (ph), and Mycobacterium tuberculosis (mt) in iAB- AMØ-1410-Mt-661. The model is compartmentalized using the abbreviations as shown. The major carbon sources of the alveolar macrophage are glucose and glutamine. The macrophage is also aerobic and requires the essential amino acids. Albeit its use of oxygen, the macrophage exhibits anaerobic respiration and produces much lactate. The major carbon sources available for M. tb in the phagosome environment are glycerol and fatty acids. The phagosome environment is also functionally hypoxic. (B) The flux span of iAB-AMØ-1410-Mt-661 is significantly reduced (51%) compared with iAB-AMØ-1410. This shows a stricter definition of the alveolar macrophage solution space without adding additional constraints on the alveolar macrophage portion of the network. (C) Reaction, metabolite, and gene properties of the three reconstructions. Maximum production rates of ATP, nitric oxide, redox potential (NADH), and biomass are shown. 82

3.3.5 Changes in reaction activity of iAB-AMØ-1410-Mt- 661

Using randomized sampling, we identified the distribution of all feasible steady states for each reaction in iAB-AMØ-1410, iNJ661, and iAB-AMØ-1410- Mt-661. Statistically significant up- and down-regulated reactions in the integrated HPM were identified after in silico phagocytosis of M. tb. The majority of differences are seen in the M. tb portion of the integrated model, which was expected because of the changes in exchange constraints and revised objective function. However, there are also some differences in the alveolar macrophage portion of the network. The changes in flux states of the M. tb portion of the HPM show a shift in carbon uptake and overall usage. There are major shifts in flux states in central metabolism. Glycolysis is suppressed (Fig. 3.5, ENO) with the production of acetyl-CoA coming from fatty acids through the glyoxylate shunt (Fig. 3.5, ICL). In addition, glucose is generated from gluconeogenesis (Fig. 3.5, FBP). Beyond central metabolism, there are several changes. For example, there is an up-regulation of fatty acid oxidation pathways, as fatty acids become a major carbon source. In addition, there is a shift toward mycobactin and mycolic acid synthesis. Production of nucleotides, peptidoglycans, and phenolic glycolipids is also reduced. Thus, the model more accurately described an infectious state for M. tb. Overall, 75 reactions were up-regulated, while 128 reactions were suppressed. A full listing of reactions, associated genes, and their flux states are provided in the Supplementary information. 83

Figure 3.5: Topological map of the metabolites and reactions in the central metabolism of M. tb. The map shows predicted change of expression states when comparing in vivo (iAB-AMØ-1410-Mt-661) and in vitro (iNJ661) conditions. The expression states were determined from the change in the solution space by randomized sampling. Reactions that were up-regulated in iAB-AMØ-1410-Mt-661 are labeled in green, while down-regulated reactions are labeled in red. Pathways pertaining to glyoxylate metabolism were up-regulated while glycolysis is downregulated. The sampling results of three key reactions of central metabolism are shown. Isocitrate lyase (ICL) and fructose bisphosphatase (FBP) are up-regulated in iAB-AMØ-1410-Mt-661, while enolase (ENO) is down-regulated. It is important to note that the results are absolute values of the normalized values. The ﬂux state of enolase in iAB-AMØ-1410-Mt-661 is negative suggesting gluconeogenesis. 84

There are fewer changes in the alveolar macrophage portion of the integrated model. Only 8 reactions were up-regulated and 46 reactions were downregulated. There is a higher reliance on glycolysis as reported by a much greater activity through phosphoglycerate mutase and enolase. In addition, reactions in the pathways for nitric oxide production are up-regulated. However, the majority of the changes were down-regulations of ATP production, nucleotide synthesis, and amino-acid metabolism. The changes in ﬂux states of the alveolar macrophage during infection point toward a suppressed metabolic state that has a high reliance on glucose oxidation.

3.3.6 Gene Essentiality Predictions

Interestingly, by modeling the host and pathogen together, we were able to increase the accuracy of gene deletion tests. Single gene essentiality predictions were completed for the M. tb portion of the HPM and compared with the results from iNJ661. Model predictions were improved for essential as well as non-essential genes when compared with experimental data. In total, the PM predicts 188 essential metabolic genes, while the HPM predicts 162. Predictions from the two networks overlap with 153 similarly predictedd genes (see Supplementary information). The non-overlapping predictions are shown in (Table. 3.2) and (Table. 3.3). 85

Table 3.2: Genes predicted to be essential in iAB-AMØ-1410-Mt-661. aDenotes genes that were shown to be essential in other experimental publications. ID Gene Function/Reaction TraSH Support Rv0252 nirB Nitrite reductase No Rv0253 nirD Nitrite reductase - Rv0363c fba Fructose-bisphosphate aldolase - Rv0946c pgi Glucose-6-phosphate isomerase - Rv1099c Fructose-bisphosphatase Yes Rv1161 narG Nitrate reductase Noa Rv1162 narH Nitrate reductase Noa Rv1163 narJ Nitrate reductase Noa Rv1164 narI Nitrate reductase Noa Rv1475c acn Aconitase - Rv2391 sirA Sulﬁte reductase Yes 86

Table 3.3: Genes predicted to be unessential in iAB-AMØ-1410-Mt-661. ID Gene Function/Reaction TraSH Support Rv0334 rmlA Nucleotidyl transferase Rv0482 murB Aminosugars metabolism Yes Rv1011 ispE Terpenoid biosynthesis Rv1018c glmU Nucleotidyl transferase Yes Rv1086 - Geranyltranstransferase Yes Rv1302 rfe Peptidoglycan metabolism Yes Rv1315 murA UDP-N-acetylglucosamine 1- carboxyvinyltransferase Rv1338 murI Glutamate racemase No Rv1512 epiA Nucleotide-sugar epimerase Rv2136c - (Un)decaprenyl-diphosphatase Yes Rv2152c murC UDP-N-acetylmuramate-alanine ligase Rv2153c murG N-acetylglucosamine transferase Rv2155c murD UDP-N-acetylmuramoylalanine- d-glutamate ligase Rv2156c murX Phospho-N-acetylmuramoyl- pentappeptidetransferase Rv2157c murF UDP-N-acetylmuramoyl- tripeptide-d-alanyl-d-alanine ligase Rv2158c murE UDP-N-acetylmuramoylalanyl-d- glutamate-2,6-diaminopimelat E ligase Rv2163c pbpB Peptidoglycan subunit synthesis Rv2870c dxr 1-deoxy-d-xylulose 5-phosphate Yes reductoisomerase 87

Table 3.3: Genes predicted to be unessential in iAB-AMØ-1410-Mt-661, Continued. ID Gene Function/Reaction TraSH Support Rv2957 - Glycosyltransferase Yes Rv2958c - Glycosyltransferase Yes Rv2962c - Glycosyltransferase Yes Rv2981c ddlA d-alanine-d-alanine ligase No Rv3265c wbbL1 Rhamnosyltransferase Yes Rv3266c rmlD dTDP-6-deoxy-l-lyxo-4-hexulose Yes reductase Rv3398c idsA1 Dimethylallyltranstransferase Rv3423c alr Alanine racemase Rv3436c glmS Glutamine-fructose-6-phosphate transaminase Rv3465 rmlC dTDP-4-dehydrorhamnose 3,5- epimerase Rv3581c ispF 2-C-methyl-d-erythritol 2,4- cyclodiphosphate synthase Rv3582c ispD 2-C-methyl-d-erythritol 4- phosphate cytidylyltransferase Rv3792 - Arabinofuranosyl transferase Rv3793 embC Arabinofuranosyl extension Rv3794 embA Arabinofuranosyl extension No Rv3795 embB Arabinofuranosyl extension Rv3808c glfT Galactofuranose transferase Rv3809c glf UDP-galactopyranose mutase Yes Rv3818 - UDP-MurNAc hydroxylase 88

The changes in gene essentiality were compared with published in vivo experimental data [177]. Using Transposon Site Hybridization (TraSH), Sassetti and Rubin identified M. tb genes that are essential for survival in a mouse lung model. A total of 374 of 2972 M. tb genes assessed using TraSH overlap with the metabolic models used here. For the nine genes that were predicted as essential solely in the HPM, two are supported by the TraSH data, while five contradict it. Four of the contradictions are components of nitrate reductase (narGHIJ). Though the TraSH data does not support it, [178] has shown that nitrate reduction enhances M. tb survival under sudden stresses of hypoxia and nitric oxide, which M. tb is under during the infectious state. In addition, our gene essentiality results are further validated by in vitro M. tb tests [179] of the essentiality of sulfite reduction (sirA). Thirty-five genes are predicted to be essential in the PM, but non-essential in the HPM. Of the 35 genes, 12 are also non-essential in TraSH. Only two genes (murI, embA) conflict with the TraSH data. The majority of the genes deal with peptidoglycan metabolism and membrane metabolism. Such a change from in vitro conditions shows a shift from division and production of biomass to a more specific infectious state dealing with a harsh phagosome environment. Hence, genes that deal with membrane and cell wall components are less likely to affect the survival of M. tb in vivo. The changes in our new objective function (Table. 3.1) deduced by gene expression data is key for the non-essential predictions. The 35 genes are predicted as non-essential only in the HPM with the revised objective function for M. tb. Removal of peptidoglycan and phenolic glycolipid components of the objective function increased prediction accuracy with the HPM as evidenced with better correspondence with in vivo data (TraSH), which also further supported the condition-specific objective function reformulation.

3.3.7 Host-pathogen interactions in diﬀerent infectious states

M. tb is an insidious pathogen and can remain latent for many years before 89 actively infecting a host. Furthermore, its clinical presentation can be extremely varying and it can infect every organ system. Antibiotic treatment regimens are often tailored for different types of tissue infections, based on their ability to pen- etrate different tissues. In an analogous manner, infections in different tissues may involve activation of different metabolic pathways in the host and pathogen. Hence, depending on the nature of the infection and the type of tissue affected, different treatment regiments may be appropriate. We sought to determine the activity of metabolic pathways in both the macrophage and M. tb in order to better understand M. tb infections and predict potential drug targets. In this study, we analyzed differences between active infections in two different tissues as well as latent tuberculosis. An important application of genome-scale metabolic network reconstructions is to provide a biologically meaningful context for the analysis of large data sets, that is a “context for content” [150]. High-throughput data such as gene expression profiles can be mapped to the reconstructions and subsequently analyzed to evaluate functional consequences of changes in expressions levels of genes. We mapped expression data [180] of macrophages from different M. tb infectious states (latent, pulmonary, meningeal) onto the HPM (iAB-AMØ-1410-Mt-661) and built three infection-specific models: iAB-AMØ-1410-Mt-661L (HPM-L), iAB-AMØ- 1410-Mt-661P (HPM-P), and iAB-AMØ-1410-Mt-661M (HPM-M), which can then be compared based on the differentially active and inactive reactions in the three states in both the macrophage and M. tb determined by flux variability analysis (Fig. 3.6). A full detailing of the approach is provided in Materials and methods. 90

Figure 3.6: Reaction comparison of the three infection-specific models of iAB-AMØ-1410-Mt-661 for latent, pulmonary, and meningeal tuberculosis. (A) We used macrophage gene expression data from three types of M. tb infections to build context-specific models of iAB-AMØ-1410-Mt-661 for latent (L), pulmonary (P), and meningeal (M) tuberculosis. The models had significant differences in reactions for the macrophage and surprisingly M. tb. (B) There were many reactions with disparate activity between the three infections. We determined the subsystems of the reactions with differential activity. (C) The total number of disparate reactions was calculated. The left-most panel shows the macrophage reactions shown to be active in each state as calculated from the expression profile. This data was then used with the GIMME algorithm to build three flux-balance models using iAB-AMØ-1410-Mt-661. The middle diagram shows the active M. tb reactions in each model. The right-most diagram shows the active macrophage reactions in each model. 91 92

The majority of differences in active reactions were found in the macrophage (Fig. 3.6b and Fig. 3.6c). This is expected because of the expression profile data coming from the macrophage. We saw two main discrepancies in active macrophage reactions between the three different types of infections. First, hyaluronan synthesis and exchange was active only in the HPM-P. This suggests that hyaluronan is produced and secreted into the extracellular space. Hyaluronan is important for cell proliferation, especially for progression of some malignant tumors [181]. It seems that hyaluronan could have an important function as a carbon source for pulmonary M. tb. In fact, it has been shown that hyaluronan synthase is active and important for extracellular replication of M. tb [182]. As hyaluronan synthase is not critical for macrophage survival, inhibiting hyaluronan synthase (has123) could be a potential method for stopping activation of M. tb from a latent to active pulmonary state. Second, we saw activation of vitamin D and folate metabolism in the active infectious states (HPM-P and HPM-M). Vitamin D injections were once used as a pre-antibiotic era treatment for tuberculosis [183]. It is proposed that vitamin D is crucial for increasing nitric oxide production in the macrophage. In the active infectious states, the macrophage is readily fighting the infection and hence has active vitamin D metabolism. Folate is important as an enzyme cofactor for DNA synthesis and repair. It has been shown to be up-regulated during activation of macrophages for arthrosclerosis [184]. The HPM-P and HPM-M predict that this also occurs for active M. tb infection. Finally, we saw fewer but interesting changes in reaction activity of the M. tb portion of the three context-specific HPMs (Fig. 3.6b and Fig. 3.6c). Polyprenyl metabolism was only active in the HPM-L and de novo synthesis of nicotinamide cofactors was active in latent and meningeal M. tb infections. Polyprenyl metabolism is important for the cell wall and it makes sense that it is active in the latent state when M. tb builds up its cell wall. Both of these pathways have been mentioned as potential drug targets for tuberculosis [185, 186]. The lack of activity of the polyprenyl and nicotinamide pathways in the HPM-P and the polyprenyl pathway in the HPM-M suggests that such drug targets would not be as 93 effective in eradicating the pathogen in a pulmonary or meningeal infection. How- ever, the activity of these two pathways in the HPM-L suggests that such drug targets could be potentially viable for latent infections. Using high-throughput data from only one organism in the host-pathogen network, we were able to infer resulting changes in the other organism through simulations. This is critical for host-pathogen interactions in which high-throughput data cannot be properly at- tained for one of the organisms. A listing of all the reactions in each of the three infection-specific models is provided in the Supplementary information.

3.4 Discussion

There has been growing interest in genome-scale metabolic reconstructions and constraint-based analysis over the past 10 years. Development of a global human metabolic reconstruction, Recon 1, as well as reconstructions of multiple human pathogens enabled us to simulate how metabolism in the host and pathogen changes as a result of infection. In order to carry out this study, a manually curated human alveolar macrophage model of cellular metabolism, iAB-AMØ-1410, was created using Re- con 1. Purely algorithmic approaches were inadequate to achieve a physiologically relevant model, hence extensive manual curation was used. Flux predictions for ATP and NO production for the macrophage were within 5-10% of experimental measurements. In addition, the in silico model exhibited high glucose oxidation and lactate production, also seen experimentally. The HM was then integrated with an M. tb reconstruction, resulting in an HPM. We compared the two reconstructions to different experimental data sets for macrophages and M. tb to validate our models. Interestingly, integration of the pathogen with the host immediately shrunk the solution space without the imposition of additional internal constraints. As the objective function of M. tb in the infectious state is necessarily different than the biomass growth function, gene expression data was used to help define the metabolic objectives in the infectious state. These predictions were validated through comparison with in vivo gene 94 essentiality studies (TraSH). Among the observations of the HPM, it was noted that there was a higher reliance on glycerol and fatty acids in a hypoxic environment that relied on gluconeogenesis to produce sugars. The integrated model was reliant on the glyoxylate shunt. In addition, nucleotide synthesis was suppressed with increased mycolic acid synthesis. As with in vitro simulations, blocking nitrogen and sulfur reduction pathways were shown to significantly affect survival rates of the pathogen. Analysis of the HPM model also showed improved gene essentiality predictions compared with the PM alone. We used gene expression data to specify the context for analysis of three different types of M. tb infection: latent, pulmonary, and meningeal. There were key differences in both the metabolism of the macrophage and M. tb in the different infection types. iAB-AMØ-1410-Mt-661 predicted hyaluronan synthase in the macrophage to be a potential drug target for inhibiting pulmonary M. tb infection. In addition, we showed that mapping high-throughput data on one reconstruction of a host-pathogen network can elucidate physiologically relevant differences on the other reconstruction in the network. Polyprenyl metabolism and certain nicotinamide pathways were shown to be only active in latent and meningeal infections. Drug targets for these pathways have been discussed previously and should be only looked into for latent M. tb infections. A recent growing amount of evidence has shown M. tb to actively metabolize cholesterol [187, 188]. Interestingly, some reactions involved in cholesterol metabolism were highlighted by the model. As these pathways have not yet been fully characterized and were not in the current PM, the significance and interpretation of these findings were not clear at this point, but point to an area worth expanding the scope of the model and directing further future investigation. Metabolic network reconstructions have many different uses and can be viewed as platforms for data interpretation, analysis, and prediction. Metabolism is being increasingly recognized as important in the role of pathogenicity [189]. Thus, metabolic HPMs are essential for detailing the molecular interaction between human cells and pathogens because of the importance of metabolic changes in both 95 the human host cell and the pathogen. Host-pathogen reconstructions have several applications including providing a context for content analysis, biological discovery, and predictive power. Such reconstructions can serve as scaffolds for mapping high- throughput data onto the host or pathogen and then determining the changes in the other organism. In addition, discrepancies in experimental data and in silico predictions can help elucidate unknown interfacial interactions between the host and pathogen. For example, we showed that some oxygen uptake was necessary for M. tb infection, consistent with functional hypoxia, as opposed to strictly anaerobic versus aerobic conditional dependence. Finally, host-pathogen reconstructions take a step toward being a more biologically faithful platform for predicting potential drug targets as the network better represents in vivo conditions. For example, the models can be used to serve as platforms for understanding the host-pathogen interactions through dual perturbation experiments in silico [190] for screening responses to different drugs simultaneously with different growth conditions, which could then be complemented with in vitro or even ex vivo experiments. There are numerous opportunities for improvement and discovery in HPMs. The host and pathogen integration was performed in a modularized manner, thus allowing for replacement of the M. tb component with other bacterial pathogens that infect macrophages such as Salmonella typhimurium [191] or viral pathogens such as the human immunodeficiency virus. In addition, future revisions of the M. tb reconstruction that further expand the scope of the metabolic processes included in the current model, transcription/translation machinery, and/or Toll-like receptors [9] can be fitted within this framework. Another interesting application of the integration framework involves building bacterial community networks that show the interaction between more than two organisms in different environments, such as the gut. We hope that this successful demonstration of the integration of two genome-scale models will spur future computational explorations into multi-organism models to complement experiments and drive forward mechanistic understanding as well as generate new hypotheses. 96

3.5 Materials and methods

3.5.1 Construction of iAB-AMØ-1410

The reconstruction is derived from the global human metabolic network, Recon 1. Recon 1 consists of 1496 ORFs accounting for 3012 intracellular reactions and 2583 metabolites (Fig. 3.2a). Recon 1 was used as described in [79], with a few revisions. The originally described Complex IV reaction was replaced with three reactions. This was performed to ensure that the reaction was charge balanced, at the cost of decoupling superoxide production from the cytochrome c oxidase reaction (i.e. any coupling would have to be fixed by the modeler adjusting the relative constraints on the equations). In addition, the trans-mitochondrial phosphate transporter was re-written in the electroneutral form. The reaction for- mulas are explicitly shown in the Supplementary information. An overview of the model building process is shown in Fig. 3.1. Cell-specific gene expression data for healthy, inactivated alveolar macrophage was obtained from Gene Expression Omnibus. The specific gene expression study [192] had a healthy patient sample size of n = 11. Presence/absence calls were made for the gene expression data using the PANP software package in the R computing platform. The PANP algorithm used signal intensities to determine the presence of transcripts. Exchange constraints as well as upper bounds for fluxes of major pathways of central metabolism were set from literature [193, 160, 194, 195]. The expression data was then mapped to the reactions and using two separate algorithms, GIMME [28] and Shlomi-NBT-08 [27], we built two preliminary cell- specific models. The two preliminary models were reconciled with primary literature databases (BRENDA [196] and HPRD [197]), immunohistological staining data (Human Protein Atlas [198]), transporter databases (HMTD [199]), primary literature (see Supplementary information), and network stoichiometry to build a final, fully curated reconstruction. 97

3.5.2 Biomass Composition

The biomass was formulated similar to the methods described in previous eukaryotic reconstructions of Saccharomyces cerevisiae and Mus musculus [200, 157]. The biomass component percentages were determined from literature (protein/sugar [201], lipid [202], and DNA [203]) and are presented in the Supple- mentary information. The total composition of each biomass element with respect to the whole cell, except for RNA, was found in literature for macrophages. The total RNA content was adapted from the yeast and mouse reconstructions. All components were found in macrophages, but the RNA amount was adapted from the yeast and mouse reconstructions. DNA composition was set to 41% GC content. RNA and protein composition was determined by averaging the sequence composition of expressed transcripts from the gene expression data used to build the HM. Sugar was assumed to be stored solely as glycogen. Lipid components were adapted from studies on human alveolar macrophages [204]. ATP maintenance requirements were calculated similar to the method for the mouse reconstruction. All calculations for biomass formulation are provided in the Supplementary information. The biomass reaction was not used as the objective function. Instead, a lower bound was set as a maintenance requirement for the HM at physiologically relevant levels [205] for all computational tests. Objective functions used for the HM include ATP and nitric oxide production.

3.5.3 Integration of iAB-AMØ-1410 and iNJ661

The integration workﬂow is shown in Fig. 3.3. Before integrating iAB- AMØ-1410 and iNJ661, the reactions and metabolites were renamed to delineate between organisms and compartments. Once the reactions and metabolites were recompartmentalized, the stoichiometric matrix of the PM was added to the right- hand side of the HM. The external environment of the PM was renamed as the phagosome environment. Transporter reactions were added that connected the cy- toplasm of the HM with the phagosome environment for the exchange metabolites. The PM’s exchange reactions were set to zero, except for ferric ion, copper ion, and hydrogen. These metabolites are not present in the HM and the exchanges 98 were used as sinks because of the PM’s need. The transporter reactions provided metabolites known to be in the phagosome to the PM, including ions [206], nitric oxide, fatty acids, traces of glycerol, and oxygen (10% of iNJ661). A full list of metabolites available in the phagosome is provided in the Supplementary information.

3.5.4 Flux-balance analysis

The reconstructions were represented mathematically using the stoichiometric matrix (S). The matrix is m by n, containing m metabolites and n reactions. The relationship of reactions and the concentration time derivatives is as shown: dx = S · v (3.1) dt where x and v denote the concentrations of metabolites and ﬂuxes of reactions, respectively. Steady-state ﬂux-balance analysis is accomplished using the following linear optimization problem:

max(cT · v) subject to S · v = 0 (3.2) lb < v < ub

The optimization problem is constrained by thermodynamic and enzyme kinetics using upper and lower bounds (ub, lb) on the fluxes of reactions. The optimization vector (c) is a zero-vector except for a one assigned to the reaction or function to be optimized for. A full description of flux-balance analysis and its biological uses can be found in earlier publications [34]. The flux span is calculated by the difference of the maximization and minimization of the linear optimization problem above for all reactions in the models. Reactions that carried no flux and were perceived to be involved in thermodynamically infeasible loops were not considered when calculating the flux span. 99

3.5.5 Characterizing the macrophage’s metabolic functions

The global human metabolic network contained 288 metabolic functions. Sink reactions were added for the metabolite being produced in each metabolic function. Flux-balance analysis was completed using the SimPheny software platform and the pathways were detected by manual inspection of ﬂux maps. The original exchange constraints for iAB-AMØ-1410 were used for all simulations. A listing of all the simulations and results are provided in the Supplementary information.

3.5.6 Reaction activity tests

iAB-AMØ-1410, iNJ661, and iAB-AMØ-1410-Mt-661 were randomly sampled using a Monte Carlo method detailed below. The macrophage objective functions were maintained at the aforementioned physiologically relevant levels. The tuberculosis objectives were maintained at 75% of the maximum values. As substrate uptake rates for in vitro versus in vivo M. tb are significantly different, the sampling results required normalization to properly predict differential expression of metabolic pathways. We assumed that the cells will use a slightly different set of reactions for the two growth conditions. In addition, the total flux through all pathways will be constrained by the amount of enzyme, assuming there is adequate substrate. To represent this, we normalized the sample points to have the same median total network flux level by summing the absolute flux through all tuberculosis reactions for each sample point. All points are scaled by the median network flux. Differential reaction activity was determined under the assumption that if the distributions of candidate flux states, from sampling, for the same reaction under two different conditions do not significantly overlap, then the cell will have to adjust the gene expression to compensate for the complete shift in the solution space. For each tuberculosis reaction, a P-value was computed, comparing the distributions of normalized sample points of the two models. Significance of change 100 was determined at a false discovery rate of 0.05 [207]. The genes of the significantly changed reactions were determined from the gene-protein reaction association data and compared with published gene expression data.

3.5.7 Monte Carlo sampling

Monte Carlo sampling was used to generate a set of uniform, feasible flux distributions (points). The method is a modified version of the artificially centered hit and run (ACHR) algorithm [208]. Initially, a set of non-uniform pseudo-random points, called warmup points, is generated. In a series of iterations, each point is moved in a random manner to a new point, always remaining within the solution space. This is performed by (1) choosing a random direction, (2) computing the limits of how far one can travel in that direction and (3) choosing a new point randomly along this line. After many iterations, the set of points will be mixed and approach a uniform random sample of the solution space. Warmup points are generated by linear programming. For each point, the objective coefficients are set to a random vector with values in [-1,1]. This generates a point at random corners in the solution space. The direction of movement is chosen as described in [208]. The center point of all points is computed and the direction is the difference of a randomly selected point and the center point. This has the effect of biasing the directions in the longer directions of the solution space and speeds up the rate of mixing while maintaining sample uniformity. One of the problems with the ACHR is that the termination condition is not clearly defined. Here, we introduce the concept of the mixed fraction as a measure of how many iterations are required until proper mixing. A partition is created over the set of points by drawing a line at the median value with half the points on either side of the line. The mixed fraction is a count of how many points cross this line between the beginning of sampling and the end as a fraction of the total number of points. Initially, the mixed fraction is 1 as all points are on the same side of the partition. When perfect mixing is achieved, each point has a 50% chance of crossing the partition line so the mixed fraction will be close to 0.5. This value is approached asymptotically and a threshold of 0.53 was used for sampling. 101

3.5.8 Infectious state M. tb objective function formulation

The initial build of iAB-AMØ-1410-Mt-661 used the original tuberculosis biomass objective function of iNJ661. To build an objective function more specific to in vivo conditions of M. tb, the PM objective function was modified according to gene expression data. The original biomass was split into major metabolite components (Table. 3.1). Each component was randomly sampled to determine averages of reaction fluxes. In an iterative manner using linear regression, the components were either removed or weighted and sampling was redone. This process was continued till the sampling results from the specific objective function best fit the gene expression data.

3.5.9 Gene essentiality tests

Gene essentiality tests were completed by systematically setting all reaction ﬂuxes associated with a gene to an upper and lower bound of zero. Flux-balance analysis was then computed to determine the change, if any, of phenotypic state. Only tuberculosis genes were tested in the PM and the HPM. We set 0.2 of the maximal biomass as the threshold for essentiality.

3.5.10 Building M. tb infection speciﬁc models

We used gene expression profiling data from ex vivo macrophages infected with M. tb [180]. The study had three types of macrophages: derived from nurses with latent tuberculosis and treated patients of pulmonary and meningeal tuberculosis. The expression data was used to build three context-specific models using the PANP software package and GIMME algorithm. All reactions were deemed active in the models that were able to carry a non-zero steady-state flux. This was performed using flux variability analysis, in which the minimum and maximum fluxes are independently calculated for each reaction in an iterative manner [209]. 102

3.6 Acknowledgements

We thank Richard Que for helping in putting together the Supplemen- tary information. This work was supported in part by National Institutes of Health Grants GM068837 and Y1-AI-8401-01, and the National Science Foun- dation IGERT Plant Training Grant DGE-0504645. Chapter 3 in part is a reprint of the material Bordbar A, Lewis NE, Schel- lenberger J, Palsson BO, Jamshidi N. Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions. Mol Syst Biol. (2010) 6:422. The dissertation author was the primary author. Chapter 4

Model-driven multi-omic data analysis elucidates metabolic immunomodulators of macrophage activation

4.1 Abstract

Macrophages are central players in immune response, manifesting divergent phenotypes to control inﬂammation and innate immunity through release of cytokines and other signaling factors. Recently, the focus on metabolism has been reemphasized as critical signaling and regulatory pathways of human pathophysiology, ranging from cancer to aging, often converge on metabolic responses. Here, we used genome-scale modeling and multi-omics (transcriptomics, proteomics, and metabolomics) analysis to assess metabolic features that are critical for macrophage activation. We constructed a genome-scale metabolic network for the RAW 264.7 cell line to determine metabolic modulators of activation. Metabolites well-known to be associated with immunoactivation (glucose and arginine) and immunosuppression (tryptophan and vitamin D3) were among the most critical eﬀectors. Intracellular metabolic mechanisms were assessed, identifying a

103 104 suppressive role for de-novo nucleotide synthesis. Finally, underlying metabolic mechanisms of macrophage activation are identified by analyzing multi-omic data obtained from LPS-stimulated RAW cells in the context of our flux-based predictions. Our study demonstrates metabolism’s role in regulating activation may be greater than previously anticipated and elucidates underlying connections between activation and metabolic effectors.

4.2 Introduction

Macrophages have a key role in coordinating immune response in mammalian systems through the production of regulatory cytokines, proteases, and coagulation factors. In addition, they are critical for digesting pathogens, necrotic cellular debris, and other foreign objects that are encountered within the host. Growing evidence indicates that macrophages manifest divergent metabolic phenotypes to mediate a variety of alternative functions, including immunosuppression and tissue repair [210]. While metabolic status can greatly vary depending on macrophage type or stimulated state, the murine leukemic monocyte macrophage cell line RAW 264.7 has been an amenable in vitro model of macrophage and monocyte functions as it exhibits key characteristics representative of different macrophage types in vivo [211, 212, 213]. For that reason, the RAW cell line has served as a host model to experimentally probe macrophage phenotypes under various exposure conditions (e.g., endotoxins and cytokines) as well as during direct parasitic infection [214, 215] and proliferative inflammatory response states [216]. The phenotypic responses that macrophages display have been extensively studied and are generally categorized according to activation status [217]. While macrophage activation functions at the front line of host defense, improper control of its activation has also been implicated to have major roles in disease progression. For example, infiltration of activated macrophages in tissues has been strongly associated with a number of pathological disorders including diabetes, obesity, and renal injury [218, 219]. The recruitment of specific, activated macrophage sub- populations in the tumor microenvironment is widely known to promote chemore- 105 sistance by enabling cancer cells to effectively evade host immune responses [220]. Macrophage activation states are also modulated by parasitic organisms, which hi- jack host macrophage cells to promote their long-term, intracellular survival [221]. Macrophage activation is metabolically associated with the amino-acid arginine that diverges into classical (M1) and alternative (M2) pathways with the respective productions of: (1) nitric oxide (NO) for microbicidal purposes via NOS2 and (2) proline and polyamines for inducing local cell proliferation and collagen remodeling via arginase [210, 222]. These polarized functions are activated in response to bacterial and viral infections in the M1 phenotype, and to parasitic infection, tissue remodeling and angiogenesis in the M2 phenotype [210, 222]. Although there is a growing interest in understanding the interface between metabolism and immunity [223], little systems-based approaches have been utilized in elucidating metabolic mechanisms that are linked to macrophage activation to date. Molecular systems biology has arisen as a discipline to meet the challenges associated with the current era of high-throughput, data-rich biology. Genome- scale reconstructions provide mechanistic foundations for network-level modeling, biological discovery, and analyzing high-throughput data sets [150]. Metabolic networks bridge the gap between genomic and biochemical information and form a mechanistic context in which data sets can be incorporated to evaluate causal phenotypic relationships. This approach has been demonstrated in evaluating metabolic phenotypes for microbial and eukaryotic systems, ranging from industrial microbes to pathogens to human cells [79, 63, 155]. Recently, algorithmic approaches have leveraged genome-scale networks as a mechanistic scaffold for interpreting condition- and tissue-specific gene expression data [4, 224, 94]. In this study, we present a genome-scale metabolic reconstruction and analysis for the RAW 264.7 cell line to evaluate metabolite effectors and mechanisms associated with macrophage activation. Highly effective metabolites identified by our analysis are richly supported in the published literature for their immunomodulatory properties, in which several predicted metabolites have been previously experimentally verified. Mechanisms for activation and inhibition by predicted metabolic immunomodulators were investigated through Monte Carlo sampling 106 analysis. Finally, transcriptomic, proteomic, and metabolomic analysis of LPS- stimulated RAW cells demonstrates how model-based predictions enhance the mechanistic interpretation of high-throughput data to enable better understanding of macrophage metabolic activation phenotypes.

4.3 Results

4.3.1 RAW 264.7 metabolic network reproduces experimentally measured ﬂux rates

A RAW 264.7 metabolic network was reconstructed based on the global human metabolic network Recon 1 [79] by integrating gene expression and proteomic data with a Homologene-mapped metabolic network (see Materials and methods for workflow and details). The physiological capabilities of the network were evaluated using uptake rates derived from in-vitro data sets. Rates of biomass growth, ATP production, and NO synthesis were compared with experimental values. The maximum doubling time for the imposed in-vitro uptake rates was 16.99 h. Although the measured growth rate tends to vary for different experimental conditions, the calculated rate is consistent with the previously reported range of 11 h [225], 18-22 h [226], and 24.7 h [227]. For subsequent ATP and NO rate calculations, the lower bound on the biomass reaction was set to the lower experimental growth rate of 0.0281/h to mimic minimal maintenance of the macrophage. The maximum calculated ATP production rate was 0.796 mmol/h/g DW, similarly to the in-vitro rate of 0.712 mmol/h/g DW [160]. The NO production rate (via NOS2) was 0.0399, mmol/h/g DW, similarly to an in-vitro measured rate of 0.0365, mmol/h/g DW [161]. Overall, our results indicate that the metabolic network is predictive of physiologically relevant experimental rates when in-vitro uptake rates are imposed. To further evaluate the physiological accuracy of the cell-specific RAW 264.7 metabolic network compared with the larger global model from which it was derived, we calculated biomass growth, ATP production, and NO production rates for the Recon 1 network. We also performed sensitivity analyses 107 presented in the later sections. For both analyses, we found Recon 1 predictions on the quantitative production rates to be considerably less accurate compared with the RAW 264.7-specific network (see Supplementary information).

4.3.2 Deletion analysis diﬀerentiates critically essential reactions of M1 and M2 activation phenotypes

To metabolically characterize macrophage phenotypes, we defined five metabolic objective functions associated with general macrophage function and activation: (1) energy (ATP) generation, (2) redox maintenance (NADH), (3) NO production for M1 activation, (4) synthesis of extracellular matrix precursors (proline), and (5) induction of local cell proliferation through polyamines (i.e., putrescine) to define M2 activation [210]. As a highly mobile and active cell, macrophages require high glycolytic activity to generate ATP and NADH for essential functional purposes [160]. M1 and M2 activation phenotypes serve different functions in immune response, with the primary difference due to the differential upregulation of NOS2 and arginase [210, 222]. We first evaluated whether dis- tinguishing M1 and M2 phenotypic features can be demonstrated through the optimization of the respective M1- and M2-associated objective functions without directly imposing regulatory constraints on NOS2 and arginase reactions. A single reaction deletion analysis was performed while maximizing for M1 (NO production) and M2 activation (proline and putrescine productions) to evaluate reactions that are differentially critical, or essential, for M1 and M2 phenotypes. In general, the essential reactions between M1 and M2 functions were similar, as has been previously demonstrated by the overall similarities between M1 and M2 metabolic flux distributions [228]. This result is primarily due to the close network proximity of NO, proline, and polyamine production. Of the 76 metabolic subsystems in the RAW 264.7 network, only 4 had a large difference in response to reaction deletions: alanine and aspartate metabolism, arginine and proline metabolism, oxidative phosphorylation, and urea cycle/amino group metabolism (Fig. 4.1). In all, there were 37 reaction deletions that resulted in a large differential response (>10% difference from original objective function value) 108 between M1 and M2 activation (Fig. 4.1). As expected, reactions that were directly involved in the respective production of NO (NOS1 and NOS2), proline (G5SADrm), and putrescine (ORNDC) were critically essential as their deletion resulted in the inability to produce their respective products (i.e., value of 0). 109

Reaction Deletion Importance by Subsystem

Alanine and Aspartateat Arginine and Proline Oxidative Urea cycle Metabolism Metabolism Phosphorylation Metabolism ARGSS ARGSL ASPTA ASPTAmASPGLUm NOS2 NOS1 CYOR_u10m CYOOm3 ATPS4m GLU5Km G5SDym

M1 Activation M2 Activation

>30%0 >30%

Figure 4.1: Reaction deletion analysis differentiates metabolic differences observed for M1 and M2 activation. The difference between reaction essentiality for M1 and M2 activation is shown. In the top portion, the reactions are grouped by subsystem and rank ordered in terms of importance for M1 activation. Only a few subsystems were differentially important. Largely differential subsystems are shown in reaction detail. The reaction importance differences seen for oxidative phosphorylation and the shuttling of NADH equivalent reflects known metabolic flux variations seen in M1 and M2 activation. 110

Outside of the reactions that were directly involved in the production of NO, proline, and putrescine, the majority of differential reactions were involved in oxidative phosphorylation and the malate-aspartate shuttle. The deletion of oxidative phosphorylation reactions (CYOR u10, CYOOm3, and ATPS4m) reduced proline and putrescine production to 63-76% of its maximum capacity. The effect on NO production was far less detrimental at 88-96%, thus indicating oxidative phosphorylation to be more important for the M2 activation phenotype than for M1. Glycolysis-associated reaction deletions (GAPD, PGK, PGM, PYK and ENO) led to a reduced NO production of 73-74%, slightly lower than 75-78% for proline and putrescine. However, alanine and aspartate metabolism reactions (ASPTA, ASPGLUm) were significantly more important for NO production as shown for the reduction to 83-85%, compared with 93-96% for the M2-associated objective functions. The reaction deletion analysis indicates that the M1-activation phenotype has a slightly higher dependency on the shuttling of glycolysis-associated NADH equivalents from the cytosol to the mitochondria. However, the removal of oxidative phosphorylation reactions is more detrimental in M2 activation. These results are consistent with a recent study showing that M1-activated macrophages have a higher glycolytic capacity whereas M2-activated macrophages are more dependent on oxidative phosphorylation [228].

4.3.3 Sensitivity analysis identiﬁes immunomodulatory metabolites of macrophage activation

To determine important metabolic immunomodulators of macrophage activation, we performed a sensitivity analysis on the metabolite exchanges in the RAW macrophage network for each of the five activation-based objective functions. The analysis quantifies metabolite exchanges that are strongly associated with the maximization of each objective function by prioritizing the slope values calculated from robustness analysis [229] (Fig. 4.2). For example, NO production is expected to be highly sensitive to L-arginine uptake as L-arginine is its precursor metabolite. In general, sensitivity scores for NO, proline, and putrescine did not vary signif- 111 icantly as their respective productions are coupled with arginine fate and their synthesis pathways are proximal to one another, as previously discussed. These results are supportive of the similarities in metabolic activation flux patterns of M1 and M2 phenotypes measured in murine peritoneal macrophages [228]. 112

Figure 4.2: Network sensitivity analysis recapitulates literature- supported immunomodulatory metabolites. Five objective functions were evaluated for activating and suppressing metabolites based on magnitude and directionality of slope. Support from previously published experimental studies was enriched toward metabolites that were predicted to be most eﬀective. Metabolites with literature support and discussed in our analysis are denoted by (†). Metabo- lites denoted with (*) were excluded as those results are due to artifacts of the network. 113

ATP NADH NO Proline Ptrc o2† glucose† arginine-L† glutamine-L† histidine-L lysine-L valine-L† isoleucine-L† serine-L usrtsProducts glycine Substrates co2 leucine-L h2o pyruvate h+ L-cystine† oleic acid inositol choline tyrosine-L myristic acid nh4 methionine-L stearic acid phosphate phenylalanine-L† tryptophan-L† threonine-L * * *** urea† nh4 glutamate-L† h h2o cysteine-L hco3 mercplaccys acetate glycine betaine succinate creatine glycine pyruvate lactate-L lactate-D alanine-D alanine-L serine-L 2-hydroxybutyrate h2o2 acetone glutamine-L acetaldehyde oxalate tyrosine-L dihydroxy-L-phe o2s o2 so4 nitric oxide * inositol formate uracil taurine glucose thymine sphinganine 1-phos palmitoleic acid urate elaidic acid vaccenic acid uridine cytidine deoxycytidine deoxyuridine linoelaidic acid deoxyadenosine inosine DOPEG† choline deoxyinosine adenosine hyaluronic acid vitamin D3†

Suppression Activation 114

Several computationally predicted activating metabolic substrates are well established to be critical for macrophage activation (Fig. 4.2). Oxygen, glucose, and glutamine have all been previously shown to be of great importance in macrophage metabolism and activation as they are key for cellular respiration, energy production, and respiratory burst [230]. High consumption of glutamine is particularly important for enhancing immunity [160], as it is required for arginine biosynthesis and nitrite/urea production [231]. The uptake of arginine and the branched chain amino acids (BCAAs) valine and isoleucine were also identified to be important for ATP, NADH, proline, and putrescine production; however, it was not found to be critical for NO production. Arginine is among the most critical amino acid as it is a direct precursor metabolite for NO [232] and its transport increases up to five-fold during activation and proliferation [233]. Previous experiments verify these predictions as higher arginine and glucose concentrations of LPS- and LTA-activated macrophages increased NO production above levels seen without supplementation [160, 234]. Though BCAAs are not as well studied as arginine in macrophages, supplementation of BCAAs has shown to help improve immune response in long distance runners [235]. During strenuous exercise, plasma glutamine levels drop having a large effect on peripheral blood mononuclear cells (PBMCs). Bassit et al, (2002) showed that BCAA supplementation before strenuous exercise allowed PBMC to retain their ability to proliferate. L-cystine was also found to be selectively activating for NO production. While L-cystine was not predicted to be an effective activator or suppressor of M2 activation, it has been previously shown that LPS-activated NO production induces L-cystine uptake in murine macrophages [236]. Tryptophan and phenylalanine were highly ranked among substrates in which its catabolism had a suppressive effect on M1 and M2 metabolic activation phenotypes (Fig. 4.2). Most notably, indoleamine 2,3-dioxygenase (IDO), a key enzyme in tryptophan catabolism, is a widely known suppressive mechanism by which IDO-expressing monocytes and macrophages inhibit T-cell proliferation in mammals [237]. Tryptophan catabolites have been widely implicated in medi- ating immune tolerance [238], and the tryptophan catabolite 3-hydroxyanthranilic 115 acid has been directly shown to suppress NO synthase expression [239]. A similar catabolic mechanism has been proposed for phenylalanine, by which the immunosuppressive enzyme IL4I1 has been shown to inhibit T-cell proliferation in vitro via L-phenylanine oxidase [240]. It is important to note that the uptake of threonine had the largest inhibitory effect on activation objectives, but was not under consideration during analysis. While constructing the model, it was noted that the RAW 264.7 cell line lacked threonine degradation pathways, confirmed through low threonine dehydrogenase transcript levels. As threonine uptake increased in the sensitivity analysis, the amino acid was being shuttled only to biomass production that resulted in the diversion of precursor metabolites away from M1 and M2 phenotypes, thus appearing to be an immunosuppressive metabolite. Produced metabolites were also found to have profound effects on metabolic activation. Synthesis of most metabolites had a suppressive effect on M1- and M2- associated metabolic phenotypes (Fig. 4.2), as there is a metabolic ‘cost’ by diverting precursor metabolites and energy resources away from the arginine-derived activation pathways. Vitamin D3 production was found to be the most effective at lowering flux across all activation functions, a metabolite which has been previously implicated as a critical negative feedback mechanism to control activated macrophages through paracrine signaling [241]. A similar mechanism has been found to be employed by the immunosuppressant cyclosporine, which upregulates vitamin D3 synthesis via 1-α-hydroxylase [242]. The production of several nucleotides and deoxynucleotides was also indicated to be negatively associated with macrophage activation as 8 of the top 13 inhibiting metabolites involved in nucleotide metabolism. In particular, adenosine is known to be released in tissues during injurious stimuli to mitigate pro-inflammatory responses across various immune cell types [243, 244]. Interestingly, the supply of purines and pyrimidines has previously been found to be vital for intracellular pathogens to proliferate within macrophages, as many pathogens have auxotrophies for nucleotides [245]. The complementary nature of a metabolite that causes macrophage inactivation due to its production and shows importance for pathogenic growth was also found for hyaluronan. Hyaluronan production was one of the most suppressive of 116 macrophage activation and has been previously been shown to be a critical substrate that enhances Mycobacterium tuberculosis proliferation [182]. In addition, our previous study indicated that hyaluronan synthase is active only in pulmonary M. tuberculosis infections versus latent infections [4]. Finally, DOPEG production that occurs via monoamine oxidase (MAO)-mediated norepinephrine degradation was also suppressive of macrophage activation. Our network finding of the suppressive properties of DOPEG production is supported by a recent study that demonstrated metabolic inactivation of catecholamines occurs via upregulated MAO in LPS-stimulated macrophages to regulate intensity of inflammatory injury [246]. In general, metabolite production is metabolically taxing to macrophage activation phenotypes (Fig. 4.2); however, it is sometimes more efficient to secrete nitrogen by-products, such as for urea, ammonia, and glutamate, the only metabolites through which its production had a positive effect on activation. In particular, urea and glutamate production arises from the respective use of arginine and glutamine. It has been experimentally shown that urea supplementation in the media of RAW 264.7 cells inhibits NO production [247]. Prabhakar et al. noted that the mechanism for NO production inhibition is not related to transcription, as inducible NO synthase mRNA levels are the same in control and urea spiked samples. However, it is possible that the addition of urea in the media perturbs homeostatic cycling of arginine, thereby inhibiting NO production. In addition, the inhibition of glutaminase, an enzyme that produces glutamate from glutamine, is known to mediate activated immune response in macrophages [248]. As demonstrated later, proteomic data from LPS-activated RAW 264.7 cells indicate increased glutaminase activity during macrophage activation. Overall, our results demonstrate that metabolite effectors with the highest sensitivity scores across macrophage functions and activation phenotypes are widely known to have an immunomodulatory role. Supplementation of cell culture media with glucose and arginine has been previously shown to increase nitrite production. Predicted suppressive modulators have also been previously confirmed experimentally through signaling regulatory mechanisms. The high correlation between our findings and experimental literature is of particular interest 117 as our network approach does not explicitly account for signaling or regulation of metabolism. However, it is highly possible that the converging changes in metabolism during macrophage activation and suppression, as mediated by regulatory and signaling factors, may in fact have an important role in directing macrophage phenotypes.

4.3.4 Sampling analysis identiﬁes potential intracellular mechanisms linked to immunomodulatory metabolites

The results of the sensitivity analysis demonstrated that predicted metabolites were supported by previous experimental studies on metabolic effectors. To understand the underlying metabolic pathways that are associated with the metabolic effectors, Monte Carlo sampling analysis was used to identify intracellular reaction changes linked to metabolites (i.e., metabolite uptake and production) with suppressive properties. In particular, we studied the most effective inhibitor (tryptophan) and the most demanding group of produced metabolites (nucleotides and deoxynucleotides). In total, 90 different conditions were sampled to determine the perturbed metabolic pathways and reactions of 9 metabolic exchanges on the 5 activating objectives, during uptake or synthesis compared with those without such activities. First, we evaluated the suppressive effects of tryptophan uptake. Our sampling analysis indicates that the macrophage enters a ketogenic-like state during increased tryptophan uptake (Fig. 4.3). Increased tryptophan catabolism (i.e., IDO induction) results in an increase in the production of reduced glutathione and the degradation of two ketogenic amino acids, leucine and lysine, as shown by increased activity of hydroxymethylglutaryl-CoA lyase, as well as decreased pyruvate dehydrogenase and increased pyruvate carboxylase activity, a switch characteristic of a ketogenic state [249]. Interestingly, rats that are fed a ketogenic diet result in elevated levels of reduced glutathione [250], and fasting individuals in ketosis are known to have decreased inflammation [251]. 118

Trp Uptake Lys Nucleotide Production Perturbations Perturbations Pyr

PDH 2OXOADOX Gln CBP PC Nucleotides ACoA Leu NO, Proline, HMGL LEUTA Polyamines PRPPPR CS Glc Oaa Cit ATPAT P

Increased Flux Activity Decreased Flux Activity

Figure 4.3: Randomized sampling elucidates intracellular mechanisms for observed macrophage activation and suppression. Tryptophan induces a shift to a ketogenic-like state, increasing metabolic usage of leucine and lysine. To balance the redox potential shift, there is a signiﬁcantly greater use of the malate-aspartate shuttle, diverting glutamate from activation pathways. In addition, increased nucleotide synthesis shifts metabolic resources toward nucleotide intermediates PRPP and CRP. PRPP and CRP are produced from glutamine and glucose, respectively, diverting metabolic resources from nitric oxide, proline, putrescine, and ATP generation. 119

Due to the change in redox potential, there is a cellular need to shuttle NAD+ and NADH across the mitochondrial membrane. Sampling analysis demonstrated a statistically significant increase in use of the malate-aspartate shuttle. The shuttle is dependent on glutamate, which in-silico analysis shows is derived from glutamine. Thus, glutamine is diverted from activation-associated metabolites (NO, proline, and putrescine) to balance the redox potential across the mitochondrial membrane. In the M1- and M2-reaction deletion analysis, we noticed a greater importance of the shuttle for M1 activation. With sampling, we saw the same result as the basal level of the malate-aspartate shuttle in M1 activation was greater than in M2 activation. However, forced tryptophan uptake significantly increased shuttle activity for both M1 and M2 activation. Next, we evaluated eight nucleotides and deoxynucleotides whose synthesis was predicted to be especially taxing on activation properties. Intracellular changes induced by nucleotide production are drastically different than those for tryptophan uptake (Fig. 4.3). As expected, de-novo purine and pyrimidine synthesis pathways are activated, including reactions pertaining to synthesis of IMP, UMP, and deoxynucleotides. Purine and pyrimidine synthesis requires 5-Phosphoribosyl diphosphate (PRPP) and carbamoyl phosphate (CBP), respectively. Sampling indicated that PRPP production via pentose phosphate pathway was increased resulting in suppression of glycolysis, pyruvate metabolism, and overall ATP production. There was also an increase in CBP synthesis from glutamine via CBP synthase; draining the availability of glutamine toward the arginine-derived synthesis of NO, proline, and putrescine. We detected different inhibitory mechanisms for tryptophan uptake and nucleotide production on macrophage activation. However, the mechanisms often converged on glutamine and overall energy production, thus implicating intracellular metabolic shifts in glutamine utilization as a potential critical junction for immunomodulatory mechanisms. To further study and confirm the proposed mechanisms, we generated multiple levels of high-throughput data sets from LPS- activated RAW 264.7 cells. 120

4.3.5 Predicted metabolic activation phenotypes enhance mechanistic interpretation of experimental data for activated RAW 264.7 cells

Our network sensitivity and flux sampling analyses defined potential metabolic mechanisms associated with macrophage activation and inhibition. In particular, our results indicate that macrophage activation may be dependent on the flux divergence from glutamine/glutamate as a critical junction (Fig. 4.4a). To experimentally confirm our predictions, we stimulated RAW 264.7 cells with LPS (derived from Salmonella typhimurium), a TLR4 agonist associated with M1 activation, to assess whether these mechanisms were associated with similar changes at the transcript, protein (2, 4, and 24 h after stimulation), and metabolite levels (24 h after stimulation) following macrophage activation. 121

MP Trans A B Nos2 argsuc ARGSL Activation NO, Pro, Ptrc Pycr1 Urea Cycle arg-L Pycr2* Production Oat* NOS1 Odc1 ARGSS Arg2 ASNS1 NOS2 Arg Met nwharg Ass1 Acadsb urea Val/Ile Met asn-L asp-L Aldh6a1 no ARGN Urea/Glu Met Slc14a2 citr-L Gls* Prodh ametam Pro/Ptrc Degradation OCBT Polyamine Metabolism Mtap orn SPMS Trp Metabolism Hadh ORNDC 5mta Ada ptrc Adsl ORNTA* SPRMS spmd Adss MTAP 5mdr1p Adssl1 ametam sprm Ak3 Atic GLUN*GLU gln-L glu-L Cad Ctps CBPS Dck OMPDC Nucleotide Dut from prpp UMP Synthesis Metabolism Gart GLUPRT PPP cbp orot5p ump Gmps Nt5c2 IMP Synthesis Paics PRAGSr GARFT PRFGS PRAIS Pfas Ppat Nucleotide Nucleotide Mechanism Rrm1 pram gar fgam fpram air imp Rrm2 ASPTA Tk1 Malate-Aspartate Umps Pgd MDH Shuttle asp-L mal-L Prps1 akg glu-L oaa Pentose Phosphate Prps2 Pathway Rbks AKGMALtm Taldo1 ASPGLUm Tkt Tktl1 akg[m] glu-L[m] oaa[m] Redox Slc25a13 asp-L[m] mal-L[m] Metabolism Gpx1

Trp Mechanism Trp MDHm Log2 Fold Change ASPTAm Suppression Reporter Metabolites < -3 0 > 3

Figure 4.4: High-throughput data support in-silico predictions. (A) Re- porter metabolites provide a global analysis of the expression data. Major changes pertained to predicted pathways of activation and suppression. Green nodes are scaled by degree of enrichment. Circled metabolites in red and blue represent signiﬁcantly changed metabolites detected by GC-MS. (B) Directionality of in- silico predictions was in high accordance with the transcriptional and proteomic response of LPS-stimulated cells. Pycr2, Oat, and Gls expression contradicted model predictions, but the proteomics data conﬁrmed the predictions. Only 24 h transcriptomics data are shown due to sparsity of proteomic data. MP - Model Prediction, metabolite, and reaction abbreviations are provided in Supplementary information. 122

First, we performed a global, unbiased assessment of the primary metabolic changes during activation by calculating reporter metabolites based on the gene expression data in accordance with a published method [49]. Reporter metabolites are nodes that are statistically enriched with transcriptional changes in the metabolic network and represent key regions that are significantly perturbed. A total of 105 statistically significant reporter metabolites (P<0.05) were identified for LPS-stimulated RAW cells. Reporter metabolites for LPS-stimulated macrophages at 24 h are shown (Fig. 4.4a). Among the top 20 reporter metabolites, 12 are related to the predicted activating and suppressive mechanisms previously discussed: urea cycle, NO and putrescine production, and nucleotide biosynthesis. Thirteen extracellular reporter metabolites were also identified, eight of which were predicted to be important metabolic effectors through sensitivity analysis (including glutamine, arginine, glucose, and phenylalanine). Hence, the reporter metabolite results provide an unbiased confirmation that major transcriptional changes in metabolism during macrophage activation are highly correlated with the model- predicted regions of arginine and urea metabolism as well as nucleotide biosynthesis. We then evaluated the directional changes of reactions involved with the most highly perturbed reporter metabolites (Fig. 4.4b). We compared sampling predictions during activation and suppression with the associated genes and proteins in the transcriptomic and proteomic data. We found a highly significant (P<1e-5 through a permutation test) correlation between model predictions and high-throughput data (Fig. 4.4b; see Supplementary information). As expected, LPS-stimulated cells showed significantly upregulated (P<0.05) genes and proteins involved with NO, proline, and putrescine production. In addition, metabolic pathways associated with proline and putrescine degradation were suppressed, suggesting accumulation of the characteristic M2-associated metabolites. Gene expression data also showed upregulated arginase (Arg2) and urea transporter (Slc14a2), as well as several genes (Acadsb, Aldh6a1) associated with BCAA degradation. In contrast with our predictions, we found that glutaminase (Gls) and ornithine aminotransferase (Oat) were significantly downregulated in the expression data. 123

However, at the protein level, both Gls and Oat were upregulated and a previous study showed a respective 150% and 40% increase in Gls and Oat enzyme activity in LPS-activated versus resident macrophages [160]. These experimental results provide further support that arginine and BCAA metabolism as well as urea and glutamate production are highly active during macrophage activation. Conversely, metabolic genes and proteins involved in the predicted suppressive malate-aspartate shuttle and nucleotide synthesis were found to be significantly downregulated during LPS stimulation. The pentose phosphate pathway and nucleotide synthesis, including PRPP and CBP utilizing genes (Ppat and Cad), were significantly downregulated (Fig. 4.4b). We also observed a gene involved with the malate-aspartate shuttle (Slc25a13) to be significantly downregulated. Finally, we profiled 700 metabolites using GC-MS to determine altered metabolite levels during LPS activation. In all, 51 metabolites that were detected in both control and activated conditions had significant changes (P<0.05). In particular, we detected a significant increase in putrescine (ptrc), in accordance with transcription and proteomic data that suggested its accumulation (Fig. 4.4a). Downstream metabolites spermine (sprm) and spermidine (spmd) were decreased, although an increase in 5’-deoxy-5’-(methylthio)adenosine (5mta), a by-product of spermine and spermidine synthesis, was indicated. Most notably, 11 metabolites associated with the model-predicted metabolic suppressors were significantly decreased: nucleotide metabolism (adn, amp, gmp, hxan, ins, ump), pentose phosphate pathway (r5p), malate-aspartate shuttle (asp-L, glu-L, mal-L), and tryptophan catabolism (kynate). Though metabolite level changes are not directly linked to predicted flux changes, these results are highly supportive of a systemic shift of metabolic resources away from nucleotide metabolism and other predicted suppressive pathways to activation-associated metabolic functions. To further verify the observed divergence of nucleotide synthesis and activation functions, we re-evaluated time-course quantitative proteomic data of RAW 264.7 cells infected with Salmonella typhimurium from a previous study [252]. Based on time-course proteomic data, seven detected proteins were associated with NO and proline production (Nos2 and Pycr2), nucleotide biosynthesis 124

(Atic and Impdh2), and pentose phosphate pathway (G6pdx, Pgls, and Tkt). We found a similar trend between the LPS-stimulated and infection results (Supple- mentary Figure S3). In addition, the protein abundance changes followed a clear time-course trend, with the accumulation of activation proteins and a continual decrease in proteins associated with suppressive nucleotide pathways and pentose phosphate pathway during infection. In summary, the intracellular metabolic signature predicted for macrophage activation from the tailored genome-scale metabolic network to the RAW cell line was significantly correlated with and was highly supported by three levels of high- throughput data sets. Our experimental results demonstrate that the regulation of glutamine/glutamate towards de-novo nucleotide synthesis and malate-aspartate shuttle may be an effective mechanism for modulating macrophage metabolic activation as the LPS-stimulated macrophage significantly downregulated those pathways during activation. In addition, we re-analyzed time-course proteomic data of infected RAW 264.7 cells, finding a similar trend as the LPS results. A comparison of our model-predicted changes with the full time-course transcriptomics and proteomics of LPS-stimulated macrophages as well as a list of significantly altered metabolites are available in Supplementary information.

4.4 Discussion

It is becoming clear that the two common denominators in many patho- physiological states are metabolism and inflammation [223]. Research areas, particularly in the cancer field which previously focused heavily on regulation and signaling controls, are now recognizing metabolic reprogramming as a predominant characteristic during disease progression. For example, well-established tumor suppressors that regulate cancer cell proliferation, such as p53, have been shown to converge on the control of central metabolism [253](Cairns et al, 2011). In this study, we present a case that metabolism similarly has a crucial role in the control of macrophage activation and, consequently, immune response. Notably, our systematic network analysis of macrophage metabolism highlights metabolic effec- 125 tors that were previously established in the experimental literature and proposes potential mechanisms that are critically linked to activation. In recent years, high-throughput technologies have emphasized the need to develop integrative data analysis tools with improved biological relevance [137]. While most data-driven analysis approaches are purely statistical in nature, metabolic network-based approaches have demonstrated highly coherent depen- dencies at the gene expression level that are functionally correlated based on flux associations [74, 67]. Our study further supports the notion that predicted flux- based associations provide a biologically accurate context for elucidating metabolic mechanisms from multiple layers of high-throughput data. By coupling model- based predictions with transcriptomic, proteomic, and metabolomic analyses on LPS-stimulated RAW 264.7 cells, we confirmed a consistent trend by which glutamine is a critical junction for activation and a possible contending link between nucleotide synthesis and macrophage activation via pentose phosphate pathway and CBP synthase exists. Our re-analysis of previously published proteomic data from Salmonella-infected RAW 264.7 cells demonstrated a similar divergent trend, indicating that metabolic network-based predictions can enhance the mechanistic interpretation of omics data than previously possible. With a growing interest in metabolism as therapeutic targets [254], the results of our study indicate metabolic network-based approaches can be used to delineate metabolic mechanisms as immunotherapeutic targets. In particular, our study provides the framework for identifying metabolic signatures of macrophage activation that serve as potential therapeutic strategies during infectious states. Interestingly, our previous work on differential metabolism during M. tuberculosis infection in human alveolar macrophages showed that nucleotide synthesis, pentose phosphate pathway, hyaluronan synthase, and Vitamin D3 metabolism were solely active in macrophages with pulmonary and meningeal tuberculosis infections versus latent infections [4]. Several macrophage-produced metabolites that suppressed activation phenotypes, particularly nucleotides and hyaluronan, have also been shown to be important substrates for intracellular pathogen replication and survival [245, 182]. Our current analysis suggests a complementary nature 126 between pathogen auxotrophies and the mitigation of macrophage activation due to the synthesis of metabolites required for pathogen replication. These interactions are indicative of potential virulence mechanisms by intracellular pathogens that enable suppression of macrophage activation and serve as possible therapeutic strategies to treat persistent infectious states. Regulation of immune cells has been traditionally explored from a signaling perspective, but recent studies have suggested that metabolic factors may influence the activities of immune cell responses [255, 256, 223, 257]. In particular, macrophage recruitment and activation has drawn considerable interest from the scientific community for its interfacing role between metabolism and immunity [258, 222]. While the relationship between metabolism and immunity has been historically explored from a nutritional perspective [222], recent studies have demonstrated and characterized immune cell-secreted metabolites to be signaling modulators. For example, macrophage production of both Vitamin D3 and DOPEG have been shown to have potent immunosuppressive effects as a result of paracrine regulatory control to mitigate pro-inflammatory response [241, 246]. While these regulatory mechanisms are well established, our analysis indicates alternative con- tributing factors that may be imposed by the complementing metabolic capabilities of effector immune cells. The ability to recapitulate most known metabolite effectors with our metabolic network analysis strongly suggests an underlying com- plementation in how metabolite signals are biochemically processed (i.e., degraded or synthesized) and its eventual immunomodulatory role (i.e., activating or suppressing). Hence, our study suggests metabolism as a potential remote sensor that mediates the activation status of macrophages through its metabolic capacities. Taken together, our results lend support to the notion that understanding reci- procity between metabolism and its regulatory signaling state may be critical to decoding mechanisms in immunity [222] and other biological systems [259], thus es- tablishing potential application of metabolic systems biology toward the emerging field of immunometabolism. 127

4.5 Materials and methods

4.5.1 Tailoring the global human metabolic network to RAW 264.7

A full workflow for generating the reconstruction is provided in Supplemen- tary information. The NCBI HomoloGene database was used to replace Recon 1 genes with murine Entrez Gene IDs. Expression data for untreated RAW 264.7 cells was obtained [260]. Present and absent calls were made according to a probability distribution approximated for the log2-transformed, averaged values using a Gaussian mixture model [224]. The processed data was integrated with reactions of the mouse metabolic network according to the gene-protein-reaction associations. The biomass function was adopted from Bordbar et al (2010) as the data used to construct it came mostly from mice. Two established algorithms that integrate high-throughput data with metabolic networks [28, 27] were used to construct the cell-specific networks. We also employed a new algorithm that improves upon the original GIMME algorithm through the integration of proteome-based objective functions called Gene Inactivity Moderated by Metabolism and Expression by Proteome (GIMMEp). The GIMME algorithm [28] was used to construct the RAW 264.7 cell line model through an integrated transcripto-proteomic approach. The original algorithm builds context-specific models by utilizing expression data to minimally activate a set of reactions that are necessary to fulfill a single user-defined reaction objective. This can be problematic given that there is no clear single objective in mammalian cells and tissues. To build the RAW 264.7-specific model, the GIM- MEp algorithm integrated published proteomic data of untreated RAW 264.7 cells [252] as objective function cues that are evaluated separately with the original GIMME algorithm in an interative fashion. Hence, separate subnetworks are determined for each proteome-associated reaction objectives that are active on the basis of satisfying each objective function in turn. The subnetworks are then combined to create the finalized GIMMEp-based model. iMat [27] was also used to predict a flux distribution most consistent with the omics data. iMat was used as it 128 is not biased by a particular objective. A macrophage-specific reaction subnetwork was derived from the iMat analysis. The automated draft reconstruction process yielded three algorithmically tailored models (GIMME, iMAT, and GIMMEp) that were compared by their ability to perform the published 288 metabolic functions of Recon 1. Differences in metabolic functional capability were used as a classification system to target manual curation efforts in the literature. The final reconciled model was used in the study.

4.5.2 Functional characterization using ﬂux balance analysis

A genome-scale reconstruction is converted into a mathematical format by representing the reactions and metabolites in a stoichiometric matrix (S). The rows of the matrix represent the metabolites in the network, while the columns represent the reactions. Flux balance analysis (FBA) is used to characterize the system. FBA is a linear programming-based mathematical operation to calculate the maximum ﬂux through a particular reaction objective under mass balance (S·v=0) and thermodynamic/capacity constraints (lb,ub) (Equation 4.1) without the need for kinetic parameters. The objective vector (c) is a zero vector with a value of 1 corresponding to the reaction that is being maximized. An FBA primer is now available [15].

max(cT · v) subject to S · v = 0 (4.1) lb < v < ub

Uptake rates for major carbon sources (glucose, glutamine, pyruvate, and fatty acids) and oxygen were set from murine macrophage literature [193, 194, 195, 160]. Minimal supplementation of essential amino acids and choline of 0.1 mmol/cell gDW/h was required to generate biomass. In addition, sodium, chlorine, bicar- bonate, and ammonia were allowed to enter the network freely. 129

4.5.3 Reaction deletion analysis

Deletion analysis was completed by iteratively removing a reaction from the network and determining the maximum value of the three objective functions associated with M1 and M2 activation. The two objective functions for M2 activation (proline and putrescine generation) were averaged and compared with M1 activation (NO production). Large differences (>10% difference from the original objective value) in reaction deletion for M1 and M2 were analyzed. Within the two M2 activation functions, some reactions had large differences (>10%) and were ignored in this analysis as they are not characteristic of M2 activation but rather specifically of proline or putrescine generation. There were only six such reactions.

4.5.4 Sensitivity and sampling analysis of metabolite exchanges

For each exchanged metabolite and objective function, sensitivity analysis was completed by iteratively fixing the exchange flux at 20 different flux values ranging from the minimum to maximum allowable flux and calculating the maximum objective value. To compare the effects on each of the objectives, we calculated the slope of the sensitivity curves. The sensitivity of metabolites to the objective functions was typically monotonic. However, 8 of the 355 tests were not monotonic (L-cysteine production on ATP, L-glutamate production on NADH, l-cystine uptake on NOS, NH4 uptake on PTRC, and NH4 production on ATP, NADH, NOS, PRO functions) where a change in direction of the function occurs at large non-physiological exchange values (>0.1 mmol/h/g cell DW). For these cases, the sensitivity slope was calculated for the monotonic region closer to an exchange of 0. Though threonine was predicted to be the most effective suppres- sant, the result is due to an artifact of the network analysis and those results were removed. Randomized Monte Carlo sampling of the solution space constrained by metabolite exchanges and objective function value was used to identify reaction activity changes pertaining to the different phenotypes calculated from sensitivity 130 analysis. Sampling was performed for both the first and last points by fixing the exchange flux and the objective, thus alternate solutions were inherently accounted for. The sampling procedure and differential reaction activity detection was done similarly to a previous study [4]. Reactions that were differential in activity across a minimum of three objectives were only considered to filter out objective biased changes.

4.5.5 Quantitative analysis of stimulated RAW 264.7 macrophages

The RAW 264.7 (ATTC) cell line was stimulated for 0, 2, 4, and 24 h with LPS (from Salmonella enterica Serovar Typhimurium). Treated cells were washed twice with Dulbecco’s PBS and harvested for high-throughput analyses. Quanti- tative proteomics was done with accurate mass and time (AMT) tag. After digesting the proteins with trypsin, peptides were analyzed by liquid chromatography- tandem mass spectrometry on an LTQ XL (Thermo Scientific) (to build the mass tag (MT database) or an LTQ Orbitrap XL (Thermo Scientific) (for the quantitative measurements) mass spectrometer. Tandem mass spectra were searched with SEQUEST [261] against mouse IPI database and the reliable identifications were used to build the MT database, which was further matched to the orbitrap data using the VIPER tool [262]. Abundance values were processed and submitted to statistical test using DAnTE [263]. Labeled cDNA was prepared as described [264]. A mixture Cy3-labeled control cDNA and Cy5-labeled were hybridized to Agilent Mouse GE 4 x 44K v2 Microarray (Agilent Technologies) and processed. Image analysis and intra-chip normalization were performed with Feature Extraction 9.5.3.1 (Agilent). Data were analyzed with MeV (tm4.org) or with custom python scripts. For metabolomics, cell suspensions were centrifuged. Ammonium bicar- bonate was added to the cell pellet and metabolites were extracted with a chlo- roform/methanol mixture. Extracted metabolites were successively derivatized by two chemical reagents to enhance stability and volatility for gas chromatography- mass spectrometry (GC-MS) analysis [265]. Samples were analyzed in biological 131 triplicates and technical duplicates in a GC-MS system (Agilent GC 7890A coupled with a single quadrupole MSD 5975C) connected to HP-5 MS column (30 m 0.25 mm 0.25 µm; Agilent). Omics data are available to the larger scientific community. The Sys- BEP.org project website provides links to each of the disseminated materials with the transcriptomics data disseminated via GEO [266] under the accession number GSE35237 and proteomics data via (http://omics.pnl.gov; [267]). More detailed methods on experimental conditions and data generation are provided in Supple- mentary information.

4.6 Acknowledgements

We would like to thank Nathan E Lewis for helping us with the figures. This work was supported by the US National Institute of Allergy and Infectious Diseases agreement Y1-AI-8401-01 and the National Institutes of Health Grant GM068837. Proteomics analyses were performed in the Environmental Molecular Sciences Laboratory, a US DOE BER national scientific user facility at Pacific Northwest National Laboratory using instrumentation developed under support from the US Department of Energy (DOE) Office of Biological and the NIH Na- tional Center for Research Resources (RR018522). Chapter 4 in part is a reprint of the material Bordbar A*, Mo ML*, Nakayasu ES, Schrimpe-Rutledge AC, Kim Y, Metz TO, Jones MB, Frank BC, Smith RD, Peterson SN, Hyduke DR, Adkins JN, Palsson BO. Model drive multi-omics analysis elucidates metabolic immunomodulators of macrophage activation. Mol Syst Biol. (2012) 8:558. The dissertation author was one of the two primary authors. Chapter 5 iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states

5.1 Abstract

Background: The development of high-throughput technologies capable of whole cell measurements of genes, proteins, and metabolites has led to the emer- gence of systems biology. Integrated analysis of the resulting omic data sets has proved to be hard to achieve. Metabolic network reconstructions enable complex relationships amongst molecular components to be represented formally in a biologically relevant manner while respecting physical constraints. In silico models derived from such reconstructions can then be queried or interrogated through mathematical simulations. Proteomic proﬁling studies of the mature human ery-

132 133 throcyte have shown more proteins present related to metabolic function than previously thought; however the significance and the causal consequences of these findings have not been explored. Results: Erythrocyte proteomic data was used to reconstruct the most expansive description of erythrocyte metabolism to date, following extensive manual curation, assessment of the literature, and functional testing. The reconstruction contains 281 enzymes representing functions from glycolysis to cofactor and amino acid metabolism. Such a comprehensive view of erythrocyte metabolism implicates the erythrocyte as a potential biomarker for different diseases as well as a ‘cell- based’ drug-screening tool. The analysis shows that 94 erythrocyte enzymes are implicated in morbid single nucleotide polymorphisms, representing 142 pathologies. In addition, over 230 FDA-approved and experimental pharmaceuticals have enzymatic targets in the erythrocyte. Conclusion: The advancement of proteomic technologies and increased generation of high-throughput proteomic data have created the need for a means to analyze these data in a coherent manner. Network reconstructions provide a systematic means to integrate and analyze proteomic data in a biologically meaning manner. Analysis of the red cell proteome has revealed an unexpected level of complexity in the functional capabilities of human erythrocyte metabolism.

5.2 Background

The advancement of high-throughput data generation has ushered a new era of “omic” sciences. Whole-cell measurements can elucidate the genome sequence (genomics) as well as detect mRNA (transcriptomics), proteins (proteomics), and small metabolites (metabolomics) under a specific condition. Though these methods provide a broad coverage in determining cellular activities, little integrated functional analysis has been performed to date. Genome-scale network reconstructions are a common denominator for computational analysis in systems biology as well as an integrative platform for experimental data analysis [150, 114]. There are several applications of reconstructions 134 including: 1) contextualization of high-throughput data, 2) directing hypothesis- driven discovery, and 3) network property discovery [150]. Network reconstruction involves elucidating all the known biochemical transformations in a particular cell or organism and formally organizing them in a biochemically consistent format [2]. Genome sequencing has allowed genome-scale reconstruction of numerous metabolic networks of prokaryotes and eukaryotes [41, 268, 269]. In fact, a genome- scale reconstruction of human metabolism has been completed, called Recon 1. Re- con 1 is a global human knowledge-base of biochemical transformations in humans that is not cell or tissue specific [79]. More recently, Recon 1 has been adapted to study specific cells and tissues with the help of high-throughput data, including the human brain [5], liver [270, 94], kidney [224], and alveolar macrophage [4]. Though many cell and tissue-specific models have been reconstructed from Recon 1, the human erythrocyte has undeservedly received less attention as the cell has been largely assumed to be simple. Historically, red cell metabolic models began with simple glycolytic models [271, 272]. In a fifteen year period, the original mathematical model was updated to include the pentose phosphate pathway, Rapoport-Luebering shunt, and adenine nucleotide salvage pathways [273, 274, 275, 276]. More recent metabolic models have been built accounting for additional regulatory [277] and metabolic components (e.g. glutathione [278]). However, in the past decade, attempts to obtain comprehensive proteomic coverage of the red cell have demonstrated a much richer complement of metabolism than previously anticipated [279, 280, 281, 282]. Modeling the unexpected complexity of erythrocyte metabolism is critical to further understanding the red cell and its interactions with other human cells and tissues. Thus, we use available proteomics to develop the largest (in terms of biochemical scope) in silico model of metabolism of the human red cell to date. Though comprehensive proteomic data provides an overview of red cell proteins, we believe it does not provide a full functional assessment of erythrocyte metabolism. Thus, we have also gathered 50+ years of erythrocyte experimental studies in the form of 60+ peer-reviewed articles and books to manually curate the final model. In order to objectively test physiological functionality, we have put the final model 135 through rigorous simulation.

5.3 Results and Discussion

iAB-RBC-283 is a proteomic based metabolic reconstruction and a biochemical knowledge-base, a functional integration of high-throughput biological data and existing experimentally veriﬁed biochemical erythrocyte knowledge that can be queried through simulations and calculations. We ﬁrst describe the process and characterize the new erythrocyte reconstruction and determine the metabolic functionality. Then, we analyze the results by mapping genetic polymorphisms and drug target information onto the network.

5.3.1 Proteomic based erythrocyte reconstruction

Proteomic data has been successfully used for reconstructions of Thermo- toga maritima [97] and the human mitochondria [283] and provides direct evidence of a cell’s ability to carry out specific enzymatic reactions. One challenge in the measurement of proteomic data is the depth of coverage, which is still known to be incomplete, even for studies aiming to obtain comprehensive coverage. Another challenge is the possible contamination in the extract preparation with other blood cells [279]. In addition, leftover enzymes from immature erythroid cells are possibly retained in mature red blood cells. With multiple comprehensive proteomic studies carried out in the last decade, the coverage for the red cell has improved significantly but gaps and inaccurate data still plague proteomic studies of the erythrocyte. Thus, in this study, we construct a full bottom-up reconstruction of erythrocyte metabolism with rigorous manual curation in which reactions inferred from proteomically detected enzymes were cross-referenced with existing experimental studies and metabolomic data as part of the quality control measures to validate and gap fill metabolic pathways and reactions (Fig. 5.1). The final reconstruction, iAB-RBC-283, contains a metabolic network that is much more expansive than red blood cell models presented to date (Fig. 5.2). 136

The reconstruction contains 292 intracellular reactions, 77 transporters, 267 unique small metabolites, and accounts for 283 genes, suggesting that the erythrocyte has a more varied and expansive metabolic role than previously recognized. A full bottom-up reconstruction of the human erythrocyte provides a functional interpretation of proteomic data that is biochemically meaningful. Manual curation provides experimental validation of metabolic pathways, as well as gap ﬁlling. The data can be rigorously and objectively analyzed through in silico simulation. 137

A B Raw Erythrocyte Proteomic Data Global Human Recon 1

Remove non-RBC ORFs based on proteomic studies

Omic Database curation

Protein/Metabolite Curated Metabolic Reconstruction literature curation

Uptake/secretion literature curation

iAB-RBC-283

Figure 5.1: Workflow for building a comprehensive in silico erythrocyte metabolic network. (A) The three major data types required are: the human genome sequence, high-throughput data (specifically, proteomics for an enucleated cell), and primary literature. The global human metabolic network, Recon 1, was constructed from the human genome sequence and annotation. To build the erythrocyte network, iAB-RBC-283, proteomics was used to remove non-erythrocyte related open reading frames (ORFs) or genes. Detailed curation utilizing protein, metabolite, and transport experimental literature was needed to build a high- quality metabolic reconstruction. (B) Without network reconstruction and rigorous curation, the experimentally generated proteomic data is raw and difficult to interpret. The process detailed in panel A provides a means to build a meaningful knowledge-base of available high-throughput data that can then be probed and tested. 138

Figure 5.2: Topological map of the human erythrocyte metabolic network (iAB-RBC-283). Utilizing proteomic data, a much expanded metabolic network was reconstructed accounting for additional carbohydrate and nucleotide metabolism. In addition, the erythrocyte plays roles in amino acid, cofactor, and lipid metabolism. Abbreviations: PPP - pentose phosphate pathway, Arg - arginine, Met - methionine. 139

5.3.2 Functional assessment of iAB-RBC-283

In order to ascertain the functional capabilities of the expanded erythrocyte reconstruction, iAB-RBC-283 was converted into a mathematical model. The expanded erythrocyte network was topologically and functionally compared to a previous constraint-based model of erythrocyte metabolism [284] (see Supporting Material). Predictions made by this model could be recapitulated by iAB-RBC- 283. To determine new functionalities of the expanded erythrocyte network, the system is assumed to be at a homeostatic state and qualitative capacity/capability simulations are done to ascertain which reactions and pathways can be potentially active in the in silico erythrocyte. Flux variability analysis (FVA) was utilized to determine the functional metabolic pathways of the erythrocyte network (see Materials and Methods). FVA determines the minimum and maximum allowable flux through each metabolic reaction [30]. In short, the FVA method defines the bounding box on network capabilities. Reactions that had a non-zero minimum or maximum flux value were deemed to be functional. Network level metabolic functional assessment showed that iAB-RBC-283 accounts for additional pathways into glycolysis through galactose, fructose, man- nose, glucosamine, and amino sugars (N-Acetylneuraminate). Galactose can also be shuttled to the pentose phosphate pathway through glucuronate interconver- sions. Citric acid cycle enzymes (fumarase, isocitrate dehydrogenase, and malate dehydrogenase) are present, but we were unable to fully understand their roles as full metabolic pathways were not present. However, fumarate can be shuttled into the model and exported as pyruvate through conversions by fumarase and malic enzyme. In addition, nucleotide metabolism and salvage pathways have been expanded from previous metabolic models to account for XMP, UMP, GMP, cAMP, and cGMP metabolism. In particular, ATP and GTP can be converted into cAMP and cGMP respectively as adenylate and guanylate cyclases were found to be present in the proteomic data. The erythrocyte uses amino acids to produce glutathione for redox balancing, converts arginine to polyamines and a byproduct of urea, and utilizes 140 homocysteine for methylation. It has been proposed that Band III is the major methylated protein, particularly for timing cell death [285]. This has also been included in iAB-RBC-283. In addition, polyamine metabolism produces 5- methylthioadenosine which can be salvaged for methionine recycling. Another major expansion in metabolic capabilities represented in iAB- RBC-283 is lipid metabolism. Though the mature erythrocyte is unable to produce or degrade fatty acids, the cell can uptake fatty acids from the blood plasma to produce and incorporate diacylglyercol into phospholipids for upkeep of its membrane [286]. A pseudo-carnitine shuttle in the cytosol is used to create a buffer of CoA for the cell [287]. All major erythrocyte fatty acids (C16:0, C18:1, C18:2) and phospholipids (phosphatidylcholine, phosphatidylethanolamine, phosphatidylinositol) are explicitly modeled. The phosphatidylinositols can be converted into various forms of myo-inositols, which play an extensive role in cell signaling [288]. Finally, FVA showed that the erythrocyte plays an important role in cofactor metabolism. The reconstruction accounts for uptake, modification, and secretion of multiple cofactors including vitamin B6, vitamin C, riboflavin, thiamine, heme, and NAD. In addition, human erythrocytes play a role in deactivating catecholamines [289], hydrolyzing leukotriene [290], and detoxifying acetaldehyde [291] which were confirmed in the literature.

5.3.3 Metabolite connectivity

In order to compare the network structure of the in silico erythrocyte versus other similar metabolic networks, we calculated the connectivity of each metabolite [292]. Simply, the connectivity is the number of reactions that a metabolite participates in. As a metabolite can be defined as a node in a network structure, the biochemical reactions associated with a particular metabolite are the edges of the network. Metabolite connectivity thus involves determining the number of edges (reactions) connected to every node (metabolite). We compared the in silico erythrocyte to the global human metabolic network, Recon 1, as well as the separate human organelles (Fig. 5.3). A dotted line, linking the minimum and maximum connectivities, is drawn on the distributions 141 as a reference for comparing the distributions. A network with higher connectivity would have many points above the dotted line, while lower connectivity would result in points below the dotted line. Recon 1 has most points above the reference line (Fig. 5.3, first panel). In fact, the metabolite node connectivities for genome- scale reconstructions usually have a distribution similar to Recon 1, highlighting the complexity of these networks [115]. However, the organelles in Recon 1 usually have values below the reference line, due mainly to a higher difficulty to annotate reactions specific to an organelle. The separate organelle metabolic networks are less connected. The exception is the mitochondria (Fig. 5.3, last panel), which is well studied and has a very important and complex metabolic role. The metabolite connectivity of the erythrocyte network is very similar to the less connected organelles in Recon 1. This similarity is due to either 1) the erythrocyte biology or 2) the lack of complete proteomic profiling. First, as the erythrocyte circulates in blood, it has access to many types of metabolites. As the erythrocyte is relatively simple, it is possible that the reconstructed metabolic network is complete as different types of metabolites do not have to be created and can easily be transported into or out of the cell (e.g. amino and fatty acids). However, the network simplicity of the in silico erythrocyte model may also be attributed to the limitations of proteomic techniques. As coverage of proteomic data is typically not as complete as transcriptomics, the reconstructed metabolic network may reflect this limitation. Thus, deeper proteomic profiling could be of great use to further elucidate the role of the erythrocyte in systemic metabolism and the complexity of its own metabolic network. 142

Recon 1 iAB-RBC-283 3 3 10 10

2 2 10 10

1 1 10 10

0 0 10 10 0 2 4 0 1 2 3 10 10 10 10 10 10 10

Endoplasmic Reticulum Golgi Apparatus 3 3 10 10

2 2 10 10

1 1 10 10

0 0 10 10

Number of reactions of Number 0 1 2 3 0 1 2 3 10 10 10 10 10 10 10 10

Lysosome Mitochondria 3 3 10 10

2 2 10 10

1 1 10 10

0 0 10 10 0 1 2 3 0 1 2 3 10 10 10 10 10 10 10 10 Metabolite number

Figure 5.3: The metabolite connectivity of iAB-RBC-283, Recon 1, and the similarly sized organelles in Recon 1. Recon 1 and the mitochondria have high network connectivity, with many data points above the reference lines. The erythrocyte network and other organelles have less metabolic connectivity denoted by data point being on or below the reference lines. This characteristic can be attributed to either 1) an inherently ‘fragmented’ erythrocyte network, or 2) incomplete proteomic coverage. 143

In silico simulations show a greater physiological functionality of erythrocyte metabolism. The additional functionality is not evident from the proteomic data alone. However, metabolite connectivity analysis shows that additional targeted proteomics are of interest. During the manual curation steps we noted that portions of the TCA cycle were detected but not functional. Further studies for cysteine, folate, and phospholipid metabolism are also of interest as some enzymes were detected in the proteomic data but little or no conclusive experimental literature was found.

5.3.4 The human erythrocyte’s potential as a biomarker

Decades of models have described erythrocyte metabolism to include prin- cipally glycolysis, the Rapoport-Luebering shunt, the pentose phosphate pathway, and nucleotide salvage pathways. Integration and compilation of proteomic data, however, has surprisingly shown evidence of a much richer metabolic role for the erythrocyte. Erythrocytes make contact with most portions of the body and are one of the most abundant cells (about 2 liters in volume in a typical adult). With such a varied metabolic capacity, the erythrocyte can act as a sink for and source of metabolites throughout the body. Erythrocytes have been previously studied as potential biomarkers for riboflavin deficiency [293], thiamine deficiency [294], alcoholism [295], diabetes [296], and schizophrenia [297], however comprehensive systems level analyses have not been performed to date. iAB-RBC-283 explicitly accounts for the genetic basis of the enzymes and transporters that it represents. To determine the capability of the in silico erythrocyte as a potential biomarker, we cross-referenced morbid SNPs from the OMIM and nearly 4800 drugs from the DrugBank database with the 281 enzymes accounted for in iAB-RBC-283. 142 morbid SNPs were found in 90 of the erythrocyte enzymes. In addition, 232 FDA-approved, FDA-withdrawn, and experimental drugs have known protein targets in the human erythrocyte (Fig. 5.4). 144

A Erythrocyte enzymes with known morbid SNPs 13 31 Causal SNPs, Metabolic Correlated SNPs

non-RBC related pathology

15 RBC related pathology 59

17 Causal SNPs, non-Metabolic 7

B Drugs with known targets in erythrocyte 4

85 Approved Experimental Withdrawn

143

Figure 5.4: To test the potential use of erythrocytes as biomarkers, we identiﬁed the known morbid SNPs and drug targets of erythrocyte enzymes account for in the reconstructed network. 142 morbid SNPs were identiﬁed with the majority dealing with non-erythrocyte related pathologies (see Table 5.1). In addition, over 230 FDA-approved, FDA-withdrawn, and experimental drugs have known protein targets in the human erythrocyte. 145

Only 35 of the 142 morbid SNPs are related to pathologies unique to the erythrocyte, mainly dealing with hemolytic anemia. The majority of the morbid SNPs deal with pathologies that are peripheral to the red blood cell and to the blood system. The remaining non-erythrocyte related pathologies are classified in Table 1 using the Merck Manual [298]. Most of the observed SNPs are causal and simple targeted assays could be used as diagnostic tools. We also cross-referenced the enzymes in iAB-RBC-283 with the DrugBank’s known protein targets of pharmaceuticals. 85 FDA-approved, 4 FDA-withdrawn, and 143 experimental drugs have known targets in the human erythrocyte. These medications target enzymes for the treatment of a wide range of diseases including seizures (topimarate), allergies (flunisolide), cancer (topotecan), HIV (saquinavir), and high cholesterol (gemfibrozil). Due to the availability of erythrocytes from any individual, drugs can be easily screened and optimized in vitro for individual patients where the effect of the drug is known to occur in the erythrocyte. A comprehensive listing of all observed morbid SNPs and drugs are provided in the Supplementary Material. 146

Table 5.1: Morbid SNPs with non-erythrocyte related pathologies Causal Causal Correlated SNPs, SNPs, SNPs Metabolic Non- Metabolic Pediatric Disorders 28 6 4 Endocrine and Metabolic 10 2 8 Disorders Neurologic Disorders 5 2 1 Genitourinary Disorders 4 2 1 Eye Disorders 1 3 2 Musculoskeletal and Con- 3 - 1 nective Tissue Disorders None 1 - 3 Gastrointestinal Disorders 1 - 2 Hepatic and Billary Disor- 3 - - ders Hematology and Oncology - - 2 Disorders Nutritional Disorders - - 2 Psychiatric Disorders - - 2 Cardiovascular Disorders 1 - - Dermatologic Disorders - 1 - Ear, Nose, Throat, and - 1 - Dental Disorders Immunology, Allergic Dis- 1 - - orders Injuries, Poisoning 1 - - Pulmonary Disorders - - 1 147

5.3.5 Utilizing iAB-RBC-283 to develop biomarker studies

An important application of metabolic reconstructions and the resulting mathematical models is to predict and compare normal and perturbed physiology. We used iAB-RBC-283 to simulate not only normal conditions to study the capacity of erythrocyte function, but also the detected morbid SNPs and drug treated conditions for drugs with known erythrocyte enzyme targets. Flux variability analysis was used to characterize the exchange reactions of the network for determining a metabolic signature in the erythrocyte for the associated perturbed conditions. We compared the minimum and maximum fluxes through each reaction under normal conditions versus all perturbed conditions and determined differential reaction activity (see Methods and 5.5a). Activated or suppressed flux from in silico simulations provides a qualitative understanding into which metabolites and reactions are perturbed, allowing for experimental followup. We were able to confidently detect in silico metabolic flux changes in at least one exchange reaction for 75% of the morbid SNPs and 70% of the drug treated conditions (Fig. 5.5b). On average, there were 12.6 and 9.9 differential activities of exchange reactions for morbid SNPs and drug treated conditions respectively. The average is skewed by some morbid SNPs and drug treated conditions that have over 45 affected exchange reactions, as most differences are detected in between one and ten exchange reactions (Fig. 5.5b). The morbid SNPs with greater than 45 exchange reactions with differential activity deal mostly with glycolytic and transport enzymes that are known to cause anemias and spherocytosis (see Supplementary Material). In addition, most of the drug treated conditions with high numbers of differential exchange activity correspond to drugs that have enzyme targets for the same morbid SNPs that are known to cause hemolytic diseases. 148

Figure 5.5: Flux variability of the exchange reactions can be used to detect metabolic signatures of simulated morbid SNPs and drug treated erythrocytes. (A) Exchange reactions are artificial reactions that allow the mathematical model to uptake and secrete metabolites into the extracellular space. Uptake of a metabolite into the erythrocyte is expressed as a negative value and secretion is expressed as a positive value. There are four major differences that can occur for an exchange reaction in two different states: i) the reaction is either active (non-zero minimum or maximum flux) or inactive (zero minimum and maximum flux), ii) the exchange becomes fixed in one direction (uptake or secretion only), iii) there is a magnitude change in exchange, iv) the reaction is unaffected and is the same for both states. (B) We detected 71% and 62% of the morbid SNPs and drugs associated with the erythrocyte. The distribution shows that most perturbed conditions have between one to ten differentially active exchange reactions. 149

A Exchange Fluxes i) Active/Inactive

State A ii) Fixed Direction State B

iii) Magnitude Change

iv) Una!ected

Uptake / - 0 Secretion / + B

20 20 18 18 16 16 14 14 12 12 10 10 8 8

6 Number of Drugs 6

Number of morbid SNPs 4 4 2 2 0 0 015304560 015304560 Number of perturbed exchanges Number of perturbed exchanges 150

An important metabolic enzyme found in erythrocytes is catechol-o- methyltransferase (COMT) used to methylate cathecolamines. A morbid SNP of COMT gene has been implicated in susceptibility to schizophrenia [299]. In silico simulations show that the associated morbid SNP erythrocyte has lowered uptake of dopamine and norepinephrine and lowered secretion of the methylated counterparts. Though COMT has not been shown to be causal for schizophrenia, the morbid SNP may have an effect on the phenotype. Qualitative in silico simulation of this effect can help focus experimental design on these metabolic pathways in erythrocyte screening. In silico simulations show that the erythrocyte can also be used as a diagnostic for drug treated conditions. For example, topimarate is a drug for treating seizures and migraines and inhibits carbonic anhydrase. Inhibition of this enzyme during simulations showed a drastic change in 54 exchange reaction fluxes pertaining to carbohydrate, cofactor (thiamine, pyridoxine, bilirubin), lipid, and nucleotide metabolism. As individuals react differently to pharmaceuticals and sometimes require different dosages and types of drugs, our analysis shows that the red blood cell can act as a readily available diagnostic for personalizing drug therapies. To further investigate the diagnostic capability of the red blood cell, we assessed the uniqueness of the metabolic signatures detected. We compared all the metabolite signatures to see if some were shared between different SNPs or drug treatments. In all, 67% of the metabolic signatures are unique with most of the remaining similar to only one other perturbed condition (Table 5.2). 151

Table 5.2: Uniqueness of metabolic signatures # of conditions sharing Morbid Drug same met. signature SNPs Treated 1 (Unique) 19 19 2 8 3 3 3 1 4 1 1 5 0 1 7 1 0 152

The in silico simulation results provide a method to focus biomarker discovery experiments in the human erythrocyte, as well as interpret global metabolomic profiling. The flux variability shows that a large number of morbid SNPs and drug effects can be detected in the erythrocyte, with most having a unique metabolic signature. The differential activity in exchanges for the perturbed conditions allow for focusing experiments to particular metabolites, exchanges, and associated pathways, allowing development of targeted assays. In addition, global metabolomic profiling of perturbed conditions can be interpreted using the calculated metabolic signatures and the erythrocyte reconstruction. A full listing of all detected morbid SNPs and drug treated conditions, as well as the corresponding exchange reactions with differential activity and fluxes is provided in the Supplementary Material.

5.4 Conclusion

The mature, enucleated erythrocyte is the best studied human cell for metabolism due to its relative simplicity and availability. Still, the view of its metabolism is rather limited. The advances in high-throughput proteomics of the erythrocyte has enabled construction of a comprehensive in silico red blood cell metabolic reconstruction, iAB-RBC-283. Proteomic data alone is not adequate for generating an accurate, complete, and functional model. The reconstruction was rigorously curated and validated by experimental literature sources as proteomic samples are known to be incomplete, contaminated with other types of blood cells and inactive enzymes are passed down the erythrocyte diﬀerentiation lineage. Thus, iAB-RBC-283 is a knowledge-base of integrated high-throughput and biological data, which can also be queried through simulations. Functional testing showed that the new reconstruction takes into account historically neglected areas of carbohydrate, amino acid, cofactor, and lipid metabolism. Traditionally, the erythrocyte is known for its role in oxygen de- livery, but the varied metabolism the cell exhibits points towards a much more expanded metabolic role as the cell can act as a sink or source of metabolites, through interactions with all organs and tissues in the body. 153

Metabolite connectivity analysis showed that the erythrocyte metabolic network is relatively simple and is similar to human organelles in network structure. This could be due either to shortcomings of the high-throughput data or the relatively simple metabolism of red cells. From our manual curation steps, targeted proteomic studies would be useful for a few metabolic pathways: including TCA cycle, cysteine, folate, and phospholipid metabolism. A metabolically rich and readily available erythrocyte can be useful for clinical biomarkers. To determine potential uses, we cross-referenced the enzymes in iAB-RBC-283 with known morbid SNPs and enzymes that are reported drug targets in DrugBank. There are 142 morbid SNPs detectable in erythrocyte enzymes with the majority dealing with non-erythrocyte related pathologies. In addition, over 230 pharmaceuticals in the DrugBank have known protein targets in the human erythrocyte. Utilizing iAB-RBC-283, we qualitatively detected metabolic signatures for the majority of in silico perturbed conditions pertaining to the morbid SNPs and drugs from DrugBank. The aﬀected exchange reactions, metabolites, and associated pathways can be used to focus experiments for biomarker discovery as well as interpret global metabolomic proﬁles. Taken together, with available proteomic data, a comprehensive constraint- based model of erythrocyte metabolism was developed. Genome-scale metabolic reconstructions have been shown to be an important tool for integrating and analyzing high-throughput data for biological insight [5, 4, 300, 301]. In this study, we show that the comprehensive metabolic network of the erythrocyte plays an unan- ticipated, varied metabolic role in human physiology and thus has much potential as a biomarker with clinical applications. As erythrocytes are readily available, the proteomics and metabolomics of normal and pathological states of individuals can be easily obtained and used for identifying biomarkers in a systems context. 154

5.5 Methods

5.5.1 Reconstructing the comprehensive erythrocyte network

Metabolic network reconstruction has matured into a methodological, systematic process with quality control and quality assurance steps that can be carried out according to standardized detailed protocols [2]. The sequencing of the human genome enabled a comprehensive, global reconstruction for all human cell types to be carried out, with the caveat that in order to implement any context, condition, or cell specific analysis, one would need specific data for the particular human cell or tissue of interest. Metabolic reconstructions contain all known metabolic reactions of a particular system. The reactions are charge and elementally balanced, with gene- protein-reaction (GPR) annotations. GPRs mechanistically connect the genome sequence with the proteome and the enzymatic reactions. GPRs provide a platform for integration of high-throughput data to model specific conditions. In this study, proteomic data was integrated with Recon 1 to build a comprehensive erythrocyte network. iAB-RBC-283 was reconstructed in the following manner. Proteomic data for erythrocytes from multiple sources [279, 280, 281, 282] were consolidated and cross-referenced with Recon 1. The proteomic data was provided in the IPI format and was converted to Entrez Gene Ids. The Entrez Ids were linked with the Recon 1 transcripts, including alternatively spliced variants, and used to generate a list of potential erythrocyte reactions. Reactions and pathways from Recon 1 that were present in the proteomic data were used to build an automated draft reconstruction. However, blood cell contamination and remnant enzymes from immature erythroid cells decrease the accuracy of algorithmically derived models based on high-throughput data. In order to build the most complete and accurate final model, the draft reconstruction was rigorously and iteratively manually curated. In brief, manual curation involves (1) resolving all metabolite, reaction, and enzyme promiscuity, (2) exploring existing experimental literature to deter- 155 mine whether or not detected enzymes in the proteomics data are correct, (3) determining and filling gaps in the proteomic data through topological gap analysis and flux-based functional tests. Steps 2 and 3 are an iterative process where new biochemical data increases the scope of the network, requiring additional gap and functional analysis as well as additional literature mining. Thus, possible remnant enzymes, such as glycogen synthase and aspartate aminotransferase, were properly removed when experimental validation was not available. The manual curation process yielded 60+ peer reviewed articles and books representing over 50 years of erythrocyte research. In addition, erythrocyte literature sources from the Human Metabolome Database [302] were used for validating existence of unique metabolites. A full listing of literature sources for each reaction, enzyme, and metabolite is provided in the Supplementary Material. Fatty acid chains were not represented as generic R-groups as in Recon 1, but instead the three most common (by percent mass composition) fatty acids, palmitic, linolenic, and linoleic (C16:0, C18:1, C18:2, respectively) were used [303]. The final reconstruction is termed iAB-RBC-283 for i (in silico), AB (the primary author’s initials), RBC (red blood cell), 283 (number of open reading frames accounted for in the network). iAB-RBC-283 consists of all the known metabolites, reactions, thermodynamic directionality, and genetic information that the detected erythrocyte metabolic enzymes catalyze. The final reconstruction is provided in both an XLS and SBML format in the Supplementary Material and can also be found at the BioModels Database (id: MODEL1106080000).

5.5.2 Constraint-based modeling and functional testing

The network reconstruction can be represented as a stoichiometric matrix, S, that is formed from the stoichiometric coefficients of the biochemical transformations. Each column of the matrix represents a particular elementally and charge balanced reaction in the network, while each row corresponds to a particular metabolite [115]. Thus, the stoichiometric matrix converts the individual fluxes 156 into network based time derivatives of the concentrations (Equation 5.1). dx = S · v (5.1) dt As each reaction is comprised of only a few metabolites, but there are many metabolites in a network, each flux vector is quite sparse. The stoichiometric matrix is sparse, with 1.3% non-zero elements, and has dimensions (m n); where m is the number of compounds and n is the number of reactions. In the case of iAB-RBC-283, m is 342 and n is 469. As in vivo kinetic parameters are difficult to obtain, the system is assumed to be at a homeostatic state (Equation 5.2),

S · v = 0 (5.2) allowing for simulation without kinetic parameters. The network ﬂuxes (v) are bounded by thermodynamic constraints that limit the directionality of irreversible catalytic mechanisms (lb = 0 for irreversible reactions) as well as known vmax’s.

lb < v < ub (5.3)

Thus, the network is studied under mass conservation and thermodynamic constraints. In addition, constraints are placed on fluxes that exchange metabolites with the surrounding system (the blood plasma in this case), based on existing literature of metabolite transport in the human erythrocyte (see Supplementary Material). These reactions are called exchange reactions and control the flow of metabolites into and out of the in silico cell. Flux balance analysis (FBA) is a well-established optimization procedure [115] used to determine the maximum possible flux through a particular reaction in the network based on the constraints on the network (Equations 5.2 and 5.3) without the need for kinetic parameters. A primer for using FBA and related tools is detailed by Orth et al. [15]. Publicly available software packages exist [158]. In this work, a variant of FBA, called flux variability analysis (FVA) [30], is used. FVA iteratively calculates both the maximum and minimum allowable flux through every reaction in the network. Reactions with a calculated non-zero 157 maximum or minimum have the potential to be active and have a potential physiological function. Thus, we use FVA to determine the capability/capacity of the network reactions to determine metabolic functionality. For a reaction to have a non-zero flux, the reaction must be linked to other metabolic reactions and pathways and plays a functional role in the system. Thus, potentially active reactions are deemed as functional. After determining which reactions were functional, the reaction list was perused to determine pathway and subsystem functionality in the network.

5.5.3 Calculating metabolite connectivity

The stoichiometric matrices of the reconstructed erythrocyte network, Re- con 1, and its constituent organelles were used to calculate metabolite connectivities [292] of every species in each network. The number of reactions each metabolite participates in was summed. For the organelle calculations, only the metabolites corresponding to the particular organelle of Recon 1 were considered. The metabolite connectivity of each organelle as well as Recon 1 and iAB-RBC-283 was ranked order from greatest to least connected to form a discrete distribution (Fig. 5.3).

5.5.4 Analyzing iAB-RBC-283 as a functional biomarker

The Morbid Map from the Online Mendelian Inheritance in Man (OMIM) [304] and the DrugBank [305] were downloaded from their respective databases (accession date: 27/09/10). The enzyme names in iAB-RBC-283 were cross-referenced against the database entries to determine morbid SNPs in erythrocyte proteins and drugs with protein targets in the erythrocyte. The morbid SNPs that did not have sole pathological effects in the erythrocyte were classified using the Merck Manual [298]. Just as FVA can be used to assess the function of a network under a particular set of constraints, it can also be used to assess the changes in function and thus has applications for characterizing disease states [306] and identifying biomarkers [307]. When simulating a morbid SNP or a drug inhibited enzyme, 158 the lower and upper bound constraints on the affected reaction is set to zero as per Shlomi et al. FVA is then used to characterize the exchange reactions under morbid SNP or drug treated conditions (Fig. 5.5a) and then compared to the normal state. A reaction was considered to be confidently altered if the change in the minimum or maximum flux was 40% of the total flux span. The flux span is defined as the absolute difference between the original (unperturbed) maximum and minimum fluxes. Potential thresholds from 5 - 60% were tested. Thresholds in the 15 - 40% range were very consistent while thresholds above 40% had a dropoff (see Supplementary Material).

5.6 Acknowledgements

We would like to thank Nathan Lewis and Daniel Zielinski for insightful conversations and suggestions for the manuscript. This work was supported by National Institutes of Health Grant GM068837. Chapter 5 in part is a reprint of the material Bordbar A, Jamshidi N, Palsson BO. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol. (2011) 5:110. The dissertation author was the primary author. Chapter 6

Multi-omic data-driven construction of personalized kinetic models of metabolism

6.1 Abstract

Genome-wide association studies are the primary method of identifying genetic bases of phenotypic variation. However, subtle effects are often not statistically detected and may potentially be elucidated through mechanistic modeling that integrates high-throughput measurements. In this study, we present a scalable workflow to construct data-driven, personalized kinetic models of cellular metabolism. The approach is applied to characterize erythrocyte metabolic function of 24 healthy individuals using plasma and erythrocyte metabolomics and whole-genome genotyping, leading to three key findings. First, personalized kinetic rate constants are more representative of the genotype than are metabolite levels. Second, personalized differences in metabolic dynamics coincide with time-scales of blood circulation. Third, drug perturbation simulations suggest a mechanism of how ITPA deficiency protects patients against ribavirin-induced hemolytic anemia. This study demonstrates the feasibility and utility of personalized kinetic models and we anticipate that their use will accelerate discoveries in the genetic basis of

159 160 human metabolic variation.

6.2 Introduction

High-throughput technologies for measuring biomolecules have made great strides in characterizing human variation and its associated genetic basis [308, 309, 310]. The main approach to do so has been genome-wide association studies (GWAS). GWAS have been successful in determining genetic factors for organ- ismal traits [311, 312], molecular traits [313, 314, 315], and drug susceptibilities [309, 316]. For metabolism, GWAS are applied to understand the genetic basis of variation in plasma metabolites, called metabolite quantitative trait loci (mQTL) [313, 317, 318]. In spite of these notable advances, the promised revolution in medicine with each new omic data type has not been achieved. Physiology and physiological responses are complex and multi-factorial, so for the gross majority of diseases, a single data type will not be sufficient for diagnosing, prognosing, and determining appropriate treatment. There is a need to integrate large, complex data sets in a biologically coherent fashion in order to understand measured human variation in mechanistic terms. Constraint-based modeling approaches [15] have been shown to provide deeper understanding of metabolism than high-throughput data alone [319] and initial studies of interpreting individual variation using these models are encouraging [320, 321, 307, 322]. However, these models can only capture full loss-of-function mutations, such as Mendelian disorders. The majority of human metabolic variation is subtler, affecting metabolite and enzyme levels, as well as enzyme activity. Kinetic models are a more suitable mechanistic modeling approach as they can capture metabolite and enzyme properties. Further, there is already evidence that kinetic models and dynamic simulations can be used to interpret genetic variation and identify causal single nucleotide polymorphisms (SNPs) in a mechanistic fashion [323, 324]. In this study, we present a scalable, multi-omic data-driven workflow to construct personalized kinetic models of cellular metabolism. As a pilot study, the 161 approach is applied to the erythrocytes of 24 healthy individuals using measured plasma and intracellular erythrocyte metabolomics. We find that: 1) the kinetic rate constants are more representative of the underlying genotype than the metabolite levels, and 2) the majority of dynamic differences occur on time scales that coincide with blood circulation. Finally, personalized variability to drug perturbation simulations elucidated a potential mechanism for ribavirin-induced hemolytic anemia and the protective effect of ITPA deficiency.

6.3 Results

6.3.1 Personalized kinetic modeling workﬂow

Traditionally, kinetic models of biochemical reaction networks are constructed through a biophysical approach using rate laws that require kinetic parameters (Michaelis constant, Km; turnover number, kcat; and enzyme concentration,

E0) derived from in vitro enzymatic assays. The need and difficulty for obtaining kinetic parameters and their appropriateness for in vivo situations [325] limits the scaling of traditional kinetic models to large networks, let alone applications to a large number of individuals. An alternative to traditional kinetic models is the Mass Action Stoichiometric Simulation (MASS) approach [326] (Fig. 6.1a). MASS models are an approximation of the cellular dynamics in an observable homeostatic state, and they are readily scalable to the scope of available metabolomics data and reconstructed metabolic networks. Workflows, advantages/disadvantages, and examples of MASS models have been previously discussed [326, 327, 328]. Briefly, a MASS model is comprised of the network topology, derived from a metabolic network reconstruction [2, 1], which is integrated with data types representing the steady state of the system (Fig. 6.1a). The enzyme kinetics of the metabolic reaction is approximated by one lumped term, the pseudo-elementary rate constant (PERC). The PERC is a function of the traditional kinetic parameters. A comprehensive mathematical comparison of MASS and rate law kinetic models is presented in the supplementary material as well as analyses showing that individual variability is the overriding factor of dynamic simulations, not modeling 162 approach (e.g. MASS models vs enzymatic rate laws). 163

Figure 6.1: The Mass Action Stoichiometric Simulation (MASS) approach for building high-throughput personalized kinetic models. (A) MASS models are an approximation of the biochemical reaction kinetics but require only omic data types for parameterization. A metabolic network reconstruction and its computed fluxes are used for the network topology (S) and reaction flux states (v). Equilibrium constants (Keq) are selected from experimental measurements, where available, and group contribution theory methods. Pseudo- elementary rate constants (PERCs) are then calculated. PERCs are bulk parameters representing the traditional kinetic parameters (Michaelis constant, Km; turnover number, kcat; enzyme concentration, E0). (B) First, a baseline kinetic model for the human red cell was constructed. The erythrocyte reconstruction (iAB-RBC-283) was used for the network topology. A baseline flux state was calculated using Markov Chain Monte Carlo (MCMC) sampling. Experimentally measured Keqs were obtained from the NIST database and literature, while the remaining Keqs were estimated using the eQuilibrator. Normal metabolite levels were obtained from the literature. The computationally determined Keqs were minimally adjusted for linear stability of the kinetic model. (C) The baseline model was tailored to 24 individuals’ plasma and erythrocyte concentrations determined by LC/MS. An algorithm was used to minimally adjust the concentrations to ensure thermodynamic consistency. The measured metabolite concentrations were minimally adjusted for linear stability of the kinetic models. As flux states were not measured and vary between individuals, 300 models for each individual were built with different flux states from MCMC sampling. The ensemble models were filtered for linear stability. 164 165

Figure 6.2: Overview of the human erythrocyte kinetic model and metabolite measurements. (A) iAB-RBC-283 was pruned to pathways with metabolites that were measured by LC/MS. The final model contained 55 transport and 87 intracellular reactions encompassing subsystems such as glycolysis, pentose phosphate, AMP/GMP metabolism, glutathione synthesis, and homocysteine metabolism. The model contains 169 plasma and intracellular moieties (145 of which are metabolites). 76 of the metabolites are measured for each individual. The transport and intracellular reactions are catalyzed by metabolic enzymes, which are encoded by 158 genes. The Illumina BeadChip has 1320 SNP probes in regions of 86 of the 158 genes. (B) Histogram of significantly different erythrocyte metabolite concentrations for the 24 individuals (p < 0.05, Bonferroni corrected, ANOVA). 39 of the 69 measured metabolites were significantly different between at least two individuals. The black lines on each distribution represent the 25th, 50th, 75th quantiles. (C) Histogram of significantly different plasma metabolite concentrations for the 24 individuals (p < 0.05, Bonferroni corrected, ANOVA). 39 of the 44 measured metabolites were significantly different between at least two individuals. (D) Histogram of significantly different calculated pseudo-elementary rate constants (PERCs) for the ensemble kinetic models of the 24 individuals (p < 0.05, empirical test, >2x effect size in medians). 54 PERCs for the 143 transport and intracellular reactions were significantly different between at least two individuals. 166 167

To build personalized kinetic models, we used the MASS approach with the following workflow (Fig. 6.1b and Fig. 6.1c). We first built a “baseline” MASS model representing the average metabolite concentrations of the erythrocyte, the cell type of interest (Fig. 6.1b). A metabolic network reconstruction of the human erythrocyte (iAB-RBC-283 [329]), derived from deep coverage proteomic data sets, was used. iAB-RBC-283 was tailored to the scope of the metabolomics data (see supplementary material). The average metabolic concentrations and equilibrium constants were determined from experimental literature and computational methods where necessary (Materials and Methods). Metabolic reaction fluxes were determined by Markov Chain Monte Carlo (MCMC) sampling [127, 4] of the constraint-based model, parameterized by available experimental data. Af- ter integration of the necessary data, PERCs are calculated for all the reactions. The final model and its characteristics are presented in Fig. 6.2a. After constructing the baseline model, we personalized the model based on high-throughput metabolite measurements in a fasting state for 24 healthy individuals (Fig. 6.1c). Four of those individuals were measured at four total time points across six weeks to act as negative controls. 44 plasma and 69 intracellular erythrocyte metabolites were measured, with 76 total metabolites intersecting with the MASS model. We determined significantly different metabolite levels in erythrocytes (Fig. 6.2b) and plasma (Fig. 6.2c) between individuals. Before integration with the kinetic model, an algorithm reconciled thermodynamic inconsistencies in the metabolite measurements (Materials and Methods) and a personalized kinetic model was constructed for each individual by modifying the metabolite concentrations of the baseline model with the measured values of the individual in question. The PERCs were personalized based on the imposed individual metabolite levels. To this point in the workflow, the personalized models only take into account metabolite variability. However, there may be flux variability between individuals as well. To deal with this unmeasured uncertainty, we utilized an ensemble approach [330] where 300 candidate flux distributions were used to build 300 models for each individual. Introducing uncertainty in the reaction flux states helps mitigate not measuring individual flux states and allowed us to determine significantly 168 different PERCs between ensembles of kinetic models for individuals (Fig. 6.2d).

6.3.2 Model parameters reﬂect the genotype, and vice- versa

The pseudo-elementary rate constant is a function of traditional enzymatic parameters (Fig. 6.1a). The Michaelis constant (Km) and turnover number (kcat) are kinetic properties that reflect the protein sequence, or the genetic sequence in protein-coding exons. The enzyme concentration (E0) reflects many independent processes including transcriptional and translation regulation, protein degradation, and splicing regulation. Some of these processes may be reflected in the genetic sequence in the gene region (e.g. variants in promoters, enhancers, synonymous mutations in exons, and splice junctions), but may occur outside of these regions as well. Due to the small sample size of the pilot study, we focused solely on genetic variants occurring in the gene region of the metabolic genes in the constructed erythrocyte kinetic model, thus ignoring variants further away from the gene that may still affect enzyme concentration. To test whether the PERCs are biologically relevant, we globally determined if significantly different PERCs (p < 0.05, empirical test, >2x effect size in medians) between two individuals were enriched in differences of alleles in the genetic variants in the associated gene region (Fig. 6.3a, top panel). Transporter reactions were excluded, as computational estimation of equilibrium constants for transporters is much less accurate than reactions within one compartment [331]. For all genetic variant types (all intron SNPs, all exon SNPs, synonymous exon SNPs, non-synonymous exon SNPs), we found statistical enrichment, suggesting the PERC has a heritable component. Further, we determined which reactions had high percentages of differential PERCs in the negative controls (see supplementary material for specific reactions). These PERCs are highly variable within the same individual, thus having less chance to have a basis in the genotype. Filtering these reactions from the analysis further increased enrichment levels, suggesting that some false positives can be removed based on metabolite profiling at multiple time points. 169

Figure 6.3: Personalized kinetic metabolic can be used for genomic discoveries. (A) Significantly different pseudo-elementary rate constants (PERCs) between individuals were enriched with genotype variations in the region of the associated genes (top panel). Further, we removed spurious PERCs determined from our negative control study and found a higher level of enrichment (filtered PERCs). Surprisingly, significantly different metabolite concentrations were not enriched, and were often depleted, with genotype variations in the gene regions whose associated reactions were directly connected to the metabolite. We next assessed whether genotype variations were enriched with significantly different PERCs associated with that gene (bottom panel). Though a tougher comparison than the first, the genotype variations were enriched with significantly different PERCs, with a higher level of enrichment using the filtered PERCs. Genotype variations were significantly depleted with significantly different metabolites that were directly linked to reactions associated with the genes. Note: Intron SNPs include all non-coding variants (e.g. UTR regions). (* - significantly enriched, † - significantly depleted, p < 0.05, binomial distribution) (B) The accuracy of the genotype variations as predictors of significantly different PERCs was significantly correlated (Pearson correlation) with the minor allele frequency (based on the 1000 Genomes Project) of the SNPs. This result is consistent with the notion that rare SNPs or SNPs in conserved genomic regions have a higher propensity to be functional. (C) In a targeted analysis driven by significantly different PERCs, three SNPs in the gene region of pyruvate kinase were determined to be significantly associated with the PERC for pyruvate kinase. One of the SNPs (rs2856929) is located in a non-coding exon region and has been previously associated to body mass index (BMI). 170 171

We next determined whether differences in alleles between individuals are enriched in PERCs (Fig. 6.3a, bottom panel). This is a much more difficult test as most genetic variants are not functional. Enrichment levels were much lower and variants that are synonymous or in intron regions were not significantly enriched with PERC differences. Filtering the same reactions determined from the negative controls, all variant types were statistically enriched. We completed a similar analysis comparing significantly different intracellular erythrocyte metabolites (p < 0.05, Bonferroni corrected, ANOVA) with genetic variants. The variants in gene regions associated with metabolic reactions producing or consuming the metabolite were considered. Transporter reactions were excluded for consistency with the PERCs analysis. Unexpectedly, significantly different metabolite levels were not enriched in differences of alleles (Fig. 6.3a, top panel). Conversely, differences in alleles across individuals were also not enriched in significantly different metabolite levels (Fig. 6.3a, bottom panel). Though PERCs are a lumped, computationally derived approximation of the quantitative traits of enzyme activity and concentration, they are more representative of the genotype than metabolic quantitative traits. This is likely due to the fact that metabolite levels are effected by multiple metabolic enzymes. Metabolite levels can vary in a system, while the enzyme activity or concentration stays constant. To further characterize the relationship of PERCs and genetic variants, we found that there is a significant correlation between the accuracy of predicting differences in PERCs with the minor allele frequency (MAF, based on the 1000 Genomes Project) of the variant (Fig. 6.3c). This result is consistent with the notion that functional variants are more likely to be rare [332]. The heteroscedasticity of the results near MAF = 0 is likely due to the small sample size of individuals in this study. Finally, we completed a model-driven association study. Due to the small sample size, PERCs were only compared to genetic variants within the associated gene region to reduce the number of multiple hypotheses. We found three significant associations of the PERC for pyruvate kinase with three SNPs in the gene region of PKM2 (rs1037680, p = 7.6e-3; rs2856929, p = 7.6e-3; and rs4506844, p = 172

3.3e-3). Of the three associated SNPs, rs2856929 is of highest interest for follow-up. rs2856929 is a non-coding exon variant that has been previously been associated to body mass index [333]. Using GWAVA [334], a bioinformatics tools recently published to assess functionality of non-coding variants, rs2856929 had a relatively high score as compared to the distribution of scores in the GWAVA study. We were unable to detect signiﬁcant associations between metabolite concentrations and variants in gene regions of associated reactions. This lack of association is consistent with the previous results that diﬀerences in metabolite levels are not enriched with variations in alleles and thus larger sample sizes are needed for statistical power.

6.3.3 Temporal Decomposition

Above, we focused on statistical analyses that are commonly used in GWAS. However, personalized kinetics models can be analyzed for mechanistic and physiological insights that are not fundamentally possible with traditional studies of human variation. Specifically, we analyzed the individual variability in network dynamic responses of the erythrocyte by assessing the properties of the personalized Jacobian matrices [335]. There is a dual set of Jacobian matrices (a flux component and a metabolite component) and they describe the network dynamics near the homeostatic reference state [336]. Using a similarity transformation, the Jacobian matrix can be decomposed into their independent dynamic components (modes) and their eigenvalues [337]. The modes are expressed either as fluxes or metabolites, depending on which Jacobian matrix is decomposed. The eigenvalues derived from the similarity analysis of the dual Jacobian matrices are the same. The eigenvalues are the negative reciprocal of the time constants that the modes operate on. Decomposing the ensemble metabolic models, there were 105 total modes, or systemic dynamic components, in all the personalized models. 173

Figure 6.4: Time scale decomposition of the ensemble kinetic models elucidated personalized systemic variations at specific time scales. The kinetic models were temporally decomposed using the Jacobian and modal matrices. The vectors of the modal matrix represent the dynamically independent variables of the system, while the eigenvalues are the time scales on which the vectors act. (A) The majority of differences in time scales between ensemble models of different individuals occurred on the order of seconds. The time scale of variability in dynamics coincides with the physiological time scales of blood circulation (around 1 min). Repeating the analysis with the negative controls did not yield such a specific response. (B) Metabolic reactions can be aggregated based on the time scales that they coordinate their activity. The first two time scales are shown. For example, aldolase (FBA) and triosephosphate isomerase (TPI) are very fast reactions, operate near equilibrium, and begin coordinating in the first time scale (around 1 ms). These two reactions then begin coordinating with the rest of lower glycolysis in the second time scale (around 1 sec). The second time scale (around 1 sec) is of interest as the major variability between individuals was seen here. Important metabolic pathways begin aggregating on this scale including lower glycolysis, pentose phosphate pathway (oxidative and non-oxidative), AMP metabolism, glutathione recycling, and methionine salvage. 174 175

To assess the systemic dynamic components, we determined which time constants were significantly different between individuals (p < 0.05, empirical test). Surprisingly, almost all pair-wise differences in the time constants occurred between one second and one minute (Fig. 6.4a). Coincidentally, circulation of erythrocytes occurs in about a minute, suggesting that key metabolic functions during the oxygenation and deoxygenation process represent the main variation between individuals’ erythrocytes. The time constants that were differential in this region ranged from 1.12 seconds to 32.7 seconds. The majority of pairwise differences were in time constants ranging from 2.62 to 3.90 seconds. The median effect size of time constant differences was 14.5%, with effect sizes as large as 150%. We detected almost no changes in larger time constants and only a few in the sub-second time scale. To ensure that the results were not artifacts of the experimental or computational methods, we repeated the temporal decomposition with the negative controls. Results from the negative controls (see supplementary material) did not have the same specific difference as seen between individuals, suggesting that systemic dynamic differences of erythrocytes between individuals occurs on the physiologically relevant time-scale of blood circulation. Next, we determined what are the dynamic flux components of the modes that operate on each of the separate time scales (ms, sec, min, hr, day) to ascertain what fluxes comprise the individual variation seen in the time constants. A previously published method [337] was applied to determine when individual fluxes aggregate or coordinate their activities on the separate time scales based on the modes. The first two time scales are shown in Fig. 6.4b as by the longer time scales, most of the reactions had aggregated into one large group. The first time scale (around 1 ms) is the coordination of very fast reactions near equilibrium, such as isomerases (PGI, TPI, RPI). The second time scale (around 1 sec), where most individual variation was detected, is characterized by the coordination of fluxes into parts of key erythrocyte pathways. In particular, the coordinated activity of lower glycolysis, the oxidative PPP, glutathione oxidation and reduction is formed on this time scale. 176

Figure 6.5: Dynamic simulations with the ensemble personalized kinetic models predisposition factors to ribavirin (RBV) induced hemolytic anemia. (A) RBV is an anti-viral, which is used to treat hepatitis C and has a common side effect of dose-dependent non-immune mediated hemolytic anemia. The generally accepted mechanism for this side effect is that erythrocytes import and phosphorylate the drug, but cannot dephosphorylate it. This leads to a depletion of the cellular phosphate and ATP levels, making the cell prone to oxidative damage. We added reactions to each of the ensemble kinetic models describing this RBV mechanism and iteratively simulated an increasing dosage of RBV (up to 3x the dosage) to determine the plasma concentration at which the models could no longer return to a steady state. (B) A robust classification tree was constructed to determine factors (PERCs) that predispose models whether or not to return to a steady-state. High PERCs for phosphoglycerate kinase (PGK) and adenine transport were identified as the factors of susceptibility to RBV. (C) The ATP/ADP ratio in erythrocytes plays a potential role to susceptibility to RBV induced hemolytic anemia. ITPA deficiency is a well-documented protection against RBV induced hemolytic anemia. The association of ATP/ADP ratio to the highest associated variant with protection (rs6051702) is shown. The empirical p-value is 0.0257 suggesting that the sample size is the primary issue for lack of significance. 177 178

6.3.4 Dynamic simulations for understanding ribavirin susceptibility

A unique aspect of personalized kinetic models is the ability to simulate personal dynamic responses to perturbation. Briefly, we found significant differences in individual responses to a change in erythrocyte redox ratios (see supplementary material). We leveraged the personalized simulation capability to study the individual response to ribavirin (RBV), an anti-viral drug commonly used for treating the hepatitis C virus. A common side effect of RBV is dose-dependent non-immune mediated hemolytic anemia [338]. The mechanism generally accepted for this side effect is that RBV can be phosphorylated in the erythrocyte, but cannot be de- phosphorylated. This mechanism can cause accumulation of over 1 mM of RBV and its phosphorylated forms (RMP, RDP, RTP) in erythrocytes [339], draining over 2 mM of inorganic phosphate. Curiously, individuals with ITPA deficiency, the inability to hydrolase ITP to IMP, are protected from this side effect [316]. ITPA deficiency has no known clinical phenotype and is marked with a buildup of ITP in erythrocytes (roughly 0.2 mM) [340], a metabolite that is not detected in normal erythrocytes. We added the necessary kinetic reactions into each model to describe phosphorylation of RBV (see supplementary material). Initial simulations of drug load on the baseline model showed that simulated dynamic changes were in accordance with RBV stimulation in vitro [341]. Using dynamic simulations, we then determined at which plasma RBV concentration (up to 3x the typical plasma RBV concentration [342]) the personalized models were unable to return to a steady state due to drug load (Fig. 6.5a). Most individuals (22) were non-responders, as the ensemble models for those individual were able to handle RBV at the typical dosage. However for two of the 24 individuals examined, a large portion of their ensemble models were unable to handle a typical RBV load and based on the simulations, those individuals would have the side effect. The simulated incidence rate (8.3%) is in accordance with the percent of RBV patients (n = 1184) during Phase III clinical trials [343] that had RBV dosage reduced (7.4%), discontinued (0.3%), or received transfusions (0.1%). Further, eight individuals were heterozygous and 179

1 individual was homozygous for the variant (rs60517029) highest associated with ITPA protection against RBV induced hemolytic anemia. All nine of these individuals were in the non-responder group. To determine if there was an underlying predisposition to RBV induced hemolytic anemia, we used classification trees with PERCs as predictors to classify non-responders versus responders. A robust classifier was built (Materials and Methods) to separate non-responders and responders (Fig. 6.5b). The classifier indicated that high PERCs for phosphoglycerate kinase (PGK, the first reaction of ATP production in glycolysis) and adenine transport are a potential reason for RBV induced hemolytic anemia. The classifier was supported by the fact that previous clinical studies had identified individuals with hemolytic anemia that had high glycolytic rates and high adenine catabolism [344]. A high glycolysis rate and a high adenine catabolism rate allow for quicker access of RBV to ATP and increasing the RBV phosphorylation rate. Thus, predisposition to RBV induced hemolytic anemia may be due to the dynamic nature of the system and its inability to adapt to high rates of RBV phosphorylation in certain individuals. We next determined if protection by ITPA deficiency is related to the identified classifiers. One of the predisposition factors, the PERC of PGK and high glycolytic rates, is dependent on a high cellular ATP/ADP ratio. ITPA deficiency, which leads to a build of ITP in erythrocytes, is in effect sequestering phosphate from the total phosphate pool. Changing the phosphate pool has been shown to affect the ATP/ADP ratio [345] and a sequestering of phosphate in ITP could potentially lower the ATP/ADP ratio. We found that the ATP/ADP ratio was modestly associated (p = 0.0591) with rs6051702 (Fig. 6.5c). The empirical p- value (0.0257) suggests that the small sample size is the main reason for the modest association. Thus, lowered ATP/ADP ratio due to a smaller phosphate pool may play a role in ITPA deficiency protection from RBV induced hemolytic anemia. 180

6.4 Discussion

We have developed a metabolomics-driven, automated approach to construct personalized kinetic models in a fashion scalable to thousands of individuals. To our knowledge, this approach represents the first mechanistic attempt to bridge the variation in dynamic responses of human metabolism in a cell type with the underlying genotype utilizing high-throughput data. The approach can be extended to any cell type where metabolomics data can be procured. Our approach is an alternative to the traditional statistically, and often univariate, based GWAS. The results have two main implications for accelerating discoveries in the genetic basis of human metabolic variation and personalized medicine. First, determining mQTLs are the primary method to study the genetic basis of human metabolic variation. Our analysis shows that the PERC, an approximation for enzyme activity and abundance, is more representative of the genotype than metabolite concentrations alone. Metabolite levels are a conse- quence of the many enzymes that act upon them and do not directly represent the genotype. This suggests that more associations with enzyme activities can be found than mQTLs in larger studies as the PERCs better represent the subtleties in human variation. Further, it is conceivable that the combination of the two approaches (PERC and mQTL) could yield additional systemic insights into how multiple metabolic enzymes affect metabolite buildup or depletion. We anticipate that scale-up of this approach to sample sizes more typical of GWAS will lead to many new discoveries in genomics for cellular metabolism. Second, a unique feature of personalized kinetic models is the ability to simulate the dynamic responses of individuals and elucidate subtle but consequen- tial variations. A model-driven analysis aided us to propose a novel mechanism for predisposition of RBV induced hemolytic anemia. The mechanism is consistent with the protective properties of ITPA deficiency. An alternative mechanism has been proposed [346], but the conditions under which those in vitro experiments were done are not physiologically relevant. Further, clinical phenotypes of other drugs and deficiencies are not consistent with the alternative mechanism (see supplementary material). Follow-up of our mechanism with larger populations is 181 needed to see if the ATP/ADP ratio in erythrocytes is in fact a major determinant of RBV induced hemolytic anemia susceptibility. An otherwise clinically irrelevant buildup of ITP in erythrocytes might cause a subtle but systemic change in phosphate levels, the ATP/ADP ratio, and glycolytic rates offering protection against RBV induced hemolytic anemia. This proposed mechanism is both systemic and dynamic in nature, which highlights the utility of personalized kinetic models for pharmacodynamics.

6.5 Materials and Methods

6.5.1 Study sample

University of California, San Diego IRB approval was obtained for Project Number 111100X permitting the voluntary acquisition and subsequent analysis of blood from healthy volunteers via peripheral venipuncture. 24 individuals (16 male, 8 female) who identified themselves as healthy, without any known chronic medical conditions, provided written informed consent and provided venous blood samples for the purpose of plasma and erythrocyte metabolite profiling as well as SNP genotyping in a morning fasted state (a minimum of eight hours). Four of the individuals (Individuals #1-4) were sampled for metabolite profiling four times across six weeks as negative controls (e.g. week 0, 1, 4, and 6). Their final time point is used in the comparisons against the other 20 individuals.

6.5.2 Data Preparation

Two vials of blood (8 mL) were drawn from each individual. The first vial (included in PAXgene blood DNA kit, Qiagen, Germantown, MD) was subsequently used for DNA extraction and SNP genotyping using the kit’s protocols. The Illumina Infinium HumanOmniExpressExome BeadChip was used for SNP genotyping. The UCSD IGM Genomics Center (http://igm.ucsd.edu/core- services/) completed all analyses and made the final SNP calls. The second vial, containing heparin anticoagulant, was immediately placed 182 on ice for metabolite extraction. The vial was split into two for plasma and intracellular erythrocyte metabolite extraction. Erythrocyte samples were centrifuged at 1000g for 2 min at 4C. 25 uL of packed red cells were aliquoted with 25 uL of internal standards (see next section) and 450 uL of -80C methanol (80%) in a 1.5 mL tube. Tubes were immersed in liquid nitrogen until frozen to lyse the cells. After thawing in a water bath at room temperature, tubes were centrifuged at 15800g for 15 min at 4C. The supernatant was retained for metabolite profiling. Concurrently, plasma samples were centrifuged at 3000g for 20 min at 4C. 100 uL of plasma was aliquoted with 35 uL of internal standards and 365 uL of -80C methanol (80%) in a 1.5 mL tube. Tubes were centrifuged at 15800g for 15 min at 4C. The supernatant was retained for metabolite profiling and stored at -80C. Samples were later lyophilized and reconstituted in water and ran at the nomi- nal concentration and at a ten-fold dilution to account for the large concentration range of intracellular metabolites. Metabolite samples were prepared in triplicate. Across all samples, 2.8% and 4.0% of the erythrocyte and plasma metabolomes, respectively, were not measured. These values were imputed using a K-nearest neighbor searching, using a weighted average of the five nearest neighbors. This imputation approach has been previously shown to yield high quality results for metabolomics [347].

6.5.3 Metabolite proﬁling: standards, instrumentation, acquisition, and quantiﬁcation

Standards were purchased from Sigma-Aldrich (St. Louis, MO) or Santa Cruz Biotechnology (Dallas, TX). LC-MS reagents were purchased from Honeywell Burdick Jackson (Morris Township, NJ). Metabolically labeled internal standards were synthesized as described previously [348]. Calibrator standards were generated from stock solutions freshly prepared or kept in -80C for no longer than 3 days. Calibrator standards were combined into mixtures and aliquoted and lyophilized to dryness and stored in -80C. Aliquots were reconstituted in water and serially diluted to generate a calibration curve that spanned the lower and upper limits of detection for each compound. 183

A XSELECT HSS XP 150 mm 2.1 mm 2.5 um (Waters Corporation, Mil- ford, MA) with an UFLC XR HPLC (Shimadzu Scientific Instruments, Columbia, MD) was used for chromatographic separation. Mobile phase A was composed of 10 mM tributylamine (TBA), 10 mM acetic acid (pH 6.86), 5% methanol, and 2% 2-propanol; mobile phase B was 2-propanol. Oven temperature was 40C. The chromatographic conditions are as follows: 0, 0, 0.4; 5, 0, 0.4; 9, 2, 0.4; 9.5, 6, 0.4; 11.5, 6, 0.4; 12, 11, 0.4; 13.5, 11, 0.4; 15.5, 28, 0.4; 16.5, 53, 0.15; 22.5, 53, 0.15; 23, 0, 0.15; 27, 0, 0.4; 33, 0, 0.4;(Total time [min], Eluent B [vol.%], Flow rate [mL/min]). The autosampler temperature was 10C and the injection volume was 10 uL with full loop injection. An AB SCIEX Qtrap 5500 mass spectrometer (AB SCIEX, Framingham, MA) was operated in negative mode. Electrospray ionization parameters were optimized for 0.4 mL/min flow rate, and are as follows: electrospray voltage of -4500 V, temperature of 500C, curtain gas of 40, CAD gas of 12, and gas 1 and 2 of 50 and 50 psi, respectively. Analyzer parameters were optimized for each compound using manual tuning. The instrument was mass calibrated with a mixture of polypropylene glycol (PPG) standards. Samples were acquired using the scheduled MRM pro algorithm in Analyst 1.6.2. The acquisition method consisted of a multiple reaction monitoring (MRM) survey scan coupled to an information dependent acquisition (IDA) consisting of an enhanced product ion (EPI) scan for compound identity confirmation. Samples were quantified using IDMS39, 40 with metabolically labeled internal standards and processed using Multiquant 2.1.1. Linear regressions for compound quanti- tation were based on peak height ratios and the logarithm of the concentration of calibrator concentrations from a minimum of four consecutive concentration ranges that showed minimal bias. A peak height greater than 1e3 ion counts and a signal-to-noise ratio greater than 20 were used to define the lower limit of quan- titation (LLOQ). Quality controls and carry-over checks were included with each batch. Due to the number of biological isomers, the integration of each compound is manually checked. 184

6.5.4 Constraint-based model construction and ﬂux calculations

The human erythrocyte metabolic model was updated using the COBRA Toolbox v2 [127] for Matlab (MathWorks, Natick, MA). Model changes included removing reactions that were in pathways whose metabolites were primarily outside of the scope of the metabolomic measurements. Further, we added reactions to the network based on the additional scope contained in the metabolomics measurements and a recently published proteomic data set [349]. The supplementary material details specific changes to the model. Literature sources (see supplementary material) were used to set transport fluxes of metabolites into and out of the model, as well as flux rates of NADH and ATP usage, catalase activity, and flux ratio splits. Lower and upper bounds for these fluxes were set as the mean +/- standard deviation, where available. Otherwise, flux bounds were set as 75% to 125% of the mean. Markov Chain Monte Carlo (MCMC) sampling [127, 4] was used to determine candidate flux states, or sample points, for the erythrocyte. 5000 sample points were computed and the mean was used as the initial flux state used to build the baseline model. Later, 300 of the 5000 sample points were randomly chosen to build the ensemble models. The MCMC approach is “un-biased” as no objective function is required.

6.5.5 Kinetic model construction

Kinetic model construction, temporal decomposition, and dynamic simulations were completed with the MASS Toolbox (http://sbrg.github.io/MASS- Toolbox/) for Mathematica (Wolfram, Champaign, IL). The MASS modeling approach and the workflow for building the models have been previously documented [326, 327]. The constraint-based model and the associated fluxes calculated by MCMC sampling were imported from Matlab into Mathematica. Initial metabolite concentrations were set based on literature sources. Plasma concentrations were assumed to be constant during simulations and temporal decomposition. Concen- tration data for some of the metabolites was not available and was set based on 185 a concentration sampling of thermodynamically feasible ranges. We modified the constraints of the MCMC sampling algorithm used for calculating candidate flux distributions in order to calculate candidate concentrations that are thermodynamically feasible (Equation 6.1). The known concentrations were fixed and the remaining concentrations were calculated. The highest probability concentrations were chosen.

T S · log(x) ≤ log(Keq) if v > 0

T S · log(x) ≥ log(Keq) if v < 0 (6.1)

lb = ub = xknown

Equilibrium constants were set based on experimental measurements, where available, or group contribution methods [350] (http://equilibrator.weizmann.ac.il). Equilibrium constants for transport reactions were set based on a previous method [331]. Literature sources for metabolite concentrations and equilibrium constants are provided in the supplementary material. Linear stability was assessed by calculating the Jacobian matrix of the kinetic model and checking that the eigenvalues were all negative or zero within a tolerance of 10−15. To achieve linear stability of the baseline model, the computationally determined equilibrium constants were varied up to three orders of magnitude and the eigenvalues of the Jacobian matrix were re-calculated to determine the minimal adjustment needed for a stable kinetic model. The internal standards were used to relatively quantify the plasma and erythrocyte concentrations and the mean concentration of a metabolite across all samples was assumed to be the baseline concentration. Measured plasma and erythrocyte concentrations were minimally adjusted using the following QP problem to ensure thermodynamic consistency (Equation 6.2). Γ is the mass action ratio. x0 and Γ0 are the baseline model concentrations and mass action ratio. was set to 10% of the difference of Γ and Keq. This ensures that the calculated PERC in later steps is not infinity or negative infinity. 186

2 2 min(log(x/x0)) + (log(Γ/Γ0))

T S · log(x) ≤ log(Keq) − if v > 0 (6.2) T S · log(x) ≥ log(Keq) + if v < 0

log(0.5) ≤ log(x/x0) ≤ log(1.5) After integration of the individual metabolite data, the measured metabolite concentrations were varied up to an order of magnitude and the eigenvalues of the Jacobian matrix were re-calculatd to determine the minimal adjustment needed for a stable kinetic model. Once stabilized, 300 ensemble models were built for each individual model and the baseline model using MCMC sample points. The models were ﬁltered based on their ability to simulate to a steady-state due to a perturbation by decreasing the ATP concentration by 5% and increasing the ADP concentration by that amount. The number of stable models varied for each individual from 62 to 287 models, with 48 ﬂux states having a stable kinetic model for all individuals and the baseline. The 48 models for each individual were utilized in the study.

6.5.6 Temporal decomposition

Temporal decomposition was accomplished by applying a similarity transformation to the Jacobian matrix (J = MΛM−1) to determine the modal matrix (M−1)[337]. The modal matrix separates the dynamically independent motions of the kinetic model into separate modes. Λ is a diagonal matrix of the eigenvalues of J, which are the negative reciprocal of the time constants or scales of the modes. To determine the aggregation of reaction fluxes across different time scales, a previously developed method was used [337] with a correlation coefficient threshold of 0.85.

6.5.7 Statistical analysis for targeted associations

A linear regression model was used for comparing medians of PERCs (log2 transformed) to SNPs only within the associated gene regions. Only SNPs with 187 at least two individuals having homozygous minor alleles were considered. The median PERC value from the ensemble models was utilized for the regression. A Bonferroni correction was applied to the number of multiple hypotheses for each PERC. For example, the pyruvate kinase PERC had four possible genetic variants to test, thus the threshold is 0.0125 (0.05/4).

6.5.8 Statistical analysis for dynamic simulations

For the redox simulations (see supplementary material), significantly different responses between individuals during dynamic simulations were determined on the basis of the intra-variability between the 48 ensemble models of an individual and the inter-variability between the ensembles of two individuals. Simulations were run from time 0 to 1e6 hr, a time point well beyond when all of the models had returned to a steady state. Mathematica outputs interpolating functions from the ODE simulator. The interpolating functions were discretized at 100 time points from 1e-6 hr to 1e6 hr. The distance metric for comparing two discretized time profiles (x1 vs x2) used for this calculation is a classical distance metric for kinetic model simulations (Equation 6.3). An empirical test was used for significance (p < 0.05). Before each distance calculation, the two time profiles were scaled so that the initial condition of the variables were the same, so that the distance metric was a test of the perturbation response, not the absolute difference between individuals.

x (t) − x (t)2 X 1 2 (6.3) x1(t)

6.6 Acknowledgements

We would like to thank all the individuals who donated blood for this project. We would also like to thank Dr. Nathan Lewis, Jonathan Monk, and Ali Ebrahim for valuable discussions, as well as Andreas Draeger for standardizing the kinetic models for SBML. This work was supposed by NIH grant GM068837. 188

Conclusions

7.1 Recapitulation

In this thesis, I have advanced computational methods for understanding cellular processes utilizing high-throughput omic data. A mechanistic, bottom-up approach is utilized where the underlying biochemistry of the cellular processes are explicitly defined in genome-scale models. Next, comprehensive measurements of biomolecules, or omic data, is deeply analyzed through integration with the constructed mechanistic models. The cellular processes investigated in this thesis span various scales from the fine-grain (e.g. metabolic pathways), to the comprehensive (e.g. whole metabolic networks), and to cellular dynamics. The work presented has implications in the research fields of immunometabolism, personalized genomics and pharmacodynamics, and general biological data analysis using biological pathways.

7.2 The Prospect of Genome-Scale Science

Gregor Mendel described discrete quanta of information traveling from one generation to the next, which determines the form and the function of an organism. Subsequently, Wilhelm Johannsen formulated the concept of a gene as the quanta of information, which led him to the deﬁnition of genotype and phenotype. Since then, a major goal of biology has been the quantitative description of the

189 190 fundamental genotype-phenotype relationship. The push in the quantitative biological sciences to understand macroscopic properties from microscopic measurements has parallels to the elucidation of fundamental principles in physics some hundred years ago (Fig .7.1). For example, the Einstein-Smoluchowski relation is a model for Brownian motion that quantitatively predicts properties of diffusion. Although the theory was an approximation of the physical processes [351], it has been applied and has helped to develop more sophisticated models. The work in this thesis and other recent developments in this field of research suggests that the life sciences have now reached a point at which many aspects of the genotype-phenotype relationship for metabolism can be quantified and used to build mechanistic models that allow meaningful biological predictions to be made. The formulation of high-dimensional models that are required to compute full molecular phenotypes are enabled by genome sequencing technology, which allows the generation of a cellular parts list; by various omic data types, which allows a functional readout of these parts; and by mechanistic modeling frameworks that are amenable to reconciling omic data, network structure and knowledge from primary literature. The successes of this field demonstrate that constraint-based modeling is an approach that enables the genome-scale study of metabolism. 191

Figure 7.1: Physical and biological models to describe macroscopic properties of microscopic measurements. 192

7.3 Future

As with any modeling framework, the mathematical theory and the applications of constraint-based modeling will continue to be challenged and reﬁned, thus improving our interpretation of biological phenomena. We foresee progress to unfold in several major directions. First, constraint-based modeling has mainly focused on metabolism, and more integrative modeling approaches must be explored. Trends in current literature [99, 100, 352] indicate that other cellular processes may be modeled using alternative frameworks that are better suited for a particular biological phenomena. Statistical approaches are also powerful for modeling biological processes that are poorly understood. Integrating other approaches with CBMs of metabolism can expand the scope of quantitative prediction. Second, the majority of applications of CBMs have been for single-cell organisms. We see two areas of application into which CBMs are likely to expand: human disease and the microbiome. Although the human reconstruction (that is, Recon 1) is far from complete, the cancer drug target studies showed that quantitative predictions are still possible. With the availability of the second build (that is, Recon 2 [353]), we foresee greater applied uses in human disease. There has also been a steady increase in the amount of omic data of the human microbiome, and CBMs will have an important role in analysing these complex data sets [354, 355]. Third, the underlying assumptions and methods for constraint-based modeling analyses will continue to evolve as more data types become available. Similarly to the testing of optimality assumptions of FBA, other key assumptions of CBMs will be tested in the next few years. For example, with the increasing availability of time-course metabolomics, the steady-state assumption can be bypassed and concentration changes can be explicitly modeled. Rather than assuming constant internal metabolite levels, these concentrations can be directly measured over a time course in an experiment and the rate of change can be integrated explicitly. In addition, the increasing availability of genomic data and sophisticated models for the interpretation of these data will allow explicit description and integration of the dependence of genomic sequence on gene expression, protein synthesis and protein structures for metabolic reactions in CBMs. We anticipate that these de- 193 velopments will enable even greater growth in the diversity of predictions and in the biological discoveries that are achievable by using constraint-based modeling.

7.4 Acknowledgements

[1] Feist AM, Herrg˚ardMJ, Thiele I, Reed JL, Palsson BØ (2008) Reconstruc- tion of biochemical networks in microorganisms. Nature Reviews Microbiol- ogy 7: 129–143.

[2] Thiele I, Palsson BØ (2010) A protocol for generating a high-quality genome- scale metabolic reconstruction. Nature protocols 5: 93–121.

[3] Lewis NE, Nagarajan H, Palsson BO (2012) Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Na- ture Reviews Microbiology 10: 291–305.

[4] Bordbar A, Lewis NE, Schellenberger J, Palsson BØ, Jamshidi N (2010) Insight into human alveolar macrophage and m. tuberculosis interactions via metabolic reconstructions. Molecular systems biology 6.

[5] Lewis NE, Schramm G, Bordbar A, Schellenberger J, Andersen MP, Cheng JK, Patel N, Yee A, Lewis RA, Eils R, Konig R, Palsson BO (2010) Large- scale in silico modeling of metabolic interactions between cell types in the human brain. Nature biotechnology 28: 1279–1285.

[6] Bordbar A, Feist A, Usaite-Black R, Woodcock J, Palsson B, Famili I (2011) A multi-tissue type genome-scale metabolic network for analysis of whole- body systems physiology. BMC systems biology 5: 180.

[7] Klitgord N, Segr`eD (2010) Environments that induce synthetic microbial ecosystems. PLoS computational biology 6: e1001002.

[8] Zhuang K, Izallalen M, Mouser P, Richter H, Risso C, Mahadevan R, Lov- ley DR (2010) Genome-scale dynamic modeling of the competition between rhodoferax and geobacter in anoxic subsurface environments. The ISME journal 5: 305–316.

[9] Li F, Thiele I, Jamshidi N, Palsson BØ (2009) Identiﬁcation of potential pathway mediation targets in toll-like receptor signaling. PLoS computational biology 5: e1000292.

194 195

[10] Papin JA, Palsson BO (2004) The jak-stat signaling network in the human b-cell: an extreme signaling pathway analysis. Biophysical journal 87: 37– 46.

[11] Gianchandani EP, Joyce AR, Palsson BØ, Papin JA (2009) Functional states of the genome-scale escherichia coli transcriptional regulatory system. PLoS computational biology 5: e1000403.

[12] Thiele I, Jamshidi N, Fleming RM, Palsson BØ (2009) Genome-scale reconstruction of escherichia coli’s transcriptional and translational machinery: a knowledge base, its mathematical formulation, and its functional characterization. PLoS computational biology 5: e1000312.

[13] Wilkinson DJ (2009) Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics 10: 122–133.

[14] Steuer R (2007) Computational approaches to the topology, stability and dynamics of metabolic networks. Phytochemistry 68: 2139–2151.

[15] Orth JD, Thiele I, Palsson BØ (2010) What is ﬂux balance analysis? Nature biotechnology 28: 245–248.

[16] De Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature review. Journal of computational biology 9: 67–103.

[17] Friedman N, Linial M, Nachman I, Pe’er D (2000) Using bayesian networks to analyze expression data. Journal of computational biology 7: 601–620.

[18] Stephens M, Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nature Reviews Genetics 10: 681–690.

[19] Ideker T, Krogan NJ (2012) Diﬀerential network biology. Molecular systems biology 8.

[20] Califano A, Butte AJ, Friend S, Ideker T, Schadt E (2012) Leveraging models of cell regulation and gwas data in integrative network-based association studies. Nature genetics 44: 841–847.

[21] Khatri P, Sirota M, Butte AJ (2012) Ten years of pathway analysis: current approaches and outstanding challenges. PLoS computational biology 8: e1002375.

[22] Atkinson DE (1968) Energy charge of the adenylate pool as a regulatory parameter. interaction with feedback modiﬁers. Biochemistry 7: 4030–4034.

[23] Weisz PB (1973) Diﬀusion and chemical transformation an interdisciplinary excursion. Science 179: 433–440. 196

[24] Reed JL (2012) Shrinking the metabolic solution space using experimental datasets. PLoS computational biology 8: e1002662.

[25] Mo ML, Palsson BØ, Herrg˚ard MJ (2009) Connecting extracellular metabolomic measurements to intracellular ﬂux states in yeast. BMC systems biology 3: 37.

[26] Edwards JS, Ibarra RU, Palsson BO (2001) In silico predictions of escherichia coli metabolic capabilities are consistent with experimental data. Nature biotechnology 19: 125–130.

[27] Shlomi T, Cabili MN, Herrg˚ardMJ, Palsson BØ, Ruppin E (2008) Network- based prediction of human tissue-speciﬁc metabolism. Nature biotechnology 26: 1003–1010.

[28] Becker SA, Palsson BO (2008) Context-speciﬁc metabolic networks are consistent with experiments. PLoS computational biology 4: e1000082.

[29] Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE (2009) Interpreting expression data with metabolic ﬂux models: predicting mycobacterium tuberculosis mycolic acid production. PLoS computational biology 5: e1000489.

[30] Mahadevan R, Schilling C (2003) The eﬀects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic engineering 5: 264–276.

[31] Fell DA, Small JR (1986) Fat synthesis in adipose tissue. an examination of stoichiometric constraints. Biochem J 238: 781–786.

[32] Majewski R, Domach M (1990) Simple constrained-optimization view of acetate overﬂow in e. coli. Biotechnology and bioengineering 35: 732–738.

[33] Savinell JM, Palsson BO (1992) Optimal selection of metabolic ﬂuxes for in vivo measurement. ii. application to escherichia coli and hybridoma cell metabolism. Journal of theoretical biology 155: 215–242.

[34] Varma A, Palsson BO (1994) Stoichiometric ﬂux balance models quantitatively predict growth and metabolic by-product secretion in wild-type escherichia coli w3110. Applied and environmental microbiology 60: 3724– 3731.

[35] Schuster S, Hilgetag C (1994) On elementary ﬂux modes in biochemical reaction systems at steady state. Journal of Biological Systems 2: 165–182. 197

[36] Schilling CH, Letscher D, Palsson BØ (2000) Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. Journal of theoretical biology 203: 229–248. [37] Clarke BL (1980) Stability of complex reaction networks. Wiley Online Li- brary. [38] Dandekar T, Schuster S, Snel B, Huynen M, Bork P (1999) Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochem J 343: 115–124. [39] Liao JC, Hou SY, Chao YP (1996) Pathway analysis, engineering, and physiological considerations for redirecting central metabolism. Biotechnology and bioengineering 52: 129–140. [40] Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, McKenney K, Sutton G, FitzHugh W, Fields C, Gocayne J, Scott J, Shirley R, Liu L, Glodek A, Kelley J, Weidman J, Phillips C, Spriggs T, Hedblom E, Cotton M, Utterback T, Hanna M, Nguyen D, Saudek D, Brandon R, Fine L, Fritchman J, Fuhrmann J, Geoghagen N, Gnehm C, McDonald L, Small K, Fraser C, Smith H, Venter J (1995) Whole-genome random sequencing and assembly of haemophilus influenzae rd. Science 269: 496–512. [41] Edwards JS, Palsson BO (1999) Systems properties of the haemophilus in- fluenzaerd metabolic genotype. Journal of Biological Chemistry 274: 17410– 17416. [42] Segre D, Vitkup D, Church GM (2002) Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences 99: 15112–15117. [43] Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED (2002) Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190–193. [44] Ibarra RU, Edwards JS, Palsson BO (2002) Escherichia coli k-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420: 186–189. [45] Almaas E, Kovacs B, Vicsek T, Oltvai Z, BarabásiAL (2004) Global organization of metabolic fluxes in the bacterium escherichia coli. Nature 427: 839–843. [46] Papp B, Pal C, Hurst LD (2004) Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature 429: 661–664. 198

[47] P´alC, Papp B, Lercher MJ (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nature genetics 37: 1372–1375.

[48] Hyduke DR, Lewis NE, Palsson BØ (2013) Analysis of omics data with genome-scale models of metabolism. Mol BioSyst 9: 167–174.

[49] Patil KR, Nielsen J (2005) Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proceedings of the Na- tional Academy of Sciences of the United States of America 102: 2685–2689.

[50] Kharchenko P, Church GM, Vitkup D (2005) Expression dynamics of a cellular metabolic network. Molecular Systems Biology 1.

[51] Carlson R, Srienc F (2004) Fundamental escherichia coli biochemical pathways for biomass and energy production: creation of overall ﬂux states. Biotechnology and bioengineering 86: 149–162.

[52] Carlson R, Srienc F (2004) Fundamental escherichia coli biochemical pathways for biomass and energy production: identiﬁcation of reactions. Biotech- nology and bioengineering 85: 1–19.

[53] Harcombe WR, Delaney NF, Leiby N, Klitgord N, Marx CJ (2013) The ability of ﬂux balance analysis to predict evolution of central metabolism scales with the initial distance to the optimum. PLoS computational biology 9: e1003091.

[54] Schuetz R, Kuepfer L, Sauer U (2007) Systematic evaluation of objective functions for predicting intracellular ﬂuxes in escherichia coli. Molecular systems biology 3.

[55] Molenaar D, van Berlo R, de Ridder D, Teusink B (2009) Shifts in growth strategies reﬂect tradeoﬀs in cellular economics. Molecular systems biology 5.

[56] Lobel L, Sigal N, Borovok I, Ruppin E, Herskovits AA (2012) Integrative genomic analysis identiﬁes isoleucine and cody as regulators of listeria monocytogenes virulence. PLoS genetics 8: e1002887.

[57] Folger O, Jerby L, Frezza C, Gottlieb E, Ruppin E, Shlomi T (2011) Predict- ing selective drug targets in cancer through metabolic networks. Molecular systems biology 7.

[58] Frezza C, Zheng L, Folger O, Rajagopalan KN, MacKenzie ED, Jerby L, Micaroni M, Chaneton B, Adam J, Hedley A, Kalna G, Tomlinson I, Pol- lard P, Watson D, Deberardinis R, Shlomi T, Ruppin E, Gottlieb E (2011) Haem oxygenase is synthetically lethal with the tumour suppressor fumarate hydratase. Nature 477: 225–228. 199

[59] Schuetz R, Zamboni N, Zampieri M, Heinemann M, Sauer U (2012) Multi- dimensional optimality of microbial metabolism. Science 336: 601–604.

[60] Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD, Adkins JN, Schramm G, Purvine SO, Lopez-Ferrer D, Weitz K, Eils R, Konig R, Smith R, Palsson BO (2010) Omic data from evolved e. coli are consistent with computed optimal growth from genome-scale models. Molecular systems biology 6.

[61] Khersonsky O, Tawﬁk D (2010) Enzyme promiscuity: a mechanistic and evolutionary perspective. Annual review of biochemistry 79: 471–505.

[62] Nam H, Lewis NE, Lerman JA, Lee DH, Chang RL, Kim D, Palsson BO (2012) Network context and selection in the evolution to enzyme speciﬁcity. Science 337: 1101–1104.

[63] Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ (2007) A genome-scale metabolic reconstruction for escherichia coli k-12 mg1655 that accounts for 1260 orfs and thermodynamic information. Molecular systems biology 3.

[64] Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H (2006) Construction of escherichia coli k- 12 in-frame, single-gene knockout mutants: the keio collection. Molecular systems biology 2.

[65] Scheer M, Grote A, Chang A, Schomburg I, Munaretto C, Rother M, S¨ohngen C, Stelzer M, Thiele J, Schomburg D (2011) Brenda, the enzyme information system in 2011. Nucleic acids research 39: D670–D676.

[66] Szappanos B, Kov´acsK, Szamecz B, Honti F, Costanzo M, Baryshnikova A, Gelius-Dietrich G, Lercher MJ, Jelasity M, Myers CL, Andrews B, Boone C, Oliver S, Pal C, Papp B (2011) An integrated approach to characterize genetic interaction networks in yeast metabolism. Nature genetics 43: 656– 662.

[67] Wessely F, Bartl M, Guthke R, Li P, Schuster S, Kaleta C (2011) Optimal regulatory strategies for metabolic pathways in escherichia coli depending on protein costs. Molecular systems biology 7.

[68] Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JLY, Touﬁghi K, Mostafavi S, Prinz J, Onge RPS, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, Luis BJS, Shuteriqi E, Tong AHY, Dyk Nv, Wallace IM, Whitney JA, Weirauch MT, Zhong G, Zhu H, Houry WA, Brudno M, Ragibizadeh S, Papp B, P´al 200

C, Roth FP, Giaever G, Nislow C, Troyanskaya OG, Bussey H, Bader GD, Gingras AC, Morris QD, Kim PM, Kaiser CA, Myers CL, Andrews BJ, Boone C (2010) The genetic landscape of a cell. Science 327: 425–431.

[69] Uetz P, Giot L, Cagney G, Mansﬁeld TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbﬂeisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM (2000) A comprehensive analysis of protein–protein interactions in saccharomyces cerevisiae. Nature 403: 623–627.

[70] Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muñiz- Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garc´ıa-SoteloJS, López-Fuentes A, Porrón-SoteloL, Alquicira-Hernández S, Medina-Rivera A, Mart´ınez-FloresI, Alquicira-HernándezK, Mart´ınez-AdameR, Bonavides- Mart´ınezC, Miranda-R´ıosJ, Huerta AM, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J (2011) RegulonDB version 7.0: transcriptional regulation of escherichia coli k-12 integrated within genetic sensory response units (gensor units). Nucleic Acids Research 39: D98–D105.

[71] Segre D, DeLuna A, Church GM, Kishony R (2004) Modular epistasis in yeast metabolism. Nature genetics 37: 77–83.

[72] Harrison R, Papp B, P´alC, Oliver SG, Delneri D (2007) Plasticity of genetic interactions in metabolic networks of yeast. Proceedings of the National Academy of Sciences 104: 2307–2312.

[73] He X, Qian W, Wang Z, Li Y, Zhang J (2010) Prevalent positive epistasis in escherichia coli and saccharomyces cerevisiae metabolic networks. Nature genetics 42: 272–276.

[74] Notebaart RA, Teusink B, Siezen RJ, Papp B (2008) Co-regulation of metabolic genes is better explained by ﬂux coupling than by network distance. PLoS computational biology 4: e26.

[75] Kaleta C, de Figueiredo LF, Schuster S (2009) Can the whole be less than the sum of its parts? pathway analysis in genome-scale metabolic networks using elementary ﬂux patterns. Genome Research 19: 1872–1883.

[76] Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schnei- der SJ, Gardner TS (2008) Many microbe microarrays database: uniformly normalized aﬀymetrix compendia with structured experimental metadata. Nucleic acids research 36: D866–D870.

[77] Orth JD, Palsson BØ (2010) Systematizing the generation of missing metabolic knowledge. Biotechnology and bioengineering 107: 403–412. 201

[78] Reed JL, Patel TR, Chen KH, Joyce AR, Applebee MK, Herring CD, Bui OT, Knight EM, Fong SS, Palsson BO (2006) Systems approach to refining genome annotation. Proceedings of the National Academy of Sciences 103: 17480–17484. [79] Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences 104: 1777–1782. [80] Ottar R, Giuseppe P, Manuela M, Bernhard OP, Ines T (2013) Inferring the metabolism of human orphan metabolites from their metabolic network context affirms human gluconokinase activity. Biochemical Journal 449: 427– 435. [81] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) Kegg for integration and interpretation of large-scale molecular data sets. Nucleic acids research 40: D109–D114. [82] Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H, Tomita M (2009) Systematic phenome analysis of escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Molecular systems biology 5. [83] Lee SY, Lee DY, Kim TY (2005) Systems biotechnology for strain improvement. TRENDS in Biotechnology 23: 349–358. [84] Park JH, Lee SY (2008) Towards systems metabolic engineering of microorganisms for amino acid production. Current opinion in biotechnology 19: 454–460. [85] Caspeta L, Nielsen J (2013) Economic and environmental impacts of microbial biodiesel. Nature biotechnology 31: 789–793. [86] Yim H, Haselbeck R, Niu W, Pujol-Baxley C, Burgard A, Boldt J, Khandu- rina J, Trawick JD, Osterhout RE, Stephen R, Estadilla J, Teisan S, Schreyer HB, Andrae S, Yang TH, Lee SY, Burk MJ, Van Dien S (2011) Metabolic engineering of escherichia coli for direct production of 1,4-butanediol. Nature Chemical Biology 7: 445–452. [87] Kim HU, Kim SY, Jeong H, Kim TY, Kim JJ, Choy HE, Yi KY, Rhee JH, Lee SY (2011) Integrative genome-scale metabolic analysis of vibrio vulnificus for drug targeting and discovery. Molecular systems biology 7. [88] Brynildsen MP, Winkler JA, Spina CS, MacDonald IC, Collins JJ (2013) Potentiating antibacterial activity by predictably enhancing endogenous microbial ros production. Nature biotechnology . 202

[89] Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, Broadbelt LJ (2005) Exploring the diversity of complex metabolic networks. Bioinformatics 21: 1603–1609.

[90] Constantinou L, Gani R (1994) New group contribution method for estimating properties of pure compounds. AIChE Journal 40: 1697–1710.

[91] Burgard AP, Pharkya P, Maranas CD (2003) Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnology and bioengineering 84: 647–657.

[92] Oberhardt MA, Yizhak K, Ruppin E (2013) Metabolically re-modeling the drug pipeline. Current opinion in pharmacology .

[93] Hsu PP, Sabatini DM (2008) Cancer cell metabolism: Warburg and beyond. Cell 134: 703–707.

[94] Jerby L, Shlomi T, Ruppin E (2010) Computational reconstruction of tissue- speciﬁc metabolic models: application to human liver metabolism. Molecular systems biology 6.

[95] Kim PJ, Lee DY, Kim TY, Lee KH, Jeong H, Lee SY, Park S (2007) Metabo- lite essentiality elucidates robustness of escherichia coli metabolism. Proceed- ings of the National Academy of Sciences 104: 13638–13642.

[96] Lerman JA, Hyduke DR, Latif H, Portnoy VA, Lewis NE, Orth JD, Schrimpe-Rutledge AC, Smith RD, Adkins JN, Zengler K, Palsson BO (2012) In silico method for modelling metabolism and gene product expression at genome scale. Nature Communications 3: 929.

[97] Zhang Y, Thiele I, Weekes D, Li Z, Jaroszewski L, Ginalski K, Deacon AM, Wooley J, Lesley SA, Wilson IA, Palsson B, Osterman A, Godzik A (2009) Three-dimensional structural view of the central metabolic network of thermotoga maritima. Science 325: 1544–1549.

[98] Thiele I, Fleming RM, Bordbar A, Schellenberger J, Palsson BØ (2010) Functional characterization of alternate optimal solutions of escherichia coli’s transcriptional and translational machinery. Biophysical journal 98: 2072– 2081.

[99] Chandrasekaran S, Price ND (2010) Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in escherichia coli and mycobacterium tuberculosis. Proceedings of the National Academy of Sciences 107: 17845–17850. 203

[100] Chang RL, Andrews K, Kim D, Li Z, Godzik A, Palsson BO (2013) Structural systems biology evaluation of metabolic thermotolerance in escherichia coli. Science 340: 1220–1223.

[101] Pramanik J, Keasling J (1998) Eﬀect of escherichia coli biomass composition on central metabolic ﬂuxes predicted by a stoichiometric model. Biotechnol- ogy and bioengineering 60: 230–238.

[102] Rodionova IA, Yang C, Li X, Kurnasov OV, Best AA, Osterman AL, Rodi- onov DA (2012) Diversity and versatility of the thermotoga maritima sugar kinome. Journal of bacteriology 194: 5552–5563.

[103] O’Brien EJ, Lerman JA, Chang RL, Hyduke DR, Palsson BO (2013) Genome-scale models of metabolism and gene expression extend and reﬁne growth phenotype prediction. Molecular systems biology 9.

[104] Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO (2004) Integrat- ing high-throughput and computational data elucidates bacterial networks. Nature 429: 92–96.

[105] Gu J, Bourne PE (2009) Structural bioinformatics, volume 44. Wiley. com.

[106] Marr AG, Ingraham JL (1962) Eﬀect of temperature on the composition of fatty acids in escherichia coli. Journal of Bacteriology 84: 1260–1267.

[107] Tenaillon O, Rodr´ıguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS (2012) The molecular diversity of adaptive convergence. Science 335: 457–461.

[108] Caspi R, Altman T, Dale JM, Dreher K, Fulcher CA, Gilham F, Kaipa P, Karthikeyan AS, Kothari A, Krummenacker M, Latendresse M, Mueller LA, Paley S, Popescu L, Pujar A, Shearer AG, Zhang P, Karp PD (2010) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research 38: D473– D479.

[109] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the uniﬁcation of biology. Nature Genetics 25: 25–29.

[110] Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BØ (2011) A comprehensive genome-scale reconstruction of escherichia coli metabolism—2011. Molecular systems biology 7. 204

[111] Han JDJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJM, Cusick ME, Roth FP, Vidal M (2004) Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 430: 88–93.

[112] Dutkowski J, Kramer M, Surma MA, Balakrishnan R, Cherry JM, Krogan NJ, Ideker T (2013) A gene ontology inferred from molecular networks. Na- ture biotechnology 31: 38–45.

[113] Dolinski K, Botstein D (2013) Automating the construction of gene ontolo- gies. Nature biotechnology 31: 34–35.

[114] Feist AM, Palsson BØ (2008) The growing scope of applications of genome- scale metabolic reconstructions using escherichia coli. Nature biotechnology 26: 659–667.

[115] Palsson B (2006) Properties of reconstructed networks. Cambridge: Systems Biology .

[116] Papin JA, Price ND, Wiback SJ, Fell DA, Palsson BO (2003) Metabolic pathways in the post-genome era. Trends in biochemical sciences 28: 250– 258.

[117] Yeung M, Thiele I, Palsson BØ (2007) Estimation of the number of extreme pathways for metabolic networks. BMC bioinformatics 8: 363.

[118] De Figueiredo LF, Podhorski A, Rubio A, Kaleta C, Beasley JE, Schuster S, Planes FJ (2009) Computing the shortest elementary ﬂux modes in genome- scale metabolic networks. Bioinformatics 25: 3158–3165.

[119] Kelk SM, Olivier BG, Stougie L, Bruggeman FJ (2012) Optimal ﬂux spaces of genome-scale stoichiometric models are determined by a few subnetworks. Scientiﬁc reports 2.

[120] Llaneras F, Pic´oJ (2010) Which metabolic pathways generate and characterize the ﬂux space? a comparison among elementary modes, extreme pathways and minimal generators. BioMed Research International 2010.

[121] Kelley R, Ideker T (2005) Systematic interpretation of genetic interactions using protein networks. Nature biotechnology 23: 561–566.

[122] Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) Biogrid: a general repository for interaction datasets. Nucleic acids research 34: D535–D539. 205

[123] Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama- Castro S, Mu˜niz-RascadoL, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T, Kaipa P, Spaulding A, Pacheco J, Latendresse M, Fulcher C, Sarker M, Shearer AG, Mackie A, Paulsen I, Gunsalus RP, Karp PD (2011) EcoCyc: a comprehensive database of escherichia coli biology. Nucleic Acids Research 39: D583–D590. [124] Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Karra K, Krieger CJ, Miyasato SR, Nash RS, Park J, Skrzypek MS, Simison M, Weng S, Wong ED (2012) Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Research 40: D700–D705. [125] Green RA, Kao HL, Audhya A, Arur S, Mayers JR, Fridolfsson HN, Schul- man M, Schloissnig S, Niessen S, Laband K, Wang S, Starr DA, Hyman AA, Schedl T, Desai A, Piano F, Gunsalus KC, Oegema K (2011) A high- resolution c. elegans essential gene network based on phenotypic proﬁling of a complex tissue. Cell 145: 470–482. [126] Bass JIF, Diallo A, Nelson J, Soto JM, Myers CL, Walhout AJ (2013) Using networks to measure similarity between genes: association index selection. Nature methods 10: 1169–1176. [127] Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielin- ski DC, Bordbar A, Lewis NE, Rahmanian S, Kang J, Hyduke DR, Palsson BØ (2011) Quantitative prediction of cellular metabolism with constraint- based models: the COBRA toolbox v2.0. Nature Protocols 6: 1290–1307. [128] Bordbar A, Mo ML, Nakayasu ES, Schrimpe-Rutledge AC, Kim YM, Metz TO, Jones MB, Frank BC, Smith RD, Peterson SN, Hyduke DR, Adkins JN, Palsson BO (2012) Model-driven multi-omic data analysis elucidates metabolic immunomodulators of macrophage activation. Molecular Systems Biology 8: n/a–n/a. [129] Cho BK, Federowicz S, Park YS, Zengler K, Palsson BØ (2012) Decipher- ing the transcriptional regulatory logic of amino acid metabolism. Nature chemical biology 8: 65–71. [130] Cho BK, Federowicz SA, Embree M, Park YS, Kim D, Palsson BØ (2011) The purr regulon in escherichia coli k-12 mg1655. Nucleic acids research 39: 6456–6464. [131] Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgar- ner R, Goodlett DR, Aebersold R, Hood L (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292: 929–934. 206

[132] Modi SR, Camacho DM, Kohanski MA, Walker GC, Collins JJ (2011) Func- tional characterization of bacterial srnas using a network biology approach. Proceedings of the National Academy of Sciences 108: 15522–15527.

[133] Grant CE, Bailey TL, Noble WS (2011) Fimo: scanning for occurrences of a given motif. Bioinformatics 27: 1017–1018.

[134] Botsford JL, DeMoss R (1971) Catabolite repression of tryptophanase in escherichia coli. Journal of bacteriology 105: 303–312.

[135] Shimada T, Yamamoto K, Ishihama A (2011) Novel members of the cra regulon involved in carbon metabolism in escherichia coli. Journal of bacteriology 193: 649–659.

[136] Tramonti A, Visca P, De Canio M, Falconi M, De Biase D (2002) Functional characterization and regulation of gadx, a gene encoding an arac/xyls-like transcriptional activator of the escherichia coli glutamic acid decarboxylase system. Journal of bacteriology 184: 2603–2613.

[137] Palsson B, Zengler K (2010) The challenges of integrating multi-omic data sets. Nature Chemical Biology 6: 787–789.

[138] Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB (2011) The real cost of sequencing: higher than you think! Genome biology 12: 125.

[139] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression proﬁles. Proceedings of the National Academy of Sciences of the United States of America 102: 15545–15550.

[140] Amador-Noguez D, Feng XJ, Fan J, Roquet N, Rabitz H, Rabinowitz JD (2010) Systems-level metabolic ﬂux proﬁling elucidates a complete, bifur- cated tricarboxylic acid cycle in clostridium acetobutylicum. Journal of bacteriology 192: 4452–4461.

[141] Coleman TF, Pothen A (1986) The null space problem i. complexity. SIAM Journal on Algebraic Discrete Methods 7: 527–537.

[142] Gottlieb LA, Neylon T (2010) Matrix sparsiﬁcation and the sparse null space problem. In: Approximation, Randomization, and Combinatorial Optimiza- tion. Algorithms and Techniques, Springer. pp. 205–218.

[143] Levin JZ, Yassour M, Adiconis X, Nusbaum C, Thompson DA, Friedman N, Gnirke A, Regev A (2010) Comprehensive comparative analysis of strand- speciﬁc rna sequencing methods. Nature methods 7: 709–715. 207

[144] Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory- eﬃcient alignment of short dna sequences to the human genome. Genome Biol 10: R25.

[145] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantiﬁca- tion by rna-seq reveals unannotated transcripts and isoform switching during cell diﬀerentiation. Nature biotechnology 28: 511–515.

[146] World Health Organization (2009) Global tuberculosis control: epidemiology, strategy, ﬁnancing.

[147] Schnappinger D, Ehrt S, Voskuil MI, Liu Y, Mangan JA, Monahan IM, Dolganov G, Efron B, Butcher PD, Nathan C, Schoolnik GK (2003) Tran- scriptional adaptation of mycobacterium tuberculosis within macrophages insights into the phagosomal environment. The Journal of Experimental Medicine 198: 693–704.

[148] Barkan D, Liu Z, Sacchettini JC, Glickman MS (2009) Mycolic acid cyclo- propanation is essential for viability, drug resistance, and cell wall integrity of mycobacterium tuberculosis. Chemistry & biology 16: 499–509.

[149] Boshoﬀ HI, Barry CE (2005) Tuberculosis—metabolism and respiration in the absence of growth. Nature Reviews Microbiology 3: 70–80.

[150] Oberhardt MA, Palsson BØ, Papin JA (2009) Applications of genome-scale metabolic reconstructions. Molecular systems biology 5.

[151] Palsson B (2004) Two-dimensional annotation of genomes. Nature biotechnology 22: 1218–1219.

[152] Reed JL, Famili I, Thiele I, Palsson BO (2006) Towards multidimensional genome annotation. Nature Reviews Genetics 7: 130–141.

[153] Thiele I, Vo TD, Price ND, Palsson BØ (2005) Expanded metabolic reconstruction of helicobacter pylori (iit341 gsm/gpr): an in silico genome-scale characterization of single-and double-deletion mutants. Journal of Bacteri- ology 187: 5818–5830.

[154] Beste DJ, Hooper T, Stewart G, Bonde B, Avignone-Rossa C, Bushell ME, Wheeler P, Klamt S, Kierzek AM, McFadden J (2007) Gsmn-tb: a web-based genome-scale network model of mycobacterium tuberculosis metabolism. Genome biology 8: R89.

[155] Jamshidi N, Palsson BØ (2007) Investigating the metabolic capabilities of mycobacterium tuberculosis h37rv using the in silico strain inj661 and proposing alternative drug targets. BMC systems biology 1: 26. 208

[156] Duarte NC, Herrg˚ardMJ, Palsson BØ (2004) Reconstruction and validation of saccharomyces cerevisiae ind750, a fully compartmentalized genome-scale metabolic model. Genome research 14: 1298–1309. [157] Sheikh K, F¨orsterJ, Nielsen LK (2005) Modeling hybridoma cell metabolism using a generic genome-scale metabolic model of mus musculus. Biotechnol- ogy progress 21: 112–121. [158] Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, Herrgard MJ (2007) Quantitative prediction of cellular metabolism with constraint-based models: the cobra toolbox. Nature protocols 2: 727–738. [159] Gordon S (2007) The macrophage: past, present and future. European journal of immunology 37: S9–S17. [160] Newsholme P, Curi R, Pithon Curi T, Murphy C, Garcia C, Pires de Melo M (1999) Glutamine metabolism by lymphocytes, macrophages, and neu- trophils: its importance in health and disease. The Journal of nutritional biochemistry 10: 316–324. [161] Griscavage J, Rogers N, Sherman M, Ignarro L (1993) Inducible nitric oxide synthase from a rat alveolar macrophage cell line is inhibited by nitric oxide. The Journal of Immunology 151: 6329–6337. [162] Burke B, Lewis C (2002) The macrophage, volume 2nd. Oxford University Press. [163] Warburg O (1956) On the origin of cancer cells. Science 123: 309–314. [164] Pieters J (2008) Mycobacterium tuberculosis and the macrophage: Main- taining a balance. Cell host & microbe 3: 399–407. [165] Bentrup KHz, Russell DG (2001) Mycobacterial persistence: adaptation to a changing environment. TRENDS in Microbiology 9: 597–605. [166] McKinney JD, zu Bentrup KH, Mu˜noz-El´ıasEJ, Miczak A, Chen B, Chan WT, Swenson D, Sacchettini JC, Jacobs WR, Russell DG (2000) Persis- tence of mycobacterium tuberculosis in macrophages and mice requires the glyoxylate shunt enzyme isocitrate lyase. Nature 406: 735–738. [167] Rustad TR, Sherrid AM, Minch KJ, Sherman DR (2009) Hypoxia: a window into mycobacterium tuberculosis latency. Cellular microbiology 11: 1151– 1159. [168] Yuan Y, Crane DD, Simpson RM, Zhu Y, Hickey MJ, Sherman DR, Barry CE (1998) The 16-kda alpha-crystallin (acr) protein of mycobacterium tuberculosis is required for growth in macrophages. Proceedings of the National Academy of Sciences 95: 9578–9583. 209

[169] Talaat AM, Lyons R, Howard ST, Johnston SA (2004) The temporal expression profile of mycobacterium tuberculosis infection in mice. Proceedings of the National Academy of Sciences of the United States of America 101: 4602–4607. [170] Shi L, Sohaskey CD, Kana BD, Dawes S, North RJ, Mizrahi V, Gennaro ML (2005) Changes in energy metabolism of mycobacterium tuberculosis in mouse lung and under in vitro conditions affecting aerobic respiration. Proceedings of the National Academy of Sciences of the United States of America 102: 15629–15634. [171] Bacon J, James BW, Wernisch L, Williams A, Morley KA, Hatch GJ, Man- gan JA, Hinds J, Stoker NG, Butcher PD, Marsh PD (2004) The influence of reduced oxygen availability on pathogenicity and gene expression in mycobacterium tuberculosis. Tuberculosis 84: 205–217. [172] Hampshire T, Soneji S, Bacon J, James BW, Hinds J, Laing K, Stabler RA, Marsh PD, Butcher PD (2004) Stationary phase gene expression of mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms? Tuberculosis 84: 228–238. [173] Voskuil MI, Visconti K, Schoolnik G (2004) Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis 84: 218–227. [174] Schneemann M, Schoeden G (2007) Macrophage biology and immunology: man is not a mouse. Journal of leukocyte biology 81: 579–579. [175] Rich E, Torres M, Sada E, Finegan C, Hamilton B, Toossi Z (1997) Mycobac- terium tuberculosis (mtb)-stimulated production of nitric oxide by human alveolar macrophages and relationship of nitric oxide production to growth inhibition of mtb. Tubercle and Lung Disease 78: 247–255. [176] Constant P, Perez E, Malaga W, LanéelleMA, Saurel O, DafféM, Guilhot C (2002) Role of the pks15/1 gene in the biosynthesis of phenolglycolipids in the mycobacterium tuberculosiscomplex evidence that all strains synthesize glycosylatedp-hydroxybenzoic methyl esters and that strains devoid of phenolglycolipids harbor a frameshift mutation in thepks15/1 gene. Journal of Biological Chemistry 277: 38148–38158. [177] Sassetti CM, Rubin EJ (2003) Genetic requirements for mycobacterial survival during infection. Proceedings of the National Academy of Sciences 100: 12989–12994. [178] Sohaskey CD (2008) Nitrate enhances the survival of mycobacterium tuberculosis during inhibition of respiration. Journal of bacteriology 190: 2981– 2986. 210

[179] Pinto R, Harrison JS, Hsu T, Jacobs WR, Leyh TS (2007) Sulfite reduction in mycobacteria. Journal of bacteriology 189: 6714–6722. [180] Thuong NTT, Dunstan SJ, Chau TTH, Thorsson V, Simmons CP, Quyen NTH, Thwaites GE, Thi Ngoc Lan N, Hibberd M, Teo YY, Seielstad M, Aderem A, Farrar JJ, Hawn TR (2008) Identification of tuberculosis susceptibility genes with human macrophage gene expression profiles. PLoS Pathog 4: e1000229. [181] Itano N, Zhuo L, Kimata K (2008) Impact of the hyaluronan-rich tumor microenvironment on cancer initiation and progression. Cancer science 99: 1720–1725. [182] Hirayama Y, Yoshimura M, Ozeki Y, Sugawara I, Udagawa T, Mizuno S, Itano N, Kimata K, Tamaru A, Ogura H, Kobayashi K, Matsumoto S (2009) Mycobacteria exploit host hyaluronan for efficient extracellular replication. PLoS Pathog 5: e1000643. [183] Martineau AR, Wilkinson RJ, Wilkinson KA, Newton SM, Kampmann B, Hall BM, Packe GE, Davidson RN, Eldridge SM, Maunsell ZJ, Rainbow SJ, Berry JL, Griffiths CJ (2007) A single dose of vitamin d enhances immunity to mycobacteria. American Journal of Respiratory and Critical Care Medicine 176: 208–213. [184] Antohe F, Radulescu L, Puchianu E, Kennedy MD, Low PS, Simionescu M (2005) Increased uptake of folate conjugates by activated macrophages in experimental hyperlipemia. Cell and tissue research 320: 277–285. [185] Eoh H, Brown AC, Buetow L, Hunter WN, Parish T, Kaur D, Brennan PJ, Crick DC (2007) Characterization of the mycobacterium tuberculosis 4-diphosphocytidyl-2-c-methyl-d-erythritol synthase: potential for drug development. Journal of bacteriology 189: 8922–8927. [186] Boshoff HIM, Xu X, Tahlan K, Dowd CS, Pethe K, Camacho LR, Park TH, Yun CS, Schnappinger D, Ehrt S, Williams KJ, Barry CE (2008) Biosynthesis and recycling of nicotinamide cofactors in mycobacterium tuberculosis AN ESSENTIAL ROLE FOR NAD IN NONREPLICATING BACILLI. Journal of Biological Chemistry 283: 19329–19341. [187] Brzostek A, Pawelczyk J, Rumijowska-Galewicz A, Dziadek B, Dziadek J (2009) Mycobacterium tuberculosis is able to accumulate and utilize cholesterol. Journal of bacteriology 191: 6584–6591. [188] Hu Y, Van Der Geize R, Besra GS, Gurcha SS, Liu A, Rohde M, Singh M, Coates A (2010) 3-ketosteroid 9alpha-hydroxylase is an essential factor in the pathogenesis of mycobacterium tuberculosis. Molecular microbiology 75: 107–121. 211

[189] Kafsack BF, LlinásM (2010) Eating at the table of another: metabolomics of host-parasite interactions. Cell host & microbe 7: 90–99. [190] Jamshidi N, Palsson BO (2009) Using in silico models to simulate dual perturbation experiments: procedure development and interpretation of out- comes. BMC systems biology 3: 44. [191] AbuOun M, Suthers PF, Jones GI, Carter BR, Saunders MP, Maranas CD, Woodward MJ, Anjum MF (2009) Genome scale reconstruction of a salmonella metabolic model comparison of similarity and differences with a commensal escherichia coli strain. Journal of Biological Chemistry 284: 29480–29488. [192] Kazeros A, Harvey BG, Carolan BJ, Vanni H, Krause A, Crystal RG (2008) Overexpression of apoptotic cell removal receptor mertk in alveolar macrophages of cigarette smokers. American journal of respiratory cell and molecular biology 39: 747. [193] Newsholme P, Curi R, Gordon S, Newsholme EA (1986) Metabolism of glucose, glutamine, long-chain fatty acids and ketone bodies by murine macrophages. Biochem J 239: 121–125. [194] Sato H, Watanabe H, Ishii T, Bannai S (1987) Neutral amino acid transport in mouse peritoneal macrophages. Journal of Biological Chemistry 262: 13015–13019. [195] Curi R, Newsholme P, Newsholme EA (1988) Metabolism of pyruvate by isolated rat mesenteric lymphocytes, lymphocyte mitochondria and isolated mouse macrophages. Biochem J 250: 383–388. [196] Chang A, Scheer M, Grote A, Schomburg I, Schomburg D (2009) Brenda, amenda and frenda the enzyme information system: new content and tools in 2009. Nucleic acids research 37: D588–D592. [197] Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Math- ivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Kishore CJH, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachan- dra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A (2009) Human protein reference database—2009 update. Nucleic Acids Research 37: D767–D772. [198] Berglund L, BjörlingE, Oksvold P, Fagerberg L, Asplund A, Szigyarto CAK, Persson A, Ottosson J, WernérusH, Nilsson P, Lundberg E, Sivertsson A,˚ Navani S, Wester K, Kampf C, Hober S, PonténF, UhlénM (2008) A genecentric human protein atlas for expression profiles based on antibodies. Molecular & Cellular Proteomics 7: 2019–2027. 212

[199] Yan Q, Sad´eeW (2000) Human membrane transporter database: a web- accessible relational database for drug transport studies and pharmacoge- nomics. AAPS PharmSci 2: 11–17.

[200] Famili I, F¨orsterJ, Nielsen J, Palsson BO (2003) Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome- scale reconstructed metabolic network. Proceedings of the National Academy of Sciences 100: 13134–13139.

[201] Iyengar U, Vakil U (1985) Enumeration and structural assessment of peritoneal macrophages during progressive protein deﬁciency in rats. Journal of Biosciences 7: 15–26.

[202] Schmien R, Seiler KU, Wassermann O (1974) Drug-induced phospholipidosis. Naunyn-Schmiedeberg’s archives of pharmacology 283: 331–334.

[203] Grutman M, Orgel’ MY (1970) Nucleic acid concentration in cultures of macrophages with antigen. Bulletin of Experimental Biology and Medicine 70: 793–794.

[204] Sahu S, Lynn WS (1977) Lipid composition of human alveolar macrophages. Inﬂammation 2: 83–91.

[205] Mbawuike IN, Herscowitz HB (1989) Mh-s, a murine alveolar macrophage cell line: morphological, cytochemical, and functional characteristics. Journal of leukocyte biology 46: 119–127.

[206] Wagner D, Maser J, Lai B, Cai Z, Barry CE, zu Bentrup KH, Russell DG, Bermudez LE (2005) Elemental analysis of mycobacterium avium-, mycobacterium tuberculosis-, and mycobacterium smegmatis-containing phagosomes indicates pathogen-induced microenvironments within the host cell’s endo- somal system. The Journal of Immunology 174: 1491–1500.

[207] Storey JD, Tibshirani R (2003) Statistical methods for identifying diﬀeren- tially expressed genes in dna microarrays. In: Functional Genomics, Springer. pp. 149–157.

[208] Kaufman DE, Smith RL (1998) Direction choice for accelerated convergence in hit-and-run sampling. Operations Research 46: 84–95.

[209] Reed JL, Palsson BØ (2004) Genome-scale in silico models of e. coli have multiple equivalent phenotypic states: assessment of correlated reaction subsets that comprise network states. Genome Research 14: 1797–1805.

[210] Mosser DM (2003) The many faces of macrophage activation. Journal of leukocyte biology 73: 209–212. 213

[211] Raschke W, Baird S, Ralph P, Nakoinz I (1978) Functional macrophage cell lines transformed by abelson leukemia virus. Cell 15: 261–267.

[212] Chapekar MS, Zaremba TG, Kuester RK, Hitchins VM (1996) Synergistic induction of tumor necrosis factor alpha by bacterial lipopolysaccharide and lipoteichoic acid in combination with polytetraﬂuoroethylene particles in a murine macrophage cell line raw 264.7. Journal of biomedical materials research 31: 251–256.

[213] Scheel J, Weimans S, Thiemann A, Heisler E, Hermann M (2009) Exposure of the murine raw 264.7 macrophage cell line to hydroxyapatite dispersions of various composition and morphology: assessment of cytotoxicity, activation and stress response. Toxicology in Vitro 23: 531–538.

[214] Cirillo DM, Valdivia RH, Monack DM, Falkow S (1998) Macrophage- dependent induction of the salmonella pathogenicity island 2 type iii secretion system and its role in intracellular survival. Molecular microbiology 30: 175–188.

[215] Gutierrez MG, Master SS, Singh SB, Taylor GA, Colombo MI, Deretic V (2004) Autophagy is a defense mechanism inhibiting bcg and mycobacterium tuberculosis survival in infected macrophages. Cell 119: 753–766.

[216] Moeslinger T, Friedl R, Volf I, Brunner M, Baran H, Koller E, Spieckermann PG (1999) Urea induces macrophage proliferation by inhibition of inducible nitric oxide synthesis1. Kidney international 56: 581–588.

[217] Mosser DM, Edwards JP (2008) Exploring the full spectrum of macrophage activation. Nature Reviews Immunology 8: 958–969.

[218] Heilbronn LK, Campbell LV (2008) Adipose tissue macrophages, low grade inﬂammation and insulin resistance in human obesity. Current pharmaceutical design 14: 1225–1230.

[219] Guo S, Wietecha TA, Hudkins KL, Kida Y, Spencer MW, Pichaiwong W, Kojima I, Duﬃeld JS, Alpers CE (2011) Macrophages are essential contrib- utors to kidney injury in murine cryoglobulinemic membranoproliferative glomerulonephritis. Kidney international 80: 946–958.

[220] De Palma M, Lewis CE (2011) Macrophages limit chemotherapy. Nature 472: 303–4.

[221] Stempin CC, Dulgerian LR, Garrido VV, Cerban FM (2009) Arginase in parasitic infections: macrophage activation, immunosuppression, and intracellular signals. BioMed Research International 2010. 214

[222] Odegaard JI, Chawla A (2011) Alternative macrophage activation and metabolism. Annual review of pathology 6: 275.

[223] Mathis D, Shoelson SE (2011) Immunometabolism: an emerging frontier. Nature Reviews Immunology 11: 81–83.

[224] Chang RL, Xie L, Xie L, Bourne PE, Palsson BØ (2010) Drug oﬀ-target effects predicted using structural analysis in the context of a metabolic network model. PLoS computational biology 6: e1000938.

[225] Sakagami H, Kishino K, Amano O, Kanda Y, Kunii S, Yokote Y, Oizumi H, Oizumi T (2009) Cell death induced by nutritional starvation in mouse macrophage-like raw264. 7 cells. Anticancer research 29: 343–347.

[226] Zhuang JC, Wogan GN (1997) Growth and viability of macrophages continuously stimulated to produce nitric oxide. Proceedings of the National Academy of Sciences 94: 11875–11880.

[227] Alldridge LC, Harris HJ, Plevin R, Hannon R, Bryant CE (1999) The annexin protein lipocortin 1 regulates the mapk/erk pathway. Journal of Biological Chemistry 274: 37620–37628.

[228] Rodr´ıguez-PradosJC, TravésPG, Cuenca J, Rico D, AragonésJ, Mart´ın- Sanz P, Cascante M, BoscáL (2010) Substrate fate in activated macrophages: a comparison between innate, classic, and alternative activation. The Journal of Immunology 185: 605–614.

[229] Edwards JS, Palsson BO (2000) Robustness analysis of the escherichiacoli metabolic network. Biotechnology Progress 16: 927–939.

[230] Newsholme P, Rosa L, Newsholme E, Curi R (1996) The importance of fuel metabolism to macrophage function. Cell biochemistry and function 14: 1– 10.

[231] Murphy C, Newsholme P (1998) Importance of glutamine metabolism in murine macrophages and human monocytes to l-arginine biosynthesis and rates of nitrite or urea production. Clinical Science 95: 397–407.

[232] Baydoun A, Bogle R, Pearson J, Mann G (1993) Arginine uptake and metabolism in cultured murine macrophages. Agents and Actions 38: C127– C129.

[233] Yeramian A, Martin L, Arpa L, Bertran J, Soler C, McLeod C, Modolell M, Palac´ınM, Lloberas J, Celada A (2006) Macrophages require distinct arginine catabolism and transport systems for proliferation and for activation. European journal of immunology 36: 1516–1526. 215

[234] Fernando De Souza L, Rafaela Jardim F, Pretto Sauter I, Moreira De Souza M, Aida Bernard E (2008) High glucose increases raw 264.7 macrophages activation by lipoteichoic acid from staphylococcus aureus. Clinica chimica acta 398: 130–133.

[235] Bassit RA, Sawada LA, Bacurau RF, Navarro F, Martins Jr E, Santos RV, Caperuto EC, Rogeri P, Costa Rosa LF (2002) Branched-chain amino acid supplementation and the immune response of long-distance athletes. Nutri- tion 18: 376–379.

[236] Watanabe H, Bannai S (1987) Induction of cystine transport activity in mouse peritoneal macrophages. The Journal of experimental medicine 165: 628–640.

[237] Mellor AL, Munn DH (2004) Ido expression by dendritic cells: tolerance and tryptophan catabolism. Nature Reviews Immunology 4: 762–774.

[238] Moﬀett JR, Namboodiri MA (2003) Tryptophan and the immune response. Immunology and cell biology 81: 247–265.

[239] Oh GS, Pae HO, Choi BM, Chae SC, Lee HS, Ryu DG, Chung HT (2004) 3- hydroxyanthranilic acid, one of metabolites of tryptophan via indoleamine 2, 3-dioxygenase pathway, suppresses inducible nitric oxide synthase expression by enhancing heme oxygenase-1 expression. Biochemical and biophysical research communications 320: 1156–1162.

[240] Carbonnelle-Puscian A, Copie-Bergman C, Baia M, Martin-Garcia N, Allory Y, Haioun C, Cr´emadesA, Abd-Alsamad I, Farcet JP, Gaulard P, Castellano F, Molinier-Frenkel V (2009) The novel immunosuppressive enzyme IL4I1 is expressed by neoplastic cells of several b-cell lymphomas and by tumor- associated macrophages. Leukemia 23: 952–960.

[241] Helming L, B¨oseJ, Ehrchen J, Schiebe S, Frahm T, Geﬀers R, Probst-Kepper M, Balling R, Lengeling A (2005) alpha, 25-dihydroxyvitamin d3 is a potent suppressor of interferon gamma–mediated macrophage activation. Blood 106: 4351–4358.

[242] Overbergh L, Decallonne B, Valckx D, Verstuyf A, Depovere J, Laureys J, Rutgeerts O, Saint-Arnaud R, Bouillon R, Mathieu C (2000) Identiﬁca- tion and immune regulation of 25-hydroxyvitamin d-1-alpha-hydroxylase in murine macrophages. Clinical & Experimental Immunology 120: 139–146.

[243] Hask´oG, Cronstein BN (2004) Adenosine: an endogenous regulator of innate immunity. Trends in immunology 25: 33–39. 216

[244] Palmer T, Trevethick M (2008) Suppression of inﬂammatory and immune responses by the a2a adenosine receptor: an introduction. British journal of pharmacology 153: S27–S34.

[245] Appelberg R (2006) Macrophage nutriprive antimicrobial mechanisms. Jour- nal of leukocyte biology 79: 1117–1128.

[246] Flierl MA, Rittirsch D, Nadeau BA, Chen AJ, Sarma JV, Zetoune FS, McGuire SR, List RP, Day DE, Hoesel LM, Gao H, Van Rooijen N, Huber- Lang MS, Neubig RR, Ward PA (2007) Phagocyte-derived catecholamines enhance acute inﬂammatory injury. Nature 449: 721–725.

[247] Prabhakar SS, Zeballos GA, Montoya-Zavala M, Leonard C (1997) Urea inhibits inducible nitric oxide synthase in macrophage cell line. American Journal of Physiology-Cell Physiology 273: C1882–C1888.

[248] Yawata I, Takeuchi H, Doi Y, Liang J, Mizuno T, Suzumura A (2008) Macrophage-induced neurotoxicity is mediated by glutamate and attenu- ated by glutaminase inhibitors and gap junction inhibitors. Life sciences 82: 1111–1116.

[249] Wang D, De Vivo D (1997) Pyruvate carboxylase deﬁciency. GeneReviews at GeneTests Medical Genetics Information Resource (database online) Copy- right 2012.

[250] Jarrett SG, Milder JB, Liang LP, Patel M (2008) The ketogenic diet increases mitochondrial glutathione levels. Journal of neurochemistry 106: 1044–1051.

[251] Garai J, LórándT, MolnárV (2005) Ketone bodies affect the enzymatic activity of macrophage migration inhibitory factor. Life sciences 77: 1375– 1380.

[252] Shi L, Chowdhury SM, Smallwood HS, Yoon H, Mottaz-Brewer HM, Nor- beck AD, McDermott JE, Clauss TRW, Heﬀron F, Smith RD, Adkins JN (2009) Proteomic investigation of the time course responses of RAW 264.7 macrophages to infection with salmonella enterica. Infection and Immunity 77: 3227–3233.

[253] Cairns RA, Harris IS, Mak TW (2011) Regulation of cancer cell metabolism. Nature Reviews Cancer 11: 85–95.

[254] Vander Heiden MG (2011) Targeting cancer metabolism: a therapeutic window opens. Nature reviews Drug discovery 10: 671–684.

[255] Newell MK, Villalobos-Menuey E, Schweitzer SC, Harper ME, Camley RE (2006) Cellular metabolism as a basis for immune privilege. Journal of immune based therapies and vaccines 4: 1. 217

[256] Kominsky DJ, Campbell EL, Colgan SP (2010) Metabolic shifts in immunity and inflammation. The Journal of Immunology 184: 4062–4068. [257] Osborn O, Olefsky JM (2012) The cellular and signaling networks linking the immune system and metabolism in disease. Nature medicine 18: 363–374. [258] Chawla A, Nguyen KD, Goh YS (2011) Macrophage-mediated inflammation in metabolic disease. Nature Reviews Immunology 11: 738–749. [259] McKnight SL (2010) On getting there from here. Science Signaling 330: 1338. [260] Shell SA, Hesse C, Morris SM, Milcarek C (2005) Elevated levels of the 64-kda cleavage stimulatory factor (cstf-64) in lipopolysaccharide-stimulated macrophages influence gene expression and induce alternative poly (a) site selection. Journal of Biological Chemistry 280: 39950–39961. [261] Yates III JR, Eng JK, McCormack AL, Schieltz D (1995) Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Analytical chemistry 67: 1426–1436. [262] Monroe ME, TolićN, Jaitly N, Shaw JL, Adkins JN, Smith RD (2007) Viper: an advanced software package to support high-throughput lc-ms peptide identification. Bioinformatics 23: 2021–2023. [263] Polpitiya AD, Qian WJ, Jaitly N, Petyuk VA, Adkins JN, Camp DG, Ander- son GA, Smith RD (2008) Dante: a statistical tool for quantitative analysis of-omics data. Bioinformatics 24: 1556–1558. [264] Jones MB, Peterson SN, Benn R, Braisted JC, Jarrahi B, Shatzkes K, Ren D, Wood TK, Blaser MJ (2010) Role of luxs in bacillus anthracis growth and virulence factor expression. Virulence 1: 72–83. [265] Kim YM, Metz TO, Hu Z, Wiedner SD, Kim JS, Smith RD, Morgan WF, Zhang Q (2011) Formation of dehydroalanine from mimosine and cysteine: artifacts in gas chromatography/mass spectrometry based metabolomics. Rapid Communications in Mass Spectrometry 25: 2561–2564. [266] Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Toma- shevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A (2011) NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Research 39: D1005– D1010. [267] Auberry KJ, Kiebel GR, Monroe ME, Adkins JN, Anderson GA, Smith RD (2010) Omics. pnl. gov: A portal for the distribution and sharing of multi- disciplinary pan-omics information. Journal of proteomics & bioinformatics 3: 1. 218

[268] Reed JL, Vo TD, Schilling CH, Palsson BO (2003) An expanded genome- scale model of escherichia coli k-12 (ijr904 gsm/gpr). Genome Biol 4: R54.

[269] F¨orsterJ, Famili I, Fu P, Palsson BØ, Nielsen J (2003) Genome-scale reconstruction of the saccharomyces cerevisiae metabolic network. Genome research 13: 244–253.

[270] Gille C, BöllingC, Hoppe A, Bulik S, Hoffmann S, HübnerK, Karlstädt A, Ganeshan R, KönigM, Rother K, Weidlich M, Behre J, HolzhütterHG (2010) HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology. Molecular Systems Biology 6: n/a–n/a.

[271] Rapoport TA, Heinrich R, Jacobasch G, Rapoport S (1974) A linear steady- state treatment of enzymatic chains. European Journal of Biochemistry 42: 107–120.

[272] Brumen M, Heinrich R (1984) A metabolic osmotic model of human erythrocytes. Biosystems 17: 155–169.

[273] Joshi A, Palsson BO (1989) Metabolic dynamics in the human red cell: Part i—a comprehensive kinetic model. Journal of theoretical biology 141: 515– 528.

[274] Mulquiney P, Bubb W, Kuchel P (1999) Model of 2, 3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations1: in vivo kinetic characterization of 2, 3-bisphosphoglycerate synthase/phosphatase using 13c and 31p nmr. Biochem J 342: 567–580.

[275] Schauer M, Heinrich R, Rapoport S (1981) Mathematische modellierung der glykolyse und des adeninnukleotidstoﬀwechsels menschlicher erythrozyten. Acta Biological er Medica Germanica 40: 1659–1682.

[276] Jamshidi N, Edwards JS, Fahland T, Church GM, Palsson BO (2001) Dy- namic simulation of the human red blood cell metabolic network. Bioinfor- matics 17: 286–287.

[277] Kinoshita A, Tsukada K, Soga T, Hishiki T, Ueno Y, Nakayama Y, Tomita M, Suematsu M (2007) Roles of hemoglobin allostery in hypoxia-induced metabolic alterations in erythrocytes simulation and its veriﬁcation by metabolome analysis. Journal of Biological Chemistry 282: 10731–10741.

[278] Raftos JE, Whillier S, Kuchel PW (2010) Glutathione synthesis and turnover in the human erythrocyte alignment of a model based on detailed enzyme kinetics with experimental data. Journal of Biological Chemistry 285: 23557– 23567. 219

[279] Roux-Dalvai F, Peredo AGd, Sim´oC, Guerrier L, Bouyssi´eD, Zanella A, Citterio A, Burlet-Schiltz O, Boschetti E, Righetti PG, Monsarrat B (2008) Extensive analysis of the cytoplasmic proteome of human erythrocytes using the peptide ligand library technology and advanced mass spectrometry. Molecular & Cellular Proteomics 7: 2254–2269.

[280] Pasini EM, Kirkegaard M, Mortensen P, Lutz HU, Thomas AW, Mann M (2006) In-depth analysis of the membrane and cytosolic proteome of red blood cells. Blood 108: 791–801.

[281] Low TY, Seow TK, Chung M (2002) Separation of human erythrocyte membrane associated proteins with one-dimensional and two-dimensional gel electrophoresis followed by identiﬁcation with matrix-assisted laser desorption/ionization-time of ﬂight mass spectrometry. Proteomics 2: 1229– 1239.

[282] Goodman SR, Kurdia A, Ammann L, Kakhniashvili D, Daescu O (2007) The human red blood cell proteome and interactome. Experimental Biology and Medicine 232: 1391–1408.

[283] Vo TD, Greenberg HJ, Palsson BO (2004) Reconstruction and functional characterization of the human mitochondrial metabolic network based on proteomic and biochemical data. Journal of Biological Chemistry 279: 39532–39540.

[284] Wiback SJ, Palsson BO (2002) Extreme pathway analysis of human red blood cell metabolism. Biophysical Journal 83: 808–818.

[285] Lou LL, Clarke S (1987) Enzymatic methylation of band 3 anion transporter in intact human erythrocytes. Biochemistry 26: 52–59.

[286] Allan D, Michell RH (1976) Production of 1, 2-diacylglycerol in human erythrocyte membranes exposed to low concentrations of calcium ions. Biochim- ica et Biophysica Acta (BBA)-Biomembranes 455: 824–830.

[287] Arduini A, Mancinelli G, Radatti G, Dottori S, Molajoni F, Ramsay R (1992) Role of carnitine and carnitine palmitoyltransferase as integral components of the pathway for membrane phospholipid fatty acid turnover in intact human erythrocytes. Journal of Biological Chemistry 267: 12673–12681.

[288] Berridge MJ, Irvine RF (1984) Inositol trisphosphate, a novel second mes- senger in cellular signal transduction. Nature .

[289] Tsukamoto T, Sonenberg M (1979) Catecholamine regulation of human erythrocyte membrane protein kinase. Journal of Clinical Investigation 64: 534. 220

[290] McGee J, Fitzpatrick F (1986) Erythrocyte-neutrophil interactions: formation of leukotriene b4 by transcellular biosynthesis. Proceedings of the Na- tional Academy of Sciences 83: 1349–1353.

[291] Inoue K, Ohbora Y, Yamasawa K (1978) Metabolism of acetaldehyde by human erythrocytes. Life sciences 23: 179–183.

[292] Becker S, Price N, Palsson B (2006) Metabolite coupling in genome-scale metabolic networks. BMC bioinformatics 7: 111.

[293] Graham JM, Peerson JM, Haskell MJ, Shrestha RK, Brown KH, Allen LH (2005) Erythrocyte riboflavin for the detection of riboflavin deficiency in pregnant nepali women. Clinical chemistry 51: 2162–2165.

[294] Baines M, Davies G (1988) The evaluation of erythrocyte thiamin diphosphate as an indicator of thiamin status in man, and its comparison with erythrocyte transketolase activity measurements. Annals of clinical biochemistry 25: 698–705.

[295] Agarwal DP, Tobar-Rojas L, Harada S, Werner Goedde H (1983) Compar- ative study of erythrocyte aldehyde dehydrogenase in alcoholics and control subjects. Pharmacology Biochemistry and Behavior 18: 89–95.

[296] Murakami K, Takahito K, Ohtsuka Y, Fujiwara Y, Shimada M, Kawakami Y (1989) Impairment of glutathione metabolism in erythrocytes from patients with diabetes mellitus. Metabolism 38: 753–758.

[297] Prabakaran S, Wengenroth M, Lockstone HE, Lilley K, Leweke FM, Bahn S (2007) 2-d dige analysis of liver and red blood cells provides further evidence for oxidative stress in schizophrenia. Journal of proteome research 6: 141– 149.

[298] Dadoly J (2007) The Merck Manual of Diagnosis and Therapy, volume 18. Elsevier.

[299] Shifman S, Bronstein M, Sternfeld M, Pisanté-ShalomA, Lev-Lehman E, Weizman A, Reznik I, Spivak B, Grisaru N, Karp L, Schiffer R, Kotler M, Strous RD, Swartz-Vanetik M, Knobler HY, Shinar E, Beckmann JS, Yakir B, Risch N, Zak NB, Darvasi A (2002) A highly significant association between a COMT haplotype and schizophrenia. The American Journal of Human Genetics 71: 1296–1302.

[300] Usaite R, Patil KR, Grotkjær T, Nielsen J, Regenberg B (2006) Global transcriptional and physiological responses of saccharomyces cerevisiae to ammonium, l-alanine, or l-glutamine limitation. Applied and environmental microbiology 72: 6194–6203. 221

[301] Herrg˚ardMJ, Lee BS, Portnoy V, Palsson BØ (2006) Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in saccharomyces cerevisiae. Genome research 16: 627–635.

[302] Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S, Mandal R, Sinelnikov I, Xia J, Jia L, Cruz JA, Lim E, Sobsey CA, Shrivastava S, Huang P, Liu P, Fang L, Peng J, Fradette R, Cheng D, Tzur D, Clements M, Lewis A, Souza AD, Zuniga A, Dawe M, Xiong Y, Clive D, Greiner R, Nazyrova A, Shaykhutdinov R, Li L, Vogel HJ, Forsythe I (2009) HMDB: a knowledgebase for the human metabolome. Nucleic Acids Research 37: D603–D610.

[303] Surgenor D (1974) The red blood cell. Access Online via Elsevier.

[304] Amberger J, Bocchini CA, Scott AF, Hamosh A (2009) Mckusick’s online mendelian inheritance in man (omim). Nucleic acids research 37: D793– D796.

[305] Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M (2008) Drugbank: a knowledgebase for drugs, drug actions and drug targets. Nucleic acids research 36: D901–D906.

[306] Sigurdsson M, Jamshidi N, Jonsson JJ, Palsson BØ (2009) Genome-scale network analysis of imprinted human metabolic genes. Epigenetics 4: 43– 46.

[307] Shlomi T, Cabili MN, Ruppin E (2009) Predicting metabolic biomarkers of human inborn errors of metabolism. Molecular systems biology 5.

[308] Ward LD, Kellis M (2012) Interpreting noncoding genetic variation in complex traits and human disease. Nature biotechnology 30: 1095–1106.

[309] McCarthy JJ, McLeod HL, Ginsburg GS (2013) Genomic medicine: a decade of successes, challenges, and opportunities. Science translational medicine 5: 189sr4–189sr4.

[310] Soon WW, Hariharan M, Snyder MP (2013) High-throughput sequencing for biology and medicine. Molecular systems biology 9.

[311] Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N, Cunningham JM, de Andrade M, Feenstra B, Feingold E, Hayes MG, Hill WG, Landi MT, Alonso A, Lettre G, Lin P, Ling H, Lowe W, Mathias RA, Melbye M, Pugh E, Cornelis MC, Weir BS, Goddard ME, Visscher PM (2011) Genome parti- tioning of genetic variation for complex traits using common SNPs. Nature Genetics 43: 519–525. 222

[312] Anderson CA, Boucher G, Lees CW, Franke A, D’Amato M, Taylor KD, Lee JC, Goyette P, Imielinski M, Latiano A, LagacéC, Scott R, Amininejad L, Bumpstead S, Baidoo L, Baldassano RN, Barclay M, Bayless TM, Brand S, BüningC, Colombel JF, Denson LA, De Vos M, Dubinsky M, Edwards C, Ellinghaus D, Fehrmann RSN, Floyd JAB, Florin T, Franchimont D, Franke L, Georges M, Glas J, Glazer NL, Guthery SL, Haritunians T, Hayward NK, Hugot JP, Jobin G, Laukens D, Lawrance I, LémannM, Levine A, Libioulle C, Louis E, McGovern DP, Milla M, Montgomery GW, Morley KI, Mowat C, Ng A, Newman W, Ophoff RA, Papi L, Palmieri O, Peyrin- Biroulet L, PanésJ, Phillips A, Prescott NJ, Proctor DD, Roberts R, Russell R, Rutgeerts P, Sanderson J, Sans M, Schumm P, Seibold F, Sharma Y, Simms LA, Seielstad M, Steinhart AH, Targan SR, van den Berg LH, Vatn M, Verspaget H, Walters T, Wijmenga C, Wilson DC, Westra HJ, Xavier RJ, Zhao ZZ, Ponsioen CY, Andersen V, Torkvist L, Gazouli M, Anagnou NP, Karlsen TH, Kupcinskas L, Sventoraityte J, Mansfield JC, Kugathasan S, Silverberg MS, Halfvarson J, Rotter JI, Mathew CG, Griffiths AM, Gearry R, Ahmad T, Brant SR, Chamaillard M, Satsangi J, Cho JH, Schreiber S, Daly MJ, Barrett JC, Parkes M, Annese V, Hakonarson H, Radford-Smith G, Duerr RH, Vermeire S, Weersma RK, Rioux JD (2011) Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature Genetics 43: 246–252.

[313] Suhre K, Shin SY, Petersen AK, Mohney RP, Meredith D, WägeleB, Alt- maier E, CARDIoGRAM, Deloukas P, Erdmann J, Grundberg E, Hammond CJ, de Angelis MH, KastenmüllerG, KöttgenA, Kronenberg F, Mangino M, Meisinger C, Meitinger T, Mewes HW, Milburn MV, Prehn C, Raffler J, Ried JS, Römisch-Margl W, Samani NJ, Small KS, Erich Wichmann H, Zhai G, Illig T, Spector TD, Adamski J, Soranzo N, Gieger C (2011) Human metabolic individuality in biomedical and pharmaceutical research. Nature 477: 54–60.

[314] Majewski J, Pastinen T (2011) The study of eqtl variations by rna-seq: from snps to phenotypes. Trends in genetics 27: 72–79.

[315] Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK (2010) Understanding mechanisms underlying human gene expression variation with rna sequencing. Nature 464: 768–772.

[316] Fellay J, Thompson AJ, Ge D, Gumbs CE, Urban TJ, Shianna KV, Little LD, Qiu P, Bertelsen AH, Watson M, Warner A, Muir AJ, Brass C, Albrecht J, Sulkowski M, McHutchison JG, Goldstein DB (2010) ITPA gene variants protect against anaemia in patients treated for chronic hepatitis c. Nature 464: 405–408. 223

[317] Suhre K, Gieger C (2012) Genetic variation in metabolic phenotypes: study designs and applications. Nature Reviews Genetics 13: 759–769.

[318] Rhee EP, Ho JE, Chen MH, Shen D, Cheng S, Larson MG, Ghorbani A, Shi X, Helenius IT, O’Donnell CJ, Souza AL, Deik A, Pierce KA, Bullock K, Walford GA, Vasan RS, Florez JC, Clish C, Yeh JRJ, Wang TJ, Gerszten RE (2013) A genome-wide association study of the human metabolome in a community-based cohort. Cell Metabolism 18: 130–143.

[319] Bordbar A, Monk JM, King ZA, Palsson BO (2014) Constraint-based models predict metabolic and associated cellular functions. Nature Reviews Genetics 15: 107–120.

[320] Jamshidi N, Miller FJ, Mandel J, Evans T, Kuo MD (2011) Individualized therapy of hht driven by network analysis of metabolomic proﬁles. BMC systems biology 5: 200.

[321] Jamshidi N, Palsson BØ (2006) Systems biology of snps. Molecular systems biology 2.

[322] Agren R, Mardinoglu A, Asplund A, Kampf C, Uhlen M, Nielsen J (2014) Identiﬁcation of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling. Molecular systems biology 10.

[323] Jamshidi N, Wiback SJ, Palsson BØ (2002) In silico model-driven assessment of the eﬀects of single nucleotide polymorphisms (snps) on human red blood cell metabolism. Genome research 12: 1687–1692.

[324] Saucerman JJ, Healy SN, Belik ME, Puglisi JL, McCulloch AD (2004) Proar- rhythmic consequences of a kcnq1 akap-binding domain mutation computational models of whole cells and heterogeneous tissue. Circulation research 95: 1216–1224.

[325] Teusink B, Passarge J, Reijenga CA, Esgalhado E, van der Weijden CC, Schepper M, Walsh MC, Bakker BM, van Dam K, Westerhoﬀ HV, Snoep JL (2000) Can yeast glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? testing biochemistry. European Journal of Biochemistry 267: 5313–5329.

[326] Jamshidi N, Palsson BØ (2010) Mass action stoichiometric simulation models: incorporating kinetics and regulation into stoichiometric models. Bio- physical journal 98: 175–185.

[327] Jamshidi N, Palsson BØ (2008) Formulating genome-scale kinetic models in the post-genome era. Molecular systems biology 4. 224

[328] Palsson B (2011) Systems biology: simulation of dynamic network states. Cambridge University Press.

[329] Bordbar A, Jamshidi N, Palsson BO (2011) iab-rbc-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC systems biology 5: 110.

[330] Tan Y, Liao JC (2012) Metabolic ensemble modeling for strain engineers. Biotechnology journal 7: 343–353.

[331] Jol SJ, K¨ummelA, Hatzimanikatis V, Beard DA, Heinemann M (2010) Ther- modynamic calculations for biochemical transport and reaction processes in metabolic networks. Biophysical journal 99: 3139–3144.

[332] Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Lane CR, Lim EP, Kalyanaraman N, Nemesh J, Ziaugra L, Friedland L, Rolfe A, War- rington J, Lipshutz R, Daley GQ, Lander ES (1999) Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genetics 22: 231–238.

[333] Rua˜noG, Bernene J, Windemuth A, Bower B, Wencker D, Seip RL, Kocherla M, Holford TR, Petit WA, Hanks S (2009) Physiogenomic comparison of edema and bmi in patients receiving rosiglitazone or pioglitazone. Clinica Chimica Acta 400: 48–55.

[334] Ritchie GR, Dunham I, Zeggini E, Flicek P (2014) Functional annotation of noncoding sequence variants. Nature methods 11: 294–296.

[335] Palsson B, Joshi A, Ozturk S (1987) Reducing complexity in metabolic networks: making metabolic meshes manageable. In: Federation proceedings. volume 46, pp. 2485-2489.

[336] Jamshidi N, Palsson BØ (2009) Flux-concentration duality in dynamic nonequilibrium biological networks. Biophysical journal 97: L11–L13.

[337] Jamshidi N, Palsson BØ (2008) Top-down analysis of temporal hierarchy in biochemical reaction networks. PLoS computational biology 4: e1000177.

[338] Russmann S, Grattagliano I, Portincasa P, Palmieri VO, Palasciano G (2006) Ribavirin-induced anemia: mechanisms, risk factors and related targets for future research. Current medicinal chemistry 13: 3351–3357.

[339] Homma M, Matsuzaki Y, Inoue Y, Shibata M, Mitamura K, Tanaka N, Kohda Y (2004) Marked elevation of erythrocyte ribavirin levels in interferon and ribavirin-induced anemia. Clinical gastroenterology and hepatology 2: 337–339. 225

[340] Van Waeg G, Niklasson F, Ericson A,˚ De Verdier CH (1988) Purine metabolism in normal and itp-pyrophosphohydrolase-deﬁcient human erythrocytes. Clinica chimica acta 171: 279–292.

[341] De Franceschi L, Fattovich G, Turrini F, Ayi K, Brugnara C, Manzato F, Noventa F, Stanzial AM, Solero P, Corrocher R (2000) Hemolytic anemia induced by ribavirin therapy in patients with chronic hepatitis c virus infection: role of membrane oxidative damage. Hepatology 31: 997–1004.

[342] Brochot E, Castelain S, Duverlie G, Capron D, Nguyen-Khac E, Fran¸coisC (2010) Ribavirin monitoring in chronic hepatitis c therapy: anaemia versus eﬃcacy. Antivir Ther 15: 687–95.

[343] Bodenheimer H, Lindsay KL, Davis GL, Lewis JH, Thung SN, Seeﬀ LB (1997) Tolerance and eﬃcacy of oral ribavirin treatment of chronic hepatitis c: a multicenter trial. Hepatology 26: 473–477.

[344] Mills GC, Levin WC, Alperin JB (1968) Hemolytic anemia associated with low erythrocyte atp. Blood 32: 15–32.

[345] Minakami S, Yoshikawa H (1966) Studies on erythrocyte glycolysis iii. the eﬀects of active cation transport, ph and inorganic phosphate concentration on erythrocyte glycolysis. Journal of biochemistry 59: 145–150.

[346] Hitomi Y, Cirulli ET, Fellay J, McHutchison JG, Thompson AJ, Gumbs CE, Shianna KV, Urban TJ, Goldstein DB (2011) Inosine triphosphate protects against ribavirin-induced adenosine triphosphate loss by adenylosuccinate synthase function. Gastroenterology 140: 1314–1321.

[347] Tong LV (2008) Development and application of mass spectrometry-based metabolomics methods for disease biomarker identiﬁcation. Ph.D. thesis, Massachusetts Institute of Technology.

[348] McCloskey D, Gangoiti JA, King ZA, Naviaux RK, Barshop BA, Palsson BO, Feist AM (2013) A model-driven quantitative metabolomics analysis of aerobic and anaerobic metabolism in e. coli k-12 mg1655 that is biochemically and thermodynamically consistent. Biotechnology and bioengineering .

[349] D’Alessandro A, Righetti PG, Zolla L (2010) The red blood cell proteome and interactome: an update. Journal of proteome research 9: 144-163.

[350] Noor E, Bar-Even A, Flamholz A, Lubling Y, Davidi D, Milo R (2012) An integrated open framework for thermodynamics of reactions that combines accuracy and coverage. Bioinformatics 28.

[351] M¨ortersP, Peres Y (2010) Brownian motion, volume 30. Cambridge Univer- sity Press. 226

[352] Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival Jr B, Assad-Garcia N, Glass JI, Covert MW (2012) A whole-cell computational model predicts phenotype from genotype. Cell 150: 389–401.

[353] Thiele I, Swainston N, Fleming RMT, Hoppe A, Sahoo S, Aurich MK, Har- aldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, BöllingC, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le NovèreN, Malys N, Mazein A, Papin JA, Price ND, Selkov Sr E, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JHGM, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Palsson BØ (2013) A community-driven global reconstruction of human metabolism. Nature Biotechnology 31: 419–425.

[354] Borenstein E (2012) Computational systems biology and in silico modeling of the human microbiome. Brieﬁngs in bioinformatics 13: 769–780.

[355] Levy R, Borenstein E (2013) Metabolic modeling of species interaction in the human microbiome elucidates community-level assembly rules. Proceedings of the National Academy of Sciences 110: 12804–12809.