The Pennsylvania State University
The Graduate School
COMPUTATIONAL METHODS FOR DESIGNING DE NOVO METABOLIC PATHWAYS AND GENOME-MINIMIZED CHASSIS STRAINS

A Dissertation in Chemical Engineering
by
Lin Wang

© 2020 Lin Wang

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

December 2020

The dissertation of Lin Wang was reviewed and approved by the following:

Costas D. Maranas
Donald B. Broughton Professor of Chemical Engineering
Dissertation Advisor
Chair of Committee

Andrew L. Zydney
Distinguished Professor of Chemical Engineering

Antonios Armaou
Professor of Chemical Engineering

Andrew D. Patterson
Associate Professor of Molecular Toxicology

Phillip E. Savage
Professor of Chemical Engineering
Head of the Department and Walter L. Robb Family Chair

ABSTRACT

Metabolic pathways reflect an organism's chemical repertoire, and hence their elucidation and design have been a primary goal in metabolic engineering. Various computational methods have been developed to design novel metabolic pathways while taking into account several prerequisites such as pathway stoichiometry, thermodynamics, host compatibility, and enzyme availability. The choice of method is often determined by the nature of the metabolites of interest and the preferred host organism, along with computational complexity and the availability of software tools. In Chapter 1, we review different computational approaches used to design metabolic pathways, organized by the reaction network representation of the database (i.e., graph or stoichiometric matrix) and the search algorithm (i.e., graph search, flux balance analysis, or retrosynthetic search). We also put forth a systematic workflow that can be implemented in projects requiring pathway design, and we highlight current limitations and obstacles in computational pathway design.
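To make the two network representations named above concrete, the following minimal Python sketch encodes a hypothetical three-metabolite network as a stoichiometric matrix, derives a graph view from it, and runs a breadth-first search over that graph. The metabolites A, B, C, the reactions r1 to r3, and all function names are illustrative assumptions; they are not taken from any database or tool reviewed in Chapter 1.

```python
from collections import deque

# Hypothetical toy network: A -> B (r1), B -> C (r2), A -> C (r3).
# Rows are metabolites, columns are reactions; negative = consumed.
METABOLITES = ["A", "B", "C"]
REACTIONS = ["r1", "r2", "r3"]
S = [
    [-1,  0, -1],   # A: consumed by r1 and r3
    [ 1, -1,  0],   # B: produced by r1, consumed by r2
    [ 0,  1,  1],   # C: produced by r2 and r3
]

def net_production(S, v):
    """Return S @ v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def to_graph(S, metabolites, reactions):
    """Graph view: one edge substrate -> product per reaction."""
    edges = {m: [] for m in metabolites}
    for j, rxn in enumerate(reactions):
        subs = [metabolites[i] for i in range(len(S)) if S[i][j] < 0]
        prods = [metabolites[i] for i in range(len(S)) if S[i][j] > 0]
        for s in subs:
            for p in prods:
                edges[s].append((p, rxn))
    return edges

def shortest_route(edges, source, target):
    """Breadth-first search for the route using the fewest reactions."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, route = queue.popleft()
        if node == target:
            return route
        for nxt, rxn in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, route + [rxn]))
    return None
```

With unit flux through r1 and r2, `net_production(S, [1, 1, 0])` evaluates to `[-1, 0, 1]`: A is consumed, C is produced, and the intermediate B is balanced, which is the steady-state condition that stoichiometry-based searches enforce. The graph search returns the one-step route `["r3"]` from A to C, but on its own it carries no information about cofactor or co-metabolite balance.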
One of the challenges of existing computational de novo pathway design tools is that important considerations such as seamless blending of known transformations with novel steps, complexity of pathway topology, mass conservation, cofactor balance, thermodynamic feasibility, microbial chassis selection, and cost are largely dealt with in an a posteriori fashion. In Chapter 2, we describe a computational algorithm to design bioconversion routes while simultaneously considering any combination of the aforementioned design criteria. First, we track and codify as rules all reaction centers using a novel prime factorization based encoding technique (rePrime). Reaction rules and known biotransformations are then used simultaneously by the pathway design algorithm (novoStoic) to trace both metabolites and molecular moieties through balanced bioconversion strategies. We demonstrate the use of novoStoic in bypassing steps in existing pathways through the use of novel transformations, assembling complex pathways that blend known and novel steps towards pharmaceuticals, and postulating ways to biodegrade xenobiotics.

Another challenge for computational pathway design tools is balancing the stoichiometry of co-metabolites and cofactors while handling reaction rule utilization in a single workflow. In Chapter 3, we present a workflow that uses two complementary stoichiometry-based pathway design tools, optStoic and novoStoic, to tackle these challenges. optStoic first determines the stoichiometry of the overall conversion that optimizes a performance criterion (e.g., high carbon/energy efficiency) while ensuring a comprehensive search of co-metabolites and cofactors. The procedure then identifies the minimum number of intervening reactions needed to connect the source and sink metabolites. We further extend the pathway design procedure by expanding the search space to include both known and hypothetical reactions, represented by reaction rules, in a new tool termed novoStoic.
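The first optStoic step described above, fixing an overall stoichiometry before any pathway is searched, can be illustrated with a small Python sketch. It assumes a toy conversion of one glucose to isobutanol with O2, CO2, and H2O as candidate co-metabolites, and it uses brute-force enumeration of small integer coefficients in place of an optimization formulation. The species set, the coefficient bound, and the function names are assumptions made for illustration; thermodynamics and cofactors are ignored.

```python
from itertools import product

# Elemental composition (C, H, O) of each species.
FORMULA = {
    "glucose":    (6, 12, 6),
    "O2":         (0, 0, 2),
    "isobutanol": (4, 10, 1),
    "CO2":        (1, 0, 2),
    "H2O":        (0, 2, 1),
}

def balanced(coeffs):
    """True if the signed coefficients conserve C, H and O."""
    return all(
        sum(n * FORMULA[sp][k] for sp, n in coeffs.items()) == 0
        for k in range(3)
    )

def best_overall_stoichiometry(max_coeff=6):
    """Enumerate balanced conversions of one glucose and keep the
    one that yields the most isobutanol (the performance criterion)."""
    best = None
    for o2, ibu, co2, h2o in product(range(max_coeff + 1), repeat=4):
        coeffs = {
            "glucose": -1,      # substrate, fixed at one mole
            "O2": -o2,          # optional co-substrate
            "isobutanol": ibu,  # target product
            "CO2": co2,         # candidate co-products
            "H2O": h2o,
        }
        if balanced(coeffs) and (best is None or ibu > best["isobutanol"]):
            best = coeffs
    return best
```

Under these assumptions only two balanced conversions exist within the bound: complete oxidation (6 O2 in, 6 CO2 and 6 H2O out) and glucose → isobutanol + 2 CO2 + H2O with no O2 uptake. The enumeration selects the latter because it maximizes isobutanol yield.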
We demonstrate the use of the two computational tools in pathway elucidation by designing novel synthetic routes for isobutanol.

The heterogeneity of the aromatic products originating from lignin catalytic depolymerization remains one of the major challenges associated with lignin valorization. Microbes have evolved catabolic pathways that can funnel heterogeneous intermediates to a few central aromatic products. These aromatic compounds can subsequently undergo intra- or extradiol ring opening to produce value-added chemicals. However, such funneling pathways are only partially characterized, and only for a few organisms such as Sphingobium sp. SYK-6 and Pseudomonas putida KT2440. In Chapter 4, we apply the pathway design workflow (optStoic and novoStoic) to computationally prospect all possible ways of funneling lignin-derived mono- and biaryls. We demonstrate the application for (i) designing alternative pathways for funneling S, G, and H lignin monomers, and (ii) exploring cleavage pathways of β-1 and β-β dimers. By exploring the uncharted chemical space afforded by enzyme promiscuity, novoStoic can help discover previously unknown native pathways and propose new carbon/energy-efficient lignin funneling pathways with few heterologous enzymes.

Genome-minimized strains offer advantages as bacterial chassis by reducing transcriptional cost, eliminating competing functions, and limiting unwanted regulatory interactions with any heterologous metabolic pathways introduced into the production host. Existing approaches for identifying stretches of DNA to remove are largely ad hoc, relying on information about presumably dispensable regions obtained from experimentally determined non-essential genes and comparative genomics.
In Chapter 5, we introduce a versatile genome reduction algorithm, MinGenome, that uses a mixed-integer linear program (MILP) to iteratively identify the largest dispensable contiguous sequences whose removal does not affect the organism’s growth or other desirable traits. Known essential genes, and genes whose deletion causes significant fitness or performance loss, are flagged and their deletion is prohibited. MinGenome also preserves the required transcription factors and promoter regions, ensuring that retained genes are properly transcribed, while avoiding the simultaneous deletion of synthetic lethal pairs. We also explore the potential benefit of removing even larger contiguous stretches of DNA when only one or two essential genes (to be re-inserted elsewhere) lie within the deleted sequence. We applied the algorithm to design a minimized E. coli strain and found that it recapitulates the long deletions identified in previous experimental studies.

In Chapter 6, we summarize the metabolic pathway design and bacterial chassis design efforts and discuss future perspectives for integrating thermodynamic analysis into the pathway design workflow.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

Chapter 1 A Review of Computational Tools for Design and Reconstruction of Metabolic Pathways
  1.1. Introduction
  1.2. Generalized in silico pathway design workflow
    1.2.1. Databases
    1.2.2. Representation of the database (metabolic network)
    1.2.3. Network pruning
    1.2.4. Search algorithms
    1.2.5. Pathway ranking
  1.3. DNA sequence selection, protein engineering, and de novo enzyme design
  1.4. Perspective
  1.5. References

Chapter 2 Pathway Design Using De Novo Steps Through Uncharted Biochemical Spaces
  2.1. Introduction
  2.2. Results
    2.2.1. An illustrative example for rePrime and novoStoic.
    2.2.2. 1,4-Butanediol synthesis.
    2.2.3. Phenylephrine synthesis.
    2.2.4. Oxidative degradation of benzo[a]pyrene to catechol.
  2.3. Discussion
  2.4. Methods