Open Farasat Phd Dissertation.Pdf

The Pennsylvania State University
The Graduate School
Department of Chemical Engineering

SEQUENCE-TO-FUNCTION MODELS FOR EFFICIENT OPTIMIZATION OF METABOLIC PATHWAYS AND GENETIC CIRCUITS

A Dissertation in Chemical Engineering
by Iman Farasat
2015

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
August 2015

The dissertation of Iman Farasat was reviewed and approved* by the following:

Howard M. Salis
Assistant Professor of Chemical Engineering and Biological Engineering
Dissertation Advisor
Chair of Committee

Costas D. Maranas
Donald B. Broughton Professor of Chemical Engineering

Ali Borhan
Professor of Chemical Engineering

Timothy C. Meredith
Assistant Professor of Biochemistry and Molecular Biology

Janna K. Maranas
Professor of Chemical Engineering
Chair of Graduate Program

*Signatures are on file in the Graduate School

ABSTRACT

A central challenge in creating a biorenewable economy is the economically competitive production of valuable chemicals from renewable feedstocks. The latest advances in metabolic engineering and synthetic biology have yielded engineered microbes that manufacture hydrocarbon fuels, biodegradable plastics, and drugs. This has mainly been achieved by either manipulating a natural host’s metabolic network or by expressing heterologous enzymes that convert one of the host’s metabolites to the product of interest. Recent attempts in synthetic biology and metabolic engineering have focused on identifying well-defined and beneficial mutations to extend an organism’s metabolic capabilities. Technical developments in DNA synthesis and sequencing have enabled the rapid assembly of synthetic metabolic modules, direct genome mutagenesis, and the construction of synthetic bacterial genomes. These advances provide almost complete control over a microorganism’s DNA sequence, enabling the insertion of heterologous proteins and the variation of an organism’s protein expression levels.
However, the space of possible DNA mutations is extremely large and cannot be brute-forced to optimize target genetic systems; for instance, the number of possible ways to mutate 100 base pairs of DNA is larger than the number of atoms on Earth. Most mutations have neutral or negative effects on the target genetic system’s output, and the beneficial mutations must be identified by rational predictive design or found by exhaustive experimental study. The absence of a quantitative and predictive theory that relates a genetic system’s DNA sequence to its output phenotypic behavior has hampered our ability to design and optimize large genetic systems.

Here, we employ biophysical modeling to build quantitative maps that relate a genetic system’s DNA sequence to the expression rate of its genetic elements and its final phenotypic activity. We use these maps to optimize three types of genetic systems: a heterologous metabolic pathway, a genetic circuit controller, and a CRISPR/(d)Cas9 system. For each system, we first perform a minimal number of experiments to systematically develop a biophysical sequence-expression-activity map, which we call a SEAMAP, and then use it to optimize the genetic system at the DNA sequence level. We show that designing experiments based on the governing biophysics of the genetic systems substantially reduces the time and effort needed to optimize them. For instance, we varied the protein production rate by over 1000-fold in five different microbial hosts and for five different proteins by performing 8 to 36 experiments in each case. By comparison, a similar fold change in the final activity of a protein-containing system may require 100 to 100,000 experiments using generic DNA libraries (e.g., promoter and RBS libraries that alter expression of the protein) and well-established protein engineering approaches (e.g., directed evolution).
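As a rough check on the search-space claim above, here is a minimal sketch. With four possible bases at each position, a 100 bp region has 4^100 distinct sequences. The atoms-on-Earth figure used below (about 10^50) is an assumed order-of-magnitude estimate, not a value from this dissertation, and the function name is hypothetical.

```python
def sequence_space_size(length: int, alphabet_size: int = 4) -> int:
    """Count the distinct sequences of a given length (4 bases per position for DNA)."""
    return alphabet_size ** length


# ASSUMPTION: commonly quoted order-of-magnitude estimate, not taken from the source.
ATOMS_ON_EARTH_ESTIMATE = 10 ** 50

n_sequences = sequence_space_size(100)

# Python integers are arbitrary precision, so the comparison is exact.
print(f"Distinct 100 bp sequences: {n_sequences:.3e}")
print(f"Exceeds atoms-on-Earth estimate: {n_sequences > ATOMS_ON_EARTH_ESTIMATE}")
```

Under this assumption the sequence space exceeds the atom count by roughly ten orders of magnitude, which is why exhaustive screening is not an option.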
In the first study (Chapter 1 and Appendix A), we perform biophysics-guided optimization of a carotenoid biosynthesis metabolic pathway. We express three heterologous enzymes from R. sphaeroides on a bacterial operon to convert IPP, an E. coli metabolite, to neurosporene, a yellow-brown pigment. To optimize the expression rate of these enzymes, we first perform 73 systematic experiments to build a quantitative map that links the RBS sequence of each enzyme with its intracellular expression level and the neurosporene productivity of the mutants. We then use the map to design the DNA sequence of the metabolic pathway for various applications. These include navigating the pathway’s transcription and translation design space, identifying metabolic rate-limiting steps, examining the evolutionary robustness of genetic systems during long-term growth, quantifying optimality conditions for maximizing productivity, and designing pathway variants with maximal productivity. We show that the SEAMAP enables us to systematically vary the productivity of the metabolic pathway over 100-fold.

The second case study (Chapter 2 and Appendix B) involves the design and optimization of an analog genetic circuit controller: a signal amplifier circuit. The circuit is constructed by transmitting an input transcriptional signal through a cascade of two transcription factors (TFs). We first develop a thermodynamics-based map to quantify the effect of changes in genetic context and environmental conditions on transcriptional regulation in bacteria. We examine the map by performing 847 experiments in diverse microbial hosts and genetic contexts. Using the map, we also design a set of experiments to systematically measure the in vivo binding energy of six TetR-homolog TFs to their DNA operator sites on an absolute energy scale (kcal/mol).
We then employ the biophysical map to design the DNA sequence for a family of signal-amplifying genetic circuits, called genetic OpAmps, that expand the dynamic range of cell sensors.

In the last case study (Chapter 3 and Appendix C), we focus on developing a SEAMAP to optimize the activity and specificity of the CRISPR/Cas9 system. This simple genetic system requires two basic elements, the Cas9 protein and a guide RNA, and can be programmed with any target sequence to perform high-throughput genome editing as well as high-throughput transcriptional regulation in diverse organisms. A major challenge in employing this system is its high rate of off-target activity, which may result in many unexpected indels or in misregulation. Our analysis of the available high-throughput measurements shows that Cas9 activity and specificity are anti-correlated. Here, we employ statistical thermodynamics and enzyme kinetics to build a SEAMAP that relates the guide RNA and Cas9 RBS sequences with the expression of the total guide RNA-loaded Cas9 complex and its final phenotypic activity on any DNA locus in a target genome. We first combine the high-throughput measurements from six studies with a small set of systematic in vivo experiments to parameterize the SEAMAP. We then employ the map to address the problem of jointly maximizing Cas9 activity and Cas9 specificity by considering the effect of Cas9 and guide RNA expression rate, mutations in the guide RNA sequence, growth condition, and genetic context on the final Cas9 activity at any genome locus. We finally use the map to explain the results of two recent high-throughput studies: a genome-wide analysis of targeting a specific DNA sequence in the λ-phage genome, and an HIV treatment that involves excising the HIV cassette from the genome of the HEK293T human cell line.

TABLE OF CONTENTS

List of Figures
List of Tables
Achievements
Acknowledgements
Chapter 1 Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria
  Introduction
  Results
    Efficient Searching of the Sequence-Expression Space
    Navigation of Expression Spaces in Diverse Bacterial Species
    Efficient search in gram-positive and gram-negative bacterial genomes
    Efficient search in multi-dimensional expression spaces
    Mapping the Sequence-Expression-Activity Space of a Multi-enzyme Pathway
    Design and Optimization of Multi-enzyme Pathways using SEAMAPs
    The Expanding Search for Optimally Balanced Pathways
  Discussion
  Materials and Methods
    Strains and Plasmid Construction ...

A Simple Colorimetric RNA Polymerase Assay

The Conserved Structure of Plant Telomerase RNA Provides the Missing Link for an Evolutionary Pathway from Ciliates to Humans

RNA-Dependent RNA Polymerase Speed and Fidelity Are Not the Only Determinants of the Mechanism Or Efficiency of Recombination

T7 RNA Polymerase from Escherichia Coli BL 21/Par 1219 Nucleoside-Triphosphate: RNA Nucleotidyltransferase (DNA-Directed), EC 2.7.7.6 Cat

Datasheet for T7 RNA Polymerase

Regulation of Trypanosoma Brucei

Interaction of Ribonucleoside Triphosphates with the Gene 4 Primase of Bacteriophage T7*

Inhibition and Regulation of Isoprenoid Biosynthetic Pathways in Plants Inhibition Et Régulation Des Voies De Biosynthèse Des Isoprénoïdes Chez Les Végétaux

Mechanical Unfolding of Long Human Telomeric RNA (TERRA) Miguel Garavís,‡Ab Rebeca Bocanegra,‡C Elías Herrero-Galán,C Carlos González,B Alfredo Villasante,A and J

Gluconeogenesis Is Essential for Trypanosome Development in the Tsetse Fly Vector

RNA Template-Directed RNA Synthesis by T7 RNA Polymerase CHRISTIAN CAZENAVE* and OLKE C

Original Article a Glucose Sensor Role for Glucokinase in Anterior Pituitary Cells Dorothy Zelent,1 Maria L