Engineering Orthogonal Signaling Pathways to Probe Sequence Space Capacity
Engineering orthogonal signaling pathways to probe sequence space capacity
By
Conor James McClune
B.A. Molecular and Cell Biology
University of California, Berkeley (2012)
Submitted to the Department of Biology in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY IN BIOLOGY
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
August 2019
© Conor James McClune. All rights reserved.
The author hereby grants Massachusetts Institute of Technology permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.
Author: Conor James McClune, August 7th, 2019
Thesis Advisor: Michael T. Laub, Professor of Biology
Thesis Advisor: Christopher A. Voigt, Professor of Biological Engineering
Co-Chair, Biology Graduate Committee: Stephen P. Bell, Professor of Biology and Biological Engineering
Engineering orthogonal signaling pathways to probe sequence space capacity
by
Conor McClune
Submitted to the Graduate Program in Biology on September 3rd, 2019 in partial fulfillment of the requirement for the degree of Doctor of Philosophy in Biology at the Massachusetts Institute of Technology
ABSTRACT
Gene duplication is a common and powerful mechanism by which cells create new signaling pathways, but recently duplicated proteins typically must become insulated from each other, and from other paralogs, to prevent unwanted cross-talk. A similar challenge arises when new sensors or synthetic signaling pathways are engineered within cells or transferred between genomes. How easily new pathways can be introduced into cells depends on the density and distribution of paralogous pathways in the sequence space defined by their specificity-determining residues.
Here, I directly probe how crowded sequence space is by generating novel two-component signaling proteins in Escherichia coli, using cell sorting coupled to deep sequencing to analyze large libraries designed based on coevolution patterns. I produce 58 new insulated pathways, in which functional kinase-substrate pairs have different specificities from the parent proteins, and demonstrate that several new pairs are orthogonal to all 27 paralogous pathways in E. coli. Additionally, I readily identify sets of 6 novel kinase-substrate pairs that are mutually orthogonal, significantly increasing the two-component signaling capacity of E. coli. These results indicate that sequence space is not densely occupied. The relative sparsity of paralogs in sequence space suggests that new, insulated pathways can easily arise during evolution or be designed de novo. I demonstrate the latter by engineering a new signaling pathway in E. coli that responds to a plant cytokinin without cross-talk to extant pathways. The work in this thesis also demonstrates how coevolution-guided mutagenesis and sequence-space mapping can be used to design large sets of orthogonal protein-protein interactions.
Thesis Co-supervisor: Michael T. Laub
Title: Professor of Biology
Thesis Co-supervisor: Christopher A. Voigt
Title: Professor of Biological Engineering
Table of Contents
Figure Index
Acknowledgements
Chapter 1 – Introduction
  Biological systems evolve through reuse and rewiring of a common set of parts
  Evolutionary plasticity of protein interactions
    Regulatory flexibility arising through rewiring of protein-DNA interactions
    The scale and plasticity of protein-protein interaction networks
  Molecular mechanisms of rewiring protein interactions
  Predicting specificity determinants using extant protein sequences
  The fitness cost of spurious interactions
  The sequence space of protein-protein interactions
  Two-component pathways: a tool for studying protein sequence space
    Mechanism of action
    Domain modularity
    Specificity and insulation of two-component signaling pathways
    Sequence space of specificity-determining residues
  Conclusion
  References
Chapter 2 – Engineering orthogonal signaling pathways reveals the sparse occupancy of sequence space
  Introduction
  Diversifying coevolving residues to engineer new signaling protein pairs
  Orthogonality of PhoQ*-PhoP* variants to parent proteins
  Insulation is enforced by stringent phosphatase specificity
  Orthogonality to endogenous two-component pathways
  Diversity of specificity amongst PhoQ*-PhoP* variants
  Design of mutually orthogonal signaling pathways
  Utilizing orthogonal PhoQ*-PhoP* domains to insulate an Arabidopsis two-component sensor
  Discussion
  Methods
    Bacterial strains and media
    Design and assembly of degenerate PhoQ-PhoP library
    Library selection and Sort-seq
    Illumina sample preparation
    Construction of combinatorial 79 × 71 mutant library
    Illumina data processing
    Orthogonal set design
    Reconstruction and in vivo characterization of individual PhoQ* and PhoP* variants
    Purification of two-component signaling proteins and in vitro phosphotransfer assays
    RNA-seq
    Identification of two-component signaling proteins and generation of force-directed graphs
    Data Availability
    Code Availability
  References
Chapter 3 – Conclusions and future directions
  Conclusions