Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently

Chemical Society Reviews Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently Journal: Chemical Society Reviews Manuscript ID: CS-REV-10-2014-000351.R1 Article Type: Review Article Date Submitted by the Author: 20-Oct-2014 Complete List of Authors: Currin, Andrew; The University of Manchester, Chemistry & Manchester Institute of Biotechnology Swainston, Neil; The University of Manchester, School of Computer Science and Manchester Institute of Biotechnology Day, Philip; University of Manchester, Manchester Interdisciplinary Biocentre; University of Manchester, Centre for Integrated Genomic Medical Research Kell, Douglas; The University of Manchester, School of Chemistry Page 1 of 105 Chemical Society Reviews Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently Andrew Currin 1,2,3 , Neil Swainston 1,3,4 Philip J. Day 1,3,5, Douglas B. Kell 1,2,3,* 1Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, United Kingdom 2School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom 3Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, United Kingdom 4School of Computer Science, The University of Manchester, Manchester M13 9PL, United Kingdom 5Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, United Kingdom * To whom correspondence should be addressed. Tel: 0044 161 306 4492; Email: [email protected] http://dbkgroup.org/ @dbkell 1 Chemical Society Reviews Page 2 of 105 Table of contents Abstract ..................................................................................................................................................... 4 Introduction ............................................................................................................................................... 5 The size of sequence space ....................................................................................................................... 5 The ‘curse of dimensionality’ and the sparseness or ‘closeness’ of strings in sequence space ...... 6 The nature of sequence space ................................................................................................................... 7 Sequence, structure and function .................................................................................................... 7 How much of sequence space is ‘functional’? ................................................................................ 7 What is evolving and for what purpose? ......................................................................................... 8 Protein folds and convergent and divergent evolution .................................................................. 10 Constraints on globular protein evolution structure in natural evolution ...................................... 11 Coevolution of residues ................................................................................................................. 11 The nature, means of analysis and traversal of protein fitness landscapes ................................... 11 NK landscapes as models for sequence-activity landscapes ......................................................... 12 Experimental directed protein evolution ................................................................................................ 12 Initialisation; the first generation. ........................................................................................................... 13 Scaffolds ........................................................................................................................................ 14 Computational protein design ....................................................................................................... 14 Docking ......................................................................................................................................... 14 Diversity creation and library design ...................................................................................................... 15 Effect of mutation rates, implying that higher can be better ......................................................... 15 Random mutagenesis methods ...................................................................................................... 15 Site-directed mutagenesis to target specific residues .................................................................... 16 Optimising nucleotide substitutions .............................................................................................. 17 Reduced library designs ................................................................................................................ 17 Non-canonical amino acid incorporation ...................................................................................... 17 Recombination .............................................................................................................................. 18 Cell-free synthesis ......................................................................................................................... 19 Synthetic biology for directed evolution ................................................................................................ 19 SpeedyGenes and GeneGenie: tools for synthetic biology applied to the directed evolution of biocatalysts .................................................................................................................................... 20 Genetic selection and screening .............................................................................................................. 20 Genetic Selection ........................................................................................................................... 21 Screening ................................................................................................................................................ 21 Microfluidics, microdroplets and microcompartments ................................................................. 21 Assessment of diversity and its maintenance ......................................................................................... 21 Sequence-activity relationships and machine learning ................................................................. 22 The objective function(s): metrics for the effectiveness of biocatalysts ................................................ 23 The distribution of k cat values among natural proteins .................................................................. 23 What enzyme properties determine k cat values? ............................................................................ 24 The importance of non-active-site mutations in increasing k cat values .................................................. 24 What if we lack the structure? Folding up proteins ‘from scratch’ ........................................................ 25 Metals ..................................................................................................................................................... 26 Enzyme stability, including thermostability ........................................................................................... 26 Solvency ................................................................................................................................................. 27 Reaction classes ...................................................................................................................................... 28 Concluding remarks and future prospects .............................................................................................. 31 Acknowledgments .................................................................................................................................. 33 2 Page 3 of 105 Chemical Society Reviews Legends to figures ................................................................................................................................... 34 References ............................................................................................................................................... 36 3 Chemical Society Reviews Page 4 of 105 Abstract The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo , allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (K d) and catalytic (k cat ) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving k cat (where natural evolution rarely seeks the highest values),

Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently

UNIVERSITY of CALIFORNIA, IRVINE an Expanded Genetic Code

Directed Evolution: Bringing New Chemistry to Life

Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden

Design and Screening of Degenerate-Codon-Based Protein Ensembles with M13 Bacteriophage

Modern Methods for Laboratory Diversification of Biomolecules Bratulic and Badran 51

Combinatorial Reshaping of the Candida Antarctica Lipase a Substrate Pocket for Enantioselectivity Using an Extremely Condensed Library

Analyses of All Possible Point Mutations Within a Protein Reveals Relationships Between Function and Experimental Fitness: a Dissertation

1 Engineering Lipases with an Expanded Genetic Code Alessandro De Simone, Michael Georg Hoesl, and Nediljko Budisa

Methods for the Directed Evolution of Proteins

Nonproteinogenic Deep Mutational Scanning of Linear and Cyclic Peptides

Codongenie: Optimised Ambiguous Codon Design Tools

Deep Learning Applications for Computer-Aided Design of Directed Evolution Libraries