Methods for the Directed Evolution of Proteins

REVIEWS STUDY DESIGNS Methods for the directed evolution of proteins Michael S. Packer and David R. Liu Abstract | Directed evolution has proved to be an effective strategy for improving or altering the activity of biomolecules for industrial, research and therapeutic applications. The evolution of proteins in the laboratory requires methods for generating genetic diversity and for identifying protein variants with desired properties. This Review describes some of the tools used to diversify genes, as well as informative examples of screening and selection methods that identify or isolate evolved proteins. We highlight recent cases in which directed evolution generated enzymatic activities and substrate specificities not known to exist in nature. Natural selection Over many generations, iterated mutation and natural In this Review, we summarize techniques that gener- A process by which individuals selection during biological evolution provide solutions ate single-gene libraries, including standard methods as with the highest reproductive for challenges that organisms face in the natural world. well as novel approaches that can generate superior diver- fitness pass on their genetic material to their offspring, thus However, the traits that result from natural selection sity containing a larger proportion of functional mutants. maintaining and enriching only occasionally overlap with features of organ- We also review screening and selection methods that heritable traits that are isms and biomolecules that are sought by humans. identify or isolate improved variants within these librar- adaptive to the natural To guide evolution to access useful phenotypes more ies. Although these strategies can be applied to multi environment. frequently, humans for centuries have used artificial gene pathways3,4 and gene networks5–7, the examples in selection Artificial selection , beginning with the selective breeding of this Review will focus exclusively on the laboratory evolu- 1 2 (Also known as selective crops and domestication of animals . More recently, tion of single genes. In addition, although many of these breeding). A process by which directed evolution in the laboratory has proved to be approaches apply to other types of biomolecules, we focus human intervention in the a highly effective and broadly applicable framework on the directed evolution of proteins because protein evo- reproductive cycle imposes a selection pressure for for optimizing or altering the activities of individual lution has proved to be especially useful for generating 8 9 10 phenotypic traits desired by genes and gene products, which are the fundamental novel biocatalysts , reagents and therapeutics . the breeder. units of biology. Genetic diversity fuels both natural and laboratory Methods for gene diversification Libraries evolution. The occurrence rate of spontaneous muta- It is impossible to cover the entire mutational space of Diverse populations of DNA fragments that are subject to tions is generally insufficient to access desired gene a typical protein: complete randomization of a mere 13 downstream screening and variants on a time scale that is practical for labora- decapeptide would yield 10 unique combinations of selection. tory evolution. A number of genetic diversification amino acids, which exceeds the achievable library size techniques are therefore used to generate libraries of of almost all known protein library creation methods. gene variants that accelerate the exploration of a gene’s Because comprehensive coverage of sequence space is sequence space. Methods to identify and isolate library impossible, gene diversification strategies are designed members with desired properties are a second crucial to perform an optimal sparse sampling of a vast multi- component of laboratory evolution. During organismal dimensional sequence space. The activity level of each Department of Chemistry and evolution, phenotype and genotype are intrinsically cou- library member can be conceptualized as the elevation Chemical Biology, Harvard pled within each organism. However, during laboratory in a fitness landscape on an x–y coordinate that repre- University, 12 Oxford Street, evolution (FIG. 1a), it is often inconvenient or impossi- sents the genotype of that library member. The goal of Cambridge, Massachusetts ble to manipulate genes and gene products in a coupled directed evolution studies is to take mutational steps 02138, USA. manner. Therefore, single-gene evolution in the labora- within this landscape that ‘climb’ towards peak activity Correspondence to D.R.L. (FIG. 1b) e‑mail: [email protected] tory requires carefully designed strategies for screening levels . Over many generations, these benefi- doi:10.1038/nrg3927 or selecting functional variants in ways that maintain the cial mutations accumulate, resulting in a successively Published online 9 June 2015 genotype–phenotype association. improved phenotype. NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 379 © 2015 Macmillan Publishers Limited. All rights reserved REVIEWS a b Translation Diversiﬁcation Replication Library Absolute maximum Accepted step Screening or starting selection point Local maximum Rejected step Figure 1 | Key steps in the cycle of directed evolution. a | The process of directed evolution in the laboratory mimics that of biological evolution. A diverse library of genes is translated into a corresponding library ofNature gene productsReviews |and Genetics screened or selected for functional variants in a manner that maintains the correspondence between genotype (genes) and phenotype (gene products and their functions). These functional genes are replicated and serve as starting points for subsequent rounds of diversification and screening or selection. b | Although the mutational space is multidimensional, it is conceptually helpful to visualize directed evolution as a series of steps within a three-dimensional fitness landscape. Library generation samples the proximal surface of the landscape, and screening or selection identifies the genetic means Library size to ‘climb’ towards fitness peaks. Directed evolution can arrive at absolute maximum activity levels but can also become The number variants that are trapped at local fitness maxima in which library diversification is insufficient to cross ‘fitness valleys’ and access subjected to screening and neighbouring fitness peaks. selection. Library sizes are limited by molecular cloning protocols and/or by host transformation efficiency. Researchers can use focused mutagenesis to maxi- not only mutate the library member but also induce del- mize the likelihood that a library contains improved eterious mutations in the host genome. Host intolerance Focused mutagenesis variants, provided that amino acid positions that are to a high degree of genomic mutation places an upper A strategy of diversification likely determinants of the desired function are known. limit on in vivo mutagenesis rates. To avoid this con- that introduces mutations at 18 DNA regions expected to In the absence of plausible structure–function relation- straint, C. C. Liu and co‑workers developed orthogonal influence protein activity. ships, random mutagenesis can provide a greater chance in vivo DNA replication machinery that only mutates of accessing functional library members than focus- target DNA. This method co‑opts naturally occurring Random mutagenesis ing library diversity on incorrectly chosen residues Kluyveromyces lactis linear plasmids pGKL1/2 and their A strategy of diversification that introduces mutations in an that, when mutated, do not confer desired activities. specialized TP‑DNA polymerases. Because this plasmid unbiased manner throughout Researchers have developed an extensive range of meth- is exclusively cytoplasmic, the TP‑DNA polymerase the entire gene. ods to perform both forms of gene diversification, and exerts no mutational load on the host genome within the most successful strategies often integrate random the nucleus of Saccharomyces cerevisiae. Mutational spectrum and focused mutagenesis. The relatively low mutation rates and the lack of con- The frequency of each specific type of transition and trol offered by most previously described in vivo random transversion. The evenness of Random mutagenesis. Traditional genetic screens mutagenesis protocols have led to a strong preference this spectrum allows more use chemical and physical agents to randomly dam- towards in vitro random mutagenesis strategies. In thorough sampling of sequence age DNA. These agents include alkylating compounds error-prone PCR (epPCR), first described by Goeddel space. such as ethyl methanesulfonate (EMS)11, deaminat- and co-workers19, the low fidelity of DNA polymerases 12 Transformation ing compounds such as nitrous acid , base analogues under certain conditions generates point mutations dur- The process by which a cell such as 2‑aminopurine13, and ultraviolet irradiation14. ing PCR amplification of a gene of interest. Increased directly acquires a foreign Chemical mutagenesis is sufficient to deactivate genes magnesium concentrations, supplementation with man- DNA molecule. A number of at random for a genome-wide screen but is less com- ganese or the use of mutagenic dNTP analogues20 can protocols allow high-efficiency transformation of monly used for directed evolution because of biases in reduce the base-pairing fidelity and increase mutation 11,12 −4 −3 21 microorganisms through mutational spectrum . rates to 10 ~10 per replicated base . Because

Methods for the Directed Evolution of Proteins

UNIVERSITY of CALIFORNIA, IRVINE an Expanded Genetic Code

Directed Evolution: Bringing New Chemistry to Life

Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden

Design and Screening of Degenerate-Codon-Based Protein Ensembles with M13 Bacteriophage

Modern Methods for Laboratory Diversification of Biomolecules Bratulic and Badran 51

Combinatorial Reshaping of the Candida Antarctica Lipase a Substrate Pocket for Enantioselectivity Using an Extremely Condensed Library

Analyses of All Possible Point Mutations Within a Protein Reveals Relationships Between Function and Experimental Fitness: a Dissertation

1 Engineering Lipases with an Expanded Genetic Code Alessandro De Simone, Michael Georg Hoesl, and Nediljko Budisa

Nonproteinogenic Deep Mutational Scanning of Linear and Cyclic Peptides

Codongenie: Optimised Ambiguous Codon Design Tools

Deep Learning Applications for Computer-Aided Design of Directed Evolution Libraries

The Evolution of the Genetic Code: Impasses and Challenges