Machine Learning-Assisted Directed Protein Evolution with Combinatorial Libraries. Zachary Wu, S. B. Jennifer Kan, Russell D. L
Recommended publications
-
UNIVERSITY of CALIFORNIA, IRVINE an Expanded Genetic Code
UNIVERSITY OF CALIFORNIA, IRVINE. An expanded genetic code for the characterization and directed evolution of tyrosine-sulfated proteins. DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Biomedical Engineering by Xiang Li. Dissertation Committee: Assistant Professor Chang C. Liu, Chair; Associate Professor Wendy Liu; Associate Professor Jennifer A. Prescher. 2018. Portion of Chapter 2 © John Wiley and Sons; Portion of Chapter 3 © Springer; Portion of Chapter 4 © Royal Society of Chemistry; All other materials © 2018 Xiang Li. Dedication: To my parents Audrey Bai and Yong Li and my brother Joshua Li. Table of Contents: LIST OF FIGURES; LIST OF TABLES; CURRICULUM VITAE; ACKNOWLEDGEMENTS; ABSTRACT; CHAPTER 1. INTRODUCTION; 1.1. INTRODUCTION -
Directed Evolution: Bringing New Chemistry to Life
Angewandte Chemie Essays. International Edition: DOI: 10.1002/anie.201708408. German Edition: DOI: 10.1002/ange.201708408. Biocatalysis. Directed Evolution: Bringing New Chemistry to Life. Frances H. Arnold*. biocatalysis · enzymes · heme proteins · protein engineering · synthetic methods. Survival of the Fittest. In this competitive age, when new industries sprout and decay in the span of a decade, we should reflect on how a company survives to celebrate its 350th anniversary. A prerequisite for survival in business is the ability to adapt to changing environments and tastes, and to sense, anticipate, and meet needs faster and better than the competition. This requires constant innovation as well as focused attention to execution. A company that continues to provide meaningful and profitable solutions to human problems has a chance to survive, even thrive, in a rapidly changing and highly competitive world. Biology has a brilliant … Expanding Nature's Catalytic Repertoire for a Sustainable Chemical Industry. Nature, the best chemist of all time, solves the difficult problem of being alive and enduring for billions of years, under an astonishing range of conditions. Most of the marvelous chemistry that makes life possible is the work of nature's macromolecular protein catalysts, the enzymes. By using enzymes, nature can extract materials and energy from the environment and convert them into self-replicating, self-repairing, mobile, adaptable, and sometimes even thinking biochemical systems. These systems are good models for a sustainable chemical industry that uses renewable resources -
Library Construction and Screening
Library construction and screening • A gene library is a collection of different DNA sequences from an organism, which has been cloned into a vector for ease of purification, storage and analysis. • Also called genomic libraries or gene banks. Uses of gene libraries • To obtain the sequences of genes for analysis, amplification, cloning, and expression. • Once the sequence is known, probes, primers, etc. can be synthesized for further diagnostic work using, for example, hybridization reactions, blots and PCR. • Knowledge of a gene sequence also offers the possibility of gene therapy. • Also, gene expression can be used to synthesize a product in particular host cells, e.g. synthesis of human gene products in prokaryotic cells. There are two types of gene library, depending upon the source of the DNA used: 1. genomic library; 2. cDNA library. Types of gene library: • A genomic library contains DNA fragments representing the entire genome of an organism. • A cDNA library contains only complementary DNA molecules synthesized from mRNA molecules in a cell. Genomic library: • Made from the nuclear DNA of an organism or species. • DNA is cut into clonable-size pieces as randomly as possible using restriction endonucleases. • Genomic libraries contain whole genomic fragments including gene exons and introns, gene promoters, intragenic DNA, origins of replication, etc. Construction of genomic libraries: 1. Isolation of genomic DNA and vector. 2. Cleavage of genomic DNA and vector by restriction endonucleases. 3. Ligation of fragmented DNA with the vector. 4. Transformation of -
Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden
bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.408955; this version posted December 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden. Bruce J. Wittmann (1), Yisong Yue (2), & Frances H. Arnold (1,3,*). (1) Division of Biology and Biological Engineering, (2) Department of Computing and Mathematical Sciences, and (3) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA. (*) To whom correspondence should be addressed. Abstract: Due to screening limitations, in directed evolution (DE) of proteins it is rarely feasible to fully evaluate combinatorial mutant libraries made by mutagenesis at multiple sites. Instead, DE often involves a single-step greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. However, because the effects of a mutation can depend on the presence or absence of other mutations, the efficiency and effectiveness of a single-step greedy walk is influenced by both the starting variant and the order in which beneficial mutations are identified—the process is path-dependent. We recently demonstrated a path-independent machine learning-assisted approach to directed evolution (MLDE) that allows in silico screening of full combinatorial libraries made by simultaneous saturation mutagenesis, thus explicitly capturing the effects of cooperative mutations and bypassing the path-dependence that can limit greedy optimization. -
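The path-dependence described in the abstract above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-letter alphabet, the two-site landscape, the fitness values, and the function names `greedy_walk` and `best_in_full_library` are hypothetical and are not taken from the preprint. Exhaustive enumeration stands in for evaluating a full combinatorial library; it is not the MLDE method itself.

```python
from itertools import product

# Toy two-letter alphabet (hypothetical; real libraries use 20 amino acids).
ALPHABET = "AB"

# Invented fitness values for a two-site landscape with epistasis:
# either single change away from "AA" lowers fitness, yet the double
# mutant "BB" is the global optimum.
FITNESS = {"AA": 1.0, "AB": 0.6, "BA": 0.7, "BB": 2.0}


def greedy_walk(start, site_order):
    """Single-step greedy walk: visit sites in order, fixing the best residue at each."""
    current = start
    for site in site_order:
        # All variants that differ from `current` only at this site (plus itself).
        candidates = [current[:site] + aa + current[site + 1:] for aa in ALPHABET]
        current = max(candidates, key=FITNESS.get)
    return current


def best_in_full_library():
    """Evaluate every combination at once and return the fittest variant."""
    variants = ["".join(combo) for combo in product(ALPHABET, repeat=2)]
    return max(variants, key=FITNESS.get)
```

On this invented landscape, a walk from `"AA"` never leaves `"AA"`, a walk from `"AB"` reaches `"BB"` only when site 0 is visited before site 1, and full enumeration finds `"BB"` regardless of start or order.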
Construction of Small-Insert Genomic DNA Libraries Highly Enriched
Proc. Natl. Acad. Sci. USA Vol. 89, pp. 3419-3423, April 1992. Genetics. Construction of small-insert genomic DNA libraries highly enriched for microsatellite repeat sequences (marker-selected libraries/CA repeats/sequence-tagged sites/genetic mapping/dog genome). ELAINE A. OSTRANDER, PAM M. JONG, JASPER RINE, AND GEOFFREY DUYK. Department of Molecular and Cellular Biology, 401 Barker Hall, University of California, Berkeley, CA 94720; Human Genome Center, Lawrence Berkeley Laboratory, 1 Cyclotron Road, 74-157, Berkeley, CA 94720; and Department of Genetics, Howard Hughes Medical Institute, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115. Communicated by Philip Leder, January 8, 1992. ABSTRACT We describe an efficient method for the construction of small-insert genomic libraries enriched for highly polymorphic, simple sequence repeats. With this approach, libraries in which 40-50% of the members contain (CA)n repeats are produced, representing an ≈50-fold enrichment over conventional small-insert genomic DNA libraries. Briefly, a genomic library with an average insert size of less than 500 base pairs was constructed in a phagemid vector. Amplification of this library in a dut ung strain of Escherichia coli allowed the recovery of the library as closed circular single-stranded DNA … Generation of a high-density map of markers for an entire genome or a single chromosome requires the isolation and characterization of hundreds of markers such as microsatellite repeats (10, 11). Two simple yet tedious approaches have generally been used for this task. One approach is to screen a large-insert genomic library with an end-labeled (CA)n or (TG)n oligonucleotide (n > 15). Clones that hybridize to the probe are purified and divided into subclones, which are screened by hybridization for a fragment containing the repeat. -
Multiplexed Microbial Library Preparation Using SMRTbell
Technical Overview: Multiplexed Microbial Library Preparation Using SMRTbell Express Template Prep Kit 2.0. Sequel System ICS v8.0 / Sequel Chemistry 3.0 / SMRT Link v9.0; Sequel II System ICS v9.0 / Sequel II Chemistry 2.0 / SMRT Link v9.0; Sequel IIe System ICS v10.0 / Sequel II Chemistry 2.0 / SMRT Link v10.0. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved. PN 101-742-600 Ver 2021-02-01-A (February 2021). Contents: 1. Multiplexed Microbial Sample Preparation Workflow Overview 2. Multiplexed Microbial Sample Preparation Workflow Details 3. Multiplexed Microbial Sequencing Workflow Details 4. Multiplexed Microbial Data Analysis Workflow Details 5. Multiplexed Microbial Library Example Performance Data 6. Technical Documentation & Applications Support Resources 7. Appendix: General Recommendations for High-Molecular Weight gDNA QC and Handling for Multiplexed Microbial SMRTbell Library Construction. MULTIPLEXED MICROBIAL SEQUENCING: HOW TO GET STARTED. Resources: Application-Specific Best Practices Guide; Application-Specific Procedure & Checklist; Application Consumable Bundle Purchasing Guide; Library Construction, Sequencing & Analysis. gDNA QC & Shearing: ≥1 µg input gDNA per microbial sample; 10 kb – 15 kb target DNA shear size. Library Construction: multiplex up to 48 microbial samples with the Sequel II and IIe Systems using Barcoded Overhang Adapters (BOA). SMRT Sequencing: Use the Sequel, -
The Human Genome Project
TO KNOW OURSELVES ❖ THE U.S. DEPARTMENT OF ENERGY AND THE HUMAN GENOME PROJECT, JULY 1996. Contents: FOREWORD; THE GENOME PROJECT—WHY THE DOE? (A bold but logical step); INTRODUCING THE HUMAN GENOME (The recipe for life): Some definitions, A plan of action; EXPLORING THE GENOMIC LANDSCAPE (Mapping the terrain): Two giant steps: Chromosomes 16 and 19, Getting down to details: Sequencing the genome, Shotguns and transposons, How good is good enough?, Sidebar: Tools of the Trade, Sidebar: The Mighty Mouse; BEYOND BIOLOGY (Instrumentation and informatics): Smaller is better—And other developments, Dealing with the data; ETHICAL, LEGAL, AND SOCIAL IMPLICATIONS (An essential dimension of genome research). Foreword: AT THE END OF THE ROAD in Little Cottonwood Canyon, near Salt Lake City, Alta is a place of near-mythic renown among skiers. In time it may well assume similar status among molecular geneticists. In December 1984, a conference there, co-sponsored by the U.S. Department of Energy, pondered a single question: Does modern DNA research offer a way of detect- … has been rapid, and it is now generally agreed that this international project will produce the complete sequence of the human genome by the year 2005. And what is more important, the value of the project also appears beyond doubt. Genome research is revolutionizing biology and biotechnology, and providing a vital thrust to the increasingly broad scope of the biological sciences. -
1 Engineering Lipases with an Expanded Genetic Code Alessandro De Simone, Michael Georg Hoesl, and Nediljko Budisa
1 Engineering Lipases with an Expanded Genetic Code. Alessandro De Simone, Michael Georg Hoesl, and Nediljko Budisa. 1.1 Introduction. Lipases (EC 3.1.1.3) are a class of ubiquitous enzymes that catalyze both the hydrolysis and synthesis of acylglycerols with long acyl chains (carbon atoms >10) [1]. Currently, they constitute one of the most important groups of biocatalysts applied in many fields, including foods, detergents, flavors, fine chemicals, cosmetics, biodiesel, and pharmaceuticals, owing to their high specificity, regioselectivity, and enantioselectivity [2]. Lipases can be found in a broad range of organisms, including plants and animals; however, it is chiefly the microbial lipases that find immense application. This is because of their high yields and ease of genetic manipulation, as well as wide substrate specificity [3]. Although lipases' properties (molecular weight, pH and temperature optima, stability, substrate specificity) are source dependent, they all share a common structure consisting of a compact minimal α/β hydrolase fold. The hydrolase fold is composed of a central β-sheet consisting of up to eight different β strands connected by up to six α helices [1]. The active site of the α/β hydrolase fold enzymes contains a nucleophilic residue (serine), a catalytic acid residue (aspartate/glutamate), and a histidine residue, always in this order in the amino acid sequence. These residues act cooperatively in the catalytic mechanism of ester hydrolysis [4]. The lipolytic reaction takes place at the interface between an insoluble substrate phase and the aqueous phase in which the enzyme is dissolved. Lipases are activated by the presence of this emulsion interface, a phenomenon called interfacial activation, which differentiates them from similar esterases. -
cDNA Libraries and Expression Libraries
Solutions for Practice Problems for Recombinant DNA, Session 4: cDNA Libraries and Expression Libraries. Question 1. In a hypothetical scenario you wake up one morning to your roommate exclaiming about her sudden hair growth. She has been supplementing her diet with a strange new fungus purchased at the local farmer's market. You take samples of the fungus to your lab and you find that this fungus does indeed make a protein (the harE protein) that stimulates hair growth. You construct a fungal genomic DNA library in E. coli with the hope of cloning the harE gene. If you succeed you will be a billionaire! You obtain DNA from the fungus, digest it with a restriction enzyme, and clone it into a vector. a) What features must be present on your plasmid that will allow you to use it as a cloning vector to make a fungal genomic DNA library? Your vector would certainly need to have a unique restriction enzyme site, a selectable marker such as the ampicillin resistance gene, and a bacterial origin of replication. Other features may be required depending upon how you plan to use this library. b) You clone your digested genomic DNA into this vector. The E. coli (bacteria) cells that you will transform to create your library will have what phenotype prior to transformation? Prior to transformation, the E. coli cells that you will transform will be sensitive to the antibiotic. This allows you to select for cells that obtained a plasmid. c) How do you distinguish bacterial cells that carry a vector from those that do not? Cells that obtained a vector will have obtained the selectable marker (one example is the ampicillin resistance gene). -
Methods for the Directed Evolution of Proteins
REVIEWS: STUDY DESIGNS. Methods for the directed evolution of proteins. Michael S. Packer and David R. Liu. Abstract | Directed evolution has proved to be an effective strategy for improving or altering the activity of biomolecules for industrial, research and therapeutic applications. The evolution of proteins in the laboratory requires methods for generating genetic diversity and for identifying protein variants with desired properties. This Review describes some of the tools used to diversify genes, as well as informative examples of screening and selection methods that identify or isolate evolved proteins. We highlight recent cases in which directed evolution generated enzymatic activities and substrate specificities not known to exist in nature. Over many generations, iterated mutation and natural selection during biological evolution provide solutions for challenges that organisms face in the natural world. However, the traits that result from natural selection only occasionally overlap with features of organisms and biomolecules that are sought by humans. To guide evolution to access useful phenotypes more frequently, humans for centuries have used artificial selection, beginning with the selective breeding of crops and domestication of animals¹,². In this Review, we summarize techniques that generate single-gene libraries, including standard methods as well as novel approaches that can generate superior diversity containing a larger proportion of functional mutants. We also review screening and selection methods that identify or isolate improved variants within these libraries. Although these strategies can be applied to multigene pathways³,⁴ and gene networks⁵⁻⁷, the examples in this Review will focus exclusively on the laboratory evolu- … Natural selection: A process by which individuals with the highest reproductive fitness pass on their genetic material to their offspring, thus maintaining and enriching heritable traits that are adaptive to the natural environment. Artificial selection: (Also known as selective -
Bacterial Artificial Chromosomes (BACs) Became the Most Broadly Used Resource for Several Reasons
Bacterial Artificial Chromosomes: STC Production on Human BACs. Archive provided for historical purposes. Several types of DNA library resources were sponsored by the DOE before and during the Human Genome Program (HGP). These included both prokaryotic and eukaryotic vector systems, and clone libraries representing single chromosomes. Bacterial Artificial Chromosomes (BACs) became the most broadly used resource for several reasons. The large size was a good match for the capabilities of high-throughput sequencing centers. In contrast to some earlier resources, chimerism (having gene segments from multiple chromosome sites combined in one clone) is substantially if not completely absent. With some interesting exceptions, the BACs are stable in their bacterial hosts. In support of the functional analysis of genes, the BACs are very useful for making transgenic animals with segments of human DNA. A brief history of BAC development is available in a preface to a 2003 issue of Methods in Molecular Biology, wherein details of BAC-related protocols reside. One particular BAC project was crucial to the timely completion of human genome sequencing. (See history.) In a 1996 initiative, the DOE Office of Biological and Environmental Research sponsored the production of sequence tag connectors (STCs) for the BACs being used in human genome sequencing. (STCs are sequence reads at the ends of cloned DNA segments; they mark the boundaries of the cloned DNA.) This publicly available resource has served both the international public collaboration and Celera Genomics Inc. in the generation of the human genome sequence. The BACs representing a genome can together serve as a scaffold on which much shorter DNA sequence assemblies can be located. -
Deep Learning Applications for Computer-Aided Design of Directed Evolution Libraries
Deep Learning Applications for Computer-Aided Design of Directed Evolution Libraries. Anthony J. Agbay, Department of Bioengineering, Stanford University. Abstract: In the past decades, progress in industrial enzymes, biological therapeutics, and biologic-based "green" technologies has been driven by advances in protein engineering. Specifically, the development of directed evolution techniques has enabled scientists to generate genetic diversity and iteratively test randomized libraries of variants to produce enzymes optimized for specific functions. However, the throughput of directed evolution is limited by the potential number of variants in a randomized library. Training neural network models that utilize the growing number of datasets from deep mutational scanning experiments, which characterize entire mutational landscapes, together with new computational representations of protein properties, provides a unique opportunity to significantly improve the efficiency and throughput of directed evolution, alleviating a major bottleneck and cost in many research and development projects. 1 Introduction. From reducing manufacturing waste and developing green energy alternatives, to biocatalytic cascades, designing new therapeutics, and developing inducible protein switches, engineering proteins enables enormous opportunities across a diverse array of fields (Huffman et al., 2019; Nimrod et al., 2018; Langan et al., 2019; Smith et al., 2012). Directed evolution, or utilizing evolutionary principles to guide iterative mutations