
Research Collection

Doctoral Thesis

Directed Evolution of Enzymes using Ultrahigh-Throughput Screening

Author(s): Debon, Aaron

Publication Date: 2021

Permanent Link: https://doi.org/10.3929/ethz-b-000502194

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO. 27208

DIRECTED EVOLUTION OF ENZYMES USING ULTRAHIGH-THROUGHPUT SCREENING

A thesis submitted to attain the degree of DOCTOR OF SCIENCES of ETH ZURICH (Dr. sc. ETH Zurich)

presented by AARON DEBON

MSc Interdisciplinary Sciences, ETH Zürich

born on 15.11.1990

citizen of Einsiedeln, Schwyz

accepted on the recommendation of

Prof. Dr. Donald Hilvert, examiner
Prof. Dr. Andrew deMello, co-examiner

2021

Don’t play the butter notes. - Miles Davis

Acknowledgements

First, I would like to express my deepest gratitude to my supervisor Prof. Don Hilvert for taking me in as his PhD student. His enthusiasm for science is a true inspiration, and without his support and guidance none of this would have been possible. Additionally, I want to thank him for his excellent sportsmanship at the Schmutzli party. I will never forget his ability not to laugh. Along this line, I also want to thank Prof. Peter Kast, not only for being Don’s Schmutzli nemesis, but also for helpful input concerning everything microbiology. I’m grateful to Prof. Andrew deMello for refereeing this thesis and for the time I was able to spend working in his lab as a student. Furthermore, I want to thank Anita Meier-Lüssi and Antonella Toth for their help with administrative tasks and Leyla Hernandez for all her work in the lab.

I want to thank the people who were mentors to me over the years. Big thanks go out to Sabine Studer for introducing me to the lab and, of course, being my Ying. I thank Anthony Green for his support in science and doubtful computer skills. Richard Obexer was the best microfluidics mentor imaginable, always challenging me to learn. Moritz Pott, without you I would also only be half the dancer I am now. This thesis was proofread by Tom Edwardson, Oliver Allemann, and Dominic Hoch, for which I’m extremely grateful.

My time in the Hilvert lab was not only marked by science, but also by the many great friendships that were formed during my time here. I’m deeply grateful to Doug Hansen for being the best host in the west; keep an eye out for these rattlesnakes. David Niquille’s repeated attempts to spark my interest in football weren’t successful, but "high/low"-five anyway. I thank Shiksha Mantri for her cooking skills and for inventing coop time. Furthermore, I want to thank Reinhard Zschoche for his timeless sense of style, Takahiro Hayashi for his excellent jokes, Stephan Tetter for co-spearheading, Mik Levasseur for admiring my forearms, Yusuke Azuma for proving that mornings are overrated, Oliver Allemann for teaching me the importance of speaking French in Paris, Susanne Mailand for never ending a night too early, brew master Christian Stocker, Xavi Garrabou for his various rants, Ines the Queenes Folger, Marcel Grogg for his group hike, and Duncan Macdonald and Anna Camus for sharing a home with me.

The F328 lab was always a great working environment. I want to thank everyone who contributed: Naohiro Terasaka for teaching me how to nap, Cathleen Zeymer for starting the snack tray, Richard Bernitzky for bringing new wind to the lab, Eita Sasaki aka the mountain goat, Sophie Basler for always laughing at the right moment, the talent scout Dominic G. Hoch, Sebastian Sjöstöm for his Swedish French press, and Raphi Frey for being an outstanding duet partner and fierce table tennis adversary. Also, they were brave enough to let me decide the music most of the time.

The Höngg climbing group deserves special mention: the Austrian aces Madeleine Fellner and Matthias Tinzl, who taught me his language, and also Adrian Guggisberg for many hours spent at Gasi. Thanks to Thomas G. W. Edwardson for being the best person to get sandbagged with.

Having great friends helps one stay grounded alongside the strenuous research life. Therefore, a big thank you to Jan Würschem, Roman Meier, Martin Meier, Sebastian Schenk, and Felix Schumacher for ski holidays and other time spent in the mountains and cities.

I would not be writing these lines without the continuing support of my family. Thanks to my sister Mirjam and my mother Luzia. I’m deeply grateful for all the time, nerves, and energy you have invested in me. Thanks for always being there for me. Finally, I want to thank Anna Volokitin for being with me and for her loving support during the writing of this thesis.

Abstract

Virtually all biologically relevant chemical reactions are catalyzed by enzymes. The latter are sophisticated biocatalysts shaped by billions of years of natural evolution. Enzymes display astounding rate accelerations, specificities and selectivities. Some even reach catalytic efficiencies that allow reactions to take place at the speed of diffusion. In general, enzymes are far more efficient catalysts than their human-made small-molecule counterparts, all while operating at ambient temperatures and in aqueous solution. Therefore, enzymes are of great interest in biotechnology to enable industrially important and environmentally impactful reactions under milder conditions. However, to render natural enzymes industrially useful, significant engineering to modify their function is typically required. With the advent of directed evolution as a robust method of tailoring function, this goal is now within reach.

Several approaches to obtain new biocatalysts have been developed. The most widely used exploits the promiscuous activities of natural enzymes. More recently, computational design has enabled the creation of biocatalysts entirely from scratch. However, activities found from enzyme promiscuity or generated through computational design are generally low compared to naturally occurring enzymatic reactions. In both cases, low starting activities can be optimized by directed evolution, a powerful engineering algorithm that can be applied to tailor enzyme properties in the absence of prior knowledge about structure or mechanism. Although directed evolution provides a method to explore new enzyme functions, beneficial mutations in the vast space of possible sequences are extremely rare. As a consequence, many iterative cycles of mutation and screening may be needed to achieve the desired function. The success of laboratory evolution is thus often limited by the number of variants that can be screened in a reasonable amount of time.
In this thesis, we explore fluorescence-activated droplet sorting (FADS) as an ultra-high throughput method to expedite this process. Screening of enzymatic activity in microfluidic droplets offers a more than 1000-fold increase in throughput compared to regular microtiter plate-based assays. The commonly used means of detecting enzymatic reactivity in picoliter-sized droplets is fluorescence spectroscopy. This limits the utility of droplet sorting, because it usually requires the use of labeled model substrates of little real-world interest.

In Chapter 2, we describe a strategy that overcomes this limitation. Using an enzymatic cascade reaction to detect hydrogen peroxide, a common by-product of many enzyme-catalyzed reactions, oxidase enzymes can be assayed in ultra-high throughput in a label-free manner. We employed this approach to improve the promiscuous activity of cyclohexylamine oxidase (CHAO) for non-natural substrates. An initial library of CHAO variants containing four million members was created and sorted for oxidation of the non-natural substrates (S)-1-phenylpropylamine and 1,2,3,4-tetrahydroisoquinoline. For these two amines, active variants were found that had a 9-fold and a 50-fold improvement in catalytic efficiency, respectively. This result is especially notable, as the respective starting activities of CHAO were 30% and 0.3% of its activity towards the native cyclohexylamine. It demonstrates the large dynamic range in activities that can be handled by this FADS assay.

By creating a bespoke mutant library for the hydrophobic active site of CHAO with non-degenerate codons, we targeted 75% of all active site residues simultaneously. Using our assay we screened the resulting 1.7 × 10⁶ variants for the oxidation of 1-phenyl-1,2,3,4-tetrahydroisoquinoline (PheTIQ), a bulky chiral active pharmaceutical ingredient for which the wild-type enzyme possesses little activity (kcat/KM = 10 M⁻¹ s⁻¹, kcat = 0.0095 s⁻¹). We discovered a mutant that shows a 960-fold improvement in kcat/KM, to 9,400 M⁻¹ s⁻¹, towards the (R)-enantiomer of PheTIQ.
Thus, in one step of mutagenesis and screening, we created an enzyme that has similar activity to CHAO with its native substrate (kcat/KM = 10,630 M⁻¹ s⁻¹). The large increase in efficiency is mainly the consequence of a 340-fold improved rate acceleration. Additionally, this variant proved to be highly selective, exhibiting a 4,200-fold preference for the (R)-enantiomer. As a result, (S)-1-phenyl- and (S)-1-ethyl-1,2,3,4-tetrahydroisoquinoline could be synthesized with e.e. values of 99% and 98%, respectively. Computational modeling and further kinetic characterization of this variant showed that the active site was radically reshaped to completely alter the substrate scope of the enzyme. Overall, the enzyme now preferentially converts (R)-enantiomers of bulky secondary and primary amines, which constitutes a substantial switch in stereo- and substrate preference compared to the wild-type enzyme’s strong selectivity for small (S)-configured primary amines.

Besides improving promiscuous enzyme activity, FADS has also been successfully applied to the optimization of computationally designed catalysts. The design of metalloenzymes is an especially promising technique to generate enzymatic activity de novo. Combining the design of protein folds with the inherent reactivity of a metal has resulted in several new enzymes that can be modified by directed evolution. A textbook example is MID1sc10, which started as a small zinc-binding dimer and was transformed through computational remodeling and nine rounds of enzyme engineering into an artificial zinc-dependent metalloesterase with natural-like efficiency. However, its turnover number was subpar.

In Chapter 3, we describe the development of a microfluidic assay to evolve MID1sc10 to even higher rate accelerations. Our assay makes use of periplasmic export of the enzyme so that cells can be left intact during the sorting process.
We used this method to screen a large library of MID1sc10 active site mutants. We show that two consecutive sorts with varying stringency enriched the variant pool for active enzymes. The fastest variants displayed convergence in their mutational pattern, and kinetic characterization showed that the best mutant has a turnover number in a similar range to that of MID1sc10. Interestingly, a key arginine residue that stabilizes the oxyanion transition state in MID1sc10 was removed, and a second residue that had not previously been targeted for mutation was mutated to an arginine, functionally substituting for the original arginine. Finding useful variants that have significant sequence changes with respect to the starting enzyme highlights the importance of mutating multiple residues simultaneously to explore larger areas of the sequence-function landscape and detect synergistic interactions.

Our experiments showcase the utility of FADS for the screening of enzymatic function. Microfluidic screening platforms have a remarkable dynamic range, as we have demonstrated by screening enzymes with starting kcat/KM values ranging from 0.5 up to 9 × 10⁵ M⁻¹ s⁻¹. The enormous throughput of these assays allowed us to exhaustively assay millions of variants and uncover synergistic effects between multiple residues. This capability will make it possible to tackle even harder evolutionary problems, such as improving enzymes that are barely active or already close to perfection. The approaches introduced here, specifically the development of label-free assays and simultaneous mutation of multiple residues to harness synergy, promise to greatly advance the rapid engineering of enzymes. They may catalyze the implementation of environmentally benign biocatalytic routes to high-value chemicals.

Zusammenfassung

Virtually all biologically relevant chemical reactions are catalyzed by enzymes. These are sophisticated biocatalysts shaped by billions of years of natural evolution. Enzymes exhibit astonishing turnover numbers, specificities, and selectivities. Some even reach catalytic efficiencies that allow reactions to proceed at nearly the rate of diffusion. In general, enzymes are far more efficient catalysts than their human-made counterparts, and they operate at ambient temperature and in aqueous solution. Enzymes are therefore of great interest in biotechnology for enabling industrially important and environmentally harmful reactions under milder conditions. To make natural enzymes industrially useful, they normally have to be adapted substantially to the process. Thanks to directed evolution as a robust method for modifying enzyme function, this goal is now within reach.

Several approaches for obtaining new biocatalysts have been developed so far. The most widely used exploits promiscuous activities of natural enzymes. More recently, the computational design of enzymes has enabled the creation of artificial biocatalysts. However, promiscuous or artificially created activities are low compared to naturally occurring enzymatic reactions. In both cases, though, low starting activities can be optimized by directed evolution. This is a robust optimization algorithm that can create tailor-made enzymes without prior knowledge of their structure or mechanism. Although directed evolution offers a method for exploring new enzyme functions, beneficial mutations are extremely rare among the large number of all possible amino acid sequences. As a result, many iterative cycles of mutation and selection may be required to achieve the desired function. The success of directed evolution is therefore often limited by the number of variants that can be analyzed in a reasonable time span.

In this work, we investigate fluorescence-activated sorting of picoliter-sized droplets (FADS) as an ultra-high throughput method to accelerate this process. Screening enzymatic activity in microfluidic droplets offers a more than 1000-fold increase in throughput compared to conventional methods based on microtiter plate analyses. The most common means of detecting enzymatic reactivity in microfluidic devices is fluorescence spectroscopy. This limits the utility of microfluidic screening, because it usually requires modified model substrates of little industrial interest.

In Chapter 2, we describe a strategy that overcomes this limitation. With the help of an enzymatic cascade reaction that detects hydrogen peroxide, a frequent by-product of enzyme-catalyzed reactions, oxidase activities can be determined in ultra-high throughput without using a labeled substrate. We used this method to improve the promiscuous activity of cyclohexylamine oxidase (CHAO) for non-natural substrates. An initial library of CHAO variants consisting of four million mutants was cloned and sorted for oxidation of the non-natural substrates (S)-1-phenylpropylamine and 1,2,3,4-tetrahydroisoquinoline. For these two amines, active variants were found that showed a 9-fold and a 50-fold improvement in catalytic efficiency, respectively. This result is particularly remarkable because the respective starting activities of CHAO are 30% and 0.3% of its activity towards the native substrate cyclohexylamine. This shows the large range of activities that can be detected with this FADS assay.

We created a tailored mutant library for the hydrophobic active site of CHAO. Using compressed codons, we were able to mutate 75% of all residues of the active-site pocket simultaneously. With our method we analyzed the resulting 1.7 × 10⁶ variants for the oxidation of 1-phenyl-1,2,3,4-tetrahydroisoquinoline (PheTIQ), a bulky chiral pharmaceutical building block for which the natural enzyme has only low activity (kcat/KM = 10 M⁻¹ s⁻¹, kcat = 0.0095 s⁻¹). We discovered a mutant that shows a 960-fold improvement in kcat/KM (9,400 M⁻¹ s⁻¹) for the (R)-enantiomer of PheTIQ. Thus, in one round of directed evolution, we created an enzyme that has a catalytic efficiency with its artificial substrate similar to that of CHAO for its natural substrate (kcat/KM = 10,630 M⁻¹ s⁻¹). The large gain in efficiency is mainly the consequence of a 340-fold improved rate acceleration. In addition, this variant proved to be very selective, with a 4,200-fold preference for the (R)-enantiomer. As a result, (S)-1-phenyl- and (S)-1-ethyl-1,2,3,4-tetrahydroisoquinoline could be synthesized with e.e. values of 99% and 98%, respectively. Computer simulations and further kinetic characterization of this variant showed that the active site has changed drastically in structure and that the enzyme has a new substrate preference. Overall, the enzyme now reacts preferentially with (R)-enantiomers of bulky secondary and primary amines, which represents a substantial change in the substrate preference of the enzyme.

Besides improving promiscuous enzyme activity, FADS has also been applied successfully to the optimization of computationally designed biocatalysts. The design of metalloenzymes is a particularly promising technique for generating enzymatic activity de novo. Combining artificially created protein folds with the inherent reactivity of a metal ion has led to several new enzymes that could be modified by directed evolution. A textbook example is MID1sc10, which originated in the computer as a small zinc-binding peptide dimer and was turned, through computational redesign and nine rounds of enzyme optimization, into an artificial zinc-dependent metalloesterase with natural-like catalytic efficiency. However, the turnover number of MID1sc10 was below average compared with natural enzymes.

In Chapter 3, we describe the development of a microfluidic method to evolve MID1sc10 toward even higher rate accelerations. Our assay uses periplasmic export of the enzyme so that the cells can be left intact during the sorting process. With this method we analyzed a large library of MID1sc10 active-site mutants. We show that two consecutive sorts with different stringency enriched active enzymes in the variant pool. The fastest variants showed convergence in their mutational pattern, and their kinetic characterization showed that the best mutant has a turnover number similar to that of MID1sc10. Interestingly, an important arginine residue that stabilizes the oxyanion transition state in MID1sc10 was mutated away, and a second residue that had not been targeted for mutation in previous rounds of evolution was mutated to an arginine, which functionally replaced the original arginine. The ability to mutate several positions simultaneously allowed us to achieve significant sequence changes and makes clear how important it is to cover large regions of sequence space in order to detect synergistic interactions.

Our experiments show the utility of FADS for the screening of enzymatic function. Microfluidic screening platforms have a remarkable dynamic range, as we have shown by screening enzymes with starting kcat/KM values from 0.5 up to 9 × 10⁵ M⁻¹ s⁻¹. The enormous throughput of these methods allowed us to examine millions of variants and uncover synergistic effects between multiple side chains. This will make it possible to tackle even more difficult evolutionary problems, such as improving enzymes that are barely active or already close to perfection. The approaches presented here, in particular the use of unlabeled substrates and the simultaneous mutation of multiple side chains to exploit synergistic effects, promise great progress for the rapid development of new enzymes. These methods may catalyze the use of enzymes for environmentally friendly biocatalytic processes.

Contents

Acknowledgements

Abstract

Zusammenfassung

1 Introduction
  1.1 The molecular architecture of enzymes
  1.2 Industrial biocatalysis and enzyme engineering
    1.2.1 The tailored biocatalyst
    1.2.2 Computational enzyme design
  1.3 Project aims

2 Evolving oxidases by ultra-high throughput screening
  2.1 Introduction
  2.2 Results
    2.2.1 Initial assay development
    2.2.2 Black and white sorting with cyclohexylamine
    2.2.3 Initial FADS experiments
    2.2.4 Single-step creation of a proficient oxidase
  2.3 Discussion

3 Improving a proficient artificial metalloenzyme
  3.1 Introduction
  3.2 Results
    3.2.1 Esterase assay development
    3.2.2 Screening expression systems
    3.2.3 MID1sc10 library generation
    3.2.4 FADS for esterase activity
    3.2.5 Variant isolation
    3.2.6 Characterization of MID1sc10.1-3
  3.3 Discussion

4 Perspective

5 Materials and methods
  5.1 General methods
    5.1.1 Materials
    5.1.2 General analytical methods
    5.1.3 Creation of electrocompetent cells
    5.1.4 Oligonucleotides
  5.2 Methods specific to Chapter 2
    5.2.1 Construction of pKTNTET-CHAO_Kan
    5.2.2 Plasmid stability
    5.2.3 Library generation
    5.2.4 Microfluidic setup
    5.2.5 Microfluidic chip production
    5.2.6 Microfluidic assay
    5.2.7 Microtiter plate assay
    5.2.8 Enzyme purification
    5.2.9 Enzyme kinetics
    5.2.10 Substrate scope
    5.2.11 Chiral analysis
    5.2.12 Deracemization reactions
    5.2.13 Modeling of PT.1
    5.2.14 Ligand docking of PheTIQ in PT.1
  5.3 Methods specific to Chapter 3
    5.3.1 Materials
    5.3.2
    5.3.3 Construction of pMG209-pelB-MID1sc10
    5.3.4 Construction of pQE-MID1sc10
    5.3.5 Construction of pKTNTET-pelB-MID1sc10
    5.3.6 Construction of pKTNTET-MID1sc10
    5.3.7 Construction of the active site library
    5.3.8 Microfluidic assay
    5.3.9 Microtiter plate assay
    5.3.10 pQE-MBP subcloning
    5.3.11 Protein purification
    5.3.12 Enzyme characterization

Bibliography

A Supplementary figures
  A.1 CHAO discovery
  A.2 Kinetic characterization of CHAO
  A.3 Plasmid maps

B Supplementary Tables
  B.1 DYT and BYT codon table
  B.2 Screening hits
  B.3 Evolutionary analysis of CHAO

Curriculum vitae

Chapter 1 Introduction

The nature of chemical bonds has greatly impacted the origin of life and the molecular workings of cellular organisms. On the one hand, stable atomic connections are crucial for organisms to survive and stably encode information. On the other, the cell’s energy economy, homeostasis in various environments, and reproduction require constant bond formation and cleavage. For example, the most ubiquitous biopolymers (proteins, oligonucleotides and sugars) are continuously synthesized and recycled by cells. Comparing their spontaneous breakdown, which proceeds with half-lives in the range of hundreds to millions of years in aqueous solution at ambient temperature, to the rate of biological processes [1], it becomes apparent that catalysis lies at the heart of a chemical explanation for life. Unsurprisingly, most modern theories of the origin of life place the emergence of a biological catalyst at the center of their hypotheses [2,3]. Regardless of life’s origins, enzymes are now the most prevalent biological catalysts, responsible for controlling reaction rates in all biological systems.

From a phenomenological perspective, enzymes are well understood and not thought to be any different from small-molecule catalysts. In order for a substrate to be transformed into a product, it has to pass through a high-energy transition state whose formation is rate determining (Fig. 1.1a) [4]. Catalysts accelerate reactions by lowering this energy barrier. Combining transition state theory with a thermodynamic cycle that links the catalyzed and spontaneous reactions reveals that enzymes need to bind the transition state more tightly than their substrate to reach large rate accelerations (Fig. 1.1b,c) [5]. Given the very low rates of the uncatalyzed reactions (kuncat) for the common biochemical transformations described above, enzymes achieve extremely large rate accelerations (kcat/kuncat) in the range of 10⁷ to 10¹⁹ [1]. The fastest enzymes reach catalytic efficiencies (kcat/KM) approaching the diffusion limit, i.e. the rate at which the substrate and the enzyme encounter each other [6].

However, such a phenomenological explanation of catalysis is only a first step toward understanding the origins of these effects. Questions that remain include: What mechanisms are used to achieve such rate accelerations? What are the origins of today’s enzymes? And, as an ultimate test of our knowledge, can we re-engineer existing catalysts and create enzymes from scratch in the laboratory?
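The thermodynamic-cycle argument summarized above (Fig. 1.1b,c) can be written out explicitly. The following is a sketch in standard transition state theory notation. The dissociation constants K_S and K_TS and the pseudo-equilibrium constants K‡ are symbols introduced here for illustration; they are not defined in the text, and the transmission coefficient is assumed to be the same for both pathways.

```latex
% Equilibria of the cycle (all symbols are illustrative assumptions):
%   K_S  = [E][S]/[ES]                     substrate dissociation constant
%   K_TS = [E][S^\ddagger]/[ES^\ddagger]   transition-state dissociation constant
%   K^\ddagger_uncat = [S^\ddagger]/[S],   K^\ddagger_cat = [ES^\ddagger]/[ES]
\begin{align}
  k_{\mathrm{uncat}} &= \frac{k_{\mathrm{B}}T}{h}\,K^{\ddagger}_{\mathrm{uncat}},
  \qquad
  k_{\mathrm{cat}} = \frac{k_{\mathrm{B}}T}{h}\,K^{\ddagger}_{\mathrm{cat}} \\
  K_{\mathrm{TS}}
    &= \frac{[\mathrm{E}][\mathrm{S}^{\ddagger}]}{[\mathrm{ES}^{\ddagger}]}
     = \frac{[\mathrm{E}]\,K^{\ddagger}_{\mathrm{uncat}}\,[\mathrm{S}]}
            {K^{\ddagger}_{\mathrm{cat}}\,[\mathrm{ES}]}
     = K_{\mathrm{S}}\,\frac{k_{\mathrm{uncat}}}{k_{\mathrm{cat}}} \\
  \frac{k_{\mathrm{cat}}}{k_{\mathrm{uncat}}}
    &= \frac{K_{\mathrm{S}}}{K_{\mathrm{TS}}}
\end{align}
```

Under these assumptions, a rate acceleration of 10⁷ to 10¹⁹ corresponds to a transition state that is bound 10⁷ to 10¹⁹ times more tightly than the substrate.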

1.1 The molecular architecture of enzymes

Enzymes are polypeptide chains that fold into a globular shape and possess a well-defined active site for catalysis. This site provides a three-dimensional environment that can stabilize the transition state through i) shape complementarity, ii) precisely oriented Coulombic interactions between the enzyme and evolving charges/hydrophobic moieties, iii) pre-organization of the reactants, and iv) covalent bonds [7]. Furthermore, enzymes can recruit cofactors, including small organic molecules or metal ions, to further expand their catalytic arsenal. The way that enzymes employ these tools to catalyze specific reactions needs to be evaluated on a case-by-case basis. Here, four examples will illustrate how enzymes can perform catalysis through side chain-substrate interactions, dative binding of catalytic metals, small organic molecule cofactors, and metal-binding cofactors.

Serine hydrolases are a prime example of enzymes that use covalent catalysis for the hydrolytic cleavage of amide and ester bonds [8]. Their catalytic mechanism has been well studied biochemically. Structural characterization with mechanism-based inhibitors and targeted mutation of active site residues has revealed three residue side chains that are involved in catalysis. This catalytic triad consists of a serine, a histidine, and an aspartate [9]. The serine is rendered more nucleophilic by a hydrogen bonding network within the triad and attacks the scissile carbonyl group of the substrate. This reaction affords the first tetrahedral intermediate (Fig. 1.2a). Cleavage of the ester bond and departure of the alcohol leaving group then yields an acyl-enzyme intermediate that can often be detected and in some cases even isolated. Subsequently, a water molecule is activated by the rest of the triad and attacks the acyl-enzyme intermediate, leading to the second tetrahedral intermediate. Both high-energy species are stabilized by a network of hydrogen bonds throughout the catalytic triad. Mutation of either the serine or histidine residues in the triad to alanine results in a 10⁶-fold decrease in kcat, whereas substitution of aspartate leads to a 10⁴-fold decrease [10]. Mutation of two triad residues simultaneously does not further decrease activity, showing that this sophisticated arrangement works in a highly synergistic fashion. The developing negative charge on the substrate is stabilized by bidentate hydrogen bonding to the backbone or side chain amides [11, 12]. This catalytic apparatus enables reaction efficiencies approaching the diffusion limit for enzymes like acetylcholinesterase [13]. Notably, this enzyme reaches such impressive efficiency using only amino acids and their side chains: the catalytic triad, an oxyanion hole, and carboxylic acid side chains that stabilize the quaternary ammonium group of acetylcholine.

Figure 1.1: Thermodynamic and catalytic considerations in enzyme catalysis. a Gibbs free energy diagram of a reaction that proceeds spontaneously (uncatalyzed, red), as well as enzymatically catalyzed (blue). b A thermodynamic cycle of a substrate being converted to a product via a spontaneous and an enzyme-catalyzed pathway. c Due to the path independence of the Gibbs free energy, the total energy change along the cycle must be zero. Applying the laws of Arrhenius and transition state theory reveals that the transition state must be bound much more strongly than the substrate for catalysis to occur.

Figure 1.2: Paradigms of enzyme catalysis. a Covalent catalysis as observed in serine proteases. A covalent intermediate between the enzyme and the substrate is formed during the reaction. b Dative coordination of metal cofactors expands the reactivity of enzymes. Here a Zn(II) ion is bound to activate hydroxide ions for the cleavage of peptide bonds. c Enzymes can also expand their catalytic properties by anchoring small-molecule cofactors. Here, FAD is bound by an oxidase and enables the abstraction of electrons from the substrate. d Metalloenzymes can bind metals embedded in an organic coenzyme. In heme enzymes, for example, an iron-porphyrin coenzyme enables the generation of highly reactive iron-oxo species.

Utilization of metal cofactors is an alternative strategy that enzymes use to speed up chemical reactions. Metal ions possess an inherent reactivity that can be harnessed for catalysis. They act as simple Lewis acids [14] or redox equivalents [15, 16]. Metals can be either bound to a cofactor, which is recognized by the enzyme, or bound datively, i.e. coordinated by side chain residues.
While the metal ion confers a basis for chemical reactivity, the enzyme’s task is to steer the reaction. It provides selectivity for a substrate and product by enforcing a specific orientation on the active metal and may also take part in the reaction through acid-base catalysis. For example, thermolysin facilitates the cleavage of peptide bonds by activating water using a datively coordinated Zn(II) as a Lewis acid (Fig. 1.2b) [17]. Additionally, the zinc ion polarizes the carbonyl group of the substrate, making it more electrophilic. This leads to a transition state with a pentacoordinate zinc cation, bound by three dative ligands and the two co-substrates, in which an active-site glutamic acid residue functions as a general base to deprotonate the water and protonate the leaving group amine. Inhibitor studies show that residues distant from the active site determine substrate binding and recognition [18, 19]. As a result, even small perturbations to the substrate far away from the scissile bond drastically change catalytic efficiency, e.g. a 200-fold reduction of kcat/KM when an alanine is replaced by glycine at the -1 position of the substrate [20]. Thus, embedding Zn(II) in an enzyme can yield a superb hydrolytic catalyst.
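The fold changes quoted above for the catalytic triad mutants and for the altered thermolysin substrate can be placed on a common energy scale. The Python sketch below converts a fold change in a rate constant into the equivalent change in activation free energy using ΔΔG‡ = RT ln(fold change), which follows from transition state theory if the pre-exponential factor is unchanged. The function name, the labels, and the temperature of 298.15 K are assumptions made for illustration; only the fold changes themselves come from the text.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def barrier_change_kj_per_mol(fold_change, temperature_k=298.15):
    """Return the change in activation free energy (kJ/mol) that
    corresponds to a given fold change in a rate constant.

    Assumes transition state theory with an unchanged prefactor;
    298.15 K is an assumed temperature, not taken from the text.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return GAS_CONSTANT * temperature_k * math.log(fold_change) / 1000.0


# Fold changes quoted in the text; the labels are illustrative.
quoted_fold_changes = {
    "Ser or His to Ala in the triad (kcat)": 1e6,
    "Asp substitution in the triad (kcat)": 1e4,
    "Ala to Gly in thermolysin substrate (kcat/KM)": 200,
}

for label, fold in quoted_fold_changes.items():
    print(f"{label}: {barrier_change_kj_per_mol(fold):.1f} kJ/mol")
```

Because the relation is logarithmic, a 10⁶-fold effect corresponds to only about 1.5 times the energy of a 10⁴-fold effect, which is one way to compare the contributions of individual residues.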

In order to perform reactions that necessitate electron transfer, enzymes can use small molecule cofactors. Flavin-dependent enzymes, for example, tightly bind flavin adenine dinucleotide (FAD), in some cases even covalently. This cofactor can exist in three oxidation states: the oxidized FAD form, the one-electron reduced semiquinone FADH, and the two-electron reduced FADH2 [21]. Oxidase enzymes utilize this cofactor to oxidize various functional groups, such as amines, alcohols and thiols (Fig. 1.3) [22, 23]. D-Amino acid oxidases catalyze the enantioselective oxidation of important neuromodulator amino acids and dopamine [24]. They do so by positioning the substrate with a strong bidentate hydrogen bond between an active site arginine and the substrate’s carboxylic acid group and activating a water molecule to act as a base (Fig. 1.2c) [25]. This forces the alpha-hydrogen to point towards the FAD cofactor, initiating hydride transfer [26]. The amino acid’s side chain is positioned in a loose hydrophobic pocket, which restricts binding in a catalytically productive manner to D-configured substrates but places little restriction on the size of the side chain. Therefore, even though the oxidase catalyzes the reaction with extremely high selectivity factors (S = (kcat/KM)D/(kcat/KM)L) of up to 4,000 [27], it exhibits a broad substrate scope, converting several D-amino acids and D-dopamine with similar efficiency [28, 29]. The protein environment also thermodynamically stabilizes the semiquinone structure, which enables FADH2 to efficiently reduce molecular oxygen to hydrogen peroxide [30, 31]. By enacting single electron transfers to oxygen, the enzyme circumvents the direct, but slow, spin-forbidden reaction with triplet oxygen (³O2) [32]. This process regenerates the cofactor to its oxidized FAD state and resets the enzyme for the next turnover. Oxidases and other flavoenzymes tune the redox potential, and therefore the reactivity, of FAD via non-covalent interactions [33] or by covalently binding the cofactor [34].
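The selectivity factor used above, S = (kcat/KM)D/(kcat/KM)L, is a ratio of two catalytic efficiencies. The following Python sketch shows the arithmetic only. The function names and all kinetic constants are hypothetical placeholders, chosen so that S comes out at 4,000; they are not measured values for any D-amino acid oxidase.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """Return kcat/KM in M^-1 s^-1, given kcat in s^-1 and KM in M."""
    return kcat / km


def selectivity_factor(kcat_d: float, km_d: float,
                       kcat_l: float, km_l: float) -> float:
    """S = (kcat/KM) of the D-enantiomer divided by (kcat/KM) of the L-enantiomer."""
    return catalytic_efficiency(kcat_d, km_d) / catalytic_efficiency(kcat_l, km_l)


# Hypothetical kinetic constants, for illustration only.
s = selectivity_factor(kcat_d=10.0, km_d=1e-3,   # D-substrate: 1e4 M^-1 s^-1
                       kcat_l=0.05, km_l=2e-2)   # L-substrate: 2.5 M^-1 s^-1
print(f"S = {s:.0f}")
```

Because S compares kcat/KM rather than kcat alone, a high value can arise from weaker binding of the disfavored enantiomer, slower turnover, or both.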

Figure 1.3: a The structure of the FAD cofactor. b Proposed mechanism of an FAD-dependent amine oxidase. In the reductive half reaction, a hydride transfer from the substrate to FAD leads to the formation of FADH2 and the oxidized substrate. After substrate release, molecular oxygen enters the active site. By performing a single electron transfer, forming the FAD semiquinone and superoxide, oxidase enzymes avoid having to perform a spin-forbidden reaction. Subsequent recombination of the semiquinone and superoxide initiates the second reduction step that yields FAD and hydrogen peroxide. The cycle is completed by H2O2 release, and another substrate molecule can be converted.

For more difficult reactions, enzymes use redox active metal ion cofactors. Cytochromes P450, for example, are very powerful oxidants that make use of iron’s redox properties and an electron transport chain to supply redox equivalents. However, unlike metalloproteases, the metal ion is part of an organometallic cofactor that is bound by the folded apo-enzyme [35]. Iron(III) is embedded in a planar porphyrin cofactor that provides four equatorially coordinating ligands for the metal (Fig. 1.2d). Only a cysteine side chain from the enzyme coordinates the iron axially on the opposite side of the reactive center [36]. With the aid of electron-supplying coenzymes, electron shuttling cofactors, and molecular oxygen, cytochromes manage to produce highly reactive iron-oxo species that abstract electrons from unactivated C-H bonds [37]. Generally, substrate binding starts the catalytic cycle, enabling reduction of Fe(III) to Fe(II), which then coordinates molecular oxygen. Another reduction step and loss of water yields the highly reactive Fe(IV)=O cation radical species, which is able to insert oxygen into C-H bonds by a radical rebound mechanism (Fig. 1.5) [36].

Figure 1.4: Examples of cytochrome P450 mediated reactions. a P450 11B1 produces cortisol from 11-deoxycortisol by oxidizing a single aliphatic C-H bond. b P450cam regioselectively monooxygenates camphor to produce only 5-exo-hydroxy-camphor.

Human P450s that act on steroid hormones are a case in point. For example, P450 11B1 catalyzes crucial steps in the biosynthesis of cortisol and aldosterone by hydroxylation of an unreactive aliphatic carbon (Fig. 1.4a) [38]. The most studied example of a P450 enzyme is, however, P450cam from Pseudomonas putida [39]. It performs the regio- and stereospecific hydroxylation of an aliphatic carbon in camphor, yielding 5-exo-hydroxy-camphor as a single product (Fig. 1.4b).

Atkins and Sligar have shown by site-directed mutagenesis and the synthesis of various substrate analogs that P450cam aligns the substrate precisely with the reactive cofactor by means of a single hydrogen bond to the ketone group and a highly complementary active-site pocket [40, 41]. Considering the complexity of the reaction cycle [42], which involves large conformational changes of the enzyme and protein-protein interactions with the electron-providing coenzyme [43, 44], this reaction is a prime example of the sophistication of highly evolved enzymatic systems. For comparison, uncatalyzed monooxygenations of this type usually only proceed at high temperature, and there is no precedent in the synthetic chemical literature for such high reaction specificities [45]. This raises the question of whether the catalytic prowess of enzymes can be harnessed in synthetic and industrial manufacture.

Figure 1.5: The generally accepted reaction mechanism of a P450 monooxygenase. Binding of the substrate initiates the first electron transfer, followed by oxygenation of the iron. Oxidation followed by protonation of the oxo-iron species leads to loss of water and formation of a highly reactive iron(IV) species that can insert oxygen into unreactive C-H bonds.

1.2 Industrial biocatalysis and enzyme engineering

Enzymes not only display large rate accelerations and exquisite enantio- and regioselectivities, but also do so at moderate temperatures, ambient pressure, and in aqueous solutions. They therefore represent an important green alternative to conventional chemical synthesis of value-added compounds [46]. Examples of industrial-scale fermentation reach as far back as 1880, with the targeted fermentation of acetic acid [47], but the biggest developmental wave started in the middle of the twentieth century. The selective hydrolysis of penicillin to 6-aminopenicillanic acid, without opening the lactam ring, by penicillin amidase enabled the industrial-scale production of semi-synthetic beta-lactam antibiotics [48]. Similarly, P450-mediated hydroxylation of progesterone enabled an industrially viable synthesis of cortisol, bypassing 31 steps in the previous synthesis route [49]. The use of a hydrolytic enzyme in the production of synthetic human insulin is another important, later example of the utilization of enzymes, in this case for the preparation of an important biopharmaceutical [50]. Human insulin and its equivalent from pigs differ by just the C-terminal residue (Thr vs. Ala) on the B-chain. Morihara et al. succeeded in producing human insulin from porcine insulin by removing the terminal Ala through cleavage with carboxypeptidase A and subsequent coupling of the pro-insulin with Thr-OtBu in a reaction catalyzed by trypsin [50].

Although these examples illustrate the industrial potential of enzymes, they depend on wild-type, i.e. unaltered, enzymes and required extended reaction engineering to make the processes economical. The conditions demanded by chemical processes, however, often precluded the use of available natural enzymes, which have evolved to perform under physiological conditions. Additionally, the available catalysts may have had only low activity towards industrially interesting compounds [47].
In almost every case, their physical properties and chemical activity had to be altered through enzyme engineering to suit the substrate and other constraints of the reaction in question [51]. With the advent of recombinant DNA technology in the 1980s, structure-based engineering of the biocatalysts themselves became possible [51]. This advance provided access to modified scaffolds that allowed the conversion of some non-natural substrates. Nevertheless, large changes through enzyme engineering were still difficult to achieve.

Unlike small molecule catalysts, enzymes are encoded by DNA and can be tuned and optimized systematically using a technique that mimics natural evolution, called directed evolution. As first shown by Spiegelman and co-authors in 1967 [52], and later extended by Arnold and Stemmer [53, 54], genes can be altered by the researcher to produce novel variants (mutagenesis) whose activity is assayed individually. In this way, variants with beneficial mutations can be selected. Iterating this process by using the most proficient variant from one round of evolution in a subsequent round enables the discovery of biocatalysts that differ drastically from their starting point [51, 55–57]. Importantly, directed evolution works even with no prior knowledge about the structure of the enzyme. It is thus a general and very robust strategy to improve enzymes [58].

Genetic diversity can be generated by introducing mutations randomly into the gene encoding the starting enzyme [54, 59]. Alternatively, structural and evolutionary data can be assessed to guide targeted mutation [60, 61]. These approaches have been used to evolve enzymes that tolerate high organic solvent concentration and temperature [53, 62], improve or completely invert enantioselectivity [63, 64], convert novel substrates [65], and even perform completely abiological reactions [66]. Directed evolution changed the way biocatalysts are viewed. No longer was it necessary to adapt an industrial process to an existing enzyme. Instead, enzymes could be tailored to the process [51]. Nowadays, biocatalysts are typically considered during reaction and process planning [67–69], and sizeable collections of well characterized biocatalysts are commercially available [70].

1.2.1 The tailored biocatalyst

Directed evolution is an expedient way to obtain tailor-made catalysts. Together with the increasing numbers of sequenced genomes [71] and computational tools [72, 73], this approach provides a reliable route to new enzymes for use in the chemical industry. The biocatalysts employed in many contemporary processes might not resemble their starting point. For example, to synthesize a key chiral alcohol from its ketone precursor in the production of Montelukast, a drug used to treat acute asthma attacks, Codexis evolved a ketoreductase (KRED, Fig. 1.6) [65]. A crucial synthetic intermediate in the desired production route contains various groups that prohibit the use of standard metal hydride reagents [74] or conventional hydrogenation reactions [75]. Only the corrosive and moisture-sensitive chemocatalyst (−)-DIP-Cl proved suitably selective and sufficiently mild to promote the reaction with few side products. Biocatalysis was therefore seen as an attractive option. Initial screening of enzyme panels yielded a starting point that already exhibited 99.9% selectivity for the desired (S)-enantiomer, but the starting activity was >1000-fold lower than necessary to compete with (−)-DIP-Cl catalysis [65]. Not only did the kcat/KM value need to be significantly improved, the enzyme also had to function in a 1:5:3 toluene/IPA/water solvent system at elevated temperature. The isopropanol acts as a sacrificial hydride donor to regenerate the NADH cofactor. Five rounds of directed evolution yielded a 3,000-fold improved catalyst that promoted the target reaction with complete stereocontrol, affording the product in >98.5% chemical purity (Fig. 1.6). Since then, other work on the redox chemistry of alcohols using enzymes has led to the development of synthetic routes to access lactones from diols by driving the reverse reaction to form terminal acids followed by ring closure [76], chemoselective preparation of heterocyclic alcohols [77], cascade reactions to invert the stereocenter of sec-alcohols [78], the discovery of KREDs that produce sec-(R)-enantiomers [79], and the expansion of KRED reactivity to perform single-electron transfers by exploiting the possibility of photoexciting NADH [80].

Figure 1.6: Asymmetric synthesis of a precursor to Montelukast. An evolved ketoreductase (KRED) enabled the biosynthetic production of a chiral alcohol at high yield. The enzymatic cofactor NADPH is recycled in situ using isopropanol as sacrificial hydrogen donor, leading to acetone production. Unlike its chemical counterparts, no side-reactions with R1 and R2 take place.

Similar to chiral alcohol synthesis, problems involving regioselectivity and environmental impact also affect the production of chiral amines. Several enzymatic routes to produce chiral amines have been developed as attractive alternatives to chemical synthesis (Fig. 1.7). Primary amines can be synthesized directly by transaminases from a ketone precursor [81, 82]; amine oxidase and imine reductase (IRED) enzymes have been used to produce optically pure secondary amines in high yield [83, 84]; and aspartase has been shown to catalyze the hydroamination of C-C double bonds [85]. Additionally, reductive aminases (RedAm) have more recently been discovered [86]. Both RedAms and IREDs perform the reduction of imines. Although IREDs can perform the reductive amination of ketones, extremely high ratios of amine to ketone are required to drive the initial conversion to the imine in aqueous media [87]. RedAms can perform the same reaction as IREDs but at a 1:1 ratio of ketone to amine by catalyzing imine formation as well as the subsequent reduction. Alternative approaches to amine derivatives have utilized heavily engineered enzymes to perform C-H amination by incorporating artificial iridium porphyrin cofactors [88] or evolving cytochrome P450 enzymes to catalyze nitrene transfer reactions [89].

Figure 1.7: Retrosynthetic analysis of biocatalytic chiral amine synthesis. Transaminases can asymmetrically synthesize primary amines from the corresponding ketones and an amine donor, e.g. isopropyl amine. Engineered P450s have been shown to promote the C-H insertion of amines to aliphatic benzenes. Aspartase enzymes catalyze the addition of amines to alpha-beta unsaturated carboxylic acids. Enantioselective oxidases combined with oxygen and a non-specific reductant deracemize a racemic mixture. Imine reductases (IREDs) and reductive aminases (RedAms) perform the reduction of imines.

Perhaps the most notable engineering effort, however, is the development of a biocatalytic route to Sitagliptin. As the demand for this drug increased due to the ever-increasing number of cases of type II diabetes, Merck considered revising an established synthetic approach to the compound [90]. The original route involved an asymmetric hydrogenation of an enamine under 250 psi of H2, which was catalyzed by a rhodium-based Josiphos catalyst (Fig. 1.8b) [91]. The reaction itself was high yielding, but the enantioselectivity of the chemocatalyst was “only” 97% and the toxic rhodium metal necessitated a tedious cleanup. These problems were circumvented by engineering a transaminase to perform the asymmetric transfer of an amine to the ketone substrate (Fig. 1.8a) [92]. This feat proved especially challenging because no known transaminase converted substrates as large as prositagliptin. Hence, screening with the actual substrate was not possible [93–95]. A suitable starting point for evolution was found by computational modeling of an enzyme that had been previously engineered for the transamination of acetophenone [95]. This enzyme was then evolved by means of a substrate walk, starting with a smaller substrate analog and using increasingly bulky substrates over the course of evolution until the large target substrate could be converted. In every round the assay conditions were also rendered more stringent by increasing either the temperature, co-solvent, or substrate concentration. After 11 rounds of evolution and screening nearly 36,000 variants, the best biocatalyst contained 27 mutations compared to the starting point. Its activity was improved >10,000-fold and it yielded the desired product with >99.95% e.e. in 92% yield. The enzymatic manufacture of Sitagliptin marked a seminal achievement in industrial biocatalysis.
Figure 1.8: Merck’s routes to the anti-diabetic Sitagliptin. a The enzymatic synthesis was made possible through extensive evolution of a transaminase biocatalyst. The enzyme’s activity, first towards a dummy substrate (blue) and later towards the full substrate, was notably improved. Tolerance toward high concentrations of amine donor, acetone, and increased temperature were important factors as well. b The traditional route to the chemical included more steps in total.

With increasing awareness of enzymes as synthetic tools, recent development has focused on the integration of multi-reaction cascades catalyzed by several enzymes [96]. The synthesis of chiral amines from alcohols by the Turner group provides an elegant example of an enzymatic cascade [81] (Fig. 1.9a). In the first step, an alcohol is converted into a ketone in a reaction catalyzed by an alcohol dehydrogenase, and in a second step the ketone is combined with ammonia and reduced by an IRED. Interestingly, the redox cofactor, NADH, that is produced in the alcohol oxidation step is recycled in the second reaction, imine reduction, making the overall reaction redox neutral.

In another notable example, Merck constructed a five-enzyme cascade to produce the anti-HIV nucleoside analog Islatravir [97] (Fig. 1.9b). Galactose oxidase (GalOx) was used to perform a challenging desymmetrization of 2-ethynyl-glycerol to produce (R)-2-ethynyl-glyceraldehyde, followed by phosphorylation of the pro-5’-hydroxy group by pantothenate kinase (PanK). Both enzymes were immobilized on solid supports to reduce the amount of protein needed for the process. The next steps follow the natural nucleoside salvage pathway and were carried out in situ. The ribose-5-phosphate analog was formed by an aldol condensation catalyzed by deoxyribose 5-phosphate aldolase (DERA). Next, phosphopentomutase (PPM) catalyzed the transfer of the phosphate group to the 3’-alcohol, activating it for nucleophilic displacement by the artificial nucleobase in a reaction catalyzed by purine nucleoside phosphorylase (PNP). It is noteworthy that all enzymes had to be evolved. Cumulatively, 23 rounds of directed evolution were needed to make every catalyst suit the overall reaction scheme.

The enantioselectivity of GalOx had to be reversed by introducing 34 mutations and DERA’s solvent tolerance needed to be enhanced. The other enzymes were mainly improved in their rate acceleration: PanK’s activity was improved 100-fold, PPM activity >70-fold, and PNP 350-fold. These efforts, together with efficient process engineering, led to the development of a highly efficient and streamlined manufacturing process. No intermediate substrate purification was needed and the product was obtained in half as many steps as the original synthetic route.

Both of these examples were carried out in vitro with purified enzymes and cleared lysates. Cascades have also been realized by expressing all enzymes in a single host and adding whole cells to the reaction mixture. The formal C-H amination of ethylbenzenes was achieved this way [98]. A previously engineered P450cam was able to perform hydroxylation of hydrocarbons to form secondary alcohols in living cells [99]. Subsequent oxidation to a ketone by an alcohol dehydrogenase enabled transamination, yielding a chiral primary amine end product.

The use of whole cells can complicate processes through crosstalk with cellular components [81] and issues with downstream processing [100]. However, with recent advances in the understanding of microbiological hosts and progress in metabolic engineering [101–103], the future of biocatalysis will probably involve constructing in vivo pathways to ferment the desired product, using only metabolically derived building blocks. At the core of every production route, however, lie individual enzymatic transformations. All of the examples mentioned above used heavily engineered enzymes, and the trend towards longer cascade sequences makes it even more obvious that expedient methods to produce tailored biocatalysts through directed evolution are urgently needed [104].

Figure 1.9: Enzymatic cascade reactions. a The combination of an alcohol dehydrogenase (ADH) and an amine dehydrogenase enables a redox neutral conversion from achiral alcohols to chiral amines. The NAD/H cofactor is always recycled by either partner enzyme. b The synthesis of islatravir in a five-step cascade reaction. Galactose oxidase (GalOx), aided by the auxiliary enzymes catalase, to remove H2O2, and horseradish peroxidase (HRP), to buffer GalOx’s redox state, performs a desymmetrization of 2-ethynyl glycerol. Pantothenate kinase transfers a phosphate from phospho-acetate to the pro-5’-hydroxyl group; acetate kinase (AcK) aids in forming new phosphate donor from acetate. An aldol condensation with acetaldehyde, catalyzed by deoxyribose 5-phosphate aldolase (DERA), yields the complete ribose analog. The 1’-hydroxy group is activated by a phosphate transfer through phosphopentomutase (PPM) and displaced by the non-natural base in a reaction catalyzed by purine nucleoside phosphorylase (PNP).

Screening for enzyme activity

Directed evolution works by artificially generating genetic diversity and screening the resulting libraries of mutants to identify variants with improvements in a desired property. Mutations can be introduced randomly through error-prone PCR [105], by targeting specific residues [106], or by de novo synthesis of libraries [107]. Regardless of the library generation method, turnaround time from wild-type gene to a library of mutants is typically greatly exceeded by the time it takes to screen these mutants. Furthermore, the theoretical library size grows exponentially when multiple residues are simultaneously mutated [108]. Importantly, although combinations of beneficial mutations may improve the catalyst synergistically in a multiplicative rather than additive manner [109], sequence space is thought to be only sparsely populated with improved variants [110] and useful combinations may only be accessible by passing through a low fitness region of the fitness landscape [111]. Strategies to reduce the number of clones in a library by using non-degenerate codons have proven effective, but they do not guarantee that the best solution will be found at a given amino acid position [112, 113]. Directed evolution is thus a numbers game, and success often depends on how many mutants can be tested during a campaign [114]. Higher throughput methods could therefore be the key to isolating proteins possessing the desired properties [115].

Genetic selection provides an approach to probe millions of enzyme variants [61]. However, the enzymatic product needs to be tied to the organism’s growth rate. In contrast, screening gene libraries by isolating individual clones and separately analyzing them one-by-one for activity is usually easier and provides tight control over reaction conditions.

The simplest way to screen enzyme libraries is to grow individual host colonies in microtiter plate wells.
After growth and expression of the gene, enzymatic activity can be analyzed directly by spectroscopy or mass spectrometry. Each well in a microtiter plate functions as a reaction chamber, providing a simple physical link between the genetic information encoded within the cells (genotype) and the measured parameter of the specific enzyme produced (phenotype). Up to 1,000 variants can usually be analyzed manually with such a setup. The above-mentioned montelukast, sitagliptin, and islatravir examples all employed complex robotic set-ups for enzyme evolution, which increase the throughput of this strategy to up to 10⁵ variants a day [65, 92, 97]. However, such robotic liquid handling systems are extremely maintenance intensive and costly in terms of consumables and high volumes of reactants [116]. Screening a library of 10⁵ members may cost up to 300,000 USD [117]. Logically, a way to miniaturize such systems would ameliorate many of these issues. Moving from plate-based screens to microcapillaries affords arrays of up to 10⁶ reaction vessels on a single plate as opposed to <1,536 wells on a microtiter plate [118]. Such miniaturization, however, leads to challenges in the handling of individual wells when a hit has to be isolated.

A more elegant solution to the problem would be to isolate hits by sorting libraries of cells in bulk into active and inactive categories. The best variants can then be enriched by increasing the sorting threshold over multiple sorts. Fluorescence-activated cell sorting (FACS) enables the screening of up to 10⁸ variants a day [119, 120]. FACS has been used to evolve hydrolases [121–123], non-ribosomal peptide synthetases [124], glutathione transferases [125], aminoacyl-tRNA synthetases [126], peroxidases [127], and oxidases [128]. These assays have the limitation that the readout must be fluorescence.
Additionally, either the substrate has to be able to permeate the cell or enzymes need to be displayed, and the product needs to stay associated with the cell. These factors often lead to crosstalk between variants and loss of dynamic range [128]. Techniques to mitigate these problems have been developed, such as encapsulation in polyelectrolyte shells [129] and, most notably, microfluidic water-in-oil emulsions [130].

In vitro compartmentalization in water droplets generated in bulk was the first method to use emulsions to link genotype and phenotype for directed evolution [131]. Bulk emulsions can be assayed by FACS at ultra-high throughput, but as they are polydisperse, reaction conditions may differ from droplet to droplet [132]. Advances in the fabrication of microstructures have enabled the rapid, exact, and cheap manufacture of microfluidic devices from polydimethylsiloxane (PDMS) [133]. A negative relief of a network of channels can be created by soft lithography: the design is produced as a photomask, which is used to illuminate a silicon wafer coated with a photoresist, and the non-illuminated parts are etched away. Surface treatment with a silane enables the wafer to be used as a mold to cast a positive relief with PDMS. After punching inlets to connect syringes to the chip, the PDMS slab can be plasma bonded onto glass to yield a closed channel. Electrodes can be made from molten alloys or highly concentrated salt solution to give dielectric input to the chip [134, 135]. By pumping an oil and an aqueous phase into intersecting channels, monodisperse water-in-oil emulsions can be made with droplet volumes ranging from femtoliters to picoliters [136, 137]. This enables the encapsulation of single cells according to a Poisson distribution, with tight control over reactant concentrations in every droplet [138].
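The Poisson statistics of single-cell encapsulation can be illustrated with a short Python sketch. It assumes that cells distribute independently among equally sized droplets with a mean occupancy lam (cells per droplet). The function names and the lam values are illustrative choices, not parameters taken from the cited work.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of empty, singly and multiply occupied droplets."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of cell-containing droplets that hold exactly one cell.
        "single_among_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    # Illustrative mean occupancies (assumed values).
    for lam in (0.1, 0.3, 1.0):
        s = occupancy_summary(lam)
        print(f"lam={lam:.1f}  empty={s['empty']:.3f}  single={s['single']:.3f}  "
              f"multiple={s['multiple']:.3f}  "
              f"single among occupied={s['single_among_occupied']:.3f}")
```

The sketch makes the trade-off explicit: lowering the mean occupancy raises the share of occupied droplets that contain a single cell, which protects the genotype-phenotype link, at the price of a larger fraction of empty droplets.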
Analogous to an electronic chip, microfluidic chips are modular and can perform many different functions on individual droplets. After their creation, droplets can have their contents mixed [139], be split or merged [140], have reactants added [141], be diluted [142], incubated [143], and sorted according to fluorescence [130], all at speeds of >1000 droplets/s (Fig. 1.10). Enzymes can be produced in droplets in various ways. They can be displayed on the surface of encapsulated cells [116], directly secreted into the droplets [144, 145], or released from cells by lysis [146]. Once released, they can be challenged by a substrate to generate a signal that can be detected. Combining the ability to lyse cells while maintaining the link between genotype and phenotype with the ultra-high throughput of FACS has made droplet microfluidics a powerful tool for laboratory evolution [147, 148]. Fluorescence-activated droplet sorting (FADS) is the most commonly used microfluidic technique for directed evolution, due to its sensitivity and fast measurement speed. The general workflow includes the creation and expression of a gene library, encapsulation of single cells, incubation for a period of time to allow the reaction to happen, and sorting of the library. This entire process can be performed iteratively to enrich the most active variants even further. Finally, once the library has been reduced to a more manageable size, individual variants can be analyzed in a secondary assay.

Figure 1.10: Features of a FADS microfluidics chip. a A general overview of the design that includes a T-junction to generate droplets and a one hour incubation line leading to a sorting junction. b The droplet maintains the link between the fluorogenic product of the reaction and the DNA encoding the enzyme variant despite the cell being lysed. c Zoom of the individual features of the chip. Adapted from [149].

The enormous throughput of FADS (up to 10⁸ variants per screen) was particularly useful for optimizing an artificial aldolase whose evolutionary trajectory with traditional techniques had stalled (Fig. 1.11) [149]. The computationally designed starting point for this evolution, RA95.0 [150], was initially evolved for the cleavage of methodol, a fluorogenic aldol substrate, over 13 rounds using 96-well plate assays. The best variant, RA95.5-8, displayed a 4,400-fold improvement in kcat/KM, from 0.17 M⁻¹ s⁻¹ to 1,600 M⁻¹ s⁻¹ [151]. This increase in catalytic efficiency was achieved by introducing a small number of mutations, mostly one, per round of evolution and recombining the positive clones through DNA shuffling. FADS, however, enabled the simultaneous randomization of up to five residues [149]. The explorative power of FADS made it possible to improve RA95.5-8 another 30-fold to give a kcat/KM value of 34,000 M⁻¹ s⁻¹. The rate acceleration of >10⁹ achieved by this catalyst rivals the activity of naturally occurring class I aldolases. The wealth of combinations tested harnessed synergistic effects between multiple residues, leading to a sophisticated active site. Lys83 forms a Schiff base with the substrate, facilitating C-C bond cleavage by acting as an electron sink. Tyr51 acts as a catalytic base with the hydroxide ion product during Schiff base formation. In the x-ray crystal structure these residues form an intricate hydrogen-bonding network with Asn110, Tyr180, and the covalently bound transition state analog. Simultaneous reversal of these mutated residues led to a 10⁶-fold loss of activity in the final variant RA95.5-8F. These intricate relations mirror complexities found in natural enzymes.
Furthermore, the authors showed that restarting the evolution with RA95.0 and using FADS from the beginning afforded much larger improvements than could be made in plate assays, owing to the ultra-high throughput [152].

Figure 1.11: Evolution of the computationally designed retroaldolase RA95.5-8F. a-c X-ray crystal structures along the evolutionary path from the computationally designed RA95.0 (a), evolved with 96-well plate assays to RA95.5-5 (b), which showed a rearranged active site. c shows the final variant RA95.5-8F, which was evolved using FADS, with the intricate hydrogen bonding network that has emerged. d The evolution of computationally designed enzymes is a promising way to derive new-to-nature catalysts.

FADS-compatible assays for a multitude of other enzyme types now exist, including peroxidases [116], laccases [153], hydrolases [146, 154–157], proteases [158, 159], and polymerases [160], which have all been engineered with this approach. Except for the work on glucosidases, where a coupled enzyme assay was used [157], all of these assays relied on tagged substrates that become fluorescent upon reaction. This places a large restriction on the possible reactions and substrates that can be tested with FADS. Furthermore, the hydrophobicity of the carrier oil used in droplet microfluidics also precludes the use of hydrophobic tags, as they would diffuse out of the droplet, breaking the link between genotype and phenotype [161].

The Hollfelder group succeeded in creating an absorbance-activated droplet sorting (AADS) device to evolve an amine dehydrogenase [162]. Although the small scale of a microfluidic chip results in short optical pathlengths, hampering absorbance measurements, the group overcame these limitations by placing a fiber optic directly on the chip and working with larger droplets. However, this resulted in a loss in throughput (400 Hz vs. up to 30 kHz [163]) and the system is only viable with strongly absorbing dyes. These limitations could potentially be overcome with an improved chip design [164]. Raman-activated droplet sorting (RADS) is an interesting pathlength-independent method and has been developed to sort astaxanthin-producing cells [165]. However, interference with the oil and diffraction at the droplet surface made it necessary to focus the laser on the cells in the continuous phase. These effects made this approach impractical for in vitro assays and the long acquisition time limited the throughput to 1 Hz.

Holland-Moritz et al. reported the first use of mass spectrometric signals to sort cells (mass-activated droplet sorting, MADS) [166]. Since mass spectrometry (MS) is a destructive technique, droplets were split on chip with one part going to the MS and the other through a delay line, to compensate for the measurement delay, and then to the sorting junction. In order to synchronize the MS signals and sorting events, the researchers employed an orthogonal population of marker droplets, containing only a dye that could be detected by MS and optically at the junction.
They used their system to screen for transaminase reactivity, for which fluorogenic substrates were unavailable. The throughput of MADS, however, is still very limited. At <1 Hz, the device could only screen 15,000 variants in a day. Integration of MADS with a spectrometer that has a faster scanning rate is a very promising development, as it yields higher-content information about product identity and concentration and has potential for multiplexing different analytes. Nevertheless, RADS and MADS have not yet been employed for directed evolution, and most other promising detection methods, including image-activated sorting [167, 168], fluorescence anisotropy [169], photothermal spectroscopy [170], and surface-enhanced Raman scattering [171], have not yet been employed for directed evolution or even droplet sorting. As a consequence, the only viable option for screening enzyme activity at ultra-high throughput while retaining the flexibility of plate assays currently remains FADS. As such, FADS is currently the best option for developing new industrially useful enzymes. To use a fluorescent readout for industrially interesting processes, enzymes can be evolved with model substrates and then further evolved with other assays [172], or by-products of reactions could be leveraged, as has been demonstrated in AADS [162] and for glucosidase FADS [157]. Most studies, however, focus on fluorogenic model reactions for FADS [147]. The development of label-free detection is therefore an important goal in microfluidics and droplet sorting [148, 173].
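The sorting rates quoted in this section can be put side by side with a short calculation. The Python sketch below is illustrative only. It assumes that every droplet carries exactly one variant and that the sorter runs without interruption at the quoted rate, neither of which holds in practice. The function name and the library size of 10⁷ variants are chosen here for illustration and do not come from the cited studies.

```python
# Illustrative comparison of the droplet sorting rates quoted in the text.
# Assumptions (not from the cited work): one variant per droplet and
# uninterrupted sorting at the nominal rate.

SORTING_RATES_HZ = {
    "FADS (upper limit)": 30_000.0,  # "up to 30 kHz"
    "AADS": 400.0,                   # "400 Hz"
    "RADS": 1.0,                     # "1 Hz"
}


def hours_to_screen(n_variants: float, rate_hz: float) -> float:
    """Return the hours needed to pass n_variants droplets at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return n_variants / rate_hz / 3600.0


if __name__ == "__main__":
    library_size = 1e7  # hypothetical library size
    for method, rate in SORTING_RATES_HZ.items():
        print(f"{method}: {hours_to_screen(library_size, rate):,.1f} h")
```

Under these assumptions, the same library takes roughly 0.1 h at 30 kHz, about 7 h at 400 Hz, and close to 2,800 h at 1 Hz, which is why the detection method largely dictates the library sizes that can be explored.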

1.2.2 Computational enzyme design

Another way to cope with the exploding number of biological sequences that need to be analyzed is to build computational models to screen enzyme sequences in silico. In theory, if a model is accurate enough, it can be used to improve enzymes or even create them de novo by predicting the fold and function of new amino acid sequences. These models can be roughly divided into structure-based and sequence-based models. In practice, combinations of these approaches are often applied. The most finely grained methods used to study enzyme catalysis computationally are based on quantum mechanics (QM) and molecular mechanics/molecular dynamics (MM/MD) [174]. In a single simulation, QM provides a very accurate description of ligands and active sites, whereas MM and MD are used to simulate the remainder of the enzyme. Although QM provides high accuracy, it is computationally too expensive to use for in silico screening and is generally used instead for mechanistic investigations. Improvements in algorithms and computational power will be needed to allow routine employment of QM and MM/MD for virtual enzyme optimization [175, 176]. More empirical, yet still fine-grained, methods such as the empirical valence bond (EVB) model have been successfully used to improve the design of Kemp eliminases [177]. Impressively, individual predictions of transition state stabilization are reportedly accurate to within 1 kcal/mol of the experimental values. Nevertheless, this study evaluated fewer than a hundred variants with the EVB model, suggesting that such computationally expensive methods are still only useful as a filter for higher-throughput approaches. Semi-empirical models use energy functions that contain physical terms, but can also contain terms without a direct physical basis that are derived statistically from the protein database or from physical knowledge.
The first method to solve the issue of having an accurate but computationally tractable energy function was Rosetta, arguably the most widely used structure-based algorithm at present [178]. For increased performance, Rosetta only considers interactions within a certain distance cutoff and performs optimization by Monte Carlo sampling [179]. Rosetta is powerful enough to design atomically accurate protein folds de novo that exhibit RMSD values of ∼1 Å between the design and the experimentally determined structure [180]. Rosetta has also been used for enzyme design. Initially, a transition state in complex with stabilizing active site residues is calculated, the so-called “theozyme” [181]. In accord with Pauling’s principles, the chosen interactions should stabilize the rate-determining transition state. This arrangement of active-site residues can then be matched into a library of protein backbone structures [182–184]. The target protein can be a naturally existing fold or a de novo designed structure. Lastly, the residues around the active site are screened in silico to optimize packing [183, 185]. This strategy can be highly multiplexed and individual designs then ranked according to their energy score [186]. Rosetta has been used to design Kemp eliminases [72], retro-aldolases [150, 187], esterases [182, 188], and Diels-Alderases [189]. This type of modeling increases the chances of finding a catalyst compared to a random library [190], although many highly ranked designs are typically inactive and the active variants are orders of magnitude slower than natural enzymes. This is probably due to loose binding and inaccurate positioning of the substrate relative to the active site residues, for which sub-Ångström precision is needed [175, 191, 192]. The redesign of active sites for new substrates or new reactions with Rosetta generally yields more active enzymes than de novo design [193–198].
More mutations can be evaluated in silico than is tractable for experimental screening and >10-fold improvements are not uncommon, an achievement that would typically need multiple rounds of directed evolution [199]. Li et al. showed that by simulating mutants of an aspartase with Rosetta, followed by a single round of experimental screening, they could identify a variant containing four mutations that performs the hydroamination of crotonic acid to (R)-β-aminobutanoic acid with 99% e.e. and 92% isolated yield on a kilogram scale [85]. Two more variants were engineered analogously to produce (R)-β-aminopentanoic acid and (S)-β-asparagine with complete stereocontrol. Most notably, the authors performed two rounds of computational screening for (S)-β-phenylalanine. The Rosetta prediction for the first round was refined by a molecular dynamics simulation, which was subsequently used as input for the second round. The resulting enzyme contained four mutations and was able to produce (S)-β-phenylalanine with complete stereocontrol, albeit in moderate yield. Strikingly, though, the enzyme was completely regioselective and did not produce any α-phenylalanine. Thus, computational modeling made it possible to engineer four different aspartase variants for different substrates. In silico screening reduced the total number of experimentally tested variants to 75, although this success likely reflects the fact that the native catalytic apparatus was retained. Further developments with respect to the accuracy of the energy functions, increasing computational power, growing structural databases, and knowledge about enzymology can be expected to allow routine design of efficient biocatalysts for many chemical transformations [200, 201].

For cases where no structural data are available, or large structural changes are expected, physics-based models might not be useful for protein engineering.
In such cases, sequence-driven statistical or machine learning models can be used to predict sequence-function relationships [202]. Statistical analysis of multiple sequence alignments (MSAs) of homologous enzymes can be used to guide library design and predict hot spots for mutation [203, 204]. Further, specific catalytic functions can be assigned [205, 206], mutational effects predicted [207–209], and new proteins designed [210, 211]. Most recently, a restricted Boltzmann machine was trained on an MSA of chorismate mutase enzymes and used to predict new sequences [212]. Active variants with <65% sequence identity, corresponding to 33 mutations, were found that exhibited activity matching their natural counterparts. A downside to sequence-based methods is that they cannot be used to guide engineering towards a certain specificity (e.g. substrate preference or enantioselectivity). However, they are interesting as generative models for highly active enzymes that might possess new promiscuous activities. To focus design towards a specific goal, models must use labeled data, i.e. data on sequence-function relationships.

Machine learning tools, such as partial least squares regression [213], Gaussian process models [110], and neural networks [214, 215], have been applied to protein engineering. For example, a range of different machine learning algorithms was used to tune the enantioselectivity of a nitric oxide dioxygenase that performs a non-natural carbene Si-H insertion [216]. In each round, only a small number of variants was screened to train multiple models and predict additional variants. These were ranked by activity and used to generate a second focused recombination library. The authors were able to show that their libraries contained a higher fraction of active variants and that engineering proceeded faster than with the experimental evolutionary strategy alone.
Besides these efforts, a lot of attention is currently directed towards emerging deep neural network models to encode and predict protein function [217, 218]. Although potentially powerful, such complex models need large datasets (>10,000 test examples) that are seldom available for an enzyme engineering project [202]. Additionally, the performance of machine learning models generally increases with more training data, regardless of the model type [219]. This is where the computational field and ultra-high throughput screening are probably going to meet in the future of enzyme design and engineering [220–222]. The combination has the potential to drive the development of increasingly sophisticated designer enzymes.
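The round-based workflow described in this section (screen a small set of variants, fit a model to the labeled data, rank untested variants, build the next library) can be illustrated with a deliberately simple stand-in. The Python sketch below is not the method of any cited study. It assumes a site-wise additive model in which each residue at each position contributes its mean observed activity, and all function names, residue choices, and activity values are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Toy illustration of model-guided ranking of untested variants.
# Assumption: a crude site-wise additive model, chosen only for clarity.


def fit_sitewise_means(training):
    """Average the measured activity for every (position, residue) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for variant, activity in training.items():
        for position, residue in enumerate(variant):
            sums[(position, residue)] += activity
            counts[(position, residue)] += 1
    site_means = {key: sums[key] / counts[key] for key in sums}
    global_mean = sum(training.values()) / len(training)
    return site_means, global_mean


def predict(variant, site_means, global_mean):
    """Score a variant as the mean of its site contributions.

    Residues never seen at a position fall back to the global mean.
    """
    contributions = [
        site_means.get((position, residue), global_mean)
        for position, residue in enumerate(variant)
    ]
    return sum(contributions) / len(contributions)


def rank_untested(training, residues_per_site, top_n):
    """Return the top_n untested combinations by predicted score."""
    site_means, global_mean = fit_sitewise_means(training)
    candidates = ("".join(combo) for combo in product(*residues_per_site))
    untested = [c for c in candidates if c not in training]
    untested.sort(
        key=lambda v: predict(v, site_means, global_mean), reverse=True
    )
    return untested[:top_n]


if __name__ == "__main__":
    # Hypothetical two-site library with made-up relative activities.
    screened = {"LY": 1.0, "FY": 3.0, "LI": 4.0, "AY": 0.5}
    print(rank_untested(screened, ["LFA", "YI"], top_n=2))
```

An additive model of this kind ignores the synergistic effects between residues that are emphasized earlier in this chapter, which is one reason why more expressive models, and correspondingly larger training sets, are of interest.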

1.3 Project aims

Biocatalysis promises to be a key player in the future development of environmentally benign and efficient chemical manufacture [223]. Despite some notable successes, the development of enzymatic conversions in industry remains the exception rather than the rule [92, 97]. The lack of available enzymes suitable for industrial processes requires time-consuming engineering of new variants by directed evolution, which is often a limiting factor in industrial biocatalysis [104]. Ultra-high throughput screening methods, such as fluorescence-activated droplet sorting (FADS), have sped up directed evolution [152], but most assays demand labeled substrates, rendering them impractical for changing an enzyme’s substrate selectivity [147]. Computational methods to create and improve enzymes exist [224, 225]. However, their predictive capability is often only powerful enough to guide library design or generate starting points, leaving most work again to experimental improvement [149, 216, 226].

In this thesis, we explore the use of ultra-high throughput screening as a tool for directed evolution. In Chapter 2, the development of a FADS assay for the evolution of oxidase enzymes is described. The peroxide side-product formed during regeneration of the flavin cofactor is used as a reagent in an enzymatic cascade with a fluorogenic dye. We apply this label-free ultra-high throughput screening method experimentally to the diversification of cyclohexylamine oxidase and show that variants can be isolated that convert non-natural substrates for which the wild-type enzyme shows little to no activity. Notably, the increased throughput of this assay allows the total redesign of the active site within one round of evolution, leading to entirely new substrate profiles and completely altered active site configurations.

In Chapter 3, we attempt to improve an artificial metalloesterase with microfluidic screening.
MID1 is a computationally designed zinc-binding protein that exhibits promiscuous ester cleavage activity [227, 228]. This enzyme was optimized over 10 rounds of conventional screening and mutagenesis in our lab to afford MID1sc10, a highly active mononuclear zinc-esterase [229]. Building on this work, we have created new libraries based on the evolved scaffold and developed a microfluidic screen to increase the turnover number of the catalyst. One round of directed evolution revealed an alternative active site configuration that could perform the reaction with a rate acceleration similar to that of MID1sc10. With mutational and structural data from this project, we hope to learn which residues and structural features could contribute to transition state stabilization as opposed to binding. To reach this goal, fine-tuning of the assay conditions, more exhaustive screening of the created library, and other mutational approaches will likely be required.

The results outlined in this thesis showcase the power and utility of ultra-high throughput screening in enzyme evolution, which allows us to evolve proteins with a range of starting activities into highly efficient artificial enzymes. The engineering of enzymes is often arduous due to the low throughput of most common enzymatic assays and the inaccuracy of computational design methods. We show that by applying FADS, engineering of synthetically valuable biocatalysts is possible in a single step of directed evolution.

Additionally, ultra-high throughput screening has been shown to be extremely effective in combination with computational design at creating nature-like enzymes de novo that mimic their natural counterparts. The assays shown here were applied to two different classes of enzymes to yield a palette of natural and computationally designed enzymes. Data generated from evolving enzymes can be used to improve computational design and its accuracy.
Combination of ultra-high throughput screening and computational enzyme design will make it easier to design enzymes and make their use more widespread.

Chapter 2 Evolving oxidases by ultra-high throughput screening

Author contributions for Chapter 2: The author was involved in all aspects of the project. Dr. Anthony P. Green and Dr. Richard Obexer initiated the project and assisted with the initial experiments. Dr. Moritz Pott assisted with microfluidic experiments and kinetic characterization of PT.1. Dr. Lukas Friedrich performed the molecular simulations of PT.1.

Parts of this chapter have been published in: Ultrahigh-throughput screening enables efficient single-round oxidase remodelling Aaron Debon, Moritz Pott, Richard Obexer, Anthony P. Green, Lukas Friedrich, Andrew D. Griffiths, Donald Hilvert, Nature Catalysis. 2, 740–747 (2019).

2.1 Introduction

Chiral amines play an indispensable role in the production of high-value chemicals. Because approximately forty percent of active pharmaceutical ingredients (APIs) depend on optically active amines as intermediates during production [230], asymmetric amine synthesis has been identified as one of the key challenges towards a greener and more sustainable chemical industry [223]. In recent years, biocatalysis has emerged as an attractive and competitive technology for chiral amine synthesis [81, 82, 86, 89], as enzymes display high levels of activity and stereocontrol, function under ambient conditions, and can, in principle, be readily adapted for industrial purposes using laboratory evolution. Indeed, engineered enzymes have been applied in the large-scale manufacture of several chiral APIs and pharmaceutical building blocks [92, 231]. Adopting these biocatalysts has decreased the number of synthetic steps required, circumvented the need for costly transition metal catalysts, reduced the demand for organic solvents, and delivered products with increased optical purity, thus alleviating the need for expensive downstream purification steps.

Amine oxidases have proven to be especially effective for the production of optically pure amines through deracemization, a process that combines oxidation catalyzed by an enantioselective enzyme and reduction by a non-selective chemical reducing agent (Fig. 2.1a) [232]. Cycles of selective oxidation and non-selective reduction lead to the accumulation of a single enantiomer in high yield. For example, monoamine oxidase from Aspergillus niger (MAO-N) and cyclohexylamine oxidase from Brevibacterium oxydans (CHAO) have been engineered to deracemize chiral auxiliaries [232] as well as secondary [233] and tertiary [234] amine-containing APIs and pharmaceutical building blocks [83]. Engineered amine oxidases have also been exploited in oxidative desymmetrization processes (Fig. 2.1b) [231].
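The way repeated cycles of selective oxidation and non-selective reduction enrich one enantiomer can be made concrete with an idealized model. The Python sketch below assumes a perfectly enantioselective oxidase that converts all of the reactive enantiomer to the imine in every cycle, and a reductant that returns the imine as a 1:1 mixture of both enantiomers. Real processes run both steps concurrently and with finite selectivity, so the function and its name are illustrative only.

```python
def deracemize(cycles: int):
    """Simulate idealized deracemization starting from a racemate.

    Assumptions: the enzyme oxidizes only the reactive enantiomer and does
    so completely; the chemical reductant is fully non-selective.
    Returns (unreactive_fraction, reactive_fraction, enantiomeric_excess).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    unreactive, reactive = 0.5, 0.5  # racemic starting material
    for _ in range(cycles):
        imine = reactive        # selective oxidation of one enantiomer
        reactive = 0.0
        unreactive += imine / 2.0  # non-selective reduction gives a
        reactive += imine / 2.0    # 1:1 mixture of both enantiomers
    return unreactive, reactive, unreactive - reactive


if __name__ == "__main__":
    for n in (0, 1, 3, 7):
        _, _, ee = deracemize(n)
        print(f"{n} cycles: e.e. = {ee:.1%}")
```

In this model the enantiomeric excess after n cycles is 1 − 0.5ⁿ, so seven ideal cycles already exceed 99% e.e. while, in contrast to a kinetic resolution, all of the starting material ends up as the desired enantiomer.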
Figure 2.1: Deracemization (a) and desymmetrization (b) reactions using monoamine oxidase enzymes, molecular oxygen and, in the former case, a non-specific reducing agent.

MAO-N has been extensively engineered by the group of Nick Turner [22]. While the wild-type enzyme only showed activity towards amines with a single substituent at the α-carbon, initial rounds of evolution led to the identification of a triple mutant (MAO-N D3) which oxidized simple primary benzyl amines possessing small α-substituents [232]. Two additional mutations afforded a variant (MAO-N D5) that efficiently oxidizes cyclic secondary amines such as 1-methyl tetrahydroisoquinoline and nicotine [235]. Subsequent structure-guided engineering led to multiple variants capable of synthesizing alkaloids (MAO-N D5) [236], tetrahydro-β-carbolines (MAO-N D9) [234] and bulky APIs such as 1-phenyl tetrahydroisoquinoline (MAO-N D11) [83]. The evolution of MAO-N by Merck and Codexis to desymmetrize a precursor of the anti-hepatitis C drug boceprevir represents a particularly impressive example of oxidase engineering [231]. Optimization over several rounds of mutagenesis and DNA shuffling with other oxidase homologs introduced 65 mutations into the wild-type enzyme. Although a high-throughput robotic system capable of screening up to 100,000 variants a day was employed, the effort to obtain a suitable MAO-N variant was considerable. Nevertheless, the resulting catalyst drastically reduced the environmental impact of this production pipeline. MAO-N’s close homolog CHAO from Brevibacterium oxydans proved more difficult to engineer [237, 238], and only a few variants with altered substrate specificity have been created. Rational mutational screening of active site residues revealed variants capable of producing several 2-substituted 1,2,3,4-tetrahydroquinolines [233, 238], aminotetralines, and a range of different primary amines [239]. However, improvements and activities of engineered variants were generally fairly low, with reported kcat/KM values almost never exceeding 1,000 M⁻¹ s⁻¹.

For comparison, wild-type CHAO oxidizes cyclohexylamine with a kcat/KM of 10,000 M⁻¹ s⁻¹ [239].

Despite some notable successes, adapting oxidases to perform new tasks remains a challenging, time-consuming endeavor that restricts the potential impact of biocatalysis on many industrial processes. While powerful, directed evolution experiments are generally time consuming (months to years) and cost intensive (man hours, consumables, equipment) [116], rendering them incompatible with the fast pace of business and preventing broader implementation of biocatalysis [104]. The production of boceprevir using MAO-N is a case in point, as sales of the blockbuster drug were discontinued just four years after its approval in favor of an alternative, improved medication. To increase the viability of biocatalysis, methods that accelerate directed evolution are urgently needed [56, 115, 147].

Amine oxidases can be assayed by coupling the universal hydrogen peroxide by-product formed during cofactor recycling to a reaction that generates a dye (Fig. 2.2a). This approach provides a versatile method for screening in microtiter plates [240], on agar plates [232], and in cells [128]. Nevertheless, engineering MAO-N for the boceprevir manufacturing process required the introduction of 65 mutations. Even with dedicated infrastructure for automated and high-throughput enzyme engineering, biocatalyst optimization was arduous, with most rounds contributing only single-digit improvements in activity.

Enzymatic assays using monodisperse emulsions generated on microfluidic chips have emerged as a powerful ultrahigh-throughput method in the search for novel biocatalysts [147]. This approach utilizes pL to nL volume aqueous droplets, stabilized by surfactant, in an inert, fluorinated oil, as independent microreactors that contain single cells (Fig. 2.2b), akin to wells in a microtiter plate [241].
On-chip microfluidic modules allow multiple operations, including droplet production, splitting, fusion, reagent addition, incubation, fluorescence detection, and fluorescence-activated droplet sorting (FADS), all at kHz frequencies (Fig. 2.2c) [130, 173]. To screen for novel enzymes, single cells that express individual variants are typically encapsulated in droplets containing a fluorogenic substrate. Active enzymes can be displayed on the cell surface [116], secreted [144, 145], or released by cell lysis [146]. Compartmentalization thus links genotype and the fluorescent product, enabling enrichment of rare but active variants from large libraries by sorting according to activity (Fig. 2.2c). FADS increases the screening power dramatically compared to low- or medium-throughput techniques. Approximately 10⁷ variants can be assayed per experiment in a sensitive manner, reducing the expenditure of time and cost in consumables by several orders of magnitude [116].

Here, we describe a label-free, ultrahigh-throughput, FADS-based coupled assay for oxidase engineering and its application to rapid remodeling of the CHAO active site for industrially relevant substrates for which the original enzyme showed only low activity. We were able to generate an enzyme with an increase in catalytic efficiency of three orders of magnitude and wild-type levels of activity on the non-natural substrate in a single round of directed evolution.

Figure 2.2: Detection strategy and fluorescence-activated droplet sorting (FADS) of CHAO. a CHAO converts amines to the corresponding imine, reducing one equivalent of flavin adenine dinucleotide (FAD). Oxygen-dependent cofactor recycling produces equimolar amounts of hydrogen peroxide, which is detected by downstream oxidation of a fluorogenic substrate by horseradish peroxidase (HRP). b Surfactant-stabilized droplets act as a diffusion barrier for catalyst-encoding DNA and the fluorescent reporter generated in the coupled enzymatic reactions catalyzed by CHAO and HRP. c Schematic representation of a typical microfluidic chip used in this study. Single cells are co-encapsulated with lysis agents, substrate, and the detection cascade in droplets dispersed in fluorinated oil. The cells are lysed in the droplets, releasing the enzyme, and, after incubation, droplet fluorescence is analyzed. Droplets with fluorescence exceeding a defined threshold are sorted via dielectrophoresis and collected. The DNA of active clones is pooled, recovered by PCR, and subcloned for the next round of sorting and analysis.

2.2 Results

2.2.1 Initial assay development

Several parameters need to be optimal for such a coupled assay to be successful for CHAO evolution in a FADS format. High transformation efficiencies are a prerequisite for the construction of large libraries, expression has to be stable and potentially tunable, and the product must be detectable at low concentrations. To develop a suitable expression system for CHAO, several plasmid constructs were evaluated. Experiments to assess gene expression regulated by the T7 promoter showed significant plasmid instability in E. coli BL21 (DE3). Specifically, 27% of the transformants did not express the enzyme and preliminary sorting experiments largely enriched for mutants exhibiting lower toxicity to the cells. The need for an expression system with low basal transcription led us to create pKTNTET_6His_CHAO, a 4.2 kb medium-to-high copy plasmid that features a T7 and a tetracycline promoter. For sorting, the strategy was to use the XL1-Blue cell line, which lacks the T7 RNA polymerase, giving us tight control over expression levels with TetR. Additionally, XL1-Blue is easy to transform, regularly yielding >10⁸ CFUs per transformation, obviating the need for a second transformation step into a different expression strain. To over-produce and purify improved variants, BL21 (DE3)/pLysS was used. Its basal expression is downregulated by T7 lysozyme and it is higher yielding than XL1-Blue, but it rarely produces more than 10⁵ CFUs per transformation. Both plasmid and strain combinations showed a plasmid loss rate of <1%, making them ideal for screening and expression.

The envisioned assay works by detecting the hydrogen peroxide by-product of the oxidase reaction in a cascade reaction catalyzed by HRP that generates a fluorescent signal (Fig. 2.2a). The amine substrate, the fluorogenic dye, HRP, and lysis agents will be co-encapsulated with single cells expressing CHAO. Lysis will release the oxidase into solution and initiate H2O2 formation.
We initially evaluated a resorufin-based fluorogenic probe as a detection dye. Testing our inverted microscope set-up with 20 pL droplets containing different concentrations of resorufin showed that the limit of detection is in the 50 nM range (Fig. 2.4). However, rapid leakage of resorufin into the oil phase was detected, which would lead to loss of signal in the FADS assay. Thus, we decided to monitor the reaction using the resorufin-based dye Amplex UltraRed. This reporter dye is a modified resorufin derivative that has increased solubility. This property minimizes loss of signal by diffusion while retaining the spectral properties of the original dye, which make it ideal for detection in complex biological samples.

Figure 2.3: General FADS workflow. Cycles of cloning, encapsulation, and sorting are carried out to enrich active variants in the population. The pool can be analyzed and the screen validated using different methods.

[Plot for Figure 2.4: signal intensity (RFU) versus resorufin concentration (µM), recorded at laser powers of 36 mW and 18 mW.]

Figure 2.4: Photomultiplier signal against resorufin concentration. Different concentrations were tested at two laser amplitudes, measuring at 609 nm with 36 mW (blue, circles) and 18 mW (red, triangles). The signal could not be improved by increasing the laser intensity at 50 nM resorufin.

2.2.2 Black and white sorting with cyclohexylamine

To validate the capability of our system to sort active variants from inactive ones, a black and white sort was carried out. An otherwise identical plasmid, pKTNTET_6His_RA95.5-8f, containing the gene for the retroaldolase RA95.5-8F was created and transformed into XL1-Blue as a negative control for oxidase activity. CHAO and RA95.5-8F constructs were then expressed separately and aliquots of both cultures were washed to remove dead and partially lysed cells. A 1:1,000 mixture of CHAO- to RA95.5-8F-expressing cells was then dispersed on a microfluidic chip (Methods, Fig. 5.1) and combined with cyclohexylamine, lysis agents, and the detection reagents in 20 pL droplets. The emulsion was collected off-chip and, after an hour of incubation, sorted based on activity (Fig. 2.5a). The CHAO and RA95.5-8F genes of the unsorted and sorted mixtures were amplified by PCR and analyzed using agarose gel electrophoresis. An increased amount of chao PCR product was observed after sorting, indicating that we could indeed enrich the DNA of the positive control relative to the negative control (Fig. 2.5b).
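How much enrichment a black and white sort of this kind could deliver in principle can be estimated from encapsulation statistics. The Python sketch below assumes that cells are distributed over droplets according to a Poisson distribution and that sorting is perfect, meaning that every droplet containing at least one active cell is collected and no other droplet is. The mean of 0.25 cells per droplet is a hypothetical value chosen for illustration, and the function names are not from the original work.

```python
import math


def positives_per_droplet(initial_ratio: float, cells_per_droplet: float) -> float:
    """Mean number of active cells per droplet.

    initial_ratio is active:inactive (e.g. 1/1000); cells_per_droplet is the
    mean of the assumed Poisson distribution.
    """
    return cells_per_droplet * initial_ratio / (1.0 + initial_ratio)


def fraction_positive_droplets(initial_ratio: float, cells_per_droplet: float) -> float:
    """Fraction of droplets containing at least one active cell."""
    lam = positives_per_droplet(initial_ratio, cells_per_droplet)
    return -math.expm1(-lam)  # 1 - exp(-lam), numerically stable


def max_enrichment(initial_ratio: float, cells_per_droplet: float) -> float:
    """Upper bound on the fold change in the active:inactive ratio.

    Inactive cells are only carried along when they share a droplet with
    an active cell, which gives 1 / (1 - exp(-lam)).
    """
    return 1.0 / fraction_positive_droplets(initial_ratio, cells_per_droplet)


if __name__ == "__main__":
    ratio, occupancy = 1 / 1000, 0.25  # occupancy is a hypothetical value
    print(f"positive droplets: {fraction_positive_droplets(ratio, occupancy):.4%}")
    print(f"maximum enrichment: {max_enrichment(ratio, occupancy):,.0f}-fold")
```

Under these assumptions, about 0.025% of droplets contain an active cell and the theoretical enrichment is roughly 4,000-fold. Values observed in practice are lower because of false-positive sorting events and threshold settings.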

Figure 2.5: Black and white sorting of pKTNTET_6His_CHAO expressed in XL1-Blue under tetracycline control. a Initially, cells expressing CHAO and RA95.5-8F were mixed in a 1:1,000 ratio and then the 0.07% most fluorescent droplets were sorted. b Sorting enriched the amount of CHAO DNA after PCR amplification.

2.2.3 Initial FADS experiments

Focused library creation

To test the ability of our sorting assay to perform directed evolution experiments, an initial library was created by randomizing four residues simultaneously. Based on previous rational engineering efforts described in the literature [239], four residues were targeted for mutation with NNK codons. Leu199, Tyr321, and Leu353 were targeted as they constitute part of the active site (Fig. 2.6a), whereas Met226 was targeted based on its proposed role in substrate uptake and product release [242]. The introduction of four NNK codons creates a library containing approximately 10⁶ members, a pool that is intractable for any plate-based assay. Conversely, three-fold oversampling of a million-member library, which gives a probability of about 95% that any given variant has been sampled [108], can easily be achieved with our FADS assay within 3-4 h by screening droplets at a rate of 1 kHz with a cell occupancy of 20-30%.
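The library size, coverage, and screening time quoted for this library follow from a few lines of arithmetic. The Python sketch below assumes that all NNK codon combinations are equally represented, that clones are sampled independently, and that every occupied droplet contains a single cell. The function names are illustrative.

```python
import math

# An NNK codon has 4 x 4 x 2 = 32 possibilities (N = A/C/G/T, K = G/T).
NNK_CODONS = 4 * 4 * 2


def library_size(n_sites: int) -> int:
    """Number of distinct DNA sequences for n_sites NNK codons."""
    return NNK_CODONS ** n_sites


def coverage_probability(oversampling: float) -> float:
    """Probability that a given variant is sampled at least once.

    Assumes equiprobable variants and independent sampling, so that the
    chance of missing a variant is exp(-oversampling).
    """
    return 1.0 - math.exp(-oversampling)


def screening_hours(n_sites: int, oversampling: float,
                    droplet_rate_hz: float, occupancy: float) -> float:
    """Hours needed to screen oversampling x library_size cells.

    occupancy is the fraction of droplets that contain a cell.
    """
    cells_needed = oversampling * library_size(n_sites)
    droplets_needed = cells_needed / occupancy
    return droplets_needed / droplet_rate_hz / 3600.0


if __name__ == "__main__":
    print(f"library size: {library_size(4):,}")
    print(f"coverage at 3x oversampling: {coverage_probability(3):.1%}")
    for occ in (0.2, 0.3):
        hours = screening_hours(4, 3, 1000.0, occ)
        print(f"occupancy {occ:.0%}: {hours:.1f} h")
```

Four NNK codons give 32⁴ = 1,048,576 DNA sequences, three-fold oversampling corresponds to about 95% coverage, and the screening time at 1 kHz comes to roughly 2.9 h at 30% occupancy and 4.4 h at 20% occupancy, in line with the 3-4 h stated in the text.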

Figure 2.6: Design of the focused library and analysis of the initial FADS experiment with (S)-1-phenylethylamine. a Four residues chosen for simultaneous mutation are shown as blue spheres. Flavin adenine dinucleotide (FAD) and the co-crystallized cyclohexanone are shown in yellow. b A histogram of the activity distribution of randomly sampled variants from the unsorted library, as well as after one and two rounds of fluorescence-activated droplet sorting.

Screening for (S)-1-phenylethylamine activity

As an initial test substrate (S)-1-phenylethylamine was chosen. It is converted by wild-type CHAO, but the starting activity is only approximately 30% that of cyclohexylamine. Two rounds of sorting were carried out using a chip that contains a delay line, a long channel in which the droplets are incubated for 45 min on-chip (Methods, Fig. 5.2). A large fraction of the droplets were active (0.3%), likely due to the high starting activity for (S)-1-phenylethylamine. The unsorted and sorted pools were analyzed using a 96-well plate assay. No variant from the initial library was found to be as active as wild-type CHAO. However, after two rounds of FADS, a dozen improved variants displaying similarly high activities were discovered (Fig. 2.6b). The best variant, a double mutant containing the M226T and Y321I substitutions, was purified and kinetically characterized. It shows an almost 10-fold increase in kcat/KM (9,400 M⁻¹ s⁻¹) compared to wild-type CHAO (1,100 M⁻¹ s⁻¹). These data gave us confidence that the assay was capable of detecting and sorting highly active oxidases. We therefore advanced to the testing of more challenging substrates with lower starting activities.

Screening for 1-methyl-1,2,3,4-tetrahydroisoquinoline activity

To gauge the utility of the FADS assay and our library design when challenged with more difficult substrates, we decided to evolve CHAO for 1,2,3,4-tetrahydroisoquinoline (TIQ) scaffolds. 1-Substituted TIQs represent a common drug motif and are often found in complex alkaloids and synthetic drugs [243]. A key example is the blockbuster drug solifenacin, which contains an (S)-1-phenyl-1,2,3,4-tetrahydroisoquinoline ((S)-PheTIQ) moiety that was previously deracemized with MAO-N D11 [83]. In contrast to the evolved MAO-N variants, CHAO has no reported activity with such bulky TIQ derivatives. A Y321I mutant has been isolated and shown to possess very low activity towards 1-methyl TIQ [244]. In a plate assay, wild-type CHAO only shows trace activity with PheTIQ and MeTIQ. The starting activity for TIQ is 0.04% that of cyclohexylamine. This led us to sort emulsions based on TIQ activity to find a starting point for further evolution towards bulkier TIQ derivatives. The emulsion was stored off-chip to allow for longer incubation times and re-injected onto a sorting device (Methods, Fig. 5.1) after 1-2 h reaction at room temperature. Only an extremely small fraction of droplets, i.e. <1:10,000, showed any conversion (Fig. 2.7a). Since only 300 droplets could be isolated, a second round of sorting was not performed. Analysis by lysate-based assays showed visibly low activity and only a few variants managed to convert enough substrate to be detected by a spectrophotometer (Fig. 2.7b). The most active variant, T.1, had the previously observed M226T mutation in the substrate diffusion channel (Fig. 2.7c). Additionally, it contains the counterintuitive L199F mutation, which presumably would make the active site smaller. The mutation Y321L identified in the previous sorts may make space for the larger substrate. Because it was most active, this variant was considered for purification and characterization.
The T.2 variant is similar in terms of its sequence. T.3 had a mutation of Tyr to Glu, placing a charged residue in a hydrophobic region, which would be expected to disfavor substrate binding and hence catalysis. We were specifically interested to know whether we had managed to create a starting point to evolve CHAO for chiral 1-substituted TIQs. Wild-type CHAO and T.1 were therefore purified and characterized for rac-1-MeTIQ oxidation. At relatively high enzyme concentrations, wild-type CHAO gave a kcat/KM of 0.5 M⁻¹ s⁻¹ for this substrate, whereas T.1 performed the reaction with a catalytic efficiency of 25 M⁻¹ s⁻¹. Nevertheless, the kcat value was merely 1 min⁻¹. Even though a 50-fold improvement is an order of magnitude higher than what is typically expected in a single step of directed evolution, T.1 is far from being a proficient enzyme. Increasing its efficiency would clearly require multiple rounds of mutagenesis and screening. We speculated that: a) our library design and residue choice did not encode many active variants for TIQ scaffolds, and b) the design could be improved by removing undesired mutations, such as Y321E, which was found in this sort.

Figure 2.7: Sorting of the focused library for activity against tetrahydroisoquinoline. a During the sorting procedure only a very low fraction of active droplets was observed and, as a consequence, only 300 droplets could be collected. b The plate assays revealed only three variants with significant activity. The most active variant produced small but visible amounts of reporter dye. c Three variants showed increased initial velocities compared to wild-type CHAO.

2.2.4 Single-step creation of a proficient oxidase

Library design

Lack of knowledge of which residues to target, what substitutions to allow, and the combinatorial explosion of randomizing multiple residues simultaneously make the design of successful gene libraries challenging. Libraries can be enriched in functional variants by eliminating residues that are biophysically irrelevant. In the case of CHAO, the substrate binds adjacent to the FAD cofactor in a very hydrophobic pocket that is shielded from the outside by Met226, among other residues (Fig. 2.8a). We analyzed the side chains closest to the co-crystallized cyclohexanone and made a rational model of the substrate's binding mode (Fig. 2.8b). CHAO converts small (S)-configured 1-phenylamines with high turnover. We assumed that their phenyl moiety binds facing the exit channel in the wild-type structure, because other binding modes would lead to steric clashes. Additionally, in light of the Y321I and Y321L mutations found previously for amines possessing large α-substituents, we expected steric clashes in that region. Thus, we set out to design a library that reduces the number of hydrophilic residues in the active-site pocket and allows us to target as many of the 11 residues that make up the active site as possible without becoming intractable for FADS. With the goal of retaining the overall active-site polarity but varying its shape and steric properties, two degenerate codons were designed, each encoding six amino acids that encompass hydrophobic residues of different sizes while maintaining the possibility of hydrogen bonding. The DYT degenerate codon encodes Ala, Ser, Thr, Val, Ile and Phe. For the Pro422 position, BYT, which codes for Ala, Ser, Pro, Val, Leu and Phe, was chosen to allow for the possibility of retaining the evolutionarily strongly conserved proline residue (Supplementary table B.4). The final library thus contained the BYT codon for Pro422 plus seven DYT codons for Thr198, Leu199, Met226, Tyr321, Leu353, Phe351, and Phe368. It was constructed by overlap-extension PCR using primers containing the degenerate codons. Both mutagenic codons feature only a single codon per amino acid and exclude stop codons (Supplementary table B.1 & B.2). Therefore, up to eight residues can be targeted simultaneously, leading to a library with 1.7×10⁶ members, which can be screened by FADS in <6 h.
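The amino acid sets encoded by the degenerate codons, and the resulting library size, can be verified by expanding each codon against the standard genetic code. The following is a minimal sketch for illustration, not the procedure used in this work; the helper name `expand` and the compact codon table layout are choices made here.

```python
from itertools import product

# IUPAC nucleotide codes needed for the codons discussed in the text
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "D": "AGT", "B": "CGT", "Y": "CT",
}

# Standard genetic code, with bases ordered T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate_codon):
    """Map each concrete codon of a degenerate codon to its amino acid ('*' = stop)."""
    options = [IUPAC[base] for base in degenerate_codon]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*options)}

for name in ("NNK", "DYT", "BYT"):
    codons = expand(name)
    residues = "".join(sorted(set(codons.values())))
    print(name, len(codons), "codons ->", residues)

# Seven DYT positions plus one BYT position, as described in the text
library_size = len(expand("DYT")) ** 7 * len(expand("BYT"))
print("library size:", library_size)
```

Both DYT and BYT expand to six codons for six different amino acids with no stop codon, so the DNA-level and protein-level diversity coincide at 6⁸ ≈ 1.7×10⁶. NNK, in contrast, spends 32 codons on 20 amino acids and one stop codon.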

Figure 2.8: Active site of CHAO (PDB ID: 4i59). a The ten active-site residues within 5.5 Å of the co-crystallized cyclohexanone ligand (yellow), Met226 in the diffusion channel, and FAD (green) are shown as sticks. The seven active-site residues targeted for mutation (Thr198, Leu199, Tyr321, Phe351, Leu353, Phe368 and Pro422) are highlighted in orange. b Rationalization of wild-type CHAO with the substrate (R)-PheTIQ in a hypothetical binding mode. The entrance channel is indicated with arrows, with Met226 at the outside. The readily converted (S)-1-phenylethylamine is drawn in black, whereas the moieties that are added in (R)-PheTIQ are shown in red.

Discovery of PT.1

XL1-Blue cells were transformed with the library and the variants expressed with the goal of screening for variants that could convert 1-PheTIQ. Single E. coli cells were encapsulated and lysed in 15 pL droplets where they were challenged with 4 mM of rac-PheTIQ. The first two sorts were carried out with off-chip incubation for 1-2 h before sorting. Subcloning and analysis of individual variants of the sorted and unsorted pools with a lysate-based microtiter plate assay found clones with up to 150-fold increase in activity after sorting, but the sequences showed little convergence (except for the P422S mutation) and the activity distribution indicated almost no enrichment. Given the large improvements but poor enrichment, we conducted a third, more stringent sort. For higher evolutionary pressure, the droplets were incubated for 15 minutes using an on-chip incubation channel (Methods, Fig. 5.2b). The plasmids extracted from the sorted droplets were used directly to transform XL1-Blue cells and the transformants analyzed in a microtiter plate assay. This time, many more active variants were detected, with 43% displaying improved activity as opposed to 3% before sorting. A 78-fold improvement in average substrate conversion rate was calculated over all variants that were sampled. The three most active variants showed a >400-fold increase in activity (Supplementary figure A.2). The latter variants all had the same coding mutations, but judging from different silent mutations acquired during PCR in the previous rounds of subcloning, they stemmed from individual sorting events. This catalyst, called PT.1 (Fig. 2.9a), has five mutations (L199V, M226T, Y321S, L353I, and P422S).

Figure 2.9: PT.1 model and characterization. a The mutated active-site residues of PT.1 are indicated as spheres, FAD can be seen in green, and the docked (R)-PheTIQ in yellow. b The catalytic efficiency of PT.1 and wild-type CHAO against both enantiomers of PheTIQ. The improvement in efficiency is almost three orders of magnitude for the (R)-enantiomer, whereas the (S)-enantiomer is converted at very low rates, even undetectable for the wild-type enzyme.

Kinetic characterization of PT.1

PT.1 was purified and characterized alongside wild-type CHAO for the oxidation of both PheTIQ enantiomers (Fig. 2.9b). PT.1 is a highly active catalyst, showing a 960-fold improved kcat/KM of 9,400 M⁻¹ s⁻¹ with (R)-PheTIQ compared to wild-type CHAO with the same substrate (Table 2.1, Supplementary figure A.3). The main contributor to this improvement is a 340-fold increase in the catalytic rate constant (kcat = 3.2 s⁻¹). This is potentially due to screening with substrate concentrations well above the KM value for (R)-PheTIQ with wild-type CHAO (4 mM vs. 1 mM). The catalytic efficiency compares favorably with the reported kcat/KM of the wild-type enzyme with its native substrate cyclohexylamine (10,630 M⁻¹ s⁻¹) [239]. Additionally, PT.1 is extremely selective, preferring the (R)-enantiomer with a selectivity factor S = (kcat/KM)R/(kcat/KM)S = 4,200. The specificity is in the range of naturally occurring D-amino acid oxidases, which exhibit S values of 1,000-20,000 depending on the substrate [27]. Generally, however, a selectivity factor of >50 is considered suitable for industrial processes.

Analysis of the substrate scope of PT.1 revealed a dramatic change in substrate acceptance (Fig. 2.10). Whereas the wild-type enzyme prefers primary amines like cyclohexylamine and the (S)-configured α-benzylamines 2 and 3, PT.1 is almost inactive towards these substrates. Instead, it converts (R)-configured α-benzylamines with sterically demanding minor substituents (e.g. 4 and 8) and bulky secondary amines like 6 and 7.

Table 2.1: Catalytic parameters for wild-type CHAO and the evolved 1-PheTIQ variant. The errors represent the s.d. derived from three individual measurements of separately purified protein batches. N.A. indicates that no activity was detectable above background.

Enzyme  Substrate       kcat (s⁻¹)         KM (µM)      kcat/KM (M⁻¹ s⁻¹)
WT      (R)-1-PheTIQ    0.0095 ± 0.0004    973 ± 229    10 ± 1
        (S)-1-PheTIQ    –                  –            N.A.
PT.1    (R)-1-PheTIQ    3.2 ± 0.1          343 ± 39     9400 ± 1100
        (S)-1-PheTIQ    –                  –            2.2 ± 0.2
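The fold improvements and the selectivity factor discussed in the text follow directly from the entries of Table 2.1. The short calculation below uses the rounded table values, so the ratios differ slightly from the quoted figures (for example 940-fold rather than 960-fold), which were presumably derived from unrounded data. The variable names are illustrative.

```python
# Values copied from Table 2.1 (kcat in s^-1, kcat/KM in M^-1 s^-1)
wt_kcat_R = 0.0095    # wild-type CHAO, (R)-1-PheTIQ
pt1_kcat_R = 3.2      # PT.1, (R)-1-PheTIQ
wt_eff_R = 10         # wild-type kcat/KM, (R)-1-PheTIQ
pt1_eff_R = 9400      # PT.1 kcat/KM, (R)-1-PheTIQ
pt1_eff_S = 2.2       # PT.1 kcat/KM, (S)-1-PheTIQ

# Fold changes of PT.1 over wild type for the (R)-enantiomer
kcat_gain = pt1_kcat_R / wt_kcat_R
efficiency_gain = pt1_eff_R / wt_eff_R

# Selectivity factor S = (kcat/KM)_R / (kcat/KM)_S for PT.1
selectivity = pt1_eff_R / pt1_eff_S

print("kcat gain:", round(kcat_gain))
print("kcat/KM gain:", round(efficiency_gain))
print("selectivity factor S:", round(selectivity))
```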

Figure 2.10: Substrate scope of PT.1 compared to wild-type (WT) CHAO. The wild-type enzyme preferentially reacts with small (S)-configured amines and has very low activity towards bulkier substrates. Conversely, PT.1 oxidizes bulky (R)-configured amines with minor α-substituents larger than a methyl group. Reactions with 2 mM substrate and varying enzyme concentration were monitored by observing the HRP-catalyzed reaction between the hydrogen peroxide by-product, 4-aminoantipyrine, and vanillic acid at 498 nm.

Biocatalytic transformations

Purified PT.1 and wild-type CHAO were compared for their efficacy in producing the optically pure (S)-PheTIQ solifenacin precursor by deracemization of rac-PheTIQ (Figure 2.11). The processes were initiated by adding purified oxidase to a buffered solution of the racemic substrate and 4 eq. of ammonia borane complex as reducing agent. Once the optical purity converged, the product was extracted using organic solvent. At 0.05% catalyst loading, PT.1 achieved a 99% e.e. for the desired (S)-enantiomer within 9 h, whereas the wild-type enzyme did not enrich either enantiomer over two days under the same conditions (Fig. 2.11, 2.12). Analogous results were obtained when the reaction with (rac)-7 was carried out with whole cells producing the PT.1 variant. The latter conditions were therefore used in deracemization experiments with the other tetrahydroisoquinoline derivatives shown in Figure 2.10 (Tab. 2.2). Compound 6, which has an ethyl rather than a phenyl substituent, afforded (S)-EthylTIQ with an e.e. of 98% after 36 h. In contrast, little enantiomeric enrichment (e.e. 5%) was observed for compound 5 with the even smaller methyl substituent, reflecting both its modest steric demands and substantially lower activity with the enzyme.
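To illustrate why repeated oxidation and reduction drives the mixture towards a single enantiomer, the sketch below uses a deliberately idealized cycle model. Its assumptions are not taken from this work: in each cycle all remaining (R)-amine is oxidized, the (S)-amine is left untouched, and the achiral reductant returns the imine as a 1:1 mixture of enantiomers. Real reactions run continuously and with finite selectivity, so the cycle count is only a rough guide. The catalyst loading is computed from the conditions given in Table 2.2.

```python
def ee_after_cycles(n_cycles):
    """Enantiomeric excess of the (S)-amine after n idealized cycles.

    Assumptions: complete oxidation of the (R)-amine per cycle, no
    oxidation of the (S)-amine, and non-selective reduction of the imine.
    """
    r = 0.5  # mole fraction of (R)-amine in the racemate
    s = 0.5  # mole fraction of (S)-amine in the racemate
    for _ in range(n_cycles):
        oxidized = r          # all (R)-amine becomes imine
        r = oxidized / 2      # half of the imine returns as (R)
        s = s + oxidized / 2  # the other half accumulates as (S)
    return (s - r) / (s + r)

def cycles_needed(target_ee):
    """Smallest number of idealized cycles that reaches the target e.e."""
    n = 0
    while ee_after_cycles(n) < target_ee:
        n += 1
    return n

# Catalyst loading from Table 2.2: 5 µM enzyme, 10 mM substrate
enzyme_uM = 5
substrate_uM = 10_000
loading_percent = 100 * enzyme_uM / substrate_uM

print("cycles for 99% e.e.:", cycles_needed(0.99))
print("catalyst loading (mol%):", loading_percent)
```

In this model the (R)-fraction halves with every cycle, so the e.e. approaches 100% geometrically and a handful of full turnovers of the (R)-enantiomer suffices.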

Figure 2.11: The isolated variant, PT.1, was used to deracemize PheTIQ in combination with ammonia borane. Deracemizations mediated by PT.1 resulted in accumulation of (S)-PheTIQ (>99% e.e.) over the course of several hours, whereas analogous reactions with wild-type CHAO gave no enrichment in optical purity.

Attempts to deracemize primary amines like 2, 3, 4 and 8 were complicated by the competitive hydrolysis of the iminium ion product of the enzymatic reaction before it can be reduced by the ammonia borane, leading to accumulation of unwanted ketone and/or alcohol side products. Nevertheless, compound 4, which has a phenyl and a propyl group that are readily distinguishable and sufficiently large to bind productively to the enlarged PT.1 pocket, gave (S)-1-phenylbutylamine in 35% yield and 50% e.e. after 48 h reaction. In contrast, no enantiomeric enrichment was observed for 3 or 8. Although kinetic measurements with the individual enantiomers of 3 indicate an S value of 17 (Fig. 2.10), this compound is a relatively poor substrate for the enzyme, making preparative deracemization impractical over a reasonable time frame. Although 8 is a good substrate for the enzyme, the remodeled PT.1 binding pocket is apparently unable to distinguish between the phenyl and p-chlorophenyl substituents.

Figure 2.12: Deracemization reactions performed with PT.1. Chiral HPLC analysis of the products of the reactions with a, rac-4 (48 h, after acid-base extraction; 50% e.e. for (S)-4); b, rac-5 (36 h; 5% e.e. for (S)-5); c, rac-6 (36 h; 98% e.e. for (S)-6); and d, rac-7 (9 h; 99% e.e. for (S)-7). Except for the reaction with rac-7, which was performed with purified enzyme, the reactions were carried out with whole cells producing PT.1. If the individual enantiomers were not commercially available, the peaks were assigned according to literature values [83, 245, 246]. The asterisk indicates the injection peak from the tert-butyl methyl ether.

Computational modeling

To gain insight into the molecular basis of the switch in substrate specificity, we performed docking experiments on wild-type CHAO (PDB ID: 4i59) and an in silico model of the optimized PT.1 catalyst with both (R)- and (S)-PheTIQ. Although the starting enzyme cannot accommodate either substrate enantiomer without substantial steric clashes, the active-site substitutions remodeled the binding pocket for shape-complementary recognition of the larger PheTIQ substrate. In particular, the Y321S and P422S mutations carved out additional space for the tetrahydroisoquinoline ring, allowing the experimentally preferred (R)-PheTIQ to dock in a catalytically competent orientation proximal to the flavin cofactor (Fig. 2.13a). Although (S)-PheTIQ also fits in the enlarged pocket, it adopts an orientation that precludes hydride transfer to the cofactor because the relevant C-H bond points away. Hydrogen-bonding interactions between the (S)-PheTIQ amine and both the side chain of Ser422, which was introduced during optimization, and the backbone carbonyl of Thr198 favor this unproductive pose (Fig. 2.13b). Mutation of Met226 in the substrate diffusion channel to threonine matches previous reports showing that smaller residues at this site boost activity for a multitude of substrates [239], likely improving substrate uptake and/or product release.

Table 2.2: Preparative deracemization with PT.1. Reactions were carried out with 4 equiv. NH3BH3 and 10 mM substrate in 1 M sodium phosphate buffer at pH 7.0 at 30 °C. Except for deracemization of compound 7, which contained 5 µM purified enzyme, all reactions were carried out with whole cells. ND means not determined.

Compound    e.e. (%)    Yield (%)    Time (h)
3           0           ND           48
4           50 (S)      35           48
5           5 (S)       75           36
6           98 (S)      81           36
7           99 (S)      71           9
8           0           ND           48

Figure 2.13: Active site models of PT.1. a (R)-PheTIQ docked into the model shows shape complementarity between the substrate and the space newly formed by the Y321S and P422S mutations. b The (S)-enantiomer was found in a conformation with the relevant C-H bond facing away from FAD in a catalytically irrelevant position.

2.3 Discussion

Enzymes have the potential to play a major role in reducing the environmental impact of chemical manufacture. The dramatically enhanced screening power of droplet microfluidics compared to robotic, microtiter plate-based techniques is facilitating both their discovery and optimization. To date, screening large libraries from bacteria [130, 146, 149], yeast [116, 153], and filamentous fungi [144] and large (>1 million genes) metagenomic libraries comprising environmental DNA from microorganisms [154] has yielded active hydrolases [154, 247], peroxidases [116], cellulases [155], and aldolases [149]. Screening pairs of enantiomeric substrates with different colored fluorescent leaving groups even enables discovery of enantioselective catalysts [247]. Non-fluorescent screening of enzymatic reactions in droplets using absorbance-activated droplet sorting, Raman-activated droplet sorting, and mass-activated droplet sorting promises to further expand these capabilities [162, 166, 248]. If biocatalysis is to be competitive with conventional synthesis, it has been suggested that directed evolution efforts to adapt natural and designed enzymes for specific tasks need to become at least an order of magnitude faster [104]. Droplet microfluidics is poised to make this goal a reality. It has already proven particularly efficacious in cases where directed evolution using microtiter plate-based approaches fails as a result of reaching an apparent local fitness plateau, from which it is only possible to escape by screening a larger number of variants. For instance, directed evolution of an artificial aldolase using droplet microfluidics enabled the best enzyme from a stalled microtiter-plate selection to be improved 30-fold to give a >10⁹ rate enhancement that rivals the efficiency of natural class I aldolases [149].
Nevertheless, most examples of enzyme evolution using droplet microfluidics to date have focused on improving enzymes that already show important levels of activity over multiple rounds of evolution and use fluorogenic model substrates of little industrial interest. As highlighted here, coupled enzyme assays have the potential to generalize this approach. Libraries of cyclohexylamine oxidase that include >10⁶ members were screened against multiple substrates of industrial interest. Even with a relatively unsophisticated library, randomizing four active-site residues, single rounds of mutagenesis and screening led to improvements up to 50-fold. Such large improvements are fairly uncommon using lower throughput methods. The discovery of PT.1 shows that synthetically useful enzymes can be discovered from these huge libraries (here 1.7×10⁶ permutations of 8 residues) in a time-efficient manner by FADS. In a single round of evolution, a leap of nearly three orders of magnitude in catalytic efficiency and a radically reshaped substrate profile were achieved. This approach outperformed conventional evolutionary optimization of MAO-N for PheTIQ, which entailed separate screening of six different libraries and a second round of mutagenesis to produce a catalyst with a kcat/KM an order of magnitude lower than PT.1 [245], underscoring the advantage of simultaneously mutating multiple residues to facilitate the discovery of novel active site configurations.

In conclusion, our work highlights FADS as a powerful and versatile technology for rapidly engineering oxidases to perform new and valuable functions. It enabled engineering of a tailor-made, synthetically useful enzyme for an important chiral API precursor in a timeframe compatible with today's product plant development. FADS has the flexibility of a typical microtiter plate assay, making it easy to adapt established screening approaches.
In contrast to in cellulo assays [128], it is not limited by poor uptake of substrates or product diffusion. Moreover, the coupled assay used here is label free, obviating the need for surrogate substrates through detection of a common by-product, and should be readily transferable to any other oxidase. These capabilities promise to facilitate rapid tailoring of substrate specificity for a whole class of industrially important enzymes for the production of chiral amines, secondary thiols [249], and alcohols [250]. This approach should be easily extendable to other enzyme families that produce detectable (by-)products during catalytic turnover.

Chapter 3 Improving a proficient artificial metalloenzyme

Author contributions for Chapter 3: The author was involved in all aspects of this project, except the preliminary sorting experiments described in reference [251] and chemical synthesis. Dr. Douglas A. Hansen and Dr. Dominic G. Hoch synthesized the 2-phenylpropionate coumarin ester substrate. Dr. Sabine Studer was involved in the generation of the gene library. Oliver Allemann helped with assay development. Dr. Hoch was involved with screening and the characterization of MID1sc10.1-3.

Parts of this chapter have been published in: Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Sabine Studer, Douglas A. Hansen, Zbigniew L. Pianowski, Peer R. E. Mittl, Aaron Debon, Sharon L. Guffy, Bryan S. Der, Brian Kuhlman, Donald Hilvert, Science 362, 1285–1288 (2018).

3.1 Introduction

Tailoring naturally occurring enzymes using directed evolution is a powerful way of creating new catalysts. This approach fails, however, when no enzyme with suitable starting activity is available. The design of artificial enzymes has therefore been a long-standing goal in biological chemistry [252].

Producing small peptide structures from simple design principles, and then assaying them for promiscuous catalytic activity, is one way to create new enzymes. Helical bundles, for example, can be designed with pen and paper using helical wheel drawings [253]. This method was used to create an enzyme that catalyzes the hydrolysis of the iron-binding siderophore enterobactin [254]. The authors designed a helix-turn-helix motif that formed a dimer in solution and created a library randomizing the core residues of the dimer to test for promiscuous enzyme activity. By expressing this library in E. coli cells with an impaired iron-storage metabolism, a variant was discovered by genetic selection that rescued the cells' phenotype by cleaving enterobactin and releasing Fe(III) ions in the process.

Equipping minimalist designs with metal ion binding sites to harness the intrinsic reactivity of metal ions can also yield active catalysts [255]. For example, a three-helix bundle coordinating a structural Hg(II) ion and a catalytic Zn(II) ion was able to catalyze the cleavage of p-nitrophenyl acetate with a kcat/KM value of 3.1 M⁻¹ s⁻¹ without any previous engineering [256]. Another example using metal ions is a design that employs Zn(II) ions bound at the interface of larger assemblies of redesigned natural proteins [257]. By using a genetic selection strategy to screen for antibiotic resistance, this scaffold was evolved for β-lactamase activity. However, even with a screening system as powerful as genetic selection, the best variant that was found only possessed a kcat/KM value of 5.8 M⁻¹ s⁻¹.
The authors' choice to randomize only single residues at a time may have prevented them from harnessing synergistic effects between mutations and therefore from finding more efficient variants. These examples show that the de novo creation of enzymes from helical bundles is possible, but the starting activities of the designs are usually modest. Additionally, the last example showcases that without a suitable engineering strategy it is difficult to create artificial enzymes with useful levels of activity.

Recently, our lab described the creation of a highly efficient de novo metalloesterase by using a design process mimicking the natural genesis of metalloenzymes [229]. The starting point was the computational design MID1, a 45 amino acid helix-turn-helix polypeptide that dimerizes by binding two Zn(II) ions [227]. The metal ions that mediate dimerization were supposed to be coordinated by four histidines each. Upon structure elucidation, however, it was found that both Zn(II) ions possessed an open coordination site, allowing for promiscuous esterase activity [228]. To create a globular protein, the two subunits were linked together by a short linker and the metal-coordinating histidines for one of the binding sites were removed by computational design [229]. The resulting single-chain variant, called MID1sc (Fig. 3.1a), exhibited a kcat of 0.011 s⁻¹ and a catalytic efficiency (kcat/KM) of 18 M⁻¹ s⁻¹ for the cleavage of rac-2-phenylpropionate coumarin ester (1, Fig. 3.1b). It also showed modest enantiospecificity for the (R)-enantiomer with a selectivity factor S = (kcat/KM)S/(kcat/KM)R of 0.5. MID1sc was subsequently optimized for higher esterase activity over nine rounds of directed evolution using a lysate-based, 96-well microtiter plate assay with rac-1. Initially, active site residues were targeted and the best mutations shuffled.
Later, the whole gene was targeted by epPCR, and the zinc binding site rationally re-engineered to eliminate competing zinc binding modes. Remarkably, over the course of evolution kcat/KM increased >50,000-fold to 980,000 M⁻¹ s⁻¹ for (S)-1. The final variant was also highly selective for the (S)-enantiomer (S = 990). This increase in specificity is striking, since screening was conducted entirely with the racemic substrate. Since microtiter plate screens need large volumes of reactants, the concentration of the precious substrate had to be kept rather low during screening (2.5 µM in initial rounds), resulting in a very low KM value of 1.7 µM. Thus, even though the catalytic efficiency (kcat/KM) is similar to that of natural hydrolases, the kcat of 1.64 s⁻¹ is 2-3 orders of magnitude lower than that of the fastest hydrolases [258–260] (Fig. 3.1c,d).

Co-crystallization with a phosphonate transition state analog showed that the structure of MID1sc10 had changed drastically during optimization. Tightening of the crossover angle between the two helix-turn-helix fragments through a switch in the coordination sphere of the zinc ion created a pocket for the propionate to bind tightly (Fig. 3.1a). One phosphonate of the transition state analog, which mimics the oxyanion formed during ester hydrolysis, binds to zinc and the other is bidentately coordinated by an arginine. This remarkable transformation came about by mutating >20% of the entire sequence. Moreover, the evolutionary trajectory showed that a structural element, like the zinc ion, can evolve into a dedicated cofactor and drive catalysis with astounding efficiency and selectivity.

Figure 3.1: MID1sc10, a proficient metalloesterase. a The crystal structure of MID1sc10 with a bound phosphonate inhibitor. b The hydrolytic cleavage reaction of 2-phenylpropionate coumarin ester (1) as catalyzed by MID1sc10. c The evolved enzyme is highly efficient in ester hydrolysis, but its turnover number is much lower than that of typical natural esterases [258, 259].

In this chapter we explore the utility of droplet microfluidics screening to further improve MID1sc10 with respect to its turnover rate. FADS has proven to be able to improve highly active enzymes, such as horseradish peroxidase with a catalytic efficiency of 3.1×10⁶ M⁻¹ s⁻¹. It uses fractions of the reactant volumes needed for other screening methods [116], so it should facilitate screening at substrate concentrations well above the KM for (S)-1. Being able to explore sequence space more extensively might help to escape this local fitness plateau of MID1sc10, providing access to variants that achieve higher rate accelerations. Success would contribute to understanding the structural determinants of highly accelerated chemical reactions of artificial metalloenzymes.

3.2 Results

3.2.1 Esterase assay development

The fluorogenic substrate cleaved by MID1sc can be readily used for FADS due to its coumarin leaving group. The negatively charged sulfonate in the coumarin moiety is ideal for use in emulsion systems, ensuring that the fluorophore generated upon cleavage remains in the droplet where it is generated. By limiting loss of signal and minimizing crosstalk between droplets, the critical link between genotype and phenotype is maintained. Initial proof-of-principle experiments were already carried out in our lab in the fourth round of MID1sc optimization [251]. Because MID1sc was expressed as a fusion construct with maltose binding protein (MBP), it had to be cleaved using TEV protease before analysis. To avoid this step, an alternative expression system involving periplasmic export was evaluated for FADS (Fig. 3.2). Initially, MID1sc4a was cloned into a pMG209 plasmid [261] containing a pelB secretion tag. Initial black and white sorts of MID1sc4a-expressing cells and cells with an empty pMG209 vector indicated that the enzyme was retained in the periplasm and coumarin fluorescence could be detected in 2 pL droplets when incubated with (S)-1. Because 2 pL droplet sorters were only available for off-chip droplet storage and incubation, we wanted to adapt the assay to work in 20 pL droplets. The available microfluidic devices for 20 pL droplets allow for incubation in channels on-chip, allowing tight control of the selection stringency. The larger dilution factor means that stable expression of the construct is needed to detect esterase activity under stringent sorting conditions. Furthermore, the plasmid should be small and transform E. coli cells efficiently (>3 million CFUs).
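The effect of droplet volume on signal can be made concrete with a short calculation: a fixed number of molecules originating from one cell, whether enzyme or fluorescent product, ends up ten times more dilute in a 20 pL droplet than in a 2 pL droplet. The molecule count used below is a purely hypothetical placeholder, not a measured value from this work, and the function name is illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole

def concentration_nM(molecules, droplet_pL):
    """Concentration if the given number of molecules fills one droplet."""
    moles = molecules / AVOGADRO
    litres = droplet_pL * 1e-12
    return moles / litres * 1e9

molecules_per_cell = 1e5  # HYPOTHETICAL value chosen for illustration only

c_2pL = concentration_nM(molecules_per_cell, 2)
c_20pL = concentration_nM(molecules_per_cell, 20)
dilution_factor = c_2pL / c_20pL

print("2 pL droplet:", round(c_2pL, 1), "nM")
print("20 pL droplet:", round(c_20pL, 1), "nM")
print("dilution factor:", round(dilution_factor, 1))
```

Whatever the true molecule count, the ratio between the two droplet sizes stays at ten, which is why the larger format places higher demands on expression level.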

To see whether we could reproduce the preliminary results obtained in 2 pL droplets by sorting 20 pL droplets, we replaced MID1sc4a in the pMG209 vector with MID1sc10, our most evolved catalyst. Test sorts were carried out by expressing this variant in E. coli, dispersed in 20 pL droplets that were collected off chip. We observed that esterase activity varied considerably between experiments (Fig. 3.3). Testing various substrate stocks with FADS and reverse-phase HPLC revealed that the assay was very sensitive to partial hydrolysis of the substrate and the resulting 2-phenylpropionate contamination. The latter is a fairly potent inhibitor of MID1sc10 with a Ki of 1.1 µM [229]. To select for variants with high turnover numbers, we used high substrate concentrations. For MID1sc10, which has a KM of 1.7 µM, more than 100 µM was chosen. At these high concentrations, even small amounts of hydrolysis can inhibit the enzyme notably. Thus, fresh substrate was synthesized, prepared from enantiopure (S)-1-phenylpropionate.

Figure 3.2: Expression and detection strategies evaluated in this study. MID1sc10 can be released into the reaction buffer containing the substrate by cell lysis. Periplasmic export transports the enzyme co-translationally out of the cytoplasm into the periplasm. The periplasmic membrane is permeable to the substrate, but not to the enzyme. The encapsulation in droplets and retention of the enzyme in the cytoplasm/periplasm thus retain the link between genotype and phenotype.

These initial tests also allowed us to see whether E. coli would survive the sorting procedure and could hence simply be regrown, circumventing the need to subclone the gene after every sort. Several iterations of reaction buffers (isotonic HEPES buffer or isotonic HEPES buffer containing 0.4% ) and LB media with or without kanamycin were tested for collection of the emulsion. The best combination was isotonic HEPES buffer with 0.04% glucose, with the emulsion collected in LB medium lacking antibiotics. Emulsions were disrupted by adding 1H,1H,2H,2H-perfluorooctanol and gentle mixing, followed by centrifugation on a tabletop centrifuge. Unfortunately, only 25% of the sorted cells survived the sorting process. Given the multiple hours needed to sort a library containing a million members, the idea of regrowing the cells after sorting was deemed impractical, as it would increase the required sorting time four-fold to >12 h. Additionally, the MID1sc10 gene is short and was found to amplify reliably after sorting, making regular subcloning tractable.

Figure 3.3: Activity measurements of two sample FADS assays. The plot on the left corresponds to a substrate stock which was partially hydrolyzed, whereas the one on the right was freshly prepared.

3.2.2 Screening expression systems

The pMG209 plasmid proved difficult to transform, often yielding <10⁵ CFUs, which is unsuitable for large libraries. Potential reasons include its relatively large size (6 kbp) and the constitutive salicylate promoter, which may lead to toxic effects due to high basal expression. On this basis, multiple other constructs were prepared. Two pKTNTET plasmids were generated, one with a pelB leader peptide and the other without any tags. These plasmids provide the option of using T7 promoter-driven expression in strains harboring a λ-phage plasmid or stringent tetracycline control in cells lacking this helper plasmid. Furthermore, the pKTNTET plasmid is generally found to transform well due to its small size (2.9 kbp). In addition to the pQE-MBP-MID1sc10 construct used to produce MID1sc fused to MBP, a 5 kbp pQE-MID1sc10 plasmid was prepared with an untagged version of MID1sc10 under the control of the T5 promoter. This plasmid provides flexibility as it can be transcribed by E. coli RNA polymerase and can therefore use the strong lac promoter for expression in strains with higher transformation efficiencies, such as XL1-Blue. All constructs were transformed into suitable expression strains: pMG209-pelB-MID1sc10 in BL21 (DE3) Gold and pKTNTET-MID1sc10, pKTNTET-pelB-MID1sc10, as well as pQE-MID1sc10 in both XL1-Blue and BL21 (DE3) Gold. All constructs were tested by FADS using multiple delay-line setups. If the variant was not exported into the periplasm, cells were lysed in the droplet to release the enzyme into solution. Surprisingly, only the periplasmic export plasmid pMG209 expressed in BL21 (DE3) Gold converted enough esterase substrate to generate a signal that could be detected using on-chip incubation. All other constructs failed to provide sufficient expression, a result that we confirmed by simple expression tests in a 96-well plate assay.

3.2.3 MID1sc10 library generation

With a viable expression system in hand, a library was created for the pMG209-MID1sc10 periplasmic export system. Given the small size of the protein, we initially chose to randomize single active site residues to see whether placement of the substrate adjacent to the zinc-coordinated hydroxide and stabilization of the oxyanion and the leaving group during bond cleavage could be improved. Glu31 and Ser80 were targeted because of their proximity to the leaving group (Fig. 3.4). Interestingly, Ser80 was never mutated in the previous evolutionary trajectory. Ile64 and Arg68 were also randomized as they contact the transition state analog in the x-ray crystal structure; they are close to the scissile bond and carbonyl group, respectively. Arg68 stabilizes the evolving negative charge during oxyanion formation. All residues were randomized simultaneously by introducing NNK codons using overlap extension PCR. The library insert was ligated into the pMG209 vector and transformed into E. coli cells. By using large quantities of DNA and multiple aliquots of electrocompetent cells, the library of 10⁶ theoretical variants was covered threefold.
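The library size quoted above follows from simple codon counting, which the following minimal sketch makes explicit. It assumes an idealized library in which every NNK codon combination is equally likely, and it interprets "threefold coverage" as three times as many transformants as codon combinations; the function names are illustrative and not part of the original workflow.

```python
import math

# NNK: N = A/C/G/T (4 options), K = G/T (2 options) -> 4 * 4 * 2 = 32 codons,
# which encode all 20 amino acids plus one stop codon (TAG).
NNK_CODONS = 4 * 4 * 2


def nnk_library_size(positions: int) -> int:
    """Number of distinct DNA variants when `positions` sites carry NNK codons."""
    return NNK_CODONS ** positions


def expected_completeness(oversampling: float) -> float:
    """Expected fraction of equally likely variants sampled at least once
    (Poisson approximation) for a given oversampling factor."""
    return 1.0 - math.exp(-oversampling)


positions = 4                                   # four randomized active site residues
codon_variants = nnk_library_size(positions)    # 32**4, roughly 10^6
protein_variants = 20 ** positions              # amino acid combinations, stop codons ignored
transformants_needed = 3 * codon_variants       # threefold coverage (assumed definition)
completeness = expected_completeness(3.0)

print(f"DNA variants:         {codon_variants:,}")
print(f"Protein variants:     {protein_variants:,}")
print(f"Transformants (3x):   {transformants_needed:,}")
print(f"Expected completeness: {completeness:.1%}")
```

Under these assumptions, threefold oversampling is expected to capture about 95% of the codon combinations; real libraries deviate from this because of codon bias and unequal ligation or transformation efficiencies.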

Figure 3.4: MID1sc10 library design. The x-ray crystal structure of MID1sc10 is shown with the transition state analog in yellow. The residues targeted for mutation are indicated in pink.

3.2.4 FADS for esterase activity

For an initial, low stringency sort, the library was expressed and dispersed in a droplet making device with 150 µM of (S)-1. The emulsion was stored off-chip and injected into the sorting device 30 min after droplet generation. A strong fluorescence signal indicating active droplets was observed, as expected from this highly active enzyme (Fig. 3.5a). The library was sorted and the emulsion disrupted by addition of 1H,1H,2H,2H-perfluorooctanol and centrifugation. Since the cells were expected to be partly intact, lysis agents were added. The selected MID1sc10 genes were amplified and subcloned into a freshly prepared pMG209 stock, transformed, and expressed for a second round of sorting. This time, a chip with a 5 min on-chip incubation line was chosen (Methods, Fig. 5.3). A marked decrease in detectable activity was observed with this setup and only 0.03% of droplets showed activity (Fig. 3.5b). Most likely, few variants in the library were sufficiently active to turn over enough substrate under these higher stringency conditions to yield a high signal. Nevertheless, active droplets were collected, coalesced, and the cells lysed to subclone their genes.

Figure 3.5: Activity profiles found when the MID1sc10 library was sorted. (a) Incubating for >30 min off-chip afforded high measured activity, and a large fraction of droplets showed cleavage of (S)-1 and release of coumarin. (b) The second screening round was performed with a 5 min delay line. With the increased stringency, only a few variants managed to produce significant amounts of the fluorescent coumarin product. Therefore, the total activity markedly decreased.

3.2.5 Variant isolation

Variants from the unsorted library and from the first and second FADS sorts were randomly chosen for testing in a standard microtiter plate assay. Although the variants were produced by secretion, we also lysed the cells to ensure that all enzyme was released to the screening buffer. As expected for a library with randomized codons, no clones as active as MID1sc10 were found in the unsorted pool. The number of randomly sampled variants that were active increased from 1% to 39% after the first sort. Surprisingly, fewer active variants were found after the second round of sorting, where only 27% of the variants converted the substrate at rates above background. A possible reason for this result is that the low frequency of active droplets in the second sort was close to the false-positive rate of sorting. This would lead to an overall decrease in the total number of active variants isolated. However, the most active clones showed up to a 2-fold increase in activity over MID1sc10 in the plate assay. They were picked for sequencing. Five of the variants that showed increases in activity were found to be MID1sc10 lacking any mutations, showing the relatively high variance of the plate assay. However, three mutants (MID1sc10.1-3, Table 3.1) had similar sequences in which Arg68 was mutated to an uncharged amino acid and Ser80 to arginine (Fig. 3.6b). Conceivably, S80R could stabilize the oxyanion in lieu of Arg68, whereas Q31E or R68Q might facilitate departure of the leaving group. Due to this convergence, these variants were chosen for further characterization.
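The argument that a low frequency of active droplets approaching the false-positive rate limits enrichment can be illustrated with a short calculation. The sketch below is a simplified model and not an analysis performed in this work: the false-positive rates are hypothetical placeholders, perfect detection of active droplets is assumed by default, and only the 0.03% active fraction is taken from the second sorting round.

```python
def expected_purity(active_fraction: float,
                    false_positive_rate: float,
                    sensitivity: float = 1.0) -> float:
    """Expected fraction of sorted droplets that are truly active.

    active_fraction:     fraction of all droplets that are genuinely active
    false_positive_rate: probability that an inactive droplet is sorted anyway
    sensitivity:         probability that an active droplet is sorted
    """
    true_positives = active_fraction * sensitivity
    false_positives = (1.0 - active_fraction) * false_positive_rate
    total = true_positives + false_positives
    return true_positives / total if total > 0 else 0.0


ACTIVE_FRACTION_ROUND_2 = 0.0003  # 0.03% of droplets showed activity

# Hypothetical false-positive rates, chosen only to show the trend.
for fp_rate in (1e-5, 1e-4, 3e-4, 1e-3):
    purity = expected_purity(ACTIVE_FRACTION_ROUND_2, fp_rate)
    print(f"false-positive rate {fp_rate:.0e}: expected purity {purity:.0%}")
```

In this model, once the false-positive rate equals the active fraction, only about half of the sorted droplets are genuinely active, and the purity drops quickly beyond that point. Droplet-level purity is not the same quantity as the fraction of active clones in the plate assay, so the numbers should be read qualitatively.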

Table 3.1: Hits found in the pools after one and two rounds of sorting the MID1sc10 library for esterase activity. All activity values are normalized to MID1sc10 and the spontaneous hydrolysis of ester rac-1 in buffer. The dash denotes the residue has been conserved.

Variant    Q31   I64   R68   S80   Activity
1          -     L     Q     R     1.7
2          E     L     L     R     1.3
3          Y     G     A     R     1.4
4          -     -     -     W     1.6
MID1sc10   -     -     -     -     1

3.2.6 Characterization of MID1sc10.1-3

Variants 1-3 were subcloned into pQE-MBP for expression as MBP fusion proteins. After affinity purification, TEV cleavage of the fusion protein, and subsequent size exclusion chromatography, the pure enzymes were kinetically characterized. As a control, MID1sc10 was also expressed and purified by the same protocol.

For every variant, kcat was approximated at a substrate concentration that is saturating relative to the KM of MID1sc10 (i.e. 1.7 µM). The kcat of the parent was estimated to be 1.1 s⁻¹, slightly lower than the 1.6 s⁻¹ reported previously based on full Michaelis-Menten analysis [229]. The kcat values of MID1sc10.1-3 were 1.3, 0.6, and 0.8 s⁻¹, respectively. Apparently, no significant improvement was made. However, the mutations retained the high activity of MID1sc10: the activity of MID1sc10.1 is similar to that of MID1sc10 despite a substantially remodeled constellation of active site residues.
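How well a single measurement at high substrate concentration approximates kcat can be gauged from the Michaelis-Menten equation. The following sketch assumes simple Michaelis-Menten kinetics without inhibition and, importantly, that the variants share the parent KM of 1.7 µM, which was not verified here; the substrate concentrations are illustrative values.

```python
def fractional_saturation(substrate_uM: float, km_uM: float) -> float:
    """v / Vmax for simple Michaelis-Menten kinetics: [S] / (KM + [S])."""
    return substrate_uM / (km_uM + substrate_uM)


KM_PARENT_UM = 1.7  # KM of MID1sc10 for (S)-1

# Illustrative substrate concentrations in µM.
for s in (1.7, 17.0, 100.0, 150.0):
    frac = fractional_saturation(s, KM_PARENT_UM)
    print(f"[S] = {s:6.1f} µM -> v/Vmax = {frac:.3f}")
```

With the parent KM, concentrations of 100 µM or more place the enzyme above 98% saturation, so the observed rate per enzyme is a close estimate of kcat. If a mutation raised KM substantially, the same measurement would underestimate kcat for that variant.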

Figure 3.6: Comparison of the x-ray crystal structure of MID1sc10 (a) and a computational model of MID1sc10.1 (b). The co-crystallized phosphonate transition state analog is shown in yellow and the zinc-binding histidines as blue sticks. The removal of Arg68 is likely complemented by the S80R mutation. The newly created guanidinium group is at a similar distance and potentially capable of stabilizing the oxyanionic transition state. The shape of the rest of the pocket has not changed much, with I64L being a conservative mutation and R68Q filling out a similar volume as the previous residue.

Finding this alternative solution to the previous stabilization of the oxyanionic transition state still makes an interesting case for FADS. It shows that the removal of a crucial residue such as Arg68 can be rescued by a complementary mutation, something that would be difficult to achieve by targeting single residues at a time. However, to find more highly improved variants, the second sorting round will have to be repeated with longer incubation times. Given the low enrichment, it is likely that the best variants in this library were not found during screening.

3.3 Discussion

Fluorescence-activated droplet sorting offers a large increase in throughput compared to other commonly used screening techniques. This increase in throughput enables faster engineering of enzymes [152], as well as optimization of enzymes operating close to the diffusion limit [116]. Droplet microfluidic screening has already proven useful for improving hydrolase activity of natural enzymes and discovery of hydrolytic enzymes from metagenomic libraries [144, 146, 154, 247, 262]. In two instances, hydrolytic enzymes with specificity constants in the range of 10⁵ M⁻¹s⁻¹ have been successfully isolated using FADS. However, catalytic efficiencies of hydrolase enzymes engineered with FADS are rarely reported to exceed 10⁴ M⁻¹s⁻¹. Besides optimizing naturally occurring enzymes, ultra-high throughput screening was previously shown to be very efficacious for improving computationally designed biocatalysts [152]. A computationally designed retro-aldolase, which had already been evolved using lower throughput methods, was successfully engineered using FADS [149]. The enormous throughput made it possible to escape a local fitness plateau and reach a >10⁹-fold rate enhancement in the final variant. Besides possessing the efficiency of natural aldolases, this optimized artificial enzyme also proved useful for the synthesis of several enantioenriched non-natural aldol adducts. Further diversification of this variant has even yielded variants capable of producing a palette of aliphatic and cyclic ketones [172]. As shown by this example, the combination of computational design and ultra-high throughput screening provides a means of creating proficient enzymes from scratch. Similar approaches for artificial esterases have yet to be shown.
MID1sc10 is a highly active artificial zinc-dependent esterase based on a computationally designed scaffold that was subjected to multiple rounds of enzyme engineering in microtiter plate assays [229]. It catalyzes the cleavage of (S)-phenylpropionate coumarin ester with a kcat/KM value of 9×10⁵ M⁻¹s⁻¹. The evolutionary trajectory from the starting point MID1sc to MID1sc10 impressively shows how an artificial metalloenzyme can be evolved to an extremely efficient catalyst by converting a metal from a structural element into a dedicated catalytic cofactor. However, the kcat of 1.6 s⁻¹ is 1-3 orders of magnitude lower than that of the most sophisticated hydrolases found in nature [258–260]. In this chapter, we specifically sought to improve the turnover number of this enzyme by exploiting the enormous throughput of FADS and high substrate concentrations in droplets. During previous engineering efforts, the small zinc-binding helical bundle was expressed as a fusion construct with maltose binding protein, which was proteolytically cleaved before analysis. To simplify this system for FADS, we implemented and tested multiple expression systems. Either the cells were lysed in the droplets to release the expressed untagged enzyme or the enzyme was exported to the periplasm of the host. Only periplasmic export under T7 promoter control yielded substrate conversion that was detectable in 20 pL droplets. Using this expression system, a MID1sc10 library targeting four active-site residues simultaneously was created, encompassing 10⁶ members. Screening by FADS led to moderate enrichment of the libraries. The best variants from two rounds of sorting were purified and shown to possess rate constants similar to MID1sc10. The mutations found are nevertheless interesting: a key MID1sc10 residue, namely Arg68, disappeared and a new arginine at position 80 emerged. The latter presumably assumes the same catalytic role.
These proof-of-principle experiments illustrate the importance of high-throughput screening methods for mutating multiple residues simultaneously, through which complementary mutations can yield alternative active site configurations. Re-screening of the focused library to achieve higher enrichments is likely to uncover variants with even higher rate accelerations. Better calibration of the assay conditions by testing longer incubation lines will most likely help to give higher enrichments. MID1sc10 is a small helical bundle and has no second shell residues. Large structural changes and reorientation of the active site might be needed for more sophisticated transition state stabilization to emerge. For example, MID1sc10 has only one coordinating histidine whose backside nitrogen forms a hydrogen bond to another residue. In other zinc-dependent hydrolases, the active site configuration is fine-tuned by hydrogen bonds to all coordinating residues and the activated water molecule [263]. Additionally, in the MID1sc10 x-ray crystal structure no residue is capable of interacting with the leaving group by hydrogen-bond donation, a feature that is essential for efficient amidase activity [8]. Random mutagenesis could help find substitutions important for long range interactions to active site residues and induce overall structural change. Drastic rearrangements have already been observed in the initial evolutionary trajectory and are not impossible, but would require many more mutations. In conclusion, we have shown that evolving this highly efficient esterase through microfluidic droplet sorting is possible, by finding an alternative active site configuration through FADS of 10⁶ variants.
Considering the small activity observed in the 5 min delay line and given that other enzymes, which catalyze the reduction of hydrogen peroxide with specificity constants >10⁶ M⁻¹s⁻¹, have been evolved using FADS [116], it is likely that it will be possible to improve MID1sc10 further. Refining screening conditions will hopefully lead to such a variant and grant insights as to what is necessary for artificial metalloenzymes to reach catalytic perfection. The design principles from which MID1sc was derived, the combination of computationally designing metal binders and laboratory optimization, provide a promising framework to design artificial metalloenzymes. Indeed, the same scaffold has already shown other promiscuous activities that could be enhanced by evolution [251, 264]. Other activities, such as amide bond cleavage, have never been observed with any MID1sc variant. High throughput methods may expedite searching sequence space for more elusive activities. Fast ways of engineering will be required to parallelize the improvement of new scaffolds. FADS is perfectly suited to do so.

Chapter 4 Perspective

Enzymes are fantastic catalysts capable of catalyzing chemical reactions with rate enhancements and selectivities that put most small molecule catalysts to shame, all while operating in aqueous solution at ambient temperature and pressure. Moreover, directed evolution provides us with a universal framework to tailor these biocatalysts to our liking [57]. By combining directed evolution with rational and computational enzyme design, entirely new function can be introduced into protein scaffolds that have no counterparts in nature [201]. Such designs can also serve as new templates for further engineering. Given the power of enzyme catalysis and the robustness of directed evolution as a technique to improve them, it is not surprising that all of these techniques are becoming important tools to improve the ecological footprint of the chemical industry. Their implementation, however, is held back by the rate at which suitable biocatalysts can be generated [104]. In this thesis, we demonstrated the development of two ultra-high throughput screening assays for directed evolution, one for oxidase enzymes and another for an artificial metalloenzyme. With the ability to test up to 10⁸ variants per experiment, we show that it is possible to engineer enzymes more quickly and tackle harder problems than with lower throughput methods. In Chapter 2, we described the development of a fluorescence-activated droplet sorting (FADS) assay for oxidase enzymes. Our assay detects the oxidase’s universal hydrogen peroxide by-product in an enzymatic cascade that produces a fluorescent dye. This assay is label-free and can be used to perform substrate walks with ultra-high throughput screening. We show that its large dynamic range enabled the discovery of active biocatalysts from a naïve, million-membered library of cyclohexylamine oxidase (CHAO) variants for two non-native substrates. Wild-type CHAO processes 1-phenylpropylamine and 1-methyl-1,2,3,4-tetrahydroisoquinoline with moderate to low catalytic efficiency (kcat/KM = 1,100 M⁻¹s⁻¹ and 0.5 M⁻¹s⁻¹, respectively). In a single round of mutagenesis and selection, microfluidic screening enabled a 9- and 50-fold increase in kcat/KM, respectively. This result underscores that even with relatively unsophisticated library design FADS outperforms microtiter plate-based assays, where double digit improvements are extremely rare. In combination with library design tailored to the hydrophobic active site pocket, a single round of mutagenesis and screening for the oxidation of 1-phenyl-1,2,3,4-tetrahydroisoquinoline (PheTIQ) enabled the isolation of PT.1. This quintuple mutant has a 960-fold improved kcat/KM of 9,400 M⁻¹s⁻¹, mainly due to a 340-fold increased rate constant. Both activity and stereoselectivity for (R)-PheTIQ are in the range of natural oxidases, affording the efficient synthesis of the drug precursor (S)-PheTIQ to optical purity. Heavily engineered oxidase enzymes are widely employed in chemical manufacturing, with some notable examples needing >35 mutations to be industrially viable [97, 231]. The assay we developed is applicable to any oxidase and can be used to screen for enantioselectivity by screening for activity against pure enantiomers of almost any substrate. This constitutes an important development because screening for enantioselectivity with model substrates can translate poorly to substrates of industrial interest [247]. The label-free nature of our assay makes it readily applicable to problems where oxidase activity against non-natural substrates needs to be generated, even from starting points with very low activity.
The library design strategy used to engineer PT.1 can be transferred to any enzyme with a hydrophobic active site pocket. This mutational approach has proven to be extremely flexible. With one single library, activity with a range of different amines has since been detected. This includes both enantiomers of 2-methylpiperidine (unpublished data), pyrrolidines (Oliver Alleman, personal communication), and, after changing a single targeted residue in the library, 2-phenyl-1,2,3,4-tetrahydroquinoline (unpublished data). Given that the assay is applicable to the whole family of oxidases, these findings open up the possibility of finding other enzyme variants capable of producing not only amines, but also chiral thiols, alcohols, and nitriles by rapidly engineering other enzymes of that class [249, 250, 265]. The approach of leveraging detectable by-products could enable the generalization of other FADS assays. We have managed to detect the autofluorescence of NADH with our microfluidic setup (unpublished data), which could enable the detection of a range of other redox-active enzymes employed in industrial settings, such as ketone and imine reductases. Additionally, longer cascades involving more than just a single reaction can be envisioned. For example, the hydroxylation of by P450 enzymes has been monitored in microtiter plates through a cascade oxidation of the product alcohol by galactose oxidase, followed by similar detection of the hydrogen peroxide by-product [266]. This assay could potentially be utilized to engineer monooxygenase activity in ultra-high throughput. Methods have been and are being developed to assay enzymatic reactions that don’t yield easily detectable side-products. For example, methods based on donor/quencher-pairs allow the engineering of enzymes that act on biopolymers with little substrate re-design [156, 158–160]. Label-free detection methods for droplet sorting are also starting to emerge.
Absorption, and mass spectrometry have all been successfully used to sort microfluidic droplets [162, 165, 166]. They are still in the experimental stage, but have exciting prospects for enzyme engineering. With higher throughput and the ability to multiplex readouts, even longer reaction pathways are potential targets of future directed evolution campaigns. These methods can also benefit ongoing efforts to design new enzymes from scratch. Equipping inert protein scaffolds with reactive metals represents one way to create enzymes de novo [267]. Although significant progress has been made with such systems, their creation and improvement through directed evolution hasn’t always been easy [268]. In this light, the evolution of MID1sc to MID1sc10 to create a highly proficient artificial esterase represents a remarkable success [229]. This enzyme catalyzes the cleavage of a 2-phenylpropionate coumarin ester with a catalytic efficiency of 9×10⁵ M⁻¹s⁻¹, approximately two orders of magnitude below the diffusion limit. Although the activity of MID1sc10 is impressive, its kcat is one to three orders of magnitude lower than that of the most active hydrolase enzymes [258–260]. Therefore, we were interested in evolving new MID1sc10 variants that can close this gap to the best natural enzymes and in finding out what the distinguishing factors are. In Chapter 3, we set out to use our microfluidic screening platform to further improve this artificial metalloenzyme. Initial experiments established the use of periplasmic export to monitor the esterase cleavage in droplets without lysing the cells. This method was used to screen a library containing four simultaneously randomized active site residues for improved esterase cleavage. Two rounds of sorting, however, only managed to moderately enrich the pool of mutants with highly active variants. Despite imperfect enrichment, the best three variants were revealed to have similar rate enhancements to MID1sc10, but very different active site configurations. These results show that our assay is capable of identifying highly active hydrolases. Experiments in the literature with an off-chip incubation setup similar to ours found a natural hydrolase with similar kcat/KM to MID1sc10 [154]. The pool of analyzed variants, however, was almost exclusively made up of enzymes that are >10-fold less active than MID1sc10. In our case, we found multiple variants that have similar rate enhancements to the starting point, which makes it difficult to discern small improvements. The experimental setup therefore needs to be perfectly tuned. Re-sorting of this and other libraries and adjusting the delay-line length to accurately control incubation times will likely be needed to identify more active variants. The convergence of the amino acid substitutions in our hits shows that alternative solutions to oxyanion stabilization exist in the MID1sc scaffold. Interestingly, the alternative arrangement of stabilizing charges was never explored in the previous evolutionary trajectory [229].
Ser80 was never targeted during evolution, while Arg68 was introduced as a point mutation in round seven to prevent self-acylation of the original lysine with 2-phenylpropionate. The throughput of FADS enabled us to simultaneously mutate both residues and uncover their epistatic relationship. Preliminary experiments sorting the same library for amidase activity yielded hits with exactly the same mutational pattern (unpublished results).

MID1sc is an enormously versatile scaffold for catalysis and is not limited to ester hydrolysis. It has been evolved to catalyze bimolecular hetero-Diels-Alder cycloadditions with astounding activity and selectivity [251, 264]. Additionally, the same Diels-Alderase shows starting activity for cyclopropanations, Michael additions, reductions, and retro-aldol reactions. New variants have already been discovered with improved activity for the reduction of 3-hydroxy-3-(6-methoxynaphthalen-2-yl)-1-(pyridin-2-yl)propan-1-one using Hantzsch ester as reductant and retro-aldol cleavage (Dr. Yusuke Ota, personal communication). Various catalytic activities of MID1sc variants have also been observed by binding other metals, including Cu(II) and Au(I). Although MID1sc can accommodate diverse catalytic frameworks, the starting activities seen for these different reactions are almost always very low. Thus, fast ways of improving these enzymes are indispensable to harness their full potential. The esterase FADS assay is readily adaptable for these purposes. Retro-aldolase activity could quickly be improved by combining periplasmic secretion with the previously developed assay for an artificial retro-aldolase [149]. Alcohol dehydrogenase activity could be amenable to coupled assays by detecting the oxidized reducing agent. Although the Hantzsch ester is likely incompatible with emulsion systems due to its poor solubility, switching to NADH, for example, could allow detecting the turnover of the reducing equivalent directly. Intramolecular gold-catalyzed hydroarylation to produce coumarin derivatives has already been used as a fluorogenic readout [269]. These substrates could be used to screen the gold-binding MID1 variant for catalytic activity, although their water solubility will need improvement for FADS. In this thesis, we have shown that FADS is a very promising tool to improve both industrially relevant and artificial metalloenzymes.
Of course, FADS is not without limitations. Not all enzymatic reactions provide a convenient fluorescent readout. Thus, the technique is often limited to model substrates with fluorogenic tags of limited interest. Chapter 2 shows how more generalizable detection methods can solve such problems, underscoring how coupled assays and future physical readouts have a large impact on biocatalysis. Additionally, it shows that ultra-high throughput screening can render enzymes synthetically useful in a timeframe that compares well with the development of chemocatalysts. Other drawbacks of droplet-based microfluidics include the necessity for soluble substrates. Cross-talk between droplets also prevents utilizing pH changes as a robust detection method. This problem might be fixed by creating more customized oil-surfactant combinations or using bead-based encapsulation to better shield the reaction compartment [270]. Microscopic static arrays are a promising, albeit lower throughput, alternative to droplets, due to improved physical separation of samples and the capability of continuously measuring turnover [118]. Furthermore, passively encapsulating cells according to a Poisson distribution is wasteful, since fewer than a third of all droplets usually contain a cell. Pre-sorting the emulsions according to cell occupancy by standing acoustic waves or inertial ordering by Dean flow are promising options to enable >97% loading of single cells [271]. However, their compatibility with other sorting modes has still to be demonstrated. The combination of ultra-high throughput screening with next generation sequencing has great potential to yield enormous amounts of functional data [220, 221]. These data, in combination with sequence-based deep learning approaches, could be extremely valuable to more accurately model sequence-function landscapes [202, 217].
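The statement above that passive encapsulation leaves most droplets empty follows directly from Poisson statistics. The sketch below illustrates this for an assumed mean occupancy of 0.3 cells per droplet; this value is a hypothetical example chosen to keep multiple occupancy low and is not a parameter reported in this thesis.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of finding exactly k cells in a droplet at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy(lam: float) -> tuple[float, float, float]:
    """Return the fractions of empty, singly and multiply occupied droplets."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple


MEAN_CELLS_PER_DROPLET = 0.3  # hypothetical loading, for illustration only
empty, single, multiple = occupancy(MEAN_CELLS_PER_DROPLET)

print(f"empty droplets:    {empty:.1%}")
print(f"single occupancy:  {single:.1%}")
print(f"multiple cells:    {multiple:.1%}")
```

At this loading roughly three quarters of the droplets are empty and only about a quarter contain any cell. Raising the mean occupancy reduces the number of empty droplets, but at the cost of more droplets containing several cells, which weakens the genotype-phenotype link.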
Computational design usually results in a large list of candidate sequences ranked by predicted activity. Practically, however, the ranking rarely correlates perfectly with activity. Microfluidic screening allows the assessment of libraries on the same scale as computers. Thus, these screening approaches could be used to quickly assess a large portion of computationally designed candidate enzymes. This could close the design-create-learn cycle for the improvement of force fields and semi-empirical models. All these developments will likely improve the computational and experimental creation of new enzymes and bring us closer to being able to enzymatically catalyze any reaction at will.

In the future, more standardized workflows and commercial platforms, comparable to FACS, will make droplet-based techniques ubiquitous. Similar developments have already taken place with droplet microfluidics-based approaches to single-cell RNA sequencing, albeit with less sophisticated chip designs [272–274]. The greater accessibility of ultra-high throughput methods will likely lead to a huge gain in our understanding of enzyme catalysis and molecular evolution. It has the potential to make the synthetic use of enzymes more widespread and simplify the creation of tailored pathways for fermentation of small-molecule drugs. In the future, this could potentially lead to drastic price reductions of medicines through cheaper, higher yielding, and more customized production processes, making healthcare more affordable.

Chapter 5 Materials and methods

5.1 General methods

5.1.1 Materials

Chemicals were purchased from Sigma, ABCR, Acros, Fluorochem, Thermo Fisher Scientific, and Dow Corning. Chemical standards for (R)- and (S)-1-phenyl-1,2,3,4-tetrahydroisoquinoline ((R)-7 and (S)-7) were bought from Toronto Research Chemicals and TCI Chemicals, respectively. Compounds (R)-7 and (S)-7 were further purified by recrystallizing the HCl salt in ethanol and diethyl ether, filtering, and drying in vacuo. Amplex UltraRed was purchased from Thermo Fisher Scientific. All enzymes were purchased from New England Biolabs. All bacterial strains were purchased from Agilent.

5.1.2 General analytical methods

NMR spectra were recorded on an AVIII 400 (1H 400 MHz, 13C 100 MHz) spectrometer. Small molecules were further analyzed by LC-MS (Waters H-class UPLC/SQD-2) using an Acquity UPLC BEH C-18 column (50 x 2.1 mm, 1.7 µm), 1 µL injection, monitoring ESI+, solvent A = H2O + 0.1% TFA, solvent B = MeCN + 0.1% TFA, flowrate = 1 mL/min, initial conditions = 5% B, 0-1.5 min ramp to 80% B, 1.5-2 min ramp to 100% B, 2-2.2 min = 100% B, 2.2-2.3 min ramp to 5% B, 2.3-3 min re-equilibration = 5% B.
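The LC-MS gradient above is a piecewise-linear program, which can be easier to check when written out as data. The following is only an illustrative sketch: the breakpoints are transcribed from the method description, while the function name `percent_b` and the linear interpolation between breakpoints are assumptions, not part of the instrument method.

```python
# Gradient breakpoints transcribed from the text: (time in min, % solvent B).
GRADIENT = [
    (0.0, 5.0),    # initial conditions
    (1.5, 80.0),   # 0-1.5 min ramp to 80% B
    (2.0, 100.0),  # 1.5-2 min ramp to 100% B
    (2.2, 100.0),  # 2-2.2 min hold at 100% B
    (2.3, 5.0),    # 2.2-2.3 min ramp back to 5% B
    (3.0, 5.0),    # 2.3-3 min re-equilibration at 5% B
]


def percent_b(t_min, program=GRADIENT):
    """Return % B at time t_min, assuming linear ramps between breakpoints."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint the final composition is held.
    return program[-1][1]


for t in (0.0, 0.75, 1.5, 2.1, 2.5):
    print(f"{t:4.2f} min: {percent_b(t):5.1f}% B")
```

Solvent A at any time is simply 100 minus the returned value.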

5.1.3 Creation of electrocompetent cells

Overnight cultures of XL1-Blue, BL21 (DE3), BL21 (DE3) pLysS, or BL21 (DE3) Gold cells were used to inoculate medium containing 2% w/v tryptone, 0.5% w/v yeast extract, 10 mM NaCl, and 2.5 mM KCl in a 1:1000 ratio and grown at 30 °C and 230 rpm. After reaching an OD600 of 0.3, the culture was cooled to 4 °C and regularly swirled to suspend cells. All subsequent incubation and centrifugation steps were carried out on ice or at 4 °C, respectively. Cells were pelleted by centrifugation at 3500 rpm for 10 min and the supernatant was discarded. The pellet was washed two times by resuspending in 400 mL ice-cold sterile ddH2O, followed by centrifugation at 3500 rpm for 10 min and discarding the supernatant. The washed cell pellet was resuspended in 10% glycerol by gentle swirling on ice. Subsequently, the suspension was washed by adding 400 mL of 10% glycerol and pelleted by centrifugation at 3500 rpm for 15 min, and the supernatant was discarded. The pellet was resuspended in residual 10% glycerol (for a pellet of a 1.2 L culture, up to 0.5 mL of 10% glycerol can be added to aid the process), and aliquots of 50 µL were frozen in liquid nitrogen and stored at -80 °C.

5.1.4 Oligonucleotides

Table 5.1: Primers used in cloning procedures.

Cloning primer   Sequence
chao fw          5'-GGAGATATACATATGCACCATCATCACCACC
chao rv          5'-GGATCAGCTGACTAGTCATACGAGAGC
mid1 fw          5'-TTTCAGGGATCCGGCTCTCCGCTGGC
mid1 rv          5'-CTAATTAAGCTTGCCTGCAGGTCGACTTAGTCG
pQM fw           5'-TCCTCGCTGCCCAGCCGGCGATGGCCGGATCCGGCTCTCCGCTGGC
pQM rv           5'-TAGTGGTGGTGGTGGTGGTGCTCGAGAAGCTTGCCTGCAGGTCGAC
pMG2pK fw        5'-ATATACATATGGGATCCGGCTCTCCGCTGGCG
pMG2pK rv        5'-TCAGCTGACTAGTAAGCTTGCCTGCAGGTCGACT

Table 5.2: Mutagenic primers used to generate libraries of cyclohexylamine oxidase. Mutagenic codons are highlighted in bold font. D = G + A + T ; Y = C + T ; R = A + G ; H = A + T + C; B = G + T + C; K = G + T; N = A + T + C + G.

Target residue(s)   Primer   Sequence
T198/L199           fw       5'-CGGTTGCACACGGTTATAGTCAATDYTDYTCTAGGAGCAGACCCCTATGAGG
                    rv       5'-ATTGACTATAACCGTGTGCAACCG
M226                fw       5'-GGAACACGAGACGGAGCCCAATG
                    rv       5'-CATTGGGCTCCGTCTCGTGTTCCARHTAGACTTTGTATCCCTTCGC
Y321                fw       5'-GCGCGCACCGATGGGAAGADYTTACAAGGTTCAGGC
                    rv       5'-TCTTCCCATCGGTGCGCGC
F351/L353           fw       5'-CTTGACACCGAAGACGTAGGTGTGDTYCTADYTGACGGAACAAAACCTACC
                    rv       5'-CACACCTACGTCTTCGGTGTCAAG
F368                fw       5'-ATAGGCGGCTCAAATTACGACCGC
                    rv       5'-GCGGTCGTAATTTGAGCCGCCTATARHTCCTATAAGGGTCGCTAGCG
P421                fw       5'-GCAGGAGTGGGCAAAAGGTGGTBYTGTTACATATATGCCCCCGGGAG
                    rv       5'-ACCACCTTTTGCCCACTCCTGC
L199                fwd      5'-CGTGTTAACGATAACCGTGTGCAACC
                    rv       5'-GGTTGCACACGGTTATCGTTAACACGNNKTTGGGTGCAGACCCCTATGAGGTT
M226                fwd      5'-GGTACGCGTGACGGAGCCCAATG
                    rv       5'-CCATTGGGCTCCGTCACGCGTACCNNNCAAACTCTGTATCCCTTCG
Y321                fwd      5'-CGCGCGCACCGATGGGTCGTNNKTATAAAGTTCAGGCGCGCTACCCC
                    rv       5'-ACGACCCATCGGTGCGCGCGC
L353                fwd      5'-GATGGTACTAAACCTACCGATACGCTAG
                    rv       5'-CTAGCGTATCGGTAGGTTTAGTACCATCNNNCAAGAAAACTCCGACGTCTTCGGT

Table 5.3: Mutagenic primers used for MID1sc10 library generation. Degenerate codons are in bold text.

Target residue(s)   Primer   Sequence
Q31                 fw       5'-CGTATGGATGAAGTGCGTACCCTGNNKGAAAACCTGCATCAGCTGATGC
                    rv       5'-CAGGGTACGCACTTCATCCATACG
I64/K68             fw       5'-GGTTCCCCTTTANNKCAACAAATCNNKAATATCCACTCCTTCATCCACCAAGC
                    rv       5'-GATTTGTTGGGCTAAAGGGGAACC
S80                 fw       5'-GCATGGACGAGGTTCGCACGTTANNKGAGAATTTACACCAATTAATGCACG
                    rv       5'-TAACGTGCGAACCTCGTCCATGC

5.2 Methods specific to Chapter 2

5.2.1 Construction of pKTNTET-CHAO_Kan

The plasmid pACYC-6His-CHAO [275] (Supplementary Figure A.4) was used to amplify the wild-type CHAO gene with primers chao fw and chao rv (Table 5.1), which introduced 5' NdeI and 3' SpeI restriction sites. The amplified insert was purified by agarose gel electrophoresis. Both this gene insert and the plasmid pKTNTET-0 [276] were digested with NdeI and SpeI-HF (New England Biolabs) overnight at 37 °C and purified by agarose gel electrophoresis. The fragments were ligated using T4 DNA ligase overnight at 16 °C and purified using the Clean and Concentrator 5 Kit (Zymo Research). The resulting construct pKTNTET-CHAO_Amp was transformed into XL10-Gold by electroporation. pKTNTET-6His-CHAO_Amp and pET-29b(+) (Novagen) were digested with BspHI, excising both their antibiotic markers, and the relevant fragments were purified using agarose gel electrophoresis. The kanamycin resistance cassette of pET-29b(+) was ligated into the pKTNTET-6His-CHAO vector for 1 h at room temperature and the resulting construct was used to transform XL1-Blue cells, which were plated on selective medium containing 25 µg/mL kanamycin. Plasmid DNA was extracted from a single colony, and the DNA sequence of the resulting plasmid pKTNTET-6His-CHAO_Kan (Supplementary Figure A.5) was verified by Sanger sequencing (Microsynth).

5.2.2 Plasmid stability

The toxicity of plasmids was determined through growth of transformed bacteria on different media and activity-based assays. For BL21 (DE3) and BL21 (DE3) pLysS strains, the E. coli cells were transformed with pKTNTET-6His-CHAO_Kan and plated on selective media. Single clones were used to inoculate starter cultures supplemented with 25 µg/mL kanamycin, which were grown at 37 °C. When the OD600 reached 0.3, appropriate dilutions were plated on LB Agar, LB Agar containing 25 µg/mL kanamycin, and LB Agar containing 25 µg/mL kanamycin and 0.5 mM IPTG. The plates were incubated overnight at 37 °C and the toxicity was evaluated by colony counting. A significant amount of growth on the IPTG-containing plate for BL21 (DE3) indicated plasmid instability in that strain. No growth on IPTG-containing plates, and therefore no plasmid rejection, was observed for the BL21 (DE3) pLysS strain. For XL1-Blue cells, induction by tetracycline provided insufficient selective pressure, and plasmid instability was determined by enzymatic activity instead. Therefore, single colonies transformed with pKTNTET-6His-CHAO_Kan were used to inoculate 96 wells on a microtiter plate. The enzyme was expressed and activity for cyclohexylamine oxidation was measured according to the section describing microtiter plate assays (Section 5.2.7). All selected colonies were able to express functional CHAO enzyme, indicating no plasmid instability.

5.2.3 Library generation

Cloning primers and primers for cassette mutagenesis were purchased from Microsynth AG. The initial focused library was created by introducing two NNK codons (Leu199 and Tyr321) and two NNN codons (Met226 and Leu353). Four pairs of primers harboring the mutagenic codons were used to create five fragments of the wild-type CHAO (Table 5.2). For the non-degenerate codon library, six pairs of primers were designed to introduce seven DYT and one BYT degenerate codons into the wild-type gene by dividing the gene into seven fragments. In both cases, the fragments were generated in individual PCR reactions, purified by agarose gel electrophoresis, and assembled by overlap extension PCR with primers chao fw and chao rv flanking the gene [106]. The resulting full-length library amplicons and pKTNTET-6His-CHAO_Kan plasmid were digested with NdeI and SpeI-HF endonucleases overnight at 37 °C. The digested gene fragment was purified using the Clean and Concentrator 5 Kit, whereas the plasmid was purified by agarose gel electrophoresis and further desalted using a Clean and Concentrator 5 Kit (Zymo Research). The fragments were ligated with T4 DNA ligase at 16 °C overnight and desalted prior to electroporation into XL1-Blue cells.
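The theoretical size of each library follows directly from the degenerate codons used. The sketch below expands the codons with the base definitions given in the caption of Table 5.2 and the standard genetic code; the function names are hypothetical, and the calculation assumes every codon combination is equally likely, which real libraries only approximate.

```python
from itertools import product
from math import prod

# Degenerate base symbols as defined in the caption of Table 5.2.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "GAT", "Y": "CT", "R": "AG", "H": "ATC",
    "B": "GTC", "K": "GT", "N": "ATCG",
}

# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(codon):
    """List all concrete codons encoded by a degenerate codon such as 'DYT'."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in codon))]


def amino_acids(codon):
    """Sorted amino acids (one-letter code, stops excluded) encoded by a codon."""
    return sorted({CODON_TABLE[c] for c in expand(codon)} - {"*"})


def dna_diversity(codons):
    """Number of distinct DNA sequences for a set of randomized positions."""
    return prod(len(expand(c)) for c in codons)


def protein_diversity(codons):
    """Number of distinct full-length protein sequences (stops excluded)."""
    return prod(len(amino_acids(c)) for c in codons)


reduced = ["DYT"] * 7 + ["BYT"]        # seven DYT and one BYT codon
focused = ["NNK", "NNN", "NNK", "NNN"]  # two NNK and two NNN codons

print("DYT encodes", "".join(amino_acids("DYT")))
print("BYT encodes", "".join(amino_acids("BYT")))
print("reduced library:", dna_diversity(reduced), "DNA /", protein_diversity(reduced), "protein variants")
print("focused library:", dna_diversity(focused), "DNA /", protein_diversity(focused), "protein variants")
```

Because DYT and BYT each encode six amino acids with one codon apiece, the DNA and protein diversities of that library coincide, which is the usual motivation for such reduced codon sets.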

5.2.4 Microfluidic setup

A description of the optical setup used for the experiments described in this thesis can be found in ref. [149]. In brief, the setup consists of a laser combiner (Omicron Laserage GmbH) for illumination with a 375 nm and a 488 nm diode laser and a 561 nm SPPS laser. An inverted fluorescence microscope is used to detect fluorescence using photomultipliers (H10722-30, Hamamatsu Photonics) and corresponding single band pass filters, FF01-488/561/635, FF02-520/28-25, and FF01-609/57-25 (Semrock), for detection at 448 nm, 520 nm, and 609 nm, respectively. A high voltage amplifier (632B, Trek) was used to induce the dielectrophoretic sorting pulses. The photomultipliers and high voltage controller were connected to a field-programmable gate array (FPGA) card (NI USB-7856R - R Series Multifunction RIO with Kintex-7 160T FPGA, National Instruments), which was connected to a computer requiring a run time engine (National Instruments). The gain of the photomultipliers and the high voltage amplifier were controlled and the signals recorded using custom-made LabVIEW software.

5.2.5 Microfluidic chip production

The chip design and photomask production have been described previously [149, 152]. A droplet maker with two aqueous inlets and an oil inlet was used to collect droplets off-chip (Fig. 5.1). These emulsions were sorted on a separate device with two inlets for emulsions, two inlets for spacing oil, and two outlets. For on-chip incubation, a 5, 15, and 45-minute delay line, designed to keep an equal time distribution of the droplets [143], was connected to a T-junction droplet maker with three aqueous inlets and one oil inlet with a sorting device (Fig. 5.2 and 5.3). Sylgard 184 base and curing agent (Dow Corning) were mixed in a 10:1 ratio and cast into silicon-SU8 molds (Wunderlichips). A desiccator was used to remove bubbles from the liquid PDMS. The degassed chips were cured at 90 °C for 30 min. The resulting PDMS slab was separated from the mold. Inlets, outlets, and connections for the electrodes were punched using surgical punchers (Shoney Scientific) with a 0.3 mm diameter. The PDMS slabs and glass slides (Corning) were cleaned using water and isopropanol and dried at 90 °C. The PDMS slabs and glass slides were bonded together using a plasma cleaner (PDC-32G, Harrick Plasma) at 0.5 mbar for 30 s and incubated at 90 °C overnight. Chips containing electrodes were placed on a 90 °C hotplate, 51In 32.5Bi 16.5Sn alloy (Indium Corporation of America) was melted into the electrode channels, and copper cables were inserted into the inlets [134]. The finished chips were treated with trichloro(1H,1H,2H,2H-perfluorooctyl)silane (Sigma-Aldrich) in vacuo overnight.

Figure 5.1: Schematics of a droplet maker device for off-chip emulsion collection and a sorting device. The droplet generation chip features one oil inlet (top) and two aqueous inlets and an outlet (bottom). Droplets are generated at the flow focusing nozzle. The sorter has two inlets for emulsions (top), two inlets for spacing oil (light green), and two outlets for sorting (bottom). The electrodes are shown in grey.

5.2.6 Microfluidic assay

The gene library was used to transform XL1-Blue E. coli cells by electroporation, which were recovered in 50 mL Super Optimal broth with Catabolite repression (SOC) at 37 °C shaking at 230 rpm for one hour before adding kanamycin to a concentration of 25 µg/mL. After three hours, 15 mL culture were used to inoculate a 50 mL LB culture (25 µg/mL kanamycin), which was grown at 37 °C shaking at 230 rpm. After reaching an OD600 of 0.3, the temperature was lowered to 20 °C and gene expression induced by supplementing tetracycline to a final concentration of 2 µg/mL. After overnight expression, 2 mL of the expression culture were harvested by centrifugation at 3500 x g and 4 °C and were washed three times with 1 mL supplemented M9 medium (47.6 mM Na2HPO4, 22 mM KH2PO4, 8.5 mM NaCl, 18.6 mM NH4Cl, 0.1 mM CaCl2, 1 mM MgSO4, 0.4% glucose, 5 µg/mL thiamine, 0.08% yeast ForMedium Complete Supplement Mixture (ForMedium), 1x US* trace element mix [278], pH 7.5). Cells were resuspended, filtered through a 5 µm syringe filter (Millipore), supplemented with 30% Percoll (equilibrated with 10x M9 salts) and 2 U/mL DNase I (New England Biolabs), and diluted to an OD600 of 0.04 or 0.06 for off-chip storage or on-chip storage, respectively.

Off-chip storage: The emulsion was generated with a flow focusing nozzle [137] at a 300 µL/h flow for the oil phase (Novec HFE-7500 fluorinated oil (3M) containing 2% (w/w) 008-FluoroSurfactant from RAN Biotechnologies) and 100 µL/h flow for the two aqueous phases, generating 15 pL droplets at a frequency of 3 kHz. The cell suspension was co-encapsulated with a solution containing 4 U HRP, 11 µM Esculin, 2 mg/mL lysozyme, 4 mg/mL polymyxin B, 0.4 wt. % Pluronic F127, 0.4 mM Amplex UltraRed, 10 µM EGTA, 20 µM EDTA, 8 mM substrate, and 4% DMSO in 50 mM sodium phosphate buffer at pH 7.0. The emulsion, which contains an average λ of 0.3 cells per droplet, was collected in a 1 mL syringe purged with HFE-7500. After an incubation of approximately 2 h at room temperature, droplets were reinjected into the sorting device at a flowrate of 40 µL/h and spaced by injecting Novec HFE-7100 oil (3M) at a flowrate of 650-750 µL/h. The emulsion was sorted with an electric pulse frequency of 15 kHz, at a voltage of 620 kV and pulse length ranging from 0.5 – 0.8 ms. The droplets were sorted at a frequency of ca. 1 kHz.

On-chip incubation: Droplets were generated at a T-junction [136] with the oil phase (Novec HFE-7500 fluorinated oil (3M) containing 2% (w/w) 008-FluoroSurfactant from RAN Biotechnologies) flowing at 30 µL/h and all the aqueous phases at 20 µL/h. A lysis mixture containing 6 U HRP, 15 µM Esculin, 3 mg/mL lysozyme, 6 mg/mL polymyxin B, 0.6 wt. % Pluronic F127, 15 µM EGTA, and 30 µM EDTA in 50 mM sodium phosphate buffer pH 7.0 was injected alongside the detection and substrate mixture containing 0.6 mM Amplex UltraRed, 3 µM Esculin, 12 mM substrate, and 3% (v/v) DMSO in 50 mM sodium phosphate buffer at pH 7.0.

Figure 5.2: Schematics of microfluidic chip with integrated droplet generation, incubation and sorting. Droplets are generated at a T-junction from one oil and three aqueous inlets. The droplet contents are mixed by chaotic advection [277]. The two incubation channel lengths used in this study are 45 min (a) and 15 min (b). The incubation channels have narrowings to ensure even incubation times by stochastic redistribution of droplets [143]. The sorting junction is analogous to an off-chip sorting device.
The emulsion, which contains an average λ of 0.3 cells per droplet, was spaced with Novec HFE-7100 oil (3M) at a flowrate of 650-750 µL/h. Sorting parameters were identical to the ones described above. Generally, the droplet size was determined using the fluorescence of Esculin excited at 375 nm and measured at 448 nm. The CHAO activity is proportional to the fluorescence obtained by exciting at 561 nm and detecting emission at 609 nm, which results from the HRP-catalyzed oxidation of Amplex UltraRed by hydrogen peroxide. Although the total number of active droplets is unknown at the outset of a sorting experiment, the distribution of measured activities usually converges within seconds and changes only slowly over time. Based on the initial distribution, the sorting gate was set to collect the 0.1% - 0.3% of droplets that displayed the highest fluorescence. For off-chip incubations, the active population shifts over time, which demands periodic adjustment of the threshold. This is in contrast to on-chip incubation, where incubation times are uniform and readjustment of the sorting gate is seldom necessary. While complete coverage of a library is not a necessity for successful directed evolution, the chances of seeing every variant of the 10^6 library members should be 99%, according to Bosley and Ostermeier [108]. As such, assuming an occupancy of 22% singly encapsulated cells and an average sorting frequency of 1 kHz, sorting runs of 9 h are necessary. The emulsion of sorted droplets was broken by adding 75 µL 1H,1H,2H,2H-perfluorooctanol and 15 µL 10 mM Tris-HCl (pH 8), 10 mM EDTA, 100 mM NaCl, 1% Triton X-100, 1 mg/mL proteinase K. Following a one-hour incubation at room temperature, DNA was recovered using the DNA Clean and Concentrator 5 kit and amplified by PCR using JumpStart Taq DNA Polymerase (Sigma Aldrich) (30 s 94 °C, 34 x (30 s 94 °C, 30 s 55 °C, 2 min 72 °C), 10 min 72 °C, final hold 4 °C) using chao fw and chao rv primers.
The PCR product was purified by agarose gel electrophoresis and subcloned into pKTNTET_Kan vector, which was then used to transform electrocompetent XL1-Blue cells. If a third round of sorting was conducted, the extracted plasmids were directly used to transform electrocompetent XL1-Blue cells. Single clones from each round were further analyzed in a microtiter plate assay.
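The occupancy and sorting-time figures quoted above can be reproduced with a short calculation. This is a simplified sketch rather than the exact treatment of ref. [108]: it assumes Poisson loading of droplets, a library in which all variants are equally represented, and sampling with replacement, so that the expected fraction of variants seen after screening n cells from a library of size V is 1 - exp(-n/V). All function and variable names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a droplet contains exactly k cells at mean loading lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_coverage(library_size, n_screened):
    """Expected fraction of variants sampled at least once (uniform library)."""
    return 1.0 - math.exp(-n_screened / library_size)


def screened_for_coverage(library_size, coverage):
    """Number of cells that must be screened to reach a target coverage."""
    return -library_size * math.log(1.0 - coverage)


LAMBDA = 0.3           # average cells per droplet (from the text)
SORT_RATE_HZ = 1000.0  # average sorting frequency (from the text)
RUN_HOURS = 9.0        # sorting run length (from the text)
LIBRARY_SIZE = 1e6     # library members (from the text)

p_empty = poisson_pmf(0, LAMBDA)
p_single = poisson_pmf(1, LAMBDA)
droplets = SORT_RATE_HZ * RUN_HOURS * 3600
cells_screened = p_single * droplets
coverage = expected_coverage(LIBRARY_SIZE, cells_screened)
hours_for_99 = screened_for_coverage(LIBRARY_SIZE, 0.99) / (p_single * SORT_RATE_HZ) / 3600

print(f"empty droplets:           {p_empty:.1%}")
print(f"singly occupied droplets: {p_single:.1%}")
print(f"single cells screened:    {cells_screened:.2e}")
print(f"expected coverage:        {coverage:.2%}")
print(f"hours for 99% coverage:   {hours_for_99:.1f}")
```

Under these idealized assumptions a 9 h run exceeds 99% expected coverage, so the quoted run time leaves a margin for unequal variant representation and droplets that are lost or rejected.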

5.2.7 Microtiter plate assay

After each round of microfluidic sorting, the subcloned library or extracted plasmids were used to transform XL1-Blue cells by electroporation and plated on LB Agar containing 25 µg/mL kanamycin. Single colonies from the transformations, alongside three wild-type control colonies, were used to inoculate pre-cultures on a microtiter plate containing 150 µL LB medium supplemented with 25 µg/mL kanamycin, and the plates were sealed with a gas-permeable membrane (Breathe Easy, Diversified Biotech). After overnight incubation at 30 °C, 700 rpm, and 100% humidity, 30 µL of these pre-cultures were used to inoculate a deep-well plate containing 1.8 mL LB medium and 25 µg/mL kanamycin per well, and plates were sealed with a gas-permeable membrane (Breathe Easier, Diversified Biotech). Cultures were grown at 37 °C and, after reaching an OD600 of 0.3, gene expression was induced by adding tetracycline to a final concentration of 2 µg/mL. Subsequently, the plates were incubated at 20 °C and 240 rpm overnight. Cells were harvested by centrifugation at 4000 rpm for 20 min at 4 °C, the supernatant discarded, and the pellets stored at -20 °C. Lysis was carried out by four freeze-thaw cycles and resuspension of the pellets in 50 mM sodium phosphate, 0.2 mg/mL lysozyme, 1 mg/mL polymyxin B, 10 µg/mL DNase I, pH 7.0, and 3 h incubation at room temperature. The lysates were cleared by centrifugation at 4000 rpm for 20 min at 4 °C. In a flat bottom UV-Vis microtiter plate (Nunc MicroWell with Nunclon Delta Surface, Thermo Scientific), 150 µL assay solution containing 3 mM substrate, 5 U/mL HRP, 0.15 mg/mL 4-aminoantipyrine, 1 mM vanillic acid, and 2% DMSO in 50 mM sodium phosphate buffer, pH 7.0, were distributed to each well. The reaction was initiated by adding 50 µL of cleared lysate.
Activity values were determined by measuring the change of absorbance at 498 nm in a microtiter plate reader (Varioskan, Thermo Scientific) and normalizing the slope to the average of three internal wild-type controls. Data analysis was performed in R (version 3.3.2) and Python (version 2.7).
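The normalization step can be sketched as follows. This is not the analysis script used in the thesis; it is a minimal illustration in which the slope of absorbance versus time is obtained by ordinary least squares and divided by the mean slope of the wild-type wells. The well names and absorbance readings are invented for the example.

```python
from statistics import mean


def fit_slope(times, values):
    """Least-squares slope of values versus times (e.g. A498 per minute)."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def normalize_to_wild_type(slopes, wt_wells):
    """Express each well's slope relative to the mean of the wild-type wells."""
    wt_mean = mean(slopes[w] for w in wt_wells)
    return {well: s / wt_mean for well, s in slopes.items()}


# Hypothetical readings: time in minutes, absorbance at 498 nm per well.
times = [0, 1, 2, 3, 4]
traces = {
    "A1": [0.10, 0.12, 0.14, 0.16, 0.18],  # wild-type control
    "A2": [0.10, 0.11, 0.12, 0.13, 0.14],  # wild-type control
    "A3": [0.10, 0.13, 0.16, 0.19, 0.22],  # wild-type control
    "B1": [0.10, 0.30, 0.50, 0.70, 0.90],  # variant
    "B2": [0.10, 0.10, 0.10, 0.10, 0.10],  # inactive variant
}

slopes = {well: fit_slope(times, trace) for well, trace in traces.items()}
relative = normalize_to_wild_type(slopes, wt_wells=["A1", "A2", "A3"])

for well in sorted(relative):
    print(f"{well}: {relative[well]:.2f}x wild type")
```

Normalizing within each plate in this way makes values comparable across plates even when lysis efficiency or reader settings drift between runs.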

5.2.8 Enzyme purification

Purified plasmids of single variants identified on microtiter plates were used to transform BL21 (DE3) pLysS E. coli cells. Cultures containing 900 mL of LB medium supplemented with 25 mg/mL kanamycin and 30 mg/mL chloramphenicol were inoculated from a starter culture, grown at 37 °C and 240 rpm until reaching an OD600 of 0.3, and induced by addition of 0.5 mM IPTG. The enzyme was produced overnight at 25 °C, followed by harvesting the cells by centrifugation at 4500 x g for 20 min at 4 °C. After discarding the supernatant, the pellets were frozen at -20 °C. All of the following purification steps were carried out at 4 °C. Pellets were resuspended in 5 mL 50 mM sodium phosphate buffer (pH 7.0) containing 0.2 mg/mL lysozyme, 1 mg/mL polymyxin B, and 2 mM β-mercaptoethanol and incubated for 3 h. The suspension was subsequently sonicated in the presence of DNase I. The soluble fraction was recovered by centrifugation at 14000 x g for 30 min and CHAO purified by Ni-NTA (Qiagen) affinity chromatography, washing with 10 column volumes of 25 mM imidazole in 50 mM sodium phosphate buffer (pH 7.0), then with the same amount of 50 mM imidazole in 50 mM sodium phosphate buffer (pH 7.0), and finally eluting with 250 mM imidazole in 50 mM sodium phosphate buffer (pH 7.0). Samples were analyzed by 20% SDS-PAGE (GE Healthcare). Fractions containing CHAO were pooled and dialyzed two times against 2 L of 50 mM sodium phosphate (pH 7.0) containing 2 mM DTT before concentration using 30 kDa MWCO centrifugation filters (Amicon, Millipore), spinning at 4000 x g and 4 °C. Protein concentration was determined by absorbance of the FAD cofactor at 450 nm using an extinction coefficient [279] of 11,300 M−1 cm−1 on a NanoDrop 2000 (Thermo Fisher). Typical yields were 10-20 mg/L culture.

5.2.9 Enzyme kinetics

Purified variants were characterized by monitoring the coupled reaction between hydrogen peroxide, 4-aminoantipyrine, and vanillic acid, which is catalyzed by HRP, at 498 nm (ε = 6234 M−1 cm−1) [240]. Reactions contained 0.23 mg/mL 4-aminoantipyrine, 1 mM vanillic acid, 5 U HRP, and 2% DMSO in 50 mM sodium phosphate buffer (pH 7.0). All reactions were carried out at 30 °C and were initiated by adding enzyme to a final concentration of either 1 µM or 10 nM for wild-type and PT.1, respectively. For (R)-1-phenyl-1,2,3,4-tetrahydroisoquinoline a full Michaelis-Menten kinetic dataset was recorded, whereas for other enzyme-substrate combinations kcat/KM was determined from a single point measurement at the lowest substrate concentration where the reaction could still be observed. Kinetic parameters were determined using Prism 8 (GraphPad Software, Inc.) (Supplementary Figure A.3). In the case of PT.1, all kinetic characterizations were performed with three batches of independently produced enzyme. Otherwise, technical triplicates were measured.
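The single-point estimate of kcat/KM rests on the low-substrate limit of the Michaelis-Menten equation: when [S] is much smaller than KM, v0 is approximately (kcat/KM)[E][S]. The sketch below illustrates that relationship together with the conversion of an absorbance slope into a rate via the Beer-Lambert law. It is an assumption-laden illustration, not the fitting procedure used in Prism: the 1 cm path length, the one-chromophore-per-turnover stoichiometry, and all numerical inputs other than ε are hypothetical.

```python
EPSILON_498 = 6234.0  # M^-1 cm^-1, extinction coefficient quoted in the text


def rate_from_slope(dA_per_s, path_cm=1.0, epsilon=EPSILON_498):
    """Convert an absorbance slope (A/s) into a rate in M/s (Beer-Lambert law).

    Assumes one chromophore is formed per substrate turnover and a path
    length of path_cm; both must be adapted to the actual assay.
    """
    return dA_per_s / (epsilon * path_cm)


def michaelis_menten(s, kcat, km, e0):
    """Initial rate v0 (M/s) for substrate s (M) and enzyme e0 (M)."""
    return kcat * e0 * s / (km + s)


def kcat_over_km_single_point(v0, e0, s):
    """Estimate kcat/KM (M^-1 s^-1) from one rate measured at s << KM."""
    return v0 / (e0 * s)


# Hypothetical enzyme used to check how good the approximation is.
kcat, km = 10.0, 1e-3  # s^-1, M
e0, s = 1e-8, 1e-5     # M, M (substrate at 1% of KM)

v0 = michaelis_menten(s, kcat, km, e0)
estimate = kcat_over_km_single_point(v0, e0, s)
print(f"true kcat/KM:     {kcat / km:.3e} M^-1 s^-1")
print(f"single-point est: {estimate:.3e} M^-1 s^-1")
```

The estimate is low by a factor of 1/(1 + [S]/KM), which is why the measurement is taken at the lowest substrate concentration that still gives an observable rate.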

5.2.10 Substrate scope

The substrate scope was determined in measurements analogous to the kinetic characterizations. The enzymatic activity of each purified variant was evaluated by monitoring the coupled reaction between hydrogen peroxide, 4-aminoantipyrine, and vanillic acid, which is catalyzed by HRP, at 498 nm (ε = 6234 M−1 cm−1) [240]. Reactions contained 2 mM substrate, 0.23 mg/mL 4-aminoantipyrine, 1 mM vanillic acid, 5 U HRP, and 2% DMSO in 50 mM sodium phosphate buffer (pH 7.0). All reactions were carried out at 30 °C and were initiated by addition of the enzyme. The enzyme concentration was varied in the nM to µM range in order for v0 to be linear within the first 10% of substrate consumption.

5.2.11 Chiral analysis

Enantiomeric excess was determined via HPLC on a chiral stationary phase using a Waters 717plus Autosampler (Waters Corporation) equipped with two Waters 515 HPLC Pumps (Waters Corporation) and a Waters 996 Photodiode Array Detector (Waters Corporation).
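For reference, enantiomeric excess is conventionally computed from the integrated peak areas of the two enantiomers. The helper below is a minimal sketch under the assumption that both enantiomers have identical detector response, which holds for UV detection of enantiomers; the function name and example areas are hypothetical.

```python
def enantiomeric_excess(area_s, area_r):
    """Return e.e. in percent; positive values indicate an excess of (S)."""
    total = area_s + area_r
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * (area_s - area_r) / total


# Hypothetical peak areas (arbitrary units) from a chiral HPLC trace.
print(enantiomeric_excess(99.5, 0.5))   # nearly enantiopure (S)
print(enantiomeric_excess(50.0, 50.0))  # racemate
print(enantiomeric_excess(25.0, 75.0))  # excess of (R), reported as negative
```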

5.2.12 Deracemization reactions

Deracemization of PheTIQ. The deracemization procedure was adapted from Ghislieri et al. [83]. All deracemizations were carried out with 10 mM 1-phenyl-1,2,3,4-tetrahydroisoquinoline (54 mg), 5 µM purified CHAO wild-type or PT.1, 40 mM borane-ammonia complex in 25 mL 1 M sodium phosphate buffer at pH 7.0, and catalytic amounts of catalase (Sigma Aldrich). The reaction was initiated by adding the oxidase and incubated at 30 °C. HPLC samples were prepared by adding 20 µL of 10 M NaOH to each 250 µL sample and extracting the quenched reaction with 1 mL tert-butyl methyl ether (TBME). The organic phase was dried over a MgSO4 plug, filtered through a 0.2 µm AcroPrep 96 filter plate (Pall), and separated on a Chiralcel OD-H (Chiral Technologies Europe) column (150 x 4.6 mm, particle diameter = 5 µm) at room temperature, eluting isocratically with 70:30 n-hexane:isopropanol at a flow rate of 1 mL/min (Fig. 2.12d). The reaction was worked up by adding 200 µL 10 M NaOH and extracting with 50 mL TBME. The organic phase was dried over MgSO4 and the solvent removed in vacuo to yield the (S)-enantiomer as a white solid (38 mg, 71% yield, 99% e.e.). HRMS: Calculated [M+H]+ 210.1277, found 210.1275. 1H NMR, 400 MHz, CDCl3 δ ppm: 7.4 – 7.25 (m, 5H), 7.21 – 7.14 (m, 2H), 7.10 – 7.03 (m, 1H), 6.78 (d, J = 7.76 Hz, 1H), 5.14 (s, 1H), 3.30 (m, 7.1 Hz, 1H), 3.18 – 3.01 (m, 2H), 2.92 – 2.80 (m, 1H), 1.82 (s, 1H). 13C NMR, 100 MHz, CDCl3 δ ppm: 144.7, 138.15, 135.39, 129.0, 128.9, 128.4, 128.1, 127.4, 126.3, 125.7, 62.1, 42.2, 29.7.

Preparative deracemization of compounds 3-6 & 8. The deracemization reactions were set up as whole cell processes. The respective amine (15 mM) and 60 mM borane-ammonia complex were dissolved in 25 mL 1 M sodium phosphate buffer at pH 7.0. The reaction was initiated by addition of 2 g of a wet cell pellet of BL21 (DE3) pLysS E. coli cells expressing PT.1 CHAO. Aliquots were periodically analyzed by quenching 250 µL of the reaction mixture with 20 µL of 10 M NaOH, followed by extraction of the amine with 1 mL TBME. The organic phase was dried over MgSO4, filtered through a 0.2 µm AcroPrep 96 filter plate (Pall), and analyzed by HPLC on a chiral stationary phase. When the e.e. of the product converged, the reaction was quenched by raising the pH to 12 with 10 M NaOH and extracted three times with 25 mL TBME. The organic phases were combined and dried over MgSO4, and the solvent was removed in vacuo.

Compound 3 was analyzed on a Chiralcel OD-H (Chiral Technologies Europe) column (150 x 4.6 mm, particle diameter = 5 µm) eluted with a mixture of n-hexane and ethanol (95:5) containing 0.05% diethylamine at 1 mL/min at room temperature. No enantiomeric enrichment was detected over 48 h. The product was not further purified or analyzed.

Deracemization of 4 yielded 19 mg of a colorless oil (35% yield, e.e. 50%). The product enantiomers were separated on a Chiralcel OD-H (Chiral Technologies Europe) column (150 x 4.6 mm, particle diameter = 5 µm) eluted with a mobile phase of n-hexane and ethanol (95:5) containing 0.05% diethylamine at 1 mL/min at room temperature. The reaction was quenched by raising the pH to 12 with 10 M NaOH when the e.e. converged. After extraction with 3 x 25 mL TBME, the organic phases were combined, 75 mL of water was added, and the pH was lowered to 1 by addition of 1 M HCl. The phases were separated and the aqueous phase was again titrated to pH 12 with 10 M NaOH and extracted with 75 mL of TBME. The organic phase was collected and dried over MgSO4, the solvent was removed in vacuo, and the product was analyzed again by HPLC on a chiral stationary phase (Fig. 2.12a). The retention times were assigned according to the literature [246]. HRMS: Calculated [M+H]+ 148.1121, found 148.1122. 1H NMR, 400 MHz, CDCl3 δ ppm: 7.46 – 7.20 (m, 5H), 3.94 (td, J = 7.1, 2.1 Hz, 1H), 1.79 – 1.65 (m, 2H), 1.42 – 1.19 (m, 2H), 0.92 (t, J = 7.3 Hz, 3H); 13C NMR, 100 MHz, CDCl3 δ ppm: 128.62, 128.50, 127.16, 127.10, 126.46, 126.19, 56.06, 41.20, 19.66, 13.97.

Deracemization of 5 yielded 41 mg of yellow oil (75% yield, e.e. 5%). An authentic standard of the racemic compound and the isolated product were separated using a Chiralpak AD-H column (4.6 mm x 250 mm, Daicel) with a mixture of n-hexane and ethanol (98:2) containing 0.05% diethylamine at 1 mL/min at room temperature (Fig. 2.12b). The e.e. was determined to be 5% (S).
The enantiomers were assigned by comparison to reported values of their retention time [245]. HRMS: Calculated [M+H]+ 148.1121, found 148.1120. 1H NMR, 400 MHz, CDCl3 δ ppm: 7.34 – 7.07 (m, 4H), 4.64 (d, J = 7.3 Hz, 1H), 3.65 – 2.97 (m, 4H), 1.87 (d, J = 6.8 Hz, 3H); 13C NMR, 100 MHz, CDCl3 δ ppm: 133.11, 131.15, 129.16, 127.93, 127.32, 126.05, 51.04, 39.12, 25.71, 20.17.

Deracemization of 6 yielded 49 mg of product as a yellow oil (81% yield, e.e. 98%). The racemic standard and the reaction product were analyzed on a Chiralcel OD-H (Chiral Technologies Europe) column (150 x 4.6 mm, particle diameter = 5 µm) at room temperature using n-hexane and ethanol (99:1) containing 0.05% diethylamine as an eluent at a flow rate of 1 mL/min (Fig. 2.12c). The enantiomers were assigned by comparison to reported retention time values [245]. The e.e. was determined to be 98% (S)-6. HRMS: Calculated [M+H]+ 162.1277, found 162.1281. 1H NMR, 400 MHz, CDCl3 δ ppm: 7.33 – 7.13 (m, 4H), 4.50 (t, J = 5.6 Hz, 1H), 3.67 (d, J = 11.9 Hz, 1H), 3.42 – 3.25 (m, 2H), 3.17 – 3.03 (m, 1H), 2.26 – 2.17 (m, 2H), 1.24 (t, 3H); 13C NMR, 100 MHz, CDCl3 δ ppm: 131.99, 131.66, 129.21, 127.86, 127.14, 126.40, 56.49, 39.92, 27.37, 25.76, 10.19.

Compound 8 was analyzed on a Chiralpak AD-H column (4.6 mm x 250 mm, Daicel) and eluted using a mixture of n-hexane and ethanol (95:5) containing 0.05% diethylamine at 1 mL/min at room temperature. No enantiomeric enrichment was detected over 48 h. The product was neither further purified nor analyzed.

5.2.13 Modeling of PT.1

The crystal structure of cyclohexylamine oxidase from Brevibacterium oxydans IH-35A (PDB ID: 4i59) was used as reference structure for the docking studies. The five mutations in PT.1 (L199V, M226T, Y321S, L353I and P422S) were manually introduced using the ProteinBuilder module in MOE2016.08 (Molecular Operating Environment, Chemical Computing Group) [280]. To prepare the structure for docking, the following steps were performed with the MOE2016.08 Structure Preparation module: the structure was protonated at pH 7.4 and structural issues were addressed (adding hydrogens, correcting partial charges and hybridization). The resulting structure, as well as all ligand complexes, was minimized using the AMBER10:EHT (Extended Hückel Theory) forcefield (termination: root mean square gradient < 0.1 kcal mol−1 Å−2).

5.2.14 Ligand docking of PheTIQ in PT.1

Potential binding sites were calculated by Site Finder in MOE2016.08 [280]. The active site identified next to the co-crystallized ligand was selected as the binding pocket for docking. Docking was executed in MOE2016.08 using the Triangle Matcher placement method and the London dG scoring method. In total, 1000 poses were returned and 100 poses were passed to the refinement step. “Induced Fit” was chosen as the post-placement refinement method with default parameters, using GBVI/WSA dG as the final scoring method. Ten poses were finally retained. To confirm the identified poses, the docking was repeated with GOLD 5.5 (CCDC Software) as integrated in MOE2016.08. This re-docking was performed with default parameters (efficiency = 100) using the ChemPLP scoring function and “Induced Fit” as the refinement method with GBVI/WSA dG as the final scoring function. Again, 100 poses were passed to the refinement step and 10 poses were saved.

5.3 Methods specific to Chapter 3

5.3.1 Materials

The plasmids pMG209 and pKTNTET were previously used in different studies [261, 276]. The pQE-MBP plasmid (Supplementary Figure A.8) was previously constructed from pQE-80L (Qiagen) [227]. All E. coli strains were purchased from Agilent. All constructs were verified by Sanger sequencing (Microsynth).

5.3.2 Chemical synthesis

The substrate was synthesized in a previous study [229]. Briefly, the nucleophilic substitution of 4-chloromethyl-7-hydroxycoumarin with Na2SO4 in water yielded the sulfonated leaving group. (S)-2-phenylpropionic acid was used to form the corresponding acyl chloride with thionyl chloride. Ester bond formation between the 7-hydroxy-4-methylsulfonate coumarin and (S)-2-phenylpropionyl chloride yielded the esterase substrate (S)-1. The substrate was precipitated in MeOH:MeCN (1:4) and filtered. The filtrate was redissolved in H2O:MeOH (1:1) and purified by preparative HPLC (H2O:MeCN, 5% to 60% MeCN over 30 minutes, flow rate of 10 mL/min).

5.3.3 Construction of pMG209-pelB-MID1sc10

pMG209-pelB-MID1sc4a [251] was digested using BamHI and XhoI. The resulting vector was purified by agarose gel electrophoresis. The MID1sc10 gene was amplified from pQE-MBP-MID1sc10 [229] by PCR using mid1 fw and mid1 rv (Table 5.1), purified by agarose gel electrophoresis, digested with BamHI and XhoI at 37 °C for 4 h, and ligated with the vector at 16 °C overnight using T4 DNA ligase. The resulting plasmid (Supplementary Figure A.7) was used to transform BL21 (DE3) Gold cells, which were rescued in 1 mL of SOC medium for 1 h at 37 °C before plating on kanamycin-containing LB agar plates.

5.3.4 Construction of pQE-MID1sc10

pQE-MBP-GFP [229] was digested using EcoRI and HindIII and the vector was purified by agarose gel electrophoresis. MID1sc10 was amplified from pMG209-MID1sc10 by PCR using primers pQM fwd and rev (Table 5.1), which introduced a new start codon, and purified by agarose gel electrophoresis. 80 ng of vector and 50 ng of insert were ligated by Gibson assembly using 30 mU T5 exonuclease, 150 mU Phusion polymerase, and 25 U Taq DNA ligase in 250 mM Tris-HCl, 10 mM MgCl2, 2 µM dNTP, 10 mM DTT, 125 µg/mL PEG-8000, 1 mM NAD, pH 7.5 at 50 °C for 60 min (Supplementary Figure A.9). The mixture was used to transform chemically competent BL21 (DE3) Gold and XL-1 Blue cells, which were rescued in 1 mL of SOC medium for 1 h at 37 °C and then plated on ampicillin-containing LB agar plates.
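The DNA masses used in the Gibson assembly can be converted into molar amounts to judge the insert-to-vector ratio. The sketch below is illustrative only: it assumes the common approximation of 650 g/mol per base pair of double-stranded DNA, and the fragment lengths are hypothetical placeholders, since the exact lengths of the digested vector and the amplicon are not stated here.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; widely used approximation (assumption)


def dsdna_pmol(mass_ng, length_bp):
    """Amount of a double-stranded DNA fragment in pmol."""
    # ng -> g is 1e-9, mol -> pmol is 1e12, hence the net factor of 1000
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


# Masses from the protocol; lengths are HYPOTHETICAL placeholders
vector_pmol = dsdna_pmol(mass_ng=80.0, length_bp=4700)
insert_pmol = dsdna_pmol(mass_ng=50.0, length_bp=300)

ratio = insert_pmol / vector_pmol
print(f"vector: {vector_pmol:.3f} pmol, insert: {insert_pmol:.3f} pmol")
print(f"insert:vector molar ratio = {ratio:.1f}")
```

Because the ratio depends only on mass and length, the assumed mass per base pair cancels out of it; it matters only for the absolute pmol values.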

5.3.5 Construction of pKTNTET-pelB-MID1sc10

The pelB-MID1sc10 gene was amplified from pMG209-MID1sc10 by PCR using pMG2pK fw and rv primers (Table 5.1), which introduced NdeI and SpeI restriction sites. The amplicons were purified by agarose gel electrophoresis and, alongside pKTNTET-CHAO-Kan, digested using NdeI and SpeI in an overnight reaction at 37 °C. The insert was purified using the Clean and Concentrator 5 Kit (Zymo Research) and the pKTNTET-Kan vector by agarose gel electrophoresis. The fragments were ligated using T4 ligase at 16 °C overnight (Supplementary Figure A.6). The ligation mixture was used to transform chemically competent BL21 (DE3) Gold and XL-1 Blue cells, which were rescued in 1 mL of SOC medium for 1 h at 37 °C before plating on kanamycin-containing LB agar plates.

5.3.6 Construction of pKTNTET-MID1sc10

The MID1sc10 gene was amplified from pQE-MBP-MID1sc10 by PCR using pQE2pK fw and rev primers (Table 5.1), which introduced NdeI and SpeI restriction sites. The amplicons were purified by agarose gel electrophoresis and, alongside pKTNTET-CHAO-Kan, digested using NdeI and SpeI in an overnight reaction at 37 °C. The insert was purified using the Clean and Concentrator 5 Kit (Zymo Research) and the pKTNTET-Kan vector by agarose gel electrophoresis. The full-length plasmid was formed by ligating the fragments using T4 ligase at 16 °C overnight (Supplementary Figure A.6). The ligation mixture was used to transform chemically competent BL21 (DE3) Gold cells, which were rescued in 1 mL of SOC medium for 1 h at 37 °C and then plated on kanamycin-containing LB agar plates.

5.3.7 Construction of the active site library

Three pairs of primers were used to introduce four NNK codons into the MID1sc10 gene (Table 5.3). The gene was split into four fragments by PCR using the mutagenic primer pairs and the mid1 fw and rv primers. The fragments were purified by agarose gel electrophoresis and the full-length gene library was assembled by overlap extension PCR. The assembly was purified by agarose gel electrophoresis, digested using BamHI and XhoI overnight at 37 °C, and further purified using the Clean and Concentrator 5 kit. Fresh pMG209 vector was prepared by digestion with BamHI and XhoI at 37 °C overnight, purification by agarose gel electrophoresis, and desalting with the Clean and Concentrator 5 kit. The ligation was carried out with 1.5 µg of vector and 3 molar equivalents of insert, using T4 ligase at 16 °C overnight. The ligation mixture was purified using the Clean and Concentrator 5 kit and the eluate was used to transform four aliquots of XL1-Blue cells. Each aliquot of transformed cells was recovered in 50 mL of SOC at 37 °C with shaking at 230 rpm for one hour before kanamycin was added to a concentration of 25 µg/mL. After overnight incubation, the supercoiled plasmid was extracted using the Plasmid Miniprep Kit (Zymo Research) and further purified using the DNA Clean and Concentrator 5 kit. The pure supercoiled plasmid was used to transform electrocompetent BL21 (DE3) Gold cells.

Table 5.4: Overview of the different plasmids used in Chapter 3 and their properties.

Plasmid                  Secretion  Antibiotic resistance  Length (kbp)  Purpose
pKTNTET-MID1sc10         No         Kan                    2.9           Screening
pKTNTET-pelB-MID1sc10    Yes        Kan                    3             Screening
pQE-MID1sc10             No         Amp                    5             Screening
pMG209-pelB-MID1sc10     Yes        Kan                    6             Screening
pQE-6His-MBP-MID1sc10    No         Amp                    6.2           Expression
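The theoretical diversity of a library carrying four NNK codons can be enumerated directly from the standard genetic code. The following sketch is not part of the original workflow. It counts the DNA and protein variants, the fraction of stop-free genes, and a rough number of transformants needed for 95% expected coverage, under the simplifying assumption that all codon combinations are equally likely.

```python
import math
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; first position varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = A/C/G/T at positions 1 and 2, K = G/T at position 3
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]
encoded = {CODON_TABLE[c] for c in nnk_codons}
n_stop = sum(CODON_TABLE[c] == "*" for c in nnk_codons)

n_sites = 4  # number of randomized positions in the active site library
dna_variants = len(nnk_codons) ** n_sites
protein_variants = len(encoded - {"*"}) ** n_sites
stop_free_fraction = ((len(nnk_codons) - n_stop) / len(nnk_codons)) ** n_sites

# Transformants for 95% expected coverage, assuming equiprobable variants
transformants_95 = math.ceil(-dna_variants * math.log(1 - 0.95))

print(f"NNK codons: {len(nnk_codons)} ({len(encoded) - 1} amino acids, {n_stop} stop)")
print(f"DNA variants: {dna_variants}, protein variants: {protein_variants}")
print(f"stop-free fraction: {stop_free_fraction:.3f}")
print(f"transformants for 95% coverage: {transformants_95}")
```

Real libraries deviate from the equiprobable assumption because of primer synthesis and PCR bias, so the coverage estimate should be read as a lower bound on the sampling effort.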

5.3.8 Microfluidic assay

To assess expression systems, overnight cultures of the different vector-strain combinations were used to inoculate 50 mL LB expression cultures containing the corresponding antibiotic. Library expression was preceded by transforming 500 µg of supercoiled plasmid into BL21 (DE3) Gold cells and recovering them in 50 mL of SOC at 37 °C and 230 rpm. After one hour, the corresponding antibiotic was added. When the cultures reached an OD600 of 0.3, expression was induced by adding IPTG to a concentration of 0.5 mM and the temperature was lowered to 18 °C. After overnight expression, 2 mL of the expression culture were harvested by centrifugation at 3500 x g and 4 °C, and the resulting pellets were washed three times with 1 mL of ice-cold isotonic HEPES buffer (5 mM HEPES, 5 mM KCl, 150 mM NaCl, 0.4% glucose, pH 7.4). Cells were resuspended, filtered through a 5 µm syringe filter (Millipore), and diluted with isotonic HEPES to an OD600 of 0.04 for off-chip storage or 0.06 for on-chip storage.

The expression systems were assessed for activity using both off-chip storage and 45 min on-chip delay lines. The library was first sorted using off-chip incubation, followed by another sort using a 5 min on-chip delay line.

Off-chip storage: The emulsion was generated with a flow-focusing nozzle [137] at a flow rate of 300 µL/h for the oil phase (Novec HFE-7500 fluorinated oil (3M) containing 2% (w/w) 008-FluoroSurfactant from RAN Biotechnologies) and 100 µL/h for the two aqueous phases, generating 15 pL droplets at a frequency of 3 kHz. The cell suspension was co-encapsulated with a solution containing 15 µM fluorescein, 400 µM zinc sulfate, and 200 µM substrate in isotonic HEPES buffer. The emulsion, which contained an average λ of 0.3 cells per droplet, was collected in a 1 mL syringe purged with HFE-7500. After an incubation of 0.5 – 2 h at room temperature, droplets were reinjected into the sorting device at a flow rate of 40 µL/h and spaced by injecting Novec HFE-7100 oil (3M) at a flow rate of 650 – 750 µL/h. The emulsion was sorted with an electric pulse frequency of 15 kHz, at a voltage of 620 kV, with pulse lengths ranging from 0.5 – 0.8 ms. The droplets were sorted at a frequency of ca. 1000 Hz.

On-chip incubation: Droplets were generated at a T-junction [136] with the oil phase (Novec HFE-7500 fluorinated oil (3M) containing 2% (w/w) 008-FluoroSurfactant from RAN Biotechnologies) flowing at 30 µL/h and each of the aqueous phases at 20 µL/h. A mixture containing 15 µM fluorescein and 400 µM zinc sulfate in isotonic HEPES buffer was injected alongside a solution containing 200 µM substrate in isotonic HEPES buffer and another containing the washed and diluted cells. The emulsion, containing an average λ of 0.3 cells per droplet, was spaced with Novec HFE-7100 oil (3M) at a flow rate of 650 – 750 µL/h. Sorting parameters were identical to the ones described above.
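The droplet generation frequency and the cell occupancy quoted above follow from simple relations, sketched here in Python. Two assumptions are made that the text does not state explicitly: that each of the two aqueous phases in the off-chip protocol flows at 100 µL/h, and that cell encapsulation follows a Poisson distribution with mean λ, as is commonly assumed for dilute cell suspensions.

```python
import math


def droplet_rate_hz(aqueous_flow_ul_per_h, droplet_volume_pl):
    """Droplet generation frequency from total aqueous flow and droplet volume."""
    # 1 µL = 1e6 pL and 1 h = 3600 s
    return aqueous_flow_ul_per_h * 1e6 / 3600.0 / droplet_volume_pl


def poisson_pmf(k, lam):
    """Probability that a droplet contains exactly k cells at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Off-chip protocol: two aqueous streams, assumed 100 µL/h each, 15 pL droplets
rate = droplet_rate_hz(aqueous_flow_ul_per_h=2 * 100.0, droplet_volume_pl=15.0)
print(f"droplet generation: {rate / 1000:.1f} kHz")

lam = 0.3  # average number of cells per droplet
p_empty = poisson_pmf(0, lam)
p_single = poisson_pmf(1, lam)
p_multiple = 1.0 - p_empty - p_single
print(f"empty: {p_empty:.1%}, single cell: {p_single:.1%}, two or more: {p_multiple:.1%}")
print(f"single-cell share of occupied droplets: {p_single / (1.0 - p_empty):.1%}")
```

Under these assumptions the calculated frequency is of the same order as the reported 3 kHz, and most occupied droplets contain a single cell, which is the reason for working at low λ.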

Figure 5.3: Schematics of microfluidic chip with integrated droplet generation, 5 min incubation line, and sorting junction.

Generally, droplet size was determined from the fluorescence of fluorescein, excited at 488 nm and measured at 520 nm. Esterase activity is proportional to the fluorescence emission at 448 nm (excitation = 375 nm), resulting from the coumarin cleavage product. Although the total number of active droplets is unknown at the outset of a sorting experiment, the distribution of measured activities usually converges within seconds and changes only slowly over time.

To test whether cells could be regrown after sorting, pMG209-pelB-MID1sc10 was evaluated using isotonic HEPES and isotonic HEPES with 0.4% glucose. Droplets displaying activity were sorted and the outlet tubing was placed in a vial containing LB with or without antibiotics. The collected emulsion was broken by adding 1H,1H,2H,2H-perfluorooctanol and gentle mixing, followed by centrifugation at 1000 x g in a tabletop centrifuge. The aqueous phase was decanted and plated on an LB agar plate containing kanamycin. After overnight growth at 30 °C, the number of colonies was compared to the number of sorted droplets to determine the survival rate of E. coli cells.

For the library sorting, gates were set to collect the 0.3% and 0.03% of droplets displaying the highest fluorescence in the first and second sort, respectively. For off-chip incubations, the threshold was periodically adjusted to counteract non-uniform incubation times. The sorted emulsion was broken and the cells were lysed by adding 75 µL 1H,1H,2H,2H-perfluorooctanol and 15 µL of 10 mM Tris-HCl (pH 8), 10 mM EDTA, 100 mM NaCl, 1% Triton X-100, 1 mg/mL proteinase K. Following one hour of incubation at room temperature, DNA was recovered using the DNA Clean and Concentrator 5 kit and amplified by PCR using Phusion DNA polymerase (30 s 98 °C, 30 x (10 s 94 °C, 30 s 55 °C, 8 s 72 °C), 5 min 72 °C, final hold 4 °C) with the mid1 fwd and mid1 rv primers (Table 5.1). The PCR product was purified by agarose gel electrophoresis, subcloned into fresh pMG209 vector, and used to transform electrocompetent BL21 (DE3) Gold cells. Single clones from the sort with off-chip incubation and from the subsequent sort with 5 min on-chip incubation were analyzed using a microtiter plate-based assay.
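How a sorting gate for the top 0.3% or 0.03% of droplets translates into a fluorescence threshold can be illustrated as follows. This is a sketch on simulated signals, not the acquisition software used for sorting: the function name and the log-normal signal distribution are assumptions made purely for illustration.

```python
import random


def gate_threshold(signals, top_fraction):
    """Return the lowest signal that still falls within the top `top_fraction` of droplets."""
    ranked = sorted(signals, reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


# Simulated droplet fluorescence (arbitrary units); NOT measured data
rng = random.Random(0)
signals = [rng.lognormvariate(0.0, 0.5) for _ in range(100_000)]

for label, fraction in (("first sort", 0.003), ("second sort", 0.0003)):
    threshold = gate_threshold(signals, fraction)
    n_sorted = sum(s >= threshold for s in signals)
    print(f"{label}: threshold {threshold:.2f}, {n_sorted} of {len(signals)} droplets collected")
```

Recomputing the threshold on a sliding window of recent droplets would be one way to mimic the periodic adjustment described for off-chip incubations.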

5.3.9 Microtiter plate assay

After each round of microfluidic sorting, the subcloned library was transformed by electroporation into BL21 (DE3) Gold cells and plated on LB agar containing 25 µg/mL kanamycin. Single colonies from the transformations, together with three wild-type control colonies, were used to inoculate pre-cultures on a microtiter plate containing 150 µL LB medium per well, supplemented with 25 µg/mL kanamycin, and the plates were sealed with a gas-permeable membrane (Breathe Easy, Diversified Biotech). After overnight incubation at 30 °C, 700 rpm, and 100% humidity, 30 µL were used to inoculate a deep-well plate containing 1.8 mL LB medium with 25 µg/mL kanamycin per well, and the plates were sealed with a gas-permeable membrane (Breathe Easier, Diversified Biotech). Cultures were grown at 37 °C and, after reaching an OD600 of 0.3, gene expression was induced by adding IPTG to a final concentration of 0.5 mM. Subsequently, the plates were incubated at 18 °C and 240 rpm overnight. Cells were harvested by centrifugation at 4000 rpm for 20 min at 4 °C, the supernatant was discarded, and the pellets were stored at -20 °C. Lysis was carried out by four freeze-thaw cycles, resuspension of the pellets in 40 mM HEPES, 50 mM NaCl, 0.2 mg/mL lysozyme, 1 µg/mL DNaseI, pH 8.0, and 3 h incubation at room temperature. The lysates were cleared by centrifugation at 4000 rpm for 20 min at 4 °C. In a flat-bottom Fluoronunc microtiter plate (Thermo Scientific), 2 µL of a solution containing 2 mM substrate in water was distributed to each well. The reaction was initiated by adding 198 µL of cleared lysate. Activity values were determined by measuring the change in fluorescence (ex. at 380 nm, em. at 470 nm) in a microtiter plate reader (Varioskan, Thermo Scientific). The slopes were normalized to the average of three internal MID1sc10 controls using a custom script in Python (version 2.7). Overnight cultures were inoculated from the pre-cultures of the best variants, their plasmids were isolated, and the DNA sequences were determined by Sanger sequencing (Microsynth).
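The normalization step of the plate assay can be sketched as follows. This is not the original script (which was written for Python 2.7 and is not reproduced here) but a minimal Python 3 illustration: the well names, time points, and fluorescence traces are hypothetical, and a linear least-squares fit is assumed for the slope.

```python
def fit_slope(times, values):
    """Least-squares slope of fluorescence versus time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def normalize_to_controls(slopes, control_wells):
    """Express each slope relative to the mean slope of the control wells."""
    control_mean = sum(slopes[w] for w in control_wells) / len(control_wells)
    return {well: s / control_mean for well, s in slopes.items()}


# Hypothetical kinetic traces (time in s, fluorescence in arbitrary units)
times_s = [0, 60, 120, 180]
traces = {
    "A1": [100, 160, 220, 280],  # MID1sc10 control
    "A2": [100, 154, 208, 262],  # MID1sc10 control
    "A3": [100, 166, 232, 298],  # MID1sc10 control
    "B1": [100, 250, 400, 550],  # library variant
}

slopes = {well: fit_slope(times_s, trace) for well, trace in traces.items()}
relative = normalize_to_controls(slopes, control_wells=["A1", "A2", "A3"])
for well in sorted(relative):
    print(f"{well}: {relative[well]:.2f} x control")
```

Normalizing within each plate compensates for plate-to-plate differences in expression and lysis efficiency, so that variants from different plates can be ranked on a common scale.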

5.3.10 pQE-MBP subcloning

For protein expression, variants MID1sc10.1-3 were subcloned into pQE-MBP. PCR amplification of the pMG208-MID1sc10.1-3 genes and purification by agarose gel electrophoresis yielded the individual genes. The amplicons and pQE-MBP were digested using BamHI and XhoI overnight at 37 °C. The vector was further purified by agarose gel electrophoresis and the digested amplicons using the Clean and Concentrator 5 Kit (Zymo Research). The resulting fragments were ligated using T4 DNA ligase at 16 °C overnight and, after purification with the Clean and Concentrator 5 kit, used to transform BL21 (DE3) Gold cells.

5.3.11 Protein purification

The three variants, MID1sc10.1-3, were expressed as fusion proteins with maltose-binding protein (MBP), linked by a TEV protease cleavage site. To that end, overnight cultures (5 mL LB medium, 100 µg/mL ampicillin) of single BL21 (DE3) Gold colonies transformed with pQE-6His-MBP-MID1sc10.1-3 were used to inoculate three 500 mL LB cultures containing 100 µg/mL ampicillin, which were then incubated at 37 °C and 230 rpm. The temperature was switched to 18 °C after the cells reached an OD600 of 0.4. Upon reaching an OD600 of 0.6 - 0.8, the cultures were induced with 0.5 mM IPTG and further incubated overnight. The cultures were harvested by centrifugation at 4500 x g and 4 °C, the supernatant was discarded, and the cell pellets were frozen at -20 °C. After thawing, the pellets were resuspended in 20 mM Tris, 100 mM NaCl, 10% glycerol, pH 8.0 buffer supplemented with 11 µL of protease inhibitor (Roche, 1000x) and 1 mg/mL of lysozyme. The cell suspension was incubated for 30 min on ice before sonicating four times for 1 min, with 1 min breaks in between, at instrument settings of 0.5 cycles and 60% amplitude. After adding 1 mg of DNaseI, the lysate was incubated for 10 min on ice. After clarification by centrifugation, the imidazole concentration of the supernatant was raised to 25 mM, and the supernatant was added to a Nickel(II)-NTA bead (Qiagen) column equilibrated with 20 mM Tris-HCl, 100 mM NaCl, 25 mM imidazole and incubated for 5 min at room temperature. The column was then washed with 90 mL of equilibration buffer and the target proteins were eluted with 40 mL of elution buffer (20 mM Tris-HCl, 100 mM NaCl, 250 mM imidazole, pH 8.0). For each protein, two fractions were collected and concentrated using a 10k Amicon Ultra-15 Centrifugal Filter Unit (Merck Millipore).
The eluted MBP-MID1sc10.1-3 fusion proteins were cleaved by adding TEV protease to a concentration of 0.05 mg/mL and dialyzing against 2 L of 40 mM HEPES, 100 mM NaCl, 2 mM EDTA, 1 mM DTT, pH 8.0 at 4 °C overnight. Following TEV cleavage, the protein mixture was passed over a second Ni-NTA column and 2 mL of Amylose Resin to remove the 6His-tagged MBP and TEV protease. The MID1sc10 variants were further purified by size-exclusion chromatography on a Superdex-75 column (GE Healthcare, HiLoad 26/60 prep grade). The sample was eluted using 40 mM HEPES, 150 mM NaCl, pH 8.0 at 4 °C. Protein purity was assessed by SDS-PAGE and appropriate fractions were pooled and concentrated using a 3 kDa Amicon Ultra-15 Centrifugal Filter Unit (Merck Millipore).

5.3.12 Enzyme characterization

The three variants of MID1sc10 were characterized with regard to kcat under saturating conditions. All reactions were carried out at 25 °C and monitored by measuring the change in absorbance of the coumarin product of (S)-1 cleavage at 377 nm (ε = 10’520 M−1 cm−1) [229]. Reactions were initiated by adding concentrated, purified enzyme to a mixture of 200 µM zinc sulfate and 50 µM substrate in 40 mM HEPES, 50 mM NaCl, pH 8.0. Initial rates were corrected for spontaneous hydrolysis in the same buffer containing 200 µM zinc sulfate.

Bibliography

[1] Wolfenden, R. & Snider, M. J. The Depth of Chemical Time and the Power of Enzymes as Catalysts. Acc. Chem. Res. 34, 938–945 (2001).

[2] Gilbert, W. The RNA world. Nature 319, 618 (1986).

[3] Russell, M. J. & Martin, W. The rocky roots of the acetyl-CoA pathway. Trends Biochem. Sci. 29, 358–363 (2004).

[4] Eyring, H. The Activated Complex in Chemical Reaction. J. Chem. Phys. 3, 107–115 (1935).

[5] Pauling, L. Molecular architecture and biological reactions. Chem. Eng. News 24, 1375–1377 (1946).

[6] Alberty, R. A. & Hammes, G. G. Application of the theory of diffusion-controlled reactions to enzyme kinetics. J. Phys. Chem. 62, 154–159 (1958).

[7] Houk, K. N., Leach, A. G., Kim, S. P. & Zhang, X. Binding Affinities of Host-Guest, Protein-Ligand, and Protein-Transition-State Complexes. Angew. Chem. Int. Ed. 42, 4872–4897 (2003).

[8] Hedstrom, L. Serine protease mechanism and specificity. Chem. Rev. 102, 4501–4523 (2002).

[9] Blow, D. M., Birktoft, J. J. & Hartley, B. S. Role of a buried acid group in the mechanism of action of chymotrypsin. Nature 221, 337–340 (1969).

[10] Carter, P. & Wells, J. A. Dissecting the catalytic triad of a serine protease. Nature 332, 564–568 (1988).

[11] Rao, S. N., Singh, U. C., Bash, P. A. & Kollman, P. A. Free energy perturbation calculations on binding and catalysis after mutating Asn 155 in subtilisin. Nature 328, 551–554 (1987).

[12] Robertus, J. D., Kraut, J., Alden, R. A. & Birktoft, J. J. Subtilisin; a Stereochemical Mechanism Involving Transition-State Stabilization. Biochemistry 11, 4293–4303 (1972).

[13] Quinn, D. M. Acetylcholinesterase: Enzyme Structure, Reaction Dynamics, and Virtual Transition States. Chem. Rev. 87, 955–979 (1987).

[14] Christianson, D. W. & Lipscomb, W. N. Carboxypeptidase A. Acc. Chem. Res. 22, 62–69 (1989).

[15] Meunier, B., de Visser, S. P. & Shaik, S. Mechanism of oxidation reactions catalyzed by cytochrome P450 enzymes. Chem. Rev. 104, 3947–3980 (2004).

[16] Mortenson, L. E., Valentine, R. C. & Carnahan, J. E. An electron transport factor from Clostridium Pasteurianum. Biochem. Biophys. Res. Commun. 7, 1962 (1962).

[17] Holmes, M. A. & Matthews, B. W. Binding of Hydroxamic Acid Inhibitors to Crystalline Thermolysin Suggests a Pentacoordinate Zinc Intermediate in Catalysis. Biochemistry 20, 6912–6920 (1981).

[18] Monzingo, A. F. & Matthews, B. W. Binding of N-Carboxymethyl Dipeptide Inhibitors to Thermolysin Determined by X-ray Crystallography: A Novel Class of Transition-State Analogues for Zinc Peptidases. Biochemistry 23, 5724–5729 (1984).

[19] Kester, W. R. & Matthews, B. W. Crystallographic Study of the Binding of Dipeptide Inhibitors to Thermolysin: Implications for the Mechanism of Catalysis. Biochemistry 16, 2506–2516 (1977).

[20] Morihara, K. & Tsuzuki, H. Thermolysin: Kinetic Study with Oligopeptides. Eur. J. Biochem. 15, 374–380 (1970).

[21] Walsh, C. T. & Wencewicz, T. A. Flavoenzymes: Versatile catalysts in biosynthetic pathways. Nat. Prod. Rep. 30, 175–200 (2013).

[22] Turner, N. J. Enantioselective Oxidation of C-O and C-N Bonds Using Oxidases. Chem. Rev. 111, 4073–4087 (2011).

[23] Ewing, T. A., Dijkman, W. P., Vervoort, J. M., Fraaije, M. W. & van Berkel, W. J. H. The Oxidation of Thiols by Flavoprotein Oxidases: a Biocatalytic Route to Reactive Thiocarbonyls. Angew. Chem. Int. Ed. 53, 13206–13209 (2014).

[24] Pollegioni, L., Sacchi, S. & Murtas, G. Human D-amino acid oxidase: Structure, function, and regulation. Front. Mol. Biosci. 5, 1–14 (2018).

[25] Mattevi, A. et al. Crystal structure of D-amino acid oxidase: A case of active site mirror-image convergent evolution with flavocytochrome b2. Proc. Natl. Acad. Sci. U. S. A. 93, 7496–7501 (1996).

[26] Fitzpatrick, P. F. Oxidation of Amines by Flavoproteins. Arch. Biochem. Biophys. 493, 13–25 (2011).

[27] Liu, Q. et al. pH-Dependent Enantioselectivity of D-amino Acid Oxidase in Aqueous Solution. Sci. Rep. 7, 1–9 (2017).

[28] Molla, G. et al. Characterization of human d-amino acid oxidase. FEBS Lett. 580, 2358–2364 (2006).

[29] Sacchi, S., Cappelletti, P. & Murtas, G. Biochemical properties of human D-amino acid oxidase variants and their potential significance in pathologies. Front. Mol. Biosci. 5, 1–21 (2018).

[30] Megerle, U. et al. Unraveling the flavin-catalyzed photooxidation of benzylic alcohol with transient absorption spectroscopy from sub-pico- to microseconds. Phys. Chem. Chem. Phys. 13, 8869–8880 (2011).

[31] Kutta, R. J., Archipowa, N., Johannissen, L. O., Jones, A. R. & Scrutton, N. S. Vertebrate Cryptochromes are Vestigial Flavoproteins. Sci. Rep. 7, 1–11 (2017).

[32] Roth, J. P. & Klinman, J. P. Catalysis of electron transfer during activation of O2 by the flavoprotein glucose oxidase. Proc. Natl. Acad. Sci. U. S. A. 100, 62–67 (2003).

[33] Sacchi, S. et al. Engineering the substrate specificity of D-amino-acid oxidase. J. Biol. Chem. 277, 27510–27516 (2002).

[34] Fraaije, M. W., Van Den Heuvel, R. H., Van Berkel, W. J. & Mattevi, A. Covalent flavinylation is essential for efficient redox catalysis in vanillyl-alcohol oxidase. J. Biol. Chem. 274, 35514–35520 (1999).

[35] Mason, H. S. Mechanisms of Oxygen Metabolism. Science 125, 1185–1188 (1957).

[36] Guengerich, F. P. Mechanisms of Cytochrome P450-Catalyzed Oxidations (2018).

[37] Rittle, J. & Green, M. T. Cytochrome P450 compound I: Capture, characterization, and C-H bond activation kinetics. Science 330, 933–937 (2010).

[38] Imai, T., Yamazaki, T. & Kominami, S. Kinetic studies on bovine cytochrome P450(11β) catalyzing successive reactions from deoxycorticosterone to aldosterone. Biochemistry 37, 8097–8104 (1998).

[39] Poulos, T. L., Finzel, B. C. & Gunsalus, I. C. The 2.6-Å crystal structure of Pseudomonas putida cytochrome P-450. J. Biol. Chem. 260, 16122–16130 (1985).

[40] Atkins, W. M. & Sligar, S. G. The roles of active site hydrogen bonding in cytochrome P-450(cam) as revealed by site-directed mutagenesis. J. Biol. Chem. 263, 18842–18849 (1988).

[41] Atkins, W. M. & Sligar, S. G. Molecular Recognition in Cytochrome P-450: Alteration of Regioselective Hydroxylation via Protein Engineering. J. Am. Chem. Soc. 111, 2715–2717 (1989).

[42] Schlichting, I. et al. The catalytic pathway of cytochrome P450cam at atomic resolution. Science 287, 1615–1622 (2000).

[43] Skinner, S. P. et al. Delicate conformational balance of the redox enzyme cytochrome P450cam. Proc. Natl. Acad. Sci. U. S. A. 112, 9022–9027 (2015).

[44] Basom, E. J., Spearman, J. W. & Thielges, M. C. Conformational landscape and the selectivity of cytochrome P450cam. J. Phys. Chem. B 119, 6620–6627 (2015).

[45] Bugg, T. D. H. Introduction to Enzyme and Coenzyme Chemistry (John Wiley & Sons, Ltd, 2012), 3rd edn.

[46] Koeller, K. M. & Wong, C. H. Enzymes for chemical synthesis. Nature 409, 232–240 (2001).

[47] Sheldon, R. A. & Woodley, J. M. Role of Biocatalysis in Sustainable Chemistry. Chem. Rev. 118, 801–838 (2018).

[48] Buchholz, K. A breakthrough in enzyme technology to fight penicillin resistance—industrial application of penicillin amidase. Appl. Microbiol. Biotechnol. 100, 3825–3839 (2016).

[49] Peterson, D. H. et al. Microbiological Transformations of Steroids. I. Introduction of Oxygen at Carbon-11 of Progesterone. J. Am. Chem. Soc. 74, 5933–5936 (1952).

[50] Morihara, K., Oka, T. & Tsuzuki, H. Semi-synthesis of human insulin by trypsin-catalysed replacement. Nature 280, 412–413 (1979).

[51] Bornscheuer, U. T. et al. Engineering the third wave of biocatalysis. Nature 485, 185–194 (2012).

[52] Mills, D. R., Peterson, R. L. & Spiegelman, S. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc. Natl. Acad. Sci. U. S. A. 58, 217–224 (1967).

[53] Chen, K. & Arnold, F. H. Tuning the activity of an enzyme for unusual environments: Sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl. Acad. Sci. U. S. A. 90, 5618–5622 (1993).

[54] Stemmer, W. P. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994).

[55] Jäckel, C., Kast, P. & Hilvert, D. Protein design by directed evolution. Annu. Rev. Biophys. 37, 153–73 (2008).

[56] Packer, M. S. & Liu, D. R. Methods for the directed evolution of proteins. Nat. Rev. Genet. 16, 379–394 (2015).

[57] Arnold, F. H. Directed Evolution: Bringing New Chemistry to Life. Angew. Chem. Int. Ed. 57, 4143–4148 (2018).

[58] Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 31, 125–131 (1998).

[59] Chen, K. & Arnold, F. H. Enzyme engineering for nonaqueous solvents: Random mutagenesis to enhance activity of subtilisin E in Polar Organic Media. Bio/Technology 9, 1073–1077 (1991).

[60] Reetz, M. T. High-throughput Screening Systems for Assaying the Enantioselectivity of Enzymes. Enzym. Assays High-throughput Screening, Genet. Sel. Fingerprinting, chap. 2, 41–76 (2006).

[61] Taylor, S. V., Kast, P. & Hilvert, D. Investigating and engineering enzymes by genetic selection. Angew. Chem. Int. Ed. 40, 3310–3335 (2001).

[62] Giver, L., Gershenson, A., Freskgard, P.-O. O. & Arnold, F. H. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. U. S. A. 95, 12809–12813 (1998).

[63] Reetz, M. T. et al. Creation of Enantioselective Biocatalysts for Organic Chemistry by In Vitro Evolution. Angew. Chem. Int. Ed. 36, 2830–2832 (1997).

[64] Bartsch, S., Kourist, R. & Bornscheuer, U. T. Complete inversion of enantioselectivity towards acetylated tertiary alcohols by a double mutant of a Bacillus subtilis esterase. Angew. Chem. Int. Ed. 47, 1508–11 (2008).

[65] Liang, J. et al. Development of a Biocatalytic Process as an Alternative to the (-)-DIP-Cl-Mediated Asymmetric Reduction of a Key Intermediate of Montelukast. Org. Process Res. Dev. 14, 193–198 (2010).

[66] Kan, S. B., Lewis, R. D., Chen, K. & Arnold, F. H. Directed evolution of cytochrome c for carbon-silicon bond formation: Bringing silicon to life. Science 354, 1048–1051 (2016).

[67] Bachmann, B. O. Biosynthesis: Is it time to go retro? Nat. Chem. Biol. 6, 390–393 (2010).

[68] Turner, N. J. & O’Reilly, E. Biocatalytic retrosynthesis. Nat. Chem. Biol. 9, 285–288 (2013).

[69] de Souza, R. O., Miranda, L. S. & Bornscheuer, U. T. A Retrosynthesis Approach for Biocatalysis in Organic Synthesis. Chem. Eur. J. 23, 12040–12063 (2017).

[70] Wu, S., Snajdrova, R., Moore, J. C., Baldenius, K. & Bornscheuer, U. T. Biocatalysis: Enzymatic Synthesis for Industrial Applications. Angew. Chem. Int. Ed. 2–34 (2020).

[71] Campodonico, M. A., Andrews, B. A., Asenjo, J. A., Palsson, B. O. & Feist, A. M. Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path. Metab. Eng. 25, 140–158 (2014).

[72] Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).

[73] Carbonell, P., Koch, M., Duigou, T. & Faulon, J. L. Enzyme Discovery: Enzyme Selection and Pathway Design. Methods Enzymol. 608, 3–27 (2018).

[74] Shinkai, I., King, A. O. & Larsen, R. D. A practical asymmetric synthesis of LTD4 antagonist. Pure Appl. Chem. 66, 1551–1556 (1994).

[75] Simhadri, S. et al. Process for preparation of montelukast sodium (2013).

[76] Kara, S. et al. Access to lactone building blocks via horse liver alcohol dehydrogenase-catalyzed oxidative lactonization. ACS Catal. 3, 2436–2439 (2013).

[77] Liang, J. et al. Highly enantioselective reduction of a small heterocyclic ketone: Biocatalytic reduction of tetrahydrothiophene-3-one to the corresponding (R)-Alcohol. Org. Process Res. Dev. 14, 188–192 (2010).

[78] Voss, C. V. et al. Orchestration of concurrent oxidation and reduction cycles for stereoinversion and deracemisation of sec-alcohols. J. Am. Chem. Soc. 130, 13969–13972 (2008).

[79] Xu, G. C., Yu, H. L., Zhang, Z. J. & Xu, J. H. Stereocomplementary bioreduction of β-ketonitrile without ethylated byproduct. Org. Lett. 15, 5408–5411 (2013).

[80] Emmanuel, M. A., Greenberg, N. R., Oblinsky, D. G. & Hyster, T. K. Accessing non-natural reactivity by irradiating nicotinamide-dependent enzymes with light. Nature 540, 414–417 (2016).

[81] Mutti, F. G., Knaus, T., Scrutton, N. S., Breuer, M. & Turner, N. J. Conversion of alcohols to enantiopure amines through dual-enzyme hydrogen-borrowing cascades. Science 224015, 1525–1529 (2015).

[82] Pavlidis, I. V. et al. Identification of (S)-selective transaminases for the asymmetric synthesis of bulky chiral amines. Nat. Chem. 8, 1076–1082 (2016).

[83] Ghislieri, D. et al. Engineering an enantioselective amine oxidase for the synthesis of pharmaceutical building blocks and alkaloid natural products. J. Am. Chem. Soc. 135, 10863–10869 (2013).

[84] France, S. P. et al. One-Pot Cascade Synthesis of Mono- and Disubstituted Piperidines and Pyrrolidines using Carboxylic Acid Reductase (CAR), ω-Transaminase (ω-TA), and Imine Reductase (IRED) Biocatalysts. ACS Catal. 6, 3753–3759 (2016).

[85] Li, H., Zhu, S. & Zheng, G. Promiscuous (+)-γ-lactamase activity of an amidase from nitrile hydratase pathway for efficient synthesis of carbocyclic nucleosides intermediate. Bioorganic Med. Chem. Lett. 28, 1071–1076 (2018).

[86] Aleku, G. A. et al. A reductive aminase from Aspergillus oryzae. Nat. Chem. 9, 961–969 (2017).

[87] Wetzl, D. et al. Asymmetric Reductive Amination of Ketones Catalyzed by Imine Reductases. ChemCatChem 8, 2023–2026 (2016).

[88] Dydio, P., Key, H. M., Hayashi, H., Clark, D. S. & Hartwig, J. F. Chemoselective, enzymatic C-H bond amination catalyzed by a cytochrome P450 containing an Ir(Me)-PIX cofactor. J. Am. Chem. Soc. 139, 1750–1753 (2017).

[89] Prier, C. K., Zhang, R. K., Buller, A. R., Brinkmann-Chen, S. & Arnold, F. H. Enantioselective, intermolecular benzylic C–H amination catalysed by an engineered iron-haem enzyme. Nat. Chem. 9, 629–634 (2017).

[90] Janey, J. M. Development of A Sitagliptin Transaminase. Sustain. Catal., 75–87 (2013).

[91] Hansen, K. B. et al. Highly efficient asymmetric synthesis of sitagliptin. J. Am. Chem. Soc. 131, 8798–8804 (2009).

[92] Savile, C. K. et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 329, 305–309 (2010).

[93] Höhne, M., Kühl, S., Robins, K. & Bornscheuer, U. T. Efficient asymmetric synthesis of chiral amines by combining transaminase and pyruvate decarboxylase. ChemBioChem 9, 363–365 (2008).

[94] Koszelewski, D., Clay, D., Rozzell, D. & Kroutil, W. Deracemisation of α-chiral primary amines by a one-pot, two-step cascade reaction catalysed by ω-transaminases. European J. Org. Chem. 2009, 2289–2292 (2009).

[95] Truppo, M. D., Turner, N. J. & Rozzell, J. D. Efficient kinetic resolution of racemic amines using a transaminase in combination with an amino acid oxidase. Chem. Commun. 2127–2129 (2009).

[96] France, S. P., Hepworth, L. J., Turner, N. J. & Flitsch, S. L. Constructing Biocatalytic Cascades: In Vitro and in Vivo Approaches to de Novo Multi-Enzyme Pathways. ACS Catal. 7, 710–724 (2017).

[97] Huffman, M. A. et al. Design of an In-Vitro Biocatalytic Cascade for the Manufacture of Islatravir. Science 366, 1255–1259 (2019).

[98] Both, P. et al. Whole-Cell Biocatalysts for Stereoselective C-H Amination Reactions. Angew. Chem. Int. Ed. 55, 1511–1513 (2016).

[99] Robin, A. et al. Chimeric self-sufficient P450cam-RhFRed biocatalysts with broad substrate scope. Beilstein J. Org. Chem. 7, 1494–1498 (2011).

[100] Chen, R. R. Permeability issues in whole-cell bioprocesses and cellular membrane engineering. Appl. Microbiol. Biotechnol. 74, 730–738 (2007).

[101] Ro, D.-K. et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440, 940–943 (2006).

[102] Galanie, S., Thodey, K., Trenchard, I. J., Interrante, M. F. & Smolke, C. D. Complete biosynthesis of opioids in yeast. Science 349, 1095–1100 (2015).

[103] Schwander, T., Schada von Borzyskowski, L., Burgener, S., Cortina, N. S. & Erb, T. J. A synthetic pathway for the fixation of carbon dioxide in vitro. Science 354, 900–904 (2016).

[104] Truppo, M. D. Biocatalysis in the pharmaceutical industry: The need for speed. ACS Med. Chem. Lett. 8, 476–480 (2017).

[105] Cadwell, R. C. & Joyce, G. F. Randomization of genes by PCR mutagenesis. Genome Res. 2, 28–33 (1992).

[106] Horton, R. M., Hunt, H. D., Ho, S. N., Pullen, J. K. & Pease, L. R. Engineering hybrid genes without the use of restriction enzymes: gene splicing by overlap extension. Gene 77, 61–68 (1989).

[107] Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: Technologies and applications. Nat. Methods 11, 499–507 (2014).

[108] Bosley, A. D. & Ostermeier, M. Mathematical expressions useful in the construction, description and evaluation of protein libraries. Biomol. Eng. 22, 57–61 (2005).

[109] Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).

[110] Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009).

[111] Hayden, E. J., Ferrada, E. & Wagner, A. Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme. Nature 474, 92–95 (2011).

[112] Pines, G. et al. Codon compression algorithms for saturation mutagenesis. ACS Synth. Biol. 4, 604–614 (2015).

[113] Ashraf, M. et al. ProxiMAX randomization: a new technology for non-degenerate saturation mutagenesis of contiguous codons. Biochem. Soc. Trans. 41, 1189–94 (2013).

[114] Pierce, N. A. & Winfree, E. Protein design is NP-hard. Protein Eng. 15, 779–782 (2003).

[115] Badenhorst, C. P. & Bornscheuer, U. T. Getting momentum: From biocatalysis to advanced synthetic biology. Trends Biochem. Sci. 43, 180–198 (2018).

[116] Agresti, J. J. et al. Ultrahigh-throughput screening in drop-based microfluidics for directed evolution. Proc. Natl. Acad. Sci. U. S. A. 107, 6560 (2010).

[117] Martis, E. A., Radhakrishnan, R. & Badve, R. R. High-throughput screening: The hits and leads of drug discovery-An overview. J. Appl. Pharm. Sci. 1, 2–10 (2011).

[118] Chen, B. et al. High-throughput analysis and protein engineering using microcapillary arrays. Nat. Chem. Biol. 12, 76–81 (2016).

[119] Yang, G. & Withers, S. G. Ultrahigh-throughput FACS-based screening for directed enzyme evolution. ChemBioChem 10, 2704–2715 (2009).

[120] Varadarajan, N., Cantor, J. R., Georgiou, G. & Iverson, B. L. Construction and flow cytometric screening of targeted enzyme libraries. Nat. Protoc. 4, 893–901 (2009).

[121] Varadarajan, N., Gam, J., Olsen, M. J., Georgiou, G. & Iverson, B. L. Engineering of protease variants exhibiting high catalytic activity and exquisite substrate selectivity. Proc. Natl. Acad. Sci. U. S. A. 102, 6855–6860 (2005).

[122] Chen, I., Dorr, B. M. & Liu, D. R. A general strategy for the evolution of bond-forming enzymes using yeast display. Proc. Natl. Acad. Sci. U. S. A. 108, 11399–11404 (2011).

[123] Becker, S. et al. Single-Cell High-Throughput Screening To Identify Enantioselective Hydrolytic Enzymes. Angew. Chem. Int. Ed. 47, 5085–5088 (2008).

[124] Niquille, D. L. et al. Nonribosomal biosynthesis of backbone-modified peptides. Nat. Chem. 10, 282–287 (2018).

[125] Griswold, K. E., Aiyappan, N. S., Iverson, B. L. & Georgiou, G. The Evolution of Catalytic Efficiency and Substrate Promiscuity in Human Theta Class 1-1 Glutathione Transferase. J. Mol. Biol. 364, 400–410 (2006).

[126] Santoro, S. W., Wang, L., Herberich, B., King, D. S. & Schultz, P. G. An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat. Biotechnol. 20, 1044–1048 (2002).

[127] Lipovšek, D. et al. Selection of Horseradish Peroxidase Variants with Enhanced Enantioselectivity by Yeast Surface Display. Chem. Biol. 14, 1176–1185 (2007).

[128] Sadler, J. C., Currin, A. & Kell, D. B. Ultra-high throughput functional enrichment of large monoamine oxidase (MAO-N) libraries by fluorescence activated cell sorting. Analyst 143, 4747–4755 (2018).

[129] Fischlechner, M. et al. Evolution of enzyme catalysts caged in biomimetic gel-shell beads. Nat. Chem. 6, 791–6 (2014).

[130] Baret, J.-C. et al. Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip 9, 1850–1858 (2009).

[131] Tawfik, D. S. & Griffiths, A. D. Man-made cell-like compartments for molecular evolution. Nat. Biotechnol. 16, 652–656 (1998).

[132] Hai, M., Bernath, K., Tawfik, D. & Magdassi, S. Flow Cytometry: A New Method To Investigate the Properties of Water-in-Oil-in-Water Emulsions. Langmuir 14, 2081–2085 (2004).

[133] Xia, Y. & Whitesides, G. M. Soft lithography. Annu. Rev. Mater. Sci. 28, 153–184 (1998).

[134] Siegel, A. C., Bruzewicz, D. A., Weibel, D. B. & Whitesides, G. M. Microsolidics: Fabrication of three-dimensional metallic microstructures in poly(dimethylsiloxane). Adv. Mater. 19, 727–733 (2007).

[135] Sciambi, A. & Abate, A. R. Generating electric fields in PDMS microfluidic devices with salt water electrodes. Lab Chip 14, 2605–2609 (2014).

[136] Abate, A. R. et al. Impact of inlet channel geometry on microfluidic drop formation. Phys. Rev. E 80, 1–5 (2009).

[137] Anna, S. L., Bontoux, N. & Stone, H. A. Formation of dispersions using "flow focusing" in microchannels. Appl. Phys. Lett. 82, 364–366 (2003).

[138] Huebner, A. et al. Quantitative detection of protein expression in single cells using droplet microfluidics. Chem. Commun. 2, 1218–20 (2007).

[139] DeMello, A. J. Control and detection of chemical reactions in microfluidic systems. Nature 442, 394–402 (2006).

[140] Link, D. R., Anna, S. L., Weitz, D. A. & Stone, H. A. Geometrically Mediated Breakup of Drops in Microfluidic Devices. Phys. Rev. Lett. 92, 4 (2004).

[141] Abate, A. R., Hung, T., Marya, P., Agresti, J. J. & Weitz, D. A. High-throughput injection with microfluidics using picoinjectors. Proc. Natl. Acad. Sci. U. S. A. 107, 19163–19166 (2010).

[142] Niu, X., Gielen, F., Edel, J. B. & DeMello, A. J. A microdroplet dilutor for high-throughput screening. Nat. Chem. 3, 437–442 (2011).

[143] Frenz, L., Blank, K., Brouzes, E. & Griffiths, A. D. Reliable microfluidic on-chip incubation of droplets in delay-lines. Lab Chip 9, 1344–1348 (2009).

[144] Beneyton, T. et al. High-throughput screening of filamentous fungi using nanoliter-range droplet-based microfluidics. Sci. Rep. 6, 27223 (2016).

[145] Beneyton, T. et al. Droplet-based microfluidic high-throughput screening of heterologous enzymes secreted by the yeast Yarrowia lipolytica. Microb. Cell Fact. 16, 1–14 (2017).

[146] Kintses, B. et al. Picoliter cell lysate assays in microfluidic droplet compartments for directed enzyme evolution. Chem. Biol. 19, 1001–1009 (2012).

[147] Bunzel, H. A., Garrabou, X., Pott, M. & Hilvert, D. Speeding up enzyme discovery and engineering with ultrahigh-throughput methods. Curr. Opin. Struct. Biol. 48, 149–156 (2018).

[148] Mair, P., Gielen, F. & Hollfelder, F. Exploring sequence space in search of functional enzymes using microfluidic droplets. Curr. Opin. Chem. Biol. 37, 137–144 (2017).

[149] Obexer, R. et al. Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nat. Chem. 9, 50–56 (2017).

[150] Althoff, E. A. et al. Robust design and optimization of retroaldol enzymes. Protein Sci. 21, 717–726 (2012).

[151] Giger, L. et al. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nat. Chem. Biol. 9, 494–498 (2013).

[152] Obexer, R., Pott, M., Zeymer, C., Griffiths, A. D. & Hilvert, D. Efficient laboratory evolution of computationally designed enzymes with low starting activities using fluorescence-activated droplet sorting. Protein Eng. Des. Sel. 29, 355–366 (2016).

[153] Beneyton, T., Coldren, F., Baret, J. C., Griffiths, A. D. & Taly, V. CotA laccase: High-throughput manipulation and analysis of recombinant enzyme libraries expressed in E. coli using droplet-based microfluidics. Analyst 139, 3314–3323 (2014).

[154] Colin, P. Y. et al. Ultrahigh-throughput discovery of promiscuous enzymes by picodroplet functional metagenomics. Nat. Commun. 6 (2015).

[155] Najah, M. et al. Droplet-based microfluidics platform for ultra-high-throughput bioprospecting of cellulolytic microorganisms. Chem. Biol. 21, 1722–1732 (2014).

[156] Ryckelynck, M. et al. Using droplet-based microfluidics to improve the catalytic properties of RNA under multiple-turnover conditions. RNA 1–12 (2015).

[157] Ostafe, R., Prodanovic, R., Ung, W. L., Weitz, D. A. & Fischer, R. A high-throughput cellulase screening system based on droplet microfluidics. Biomicrofluidics 8, 6–10 (2014).

[158] Ng, E. X., Miller, M. A., Jing, T. & Chen, C. H. Single cell multiplexed assay for proteolytic activity using droplet microfluidics. Biosens. Bioelectron. 81, 408–414 (2016).

[159] Price, A. K., MacConnell, A. B. & Paegel, B. M. HνSABR: Photochemical Dose-Response Bead Screening in Droplets. Anal. Chem. 88, 2904–2911 (2016).

[160] Larsen, A. C. et al. A general strategy for expanding polymerase function by droplet microfluidics. Nat. Commun. 7, 11235 (2016).

[161] Fenneteau, J., Chauvin, D., Griffiths, A., Nizak, C. & Cossy, J. Synthesis of New Hydrophilic Rhodamine Based Enzymatic Substrates Compatible With Droplet-Based Microfluidic Assays. Chem. Commun. 53, 5437–5440 (2017).

[162] Gielen, F. et al. Ultrahigh-throughput-directed enzyme evolution by absorbance-activated droplet sorting (AADS). Proc. Natl. Acad. Sci. U. S. A. 113, E7383–E7389 (2016).

[163] Sciambi, A. & Abate, A. R. Accurate microfluidic sorting of droplets at 30 kHz. Lab Chip 15, 47–51 (2015).

[164] Yang, T., Stavrakis, S. & DeMello, A. A High-Sensitivity, Integrated Absorbance and Fluorescence Detection Scheme for Probing Picoliter-Volume Droplets in Segmented Flows. Anal. Chem. 89, 12880–12887 (2017).

[165] Wang, X. et al. Raman-activated droplet sorting (RADS) for label-free high-throughput screening of microalgal single-cells. Anal. Chem. 89, 12569–12577 (2017).

[166] Holland-Moritz, D. A. et al. Mass Activated Droplet Sorting (MADS) Enables High-Throughput Screening of Enzymatic Reactions at Nanoliter Scale. Angew. Chem. Int. Ed. 59, 4470–4477 (2020).

[167] Isozaki, A. et al. A practical guide to intelligent image-activated cell sorting. Nat. Protoc. 14, 2370–2415 (2019).

[168] Anagnostidis, V. et al. Deep learning guided image-based droplet sorting for on-demand selection and analysis of single cells and 3D cell cultures. Lab Chip 20, 889–900 (2020).

[169] Gielen, F. et al. Quantitative affinity determination by fluorescence anisotropy measurements of individual nanoliter droplets. Anal. Chem. 89, 1092–1101 (2017).

[170] Maceiczyk, R. M., Hess, D., Chiu, F. W., Stavrakis, S. & DeMello, A. J. Differential detection photothermal spectroscopy: Towards ultra-fast and sensitive label-free detection in picoliter & femtoliter droplets. Lab Chip 17, 3654–3663 (2017).

[171] Cecchini, M. P. et al. Ultrafast surface enhanced resonance raman scattering detection in droplet-based microfluidic systems. Anal. Chem. 83, 3076–3081 (2011).

[172] Macdonald, D. S. et al. Engineered Artificial Carboligases Facilitate Regioselective Preparation of Enantioenriched Aldol Adducts. J. Am. Chem. Soc. 142, 10250–10254 (2020).

[173] Dressler, O. J., Casadevall i Solvas, X. & DeMello, A. J. Chemical and biological dynamics using droplet-based microfluidics. Annu. Rev. Anal. Chem. 10, 1–24 (2017).

[174] Lonsdale, R., Ranaghan, K. E. & Mulholland, A. J. Computational enzymology. Chem. Commun. 46, 2354–2372 (2010).

[175] Privett, H. K. et al. Iterative approach to computational enzyme design. Proc. Natl. Acad. Sci. U. S. A. 109, 3790–3795 (2012).

[176] Reiher, M., Wiebe, N., Svore, K. M., Wecker, D. & Troyer, M. Elucidating reaction mechanisms on quantum computers. Proc. Natl. Acad. Sci. U. S. A. 114, 7555–7560 (2017).

[177] Risso, V. A. et al. Enhancing a de novo enzyme activity by computationally-focused ultra-low-throughput screening. Chem. Sci. 11, 6134–6148 (2020).

[178] Bender, B. J. et al. Protocols for Molecular Modeling with Rosetta3 and RosettaScripts. Biochemistry 55, 4748–4763 (2016).

[179] Rohl, C. A., Strauss, C. E., Chivian, D. & Baker, D. Modeling Structurally Variable Regions in Homologous Proteins with Rosetta. Proteins 55, 656–677 (2004).

[180] Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–8 (2003).

[181] Tantillo, D. J., Chen, J. & Houk, K. N. Theozymes and compuzymes: Theoretical models for biological catalysis. Curr. Opin. Chem. Biol. 2, 743–750 (1998).

[182] Bolon, D. N. & Mayo, S. L. Enzyme-like proteins by computational design. Proc. Natl. Acad. Sci. U. S. A. 98, 14274–9 (2001).

[183] Zanghellini, A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15, 2785–2794 (2006).

[184] Malisi, C., Kohlbacher, O. & Höcker, B. Automated scaffold selection for enzyme design. Proteins 77, 74–83 (2009).

[185] Lassila, J. K., Privett, H. K., Allen, B. D. & Mayo, S. L. Combinatorial methods for small-molecule placement in computational enzyme design. Proc. Natl. Acad. Sci. U. S. A. 103, 16710–16715 (2006).

[186] Kiss, G., Röthlisberger, D., Baker, D. & Houk, K. N. Evaluation and ranking of enzyme designs. Protein Sci. 19, 1760–1773 (2010).

[187] Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).

[188] Richter, F. et al. Computational design of catalytic dyads and oxyanion holes for ester hydrolysis. J. Am. Chem. Soc. 134, 16197–206 (2012).

[189] Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science 329, 309–313 (2010).

[190] Kipnis, Y. & Baker, D. Comparison of designed and randomly generated catalysts for simple chemical reactions. Protein Sci. 21, 1388–1395 (2012).

[191] Wang, L. et al. Structural analyses of covalent enzyme-substrate analog complexes reveal strengths and limitations of de novo enzyme design. J. Mol. Biol. 415, 615–625 (2012).

[192] Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl. Acad. Sci. U. S. A. 109, 10358–10363 (2012).

[193] Gordon, S. R. et al. Computational design of an α-gliadin peptidase. J. Am. Chem. Soc. 134, 20513–20520 (2012).

[194] Grisewood, M. J. et al. OptZyme: Computational Enzyme Redesign Using Transition State Analogues. PLoS One 8, e75358 (2013).

[195] Khare, S. D. et al. Computational redesign of a mononuclear zinc metalloenzyme for organophosphate hydrolysis. Nat. Chem. Biol. 8, 294–300 (2012).

[196] Liu, L., Murphy, P., Baker, D. & Lutz, S. Computational design of orthogonal nucleoside kinases. Chem. Commun. 46, 8803–8805 (2010).

[197] Kong, X. D. et al. Engineering of an epoxide hydrolase for efficient bioresolution of bulky pharmaco substrates. Proc. Natl. Acad. Sci. U. S. A. 111, 15717–15722 (2014).

[198] Kong, X. D., Ma, Q., Zhou, J., Zeng, B. B. & Xu, J. H. A smart library of epoxide hydrolase variants and the top hits for synthesis of (S)-β-blocker precursors. Angew. Chem. Int. Ed. 53, 6641–6644 (2014).

[199] Wijma, H. J. & Janssen, D. B. Computational design gains momentum in enzyme catalysis engineering. FEBS J. 280, 2948–2960 (2013).

[200] Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817–1819 (2010).

[201] Kries, H., Blomberg, R. & Hilvert, D. De novo enzymes by computational design. Curr. Opin. Chem. Biol. 17, 221–228 (2013).

[202] Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

[203] Stern, A. et al. Selecton 2007: Advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 35, W506–W511 (2007).

[204] Engelen, S., Trojan, L. A., Sacquin-Mora, S., Lavery, R. & Carbone, A. Joint evolutionary trees: A large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput. Biol. 5, 1000267 (2009).

[205] Sumbalova, L., Stourac, J., Martinek, T., Bednar, D. & Damborsky, J. HotSpot Wizard 3.0: Web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 46, W356–W362 (2018).

[206] Halabi, N., Rivoire, O., Leibler, S. & Ranganathan, R. Protein Sectors: Evolutionary Units of Three-Dimensional Structure. Cell 138, 774–786 (2009).

[207] Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).

[208] Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O. & Weigt, M. Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Mol. Biol. Evol. 33, 268–280 (2016).

[209] Salinas, V. H. & Ranganathan, R. Coevolution-based inference of amino acid interactions underlying protein function. Elife 7, e34300 (2018).

[210] Russ, W. P., Lowery, D. M., Mishra, P., Yaffe, M. B. & Ranganathan, R. Natural-like function in artificial WW domains. Nature 437, 579–583 (2005).

[211] Socolich, M. et al. Evolutionary information for specifying a protein fold. Nature 437, 512–518 (2005).

[212] Russ, W. P. et al. An evolution-based model for designing chorismate mutase enzymes. Science 369, 440–445 (2020).

[213] Fox, R. J. et al. Improving catalytic function by ProSAR-driven enzyme evolution. Nat. Biotechnol. 25, 338–344 (2007).

[214] Parkinson, J., Hard, R., Ainsworth, R. I., Li, N. & Wang, W. Engineering a Histone Reader Protein by Combining Directed Evolution, Sequencing, and Neural Network Based Ordinal Regression. J. Chem. Inf. Model. 60, 3992–4004 (2020).

[215] Wang, J., Cao, H., Zhang, J. Z. & Qi, Y. Computational Protein Design with Deep Learning Neural Networks. Sci. Rep. 8, 6349 (2018).

[216] Wu, Z., Kan, S. B. J., Lewis, R. D., Wittmann, B. J. & Arnold, F. H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl. Acad. Sci. U. S. A. 116, 8852–8858 (2019).

[217] Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).

[218] Wu, Z. et al. Signal Peptides Generated by Attention-Based Neural Networks. ACS Synth. Biol. 9, 2154–2161 (2020).

[219] Banko, M. & Brill, E. Scaling to very very large corpora for natural language disambiguation. Proc. 39th Annu. Meet. Assoc. Comput. Linguist., 26–33 (2001).

[220] Romero, P. A., Tran, T. M. & Abate, A. R. Dissecting enzyme function with microfluidic-based deep mutational scanning. Proc. Natl. Acad. Sci. U. S. A. 112, 7159–7164 (2015).

[221] Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).

[222] Mehlhoff, J. D. & Ostermeier, M. Biological fitness landscapes by deep mutational scanning. Methods Enzymol. 643, 203–224 (2020).

[223] Constable, D. J. C. et al. Key green chemistry research areas—a perspective from pharmaceutical manufacturers. Green Chem. 9, 411–420 (2007).

[224] Richter, F., Leaver-Fay, A., Khare, S. D., Bjelic, S. & Baker, D. De novo enzyme design using Rosetta3. PLoS One 6, e19230 (2011).

[225] Siedhoff, N. E., Schwaneberg, U. & Davari, M. D. Machine learning-assisted enzyme engineering. Methods Enzymol. 643, 281–315 (2020).

[226] Blomberg, R. et al. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–21 (2013).

[227] Der, B. S., Edwards, D. R. & Kuhlman, B. Catalysis by a de novo zinc-mediated protein interface: implications for natural enzyme evolution and rational enzyme engineering. Biochemistry 51, 3933–40 (2012).

[228] Der, B. S. et al. Metal-mediated affinity and orientation specificity in a computationally designed protein homodimer. J. Am. Chem. Soc. 134, 375–85 (2012).

[229] Studer, S. et al. Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Science 362, 1285–1288 (2018).

[230] Hönig, M., Sondermann, P., Turner, N. J. & Carreira, E. M. Enantioselective chemo- and biocatalysis: Partners in retrosynthesis. Angew. Chem. Int. Ed. 56, 8942–8973 (2017).

[231] Li, T. et al. Efficient, chemoenzymatic process for manufacture of the boceprevir bicyclic [3.1.0]proline intermediate based on amine oxidase-catalyzed desymmetrization. J. Am. Chem. Soc. 134, 6467–6472 (2012).

[232] Alexeeva, M., Enright, A., Dawson, M. J., Mahmoudian, M. & Turner, N. J. Deracemization of α-methylbenzylamine using an enzyme obtained by in vitro evolution. Angew. Chem. Int. Ed. 41, 3177–3180 (2002).

[233] Yao, P. et al. Biocatalytic Route to Chiral 2-Substituted-1,2,3,4-Tetrahydroquinolines Using Cyclohexylamine Oxidase Muteins. ACS Catal. 8, 1648–1652 (2018).

[234] Ghislieri, D., Houghton, D., Green, A. P., Willies, S. C. & Turner, N. J. Monoamine oxidase (MAO-N) catalyzed deracemization of tetrahydro-beta-carbolines: Substrate dependent switch in enantioselectivity. ACS Catal. 3, 2869–2872 (2013).

[235] Dunsmore, C. J., Carr, R., Fleming, T. & Turner, N. J. A chemo-enzymatic route to enantiomerically pure cyclic tertiary amines. J. Am. Chem. Soc. 128, 2224–2225 (2006).

[236] Rowles, I., Malone, K. J., Etchells, L. L., Willies, S. C. & Turner, N. J. Directed Evolution of the Enzyme Monoamine Oxidase (MAO-N): Highly Efficient Chemo-enzymatic Deracemisation of the Alkaloid (+/-)-Crispine A. ChemCatChem 4, 1259–1261 (2012).

[237] Leisch, H., Grosse, S., Iwaki, H., Hasegawa, Y. & Lau, P. C. Cyclohexylamine oxidase as a useful biocatalyst for the kinetic resolution and deracemization of amines. Can. J. Chem. 90, 39–45 (2012).

[238] Li, G. et al. Deracemization of 2-methyl-1,2,3,4-tetrahydroquinoline using mutant cyclohexylamine oxidase obtained by iterative saturation mutagenesis. ACS Catal. 4, 903–908 (2014).

[239] Li, G. et al. Substrate profiling of cyclohexylamine oxidase and its mutants reveals new biocatalytic potential in deracemization of racemic amines. Appl. Microbiol. Biotechnol. 98, 1681–1689 (2014).

[240] Holt, A. & Palcic, M. M. A peroxidase-coupled continuous absorbance plate-reader assay for flavin monoamine oxidases, copper-containing amine oxidases and related enzymes. Nat. Protoc. 1, 2498–2505 (2006).

[241] Clausell-Tormos, J. et al. Droplet-Based Microfluidic Platforms for the Encapsulation and Screening of Mammalian Cells and Multicellular Organisms. Chem. Biol. 15, 427–437 (2008).

[242] Mirza, I. A. et al. Structural Analysis of a Novel Cyclohexylamine Oxidase from Brevibacterium oxydans IH-35A. PLoS One 8 (2013).

[243] Scott, J. D. & Williams, R. M. Chemistry and biology of the tetrahydroisoquinoline antitumor antibiotics. Chem. Rev. 102, 1669–1730 (2002).

[244] Li, G. et al. New recombinant cyclohexylamine oxidase variants for deracemization of secondary amines by orthogonally assaying designed mutants with structurally diverse substrates. Sci. Rep. 6, 24973 (2016).

[245] Li, G. et al. Simultaneous engineering of an enzyme’s entrance tunnel and active site: The case of monoamine oxidase MAO-N. Chem. Sci. 8, 4093–4099 (2017).

[246] Peng, Y. et al. Engineering chiral porous metal-organic frameworks for enantioselective adsorption and separation. Nat. Commun. 5, e4406 (2014).

[247] Ma, F. et al. Efficient molecular evolution to generate enantioselective enzymes using a dual-channel microfluidic droplet screening platform. Nat. Commun. 9, 1030 (2018).

[248] Wang, Y., Lan, D., Durrani, R. & Hollmann, F. Peroxygenases en route to becoming dream catalysts. What are the opportunities and challenges? Curr. Opin. Chem. Biol. 37, 1–9 (2017).

[249] Pickl, M. et al. Kinetic resolution of sec-thiols by enantioselective oxidation with rationally engineered 5-(hydroxymethyl)furfural oxidase. Angew. Chem. Int. Ed. 57, 2864–2868 (2018).

[250] Escalettes, F. & Turner, N. J. Directed evolution of galactose oxidase: Generation of enantioselective secondary alcohol oxidases. ChemBioChem 9, 857–860 (2008).

[251] Studer, S. From Structure to Function - Evolving De Novo Metalloenzymes. Ph.D. thesis, ETH Zürich (2018).

[252] Jencks, W. Catalysis in chemistry and enzymology (McGraw-Hill, New York, 1969).

[253] Schiffer, M. & Edmundson, A. B. Use of Helical Wheels to Represent the Structures of Proteins and to Identify Segments with Helical Potential. Biophys. J. 7, 121–135 (1967).

[254] Donnelly, A. E., Murphy, G. S., Digianantonio, K. M. & Hecht, M. H. A de novo enzyme catalyzes a life-sustaining reaction in Escherichia coli. Nat. Chem. Biol. 14, 253–255 (2018).

[255] Zastrow, M. L. & Pecoraro, V. L. Designing hydrolytic zinc metalloenzymes. Biochemistry 53, 957–78 (2014).

[256] Zastrow, M. L., Peacock, A. F. A., Stuckey, J. A. & Pecoraro, V. L. Hydrolytic catalysis and structural stabilization in a designed metalloprotein. Nat. Chem. 4, 118–123 (2012).

[257] Song, W. J. & Tezcan, F. A. A designed supramolecular protein assembly with in vivo enzymatic activity. Science 346, 1525–1528 (2014).

[258] Lan, D. M. et al. A novel cold-active lipase from Candida albicans: Cloning, expression and characterization of the recombinant enzyme. Int. J. Mol. Sci. 12, 3950–3965 (2011).

[259] Choi, Y. H., Park, Y. J., Yoon, S. J. & Lee, H. B. Purification and characterization of a new inducible thermostable extracellular lipolytic enzyme from the thermoacidophilic archaeon Sulfolobus solfataricus P1. J. Mol. Catal. B Enzym. 124, 11–19 (2016).

[260] Verpoorte, J. A., Mehta, S. & Edsall, J. T. Esterase activities of human carbonic anhydrases B and C. J. Biol. Chem. 242, 4221–4229 (1967).

[261] Gamper, M., Hilvert, D. & Kast, P. Probing the role of the C-terminus of Bacillus subtilis chorismate mutase by a novel random protein-termination strategy. Biochemistry 39, 14087–14094 (2000).

[262] Granieri, L., Baret, J. C., Griffiths, A. D. & Merten, C. A. High-Throughput Screening of Enzymes by Retroviral Display Using Droplet-Based Microfluidics. Chem. Biol. 17, 229–235 (2010).

[263] Christianson, D. W. & Fierke, C. A. Carbonic Anhydrase: Evolution of the Zinc Binding Site by Nature and by Design. Acc. Chem. Res. 29, 331–339 (1996).

[264] Basler, S. et al. Efficient Lewis acid catalysis of an abiological reaction in a de novo protein scaffold. Nat. Chem. 13, 231–235 (2021).

[265] Vilím, J., Knaus, T. & Mutti, F. G. Catalytic Promiscuity of Galactose Oxidase: A Mild Synthesis of Nitriles from Alcohols, Air, and Ammonia. Angew. Chem. Int. Ed. 57, 14240–14244 (2018).

[266] Weissenborn, M. J. et al. Whole-cell microtiter plate screening assay for terminal hydroxylation of fatty acids by P450s. Chem. Commun. 52, 6158–6161 (2016).

[267] Wilson, M. E. & Whitesides, G. M. Conversion of a Protein to a Homogeneous Asymmetric Hydrogenation Catalyst by Site-Specific Modification with a Diphosphinerhodium(I) Moiety. J. Am. Chem. Soc. 100, 306–307 (1978).

[268] Schwizer, F. et al. Artificial Metalloenzymes: Reaction Scope and Optimization Strategies. Chem. Rev. 118, 142–231 (2018).

[269] Vidal, C., Tomás-Gamasa, M., Destito, P., López, F. & Mascareñas, J. L. Concurrent and orthogonal gold(I) and ruthenium(II) catalysis inside living cells. Nat. Commun. 9, 1–9 (2018).

[270] van Vliet, L. D., Colin, P. Y. & Hollfelder, F. Bioinspired genotype–phenotype linkages: Mimicking cellular compartmentalization for the engineering of functional proteins. Interface Focus 5, e20150035 (2015).

[271] Collins, D. J., Neild, A., DeMello, A., Liu, A. Q. & Ai, Y. The Poisson distribution and beyond: Methods for microfluidic droplet production and single cell encapsulation. Lab Chip 15, 3439–3459 (2015).

[272] Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, e114049 (2017).

[273] Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

[274] Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).

[275] Azuma, Y., Zschoche, R., Tinzl, M. & Hilvert, D. Quantitative packaging of active enzymes into a protein cage. Angew. Chem. Int. Ed. 55, 1531–1534 (2016).

[276] Roderer, K. et al. Functional mapping of protein-protein interactions in an enzyme complex by directed evolution. PLoS One 9, e116234 (2014).

[277] Bringer, M. R., Gerdts, C. J., Song, H., Tice, J. D. & Ismagilov, R. F. Microfluidic systems for chemical kinetics that rely on chaotic mixing in droplets. Philos. Trans. R. Soc. A 362, 1087–1104 (2004).

[278] Panke, S., Meyer, A., Huber, C. M., Witholt, B. & Wubbolts, M. G. An alkane-responsive expression system for the production of fine chemicals. Appl. Environ. Microbiol. 65, 2324–2332 (1999).

[279] Lewis, J. A. & Escalante-Semerena, J. C. The FAD-dependent tricarballylate dehydrogenase (TcuA) enzyme of Salmonella enterica converts tricarballylate into cis-aconitate. J. Bacteriol. 188, 5479–5486 (2006).

[280] Vilar, S., Cozza, G. & Moro, S. Medicinal Chemistry and the Molecular Operating Environment (MOE): Application of QSAR and Molecular Docking to Drug Discovery. Curr. Top. Med. Chem. 8, 1555–1572 (2008).

Appendix A Supplementary figures

A.1 CHAO discovery

Figure A.1: Sequence alignment of CHAO PT.1 with wild-type CHAO. Five mutations were found (highlighted in red).


Figure A.2: Analysis of the enrichment over three sorts. a Initial velocities, normalized to wild-type CHAO, measured in a cleared-lysate-based microtiter plate assay. For the unsorted library and the first sort, 87 clones were analyzed; for the second and third sorts, 174 clones each were analyzed. To represent all populations equally in the graph, the same number of variants (87) was randomly sampled for each sort. In the unsorted library, only one variant displays an activity that is significantly higher (70-fold) than that of the wild type, whereas after the third sort the fraction of similarly active clones had increased 26-fold. b The same plates were left to react for three days at 4 °C to estimate the approximate turnover of each enzyme variant. The colored dye formed in the coupled enzyme assay shows that sorting enriched not only more active variants but also more productive enzymes.

A.2 Kinetic characterization of CHAO

Figure A.3: Three separately purified batches of wild-type CHAO and the PT.1 variant were assayed with (R)-1-phenyl-1,2,3,4-tetrahydroisoquinoline. The error bars represent the s.d. of the three independent measurements.

A.3 Plasmid maps

Figure A.4: Plasmid map of pACYC-CHAO [275] used to generate other CHAO constructs.

Figure A.5: Plasmid map of pKTNTET-6His-CHAO_Kan used for expression and screening. It features a T7 promoter, as well as a tetracycline-inducible promoter.

Figure A.6: Plasmid map of pKTNTET-pelB-MID1sc10 tested for screening. It features a T7 promoter, as well as a tetracycline-inducible promoter.

Figure A.7: Plasmid map of pMG209-pelB-MID1sc10 used for screening. It features a T7 promoter and kanamycin resistance.

Figure A.8: Plasmid map of pQE-MBP-MID1sc10 used for expression. It features a T5 promoter and ampicillin resistance.

Figure A.9: Plasmid map of pQE-MID1sc10, a candidate for microfluidic screening. It features a T5 promoter and ampicillin resistance. The MBP fusion protein was removed to express MID1sc10 without a tag.

Appendix B Supplementary Tables

B.1 DYT and BYT codon table

Table B.1: DYT codon table (D = G + A + T, Y = C + T).

Codon   Amino acid   # of codons
GCU     Ala          1
UUU     Phe          1
AUU     Ile          1
UCU     Ser          1
ACU     Thr          1
GUU     Val          1

Table B.2: BYT codon table (B = G + C + T, Y = C + T).

Codon   Amino acid   # of codons
GCU     Ala          1
UUU     Phe          1
CUU     Leu          1
UCU     Ser          1
CCU     Pro          1
GUU     Val          1
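The contents of Tables B.1 and B.2 can be reproduced by expanding each degenerate codon against the standard genetic code. The following Python sketch is an illustration only and was not part of the original work; the function name `expand_codon` and the restriction to the IUPAC symbols used here are assumptions made for brevity. Codons are written with T (DNA), whereas the tables use U (RNA).

```python
from itertools import product

# Subset of the IUPAC nucleotide alphabet needed for DYT and BYT.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",   # pyrimidine: C or T
    "D": "AGT",  # any base except C
    "B": "CGT",  # any base except A
}

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_codon(degenerate):
    """Map every codon encoded by a degenerate DNA codon to its amino acid."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}


if __name__ == "__main__":
    for name in ("DYT", "BYT"):
        encoded = expand_codon(name)
        print(name, sorted(encoded.items()))
```

Under these assumptions, each degenerate codon expands to six codons that encode six different amino acids with one codon each and no stop codon, in agreement with the tables above.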

B.2 Screening hits

Table B.3: Screening hits. Variants found in the library screen are ordered by their activity in a lysate-based plate assay. Residues that are conserved with respect to PT.1 are highlighted in gray. Due to amplification with Taq polymerase between the sorts, many variants have acquired additional mutations.

Name    Sorting round  Activity  T198  L199  M226  Y321  F351  L353  F368  P422  Additional mutations
PT.1    3              684       T     V     T     S     F     I     F     S     -
PT.1’   3              512       T     V     T     S     F     I     F     S     -
PT.1”   3              440       T     V     T     S     F     I     F     S     -
1       3              330       T     V     T     S     F     V     F     S     E476K
2       2              150       V     F     I     S     F     I     I     S     R13G, I35V, V334A, I366V
3       2              123       V     I     T     S     F     V     F     S     S172G
4       2              99        V     F     T     T     F     I     I     S     -
5       2              97        V     V     T     S     V     V     F     S     A314G
6       2              85        T     F     I     T     F     V     V     A     A418T
7       2              63        S     F     I     T     F     V     V     S     L294F
8       3              60        V     F     A     T     I     F     I     S     I222M, A341T
9       2              53        V     F     T     T     F     I     V     A     -
10      1              30        T     F     T     T     F     V     T     S     -
WT      -              1         T     L     M     Y     F     L     F     P     -

B.3 Evolutionary analysis of CHAO

Table B.4: A BLAST search was performed for CHAO, and the active site residues for the top 15 homologs possessing >50% sequence identity are listed. Dots (.) indicate that the residue is conserved relative to wild-type CHAO; dashes (-) indicate a deletion of the residue. BO, Brevibacterium oxydans; MO, Microbacterium oxydans.

          F88  T198  L199  M226  Q233  Y321  F351  L353  F368  P422  Y459
BO-CHAO   .    .     .     .     .     .     .     .     .     .     .
MO-CHAO   .    .     .     .     .     .     .     .     .     .     .
PT.1      .    .     V     T     .     S     .     I     .     S     .
1         .    .     F     .     .     .     .     .     .     .     .
2         .    A     I     .     .     .     .     .     .     .     .
3         .    A     I     .     .     .     .     .     .     .     .
4         .    A     I     L     .     .     .     .     .     .     .
5         W    I     .     I     .     L     W     F     .     .     .
6         W    I     .     I     .     L     W     F     .     .     .
7         W    I     .     I     .     L     W     F     .     .     .
8         .    C     V     L     .     M     M     F     .     .     .
9         Y    S     V     L     .     M     L     F     .     .     .
10        Y    A     V     L     .     .     A     F     .     .     .
11        .    I     V     Q     N     .     .     .     .     .     .
12        .    I     I     Q     N     .     .     .     .     .     .
13        L    I     I     Q     N     E     L     .     .     .     .
14        .    G     .     -     N     M     -     I     .     .     F
15        .    G     .     -     N     M     -     I     .     .     F

Curriculum vitae

AARON DEBON

Hirtenhofstrasse 36, 6005 Luzern, Switzerland, +41 79 451 91 72

EDUCATION

PhD candidate, July 2016 - Present
Advisor: Prof. Donald Hilvert
ETH Zürich, Switzerland, Laboratory of Organic Chemistry

Master of Science in Interdisciplinary Sciences, Sept. 2013 - Dec. 2015
ETH Zürich, Switzerland, Department of Chemistry and Applied Biosciences

Bachelor of Science in Interdisciplinary Sciences, Sept. 2010 - Sept. 2013
Major: Chemistry and Biophysics
ETH Zürich, Switzerland, Department of Chemistry and Applied Biosciences

Preparatory studies in Jazz guitar, Aug. 2009 - July 2010
Lucerne University of Applied Sciences, Switzerland

EXPERIENCE

Ultrahigh-throughput screening of oxidase enzymes, PhD project
The project aimed at designing a label-free assay based on droplet microfluidic technology for the screening of millions of oxidase variants. The method enabled the rapid creation of a new industrially valuable biocatalyst, which was recognized in the News and Views section of Nature Catalysis (Fessner, W. Nat Catal 2, 738-739, 2019). We are now using this method to modify multiple oxidases to synthesize compounds that are difficult to access chemically.

Directed evolution of a small designed metalloesterase, Master's / PhD project
The creation of a proficient artificial metalloenzyme retraced the steps of how enzymes may have emerged from small peptides. For this project I kinetically characterized parts of the evolutionary trajectory and contributed fluorimetric and mass spectrometric biophysical analyses. Currently, we are probing this enzyme for further evolvability and novel reactivities using ultrahigh-throughput screening.

Droplet confinement and leakage, Bachelor thesis
The isolation and storage of biological samples in water-in-oil droplets is key to many cutting-edge technologies. In this work I contributed algorithms to analyze microscopy images and performed microfluidics experiments to investigate the mechanism and mitigation of droplet failure.

Teaching assistant in chemistry
My PhD involved teaching in a series of three lectures about biological chemistry, ranging from nucleic acid chemistry and carbohydrate synthesis to enzymology.

SKILLS

Molecular biology: Cloning, strain selection, gene library generation, DNA sequence analysis
Biochemistry: Protein purification (affinity, size-exclusion, and ion-exchange chromatography) and characterization (mass spectrometric & spectroscopic), kinetic analysis, assay development, high-throughput screening
Microfluidics: Soft lithography, fluorescence-activated droplet sorting
Chemistry: Biocatalytic transformations and workup, chiral HPLC, NMR, LC-MS
Software & tools: Python, R, CLC workbench
Languages: Fluent in German and English, basic knowledge of French

PUBLICATIONS

Journal

Debon, A., Pott, M., Obexer, R., Green, A. P., Friedrich, L., Griffiths, A. D., & Hilvert, D. (2019). Ultrahigh-throughput screening enables efficient single-round oxidase remodelling. Nature Catalysis, 2 (9), 740-747.

Studer, S., Hansen, D. A., Pianowski, Z. L., Mittl, P. R. E., Debon, A., Guffy, S. L., Der, B. S., Kuhlman, B., & Hilvert, D. (2018). Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Science, 362 (6420), 1285-1288.

Debon, A. P., Wootton, R. C. R., & Elvira, K. S. (2015). Droplet confinement and leakage: Causes, underlying effects, and amelioration strategies. Biomicrofluidics, 9 (2).

Conference papers

Debon, A., Wootton, R. C. R., & Elvira, K. S. (2014). Droplet failure modes: Causes, underlying effects and amelioration strategies. 18th International Conference on Miniaturized Systems for Chemistry and Life Sciences, MicroTAS 2014, 1208-1210.

EXTRA-CURRICULAR ACTIVITIES

Avid climber and backcountry skier
Fascinated with music, especially playing the guitar
Several years of experience as a scouts leader