Quick viewing(Text Mode)

Engineering Enzymes Towards Biotherapeutic Applications Using Ancestral Sequence Reconstruction

Engineering Enzymes Towards Biotherapeutic Applications Using Ancestral Sequence Reconstruction

Engineering towards biotherapeutic applications using ancestral sequence reconstruction

Natalie Hendrikse

Doctoral Thesis in Chemical Science and Engineering KTH Royal Institute of Technology Stockholm, Sweden 2020 © Natalie Hendrikse

TRITA-CBH-FOU-2020:33 ISBN 978-91-7873-629-4

Printed by: Universitetsservice US-AB, Stockholm, Sweden 2020 I propose a toast to evolution - may we use it well! Frances H. Arnold, Stockholm 2018

Abstract

Enzymes are versatile biocatalysts that fulfill essential functions in all forms of life and, therefore, play an important role in health and disease. One specific application of enzymes in life science is their use as biopharmaceuticals, which typically benefits from high catalytic activity and stability. Increased stability and activity are both de- sirable properties for biopharmaceuticals as they are directly related to dosage, which in turn affects administration time, cost of production and potency of a drug.The aim of the work presented in this thesis is to enhance the therapeutic potential of enzymes by means of engineering, in particular using ancestral sequence recon- struction. In Paper I, we established the utility of this method in a model system and obtained ancestral terpene cyclases with increased activity, stability and scope. In Paper II, we described the successful crystallization of the most stable ancestral terpene cyclase, which allowed for rational design of substrate specificity. Fi- nally, we applied the method to two therapeutically relevant enzyme families associated with rare metabolic disorders. We obtained ancestral phenylalanine/ ammonia- with substantially enhanced thermostability and long-term stability in Paper III and ancestral iduronate-2-sulfatases with increased activity in Paper IV. In summary, the results presented herein highlight the potential of ancestral sequence reconstruction as a method to obtain stable enzyme scaffolds for further engineering and to enhance therapeutic properties of enzymes.

Keywords: enzyme engineering, enzyme evolution, ancestral sequence reconstruc- tion, therapeutic enzymes, terpene cyclases, ammonia-lyases, sulfatases, lysosomal en- zymes

i Sammanfattning

Enzymer är mångsidiga biokatalysatorer som uppfyller väsentliga funktioner i alla livs- former och därför är viktiga när det kommer till hälsa och sjukdom. En specifik tillämp- ning av enzymer inom hälsovetenskap är deras användning som bio-baserade läkemedel, vilket vanligtvis drar nytta av hög katalytisk aktivitet och stabilitet. Ökad stabilitet och aktivitet är båda önskvärda egenskaper för biofarmaceutiska läkemedel eftersom de är direkt relaterade till dosering, vilket i sin tur påverkar administreringstid, pro- duktionskostnad och terapeutisk effekt. Syftet med arbetet som presenteras i denna avhandling är att förbättra den terapeutiska potentialen hos enzymer med hjälp av en- zymteknik, särskilt genom rekonstruktion av förfädersenzymer. I Artikel I testade vi användbarheten av denna metod i ett modellsystem och erhöll ancestrala terpencyklaser med ökad aktivitet, stabilitet och substratacceptans. I Artikel II beskrev vi kristallis- eringen av det mest stabila förfädersenzymet, vilket möjliggjorde rationell modifiering av substratspecificitet. Till slut använde vi metoden på två terapeutiskt relevanta en- zymfamiljer som båda är associerade med sällsynta ämnesomsättningssjukdomar. Vi er- höll ancestrala fenylalanin/tyrosin ammonia-lyaser med väsentligt förbättrad termosta- bilitet och långvarig stabilitet i Artikel III och ancestrala iduronat-2-sulfataser med ökad aktivitet i Artikel IV. Sammanfattningsvis belyser resultaten i denna avhandling potentialen av metoden för att erhålla stabila modellenzymer för ytterligare modifiering och för att förbättra terapeutiska egenskaper hos enzymer.

ii Popular scientific summary

Enzymes are biological catalysts that accelerate chemical reactions and thereby enable them to occur at timescales compatible with life. Not only do they perform a wide range of functions in all forms of life – their catalytic power has also been exploited outside of their natural context. One example is the therapeutic application of enzymes, where they are used for activation of pro-drugs, as diagnostic tools, or even as the ther- apeutic agent itself, such as in enzyme replacement therapy. Since enzymes did not naturally evolve towards all desired applications, enzyme engineering is often employed to improve their properties and increase their utility as biocatalysts. In a therapeutic context, such properties could be enhanced catalytic activity or stability, altered sub- strate selectivity, or decreased immunogenicity. The aim of the work presented in this thesis is to explore ancestral sequence reconstruction (ASR) as an enzyme engineering strategy to enhance the therapeutic potential of enzymes, focusing on their application in rare metabolic disorders. ASR is a bioinformatics method, in which sequences of ancestral proteins—in this case enzymes—are statistically reconstructed based on avail- able sequences of existing descendants of the ancestors. This process may be imagined as reconstructing a shared common ancestral language based on the words of two or more languages that still exist today, but in the case of proteins this language is made up of 20 letters that code for amino acids. Reconstructed ancestral enzymes are often more stable than their descendants and can show promiscuous catalytic behavior, which is sometimes attributed to their exposure to a warmer environment as well as the no- tion of generalist ancestral enzymes that evolved to be more specific over the course of evolution. Based on these presumptions, we hypothesized that ASR could be a viable strategy to obtain enzymes with enhanced therapeutic properties. Especially stability and activity are important from a therapeutic perspective, as they are directly related to the required dosage to achieve a desired effect.

In order to establish the use of ASR, we selected terpene cyclases as model enzymes for method development in Paper I. We reconstructed several ancestors of a bacterial diterpene cyclase and obtained putative ancestral enzymes with increased solubility, stability, and activity, as well as an extended substrate scope. Moreover, the ancestral enzyme with the highest thermostability displayed an increased optimum temperature for catalysis by 10 °C, which can represent a desirable feature for industrial applications. As protein crystallization benefits from high stability and solubility, we attempted to obtain an X-ray crystal structure of the most stable reconstructed ancestor, which is described in Paper II. Structural insights allowed for rational engineering of substrate specificity of the ancestral enzyme, highlighting the utility of ASR to obtain enzyme scaffolds for further engineering. In Paper III, we aimed to apply the method to pos-

iii sible therapeutic enzymes, more specifically to phenylalanine/tyrosine ammonia-lyases (PAL/TALs). The bispecific nature of these enzymes makes them attractive candidates for complementary treatment of patients with hereditary , as these accumu- late both phenylalanine and tyrosine. We reconstructed several ancestral PAL/TALs from fungi and compared them to two modern reference enzymes. Despite decreased activity, we found ancestral variants that displayed substantially increased stability compared to modern PAL/TALs; not only in terms of melting temperature and resis- tance to incubation at high temperature, but also in long-term stability studies. This increased stability could have a positive effect on dosage and circulation, as well as shelf-life and storage of a potential drug. Overall, the results in Paper III strengthen the notion that ASR can be used to obtain stable and robust enzyme scaffolds, which may be further optimized with respect to activity if necessary. Finally, we used ASR on a more complex family of therapeutically relevant enzymes in Paper IV, namely lysosomal sulfatases. The aim of this study was to enhance the therapeutic properties of iduronate-2-sulfatase (IDS), which is currently available as enzyme replacement ther- apy for – a rare lysosomal storage disease. We explored the sequence space of IDS by reconstructing three different ancestors, going back from the modern human enzyme to the last common ancestor of primates and rodents. One ancestral enzyme in particular displayed increased activity compared to human IDS, both in in vitro assays and in ex vivo experiments in patient fibroblasts. Moreover, this variant outperformed the enzyme that is commercially available today. The increased activ- ity of the ancestral enzyme could allow for lower dosage of a potential drug, which in turn decreases its administration time and cost of production. If the dose remains unchanged, however, it may improve the therapeutic effect of the drug and thereby address the current unmet medical need with respect to this disease.

In summary, the work presented in this thesis demonstrates the potential of ASR as an enzyme engineering approach, in particular for therapeutic applications. We have obtained stable, robust scaffolds that enabled determination of three-dimensional struc- tures and engineering of substrate specificity in terpene cyclases, and that possibly al- low lower dosage and prolong shelf-life of biotherapeutic enzymes. Finally, we obtained ancestral enzymes with increased activity compared to available therapeutic enzymes, which could also lead to decreased dosage or improved therapeutic effect. The results in this thesis show that, apart from providing insight into fundamental scientific questions and evolution, ASR is a promising approach that can yield enzyme scaffolds with useful properties that may be further engineered towards a variety of applications.

iv Populärvetenskaplig sammanfattning

Enzymer är biologiska katalysatorer som påskyndar kemiska reaktioner och därmed möjliggör att dessa sker på tidsskalor som är förenliga med liv som vi känner det. Enzymer utför inte bara ett brett spektrum av funktioner i alla livsformer – deras kat- alytiska förmåga har också tillämpats utanför deras naturliga kontext. Ett exempel är tillämpningen av enzymer inom läkemedelsindustrin, där de kan användas för aktivering av så kallade pro-drugs, som diagnostiska verktyg, eller i egenskap av medicinska medlet självt som i enzymersättningsterapi. Eftersom enzymer inte har utvecklats naturligt mot alla önskade tillämpningar används ofta en teknik som kallas för enzyme engineer- ing för att förbättra deras egenskaper och öka deras användbarhet som biokatalysatorer. Ökad katalytisk aktivitet eller stabilitet, modifierad substratselektivitet eller minskad immunogenicitet är exempel på önskvärda egenskaper i ett medicinskt sammanhang. Syftet med arbetet i denna avhandling är att utforska rekonstruktion av ancestrala sekvenser, eller ancestral sequence reconstruction (ASR), som strategi för att förbättra medicinsk potential hos enzymer, med fokus på sällsynta ämnesomsättningssjukdomar. ASR är en metod inom bioinformatiken där sekvenser från förfädersproteiner, i detta fall förfädersenzymer, rekonstrueras baserat på tillgängliga sekvenser från existerande, moderna enzymer med hjälp av statistiska modeller. Denna process kan beskrivas som att rekonstruera ett gemensamt föregångsspråk baserat på ord från två eller fler språk som används idag – ett språk som består av 20 bokstäver som motsvarar aminosyror när det gäller proteiner. Sådana rekonstruerade enzymer är ofta mer stabila än moderna enzymer och kan acceptera flera olika substrat, vilket ibland tillskrivs en varmare miljö förr i tiden såväl som teorin att förfädersenzymer var generalister som utvecklades till att vara mer specifika under evolutionens gång. Baserat på dessa förutsättningar trorvi att ASR-metoden kan vara en framgångsrik strategi för att förbättra terapeutiska egen- skaper hos enzymer. Framför allt stabilitet och aktivitet är viktigt ur ett terapeutiskt perspektiv, eftersom båda är direkt relaterade till doseringen som behövs för att uppnå en önskad effekt.

I Artikel I använde vi oss av terpencyklaser som modellsystem i ett metodutveck- lingsprojekt för att utforska användningen av ASR-metoden. Vi rekonstruerade flera förfäder till ett bakteriell diterpencyklas och erhöll enzymer med ökad aktivitet, lös- lighet och stabilitet, samt utökad substratacceptans. Dessutom visade det enzym med högst termostabilitet en ökad optimal reaktionstemperatur med 10 °C, vilket kan vara önskvärt inom industriell tillämpning. Denna ökade stabilitet i samband med en förbät- trad löslighet hos det ancestrala enzymet inspirerade oss till att försöka ta fram kristall- strukturen, som presenteras i Artikel II. Kunskap om strukturen möjliggjorde sedan rationell modifiering av substratspecificitet hos det ancestrala enzymet, vilket framhäver

v potentialen av ASR-metoden för att ta fram intressanta modellenzymer för vidare mod- ifiering. Syftet med Artikel III var att tillämpa metoden på potentiella terapeutiska enzymer, närmare bestämt på fenylalanin/tyrosin ammonia-lyaser (PAL/TAL). Deras bispecifika karaktär gör dem till lovande kandidater för kompletterande behandling av patienter med tyrosinemi – en medfödd ämnesomsättningssjukdom som karakteriseras av ackumulering av både fenylalanin och tyrosin. Vi rekonstruerade flera förfädersen- zymer till PAL/TAL från svampar och jämförde dem med två moderna enzymer som referens. Trots minskad aktivitet erhöll vi varianter med väsentligt ökad stabilitet jäm- fört med moderna PAL/TAL; detta gällde inte bara smälttemperaturen och motstånd mot inkubation vid hög temperatur utan också långsiktiga stabilitetsstudier. Denna stabilitetsökning kan ha en positiv effekt på dosering och cirkulation i kroppen, samt hållbarhet och lagring av det potentiella läkemedlet. Sammantaget stärker resultaten i Artikel III uppfattningen om att ASR-metoden kan användas för att erhålla stabila och robusta enzymvarianter som kan optimeras med avseende på aktivitet vid behov. Till sist tillämpade vi ASR-metoden på en mer komplex enzymfamilj med terapeutisk relevans i Artikel IV, nämligen lysosomala sulfataser. Syftet med denna studie var att förbättra medicinska egenskaper hos iduronate-2-sulfatas (IDS), som för närvarande är tillgängligt som enzymersättningsterapi för Hunter syndrom – en sällsynt medfödd lysosomal lagringssjukdom. Vi utforskade sekvensutrymmet genom att rekonstruera olika förfäder mellan det moderna humana enzymet och den sista gemensamma för- fadern till primater och gnagare. En enzymvariant i synnerhet visade ökad aktivitet jämfört med humant IDS, både i in vitro analyser och ex vivo experiment i fibroblaster från patienter. Dessutom överträffade denna variant det läkemedelsenzym som finns tillgänglig på marknaden idag. Denna aktivitetsökning skulle kunna möjliggöra en lä- gre dosering av ett potentiellt läkemedel, vilket i sin tur minskar administreringstid och produktionskostnad. Om dosen dock förblir oförändrad, skulle en aktivitetsökning kunna förbättra terapeutiska effekten och därmed uppfylla medicinska behovet som finns kvar med avseende på denna sjukdom.

Sammanfattningsvis belyser arbetet i denna avhandling ASR-metodens potential som strategi inom enzyme engineering, med fokus på medicinsk tillämpning. Vi har tagit fram stabila, robusta enzymer som möjliggjort bestämning av tredimensionella struk- turer samt modifiering av substratselektivitet i terpencyklaser, och som kan leda till lägre dosering eller förlängd hållbarhet som medicinskt medel. Vi erhöll även enzym- varianter med ökad aktivitet jämfört med terapeutiska enzymer som finns tillgängliga på marknaden, vilket kan leda till lägre dosering eller en förbättrad effekt. Resultaten i denna avhandling visar att ASR-metoden är en lovande strategi som, förutom att ge insikt i grundläggande vetenskapliga frågor, kan ge upphov till enzymer med värdefulla egenskaper som kan vidareutvecklas för tillämpning inom olika områden.

vi Thesis defense

This thesis will be defended on September 25th, 2020 at 10:00 in lecture hall F3, Lind- stedtsvägen 26, Stockholm, for the degree of Doctor of Philosophy in Chemical Science and Engineering.

Respondent: Natalie M. Hendrikse, M.Sc. Division of Coating Technology, KTH Royal Institute of Technology, Stockholm, Sweden

Faculty opponent: Prof. Reinhard Sterner Institut für Biophysik und physikalische Biochemie, University of Regensburg, Regensburg, Germany

Evaluation committee: Dr. Claudia Kutter Department of Microbiology, Tumor and Cell biology, Karolinska Institute, Solna, Sweden

Assoc. Prof. Lars Arvestad Department of Mathematics, Stockholm University, Stockholm, Sweden

Prof. Cecilia Emanuelsson Department of Biochemistry and Structural Biology, Lund University, Lund, Sweden

Chairman: Assoc. Prof. Christofer Lendel Division of Applied Physical Chemistry, KTH Royal Institute of Technology, Stockholm, Swe- den

Respondent’s main supervisor: Assoc. Prof. Per-Olof Syrén Division of Coating Technology, KTH Royal Institute of Technology, Stockholm, Sweden

Respondent’s co-supervisor: Dr. Erik Nordling Swedish Orphan Biovitrum AB, Solna, Sweden

Respondent’s co-supervisor: Assoc. Prof. Johan Rockberg Division of Protein Technology, KTH Royal Institute of Technology, Stockholm, Sweden

vii List of appended papers

This thesis is based on the following articles and manuscripts:

I N. M. Hendrikse, G. Charpentier, E. Nordling & P.-O. Syrén (2018). Ancestral diterpene cyclases show increased thermostability and substrate acceptance The FEBS Journal 285(24): p.4660-4673.

II K. Schriever, P. Saenz-Mendez, R. S. Rudraraju, N. M. Hendrikse, E. P. Hud- son, A. Biundo, R. Schnell & P.-O. Syrén (2020). Engineering of ancestor as a tool to elucidate structure, mechanism and specificity of extant terpene cyclase. Manuscript

III N. M. Hendrikse, A. Holmberg Larsson, S. Svensson Gelius, S. Kuprin, E. Nordling & P.-O. Syrén (2020). Exploring the therapeutic potential of mod- ern and ancestral phenylalanine/tyrosine ammonia-lyases as supplementary treat- ment of hereditary tyrosinemia. Scientific Reports 10(1): p.1315-13.

IV N. M. Hendrikse, A. Sandegren, T. Andersson, J. Blomqvist, Å. Makower, D. Possner, C. Su, N. Thalén, A. Tjernberg, U. Westermark, S. Svensson Gelius, P.-O. Syrén & E. Nordling (2020). Ancestral lysosomal enzymes with increased activity harbor therapeutic potential for treatment of Hunter syndrome. Manuscript.

Full versions are appended at the end of the thesis with permission of the copyright holders.

Respondent’s contributions to appended papers

I Planned and performed the majority of the experiments, performed data analysis and wrote the manuscript with contributions from co-authors.

II Performed part of the experiments and assisted in scientific discussions and revis- ing of the manuscript.

III Planned and performed the majority of the experiments, performed data analysis and wrote the manuscript with contributions from co-authors.

IV Planned and performed a major part of the experiments, performed data analysis and wrote the manuscript with contributions from co-authors.

viii Published work not included in the thesis

• S. W. Meister, N. M. Hendrikse & J. Löfblom (2019). Directed evolution of the 3C protease from coxsackievirus using a novel fluorescence-assisted intracellular method. Biological Chemistry 400(3): p.405-415.

• I. V. Pavlidis, N. M. Hendrikse & P.-O. Syrén (2018). Computational tech- niques for efficient biocatalysis. Royal Society of Chemistry. In: Modern biocatal- ysis - Advances towards synthetic biological systems: p.119-147.

• M. van Rosmalen, B. M. G. Janssen, N. M. Hendrikse, A. J. van der Linden, P. A. Pieters, D. Wanders, T. F. A. de Greef & M. Merkx (2017). Affinity matura- tion of a cyclic peptide handle for therapeutic antibodies using deep mutational scanning. Journal of Biological Chemistry 292(4): p.1477-1489.

ix Commonly used abbreviations

ASR Ancestral sequence reconstruction CHO Chinese hamster ovary E. coli Escherichia coli ERT Enzyme replacement therapy FDA Food and Drug Administration FGE Formylglycine-generating enzyme fGly Formylglycine FPP Farnesyl pyrophosphate GGPP Geranylgeranyl pyrophosphate HAL Histidine ammonia- HT Hereditary tyrosinemia IDS Iduronate-2-sulfatase L-Phe L-Phenylalanine L-Tyr L-Tyrosine LSD Lysosomal storage disease MD Molecular dynamics MIO 4-methylidene-imidazole-5-one ML Maximum likelihood MPS MSA Multiple sequence alignment PAH Phenylalanine hydroxylase PAL Phenylalanine ammonia-lyase PEG Polyethylene glycol PKU Phenylketonuria SvS Spiroviolene synthase

T m Melting temperature TAL Tyrosine ammonia-lyase

x Contents

Abstract ...... i Sammanfattning ...... ii Popular scientific summary ...... iii Populärvetenskaplig sammanfattning ...... v Thesis defense ...... vii List of appended papers ...... viii Commonly used abbreviations ...... x

1 Introduction 1 1.1 Introduction to enzymes ...... 2 1.2 Structure of enzymes ...... 4 1.2.1 Structural dynamics ...... 4 1.3 ...... 6 1.3.1 Thermodynamics of enzymatic reactions ...... 6 1.3.2 ...... 8 1.4 Enzyme evolution ...... 12 1.4.1 Catalytic activity ...... 13 1.4.2 Stability ...... 16 1.4.3 From evolution to engineering ...... 17

2 Therapeutic enzymes 19 2.1 Enzymes as biopharmaceuticals ...... 20 2.1.1 Challenges of therapeutic enzymes ...... 21 2.2 Inherited metabolic disorders ...... 22 2.2.1 Lysosomal storage diseases ...... 22 2.2.2 ...... 23

3 From model systems to complex applications 27 3.1 Terpene cyclases ...... 28 3.1.1 Spiroviolene synthase ...... 30 3.2 Ammonia lyases ...... 30

xi Contents

3.2.1 PAL/TAL ...... 33 3.3 Lysosomal sulfatases ...... 33 3.3.1 IDS ...... 34

4 Enzyme engineering 37 4.1 Directed evolution ...... 38 4.2 Rational design ...... 39 4.2.1 Computational methods for enzyme engineering ...... 39 4.2.2 Molecular dynamics ...... 40 4.2.3 Sequence-based enzyme engineering ...... 40 4.3 Experimental methods in enzyme engineering ...... 41 4.3.1 Production of enzyme variants ...... 41 4.3.2 Characterization of enzyme variants ...... 42

5 Ancestral sequence reconstruction 47 5.1 Phylogenetic trees ...... 48 5.1.1 Multiple sequence alignments ...... 48 5.1.2 Statistical inference ...... 49 5.1.3 Tree rooting ...... 51 5.1.4 Evolutionary models ...... 51 5.2 Ancestral sequence reconstruction ...... 52 5.2.1 ASR for enzyme engineering ...... 55

6 Present investigation 57 PaperI ...... 58 PaperII...... 63 PaperIII ...... 66 PaperIV ...... 72 Concluding remarks & Future perspectives ...... 78

Acknowledgements 81

Bibliography 82

xii Chapter 1

Introduction

Enzymes are biocatalysts - proteins that catalyze chemical reactions. Their catalytic power is increasingly recognized for its potential applications, in particular with respect to efforts in sustainable development [1]. Due to their intrinsically sustainable nature, enzymes are being explored in the context of several United Nations Sustainable Development Goals (SDGs) [2, 3], including Clean Water and Sanitation (SDG 6), Affordable and Clean En- ergy (SDG 7), Responsible Consumption and Production (SDG 12), and Climate Action (SDG 13). In addition, they could contribute to goals that are centered around health of the individual, such as Zero Hunger (SDG 2) and Good Health and Well-being (SDG 3). The work in this thesis is focused on the therapeutic application of enzymes and therefore directly relates to SDG 3. Moreover, increased activity, stability and substrate scope can be desirable properties for biocatalysts in industrial applications and thereby contribute to sustainable development in a variety of ways.

The present chapter provides a brief historical context and introduces the general structure of proteins and enzymes. This is followed by a more de- tailed introduction to enzyme catalysis that focuses on how enzymes perform their role as biocatalysts and what models are commonly used to analyze enzymatic reactions. Finally, different concepts of enzyme evolution are introduced, such as evolution of catalytic activity and stability, followed by a brief discussion of enzyme engineering that serves as a foundation for the following chapters.

1 Chapter 1. Introduction

1.1 Introduction to enzymes

Proteins are often referred to as the building blocks of life as they perform numerous structural functions, but they are also involved in essential processes of living cells such as signaling, transportation and catalysis. Enzymes are a group of proteins that func- tion as biological catalysts by accelerating chemical reactions, thereby enabling them to occur at rates that are compatible with life [4]. This catalytic power not only makes enzymes essential to all living organisms; it has also been appreciated and exploited by mankind for thousands of years, as it is usually retained even outside the enzyme’s natural context. One of the oldest examples is the use of enzymes from yeast for the fermentation of sugar to produce alcohol, but also the manufacturing of cheese and bread are among the first applications of enzymatic catalysis [5]. Early therapeutic us- age of enzymes includes their oral administration as digestive aids in the 20th century [6] and, more recently, enzymes have become of great interest to health science and pharmaceutical research, as many disease processes involve abnormal function of one or more enzymes.

Despite the diverse and well-established application of enzymes by humans for thou- sands of years, the systematic study of enzymes, or enzymology, did not begin until the 18th and 19th century. The word “enzyme” was introduced by Kühne in 1876 [6] and a first attempt at explaining their mode of action was made by Fischer in 1894,who proposed a “lock and key model” for the interaction between enzymes and substrates [7]. Further advancements in enzymology were made possible by enzymes becoming available in purified, or partly purified form, following early attempts including the preparation of cell-free yeast extract by Büchner [8] and partial purification of laccase from tree sap by Bertrand [9] in 1897. Obtaining purified enzymes allowed for detailed studies of enzymatic reactions that enabled the generation of various mathematical models to describe these reactions in the beginning of the 20th century, which are dis- cussed in more detail later in this chapter. Advancements in enzyme purification not only facilitated detailed studies of enzymes and their reactions, they also opened the door to structural studies in the early 20th century, when X-ray crystallography became the main method to elucidate structures of small molecules. It was not until the first crystallization of an enzyme—urease from jack beans—by Sumner in 1926 that is was confirmed that enzymes were proteins [10]. The first three-dimensional structure ofan enzyme was published in 1965 [11], marking the beginning of a structural revolution that continues today. It also inspired efforts to understand the structure-function rela- tionships of enzymes, which remains one of the biggest challenges in enzymology and enzyme engineering.

2 Chapter 1. Introduction

A

R1 R3 R5

R2 R4 R6

T N-terminus M A D F H C-terminus

B C D

Figure 1: The structure of proteins. A) The primary structure of a protein is determined by the sequence of amino acids in the polypeptide chain. Amino acids share a common structure with an amino and a carboxy group, but differ in their sidechains (R1,R2,…) which can be represented by single letters (M, D,…). The amino and carboxy end of the polypeptide chain are referred to as N- and C-terminus, respectively. B) Structural elements such as α-helices (left) and β-sheets (right) make up the secondary structure of the protein (PDB ID: 5KPE [12]). C) The three-dimensional arrangement of secondary structural elements constitutes the tertiary structure of a protein. D) Several subunits of tertiary structures may combine to form multimers, also known as the quaternary structure. The figure shows the top view of a homotetramer made up of four monomers identical to the one shown in C, indicated by different colors (PDB ID: 1Y2M [13]).

3 Chapter 1. Introduction

1.2 Structure of enzymes

Proteins are macromolecules that consist of polypeptide chains, which can fold into a three-dimensional structure. These polypeptide chains are made up of building blocks, amino acids, which can be infinitely combined to form sequences of different lengths and composition. A protein’s unique sequence of amino acid residues is generally re- ferred to as its primary structure - information that is genetically encoded in DNA. Most naturally occurring proteins are built up of a selection of 20 amino acids, which all share a common chemical architecture. This allows them to form peptide bonds and thereby establish the protein backbone that also includes amino acid-specific side chains (Figure 1A). Amino acids differ in their side chains, which harbor diverse physic- ochemical properties; ranging from a single hydrogen atom to larger organic structures, including charged and aromatic residues. Upon translation by the ribosome, parts of the linear polypeptide chain engage in localized structural elements, such as α-helices and β-sheets, through intramolecular interactions within the protein backbone (Figure 1B). These elements make up the secondary structure of the protein and a combination of them can be arranged in a three-dimensional fashion, which is defined as the tertiary structure of the protein (Figure 1C) [14]. The tertiary structure is generally referred to as the native fold or native conformation of the protein and infers its biological function. In some cases, several protein subunits assemble to form a quaternary structure, which may be required for protein function (Figure 1D). Further modifications of protein structure can occur during or after folding, commonly referred to as post-translational modifications, which also contribute to the vast diversity that is observed in proteins found in nature. Examples include the conversion of residue side chains to other func- tional groups, the addition of non-proteinogenic molecules (e.g. phosphorylation or glycosylation), and cleavage of the protein into smaller subunits (Figure 2).

1.2.1 Structural dynamics Enzymatic reactions can be divided into three subsequent events that make up the cat- alytic cycle: 1) Substrate binding to form the enzyme-substrate complex, 2) conversion of substrate to in the by means of the catalytic mechanism and 3) product release. Although the actual catalytic event takes place in the active site, this part of the enzyme usually only accounts for 10-20% of the total protein volume [15]. The remaining 80-90% is involved in positioning the catalytic residues in the active site, as well as electrostatic stabilization of the transition state, but its exact role in catalysis is not always understood. It is generally believed that its influence is mostly connected to structural dynamics; structural fluctuations and motions that occur in the enzyme. Throughout the catalytic cycle, enzymes generally undergo structural rearrangements

4 Chapter 1. Introduction

Ribosome

Cytosol

ER GlcNAc Mannose Glc

FGE

FGE Asparagine

Cysteine Formylglycine

Figure 2: Post-translational modifications. Proteins and enzymes that require post- translational modifications are targeted to the endoplasmic reticulum (ER) upon translation by the ribosome. The newly synthesized polypeptide chain can be modified by, for example, conver- sion of residues to other functional groups, such as the conversion of a cysteine to formylglycine by the formylglycine-generating enzyme (FGE) in lysosomal sulfatases, or asparagine-linked (N-linked) glycosylation. N-linked glycosylation involves the addition of pre-formed oligosac- charides made up of various sugar units (GlcNAc, Mannose, Glc, etc.) to asparagine residues by enzymes that recognize the motif asparagine-X-threonine/serine (NXT/S).

in order to adopt different conformations that correspond to the aforementioned steps in the catalytic process. Enzymes are required to maintain a delicate structural bal- ance: they must be rigid enough to retain their native (and active) conformation, but dynamic enough to allow necessary rearrangements that are required between substrate binding and product release. Structural dynamics of proteins include movements that occur at a wide range of timescales: bond vibrations (femto-picoseconds), rotation of amino acid side chains (pico-nanoseconds), alterations of backbone configurations such as loop rearrangements (micro-milliseconds), but also movements of entire domains and protein folding (up to seconds) [16, 17]. These movements further increase the

5 Chapter 1. Introduction complexity of structure-function relationships of enzymes and have gained increasing attention over the past decades [18, 19]. In 1958, Koshland was among the first to ac- knowledge the dynamic nature of proteins when he introduced the so-called induced-fit model for enzymatic catalysis, which assumes that the enzyme undergoes a conforma- tional change upon substrate binding [20]. In the 1960s, the conformational selection model was proposed, which assumes that the enzyme continuously samples different structural conformations [21, 22]. Today, the role of structural dynamics in catalysis is widely acknowledged, yet often poorly understood. It ranges from direct contributions, such as rearrangements that enable the substrate to enter the active site or the product to leave, to subtle contributions such as allosteric effects [23]. Allostery refers to the phenomenon where ligand binding at a site that is topographically distinct from the active site induces structural changes, which are transmitted through the protein and thereby influence protein function [24]. Apart from its role in catalysis, the importance of dynamics in the evolution of enzymes is increasingly acknowledged [25]. Enzyme dynamics can be studied using a variety of experimental techniques, e.g. nuclear mag- netic resonance (NMR) spectroscopy [26], but also by computational methods such as molecular dynamics (MD) simulations, which are further discussed in Chapter 4.

1.3 Enzyme catalysis

1.3.1 Thermodynamics of enzymatic reactions Throughout the catalytic cycle, an enzymatic reaction passes through a number of intermediates that are associated with different levels of free energy. The free energy is the amount of energy in the system that is free to perform work under the given conditions; in a catalytic process, it typically concerns the breaking and formation of bonds. When a process occurs at a constant pressure and temperature, this energy is referred to as the Gibbs free energy (G), which was described by Gibbs in 1873 [27]. To study processes, such as enzyme-catalyzed reactions, the change in Gibbs free energy (∆G) is used:

∆G = ∆H − T ∆S (1)

In order for a reaction to occur spontaneously, ∆G should be negative, i.e. the free energy of the product state needs to be lower than the free energy of the reactant state (Figure 3). The transition state is the molecular configuration that corresponds to the highest free energy along the reaction coordinate and ∆G between the enzyme-substrate complex (ES) and the transition state is defined as the free energy of activation (∆G‡). In analogy to Equation 1, ∆G‡ can be separated into an enthalpic component (∆H‡)

6 Chapter 1. Introduction

and an entropic component (∆S‡) as follows:

∆G‡ = ∆H‡ − T ∆S‡ (2)

Enthalpic component ∆H‡ describes a thermodynamic potential that corresponds to the systems internal energy, or heat, and is mainly related to electrostatic effects and bond formation in enzyme catalysis. Entropic component ∆S‡, on the other hand, is a measure of disorder in a system. It corresponds to the number of accessible microstates associated with translational, rotational and vibrational degrees of freedom, where a higher entropy is favorable. In enzyme catalysis, entropic contributions can arise e.g. from dynamic effects in the protein itself, but also from solvent molecules. Enzymes increase the rate at which reactions proceed as compared to the uncatalyzed reaction by decreasing ∆G‡ (Figure 3). The relationship between the reaction rate k and ∆G‡ is captured in Eyring’s transition state theory (Equation 3), which was published in 1939 [28] and is similar to the previously developed Arrhenius equation [29, 30]. In simple, generalized transition state theory, k is the rate constant, κ is the transmission

coefficient that is assumed to be one, T is the absolute temperature in Kelvin, kB is the Boltzmann constant, h is the Planck constant and R is the universal gas constant [31]:

‡ kBT −∆G k = κ e RT (3) h

The development of this theory inspired Pauling to suggest that enzymes achieve the decrease of ∆G‡ by binding and stabilizing the transition state, thereby decreasing the free energy of this generally unstable intermediate [32]. The enthalpic and entropic components of the energy barrier can be derived after combination of Equation 2 and 3 as follows (Equation 4):

k −∆H‡ ∆S‡ ln = + (4) kB T RT R h

From Equation 4 it is possible to calculate ∆H‡ and ∆S‡ by analysis of the temperature k dependence of the enzymatic reaction through linear regression of ln kB T versus 1/T , h where the slope multiplied by R represents ∆H‡ and the intercept multiplied by R represents ∆S‡. Analysis of the contribution of ∆H‡ and ∆S‡ to enzymatic reactions can be informative when comparing enzyme variants; in Paper I, these parameters were compared for a modern diterpene cyclase and a reconstructed ancestral enzyme.

7 Chapter 1. Introduction

S‡

ES‡

ΔG‡ Free energy Free E+S ES ΔG EP E+P

Reaction coordinate

Figure 3: Energy profile of an enzymatic reaction. A hypothetical energy profile of an enzyme-catalyzed reaction (in black) as compared to the uncatalyzed reaction (dashed grey), involving one substrate and a single intermediate. Free energy is shown as a function of reaction coordinate. Sequential steps of an enzyme-catalyzed reaction are indicated: formation of the enzyme-substrate complex (ES) from free enzyme and substrate (E+S), formation of the product (EP) via the transition state (ES‡), followed by product release (E+P). In order for the reaction to occur spontaneously, the free energy of EP should be lower than that of ES, i.e. ΔG should be negative. Enzymes lower the free energy of activation, ∆G‡, as compared to the uncatalyzed reaction by stabilizing high energy transition state ES‡.

1.3.2 Enzyme kinetics Efforts to understand an enzyme and its reaction mechanism often involve thestudy of enzyme kinetics—the rate at which the reaction occurs—under different conditions. Kinetic analysis allows for determination of kinetic parameters that reflect an enzyme’s catalytic efficiency and substrate binding efficiency and is based on the determination of reaction velocities (v), i.e. the rates of product formation, at different substrate concentrations ([S]). Upon mixing an enzyme with its substrate, product formation over time can be described by a linear function until approximately 10% of the initial substrate concentration ([S0]) has been consumed [33]. The reaction velocity in this linear interval is considered to be the initial rate v0 (v = v0) and, at low substrate concentrations, increases linearly with [S] (first-order behavior). At high substrate concentrations, however, v0 reaches a plateau and approaches zero-order behavior. An

8 Chapter 1. Introduction

explanation for this observation was posed by Brown in 1902 [34] and involves the separation of the enzymatic reaction into two steps; namely formation of the ES complex and product formation, which can be represented as in Equation 5:

k1 k E + S ↽−−−−⇀ ES −→2 E + P (5) k−1

The rate of the backwards reaction of the second step, k−2, is negligible in the relevant time interval and is not shown. According to this reaction scheme, v depends on

the concentration of the ES complex ([ES]) and the rate of product formation k2 as follows:

v = k2[ES] (6)

In reality, the arrow indicating the conversion of ES to E + P usually represents a

series of chemical steps rather than a single step with rate constant k2. Therefore,

an apparent rate constant for these collective steps is commonly used (kcat), which describes the rate-determining step of the reaction:

v = kcat[ES] (7)

Assuming that the enzyme concentration ([E]) is constant in all reactions, [ES] is directly proportional to [S] at low substrate concentrations, explaining the apparent first-order behavior. At high [S], however, nearly all enzyme molecules is present in ES complex form and v depends entirely on the rate of product formation, leading to the observed rate saturation or zero-order behavior. The mathematical description of this model was developed by Henri in 1903 [35] and later Michaelis and Menten in 1913 [36]. It assumes that a rapid equilibrium is established between E + S and ES,

followed by a slower conversion to free enzyme and product (k2 ≪ k−1). In the case where [E] is much lower than [S], the amount of substrate bound in the ES complex is negligible as compared to the total substrate concentration and it can be assumed that the concentration of free substrate is equal to the total substrate concentration

([Sfree] ≈ [S]). This leads to the following equilibrium dissociation constant (replacing

association and dissociation rates k1 and k−1 in Equation 5):

[E ][S] K = free (8) s [ES]

9 Chapter 1. Introduction

The free enzyme concentration ([Efree]) is equal to the difference between the total [E] and [ES], and therefore:

([E] − [ES])[S] K = (9) s [ES]

Equation 9 can be rearranged to give an expression for [ES] as follows:

[E][S] [ES] = (10) Ks + [S]

Equation 10 can in turn be combined with Equation 7 to yield the following expression for the reaction velocity:

k [E][S] v = cat (11) Ks + [S]

At infinite substrate concentrations this expression of v approaches a maximum rate

Vmax that is equal to kcat[E], which can be incorporated in Equation 11 as follows:

V [S] v = max (12) Ks + [S]

Contrary to the rapid equilibrium assumption, most experimental measurements of reaction rates are performed at a constant, steady-state concentration of the ES complex. This means that formation of the ES complex occurs at a rate that is equal to its deformation (d[ES]/dt = 0). In an experimental setting, this can be achieved by having a large excess of substrate as compared to free enzyme, i.e. [S] ≪ [E]. In that case, the same assumption that [Sfree] ≈ [S] can be made, as well as the assumption that initial product formation does not affect the total [S]. This so-called steady-state phase ends when a significant amount of substrate has been converted to product and [S] can no longer be assumed to be constant. Under these assumptions, the steady-state equilibrium can be described as follows:

k1[Efree][S] = (k−1 + k2)[ES] (13)

10 Chapter 1. Introduction

If we define a constant KM as follows:

k−1 + k2 KM = (14) k1

Equation 13 can be rearranged as follows:

[E ][S] [ES] = free (15) KM

As previously established [Efree] = [E]−[ES] and Equation 15 can thus be restructured to the following form:

[E][S] [ES] = (16) KM + [S]

This can be combined with the reaction velocity v (Equation 7) and the definition

of Vmax to yield the most commonly used expression for the analysis of steady-state kinetics:

V [S] v = max (17) KM + [S]

This equation is highly similar to the originally derived equation by Henri, Michaelis and Menten (Equation 12) using the rapid equilibrium assumption. Despite the steady-state assumption first being described by Briggs and Haldane in 1925 [37], it is commonly referred to as the Michaelis-Menten or Henri-Michaelis-Menten equation. If an enzy-

matic reaction can be saturated, i.e. the reaction rate asymptotically approaches Vmax,

kinetic parameters kcat and KM can be determined individually. KM, also known as the Michaelis constant, is equal to the concentration at which half the maximum velocity is

reached and can be used as a relative measure of an enzyme’s substrate affinity. kcat is commonly referred to as the turnover number: the number of catalytic events per time unit. This rate constant corresponds to the rate-limiting step of the catalytic reaction, which may represent any of the chemical steps that make up the reaction mechanism,

a certain conformational change or product release. kcat/KM is used as a measure of catalytic efficiency, but is especially useful when a reaction cannot be saturated asit can be determined directly as the slope of the linear part of the plot of v versus [S]. In Paper I, it was shown that an enzymatic reaction catalyzed by an ancestral diter- pene cyclase could be saturated, as opposed to the reaction catalyzed by the modern bacterial enzyme.

11 Chapter 1. Introduction

It should be noted that Equation 17 only holds under the assumption that there is no allostery, or inhibition involved in the reaction. That is, it is assumed that active sites of enzymes act independently of each other, which is not always the case. Some enzymes adapt a quaternary structure where several subunits combine to form a homomultimer and each monomer contains an active site. Binding of a substrate molecule in one active site in the multimer may affect the multimeric structure in such a way that the affinity in the other, empty, active sites is altered – an effect knownas cooperativity (a form of allostery). Cooperativity may be either positive or negative: binding of the substrate in one active site can either increase or decrease substrate affinity in the others, respectively. Equation 17 can be modified to accommodate for the cooperative effect by including the Hill coefficient (h) and is referred to as the Hill equation [38]:

V [S]h v = max (18) K′ + [S]h

In theory the Hill coefficient represents the total number of binding sites, but ifcoop- erativity is moderate (h

1.4 Enzyme evolution

As described above, the sequence of amino acids that make up the primary structure of a protein is encoded in DNA. Random errors, or mutations, are continuously intro- duced in DNA, either during replication or through other processes. Mutations at the genetic level may lead to an altered amino acid sequence, which in turn can change the structure of a protein and thereby its function. Evolution is the process of random mutagenesis—creating genetic variation—followed by selection that is inferred by an evolutionary pressure, but also includes neutral drift. This complex process creates bio- diversity not only at the species level, but also at the molecular level, and has shaped the planet over the course of billions of years. Natural evolution of enzymes has been ongoing since the emergence of life on earth and is closely linked to its evolution, due to the essential functions of enzymes in all living organisms. Early life on earth is presumed to have contained limited genetic information compared to the present and, therefore, a smaller number of proteins [39]. The expansion of this genetic information and the

12 Chapter 1. Introduction

acquisition of new functions at an early stage is likely to have occurred through gene duplication events and functional divergence of the associated proteins. This process is commonly referred to as divergent evolution: the acquisition of evolutionary related, but different biological characteristics by diversification from a common ancestor. In contrast, convergent evolution refers to the independent acquisition of similar biolog- ical characteristics through unrelated evolutionary processes. The optimization of an enzyme through random mutations may be envisioned as an iterative search through its sequence space, i.e. all possible combinations of amino acids, which can be repre- sented by a so-called fitness landscape. In this scenario, each mutation is associated with a change in fitness, where the fitness peak represents the highest achievable cat- alytic efficiency (or any other trait) given the evolutionary context of the enzyme. The search for the maximum is complicated by the fact that fitness increase per mutation is not constant and also becomes smaller as the enzyme approaches the peak [40, 41]. Moreover, the enzyme may encounter local fitness maxima that are hard to overcome. Paths through the fitness landscape are referred toas evolutionary trajectories and are likely not deterministic. Their predictability is complicated by cooperative interac- tions among amino acids, known as epistasis [42], which result in effects of combined mutations that are difficult to predict [43–45].

1.4.1 Catalytic activity Early evolution of protein-based functionalities, including the emergence of enzymatic activity, is challenging to study and remains poorly understood. Recent studies support a scenario in which key catalytic residues that interact with the substrate molecule are present in the substrate binding pocket before catalytic function evolves [46, 47]. Initial mutations in the active site then reshape the binding pocket and optimize enzyme- substrate complementarity and transition-state stabilization, thereby activating the catalytic residues and inducing weak catalytic activity. This de novo catalytic activ- ity can be enough to provide an evolutionary advantage and can be further fine-tuned through additional mutations that alter the enzyme dynamics to minimize the sampling of catalytically unfavorable conformations (see section Structural dynamics). These sec- ondary mutations are commonly further away from the active site and are referred to as long-range mutations. Due to the relatively small portion of amino acids in the active site, long-range mutations should in fact be more likely to occur than mutations in the active site, suggesting a directionality in the evolutionary process of mutations radiat- ing outward from the active site. Similar mutational patterns, as well as the dynamic enrichment of catalytic conformations, have been observed in laboratory-based evolu- tion efforts [40, 48]. Moreover, the scenario of catalytically active enzymes evolving from binding proteins is in some cases also supported by structural homology between

13 Chapter 1. Introduction them. The opposite may also happen during the course of evolution: enzymes can lose their catalytic abilities through mutation of essential catalytic residues. Such “former enzymes” that are still encoded in the DNA and functionally expressed are referred to as pseudoenzymes and may evolve to acquire new functions [49]. Their tertiary structure usually highly resembles that of their functional counterpart, despite generally low se- quence identities [50]. This phenomenon has been used to study long-range evolutionary constraints induced by a functional enzyme’s active site, which differ greatly from long- range evolutionary constraints in pseudoenzymes [51]. The finding that the presence of an active site poses long-range evolutionary constraints on an enzyme, rather than its three-dimensional structure, further supports the hypothesis of emergence of enzymatic activity followed by dynamic fine-tuning through long-range mutations.

Yet, it appears that the evolution of new active sites is rare and that new enzyme functions are more likely to evolve from existing ones by exploiting promiscuous ac- tivities, as first suggested by Jensen in 1976 (Figure 4) [52]. Catalytic promiscuity in enzymes is commonly defined as the ability to catalyze additional reactions totheone for which they evolved, which proceed via distinct transition states and target different functional groups of the substrate [53]. It is different from substrate promiscuity, which refers to the ability to catalyze reactions with different substrates. Promiscuity is often confused with the terms multi-specific enzymes or enzymes with substrate ambiguity, which refer to enzymes that evolved to perform a range of reactions, possibly with a variety of substrates. It is commonly believed that many, if not most enzymes today are promiscuous in some way, albeit weakly. That promiscuity can serve as a starting point for the evolution of new enzyme functions is in line with the notion of ancestral enzymes that were general catalysts in the context of limited genetic information and evolved towards different functions under changing evolutionary selection pressure. Col- lections of enzymes that are believed to have evolved from a common ancestor through a “chemistry-driven” selection, i.e. through conservation of their catalytic strategy, are known as superfamilies [54]. Superfamilies share sequence motifs and active site archi- tecture and can contain up to thousands of enzymes, which makes them highly suitable for studying evolution of enzyme function [55].

Whether it concerns the emergence of de novo catalytic activity or the transition from a weak promiscuous activity to the primary function of an enzyme, catalytic activity of enzymes is optimized according to evolutionary selection pressure. A beneficial muta- tion has a much larger impact on an organisms evolutionary fitness in an enzyme that catalyzes the rate-limiting step of a metabolic pathway compared to an enzyme that al- ready functions at the required threshold, resulting in a different evolutionary pressure [57]. It is important to realize that an enzyme’s contribution to the fitness of its host organism extends beyond kinetic parameters such as kcat and kcat/KM. Its evolutionary

14

Chapter 1. Introduction

y

activit Catalytic

Sequence space

Figure 4: Evolution of new catalytic activity from weak promiscuous activity. An enzyme catalyzing its natural reaction (green) may harbor weak promiscuous activity (repre- sented by the overlapping region with pink), even when it is optimized for the natural reaction (dashed line). Activity for the new reaction can then be optimized in a series of evolutionary intermediates (represented by arrows) that initially also retain the activity of the natural re- action. Gene duplication, either before or during the process of differentiation can result in two homologous enzymes with distinct activities. Figure inspired by [56].

trajectory is rather determined by a combination of biochemical, biophysical and regu- latory factors that contribute to organismal fitness, such as stability, solubility, availability and metabolite flux [58]. It is therefore perhaps not surprising that most enzymes are not “perfect catalysts”: although the fastest enzymes approach kinetic 9 10 −1 −1 perfection and work at the diffusion limit (kcat/KM ∼ 10 − 10 M s ), meaning that the rate-limiting step is the diffusion of the substrate or product when entering or leaving the active site, few enzymes do [59]. The average catalytic efficiency of natural enzymes is reported to be approximately 105 M−1s−1, several orders of magnitude away from the theoretical maximum [60]. However, many natural enzymes may have reached catalytic perfection in the context for which they were optimized. This notion could explain the observation that enzymes functioning in secondary typically dis- play lower catalytic efficiencies compared to enzymes involved in the central metabolism [60]. The difference could be due to their more recent emergence and corresponding limited evolutionary history, but also to the fact that they generally contribute less to

15 Chapter 1. Introduction organismal fitness: enzymes in secondary metabolism typically operate only in certain cells or tissues for limited periods of time, which may result in a weaker evolutionary pressure on optimizing catalytic efficiency.

1.4.2 Stability Whereas the evolution of catalytic activity is specific to enzymes, evolution of stability is relevant to all proteins and its implications expand into numerous areas within protein science. In particular thermostability, the ability of a protein to retain its structure and function at increasing temperatures, is often a highly desirable property from a practical perspective and has been studied extensively. Melting temperature (Tm) is commonly used as a thermostability index and is defined as the temperature at which half the protein molecules in a sample have lost their native—and in the case of enzymes, catalytically active—three-dimensional structure. Thermal stability of proteins depends on their amino acid composition and a variety of short- and long-range interactions, such as hydrogen bond or salt bridge networks, α-helix dipole interactions, disulfide bridges and packing of the hydrophobic core, which contribute to overall rigidity of the protein [61]. It is generally believed, yet sometimes debated, that early life on earth was thermophilic, i.e. that the last universal common ancestor evolved to function at higher temperatures than the average temperature on earth today [62–64]. This hypothesis has specific implications for the evolution of enzymes: as the temperature on earth decreased over time, the evolutionary pressure on thermostability weakened and was replaced by an evolutionary pressure to maintain functional catalytic rates at lower temperatures [65]. In other words, enzymes evolved in a context of active pressure on maintaining catalytic activity and a passive drift away from thermostability, resulting in generally lower melting temperatures of enzymes in so-called mesophilic organisms. However, due to organismal adaptation to a variety of different environments, thermophilic, mesophilic and psychrophilic (also known as cryophilic) enzymes can be found in nature today, spanning a wide range of melting temperatures.

The question of how enzymes maintained their catalytic activity in the context of a cooling earth has been studied by various approaches (see also Ancestral sequence reconstruction in Chapter 5). Wolfenden and coworkers showed that, given this evolu- tionary context, the origin of enzymatic rate-enhancement is likely enthalpic [66, 67]. A catalyst that decreases ∆H‡ would have a selective advantage over one that increases ∆S‡ by the corresponding amount: although both would have an identical effect on ∆G‡—and therefore reaction rate k—at a given temperature (Equation 2 and 3), the second catalyst would experience a much larger impact on ∆G‡ as the temperature goes down, which would lead to a steeper decrease in reaction rate [66, 67]. The dominant effect of temperature as an evolutionary driver is also directly reflected in enzymatic

16 Chapter 1. Introduction

reaction rates, as the kinetic energy of all molecules, as well as collision rates, directly depend on temperature. Due to the exponential dependency of the reaction rate on temperature (Equation 3), the rate increases with increasing temperature by a certain

factor until the melting temperature is approached. Q10 is a coefficient that corresponds to the fold-increase in rate per 10 °C increase in temperature and has been shown to not significantly differ between thermophilic, mesophilic and psychrophilic enzymes [68], contradicting the previously assumed direct trade-off between thermostability and enzymatic activity [69, 70].

1.4.3 From evolution to engineering The evolution from simple organisms with a limited enzyme repertoire to much more complex organisms harboring thousands of enzymes with a higher level of specificity may suggest some kind of directionality in evolution. Yet, it is important to remem- ber that evolution never stops. This is illustrated by the emergence of new enzymes in recent decades that have evolved to withstand man-made challenges, such as the introduction of and pesticides in the environment [71, 72], the increasing use of antibiotics [73, 74], as well as the enormous amount of plastic waste deposited in nature [75]. As long as the evolutionary pressure changes, enzymes will adapt to it and new functions will emerge - a phenomenon that is now commonly exploited in enzyme engineering efforts. Enzymes are often employed by humans to operate incon- ditions that they did not naturally evolve for and optimization of certain features might be required. Protein engineering and enzyme engineering refer to collective efforts to generate proteins or enzymes with altered properties compared to the starting point (the so-called wild type), mainly by introducing changes to the amino acid sequence, but also through chemical modifications. Enzyme engineering in particular focuses on the generation of enzymes with novel properties, often towards industrial or biomedical applications, which may be activity towards non-native substrates, increased catalytic turnover, increased thermostability or other properties. Advances in the field of molec- ular biology, such as recombinant DNA technologies (the amplification, cloning, and sequencing of DNA) and the possibility to express enzyme variants in various host sys- tems, have led to a rapid development of this field over the past decades. The concept of systematically replacing amino acids in the sequence of an enzyme was outlined by Winter et al. in 1982 [76], who converted a cysteine in the active site of tyrosyl tRNA synthetase to a serine that reduced enzymatic activity. Following this, Eigen published a theoretical workflow for directed evolution of enzymes in 1984 [77]. In the early 1990s, Frances Arnold and colleagues conducted pioneering experiments where they engineered subtilisin E using directed evolution [78], setting the stage for the development and ex- pansion of the method [79]. An important contribution to directed evolution as an

17 Chapter 1. Introduction enzyme engineering strategy was made by Stemmer in the first half of the 1990s, who introduced the concept of DNA shuffling: the fragmentation and re-assembly of genes [80, 81]. Early directed evolution efforts focused on engineering enzymes to act in non- natural conditions, such as in organic solvents or at high temperatures [78, 82]. Today, the strategy has even been shown to be successful in engineering enzymes that catalyze chemistry that does not occur in nature [56]. In 2018, Arnold received the Nobel Prize in Chemistry “for the directed evolution of enzymes”, highlighting the importance of this field from a societal perspective. Theoretical and experimental methodologies related to enzyme engineering are discussed in detail in Chapter 4.

18 Chapter 2

Therapeutic enzymes

As presented in Chapter 1, enzymes are biocatalysts that carry out a wide range of chemical reactions and their catalytic power makes them essential for many processes in the human body. Their crucial role is illustrated by the devastating effect of some enzyme deficiency disorders, which are char- acterized by lack of or impaired function of an enzyme. Treatment of such disorders often involves enzyme replacement therapy—administering the missing enzyme as a biotherapeutic drug—or enzyme substitution therapy with alternative enzymes. This chapter focuses on therapeutic applications of enzymes, in particular in the context of rare inherited metabolic disor- ders.

19 Chapter 2. Therapeutic enzymes

2.1 Enzymes as biopharmaceuticals

In 1978, Goeddel and coworkers were able to successfully express recombinant human insulin in Escherichia coli (E. coli) bacteria, which greatly facilitated large-scale pro- duction of the protein for treatment of diabetes [83]. In 1982, human insulin became the first recombinant protein drug approved by the Food and Drug Administration (FDA) in the United States. Since then, increasingly many protein-based drugs have been approved and even more are being developed; ranging from hormones such as insulin, to procoagulants, antibodies, and enzymes. The second FDA-approved recombinant protein drug was, in fact, an enzyme: alteplase, marketed as Activase®, was approved in 1987 as thrombolytic for the treatment of heart attacks [84]. Another milestone in the history of therapeutic enzymes followed shortly after, in 1990, when Adagen® was approved by the FDA for treatment of adenosine deaminase deficiency. This was the first enzyme approved under the Orphan Drug Act (Public Law 97-414), which was passed seven years earlier to facilitate the development of drugs for diseases that affected less than 200,000 individuals in the United States. Moreover, Adagen® was the first therapeutic enzyme approved for treatment of a genetic enzyme deficiency disorder as well as the first approved “PEGylated” protein drug. The covalent conju- gation to polyethylene glycol (PEG) as a means to mask immunogenicity and extend half-life of therapeutic proteins was first posed in 1977 by Abuchowski et al. [85] and re- mains one of the main strategies for enhancing drug delivery today (see also Challenges of therapeutic enzymes) [86]. Since the early 1990s, development of enzyme biophar- maceuticals has been ongoing and as of 2020, at least 20 enzymes have been approved by the FDA, most notably for applications in bleeding disorders [87] and metabolic disorders.

In contrast to industrial-scale applications, therapeutic enzymes are required in rela- tively small amounts, but with a high degree of purity [88]. Initially, enzymes were mostly sourced from human or animal tissue, which yielded insufficient quantities to meet the clinical demand. However, since the early days of recombinant protein pro- duction in E. coli, many other expression systems have been established that can be used for production of therapeutic enzymes. Commercial production of enzymes has already been achieved in a variety of host systems, such as mammalian cells, bacteria, yeast and molds [89]. Ideally, an enzyme source produces large amounts of functional enzyme, while minimizing the cost and complexity of the production and subsequent purification of the enzyme. Regardless of the expression system, the final formulation must be of high purity and appropriate for the desired means of administration; most commonly injection (parenteral administration) or oral administration (enteral admin- istration). The appropriate choice of administration method highly depends on the

20 Chapter 2. Therapeutic enzymes

enzyme’s mode of action, the nature of the therapeutic target and its localization in the body. Additionally, drug safety and potency—i.e. the amount of drug required to achieve a desired therapeutic effect—are important factors when it comes to adminis- tration and formulation. Therapeutic enzymes generally fall in one of three categories: 1) Enzymes that constitute the therapeutic agent itself, 2) enzymes with an indirect therapeutic effect through activation of so-called prodrugs, and 3) enzymes withadi- agnostic function [90]. Enzymes that belong to the first category are the main focus of this chapter, in particular in the context of rare inherited metabolic disorders.

2.1.1 Challenges of therapeutic enzymes Despite their high specificity and unique catalytic potential as drugs, the use ofen- zymes as biotherapeutics gives rise to some specific challenges. Some of these challenges, mostly related to distribution and circulation in the body as well as immunogenic re- sponses, are common for all protein drugs. Several efforts in enzyme engineering have focused on improving the therapeutic properties of enzymes, ranging from mutagenesis to increase activity or stability, to post-translational modifications and covalent link- age to other molecules to improve distribution, circulation half-life or immunogenicity [90]. In contrast to small-molecule drugs, the relatively large size of enzymes consti- tutes a problem for biodistribution, in particular to areas such as the brain. In some cases, distribution can be enhanced by targeting; covalent linkage of the therapeu- tic agent (enzyme) to another molecule that interacts with a receptor, which enables receptor-mediated uptake [91]. Another weakness of enzymes as biotherapeutics is their instability and low circulation half-life in the body [92]. Therapeutic enzymes are ex- posed to a wide range of stresses in the body that can cause unfolding or degradation, such as changes in temperature, pH or salt concentrations, oxidative stress, and expo- sure to proteases – enzymes that cleave proteins [93]. Enzymes that are administered orally face several obstacles in the gastrointestinal tract, most prominently harsh con- ditions in the stomach due to low pH and presence of proteases. Various approaches have been investigated to overcome these obstacles, including coating or encapsulation, conjugation to polymers (e.g. PEGylation [86]) and alternative formulations [94, 95]. Other stability-related challenges of therapeutic enzymes are related to temperature and involve storage and shelf-life of the formulation. Engineering efforts to increase long-term enzyme stability are thus of great interest and are further discussed in Paper III. Immunogenicity refers to the response of the immune system to foreign (exogenous) molecules which, in the context of therapeutic enzymes, is the enzyme itself. Such an immunogenic response may consist the generation of anti-drug antibodies that are directed against the exogenous enzyme and is typically more severe for non-human pro- teins. Harmful immune responses constitute a major obstacle in getting protein-based

21 Chapter 2. Therapeutic enzymes therapeutics approved for the market and several strategies have been developed in or- der to avoid them, most notably covalent linkage of the therapeutic agent to PEG [96]. Immunogenicity of a therapeutic enzyme depends on various factors, such as adminis- tration route, dosage and frequency, as well as the enzyme’s solubility and stability [93]. As therapeutic enzymes are preferably expressed, formulated and administered at high concentrations, high solubility is a desirable trait. Enzyme engineering strategies aimed at increasing enzyme solubility may also positively affect immunogenicity, as aggregated proteins are typically more immunogenic [97]. In Paper I, increased expression yield and solubility were observed for ancestral terpene cyclases, which are both beneficial properties for potential therapeutic enzymes and can also enable crystallization efforts, as shown in Paper II.

2.2 Inherited metabolic disorders

An important class of therapeutic enzymes includes those that aim to treat enzyme deficiency disorders, where patients have a defect in a gene that causes impairedor even absent function of the corresponding enzyme. Such medical conditions are also referred to as inherited metabolic disorders, or inborn errors of metabolism—as they usually involve metabolic enzymes—and are typically treated by enzyme replacement therapy (ERT). As the name suggests, the goal of ERT is to replace the enzyme that is deficient or completely absent in the patient - a concept that was first described inthe context of lysosomal storage diseases (LSDs) by de Duve in 1964 [98]. Shortly after the approval of Adagen®, the first enzyme to treat a genetic enzyme deficiency disorder, the first ERT of an LSD was approved in 1991 when Ceredase® became available for Gaucher disease [99]. As of today, most FDA-approved enzymes for ERT are within the field of LSDs, where nine different indications have been addressed [100].

2.2.1 Lysosomal storage diseases LSDs are a family of more than 50 known rare genetic disorders that are associated with impaired or depleted function of one or several lysosomal enzymes, which occur in approximately 1 out of 5000 live births [101]. The absence or impaired function of a lysosomal enzyme leads to accumulation of its substrate in the lysosome, which can induce cell death and organ failure. In some forms of LSDs the central nervous system (CNS) is affected, which makes them very severe and also complicates treatment with ERT due to the difficulty of passing the blood-brain barrier. Lysosomal enzymes are complex enzymes that require co- or post-translational modifications such as glycosy- lation, mannose phosphorylation, and modification of active site residues, which can

22 Chapter 2. Therapeutic enzymes

only be achieved in eukaryotic expression systems. In general, lysosomal enzymes are taken up by cells and targeted to the lysosome via receptor-mediated transcytosis by the mannose-6- receptor or Limp2 receptor, which interact with glycan units on the surface of the enzymes [102]. A common challenge of therapeutic lysosomal enzymes is that their biodistribution upon administration is highly dependent on the relative density of receptors on different cell types, causing variable responses between different organs. Other challenges with respect to ERT of LSDs include the lengthy weekly or biweekly infusions that can be difficult for the often very young patients, as well as the extremely high cost of production due to the complexity of the enzymes [103, 104]. Apart from the aforementioned challenges, the general issue of immunogenicity needs to be addressed in the context of ERT: the intracellular nature of the enzymes combined with low or absent expression of endogenous enzyme increases the likelihood of triggering an immune response upon administration to the blood.

Hunter syndrome, or mucopolysaccharidosis type II (MPS II), is an LSD associated with lack of or impaired function of lysosomal enzyme iduronate-2-sulfatase (IDS). The CNS is affected in approximately two-thirds of MPS II patients and those with this severe, early-onset form of the disease seldomly reach adulthood [105]. ERT for MPS II with the recombinant human enzyme is currently available in two forms [100, 105]: , marketed as Elaprase®, which was approved in the US in 2006 [106, 107] and idursulfase beta, marketed as Hunterase®, which was approved in South Korea in 2012 [108, 109]. Whereas Elaprase® is produced in a human cell line, Hunterase® is produced in Chinese hamster ovary (CHO) cells. Both therapies are effective in reducing urinary excretion of GAGs [109], but have no documented impact on CNS manifestations. Recently, fusion of IDS to a humanized anti-transferrin receptor antibody has been investigated as a means to cross the blood-brain barrier [110]. In Paper IV we investigated the use of ASR to enhance the therapeutic potential of recombinant human IDS.

2.2.2 Amino acid catabolism Another category of inherited metabolic disorders is associated with amino acid catabolism, i.e. the breakdown of amino acids. One example is phenylketonuria (PKU), which is caused by a defect in the gene that encodes the enzyme phenylalanine hydroxylase (PAH). The highest incidence of PKU is amongst Caucasians, where it affects approxi- mately 1 in 10,000 [111]. PAH is mostly located in the liver and catalyzes the hydroxy- lation of phenylalanine to tyrosine, as phenylalanine catabolism normally proceeds via the tyrosine breakdown pathway (Figure 5). Deficiency of PAH leads to accumulation of phenylalanine in blood which, if untreated, can lead to growth failure, microcephaly, seizure and reduced cognitive development. PKU can be included in newborn screen- ing and early detection allows for dietary restrictions with limited phenylalanine intake

23 Chapter 2. Therapeutic enzymes

L-Phenylalanine

L-Phenylalanine hydroxylase (PAH) PKU L-Tyrosine

L-Tyrosine aminotransferase HT type II 4-Hydroxyphenylpyruvate 4-Hydroxyphenylpyruvate dioxygenase HT type III Homogentisate

Maleylacetoacetate

Fumarylacetoacetate Fumarylacetoacetate (FAH) HT type I

Fumarate Acetoacetate

Figure 5: Inherited metabolic disorders related to phenylalanine and tyrosine catabolism. The catabolic pathway of L-Phenylalanine and L-Tyrosine in humans is shown, with arrows representing catabolic steps performed by different enzymes. Metabolic disorders PKU and HT type I/II/III are associated with distinct steps in the pathway and are shown in bold next to the corresponding metabolic enzyme in italics. A red arrow indicates the enzyme which is inhibited by Nitisinone in the treatment of HT type I, essentially mimicking the HT type III phenotype. that can significantly improve patient prospects, but do not reverse neurological dam- age. Moreover, tyrosine becomes an essential amino acid in absence of phenylalanine and an appropriate, balanced, diet is very difficult to maintain for a lifetime. Hereditary tyrosinemia (HT) is the collective name for a group of metabolic disorders associated with the same catabolic pathway, where patients have genetic defects in one of the genes coding for the enzymes that catabolize tyrosine (Figure 5) [112]. HT type I is charac- terized by impaired or lack of function of fumarylacetoacetate hydrolase, resulting in accumulation of maleylacetoacetate and fumarylacetoacetate: toxic intermediates that can induce organ failure and carcinogenesis. It has a prevalence of less than 1 in 100,000 in the general population and similarly to PKU, dietary restrictions are required that limit the intake of tyrosine as well as phenylalanine. In addition, HT type I patients

24 Chapter 2. Therapeutic enzymes

are currently treated with Nitisinone (NTBC), which inhibits an upstream enzyme and mimics the phenotype of HT type III (Figure 5) [113].

Several treatment options have been explored for PKU [114], including administration of a PAH-activating cofactor, gene therapy aimed at restoring PAH expression in the liver [115], and oral administration of bacteria that express therapeutic enzymes [116, 117]. Two types of enzyme therapy have been investigated for treatment of PKU: ERT with recombinant PAH and enzyme substitution therapy via an alternative pathway using phenylalanine ammonia-lyase (PAL). Despite being the more obvious path to take, replacement therapy with recombinant PAH proved to be challenging due to various factors, mainly the requirement of a cofactor and inherent instability that complicated large-scale purification [118, 119]. PAL is a non-mammalian enzyme that catalyzes the deamination of phenylalanine to cinnamic acid – a non-toxic compound that is secreted as hippurate in urine [120]. Some advantages of PAL as compared to PAH are its robustness, stability and ease of expression, as well as the fact that it does not require a cofactor. The potential of PAL for enzyme substitution therapy of PKU was recognized already in the 1980s; initial experiments focused on enteral administration in microcapsules with the aim to deplete phenylalanine in the gastrointestinal tract using PAL extracted from yeast [121]. In the late 1990s recombinant PAL from yeast was expressed in E. coli, making production economically feasible, and the possibility of parenteral administration was investigated [122]. Since then, PAL enzymes from different species have been explored as potential therapeutic enzymes and optimized for therapeutic application [123]. Finally, enzyme substitution therapy using PAL was approved by the FDA in 2018 and is marketed as Palynziq®, a PEGylated bacterial PAL from Anabaena variabilis that is administered through subcutaneous injection [124]. No enzyme therapy is currently available for hereditary tyrosinemia, but advances in enzyme therapy for PKU have inspired similar efforts for HT [125]. Originating from the same enzyme family as PAL, tyrosine ammonia-lyases (TAL) catalyze the deamination of tyrosine to form p-coumaric acid – a non-toxic product similar to cinnamic acid [126]. The therapeutic potential of this enzyme family for treatment of hereditary tyrosinemia was investigated in Paper III.

25

Chapter 3

From model systems to complex applications

After an introduction to enzymes as biological catalysts in Chapter 1, Chap- ter 2 described the applications and challenges of therapeutic enzymes, in particular for treatment of certain rare inherited metabolic disorders. Be- fore applying ASR to more complex enzymes and exploring it as a method to enhance the therapeutic potential of TAL and IDS, the method was tested in a less complex model system: bacterial terpene cyclases. This chapter provides a background for the enzymes that were used in the work presented in this thesis, which are described in Chapter 6.

27 Chapter 3. From model systems to complex applications

3.1 Terpene cyclases

Terpenes and terpenoids are secondary metabolites that constitute the largest known class of natural products and carry out a wide range of functions in all forms of life [127, 128]. Their natural roles are often unknown, but are mostly related to signaling to and protection from other organisms [129]. This diverse group of natural compounds has been found to harbor anti-microbial, anti-viral and anti-cancer properties, and has attracted increasing attention over the past decades. Terpenes consist of linked isoprene units with the basic formula (C5H8)n, as initially proposed by Ruzicka in 1953 [130]. They represent mostly cyclic hydrocarbon compounds that are often decorated with functional groups, in which case they are technically referred to as terpenoids. For clarity, both terpenes and terpenoids are referred to as “terpenes” in this thesis.

Prefixes used in the nomenclature of terpenes are based on the numberofC5-units as follows: hemi- (C5), mono- (C10), sesqui- (C15), di- (C20), sester- (C25), tri- (C30), sesquar- (C35), and poly- (C5n) terpenes (Figure 6A). The formation of longer linear precursors from C5-units for subsequent cyclization is catalyzed by a group of terpene synthases called prenyltransferases. Another group of terpene synthases is formed by terpene cyclases; enzymes that perform highly complex cyclization reactions of the linear precursors, thereby contributing to the vast structural diversity of this natural product class. In 2017, more than 80,000 terpene structures had been reported in the natural product dictionary [128].

The family of terpene cyclases is typically categorized into two different classes accord- ing to their mechanism of carbocation formation for reaction initiation (Figure 6B) [128]. Class I terpene cyclases act on activated substrates—isoprenoid diphosphates— and make use of metal-assisted ionization. The substrate is positioned in the active site through coordination of the diphosphate group to a Mg2+ cluster, which in turn interacts with the highly conserved aspartate-rich motif (DDxx(x)D) and NSE triad (NDxxSxxxE) [131, 132]. The cyclization reaction is initiated by metal-assisted ioniza- tion of the diphosphate group and proceeds via a carbocation intermediate that triggers the ring closure reactions. In class II terpene cyclases, on the other hand, the substrate’s terminal alkene or epoxide group is protonated by a conserved aspartic acid, yielding the carbocation that triggers the subsequent ring closures. For both classes, the highly reactive carbocation generated during the reaction initiation propagates through the substrate and induces ring closure reactions, forming several cyclic intermediates that lead to the final product after deprotonation. Due to this mechanism, the conformation of the substrate in the active site strongly influences the reaction sequence and the pre-folded substrate usually resembles the final product structure. The tightness of the active site cavity is therefore related to the product variety of terpene cyclases, some

28 Chapter 3. From model systems to complex applications

A

Geranyl pyrophosphate (GPP)

Farnesyl pyrophosphate (FPP)

Geranylgeranyl pyrophosphate (GGPP)

Squalene

B

Mg2+ Mg2+ OPP3- R 4- Class I mechanism OPP R 2+ 2+ Mg Mg2+ Mg Mg2+

R R Class II mechanism

Figure 6: Terpene cyclases. A) Four examples of linear substrates of terpene cyclases: C10-substrate geranyl pyrophosphate (GPP), C15-substrate farnesyl pyrophosphate (FPP), C20- substrate geranylgeranyl pyrophosphate (GGPP) and C30-substrate squalene. B) Terpene cyclases can be distinguished based on their mechanism of reaction initiation: whereas the carbocation that triggers the ring closure reactions is created through metal-assisted ionization of the substrate’s diphosphate group in Class I terpene cyclases, a conserved acid protonates the substrate’s terminal alkene in Class II terpene cyclases. of which have been reported to produce over 50 different products from the same sub- strate [133]. Moreover, several studies have investigated the substrate specificity of the terpene cyclase family and have shown that certain representatives can accept multiple substrates of different lengths [134, 135]. Promiscuity in diterpene cyclase-catalyzed reactions was investigated in Paper I and II.

29 Chapter 3. From model systems to complex applications

With the increasing number of available resolved crystal structures of terpene synthases their structural homology has become apparent: they generally consist of various com- binations of distinct folds – also referred to as α, β, and γ [128]. Class I terpene cyclase activity is associated with the α-fold, which contains the active site and metal-binding motifs and is hypothesized to have evolved from enzymes with prenyltransferases ac- tivity [136]. The γ-fold likely evolved from the β-fold after gene duplication. This relationship has been suggested based on structural homology as well as 23% sequence identity between the β and γ domain in squalene hopene cyclase from Alicyclobacillus acidocaldarius [137]. The βγ-fold was then formed by insertion of the γ domain between two helices in the β domain and class II catalytic activity evolved at the domain-domain interface. Fusion of the α domain to the βγ domain gave rise to the αβγ structure, which was initially identified in taxadiene synthase [132] and is associated with both class I and class II (and sometimes dual) activity. Structural investigation of an ances- tral class I diterpene cyclases was performed in Paper II.

3.1.1 Spiroviolene synthase Discovery of new terpene synthase enzymes is continuously ongoing, as exemplified by the publication of two bacterial class I diterpene cyclases in 2017 [138]. A class I terpene cyclase from Streptomyces violens was expressed in a heterologous bacterial expression system, purified and characterized. The enzyme was found to react withC20- substrate geranylgeranyl pyrophosphate (GGPP) and the product was isolated. Using NMR spectroscopy it was identified to be tetracyclic spiroviolene and the enzyme was named spiroviolene synthase (SvS, EC 4.2.3.158) accordingly. In addition, a reaction mechanism for the formation of spiroviolene was suggested based on isotope labeling experiments in conjunction with NMR, which is shown in Chapter 6. SvS harbors the highly conserved class I terpene cyclase motifs described above: metal-binding aspar- tate rich motif DDHRCD and NSE triad NDRHSLRKE. Due to its relatively small size of approximately 41 kDa and the availability of a sufficient number of homologous sequences in GenBank, SvS was used as a starting point for ancestral reconstruction of bacterial diterpene cyclases in Paper I.

3.2 Ammonia lyases

Ammonia lyases (EC 4.3.1.X) are carbon-nitrogen lyases that release ammonia as one of the products through cleavage of a C-N bond in the substrate. Up to 32 EC classes have been defined to date, most of which are related to amino acid-derived substrates [125]. One particular group of ammonia lyases acts on aromatic amino acids, in particular L-

30 Chapter 3. From model systems to complex applications

histidine (L-His), L-phenylalanine (L-Phe), L-tyrosine (L-Tyr) and L-tryptophan [139]. The enzymes in this group are histidine ammonia-lyase (HAL, EC 4.3.1.3), phenylala- nine ammonia-lyase (PAL, EC 4.3.1.24), tyrosine ammonia-lyase (TAL, EC 4.3.1.23), and tryptophan ammonia-lyase (EC 4.3.1.31). HAL, PAL and TAL were discovered much earlier than tryptophan ammonia-lyase and have been extensively characterized. They catalyze the reversible deamination of L-His, L-Phe and L-Tyr to urocanic acid, cinnamic acid and p-coumaric acid, respectively. One thing the enzymes in this class have in common is the use of catalytic moiety 4-methylidene-imidazole-5-one (MIO, formal name: 5-methylene-3,5-dihydro-4H-imidazol-4-one), which was first identified in the crystal structure of HAL from Pseudomonas putida [140]. MIO is formed through post-translational cyclization and condensation of an alanine-serine-glycine triad in the active site—although the first serine may sometimes be a cysteine or threonine—and is required for catalytic activity. Despite extensive efforts, including tritium incorporation [141], measuring kinetic isotope effects with nitrogen and deuterium [142], and compu- tational approaches [143, 144], the exact mechanism by which the proton is abstracted from the β-carbon remains debated [125]. The two generally considered hypotheses are a Friedel-Crafts mechanism via an aromatic ring-MIO intermediate [145], or an E1-cB elimination via an amine-MIO intermediate, the latter of which is supported by structural data and activity measurements with alternative substrates.

A B C

Figure 7: Structures of aromatic amino acid ammonia-lyases. A) Bacterial HAL from Pseudomonas putida (PDB ID: 1B8F [140]), the first structure that became available for this enzyme family. B) Structure of fungal PAL from Rhodosporidium thuroloides (PDB ID: 1Y2M [13]) with the typical larger size. C) Structure of bacterial TAL from Rhodobacter sphaeroides (PDB ID: 2O6Y [146]).

31 Chapter 3. From model systems to complex applications

The structure of HAL from P. putida was also the first crystal structure that became available for this enzyme family [140] and revealed the typical tetrameric, α-helical structure that characterizes all aromatic amino acid ammonia-lyases of which a struc- ture is known, e.g. [146–148] (Figure 7). Some fungal and plant enzymes have an additional α-helical domain at the C-terminus and are therefore larger in size (Figure 7B). HALs, PALs and TALs are active as tetramers, as each monomer contains an active site in which residues from two other subunits are involved. The orientation of the α-helices and their dipole moments could be a possible indication of the E1-cB elimination mechanism [147]: six out of seven helices have their positive faces pointed towards the active site, thereby creating an overall positive charge in the active site that could stabilize the negatively charged intermediate [149].

HAL is widespread in prokaryotes and eukaryotes, including humans, and catalyzes the first reaction in the catabolic pathway of L-His. It is associated with histidinemia, an enzyme deficiency disorder analogous to PKU and HT (see Amino acid catabolism in Chapter 2), characterized by a genetic defect in the HAL gene [150]. PAL, on the other hand, has been mostly identified in fungi (Dikarya) and land plants, where it consti- tutes the first enzyme in the biosynthetic phenylpropanoid pathway [151], acting atthe interface between primary and secondary metabolism. The second enzyme in this path- way is cinnamate 4-hydroxylase (C4H) that catalyzes the hydroxylation of PAL product cinnamic acid to p-coumaric acid, which is also the product of L-Tyr deamination by TAL. Enzymes with exclusive TAL activity have thus far only been identified in bacte- ria [152–154], which might explain why few bacterial PAL have been identified. Some examples of bacterial PAL are found in soil bacteria [155] and cyanobacteria [156], the latter of which was selected as the therapeutic enzyme for enzyme substitution therapy of PKU. The phenylpropanoid pathway was initially hypothesized to have emerged ca. 500 million years ago following the transition from plants to land, which exposed them to new factors such as UV radiation and microbial attacks [157]. More recent phyloge- netic analyses, however, support the notion that the part of this pathway downstream to C4H is in fact more ancient and was acquired through horizontal gene transfer from photosynthetic algae to the ancestor of either land plants or fungi and then transferred to the other group [151]. It is hypothesized that PAL and C4H were also acquired via horizontal gene transfer, but from symbiosis with soil bacteria after the transition to land [157]. Due to the essentiality of PAL in plants, most of them harbor multiple copies. In fungi, phenylalanine catabolism is used as a carbon and nitrogen source. The few enzymes with TAL activity only that have been identified in bacteria are charac- terized by low turnover numbers as compared to PAL [154]. Their role is likely more specialized, e.g. p-coumaric acid is used as a cofactor for the photoactive yellow protein [152], providing a possible explanation for the low activities.

32 Chapter 3. From model systems to complex applications

3.2.1 PAL/TAL Another EC class of aromatic amino acid ammonia-lyases are enzymes with both PAL and TAL activity, or PAL/TALs (EC 4.3.1.25), which have been observed in fungi and monocotylic plants. The first reported dual activity was in PAL/TAL from Zea mays [158, 159] and has later been identified in several enzymes from plants and fungi [160–163]. Interestingly, these bispecific PAL/TAL enzymes often show higher activi- ties towards L-Tyr than the L-Tyr specific TALs from bacteria. Several determinants of specificity have been described, most notably the mutation of a histidine topheny- lalanine in the active site that completely switched selectivity from L-Tyr to L-Phe in the enzyme from Rhodobacter sphaeroides [146, 164]. However, it appears that these findings do not generally apply to all enzymes in this family and substrate specificity of PAL/TALs remains complex [125]. Due to the high activities towards both L-Phe and L-Tyr, two substrates that accumulate in patients with hereditary tyrosinemia, PAL/TALs caught our attention as potential therapeutic enzymes. In paper III, ASR was explored as an enzyme engineering method to improve their therapeutic potential, in particular with respect to stability.

3.3 Lysosomal sulfatases

Lysosomes are subcellular, membrane-enclosed compartments that act as the main site of intracellular digestion; hosting the degradation of various macromolecules that are targeted there [165, 166]. This degradation is performed by a wide range of hydrolytic enzymes, including proteases, nucleases, glycosidases, lipases, phosphatases and sulfa- tases. Lysosomal enzymes are soluble acid that function at low pH (pH<5), conforming with the general pH of 4.5-5 maintained by lysosomes, ensuring protec- tion of the contents of the cytosol (pH≈7.2) from its own digestive enzymes. Proteins targeted to the lysosome—both membrane-bound and soluble—are co-translationally transported into the rough endoplasmic reticulum (ER), which provides a selective en- vironment for folding and processing of complex proteins [167]. After processing in the ER the proteins continue to the Golgi apparatus, from which subsequent delivery to the lysosome occurs through receptor-dependent vesicle transport. As discussed in section Lysosomal storage diseases in Chapter 2, lysosomal enzymes are targeted to the lysosome via receptor-mediated transcytosis by the mannose-6-phosphate receptor or Limp2 receptor [102, 168, 169]. Lysosomal enzymes are synthesized with an N-terminal signal sequence of 20-25 residues that allows translocation to the lumen of the ER, where the signal sequence is removed and molecular chaperones assist in folding by a series of controlled binding and release cycles [167, 170]. These molecular chaperones also play a crucial role in processing events such as asparagine-linked (N-linked) glyco-

33 Chapter 3. From model systems to complex applications sylation, by exposing the polypeptide chain in an appropriate conformation (Figure 2). Upon folding and glycosylation in the ER, lysosomal hydrolases are transported to the Golgi apparatus for glycan processing, which consists of the removal and addition of glycan units. A critical modification for lysosomal enzymes is the addition of mannose- 6-phosphate (M6P), which binds to the M6P receptor in the Golgi, thereby initiating packaging of the enzyme in clathrin-coated vesicles for transport to the lysosome.

One particular group of lysosomal enzymes are sulfatases (EC 3.1.6.X), which cat- alyze the hydrolysis of sulfate ester bonds. Sulfatases require an additional process- ing step in the ER; namely the enzyme-catalyzed modification of a cysteine residue to formylglycine (fGly). fGly formation is a multistep redox process catalyzed by the formylglycine-generating enzyme (FGE), which binds to the unfolded polypeptide chain during translation [171] (Figure 2). fGly was first discovered in the context of multiple sulfatase deficiency, a rare lysosomal storage disorder associated with impaired function of all lysosomal sulfatases [172, 173]. The sequence motif recognized by FGE is located downstream of the cysteine that is modified and is conserved in all human sulfatases, suggesting a general binding mechanism [174]. However, conversion of the cysteine to fGly varies between sulfatases and depends on flanking sequences of approximately ten residues on either side of the conserved motif. As of today, 17 human sulfatases have been identified, less than half of which are located in the lysosome [175]. Despite rela- tively low sequence identities among the different enzymes, active site architecture and nearly all catalytic residues (including fGly) are highly conserved, suggesting a common catalytic mechanism.

3.3.1 IDS IDS, or iduronate-2-sulfatase (EC 3.1.6.13), is a lysosomal sulfatase involved in the catabolism of glycosaminoglycans (GAGs). It catalyzes the hydrolysis of the C2-sulfate ester bond at the non-reducing end of 2-O-sulfo-alpha-L-iduronic acid residues in GAGs heparan sulfate and dermatan sulfate. Lack of or reduced IDS activity due to genetic defects in the IDS gene are known as Hunter Syndrome, or Mucopolysaccharidosis type II, which was first described in 1917 by Charles Hunter (see also Lysosomal storage diseases in Chapter 2) [176]. Over 500 pathogenic mutations have been identified at 128 positions in the 550 residue-long sequence, of which the first 33 residues consti- tute an N-terminal signal sequence that is removed during and after translation [177]. The crystal structure of IDS became available in 2017 and has provided insight into the disease-causing mechanism of most pathogenic mutations [178]. It also revealed that, despite low sequence identity, IDS shares its overall fold and active site architec- ture with other human sulfatases. IDS is monomeric and consists of two subdomains, which are connected by a loop of ten residues that is susceptible to proteolytic cleav-

34 Chapter 3. From model systems to complex applications age in the lysosome and was not resolved in the crystal structure. Subdomain 1 is the heavier domain of 42 kDa and contains the catalytic site, including fGly, which is post-translationally formed from cysteine 84 in IDS. Moreover, the structure confirmed the existence of eight N-linked glycosylation sites, one of which (N31) is not retained in the enzyme after proteolytic processing, as well as a calcium ion in the active site. Due to its monomeric nature and the availability of a crystal structure, as well as sufficient homologous sequences, IDS was selected for ancestral reconstruction of a lysosomal enzyme, which was explored in Paper IV.

35

Chapter 4

Enzyme engineering

Enzyme engineering refers to efforts with the aim to generate enzymes with altered properties compared to the starting point, which can be achieved by introducing changes in the sequence either randomly or guided by some kind of rationale (or by combining both approaches). This chapter intro- duces important concepts in enzyme engineering, such as directed evolution and structure- and sequence-based rational design approaches, as well as computational and experimental methods that have been used in the work presented in this thesis.

37 Chapter 4. Enzyme engineering

4.1 Directed evolution

The most common random approach to enzyme engineering is directed evolution, which mimics the evolutionary process in a laboratory: mutations are randomly introduced to the sequence of an enzyme and the resulting mutants are screened and selected for de- sired properties [179]. What makes directed evolution an appealing method for enzyme engineering, apart from harvesting the power of evolution, is that no prior structural or mechanistic knowledge of the protein in question is required. If such knowledge is available, however, it may be combined with random mutagenesis into a semi-rational strategy where an optimized library is designed, e.g. by choosing positions at which mu- tagenesis should occur [180]. Enzyme engineering through directed evolution is usually an iterative process of several rounds of genetic diversification and selection, where mu- tations are introduced to previously selected variants to further optimize the property of interest. Random mutagenesis can be achieved in various ways, e.g. by error-prone polymerase chain reaction or use of degenerate codons for saturation mutagenesis, and generate libraries of a desired size. The latter is an example of a semi-rational ap- proach, as the positions at which these codons are introduced need to be determined beforehand. Strategies such as gene shuffling may further contribute to the complexity of the library and also bring laboratory evolution closer to natural evolution, where members of the genetic population are not always independent from each other [80, 81]. Efforts to engineer enzymes towards new chemistry are often inspired by thetheory that new catalytic activities in enzymes can evolve from weak promiscuous activities of their ancestors, and selecting the right scaffold is therefore highly important [56, 79]. Apart from the starting point, the appropriate choice of screening method (“you get what you screen for”, as posed by You and Arnold [82]) and selection stringency are key determinants of the outcome and should be carefully considered for each individual target. The throughput of the chosen screening method needs to be taken into account when designing libraries, since there is little point in increasing library size if not all variants can be screened. Another requirement is that the system must maintain the physical link between the phenotype (the property of interest) and the genotype (the corresponding DNA that codes for the amino acid sequence) to allow for proper selec- tion. An inherent weakness of the stepwise introduction of mutations as it is performed in directed evolution is that the method relies on the assumption that there is a path leading to improvement at each iteration in the process. As such, cooperative effects of combined mutations can be missed, which strongly influences the evolutionary trajec- tory, but can be overcome by gene shuffling. However, introducing many mutations at the same time exponentially increases library size, which in turn may be limited by the available screening and selection method. Moreover, a large library is likely to contain many inactive variants, as most mutations are deleterious by chance [181].

38 Chapter 4. Enzyme engineering

4.2 Rational design

Although an in-depth analysis of improved variants obtained through directed evolution may lead to a rational understanding of the enzyme’s sequence-function relationship, this knowledge is not a prerequisite for random enzyme engineering efforts. In contrast, rational design strategies employ prior knowledge of the enzyme to improve properties of interest in a hypothesis-driven way. This knowledge may be derived from the enzyme’s three-dimensional structure, homology to previously studied enzymes, or its amino acid sequence alone. Depending on the nature of the available information, computational analyses can be used [182] e.g. to identify mutational “hot-spots” [180] or predict the effect of mutations in silico.

4.2.1 Computational methods for enzyme engineering The increasing availability of protein structures obtained from high-resolution X-ray crystallography, NMR spectroscopy and cryo-electron microscopy greatly facilitates structure-based rational design. If no structure is available for the protein of interest but does exist for a homologous protein (i.e. a protein that is assumed to have evolved from the same ancestor), homology modeling can be performed. Homology modeling is based on the fact that protein structure is much more conserved than sequence, which is evident from the observation that only a limited number of folds occur in nature de- spite an unlimited sequence space [183, 184]. This relation between sequence homology and structure conservation allows for modeling of a protein’s 3D structure based on its sequence alone, using an available structure as a template, with a predictive accuracy that can be equivalent to a low resolution X-ray structure [185]. In Paper III and IV, we have used homology modeling to investigate mutations in ancestral enzymes and their possible effects on enzyme dynamics. With a model for the enzyme structure available, molecular modeling of the enzymatic reaction can be performed based on different levels of theory. In molecular mechanics (MM), the potential energy ofthe protein and ligand can be calculated using classic mechanics, where atoms are modeled as spheres and bonds as springs. The potential energy function is referred to as a force field and comprises different potential components, e.g. deviation of bond lengths and angles from their equilibrium value, as well as electrostatic interactions. In contrast, quantum mechanics (QM) allows for detailed calculations of the geometry and potential energy of transition states or intermediates along a reaction coordinate, including the bond-forming and -breaking processes that occur. Due to computational constraints, QM is typically applied to much smaller systems and may be combined with MM in hybrid QM/MM approaches, where QM is used to model the active site and the rest of the enzyme is treated using MM.

39 Chapter 4. Enzyme engineering

4.2.2 Molecular dynamics Molecular dynamics (MD) simulations can be used to analyze molecular motion over time, by combining Newton’s second law of motion with a force field of choice to as- sess the energy of the system and its components at sequential time steps. Various force fields, e.g. AMBER [186], have been specifically developed for the simulation of proteins and are implemented in different programs for MD simulations. Examples of such programs are YASARA [187] and GROMACS [188, 189], which have been used for homology modeling and MD simulations in Paper III and Paper IV, respectively. Computational power currently allows for MD simulations up to milliseconds within rea- sonable timeframes, however, due to practical constraints MD trajectories are typically run in the nanosecond (ns) range. Although enzyme-catalyzed events greatly exceed this timescale, 100-200 ns trajectories can be sufficient to observe some relevant dynam- ical effects, such as side chain rotations and minor loop motions, that may contribute to a deeper understanding of the reaction. In Paper IV we used MD simulations to investigate whether ancestral mutations in a lysosomal enzyme had significant effects on enzyme dynamics. When comparing dynamics of different enzymes, the root mean square fluctuation (RMSF) is commonly used: it represents the average fluctuation of an atom or residue over the course of the MD trajectory. Analysis of the RMSF can reveal differences in dynamics between distinct regions of the proteins or the influence of mutations in different enzyme variants.

4.2.3 Sequence-based enzyme engineering If no (homologous) structure is available for the enzyme of interest, rational engineering efforts can be guided by its sequence, or rather by a collection of sequences thatare evolutionary related. These sequences can be aligned in a multiple sequence alignment (MSA), where each column represents a position in the common ancestor that the in- cluded proteins are assumed to have evolved from [190]. The distribution of amino acids among the proteins at each position—or site-specific frequency—can provide in- sight into enzyme evolution, as beneficial mutations are selected for during evolution and are thus generally overrepresented [191, 192]. The sequence that consists of the most common amino acid in each position in an MSA is referred to as the consensus sequence and has been investigated as an enzyme engineering approach, in particular with respect to stability [193]. A weakness of the consensus approach is that it treats positions in the sequence individually and therefore does not take epistatic effects and co-evolution of positions into account. Another sequence-based method that makes use of the evolutionary information harbored in MSAs is ancestral sequence reconstruction (ASR), where sequences of putative ancestors are reconstructed based on the sequences of modern homologues [194]. In contrast to the consensus method, ASR combines the

40 Chapter 4. Enzyme engineering

site-specific frequencies with an evolutionary model to account for possible epistatic interactions and co-evolution of certain residues. In Paper I, we compared thermosta- bility and activity of ancestral terpene cyclases to the corresponding consensus enzyme from the same MSA. ASR was used in Paper I, III and IV to create novel enzymes with increased thermostability and/or activity. A more detailed discussion of ASR can be found in Chapter 5.

A rather different form of rational protein or enzyme engineering is de novo design, which aims to design proteins or enzymes from scratch [195, 196]. Although catalytic activities of such enzymes are typically low [197], they can be enhanced by directed evo- lution – another example where rational design and random mutagenesis are combined [198]. Given the complexity of the evolution of catalytic activity, in particular with respect to epistasis and long-range interactions in the enzyme [199], it is not difficult to imagine that de novo design of active catalysts is highly challenging.

4.3 Experimental methods in enzyme engineering

Regardless of the chosen strategy to introduce changes to the enzyme’s sequence, the resulting enzyme variants need to be produced in order to allow characterization of their biochemical and biophysical properties. A general experimental workflow for enzyme engineering is shown in Figure 8 and can be divided in a protein production part and a characterization part. The most commonly used techniques in the work presented in this thesis are briefly discussed in this section.

4.3.1 Production of enzyme variants Depending of their complexity, recombinant proteins may be expressed in one of many different host systems. The enzymes from Paper I, II and III, SvS and PAL/TAL, were expressed in bacterial cells, namely in E. coli BL21(DE3) [200]. Ever since it was shown in the 1970s that functional proteins could be expressed from genes cloned into a bacterial plasmid [201], E. coli (and the BL21(DE3) strain in particular) has been the most commonly used host system for the expression of recombinant proteins [202]. The main advantages of the use of this gram-negative bacteria are the ease of cultivation at high densities, which enables a cost-effective production of proteins, as well as its short doubling time. In contrast to SvS and PAL/TAL that originate from bacteria and yeast, respectively, lysosomal enzymes require post-translational modifications by other enzymes and cannot be functionally expressed in prokaryotic cells [203]. IDS enzymes in Paper IV were therefore expressed in CHO cells; one of the most commonly

41 Chapter 4. Enzyme engineering used expression systems for the production of recombinant therapeutic proteins [204]. Advantages of theses mammalian cells include their ability to grow in suspension culture, high protein yields and an established regulatory track record. One drawback with CHO cells, however, is that glycosylation patterns may vary a lot due to small differences in expression conditions and characterization of the glycans may be required [205]. Upon expression in the host cell, the protein of interest needs to be purified to allow for detailed characterization of its properties. To this end, two different affinity tags were employed in the work presented herein: His-tag (Paper I, II and III) and C-tag (Paper IV). A His-tag, or polyhistidine-tag, is an affinity tag consisting of histidine residues

(e.g. His6-tag) that is commonly used in immobilized metal affinity chromatography [206]. The strong interaction between histidine and the metal ion immobilized on the matrix (e.g. Ni2+) provides the basis for affinity purification, which can be achieved by elution at low pH or by sufficient concentrations of imidazole. A His-tag may be added at either N- or C-terminus and is widely employed in the purification of recombinant proteins. Due to their relatively small size and charge His-tags rarely affect protein activity, but may be cleaved off if a linker with a protease cleavage site is included [207]. The C-tag consist of four residues, EPEA, that are directly fused to the C-terminus of the protein and is one of the smallest affinity tags available. It was first identified ina phage display selection, where the C-terminal region of α-synuclein was found to bind to a camelid antibody [208] and is characterized by high selectivity and binding affinity [209]. Elution of C-tagged proteins from the resin can be achieved by lowering the pH (pH<3) or with high concentrations of MgCl2 at neutral pH. Affinity purification may be followed by additional purification steps, such as size-exclusion chromatography (SEC), before proceeding with characterization of the enzymes.

4.3.2 Characterization of enzyme variants Analytical SEC can be coupled to multi-angle light scattering (MALS) for characteri- zation of enzymes, in particular with respect to molecular weight and oligomerization [210]. SEC-MALS has been used in Paper II and III to determine the oligomeric nature of SvS and PAL/TAL, as well as the contribution of protein and glycan fractions to the molecular weight (MW) of the IDS variants in Paper IV. A schematic SEC-MALS profile is shown in Figure 8 (bottom right): complexes with higher molecular weights, e.g. dimers (purple), elute earlier than smaller units such as monomers (pink) and the intensity of the peak is an indication for the prevalence of the specific form. The MW of each peak, including the contributions of protein fractions and modifiers such as glycans, can be determined by MALS after elution. Another commonly used method to investigate structural aspects of proteins is circular dichroism (CD) spectroscopy, which can provide insight into secondary structural elements [211] and thermal unfold-

42 Chapter 4. Enzyme engineering

PRODUCTION

Mutations Expression Purification

CHARACTERIZATION v

T (°C) [S] t (min)

Stability Activity Structure

Figure 8: Experimental workflow in enzyme engineering. Enzyme engineering starts with the design of mutants, which is usually achieved by one of two general strategies. In rational design, previous knowledge guides the design of mutations e.g. by targeting the active site of an enzyme (shown in pink). In a random approach, mutations are introduced randomly without prior knowledge, which may result in mutations all over the structure (shown in green). The DNA of different mutants (represented by different colors green, pink and purple) can be synthesized and transformed into microorganisms such as E. coli for expression of the proteins. The enzyme variants can be obtained in purified form, e.g. by affinity purification using a tag at the N- or C-terminus of the protein that interacts with the resin of the column. After purification the enzyme variants can be characterized with respect to stability (e.g.by nanoDSF), activity (e.g. by kinetic analyses), structure (e.g. by SEC-MALS), or other biochemical and biophysical properties. ing [212]. CD is the differential absorption of left- and right-handed circularly polarized light, in the context of biological molecules by e.g. the amides in the protein backbone and aromatic side chains, which results in distinct CD spectra for secondary structure

43 Chapter 4. Enzyme engineering elements in proteins. Contributions of α-helices and β-sheets to a protein’s structure can be assessed based on its CD spectrum and can be a valuable tool when comparing enzyme variants. Moreover, measuring CD spectra at increasing temperatures can be used to analyze thermal unfolding: CD spectroscopy was used in Paper III to comple- ment thermostability analysis of ancestral PAL/TAL variants. A more commonly used method for analysis of thermostability in the work presented is nano differential scan- ning fluorometry (nanoDSF), which relies on the intrinsic fluorescence of tryptophans in the protein [213]. Due to its hydrophobic nature, tryptophan is often located in the hydrophobic core of the protein and is thus shielded from the aqueous solvent. Thermal unfolding of the protein and subsequent exposure of tryptophan residues to the solvent alters their fluorescent properties and results in a shift in wavelength of the emission peak. Emission is typically measured at 330 and 350 nm and the ratio of F350/F330 is plotted as a function of increasing temperature, resulting in curves that look similar to the ones represented in Figure 8 (bottom left). The melting temperature of a protein or enzyme is usually determined as the maximum of the first derivative of the F350/F330 versus temperature plot. Due to its ease of use and relatively high throughput, nan- oDSF has become an increasingly popular tool for screening protein or enzyme libraries [214] and was used in all studies presented in this thesis.

When it comes to enzymes, catalytic activity and kinetic parameters are often the most important aspects when characterizing different variants. The most straight-forward way to measure enzyme activity is to follow the formation of product or the loss of substrate over time, either directly or indirectly, or in a coupled assay. In some cases it may be necessary to separate the product from other reactants and depending on its physicochemical character, different methods of detection can be used, such as chro- matography/flame ionization (Paper I and II) and spectroscopy (Paper III and IV). For the detection and quantification of spiroviolene and other products of SvS enzymes in Paper I and II we used gas chromatography (GC) coupled to mass spectroscopy (GC- MS) and a flame ionization detector (GC-FID), respectively. GC is commonly usedto separate volatile compounds in a reaction and FID can be used to quantitatively mea- sure the analytes in a gas stream [215]. The retention time, the length of time between application of the sample to the column and the appearance of the signal peak in the spectrum, is typically used to identify product peaks. Because the absolute response of analytes in GC can vary a lot, product peaks can be related to an internal standard (IS) of known concentration (e.g. decane in Paper I) and a relative response factor (RRF ) can be calculated to account for the difference in detector response between the internal standard and the reaction product [216]. The concentration of product (P ) can then be calculated as follows, where c and A are the concentration and peak area, respectively:

44 Chapter 4. Enzyme engineering

AP cIS cP = (19) AIS RRF

Performing single point measurements at different time points within the linear phase of the reaction for different substrate concentrations allowed for determination of kinetic parameters for SvS and ancestral variants in Paper I and II. Because the PAL/TAL products from both L-Phe and L-Tyr absorb light of a specific wavelength, it was possible to use absorption spectroscopy to directly and continuously follow product formation during activity assays in Paper III. By measuring absorption at 280 nm and 315 nm, respectively, the concentration of cinnamic acid and p-coumaric acid could be determined using the Lambert-Beer law, the optical path length and the determined extinction coefficients of the two products:

A = εlc (20)

Measuring absorbance over time enabled determination of the slope, ∆A, which could be converted to reaction rate for subsequent kinetic analyses by calculating ∆c from Equation 20. In Paper IV, a coupled assay was used to allow for detection of IDS activity using fluorescence spectroscopy, similar to one that had previously been described [217]. As a first step, the sulfate group of an artificial substrate that mimics natural substrates heparan sulfate and dermatan sulfate was removed by IDS. The product of the IDS reaction was subsequently processed by α-L-iduronase to release 4-methylumbelliferone (4-MU), which can be detected by measuring emission at 460 nm after excitation at 350 nm. Due to the introduction of an additional enzyme in the reaction set-up, single point measurements with a fixed substrate concentration were used instead of kinetic analyses. A standard curve with defined concentrations of product 4-MU was used to convert fluorescence intensity measurements into concentration units.

45

Chapter 5

Ancestral sequence reconstruction

One of the sequence-based approaches to enzyme engineering introduced in the previous chapter was ancestral sequence reconstruction, or ASR – a bioinformatics method to statistically reconstruct sequences of ancestral enzymes. The concept of inferring ancestral polypeptide sequences based on available modern (extant) sequences was first posed in 1963 by Pauling and Zuckerkandl [218]. They predicted that it would be possible to synthe- size such reconstructed ancestral polypeptide chains in the future and to make inferences about their functions by characterizing their biochemical properties. They also highlighted the potential use of ASR to shed light on the process of evolution, in particular for cases where no fossil records are available. In the past two decades, the use of this method has increased substantially, not only in attempts to understand protein and enzyme evo- lution, but also in the search for enzymes with interesting properties. We explored the potential of ASR as an enzyme engineering strategy in Paper I-IV, which are described in detail in Chapter 6. This chapter focuses on the concept of ASR and introduces different methodologies and applications of this approach.

47 Chapter 5. Ancestral sequence reconstruction

5.1 Phylogenetic trees

Ancestral sequence reconstruction is based on phylogenetics – the study of evolutionary history and relationships among a set of extant entities (taxa). In general, these taxa may be represented at the species level, organismal level or even at the molecular level (genetic level), as long as they are homologous, i.e. they evolved from a common ances- tor. The evolutionary relationships between homologues can be captured in branching diagrams that are referred to as phylogenetic trees. The first tree-like diagram of such kind was published by Augustin Augier in 1801 and depicted his view of the natural relationships between different plant species that were known at the time [219, 220]. In Charles Darwin’s On the origin of species, the only illustration was a drawing of the tree of life, which he presented in the context of his theory of evolution [221]. In their 1965 publication, Zuckerkandl and Pauling discuss different types of molecules as the basis for a molecular phylogeny and pose that nucleic acids are the “master molecules” for phylogenetic inference [222]. When DNA sequencing first became pos- sible and later became widely accessible, it added a new dimension to the meaning and applicability of phylogenetics: its potential as a tool for understanding evolution. The near-exponential growth of available sequences in databases that continues today enables increasingly precise inferences about evolutionary history, perhaps most impor- tantly about the evolution of life on earth. The tree of life, depicting the evolution of the domains of life from the last universal common ancestor, is one of the key organiza- tion diagrams in biology [223–225]. Rather than looking at one single gene, inference of the tree of life is performed using multiple genes that are known to be highly conserved among all domains of life. This process of linking several genes together to enhance the phylogenetic signal is called concatenation and can contribute to robustness of the inferred phylogenetic tree. For reasons of simplicity, this chapter describes the case of using the sequence of a single protein or enzyme, either at the nucleic acid or amino acid level. Construction of phylogenetic trees comprises several steps: (1) selection of a set of sequences followed by (2) construction of a multiple sequence alignment (MSA), (3) optional selection of an evolutionary model that can be incorporated in the (4) construc- tion of a phylogenetic tree. Each of these steps requires some form of input or choice by the user, which underlines the hypothetical nature of this field of bioinformatics.

5.1.1 Multiple sequence alignments The first step in any attempt at phylogenetic inference is the selection of sequences to include. This is arguably the most important and challenging step, as the outcome of any phylogenetic analysis directly depends on the input. As mentioned before, the main requirement is that the selected taxa are homologous and each column in the resulting

48 Chapter 5. Ancestral sequence reconstruction

MSA corresponds to a position in the common ancestor. Homologous sequences are commonly retrieved from databases such as GenBank or NCBI using BLAST [226]. Typically, between 30-200 sequences are included for ancestral reconstruction, although increased taxon sampling generally contributes to the robustness of the process if the sequences are sufficiently close [227]. Subsequent alignment of the selected sequences can be done using a variety of methods that have been developed over the past decades [190, 228]. As it is assumed that all amino acids in one column of the alignment descend from the same residue in the common ancestor, the changes between them represent mutations that have occurred during the course of evolution. This information is used when creating the alignment; several amino acid exchange matrices have been developed that estimate likelihoods of particular amino acid substitutions, such as PAM [229] and BLOSUM [230]. Two MSA algorithms that have been used in the work presented in this thesis are MUSCLE [231] and MAFFT [232], which are both high performing algorithms for nucleotide sequences and amino acid sequences [233]. MUSCLE and MAFFT are progressive alignment algorithms: they create a so-called guide tree based on pairwise comparison of the sequences, which is then used to build the final alignment. Both algorithms belong to the current state-of-the art MSA methods and can handle large datasets. Small differences in accuracy in the inference of ancestral sequences have been found when comparing different alignment algorithms [234]. However, these effects seem to vary depending on the evolutionary scenario and especially MAFFTis consistently among the best performing algorithms. One measure of MSA accuracy is the comparison of an alignment of sequences and an alignment of the same set of sequences, yet reversed [235]. It has been shown that reversing the sequences typically does affect the alignment, which further illustrates the uncertainty that is inherent to phylogenetic inference already at the alignment level. A more recent development is the incorporation of structural or functional information in the alignment, which may contribute to alignment quality [236]. For the purpose of evolutionary analysis the alignment may be trimmed to exclude gaps or poorly aligned regions with large uncertainties or questionable homology, such as signal sequences or other highly variable regions at the N- and C-termini [237]. In order to increase reproducibility, automated trimming of the alignment can be performed with algorithms such as trimAl [238], which has been used in Paper III and IV.

5.1.2 Statistical inference Once the MSA has been created it can be used to construct a phylogenetic tree. As is the case for MSAs, a variety of tree building methods are available that are implemented in different software packages [239]. The first algorithm for the purpose of phylogenetic inference was developed in the 1970s by Fitch and was based on the principles of maxi-

49 Chapter 5. Ancestral sequence reconstruction mum parsimony [240]. The maximum parsimony method makes use of non-parametric statistics and aims to construct the tree that requires the minimum number of muta- tions to explain the observed extant sequences. A weakness of this approach is that it does not take biases in the evolutionary process into account that do exist in reality, as all changes are assumed to be equally likely. In contrast, maximum likelihood (ML) and Bayesian inference are two parametric (probabilistic) methods that make use of an explicit model for evolution among the sequences and generally give more reliable results than maximum parsimony [241]. As the name suggests, the ML approach con- structs the tree with the highest likelihood (probability), which is the probability of observing the data, i.e. the aligned extant sequences (SA), given the phylogenetic tree (PT ) and the evolutionary model (EM):

p = p (dataSA|modelPT +EM ) (21) where data are the aligned extant sequences and model includes the tree and the cho- sen evolutionary model. An ML search starts by constructing an initial tree, most commonly using the Neighbor Joining method [242], which is then optimized in terms of model parameters and branch lengths in a heuristic search procedure similar to search- ing a fitness landscape. A drawback of this method is the probability of finding alocal likelihood optimum that may or may not be the best tree, which can only be overcome by resampling. ML methods are implemented in a large variety of software packages, such as MEGA [243, 244], IQTREE [245] and PAML [246], which have been used in Paper I, III and IV. Whereas the ML method constructs a tree followed by optimization of the model parameters, Bayesian inference simultaneously estimates the topology of the tree and the parameters in the evolutionary model. A Bayesian approach aims to find the tree with the highest posterior probability from a large set of sampled trees. The posterior probability distribution is calculated according to Bayes’ theorem:

p (dataSA|modelPT +EM ) p (modelPT +EM ) p (modelPT +EM |dataSA) = (22) p (dataSA) where p (modelPT +EM |dataSA) is the posterior probability of the model (tree + evo- lutionary model) given the data (sequence alignment), p (dataSA|modelPT +EM ) is the likelihood of the tree (see Equation 21) and p (modelPT +EM ) and p (dataSA) are the prior probability distributions for the model and the data. The posterior probability density is commonly approximated by Markov chain Monte Carlo (MCMC) sampling, where a sampling chain runs for millions of cycles that visits tree and parameter space in proportion to the probability. This strategy allows Bayesian inference to sample

50 Chapter 5. Ancestral sequence reconstruction

from more complex landscapes than ML, however, this comes at a computational cost. Bayesian methods are implemented in various programs such as MrBayes [247] and BAli-Phy [248], the latter of which was used in Paper III. In contrast to the posterior probabilities provided by MCMC sampling in Bayesian inference, the confidence in ML inference can be estimated using bootstrapping, which assesses the support of specific branches in the tree by randomly resampling the data. Bootstrapping is commonly performed between 500-2000 times and a high bootstrap value supporting a clade is an indication of a strong phylogenetic signal in the data. Overall, ML and Bayesian infer- ence are highly complementary approaches that share the likelihood function as well as evolutionary models and both provide an estimate of confidence in the inference.

5.1.3 Tree rooting In order to assign a directionality to the evolutionary process in the tree, the location of the oldest node in the tree—the common ancestor—needs to be specified. This process is referred to as tree rooting and is a requirement for inference of ancestral sequences. Rooting can be performed in various ways, such as using phylogenetic knowledge, e.g. from the species tree, or by minimal ancestor deviation [249]. The former is known as the outgroup approach; an outgroup is defined as a single sequence or group of homologous sequences that diverged from the ingroup before the sequences in the ingroup diverged from each other [194]. The outgroup is thus more distantly related and the divergence point between the outgroup and the ingroup can be specified as the root of the tree. A drawback of this method is that phylogenetic knowledge is required, which is not always available. An implicit assumption when building a tree is that the sequences in it evolved in a tree-like manner, i.e. that all sequences evolved from the root and that they evolved independently along the branches of the tree. This assumption is violated if, for example, sequences resulting from horizontal gene transfer are included in the tree. Horizontal gene transfer, also known as lateral gene transfer, occurs when genetic material moves from one organism to another that is not its offspring. It is a major contributor to the evolution and functional diversification of bacteria [250], but also occurred in archaea and eukaryotic microbes. Horizontal gene transfer events can disturb the phylogenetic signal in a dataset and can be detected through comparison of a gene tree to the species tree of the included species [241] or by means of other computational methods [251].

5.1.4 Evolutionary models In addition to the tree model, i.e. its topology and branch lengths, phylogenetic in- ference methods such as ML and Bayesian inference include evolutionary models that are designed to approximate the evolutionary process by capturing major contributing

51 Chapter 5. Ancestral sequence reconstruction factors. Most software packages include selection methods to identify the evolutionary model that best fits the dataset at hand, by computing distances derived from Akaike’s Information Criterion or Bayesian Information Criterion [239]. Evolutionary models generally consist of two parts: a substitution model that accounts for probabilities of amino acid substitutions (similar to construction of an MSA) and a model that accounts for rate heterogeneity in the dataset [252]. Rate heterogeneity, or among-site rate varia- tion, is the phenomenon that amino acids at different positions in the protein sequence evolve at different rates [253]. Most data have fast evolving sites, which in some cases may lead to saturation: the occurrence of multiple substitutions per site along the evo- lutionary trajectory. Saturation disturbs the phylogenetic signal and creates obstacles for phylogenetic inference; however, this problem is generally more common for nu- cleotide sequences. Since it is impossible to know the evolving rate for each site in the alignment, rather than assigning individual sites to rate categories, another approach is used in which the rate at a specific site is randomly drawn from a specified prior distribution. The most commonly used distribution is a gamma distribution with one free parameter for shape, α, that determines the extent of rate variation and can be estimated by the model selection algorithm [254]. In some cases, a proportion of sites may be included that are assumed to be invariant. It is important to realize that, in reality, the most suitable model may change throughout the tree and even throughout the sequence as not all homologues or even parts of the protein evolved under the same evolutionary constraints. Consequently, it can be useful to build multiple trees using different models and evaluate their probabilities.

5.2 Ancestral sequence reconstruction

Once a phylogenetic tree has been constructed and rooted, it is possible to statistically infer the ancestral states at all nodes in the tree using the same statistical methods and evolutionary models as for tree construction [194]. As predicted by Pauling and Zuckerkandl in 1963 [218], such reconstructed ancestral sequences can be synthesized and the corresponding proteins or enzymes can be expressed and purified (Figure 9). Characterization of these putative ancestors enables what could be called “traveling back in time at the molecular level” [255] and is an increasingly popular approach to studying proteins or enzymes. The first experimental studies of resurrected proteins were published in the early 1990s [256, 257] and since then, many studies have been published using ASR to various ends, some of which are highlighted here.

Studies performed with ASR have been used to make inferences about the development of temperature on earth over time and generally support the hypothesis that early life

52 Chapter 5. Ancestral sequence reconstruction on earth was thermophilic [63, 255, 258, 259]. As discussed in Chapter 1, this gave rise to the question how enzymes adapted to the decreasing temperature on earth while maintaining sufficient catalytic activity. Nguyen et al. addressed this question by com- paring rate-temperature dependencies of reconstructed adenylate kinases spanning 3 billion years of evolution to those of newly evolved thermophiles within the same tree [65]. They hypothesized that enzymes evolved under an active evolutionary pressure to maintain functional catalytic rates at decreasing temperature by exploiting transition state heat capacity. The difference in heat capacity (∆C‡) between the ES complex and the transition state is a thermodynamic parameter that, if non-zero, imparts tempera- ture dependence on activation enthalpy ∆H‡ and has gained increasing attention for its role in thermal adaptation of enzymes [260, 261]. By reducing ∆C‡ over the course of evolution the temperature-dependence of catalysis decreased, which would be beneficial for enzymes operating at lower temperatures. In another ASR study, distinct determi- nants of thermostability for ancient proteins versus more recently evolved thermophiles

VLRARHGLGAEEAVVEAYRLRD VLRAAHGLGAEEAVARAYALRD VLKAAHDWDTEQAVDEAYVLRD VLARDRGLHADVAAEEATRLRD VLARDRGLPRDLAVAEATRLRD VLARHRGLPRDLAVAEAVRLRD VLAHHRGLPRDLAAVEAVRLRD VMLHHEHGDLEWALAEAYGLRD VLLHHEHGDLEEALAEAYGLRD VLIHHEGSDLEEALAEAVGLRD VMLHHEGSDLEWALAEAVGLRD VMLHHECSDLEWALAGAVGLRD

VLIHHRGSLEEWALAGAYRLKD

VLKHHEHSDLEEAVAGAYGLRD

VLLHHEGSDLEWALAGAVGLRD

Figure 9: Ancestral sequence reconstruction. A phylogenetic tree is constructed based on a multiple sequence alignment of homologous sequences, where each node represents the common ancestor of the sequences in the corresponding clade. Nodes of interest can be iden- tified, such as those marked by a star. Ancestral sequences can be synthesized and clonedinto expression vectors, expressed in a host system of choice, and purified, followed by characteri- zation of their physicochemical properties.

53 Chapter 5. Ancestral sequence reconstruction were studied in metabolic enzyme 3-isopropylmalate dehydrogenase [262]. Hypersta- bility has been observed in several reconstructed ancestral enzymes [255, 263], most prominently in 2-3 billion year old β-lactamases that displayed up to 35 °C higher melt- ing temperatures compared to their modern counterparts [264]. It should be noted that this commonly observed increased thermostability in ancestors is sometimes attributed to a bias towards the consensus sequence in the reconstruction process, in particular for ML [265, 266]. Since only extant sequences are included in the tree, beneficial mu- tations that have been selected by evolution are likely to be more prevalent which may lead to overestimation of ancestral stability [267]. This hypothesis is supported by some of the work presented in this thesis, in which ancestral enzymes were characterized that display increased thermostability despite their relatively young age. Nevertheless, the consistently high thermostability observed in studies that target ancient Precambrian nodes outperform Tm enhancements that are typically achieved by enzyme engineering [268]. Another presumed ancestral trait was observed in the aforementioned ancestral β- lactamases: enhanced substrate promiscuity. As discussed in Chapter 2, it is generally believed that ancient catalysts were more general due to the limited genetic material available [52]. Increased substrate promiscuity has been observed in several ASR stud- ies, such as one by Devamani et al., who followed the evolutionary path from esterases to hydroxynitrile lyases and found that the ancestral intermediates were promiscuous and catalyzed both reactions [269]. Similarly, reconstructed ancestors of evolutionarily related luciferases and haloalkane dehalogenases were found to have both oxidoreduc- tase and hydrolase activities [270]. Characterization of reconstructed ancestors of sugar from histidine biosynthesis showed that ancestral promiscuity can in fact per- sist for extremely long time-spans in the absence of selective pressure [271]. Increased substrate promiscuity was also observed in ancestral terpene cyclases in Paper I and further investigated in Paper II. Several studies have been published in which ASR was used to understand structure-function relationships of proteins or enzymes, e.g. with respect to binding interactions or allosteric effects. The benefit of this so-called vertical approach—in contrast to the horizontal approach of comparing modern homologues—is that the differences between sequences are smaller, which facilitates the identification of key residues responsible for specific properties. In one example, an inversion ofal- losteric effect between the two subunits of was observed when comparing the enzyme from the last bacterial common ancestor to its modern counter- part. Reconstructing several evolutionary intermediates along the lineage enabled the identification of four key residues responsible for the inversion and, combined withMD simulations, revealed the underlying mechanism [272, 273]. In another example, ASR was used to shed light on blockbuster drug Gleevec’s 3000-fold higher affinity towards one tyrosine kinase over a structurally similar one [274]. Similarly, intermediates along the evolutionary path between the two kinases were reconstructed and characterized.

54 Chapter 5. Ancestral sequence reconstruction

Experiments revealed the development of a hydrogen bond network that prevented a loop from closing over the active site in the kinase with low inhibition by the drug, which confirmed previous hypotheses regarding the role of flexibility in determining Gleevec affinity [275].

A common concern with respect to ASR is whether or not the putative ancestors display similar characteristics as the true ancestral proteins. As stated by Williams et al., “any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method” [265]. Using a benchmark phylogeny created by laboratory- based evolution it has been shown that a good approximation in terms of phenotype can be achieved [276]. Construction of ancestral libraries that sample near-ancestor sequence ensembles showed mixed results, but offers a way of assessing prediction fidelity of ASR in general [277]. Overall, one might think of ancestral reconstruction as a way to generate hypotheses, rather than obtaining answers. Even if putative ancestral sequences are not in fact close to the original sequence of the enzyme that once existed, they may still be attractive scaffolds for protein engineering efforts. As pointed outby Risso and Sanchez-Ruiz, the degree of phenotypic correspondence to the true ancestor is less relevant from the engineering perspective, when the goal is to obtain proteins with evolvable properties [278].

5.2.1 ASR for enzyme engineering The potential of ASR as an approach to protein or enzyme engineering, in particular by combining ancestral scaffolds with further engineering efforts such as directed evolution, has received increasing attention [279–282]. As pointed out in the work by Nguyen et al., the evolution of enzymes under an active pressure to maintain catalytic activity combined with a passive drift away from thermostability may allow for identification of highly active and stable variants [65]. Especially the commonly observed increase in thermostability may contribute to scaffold evolvability, as the protein can tolerate more beneficial but destabilizing mutations before it falls below the stability threshold. By comparing reconstructed ancestors at subsequent nodes along an evolutionary lineage, also known as the vertical approach, specific stabilizing mutations can be identified that further contribute to protein stability [283]. In addition to being beneficial for evolvability, increased stability may also improve of protein drugs [284] and is likely to improve shelf-life as well. As of today, ASR has been scarcely ap- plied to therapeutic proteins. One example is the engineering of coagulation factor VIII with improved expression and decreased decay rates towards treatment of hemophilia A, which could allow for lower doses in gene therapy experiments [285]. The aim of the work presented in this thesis is to explore and establish the application of ASR as a tool for enzyme engineering, in particular with a focus on therapeutic enzymes.

55

Chapter 6

Present investigation

The aim of the work presented in this thesis is to explore the potential of ancestral sequence reconstruction as an enzyme engineering approach, focusing on therapeutic enzymes. We chose diterpene cyclases as model en- zymes for a method development project presented in Paper I, in which we obtained stable scaffolds with increased substrate scope. The ancestral ter- pene cyclases served as the basis for structure elucidation and engineering of substrate specificity in Paper II. In Paper III, we applied ASR topheny- lalanine/tyrosine ammonia-lyases with the aim to enhance their therapeu- tic properties towards complementary treatment of hereditary tyrosinemia. Finally, we turned our attention to lysosomal enzymes; highly complex en- zymes that require several post-translational modifications by other enzymes as well as successful targeting to the cell and the lysosome. In Paper IV, we applied ASR to iduronate-2-sulfatese: the lysosomal enzyme associated with Hunter Syndrome, a rare lysosomal storage diseases.

57 Chapter 6. Present investigation

Paper I - Ancestral diterpene cyclases show increased ther- mostability and substrate acceptance

Bacterial terpene cyclases are versatile biocatalysts that produce multicyclic build- ing blocks [286], which have been considered in the context of biomaterials, biofuels, medicines and synthetic biology [287, 288]. Narrow substrate scope and insufficient stability have been mentioned as potential weaknesses of biocatalysts [289, 290] and we hypothesized that ASR could be a successful strategy to improve enzyme stability and extend the substrate scope of terpene cyclases. Moreover, ancestral reconstruction of terpene cyclases constitutes an opportunity to expand the terpene cyclase sequence space and to investigate the potential of this strategy in obtaining scaffolds for further enzyme engineering. To this end, we selected spiroviolene synthase (SvS): a newly dis- covered class I diterpene cyclase from Streptomyces violens that catalyzes the formation of tetracyclic spiroviolene from GGPP (Figure 10) [138].

A maximum likelihood tree with 29 prokaryotic homologues was constructed and the most likely ancestral sequences at three nodes within the subtree of Streptomyces were chosen for further characterization (Figure 11A). The ancestral sequences differed from

Figure 10: Suggested mechanism for the reaction catalyzed by SvS. Spiroviolene synthase catalyzes the cyclization of linear GGPP to tetracyclic spiroviolene. Mechanism as suggested by [138], figure contains material taken from [291] with permission of the journal.

58 Chapter 6. Present investigation

A Streptomyces violens

A3 Streptomyces ochraceiscleroticus Streptomyces sclerotialus Streptomyces ambofaciens Streptomyces canus A2 Streptomyces sp. M1013 OMI85307.1 Streptomyces aureofaciens Streptomyces sp. NRRL WC -3795 A1 Streptomyces sp. CB02414 Streptomyces aurianticus Streptomyces caatingaensis Streptomyces aidingensis Streptomyces rimosus Streptomyces yerevanensis Streptomyces chartreusis

0.20

B 78 89 100 SvS S.violens LWVYWGFAFDDHRCDNGPFSNRP SvS-A1 LWVYWGFAFDDARCDNGPLSTRP SvS-A2 LWVYWGFAFDDARCDNGPLSTRP SvS-A3 LWVYWGFAFDDNRCDNGPFSTRP SvS-consensus LWVYWGFAFDDARCDSGPLSSRP *********** ***.**:*.** Figure 11: Ancestral reconstruction of diterpene cyclases. A) A subtree of the full ML tree, showing only the Streptomyces species and ancestral nodes A1-A3. B) A section of the MSA of modern SvS, the three ancestors and the consensus sequence. The aspartate-rich motif is highlighted in grey and non-conserved position 89 in pink. Numbering is according to SvS from S. violens. Figure adapted from [291] with permission of the journal. modern SvS by 6% (SvS-A3), 23% (SvS-A2), and 24% (SvS-A1) and were estimated to be between 0.04-1.7 million years old based on available calibration points [292]. Upon inspection of the reconstructed sequences, we noticed an ancestral mutation in the con- served aspartate-rich motif typical for class I terpene cyclases (see Terpene cyclases in

59 Chapter 6. Present investigation

Chapter 3): whereas modern SvS contained motif DDHRCD, the histidine was replaced by asparagine (SvS-A3) and alanine (SvS-A2 and SvS-A1) (Figure 11B). As changes in this metal-binding motif had previously been reported to affect product specificity [128], we were interested in investigating the effect of the ancestral mutations in this position and included the original ancestral sequences as well as single point mutants SvS-A1_A89H, SvS-A3_A89H and SvS-A3_N89H. In addition, we were interested in comparing the ancestral reconstruction approach to the consensus method, which is commonly used to enhance protein stability (see section Sequence-based enzyme engi- neering in Chapter 4) [193]. We therefore constructed a consensus sequence of all 12 sequences that descend from SvS-A1, which differed from SvS from S. violens by 28% in sequence identity and also contained an alanine at position 89.

Not only could all ancestral sequences be successfully expressed in E. coli BL21(DE3), but a 2- to 4-fold increase in yield of soluble protein was observed for the two oldest ancestors. Perhaps unexpectedly, the consensus enzyme was challenging to express under the same conditions, resulting in substantially lower yields in repeated expres- sion attempts. Moreover, we noticed that SvS-A1 and SvS-A2 could be stored at 4 °C for several weeks longer and could be frozen and thawed without precipitation as opposed to modern SvS and SvS-A3 (unpublished data). These observations indicated enhanced protein stability in the two oldest ancestors [293], which we aimed to con- firm by determining the melting temperatures of all enzymes using nanoDSF andDSF measurements with fluorescent dye. Indeed, we found that melting temperatures for

SvS-A1 and SvS-A2 were 7 °C and 13 °C higher than for modern SvS (Tm = 57 °C), respectively. Interestingly, the consensus enzyme also displayed increased thermosta- bility, with a Tm of 63 °C, similar to SvS-A1. In order to investigate possible ancestral promiscuity in the enzymes, initial activity measurements were performed with three different substrates (Figure 6A). Analysis by GC/MS showed that all three ancestors and mutants formed products from both C20-substrate GGPP and C15-substrate FPP, but none of the enzymes reacted with C10-substrate GPP. SvS from S. violens only appeared to be active towards GGPP and no activity towards any of the substrates could be detected for the consensus sequence. The activities were further investigated in quantitative competition experiments, in which GGPP and FPP were mixed in the same reaction in different ratios. Although all ancestral enzymes preferred GGPPas their main substrate, we found relative kcat/KM ratios for GGPP over FPP as low as eight (SvS-A2) and six (SvS-A3), indicating considerable substrate promiscuity. Mu- tations A89H and N89H in the aspartate-rich motif only resulted in small effects on relative activity and promiscuity, most notably a decrease of the latter. The fact that this mutation did not completely abolish promiscuity in the ancestors underlines the importance of the ancestral background for substrate selectivity.

60 Chapter 6. Present investigation

A SvS S. violens 0,012

0,010

)

1 - 0,008 20 °C 18 °C 0,006 14 °C 0,004 12 °C

Reaction rate (s rate Reaction 0,002 0,000 0 20 40 60 80 [GGPP] (μM) B SvS-A2 0,006

0,005

)

1 - 0,004 30 °C 25 °C 0,003 20 °C 0,002 12 °C

Reaction rate (s rate Reaction 0,001

0,000 0 20 40 60 80 100 120 [GGPP] (μM)

Figure 12: Kinetic analyses of modern SvS and SvS-A2. A) Initial reaction rates as a function of GGPP concentration at four different temperatures for SvS from S. violens. B) Initial reaction rates as a function of GGPP concentration at four different temperatures for SvS-A2. Dashed lines represent the fits that were used to determine kcat/KM (modern SvS) and kcat and KM (SvS-A2). Error bars are standard deviations with n=3. Figure adapted from [291] with permission of the journal.

Due to its high thermostability, SvS-A2 was selected for more detailed kinetic and ther- modynamic analyses alongside SvS from S. violens (Figure 12 and Figure 13). Figure 12 shows the initial reaction rates for both enzymes as a function of GGPP concentration and at different temperatures; whereas modern SvS could not be saturated, SvS-A2 showed Michaelis-Menten-like kinetic behavior. This finding is in line with the hypoth-

61 Chapter 6. Present investigation

1/T (K-1) 0,0032 0,0033 0,0034 0,0035 0,0036 -23,8

-24,0

-24,2

M

∗ ൗ

ℎ -24,4

B

cat

푘 푘

-24,6 푙푛

-24,8

-25,0 SvS S. violens

-25,2 SvS-A2

Figure 13: Thermodynamic analyses of modern SvS and SvS-A2. Linear fits were calculated according to transition state theory and are described by y = −2862.9x − 14.9 for modern SvS (R2 = 0.93) and y = −4062.1x − 10.6 (R2 = 0.99) for SvS-A2. Error bars are standard deviations with n=3. Figure adapted from [291] with permission of the journal.

esis that an increase in turnover number was prioritized at the expense of KM during the course of evolution [59]. The generally low catalytic rates and efficiencies support the notion that enzymes operating in secondary metabolism are typically under a weaker selection pressure to increase their catalytic rate compared to enzymes associated with primary metabolism, and are comparable to those of other terpene cyclases [128]. We also found that the 13 °C increase in melting temperature of SvS-A2 was accompanied by an increased temperature optimum for catalysis of ca. 10 °C. Moreover, its apparent kcat/KM at optimum temperature was nearly two-fold higher than that of SvS from S. violens. Based on (quasi)thermodynamic analyses (see Thermodynamics of enzymatic reactions in Chapter 1), the enthalpy and entropy of activation were calculated for both enzymes (Figure 13). For modern SvS and SvS-A2, ∆H‡ and ∆S‡ were calculated to be ca. 6 kcal mol−1 and -30 cal mol−1 K−1, and 8 kcal mol−1 and -21 cal mol−1 K−1, respectively. As Gibbs free energy of activation is described by ∆G‡ = ∆H‡ − T ∆S‡ (Equation 2) and assuming both enzymes operate at their optimum temperature, SvS- A2 would benefit more from a high entropy of activation than modern SvS. Aligned ‡ with this notion, the difference in entropy of activation (T ∆SvS,A2−SvS,S.violens∆S ) was calculated to ca. 2.5 kcal mol−1 (at 20 °C) and ca. 2.6 kcal mol−1 (at 30 °C), i.e. an increase in temperature is beneficial for SvS-A2. The corresponding difference in

62 Chapter 6. Present investigation

‡ −1 enthalpy of activation (∆SvS,A2−SvS,S.violens∆H ) was calculated to ca. 2 kcal mol , which is in accordance with the theory that decreasing ∆H‡ is beneficial in the process of adaptation to lower temperatures, as discussed in Chapter 1.

Concluding remarks The aim of this study was to establish the use of ASR in our laboratory and to in- vestigate the utility of the method in obtaining biocatalytic scaffolds with interesting properties. In summary, we successfully implemented ASR on an emerging family of biosynthetic enzymes for the first time and obtained ancestral diterpene cyclases with enhanced stability, activity and promiscuity that may serve as scaffolds for further en- gineering efforts. Kinetic and thermodynamic analyses illustrated different concepts in evolution, especially with respect to temperature-dependence of catalysis. Moreover, the observed increased yield as well as general robustness and stability of the two old- est ancestors are promising properties that could facilitate crystallization attempts, as described in Paper II for SvS-A2. These overall highly encouraging results demonstrate the potential of ASR as a means to explore sequence space of enzymes and enhance their stability, activity and substrate acceptance. Furthermore, the comparison of ancestral enzymes to the consensus enzyme highlights the strength of ASR as an evolutionary approach to enzyme engineering.

Paper II - Structure elucidation and engineering substrate speci- ficity of an ancestral diterpene cyclase

The lack of structural information in Paper I made it difficult to obtain a detailed under- standing of the SvS enzymes and their biochemical characteristics. Modern SvS from S. violens was challenging to obtain in large enough quantities for structural studies, due to issues with solubility and susceptibility to aggregation. The observed increase in solubility, but also increased melting temperature and long-term stability of SvS-A2 as compared to modern SvS encouraged us to select this ancestral enzyme as a candi- date for attempting structure elucidation. This strategy had previously been suggested [281] and applied to bacterial pyruvate decarboxylases [294], as well as mammalian cytochrome P450 enzymes [295] and flavin monooxygenases [296], for which no extant structures were available.

After expression and purification of SvS-A2, we were able to obtain a metal-free, un- liganded crystal structure at 2.30 Å resolution (deposited as PDB ID: 6TBD) of the ancestral enzyme. SvS-A2 crystallized as a homodimer (Figure 14), which was consis- tent with the observed molecular weight of the enzyme in solution during SEC-MALS

63 Chapter 6. Present investigation

Figure 14: Crystal structure of SvS-A2. The crystal structure of SvS-A2 was determined to 2.30 Å resolution (PDB ID: 6TBD). The enzyme crystallized as a dimer and a top view is shown with one monomer colored in grey and the other in pink. SvS-A2 adopts a typical terpene cyclase α-fold and consists of 11 antiparallel helices that are arranged in a circular fashion (indicated by letters). experiments (MW of main peak: 79 kDa, theoretical MW of monomer ca. 41 kDa). Modern SvS from S. violens was also determined to form a dimer in solution, indi- cating that the reconstructed ancestral enzyme retained its oligomeric nature despite 23% difference in sequence identity and several ancestral mutations in the dimerization interface. Electron densities were unresolved at the N- and C-terminus (residues 1-8 and 360-361, respectively) as well as for two loops in the sequence (R234-G240 and V312-A328). SvS-A2 adopts the typical terpene cyclase α-fold (see Terpene cyclases in Chapter 3) and is made up of 11 antiparallel α-helices that are arranged in a cir- cular fashion (Figure 14). Helices B, C, G, H and K form an inner circle within the α-fold and harbor the active site, including the conserved DDxx(x)D-motif on helix C (DDARCD) as well as the NSE-motif on helix H (NDRHSLRKE). The RY dimer (PRYLSL) is located in an unresolved segment at the C-terminus of helix K. These arginine and tyrosine residues, which have been found to be conserved in bacterial ter- pene cyclases and have been described to form hydrogen bonds with the substrate’s diphosphate group [286]. Finally, an effector motif (RLLSA) on the break of helixG had previously been identified in selinadiene synthase and is hypothesized to mediate an induced-fit mechanism upon substrate binding [131]. Ancestral mutations inSvS-

64 Chapter 6. Present investigation

A2 are scattered all over the protein, mostly on its surface, and are shown in orange in Figure 15. Based on the available crystal structure, a model was built in which the unresolved electron densities as well as the tri-metal Mg2+ cluster were modeled and GGPP could be docked into the active site. Inspection of this model revealed that the upper wall of the active site contracted with respect to the crystal structure, mediated by the metal binding motifs on helix C and H.

Figure 15: Ancestral mutations on the structure of SvS-A2. Positions where ancestral mutations occur are highlighted in orange on monomeric SVS-A2 (PDB ID: 6TBD) as seen from the side (left) and the top (right). The DDxx(x)D motif on helix C is shown in light green, the resolved part of the NSE triad on helix H is shown in dark green and the effector motif on the break of helix G is shown in purple. Unresolved parts in the sequence are indicated by dashed lines.

Concluding remarks Overall, the results in this study highlight the possibilities of obtaining ancestral en- zymes with increased solubility and stability, which makes them amenable for struc- ture elucidation. Moreover, ancestral enzyme variants can serve as robust scaffolds for structure-based rational engineering of substrate specificity and activity, which may also provide mechanistic insights.

65 Chapter 6. Present investigation

Paper III - Exploring the therapeutic potential of modern and ancestral phenylalanine/tyrosine ammonia-lyases as supple- mentary treatment of hereditary tyrosinemia

As described in the section Amino acid catabolism in Chapter 2, hereditary tyrosine- mia (HT) is a group of rare genetic enzyme deficiency disorders related to different enzymes involved in L-tyrosine (L-Tyr) catabolism (Figure 5). Patients with HT type I have a defect in the gene coding for fumarylacetoacetate hydrolase, which leads to the accumulation of two toxic upstream intermediates, maleylacetoacetate and fumary- lacetoacetate. HT type I patients are currently treated with Nitisinone (NTBC), which inhibits an upstream enzyme and thereby effectively mimics the phenotype of HT type III [113]. HT type II and type III are characterized by accumulation of L-Tyr and L-phenylalanine (L-Phe), as L-Phe catabolism proceeds via the same pathway (Figure 5), which can currently only be controlled by a protein-restricted diet. Despite treat- ment with Nitisinone and conforming to dietary restrictions, patients with HT type I have been shown to be at risk of presenting developmental delay and impaired cogni- tive function which may be due to elevated L-Tyr levels [297]. Moreover, high levels of L-Phe are also associated with developmental delay and intellectual impairment in the context of phenylketonuria (PKU), the enzyme deficiency disorder related to L-Phe catabolism (Figure 5) [111].

Due to challenges related to the deficient enzyme in PKU, alternative pathways of L-Phe degradation have been explored and enzyme substitution therapy with recombi- nant phenylalanine ammonia-lyase (PAL) from bacteria was approved by the FDA in 2018 [122, 124]. This approach inspired us to explore a similar route for HT type I, where enzyme substitution therapy could be complementary to treatment with Nitisi- none. We initially turned our attention to bacterial TAL, as they are the only L-Tyr −1 specific ammonia-lyases known to date and kcat values of 27 s had been reported [152]. However, upon characterization of activity of TAL from Rhodobacter capsulatus, −1 we measured kcat values of <0.1 s (unpublished data), which had also been described by other groups [153]. Since several bispecific PAL/TALs have higher reported activity towards L-Tyr than bacterial TAL, we decided to explore these enzymes as poten- tial therapeutic enzymes (Figure 16). We selected fungal PAL/TALs from Rhodotorula glutinis (RgPAL) [163] and Trichosporon cutaneum (TcPAL) [161, 298] for further inves- tigation, due to their relatively high activity towards L-Tyr. Apart from higher L-Tyr activity than bacterial TAL, an advantage of this enzyme class is that both L-Phe and L-Tyr are cleared, which accumulate in patients with HT. We hypothesized that ASR could be used to improve therapeutic properties of these enzymes, in particular enzyme stability and relaxed substrate specificity, which we had observed in Paper I.

66 Chapter 6. Present investigation

PAL + NH3

TAL + NH3

Figure 16: Reactions catalyzed by PAL/TAL. Bispecific PAL/TAL enzymes catalyze the deamination of L-Phenylalanine and L-Tyrosine to cinnamic acid and p-coumaric acid, respectively.

RgPAL and TcPAL were used as query sequences to construct a phylogenetic ML tree that included 68 fungal PAL/TALs (Figure 17A). We reconstructed putative ances- tral sequences at four nodes between the two modern reference enzymes using PAML (PAML_A1-PAML_A4) and, in order to explore the robustness of the reconstruction, reconstructed ancestral sequences at two of the four nodes using MEGA (MEGA_A1 and MEGA_A3). The ancestral sequences differed from RgPAL by 13–34% in sequence identity, but all known catalytic residues were conserved, as well as the Ala-Ser-Gly triad that forms catalytic moiety MIO. Interestingly, an investigation of the localiza- tions of ancestral mutations using a homology model of MEGA_A1 revealed that the interface between monomers was not conserved (Figure 17B). All enzymes except for PAML_A4 could be successfully expressed in E. coli and SEC-MALS experiments demonstrated that all ancestors retained their ability to form tetramers, despite the non-conserved interfaces. SEC-MALS experiments also showed that some ancestral enzymes, including MEGA_A1 and MEGA_A3, were less prone to form larger aggre- gates as compared to RgPAL. Initial activity measurements showed that all expressed enzymes were active with both L-Phe and L-Tyr, albeit less active compared to both RgPAL and TcPAL. Moreover, PAML_A3 displayed extremely low activity and was therefore excluded from further kinetic analyses (Table 1). Kinetics assays confirmed that activities of all ancestors were indeed reduced as compared to the modern enzymes, of which TcPAL displayed highest turnover numbers for both substrates. Moreover, co- operative kinetics were observed for TcPAL, which was in line with previous studies [161, 298]. MEGA_A1 and MEGA_A3 displayed the highest activities among the ancestors and altered PAL/TAL specificity ratios, which were slightly in favor of L-Tyr.

67 Chapter 6. Present investigation

A Rhodotorula glu�nis A1 A2 Rhodotorula taiwanensis Rhodotorula turoloides A3 Rhodotorula graminis Microbotryum lychnidisdioicae Microbotryum saponariae A4 Microbotryum intermedium Leucosporidium crea�nivorum Trichosporon asahii Cutaneotrichosporon oleaginosum Trichosporon cutaneum Mixia osmundae Melampsora laricipopulina Puccinia striiformis Puccinia coronata Puccinia tri�cina Us�lagomyco�na Agaricomyco�na

Ascomycota 0.3 B

Figure 17: Ancestral reconstruction of PAL/TAL from Fungi. A) Maximum Like- lihood tree of fungal PAL/TALs. The three subphyla from Basidiomycota are colored green (Agaricomycotina), grey (Ustilagomycotina) and purple (Pucciniamycotina). Outgroup As- comycota is shown in black, ancestral nodes are indicated by diamonds and the two query sequences are printed in bold and color. B) Homology model of MEGA_A1 showing the dis- tribution of ancestral mutations. Ancestral mutations are colored in green in monomer A, and surface mutations within 4 Å of another monomer are shown as spheres. Catalytically active group MIO is shown in yellow for monomer A. The model was built in YASARA using the crystal structure of PAL/TAL from Rhodotorula turoloides (PDB ID: 1Y2M) as a template. Figure adapted from [299] with permission of the journal.

68 Chapter 6. Present investigation

Table 1: Kinetic analyses of PAL/TAL enzymes with L-Phe and L-Tyr. Assays were performed at pH 9.2 at 37 °C. Standard deviations are shown with n=3, errors may occur due to rounding.

L-Phe L-Tyr a a kcat KM kcat/KM kcat KM kcat/KM [kcat/KM]P he −1 −1 −1 −1 −1 −1 (s )(mM)(M s )(s )(mM)(M s ) [kcat/KM]T yr RgPAL 9.0 ± 0.4 1.4 ± 0.1 6655 ± 259 1.7 ± 0.1 0.3 ± 0.0 6303 ± 288 1.1 TcPAL 13.2 ± 0.4 0.1 ± 0.0 −a 2.2 ± 0.3 0.1 ± 0.1 −a −a MEGA_A1 5.8 ± 0.3 3.8 ± 1.0 1560 ± 326 1.1 ± 0.1 0.4 ± 0.0 2643 ± 312 0.6 MEGA_A3 1.4 ± 0.0 0.4 ± 0.0 4069 ± 356 0.2 ± 0.0 0.1 ± 0.0 4509 ± 34 0.9 PAML_A1 0.3 ± 0.40 4.5 ± 0.6 68 ± 8 <0.1 2.7 ± 0.0 17 ± 1 4.1 PAML_A2 <0.1 0.7 ± 0.2 42 ± 7 −b −b −b −b a ′ For TcPAL, K is shown instead of KM due to cooperativity. bFor PAML_A2, activity towards L-Tyr was not sufficient for kinetic analysis.

Thermal unfolding of all enzymes was monitored using nanoDSF and circular dichroism (CD) spectroscopy and showed that TcPAL and RgPAL had the lowest melting tem- peratures of 62 °C and 66 °C, respectively. All ancestral enzymes displayed increased

thermostability, with the highest Tm being 79 °C for PAML_A3. Interestingly, for the ancestors reconstructed in PAML stability increased with each step back in the tree,

accompanied by a decrease in catalytic activity. The Tm of MEGA_A1 was determined

to be 71 °C and MEGA_A3 appeared to be even more stable in CD experiments (no Tm could be determined, likely due to an additional tryptophan at the N-terminus masking the nanoDSF signal). Further efforts to assess the enzymes’ stability were madeby incubating the two most active ancestral enzymes, MEGA_A1 and MEGA_A3, and RgPAL at 60 °C for 30 and 60 minutes, followed by activity measurements (Figure 18A). Results showed that, whereas RgPAL lost nearly all its activity after 60 min- utes, MEGA_A1 and MEGA_A3 retained 63% and 64% of their activity, respectively. MEGA_A1 was identified as the ancestral enzyme with highest therapeutic potential and further experiments were performed with a focus on therapeutic properties such as shelf-life and means of administration.

Several ways of administration could be envisioned in the case of therapeutic PAL/TALs. The first, and perhaps more obvious, is oral administration before or during meals aimed at clearing L-Phe and L-Tyr in the gastrointestinal tract before it is absorbed. An al- ternative approach would entail subcutaneous or intravenous injection, with the aim to reduce blood plasma levels of both amino acids. In order to explore the former, RgPAL and MEGA_A1 were incubated with pepsin and trypsin – digestive enzymes from the stomach and the duodenum. Activity measurements and SDS-PAGE analyses showed that both enzymes were susceptible to degradation by pepsin and trypsin, as well as inactivation by low pH (2.0), highlighting the challenges involved in this administration

69 Chapter 6. Present investigation

A 100 30 min 60 min 80

60

40

Relative activity(%) Relative 20

0 RgPAL MEGA_A1 MEGA_A3

B 100 RgPAL TcPAL

80 MEGA_A1 (%)

60 activity 40

Relative Relative 20

0 2 weeks 4 weeks 4 weeks 37 °C 37 °C 25 °C

Figure 18: Stability of PAL/TALs. A) Analysis of heat resistance of RgPAL, MEGA_A1 and MEGA_A3: relative activity towards L-Phe after incubation at 60 °C. Activity was measured at 37 °C and normalized to control samples that had been stored at 25 °C. Standard deviations are shown with n = 3. B) Analysis of long-term stability of RgPAL, TcPAL and MEGA_A1: relative activity towards L-Phe after incubation at 25 °C and 37 °C. Activity was measured at 37 °C and normalized to control samples that had been stored at −80 °C. Figure adapted from [299] with permission of the journal. route. As for the second option, we set out to test whether RgPAL and MEGA_A1 would be active in serum at pH 7.5, slightly below the optimum for PAL/TALs. We spiked calf serum with L-Tyr levels that were similar to those reported in patients [112], and measured L-Tyr and L-Phe concentrations in samples with varying enzyme

70 Chapter 6. Present investigation

concentrations. Both RgPAL and MEGA_A1 were indeed active in serum: all L-Phe was cleared from samples with 2000 nM RgPAL and MEGA_A1, whereas L-Tyr was reduced by 95% and 83%, respectively, indicating the potential of intravenous admin- istration. In order to assess their long-term stability, RgPAL, TcPAL and MEGA_A1 were incubated at 25 °C and 37 °C for 2–4 weeks, followed by evaluation of stability as well as L-Phe activity (Figure 18B). Results showed that MEGA_A1 performed better than both modern enzymes, retaining up to 52% of activity after 4 weeks at 37 °C. In- terestingly, despite its lower melting temperature, TcPAL was significantly more stable than RgPAL, which lost activity completely after 2 weeks at 37 °C.

Concluding remarks After successful application of ASR to model enzymes in Paper I, we aimed to explore the potential of the method for engineering therapeutic enzymes in this work. Despite the oligomeric nature of PAL/TAL enzymes, putative ancestors could be functionally expressed in E. coli and SEC-MALS analysis showed that they retained their ability to oligomerize. Moreover, some ancestral enzymes were less prone to form larger ag- gregates compared to RgPAL, which is beneficial for biotherapeutic proteins from the perspective of immunogenicity [97]. Overall, we obtained ancestral PAL/TAL enzymes with decreased catalytic turnover, but considerably enhanced stability as compared to two modern fungal enzymes. This stability increase was not only reflected in higher melting temperatures, but also in enhanced long-term stability at 37 °C and increased resistance to heat inactivation of some ancestors. All of these are indications of an increased shelf-life as well as potentially increased stability in the body, which are both highly attractive features from a therapeutic perspective. Despite the decrease in catalytic turnover, the ancestral variants displayed promising properties that could make them potential scaffolds amenable for further engineering towards complementary treatment of tyrosinemia. Based on these results, we believe that ASR can be a viable strategy to enhance therapeutic properties of complex multimeric enzymes.

71 Chapter 6. Present investigation

Paper IV - Ancestral lysosomal enzymes with increased activ- ity harbor therapeutic potential for treatment of Hunter syn- drome

After successful application of ASR with a focus on therapeutic enzymes in Paper III, we selected a more complex system: lysosomal sulfatases. As discussed in Lysosomal sulfa- tases in Chapter 3, this enzyme family requires multiple modifications by other enzymes in the ER, such as conversion of a cysteine to formylglycine (fGly) by the formylglycine- generating enzyme (FGE) and N-linked glycosylation at various sites. We decided to focus on iduronate-2-sulfatase (IDS), which is associated with MPS II (Hunter syn- drome): a genetic defect in the IDS gene, causing impaired or depleted function of the enzyme. ERT with the recombinant human enzyme is currently available for MPS II in two forms: Idursulfase (Elaprase®) and Idursulfase beta (Hunterase®). Both enzymes are effective in treating somatic symptoms of the disease, but do not reach therapeutic levels of activity in the brain, thus not alleviating all symptoms. Apart from biodis- tribution, dosing of ERT constitutes a challenge as required doses are generally high, resulting in lengthy intravenous administration procedures and costly production of the drugs [102, 103]. Overall, a significant unmet medical need remains, even for diseases such as MPS II where ERT is available. Therefore, increasing stability or activity of human IDS would be highly attractive, as both properties are directly proportional to the required dose to achieve a desired therapeutic effect. A previous study on aryl- sulfatase A, another lysosomal sulfatase, reported that the catalytic efficiency of the human enzyme could be enhanced by a few amino acid substitutions that occurred in the evolution of rodents [300]. This so-called “evolutionary redesign” was inspired by the observation that arylsulfatase A from mouse displayed higher activity than its human homologue. These results encouraged us to explore IDS sequence space within the tree of mammalian IDS between primates and rodents. We selected recombinant human IDS (rhIDS) and recombinant murine IDS (rmIDS) as reference enzymes and starting points to construct an ML tree of 30 homologues (Figure 19A). Three nodes were selected for further investigation, going back from the modern human enzyme to the last common ancestor of primates and rodents. Only 20 out of 550 residues dif- fered between the oldest ancestor IDS-A3 and rhIDS, demonstrating the high degree of sequence conservation in this enzyme family. Interestingly, no ancestral mutations occurred in any of the 128 positions where disease-causing mutations had been reported [178]. As shown in Figure 19B, the ancestral mutations were scattered all over the struc- ture, but occurred mostly on the surface. Only two mutations, A354T and H356R, were within 10 Å from catalytic residues and were therefore investigated further by including mutants rhIDS_A354T, rhIDS_H356R and rhIDS_A354T_H356R in the study.

72 Chapter 6. Present investigation

A B

Figure 19: Ancestral reconstruction of IDS. A) Maximum Likelihood tree of 30 mam- malian IDS sequences. The colored clades are primates (blue) and rodents (grey) and reference enzymes rhIDS and rmIDS are colored accordingly. B) All 20 ancestral mutations are colored in pink in the structure of rhIDS (PDB ID: 5FQL [178]). Residues in the active site are shown as orange sticks, glycans are shown as cyan sticks and the calcium ion in the active site is shown as a light blue sphere. Figure composed of material from submitted manuscript.

All enzymes could be successfully expressed in ExpiCHO cells and activity measure- ments showed that all enzymes were active (Figure 20A). Little difference in activity was found between rhIDS and rmIDS, yet, ancestral enzymes IDS-A2 and IDS-A3 showed up to 2.5-fold higher activity than rhIDS. However, after correction for the de- termined fGly content in the enzyme batches, it seemed possible that this increase in activity actually reflected an increased percentage of active enzyme. To investigate this further, the two modern enzymes and the three ancestors were co-expressed with FGE from ExpiCHO (Cricetulus griseus, CgFGE), followed by activity measurements and determination of fGly content (Figure 20B). Co-expression with CgFGE increased activ- ity of rhIDS by four to five-fold, which was higher than Idursulfase that was included in the assay for comparison. Interestingly, a similar trend was observed for IDS-A2 and IDS-A3, with the latter showing up to 2-fold higher activity than rhIDS and Idur- sulfase. All enzymes that were co-expressed with CgFGE had fGly contents close to 100%, which supported the hypothesis that the ancestral enzymes were in fact more active. Analysis of thermostability using nanoDSF showed little differences in melting

temperatures, however, all enzymes with lower Tm values than rhIDS contained muta- tion A354T: one of two mutations close to the active site of rhIDS, which was predicted in IDS-A3. Based on these observations a new variant of IDS-A3 with reversed mu-

73 Chapter 6. Present investigation

A 2000

1600

1200

800

Activity(U/mg) 8% 400 5% 7% 3% 4% 0

B 2000 97% 96% 1600 98%

1200 97% (U/mg) 800

Activity 400 8%

0

Figure 20: Activity and fGly content of IDS enzymes. A) Specific activities of IDS enzymes, error bars show the standard deviations from 3-12 replicates. The percentage of formylglycine as determined by mass spectrometry is shown above the bars for the respec- tive variants. B) Specific activities of IDS enzymes co-expressed with FGE from C. griseus (CgFGE). Average activities are shown for two independent transfection experiments for each variant and error bars show the standard deviations from 6-12 replicates. The percentage of formylglycine as determined by mass spectrometry is shown above the bars for the respective variants. Figure composed of material from submitted manuscript.

74 Chapter 6. Present investigation

tation was designed (IDS-A3_T354A) that was determined to be slightly more stable than rhIDS, suggesting that mutation A354T is destabilizing. rhIDS, IDS-A3, and IDS- A3_T354A were expressed in ExpiCHO, both with and without human FGE (HsFGE). Upon expression without FGE, activities of IDS-A3 and IDS-A3_T354A were similar and both higher than activity of rhIDS. Upon co-expression with HsFGE, both IDS-A3 variants were again more active than rhIDS, however, IDS-A3 displayed higher activity than IDS-A3_T354A. The in vitro data encouraged us to further characterize the activ- ity of the two IDS-A3 variants as compared to rhIDS and Idursulfase, which was done by incubating MPS II patient fibroblasts with the IDS enzymes, followed by analysis of intracellular enzyme concentration and consumed substrate (Figure 21). Figure 21A and Figure 21B show that intracellular concentrations of IDS-A3 and IDS-A3_T354A after 24 hours were substantially lower than those of all other enzymes, but that these two variants clear more substrate than rhIDS, IDS-A1 and IDS-A2. Despite lower ac- tivities in the in vitro assay, IDS-A3 and IDS-A3_T354A that were not co-expressed with FGE cleared similar amounts of substrate as Idursulfase (Figure 21C). Moreover, they cleared more substrate than rhIDS, which was in accordance with in vitro data. Interestingly, the two IDS-A3 variants that were co-expressed with HsFGE cleared sub- stantially more substrate than rhIDS (+ HsFGE) and Idursulfase (Figure 21D). The observed increase in activity of IDS-A3 and IDS-A3_T354A is highly attractive from a therapeutic perspective as it may allow for lower dosage, which in turn decreases the cost of production and administration time. If dosage is not adjusted, the increased levels of enzyme in the circulation can potentially improve distribution in the body, which may address the remaining unmet medical need.

Despite their increased activity, IDS-A3 and IDS-A3_T354A were the two enzymes that displayed the lowest intracellular concentration in patient fibroblasts after 24 hours, which could be due to decreased cellular uptake. Uptake of lysosomal enzymes to the cell is a receptor- mediated process that could be disturbed by general changes to the protein shape or surface, such as altered charge distribution, or by changes to the glycans on the enzyme surface. Both could be caused either directly or indirectly by ancestral mutations that occur in IDS-A3 and not in IDS-A2, as the latter showed no decreased intracellular concentration. In order to explore the possibility that the glycans were altered in IDS-A3 and IDS-A3_T354A, a series of experiments were conducted. SEC- MALS analysis showed no significant differences in molecular weight of either protein fraction or glycan fraction between the IDS-A3 variants and rhIDS. Furthermore, all three enzymes were found to successfully bind to the mannose-6-phosphate receptor with highly similar association and dissociation patterns. A more detailed investigation of the glycosylation patterns of all enzymes was performed using a RapiFluor assay, which yielded glycan profiles based on cleavage and subsequent fluorescent labelling

75 Chapter 6. Present investigation

A B )

400 ) 180 pM 350 pM 160 300 140 120 250 100 200 80 150 60 100 40

50 20 Intracellular concentration ( concentration Intracellular 0 ( concentration Intracellular 0 0,0 0,5 1,0 1,5 2,0 0,00 0,05 0,10 0,15 0,20 Added concentration (nM) Added concentration (nM)

C D

100 100

UA(S) (%) UA(S)

UA(S) (%) UA(S) -

50 - 50

GlcNAc

GlcNAc

- -

0 0 Consumed UA Consumed -50 ConsumedUA -50 1 10 100 1 000 1 10 100 1 000 Intracellular concentration (pM) Intracellular concentration (pM)

Figure 21: Intracellular concentrations of IDS variants and degradation of hep- aran sulfate in MPS II patient fibroblasts. A) Intracellular concentrations of enzymes expressed without HsFGE 24 hours after treatment. B) Intracellular concentrations of en- zymes expressed with HsFGE 24 hours after treatment. Different concentrations were used compared to A) due to differences in activity of one order of magnitude. C) Consumed sub- strate UA-GlcNAc-UA(S) (as compared to internal standard chondroitin disaccharide ∆di-4S sodium) as a function of intracellular enzyme concentration for enzymes expressed without HsFGE. D) Consumed substrate UA-GlcNAc-UA(S) (as compared to the internal standard) as a function of intracellular enzyme concentration for enzymes expressed with HsFGE. Figure composed of material from submitted manuscript. 76 Chapter 6. Present investigation

of the glycans. The composition of the glycan profiles was compared, but no obvious differences among any of the enzymes we produced could be observed. In conclusion, the experiments regarding glycan composition did not provide a satisfactory explanation for the observed difference in intracellular concentration. Finally, MD simulations were performed with rhIDS and a homology model of IDS-A3_T354A. Root mean square fluctuations per residue were analyzed over a trajectory of 100 ns and were highly similar for both enzymes, thus not providing any obvious explanation to differences in activity or intracellular activity. Another potential explanation for the observed low intracellular concentrations of IDS-A3 variants is a faster clearance from the cell, which is contradicted by the increased clearance of substrate.

Concluding remarks The aim of this study was to apply ASR to lysosomal enzymes, a family of highly complex enzymes that act in concert with other enzymes in the cell and require multiple co- and post-translational modifications. As we previously obtained ancestral enzymes with increased activity and stability, we hypothesized that this approach could be a way to address the remaining unmet medical need in the field of lysosomal storage diseases, since both properties are directly related to dosage. In summary, we obtained two ancestral IDS variants that displayed higher in vitro and ex vivo activity than rhIDS and Idursulfase, the enzyme that is currently available as ERT. These results are highly encouraging and could potentially allow for lower dosage, which is associated with many benefits regarding administration, therapeutic effect and production cost. Moreover, the results further underline the potential of ASR as an approach to enzyme engineering, even in the context of complex therapeutic enzymes.

77 Chapter 6. Present investigation

Concluding remarks & Future perspectives

The work presented in this thesis aims to establish the use of ancestral sequence recon- struction as a method for enzyme engineering, with the goal to obtain enzymes with enhanced therapeutic potential. As commented by Lazarus and Scheiflinger, “Turn- ing native proteins into therapeutics is challenging, but enhancing their pharmaceutical properties to create next-generation drugs is even more difficult” [301]. This is arguably even more true for enzymes, whose catalytic activity often depends on a variety of complex short- and long-range interactions that can easily be disrupted and are diffi- cult to understand, let alone predict. Throughout this thesis, different approaches to enzyme engineering have been described, ranging from random mutagenesis to rational design – each with their strengths and weaknesses. Ancestral sequence reconstruction can be envisioned as a combination of both: no structural knowledge of the enzyme is required, yet, mutations are not introduced randomly, but rather guided by the diver- sity in sequence space that nature has selected for over the course of evolution. The myriad of enzymes that can be found in nature today are the result of billions of years of evolution; a fascinating and complex process of continuous genetic diversification and selection that was initially proposed by Darwin in 1859. Just over a century later, Pauling and Zuckerkandl predicted that it would one day be possible to statistically re- construct and characterize the intermediates of this evolutionary process that no longer exist. The development of algorithms for phylogenetic inference in the 1970s combined with advancements in molecular biology allowed these predictions to come true and opened the door to various applications of ancestral sequence reconstruction. Since the first experimental studies of resurrected ancestral proteins in the 1990s, ASRhas yielded valuable insight into enzyme evolution, protein structure-function relationships and even the evolution of life on earth. Yet, to our knowledge, the method has been scarcely applied in the context of biopharmaceuticals.

Before applying ASR to therapeutic enzymes, we set out to establish the method in a model system. To this end, we selected diterpene cyclases and in Paper I we de- scribed the first reported application of ASR on this family of biosynthetic enzymes. We reconstructed ancestors of bacterial class I diterpene cyclase SvS and obtained ances- tral enzymes with increased thermostability, activity and an extended substrate scope. Interestingly, a consensus enzyme from the same sequence alignment also displayed increased thermostability, but appeared to be inactive. To our knowledge, this com- parison has not been reported previously and shows the strength of an evolutionary approach to enzyme engineering. The generally increased robustness and stability of one ancestral diterpene cyclase in particular, alongside a considerably increased yield of soluble protein, inspired us to attempt structure determination by X-ray crystallog-

78 Chapter 6. Present investigation

raphy. In Paper II, we describe successful crystallization of the ancestral diterpene cyclase, which allowed for structure-based rational engineering of substrate-specificity.

Encouraged by the successful application of ASR to terpene cyclases, we turned our attention towards therapeutically relevant enzymes in Paper III and IV. We envisioned that bispecific phenylalanine/tyrosine ammonia-lyases could be potential therapeutic enzymes for complementary treatment of hereditary , as patients with this rare metabolic disorder accumulate both amino acids. We reconstructed and characterized several ancestral PAL/TALs and found that, despite compromised activity, some ancestral enzymes showed substantially increased thermostability and long-term stability. Both are indications of an increased shelf-life and possibly an in- creased stability in the body, which are beneficial for a potential drug. Moreover, this study demonstrated the successful application of ASR to oligomeric enzymes, which had been scarcely reported. Finally, in Paper IV we aimed to apply ASR to lysoso- mal sulfatases – a complex enzyme family that requires multiple modifications by other enzymes. We obtained ancestral variants of IDS with increased in vitro and ex vivo activity compared to the modern human enzyme and Idursulfase, the enzyme that is currently commercially available. Increased activity could allow for lower dosage of a po- tential drug, which in turn leads to decreased administration time and production costs.

Overall, the work presented in this thesis highlights the potential of ASR as an ap- proach to enzyme engineering. Due to the inherent uncertainty of the method, it is generally advisable to view it as a means to generate hypotheses rather than providing clear answers. Yet, the work presented herein demonstrates that ancestral enzymes do not have to be perfect to be useful. Although the potential of generating stable and evolvable scaffolds for enzyme engineering through ASR has been previously recognized, few examples have been published to date, in particular towards biopharmaceutical ap- plications. The results in Paper I-IV strengthen the notion that ASR is a viable strategy to obtain enzyme scaffolds with interesting properties, which can outperform modern enzymes in various aspects. It is likely, as well as exciting, that this bioinformatics method will become more widely recognized in the future and that it will be used to generate novel enzymes towards various applications.

79

Acknowledgements

After four years and a few months, the time has finally come to write everyone’s favorite part ofthe thesis. I am extremely grateful to many people and will try to include as many as possible without adding another ten pages to this book.

First and foremost, the research in this thesis has been funded by the Swedish Foundation for Strategic Research (SSF) and Swedish Orphan Biovitrum AB (Sobi). I would like to thank Sobi and Gålöstiftelsen for financially supporting my attendance at conferences and workshops abroad. Finally, I am very grateful to Christofer, Per-Olof, Erik, Johan, Sebastian, Gaspar, Björn, Marina, Lotta and Karen for their input in this thesis.

Erik, thank you for your valuable support and positivity as my Sobi supervisor throughout this entire process. It has really been a rollercoaster ride, but somehow you always managed to convince me that everything was going to be fine. Your sarcastic emails and funny comments never failed to put asmile on my face and I am very happy I got the chance to work with you for four years.

Per-Olof, thank you for your support and valuable scientific input in the work - I am very happy that you introduced me to ASR and that we managed to do nice things with it. Thank you for creating a group of talented scientists, which also happened to be wonderful colleagues.

Johan, thank you for joining in as a co-supervisor and for all the support that you provided. I really appreciate your positivity and not-so-Swedish directness (always with a smile) and I am very happy that you were a part of my time as a PhD student.

I also want to thank the students I have had the pleasure to supervise during my PhD studies: thank you Gwenaëlle and Lauren for being part of the SvS project. Thank you Karin and Albin for choosing to do your degree projects with me and for your great work in the PAL/TAL project. I am very proud of what we have achieved together and I am grateful for all the nice moments we shared – inside and outside the lab.

To my current and former academic colleagues, in particular from our group: thank you for everything! Lotta, thank you for bringing this opportunity to my attention and guiding me through the first months in the lab. Thank you for a million other things (I’ll send you a list), but most importantly: thank you for being an amazing friend. Björn, thank you for being a wonderful colleague and friend that always puts a smile on my face. Also, thank you for all the nice coffees, lunches and dinners together, which I am sure we will continue to have. Arne, thank you for always inviting me for beers (I don’t like beer), for the emotional support, but mostly all the fun times we had. Karen, thank you for the nice collaboration on the SvS project, the lovely lunches and fikas (in real life and on zoom), andfor your wisdom and support in all kinds of matters. To my travel buddies Arne, Karen and David H.: thank you for the nice moments we shared in Whistler and Brno. Thank you Antonino, Wissam, Patricia, Caroline, and Sissi for the time we shared together.

Thanks to all other wonderful colleagues from Gamma 5 for creating a nice and supportive work atmosphere: the Hudson lab, the Jonas lab, Anders Olsson and the DDDP. Thank you David

81 Chapter 6. Acknowledgements

J., Kristina, Markus, and Kiyan for all the nice lunches and other activities.

Thank you to my former colleagues and friends from Plan3 in AlbaNova for always making me feel welcome and at home. Sebastian, I am so happy that we kind of shared our PhD journeys and I am proud of our scientific baby. Thank you for always watering my plants, for our medium-sized hikes, for countless breakfasts and fikas, and for being an amazing friend for life. Thank youtothe Rockgroup for adopting me and thank you Niklas for your help with the IDS project.

To my Sobi colleagues, in particular from the former RTS team: thank you!!! You have all contributed immensely to the great experience that I have had as an industrial PhD student, both with your expertise and your kindness. Especially in difficult times, the support I felt from all of youmademe feel much better and I consider myself very lucky to have shared this journey with you.

Thank you Stephen, Patrik and Bent for being supportive in tough times, for managing a group of wonderful people and scientists, and for always being interested in my work. Thank you former BSPI colleagues Peter, Erik, Christina, Stefan, Anna, Su and Micke, and new CTS colleagues Lisa Nina for your interest and input in my projects, but mostly for the nice time we spent together. Thank you, Stefan and Anna, for being so involved in the projects and for your scientific and non-scientific support throughout the years.

Thank you to the former Protein Engineering team for all the support and nice lunches. Thank you Monica, Johanna and Anna for all the help with protein expression. Robban, thank you for helping me when I was lost in the beginning and for always being there for me with your calm, positive attitude. Most importantly, thank you for always discussing Eurovision with me (#Rotterdam2021). Du är bäst! Dominik, danke für Alles!! Thank you for patiently explaining the same thing five times in the lab, thank you for your support and input in the projects and paper, and thank you for all the fun conversations about the politics of Liechtenstein.

Thank you to all other dear colleagues and co-authors Sergei, Chao, Ulrica, Tommy, Agneta, Åsa, and Jenny. Chao, thank you for taking the time to share your enzyme wisdom with me and for always praising my (super basic) Chinese. Sergei, thank you for teaching me about CD, Russian history, opera and many other things. You have been asking me since day one if I was writing my thesis already - well, I finally did!

A lot of the support throughout this journey has come from people outside of work. Thank you to my Stockholm friends (I would even say my Stockholm family) for making my time here amazing. Whether you are still here or have moved away, you have all contributed to what I believe to be the happiest time of my life and I will always cherish the beautiful memories we created together. David, muchísimas gracias por tu apoyo durante estos años. Thank you to my friends in Nederland for always making me feel like no time has passed when we meet and for continuing to invest time and effort in our relationships. A special thank you to you, Marina, for being the best friend I could ask for. After six years of long-distance friendship we are stronger than ever and you continue to inspire and support me in the best possible way. To Mama, Papa and Vivi: dank jullie wel voor alles ♡. Your love and support has been and will always be a motivation for me to be the best version of myself and I am incredibly grateful that I got to share this journey with you.

Looking back at this experience I mostly consider myself very lucky. Not everyone gets the chance to work with projects they love and with such amazing people, both in academia and in industry. I dare say I have seen the good and the bad things that come with both sides, but despite the challenges (and an ongoing pandemic) I honestly feel that I could not have asked for a better experience as a PhD student.

82 Bibliography

1. Editorial. Expanding biocatalysis for a sustainable future. Nature Catalysis 3, 179–180 (2020). 2. Prather, K. L. J. Accelerating and expanding nature to address its greatest challenges. Nature Catalysis 3, 181–183 (2020). 3. SDGs. https://www.un.org/sustainabledevelopment/sustainable-development-goals/. Accessed on 2020-08-13. 4. Wolfenden, R. & Snider, M. J. The depth of chemical time and the power of enzymes as catalysts. Accounts of Chemical Research 34, 938–945 (2001). 5. Copeland, R. A. in Enzymes: a practical introduction to structure, mechanism, and data analysis (ed Copeland, R. A.) 2nd ed., 1–10 (Wiley, 2000). 6. Kumar, S. S. & Abdulhameed, S. in Bioresources and Bioprocess in Biotechnology (eds Sugathan, S., Pradeep, N. & Abdulhameed, S.) 45–73 (Springer, 2017). 7. Fischer, E. Einfluss der Configuration auf die Wirkung der Enzyme. Berichte der deutschen chemischen Gesellschaft 27, 2985–2993 (1894). 8. Buchner, E. Alkoholische Gährung ohne Hefezellen. Berichte der deutschen chemischen Gesellschaft 30, 117–124 (1897). 9. Bertrand, G. Sur 1’intervention du manganese dans les oxydations provoquees par la laccase. Comptes rendus de l’Académie des Sciences 124, 1032–1035 (1897). 10. Sumner, J. B. The Isolation and Crystallization of the Enzyme Urease: Preliminary Paper. Journal of Biological Chemistry 69, 435–441 (1926). 11. Blake, C. C. et al. Structure of hen egg-white lysozyme: A three-dimensional Fourier synthesis at 2 Å resolution. Nature 206, 757–761 (1965). 12. Marcos, E. et al. Principles for designing proteins with cavities formed by curved β sheets. Science 355, 201–206 (2017). 13. Wang, L. et al. Structure-based chemical modification strategy for enzyme replacement treat- ment of phenylketonuria. Molecular Genetics and Metabolism 86, 134–140 (2005). 14. Anfinsen, C. B. Principles that Govern the Folding of Protein Chains. Science 181, 223–230 (1973). 15. Bugg, T. D. in Introduction to Enzyme and Coenzyme Chemistry (ed Bugg, T. D.) 2nd ed., 8–24 (Blackwell Publishing, 2004). 16. Henzler-Wildman, K. A. et al. A hierarchy of timescales in protein dynamics is linked to enzyme catalysis. Nature 450, 913–918 (2007). 17. Werner, T., Morris, M. B., Dastmalchi, S. & Church, W. B. Structural modelling and dynamics of proteins for insights into drug interactions. Advanced Drug Delivery Reviews 64, 323–343 (2012). 18. Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007). 19. Kamerlin, S. C. L. & Warshel, A. At the dawn of the 21st century: Is dynamics the missing link for understanding enzyme catalysis? Proteins: Structure, Function, and Bioinformatics 78, 1339–1375 (2010). 20. Koshland, D. E. Application of a Theory of Enzyme Specificity to Protein Synthesis. Proceedings of the National Academy of Sciences 44, 98–104 (1958). 21. Monod, J., Wyman, J. & Changeux, J.-P. On the nature of allosteric transitions: A plausible model. J Mol Biol 12, 88–118 (1965). 22. Rubin, M. M. & Changeux, J.-P. On the nature of allosteric transitions: Implications of non- exclusive ligand binding. J Mol Biol 21, 265–274 (1966). 23. Kokkinidis, M., Glykos, N. M. & Fadouloglou, V. E. in Advances in protein chemistry and structural biology 181–218 (Elsevier, 2012).

83 Bibliography

24. Wodak, S. J. et al. Allostery in Its Many Disguises: From Theory to Applications. Structure 27, 566–578 (2019). 25. Tokuriki, N. & Tawfik, D. S. Protein Dynamism and Evolvability. Science 324, 203–207 (2009). 26. Palmer, A. G. Enzyme Dynamics from NMR Spectroscopy. Accounts of Chemical Research 48, 457–465 (2015). 27. Gibbs, J. W. A method of geometrical representation of the thermodynamic properties of substances by means of surfaces. Transactions of the Connecticut Academy, 382–404 (1873). 28. Eyring, H. & Stearn, A. E. The application of the theory of absolute reaction rates to proteins. Chemical Reviews 24, 253–270 (1939). 29. Arrhenius, S. Über die Dissociationswärme und den Einfluss der Temperatur auf den Dissocia- tionsgrad der Elektrolyte. Zeitschrift für Physikalische Chemie 4, 96–116 (1889). 30. Arrhenius, S. Über die Reaktionsgeschwindigkeit bei der Inversion von Rohrzucker durch Säuren. Zeitschrift für Physikalische Chemie 4, 226–248 (1889). 31. Garcia-Viloca, M. How enzymes work: Analysis by modern rate theory and computer simula- tions. Science 303, 186–195 (2004). 32. Pauling, L. Molecular architecture and biological reactions. Chemical And Engineering News 24, 1375–1377 (1946). 33. Copeland, R. A. in Enzymes: a practical introduction to structure, mechanism, and data analysis (ed Copeland, R. A.) 2nd ed., 109–145 (Wiley, 2000). 34. Brown, A. J. XXXVI. - Enzyme action 1902. 35. Henri, V. Lois générales de l’action des diastases. A. Hermann (Paris) (1903). 36. Michaelis, L., Menten, M. L., Goody, R. S. & Johnson, K. A. Die Kinetik der Invertinwirkung/ The kinetics of invertase action. Biochemistry 49, 333–369 (1913). 37. Briggs, G. E. & Haldane, J. B. S. A Note on the Kinetics of Enzyme Action. Biochemical Journal 19, 338–339 (1925). 38. Hill, A. V. The possible effects of the aggregation of the molecules of haemoglobin onits dissociation curves. Proceedings of the Physiological Society (1910). 39. Woese, C. The universal ancestor. Proceedings of the National Academy of Sciences 95, 6854– 6859 (1998). 40. Tokuriki, N. et al. Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nature Communications 3, 1257 (2012). 41. Goldsmith, M. & Tawfik, D. S. Enzyme engineering: reaching the maximal catalytic efficiency peak. Current Opinion in Structural Biology 47, 140–150 (2017). 42. Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Science 25, 1204–1218 (2016). 43. Miton, C. M. & Tokuriki, N. How mutational epistasis impairs predictability in protein evolu- tion and design. Protein Science 25, 1260–1272 (2016). 44. Starr, T. N., Picton, L. K. & Thornton, J. W. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549, 409–413 (2017). 45. Sailer, Z. R. & Harms, M. J. Molecular ensembles make evolution unpredictable. Proceedings of the National Academy of Sciences, 11938–11943 (2017). 46. Kaltenbach, M. et al. Evolution of chalcone from a noncatalytic ancestor. Nature Chemical Biology 14, 548–555 (2018). 47. Clifton, B. E. et al. Evolution of cyclohexadienyl from an ancestral solute-binding protein. Nature Chemical Biology 14, 542–547 (2018). 48. Campbell, E. et al. The role of protein dynamics in the evolution of new enzyme function. Nature Chemical Biology 12, 944–950 (2016). 49. Adrain, C. & Freeman, M. New lives for old: evolution of pseudoenzyme function illustrated by iRhoms. Nature Reviews Molecular Cell Biology 13, 489–498 (2012). 50. Todd, A. E., Orengo, C. A. & Thornton, J. M. Sequence and Structural Differences between Enzyme and Nonenzyme Homologs. Structure 10, 1435–1451 (2002). 51. Sharir-Ivry, A. & Xia, Y. Nature of long-range evolutionary constraint in enzymes: Insights from comparison to pseudoenzymes with similar structures. Molecular Biology and Evolution 35, 2597–2606 (2018). 52. Jensen, R. A. Enzyme recruitment in evolution of new function. Annual Review of Microbiology 30, 409–425 (1976). 53. Khersonsky, O. & Tawfik, D. S. : A mechanistic and evolutionary perspec- tive. Annual Review of Biochemistry 79, 471–505 (2010). 54. Gerlt, J. A. & Babbitt, P. C. Divergent evolution of enzymatic function: Mechanistically diverse superfamilies and functionally distinct suprafamilies. Annual Review of Biochemistry 70, 209– 246 (2001).

84 Bibliography

55. Baier, F., Copp, J. N. & Tokuriki, N. Evolution of enzyme superfamilies: Comprehensive ex- ploration of sequence-function relationships. Biochemistry 55, 6375–6388 (2016). 56. Renata, H., Wang, Z. J. & Arnold, F. H. Expanding the enzyme universe: Accessing non-natural reactions by mechanism-guided directed evolution. Angewandte Chemie - International Edition 54, 3351–3367 (2015). 57. .Newton, M. S., Arcus, V. L., Gerth, M. L. & Patrick, W. M. Enzyme evolution: innovation is easy, optimization is complicated. Current Opinion in Structural Biology 48, 110–116 (2018). 58. Davidi, D., Longo, L. M., Jabłońska, J., Milo, R. & Tawfik, D. S. A Bird’s-Eye View of Enzyme Evolution: Chemical, Physicochemical, and Physiological Considerations. Chemical Reviews 118, 8786–8797 (2018). 59. Bar-Even, A., Milo, R., Noor, E. & Tawfik, D. S. The Moderately Efficient Enzyme: Futile Encounters and Enzyme Floppiness. Biochemistry 54, 4969–4977 (2015). 60. Bar-Even, A. et al. The moderately efficient enzyme: Evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410 (2011). 61. Vieille, C. & Zeikus, G. J. Hyperthermophilic enzymes: Sources, uses, and molecular mecha- nisms for thermostability. Microbiology and Molecular Biology Reviews 65, 1–43 (2001). 62. Ciccarelli, F. D. et al. Toward Automatic Reconstruction of a Highly Resolved Tree of Life. Science 311, 1283–1287 (2006). 63. Akanuma, S. et al. Experimental evidence for the thermophilicity of ancestral life. Proceedings of the National Academy of Sciences 110, 11067–11072 (2013). 64. Weiss, M. C. et al. The physiology and habitat of the last universal common ancestor. Nature Microbiology 1, 16116 (2016). 65. Nguyen, V. et al. Evolutionary drivers of thermoadaptation in enzyme catalysis. Science 355, 289–294 (2016). 66. Stockbridge, R. B., Lewis, C. A., Yuan, Y. & Wolfenden, R. Impact of temperature on the time required for the establishment of primordial biochemistry, and for the evolution of enzymes. Proceedings of the National Academy of Sciences 107, 22102–22105 (2010). 67. Wolfenden, R. Primordial chemistry and enzyme evolution in a warm environment. Cellular and molecular life sciences 71, 2909–2915 (2014). 68. Elias, M., Wieczorek, G., Rosenne, S. & Tawfik, D. S. The universality of enzymatic rate– temperature dependency. Trends in Biochemical Sciences 39, 1–7 (2014). 69. Somero, G. N. Temperature as a selective factor in protein evolution: The adaptational strategy of “compromise”. Journal of Experimental Zoology 194, 175–188 (1975). 70. Miller, S. R. An appraisal of the enzyme stability-activity trade-off. Evolution 71, 1876–1887 (2017). 71. Serdar, C. M., Gibson, D. T., Munnecke, D. M. & Lancaster, J. H. Plasmid Involvement in Parathion Hydrolysis by Pseudomonas diminuta. eng. Applied and environmental microbiology 44, 246–249 (1982). 72. Seffernick, J. L. & Wackett, L. P. Rapid evolution of bacterial catabolic enzymes: A casestudy with chlorohydrolase. Biochemistry 40, 12747–12753 (2001). 73. Wright, G. D. Bacterial resistance to antibiotics: Enzymatic degradation and modification. Advanced Drug Delivery Reviews 57, 1451–1470 (2005). 74. Egorov, A. M., Ulyashova, M. M. & Rubtsova, M. Y. Bacterial enzymes and antibiotic resistance. eng. Acta naturae 10, 33–48 (2018). 75. Yoshida, S. et al. A bacterium that degrades and assimilates poly(ethylene terephthalate). Science 351, 1196–1199 (2016). 76. Winter, G., Fersht, A. R., Wilkinson, A. J., Zoller, M. & Smith, M. Redesigning enzyme structure by site-directed mutagenesis: tyrosyl tRNA synthetase and ATP binding. Nature 299, 756–758 (1982). 77. Eigen, M. & Gardiner, W. Evolutionary molecular engineering based on RNA replication. Pure and Applied Chemistry 56, 967–978 (1984). 78. Chen, K. & Arnold, F. H. Tuning the activity of an enzyme for unusual environments: Sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proceedings of the National Academy of Sciences 90, 5618–5622 (1993). 79. Arnold, F. H. Directed evolution: Bringing new chemistry to life. Angewandte Chemie - Inter- national Edition 57, 4143–4148 (2018). 80. Stemmer, W. P. C. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994). 81. Stemmer, W. P. DNA shuffling by random fragmentation and reassembly: in vitro recombina- tion for molecular evolution. Proceedings of the National Academy of Sciences 91, 10747–10751 (1994). 82. You, L. & Arnold, F. H. Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide. eng. Protein engineering 9, 77–83 (1996).

85 Bibliography

83. Goeddel, D. V. et al. Expression in Escherichia coli of chemically synthesized genes for human insulin. Proceedings of the National Academy of Sciences 76, 106–110 (1979). 84. Vellard, M. The enzyme as drug: application of enzymes as pharmaceuticals. Current Opinion in Biotechnology 14, 444–450 (2003). 85. Abuchowski, A., van Es, T., Palczuk, N. C. & Davis, F. F. Alteration of immunological prop- erties of bovine serum albumin by covalent attachment of polyethylene glycol. The Journal of biological chemistry 252 (ed Abuchowski, A.) 3578–3581 (1977). 86. Harris, J. M., Martin, N. E. & Modi, M. Pegylation. Clinical Pharmacokinetics 40, 539–551 (2001). 87. Lusher, J. M. Recombinant Clotting Factors. BioDrugs 13, 289–298 (2000). 88. Kunamneni, A., Ogaugwu, C. & Goli, D. in Enzymes in Human and Animal Nutrition (eds Nunes, C. S., Kumar, V. B. T. E. i. H. & Nutrition, A.) 301–312 (Academic Press, 2018). 89. Taipa, M. Â., Fernandes, P. & de Carvalho, C. C. C. R. in Therapeutic Enzymes: Function and Clinical Implications (ed Labrou, N.) 1st ed., 1–24 (Springer, 2019). 90. Lutz, S., Williams, E. & Muthu, P. in Directed Enzyme Evolution: Advances and Applications (ed Alcalde, M.) 1st ed., 17–68 (Springer, 2017). 91. Pulgar, V. M. Transcytosis to cross the Blood Brain Barrier, new advancements and challenges. Frontiers in Neuroscience 12 (2019). 92. Ghosh, S., Alam, S., Rathore, A. S. & Khare, S. K. in Therapeutic Enzymes: Function and Clinical Implications (ed Labrou, N.) 1st ed., 131–150 (Springer, 2019). 93. Marshall, S. A., Lazar, G. A., Chirino, A. J. & Desjarlais, J. R. Rational design and engineering of therapeutic proteins. Drug Discovery Today 8, 212–221 (2003). 94. Fuhrmann, G. & Leroux, J.-C. Improving the Stability and Activity of Oral Therapeutic Enzymes—Recent Advances and Perspectives. An Official Journal of the American Associa- tion of Pharmaceutical Scientists 31, 1099–1105 (2014). 95. Lapuhs, P. & Fuhrmann, G. in Therapeutic Enzymes: Function and Clinical Implications (ed Labrou, N.) 1st ed., 151–172 (Springer, 2019). 96. Greenwald, R. B. PEG drugs: an overview. Journal of Controlled Release 74, 159–171 (2001). 97. Rosenberg, A. S. Effects of protein aggregates: An immunologic perspective. The AAPS Journal 8, 501–507 (2006). 98. De Duve, C. From cytases to lysosomes. Federation proceedings 23, 1045–1049 (1964). 99. Barton, N. et al. Replacement therapy for inherited enzyme deficiency - macrophage-targeted for Gaucher’ disease. New England Journal of Medicine 324, 1464–1470 (1991). 100. Chen, H. H. et al. Enzyme replacement therapy for mucopolysaccharidoses; past, present, and future. Journal of Human Genetics 64, 1153–1171 (2019). 101. Boustany, R. M. N. Lysosomal storage diseases - The horizon expands. Nature Reviews Neurol- ogy 9, 583–598 (2013). 102. Desnick, R. & Schuchman, E. Enzyme Replacement Therapy for Lysosomal Diseases: Lessons from 20 Years of Experience and Remaining Challenges. Annual Review of Genomics and Human Genetics 13, 307–335 (2012). 103. Grabowski, G. A. & Whitley, C. Ten plus one challenges in diseases of the lysosomal system. Molecular Genetics and Metabolism 120, 38–46 (2017). 104. Gambello, M. J. & Li, H. Current strategies for the treatment of inborn errors of metabolism. Journal of Genetics and Genomics 45, 61–70 (2018). 105. Whiteman, D. A. H. & Kimura, A. Development of idursulfase therapy for mucopolysaccharido- sis type II(Hunter syndrome): The past, the present and the future. Drug Design, Development and Therapy 11, 2467–2480 (2017). 106. Muenzer, J. et al. A phase II/III clinical study of enzyme replacement therapy with idursulfase in mucopolysaccharidosis II (Hunter syndrome). Genetics in Medicine 8, 465–473 (2006). 107. Heartlein, M. & Kimura, A. Discovery and clinical development of idursulfase (Elaprase®) for the treatment of mucopolysaccharidosis II (Hunter syndrome). RSC Drug Discovery Series 38, 164–182 (2014). 108. Sohn, Y. B. et al. Phase I/II clinical trial of enzyme replacement therapy with idursulfase beta in patients with mucopolysaccharidosis II (Hunter Syndrome). Orphanet Journal of Rare Diseases 8, 2–9 (2013). 109. Kim, C. et al. Comparative study of idursulfase beta and idursulfase in vitro and in vivo. Journal of Human Genetics 62, 167–174 (2017). 110. Sonoda, H. et al. A blood-brain-barrier-penetrating anti-human transferrin receptor antibody fusion protein for neuronopathic mucopolysaccharidosis II. Molecular Therapy 26, 1366–1374 (2018).

86 Bibliography

111. Williams, R., Mamotte, C. D. & Burnett, J. Phenylketonuria: an inborn error of phenylalanine metabolism. The Clinical biochemist. Reviews / Australian Association of Clinical Biochemists (2008). 112. Scott, R. C. The genetic . American journal of . Part C, Seminars in medical genetics 142C, 121–126 (2006). 113. Lindstedt, S., Holme, E., Lock, E. A., Hjalmarson, O. & Strandvik, B. Treatment of hereditary tyrosinaemia type I by inhibition of 4-hydroxyphenylpyruvate dioxygenase. The Lancet 340, 813–817 (1992). 114. Kim, W. et al. Trends in enzyme therapy for phenylketonuria. Molecular Therapy 10, 220–224 (2004). 115. Grisch-Chan, H. M. et al. Low-dose gene therapy for murine PKU using episomal naked DNA vectors expressing PAH from its endogenous liver promoter. Molecular Therapy - Nucleic Acids 7, 339–349 (2017). 116. Durrer, K. E., Allen, M. S. & Hunt von Herbing, I. Genetically engineered probiotic for the treatment of phenylketonuria (PKU); assessment of a novel treatment in vitro and in the PAHenu2 mouse model of PKU. PLOS ONE 12, e0176286 (2017). 117. Isabella, V. M. et al. Development of a synthetic live bacterial therapeutic for the human metabolic disease phenylketonuria. Nature Biotechnology 36, 857–867 (2018). 118. Gámez, A., Wang, L., Straub, M., Patch, M. G. & Stevens, R. C. Toward PKU enzyme replace- ment therapy: PEGylation with activity retention for three forms of recombinant phenylalanine hydroxylase. Molecular Therapy 9, 124 (2004). 119. Sarkissian, C. N. & Gámez, A. Phenylalanine ammonia lyase, enzyme substitution therapy for phenylketonuria, where are we now? Molecular Genetics and Metabolism 86, 22–26 (2005). 120. Hoskins, J. & Gray, J. Phenylalanine ammonia lyase in the management of phenylketonuria: The relationship between ingested cinnamate and urinary hippurate in humans. Research com- munications in chemical pathology and pharmacology 35, 275–282 (1982). 121. Hoskins, J. et al. Enzymatic control of phenylalanine intake in phenylketonuria. The Lancet 315, 392–394 (1980). 122. Sarkissian, C. N. et al. A different approach to treatment of phenylketonuria: Phenylala- nine degradation with recombinant phenylalanine ammonia lyase. Proceedings of the National Academy of Sciences 96, 2339–2344 (1999). 123. Levy, H. L., Sarkissian, C. N. & Scriver, C. R. Phenylalanine ammonia lyase (PAL): From dis- covery to enzyme substitution therapy for phenylketonuria. Molecular Genetics and Metabolism 124, 223–229 (2018). 124. Hydery, T. & Coppenrath, V. A. A comprehensive review of , an enzyme substitution therapy for the treatment of phenylketonuria. Drug Target Insights 13, p.117739281985708 (2019). 125. Parmeggiani, F., Weise, N. J., Ahmed, S. T. & Turner, N. J. Synthetic and therapeutic appli- cations of ammonia-lyases and aminomutases. Chemical Reviews 118, 73–118 (2018). 126. Garrait, G. et al. Gastrointestinal absorption and urinary excretion of trans-cinnamic and p-coumaric acids in rats. Journal of Agricultural and Food Chemistry 54, 2944–2950 (2006). 127. Oldfield, E. & Lin, F. Y. Terpene biosynthesis: Modularity rules. Angewandte Chemie - Inter- national Edition 51, 1124–1137 (2012). 128. Christianson, D. W. Structural and Chemical Biology of Terpenoid Cyclases. Chemical Reviews 117, 11570–11648 (2017). 129. Gershenzon, J. & Dudareva, N. The function of terpene natural products in the natural world. Nat Chem Biol 3, 408–414 (2007). 130. Ruzicka, L. The isoprene rule and the biogenesis of terpenic compounds. Experientia 9, 357–367 (1953). 131. Baer, P. et al. Induced-fit mechanism in class i terpene cyclases. Angewandte Chemie - Inter- national Edition 53, 7652–7656 (2014). 132. Köksal, M., Jin, Y., Coates, R. M., Croteau, R. & Christianson, D. W. Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature 469, 116–120 (2011). 133. Steele, C. L., Crock, J., Bohlmann, J. & Croteau, R. Sesquiterpene synthases from grand fir (Abies grandis). Comparison of constitutive and wound-induced activities, and cDNA isolation, characterization, and bacterial expression of delta-selinene synthase and gamma-humulene syn- thase. The Journal of biological chemistry 273, 2078–2089 (1998). 134. Neumann, S. & Simon, H. Purification, partial characterization and substrate specificity ofa squalene cyclase from Bacillus acidocaldarius. Biological Chemistry Hoppe-Seyler 367, 723–730 (1986). 135. Bian, G. et al. Releasing the potential power of terpene synthases by a robust precursor supply platform. Metabolic Engineering 42, 1–8 (2017).

87 Bibliography

136. Huang, H. et al. Structure of a membrane-embedded prenyltransferase homologous to UBIAD1. 12, e1001911 (2014). 137. Wendt, K. U., Poralla, K. & Schulz, G. E. Structure and function of a squalene cyclase. Science 277, 1811–1815 (1997). 138. Rabe, P. et al. Mechanistic investigations of two bacterial diterpene cyclases: spiroviolene syn- thase and tsukubadiene synthase. Angewandte Chemie - International Edition 56, 2776–2779 (2017). 139. Kumavath, R. N. et al. Isolation and Characterization of L-Tryptophan Ammonia Lyase from Rubrivivax benzoatilyticus Strain JA2 2015. 140. Schwede, T. F., Rétey, J. & Schulz, G. E. Crystal structure of histidine ammonia-lyase revealing a novel polypeptide modification as the catalytic electrophile. Biochemistry 38, 5355–5361 (1999). 141. Hanson, K. R. & Havir, E. A. l-Phenylalanine ammonia-lyase: IV. Evidence that the prosthetic group contains a dehydroalanyl residue and mechanism of action. Archives of Biochemistry and Biophysics 141, 1–17 (1970). 142. Hermes, J. D., Weiss, P. M. & Cleland, W. W. Use of nitrogen-15 and deuterium isotope effects to determine the chemical mechanism of phenylalanine ammonia-lyase. Biochemistry 24, 2959– 2967 (1985). 143. Pilbák, S., Farkas, Ö. & Poppe, L. Mechanism of the tyrosine ammonia lyase reaction-tandem nucleophilic and electrophilic enhancement by a proton transfer. Chemistry - A European Jour- nal 18, 7793–7802 (2012). 144. Pinto, G. P. et al. New insights in the catalytic mechanism of tyrosine ammonia-lyase given by QM/MM and QM cluster models. Archives of Biochemistry and Biophysics 582, 107–115 (2015). 145. Poppe, L. & Rétey, J. Friedel-crafts-type mechanism for the enzymatic elimination of ammonia from histidine and phenylalanine. Angewandte Chemie - International Edition 44, 3668–3688 (2005). 146. Louie, V. G. et al. Structural determinants and modulation of substrate specificity in phenylalanine- tyrosine smmonia-lyases. Chemistry and Biology 13, 1327–1338 (2006). 147. Sariaslani, S., Vannelli, T., Jordan, D. B., Boodhoo, A. & Calabrese, J. C. Crystal structure of phenylalanine ammonia Lyase: Multiple helix dipoles implicated in catalysis. Biochemistry 43, 11403–11416 (2004). 148. Ritter, H. & Schulz, G. E. Structural basis for the entrance into the phenylpropanoid metabolism catalyzed by phenylalanine ammonia-lyase. The plant cell 16, 3426–3436 (2004). 149. Cooke, H. A., Christianson, C. V. & Bruner, S. D. Structure and chemistry of 4-methylideneimidazole- 5-one containing enzymes. Current Opinion in Chemical Biology 13, 460–468 (2009). 150. Suchi, M., Sano, H., Mizuno, H. & Wada, Y. Molecular cloning and structural characterization of the human histidase gene (HAL). Genomics 29, 98–104 (1995). 151. Labeeuw, L., Martone, P. T., Boucher, Y. & Case, R. J. Ancient origin of the biosynthesis of lignin precursors. Biology Direct 10, 1–21 (2015). 152. Kyndt, J. A., Meyer, T. E., Cusanovich, M. A. & Van Beeumen, J. J. Characterization of a bacterial tyrosine ammonia lyase, a biosynthetic enzyme for the photoactive yellow protein. FEBS Letters 512, 240–244 (2002). 153. Xue, Z., McCluskey, M., Cantera, K., Sariaslani, F. S. & Huang, L. Identification, charac- terization and functional expression of a tyrosine ammonia-lyase and its mutants from the photosynthetic bacterium Rhodobacter sphaeroides. Journal of Industrial Microbiology and Biotechnology 34, 599–604 (2007). 154. Jendresen, C. B. et al. Highly active and specific tyrosine ammonia-lyases from diverse origins enable enhanced production of aromatic compounds in bacteria and Saccharomyces cerevisiae. Applied and Environmental Microbiology 81, 4458–4476 (2015). 155. Xiang, L. & Moore, B. S. Biochemical characterization of a prokaryotic phenylalanine ammonia lyase. The Journal of Bacteriology 187, 4286 (2005). 156. Moffitt, M. C. et al. Discovery of two cyanobacterial phenylalanine ammonia lyases: Kinetic and structural characterization. Biochemistry 46, 1004–1012 (2007). 157. Emiliani, G., Fondi, M., Fani, R. & Gribaldo, S. A horizontal gene transfer at the origin of phenylpropanoid metabolism: A key adaptation of plants to land. Biology Direct 4, 1–12 (2009). 158. Havir, E. A., Reid, P. D. & Marsh, V. H. L-Phenylalanine Ammonia-Lyase (Maize). Plant Physiology 48, 130–136 (1971). 159. Rosler, J., Krekel, F., Amrhein, N. & Schmid, J. Maize phenylalanine ammonia-lyase has tyro- sine ammonia-lyase activity. Plant Physiology 113, 175–179 (1997). 160. Jangaard, N. O. The characterization of phenylalanine ammonia-lyase from several plant species. Phytochemistry 13, 1765–1768 (1974).

88 Bibliography

161. Vannelli, T., Xue, Z., Breinig, S., Qi, W. W. & Sariaslani, F. S. Functional expression in Es- cherichia coli of the tyrosine-inducible tyrosine ammonia-lyase enzyme from yeast Trichosporon cutaneum for production of p-hydroxycinnamic acid. Enzyme and Microbial Technology 41, 413–422 (2007). 162. Xue, Z. et al. Improved production of p-hydroxycinnamic acid from tyrosine using a novel thermostable phenylalanine/tyrosine ammonia lyase enzyme. Enzyme and Microbial Technology 42, 58–64 (2007). 163. Zhu, L. et al. Cloning, expression and characterization of phenylalanine ammonia-lyase from Rhodotorula glutinis. Biotechnology Letters 35, 751–756 (2013). 164. Watts, K. T., Mijts, B. N., Lee, P. C., Manning, A. J. & Schmidt-Dannert, C. Discovery of a substrate selectivity switch in tyrosine ammonia-lyase, a member of the aromatic amino acid lyase family. Chemistry and Biology 13, 1317–1326 (2006). 165. De Duve, C., Pressman, B. C., Gianetto, R., Wattiaux, R. & Appelmans, F. Tissue fractionation studies. 6. Intracellular distribution patterns of enzymes in rat-liver tissue. The Biochemical journal 60, 604 (1955). 166. Lüllmann-Rauch, R. in Lysosomes (ed Saftig, P.) 1–16 (Springer, 2005). 167. Brooks, D. & Parkinson-Lawrence, E. in Lysosomal Storage Disorders (eds Barranger, J. A. & Cabrera-Salazar, M. A.) 7–36 (Springer, 2007). 168. Hille-Rehfeld, A. Mannose 6-phosphate receptors in sorting and transport of lysosomal enzymes. Biochimica et Biophysica Acta (BBA) - Reviews on Biomembranes 1241, 177–194 (1995). 169. Hasilik, A., Wrocklage, C. & Schröder, B. Intracellular trafficking of lysosomal proteins and lysosomes. International Journal of Clinical Pharmacology and Therapeutics 47, 18–33 (2009). 170. Storch, S. & Braulke, T. in Lysosomes (ed Saftig, P.) 17–26 (Springer, 2005). 171. Preusser-Kunze, A. et al. Molecular characterization of the human Cα-formylglycine-generating enzyme. Journal of Biological Chemistry 280, 14900–14910 (2005). 172. Schmidt, B., Selmer, T., Ingendoh, A. & Figurat, v. K. A novel amino acid modification in sulfatases that is defective in multiple sulfatase deficiency. Cell 82, 271–278 (1995). 173. Von Figura, K., Schmidt, B., Selmer, T. & Dierks, T. A novel protein modification generating an aldehyde group in sulfatases: Its role in catalysis and disease. BioEssays 20, 505–510 (1998). 174. Roeser, D. et al. A general binding mechanism for all human sulfatases by the formylglycine- generating enzyme. Proceedings of the National Academy of Sciences 103, 81–86 (2006). 175. Diez-Roux, G. & Ballabio, A. Sulfatases and human disease. Annual Review of Genomics and Human Genetics 6, 355–379 (2005). 176. Hunter, C. A rare disease in two brothers. Proceedings of the Royal Society of Medicine 10, 104–116 (1917). 177. Wilson, P. J. et al. Hunter syndrome: Isolation of an iduronate-2-sulfatase cDNA clone and analysis of patient DNA. Proceedings of the National Academy of Sciences 87, 8531–8535 (1990). 178. Demydchuk, M. et al. Insights into Hunter syndrome from the structure of iduronate-2-sulfatase. Nature Communications 8, 15786 (2017). 179. Cobb, R. E., Chao, R. & Zhao, H. Directed evolution: Past, present, and future. AIChE Journal 59, 1432–1440 (2013). 180. Reetz, M. T., Bocola, M., Carballeira, J. D., Zha, D. & Vogel, A. Expanding the range of sub- strate acceptance of enzymes: Combinatorial Active-Site Saturation Test. Angewandte Chemie - International Edition 44, 4192–4196 (2005). 181. Kazlauskas, R. J. & Bornscheuer, U. T. Finding better protein engineering strategies. Nature Chemical Biology 5, 526–529 (2009). 182. Pavlidis, I. V., Hendrikse, N. M. & Syrén, P.-O. Chapter 5: Computational Techniques for Efficient Biocatalysis 32 (2018). 183. Zhang, C. & DeLisi, C. Estimating the number of protein folds. Journal of Molecular Biology 284, 1301–1305 (1998). 184. Illergård, K., Ardell, D. H. & Elofsson, A. Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins: Structure, Function, and Bioinformatics 77, 499–508 (2009). 185. Xiang, Z. Advances in homology protein structure modeling. eng. Current protein & peptide science 7, 217–227 (2006). 186. Lindorff‐Larsen, K. et al. Improved side‐chain torsion potentials for the Amber ff99SB protein force field. Proteins 78, 1950–1958 (2010). 187. Krieger, E., Darden, T., Nabuurs, S. B., Finkelstein, A. & Vriend, G. Making optimal use of empirical energy functions: Force-field parameterization in crystal space. Proteins 57, 678–683 (2004).

89 Bibliography

188. Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications 91, 43–56 (1995). 189. Pronk, S. et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29, 845–854 (2013). 190. Bawono, P. et al. in Bioinformatics Volume I: Data, sequence analysis and evolution (ed Keith, J. M.) 2nd ed., 167–189 (Humana Press, 2017). 191. Cole, M. F. & Gaucher, E. A. Exploiting models of molecular evolution to efficiently direct protein engineering. Journal of Molecular Evolution 72, 193–203 (2011). 192. Starr, T. N. & Thornton, J. W. Exploring protein sequence–function landscapes. Nature Biotech- nology 35, 125–126 (2017). 193. Lehmann, M., Pasamontes, L., Lassen, S. F. & Wyss, M. The consensus concept for thermosta- bility engineering of proteins. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1543, 408–415 (2000). 194. Thornton, J. W. Resurrecting ancient genes: experimental analysis of extinct molecules. Nature Reviews Genetics 5, 366–375 (2004). 195. Kaplan, J. & DeGrado, W. F. De novo design of catalytic proteins. Proceedings of the National Academy of Sciences 101, 11566–11570 (2004). 196. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016). 197. Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Science 19, 1817–1819 (2010). 198. Kries, H., Blomberg, R. & Hilvert, D. De novo enzymes by computational design. Current Opinion in Chemical Biology 17, 221–228 (2013). 199. Morley, K. L. & Kazlauskas, R. J. Improving enzyme properties: when are closer mutations better? Trends in Biotechnology 23, 231–237 (2005). 200. Daegelen, P., Studier, F. W., Lenski, R. E., Cure, S. & Kim, J. F. Tracing ancestors and relatives of Escherichia coli B, and the derivation of B strains REL606 and BL21(DE3). eng. Journal of molecular biology 394, 634–643 (2009). 201. Ratzkin, B. & Carbon, J. Functional expression of cloned yeast DNA in Escherichia coli. Pro- ceedings of the National Academy of Sciences 74, 487–491 (1977). 202. Rosano, G. L. & Ceccarelli, E. A. Recombinant protein expression in Escherichia coli: advances and challenges. Frontiers in Microbiology 5, 172 (2014). 203. Walsh, G. & Jefferis, R. Post-translational modifications in the context of therapeutic proteins. Nature Biotechnology 24, 1241–1252 (2006). 204. Carillo, S., Mittermayr, S., Farrell, A., Albrecht, S. & Bones, J. in Heterologous protein pro- duction in CHO Cells (ed Meleady, P.) 227–241 (Springer, 2017). 205. Planinc, A., Bones, J., Dejaegher, B., Van Antwerpen, P. & Delporte, C. Glycan characteri- zation of biopharmaceuticals: Updates and perspectives. Analytica Chimica Acta 921, 13–27 (2016). 206. Porath, J., Carlsson, J., Olsson, I. & Belfrage, G. Metal chelate affinity chromatography, a new approach to protein fractionation. Nature 258, 598–599 (1975). 207. Terpe, K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Applied Microbiology and Biotechnology 60, 523–533 (2003). 208. De Genst, E. J. et al. Structure and properties of a complex of α-synuclein and a single-domain camelid antibody. Journal of Molecular Biology 402, 326–343 (2010). 209. Jin, J. et al. Accelerating the clinical development of protein-based vaccines for malaria by efficient purification using a four amino acid C-terminal ‘C-tag’. International Journal for Parasitology 47, 435–446 (2017). 210. Some, D., Amartely, H., Tsadok, A. & Lebendiker, M. Characterization of proteins by Size- Exclusion Chromatography coupled to Multi-Angle Light Scattering (SEC-MALS). JoVE, e59615 (2019). 211. Greenfield, N. J. Using circular dichroism spectra to estimate protein secondary structure. Nature Protocols 1, 2876–2890 (2006). 212. Greenfield, N. J. Using circular dichroism collected as a function of temperature to determine the thermodynamics of protein unfolding and binding interactions. Nature Protocols 1, 2527– 2535 (2006). 213. Martin, L., Schwarz, S. & Breitsprecher, D. Thermal unfolding Analyzing Thermal Unfolding of Proteins : Nanotemper Technologies Application Note, 1–8 (2014). 214. Magnusson, A. O. et al. nanoDSF as screening tool for enzyme libraries and biotechnology development. FEBS Journal 286, 184–204 (2019). 215. McWilliam, I. G. & Dewar, R. A. Flame Ionization Detector for Gas Chromatography. Nature 181, 760 (1958).

90 Bibliography

216. de Saint Laumer, J.-Y. et al. Prediction of response factors for gas chromatography with flame ionization detection: Algorithm improvement, extension to silylated compounds, and applica- tion to the quantification of metabolites. Journal of Separation Science 38, 3209–3217 (2015). 217. Voznyi, Y. V., Keulemans, J. L. M. & van Diggelen, O. P. A fluorimetric enzyme assay for the diagnosis of MPS II (Hunter disease). Journal of Inherited Metabolic Disease 24, 675–680 (2001). 218. Pauling, L. & Zuckerkandl, E. Chemical paleogenetics - Molecular ”restoration studies” of extinct forms of life. Acta. Chem. Scand. 17, 9–16 (1963). 219. Stevens, P. F. Augustin Augier’s “Arbre Botanique” (1801), a remarkable early botanical rep- resentation of the natural system. Taxon 32, 203–229 (1983). 220. Hellström, N. P., André, G. & Philippe, M. Augustin Augier’s Botanical tree: transcripts and translations of two unknown sources. Huntia (2017). 221. Darwin, C. On the Origin of the Species (1859). 222. Zuckerkandl, E. & Pauling, L. Molecules as documents of evolutionary history. Journal of Theoretical Biology 8, 357–366 (1965). 223. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences 87, 4576–4579 (1990). 224. Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236 (2013). 225. Hug, L. A. et al. A new view of the tree of life. Nature Microbiology 1, 1–6 (2016). 226. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990). 227. Pollock, D. D., Zwickl, D. J., Mcguire, J. A. & Hillis, D. M. Increased taxon sampling is advantageous for phylogenetic inference. Systematic Biology 51, 664–671 (2002). 228. Russell, D. J. Multiple sequence alignment methods 1st ed. (ed Russell, D. J.) (Humana Press, 2014). 229. Dayhoff, M., Schwartz, R. & Orcutt, B.in Atlas of protein sequence and structure 345–352 (1978). 230. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89, 10915–10919 (1992). 231. Edgar, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research (2004). 232. Katoh, K. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059–3066 (2002). 233. Katoh, K. & Standley, D. M. in Multiple sequence alignment methods (ed Russell, D. J.) 1st ed., 131–146 (Humana Press, 2014). 234. Vialle, R. A., Tamuri, A. U. & Goldman, N. Alignment modulates ancestral sequence recon- struction accuracy. Molecular Biology and Evolution 35, 1783–1797 (2018). 235. Landan, G. & Graur, D. Heads or Tails: A simple reliability check for multiple sequence align- ments. Molecular Biology and Evolution 24, 1380–1383 (2007). 236. Anisimova, M., Cannarozzi, G. M. & Liberles, D. A. Finding the balance between the mathe- matical and biological optima in multiple sequence alignment. Trends in Evolutionary Biology 2, 39–48 (2010). 237. Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and am- biguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564–577 (2007). 238. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009). 239. Whelan, S. & Morrison, D. A. in Bioinformatics Volume I: Data, sequence analysis and evolution (ed Keith, J. M.) 2nd ed., 349–377 (Humana Press, 2017). 240. Fitch, W. M. Toward defining the course of evolution: Minimum change for a specific tree topology. Systematic Zoology 20, 406–416 (1971). 241. Merkl, R. & Sterner, R. Ancestral protein reconstruction: Techniques and applications. Biolog- ical Chemistry 397, 1–21 (2016). 242. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phyloge- netic trees. Molecular Biology and Evolution 4, 406–425 (1987). 243. Hall, B. G. Building phylogenetic trees from molecular data with MEGA. Molecular Biology and Evolution 30, 1229–1235 (2013). 244. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular biology and evolution 33, 1870–1874 (2016).

91 Bibliography

245. Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32, 268–274 (2015). 246. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer applications in the biosciences 13, 555–556 (1997). 247. Huelsenbeck, J. P. & Ronquist, F. MR BAYES: Bayesian inference of phylogenetic trees. Bioin- formatics 17, 754–755 (2001). 248. Suchard, M. A. & Redelings, B. D. BAli-Phy: Simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22, 2047–2048 (2006). 249. Tria, F. D. K., Landan, G. & Dagan, T. Phylogenetic rooting using minimal ancestor deviation. Nature ecology & evolution 1, 0193 (2017). 250. Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304 (2000). 251. Cheong Xin, C., Beiko, R. G. & Ragan, M. A. in Bioinformatics Volume I: Data, sequence analysis and evolution (ed Keith, J. M.) 2nd ed., 421–432 (Humana Press, 2017). 252. Pupko, T., Doron-Faigenboim, A., Liberles, D. A. & Cannarozzi, G. M. in Ancestral Sequence Reconstruction (ed Liberles, D. A.) chap. 4 (Oxford University Press, 2007). 253. Echave, J., Spielman, S. J. & Wilke, C. O. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17, 109–121 (2016). 254. Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution 10, 1396–1401 (1993). 255. Perez-Jimenez, R. et al. Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nature Structural & Molecular Biology 18, 592–596 (2011). 256. Stackhouse, J., Presnell, S. R., Mcgeehan, G. M., Nambiar, K. P. & Benner, S. A. The ribonu- clease from an extinct bovid ruminant. FEBS Letters 262, 104–106 (1990). 257. Malcolm, B. A., Wilson, K. P., Matthews, B. W., Kirsch, J. F. & Wilson, A. C. Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature 345, 86–89 (1990). 258. Gaucher, E. A., Govindarajan, S. & Ganesh, O. K. Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451, 704–707 (2008). 259. Garcia, A. K., Schopf, J. W., Yokobori, S.-i., Akanuma, S. & Yamagishi, A. Reconstructed an- cestral enzymes suggest long-term cooling of Earth’s photic zone since the Archean. Proceedings of the National Academy of Sciences 114, 4619–4624 (2017). 260. Hobbs, J. K. et al. Change in heat capacity for enzyme catalysis determines temperature dependence of enzyme catalyzed rates. ACS Chemical Biology 8, 2388–2393 (2013). 261. Van der Kamp, M. W. et al. Dynamical origins of heat capacity changes in enzyme-catalysed reactions. Nature Communications 9, 1177 (2018). 262. Hobbs, J. K. et al. On the origin and evolution of thermophily: Reconstruction of functional precambrian enzymes from ancestors of Bacillus. Molecular Biology and Evolution 29, 825–835 (2012). 263. Babkova, P., Sebestova, E., Brezovsky, J., Chaloupkova, R. & Damborsky, J. Ancestral haloalkane dehalogenases show robustness and unique substrate specificity. ChemBioChem 18, 1448–1456 (2017). 264. Risso, V. A., Gavira, J. A., Mejia-Carmona, D. F., Gaucher, E. A. & Sanchez-Ruiz, J. M. Hy- perstability and substrate promiscuity in laboratory resurrections of precambrian β-lactamases. Journal of the American Chemical Society 135, 2899–2902 (2013). 265. Williams, P. D., Pollock, D. D., Blackburne, B. P. & Goldstein, R. A. Assessing the accuracy of ancestral protein reconstruction methods. PLoS Computational Biology 2, 0598–0605 (2006). 266. Trudeau, D. L., Kaltenbach, M. & Tawfik, D. S. On the potential origins of the high stability of reconstructed ancestral proteins. Molecular Biology and Evolution 33, 2633–2641 (2016). 267. Wheeler, L. C., Lim, S. A., Marqusee, S. & Harms, M. J. The thermostability and specificity of ancient proteins. Current Opinion in Structural Biology 38, 37–43 (2016). 268. Wijma, H. J., Floor, R. J. & Janssen, D. B. Structure- and sequence-analysis inspired engi- neering of proteins for enhanced thermostability. Current Opinion in Structural Biology 23, 588–594 (2013). 269. Devamani, T. et al. Catalytic promiscuity of ancestral esterases and hydroxynitrile lyases. Jour- nal of the American Chemical Society 138, 1046–1056 (2016). 270. Chaloupkova, R. et al. Light-emitting dehalogenases: Reconstruction of multifunctional biocat- alysts. ACS Catalysis 9, 4810–4823 (2019). 271. Plach, M. G., Reisinger, B., Sterner, R. & Merkl, R. Long-term persistence of bi-functionality contributes to the robustness of microbial life through exaptation. PLoS genetics 12, e1005836 (2016).

92 Bibliography

272. Busch, F. et al. Ancestral tryptophan synthase reveals functional sophistication of primordial enzyme complexes. Cell Chemical Biology 23, 709–715 (2016). 273. Schupfner, M., Straub, K., Busch, F., Merkl, R. & Sterner, R. Analysis of allosteric commu- nication in a multienzyme complex by ancestral sequence reconstruction. Proceedings of the National Academy of Sciences 117, 346–354 (2020). 274. Wilson, C. et al. Using ancient protein kinases to unravel a modern cancer drug’s mechanism. Science 347, 882–886 (2015). 275. Lovera, S. et al. The different flexibility of c-Src and c-Abl kinases regulates the accessibility of a druggable inactive conformation. Journal of the American Chemical Society 134, 2496–2499 (2012). 276. Randall, R. N., Radford, C. E., Roof, K. A., Natarajan, D. K. & Gaucher, E. A. An experimental phylogeny to benchmark ancestral sequence reconstruction. Nature Communications 7, 1–6 (2016). 277. Bar-Rogovsky, H. et al. Assessing the prediction fidelity of ancestral reconstruction by a library approach. Protein Engineering, Design and Selection 28, 507–518 (2015). 278. Risso, V. A. & Sanchez-Ruiz, J. M. in Directed Enzyme Evolution: Advances and Applications (ed Alcalde, M.) 229–255 (2017). 279. Alcalde, M. When directed evolution met ancestral enzyme resurrection. Microbial Biotechnol- ogy 10, 22–24 (2017). 280. Risso, V. A., Sanchez-Ruiz, J. M. & Ozkan, S. B. Biotechnological and protein-engineering implications of ancestral protein resurrection. Current Opinion in Structural Biology 51, 106– 115 (2018). 281. Gumulya, Y. et al. Engineering highly functional thermostable proteins using ancestral sequence reconstruction. Nature Catalysis 1, 878–888 (2018). 282. Gumulya, Y. et al. Engineering thermostable CYP2D enzymes for biocatalysis using Combina- torial Libraries of Ancestors for Directed Evolution (CLADE). Chemcatchem 11, 841 (2019). 283. Romero-Romero, M. L., Risso, V. A., Martinez-Rodriguez, S., Ibarra-Molero, B. & Sanchez- Ruiz, J. M. Engineering ancestral protein hyperstability. Biochemical Journal 473, 3611–3620 (2016). 284. Kratzer, J. T. et al. Evolutionary history and metabolic insights of ancient mammalian uricases. Proceedings of the National Academy of Sciences 111, 3763–3768 (2014). 285. Zakas, P. M. et al. Enhancing the pharmaceutical properties of protein drugs by ancestral sequence reconstruction. Nature Biotechnology 35, 35–37 (2017). 286. Dickschat, J. S. Bacterial terpene cyclases. Nat. Prod. Rep. 33, 87–110 (2016). 287. Bohlmann, J. & Keeling, C. I. Terpenoid biomaterials. Plant Journal 54, 656–669 (2008). 288. Zhu, Y., Romain, C. & Williams, C. K. Sustainable polymers from renewable resources. Nature 540, 354–362 (2016). 289. Bornscheuer, U. T. et al. Engineering the third wave of biocatalysis. Nature 485, 185–194 (2012). 290. Reetz, M. T. Biocatalysis in organic chemistry and biotechnology: Past, present, and future. Journal of the American Chemical Society 135, 12480–12496 (2013). 291. Hendrikse, N., Charpentier, G., Nordling, E. & Syrén, P.-O. Ancestral diterpene cyclases show increased thermostability and substrate acceptance. FEBS Journal 285, 4660–4673 (2018). 292. Marin, J., Battistuzzi, F. U., Brown, A. C. & Hedges, S. B. The timetree of prokaryotes: new insights into their evolution and speciation. Molecular biology and evolution 34, 437–446 (2016). 293. Leuenberger, P. et al. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science 355, eaai7825 (2017). 294. Buddrus, L. et al. Crystal structure of an inferred ancestral bacterial pyruvate decarboxylase. Acta Crystallographica Section F: Structural Biology Communications 74, 179–186 (2018). 295. Bart, A. G., Harris, K. L., Gillam, E. M. J. & Scott, E. E. Structure of an ancestral mammalian family 1B1 cytochrome P450 with increased thermostability. J Biol Chem 295, 5640–5653 (2020). 296. Nicoll, C. R. et al. Ancestral-sequence reconstruction unveils the structural basis of function in mammalian FMOs. Nature Structural and Molecular Biology 27, 14–24 (2020). 297. Bendadi, F. et al. Impaired cognitive functioning in patients with tyrosinemia type i receiving nitisinone. Journal of Pediatrics 164, 398–401 (2014). 298. Goldson-Barnaby, A. & Scaman, C. H. Purification and characterization of phenylalanine am- monia lyase from Trichosporon cutaneum. Enzyme Research (2013). 299. Hendrikse, N. M. et al. Exploring the therapeutic potential of modern and ancestral phenylala- nine/tyrosine ammonia-lyases as supplementary treatment of hereditary tyrosinemia. Scientific Reports 10, p. 1315 (2020).

93 Bibliography

300. Simonis, H., Yaghootfam, C., Sylvester, M., Gieselmann, V. & Matzner, U. Evolutionary re- design of the lysosomal enzyme arylsulfatase A increases efficacy of enzyme replacement therapy for metachromatic leukodystrophy. Human Molecular Genetics 28, 1810–1821 (2019). 301. Lazarus, R. A. & Scheiflinger, F. Mining ancient proteins for next-generation drugs. Nature Biotechnology 35, 28 (2017).

94