Directed Evolution of Bacillus gibsonii Alkaline Protease towards Washing Applications: a Study of Adaptation

by Ronny Ernesto Martínez Moya

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biochemical Engineering

Approved, Thesis Committee

Prof. Dr. Ulrich Schwaneberg, RWTH Aachen University

Prof. Dr. Danilo Roccatano, Jacobs University Bremen

Prof. Dr. Karl-Heinz Maurer, Henkel AG & Co KGaA

Date of Defense: September 10, 2010

School of Engineering and Science

Table of contents

Acknowledgments
Abstract
Abbreviations
Part I: General introduction
  1. Protein engineering and Directed Evolution
  2. Cold adaptation and thermal stability of enzymes
    2.1. - Thermostable enzymes
    2.2. - Cold-adapted enzymes
  3. - Aim of the project
Part II: Directed Evolution of Bacillus gibsonii Alkaline Protease towards washing applications
  1. - Introduction
    1.1. - Aim of this work
    1.2. - Proteases
      1.2.1. - Serine proteases and ...
      1.2.2. - proteases applications in industry
      1.2.3. - Protein engineering in subtilisin proteases
      1.2.4. - B. gibsonii Alkaline Protease (BgAP)
    1.3. - SeSaM
      1.3.1. - SeSaM principle
      1.3.2. - SeSaM features
  2. - Materials and methods
    2.1. - Materials
      2.1.1. - Chemicals
      2.1.2. - Bacterial strains
      2.1.3. - Plasmids
      2.1.4. - Oligonucleotides
      2.1.5. - Cell culture media and cultivation
    2.2. - Methods
      2.2.1. - Cloning
      2.2.2. - Random mutagenesis library generation
      2.2.3. - Protein expression
      2.2.4. - Screening for improved activity at 15°C
      2.2.5. - Screening for improved thermostability
      2.2.6. - Site-directed mutagenesis and site saturation mutagenesis of BgAP variants
      2.2.7. - Protein purification
      2.2.8. - Proteolytic activity: Suc-AAPF-pNA
      2.2.8. - Proteolytic activity: Azocasein
      2.2.9. - Temperature dependence of the specific activity
      2.2.10. - Thermal inactivation
      2.2.11. - Thermal shift assay
      2.2.12. - Homology modeling
  3. - Results
    3.1. - Directed Evolution to improve activity at low temperatures
      3.1.1. - BgAP expression system
      3.1.2. - Skim milk microtiter plate assay
      3.1.3. - SeSaM library generation
      3.1.4. - Screening of BgAP SeSaM library round 1
      3.1.5. - Screening of BgAP SeSaM library round 2
      3.1.6. - Screening of BgAP SeSaM library round 3
      3.1.8. - Purification and characterization of selected clones from Directed Evolution
      3.1.9. - Characterization using the suc-AAPF-pNA substrate
      3.1.10. - Characterization using the macromolecular substrate Azo-casein
    3.2. - Directed Evolution of BgAP for improved thermostability
      3.2.1. - Re-screening of SeSaM libraries
      3.2.2. - Site-directed mutagenesis to generate variant 39IC1+N253D
      3.2.3. - Site-directed mutagenesis and combination of amino acid substitutions
      3.2.4. - Purification and characterization of variants with improved thermostability
    3.3. - Homology modeling of BgAP
  4. - Discussion and conclusions
    4.1. - Analysis of the mutations generated by SeSaM
    4.2. - Directed Evolution of BgAP
      4.2.1. - Cold adaptation of BgAP
      4.3.2. - Increasing thermal stability of BgAP
    4.3. - Conclusions of Part I
Part III: Temperature Effects on Dynamics of Psychrophilic Protease S41 and its Thermostable Mutants in Solution
  1. - Introduction
    1.1. - Aim of this work
    1.2. - Molecular dynamics simulations
    1.3. - Extremophiles and protein adaptation
    1.4. - The studied system
  2. - Materials and methods
    2.1. - Starting structures
    2.2. - MD simulations
    2.3. - Analysis of the simulations
      2.3.1. - General structure analysis
      2.3.2. - Cluster analysis
      2.3.3. - Essential Dynamics Analysis
    2.4. - Graphical representations
  3. - Results
    3.1. - Structural properties
    3.2. - Structural ions
    3.3. - Dynamic properties
    3.4. - Essential Dynamics Analysis
      3.4.1. - WT S41
      3.4.2. - MUT2
      3.4.3. - MUT7
  4. - Discussion and conclusions
  Future prospects
Part IV: References

Acknowledgments

I would like to express my thanks to my supervisor, Professor Dr. Ulrich Schwaneberg, for giving me the opportunity to join his Biochemical Engineering group at Jacobs University Bremen. His advice and guidance have greatly helped me to overcome many challenges, not only in scientific aspects but in life in general. I have received nothing but support and motivation from him in these three years of doctoral studies.

Along with my supervisor, I would like to thank my PhD committee members Prof. Dr. Danilo Roccatano and Prof. Dr. Karl-Heinz Maurer. Both of them have been extremely approachable and have provided me with an immense amount of feedback and advice regarding both scientific and life matters. Prof. Roccatano helped me throughout my doctoral studies, introducing me to the interesting world of computational chemistry and supporting me in my efforts to apply Molecular Dynamics in my studies. Prof. Maurer shared his knowledge of proteases and provided me with valuable information and advice from the point of view of industry and applied biotechnology.

During my doctoral work I had the valuable support of our collaborators Dr. Petra Siegert, Dr. Timothy O’Connell, Dr. Hendrik Hellmuth, Claudia Stammen and Brian Laufs at Henkel AG & Co KGaA. Their advice and friendly help contributed greatly to setting up the protease screening, characterization and purification methods in our laboratory.

A very important part of my life during these last three years has been the people of the Schwaneberg working group. It was the first time I worked in a multicultural environment, and every single coworker taught me valuable lessons on both a scientific and a personal level. I was lucky enough to have an amazing environment inside and outside the lab, and many productive and not-so-productive hours of discussion that I enjoyed equally.

I am deeply thankful to our laboratory managers Marina and Daniela for their amazing work; organizing a laboratory with 30+ people is a difficult task and usually not appreciated until something goes wrong. Without them life in the laboratory would have been much more difficult. I also thank my students, Usha, Tory, Pratik and Vladimir; I hope you have learned as much from me as I learned from having you working with me.

My constant and deepest gratitude goes to my father, my mother, Karina, my sisters and the rest of my family; they are my main source of love, motivation and strength. All I am now is nothing but the result of their constant efforts and care.

All this work would not have been possible without the financial support of Jacobs University Bremen, the German government through the Bundesministerium für Bildung und Forschung (BMBF) and Henkel AG & Co KGaA.

Abstract

Enzymes have been used by mankind for thousands of years. In the past decades, protein engineering has facilitated the replacement of chemical catalysts with tailored biocatalysts because of their comparative advantages in terms of activity and stereospecificity, production costs and high product yields. Despite these advantages, there are still many challenges for biocatalysts to completely replace chemical catalysts in industrial applications.

B. gibsonii Alkaline Protease (BgAP) is a recently described subtilisin protease whose activity and stability properties make it suitable for industrial applications in laundry and dishwashing detergents, but whose activity decreases significantly at low temperatures. In order to improve BgAP activity at 15°C, a Directed Evolution experiment based on the SeSaM random mutagenesis method was performed. An optimized microtiter plate expression system in B. subtilis was established and classical proteolytic detection methods were adapted for high throughput screening. Screening of three Directed Evolution rounds yielded a set of enzyme variants with improved specific activity (kcat) at 15°C, and amino acid substitutions related to this improvement were identified. Most identified variants showed decreased thermal stability compared with WT BgAP. A parallel screening towards improved protein stability and further recombination yielded variant 39IC1+N253D, with a half-life at 60°C more than 200-fold that of WT BgAP. Recombination of both sets of amino acid substitutions resulted in variant 39IC1 MutIII, with 1.5-fold the specific activity at 15°C and 100 times the half-life at 60°C of WT BgAP. Analysis of the introduced amino acid substitutions revealed no substitutions close to the active site of the protein. Activity-related substitutions were found to be non-charged to non-charged substitutions of the large-to-small type in the secondary structure of the protein, whereas substitutions related to improved thermal stability introduced negatively charged residues in loop areas of the protein surface, increasing ionic and hydrogen bond interactions.

To further study protein adaptation, the psychrophilic protease subtilisin S41 from the Antarctic Bacillus TA41 and two variants with two and seven amino acid substitutions were studied using Molecular Dynamics simulation at 283 K and 363 K. The analysis of protein dynamics revealed that the average global flexibility of both variants was slightly higher than that of S41 WT at both 283 K and 363 K, and that the fluxional profile of the flexible loops was different for each variant. In addition, essential dynamics analysis showed that the most important collective motions, especially at 363 K, differ in profile and intensity for each protein. At high temperature and for the thermolabile wild type, a selective amplification of a subset of the largest low-temperature collective motions was observed. Conversely, the two thermostable variants showed a pattern of collective motions at 363 K rather different from that at 283 K. These results support the hypothesis that the introduced amino acid substitutions, rather than improving the global stability of the variants by increasing their rigidity, selectively activate collective fluxional modes, allowing the enzyme to explore a different set of conformations, which may be related to the improved stability. A better understanding of this process can open alternative strategies to design stable enzymes based on their dynamic behavior rather than on static stability features.

Abbreviations

AAPF         Ala-Ala-Pro-Phe tetrapeptide
Ap           Ampicillin
AU           Absorbance Unit
BSA          Bovine Serum Albumin
B. subtilis  Bacillus subtilis
DMSO         Dimethyl sulfoxide
DNA          Deoxyribonucleic acid
E. coli      Escherichia coli
epPCR        Error-prone PCR
Da           Dalton
LB           Luria-Bertani medium
MEGAWHOP     Megaprimer PCR of Whole Plasmid
MTP          Microtiter plate
OD           Optical Density
PCR          Polymerase Chain Reaction
pI           Isoelectric point
PMSF         Phenylmethylsulfonyl fluoride
SDM          Site-directed mutagenesis
SeSaM        Sequence Saturation Mutagenesis
StEP         Staggered Extension Process
TCA          Trichloroacetic Acid
Tet          Tetracycline
WT           Wild type

Part I: General introduction

1. Protein engineering and Directed Evolution

Enzymes found in nature specifically catalyze a myriad of different chemical reactions. They make life possible, since almost all processes in an organism need enzymes to occur at significant rates. Man’s use of enzymes predates recorded history. Unknowingly, mankind has applied enzymatic activity for the production of certain types of food and beverages, such as cheese, wine and beer, and also for the tanning of hides and skins to produce leather for garments, clothes and weapons. With the development of biochemistry in the 19th century, the name “enzyme” was coined and the nature and mode of action of enzymes started to be studied. During the 20th century, the recognition that enzymes are proteins, along with the design of techniques for their purification and analysis and the development of high volume fermentation technologies, paved the way for procedures for their industrial production and use [1].

Two major breakthroughs in the 1960s had a great impact on the enzyme industry: the commercialization of glucoamylase, which allowed the production of glucose from starch with an efficiency much greater than that of the chemical procedure of acid hydrolysis, and the launch of the first enzyme-containing detergents. Later, in the 1980s, the development of genetic engineering tools allowed an explosive expansion in the use of enzymes in industry, generating today's billion-dollar enzyme industry and the widespread interest in biocatalysts. Due to their efficiency and specificity, enzymes are increasingly being used in the generation of products that were traditionally made using chemical synthesis methods.

The increasing demand for biocatalysts as replacements for chemical synthesis has brought the need for enzymes able to perform under non-natural conditions, particularly in the synthesis of pharmaceuticals and fine chemicals [2-4]. Different approaches have been developed to provide such enzymes. Enzymes isolated from extremophile organisms are regularly screened in order to explore biodiversity and find biocatalysts that have naturally evolved towards conditions similar to those required in industry. Directed enzyme evolution and rational design aim to tune already available enzymes that require improvement to perform under a certain condition or to catalyze a given reaction.

Over the last years, Directed Evolution has emerged as a versatile and successful approach for tailoring protein properties to industrial demands and for providing valuable information that advances the understanding of structure-function relationships in biocatalysts. Directed Evolution is based on iterative cycles of diversity generation by random mutagenesis of a target gene, followed by a functional selection for variants that are improved under a given condition or that catalyze a specific chemical reaction. The key steps of a Directed Evolution experiment are summarized in Figure 1.1. The gene sequence of the protein of interest is selected as a parent, based on its proximity to the desired function and its ability to adapt in response to introduced mutations and a selective pressure. In Step I, mutations are introduced randomly in the parent gene and mutant libraries are generated; these libraries are expressed in an appropriate host organism or in vitro. The size of ready-to-screen mutant libraries is commonly in the range of 1,000 to 100,000 clones. The generated library is then screened for improved variants (Step II), usually by means of agar plate or microtiter plate assays. Once improved variants are identified, the mutant genes encoding these variants are isolated (Step III) and used as parents for a new evolution round, until a satisfactory level of improvement is achieved.

Figure 1.1. - Scheme of a Directed Evolution experiment with iterative cycles of mutant library generation (Step I), screening for improved variants (Step II) and isolation of the mutant genes encoding these variants (Step III). Adapted from Schwaneberg et al. [5].
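The iterative cycle of Figure 1.1 can be illustrated with a small toy simulation. The following is only a minimal sketch under strong assumptions: the "gene" is a short DNA string, the screen is a made-up score (`toy_score`, counting matches to a hypothetical `TARGET` sequence), and the function names, library size and mutation rate are illustrative choices that do not correspond to any method used in this work.

```python
import random

BASES = "ACGT"
# Hypothetical "ideal" gene, used only to give the toy screen something to measure.
TARGET = "ACGTACGTACGT"


def toy_score(gene):
    """Toy screening signal: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(gene, TARGET))


def make_library(parent, size, rate, rng):
    """Step I: copy the parent gene `size` times with random point mutations."""
    return [
        "".join(rng.choice(BASES) if rng.random() < rate else base for base in parent)
        for _ in range(size)
    ]


def directed_evolution(parent, score, rounds, size, rate, seed=0):
    """Run iterative rounds; return the final parent and its score history."""
    rng = random.Random(seed)
    history = [score(parent)]
    for _ in range(rounds):
        library = make_library(parent, size, rate, rng)  # Step I: diversity
        best = max(library, key=score)                   # Step II: screening
        if score(best) > score(parent):                  # Step III: keep only an improved variant
            parent = best
        history.append(score(parent))
    return parent, history


if __name__ == "__main__":
    final, history = directed_evolution(
        "AAAAAAAAAAAA", toy_score, rounds=5, size=200, rate=0.1
    )
    print(final, history)
```

Because the parent is replaced only when a variant scores higher, the score history can never decrease from one round to the next, which mirrors the idea of carrying the best variant forward as the new parent.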

Due to the large library sizes, Directed Evolution experiments are significantly more time consuming than Rational Design experiments, where site-directed mutagenesis is often performed and screening includes a limited number of variants. On the other hand, the Directed Evolution approach has an important advantage: no prior structural or mechanistic information is needed to perform the protein optimization. The Rational Design approach relies on knowledge of the protein crystal structure and/or the molecular basis of the catalytic reaction performed by the enzyme in order to propose amino acid substitutions leading to the improvement of its functionality. Directed Evolution is, however, best used on proteins with known crystal structures, so that knowledge can be generated by analyzing structure-function relationships based on the amino acid substitutions found in the improved variants.

The key steps of a Directed Evolution experiment and how they affect its success have been extensively discussed [6-13]. The first step is library generation. Techniques for introducing random mutations include mutator strains, error-prone PCR (epPCR) [14] and mutagenesis induced by chemical agents or UV radiation. Further recombination can be achieved by using DNA shuffling [15] or the Staggered Extension Process (StEP) [16], which allow the accumulation of beneficial mutations identified in separate genes and are particularly useful in later stages of Directed Evolution. One of the latest developments in random mutagenesis library generation is Sequence Saturation Mutagenesis (SeSaM), which introduces mutations that saturate every amino acid position in a given protein, overcomes the significant bias for transition mutations over transversions, and even introduces mutations in two consecutive bases (SeSaM is further discussed in Part II). The impact of the mutagenesis technique on the quality of the generated library depends mainly on the mutation frequency, the transition/transversion bias and the tunability of the possible subsets of resulting amino acid substitutions, but it is also related to the available screening capabilities.

The screening is based on the Darwinian concept of selection, involving the fitness of an individual under an evolutionary pressure. Under the controlled conditions of the laboratory, the screening works analogously to an evolutionary pressure, where this pressure is unique and defined by the property that should ultimately be improved. As mentioned before, the most common screening methods employ the microtiter plate format, allowing the screening of 96, 384, 1536 or even more clones at the same time. Additionally, solid phase screening assays, such as agar plate, nitrocellulose membrane and filter paper assays, allow a higher screening throughput, but not higher than 10^4 variants by manual screening or 10^6 variants using automated screening. Efforts to improve the quality of random mutagenesis libraries are of little use if only a small fraction of the generated diversity is screened. Recently, ultra-high-throughput screening methods have been developed to grasp the full potential of the diversity generated by random mutagenesis methods. The use of Fluorescence-Activated Cell Sorter (FACS) flow cytometry systems and microfluidic devices, coupled with emulsion technology, has proven to be an attractive solution to the screening throughput challenge, offering a significantly higher throughput, reaching 10^8 clones per day [17-20]. Despite the improved throughput, these ultra-high-throughput screening systems are at an early stage of development. At the moment they work as qualitative indicators of the positive or negative activity of each clone, allowing the enrichment of active/functional variants, especially in libraries with a high mutational load, for subsequent quantitative screening on a medium throughput platform such as microtiter plates. A more detailed comparison of current high throughput screening methods and technologies is available elsewhere [21]. Figure 1.2 summarizes the current strategies used for screening and Table 1.1 shows a comparison of the different properties of each strategy.
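To give a feeling for the screening effort implied by the library sizes and plate formats mentioned above, the short sketch below counts how many microtiter plates a library would occupy. It is an illustrative calculation only: the function name `plates_needed` and the optional `control_wells` parameter (wells reserved per plate for controls) are assumptions introduced here, not part of the screening protocol described in this work.

```python
def plates_needed(library_size, wells_per_plate, control_wells=0):
    """Number of plates required to hold every clone of a library.

    control_wells is a hypothetical number of wells per plate reserved
    for controls (for example wild type and empty vector).
    """
    usable = wells_per_plate - control_wells
    if usable <= 0:
        raise ValueError("no usable wells left on the plate")
    # Integer ceiling division avoids floating point rounding issues.
    return -(-library_size // usable)


if __name__ == "__main__":
    for wells in (96, 384, 1536):
        for size in (1_000, 100_000):
            print(f"{size:>7} clones in {wells:>4}-well plates: "
                  f"{plates_needed(size, wells)} plates")
```

Even before controls are considered, a library at the upper end of the quoted range fills more than a thousand 96-well plates, which is why only a fraction of the generated diversity is usually screened in this format.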

Table 1.1. - Comparison of screening and selection technologies [21]

Strategy                       Library size  Advantage                                                 Disadvantage
Selection                      ~10^9         Yields desirable variants only                            Only possible if activity confers advantage
Agar plate screen              ~10^5         Simple to operate                                         Limited dynamic range
Microtiter plate screen        ~10^4         All analytical methods possible. Excellent dynamic range  Relative low screening capacity
Cell-in-droplet screen         ~10^9         Large libraries                                           Fluorescence detection and DNA modifying enzymes only
Cell as microreactor           ~10^9         Large libraries                                           Fluorescence detection only
Cell surface display           ~10^9         Large libraries                                           Fluorescence detection only
In vitro compartmentalization  ~10^10        No cloning steps. Large libraries                         Fluorescence detection only and DNA modifying enzymes only

After the identification of improved variants, the encoding genes are isolated and the mutations are investigated. These genes are generally used as parents for further iterative cycles of Directed Evolution, or recombined to accumulate beneficial mutations, until a satisfactory level of improvement is achieved. The number of required Directed Evolution rounds depends on the characteristics of the starting protein, the sensitivity of the protein towards mutations and the expected improvement.

Figure 1.2. - Overview of screening technologies. (A) Experimental steps from obtaining the gene library as PCR product to the actual screen; (B) Explanation of symbols; (C) Cell growth/survival selection and agar plate screening; (D) Microtiter plate screening; (E) Cell as micro-reactor; (F) Cell surface display; (G) Cell-in-droplet; and (H) In vitro compartmentalization [21].

2. Cold adaptation and thermal stability of enzymes

In the past decades, there has been a vigorous development in enzyme technology. Compared to classical chemical catalysts, enzymes show an advantage in terms of catalytic activity, selectivity and ability to perform under a variety of mild conditions. Additionally, microbial extracellular enzymes can be produced in very high quantities and are comparatively inexpensive. Despite these advantages, some industrial applications, especially pharmaceutical and fine chemical synthesis, are still dominated by chemical catalysts [2-4]. The reason is that there are differences between being a good biocatalyst and being a good industrial catalyst. Industrial applications often require different conditions and higher concentrations of reagents than those found in living organisms. Enzyme stability is another issue, since inactivation can be caused by a number of different factors such as heat, cold, proteases, oxidation, acidic or alkaline pH and denaturing agents. Among all of them, inactivation related to temperature is by far the most important mode of enzyme inactivation [22].

2.1. - Thermostable enzymes

Increasing the temperature stability of an enzyme was one of the first attempts to optimize the use of biocatalysts for industrial applications. The temperature dependence of the enzymatic rate can be explained by the Arrhenius equation, $k = A e^{-E_a/RT}$, where $k$ is the rate constant, $A$ is the pre-exponential factor that depends on the reaction, $E_a$ is the activation energy, $R$ is the gas constant and $T$ is the temperature in kelvin. According to this equation, any increase in temperature induces an exponential increase in the reaction rate, the magnitude of which depends on the activation energy. Additionally, improved solubility and the prevention of bacterial contamination were further reasons to aim for high-temperature reactions. Many organisms are able to live at high temperatures; these organisms (thermophiles) are reported to contain proteins which are thermostable and resist denaturation and proteolysis [23]. Specialized proteins known as chaperones are produced by these organisms, which help to prevent heat denaturation or to refold proteins to their native state after heat denaturation [24].

One of the most common objectives in early protein engineering was to enhance the stability of industrially interesting enzymes in order to meet the conditions necessary for the complete replacement of a chemical catalyst in a given reaction. Classical examples of this are the work performed on glucose isomerase, proteases, amylases, cellulases, xylanases and lipases [25-32]. The molecular mechanisms used or found to be of importance for the stability of enzymes include the introduction of disulfide bridges, ionic interactions and salt bridges that increase the rigidity of the protein scaffold, the reduction of the possible conformations of the unfolded protein, and the stabilization of the secondary structure and hydrophobic core of globular enzymes [32]. The isolation of naturally evolved versions of industrially important enzymes from thermophilic organisms has provided not only alternative biocatalysts to those obtained by protein engineering, but also a structural comparison for the study of the possible molecular mechanisms involved in protein stability [31].
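The exponential temperature dependence described by the Arrhenius equation can be made concrete with a short numerical sketch. The activation energy of 50 kJ/mol used in the usage example is an arbitrary illustrative value, not a measured property of any enzyme discussed here, and the function names are chosen for this example only. The two temperatures correspond to 15°C and 60°C.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def arrhenius_rate(a, ea, temp_k):
    """Rate constant k = A * exp(-Ea / (R * T)); Ea in J/mol, T in kelvin."""
    return a * math.exp(-ea / (R * temp_k))


def rate_ratio(ea, t_from_k, t_to_k):
    """Factor by which k changes when moving from t_from_k to t_to_k.

    The pre-exponential factor A cancels, so only Ea and the two
    temperatures are needed.
    """
    return math.exp(-ea / R * (1.0 / t_to_k - 1.0 / t_from_k))


if __name__ == "__main__":
    ea = 50_000.0             # assumed activation energy in J/mol (illustrative)
    t_cold = 15.0 + 273.15    # 15 degrees Celsius in kelvin
    t_hot = 60.0 + 273.15     # 60 degrees Celsius in kelvin
    print(f"k(60C)/k(15C) = {rate_ratio(ea, t_cold, t_hot):.1f}")
    print(f"k(15C)/k(60C) = {rate_ratio(ea, t_hot, t_cold):.3f}")
```

Since the ratio depends on $E_a$, reactions with a higher activation energy are more strongly accelerated by heating and, conversely, more strongly slowed down by cooling.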

To date, the most accepted theory for a general mechanism of thermostability is an increase in the static stability of the protein structure, achieved by reducing its flexibility and/or increasing its affinity for structurally important ions or cofactors. There are, however, other approaches related to the change in the conformational entropy of thermostable enzymes, which have shown higher flexibility than their mesophilic homologues, leading to the hypothesis of entropic stabilization in thermostable enzymes [33].

2.2. - Cold-adapted enzymes

Living organisms have not only adapted to live under high temperature conditions; terrestrial and aquatic organisms can also be found inhabiting polar and high altitude regions, the deep sea, subterranean regions, the upper atmosphere and even man-made refrigerated appliances. Unlike microorganisms that may experience relatively short periods of cold, those that permanently inhabit cold environments are optimally adapted for growth at low temperatures [34]. Enzymes present in these organisms have adapted through millions of generations to achieve stability and optimal activity at temperatures close to the freezing point. A reduction in temperature influences most biochemical reactions: most physiological processes are slowed down, protein-protein interactions are changed, and membrane fluidity and salt solubility are reduced, whereas gas solubility is increased. A decrease in temperature also decreases the pH of biological buffers, affecting both the charge of some amino acid residues (particularly histidine) and protein solubility [35]. Also possible, though less evident, is the cold denaturation of proteins. This phenomenon is a very general property of enzymes leading to the loss of activity at low temperatures [36], and is thought to occur through the hydration of polar and non-polar groups of proteins, a process thermodynamically favored at low temperatures [37]. Cold denaturation affects in particular multimeric enzymes and proteins showing a high degree of hydrophobicity [36].

As mentioned earlier, low temperatures strongly slow down the rates of chemical reactions catalyzed by enzymes. The effect of temperature on chemical reactions is described by the Arrhenius equation (see 2.1), according to which any decrease in temperature exponentially decreases the reaction rate. The thermodependence of the activity can be approximately expressed by the Q10 value (the factor by which the rate changes for a 10°C change in temperature), which is normally close to 2-3. The slower reaction rates are the main factor preventing the growth of non-adapted organisms at low temperatures [38]. Cold-adapted enzymes from psychrophilic organisms present a common behavior pattern: they have a higher activity at low and moderate temperatures and a low thermostability. Figure 1.3 shows a typical comparison of the effect of temperature on the catalytic efficiency of a naturally cold-adapted enzyme and a mesophilic enzyme [39].
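The Q10 value mentioned above can be turned into a quick estimate of how much activity is lost on cooling. The sketch below uses the standard definition Q10 = (k2/k1)^(10/(T2-T1)); the function names and the example temperatures are illustrative assumptions, and the numbers are not measurements from this work.

```python
def q10_from_rates(rate_low, rate_high, temp_low, temp_high):
    """Q10 computed from two rates measured at two temperatures (in degrees Celsius)."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


def scale_rate(rate_ref, q10, temp_ref, temp):
    """Rate expected at `temp`, given the rate at `temp_ref` and a constant Q10."""
    return rate_ref * q10 ** ((temp - temp_ref) / 10.0)


if __name__ == "__main__":
    # Relative activity left after cooling from 35 to 15 degrees Celsius
    # for the typical Q10 range of 2 to 3 quoted in the text.
    for q10 in (2.0, 3.0):
        remaining = scale_rate(1.0, q10, 35.0, 15.0)
        print(f"Q10 = {q10}: {remaining:.3f} of the activity at 35 C remains at 15 C")
```

With a Q10 between 2 and 3, a 20°C drop leaves only about a quarter to a ninth of the original rate, which illustrates why a non-adapted enzyme loses most of its activity at washing temperatures such as 15°C.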

Figure 1.3. - Effect of temperature on the activity of psychrophilic and mesophilic enzymes. Curves representing the thermal dependence of the specific activity of the psychrophilic α-amylase from P. haloplanktis (solid circles) and of its mesophilic homologue from pig pancreas (open circles) [39].

The features of cold-adapted enzymes compared to their mesophilic homologues are an increased specific activity at low temperatures, a maximal activity shifted towards low temperatures and an early loss of activity at increasing temperatures. The possible mechanisms explaining these changes have been discussed in detail. In summary, cold-adapted proteins usually do not have a different catalytic mechanism from that of their mesophilic homologues, but they generally have a reduced activation energy in the catalytic reaction, decreased ionic and electrostatic interactions, decreased core hydrophobicity, increased substrate affinity, increased accessibility of the active site and higher flexibility. This higher flexibility is usually related to the decreased thermal stability of these enzymes; however, the relationship between increased activity at low temperatures and thermostability has not been completely elucidated [34, 35, 38-43].

Like thermostable enzymes, cold-adapted biocatalysts are also interesting for industrial applications. Cold-adapted enzymes offer economic benefits through energy savings. They eliminate the need for heating steps, perform in cold environments, provide increased reaction yields, minimize undesired side reactions that occur at higher temperatures, and can easily be inactivated by a moderate increase in temperature when needed. Additionally, their increased flexibility can help them to perform in the presence of non-aqueous solvents, where the stabilizing effects of low water activity are countered [44].

One of the biggest challenges in the application of cold-adapted enzymes resides in their rapid production in large amounts. Most natural organisms originally expressing cold-adapted enzymes cannot be isolated or easily cultivated, and therefore axenic cultures of these organisms are not available. Most of the characterized enzymes have been cloned and expressed in common mesophilic hosts like E. coli or B. subtilis, with optimum growth and expression temperatures much higher than those of the original hosts, and therefore than those suited to the expressed proteins (25°C-37°C versus 4°C-10°C). These differences in the optimal growth and expression temperatures of host and expressed enzyme hamper the optimization of the fermentation process, affecting the production time and yields needed to meet industry standards. Additionally, the inherently low stability of cold-adapted proteins presents an equally important challenge for downstream processing design and optimization.

A recent interest of the enzyme industry, especially the detergent industry, has inspired efforts to modify mesophilic enzymes to match the activity of their cold-adapted homologues [45-48]. With the same objective, cold-adapted enzymes have been modified to increase their stability, in order to make them suitable for industrial applications [47, 49, 50]. From the available data, the adaptation of enzymes to low temperatures appears to follow different strategies to improve catalytic efficiency. An increase in specific activity and substrate affinity appears to be the main characteristic of naturally cold-adapted enzymes; however, the accumulation of neutral mutations, and of mutations potentially deleterious at higher temperatures, owing to the lack of additional evolutionary pressures, makes it difficult to establish a relationship between cold adaptation, flexibility and thermal stability. Recently, with data provided by protein engineering work, individual hypothetical mechanisms of adaptation are being proposed and studied, aiding the elucidation of the fundamental principles of protein adaptation to different temperatures.

3. - Aim of the project

The aim of this work was to contribute to the elucidation of the mechanisms related to protein adaptation. This work is divided into two parts. The first part relates to the Directed Evolution of an industrially important B. gibsonii alkaline protease. The aim of this section was to identify, by screening random mutagenesis libraries, enzyme variants with improved activity at low temperatures (15°C) for application in washing detergents. The analysis of the amino acid substitutions found in the generated variants provides information about the possible mechanisms responsible for the activity and stability changes. The second part focuses on the theoretical study, by Molecular Dynamics simulation, of a successful example of the adaptation of a cold-adapted protease towards improved stability. The aim of the study was to propose the molecular mechanism of stabilization by analyzing the changes in the structural and dynamic properties of S41 and its thermostable variants.


Results from both studies contribute to establishing structure-function relationships for subtilisin proteases, and provide further evidence towards the elucidation of the relationship between low-temperature activity and protein stability.

10 Directed Evolution of BgAP

Part II: Directed Evolution of Bacillus gibsonii Alkaline Protease towards washing applications

“It seems clear that there is a big difference between being a good biocatalyst and a good industrial catalyst. This difference is composed of several factors; one of most significant is operational stability. In order to be suitable for technological applications, catalysts should be stable under operational conditions for weeks or even months. Most enzymes do not satisfy this requirement” - Alexander M. Klibanov, from Advances in Applied Microbiology, 1983

1. – Introduction

1.1. - Aim of this work

The aim of this work was to generate improved variants of a potentially industrially important Bacillus gibsonii alkaline protease (BgAP) by Directed Evolution. The properties to be improved were the activity at low temperatures (15 °C) and, subsequently, thermal stability. A second objective was to analyze the amino acid substitutions related to both improved properties in order to investigate the possible structure-function relationships leading to the enhanced variants. For this, several rounds of Directed Evolution were to be performed by screening random mutagenesis libraries generated from BgAP using SeSaM. A microtiter plate expression system and a proteolytic assay were adapted for high-throughput screening of improved activity.

1.2. – Proteases

Proteases are enzymes that hydrolyze peptide bonds. They are necessary for the survival of all living organisms, in which they are encoded by about 2% of the total gene content. Proteases are important for a number of biological processes, including recycling of intracellular proteins, digestion of food proteins, the blood coagulation cascade, antigen presentation, and activation of zymogens, peptide hormones and neurotransmitters.

Proteases can be classified according to the chemical groups responsible for catalysis of peptide bond hydrolysis. The six catalytic types recognized for proteases are the serine, threonine, cysteine, aspartic, glutamic and metalloproteases. In proteases of the serine, threonine and cysteine type, the catalytic nucleophile is the reactive group of an amino acid side chain, either a hydroxyl group (serine and threonine proteases) or a sulfhydryl group (cysteine proteases). In aspartic and metalloproteases, the nucleophile is commonly an activated water molecule. In aspartic proteases, the water molecule is directly bound by the side chains of aspartic residues. In metalloproteases, one or two divalent metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. The metal is most commonly zinc, but may also be cobalt, manganese or copper. A single metal ion is usually bound by three amino acid ligands. The activated water molecule is a fourth metal ligand, and the metal is described as “tetrahedrally coordinated”. When two metal ions are present, each is tetrahedrally coordinated, so that two activated water molecules are bound, and one amino acid residue ligates both metals [51]. The glutamic proteases were recognized only in 2005 [52], and much remains to be learned about their catalytic mechanisms, but they seem to employ a Glu/Gln catalytic pair. Only a few proteases are still of unknown catalytic type.

There are many industrial uses for proteases; the earliest were in cheese production, initially using plant juices and “rennet” (animal stomach contents) to clot milk. Proteases are also used to tenderize meat, clarify beers and enhance the flavor of cheese and pet foods. In the leather industry, proteases are used to remove hair and to make the leather more supple (“bating” and “soaking” the leather). Proteases are also widely used in cleaning materials, such as biological washing powders and contact lens cleaning fluid; however, many of these are proprietary products for which the sequences and organisms of origin are not public. Besides being the targets of drugs, proteases are used in medicine to remove gastrointestinal parasites (anthelminthics), to remove dead skin from burn patients (debridement), for the determination of blood groups, and for relief of back pain by digesting the cartilage content of herniated intervertebral discs (chemonucleolysis). Proteases are also widely used as laboratory reagents for limited proteolysis of proteins and for generating the peptides required for protein sequencing.
Additional examples of uses of proteases are shown in Table 2.1 [51, 53]. Proteases are thus an exceptionally important group of enzymes in biology, medical research and biotechnology.

Table 2.1. – Commercial uses of proteases [51, 53]

Name Commercial use A Protein sequencing Cheese manufacturing mucorpepsin Cheese manufacturing endothiapepsin Cheese manufacturing Malting of cereal grains phytepsin Cheese manufacturing omptin Cleavage of recombinant fusion proteins Food processing Contact-lens cleaning. Stem cell isolation. Treatment of chymopapain herniated intervertebral disk glycyl endoprotease Protein sequencing stem Blood group determination Blood group determination ananain Burn debridement C Cheese manufacturing


nuclear-inclusion-A Processing of recombinant fusion proteins (plum pox virus) tobacco vein mottling virus-type Processing of recombinant fusion proteins NIa endopeptidase tobacco etch virus NIa Processing of recombinant fusion proteins endopeptidase Pyroglutamyl-peptidase I Protein sequencing. Identification of group A (prokaryote) streptococci and enterococci microbial ( Vibrio Tissue cell dispersion. Preparation of Chlamydomonas sp .) protoplasts collagenase colA Many uses, including tissue cell dispersion gametolysin Preparation of Chlamydomonas protoplasts flavastacin Protein sequencing ecarin Hematology applications russellysin Hematology applications glutamate Experimental prodrug strategies for cancer therapy Lysis of staphylococcal cell wall Therapeutic use for local paralysis of neuromuscular bontoxilysin function deuterolysin Taste-forming factor in soy sauce peptidyl-Asp Protein sequencing tryptophanyl aminopeptidase Manufacturing of L-tryptophan Dispase Tissue cell dispersion Protein sequencing. Removal of allergens from milk A (cattle) protein hydrolyzates Protein sequencing. Preparation of bacterial media. 1 Bathing leather Cleavage of recombinant fusion proteins Benign defibrinating agent crotalase Benign defibrinating agent Benign defibrinating agent coagulation factor Xa Cleavage of recombinant fusion proteins Cleavage of recombinant fusion proteins lysyl endopeptidase (bacteria) Protein sequencing Broad use in biological washing powders. Processing of Subtilisin Carlsberg whey in food industry. Production of pet food Digestion of casein by lactobacilli in cheese lactocepin I manufacturing Contributes to the effectiveness of organisms used in the cuticle-degrading endopeptidase biocontrol of insect and nematode pests Processing of recombinant proteins carboxypeptidase S1 Enhancement of flavors in foods

1.2.1. – Serine proteases and subtilisins

Subtilisins are a family of serine proteases, meaning that they have an essential serine residue at the active site. This serine residue is part of a catalytic triad formed by Asp, His and Ser, very similar to that of the mammalian digestive enzymes trypsin and chymotrypsin. The subtilisin family, officially known as peptidase family S8, is the second largest family of serine proteases. There are over 200 known members in this family, with the complete amino acid sequence established for the majority of them [54].


Proteolytic enzymes that utilize serine in their catalytic triad are ubiquitous. They include a wide range of peptidase activities, such as endoproteases, exoproteases and oligoproteases. Over 20 families of serine proteases have been identified and classified into 6 clans on the basis of structural and functional similarities. Subtilisins are found in archaea, bacteria, eukaryotes and viruses. Bacterial subtilisins are the subgroup of serine proteases of greatest industrial significance and have been studied extensively with regard to improving their catalytic efficiency and stability. Subtilisins produced by selected bacilli, such as subtilisin E (B. subtilis), subtilisin BPN’ (B. amyloliquefaciens), subtilisin 309 (Savinase™, B. lentus) and subtilisin Carlsberg (B. licheniformis), have found widespread application, especially as detergent additives. Subtilisins have another property in common with many secreted proteases: their biosynthesis requires the participation of an N-terminal pro-domain [55]. These domains act as intramolecular chaperones that greatly accelerate the folding of the mature, stable subtilisin (Figure 2.1). Recently, examples of novel subtilisin-like proteases having both N- and C-terminal pro-sequences have been reported [56].

Figure 2.1. - Three-dimensional structure of Subtilisin. Cartoon representation of the complete form of Subtilisin BPN’ including the intramolecular chaperone domain (A) and the mature protease (B). The catalytic triad is highlighted in red.

Subtilisin proteases have a catalytic triad of serine, aspartate and histidine in common. A specific serine residue acts as a nucleophile and anchors the acyl-enzyme intermediate during the course of the catalytic reaction, with aspartate as an electrophile and histidine as a base. Subtilisins catalyze the hydrolysis of peptide and ester bonds through the formation of an acyl-enzyme intermediate [57, 58]. This reaction is summarized in Figure 2.2. After the formation of an enzyme-substrate complex, the carbonyl carbon of the scissile bond is attacked by the active site serine (A), forming a tetrahedral intermediate (B). In subtilisins this transition state is stabilized by hydrogen bonding to the backbone of Ser221 (the active site nucleophile) and the side chain of Asn155. This transition state (C) decays as a proton is donated from the active site His64 to the amine group at the cleavage site of the substrate, liberating the first product of the reaction with simultaneous formation of the covalent acyl-enzyme intermediate. The enzyme is deacylated by nucleophilic attack by water (D), followed by the formation of another tetrahedral intermediate that is also stabilized by hydrogen bonding to the enzyme (E). This intermediate decays with proton transfer to the active site histidine and release of the second peptide product (F).

Figure 2.2. – General mechanism of catalysis of serine proteases.

There are four main features in the catalytic site of serine proteases (Figure 2.3). The catalytic triad Asp32-His64-Ser221 (red), as described above, is directly involved in the enzymatic reaction. The residue Asn155 is important for the stabilization of the acyl-intermediate in the first stage of catalysis; it forms a hydrogen bond with the negatively charged oxygen atom. This catalytic sub-site is known as the oxyanion hole (yellow). Substrate binding is divided between the unspecific binding pocket (green), where the interaction between the enzyme and the peptide substrate occurs at the level of the protein backbone, and the specificity pocket (blue), where the properties of the binding residue determine the success of the enzymatic reaction.


Figure 2.3. – Scheme of the active site of a typical subtilisin protease. The catalytic triad is shown in red and the oxyanion hole in yellow. The unspecific peptide binding pocket is shown in green whereas the specificity pocket is in blue (adapted from [59]). The enzyme and the relevant residues are represented in white; the substrate peptide is represented in green

It is interesting to mention that the catalytic triad in the chymotrypsin clan (another serine protease) is ordered His-Ser-Asp according to the linear position in the polypeptide chain; whereas in the subtilisin clan it is ordered Asp-His-Ser. Interestingly, bacterial subtilisins and mammalian serine proteases are paradigms of convergent evolution having independently arrived at this very similar catalytic triad [60]. Thus, for these serine proteases, having unrelated ancestral precursors, convergent evolution has resulted in a very similar structural arrangement to achieve a particular catalytic mechanism.

1.2.2. – Subtilisin proteases applications in industry

The largest industrial application of subtilisins is in detergents. Enzymes were first introduced into detergents in the early 1930s. The use of enzymes from animal sources turned out to be rather unsuccessful, as those enzymes were not suited to washing conditions. A major development for detergent enzymes occurred in 1963 with the launch of Alcalase (subtilisin Carlsberg from Bacillus licheniformis), which has an alkaline pH optimum. Enzymes incorporated into detergents must exhibit satisfactory catalytic activity in the presence of other detergent components and under diverse washing conditions. Of the various classes of proteases, only serine proteases are suited to inclusion in detergents. Among them, bacterial subtilisins were identified at an early stage as the most suitable for detergent applications. Current consumer demands, together with the increased use of synthetic fibers, which do not tolerate high temperatures well, and the concern to spend less energy on washing processes, have led to the use of lower washing temperatures [61, 62].

Most industrial enzymes are produced using microorganisms. Currently, the majority of subtilisins used in detergents are isolated from B. licheniformis, B. amyloliquefaciens, B. lentus, B. alcalophilus, B. clausii or B. halodurans [63]. The first protease employed in detergents was subtilisin Carlsberg, obtained from B. licheniformis. This protein is a single polypeptide chain of 275 amino acids exhibiting typical Michaelis-Menten hyperbolic kinetics. Subtilisin BPN’ from B. amyloliquefaciens has 275 residues and its three-dimensional structure is very similar to that of subtilisin Carlsberg (Figure 2.1), although their kinetic properties vary. Subtilisin from B. lentus is also used frequently, as it has a better activity profile at higher pH (9–12), which is more appropriate for powder detergents, than subtilisin Carlsberg or BPN’. This subtilisin has 269 residues, with about 60% sequence identity to each of subtilisin Carlsberg and BPN’. All of the subtilisins used in detergents have a molecular mass of about 27 kDa. The fact that subtilisins are produced as extracellular enzymes is a major benefit, as it greatly simplifies the separation of the enzyme from the cell biomass and facilitates relatively straightforward downstream purification processes [64].

The most important features for a subtilisin to be suitable as a detergent component are high activity at the pH of detergent-containing wash water and reasonable stability in the presence of other detergent components. Subtilisins display broad substrate specificity, rendering them capable of hydrolyzing a range of protein structures. For commercial applications, subtilisins are produced extracellularly in large quantities by fermentation technology [64]. They can now also be generated by recombinant techniques and engineered in many respects [65]. A detrimental factor for subtilisins in detergents is the presence of bleach, which oxidizes sensitive residues near their active sites, such as methionine and cysteine. This obstacle can be overcome by site-directed mutagenesis, replacing the sensitive residues with ones that affect catalytic activity only to a limited extent, such as serine or alanine in place of methionine [66]. This has led to the development of second-generation, oxidation-resistant engineered subtilisins.


Industrial needs have led to modifications of subtilisins that have completely changed the catalytic properties of the enzyme. Subtilisin BPN’ is a prominent example. One modification of this subtilisin allowed it to catalyze peptide synthesis when dissolved in high concentrations of a water-miscible organic solvent such as N,N-dimethylformamide (DMF). Furthermore, this modification reduced the turnover rate for peptide hydrolysis in 50% DMF to 1% of that in aqueous solution, whereas the turnover rate for the hydrolysis of ester substrates remained unchanged [67]. X-ray crystallography revealed that the imidazole ring of His64 had rotated and that two new water molecules stabilized the new conformation of the active site, with the loss of the low-barrier hydrogen bonds that had existed between His64 and Asp32. This modification provided a structural basis for the change in activity of these serine proteases in the presence of organic solvents. A combination of site-directed mutagenesis and chemical modifications, targeting primarily the specificity pocket of the enzyme’s active site, has led to significant rate enhancement and broadening of substrate specificity [68]. These chemically modified or “polar patch” mutants have shown remarkable utility in peptide synthesis and can also generate glycopeptides at very high yield. Another approach to improving the peptide synthetic efficiency of subtilisin is site-selective glycosylation of the active site. In this case, glycosylated subtilisin from B. lentus showed greatly increased esterase and greatly reduced amidase activities, conditions which favor formation of the amide bond over its hydrolysis [69]. Interest in properties such as low-temperature performance has renewed the screening for novel subtilisins in nature and the development of new variants that perform better at low temperatures.
Cold-adapted subtilisins have been found in nature [34, 40, 41, 70, 71], but their stability properties and the constraints on large-scale expression make them unsuitable for commercial and industrial purposes [63, 72].

1.2.3. – Protein engineering in subtilisin proteases

Subtilisins have become a paradigm for protein engineering studies. There are several examples of protein engineering of subtilisins to improve thermal stability in an effort to understand the molecular basis of enzyme stability. Protein engineering of subtilisin commenced in the 1960s, focused on understanding their catalytic properties and stabilities [73]. Since the 1980s there have been many impressive studies involving genetic manipulation of subtilisins.

Subtilisin E from B. subtilis was converted by Directed Evolution into an enzyme functionally equivalent to its thermophilic homologue, thermitase from Thermoactinomyces vulgaris [74]. Thermitase, also a member of the subtilisin family, has 47% sequence identity to subtilisin BPN’ [75]. The optimum temperature of the evolved enzyme was 17°C higher and its half-life at 65°C was more than 200-fold that of wild type subtilisin E. In addition, it was more active towards the hydrolysis of a synthetic substrate, succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, than wild type at all temperatures from 10 to 90°C. Surprisingly, even though the sequence of thermitase differs from that of subtilisin E at 157 positions, only eight amino acid substitutions were required to convert subtilisin E into an enzyme with similar thermostability. The eight substitutions, which included previously recognized stabilizing mutations (e.g. asparagine replacing serine at position 218 and aspartate replacing asparagine at position 76), were distributed over the surface of the enzyme. These experiments showed that Directed Evolution provides a powerful tool to unveil mechanisms of thermal adaptation and that it is an effective and efficient approach for manipulating thermal stability without compromising enzyme activity.

A more recent study on stabilizing mutations in subtilisin BPN’ has also greatly aided understanding of the structural basis of the thermal stability of this enzyme [76]. The rationale for this study was the need to overcome the loss of calcium caused by water softeners (chelators) encountered during the use of detergents. Two new variants of calcium-independent subtilisin were created, in which the high-affinity calcium site was deleted, and were then selected for increased thermal stability from a panel of random mutants. The molecular structures of these two enzymes have been compared with previously solved structures of subtilisins. Despite the variations in sequence, the overall structures are similar, except in the N-terminal region adjacent to the deletion. One of the variants formed a disulfide bond between the new cysteine residues. This disulfide bond anchors the N-terminus and contributes to the dramatic increase in thermostability.
In addition to the new disulfide bond, other mutations combined to increase its thermostability 1200-fold under chelating conditions, essentially due to stabilization of the N-terminus. More recent site-directed mutagenesis has vastly improved the half-life of calcium-free subtilisin BPN’, also with potential usefulness for biotechnological applications [77].

Enzymes isolated from psychrophilic organisms generally exhibit higher catalytic efficiency at low temperatures and greater thermal sensitivity than their mesophilic counterparts [34, 39, 41, 71]. In an effort to understand the evolutionary process and the molecular basis of cold adaptation, Directed Evolution has also been employed to convert a mesophilic subtilisin-like protease from B. sphaericus, SSII, into its psychrophilic counterpart. A single round of random mutagenesis followed by recombination of improved variants yielded a mutant whose turnover number (kcat) at 10°C was increased 6.6-fold and whose catalytic efficiency (kcat/Km) was 9.6 times that of wild type. Its half-life at 70°C was 3.3 times shorter than that of the wild type. It has been noted that, although there is a trend toward decreasing stability in the progression from mesophilic to psychrophilic enzymes, there is no strict correlation between decreasing stability and increasing low-temperature activity. Mesophilic subtilisin SSII shares 77.4% sequence identity with the naturally psychrophilic protease subtilisin S41. Although these two subtilisins differ at 85 positions, just four amino acid substitutions were sufficient to generate an SSII subtilisin whose low-temperature activity is greater than that of S41 [48].

The thermostability and activity of the psychrophilic protease subtilisin S41, from the Antarctic Bacillus TA41, were also investigated with the goal of understanding the mechanisms by which this enzyme can adapt to different selection pressures. Mutant libraries were screened to identify enzymes that acquired greater thermostability without sacrificing low-temperature activity. The half-life at 60°C of a variant with seven amino acid substitutions, 3-2G7, was approximately 500 times that of the wild type and far surpassed those of homologous mesophilic subtilisins. The temperature optimum of the activity of 3-2G7 was shifted upward by approximately 10°C. Unlike in natural thermophilic enzymes, the activity of 3-2G7 at low temperatures was not compromised. The catalytic efficiency was enhanced approximately 3-fold over a wide temperature range (10 to 60°C). The activation energy for catalysis was nearly identical to that of wild type and close to half that of its highly similar mesophilic homologue, subtilisin SSII, indicating that the evolved S41 enzyme retained its psychrophilic character in spite of its dramatically increased thermostability. These results clearly demonstrated that it is possible to increase activity at low temperatures and stability at high temperatures simultaneously. As has been speculated, the fact that enzymes displaying both properties are not found in nature most likely reflects the effects of evolution, rather than any intrinsic physicochemical limitation of proteins [49, 78].

Another strategy for engineering a cold-adapted subtilisin has been attempted by creating a hybrid molecule, in which a stable mesophilic subtilisin, Savinase™, was modified by site-directed mutagenesis to include residues from the binding region of a psychrophilic subtilisin (S39) [45].
A 12 amino acid region (MSLGSSGESSLI) of the binding cleft of S39, from Antarctic Bacillus TA39, was predicted to be highly flexible and was used to replace the corresponding 12 residues (LSLGSPSPSATL) in Savinase™. The rationale was that local or global flexibility seems to be the main adaptive character of psychrophilic enzymes, responsible for the thermodynamic parameters that increase turnover at low temperature, namely a decrease in activation enthalpy and an increase in entropy [71]. In line with the predictions, the hybrid enzyme showed the same temperature optimum and pH profile as Savinase™, had higher specific activity with synthetic substrates, had broader substrate specificity at ambient temperature, and showed a decrease in thermostability akin to that of psychrophilic enzymes.

As described, there are many studies on improving both thermal stability and adaptation to low temperatures in subtilisins; however, a general mechanism that can be applied in protein engineering as a rule for combining both properties is yet to be proposed. Most of the work on improving the performance of subtilisins for industrial purposes has generated variants and documented amino acid substitutions that ended up as patented products, creating a very complex patent situation. This has led to increased interest in finding new subtilisins with low homology to the already known variants, or even completely new protease backbones [63].

Subtilisins have an important role among industrially utilized enzymes, not only because of their use in detergents but also because they have been a key model enzyme in attempts to understand the evolution of the structure and function of serine proteases. In the field of protein engineering, and despite their small size (27 kDa), subtilisin proteases have provided researchers with an excellent protein scaffold for Directed Evolution experiments. The results of these experiments have provided the scientific community with invaluable knowledge that can be extrapolated to other enzyme classes that involve similar challenges in terms of protein activity and adaptation.
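The activation-energy comparison reported above for evolved S41 and SSII can be made concrete with the Arrhenius relation, k = A·exp(-Ea/RT). The following is a minimal sketch and not part of the cited studies: the function name `retained_fraction`, the two activation energies and the two temperatures are hypothetical values chosen only so that one Ea is half the other. It shows why an enzyme with a lower activation energy loses less of its rate on cooling.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def retained_fraction(ea_kj_per_mol, t_low_c, t_high_c):
    """Arrhenius ratio k(T_low) / k(T_high) for a given activation energy.

    The pre-exponential factor cancels in the ratio, so only Ea and the
    two temperatures (given in degrees Celsius) are needed.
    """
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    ea = ea_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    return math.exp(-ea / R * (1.0 / t_low - 1.0 / t_high))


if __name__ == "__main__":
    # Hypothetical activation energies, not measured values.
    for label, ea in (("lower Ea", 40.0), ("higher Ea", 80.0)):
        frac = retained_fraction(ea, t_low_c=10.0, t_high_c=40.0)
        print(f"{label}: {frac:.3f} of the 40°C rate remains at 10°C")
```

Because the ratio depends exponentially on Ea, doubling the activation energy squares the retained fraction, which is the sense in which a low activation energy is read as psychrophilic character.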

1.2.4. – B. gibsonii Alkaline Protease (BgAP)

Recently, a novel subtilisin isolated from B. gibsonii has been described as potentially useful for washing and cleaning applications [79]. This subtilisin has relatively low homology with currently used subtilisins (from 54% to 77% amino acid identity with subtilisin E and B. lentus alkaline protease, Figure 2.4). Performance tests indicated a better activity profile than other subtilisins (B. lentus alkaline protease, Properase™ and Savinase™), making BgAP an interesting candidate as an additive in fabric and dish washing products [79]. The BgAP gene is translated into a pre-pro-enzyme that generates a mature protease of 270 amino acids. The apparent size of the mature BgAP is approximately 27 kDa. As an alkaline protease, BgAP has an optimum pH of 11 [79]. BgAP shows mesophilic behavior, with an activity optimum between 45 and 50°C and greatly decreased stability at temperatures over 50°C. As with most mesophilic subtilisins, the activity of BgAP decreases greatly at low temperatures, reducing its performance as an additive in washing products meant to be used at room temperature. The generation of a BgAP variant with improved activity at low temperatures would allow this protease to be used in washing products regardless of the temperature of use, and the analysis of such variants could provide valuable information regarding protein engineering and the molecular basis of low-temperature adaptation.


Figure 2.4. – Protein sequence alignment between B. gibsonii alkaline protease and other known subtilisins. Each protease name is followed by the four-letter Protein Data Bank (PDB) code.


1.3. - SeSaM

Among methods for generating random mutagenesis libraries (see Section 1 of Part I), Sequence Saturation Mutagenesis (SeSaM) is one of the most recently introduced. It is based on a novel concept that randomizes a target DNA sequence at every single nucleotide position, independently of DNA polymerase bias [7, 80]. Currently used error-prone PCR (epPCR) methods, though versatile and simple, do not allow exploration of the whole sequence space theoretically available for a given DNA sequence. The reasons for this are fundamentally the redundancy of the genetic code and the biased mutational spectra of DNA polymerases. The practical effect of these limitations on the generated random mutagenesis library is that, on average, fewer than five of the 19 possible amino acid substitutions are obtained, considerably reducing the diversity of the screened variants. Furthermore, commonly used polymerases (Taq polymerase and Mutazyme, from Stratagene) are biased towards transition mutations (A ↔ G and C ↔ T), reducing the chemical variability of the resulting amino acid substitutions in the generated muteins. By overcoming these challenges, SeSaM offers an improved method that generates a library covering a wider sequence space, leading to richer chemical variability of amino acids in the screened protein variants.
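The effect of genetic code redundancy and transition bias described above can be explored with a short calculation. The sketch below is an illustration only and not the analysis behind the figure quoted in the text: it assumes that each codon receives exactly one nucleotide substitution, uses the standard genetic code, and models polymerase bias in the crudest possible way, by allowing transitions only. The helper names `reachable` and `mean_reachable` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in the conventional order in which the
# first, second and third codon positions each cycle through T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Transitions exchange purines (A <-> G) or pyrimidines (C <-> T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def reachable(codon, transitions_only=False):
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Silent changes and stop codons are excluded, so the result is a subset
    of the 19 possible amino acid substitutions.
    """
    original = CODON_TABLE[codon]
    found = set()
    for pos, base in enumerate(codon):
        options = [TRANSITION[base]] if transitions_only else [b for b in BASES if b != base]
        for new_base in options:
            aa = CODON_TABLE[codon[:pos] + new_base + codon[pos + 1:]]
            if aa not in (original, "*"):
                found.add(aa)
    return found


def mean_reachable(transitions_only=False):
    """Average number of reachable substitutions over the 61 sense codons."""
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    return sum(len(reachable(c, transitions_only)) for c in sense) / len(sense)


if __name__ == "__main__":
    print(f"any single substitution: {mean_reachable():.2f} of 19")
    print(f"transitions only:        {mean_reachable(True):.2f} of 19")
```

For example, `reachable("ATG")` returns six amino acids for the methionine codon, while restricting the same codon to transitions leaves only three. Real polymerase spectra are graded rather than all-or-nothing, so the transitions-only value should be read as a lower bound under this simplification.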

1.3.1. – SeSaM principle

The basic principle of the SeSaM method is to saturate every single nucleotide position in a given DNA sequence with all four standard nucleotides. The details of the methodology are given elsewhere [7, 80] and are under constant development and improvement by SeSaM Biotech GmbH (Bremen, Germany). The SeSaM methodology is divided into four main steps, summarized in Figure 2.5. In step 1, a pool of DNA fragments is generated: PCR is performed using a biotinylated forward primer and a non-biotinylated reverse primer in the presence of both standard nucleotides and α-phosphorothioate nucleotides. The phosphorothioate bond is susceptible to iodine cleavage in alkaline solution, and cleavage of the PCR product containing α-phosphorothioate nucleotides generates a library of fragments that stop at every single nucleotide position. This library is separated from the other cleavage products by isolating the fragments that carry the biotinylated forward primer, using streptavidin-coated magnetic particles in a DNA melting solution. Step 2 is the addition of universal bases: DNA fragments are “tailed” with a universal base by terminal transferase. Step 3 is the elongation of the tailed fragments to full length. Using a single-stranded template for elongation is crucial for the success of this step; with a double-stranded template, the reverse primer would bind to its complementary template strand, resulting in PCR products that would not contain a nucleotide analog. Step 4 is the replacement of the universal bases. A concluding PCR is required in which full-length genes containing nucleotide analogs act as template. In this PCR, nucleotide analogs are replaced by standard nucleotides, and mutations are generated due to the promiscuous base-pairing of universal bases.
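The four steps described above can be followed on a toy sequence. The following sketch is a deliberately simplified string model and not the laboratory protocol: it assumes that a single nucleotide analog type is used, that each fragment ends immediately before an analog position, that exactly one universal base (written here with the placeholder symbol `X`) is added per fragment, and that the final PCR copies a universal base as any of the four standard nucleotides with equal probability. All function names are hypothetical.

```python
import random

STANDARD = "ACGT"
UNIVERSAL = "X"  # placeholder symbol for a universal base (notation assumed here)


def fragment_pool(template, analog_base):
    """Step 1: one fragment per occurrence of `analog_base`, ending just before it."""
    return [template[:i] for i, base in enumerate(template) if base == analog_base and i > 0]


def tail(fragment, n_universal=1):
    """Step 2: append universal base(s) to the 3' end of a fragment."""
    return fragment + UNIVERSAL * n_universal


def elongate(tailed, template):
    """Step 3: extend the tailed fragment to full length along the template."""
    return tailed + template[len(tailed):]


def replace_universal(full_length, rng):
    """Step 4: each universal base is copied as a random standard nucleotide.

    The draw may return the original nucleotide, in which case no mutation results.
    """
    return "".join(rng.choice(STANDARD) if b == UNIVERSAL else b for b in full_length)


def sesam_library(template, analog_base, seed=0):
    """Run the four toy steps and return one full-length variant per fragment."""
    rng = random.Random(seed)
    return [
        replace_universal(elongate(tail(frag), template), rng)
        for frag in fragment_pool(template, analog_base)
    ]


if __name__ == "__main__":
    gene = "ATGGCTAAGTTAGCA"  # arbitrary example sequence
    for variant in sesam_library(gene, "A"):
        print(variant)
```

In this model every variant differs from the template at no more than one position, and that position always carried the chosen analog base, which mirrors the idea of saturating only positions with a selected nucleotide. Calling `tail` with `n_universal=2` would give the adjacent substitutions that arise when more than one universal base is added.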

Figure 2.5. - SeSaM methodology (from SeSaM Biotech GmbH).

1.3.2. - SeSaM Features

As mentioned before, SeSaM offers several improvements over commonly used methods for generating random mutagenesis libraries. SeSaM has the advantage of being completely independent of the mutational bias of DNA polymerases. In addition, the fragment distribution of a DNA library can be controlled by using different concentrations of the individual dNTPαS or a combination of them. By design, SeSaM also allows saturation of only those positions that carry a selected standard nucleotide, adding flexibility to the design and applicability of the generated libraries.

At the protein level, the most relevant feature of SeSaM libraries is the enriched percentage of transversions, which generates high chemical variability at the amino acid level, in contrast to epPCR or Mutazyme methods, in which transversions are limited. Another factor increasing chemical variability is the presence of consecutive mutations in the generated libraries, which are rarely found either in nature or in other random mutagenesis libraries. Such consecutive mutations most probably arise during the enzymatic addition of universal bases, when a DNA fragment is tailed with more than one universal base and later elongated in the subsequent steps. Due to limitations of commercial DNA purification kits, small fragments (ca. 70 nt) are not recovered, which is reflected in the absence of mutations in the first and last 70 bp of the resulting library [80]. This is overcome by using primers in the library construction that bind 70 bp upstream of the target gene sequence.

2. - Materials and methods

2.1. – Materials

2.1.1. – Chemicals

All chemicals were of analytical reagent grade or higher quality and purchased from Sigma-Aldrich (Taufkirchen, Germany), Applichem (Darmstadt, Germany), Carl Roth (Karlsruhe, Germany) or Invitrogen (Darmstadt, Germany). All enzymes were purchased from New England Biolabs (Frankfurt, Germany), Fermentas GmbH (St. Leon-Rot, Germany) or Sigma-Aldrich. Protease inhibitor PMSF was purchased from Sigma-Aldrich.

2.1.2. – Bacterial strains

The bacterial strains used in this work are summarized in Table 2.2.

Table 2.2. - Bacterial strains used in this work.

Strain | Description | References
E. coli DH5α | F'/endA1 hsdR17 (rK- mK+) supE44 thi-1 recA1 gyrA (Nal^r) relA1 Δ(lacIZYA-argF)U169 deoR (φ80dlacΔ(lacZ)M15) | Stratagene
B. subtilis DB104 | nprE aprE | [81]
B. subtilis WB600 | nprE nprB aprE epr mpr bpr | [82]

2.1.3. – Plasmids

The plasmids used and produced in this work are summarized in Table 2.3.

Table 2.3. - Plasmids used and generated in this work.

Plasmid | Description | References
pHY300plk | Shuttle vector, Ap^r, Tet^r | Takara BIO
pHYHP200dE | pHY300plk carrying BgAP | This work

2.1.4. – Oligonucleotides

The oligonucleotides used in this work are summarized in Table 2.4.

Table 2.4. - Oligonucleotides used in this work.

Name | Sequence (5' → 3') | Description
HP200_NdeI_pro_FW | TGCACATATGGCTGAGGAAGCAAAAGAAAAAT | Cloning of B. gibsonii AP
HP200_stop_XhoI_FW | GTATCTCGAGTTAGCGCGTTGCTGCATC | Cloning of B. gibsonii AP
pHY_shuttle_FW | CAGATTTCGTGATGCTTGTCAGG | Sequencing primer pHY300plk
pHY_shuttle_RV | CGTTAAGGGATCAACTTTGGGAG | Sequencing primer pHY300plk
SeSaM_FW_HP200 | CACACTACCGCACTCCGTCGCGATTCCTGTTTTATCCGTTGAG | SeSaM library generation
SeSaM_RV_HP200 | GTGTGATGGCGTGAGGCAGCCATTAGTTGGCTGGTTACCTTG | SeSaM library generation
HP200sat21_FW | CATGGCATAATCGTGGANNNACAGGATCTGGAG | Site saturation mutagenesis of B. gibsonii AP
HP200sat21_RV | CTCCAGATCCTGTNNNTCCACGATTATGCCATG | Site saturation mutagenesis of B. gibsonii AP
HP200sat39_FW | GGTATAGCTCAGCATNNNGATTTAACCATTCGTG | Site saturation mutagenesis of B. gibsonii AP
HP200sat39_RV | CACGAATGGTTAAATCNNNATGCTGAGCTATACC | Site saturation mutagenesis of B. gibsonii AP
HP200sat87_FW | GCACCAAGTGCTNNNCTATACGCTGTAAAG | Site saturation mutagenesis of B. gibsonii AP
HP200sat87_RV | CTTTACAGCGTATAGNNNAGCACTTGGTGC | Site saturation mutagenesis of B. gibsonii AP
HP200sat122_FW | CATGCATATTGCAAACNNNAGTCTCGGTAGTGATG | Site saturation mutagenesis of B. gibsonii AP
HP200sat122_RV | CATCACTACCGAGACTNNNGTTTGCAATATGCATG | Site saturation mutagenesis of B. gibsonii AP
HP200sat177_FW | GGAGCGACTGACNNNAACAACAGACGTG | Site saturation mutagenesis of B. gibsonii AP
HP200sat177_RV | CACGTCTGTTGTTNNNGTCAGTCGCTCC | Site saturation mutagenesis of B. gibsonii AP
HP200222sat_FW | CTACACCTCATGTTNNNGGAGTAGCTGCG | Site saturation mutagenesis of B. gibsonii AP
HP200222sat_RV | CGCAGCTACTCCNNNAACATGAGGTGTAG | Site saturation mutagenesis of B. gibsonii AP
HP200sat247_FW | CGTAATCATTTGAAAAATNNNGCGACGAATCTAGG | Site saturation mutagenesis of B. gibsonii AP
HP200sat247_RV | CCTAGATTCGTCGCNNNATTTTTCAAATGATTACG | Site saturation mutagenesis of B. gibsonii AP
HP200sat253_FW | GCGACGAATCTAGGANNNTCATCTCAATTTGG | Site saturation mutagenesis of B. gibsonii AP
HP200sat253_RV | CCAAATTGAGATGANNNTCCTAGATTCGTCGC | Site saturation mutagenesis of B. gibsonii AP
HP200N253D_FW | CGACGAATCTAGGAGACTCATCTCAATTTG | Site-directed mutagenesis of B. gibsonii AP
HP200N253D_RV | CAAATTGAGATGAGTCTCCTAGATTCGTCG | Site-directed mutagenesis of B. gibsonii AP
SDMIle21Val_FW | CCACTGTGCATAATCGTGGAGTAACAGGATCTGGAGT | Site-directed mutagenesis of B. gibsonii AP
SDMIle21Val_RV | ACTCCAGATCCTGTTACTCCACGATTATGCACAGTGG | Site-directed mutagenesis of B. gibsonii AP
SDMMet122Leu_FW | CGAATAACATGCATATTGCAAACCTGAGTCTCGGTAGTGATGC | Site-directed mutagenesis of B. gibsonii AP
SDMMet122Leu_RV | GCATCACTACCGAGACTCAGGTTTGCAATATGCATGTTATTCG | Site-directed mutagenesis of B. gibsonii AP
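The forward and reverse mutagenesis primers in Table 2.4 are designed as fully complementary pairs, which can be checked with a few lines of code. The sketch below tests three of the pairs from the table; the helper name `reverse_complement` and the dictionary layout are illustrative choices, not part of the original work.

```python
# Translation table for DNA bases; N (any base) pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Three primer pairs copied from Table 2.4 as (forward, reverse).
primer_pairs = {
    "HP200sat21": ("CATGGCATAATCGTGGANNNACAGGATCTGGAG",
                   "CTCCAGATCCTGTNNNTCCACGATTATGCCATG"),
    "HP200sat87": ("GCACCAAGTGCTNNNCTATACGCTGTAAAG",
                   "CTTTACAGCGTATAGNNNAGCACTTGGTGC"),
    "HP200N253D": ("CGACGAATCTAGGAGACTCATCTCAATTTG",
                   "CAAATTGAGATGAGTCTCCTAGATTCGTCG"),
}

for name, (forward, reverse) in primer_pairs.items():
    print(name, reverse_complement(forward) == reverse)
```

The same check can be extended to the remaining pairs, and is a quick way to catch transcription errors when ordering oligonucleotides.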

2.1.5. – Cell culture media and cultivation

Cells were routinely cultivated in Luria Broth (LB) medium supplemented with appropriate antibiotics using a shaking incubator (Multitron II, Infors GmbH, Einsbach, Germany) at 37°C and 250 rpm for 24 h. Antibiotics used for cell culture are listed in Table 2.5.

LB medium: 1% (w/v) peptone from casein, 0.5% (w/v) yeast extract, 1% (w/v) NaCl.
LB skim milk medium (LBM): LB medium plus 1% (w/v) skim milk.
LB or LBM agar plates contain 1.5% (w/v) agar.

Table 2.5. - Antibiotics used for cell culture in this work.

Antibiotic | Stock (mg/ml) | Solvent | Working concentration (µg/ml)
Ampicillin | 100 | Water | 100
Tetracycline | 10 | 70% ethanol | 15

2.2. - Methods

2.2.1. - Cloning

All gene cloning and manipulation steps were carried out according to standard molecular cloning protocols [83]. Preparation of competent E. coli cells and transformation were carried out by standard heat shock transformation [84]. The BgAP gene, along with its pre-pro-sequence and the Bacillus promoter, was cloned into the pHY300plk shuttle vector (Takara Bio Inc, Shiga, Japan) using the BamHI and EcoRI restriction sites. The generated construct was named pHYHP200 (6243 bp). Plasmids were verified by PCR, restriction enzyme digestion and sequence analysis. Sequence results were analyzed with Vector NTI 9 (Invitrogen). Plasmid isolation, gel purification and PCR purification kits were purchased from Qiagen (Hilden, Germany).

2.2.2. - Random mutagenesis library generation

Random mutagenesis library generation and DNA recombination of the BgAP gene were performed using the SeSaM-Biotech Sequence Saturation Mutagenesis Kit (SeSaM-Biotech, Bremen, Germany). Only the mature sequence of the protease was chosen for randomization, and the SeSaM template was generated by PCR using the specially designed primers SeSaM_FW_HP200 and SeSaM_RV_HP200. The PCR conditions included Taq PCR buffer (50 mM KCl, 10 mM Tris-HCl pH 9.0, 1.5 mM MgCl2, and 0.1% v/v Triton X-100), 0.2 mM of dNTP mix, 400 nM of each SeSaM primer, 10 ng of pHYHP200 plasmid template and 3.5 units of Taq polymerase in a total volume of 75 µl. PCR was carried out in an Eppendorf (Hamburg, Germany) Mastercycler 5331 thermal cycler (94°C for 15 s, 54°C for 30 s and 72°C for 20 s for the first 20 cycles, then 94°C for 15 s, 64°C for 30 s and 72°C for 20 s for the last 10 cycles). The resulting PCR product was purified using a PCR purification kit. SeSaM library generation was performed as previously described [7, 80].

The randomized sequence of the resulting library was cloned into pHY300plk using Megaprimer PCR of Whole Plasmid (MEGAWHOP) [85]. MEGAWHOP was carried out in Taq PCR buffer with 50 ng of pHYHP200 plasmid template, 500 ng of SeSaM library PCR product from step 4, 0.3 mM of each dNTP and 1.5 units of Pfu polymerase in a total volume of 50 µl. The PCR mixture was incubated at 72°C for 5 min, then heated to 95°C for 90 s and finally subjected to thermal cycling (98°C for 45 s, 55°C for 45 s and 72°C for 4 min, 25 cycles). The resulting PCR product was treated with DpnI (20 U) for 4 h at 37°C. The enzyme was inactivated by incubation at 80°C for 20 min. The cloned library was then transformed into chemically competent E. coli DH5α cells and plated on LB agar plates containing 100 µg/ml ampicillin. After transformation, approximately 4000 single colonies were pooled and plasmid preparation was performed using the Qiagen Miniprep Kit.

2.2.3. - Protein expression

The resulting plasmid library was transformed into electrocompetent B. subtilis WB600 and plated onto LB agar plates containing 15 µg/ml tetracycline and 1% skim milk. Colonies generating a clearance halo on skim milk were picked and resuspended in separate wells of a 96-well flat-bottom microtiter plate (Greiner, Frickenhausen, Germany) containing 200 µl of LB medium and 15 µg/ml tetracycline (master plates) and incubated for 18 h at 37°C, 900 rpm and 70% humidity. Expression was carried out by replicating the master plates into V-bottom microtiter plates (expression plates) containing 150 µl of LB medium and 15 µg/ml tetracycline, followed by incubation for 24 h at 37°C, 900 rpm and 70% humidity. After expression, microtiter plates were centrifuged at 3220 g for 30 min at 4°C and the supernatant was transferred to a new microtiter plate for screening.

2.2.4. - Screening for improved activity at 15°C

Screening was performed by adding 8 µl of supernatant from each clone to 192 µl of 2% (w/v) skim milk (Fluka) in 100 mM Tris/HCl buffer pH 8.6, preincubated at 15°C. Proteolytic activity for samples within the same microtiter plate was defined as the decrease in absorbance (clearance) at 580 nm over 10 min of incubation at 15°C, measured in Absorbance Units per min (AU/min).
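The clearance rate in AU/min can be obtained from a series of absorbance readings as the negative slope of a straight-line fit. The following is a minimal sketch under the assumption that the decrease is linear over the 10 min window; the function name and the absorbance values are hypothetical and only illustrate the calculation.

```python
def clearance_rate(times_min, absorbances):
    """Least-squares slope of A580 versus time, sign-flipped so that
    a decreasing absorbance gives a positive clearance (AU/min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a)
              for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return -sxy / sxx


# Hypothetical readings for one well, taken every 2 min for 10 min.
times = [0, 2, 4, 6, 8, 10]
a580 = [1.20, 1.16, 1.12, 1.08, 1.04, 1.00]
rate = clearance_rate(times, a580)
print(f"clearance: {rate:.3f} AU/min")
```

Fitting all time points rather than taking only the first and last reading makes the rate less sensitive to noise in a single measurement.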

2.2.5. - Screening for improved thermostability

Cell culture supernatants (50 µl) were incubated for 30 min at 58°C in an Eppendorf Mastercycler 5331 thermal cycler. Protease activity of the treated supernatant was then determined and compared to the activity of the untreated supernatant, and the residual activity was calculated. Residual activity (%) was defined as the activity of the treated sample in Absorbance Units per min (AU/min) divided by the activity of the untreated sample, multiplied by 100.
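Written as a formula, with the symbols $v_{\text{treated}}$ and $v_{\text{untreated}}$ introduced here only for illustration (clearance rates in AU/min of the heat-treated and untreated supernatant), the definition reads:

```latex
\[
\text{Residual activity}\ (\%) \;=\; \frac{v_{\text{treated}}}{v_{\text{untreated}}} \times 100
\]
```

For instance, a hypothetical clone with 0.012 AU/min after heat treatment and 0.020 AU/min without treatment would retain 60% residual activity.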

2.2.6. - Site-directed mutagenesis and site saturation mutagenesis of BgAP variants

Single amino acid site-directed mutagenesis (SDM) of BgAP variants was performed according to the published method [86]. The oligonucleotides used for SDM of BgAP are listed in Table X. The procedure consists of two stages. In stage one, two extension reactions are performed in separate tubes, one containing the forward primer and the other containing the reverse primer. Subsequently, the two reactions are mixed and stage two is carried out. For the mutagenic PCR (first stage: 94°C for 2 min, one cycle; 94°C for 30 s, 55°C for 1 min, 72°C for 10 min, three cycles; second stage: 94°C for 2 min, one cycle; 94°C for 30 s, 55°C for 1 min, 68°C for 10 min, 15 cycles; 68°C for 30 min, one cycle), Pfu DNA polymerase (1 U), dNTP mix (0.2 mM) and each primer (0.4 µM) together with plasmid template (50 ng) were used in a 50 µl reaction volume. Following the PCR, DpnI (20 U) was added and the mixture was incubated at 37°C for 4 h. The product was purified using a Qiagen kit, transformed into E. coli DH5α, verified by sequencing, and further transformed into B. subtilis cells for expression and analysis.

2.2.7. - Protein purification

BgAP and its variants were purified from a 36 h, 250 ml B. subtilis WB600 culture. Supernatant was obtained by centrifugation at 3220 g for 30 min at 4°C. The resulting supernatants were filtered using a #30 glass fiber filter (Schleicher & Schuell Microscience, Dassel, Germany) and again using a 0.2 µm polyethersulfone (PES) membrane filter (Sartorius, Göttingen, Germany). The supernatant was concentrated to 10 ml using a Millipore ultrafiltration stirred cell with a 5 kDa cutoff regenerated cellulose ultrafiltration membrane (Millipore, Billerica, MA, USA). Concentrated supernatant was diluted to 200 ml by adding 190 ml of 40 mM HEPES buffer pH 7.8, re-concentrated to 10 ml and loaded onto a Toyopearl Super Q 650c (TOSOH) anion exchange chromatography column (25 ml) previously equilibrated with 40 mM HEPES pH 7.8 buffer. The flow rate and pressure were managed by an ÄKTA purifier FPLC system (Amersham, GE Healthcare Europe, Freiburg, Germany) using the Unicorn software package. BgAP protease does not bind to this column at pH 7.8 and was collected in the buffer front fractions. Protein fractions showing protease activity were pooled and loaded onto a Toyopearl Super QSP-650c (TOSOH) cation exchange chromatography column (20 ml) previously equilibrated with buffer. The column was rinsed with 50 ml of buffer, and adsorbed protein was eluted with a linear gradient of NaCl up to 1 M. The eluted protein fractions showing protease activity were pooled, diluted 10-fold in HEPES buffer and concentrated to 1 ml using a 10 kDa cutoff Amicon Ultra centrifugal filter (Millipore). Protein concentration of purified BgAP variants was normalized by measuring the total protein concentration using a BCA protein assay kit (Thermo) and by quantification against total protein content with the Agilent 2100 Bioanalyzer (Agilent Protein 230 Kit, Agilent, Santa Clara, USA).

2.2.8. - Proteolytic activity: Suc-AAPF-pNA

After purification, proteolytic activity was determined using the synthetic peptide substrate succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (s-AAPF-pNA) by quantifying the release of free p-nitroaniline (pNA) at 410 nm [87]. The concentration of the peptide substrate was determined using an extinction coefficient of ε315 = 14,000 M⁻¹ cm⁻¹ [87]. The kinetic constants for BgAP and its variants were calculated at 15°C and 37°C from a series of initial rates at substrate concentrations covering the range of 0.1 to 3 mM, measured in 100 mM Tris/HCl pH 8.6 and 10 mM CaCl2 (500 µl). The reaction mixture was incubated at the specified temperature for 2 min and the reaction was started by adding 3 µl of enzyme (7.2 µM, 43.8 nM final); the increase in absorbance at 410 nm was recorded during the first 3 min of reaction. Released pNA concentration was calculated using ε410 = 8,800 M⁻¹ cm⁻¹ as recommended by the manufacturer (Sigma).
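The two calculations in this section, converting an absorbance slope into a pNA release rate and estimating kinetic constants from initial rates, can be sketched as follows. The text does not state which fitting procedure was used, so the Hanes-Woolf linearization below is an assumption chosen because it needs no external libraries; the 1 cm path length, the function names and the noise-free test data are likewise hypothetical.

```python
EPSILON_410 = 8800.0  # M^-1 cm^-1, value quoted in the text for pNA


def pna_rate_molar_per_min(delta_a410_per_min, path_cm=1.0):
    """Beer-Lambert conversion of an absorbance slope (AU/min) into a
    pNA release rate (M/min). The 1 cm path length is an assumption."""
    return delta_a410_per_min / (EPSILON_410 * path_cm)


def hanes_woolf_fit(substrate_mM, rates):
    """Estimate (Km, Vmax) from the linear form [S]/v = [S]/Vmax + Km/Vmax."""
    y = [s / v for s, v in zip(substrate_mM, rates)]
    n = len(substrate_mM)
    mean_x = sum(substrate_mM) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(substrate_mM, y))
    sxx = sum((x - mean_x) ** 2 for x in substrate_mM)
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Hypothetical noise-free rates generated with Km = 0.5 mM and Vmax = 2.0
# (arbitrary units) over the 0.1 to 3 mM range used in the assay.
substrate = [0.1, 0.25, 0.5, 1.0, 2.0, 3.0]
rates = [2.0 * s / (0.5 + s) for s in substrate]
km, vmax = hanes_woolf_fit(substrate, rates)
print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f}")
```

With real, noisy data a nonlinear least-squares fit of the Michaelis-Menten equation is generally preferred, since linearizations distort the error structure; kcat then follows from Vmax divided by the final enzyme concentration.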

2.2.8. - Enzyme activity: Azocasein

Proteolytic activity on a macromolecular substrate was assessed using azo-dye-labeled casein (Azocasein, Sigma). Azocasein stocks of 0.02 to 1.2 mM were prepared in 100 mM Tris/HCl pH 8.6 and filtered using a 0.45 µm filter. The proteolytic assay was performed in 500 µl Azocasein solution containing an enzyme concentration of 125 µM at 37°C for 1 h. After the reaction, 200 µl of assay solution was mixed with 50 µl of 15% trichloroacetic acid (TCA) to precipitate undigested casein and centrifuged at 3220 g for 30 min at 4°C. Then 100 µl of supernatant was transferred to a flat-bottom MTP and 4 µl of 2 M NaOH was added. Absorbance was measured at 440 nm on a SPECTROstar Omega MTP reader (BMG Labtech, Offenburg, Germany). The rate and concentration of released azo-dye were determined using a calibration curve of overdigested Azocasein.

2.2.9. - Temperature dependence of the specific activity

Temperature dependence of the specific activity was determined in 100 mM Tris/HCl pH 8.6, 1 mM CaCl2 and 3 mM s-AAPF-pNA in a total assay volume of 100 µl. The solution was equilibrated to the target temperature for 2 min and the reaction was started by adding 4 µl of enzyme (1.5 µM, 43.8 nM final). The reaction was carried out in duplicate for 120 s (5 to 40°C), 90 s (50 to 70°C) or 60 s (75 and 80°C) and then stopped by the addition of 4 µl PMSF (10 mM, 400 µM final), and the product formation per second was measured on a SPECTROstar Omega MTP reader. Released pNA concentration was calculated using ε410 = 8,800 M⁻¹ cm⁻¹ as recommended by the manufacturer (Sigma).

2.2.10. - Thermal inactivation

Thermal inactivation was monitored by incubating 7 µM enzyme for 1 to 60 min at 60°C in 50 mM HEPES pH 7.8, 10 mM CaCl2 and 250 µg/ml BSA. Residual activity was measured by adding 4 µl of treated enzyme solution to 196 µl of assay solution (2% skim milk in 100 mM Tris/HCl pH 8.6) and measuring the decrease in absorbance at 580 nm at 20°C in a TECAN Sunrise microtiter plate reader (Männerdorf, Switzerland). Clearance per second was determined and residual activity was plotted versus incubation time for each variant.
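One common way to summarize such residual-activity curves is an inactivation rate constant and half-life, obtained under the assumption of first-order decay. That assumption, the function name and the data below are not taken from this work; the sketch only shows how the numbers would be derived if the decay is exponential.

```python
import math


def first_order_fit(times_min, residual_percent):
    """Least-squares fit of ln(residual/100) = ln(a0) - k*t.

    Returns the rate constant k (1/min) and the half-life (min).
    Assumes first-order inactivation, which must be checked against the data.
    """
    y = [math.log(r / 100.0) for r in residual_percent]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(y) / n
    sxy = sum((t - mean_t) * (yi - mean_y) for t, yi in zip(times_min, y))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    k = -sxy / sxx
    return k, math.log(2) / k


# Hypothetical residual activities generated with k = 0.05 per min.
times = [1, 5, 10, 20, 40, 60]
residual = [100.0 * math.exp(-0.05 * t) for t in times]
k, half_life = first_order_fit(times, residual)
print(f"k = {k:.3f} 1/min, half-life = {half_life:.1f} min")
```

A clearly curved plot of ln(residual activity) against time would indicate that a single first-order process does not describe the inactivation.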

2.2.11. - Thermal shift assay

To monitor protein unfolding and reveal differences in unfolding between wild type BgAP and the generated variants, the environmentally sensitive fluorescent dye Sypro orange (Invitrogen) was used. The unfolding process exposes hydrophobic residues and results in a large increase in fluorescence, which is used to monitor the protein-unfolding transition. The assay was performed in a Mastercycler ep realplex real-time thermocycler (Eppendorf, Hamburg, Germany). The assay mix contained 5X Sypro orange, 18 µl of 50 mM HEPES, 100 mM NaCl, 1 mM CaCl2 containing 1.25 mg/ml enzyme (46 µM, 37 µM final) and 2 µl of 25 mM PMSF (2 mM final). The block was heated from 20 to 95°C and fluorescence was measured at every 1°C increment. The fluorescence intensity was measured using Ex/Em = 470/550 nm. Data analysis was performed using the Realplex software (Eppendorf) and Tm values were calculated for each variant.
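The algorithm used by the Realplex software is not described here. A common approach for thermal shift data, shown below purely as an assumed illustration, is to take Tm as the temperature at which the fluorescence rises most steeply (the maximum of dF/dT). The function name and the synthetic sigmoid curve are hypothetical.

```python
import math


def melting_temperature(temps_c, fluorescence):
    """Return the midpoint of the temperature interval with the steepest
    fluorescence increase (maximum finite-difference dF/dT)."""
    best_slope = None
    best_tm = None
    for i in range(len(temps_c) - 1):
        slope = ((fluorescence[i + 1] - fluorescence[i])
                 / (temps_c[i + 1] - temps_c[i]))
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_tm = (temps_c[i] + temps_c[i + 1]) / 2
    return best_tm


# Synthetic unfolding curve: sigmoid centered at 60.5 °C, read every 1 °C
# from 20 to 95 °C as in the assay described above.
temps = list(range(20, 96))
curve = [1.0 / (1.0 + math.exp(-(t - 60.5) / 2.0)) for t in temps]
print(melting_temperature(temps, curve))
```

With 1°C steps the estimate is only resolved to 0.5°C; fitting a Boltzmann sigmoid to the transition region is a frequent refinement when smaller Tm differences between variants matter.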

2.2.12. - Homology modeling

The three-dimensional structure model for BgAP was generated based on the crystal structure of subtilisin BL (PDB code 1ST3) [88], which shares 77.4% amino acid sequence identity with BgAP. Coordinates were obtained from the RCSB Protein Databank (PDB code 1ST3A). The homology model was generated using the Modeller software 9v6 [89]. The program VMD [90] was used for generating the images of the molecular structures.

3. – Results

In this work, Directed Evolution of B. gibsonii Alkaline Protease (BgAP) was performed towards two different properties important for industrial applications. In section 3.1, the results from Directed Evolution towards improved activity at 15°C are presented. In section 3.2, the results from a parallel evolution of B. gibsonii towards increased thermal stability are shown. Amino acid substitutions identified from the previous Directed Evolution were introduced to the thermostable variant, in order to generate a protease that is more active at low temperatures and with a higher thermal stability.

3.1. – Directed Evolution to improve activity at low temperatures

3.1.1. - BgAP expression system

The BgAP gene and promoter were subcloned from the provided Bacillus expression vector into the pHY300plk shuttle vector to generate the pHYHP200dE expression vector (Figure 2.6). Sequencing analysis confirmed the successful construction of the plasmid. After transformation, B. subtilis WB600 growing on skim milk agar plates developed clear zones around the colonies, evidencing the expression of an active protease. The expression was also confirmed in liquid batch culture, where the cell culture supernatant was active when tested using both skim milk and suc-AAPF assays.

Figure 2.6. - Scheme of the constructed pHYHP200dE shuttle vector. The promoter and the full gene sequence of the B. gibsonii alkaline protease were subcloned into pHY300plk shuttle vector. The resulting construct was used for SeSaM library construction and protein expression during the Directed Evolution experiments.

3.1.2. - Skim milk microtiter plate assay

Screening of the random mutagenesis libraries was performed in two stages: a pre-screening on agar plates to identify active clones, and a microtiter plate (MTP) assay to quantify and compare the protease activity of the active clones from the library against the original WT BgAP. The substrate chosen to measure protease activity was 2% (w/v) skim milk. Hydrolysis of the casein present in skim milk by the protease produces smaller, soluble peptides, resulting in a clear solution. This reaction can be followed by the decrease in absorbance of the skim milk solution at 580 nm over time. Skim milk has several advantages in screening applications over more elegant synthetic substrates used to measure protease activity. Skim milk is widely available, low-cost and non-toxic, but its principal advantage is that it presents a macromolecular substrate resembling the target of the protease in its “real world” applications. Small synthetic peptides do not represent the variety of substrate conformations offered by a full protein scaffold, nor do they account for the macromolecular interactions that occur when a protease digests a full-sized protein. The use of skim milk in the screening process avoids selecting for increased substrate specificity during Directed Evolution, reducing the chances of generating a variant with a high turnover rate for a given synthetic substrate but sub-optimal performance on macromolecular substrates.

Figure 2.7. - Validation of the skim milk MTP screening assay. The assay was performed using 2% skim milk in Tris/HCl buffer pH 8.6 using 8 µl of supernatant containing the wild type protease. The coefficient of variation (%) was calculated for the measured absorbance (blue) and the calculated clearance due to proteolytic activity (red) at 580 nm, as well as for the optical density (OD) of the cell culture per well. The values per well for each category were plotted in descending order.

A skim milk assay was adapted to perform the screening in MTP and optimized to measure the protease activity of a cell culture supernatant at 15°C. As BgAP is an alkaline protease, the optimum pH of the assay should be in the range of 8.0 to 10.0. In order to minimize the self-hydrolysis of skim milk at high pH values, different pH values were tested and the decrease in absorbance of non-treated skim milk/buffer solutions was measured over time. The highest pH value with an acceptable self-hydrolysis rate (e.g. 0.05 AU/min) was 8.6. The assay was performed at pH 8.6 and optimized to a standard deviation of approximately 14% (Figure 2.7).
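The variation reported for assay validation can be expressed as a coefficient of variation, i.e. the standard deviation of the per-well values relative to their mean. A minimal sketch, assuming the sample standard deviation is used and with hypothetical clearance values standing in for a plate of wild type wells:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical clearance rates (AU/min) from three wild type wells.
wild_type_wells = [0.018, 0.020, 0.022]
cv = coefficient_of_variation(wild_type_wells)
print(f"CV = {cv:.1f}%")
```

In a screening context this number sets the noise floor: apparent improvements smaller than the assay variation cannot be distinguished from well-to-well scatter.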

3.1.3. - SeSaM library generation

The strategy of the BgAP library generation is summarized in Figure 2.8. The SeSaM library of BgAP was generated by following the provided manual, and the resulting PCR product was cloned into pHYHP200dE using MEGAWHOP [85] and then transformed into chemically competent E. coli DH5α cells. The resulting colonies were pooled and plasmid DNA was recovered for transformation into B. subtilis WB600 cells, in which protein expression was performed. The library sizes in terms of colony numbers for E. coli and B. subtilis are detailed in Table 2.6.

Figure 2.8. - Summary of the main steps of BgAP library generation. The PCR product of the SeSaM library of BgAP (A) was used as megaprimer for the MEGAWHOP cloning (B), resulting in the full-sized pHYHP200 plasmid containing the SeSaM library of BgAP. The library was then repaired and ligated in E. coli and transformed into B. subtilis WB600 for expression and screening (C).

Table 2.6. – Resulting number of clones from transformation of MEGAWHOP PCR product into E. coli DH5α and clones obtained after plasmid isolation and transformation into B. subtilis WB600.

SeSaM | E. coli DH5α | B. subtilis WB600 | Active (%)
Round 1 | 3800 | 4150 | 22
Round 2 | 3200 | 4500 | 25
Round 3 | 8000 | 4200 | 20

3.1.4. - Screening of BgAP SeSaM library round 1

From approximately 4150 B. subtilis colonies resulting from transformation, those producing a clearance halo on skim milk agar plates were repicked onto fresh LB tet skim milk agar plates. After 24 h incubation at 37°C, 933 (22%) colonies producing halos were picked into 96-well flat-bottom MTP and replicated into 96-well V-bottom MTP, and expression was carried out for 24 h. The OD580 was measured for each well, with a variation under 10%. Plates were centrifuged for 20 min at 3220 g and 4°C. After centrifugation, 8 µl of cell culture supernatant from each well was transferred to a fresh flat-bottom MTP for activity assays at 15°C. The activity ratio profile versus WT BgAP of all active clones screened in this round is summarized in Figure 2.9. Approximately 150 clones had an activity ratio at 15°C higher than 1.25.

Figure 2.9. - MTP screening profile of all active clones from BgAP SeSaM library round 1. Activity ratio was calculated for each clone against WT BgAP and plotted in descending order. The distribution shows approximately 150 clones with a significantly higher activity than WT BgAP. Clones with the highest activity ratio were selected for further testing.
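The ranking step behind such a screening profile can be sketched in a few lines: each clone's clearance rate is divided by that of the reference on the same plate, the ratios are sorted in descending order, and clones above the 1.25 threshold are kept. The function name, clone labels and rates below are hypothetical and do not correspond to clones from this library.

```python
def select_improved(clone_rates, reference_rate, threshold=1.25):
    """Return (clone, activity ratio) pairs above the threshold,
    sorted from highest to lowest ratio."""
    ratios = {clone: rate / reference_rate
              for clone, rate in clone_rates.items()}
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [(clone, ratio) for clone, ratio in ranked if ratio > threshold]


# Hypothetical clearance rates (AU/min) for four wells and the reference.
plate = {"clone_a": 0.020, "clone_b": 0.031, "clone_c": 0.026, "clone_d": 0.012}
hits = select_improved(plate, reference_rate=0.020)
for clone, ratio in hits:
    print(clone, round(ratio, 2))
```

Because the ratio is taken against a reference on the same plate, plate-to-plate differences in temperature or substrate batch largely cancel out.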

Clones consistently having the highest activity ratio were selected for further testing (Figure 2.10). The activity of 8 µl of cell culture supernatant was measured at 15°C and the protease expression was analyzed by SDS-PAGE. All of the analyzed clones showed higher activity values, measured as clearance of skim milk in AU/min. However, the SDS-PAGE results revealed an increased protease expression (27 kDa) in the supernatants from all of the selected variants. Thus, the improved activity observed in the MTP assay is a direct result of improved expression in the selected clones.

Figure 2.10. - Protease activity and expression in supernatants from selected clones of BgAP SeSaM library round 1. Protease activity (A) and expression (B) were analyzed for cell culture supernatants of selected clones of the BgAP SeSaM library. BgAP corresponds to the 27 kDa size band (arrow). Increased activity in the supernatant can be related to an increased expression in the analyzed clones.

DNA sequencing of selected variants revealed a single mutation leading to the substitution Ile21Val. Increased expression continued after re-transformation of plasmid isolated from clone P10C10 into fresh Bacillus competent cells, discarding the possibility of host-specific influence on the improved expression of the protease.

To check the mutation load generated by SeSaM, four random clones from the library were sequenced. The mutation load and type results showed that all four clones carried at least 3 mutations and 40% of all mutations were transversions, which is an exclusive feature of the employed SeSaM technology (Table 2.7). Additionally, two consecutive mutations were found, accounting for 15% of the total number.

Table 2.7. - Sequencing results for 4 random clones from the BgAP SeSaM library. Plasmid DNA from 4 random clones was sequenced and mutations were analyzed. Out of the 13 found mutations, 5 were transversions (40%) and 2 were consecutive (15%) showing a transversion enriched library due to the SeSaM protocol.

Clone | Position on BgAP gene | Mutation | Type
1 | 495 | GA -> AG | Transition-Transition
1 | 630 | T -> C | Transition
1 | 714 | T -> C | Transition
1 | 1020 | T -> C | Transition
2 | 412 | G -> C | Transversion
2 | 690 | GC -> AA | Transition-Transversion
2 | 785 | G -> A | Transition
2 | 1137 | A -> G | Transition
3 | 44 | T -> A | Transversion
3 | 669 | G -> A | Transition
3 | 784 | G -> T | Transversion
4 | 355 | T -> C | Transition
4 | 712 | G -> T | Transversion
4 | 833 | T -> - | Deletion
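The classification used in Table 2.7 follows the usual rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine/pyrimidine exchange is a transversion. The sketch below applies this rule to the substitutions listed in the table. The counting convention is an assumption made to match the caption: each consecutive (two-base) change is counted as one mutation event, and the single-base deletion at position 833 is left out.

```python
PURINES = {"A", "G"}


def classify_base_change(ref, alt):
    """Transition if both bases are purines or both are pyrimidines,
    otherwise transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


# Substitutions from Table 2.7 as (clone, position, original, mutant).
table_2_7 = [
    (1, 495, "GA", "AG"), (1, 630, "T", "C"), (1, 714, "T", "C"),
    (1, 1020, "T", "C"), (2, 412, "G", "C"), (2, 690, "GC", "AA"),
    (2, 785, "G", "A"), (2, 1137, "A", "G"), (3, 44, "T", "A"),
    (3, 669, "G", "A"), (3, 784, "G", "T"), (4, 355, "T", "C"),
    (4, 712, "G", "T"),
]

with_transversion = 0
consecutive = 0
for clone, position, ref, alt in table_2_7:
    kinds = [classify_base_change(r, a) for r, a in zip(ref, alt)]
    if "transversion" in kinds:
        with_transversion += 1
    if len(ref) > 1:
        consecutive += 1

total = len(table_2_7)
print(f"{with_transversion}/{total} events contain a transversion")
print(f"{consecutive}/{total} events are consecutive")
```

Under this convention the script counts 13 events, 5 of which contain a transversion and 2 of which are consecutive, in line with the numbers given in the caption of Table 2.7.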

Since the mutation load of the library was sufficient, it can be assumed that none of the variants had a higher total activity than the expression mutants, and therefore only expression mutants were selected for rescreening. For the following SeSaM rounds of BgAP, the expression mutant was used as parent and also as screening control. Active mutants were expected to have similar expression levels, so any improved activity found should in this case be due to improved specific activity of the protease at 15°C.

3.1.5. - Screening of BgAP SeSaM library round 2

To check the mutation load generated by the SeSaM method, six random clones were sequenced and the mutation load and type results were analyzed. 48% of all mutations were transversions (Table 2.8). Additionally, one consecutive mutation was found accounting for 3% of the total amount.

Table 2.8. - Sequencing results for six random clones from BgAP SeSaM library. Plasmid DNA from six random clones (three from A library and three from G library) was sequenced and mutations were analyzed. Out of the 29 found mutations, 14 were transversions (48%) and 1 was consecutive (3%).

Clone | Position on BgAP gene | Mutation | Type
A1 | 470 | G -> C | Transversion
A1 | 862 | G -> A | Transition
A1 | 961 | T -> A | Transversion
A1 | 1029 | A -> T | Transversion
A2 | 458 | A -> G | Transition
A2 | 568 | A -> T | Transversion
A2 | 571 | - -> T | Insertion
A2 | 631 | G -> A | Transition
A2 | 682 | A -> G | Transition
A2 | 699 | A -> G | Transition
A2 | 713 | G -> A | Transition
A2 | 815 | T -> C | Transition
A2 | 832 | T -> A | Transversion
A2 | 841 | G -> A | Transition
A2 | 910 | G -> C | Transversion
A2 | 988 | G -> A | Transition
A2 | 999 | T -> A | Transversion
A3 | 462 | A -> T | Transversion
A3 | 572 | G -> A | Transition
A3 | 589 | C -> G | Transversion
A3 | 732 | C -> A | Transversion
G1 | 550 | G -> - | Deletion
G1 | 555 | T -> A | Transversion
G1 | 680 | C -> G | Transversion
G1 | 842 | C -> G | Transversion
G2 | none | none | none
G3 | 449 | C -> T | Transition
G3 | 640 | A -> G | Transition
G3 | 859 | A -> G | Transition
G3 | 891 | T -> C | Transition
G3 | 1049 | AC -> GG | Transition-Transversion

From approximately 4500 B. subtilis colonies after transformation, those producing a clearance halo on skim milk agar plates were repicked onto fresh LB tet skim milk agar plates. Colonies producing a clearance halo were picked into 96-well flat-bottom MTP and replicated into 96-well V-bottom MTP, and expression was carried out for 24 h. Plates were centrifuged for 20 min at 3220 g and 4°C. After centrifugation, an equivalent of 4 µl of cell culture supernatant from each well was transferred to a fresh flat-bottom MTP for the activity assay at 15°C. Protease activity was measured and compared against clone P10C10 from the first round. The activity ratio profile versus variant P10C10 of all active clones screened in this round is summarized in Figure 2.11. Approximately 10 clones had an activity ratio at 15°C higher than 1.25, about ten times fewer than in the previous Directed Evolution round.

Figure 2.11. - MTP screening profile of all active clones from BgAP SeSaM library round 2. Activity ratio was calculated for each clone against first round variant P10C10 and plotted in descending order. The distribution shows approximately 50 clones with a significantly higher activity than WT BgAP. Clones with the highest activity ratio were selected for further testing.

Clones consistently having the highest activity ratio were selected for further testing (Figure 2.12). The proteolytic activity of an equivalent to 4 µl of cell culture supernatant was measured at 15°C and the protease expression was analyzed by SDS-PAGE. All of the analyzed clones showed high activity values measured in clearance of skim milk as AU/min. The SDS-PAGE analysis results evidenced a further increase in protease expression in the supernatants in all of the selected variants.

Figure 2.12. - Protease activity and expression in supernatants of selected clones of BgAP SeSaM library round 2. Protease activity (A) and expression (B) were analyzed for cell culture supernatants from selected clones of the SeSaM library. BgAP corresponds to the 27 kDa size band (arrow). Increased activity in the supernatant can be related to an increased expression in the analyzed clones.

DNA sequencing revealed the mutations present in the selected mutants. All of the selected clones presented mutations in the protease gene. The type of mutations and the amino acid substitutions that they generated are summarized in Table 2.9. Clone PVB4 presented two silent mutations, so its improved activity was possibly due to increased expression. Clones PIIE10, PVIG10 and PXIIH3 presented mutations leading to amino acid substitutions, three of them being transversions. Since all of the substitutions found in the selected clones from this Directed Evolution round were different from each other, all of these clones were selected as parents for the next Directed Evolution round, in which recombination of these substitutions was expected to occur in addition to the generation of new mutations by the SeSaM method.


Table 2.9. - Sequencing results of selected clones of the second round BgAP SeSaM library. Plasmid DNA from selected clones was sequenced and the mutations were analyzed. Mutations in the table are those additional to Ile21Val from the first round. Out of the 11 mutations found, three were transversions (27%) and one was a consecutive mutation (9%), showing a transversion-enriched library as expected from the SeSaM protocol. Clone PVB4 showed only silent mutations, suggesting that the improved protease activity found in this clone results only from increased expression.

Clone    Position on BgAP gene   Mutation   Type                      Amino acid substitution
PIIE10   459                     T -> C     Transition                none
PIIE10   609                     T -> C     Transition                none
PIIE10   869                     A -> T     Transversion              Asn177Ile
PVB4     1128                    C -> T     Transition                none
PVB4     1137                    G -> A     Transition                none
PVIG10   763                     A -> G     Transition                Ser142Gly
PVIG10   901                     A -> G     Transition                Thr188Ala
PXIIH3   703                     A -> C     Transversion              Met122Leu
PXIIH3   1003-1004               GC -> AG   Transition-Transversion   Ala222Ser
PXIIH3   1140                    G -> A     Transition                none

3.1.6. - Screening of BgAP SeSaM library round 3

The third Directed Evolution round was performed with two main objectives. The first was to introduce further mutations into the previously generated variants and to screen for further improved proteolytic activity at 15°C. The second was to obtain variants containing a combination of the mutations found in the parent clones from the previous Directed Evolution rounds, in order to assess the cumulative effect of these substitutions. The variants PIIE10, PVIG10 and PXIIH3 from the second Directed Evolution round were used as parents for the SeSaM library generation, and the proteolytic activity of cell culture supernatant from clone PVIG10 was selected as the standard for screening, since it had the highest total activity among the selected variants. From approximately 4200 colonies resulting from transformation into B. subtilis, active clones were inoculated onto fresh LB tet skim milk agar plates. After 24 h incubation at 37°C, 880 colonies (20%) producing a halo were picked into 96-well flat-bottom MTPs and replicated into 96-well V-bottom MTPs, and expression was carried out for 24 h. The activity ratio profile versus PVIG10 of all active clones screened in this round is summarized in Figure 2.13. Approximately 10 clones had an activity ratio higher than 1.25 versus PVIG10 at 15°C.


Figure 2.13. - MTP screening profile of all active clones from BgAP SeSaM library round 3. The activity ratio was calculated for each clone against PVIG10 and plotted in descending order. The distribution shows approximately 10 clones with a significantly higher activity than the parent. Clones with the highest activity ratio were selected for further testing.

Clones consistently having the highest activity ratio versus the standard were selected for further testing. The activity of the equivalent of 2 µl of cell culture supernatant was measured at 15°C and the protease expression was analyzed by SDS-PAGE (Figure 2.14). At the end of the third Directed Evolution round of BgAP, the proteolytic activity per volume of supernatant had increased approximately 10-fold. However, the SDS-PAGE analysis showed, as in previous rounds, a comparable increase in the amount of protease present in the cell culture supernatant, strongly suggesting a substantial increase in BgAP expression. Since it was not possible to accurately determine the specific proteolytic activity of BgAP from the cell culture supernatant, a protein purification step was needed to quantitatively compare the generated variants with the original BgAP.


Figure 2.14. - Protease activity and expression in supernatants of selected clones of BgAP SeSaM library round 3. Protease activity (A) and expression (B) were analyzed for cell culture supernatants of selected clones of BgAP SeSaM round 3. BgAP corresponds to the 27 kDa band (arrow). Increased activity in the supernatant can be related to increased expression in the analyzed clones.

DNA sequencing of the selected variants revealed the mutations introduced during the SeSaM library generation (Table 2.10). Of a total of 33 mutations found across the seven selected variants, 16 were transversions (48%), including five examples of consecutive transition-transversion mutations. As in the previous rounds, the distribution of mutations introduced in the SeSaM library is enriched in transversions. The average mutation load for the selected clones was 4 mutations per clone. It is important to mention that no mutations were found in the pre-sequence of the protease gene, suggesting that the identified mutations were indeed introduced by the SeSaM method and not in subsequent cloning steps.


Table 2.10. - Sequencing results of selected clones of BgAP SeSaM library round 3. Plasmid DNA from selected clones was sequenced and the mutations were analyzed. Mutations in the table are those additional to Ile21Val from the first round. Out of the 33 mutations found, 16 were transversions (48%) and 5 were consecutive mutations (15%), showing a transversion-enriched library.

Clone    Position on BgAP gene   Mutation   Type                      Amino acid substitution
P1A6     374                     A -> T     Transversion              Gln12Leu
P1A6     703                     A -> C     Transversion              Met122Leu
P1A6     1003-1004               GC -> AG   Transition-Transversion   Ala222Ser
P1A6     1140                    A -> G     Transition                None
P4D3     703                     A -> C     Transversion              Met122Leu
P4D3     1096                    A -> G     Transition                Asn253Asp
P5F2     460                     T -> C     Transition                None
P5F2     868-869                 AA -> GT   Transition-Transversion   Asn177Val
P5F2     1021                    G -> A     Transition                Val228Ile
P5F2     1128                    T -> C     Transition                None
P5F2     1143                    G -> A     Transition                None
P6C7     455                     G -> C     Transversion              Ser39Thr
P6C7     703                     A -> C     Transversion              Met122Leu
P6C7     868                     A -> G     Transition                Asn177Glu
P6C7     870                     C -> A     Transversion              Asn177Glu
P6C7     1128                    T -> C     Transition                None
P6C7     1143                    G -> A     Transition                None
P7D1     460                     T -> C     Transition                None
P7D1     574                     G -> T     Transversion              Val79Leu
P7D1     600                     C -> A     Transversion              Asp87Glu
P7D1     964-965                 GC -> AA   Transition-Transversion   Ala209Asn
P7D1     1003-1004               GC -> AG   Transition-Transversion   Ala222Ser
P7C3     703                     A -> C     Transversion              Met122Leu
P7C3     1003-1004               GC -> AG   Transition-Transversion   Ala222Ser
P7C3     1079                    C -> A     Transversion              Thr247Asn
P7C3     1137                    A -> G     Transition                None
P7C3     1140                    A -> G     Transition                None
P7F10    703                     A -> C     Transversion              Met122Leu
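The mutation statistics quoted for Table 2.10 can be recomputed from the table itself. The following is a minimal sketch, not the procedure used in this work: it assumes that each base of a consecutive mutation is counted separately for the transition/transversion tally, and that a consecutive mutation counts as a single event for the mutation load per clone. The function and variable names are illustrative only.

```python
# Mutations of Table 2.10 as (clone, original bases, mutated bases).
# A consecutive mutation (e.g. GC -> AG) is stored as one two-base event.
TABLE_2_10 = [
    ("P1A6", "A", "T"), ("P1A6", "A", "C"), ("P1A6", "GC", "AG"), ("P1A6", "A", "G"),
    ("P4D3", "A", "C"), ("P4D3", "A", "G"),
    ("P5F2", "T", "C"), ("P5F2", "AA", "GT"), ("P5F2", "G", "A"),
    ("P5F2", "T", "C"), ("P5F2", "G", "A"),
    ("P6C7", "G", "C"), ("P6C7", "A", "C"), ("P6C7", "A", "G"),
    ("P6C7", "C", "A"), ("P6C7", "T", "C"), ("P6C7", "G", "A"),
    ("P7D1", "T", "C"), ("P7D1", "G", "T"), ("P7D1", "C", "A"),
    ("P7D1", "GC", "AA"), ("P7D1", "GC", "AG"),
    ("P7C3", "A", "C"), ("P7C3", "GC", "AG"), ("P7C3", "C", "A"),
    ("P7C3", "A", "G"), ("P7C3", "A", "G"),
    ("P7F10", "A", "C"),
]

PURINES = {"A", "G"}


def classify(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition, otherwise a transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def mutation_spectrum(events):
    """Count single-base changes by type, consecutive events, and events per clone."""
    counts = {"transition": 0, "transversion": 0, "consecutive": 0}
    clones = set()
    for clone, ref, alt in events:
        clones.add(clone)
        if len(ref) > 1:
            counts["consecutive"] += 1
        for r, a in zip(ref, alt):  # assumption: every base of an event is tallied
            counts[classify(r, a)] += 1
    total = counts["transition"] + counts["transversion"]
    return {
        "total": total,
        **counts,
        "transversion_pct": 100 * counts["transversion"] / total,
        "consecutive_pct": 100 * counts["consecutive"] / total,
        "events_per_clone": len(events) / len(clones),
    }


spectrum = mutation_spectrum(TABLE_2_10)
print(spectrum)
```

Under these counting assumptions the sketch reproduces the 33 mutations, 16 transversions and 5 consecutive mutations given in the caption, and a load of 4 mutation events per clone.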


3.1.8. - Purification and characterization of selected clones from Directed Evolution

The differences in BgAP expression levels in the cell culture supernatants of the variants selected in the Directed Evolution experiments required a protein purification step to accurately compare the specific activity of each mutein with that of the original parent. The calculated isoelectric point (pI) of the parent is 9.70 and a reported value indicates a pI of 11 [79], so the protein should bind to a cation exchange chromatography column at pH 7.8. To remove unwanted protein impurities from the cell culture supernatant, a preceding chromatography step using an anion exchange column was added. BgAP did not bind to this column, while most of the protein content of the supernatant was retained on the stationary phase, so that the fractions containing the protease reached a semi-purified state before being loaded onto the cation exchange column. The detailed purification protocol is described in Section 2.2.7.

Figure 2.15 provides a typical example of the two-step ion exchange chromatography for the purification of B. gibsonii AP. Panel A shows the fractions obtained when the concentrated cell culture supernatant was loaded onto the anion exchange column; the blue line represents the protein content of each fraction, measured as the absorbance of the sample at 280 nm. The black arrow points to the fractions exhibiting proteolytic activity, corresponding to BgAP. After increasing the ionic strength of the running buffer, the remaining protein retained on the column was eluted. The fractions with proteolytic activity were collected and loaded onto the cation exchange column. Panel B shows the result of the cation exchange chromatography: BgAP was retained on the column while other proteins present in the sample passed through. These flow-through fractions did not show proteolytic activity. The retained protein was eluted by increasing the ionic strength and collected as a single protein peak.
The proteolytic activity of the eluted fractions was confirmed and the positive fractions were collected, pooled, desalted and reconcentrated. The protein purity was confirmed by the presence of a single band at 27 kDa corresponding to BgAP, using the Agilent Protein Kit 230 (Figure 2.16). The protein content of the purified samples was precipitated to avoid degradation. The specific protease concentration calculated for each purified variant ranged from 650 µg/ml for the WT up to 3830 µg/ml for the muteins (Table 2.11).


Figure 2.15. – Two-step ion exchange chromatography purification of BgAP. (A) 250 ml of cell culture supernatant was concentrated to 10 ml and loaded onto an anion exchange column (Toyopearl SuperQ-650c), followed by (B) a Toyopearl SP-650c cation exchange column. The protein fractions presenting proteolytic activity are marked with an arrow. The protein content of each fraction was measured as the absorbance of the sample at 280 nm.


Figure 2.16. – Agilent analysis of purified BgAP variants. The protein content of the purified samples of the selected variants was precipitated using TCA and loaded into the Agilent protein 230 chip. The red arrow corresponds to the 27 kDa size BgAP band.

Table 2.11. – Calculated specific BgAP concentration for each variant after purification. The specific concentration was calculated from the Agilent analysis with respect to the total protein concentration in each sample.

Variant   Specific concentration (µg/ml)
WT        650
P4D3      3830
P6C7      2750
P1A6      3450
P5F2      1890
P7C3      1660


3.1.9. – Characterization using the suc-AAPF-pNA substrate.

After purification of the selected variants, kinetic characterization was performed using the synthetic tetrapeptide suc-AAPF-pNA as described in Section 2.2.8. The calculated kinetic parameters for WT and five selected variants are summarized in Figure 2.17. Km values increased for all variants at 15°C, and also at 37°C with the exception of P7C3. The turnover number (kcat) is higher for all variants at 15°C, in accordance with the screening results. In addition, the kcat values increased when determined at 37°C, again with the exception of P7C3. Variants P1A6 and P4D3 showed the largest increases in kcat at both 15°C and 37°C, reaching improvements of 56 and 54% at 15°C, respectively.

Figure 2.17. - Kinetic parameters of the selected variants from the random mutagenesis libraries of BgAP using suc-AAPF-pNA as substrate at 15°C and 37°C.
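As an illustration of how Km and kcat can be estimated from initial rates, the sketch below fits the Michaelis-Menten equation through the Hanes-Woolf linearization (S/v = S/Vmax + Km/Vmax). This is an assumption made for illustration: the fitting procedure actually used in this work is the one described in Section 2.2.8. The substrate concentrations, rates and enzyme concentration are synthetic values, not measured data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def michaelis_menten_fit(substrate, rates, enzyme_conc):
    """Estimate (Km, kcat) from a Hanes-Woolf plot of S/v against S.

    slope = 1/Vmax and intercept = Km/Vmax, so Vmax = 1/slope and
    Km = intercept * Vmax; kcat = Vmax / [E].
    """
    s_over_v = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = linear_fit(substrate, s_over_v)
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax / enzyme_conc


def percent_change(variant_value, wild_type_value):
    """Relative change of a variant parameter with respect to the wild type, in %."""
    return 100.0 * (variant_value - wild_type_value) / wild_type_value


# Synthetic, noise-free example (hypothetical values): Km = 1.0 mM, Vmax = 1.0 µM/s.
substrate_mM = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
rates_uM_per_s = [1.0 * s / (1.0 + s) for s in substrate_mM]
km, kcat = michaelis_menten_fit(substrate_mM, rates_uM_per_s, enzyme_conc=0.01)
print(f"Km = {km:.3f} mM, kcat = {kcat:.1f} 1/s")
```

A linearized fit is used here only because it needs no external libraries; a non-linear least-squares fit of the untransformed rates weights the data points more evenly.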


Although the data from the characterization using suc-AAPF-pNA as substrate confirm the trend observed in the screening phase, the small tetrapeptide does not fully represent the behavior of the generated variants towards a "real life" substrate. Suc-AAPF-pNA is, however, the most widely used substrate to characterize proteolytic activity, and characterization data obtained with this substrate can be compared with similar studies.

3.1.10. – Characterization using the macromolecular substrate Azo-casein.

In order to compare the proteolytic activity of variants P1A6 and P4D3 with that of WT towards a macromolecular substrate, azo-casein was used for kinetic characterization. For this purpose, an azo-casein microtiter plate (MTP) assay was adapted from the common assay provided by Sigma. A dilution series was prepared, calculating the concentration from the molecular weight of the full protein (MW = 23.6 kDa). For each substrate concentration, an overdigestion assay was performed by adding an excess of commercial protease (protease from Bacillus sp., Sigma) and incubating the mix for 36 and 48 h at 37°C. After incubation, the protein content was precipitated and the absorbance at 440 nm was measured in an MTP reader. The relationship between fully digested azo-casein at each concentration and the absorbance at 440 nm is detailed in Figure 2.18.

[Plot: Overdigestion of azocasein, 48 h, 37°C. x-axis: azocasein concentration (µM), 0 to 500; y-axis: absorbance at 440 nm, 0 to 5. Linear fit: y = 0.0109x, R² = 0.9909]

Figure 2.18. - Relationship between fully digested azo-casein at different concentrations and the absorbance of 100 µl of supernatant of each concentration at 440 nm.
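The calibration line of Figure 2.18 (y = 0.0109x) can be used to convert a measured absorbance into a concentration of digested azo-casein. The sketch below assumes that the absorbance stays within the linear range of the calibration; the helper names and the example readings are hypothetical.

```python
CALIBRATION_SLOPE = 0.0109  # A440 per µM of fully digested azo-casein (fit in Figure 2.18)


def digested_azocasein_uM(a440, slope=CALIBRATION_SLOPE):
    """Convert an absorbance at 440 nm into µM of digested azo-casein."""
    if a440 < 0:
        raise ValueError("absorbance must be non-negative")
    return a440 / slope


def slope_through_origin(concentrations, absorbances):
    """Least-squares slope of a line forced through the origin (y = m*x)."""
    numerator = sum(c * a for c, a in zip(concentrations, absorbances))
    denominator = sum(c * c for c in concentrations)
    return numerator / denominator


# Hypothetical reading: an A440 of 1.09 corresponds to about 100 µM digested substrate.
print(round(digested_azocasein_uM(1.09), 1))
```

Dividing the converted concentration by the incubation time then gives a rate in µM per unit time, which is the quantity needed for the kinetic analysis.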

Using the relationship between absorbance at 440 nm and concentration of digested azo-casein, the same experiment was performed to determine the kinetic values of WT BgAP and the variants P1A6 and P4D3. The calculated kinetic parameters are summarized in Figure 2.19. As observed with the small tetrapeptide substrate, the Km values of variants P1A6 and P4D3 are higher than that of the WT. The kcat values of variants P1A6 and P4D3 at 15°C again increased, by 40 and 59% respectively, whereas the increase in kcat at 37°C compared to WT is even higher, reaching 67 and 79%. Using azo-casein made it possible to compare the activity of WT BgAP with that of variants P1A6 and P4D3 on a macromolecular substrate, and the results confirmed the values obtained when suc-AAPF-pNA was used for characterization. As this assay was developed as a comparative tool, the values obtained do not necessarily represent the real kinetic parameters; however, they can be used to compare the BgAP variants.

Figure 2.19. - Kinetic parameters of the selected variants from the random mutagenesis libraries of BgAP using Azo-casein as substrate.

3.2. - Directed Evolution of BgAP for improved thermostability.

From the previous experiments it was confirmed that the variants selected from Directed Evolution of BgAP had a higher specific activity than WT with both the small tetrapeptide and the macromolecular substrate. Once the improved activity was confirmed, the next step was to assess the thermal stability of the variants. The screening strategy had focused on finding variants with higher activity at low temperatures; thermal stability was not a screening criterion. The stability of BgAP and the selected variants was tested by incubating the cell culture supernatants at 58°C for 30 min (Figure 2.20). The activity of the treated supernatants was measured and compared with the activity of untreated supernatant, and the residual activity was calculated as a percentage of the original activity.

Figure 2.20. - Comparison of the activity and thermal stability of the selected variants with WT BgAP. (A) The proteolytic activity, calculated as mAU/min/µl of supernatant, was higher than WT for all selected variants. (B) When cell culture supernatants were incubated at 58°C for 30 minutes, the calculated residual activity revealed a decrease in the stability of the improved variants.

Figure 2.20 shows the activity (in mAU/min/µl supernatant) for each of the studied variants. As described before, all variants showed higher activity than WT BgAP. When the residual activity of each variant was calculated after incubation at 58°C for 30 min, the decrease in thermal stability with respect to WT was evident for most variants, suggesting that the introduced amino acid substitutions destabilized the protein structure as a side effect. Cell culture supernatants from each clone in the library were incubated at 58°C for 30 min. The proteolytic activity of these treated clones was then measured and compared with the activity of the untreated supernatant. Since subtilisin-like proteases normally cannot refold once they are denatured, the ratio between the activity of the treated and the untreated sample represents the fraction of enzyme that did not unfold during incubation at 58°C.

The thermal stability tests were repeated using the purified variants. The half-life of the purified variants at 60°C was calculated in order to obtain a more accurate assessment of their stability (Figure 2.21). Purified protein was incubated at 60°C for different periods of time in the presence of 250 µg/ml of Bovine Serum Albumin (BSA) to reduce self-digestion. The residual activity was calculated for each incubation time and plotted to determine the incubation time that corresponds to 50% of the activity of the untreated sample. Interestingly, variant P4D3 had a longer half-life than the WT (approximately 5 min vs ca. 2 min), whereas all other variants had a similar or shorter half-life.

To determine whether the observed loss of activity was due to self-digestion or protein unfolding, a Thermofluor unfolding assay was performed for all variants to follow the unfolding of each protein with increasing temperature (20 to 90°C) in an excess of protease inhibitor (PMSF). The results showed a similar trend to the half-life experiment: variant P4D3 had an increased unfolding temperature, with an apparent Tm value of 54.8°C versus 53.1°C for the WT (Figure 2.22). The results also suggested that the loss of stability measured in the residual activity and half-life experiments is related to the unfolding of the protein and not directly to self-digestion.

Figure 2.21. - Half-life determination at 60°C of WT BgAP and the selected variants from Directed Evolution. Purified protein was incubated at 60°C for different amounts of time in the presence of 250 µg/ml of Bovine Serum Albumin (BSA) to reduce self-digestion. The residual activity was calculated for each incubation time and plotted to determine the incubation time that corresponds to 50% of the activity of the untreated sample. The experiment was performed in 100 mM Tris/HCl buffer pH 8.6, 10 mM CaCl2.
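The half-life read-out described above can be sketched as follows: the residual activity is the activity of the treated sample as a percentage of the untreated sample, and the half-life is the incubation time at which the residual activity crosses 50%. Linear interpolation between the two neighbouring time points is an assumption made here for illustration, and the time course in the example is hypothetical.

```python
def residual_activity(treated, untreated):
    """Activity of the heat-treated sample as a percentage of the untreated sample."""
    return 100.0 * treated / untreated


def half_life(times_min, residual_pct, threshold=50.0):
    """Incubation time at which residual activity falls to `threshold` percent.

    The crossing point is linearly interpolated between the two measurements
    that bracket the threshold (an assumption of this sketch).
    """
    points = list(zip(times_min, residual_pct))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 >= threshold >= a1 and a0 != a1:
            return t0 + (a0 - threshold) * (t1 - t0) / (a0 - a1)
    raise ValueError("residual activity never crosses the threshold")


# Hypothetical time course (minutes, % residual activity), not measured data.
times = [0, 2, 4, 8]
activity = [100.0, 60.0, 40.0, 15.0]
print(half_life(times, activity))
```

If the residual activity never falls to 50% within the measured time course, the function raises an error rather than extrapolating.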


[Plot: relative fluorescence, ((Fobs-Fmin)/(Fmax-Fmin))*100, from 0 to 110, versus temperature (°C), from 45 to 68, for WT1, P1A6, P4D3, P7C3 and P5F2]

Figure 2.22. – Thermofluor unfolding assay of BgAP and selected variants from Directed Evolution. The unfolding of each protease variant was followed using Sypro orange dye. The detected fluorescence of the sample increases when the dye binds to the exposed hydrophobic residues of the protein upon unfolding. Apparent Tm values were calculated for each variant and compared with WT BgAP. The experiment was performed in 40 mM HEPES buffer pH 7.8, 10 mM CaCl2, 1 mM PMSF.
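The normalization shown on the axis of Figure 2.22, ((Fobs-Fmin)/(Fmax-Fmin))*100, can be written out as below. Taking the apparent Tm as the temperature at which the normalized fluorescence first reaches 50% is an assumption of this sketch, since the exact Tm calculation is not restated here; the melting curve in the example is hypothetical.

```python
def relative_fluorescence(f_obs):
    """Normalize raw fluorescence readings to 0-100% of the observed range."""
    f_min, f_max = min(f_obs), max(f_obs)
    return [100.0 * (f - f_min) / (f_max - f_min) for f in f_obs]


def apparent_tm(temperatures, f_obs):
    """Temperature at which the normalized fluorescence first reaches 50%.

    The midpoint is linearly interpolated between neighbouring readings
    (an assumption of this sketch, not necessarily the method of this work).
    """
    rel = relative_fluorescence(f_obs)
    for i in range(1, len(rel)):
        if rel[i - 1] < 50.0 <= rel[i]:
            fraction = (50.0 - rel[i - 1]) / (rel[i] - rel[i - 1])
            return temperatures[i - 1] + fraction * (temperatures[i] - temperatures[i - 1])
    raise ValueError("normalized fluorescence never reaches 50%")


# Hypothetical melting curve (°C, arbitrary fluorescence units).
temps = [50, 52, 54, 56, 58]
signal = [10.0, 20.0, 60.0, 100.0, 110.0]
print(apparent_tm(temps, signal))
```

An alternative convention is to take the temperature of the steepest fluorescence increase (the maximum of the first derivative); both give similar values for a symmetric transition.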

3.2.1. – Re-screening of SeSaM libraries

Because the selected variants showed decreased thermal stability, it was necessary to find amino acid substitutions that increase the thermal stability of the protein. This was important for two reasons:
1. - The stability of the protein is important when the enzyme is intended for use in detergents. The manufacture of the final formulation, storage, and the performance of the enzyme in the final product all suffer markedly when the protein has a lower stability.
2. - To be evolved further, the generated variants should be stable enough to withstand an increased mutational load. Variants with low stability are more likely to yield inactive variants, even if the introduced substitutions are theoretically beneficial for the screened property.
The active clones from the previously generated SeSaM libraries were screened for increased thermal stability. Additionally, site-saturation libraries were generated at amino acid positions identified from the selected variants. Site-saturation mutagenesis libraries of BgAP generated at positions Ile21, Ser39, Asp87, Met122, Asn177, Ala222, Thr247 and Asn253 were screened as described in section 2.2.5. Approximately 300 clones of each library were screened.


Variants showing residual proteolytic activity higher than WT BgAP are summarized in Table 2.12. Sequencing results revealed that four of the selected clones had the same amino acid substitutions as the previously described P4D3 variant. Variant 39IC1 had three new amino acid substitutions not previously described: Ser39Glu, Asn74Asp and Asp87Glu. Interestingly, two different amino acid substitutions were found at position 253: clone 253IE12 carried Asn253Asp and clone 253IIIB4 carried Asn253Pro.

Table 2.12. - Selected protease variants with improved thermal stability. Clones from SeSaM random mutagenesis libraries and site-saturation libraries were screened for improved thermal stability using the residual proteolytic activity after incubation for 30 min at 58°C as screening criteria. Note that the activity ratio 15°C/37°C is lower than in WT BgAP.

Position    21   39   74   87   122   253   Resid. Act.   Ratio Act. 15°/37°
WT BgAP     I    S    N    D    M     N     65%           50%
39IC1       I    E    D    E    M     N     108%          44%
87IIID12    V    S    N    D    L     D     79%           41%
247IIIF1    V    S    N    D    L     D     81%           42%
222IE11     V    S    N    D    L     D     82%           45%
253IID4     V    S    N    D    L     D     80%           40%
253IE12     I    S    N    D    M     D     89%           49%
253IIIB4    I    S    N    D    M     P     80%           44%

The increased thermal stability was further tested by measuring the residual activity after incubating the cell culture supernatants for 10 min at 60°C. Figure 2.23 shows that variant 39IC1 had a much higher residual activity than all other tested variants, followed by 253IE12 (Asn253Asp). Although clone 253IIIB4 (Asn253Pro) showed an increase in residual activity when tested at 58°C, the more stringent test at 60°C revealed that the increase in stability was not enough to differentiate it from the other variants.


[Bar chart: remaining activity (%) of WT, P4D3, P6C7, 253IE12, 253IIIB4 and 39IC1. Bar labels as extracted: 88.0, 49.1, 32.0, 32.3, 32.3, 32.7]

Figure 2.23. – Residual activity after 10 min at 60°C of WT BgAP and the selected variants from Directed Evolution. Cell culture supernatants were incubated at 60°C for 10 min. The residual activity was calculated and plotted relative to an untreated sample. The experiment was performed in 2% w/v skim milk, 100 mM Tris/HCl buffer pH 8.6.

According to the results, two new variants (253IE12 and 39IC1) were identified having a total of four amino acid substitutions (Ser39Glu, Asn74Asp, Asp87Glu and Asn253Asp) that were related to increased thermal stability. The next step was to combine these four amino acid substitutions to generate a further improved variant, providing a more stable scaffold where additional substitutions related to improved activity at lower temperatures could be introduced.

3.2.2. – Site-directed mutagenesis to generate variant 39IC1+N253D

In order to combine all of the identified amino acid substitutions related to improved thermal stability, site-directed mutagenesis (SDM) was performed on variant 39IC1 to introduce the amino acid substitution found in variant 253IE12 (Asn253Asp). The detailed procedure for SDM is described in section 2.2.6, and the oligonucleotide used to introduce the amino acid substitution is reported in Table 2.12. The DNA sequence of the SDM product was checked and the generated plasmid was transformed into B. subtilis WB600. The thermal stability of the newly generated variant 39IC1+N253D was compared to that of 39IC1 by measuring the residual proteolytic activity of the cell culture supernatant after 30 min incubation at 60°C, 65°C and 70°C (Figure 2.24). The residual activity test showed a higher thermal stability for the newly generated 39IC1+N253D, showing a cumulative effect of the amino acid substitutions related to thermal stability. The supernatant of the generated variant still had proteolytic activity after incubation for 30 min at 70°C, whereas all other previously generated variants completely lost their activity.


[Bar chart: residual activity (%) of 39IC1+N253D and 39IC1 after incubation at 60°C, 65°C and 70°C. Bar labels as extracted: 97.4, 80.4, 78.4, 31.0, 13.1, 0.0]

Figure 2.24. – Residual activity after 30 min at 60°C, 65°C and 70°C of 39IC1+N253D and 39IC1. Cell culture supernatants were incubated at 60°C, 65°C or 70°C for 30 min. The residual activity was calculated and plotted relative to an untreated sample. The experiment was performed in 2% w/v skim milk, 100 mM Tris/HCl buffer pH 8.6.

3.2.3. - Site-directed mutagenesis and combination of amino acid substitutions

After the generation of the more stable 39IC1+N253D, a combination of this variant and the amino acid substitutions from variants with higher activity at low temperatures was constructed using SDM. Two amino acid substitutions were chosen to be introduced into 39IC1+N253D. The first was Ile21Val, present in all selected variants from rounds 1, 2 and 3 of the SeSaM libraries, and the second was Met122Leu, present in the best performing clones P1A6, P4D3 and P6C7. The oligonucleotides used for SDM are listed in Table 2.4. The thermal stability of the resulting variants was tested by calculating the residual proteolytic activity of the cell culture supernatants after incubation at 65°C for 30, 60 and 120 min (Figure 2.25). As described before, variant 39IC1+N253D had the highest thermal stability of all generated variants. When Ile21Val was introduced, the residual activity decreased by 20% on average with respect to 39IC1+N253D, suggesting a destabilizing effect of the introduced substitution. When Met122Leu was introduced, the calculated residual activity was similar to that of 39IC1+N253D. When Ile21Val and Met122Leu were introduced simultaneously, the decrease in residual activity was similar to that observed when only Ile21Val was introduced, suggesting that combining Ile21Val and Met122Leu in the final variant causes no additional loss of thermal stability. The variant that included both amino acid substitutions was called 39IC1 MutIII.


[Bar chart: residual activity (%) of 39IC1, 39IC1+N253D, +I21V, +M122L and +I21V+M122L after 30, 60 and 120 min. Bar labels as extracted: 88.8, 84.8, 86.2, 80.8, 78.7, 75.8, 76.2, 74.7, 64.5, 65.5, 55.8, 51.0, 42.7, 36.4, 19.1]

Figure 2.25. – Residual activity after 30, 60 and 120 min incubation at 65°C of variants with combined amino acid substitutions. Cell culture supernatants were incubated for 30, 60 and 120 min at 65°C. The residual activity was calculated and plotted relative to an untreated sample for 39IC1, 39IC1+N253D, +I21V (39IC1+N253D+I21V), +M122L (39IC1+N253D+M122L) and +I21V+M122L (39IC1+N253D+I21V+M122L). The experiment was performed in 2% w/v skim milk, 100 mM Tris/HCl buffer pH 8.6.

3.2.4. - Purification and characterization of variants with improved thermostability

The generated variant 39IC1 MutIII was purified as described in section 2.2.7. The purification step was needed for accurate determination of the thermal stability and activity at both 15°C and 37°C. The resulting purified fractions are reported in Figure 2.26.


Figure 2.26. – Agilent analysis of purified Thermostable BgAP variants. The protein content of the purified samples of the selected variants was precipitated using TCA and loaded into the Agilent protein 230 chip. The red arrow corresponds to the 27 kDa size BgAP band.

Once the purified protease was obtained, a more detailed study of the thermal stability of the variants was performed. Figure 2.27 reports the half-life determination experiment at 60°C for all the relevant variants of this work. The calculated half-life values are detailed in Table 2.13. Compared with WT BgAP, variants selected for higher activity at lower temperatures (P1A6, P4D3, P6C7, P5F2, P7C3) showed similar or lower half-life values, whereas variants selected for thermostability (N253D and 39IC1) and combinations of them (39IC1+N253D and 39IC1 MutIII) showed an increase of up to approximately 200-fold in the half-life value.


Figure 2.27. – Half-life determination at 60°C of WT BgAP and the selected variants from Directed Evolution. Purified protein was incubated at 60°C for different periods of time in the presence of 250 µg/ml of Bovine Serum Albumin (BSA) to reduce self-digestion. The residual activity was calculated for each incubation time and plotted to determine the incubation time that corresponds to 50% of the activity of the untreated sample. The experiment was performed in 100 mM Tris/HCl buffer pH 8.6, 10 mM CaCl2. The half-life values of variants 39IC1+N253D and 39IC1 MutIII were extrapolated using a linear decrease approximation.

Table 2.13. – Half-life values at 60°C calculated for all relevant variants obtained by Directed Evolution of BgAP. Variants with an improved thermal stability are underlined.

Variant        Half-life at 60°C (min)
WT             2.0
P1A6           0.8
P4D3           4.0
P6C7           1.6
P5F2           0.5
P7C3           0.7
N253D          10
39IC1          85
39IC1+N253D    431
39IC1 MutIII   223
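The fold changes implied by Table 2.13 can be checked directly from the tabulated half-lives. The short sketch below divides each half-life by the WT value; the dictionary simply restates the table, and the function name is illustrative.

```python
# Half-life values at 60°C in minutes, as listed in Table 2.13.
HALF_LIFE_MIN = {
    "WT": 2.0, "P1A6": 0.8, "P4D3": 4.0, "P6C7": 1.6, "P5F2": 0.5, "P7C3": 0.7,
    "N253D": 10.0, "39IC1": 85.0, "39IC1+N253D": 431.0, "39IC1 MutIII": 223.0,
}


def fold_change(half_lives, reference="WT"):
    """Half-life of every variant relative to the reference variant."""
    ref = half_lives[reference]
    return {variant: value / ref for variant, value in half_lives.items()}


for variant, fold in fold_change(HALF_LIFE_MIN).items():
    print(f"{variant:14s} {fold:7.1f}-fold")
```

The largest ratio, for 39IC1+N253D, is 431/2.0 = 215.5, in line with the roughly 200-fold increase stated in the text.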

To confirm the residual activity results and to show the effect of the introduced mutations on the stability of the protein scaffold of the BgAP variants, the Thermofluor unfolding assay was repeated, including the variants showing increased thermal stability (Figure 2.28). The calculated Tm values of the thermostable variants were significantly higher than those of WT and the previously tested variants, confirming an increase in the temperature necessary to unfold the generated variants.


Figure 2.28. – Thermofluor unfolding assay of BgAP variants from Directed Evolution towards improved thermal stability. The unfolding of each protease variant was followed using Sypro orange dye. The detected fluorescence of the sample increases when the dye binds to the exposed hydrophobic residues of the protein upon unfolding. Apparent Tm values were calculated for each variant and compared with WT BgAP. The experiment was performed in 40 mM HEPES buffer pH 7.8, 10 mM CaCl2, 1 mM PMSF.

Kinetic parameters of the 39IC1 MutIII variant were determined using suc-AAPF-pNA as substrate, as described in section 2.2.8, and compared with WT BgAP (Figure 2.29). The Km value calculated for 39IC1 MutIII at 15°C was higher than that of WT, as observed in the previous results for other variants. The Km value at 37°C, however, was for the first time in this work lower than that of the WT. The kcat value at 15°C increased by 52%, whereas at 37°C the increase was 17%. This difference suggests that this variant has been specifically improved for activity at lower temperatures, in addition to the improved thermal stability. These results suggest that it is possible to combine amino acid substitutions that confer a higher thermal stability with mutations that are related to activity at lower temperatures, and to obtain a variant that combines both properties.


Figure 2.29. - Kinetic parameters of WT BgAP and 39IC1 MutIII using suc-AAPF-pNA as substrate at 15°C and 37°C.

Figure 2.30 summarizes the thermal dependence of the activity of all relevant variants of BgAP. Initial rates were measured from 10°C to 80°C using suc-AAPF-pNA as substrate at a minimal concentration of CaCl2 (<0.01 mM) to simulate chelating conditions. Variants P1A6 and P4D3 showed a higher activity than the WT in the range of 10°C to 50°C, but at temperatures higher than 50°C their specific activity decreased due to low stability. In the case of the thermostable variants 39IC1 and 39IC1+N253D, the specific activity at low temperatures is not higher than that of WT, but they show high activity at temperatures higher than 50°C. The final variant 39IC1 MutIII showed a higher specific activity than WT BgAP over the whole temperature range (10°C-70°C).


[Plot: activity (µM pNA/s), from 0.0 to 1.6, versus assay temperature (°C), from 0 to 90, for P1A6, P4D3, N253D, 39IC1, 39IC1N253D, MUTIII and WT]

Figure 2.30. – Dependence of the specific activity on temperature for relevant BgAP variants.


3.3. - Homology modeling of BgAP.

There is no crystal structure available for BgAP. A database alignment analysis (BLASTP, http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) showed that the closest protein with a known crystal structure is the mature form of B. lentus alkaline protease (PDB code 1ST3A), with 78% identity at the amino acid level. 1ST3A was used as a template to generate a homology model of BgAP with the MODELLER software (Figure 2.31). The Ramachandran plots for the crystal structure and the model showed very similar distributions, with more than 90% of the residues in the most favored regions, supporting the reliability of the model (Figure 2.32).

Figure 2.31. – Cartoon representation of the structural alignment of the crystal structure of B. lentus alkaline protease (blue) and the generated homology model of BgAP (red). The zoomed picture shows the catalytic triad (Asp32, His62, Ser215), which has the same configuration in both structures.


Figure 2.32. – Ramachandran plot of the crystal structure of B. lentus AP (left) and the model of BgAP (right).

4. - Discussion and conclusions

4.1.- Analysis of the mutations generated by SeSaM

As mentioned previously, sequencing analysis of clones from all three SeSaM libraries of BgAP revealed a library enriched towards transversion mutations. Naturally occurring mutations, as well as those found in epPCR random mutagenesis libraries, are strongly biased towards transition mutations [9]. Transversion mutations have a higher chance of resulting in an amino acid substitution than transitions, and those substitutions also have a higher probability of exchanging residues with different chemical properties [80]. The number of mutations of each type found when plasmid DNA from randomly chosen clones, as well as from clones with higher activity, was sequenced is given in Table 2.14. From a total of 88 mutations found, 50 (57%) were transitions and 38 (43%) were transversions. Interestingly, this ratio is the same in the randomly selected clones and in the clones selected for increased proteolytic activity. In addition, 14 transition mutations led to no amino acid substitution, whereas all of the transversions generated amino acid substitutions.


Table 2.14.- Analysis of the mutations found in clones from all three SeSaM libraries.

Mutation type    Total number    In random clones    In selected clones    Silent
Transitions      50              25                  25                    14
Transversions    38              19                  19                    0
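The percentages discussed for Table 2.14 follow directly from the counts. As a minimal sketch (the variable names and dictionary layout are illustrative and not part of the original analysis), the share of each mutation type and the fraction of mutations that change an amino acid can be recomputed as:

```python
# Counts taken from Table 2.14; the dictionary layout is only illustrative.
counts = {
    "transitions": {"total": 50, "random": 25, "selected": 25, "silent": 14},
    "transversions": {"total": 38, "random": 19, "selected": 19, "silent": 0},
}

total_mutations = sum(c["total"] for c in counts.values())

# Percentage of all mutations that belong to each type.
share = {k: 100 * c["total"] / total_mutations for k, c in counts.items()}

# Percentage of mutations of each type that change an amino acid.
non_silent = {
    k: 100 * (c["total"] - c["silent"]) / c["total"] for k, c in counts.items()
}

for kind in counts:
    print(f"{kind}: {share[kind]:.1f}% of all mutations, "
          f"{non_silent[kind]:.0f}% non-silent")
```

With these counts, transitions account for roughly 57% of the 88 mutations and 72% of them change an amino acid, against 100% of the transversions.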

The distribution of the mutations along the protease gene is shown in Figure 2.33. The randomization was performed in the section of the protease gene that encodes the mature protease, which starts at nucleotide 339 of the full gene. When each mutation found in the sequenced clones was plotted along the length of the gene, the observed distribution was homogeneous in the region subjected to random mutagenesis. No obvious pattern emerged when transition and transversion mutations were analyzed separately either. The fact that mutations are distributed across the gene and not limited to a few “hot-spots” supports the assumption that a library with mutations across the whole length of the randomized section of the gene was screened.

Figure 2.33.- Distribution of the mutations found across the alkaline protease gene. Each identified mutation was plotted according to its position in the protease gene. Transition mutations are plotted in red and transversions in blue. The green arrow represents the forward primer used to generate the SeSaM library. The distribution is homogeneous across the length of the gene for both types of mutations.

Another feature of the SeSaM library generation method that can be observed in the sequencing results is the recombination of the mutations present in the parents of each Directed Evolution round. Figure 2.34 shows how the substitutions are carried over the generations in the Directed Evolution of BgAP. The variants selected at the end of each round carry a combination of the substitutions present in the parents, plus new amino acid substitutions introduced by the random mutagenesis method. This recombination allows the effect of each group of amino acid substitutions to be assessed and further increases the diversity of the library.


Figure 2.34. - Summary of the amino acid substitutions found in selected clones along each Directed Evolution round.

4.2. - Directed Evolution of BgAP

4.2.1. – Cold adaptation of BgAP

As reported before [79], BgAP performs better in washing applications than other subtilisins; however, this performance decreases greatly at low temperatures, a common feature of mesophilic subtilisins [35, 38, 41]. Subtilisin proteases are among the most studied protein families, yet no general rules regarding temperature adaptation and thermal stability are widely accepted. There are reports on improving the proteolytic activity of subtilisins at low temperatures by screening epPCR libraries [46-48, 91] and by rational design [45]; however, both methods generated variants with decreased thermal stability. In this study, a similar scenario was found when the thermal stability of the variants selected after three rounds of Directed Evolution towards improved activity at low temperatures was tested. Variants P1A6, P5F2, P6C7 and P7C3 showed similar or lower residual activity than WT BgAP after incubation at 58°C for 30 minutes. Only variant P4D3 showed a slightly improved thermal stability, which was confirmed by analysis of the purified protease variants through half-life determination at 60°C and the Thermofluor thermal shift assay. The kinetic characterization confirmed that the amino acid substitutions in the selected variants had a positive effect on the activity at low temperatures. Positions found to be relevant for cold adaptation in subtilisin BPN’ [46, 47, 91], subtilisin SSII [48] and subtilisin-Tk [92] are listed in Table 2.15, but no structural relationship has been proposed between them. Subtilisins have a highly conserved structural scaffold; regardless of amino acid sequence homology, a structural alignment can be performed between several subtilisin structures to obtain the equivalent positions in BgAP. Figure 2.35 shows the structural positions of the reported amino acid substitutions for each subtilisin, compared with the substitutions reported in this work for BgAP.

Table 2.15. - Previously reported amino acid substitutions in subtilisins related with cold adaptation.

Enzyme             Amino acid substitution                       Reference
Subtilisin BPN’    Val84Ile                                      [46]
                   Ala88Val, Ala98Thr                            [47]
                   Gly131Phe                                     [91]
Subtilisin SSII    Lys11Arg, Asp98Asn, Ser110Phe, Thr253Ala      [48]
Subtilisin-Tk      Ser107Thr, Thr135Ser; Trp131Cys, Asp379Gly    [92]


Figure 2.35. - Structural alignment showing amino acid positions reported to have an effect on low temperature adaptation in Subtilisin BPN’ (blue), Subtilisin SSII (green) and Subtilisin-Tk (yellow) compared with amino acid substitutions found in BgAP variants in this work.

Several amino acid positions described in this work are in proximity to the structural equivalents of the previously described substitutions; for example, position Asp87 in BgAP is the structural equivalent of Asp98 in subtilisin SSII and is next to Ala88 described in subtilisin BPN’. Interestingly, position Ser107 of Subtilisin-Tk is within 4 Å of residue 87 (Figure 2.36, A). This strongly suggests that this area of the protein should be targeted when cold adaptation is to be achieved in any subtilisin. The amino acid substitutions, however, are chemically different in each case: in BgAP the substitution is from a negative to a negative residue, only increasing the chain length, whereas at the same structural position in SSII the substitution is from a negative to a polar residue. The substitution at position 88 in BPN’ is from a non-charged to a non-charged residue, with a polar residue (Ser) at position 89 instead of the Asp found in both BgAP and SSII. Thus, no obvious mechanism of adaptation can be inferred other than the destabilization of that protein region by the change in the local residue charges. Position 253 in BgAP is rather interesting: the substitution Asn253Asp is present in variant P4D3, the only variant with improved activity at 15°C that had a higher thermal stability than WT BgAP, as later confirmed by the construction of the single substitution variant N253D. A reported cold adaptation substitution in Subtilisin-Tk lies in the same loop, at position 379, which is exchanged from Asp to Gly. Unfortunately, the possible effect of this substitution was not discussed further in the original publication. The possible stabilization of the 251-255 turn is suggested by the comparison of the energy minimized models of WT BgAP and the N253D variant (Figure 2.36, B). Compared with the configuration of the turn in WT BgAP (orange), a new hydrogen bond is formed in variant N253D. This introduced interaction could explain the increased thermal stability.

Figure 2.36. - (A) Structural alignment of amino acid positions reported to be related with cold adaptation in Subtilisin BPN’ (blue), SSII (green) and Tk (yellow) in the neighborhood of position 87 in BgAP described in this work. (B) Change in configuration of the 251-255 turn between the energy minimized models of WT BgAP (orange) and the variant N253D (colored).

None of the amino acid substitutions identified in this work is close to the catalytic triad (Asp32, His62, Ser215) or to the unspecific binding pockets (loops 99-105 and 115-120), though the destabilizing substitution Met122Leu could be close enough to increase the flexibility of the binding loop, facilitating catalysis at 15°C.

4.2.2. – Increasing thermal stability of BgAP

The thermal stability of BgAP was improved by Directed Evolution, and variant 39IC1 MutIII was obtained, whose half-life at 60°C increased more than 100-fold under the tested conditions. This step in the evolution of BgAP was included because most of the amino acid substitutions found in the selected variants with increased activity at low temperatures destabilized the protein scaffold. When performing a laboratory evolution project, it is recommended to recover the stability of the studied protein [10, 93]. This allows the protein to remain stable and endure the mutational load in future laboratory evolution rounds, and it increases the probability of exploring protein configurations that would not be possible in a less stable scaffold.

Substitutions Ile21Val and Met122Leu are present in most of the clones selected for improved activity at 15°C, but at the same time they are detrimental to thermal stability. Since both are located in the core of the protein and replace linear non-charged residues with shorter, more globular non-charged residues, they may destabilize the core of the protease, increasing the overall flexibility and favoring catalysis at low temperatures. On the other hand, four amino acid substitutions were identified that increase the thermal stability of BgAP (Ser39Glu, Asn74Asp, Asp87Glu and Asn253Asp); all of them replace polar or negative residues with negative residues, and all are located at the surface of the protein (Figure 2.37). The most stable variant (39IC1+N253D) was generated by combining all four substitutions. Ionic interaction analysis revealed that a new salt bridge was generated at position 87 through the interaction between Glu87 and Arg44. The stability of the generated variants was studied theoretically by calculating the free energy in vacuo of the minimized models using the FoldX program (Table 2.17).

Table 2.17. - FoldX energy calculations for generated variants of BgAP.

Variant         Amino acid substitutions                                        E (kcal/mol) at 25°C    E (kcal/mol) at 60°C
P1A6            Gln12Leu, Ile21Val, Met122Leu, Ala222Ser                        37.02                   102.89
P4D3            Ile21Val, Met122Leu, Asn253Asp                                  35.72                   100.94
WT BgAP         -                                                               35.04                   100.85
39IC1           Ser39Glu, Asn74Asp, Asp87Glu                                    33.71                   100.18
39IC1 MutIII    Ile21Val, Ser39Glu, Asn74Asp, Asp87Glu, Met122Leu, Asn253Asp    33.02                   99.09
N253D           Asn253Asp                                                       32.96                   98.72
39IC1+N253D     Ser39Glu, Asn74Asp, Asp87Glu, Asn253Asp                         32.26                   98.37

The order of the variants from the least to the most stable calculated by the software mostly coincides with the experimental results obtained from the thermal unfolding and half-life experiments, with the exception of P4D3, which is ranked as less stable than observed, and N253D, which is ranked as more stable than observed. This difference is probably due to the fact that the software considers only free energy as a parameter for stability, whereas other factors, such as the dynamic properties of the protein scaffold and the interaction with the solvent, can also affect the stability of the protein.
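The ordering discussed above can be reproduced from the values in Table 2.17. The following is a minimal sketch, assuming that a lower calculated energy corresponds to a more stable variant; the function and variable names are illustrative and are not part of FoldX.

```python
# FoldX energies (kcal/mol) copied from Table 2.17: (at 25 °C, at 60 °C).
foldx_energy = {
    "P1A6": (37.02, 102.89),
    "P4D3": (35.72, 100.94),
    "WT BgAP": (35.04, 100.85),
    "39IC1": (33.71, 100.18),
    "39IC1 MutIII": (33.02, 99.09),
    "N253D": (32.96, 98.72),
    "39IC1+N253D": (32.26, 98.37),
}


def rank_by_energy(column):
    """Order variants from highest to lowest energy.

    Under the assumption stated above, this runs from the least stable
    to the most stable variant.
    """
    return sorted(foldx_energy, key=lambda v: foldx_energy[v][column], reverse=True)


ranking_25 = rank_by_energy(0)
ranking_60 = rank_by_energy(1)

# Energy difference of each variant relative to the wild type at 25 °C.
wt_25 = foldx_energy["WT BgAP"][0]
delta_25 = {v: round(e[0] - wt_25, 2) for v, e in foldx_energy.items()}

print(ranking_25)
print(delta_25)
```

Both temperature columns give the same ordering, with P1A6 and P4D3 above the wild type and the variants carrying the surface substitutions below it.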

Figure 2.37. – Summary of the previously reported amino acid substitutions related with thermal stability in Subtilisin E (green), BPN’ (blue) and Subtilisin S88 (yellow) compared with those found in this work for BgAP (red).

Directed Evolution to improve the thermostability of subtilisins has been reported previously [74, 76, 94-97]. Most substitutions described in these works introduced charges on the surface of the protein, and few of them were located in secondary structure motifs (Figure 2.37). As in the previous analysis, when a structural alignment was made between all studied subtilisins, several of these substitutions were found to be located within the same protein regions. The possible effect of the substitution Asn253Asp has already been addressed in section 4.2.1. One of the most remarkable examples is the region close to the Ca-1 calcium binding site. Substitutions at positions 74 in BgAP and 76 in Subtilisin E are related to improved stability and are located in the Ca-1 binding loop. In Subtilisin E, and possibly in BgAP, the introduction of a negatively charged residue would stabilize the calcium binding site [74]. Interestingly, subtilisin S88 [76] is an engineered variant with improved stability under chelating conditions. The strategy used to achieve this was completely different and consisted of the complete removal of the calcium binding loop (Figure 2.38, in yellow), showing that this loop is not necessary for subtilisin to be fully functional.


Figure 2.38. – Detailed view of the Ca-1 calcium binding loop in Subtilisin E (green), BgAP (red) and Subtilisin S88 (yellow). The amino acid substitutions Asn76Asp and Asn74Asp in Subtilisin E and BgAP, respectively, are proposed to stabilize the coordination of the calcium ion [74]. The complete binding loop was removed in Subtilisin S88 (yellow), increasing the stability of the protease at chelating conditions [76].

There are many theories that explain the structural basis of thermal stability. Loop regions are usually regarded as highly flexible points of the protein structure; the excessive fluctuations in longer loops are considered potential initiation points for thermal denaturation, and their stabilization, reduction in size or elimination (when activity is not affected) may contribute to protein stability [98]. Genomic comparisons show that proteins from thermophilic organisms tend to have a higher percentage of the charged amino acids (aspartic acid, glutamic acid, lysine and arginine) than their mesophilic homologues [99], increasing the occurrence of ionic interactions. This is consistent with the substitutions identified in this work for BgAP: Asn253Asp introduced an additional hydrogen bond and Asp87Glu introduced a salt bridge with Arg44. The stabilizing substitutions did not affect the efficiency of the proteolytic activity (data not shown); thus, their effect is solely on the stability of the variants. Although increasing protein stability by stabilizing loops and decreasing overall flexibility, especially in loop regions, is a widely accepted strategy, a recently proposed mechanism of stabilization relying on protein dynamics and flexibility distribution is discussed in Part III of this work.

4.3. - Conclusions of Part II

In summary, an expression system for BgAP in B. subtilis WB600 was developed for microtiter plates, with a standard deviation in cell growth and protease expression suitable for screening proteolytic activity, and likely also for other enzymes compatible with the Bacillus secretion system. The screening and characterization assays were adapted from the classical methods for quantifying proteolytic activity (skim milk and suc-AAPF-pNA). The library generation method (SeSaM) provided a library with a homogeneous distribution of mutations, both along the gene length and in the proportion of transition and transversion mutations. Two sets of amino acid substitutions were identified for BgAP. Substitutions related to improved activity at low temperatures exchanged non-charged for non-charged residues (Ile to Val, Met to Leu, Val to Leu), and most of them were located in the core of the protein. In contrast to previous reports, no substitution was found close to the active site or the unspecific binding pockets of BgAP. On the other hand, the substitutions found to be related to thermal stability were limited to the surface of the protein, and all of them introduced negatively charged residues. The importance of the Ca-1 calcium binding site was confirmed by the identification of a substitution in the Ca-1 binding loop, consistent with previous findings in other subtilisins. The generation of variant 39IC1 MutIII strongly suggests that cold adaptation and thermal stability are independent features that can be improved simultaneously. That is, adaptation towards one feature does not necessarily result in losing the other when the evolution strategy involves parallel screening and recombination of beneficial substitutions. Further investigation of the effect of each type of amino acid substitution, and the identification of further improved variants using the proposed strategy, will provide valuable information regarding protein adaptation and protein engineering.


Part III: Temperature Effects on Dynamics of Psychrophilic Protease S41 and its Thermostable Mutants in Solution.

“Certainly no subject or field is making more progress on so many fronts at the present moment than biology, and if we were to name the most powerful assumption of all, which leads one on and on in an attempt to understand life, it is that all things are made of atoms, and that everything that living things do can be understood in terms of the jiggling and wiggling of atoms” - Richard P. Feynman, The Feynman Lectures on Physics, 1964

1. – Introduction

1.1. - Aim of this work

The objective of the work described in this Chapter was to propose, based on the results of Molecular Dynamics (MD) simulations, a mechanism of molecular adaptation of Subtilisin S41 towards increased thermal stability, as described for two variants from the literature previously generated by Directed Evolution. The analysis of the MD simulations focused on the following question: What are the structural and dynamic differences between Subtilisin S41 and its thermostable variants, and how do they relate to the differences previously described between psychrophilic and thermophilic enzymes?

1.2. – Molecular dynamics simulations

In 1977, the first theoretical study of the behavior of a protein (the bovine pancreatic trypsin inhibitor, BPTI) in vacuum on the picosecond timescale was reported [100]. The results of this work dramatically changed the perception of proteins from relatively rigid structures to more dynamic systems, in which internal motions play a functional role [101]. The approach used was of the molecular dynamics type, in which the classical equations of motion for all the atoms of an assembly are solved simultaneously for a suitable time period and detailed information is extracted by analyzing the resulting atomic trajectories [100]; this approach later became known as Molecular Dynamics simulation. Following this work, a wide range of motional phenomena were investigated by MD simulations of proteins and nucleic acids, giving rise to a new field of biology whose contribution to the understanding of the molecular mechanisms of proteins has grown exponentially, together with the computational power used to perform these studies. Nowadays, MD simulation is a well-established method for modeling the dynamic nature of proteins and their conformational changes, which typically take place on timescales ranging from picoseconds to microseconds. Alongside MD, other approaches to molecular simulation have been developed, such as Normal Mode Analysis (NMA), in which the simple harmonic motions of the molecule about a local energy minimum are calculated, and Brownian Dynamics (BD), a variation of MD in which the use of approximations makes long-timescale calculations possible [102]. There are three types of applications of simulation methods in the macromolecular area [101]. The first uses simulations as a means of sampling configuration space by MD, often with simulated annealing protocols, to determine or refine structures from experimental data. The second uses simulations to describe a system at equilibrium, including structural and motional properties and the values of thermodynamic parameters. The third uses simulations to examine the actual dynamics, where the motions and their development with time are of main interest. The three sets of applications place increasing demands on the simulation methods and on their required accuracy and precision. Recently, studies based on MD and the associated simulation methods have been elucidating the role of molecular motions in binding, enzymatic activity, signaling mechanisms and macromolecular assemblies in biological systems [101-104]. The development of highly capable programs such as CHARMM [105], AMBER [106] and GROMOS [107], together with the increase in computational power, now makes it possible to perform simulations 5000 times longer than the original BPTI simulation on much larger systems, in the range of 10⁴ to 10⁶ atoms instead of 500. Given the continuing improvements and wide availability, MD is becoming an important tool not only to generate data for theoretical studies, but also for the analysis and interpretation of experimental results, showing that the interaction between experiments and simulations is quickly becoming an integral part of molecular biology.

1.3. – Extremophiles and protein adaptation

Extremophile organisms synthesize proteins tailored to resist the harshest living conditions on Earth, from the icy polar seas to the scorching volcanic vents in the deep oceans [108, 109]. These examples of adaptation are of great interest for academic research but also for technological applications [110-112]. Among these proteins, psychrophilic enzymes are important in the quest for energetically sustainable conditions, as replacements in biotechnological processes operated by efficient, but energy demanding, mesophilic or thermophilic variants [63, 113, 114]. Unfortunately, low thermostability is a common drawback of cold-adapted enzymes and is the main reason that prevents them from being commonly used in industrial applications. The differences in stability among extremophile proteins have been widely investigated [114, 115]. In all these studies, the lack of a unanimous consensus on the mechanisms responsible for the different stabilities in extremophiles suggests that the problem is multifaceted and can be addressed from different perspectives. Structural genomics is providing an increasing amount of information on extremophile proteins that definitely helps to correlate their peculiar functional characteristics with the details of their structure [109]. These studies indicate that the common strategy to improve protein thermostability consists of reinforcing the overall structural rigidity of the protein by increasing the number of disulfide bonds and intra-molecular salt bridges and by trimming the length of loop regions [116-118]. As a complement to these structural studies, in the last few years a common opinion has emerged concerning the connection between stability and protein dynamics. In particular, it was proposed that protein fluctuations can provide a mechanism of thermal stability in addition to structural reinforcement [38, 43]. This hypothesis is now starting to be strongly supported by recent progress in experimental techniques based on neutron diffraction methods [119] and Nuclear Magnetic Resonance (NMR) [120, 121], and in theoretical studies based on computer simulations [122, 123] of protein dynamics in solution. In particular, NMR has expanded its armamentarium of techniques, providing a copious amount of new information on the dynamics of proteins at different time scales [121, 124]. These studies highlighted the importance of protein dynamics on the picosecond and nanosecond time scales for the stability and activity of extremophile enzymes [120, 125]. Among computational techniques, MD simulations can provide detailed structural information on protein stability and dynamics to support and complement experimental data [126-128]. Several MD studies have been performed to study and compare the thermal behavior of extremophile enzymes [129-137]. Some of these studies were conducted on short time scales (<5 ns), and this might be the reason for their contradictory results. In fact, some of them report that thermostable enzymes have a larger overall flexibility than their mesophilic homologues or cold-adapted counterparts [129-131], while others report that the differences in flexibility profiles are not as remarkable as often supposed [134, 135].

1.4. – The Studied system

In this work, protein adaptation was studied by exploring the possible role of selective protein fluctuations in the control of protein stability, using long time scale MD simulations (50 ns) at different temperatures. The model protein was Subtilisin S41, together with two mutants with enhanced thermal stability [50]. Subtilisin S41 (Figure 3.1a) is a cold-adapted protease isolated from the Antarctic Bacillus TA41 [40] and, like most psychrophilic enzymes, is thought to have evolved towards a high conformational flexibility, which is responsible for an increased catalytic efficiency at low temperatures and is associated with a low stability. The subtilisin S41 gene encodes a 419 amino acid pro-enzyme, but only 309 residues form part of the mature protein. Subtilisin S41 shares most of its properties with mesophilic subtilisins (structure of the precursor, 52% amino acid sequence identity, alkaline pH optimum, broad specificity, Ca²⁺ binding) but is characterized by extended loops, a higher specific activity on macromolecular substrates, a shift of the activity optimum toward low temperatures and a low thermal stability [40]. Successful attempts to generate thermostable variants of subtilisin S41 by Directed Evolution have been reported by Miyazaki and Arnold [49, 78]. The mutant 1-14A7 (named MUT2 in this work) is a double mutant carrying the substitutions Lys211Pro and Arg212Ala (Figure 3.1b), with a half-life at 333 K about 10 times longer (84 min) than that of the wild type (WT) S41 (8 min). The mutant 3-2G7 (named MUT7 in this work) was obtained by further improvement of MUT2 through five additional substitutions: Ser145Ile, Ser175Thr, Lys221Glu, Asn291Ile and Ser295Thr (Figure 3.1b). The half-life of this seven-substitution variant at 333 K was 500 times that of WT S41. This example of protein adaptation offers a unique opportunity to study a case in which only the stability of a psychrophilic enzyme was altered, without the decrease in activity at low temperatures inherent to thermophilic enzymes according to the corresponding states hypothesis [138]. Despite these successful results, the lack of a crystal structure for S41 at the time of publication prevented further analysis of the molecular basis of the improved thermal stability of this enzyme. The recent determination of the crystal structure of Subtilisin S41 [139] made it possible to perform a structural and dynamics study of the mechanisms of thermal adaptation of this protein.

Figure 3.1. – Crystal structure of Subtilisin S41. Cartoon representations of the crystal structure of Subtilisin S41 indicating (a) the active site residues (Asp34, His71 and Ser249), the ion binding sites (Ca-1, Ca-2, Ca-3, Ca-4 and Na-5) and the extended loops, and (b) the position of the amino acid substitutions in MUT2 (K211P, R212A, orange) and the additional substitutions leading to MUT7 (S145I, S175T, K221E, N291I, S295T, green). The active site residues (D34, H71, and S249) are represented in red and the structural ions as dark spheres.


2. - Materials and methods

2.1. - Starting structures

The starting structure for the S41 simulations was the X-ray structure of the psychrophilic S41 subtilisin-like protease (PDB code: 2GKO), solved at 1.4 Å resolution [139]. The four structural calcium ions (named Ca-1, Ca-2, Ca-3, and Ca-4), the sodium ion (named Na-5) and the water molecules present in the crystal structure were retained for the simulations, while the benzyl sulfinic acid molecules were removed. The mutants MUT2 and MUT7 derived from S41 were generated using the program Swiss PDB Viewer [140] by selecting the most probable rotamers. The OPLS force field [141] was used for all simulations. TIP4P was adopted as the water model [142].

2.2. - MD simulations

Molecular dynamics simulations and the analysis of trajectories were performed using the GROMACS 4.0 software package [143]. Each protein molecule was centered in a truncated octahedral periodic box. The size of the box was set so that the minimal distance between the protein atoms and any side of the box was not less than 0.75 nm. The proteins were solvated by stacking equilibrated boxes of water molecules to completely fill the simulation box. All solvent molecules with any atom within 0.15 nm of the solute atoms were removed. The resulting total charge of the box was -2 for S41, -4 for MUT2 and -6 for MUT7. Sodium counter ions were added by replacing the water molecules at the most negative electrostatic potential to balance the total charge of the boxes. The resulting systems and the conditions used for the simulations are reported in Table 3.1.

Table 3.1. – Summary of the simulations.

                     S41      MUT2     MUT7
Box volume (nm³)     320      320      247
Number of atoms      41790    41326    31360
Solvent molecules    9373     9262     6767
Counter ions         2        4        6
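The box charges and counter ion numbers of the two mutants can be checked against the substitutions listed in section 1.4. The sketch below assumes simple formal side-chain charges (Lys and Arg +1, Asp and Glu -1, every other residue neutral) and takes the reported S41 box charge of -2 as the starting point; the helper name `charge_shift` is illustrative and not part of any simulation package.

```python
# Formal side-chain charges assumed for this sketch; all other residues,
# including His, are treated as neutral.
SIDE_CHAIN_CHARGE = {"Lys": 1, "Arg": 1, "Asp": -1, "Glu": -1}


def charge_shift(substitutions):
    """Sum the change in formal charge for substitutions written like 'Lys211Pro'."""
    total = 0
    for sub in substitutions:
        old, new = sub[:3], sub[-3:]
        total += SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(old, 0)
    return total


WT_BOX_CHARGE = -2  # total charge reported for the S41 box

MUT2 = ["Lys211Pro", "Arg212Ala"]
MUT7 = MUT2 + ["Ser145Ile", "Ser175Thr", "Lys221Glu", "Asn291Ile", "Ser295Thr"]

mut2_charge = WT_BOX_CHARGE + charge_shift(MUT2)
mut7_charge = WT_BOX_CHARGE + charge_shift(MUT7)

# One sodium ion is needed per unit of negative charge.
counter_ions = {"S41": -WT_BOX_CHARGE, "MUT2": -mut2_charge, "MUT7": -mut7_charge}
print(counter_ions)
```

Under these assumptions, removing Lys211 and Arg212 lowers the charge by two units and the Lys221Glu exchange by two more, which agrees with the 2, 4 and 6 counter ions listed in Table 3.1.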

The system energy was minimized using the steepest descent method for at least 500 steps to remove clashes among the generated water molecules. The three proteins were studied at the two reference temperatures of 283 K and 363 K. The large temperature gap was chosen to enhance the temperature effect on the protein dynamics of the systems observed in the timescale of the simulations. All simulations were performed by keeping the temperature at the reference value by weak coupling (coupling constant τ=0.1) with an external bath [144]. The protein and the rest of the system were coupled to two different temperature baths. The pressure of the system was kept constant at 1 bar using the Berendsen barostat [144] with a coupling constant of τp=0.5. The LINCS algorithm was used to constrain all bond lengths [145]. For the water molecules, the SETTLE algorithm was used [146]. A dielectric permittivity, εr=1, and a time step of 2 fs were used. The non-bonded interactions (electrostatic and Lennard-Jones) were calculated using the Particle Mesh Ewald (PME) method [147]. For the calculation of the long-range interactions, a grid spacing of 0.12 nm combined with a fourth-order B-spline interpolation was used to compute the potential and forces between grid points. A non-bonded pair-list cutoff of 0.9 nm was used and the pair-list was updated every 5 time steps. The MD simulations were started by assigning to all atoms an initial velocity obtained from a Maxwellian distribution at the desired initial temperature. The simulated systems were first equilibrated for 100 ps with position restraints on the protein heavy atoms to allow relaxation of the solvent molecules to an equilibrium density. After the equilibration, the position restraints on the proteins were removed and the systems were gradually heated from 50 K to 300 K during 50 ps of simulation. The subsequent production runs were 50 ns long.

2.3. - Analysis of the simulations

2.3.1. - General structure analysis

Each simulation was monitored using the total and per-residue backbone root-mean-square deviations (RMSD), with the crystal structure as reference. The secondary structure of the protein was analyzed using the DSSP criteria [148]. The presence of the four calcium ions and the one sodium ion in their binding sites was monitored by analyzing time series of selected distances between Ca-1 and Asp286, Ca-2 and Asp216, Ca-3 and Glu49, Ca-4 and Asp114, and Na-5 and the backbone oxygen of Leu182. Root-mean-square fluctuation (RMSF) analysis is generally used as a standard indicator of protein flexibility. We used the RMSF of the protein backbone atoms defined by the relation:

\[ \mathrm{RMSF}(j) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ij} - \langle r_{j} \rangle\right)^{2}} \]

where j indicates the atom number, r_ij is the position of atom j in the i-th trajectory frame, ⟨r_j⟩ is its average position and N is the total number of frames used in the analysis. The average RMSF(i) for the i-th residue was calculated by averaging the RMSF of the atoms belonging to that residue. In addition, the local RMSF (LRMSF) of loop regions was calculated from the RMSF(i) as:

\[ \mathrm{LRMSF} = \frac{1}{N_{2} - N_{1}}\sum_{i=N_{1}}^{N_{2}} \mathrm{RMSF}(i) \]

where N1 and N2 are the first and last residue numbers of the loop [133]. The differences in LRMSF were used to characterize the variation of flexibility among the variants and between the two temperatures. The significance of the mutation- and temperature-induced differences observed in the dynamic behavior of loop regions was evaluated using Student's t-test.

The analysis of deviation and fluctuation per residue was performed on the last 20 ns of the trajectories.
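The RMSF and LRMSF definitions above can be illustrated with a minimal sketch. It is not the GROMACS implementation used in this work; the function names are hypothetical, the frames are assumed to be already fitted to the reference structure, and the LRMSF prefactor 1/(N2 - N1) is taken as written in the text (a plain arithmetic mean over the loop would divide by N2 - N1 + 1 instead).

```python
import math

def rmsf_per_atom(frames):
    """frames[i][j] is the (x, y, z) position of atom j in frame i,
    assumed to be already superimposed on the reference structure."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    rmsf = []
    for j in range(n_atoms):
        # time-averaged position <r_j> of atom j
        mean = [sum(frames[i][j][k] for i in range(n_frames)) / n_frames
                for k in range(3)]
        # mean squared displacement from the average position
        msd = sum(
            sum((frames[i][j][k] - mean[k]) ** 2 for k in range(3))
            for i in range(n_frames)
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf

def rmsf_per_residue(atom_rmsf, atom_residue):
    """Average the atomic RMSF over the atoms of each residue.
    atom_residue[j] is the residue number of atom j."""
    sums, counts = {}, {}
    for value, res in zip(atom_rmsf, atom_residue):
        sums[res] = sums.get(res, 0.0) + value
        counts[res] = counts.get(res, 0) + 1
    return {res: sums[res] / counts[res] for res in sums}

def local_rmsf(residue_rmsf, n1, n2):
    """LRMSF of the loop spanning residues n1..n2 (inclusive),
    with the 1/(N2 - N1) prefactor as written in the text."""
    total = sum(residue_rmsf[i] for i in range(n1, n2 + 1))
    return total / (n2 - n1)
```

In practice the per-residue dictionary would be computed once per trajectory window (here, the last 20 ns) and `local_rmsf` evaluated for each loop of interest.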

2.3.2. - Cluster analysis

The protein backbone conformations sampled every 10 ps during the MD simulations were analyzed using the GROMOS cluster algorithm [149]. For each structure, a least-squares translational and rotational fit was performed using the backbone atoms, and the RMSD of the backbone atoms of the proteins was calculated. Cutoffs of 0.07 nm at 283K and 0.10 nm at 363K were chosen.
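The clustering step can be sketched as follows, assuming the usual description of the GROMOS algorithm: the structure with the most neighbors within the cutoff becomes the center of the first cluster, it is removed together with its neighbors, and the procedure is repeated on the remaining structures. The function name, the pre-computed pairwise RMSD matrix and the use of "less than or equal" for the cutoff comparison are assumptions made for illustration only.

```python
def gromos_cluster(rmsd, cutoff):
    """Cluster structures from a symmetric pairwise RMSD matrix (list of lists).

    Returns a list of (center_index, member_indices) tuples in the order the
    clusters are found, i.e. from the most to the least populated.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # neighbors of structure i among the structures not yet assigned
            # (i itself is always included because rmsd[i][i] == 0)
            members = [j for j in remaining if rmsd[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, sorted(best_members)))
        # remove the whole cluster before searching for the next one
        remaining -= set(best_members)
    return clusters
```

The center of the first cluster returned by such a procedure corresponds to the representative structure of the most populated conformational state.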

2.3.3. - Essential Dynamics Analysis

This methodology is based on the principal component analysis of MD trajectories and makes it possible to separate the large-amplitude (nonlinear) correlated motions present in the different proteins at different temperatures from the quasi-harmonic modes associated with thermal noise fluctuations [150]. The analysis was performed on the protein backbone atoms (C, Cα and N) obtained from the last 20 ns of the MD simulation trajectories of each protein at both 283K and 363K. The S41 crystal structure was used as reference for the analysis of all the simulations. The analysis was conducted as described elsewhere [150]. The resulting eigenvectors and eigenvalues define specific orthogonal modes of vibration of the molecular system. The comparison of eigenvectors obtained from the different simulations was performed using the root-mean-square inner product (RMSIP), defined as

\[ \mathrm{RMSIP} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\mathbf{v}_{i} \cdot \mathbf{u}_{j}\right)^{2}} \]

where v_i and u_j are the i-th and j-th eigenvectors of the two m-dimensional essential subspaces of the different systems. RMSIP gives a simple measure of their dynamical similarity. The statistical significance of the overlap between two eigenvectors was estimated by comparing the obtained s values with the projections of a randomly oriented unit vector onto a given set [151] and with those of a structurally equivalent but uncorrelated biopolymer. For the first case, the probability density ρ(s,M) of finding a value s of the square projection of a random unit vector onto one of the eigenvectors of the given set is [151]:

\[ \rho(s,M) = (M-1)(1-s)^{M-2} \]

where M=2781 is the dimension of the space. The value of s’ giving 99% confidence that a given s is similar to a random distribution is calculated from the integral of the probability density [152]:

\[ P(s') = \int_{0}^{s'} \rho \, ds = 1 - (1-s')^{M-1} = 0.99, \]

and hence

\[ s' = 1 - \sqrt[M-1]{0.01} \approx 0.002 . \]

The square projection of one eigenvector onto a reference set is statistically significant if s>s’ or in case s
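As a worked illustration of the two quantities used in this section, the sketch below computes the RMSIP between two sets of eigenvectors and the random-projection threshold s’. It assumes the standard normalization of the RMSIP by the subspace dimension m and a cumulative distribution of the form P(s’) = 1 - (1 - s’)^(M-1), which is the form consistent with the quoted value s’ ≈ 0.002 for M=2781. The function names are hypothetical.

```python
import math

def rmsip(set_v, set_u):
    """Root-mean-square inner product between two sets of m eigenvectors.
    Each eigenvector is a sequence of floats and is assumed to be normalized."""
    m = len(set_v)
    total = 0.0
    for v in set_v:
        for u in set_u:
            dot = sum(a * b for a, b in zip(v, u))
            total += dot * dot
    return math.sqrt(total / m)

def random_projection_threshold(dim, confidence=0.99):
    """Value s' below which the square projection of a random unit vector
    falls with the given probability, assuming
    P(s') = 1 - (1 - s')**(dim - 1)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / (dim - 1))
```

With these assumptions, `random_projection_threshold(2781)` evaluates to roughly 0.0017, which rounds to the value of 0.002 quoted above; identical subspaces give an RMSIP of 1 and mutually orthogonal subspaces give 0.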

2.4. - Graphical representations

The program VMD [90] was used for generating the images of the molecular structures.

3. - Results

3.1. - Structural properties

The secondary structure (in particular α-helix and β-sheet) of proteins in all the simulations did not show remarkable variations across the simulation time (Table 3.2).


Table 3.2. – Variation in the percentage of residues forming part of each specified type of secondary structure along the MD simulations of WT S41, MUT2 and MUT7 at 283K and 363K.

                 Variation vs crystal structure (%)         Standard deviation (%)
Variant  T       All secondary   β-sheet   α-helix          All secondary   β-sheet   α-helix
                 structure                                  structure
S41      283K     0.4             0.3       0.3             ±1.85           ±0.61     ±0.92
         363K    -2.9            -0.5      -0.8             ±2.18           ±1.04     ±1.21
MUT2     283K    -0.6             0.3       0.0             ±1.84           ±0.71     ±0.96
         363K    -5.3            -0.7      -2.3             ±3              ±0.8      ±1.53
MUT7     283K    -0.9             0.2       0.1             ±1.65           ±0.59     ±0.91
         363K    -4.3            -0.4      -1.7             ±3.11           ±0.99     ±1.58

The average radius of gyration and total surface area of S41, MUT2 and MUT7 indicated that the variation in the compactness of the protein structure upon increasing the temperature from 283K to 363K did not exceed 6% (Table 3.3). The backbone RMSD values along the 50 ns simulations at the two temperatures are reported in panel I of Figure 3.2. All the curves reached a plateau within 30 ns, with maximal RMSD values not exceeding 0.3 nm. RMSD values calculated per residue are reported in panel II of Figure 3.2. For WT S41, the most relevant deviations involve loops 83-87, 105-110 and 237-241. The substitution of Lys 211 and Arg 212 by Pro and Ala in MUT2 resulted in a notable deviation from the crystallographic structure for loop 209-222 in the simulation at 363K. Conversely, this deviation was not observed in the simulation at 283K. For MUT7, the same region did not show remarkable variations from the crystal structure at either 283K or 363K. Furthermore, at 363K, the deviation from the WT at each amino acid substitution site was lower than in MUT2 at 363K (Figure 3.2, panel II, c).

Table 3.3. – Average radius of gyration and surface access area calculated for each simulation.

          Radius of gyration (nm)               Surface access area (nm²)
Variant   283K     363K     Difference (%)      283K      363K      Difference (%)
S41       1.722    1.738    0.92                171.53    175.39    2.25
MUT2      1.729    1.750    1.21                167.29    175.10    4.67
MUT7      1.726    1.742    0.92                159.31    168.46    5.75


Figure 3.2. – I) Backbone RMSD values are plotted versus simulation time for S41 (a) , MUT2 (b) and MUT7 (c) . II) RMSD values per residue were calculated for the last 20 ns of simulation trajectory for wild type S41 (a) , MUT2 (b) and MUT7 (c) ; III) RMSF values per residue in last 20 ns of simulations for S41 (a) , MUT2 (b) and MUT7 (c) . The black line and the red line represent values for the 283K and 363K simulation respectively. Amino acid substitutions are indicated with a dotted line. The dark bars and gray arrows at the top of RMSF graphs indicate α-helical and β-sheets regions respectively.


3.2. - Structural ions

For all simulations, the calcium ions remained in their initial positions, within a distance of 0.4 nm from the coordinating residues (see Figures 3.3, 3.4 and 3.5). The same held for the sodium ion at position Na-5, except in MUT2 at 363K, where the ion left the coordination site after 30 ns of simulation time (Figure 3.4).
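The criterion used to decide whether an ion has left its site can be made explicit with a small sketch. It assumes a distance time series sampled every 100 ps (as in Figures 3.3 to 3.5) and the 0.4 nm distance quoted above; the function name and the rule of "last frame within the cutoff" are illustrative assumptions, not the procedure actually scripted for this work.

```python
def departure_time_ps(distances_nm, dt_ps=100.0, cutoff_nm=0.4):
    """Return the time (ps) after which the ion never returns within the
    cutoff distance of its coordinating atom, or None if the ion is still
    coordinated in the last frame."""
    last_bound = -1
    for idx, d in enumerate(distances_nm):
        if d <= cutoff_nm:
            last_bound = idx  # most recent frame with the ion in its site
    if last_bound == len(distances_nm) - 1:
        return None
    return (last_bound + 1) * dt_ps
```

Transient excursions beyond the cutoff are ignored by this rule, so only a permanent departure, such as the one described for Na-5 in MUT2 at 363K, yields a time.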

Figure 3.3. – Distance between the calcium and sodium ions and selected atoms of binding residues along the simulation time for Subtilisin S41. The distance between Ca-1 and Asp286 (A) , Ca-2 and Asp216 (B) , Ca-3 and Glu49 (C) , Ca-4 and Asp114 (D) and Na-5 and Leu 182 (E) was measured over the course of the simulation each 100 ps. The left and the right panel show the distance for the trajectory at 283 K and 363 K simulations, respectively.

Figure 3.4. – Distance between the calcium and sodium ions and selected atoms of binding residues along the simulation time for MUT2. The distance between Ca-1 and Asp286 (A) , Ca-2 and Asp216 (B) , Ca-3 and Glu49 (C) , Ca-4 and Asp114 (D) and Na-5 and Leu 182 (E) was measured over the course of the simulation each 100 ps. The left and the right panel show the distance for the trajectory at 283 K and 363 K simulations, respectively.


Figure 3.5. – Distance between the calcium and sodium ions and selected atoms of binding residues along the simulation time for MUT7. The distance between Ca-1 and Asp286 (A) , Ca-2 and Asp216 (B) , Ca-3 and Glu49 (C) , Ca-4 and Asp114 (D) and Na-5 and Leu 182 (E) was measured over the course of the simulation each 100 ps. The left and the right panel show the distance for the trajectory at 283 K and 363 K simulations, respectively.

3.3. - Dynamic properties

RMSF values per residue are reported in panel III of Figure 3.2. The regions with the largest flexibility correspond to protein loops in all three variants. At 363K, there was a sharp increase of the fluctuation values concentrated in these loop regions. This pattern was different for each variant, especially for the loops where the amino acid substitutions were introduced (dotted line). In order to compare how the general flexibility and the flexibility of the different loops change from 283K to 363K, the average RMSF values for all heavy and backbone atoms were calculated for the last 20 ns of each simulation of S41, MUT2 and MUT7 (Table 3.4). The differences between the obtained values were compared statistically using Student's t-test. The average flexibility of S41 was lower than that of MUT2 and MUT7 at both 283K and 363K, suggesting that the thermostable variants do not in fact have an increased rigidity. This held true whether heavy atoms or backbone atoms were considered in the calculations. The average increase of the RMSF values for the simulation at 363K with respect to 283K was the same for all studied variants, showing that the average change of the fluctuation intensity does not provide useful information for comparing the dynamic properties of the studied enzymes. Since the average flexibility did not provide a means of comparing the dynamic properties of S41, MUT2 and MUT7, the changes in the local fluctuations were monitored and compared.


Table 3.4. – Average RMSF values obtained from the last 20 ns of MD simulation for S41, MUT2 and MUT7. The average RMSF values and the standard deviation were calculated for all heavy and backbone atoms at 283K and 363K.

Atoms      T (K)        S41             MUT2            CL (%)   MUT7            CL (%)
Heavy      283          0.061 ±0.039    0.070 ±0.047    100      0.063 ±0.041    72
atoms      363          0.089 ±0.068    0.097 ±0.064    100      0.091 ±0.059    50
           Difference   0.028 ±0.107    0.027 ±0.111    24       0.028 ±0.100    0
Backbone   283          0.047 ±0.023    0.055 ±0.027    100      0.049 ±0.022    67
atoms      363          0.069 ±0.048    0.077 ±0.048    100      0.071 ±0.041    72
           Difference   0.022 ±0.071    0.022 ±0.075    0        0.022 ±0.063    0

The loop regions with the largest flexibility are listed in Table 3.5 A and 3.5 B for the 283K and 363K simulations, respectively. These loops were classified into three groups, represented in Figure 3.6 with different colors according to their location in the protein structure.

Table 3.5. – The local RMSF values for loop regions were calculated for the S41, MUT2 and MUT7 simulations at 283K (A) and 363K (B). In the CL columns, the confidence levels (in percentage) for the hypothesis LRMSF S41 ≠ LRMSF MUT2,MUT7 obtained from Student's t-test are reported. A value of 100% indicates that the two values can certainly be considered different.

A) 283K

Loop      Loop    S41 LRMSF        MUT2 LRMSF       CL (%)   MUT7 LRMSF       CL (%)
          group   (nm)             (nm)                      (nm)
15-21     I       0.067 ± 0.011    0.085 ± 0.022    94       0.079 ± 0.022    80
39-47     I       0.049 ± 0.010    0.043 ± 0.006    87       0.049 ± 0.014    0
83-87     I       0.158 ± 0.016    0.084 ± 0.025    100      0.076 ± 0.018    100
105-110   III     0.125 ± 0.035    0.149 ± 0.046    68       0.235 ± 0.101    98
138-146   III     0.062 ± 0.013    0.117 ± 0.031    100      0.066 ± 0.014    47
209-222   II      0.065 ± 0.016    0.099 ± 0.018    100      0.090 ± 0.024    100
237-241   I       0.070 ± 0.012    0.083 ± 0.031    61       0.054 ± 0.009    98
288-291   II      0.048 ± 0.003    0.172 ± 0.035    100      0.056 ± 0.005    99

B) 363K

Loop      Loop    S41 LRMSF        MUT2 LRMSF       CL (%)   MUT7 LRMSF       CL (%)
          group   (nm)             (nm)                      (nm)
15-21     I       0.095 ± 0.017    0.161 ± 0.038    100      0.147 ± 0.052    98
39-47     I       0.063 ± 0.012    0.063 ± 0.015    0        0.092 ± 0.007    100
83-87     I       0.196 ± 0.035    0.195 ± 0.039    3        0.234 ± 0.061    76
105-110   III     0.249 ± 0.081    0.278 ± 0.125    36       0.257 ± 0.083    13
138-146   III     0.111 ± 0.044    0.210 ± 0.104    99       0.121 ± 0.051    34
209-222   II      0.075 ± 0.013    0.197 ± 0.084    100      0.127 ± 0.055    100
237-241   I       0.168 ± 0.058    0.126 ± 0.059    73       0.196 ± 0.075    49
288-291   II      0.059 ± 0.003    0.174 ± 0.024    100      0.146 ± 0.019    100

The first group, containing loops 15-21, 39-47, 83-87 and 237-241 (yellow), is located on the side opposite to the catalytic site (red). The second group (orange) is formed by the extended loops 209-222 and 288-291. Both loops are close to or part of the high affinity calcium binding sites Ca-1 (loop 288-291) and Ca-2 (loop 209-222) [139]. The third group (green) is formed by loops 105-110 and 138-146 (Figure 3.6). Both loops are close to the active site and are close to or form part of the S4 binding pocket described for subtilisin-like proteases [54].

Figure 3.6. – Flexible loop regions for S41 protease crystal structure. To investigate and compare the individual fluctuations of the different flexible loops identified by RMSF analysis and relate them according to their location on the 3D structure of S41 protease, the flexible loops were grouped in three different groups. The first group (yellow) is located away from the S4 binding pocket and the catalytic triad and not related to any calcium binding site. The second group (orange) refers to loops containing residues that are part of Ca-1 and Ca-2 calcium binding sites and the third group (green) includes the flexible loops forming part of the S4 binding pocket. The residues of the catalytic site are shown in red.

WT S41 showed two major mobility regions (loops 83-87 and 105-110) with RMSF values significantly higher than those of the less mobile loops. The large flexibility of these two loops is not limited to the WT but is a common feature of all three studied variants. Since loop 105-110 is close to a low affinity calcium-binding site and is part of the S4 binding pocket, its enhanced flexibility at low temperature might be beneficial for the enzymatic activity. However, at high temperatures, the enhanced flexibility can alter the structure of both the calcium-binding site and the S4 binding pocket. This might eventually lead to a loss of activity even before a major structural failure occurs upon unfolding. There are indications that Tyr104 of Subtilisin BPN’ (corresponding to Tyr111 in S41) might act as a flexible lid for the S4 binding site [153]. Indeed, this situation was observed during the MD simulation of WT S41 (Figure 3.7), where the median structures of the most abundant clusters of the MD trajectories at 283K and 363K were analyzed and compared to the crystal structure. Tyrosine 111 (licorice) is shown blocking the access to the S4 binding site when the simulation is run at 363K, mainly due to a conformational change in loop 105-110 (left component of the S4 pocket rim). The conformational change also occurred in the MUT2 and MUT7 simulations at 363K but much less frequently.

Figure 3.7. – Structural changes of the S4 binding site in S41. Median structures of the most abundant clusters found from the MD trajectories of S41 at 283K (b) and 363K (c) were analyzed and compared to the structural configuration of the crystal structure (a) . The blue surface area represents residues that are part of S4 binding site; the red surface represents the catalytic triad. The arrow indicates the position of residue Tyrosine 111 (in licorice).

The double mutant MUT2 showed a decrease in flexibility in loop 105-110 and an increase in loops 209-222 and 138-146. This redirection of flexibility might stabilize the S4 binding pocket, allowing the enzyme to remain active at higher temperatures for longer times than the WT. Loop 209-222 is far from the active site but is still part of the Ca-2 binding site, and its disruption can lead to a structural failure of the protein.

MUT7 showed a switch of the most flexible area from loop 209-222 to loops 39-47 and 237-241. Four of the five additional substitutions are in the proximity of the Ca-2 and Ca-1 calcium binding sites, most probably re-stabilizing the loops after the perturbation caused by the earlier substitutions at positions 211 and 212. Finally, the mutation Ser145Ile decreased the fluctuations of loop 138-146 back to the level of the WT.


3.4. - Essential Dynamics Analysis

The distribution of the RMSFs along the degrees of freedom of the protein backbone was addressed using essential dynamics analysis. The total number of eigenvalues and eigenvectors considered for the analysis was 2781. The numbers of eigenvectors contributing 70% of the total motion (essential eigenvectors) for S41 and its variants at 283K and 363K are reported in Figure 3.8.

          Eigenvectors representing    Eigenvectors representing
          70% of motion at 283K        70% of motion at 363K
S41       47                           27
MUT2      32                           28
MUT7      23                           30

Figure 3.8. – Relative percentage for the contribution of each eigenvector to the total motion of each enzyme at 283K (A) and 363K (B) . The table summarizes the number of eigenvectors that represent 70% of the total motions of each S41 variant.
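The count of essential eigenvectors can be obtained from the eigenvalue spectrum as sketched below. The sketch assumes that the eigenvalues of the covariance matrix are available as a plain list and that "70% of the total motion" means 70% of the sum of all eigenvalues; the function name is hypothetical.

```python
def n_essential_eigenvectors(eigenvalues, fraction=0.70):
    """Smallest number of leading eigenvectors whose eigenvalues add up to
    at least the given fraction of the total fluctuation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)
```

A smaller count indicates that the motion is concentrated in fewer collective modes, which is how the numbers in the table of Figure 3.8 are read in the following sections.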


The analysis provided for each protein the following results:

3.4.1. - WT S41

At 363K, the number of essential eigenvectors decreased from 47 to 27, suggesting a narrowing of the essential space as a consequence of the activation of a subset of nonlinear motions with larger amplitude. The comparison of the essential subspaces represented by the first 30 eigenvectors at both temperatures, using the RMSIP, resulted in values higher than 0.5 (Table 3.5), indicating a significant overlap among the essential subspaces of these proteins. In addition, a closer look at the inner product (IP) matrix of the same eigenvectors (Figure 3.9) showed that some of the low temperature essential modes were rather similar to those at high temperature (only IP values > 0.4 are reported; note that the pure random model and the pseudo-random model of the analyzed protein provide only values of IP < 0.2).

Figure 3.9. – Inner product matrices between eigenvectors representing the essential subspace of S41 (right), MUT2 (center) and MUT7 (left) at 283K and 363K.

The backbone RMSF calculated for the projection of the trajectories along the first eigenvectors showed collective fluctuations mainly involving loops 83-87 and 105-110 (Figure 3.10 a). The RMSF profiles at both temperatures were very similar, consistent with their IP (>0.6). The other two eigenvectors at the two temperatures were less similar and showed no fluctuations along loop 83-87. In Figure 3.11, the tridimensional representations of the concerted motions along the essential eigenvectors for WT S41 (a), MUT2 (b) and MUT7 (c) are shown at 283K (1) and 363K (2). In the case of WT S41, at 283K the only major variation occurred in loop 105-110, which is related to the flexibility of the S4 binding pocket, necessary for enzyme activity [54]. At 363K the larger flexibility of loops 83-87 and 105-110 was clearly visible (Figure 3.11, a 2), along with the increase in flexibility of loop 237-241.


3.4.2. - MUT2

Unlike in the WT, the number of essential eigenvectors remained nearly stable in MUT2, changing from 32 at 283K to 28 at 363K. The overlap of the essential subspaces (0.515) and the IP matrices also suggested that the correspondence of essential modes at the two temperatures is less evident than in the WT. Figure 3.10 b shows that the RMSF profile along the first eigenvector was different at the two temperatures. In fact, the collective motions involved mainly loop 138-146 at 283K, and loops 83-87 and 105-110 at 363K. Furthermore, the fluctuations along loop 209-222 at 363K were described by the second and third eigenvectors. The IP matrices showed a higher number of similar eigenvectors at 283K than at 363K (Figure 3.12). The tridimensional representation of the essential modes in Figure 3.11 b clearly shows that, at 283K, the only major fluctuation occurred in loop 138-146, in correspondence with the S4 binding pocket. However, at 363K, the larger fluctuations involved loops 209-222, 83-87 and 105-110 (Figure 3.11, b 2).

3.4.3. - MUT7

For this variant, the number of essential eigenvectors increased from 23 at 283K to 30 at 363K. The RMSIP value (0.582) was lower than that of S41 but higher than that of MUT2, as was the similarity between eigenvectors at both 283K and 363K (Figure 3.12). The RMSF profile along the first eigenvector at 283K was very different from that obtained at 363K. The collective motions at 283K included loops 209-222 and 105-110, whereas at 363K, loops 39-47, 83-87, 105-110 and 237-241 (Figure 3.10 c) were involved in the overall motion of the protein.

The concerted mode at 283K reported in Figure 3.11, c 1 involves mainly the loops 105-110, 209-222 and 288-291. At 363K, the modes were described in different loop regions, in particular loops 83-87, 237-241 and 15-21. In this case, the temperature change did not simply enhance the modes present at lower temperature but seemed to activate different ones in other regions of the protein (Figure 3.11, c 2).


Figure 3.10. –RMSF values per residue obtained by projecting the S41 (a) , MUT2 (b) and MUT7 (c) trajectories on the corresponding first 3 eigenvectors. The black line represents values obtained at 283K and the grey line 363K.


Figure 3.11. - Tridimensional representation of the essential modes representing 70% of the total protein from the simulations of wild type S41 (a) , MUT2 (b) and MUT7 (c) . The structural conformations were extracted each 600 ps and superimposed to evidence the dominant fluctuation modes on each condition. The subscript 1 and 2 refer to the 283K and 363K simulations, respectively.


Figure 3.12. - The inner product matrices between the first 30 eigenvectors representing the essential subspace of S41 were compared with the same number of eigenvectors of MUT2 and MUT7 at 283K (upper row) and 363K (lower row).

4. - Discussion and conclusions

Miyazaki and Arnold (1999) [78] reported a half life at 60°C for MUT2 that was 10 times longer than that of WT S41 (8 min). They explained this increased thermostability by the stabilization of the extended mobile loop 209-222, which they described as a weak point in the S41 enzyme. The additional five amino acid substitutions in MUT7 are localized (with the exception of Ser145Ile) near the initial two at positions 211 and 212. In addition, the mutation Lys221Glu is close to the Ca-2 calcium binding site, whereas Asn291Ile and Ser295Thr are close to the Ca-1 binding site. Due to limitations in the accurate modeling of loop 209-221 at the time of the original publication, the specific interactions of residues from loop 209-221 as part of a calcium binding site could not be analyzed. The authors, however, found that the half life at 60°C of MUT7 increased at high calcium concentrations [49], clearly consistent with the possible effect of the Ca-2 binding site later described by Almog et al. (2009) [139]. The results of our MD simulations suggested that the effect on the thermal stability of the variants may be more elusive and involve a change in the intensity and profile of the protein fluctuations at 283K and 363K. In fact, the WT S41 simulations showed a low mobility for loop 209-222 (probably determined by the presence of Ca-3) and larger mobility in other loop regions (83-87 and 105-110) at both 283K and 363K. In MUT2, loop 209-222 showed an increased mobility only at 363K, whereas the mobility of the active site loop 105-110 was rather reduced at the same temperature. Our results support the stabilization of the region where the amino acid substitutions were introduced in MUT7, since the RMSD and RMSF values per residue decreased with respect to those at the same positions in MUT2 at 363K (Figure 3.2). Furthermore, the essential dynamics analysis showed that the essential modes in MUT7 at 363K do not involve the loops in which the additional mutations were introduced (Figures 3.10 and 3.11). Although MUT7 had a higher overall flexibility, further inspection showed that there were no loops with extremely high RMSF values (Table 3.5). The idea of selective activation of essential modes has also been proposed in other studies of thermophilic proteins [129] and mesophilic proteins [152], but has not yet been demonstrated. However, recent experimental results from NMR are showing the importance of localizing the fluctuations in hinge regions of the protein [125]. In recent years, several experimental results have also demonstrated the role of flexible loop regions in protein thermostability. Jang et al. showed that removing one of the two unique long loops from thermicin subtilisin-like protease yields either a less active variant or a less thermostable one [154].
The removal of the extended loops was performed to eliminate possible weak points in the structure and thereby further increase thermostability, but the results suggested that those loops are an important structural factor that may play a role in the protein’s ability to withstand high temperatures. Other authors have contributed another important observation that might help to elucidate cold adaptation and thermostability [38]. The molecular mechanism of the enzymatic catalysis and the motions involved in this process are also helpful for identifying possibly relevant substitutions. General flexibility variations may be relevant for enzymes that undergo a large conformational change when binding a macromolecular substrate, whereas for enzymes that experience conformational changes only near the active site, substitutions affecting local flexibility in that region are more likely to affect the activity at low temperature. In Subtilisin S41, none of the introduced mutations is directly related to the active site (Asp 34, His71 and Ser 249); however, an increase in activity of the thermostable variants at both low and high temperatures was observed [49]. This result suggests that an improvement in the stability of a protein is not necessarily related to a decrease of activity at lower temperatures, as can be observed in naturally evolved thermophilic enzymes.

In this work, the dynamic behavior of Subtilisin S41 and two of its mutants was analyzed. The total atomic fluctuations were slightly different for the three proteins. These differences did not change with the increase of temperature, since the variation of the global fluctuation with increasing temperature was the same. A detailed inspection of the amino acid RMSF profiles in relation to the secondary structure of the proteins revealed that the major contribution to the total protein fluctuations comes from a few loops in WT S41. In contrast, the MUT2 and MUT7 mutants tend to distribute the fluctuations over a larger number of loop regions, suggesting that the increase in thermal resistance could arise from entropic stabilization. At high temperature, the essential modes of WT S41 are rather similar to, but fewer in number than, those from the low temperature simulations. These modes involve a few loop regions of the protein that show an amplification of the overall fluctuations as the temperature is increased. In contrast, the essential modes of MUT7 are rather different at 283K and 363K. Furthermore, the essential subspaces expand with the increase of temperature, resulting in lower fluctuation values spread over a larger number of loop regions than in the WT. According to the position of these loops in the protein structure, these regions are also less relevant for activity. Interestingly, the essential eigenvectors for both mutants have a lower resemblance to those at higher temperature. Further investigations in this direction might offer alternative strategies to rationally design proteins that explore different conformational spaces than the less stable variants.


Future prospects:

Several studies regarding protein adaptation, specifically in subtilisin proteases, can be derived from the present work. From the results of the directed evolution study, the relationship between the charges of the introduced amino acid substitutions and protein stability brings a new variable that can be engineered to achieve proteases with better thermal resistance and storage life. It is also important to determine whether this improvement under one condition holds when factors such as the ionic strength of the reaction medium or the surface charge of the substrate are modified. The location of the identified substitutions in the three-dimensional structure of the protein opens new questions regarding possible trends, namely whether amino acid substitutions located on the surface or in the core of the protein (and which kinds of substitutions) are more prone to affect primarily the stability or the activity of a subtilisin.

From the MD simulation studies a primary question arises. Is the change in the dominant dynamic modes of the stable variants of S41 a cause or an effect of the increase in thermal resistance, and how does it relate to the unfolding mechanism of each variant? It would also be relevant to find out whether the observations made in this work also hold in other examples of protein adaptation and in other enzyme classes. Experimental data would provide additional information regarding the dynamic behavior of the MUT2 and MUT7 variants and could eventually support, or suggest modifications to, the proposed analysis of the dynamic profiles.


Part IV: References

1. Polaina, J. and A.P. MacCabe, Industrial Enzymes: Preface, in Industrial Enzymes. 2007.
2. Kirk, O., T.V. Borchert, and C.C. Fuglsang, Industrial enzyme applications. Current Opinion in Biotechnology, 2002. 13(4): p. 345-351.
3. Panke, S., M. Held, and M. Wubbolts, Trends and innovations in industrial biocatalysis for the production of fine chemicals. Current Opinion in Biotechnology, 2004. 15(4): p. 272-279.
4. Schulze, B. and M.G. Wubbolts, Biocatalysis for industrial production of fine chemicals. Current Opinion in Biotechnology, 1999. 10(6): p. 609-615.
5. Güven, G., R. Prodanovic, and U. Schwaneberg, Protein Engineering - An Option for Enzymatic Biofuel Cell Design. Electroanalysis, 2010. 22(7-8): p. 765-775.
6. Shivange, A.V., et al., Advances in generating functional diversity for directed protein evolution. Current Opinion in Chemical Biology, 2009. 13(1): p. 19-25.
7. Wong, T.S., et al., Transversion-enriched sequence saturation mutagenesis (SeSaM-Tv): A random mutagenesis method with consecutive nucleotide exchanges that complements the bias of error-prone PCR. Biotechnology Journal, 2008. 3(1): p. 74-82.
8. Wong, T.S., D. Roccatano, and U. Schwaneberg, Steering directed protein evolution: strategies to manage combinatorial complexity of mutant libraries. Environmental Microbiology, 2007. 9(11): p. 2645-2659.
9. Wong, T.S., D. Roccatano, and U. Schwaneberg, Are transversion mutations better? A Mutagenesis Assistant Program analysis on P450 BM-3 heme domain. Biotechnology Journal, 2007. 2(1): p. 133-142.
10. Bloom, J.D. and F.H. Arnold, In the light of Directed Evolution: Pathways of adaptive protein evolution. Proceedings of the National Academy of Sciences, 2009. 106(Supplement 1): p. 9995-10000.
11. Otten, L.G. and W.J. Quax, Directed Evolution: selecting today's biocatalysts. Biomolecular Engineering, 2005. 22(1-3): p. 1-9.
12. Tao, H. and V.W. Cornish, Milestones in directed enzyme evolution. Current Opinion in Chemical Biology, 2002. 6(6): p. 858-864.
13. Drummond, D.A., et al., Why High-error-rate Random Mutagenesis Libraries are Enriched in Functional and Improved Proteins. Journal of Molecular Biology, 2005. 350(4): p. 806-816.
14. Cadwell, R.C. and G.F. Joyce, Randomization of genes by PCR mutagenesis. Genome Research, 1992. 2(1): p. 28-33.
15. Stemmer, W.P.C., Rapid evolution of a protein in vitro by DNA shuffling. Nature, 1994. 370(6488): p. 389-391.
16. Zhao, H., et al., Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat Biotech, 1998. 16(3): p. 258-261.
17. Becker, S., et al., Ultra-high-throughput screening based on cell-surface display and fluorescence-activated cell sorting for the identification of novel biocatalysts. Current Opinion in Biotechnology, 2004. 15(4): p. 323-329.
18. Agresti, J.J., et al., Ultrahigh-throughput screening in drop-based microfluidics for Directed Evolution. Proceedings of the National Academy of Sciences, 2010. 107(9): p. 4004-4009.
19. Farinas, E.T., Fluorescence Activated Cell Sorting for Enzymatic Activity. Combinatorial Chemistry & High Throughput Screening, 2006. 9: p. 321-328.
20. Yang, G. and S.G. Withers, Ultrahigh-Throughput FACS-Based Screening for Directed Enzyme Evolution. ChemBioChem, 2009. 10(17): p. 2704-2715.
21. Leemhuis, H., R.M. Kelly, and L. Dijkhuizen, Directed Evolution of enzymes: Library screening strategies. IUBMB Life, 2009. 61(3): p. 222-228.


22. Klibanov, A.M. and I.L. Allen, Stabilization of Enzymes against Thermal Inactivation, in Advances in Applied Microbiology. 1983, Academic Press. p. 1-28.
23. Kumar, S. and R. Nussinov, How do thermophilic proteins deal with heat? Cellular and Molecular Life Sciences, 2001. 58(9): p. 1216-1233.
24. Vorob'eva, L.I., Stressors, Stress Reactions, and Survival of Bacteria: A Review. Applied Biochemistry and Microbiology, 2004. 40(3): p. 217-224.
25. Declerck, N., et al., Hyperthermostable mutants of Bacillus licheniformis α-amylase: multiple amino acid replacements and molecular modelling. Protein Eng., 1995. 8(10): p. 1029-1037.
26. Lee, Y.E., et al., Characterization of the active site and thermostability regions of endoxylanase from Thermoanaerobacterium saccharolyticum B6A-RI. J. Bacteriol., 1993. 175(18): p. 5890-5898.
27. Svensson, B., Protein engineering in the α-amylase family: catalytic mechanism, substrate specificity, and stability. Plant Molecular Biology, 1994. 25(2): p. 141-157.
28. Nielsen, J.E. and T.V. Borchert, Protein engineering of bacterial α-amylases. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology, 2000. 1543(2): p. 253-274.
29. Quax, W.J., et al., Enhancing the Thermostability of Glucose Isomerase by Protein Engineering. Nat Biotech, 1991. 9(8): p. 738-742.
30. Joyet, P., N. Declerck, and C. Gaillardin, Hyperthermostable Variants of a Highly Thermostable Alpha-Amylase. Nat Biotech, 1992. 10(12): p. 1579-1583.
31. Haki, G.D. and S.K. Rakshit, Developments in industrially important thermostable enzymes: a review. Bioresource Technology, 2003. 89(1): p. 17-34.
32. Alberghina, L. and M. Lotti, Protein Engineering in basic and applied biotechnology: a review. Protein Engineering in Industrial Biotechnology, 2000. 2000.
33. Fitter, J., et al., Dynamical properties of α-amylase in the folded and unfolded state: the role of thermal equilibrium fluctuations for conformational entropy and protein stabilisation. Physica B: Condensed Matter, 2001. 301(1-2): p. 1-7.
34. Cavicchioli, R., et al., Low-temperature extremophiles and their applications. Current Opinion in Biotechnology, 2002. 13(3): p. 253-261.
35. Georlette, D., et al., Some like it cold: biocatalysis at low temperatures. FEMS Microbiology Reviews, 2004. 28(1): p. 25-42.
36. Privalov, P.L., Cold Denaturation of Protein. Critical Reviews in Biochemistry and Molecular Biology, 1990. 25(4): p. 281-306.
37. Makhatadze, G.I., et al., Energetics of Protein Structure, in Advances in Protein Chemistry. 1995, Academic Press. p. 307-425.
38. D'Amico, S., et al., Molecular basis of cold adaptation. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 2002. 357(1423): p. 917-925.
39. Zecchinon, L., et al., Did psychrophilic enzymes really win the challenge? Extremophiles, 2001. 5(5): p. 313-321.
40. Davail, S., et al., Cold adaptation of proteins. Purification, characterization, and sequence of the heat-labile subtilisin from the antarctic psychrophile Bacillus TA41. Journal of Biological Chemistry, 1994. 269(26): p. 17448-17453.
41. Feller, G. and C. Gerday, Psychrophilic enzymes: molecular basis of cold adaptation. Cellular and Molecular Life Sciences, 1997. 53(10): p. 830-841.
42. Feller, G. and C. Gerday, Psychrophilic enzymes: hot topics in cold adaptation. Nat Rev Micro, 2003. 1(3): p. 200-208.
43. Feller, G., Life at low temperatures: is disorder the driving force? Extremophiles, 2007. 11(2): p. 211-216.
44. Gerday, C., et al., Cold-adapted enzymes: from fundamentals to biotechnology. Trends in Biotechnology, 2000. 18(3): p. 103-107.
45. Tindbaek, N., et al., Engineering a substrate-specific cold-adapted subtilisin. Protein Engineering, Design and Selection, 2004. 17(2): p. 149-156.


46. Kano, H., S. Taguchi, and H. Momose, Cold adaptation of a mesophilic serine protease, subtilisin, by in vitro random mutagenesis. Applied Microbiology and Biotechnology, 1997. 47(1): p. 46-51.
47. Taguchi, S., et al., A Cold-Adapted Protease Engineered by Experimental Evolution System. J Biochem, 1999. 126(4): p. 689-693.
48. Wintrode, P.L., K. Miyazaki, and F.H. Arnold, Cold Adaptation of a Mesophilic Subtilisin-like Protease by Laboratory Evolution. Journal of Biological Chemistry, 2000. 275(41): p. 31635-31640.
49. Miyazaki, K., et al., Directed Evolution study of temperature adaptation in a psychrophilic enzyme. Journal of Molecular Biology, 2000. 297(4): p. 1015-1026.
50. Wintrode, P.L., F.H. Arnold, and H.A. Frances, Temperature adaptation of enzymes: Lessons from laboratory evolution, in Advances in Protein Chemistry. 2001, Academic Press. p. 161-225.
51. Rawlings, N.D., F.R. Morton, and A.J. Barrett, An Introduction to Peptidases and the Merops Database, in Industrial Enzymes. 2007. p. 161-179.
52. Kataoka, Y., et al., Catalytic residues and substrate specificity of scytalidoglutamic peptidase, the first member of the eqolisin in family (G1) of peptidases. FEBS Letters, 2005. 579(14): p. 2991-2994.
53. Rao, M.B., et al., Molecular and Biotechnological Aspects of Microbial Proteases. Microbiol. Mol. Biol. Rev., 1998. 62(3): p. 597-635.
54. Siezen, R.J. and J.A.M. Leunissen, Subtilases: The superfamily of subtilisin-like serine proteases. Protein Science, 1997. 6(3): p. 501-523.
55. Shinde, U. and M. Inouye, Intramolecular chaperones: polypeptide extensions that modulate protein folding. Seminars in Cell & Developmental Biology, 2000. 11(1): p. 35-44.
56. Foophow, T., et al., Subtilisin-like serine protease from hyperthermophilic archaeon Thermococcus kodakaraensis with N- and C-terminal propeptides. Protein Engineering, Design and Selection, 2010. 23(5): p. 347-355.
57. Perona, J.J. and C.S. Craik, Structural basis of substrate specificity in the serine proteases. Protein Science, 1995. 4(3): p. 337-360.
58. Polgár, L., The catalytic triad of serine peptidases. Cellular and Molecular Life Sciences, 2005. 62(19): p. 2161-2172.
59. Branden, C.-I. and J. Tooze, Introduction to Protein Structure: Second Edition. 1999: Garland Publishing.
60. Rawlings, N.D. and A.J. Barrett, Evolutionary families of peptidases. Biochem. J., 1993. 290(1): p. 205-218.
61. Donlon, J., Subtilisin, in Industrial Enzymes. 2007. p. 197-206.
62. Olsen, H. and P. Falholt, The role of enzymes in modern detergency. Journal of Surfactants and Detergents, 1998. 1(4): p. 555-567.
63. Maurer, K.-H., Detergent proteases. Current Opinion in Biotechnology, 2004. 15(4): p. 330-334.
64. Gupta, et al., An overview on fermentation, downstream processing and properties of microbial alkaline proteases. Applied Microbiology and Biotechnology, 2002. 60(4): p. 381-395.
65. Gupta, et al., Bacterial alkaline proteases: molecular approaches and industrial applications. Applied Microbiology and Biotechnology, 2002. 59(1): p. 15-32.
66. Estell, D.A., T.P. Graycar, and J.A. Wells, Engineering an enzyme by site-directed mutagenesis to be resistant to chemical oxidation. Journal of Biological Chemistry, 1985. 260(11): p. 6518-6521.
67. Kidd, R.D., et al., Breaking the low barrier hydrogen bond in a serine protease. Protein Science, 1999. 8(2): p. 410-417.
68. Matsumoto, K., B.G. Davis, and J.B. Jones, Chemically Modified "Polar Patch" Mutants of Subtilisin in Peptide Synthesis with Remarkably Broad Substrate Acceptance: Designing Combinatorial Biocatalysts. Chemistry - A European Journal, 2002. 8(18): p. 4129-4137.
69. Lloyd, R.C., B.G. Davis, and J.B. Jones, Site-selective glycosylation of subtilisin Bacillus lentus causes dramatic increases in esterase activity. Bioorganic & Medicinal Chemistry, 2000. 8(7): p. 1537-1544.
70. Arnórsdóttir, J., M.M. Kristjánsson, and R. Ficner, Crystal structure of a subtilisin-like serine proteinase from a psychrotrophic Vibrio species reveals structural aspects of cold adaptation. FEBS Journal, 2005. 272(3): p. 832-845.
71. Lonhienne, T., C. Gerday, and G. Feller, Psychrophilic enzymes: revisiting the thermodynamic parameters of activation may explain local flexibility. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology, 2000. 1543(1): p. 1-10.
72. Saeki, K., et al., Detergent alkaline proteases: enzymatic properties, genes, and crystal structures. Journal of Bioscience and Bioengineering, 2007. 103(6): p. 501-508.
73. Bryan, P.N., Protein engineering of subtilisin. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology, 2000. 1543(2): p. 203-222.
74. Zhao, H. and F.H. Arnold, Directed Evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng., 1999. 12(1): p. 47-53.
75. Gros, P., et al., Molecular dynamics refinement of a thermitase-eglin-c complex at 1.98 Å resolution and comparison of two crystal forms that differ in calcium content. Journal of Molecular Biology, 1989. 210(2): p. 347-367.
76. Almog, O., et al., Structural Basis of Thermostability. Journal of Biological Chemistry, 2002. 277(30): p. 27553-27558.
77. Strausberg, S.L., et al., Directed Coevolution of Stability and Catalytic Activity in Calcium-free Subtilisin. Biochemistry, 2005. 44(9): p. 3272-3279.
78. Miyazaki, K. and F.H. Arnold, Exploring Nonnatural Evolutionary Pathways by Saturation Mutagenesis: Rapid Improvement of Protein Function. Journal of Molecular Evolution, 1999. 49(6): p. 716-720.
79. Siegert, P., et al., Novel alkaline protease from Bacillus gibsonii and washing and cleaning agents containing said novel alkaline protease. 2009, Henkel AG & Co. KGaA.
80. Wong, T.S., et al., Sequence saturation mutagenesis (SeSaM): a novel method for Directed Evolution. Nucl. Acids Res., 2004. 32(3): p. e26.
81. Yang, M.Y., E. Ferrari, and D.J. Henner, Cloning of the neutral protease gene of Bacillus subtilis and the use of the cloned gene to create an in vitro-derived deletion mutation. J. Bacteriol., 1984. 160(1): p. 15-21.
82. Wu, X.C., et al., Engineering a Bacillus subtilis expression-secretion system with a strain deficient in six extracellular proteases. J. Bacteriol., 1991. 173(16): p. 4952-4958.
83. Sambrook, J. and D.W. Russell, Molecular Cloning: A Laboratory Manual. 2001: Cold Spring Harbor Laboratory Press.
84. Inoue, H., H. Nojima, and H. Okayama, High efficiency transformation of Escherichia coli with plasmids. Gene, 1990. 96(1): p. 23-28.
85. Miyazaki, K., Creating Random Mutagenesis Libraries by Megaprimer PCR of Whole Plasmid (MEGAWHOP), in Directed Evolution Library Creation. 2003. p. 23-28.
86. Wang, W. and B.A. Malcolm, Two-Stage Polymerase Chain Reaction Protocol Allowing Introduction of Multiple Mutations, Deletions, and Insertions, Using QuikChange™ Site-Directed Mutagenesis, in In Vitro Mutagenesis Protocols. 2002. p. 37-43.
87. DelMar, E.G., et al., A sensitive new substrate for chymotrypsin. Analytical Biochemistry, 1979. 99(2): p. 316-320.
88. Goddette, D.W., et al., The crystal structure of the Bacillus lentus alkaline protease, subtilisin BL, at 1.4 Å resolution. Journal of Molecular Biology, 1992. 228(2): p. 580-595.
89. Eswar, N., et al., Protein structure modeling with MODELLER, in Methods in molecular biology (Clifton, N.J.). 2008. p. 145-159.
90. Humphrey, W., A. Dalke, and K. Schulten, VMD: Visual molecular dynamics. Journal of Molecular Graphics, 1996. 14(1): p. 33-38.

91. Taguchi, S., S. Komada, and H. Momose, The Complete Amino Acid Substitutions at Position 131 That Are Positively Involved in Cold Adaptation of Subtilisin BPN'. Appl. Environ. Microbiol., 2000. 66(4): p. 1410-1415.
92. Pulido, M.A., et al., Directed Evolution of Tk-subtilisin from a hyperthermophilic archaeon: identification of a single amino acid substitution responsible for low-temperature adaptation. Protein Engineering, Design and Selection, 2007. 20(3): p. 143-153.
93. Tracewell, C.A. and F.H. Arnold, Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current Opinion in Chemical Biology, 2009. 13(1): p. 3-9.
94. Jang, J.W., et al., Enhanced thermal stability of an alkaline protease, AprP, isolated from a Pseudomonas sp. by mutation at an autoproteolysis site, Ser-331. 2001. 34(Pt 2): p. 81-84.
95. Takagi, H., et al., Enhancement of the thermostability of subtilisin E by introduction of a disulfide bond engineered on the basis of structural comparison with a thermophilic serine protease. Journal of Biological Chemistry, 1990. 265(12): p. 6874-6878.
96. Erwin, C.R., et al., Effects of engineered salt bridges on the stability of subtilisin BPN'. Protein Eng., 1990. 4(1): p. 87-97.
97. Jaouadi, B., et al., Enhancement of the thermostability and the catalytic efficiency of Bacillus pumilus CBS protease by site-directed mutagenesis. Biochimie, 2010. 92(4): p. 360-369.
98. Daniel, R.M., et al., Enzyme stability and activity at high temperatures. 2008, Nova Science Publishers.
99. Liang, H.-K., et al., Amino acid coupling patterns in thermophilic proteins. Proteins: Structure, Function, and Bioinformatics, 2005. 59(1): p. 58-63.
100. McCammon, J.A., B.R. Gelin, and M. Karplus, Dynamics of folded proteins. Nature, 1977. 267(5612): p. 585-590.
101. Karplus, M. and A. McCammon, Molecular dynamics simulations of biomolecules. Nature Structural & Molecular Biology, 2002. 9(9): p. 646-652.
102. Dodson, G.G., D.P. Lane, and C.S. Verma, Molecular simulations of protein dynamics: new windows on mechanisms in biology. EMBO Rep., 2008. 9(2): p. 6.
103. Scharnagl, C., M. Reif, and J. Friedrich, Stability of proteins: Temperature, pressure and the role of the solvent. Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics, 2005. 1749(2): p. 187-213.
104. Klepeis, J.L., et al., Long-timescale molecular dynamics simulations of protein structure and function. Current Opinion in Structural Biology, 2009. 19(2): p. 120-127.
105. Bernard, R.B., et al., CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 1983. 4(2): p. 187-217.
106. Weiner, P.K. and P.A. Kollman, AMBER: Assisted model building with energy refinement. A general program for modeling molecules and their interactions. Journal of Computational Chemistry, 1981. 2(3): p. 287-303.
107. Scott, W.R.P., et al., The GROMOS Biomolecular Simulation Program Package. The Journal of Physical Chemistry A, 1999. 103(19): p. 3596-3607.
108. Thomas, D.N. and G.S. Dieckmann, Antarctic Sea Ice--a Habitat for Extremophiles. Science, 2002. 295(5555): p. 641-644.
109. Jenney Jr, F. and M. Adams, The impact of extremophiles on structural genomics (and vice versa). Extremophiles, 2008. 12(1): p. 39-50.
110. van den Burg, B., Extremophiles as a source for novel enzymes. Current Opinion in Microbiology, 2003. 6(3): p. 213-218.
111. Demirjian, D.C., F. Morís-Varas, and C.S. Cassidy, Enzymes from extremophiles. Current Opinion in Chemical Biology, 2001. 5(2): p. 144-151.
112. Herbert, R.A., A perspective on the biotechnological potential of extremophiles. Trends in Biotechnology, 1992. 10: p. 395-402.


113. Gupta, R., Q. Beg, and P. Lorenz, Bacterial alkaline proteases: molecular approaches and industrial applications. Applied Microbiology and Biotechnology, 2002. 59(1): p. 15-32.
114. Vieille, C. and G.J. Zeikus, Hyperthermophilic Enzymes: Sources, Uses, and Molecular Mechanisms for Thermostability. Microbiol. Mol. Biol. Rev., 2001. 65(1): p. 1-43.
115. Vieille, C. and G.J. Zeikus, Thermozymes: Identifying molecular determinants of protein structural and functional stability. Trends in Biotechnology, 1996. 14(6): p. 183-190.
116. Matthews, B.W., H. Nicholson, and W.J. Becktel, Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proceedings of the National Academy of Sciences of the United States of America, 1987. 84(19): p. 6663-6667.
117. Zhang, Y., et al., The Crystal Structure of 5'-Deoxy-5'-methylthioadenosine Phosphorylase II from Sulfolobus solfataricus, a Thermophilic Enzyme Stabilized by Intramolecular Disulfide Bonds. Journal of Molecular Biology, 2006. 357(1): p. 252-262.
118. Storch, E.M., V. Daggett, and W.M. Atkins, Engineering Out Motion: Introduction of a de Novo Disulfide Bond and a Salt Bridge Designed To Close a Dynamic Cleft on the Surface of Cytochrome b5. Biochemistry, 1999. 38(16): p. 5054-5064.
119. Tehei, M. and G. Zaccai, Adaptation to high temperatures through macromolecular dynamics by neutron scattering. FEBS Journal, 2007. 274(16): p. 4034-4043.
120. Boehr, D.D., H.J. Dyson, and P.E. Wright, An NMR Perspective on Enzyme Dynamics. Chemical Reviews, 2006. 106(8): p. 3055-3079.
121. Henzler-Wildman, K. and D. Kern, Dynamic personalities of proteins. Nature, 2007. 450(7172): p. 964-972.
122. van Gunsteren, W.F., et al., Biomolecular modeling: Goals, problems, perspectives. Angew Chem Int Ed Engl, 2006. 45(25): p. 4064-92.
123. Adcock, S.A. and J.A. McCammon, Molecular dynamics: survey of methods for simulating the activity of proteins. Chem Rev, 2006. 106(5): p. 1589-615.
124. Mittermaier, A.K. and L.E. Kay, Observing biological dynamics at atomic resolution using NMR. Trends in Biochemical Sciences, 2009. 34(12): p. 601-611.
125. Krishnamurthy, H., et al., Dynamics in Thermotoga neapolitana Adenylate Kinase: N-15 Relaxation and Hydrogen-Deuterium Exchange Studies of a Hyperthermophilic Enzyme Highly Active at 30 degrees C. Biochemistry, 2009. 48(12): p. 2723-2739.
126. Scheraga, H.A., M. Khalili, and A. Liwo, Protein-folding dynamics: overview of molecular simulation techniques. Annu Rev Phys Chem, 2007. 58: p. 57-83.
127. Daggett, V., Protein folding-simulation. Chemical Reviews, 2006. 106(5): p. 1898-1916.
128. Schaeffer, R.D., A. Fersht, and V. Daggett, Combining experiment and simulation in protein folding: closing the gap for small model systems. Current Opinion in Structural Biology, 2008. 18(1): p. 4-9.
129. Grottesi, A., et al., Molecular dynamics study of a hyperthermophilic and a mesophilic rubredoxin. Proteins: Structure, Function, and Bioinformatics, 2002. 46(3): p. 287-294.
130. Papaleo, E., et al., Protein flexibility in psychrophilic and mesophilic trypsins. Evidence of evolutionary conservation of protein dynamics in trypsin-like serine-proteases. FEBS Letters, 2008. 582(6): p. 1008-1018.
131. Colombo, G. and K.M. Merz, Stability and Activity of Mesophilic Subtilisin E and Its Thermophilic Homolog: Insights from Molecular Dynamics Simulations. Journal of the American Chemical Society, 1999. 121(29): p. 6895-6903.
132. Priyakumar, U.D., et al., Structural and Energetic Determinants of Thermal Stability and Hierarchical Unfolding Pathways of Hyperthermophilic Proteins, Sac7d and Sso7d. Journal of Physical Chemistry B, 2010. 114(4): p. 1707-1718.
133. Polyansky, A.A., Y.A. Kosinsky, and R.G. Efremov, Correlation of local changes in the temperature-dependent conformational flexibility of thioredoxins with their thermostability. Russian Journal of Bioorganic Chemistry, 2004. 30(5): p. 421-430.


134. Merkley, E.D., W.W. Parson, and V. Daggett, Temperature dependence of the flexibility of thermophilic and mesophilic flavoenzymes of the nitroreductase fold. Protein Engineering, Design and Selection, 2010: p. gzp090.
135. Spiwok, V., et al., Cold-active enzymes studied by comparative molecular dynamics simulation. Journal of Molecular Modeling, 2007. 13(4): p. 485-497.
136. Melchionna, S., R. Sinibaldi, and G. Briganti, Explanation of the stability of thermophilic proteins based on unique micromorphology. Biophysical Journal, 2006. 90(11): p. 4204-4212.
137. Kundu, S. and D. Roy, Comparative structural studies of psychrophilic and mesophilic protein homologues by molecular dynamics simulation. Journal of Molecular Graphics & Modelling, 2009. 27(8): p. 871-880.
138. Vihinen, M., Relationship of protein flexibility to thermostability. Protein Engineering, 1987. 1(6): p. 477-480.
139. Almog, O., et al., The crystal structures of the psychrophilic subtilisin S41 and the mesophilic subtilisin Sph reveal the same calcium-loaded state. Proteins: Structure, Function, and Bioinformatics, 2009. 74(2): p. 489-496.
140. Guex, N. and M.C. Peitsch, SWISS-MODEL and the Swiss-Pdb Viewer: An environment for comparative protein modeling. Electrophoresis, 1997. 18(15): p. 2714-2723.
141. Jorgensen, W. and J. Tirado-Rives, The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, 1988. 110(6): p. 1657-1666.
142. Jorgensen, W.L., et al., Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 1983. 79(2): p. 926-935.
143. Hess, B., et al., GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. Journal of Chemical Theory and Computation, 2008. 4(3): p. 435-447.
144. Berendsen, H.J.C., et al., Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics, 1984. 81(8): p. 3684-3690.
145. Hess, B., et al., LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry, 1997. 18(12): p. 1463-1472.
146. Miyamoto, S. and P.A. Kollman, Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of Computational Chemistry, 1992. 13(8): p. 952-962.
147. Essmann, U., et al., A smooth particle mesh Ewald method. The Journal of Chemical Physics, 1995. 103(19): p. 8577-8593.
148. Kabsch, W. and C. Sander, Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983. 22(12): p. 2577-2637.
149. Daura, X., et al., Peptide folding: When simulation meets experiment. Angewandte Chemie International Edition, 1999. 38(1-2): p. 236-240.
150. Amadei, A., A.B.M. Linssen, and H.J.C. Berendsen, Essential dynamics of proteins. Proteins: Structure, Function, and Genetics, 1993. 17: p. 412-425.
151. Amadei, A., M.A. Ceruso, and A.D. Nola, On the convergence of the conformational coordinates basis set obtained by the essential dynamics analysis of proteins' molecular dynamics simulations. Proteins: Structure, Function, and Genetics, 1999. 36(4): p. 419-424.
152. Roccatano, D., et al., Selective Excitation of Native Fluctuations during Thermal Unfolding Simulations: Horse Heart Cytochrome c as a Case Study. Biophysical Journal, 2003. 84(3): p. 1876-1883.
153. Takeuchi, Y., et al., Molecular recognition at the active site of subtilisin BPN': crystallographic studies using genetically engineered proteinaceous inhibitor SSI (Streptomyces subtilisin inhibitor). Protein Engineering, 1991. 4(5): p. 501-508.


154. Jang, H.J., et al., Two Flexible Loops in Subtilisin-like Thermophilic Protease, Thermicin, from Thermoanaerobacter yonseiensis. Journal of Biochemistry and Molecular Biology, 2002. 35(5): p. 498-507.


Declaration I hereby declare that this thesis was written by me. The scientific data and findings presented in this work are my own research findings. This thesis has been submitted only to Jacobs University Bremen gGmbH.

Ronny Martinez Moya July 31 Bremen, Germany
