Enzyme Evolution: Exploring Scope and Catalytic Promiscuity

I n a u g u r a l d i s s e r t a t i o n

zur

Erlangung des akademischen Grades eines

Doktors der Naturwissenschaften (Dr. rer. nat.)

der

Mathematisch-Naturwissenschaftlichen Fakultät

der

Ernst-Moritz-Arndt-Universität Greifswald

vorgelegt von Michael Fibinger, geb. Daumann geboren am 19.05.1988 in Berlin (Mitte)

Greifswald, den 31.03.2016

Dekan: Prof. Dr. Werner Weitschies

1. Gutachter : Prof. Dr. Uwe T. Bornscheuer

2. Gutachter: Prof. Dr. Romas J. Kazlauskas

Tag der Promotion: 27.06.2016

dedicated to my loving wife Diana

I never predict anything, and I never will. Paul Gascoigne

3

Chapter 1.

Exploring the enantioselective conversion of malonate esters by in silico modeling of the PLE isoenzymes

Pig liver esterase (PLE) was first described in 1904 by its esterase activity in crude extracts of porcine liver.[1-2] The ’s esterase activity is influenced by solvent addition and can dramatically increase, if crucial co-solvent concentrations are achieved.[3-8] First hypothesized as different subunits with separate enzymatic activity, today six isoenzymes (PLE1-6)[9-10] and a related alternative esterase[11] are known, differing only in a few amino acid residues. Each enzyme variant is able to form stable and active trimeric complexes. Small, but also bulky carboxylic esters, as well as thioesters or amides are accepted and hydrolyzed.[12] The broad substrate spectrum is caused by the intestinal function of detoxification and processing in pig liver, facing the complex composition of the omnivore diet. Sharing the same substrate specificity, the enantiomeric outcome can be very different throughout the isoenzymes. The achieved enantiomeric excess is solvent dependent, whereas the effect decreases with increasing size of solvent molecules.[7] Focusing this phenomenon, structural insights were obtained by molecular modeling and thus new residue contacts were found, explaining the binding mode of the investigated prochiral malonate diethyl esters.

Part of this chapter has been published: M.E. Smith, M.P.C. Fibinger, U.T. Bornscheuer, D.S. Masterson, ChemCatChem 2015, 7, 3179-3185. doi: 10.1002/cctc.201500597 Chapter one 4

Content

Introduction ...... 5 PLE - a mixture of isoenzymes ...... 5 Structural aspects of PLE ...... 8 Properties of PLE and the stereoselective divergence ...... 12 Promiscuous PLE ...... 17 Results and Discussion ...... 18 Initial homology modeling of the PLE isoenzymes ...... 18 Homology models of PLE2-6 and tunnel cluster analysis ...... 21 Docking preparation and former results ...... 27 Molecular docking of malonate esters ...... 28 Substrate 1 (diethyl 2-((1,3-dioxoisoindolin-2-yl)methyl)-2-methylmalonate) ...... 29 Substrate 2 (diethyl 2-((1,3-dithiooxoisoindolin-2-yl)methyl)-2-methylmalonate) . 33 Substrate 3 (diethyl 2-(benzyloxymethyl)-2-methylmalonate) ...... 34 Substrate 4 (diethyl 2-(benzylthiomethyl)-2-methylmalonate) ...... 36 Substrate 5 (diethyl 2-benzyl-2-methylmalonate) ...... 37 Substrate 6 (diethyl-2-methyl-2-((pyridine-2-yl-)methyl)malonate) ...... 39 Substrate 7 (diethyl-2-methyl-2-((pyridine-3-yl-)methyl)malonate) ...... 41 Substrate 8 (diethyl-2-methyl-2-((pyridine-4-yl-)methyl)malonate) ...... 43 Molecular dynamics simulation of substrate 5 ...... 46 Flexible residues and long-range interaction networks ...... 47 Concurring acidic residues of the ...... 51 Summary ...... 53 Materials and Methods ...... 54 Substrate preparation ...... 54 Homology modeling ...... 54 Molecular docking ...... 54 Molecular dynamics simulation ...... 55 Tunnel cluster analysis ...... 55 Network identification and residue contact detection ...... 55 Abbreviations ...... 56 Reference ...... 57

Chapter one 5

Introduction

Pig liver esterase (PLE, also referred to as porcine liver esterase) is a broadly investigated enzyme of mammalian origin, accepting numerous carboxylic esters, even if these are sterically hindered or adjacently connected to bulky substituents. Then, and now, the broad substrate spectrum and the obtained regio- and enantioselectivity of PLE arouse high interest for the application of the enzyme in asymmetric synthesis or kinetic resolution of various esters to achieve enantiopure compounds (Figure 1).

PLE, 25 °C pH 8, 1.5 h MeO2C CO2Me HOOC CO2Me NH2 94% NH2

ClCO2CH2Ph/Et3N 96% 1) PLE, 25 °C pH 8, 7 h

MeO2C CO2Me 2) H2-Pd/C HOOC CO2Me NHCO2CH2Ph 93% NH2 Figure 1: Asymmetrization of 3-amino-pentanedioic acid dimethyl ester.[13] Both enantiomers are accessible by protection of the amino function with a bulky substituent, turning the enzyme’s selectivity towards the (S)-enantiomer. Subsequent ring closure yields a 4-membered lactam, the core function of the β-lactam antibiotic family of carbapenems.

The first reported application of PLE was published in the beginning of the 20th century for the hydrolysis of mandelic esters[1-2], precursors for the drugs cyclandelate and homatropine. Later, PLE was successfully applied for the asymmetrization of prochiral mevalonic acids.[14] The rediscovery of the enzyme led to an enormous research interest, precisely because of its interesting features for organic synthesis and the increasing focus on enzymatic and chemoenzymatic catalysis. Despite the rising number of possible reactions involving PLE and the discovery of new substrate classes, its characterization was challenging.

PLE - a mixture of isoenzymes In 1979, sixteen bands with esterase activity were recognized by electro-focusing and six major fractions with a pI from 4.8 to 5.8 were identified.[3] Analyzing the substrate specificity of these fractions, three major components were concluded.[4] They differ in their apparent molecular masses of 58.2 kDa (α-subunit), 59.7 kDa (β-subunit) and 61.4 kDa (γ-subunit) and their corresponding C-terminal amino acids leucine, glycine or alanine, respectively. Additionally, a lower aspartate, but higher arginine content was Chapter one: Introduction 6 found for the α-subunit compared to the γ-subunit. From the initial fractions, number I consisted mainly of the γ-subunit and exhibited choline esterase like properties, while accepting estrone acetate, butyrylcholine and long chain methanol esters. Fraction I is inhibited by fluoride ions and the parasympathomimetic (cholinomimetic) drug physostigmine. The pH optimum towards the conversion of methyl hexanoate was at 8.0. The organic co-solvents acetonitrile and acetone increased the enzymatic activity towards 4-nitrophenyl acetate. In case of the first co-solvent the rate doubles at a concentration of 1.25 mol L-1 acetonitrile. A maximum activity increase of 150% at 0.2 mol L-1 acetone was achieved, before the co-solvent inhibits the conversion at a concentration of 0.5 mol L-1 acetone. In contrast, methanol increases the phenol release from phenyl acetate 5-fold at a concentration of 1.25 mol L-1 methanol, but titrimetric measurements proved that the generated acetic acid undergoes transesterification by methyl ester formation. Fraction V contained the α-subunit and was not able to convert butyrylcholine, but short chain aliphatic esters. The highest rate of methyl hexanoate conversion was reached at pH 6.5. Solvents affected the activity of fraction V negatively: After a slight increase of conversion, concentrations higher than 2 mol L-1 acetonitrile inhibited the enzyme as well as minimal additions of acetone (25% residual activity at 0.2 mol L-1 acetone). Methanol increased the activity of fraction V only 1.7-fold. The reaction rates were measured with the same substrates as for fraction I.[3-4] The residual fractions II-IV were assumed to be mixtures of the subunits, also containing the minor existent β-subunit. However, in a different project regarding the separation of crude PLE, seven isoenzymes were described.[15] Finally, another pig intestinal enzyme was isolated as proline-β-naphthyl amidase, oligo- merizing in a trimeric propeller-like structure of identical subunits. By electron microscopy, the dissociation into functional, but 2.5-fold less active monomers was observed at 4 °C and pH 4.5.[16-17] When analyzing the chemical properties and the amino acid sequence, the hypothesis emerged, that the enzyme is identical to γPLE, trimerizing mainly in γγγ, αγγ, ααγ and ααα macromolecules.[18-19] The results were supported as conversion of proline-β-naphthyl amide was found with fraction I (γPLE).[20] Since more information towards the inhomogeneity of the crude PLE extract was published, the different described stereoselectivities could be explained.[3-4, 20-21] At first glance, the commercial PLE was supposed to show essentially the same stereospecificity towards monocyclic and acyclic diesters and hence the crude extract was roughly treated as one enzyme.[22] However, the fabricated extracts usually differ in their composition, because the expression of the isoenzymes is strongly dependent on the pig diet and the animal’s origin.[23-25] In the biological context, the intestinal carboxyl esterases have an essential role in processing and conversion of metabolic compounds, even if these are of body own or foreign origin. Carboxylic esters, thioesters and amides are targeted in detoxification processes, as well as organophosphorus derivatives. The detoxification function towards assimilated substances also supports the broad substrate tolerance of the extract, because nutrients are usually directly transferred to Chapter one: Introduction 7 the liver by the hepatic portal vein. Thus, the isoenzyme expression will be tightly regulated and initiated or repressed by the nutrient composition.[26] Furthermore, liver carboxylesterases from horse (HLE)[27-28], rabbit (RLE)[29] and chicken (CCLE)[30-32] were identified and successfully applied in organic synthesis. However, concerning selectivity, activity, applicability and availability, the most applied esterase is PLE.[33] Although an easy preparation of crude PLE as acetone powder was available[26, 34], the purification of the specific isoenzymes was desired, due to the animal origin of the crude extract and its inhomogeneity. Hence, the reproducible application is strongly limited. Additionally, the potential virus and prion infections and the impurity of pig products considered in certain world religions decreased the commerciality of the PLE crude extract.[15, 19, 35] The deduced amino acid sequence of γPLE was cloned and recombinantly expressed first in Pichia pastoris.[36] Successful expression was dependent of a C-terminal tetrapeptide sequence, originally mediating the retention of the protein inside the endoplasmic reticulum. The recombinant gene possesses the highest activity at 60 °C and pH 8. The purified enzyme showed higher E-values for the resolution of acetates of secondary alcohols than the crude extract.[37] For instance, the crude PLEs from Fluka, Sigma and Roche prefer the conversion of (R)-1-phenyl-3-butylacetate, but the recombinant γPLE favors the (S)-enantiomer (Figure 2).

1 or 2 OH

R n O (S)-product PLE PLE OH 1 or 2 R O (n = 0) OH R (R)-product R (R)-product

n R Fluka Sigma Roche recombinant γPLE 0 -CH3 56%ee (R) 58%ee (R) 58%ee (R) 53%ee (R) 0 -CH2CH3 28%ee (R) 19%ee (R) 13%ee (R) 20%ee (R) 1 -CH3 44%ee (R) 32%ee (R) 43%ee (R) 70%ee (S) 1 -CH2CH3 12%ee (S) 11%ee (S) 18%ee (S) >99%ee (S) 1 -CH2CH2CH3 26%ee (S) 13%ee (S) 11%ee (S) 78%ee (S) 2 -CH3 42%ee (R) 25%ee (R) 29%ee (R) 59%ee (S) Figure 2: Enantioselective reactions of acetic acid 1-benzyl/phenyl alkylic esters to the corresponding alcohols in kinetic resolution. The enantiomeric excess is listed only for the produced alcohol.[23, 37] The (R)-enantiomers of smaller substituted substrates are preferentially hydrolyzed. While increasing the length of the chain adjacent to the chiral carbon atom, excess of (S)-product is achieved. Interestingly, the %ee varies throughout the commercial crude PLE extracts, but one pure component (recombinant γPLE) showed inverted selectivity for two substrates. The conversions vary from 40-60% for all compounds. Chapter one: Introduction 8

The enantiomeric outcome of the asymmetric synthesis depends on the isoenzyme ratio in the crude extract. Consequently, the applicability of PLE decreases due to the hard reproducibility of the reactions. But with the first recombinant PLE isoenzyme at hand, its relationship to another enzyme, the porcine intestinal carboxyl esterase (PICE) was discovered. Sharing a sequence identity of 97% - meaning a total difference of 17 amino acids - eight mutants of γPLE were created and characterized. A 6-fold increase in the enantioselectivity was achieved, by the substitution of only one amino acid.[24-25] Furthermore, not only the selectivity value was changed, but also the enantiopreference became tunable. The next step was the successful recombinant expression of γPLE in E. coli Origami (DE3), but co-expression of the chaperones GroEL/GroES was essentially necessary. Thus, the productivity of the enzyme production was increased from 4 U L-1 h- 1 by fungal expression to 27 U L-1 h-1 by bacterial expression.[38] Finally, six different isoenzymes were definitely identified by RNA isolation and primer specific reverse transcription of crude pig liver material.[9-10] Furthermore, a seventh isoenzyme was discovered.[11] Interestingly the same amount of initial major fractions of the crude extract was reported thirty years before.[3-4, 15] According to the published results, only γPLE was assumed to be identical to PLE1, but the other isoenzymes cannot be easily allocated to the former fractions. PLE1-6 differ in substrate and stereoselectivity. The seventh isoenzyme, called alternative PLE (APLE), was identified because the PLE crude extract converts methyl-(4E)-5-chloro-2-isopropyl-4-pentenoate preferentially to the (R)-product, but the ester is not a substrate for PLE1/γPLE. Thus, APLE was discovered, converting p-nitrophenyl acetate with lower activity than crude PLE (127 U mg-1 and 64 U mg-1, respectively) but methyl-(4E)-5-chloro-2-isopropyl-4- pentenoate with a 6-fold higher activity (36 U mg-1 and 6 U mg-1, respectively). The authors concluded that 16% of the used crude extract contained APLE, raising this enzyme to the second largest fraction in PLE extracts.[11] Because APLE also accepts dimethyl methylsuccinate and dimethyl 3-methylglutarate, but PLE1/γPLE only the first compound, it was stated that APLE is responsible for the hydrolysis of substituted aliphatic methyl esters during the different drug pathways.[11]

Structural aspects of PLE The PLE isoenzymes are part of the α/β fold family.[39] This fold class harbors many with completely different enzymatic activities, like lipases,[40] esterases,[41] transacetylases[42], epoxide [43], [44], dioxygenases[45], peptidases[46] or haloalkane dehalogenases[47-48]. Also non-catalytic proteins as neurotactin[49] or thyroglobulin[50] fold in the typical topology (Figure 3).[51] In PLE and other carboxylesterases, the mechanistically conserved residues are clustered in the catalytic triad, consisting of a nucleophilic serine, a basic histidine and an acidic charge relay residue - aspartic or glutamic acid. In PLE, S205 and H450 are the first residues of Chapter one: Introduction 9 the catalytic triad. In case of the acidic residue, E337 or E453 were discussed for PLE, because both positions have homologues in certain esterase families.[9-10, 52]

Figure 3: Topology of the α/β hydrolase fold.[53] The core is made from parallel β-strands, which form a left-handed twisted β-sheet. The second strand (here: β2) is usually anti-parallel. The β- strands are connected by α-helices, surrounding the central β-sheet. The conserved positions of the (OxAnH), of the nucleophile (S of the GxSxG sequence), the basic histidine (His) and the catalytic acid (Acid) are indicated. The charge relay acid can be localized C- terminally of β6, too.[54] Additionally, the fold can be truncated, expanded by additional sheets or helices or sequences can be inserted (usually C-terminally of a β-strand, e.g. β6 indicated by a dotted connection to α4), which fold in separate globular domains (e.g. the cap domains of epoxide hydrolases or Brefeldin A esterase or the lid domains of lipases).[55]

The nucleophilic serine was identified by the typical GXSXG motif of the nucleophilic elbow in α/β hydrolase folds (GESAG pentapeptide in human, rabbit and pig liver esterases). The oxyanion hole is partially made up from the typical GGG(A)X motif, with

G125GGL in the mentioned esterases. The motif is a special feature, because hydrolases owing this sequence are able to convert tertiary alcohol esters.[56-59] In a mutagenesis study, the glycine residues were sequentially substituted by alanine residues, but only the first and the last one can be mutated without completely losing the special activity towards the tested tertiary alcohol esters. Already the substitution of this first or third glycine residue to the small side chained alanine has had a decreasing effect towards the activity.[60] PLE is processed posttranslational into a glycoprotein and bound fatty acids were identified, playing a role in oligomerization, signal transduction or protein half-life [61] [62] time. Two disulfide bridges are essential between C70-C99 and C256-C267. The first structural explorations were based on the construction of a substrate model, adapted to the known substrate scope of crude PLE for the prediction of the stereospecific outcome (Figure 4).[63] Chapter one: Introduction 10

L polar non-polar L S N/S S/M

O OCH3 Figure 4: Substrate model for crude PLE to predict the substrate acceptance and the stereospecific outcome of catalyzed reactions. The positions of possible substituents depend on their properties: L - large substituents with polar side chains, S - small and non-polar groups, N/S - no or small substituents, S/M - small to medium sized groups. The nucleophilic attack occurs at the carboxylic function.[63]

Later on, a structural model was proposed on the basis of one hundred investigated substrates and their appropriate selective outcome. It explained very reliably the acceptance of potential substrates and the possible stereospecific outcome, also delivering a potential enantiomeric excess (Figure 5).[64]

Figure 5: Active site model for the prediction of hydrolysis reactions with crude PLE.[64-67] The dimensions of the model were the last published, proven by substrates with different sized substituents. Ser - catalytic serine for the nucleophilic attack of the carbonylic function, HL - and Hs - large and small hydrophobic pockets, PB and PF - rear (backside) and front polar pocket, respectively. The model is restricted by the surfaces of the walls and floor, but open on the top. For application, please see text.

With the calculated model at hand, the reversed selectivities of substituted malonic, succinic and glutaric acid esters became explainable. The dimensions of the large binding pocket (HL) increased over time, during the investigations on new substrates. Initially, the edge lengths were calculated to be 4.6; 3.1; 2.3 Å, x; y; z respectively in Figure 5.[64]

The size of HL was probed with bigger substituents for analogous substrates and increased to 5.37; 2.3; 2.3 Å[65] and later to 6.2; 3.9; 2.3 Å[67]. Finally, more steric hindered substrates were synthesized and the final dimensions were 6.2; 4.7; 3.1 Å.[68] Chapter one: Introduction 11

To apply the model sufficiently, different rules must be respected: The carbonyl group of the hydrolyzed ester function must be placed within the sphere of the nucleophile serine (Ser in Figure 5), reflecting the length of a potential sp3 carbon-oxygen bond. The remaining groups are then matched with the most appropriate pockets of the surrounding area. Manually building the substrate model with a molecular modeler kit by hand significantly simplified the comprehension and predictability. The substituents are placed with the following restrains: (a) hydrophobic parts bind into HL or HS. Smaller substituents prefer binding to HS, bigger sized in HL, but the general affinity to HS is higher. This rule is the basis for size induced reversal of stereoselectivity. (b) If diesters are investigated, the second ester function will be bind in PF. (c) Any function, capable of hydrogen bond formation with PF will do so. (d) When more binding modes are possible, they will synergistically improve the selectivity. In fact, opposite stereoselectivities will yield the predominant product with the better fit. The model was adapted successfully, but in particular cases, the prediction can be difficult.[69-71] With the emerging analysis of protein structures by NMR-spectroscopy and crystallography, also mammalian carboxylesterase structures were published, namely first from rabbit liver[72], then from human origin[73]. The structures share an identity of up to 75% to yPLE/PLE1 and can partially explain the substrate specificity of the first recombinant PLE isoenzyme.[24] Nowadays, 75 carboxylesterase structures are stored in the protein data bank, including mammalian, plant and bacterial enzymes with resolutions to 1.4 Å. From human origin, eighteen structures are available, also with different inhibitors or pharmaceutical compounds, mainly observed are trimeric oligomers (Figure 6).

Figure 6: Human carboxylesterase 1 (hCE1) in complex with Coenzyme A (pdb code 2H7C).[74] The structure was solved to 2.0 Å and showed the typical propeller oligomerization. The subunits are rainbow colored and Coenzyme A is depicted in sticks. Chapter one: Introduction 12

Since structural information about the PLE isoenzymes is scarce[75], homology models were calculated to explore the structural features of the isoenzymes and to gain information about their promiscuous behavior, regarding the special stereoselectivities.[52, 76-77] The only report of a homology model of PLE analyzed contact surfaces in the trimeric state, some active site residue interactions and drawn hypotheses about differences between the isoenzymes, without regarding different binding modes to speculate about enantioselective chemical behavior. For substrate entrance, residues 73-88 and 339-347 were assumed to serve as entrance helices. Extensively discussed was the catalytic acid: E337 was found to turn away after dynamic simulations, while E453 bounds to H450. 12,000 structures of the α/β hydrolase superfamily, 1,300 structures of the acetylcholine esterase family and 150 structures of the carboxylesterase family of the 3DM database[78] were aligned. For the superfamily, a conservation of the position 337 equivalent acid was found. Expressed numerically, a distribution of 74.4% Asp/19.9% Glu were localized, whereas position equivalent 453 was not a conserved residue, missing in the aligned core regions. But in the acetylcholine esterase family group, at position 337 91.2% Glu/6.7% Asp were determined and for position 453 63.2% Glu/ 33.8% Asp. Finally, the distribution in the subfamily was for position 337 96.0%Glu/2.6% Asp and for position 453 62.4% Glu/35.6% Asp, indicating their necessary functionality. Also for other α/β hydrolase-fold enzymes, more than one catalytic acid was described, being an evolutionary artifact of the divergent evolution.[47, 79] However, molecular docking of malonate esters to explain the properties of the active side model and to draw a clear correlation to the predicted binding sites has never been done before, as well as site directed mutagenesis of active site residues.

Properties of PLE and the stereoselective divergence PLE was found to be able to convert a manifold variance of esters, amides and thioesters. Already in 1994, more than 250 possible substrates were described.[80] Therefore, the enzyme became increasingly interesting for its practical application in organic synthesis. Exemplified substrates are given in Figure 7. PLE can be used for the desymmetrization of carboxylic acids with a stereogenic α- carbon or/and other additional stereocenters and shows high enantioselectivity also for primary or secondary meso-diols.[81] The kinetic resolution of secondary and tertiary alcohols and lactones was described, too. Also the conversion of carboxylic acid esters in xylofuranosides[82] and α,β-unsaturated cycloalkanes were reported with high regioselectivity.[83] Despite the huge substrate spectrum (excellently summarized by Gais and Theil[80]), PLE was clearly recommended for the hydrolysis of methyl esters of carboxylic acids and acetates of alcohols, especially when optically pure products are desired.[19] Thus, the application in asymmetric synthesis was reviewed for monocyclic meso-diesters, which are subsequently converted to lactones with high enantiomeric excess. In case of heterocyclic diesters, it was shown that the stereoselectivity is not Chapter one: Introduction 13 influenced by the heteroatom, but by the substituents adjacent to the chiral center. Bicyclic mono- and diesters become also hydrolyzed, but in the latter group a trans- relation between vicinal esters was essential for high selectivities.

COOH

COOH H15C7 COOH COOMe CO2Me CO2Me

>95%ee (R) >97%ee (R) 88%ee (R)

O CO2Me AcO PhH2CN NCH2Ph OH COOH PrO2C COOH >97%ee (S) 87%ee (S) 30%ee (S)

COOH O O COOH O HO CO2Me CO2Me PrCOO OH >97%ee (R) 4%ee (S,R) >95%ee (S)

COOH O Ph CO2Me EtO NO2 N O HOOC CO2Me COOH CO2Me 17%ee (S) >85%ee (R,S) >95%ee (R) (in 33% DMSO >99%ee) Figure 7: Example for hydrolysis products using crude PLE for organic synthesis. The enantiomeric excess is valued for the chiral center adjacent to the hydrolyzed function. In most cases, PLE shows high enantioselectivities while ensuring good yields, even with bulky substituted substrates.

When straight chain diesters in acyclic systems are converted, the stereoselectivity is strongly affected by the proximity of the prochiral center to the hydrolyzed ester group.[84] Even metal ion-coordinated complexes can be enantioselectively converted.[85] Furthermore, the organic synthesis of chiral compounds in kinetic resolution was applied for monocyclic esters and S-chiral sulfinyl- and P-chiral phosphinylacetates.[66, 86] In organic solvents, a slow acylation process was found, but the application is strongly limited, even if the slow conversion rate is improved by co-lyophylisation with methoxypolyethylene.[87-88] However, this underlines the high tolerance of PLE towards Chapter one: Introduction 14 co-solvents. In ionic liquids, PLE was shown to enable the same product yields in less time.[89] Additionally, PLE was immobilized e.g. on silica by p-phenylendiamine activation of the carrier and subsequent diazotation reaction with the tyrosine side chains. The immobilization yield was 52% and the material was reusable up to four times with a reaction time of seven days for each batch. The product yield towards dimethyl 2- methy-2-phenylmalonate decreased constantly from 84% to 34%, while the enantiomeric excess was constantly 86%ee of the (R)-product.[90] With free enzyme, a yield of 90% with 81%ee was reported before.[65] Also Schneider et al. described an enantiomeric ratio of 93:7.[91] Furthermore, it was possible to immobilize crude PLE on Eupergit C[92] or modified Sepharose[93]. Esterases in general are widely applied as stereoselective biocatalysts[94] and complex synthesis routes are possible in chemoenzymatic reactions.[95] A practical application was described for the synthesis of biologically active α-quaternary amino acids, using the Curtius rearrangement.[96] Biologically active silicon-containing dipeptidic aspartame or neotame analogues are accessible as well.[97] Commercially, the synthesis of (+)-biotin[98- 99], optically pure amino acids[100] as methyl cysteine[101] and the preparation of chiral synthons as reagents or starting materials for pheromones[102], anti-tumorigenics[103], HIV protease inhibitors[104] and other complex pharmacological compounds[105-107] is chemoenzymatically possible with PLE. More recently, the conversion of dimethyl cyclohex-4-ene-cis-1,2-dicarboxylate was improved by the usage of a separate isoenzyme. The product of the reaction leads to the access of anticapsins, aucantens and carbapenems.[108] PLE6 showed the highest values, whereas PLE1 and PLE2 exhibit only 3%, PLE3 and PLE4 up to 20% and PLE5 55% of the activity of PLE6.[109] The industrial application was further improved by the usage of the more economic meso-anhydride cis-3a,4,7,7a-tetrahydro-1,3-isobenzofurandione as precursor and thionyl chloride for the complete chemical esterification of the intermediate in the sequential multistep one-pot reaction with 84% yield and ≥99.5%ee. [110] Initially, no differences in the regioselectivity of different isoenzymes were described[22], but reactions with different enantioselectivities were detected. For example, towards prochiral dimethyl benzylmethylmalonate and cis-N-benzyl-2,5-bis(methoxy-carbonyl)- pyrrolidine, the enantiomeric excess depends on the pI of the used isoenzymes.[21] How the selectivity and the enantiomeric outcome in hydrolytic reactions can be influenced, was extensively reviewed.[111] Accordingly, the outcome is strongly affected by the solvent and the substrate properties. For example, the asymmetric hydrolysis of dimethyl 3-aminoglutarate depends on the size of the N-protective group. Adding a small group enhances the (R)-selectivity, but when bigger sized substituents were used, the selectivity decreases, until the enantiopreference switches and the selectivity increases again, but for the opposite enantiomer (Figure 8).[34]

Chapter one: Introduction 15

CO2Me CO2Me COOH PLE PLE NHR NHR NHR

COOH CO2Me CO2Me prochiral (S)-product (R)-product meso-compound

R %ee A.C. -H 41 (R)

-CO-CH3 93 (R) -CO-C H 6 (R) 2 5 -CO-nC4H9 2 (S) -CO-CH(CH3)2 54 (S) -CO-C6H11 79 (S) -CO-C(CH3)3 93 (S) -CO-(E)CH=CH-CH3 >97 (S)

A.C. - absolute configuration

Figure 8: Selectivity enhancement and selectivity switch of PLE-hydrolysis of amino protected methyl 3-aminoglutarate diesters, depending on the size of the protective group.[34]

Analogous, the hydrolysis of malonate esters is affected by substituents at the 2-position and the esterified alcohols (Figure 9).[84, 107, 112] The general effect towards the selectivity and the enantiomeric excess value itself was lower for ethyl esters than for methyl esters. The enantioselectively produced chiral products are subsequently applied in synthetic applications.[81] In the special case of substituted malonate esters, the active site model predicts the (R)- enantiomer with bulkier substituents.[68] When the substituent reaches a certain size, the methyl group will usually prefer the HS binding pocket. Due to the rules of the model, the non-converted ester must stay in PF, leading to the same enantiopreference, when hydrophilicity/hydrophobicity did not change dramatically and thus binding to other pockets occur.[64-68] Interestingly, for different isoenzymes, opposite enantiopreferences were found.[7-8] Thus the individual amino acid substitutions within the isoenzymes must influence the binding pockets, by e.g. resizing HL for the acceptance of bigger substituents. Additionally, the entrance tunnel or the trimerisation with other isoenzymes, reshaping the active site, can influence the stereoselective outcome. In case of the substrate model for crude PLE, the predicted conformation is completely divergent (Figure 10).[63]

Chapter one: Introduction 16

1 1 R PLE R 2 2 2 R O2C CO2R R O2C COOH R1 R2 Yield [%] %ee A.C.

-CH2OH Me 37 6 (S) -CH2OCH3 Me 86 21 (S) -CH2OSi(CH3)2(tBu) Me 49 95 (R) -CH2OCH2Ph Me 90 67 (R) -CH2O-tBu Me 90 96 (R) -CH2CH3 Me >90 73 (S) -CH2CH2CH3 Me >90 52 (S) -CH2(CH2)2CH3 Me >90 58 (S) -CH2(CH2)3CH3 Me >90 46 (R) -CH2(CH2)4CH3 Me >90 87 (R) -CH2(CH2)5CH3 Me >90 88 (R) -CH2CH3 Et >90 5 (S) -CH2CH2CH3 Et >90 10 (S) -CH2(CH2)2CH3 Et >90 25 (S) -CH2(CH2)3CH3 Et >90 10 (R) -CH2(CH2)5CH3 Et >90 5 (S) -CH2Ph Me >90 16 (S) -CH2Ph Et >90 23 (S)

A.C. - absolute configuration

Figure 9: PLE-catalyzed hydrolysis of α,α-substituted malonate esters. The enantioselectivity value, but also the enantiopreference is influenced by the chain length and the esterified alcohol.[84, 107, 112]

L

L S R N/S S/M COOEt

O OCH3 O OEt

substituted substrate model malonate diester

Figure 10: Substrate model and analogous prepared malonate ester[63], exemplified for the substituted substrates in Figure 9.

Therefore, the enantiopreference is usually (S) for bigger substituted malonate diesters, because the steric ambitious substituent will configure around the larger binding possibilities (indicated by “L”). When the substituent is smaller and fits into the position of “S/M”, the second ester function can switch to the other side, yielding the (R)- Chapter one: Introduction 17 enantiomer. Finally, the experimental results proved the opposite result of the substrate model prediction, underlining the advantage of the active site model. But not only the properties of the substrate, but also the co-solvent concentration can greatly influence the enantiomeric outcome and the enantiopreference. For meso- cyclohexene diacetate the selectivity increases with DMSO, DMF or tert-butanol, but for meso-cyclohexane diacetate it was decreased with the latter solvent.[113] Similar results were reported for a decreased enantioselectivity by solvent addition.[5, 92, 111, 114] The enantiopreference itself was switched for tricyclodecadienone esters.[115-117] The influence of solvent molecules towards the selectivity was also described for an Aspergillus oryzae protease[118] and lipases of Pseudomonas cepacia[119-120], Candida cylindracea[121], Mucor miehei[122] and Aspergillus niger.[123]

Promiscuous PLE The PLE was intensively characterized and it turns out, that crude PLE is a mixture of six isoenzymes, explaining the differing behavior of two commercial crude extracts towards the same substrate.[23-25] embraces the subcategories of substrate, condition and catalytic promiscuity. PLE is an exceptional example for the three groups: Namely first the substrate promiscuity, converting ester functions of many substrate classes. This substrate ambiguity became predictable, as the active site model was reported. But new questions arise, since the PLE isoenzymes were characterized: Differences in a small number of amino acids yield a completely reversed enantioselectivity. What are the mechanisms? How is the shape of the active site reorganized? Which mutations can increase or decrease the observed enantioselectivities? Only structural models of the PLEs in monomeric and trimeric state can answer these questions and deliver a basis for new insights about the substrate binding and the selective reaction outcome. With respect to the reaction conditions, PLE accepts pH changes from pH 5-10 and certain isoenzymes are active up to 70°C with a remarkable increase of activity.[9-10] Another uncommon reaction condition is the high tolerance of co-solvents. Thereby, the selectivity and also the yield can dramatically increase, until enantiopure products are received. In case of crude PLE, the enantiomeric excess towards the meso-compound cis-N-benzyl-2,5-bismethoxycarbonylpyrrolidine increased from 10-20%ee to 98%ee, when 25% DMSO or 10% methanol were added to the system.[5-6] Finally, catalytic promiscuity is the ability of an enzyme to catalyze more than one chemical transformation. As carboxylic esters, thioesters or amides are different chemical species, the PLE isoenzymes also covers this part, finally showing promiscuity in all fields, turning the enzymes into a more than interesting research target.

Chapter one 18

Results and Discussion

Investigation about the promiscuous behavior towards different solvents was described before[7], but modeling experiments and structural information about the pig liver esterase (PLE) are still scarce. Consequently, structure-function relationships and important information about solvent binding sites or the correct orientation of the first shell residues in the active site are not exactly defined.

Initial homology modeling of the PLE isoenzymes First, the setup for modeling experiments contained the structure of human liver carboxyl esterase (hLCE, pdb codes 2H7C or 3K9B) or rabbit LCE (rLCE, pdb code 1K4Y) to calculate suitable models for docking experiments with the malonate ester substrates. Even with both structures, creating a hybrid model, Z-scores were lower compared to a common hybrid created by using the YASARA[124] pdb-scan algorithm (Table 1). Therein, YASARA aligns the sequence and protein data bank (pdb) queries were used for searching the closest related crystal structures.

Table 1: Hybrid models obtained from calculations within YASARA and the corresponding quality Z-scores: The higher score the better model (packed and aligned, while respecting torsions, angles, dihedrals, protein rotamers and clash occurrences). rLCE - rabbit liver carboxyl esterase, hLCE - human liver carboxyl esterase, r/hLCE - hybrid model with both template structures, pdb - hybrid model by pdb scan. The codes in brackets are the accession numbers for the structures used as templates.

check type rLCE hLCE (2H7C) hLCE (3K9B) r/hLCE pdb dihedrals 0.785 0.962 0.726 0.933 1.014 packing 1D -0.633 -0.254 -0.509 -0.189 0.004 packing 3D -1.493 -1.031 -1.161 -1.108 -1.024 overall -0.827 -0.436 -0.633 -0.454 -0.327

The rLCE and four different hLCE structures were selected by the pdb query search: 1MX1[73], 2H7C[74], 4AB1[125], 1K4Y[72] and 3K9B[126]. The latter five models were chosen for a critical alignment (see Figure 11). In structure 3K9B the catalytic S205 pointed towards the catalytic cave, where it is covalently bound to the nerve agent cyclosarin.[126] As the pdb model (see Table 1) reached the best scores, it was aligned to 3K9B and a simulation cell with a YAMBER force field was created.[127] After a molecular dynamic simulation the side chain of S205 reaches the same position as in 3K9B. The backbone trace of this model was kept fixed for docking experiments with fixed or relaxed side chain of S205. The final structure of PLE1 homology model was calculated monomeric (Figures 12 and 13).

Chapter one: Results and Discussion 19

Figure 11: Sequence alignment of PLE1 with human liver carboxylesterase 1 (hLCE1: 1MX1) and rabbit LCE (rLCE: 1K4Y). The pdb entries 2H7C, 4AB1, 1K4Y and 3K9B - chosen for homology modeling - are sequentially identical to 1MX1. The red box indicates the GGGX motif for the conversion of tertiary alcohols and the blue box represents the GXSXG motif, containing the nucleophilic serine. Chapter one: Results and Discussion 20

Figure 12: Overview about the final monomeric PLE1 homology model. The catalytic triad is shown in sticks, other residues probably contributing to the substrate binding and differing in PLE1-6 are represented in lines. Helices are shown in red, sheets in yellow; helices building up the entrance tunnel are colored cyan.

Figure 13: Stereo view of the active site with the catalytic triad. Additionally the residues creating the oxyanion hole are represented with sticks. Chapter one: Results and Discussion 21

Homology models of PLE2-6 and tunnel cluster analysis PLE isoenzymes were created by homology modeling with the PLE1 model as reference structure. Therefore, high Z-scores were achieved, which are consequentially not meaningful. The models undergo refinement by the macro md_refine and the snapshot with the lowest energy was further used. Subsequently, the isoenzymes were structurally analyzed, e.g. by RinAlyser or CAVER3.0, before docking. In the first tool, residue contacts and residue interactions were calculated and in the second, tunnel clusters and related residues were analyzed. CAVER3.0 was run as plugin for PyMol and first, different tunnel clusters, covering the surface of the binding pocket were identified, depending on the starting points at F85, I370, M403, and E453. These amino acids were chosen after visually evaluating the surface of the binding pocket. Thereby, 71 residues were detected, creating the first shell of the binding pocket. Further calculations were done using the hydroxy function of S205 as starting point. Residues contributing to the tunnel clusters and their endpoints between the entrance helices were identified (Figure 14). 13 residues differ between PLE1-6 (Table 2) and an additional tunnel was found. This cluster is longer and has a lower bottleneck radius (Figures 14 and 15). The substrate tunnel has a calculated length of 19 Å with a bottleneck of 1.2 Å, whereas the second identified tunnel (Figure 14, green) is longer (32 Å) and has a tighter bottleneck radius of 1.0 Å.

Figure 14: Substrate tunnels inside PLE1. The first cluster (blue) ends between the entrance helices. A second one (green) was identified as a longer and smaller channel, useful for the transport of product alcohols/acids or water molecules. The surface is reached beneath the large entrance helix. Chapter one: Results and Discussion 22

PLE1 PLE2

PLE3 PLE4

PLE5 PLE6

Figure 15: Tunnel cluster calculation for the PLE isoenzymes. The main channel through the putative entrance helices is shown in red, the side channel beneath the longer entrance helix in blue. The more intense color of the latter tunnel in PLE6 should indicate that only one channel was found. For comparison, the cluster of PLE5 was inserted in this structure. Chapter one: Results and Discussion 23

Table 2: Residues building up the side channel mouth of PLE1-6. Substituted residues throughout the isoenzymes are underlined.

PLE 67 68 69 70 73 75 77 78 79 82 1 M Q E Q S 2 3 M Q G G Q S 4 Y P P M 5 Q P Q S 6 M Q P G Q S

PLE 88 89 90 91 92 93 94 95 96 134 135 137 1 R T L 2 G K E R L P M D 3 R P 4 L I P E 5 R I P F

6 R I P F

The residues involved in the tunnel entrance of the short tunnel are more hydrophobic. In the model both helices almost close the entrance of the enzyme by hydrophobic interactions. The small solvent exposed entrance lies between M80, L84, L341 and M345. The tunnel mouth of the second (longer) cluster is flanked by more hydrophilic amino acids (E78, Q79, S82, R91 and T93). This leads to the hypothesis of binding or release of smaller, polar molecules (water, methanol or ethanol). This is supported, by an alignment of the calculated tunnels to high resolution structures (<2.0 Å) of carboxylesterases, where the indicated PLE side channel includes solvent water molecules (Figure 16).

Figure 16: Superposed structures of human carboxylesterases 1 (pdb codes 5A7F and 5A7G) with integrated side channel, calculated from the PLE isoenzymes (red dotted spheres). The water molecules from the high resolution structures are indicated as blue spheres and were found along the identified side channel in the PLE isoenzymes. Chapter one: Results and Discussion 24

The residues forming the main channel entrance from the smaller helix are the same for all isoenzymes. The larger helix is constructed by residues 74-84, this includes five positions, which differ between PLE1-6. A different substrate specificity or co-solvent tolerance can originate of those differences.[128-132] Thus, channels for the substrate entrance were calculated for all modeled isoenzymes. The main channel is not or only little influenced by the exchanges within the isoenzymes. Remarkable differences occur regarding the second tunnel found beneath both entrance helices (Figure 15). The tunnel configurations are shown in Figure 15. The main channel towards the two entrance helices (residues 74-84 and 341-347, as described before[52]) was found in nearly all structures, except PLE6. The reasons are unknown; maybe the refinement procedure closed the main gate of the structure due to the force field power. This hypothesis is supported by the very narrow bottleneck of the other main channels, visible in Figure 15. Of course the helices must be mounted flexible for the catalytic process. In PLE6 residues M80, L84, L341, T344 and M345 completely block the entrance. The development and behavior of the side channel, which is found in all structures showed interesting properties. It is covered by loop 88-96 (side entrance loop, Lse) and its interactions to the large entrance helix. This and additionally, loops 66-

73 (L1) and 130-137 (L2) define the second entrance. L2 is packed under the first protein layer composed of the large entrance helix, Lse and L1. There is partial potential to fold a small helix (132-136), which is unwinded in PLE4 and PLE5 (Figure 17).

Figure 17: Overview of the entrance regions of the side channel in PLE1-6. The conserved small entrance helix is shown in red, the larger and more variable helix in cyan. The loop covering the side channel entrance (Lse) is shown in dark blue. The basis for folding and behavior of Lse are interactions with two more loops: L1 shown in orange and L2 in olive. The catalytic S205 is shown in sticks. Please notice the partial helix folding in L2. Chapter one: Results and Discussion 25

The establishment of a second entrance channel and possible interaction of entering solvents or co-solvents can play a crucial effect on the catalytic activity. Furthermore the space opened by the side channel can be used for binding or accepting alternative substrates. A clear evidence of the behavior of enantiomers due to the shape or course of this tunnel cannot be concluded. From a structural point of view it is an interesting feature, which can also affect co-solvent dependent catalysis, but of course undergoes reshaping changes during oligomerization to the trimer. Regarding the results concerning the enantioselectivity of the PLE isoenzymes in experiments carried out by Maureen Smith[7-8], the differences in the conversion of 7 and 8 (Figure 18, Table 5 on page 27) must be an outcome of the substitution of amino acids P286, L287 and T288 (Table 3). Other combinations do not allow the complete and drastic inversion of enantioselectivity. In fact, the substitution of a proline may strongly affect the structure and additionally the replacement of leucine by phenylalanine is remarkable. The change of T288A may be a consequence of the fixation of threonine by two hydrogen bonds to K285 and F291 by the backbone atoms (Figure 18).

Table 3: Alignment of PLE1-6 depending on differing amino acid residues. Red marked positions contribute to the catalytic pocket. PLE3, PLE4 and PLE5, PLE6 are marked green and blue, respectively, because they show inverse enantioselectivity towards substituted malonate esters (substrates 7 and 8, see Figure 19). Dark green and dark blue marked residues are the only differences in their amino acid sequence, occurring in both pairs. The substitutions were clustered into six groups (from left to right): Residues on the entrance helices, correlated residues influencing the entrance helices, not correlated to the entrance helices, correlated to the entrance and the active site, distant correlated residues and long interaction networking.

PLE 74 76 77 78 81 88 93 94 130 134 135 139 1 D V V E T G T L L P M V 2 D V V E T G T L L P M V 3 E I G G L R I P V S T L 4 D V A G T R I P V S T L 5 D V A G T R I P V S T L 6 E I G G L R I P V S T L

140 235 237 238 286 287 288 291 295 303 460 462 464 V L V A F L T F Q P F L K V L V A F L T F Q P A F R A F A G P L T L P T F L K A F A G P L T L P T A F R A F A G F F A L P T A F R A F A G F F A L P T A F R

The first binding pair creates a loop exposing F286 into the catalytic pocket. Additionally, hydrophobic contacts to V129 and L230 were detected. These residues have hydrogen bond interactions to the residues of the GGGX-motif. Thus, the substitution L287F will push V129 and L230 upwards by changing the shape of the catalytic pocket. Otherwise Chapter one: Results and Discussion 26

F287 will then point additionally into the pocket, clashing with F286. For those residues, π-π-interactions may sandwich aromatic parts of potential substrates.

Figure 18: Binding pocket of PLE1. On the right side the catalytic triad with S205, H450 and E337 is shown. Also highlighted are first shell residues. The fixation of T288 and contacts of V129 and L130 to the oxyanion hole are pictured with yellow lines

The model structures were superposed and aligned by structural fitting of PyMol to determine the structural variation in-between the isoenzymes, expressed as root mean square deviation (rmsd, Table 4). One of the highest deviation values was calculated for PLE4 and PLE6 and the lowest between PLE1 and PLE2. This is related to the low number of changed residues (Table 3). Surprisingly, the highest rmsd was found between PLE5 and PLE6.

Table 4: Folding energies and rmsd of the PLE isoenzymes corresponding to the homology models.

Refined folding rmsd of all residues to isoenzyme Isoenzyme energy [kcal mol-1] PLE1 PLE2 PLE3 PLE4 PLE5 PLE1 -295668.34 - - - - - PLE2 -293788.95 0.425 - - - - PLE3 -294313.73 0.577 0.585 - - - PLE4 -294845.75 0.654 0.574 0.602 - - PLE5 -294283.80 0.521 0.533 0.678 0.673 - PLE6 -294460.29 0.674 0.707 0.680 0.760 0.763 Chapter one: Results and Discussion 27

Docking preparation and former results The activity of PLE towards many substrates was shown and in case of α,α-disubstituted malonate diesters, substrate promiscuity was observed. This fact is expressed by a different enantiomeric outcome of the hydrolyzed esters, when different isoenzymes were applied in catalysis. Furthermore, co-solvents influence the catalysis, also depending on the choice of isoenzyme.[7-8] Special substituted malonate esters (Figure 19) were synthesized in the group of Douglas Masterson by Maureen Smith (Department of Chemistry and Biochemistry, The University of Southern Mississippi) and product formation with PLE crude extract and purified PLE isoenzymes was analyzed.[133]

O O O O O O O O

O O O O O O O O O S O N O S N S

1 2 3 4

O O O O O O O O

O O O O O O O O

N N N 5 6 7 8 Figure 19: α,α-Disubstituted malonate diethyl esters investigated in this thesis. The substituents differ in bulkiness, polarity and ability of hydrogen bonding. Different enantiopreferences for the PLE isoenzymes was shown for 1, 5, 7 and 8.

The effect of co-solvents, especially ethanol was investigated previously by Smith et al. Thus, the question raised, why and how the small co-solvent molecule influences the catalysis. The predefined, achiral substrates were docked in all homology models of the isoenzymes to answer those questions. All isoenzymes exhibit an enantiopreference towards the prochiral substrates, producing one or the other enantiomer more or less in excess (Table 5).

Chapter one: Results and Discussion 28

Table 5: Enantiomeric excess of the analyzed substrates as reported.[8] Only for 1, the dramatic change in enantioselectivity was observed, by addition of the co-solvent ethanol. However, the changes in-between the isoenzymes are a result of the different amino acid substitutions (Table 3) and were further investigated. The enantiomeric excess was either determined by chiral HPLC-MS or by a practical MS assay[134], using isotopically labeled substrates. isoenzyme 1 [%ee] 2 [%ee] 3 [%ee] 4 [%ee] 5 [%ee] 6 [%ee] 7 [%ee] 8 [%ee] PLE1 86.6 (R) -a 73.8 (R) 81.3 (R) 2.5 (S) -a -a -a PLE2 77.8 (R) -a 86.5 (R) 88.5 (R) 7.9 (S) 32.5b 5.7 (S) -a PLE3 29.2 (S) -a 77.3 (R) 30.4 (R) 16.7 (R) 21.7b 2.9 (R) 39.0 (R) PLE4 58.9 (S) -a 83.8 (R) 58.3 (R) 28.3 (R) 2.1b 36.4 (R) 41.1 (R) PLE5 60.4 (S) -a 87.1 (R) 41.3 (R) 36.5 (S) 37.2b 43.6 (S) 50.6 (S) PLE6 52.5 (S) -a 61.7 (R) 35.8 (R) 49.8 (S) 26.0b 39.8 (S) 55.6 (S) [a] no product was detected. [b] the enantiomer was all the same, but the absolute configuration is not known.

Furthermore the enantioselectivity could be modulated by the addition of co-solvents. In case of 1 the selectivity turned from 30-60%ee for the (S) product to 40-60%ee for the (R) product for PLE3-6, when 14% ethanol was added. Only 2% of ethanol in the mixture changed the preference for PLE3 and PLE6, whereas 4% were necessary for PLE4 and PLE5. The selectivity was also influenced when isopropanol was used, but only for PLE3 and PLE6 the opposite enantiomer was produced. However, even PLE4 and PLE5 lost their high values, whereas PLE1 and PLE2 were only slightly or not influenced.[7-8] In general, other solvents were also able to influence the selectivity. In reactions with PLE crude extract, the selectivity towards 1 was increased with isopropanol and DMF, but decreased with dioxane, DMS and THF. In case of tert-butanol, acetonitrile and a concentration of 4-10% THF, the final outcome was nearly racemic product.[7] The substrates were constructed and converted in silico for their application in YASARA.[124] After optimizing the structural coordination, they were docked into the different PLE isoenzymes.

Molecular docking of malonate esters The catalytic pocket of the PLEs seemed to be very large, because not meaningful results were obtained, when a force field of 10 Å around S205 was used for the 12-14 Å long substrates. Consequently, all dockings were prepared by a simulation cell around 5 Å around the γ-oxygen of nucleophilic S205. After the docking runs, the models were analyzed by YASARA and PyMol. The first program detected hydrophobic and π-π- interactions as well as hydrogen bonds. The second one was chosen to figure out residues, which were responsible for positioning the substrate and counterproof the polar contacts found in YASARA. Criteria for correct docked substrates were a calculated binding energy stronger than 1.5 kcal mol-1 and a minimum distance from the γ-oxygen of the catalytic S205 (up to 3.5 Å) to one carbonyl carbon (from one of the two ester functions). Furthermore, the position of the related carbonyl oxygen was checked for its Chapter one: Results and Discussion 29 orientation towards the oxyanion hole, built of the amide bonded nitrogen atoms of A206, G127 and/or G128 (of the related GGGX motif). Results were obtained with different orientations of the variable amino acid side chains by construction of protein rotamers. The substrates were set to be flexible, to achieve optimal orientation inside the active site.

Substrate 1 (diethyl 2-((1,3-dioxoisoindolin-2-yl)methyl)-2- O O methylmalonate) O O O N O Structurally, substrate 1 was one of the biggest, which were tested in silico until now and exhibits a high rigidity concerning the 1,3- dioxoisoindoline substituent (Figure 20). 1

PLE1 PLE2

PLE3 PLE4

Figure 20: Dockings of substrate 1 into PLE1-4. The ligand is represented in sticks; corresponding binding residues from the PLE isoenzymes are labeled and the side chain is shown with lines. Polar contacts are indicated as yellow dashed lines. The backbone of the oxyanion hole (made up from A206 and the GGGX motif) and the nucleophilic S205 are also shown, but not labeled. The Figure is continued on the next page. Chapter one: Results and Discussion 30

PLE5 PLE6

Figure 20, cont.: Dockings of substrate 1 into PLE5 and PLE6. The ligand is represented in sticks; corresponding binding residues from the PLE isoenzymes are labeled and the side chain is shown with lines. Polar contacts are indicated as yellow dashed lines. The backbone of the oxyanion hole (made up from A206 and the GGGX motif) and the nucleophilic S205 are also shown, but not labeled.

The binding energies of all enzyme-substrate complexes were calculated[135] (Table 6) and the highest was scored for PLE1. But in case of all other isoenzymes the energies were unexpectedly low for substrate 1, indicating a weak binding related to an increased dissociation constant.

Table 6: Binding energies of the PLE-ligand complexes for all substrates docked into PLE1-6 rotamers. Binding energy ΔG [kcal mol-1] for substrate Isoenzyme 1 2 3 4 5 6 7 8 PLE1 5.52 2.60 5.70 5.54 5.24 5.54 5.14 5.59 PLE2 2.47 -0.89 3.91 3.32 5.82 5.20 4.86 4.57 PLE3 2.00 -3.22 3.98 2.97 4.97 4.53 4.60 4.10 PLE4 2.00 -0.16 4.71 4.89 4.23 5.02 4.23 5.63 PLE5 1.63 -4.79 3.85 1.30 4.89 3.92 4.27 3.78 PLE6 1.45 -5.00 2.64 2.49 3.95 3.58 4.18 3.64

A hydrophobic pocket of the active site, built up by the side chains of L239, F301, F338, L341, L342, I370, V407 and F408 was used for binding the large substituent, indicating the large hydrophobic binding pocket HL (Figure 5). Only one contact for one PLE isoenzyme was found for the phthalimide oxygen atoms to the enzyme: In the PLE4 docking result, an interaction to the backbone nitrogen of A206 was detected, but no additional hydrogen acceptor, which can explain the reaction specificity. The hydrophobic properties of the substituent are necessary to coordinate binding to the Chapter one: Results and Discussion 31 shown phenylalanine and isoleucine residues. These residues seem to have a high flexibility and can adapt to the sterically hindered substrate. Additional binding space for small solvent molecules, like ethanol can be assumed, serving for a hypothetical shielding of the substituent’s oxygen atoms. The small ethanol molecule can interact with many residues of the catalytic pocket and may also have shielding effects to special residues by inverting the polarity, what lead to interesting effects concerning the enantioselectivity.[7] Beneath the active site, T81, Y136, E204, S231 and T236 are potential polar binding partners and may be shielded to create a hydrophobic layer or in contrast, the major surface consisting of hydrophobic side chains can be turned hydrophilic. Therefore, ethanol was docked into PLE3 and PLE5 by defining a simulation box in a distance of 20 Å around the nucleophilic γ-oxygen of S205 to cover the whole active site (Figure 21).

A B A A

Figure 21: Docked ethanol molecules in the active sites of (A) PLE3 and (B) PLE5. Each solvent molecule represents one docking result, which were finally superposed via PyMol.

Additionally, the docked complexes of substrate 1 in PLE3 and PLE5 were used with a simulation box 3.5 Å around the whole substrate (Figure 22). The binding energies vary from 1.9-2.7 kcal mol-1. PLE3 and PLE5 were chosen, because in the initial dockings, a pre-(R)-conformation for PLE3 and a pre-(S)-complex for PLE5 were found. Both conformations represent possible binding modes, but PLE3-6 favor the (S)-product when no solvent is added (Table 5). However, the selectivity switched to the (R)-product, when co-solvent is involved in the reaction.[7] Chapter one: Results and Discussion 32

A

B C A

Figure 22: Substrate 1 docked into (A) PLE3 and (B, C) PLE5 structure scaffolds. Subsequently, the results were modeled with ethanol. PLE5 was (B) visualized from the entrance site and (C) from the active site. Each ethanol molecule reflects an independent docking result. Side chains interacting with EtOH are labeled and shown as lines, substrate 1 is given in sticks and residues, which are shifted during binding of EtOH, are shown in purple.

In Figures 21 and 22 the ethanol molecules were bound to different pockets in the active site. For PLE3 it was found to be bound to N334 and G451, whereas the hydrophobic part points towards I454. Additional solvent molecules were interacting with G126, S205 and A205, which are the catalytic serine and residues of the oxyanion hole. For PLE5 polar contacts were found to S231, N334 and E453. Again, contacts existed to G126, E204, S205 and A206. Another molecule pointed with the hydroxy functionality into the active site and was bound by hydrophobic interaction to F338, L342 and I370. When the docked complex with substrate 1 served as template, in case of PLE3 one ethanol molecule was located beneath G125, G126, Y136 and E204. The coordinated E204 was shifted towards the substrate’s (non-hydrolyzed) ethyl ester functionality. Another molecule with no polar interaction was modeled close to L287 and L342, limiting the rotamer variation of the side chain of M346. A third ethanol molecule was located between the entrance helices, interacting with the backbone nitrogen atoms of L287 and M345. For PLE5 the co-solvent molecules were also localized close to F85, Y136 and E204 and another one between E453 and I454. Additionally, there was also enough Chapter one: Results and Discussion 33 space to bind a molecule behind the bulky substrate in the hydrophobic pocket, establishing interactions to the polar residue side chains of S205, E209, S231 and T236. Finally, a molecule bound between the substrate and M345, pushing F287 away, underlining the increasing importance of this residue.

Substrate 2 (diethyl 2-((1,3-dithiooxoisoindolin-2-yl)methyl)-2-methylmalonate)

Substrate 2 is the thio-analogue to substrate 1. The docking results O O were as expected from the experimental findings: only for the O O complex of PLE1 a positive binding energy was calculated. For all S N S other PLEs the complex was not stable, indicated by negative values (Table 6). To get an idea of possible binding configurations, complexes are shown in Figure 23. The energies predict an 2 unstable enzyme-substrate complex, which was unlikely formed with the substrate. Similar results were obtained when the substituent was set rigid. The bond length of a carbon-sulfur double bond (1.7 Å) is longer than a carbon-oxygen double bond (1.2 Å). Additionally, the mesomeric and inductive effects decrease with the increasing atom radius and the lower electronegativity of the sulfur substituents. The consequences of a higher electron density in the phthalimide system, when oxygen atoms are bound, have an influence in binding and the coordinating hydrophobic interactions.

PLE1 PLE2

Figure 23: Docking results for PLE1 and PLE2 with substrate 2. The rigid and flat thio-derivative occupies more space than its oxygen analogue due to the higher radii of the sulfur atoms. The figure is continued on the next page. Chapter one: Results and Discussion 34

PLE3 PLE4

PLE5 PLE6

Figure 23, cont.: Docking results for PLE4-6 with substrate 2. The rigid and flat thio-derivative occupies more space than its oxygen analogue due to the higher radii of the sulfur atoms.

Substrate 3 (diethyl 2-(benzyloxymethyl)-2-methylmalonate) O O

O O The binding mode of substrate 3 was very different concerning the benzyloxymethyl substituent throughout the structures. O Hydrophobic contacts to the residues from the hydrophobic pocket were detected in YASARA. Especially the side chains of residues F301 and I370 were involved in substrate binding (Figure 24). In a 3 previous study, with crude PLE a product yield of 90% in 20.5 h with 67%ee, producing mainly the (R)-product was described.[112] It was found, that with an increasing concentration of ethanol, the enantiomeric excess for PLE2 and PLE5 of the (R)-product decreases by 12-15%ee and increases for the other isoenzymes by 5-10% ee. In case of PLE6, the optical purity rises from 61.7%ee to 81%ee. When isopropanol was added to the reaction mixture the differences were smaller, but for PLE2, PLE4 and PLE5 the enantiomeric excess decreased by 4-14%ee and increased for PLE1 and PLE3 by 3-5%ee. For PLE6 an increase of 15%ee was detected.[7]

Chapter one: Results and Discussion 35

PLE1 PLE2

PLE3 PLE4

PLE5 PLE6

Figure 24: Docking results for the PLE isoenzymes with substrate 3. The long chained substituent fits into the catalytic pocket by twisting the flexible ether bond and the related methylene groups.

Interestingly, for all isoenzymes, except PLE4, a pre-(S)-state was found before the nucleophilic attack of S205, in contrast to the experimentally detected enantiopreference (Table 5). The chain along the ether function is very flexible and exhibits a high variance. The residues, which build up the pocket, have also a high Chapter one: Results and Discussion 36

plasticity as mentioned before. This underlines the high substrate tolerance of the enzymes. However, they are able to bind the substituents tight enough to control catalytic activity. In case of the PLE4 complex, the oxygen atom - generating the oxyanion in the next catalytic step - is coordinated by three backbone nitrogen atoms of the oxyanion hole. In no other case the substrate was bound as accurately defined. In the experimental results, the enantiopreference favors the (R)-product, but not completely, because the reaction is not entirely selective. Especially for PLE2, PLE5 and PLE6 the double bond oxygen is not properly oriented. It can be assumed, that the carbonyl function of the other ester group is attacked, when the substrate slightly turns. This orientation suits PLE1 better for catalysis and furthermore the selectivity would be switched.

Substrate 4 (diethyl 2-(benzylthiomethyl)-2-methylmalonate) O O O O Again, a thio-derivative was synthesized, this time of substrate 3. S It was found to be not as well tolerated, indicated by the decreased binding energies. The strongest differences occurred in the docking experiments with PLE3 and PLE5. Experimentally, 4 the enantiopreference of PLE1 and PLE2 increases, but the enantiomeric excess for PLE3-6 decreases for the (R)-product, compared to the reaction with 3. In contrast to the oxy-analogue, the substrate was found in a pre-(R)-state for PLE4 and PLE6, but in the pre-(S)-configuration for all other isoenzymes. Observed by visual inspection, the double bond oxygen is also not well orientated. The fluctuation of all structures is less compared to 3 and the benzene ring was also oriented towards the hydrophobic pocket (Figure 25).

PLE1 PLE2

Figure 25: Docking results for PLE1 and PLE2 with substrate 4. The long chained thio-derivative has longer C-S bonds compared to the C-O bonds (1.4-1.5 Å). Due to the larger atom radius the rotatability seems to be limited. The figure is continued on the next page. Chapter one: Results and Discussion 37

PLE3 PLE4

PLE5 PLE6

Figure 25, cont.: Docking results for PLE4-6 with substrate 4. The long chained thio-derivative has longer C-S bonds compared to the C-O bonds (1.4-1.5 Å). Due to the larger atom radius the rotatability seems to be limited.

Substrate 5 (diethyl 2-benzyl-2-methylmalonate) O O O O Substrate 5 was an analogue to substrates 6-8 and the effect of co-solvent was not as strong as found for substrate 1.[7] For PLE1, PLE5 and PLE6, first an increase of the enantiomeric excess was 5 measured by 6-13%ee for the (S) product for 2-4% of ethanol. A further increase to up to 14% ethanol decreased the excess to the starting (PLE1 and PLE5) or a lower value (for PLE 6: 49.78 to 37.56%ee). In case of (S)-preferring PLE2 and (R)-preferring PLE3 and PLE4, the enantiomeric excess increased completely by 3-9%ee. The substituent is smaller compared to the previous substrates (Figure 26). Therefore, more space in the active site can be occupied by solvent molecules. In a previous study, 16%ee were achieved for the dimethyl ester and 23%ee with the diethyl ester with crude PLE.[107] In contrast to the docking of the heterocyclic substrates 6-8 (see below) for Chapter one: Results and Discussion 38 substrate 5 a pre-(S)-coordination for PLE2 and a pre-(R)-state for the other isoenzymes was found (Figure 26).

PLE1 PLE2

PLE3 PLE4

PLE5 PLE6

Figure 26: Docking results for the PLE isoenzymes with substrate 5. The structural precursor for the examinations of substrates 6-8 showed different behavior for PLE5 and PLE6.

The only docking, resulting in a pre-(S)-conformation for PLE5, is shown in Figure 27. It has to be mentioned that those conformations (flipped substituent chain) was also found with lower binding energies for some of the other isoenzymes, but not for all. In most cases, none of the carbonyl oxygens of an ester group were well coordinated for Chapter one: Results and Discussion 39 the entrance into the oxyanion hole. The related space was then occupied by the substituent itself or the 2-methyl group.

Figure 27: PLE5 with bound substrate 5 in pre-(S)-conformation. The substituent is no longer bound to the hydrophobic pocket, explaining the relatively low binding energy of 2.2 kcal mol-1.

Substrate 6 (diethyl-2-methyl-2-((pyridine-2-yl-)methyl)malonate) O O O O The docking of 6 into the model results in a pre-(R)-selective conformation for PLE1-4 and surprisingly a pre-(S)-complex for PLE5 N and PLE6 (Figure 28). 6

PLE1 PLE2

Figure 28: Docking results for PLE1 and PLE2 with substrate 6. For every docking the related residues necessary for binding are shown as lines. The ligand is shown as sticks. The figure is continued on the next page.

Chapter one: Results and Discussion 40

PLE3 PLE4

PLE5 PLE6

Figure 28, cont.: Docking results for PLE4-6 with substrate 6. For every docking the related residues necessary for binding are shown as lines. The ligand is shown as sticks.

The ortho-positioned nitrogen atom of the pyridine part can network with different functionalities. The interaction with the nucleophilic S205 is possible, but it is unknown how this orientation will interfere with the chemical transition. There are possible hydrophobic contacts by residues, which can position the heterocycle. The differences of PLE1 and PLE2 are F460A, L462F and K464R, which lay far away from the binding pocket. Somehow they must influence the binding mode by long-range interactions or by trimerisation of the molecules and further structural changes, which cannot be explained by docking observations, because PLE2 converted the substrate, but PLE1 did not (Table 5). Regarding the binding energies of the converting enzymes, the dock run of PLE2 results in the highest value of 5.20 kcal mol-1 and for PLE6 in the lowest with 3.58 kcal mol-1. Comparing the %ee of the related experiment, this is in good agreement.[8] The switch in the preference for one enantiomer results from flipping the methyl group against the chiral carbon group with the other ester function. There was no clear binding partner found, which binds one or the other group more specifically. The highest value of enantiomeric excess was measured for PLE5 with 37.2%ee, meaning 62.8% of the mixture was racemic. The total content of the produced product is 68.6% of one and 31.4% of the other enantiomer. Consequently the chance to model and find the Chapter one: Results and Discussion 41

“correct” structure is 2:1, but finally only the complexes with the lowest binding energy will be found. Thus, the modeled complexes represent the most stable interactions, not the most likely pre-attack conformations.

Substrate 7 (diethyl-2-methyl-2-((pyridine-3-yl-)methyl)malonate) O O O O In contrast to substrate 6 the nitrogen atom is in meta-position for substrate 7, thus the variance for possible interactions increases. N Experimentally, a change in enantiopreference throughout the 7 isoenzymes was observed. PLE3 and PLE4 convert predominantly the (R)-enantiomer, whereas the preference of PLE2, PLE5 and PLE6 is opposite (Table 5 and Figure 29). With PLE1 no conversion was detected.

PLE1 PLE2

PLE3 PLE4

Figure 29: Isoenzymes PLE1-4 docked with substrate 7. For every docking the related residues necessary for binding are shown as lines. The ligand is shown as sticks. The Figure is continued on the next page. Chapter one: Results and Discussion 42

PLE5 PLE6

Figure 29, cont.: PLE5 and PLE6 docked with substrate 7. For every docking the related residues necessary for binding are shown as lines. The ligand is shown as sticks.

The dockings of PLE2-6 were in perfect agreement with the experimental results: PLE2, PLE5 and PLE6 are binding the pre-(S)-substrate and PLE3 and PLE4 the pre-(R)- enantiomer. Of the isoenzymes converting the substrate, the docking energy of PLE2 was surprisingly the highest (4.86 kcal mol-1), while this enzyme gave the lowest enantiomeric excess (5.7%ee). In contrast, PLE6 has the lowest - but globally still high - binding energy of 4.18 kcal mol-1 and resulted in one of the highest enantiomeric outcome (39.8%ee). There were no additional polar contacts found for PLE2-6 than necessary for the coordination of one of the two ester functionalities. Meanwhile, for PLE1 two binding modes were found. The (R)-positioned substrate (Figure 30) interacts hydrophobically with I370 and L342. The distance to these residues weaken the binding energy. There is too much space inside the catalytic pocket for the substrate and no closer interactions are possible for a production of the (R)-enantiomer. Otherwise, the binding of the prochiral substrate in the (S)-state is more convenient, because of a possible hydrogen bond to T236 (Figure 30). The pro-(S)-configuration yielded a slightly higher binding energy (5.8 kcal mol-1) as the pro-(R)-configuration (5.14 kcal mol-1), while docking of the substrate 7 in PLE1 (Figures 29 and 30, respectively). Additionally, the position of S205 is more reasonable to induce catalysis in the latter conformation. The hydrogen donor is T236, which is flanked by L235 and V237. In the modeling results of the other substrates, the side chain hydroxy function of T236 is bond to the backbone nitrogen of V233. However, in PLE3-6 the adjacent amino acids are substituted by L235F and V237A. Thus, it is imaginable that the first exchange pushes the carbon trace stronger in the direction of the catalytic pocket, because L235 is pointing away from the cavity and exposes T236 for better binding abilities.

Chapter one: Results and Discussion 43

Figure 30: Substrate 7 bound to PLE1. The prochiral substrate is in a pre-(S)-conformation. Two polar contacts were found from the heterocyclic nitrogen atom: The first to the nucleophilic S205 and the second is generated by the side chain of T236.

Another substitution between PLE3/4 and PLE5/6 is F286P, which must generate structural changes in one wall of the catalytic cavity. The exposed F286 is substituted and the free space can be used by F287 for additional hydrophobic interactions, possibly generating the opposite enantiomer.

Substrate 8 (diethyl-2-methyl-2-((pyridine-4-yl-)methyl)malonate) O O O O Substrate 8 was not converted by PLE1 and PLE2, but predominantly to the (R)-enantiomer by PLE3/4 and to the (S)-enantiomer by PLE5/6. The differences between those subgroups of the N 8 isoenzymes result in the same behavior like it was shown for substrate 7, but with a higher selectivity value for PLE3. The results for docking the prochiral (R)- and (S)-substrate are summarized in Figure 31. According to the experimental results, a complex with the pre-(R)-enantiomer was found for PLE3 and PLE4, whereas for PLE5 and PLE6 a preliminary (S)-configuration was detected. Regarding the binding energies, PLE4 has the highest with 5.63 kcal mol-1) and PLE6 the lowest value (3.64 kcal mol-1) again. No additional polar contacts were found in the docking experiments, even though the exposed nitrogen function of the heterocycle is a potential hydrogen bond acceptor.

Chapter one: Results and Discussion 44

PLE1 PLE2

PLE3 PLE4

PLE5 PLE6

Figure 31: PLE isoenzymes docked with substrate 8. For every docking the related residues necessary for binding are shown as lines. The ligand is shown as sticks.

In all docking results, residue Y136 is pointing in the direction of the catalytic cave and is also a potential binding partner, in case there would be enough space to switch the pyridine functionality into the other direction, facing a smaller cavity. In most cases, the nonpolar pocket made from F301, L342, F338, I370, V407 and F408 is binding and Chapter one: Results and Discussion 45

orientating the substrate (indicating the large hydrophobic pocket HL). Nevertheless, there must be other interactions to the methyl substituent or the ester group, which define the enantiopreference. The binding energies for the pro-(S)-complex are generally lower than for the pre-(R)-analogue. This is either a reason of the PLE3/4 and PLE5/6 structures or of a weaker binding of the enantiomers, because different conformations must be reached to convert the opposite enantiomer. In case of PLE4 the binding of the opposite enantiomer results in a lower energy (5.06 kcal mol-1, Figure 32).

Figure 32: Superposition of two PLE4 structures with docked substrate 8 in pre-(R)- and pre-(S)- conformation represented in green or magenta, respectively. Polar contacts of the first prochiral binding mode are shown as yellow dashed lines towards the oxyanion hole. Residue side chains with different conformational positions are labeled and shown as lines. Residues without any movement are hidden (see text). The ligand (substrate 8) is visualized in sticks.

The residue side chains of T81 and Y136 and on the opposite part of the active site S231, T236, L239, F338, L342 and V407 are coordinating the substrate changing their position less than 1 Å. For the residues F85, E204, L287, F301, E337, M345, I370 and F408 longer movement distances were measured. The binding of the substrate 8 in the pre-(S)- conformation bends the pyridine residue towards E337 and F408. The α-methyl group of the diester reaches the position of the not attacked ester function to yield the (S)- product. The stronger binding of the pre-(R)-configured substrate resulted in a movement of 2.9 Å and 7.5 Å of E337 and F408, respectively. This leads to a total flip around 85.3° of the side chain of E337 and 115.7° of the benzyl of F115. The orientation of E337 in this conformation is closer to H450, optimizing the polarizing effects during catalysis, favoring the (R)-product. Between PLE3/4 and PLE5/6, only P286F, L287F and T288A are substituted, creating the remarkable differences between both isoenzyme groups. The exchange of P286F has - although creating a very sharp turn - no remarkable structural effects, because neither the backbone trace nor contacting residues beneath the rigid benzyl side chain were noteworthy shifted. However, in case of L287F the Chapter one: Results and Discussion 46 shape of the active site changed and blocks the possible (R)-configuration by pushing the heterocyclic substituent in another direction. Consequently, less space for the ester function is available and the (S)-product is predominantly formed.

Molecular dynamics simulation of substrate 5 After successful docking, substrate 5 was bound covalently to S205 of PLE4. The structure was refined and modeled for 500 ps in a dynamic simulation (Figure 33).

Figure 33: Covalently bound substrate 5 to S205 of PLE4. After refinement and molecular dynamics simulation, the oxyanion was optimally oriented towards the oxyanion hole. Please note the position of L287, which is substituted to phenylalanine in PLE5 and PLE6 and may cause a different enantiopreference by taking more space than the smaller leucine side chain.

After manual examination of the snapshots taken during the simulation, no positional or structural switch of any protein side chain or the substrate’s functionalities was noticed. The stable conformation enables a position of H450 with a distance of 4.5 Å to the carbonyl function, ensuring the possible binding and orientation of a water molecule after ethanol release. In another simulation, ethanol molecules were added to a simulation cell containing the PLE structure and water to a total content of 98% water and 2% ethanol molecules. After 50 ns of molecular dynamics simulation, ethanol molecules entering and leaving the active site were detected (Figure 34). During the simulation, no durational polar contact - indicating a special solvent pocket - was found. Before the simulation run, E337 was bond to H450 and E453 pointed away from the active site. Then, the contact switched to E337-S231 and E453 was in interaction distance to H450. S231 is near to E337 in the starting structure, but too far Chapter one: Results and Discussion 47 away for hydrogen bonding. This combination of E453 replacing E337 was found only in a minority of the docking results.

A B

Figure 34: Snapshot of a molecular dynamics simulation of PLE1 in a simulation cell containing 2% ethanol, viewed from above the entrance site (A) and towards the entrance helices (B). Throughout the simulations, ethanol molecules enter and leave the active site, but no special polar contact site was identified for the molecules. The initial structure is shown in cyan, the simulated snapshot in red, residues of the active site are shown as sticks. Ethanol was also found inside the side channel, identified by CAVER. In the snapshot the interaction with E78 and S82 is depicted. The next binding partner, on the way to the solvent exposed surface, is R91. Please note the switching contacts (yellow lines) during the simulation and the flexible loop F85-F96, containing R91.

During the described experiments in this thesis, the influence of different interaction networks was indicated. Thus, their detailed analysis may give more information about their connectivity and the contribution to the organization of the active site, finally determining the respective selectivities.

Flexible residues and long-range interaction networks Visual examination of the docking runs during the modeling study, pointed out the flexible residue N87. Depending on the position, different interaction networks are built. For PLE1 and PLE2 this network is shown in Figures 35 and 36. The orientation of N87 is a consequence of the different placement of K464 in PLE1 and R464 in PLE2. The interaction is also influenced if residue 460 is phenylalanine, because T86 is then coordinated by G88 via the side chain. If residue 460 is Ala, R464 coordinates the peptide bond oxygen of T86 and this relation pushes the backbone trace upwards. F85 is subsequently brought into another position for possible interactions with substrates. Additionally, the new position of N87 influences M135 and Y136. The latter residue is pushed in the direction of the catalytic cave. The lost fixation of D137 by N87 enables interactions from D137 to M135, also compensating the free polar Chapter one: Results and Discussion 48 binding partner. However, the three substitutions have an effect on the over a distance of about 20 Å to the catalytic S205.

Figure 35: Long-range interaction involving N87 of PLE1. Amino acids substituted in PLE2 are F460A, L462F and K464R. These differences have an effect concerning the conversion of 6 and 7. Polar contacts are shown as yellow dashed lines. The effect reaches M135, involved in the small entrance helix and the shape of the tunnel mouth.

Figure 36: Long range interaction network from R464 to M135 and F85 mediated by N87 in PLE2. Please note the different position of this residue compared to Figure 35.

To further analyze the network’s influence, the protein structure was loaded to Chimera1.8 and transformed into a residue interaction network (RIN) by Cytoscape3.0.1 plugin RINalyzer. Two residue interaction networks were obtained for PLE1 and PLE2, respectively. The superposition of both networks is shown in Figure 37. It has to be kept Chapter one: Results and Discussion 49 in mind that the transformation of a protein structure to a RIN means that the coordinates of the atoms in a pdb file will be calculated for their distances, depending on their characteristic (N, O, C atom). The algorithm sets an internal threshold for defining an interaction by different fixed parameters. Furthermore, the algorithm does not find hydrophobic interactions, which can be only calculated online in a more complex process. Additionally there are always differences in the RIN by using two refined models from the same template, because of refining variations and folding differences, which can shift residues to other contacts.

Figure 37: Residue interaction networks (RINs) of PLE1 and PLE2, superposed by Cytoscape. Each line represents an interaction to another residue. Black solid lines are conserved in both structures, red dotted lines are interactions only present in PLE1, and green dashed lines are interactions exclusively found in PLE2. Depicted is the structural environment of N87 with T86 highlighted yellow.

The findings of the analysis in PyMol could be confirmed, contacts from N87 to D137 and F460 in PLE1 are present, which are shifted to T86 and consequently F85 in PLE2 by the exchange of K464R and F460A. Also the new interaction of D137 to M135 was shown. New contacts - which were not recognized before - are established to I454 and F458 by breaking another interaction to L463. I454 is performing only a small movement, but this lead to effects to the neighbored residues E204 and E453, which side chains direct - after the change - towards the catalytic H450 (Figure 38). Chapter one: Results and Discussion 50

Figure 38: Consequences of the long-range interactions shown in Figures 36-38. The strongest movement is performed by E453 towards H450. PLE1 is shown in green and PLE2 in cyan. Substituted residues are double labeled, whereas the upper label is from PLE1 and the bottom label from PLE2.

A contribution of I454 to the tunnel mouth of the smaller entrance channel can be supposed. Finally, it should be concluded that the changes of F460A and K464R, which are in a distance of about 20 Å away from the active site have an undeniable influence to the scaffold, structure and shape of this pocket. Analogously, differences between the closely related PLE4 and PLE5 were analyzed (Figure 39), which shows contrary enantioselectivity of the used pyridine esters derivatives 7 and 8.

Figure 39: Superposition of the RINs for PLE4 to PLE5. Black lines are interactions found in both structures, red dotted ones are only found in PLE4 and green dashed only in PLE5. Shown is the structural environment of N87. Chapter one: Results and Discussion 51

Remarkably, several similarities exist compared to the RIN of PLE2 (Figure 37), because PLE4 and PLE5 share the same amino acids residues in positions 460-464. The interactions of N87 to T86, R88, T135, F458, L463 and R464 are shared in both structures, but there are also contacts to D137 in PLE4 and to A460 and Y136 in PLE5. Only the latter change directly influences the catalytic pocket. The substitutions, separating PLE4 and PLE5 (P286F, L287F and T288A) are found in a closely connected RIN (Figure 40).

Figure 40: Superposition of the RINs of PLE4 and PLE5. Residues beneath 286-288 are shown for visualization of contact changes after substitution. Black lines are interactions found in both structures, red dotted lines are only found in PLE4 and green dashed lines only in PLE5.

The RIN superposition allows information about the connection of the substituted residues: Changing P286F in PLE5 leads to an interaction with M284 and L302. Additionally, L287F establishes contacts to G126, V130, P300 and M345. F287 covers the surface of the catalytic pocket. The contacts reach the GGGX motif via G126 and V130. In the very well defined protein scaffold it is obvious that the shape of the pocket is subsequently changed, finally inverting the enantioselectivity towards the mentioned PLE3 substrates. PLE4 PLE5 Concurring acidic residues of the catalytic triad PLE6 During the docking analysis of the PLE isoenzymes, it was figured out that there are additional polar contacts of H450 to acidic groups, which may assist during the catalysis. It was discussed during the dissertation of Daniel Hasenpusch that another acidic residue contacting the catalytic H450 was found.[76] Instead of E337, E452 was predicted to have catalytic function. Both side chains share a high movability during the docking experiments and additionally, an interaction of E204 has been found (Figure 41), Chapter one: Results and Discussion 52 whereas E452 was in an adequate distance, too. A consecutive mutation in further experiments should verify the residues’ function during catalysis. Figure 41 shows the enzyme without any substrate, whereas E204 mostly flips besides, when a substrate enters the active site.

PLE2 PLE6

Figure 41: The putative contacts of E204, E337 and E452 in PLE2 and PLE6. The distances are measured in Ångstrom. Additionally S205 and the residues of the oxyanion hole are labeled.

The side chain of S205 was reported by Hasenpusch et al. to bind either H450 or the backbone oxygen of E230, due to its hydroxy properties. In the article, one of the assumed catalytic acids, E337 was only capable to interact with its own backbone nitrogen atom or the side chain K521, forming stable hydrogen bonds, reorienting the side chain away from the active site. In contrast, E453 was also found being very flexible, rotating towards the side chains of H450 or S230 or the backbone nitrogen atoms of G450, D451 or E452.[52, 76] In this thesis, in some docking results, rotamers of E337 were found, pointing away from catalytic H450. However, both positions were reproducible in the molecular dynamics simulations performed within YASARA. It must be concluded that the side chains of all three amino acids, surrounding H450 have relatively flexible orientations, which can be hypothesized to play a role in the acceptance of different substrates. During the induced fit, a reorientation can be proposed, pointing one of the possible charge relay residues towards H450 for the essential polarization. This may be supported by the wide substrate tolerance of the PLE isoenzymes and their special stereospecific properties. Finally, only activity tests with different substrates, while mutating the residues to other non-functional amino acids, unable to perform the charge relay (such as alanine), will figure out the responsibilities of each glutamic acid.

Chapter one 53

Summary

The homology modeling of PLE was successful, the model was obtained with good scores and also the isoenzymes could be sufficiently modeled. During structural analysis, long- range interaction networks of flexible side chain contacts were identified, being able to influence the investigated enantiopreferences and stereoselectivities. Additionally, a side channel was found, able to transport small solvent molecules to the active site. The molecular docking results have a good agreement to the findings of the laboratory examinations, performed previously.[7-8, 133] The described conformations of the prochiral complexes exhibit the coordination of the differing substituents in a large hydrophobic pocket made from flexible moving residues T236, A/V237, L239, F/L287, F338, L342, I370, V407 and F408 (T236 showed hydrophobic properties due to the exposed methyl group, whereas the hydroxy functionality is mostly fixed via the backbone of V233). The residue composition F/L287 is mostly suggested to play a crucial role in enantioselectivity, because of the stronger influence towards the pocket and the distance to the positioned 2-methyl/ester group (which are switching when the selectivity is changed). The mostly hindered substrate resulted in the lowest binding energies from all catalyzed substrates. The low residual space in the hydrophobic pocket is a good target for co-solvent molecules, like ethanol. The effect of changes in enantioselectivity for PLE3-6 during ethanol addition is an effect of additional binding features, only present in selected subgroups of the isoenzymes. The effect is greater for larger than for smaller substituents due to less binding opportunities in the active site and more coordination possibilities for the flexible residues. The substitution of oxygen by sulfur had dramatic effects on the catalysis and is only explainable by the special atom characteristics. For the assumed changes in hydrogen bond ability (as result of the atomic properties) no particular residue or polar contact was found. This also interferes with the behavior of substrates 1 and 2: The possible binding sites for the co-solvent ethanol molecules can interact with the oxygen atoms, but less or not with the thio- derivative. For smaller substituents the effect is lower, because the hypothetical shielding effect cannot bridge the respective distance to the binding partner. The flexible residues flip away, compensating the additional space needed for the inclusion of ethanol. This was shown in this thesis for several residues in PLE3 and PLE5, which were not capable to bind to oxygen atoms of the substrate. These are too close to the chiral carbon atom to have enough space for additional interactions with ethanol molecules. A discussable alternative is a partial unfolding (like it is usual for most proteins in the presence of co-solvent) of the structure with effects that influence the active trimeric state and in consequence, the binding modes. Concluding, it was proven to be difficult to predict the enantiopreference, when very small parts of the substrate - like methyl or ethyl ester groups - are responsible for the selectivity. Although, it was reported before that the effect towards the enantioselectivity was usually more pronounced for methyl esterified carboxylic acids compared to ethyl esterified acids. Chapter one 54

Materials and Methods

Substrate preparation The substrates were constructed in ChemDraw (7.0.3, PerkinElmer, USA) and exported as molfile in YASARA (13.9.8).[124, 127] An energy minimization experiment was performed and bonds and atom characteristics were corrected for each substrate manually. Finally, another energy minimization was computed with a YAMBER force field. Then, the substrates were used for docking experiments. Smaller molecules, like the co-solvent ethanol, were created directly in YASARA.

Homology modeling A homology model structure of PLE1 was obtained with YASARA by hybridizing 1MX1, 2H7C, 4AB1, 1K4Y and 3K9B, which are crystals structures of LCEs with a minimum of 77% sequence identity. Afterwards the isoenzymes PLE2-6 were modeled by using the same software with the PLE1 model as template and the corresponding fasta-sequences of PLE2-6. When a sufficient homology model was calculated (overall Z-score min. >-0.4), it was subsequently refined with the internal macro md_refine to simulate the behavior and the folding state at 25 °C and pH 7.4. The snapshot of the molecular dynamics simulation with the lowest calculated energy was used as the respective PLE isoenzyme model.

Molecular docking Depending on the approach, 5-10 Å around the γ-oxygen atom of S205 a simulation box was created, using the YAMBER force field. The scene was saved and further used as receptor, whereas the energy minimized pdb-file of the substrates (1-8) served as ligand. Then, the macro algorithm dock_runensemble created seven side chain rotamer ensembles with alternative high-scoring solutions of the side-chain rotamer network with a fixed backbone trace and either completely flexible side chains or a fixed and exposed S205 side chain, pointing towards the active site. The substrates/ligands itself were either set flexible or completely rigid. In 20-100 receptor-runs the substrate was docked with VINA into each receptor ensemble and the complex was energy minimized to calculate the energy minimum, reflecting the most stable complex conformation. Complexes with the minimum energy were superposed and analyzed, reflected in a final log-file. After visual examination, the docking complexes were again refined and analyzed in PyMol (1.6)[136] and YASARA for proper orientation of the binding pocket and the double bond oxygen of the ester function towards the oxyanion hole, as well as a sufficient distance from the related carbonyl atom to the nucleophilic side chain of S205. Chapter one: Materials and Methods 55

Molecular dynamics simulation For molecular dynamics simulation, the simulation cell around all atoms was filled with water molecules and Na+ and Cl- ions were added to reach a concentration of 0.9%. The modeling temperature and pH were 25 °C and 7.4, respectively. No additional changes were done, when the YASARA macro md_run was used for the enzyme-substrate complexes or the intermediate step models (covalently bound substrate). The run was stopped after 30-50 ns and the snapshots (taken each 25 ps) were analyzed. If co-solvent molecules were included, the simulation box (5 Å around all atoms) was filled with water molecules and an adequate portion of co-solvent molecules were added to reach the necessary concentration (2%).

Tunnel cluster analysis CAVER (3.0)[137] was used as PyMol plugin to analyze tunnels from the named starting points inside the active site and finally S205 to achieve reliable pathways of substrates and co-substrates as well as potential binding pockets for solvent molecules. The default parameters were not changed. The plugin creates spheres around the starting point and by adding additional spheres inside the first radius a way to the solvent exposed surface is simulated. The throughputs, lengths, bottlenecks and residues on the way outside the buried structure are calculated and clustered.

Network identification and residue contact detection As mentioned before, general interactions were analyzed within YASARA and polar contacts were identified in PyMol. Both, intra- and intermolecular interactions were visualized and measured. For the models, residue interaction networks (RINs) were calculated using the additional software RINalyzer for Cytoscape. During the network generation, the RINalyzer plugin (2.0)[138-140] was used for Cytoscape (3.0.1)[141] to communicate with Chimera (1.8)[142] for structural comparison, analysis and transformation of the structural information into RINalyzer. Residue interaction networks (RINs) for the modeled and refined PLE isoenzymes were calculated. Standard setups were used to show possible interactions inside the structures, regarding the findings in YASARA and PyMol. The generated RINs from the PLE1-6 pdb files were displayed in Cytoscape and superposed to define broken or established contacts throughout all residues involved in the RINs. Chapter one 56

Abbreviations aa amino acid APLE alternative pig liver esterase bp base pairs BLAST basic local alignment search tool E. coli Escherichia coli LCE liver carboxyl esterase hLCE human liver carboxyl esterase rLCE rabbit liver carboxyl esterase MSA multiple sequence alignment pdb protein data bank PICE porcine intestinal carboxyl esterase PLE pig liver esterase rmsd root mean square deviation RIN residue interaction network Chapter one 57

References

[1] H. D. Dakin, J. Physiol. 1903, 30, 253- [16] T. Takahashi, A. Ikai, K. Takahashi, J Biol. 263. Chem. 1989, 264, 11565-11571. [2] H. D. Dakin, J. Physiol. 1905, 32, 199- [17] T. Takahashi, M. Nishigai, A. Ikai, K. 206. Takahashi, FEBS Lett. 1991, 280, 297- 300. [3] E. Heymann, W. Junge, Eur. J. Biochem. 1979, 95, 509-518. [18] M. Matsushima, H. Inoue, M. Ichinose, S. Tsukada, K. Miki, K. Kurokawa, T. [4] W. Junge, E. Heymann, Eur. J. Biochem. Takahashi, K. Takahashi, FEBS Lett. 1979, 95, 519-525. 1991, 293, 37-41. [5] F. Bjorkling, J. Boutelje, M. Hjalmarsson, [19] U. T. Bornscheuer, R. J. Kazlauskas, in K. Hult, T. Norin, Chem. Commun. 1987, Hydrolases in Organic Synthesis, Wiley- 1041-1042. VCH, Weinheim, 2006, pp. 141-183. [6] J. Boutelje, M. Hjalmarsson, K. Hult, M. [20] E. Heymann, K. Peter, Biol. Chem. 1993, Lindbäck, T. Norin, Bioorg. Chem. 1988, 374, 1033-1036. 16, 364-375. [21] N. Öhrner, A. Mattson, T. Norin, K. Hult, [7] M. E. Smith, S. Banerjee, Y. Shi, M. Biocatalysis 1990, 4, 81-88. Schmidt, U. T. Bornscheuer, D. S. Masterson, ChemCatChem 2012, 4, [22] L. K. P. Lam, C. M. Brown, B. Dejeso, L. 472-475. Lym, E. J. Toone, J. B. Jones, J. Am. Chem. Soc. 1988, 110, 4409-4411. [8] M. E. Smith, M. P. C. Fibinger, U. T. Bornscheuer, D. S. Masterson, [23] A. Musidlowska-Persson, U. T. ChemCatChem 2015, 7, 3179-3185. Bornscheuer, J. Mol. Catal. B: Enzym. 2002, 19–20, 129-133. [9] A. Hummel, Diploma thesis, University of Greifswald (Greifswald), 2007. [24] A. Musidlowska-Persson, U. T. Bornscheuer, Protein Eng. 2003, 16, [10] A. Hummel, E. Brüsehaber, D. Böttcher, 1139-1145. H. Trauthwein, K. Doderer, U. T. Bornscheuer, Angew. Chem. Int. Ed. [25] A. Musidlowska-Persson, U. T. 2007, 46, 8492-8494. Bornscheuer, Tetrahedron: Asymmetry 2003, 14, 1341-1344. [11] M. Hermann, M. U. Kietzmann, M. Ivancic, C. Zenzmaier, R. G. M. Luiten, [26] D. Seebach, M. Eberle, Chimia 1986, 40, W. Skranc, M. Wubbolts, M. Winkler, R. 315-318. Birner-Gruenberger, H. Pichler, H. [27] L. Blanco, E. Guibejampel, G. Rousseau, Schwab, J. Biotechnol. 2008, 133, 301- Tetrahedron Lett. 1988, 29, 1915-1918. 310. [28] E. Fouque, G. Rousseau, Synthesis 1989, [12] K. Krisch, in The Enzymes, Vol. 5: 661-666. Hydrolisis, sulfate esters, carboxyl esters, glycosides (Ed.: P. D. Boyer), [29] C. H. Senanayake, T. J. Bill, R. D. Larsen, Academic Press, 1971, pp. 43-69. J. Leazer, P. J. Reider, Tetrahedron Lett. 1992, 33, 5901-5904. [13] M. Ohno, S. Kobayashi, T. Iimori, Y. F. Wang, T. Izawa, J. Am. Chem. Soc. 1981, [30] D. Basavaiah, P. D. Rao, Tetrahedron: 103, 2405-2406. Asymmetry 1994, 5, 223-234. [14] F. C. Huang, L. F. Lee, R. S. Mittal, P. R. [31] D. Basavaiah, P. D. Rao, Synth. Ravikumar, J. A. Chan, C. J. Sih, E. Caspi, Commun. 1994, 24, 925-929. C. R. Eck, J. Am. Chem. Soc. 1975, 97, [32] D. Basavaiah, P. D. Rao, Tetrahedron: 4144-4145. Asymmetry 1995, 6, 789-800. [15] D. Farb, W. P. Jencks, Arch. Biochem. [33] K. Faber, Biotransformations in Organic Biophys. 1980, 203, 214-226. th Chemistry, 6 ed., Springer, Heidelberg, 2011. Chapter one: References 58

[34] K. Adachi, S. Kobayashi, M. Ohno, [49] S. de la Escalera, E. O. Bockamp, F. Chimia 1986, 40, 311-314. Moya, M. Piovant, F. Jimenez, EMBO J. 1990, 9, 3593-3601. [35] B. J. Gaede, C. A. Nardelli, Org. Process Res. Dev. 2005, 9, 23-29. [50] P. S. Kim, S. A. Hossain, Y. N. Park, I. Lee, S. E. Yoo, P. Arvan, Proc. Natl. [36] S. Lange, A. Musidlowska, C. Schmidt- Acad. Sci. USA 1998, 95, 9909-9913. Dannert, J. Schmitt, U. T. Bornscheuer, Chembiochem 2001, 2, 576-582. [51] T. Hotelier, L. Renault, X. Cousin, V. Negre, P. Marchot, A. Chatonnet, [37] A. Musidlowska, S. Lange, U. T. Nucleic Acids Res. 2004, 32, D145-147. Bornscheuer, Angew. Chem. Int. Ed. 2001, 40, 2851-2853. [52] D. Hasenpusch, U. T. Bornscheuer, W. Langel, J. Mol. Model. 2011, 17, 1493- [38] D. Böttcher, E. Brüsehaber, K. Doderer, 1506. U. T. Bornscheuer, Appl. Microbiol. Biotechnol. 2007, 73, 1282-1289. [53] R. Kourist, H. Jochens, S. Bartsch, R. Kuipers, S. K. Padhi, M. Gall, D. [39] D. L. Ollis, E. Cheah, M. Cygler, B. Bottcher, H. J. Joosten, U. T. Dijkstra, F. Frolow, S. M. Franken, M. Bornscheuer, ChemBioChem 2010, 11, Harel, S. J. Remington, I. Silman, J. 1635-1643. Schrag, J. L. Sussman, K. H. G. Verschueren, A. Goldman, Protein Eng. [54] D. B. Janssen, Curr. Opin. Chem. Biol. 1992, 5, 197-211. 2004, 8, 150-159. [40] C. Angkawidjaja, D. J. You, H. [55] M. Nardini, B. W. Dijkstra, Curr. Opin. Matsumura, K. Kuwahara, Y. Koga, K. Struct. Biol. 1999, 9, 732-737. Takano, S. Kanaya, FEBS Lett. 2007, 581, [56] E. Henke, J. Pleiss, U. T. Bornscheuer, 5060-5064. Angew. Chem. Int. Ed. 2002, 41, 3211- [41] J. Massoulie, N. Perrier, H. Noureddine, 3213. D. Liang, S. Bon, Chem. Biol. Interact. [57] E. Henke, U. T. Bornscheuer, R. D. 2008, 175, 30-44. Schmid, J. Pleiss, ChemBioChem 2003, [42] T. L. Born, M. Franklin, J. S. Blanchard, 4, 485-493. Biochemistry 2000, 39, 8556-8564. [58] R. Kourist, P. D. de Maria, U. T. [43] J. K. Beetham, D. Grant, M. Arand, J. Bornscheuer, ChemBioChem 2008, 9, Garbarino, T. Kiyosue, F. Pinot, F. 491-498. Oesch, W. R. Belknap, K. Shinozaki, B. D. [59] S. H. Krishna, M. Persson, U. T. Hammock, DNA Cell Biol. 1995, 14, 61- Bornscheuer, Tetrahedron: Asymmetry 71. 2002, 13, 2693-2696. [44] D. M. Nedrud, H. Lin, G. Lopez, S. K. [60] M. Gall, R. Kourist, M. Schmidt, U. T. Padhi, G. A. Legatt, R. J. Kazlauskas, Bornscheuer, Biocatal. Biotransform. Chem. Sci. 2014, 5, 4265-4277. 2010, 28, 201-208. [45] F. Fischer, S. Kunne, S. Fetzner, J. [61] S. Smialowski-Fleter, A. Moulin, J. Bacteriol. 1999, 181, 5725-5733. Perrier, A. Puigserver, Eur. J. Biochem. [46] Y. Umezawa, K. Yokoyama, Y. Kikuchi, 2002, 269, 1109-1117. M. Date, K. Ito, T. Yoshimoto, H. Matsui, [62] S. Smialowski-Fleter, A. Moulin, C. J. Biochem. 2004, 136, 293-300. Villard, A. Puigserver, Eur. J. Biochem. [47] E. Chovancova, J. Kosinski, J. M. 2000, 267, 2227-2234. Bujnicki, J. Damborsky, Proteins 2007, [63] P. Mohr, N. Waespesarcevic, C. Tamm, 67, 305-316. K. Gawronska, J. K. Gawronski, Helv. [48] G. H. Krooshof, E. M. Kwant, J. Chim. Acta 1983, 66, 2501-2511. Damborsky, J. Koca, D. B. Janssen, [64] E. J. Toone, M. J. Werth, J. B. Jones, J. Biochemistry 1997, 36, 9571-9580. Am. Chem. Soc. 1990, 112, 4946-4952. [65] E. J. Toone, J. B. Jones, Tetrahedron: Asymmetry 1991, 2, 1041-1052. Chapter one: References 59

[66] E. J. Toone, J. B. Jones, Tetrahedron: [81] M. Paravidino, P. Böhm, H. Gröger, U. Asymmetry 1991, 2, 207-222. Hanefeld, in in Organic Synthesis, Vol. 1 (Eds.: K. Drauz, [67] L. Provencher, H. Wynn, J. B. Jones, A. H. Gröger, O. May), Wiley-VCH, R. Krawczyk, Tetrahedron: Asymmetry Weinheim, 2012, pp. 249-362. 1993, 4, 2025-2040. [82] J. Moravcova, Z. Vanclova, J. Capkova, [68] L. Provencher, J. B. Jones, J. Org. Chem. K. Kefurt, J. Stanek, J. Carbohydr. Chem. 1994, 59, 2729-2732. 1997, 16, 1011-1028. [69] R. Bloch, E. Guibe-Jampel, C. Girard, [83] A. Basak, G. Bhattacharya, S. K. Palit, Tetrahedron Lett. 1985, 26, 4087-4090. Bull. Chem. Soc. Jpn. 1997, 70, 2509- [70] H. Moorlag, R. M. Kellogg, Tetrahedron: 2513. Asymmetry 1991, 2, 705-720. [84] L. M. Zhu, M. C. Tedford, Tetrahedron [71] C. Tamm, Indian J. Chem., Sect. B: Org. 1990, 46, 6587-6611. Chem. Incl. Med. Chem. 1993, 32, 190- [85] N. W. Alcock, D. H. G. Crout, C. M. 194. Henderson, S. E. Thomas, Chem. [72] S. Bencharit, C. L. Morton, E. L. Howard- Commun. 1988, 746-747. Williams, M. K. Danks, P. M. Potter, M. [86] S. Rawale, L. M. Hrihorczuk, Wei, J. R. Redinbo, Nat. Struct. Biol. 2002, 9, Zemlicka, J. Med. Chem. 2002, 45, 937- 337-342. 943. [73] S. Bencharit, C. L. Morton, J. L. Hyatt, P. [87] S. Ruppert, H. J. Gais, Tetrahedron: Kuhn, M. K. Danks, P. M. Potter, M. R. Asymmetry 1997, 8, 3657-3664. Redinbo, Chem. Biol. 2003, 10, 341-349. [88] M. Jungen, H.-J. Gais, Tetrahedron: [74] S. Bencharit, C. C. Edwards, C. L. Asymmetry 1999, 10, 3747-3758. Morton, E. L. Howard-Williams, P. Kuhn, P. M. Potter, M. R. Redinbo, J. Mol. Biol. [89] S. Wallert, K. Drauz, I. Grayson, H. 2006, 363, 201-214. Gröger, P. Dominguez de Maria, C. Bolm, Green Chem. 2005, 7, 602-605. [75] During the writing of this thesis, crystals of the isoenzyme PLE5 were obtained [90] J. M. Herdan, M. Balulescu, O. Cira, J. and the structure was computed. Mol. Catal. A: Chem. 1996, 107, 409- Currently, the sutructure was deposited 414. under the pdb code 5FV4. [91] M. Schneider, N. Engel, H. Boensmann, [76] D. Hasenpusch, Dissertation thesis, Angew. Chem. Int. Ed. 1984, 23, 66-66. University of Greifswald (Greifswald), [92] L. K. P. Lam, R. A. H. F. Hui, J. B. Jones, J. 2011. Org. Chem. 1986, 51, 2047-2050. [77] D. Hasenpusch, D. Möller, U. T. [93] K. Laumen, E. H. Reimerdes, M. Bornscheuer, W. Langel, ARXIV, eprint Schneider, H. Gorisch, Tetrahedron Lett. arXiv:1204.6186, 2012. 1985, 26, 407-410. [78] R. K. Kuipers, H. J. Joosten, W. J. van [94] D. Romano, F. Bonomi, M. C. de Berkel, N. G. Leferink, E. Rooijen, E. Mattos, T. D. Fonseca, M. D. F. de Ittmann, F. van Zimmeren, H. Jochens, Oliveira, F. Molinari, Biotechnol. Adv. U. Bornscheuer, G. Vriend, V. A. dos 2015, 33, 547-565. Santos, P. J. Schaap, Proteins 2010, 78, 2101-2113. [95] P. D. de Maria, C. A. Garcia-Burgos, G. Bargeman, R. W. van Gemert, Synthesis [79] M. Nardini, I. S. Ridder, H. J. Rozeboom, 2007, 1439-1452. K. H. Kalk, R. Rink, D. B. Janssen, B. W. Dijkstra, J. Biol. Chem. 1999, 274, [96] V. Iosub, A. R. Haberl, J. Leung, M. Tang, 14579-14586. K. Vembaiyan, M. Parvez, T. G. Back, J. Org. Chem. 2010, 75, 1612-1619. [80] H. J. Gais, F. Theil, in Enzyme Catalysis in Organic Synthesis (Eds.: K. Drauz, H. Waldmann), Wiley-VCH, Weinheim, 1995, pp. 165-363. Chapter one: References 60

[97] S. Dörrich, S. Falgner, S. Schweeberg, C. [114] M. Polla, T. Frejd, Tetrahedron 1991, Burschka, P. Brodin, B. M. Wissing, B. 47, 5883-5894. Basta, P. Schell, U. Bauer, R. Tacke, [115] A. J. H. Klunder, W. B. Huizinga, A. J. M. Organometallics 2012, 31, 5903-5917. Hulshof, B. Zwanenburg, Tetrahedron [98] Y. F. Wang, C. J. Sih, Tetrahedron Lett. Lett. 1986, 27, 2543-2546. 1984, 25, 4999-5002. [116] A. J. H. Klunder, W. B. Huizinga, P. J. M. [99] S. Iriuchijima, K. Hasegawa, G. Sessink, B. Zwanenburg, Tetrahedron Tsuchihashi, Agric. Biol. Chem. 1982, Lett. 1987, 28, 357-360. 46, 1907-1910. [117] A. J. H. Klunder, F. J. C. Vangastel, B. [100] M. Schneider, N. Engel, H. Boensmann, Zwanenburg, Tetrahedron Lett. 1988, Angew. Chem. Int. Ed. 1984, 23, 64-66. 29, 2697-2700. [101] B. L. Kedrowski, J. Org. Chem. 2003, 68, [118] S. Tawaki, A. M. Klibanov, J. Am. Chem. 5403-5406. Soc. 1992, 114, 1882-1884. [102] G. Sabbioni, J. B. Jones, J. Org. Chem. [119] T. Miyazawa, S. Kurita, S. Ueji, T. 1987, 52, 4565-4570. Yamada, Biocatal. Biotransform. 2000, 17, 459-473. [103] H. J. Gais, K. L. Lukas, W. A. Ball, S. Braun, H. J. Lindner, Liebigs Ann. Chem. [120] T. Miyazawa, S. Kurita, S. Ueji, T. 1986, 687-716. Yamada, S. Kuwata, J. Chem. Soc., Perkin Trans. 1 1992, 2253-2255. [104] S. H. Hu, C. A. Martinez, B. Kline, D. Yazbeck, J. H. Tao, D. J. Kucera, Org. [121] M. J. García, R. Brieva, F. Rebolledo, V. Process Res. Dev. 2006, 10, 650-654. Gotor, Biotechnol. Lett. 1991, 13, 867- 870. [105] H. J. Gais, G. Bulow, A. Zatorski, M. Jentsch, P. Maidonis, H. Hemmerle, J. [122] G. Carrea, B. Danieli, G. Palmisano, S. Org. Chem. 1989, 54, 5115-5122. Riva, M. Santagostino, Tetrahedron: Asymmetry 1992, 3, 775-784. [106] M. Schneider, N. Engel, P. Honicke, G. Heinemann, H. Gorisch, Angew. Chem. [123] T. Miyazawa, S. Kurita, S. Ueji, T. Int. Ed. 1984, 23, 67-68. Yamada, S. Kuwata, Biotechnol. Lett. 1992, 14, 941-946. [107] F. Bjorkling, J. Boutelje, S. Gatenbeck, K. Hult, T. Norin, P. Szmulik, Tetrahedron [124] E. Krieger, G. Koraimann, G. Vriend, 1985, 41, 1347-1352. Proteins 2002, 47, 393-402. [108] W. Boland, U. Niedermeyer, L. Jaenicke, [125] H. M. Greenblatt, T. C. Otto, M. G. Helv. Chim. Acta 1985, 68, 2062-2073. Kirkpatrick, E. Kovaleva, S. Brown, G. Buchman, D. M. Cerasoli, J. L. Sussman, [109] P. Süss, S. Illner, J. von Langermann, S. Acta Crystallogr., Sect. F: Struct. Biol. Borchert, U. T. Bornscheuer, R. Cryst. Commun. 2012, 68, 269-272. Wardenga, U. Kragl, Org. Process Res. Dev. 2014, 18, 897-903. [126] A. C. Hemmert, T. C. Otto, M. Wierdl, C. C. Edwards, C. D. Fleming, M. [110] P. Süss, S. Borchert, J. Hinze, S. Illner, J. MacDonald, J. R. Cashman, P. M. von Langermann, U. Kragl, U. T. Potter, D. M. Cerasoli, M. R. Redinbo, Bornscheuer, R. Wardenga, Org. Mol. Pharmacol. 2010, 77, 508-516. Process Res. Dev. 2015, 19, 2034-2038. [127] E. Krieger, T. Darden, S. B. Nabuurs, A. [111] K. Faber, G. Ottolina, S. Riva, Finkelstein, G. Vriend, Proteins 2004, Biocatalysis 1993, 8, 91-132. 57, 678-683. [112] M. Luyten, S. Muller, B. Herzog, R. [128] M. Silberstein, J. Damborsky, S. Vajda, Keese, Helv. Chim. Acta 1987, 70, 1250- Biochemistry 2007, 46, 9239-9249. 1254. [129] S. D. Lahiri, G. F. Zhang, J. Y. Dai, D. [113] G. Guanti, L. Banfi, E. Narisano, R. Riva, Dunaway-Mariano, K. N. Allen, S. Thea, Tetrahedron Lett. 1986, 27, Biochemistry 2004, 43, 2812-2820. 4639-4642. Chapter one: References 61

[130] D. Pendlebury, R. Wang, R. D. Henin, A. Hockla, A. S. Soares, B. J. Madden, M. D. Kazanov, E. S. Radisky, J. Biol. Chem. 2014, 289, 32783-32797.

[131] M. Prabu-Jeyabalan, E. Nalivaika, C. A. Schiffer, Structure 2002, 10, 369-381. [132] S. Q. Yang, Z. Qin, X. J. Duan, Q. J. Yan, Z. Q. Jiang, J. Lipid Res. 2015, 56, 1616- 1624. [133] M. E. Smith, Dissertation thesis, The University of Southern Mississippi (Hattiesburg, MS, USA), 2014. [134] D. S. Masterson, D. A. Rosado Jr., C. Nabors, Tetrahedron: Asymmetry 2009, 20, 1476-1486. [135] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew, A. J. Olson, J. Comput. Chem. 1998, 19, 1639-1662. [136] PyMol, The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC. [137] E. Chovancova, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V. Sustr, M. Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Comput. Biol. 2012, 8, e1002708. [138] N. T. Doncheva, K. Klein, F. S. Domingues, M. Albrecht, Trends Biochem. Sci. 2011, 36, 179-182. [139] N. T. Doncheva, Y. Assenov, F. S. Domingues, M. Albrecht, Nat. Protoc. 2012, 7, 670-685. [140] N. T. Doncheva, K. Klein, J. H. Morris, M. Wybrow, F. S. Domingues, M. Albrecht, BMC Proc. 2014, 8, 1-11. [141] P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, T. Ideker, Genome Res. 2003, 13, 2498-2504. [142] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, T. E. Ferrin, J. Comput. Chem. 2004, 25, 1605-1612.

Science must have originated in the feeling that something was wrong.

Thomas Carlyle

65

Chapter 2.

The catalytic activity of a putative linoleate from Lactobacillus acidophilus and its close relation to an oleate hydratase from the same origin

Sequence alignment with the basic local alignment search tool (BLAST) identified closely related homologues of the crystallized and characterized Lactobacillus acidophilus hydratase (LAH).[1-2] The initial oleate hydratase differs only in three amino acid residues to an annotated isomerase enzyme from the same species.[3] Hypothesizing a common ancestor, catalytic promiscuity was investigated by creation of intermediate-substituted mutants of the flavinadenindinucleotide (FAD)-dependent enzymes. Their activity towards oleic acid and linoleic acid was analyzed and intensively discussed.

Part of this chapter has been published: M.P.C. Fibinger, G.J. Freiherr von Saß, C. Herrfurth, I. Feussner, U.T. Bornscheuer, Eur. J. Lip. Sci. Technol. 2016, 118, 841-848. doi: 10.1002/ejlt.201500444[4]

Chapter two 66

Content

Introduction ...... 67

Oleate and linoleate hydratases – hydro-lyases ...... 68

Linoleate – the production of conjugated linoleic acid ...... 71

Catalytic promiscuity of hydratases and isomerases ...... 74

Results ...... 76

Sequence search and multiple sequence alignment ...... 76

Gene cloning and mutant creation ...... 76

Protein expression and purification ...... 78

Molecular modeling, scaffold stabilization and FAD docking ...... 80

Assay development and control reactions ...... 83

Hydratase assay ...... 83 Isomerase assay ...... 85 Fatty acid derivatization ...... 85 Biocatalysis reactions ...... 87

CD spectrometric structure validation ...... 89

Discussion...... 91

Summary ...... 98

Materials and Methods ...... 99

Molecular Cloning ...... 99

Gene expression and protein purification ...... 99

Biocatalysis ...... 100

Spectrophotometric measurements ...... 100

Fatty acid derivatization ...... 100

GC analysis...... 101

Structure modeling, tunnel analysis and docking ...... 102

CD spectroscopic analysis ...... 102

Abbreviations ...... 103

References ...... 104

Chapter two 67

Introduction

The catalytic conversion of fatty acids is of wide interest, due to the high industrial potential of these valuable compounds. The special chemical potential of unsaturated fatty acids is based on the different composition of double bonds throughout the molecule. They can be used for chemical or enzymatic transformations to yield various products. The starting materials can be derived from natural, renewable resources and thus have ecological and industrial benefits. Major fatty acids in natural oil sources such as olive oil, peanut oil, palm kernel oil, sunflower oil, rape seed oil or soy bean oil are oleic or linoleic acid with up to 83%, usually followed by palmitic acid (Table 1).[5]

Table 1: The fatty acid content of different natural oil sources.[5] The main content portions are highlighted in red. The fatty acids are referred by their total number of carbon atoms to the total number of double bonds, e.g. 16:1 means 16 carbon atoms and one double bond.

fatty acid peanut soy bean sun flower olive palm rape seed linseed 12:0 ≤0.1 ≤0.1 ≤0.1 n.d. ≤0.5 n.d. n.d. 14:0 ≤0.1 ≤0.2 ≤0.2 ≤0.5 0.5-2.0 ≤0.2 n.d. 16:0 8.0-14.0 8.0-13.5 5.0-7.6 7.5-20.0 39.9-47.5 2.5-7.0 4.0-6.0 16:1 ≤0.2 ≤0.2 ≤0.3 0.3-3.5 ≤0.6 ≤0.6 ≤0.5 17:0 ≤0.1 ≤0.1 ≤0.2 ≤0.3 ≤0.2 ≤0.3 n.d. 17:1 ≤0.1 ≤0.1 ≤0.1 ≤0.3 n.d. ≤0.3 n.d. 18:0 1.0-4.5 2.0-5.4 2.7-6.5 0.5-5.0 3.5-6.0 0.8-3.0 2.0-3.0 18:1 35.0-69.0 17.0-30.0 14.0-39.4 55.0-83.0 36.0-44.0 51.0-70.0 10.0-22.0 18:2 12.0-43.0 48.0-59.0 48.3-74.0 3.5-21.0 9.0-12.0 15.0-30.0 12.0-18.0 18:3 ≤0.3 4.5-11.0 ≤0.3 ≤1.0 ≤0.5 5.0-14.0 56.0-71.0 20:0 1.0-2.0 0.1-0.6 0.1-0.5 ≤0.6 ≤1.0 0.2-1.2 ≤0.5 20:1 0.1-0.7 ≤0.5 ≤0.3 ≤0.4 ≤0.4 0.1-4.3 ≤0.6 20:2 n.d. ≤0.1 n.d. n.d. n.d. ≤0.1 n.d. 22:0 1.5-4.5 ≤0.7 0.3-1.5 ≤0.2 ≤0.2 ≤0.6 n.d. 22:1 ≤0.3 ≤0.3 ≤0.3 n.d. n.d. ≤2.0 n.d. 22:2 n.d. n.d. ≤0.3 n.d. n.d. ≤0.1 n.d. 24:0 0.5-2.5 ≤0.5 ≤0.5 ≤0.2 n.d. ≤0.3 n.d. 24:1 ≤0,3 n.d. n.d. n.d. n.d. ≤0.4 n.d. n.d. - not detectable (content ≤0.05%)

Chemically, the options of reactions at the cis-double bonds in (poly-)unsaturated fatty acids are limited, usually missing possibilities to regulate the stereochemistry of the chemical reactions.[6-8] High research interest exists towards hydroxylation or isomerization (Figure 1).

Chapter two: Introduction 68

R1 R2 hydratase OH H O 2 hydroxylated product R1 R2

R1 dienoic substrate R2 isomerase isomerized product with conjugated double bond Figure 1: Reaction of a cis-polyunsaturated long chain fatty acid to the hydroxylated or isomerized product catalyzed by a hydratase or an isomerase, respectively. Usually, the isomerization yields conjugated products, also differing in both, the isomerized bond and the final configuration of the product.

Oleate and linoleate hydratases – hydro-lyases Long chain hydroxy fatty acids have a broad potential use as starting materials for polymers, lubricants, epoxy resins, biofuels, detergents or paints. They serve as food additives and also bioflavors can be created through lactone formation.[9-12] Chemical synthesis of hydroxy products from unsaturated fatty acids use sulfation[13], selenium dioxide/tert-butylhydroperoxide[14-15] or performic acid[16] and are not regiospecific or enantioselective. Promising alternatives are enzymes, providing selective substrate recognition, while ensuring a regio- and often enantioselective active site environment. The first oleate hydratase (OH) activity was described in 1962 and was found in a Pseudomonas species.[17-18] Afterwards, enzymatic conversion of unsaturated C18 fatty acids was also observed in bacterial isolates or cultures of Corynebacterium sp.[19], Stenotrophomonas nitritireducens[20], Sphingobacterium thalpophilum[21] and Saccharo- myces cerevisiae.[22] Extensive knowledge about the proteins, carrying the desired activities, was collected and thus, fatty acid hydratases from Bifidobacterium breve[23-24], Butyrivibrio fibrisolvens[25], Bifidobacterium animalis subsp. lactis, Lactobacillus acidophilus, L. plantarum, L. rhamnosus[26], Elizabethkingia meningoseptica[27], Lysini- bacillus fusiformis[28-29], Macrococcus caseolyticus[30], Stenotrophomonas maltophilia[31- 32] and Streptococcus pyogenes[33] were cloned and heterologously expressed in both, prokaryotic and eukaryotic hosts. The accepted substrates have a minimum chain length of C14 and a distance of nine carbon atoms between the carboxylic function and the respective, essentially cis-configured double bond is required. Analyzing the substrate scope of oleate hydratases shows that not only oleic acid is recognized as substrate.[34] Briefly, the conversion of myristoleic acid (C14:1, 9Z) to 10-hydroxymyristic acid, palmitoleic acid (C16:1, 9Z) to 10-hydroxyhexadecanoic acid, oleic acid (18:1, 9Z) to 10- hydroxyoctadecanoic acid, linoleic acid (18:2, 9Z,12Z) to 10-hydroxy-12(Z)-octadecenoic acid, α-linolenic acid (C18:3, 9Z,12Z,15Z) to 10-hydroxy-12(Z),15(Z)-octadecadienoic acid

Chapter two: Introduction 69 and γ-linolenic acid (C18:3, 6Z,9Z,12Z) to 10-hydroxy-6(Z),12(Z)-octadecadienoic acid was reported.[6] Additionally, Lactobacillus fusiformis was found to produce 10,12- dihydroxystearic acid from ricinoleic acid (18:1, 9Z, 12-OH).[29] Since this was the first report on the conversion of a hydroxylated unsaturated fatty acid, there are potentially additional variabilities or substituting functionalities distant from the targeted C9-double bond. All products were mono-hydroxylated at position C9 and only an oleate hydratase from Macrococcus caseolyticus possesses the ability of producing the dihydroxy fatty acids 10,13-dihydroxyoctadecanoic acid from linoleic acid, 10,13-dihydroxy-15(Z)- octadecenoic acid from α-linolenic acid and 10,13-dihydroxy-6(Z)-octadecenoic acid from γ-linolenic acid. Hydroxy fatty acids, stereoselectively produced by enzymes, are valuable starting materials for polymers and serve as emulsifiers, lubricants and stabilizers.[35] Pharmaceutical potential cannot be excluded, because the compounds exhibit anti- fungal, anti-inflammatory and anti-cancerogenic effects.[12] Furthermore, hydroxy fatty acids enable an alternative access to long chain ω-hydroxy acids, aliphatic alcohols and dicarboxylic acids. This is achieved by a combination of an alcohol dehydrogenase (ADH) from Micrococcus luteus, a Baeyer-Villiger monooxygenase (BVMO) of Pseudomonas putida or Pseudomonas fluorescence and an esterase of Pseudomonas fluorescence. Choosing one or the other BVMO, aliphatic alcohols and dicarboxylic acids or ω-hydroxy acids and short- or mid-chain carboxylic acids are produced.[36] Finally, lactones yielded from the hydroxy fatty acids are used as fragrance and flavor compounds.[37-39] Alternatively, they can be also accessed by lipoxygenases.[40] Lactobacillus strains were screened for oleate hydratase activity previously and this activity was assigned to be a feature of the fatty acid hydratase LAH (GenBank: AAV42528.1 and UniProt: Q5FL96).[2] In addition to the structural determination, the gene product was isolated and recombinantly expressed in E. coli.[26] On sequence level, the protein is a member of the MCRA family, which is related to domain I of the β chain of class II major histocompatibility antigens.[41] Those proteins are widespread throughout Gram-positive and Gram-negative bacteria, involved in primary infections of skin, throat and mucosal surfaces.[29] The MCRA family contains proteins from different species, which were found to have fatty acid hydration activity: The reactive antigen species of Bifidobacterium breve, Elizabethkingia meningoseptica and Streptococcus pyogenes are able to convert oleic acid to the hydroxy acid.[24, 27, 42] On one hand Lactobacillus strains were found to be dependent on unsaturated fatty acids in the growth media, especially in case of oleic acid.[43-44] The growth limiting function of the essential nutrient became increasingly significant in the absence of biotin.[45] On the other hand, unsaturated long chain fatty acids inhibit the cell growth of Staphylococcus aureus, E. coli and Bacillus licheniformis at higher concentrations by interacting with the cellular membranes.[46-47] Furthermore, the fatty acids inhibit the enoyl-acyl carrier protein reductase (FabI), responsible for cellular fatty acid synthesis.[48] Thus, the fatty

Chapter two: Introduction 70 acid hydratase activity is not only part of the degradation pathway by β-oxidation, but can also be used for detoxification to survive fatty acid rich environments (e.g. in skin or inflamed tissues).[33] Lactobacillus acidophilus is a gram-positive organism, harboring catalytic activity on unsaturated fatty acids.[26] The genome of strain NCFM was completely published in 2005 and thus protein functions were annotated by in silico sequence analysis and comparison.[49] Following investigations of LAH indicate substrate specificity for palmitoleic, oleic, linoleic and α-linolenic acid, producing the respective 10-hydroxy fatty acids.[2] The strongest activity was found for linoleic acid, naming the enzyme a linoleate 10-hydratase.[50] In another study, the enzyme was found to be an isomerase without necessity of a cofactor.[48] However, FAD dependency was proven before, but it was also discussed, that hydroxy fatty acids are an intermediate during the synthesis of conjugated linoleic acids.[26, 51-54] The protein sequences for oleate/linoleate isomerases are strongly conserved, typically harboring a FAD binding domain folding of the Rossmann type. In Lactobacilli, the third glycine residue is substituted by asparagine, [42] forming a GXGXXN(A/S)X15(D/K/E) motif. The crystal structure was solved by Volkov et al., revealing a homodimeric protein complex with 591 amino acids in each chain.[2] The 67.7 kDa monomer consists of four domains (Figure 2).

Figure 2: Homodimer of apoenzyme Lactobacillus acidophilus hydratase (LAH, pdb code 4IA5) consisting of four domains: domain 1 with the Rossman fold for cofactor binding and responsible for dimerization is shown in yellow, domain 2, contributing to the active site, in blue, domain 3 including the substrate channel and 4 serving as lid in orange and green, respectively.

The first domain displays a Rossmann fold for cofactor binding and forms the active site together with domain 2. The mostly helical domain 3 is involved in the formation of the

Chapter two: Introduction 71 substrate-binding channel, leading to the active site at the interface of the first three domains. An exclusive feature for the LAH is domain 4, which forms a lid like structure, responsible in substrate recognition and channel construction (Figure 3). This domain is not conserved in other FAD-containing oxidases or isomerases, but putatively related to a long chain acylglycerol lipase.[55]

Figure 3: Model of domain 4 of LAH (based on pdb code 4IA6). Chain A is depicted in blue and chain B in light yellow as apoenzyme and orange with bound substrate. The domain moves away, opening the substrate channel of the opposite protomer for the substrate linoleic acid, shown as space-filled spheres.

The enzymatic activity of LAH is dependent on the cofactor FAD, but the structural model lacks this molecule. Very recently, a related enzyme from Elizabethkingia meningoseptica was crystallized and the structure was solved by molecular replacement, now with bound cofactor to one chain.[56] This special feature lead to the unambiguous identification of the active site, bearing a conserved tyrosine-glutamate pair, necessary for catalysis. The binding partners for the coordination and orientation of FAD were analyzed and are also present in LAH. The catalysis of the hydroxylation of long-chain unsaturated fatty acids is a relevant and interesting possibility to create various compounds. However, until now it was not known, which enzymes are essential for catalysis and if there is any evolutionary connection between hydratases and isomerases. The evolutionary background indicates a strong relationship due to the similarity of substrates and cofactor dependency. Since the cofactor has been shown to have a significant role in the structural arrangement, its role in the catalytic mechanism of the fatty acid isomerase is unambiguous an important part of the redox process.[57-58]

Linoleate isomerases – the production of conjugated linoleic acid The isomerization of linoleic acid can yield up to eight possible isomers.[59] A mixture of those isomers is found, when linoleic acid is chemically converted, e.g. by alkaline

Chapter two: Introduction 72 isomerization.[7-8] The conjugated linoleic acids (CLAs) were first described in 1933 and found by UV-Vis spectrophotometric analysis, due to its characteristic diene peak at 234 nm.[60] To date, the enzymatic or chemoenzymatic synthesis of CLA is possible.[61] CLAs are of special interest for pharmaceutical and dietary applications: The spectrum of positive impacts by these compounds is manifold.[62-63] Thus, anti-cancerogenic and anti- tumorigenic effects were shown by repression of cancerogenesis or proliferation inhibition of certain cancer cell lines. [7, 64-74] CLAs are antibacterial[75], anti- atherosclerotic[76-77] and anti-oxidative.[78-79] The uptake of CLA reduces body fat and in parallel increases the muscle mass[7, 80-82], while depressing cholesterol effects[83] and influencing the low density lipoprotein/high density lipoprotein ratio.[84] The overall triacylglycerol content was found to be reduced, affecting the lipid metabolism[85-86]. In this context, hyperinsulinemia was prevented[87] and it was shown that CLAs are included in the phospholipid fraction of tissues.[66] Finally, the immune system was positively modulated[88-90] and the production of proinflammatory cytokines was reduced.[91] Chronic inflammation, inflammatory cytokines and carcinogenesis are closely related and the positive influence of CLA in these fields is incontrovertible. Therefore, broad investigations and research have been undertaken to identify CLA producers or CLA producing enzymes, the CLA uptake and the production or conversion in mammalian intestines. The dietary sources of CLA are known, but the nutritional concentration is too low to effectively exhibit the beneficial medical effects.[92] As the advantages of CLAs came into focus, the question of constitutional pure compounds raises. In parallel, different methods for CLA analysis became available, also for the direct quantification from microbial isolate extractions.[93] The main isomer found in dietary products is cis-9,trans-11-CLA (respective 9(Z),11(E)-octadecadienoic acid).[94] Consequently, it is the main portion of CLAs in the phospholipid fraction of animal tissue[66] and this configuration is the main product of CLA producing gastrointestinal bacteria Butyrivibrio fibrisolvens.[7, 95] Following the potential of CLA production strains, human-derived Lactobacillus, Lactococcus, Pediococcus and Bifidobacterium species were isolated.[96] Various information about CLAs, their natural occurrence, bacterial production, usage and properties were intensively reviewed.[97] Focusing on industrial production, many bacterial strains were screened for CLA production: Nineteen different species and strains of Lactobacillus, Lactococcus, Streptococcus and Propionibacterium were analyzed, but only three strains of the latter genus (two strains of Propionibacterium freudenreichii subspecies freudenreichii and one strain of Propionibacterium freudenreichii subspecies shermanii) were capable of producing the desired compound.[98] In another study, 36 Lactobacillus and 12 Lactococcus strains were analyzed, resulting in the identification of suitable strains from Lactobacillus curvatus, Lactobacillus plantarum and Lactobacillus sakei.[99] The strain with the highest productivity, Lactobacillus sakei LMG 13558, yielded under slightly acidic conditions a maximum concentration of 0.08 g L-1 cis-9,trans-11-CLA after 16 h

Chapter two: Introduction 73 and 0.04 g L-1 trans-9,trans-11-CLA after 48 h fermentation at 30 °C. The conversion was only 4.5%, but for the production of conjugated linolenic acids (CLNAs) up to 60% were observed in case of α-linolenic acid conversion to cis-9,trans-11,cis-15-CLNA. The product concentration reached its maximum at 0.2 g L-1 after 14 h fermentation at 20 °C. Again, a trans-9,trans-11 byproduct was detected (up to 0.04 g L-1). Additional strains were analyzed, also adding further Lactobacillus species (L. casei[100], L. plantarum[101] and L. acidophilus[59]) to the known CLA accumulators. In the latter one, all eight possible isomers were detected. Moreover, also membrane bound enzymes were characterized[102] and an immobilized hydratase is known from Lactobacillus reuteri.[103] Especially the oxygen supply seems to be critical, because higher CLA production was achieved using microaerobic conditions, whereas in aerobic cultures, no accumulation was detected.[51] This is in contrast to a study using resting cells in stationary phase under oxygen supply, while producing CLA isomers.[104] It should be taken into account that feeding of unsaturated fatty acids to aerobic bacterial cultures will result in oxidation of the substrates, finally using them for cell growth. Analogously to hydroxy fatty acids, another access to CLAs is possible by the usage of lipoxygenases.[105] A putative pathway of conversions and reactions during fatty acid isomerization processes in Lactobacillus was hypothesized by Kishino et al. (Figure 4).[52-53]

HOOC linoleic acid

CLA-hy

OH

HOOC 10-Hydroxy-cis-12-octadecenoic acid

CLA-dh

HOOC O

trans-10-cis-12-CLA HOOC 10-Oxo-cis-12-octadecenoic acid CLA-dc O HOOC HOOC 10-Oxo-trans-11-octadecenoic acid cis-9-trans-11-CLA CLA-dh CLA-hy OH CLA-er HOOC O 10-Hydroxy-trans-11-octadecenoic acid

HOOC HOOC 10-Oxo-octadecanoic acid trans-9-trans-11-CLA CLA-dh CLA-hy

oleic acid + trans-10-Octadecenoic acid

Figure 4: Hypothesized reaction pathway during the conversion of linoleic acid to CLA.[52-54] Three enzymes are necessary for the shift of the double bond: First, linoleic acid is hydrated by CLA-hy and then the hydroxy group is oxidized by CLA-dh. After shifting the double bond by CLA-dc, the previous enzymes mediate the reduction and dehydration of the firstly hydrated carbon atom, yielding the double bond at position C9. The connection to oleic acid is performed by CLA-er, executing the reduction of the trans-double bond.

Chapter two: Introduction 74

The pathways involve three enzymes, identified by sequentially analyzing isolated fractions of Lactobacillus plantarum. During this process, the combination of different isolates yielded CLAs, whereas single fractions were able to produce the hydroxy derivatives. It was reported, that hydroxy fatty acids are products of the first step during the pathway of isomerization. The CLA-hy gene (GenBank: AB671229) was cloned in an E. coli vector and thus the cells were able to produce 10-Hydroxy-11(Z)-octadecenoic acid from linoleic acid. The CLA-dh (GenBank: AB671230) gene product was assumed to be an , because the sequence showed identity with short chain dehydrogenases, enabling the function to reversibly oxidize/reduce the hydroxy/keto function. Together with extracts from the host organism and E. coli cells producing the CLA-hy gene product, CLAs were yielded. Moreover, the CLA-dc (GenBank: AB671231) gene product seemed to be responsible for shifting the double bond as the last step before the reaction reverses until the cis-9-double bond is obtained. Surprisingly, it has homology to a putative acetoacetate decarboxylase from the same host. Finally, all three enzymes were recombinantly expressed in E. coli and in washed cells, cis-9,trans-11-CLA and trans-9,trans-11-CLA were produced analogously to the Lactobacillus plantarum culture. In contrast to these findings, an FAD-dependent isomerase from Propionibacterium acnes was cloned, expressed, characterized and crystallized.[57] The enzyme produces trans-10,cis-12-CLA in an one-step reaction. During the catalytic mechanism, the typical redox-ability of FAD is essential, even if the cofactor’s redox state is not changed after the reaction.[58] The related gene was cloned and successfully expressed in yeast and tobacco seeds[106], rice grains[107] and the oleaginous yeasts Yarrowia lipolytica[108] and Mortierella alpina.[109]

Catalytic promiscuity of hydratases and isomerases Catalytic promiscuity is the capability of an enzyme to perform more than one .[110] More general, enzymatic promiscuity also covers condition and substrate promiscuity, describing the tolerance of uncommon reaction conditions or the acceptance of different substrates. The specialty of catalytic promiscuity is the occurrence of one or more minor side activities, catalyzing a more or less completely different chemical transition. In this context it was discussed that this exceptional feature is a source for the development of new catalytic activities: Generating a new activity by neutral mutation is an evolutionary advantage, when a certain selection pressure fortifies it. This innovation leads to an increased fitness and diverse genes can separately evolve after gene duplication.[111-112] Certain fatty acid isomerases and hydratases are closely related, structurally and sequentially. An evolutionary divergence was hypothesized in this thesis and therefore, different proteins of Lactobacillus acidophilus were recombinantly expressed and analyzed for their chemical ability to convert oleic or linoleic acid. One was described as

Chapter two: Introduction 75 hydratase, the other as isomerase, but both differ in only three amino acids (Figure 5, Table 2). The annotated isomerase (GeneBank ABB43157, UniProt: Q307I4) and the crystallized hydratase (GenBank: AAV42528, UniProt: Q5FL96) were starting points in this study to create potentially promiscuous intermediate enzymes.

Figure 5: Structure of LAH (pdb code 4IA6[2]) with secondary structure elements colored in red for helices, yellow for sheets and green for loops or coils. The residues chosen for sequential mutagenesis are shown as spheres.

If no promiscuous variant is found, the switch between both activities can be figured out creating different mutants, differentiating both enzymes. Furthermore, the enzyme structure was engineered, because of reported difficulties to obtain purified holoenzyme, carrying the catalytic essential FAD cofactor.

Chapter two 76

Results

Sequence search and multiple sequence alignment Lactobacillus acidophilus harbors fatty acid hydratase and isomerase activities, though only the responsible enzyme for the hydration activity was characterized so far. Fatty acid converting enzymes were shown to have FAD-dependent catalytic behavior, without changing its overall redox state after catalysis. The sequence of the described oleate hydratase (LAH, GenBank: AAV42528, UniProt: Q5FL96) was used for a BLASTp search to reveal related homologues throughout the Lactobacillus species. Two sequences were found, annotated as linoleate isomerases: ABB43157 (LAI, UniProt: Q307I4) and AEY68243 (UniProt: H2ET87), sharing an overall sequence identity to LAH of 99% and 93%, respectively. A multiple sequence alignment (Figure 6) confirmed a difference of only three amino acid residues to the starting sequence, namely E157G, F286C and V506A. The next closest related protein differs in 39 amino acids to the LAH, and was formerly described to be exclusively responsible for the formation of the cis- 9,trans-11-CLA isomer. Therefore, the residual isomers found in LAH extracts incubated with linoleic acid must have another genetic origin.[113]

Gene cloning and mutant creation All possible combinations of mutants were cloned to transform the LAH gene to LAI, finally creating seven mutants: Three single, two double and one triple substitution, according to Table 2.

Table 2: Substitution variants to analyze the transition between hydratase and isomerase activity. The triple mutant LAH07 is identical to the detected isomerase LAI (see text).

Mutant Substitution LAH01 E157G LAH02 F286C LAH03 V506A LAH04 E157G F286C LAH05 E157G V506A LAH06 F286C V506A LAH07 = LAI E157G F286C V506A

The purification of LAH suffered from several drawbacks, even in previous experiments, when the crystal structure was determined.[33] Therefore different hexahistidine tag variants were generated for the LAH-WT. As control enzymes served a hydratase from Streptococcus pyogenes (SPH, GenBank: ACI60731, UniProt: B5XK69) and an isomerase from Propionibacterium acnes (PAI, GenBank: AAT82791, UniProt: Q6A8X5), respectively

Chapter two: Results 77 for each activity. Both genes were also cloned N- and C-terminally tagged to verify the purification progress.

Figure 6: ClustalOmega multiple sequence alignment of hits from the BLAST search with the LAH sequence. The found putative linoleate isomerases share a very high sequence identity of 93% up to 99% to LAH. The OhyA sequence was included for comparison, sharing an identity of 43.35% to LAH. The missing residues in the LAH structures (pdb codes 4IA5 and 4IA6) are boxed in red, the FAD-binding motif of the Rossmann fold is highlighted in the yellow box.

Chapter two: Results 78

LAH-WT was provided in the pET24a vector without any tag. The stop-codon was substituted by a glycine codon to obtain optimal flexibility of the additional sequence to yield LAHc with a C-terminal tag. The other variant was analogously prepared to the control enzymes SPH and PAI. Different mutants were cloned similar with N- or C- terminal tag (after mutagenesis), in one case with both to test all possible combinations.

The genes for SPH and PAI were provided in the pET28a vector with a C-terminal His6-tag (SPHc and PAIc, respectively). The N-terminal variants were cloned using suitable primers introducing NdeI and XhoI restriction sites, while integrating a stop codon upstream to the XhoI site to prevent a double tag. For this purpose, two internal restriction sites in the SPH gene were deleted by switching the responsible codons in QuikChange™ reactions maintaining the codon usage preference of the E. coli host.

Protein expression and purification The plasmids (pET24a and pET28a) were transformed into E. coli BL21 (DE3) for overexpression of the target genes. After cell lysis, all proteins were successfully detected in the soluble protein fractions by SDS-PAGE analysis.[1] The supernatant was applied to different chromatography columns respective to the methodology on an FPLC-system. Initially, IMAC purification with imidazole or pH elution under mild conditions was investigated and the elution fractions were checked for the typical yellow color caused by the bound FAD cofactor. An absorption scan from 300 to 600 nm served as final proof for the presence of the cofactor, if typical peaks were detectable at 350 (reduced) or 450 nm (oxidized state). The Talon resins (Clontech Laboratories) complex Co2+ ions, but only for the PAI variants a slight yellow color was obtained with imidazole elution. In case of pH dependent elution, all proteins except the SPH variants were denatured under elution conditions at pH 5, indicating the strong influence of the pH. SPH was shown to be active under slightly acidic conditions; however the proteins lost the cofactor during the Talon resin purification. In consequence, the methodology was switched to Ni-NTA based resin material (HItrap columns, GE Healthcare) at a FPLC-system. Here, only imidazole dependent elution was used. SPHc showed a slight, SPHn a strong yellow color with a characteristic peak in the absorbance spectrum (Figure 7). Only a peak for reduced FAD was detected, indicating the main binding nature of the cofoactor. In case of the control isomerase, both PAI variants yielded a strong yellow color, but in the absorption scan two peak fractions were identified (Figure 8).

Chapter two: Results 79

Figure 7: Absorption spectrum of denatured SPHn. Only a peak at 350 nm (reduced FAD) was detected.

Figure 8: Absorption spectrum of denatured PAIc with peak fractions at 350 and 450 nm.

Receiving the signal of both peaks is a possible result of oxidation by air or, more probable, of the balanced state of the bound cofactor, because during the catalysis cycle the overall state will not change: The internal electron transfer of the substrate (from one carbon atom to another) is not influencing other redox processes. In case of the LAH-WT and the mutant proteins, the IMAC methodologies were not applicable, because after purification no cofactor was detected by absorption scan for any mutant and any tag variant. Only LAHn and - surprisingly - LAH01c were slightly yellow. Moreover, the purification of LAH01cn (with both tags) was not successful. Every protein was completely eluted from the columns, which was verified by SDS-PAGE analysis,[1, 4] but the FAD was lost. The separation of the holoenzyme complex was irreversible, because the addition of FAD to the purification buffers and the addition to the subsequent biocatalysis experiments had no effect. Targeting the small portion of protein with bound cofactor in the slightly yellow fractions, concentration of the eluate was done by centrifugation in Amicon centrifugal filter units (Merck Millipore), but the protein fraction was concentrated without increasing the color or the absorbance of the solution. Finally, protein denaturation was observed by protein flocculation in insoluble aggregates.

Chapter two: Results 80

As alternative methods to obtain soluble and FAD-containing LAH protein, gel filtration, size exclusion chromatography and ion exchange chromatography (weak (DEAE on cellulose) and strong (ResourceQ, quaternary ammonium on polystyrene/divinylbenzene matrix) anion exchange material) were utilized. In gradient elution methods, protein peaks were collected, but the cofactor could not be preserved.

Molecular modeling, scaffold stabilization and FAD docking The inability of the LAH protein to keep the cofactor was investigated by homology modeling to increase the affinity to the structural important FAD.[1] The structure was published before, but residues of the mobile entrance domain 4 and a flexible part of domain 1 (residues 61-72) could not be identified in the electron density. The mobility of the residue sequence A61GGSLDGADRPN72 was of special interest, because it covers a putative entrance site into the protein. With the CAVER 3.01 PyMol plugin[114], tunnel calculations were performed. As starting point the coordinates of the innermost carbon atom of bound linoleic acid in the pdb structure 4IA6 was chosen (Figure 9).

Figure 9: Model of LAH with the calculated tunnel path through the protein structure. The tunnel ends in domain 4, because only one monomer was used for the calculation and the surface was reached. The structure is shown in rainbow colors and the tunnel cluster as violet spheres. The substrate is channeled from the upper left to the buried active site, whereas the cofactor is bound in the lower right of the indicated channel. The latter binding site exhibits a higher volume and due to the visible Rossmann fold thus hypothesized to bind the cofactor.

To accelerate the calculation time, only one monomer was used for the examination with CAVER to identify the substrate and the cofactor entrance sites. Within the tool, it

Chapter two: Results 81 was also possible to identify the residues building up the tunnel walls. Concerning the substrate site, as expected the highly mobile domain 4 is responsible for substrate binding and transport into the protein. The mostly hydrophobic tunnel surface suggests the coordination of the carboxylic function more outside the protein, leading the aliphatic chain into the active site. The cofactor binding cavity was identified by its larger volume (Figure 9). The corresponding path out of the structure ends exactly between the missing residues. Two proline residues are flanking this part, P59 and P71, which can play a role for the flexible loop in-between. Within the structure modeling software YASARA, possible loop conformations were calculated, finally predicting a helical conformation, spanning the residues P59-D69 (Figure 10).

Figure 10: Loop models of monomeric 4IA5 predicted by YASARA. The overall structure contains loops and helices (each in blue) and sheets (red). The predicted conformations of the C-trace are shown in yellow-orange: The stronger the yellow color, the higher the portion of typical α-helix characteristics in the undefined structural element.

The high mobility of the putative helix suggests a flexible entrance possibility for the cofactor, but in combination with the purification results, FAD enters the structure once during translational folding and is irreversibly lost when the binding strength decreases. Thus, the helix is covering the tunnel entrance found by CAVER. Stabilization or the increase of its rigidity by protein engineering was desired to keep FAD inside the structure or increase its binding constant to a level that purification of the holoenzyme is possible. The sequence part was analyzed and compared to different important

Chapter two: Results 82 parameters for helically stabilizing residues and their total influence on the stability (Table 3).[115-116]

Table 3: Proposed amino acid substitutions to increase the helix stability. Every residue has a related partner to establish energetically convenient interactions and decrease the loop mobility. Substitution Stabilization issue P59V hydrophobic stacking V60G charge-dipole interaction G62E charge-dipole interaction G63E charge-dipole interaction S64A hydrophobic stacking, hydrophobic interaction K65E Glu-Lys pair D66E Glu-Lys pair D69K Glu-Lys pair, charge-dipole interaction D70K Glu-Lys pair, charge-dipole interaction P71G Schellman motif

Additionally, the helix sequence plus adjacent residues were applied to a BLAST search on the tBLASTn server (for protein query search in a nucleotide database) to figure out conserved residue positions. In Table 4 chosen sequences from a multiple sequence alignment over 112 sequences are summarized.

Table 4: Multiple sequence alignment (MSA) of the sequence at the entrance region of the bound cofactor FAD. Substitutions are highlighted red and the putative helical sequence is underlined.

Source Sequence Lactobacillus acidophilus EELPVAGGSLDGA-DRPNAGFVVRGGREME Lactobacillus reuteri EELPIAGGSLDGQ-DRPDVGFVTRGGREME Staphylococcus aureus EELPKAGGSLDGE-NMPLKGYVVRGGREME Weissella thailandensis EELSVPGGSLDGT-ERPNIGLVTRGGREME Lactococcus lactis EELPLSGGSLDGS-FIPHDGFVVRGGREME Streptococcus pyogenes EELPLSGGSLDGI-EKPHLGFVVRGGREME Streptococcus agalactiae EELPLSGGSLDGV-KRPDIGFVVRGGREME Lactobacillus johnsonii EELDVAGGSLDGT-NRPNAGFVVRGGREME Lactobacillus amylovorus EELGLPGGSMDGIYNKQKESYIIRGGREME this group’s identity *** .***:** - . : ******* total MSA identity ** :.****** - *:: ***** * Superfamily PRK13977 EELDVPGGSLDGA-GNPEKGYVARGGREME Strep_67kDa_ant (MCRA like family) EELPLPGGSLDGI-GSPHLGYVVRGGREME COG4716 (MCRA, unknown function) EELPLAGGSLDGA-GSPHHGFVVRGGREME

Chapter two: Results 83

The BLAST search shows differences in the N-terminal cap of the putative helix, followed by a conserved sequence and additional differences in the C-terminal residues. Three superfamilies were found which show also a different architecture. Helix stabilizing residues from Table 3 were successfully introduced into the LAH-WT, but unfortunately no FAD was bound after the protein elution. Again, the protein was yielded in good purity, but without detectable FAD included in the structure. As it turns out that the protein activity was measurable in whole cell systems and also in crude lysates no further investigation was done. However, the loop structure was used to analyze the FAD-binding mode, because to this date no related structure was available. Therefore, FAD was used as flexible or rigid ligand to be docked into the modeled FAH structure, containing the calculated loop orientation (Figure 11). Quite recently a structure of a hydratase from Elizabethkingia meningoseptica (pdb code 4UIR) was published and the active site residues were confirmed.

A B

Figure 11: Model structures of LAH (pdb code 4IA5) with docked FAD and calculated loop structure. Both models successfully included the necessary cofactor, oriented in the direction of the catalytic residues Y200 and E82 (when compared to the 4UIR structure), which are shown as sticks. The cofactor pointed into the direction of the active site, but the different orientation of the catalytic residues indicates a high flexibility of available rotamers. The docking results for the deepest penetration of FAD into the structure (A) and the best orientation towards Y200 and the appropriate binding to the Rossmann fold (B) are presented.

Assay development and control reactions

Hydratase assay

For the fast evaluation of hydratase activity, two assay systems were proposed. The converted fatty acid cannot be detected directly without derivatization using GC analysis. For simplifying the analysis, spectrophotometrically detectable consecutive

Chapter two: Results 84 reactions with an alcohol dehydrogenase (ADH) or nitrous acid were investigated (Figure 12).

keto acid 340 nm NADH 1 R 2 260 nm R NAD+ O

ADH LAH R1 R1 R2 R2 H2O OH R1 HNO2 R2 dienoic substrate hydroxylated product O N O nitrosated fatty acid 372 nm Figure 12: Reaction scheme for proposed hydratase assays. The hydroxy product is detected spectrophotometrically by consecutive reactions executed either enzymatically (with ADH, upper path) or chemically (nitrosation, lower path).

The indirect measurement of NAD+ consumption or NADH generation with a suitable ADH is a promising way to achieve activity and kinetic measurements, while the reaction takes place in a cuvette or in a MTP. Due to the fact that E. coli contains its own ADH and further NAD+/NADH converting enzymes, hence purification of the enzyme of interest or at least the fractionation of cell lysates will be necessary. On the other hand, the extraction of fatty acids after whole cell biocatalysis can be performed in the supernatant after harvesting the cells. The acidic reaction (10% HCl) with nitrous acid yields alkyl nitriles, which are detectable by specific absorption at 372 nm. The absorption spectra for primary, secondary and tertiary alkyl nitriles differ, with specific maxima at 356 nm, 372 nm and 382 nm, respectively (Figure 13).

Figure 13: Plotted absorption spectra of primary, secondary and tertiary alkylnitrites.[117] The strongest absorption peaks were measured at 356 nm, 372 nm and 382 nm, respectively.

Chapter two: Results 85

Therefore, analyzed compounds can be discriminated by the signal pattern. In reaction mixtures, the strongest signal originates from the major compound, but with decreasing excess, it is hard to determine the correct isomer. Thus, the individual signal threshold, also related to the purity of the analyzed solutions, limits this application. The assay was verified with 1-hexene, 1-hexanol, 12-hydroxy stearic acid and 2-phenyl- 2-propanol. Suitable standard curves were obtained by measuring the specific absorbance at 372 nm and were used for concentrations up to 50 mmol L-1 substrate. With the positive control SPHn, full conversion of oleic acid and 52.98% conversion of linoleic acid were detected (Figure 14).

Figure 14: Spectra of extracted product solution after conversion of 50 mmol L-1 oleic (line) or linoleic (dashed line) acid with SPHn. The most intensive peak is at 372 nm, the typical maximum absorption value of secondary nitrous alcohols.

Isomerase assay

Conjugated carbon double bonds in long chain fatty acids are known to have a specific absorbance at 234 nm.[118] While the sample was extracted by organic solvents (hexane, octane), a corresponding dependency was detected with commercial CLA. The linear relation was verified in-between 5 and 100 µmol L-1 substrate. Finally, the catalysis of linoleic acid to CLA by PAI was observed, but only with a conversion of 0.78%.

Fatty acid derivatization

Indeed, the separation of reaction mixtures and the visualization of product or substrate peaks by GC or HPLC is one of the most valuable and confident method. For fatty acids and hydroxy fatty acids, a derivatization is necessary, whereas different methodologies exist to convert the carboxy (and hydroxy) function to esters. First, the carboxy function was targeted by mixing the control compounds with TMS diazomethane, BSTFA, methanolic HCl, methanolic borotrifluoride[119] or methanolic KOH.[120] The mixture was applied to thin layer silica plates, separated by a mixture of petroleum ether, diethyl

Chapter two: Results 86 ether and glacial acetic acid (85:15:1.5) and visualized with phosphomolybdic acid (Figure 15).

A B C D E A

Figure 15: Fatty acid methyl esters separated by thin layer chromatography. Treatment of palmitoleic (P), stearic (S), oleic (O), linoleic (LA) and linolenic acid (LE, spots from left to right on each plate) with TMS diazomethane (A), BSTFA (B), methanolic HCl (C), methanolic BF3 (D) or methanolic KOH (E). The last spot is derived from stearic acid methyl ester standard as positive control.

A sufficient derivatization was achieved with TMS-diazomethane, methanolic HCl and methanolic BF3, whereas nearly full conversion was found only for the last both. Due to the easy and cheap preparation, methylation was first performed with methanolic HCl, but after optimization, silylation with TMS diazomethane was successfully applied. When hydroxy fatty acids were expected, silylation was performed with BSTFA directly before GC analysis.[42] As column material a 5% phenyl polysilphenylene-siloxane (BPX5) and a 70% cyanopropyl polysilphenylene-siloxane (BPX70) column were used to figure out suitable retention times for free fatty acids and fatty acid methyl esters (FAMEs, Table 5).

Table 5: Retention times of free fatty acids and FAMEs in the GC analysis on polar BPX5 and non- polar BPX70 column.

Fatty acid Retention time [min] BPX70 BPX5 oleic acid 7.84 18.75 linoleic acid 8.38 18.54 linolenic acid 9.21 18.93 oleic acid methyl ester 7.86 16.63 linoleic acid methyl ester 8.44 16.33 linolenic acid methyl ester 9.26 16.61

On the polar BPX5 column, the derivatization success was measurable, because fatty acids and their corresponding FAMEs can be separated, but only with the non-polar BPX70, a separation depending on the chain length of the fatty acid was possible.

Chapter two: Results 87

Therefore, the BPX70 was used for FAME separation with the benefit that non- derivatized samples create the same retention time. Thus, only the extraction method influences and limits the process outcome after a biocatalytic reaction, enabling the proof of each enzymatic activity. Standard samples were injected for separation of hydroxy fatty acids, CLAs and non-converted substrates (Figure 16).

A

B A

Figure 16: GC separation of standard samples (A) HA - heptadecanoic acid, OA - oleic acid, 12- HSA - 12-hydroxy stearic acid and (B) HA, LA - linoleic acid, CLA - conjugated linoleic acid.

The standard samples for the expected hydroxy products (10-hydroxystaric acid or 9- hydroxystearic acid and analogously for the possible linoleic acid conversion products: 9- hydroxy-12-octadecenoic acid, 10-hydroxy-12-octadecenoic acid, 12-hydroxy-9- octadecenoic acid, 13-hydroxy-9-octadecenoic acid and the possible double hydroxylated products) were not commercial available. The 12-hydroxystearic acid was used as product sample, for which sufficient separation was achieved.

Biocatalysis reactions The control enzymes and also the LAH-WT, as well as the LAH mutants were used in biocatalysis reactions in Tris-HCl buffered systems. The substrate was either linoleic acid

Chapter two: Results 88 or oleic acid. Depending on the buffer system, Tris-HCl pH 7.5 or pH 5.0 were used in parallel experiments, because both reaction conditions were described.[1, 121] Substrate solution (in DMSO or emulsified by sonication) was added to the buffer system and the reactions were started by enzyme addition. The products were extracted by a 2:1 chloroform-methanol (v/v) mixture and applied to GC analysis (Figure 17).

Figure 17: Hydratase activity of LAH and the constructed mutants against linoleate using cell lysates or whole cells and against oleate using whole cells at pH 7.5. GC peak areas of the hydroxylation product (12Z)-10-hydroxy-12-octadecenoic acid (10-HOE) and linoleic acid (LA) or 10-hydroxy octadecanoic acid (10-HOA) and oleic acid were determined and compared to the LAH wild type. The results represent single measurements.

Biocatalysis reactions were only possible with cell lysates or whole cells due to the cofactor loss during the purification processes of LAH and its mutants. Regarding the hydratase activity, the conversion with whole cells was higher than with cell lysates. At least, LAI produces a negligible amount of product. The CLA production was also monitored by GC, but no isomerized products were found for LAI, the LAH mutants and as expected for LAH-WT. Only for the positive control PAIc, full conversion was achieved after 24 h. Moreover, no hydroxy or isomerized product was found at pH 5.0. An addition of FAD to the purification buffer or the reaction buffer showed no effect, as well as the addition of glucose to the biocatalysis reactions. To study the growth and the formation of hydroxylated or isomerized products during cell cultivation, linoleic acid was added twelve hours after induction of gene expression to E. coli cell cultures expressing PAIc, LAHc, LAHn or LAI. Again, no product was detected after 20 h incubation.

Chapter two: Results 89

CD spectrometric structure validation The structural analysis was done for IMAC-purified LAH variants LAHn and LAHc and elution fractions from IEC, containing untagged LAH-WT (DEAE and ResourceQ). PAIn served as experimental control.

A B

Figure 18: CD spectra of different purified fatty acid hydratase LAH and the control enzymes PAIn and SPHn. The untagged LAH-WT was purified with IEC (DEAE, ReQ) or IMAC (LAHc, LAHn).

The structural shapes were similar, but the results of the secondary structure prediction with the measured data demonstrated some differences (Table 6).

Table 6: Secondary structure calculations by the DichroWeb server[122-123], obtained with experimental data from Figure 18.

Helix Strand Protein Turns Unordered Total ordered distorted ordered distorted PAIn 0.08 0.09 0.22 0.1 0.2 0.31 1 SPHn 0.1 0.09 0.17 0.1 0.22 0.31 0.99 LAHn 0.25 0.19 0.07 0.07 0.2 0.22 1 LAHc 0.31 0.21 0.07 0.06 0.17 0.18 1 LAH ReQ 0.11 0.11 0.17 0.1 0.22 0.29 1 LAH DEAE 0.09 0.08 0.2 0.11 0.21 0.3 0.99

For LAH, a significant change in the predicted secondary structure was observed. The data for the wild-type enzymes, purified with IEC columns made of DEAE or Q- sepharose, are very similar, but obviously the fraction of helices increases by increasing the portion of distorted helical elements, whereas the content of strands decreases, when a hexahistidine-tag is attached to the protein. Unfortunately, there was no data collection possible with bound FAD and the natural structure fold could not serve as control. With the control enzymes PAIn and SPHn, the portion of strands is higher than for helices, but the unordered fraction is up to one third of the protein architecture,

Chapter two: Results 90 similarly for the non-tagged LAH samples. The drastic differences in LAH seem to be a reason of the missing FAD, which stabilizes the whole protein structure leading to less fluctuating secondary elements.

Chapter two 91

Discussion

The Lactobacillus acidophilus hydratase (LAH) was described as long chain fatty acid hydratase, specifically hydroxylating position C10 of oleic or linoleic acid.[2, 50] Lactobacillus acidophilus harbors additionally a linoleate isomerase activity and a related gene sequence was published (LAI), which shows a high identity to LAH.[42] The genetic source of the target enzymes LAH and LAI were the same, but the chemical functionalities are completely different: LAH accepts the non-protected palmitoleic acid, oleic acid, linoleic acid and linoleic acid, yielding C10-monohydroxylated products.[2] Interestingly, a gene annotated as isomerase was added to the GenBank in 2005 (LAI), differing in only three amino acids from LAH.[42] The main protein functionality responsible for the conversion of linoleic acid to the CLA isomers remained unclear, but different genes responsible for the isomerase activity were reported, also for different homologs from Lactobacillus species.[113] In this thesis, the genes were cloned, heterologous expressed in E. coli and analyzed towards the hydratase and isomerase activity. It was figured out, that LAH obtains the hydratase activity, but LAI was not able to perform an isomerization or hydration reaction. Different mutants were created, bridging the substitutions between LAH and LAI, but only hydratase products were found. As the protein could not be purified with bound cofactor, further mutants of histidine tagged LAH were created and also different purification methods were applied. Unfortunately, no FAD was detectable after elution of LAH-WT or the LAH mutants. Only LAHn exhibits a slightly yellow color, but in spectrophotometric analysis, no FAD peak was detectable, in contrast to LAHn as reported from Volkov et al.[2] Since the purification methodologies failed, stabilization of the putative entrance region of FAD was targeted to decrease the cofactor dissociation. The missing loop structure from the crystal structures 4IA5 and 4IA6 seemed to be helical in modelling experiments. Therefore, these residues were critically analyzed and stabilizing interactions were introduced. Again, the protein could be soluble expressed, but the cofactor was lost again. In a recent article, the structure of the oleate hydratase OhyA from Elizabethkingia meningoseptica (sharing 43.35% sequence identity to LAH) was solved to 2.75 Å by molecular replacement (pdb code 4UIR). In this structure the backbone trace turns inside the protein architecture, folding an antiparallel β-sheet with two short strands, each made out of two amino acids. Hence, the LAH loop models were miscalculated within the YASARA software (Figure 19). The proline residue, which is at the homologous position of P71 (first hypothesized to flank one end of the putative helix), mediates the sharp turn inside the sheet. However, there are strong differences between the calculated helical fold and the determined sheet in the strongly related structure. Therefore a homology model of LAH was set up, based on the OhyA (4UIR) structure. The final model was achieved with good modeling

Chapter two: Discussion 92 scores, but again, no sheet structure could be predicted, even if the identity for the loop sequence is 56.25% between OhyA and LAH (Figure 20).

Figure 19: View towards the binding cleft of the cofactor FAD, depicted in lines. Blue - LAH structure 4IA5, salmon - OhyA structure 4UIR, green - calculated loop model, yellow - homology model of LAH from 4UIR. The calculated helix covers a part of the binding region of FAD. The loop serves for cofactor binding and orientation by positioning the interaction partner S104 (S64 in LAH). Please note the small helical portion near to the isoalloxazine function of FAD in the LAH models, but not in OhyA.

Figure 20: Structure prediction of the LAH sequence, based on the structure input 4UIR. Blue - LAH structure 4IA5, salmon – loop model, yellow - LAH structure model with the 4UIR template. The overall fold is similar to OhyA, but the helically predicted structure did not fold inside the structure, as in the models calculated before.

Chapter two: Discussion 93

The input data of the YASARA script were changed to determine the respective residues, folding the antiparallel β-sheet (A108GN110 and G114YI116) and the sharp turn (P111TD113). Moreover, P59 was substituted to histidine to increase backbone flexibility, but finally no structure with a β-sheet was obtained. The loop was not able to fold back inside the structure. Consequently, FAD cannot be fixed correctly by docking experiments, because necessary binding residues were missing. Regarding the FAD-binding site, the unsolved loop in LAH is necessary to position S64 for interaction with the cofactor FAD. In the second shell, G63 and L65 seem to be essential binding partners, compared to the structurally similar residues in OhyA (Figure 21).[56]

A B

C

Figure 21: Structural model of fatty acid hydratases. (A) LAH with mutated residues in stick representation. (B) OhyA with amino acid residues involved in the reaction, shown as sticks. (C) Structural alignment of the active site of LAH (blue) and OhyA (salmon). Residues are represented as sticks and labeled respective to the LAH sequence. If residues differ, both positions are labeled. FAD is shown as ball and stick model. Please note the salt bridge (red) between E157 and R199 next to the catalytically active Y200. T391, H393 and H397 are homologous to T436, N438, and H442 in OhyA, respectively, and involved in substrate binding through interaction with the carboxylic function.

Chapter two: Discussion 94

The pocket for cofactor binding and the residues lining the active site are similar in OhyA (Figure 21). Hence, the same mechanism can be assumed: The main function of FAD has structural reasons. Y200 (Y241 in OhyA) was suggested to protonate the double bond of the substrates and E82 (E122 in OhyA) activates a water molecule for the re-side attack to yield the hydroxylated product (Figure 22). The isoalloxazine functionality of the cofactor is also able to protonate the double bound in the reduced state.[56] Furthermore, N3 and the oxygen atoms of C2 and C4 are in direct neighborhood of the activating glutamate (3.9-5 Å). The correct orientation of the necessary water molecule is possible, thereby polarizing it to simplify the addition.

Y200 Y200 Y200

OH O- O- + H OH + COOH  COOH 5 5 COOH 5 5 re 5 5 O H O H H H O O- O O- O OH

E82 E82 E82 Figure 22: Putative mechanism of the hydration of oleic acid by the oleate hydratases LAH and OhyA. The residue numbers correspond to LAH. The water molecule is added by a re attack on the polarized double bond. The figure is adapted from Engleder et. al.[56]

The LAH exhibits a stronger activity towards linoleic acid, than for oleic acid, reasoning the enzyme as linoleate hydratase. The best results were obtained by using emulsified substrates in whole cells, indicating the ability of the substrates to cross the cellular membranes for enzymatic catalysis. When the residues given in Table 2 were substituted sequentially, the hydratase activity towards oleic acid was preserved for all mutants, but was reduced for LAH05 (E157G V506A) and lost for LAH07 (= LAI). In contrast, hydratase activity towards linoleic acid was absent in the double mutants LAH04 (E157G F286C) and LAH05. Both mutants share the substitution E157G, but the single mutant still produces the hydroxylated product. No other formation of hydroxylated isomers was detected, according to literature.[26] The substrate scope is limited to certain fatty acids, well defined in the position of the double bond and the chain length. Linoleic acid hydration was synergistically inhibited by substituting E157G and another residue from this study (F286C or V506A). E157 forms a salt bridge to R199, adjacent to catalytic Y200. At the respective position in OhyA, there is the positively charged side chain of K239, but no homologous negatively charged counterpart. The lysine residue has contacts to S28, N543 and N544. Regarding the alignment (Figure 6), around K198 (in OhyA), the homologous residue to E157 (in LAH), one asparagine (N197) and two glutamate residues (E199, E200) can be found, which are solvent exposed. But taking into account a proline in LAH (P156), a different fold seems plausible. The respective substitution

Chapter two: Discussion 95

E157G destroys the salt bridge, while influencing the orientation of the catalytic Y200. The other positions, F286 (I327 in OhyA) and V506 (V547 in OhyA) are influencing the substrate tunnel and the substrate binding. In no other aligned sequence the substitutions were found and only V506 was annotated as conserved in a similar alignment.[56] In cell lysates, less linoleate hydration activity was found than in the whole cell reaction, especially for the mutants LAH01 and LAH03-06. For this reason, only whole cell catalysis was performed for the substrate oleic acid. In whole cells, the cofactor was apparently not lost, resulting in more active holoenzyme. In contrast, the cofactor was able to irreversibly dissociate from the protein when cell lysates were used. Additionally, in functional SPHn, the cofactor was found to be in the reduced state, whereas in PAIc, both redox forms exists (Figures 7 and 8). In the related enzyme OhyA, a 3.7 fold higher turnover number and a 5.8 fold higher specific activity was found, when reduced FAD was used in the catalytic reaction.[56] Also in a linoleic acid hydratase of Lactobacillus plantarum the activity increased 10-fold by using the reduced state of FAD.[54] However, this enzyme (GenBank: CCC77691.1) shares 32.06% sequence identity to LAH and its highest activity was detected with oleic acid. In cell lysates, rapid oxidation of FAD will decrease the activity of LAH similarly. The low influence of cell lysis in the lysates of LAH-WT and LAH02 indicates that FAD is bound stronger to those enzymes and cannot be oxidized as rapid as in the other mutants. Therefore, V506A must have an influence on cofactor binding, which is not straightforward to explain, because the neighboring F548 (in OhyA) is only found in the second shell, surrounding the bound FAD.[56] Engleder et al.[56] and Bevers et al.[27] did not analyze the activity of OhyA towards different substrates than oleic acid, therefore no data for the conversion of linoleic acid are available, but it cannot be excluded, that also OhyA possesses a higher activity towards other long chain fatty acids, due to its high relationship to LAH. In another study, examining the linoleate isomerase activity of Lactobacillus acidophilus strains, sequences were identified, displaying a high sequence identity to LAH.[113, 124] In case of the isomerase from “L. acidophilus NCFMb”, the sequence is identical to LAH. The second is LAI (“isomerase from L. acidophilus AS1.1854”), which exhibits no isomerase activity as shown in this thesis. The third and last one is the linoleate isomerase from “L. acidophilus ATCC 832” (GenBank: ADD22720), very similar to LAH. The substitutions Y94N, K190E, F286C should be suitable to convert the hydratase into a linoleate isomerase. But the last substitution was shown to be the only one preserving the hydratase activity in cell lysate biocatalysis. The other positions are quite far away from the active site; therefore no reason for the change of activity is presumable. There are finally two possibilities: First, the proteins necessary for linoleate isomerase activities were completely mis-annotated and the related activity has another genetic origin, because to date no discrete linoleate isomerase from the Lactobacillus family was

Chapter two: Discussion 96 successfully cloned, expressed and reproducibly characterized. Second, the isomerase activity is the result of a hydrated intermediate, which was again dehydrated to yield the CLA products.[52] Also in another study, the linoleate isomerase activity was suggested to be a result of LAI (GenBank: ABB43157).[125] Furthermore, no more genes were identified, but also this hypothesis failed, because LAI exhibits no isomerase activity with the substrates used at all. This problematic context can also be found in other Lactobacillus species: A putative oleate hydratase gene from Lactobacillus plantarum was used for a BLAST alignment (Figure 23), but the closest homologs were putative linoleate isomerases from the same or different origin. Throughout the sequences, the L. plantarum genes share an identity of 99% and at least 93-95% identity to the isomerases of other Lactobacillus species. Interestingly, the investigated CLA-hy[52-53] from the same origin shares 98.95% and 98.77% identity to annotated linoleate isomerases (GenBank IDs AAY27085 and ABB04107, respectively). In detail, the differences between CLA-hy and AAY27085 are only T39K, A69E, E290D, N470I, V475I and I522M and between CLA-hy and ABB04107 G2A, D30E, H46R, E109D, K151Q, F248S and N470I, differing in six or seven residues, respectively. The enzyme families of oleate hydratases and linoleate isomerases are closely related. The attempted experimental example to interconvert LAH into LAI, studied in this thesis, provides information about the hydratase behavior of LAH, certain mutants and also the ability of binding and preserving of the cofactor FAD. The origin of linoleate isomerase activity in Lactobacillus acidophilus was once more not proven, but the analysis of certain genes limits the residual number of genes responsible for the CLA producing linoleate isomerase activity in this host. Another possibility is a multi-enzyme complex, which includes the oleate/linoleate hydratase enzymes to create a CLA producing activity. However, so far this has been only detected in Lactobacillus extracts and the identification of the responsible genes still remains to be elucidated.

Chapter two: Discussion 97

Figure 23: Multiple sequence alignment of a putative oleate hydratase gene (OH) from Lactobacillus plantarum, an annotated linoleate isomerase (LI) of the same origin and LIs of additional Lactobacillus species. The alignment indicates that differentiation of isomerases and hydratases is not possible with clear borders. The linoleate isomerase activities have to be clarified by targeted gene expression and biochemical characterization.

Chapter two 98

Summary

Through the alignment of the amino acid sequence of the fatty acid hydratase LAH from Lactobacillus acidophilus to the non-redundant sequence database of the National Center for Biotechnology Information (NCBI, USA), a putative isomerase (LAI) was identified, differing in only three amino acid residues from the crystallized hydratase. The isomerase was described to exhibit catalytic activity towards linoleic acid, producing selectively conjugated linoleic acid. Therefore, the switch between the particular activities or the potential promiscuous activity of the enzymes was investigated by mutating the corresponding gene of LAH sequentially to generate the six possible mutant sequences, discriminating LAH and LAI. Serving as positive controls, the oleate hydratase from Streptococcus pyogenes and the fatty acid isomerase from Propionibacterium acnes were used in parallel experiments. The genes were cloned, recombinantly expressed and underwent purification studies. The intrinsic and catalytically essential cofactor FAD could only be preserved for the controls, but not for the Lactobacillus proteins. The corresponding structures were analyzed and a helix was modeled for a missing sequence of the LAH structure, covering the entrance region for the cofactor, which was successfully docked into the previously reported structure. Moreover, the modeled helix did not apparently influence the binding mode of FAD. Stabilization approaches to increase the rigidity of the helix failed, due to the different folding character of the targeted sequence, which was figured out by the comparison to a very recently published structure. However, the assay systems for each activity were successfully adapted for whole cell biocatalysis and the application of crude lysate and thus, hydratase activity towards oleic and linoleic acid was confirmed for LAH, the two single mutants LAH02 (F286C) and LAH03 (V506A) and the corresponding double mutant LAH06 (F286C V506A). Interestingly, the activity towards linoleic acid dropped 3-fold after the mutation E157G (LAH01) and was nearly completely lost in combination with another mutant, whereas oleic acid was further accepted as substrate. Through structural investigation a salt bridge to R199 was identified, adjacent to the catalytic Y200. After mutagenesis, no interaction to R199 was possible and the shape of the active site was evidently changed to inhibit the catalysis of linoleic acid, but not for oleic acid. The final isomerase LAI (identical to a respective triple mutant) was proven in this thesis to exhibit no hydratase activity (as expected), but also no isomerase activity. Moreover, no promiscuous isomerase activity was detected for the LAH WT and the generated mutants. Thus, it may be possible that the LAI harbors another catalytic activity, but to date, the gene is wrong annotated. The relationship between fatty acid hydratases and isomerases is very close, as the sequence alignment of other published enzymatic activities from different related proteins of other Lactobacillus species revealed, but indeed, the origin of the isomerase activity remains unclear, as well as the metabolic pathways leading to the production of CLAs in these bacteria.

Chapter two 99

Materials and Methods

Parts of this section were published, also with extensive supplemental material.[1, 4]

Molecular Cloning The wild-type Lactobacillus acidophilus hydratase was provided in the cloning vector pET24a and control enzymes (Streptococcus pyogenes hydratase SPH (UniProt: Q48UX1)[33] and Propionibacterium acnes isomerase PAI (UniProt: Q6A8L5) [57]) were cloned into the vector pET28a. For those, N- or C-terminal His6-tagged variants (SPHn/PAIn or SPHc/PAIc, respectively) were created by restriction and ligation using the restriction sites of NcoI/NdeI and XhoI, respectively. In case of using NcoI for N-tagged variants, the stop codon was introduced downstream the gene of interest, preventing an additional C-terminal tag. The corresponding variants of LAH wild type were created either by cloning in pET28a (LAHn) using NdeI and XhoI restriction sites or by substituting the C-terminal stop-codon in the pET24a vector with GGC in frame, coding for a glycine residue, thereby ensuring maximal flexibility of the tag. All mutants of LAH, as given in Table 2 were created by PCR/Megawhop[126] with suiting primers: The first PCR was performed with a mutagenic forward primer and the reverse primer of the template vector (pET-RP) or the other way around, using the T7 primer. In case of two mutagenesis sites, both corresponding primers were used to create the mega-primer. In the next step the mega-primer was amplified with the DNA template (wild-type gene in vector) and after the reaction, DpnI was added to digest the parental DNA. The helix variant of LAH was cloned in an analogous reaction using the modulated helix sequence as template plus ten base pairs up- and downstream the helix site. The mutated DNA was transformed into competent E. coli Top10 cells and the successful substitutions were confirmed by DNA sequencing. For gene expression, E. coli BL21 (DE3) was used.

Gene expression and protein purification LB broth (with 50 µg mL-1 kanamycin) was inoculated with a freshly grown overnight culture of E. coli BL21 (DE3) cells hosting the expression plasmid. Cells were grown at 37 °C until an optical density of 0.6 and then protein expression was induced with 0.1 mmol L-1 isopropyl β-D-thiogalactoside (IPTG). After cultivation for 14 h at 16 °C, cultures were harvested by centrifugation at 4500 g and 4 °C for 20 min and washed in Tris buffer (100 mmol L-1 Tris, pH 7.5). Cells were finally dissolved in 20 mL Tris buffer (pH 7.5) and cell lysates were either obtained by incubating with 0.1 mg mL-1 lysozyme for 10-15 min at 4 °C, freezing/thawing the cell solutions (three times each) and sonicating them for 330 s once at 4 °C with 20% power, 40% duty cycle or by use of the FrenchPress procedure for three times, each with a cell pressure of 1500 psi. After

Chapter two: Materials and Methods 100 centrifugation for 1 h at 10,000 g and 4 °C the supernatant was used as crude extract. For purification the extract was filtered and loaded on a HiTrap column. The column was washed with ten column volumes 100 mmol L-1 Tris buffer (pH 7.5) and afterwards, the target protein was eluted with 100 mmol L-1 Tris buffer (pH 7.5) containing 0.3 mol L-1 imidazole. If necessary, the protein samples were desalted via gel filtration by three HiTrap desalting columns connected in series. The ion exchange chromatography was performed with a DEAE or a Q-sepharose column, eluting the proteins with a NaCl gradient, using 100 mmol L-1 Tris buffer (pH 7.5) with 1 mol L-1 NaCl. Elution fractions with protein content were collected and analyzed by SDS-PAGE.

Biocatalysis 1 mL of the crude E. coli cell extract was incubated with 0.9 mmol L-1 linoleic acid (from a 1.78 mol L-1 stock solution in ethanol) at 30 °C. For whole cell biocatalysis, 2 mL of the cell cultures were washed with Tris buffer. The cells were resuspended in 1 mL Tris buffer and incubated with 0.9 mmol L-1 linoleic acid at 30 °C. Hydratase activity of the enzymes toward oleic acid was determined using 1 mmol L-1 of the fatty acid (dissolved in DMSO) under the same conditions described for the incubation with linoleic acid. The substrates were emulsified by sonication (5 min, 75% power, 50% duty cycle) and addition of 2.5% Tween 80. Samples for spectrophotometric and GC measurements were taken at different time points of the biocatalysis.

Spectrophotometric measurements The hydratase activity was measured spectrophotometrically at 372 nm. 200 µL of the biocatalysis reaction were extracted once with 300 µL n-octane and subsequently the hydroxy fatty acids were nitrosylated using 100 µL 10% (v/v) HCl, and 160 µL of a 10% (w/v) sodium nitrite solution.[117] Standard curves were obtained using 1-15 mmol L-1 12- hydroxyoctadecenoic acid in Tris-HCl buffer (pH 7.5) with ε = 4.9 L·mol-1·cm-1. The CLA concentration was measured at 234 nm[60] after single extraction with n- hexane: 200 µL of the biocatalysis reaction were vortexed for 2 min with 300 µL of hexane. Standard curves were created using CLA-concentrations in the range of 5- 100 µmol L-1 (gaining a molar attenuation coefficient ε = 6700 L·mol-1·cm-1). FAD was detected in supernatants by adsorption scans from 300-600 nm after protein denaturation for 10 min at 95 °C, followed by centrifugation for 10 min at 13,000 g.[24]

Fatty acid derivatization Fatty acids were extracted by chloroform/methanol (2:1 [v/v]) method.[127] The chloroform/methanol and also the hexane/isopropanol extractions were shown to have a high background with oleic acid. Hexane extraction was more specific and thereby chosen for CLA assay analysis.[128] The solvent was evaporated and the residue was

Chapter two: Materials and Methods 101 dissolved in 400 µL methanol. Different methods for derivatization are available: sodium methoxide[120], methanolic acids[129] or bases, trimethylsilyl-diazomethane, N,O- [119] bis(trimethylsilyl)-trifluoroacetamide (BSTFA), methanolic BF3 or acidic ionic liquids[130]. For each experiment, standard solutions of palmitoleic, stearic, oleic, linoleic or linolenic acid with 1 mg mL-1 in the corresponding solvent was prepared. For TMS- diazomethane, 50 μL of the standard solution (in methanol) were diluted with 150 μL methanol. 6.5 μL TMS-diazomethane (2.0 mol L-1) were added and incubated for 30 min at room temperature. The reaction was stopped with 0.2 µL glacial acetic acid and the solvent was evaporated. The residue was dissolved in acetonitrile and was added to a thin layer chromatography (TLC) plate. In case of BSTFA the same conditions were used, except that 1 µL BSTFA was added per 3 µL sample. To test acidic esterification, 50 µL of the standard solution (in hexane) was diluted with 50 µL hexane and 100 µL methanolic HCl (100 µL methanol plus 5 µL acetyl chloride on ice) were added and shaken for 30 min at 70 °C and 900 rpm. The product was extracted with 400 µL diethyl ether; dried with anhydrous sodium sulfate and finally added to the TLC plate. Another tested condition comprises BF3: 50 µL of the standard solution (in hexane) were diluted with 150 µL hexane and 50 µL methanolic BF3 solution (10% BF3 in methanol) was added and incubated for 30 min at 50 °C. The solvent was evaporated and the residue dissolved in 200 µL hexane. At last, alkali esterification was tested by adding 150 µL iso-octanol to 50 µL of the standard solution (in iso-octane) and additional 100 µL methanolic KOH (2 mol L-1). After 1 min incubation at room temperature, the solution was neutralized by sodium hydrogen sulfate and transferred to the TLC plate. Evaluation of the derivatization methods was performed on silica plates, using a mobile phase of petroleum ether-diethyl ether-glacial acetic acid (85:15:1.5) as mobile phase and methyl stearate as reference. The fatty acids were visualized by 3.5% phosphomolybdic acid in ethanol. The final derivatization protocol used 30 mmol L-1 TMS-diazomethane, shaking 30 min at room temperature. The reaction was stopped by the addition of 0.2 µL glacial acetic acid. Derivatized samples were dissolved in 100 µL acetonitrile, evaporated and finally dissolved in 20 µL acetonitrile. If hydroxy groups were expected, trimethylsilylation was achieved by mixing 1 µL BSTFA to each 3 µL sample, immediately before the measurement.

GC analysis 1 µL of a derivatized sample was injected into a gas chromatograph equipped with a capillary BPX70 or BPX5 column. The carrier gas was helium (1.5 mL min-1) and the column was heated to 150 °C, stationary for 5 min, followed by heating to 180 °C with 10 °C min-1. Additionally, the temperature was hold for 10 min and then the temperature was raised to 230 min with 20 °C min-1. The final temperature was hold for 2 min. The detector and injector temperature were set to 250 °C. The program was slightly changed when analysis was performed in the lab of Ivo Feussner in Göttingen: In a DB-23 column, connected to a mass-selective detector, helium was used at a flow rate

Chapter two: Materials and Methods 102 of 1 mL min-1. The program started stationary at 150 °C for 1 min, followed by two heating steps: from 150 to 200 °C at 4 °C min-1, then 200-250 °C at 5 C min-1 and a final stationary step at 250 °C for 6 min. The electron energy was 70 eV, the ion-source temperature 230 °C and 260 °C were used for the transfer line.

Structure modeling, tunnel analysis and cofactor docking The solved protein structures of LAH (pdb codes: 4IA5 and 4IA6) were applied to modeling experiments to first, complete the structure at the FAD entrance site and second perform tunnel cluster analysis. The models of the missing residues from A61- N72 were created within YASARA[131], expecting a helical structure for the listed residues. After publication of the 4UIR structure[56], a sheet structure for the numbered residues was expected, yielding no loop turning inside the structure. During in silico cofactor docking experiments, the cofactor was kept flexible or rigid in different approaches, using the dock_runensemble macro with the following parameters: VINA as docking method, creating seven receptor ensembles and 200 receptor runs at an environment temperature of 289 K. The tunnel cluster analysis for the LAH structures was done with the CAVER3.0[114] plugin within PyMol.[132] Starting points were residues in the substrate channel as described previously.[2]

CD spectroscopic analysis CD measurements were obtained at a Jasco spectropolarimeter using a temperature of 20 °C at a rate of 5 nm s-1. The spectrum of 185-260 nm was scanned at a resolution of 1 nm. First, the protein sample, then the buffer (5 mmol L-1 sodium phosphate, pH 7.5) was measured. The secondary structure prediction was calculated at the DichroWeb server.[122-123] The input format was JASCO 1.50 and the input unit millidegrees/theta (machine units). The analysis program CDSSTR was chosen with reference set 3[133-135] and a scaling factor of 1.0. All results were obtained with a normalized root mean square deviation (NRMSD) < 0.025.

Chapter two 103

Abbreviations ADH alcohol dehydrogenase aa amino acid bp base pairs BLAST basic local alignment search tool CLA conjugated linoleic acid DEAE diethyl aminoethyl E. coli Escherichia coli FAD flavin adenine dinucleotide FAME fatty acid methyl ester FPLC fast protein liquid chromatography GC gas chromatography IMAC immobilized metal ion affinity chromatography IPTG Isopropyl β-D-thiogalactoside kDa kilodalton LAH hydratase from Lactobacillus acidophilus LAI isomerase from Lactobacillus acidophilus MSA multiple sequence alignment MTP microtiter plate NAD(P)H nicotinamide adenine dinucleotide(phosphate) NTA nitrilotriacetic acid

OD600 optical density at 600 nm OhyA hydratase from Elizabethkingia meningoseptica PAGE polyacrylamide gel electrophoresis PAI isomerase from Propionibacterium acnes PCR polymerase chain reaction PDB protein data bank rpm revolutions per minute RT room temperature SDS sodium dodecyl sulfate SPH hydratase from Streptococcus pyogenes Tris Tris(hydroxymethyl)aminomethane o/n overnight UV ultra violet

Chapter two 104

References

[1] G. J. Freiherr von Saß, Master thesis, [17] L. L. Wallen, R. G. Benedict, R. W. Greifswald University (Greifswald), Jackson, Arch. Biochem. Biophys. 1962, 2014. 99, 249-253. [18] W. G. Niehaus, A. Kisic, A. Torkelson, D. [2] A. Volkov, S. Khoshnevis, P. Neumann, J. Bednarczyk, G. J. Schroepfer, J. Biol. C. Herrfurth, D. Wohlwend, R. Ficner, I. Chem. 1970, 245, 3790-3797. Feussner, Acta Crystallogr., Sect. D: Biol. Crystallogr. 2013, 69, 648-657. [19] C. W. Seo, Y. Yamada, N. Takada, H. Okada, Agric. Biol. Chem. 1981, 45, [3] M. D. Deng, A. D. Grund, K. J. Schneider, 2025-2030. K. M. Langley, S. L. Wassink, S. S. Peng, R. A. Rosson, Appl. Biochem. Biotechnol. [20] B.-N. Kim, S.-J. Yeom, D.-K. Oh, 2007, 143, 199-211. Biotechnol. Lett. 2011, 33, 993-997. [4] M. P. C. Fibinger, G. J. Freiherr von Saß, [21] T. M. Kuo, W. E. Levinson, J. Am. Oil C. Herrfurth, I. Feussner, U. T. Chem. Soc. 2006, 83, 671-675. Bornscheuer, Eur. J. Lipid Sci. Technol. [22] J. Swaving, J. A. M. deBont, Enzyme 2015. Microb. Technol. 1998, 22, 19-26. [5] H.-J. Fiebig, Deutsche Gesellschaft für [23] C. E. Polan, J. J. McNeill, S. B. Tove, J. Fettwissenschaft e.V., Münster, 2011. Bacteriol. 1964, 88, 1056-1064. [6] A. Hiseni, I. W. C. E. Arends, L. G. Otten, [24] E. Rosberg-Cody, A. Liavonchanka, C. ChemCatChem 2015, 7, 29-37. Gobel, R. P. Ross, O. O'Sullivan, G. F. [7] S. F. Chin, W. Liu, J. M. Storkson, Y. L. Fitzgerald, I. Feussner, C. Stanton, BMC Ha, M. W. Pariza, J. Food Comp. Anal. Biochem. 2011, 12, 9. 1992, 5, 185-197. [25] C. R. Kepler, K. P. Hirons, J. J. McNeill, S. [8] N. Sehat, M. P. Yurawecz, J. A. Roach, B. Tove, J. Biol. Chem. 1966, 241, 1350- M. M. Mossoba, J. K. Kramer, Y. Ku, 1354. Lipids 1998, 33, 217-221. [26] B. Yang, H. Chen, Y. Song, Y. Q. Chen, H. [9] H. M. Teeter, L. E. Gast, E. W. Bell, J. C. Zhang, W. Chen, Biotechnol. Lett. 2013, Cowan, Ind. Eng. Chem. Res. 1953, 45, 35, 75-81. 1777-1779. [27] L. E. Bevers, M. W. Pinkse, P. D. [10] F. C. Naughton, J. Am. Oil Chem. Soc. Verhaert, W. R. Hagen, J. Bacteriol. 1974, 51, 65-71. 2009, 191, 5010-5012. [11] K. Soda, J. Am. Oil Chem. Soc. 1987, 64, [28] B.-N. Kim, Y.-C. Joo, Y.-S. Kim, K.-R. Kim, 1254-1254. D.-K. Oh, Appl. Microbiol. Biotechnol. 2012, 95, 929-937. [12] K.-R. Kim, D.-K. Oh, Biotechnol. Adv. 2013, 31, 1473-1485. [29] M.-H. Seo, K.-R. Kim, D.-K. Oh, Appl. Microbiol. Biotechnol. 2013, 97, 8987- [13] E. T. Roe, B. B. Schaeffer, J. A. Dixon, W. 8995. C. Ault, J. Am. Oil Chem. Soc. 1947, 24, 45-48. [30] Y.-C. Joo, K.-W. Jeong, S.-J. Yeom, Y.-S. Kim, Y. H. Kim, D.-K. Oh, Biochimie [14] G. Knothe, D. Weisleder, M. O. Bagby, 2012, 94, 907-915. R. E. Peterson, J. Am. Oil Chem. Soc. 1993, 70, 401-404. [31] Y.-C. Joo, E.-S. Seo, Y.-S. Kim, K.-R. Kim, J.-B. Park, D.-K. Oh, J. Biotechnol. 2012, [15] L. M. Stephenson, D. R. Speth, J. Org. 158, 17-23. Chem. 1979, 44, 4683-4689. [32] E. Y. Jeon, J.-H. Lee, K.-M. Yang, Y.-C. [16] R. Awang, S. Ahmad, Y. B. Kang, PI Joo, D.-K. Oh, J.-B. Park, Process 9804456, Malaysia, 1998. Biochem. 2012, 47, 941-947.

Chapter two: References 105

[33] A. Volkov, A. Liavonchanka, O. [50] K. R. Kim, H. J. Oh, C. S. Park, S. H. Hong, Kamneva, T. Fiedler, C. Goebel, B. J. Y. Park, D. K. Oh, Biotechnol. Bioeng. Kreikemeyer, I. Feussner, J. Biol. Chem. 2015, 112, 2206-2213. 2010, 285, 10353-10361. [51] J. Ogawa, K. Matsumura, S. Kishino, Y. [34] J. Jin, U. Hanefeld, Chem. Commun. Omura, S. Shimizu, Appl. Environ. 2011, 47, 2502-2510. Microbiol. 2001, 67, 1246-1252. [35] C. T. Hou, Nat. Biotechnol. 2009, 26, 2- [52] S. Kishino, S. B. Park, M. Takeuchi, K. 10. Yokozeki, S. Shimizu, J. Ogawa, Biochem. Biophys. Res. Commun. 2011, [36] J. W. Song, E. Y. Jeon, D. H. Song, H. Y. 416, 188-193. Jang, U. T. Bornscheuer, D. K. Oh, J. B. Park, Angew. Chem. Int. Ed. 2013, 52, [53] S. Kishino, M. Takeuchi, S. B. Park, A. 2534-2537. Hirata, N. Kitamura, J. Kunisawa, H. Kiyono, R. Iwamoto, Y. Isobe, M. Arita, [37] A. Wanikawa, K. Hosoi, T. Kato, J. Am. H. Arai, K. Ueda, J. Shima, S. Takahashi, Soc. Brew. Chem. 2000, 58, 51-56. K. Yokozeki, S. Shimizu, J. Ogawa, Proc. [38] Y. Wache, M. Aguedo, A. Choquet, I. L. Natl. Acad. Sci. USA 2013, 110, 17808- Gatfield, J. M. Nicaud, J. M. Belin, Appl. 17813. Environ. Microbiol. 2001, 67, 5700- [54] M. Takeuchi, S. Kishino, A. Hirata, S. B. 5704. Park, N. Kitamura, J. Ogawa, J. Biosci. [39] Y. Waché, M. Aguedo, J. M. Nicaud, J. Bioeng. 2015, 119, 636-641. M. Belin, Appl. Microbiol. Biotechnol. [55] C. K. M. Chen, G.-C. Lee, T.-P. Ko, R.-T. 2003, 61, 393-404. Guo, L.-M. Huang, H.-J. Liu, Y.-F. Ho, J.- [40] M. B. W. Elshof, M. Janssen, G. A. F. Shaw, A. H. J. Wang, J. Mol. Biol. Veldink, J. F. G. Vliegenthart, Recl. Trav. 2009, 390, 672-685. Chim. Pays-Bas 1996, 115, 499-504. [56] M. Engleder, T. Pavkov-Keller, A. [41] K. S. Kil, M. W. Cunningham, L. A. Emmerstorfer, A. Hromic, S. Schrempf, Barnett, Infect. Immun. 1994, 62, 2440- G. Steinkellner, T. Wriessnegger, E. 2449. Leitner, G. A. Strohmeier, I. Kaluzna, D. Mink, M. Schurmann, S. Wallner, P. [42] L. Dong, J. Cao, Y.-J. Wang, H.-J. Wang, Macheroux, K. Gruber, H. Pichler, Henan Univ. Technol. J. 2007, 47-51. ChemBioChem 2015, 16, 1730-1734. [43] W. L. Williams, H. P. Broquist, E. E. [57] A. Liavonchanka, E. Hornung, I. Snell, J. Biol. Chem. 1947, 170, 619-630. Feussner, M. G. Rudolph, Proc. Natl. [44] E. Kitay, E. E. Snell, J. Bacteriol. 1950, Acad. Sci. USA 2006, 103, 2576-2581. 60, 49-56. [58] A. Liavonchanka, M. G. Rudolph, K. [45] B. W. Boughton, M. R. Pollock, Tittmann, M. Hamberg, I. Feussner, J. Biochem. J. 1953, 53, 261-265. Biol. Chem. 2009, 284, 8005-8012. [46] D. L. Greenway, K. G. Dyke, J. Gen. [59] T. Y. Lin, C. W. Lin, Y. J. Wang, Food Microbiol. 1979, 115, 233-245. Chem. 2003, 83, 27-31. [47] M. K. Raychowdhury, R. Goswami, P. [60] W. J. Dann, T. Moore, Biochem. J. 1933, Chakrabarti, J. Appl. Bacteriol. 1985, 59, 27, 1166-1169. 183-188. [61] A. S. Demir, F. N. Talpur, J. Agr. Food [48] C. J. Zheng, J. S. Yoo, T. G. Lee, H. Y. Chem. 2010, 58, 1646-1652. Cho, Y. H. Kim, W. G. Kim, FEBS Lett. [62] M. W. Pariza, Am. J. Clin. Nutr. 2004, 2005, 579, 5157-5162. 79, 1132S-1136S. [49] E. Altermann, W. M. Russell, M. A. [63] I. Churruca, A. Fernández-Quintela, M. Azcarate-Peril, R. Barrangou, B. L. Buck, P. Portillo, BioFactors 2009, 35, 105- O. McAuliffe, N. Souther, A. Dobson, T. 111. Duong, M. Callanan, S. Lick, A. Hamrick, R. Cano, T. R. Klaenhammer, Proc. Natl. [64] M. W. Pariza, W. A. Hargraves, Acad. Sci. USA 2005, 102, 3906-3912. Carcinogenesis 1985, 6, 591-593.

Chapter two: References 106

[65] Y. L. Ha, N. K. Grimm, M. W. Pariza, [82] Y. Park, M. W. Pariza, Food Res. Int. Carcinogenesis 1987, 8, 1881-1887. 2007, 40, 311-323. [66] Y. L. Ha, J. M. Storkson, M. W. Pariza, [83] Y.-C. Huang, L. O. Luedecke, T. D. Shultz, Cancer Res. 1990, 50, 1097-1101. Nutr. Res. 1994, 14, 373-386. [67] M. W. Pariza, Y. L. Ha, in [84] K. N. Lee, D. Kritchevsky, M. W. Pariza, Antimutagenesis and Anticarcino- Atherosclerosis 1994, 108, 19-25. genesis Mechanisms II, Vol. 52 (Eds.: Y. [85] E. A. de Deckere, J. M. van Amelsvoort, Kuroda, D. M. Shankel, M. D. Waters), G. P. McNeill, P. Jones, Brit. J. Nutr. Springer USA, 1990, pp. 167-170. 1999, 82, 309-317. [68] C. Ip, S. F. Chin, J. A. Scimeca, M. W. [86] M. Evans, Y. Park, M. Pariza, L. Curtis, B. Pariza, Cancer Res. 1991, 51, 6118- Kuebler, M. McIntosh, Lipids 2001, 36, 6124. 1223-1232. [69] M. A. Belury, Nutr. Rev. 1995, 53, 83- [87] K. L. Houseknecht, J. P. Vanden Heuvel, 89. S. Y. Moya-Camarena, C. P. Porto- [70] P. W. Parodi, Aust. J. Dairy Technol. carrero, L. W. Peck, K. P. Nickel, M. A. 1996, 51, 24-32. Belury, Biochem. Biophys. Res. Commun. 1998, 244, 678-682. [71] C. Ip, Am. J. Clin. Nutr. 1997, 66, 1523S- 1529S. [88] M. E. Cook, C. C. Miller, Y. Park, M. Pariza, Poultry Sci. 1993, 72, 1301-1305. [72] N. Kimoto, M. Hirose, M. Futakuchi, T. Iwata, M. Kasai, T. Shirai, Cancer Lett. [89] C. C. Miller, Y. Park, M. W. Pariza, M. E. 2001, 168, 15-21. Cook, Biochem. Biophys. Res. Commun. 1994, 198, 1107-1112. [73] S. M. Soel, O. S. Choi, M. H. Bang, J. H. Yoon Park, W. K. Kim, J. Nutr. Biochem. [90] M. G. Hayek, S. N. Han, D. Wu, B. A. 2007, 18, 650-657. Watkins, M. Meydani, J. L. Dorsey, D. E. Smith, S. N. Meydani, J. Nutr. 1999, [74] E. Rosberg-Cody, M. C. Johnson, G. F. 129, 32-38. Fitzgerald, P. R. Ross, C. Stanton, Microbiology 2007, 153, 2483-2490. [91] Y. Lee, J. T. Thompson, A. R. de Lera, J. P. Vanden Heuvel, J. Nutr. Biochem. [75] M. Sugano, A. Tsujita, M. Yamasaki, K. 2009, 20, 848-859, 859 e841-845. Yamada, I. Ikeda, D. Kritchevsky, J. Nutr. Biochem. 1997, 8, 38-43. [92] A. C. Fogerty, G. L. Ford, D. Svoronos, Nutr. Rep. Int. 1988, 38, 937-944. [76] R. J. Nicolosi, E. J. Rogers, D. Kritchevsky, J. A. Scimeca, P. J. Huth, [93] F. Dionisi, P. A. Golay, M. Elli, L. B. Fay, Artery 1997, 22, 266-277. Lipids 1999, 34, 1107-1115. [77] D. Kritchevsky, S. A. Tepper, S. Wright, [94] P. W. Parodi, J. Dairy Sci. 1977, 60, S. K. Czarnecki, Nutr. Res. 2002, 22, 1550-1553. 1275-1279. [95] C. R. Kepler, S. B. Tove, J. Biol. Chem. [78] E. A. Decker, Nutr. Rev. 1995, 53, 49-58. 1967, 242, 5686-5692. [79] J. E. Villarreal-Lozoya, L. Lombardini, L. [96] M. Coakley, R. P. Ross, M. Nordgren, G. Cisneros-Zevallos, Food Chem. 2007, Fitzgerald, R. Devery, C. Stanton, J. 102, 1241-1249. Appl. Microbiol. 2003, 94, 138-145. [80] Y. Park, K. J. Albright, W. Liu, J. M. [97] R. Kapoor, M. Reaney, N. D. Westcott, Storkson, M. E. Cook, M. W. Pariza, in Bailey's Industrial Oil and Fat Lipids 1997, 32, 853-858. Products (Ed.: F. Shahidi), John Wiley & Sons, Inc., 2005. [81] J. P. DeLany, F. Blohm, A. A. Truett, J. A. Scimeca, D. B. West, Am. J. Physiol. [98] J. Jiang, L. Bjorck, R. Fonden, J. Appl. Regul. Integr. Comp. Physiol. 1999, 276, Microbiol. 1998, 85, 95-102. R1172-R1179.

Chapter two: References 107

[99] L. Gorissen, S. Weckx, B. Vlaeminck, K. Damborsky, PLoS Comput. Biol. 2012, 8, Raes, L. De Vuyst, S. De Smet, F. Leroy, e1002708. J. Appl. Microbiol. 2011, 111, 593-606. [115] V. Munoz, L. Serrano, Curr. Opin. [100] L. Alonso, E. P. Cuesta, S. E. Gilliland, J. Biotechnol. 1995, 6, 382-386. Dairy Sci. 2003, 86, 1941-1946. [116] A. M. Facchiano, G. Colonna, R. Ragone, [101] J. Ogawa, S. Kishino, A. Ando, S. Protein Eng. 1998, 11, 753-760. Sugimoto, K. Mihara, S. Shimizu, J. [117] A. Hiseni, R. Medici, I. W. Arends, L. G. Biosci. Bioeng. 2005, 100, 355-364. Otten, Biotechnol. J. 2014, 9, 814-821. [102] S. S. Peng, M. D. Deng, A. D. Grund, R. [118] J. E. Jackson, R. F. Paschke, W. Tolberg, A. Rosson, Enzyme Microb. Technol. H. M. Boyd, D. H. Wheeler, J. Am. Oil 2007, 40, 831-839. Chem. Soc. 1952, 29, 229-234. [103] S.-O. Lee, G.-W. Hong, D.-K. Oh, [119] T. Y. Lin, C. W. Lin, Y. J. Wang, J. Food Biotechnol. Prog. 2003, 19, 1081-1084. Sci. 2002, 67, 1502-1505. [104] M. Macouzet, B. H. Lee, N. Robert, J. [120] F. E. Luddy, R. A. Barford, R. W. Appl. Microbiol. 2009, 106, 1886-1891. Riemenschneider, J. Am. Oil Chem. Soc. [105] C. N. S. P. Suurmeijer, M. Pérez- 1960, 37, 447-451. Gilabert, H. T. W. M. van der Hijden, G. [121] P. E. Hughes, W. J. Hunter, S. B. Tove, J. A. Veldink, J. F. G. Vliegenthart, Plant Biol. Chem. 1982, 257, 3643-3649. Physiol. Biochem. 1998, 36, 657-663. [122] L. Whitmore, B. A. Wallace, [106] E. Hornung, C. Krueger, C. Pernstich, M. Biopolymers 2008, 89, 392-400. Gipmans, A. Porzel, I. Feussner, Biochim. Biophys. Acta, Mol. Cell Biol. [123] L. Whitmore, B. A. Wallace, Nucleic Lipids 2005, 1738, 105-114. Acids Res. 2004, 32, W668-673. [107] J. Kohno-Murase, M. Iwabuchi, S. Endo- [124] M. Macouzet, B. H. Lee, N. Robert, J. Kasahara, K. Sugita, H. Ebinuma, J. Appl. Microbiol. 2010, 109, 2128-2134. Imamura, Transgenic Res. 2006, 15, 95- [125] J. Farmani, M. Safari, F. Roohvand, S. H. 100. Razavi, M. R. Aghasadeghi, H. [108] B. Zhang, C. Rong, H. Chen, Y. Song, H. Noorbazargan, Eur. J. Lipid Sci. Technol. Zhang, W. Chen, Microb.Cell Fact. 2012, 2010, 112, 1088-1100. 11, 1-8. [126] K. Miyazaki, Methods Mol. Biol. 2003, [109] D. Hao, H. Chen, G. Hao, B. Yang, B. 231, 23-28. Zhang, H. Zhang, W. Chen, Y. Q. Chen, [127] E. G. Bligh, W. J. Dyer, Can. J. Biochem. Biotechnol. Lett. 2015, 37, 1983-1992. Physiol. 1959, 37, 911-917. [110] U. T. Bornscheuer, R. J. Kazlauskas, [128] M. Y. Jung, G. B. Kim, E. S. Jang, Y. K. Angew. Chem. Int. Ed. 2004, 43, 6032- Jung, S. Y. Park, B. H. Lee, J. Dairy Sci. 6040. 2006, 89, 90-94. [111] E. Kugelberg, E. Kofoid, A. B. Reams, D. [129] E. Fischer, A. Speier, Chem. Ber. 1895, I. Andersson, J. R. Roth, Proc. Natl. 28, 3252-3258. Acad. Sci. USA 2006, 103, 17319-17324. [130] X. Li, W. Eli, J. Mol. Catal. A: Chem. [112] U. Bergthorsson, D. I. Andersson, J. R. 2008, 279, 159-164. Roth, Proc. Natl. Acad. Sci. USA 2007, 104, 17004-17009. [131] E. Krieger, G. Koraimann, G. Vriend, Proteins 2002, 47, 393-402. [113] M. Macouzet, N. Robert, B. H. Lee, Appl. Microbiol. Biotechnol. 2010, 87, [132] PyMol, The PyMOL Molecular Graphics 1737-1742. System, Version 1.5.0.4 Schrödinger, LLC. [114] E. Chovancova, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. [133] N. Sreerama, S. Y. Venyaminov, R. W. Gora, V. Sustr, M. Klvana, P. Medek, L. Woody, Anal. Biochem. 2000, 287, 243- Biedermannova, J. Sochor, J. 251.

Chapter two: References 108

[134] N. Sreerama, S. Y. Venyaminov, R. W. Woody, Biophys. J. 1999, 76, A381- A381. [135] N. Sreerama, R. W. Woody, Anal. Biochem. 2000, 287, 252-260.

There can be no doubt that all of our cognition begins with experience. Immanuel Kant

111

Chapter 3. Part I

Revealing the evolutionary relations between epoxide hydrolases and haloalkane dehalogenases by inter- conversion studies - applying catalytic promiscuity

The large and diverse family of α/β hydrolase-fold enzymes underwent extensive research in the last twenty years.[1] In the light of evolution, a strong relationship between the ubiquitous protein scaffold and the variety of performed chemical reactions can be assumed. As the appearance of new enzymatic activities is strongly connected to the necessary selection, catalytic promiscuity was discussed as important element for divergent evolution.[2-8] Different enzyme classes of the described were able to adapt the activity of another class, also during laboratory directed evolution.[9-10] The epoxide hydrolases (EH, EC 3.3.2.10) and haloalkane dehalogenases (HLD, EC 3.8.1.5) share the identical catalytic triad, but additional residues for transition state stabilization are either Trp/Asn-Trp in HLDs or Tyr-Tyr/His in EHs.[11-23] The interconversion of hydrolases with aspartic nucleophiles was not investigated in the past and thus, the evolutionary relations, the ancestral development of the different activities and the acceptance of mutagenic substitutions within the popular protein scaffold was explored to figure out their potential for further (ancestral) enzyme engineering.

Part of this chapter has been published: M.P.C. Fibinger, T. Davids, D. Böttcher, U.T. Bornscheuer, Appl. Microbiol. Biotechnol. 2015, 99, 8955-8962. doi: 10.1007/s00253-015-6686-y and M. Dörr, M.P.C. Fibinger, D. Last, S. Schmidt, J. Santos-Aberturas, C. Vickers, M. Voss, U.T. Bornscheuer, Biotechnol. Bioeng. 2016, 113, 1421-1432. doi: 10.1002/bit.25925 Chapter three|Part I 112

Content

Introduction ...... 114

Protein engineering - versatile techniques ...... 115

Catalytic promiscuity and the evolution of enzymes ...... 119

Epoxide hydrolases: protein engineering and organic synthesis ...... 125

Haloalkane dehalogenases: protein engineering and detoxification ...... 130

The interconversion hypothesis ...... 137

Results ...... 138

Rational design of the epoxide hydrolase CIF ...... 138

Cloning, cellular expression and protein purification ...... 140

Structural integrity of the mutants - Circular dichroism ...... 140

Activity tests and kinetic measurements ...... 144

Iwasaki assay for haloalkane dehalogenase activity ...... 144

Adrenaline assay for epoxide hydrolase activity ...... 147

High-throughput HLD screening assay on a robotic platform ...... 149

Selection assay towards haloalkane dehalogenase activity ...... 150

Gas chromatography analysis ...... 158

Isothermal titration calorimetry ...... 158

Tryptophane quenching ...... 163

Structural modeling ...... 164

Discussion...... 167

Summary ...... 175

Materials and Methods ...... 176

Rosetta calculations ...... 176

Quantum mechanical evaluation of the transition state in HLD catalysis ...... 176

Predictive Rosetta calculations ...... 177

Gene cloning and site directed mutagenesis ...... 177

Rational design - site directed mutagenesis ...... 177

Rational design - domain exchange...... 179

Directed evolution ...... 179 Chapter three|Part I: Content 113

Biochemical characterization ...... 179

Circular dichroism spectrometry (CD) - spectra scan ...... 179

Circular dichroism spectrometry (CD) - temperature scan ...... 180

Differential Scanning Calorimetry (DSC) ...... 181

HLD-activity assay according to Iwasaki et al.[325] ...... 182

Phenol red assay ...... 182

Adrenaline assay ...... 183

Robotic GC/MS at Loschmidt Laboratories, Masaryk University, CZ ...... 183

Isothermal titration calorimetry (ITC) ...... 183

Tryptophane fluorescence quenching ...... 184

Growth analysis and selection assay ...... 185

Toxicity tests ...... 185

Selection assay...... 185

Computational methods ...... 186

CIF mutants ...... 186

Homology modeling of cap variants ...... 186

Molecular docking ...... 186

Tunnel cluster analysis ...... 187

Abbreviations ...... 188

References ...... 189

Chapter three|Part I 114

Introduction

Enzymes have been used as biological catalysts for thousands of years in traditional processes such as fermentation to yield wine or beer. The hypothesis that living microorganisms are needed to perform this process was drawn in the 19th century.[24-25] The final evidence for the involvement of enzymes at the enrichment of ethanol is only one hundred years old.[26] With the emerging knowledge about the function and organization of microorganisms and the pathway connections of the metabolic flux implemented by the activity of biocatalysts, the industrial interest towards enzymes increases. In contrast to traditional chemical synthesis, enzymes have ecological and economic advantages. They perform the transitions usually under mild conditions and may replace cost intensive processes and prevent hazardous wastes.[27] In contrast, only few processes can be adapted towards the necessary conditions, where enzymes exhibit their optimal activity and mostly, the reaction rates are often too low to be economic for defined processes. Schmid et al. stated in 2001 that enzymes are recognized as suitable alternatives to traditional methods and enable new routes to complex synthons.[28] Nowadays, enzymes are an inherent part of industrial synthesis.[29-32] Thus, their widespread applications last from crude oil modification (lipases) over bioremediation (dehalogenases, ) and textile industry (amylases, cellulases, proteases) to pharmaceutical purposes (esterases, oxidoreductases, transaminases). Therein, the different properties of the biocatalysts are used, e.g. a broad substrate spectrum for cleaning procedures (proteases, cellulases, dehalogenases) or a very defined, but selective catalysis for enantiopure compounds (esterases, epoxide hydrolases).[28, 30-31] Especially in the latter case, biocatalysts are advantageous, because the synthesis of chiral compounds with defined chemo-, regio- and stereoselective conversion is highly demanded and mostly hard to achieve by traditional chemistry. The most famous examples are derived in medicine, e.g. the enzymatic derivatization of antibiotics. Furthermore, enzymes may be used during actual cancer treatment.[33] The application of L-asparaginase was reported to suppress growth and induce apoptosis in tumor cells during acute lymphoblastic leukemia.[34-35] In the emerging field of biocatalysis, new synthesis routes, elegant catalysis and the development of new catalysts still continue. All enzyme classes have an equivalent application in industrial biocatalysis, but at least hydrolases were reported to be the most commonly used.[7] In many cases, the biocatalytic process can be improved by process engineering, metabolic engineering (if the reaction is performed with whole cells) or protein engineering. The process can be adapted by reactor design, purification procedures or downstream processing, what covers mostly engineering work to optimize the environmental conditions and the ecology and economy of the whole process. Metabolic engineering can be performed by random mutagenesis of described Chapter three|Part I: Introduction 115 strains or by strain selection/screening to achieve high-level producers. However, it is common to block existing pathways or introduce heterologous genes to enable new pathways to produce the desired compound. Changes in the metabolism usually require genomic engineering and may be even hard to perform, as microorganisms may behave capriciously, because not all mechanisms of the complex regulation systems are known. A relatively fast and highly demanded opportunity is the adaption of the enzyme itself by protein engineering, enabling new catalytic properties, stability and specificity.

Protein engineering - versatile techniques Protein engineering is the method of choice for changing the physico- and biochemical properties of the versatile biocatalysts. For industrial biocatalysis it is often strongly desired either to adapt the polypeptides for more economic process conditions (temperature, pressure and solvent stability) or to increase its catalytic abilities.[31, 36] Additionally, industrial relevant applications cover organic synthesis and the application of catalysts for synthesis or the enrichment of enantiopure compounds.[30] For optimization approaches, usually the genetic sequence of the protein is targeted to randomize defined amino acids, sequences, special domains or the complete gene, to yield a mutant library.[37] Afterwards, the expressed protein is screened or selected for the desired properties.[38] Manifold opportunities for the initial creation of the protein mutant library were reported (Table 1). The methods can be roughly summarized as in vivo or in vitro, depending on the decision, if the mutagenesis takes place within a living cell or not. Furthermore, it is possible to differentiate between focused and random methodologies. Thus, the knowledge of the protein sequence and structure enables the opportunity to decrease the number of mutants, while focusing on residues at hot spots (e.g. active site, entrance site, tunnel residues, mobile parts/domains or oligomerization sites). This is crucial, because with the number of randomized/mutated positions, the number of screened variants increases exponentially. Thus, for random libraries, selection procedures are demanded or substituted/combined with high-throughput screening technologies, even if the library coverage is reduced to gain not the best, but even better enzymes.[39-40] Also a focused mutagenesis can request the randomization of certain residues, thus the subsequent screening process has to be adapted adequately. Finally, mutagenesis methods can either perform base substitutions, deletions or insertions or enable hybridization with additional gene copies/variants or related genes. Certainly, the border between randomization and hybridization techniques cannot be exactly defined, but shuffling methods are exceptional examples for gene rearrangements or chimeragenesis. Throughout the possibilities, each method exhibits special advantages and disadvantages, depending on their final application. Error-prone PCR (epPCR) and gene shuffling are the most common techniques, but can be specially Chapter three|Part I: Introduction 116 refined by the usage of nucleotide analogs (epPCR) or the usage of strongly related/homologous genes (shuffling).

Table 1: Engineering methodologies for library generation during directed evolution or rational design.[41-42] The methods are clustered by their technical properties and the comments should indicate usefulness for individual applications. The method clusters are divided by their application in vitro (upper section), in vivo (lower section) or both (middle section).

Cluster Methods Comment Random PCR epPCR[43-44], RID[45], SeSam[46], No structural information required, big techniques RNDM[47] library sizes, synergistic mutations Focused evolution of target sites, MAX randomization[48] Codon Site-specific decreased library sizes, sequential and Shuffling[49], CAST[50], ISM[51] structural information necessary overlap extension PCR[52-53], Shuffling of domains, loops or ITCHY[54], DOGS[47], SHIPREC[55], sequences, low required homology, Recombination MURA[56], SCRATCHY[57], RDA[58], sequence order permutation, missing MAGIC[59] general strategies Small libraries, fast in silico consensus design[60], SCHEMA[61], calculations, extensive computational Computational ISOR[62], 3DM[63] REAP[64], knowledge, structural or phylogenetic Rosetta[65-66], FRESCO[67], JANUS[68] data presupposed Mutagenesis of total DNA content, in EMS[69], MNNG[70], metal ions[71], vivo and in vitro applicable, no Conventional 2-AP[72], X-rays[73], UV[74] information of sequence/structure necessary DHR[75], gene shuffling[76], family shuffling[77], synthetic shuffling[78], Synergistic mutations, orthologous Homologous CMCM[79], ADO[80], RETT[81], shuffling, high sequence homology on Recombination SISDC[82], SCOPE[83], StEP[84], nucleotide level necessary, usage of RACHITT[85], NExT[86], heritable natural diversity recombination[87] No structural information necessary, XL1-red[88], pMutD5[89], PACE[90] curable process, adaptable towards Mutator strains (MP6), pEP[91], orthogonal continuous evolution, lower mutation replication (yeast)[92] rates, unbalanced substitutions ADO - assembly of designed oligonucleotides, 2-AP - 2-aminopurine, CAST - combinatorial active-site saturation test, CMCM - combinatorial multiple cassette mutagenesis, DHR - degenerate homoduplex recombination, DOGS - degenerate oligonucleotide gene shuffling, EMS - ethyl methane sulfonate, FRESCO - framework for rapid enzyme stabilization by computational libraries, ITCHY - incremental truncation for the creation of hybrid enzymes, ISM - iterative saturation mutagenesis, MAGIC - mating-assisted genetically integrated cloning, MNNG - 1-methyl-3-nitro-1-nitrosoguanidine, MURA - mutagenic and unidirectional reassembly, NExT - nucleotide exchange and excision technology, PACE - phage-assisted continuous evolution, RACHITT - random chimeragenesis on transient templates, RDA - recombination-dependent exponential amplification, REAP - reconstructed evolutionary adaptive path, RETT - recombined extension on truncated templates, RID - random insertion/deletion, SCOPE - structure-based combinatorial protein engineering, SeSaM - sequence saturation mutagenesis, RNDM - random drift mutagenesis, SHIPREC - sequence homology- independent protein recombination, SISDC - sequence-independent site-directed chimeragenesis, StEP - staggered extension process. Chapter three|Part I: Introduction 117

For the latter technique, a very elegant refinement was described by Müller et al.[86] Instead of simple DNAse digestion and re-ligation, the gene of interest is amplified by PCR with incorporation of a fixed amount of dUTP. In the process called nucleotide exchange and excision technology (NExT), the product undergoes cleavage by first, uracil deglycosylase and then apurinic/apyrimidinic treatment. Thus, the lengths of the desired fragments for recombination are dependent on the initial dUTP concentrations. Other hybridization methods uses PCR protocols (StEP[84], overlap extension PCR[52-53], synthetic shuffling[78]) or other truncation protocols (ITCHY[54]). However, directed evolution with the versatile randomization techniques and the large mutant libraries is contrasted to the rational design approaches, using sequential (multiple sequence alignments, MSA), phylogenetic (consensus design), or structural (3DM) information for the critical analysis of related genes to draw conclusions about catalytic mechanisms, essential or substitutable amino acids and function relationships. Usually, essential amino acids are organized in comparable structural folds and thus framed by conserved amino acid sequences. The necessary information about the structure and ideally also of the mechanism, cofactors, oligomerization sites, entrance sites or the subcellular localization, enables significantly smaller mutant libraries. The randomization of identified hot spot residues is summarized as focused directed evolution, e.g. in CAST, codon shuffling or focused domain shuffling.[93] The more knowledge about the target protein and correlated or homologous relatives exists, the easier is the pretended prediction about the influence of targeted hot spot amino acids. However, with computational aid, various alignments of proteins can be performed and also manifold software exist for the prediction of biophysical properties as solvent or temperature stability, as well as solubility and oligomerization states. The software Rosetta was developed to enable the de novo design of enzymes.[65-66, 94] Thus, a Kemp eliminase was successfully designed to enable a chemical reaction, which was never described to occur in nature (Figure 1).[95]

B B  H H O2N O2N O2N CN N N O O  OH  HX

Figure 1: Reaction scheme of the Kemp elimination. The geometry and the partial charges served as critical information about the scaffold definition, the residue orientation and the final protein design. The figure is adapted from Röthlisberger et al.[95] Chapter three|Part I: Introduction 118

With this program, catalytic residues of the virtual active site were positioned and the lowest energy conformation for the transition state stabilization of the theoretical enzymes (“theozymes”) was computed. Then, the RosettaMatch algorithm searched for a suitable, existing protein scaffold yielding more than 100,000 possible hits. A gradient- based minimization refined the geometrical constraints by optimization of the total fit, the transition state stabilization and residue orientation, while respecting the possible torsions of the peptide chain. The algorithm enriched structures from the triosephosphate isomerase (TIM) barrel family and thus, the final scaffolds were proposed to exhibit also TIM barrel folds. Additionally, de novo designed retro-aldolases were reported, which were found also to be able to perform Michael additions.[96-97] However, the de novo enzymes needed additional enhancements by protein engineering and became suitable catalysts with high turnover numbers (reviewed by Kries, Blomberg and Hilvert[98]). This design concept also improves the development of non-catalytic proteins.[99] The comparison of the protein of interest to related proteins is a necessary and enriching basis for the planning of the individual engineering purposes. The basic alignment can be performed by the basic local alignment search tool (BLAST).[100] More recent methods like HMMER3[101] or HHblits[102] create faster and more convenient alignments with higher accuracy, what is necessary during the emerging number of reported sequences from metagenome studies or high-throughput sequencing.[103] These tools allow the combination of beneficial mutations during consensus approaches.[104] Thus, derived from MSAs, the thermal stability, catalytic efficiency or the substrate scopes of enzymes were enhanced.[105] The alignment by already described structures can be performed within 3DM.[63] Therein, protein structures are aligned and depending on the variance, core regions become defined, which are structurally conserved in the most members of the alignment. Then, the core regions are transferred to the sequential alignment, but the cores will be renumbered by exclusive 3D numbers. These core sequences are the starting points for the multiple sequence alignments towards sequences without a described structure. Such alignments bare the possibility to draw parallels between structure-function relationships and are valuable for the critical analysis of an unknown protein or an engineering target. However, the alignment may miss additional structural parts, which are located in the variable sections of a protein and are thus hard to identify. Within the 3DM database, links to current publications, known mutations and the embedding of a structural software (YASARA[106]) are excellently programmed and hence 3DM was used to successfully improve the thermostability of an esterase[107], to map the acceptor site of a sucrose phosphorylase[108], to improve a transaminase embedded in a cascade reaction[109] and to enhance the enantioselectivity of esterases[110-111]. Additionally, the combination of computational algorithms enables the synergistic power for complementing in a joint method. An exceptional example is the FRESCO method, covering different algorithms to provide powerful predictions, exemplified by a +35 °C increased melting temperature of Chapter three|Part I: Introduction 119 the investigated limonene epoxide hydrolase.[67] This should underline the strength of computational predictions for consensus approaches, which use the currently described diversity of natural enzymes and the extracted knowledge from the different studies. Usually, the in silico prediction can be performed as fast as the in vitro or in vivo approaches, but must be finally proved in the biological system. Additionally, they require mostly a higher expertise, but may finally save consumables and are less cost intensive. An excellent overview about current software and tools for enzyme design and engineering was reported by Damborsky and Brezovsky.[112]

Catalytic promiscuity and the evolution of enzymes Enzymes are popularly known as specialized biocatalysts, exhibiting well defined properties to perform chemical reactions with high selectivities. Thereby, enzymes usually prefer a main reaction and are able to execute it with respectable reaction rates. Though, the versatile catalysts are also flexible concerning their performance, because their reactions may be more or less selective, but additional substrates can be accepted inside the catalytic cleft, leading to a similar transformation. The substrate ambiguity is also known as substrate promiscuity and widespread through the enzyme classes and can be related to different conformational pathways.[113-114] The broadest substrate scope will be recognized for enzymes with detoxification mechanisms, thus ensuring the conversion of more than one harmful substrate during the selective natural competition. Furthermore, enzymes usually act in aqueous environments inside a temperature frame of 0-100 °C, the liquid aggregation state of water under normal conditions. However, certain enzymes are able to accept very high co-solvent concentrations[115], act in pure organic compounds or perform the desired reaction immobilized in the dry phase in solid/gas bioreactors[116]. This condition promiscuity mostly enables the application of enzymes in industrial processes, whereas also very sensitive enzymes can be adapted by protein engineering methods.[6, 28] Finally, the enzymes can be able to perform another chemical reaction within the catalytic pocket or at another site, leading to a different chemistry, the enzyme is suited for. These “moonlighting” activities were investigated in manifold studies. The border between catalytic and substrate promiscuity is not very clearly drawn, because various reports call an enzyme promiscuous, while it accepting an analogue substrate with different outcome. However, mostly accepted is the definition that an enzyme obtains catalytic promiscuity, if the nature of the bond, which is cleaved or formed, changed. Thus, esterases and amidases are different enzyme families, but the nature of the amide bond and the ester bond is not crucially different, but serves as one of the simplest examples of catalytic promiscuity.[117] A more convenient definition would be the ability of an enzyme to switch the class, e.g. changing from an oxidoreductase to a lyase or from a to an isomerase.[118] Another, more biological promiscuity classification also covers proteins in general.[119] However, during the path of natural evolution, a new activity may be obtained from an existing one and thus, the natural diversity of today’s Chapter three|Part I: Introduction 120 enzymes was evolved. For this process, it is widely accepted, that a gene duplication event serves as the basis for the establishment of new functional enzymes.[120] Horowitz postulated the term retrograde evolution, wherein the metabolic pathways developed stepwise.[121] This is based on the assumption that in ancestral ages, all growth compounds were available. During the increasing population density, nutrients became scarce and the mutant organism survived, which can synthesize the essential compound from a precursor, until this compound was limited. Thus, the pathways were established in a reverse order, but due to uncertainty about the composition of the primeval soup, the organic environment is not clearly described and furthermore many intermediates are quite unstable.[122-123] As ancestral enzymes were assumed to have broad substrate specificity, the evolutionary changes must proceed from a less restrictive state to a more restrictive.[124] The substrate ambiguity of precursor enzymes is also present in actual pathways, performed by modern enzymes, as in case of the promiscuity in the metabolism of branched-chain-amino-acids, where the first steps are generally performed by one transaminase and the dehydrogenase/transferase complex for all branched chain amino acids. Enzymes share ancestral homologs, which were recruited for different chemical transformations (which can also happen non-enzymatic[125]) in a relatively low regulated ancestral microorganism.[124, 126] Thus, the pathways become related (Figure 2) and the secondary metabolism evolved later in time, by the adaption of available enzymes. At least, a small amount of the produced compound must be of selection advantage, e.g. it would not be harmful in a way of toxicity or reducing the amount of essential progenitors.

A COOH COOH COOH + + COOH COOH COOHAc-CoA CoA H2O H2O NADP NADPH + H CO2 HO COOH COOH COOH COOH O HO O O HOOC COOH COOH COOH COOH COOH oxaloacetate citrate cis-aconitate isocitrate oxalosuccinate 2-oxoglutarate

B HOOC HOOC HOOC HOOC HOOC HOOC + + Ac-CoA CoA H2O H2O NADP NADPH + H CO2 HO COOH COOH COOH COOH O HO O O HOOC COOH COOH COOH COOH COOH 2-oxoglutarate homocitrate cis-homocitrate homoisocitrate oxaloglutarate 2-oxoadipate Figure 2: Homologous reaction sequences in the biochemical tricarboxylic acid cycle (A) and in lysine biosynthesis by the AAA pathway (B).[124, 127]

Thus, the enzymatic evolution has to be initiated by tolerant progenitors and not by plastic ancestors. To present different models, it was first stated, that the divergent evolution requires a free gene copy, which is able to accumulate mutations to gain a new established and selectable phenotype.[128] This MDN model (mutation during non- functionality) was proposed to explain the newly occurring enzyme activities, while ensuring a stable process for the evolutionary context (Figure 3). Chapter three|Part I: Introduction 121

Figure 3: MDN model for the evolution of a new enzyme functionality. The corresponding gene of an ancestor enzyme became duplicated and the additional copy may accumulate mutations to gain a new function, while the other allele is preserved by natural selection.

This model lacks the factor of gene loss, which occurs at an even higher rate, then the total chance to acquire a new functionality. During the selective competition in evolution, additional metabolic burden is rapidly removed from the microorganism to ensure its survival (Figure 4).[129-130]

Figure 4: Frequency variance of the additional gene copy (A2) over time. While one copy of the gene duplication event is maintained by positive selection (A1), the extra allele undergoes mutation and drift, until a new function can be acquired with a high chance to be finally lost. The picture is adapted from Bergthorsson et al. and corresponds to the MDN model of enzyme evolution.[129-131]

The new evolved gene copy is initially neutral and may be lost by drift or the accumulation of deleterious mutations. Thus, the gene copy has to drift by a high frequency throughout the population and must remain intact until the new selectable function can be acquired by relatively rare advantageous mutations.[2, 130] As the gene Chapter three|Part I: Introduction 122 copy is freed from selection pressure, the chance for negative selection is orders of magnitude higher (Figure 5).

Figure 5: Loss or gain of function rates for additional gene copies during selective evolution. Most pathways lead to copy loss, if no new, selectable function is acquired. The given numbers are estimated and may vary for different bacteria. The figure is adapted from Bergthorsson et [130] al. Ne - population size

A more convenient model covers the knowledge about promiscuous enzymes, hypothesizing that enzymes gain (or still exhibit) a promiscuous side activity through their innovation by neutral mutations. During the selective evolution, the side activity became advantageous and is positively selected. To ensure the organisms fitness, the related genes became amplified and are now able to evolve divergently (Figure 6).[132]

Figure 6: IAD (innovation-amplification-divergence) model for the evolution of enzymes. An ancestor enzyme/protein obtained a promiscuous function (“moonlighting activity”) through neutral mutations, which became beneficial during evolutionary selection. Thus, the corresponding genes are amplified to initially increase the fitness by increasing the side activity. Subsequently, the promiscuity can be preserved or the gene copies evolve divergently towards Chapter three|Part I: Introduction 123 their selected activities/functions. During the mutational improvement of the side function, null mutations and gene lost, as well as recombination and synergistic accumulation of beneficial mutations fortify the evolutionary process.

Thus, it can be accepted that ancestor enzymes were either specialists on their defined activity or generalists, allowing survival on different growth substrates and tolerating toxic or extreme environments. The evolutionary progress does not need to reverse itself, because also horizontal optimization is possible.[124] Even when the enzyme initially lacks any significant activity, by large insertions or domain swapping of related enzymes combined in the ancestor variants[2], modest activities can ensure an increased fitness.[8, 133] For natural processes, bidirectionality is characteristic. Thus, generalists can turn into specialists, by losing their side activity and specialists can even turn into generalists without loss of the initial activity.[5] These two-sidedness fortifies a fast adaption to new environments and is an impulse for adaptive evolution (Figure 7).

Figure 7: Routes to new enzymatic functions. One specialist with a minor side activity can turn into a generalist by a weak negative trade-off of the original activity. In many cases, promiscuous enzymes can keep the original activity (or at least it is decreased in a tolerable rate), while increasing the new functionality in high magnitudes.[5] The route of strong negative trade-off is also possible, e.g. for additional gene copies, accumulating mutations to gain a new function.[134] Concerning the latter route, a loss of function and thus a loss of the non-sense DNA is highly probable, while developing from a more restrictive state to a less restrictive one. This route also has fewer examples in literature. The figure is adapted from Khersonsky et al.[5]

During the evolutionary process, different protein scaffolds were established, depending on the desired properties and possibly by the acceptance of different side activities. Thus, in the tautomerase superfamily, the catalytic proline residue (amino terminal P1) may serve as base or acid, depending on the protein environment, which defines the pKa of the amino function. Thus, dehalogenase, epoxide hydrolase, decarboxylase and esterase reactions are performed.[135] Interestingly, the youngest out of the five families Chapter three|Part I: Introduction 124 may be the cis-3-chloroacrylic acid dehalogenase family, identified in soil dwelling bacteria, which degrade 1,3-dichloropropane, which has been introduced in the environment since the 1940s as nematocide.[136-137] The precursor of this dehalogenase reaction was identified as a related phenylenolpyruvate tautomerase with a dehalogenase side reactivity, which was 2.5·106-fold lower. Also the α/β hydrolase fold is found in many hydrolytic enzymes with very different phylogenetic origin and diverse catalytic functions.[1] For this enzyme class, the conversion of an esterase into an epoxide hydrolase and the transformation of a plant esterase into a hydroxynitrile lyase were exceptional examples. Nowadays, many different examples were reported on catalytic promiscuity. Thus, the sulfuryl transfer catalyzed by a pyruvate kinase[138], the sulfatase activity of an alkaline phosphatase[139] and a vanadium-dependent chloroperoxidase with phosphatase activity[140-141] were described. The relationship between enzymes with overlapping catalytic activities always indicate a common ancestor as was reported for the , threonine and serine dehydratase.[142-144] In this context it should be mentioned that this was also investigated for the tryptophane synthase and the correlated dehydrogenase.[145] A more common example is the family of pyridoxal-5’-phosphate (PLP)-dependent enzymes, where many described side reactions of one enzyme are main activities of another PLP-dependent enzyme.[146] Thus, aminotransferases, racemases, decarboxylases and eliminases were reported and also an aldolase reaction was obtained.[147] This strongly shows the high versatility of these enzymes, obtaining the specialized activity only by the correct geometric orientation of the substrate, guided by the active-site residues.[148-149] Furthermore the Candida antarctica lipase B (CAL-B) was found to perform aldol additions[150] or Michael-type additions[151], the addition of transition-metal complexes turned streptavidin into a hydrogenation catalyst[152] or myoglobin into an oxidase[153]. Even proteins, known for specific binding can develop catalytic activity during protein engineering, as was shown for catalytic antibodies (hydrolysis, elimination, decarboxylation, isomerization, aldol condensation, oxidations).[154-155] Many additional examples were reported and thus, the enzymatic promiscuity was proved to occur naturally, but is also able to be engineered in the laboratory.[2-4, 156-157] Finally, it has to be stated, that the exploration of side activities can be quite challenging, because it has to be paid respect to the reaction conditions and the purity of the charges, e.g. it was firstly described that the ribose-binding protein was turned into a triosephosphate isomerase (TIM), but it was figured out that the TIM activity was related to an impurity by a stepwise imidazole elution during the purification process. Thus, an applied gradient yielded a wild-type TIM fraction, which was otherwise co-eluted. This led to the final retraction of the article.[158-159] Furthermore a hydroxynitrile lyase from almond meal was reported to be able to add cyanide to imines[160] and CAL-B was described to perform decarboxylative aldol addition and the Knoevenagel reaction.[161] In both cases, the experiments were reproducible, but Chapter three|Part I: Introduction 125 it was shown that the described reactions happen spontaneously without the respective enzymes.[162-163] However, in all examples performing protein engineering it was observed that the closer the mutations are to the active site, the more the increase/decrease in the catalytic rate was observed. Thus, a stronger focus should be set towards the active site, whereas more distant mutations stronger influence the overall protein stability.[4, 164] The enzymatic evolution from ancestors to the diverse variance of today’s enzymes is highly interesting for biocatalysis and enzyme engineering. The redesign of enzymes for novel catalytic activities represents an important challenge for protein engineering. Changing the type of reaction opens an opportunity to construct new enzymes for reactions not occurring in the nature. In some cases, the temperature stability of the investigated proteins was improved to finally ensure a higher tolerance to introduced mutations.[165] Indeed, the ancestor molecules seem to be more tolerant towards mutagenesis, delivering a new platform for the development of tailor-made biocatalysts.

Epoxide hydrolases: protein engineering and organic synthesis The first reports about microbial epoxide hydrolases (EH) were published in 1991 and 1993.[166] The enzymes are able to open the strained, three-membered ring of epoxides to yield vicinal diols (Figure 8). If the functionality is localized in a more complex chemical structure, the products are typically chiral at one or both carbon atoms, harboring each hydroxy function. The selectivity of enzymes is one of the most important reasons for their application in chemical synthesis. Chemically, chiral epoxides can be synthesized by the Sharpless-epoxidation or the method of Jacobsen-Katsuki.[167- 170] Unfortunately, these methods are either limited in the applicable substrates or by 2 the poor enantioselectivity. The enzymatic hydrolysis is performed via a SN -mechanism and the product configuration is dependent on the attacked carbon atom. Thus, retention and inversion of the configuration can be observed if the less or the more hindered carbon atom undergoes the nucleophilic attack, respectively.[23] This specificity is exclusive for epoxide hydrolases and resulted in their industrial application as asymmetric catalysts.[22] Thus, not only a high variety of chiral diols is accessible, but also the residual epoxide fraction becomes enantioselectively enriched and is deracemized. Moreover, the application of two epoxide hydrolases in one reaction yields enantiopure products in an enantioconvergent synthesis (Figure 8).[171] Thus, epoxide hydrolases are able to convert racemic and meso-epoxides regio- and stereoselectively.[172-173] For deracemization processes, the combined chemoenzymatic resolution is possible, e.g. for 2,2-disubstituted epoxides.[174] Contrarily, only few reports exist about the conversion of tri-substituted epoxides.[175-176] Epoxide hydrolases are ubiquitous in nature.[174] In mammals, five different epoxide hydrolase types were reported: The cytosolic, the [17, 177] microsomal, the cholesterol, leukotriene A4 and the hepoxilin epoxide hydrolases. Chapter three|Part I: Introduction 126

OH O A. niger EH O +

HO 98%ee, 19% yield 83%ee, 47% yield

B. sulfurescens EH OH O O +

HO 96%ee, 23% yield 51%ee, 54% yield

A. niger EH + OH O B. sulfurescens EH

HO 89%ee, 92% yield Figure 8: Enantioconvergent synthesis of (R)-1-phenyl-1,2-dihydroxy ethane by fungal epoxide hydrolases (EH) from Aspergillus niger and Beauveria sulfurescens. The enantiomeric excess and the yield are obtained after 2 h reaction time. The retention of configuration with the first EH and the inversion of configuration with the second EH are caused by the selectivities of each enzyme for the other epoxide carbon atom during the nucleophilic attack. Adapted from Buchholz, Kasche and Bornscheuer.[166]

Thus, they are also able to accept heteroderivatives such as aziridines, but not the thio analogue.[178] To facilitate the application of epoxide hydrolases for industrial purposes, they were also successfully immobilized on a DEAE-cellulose carrier by adsorption, but it was also noted, that the enzymes are not stable for longer periods. Furthermore, the kM [179-181] was decreased, while vmax increased after the immobilization. However, these enzyme class still misses large scale applications, due to the low stability and the low solubility of the substrates, which hamper the mass transfer at high enzyme loadings, which are required for fast reactions.[17] The enzymes are not dependent on a certain cofactor, are relatively tolerant towards organic solvents, but do not accept other nucleophiles than water. They operate via a covalently bound intermediate, the alkyl-enzyme complex (in contrast to the acyl enzyme intermediates of serine or cysteine hydrolases).[166, 182] The first crystal structure of an epoxide hydrolase was described in 1999 and was EchA from Agrobacterium radiobacter AD1, albeit the crystal structure is deranged by a glutamine residue with unnatural torsions, blocking the active site and derived from crystal packaging forces.[183] Thus, the enzyme was intensively studied and the proposed mechanism from 1994 was verified (Figure 9).[183-185] The catalytic residues form primarily a triad, consisting of a nucleophile, a basic and a charge relay residue. In the most cases, the Asp-His-dyad is strongly conserved, except in the related limonene epoxidases, using an Asp-Arg-dyad instead. The charge relay residue is usually Asp, but Glu was also reported.[186-188] Chapter three|Part I: Introduction 127

Y152 Y152 Y152 Y215 Y215 Y215

O- OH OH HO HO HO H H O O O R HO R O O- R O O O O- H H N N N D107 D107 O H D107 N N N H H H275 H275 H275 - O O O O- O O-

D246 D246 D246 Figure 9: General mechanism of the epoxide hydrolysis by an alkyl-enzyme intermediate. The reaction is triggered by the catalytic pentad of Asp-His-Asp/Glu and two additional residues, assisting in epoxide ring opening (Tyr/Tyr or Tyr/His). The residue numbering is according to the sequence of the EchA and the figure is adapted from Archelas et al.[20]

The triad is expanded by two residue side chains, assisting in the epoxide ring opening. A majority of the described epoxide hydrolases harbors a Tyr-Tyr pair for the essential stabilization, resulting in catalytic pentad. The hydrolase activity tends to zero, if these residues are mutated.[183, 185] Additionally, for the cystic fibrosis transmembrane conductance regulator (CFTR) inhibitory factor (CIF), an epoxide hydrolase from Pseudomonas aeruginosa, a Tyr-His pair was described and also EchA exhibits residual activity, if the second tyrosine was mutated to histidine. [189-192] Recently, H104 of EchA was described to be very important during the catalysis: It is connected to H275 and E35 in the distance of a salt bridge and thus part of a hydrogen bond network, ending at a solvent exposed surface region of the enzyme.[15, 193] For an epoxide hydrolase from Solanum tuberosum a comparable pathway exists on both sides, the histidine and adjacent to one tyrosine residue.[193-194] This proton wire is not essential, but if it is disturbed by mutagenesis, the hydrolase activity drops dramatically. Another interesting feature of the enzyme class is the possible formation of isoaspartate. Van Loo et al. described the formation of isoaspartate during the catalysis of the EchA mutant F108T with (R)-para-nitrostyrene oxide (Figure 10).[195] Since the knowledge about the active site increases, the enzyme was intensively studied by engineering approaches, mainly to increase the enantioselectivity.[188, 196] Thus, two tryptophane residues (W38 and W138, according to EchA) were reported, which are near the active site and enable tryptophane-quenching experiments for the calculation of dissociation constants.[185, 191] Nowadays, many epoxide hydrolases have been described and the structural fold of α/β hydrolase-fold enzymes is well understood.

Chapter three|Part I: Introduction 128

Y215 Y215

O2N O2N O- Y152 OH Y152 OH 1 OH HO (R) HO (R) OH HO HO O O HO O H 2 T108 T108 O HN HN HN R HN R 1 2 2 N H N O O R NH R NH 1 D107 1 D107 H275 H275

2 -pNS diol 3

3 H2O O HO O HO OH 4 O R1 R1 N N NH N H H R2 H O O R2 2 4 Figure 10: Formation of isoaspartate and subsequent inactivation of epoxide hydrolases. The reaction pathway from the alkyl-enzyme intermediate is pursued by the nucleophilic attack of a water molecule (1), yielding the initial enzyme and the diol product or by the attack of a backbone nitrogen (2), resulting in a succinimide intermediate. Then, the catalytic water molecule attacks either the newly formed carbonyl, yielding the initial enzyme (3) or the backbone carbonyl of the amide bond, producing isoaspartate (4), consequently irreversibly inactivating the enzyme. This mechanism explains the loss of activity in older enzyme samples and is exceptional for aspartate hydrolases. The figure is adapted from van Loo et al.[195]

Thus, the enzymes consist of two domains: a main domain, where the catalytic triad and the proton wire are located and a variable cap domain, contributing to the substrate specificity by formation of the active site, harboring of the assisting residues and creation of the entrance tunnel (Figures 11 and 12). Epoxides rise naturally during oxidative stress or oxidizing damage within the synthesis processes in the living cell. These harmful and reactive compounds can seriously damage cellular components. Thus, at the first view, epoxide hydrolases have a detoxification function, while they are integrated in several degradation pathways. Therein, their functionality can be compared to glutathione reductases.

Chapter three|Part I: Introduction 129

A B

C D

Figure 11: Structures of different epoxide hydrolases from (A) Solanum tuberosum[197] (pdb code: 2CJP), (B) human origin[198] (pdb code: 1S8O), (C) Agrobacterium radiobacter AD1[183] (pdb code: 1EHY) and (D) Pseudomonas aeruginosa[199] (pdb code: 3KD2). Please note the differentiation in a cap and a main domain, characteristic for the α/β hydrolase-fold enzymes. The additional phosphatase domain of the human soluble epoxide hydrolase has regulatory functions. The structures are shown in rainbow color from N- to the C-terminus and the catalytic pentads are represented in sticks.

A B

Figure 12: Topology plots of EchA[183] (A) and CIF[200] (B), two bona fide epoxide hydrolases with characteristic fold of a central, parallel and eight-stranded β-sheet (only the second strand is antiparallel) covered by six α-helices. Sheets are represented as arrows, depending on the trace direction and helices are schemed as rectangles. The numbering is according to the original publications and the catalytic residues are indicated by a star (catalytic triad) or a diamond (epoxide ring opening-assisting residues). The back-up aspartate of EchA is drawn hatched and aligns perfectly with the repositioned catalytic acid of CIF.

Epoxide hydrolases accept a bright variety of epoxides, e.g. epihalohydrins, glycidol, epoxy ethane, terminally epoxidized propanes, butanes, hexane and octanes, styrene oxide, isoprene monoxide, 4-chlorostyrol oxide and even more complex compounds with additional aryl functions.[201]Additionally, several steps in the syntheses of bioactive Chapter three|Part I: Introduction 130 compounds from the secondary metabolism are catalyzed by epoxide hydrolases.[21] As the secondary metabolism is assumed to have evolved later, the original detoxifying enzymes may have served as potential catalysts to degrade defensive compounds against competitors. Besides the well described detoxification mechanisms[202-203], epoxide hydrolases are responsible for the bioactivation of polyaromatics[204-206], involved in blood pressure regulation[207-209] and the inflammatory response[19]. For example, CIF not only blocks CFTR to enable biofilm formation of Pseudomonades, but also influences the P-glycoprotein MDR1, an ABC transporter, which is responsible for the export of xenobiotic compounds and overexpressed in tumor cells. The specific inhibition may serve supportive in cancer therapy.[210] Epoxide hydrolases were usually identified by their specific reaction with epoxides and besides conventional GC/HPLC analysis, faster methods with much higher throughput by colorimetric assays were described. Usually, sodium periodate is used to cleave the vicinal diol by oxidation into two aldehydes and subsequently, the remaining periodate is measured photo- or fluorometrically with umbelliferone[211], adrenaline[212] or carboxyfluorescein[213]. By detailed fingerprint analysis, the substrate scopes of newly described enzymes are measurable and thus, also hydrolases from metagenome libraries were identified.[214-215] The catalytic activity is strongly related to the halohydrin/haloalcohol dehalogenases, which are able to convert a β-halo alcohol into an epoxide and subsequently open the ring by the addition of a water molecule.[216-217] In contrast to epoxide hydrolases, other nucleophiles than water are accepted, e.g. cyanide, nitrite or azide ions. Thus, additional synthetic pathways are enabled, especially through the selectivities of this related enzyme class.[218-219] For example, intermediates in the synthesis of the lipid-lowering drug atorvastatin are enzymatically accessible, wherein one step consists of the selective substitution of chloride by nitrile functionality.[32, 219] Chemically, the catalyzed conversions of haloalcohol dehalogenases and epoxide hydrolases are very similar, but the exclusive acceptance of water by the latter class and the different structural organization is completely diverse: Haloalcohol dehalogenases use a catalytic mechanism with serine, tyrosine, arginine and aspartate, not comparable to the catalytic triad of epoxide hydrolases. Thus, phylogenetically, structurally and mechanistically epoxide hydrolases are more related to the class of haloalkane dehalogenases.

Haloalkane dehalogenases: protein engineering and detoxification Halogenated hydrocarbons became popular due to their connection to environmental pollution by mankind. The introduction of these harmful and toxic compounds has surely significantly increased during industrialization by waste and ground water pollution and the application of war fare chemicals like sulfur mustard (yperite). Additionally, the Chapter three|Part I: Introduction 131 versatility of halogenated organic compounds was of course firstly investigated by nature (Figure 13).

Figure 13: Sources of halogenated hydrocarbons and their possible degradation pathways (X = halogen). The natural sources of halocarbons exhibit electron transfer activity for the reduction of the adjacent carbon atom. The natural, mostly defensively used compounds can be degraded by manifold reactions, from which halohydrin, haloalcohol, haloacid and haloalkane dehalogenases are very common. The degradation of warfare agents[220] and industrial pollutants is still challenging, although microorganisms adapted rapidly to the new environmental conditions. The figure is adapted from Nagata et al.[221]

The production of halocarbon compounds is strongly associated with the activity of many microbes and also higher organisms. Thus, vanadium-dependent and -independ- ent haloperoxidases, flavin-dependent, oxygen-dependent and S-adenosylmethionine- (SAM)-dependent halogenases were described.[222-229] On the other side, dehalogenation mechanisms were established for rapid dehalogenation reactions, which are also accompanied with the detoxification of the compounds. Thus, monooxygenases for chlorinated hydrocarbons, cytochrome P450 monoxygenases for special haloalkanes and haloalkanes, ligninases for chlorinated polycyclic aromatic compounds and aromatic dioxygenases for monocyclic aromatic hydrocarbons or polychlorinated biophenyls are known to act in concert with the respective dehalogenases to get rid of the toxins as fast as possible.[230] Nowadays, the reductive dehalogenation with organo-cobalt complexes and Fe-S- cluster[231], haloacid,[232], halohydrin[233], haloalcohol or haloalkane dehalogenases mediate the cleavage of the carbon-halogen bonds.[177, 221, 234] Due to the degradation pathways, the application of dehalogenases was firstly investigated for detoxification Chapter three|Part I: Introduction 132 purposes. The main focus was the degradation of 1,2,3-trichloropropane (TCP), a common harmful pollutant, which is not easy to degrade.[235] Thus, different strategies were developed, e.g. a synthetic reaction pathway, using purified and immobilized DhaA31, EchA and the haloalcohol dehalogenase HheC.[236] This pathway was a success and also an optimized pathway with the same enzymes in living cells was reported, guided by computational-assisted engineering.[237] A main focus lies on DhaA, which needs to be optimized towards its entrance tunnel and active site to accept TCP for higher conversion rates.[238-239] Additionally, the mutant enzyme DhaA31 was introduced in Pseudomonas putida MC4, which was capable of 1,2-dichloropropane (1,2-DCE) degradation. After certain optimization, TCP is bio-remediated by an alternative pathway in this organism without the usage of resistance markers and transmissible plasmids.[240] Beside the introduction of haloalkane dehalogenase genes in plants for phytoremediation[241-243], a groundwater purification process was established in Lübeck (GER) for the degradation of 1,2-DCE by Xanthobacter autotrophicus GJ10.[244-246] The industrial application of haloalkane dehalogenases is not very widespread, but the enzymes are suited for enantioconvergent synthesis and deracemization processes.[171] The usage of TCE for enantiopure building blocks was discussed. The reaction from TCE to 1,2-dichloro-3-propanol (by DhaA) enables the conversion with a halohydrin dehalogenase to epichlorhydrin, which then can be used in the synthesis of anti- depressives, beta-blockers, food supplements or anti-cholesterol agents.[247-248] An overview about current applications of haloalkane dehalogenases is shown in Figure 14 and additional information about identified haloalkane dehalogenases and their applications provides an excellent review of Koudelakova et al.[13] The genes for haloalkane dehalogenases were firstly identified in 1989 and thus, a degradation mechanism for 1,2-DCE was proposed.[249] After expression studies, the DhlA was finally crystallized in 1991.[250-251] During the studies, the mechanism was also 2 proposed and clarified as a nucleophilic SN -mechanism performed by an aspartate residue (Figure 15).[252] The genes of the haloalkane dehalogenases were found to be present on transmissible plasmids and thus, they can rapidly evolve and be transferred horizontally.[235] Whereas LinB and DhlA are constitutively expressed in original host strains, DhaA is inducible expressed, but is also found on a plasmid.[221] The mechanism of this enzyme class was analyzed in detail. Tryptophane residues in the active site were essential, whose fluorescence became quenched when halide ions are bound. Thus, the rate-limiting steps during the catalysis of DhaA (the release of the alcohol and the carbon-halogen bond cleavage), DhlA (halide release) and LinB (hydrolysis of alkyl-enzyme complex) were identified.[253-257] Additionally, it was reported that motions and conformational movements influence the reaction thermodynamics and the flexible cap motion was also recorded by computer-assisted dynamic trajectory calculation.[258] Chapter three|Part I: Introduction 133

Figure 14: Recent biotechnological applications of haloalkane dehalogenases (HLDs). The methods are clustered by the given processes, including different reported technologies. The figure is adapted from Nagata et al.[221] 1,2-DCE - 1,2-dichloroethane, 1-CB - 1-chlorobutane, 1,3- DCP - 1,3-dichloropropane, TCP - trichloropropane, HCH - hexachlorohexane, 1,2-DBE - 1,2- dibromoethane.

During motion trajectories it was reported, that the main domain of these α/β hydrolase-fold enzymes is very rigid and fixed, as well as the nucleophile and the first halide binding residue (adjacent to the nucleophile). The other halide-stabilizing residue is more flexible, allowing a broader substrate scope by alternative side-chain conformations. Also the water molecule for the catalysis is strongly fixed, indicating a proton wire, as proved for the epoxide hydrolases. The halide throughput of the most common representatives - DhaA, DhlA and LinB - was calculated, leading to the identification of the histidine as main barrier, modulating the throughput in the order I- > Br- >Cl-.[259] By quenching studies, the detailed thermodynamic parameter and the equilibrium constants for certain mechanistic pathways were accessed.[260] Chapter three|Part I: Introduction 134

O N41 O N41

H2N H2N Cl Cl Cl Cl - Cl Cl HN HN H O O- H O O O O - N H W107 - N H W107 O O H N D108 O O H N D108

E130 E130 H272 H272 Cl-

TCP O N41 O N41 H2N H2N Cl Cl- Cl H2O - HN OH Cl HN O O- O O- - N W107 O O N D108 DCP H - N W107 O O H N D108 E130 H272 E130 H272 Figure 15: Catalytic mechanism for DhaA. The nucleophile attacks the halide-adjacent carbon atom, creating an alkyl-enzyme intermediate and the halide is fixed at the specific binding site, built from either an Asn/Trp or a Trp/Trp pair. Then, the carbonylic function undergoes nucleophilic attack by a polarized water molecule, activated by a conserved histidine residue. An additional charge-relay residue (Glu or Asp) stabilizes the proton abstraction and the histidine function.

Due to their phylogeny and the corresponding structures, the haloalkane dehalogenases were classified in three subgroups: HLDI (e.g. DhlA) with a Trp/Trp pair and an Asp charge relay residue, HLDII (DbjA, DbeA, DhaA, DatA, DmbcA and LinB) with an Asn/Trp pair and a glutamate as acidic function and HLDIII (DrbA, DmbC) with the same Asn/Trp pair, but an aspartate for the charge relay (Figures 16 and 17). [11, 261-263] A very detailed review on structure-function relationships is provided by Damborsky et al.[264]

A B C

Figure 16: Structures of different haloalkane dehalogenases from (A) Sphingomonas pauci- mobilis UT26[265] (LinB, pdb code: 1CV2), (B) Xanthobacter autotrophicus[251] (DhlA, pdb code: 2YXP) and (C) Rhodococcus rhodochrous[266] (DhaA, pdb code: 3G9X). Please note the differentiation in a cap and a main domain, characteristic for the α/β hydrolase-fold enzymes. The structures are shown in rainbow color from N- to the C-terminus and the catalytic pentads are represented in sticks. Chapter three|Part I: Introduction 135

A B

C

Figure 17: Topology plots of HLD subgroups HLDI (A), HLDII (B) and HLDIII (C) with the characteristic fold of a central, parallel and eight stranded β-sheet (only the second strand is antiparallel) covered by six α-helices. Sheets are represented as arrows, depending on the trace direction and helices are schemed as rectangles. The numbering is according to the original publications and the catalytic residues are positioned by a star (catalytic triad) or a diamond (halide-stabilizing residue). The assumed sheets in HLDIII were hatched.

Nowadays, the classification changed towards the individual substrate specificity, classifying the enzymes by their conversion fingerprint in substrate-specificity groups (SSG). SSG-I covers LinB, DbjA, DhaA, DhlA, SSG-II only DmbA, SSG-III only DrbA and SSG- IV DmbC, DbeA and DatA. Several mutation studies were performed, when the structure was known and thus, in DhlA, N148E led to a stability loss and was found to be structurally important by its uncommon torsions. Surprisingly, N148E permits dehalogenase activity, when D260N is mutated. Maybe the residue’s function is then forced to act as backup acid, as known from EchA (see previous section).[267] Furthermore, the mutation of the nucleophilic D124N led to an activity decrease, but through intrinsic reactions the amide becomes cleaved and the acid is restored, when the mutant was stored over time.[25] Additionally, two lysine residues were identified, responsible for substrate attraction and binding.[268] During the binding process a halide collision complex is formed by an exclusive binding site, enabling the transport to the active site and the catalytic rate was enhanced, when the corresponding residues were mutated to alanine residues.[269-270] In DhaA, an alternative binding site was reported.[271] Chapter three|Part I: Introduction 136

An exceptional study was the stabilization of LinB by FoldX, Rosetta and YASARA molecular dynamic simulations. The activity optimum was shifted by several mutations by ten degrees to 50 °C and the melting temperature was increased from 51 °C to 74 °C.[272] The conversion of brominated and iodized compounds with LinB was enhanced by the substitution F154L.[273] Tunnel engineering plays an important role, especially during the publication history of haloalkane dehalogenases. Thus, in DhlA two tunnel paths were identified, exclusively for the substrate and water molecules, respectively.[270] DhaA also contains two tunnel paths[254, 274] and engineering of adjacent residues modulated the solvent tolerance towards DMSO up to 52%[275] and enhanced the conversion of TCP by adaption of the tunnel path and the active site’s shape.[254] In the latter study, a 26-fold increase for kcat/kM was measured. Indeed, the tunnel shape, the entrance mouth and the accessibility to the active site play crucial roles for selectivity, activity and stability. As the tunnel is built up at the interfaces of the core and the main domain, the most variance is provided by the cap domain. It contributes decisively to the substrate scope and the enantioselectivity.[276] Spontaneous cap domain mutations may be the driving force of rapid evolutionary adaption towards different haloalkanes.[277] New dehalogenases are classified each year, e.g. DspA from Strongylocentrotus purpuratus was the first characterized haloalkane dehalogenase from eukaryotic origin and is thus unique in its enantioselectivity towards 2-bromobutane. More generally, sequence analysis revealed that eukaryotic dehalogenases are commonly part of HLDII and HLDIII.[278] Additionally, the psychrophilic dehalogenases DpcA[279] (SSG IV) and DmxA [280] were described. Actually, the screening for new enzymes went on and will reveal new enzymes in nearer future. Interestingly, the sequenced-based search also reveals fluoroacetate dehalogenases, esterases and phosphatases, when targeting haloalkane dehalogenases.[281] The in silico screening has become very common and thus it is now possible to search new enzymes by the proposed substrate mechanism or structural fragments may serve as source for the identification of new substrates.[14, 282] For the detection of released halides or protons, the classically targeted products, a specific assay and a general methodology exists. Thus, calorimetrically the Iwasaki assay detects halides by the formation of colored thiocyanates and pH indicators can visualize the acidification during the reaction.[283-284] Often applied is the phenol red assay, which was initially described in 1949.[273, 285-293] Common GC and HPLC methods exist as well as the defined kinetic analysis via isothermal titration calorimetry. The calculation of kinetic data should be also possible within a simple pH-stat application. The enzyme class of haloalkane dehalogenases is well understood and excellent described, the main focus on the pioneering research of the groups of Dick B. Janssen and Jiří Damborský led to the manifold knowledge of the thermodynamics, kinetics, biochemical and biophysical properties and the understanding of the mechanistic Chapter three|Part I: Introduction 137 chemistry. The closest relatives of haloalkane dehalogenases are haloacid dehalogenases and epoxide hydrolases.

The interconversion hypothesis As mentioned above, haloalkane dehalogenases and epoxide hydrolases are related enzymes and throughout the literature, both enzyme classes were mentioned in the same breath.[184, 294-297] Additionally, the family of haloacid dehalogenases is quite related. However, the sequence identities and the phylogenetic analysis of the presented classes lead to the hypothesis of a common ancestor. The mechanism of catalysis is strongly related and both classes accept epihalohydrins as substrates - on one hand for dehalogenation, on the other hand for epoxide hydrolysis. Therefore, the ability of binding either epoxidized or halogenated compounds is predetermined. Furthermore, EchA was engineered for the conversion of dichloroethylene oxide, even if its crystal structure poorly reflects the natural active state.[298] Thus, the question rises; why both enzyme classes - dehalogenases and epoxide hydrolases - specifically attack the opposite carbon atoms in such small molecules, while stabilizing only the halide or the rising hydroxide, respectively, and did not exhibit moonlighting activity towards the other reaction. Within the family of α/β hydrolase-fold enzymes, the interconversion of enzymatic activities was successful for esterases/epoxide hydrolases and hydroxynitrile lyases/esterases (and vice versa), but only for the latter activities generalists crated in the meaning of potential precursors (or “transition mutants”), exhibiting both activities.[9-10, 299] As the enzymes harbor a central serine residue instead of aspartate, no ancestor or generalist for the aspartate hydrolases was described. Thus, it may be possible that epoxide hydrolases and haloalkane dehalogenases developed from an epoxide hydrolase precursor, which was mainly necessary to ensure survival under an oxygenized atmosphere. As the secondary metabolism towards epoxide containing/processing intermediates was developed, the potential for the generation of quite diverse compounds was enabled. During adaptive evolution, the generation of halogenated bioactive chemicals was also established by haloperoxidases[300-301] and thus, also detoxification mechanisms for the recalcitrant compounds must be established. However, the superfamily of α/β hydrolase-fold enzymes harbors manifold catalytic activities, as oxidoreductases, isomerases, hydrolases, and lyases.[302] Hence, the structural scaffold has been shown to be robust towards the development, versatile for the implementation of new activities and tolerant towards the adaptation of new chemical reactions and different catalytic pathways. The high diversity is based on the ability to bind different substrates, which require or demand a different chemistry plus changing hydrogen bond partners and the different alkaline or acidic behavior of active-site residues. Chapter three|Part I 138

Results

Before initiating this thesis, investigations on the relations of epoxide hydrolases and haloalkane dehalogenases were explored before, also by the attempt to create esterases (BS2 and PFE1) with dehalogenase activity.[303-305] Starting point for the experiments were EchA and DhlA, respectively. Various EchA mutants were created by crystal structure analysis (pdb codes 1EHY for EchA and 2HAD for DhlA) and also focused directed evolution was used on basis of the structures, but no interconversion of the epoxide hydrolase was achieved. It was shown, that the crystal structure of EchA contained a glutamine residue, blocking the active site. Due to the poor structure, the rational design failed and thus, the initial enzyme was switched to the epoxide hydrolase CIF (cystic fibrosis transmembrane conductance regulator (CFTR) inhibitory factor). The structure (pdb code 3KD2) showed to be a better basis for the critical analysis and the application of predictive software. However, EchA was also used as positive control, due to the broader described substrate spectrum.

Rational design of the epoxide hydrolase CIF The transition states of haloalkane dehalogenase catalyzed reactions with general haloalkane dehalogenase substrates[12] (Figure 18) were modeled and the received scaffold transferred to the active sites of 14 epoxide hydrolases. Subsequently, different mutations for the scaffold structure were proposed by Rosetta. The calculation and the summary of predicted, promising residues substitutions were performed and kindly provided by David Bednář, Loschmidt Laboratories, Masaryk University, Brno, Czech Republic.

Br Br Br Br CN 1 2 3 I I 4 5 Figure 18: General haloalkane dehalogenase substrates. The enzyme class was supposed to accept at least one of the general substrates for conversion, simplifying the screening for this activity.[12]

From the initial epoxide hydrolases, the 3KD2 structure showed the best properties for the acceptance of the introduced mutations, stabilizing the transition state sufficiently for the performance of haloalkane dehalogenase catalysis. Five Rosetta designed variants were chosen for mutant construction (CIF01-05, Table 2).

Chapter three|Part I: Results 139

Table 2: Created CIF mutants in this thesis. To the initial mutants (CIF01-05), calculated by David Bednář, various variants were added, proposed by the structural alignment and the consequent substitution of A154T, suggested by Luise Weiher.[306] Gray highlighted are residues of the catalytic pentad from CIF (catalytic glutamate and epoxide stabilizing histidine), light gray indicated are the newly introduced halide-stabilizing residues tryptophane and asparagine and underlined the alternatively introduced charge relay residue.

Mutant Position 58 63 85 130 153 154 174 175 177 178 184 203 207 269 272 CIF-WT M F A I E A L V H F D F H H M CIF01 - S - W G F R Y N W - - - - E CIF02 - S - W G F W G N I - - - - E CIF03 - S - W - F W G N I - L L - - CIF04 - S - T G W Y G N W - W - A E CIF05 - - - W G F - M N A - H - - E CIF06 - - - - - T ------CIF07 - S - W - T R Y N W - - - - E CIF08 - S - W - T W G N I - - - - E CIF09 - S - W - T W G N I - L L - - CIF10 - S - T - T Y G N W - W - A E CIF11 - - - W - T - M N A - H - - E CIF12 - S - W G T R Y N W - - - - E CIF13 - S - W G T W G N I - - - - E CIF14 - S - T G T Y G N W - W - A E CIF15 - - - W G T - M N A - H - - E CIF16 - S V W - T W G N I - L L - - CIF17 V S - W - T R Y N W - - - - E CIF18 - S - W - T R Y N W G - - - E CIF19 V S - W G T R Y N W - - - - E CIF20 - S - W - T R Y N W G - - - E CIF-DhaA CIF-WT I156-N249 substituted with DhaA I132-A216 CIF-DhlA CIF-WT I156-N249 substituted with DhlA L151-R229

Emanating from structural alignments, Luise Weiher proposed during her diploma thesis the additional substitution A154T, because of the halide-binding properties of the structurally equivalent position in HLDs (Figure 19).[306-307] Thus, additional mutants were created, derived from the initially proposed CIF01-05. Furthermore, the protein structures were constructed within YASARA[106, 308] and also analyzed in PyMol[309]. The plugin CAVER was used to explore binding sites and to control access tunnels for the substrates.[310]

Chapter three|Part I: Results 140

Figure 19: Structure of CIF-WT (pdb code 3KD2). The main domain is colored green, the cap domain in blue. The residues of the catalytic pentad for epoxide hydrolysis are shown as sticks, the mutational substitutions are highlighted orange (see also Table 2).

Cloning, cellular expression and protein purification Cloning of the mutants CIF01-05 were part of supervised bachelor and diploma theses, instructed by myself.[306, 311] Problems concerning the soluble protein expression were solved by using BL21 (DE3) Gold as host and protein production at 18-20 °C overnight, while shaking at lower frequencies (140-170 rpm). Unfortunately, as recognized before[303], mutants with hybridized or exchanged domains (cap-domain variants) did not yield soluble proteins. Thus, soluble protein fractions for the other mutants were achieved by intracellular protein expression, cell lysis and IMAC purification (Talon, Clontech, FR or HiTrap, GE Healthcare, UK). The proteins were analyzed by standard procedures to validate the structural integrity by circular dichroism and melting temperature determination.

Structural integrity of the mutants - Circular dichroism Both, the wavelength scanning and the fixed wavelength measurement at increasing temperature - performed by the application of circular dichroism - delivers important information about the protein architecture and the integrity of the macromolecule, but strongly depends on the purity of the solution.[312-317] All CIF mutants were analyzed for their secondary structure by scanning the circular dichroism from 260-180 nm (Figure 20). Chapter three|Part I: Results 141

A B

C D

Figure 20: Circular dichroism spectra for CIF01-06 (A), CIF07-13 (B) CIF14-20 (C) and for CIF-WT, DhaA WT, DhlA WT and denatured CIF-WT (D). The red line indicates the spectrum of CIF-WT, included for comparison. The denatured sample in (D) was yielded after incubation of CIF-WT for 10 min at 95 °C. The mean residue ellipticity was calculated from the measured ellipticity, the number of residues, the protein content, the molecular weight of the proteins and the path length through the cuvette.

The impact of the structure and residue arrangement on circular dichroism has been extensively used before.[229, 267, 318-320] Thus, the shapes of the spectra were quite similar, remarkable differences are recognized, by conversion of the raw data for secondary structure predictions (Table 3). For comparison, a denatured sample of CIF-WT (10 min incubated at 95 °C) was included in the measurements, resulting in a not evaluable signal (Figure 20D). The ordered helix content is in the initial mutants even, except for CIF02, for which the content of ordered strand was higher. The residue mutants show a lower distorted portion of helices and the same fraction of distorted strand, compared to CIFWT and CIF01-05. The unordered portion was not changing at all. Summarizing, the mutations were accepted without dramatically changes in the secondary structure.

Chapter three|Part I: Results 142

Table 3: Secondary structure calculations by the DichroWeb server[316-317, 321], obtained with experimental data from Figure 20.

Helix Strand Protein Turns Unordered Total ordered distorted ordered distorted CIF-WT 0.18 0.13 0.13 0.09 0.20 0.26 0.99 CIF01 0.16 0.12 0.14 0.09 0.21 0.28 1 CIF02 0.11 0.10 0.19 0.10 0.21 0.28 0.99 CIF03 0.18 0.14 0.12 0.09 0.20 0.27 1 CIF04 0.18 0.13 0.13 0.09 0.2 0.27 1 CIF05 0.19 0.14 0.11 0.08 0.21 0.27 1 CIF06 0.19 0.13 0.12 0.08 0.20 0.27 0.99 CIF07 0.20 0.13 0.12 0.08 0.20 0.27 1 CIF08 0.22 0.15 0.10 0.07 0.20 0.26 1 CIF09 0.12 0.10 0.17 0.10 0.21 0.29 0.99 CIF10 0.15 0.12 0.15 0.09 0.21 0.28 1 CIF11 0.16 0.12 0.15 0.09 0.20 0.27 0.99 CIF12 0.17 0.12 0.14 0.09 0.20 0.27 0.99 CIF13 0.21 0.14 0.11 0.08 0.19 0.26 0.99 CIF14 0.16 0.13 0.14 0.09 0.20 0.27 0.99 CIF15 0.13 0.11 0.17 0.09 0.21 0.28 0.99 CIF16 0.12 0.11 0.19 0.09 0.21 0.27 0.99 CIF17 0.12 0.10 0.17 0.10 0.21 0.28 0.98 CIF18 0.13 0.11 0.17 0.09 0.21 0.28 0.99 CIF19 0.18 0.13 0.13 0.09 0.20 0.27 1 CIF20 0.12 0.11 0.18 0.1 0.21 0.28 1 DhaA 0.26 0.16 0.08 0.06 0.19 0.25 1 DhlA 0.24 0.19 0.07 0.05 0.21 0.24 1

The DSSP analysis[322] within the protein data bank[323] yielded a content of 41% helical (124 residues in 17 helices) and 19% β-sheet (58 residues in 13 strands) secondary structures for the crystal structure of CIF-WT (pdb code 3KD2). The sheet content is in good correspondence, but the helical content (around 21-33%) calculated from the CD- spectra is too low. In contrast, the DSSP predicted content of amino acid residues in helices and sheets is 44% (132 residues in 19 helices) and 19% (57 residues in 11 strands) for DhaA (pdb code 1BN6) and 41% (130 residues in 18 helices) and 14% (44 residues in 10 strands) for DhlA (2HAD), respectively. The related calculations are in a very well agreement. Concerning the melting temperature, only small differences were detected (Figure 21). Compared to the wild-type enzyme, a maximum increase in the melting temperature of one Kelvin was measured for CIF17. A general tendency to destabilize the overall protein structure was reasoned as the TM mostly decreases. In cases of CIF01, TM dropped to 51 °C and for CIF04, CIF12 and CIF13 below 53 °C. The melting temperatures for DhaA-WT and DhlA-WT were 55.8 °C and 47.0 °C, respectively. The results were confirmed by DSC measurements, yielding a melting temperature of 57.5 °C for CIF-WT and 52.5 °C for DhaA. Chapter three|Part I: Results 143

A B

Figure 21: Melting temperatures measured for all CIF mutants. (A) Melting temperatures derived from CD data at 221 nm, (B) differences between the melting temperatures of each variant compared to CIF-WT demonstrated as ΔT.

Surprisingly, the CIF mutants did not produce a significant DSC signal. Even if higher protein concentrations (up to 2.5 mg mL-1) were used, no signal increase was detected, indicating a lower thermal barrier for the unfolding process during denaturation. As CIF- WT exhibits a cysteine bridge (C295-C303), the proteins were expressed in E. coli SHuffle® T7 for folding assistance by the DsbC chaperone and disulfide bond isomerase. The obtained CD data showed no differences, indicating the same structural fold. As the overall structural changes were not as drastic as expected for the introduction of 9-11 mutations, activity tests followed for the characterization of the putative enzymes (Figure 22).

Figure 22: Flow chart for the investigations of the CIF mutants. First, the structural integrity was checked by CD-spectra and melting temperature experiments. In parallel, the activities towards epoxides (adrenaline assay) and haloalkanes (Iwasaki assay) was tested. The exact kinetic parameters of HLD active variants could be determined by isothermal titration calorimetry (ITC) and the product formation could be verified by gas chromatography. To change the properties of the CIF mutants, additional rational design approaches can be calculated or random mutagenesis by error-prone (ep)PCR can be performed. While using directed evolution, a high-throughput (HTS) screening will become essentially necessary. Chapter three|Part I: Results 144

Activity tests and kinetic measurements The scaffold of CIF was modeled to stabilize the transition state of a haloalkane dehalogenase reaction. Therefore, useful residues were substituted and different assays are present to detect the target activities and also the residual epoxide hydrolase activity. For the first, pH-dependent assays were described, as the dehalogenase reaction produces an alcohol and oxonium ions, by proton release. This indirect assay (monitoring only the co-product) was applied for pre-screening on agar plates, high- throughput screenings or kinetic measurements.[290-291, 306-307] However, additional methods are available to search for the haloalkane dehalogenase reaction products, covering calorimetric, colorimetric and chromatographic methods. Contrarily to the manifold reports on screening methods, a useful selection system for the specific enrichment of haloalkane dehalogenase habiting microorganisms was only published for Pseudomonas sp.[277] The created CIF mutants were successfully overexpressed and purified.[311, 324] Thus, pure protein fractions can be provided for specific activity measurements towards possibly generated haloalkane dehalogenase activity or remaining epoxide hydrolase activity. The first was tested towards the general haloalkane dehalogenase substrates (Figure 18).[12]

Iwasaki assay for haloalkane dehalogenase activity

During the Iwasaki assay, released halide ions react with mercuric thiocyanate and ferric ammonium sulfate to stable mercuric halide salts and colored ferric thiocyanate, which is sensitively detected by spectrometric measurements at 460 nm (Figure 23).[325]

- - 2X + Hg(SCN)2 HgX2 + 2SCN

- 3+ 3SCN + Fe Fe(SCN)3 460 nm Figure 23: Detection of halides in solution is performed by the Iwasaki assay using chromogenic ferric thiocyanate for sensitive signals.[325]

The assay was successfully established to a detection limit of 0.148 mmol L-1 towards bromide ions and 0.178 mmol L-1 towards iodide ions. The following abiotic rates were determined, indicating the autohydrolysis in aqueous environments (Table 4). The change of adsorption at 460 nm was followed in biocatalysis reactions with purified enzymes (Figure 24).

Chapter three|Part I: Results 145

Table 4: Abiotic rates measured by the Iwasaki assay indicating the continuous autohydrolysis of the investigated substrates.

Substrate Abiotic rate [µmol L-1 d-1] 1 83.55 2 196.94 3 180.61 4 175.39 5 68.29

A B

C D

Figure 24: Data tendencies of the background corrected values for the Iwasaki assay in biocatalysis reactions with substrates 1 (A), 2 (B), 3 (C) and 4 (D). Each assay with one mutant was performed with DhaA as positive and CIF-WT as negative control. As example, the initial mutants CIF01-05 are depicted. The figure is continued on the next page. Chapter three|Part I: Results 146

E

Figure 24, cont.: Data tendencies of the background corrected values for the Iwasaki assay in biocatalysis reactions with substrate 5 (E). Each assay with one mutant was performed with DhaA as positive and CIF-WT as negative control.

The data tendencies clearly show that the absorption over time did not increased much for all CIF mutants. In case of DhaA, the maximum of absorbance was reached after two hours conversion. The values for the CIF mutants were nearly the same as for CIF-WT. The background signal was calculated by measuring each composition individually in blank measurements with solely buffer, substrate or protein. For the latter, the highest fluctuations were observed. However, in a time frame of 120 min, the activities of DhaA and DhlA were measurable in their linear relation with sufficient significance. Contrarily, for all CIF mutants (CIF01-20), no increased signal, significantly differing from the background signal of the abiotic rate (with protein CIF-WT) was measured. Thus, no halide ions were released, when the substrates were incubated with the proteins, indicating no enzymatic reaction. For DhaA and DhlA the activities were measured successfully in short time measurements within 60 min (Table 5).

Table 5: Specific activities of DhaA and DhlA measured at pH 7.5, compared to reported activities at pH 8.6 towards the five general haloalkane dehalogenase substrates.

Specific activity pH 7.5a Specific activity pH 8.6a,b Substrate DhaA DhlA DhaA DhlA 1 4.49 11.19 64.8 64.3 2 3.64 17.47 11.6 19.9 3 2.01 11.10 22.8 14.1 4 3.56 13.55 14.8 13.6 5 3.04 20.35 39.6 63.3 [a] calculated as nmol s-1 mg-1 enzyme [b] Koudelakova et al.[12]

The epoxide hydrolase has a pH optimum of around 7.4, but HLDs usually prefer more basic conditions.[200] It was described, that soluble amount of CIF was only achieved Chapter three|Part I: Results 147 within a pH range of 6-9.[199] As the activity of haloalkane dehalogenases is not as strong influenced[13] and also the pH within the range of 6-9 had no significant impact on their structure[319], the physiological pH of 7.5 was used for the measurements to optimize the C stability of the CIF constructs, while ensuring the possibility of catalytic efficiency. At the decreased pH, the activity of both, DhaA and DhlA decreased, as expected. The strongest influence was measured towards DhaA, inhibiting the activity up to 14-fold for 1,2- dibromoethane and 13-fold for 4-bromobutane nitrile. For DhlA the strongest inhibition was found for the same substrates, with 17.4 and 32.15% residual activity, respectively. These results are in good agreement with related projects, describing significant loss of activity for DhaA and DhlA with decreasing pH values.[13, 326]

Adrenaline assay for epoxide hydrolase activity

The epoxide hydrolase activity was determined by the adrenaline assay, forming red adrenochrome, when it becomes oxidized by sodium periodate (Figure 25).

O EH OH NaIO4 O O adrenaline OH + colorless R R R H

OH OH H O HO N NaIO4

O N HO adrenaline adrenochrome red 490 nm Figure 25: Assay reaction for the detection of epoxide hydrolase activity. The epoxide is converted to a vicinal diol, which can be oxidized by splitting the carbon-carbon bond with sodium periodate. If adrenaline is added to the solution, no color change is visible, only a very slow coloration to brown, by oxidation of adrenaline by air oxygen. If the protein does not possess the desired activity, sodium periodate cannot react with a vicinal diol. In contrast, the fast oxidation of adrenaline to adrenochrome will yield a red color, measurable at 490 nm.

At first, different amounts of propane diol were titrated with sodium periodate and the reaction outcome was visualized by adrenaline bitartrate to estimate the most suitable assay conditions (Figure 26). Chapter three|Part I: Results 148

Figure 26: Initial standard plate for the estimation of optimal compound concentrations for the color shift from red to colorless, due to periodate oxidation with added diol.

The lowest detectable activity was about 0.159 mU, corresponding to the formation of 0.228 mmol of the diol within 24 h. Thus, the reaction conditions were set up to detect the minimal amount of released diols and the CIF proteins were analyzed against several substrates (Figure 27).

O O O O O O OH Cl Br

6 7 8 9 10 11

O O O O O O O O

O N Cl 2 O2N 12 13 14 15 16 17 Figure 27: Epoxides for the quick adrenaline test of residual activity. The substrates differ in bulkiness and substituents to cover the most common epoxide hydrolase substrates.

As expected, all mutants showed no activity, due to the mutated, essential H177 for epoxide ring opening. Even the substitution H177Y to the conserved tyrosine of epoxide hydrolases did not rescue the activity.[16, 199-200] Interestingly, the CIF-WT was not capable of the conversion of the classical described substrate 9. Contrarily, CIF was found to be able to convert substrates 11-14 and 16. Substrate 7 produced a high autohydrolysis background, even after 3 h incubation. The negative controls DhaA and DhlA produced a positive signal for substrates 9 and 8, respectively. This is in perfect agreement to previous studies, as DhaA converted substrate 9 and DhlA substrate 8 to the corresponding alcohol, which undergoes rapid autohydrolysis. After 24 h incubation, additionally the substrates 12 and 16 underwent significant autohydrolysis. The ability for the noticed substrate specificities was only Chapter three|Part I: Results 149 performed as a quick test with adrenaline bitartrate for estimating the residual epoxide hydrolase activity in short end point measurements. Thus, EchA was also found to accept a broader substrate scope, covering substrates 8-10, 12-14 and 16.[303, 311, 324]

High-throughput HLD screening assay on a robotic platform The adaption of a suitable high-throughput system for the screening of haloalkane dehalogenases was necessary, because also directed or even focused directed evolution was performed on the inactive CIF mutants in bachelor, master and diploma theses, supervised by myself.[306, 311, 324] Thereby, error-prone PCR was chosen to generate random substitutions in the genes of CIF01-05, the most promising variants initially calculated. The first attempt was an agar plate assay, using a mixture of phenol red and bromothymol blue.[306] After exposure of 100 µL of pure HLD substrate, the deep purple color shifted to yellow, when the pH decreased under pH 7.0 by proton release in the dehalogenase reaction. The mixture simplifies the subjective discrimination between cell colonies, expressing HLD active (yellow halo) or inactive proteins (deep purple). The assay was successfully established, but the throughput is limited by the colony number on the agar plates, the supply of oxygen during cell growth and the high amount of pure halogenated compound. For optimal exposure, the glass dishes were sealed with Parafilm M®, inducing anaerobic conditions, which leads to acid generation during the bacterial metabolism, influencing the number of false positives.[306] Hence it was only suitable as pre-test and an alternative MTP assay was established on a robotic platform, gaining information about the absorption behavior of the used phenol red solution.[288] To minimize the effort and increase the throughput, the assay was drafted for whole cell catalysis. Thus, washed cells of freshly grown cultures, replica plated from a master plate, were used. This master plate was arranged automatically by a picking robot with culture plates from transformed mutant libraries. The final assay composition uses a weak buffer (1 mmol L-1 HEPES, pH 8.2) and a substrate mixture of the general HLD substrates (100 mmol L-1 each in acetonitrile). The assay was tested successfully[306] and also the objective evaluation by absorption measurements is possible, measuring a decrease in the absorption at 560 nm (red) and an increase at 430 nm (yellow, Figure 28). The values were compared and set into relation to the values of the negative controls (buffer blank and CIF-WT) and the positive controls (DhaA and DhlA), resulting in a positive hit or a negative signal.[324] A higher range of false-positive signals was expected, because of medium acidification by individual metabolic changes in the expression cells or anaerobic acid production.

Chapter three|Part I: Results 150

A B

Figure 28: pH-dependent well color (A) or absorption spectrum (B) of phenol red solution at defined pH values. The benchmark for active colonies was reached after a pH shift to pH<6.0. The criteria was sufficient to discriminate between colonies expressing DhaA (or DhlA) WT (yellow, positive) and CIF-WT(red-orange, negative).

Selection assay towards haloalkane dehalogenase activity The toxicity of haloalkanes is derived from their reactivity with biomolecules, acting as alkylating agents, generally considered genotoxic. Haloalkane dehalogenases act as detoxifying enzymes, degrading the pollutants of naturally or nowadays industrially produced haloalkanes. The irreversible addition of an alkyl chain to a target molecule influences its function beyond non-functionality. The toxic effect of 3-chloro- and 3- bromo propanediol towards E. coli cells of the strains BL21 (DE3), BL21 (DE3) gold, DH5α, Top10 and XL10 gold was investigated in a previous thesis.[304] It was shown, that cells, expressing the haloalkane dehalogenase DhlA are able to tolerate a concentration of up to 15 mmol L-1 of 3-chloro propanediol (substrate 6) and up to 10 mmol L-1 3- bromo propanediol (substrate 7). Cells expressing DhlA possessed a growth advantage towards cells with EchA protein, unable to cleave the carbon-halogen bond, indicated by a stronger growth, than DhlA cultures without added substrate. The converted diol yielded glycerol, which acted as carbon source for the microorganisms. Interestingly, the growth differences in strains, unable of DE3 dependent protein expression were lower, even if they became growth inhibited as well. But at least, the inhibition was not as high as with expressed, HLD inactive protein. Thus, an additional selection benefit was to stop the protein expression, either by mutating the lacUV5 promoter in front of the T7- polymerase, the T7-polymerase itself, the T7-promoter on the plasmids or the expressed genes. Certain stop-codon mutations, leading to a truncated protein mutant, were detected. Interestingly, this mutation was more close to the start codon, when selection pressure increased.[304] Chapter three|Part I: Results 151

The used substrates were focused on the production of glycerol, but the general haloalkane substrates were not tested. Thus, pre-induced cell cultures with HLD active (DhaA or DhlA) or inactive (CIF-WT) proteins were applied in toxicity tests (Figure 29).

Figure 29: Principle of the selection assay with investigated substrates and the reaction scheme, X = halide.

By cultivation in M9 medium, the addition of glucose as carbon source became necessary. Thus, concentrations from 0.1-0.6% glucose with 15-25 mmol L-1 of the HLD substrates were tested and compared to negative controls (CIF-WT) or cell cultures without toxin (but with the same amount of glucose and 2% acetonitrile), yielding best results with 20 mmol L-1 of the general substrates in a range of 0.1-0.4% glucose (Figure 30).[327]

Figure 30: Survival rate, indicated by relative growth of the selection culture with the respective HLD substrate, compared to the same cell culture without toxin (but with 2% acetonitrile). Substrate 6 was 3-chloro propanediol and substrate 7 3-bromo propanediol.

Chapter three|Part I: Results 152

For substrates 1-5 a concentration of 20 mmol L-1 was sufficient, using 0.1% (2 and 4), 0.2% (1 and 5) or 0.4% (3) glucose. Higher concentrations of the substrates inhibited also the growth of DhaA and DhlA expressing cultures, even when more glucose was added. The amount of glucose was the minimum before the culture growth of cells, with and without HLD was sufficiently distinguishable: Lower concentrations of toxin or higher concentrations of glucose also promoted the growth of the negative control. In case of substrates 6 and 7 no glucose was added, indicating an autohydrolysis of the substrates, capable of yielding enough growth substrates. The tested concentration for the glycerol derivatives was 15 mmol L-1. In the given concentration range, growth under continuous selection was desired, while accumulating enzymes, capable of the detoxification of the medium, while promoting cell growth or at least cell survival. Focusing on the positive effect of accumulation, enzyme mutants of DhaA and DhlA were created, which are inactivated by the substitution of the essential nucleophile and histidine. Then, the respective positions were saturated by NNK (nucleophile) or NDT (histidine) codons and the library underwent selection towards the toxic substrates (Table 6).[327]

Table 6: Non-functional mutants created for the accumulation of wild-type revertants in the HLD selection assay. First, the catalytic residues were substituted by alanine, creating non-functional mutants. Second, mutant libraries were prepared by site-directed mutagenesis of the related positions with NNK-, NDT-codons or both.

Target Reversion Method DhaA DhlA residue chance [%] non- D106A D124A nucleophile functional 3.125 saturation A106NNK A124NNK non- H272A H289A base functional 8.333 saturation A272NDT A289NDT non- D106A, H272A D106A, H289A DUAL functional 0.26 saturation A106NNK, A272NDT A124NNK, A289NDT

Presuming an optimal distribution, the chance to yield the wild type is 8.3% for saturation with NDT, 3.1% for NNK and 0.26%, when both positions undergo saturation mutagenesis. The DUAL libraries (respectively with both saturated positions) were selected for wild-type revertants in selection cultures, derived from pre-induced expression cultures of freshly transformed cell. Thus, enough enzyme molecules were produced, to withstand the toxic environment during the selection. The cultures were grown for five days, changing the selection media each day to supplement the cell cultures with fresh nutrients and enough toxin, to suppress the growth of cells, Chapter three|Part I: Results 153 expressing HLD negative (inactive) proteins (Figure 31). In this way, also truncated protein mutants or the loss of the T7 RNA polymerase-dependent protein expression machinery should be prevented.[327]

A B

C D

E

Figure 31: Growth of selection cultures of E. coli BL21 (DE3) with toxic substrates 1-5 (A-E, respectively). Each day, the cells were centrifuged, washed and the whole cell material was transferred to a freshly supplemented cell culture.

The cell cultures expressing the HLDs DhaA or DhlA, as well as their combined site- saturation libraries DUAL grew under all selection conditions to cell densities of up to 2.0. For substrate 1 DhlA expression facilitates higher cell growth, but also DhaA showed Chapter three|Part I: Results 154 dense cell growth in continuous cultivation. In case of substrate 2 an even growth of all cell cultures was observed, even if the glucose supplement is decreased to 0.1%. However, CIF-WT exhibited the lowest cell densities at all times. When substrate 3 was used for selection, the DhlA expression cultures grew to higher densities, as observed in the preliminary toxicity tests. Substrate 4 inhibited the cell growth of CIF-WT, but could not prevent it, even if the curve was again the lowest observed. However, cells with DhaA reached higher densities, as in the toxicity tests. Finally, with substrate 5, the slow cell growth led to high cell densities, after five days continuous selection, whereas the expression on DhlA permitted higher values. The cell cultures expressing the negative control CIF-WT were inhibited will all toxic substrates, but only in the cases of substrates 1, 3 and 5 a complete culture death was achieved. No living cells were detectable, because no colony formation was counted, when samples of those selection cultures were spread on LB agar plates. To determine the reversion rates, samples during the selection cultivation were taken after each day and grown on LB agar plates to a colony density, suitable for the picking robot. Thus, cell material was transferred into microtiter plates and screened for activity in the phenol red assay at the robotic platform (Figure 32).[288]

A B

C D

Figure 32: Phenol red assay with 1 mmol L-1 HEPES pH 8.2 and 20 mg L-1 phenol red. Typical results from (A) a selected DhlA DUAL library, (B) a selected DhaA DUAL library, (C)-DhaA-WT and (D) an HLD inactive enzyme. The blank wells without cells are shown by a black box. The red wells in (C) were excluded by the rate measurements, because by the automatic OD600 measurements no cell growth was detected.

Chapter three|Part I: Results 155

The highest number of hits was achieved for DhlA after one day of selection, but for DhaA only after five days. In case of substrate 2, only hits were counted for DhlA, but also only 44% with substrate 5. In case of DhaA the lowest reversion was 75% with substrate 1, but up to 95% for substrates 4 and 5. The mutagenesis of the dehalogenase enzymes cannot be excluded, but all sequenced samples yielded the wild-type dehalogenases for both case studies. With a suitable selection system at hand, the pre-selection of mutants derived from directed evolution, e.g. by error-prone PCR, was possible. An up-scale of the selection culture to a 30 mL scale was easily applied. It was necessary to transform the products of the random mutagenesis first in TOP10 to repair the plasmids and to yield a sufficient library size, before transformation to BL21 (DE3) for selection. But also BL21 (DE3) gold cells were successfully applied in the selection assay, able to repair nicked plasmids, finally avoiding the intermediate step of plasmid isolation. The transformation efficiency was lower, but not as low as in case of direct transformation in BL21 (DE3).[324] Certain error-prone PCRs, achieved from different PCR conditions, as a high amount of manganese sulfate, the addition of an unbalanced dNTP mixture, special mutagenesis polymerases and nucleotide analogs, were selected, yielded more positive hits in the automatized screening, as the non-selected libraries. After re-screening the hits in master plates, the amino acid changes were determined and the proteins were purified for the Iwasaki assay. Finally, no halide release was detectable. The selection assay was furthermore applied to the CIF-mutants CIF01-20 and surprisingly, some mutants survived the selection process. It must be noted, that only the general substrates 1, 3 and 5 were used, eliminating the cells expressing the initial negative control of CIF-WT (Figure 33).

A B

Figure 33: Growth curves of pre-induced CIF mutants CIF01-20, selected towards the substrate 1 (A). After four days, the plasmids were isolated from the surviving cell cultures, freshly transformed into new competent cells and selected again (B). The curves are labeled only for the surviving cell cultures. The figure is continued on the next page. Chapter three|Part I: Results 156

C D

E

Figure 33, cont.: Growth curves of pre-induced CIF mutants CIF01-20, selected towards the substrates 3 (E) and 5 (C and D). After plasmid isolations from the surviving cell cultures, freshly transformed cells were selected again (D). The curves are labeled only for the surviving cell cultures. Please note the complete death of all mutants, when 4-bromobutane nitrile (substrate 3) was used.

The toxicity tests tuned the substrate concentrations to 20 mmol L-1 to eliminate cells, expressing the CIF-WT. However, measurable growth rates were achieved at the same glucose concentration, when 15 mmol L-1 of the substrates were added. Thus, the expression of CIF01-06, CIF13-16, CIF18 and CIF20 permits growth within the environment of 20 mmol L-1 1,2-dibromoethane and also CIF02, CIF03, CIF11, CIF12, CIF16 and CIF20 enabled the cellular growth in 20 mmol L-1 1-iodobutane. It is possible that either the toxin concentration was lowered by at least 5 mmol L-1 of each substrate by the mutants or the proteins have another unknown effect to the cells, leading to cell survival. The plasmids were sequenced after the second selection round and only the initial protein sequences were found. The decrease/detoxification of a maximum of 5 mmol L-1 of the toxin requires the binding or the catalysis of 25 µmol substrate. Hypothetically assuming a 1:1 binding stoichiometry, a protein content of 172 mg mL-1 Chapter three|Part I: Results 157 would be required. Thus, additional experiments became necessary to clarify these observations: Within ITC experiments, the binding or catalytic behavior of proteins can be measured.[328-331] Furthermore, it is possible to mix the substrates and purified enzymes to determine their interaction by tryptophane-quenching. Additionally, GC experiments were done to detect the alcohol products. These methods will be referred to in the next section. At first, the selection assay was adapted to figure out, how much of the purified protein must be added to enable cell survival of CIF-WT in the environment of 1 and 5 (Figure 34).

A B

C D

Figure 34: Growth rates of pre-induced CIF-WT M9 cultures selected with 1,2-dibromoethane (1,2-DBE, 1), 4-bromobutanenitrile (4-BBN, 3) or 1-iodobutane (1-IB, 5). Different amounts of purified DhlA (A), DhlA H289F (B), CIF02 (C) or CIF04 (D) were added to the selection culture. Growth was measured after 24 h incubation at 37 °C. Non-selected E. coli cell cultures, expressing CIF-WT, grew up to an optical density of 1.0 and 1.3 with 0.2% (substrates 1 and 5) or 0.4% glucose (substrate 3), respectively.

When DhlA WT was added to selection cultures, the growth of cells expressing CIF-WT was promoted, with a minimum concentration of 1 µg mL-1 added enzyme to the cell culture. As expected from the activity measurements at pH 7.5, DhlA exhibited the highest activity towards substrate 5, indicated by the strongest measured growth. Substrate 3 was converted slower, but the growth rate increases, when more DhlA Chapter three|Part I: Results 158 enzyme was added. In case of substrate 3 a slow promotion was recognized, rapidly ceasing at an OD600 of 0.2. In contrast, the addition of DhlA H289F led to stable OD600 values, not changing at all, only a slight increase for substrate 1 over the residuals and a slight increasing rate for substrate 3 at 10-15 µg mL-1 protein. In DhlA H289F, the catalytic base is substituted by phenylalanine. Thus, the protein is able to react with the haloalkane until formation of the alkyl enzyme intermediate, trapping the substrate- enzyme complex.[262, 271] This reaction has application in protein purification as the so called Halo-tag.[332] However, binding of at least 25 µmol substrate molecules, requires 180 mg mL-1 of DhlA H289F. Decreasing this amount to 1 µmol, a protein concentration of 36 mg mL-1 would be necessary. The protein concentration is too low for sufficiently binding enough substrate molecules in a suicide reaction by a 1:1 stoichiometry. However, the protein is not able to promote growth of the cells sufficiently. The same result is observed with CIF02 and CIF04. In stronger resolution, a small rise up to an optical density of 0.07 for CIF02 and substrates 1 and 5 and for CIF04 with substrate 1 was measured. During the selection assay, cell cultures over a threshold value of 0.05 was used as an absolute minimum for the further cultivation, at the very least cell count, able to induce cell growth after additional selection days. This critical value was achieved for CIF02, yielding to survival of cells with CIF-WT, as well as for CIF02 in cell cultures with 20 mmol L-1 1,2-dibromoethane or 1-iodobutane. In case of CIF04 this was only observed for 1,2-dibromoethane, as well as in the selection assay with cells, expressing CIF04. Thus, the cell survival must be directly related to the certain protein variants, leading to the individual selection advantages, which are not an artifact during the cell cultivation or selection.

Gas chromatography analysis For the analysis of product formation or substrate consumption, a GC(-MS) protocol was available, described before.[303] During a research stay at the Masaryk University, Brno (CZ), additional GC-experiments were carried out in the group of Jiří Damborský (Loschmidt Laboratories Department of Experimental Biology). Together with Pavel Vaňáček, the product formation or substrate elimination was followed for CIF-WT (negative control) and the mutants CIF01-20 for the conversion of 1,2-dibromoethane. For both, reactions at pH 7.5 or pH 8.6, no product formation by alcohol release was detectable at 37 °C within 24 h. The decreasing concentration of the substrate was only correlated to the abiotic rate.

Isothermal titration calorimetry Two individual experiments were necessary to calculate the kinetic data of an enzymatic reaction by calorimetry: The ITC analysis of the to determine and recalculate the Michaelis-Menten kinetic and an additional total conversion experiment to determine the total reaction heat, giving the enthalpy of the complete transformation Chapter three|Part I: Results 159 for a known amount of substrate. In these multiple injection experiments, the substrate solution was added to the enzyme by the injection syringe. Contrarily, in single injection experiments the enzyme solution was added to a substrate solution, yielding all experimental data in one experiment. In general, ITC is used for the determination of binding parameters of a ligand and a receptor molecule. But switching the content of the sample cell and the syringe will produce similar signals in binding experiments. During enzyme catalysis, the measured plots strongly differed (Figure 35).

A B

Figure 35: Multiple injection experiments with 1,2-dibromoethane titrated to DhaA-WT. Through mainly differences in the spacing time, either the total conversion heat of a defined substrate concentration (A) or the reaction heat, developed from an increasing amount of substrate (B) was measured.

For initiating the experiments, different set-ups were executed with DhaA-WT, yielding feasible data at enzyme concentrations of 3-3.5 µg mL-1. After recalculating the substrate concentrations the maximal conversion velocity was determined in two -1 experiments. Finally kcat of DhaA wild-type was determined to 6.601 s with a kM of 1.242 mmol L-1 for the conversion of 1,2-dibromoethane at 37 °C. This is in excellent agreement to the previous reported constants of 6.2 s-1 and 1.2 mmol L-1.[266, 326] For each protein and substrate pair, defined reaction conditions must be set up by variation of the substrate and enzyme concentrations, the spacing time, the injection volume and the injection time. As this is a very time consuming procedure, the interactions of the “survivors” of the selection were chosen, together with the respective substrates, namely 1,2-dibromoethane (substrate 1, Figures 36 and 37) with CIF01-06, CIF13-16, CIF18 and CIF20 and 1-iodobutane (substrate 5, Figure 38) with CIF02, CIF03, CIF11, CIF12,CIF16 and CIF20. Chapter three|Part I: Results 160

Figure 36: Plot of the blank measurements. The titration of buffer to buffer, substrate to buffer and substrate to CIF-WT was measured; the spacing times were individually adapted for each experiment.

A B

Figure 37: Multiple injection experiments with 1,2-dibromoethane titrated to purified proteins of the positively selected variants. CIF-WT was included in these plots to visualize the high spacing time and the dilution heat of this reaction.

The addition of substrate 1 to the protein solutions of the CIF mutants produced different heats, developed from the substrate-protein interaction. The CIF-WT blank exhibited a nearly perfect curvature for the first injection, but then, only a dilution signal is produced. The signals from the mutants are significantly lower, thus the spacing time was lowered to produce the plots shown in Figure 35. For CIF13, CIF14 and CIF20, only a dilution signal was obtained, but as observed for CIF-WT in the first injection, the other CIF mutants also exhibited a stepped plot for the first 2-4 injections. Especially CIF02, CIF15 and CIF16 showed this behavior. The signals for the reaction heats of substrate 5 added to the mutants were lower as for substrate 1 (Figure 38), indicating less interaction, but also a stepped curvature of the initial injections was observed for CIF02, CIF12, CIF16 and CIF20. Chapter three|Part I: Results 161

A B

Figure 38: Multiple injection experiments with 1-iodobutane titrated to purified proteins of the positively selected variants. The buffer-buffer and substrate-buffer titration was included for comparison to the baselines. As negative control, CIF01 was included as non-survivor.

The lower heating signals are due to the lower substrate concentration, caused by the lower solubility of substrate 5 in buffer. The lower the substrate and enzymes concentrations used, the more the signal increased during the titrations: CIF01 is included in Figure 38A as extreme example (0.8 mg mL-1 enzyme solution, 0.6 mmol L-1 of substrate 5). The sensitivity of the experimental setup and the instrument is thus exploited. Indeed, a trend of substrate-protein interaction can be assumed for the surviving variants, but no kinetic data can be obtained. If there was no enzymatic catalysis observed, possible binding interaction were analyzed by ITC, switching the feedback mode to high for a stronger machine response and choosing the fast thermal equilibration for sensitive measurements. Furthermore, the multiple injection parameters were optimized to perform injection after reversion of the signal to the baseline. For binding experiments, substrate 1 and CIF01 and CIF18 were used (Figures 39 and 40). In Figure 39 three different protein concentrations were used for the same amount of substrate. A remarkable difference in the heat developed from each titration was found for a concentration of 2.0 mg mL-1 (peak intensity decreased by saturating the protein with substrate). This draw the hypothesis that the amount of substrate was too high during the reaction. After suggesting a 1:1 binding mode of substrate and protein, CIF18 was used in the next experiments (Figure 40). The raw data for the experiments with less substrate are shown in Figure 40. The developed heat dropped dramatically, because only the half amount of substrate was initially used. The curve characteristics are for the first titrations very similar to those expected for binding titrations, but no saturation could be observed; therefore the interaction between both binding partners seems to be very weak. It has to be estimated, at which point of the catalytic cycle the reaction stops, if any binding of the substrate inside the protein happens. Chapter three|Part I: Results 162

A B

C

Figure 39: Binding raw data for CIF01 with substrate 1 at protein concentrations of (A) 2 mg mL-1, (B) 1 mg mL-1 and (C) 0.5 mg mL-1. The strongest change in heat during the titration experiments was achieved with the highest protein concentration of 2.0 mg mL-1.

Figure 40: Binding experiments with CIF18 (2 mg mL-1) and different amounts of titrated substrate (5-10 µL).

Chapter three|Part I: Results 163

Tryptophane quenching To estimate the correlation between the halide binding and the appropriate concentration of enzyme saturation, tryptophane quenching was used. Thus, bromide or iodide salt was added to an enzyme solution and the change in fluorescence was measured. Tryptophane residues can be specifically excitated at 290 nm, resulting in an emission at 324 nm. Depending on the concentration of a quenching agent (halide ions), the fluorescence decreases with the increasing amount of the quencher. Substrates 1, 3 and 5 were used with CIF01-05 to estimate the dimension of the fluorescence change. DhaA H272F was used as positive control, due to the assumed suicide binding of the substrates (Figure 41).

A B

Figure 41: Fluorescence spectra of DhaA H272F (A) and CIF-WT (B) with an increasing amount of 1-iodobutane. The fluorescence value, used for the calculations was measured at 324 nm, as indicated by the arrow.

The fluorescence signals can be used to determine the dissociation constants of the quencher and the enzyme by using the Stern-Vollmer equation: 퐹 1 = − 퐹0 1 + 퐾퐷 × [푋 ] The drop in the fluorescence signal, depending on the initial fluorescence leads to an [260, 333] rational decay, which allows the calculation of KD (Figure 42, Table 7). Chapter three|Part I: Results 164

Figure 42: Evaluation of the fluorescence changes during the addition of a quencher. In this case, potassium iodide was added to different enzyme solutions. The curve was fitted to calculate the dissociation constant by the Stern-Vollmer equation (see above).

Table 7: Dissociation constants derived from the quenching experiments in [mmol L-1].

KBr 1,2-dibromoethane KI 1-iodobutane DhaA H272F 222.09 ±5.17 20.43 ±2.32 30.26 ±1.67 9.52 ±5.55 CIF-WT n.d. 3.94 ±2.00 13.90 ±0.81 11.69 ±4.44 CIF01 15.65 ±1.42 3.45 ±1.24 14.15 ±0.99 10.06 ±0.44 CIF02 n.d. 9.38 ±2.27 11.51 ±1.11 20.90 ±5.32 CIF03 n.d. 5.79 ±1.71 14.29 ±1.60 14.88 ±2.44 CIF04 0.60 ±0.16 6.12 ±1.45 13.55 ±1.05 9.55 ±3.22 CIF05 n.d. 3.93 ±1.11 16.98 ±1.19 6.98 ±1.29 n.d. - not defined

Structural modeling Within YASARA[106, 308] structure models for CIF01-05, CIF-DhaA and CIF-DhlA were calculated, refined (intern macro md_refine) and inspected by PyMol[309]. The domain exchange experiments were hampered by the introduction of a loop, blocking the active site of those variants.[306] This problem could not be solved, even if structural hybrids served as modeling templates or using the pdb query algorithm to calculate the structure de novo. The main problem is the new adjustment of the catalytic residues, because some residues of the catalytic pentad are positioned in the cap domain (DhlA) and some are not (DhaA). Even the engineered residues from the CIF mutants influenced the structure and the possible activity. Additionally, the shape and the throughput of the entrance tunnels and the position of the residues which are responsible for the Chapter three|Part I: Results 165 development of the halide collision complex must be able to accommodate activity. The catalytic acid D106 was not influenced by the domain exchange, so the point between the oxygen atoms of the carboxylic group of the gamma carbon was chosen as starting point for the tunnel calculation with Caver3.0 (Figure 43).[310]

A B C

D E F

G H

Figure 43: Tunnel calculations with Caver3.0 for (A) CIF-WT, (B-F) CIF01-05, respectively, (G) DhaA-WT (pdb code 3RK4) and (F) DhlA-WT (pdb code 1B6G). The active site nucleophile and base are represented in sticks. The tunnel clusters, reaching the surface at the same or a related position of CIF-WT are colored blue, different clusters are shown in purple. Please note the small sphere at the end of the entrance tunnel mouth of CIF-WT.

The entrance tunnel mouth for CIF-WT was very narrow, but in the other calculations it is widened up, indicating the acceptance of bulkier substrates. The entrance of CIF-WT was found for CIF02-04 in the same position, otherwise a slightly changed entrance Chapter three|Part I: Results 166 position defined a new tunnel mouth. Thus, the residues for substrate acceptance and substrate channeling changed. As depicted for DhlA-WT, a second tunnel entrance can be established, as found in CIF04 and CIF05. The basis for Rosetta calculations were the residues of DhaA, also the mutant structures may use the new tunnel. However, Rosetta only calculated the most stable residue position in the protein scaffold for stabilization of the transition state, facilitating enzyme catalysis. The tunnels have a homologous shape, length and bottleneck radius, which is widened compared to the wild type. It is also possible, that the final structural fold in solution has a too wide tunnel with a catalytic site, unable to stabilize the transition state, because of solvent molecules in the reshaped active site. The domain exchange example of CIF-DhaA is shown in Figure 44.

A B

Figure 44: Example for the consideration of the entrance tunnels for the homology model of CIF- DhaA. Only the cap domain and a loop responsible for the entrance site are depicted. CIF-DhaA is shown in red, CIF-WT in green (A) and DhaA-WT in magenta (B). The calculated tunnel is shown in blue for CIF-DhaA and in orange for DhaA.

The new tunnel cluster in CIF-DhaA slightly changed its position, whereas this entrance is blocked in CIF-WT by a small antiparallel sheet, composed of two β-strands. Obviously, the positions of the helices changed when the DhaA-cap sequence is modeled in the CIF scaffold. The interactions to the main domain mainly influenced the cap organization, thus, the entrance in CIF-DhaA is now organized by loops and is not as well defined as in DhaA, positioned between two helices. The problem of the re-localization of the cap helices may lead to the insoluble protein, achieved after protein overexpression. Indeed, the missing interaction partners can lead to the exposure of hydrophobic residue to the solvent, leading to aggregation. Thus, a refolding procedure would be obsolete.

Chapter three|Part I 167

Discussion

The aim of the thesis was the understanding of the correlation between epoxide hydrolases and haloalkane dehalogenases in a case study for the interconversion of both activities. The motivation of the comprehension of the evolutionary relationships draws the hypothesis about the origin of both enzyme classes, which are responsible of essential detoxification/degradation pathways. The phylogenic connection, wherein also haloacid dehalogenases must be noted, is very close and thus, structural design was performed to increase the knowledge about structure-function relationships. Indeed, first attempts to interconvert the epoxide hydrolase EchA and the dehalogenase DhlA failed, even if site specific saturation or whole genome mutagenesis was applied.[303-304] The initial crystal structure of DhlA from Xanthobacter autotrophicus GJ10 exhibits a very narrow entrance tunnel and the correlation between the stabilizing residues, compared to the epoxide hydrolase mechanism were difficult to model and to predict the impact of introduced residues. Thus, the introduction of the related tryptophane residues into the EchA may compress the active site to inactivity. On the other side, the initial structure of EchA was not sufficiently defined, as loop regions are missing and a glutamine residue with unfavorable torsions is pushed into the active site by crystallization forces, blocking the accessibility of the active site for any docking study or predictive calculations. Interestingly, in this study it was observed that the glutamine residue is not able to swing back into the favored position, when refinements or molecular dynamics were performed with the structure. Moreover, no new attempt to resolve the structure was done and thus, the best characterized epoxide hydrolase misses a sufficient structure for further analysis. The mutations of previous studies may have influenced the space inside the active site, the tunnel path or the entrance mouth. As alternative, the calculations of this thesis were performed with another epoxide hydrolase and the dehalogenase DhaA from Rhodococcus rhodochrous. This enzyme exhibits a higher tunnel throughput and a less narrow bottleneck radius, than DhlA. Thus, substrates with longer chains are accepted by the additional space from the substituted active site residue, which is a second tryptophane in DhlA, but only a small, very flexible asparagine residue in DhaA.[266] The structural comparison of transition states towards the general haloalkanes by using the catalytic and active-site shaping residues of DhaA yielded a theozyme, which was then applied towards the protein scaffolds of described epoxide hydrolases. The most promising match was with CIF, an epoxide hydrolase secreted by Pseudomonas aeruginosa for disturbing the flux of surfactants from alveolar epithelium cells to colonize the lung surface. For whatever reason this inhibitor is an epoxide hydrolase, the intrinsic activity is essential for the inhibition of the target ion channel CFTR. The coding sequence of the enzyme was ordered codon-optimized, cloned and mutated in a parallel, straightforward procedure to introduce the necessary amino acids, predicted by the Rosetta algorithm. In parallel the structures were modeled within YASARA and Chapter three|Part I: Discussion 168 analyzed for their capacity to bind the general haloalkanes. The tunnel clusters exhibited a promising throughput, but residues, which were partially blocking the essential tunnel pathways were identified in a connected, supervised diploma thesis.[306] Thus, additional mutants were created and analyzed. Within the structural analysis, the overall structure was not as strongly disturbed, as can be expected by the nine amino acid substitutions. After the evaluation of the CD-data, the strongest changes towards the predicted secondary structure were observed for CIF02. The overall portion of helical elements was decreased, while the unordered and stranded portions increased. These differences were not visible by homology modeling approaches and thus, the sensitive CD methodology proves its advantage over simple remodeling calculations. Overall, the ordered secondary structure elements prevail, in CIF-WT as in the CIF mutants and thus, a correct folding of the mutants was assumed. Additionally, the CD-curves of CIF mutants, expressed in E. coli T7 SHuffle[334] cells or co-expressed with co-induced chaperones[335-336] showed no structural differences in CD measurements. Thus, no additional folding assistants or reductive conditions for the formation of the disulfide bond were necessary and the expression in E. coli BL21 (DE3) or BL21 (DE3) Gold cells yielded properly folded proteins in a sufficient soluble portion.[311, 324] Compared to the haloalkane dehalogenases DhaA and DhlA, the shape of the curves is more intense at 190 nm (sheet fraction) and also at 210 nm, especially related to DhaA. These differences did not change very much throughout the mutants. The helical portion of the dehalogenases is higher due to an additional antiparallel β-sheet in the cap domain of CIF and thus, the stranded portion is even higher. This sheet seems to be stable, as the disturbed and stable portion of sheets did not decreased dramatically for the CIF mutants. There is no clear correlation between protein fold and the final function, because also the phylogenic and structural subclasses of haloalkane dehalogenases (HLDI-III) have members in the more recently described substrate specificity groups (SSGI-IV). The differences in fold for the created CIF mutants do not have the trend towards a secondary structure distribution as known from the HLDs. Moreover, they have more or less the same secondary structure portions as the wild- type, indicating the proper orientation of the substituted residues. This is a prerequisite to achieve the conformational orientation, which was initially predicted by Rosetta. Concluding, the mutants show a well-defined spectrum shape with two negative peaks, where the peak at 225 nm is not as intensive as in DhaA. Indeed, the ratio between the two negative peaks proposes information about the hydrophobicity of residues assembling the α-helices. The strong positive peak at 195 nm is characteristic for α- helices and in accordance with the results for DhaA-WT, whereas a shift towards shorter wavelengths indicates a stronger portion of β-sheets. The melting temperatures underline the structural integrity of all variants. The strongest differences were obtained for CIF01 by a decrease of roughly 5 K. A small fraction of mutants has also a slightly increased apparent melting temperature, but the maximum is Chapter three|Part I: Discussion 169 reached by plus 1 K for CIF17. This does not indicate an overall stabilization, quite contrarily the mainly reduced melting temperatures are a clear evidence for destabilizing influences. However, it should be stated that the drops in the melting temperature are not as strong as can be expected for the multiple substituted mutants. It was reported that mutations at the active site stronger influence the activity and more distant mutations exhibit a higher impact on the stability.[67] As this is not true for the examples of tunnel re-modulations, stabilizing certain haloalkane dehalogenases, it can be applied in this example, because there are other origins for protein stabilization, mainly the interactions between the rigid main and the more flexible cap domain. In single cases a small portion of refolding signals was observed. Thus, the proteins were allover not able to re-fold their natural structure after temperature-dependent denaturation, but in single cases secondary structure elements can be generated either by refolding the original structure or the formation of new connections and structural elements. The CD-spectrum from the unfolded CIF-WT clearly indicates the presence of disorganized random coils, though no clear refolding transition was observed and generally only one transition was detected in DSC and the CD measurements. It can be concluded that the protein completely lost its structure and also the very stable and rigid central β-sheet underwent irreversible randomization. The sheet is buried insight the structure and is surrounded by helices, thus these helices will promote the denaturation process and when the hydrophobic interactions to the sheet are lost, also the structural core is not able to keep the scaffold. The melting temperature is only one stability criterion and may not reflect the expected enzyme activity. Of course, after unfolding no activity can be expected with high probability, but also the activity might be lost far beyond the apparently destruction of the scaffold. The cap domain mostly consists of helices in a globular fold and thus, it is more probable that the unfolding may start in this domain, which was proved previously for its flexible conformational movements. Thus, the most important criteria for a stable and active hydrolase may the ability to keep the cap domain stably anchored at the top of the main domain. This was not calculated for the CIF mutants within Rosetta, but as the overall fold is almost the same and the melting temperature did not dropped dramatically, the mutants should be sufficiently stable for catalysis. All mutants were successfully purified and tested by the Iwasaki-assay for haloalkane dehalogenase activity and with the adrenaline assay for epoxide hydrolase activity. Unfortunately, the well folded mutants exhibit no haloalkane dehalogenase activity. The achieved detection limit was very low and comparable to appropriate GC methodologies, but no significant activity was detected over the time frame of seven hours. The activities of the positive controls DhaA and DhlA were successfully determined for the applied pH 7.5. The drop in activity is consistent to the activity profile of the dehalogenases, usually preferring the alkaline pH 8-9.[13] Additionally, the mutants lost their epoxide hydrolase activity. Facing the mutation H177N, the ring- opening histidine is substituted by the halide stabilizing asparagine. H177 is essential for Chapter three|Part I: Discussion 170 epoxide hydrolase activity and even if the residue was mutated to tyrosine - the even more conserved fifth part of the catalytic pentad - the activity was lost during previous studies.[16] Surprisingly, also the single point mutagenesis of A154T, which was proposed to be essential during halide binding in dehalogenases, destroyed the epoxide hydrolase activity. However, the CIF-WT was initially only reported to convert epibromohydrin, but no other epoxides were tested.[16, 199-200] In this thesis, CIF was found to be able to accept long-chain, terminal epoxides, also connected to an aromatic function. The epoxide hydrolase EchA accepts also long-, but also short-chain epoxides, which is in accordance to previous studies.[18] Interestingly, the activity towards epibromohydrin was not confirmed in this thesis. It was more likely, that CIF accept more complex/bulkier substituted epoxides. Therefore, 1,2-epoxydecane was converted, but not 1,2- epoxyhexane. Thus, it is more likely that the more flexible mounting of the residues from the cap domain enable more space in the active site. The lack of any activity led to the assumption to follow the pathway of strong-negative trade-off (Figure 7) during the interconversion experiments. Thus, the non-functionality must be reversible. To investigate a higher variance towards the gene sequences, error- prone PCR was performed with the inactive CIF mutants in related diploma and master theses, supervised by myself.[306, 324] Therein, an agar plate assay was adapted and refined and extensive mutagenesis of the initial CIF variants was performed with different error-prone polymerases, dNTP-analogs, manganese ion addition and the imbalanced utilization of the solely dNTPs. Thus, a high-throughput screening must be established to screen a sufficient variety of colonies to find active variants. The assay must be sensitive enough to aim the small expected activity. Therefore, the adaption of a pH-assay was successful using phenol red as two-colored indicator. The buffer capacity of the system was quite low, because no additional buffer substance as 1 mmol L-1 HEPES was applied. After validation tests, the application of a whole cell assay was achieved, while measuring the activity for 24 h. From an incubation time longer than 48 h, it was observed that the number of false-positives rose due to the metabolism switch of the resting cells. Transferring this knowledge to a robotic platform, up to 1600 clones can be screened fully-automated in one run.[288] Then, the capacity of the platform is not reached, but due to parallel processes, enough time must be calculated to not hinder parallel enzymatic assays.[288] Besides the in vitro mutagenesis of the gene sequences, subsequent cloning and screening, an in vivo system for continuous and targeted mutagenesis was developed (see Chapter 3, Part II) to randomly mutate the gene of interest, while cell growth and continuously screening is possible. However, directed evolution will cover the sequence space of the initial method and is thus hampered by the transformation efficiency of the library or the necessary, subsequent throughput, which is required to detect the desired mutants. In this thesis, the de novo implementation of haloalkane dehalogenase activity in an epoxide hydrolase was aimed and therefore, the expected activity will be very low. Chapter three|Part I: Discussion 171

A suitable selection system, specifically accumulating active enzyme variants would decrease the screening effort dramatically. For this purpose, the toxicity of the alkylating haloalkanes was used to permit lysis of cells with inactive proteins and enable cell growth by the survival, when an active mutant is expressed. This way, the stress induced mutagenesis of the host cells must be respected, on one hand broadening the mutagenic sequence space, on the other hand permitting unforeseeable influences to the host cell, the expression system and the target gene. Thus, in this thesis, suitable concentrations of haloalkanes were determined, to enable cell survival, while expressing DhaA- or DhlA-WT and to force cell death, when CIF-WT is expressed as negative control.[327] The frame of suitable concentrations is very narrow, but can be broadened, if the respective concentration of the carbon source (glucose) is adapted. As proof of concept, DhaA and DhlA were inactivated by the substitution of the nucleophile and the basic histidine, which are essential for the catalysis. Subsequently, the respective positions were saturated by NNK (nucleophile) or NDT (base) codons and the library underwent selection towards the toxic substrates. A very high reversion rate and thus, a rapid accumulation of wild-type dehalogenase were obtained, even after 24 h selection. This underlines the suitability of the selection assay for the successful accumulation of dehalogenase expression cells. Moreover, the assay may be adaptable for other compounds, as an estimated substrate recommendation (15-20 mmol L-1) is given and the adjustable glucose concentration enables additional selection pressure. In case of two compounds (1-iodopropane, 1- bromobutane), also the CIF-WT control survived the selection approach. This is due to the higher autohydrolysis rate, calculated from the Iwasaki assay. Thus, a higher amount may be applicable, but at 25 mmol L-1, also the positive controls exhibited no growth. Finally, a careful balance has to be estimated to achieve sufficient survival of positive clones, but also die-off the negative variants. During the selection of the epPCR libraries, several hits were identified and rescreened, after collating them in a master plate. From the initial 1-5% positive hits, only four were positive again and underwent deeper characterization.[324] After purification, the enzymes were proved negative in the Iwasaki- and adrenaline assays. It turned out that one clone carried an unchanged CIF01 sequence and thus, the mutants were selected in the proposed assay and interestingly, with 1,2-dibromoethane and 1-iodobutane some cells were able to survive and promote growth. In general, it is possible to find false-positive variants in the phenol red assay after selection if first, cells were able to use dead cell material to increase their fitness, or second, cells grew nearby microorganisms with active dehalogenase mutants and the partially detoxified medium was tolerable. The identified CIF01 initiated the selection procedure towards all mutants and twelve and six out of twenty mutants were able to promote cell growth under restrictive conditions with 1,2-dibromoethane or 1- iodobutane, respectively. After plasmid collection and retransformation, the growth rates were nearly identical or slightly improved for some mutants (in case of 1,2- dibromoethane). As the plasmid carried the original mutations, two conclusions must be Chapter three|Part I: Discussion 172 drawn: (1) only 4-bromobutyronitrile inhibited cell growth of proven inactive mutants and is thus recommended as selection compound in further analysis and (2) as no halide release was detected in the Iwasaki assay and no acidification during the robotic phenol red assay, the proteins must somehow interact with the substrate to promote cell growth. It should be mentioned that the first selection system was established with rhamnose as inducer, which was apparently consumed by the bacteria to unlock new growth substrates and stop the cellular disadvantageous protein expression. This could not be countered by glucose addition for catabolite repression without loss of the critical selection parameters. The CIF mutants were then analyzed by ITC to detect the lowest catalytic activity by heat changes during the reaction. The kinetic data of DhaA were successfully reproduced to verify the experimental setup. As negative control serves CIF-WT, but no variant exhibited enzymatic activity and no kinetic data could be extracted from the measurements. Thus, binding studies were performed, but even with higher amounts of protein, no significant signal was produced. During the titration, different mutants showed for the initial titrations a staged behavior, before the dilution signal (known from buffer-buffer titrations) was observed. Its intensity and duration was not correlated with special growth features of the corresponding selection cultures. Additionally, no binding titration signals were producible, as the interaction between substrate and enzyme is not detectable at all. To counterproof the chance for viability permission, CIF02 and CIF04 were purified and added to pre-induced selection cultures of CIF-WT. Surprisingly, the cultures reached significant cell densities. The control reaction with DhlA and DhlA H289F reached higher cell densities at comparable protein concentrations, whereas of course DhlA permitted growth for all three tested substrates (1,2-dibromoethane, 1-iodobutane and 4-bromobutyronitrile). For CIF02, viable cell cultures were obtained at concentrations of 3-5 µg mL-1 purified protein for 1,2- dibromoethane and 1-iodobutane, whereas for CIF04 only for 1,2-dibromoethane viable cell cultures were promoted. The partial detoxification of the medium is comparable to the suicide inhibitor DhlA H289F, able to react with at least one molecule of haloalkane to form an irreversibly trapped alkyl-enzyme complex. The behavior of CIF02 and CIF04 towards the toxins is in perfect alignment to the selection assay, as cells expressing CIF02 survived in 1,2-dibromoethane and 1-iodobutane and cells expressing CIF04 were only able to grow in selection media with 1,2-dibromoethane. As no halide and no proton release was observed and no binding characteristics were detected during the experiments, it was hypothesized to trap the substrates inside the buried active sites, but conformational changes during the induced fit may not allow the catalysis to proceed and thus, covalently or non-covalently bound substrate molecules are trapped and enable the essential detoxification for cell survival. Interestingly, the stress induced mutagenesis did not occur in detectable amounts during the selection process, as the selected survivors still harbor the original gene sequence. Through tryptophane quenching experiments information about the interaction and the spatial presence of Chapter three|Part I: Discussion 173 iodide or bromide ions should be gained by using the sensitive fluorescence of the active site tryptophane residues. Thus, the mutants may be able to bind also substrate molecules sufficiently. CIF-WT contains five tryptophane residues and an additional one was introduced by I130W to yield the same amount of tryptophane residues as in DhlA. Contrarily, DhaA harbors nine of those bulky residues and thus, the quench signal is expected to be much higher, as for the other enzymes. Contrarily, it was previously reported that DhaA is a quite challenging target for quenching experiments.[255] As expected, the nonfunctional mutant DhaA H272F exhibits a high dissociation constant for bromide and a lower for iodide, due to the substrate specificity of the wild-type enzyme. The dissociation constants are lower, because the dissociation was measured at pH 7.5, whereas the optimal pH value for dehalogenases is around 8.2.[12-13, 268] The not defined values indicate that the saturation was not achieved properly, or the dependency of the quenching signal versus the substrate concentration was not computable by the given equation. Thus, unspecific interactions can be assumed. The constants for the CIF mutants showed neither a specific trend, nor a substantial difference to the wild type enzyme. The constants were higher for iodide, than for bromide and therefore, interactions concerning iodine compounds seem to be more likely. Additionally, the constants for 1-iodobutane were higher as observed for the DhaA mutant. CIF02 and CIF03 permitted cell survival during the selection assay, when the proteins were expressed in the presence of 1-iodobutane. The constants are increased with this substrate and may indicate a potential binding interaction. As all tryptophane residues contribute to the fluorescence signals, but not all are accessible by the used quencher, the data must be handled with care. A noticeable increase is found with selection survivors, but only for 1-iodobutane. The substrate 4-bromobutyronitril was found to be a fluorophore itself. During the quenching experiments, the emission rose with the content of this substrate and thus, no quenching data are available. In CIF-WT a tryptophane residue at position 133 may disturb the quenching experiments, resulting in quenching signals for the wild type, as observed. Additionally, W133 will interfere with the introduced W130. Moreover, nearly the introduced N177 as halide stabilizing residue, L174W or F178W were proposed, which will influence the position of N177, found to be too far away for catalysis. In further approaches, the introduction of other putatively stabilizing residue can be followed. The N38 in LinB or N41 in DhaA are situated within the HGXP motif (X = any amino acid, usually aromatic in epoxide hydrolases). Therefore, it is stabilizing the negative halide ion. In epoxide hydrolases, the backbone nitrogen atoms stabilize the negative charge, developing during the catalysis. In EchA, X is tryptophane, in human and potato epoxide hydrolases phenylalanine, as well as in CIF. Another difference is the change of the latter amino [18] acid, forming H61GFG64 in CIF. In DhlA, X is a glutamate residue. Thus, the introduction of the halide stabilizing residue at this position can be proposed. Another common motif is structurally adjacent to the HGXP sequence, the GXSmXS/T motif. X is any amino acid, Sm a residue with a small side chain and S/T either a serine or a threonine. In haloalkane Chapter three|Part I: Discussion 174 dehalogenases, the second residue is charged and Sm usually glycine: GFGKS in DhlA, GMGKS in DhaA and GMGDS in LinB. Related residues are found in the epoxide hydrolases: GYGES in human, GYGDT in potato epoxide hydrolases and GFGDS and GLGQS in EchA and CIF, respectively. The next steps will be the sequential arrangement of the typical family residues to fine tune the active site. There are interactions with potential substrates, but the shape of the active site, the accessibility and consequently the orientation must be carefully revisited. Additionally, the mounting of the essential residues has to be controlled, as the position of active site residues and the shape of the cleft is critical to the final catalysis. Therefore, the second shell is interesting with the respective contacts to the catalytic pentad. It was not successful to simply introduce the most necessary residues into the active site to generate the desired activity and thus, the divergence between both enzyme classes is larger than expected. The current variants somehow influence the fitness of expression hosts, if they are produced under restrictive conditions during the selection assay. A simple adsorption into the protein and non-covalent trapping seemed to be most plausible, as halide or proton release was not detected in any assay. For further ITC experiments, substrate and protein interaction may be investigated for some candidates, but also product-protein experiments could visualize a different affinity of substrate or product and can interpreted as a stronger binding of product in the protein, caused by the dramatic changes in hydrophilicity after catalytic conversion. Another increase of the protein concentration to 8-10 mg mL-1 would be helpful, to increase the heat signal in the ITC by also higher dilution heats. To evolve a final haloalkane dehalogenase from the current CIF variants, a suitable selection assay was developed in this thesis. By coupling to in vivo evolution methods, the final activity may be generated, if possible. To vary the selection pressure, it is also possible to change the IPTG concentrations, even in the selection process. Since the cells are pre-induced by growth in LB the initial protein amount has to be respected, because in former experiments it was observed that cell cultures induced insufficiently were not able to survive the selection. It can be helpful to apply the split-GFP technology by adding - on the genetic level - an isolated β-strand element of GFP C-terminally to the CIF mutants.[337] Thus, by addition of the residual GFP molecule or co-expression, the protein content can be easily determined by the fluorescence signal. This can be hampered by protein solubility problems, if the tag promotes aggregation of the tagged molecules or interferes with the central β-sheet of the hydrolases. It should be stated that the application of the selection assay in evolutionary experiments has to be done under harsher conditions, to eliminate cells expressing solely “binders”. 4- bromobutyronitrile promotes a very strong selection pressure and is thus the most promising candidate. As every directed evolution needs to be forced towards the goal, the substrate is suitable for the combination with in vivo evolution methods (see Chapter 3, part II). Chapter three|Part I 175

Summary

The investigations on the α/β hydrolase-fold enzyme classes of epoxide hydrolases and haloalkane dehalogenases should reveal evolutionary relationships and functional correlations between these classes. As they are sequentially, structurally and phylogenetically very closely related, a Rosetta design was used to introduce transition state and therefore halide stabilizing residues to implement haloalkane dehalogenase activity in the epoxide hydrolase CIF. The initial mutants did neither exhibited dehalogenase activity, nor residual epoxide hydrolase activity, but exhibited a proper hydrolase fold and were structurally not significantly different to the wild-type conformation. Also the melting temperature profiles did not change dramatically. Consequently, directed evolution methods were established to achieve the necessary mutations through a potential high mutagenesis rate. Thus, a fully-automated assay procedure, starting from freshly inoculated 96-well plates was established, distinguishing between active and inactive protein variants by a color shift of the used, weakly buffered indicator solution.[288] By the application of a substrate mixture, the sensitivity was additionally increased to detect at least the lowest active mutants. Additionally, a selection assay was developed to accumulate dehalogenase active enzymes over the time and finally enrich the medium with potential hits by facing the expression cells with the toxic haloalkanes in excess. For the five general haloalkane dehalogenase substrates[12], suitable concentrations and combinations with the growth substrate glucose were determined to accumulate rapidly revertants from a non- functional library, which underwent saturation mutagenesis at catalytically essential positions.[327] Thus, the directed evolution libraries were focused, selected and screened. Positive hits were pooled, re-screened and characterized. After intensive characterization and assay analysis, the final hits were also found not to be active. Unfortunately, also in ITC experiments, no catalysis was detectable in these mutated variants.[324] Surprisingly, several mutants promoted cell growth during the new selection assay and also the addition of purified mutant protein to wild-type cultures rescued the cells. During ITC measurements, no catalysis was proved and also no clear binding profile was detected. In tryptophane quenching experiments, a potential protein substrate interaction was indicated, which is stronger in case of iodine salts/iodinated compounds. Finally, it has to be concluded that several positions in the CIF structure are promising candidates for site-saturation mutagenesis and additional substitutions. Since protein variants were found, which are able to interact with the general haloalkane dehalogenase substrates to rescue solely wild-type cultures, the way to full catalysis seem to be only some - unfortunately still unknown - substitution steps away. Chapter three|Part I 176

Materials and Methods

Rosetta calculations The following methods were not performed by myself. The complex molecule mechanics and structure predictions were carried out by David Bednář, Loschmidt Laboratories, Masaryk University, Brno, Czech Republic.

Quantum mechanical evaluation of the transition state in HLD catalysis

The structures of the haloalkane dehalogenase substrate molecules were designed in Avogadro[338] by energy optimization with MMFF94. The protein structure of DhaA (pdb code 1CQW) was protonated at pH 8.5 using the H++ server.[339] MGLtools1.5.2[340] were applied for substrate and enzyme preparation. Energy pre-calculations were done by AutoGrid 4.0.[341] The center of the grid maps was set to OD1 of D176 with 80x80x80 grid points and a spacing of 0.25 Å, ensuring the coverage of the active site and the main entrance tunnel. Subsequently, the substrates were docked into the DhaA structure by AutoDock 4.0.[341] The following parameters were used for the Lamarckian Genetic Algorithm: an initial population size of 300, a maximum of 3 x 106 energy evaluations in 30,000 generations, an elitism value of 1 by a mutation rate of 0.02 and a crossover rate of 0.8 in finally 250 runs. Local search was performed by Solis & Wets algorithm with the maximum of 300 iterations.[342] Every docking was clustered by its final orientations with a maximum of 0.5 Å root-mean-square tolerances for the conformational cluster analysis. Selected for the identification of the transition state for quantum mechanical calculation were only the best binding modes, satisfying all distance and angle parameters regarding the reactivity. In PyMol, all amino acid residues in a radius of 8 Å around the docked substrate molecule were chosen for the following calculations. This includes the catalytic pentad and first shell residues, as well as most of the second shell residues.[309] The quantum mechanical calculations were done with MOPAC 2009[343] using the PM3[344] semi-empirical method. By the MOZYME[345] algorithm, localized molecular orbitals were applied to compute self-consistent field equations. To map the reaction pathway, DRIVER[346] was used and structure minimization was performed by BFGS geometry optimizer. Following keywords were set to control the calculation: GEO- OK, MMOK, NODIIS, NOINTER, NOXYZ. Charge was set to 0. The final data analysis was done with TRITON 4.0[347]. The transition state was visualized in PyMol and distances, angles and dihedrals between the catalytic pentad and the substrate molecule were measured. Subsequently, these data were used to construct the Rosetta theozyme, well defined by a constraint file.[65] Chapter three|Part I: Materials and Methods 177

Predictive Rosetta calculations

The constraint file obtained before was divided into 5 blocks. Three of them describe the geometry of the catalytic triad and the dehalogenase substrate, the other the halide stabilizing residues. The rotamer library was created by “generate_ligens” and added to ligand parameter files. Then, fourteen structures of epoxide hydrolases were chosen from the pdb (pdb codes 1S8O, 1VJ5, 2CJP, 3ANS, 3ANT, 3CXU, 3I1Y, 3I28, 3KD2, 3KDA, 3KOO, 3OTQ, 3PDC and 3PI6). All ligands were removed from the structures, as well as ion atoms and water molecules. The residues were renumbered and the active site cavity matching was obtained by gen_apo_grids. The calculation was started by default settings, except the minimum size of the pocket was set to 50. The largest buried cavity was chosen as potential active site, matching the active site of HLD into all 14 epoxide hydrolase scaffolds by RosettaMatch.[348] Proteins for the enzyme design were chosen, after fulfilling all constraints, for Rosetta enzyme design.[65] Thereby, the residues within

6 Å of the center, residues with Cβ atoms within 8 Å and residues with the Cα-Cβ bond vector pointing towards the transition state within 10 Å were redesigned. Additionally, residues within 12 Å were repacked and remaining residues were fixed. For each model, a backbone and side-chain minimization was performed.[95] To keep the constraints geometry and the substrate-enzyme interface energy, a geometry filter was used. After successful design, the HLD substrates were docked into the structures as described above, to exclude instable binding and unfavorable geometries. Finally, bond ligands were removed and the protein structures were energy minimized by minimize_with_cst, with following settings: a full atom pair potential cutoff of 9 Å, standard scoring weights and a constraint weight of 1, a CA tether for harmonic constraints of 0.5 and a non- restriction to side chains. The calculated designs and the original scaffold were evaluated by the Score module with default settings and the differences between original and designed proteins were taken for indications about the stability measurement. The following method section was performed by myself.

Gene cloning and site directed mutagenesis

Rational design - site directed mutagenesis

The genes of CIF-WT (GenBank: ABJ12172, UniProt: A0A0H2ZD27) and CIF01 (Table 1) were ordered (GenScript, USA) codon-optimized for expression in E. coli and subcloned in pUC57. For DE3-dependent protein expression, the genes were subsequently cloned into pET28a downstream the T7-promotor, omitting the N-terminal signal sequence for extracellular secretion (polypeptide MILDRLCRGLLAGIALTFSLGGFA). Accounting to the procedure by Bahl et al., a C-terminal His6-tag was added for chromatographic protein purification, using the NcoI and XhoI restriction sites.[16, 199-200] CIF02-05 were mutated by PCR/MegaWhoP coupled reactions (Figure 45). Chapter three|Part I: Materials and Methods 178

Figure 45: Flowchart for the mutagenesis of CIF01. The subsequent introduction of mutations was performed in one reaction, parallel using the same PCR program. The PCR was done with mutagenic forward and mutagenic or template’s reverse primer. The annealing temperature and the elongation time are shown on the right side. Afterwards, the crude product was used for MegaWhoP reaction, then DpnI was added to destroy parental DNA and the product was transformed into competent E. coli cells. After plasmid isolation and sequencing, the next round started. Red marked are the final mutants, derived from the intermediate products. MW - megaprimer PCR of whole plasmid, onc - overnight culture

The mutations were introduced by suitable oligonucleotides in a PCR, whereas sequentially adjacent positions were combined in one oligonucleotide. The second primer was derived from the template (pET-RP) or was also designed for additional substitution. However, repeated rounds of PCR were necessary to incorporate all substitutions. The PCR products were directly used for MegaWhoP reactions, amplifying the whole plasmid. Parental DNA was digested and the material transferred into competent E. coli Top10 cells (ThermoFisher, USA). The mutants CIF06-20 were derived by CIF-WT and CIF01-05 by simple QuikChange® reactions. The correct substitution was proved by DNA sequencing (GATC Biotech or eurofins genomics, GER). The primer and gene sequences were deposited in the group “Biotechnology and Enzyme Catalysis”, Institute for Biochemistry, Greifswald University or are available on the primary data device, provided with this thesis. The genetic manipulations for the saturation libraries of DhaA and DhlA used in the selection assay were described in the respective publication.[327] Chapter three|Part I: Materials and Methods 179

Rational design - domain exchange

Additionally, domain exchanges were performed, because the special cap domains seem to play an important role in substrate acceptance and specificity.[277] The cap domain of DhaA includes residues I132-A216 (84 amino acids), the domain of DhlA L151-R229 (78 amino acids) and CIF I156-N249 (117 amino acids). The longest chain was proposed by CIF, including an additional β-sheet, consisting of two antiparallel β-strands. The globular domain consists additionally of six helices, whereas in DhaA and DhlA also six or five helices are present, respectively. Thus, the main domain of CIF-WT was kept and the cap domains of DhaA or DhlA were added genetically. The template of pET28a/CIF-WT and exclusively the HLD cap domains were amplified with suitable primer and were recombined by FastCloning.[349]

Directed evolution

For further improvement by directed evolution, suitable oligonucleotides ensuring error- prone PCR amplification, starting at the first amino acid of the CIF gene and ending in front of the histidine tag were created. Under varying conditions (unbalanced dNTP- excess, manganese ions, increased magnesium concentration, error-prone Kits, Taq and mutated Taq polymerase), error-prone PCR libraries of the CIF mutants were created and screened/selected for haloalkane dehalogenase activity.[306, 311, 324]

Biochemical characterization

Circular dichroism spectrometry (CD) - spectra scan

Nearly all secondary structure folds or random coils (unfolded protein) can be recognized by CD-experiments, resulting in characteristic CD-spectra (Figure 46).[350-351]

Figure 46: CD-spectra derived from different secondary structures. Chapter three|Part I: Materials and Methods 180

The protein samples were diluted to concentrations of 0.1-0.25 mg mL-1 in 2-5 mmol L-1 phosphate buffer. At first the absorption of the protein was measured from 180-260 nm, afterwards the buffer blank for baseline subtraction. The spectrum profile was scanned in a 1 mm quartz cuvette with 400 μL of sample five times at 20 °C and a scan mode of 100 nm min-1 and averaged. The spectrum derived from the buffer was set as baseline, which was subtracted from the spectrum of the protein sample. The data were normalized with the formula: 푀푒푎푛 푟푒푠𝑖푑푢푒 푒푙푙𝑖푝𝑖푐𝑖푡푦 [푑푒푔 푐푚2 푑푚표푙−1] 표푏푠푒푟푣푒푑 푒푙푙𝑖푝푡𝑖푐𝑖푡푦 × 푚표푙푒푐푢푙푎푟 푚푎푠푠 × 100 = 푝푟표푡푒𝑖푛 푐표푛푡푒푛푡 × 푛푢푚푏푒푟 표푓 푎푚𝑖푛표 푎푐𝑖푑푠 × 푝푎푡ℎ푙푒푛푔푡ℎ 푡ℎ푟표푢푔ℎ 푐푢푣푒푡푡푒 The data were received either at a Chirascan CD Spectrometer (in-sample temperature control) or a Jasco J-810 spectropolarimeter (peltier temperature control). The secondary structure prediction was calculated at the DichroWeb server.[317, 321] The input format was JASCO 1.50 or Applied Photophysics and the input unit millidegrees per theta (machine units). The analysis program CDSSTR was chosen with reference set 3[315- 316, 352] and a scaling factor of 1.0. All results were obtained with a normalized root mean square deviation (NRMSD) < 0.025.

Circular dichroism spectrometry (CD) - temperature scan

The absorption at one minimum value (usually 221-222 nm) of the CD spectra can be used for observations of changes in the secondary structure. Under the application of a temperature gradient from 15-95 °C, characteristic curves can be detected (Figure 47).

Figure 47: Plot for estimating the melting temperature of CIF01. CD was measured at 221 nm from 10-80 °C. The melting temperature was calculated by regression analysis towards sigmoidal function with four variables.

The inflection point of the sigmoidal fit represents the temperature, at which 50% of the protein structure was denatured - the melting temperature TM. Thus, 3 mL of the protein Chapter three|Part I: Materials and Methods 181 sample (1 mg mL-1 in 20 mmol L-1 phosphate buffer) was analyzed by continuous stirring in a 1-cm quartz cuvette (Hellma Analytics, GER) at a temperature slope of 2 °C min-1 from 15-80 °C with 1 nm bandwidth at a wavelength of 220 nm and a data range of 0.1 °C. The response was set to 1 s and a reversed temperature scan was applied, after keeping the highest temperature for 3 min stationary at 80 °C to detect refolding shapes at a rate of 1 °C min-1 with an in-sample temperature controller in a 0.1 cm glass cuvette The data were fitted against sigmoidal functions either by SigmaPlot (Systat Software, GER) or within the Jasco DataViewer software.

Differential Scanning Calorimetry (DSC)

By this method, the difference in heat is measured, which is necessary to increase the temperature of two solutions. Thus, differences between a heated buffer and protein solution detect the unfolding procedure of the solubilized protein at the range of its transition temperature. The area of this peak reflects the enthalpy of the transition from the folded to the unfolded state. Hence, the maximum point of the peak corresponds to the temperature of 50% protein denaturation - the melting temperature (Figure 48).

Figure 48: Example for a typical DSC curve for a protein (in this case CIF02) with one melting temperature, indicating a monomeric protein with one globular domain. The measurements were done in triplicates as shown and the results were averaged.

The protein samples were prepared to concentrations of approximately 1 mg mL-1 in 20 mmol L-1 phosphate buffer. 500 μL of the protein sample and 500 μL of the pure buffer were pipetted into separate wells of a 96-well deep well plate. The samples were measured automatically at a MicroCal VP-Capillary DSC (Malvern Instruments, UK) in a range from 12 to 80 °C with 1 °C min-1 in triplicates. The data were processed in OriginLab (OriginLab Corporation, USA). Chapter three|Part I: Materials and Methods 182

HLD-activity assay according to Iwasaki et al.[325]

10 μL of the halogenated substrate were injected in 10 mL reaction buffer (100 mmol L-1 glycine buffer, pH 8.6 or 20 mmol L-1 phosphate buffer, pH 7.5). The mixture was incubated in a water shaking bath for 30 min at 37 °C and 200 rpm. Afterwards 1 mL of the reaction mixture was withdrawn as substrate blank. The buffer blank contained 1 mL of the respective buffer and for the enzyme blank 1 mL of a mixture of 100 μL of enzyme solution and 900 μL buffer were used to keep the appropriate dilution. Thus, the reaction was started by adding 1 mL of enzyme in the concentration of 1- 2 mg mL-1. For active dehalogenases, every five minutes and for the investigated mutants each hour, 1 mL of the mixture was withdrawn and mixed with 0.1 mL of 35% nitric acid and mixed thoroughly, to stop the reaction. 100 μL of Iwasaki solution I (5.6 g

Hg(SCN)2 in 0.6 L Ethanol) and 200 μL of Iwasaki solution II (270 g FeNH4SO4 in 324 mL

HNO3 (65%) adjusted to 900 mL with A. dest.) were added. Finally, 100 μL of the mixture were pipetted in eight wells of a MTP and the absorbance was measured at 460 nm. The values were averaged and the enzyme activity was calculated with a respective standard curve for bromide or iodide solutions.

Phenol red assay

To detect HLD activity with high-throughput, a whole cell assay was established, since the cellular membranes are permeable for the desired substrates. Single colonies were picked (QPix 400, Molecular Devices, USA) in MTPs, filled with 200 µL LB medium, supplemented with appropriate antibiotics. Cells were grown overnight at 37 °C and 200 rpm and were subsequently replica plated to induction plates, filled with 180 µL supplemented LB medium. After 3-5 h growth at 37 °C (target OD600 of 0.4-0.6), protein expression was induced by adding 20 µL IPTG in supplemented LB medium (final concentration 0.1 mmol L-1). Thus, the plates were shaken at 20 °C for additional 8 h. Then, the cells were harvested (25 min, 4500 g, 4 °C) and washed twice with 200 µL HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid, 1 mmol L-1, pH 8.2) buffer to reduce the background signal. 50 µL of the whole cell suspension were added to 150 µL of the assay solution in corresponding assay plates (filled with 140 µL HEPES, 1 mmol L-1, pH 8.2, 20 mg mL-1 phenol red and supplemented with 10 µL of a mixture of 100 mmol L-1 haloalkanes 1,2-dibromoethane, 1-iodopropane, 1-bromobutane, 1- iodobutane and 4-bromobutyronitrile in acetonitrile). All five substrates are known to be accepted by the dehalogenases and using a mixture increases the chance to detect the desired active variants.[12, 327] The initial and the final absorption after 24 h incubation at 37 °C was measured at 430 and 560 nm. The phenol red indicator color change to yellow, indicated by a change in the ratio of both absorption values, was used for signaling the release of protons and subsequent catalytic conversion of the haloalkanes to the corresponding alcohol. Chapter three|Part I: Materials and Methods 183

Adrenaline assay

Either whole cells, as prepared in the previous section, or purified protein (100 µg) were used for this assay. The sample was mixed with 90 µL 20 mmol L-1 phosphate buffer (pH 7.5) and 10 µL of the epoxide substrate (100 mmol L-1, dissolved in acetonitrile) and shaken for 3-24 h at 37 °C and 350 rpm. Then, 20 µL of a 15 mmol L-1 sodium peroxide solution (in A. dest.) were added and shaken at 37 °C for 1 h at 350 rpm. Then, the absorption at 490 nm was measured as blank value and subsequently, 20 µL of adrenaline bitartrate solution (10 mmol L-1 in A. dest.) were added, the MTP was shaken for 3 s at 400 rpm and the absorption was measured again.

Robotic GC/MS at Loschmidt Laboratories, Masaryk University, CZ

The consumption of haloalkane substrates was analyzed in screw capped vials with magnetic caps. To 1 mL of reaction buffer (0.05 mol L-1 sodium phosphate, pH 7.5 or 0.1 mol L-1 glycine, pH 8.6), 1 µL of pure substrate was added and the reaction mixture was incubated for 30 min at 37 °C. The reaction was initiated by the addition of 150 µL of the enzyme (1 mg mL-1 in corresponding buffer). The whole reaction was monitored for 24 h at 37 °C by regular withdrawing of 1 µL of the reaction mixture. The concentration of the substrate/product was quantified using GC-MS (Trace 1300 (Thermo Scientific, USA) equipped with a capillary TG-SQC column (30m x 0.25mm x 0.25μm Thermo Scientific, USA) and connected to a ISQ™ LT Single Quadrupole (Thermo Scientific, USA)). 1 μL of the sample was injected into the split-splitless inlet at 250 °C, applying a split ratio of 1:50. The temperature program was isothermal at 40 °C for 1 min, followed by an increase to 140 °C with 20 K min-1 and final hold for 2 min. Helium was used as carrier gas (1 mL min-1). The MS was operated at the SCAN mode (30 to 300 amu), the temperatures of the ion source and the GC-MS transfer line was 200 °C and 250 °C, respectively. The initial rate of substrate conversion was analyzed by non-linear regression with the software Origin 8.0 (OriginLab, USA).

Isothermal titration calorimetry (ITC)

ITC experiments were used to check the protein’s ability of substrate binding or substrate catalysis.[329, 353] The parameters for each experiment were summarized in Table 8. The none-feedback mode was used to achieve a constant decrease of the signal at the lowest point after each injection. During later studies, the spacing time was decreased to 60 s and the duration increased to 30 s to increase the signal intensity. 3-4 mL of the enzyme solution (about 1 mg mL-1) in 20 mmol L-1 phosphate buffer (pH 7.5) were degassed for 20 min (ThermoVac 2, Malvern Instruments, UK). 10 μL of pure halogenated substrate were dissolved in 4 mL of the identical buffer, vigorously shaken on a vortex application for 10 min and incubated for 20-30 min at 37 °C in a cultivation shaker at 220 rpm. The measuring chamber was washed with MQ water and the used Chapter three|Part I: Materials and Methods 184 buffer, before filling the enzyme solution into the sample cell for a multiple injection experiment. Afterwards the substrate solution was filled inside the injector and rinsed twice. For a single injection experiment (SIE) the cell and syringe content were switched (syringe: enzyme, cell: substrate).

Table 8: Experimental set up for the ITC-measurements

Multiple injection experiment Single injection Parameter kinetics total conversion experiment total injections 28 8 1 cell temperature [°C] 37 37 37 reference power maximum (11.2) maximum (11.2) maximum (11.2) initial delay [s] 60-100 60-100 60-100 syringe concentration 0 0 0 cell concentration 0 0 0 stirring speed 502 502 502 volume [μL] 10 10 10 duration [s] 20 20 20 spacing [s] 150 5000 20000 filter period 2 2 2 feedback mode none none none

In case of binding experiments, the feedback mode was switched to high to achieve a rapid return of the signal to the baseline. Furthermore, the temperature control was performed by fast equilibration mode. The residual parameters were the same as for MIE kinetic experiment (Table 8). The injection volume was decreased for certain experiments, while increasing the injection number up to 40. After the experiment, the whole application was washed with 5 mL 5% DECON 90 and 200 mL MQ water. Then, the chamber was washed with 250 mL MQ water and the injector with 4 mL HPLC-grade methanol. The residual water in the chamber was removed and the injector was dried with the vacuum application.

Tryptophane fluorescence quenching

For halide dependent quenching experiments, 350 µL of a protein sample (approximately 60 µg mL-1 in 20 mmol L-1 phosphate buffer, pH 7.5; final concentration 0.3 µmol L-1) were mixed with 310 µL of the identical buffer, 20 µL acetonitrile and 20 µL of salt solution (potassium bromide or potassium iodide 0-4 mol L-1 in 20 mmol L-1 sodium phosphate buffer, final concentration range 0-200 mmol L-1). For the substrate dependent quenching measurements, the same amount of protein solution was mixed with 330 µL buffer and 20 µL of the substrate solution was added (1,2-dibromoethane, 1-iodobutane or 4-bromobutane nitrile 0-8 mol L-1 in acetonitrile, final concentration range 0-200 mmol L-1). The emission was measured from 300-400 nm after excitation at Chapter three|Part I: Materials and Methods 185

295 nm, before (F0) and after (F) the addition of halide ions/haloalkanes at a Xenius fluorimeter (safas, MC). The bandwidth for both, emission and excitation was kept at 5 nm with an average time of 0.5 s at a data range of 2 nm. The fluorescence was normalized by a blank sample, containing no enzyme solution. The association/dissociation constants were calculated using the given Stern-Vollmer equation in SigmaPlot (Systat Software, GER).

Growth analysis and selection assay Within this thesis, the toxicity of haloalkane and haloalcohols was tested towards different E. coli strains and the viability effect by the overexpression of a haloalkane dehalogenase was used to create a selection assay for the accumulation of cells, expressing active haloalkane dehalogenases.

Toxicity tests

Freshly transformed TOP10 or BL21 (DE3) E. coli cells with epoxide hydrolase, corresponding mutants or haloalkane dehalogenases were grown overnight. The OD600 of 30 mL LB media, supplemented with appropriate antibiotics, in shaking flasks was set to 0.05 and cells were grown at 37 °C to an OD600 of 0.4-0.6. Then, protein expression was induced with a final concentration of 0.2% L-rhamnose (pJOE plasmids) or 0.1 mmol L-1 IPTG (pET plasmids). After overnight growth and expression at 20 °C, cells were harvested (4500 g, 20 min at 4 °C) and set to an OD600 of 10 after washing twice with an appropriate volume of not supplemented selection media (M9 minimal medium: -1 -1 -1 -1 38 mmol L Na2HPO4 · 12H2O, 22 mmol L KH2PO4, 8.6 mmol L NaCl, 18.7 mmol L -1 -1 -1 -1 NH4Cl, 0.1 mmol L CaCl2, 2 mmol L MgSO4 · 7H2O, 8.2 µmol L biotin and 3 µmol L thiamine). It has to be noted that the M9 medium was set up 10x concentrated without calcium and magnesium salts and the vitamins to be autoclavable at 121 °C. Then, 1x M9 was prepared by adding sterile filtrated calcium and magnesium salts and sterile filtrated vitamins. An amount of 5 mL selection media in culture tubes was inoculated for cultivation with 50 µL washed cells (final OD600 of 0.1). The medium was supplemented corresponding to the pre-used concentrations with inducer and antibiotic and additionally 0.1-0.4% glucose. Then, the toxic substrates were added to their final -1 concentrations of 5-50 mmol L with a final concentration of 2% acetonitrile. The OD600 was measured regularly to determine the change in growth between the different cell cultures and substrate concentrations. The growth ratio was calculated as OD600 values compared to a similar selection culture without toxin, but with acetonitrile. This way the inhibition effect concerning the growth of the cells could be traced.

Selection assay

An overnight culture was inoculated with a freshly transformed DhaA or DhlA library in BL21 (DE3) cells. A 30 ml-LB-culture, supplemented with appropriate antibiotics, was set Chapter three|Part I: Materials and Methods 186

to an OD600 of 0.05 with these cells. At an OD600 of 0.4 the culture was induced with L- rhamnose or IPTG (final concentrations 0.2% or 0.1 mmol L-1 respectively) and cultivated further at 20 °C for 14 h. Afterwards the OD600 was set to 10 in M9 media and the cells were washed twice with not supplemented selection media. 300 μl of the final suspension were used to inoculate the 30 mL selection approaches. The selection media contained 30 ml M9 media with 100 µg ml-1 ampicillin or 50 µg ml-1 kanamycin, 0.02% L- rhamnose or 0.1 mmol L-1 IPTG, 20 mmol L-1 of the toxic substrate (in acetonitrile and to a final concentration of 2% acetonitrile) and 0.1-0.4% glucose, depending on the substrate conditions. Thus, the initial OD600 was 0.1. The culture was incubated for 24 h at 37 °C at 180 rpm shaking in an incubator. A dilution was plated onto agar plates and the complete selection approach was centrifuged at 4 °C for 20 min and 2000 g. The whole pellet was washed with M9 media and a new round of selection started by inoculating a freshly prepared selection media with the whole pellet. The colonies on the plates were counted to determine the cfu mL-1 and picked for phenol red assay screening to estimate the accumulation ratio of reverted haloalkane dehalogenases.

Computational methods

CIF mutants

The protein structures of the CIF mutants were constructed basically on the structure 3KD2 from Bahl et al. Thus, the side chains were substituted and the final molecules were refined with the YASARA macro md_refine to reach the most probable protein fold at an energy minimum, reflecting the apparently most stable structure. These models served as templates for correlated and structural analysis.

Homology modeling of cap variants

Starting from the Fasta sequence of CIF-WT, the cap domain was deleted and the respective amino acid residues from DhaA/DhlA were inserted. From this sequence the models were built on the basis of the structures 1BN6 for DhaA, 2HAD for DhlA and 3KD2 for CIF (respective pdb codes). Additionally a hybrid (as starting point both structures: HLD and CIF) and a pdb (no starting structure given, so the structural information from all pdb data will influence the model) homology model were designed and analyzed.

Molecular docking

A simulation box of 5 Å around the carboxylic function of the nucleophile was created. The file was saved as YASARA scene and used as receptor during the docking. Additionally, the energy minimized pdb-file of a substrate served as ligand. Subsequently, the macro algorithm dock_runensemble creates seven side chain rotamer ensembles with alternative high-scoring solutions. The substrates were kept flexible and Chapter three|Part I: Materials and Methods 187 were docked in 100 receptor-runs with VINA into each receptor ensemble family. Complexes with a minimum of total energy were superposed and analyzed, reflected in an output log-file. After visual examination, the docked complexes were refined (md_refine) and analyzed in PyMol.[309]

Tunnel cluster analysis

CAVER (3.0)[310] was used as the PyMol plugin to analyze potential tunnel pathways from a defined starting point inside the active site, exactly between both oxygen atoms of the carboxylate function of the nucleophiles. The default parameters were not changed. The plugin creates correlated spheres around the starting point and by the addition of new further spheres inside the initial sphere, a pathway to the solvent exposed surface is calculated by defined sphere sized. The related tunnel throughputs, lengths, bottlenecks and lining residues are detected, calculated and presented in the clustered tunnel report files.

Chapter three|Part I 188

Abbreviations bp base pairs MSA multiple sequence alignment CIF Cystic Fibrosis Transmembrane MTP microtiter plate Conductance Regulator MW megaprimer PCR of whole Inhibitory Factor, plasmid putative epoxide hydrolase pNPGE para-Nitrophenylglycidylether from Pseudomonas aeruginosa onc overnight culture BLAST basic local alignment search tool PAGE polyacrylamide gel E. coli Escherichia coli electrophoresis [323] DEAE diethylaminoethyl pdb protein data bank DhaA haloalkane dehalogenase from PCR polymerase chain reaction Rhodococcus sp. rmsd root mean square deviation DhlA haloalkane dehalogenase from SDS sodium dodecylsulfate Xanthobacter autotrophicus TIM triosephosphate isomerase DSC differential scanning WT wild type calorimetry EchA epoxide hydrolase from Agrobacterium radiobacter

EDTA ethylendiaminetetraacetic acid

EH epoxide hydrolase

FACS fluorescence activated cell- sorter GC gas chromatography GFP green fluorescing protein HLD haloalkane dehalogenase IAD innovation-amplification- divergence, a model for enzyme evolution ITC isothermal titration calorimetry MDN mutation during non- functionality, a model for enzyme evolution MS mass spectrometry Chapter three|Part I 189

References Mowbray, M. Widersten, S. C. Kamerlin, [1] D. L. Ollis, E. Cheah, M. Cygler, B. ACS Catal. 2015, 5, 5702-5713. Dijkstra, F. Frolow, S. M. Franken, M. Harel, S. J. Remington, I. Silman, J. [16] C. D. Bahl, D. R. Madden, Protein Pept. Schrag, J. L. Sussman, K. H. G. Lett. 2012, 19, 186-193. Verschueren, A. Goldman, Protein Eng. [17] J. H. L. Spelberg, E. J. de Vries, in 1992, 5, 197-211. Enzyme Catalysis in Organic Synthesis, [2] P. J. O'Brien, D. Herschlag, Chem. Biol. Vol. 1 (Eds.: K. Drauz, H. Gröger, O. 1999, 6, R91-R105. May), Wiley-VCH, Weinheim, 2012, pp. 363-416. [3] U. T. Bornscheuer, R. J. Kazlauskas, Angew. Chem. Int. Ed. 2004, 43, 6032- [18] B. van Loo, J. Kingma, M. Arand, M. G. 6040. Wubbolts, D. B. Janssen, Appl. Environ. Microbiol. 2006, 72, 2905-2917. [4] R. J. Kazlauskas, Curr. Opin. Chem. Biol. 2005, 9, 195-201. [19] J. W. Newman, C. Morisseau, B. D. Hammock, Prog. Lipid Res. 2005, 44, 1- [5] O. Khersonsky, C. Roodveldt, D. S. 51. Tawfik, Curr. Opin. Chem. Biol. 2006, 10, 498-508. [20] A. Archelas, R. Furstoss, Curr. Opin. Chem. Biol. 2001, 5, 112-119. [6] K. Hult, P. Berglund, Trends Biotechnol. 2007, 25, 231-238. [21] J. Swaving, J. A. M. de Bont, Enzyme Microb. Technol. 1998, 22, 19-26. [7] E. Busto, V. Gotor-Fernandez, V. Gotor, Chem. Soc. Rev. 2010, 39, 4504-4523. [22] I. V. J. Archer, Tetrahedron 1997, 53, 15617-15662. [8] C. Pandya, J. D. Farelli, D. Dunaway- Mariano, K. N. Allen, J Biol Chem 2014, [23] K. Faber, M. Mischitz, W. Kroutil, Acta 289, 30229-30236. Chem. Scand. 1996, 50, 249-258. [9] H. Jochens, K. Stiba, C. Savile, R. Fujii, J. [24] E. Racker, Mol. Cell. Biochem. 1974, 5, G. Yu, T. Gerassenkov, R. J. Kazlauskas, 17-23. U. T. Bornscheuer, Angew. Chem. Int. [25] F. Pries, J. Kingma, D. B. Janssen, FEBS Ed. 2009, 48, 3532-3535. Lett. 1995, 358, 171-174. [10] S. K. Padhi, R. Fujii, G. A. Legatt, S. L. [26] E. Negelein, H. J. Wulff, Biochem. Z. Fossum, R. Berchtold, R. J. Kazlauskas, 1937, 239, 352-389. Chem. Biol. 2010, 17, 863-871. [27] C. Walsh, Nature 2001, 409, 226-231. [11] E. Chovancova, J. Kosinski, J. M. Bujnicki, J. Damborsky, Proteins 2007, [28] A. Schmid, J. S. Dordick, B. Hauer, A. 67, 305-316. Kiener, M. Wubbolts, B. Witholt, Nature 2001, 409, 258-268. [12] T. Koudelakova, E. Chovancova, J. Brezovsky, M. Monincova, A. Fortova, J. [29] C. Wandrey, A. Liese, D. Kihumbu, Org. Jarkovsky, J. Damborsky, Biochem. J. Process Res. Dev. 2000, 4, 286-290. 2011, 435, 345-354. [30] J. L. Porter, R. A. Rusli, D. L. Ollis, [13] T. Koudelakova, S. Bidmanova, P. ChemBioChem 2016, 17, 197-203. Dvorak, A. Pavelka, R. Chaloupkova, Z. [31] B. M. Nestl, B. A. Nebel, B. Hauer, Curr. Prokop, J. Damborsky, Biotechnol. J. Opin. Chem. Biol. 2011, 15, 187-193. 2013, 8, 32-45. [32] U. T. Bornscheuer, G. W. Huisman, R. J. [14] L. Daniel, T. Buryska, Z. Prokop, J. Kazlauskas, S. Lutz, J. C. Moore, K. Damborsky, J. Brezovsky, J. Chem. Inf. Robins, Nature 2012, 485, 185-194. Model. 2015, 55, 54-62. [33] M. P. Deonarain, A. A. Epenetos, Br. J. [15] B. A. Amrein, P. Bauer, F. Duarte, A. Cancer 1994, 70, 786-794. Janfalk Carlsson, A. Naworyta, S. L. Chapter three|Part I: References 190

[34] N. Verma, K. Kumar, G. Kaur, S. Anand, [54] M. Ostermeier, A. E. Nixon, S. J. Crit. Rev. Biotechnol. 2007, 27, 45-62. Benkovic, Bioorg. Med. Chem. 1999, 7, 2139-2144. [35] R. Pieters, S. P. Hunger, J. Boos, C. Rizzari, L. Silverman, A. Baruchel, N. [55] V. Sieber, C. A. Martinez, F. H. Arnold, Goekbuget, M. Schrappe, C.-H. Pui, Nat. Biotech. 2001, 19, 456-460. Cancer 2011, 117, 238-249. [56] J. K. Song, B. Chung, Y. H. Oh, J. S. Rhee, [36] S. Luetz, L. Giver, J. Lalonde, Biotechnol. Appl. Environ. Microbiol. 2002, 68, Bioeng. 2008, 101, 647-653. 6146-6151. [37] F. Cheng, L. L. Zhu, U. Schwaneberg, [57] Y. Kawarasaki, K. E. Griswold, J. D. Chem. Commun. 2015, 51, 9760-9772. Stevenson, T. Selzer, S. J. Benkovic, B. L. Iverson, G. Georgiou, Nucleic Acids Res. [38] T. Davids, M. Schmidt, D. Bottcher, U. T. 2003, 31, e126. Bornscheuer, Curr. Opin. Chem. Biol. 2013, 17, 215-220. [58] A. Ikeuchi, Y. Kawarasaki, T. Shinbata, T. Yamane, Biotechnol. Prog. 2003, 19, [39] Y. Nov, Appl. Environ. Microbiol. 2012, 1460-1467. 78, 258-262. [59] M. Z. Li, S. J. Elledge, Nat. Genet. 2005, [40] Y. Nov, PLoS One 2013, 8, e68069. 37, 311-319. [41] M. S. Packer, D. R. Liu, Nat. Rev. Genet. [60] M. Lehmann, L. Pasamontes, S. F. 2015, 16, 379-394. Lassen, M. Wyss, Biochim. Biophys. [42] S. Lutz, W. M. Patrick, Curr. Opin. Acta 2000, 1543, 408-415. Biotechnol. 2004, 15, 291-297. [61] C. A. Voigt, C. Martinez, Z.-G. Wang, S. [43] L. A. Weymouth, L. A. Loeb, Proc. Natl. L. Mayo, F. H. Arnold, Nat. Struct. Mol. Acad. Sci. USA 1978, 75, 1924-1928. Biol. 2002, 9, 553-558. [44] R. C. Cadwell, G. F. Joyce, Genome Res. [62] A. Herman, D. S. Tawfik, Protein Eng. 1992, 2, 28-33. Des. Sel. 2007, 20, 219-226. [45] H. Murakami, T. Hohsaka, M. Sisido, [63] R. K. Kuipers, H. J. Joosten, W. J. van Nat. Biotechnol. 2002, 20, 76-81. Berkel, N. G. Leferink, E. Rooijen, E. Ittmann, F. van Zimmeren, H. Jochens, [46] T. S. Wong, K. L. Tee, B. Hauer, U. U. Bornscheuer, G. Vriend, V. A. dos Schwaneberg, Nucleic Acids Res. 2004, Santos, P. J. Schaap, Proteins 2010, 78, 32, e26-e26. 2101-2113. [47] P. L. Bergquist, R. A. Reeves, M. D. [64] F. Chen, E. A. Gaucher, N. A. Leal, D. Gibbs, Biomol. Eng. 2005, 22, 63-72. Hutter, S. A. Havemann, S. [48] M. D. Hughes, D. A. Nagel, A. F. Santos, Govindarajan, E. A. Ortlund, S. A. A. J. Sutherland, A. V. Hine, J. Mol. Biol. Benner, Proc. Natl. Acad. Sci. USA 2010, 2003, 331, 973-979. 107, 1948-1953. [49] S. Chopra, A. Ranganathan, Chem. Biol. [65] F. Richter, A. Leaver-Fay, S. D. Khare, S. 2003, 10, 917-926. Bjelic, D. Baker, PLoS One 2011, 6, e19230. [50] M. T. Reetz, M. Bocola, J. D. Carballeira, D. X. Zha, A. Vogel, Angew. Chem. Int. [66] R. Das, D. Baker, Ann. Rev. Biochem. Ed. 2005, 44, 4192-4196. 2008, 77, 363-382. [51] M. T. Reetz, J. D. Carballeira, Nat. [67] H. J. Wijma, R. J. Floor, P. A. Jekel, D. Protoc. 2007, 2, 891-903. Baker, S. J. Marrink, D. B. Janssen, Protein Eng. Des. Sel. 2014, 27, 49-58. [52] S. N. Ho, H. D. Hunt, R. M. Horton, J. K. Pullen, L. R. Pease, Gene 1989, 77, 51- [68] T. A. Addington, R. W. Mertz, J. B. 59. Siegel, J. M. Thompson, A. J. Fisher, V. Filkov, N. M. Fleischman, A. A. Suen, C. [53] R. M. Horton, H. D. Hunt, S. N. Ho, J. K. S. Zhang, M. D. Toney, J. Mol. Biol. Pullen, L. R. Pease, Gene 1989, 77, 61- 2013, 425, 1378-1389. 68. Chapter three|Part I: References 191

[69] C. Bokel, Methods Mol. Biol. 2008, 420, [87] D. W. Romanini, P. Peralta-Yahya, V. 119-138. Mondol, V. W. Cornish, ACS Synthetic Biology 2012, 1, 602-609. [70] R. Haerlin, R. Sussmuth, F. Lingens, FEBS Lett. 1970, 9, 175-176. [88] A. Greener, M. Callahan, B. Jerpseth, Mol. Biotechnol., 7, 189-195. [71] H. Nishioka, Mutat. Res. 1975, 31, 185- 189. [89] O. Selifonova, F. Valle, V. Schellenberger, Appl. Environ. [72] E. Freese, J. Mol. Biol. 1959, 1, 87-105. Microbiol. 2001, 67, 3645-3649. [73] H. J. Muller, Science 1927, 66, 84-87. [90] K. M. Esvelt, J. C. Carlson, D. R. Liu, [74] E. M. Witkin, Bacteriol. Rev. 1976, 40, Nature 2011, 472, 499-503. 869-907. [91] D. L. Alexander, J. Lilly, J. Hernandez, J. [75] W. M. Coco, L. P. Encell, W. E. Levinson, Romsdahl, C. J. Troll, M. Camps, M. J. Crist, A. K. Loomis, L. L. Licato, J. J. Methods Mol. Biol. 2014, 1179, 31-44. Arensdorf, N. Sica, P. T. Pienkos, D. J. [92] A. Ravikumar, A. Arrieta, C. C. Liu, Nat. Monticello, Nat. Biotech. 2002, 20, Chem. Biol. 2014, 10, 175-177. 1246-1250. [93] H. Jochens, U. T. Bornscheuer, [76] W. P. Stemmer, Proc. Natl. Acad. Sci. ChemBioChem 2010, 11, 1861-1866. USA 1994, 91, 10747-10751. [94] S. A. Combs, S. L. DeLuca, S. H. DeLuca, [77] A. Crameri, S.-A. Raillard, E. Bermudez, G. H. Lemmon, D. P. Nannemann, E. D. W. P. C. Stemmer, Nature 1998, 391, Nguyen, J. R. Willis, J. H. Sheehan, J. 288-291. Meiler, Nat. Protoc. 2013, 8, 1277- [78] J. E. Ness, S. Kim, A. Gottman, R. Pak, A. 1298. Krebber, T. V. Borchert, S. [95] D. Rothlisberger, O. Khersonsky, A. M. Govindarajan, E. C. Mundorff, J. Wollacott, L. Jiang, J. DeChancie, J. Minshull, Nat. Biotech. 2002, 20, 1251- Betker, J. L. Gallaher, E. A. Althoff, A. 1255. Zanghellini, O. Dym, S. Albeck, K. N. [79] A. Crameri, W. P. C. Stemmer, Houk, D. S. Tawfik, D. Baker, Nature BioTechniques 1995, 18, 194-196. 2008, 453, 190-195. [80] D. Zha, A. Eipper, M. T. Reetz, [96] L. Jiang, E. A. Althoff, F. R. Clemente, L. ChemBioChem 2003, 4, 34-39. Doyle, D. Röthlisberger, A. Zanghellini, J. L. Gallaher, J. L. Betker, F. Tanaka, C. [81] S. H. Lee, E. J. Ryu, M. J. Kang, E.-S. F. Barbas, D. Hilvert, K. N. Houk, B. L. Wang, Z. Piao, Y. J. Choi, K. H. Jung, J. Y. Stoddard, D. Baker, Science 2008, 319, J. Jeon, Y. C. Shin, J. Mol. Catal. B: 1387-1391. Enzym. 2003, 26, 119-129. [97] X. Garrabou, T. Beck, D. Hilvert, Angew. [82] K. Hiraga, F. H. Arnold, J. Mol. Biol. Chem. Int. Ed. 2015, 54, 5609-5612. 2003, 330, 287-296. [98] H. Kries, R. Blomberg, D. Hilvert, Curr. [83] P. E. O'Maille, M. Bakhtina, M.-D. Tsai, Opin. Chem. Biol. 2013, 17, 221-228. J. Mol. Biol. 2002, 321, 677-691. [99] B. A. Smith, M. H. Hecht, Curr. Opin. [84] H. Zhao, L. Giver, Z. Shao, J. A. Affholter, Chem. Biol. 2011, 15, 421-426. F. H. Arnold, Nat. Biotech. 1998, 16, 258-261. [100] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, [85] W. M. Coco, W. E. Levinson, M. J. Crist, D. J. Lipman, Nucleic Acids Res. 1997, H. J. Hektor, A. Darzins, P. T. Pienkos, C. 25, 3389-3402. H. Squires, D. J. Monticello, Nat. Biotech. 2001, 19, 354-359. [101] S. R. Eddy, Genome Inform. 2009, 23, 205-211. [86] K. M. Müller, S. C. Stebel, S. Knall, G. Zipf, H. S. Bernauer, K. M. Arndt, [102] M. Remmert, A. Biegert, A. Hauser, J. Nucleic Acids Res. 2005, 33, e117. Soding, Nat. Meth. 2012, 9, 173-175. Chapter three|Part I: References 192

[103] H. A. Iqbal, Z. Y. Feng, S. F. Brady, Curr. [120] E. B. Lewis, Cold Spring Harb. Symp. Opin. Chem. Biol. 2012, 16, 109-116. Quant. Biol. 1951, 16, 159-174. [104] D. Aerts, T. Verhaeghe, H. J. Joosten, G. [121] N. H. Horowitz, Proc. Natl. Acad. Sci. Vriend, W. Soetaert, T. Desmet, USA 1945, 31, 153-157. Biotechnol. Bioeng. 2013, 110, 2563- [122] A. I. Oparin, S. Morgulis, The Origin of 2572. Life, Macmillan, New York, 1938. [105] Y. Semba, M. Ishida, S. Yokobori, A. [123] H. Hartman, J. Mol. Evol. 1975, 4, 359- Yamagishi, Protein Eng. Des. Sel. 2015, 370. 28, 221-230. [124] R. A. Jensen, Ann. Rev. Microbiol. 1976, [106] E. Krieger, G. Koraimann, G. Vriend, 30, 409-425. Proteins 2002, 47, 393-402. [125] R. A. Jensen, D. L. Pierson, Nature 1975, [107] H. Jochens, D. Aerts, U. T. Bornscheuer, 254, 667-671. Protein Eng. Des. Sel. 2010, 23, 903- 909. [126] D. J. Cove, Nature 1974, 251, 256-256. [108] T. Verhaeghe, M. Diricks, D. Aerts, W. [127] A. M. Velasco, J. I. Leguina, A. Lazcano, Soetaert, T. Desmet, J. Mol. Catal. B: J. Mol. Evol. 2002, 55, 445-459. Enzym. 2013, 96, 81-88. [128] S. Ohno, Evolution by Gene Duplication, [109] L. Skalden, C. Peters, L. Ratz, U. T. Springer, London, 1970. Bornscheuer, Tetrahedron. [129] A. L. Hughes, Proc. Biol. Sci. 1994, 256, [110] A. Nobili, Y. Tao, I. V. Pavlidis, T. van 119-124. den Bergh, H.-J. Joosten, T. Tan, U. T. [130] U. Bergthorsson, D. I. Andersson, J. R. Bornscheuer, ChemBioChem 2015, 16, Roth, Proc. Natl. Acad. Sci. USA 2007, 805-810. 104, 17004-17009. [111] A. Nobili, M. G. Gall, I. V. Pavlidis, M. L. [131] A. Force, M. Lynch, F. B. Pickett, A. Thompson, M. Schmidt, U. T. Amores, Y.-L. Yan, J. Postlethwait, Bornscheuer, FEBS J 2013, 280, 3084- Genetics 1999, 151, 1531-1545. 3093. [132] J. Näsvall, L. Sun, J. R. Roth, D. I. [112] J. Damborsky, J. Brezovsky, Curr. Opin. Andersson, Science 2012, 338, 384-387. Chem. Biol. 2014, 19, 8-16. [133] P. Gatti-Lafranconi, F. Hollfelder, [113] F. Rago, D. Saltzberg, K. N. Allen, D. R. ChemBioChem 2013, 14, 285-292. Tolan, J. Am. Chem. Soc. 2015, 137, 13876-13886. [134] S. Ohno, Evolution by Gene Duplication, Springer, Berlin, 1970. [114] S. D. Brown, P. C. Babbitt, J. Biol. Chem. 2014, 289, 30221-30228. [135] G. J. Poelarends, V. P. Veetil, C. P. Whitman, Cell. Mol. Life Sci. 2008, 65, [115] K. Chen, F. H. Arnold, Proc. Natl. Acad. 3606-3618. Sci. USA 1993, 90, 5618-5622. [136] G. J. Poelarends, C. P. Whitman, Bioorg. [116] S. Lamare, M.-D. Legoy, M. Graber, Chem. 2004, 32, 376-392. Green Chem. 2004, 6, 445-458. [137] G. J. Poelarends, M. Wilkens, M. J. [117] S. Hackenschmidt, E. J. Moldenhauer, Larkin, J. D. van Elsas, D. B. Janssen, G. A. Behrens, M. Gand, I. V. Pavlidis, U. Appl. Environ. Microbiol. 1998, 64, T. Bornscheuer, ChemCatChem 2014, 6, 2931-2936. 1015-1020. [138] J. A. Peliska, M. H. O'Leary, [118] S. Bencharit, C. C. Edwards, C. L. Biochemistry 1989, 28, 1604-1611. Morton, E. L. Howard-Williams, P. Kuhn, P. M. Potter, M. R. Redinbo, J. Mol. Biol. [139] P. J. O'Brien, D. Herschlag, J. Am. Chem. 2006, 363, 201-214. Soc. 1998, 120, 12369-12370. [119] I. Nobeli, A. D. Favia, J. M. Thornton, [140] R. L. Van Etten, P. P. Waymack, D. M. Nat. Biotech. 2009, 27, 157-167. Rehkop, J. Am. Chem. Soc. 1974, 96, 6782-6785. Chapter three|Part I: References 193

[141] A. Messerschmidt, L. Prade, R. Wever, [159] M. A. Dwyer, L. L. Looger, H. W. Biol. Chem. 1997, 378, 309-315. Hellinga, Science 2004, 304, 1967-1971. [142] C. Parsot, EMBO J. 1986, 5, 3013-3019. [160] T. Lee, Y. Ahn, Bull. Korean Chem. Soc. 2002, 23, 1490-1492. [143] C. Parsot, Proc. Natl. Acad. Sci. USA 1987, 84, 5207-5210. [161] X. W. Feng, C. Li, N. Wang, K. Li, W. W. Zhang, Z. Wang, X. Q. Yu, Green Chem. [144] I. Schildkraut, S. B. Greer, J. Bacteriol. 2009, 11, 1933-1936. 1973, 115, 777-785. [162] A. S. Evitt, U. T. Bornscheuer, Green [145] M. T. Skarstedt, S. B. Greer, J. Biol. Chem. 2011, 13, 1141-1142. Chem. 1973, 248, 1032-1044. [163] G. Torrelo, J. F. Jin, U. Hanefeld, [146] F. W. Alexander, E. Sandmeier, P. K. ChemCatChem 2013, 5, 1304-1307. Mehta, P. Christen, Eur. J. Biochem. 1994, 219, 953-960. [164] K. L. Morley, R. J. Kazlauskas, Trends Biotechnol. 2005, 23, 231-237. [147] F. P. Seebeck, D. Hilvert, J. Am. Chem. Soc. 2003, 125, 10158-10159. [165] N. Tokuriki, D. S. Tawfik, Curr. Opin. Struct. Biol. 2009, 19, 596-604. [148] R. Graber, P. Kasper, V. N. Malashkevich, E. Sandmeier, P. Berger, [166] K. Buchholz, V. Kasche, U. T. H. Gehring, J. N. Jansonius, P. Christen, Bornscheuer, Biocatalysts and Enzyme Eur. J. Biochem. 1995, 232, 686-690. Technology, Wiley-VCH, Weinheim, 2005. [149] R. A. Vacca, S. Giannattasio, R. Graber, E. Sandmeier, E. Marra, P. Christen, J. [167] T. Katsuki, J. Mol. Catal. A: Chem. 1996, Biol. Chem. 1997, 272, 21932-21937. 113, 87-107. [150] C. Branneby, P. Carlqvist, K. Hult, T. [168] T. Katsuki, V. Martin, in Organic Brinck, P. Berglund, J. Mol. Catal. B: Reactions, John Wiley & Sons, Inc., Enzym. 2004, 31, 123-128. 2004. [151] P. Carlqvist, M. Svedendahl, C. [169] T. Katsuki, K. B. Sharpless, J. Am. Chem. Branneby, K. Hult, T. Brinck, P. Soc. 1980, 102, 5974-5976. Berglund, ChemBioChem 2005, 6, 331- [170] T. Linker, Angew. Chem. Int. Ed. 1997, 336. 36, 2060-2062. [152] J. Collot, J. Gradinaru, N. Humbert, M. [171] M. Schober, K. Faber, Trends Skander, A. Zocchi, T. R. Ward, J. Am. Biotechnol. 2013, 31, 468-478. Chem. Soc. 2003, 125, 9030-9031. [172] S. K. Wu, A. T. Li, Y. S. Chin, Z. Li, ACS [153] J. R. Carey, S. K. Ma, T. D. Pfister, D. K. Catal. 2013, 3, 752-759. Garner, H. K. Kim, J. A. Abramite, Z.-G. Wang, Z. Guo, Y. Lu, J. Am. Chem. Soc. [173] H. Waldmann, in Enzyme Catalysis in 2004, 126, 10812-10813. Organic Synthesis (Eds.: K. Drauz, H. Waldmann), Wiley-VCH, Weinheim, [154] J. Guo, W. Huang, G. W. Zhou, R. J. 1995, pp. 271-278. Fletterick, T. S. Scanlan, Proc. Natl. Acad. Sci. USA 1995, 92, 1694-1698. [174] U. T. Bornscheuer, R. J. Kazlauskas, Hydrolases in Organic Synthesis, 2nd [155] J. D. Stewart, S. J. Benkovic, Nature ed., Wiley-VCH, Weinheim, 2006. 1995, 375, 388-391. [175] M. Mischitz, K. Faber, Synlett 1996, 978. [156] B. J. Baas, E. Zandvoort, E. M. Geertsema, G. J. Poelarends, [176] I. V. J. Archer, D. J. Leak, D. A. ChemBioChem 2013, 14, 917-926. Widdowson, Tetrahedron Lett. 1996, 37, 8819-8822. [157] M. T. Reetz, M. Rentzsch, A. Pletsch, M. Maywald, Chimia 2002, 56, 721-723. [177] K. Faber, Biotransformations in organic chemistry, 6th ed., Springer-Verlag, [158] M. A. Dwyer, L. L. Looger, H. W. New York, 2011. Hellinga, Science 2008, 319, 569-569. Chapter three|Part I: References 194

[178] T. Watabe, S. Suzuki, Biochem. Biophys. [193] A. Thomaeus, J. Carlsson, J. Aqvist, M. Res. Commun. 1972, 46, 1120. Widersten, Biochemistry 2007, 46, 2466-2479. [179] S. Karboune, L. Amourache, H. Nellaiah, C. Morisseau, J. Baratti, Biotechnol. [194] A. Thomaeus, A. Naworyta, S. L. Lett. 2001, 23, 1633-1639. Mowbray, M. Widersten, Protein Sci. 2008, 17, 1275-1284. [180] W. Kroutil, R. V. A. Orru, K. Faber, Biotechnol. Lett. 1998, 20, 373-377. [195] B. van Loo, H. P. Permentier, J. Kingma, H. Baldascini, D. B. Janssen, FEBS Lett. [181] L. Cao, Carrier-bound Immobilized 2008, 582, 1581-1586. Enzymes: Principles, Applications and Design, Wiley-VCH, Weinheim, 2005. [196] L. Y. Rui, L. Cao, W. Chen, K. F. Reardon, T. K. Wood, Appl. Environ. Microbiol. [182] M. Mischitz, K. Faber, Tetrahedron Lett. 2005, 71, 3995-4003. 1994, 35, 81-84. [197] S. L. Mowbray, L. T. Elfstrom, K. M. [183] M. Nardini, I. S. Ridder, H. J. Rozeboom, Ahlgren, C. E. Andersson, M. Widersten, K. H. Kalk, R. Rink, D. B. Janssen, B. W. Protein Sci. 2006, 15, 1628-1637. Dijkstra, J. Biol. Chem. 1999, 274, 14579-14586. [198] G. A. Gomez, C. Morisseau, B. D. Hammock, D. W. Christianson, [184] M. Arand, D. F. Grant, J. K. Beetham, T. Biochemistry 2004, 43, 4716-4723. Friedberg, F. Oesch, B. D. Hammock, FEBS Lett. 1994, 338, 251-256. [199] C. D. Bahl, D. P. MacEachran, G. A. O'Toole, D. R. Madden, Acta [185] M. Nardini, R. Rink, D. B. Janssen, B. W. Crystallogr., Sect. F: Struct. Biol. Cryst. Dijkstra, J. Mol. Catal. B: Enzym. 2001, Commun. 2010, 66, 26-28. 11, 1035-1042. [200] C. D. Bahl, C. Morisseau, J. M. [186] T. Yamada, C. Morisseau, J. E. Maxwell, Bomberger, B. A. Stanton, B. D. M. A. Argiriadi, D. W. Christianson, B. D. Hammock, G. A. O'Toole, D. R. Madden, Hammock, J. Biol. Chem. 2000, 275, J. Bacteriol. 2010, 192, 1785-1795. 23082-23088. [201] R. Rink, M. Fennema, M. Smids, U. [187] M. Arand, B. M. Hallberg, J. Y. Zou, T. Dehmel, D. B. Janssen, J. Biol. Chem. Bergfors, F. Oesch, M. J. van der Werf, 1997, 272, 14650-14657. J. A. M. de Bont, T. A. Jones, S. L. Mowbray, EMBO J. 2003, 22, 2583- [202] M. Decker, M. Arand, A. Cronin, Arch. 2592. Toxicol. 2009, 83, 297-318. [188] B. van Loo, J. H. L. Spelberg, J. Kingma, [203] J. Meijer, J. W. Depierre, Chem. Biol. T. Sonke, M. G. Wubbolts, D. B. Janssen, Interact. 1988, 64, 207-249. Chem. Biol. 2004, 11, 981-990. [204] T. Shimada, Drug Metab. Pharmacokin. [189] C. D. Bahl, D. P. MacEachran, G. A. 2006, 21, 257-276. O'Toole, D. R. Madden, Acta [205] F. P. Guengerich, Chem. Res. Toxicol. Crystallogr., Sect. F: Struct. Biol. Cryst. 2008, 21, 70-83. Commun. 2010, 66, 26-28. [206] F. P. Guengerich, Chem. Res. Toxicol. [190] C. D. Bahl, C. Morisseau, J. M. 2006, 19, 1679-1679. Bomberger, B. A. Stanton, B. D. Hammock, G. A. O'Toole, D. R. Madden, [207] A. N. Simpkins, R. D. Rudic, D. A. J. Bacteriol. 2010, 192, 1785-1795. Schreihofer, S. Roy, M. Manhiani, H. J. Tsai, B. D. Hammock, J. D. Imig, Am. J. [191] R. Rink, J. Kingma, J. H. L. Spelberg, D. B. Pathol. 2009, 174, 2086-2095. Janssen, Biochemistry 2000, 39, 5600- 5613. [208] C. J. Sinal, M. Miyata, M. Tohkin, K. Nagata, J. R. Bend, F. J. Gonzalez, J. Biol. [192] K. H. Hopmann, F. Himo, J. Phys. Chem. Chem. 2000, 275, 40504-40510. B 2006, 110, 21299-21310. [209] W. R. Zhang, I. P. Koerner, R. Noppens, M. Grafe, H. J. Tsai, C. Morisseau, A. Luria, B. D. Hammock, J. R. Falck, N. J. Chapter three|Part I: References 195

Alkayed, J. Cereb. Blood Flow Metab. [226] A. Butler, M. Sandy, Nature 2009, 460, 2007, 27, 1931-1940. 848-854. [210] S. Ye, D. P. MacEachran, J. W. Hamilton, [227] K. T. Fielman, S. A. Woodin, D. E. G. A. O'Toole, B. A. Stanton, Am. J. Lincoln, Environ. Toxicol. Chem. 2001, Physiol. Cell Physiol. 2008, 295, C807- 20, 738-747. C818. [228] W. Vetter, Revi. Enviro. Contam. [211] F. Badalassi, D. Wahler, G. Klein, P. Toxicol. 2006, 188, 1-57. Crotti, J. L. Reymond, Angew. Chem. Int. [229] A. Jesenska, M. Monincova, T. Ed. 2000, 39, 4067-4070. Koudelakova, K. Hasan, R. Chaloupkova, [212] D. Wahler, R. Jean-Louis, Angew. Chem. Z. Prokop, A. Geerlof, J. Damborsky, Int. Ed. 2002, 41, 1229-1232. Appl. Environ. Microbiol. 2009, 75, 5157-5160. [213] K. Doderer, R. D. Schmid, Biotechnol. Lett. 2004, 26, 835-839. [230] D. B. Janssen, J. P. Schanstra, Curr. Opin. Biotechnol. 1994, 5, 253-259. [214] M. Kotik, W. Zhao, G. Lacazio, A. Archelas, J. Mol. Catal. B: Enzym. 2013, [231] B. E. Jugder, H. Ertan, M. Lee, M. 91, 44-51. Manefield, C. P. Marquis, Trends Biotechnol. 2015, 33, 595-610. [215] V. S. Fluxa, D. Wahler, J. L. Reymond, Nat. Protoc. 2008, 3, 1270-1277. [232] X. B. Su, L. Y. Deng, K. F. Kong, J. S. H. Tsang, Biotechnol. Bioeng. 2013, 110, [216] J. H. L. Spelberg, L. X. Tang, R. M. 2687-2696. Kellogg, D. B. Janssen, Tetrahedron: Asymmetry 2004, 15, 1095-1102. [233] L. X. Tang, J. H. L. Spelberg, M. W. Fraaije, D. B. Janssen, Biochemistry [217] J. H. L. Spelberg, L. X. Tang, M. van 2003, 42, 5378-5386. Gelder, R. M. Kellogg, D. B. Janssen, Tetrahedron: Asymmetry 2002, 13, [234] D. B. Janssen, J. E. Oppentocht, G. J. 1083-1089. Poelarends, Curr. Opin. Biotechnol. 2001, 12, 254-258. [218] S. Panke, M. Wubbolts, Curr. Opin. Chem. Biol. 2005, 9, 188-194. [235] D. B. Janssen, I. J. T. Dinkla, G. J. Poelarends, P. Terpstra, Environ.l [219] K. Faber, W. Kroutil, Curr. Opin. Chem. Microbiol. 2005, 7, 1868-1882. Biol. 2005, 9, 181-187. [236] P. Dvorak, S. Bidmanova, J. Damborsky, [220] Z. Prokop, F. Oplustil, J. DeFrank, J. Z. Prokop, Environ. Sci. Technol. 2014, Damborsky, Biotechnol. J. 2006, 1, 48, 6859-6866. 1370-1380. [237] N. P. Kurumbang, P. Dvorak, J. Bendl, J. [221] Y. Nagata, Y. Ohtsubo, M. Tsuda, Appl. Brezovsky, Z. Prokop, J. Damborsky, ACS Microbiol. Biotechnol. 2015, 99, 9865- Synth. Biol. 2014, 3, 172-181. 9881. [238] M. Lahoda, J. R. Mesters, A. [222] K.-H. van Pée, in Enzyme Catalysis in Stsiapanava, R. Chaloupkova, M. Kuty, J. Organic Synthesis, Vol. 3, 3rd ed. (Eds.: Damborsky, I. K. Smatanova, Acta K. Drauz, H. Gröger, O. May), Wiley- Crystallogr., Sect. D: Biol. Crystallogr. VCH, Weinheim, 2012, pp. 1569-1584. 2014, 70, 209-217. [223] G. W. Gribble, Chemosphere 2003, 52, [239] M. Lahoda, R. Chaloupkova, A. 289-297. Stsiapanava, J. Damborsky, I. K. [224] A. Butler, Curr. Opin. Chem. Biol. 1998, Smatanova, Acta Crystallogr., Sect. F: 2, 279-285. Struct. Biol. Commun. 2011, 67, 397- 400. [225] C. Leblanc, H. Vilter, J. B. Fournier, L. Delage, P. Potin, E. Rebuffet, G. Michel, [240] G. Samin, M. Pavlova, M. I. Arif, C. P. P. L. Solari, M. C. Feiters, M. Czjzek, Postema, J. Damborsky, D. B. Janssen, Coord. Chem. Rev. 2015, 301–302, 134- Appl. Environ. Microbiol. 2014, 80, 146. 5467-5476. Chapter three|Part I: References 196

[241] G. L. Mena-Benitez, F. Gandia-Herrero, [256] J. P. Schanstra, D. B. Janssen, S. Graham, T. R. Larson, S. J. McQueen- Biochemistry 1996, 35, 5624-5632. Mason, C. E. French, E. L. Rylott, N. C. [257] Z. Prokop, M. Monincova, R. Bruce, Plant Physiol. 2008, 147, 1192- Chaloupkova, M. Klvana, Y. Nagata, D. 1198. B. Janssen, J. Damborsky, J. Biol. Chem. [242] H. Naested, M. Fennema, L. Hao, M. 2003, 278, 45094-45100. Andersen, D. B. Janssen, J. Mundy, [258] G. H. Krooshof, R. Floris, A. W. J. W. Plant J. 1999, 18, 571-576. Tepper, D. B. Janssen, Protein Sci. 1999, [243] E. Uchida, T. Ouchi, Y. Suzuki, T. 8, 355-360. Yoshida, H. Habe, I. Yamaguchi, T. [259] J. Hladilkova, Z. Prokop, R. Omori, H. Nojiri, Environ. Sci. Technol. Chaloupkova, J. Damborsky, P. 2005, 39, 7671-7677. Jungwirth, J. Phys. Chem. B 2013, 117, [244] G. Stucki, M. Thuer, Appl. Microbiol. 14329-14335. Biotechnol. 1994, 42, 167-172. [260] K. H. G. Verschueren, J. Kingma, H. J. [245] G. Stucki, M. Thuer, Environ. Sci. Rozeboom, K. H. Kalk, D. B. Janssen, B. Technol. 1995, 29, 2339-2345. W. Dijkstra, Biochemistry 1993, 32, 9031-9037. [246] S. Fetzner, Appl. Microbiol. Biotechnol. 1998, 50, 633-657. [261] K. Kahn, T. C. Bruice, J. Phys. Chem. B 2003, 107, 6876-6885. [247] N. Kasai, K. Sakaguchi, Tetrahedron Lett. 1992, 33, 1211-1212. [262] F. Pries, J. Kingma, G. H. Krooshof, C. M. Jeronimus-Stratingh, A. P. Bruins, D. B. [248] J. G. van Leeuwen, H. J. Wijma, R. J. Janssen, J. Biol. Chem. 1995, 270, Floor, J. M. van der Laan, D. B. Janssen, 10405-11411. ChemBioChem 2012, 13, 137-148. [263] C. Kennes, F. Pries, G. H. Krooshof, E. [249] D. B. Janssen, F. Pries, J. Vanderploeg, Bokma, J. Kingma, D. B. Janssen, Eur. J. B. Kazemier, P. Terpstra, B. Witholt, J. Biochem. 1995, 228, 403-407. Bacteriol. 1989, 171, 6791-6799. [264] J. Damborsky, R. Chaloupkova, M. [250] J. P. Schanstra, R. Rink, F. Pries, D. B. Pavlova, E. Chovancova, J. Brezovsky, in Janssen, Protein Expr. Purif. 1993, 4, Handbook of Hydrocarbon and Lipid 479-489. Microbiology (Ed.: K. N. Timmis), [251] S. M. Franken, H. J. Rozeboom, K. H. Springer, Berlin, 2010, pp. 1081-1098. Kalk, B. W. Dijkstra, EMBO J. 1991, 10, [265] J. Marek, J. Vevodova, I. K. Smatanova, 1297-1302. Y. Nagata, L. A. Svensson, J. Newman, [252] J. Damborsky, J. Koca, Protein Eng. M. Takagi, J. Damborsky, Biochemistry 1999, 12, 989-998. 2000, 39, 14082-14086. [253] M. Klvana, M. Pavlova, T. Koudelakova, [266] J. Newman, T. S. Peat, R. Richard, L. R. Chaloupkova, P. Dvorak, Z. Prokop, A. Kan, P. E. Swanson, J. A. Affholter, I. H. Stsiapanava, M. Kuty, I. Kuta- Holmes, J. F. Schindler, C. J. Unkefer, T. Smatanova, J. Dohnalek, P. Kulhanek, R. C. Terwilliger, Biochemistry 1999, 38, C. Wade, J. Damborsky, J. Mol. Biol. 16105-16114. 2009, 392, 1339-1356. [267] G. H. Krooshof, E. M. Kwant, J. [254] M. Pavlova, M. Klvana, Z. Prokop, R. Damborsky, J. Koca, D. B. Janssen, Chaloupkova, P. Banas, M. Otyepka, R. Biochemistry 1997, 36, 9571-9580. C. Wade, M. Tsuda, Y. Nagata, J. [268] K. H. G. Verschueren, S. M. Franken, H. Damborsky, Nat. Chem. Biol. 2009, 5, J. Rozeboom, K. H. Kalk, B. W. Dijkstra, 727-733. J. Mol. Biol. 1993, 232, 856-872. [255] T. Bosma, M. G. Pikkemaat, J. Kingma, J. [269] M. G. Pikkemaat, I. S. Ridder, H. J. Dijk, D. B. Janssen, Biochemistry 2003, Rozeboom, K. H. Kalk, B. W. Dijkstra, D. 42, 8047-8053. B. Janssen, Biochemistry 1999, 38, 12052-12061. Chapter three|Part I: References 197

[270] M. Silberstein, J. Damborsky, S. Vajda, Environ. Toxicol. Chem. 2001, 20, 2681- Biochemistry 2007, 46, 9239-9249. 2689. [271] D. B. Janssen, Curr. Opin. Chem. Biol. [283] S. Bjelic, I. Jelesarov, J. Mol. Recognit. 2004, 8, 150-159. 2008, 21, 289-311. [272] R. J. Floor, H. J. Wijma, D. I. Colpa, A. [284] M. Monincova, Z. Prokop, J. Vevodova, Ramos-Silva, P. A. Jekel, W. Szymanski, Y. Nagata, J. Damborsky, Appl. Environ. B. L. Feringa, S. J. Marrink, D. B. Microbiol. 2007, 73, 2005-2008. Janssen, ChemBioChem 2014, 15, 1660- [285] J. Wajzer, C. R. Hebd. Seances Acad. Sci. 1672. 1949, 229, 1270-1272. [273] S. Marvanova, Y. Nagata, M. [286] D. B. Janssen, J. Gerritse, J. Brackman, Wimmerova, J. Sykorova, K. Hynkova, J. C. Kalk, D. Jager, B. Witholt, Eur. J. Damborsky, J. Microbiol. Methods 2001, Biochem. 1988, 171, 67-72. 44, 149-157. [287] S. Keuning, D. B. Janssen, B. Witholt, J. [274] M. Otyepka, J. Damborský, Prot. Sci. Bacteriol. 1985, 163, 635-639. 2002, 11, 1206-1217. [288] M. Dörr, M. P. C. Fibinger, D. Last, S. [275] V. Liskova, D. Bednar, T. Prudnikova, P. Schmidt, J. Santos-Aberturas, C. Vickers, Rezacova, T. Koudelakova, E. M. Voss, U. T. Bornscheuer, Biotechnol. Sebestova, I. K. Smatanova, J. Bioeng. 2016. Brezovsky, R. Chaloupkova, J. Damborsky, ChemCatChem 2015, 7, [289] D. Yi, T. Devamani, J. Abdoul-Zabar, F. 648-659. Charmantray, V. Helaine, L. Hecquet, W. D. Fessner, ChemBioChem 2012, 13, [276] Z. Prokop, Y. Sato, J. Brezovsky, T. 2290-2300. Mozga, R. Chaloupkova, T. Koudelakova, P. Jerabek, V. [290] L. E. Janes, A. C. Lowendahl, R. J. Stepankova, R. Natsume, J. G. van Kazlauskas, Chem. Eur. J. 1998, 4, 2324- Leeuwen, D. B. Janssen, J. Florian, Y. 2331. Nagata, T. Senda, J. Damborsky, Angew. [291] P. Holloway, J. T. Trevors, H. Lee, J. Chem. Int. Ed. 2010, 49, 6111-6115. Microbiol. Methods 1997, 32, 31-36. [277] F. Pries, A. J. Vandenwijngaard, R. Bos, [292] H. Zhao, Methods Mol. Biol. 2003, 230, M. Pentenga, D. B. Janssen, J. Biol. 213-221. Chem. 1994, 269, 17490-17494. [293] T. M. Phillips, A. G. Seech, H. Lee, J. T. [278] A. Fortova, E. Sebestova, V. Trevors, J. Microbiol. Methods 2001, 47, Stepankova, T. Koudelakova, L. Palkova, 181-188. J. Damborsky, R. Chaloupkova, Biochimie 2013, 95, 2091-2096. [294] G. M. Lacourciere, R. N. Armstrong, J. Am. Chem. Soc. 1993, 115, 10466- [279] I. Drienovska, E. Chovancova, T. 10467. Koudelakova, J. Damborsky, R. Chaloupkova, Appl. Environ. Microbiol. [295] M. Nardini, B. W. Dijkstra, Curr. Opin. 2012, 78, 4995-4998. Struct. Biol. 1999, 9, 732-737. [280] K. Tratsiak, O. Degtjarik, I. Drienovska, [296] M. H. J. Jacobs, A. J. Van Den L. Chrast, P. Rezacova, M. Kuty, R. Wijngaard, M. Pentenga, D. B. Janssen, Chaloupkova, J. Damborsky, I. K. Eur. J. Biochem. 1991, 202, 1217-1222. Smatanova, Acta Crystallogr., Sect. F: [297] J. K. Beetham, D. Grant, M. Arand, J. Struct. Biol. Cryst. Commun. 2013, 69, Garbarino, T. Kiyosue, F. Pinot, F. 683-688. Oesch, W. R. Belknap, K. Shinozaki, B. D. [281] W. Y. Chan, M. Wong, J. Guthrie, A. V. Hammock, DNA Cell Biol. 1995, 14, 61- Savchenko, A. F. Yakunin, E. F. Pai, E. A. 71. Edwards, Microb. Biotechnol. 2010, 3, [298] L. Y. Rui, L. Cao, W. Chen, K. F. Reardon, 107-120. T. K. Wood, J. Biol. Chem. 2004, 279, [282] J. Damborsky, E. Rorije, A. Jesenska, Y. 46810-46817. Nagata, G. Klopman, W. J. Peijnenburg, Chapter three|Part I: References 198

[299] D. M. Nedrud, H. Lin, G. Lopez, S. K. [315] N. Sreerama, S. Y. Venyaminov, R. W. Padhi, G. A. Legatt, R. J. Kazlauskas, Woody, Anal. Biochem. 2000, 287, 243- Chem. Sci. 2014, 5, 4265-4277. 251. [300] G. W. Gribble, Naturally occurring [316] N. Sreerama, R. W. Woody, Anal. organohalogen compounds: a Biochem. 2000, 287, 252-260. comprehensive update, Springer-Verlag, [317] L. Whitmore, B. A. Wallace, Wien, 2010. Biopolymers 2008, 89, 392-400. [301] J. Littlechild, Curr. Opin. Chem. Biol. [318] A. Jesenska, M. Pavlova, M. Strouhal, R. 1999, 3, 28-34. Chaloupkova, I. Tesinska, M. [302] A. Rauwerdink, R. J. Kazlauskas, ACS Monincova, Z. Prokop, M. Bartos, I. Catal. 2015, 5, 6153-6176. Pavlik, I. Rychlik, P. Mobius, Y. Nagata, J. Damborsky, Appl. Environ. Microbiol. [303] T. Davids, dissertation thesis, University 2005, 71, 6736-6745. of Greifswald (Greifswald), 2012. [319] R. Chaloupkova, Z. Prokop, Y. Sato, Y. [304] M. P. C. Fibinger, diploma thesis, Nagata, J. Damborsky, FEBS J. 2011, University of Greifswald (Greifswald), 278, 2728-2738. 2012. [320] K. Hynkova, Y. Nagata, M. Takagi, J. [305] G. J. Freiherr von Saß, B. Sc. thesis, Damborsky, FEBS Lett. 1999, 446, 177- University of Greifswald (Greifswald), 181. 2012. [321] L. Whitmore, B. A. Wallace, Nucleic [306] L. Weiher, diploma thesis, University of Acids Res. 2004, 32, W668-W673. Greifswald (Greifswald), 2014. [322] W. Kabsch, C. Sander, Biopolymers [307] C. H. Chang, J. F. Schindler, C. J. 1983, 22, 2577-2637. Unkefer, L. A. Vanderberg, J. R. Brainard, T. C. Terwilliger, Bioorg. Med. [323] H. M. Berman, J. Westbrook, Z. Feng, G. Chem. 1999, 7, 2175-2181. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, Nucleic Acids [308] E. Krieger, T. Darden, S. B. Nabuurs, A. Res. 2000, 28, 235-242. Finkelstein, G. Vriend, Proteins 2004, 57, 678-683. [324] L. Rostock, M. Sc. thesis, University of Greifswald (Greifswald), 2015. [309] PyMol, The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, [325] I. Iwasaki, S. Utsumi, T. Ozawa, Bull. LLC. Chem. Soc. Jpn. 1952, 25, 226-226. [310] E. Chovancova, A. Pavelka, P. Benes, O. [326] J. F. Schindler, P. A. Naranjo, D. A. Strnad, J. Brezovsky, B. Kozlikova, A. Honaberger, C. H. Chang, J. R. Brainard, Gora, V. Sustr, M. Klvana, P. Medek, L. L. A. Vanderberg, C. J. Unkefer, Biedermannova, J. Sochor, J. Biochemistry 1999, 38, 5772-5778. Damborsky, PLoS Comput. Biol. 2012, 8, [327] M. P. C. Fibinger, T. Davids, D. Bottcher, e1002708. U. T. Bornscheuer, Appl. Microbiol. [311] L. Rostock, B. Sc. thesis, University of Biotechnol. 2015, 99, 8955-8962. Greifswald (Greifswald), 2013. [328] V. Stepankova, M. Khabiri, J. Brezovsky, [312] N. Berova, K. Nakanishi, R. W. Woody, A. Pavelka, J. Sykora, M. Amaro, B. Circular Dichroism: Principles and Minofar, Z. Prokop, M. Hof, R. Ettrich, Applications, 2nd ed., Wiley, Oxford, R. Chaloupkova, J. Damborsky, 2000. ChemBioChem 2013, 14, 890-897. [313] G. D. Fasman, Circular Dichroism and [329] J. Haq, Microcal, LLC application note the Conformational Analysis of 2002. Biomolecules, Springer, US, 1996. [330] B. A. Williams, E. J. Toone, J. Org. Chem. [314] N. J. Greenfield, Nat. Protoc. 2006, 1, 1993, 58, 3507-3510. 2876-2890. [331] G. D. Watt, Anal. Biochem. 1990, 187, 141-146. Chapter three|Part I: References 199

[332] G. V. Los, L. P. Encell, M. G. McDougall, [348] A. Zanghellini, L. Jiang, A. M. Wollacott, D. D. Hartzell, N. Karassina, C. Zimprich, G. Cheng, J. Meiler, E. A. Althoff, D. M. G. Wood, R. Learish, R. F. Ohana, M. Rothlisberger, D. Baker, Protein science Urh, D. Simpson, J. Mendez, K. : a publication of the Protein Society Zimmerman, P. Otto, G. Vidugiris, J. 2006, 15, 2785-2794. Zhu, A. Darzins, D. Klaubert, R. F. [349] C. Li, A. Wen, B. Shen, J. Lu, Y. Huang, Y. Bulleit, K. V. Wood, ACS Chem. Biol. Chang, BMC Biotechnol. 2011, 11, 92. 2008, 3, 373-382. [350] N. Greenfield, G. D. Fasman, [333] K. Hasan, A. Gora, J. Brezovsky, R. Biochemistry-Us 1969, 8, 4108-4116. Chaloupkova, H. Moskalikova, A. Fortova, Y. Nagata, J. Damborsky, Z. [351] N. Sreerama, S. Y. Venyaminov, R. W. Prokop, FEBS J. 2013, 280, 3149-3159. Woody, Biophys J 1999, 76, A381-A381. [334] A. de Marco, Microb. Cell Fact. 2009, 8, [352] M. J. Todd, J. Gomez, Anal. Biochem. 26. 2001, 296, 179-187. [335] K. Nishihara, M. Kanemori, M. Kitagawa, H. Yanagi, T. Yura, Appl. Environ. Microbiol. 1998, 64, 1694- 1699. [336] K. Nishihara, M. Kanemori, H. Yanagi, T. Yura, Appl. Environ. Microbiol. 2000, 66, 884-889. [337] J. Santos-Aberturas, M. Dorr, G. S. Waldo, U. T. Bornscheuer, Chem. Biol. 2015, 22, 1406-1414. [338] M. D. Hanwell, D. E. Curtis, D. C. Lonie, T. Vandermeersch, E. Zurek, G. R. Hutchison, J. Cheminform. 2012, 4. [339] J. C. Gordon, J. B. Myers, T. Folta, V. Shoja, L. S. Heath, A. Onufriev, Nucleic Acids Res. 2005, 33, W368-W371. [340] M. F. Sanner, J. Mol. Graph. Model. 1999, 17, 57-61. [341] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew, A. J. Olson, J. Comput. Chem. 1998, 19, 1639-1662. [342] F. J. Solis, R. J. B. Wets, Math. Oper. Res. 1981, 6, 19-30. [343] J. J. P. Stewart, J. Comput. Aid. Mol. Des. 1990, 4, 1-45. [344] J. J. P. Stewart, J. Comput. Chem. 1989, 10, 209-220. [345] J. J. P. Stewart, Int. J. Quantum Chem. 1996, 58, 133-146. [346] M. Cernohorsky, M. Kuty, J. Koca, Comput. Chem. 1997, 21, 35-44. [347] J. Damborsky, M. Prokop, J. Koca, Trends Biochem. Sci. 2001, 26, 71-73.

Nothing in biology makes sense, except in the light of evolution. Theodosius Dobzhansky

203

Chapter 3. Part II

Continuous directed evolution by tunable random mutagenesis of the target gene with an error-prone DNA polymerase I

Mutagenesis is an evolution factor for adaptive evolution under selection pressure. Thus, the specific or random substitution, insertion or deletion of particular nucleobases, sequences, genes, regulatory elements or genome parts can be performed by a variety of methods.[1] For directed evolution in the context of protein engineering, mostly discontinuous methods were described, hampering the natural evolution process. A very elegant method of phage-assisted continuous evolution (PACE) was described recently.[2-3] Therein, phage particles re-infect a constant influx of host cells, equipped with a mutator plasmid, while cells in the efflux were screened for the desired properties. Thus, the mutagenesis was non-targeted, influenced the total DNA content of the host cells and also the generation of a consecutive adaption towards a certain selection pressure was not performed. The ColE1 dependent plasmid propagation by DNA-directed DNA polymerase I (PolI)[4] was used in this thesis to insert a gene of interest downstream the ColE1 origin of replication, while an error-prone variant of PolI (epPolI)[5-7] was expressed temperature-dependent. Refining the herein presented strategy, the genomic encoded PolI is turned-off during mutagenesis, due to the temperature-sensitive allele polA12, enabling constant mutagenesis during a continuous directed evolution process.

Chapter three|Part II 204

Content

Introduction ...... 205

Enzyme evolution - from superfamilies and ancestors ...... 205

Selection and screening ...... 205

Mutagenesis methods - in vitro versus in vivo ...... 207

In vivo mutagenesis by mutator plasmids...... 210

The versatility of DNA polymerase I ...... 211

The ColE1 origin of replication family of plasmids ...... 212

Error-prone mutagenesis by a low fidelity DNA polymerase I ...... 214

The story of the λ promoter ...... 216

The concept of continuous directed in vivo evolution ...... 218

Results ...... 221

The host for the target gene - pHEP ...... 221

The host strain ...... 223

Strain TAF02...... 224

Strain JS200 (DE3) ...... 230

Mutagenesis by error-prone DNA polymerase I ...... 231

Discussion...... 236

Summary ...... 243

Materials and Methods ...... 244

pHEP - the host plasmid for target genes ...... 244

TAF02/JS200 (DE3) - genome editing ...... 244

pRED/ET recombination ...... 244

Lysogenization of JS200 ...... 247

pDFI - the error-prone plasmid ...... 247

Continuous mutagenesis ...... 248

Abbreviations ...... 250 References ...... 251

Chapter three|Part II 205

Introduction

Evolution has developed manifold pathways to ensure genetic stability, while delivering a suitable mutagenesis background for the adaption to environmental changes. The protection of the genetic code, the origin of life, is one of the most regulated systems, finely keeping the balance between stable growth and cell division versus starvation and cellular stress. In prokaryotes, the mechanisms are well described, but certain facts remained still unclear. Thus, the genetic engineering towards dysregulation or the adaption of the bacterial DNA towards specific industrial or scientific applications is manifold.[8]

Enzyme evolution - from superfamilies and ancestors During the evolutionary process, enzymes were developed for assisting in biological processes for chemical transformations. The high diversity of known enzymes is mostly divided into different classes by the chemical properties of the catalyzed reactions. Interestingly, biological function and thus, scaffold structures are usually more conserved than the rough amino acid sequences. A common exception is the appearance of certain motif sequences such as the nucleophilic elbow of α/β hydrolase- fold enzymes or the nucleotide-binding Rossmann fold.[9-11] As special fold classes developed, the scaffold diversity of proteins and enzymes was innovated. Nearly all enzymes can be assigned to a special fold class. The arrangement of catalytic residues within the enzyme or on the surface of the protein is similar within the defined enzyme class, but by small exchanges, completely different chemistry can be performed.[12-13] However, by mechanistic, chemical or structural aspects, enzymes are clustered in superfamilies, e.g. the ,[14] α/β hydrolase-fold[10], amidohydrolase[15], crotonase[16] or nudix[17-18] family.[19] As a clear structural relationship is obvious, the idea of a common ancestor molecule raises, inventing the structural properties as crucial selection advantage, ensuring the more or less selective chemical transformation of substrates in a suitable chemical environment. Those ancestral precursors are the hypothetical origins of the known protein families, wherein the protein architecture was able to incorporate other catalytic residues or to develop new active sites.[20-23]

Selection and screening The description of new proteins and new activities must be performed by intensive characterization. In recent years, the number of putatively annotated enzymes raised exponentially due to the high throughput of next-generation sequencing and the metagenome analysis. In this high sequence variety, it is quite difficult to identify the natural wild-type activity of an enzyme as it might exhibit a variety of small minor activities, mediated by substrate ambiguity or the use of cofactors. Thus, a library of putative proteins derived either from a metagenome approach or by laboratory Chapter three|Part II: Introduction 206 mutagenesis must be screened for the desired property. The main problem is the throughput of the available methods, as a certain amount of a library has to undergo screening for a sufficient coverage. If laboratory mutagenesis has been performed, the size potentiates with the targeted hotspots. Furthermore, the final bottleneck is the screening motivation, then finally, “you get what you screen for”.[24] Thus, the screening process is more or less forced towards pre-defined favored activities. In most cases, a pre-selection of the variants is desired to decrease the library size and to ensure the success of the screening procedure.

Nowadays, high-throughput screening methodologies enable ever-increasing mutant libraries and thus a high sequence variance throughout the library can be screened. The high-throughput benchmark was set by Mayr et al. to be 50,000-100,000 data points per day and newer ultra-high-throughput methods create even higher numbers.[25] For example, Ostafe et al. screened a library of 100,000 glucose oxidase mutants to identify a more active mutant for its application under physiological conditions.[26] The same group used also FACS for the engineering of a cellulase.[27] The FACS technology usually increases the number of operated samples dramatically. Thus, an enantioselective esterase was developed successfully by this powerful method.[28] The application of high-throughput screening is also common to the pharmacological drug discovery, screening compounds and targets for new medicinal treatments.[29-30] Additionally, high- throughput doping control systems were described. These systems can handle up to 400 samples daily, covering the detection of nearly all prohibited drugs with very high sensitivities.[31] Furthermore, Dörr et al. recently described the versatile setup of a robotic platform for completely different applications in enzyme catalysis, enabling a high-throughput for screening diverse enzyme classes, with the featured possibility of parallel screening runs.[32]

The necessity of screening efforts decreases, when a proper selection procedure is coupled to the desired enzymatic activity. Thus, the availability of growth substrates, the detoxification of harmful substrates[33-34], antibiotic resistances or the access to nutrients[35-36] significantly focuses the mutant library towards the favored activity or property. Chen et al. used the toxic ccdB gene product, which inhibits the DNA gyrase and subsequently causes cell death, for the development of homing endonucleases towards specific, non-natural cleavage sites.[37] Thereby, the idea of Gruen et al.[38] was improved by an enhanced arabinose response (the integration of an arabinose transporter mutant) and a reverse engineered promoter. Specific selection and directed evolution in a continuous process was achieved by phage-assisted continuous evolution (PACE). With a deleterious phage, the mutagenesis and the selection was focused on cells, complementing the phage towards infectivity. Thus, the selection phage promotes continuous rounds of mutagenesis by negative selection and stringency modulation.[3] Chapter three|Part II: Introduction 207

Mutagenesis methods - in vitro versus in vivo Generally, the directed mutagenesis of microorganisms can be achieved in vitro by manipulating the genes or the genome before they are transformed into living cells or in vivo using the whole cell host for mutagenesis, triggering the repair mechanisms of the related cell. Several in vitro systems exists, which can be targeted (site-directed mutagenesis,[39-42] alanine scanning[43-44], combinatorial active-site test[45-46], site- directed homology recombination[47-49]) or randomly (error-prone PCR[50-53], gene shuffling[54-58], ITCHY[59], SeSaM[60], RID[61] or MAGIC[62]) by introducing different kind of mutations to the genes.[63] The mutation rate and the transversion/transition ratios differ strongly, dependent on the chosen method and the specified reaction conditions. The application of recombination methods is also widespread. This can include the substitution of specific sequence parts or even whole genetic assemblies to create hybrid enzymes, either by using homology sites (shuffling) or homing endonucleases (MAGIC). As gene editing and the knowledge about DNA sequences and their genomic organization was scarcely known to early researchers, in vivo methodologies were developed, mostly using chemical mutagens (EMS[64], MNNG[65], metal ions[66], azaserine, carzinophillin[67], methyl methanesulphonate or mitomycin C[68]) or X-ray[69] and UV[70] radiation. During these studies, mostly uvrA deficient E. coli strains were used, unable to repair UV-mediated DNA damage. Furthermore, stress-induced mutagenesis can be used for the accumulation of beneficial nucleotide substitutions during cellular stress for adaptive evolution.[71-72] The origins of mutations under selection pressure were excellently discussed by Roth et al.[73] For the establishment of more elegant in vivo mutation methodologies, distinct knowledge about bacterial DNA repair mechanisms is a prerequisite. If DNA gets damaged by radiation, mutagens, alkylation, nicks or strand breaks, detectable amounts of single-stranded DNA (ssDNA) are exposed. This relatively unspecific stress signal is recognized by the recA gene product.[74] It binds to ssDNA and migrates along the DNA to detect homolog sequences, performing conformational proofreading by stretching the DNA duplex.[75-76] The protein is also able to fix dsDNA, by tightly binding two single strands together. As a DNA-dependent ATPase it mediates branch migration, inducing homologous recombination by strain invasion with up to thousands of base pairs long heteroduplex DNA. However, RecA recognize ssDNA from DNA strand breaks and induces auto-cleavage of the lexA gene product (Figure 1).[77-78] Non-processed, LexA binds to specific gene sequences (SOS boxes) of the SOS genes with different affinities.[79-81] Thus, genes with weak SOS-boxes (uvrA, recA, uvrB or uvrD) are induced primarily, enabling a controlled reaction to the strength of DNA damage. As weak response, nucleotide excision, base excision and mismatch repair mechanisms will be induced. For base excision repair, DNA glycosylases are necessary for base exchanges. Similar to mismatch repair, only wrong base pair interactions will be recognized. The Chapter three|Part II: Introduction 208 nucleotide excision repair is maintained by the uvrABC operon. Whereas UvrA aids in complex with UvrB in DNA damage recognition, it becomes released when DNA distortions are detected, by recruiting UvrC. The newly formed UvrBC complex acts as excinuclease and cleaves the backbone of the distorted strand. UvrD (DNA helicase II) mediates the removal of the excised fragment and the gap is filled up by DNA polymerase I.[82] The mismatch repair is maintained by the family of mut gene products in E. coli. MutS, MutH and MutL are essential factors, as MutS recognized mismatches and binds to the related DNA portion.[83] Then, MutL binds to MutS and recruits MutH, which distorts the DNA double helix.[84] Subsequently, MutH cleaves the daughter strand at the nearest hemimethylated site and recruits UvrD (DNA helicase II) for double strand melting. Following, the MutSHL complex travels along the strand and cleaves the daughter DNA until a few nucleotides downstream the mismatch. The released ssDNA daughter strain is rapidly digested and the gap is filled up by DNA polymerase III. DNA finally seals the nicks. Thus, MutH cleaves non-methylated DNA or the non- methylated strand on hemimethylated DNA, the repair mechanism can discriminate between parental and daughter DNA.[85] Once methylated, the mismatch repair is slower, but random, allowing only a small time frame for mismatch repair during DNA replication.[86]

Figure 1: General SOS response in E. coli. During DNA damage, single-stranded DNA stretches occur and are recognized by RecA. The protein permits auto-cleavage of LexA and thus, the SOS genes became transcribed, depending on the amount of cleaved LexA. Thereby, DNA polymerases II, IV and V become active, whereas the last is depending on proteolytic processing, enabled by a RecA filament, called RecA*. The figure is adapted from Tippin et al.[87] Chapter three|Part II: Introduction 209

During a stronger damage response, which is indicated by a higher portion of cleaved LexA, the expression of sulA, umuD and umuC is activated by the decreasing pool of the lexA gene product. Cell division is stopped by SulA, while binding to FtsZ. The latter is the initiator protein, which induces cell division. Consequently, filamentation of the cell starts. UmuD is also cleaved by RecA (yielding UmuD’) to form a complex with polymerase subunit UmuC. The final mutagenic repair is applied by UmuD’C, additionally stopping cell cycle progress, maintaining the cell for DNA repair.[88] Thus, active DNA polymerase V is formed, allowing translesion synthesis.[89-90] As this polymerase is highly mutagenic, it is tightly regulated and is the last resort during DNA repair, when polymerase III replication is completely stalled. By the mutagenic activity, evolutionary adaption may be mediated.[91-92] Two additional polymerases are used during the SOS response: DNA polymerase II (from gene polB) recruited by mismatch stalled DNA polymerase III, performing further strand synthesis, while harboring proofreading activity.[93] Finally, DNA polymerase IV (from gene dinB) is able to restart DNA replication at stalled replication forks and do not exhibit proofreading activity, leaving PolIV and PolV as the main mutant producers during SOS response.[94-96]

Overall, the repair mechanisms are of course very efficient and mutagenesis rates are very low, e.g. for E. coli 4-7·10-10 mutations per base pair, correlating with a rate of roughly 0.002-0.003 per DNA replication event were calculated.[97] These numbers correspond to mostly non-stressed E. coli growth under optimal conditions. As the selection pressure is naturally forcing stress induced mutagenesis, the numbers will be slightly higher, depending on the special environment. For example, pathogenic strains were found to exhibit an even higher rate of over 1%, caused by a defective mutS allele.[98] Thus, the mutator alleles play a crucial role in adaptive evolution.[99] For laboratory applications, the replication machinery and the DNA repair mechanisms are interesting targets for the development of new strains or their usage in random mutagenesis target gene evolution. The variety of mutator strains is very high[100-108] and they were used for the development of a thermostable kanamycin nucleotidyl- transferase[109], to increase the total production levels of a recombinant Hepatitis B antibody[110], to evolve a Pseudomonas fluorescence esterase[102, 111] and also for promoter optimization[112]. The common targets for the establishment of mutator strains are the mutD or mutS sites, influencing the DNA polymerase III, thus introducing mutations over the whole genome.[113] As mutD encodes the epsilon subunit of PolIII, the proofreading exonuclease activity is influenced by mutations, creating a low fidelity polymerase holoenzyme. The commercial available strain Epicurian coli XL1-Red harbors totally three essential mutations, increasing the mutation rate up to 5000-fold compared to the wild type.[114] The strain is deficient in error-prone mismatch repair (mutS) and in the 3’→5’ exonuclease activity (mutD).[115] Additionally, the ability to hydrolyze mutational 8-oxo- 2'-deoxyguanosine-5'-triphosphate was deleted (mutT).[116] The abnormal Chapter three|Part II: Introduction 210 deoxyribonucleoside is a product of oxidative DNA damage and mismatches with adenine, resulting in A→C and G→T transversions. Defective mutD strains were the first investigated strains in regard to DNA repair and mutagenesis studies in prokaryotes.[113, 117] An endogenic transformation of defective DNA polymerases or polymerase subunits by plasmid vectors was also performed to substitute the corresponding wild-type proteins.[101] Additionally, the genomic integration of defective alleles or the targeted mutagenesis for DNA editing can be performed easily by DNA integration and excision of desired sequences through vector-mediated flipases and recombinases.[118] The opportunity of homologous recombination is also possible without the usage of counter selection markers.[119-120]

In vivo mutagenesis by mutator plasmids It was mentioned above that a common method for in vivo mutagenesis are commercially available mutator strains, deficient in several DNA repair mechanisms. The defective gene is encoded in the genome and is able to get repaired during the cell cultivation: Due to the higher mutation rate and the affection of the whole bacterial genome, reversion may take place until the mutation rate is significantly lost. The application of a suitable selection pressure may increase the repair of the initially mutated genes, strongly focusing cell survival. Thus, it seems to be easier to provide the mutagenic proteins encoded by endogenous vectors, because at first, every bacterial strain can be simply transformed in a mutator strain (circumventing the necessary genomic substitutions) and second the responsible plasmid can be freshly transformed to the strain, re-enabling mutagenesis, if the mutagenic gene(s) got repaired. Furthermore, the strains will harbor an increased genomic stability. In previous studies, mutator plasmids were described, promoting increased UV-sensitivity to the hosts, indicating its influence towards the repair machinery while interfering with nucleotide excision repair. The responsible plasmids R46, R391 of pYD1 were formerly used in E. coli and Shigella dysenteriae for random mutagenesis.[121-124] In another approach, a defective epsilon subunit of PolIII was found to be highly erroneous and the encoding gene can be genetically transferred to the cell by easy vector transformation. This plasmid - pMutD5 - carries a deficient mutD allele and increases the mutation rate 20-40,000-fold. Additionally, the plasmid harbors a temperature-sensitive origin of replication, enabling cell curing by growth at 41-43 °C.[101, 125] Interestingly, also other crucial parts of the repair machinery should be substitutable using mutagenic DNA repair components of the host pathways. However, information about the plasmid derived in vivo mutagenesis is still scarce and mostly related to DNA polymerases I and III. A very recent work of Badran and Liu was the expansion of a mutator plasmid originally used in PACE[3] to optimize the mutagenesis methodology: Initially, their mutator plasmid encoded dnaQ926 (for a defective proofreading subunit of PolIII), umuD’, umuC (both creating Chapter three|Part II: Introduction 211 mutagenic PolV) and recA730 (forming after the expression a RecA filament for a boosted SOS response, activating the genomic encoded Pol IV and PolV genes), rendering the target cell highly mutagenic by translesion synthesis, but with an also high non-induced mutation background.[126] Thus, they analyzed different reports about the DNA repair machinery and survival strategies containing DNA maintenance in bacteria and figured out interesting mechanistic targets. At first, the plasmid was cleaned and only the dnaQ926 gene was kept. Interestingly, the mutation frequency slightly increased, while the background mutagenesis decreased. However, under control of the same araC promotor stepwise the genes for the Dam methylase, the hemimethylated dsDNA GATC binding protein SeqA, the uracil-DNA glycosylase inhibitor protein Ugi and the cytidine-deaminase CDA1 from Petromyzon marinus were cloned and finally, the mutation rate increased two orders of magnitude. The critical choice of mutational hotspot proteins and the deeply reviewed DNA organization machinery turn this article exceptional and thus, it complements targeted fine-tuning in vivo mutagenesis methods like CRISPR/Cas9.[126-127] Beside this work, the mutagenesis is not targeted and affects the whole E. coli genome. The genes are inducible by a special promoter, but background mutagenesis is even high and thus, it suffers all drawbacks of mutator strains, especially during the mutagenesis phases, because it cannot be cured and moreover under high mutagenic pressure, the occurrence of phage resistances or the complete inactivation of the phages were never discussed.

The versatility of DNA polymerase I

The system for DNA polymerase I (PolI) complementation for the selection of complementing DNA polymerases in E. coli was described before, using the polA12 recA718 mutated E. coli strain JS200.[128] Thus, the genomic PolI was temperature- sensitive and became functionless at temperatures above 37 °C. PolI was hypothesized in previous studies to be essential for culture growth[129], but was found to be conditionally lethal in certain mutants.[130-131] Cells harboring defective PolI finally survived through the functional replacement by other genes, enabling growth in minimal medium.[132-135] However, in rich media the mutations stay conditionally lethal due to incomplete lagging strand synthesis, the main function of PolI.[6] The increased number of replication events cannot be covered by the alternative machineries.[134, 136] The growth ability becomes restored, when one subunit of PolI is expressed, ensuring the polymerase activity. The structural and biochemical studies of PolI revealed three functional domains within the enzyme: the 5’→3’ exonuclease is located N-terminally, followed by a central part for proofreading activity.[128] At the C-terminus the polymerase domain is situated.[128] PolI functions in lagging strand synthesis of genomic DNA removing residual Okazaki fragments, filling up DNA gaps and the release of the RNA primer.[137] Different DNA repair and recombination genes are associated with PolI. Thus, interactions with RnhA[138], PolC[134, 139], UvrD[140], RecA[141-143] and RecB[141] were described. During nucleotide excision repair, PolI is responsible for gap filling, the Chapter three|Part II: Introduction 212 release of the oligonucleotide fragment and the release of the UvrC protein from the postincision complex.[144-145] Surprisingly, the expression level of PolI reaches 400 molecules per cell, compared to DNA polymerases II and III with 100 and 10 copies, respectively.[146] Deleterious strains and heat-sensitive variants of PolI were described, highlighting the significance of the polA12 mutation, which results in a PolI gene product, completely inactive at 42 °C. Thus, the loss of the polymerase can be triggered by a simple temperature shift to simulate respective cellular events (e.g. conditionally lethal mutations) for investigations about functionalities and genetic stability of replication processes. The misfolded form was proven to be defective in the coordination between the polymerase and 5’-exonuclease activity, reducing the in vivo polymerase and 5’-exonuclease activity 4-fold at 42 °C.[147] In combination with the inactivation of the recA and recB genes, polA12 remains lethal in rich medium.[141] The conditionally knockout or deficiency of polA permitted precise insights into the DNA replication and repair mechanisms.[130, 147-148] Usually involved in lagging strand synthesis, PolI is also able to perform leading strand synthesis during replication of plasmids of the ColE1 family.[149-150]

The ColE1 origin of replication family of plasmids

In 1983 Selzer et al. noticed a sequence relationship of pMB1, p15A, RSF1030 and ColE1 origins of replication.[151] Thus, a related machinery for the replication was assumed and the mechanisms were analyzed.[152] Nowadays, the related plasmids are part of the ColE1 family. The 550 bp origin of replication (ori) encodes two RNA stretches (RNAI and RNAII), which are under control of different promoters (P1 and P2, respectively, Figure 2), whereas P1 permits stronger transcription.

Figure 2: Outline of the organization of the ColE1 origin of replication. Adapted from Camps et al.[4] Chapter three|Part II: Introduction 213

Thus, RNAII functions as pre-primer for plasmid replication, recruiting RNA polymerase for elongation and formation of a stable R-loop, being essential for plasmid replication. Within the RNAII sequence, three stem loops exist (Figures 2 and 3), permitting antisense repression by contacts to the three stem loops of antisense RNAI. This interaction (“kissing complex formation”) is the rate limiting step for inhibitory duplex formation.[153]

Figure 3: Stem loop organization within the ColE1 origin. The stem loops are recognized by the complementary parts of the antisense strand, initiating duplex formation by the “kissing complex.”

The P1 antisense promoter exhibits 100-fold higher activity as P2, resulting in RNAI excess and a negative feedback contributing to the defined copy number per cell, which is proportional to the inhibitor level of RNAI. If the 5’-end is deleted, the copy number became directly proportional to P2 transcription.[154] Featuring the RNAI and RNAII DNA templates in reverse orientation on the same ori locus, the sequences are overlapping at the 5’-end of RNAII, mediating the antisense repression. The half-life of RNAI is about 2 min during exponential growth by the RNAse E cleavage sites.[155] Thus, the cell is able to rapidly response to altered environmental conditions. However, after contact through the stem loops, the 5’-end of RNAI mediates nucleation and duplex formation. This duplex is stabilized by Rop, with the corresponding gene Rom removed from or inactivated in most modern vectors, increasing the copy number up to 5-fold.[156-157] High-copy plasmids (e.g. pUC) have a copy number of about 500-600 copies per cell due to a single point mutation in a Rop suppressible sequence in RNAII. If transcription by RNA polymerase proceeds towards the first 200 nt, another stem loop is created. After its formation mediated by the sequence domains α and β, the antisense inhibition is no longer possible, enabling only a short time frame for repression. In contrast, the RNAI/RNAII hybrid induces the interaction of domain β with another suitable domain, named γ. During this additional hybridization, critical R-loop formation cannot be achieved. The mechanism of repression of R-loop formation at the 3’-end by interactions at the 5’-end of the whole sequence is known as “action at a distance”.[4] Proceeding the RNA synthesis, a C-rich pattern in the template DNA interacts with a corresponding G-rich primer sequence, assisting in R-loop formation. Furthermore two Chapter three|Part II: Introduction 214 hairpins exist, necessary for primer processing and extension, as well as in R-loop formation.[158-160] Once established, this R-loop is processed by RNAseH cleavage, leaving a 3’-OH end, which is further elongated by DNA-directed DNA polymerase I.[158, 161] The leader strand synthesis separates the template DNA duplex and forms the D-loop, baring the primosome assembly signal for binding the gene products of priA, priB and dnaT. The replication fork complex recruits DNA-directed DNA polymerase III for completion of plasmid replication. Many mutational studies were performed for engineering the ColE1 origin of replication and thus, substitutions sites are well known for modulation of the defined plasmid copy numbers per cell (excellently reviewed by Camps[4] and Lilly and Camps[162]). However, the DNA polymerase I initiates the plasmid replication and thus, up to 3 kbp of the post-ori sequence were influenced by this polymerase, enabling a specific mechanism, exclusively for the ColE1 family. In contrast, the ori for pSC101 replication depends on the Rep protein for its initiation. Thus, the protein has affinity to special sequence repeats within the ori.[163-164] Rep is an ATP-dependent DNA helicase, able to melt the duplex, while Rep also interacts with the chromosomal replication initiator protein DnaA, stabilizing the complex via the respective dnaA boxes inside the ori sequence.[165-166] Followed by recruitment of the replicative DNA helicase DnaB (which is also recruited in the absence of dnaA boxes, but still mediated by DnaA[167]), the template duplex is unwinded in both directions, creating two potential replication forks. As DnaB interacts with DNA primase DnaG[168] to form the primosome, small RNA fragments were synthesized, serving as template for DNA polymerase III, finally completing the plasmid replication. The RNA nucleotides became replaced by the activity of the DNA polymerase I. Thus, the replication is strongly related to oriC replication in E. coli’s genome.[152]

Error-prone mutagenesis by a low fidelity DNA polymerase I

Several DNA polymerases are known for their low fidelity.[96, 169-172] Thus, the proofreading activity is the most common target for mutagenesis induction of the described polymerases. Also for the DNA polymerase I, a low fidelity variant exists (Figure 4).[7, 173] This mutant carries three mutations: At first D424A in the exonuclease domain, inactivating the proofreading ability.[173] I709N lies in the motif A, a conserved sequence in the palm domain of the polymerase part. By enlarging the binding pocket, mis-incorporation leads to decreased fidelity, directly connected to mutations at this position. [173-174] In motif B, A759 was substituted for additional binding anomalies during dNTP incorporation and DNA synthesis, as was shown for the homologous Taq polymerase.[175] The subsequent inactivation of the intracellular DNA polymerase I by polA12 during a temperature switch from 30 °C to 37-42 °C, the error-prone PolI variant will become the predominant polymerase. As the ColE1 origin of replication is induced by the action of PolI, mutations will be inserted into suitable introduced plasmids. The DNA polymerase switch to PolIII is achieved after 200 nt behind the ori. Nevertheless, Chapter three|Part II: Introduction 215 mutagenesis occurs also in longer stretches and additionally accumulates in the Okazaki fragment pattern from lagging strand synthesis.

Figure 4: Crystal structure of the Klenow fragment of the DNA polymerase I (pdb code: 1KLN) in complex with DNA. The protein is represented in cartoon backbone visualization and the sheets are highlighted red. Substituted residues for epPolI are shown as purple spheres. The DNA is depicted as stick model.

By the analysis of the mutagenic footprint, PolI was found to be able to perform polymerization of both strands downstream the putative DNA polymerase switch and is maybe included in the PolIII holoenzyme complex.[5] Finally, a stretch of up to 3,300 nt was found to be mutated (Figure 5).[5]

Figure 5: Principle of targeted in vivo evolution of a gene sequence by error-prone DNA polymerase I (epPolI). In E. coli cells harboring polA12 and recA718, DNA polymerase I (PolI) is temperature-sensitive (e.g. in strain JS200). Thus, the genome encoded PolI is not active at 42 °C. Introduced by a plasmid vector, epPolI (PolI D424A, I709N, A759R) takes over main functions, like the replication of plasmids with ColE1 origin of replication. Due to the low fidelity, 1.5-3.3 kbp of the initial replicon is produced by epPolI, before the low fidelity mutation range is overtaken by high fidelity DNA polymerase III. The figure is adapted from Camps et al.[7] Chapter three|Part II: Introduction 216

However, the major advantage of continuous mutagenesis was failed by the published protocol of growth, plasmid isolation and retransformation of the gene of interest and furthermore the uncontrolled expression of epPolI by the lac operator.[6] The thermosensitive polA12 was complemented with a recA mediated constitutive expression of the SOS response, mainly enabling cell growth also at higher temperatures, but also increasing the chance of translesion synthesis by PolIV and PolV.[143]

The allele recA718 is sensitized and activated as a result of slow Okazaki fragment joining under the restrictive conditions for polA12. It was reported that the UV sensitivity was increased by this mutation.[176] The combination of the 5’→3’ exonuclease inactivation and the constitutive SOS expression leads to viability under the restrictive conditions at 42 °C in rich media.[176] However, the combination of polA12 and recA718 was used previously in the lab of Lawrence Loeb for the selection of mutant creation of diverse complementing polymerases.[172, 177-182] Additionally reported was uvrA155, a mutant in the nucleotide excision pathway during UV mutagenesis. The sensitivity to UV light was formerly used as template for the development of mutator strains, where the current genes were knocked-out.[183] Complementation of the PolI system was achieved, as uvrA155 contributes to the mutagenesis frequency during the polymerase activity.[184-186]

The story of the λ promoter

The bacteriophage λ exhibits a genome of 48.5 kbp and is able to integrate into the E. coli genome during infection by homologous recombination between the gal and bio operons, necessary for galactose processing or biotin synthesis.[187] Once integrated, the concentration of two repressor molecules determines the decision between lysogenic or lytic cycles.[188] Both genes are located on the rightward side of the genome and share a bilateral operator region (Figures 6 and 7).

1 3 The special operator consists of three repressor binding sites OR -OR ; which are targets of the cro and cI gene products.[189] These repressors bind to the operator sequences 3 1 with different affinities: The affinity of Cro increases from OR to OR , whereas cI exhibits the opposite binding affinity.[190-192] In case of healthy host cells, cI is the mainly produced repressor, inhibiting the expression of the cro gene and all other lytic genes. At 1 first, OR is occupied by a dimer of cI, rapidly recruiting a second dimer, which binds to 2 OR . If a certain concentration of cI is reached, PRM will be blocked by binding of a third 3 [193] dimer to OR , inhibiting the expression of the cI gene as negative feedback control.

The cI gene product has also affinity to the OL region and therefore maintains the repression of the corresponding genes. Thus, the whole DNA stretch in-between the operator sites can be bend to form a stable octameric complex of four cI dimers by long- range , which tightly represses the lytic phage genes. Chapter three|Part II: Introduction 217

Figure 6: Outline of the genomic organization of the bacteriophage λ. The characteristic gene clusters with the respective functions are indicated. All genes become repressed during lysogenic cycle, except cI. The corresponding gene product is the general repressor protein and undergoes cleavage by RecA, consequently becoming controlled by the newly synthesized cro gene product. The figure is adapted from Griffiths et al.[194]

Figure 7: Promoter ensemble on the λ genome. The leftward operator (OL) maintained the protein expression from the leftward promoter (PL) of the N and cIII genes. The rightward operator (OR) bilaterally controls the expression of the cI gene from its exclusive promoter PRM [188] and cro and cII from the rightward promoter (PR). Adapted from Ptashne et al.

If the SOS response is induced in the prokaryotic cell, cI is cleaved and the induction of 3 the cro gene leads to binding of the Cro repressor to OR , inhibiting further cI expression. Thus, the lytic genes can be expressed, leading to final cell lysis and phage particle distribution.[188] The thermosensitive cI857 mutant mediate the mutation A67T and the gene product is [195] inactivated at temperatures above 37 °C. Thus, the λPR promoter is switchable for protein expression at higher temperatures (>37 °C), whereas the transcription is tightly repressed at respective lower temperatures. This system was previously used for recombinant protein overexpression in E. coli.[196-197] In another study, protein expression was optimized by oscillating temperature conditions to enable high-cell Chapter three|Part II: Introduction 218 density cultivations.[198] Furthermore, a combination of the cI857 and lacI repression system was used for on-switched protein expression at lower temperatures, by using the thermosensitive cI857 for the expression of lacI.[199-200] For a precise review about protein expression from the λ promoters, please see Valdez-Cruz et al.[201] The temperature sensitive behavior of proteins can be very advantageous, e.g. an adenylate cyclase mutant was described, exhibiting its highest activity at 22 °C. Interestingly, at 28 °C the enzyme became inactivated, but the reactivity was fully restored when the sample was shifted back to 22 °C.[202] In contrast, heat-sensitive repressors were also reported for the lac repressor LacI (which was then insensitive towards IPTG and lactose)[203] and a repressor from the Lactobacillus casei phage φFSW[204]. Whereas the thermo-inducible LacI competes with the natural pool of LacI in E. coli, other phage derived promoter systems have the opportunity to be transformable to other species.[205-206] Undoubtedly, the repressor system used from one phage will possibly interfere with other prophages.[207] The φFSW promoter system also induces protein expression in E. coli and is not sensitive to RecA cleavage (as cI857).

However, the λPR promoter was chosen, as at heat-shock conditions, also heat-shock chaperones will assist during protein expression from an included LacI dependent promoter, whereas other known promoter systems either used additional antibiotics (tetR, lexA) or may interfere in parallel selection during restrictive growth conditions (araC, rhaBAD, phoA).[197, 208-212] Furthermore the extrinsic promoters often require endogenous proteins. In case of the T7 promotor, the T7 RNA polymerase is necessary, often used in protein overexpression from known DE3 strains. But due to natural evolution, also laboratory bacteria may adapt in few generations by reducing the metabolic burden by impairing the genomic integrated gene.[213] Additionally, fine tuning of the promoter binding sites of λPR was reported to enable better growth conditions, 2 while keeping λPR repressed. Two point mutations inside or near the operator OR mediated the repression up to 36 °C with gene induction at 38 °C.[112, 214] However, the combination of both enables an even higher temperature switch, but also decreases the promoter strength.

The concept of continuous directed in vivo evolution Directed evolution is a common and manifold concept for changing chemical or physical properties of macromolecules, especially enzymes (Figure 8). Based on the genetic modification of genes or whole genomes, stability, activity or the establishment of new activities are targets in this process. Moreover, directed evolution (random techniques like error-prone PCR, UV radiation or mutator strains like E. coli XL1 red) or focused directed evolution (iterative saturation mutagenesis or the combinatorial active-site saturation test) are necessarily applied when no or little information about the protein architecture is available. To date the common methods lack the possibility to mimic the natural evolution by continuous Chapter three|Part II: Introduction 219 mutagenesis under (increasing) selection pressure. Any process will become discontinuous if gene isolation is required. Pausing or interrupting the development of proteins by screening and gene analysis always forces the process into a predefined direction, channeling the procedure in matters of the screening aim. This may further decrease the chance to detect synergistic substitutions. In a continuous process, targeting the mutagenesis to a certain gene of interest is highly desired, because it increases the chance for cell survival, acceptance of the methodology by the cell repair machinery and the adaption and evolution of this specific gene. Furthermore, the possibility that the microorganisms escape the selection pressure by diverse modifications is minimized, because the foreign transcription machinery cannot be influenced by the method itself. However, the establishment of such methodologies requires suitable bacterial hosts, a continuous and controllable mutagenesis process and a vector carrying the gene of interest.

Figure 8: Outline of protein engineering components and their potential application. The investigated gene construct is mutated in a directed evolution or a rational design approach. The library of mutants, which can vary significantly in size, undergoes transcription and translation during protein expression. If a selection opportunity is set-up, the library size will decrease significantly. Subsequently, a screening procedure follows for the determination of desired properties and the factor of benefit, calculated from the initial starting point. If no selection pressure is applied, high-throughput screening systems can cover very large libraries.[215] Finally, the data will be collected and analyzed before a new round of mutagenesis starts. In continuous processes, the first three steps are combined and the sampling procedure depends on defined time frames, ensuring a sufficient mutagenesis rate. Continuous systems are mostly applied in vivo because selection pressure, protein expression fidelity as well as established mutagenesis possibilities are common. Chapter three|Part II: Introduction 220

The concept of the methodology studied in this thesis is based on an error-prone DNA polymerase I. This enzyme replicates DNA with 15-20 nt s-1 and has repair functions, like 3’→5’ (proofreading) and 5’→3’ (nick translation) exonuclease activity. It is responsible for starting the replication of plasmids with a ColE1 origin. Remarkably, PolI is replaceable by other polymerases, but strains lacking PolI are generally more sensitive to UV radiation. Special mutants of PolI were described, lacking the proofreading activity and thus able to incorporate random mutations in the replicated DNA. If the gene of interest is cloned nearly the ColE1 origin, it undergoes mutagenesis via the exogenous polymerase, when the endogenous enzyme is disabled. Finally, the overexpression of the mutated genes can be performed by common arabinose, rhamnose or IPTG- dependent systems.[216] With respect to the corresponding selection procedure, which must be applied to focus the evolutionary mutagenesis, the expression system should not show remarkable interference. This can be e.g. by lowering the selection pressure through supply with growth substrates, which originally should induce target gene production (arabinose, rhamnose).

Chapter three|Part II 221

Results

The different components of the error-prone DNA polymerase I system of Camps et al.[5-7] were individually genetically manipulated to establish an artificial evolution system within a growing E. coli host cell. Therein, the mutagenesis events should be on- and off-switchable by a temperature sensitive regulation to enable a continuous in vivo mutagenesis. The respective mutagenesis rate was calculated after culture growth experiments with the generated plasmid components pHEP and pDFI and the essential host strain JS200 or TAF02.

The host for the target gene - pHEP The T7-dependent protein expression is a versatile expression system, which can be easily modulated by changing the inducer concentrations. First, it is helpful to achieve parallel target protein mutant overexpression, if necessary selection methods can be combined to the in vivo evolution mutagenesis. Second, the currently available pEP mutagenesis uses the T7-promoter for epPolI expression. Thus, the promoter must be changed, enabling the use of the T7-system for target protein expression. This ability is used in commercial available pET-systems and also the host vector for the gene of interest will be a pET derivative. Hence, pET28a was chosen for this purpose (Figure 9).

A B

Figure 9: Host vectors for T7-dependent protein expression. The gene of interest is cloned in- between the T7-promoter and the T7-terminator. For this purpose, pET28a features a versatile multiple cloning site (MCS), harboring two His6-tags, one T7-tag and a thrombin cleavage site. The position of the pBR322 ori was rearranged. In pET28a (A) it is on the opposite site of the plasmid, whereas in pHEP (B) it is located directly adjacent to the T7-terminator. Chapter three|Part II: Results 222

The origin of replication of pET-plasmids is a pBR322 ori (GenBank: J01749), which is a derivative of the ColE1 ori. This feature is essential for replication by the DNA polymerase I. Also pUC plasmids carry this ori, but are not suitable for protein overexpression. The pBR322 ori was relocated through the FastCloning[217] method downstream the T7 terminator, to prevent mutation of the promoter, while ensuring mutation of the target gene. Thus, three different PCRs were performed with oligonucleotides, priming the reaction by creation of additional 10-15 bp at each side, which are homologous to the new position of the fragment. The raw products of the multiple cloning site (MCS) (3.1 kbp), the resistance site (1.3 kbp) and the pBR322 ori (600 bp) amplicons were digested with DpnI and analyzed by agarose gel electrophoresis (Figure 10).

1 2 3 M

Figure 10: Agarose gel electrophoresis of the DpnI digested raw products of the FastCloning PCR. Lane 1 - product of the ori, lane 2 - product of the resistance site and lane 3 product of the MCS. M - marker (1kbp DNA ladder, fragments depicted in bp lengths).

After the digestion, the products were purified and the yield of DNA was determined spectrophotometrically. The fragments were mixed in equimolar amounts and transformed in competent cells. Sequencing of the plasmids with oligonucleotides directly in front of the new positions verified the correct orientation within the vector. Thus, four different plasmids were constructed: pHEP01 and pHEP03, sharing the same orientation of the pBR322 ori (as depicted in Figure 9), but pHEP03 exhibited additional nucleotides between the T7-terminator and the new positioned ori. The same applies to pHEP02 and pHEP04, but therein the ori was included in the opposite orientation. For the analysis of the mutation frequency, the epoxide hydrolase mutant CIF03 was cloned into the MCS of pHEP01-04 via the same restriction sites as in pET28a (please see Chapter 3, Part I). The correct orientation of the ori and the mutation efficiency was checked by continuous mutagenesis with JS200 and pEP. The cells were cultivated at 37 °C or 42 °C. Chapter three|Part II: Results 223

Each day, the cells were simply transferred to fresh medium to simulate continuous growth, while parallel mutagenesis was ensured during these experiments, but even after 8 days no mutations were detected. Additionally, IPTG was added for induction of the T7-promoter, but again only wild-type was yielded after sequencing. Therefore, the isolated crude library was sequenced at first, consisting of the plasmid isolate from the mutagenesis culture. If mutational variations were detected, 5-10 clones of cell colonies from freshly transformed E. coli Top10 cells with the respective mutant library were individually sequenced. The region downstream the re-orientated ori was also sequenced to exclude wrong orientation. In a personal communication with Manel Camps it was suggested that these results correlate to deleterious effects towards mutagenesis by epPolI induction. Camps et al. also recommended 2xYT medium, which should facilitate growth of JS200, but so far no growth problems were observed, even at 42 °C. Additionally, the mutagenesis protocol of the inventors was not continuous, due to the extraction of the plasmids after each round of mutagenesis at 37 °C, re- transformation of the DNA and the fact that cell growth on a special media was required. However, no mutations were found, even so the recommended protocol was applied and consequently the plasmids have to be tested with the modified pEP-plasmid (pDFI, please see sections below), which possess a different promoter control, tighter regulation and generally a stronger gene expression at 42 °C.

The host strain Previously, the in vivo mutagenesis system was proven to work with JS200, but with discontinuous growth, due to retransformation steps after each round of mutagenesis.[5- 7] Thus, the methodology is straightforward, but limited in its applicability with proper selection conditions coupled to an accumulation of beneficial mutations in the target gene, finally leading to positively selected mutants. The benefits are actually the targeted mutation and the easy applicability by simple microbiological steps. However, the mutated genes cannot be analyzed in further assay systems, since no proper gene expression from the used pUC-plasmids is possible, as pUC plasmids are strongly recommended for cloning purposes. The coupling of the host vector to a protein expression strain can be provided easily by pET-vectors after rearrangement. It would be a further advantage, if the host strain would be capable of T7-dependent gene expression, avoiding the plasmid transformation of the mutant library to suitable expression strains (JM109 DE3, BL21 DE3 gold, SHuffle T7 etc.), generally going along with significant loss of mutant variants. Additionally, the mutagenesis plasmid could be transferred to the expression strains, which will start mutagenesis at 37 °C, the usual growth condition before protein expression at lower temperatures. The same problem occurs within expression in JS200, but this host strain showed normal growth at 30 °C. As for other E. coli strains growth at a fixed temperature is recommended (e.g. SHuffle T7 Express), no hampering of protein expression is expected at low temperature with functional high-fidelity DNA polymerase I. Chapter three|Part II: Results 224

In general, there are two possibilities for the creation of the host strain: either a common BL21 (DE3) strain becomes genetically modified to carry the essential polA12 and recA718 mutations of JS200 or the DNA dependent T7 RNA polymerase under control of the lacUV5 promoter will be inserted into the genome of JS200, yielding JS200 (DE3). As genetic modification was described to be troublesome, both ways were pursued. Initially, the gene loci of polA, recA and additionally uvrA (UniProt: P00582, P0A7G6 and P0A698, respectively) were amplified from the genome of BL21 (DE3) and JS200, with suitable oligonucleotides and sequenced. The mutations were confirmed to be identical to the previously described polA12 and recA718 genotypes.[131, 147, 176] In case of uvrA no information of the uvrA155 mutation (in JS200) sequence was available, but a frameshift was sequenced in JS200. The deletion of the guanine nucleotide 2185 results in a gene product with 747 amino acids (a deletion of 193 amino acids). As this mutation was not mentioned to be essential for the mutagenesis application, its introduction into BL21 (DE3) was not performed.[5] The recA718 gene carries the mutation E39K and L127V in the recA gene product through the transition of G to A at nucleotide 115 and the transversion of C to G at nucleotide 379. This results in a consequent activation of the genetic repair system, ensuring genetic stability of the bacterial artificial chromosome during the mutagenesis. The second mutation is polA12, a G to A transition at nucleotide 1631, resulting in the mutation G544D in the related gene product. This mutant polymerase is thermolabile and not active at 42 °C. The strain must be handled with care and is usually cultivated at 30 °C.

Strain TAF02

A derivative of E. coli BL21 (DE3) was obtained from Gene Bridges GmbH (Heidelberg, GER). It carries a mutation in the rpsL gene, mediating streptomycin resistance (smR). If the wild-type gene of rpsL is delivered to the cell through a vector, the cell becomes streptomycin-sensitive (smS).[218] Thus, the transformation of this strain with the rpsL wild-type enables counter selection, if its introduction is reversible. During homologous recombination, a gene cassette will be inserted into the genomic DNA, carrying the rpsL gene and an antibiotic resistance as selection marker (kanamycin, kmR). The pRED/ET system of Gene Bridges is distributed in kits for the counter selection procedure: At first the gene cassette is integrated into the gene of interest by homologous gene sequences.[219] Positive cells are selected by kmR and smS. In the second step a mutated version of the gene of interest is transformed into the cells, removing the integrated gene cassette by selection towards kmS and smR. Please see the methods section for further details. The homologous recombination is facilitated by the proteins Redα/Redβ from the λ phage or RecE/RecT from the Rac prophage. The first proteins of the pairs are 5’→3’ exonucleases and the other DNA annealing proteins, functionally coupled to enable the recombination events by sequence homologies. Furthermore, the λ-encoded Chapter three|Part II: Results 225

Gam protein inhibits the host nucleases RecBCD and SbcCD of E. coli, preventing degradation of transformed linear dsDNA.[220-221] With this method the mutations recA718 and polA12 should be introduced into BL21 (DE3), yielding TAF01 (recA718) and subsequently TAF02 (recA718, polA12). The recA gene was targeted at first, because mutagenesis of polA seemed to be more difficult, as certain gene mutations were described to be conditionally lethal.[130-131] However, growth of an E. coli knockout strain (ΔpolA) in LB or M9 minimal medium was reported.[132, 136] A selection cassette, carrying the rpsL wild type and the neomycin phosphotransferase II gene (neoR), enabling kanamycin resistance by permanent expression was amplified with suiting oligonucleotides. Thus, 50 bp up- or downstream of the insertion sites were added to the cassette. As two mutations must be introduced, these were 50 bp upstream of nucleotide 115 and 50 bp downstream of nucleotide 379. The homology arms were designed to exclude the substituted nucleotide during the insertion of the cassette. The provided BL21 (DE3) strain was checked previously for streptomycin resistance and then transformed with pRED/ET, induced and subsequently transformed with the PCR product of selection cassette as recommended by the manual (for further details please see methods section and Figure 11).

Figure 11: Outline of the genome editing process by homologous recombination with the pRED/ET kit and counter selection. The reversible streptomycin sensitivity (smS) is the counter selection marker for the correct insertion of the mutated gene, transformed into the cells as dsDNA. Finally, pRED/ET can be eliminated by growth at 37 °C due to its temperature-sensitive pSC101 origin of replication.

The cells were checked for kanamycin resistance (kmR) and streptomycin sensitivity (smS) by plating the transformation on agar plates with kanamycin. Subsequently, single colony clones were picked and spread on agar plates with streptomycin and only negative colonies were used for analysis in colony-PCR (cPCR, Figure 12).

Chapter three|Part II: Results 226

Figure 12: Outline of the control reactions performed by cPCR. The genomic insertion product is represented in the central box. The rpsL-neoR cassette was successfully inserted by pRED/ET recombination, excising 260 nt from recA. The verification of insertion was done with recA fw and recA rv oligonucleotides, yielding an extended PCR product of 2.1 kbp, compared to the recA-WT of 1 kbp length. Additionally, the combination of recA fw and rpsL-neoR rv must yield a 1.4 kbp product, only possible after successful cassette insertion with the exclusive binding sequence of the latter primer.

The cPCR products were separated by agarose gel electrophoresis and visualized with Roti® GelStain. Unfortunately, only recA WT was identified by a discrete lane at 1 kbp, when amplified with recA fw and recA rv. Adequately, no cPCR product was obtained with recA fw and rpsL-neoR rv. As the colonies were streptomycin sensitive and kanamycin resistant, a complementation at the rpsL site in the BL21 (DE3) genome by homologous recombination was hypothesized. Thus, the homology arms were extended from 50 to 80 nt, yielding the same result of unknown insertion. The homology to the rpsL site seemed to be much higher, as compared to the target sequence. However, the purified insertion cassette, equipped with 50 nt homology arms at each site served as a template for overlap-extension PCR together with the recA wild type gene, extracted by cPCR from the BL21 (DE) strain and properly purified. In a two-step protocol the homology arms were elongated to the full length of the recA gene, yielding the 2.1 kbp product shown schematically in Figure 12 (Figure 13).

1 M

Figure 13: Agarose gel showing the separated and stained overlap-extension PCR products using recA WT and the rpsL-neoR cassette, equipped with homology arms as template. The arrow indicates the 2.1 kbp product in lane 1, corresponding to the schemed part in Figure 12. The WT and the cassette template are also visible at 1 and 1.4 kbp, respectively. Additionally, an 800 bp byproduct was formed by unknown reason. The marker lane (M) indicates control fragments at defined bp length. Chapter three|Part II: Results 227

The desired products were gel purified, sequenced and transformed into the induced, pRED/ET containing target strain. The colonies were analyzed as described before, finally yielding the desired lanes at 2.1 and 1.4 kbp, respectively (Figure 14).

A M B M

Figure 14: Agarose gel with separated and stained cPCR products after pRED/ET transformation for insertion of the rpsL-neoR cassette. All sample lanes yielded proper products at 2.1 kbp for amplification with recA fw and recA rv (A) and at 1.4 kbp for amplification with recA fw and rpsL- neoR rv. (B) The marker lanes (M) indicate control fragments at defined bp length.

The residual sample from a positive cPCR was purified and sequenced for correct nucleotide and insertion arrangement. During the procedure, the pRED/ET plasmid was maintained. Thus, the colony of the positively sequenced sample was grown, the pRED/ET system was induced and transformed with a synthetic 600 bp gene string (GeneArt® string, Life Technologies, GER). This dsDNA fragment encoded the excised nucleotides with the respective substituted mutations and exhibited two homology arms with 120 nt and 220 nt length up- and downstream of the insertion site, respectively. The transformed cells were counter-selected on agar plates, containing streptomycin to screen for the deletion of the rpsL wild-type gene. Analogously to the procedure before, single colonies were streaked on agar plates containing kanamycin and negative clones were used for cPCR in the same procedure as described before. This time, a 1 kbp band of recA718 with recA fw and recA rv and no product with recA fw and rpsL-neoR rv were expected (Figure 15). Excessive cPCR product after amplification with recA fw and recA rv (with 1 kbp length on the agarose gel) of positive clones was purified and divided in three parts: The first and the second were digested either with BstXI or BtsI. BstXI cuts the wild-type recA gene at the first mutation position; yielding two fragments of 950 and 120 bp. Easier to identify are the fragments from the BtsI digestion, cutting the desired recA718 at the second substitution position. This results in a fragment pattern of two products with 700 and 370 bp length (Figure 16).

Chapter three|Part II: Results 228

1 2 3 4 5 6 7 8 9 M

Figure 15: Agarose gel with separated and stained cPCR products after pRED/ET transformation for deletion of the rpsL-neoR cassette. Lanes 1-5 and 7 yielded proper products at 1 kbp after amplification with recA fw and recA rv. Lanes 6, 8 and 9 showed bands at 2.1 kbp, indicating the inserted cassette. The marker lane (M) represents control fragments at defined bp length.

1 2 M 3 4 5 6

Figure 16: Fragmentation pattern of BtsI and BstXI digested samples. Lane 1 and 2 - BL21 (DE3) negative control, lane 3 and 4 - JS200 positive control, lane 5 and 6 - positive sample from the counter-selected colonies, representing TAF01 (BL21 DE3 recA718). Odd numbered lanes were treated with BtsI and even numbered lanes with BstXI. The marker lane (M) demonstrates control fragments at defined bp length.

cPCR products with correct restriction pattern were sequenced and the resulting colony was cultured and glycerol stocks for long-term storage were prepared. Then TAF01 was transformed again with pRED/ET and subsequently with the rpsL-neoR cassette again. This time, the cassette was equipped with 50 bp homology arms at the 5’- or 3’-end by the corresponding sequences up- and downstream of the substitution site at nucleotide 1631 in polA. After selecting for smS/kmR clones, the cPCR yielded only polA WT. The selection was performed on kanamycin agar plates, derived from either LB medium or M9 minimal medium, as polA knockout mutants were known to grow slowly on minimal media, but rarely on/in rich media.[136] The homology region was prolonged to 80 bp with the same result. Thus, overlap-extension PCR was performed. As the 1.3 kbp of the selection cassette integrates completely into the 2.9 kbp gene, a product of 4.1 kbp was expected. Unfortunately, no product was achieved, either by amplification with OptiTaq, PfuPlus (both Roboklon, PL) or Q5 polymerase (NEB, GER). To overcome this, the amplification of single fragments for subsequent combination by sequential overlap- extension PCRs were generated (Figure 17). Chapter three|Part II: Results 229

When the whole gene assembly was used for pRED/ET combination, the successful recombination was shown above. Thus, the different parts of the gene assembly for polA substitution were amplified successfully, but the assembly remained challenging. Part I and part II (Figure 17) were successfully combined by PCR, but neither the elongation with purified part III, nor the assembly of all fragments in one reaction, nor the combination of part II solely with part III was possible. Gradient PCRs were used for optimization of the annealing temperatures, as well as different amounts of DMSO with all three polymerases, but the full length PCR product of 4.1 kbp for the recombination strategy was never detected.

1

2 3 4

Figure 17: Overview about the fragments for overlap-extension PCR of polA. (1) complete assembled integration DNA. (2) Part I, (3) Part II (selection cassette) and (4) Part III were separately amplified with exclusive oligonucleotides by PCR and combined in additional reactions.

It was decided to clone polA WT into the gene vector pET15b by NdeI and BamHI mediating ampicillin resistance (ampR). Thus, the plasmid served as new template for the further investigations, enabling selection of kmR/ampR clones on successful cassette insertion. Different attempts were tried to enable the insertion of the cassette into pet15b/PolA: At first, the vector and the cassette were amplified with oligonucleotides adding 15-30 bp homology regions to each other. The products were achieved in good yields with Q5 polymerase, subsequently digested with DpnI and purified. The insert was added in access (molar ratio 1:3-10) for the FastCloning procedure, but no clones were achieved, neither on agar plates with ampicillin and kanamycin, nor on agar plates with kanamycin solely. Then, Gibson Assembly®[222-223] master mix and also NEBuilder® HiFi DNA Assembly master mix (both NEB, GER) were used with freshly purified PCR product and again, no colonies on the plates were detected. In contrast, colony growth was observed on agar plates, containing solely ampicillin. Randomly picked colonies did not contain an insert at the respective position, as proven by cPCR. The amount of fragments was increased to decrease wild-type background and thus, the plasmid at the NdeI and BamHI restriction site was amplified with respective homology arms, as well as part I-III (see Figure 17). An additional fragment was obtained, choosing its amplification Chapter three|Part II: Results 230 within the ampicillin gene, ensuring correct assembling of the vector by decreased background. In total, two different reactions with NEBuilder® HiFi DNA assembly master mix were performed after optimizing primer by the NEBuilder® Assembly Tool (v1.10.5, nebuilder.neb.com) with four or five fragments (part I-III and vector amplicon and part I- III, and vector amplicons I and II, separating the gene for ampR (bla). The background of cells on the ampicillin agar plates was reduced 10-fold, but again, no colony growth was observed on amp/km or km plates. As last option, the SLiCE[224] method was applied with 30 and 50 bp homology sequences and the vector construct, amplified from the insertion site within polA. This time, positive clones were also found on kmR plates, but the amplification with insertion-specific oligonucleotides yielded smeary results with hardly identifiable bands. Thus, sample clones were used for sequencing and an insertion into the plasmid sequence was achieved, but the insert was shortened, unable for proper RNA transcription. The origin of the kanamycin resistance could not be identified.

Strain JS200 (DE3)

In parallel to the genome editing procedure, the available JS200 strain was lysogenized with the help of a commercial available kit, using three different phage lysates: λDE3, carrying the DE3 cassette with the T7 DNA-dependent RNA polymerase under the control of the lacUV5 promoter. Additionally, a helper phage for the incision of the cassette into the genome and a selection phage to maintain cell death, if no DE3 insertion was performed, were used. JS200 cells were transfected with the phage mixture as described by the manufacturer, except for the growth temperature, which was chosen to be 30 °C. Surviving cells were screened by cPCR for DE3 insertion, using a suiting primer pair, specifically producing a 1 kbp PCR product by exclusive sequence sites inside the T7 RNA polymerase (GenBank: ACT29923, UniProt: C6ZCU5). After growth on LB agar plates, cPCR was performed and 1 kbp bands were achieved as expected. During the picking process, the colonies were transferred to a fresh plate, collecting the sampled variants. If the cPCR was repeated, no bands were achieved. Neither the 1 kbp product, nor the whole T7 RNA polymerase product at 3 kbp with the corresponding primer, was yielded. This phenomenon was reproducible, whereas BL21 (DE3) positive control and not transfected JS200 negative control produced the expected bands or not, respectively. Then, GFP (cloned in pET11a) was transformed by electroporation into cells, which were initially positive after transfection. The cell suspension was spread on LB agar plates, containing ampicillin and IPTG and was incubated at 30 °C overnight for protein expression test. The respective BL21 (DE3) cells exhibited a green fluorescence, whereas transfected JS200 cells have not. Another reporter plasmid was constructed, by cloning of genomic Bacillus DNA, encoding spectomycin resistance, into pET28a by SLiCE. Thus, pooled colonies from transfected cells were grown in LB medium, electroporated, recovered and divided into two parts: one was grown in LB medium containing IPTG and spectinomycin (0.1 mmol L-1 / Chapter three|Part II: Results 231

50 µg mL-1, respectively) and the other plated on LB agar plates conditioned with IPTG and spectomycin in the same concentrations. Finally, no colony formation or cell growth was obtained, indicating no T7 RNA polymerase-dependent protein expression of the spectinomycin resistance-mediating protein.

Mutagenesis by error-prone DNA polymerase I The DNA-directed DNA polymerase I (PolI) and the construction of an error-prone mutant (epPolI, carrying D424A, I709N and A759R) and its host plasmid pEP were done previously.[5-7] The respective plasmid was obtained from addgene as a gift from Lawrence Loeb (Addgene plasmid # 11722, Figure 18A).

A B

Figure 18: Plasmids pEP and pDFI with the error-prone DNA polymerase I. The lac operator in pEP (A) was replaced with the thermoinducible cI857/λR promoter system (B). The plasmids share a pSCS101 origin of replication, compatible to the ColE1/pBR322 ori of pHEP.

Therein, the transcription of epPolI is regulated by the lac operator, without introducing an endogenic lacI repressor gene copy (as included in pET-plasmids). Thus, the regulation depends on the cell own repression and activation system, claiming a low basal gene expression, sufficient for target gene mutagenesis. Less or no mutations were observed, when IPTG was added for gene induction (Camps, M.: personal communication). However, an appropriate T7 RNA polymerase was never involved in the system and the regulation and induction of epPolI was either randomly or only exhibited at basal amounts, finally also depending on general catabolite repression. As the T7- derived gene expression depends on this inducer, the promoter has to be changed, ensuring sufficient mutation rates, during the active gene expression. Performing the mutagenesis method, it is necessary to shift the temperature to 42 °C. The lambda Chapter three|Part II: Results 232 promoter system exhibits beneficial properties under this condition by promoting gene transcription. The promoter is controlled by the cI857 gene product, acting as thermolabile repressor: The protein expression from the λ promoter is activated at 37- 42 °C and tightly repressed at 30 °C.[205, 214] Also a single mutated promoter variant with tight repression up to 36 °C - while obtaining full induction at 38 °C - was described.[112, 214] Substitution of the lac operator with the cI857/λPR promoter in pEP generated pDFI (Figure 18B). A common shine-dalgarno sequence was used as ribosome binding site (RBS) upstream to the epPolI gene and the nucleotide linker size between the promoter and the ATG start codon was adapted from the pET plasmids. Initially, two promoter variants were genetically constructed, namely TS1 and TS2 differing in the RBS and the linker nucleotides. To measure the promoter strength, a common pET11a vector was chosen as reporter plasmid, with an inserted GFP gene sequence (pET11a/GFP = GFP(T7), Figure 19).

A B

C

D

Figure 19: Schematic representation of the constructed promoter/repressor variants, created for thermosensitive gene expression. The GFP gene becomes induced by the λPR promoter at temperatures above 37 °C. For optimization of the ribosome binding site (RBS), two variants were created (A) and (B), yielding GFP(ts1) and GFP(ts2), respectively. There, a common shine- dalgarno sequence (AGGAGGT) or the RBS from pET28a (GAAGGAG) was integrated. After the recognition of the constitutive gene repression in TS1 and TS2, a single point mutation - inducing the thermolability in the repressor gene (cI857) - was introduced, yielding GFP(ts3) and GFP(ts4). Only the first construct is shown in (C). A promoter mutation was reported for tight repression until 36 °C and activation of gene expression at 38 °C. Thus, the promoter variant was created, yielding (D) GFP(ts5).

Analogously to the planned substitution in pEP, the plasmid sequence of GFP(T7) was amplified with a suitable primer pair, excising the T7 promoter sequence, while adding 15 bp of the homologous sequences to the insertion product. The insertion DNA was amplified separately from a synthetic gene (Life Technologies, GER), exhibiting the cI857 repressor upstream of the λPR promoter. The amplicons were digested and combined by FastCloning, yielding GFP(ts1) with a general shine-dalgarno sequence or GFP(ts2), Chapter three|Part II: Results 233 mimicking the nucleotide sequence between promoter and gene (including the RBS) from pET28a. The correct insertion and orientation was verified by sequencing. Unfortunately, no GFP fluorescence was detected, when the altered plasmids were transformed into BL21 (DE3) and either grown at 42 °C or by the addition of IPTG. Contrarily, GFP(T7) mediated a green fluorescence of the whole cell culture, while induced with IPTG. After revision of the gene sequences, a single point mutation constitutively activating the repressor was detected (T67A in the gene product of cI857). The sequences were corrected by QuikChange® reactions, yielding GFP(ts3) and GFP(ts4). Additionally, a nucleotide exchange within the promoter sequence was performed, responsible for a tighter gene regulation and initiating gene expression at ≥38 °C, while repression at ≤36 °C (GFP(ts5)). The thermosensitive variants were tested in microtiter plate format. BL21 (DE3) trans- formants were grown in MTPs, treated with IPTG and incubated overnight at different temperatures for protein expression. The cells were lysed and the crude extract from the supernatant was used for fluorescence measurements (Figure 20).

A B

Figure 20: Promoter strength analysis by fluorescence measurements. The individual fluorescence signal was normalized by the OD600 values of each cell culture. (A) Normalized signal intensity of promoter constructs GFP(T7), GFP(ts1), GFP(ts3) and GFP(ts4) at certain temperatures with and without addition of IPTG. (B) Expression increase during gene induction.

The activation of GFP(T7) depends on IPTG addition, but the induction of the cI857/λPR-systems were temperature-dependent. The final mutagenesis protocol will include IPTG addition for target gene induction and temperatures from 37-42 °C, modulating the gene expression of epPolI.

The GFP(T7) construct yielded the highest fluorescence signal at the usual expression conditions (IPTG addition and 20-30 °C protein expression temperature), but due to the higher basal expression, the expression increase rate compared to non-induced cultures was higher for the cI857/λPR system. According to the strong repression, cI857/λPR showed a low basal expression, and thus a higher activation rate. At the final targeted Chapter three|Part II: Results 234 reaction conditions, the lambda promoter is highly advantageous. As expected, the first variant (ts1) showed low expression of GFP due to permanently active repressor mutant cI857 T67A. At an expression at 30 °C the repressor is active and GFP is not expressed, independently of IPTG. The T7 system is switched on by IPTG addition and the intensity and therefore the linearly dependent GFP content is increased up to 42-fold. This activation decreases at 37 °C and 42 °C to 31- and 6-fold, respectively. In contrast, the induction of the cI857/λPR-system is temperature dependent and the normalized fluorescence signals increases up to 51-fold at 42 °C compared to the GFP expression at 30 °C. Interestingly, the addition of IPTG slightly increases the GFP signal for variant GFP(ts3), whereas the respective value for GFP(ts4) is lower. GFP(ts3) showed the highest GFP signal (4-fold higher than in LacI/T7) during the expected growth parameters at 42 °C and with IPTG addition. However, under the best expression conditions of each promoter system, LacI/T7 produces a 7-fold higher signal, with a related high basal expression (protein production without induction). The other way around, cI857/λPR is 7-fold less leaky and thus, expression control is tighter adjustable.

At 37 °C expression temperature, the cI857/λPR-system is predominantly repressed, shown by the slight increasing protein expression of 1.5-4-fold. It has been reported that a single mutation in the promoter leads to tight repression up to 36 °C and activation at 38-42 °C. The system was mainly used for inducing cell lysis, which has to be strongly regulated to prevent premature cell lysis or gene elimination by the host cell. An adaption of this mutation can also control ep-PolI expression more accurate. The construct yields signals comparable to GFP(ts3). At 37 °C the expression rate is increased 4-fold, compared to GFP(ts3) and GFP(ts4), which were only induced 1.5 to 1.8-fold. However, at 42 °C, the promoter functions as well as the latter promoter variant, increasing the protein signal 26-fold compared to the expression at 30 °C. At least, the total signal at 42 °C was 1.4-fold higher, than using the LacI/T7 system.

Therefore, the gene sequences of the cI857/λPR system from GFP(ts3) and GFP(ts5) were used to create pDFI02 and pDFI03, respectively. Both variants will be repressed at 30 °C, but differ in their expected expression strength at temperatures above 37 °C. Both plasmids - and additionally pEP - were introduced into JS200 cells, which were subsequently transformed with pHEP plasmid to analyze the system for mutagenesis success. Before electroporation, JS200 cells were checked for survival at 37 °C with the spiral agar plate test. As mentioned by Camps et al., no mutations were observed after 8 d continuous growth with pHEP and pEP at 42 °C. Each day, a new culture was set up in fresh LB medium. Moreover, some cultures died back, thus the cultivation temperature was lowered to 39 °C. Again, no mutations were observed for the pEP/pHEP combinations. In case of pDFI02 or pDFI03 with pHEP, cell growth at 42 °C was observed, but after sequencing (with the same strategy as described above), no mutations were detected within the pHEP construct. Moreover, the error-prone DNA polymerase I was mutated, yielding a short and putatively inactive protein fragment of only 13 amino Chapter three|Part II: Results 235 acids. Thus, the temperature was switched to 39 °C and mutagenesis occurs within 200- 900 bp downstream the origin of replication (Table 1 and Figure 21). It has to be mentioned that in certain samples, non-mutated pHEP was observed again, caused by a mutated non-functional epPolI. Table 1: Observed mutations in the continuous mutagenesis with pDFI and pHEP. The numbers correspond to the original base (bold, column) substituted by another base (italic, row). Specific mutations related to leading or lagging strand synthesis are colored in green or blue, respectively (corresponding to Alexander et al.[6]). Frameshift mutants were counted as insertions (Ins) or deletions (Del) of the respective nucleotide.

A C G T Ins Del A - 2.19 1.64 4.92 2.73 1.09 C 1.09 - 3.83 12.02 1.64 0.55 G 13.66 15.85 - 19.13 1.09 1.09 T 2.73 4.37 2.73 - 6.56 1.09

The base substitutions for leading strand synthesis were in total 40.44% and for lagging strand synthesis only 24.04%. Thus, the strand preference of the mutagenic polymerase is mainly aside the leading strand synthesis, as observed previously.[6] The mutagenesis rate increased during the days, when fresh culture was used for the inoculation of the new cell culture. The obtained rate reaches a maximum after four days. Then, mutagenesis was decreased for unknown reason, indicated by a drop of the mutation frequency (Figure 21).

Figure 21: Mutation frequency during the continuous cultivation of JS200 transformed with pDFI and pHEP. After each day, plasmid purification was performed, after inoculation of fresh LB medium. Thus, growing transformants were checked for mutagenesis by sequencing pHEP. In total, roughly 215,000 base pairs were sequenced. Chapter three|Part II 236

Discussion

The establishment of an in vivo evolution system is usually suffered by several drawbacks. As the natural evolution will be mimicked, defined evolutionary selection factors must be applied. Thus, the mutagenesis methodology will become more powerful, if active selection for protein mutants with altered properties will be performed. To use the continuous mutagenesis method in parallel for cellular selection, the commercial T7-system was adapted and integrated in the process. The genome editing of the existing expression strain BL21 (DE3) was challenging and only one mutation (recA718) was successfully introduced in the genome. It was observed that the length of the homology arms is critical for correct insertion. The unwanted recombination to the genomic rpsL locus was not observed by the supplier (personal communication). However, with shorter homology arms, kanamycin resistance was observed, but no insertion at the targeted homology site. As the cPCR did not produce proper bands at one insertion site, it is likely that the dsDNA recombined with a more homologous locus. The hypothesis is underlined by the observations during the investigations on polA. Exclusively the same result was produced: kanamycin resistant colonies with no insertion in-between the target gene. An extension permitted success in case of recA, but not for polA. The cellular regulation and maintenance of PolI must be very finely balanced, as the gene is necessary for growth in rich medium, but only slower growth on M9 minimal plates was observed.[136] However, the gene could not be altered in BL21 (DE3) directly. It could be extracted and subcloned, but the amplification with an insertion cassette was not possible. Furthermore, it has to be stated that not only the in vivo amplification and selection of the assembled fragment was hampered, but also the conjunction of homologous fragments in in vitro reactions were not accessible by molecular cloning techniques. Surprisingly, also the insertion educt (elongated selection cassette) was not accessible in vitro. Either, the secondary structure is very rigid and stable and thus the gene is somehow protected, or the cellular repair mechanisms control and keep the locus under relevant DNA repair. It was mentioned, that constitutive expression of the respective gam gene product (Redγ, inhibitor of RecBCD) may have a toxic effect on recA- cells.[225] Thus, the recA wild type is included in the polycistronic operon of the pRED/ET plasmids under control of the arabinose induced pBAD promotor. In this operon, all necessary genes for the recombination procedure (Redαβγ and recA) are summarized. It may be possible that the constitutive activation in recA718 strains may influence the recombination success, but as kanamycin resistant clones were selected, the recA mediated homology search may be affected. Interestingly, it was previously possible to introduce recA718 and then polA12 in the E. coli genome, thus the order was not changed, but unfortunately no protocol or method section was accessible.[134] Probably, transfection with bacteriophage P1 was performed.[226-227] Finally, the work with subcloned plasmid material failed and no prolonged insertion cassette could be obtained. As the insertion cassette could not Chapter three|Part II: Discussion 237 recombine in the correct place, the strain could not be completed. Another possibility is the direct transfer of the polA12 DNA and recombination with the pRED/ET system, but the screening effort rises dramatically, if no selection marker is used. Furthermore, the only endonuclease cleaving at the targeted mutation site was HpyCH4III with the consensus site ACN/GT. This sequence is present five times in the wild type, but four times in the mutant. The cleavage can be identified by the formation of one 1.3 kbp band in contrast to two small bands at 670 and 620 bp, by removal of the site during base substitution. However, the screening effort will be very high due to the achieved high colony number, the cPCR and further digest and electrophoresis. Additionally, the enzyme amount for one screening round will be too high. An interesting alternative may be the usage of a specific primer, annealing by the mutated nucleotide at the 3’-end. If mutation occurs, binding is possible, but if the wild type is present, the oligonucleotide may anneal, but elongation is not possible, because of the outermost unpaired base. This method will yield false positive variants, but those may be counter-proved by HpyCH4III digestion. Alternatively, the cassette could be introduced not within the gene, but adjacent to its gene locus. The rate of recombination success decreases with every base between the mutation site and the insertion point (personal communication with Dr. Marlen Schmidt, GeneBridges, GER). Another opportunity is the complete deletion of polA from the genome by the cassette, but again, the success is related to recombination of homologous sequences and the rpsL locus may serve as a possible insertion site. The counter-selection system may be very useful for less essential genes and additionally GFP could also serve as a marker, which integration will circumvent the antibiotic screening by a FACS selection: The sequence of the respective gene is 720 bp long, with an adjacent promoter this raises to 750-800 bp. Thus, the integrative DNA will be much smaller and no related sequence of the genome will disturb the recombination. The fluorescence is also visible by eye at daylight, so the FACS is not essential, due to easy identification of growing colonies. Of course, the missing selection procedure increases the screening effort and the surviving colony number, but if dilutions will be prepared, mutants can be detected very fast. However, due to the fact that the genomic integration failed, the DE3 genotype must be transferred to JS200. The phage mediated lysogenization was also prone to several drawbacks. The cellular mutations uvrA155, recA718 and polA12 will of course influence the recombination and integration quality of the desired gene (T7 DNA polymerase under control of lacUV5 promoter). The constitutive SOS response in the strain generates an increase background mutagenesis rate, which can cause reversion of the mutated genes. Additionally, the recA718 mutation was not investigated for its ability to bind single stranded DNA. Thus, the ability of homologous recombination may be negatively influenced and can be a reason for the failed insertion of the pRED/ET mediated gene cassette or the incision of the DE3 cassette. Additionally, the recombinant λ phage uses the maltose transporter (malT/lamB) for adhesion and binding. As the full genotype of JS200 is not properly described, the only hint for λ phage Chapter three|Part II: Discussion 238 resistance was once described.[7] Additionally, the original strain SC18[134] and the precursor JM1 were male strains (F’ genotype) and thus not transfectable with the used T7 control phage.[228] The cPCR showed that phage material exists within the first spread colonies, but get lost after re-inoculation on fresh media. The cPCR procedure simply destroyed the phage particles and the related DNA was amplified. In another cPCR with fresh material, these particles got lost. Though it has to be considered positively that phage interference of potential material derived from λ phage infection can be excluded and the λPR/cI857 system will be influenced only by the host strain. Interestingly, the SC18 strain was formerly described as E. coli B strain derivative, but in contrast most of the K12 derivatives seem to exhibit λ phage resistance.[182, 229] Hence, the annotation of DH5alpha at addgene.com was therefore misleading. The host plasmid pHEP was cloned successfully, the rearrangement was confirmed by sequencing the former site for the origin of replication and also nucleotides downstream the T7 terminator, were the pBR322 ori was introduced. Before the correct orientation was achieved, different plasmids were created. At least pHEP01 and pHEP04 carry the correct orientation for epPolI mutagenesis of a target gene within the MCS. Both differ only in the number of nucleotides, between the ori starting point and the T7-terminator. However, the closest approximation was accomplished in pHEP01. Troll et al. mentioned in the mutagenesis protocols that no continuous mutagenesis was practicable with their system using JS200 as host cells.[5] In contrast, the transformed cells were cultivated overnight at 37 °C and afterwards plasmid isolation was performed followed by transformation and another round of cultivation for mutagenesis. After the first two rounds, mutations were detected. Due to the cultivation temperature, the low copy number and the IPTG-sensitivity of the system, too many wild-type PolI will remain, inhibiting the mutagenesis efficiency, as reported by the authors.[5-7] Furthermore, the discontinuity and the activation of target gene expression were not included to date. The only probable targeted mutagenesis possibility in vivo will be the transcriptional/translational coupling of mutagenesis. For example, an erroneous T7 RNA polymerase was described and would be an opportunity for the mutagenesis of T7- specific genes. The specificity is strongly related and exclusive to the promoter, but the phenotype would not be linked to the genotype, if mutagenesis happens during transcription/translation. Otherwise, the corresponding mRNA must be linked to the protein for subsequent reverse transcription and coding DNA sequencing. This methodology already exists, but linkage to the protein may inhibit its functionality or solubility by changing the target protein properties. However, the pHEP construct in this thesis already includes a lacI gene for controlled T7 RNA polymerase dependent protein expression. As both, the strain modification and the genomic insertion failed, it is also possible to introduce the T7 polymerase by a plasmid. Thus, it is either possible to use a new plasmid or to change an existing one from the developed procedures in this thesis. The introduction of a new plasmid leads to an increased metabolic burden for the target Chapter three|Part II: Discussion 239 cell and furthermore an additional antibiotic must be used for selection. As the origins of replication of the ColE1 and the pSC101 class are currently used, the only compatible class will be the p15A family with e.g. pACYC. Another opportunity is the usage of tandem vectors, already including two promoter sites and independent MCS. This would mean the transfer of either the targeted gene insertion or the whole temperature sensitive system into a completely new vector. However, the copy number of the plasmid is a very important factor, because the balances between the target gene and the epPolI and also between the epPolI to the left genomic PolI are very finely tuned. Beneficial and very advantageous will be the opportunity to create a DE3 strain by easily transferring the additional plasmid into non-DE3 cells. However, during a selection process the metabolic burden has to be kept low or the extrinsic selection pressure must be decreased, otherwise the stress induced mutagenesis will lead to increased mutagenesis and consequent inactivation of the T7 system. However, the copy number of pHEP was kept low, because too many library samples in multiple plasmids (500-600) inside one cell means an overwhelming library size in a small cell culture. The more important fact is that the phenotype-genotype linkage is not as controlled as in low- or mid copy vectors. The genomic integration of the essential T7 polymerase means one gene copy per cell. Thus, the mid-copy pET-plasmids - and also the pHEP derivatives - deliver 20-30 gene copies of the lac repressor and the system is already more leaky than the λPR/cI857 system, as shown in this thesis. Hence, the introduction of the T7 RNA polymerase sequence in pHEP is not possible, because the system will become even more leaky and also increases the metabolic burden to much. Finally, the polymerase may be targeted by low fidelity PolI, if the gene is co-located on the target plasmid for mutagenesis, because the hypothesis of co-polymerization of epPolI and PolIII is not clarified in detail. The other opportunity and the compromise between metabolic burden, copy number and gene stability will be the introduction of the T7-polymerase into the pDFI-vector. The copy number is much lower, the epPolI tightly repressed at 30 °C and thus, the leakiness will not be problematic. Additionally, the cell has no reason to hypermutate the plasmid under non-restrictive conditions (lower temperature), while performing gene expression. Such, the IPTG-dependent expression induction will be kept. However, introduction of the gene can be easily performed behind the epPolI by extraction of the corresponding sequence from BL21 (DE3) and subsequent applying SLiCE after plasmid cleavage with PvuI. The appropriate selection can be performed with the spectomycin plasmid pET28a/Spc as both are compatible and the inducible resistance serves as strong selection marker. The PolI is also responsible for the removal of RNA oligonucleotides during chromosomal replication. During the lagging strand synthesis, Okazaki fragments by discontinuous DNA polymerization appear. It can be assumed, that mutagenesis also occurs during the primer removal and gap filling polymerization of epPolI at elevated temperatures. Also during the experiments a pattern of 450-560 bp downstream the origin of replication Chapter three|Part II: Discussion 240 occurs with massive base substitutions, compared to the general mutagenesis frequency. As these patterns were not present systematically, it may be related to an artifact of unknown reason. The most probably explanation is the Okazaki fragment replacement, also observed by Alexander et al. at the related position.[6] As the base excision repair is disturbed and translesion synthesis also enabled, the whole genome will become slowly mutated, but not as strong as in other in vivo approaches, e.g. by application of low-fidelity PolIII variants. Thus, the system may be suitable for other E. coli strains, which did not exhibit the special genotypic features of uvrA155, recA718 and polA12, because the expression of the epPolI must be set very balanced to enable mutagenesis, while not harming the genomic DNA. As the mutagenesis protocol of Troll et al. recommended growth at 37 °C, a mixture of epPolI and PolI will be present in the cell.[5] As roughly 400 molecules of PolI are active in the cell, the mutagenesis should also be possible in non-polA12 strains, if the λPR/cI857 system becomes fully induced at 42 °C. Another possible regulative drawback may the RecA mediated cleavage of cI857, modifying the leakiness of the used promoter.[77] The expression system was successfully tested for E. coli Top10 (recA1, deficient) and E. coli BL21 (DE3) (recA-WT) and no significant differences were detected. The processing was described to be very slow, thus enough cI857 may be synthesized, than undergoes cleavage by continuous active recA718 gene product.[77] At elevated temperatures, the protein expression is hampered by a higher chance of yielding insoluble protein. The folding and unfolding processes are almost balanced and thus, even if most protein will undergo aggregation, a small fraction may remain soluble for assay or functional analysis.[230] Critically analyzed, the DE3-dependent expression must be adapted to optimize the expression rate, e.g. by lowering the IPTG concentration to less than 0.1 mmol L-1. Additionally, the lowered temperature to 37- 39 °C may also have beneficial effects, even if the differences seem to be not so large. In contrast, at higher temperatures, E. coli starts to synthesize chaperones as heat shock response.[231-234] Assisting in protein folding, the target proteins, as well as other candidates used in the mutagenesis system become stabilized by the induction of the whole machinery, like used by E. coli chaperone strains/plasmids.[235-236] Therefore, a more stabilizing effect at higher temperatures can be assumed, but it must be noted and strongly recommended that for any type of target peptide, the expression rate has to be checked first in a common BL21 (DE3) strain at 39-42 °C to ensure protein stability and to propose suitable inductor concentrations. The trends of the mutagenic enrichment can be also followed by deep mutational scanning.[237] Regarding the mutagenic outcome of possible hit clones, the coupling to a selection process enables new possibilities concerning a targeted - de facto directed - evolution of the desired variant. But advantageous mutants may be missed during the selection process applied in vivo in liquid culture, since also mutants with modest contributions to the fitness will compete and thus prevent the appearance of even more infrequent Chapter three|Part II: Discussion 241 mutational variants with a higher impact on the fitness. This phenomenon is known as clonal interference.[238] Under selection pressure, the bacteria are able to turn into mutator strains by stress induced mutagenesis.[72] The accumulation of defects in the DNA repair mechanisms will be favored, because under strong selection the chance for beneficial mutations is highly desired to ensure evolutionary adaption.[86, 239] On the one hand, the mutagenesis machinery introduced in the host strain will be kept active during the selection time frame and stabilized to finally ensure cell survival/adaption. On the other hand, the emergence of spontaneous mutators is decreased, because the mutator plasmids are not mutation limited and hence can be easily cured by plasmid removal under non-restrictive conditions.[101] The curing of a spontaneous mutator cannot be easily applied, if strain evolution or protein expression with the same strain is desired. The mutation rate of the previously reported pMutD5[101] plasmid enables a frequency of 4.6·10-5. The highly evolved plasmid MP6[126] yielded a mutation rate of 6.24·10‐6 per bp per generation. It is probably very hard to normalize the methodology of mutation rate discrimination. For example the possibility to gain rifampin resistant E. coli depends on roughly 21 possible mutagenesis sites. The mutation frequency for pMutD5 was calculated by the number of resistant clones divided by the total number of plated cells. In contrast, MP6 was calculated by the frequency of rifampin resistant mutants, depending on the number of unique sites for rifampin resistance and the population size normalized by the population density, where first observed mutants were achieved. Additionally, a mutation frequency of 8·10-3 was yielded by comparison of rifampin resistant clones to the total number of colonies without antibiotics. Thus, the rate is two orders of magnitude higher as observed for pMutD5. However, the mutation frequency was about 6.24·10‐6 per bp per generation. The calculated mutation rate in a 1 kb window for epPolI with the temperature inducible system in JS200 was about 8.5·10-4 per bp and is less than the frequency of 2.1·10-3 achieved with optimized conditions and the original system.[7] The generation time (doubling time) of E. coli is under optimal conditions roughly 20 min. Ignoring differences during the lag, log or stationary phases, in 24 h up to 72 generations may be passed with a corresponding total frequency of 4.5·10-4 bp. As the mutations are dispersed all over the genome, the systems fidelity may be compared only by the substitutions per bp, also because in the epPolI system of the Camps lab, pUC plasmids were used with very high copy numbers.[7] In contrast, the number of pHEP is roughly 30 times lower, thus the frequency of replication events is also decreased. Compared to the original mutation frequency, divided by the lower copy number roughly 7-8·10-4 base substitutions are calculated. However, the mutation rate was nearly kept and was stable over a minimum of four days of continuous directed evolution. Then, plasmid isolation and retransformation is strongly recommended, because the mutagenic properties get lost. It may be postulated that laboratory continuous evolution without the combination of selection pressure remains impossible. The selection pressure is a necessary factor of adaptive evolution Chapter three|Part II: Discussion 242 and without this pressure the current state of fitness and adaption is highly favored. Hence, any intracellular mutagenesis system will be reverted to the wild-type, when the cellular survival is not directly dependent on fast adaption. The correlated mutagenesis rates - also combined with the available mutagenesis systems - will be much higher, as stress-induced mutagenesis will also play a significant role in the stressed cell cultures. Indeed, the drawback of revertants can be overcome only with an appropriate selection methodology. The polA deficiency is correlated to sensitivity to radiation and alkylating agents, suiting the epPolI mutagenesis system for the combination of a recently developed selection assay.[33] If necessary, the epPolI system may be applicable also in non-polA12 strains with major concerns about inactivation of the epPolI by stop codon generation within the cellular mutagenesis system. Furthermore, the combination with other mutator plasmids is generally possible, because the mechanism is fundamentally different, but the risk of epPolI inactivation increases with increasing mutator properties of the host cell. Additionally, the application to established mutagenesis systems (pMutD5, MP6) is given and it has to be mentioned, that automatization of the mutagenesis methodology may increase their applicability. The denser the culture grows the sooner mutation frequencies drop. It is strongly recommended to prove the polA12 phenotype by spiral agar plate assay before using the JS200 strain and also to prevent enrichment in stationary phase, which is not easy to apply during continuous systems. Indeed, a versatile, fully automatized robotic platform is capable of time-dependent growth rate measurements and re-inoculation of the culture every 6-12 h to keep the cells in the log phase for more frequent mutagenesis events and more often plasmid replication. Additionally, it is possible to established parallel activity tests, depending on the downstream procedure: Time-dependent intervals for sampling, protein expression and the assay itself require different incubators, but the final processing techniques enable the assay analysis, either in whole cells, crude extracts or also with purified proteins.[32]

Chapter three|Part II 243

Summary

The aim of this chapter was the establishment of a directed mutagenic system, capable of continuous mutagenesis over a defined time frame, without abrupt interruptions for the application to suitable selection systems. Therefore, a low-fidelity mutant of the DNA polymerase I (epPolI)[173, 180] was chosen to target the mutagenesis towards the sequence of interest. During DNA replication, PolI is responsible for Okazaki fragment removal in lagging strand synthesis and ColE1 dependent initiation of plasmid replication (e.g. pBR322 or pMB1 oris). Therein, the PolI specifically replicates the first 200 nt until the switch to PolIII for unidirectional synthesis. As mutagenesis was observed downstream this switch, epPolI was included or partially recruited for lagging and leading strand synthesis of the plasmid.[6] A common pET28a vector was used for rearrangement to yield pHEP with a re-located ori. The orientation and position of the ori downstream the T7 terminator of the MCS of pHEP ensures targeted mutagenesis of up to 3.3 kbp downstream the ori, preserving the complete MCS for cloning procedures.[5] Actual host strains exhibited either no thermolabile variant of genomic PolI (polA12) or the necessarily mutated recA718 allele (BL21 DE3) or were unable for T7 RNA polymerase-dependent gene overexpression (JS200). The introduction of the essential mutations into BL21 (DE3) was only successful for recA718, the experiments with polA and polA12 were more challenging and thus, it was decided to introduce the DE3 genotype into JS200. Because of the phage resistant lamB/malT genotype, lysogenization with the λ phage was no choice. However, a reporter plasmid for DE3 strains (pET28a/Spc) was successfully created and also the mutator plasmid pDFI would be capable of harboring the T7 RNA polymerase under control of the lacUV5 promoter.

Within the constructed pDFI vector, the epPolI was encoded, regulated by the λPR/cI857 promoter system. This was introduced on a ColE1 compatible plasmid (with a pSC101 ori) for tight, temperature-related expression of epPolI. The promoter system was tested in this thesis with GFP as reporter and compared to standard T7 RNA polymerase protein expression. The adapted system shows a 51-fold stronger signal (measured by the GFP fluorescence) under induction conditions as compared to the 42-fold increased signal during T7 induction. Additionally, under experimental conditions, the expression signal was 4-fold higher as detected for the commonly used T7 regulation. Finally, continuous mutagenesis of the targeted sequence by the application of pHEP and pDFI in JS200 was possible up to four days and yielded a mutation frequency of averaged 8.5·10-4 per bp. This is comparable to previously described, discontinuous results.[6] The system may be easily adaptable to other strains by modulation of the growth temperature to tune the epPolI expression and may serve as alternative for the available, non-targeted genome- wide mutator plasmids.[101, 126] The technology transfer to a robotic platform may further increase automatized mutagenesis, while serving the opportunity for a specialized, protein-tailored selection process, which will generally stabilize the mutator properties. Chapter three|Part II 244

Materials and Methods

Oligonucleotides for PCR priming, plasmid and gene sequences were deposited in the Department of Biotechnology and Enzyme Catalysis, Institute for Biochemistry, University of Greifswald. pHEP - the host plasmid for target genes The helper plasmid, hosting the target gene was created by rearrangement of pET28a (Novagen, GER). The part of the multiple cloning site (MCS), the site containing the kanamycin resistance and the origin of replication (ori) were amplified during a standard PCR to insert the ori in-between the MCS and kmR site. Subsequently, the product was digested with 10 U of DpnI for 2 h at 37 °C.[240] The enzyme was inactivated by incubation for 20 min at 80 °C. The crude products were purified (NucleoSpin® Gel and PCR Clean-up, Macherey-Nagel, GER) and co-transformed into competent Top10 cells (Thermo Fisher, GER). Clones were sequenced with oligonucleotides to verify the ori insertion and correct orientation downstream the T7-terminator of pET28a.

TAF02/JS200 (DE3) - genome editing pRED/ET recombination

The gene mutations recA718 and polA12 were introduced into BL21 (DE)3 rpsL (GeneBridges, GER) via the pRED/ET counter selection kit (GeneBridges, GER) following the suppliers manual (Figure 22). [241] An overnight culture of BL21 (DE3) rpsL was harvested (3 min 13,000 g at 4 °C) and washed twice with 1 mL of cold glycerol solution (10% in distilled water). An amount of 1 µL of the supplied pRED/ET plasmid was added to the cells, finally suspended in 50 µL glycerol solution. The mixture was added into pre-chilled 2-mm electroporation cuvettes (BTX Cuvettes Plus, Harvard Apparatus, US) and 2.5 kV were applied for 5 ms (MicroPulser, Biorad, GER). Subsequently, 1 mL of LB medium was added directly after the pulse and the cells were recovered at 30 °C for 70 min (maintenance of the temperature sensitive pRED/ET plasmid takes place only at temperatures below 35 °C). The cells were finally plated on LB agar plates containing 3 µg mL-1 tetracycline. A colony was picked and grown at 30 °C for 2 h until an OD600 of 0.3. Then, 10% arabinose solution (in distilled water) was added to a final concentration of 0.4% to induce the gene expression from the pRED/ET plasmids. The cultures were put to 37 °C for 60 min. Afterwards, the cells were harvested as described, washed twice with 1 mL of glycerol solution and suspended in a final volume of 50 µL. For genetic insertion, 2 µL of the supplied linear rpsL-neoR cassette were added to the cells and the mixture was transferred into a pre-chilled electroporation cuvette and electroporated at 2.5 kV for Chapter three|Part II: Materials and Methods 245

5 ms. Subsequently, 1 mL LB medium was added and the cells were incubated at 37 °C for 70 min for recombination.

A B C

Figure 22: Principle of editing bacterial genomes by pRED/ET mediated counter selection. (A) The pRED/ET plasmid is transformed to bacterial cells, harboring a bacterial artificial chromosome (BAC). (B) The recombination proteins Redα, Redβ and Gam are expressed by arabinose induction and a temperature shift to 37 °C reduces the pRED/ET amount by the temperature-sensitive origin of replication. Subsequently, the insertion cassette, mediating a selection marker (neo for neomycin/kanamycin resistance) and a counter selection marker (CSM, in this case the modulated streptomycin sensitivity), is introduced into the BAC by additional sequences, enabling homologous recombination (hm). (C) The cassette is finally replaced by the target sequence/the mutated gene by pRED/ET recombination and identical homology arms (hm). The edited clones are selected by the CSM (streptomycin resistance). The figure is taken from the supplier’s manual, with kind permission of GeneBridges, GER.[225]

The mixture was spread on LB agar plates containing 3 µg mL-1 tetracycline and 20 µg mL-1 kanamycin and incubated overnight at 30 °C. Sample colonies were picked, grown in tubes containing 100 µL LB (supplemented with 3 µg mL-1 tetracycline and 20 µg mL-1 kanamycin) for 2 h at 30 °C. Then, a toothpick was put into the suspension and adhered cells were firstly streaked on agar plates with 3 µg mL-1 tetracycline and 20 µg mL-1 kanamycin and then on agar plates with 50 µg mL-1 streptomycin and 20 µg mL-1 kanamycin. 300 µL of LB were added to the tubes and grown overnight at 30 °C. The plates were incubated overnight at 37 °C. Positive colonies showed growth on the first, but not on the second agar plates, due to the functional rpsL-neoR cassette, mediating smS and kmR phenotypes. The grown colonies were double checked by cPCR with suitable oligonucleotides (Figure 12, see general method section for cPCR procedure). From the overnight tubes, containing hit verified clones, fresh cultures were inoculated in LB medium, supplemented with 3 µg mL-1 tetracycline and 20 µg mL-1 kanamycin and grown at 30 °C until an OD600 of 0.3 (2-3 h). Again, 10% arabinose solution (0.4% final concentration) was added for pRED/ET induction and the cells were shaken at 37 °C for 1 h. Afterwards, they were harvested, washed twice with cold glycerol solution and suspended in a final volume of 50 µL. 100-200 ng of the counter-selection DNA fragment Chapter three|Part II: Materials and Methods 246 were added and the mixture was transferred to a pre-chilled electroporation cuvette. The cells were pulsed at 2.5 kV for 5 ms, LB medium was added directly after the pulse and recombination was enabled at 37 °C for 70 min. Finally, the cells were spread on LB agar plates containing 50 µg mL-1 streptomycin at 37 °C overnight to get rid of the pRED/ET plasmid. Sample colonies were picked and analyzed by cPCR with suitable oligonucleotides. The colonies were also checked for kmS phenotype by streaking the picked colony after transfer to the PCR mixture first on agar plates with 50 µg mL-1 streptomycin and second on agar plates containing 20 µg mL-1 kanamycin. The counter- selection DNA fragment was either ordered as gene string (Life Technologies, GER) or derived from the target mutant JS200: The genomic DNA was isolated by the DNeasy Blood & Tissue Kit (Qiagen, GER) corresponding to the supplier’s instruction. The total DNA extract served as template for a PCR, extracting the polA or recA locus. The PCR product was purified and eluted with distilled water, sequenced and directly used for the counter selection transformation. The insertion cassette was amplified by general PCR methods. The polA gene, isolated from BL21 (DE3) was cloned in pET15b by NdeI/BamHI restriction sites. The amplifications of the vector, all insertion sequences, the fragments for FastCloning[217], SliCE[224], Gibson assembly[223] or NEBuilder reactions were performed with suitable oligonucleotides, priming the reaction, while adding homologous parts of defined lengths to the fragments, depending on the procedure. However, general PCR methods were used for amplicons. The SLiCE method uses E. coli lysate to recombine homologue parts of genetic material (analogous to FastCloning, but during SLiCE process the overlapping sequences were partially digested).[224] Alternative recombineering cloning methods were described with more defined enzyme compositions, but SLiCE enables a fast and cheap process.[242-243] The SLiCE extract was prepared from E. coli Top10 cells as described before[224] and was a kind gift of the working group “DropIn Biofuels”, Institute for Biochemistry, Greifswald University. The SLiCE fragments were incubated for 1 h at 37 °C for restriction and ligation procedures, mediated by the lysate’s nucleases, polymerases and . The overlap-extension PCR includes an altered PCR temperature profile, changing the targeted amplicons during the whole reaction. First, the amplification of recombining fragments by the shared homologies was performed. Second, oligonucleotides were added for amplification of the whole target gene (Table 2).

Chapter three|Part II: Materials and Methods 247

Table 2: Amplification protocol for the overlap-extension PCRs. The oligonucleotides were added 10-fold concentrated to keep yield loss by volume changes as small as possible. The program was also run as gradient PCR to optimize annealing temperatures.

Step Cycle Temperature [°C] Time [s] Description 1 1 95 30-60 initial denaturation 2 95 20 denaturation 3 15 60-65 30 fragment annealing 4 72 60 per kbp fragment elongation 5 - isothermal - primer addition 6 95 20 denaturation 7 30 50-60 30 primer annealing 8 72 60 per kbp whole gene elongation 9 1 72 300-600 final elongation

Lysogenization of JS200

The lysogenization of JS200 was performed after the manufacturers instruction (λDE3 Lysogenization Kit - Novagen, Merck Millipore, GER), except the incubation and growth condition were set up at 30 °C to prevent growth inhibitions by the polA12 mutation. With kind advice, pET28a/Spc was created with the help of the working group “DropIn Biofuels”, Institute for Biochemistry, Greifswald University. The spc gene was cloned into pET28a by specific amplification with oligonucleotides from the genome of the streptomycin resistant Bacillus strain FLN013n#2vI. Thus, overhangs were added, permitting homologous recombination with the pET28a vector, previously processed by cleavage with the restriction endonucleases NdeI and XhoI (Thermo Scientific, GER). After addition of the provided SLiCE extract the samples were incubated in T4 DNA ligase buffer (Thermo Scientific, GER) for 1 h at 37 °C. The product was electroporated into competent BL21 (DE3) Gold cells for direct antibiotic selection on conditioned LB agar plates (0.1 mmol L-1 IPTG, 50 µg mL-1 kanamycin and 50 µg mL-1 spectinomycin). pDFI - the error-prone plasmid

The mutagenesis plasmid pDFI should carry the epPolI under control of cI857/λPR. A synthetic gene string was ordered, harboring the cI857 repressor in opposite orientation 1-3 as λPR. Followed by λPR, including the operator regions OR as repressor binding sites, a linker leads to the RBS (copied from pET28a or an artificial sequence). Downstream the RBS another linker of 5-10 bp was introduced until the start codon ATG of the GFP sequence is reached. Thus, the T7-operator from pEP was substituted until the ATG starting codon by FastCloning.[217] The common RBS exhibited the sequence AGGAGGT. In parallel this gene string was introduced in pET11a/GFP, by substituting the T7- promoter. The lacI gene remained, because the repressor was now useless for protein Chapter three|Part II: Materials and Methods 248 expression. The product was named GFP(ts1). Another approach used the linker sequence with the RBS from pET28a (GAAGGAG) and was introduced between the λPR promoter and the start codon by substituting the sequences from GFP(ts1) by QuikChange® (see General Methods section).[244] To calculate the new expression level and compare it relative to LacI/T7 both vectors (pET11a/GFP and pET11a/GFP(ts1)) were transformed in BL21 (DE3) and cultivation at 30 °C or 42 °C and with or without IPTG addition was recorded. After overnight expression, a 7 OD-1 sample was lysed by sonication (three times 50% cycle, 60% power) and 200 µL of centrifuged supernatant were measured for fluorescence (excitation 475 nm and emission 509 nm at an Infinite® 200 Pro, Tecan, CH). Since no significant signal was detected for GFP(ts1), the whole sequence from the lac operator to the start codon of GFP were adapted from pET28a, resulting in GFP(ts2) with no significant fluorescence signal again. After recognizing the mutation T67A in cI857, the sequence was corrected via QuikChange® for both promoter variants, yielding GFP(ts3) and GFP(ts4), respectively. Finally, a fluorescence signal was detected and the approach was transferred to MTPs for parallel measurements. Freshly transformed colonies were picked in microtiter plates with 180 µL LB broth medium, -1 supplemented with 100 µg mL ampicillin. After 3-4 h growth at 37 °C, (OD600 approximately 0.4-0.6) to the half of the plate 20 µL IPTG in LB medium (final concentration 0.1 mmol L-1) or 20 µL LB medium were added. The plates were put to 30

°C, 37 °C or 42 °C. After overnight expression, the OD600 was measured to normalize GFP fluorescence signals by the cell density. Thus, the cultures were harvested (3000 g, 20 min, 4 °C) and washed twice with sodium phosphate buffer (20 mmol L-1, pH 7.5). Finally, the cells were lysed with a BugBuster (1x)/lysozyme (1 mg mL-1) mixture for 1 h at 37 °C. After centrifugation, 20 µL of the supernatant were transferred to 180 µL buffer and the fluorescence was measured. The last promoter variant was GFP(ts5), developed from GFP(ts3), while substituting one 2 nucleotide in the operator region OR of the λPR promoter. The substitution was performed by QuikChange®. As all promoter variants were tested for protein expression, the promoter sequences of pEP were substituted, yielding pDFI02 and pDFI03, which were derived from GFP(ts3) and GFP(ts5), respectively. Therefore, the whole construct from the stop codon of the cI857 repressor to the ATG start codon was amplified in a PCR, using suitable oligonucleotides, which introduced homology sites to the DNA construct, enabling the insertion into pEP by FastCloning. The final plasmids were sequenced to verify the gene positions and orientations.

Continuous mutagenesis Initially, all JS200 stocks or cultures not harboring the mutation plasmids were checked for growth at 30 °C or 37 °C overnight on LB agar plates by spiral streaking the cell suspension with inoculation loops on the plates as described before.[128] Thus, the Chapter three|Part II: Materials and Methods 249 polA12 genotype was verified in a quick test. The mutation plasmids pDFI02 and pDFI03 were transformed by electroporation to washed JS200 cells in 10% glycerol solution. After selection of chloramphenicol (final concentration 30 μg ml-1) resistant clones, colonies were picked and grown for 3-4 h at 30 °C in 5 mL LB medium (30 μg ml-1 chloramphenicol). For the preparation of electrocompetent JS200 cells, carrying the respective mutagenesis plasmid pDFI02 or pDFI03, the whole volume was used to inoculate a 500 mL LB culture (30 μg ml-1 chloramphenicol), which was further grown at

30 °C until an OD600 of 0.4. Then, cells were pre-cooled for 20 min on ice and subsequently harvested by centrifugation (4500 g, 40 min, 4 °C) and washed with 500 mL cold glycerol solution (10% in distilled water, sterile). By the same procedure, the cell pellet was washed and centrifuged with 250 mL, 20 mL and 1.5 mL glycerol solution. Resuspended in the final volume, 50 µL aliquots were deep-frozen in liquid nitrogen and stored at -80 °C. The electrocompetent JS200 pDFI02/pDFI03 cells were used for transformation with the host vector, including the subcloned target gene. Electroporation was performed at 2.5 kV for 5 ms in pre-chilled 2-mm cuvettes as mentioned above. Then, 600 µL of LB medium were added and the cells were regenerated for 60 min at 30 °C, before spreading on LB agar plates supplemented with 30 μg ml-1 chloramphenicol and 50 μg ml-1 kanamycin. The plates were left for growth overnight at 30 °C. Freshly transformed colonies were picked and grown into 1 mL LB medium supplemented with chloramphenicol and kanamycin in a 2 mL tube. The lid was punctured manually by a cannula for air supply. For growth, temperatures of 39 °C or 42 °C were chosen and after 24 h, a fresh 1 mL culture was inoculated with 1 μL of the previous cell culture. The residual cell suspension underwent plasmid isolation, retransformation in Top10 cells and sequencing for mutagenesis rate determination.

Chapter three|Part II 250

Abbreviations aa amino acid nt nucleotide(s) Amp ampicillin ori origin of replication bp base pairs PACE phage-assisted Cm chloramphenicol continuous evolution cI857 temperature sensitive (c/g/ep)PCR polymerase chain

repressor of the λPR reaction (c - colony, ep - promoter error-prone, g - gradient) dsDNA double-stranded DNA pDFI pEP derivative with error E. coli Escherichia coli prone DNA polymerase I

EDTA ethylenediamine- and cI857/λPR (decreased tetraacetic acid in fidelity) ep error-prone pEP plasmid with error-prone FACS fluorescence-activated DNA polymerase I cell sorting pHEP pET28a derivative as host fw forward (oligonucleotide for error-prone muta- priming the 5’-end genesis nucleotides of a gene) PolI DNA-directed DNA GFP green-fluorescing protein polymerase I GOI gene of interest (target RBS ribosome binding site gene for mutagenesis) RID random insertion/ HTS high-throughput deletion screening Rop repressor of primer IPTG isopropyl β-D-1- Rom RNAI modulating protein thiogalactopyranoside rv reverse (oligonucleotide ITCHY incremental truncation priming the 3’-end of for the creation of hybrid nucleotides of a gene) enzymes SeSaM sequence saturation Km kanamycin mutagenesis LacI repressor of the T7 SLiCE seamless ligation cloning promoter in absence of extract lactose or IPTG Sm streptomycin MAGIC mating-assisted Spc spectinomycin genetically integrated ssDNA single-stranded DNA cloning Tet tetracycline MTP microtiter plate WT wild type MW MegaWhoP (megaprimer PCR of whole plasmid)

Chapter three|Part II 251

References L. M. Amzel, Arch. Biochem. Biophys. [1] N. E. Labrou, Curr. Protein Pept. Sci. 2005, 433, 129-143. 2010, 11, 91-100. [18] A. G. McLennan, Cell. Mol. Life Sci. [2] J. C. Carlson, A. H. Badran, D. A. 2005, 63, 123-143. Guggiana-Nilo, D. R. Liu, Nat. Chem. Biol. 2014, 10, 216-222. [19] J. A. Gerlt, P. C. Babbitt, Annu. Rev. Biochem. 2001, 70, 209-246. [3] K. M. Esvelt, J. C. Carlson, D. R. Liu, Nature 2011, 472, 499-503. [20] J. G. van Leeuwen, H. J. Wijma, R. J. Floor, J. M. van der Laan, D. B. Janssen, [4] M. Camps, Recent Pat. DNA Gene Seq. ChemBioChem 2012, 13, 137-148. 2010, 4, 58-73. [21] O. Khersonsky, D. S. Tawfik, Annu. Rev. [5] C. Troll, D. Alexander, J. Allen, J. Biochem. 2010, 79, 471-505. Marquette, M. Camps, J. Vis. Exp. 2011, e2505. [22] O. Khersonsky, C. Roodveldt, D. S. Tawfik, Curr. Opin. Chem. Biol. 2006, 10, [6] D. L. Alexander, J. Lilly, J. Hernandez, J. 498-508. Romsdahl, C. J. Troll, M. Camps, Methods Mol. Biol. 2014, 1179, 31-44. [23] R. A. Jensen, Ann. Rev. Microbiol. 1976, 30, 409-425. [7] M. Camps, J. Naukkarinen, B. P. Johnson, L. A. Loeb, Proc. Natl. Acad. [24] C. Schmidt-Dannert, F. H. Arnold, Sci. USA 2003, 100, 9727-9732. Trends Biotechnol. 1999, 17, 135-136. [8] E. T. Farinas, T. Bulter, F. H. Arnold, [25] L. M. Mayr, D. Bojanic, Curr. Opin. Curr. Opin. Biotechnol. 2001, 12, 545- Pharmacol. 2009, 9, 580-588. 551. [26] R. Ostafe, R. Prodanovic, J. Nazor, R. [9] P. D. Carr, D. L. Ollis, Protein Pept. Lett. Fischer, Chem. Biol. 2014, 21, 414-421. 2009, 16, 1137-1148. [27] R. Ostafe, R. Prodanovic, U. [10] D. L. Ollis, E. Cheah, M. Cygler, B. Commandeur, R. Fischer, Anal. Dijkstra, F. Frolow, S. M. Franken, M. Biochem. 2013, 435, 93-98. Harel, S. J. Remington, I. Silman, J. [28] E. Fernandez-Alvaro, R. Snajdrova, H. Schrag, et al., Protein Eng. 1992, 5, 197- Jochens, T. Davids, D. Bottcher, U. T. 211. Bornscheuer, Angew. Chem. Int. Ed. [11] S. T. Rao, M. G. Rossmann, J. Mol. Biol. 2011, 50, 8584-8587. 1973, 76, 241-256. [29] K. H. Bleicher, H.-J. Bohm, K. Muller, A. [12] P. C. Babbitt, J. A. Gerlt, J. Biol. Chem. I. Alanine, Nat. Rev. Drug. Discov. 2003, 1997, 272, 30591-30594. 2, 369-378. [13] M. E. Glasner, J. A. Gerlt, P. C. Babbitt, [30] J. P. Overington, B. Al-Lazikani, A. L. Curr. Opin. Chem. Biol. 2006, 10, 492- Hopkins, Nat. Rev. Drug. Discov. 2006, 497. 5, 993-996. [14] A. Gerlt, P. C. Babbitt, I. Rayment, Arch. [31] A. Musenga, D. A. Cowan, J. Chrom. A Biochem. Biophys. 2005, 433, 59-70. 2013, 1288, 82-95. [15] C. M. Seibert, F. M. Raushel, [32] M. Dörr, M. P. C. Fibinger, D. Last, S. Biochemistry 2005, 44, 6383-6391. Schmidt, J. Santos-Aberturas, C. Vickers, M. Voss, U. T. Bornscheuer, Biotechnol. [16] H. M. Holden, M. M. Benning, T. Haller, Bioeng. 2016. J. A. Gerlt, Acc. Chem. Res. 2001, 34, 145-157. [33] M. P. C. Fibinger, T. Davids, D. Bottcher, U. T. Bornscheuer, Appl. Microbiol. [17] A. S. Mildvan, Z. Xia, H. F. Azurmendi, V. Biotechnol. 2015, 99, 8955-8962. Saraswat, P. M. Legler, M. A. Massiah, S. B. Gabelli, M. A. Bianchet, L. W. Kang, [34] H. Kawasaki, H. Kuriyama, K. Tonomura, Biodegradation 1995, 6, 213-216. Chapter three|Part II: References 252

[35] B. A. Bridges, M. W. Southworth, E. Orr, [53] D. S. Wilson, A. D. Keefe, Curr. Protoc. Mut. Res. 1983, 112, 3-16. Mol. Biol. 2001, Chapter 8, Unit8 3. [36] B. A. Bridges, R. Woodgate, Brit. J. [54] F. H. Arnold, A. A. Volkov, Curr. Opin. Cancer 1985, 51, 606-606. Chem. Biol. 1999, 3, 54-59. [37] Z. Chen, H. Zhao, Nucleic Acids Res. [55] A. Crameri, S. A. Raillard, E. Bermudez, 2005, 33, e154. W. P. Stemmer, Nature 1998, 391, 288- 291. [38] M. Gruen, K. Chang, I. Serbanescu, D. R. Liu, Nucleic Acids Res. 2002, 30, e29. [56] W. P. Stemmer, Proc. Natl. Acad. Sci. USA 1994, 91, 10747-10751. [39] S. N. Ho, H. D. Hunt, R. M. Horton, J. K. Pullen, L. R. Pease, Gene 1989, 77, 51- [57] H. M. Zhao, F. H. Arnold, Nucleic Acids 59. Res. 1997, 25, 1307-1308. [40] R. M. Horton, in PCR Protocols: Current [58] D. Tischler, P. Zwicker, R. Schwabe, J. A. Methods and Applications (Ed.: B. A. D. Gröning, S. R. Kaschabek, M. White), Humana Press, Totowa, NJ, Schlömann, Am. J. Curr. Mirobiol. 2013, 1993, pp. 251-261. 1, 14-21. [41] S. Park, K. L. Morley, G. P. Horsman, M. [59] M. Ostermeier, A. E. Nixon, S. J. Holmquist, K. Hult, R. J. Kazlauskas, Benkovic, Bioorg. Med. Chem. 1999, 7, Chem. Biol. 2005, 12, 45-54. 2139-2144. [42] M. Wilming, A. Iffland, P. Tafelmeyer, C. [60] T. S. Wong, K. L. Tee, B. Hauer, U. Arrivoli, C. Saudan, K. Johnsson, Schwaneberg, Nucleic Acids Res. 2004, ChemBioChem 2002, 3, 1097-1104. 32, e26-e26. [43] C. S. Gibbs, M. J. Zoller, Biochemistry [61] H. Murakami, T. Hohsaka, M. Sisido, 1991, 30, 5329-5334. Nat. Biotechnol. 2002, 20, 76-81. [44] B. C. Cunningham, J. A. Wells, Science [62] M. Z. Li, S. J. Elledge, Nat. Genet. 2005, 1989, 244, 1081-1085. 37, 311-319. [45] M. T. Reetz, M. Bocola, J. D. Carballeira, [63] S. B. Rubin-Pitel, C. M. H. Cho, W. Chen, D. Zha, A. Vogel, Angew. Chem. Int. Ed. H. Zhao, in Bioprocessing for Value- 2005, 44, 4192-4196. Added Products from Renewable Resources (Ed.: S.-T. Yang), Elsevier, [46] M. T. Reetz, J. D. Carballeira, J. Amsterdam, 2007, pp. 49-72. Peyralans, H. Hobenreich, A. Maichele, A. Vogel, Chemistry 2006, 12, 6031- [64] C. Bokel, Methods Mol. Biol. 2008, 420, 6038. 119-138. [47] M. Landwehr, M. Carbone, C. R. Otey, Y. [65] R. Haerlin, R. Sussmuth, F. Lingens, Li, F. H. Arnold, Chem. Biol. 2007, 14, FEBS Lett. 1970, 9, 175-176. 269-278. [66] H. Nishioka, Mutation Res. 1975, 31, [48] Y. Li, D. A. Drummond, A. M. 185-189. Sawayama, C. D. Snow, J. D. Bloom, F. [67] W. Szybalski, Ann. N. Y. Acad. Sci. 1958, H. Arnold, Nat. Biotechnol. 2007, 25, 76, 475-489. 1051-1056. [68] M. H. L. Green, W. J. Muriel, Mutation [49] R. J. Pantazes, M. C. Saraf, C. D. Res. 1976, 38, 3-32. Maranas, Protein Eng. Des. Sel. 2007, 20, 361-373. [69] H. J. Muller, Science 1927, 66, 84-87. [50] L. A. Weymouth, L. A. Loeb, Proc. Natl. [70] E. M. Witkin, Bacteriol. Rev. 1976, 40, Acad. Sci. USA 1978, 75, 1924-1928. 869-907. [51] R. C. Cadwell, G. F. Joyce, PCR Methods. [71] P. L. Foster, Crit. Rev. Biochem. Mol. Appl. 1992, 2, 28-33. Biol. 2007, 42, 373-397. [52] Y. H. Zhou, X. P. Zhang, R. H. Ebright, [72] I. Bjedov, O. Tenaillon, B. Gérard, V. Nucleic Acids Res. 1991, 19, 6052. Souza, E. Denamur, M. Radman, F. Chapter three|Part II: References 253

Taddei, I. Matic, Science 2003, 300, [92] G. J. McKenzie, P. L. Lee, M. J. 1404-1409. Lombardo, P. J. Hastings, S. M. Rosenberg, Mol. Cell 2001, 7, 571-579. [73] J. R. Roth, E. Kugelberg, A. B. Reams, E. Kofoid, D. I. Andersson, Annu. Rev. [93] M. Banach-Orlowska, I. J. Fijalkowska, Microbiol. 2006, 60, 477-501. R. M. Schaaper, P. Jonczyk, Mol. Microbiol. 2005, 58, 61-70. [74] A. Sancar, C. Stachelek, W. Konigsberg, W. D. Rupp, Proc. Natl. Acad. Sci. USA [94] M. F. Goodman, Annu. Rev. Biochem. 1980, 77, 2611-2615. 2002, 71, 17-50. [75] Y. Savir, T. Tlusty, Mol. Cell 2010, 40, [95] W. Kuban, M. Banach-Orlowska, R. M. 388-396. Schaaper, P. Jonczyk, I. J. Fijalkowska, J. Bacteriol. 2006, 188, 7977-7980. [76] I. De Vlaminck, M. T. J. van Loenhout, L. Zweifel, J. den Blanken, K. Hooning, S. [96] R. Napolitano, R. Janel-Bintz, J. Wagner, Hage, J. Kerssemakers, C. Dekker, Mol. R. P. P. Fuchs, EMBO J. 2000, 19, 6259- Cell 2012, 46, 616-624. 6265. [77] J. W. Little, Proc. Natl. Acad. Sci. USA [97] J. W. Drake, Proc. Natl. Acad. Sci. USA 1984, 81, 1375-1379. 1991, 88, 7160-7164. [78] H. Shinagawa, Exs 1996, 77, 221-235. [98] J. E. LeClerc, B. Li, W. L. Payne, T. A. Cebula, Science 1996, 274, 1208-1211. [79] G. C. Walker, Trends Biochem. Sci. 1995, 20, 416-420. [99] F. Taddei, M. Radman, J. Maynard- Smith, B. Toupance, P. H. Gouyon, B. [80] O. Gillor, J. A. C. Vriezen, M. A. Riley, Godelle, Nature 1997, 387, 700-702. Microbiology 2008, 154, 1783-1792. [100] A. W. Nguyen, P. S. Daugherty, Methods [81] I. Erill, M. Escribano, S. Campoy, J. Mol. Biol. 2003, 231, 39-44. Barbe, Bioinformatics 2003, 19, 2225- 2236. [101] O. Selifonova, F. Valle, V. Schellenberger, Appl. Environ. [82] L. Grossman, A. T. Yeung, Mut. Res. Microbiol. 2001, 67, 3645-3649. 1990, 236, 213-221. [102] U. T. Bornscheuer, J. Altenbuchner, H. [83] N. Nag, B. J. Rao, G. Krishnamoorthy, J. H. Meyer, Biotechnol. Bioeng. 1998, 58, Mol. Biol. 2007, 374, 39-53. 554-559. [84] S. Acharya, P. L. Foster, P. Brooks, R. [103] G. Coia, P. J. Hudson, R. A. Irving, J. Fishel, Mol. Cell 2003, 12, 233-246. Immunol. Methods 2001, 251, 187-193. [85] C. Ban, W. Yang, EMBO J. 1998, 17, [104] E. Henke, U. T. Bornscheuer, Biol. 1526-1534. Chem. 1999, 380, 1029-1033. [86] P. D. Sniegowski, P. J. Gerrish, R. E. [105] X. Lu, H. Hirata, Y. Yamaji, M. Ugaki, S. Lenski, Nature 1997, 387, 703-705. Namba, J. Virol. Methods 2001, 94, 37- [87] B. Tippin, P. Pham, M. F. Goodman, 43. Trends Microbiol. 2004, 12, 288-295. [106] L. Chao, E. C. Cox, Evolution 1983, 37, [88] B. T. Smith, G. C. Walker, Genetics 1998, 125-134. 148, 1599-1610. [107] W. Tröbner, R. Piechocki, Mol. Gen. [89] M. Patel, Q. Jiang, R. Woodgate, M. M. Genet. 1984, 198, 175-176. Cox, M. F. Goodman, Crit. Rev. [108] M. D. Gross, E. C. Siegel, Mutat. Res. Biochem. Mol. Biol. 2010, 45, 171-184. 1981, 91, 107-110. [90] M. D. Sutton, G. C. Walker, Proc. Natl. [109] H. Liao, T. McKenzie, R. Hageman, Proc. Acad. Sci. USA 2001, 98, 8342-8349. Natl. Acad. Sci. USA 1986, 83, 576-580. [91] B. Yeiser, E. D. Pepper, M. F. Goodman, [110] G. Coia, A. Ayres, G. G. Lilley, P. J. S. E. Finkel, Proc. Natl. Acad. Sci. USA Hudson, R. A. Irving, Gene 1997, 201, 2002, 99, 8737-8741. 203-209. Chapter three|Part II: References 254

[111] U. T. Bornscheuer, J. Altenbuchner, H. [128] M. Camps, L. A. Loeb, in Directed H. Meyer, Bioorg. Med. Chem. 1999, 7, Enzyme Evolution (Eds.: F. H. Arnold, G. 2169-2173. Georgiou), Humana Press, USA, 2003, pp. 11-18. [112] W. Jechlinger, J. Glocker, W. Haidinger, A. Matis, M. P. Szostak, W. Lubitz, J. [129] N. D. Grindley, W. S. Kelley, Mol. Gen. Biotechnol. 2005, 116, 11-20. Genet. 1976, 143, 311-318. [113] K. Takano, Y. Nakabeppu, H. Maki, T. [130] E. B. Konrad, I. R. Lehman, Proc. Natl. Horiuchi, M. Sekiguchi, Mol. Gen. Acad. Sci. USA 1974, 71, 2048-2051. Genet. 1986, 205, 9-13. [131] W. S. Kelley, Genetics 1980, 95, 15-38. [114] A. Greener, M. Callahan, Strategies [132] Y. Nagata, K. Mashimo, M. Kawata, K. 1994, 7, 32-34. Yamamoto, Genetics 2002, 160, 13-23. [115] A. Greener, M. Callahan, B. Jerpseth, in [133] S. Fukushima, M. Itaya, H. Kato, N. In Vitro Mutagenesis Protocols (Ed.: M. Ogasawara, H. Yoshikawa, J. Bacteriol. K. Trower), Humana Press, Totowa, NJ, 2007, 189, 8575-8583. 1996, pp. 375-385. [134] E. M. Witkin, V. Roegner-Maniscalco, J. [116] R. G. Fowler, R. M. Schaaper, FEMS Bacteriol. 1992, 174, 4166-4168. Microbiol. Rev. 1997, 21, 43-54. [135] J. B. Sweasy, L. A. Loeb, J. Biol. Chem. [117] H. Echols, C. Lu, P. M. Burgers, Proc. 1992, 267, 1407-1410. Natl. Acad. Sci. USA 1983, 80, 2189- 2192. [136] C. M. Joyce, N. D. Grindley, J. Bacteriol. 1984, 158, 636-643. [118] F. Martinez-Morales, A. C. Borges, K. Martinez, K. T. Shanmugam, L. O. [137] B. E. Funnell, T. A. Baker, A. Kornberg, J. Ingram, J. Bacteriol. 1999, 181, 7143- Biol. Chem. 1986, 261, 5616-5624. 7148. [138] Y. Cao, T. Kogoma, J. Bacteriol. 1993, [119] J. T. Heap, M. Ehsaan, C. M. Cooksley, Y. 175, 7254-7259. K. Ng, S. T. Cartman, K. Winzer, N. P. [139] S. Banerjee, H. Y. Kim, V. N. Iyer, Minton, Nucleic Acids Res. 2012, 40, Plasmid 1996, 35, 58-61. e59. [140] G. B. Smirnov, A. S. Saenko, J. Bacteriol. [120] P. Balbas, G. Gosset, Mol. Biotechnol. 1974, 119, 1-8. 2001, 19, 1-12. [141] M. Monk, J. Kinross, J. Bacteriol. 1972, [121] J. E. Ambler, R. J. Pinney, Mutati. Res. 109, 971-978. 1996, 351, 181-186. [142] J. D. Gross, Grunstei.J, E. M. Witkin, J. [122] J. E. Ambler, R. J. Pinney, J. Antimicrob. Mol. Biol. 1971, 58, 631-634. Chemother. 1995, 35, 603-609. [143] I. Fijalkowska, P. Jonczyk, Z. Ciesla, [123] J. E. Ambler, Y. J. Drabu, P. H. Mutat. Res. 1989, 217, 117-122. Blakemore, R. J. Pinney, J. Antimicrob. Chemother. 1993, 31, 831-839. [144] P. R. Caron, S. R. Kushner, L. Grossman, Proc. Natl. Acad. Sci. USA 1985, 82, [124] K. E. Mortelmans, B. A. D. Stocker, Mol. 4925-4929. Gen. Genet. 1979, 167, 317-327. [145] I. Husain, B. Vanhouten, D. C. Thomas, [125] O. Selifonova, V. Schellenberger, in M. Abdelmonem, A. Sancar, Proc. Natl. Directed Evolution Library Creation: Acad. Sci. USA 1985, 82, 6774-6778. Methods and Protocols, Vol. 231 (Eds.: F. H. Arnold, G. Georgiou), Humana [146] T. Kornberg, A. Kornberg, in The Press, Totowa, NJ, 2003, pp. 45-52. Enzymes, Vol. 10 (Ed.: P. D. Boyer), Academic Press, 1974, pp. 119-144. [126] A. H. Badran, D. R. Liu, Nat. Commun. 2015, 6, 8425. [147] D. Uyemura, I. R. Lehman, J. Biol. Chem. 1976, 251, 4078-4084. [127] D. Rath, L. Amlinger, A. Rath, M. Lundgren, Biochimie 2015, 117, 119- [148] D. Uyemura, D. C. Eichler, I. R. Lehman, 128. J. Biol. Chem. 1976, 251, 4085-4089. Chapter three|Part II: References 255

[149] T. Itoh, J. Tomizawa, Cold Spring Harb. [170] F. C. Lawyer, S. Stoffel, R. K. Saiki, S. Y. Symp. Quant. Biol. 1979, 43, 409-417. Chang, P. A. Landre, R. D. Abramson, D. H. Gelfand, PCR Methods. Appl. 1993, 2, [150] D. T. Kingsbury, D. R. Helinski, Biochem. 275-287. Biophys. Res. Commun. 1970, 41, 1538- 1544. [171] P. McInerney, P. Adams, M. Z. Hadi, Mol. Biol. Int. 2014, 2014, 8. [151] G. Selzer, T. Som, T. Itoh, J. Tomizawa, Cell 1983, 32, 119-129. [172] P. H. Patel, L. A. Loeb, Proc. Natl. Acad. Sci. USA 2000, 97, 5095-5100. [152] G. del Solar, R. Giraldo, M. J. Ruiz- Echevarria, M. Espinosa, R. Diaz-Orejas, [173] A. Shinkai, L. A. Loeb, J. Biol. Chem. Microbiol. Mol. Biol. Rev. 1998, 62, 434- 2001, 276, 46759-46764. 464. [174] P. H. Patel, H. Kawate, E. Adman, M. [153] T. A. H. Hjalt, E. G. H. Wagner, Nucleic Ashbach, L. A. Loeb, J. Biol. Chem. 2001, Acids Res. 1995, 23, 580-587. 276, 5044-5051. [154] N. Panayotatos, Nucleic Acids Res. [175] P. H. Patel, M. Suzuki, E. Adman, A. 1984, 12, 2641-2648. Shinkai, L. A. Loeb, J. Mol. Biol. 2001, 308, 823-837. [155] H. Go, K. Lee, J. Biotechnol. 2011, 152, 171-175. [176] J. O. McCall, E. M. Witkin, T. Kogoma, V. Roegnermaniscalco, J. Bacteriol. 1987, [156] J. Tomizawa, T. Som, Cell 1984, 38, 871- 169, 728-734. 878. [177] B. Kim, L. A. Loeb, Proc. Natl. Acad. Sci. [157] J. Tomizawa, J. Mol. Biol. 1990, 212, USA 1995, 92, 684-688. 695-708. [178] M. Suzuki, D. Baskin, L. Hood, L. A. [158] H. Masukata, J. Tomizawa, Cell 1984, Loeb, Proc. Natl. Acad. Sci. USA 1996, 36, 513-522. 93, 9670-9675. [159] H. Masukata, J. Tomizawa, Cell 1986, [179] J. B. Sweasy, L. A. Loeb, Proc. Natl. 44, 125-136. Acad. Sci. USA 1993, 90, 4626-4630. [160] S. Naito, T. Kitani, T. Ogawa, T. Okazaki, [180] A. Shinkai, P. H. Patel, L. A. Loeb, J. Biol. H. Uchida, Proc. Natl. Acad. Sci. USA Chem. 2001, 276, 18836-18842. 1984, 81, 550-554. [181] B. Kim, T. R. Hathaway, L. A. Loeb, J. [161] Y. L. Yang, B. Polisky, J. Bacteriol. 1993, Biol. Chem. 1996, 271, 4872-4878. 175, 428-437. [182] S. L. Washington, M. S. Yoon, A. M. [162] J. Lilly, M. Camps, Microbiol. Spectr. Chagovetz, S. X. Li, C. A. Clairmont, B. D. 2015, 3. Preston, K. A. Eckert, J. B. Sweasy, Proc. [163] G. Mukhopadhyay, D. K. Chattoraj, J. Natl. Acad. Sci. USA 1997, 94, 1321- Mol. Biol. 1993, 231, 19-28. 1326. [164] M. Urh, J. Wu, K. Forest, R. B. Inman, M. [183] B. A. Bridges, R. Woodgate, Proc. Natl. Filutowicz, J. Mol. Biol. 1998, 283, 619- Acad. Sci. USA 1985, 82, 4193-4197. 631. [184] P. Cooper, Mol. Gen. Genet. 1977, 150, [165] T. T. Stenzel, T. MacAllister, D. Bastia, 1-12. Genes Dev. 1991, 5, 1453-1463. [185] P. K. Cooper, Biophys. J. 1976, 16, A182- [166] K. Park, D. K. Chattoraj, J. Mol. Biol. A182. 2001, 310, 69-81. [186] R. F. Hill, Photochem. Photobiol. 1965, [167] K. S. Doran, D. R. Helinski, I. Konieczny, 4, 563-568. Mol. Microbiol. 1999, 33, 490-498. [187] A. C. Groth, M. P. Calos, J. Mol. Biol. [168] S. Bailey, W. K. Eliason, T. A. Steitz, 2004, 335, 667-678. Science 2007, 318, 459-463. [188] M. Ptashne, A. Jeffrey, A. D. Johnson, R. [169] K. A. Eckert, T. A. Kunkel, PCR Methods. Maurer, B. J. Meyer, C. O. Pabo, T. M. Appl. 1991, 1, 17-24. Roberts, R. T. Sauer, Cell 1980, 19, 1-11. Chapter three|Part II: References 256

[189] E. H. Szybalski, W. Szybalski, Gene 1979, Villaverde, FEMS Microbiol. Lett. 1999, 7, 217-270. 177, 327-334. [190] R. Maurer, B. J. Meyer, M. Ptashne, J. [208] M. Gossen, H. Bujard, Proc. Natl. Acad. Mol. Biol. 1980, 139, 147-161. Sci. USA 1992, 89, 5547-5551. [191] B. J. Meyer, R. Maurer, M. Ptashne, J. [209] C. T. Kuan, I. Tessman, J. Bacteriol. Mol. Biol. 1980, 139, 163-194. 1991, 173, 6406-6410. [192] B. J. Meyer, M. Ptashne, J. Mol. Biol. [210] C. G. Miyada, L. Stoltzfus, G. Wilcox, 1980, 139, 195-205. Proc. Natl. Acad. Sci. USA 1984, 81, 4120-4124. [193] D. K. Hawley, W. R. Mcclure, J. Mol. Biol. 1982, 157, 493-525. [211] Y. Wang, H. Ding, P. Du, R. Gan, Q. Ye, Proc. Biochem. 2005, 40, 3068-3074. [194] A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, in An Introduction to Genetic Analysis, [212] A. Wegerer, T. Sun, J. Altenbuchner, 7th ed., W. H. Freeman, NY, 2000. BMC Biotechnol. 2008, 8, 2. [195] P. Leplatois, A. Danchin, Biochimie [213] J. G. Vethanayagam, A. M. Flower, 1983, 65, 317-324. Microb. Cell Fact. 2005, 4, 3. [196] C. M. Elvin, P. R. Thompson, M. E. [214] W. Jechlinger, M. P. Szostak, A. Witte, Argall, P. Hendry, N. P. J. Stamford, P. E. W. Lubitz, FEMS Microbiol. Lett. 1999, Lilley, N. E. Dixon, Gene 1990, 87, 123- 173, 347-352. 126. [215] B. Kintses, C. Hein, M. F. Mohamed, M. [197] S. C. Makrides, Microbiol. Rev. 1996, 60, Fischlechner, F. Courtois, C. Laine, F. 512-538. Hollfelder, Chem. Biol 2012, 19, 1001- 1009. [198] L. Caspeta, A. R. Lara, N. O. Perez, N. Flores, F. Bolivar, O. T. Ramirez, J. [216] L. Zhou, K. Zhang, B. L. Wanner, in Biotechnol. 2013, 167, 47-55. Recombinant Gene Expression: Reviews and Protocols (Eds.: P. Balbás, A. [199] X. L. Jiang, H. B. Zhang, J. M. Yang, M. Lorence), Humana Press, Totowa, NJ, Liu, H. R. Feng, X. B. Liu, Y. J. Cao, D. X. 2004, pp. 123-134. Feng, M. Xian, Appl. Microbiol. Biotechnol. 2013, 97, 5423-5431. [217] C. Li, A. Wen, B. Shen, J. Lu, Y. Huang, Y. Chang, BMC Biotechnol. 2011, 11, 92. [200] W. Jechlinger, M. P. Szostak, W. Lubitz, Gene 1998, 218, 1-7. [218] J. M. Reyrat, V. Pelicic, B. Gicquel, R. Rappuoli, Infect. Immun. 1998, 66, [201] N. A. Valdez-Cruz, L. Caspeta, N. O. 4011-4017. Perez, O. T. Ramirez, M. A. Trujillo- Roldan, Microb. Cell Fact. 2010, 9. [219] S. Datta, N. Costantino, D. L. Court, Gene 2006, 379, 109-115. [202] H. Patel, K. D. Guo, C. Parent, J. Gross, P. N. Devreotes, C. J. Weijer, EMBO J. [220] R. C. Unger, A. J. Clark, J. Mol. Biol. 2000, 19, 2247-2256. 1972, 70, 539-548. [203] N. Hasan, W. Szybalski, Gene 1995, 163, [221] S. K. Kulkarni, F. W. Stahl, Genetics 35-40. 1989, 123, 249-253. [204] B. Binishofer, I. Moll, B. Henrich, U. [222] D. G. Gibson, H. O. Smith, C. A. Bläsi, Appl. Environ. Microbiol. 2002, 68, Hutchison, J. C. Venter, C. Merryman, 4132-4135. Nat. Methods 2010, 7, 901-U905. [205] R. Breitling, A. V. Sorokin, D. Behnke, [223] D. G. Gibson, L. Young, R. Y. Chuang, J. Gene 1990, 93, 35-40. C. Venter, C. A. Hutchison, H. O. Smith, Nat. Methods 2009, 6, 343-U341. [206] C. Winstanley, J. A. W. Morgan, R. W. Pickup, J. G. Jones, J. R. Saunders, Appl. [224] Y. W. Zhang, U. Werling, W. Edelmann, Environ. Microbiol. 1989, 55, 771-777. Nucleic Acids Res. 2012, 40, 1-10. [207] F. Hoffmann, A. Aris, X. Carbonell, M. Rohde, J. L. Corchero, U. Rinas, A. Chapter three|Part II: References 257

[225] GeneBridges, Technical Protocol for the [244] J. Braman, C. Papworth, A. Greener, Counter-Selection BAC Modification Kit Methods Mol. Biol. 1996, 57, 31-44. Heidelberg, GER.

[226] J. H. Miller, Experiments in molecular genetics, Cold Spring Harbor Laboratory, New York, 1972. [227] E. M. Witkin, J. O. Mccall, M. R. Volkert, I. E. Wermundsen, Mol. Gen. Genet. 1982, 185, 43-50. [228] E. Ephrati-Elizur, J. Bacteriol. 1993, 175, 207-213.

[229] J. P. Thirion, M. Hofnung, Genetics 1972, 71, 207-216. [230] E. Garcia-Fruitos, E. Vazquez, C. Diez- Gil, J. L. Corchero, J. Seras-Franzoso, I. Ratera, J. Veciana, A. Villaverde, Trends Biotechnol. 2012, 30, 65-70. [231] D. B. Straus, W. A. Walter, C. A. Gross, Nature 1987, 329, 348-351.

[232] B. Bukau, Mol. Microbiol. 1993, 9, 671- 680. [233] F. Arsene, T. Tomoyasu, B. Bukau, Int. J. Food Microbiol. 2000, 55, 3-9. [234] E. Guisbert, C. Herman, C. Z. Lu, C. A. Gross, Genes Dev. 2004, 18, 2812-2821.

[235] K. Nishihara, M. Kanemori, M. Kitagawa, H. Yanagi, T. Yura, Appl. Environ. Microbiol. 1998, 64, 1694- 1699. [236] K. Nishihara, M. Kanemori, H. Yanagi, T. Yura, Appl. Environ. Microbiol. 2000, 66, 884-889. [237] C. L. Araya, D. M. Fowler, Trends Biotechnol. 2011, 29, 435-442. [238] K. C. Kao, G. Sherlock, Nat. Genet. 2008, 40, 1499-1504.

[239] E. F. Mao, L. Lane, J. Lee, J. H. Miller, J. Bacteriol. 1997, 179, 417-422. [240] M. Nelson, M. McClelland, Methods Enzymol. 1992, 216, 279-303. [241] J. P. P. Muyrers, Y. M. Zhang, G. Testa, A. F. Stewart, Nucleic Acids Res. 1999, 27, 1555-1557. [242] C. Aslanidis, P. J. de Jong, G. Schmitz, PCR Methods Appl. 1994, 4, 172-177. [243] N. Taniguchi, S. Nakayama, T. Kawakami, H. Murakami, BMC Biotechnol. 2013, 13, 91.

Though this be madness, yet there is method in ’t. William Shakespeare (Hamlet, Act 2, Scene 2)

261

Chapter 4.

Method Section for General Methods and Materials

General Methods 262

Content

Microbiological Methods ...... 263

Microbial storage, cell cultivation and cell harvest ...... 263

E. coli cell lysis methods ...... 264

Microtiter plate handling ...... 264

DNA transformation ...... 265

Heat shock methodology ...... 265

Electro shock methodology ...... 265

Molecular Biology ...... 267

Polymerase chain reaction (PCR) ...... 267

QuikChange® ...... 268

FastCloning ...... 268

PCR/MEGAWHOP ...... 268

Restriction and Ligation ...... 269

Agarose gel electrophoresis ...... 270

Spectrometric measurement of DNA content ...... 270

Plasmid isolation ...... 270

Sequencing ...... 270

Methods of Biochemistry ...... 271

Protein purification - IMAC ...... 271

Dialysis ...... 271

Protein content measurements ...... 271

SDS-PAGE ...... 271

Software and computational tools ...... 272

References ...... 273

General Methods 263

Microbiological Methods

Chemicals were purchased from Fluka (CH), Sigma Aldrich (GER), Merck (GER), VWR (GER), Roth (GER), Thermo Fisher Scientific (USA), Starlab (GER), analytikjena (GER) and Bio-Rad (GER) in the highest purity available.

Microbial storage, cell cultivation and cell harvest E. coli strains (Table 1) were stored in glycerol stocks, containing 900 µL LB medium (10 g L-1 tryptone, 10 g L-1 NaCl, 5 g L-1 yeast extract) and 900 µL 60 % sterile glycerol in water. For overnight cultures (onc), 2 µL of glycerol stock, 1 µL of cultured cells or a suitable amount of cell colony/colonies from petri dishes were added in 5 mL LB with suitable antibiotics in culture tubes and shaken overnight.

Table 1: Commonly used strains in this thesis and the respective genotypes.

Strain Genotype BL21 (DE3) E. coli B F- fhuK ompT gal dcm lon hsdSB(rB- mB-) λ(DE3 [lacI lacUV5-T7 gene 1 ind1 sam7 nin5]) BL21 (DE3) Gold E. coli B F- ompT hsdS(rB- mB-) dcm+ TetR gal λ(DE3) endA Hte JS200 E. coli SC-18 (λ-) recA718 polA12 uvrA155 trpE65 lon-11 sulA1 Top10 E. coli F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 nupG recA1 araD139 Δ(ara-leu)7697 galE15 galK16 rpsL(StrR) endA1 fhuA2 λ-

For expression cultures, sterile LB was inoculated 1:100 with overnight cultures with appropriate antibiotics (Table 2), grown at 37 °C until an OD600 of 0.4-0.6 and induced with IPTG to a final concentration of 0.1-0.4 mmol L-1. The cells were either shaken for 5 h at 30 °C or overnight (16-20 h) at 16-20 °C with 140-160 rpm (Minitron or Unitron, Infors AG, CH).

Table 2: Antibiotics for cell cultivation and working concentrations as final concentration in whole solution.

Antibiotic Working concentration Ampicillin 100 μg mL-1 Chloramphenicol 33 μg mL-1 Kanamycin 30 μg mL-1 Streptomycin 50 μg mL-1 Tetracycline 10 μg mL-1 General Methods: Microbiology 264

After sufficient protein expression (which was followed for the first cultivations by SDS- PAGE analysis), the cells were harvested by centrifugation for 20 min at 4000 g and 4 °C (Labofuge 400R or Multifuge 3S‐R, Thermo Scientific, GER). Smaller samples (e.g.

7/OD600 for SDS-PAGE or 2 mL overnight cultures for plasmid preparation) were harvested by centrifugation at 13,000g for 5 min at 4 °C (Biofuge fresco or Biofuge pico, Thermo Scientific, GER). The cells were washed twice for further preparation (cell lysis and protein purification, whole cell biocatalysis or assay procedures) in the appropriate buffers, used for the downstream applications.

E. coli cell lysis methods Three methods were used for cell disruption, usually depending on the processing volume and the sensitivity of the target proteins towards the individual methods. Cell culture volumes from 50-500 mL, washed in a total amount of 20-30 mL target buffer (e.g. 50 mmol L-1 sodium phosphate pH 7.5 (+300 mL NaCl), 0.1 mol L-1 glycine buffer, pH 8.6 or 0.1 mmol L-1 TRIS-HCl buffer, pH 7.0) were lysed by three times of high pressure homogenization (French Pressure Cell, Thermo Fisher Scientific, USA) at 1,500 psi with 0.1 mg mL-1 DNase. Sonication was used for cell culture volumes from 1-50 mL, after washing the cells twice in the target buffer. Depending on the volume, 1-7 min of 50 % cycle and 60 % power (Sonoplus HD2070, Bandelin electronic, GER) were applied on ice three times to the suspension, which contained 0.1 mg mL-1 DNase. SDS-PAGE samples were lysed mechanically in a FastPrep 24 MP (Biomedicals, USA). 1 mL of the sample, containing 7/OD600 cells was mixed with 100 µL glass beads (0.1 mm diameter) and vigorously shaken three times for 20 s at 4 m s-1. During the cooling time of the application, the samples were cooled on ice. To obtain soluble crude extract, samples with 2-50 mL volume were centrifuged at 10,000 g and 4 °C for 60 min. Lower samples were centrifuged at 17,000 g and 4 °C for 20 min.

Microtiter plate handling Cultivation of small culture volumes for high-throughput applications was performed in 96 deep well or flat-bottom plates (Greiner Bio-One, GER). Thus, the culture volume was 1 mL and 180 μL, respectively. Cell growth and induction was performed as described above in a Thermomixer comfort (Eppendorf, GER) or iEMS™ incubator/shaker HT (Thermo Fisher (USA). Cells were harvested by centrifugation at 3,000 g and 4 °C for 25 min and washed twice with the same amount of downstream application buffer, as the cultivation volume. In-between, plates were shaken for resuspension at 750 rpm and 4 °C for 15 min (Thermomixer comfort, Eppendorf, GER). For whole cell assays, resuspended cells were taken, for master plates 100 μL of glycerol solution (70% in General Methods: Microbiology 265 distilled water) were added to a flat-bottom MTP and stored at -80 °C. For crude extract preparation, 200 μL of 1% BugBuster solution (Merck Millipore, GER) with 0.1 mg mL -1 DNase were added and incubated at 700 rpm and 37 °C for 1 h. The fractions were separated by centrifugation at 3,000 g and 4 °C for 45 min.

DNA transformation

Heat shock methodology

The chemo-competent E. coli cells were produced by the inoculation of 100 mL LB medium with 1 mL of an over-night culture of appropriate E. coli cells (Top10, BL21 (DE3), JS200 etc.). The culture was incubated at 37 °C (30 °C in case of JS200) and 180 rpm to an OD600 of 0.4-0.5. Then, the cells were harvested by centrifugation (4000 g, 4 -1 -1 °C, 15 min) and resuspended in 30 mL ice cold TfB1 (2.94 g L KCH3COO, 7.5 g L KCl, -1 1.11 g L CaCl2 and 130 mL (13 %) glycerol, autoclaved) and mix with 3.2 mL of a sterile -1 filtered 1 mol L MgCl2 solution. The suspension was incubated on ice for 15 min and centrifuged again. The pellet was resuspended in 4 mL TfB2 (0.75 g L-1 KCl, 11.1 g L-1 -1 CaCl2 , 2.1 g L 3-(N-morpholino)propane-sulfonic acid and 130 mL (13 %) glycerol) and incubated on ice for 15 min. Finally, the suspension was divided in 50 µl aliquots, shock- frozen in liquid nitrogen and stored at -80 °C. One aliquot was mixed with 1-5 μL of the DNA solution (50-200 ng) and incubated on ice for 30 min. After a heat shock for 40 s at 42 °C, cells were cooled on ice for 2 min and 350 μl -1 -1 -1 pre-warmed (37 °C) LB-SOC (10x solution: 2 g L KCl, 20 g L anhydrous MgCl2, 20 g L -1 anhydrous MgSO4 and 40 g L glucose, sterile filtrate and mixed 1:10 with LB medium) was added and the cells were regenerated for 1 h at 37 °C and 180 rpm. A suitable amount of cell suspension was then spread on LB-agar plates (LB medium with 15 g L-1 agar-agar) with appropriate antibiotics and incubated at 37 °C or 30 °C overnight (Friocell or Incucell, MMM Medcenter‐Einrichtungen GmbH (GER)).

Electro shock methodology

Competent E. coli cells were produced by inoculation of 500 ml LB-medium with 5 mL of an appropriate overnight culture of the desired E. coli strain. The culture was incubated at 30 or 37 °C and 180 rpm until an OD600 of 0.5-0.7. The cells were first cooled on ice for 20 min and then harvested by centrifugation at 4500 g and 4 °C for 30 min. The pellet is resuspended and centrifuged again stepwise in 500 mL, 250 mL, 20 mL and finally 1.5 mL of ice cold glycerol solution (10 % glycerol in MQ water, autoclaved). 50 µl aliquots were shock-frozen in liquid nitrogen and stored at -80 °C. For DNA transformation, one aliquot was mixed with 1-5 μL of the DNA solution (50- 200 ng) and incubated on ice for 5 min. The solution was pipetted into a 2-cm electroporation cuvette (BTX Cuvettes Plus, Harvard Apparatus, UK) and electroporated at 2.5 kV for 5 ms (MicroPulser, BioRad, GER). Subsequently, 600 μL pre-warmed LB SOC General Methods: Microbiology 266 medium were added and the cells were regenerated at 30 or 37 °C for 70 min. Afterwards, the cell suspension was spread on LB agar plates, conditioned with suitable antibiotic(s) and incubated overnight at 30 °C or 37 °C (Friocell or Incucell, MMM Medcenter‐Einrichtungen GmbH (GER)). General Methods 267

Molecular Biology

Polymerase chain reaction (PCR) DNA fragments for very diverse applications, like overlap-extension PCR, sequencing, endonuclease restriction, FastCloning, assembly reactions (SLiCE, NEBuilder, Gibson Assembly) or analysis (cPCR) were amplified in a PCR. Thus, genomic DNA, plasmids or synthetic fragments served as template DNA. The genomic DNA was either prepared by the DNease Blood and Tissue Kit (Qiagen, GER) or obtained from crude extract of boiled cells (5-10 min 95 °C). In the latter case, colonyPCRs were performed by picking a single colony from an agar plate, mixing the cell sample with the reaction solution and permitting cell lysis by prolongation of the initial denaturation to 5-7 min. The reactions were performed - depending on the used polymerase - with all components, recommended by the suppliers (Table 3).

Table 3: Components of a PCR, the amount of template may vary, due to the quality of the used material.

Component Taq/Opti Taq PfuPlus Q5 Reaction buffer [µL] 5 (10x) 5 (10x) 10 (5x) 10 mmol L-1 dNTPs [µL] 1 1 1 Forward primer [µmol L-1] 2.5 2.5 2.5 Reverse primer [µmol L-1] 2.5 2.5 2.5 Template DNA [ng] 50-100 50-100 50-100 Reaction enhancer 0.5 µL DMSO 0.5 µL DMSO 10 µL 5x high GC enhancer Polymerase [U] 1.25 2.5 0.5 U Nuclease free water Adjust to 50 µL

The reaction cycles for the amplifications were performed, as not differently states, as recommended by the supplier of the used polymerases (Table 4) in a thermocycler (Progene FPROG02D (Techne, UK), Tpersonal 48 (Biometra, GER), FlexCycler or FlexCycler2 (analytikjena, GER) or Tetrad 2 (Biorad, GER)).

Table 4: Temperature program for a standard PCR, depending on the used polymerase

Temperature [°C] Time [s] Segment Cycle Step Taq/Pfu Q5 Taq/Pfu Q5 1 1 95 98 180 30 initial denaturation 2 25-35 95 98 30 10 denaturation 50-72a 30 15 annealing 72 60 per kbb 30 per kb elongation 3 1 72 420 120 final elongation [a] the melting temperature was either the melting temperature predicted in Geneious minus 3 °C or defined separately by gradient PCR [b] the duration of the elongation step depended on the length of the amplicon General Methods: Molecular Biology 268

The PCR products were either used as crude product (e.g. for megawhop, FastCloning) or was purified by the NucleoSpin® Gel and PCR Clean-up Kit (Macherey-Nagel, CH).

QuikChange® For the substitution, deletion or insertion of nucleotides at a defined position within a plasmid template, the QuikChange® reaction was used. For this, an alternative PCR program was chosen to ensure the proper amplification of the whole plasmid (Table 5). These PCR methods are also recommended for amplicons with more than 6 kb and were usually performed with PfuPlus (roboklon, PL). Oligonucleotides were designed to be partially complementary to prevent primer insertion. The overlapping region of the primer pair should always cover the mutagenic site.

Table 5: Temperature program for a QuikChange® reaction, suitable for amplicons >6kb.

Segment Cycle Temperature [°C] Time [s] 1 1 92 180 2 10 92 15 58-68 30 68 60 per kb 3 25 92 15 58-68 30 68 60 per kb + 20 s per cycle 4 1 68 420-600

Afterwards, parental DNA was destroyed by the addition of 10 U DpnI and incubation for 2 h at 37 °C. Then, DpnI was inactivated at 80 °C for 20 min and the material was transformed to competent E. coli cells.

FastCloning FastCloning was performed like mentioned by Li et al.[1] The reactions were carried out like a standard PCR Two independent reactions amplified different DNA constructs with a minimum of 15 bp overlapping sequences (for intracellular recombination). 5 µL of each reaction solution were mixed and digested with 10 U DpnI for 2 h at 37 °C to destroy the parental template DNA. Then, DpnI was inactivated at 80 °C for 20 min and the material was transformed to competent E. coli cells.

PCR/MEGAWHOP During this method, a part of the wild type gene is amplified with mutagenic or vector specific oligonucleotides in a PCR. The PCR product was used as crude product (12 µL) or purified (2-5 µL) as megaprimer in a separate MEGAWHOP (megaprimer PCR of whole General Methods: Molecular Biology 269 plasmid). The PCR program has changed elongation times and annealing temperatures (Table 6).

Table 6: Reaction program for the MEGAWHOP after the initial PCR for the introduction of site- specific mutations or site-directed saturation mutagenesis.

Segment Cycle Temperature [°C] Time [s] 1 1 68 300 2 1 95 120 3 10 95 30 55 30 72 420 4 14 95 30 55 30 68 660

Restriction and Ligation For classic restriction/ligation procedures, two different starting points are possible: Either the desired restriction sequences are available at respective positions in the vector and the insert or not. If not, the sequences were added in a suitable position by an initial PCR with oligonucleotides, which exhibit an overhang of the desired sequence, followed by an ATATA repeat for loosely annealing. After the successful PCR (of the insert and/or the template), either the crude PCR product or the isolated plasmid DNA was digested with the corresponding restriction enzymes (20-30 U; NEB, USA or Thermo Fisher, GER) in an appropriate buffer. The approach was incubated 1-12 h (depending on the cutting efficiency) at 37 °C (or another recommended temperature) and the enzymes were finally heat-inactivated, if possible. The restricted products were purified via agarose gel electrophoresis and subsequent gel extraction (NucleoSpin® Gel and PCR Clean-up Kit, Macherey-Nagel, CH). To calculate the optimal ratio between insert and vector the following formula was used: 푘푏(푖푛푠푒푟푡) 푛푔 (푖푛푠푒푟푡) = 3 × 푛푔(푣푒푐푡표푟) × 푘푏(푣푒푐푡표푟) ng (insert/vector) – amount of purified and digested PCR product/vector kb (insert/vector) – quantity of bp of purified and digested PCR product/vector The samples were mixed to yield a 20 µL approach with 2 µL 10x T4 DNA ligase buffer and 0.5 µL T4 DNA ligase (Thermo Fisher, GER). The approach was incubated as described in Table 7. After heat inactivation, the ligated DNA was transformed into competent E. coli cells.

General Methods: Molecular Biology 270

Table 7: Incubation program for the ligation with T4 DNA polymerase.

Temperature [°C] Time [h] 25 2 18 4 16 3 12 3 10 2 70 0.2

Agarose gel electrophoresis For fragment analysis, the DNA amplicons were separated by agarose gel electrophoresis.[2] Depending on the length of the expected DNA fragments, 0.8% (long fragments) or 2% (short fragments) agarose gels were prepared with 0.005 % Roti Safe® gel stain. The separation was performed in a gel chamber (Subcell GT, Biorad, GER or Compact S, analytikjena, GER), filled with TAE buffer (50x 242 g L-1 tris(hydroxymethyl)- aminomethane (Tris), 57.1 mL glacial acetic acid and 18.6 g L-1 ethylenediamine- tetraacetic acid (EDTA), pH 8.0) and connected to a power supply (EV245, Consort, BEL) at 120 V and 400 mA for 20 min. The 1kbp DNA ladder (Carl Roth, GER) was used as DNA marker and the 10x reaction buffer C of the provided DNA polymerases Taq or OptiTaq (roboklon, PL) as gel application buffer.

Spectrometric measurement of DNA content The amount of DNA was measured spectrometrically at a NanDrop ND-1000 (Preqlab, GER). 2 μL of the sample solution were measured and only samples with a 260:280 nm ratio of higher than 1.7 were further used.

Plasmid isolation

Plasmid DNA was prepared from overnight cultures with the innuPrep Plasmid Mini Kit (analytikjena, GER) as recommended in the supplier’s manual.

Sequencing For the verification of the correct nucleotide sequence or the evaluation of mutagenesis rates, samples were sequenced at GATC Biotech (GER) or EurofinsGenomics (GER) with specific oligonucleotides. The sequencing results, DNA and plasmid sequences, plasmid map and primer sequences are provided on the primary data device and are available at the Department for Biotechnology and Enzyme Catalysis, Institute for Biochemistry, Greifswald University.

General Methods 271

Methods of Biochemistry

Protein purification - IMAC The crude supernatant obtained from centrifuged crude lysate was filtered (0.45 µm cut- off) and applied to Talon columns or an ÄKTAprime plus (GE Healthcare, UK) device. The Talon gravity flow system was equipped with immobilized cobalt ions and the ÄKTA device was used with 5 mL HItrap columns with immobilized nickel ions. The samples were loaded to the gel matrix in imidazol free buffer (e.g. 20 mmol L-1 sodium phosphate buffer), washed with 3-5 column volumes of the same buffer and the His6-tagged protein was eluted with the identical buffer, containing 300 mmol L-1 imidazol. To optimize the elution conditions, imidazol gradients could be applied. The elution fractions were pooled and imidazol was removed by a gel filtration with three 5 mL HItrap desalting columns (Sephadex-25 Superfine, GE Healthcare, UK) connected in row or by centrifugation with Amicon Ultra tubes (10 kDa cut-off, Merck Millipore, GER) at 4,500 g and 4 °C for 20 min each step. The respective buffer was chosen for the individual downstream application (e.g. glycine buffer for assays or sodium phosphate buffer for spectroscopy). The latter method was also used for increasing the protein concentration in case of low concentrated samples.

Dialysis The target enzyme sample was pipetted into a freshly boiled dialysis tube and firstly dialyzed for 4 h and then overnight. Each time, approximately 4 L of target buffer (e.g. 100 mmol L-1 glycine, pH 8.6 or 50 mmol L-1 potassium phosphate, pH 7.5) were used at 4 °C under continuous stirring (IKAMAG safety control MR 3001K, IKA Labortechnik, GER).

Protein content measurements The protein content was determined with Bradford reagent[3] (for stock solution 0.1 g Coomassie Brilliant Blue G250 in 50 mL ethanol and 100 mL 85% phosphoric acid adjusted to 250 mL with A. dest). The stock solution was diluted 1:5 with A. dest. and filtered. 15 µL of an appropriate diluted protein sample were added to 300 µL Bradford solution in microtiter plate. Bovine serum albumin (BSA) was used for the standard curve in a concentration from 50-500 µg mL-1. The absorption was measured in triplicates at 595 nm at a FLUOstar Optima (BMG Labtech, GER).

SDS-PAGE The sodium dodecylsulfate (SDS) polyacrylamide gel electrophoresis (PAGE) was used for the separation and analysis of protein probes.[4] Therefore, 10 µL of the appropriately diluted protein sample was mixed with 20 µL of loading dye (3.55 mL A. dest., 2.5 mL 0.5 mol L-1 Tris‐HCl (pH 8.0), 2.5 mL glycerol, 2 mL 10 % SDS, 0.2 mL 0.5% bromophenol General Methods: Biochemistry 272 blue and 0.5 mL β‐mercaptoethanol) and incubated for 10 min at 95 °C. 15 µL were added to a polymerized SDS gel (mini gel composed of approximately 6 mL 12% resolving gel (2 mL 1.5 mol L-1 Tris pH 8.8, 3.33 mL acrylamide (30% with 0.8% N,N'- methylene bisacrylamide)., 2.67 mL A. dest., 40 µl ammonium peroxodisulfate (APS, 10%) and 4 µl tetramethylethylenediamine (TEMED)) and approximately 1 mL 4% stacking gel (1 mL 0.5 mol L-1 Tris pH 6.8, 0.52 ml acrylamide, 2.47 mL A. dest., 40 µl APS and 4 µl TEMED)). The samples were separated at 25 mA per gel and 250 V for 60 min in gel chambers (Minigel Twin, Biometra, GER), which were filled with running buffer (10x 30.3 g L-1 Tris, 144 g L-1 glycine and 10 g L-1 SDS) and connected to an EPS 301 power supply (Amersham Biosystems, SE). Roti®-Mark standard (Carl Roth, GER) or PageRuler™ unstained protein ladder (Thermo Fisher, GER) were used as mol-weight markers. After the separation, the gels were fixed in staining solution (1 g Coomassie Brilliant Blue G250 in 300 mL ethanol, 100 mL glacial acetic acid and 600 mL A. dest.) and protein bands were finally visualized by the application of destaining solution (300 mL ethanol, 100 mL glacial acetic acid and 600 mL A. dest.).

Software and computational tools For biochemical analysis, comparison or design, the following tools were used within this thesis (Table 8).

Table 8: Computational tools for analysis, calculation or visualization procedures.

Software Application Supplier 3DM[5] 3D structural alignment of proteins Bio-Prodict, NL www.bio-prodict.nl CASTER 2.0 Degenerated DNA codons, library Reetz et al. size determination www.kofo.mpg.de Caver Tunnel and pocket analysis in Chovancova et al. analyst[6] protein structures www.caver.cz ChemDraw Drawing chemical reactions and PerkinElmer Informatics, USA Ultra structures www.cambridgesoft.com Clustal Multiple Sequence Alignments EBI, UK Omega http://www.ebi.ac.uk/Tools/ msa/clustalo/ double digest Buffer recommendation for the NEB, Ipswich, USA finder combination of restriction www.neb.com endonucleases Geneious Pro Working with and visualization of Biomatters Ltd. protein or DNA sequences, primer Auckland, NZ design www.geneious.com Protein data Investigation and analysis of protein RCSB, USA bank[7-8] structures www.rcsb.org PyMol[9] Structure visualizer Schrödinger, USA www.pymol.org YASARA[10-11] Homology modelling, ligand docking, YASARA Bioscience, AT molecular dynamics simulation www.yasara.org General Methods 273

References

[1] C. K. Li, A. Y. Wen, B. C. Shen, J. Lu, Y. Huang, Y. C. Chang, BMC Biotechnol. 2011, 11.

[2] P. A. Sharp, B. Sugden, J. Sambrook, Biochemistry 1973, 12, 3055-3063. [3] M. Bradford, Anal. Biochem. 1976, 72, 248-254.

[4] U. K. Laemmli, Nature 1970, 227, 680- 685. [5] R. K. Kuipers, H. J. Joosten, W. J. van Berkel, N. G. Leferink, E. Rooijen, E. Ittmann, F. van Zimmeren, H. Jochens, U. Bornscheuer, G. Vriend, V. A. dos Santos, P. J. Schaap, Proteins 2010, 78,

2101-2113. [6] E. Chovancova, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V. Sustr, M. Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Comput. Biol. 2012, 8, e1002708.

[7] H. M. Berman, T. N. Bhat, P. E. Bourne, Z. K. Feng, G. Gilliland, H. Weissig, J. Westbrook, Nat. Struct. Biol. 2000, 7, 957-959. [8] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, Nucleic Acids Res. 2000, 28, 235-242. [9] PyMol, The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC. [10] E. Krieger, G. Koraimann, G. Vriend, Proteins 2002, 47, 393-402.

[11] E. Krieger, T. Darden, S. B. Nabuurs, A. Finkelstein, G. Vriend, Proteins 2004, 57, 678-683.

Hiermit erkläre ich, dass diese Arbeit bisher von mir weder an der Mathematisch- Naturwissenschaftlichen Fakultät der Ernst-Moritz-Arndt-Universität Greifswald noch einer anderen wissenschaftlichen Einrichtung zum Zwecke der Promotion eingereicht wurde. Ferner erkläre ich, dass ich diese Arbeit selbstständig verfasst und keine anderen als die darin angegebenen Hilfsmittel und Hilfen benutzt und keine Textabschnitte eines Dritten ohne Kennzeichnung übernommen habe.

______

Danksagung/Acknowledgements

Zunächst gilt großer Dank meinem Betreuer Prof. Uwe Bornscheuer für seine kontinuierliche und durchweg sehr gute Betreuung während der gesamten Promotionszeit und die zahlreichen Hilfestellungen in den vielfältigen Projekten, sowie sein Vertrauen und sein außerordentliches Engagement. Vielen Dank auch an Dr. Dominique Böttcher, die als weiterer Betreuer stets verfügbar und mit ihren kreativen Ideen die bearbeiteten Projekte vielfach bereichert hat. Weiterhin gilt ein herzliches Dankeschön den gesamten Arbeitskreisen „Biotechnologie und Enzymkatalyse“ und „Proteinbiochemie“, sowie den Kollegen der Nachwuchsgruppe „DropIn Biofuels“ für kleinere und größere Hilfestellungen im Laboralltag, kreative Gespräche und zahlreiche Impulse durch die vielseitigen Fertigkeiten und die Erfahrungen aller Mitarbeiter. Danke für die sehr angenehme Zeit. It was a great pleasure to work with our external cooperation partners: At first I want to speak out many thanks to Prof. Jiří Damborský from the Masaryk University, Brno (Czech Republic) for his valuable impulses and the manifold experiences during the scientific exchanges with him and his group. I want to acknowledge the whole team of the Loschmidt Laboratories for the kind advices and the warm and friendly welcome during my visits. I am deeply grateful especially to Radka Chaloupkova for many advices and helpful discussions, to David Bednar for the complicated Rosetta calculations and deeper insights into physicochemical computations, to Pavel Vanacek for his help with the GC analysis and to Monika Strakova and Irena Halikova for their assistance in the laboratory experiments. Furthermore I am very thankful for the fruitful cooperation with Prof. Douglas Masterson (The University of Southern Mississippi, USA) and Maureen Smith during our PLE project. Ein großer Dank geht auch an Prof. Ivo Feussner und Cornelia Herrfurth der Georg- August-Universität Göttingen für Ihre Unterstützung und die gemeinsamen Untersuchungen zu den Fettsäure-Hydratasen und -Isomerasen und die stets freundliche Zusammenarbeit. An dieser Stelle soll auch ein herzliches Dankeschön an Prof. Manel Camps (University of California, USA) und Marlen Schmidt (GeneBridges, Heidelberg) ausgesprochen werden, für die hilfreichen Informationen und die Beantwortung zahlreicher Fragen. Es ist ein großes Glück in einem guten Team arbeiten zu dürfen, daher gilt besonderer Dank den Bachelor-, Master- und Diplomstudierenden, deren erfolgreiche Arbeiten ich betreuen durfte. Luise Weiher, Lida Rostock und Johannes Freiherr von Saß bin ich über alle Maßen dankbar für ihre selbständige und fleißige Arbeit, die auch meinen Horizont

sehr erweitert hat. Es war jederzeit eine große Freude mit Euch zu arbeiten, zu diskutieren und einander zur Seite zu stehen. Sehr großer Dank geht nunmehr auch an meine Freunde, die mir acht unvergessliche Jahre in Greifswald beschert haben und mit denen ich im gleichen Institut lernen und lehren durfte: Allen voran HSi, Maren, Christin, Mary, Lilly, Sandy, Jan und Martin aus meinem Studienjahr für offene Ohren, große Herzen, unvergessliche Feiern und viele Ratschläge und die ein oder andere Kritik. Sowie weiterhin Daniel für ungezählte Platten- Dienste, Andy für seinen tollen Humor, Mark für die sehr interessante Arbeit an der Roboter-Plattform und alle weiteren, ungenannten Menschen, die mir auch nur mit einem Lächeln begegnet sind. Meinen Eltern danke ich für die Freiheit meiner Entscheidungen, Ihren Respekt für meine Arbeit und ihr Vertrauen. Meinem Sohn Theodor gilt ebenfalls ein besonderer Dank, dass er mir meine Augen für viele neue Dinge öffnet und dass er mir zeigt wie facettenreich das Leben noch ist. Es ist unbeschreiblich schön, dass es dich gibt. Meiner lieben Frau Diana kann ich zuletzt gar nicht genug danken für die zahlreichen Atempausen, für ihr liebevolles Verständnis, für ihre herzlichen Ratschläge, ihre Faszination an meiner Arbeit, ihren Enthusiasmus, die großartige gemeinsame Zeit, viele wundervollen (Tanz-)Momente und die Tatsache, dass sie mir ganz einfach beigestanden hat. Du hast mehr als alles andere dazu beigetragen, dass diese Arbeit gelingt. Ich bin sehr glücklich mit dir leben zu dürfen und freue mich sehr auf unsere Zukunft, die neuen Herausforderungen und die vielen Überraschungen, die noch auf uns warten. Egal wie und egal wo, aber mit dir wird es schön sein.