
University of Groningen

Protein engineering of cyclodextrin glycosyltransferase from Bacillus circulans strain 251
Penninga, Dirk

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record

Publication date: 1996


Citation for published version (APA): Penninga, D. (1996). Protein engineering of cyclodextrin glycosyltransferase from Bacillus circulans strain 251. s.n.


CHAPTER 1

GENERAL INTRODUCTION

I Cyclodextrins and their applications

II Starch degrading and related enzymes

III Cyclodextrin glycosyltransferase

IV Scope of this thesis

I CYCLODEXTRINS AND THEIR APPLICATIONS

Cyclodextrins are cyclic oligosaccharides consisting of six (α-cyclodextrin), seven (β-cyclodextrin), eight (γ-cyclodextrin) or more glucopyranose units linked by α(1→4) bonds (see Figure 1). They were first discovered in 1891, when, in addition to reducing dextrins, a small amount of crystalline material was obtained from starch digests of Bacillus amylobacter (Clostridium butyricum) (Villiers, 1891). "....there is formed in very small amounts (about 3 g per kg of starch) a carbohydrate which forms beautiful radiate crystals after a few weeks in the alcohol from which the dextrins were precipitated...... having the composition represented by a multiple of the formula (C6H10O5)2·3H2O...." According to other authors, Villiers probably used impure cultures and the cyclodextrins were produced by a Bacillus macerans contamination (Koch, 1891). Villiers named his crystalline product "cellulosine".

In 1903, Schardinger was able to isolate two crystalline products, dextrins A and B, which were described with regard to their lack of reducing power. The bacterial strain capable of producing these products from starch was unfortunately not maintained. In 1904, Schardinger isolated a new organism capable of producing acetone and ethyl alcohol from sugar and starch-containing plant material. Only later, in 1911, did he describe that this strain, called Bacillus macerans, also produces large amounts of crystalline dextrins (25-30 %) from starch (Schardinger, 1911). Schardinger named his crystalline products "crystallized dextrin α" and "crystallized dextrin β". It took until 1935 before "γ-dextrin" was isolated (Freudenberg and Jacobi, 1935). These authors also developed several fractionation schemes for the production of cyclodextrins. At that time the structures of these compounds were still uncertain, but in 1942 the structures of α- and β-cyclodextrin were determined by X-ray crystallography (French and Rundle, 1942). In 1948, the X-ray structure of γ-cyclodextrin followed and it was recognized that cyclodextrins can form inclusion complexes (Freudenberg and Cramer, 1948). In 1961, evidence for the natural existence of δ-, ε-, ζ- and even η-cyclodextrin (9-12 residues) was provided (Pulley and French, 1961).

The main interest in cyclodextrins lies in their ability to form inclusion complexes with several compounds. From the X-ray structures it appears that in cyclodextrins the secondary hydroxyl groups (C2 and C3) are located on the wider edge of the ring and the primary hydroxyl groups (C6) on the other edge, and that the apolar C3 and C5 hydrogens and ether-like oxygens are at the inside of the torus-like molecules. This results in a molecule with a hydrophilic outside, which can dissolve in water, and an apolar cavity, which provides a hydrophobic matrix, described as a "micro heterogeneous environment" (Saenger, 1984).


Figure 1. The structures of α-, β-, and γ-cyclodextrins, their three-dimensional forms and sizes, and the formation of an inclusion complex. For sizes of a and b, see Table 1.

As a result of this cavity, cyclodextrins are able to form inclusion complexes with a wide variety of hydrophobic guest molecules (see Figure 1). One or two guest molecules can be entrapped by one, two or three cyclodextrins (Wenz, 1994). The most important parameter for complex formation with hydrophobic compounds or functional groups is their three-dimensional form and size (see Table 1). The driving force is the entropic effect (Saenger, 1980) of displacement of water molecules from the cavity. Another possibility is that this water causes a strain on the cyclodextrin ring, which is released after complexation, producing a more stable, lower energy state (Saenger, 1980; Szejtli, 1982; Saenger, 1984).

Other parameters for complexation are charge or polarity of the guest compound or competition with other molecules from the medium. Because the inclusion complexes are quite stable they can be separated from the medium by crystallization (Starnes, 1990).


Table 1. Properties of α-, β- and γ-cyclodextrin.

Property                               α-cyclodextrin   β-cyclodextrin   γ-cyclodextrin
number of glucopyranose units          6                7                8
molecular weight (g/mol)               972              1135             1297
solubility in water at 25 °C (% w/v)   14.5             1.85             23.2
outer diameter (a) (Å)                 14.6             15.4             17.5
cavity diameter (b) (Å)                4.7-5.3          6.0-6.5          7.5-8.3
height of torus (Å)                    7.9              7.9              7.9
approx. cavity volume (Å³)             174              262              427

From (Uekama and Irie, 1987) and (Szejtli, 1982). The outer diameter (a) and the cavity diameter (b) are indicated in Figure 1.
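The molecular weights in Table 1 follow from the ring composition: each glucopyranose unit in the ring is an anhydroglucose residue, C6H10O5. The short sketch below recomputes them as a plausibility check. The atomic masses are standard average values and the function names are illustrative choices, not taken from the thesis; the results agree with the tabulated values to within about 1 g/mol.

```python
# Standard average atomic masses in g/mol (assumed reference values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# One anhydroglucose residue of the cyclodextrin ring: C6H10O5.
ANHYDROGLUCOSE = {"C": 6, "H": 10, "O": 5}


def unit_mass(formula=ANHYDROGLUCOSE):
    """Molar mass of one ring unit, summed over its atoms."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def cyclodextrin_mass(n_units):
    """Molar mass of a cyclodextrin ring built from n_units anhydroglucose residues."""
    return n_units * unit_mass()


# alpha-, beta- and gamma-cyclodextrin contain 6, 7 and 8 units (Table 1).
for name, n_units in (("alpha", 6), ("beta", 7), ("gamma", 8)):
    print(f"{name}-cyclodextrin: {cyclodextrin_mass(n_units):.1f} g/mol")
```

No water is lost or gained relative to n anhydroglucose residues because the ring has no free reducing or non-reducing end, which is why a simple multiple of the unit mass suffices.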

The studies of cyclodextrins in solution are supported by a large number of crystal structure studies. Cyclodextrins crystallize in two main types of crystal packing, channel structures and cage structures, depending on the type of cyclodextrin and guest compound. These crystal structures show that cyclodextrins in complexes adopt the expected "round" structure with all glucopyranose units in the ⁴C₁ chair conformation. Furthermore, studies with linear maltohexaoses, which form an antiparallel double helix, indicate that γ-cyclodextrin is the form in which the steric strain due to cyclization is least, while α-cyclodextrin is most strained (French, 1957; Saenger, 1980; Saenger, 1988).

Apart from these naturally occurring cyclodextrins, many cyclodextrin derivatives have been synthesized. These derivatives usually are produced by aminations, esterifications or etherifications of primary and secondary hydroxyl groups of the cyclodextrins. Depending on the substituent, the solubility of the cyclodextrin derivatives is usually different from that of their parent cyclodextrins. Virtually all derivatives have a changed hydrophobic cavity volume (Saenger, 1980).

Cyclodextrins are frequently used as building blocks. Up to twenty substituents have been linked to β-cyclodextrin in a regioselective manner. The synthesis of uniform cyclodextrin derivatives requires regioselective reagents, optimization of reaction conditions, and a good separation of products. The most frequently studied reaction is an electrophilic attack at the OH-groups: the formation of ethers and esters by alkyl halides, epoxides, acyl derivatives, isocyanates, and by inorganic acid derivatives such as sulphonic acid chloride. Cleavage of C-OH bonds has also been studied frequently, involving a nucleophilic attack by compounds such as azide ions, halide ions, thiols, thiourea, and amines; this requires activation of the oxygen atom by an electron-withdrawing group (Wenz, 1994).


Because of their ability to link specifically, covalently or noncovalently, to other cyclodextrins, cyclodextrins can be used as building blocks for the construction of supramolecular complexes. Their ability to form inclusion complexes with organic guest molecules offers possibilities to build supramolecular threads. In this way molecular architectures such as catenanes, rotaxanes, polyrotaxanes, and tubes can be constructed. Such building blocks, which cannot be prepared by other methods, can be employed, for example, for the separation of complex mixtures of molecules and enantiomers (Harada et al., 1992; Wenz, 1994).

Applications of cyclodextrins

Since every guest molecule is individually surrounded by a cyclodextrin (derivative), the molecule is micro-encapsulated from a microscopic point of view. This can lead to advantageous changes in the chemical and physical properties of the guest molecules:

- Stabilization of light- or oxygen-sensitive substances
- Modification of the chemical reactivity of guest molecules
- Fixation of very volatile substances
- Improvement of solubility of substances
- Modification of liquid substances to powders
- Protection against degradation of substances by microorganisms
- Masking of ill smell and taste
- Masking pigments or the color of substances
- Catalytic activity of cyclodextrins with guest molecules

These characteristics of cyclodextrins or their derivatives make them suitable for applications in analytical chemistry, agriculture, the pharmaceutical field, food and toilet articles.

In analytical chemistry, cyclodextrins are used for the separation of enantiomers by High Performance Liquid Chromatography (HPLC) or Gas Chromatography (GC). The stationary phases of these columns contain immobilized cyclodextrins or derived supramolecular architectures. Other analytical applications can be found in spectroscopic analysis. In Nuclear Magnetic Resonance (NMR) studies they can act as chiral shift agents, and in Circular Dichroism as selective (chiral) agents altering spectra. In electrochemistry they can be used to mask contaminating compounds, allowing more accurate determinations (Armstrong, 1988).

In some specific cases cyclodextrin-inclusion complexes can be employed as [...] complexes. Because these selective complexes facilitate specific steric attack of the organic inclusion compound, isomerization of the products can be accelerated. Another example of high catalytic activity is provided by a β-cyclodextrin-dinicotinamide derivative which catalyzes the reduction of a number of quinones with enzyme-like reaction kinetics (Dao-dao Zhang et al., 1988; Marzona and Giraudi, 1988; Saenger, 1980).

In agriculture, cyclodextrins can be applied to delay germination of seed. In grain treated with β-cyclodextrins some of the enzymes which degrade the starch supplies of the seeds are inhibited. Initially the plant grows more slowly, but later on this is largely compensated by an improved plant growth, yielding a 20-45% larger harvest (Saenger, 1980). Recent developments involve the expression of CGTases in plants (Oakes et al., 1991).

In the food industry, the properties of cyclodextrins have found several applications. Cyclodextrins were reported to have a texture-improving effect on pastry and on meat products. Other applications arise from their ability to reduce bitterness, ill smell and taste, and to stabilize flavors during long-term storage. Emulsions like mayonnaise, margarine or butter creams can be stabilized with β-cyclodextrin. Using β-cyclodextrin one can remove cholesterol from milk, to produce dairy products low in cholesterol (Juhasz et al., 1988; Szente et al., 1988; Weiszfeiler and Szejtli, 1988).

Ill smells or irritating properties of compounds in toilet articles are masked with cyclodextrins. The ability of cyclodextrins to stabilize volatile compounds in perfumes is used in perfumed rub sheets in magazines and odor-bags in laundry.

There are numerous applications for cyclodextrins in the pharmaceutical field. For example, the addition of α- or β-cyclodextrin increases the water solubility of several poorly water-soluble substances. In some cases this results in improved bioavailability, increasing the pharmacological effect and allowing a reduction in the dose of the drug administered. Inclusion complexes can also facilitate the handling of volatile products. This can lead to a different way of drug administration, e.g. in the form of tablets. Cyclodextrins are used to improve the stability of substances, increasing their resistance to [...], oxidation, heat, light and metal salts. The inclusion of irritating products in cyclodextrins can also protect the gastric mucosa for the oral route, and reduce skin damage for the dermal route. Furthermore, cyclodextrins can be applied to reduce the effects of bitter or irritant tasting and bad smelling drugs (Bekers et al., 1988; Duchene, 1988; Pitha, 1988).

Administered cyclodextrins are quite resistant to starch degrading enzymes, although they can be degraded at very low rates by α-amylases. α-Cyclodextrin is the slowest, and γ-cyclodextrin the fastest degradable compound, due to their differences in size and flexibility. Degradation is not performed by salivary or pancreatic amylases, but by α-amylases from micro-organisms of the colon flora. Absorption studies revealed that only 2-4% of cyclodextrins were absorbed in the small intestine, and that the remainder is degraded and taken up as [...]. This can explain the low toxicity found upon oral administration of cyclodextrins (Bar and Ulitzur, 1994; Duchene, 1988).

Mainly because cyclodextrins can form inclusion complexes with a large variety of substances, a broad spectrum of applications of cyclodextrins has become visible. For many applications, however, the price of cyclodextrins at present is still too high. 1 kg of the cheapest cyclodextrin (β) costs US$ 15-20, and it has been estimated that the potential market for cyclodextrins in the USA is 32,000 metric tons per year (Starnes, 1990). The world market for cyclodextrins for 1995 has been estimated at only 5500 metric tons; this major gap will be overcome only when the price of cyclodextrins is significantly reduced (to US$ 5 per kg) (Schmid, 1989). The development of a more economical cyclodextrin production process is needed to expand the range of commercially successful technical applications of cyclodextrins. The market for cyclodextrins is also still limited because of a limited availability of α- and γ-cyclodextrin. Cyclodextrins are produced from starch by the enzyme cyclodextrin glycosyltransferase (CGTase). All CGTase enzymes studied form a mixture of α-, β- and γ-cyclodextrin, which need to be separated (see part III); further research is needed to develop CGTases with improved product specificity.


II STARCH DEGRADING AND RELATED ENZYMES

The structure of starch

Starch consists of two types of glucan polymers: amylose and amylopectin. Potato starch consists of 20 % amylose and 80 % amylopectin. Depending on the origin, plant species, variety within plants, plant organ, age of organ, and growth conditions, this ratio may vary considerably, from 11 to 51 % amylose. Amylose consists mainly of linear chains of α(1→4)-linked glucose residues, approximately 1000 residues long, and is branched at a low level (approximately one branch per 1000 residues) by α(1→6)-linkages (Martin and Smith, 1995). Pure amylose forms hydrogen bonds between the molecules in solution, resulting in rigid gels. After heating this solution it may crystallize and shrink, a process known as retrogradation.

The other 80 % of potato starch consists of amylopectin. Amylopectin is a highly branched α(1→4)-glucan polymer (approximately one α(1→6)-linkage per 20 glucose residues), which forms organized structures (see Figure 2-I). In these structures one can recognize the so-called A-chains, which are not substituted at the C6 positions, and the inner B-chains, which are α(1→6)-branched at one point (B1-chain) or at several points (B2, B3 etc.). There is only one free reducing end per amylopectin molecule (the C-chain). The branches are clustered at 7-10 nm intervals (approximately 20 glucose residues), 20-40 per molecule, forming an amylopectin molecule 200-400 nm long (approximately 400-800 glucose residues long) and 15 nm wide (Figure 2-I). In solution amylopectin forms fewer hydrogen bonds than amylose; therefore it remains fluid, with a high viscosity and elasticity.

The enzymatic process of starch biosynthesis is relatively simple. First, an ADP-glucose pyrophosphorylase catalyzes the formation of ADP-glucose and inorganic pyrophosphate from glucose-1-phosphate and ATP. For potato this takes place in the plant plastids (amyloplasts). Both ATP and glucose-1-phosphate have to be imported from the cytosol. Alternatively, glucose-1-phosphate can be synthesized in the plastid by a phosphoglucomutase from glucose-6-phosphate. The pyrophosphate is removed by the action of an alkaline pyrophosphatase. The second step involves starch synthase, which catalyzes the formation of an α(1→4)-bond between the C1 of the glucose from the ADP-glucose and the C4 of the non-reducing glucose of the growing amylose chain. This suggests that an initial primer is needed to start the reaction; the nature of this primer is unknown. A third step is formation of an α(1→6)-branch by a starch-branching enzyme, which cuts an α(1→4)-linked glucan chain and forms an α(1→6)-linkage between the C1 at the reducing end of a released glucan chain and the C6 of a glucose residue in another chain. Branches are not created randomly, but show a periodicity of 20 glucose residues. It is believed that this is due to the fact that the starch-branching enzyme has high affinity for a double helical conformation of glucan chains, which is only formed at a certain minimum chain length.

Starch granules, which vary in size from smaller than 1 µm to 100 µm and larger, contain both amylose and amylopectin. The amylopectin molecules are arranged radially, and can form regular packings of double helices with adjacent branches within the clusters of branches. In this way amylopectin obtains a crystalline character, which may vary considerably depending on degree of branching and water content, creating more open (amorphous) lamellae (Martin and Smith, 1995). Starch granules, however, are found to consist of (semi)crystalline regions and also relatively amorphous regions. These latter regions consist of amylose molecules which form single helical structures. The different alternating regions form the so-called growth rings of the granules, which result from fluctuations in biosynthesis (see Figure 2-II).

Figure 2. I: Schematic representation of the highly organized structure of an amylopectin molecule. Indicated are the A-, B-, and C-chains. A-chains are not branched; B-chains contain one branch (B1-chain) or more (B2-, B3-chain, etc). C-chains contain the single reducing end. II: Schematic representation of a starch granule, showing the alternating semicrystalline and amorphous growth rings. Reproduced from Martin and Smith, 1995, with modifications.


Table 2: Reaction specificity of a selection of glycosylases.

Enzyme                            Spec.      Bond  Substrate                     Cleaves off        Ref.
Taka-amylase A                    α r h e    4     oligo- & polysacch.           oligosacch.        (1)
malto-tetraohydrolase             α r h x    4     oligo- & polysacch.           G4                 (2)
α-amylase                         α r h e    4     oligo- & polysacch.           oligosacch.        (3)
maltogenic α-amylase              α r h x    4     oligo- & polysacch.           G2                 (4)
cyclodextrin glycosyltransferase  α r t x    4→4   oligo- & polysacch.           cyclodextrins      (5)
branching enzyme                  α r t e    4→6   oligo- & polysacch.           -                  (6)
pullulanase                       α r h e    6     pullulan                      G3                 (7)
α-amylase-pullulanase             α r h e    4/6   pullulan                      panose, G2, G1     (8)
malto-pentaohydrolase             α r h x    4     oligo- & polysacch.           G5                 (9)
β-amylase                         α i h x    4     oligo- & polysacch.           G2                 (10)
glucoamylase                      α i h x    4/6   oligo- & polysacch.           G1                 (11)
α-glucosidase                     α r h d/x  4/6   oligo- & polysacch.           G1                 (12)
oligo-1,6-glucosidase             α r h x    6     non-red. term. α(1→6) bonds   G1                 (13)
dextran glucosidase               α r h x    6     dextran                       G1                 (14)
isoamylase                        α r h e    6     oligo- & polysacch.           oligosacch.        (15)
neopullulanase                    α r h/t e  4/6   pullulan                      panose, G2, G1     (16)
β-glucosidase                     β r h x    4     oligo- & polysacch.           hexose             (17)
α-galactosidase                   α r h x    4     [...]-sacch.                  α-galactose        (18)
β-galactosidase                   β r h d    4     (β-galactosides)              β-galactose + G1   (19)
cyclodextrinase                   α r h e    4/6   cyclodextrins                 G3, G2, G1         (20)
amyloglucosidase                  α r h x    4/6   oligo- & polysacch.           G1                 (21)
amylomaltase                      α r t e    4→4   oligo- & polysacch.           -                  (22)
lysozyme                          β r h x    4     peptidoglycan                 di-sacch.          (23)
xylanase                          β r h x    ?     xylan                         xylobiose/-triose  (24)
glycogen debranching enzyme       α r t e    4→4   α(1→4) part of branch, leaving α(1→6)glucose branch
  ,, ,, ,, (second [...])         α i h x    6     α(1→6)glucose branch          β-G1               (25)
cellobiohydrolase                 β i/r h x  4/3   β-glucans/xylans/cellulose    G2, G3             (26)

The specificity (Spec.) is indicated as follows: α or β indicates the anomeric configuration, which is retained (r) or inverted (i) in the product. The enzyme is a hydrolase (h) or transglycosylase (t), acting as an endo (e)-, exo (x)-, or as a [...] (d). The bond specificity is indicated with 3, 4 or 6 for (1→3), (1→4) or (1→6) bonds, respectively; 4/6 or 4/3 indicate dual bond specificity; → indicates the transglycosylase action. G2 is a glucose oligomer with a polymerization degree of 2. For references see:

(1): (Matsuura et al., 1984)
(2): (Zhou et al., 1992)
(3): (Boel et al., 1990)
(4): (Diderichsen et al., 1991)
(5): (Lawson et al., 1990)
(6): (Baecker et al., 1986)
(7): (Kuriki et al., 1988)
(8): (Mathupala et al., 1993)
(9): (Candussio et al., 1990)
(10): (Mikami et al., 1992)
(11): (Svensson et al., 1986b)
(12): (Svensson, 1988)
(13): (Kizaki et al., 1993)
(14): (Kizaki et al., 1993)
(15): (Amemura et al., 1988)
(16): (Kuriki and Imanaka, 1989)
(17): (Withers et al., 1990)
(18): (Overbeeke et al., 1990)
(19): (Jenkins et al., 1995)
(20): (DePinto and Campbell, 1968)
(21): (Hata et al., 1991)
(22): (Takaha et al., 1993)
(23): (Blake et al., 1967)
(24): (Campbell et al., 1993)
(25): (Yang et al., 1992)
(26): (Divne et al., 1994)

Glycosylases, specificity and mechanism

Glycosylases are enzymes that catalyze the cleavage of α- or β-glycosyl bonds. Glycosyl hydrolases transfer the glycosyl bond to a water molecule; glycosyl transferases transfer this glycosyl bond to the -OH group of another glycosyl residue. Numerous glycosylases, presently up to 482 enzymes in 45 families (Henrissat and Bairoch, 1993), are known, which form different products and display a complete range of different bonds they specifically attack. For example, an α-amylase hydrolyses endo-α(1→4) bonds in chains and produces linear α(1→4)-linked oligosaccharides (a so-called retaining enzyme), whereas a β-amylase also hydrolyses α(1→4) bonds, but cuts off only maltose from the non-reducing ends (exo) and forms a different anomeric configuration (β-D-glucose; a so-called inverting enzyme). Some enzymes prefer to use chains of a certain length as a substrate; for instance, cyclodextrin glycosyltransferases predominantly use the A- and B1-chains (single-branched B-chains) of amylopectin with a degree of polymerization of 13, 14 or 15 (Figure 2). These specificities are illustrated in Table 2, which displays a selection of glycosylases acting on different (poly)saccharides.

Enzymes that interact with polymeric carbohydrate substrates possess extended binding sites that have the ability to interact with up to 10 glycosyl residues. Increasingly, X-ray crystallographic studies of protein-carbohydrate complexes result in identification of these protein-ligand interactions, also providing information about factors determining the carbohydrate substrate and product specificities of different enzymes (Table 3). Each glycosyl residue of the polymer binds at a highly specific subsite in the active site cleft of the protein. At each subsite binding energy is generated by hydrogen bonds with the OH-groups of the carbohydrates, by Van der Waals interactions with aromatic residues, or by the hydrophobic effect from displacement of bound water molecules (Johnson et al., 1988; Quiocho, 1986; Quiocho, 1989).


Table 3: Three-dimensional structures of CGTases and some other glycosylases complexed with carbohydrate ligands.

Enzyme                                      Ligand                                   Resolution (Å)  Reference
cyclodextrin glycosyltransferase
  B. stearothermophilus                     glucose, maltose, [...]                  5.0             (1)
  B. circulans strain 8 (D229A-mutant)      maltose, modelled cyclodextrin           2.5             (2)
  Thermoanaerobacterium thermosulfurigenes  modelled maltononaose inhibitor          2.3             (3)
  B. circulans strain 251                   maltose (3x)                             2.0             (4)
                                            acarbose                                 2.5             (5)
                                            maltotetraose                            2.2             (6)
    (mutant D229N/E257Q)                    maltotetraose, α-cyclodextrin (2x)       2.2, 2.6        (6)
                                            maltononaose inhibitor (+G5,G3,G2)       2.6             (7)
    (mutant Y89D/S146P)                     maltohexaose                             2.4             (8)
α-amylase
  Aspergillus oryzae (Taka amylase A)       [...]                                    3.0             (9)
                                            6'-deoxy-6'-iodo-maltotriose             6.0             (9)
  Porcine pancreas                          thio-maltotriose analogue                -               (10)
                                            acarbose                                 -               (11)
β-amylase
  soybean                                   α-cyclodextrin                           3.3             (12)
neuraminidase
  influenza virus A (subtype N2)            NAN, 2-deoxy-2,3-dehydro-NAN             1.8, 2.9        (13)
  influenza virus B                         NAN                                      2.8             (14)
lysozyme
  hen egg-white                             NAM-NAG-NAM                              1.5             (15)
  bacteriophage T4                          (NAG)3, β-NAG, NAM                       2.4, 2.4, 2.4   (16)
cellobiohydrolase II
  Trichoderma reesei                        o-iodobenzyl-1-thio-β-cellobioside       2.5             (17)
                                            4-methylumbelliferyl-β-D-cellobioside    2.5             (17)
                                            glucose-cellobiose                       2.0             (17)
endoglucanase D
  Clostridium thermocellum                  o-iodobenzyl-1-thio-β-cellobioside       2.3             (18)

Glucose oligosaccharides and their sizes (n) are indicated as Gn; NAN, N-acetylneuraminic acid; NAM, N-acetylmuramic acid; NAG, N-acetylglucosamine.

For references see:

(1) (Kubota et al., 1991)
(2) (Klein et al., 1992)
(3) (Knegtel et al., 1996)
(4) (Lawson et al., 1994) (Chapter 2 of this thesis)
(5) (Strokopytov et al., 1995)
(6) (Knegtel et al., 1995) (Chapter 4 of this thesis)
(7) (Strokopytov et al., 1996) (Chapter 5 of this thesis)
(8) Chapter 6 of this thesis
(9) (Matsuura et al., 1979)
(10) (Payan et al., 1980)
(11) (Buisson et al., 1987)
(12) (Mikami et al., 1993)
(13) (Varghese et al., 1992)
(14) (Burmeister et al., 1992)
(15) (Divne et al., 1994)
(16) (Anderson et al., 1981)
(17) (Rouvinen et al., 1990)
(18) (Juy et al., 1992)

To determine the contribution to the Gibbs free energy (ΔG) of the different subsites, or of the residues which interact with the glycosyl residues at these subsites, two methods are being applied: firstly, the -OH groups of the ligands are exchanged with -H to measure the individual contribution of these -OH groups (Adelhorst and Bock, 1992), and secondly, site-directed mutants are compared with wild-type protein (Nakamura et al., 1993). Furthermore, a kinetic analysis of product formation involving different subsites can be used to calculate the affinities of the individual subsites (Fagerstrom, 1991). Changes made in the subsites of glucoamylase were shown to result in modified action patterns of the enzyme (Sierks et al., 1989; Sierks et al., 1990).

Three different catalytic mechanisms for hydrolysis by glycosylases have been proposed: the inverting mechanism, the retaining mechanism proceeding through an oxo-carbonium ion intermediate, and the retaining mechanism involving a covalent intermediate (see Figure 3). In all glycosylases one to three invariant carboxylic acids are found in the active site cleft. For the CGTase of Bacillus circulans 251 these residues are Glu257, Asp229 and Asp328. In the proposed mechanism for inverting enzymes the glycosidic oxygen is initially protonated by a general acid catalyst, which is followed by a nucleophilic attack of a water molecule, activated by a carboxylate base, on the C1 atom of the sugar at subsite 1 (Figure 3 III), leading to inversion of the anomeric configuration. This mechanism is known as a single-displacement mechanism (Koshland, 1953); bond breaking and bond making both proceed in a single concerted step. The reaction rate depends on the concentrations of both nucleophile and substrate, kinetically known as a second-order nucleophilic substitution (SN2).
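The second-order dependence described for the single-displacement mechanism can be written as a rate law. The symbols used here (v, k, [S], [Nu], k_obs) are notation chosen for illustration and do not appear in the thesis; the second line assumes that the nucleophile is water at an effectively constant concentration, in which case the rate appears first order in substrate.

```latex
% Second-order (S_N2-type) rate law: rate depends on substrate and nucleophile
v = k\,[\mathrm{S}]\,[\mathrm{Nu}]

% Assumed special case: water as nucleophile, [H2O] effectively constant
v \approx k_{\mathrm{obs}}\,[\mathrm{S}], \qquad k_{\mathrm{obs}} = k\,[\mathrm{H_2O}]
```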


Figure 3. Mechanisms of glycosylases: retaining enzymes acting through an oxo-carbonium ion intermediate (I) or via a covalent intermediate (II), and an inverting enzyme (III). HA and A⁻ indicate the catalytic general acid and base, respectively. Reproduced from Svensson and Sogaard, 1993, with modifications.


The retaining reactions (Figure 3 I and II) proceed via a double-displacement mechanism (Vernon, 1967). The first step involves a protonation of the glycosidic oxygen by a general acid, similar to that in the inverting mechanism, creating an intermediate which in a second step is attacked by a water nucleophile, assisted by the base form of the acid catalyst. Each step inverts the configuration of the anomeric carbon. The two displacement steps therefore result in an overall retention of the configuration. For retaining enzymes the intermediate could either be an oxo-carbonium ion (Figure 3 I), which is electrostatically stabilized by a carboxylate, or a covalently bound species (Figure 3 II), in which one of the catalytic aspartates (in some cases a glutamate) is presumed to act as a nucleophile.

Kinetically the covalent-bond mechanism involves a second-order substitution (SN2), while the other is thought to be a first-order mechanism (SN1). Transglycosylases (e.g. CGTase) employ a reaction mechanism similar to that described for retaining hydrolases. For these enzymes, however, the second step of the reaction does not involve a water nucleophile, but the non-reducing end of a saccharide, possibly assisted by the base form of the acid catalyst. In Figure 3 I and II, the saccharide involved in the transglycosylation reaction, attached at C4 or C6, is indicated by R′. Evidence that the retaining mechanism proceeds via an electrostatic stabilization of an oxo-carbonium ion intermediate has been presented for barley α-amylase; the interatomic distances in the structure of a complex of this enzyme with acarbose, a pseudo-tetrasaccharide, disfavor the formation of a covalent intermediate (Svensson et al., 1994). Similar results were found for hen egg white lysozyme (HEWL) in a complex with N-acetylmuramic acid - N-acetylglucosamine - N-acetylmuramic acid (NAM-NAG-NAM) (Strynadka and James, 1991; Hadfield et al., 1994). Kinetic evidence for the alternative (SN2) mechanism, involving a covalent intermediate, has been presented for β-glucosidase and β-galactosidase (Sinnott, 1990; Withers et al., 1990).
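The stereochemical bookkeeping behind "inverting" and "retaining" can be illustrated with a toy model: every displacement at the anomeric carbon flips its configuration, so one displacement inverts and two displacements restore the original configuration. The function names below are hypothetical and the model deliberately ignores all chemistry except the parity of the number of displacements.

```python
def invert(configuration):
    """Flip the anomeric configuration, as one displacement at C1 does."""
    return {"alpha": "beta", "beta": "alpha"}[configuration]


def product_configuration(substrate_configuration, displacements):
    """Configuration of the product after the given number of displacement steps."""
    configuration = substrate_configuration
    for _ in range(displacements):
        configuration = invert(configuration)
    return configuration


# Single displacement on an alpha(1->4) bond (inverting enzyme, e.g. beta-amylase).
print("single displacement:", product_configuration("alpha", 1))

# Double displacement on an alpha(1->4) bond (retaining enzyme, e.g. alpha-amylase or CGTase).
print("double displacement:", product_configuration("alpha", 2))
```

The same parity argument holds whether the intermediate of the double-displacement route is an oxo-carbonium ion or a covalent species; the sketch does not distinguish between the two.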


Table 4: Classification of glycosylases based on sequence similarity.

Enzyme                            Abbreviation  EC number      Family (number of sources)
α-amylase                         AAMY          3.2.1.1        13 (35)
α-galactosidase                   AGAL          3.2.1.22       4 (1) 27 (3) 36 (2) u (1)
α-L-fucosidase                    AFUC          3.2.1.51       29 (3)
agarase                           AGAR          3.2.1.81       16 (1) u (2)
α-glucosidase                     AGLU          3.2.1.20       13 (5) 31 (2)
α-L-iduronase                     AIDU          3.2.1.76       39 (1)
α-mannosidase                     AMAN          3.2.1.24/114   38 (3) u (2)
amyloglucosidase                  AMG           3.2.1.3        15 (8) 31 (1)
amylase/pullulanase               APU           3.2.1.1/41     13 (1)
α-arabinofuranosidase             ARAF          3.2.1.55       43 (1) u (1)
β-amylase                         BAMY          3.2.1.2        14 (8)
β-galactosidase                   BGAL          3.2.1.23       1 (2) 2 (9) 35 (2) 42 (2)
β-glucuronidase                   BGLR          3.2.1.31       2 (4)
β-glucosidase                     BGLU          3.2.1.21       1 (10) 3 (11) 40 (2) 41 (2)
β-mannanase                       BMAN          3.2.1.78       5 (1) 26 (1) 44 (1)
β-xylosidase                      BXYL          3.2.1.37       39 (2) 43 (2) u (1)
cellobiohydrolase                 CBH           3.2.1.91       6 (1) 7 (6) 5 (1) 10 (2)
cyclodextrinase                   CDX           3.2.1.54       13 (1)
cellodextrinase                   CED           3.2.1.74       3 (1) 5 (1)
cyclodextrin glycosyltransferase  CDGT          2.4.1.19       13 (7)
chitinase                         CHI           3.2.1.14       18 (22) 19 (23)
chitosanase                       CHITO         3.2.1.-        u (1)
dextranase                        DEX           3.2.1.11       u (1)
evolved-β-galactosidase           EBGAL         3.2.1.23       2 (1)
endoglucanase                     EG            3.2.1.4        5 (42) 6 (5) 7 (1) 8 (4) 9 (14) 10 (1) 12 (2) 26 (2) 44 (2) 45 (2) u (2)
endo-N-acetyl-β-glucosaminidase   endoNAG       3.2.1.96       18 (3)
exo-1,3-β-glucanase               EXG           3.2.1.58       5 (2)
exo-β-fructosidase                FRU           3.2.1.80       32 (1)
malto-tetraohydrolase             G4-AMY        3.2.1.60       13 (2)
malto-pentaohydrolase             G5-AMY        3.2.1.-        13 (1)
malto-hexaohydrolase              G6-AMY        3.2.1.98       13 (2)
glucodextrinase                   GDX           3.2.1.70       13 (1)
glucocerebrosidase                GLRB          3.2.1.45       30 (2)
hyaluronidase                     HYAL          3.2.1.36       u (1)
isoamylase                        IAMY          3.2.1.68       13 (2)
inulinase                         INU           3.2.1.7        32 (1)
invertase                         INV           3.2.1.26       32 (11)
laminarinase                      LAM           3.2.1.39       16 (2) 17 (11) u (1)
levanase                          LEV           3.2.1.65       32 (1)
lichenase                         LIC           3.2.1.73       8 (1) 16 (6) 17 (3)
lactase/phlorizin hydrolase       LPH           3.2.1.62/108   1 (3)
levansucrase                      LVS           2.4.1.10       32 (1)
lysozyme                          LYS           3.2.1.52/17    21 (1) 22 (5) 23 (2) 24 (1) 25 (3) u (1)
N-acetyl-α-galactosaminidase      NAAGAL        3.2.1.49       27 (1)
N-acetyl-β-glucosaminidase        NABGLU        3.2.1.30/52    20 (5)
neuraminidase                     NEUR          3.2.1.18       33 (2) 34 (1)
neopullulanase                    NPUL          3.2.1.-        13 (1)
oligo-1,6-α-glucosidase           OGLU          3.2.1.10       13 (4)
6-phospho-β-galactosidase         PBGAL         3.2.1.85       1 (3)
6-phospho-β-glucosidase           PBGLU         3.2.1.86       1 (2) 4 (1)
polygalacturonase                 PGLR          3.2.1.15       28 (8)
pullulanase                       PUL           3.2.1.41       13 (4)
spore-germinating spec. protein   SGSP          -              9 (1)
sucrase/isomaltase                SI            3.2.1.48/10    31 (3)
toxin-α-chain                     -             3.2.1.14       18 (1)
trehalase                         TREH          3.2.1.28       37 (2) u (1)
exo-laminarinase                  XLAM          3.2.1.58       17 (1)
exo-polygalacturonase             XPGLR         3.2.1.82       28 (1)
xylanase                          XYN           3.2.1.8        10 (4)
open reading frame                ORF           -              10 (1) 11 (21)

Data from (Henrissat, 1991) and (Henrissat and Bairoch, 1993). For more details and references, these papers should be consulted. The right-hand column indicates the family (or families) that an enzyme belongs to; the number of sources for a certain family is indicated between brackets; u indicates unclassified enzymes.

Classification of glycosylases

Classification of enzymes based upon the type of reaction catalysed and their substrate specificity does not take into account evolutionary events or sequence (and structural) similarities. A classification into 45 families, based on alignments of 482 amino acid sequences, was therefore constructed; 23 families were found to be monospecific (one EC number) and 22 polyspecific (Table 4) (Henrissat, 1991; Henrissat and Bairoch, 1993). Sequence similarity is a strong indication of folding similarity, and members of one family most likely share the same fold. For example, α-amylases and CGTases, both belonging to family 13, have been found to share superimposable domains. On the other hand, several enzymes were found to fall into more than one family, e.g. 10 families in the case of endoglucanase. Since enzyme structures are generally better conserved than protein sequences, it remains possible that, for instance, some of these different families of endoglucanases possess similar folds. Further information will be obtained by studying the three-dimensional structures of different enzymes from different families; however, only a limited number of enzyme structures are available at present. A detailed comparative analysis of the primary structures of glycosyl hydrolases may also serve to locate the potential active site residues, which are strongly conserved among these enzymes. The usefulness of this approach has been verified experimentally (Henrissat and Bairoch, 1993).
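The distinction between monospecific and polyspecific families can be made concrete with a short sketch. The Python snippet below is an illustration only: it takes five rows copied from Table 4, inverts them into a family-to-EC-number mapping, and labels each family. The names `TABLE4_SUBSET` and `classify_families` are hypothetical, and the labels reflect only the rows included here, not the full data set of Henrissat and Bairoch.

```python
from collections import defaultdict

# Five rows copied from Table 4: abbreviation -> (EC number, families).
# The selection is arbitrary and serves only as an illustration.
TABLE4_SUBSET = {
    "AAMY": ("3.2.1.1", [13]),       # alpha-amylase
    "CDGT": ("2.4.1.19", [13]),      # cyclodextrin glycosyltransferase
    "AGLU": ("3.2.1.20", [13, 31]),  # alpha-glucosidase
    "AMG": ("3.2.1.3", [15, 31]),    # amyloglucosidase
    "BAMY": ("3.2.1.2", [14]),       # beta-amylase
}


def classify_families(table):
    """Label each family as monospecific (one EC number) or polyspecific."""
    ec_by_family = defaultdict(set)
    for ec_number, families in table.values():
        for family in families:
            ec_by_family[family].add(ec_number)
    return {
        family: "monospecific" if len(ec_numbers) == 1 else "polyspecific"
        for family, ec_numbers in sorted(ec_by_family.items())
    }


labels = classify_families(TABLE4_SUBSET)
for family, label in labels.items():
    print(f"family {family}: {label}")
```

In this subset family 13 collects three different EC numbers (α-amylase, α-glucosidase, CGTase) and is therefore polyspecific, whereas family 14 contains only β-amylase.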

Figure 4. Schematic representation of the (β/α)8-barrel; secondary structure elements are indicated as arrows (β-strands) and as helices (α-helices). The N-terminal (N) and C-terminal (C) ends are also indicated.

Structural comparisons of glycosylases

In the past years the three-dimensional structures of a number of glycosylases have been solved. The most common structural feature observed is the so-called (β/α)- or TIM-barrel catalytic domain (named after chicken muscle triosephosphate isomerase) (Janecek, 1994) of 300-400 residues, present in most of these enzymes. This structural domain, designated A, contains a highly symmetrical fold of eight parallel β-strands encircled by eight α-helices (Figure 4). In the α-amylase family the catalytic and substrate binding residues are located at the C-termini of the β-strands. Not all (β/α)8-barrels are the same; structural analysis of soybean β-amylase revealed the presence of a TIM-barrel fold different from that of the α-amylase family (Mikami et al., 1992).

In the structural subfamily of (β/α)8-proteins that contains α-amylases and CGTases, a smaller structural domain (B) is inserted between β-strand 3 and α-helix 3. This B-domain consists of 44-133 amino acids and contributes to substrate binding. A second long loop in the (β/α)8-barrel was discovered for enzymes that cleave α(1→6) bonds. This additional domain may protrude from the first, second, or even seventh β-strand (in isoamylase) of the (β/α)8-barrel (see Figure 5).

In the structures of α-amylases and CGTases the (β/α)8-barrel is followed by a C-domain of approximately 100 amino acids with an antiparallel β-sandwich fold. The function of this domain is not known. Some authors believe that this domain is necessary for enzyme activity, since it was found that enzymes hydrolysing or forming α(1→6) bonds do not contain a C-domain (Jespersen et al., 1991). Following the C-domain, CGTases possess a domain of approximately 90 amino acids with an immunoglobulin fold (Hofmann et al., 1989; Klein and Schulz, 1991; Lawson et al., 1994 (Chapter 2)). The function of this D-domain, probably also present in maltogenic α-amylases, is unknown. Several glycosylases carry a putative raw-starch binding domain (E-domain) of approximately 110 amino acids at their C-terminal (or N-terminal) end. Characterization of a glucoamylase mutant with a deleted raw-starch binding domain showed that the function of this domain is in binding to starch granules, and that it is not important for with raw or soluble starch (Svensson et al., 1986b). The E-domain of CGTase from B. circulans strain 251 was found to interact with maltose and longer oligosaccharides (Knegtel et al., 1995 (Chapter 4)).

Using sequence similarity comparisons, a scheme containing 19 different glycosylases was constructed (Figure 5) (Jespersen et al., 1991). This scheme also predicted the domain organization of nine types of enzymes not previously examined. The structural (β/α)8-domain is present in all enzymes except glucoamylases, which contain a different catalytic domain, called J. The differently folded A-domain in β-amylases is depicted in gray. B. polymyxa β-amylase contains two A-domains, which might indicate that this enzyme has two active sites. In most enzymes the B-domain is present between β-strand 3 and α-helix 3, but it occurs at different positions in pullulanase and α-amylase-pullulanase. In these two enzymes and in isoamylase the additional domains after β-strands 1, 2 and 7 are indicated. The C-, D-, and E-domains are not commonly present; the enzymes containing them might therefore be most closely related. On the other hand, the starch binding domain (E-domain) is also present in glucoamylases, connected via a glycosylated hinge. Besides the five domains discussed above, seven other domains with unknown function can be recognized. These domains may play a role in determining enzyme specificities. For example, some enzymes which act in an endo-fashion on pullulan, and the enzymes acting on α(1→6) bonds, possess additional domains N-terminal of the (β/α)8-barrel (Jespersen et al., 1991).


Figure 5. The domain organization of several glycosylases (reproduced from Jespersen et al., 1991, with modifications). For TAA, CGT, βSb, Mal, 1,6G, GAn, and GRh the architecture is based on the crystal structures. The organization of the other enzymes is based on structure predictions and sequence similarity.

The domains are indicated as A-J, or numbered, indicating the loop of the (β/α)8-barrel from which they protrude. Empty segments indicate regions with no sequence similarity. A glycosylated ‘hinge’ region is shown as a . Enzyme names are abbreviated as follows: TAA: α-amylase from Aspergillus oryzae (Taka-amylase A); G4α: malto-tetraohydrolase from Pseudomonas saccharophila; αSli: α-amylase from Streptomyces limosus; G2α: maltogenic α-amylase from Bacillus stearothermophilus; CGT: cyclodextrin glycosyltransferase from Bacillus circulans; BE: branching enzyme from Escherichia coli; PuKa: pullulanase from Klebsiella aerogenes; αPU: α-amylase-pullulanase from Clostridium thermohydrosulfuricum; G5α: maltopentaose-producing amylase from an alkalophilic Gram-positive bacterium; βSb: β-amylase from soybean; βCt: β-amylase from Clostridium thermosulfurigenes; Mal: maltase from Saccharomyces cerevisiae; 1,6G: oligo-1,6-glucosidase from Bacillus cereus; DxG: dextran glucosidase from Streptococcus mutans; Iso: isoamylase from Pseudomonas amyloderamosa; PuBs: pullulanase from Bacillus stearothermophilus; NPu: neopullulanase from B. stearothermophilus; βBp: β-amylase from Bacillus polymyxa; GAn: glucoamylase from Aspergillus niger; GRh: glucoamylase from Rhizopus oryzae.


Evolutionary relationships

Glycosylases show high similarity in the (β/α)8-barrel catalytic domain, which is found in most enzymes metabolizing α(1→4)- or α(1→6)-glucosidic bonds. For example, Taka-amylase A from Aspergillus oryzae (Matsuura et al., 1984) and CGTase from B. circulans strain 251 (Lawson et al., 1994 (Chapter 2)) share 30 % similarity in the A-domain. The best conserved segments of this domain are β-strands 3, 4, 5, and 7, which are shown here for these two enzymes.

[Sequence alignment of the regions from β-strands 3, 4, 5, and 7 of Taka-amylase A and CGTase. * indicates an invariant residue; numbering is that of the mature protein.]

The variations in the other barrel elements may be related to the enzyme specificities of the different glycosylases. Using the corresponding regions of 47 glycosylases, an unrooted tree was constructed (see Figure 6). This evolutionary tree illustrates the differences in taxonomy of the enzyme source, as well as the differences in enzyme specificity. As an illustration of the evolutionary distance of the enzyme source, the plant amylopectin-branching enzyme (29) is well separated from the bacterial glycogen-branching enzymes (28, 30). For α-amylases a less distinct separation can be observed. Their place in the tree may be determined to a great extent by the differences in specificity. Groups of enzymes with related specificities can be found in the tree: CGTases are clustered, next to other transferases, and next to bacterial, fungal, plant, and animal α-amylases, starch-debranching enzymes, , and isoamylase. The two bacterial malto-hexaohydrolases form an exception in the sense that they do not cluster with each other: both enzymes are of Bacillus origin, but are distantly related (41 and 42). Strikingly, cyclodextrinase (31) is also less closely related to CGTases than might be expected. The group of α-amylases (1-22) contains two enzymes (4 and 7) which were later shown to be CGTases: (4), from B. circulans strain F2 (Nishizawa et al., 1987), and (7), from Bacillus sp. strain B1018 (Itkor et al., 1990; Nitschke et al., 1990). Recently it was also shown that the Thermoanaerobacterium thermosulfurigenes EM1 (formerly called C. thermosulfurigenes) “α-amylase” (Bahl et al., 1991) possesses a cyclization activity similar to that of other CGTases; amino acid alignments with other CGTases also showed high similarity (Wind et al., 1995). Thus, careful identification of new enzymes is critical; enzymes capable of starch degradation that show similarity in the conserved regions of the (β/α)8-barrel catalytic domain are not necessarily α-amylases. Moreover, the structure of CGTases shows that these enzymes possess two additional domains, D and E, which are absent in α-amylases. CGTases generally have a molecular mass of 70 to 75 kDa, whereas α-amylases generally measure 45-55 kDa.

Figure 6. Distance tree of the (β/α)8-barrel type glycosylases. The dotted line indicates the uncertain distance between the center of the tree and the branching points. The branch lengths are proportional to the evolutionary distances. Specificity for α(1→4) bonds is indicated as a white area, α(1→6) bonds in gray, and dual-bond specificity with a D. The segment sticking out indicates transglycosylation enzymes; others are hydrolyzing enzymes. The letters in parentheses indicate the enzyme origins: animal (a), bacterial (b), fungal (f), plant (p), or Streptomyces (S) sources. AA: α-amylase; AG: α-glucosidase; AP: α-amylase-pullulanase; BE: branching enzyme; CGT: cyclodextrin glycosyltransferase; DG: dextran glucosidase; GD: glycogen debranching enzyme; G2-A: maltogenic α-amylase; G4-A: malto-tetraohydrolase; G6-A: malto-hexaohydrolase; OG: oligo-1,6-glucosidase. 1-22: α-amylases; 23-26: α-glucosidases; 27: α-amylase-pullulanase; 28-30: branching enzymes; 31: cyclodextrinase; 32-35: cyclodextrin glycosyltransferases; 36: dextran glucosidase; 37: glycogen debranching enzyme; 38: maltogenic α-amylase; 39-40: malto-tetraohydrolase; 41-42: malto-hexaohydrolase; 43: isoamylase; 44: neopullulanase; 45: oligo-1,6-glucosidase; 46-47: pullulanases. Reproduced from Jespersen et al., 1993, with modifications.

III CYCLODEXTRIN GLYCOSYLTRANSFERASES

Bacillus species produce a variety of secreted enzymes of industrial importance, and different glycosylases have been studied extensively. One of these enzymes, cyclodextrin glycosyltransferase (CGTase), cleaves α(1→4) bonds in a starch molecule, concomitantly linking the reducing and non-reducing ends to produce a cyclic molecule. The CGTase used in this study was isolated from the Gram-positive bacterium Bacillus circulans, a mesophilic soil bacterium first described in 1890, the colonies of which display internal circular movements on solid growth media (Ford, 1916). When starch is used as a carbon source the organism (strain 251, RIV nr. 11115; Rijksinstituut voor Volksgezondheid, Bilthoven, The Netherlands) excretes CGTase into the medium (see Figure 7). This CGTase converts starch into cyclodextrins, which are subsequently degraded by the enzyme cyclodextrinase (CDase) (Saha and Zeikus, 1990), which is associated with the membrane and is located at the cytosolic side (data not shown). B. circulans strain 251 is able to grow not only on starch but also on glucose, and will only synthesize CGTase and CDase when glucose is not available. There are two possible explanations for the existence of this complicated system. Most likely, by producing cyclodextrins, the organism builds up an external storage form of glucose that is not accessible to most other organisms because they are not able to metabolize cyclodextrins. A second explanation may be that the cyclodextrins produced by the organism are used to form inclusion complexes with toxic compounds in its surroundings, or with compounds needed for growth, such as hexadecane, which is made available to the cells of a sulfate-reducing bacterium by α-cyclodextrin (Aeckersberg et al., 1991).

Figure 7. Schematic representation of the location and action of cyclodextrin glycosyltransferase (CGTase) and cyclodextrinase (CDase). CGTase is excreted by the bacterium and produces a cyclodextrin (CD) from an amylose chain. A cell-associated CDase converts cyclodextrins into glucose and other oligosaccharides (Feederle et al., 1996). Glucose (Glu) is subsequently used in glycolysis to produce pyruvate (Pyr) and ATP. Small circles indicate glucose residues.

Figure 8. Schematic representation of the different activities of cyclodextrin glycosyltransferase. Cyclization, coupling, disproportionation, and hydrolysis are transferase reactions. Circles indicate glucose residues; white circles indicate reducing ends, which are only produced in the last reaction.

CGTases are enzymes capable of several transferase reactions, in which a newly made reducing end of an oligosaccharide is transferred to an acceptor molecule. Depending on the nature of the acceptor molecule, four transferase reactions (cyclization, coupling, disproportionation, and hydrolysis) can be distinguished (Nakamura et al., 1993). 1) Cyclization is the transfer of the reducing end sugar to another sugar residue in the same oligosaccharide chain, resulting in the formation of a cyclic compound. 2) Coupling is the reaction in which a cyclodextrin molecule is combined with a linear oligosaccharide chain to produce a longer linear oligosaccharide. 3) Disproportionation is the transfer of part of a linear oligosaccharide chain to another linear acceptor chain. Starting from two molecules of a pure oligosaccharide, this reaction yields a mixture of shorter and longer oligosaccharides. 4) In hydrolysis (saccharifying activity) the newly made reducing end is transferred to water (see Figure 8).
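The way the four reactions follow from the donor and the acceptor can be summarized as a small decision function. The Python sketch below is a deliberate simplification of the definitions given above; the function name and the string labels for donor and acceptor are hypothetical and serve only to illustrate the logic.

```python
def classify_cgtase_reaction(donor: str, acceptor: str) -> str:
    """Name the CGTase reaction for a donor/acceptor combination.

    donor:    'linear' or 'cyclodextrin' (illustrative labels)
    acceptor: 'same chain', 'other chain', or 'water' (illustrative labels)
    """
    if donor not in ("linear", "cyclodextrin"):
        raise ValueError(f"unknown donor: {donor!r}")
    if acceptor not in ("same chain", "other chain", "water"):
        raise ValueError(f"unknown acceptor: {acceptor!r}")

    # Transfer of the newly made reducing end to water.
    if acceptor == "water":
        return "hydrolysis"
    # A cyclodextrin ring is opened and joined to a linear chain.
    if donor == "cyclodextrin" and acceptor == "other chain":
        return "coupling"
    # Transfer within one linear chain closes a ring.
    if donor == "linear" and acceptor == "same chain":
        return "cyclization"
    # Transfer between two linear chains.
    if donor == "linear" and acceptor == "other chain":
        return "disproportionation"
    raise ValueError(f"no reaction defined for {donor!r} -> {acceptor!r}")


for donor, acceptor in [
    ("linear", "same chain"),
    ("cyclodextrin", "other chain"),
    ("linear", "other chain"),
    ("linear", "water"),
]:
    print(f"{donor} -> {acceptor}: {classify_cgtase_reaction(donor, acceptor)}")
```

Written this way, coupling appears as the reverse of cyclization: the same bond is involved, with the cyclodextrin acting as donor instead of product.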

Sequence comparison of CGTase

At present at least 38 CGTase enzymes have been identified and purified, and the corresponding genes cloned, mainly from Bacillus species, but also from Thermoanaerobacterium, Thermoanaerobacter, and Micrococcus species, and from Klebsiella, the single Gram-negative source. Table 5 shows an overview of the 21 CGTases for which the amino acid sequences have been published. For B. stearothermophilus strain NO2 three identical CGTase sequences have been published: CGT1, CGT5, and CGT232 (Fujiwara et al., 1992b). Fifteen other CGTases, from B. megaterium strain T5 (Kitahara and Okada, 1974), Micrococcus sp. (Yagi et al., 1987), alkalophilic Bacillus sp. strain 290-3 (Nitschke et al., 1990), alkalophilic Bacillus sp. from a deep sea mud sample (Georganta et al., 1993), Bacillus sp. strain 135, Bacillus sp. strain 169, Bacillus sp. strain 13 (Hokse et al., 1981), B. amyloliquefaciens (Yu et al., 1988), Klebsiella oxytoca strain 19-1, B. circulans strain E192 (Bovetto et al., 1992), B. circulans strain C31 (Pongsawasdi and Yagisawa, 1988), B. circulans var. alkalophilic (Mattsson et al., 1992), Thermoanaerobium brockii, Brevibacterium sp. strain 9605 (Mori et al., 1994), and B. lentus (Sabioni and Park, 1992), are not listed because of a shortage of information in the literature. In general, CGTases show clear sequence similarity, ranging from 47 to 99 %. The data in Table 5 show that differences between CGTases of different strains within a species are surprisingly large, as can be seen for the two B. circulans and B. macerans enzymes. The difference between the enzymes of the two B. stearothermophilus strains, on the other hand, is only one amino acid. Table 5 also indicates that overall sequence similarity does not coincide with product specificity. Although the β-cyclodextrin producing CGTases from Bacillus sp. strain 6.6.3 and B. circulans strain 8 are highly similar, and the CGTase proteins from B. circulans strain 251, Bacillus sp. strain 1011, strain 38.2, and strain 17.1 also form a group with high similarity, this is not observed for other β-cyclodextrin producing CGTases and for proteins which form mainly α-cyclodextrin. The similarity between the two γ-cyclodextrin-specific CGTases is also very low, although the B. subtilis strain 313 enzyme displays a low similarity (47-64 amino acids) with all other proteins. This B. subtilis strain 313 protein is the most dissimilar protein, lacking the complete D-domain. On the other hand, it might well be that this sequence contains mistakes, like the frame shifts that were discovered for the B. circulans F2 protein (Nitschke et al., 1990). The CGTase from a Gram-negative source, the K. pneumoniae enzyme, also displays a low similarity with the other CGTases, and lacks a large part of the D-domain.


Table 5. Overview of cyclodextrin glycosyltransferases.

No.  Organism                               Size (no. of a.a.)  Similar a.a.  Main product  Remarks
1    B. circulans strain 251                686 (713)           -             β
2    B. circulans strain 8                  684 (718)           501           β
3    B. circulans strain F2                 501* (528)          379           β             first described as α-amylase
4    alkalophilic Bacillus sp. strain 17.1  686 (713)           640           β
5    alkalophilic Bacillus sp. strain 1011  686 (713)           624           β
6    alkalophilic Bacillus sp. strain 38.2  684 (712)           610           β
7    alkalophilic Bacillus sp. strain 1-1   678 (703)           382           β
8    B. ohbensis                            675 (704)           384           β/γ
9    B. licheniformis strain CGT-A          684 (718)           503           α/β
10   B. macerans strain CGT-M               687 (714)           475           α
11   B. macerans                            686 (713)           467           α
12   K. pneumoniae strain M5aI              624 (655)           146           α             lacks 90% of D-domain
13   B. stearothermophilus strain NO2       683 (711)           432           α
14   B. stearothermophilus                  683 (711)           431           α
15   T. thermosulfurigenes strain EM1       683 (710)           478           β/α           first described as α-amylase
16   Thermoanaerobacter sp.                 683 (710)           472           α/β
17   Bacillus sp. strain 6.6.3              684 (718)           498           β
18   Bacillus sp. strain B1018              686 (713)           697           β             first described as raw-starch digesting amylase
19   Bacillus sp. strain KC201              678 (725)           381           β
20   unknown sp.                            673 (699)           332           γ
21   B. subtilis strain 313                 501                 57            γ             lacks complete D-domain

The size indicated is the number of amino acids (a.a.) of the mature protein; the size including the signal peptide is given in parentheses. *Due to a frame shift, this size is too small: later a larger size was calculated (Nitschke et al., 1990). Similarity with the B. circulans strain 251 CGTase was calculated using the sequences including the signal peptide with the PC/GENE “CLUSTAL” programme (K-tuple value = 1, Gap penalty = 5, Window size = 10, Filtering level = 2.5, Open gap cost = 10, Unit gap cost = 10). Similarity between pairs of different proteins (number of similar a.a.): nos. 2 and 3, 510; nos. 10 and 11, 447; nos. 13 and 14, 710; nos. 15 and 16, 620; nos. 20 and 21, 47. Main product: α-, β-, or γ-cyclodextrin. For references see: 1: (Hokse et al., 1981) 2: (Nitschke et al., 1990) 3: (Nishizawa et al., 1987) 4: (Kuriki and Imanaka, 1989) 5: (Kimura et al., 1987) 6: (Horikoshi, 1988) 7: (Schmid et al., 1988) 8: (Sin et al., 1991) 9: (Hill et al., 1990) 10: (Takano et al., 1986) 11: (Sakai et al., 1987) 12: (Binder et al., 1986) 13: (Kubota et al., 1991) 14: (Fujiwara et al., 1992b) 15: (Wind et al., 1995) 16: Patent Appl. DK 96/00179 17: Akhmetzjanov, A.A., ENTREZ-NCBI seq ID: 39839 (1992) 18: (Itkor et al., 1990) 19: (Kitamoto et al., 1992) 20: patent: WO 9114770; ENTREZ-NCBI seq ID: 512550 (1994) 21: (Horikoshi, 1988)


It should also be realized that the main cyclodextrin produced by these enzymes depends to a great extent on the methods and incubation conditions used. This makes a direct comparison difficult, although it is clear that all CGTase enzymes studied produce mixtures of the three most common cyclodextrins. The similarity between the CGTases of the thermophiles T. thermosulfurigenes EM1 and Thermoanaerobacter sp. is high (Table 5), but this is not a general feature of CGTases from thermophilic sources, since only a moderate similarity with both B. stearothermophilus CGTase enzymes was observed. All CGTases studied catalyze the four reactions presented in Figure 8. Incubation with starch therefore yields not only cyclodextrins but also linear products. This is caused by the hydrolysis activity of the enzyme, followed by disproportionation and coupling activity. The CGTase from B. circulans strain 251, however, possesses a relatively low hydrolysis activity, and formation of linear products is therefore hardly observed.

Figure 9. Stereo view of the superimposed Cα-backbone traces of the crystal structures of the CGTases from the mesophiles B. circulans strain 251 and strain 8, and the thermophiles B. stearothermophilus and T. thermosulfurigenes EM1, indicated with thick lines. Also indicated are the maltose binding sites (MBS1 to 3), the variable loop regions, and the active site (*). Reproduced from Knegtel et al., 1996.


Comparison of the crystal structures of the CGTases of the mesophiles B. circulans strain 251 (Lawson et al., 1994 (Chapter 2)) and strain 8 (Klein and Schulz, 1991) (at 2.0 Å resolution), and the thermophiles B. stearothermophilus (Kubota et al., 1991) (at 2.5 Å resolution) and T. thermosulfurigenes EM1 (Knegtel et al., 1996) (at 2.3 Å resolution) shows that the overall fold of these enzymes is highly similar; the hydrophobic core is well conserved and the two calcium binding sites are virtually identical (see Figure 9). Identification of target residues for site-directed mutagenesis (to achieve specific protein engineering goals, see below) will therefore require a detailed analysis of CGTase protein structures. Most likely, information gained for a particular CGTase protein can serve as a model for other highly similar CGTase proteins. Significant differences between CGTases from thermophilic and mesophilic sources occur only at the surfaces of these proteins. Thermostable enzymes contain less flexible loops and have additional salt bridges, hydrophobic interactions and hydrogen bonds. These observations may allow construction of mutant proteins with improved thermostability (Knegtel et al., 1996).


IV SCOPE OF THIS THESIS

Protein engineering goals

The availability of the three-dimensional structure of the cyclodextrin glycosyltransferase (CGTase) protein of Bacillus circulans strain 251 allowed further analysis of this system via protein engineering, using site-directed mutagenesis combined with structural and biochemical analysis of the mutant proteins. For this project four main topics were formulated:

Elucidation of reaction pathway: a) What factors determine the differences in product specificity between α-amylases and CGTases (linear versus cyclic malto-oligosaccharides)? b) What roles do the three carboxylic acid residues in the active site cleft play in the reaction mechanism?

Improvement of product specificity: CGTases from different bacterial sources all convert starch into a mixture of cyclodextrins. The B. circulans strain 251 enzyme produces approximately 13 % α-, 64 % β- and 23 % γ-cyclodextrin. In industry, organic solvents (toluene or cyclohexane) are used in the production of β-cyclodextrin. Removal of the complexant for α-cyclodextrin (1-decanol) from aqueous systems is not feasible because of its high boiling point, and the complexant for γ-cyclodextrin (cyclododecanone) is too expensive for commercial use. Other disadvantages of selective crystallization steps are solvent toxicity, solvent flammability, and the need for a solvent recovery process. To avoid these expensive procedures, and to produce cyclodextrins suitable for applications involving human consumption, the development of a CGTase that produces only one, or mainly one, particular form of cyclodextrin is desirable.

Reduced product inhibition: When incubated for 45-50 h with 10 % jet-cooked starch, the CGTase from B. circulans strain 251 converts a maximum of 40 % of starch into cyclodextrins and less than 0.5 % into linear products. Ultrafiltration experiments show that when the products of the cyclization reaction are removed from the reaction mixture, or when diluted substrate solutions are used, up to 60 % of jet-cooked starch can be converted (Bergsma et al., 1988). The low conversion at high starch concentrations was found to be mainly caused by product inhibition, especially exerted by β-cyclodextrin. What factors cause this inhibition phenomenon, and is it possible to construct mutant CGTase proteins with reduced product inhibition?


In Chapter 2 the cloning and sequencing of the CGTase-encoding gene of B. circulans strain 251, and the elucidation of the crystal structure of the protein at 2.0 Å resolution, are described. A transformant from the chromosomal gene bank of B. circulans strain 251 was found to contain a 2139 bp open reading frame in the reverse orientation with respect to the lacZ promoter of the cloning vector. Further analysis revealed an ATG initiation codon, a TAA termination signal, and a 27 amino acid signal peptide that is removed during export. The extracellular CGTase thus contains 686 amino acid residues encoded by 2058 bp of the cgt gene. As in other known CGTase structures, five domains (A-E) can be recognized. The structure of the enzyme is nearly identical to those of other CGTases; the three N-terminal domains (A-C) have structural similarity with the three α-amylase domains. At the surface of the protein three bound maltose molecules were found, which serve as a kind of glue between the protein molecules in the crystal.
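The gene and protein sizes quoted above can be checked with a few lines of arithmetic. The Python sketch below is only a consistency check; it assumes that the 2139 bp open reading frame is counted without the TAA stop codon, which is inferred from the numbers and not stated in the text, and all variable names are illustrative.

```python
CODON_LENGTH = 3          # nucleotides per amino acid

orf_bp = 2139             # open reading frame (assumed to exclude the stop codon)
signal_peptide_aa = 27    # removed during export

# Precursor protein: every codon of the open reading frame is translated.
precursor_aa = orf_bp // CODON_LENGTH

# Mature, extracellular enzyme: precursor minus the signal peptide.
mature_aa = precursor_aa - signal_peptide_aa

# Part of the cgt gene that encodes the mature enzyme.
mature_bp = mature_aa * CODON_LENGTH

print(f"precursor: {precursor_aa} a.a.")
print(f"mature enzyme: {mature_aa} a.a., encoded by {mature_bp} bp")
```

Under this assumption the values agree with the sizes listed for B. circulans strain 251 in Table 5, 686 (713) amino acids.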

To investigate the CGTase reaction mechanisms, and the factors determining the specificities of the various reactions, site-directed mutagenesis experiments with the aromatic amino acid Tyr195, present at a dominant position in the center of the active site cleft, were performed (Chapter 3). Most CGTases possess an aromatic amino acid at this position (Tyr, Phe), whereas α-amylases have a much smaller residue at this position (Gly, Ser, Val). When non-aromatic amino acids were introduced at this position, the mutant CGTase proteins displayed strongly reduced cyclization and coupling activities. This suggests that coupling and cyclization activities are determined at least to some extent by similar factors. The data thus confirm that aromatic amino acids at the Tyr195 position play an important role in the CGTase reactions involving cyclodextrins, also virtually completely preventing hydrolysis. The size of the cyclodextrin formed by the enzyme, however, was not directly determined by the amino acid at this central position in the active site. Further studies are needed to establish which amino acids do play a role in determining product specificity.


Asp229, Glu257, and Asp328 most likely constitute the catalytic residues of the B. circulans strain 251 CGTase protein. The active site mutants D229N, E257Q, and D328N were constructed and indeed found to possess severely reduced catalytic activities (Chapter 4). In order to gain information about binding of α-cyclodextrin in the active site, crystals of the wild-type protein and of a D229N/E257Q double mutant were soaked with α-cyclodextrin and with maltoheptaose. In the resulting structure, however, only a maltotetraose was observed to bind in the active site. In the double mutant structure, two maltoses at the maltose binding sites in the E-domain were replaced by α-cyclodextrins, suggesting that product inhibition may occur by binding of cyclodextrins at these sites.

In Chapter 5 the results of soaking experiments with crystals of the B. circulans CGTase enzyme are described, using the inhibitor acarbose and maltohexaose. Elucidation of the resulting protein structure revealed a protein-carbohydrate complex with nine sugar binding subsites occupied. The maltononaose formed consisted of the four sugar residues present in acarbose and five residues from maltohexaose that had become coupled to the nonreducing end of acarbose. The data allowed identification of factors determining cyclodextrin product specificity.

On the basis of the CGTase structure with the maltononaose inhibitor, suggestions for mutations to improve product specificity were formulated. In Chapter 6 three single mutations and one double mutation are described that resulted in changed initial activities for the formation of α-cyclodextrin, coupling, disproportionation and saccharifying activity. Changes in the production of α-cyclodextrin were also observed in assays performed under conditions comparable to the industrial cyclodextrin production process. For the double mutant an increase in the production of α-cyclodextrin of 214 % was reached. Structure determination of a mutant complexed with a maltohexaose, together with kinetic data, provided evidence for the existence of different intermediates in cyclization for the formation of α-, β-, and γ-cyclodextrin.


In Chapter 7 the results of a further analysis of the E-domain, which contains a raw-starch binding motif, are reported. Construction of mutants in the two maltose binding sites in this E-domain resulted in a decreased ability to bind maltose or starch. The data show that the maltose binding site nearest to the active site cleft is necessary for guiding starch chains to the active center. Cyclodextrin binding at this site appears to be an important factor determining the sensitivity of the enzyme to product inhibition. The other maltose binding site, positioned further away from the active site cleft, appears to be most important for binding of CGTase protein molecules to raw starch granules.

REFERENCES

References are listed in Chapter 11.
