
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 819

Structural and Functional Studies of Glycoside Hydrolase Family 12 Enzymes from Trichoderma reesei and other Cellulolytic Microorganisms

BY MATS SANDGREN

ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2003 Dissertation for the Degree of Doctor of Philosophy in Molecular Biology presented at Uppsala University in 2003.

ABSTRACT

Sandgren, M. 2003. Structural and functional studies of glycoside hydrolase family 12 enzymes from Trichoderma reesei and other cellulolytic microorganisms. Acta Universitatis Upsaliensis. Comprehensive Summaries of Uppsala Dissertations from the faculty of Science and Technology 819. 68 pp. Uppsala. ISBN 91-554-5562-X

Cellulose is the most abundant organic compound on earth. A wide range of highly specialized microorganisms have evolved that utilize cellulose as a carbon and energy source. Enzymes called cellulases, produced by these cellulolytic organisms, perform the major part of cellulose degradation. In this study the three-dimensional structures of four homologous glycoside hydrolase family 12 cellulases are presented: three fungal enzymes, Humicola grisea Cel12A, Hypocrea schweinitzii Cel12A, and Trichoderma reesei Cel12A, and one bacterial enzyme, Streptomyces sp. 11AG8 Cel12A. The structural and biochemical information gathered from these and 15 other GH family 12 homologues has been used to design variants of these enzymes. These variants have been characterized biochemically, and the positions and types of the mutations responsible for the biochemical differences between the homologous enzymes, e.g., in thermal stability and activity, have thereby been identified. The three-dimensional structures of two T. reesei Cel12A variants, in which the mutations have a significant impact on the stability or the activity of the enzyme, have been determined. Four ligand complex structures of the WT H. grisea Cel12A enzyme, which have made it possible to characterize the interactions between substrate and enzyme, have also been determined. The structural and biochemical studies of these closely related GH family 12 enzymes, and their variants, have provided insight into how specific residues contribute to protein thermal stability and enzyme activity. This knowledge can in the future serve as a structural toolbox, i.e., be used to design Cel12A enzymes with specific properties and features by introducing subtle changes in structural components of the enzymes. Such enzymes can then be used to develop new industrial products or to fine-tune enzymes in already existing applications.

Key Words: cellulase, endoglucanase, thermal stability, homologues, protein structure, ligand complex, X-ray crystallography

Mats Sandgren, Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden. E-mail: [email protected]

©Mats Sandgren 2003

ISSN: 1104-232X ISBN: 91-554-5562-X

Printed in Sweden by Uppsala University, Universitetstryckeriet, Uppsala 2003

To my lovely family

TABLE OF CONTENTS

PAPERS INCLUDED IN THE THESIS

ABBREVIATIONS

1 INTRODUCTION

2 BACKGROUND
  2.1 Wood
    2.1.1 Cellulose
    2.1.2 Other plant cell wall components
  2.2 Cellulolytic organisms
    2.2.1 Trichoderma reesei
    2.2.2 Hypocrea schweinitzii
    2.2.3 Humicola grisea
    2.2.4 Streptomyces sp. 11AG8
    2.2.5 Industrial applications of glycoside hydrolases
  2.3 Cellulases
    2.3.1 Classification of cellulases
    2.3.2 Domain structure organization of cellulases
    2.3.3 Hydrolytic mechanism of cellulases
  2.4 The cellulolytic system of Trichoderma reesei
    2.4.1 Induction of cellulases
    2.4.2 Synergy between cellulases
    2.4.3 Three-dimensional structures of T. reesei cellulases

3 METHODS
  3.1 X-ray crystallography
    3.1.2 Phase determination
    3.1.3 Model building and structure refinement
  3.2 Protein crystallization

4 RESULTS AND DISCUSSION
  4.1 Aim of thesis
  4.2 The Trichoderma reesei Cel12A structure
    4.2.1 Crystallization and structure determination
    4.2.2 Protein structure
    4.2.3 Substrate-binding cleft
  4.3 Thermal stability and activity of GH family 12 enzymes
    4.3.1 Thermal stability
    4.3.2 Relative enzyme activity
    4.3.3 Structural features affecting stability
    4.3.4 Discussion
  4.4 The Humicola grisea Cel12A structure, and stabilizing cysteines
    4.4.1 Thermal stability
    4.4.2 Relative enzyme activity
    4.4.3 Protein structures
    4.4.4 Discussion
  4.5 H. grisea Cel12A complex structures
    4.5.1 Overall protein structures
    4.5.2 Oligosaccharide complexes
    4.5.3 Protein interactions
    4.5.4 Transglycosylation

5 CONCLUDING REMARKS

6 ACKNOWLEDGEMENTS

7 REFERENCES

8 APPENDICES

PAPERS INCLUDED IN THE THESIS

This thesis is based upon the following original publications and manuscripts, which will be referred to in the summary by their Roman numerals:

I. Sandgren, M., Shaw, A., Ropp, T. H., Wu, S., Bott, R., Cameron, A. D., Ståhlberg, J., Mitchinson, C. and Jones, T. A. (2001). The X-ray crystal structure of the Trichoderma reesei family 12 endoglucanase 3, Cel12A, at 1.9 Å resolution. J. Mol. Biol. 308: 295-310

II. Sandgren, M., Gualfetti, P. J., Shaw, A., Gross, L. S., Saldajeno, M., Day, A. G., Jones, T. A. and Mitchinson, C. (2003). Comparison of family 12 glycoside hydrolases and recruited substitutions important for thermal stability. Protein Sci. 12: 848-866

III. Sandgren, M., Gualfetti, P. J., Paech, C., Paech, S., Shaw, A., Gross, L., Saldajeno, M., Berglund, G. I., Jones, T. A. and Mitchinson, C. (2003). The Humicola grisea Cel12A enzyme structure at 1.2 Å resolution, and the recruitment of residues important for thermal stability of glycoside hydrolase family 12 enzymes. Submitted.

IV. Sandgren, M., Shaw, A., Gualfetti, P. J., Gross, L. G., Berglund, G. I., Ståhlberg, J., Kenne, L., Driguez, H., Jones, T. A. and Mitchinson, C. (2003). Crystal complex structures reveals how a cellulose chain is bound in the 35 Å long substrate-binding cleft of Humicola grisea Cel12A, spanning from the -4 to the +2 of the enzyme. In manuscript.

Reprints of the articles were made with permission from the copyright holders.

ABBREVIATIONS

CBH       cellobiohydrolase
CBM       cellulose binding module
CD        circular dichroism
cd        catalytic domain
DP        degree of polymerization
EG        endoglucanase
Fo        observed structure factor amplitude
Fc        calculated structure factor amplitude
GH        glycoside hydrolase
G2        cellobiose
G4        cellotetraose
G5        cellopentaose
G2SG2     thio-linked cellobioside
Glc       glucose
NAG       N-acetylglucosamine
HCA       hydrophobic cluster analysis
kDa       kilodaltons
MAD/SAD   multiple/single anomalous dispersion
MIR/SIR   multiple/single isomorphous replacement
mme       mono-methyl-ether
MR        molecular replacement
MW        molecular weight
NCS       non-crystallographic symmetry
NMR       nuclear magnetic resonance
PEG       polyethylene glycol
RMSD      root-mean-square deviation

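Rmerge    $R_{\mathrm{merge}} = \dfrac{\sum_{hkl}\sum_{i}\,\lvert I - \langle I \rangle \rvert}{\sum_{hkl}\sum_{i}\,\lvert I \rvert}$

Rfactor   $R_{\mathrm{factor}} = \dfrac{\sum_{hkl}\,\bigl\lvert\, \lvert F_{\mathrm{obs}} \rvert - \lvert F_{\mathrm{calc}} \rvert \,\bigr\rvert}{\sum_{hkl}\,\lvert F_{\mathrm{obs}} \rvert}$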

Tm               mid-point of thermal denaturation
H. grisea        Humicola grisea
H. jecorina      Hypocrea jecorina
H. schweinitzii  Hypocrea schweinitzii
S. sp. 11AG8     Streptomyces sp. 11AG8
S. lividans      Streptomyces lividans
T. reesei        Trichoderma reesei
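The two crystallographic agreement indices defined in the list above (Rmerge and Rfactor) can be illustrated with a minimal Python sketch. The function names, the way the data are passed in, and the numerical values are assumptions made purely for illustration; they are not taken from the thesis or from any crystallographic software package.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R-factor: sum over hkl of ||Fobs| - |Fcalc|| divided by sum of |Fobs|."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_merge(observations: Sequence[Sequence[float]]) -> float:
    """R-merge: each inner sequence holds the repeated intensity
    measurements I of one unique reflection hkl; <I> is their mean."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations:
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


# Made-up illustrative values, NOT data from the thesis.
example_r = r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0])
example_rmerge = r_merge([[10.0, 12.0], [20.0, 20.0]])
print(f"R-factor = {example_r:.3f}, R-merge = {example_rmerge:.3f}")
```

Both indices are dimensionless; lower values indicate better agreement between model and data (Rfactor) or between repeated measurements (Rmerge).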

1 INTRODUCTION

Cellulose is the most abundant polymer on earth. It has been estimated that the annual production of cellulose, through photosynthesis, is 40 billion tons (Coughlan 1985).

This accounts for roughly half of the annual CO2 fixation, which is estimated to be 15% of the total atmospheric carbon on earth (Gottschalk 1988). Cellulose is predominantly produced by terrestrial plants and marine algae, but can also be produced by other organisms such as bacteria. Cellulose is mainly used as structural reinforcement of plant cell walls. The cellulose content in the wood of higher plants such as trees can be as much as 50%. Because of the large amounts of cellulose in the biosphere, a wide range of highly specialized cellulose-degrading organisms have evolved that utilize cellulose as a carbon and energy source. These organisms play a key role in the recycling of cellulose, and thereby maintain the carbon cycle on earth.

The spontaneous degradation of cellulose in nature is extremely slow. Enzymes called cellulases perform the major part of cellulose degradation. These are produced by the cellulolytic organisms, and are capable of hydrolyzing highly ordered crystalline cellulose into shorter cellooligomers and glucose. Cellulases are produced by a great number of organisms such as plants, plant pathogens and cellulolytic microorganisms, both bacterial and fungal. Among fungi and bacteria there are species that secrete complete sets of cellulolytic enzymes that synergistically are capable of degrading highly crystalline cellulose completely, e.g., Trichoderma reesei, Clostridium thermocellum, and Thermobifida fusca.

Since the early 1950s, scientists have tried to shed light on how microorganisms, mainly fungi, manage the difficult task of degrading cellulose. One of the best-studied cellulolytic systems to date is that of the aerobic filamentous soft-rot fungus Trichoderma reesei. In this thesis, in which further investigations of the properties and function of one class of cellulases are presented, the cellulolytic system of T. reesei will be used as the point of comparison. The cellulolytic system of T. reesei consists of a large set of different cellulases with very different characteristics. Some degrade the cellulose chains from the ends, some make internal cuts, some are expressed in high quantities, and some at barely detectable levels. The reason why T. reesei has so many different cellulases is not fully understood.

Cellulases have become useful in a wide range of commercial applications over the last few decades, e.g., in the textile, paper and pulp, and detergent industries. These are applications in which one wants to modify cellulose fibers (usually cotton) to obtain a better product. The interest in finding new commercial applications for cellulases has initiated extensive world-wide research programs to identify new cellulases in nature that have suitable characteristics for the applications and processes under consideration (van Solingen et al. 2001). In many cases where a useful cellulase has been identified, the enzyme has also been genetically modified to give it shifted or new biochemical properties that better fit the intended application.

Cellulose has for many years been considered a putative starting material for ethanol production. Unlike the fossil fuels in use today, cellulose is a renewable resource. To produce ethanol on a huge scale, the carbon source must be cheap, easily accessible, and abundant. An additional advantage of using ethanol as fuel is that the carbon dioxide produced when the bio-fuel is consumed is taken up by plants and re-assimilated into cellulose. The increased carbon dioxide level in the atmosphere is the main reason for the so-called greenhouse effect. When producing ethanol from cellulose, the first step is to convert cellulose into glucose, which is then used as the carbon source in ethanol fermentation. It is in this first step that one could potentially use cellulases to hydrolyze cellulose instead of treating it with acid, as is done today. To date, however, this process has been too slow, and the yield too low, to be economically feasible. Thus there is a strong demand for research on cellulases, as well as on the other steps in the conversion of cellulose into ethanol.

X-ray protein crystallography has over the last couple of decades become a powerful tool for studying the structure and function of proteins at atomic resolution. The first cellulase X-ray structure was published in 1990 (Rouvinen et al. 1990). Since then a wide range of different cellulase structures have been published. In the present study (Papers I-IV), detailed atomic structures of several members of one class of cellulases (GH family 12) will be presented, as well as examples of how this structural knowledge, in combination with sequence and biochemical data, has been used to genetically modify proteins so that they better fit their industrial applications.

2 BACKGROUND

2.1 Wood

Wood is built up of elongated plant cells that consist of many different components. The dominating ones are cellulose, hemicellulose, and lignin (Mohr and Schopfer 1995). A schematic picture of a wood cell is shown in Figure 1. The primary and secondary cell walls are built up of cellulose, hemicellulose, and lignin, whereas the middle lamella consists mainly of lignin (Sjöström 1993). The structure of the wood cell, and the ratio of its different components, vary considerably depending on the plant species the cell comes from, the cell type, and the developmental stage. The approximate composition of the three most common components in wood is 35-50% cellulose, 20-30% hemicellulose, and 20-30% lignin (Sjöström 1993).

Figure 1. A schematic picture of the structural organization of a wood cell. The cell consists of the middle lamella (ML), the primary cell wall (P), the secondary cell wall (S), and the warty layer (W). The secondary cell wall S is divided into the outer (S1), middle (S2), and inner layers (S3). Reprinted from (Sjöström 1993).

2.1.1 Cellulose

Cellulose, the major structural component in the plant cell wall, was discovered in plants in 1837 (Hon 1994). The annual production of cellulose in the biosphere has been reported to be 4×10¹⁰ tons (Coughlan 1985), and the total amount of cellulose on earth has been estimated to be 7×10¹¹ tons. The half-time for spontaneous degradation of cellulose has been estimated to be 4.7×10⁶ years (Wolfenden et al. 1998). Cellulose is not only synthesized by plants; it is also synthesized by a wide range of other organisms such as bacteria, protists, algae, and fungi.

Upon polymerization of a cellulose chain from glucose units, the C1 carbon (the anomeric carbon) of a glucose residue is covalently linked to the O4 hydroxyl oxygen of the next glucose residue. Each added glucose residue gives the net release of one water molecule. The added glucose can be linked to the growing glucan chain with two different configurations at the anomeric carbon, in either the α- or the β-position (i.e., in axial or equatorial positions), resulting in either an α-1,4 or a β-1,4 glycosidic bond (Figure 2). The polymerization of a glucan chain in which the glucose residues are linked with α-1,4 bonds produces amylose (the linear component of starch), whereas a glucan polymer linked with β-1,4 glycosidic bonds produces cellulose. The end of the glucan chain with an anomeric carbon that is not linked to another glucose residue is referred to as the reducing end of the polymer. The other end of the polymer is the non-reducing end.
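To put the half-time quoted above for spontaneous cellulose degradation into perspective, it can be converted into a rate constant. The sketch below assumes simple first-order decay, which is an assumption of this illustration and not a statement from the text; the function names are hypothetical.

```python
import math

# Half-time for spontaneous degradation quoted in the text (Wolfenden et al. 1998).
HALF_TIME_YEARS = 4.7e6


def first_order_rate_constant(half_time: float) -> float:
    """Rate constant k = ln(2) / t_half, ASSUMING first-order kinetics."""
    return math.log(2) / half_time


def fraction_remaining(elapsed: float, half_time: float) -> float:
    """Fraction of material left after `elapsed` time units (same units as half_time)."""
    return 0.5 ** (elapsed / half_time)


k = first_order_rate_constant(HALF_TIME_YEARS)
print(f"k = {k:.2e} per year")
print(f"fraction left after 1000 years: {fraction_remaining(1000, HALF_TIME_YEARS):.6f}")
```

Under this assumption essentially all of the material would still be intact after a thousand years, which is consistent with the statement that degradation in nature depends on enzymes rather than on spontaneous hydrolysis.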

Figure 2. Two β-D-glucose units linked with a β-1,4 bond. The end of the molecule where the C1 carbon has a free hydroxyl, shown here in the equatorial position, is called the reducing end of the cellulose chain. A glucose molecule with a hydroxyl group in the axial position at C1 forms an α-1,4 linkage to the next glucose residue if the chain polymerization continues from this end.

Cellulose is a linear homo-polysaccharide consisting of anhydrous glucose units that are linked by β-1,4-glycosidic bonds. Each glucose unit is rotated 180° relative to its two neighbors; thus the smallest repetitive unit in the cellulose polymer is cellobiose (Figure 3). Individual cellulose chains align side by side with other cellulose chains, with a network of hydrogen bonds between neighboring glucose units, to form sheets of cellulose (Blackwell et al. 1978). These sheets stack on top of one another through hydrophobic interactions and van der Waals forces, thus forming highly crystalline cellulose micro-fibrils of different diameters depending on the source of the cellulose (Nieduszynski and Preston 1970). The length of the cellulose chains in the micro-fibril, i.e., the degree of polymerization (DP), varies from 2,000 to 15,000 glucose units, depending on the source. The highly crystalline regions of cellulose in the plant cell wall are separated by less ordered regions that are called amorphous cellulose. The crystallinity of cellulose varies from 50 to 90%, also depending on the source (Hon 1994).
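Because each glycosidic bond is formed with the net release of one water molecule, the molecular mass of a single glucan chain follows directly from its DP. The following minimal sketch shows the arithmetic; the function name is hypothetical, and the molar masses are standard values for the anhydroglucose unit (C6H10O5) and water rather than numbers taken from the thesis.

```python
ANHYDROGLUCOSE_MASS = 162.141  # g/mol, C6H10O5 (standard atomic masses)
WATER_MASS = 18.015            # g/mol, H2O


def glucan_chain_mass(dp: int) -> float:
    """Molar mass (g/mol) of a linear glucan chain of `dp` glucose residues.

    A chain of dp residues is dp anhydroglucose units plus one water,
    since only dp - 1 glycosidic bonds (each releasing one water) are formed.
    """
    if dp < 1:
        raise ValueError("dp must be at least 1")
    return dp * ANHYDROGLUCOSE_MASS + WATER_MASS


# DP range quoted in the text for cellulose chains in micro-fibrils.
for dp in (2, 2_000, 15_000):
    print(f"DP {dp:>6}: {glucan_chain_mass(dp) / 1000:.1f} kDa")
```

With these values a chain at the lower end of the quoted DP range weighs a few hundred kDa, and one at the upper end a few MDa.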

Figure 3. a) A short cellulose chain built up of eight glucose residues. Each glucose residue in cellulose is rotated 180° compared to its neighbor, thus making cellobiose the smallest repetitive unit in cellulose. b) The picture shows the hydrogen bonds (dashed lines) involved in forming the cellulose structure, within a cellulose chain, and between two adjacent chains. There are two intra-molecular hydrogen bonds between two neighboring glucose residues in a cellulose chain; one is formed between the O3-H atom of a residue to the O5 oxygen of the next residue, and the second one from the O2-H atom to the O6 oxygen within the same residue. There is one intermolecular hydrogen bond formed between the glucose molecules in adjacent cellulose chains, i.e., from O2-H atoms of glucose molecules in one chain to the O3-H atom of glucose molecules in the adjacent chain. There are no hydrogen bonds to the cellulose chains in the planes above or below.

2.1.2 Other plant cell wall components

There are several other polymeric compounds in the plant cell wall in addition to cellulose. The composition of these compounds varies greatly between different plant sources. The two most common ones are hemicellulose and lignin.

Hemicellulose

Hemicellulose is defined as the fraction of the cell wall that can be extracted with alkali (Mohr and Schopfer 1995). It is a heterogeneous polysaccharide composed of a wide range of different , which usually are heavily branched with side groups. The most common building blocks in hemicellulose are O-acetyl galactoglucomannan and arabino 4-O-methylglucuronoxylan in softwood, and acetyl 4-O-methylglucuronoxylan and glucomannan in hardwood (Sjöström 1993). The individual chains in hemicellulose are shorter than in cellulose, with a DP of 100-200.

Lignin

Lignin is a heterogeneous aromatic polymer, in which the main aromatic components are coniferyl, coumaryl, and sinapyl alcohol. The different building blocks of lignin are linked to one another in an apparently random fashion, giving lignin a very complicated structure (Sjöström 1993; Mohr and Schopfer 1995). Lignin is covalently bound to side groups on different , forming a complex matrix that surrounds the cellulose micro-fibrils. The lignin matrix gives the plant cell wall strength, and also protects the cell wall from attack by cellulolytic microorganisms.

2.2 Cellulolytic organisms

In nature there are many microorganisms, fungal and bacterial, that produce enzymes capable of catalyzing the hydrolysis of cellulose. These microorganisms can be found in plant debris and soil, i.e., where degradation of plant material takes place (Béguin and Aubert 1994; Tomme et al. 1995; Bayer et al. 1998). The cellulolytic microorganisms can be sorted into two classes depending on how they organize their enzymes.

One class of cellulolytic microorganisms has cellulolytic enzymes that are organized into multi-enzyme complexes called cellulosomes. In these complexes the individual enzyme molecules are anchored onto a common scaffold. Several different types of enzymes with different catalytic specificities, e.g., endoglucanases (EGs) and cellobiohydrolases (CBHs), can be attached to this scaffold. A typical bacterial cellulosome is composed of 50 protein molecules with a total molecular weight (MW) of 2-6 MDa. An example of a cellulolytic organism in this class is the bacterium Clostridium thermocellum.

The second class of cellulolytic organisms produces enzymes that are not attached to one another and act individually on cellulose. The different types of enzymes nevertheless work cooperatively when hydrolyzing cellulose, and thereby gain strong synergy effects. Examples of fungi from this class are T. reesei and Humicola grisea, and of bacteria, Streptomyces lividans and Cellulomonas fimi.

Bacteria and fungi are not the only cellulase-producing organisms. Cellulases have also been isolated from blue mussel (Xu et al. 2000), termites (Watanabe et al. 1998), crayfish (Byrne et al. 1999), and plants, e.g., Arabidopsis (Williamson et al. 2002).

Most cellulolytic microorganisms lack efficient ligninase systems, and either cannot degrade lignin or do so only with difficulty. Only some basidiomycetes (white-rot fungi) have such efficient systems. The complex nature of lignin makes its direct degradation by enzymes a difficult task. The degradation of lignin is less well characterized, and there are conflicting opinions on how the different lignin-degrading enzymes act and cooperate. Most likely aromatic radicals, produced by extracellular peroxidases from the lignin-degrading organism (Bourbonnais and Paice 1990), are involved.

2.2.1 Trichoderma reesei

The fungus Trichoderma reesei was first isolated by the US Army during World War II, on the Solomon Islands. The US Army had huge problems with deterioration of its cotton materials, e.g., in tents and parachutes, and the isolation of T. reesei was a result of the effort to find a solution (Reese et al. 1950; Reese 1976). T. reesei is a soft-rot fungus that belongs to the deuteromycetes. It is a filamentous fungus that apparently lacks a ligninase system. No sexual cycle has so far been observed for this fungus. The strain of T. reesei that is considered to be the wild type (WT) is called QM6a. T. reesei was originally named Trichoderma viride. This name was later changed to Trichoderma reesei when it was discovered that the T. viride QM6a strain was morphologically different from other T. viride strains (Simmons 1977). It was therefore described as a new fungal species, and was named reesei after the scientist who first isolated the strain. It was later reported that the T. reesei QM6a strain could not be distinguished from the type strain of another Trichoderma species, Trichoderma longibrachiatum (Bisset 1984). More recently it has been shown that T. reesei is a clonal derivative of Hypocrea jecorina (Meyer et al. 1992; Kuhls et al. 1996), and that H. jecorina is an ascomycete with a well-defined sexual cycle.

2.2.2 Hypocrea schweinitzii

At the same time it was shown that the sexual species of Hypocrea, H. schweinitzii and H. jecorina, are two genetically clearly distinct ascomycete species (Kuhls et al. 1996). In this study we will compare the biochemical properties of one type of cellulase from two of these ascomycete species, T. reesei and H. schweinitzii. To avoid confusion, the names Trichoderma reesei and Hypocrea schweinitzii will be used throughout this thesis when these two ascomycetes are discussed.

2.2.3 Humicola grisea

The fungus Humicola grisea var. thermoidea is a thermophilic mitosporic ascomycete. H. grisea has an optimal growth temperature of 40°C, and a maximum growth temperature of 58°C (Maheshwari et al. 2000). The fungus produces a set of different cellulases that are all highly thermoresistant, and that synergistically are capable of hydrolyzing highly crystalline cellulose. The cellulolytic system of H. grisea resembles that of T. reesei to a large extent, with cellulases belonging to the same cellulase families in both fungal species.

2.2.4 Streptomyces sp. 11AG8

The bacterium Streptomyces sp. 11AG8 is an alkaliphilic actinomycete. The species was discovered in mud samples from East African soda lakes, in a search of these extreme environments for novel alkaliphilic species of cellulase-producing bacteria (van Solingen et al. 2001). The strain grows optimally between pH 7.4 and pH 10.5, and at temperatures between 20°C and 40°C. The closest Streptomyces neighbor of the 11AG8 strain is Streptomyces thermoviolaceus, with a sequence identity of 97.2% in its 16S rDNA sequence (van Solingen et al. 2001). Two other actinomycete species that are considered to be close neighbors of the 11AG8 strain are Streptomyces lividans and Streptomyces rochei.

2.2.5 Industrial applications of glycoside hydrolases

Glycoside hydrolases (GHs) from both fungal and bacterial sources are widely used in industrial applications. Cellulases have been used in the textile industry to enzymatically produce the effect of stone-washing on jeans (van Solingen et al. 2001), and as agents in detergents to enzymatically remove short fibers on the textile surface (depilling). In the juice and wine industries, pectinases and cellulases are used for maceration and clarification of the beverages. In the animal feed industry the addition of GHs, mainly β- and , has increased the digestibility of the feed for the animals. In the paper and pulp industries xylanases and mannanases are added in the pulp bleaching processes.

2.3 Cellulases

The crystalline nature of the micro-fibrils in cellulose makes the spontaneous degradation of cellulose in nature extremely slow, and it also makes cellulose highly resistant to enzymatic degradation. In spite of this, there is a yearly breakdown and recycling of cellulose accounting for 15% of the total atmospheric carbon (Gottschalk 1988). A class of enzymes called cellulases performs the major part of this degradation. These enzymes are capable of catalyzing the hydrolysis of the β-1,4-glycosidic bond between glucose units in cellulose. The cellulases are produced by a great number of organisms such as plants, plant pathogens and cellulolytic microorganisms, both bacterial and fungal (Béguin and Aubert 1994; Tomme et al. 1995; Bayer et al. 1998). Many cellulose-degrading organisms overcome the resistance of the highly crystalline cellulose substrate by secreting sets of different types of cellulolytic enzymes, e.g., endo- and exo-cellulases. These different enzymes act synergistically, enabling a more rapid and efficient hydrolysis of cellulose (Henrissat et al. 1985; Nidetzky et al. 1994). Cellulases are GHs, and can be found in at least 12 families of this very large group of enzymes.

2.3.1 Classification of cellulases

The cellulases form two general classes: the endoglucanases (EGs, EC 3.2.1.4) and the cellobiohydrolases (CBHs, EC 3.2.1.91). The EGs hydrolyse internal β-1,4-glycosidic bonds in the cellulose polymer, and in less well-ordered “amorphous” regions within the cellulose micro-fibril. The CBHs are believed to work processively on the free ends of cellulose polymer chains, degrading the polymer from amorphous regions into the crystalline regions of the cellulose micro-fibrils, successively liberating cellobiose and small saccharides like cellotriose and cellotetraose from the cellulose chain (Sprey and Bochem 1992). There is some debate, though, as to whether CBHs can be called exo-cellulases, since recent activity measurements indicate that there is no absolute requirement for cellulose chain ends. The CBHs may be able to perform an internal cut in the middle of a cellulose chain, creating two new chain ends from which they can then proceed (Armand et al. 1997; Boisset et al. 2000). According to the enzyme nomenclature, CBHs act from the non-reducing end of the cellooligomer chain. Recent structural and functional studies do not support this. Rather, it is suggested that the directionality may differ. For example, the two CBHs from T. reesei, Cel6A and Cel7A, seem to work from opposite chain ends of the cellooligomer. Cel6A works preferentially from the non-reducing end (Koivula et al. 1998a), while Cel7A works from the reducing end (Divne et al. 1998). Many cellulases have cellulose-binding modules (CBMs) that facilitate their adsorption onto crystalline cellulose, bringing the catalytic domains physically close to their site of action (Reinikainen et al. 1992; Linder and Teeri 1996).

Glycoside hydrolase families

The classification of GHs and cellulases has changed several times during the years that research has been carried out on these enzymes. As new GHs were discovered during the 1980s and early 1990s, and as new activity experiments were performed, the older classification systems became more and more inadequate. The new method that was adopted for classification of GHs was hydrophobic cluster analysis (HCA) (Gaboriaud et al. 1987). In this system the different enzymes are classified into structurally related GH families based on similarities in the distribution of hydrophobic amino acids in their sequences. It was Bernard Henrissat who first applied the HCA classification system to the GHs. He classified the then known cellulases into different GH families named with one-letter codes (A, B, C, …) (Henrissat et al. 1989; Henrissat and Davies 1997; Henrissat 1998). When the number of GH families grew, the one-letter codes of the families had to be replaced by numbers. Today there are more than 2,500 known protein sequences of GHs, which have been classified into at least 90 different GH families. The classification of the different GHs can be followed at the CAZy web page (URL: http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html) (Coutinho and Henrissat 1999).

The cellulases are placed into a GH family according to the catalytic core of the enzyme. For example, T. reesei CBH I has a catalytic core that has been classified as GH family 7. The enzyme is therefore called T. reesei Cel7A: Cel because it is a cellulase, 7 for the GH family, and A because it is the first family 7 enzyme reported from this organism. To date cellulases have been assigned to at least 11 of the known GH families: 5-9, 12, 26, 44, 45, 48, and 61.
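The naming scheme described above (activity prefix, GH family number, and a letter for the order in which the enzyme was reported from the organism) can be sketched in a few lines of Python. The function names and the regular expression are illustrative assumptions and are not part of any official nomenclature tool.

```python
import re
from typing import Tuple

# Three-letter activity code, family number, single capital letter (e.g. Cel7A).
_NAME_PATTERN = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z])$")


def systematic_name(activity: str, family: int, order: int) -> str:
    """Build a name such as 'Cel7A' from activity code, GH family, and the
    order (1 = first reported) of the enzyme within that family and organism."""
    if not 1 <= order <= 26:
        raise ValueError("order must be between 1 and 26")
    return f"{activity}{family}{chr(ord('A') + order - 1)}"


def parse_systematic_name(name: str) -> Tuple[str, int, int]:
    """Split a name such as 'Cel12A' into (activity, family, order)."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised systematic name: {name!r}")
    activity, family, letter = match.groups()
    return activity, int(family), ord(letter) - ord("A") + 1


print(systematic_name("Cel", 7, 1))
print(parse_systematic_name("Cel12A"))
```

For instance, the enzyme studied in this thesis, Cel12A, parses as a cellulase of GH family 12 that was the first family 12 enzyme reported from its organism.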

2.3.2 Domain structure organization of cellulases

Most fungal cellulases are organized in two structurally independent domains: a catalytic core module and a cellulose-binding module (CBM). These two domains are usually interconnected via a short flexible linker. The domain structure organization of T. reesei cellulases has recently been described in a review (Koivula et al. 1998b).

The catalytic core module

The catalytic core module is the part of the cellulase where the hydrolysis of the cellulose chain takes place. This domain is the largest part of the enzyme. It varies greatly in size between different cellulases, e.g., the catalytic core modules of T. reesei cellulases vary from 166 to 430 amino acids in length.

The linker region

The two domains in the enzyme are interconnected via a flexible linker. The length of the linker varies from fewer than 20 to over 40 amino acids in the different enzymes. The linker is usually very rich in threonines, serines, and prolines, and it is heavily glycosylated (Harrison et al. 1998). The role of the linker is probably to keep the two domains apart, and to restrict their movements with respect to one another, so that the catalytic domain is always within close distance of the CBM, which binds to the surface of a cellulose fiber. The glycosylation of the linker probably makes it less flexible, and probably decreases its sensitivity to proteolytic enzymes (Harrison et al. 1998) that are also secreted by the microorganism. Deletion experiments on the linker of T. reesei Cel7A have shown that if most of the linker is removed, the rate of crystalline cellulose degradation is drastically reduced (Srisodsuk et al. 1993).

The cellulose-binding module (CBM)

The CBM is a small wedge-shaped domain consisting of approximately 35 amino acids. It can be connected via the flexible linker to either the carboxy-terminus or the amino-terminus of the catalytic module. The function of the CBM is to bind to the surface of cellulose, and to serve as an "anchor" for the enzyme, keeping it strongly adsorbed to the cellulose surface (Ståhlberg et al. 1988; Reinikainen et al. 1992; Linder and Teeri 1996). This reduces the need for strong binding of the catalytic domain to the cellulose, and thereby enables the enzyme to have a higher rate of turnover (Ståhlberg 1991). There is no evidence that the fungal CBMs can penetrate into the cellulose fiber and disrupt its structure, or that they have any catalytic activity. The CBMs are classified into at least 30 different CBM families according to the classification by Tomme et al. (1995). The classification of CBMs can be followed at the CAZy web page (URL: http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html) (Coutinho and Henrissat 1999).

Figure 4. A hypothetical structure model of the intact T. reesei Cel7A enzyme. In the model the enzyme is adsorbed onto the surface of a cellulose micro-fibril, via the cellulose-binding module (CBM). The catalytic module (CD) is processively hydrolyzing one of the accessible cellulose chains from the reducing end, and releasing cellobiose units. The model is composed of the known structures of the catalytic module and CBM (Kraulis et al. 1989; Divne et al. 1994), and a hypothetical linker connecting the two modules. The figure was kindly provided by Dr. Christina Divne.

2.3.3 Hydrolytic mechanism of cellulases

The cellulolytic enzymes have two different enzymatic mechanisms by which they can hydrolyze the glycosidic bonds in cellulose: the retaining and the inverting mechanisms (Koshland 1953; Sinnott 1990; Davies and Henrissat 1995).

Retaining mechanism

The retaining glycoside hydrolase mechanism (Figure 5a) leads to a net retention of the configuration at the anomeric carbon (C1) of the substrate after cleavage. This is performed via a double displacement mechanism, i.e., the hydrolysis of a glycosidic bond creates a product with the same configuration at the anomeric carbon as the substrate had before hydrolysis.

Figure 5. The two enzyme mechanisms observed for cellulases: the retaining mechanism (a), and the inverting mechanism (b). In the retaining mechanism the configuration at the anomeric carbon will be in the β-configuration after hydrolysis, i.e., the configuration is retained. The distance between the two catalytic carboxylates in the retaining enzymes is ~5.5 Å. In the inverting mechanism the configuration at the anomeric carbon will be changed from the β- to the α-configuration upon hydrolysis. The distance between the two catalytic carboxylates in the inverting enzymes usually varies between 6.5 and 9.5 Å.

The catalytic machinery of these enzymes involves two catalytic carboxylate residues that usually sit at opposite sides of the sugar plane. In the first step of the double displacement mechanism (glycosylation), one of the two carboxylic groups provides general acid-catalyzed leaving group departure, simultaneously with a nucleophilic attack on the anomeric carbon by the second carboxylate to form a glycosyl-enzyme intermediate. In the second step (deglycosylation), the first carboxylate residue now functions as a general base that activates an incoming nucleophile (a water molecule in the case of hydrolysis, and a new glycosyl group in the case of a transglycosylation event) by abstracting a proton from it. This activated nucleophile then hydrolyzes the glycosyl-enzyme intermediate. The GHs with retaining mechanism often have transglycosylating abilities. The distance between the two carboxylates in the enzymes with retaining mechanism is approximately 5.5 Å.

Inverting mechanism

The inverting glycoside hydrolase mechanism (Figure 5b) leads to a net inversion of the configuration at the anomeric carbon (C1) of the substrate after cleavage. This is performed via a single nucleophilic displacement mechanism, i.e., the hydrolysis of a β-glycosidic bond creates a product with the α-configuration, and vice versa. The catalytic machinery of these enzymes involves two catalytic carboxylates. These two carboxylate residues provide general acid-catalyzed leaving group departure, and general base assistance to nucleophilic attack by a water molecule from the opposite side of the sugar ring. The distance between the two carboxylates in the enzymes with inverting mechanism is much less constrained than for the retaining enzymes. It is usually in the range 6.5-9.5 Å, but there are cases where it is much shorter than this, i.e., the longer distance is not required by the mechanism.

Glucosyl-binding sites

The glucosyl-binding sites in the catalytic core of the cellulases are numbered using the position of the catalytic cleavage site in the enzyme as the reference point (Davies et al. 1997a). Thus the binding sites towards the reducing end of the cello-oligomer from this reference point are given increasing positive integers (+1, +2, …), and the sites towards the non-reducing end are given increasingly negative ones (-1, -2, …). How far the numbering extends in each direction depends on how many subsites the enzyme has.
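The numbering convention can be illustrated with a minimal Python sketch. The function name `subsite_labels` and its two arguments are hypothetical, introduced only for this illustration; the sketch simply lists the subsite labels from the non-reducing to the reducing end, with the cleavage site between -1 and +1.

```python
from typing import List


def subsite_labels(n_minus: int, n_plus: int) -> List[str]:
    """Label glucosyl-binding subsites from the non-reducing to the reducing end.

    n_minus: number of subsites on the non-reducing side of the cleavage site
    n_plus:  number of subsites on the reducing side of the cleavage site
    (Hypothetical helper; the names are assumptions made for this example.)
    """
    if n_minus < 0 or n_plus < 0:
        raise ValueError("subsite counts must be non-negative")
    minus_side = [f"-{i}" for i in range(n_minus, 0, -1)]  # ..., -2, -1
    plus_side = [f"+{i}" for i in range(1, n_plus + 1)]    # +1, +2, ...
    return minus_side + plus_side


# Example: an enzyme assumed to have four minus and two plus subsites.
labels = subsite_labels(4, 2)
print(" ".join(labels))
```

There is no subsite 0 in this convention: the scissile glycosidic bond lies between the glucosyl units bound in subsites -1 and +1.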

2.4 The cellulolytic system of Trichoderma reesei

The filamentous soft rot fungus Trichoderma reesei (described in section 2.2.1) is one of the most efficient cellulose-degrading organisms known, and its cellulolytic system is also one of the most studied. The fungus is an ideal cellulolytic model organism for studying cellulose degradation since it secretes large amounts of all cellulases needed for degradation of crystalline cellulose. To date, two CBHs (Cel6A and Cel7A), and at least five EGs (Cel5A, Cel7B, Cel12A, Cel45A, and Cel61A), have been found in the cellulolytic system of T. reesei. These enzymes belong to six different GH families: 5, 6, 7, 12, 45, and 61. T. reesei has two cellulases in GH family 7, one CBH (Cel7A), and one EG (Cel7B).

Table 1. The seven known cellulases in the cellulolytic system of T. reesei.
_______________________________________________________________________
Name     Old name   Number of   Position   Stereo-       Amount of total
                    residues    of CBM     selectivity   cellulase (%)
_______________________________________________________________________
Cel5A    EG II      397         N          Retaining     8
Cel6A    CBH II     447         N          Inverting     18
Cel7A    CBH I      497         C          Retaining     55
Cel7B    EG I       436         C          Retaining     9
Cel12A   EG III     218         *          Retaining     <1
Cel45A   EG V       270         C          Inverting     **
Cel61A   EG IV      344         C          Not known     **
_______________________________________________________________________
The names of the cellulases are based on the GH nomenclature system suggested by Henrissat (1998). The position of the CBM can either be at the carboxy-terminus (C) or at the amino-terminus (N) of the catalytic module. (*) Cel12A does not have a CBM. (**) Percentage of total secreted T. reesei cellulase is not known for this enzyme.
_______________________________________________________________________

T. reesei cellulose-binding domain

All the known T. reesei cellulases, except Cel12A (one of the GH family 12 cellulases that will be described in more detail in the Results section of this thesis), have a cellulose-binding domain consisting of 35 highly conserved amino acids, and all of them belong to CBM family 1, according to the classification system by Tomme et al. (1995). The T. reesei CBMs are connected to their catalytic core modules via a 35-44 residue long, heavily glycosylated linker (Harrison et al. 1998). The CBM is located at the carboxy-terminus of the Cel7A, Cel7B, Cel45A, and Cel61A catalytic core modules, and at the amino-terminus of the Cel5A and Cel6A catalytic core modules. The three-dimensional structures of the Cel7A and Cel7B CBMs have been determined by Nuclear Magnetic Resonance (NMR) (Kraulis et al. 1989; Mattinen et al. 1998). They form a wedge-shaped structure with one flat surface where three conserved aromatic residues are located. These are believed to stack on the glucose units on the cellulose micro-fibril surface (Mattinen et al. 1997).

Cel5A

Cel5A is an EG that belongs to GH family 5. The gene for the enzyme was cloned by Saloheimo et al. (1988), GenBank accession number M19373. The enzyme has an estimated molecular weight of 42 kilodaltons (kDa), but has an apparent molecular weight of 48 kDa on an SDS-PAGE gel due to glycosylation. It has a pI of 5.5-5.6 (Shoemaker and Brown 1978). Cel5A hydrolyzes the β-1,4-glycosidic bonds in cellulose using the retaining mechanism (Henrissat et al. 1985). The amount of expressed Cel5A has been estimated to be 5-10 % of total expressed cellulase in T. reesei (Ståhlberg 1991; Ilmen et al. 1997).

Cel6A

Cel6A is a GH family 6 CBH. The gene for the enzyme was cloned by Teeri et al. (1987), GenBank accession number M16190. The enzyme has an estimated molecular weight of 47 kDa (53 kDa on SDS-PAGE), and it has a pI of 5.9 (Fägerstam and Pettersson 1980; Bhikhabhai et al. 1984). Cel6A is a processive enzyme that hydrolyzes the glycosidic bonds in cellulose using the inverting mechanism, and it has been shown that the enzyme preferentially hydrolyzes the cellulose chain from the non-reducing end (Barr et al. 1996; Boisset et al. 2000). There have been reports that Cel6A possesses some endoglucanase activity (Nutt et al. 1998). The amount of expressed Cel6A has been estimated to be 17-20 % of total expressed cellulase in T. reesei (Ståhlberg 1991; Ilmen et al. 1997).

Cel7A

Cel7A is a GH family 7 CBH, and it was the first T. reesei GH family 7 cellulase that was discovered. The gene for the enzyme was cloned by Wey et al. (1994), GenBank accession number X69976. Cel7A has an estimated molecular weight of 52 kDa (66 kDa on SDS-PAGE), and it has a pI of 4.3 (Fägerstam et al. 1977; Shoemaker et al. 1983). Cel7A is the major cellulase produced by T. reesei, and it has been estimated that 50-60 % of total expressed cellulase in the fungus is Cel7A (Ståhlberg 1991; Ilmen et al. 1997). It is probably the key enzyme needed for hydrolysis of crystalline cellulose by the fungus. Cel7A is a processive enzyme that hydrolyzes the glycosidic bonds in cellulose using the retaining mechanism, and it has been shown that the enzyme preferentially hydrolyzes the cellulose chain from the reducing end (Barr et al. 1996; Divne et al. 1998; Imai et al. 1998).

Cel7B

Cel7B is a GH family 7 EG. The gene for the enzyme was cloned by Penttila et al. (1986), GenBank accession number M15665. Cel7B has an estimated molecular weight of 48 kDa (50-55 kDa on SDS-PAGE), and it has a pI of 4.5 (Shoemaker et al. 1983; Bhikhabhai et al. 1984). Cel7B is homologous to Cel7A, with about 45 % sequence identity. The main difference between the two GH family 7 structures is that the substrate-binding cleft is less covered by extended loops in the endoglucanase (Cel7B) than in the exoglucanase (Cel7A). Cel7B hydrolyzes the glycosidic bonds in cellulose using the retaining mechanism. The amount of expressed Cel7B has been reported to be 6-10 % of total expressed cellulase in T. reesei (Ståhlberg 1991; Ilmen et al. 1997).

Cel12A

Cel12A is a GH family 12 EG. The gene for the enzyme was cloned by Ward et al. (1993) and Okada et al. (1998), GenBank accession number AB003694. The enzyme has a molecular weight of 25 kDa, has a neutral pI of 7.5, does not have a CBM, and is only sparsely glycosylated (Håkansson et al. 1978; Ülker and Sprey 1990; Sprey and Ülker 1992; Hayn et al. 1993). Cel12A hydrolyzes the glycosidic bonds in cellulose using the retaining mechanism. The two catalytic residues in Cel12A are the two carboxylates E116 and E200 (Okada et al. 2000). The amount of expressed Cel12A has been reported to be less than 1 % of total expressed cellulase in T. reesei (Ülker and Sprey 1990). The specific function of T. reesei Cel12A is not known. Some biochemical data on Cel12A can be found in the literature, including studies of activity on soluble substrates (Hayn et al. 1993) and on insoluble cellulose (Sprey and Bochem 1992). There have been reports that Cel12A, besides cellulase activity, has activity against β-glucan and xylan (Hayn et al. 1993; Karlsson et al. 2002). It has been shown that Cel12A has an ability to induce extension of type I cell walls from cucumber and wheat (Yuan et al. 2001).

The three-dimensional structure of the T. reesei Cel12A enzyme has been solved within the work of this thesis. The procedure by which the Cel12A enzyme structure was determined is described in the Methods and Results sections, and in Paper I. In addition to the T. reesei Cel12A structure, the three-dimensional structures of three homologous GH family 12 enzymes, Hypocrea schweinitzii Cel12A, Humicola grisea Cel12A, and Streptomyces sp. 11AG8 Cel12A, have been determined (Papers II-III). The T. reesei enzyme is structurally and biochemically compared to the other three GH family 12 homologues in the Results section (Papers II-IV).

Cel45A

Cel45A is a GH family 45 EG. The gene for the enzyme was cloned by Saloheimo et al. (1994), GenBank accession number Z33381. The enzyme has an estimated molecular weight of 23 kDa (36 kDa on SDS-PAGE), and it has a pI of 2.9 (Shoemaker et al. 1983; Saloheimo et al. 1997). Cel45A hydrolyzes the glycosidic bonds in cellulose using the inverting mechanism. The amount of expressed Cel45A as a fraction of total expressed cellulase in T. reesei is not known. There have been reports that Cel45A, besides cellulase activity, has activity against β-glucan but not xylan (Saloheimo et al. 1994; Karlsson et al. 2002).

Cel61A

Cel61A is a GH family 61 EG. The gene for the enzyme was cloned by Saloheimo et al. (1997), GenBank accession number Y11113. The enzyme has an estimated molecular weight of 34 kDa on SDS-PAGE, and it has an estimated pI of 6.0 (Karlsson et al. 2001). The hydrolytic mechanism and the total amount of expressed enzyme are not known for T. reesei Cel61A.

2.4.1 Induction of cellulases

T. reesei and many other cellulolytic organisms do not express their cellulases constitutively. The production of the cellulases is only turned on when these are needed, i.e., the cellulases are induced (Kubicek and Penttilä 1998). When T. reesei is grown with cellulose as the only carbon source, the genes for the cellulases are induced, and the enzymes are expressed. However, if the fungus is grown on glucose there is no cellulase expression, i.e., the cellulases are glucose repressed. It has been shown that this glucose repression of the T. reesei cellulases acts at the transcriptional level (Ilmen et al. 1997). It has also been shown that the expression of most T. reesei cellulases is co-regulated, and that they are always expressed in the same relative amounts (Ilmen et al. 1997). There are many known molecules that induce cellulase expression in T. reesei, e.g., cellobiose, cellotriose, cellotetraose, , and sophorose (Sternberg and Mandels 1980; Kubicek et al. 1993). The exact mechanism by which the cellulase expression is induced in T. reesei is not fully understood. There must exist an enzyme in T. reesei that is constitutively expressed, recognizes cellulose, and from this produces a substrate that induces cellulase expression. Such an enzyme has not yet been identified. It has been suggested that β-glucosidase could be one such enzyme (Vaheri et al. 1979; Fowler and Brown 1992), since it produces sophorose as a transglycosylation product from cellulose, and sophorose is the most efficient cellulase inducer so far identified. Although β-glucosidase is an efficient cellulase inducer, it is not essential for cellulase expression (Fowler and Brown 1992).

2.4.2 Synergy between cellulases

When the combination of two enzymes is more efficient than the sum of the enzymes acting alone, the two enzymes are said to act in synergy. This is often found in the case of cellulases, and may be why many cellulolytic organisms have multiple sets of the same type of cellulase, e.g., T. reesei has at least two CBHs and five EGs. The synergy between cellulases can easily be detected by mixing two single components from a cellulolytic system, e.g., combining one of the EGs with one of the CBHs, and comparing the hydrolytic activity of this combined system with the cumulative activity of the two cellulases acting alone. There are many studies that show synergy between EGs and CBHs during hydrolysis of cellulose (Henrissat et al. 1985; Bailey et al. 1993; Irwin et al. 1993; Medve et al. 1998). Presumably the EG makes internal cuts in the cellulose chain, and thereby provides new accessible chain ends for the cellobiohydrolase/exoglucanase to work on, giving increased hydrolytic activity (Figure 6). This model for the synergy between endoglucanases and exoglucanases is called the endo-exo model (Béguin and Aubert 1994; Tomme et al. 1995).
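The comparison described above, activity of the mixture against the summed activities of the enzymes acting alone, is often expressed as a single ratio. The following Python sketch shows one way this could be computed; the function name `degree_of_synergy` and all activity values are hypothetical illustrations, not data from the cited studies.

```python
from typing import Sequence


def degree_of_synergy(activity_mixture: float,
                      activities_alone: Sequence[float]) -> float:
    """Ratio of the mixture activity to the summed single-enzyme activities.

    A value above 1 indicates synergy, a value of 1 purely additive action,
    and a value below 1 indicates that the enzymes hinder each other.
    All activities must be measured under the same conditions and in the
    same units (e.g. amount of soluble sugar released per unit time).
    """
    cumulative = sum(activities_alone)
    if cumulative <= 0:
        raise ValueError("summed single-enzyme activity must be positive")
    return activity_mixture / cumulative


# Hypothetical numbers, for illustration only (arbitrary activity units).
eg_alone = 2.0      # endoglucanase acting alone
cbh_alone = 5.0     # cellobiohydrolase acting alone
mixture = 10.5      # the two enzymes acting together

dos = degree_of_synergy(mixture, [eg_alone, cbh_alone])
print(f"degree of synergy = {dos:.2f}")
```

With these made-up values the mixture is 1.5 times as active as the two enzymes would be if their activities were simply additive.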

Figure 6. The endo-exo model for synergy between endoglucanases and cellobiohydrolases/exoglucanases in a cellulolytic system during hydrolysis of cellulose. In this model the endoglucanases make internal cuts in the amorphous parts of the cellulose micro-fibril and thereby provide new accessible chain ends for the exoglucanase to recognize, and from these start to processively hydrolyze the cellulose. The combined system has increased hydrolytic activity compared with the enzymes working alone.

There have been investigations on the optimal mixtures of one of T. reesei's two CBHs, Cel6A or Cel7A, with one of its EGs, Cel5A or Cel7B (Henrissat et al. 1985). The results show that T. reesei Cel6A needs much less EG than T. reesei Cel7A to reach maximal synergy. There have also been several reports of detected synergy between T. reesei's two CBHs, Cel6A and Cel7A (Fägerstam and Pettersson 1980; Henrissat et al. 1985; Irwin et al. 1993; Medve et al. 1998). The reports that T. reesei Cel6A possesses some endoglucanase activity (Nutt et al. 1998) could possibly explain the observed synergy between T. reesei's two CBHs. Another explanation could be that these two exoglucanases processively hydrolyze the cellulose from different directions. T. reesei Cel6A hydrolyzes from the non-reducing end (Divne et al. 1998) and T. reesei Cel7A from the reducing end (Divne et al. 1998), thereby exposing new chain ends for the other enzyme to work on. There are also several reports of processive endoglucanases (Reverbel-Leroy et al. 1997). The findings that the two types of cellulases, CBH and EG, can possess both types of activities, i.e., endo- and exoglucanase activity, make it difficult to fit the activity data from a cellulase into a simple model like the endo-exo synergy model, but this is still the best model we have to explain how a cellulolytic system like T. reesei's works.

2.4.3 Three-dimensional structures of T. reesei cellulases

The first three-dimensional X-ray crystallography structure of a cellulase was the structure of the T. reesei Cel6A catalytic domain (Rouvinen et al. 1990). Since then, many new cellulase structures have been determined, and today there exist at least a hundred known unique cellulase structures. There are cellulase structure representatives for most of the eleven cellulase-containing GH families, except for GH families 26, 44 and 61. For a recent report on new cellulase structures see the CAZy web page: http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html.

To date, four three-dimensional structures out of T. reesei's seven cellulases have been determined by X-ray crystallography: Cel6A (Rouvinen et al. 1990), Cel7A (Divne et al. 1994), Cel7B (Kleywegt et al. 1997) and Cel12A (Paper I of this thesis). Three of these structures, Cel6A, Cel7A and Cel7B, have been determined from the enzymes' catalytic domains only, since it has not been possible to get useful protein crystals of the intact cellulases. The problem with getting crystals from the intact cellulase is most likely due to the highly flexible linker between the catalytic core and the CBM in the cellulase. A large flexible region in an enzyme often causes bad crystal packing or no crystal formation at all. T. reesei Cel12A does not have a CBM or a linker, so this three-dimensional structure is of an intact cellulase. The overall structure of the T. reesei Cel12A enzyme is described in the Results section of this summary, and in some more detail in Paper I. The T. reesei Cel12A structure is one of the four homologous GH family 12 enzymes whose structures have been determined within the work of this thesis (Papers I-IV).

In addition to the four T. reesei catalytic core domain structures, the three-dimensional structures of the CBMs of T. reesei Cel7A and Cel7B have been determined by NMR (Kraulis et al. 1989; Mattinen et al. 1998). The most striking difference between T. reesei's two known CBH structures, Cel6A and Cel7A, and its two known EG structures, Cel7B and Cel12A, is that the CBHs form a tunnel where the cellulose chain binds, whereas the EGs form an open cellulose-binding cleft.

3 METHODS

This chapter will briefly describe the methods that are commonly used in X-ray crystallography to determine the three-dimensional structure of proteins. These include protein crystallization, X-ray diffraction, structure solving, model building and refinement.

3.1 X-ray crystallography

The methodology of X-ray crystallography is based on the fact that the atoms in a molecule, or rather the electrons in the atoms, scatter X-rays. The diffraction pattern obtained when a molecule is placed in an X-ray beam is a representation of the space that the electrons of the atoms in the molecule occupy, i.e., the molecule's electron density. In this thesis the molecule represents the different protein molecules whose structures have been determined. Each diffracted X-ray that a molecule in an X-ray beam gives rise to is represented by a spot in the diffraction pattern (Figure 7). These diffracted rays can be described by an amplitude (|FP|), a wavelength (λ), and a phase angle (FP). If these three values are known for all the spots in the diffraction pattern, the electron density for the molecule can be calculated. The amplitudes of the diffracted X-rays are determined from the intensities of the spots in the diffraction pattern, and the wavelength is a property of the X-ray source. The information about the phase angle, however, is lost in the collected diffraction image. This is known as the phase problem in X-ray crystallography, and it is the fundamental problem when solving a new protein structure.

With the X-ray sources available today, it is not possible to obtain a strong enough diffraction pattern from a single protein molecule. The solution is therefore to use a crystal of the protein, because a crystal contains millions of identical copies of the protein molecule packed into the crystal lattice. By using a protein crystal the diffraction is enhanced, which makes it possible to detect and collect a diffraction pattern from the protein molecule (Figure 7).
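Why the lost phases matter can be illustrated with a one-dimensional toy calculation in Python. This is only a sketch under simplifying assumptions: the "unit cell" is a made-up list of eight density values, the structure factors are computed as a plain discrete Fourier sum, and the function names `structure_factors` and `fourier_synthesis` are hypothetical. It shows that the density is recovered when both amplitudes and phases are used, but not when the phases are discarded.

```python
import cmath
import math
from typing import List, Sequence


def structure_factors(density: Sequence[float]) -> List[complex]:
    """F(h) = sum_x rho(x) * exp(2*pi*i*h*x/n) for a 1D toy unit cell."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def fourier_synthesis(amplitudes: Sequence[float],
                      phases: Sequence[float]) -> List[float]:
    """rho(x) = (1/n) * sum_h |F(h)| * exp(i*phase(h)) * exp(-2*pi*i*h*x/n)."""
    n = len(amplitudes)
    return [
        sum(
            amplitudes[h] * cmath.exp(1j * phases[h])
            * cmath.exp(-2j * math.pi * h * x / n)
            for h in range(n)
        ).real / n
        for x in range(n)
    ]


# Made-up 1D "electron density" with three "atoms" of different weight.
true_density = [0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 0.0, 2.0]

f_all = structure_factors(true_density)
amplitudes = [abs(f) for f in f_all]        # what the experiment measures
phases = [cmath.phase(f) for f in f_all]    # what the experiment loses

with_phases = fourier_synthesis(amplitudes, phases)
without_phases = fourier_synthesis(amplitudes, [0.0] * len(amplitudes))

print("true density     :", [round(v, 2) for v in true_density])
print("amplitudes+phases:", [round(v, 2) for v in with_phases])
print("amplitudes only  :", [round(v, 2) for v in without_phases])
```

Setting all phases to zero is just one arbitrary way of "discarding" them; any other wrong set of phases would also give a map that does not match the true density.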

3.1.2 Phase determination

There exist several alternative methods for recovering the phase angle information that is lost in the collected diffraction pattern. These include multiple/single isomorphous replacement (MIR/SIR), multi-/single-wavelength anomalous dispersion (MAD/SAD), and molecular replacement (MR). The structure determinations of the four new homologous GH family 12 cellulase structures that are presented in this thesis have been done by the MIR and MR methods. These two phase-determining methods are briefly described below.

Figure 7. A typical X-ray protein diffraction pattern, collected on a T. reesei Cel12A crystal (space group P21), diffracting to 1.9 Å.

Molecular replacement phasing

If there exists a protein with known structure that is similar to the protein of interest, the known structure can be used in a procedure called molecular replacement (MR) to solve the phases of the unknown structure. This method was first described by Rossmann and Blow (1962). The idea of this method is to calculate an electron density map from the known structure and then rotate and translate this calculated density in the unit cell of the unknown protein, until there is a maximum overlap between them. The approximate phases from the correctly oriented search model will then provide a starting point for building the unknown protein. The rough initial phases will improve as the building and refining of the new structure progresses. The molecular replacement method is a powerful phasing method, but since the difference between the correct solution and a wrong solution is often small, it can be tedious work to find the right solution, especially if there is low homology between the search model and the unknown protein structure. Several programs for molecular replacement are available. The molecular replacement program that was used to solve three of the homologous GH family 12 cellulase structures in this thesis was AMoRe (Navaza 1994). The molecular replacement method has today become the most common phase-determining method, since the number of known protein structures has increased drastically.

Isomorphous replacement phasing

When there is no known protein structure similar to the protein whose structure one wants to solve, the phases of the unknown structure have to be determined ab initio, using methods other than MR. Today there exists a wide range of different methods to solve the phases of an unknown protein structure ab initio, but the most commonly used is the multiple isomorphous replacement (MIR) method. MIR was used to solve the first of the four homologous GH family 12 structures that are presented in this thesis, the T. reesei Cel12A structure. The MIR method was the first phasing technique that became available to macromolecular crystallographers. This method and most other ab initio phasing methods are based on the fact that heavy atoms such as transition metals, lanthanides, uranium and even noble gases under pressure can quite successfully be soaked into protein crystals, and they frequently bind to well-defined sites in the native protein. Such a protein crystal is then called a heavy atom derivative. Assuming that the heavy atom soaked into the crystal does not disturb the structure of the crystal (i.e., the derivative is isomorphous with the native crystal), we are able to derive information on the structure factor amplitudes for the heavy atom from the differences between the derivative and the native X-ray datasets (isomorphous replacement).

When the resulting derivative crystal is isomorphous to the native crystal, the structure factors FP of the protein, FPH of the derivative and FH of the heavy atom are related by FH = FPH - FP, which is an equation between complex numbers. The amplitudes |FPH| and |FP| can be measured, giving the isomorphous difference |FPH| - |FP|. The coordinates of the heavy atoms can be determined from a difference Patterson map between the derivative and the native datasets. This will give FH, making it possible to calculate FP. As a single heavy atom derivative (single isomorphous replacement, SIR) still leads to phase ambiguity, it is necessary to use at least a second heavy atom derivative to remove the ambiguity. In theory one extra derivative should be enough, but because of errors in the data, usually several are required (multiple isomorphous replacement, MIR). Anomalous scattering (AS) phase information can be obtained from the scattering of a heavy atom whose absorption edge is close to the wavelength of the X-rays. This small anomalous signal from the heavy atom is sometimes enough to remove the phase ambiguity of one heavy atom derivative (SIRAS). When more than one heavy atom derivative and anomalous differences are used, the method is called MIRAS.
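The phase ambiguity of a single derivative, and its removal by a second one, follows directly from the relation FPH = FP + FH. A minimal Python sketch of this reasoning for a single reflection is given below. It assumes error-free data and perfectly known heavy-atom structure factors; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math
from typing import Tuple


def sir_phase_candidates(fp_amp: float, fph_amp: float,
                         fh: complex) -> Tuple[float, float]:
    """Two possible protein phases (radians) from one derivative.

    From |FPH|^2 = |FP|^2 + |FH|^2 + 2|FP||FH| cos(aP - aH), only the
    cosine of (aP - aH) is fixed, so aP = aH + d or aP = aH - d.
    """
    fh_amp, fh_phase = abs(fh), cmath.phase(fh)
    cos_d = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2 * fp_amp * fh_amp)
    cos_d = max(-1.0, min(1.0, cos_d))  # guard against rounding errors
    d = math.acos(cos_d)
    return (fh_phase + d, fh_phase - d)


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles (radians)."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def resolve_phase(cands1: Tuple[float, float],
                  cands2: Tuple[float, float]) -> float:
    """Pick the candidate from derivative 1 that best agrees with derivative 2."""
    best_pair = min(
        ((a, b) for a in cands1 for b in cands2),
        key=lambda pair: angle_diff(pair[0], pair[1]),
    )
    return best_pair[0]


# Hypothetical reflection: the "true" FP is used only to simulate the data.
fp_true = cmath.rect(100.0, math.radians(40.0))
fh1 = cmath.rect(20.0, math.radians(110.0))   # heavy atoms, derivative 1
fh2 = cmath.rect(15.0, math.radians(-20.0))   # heavy atoms, derivative 2

cands1 = sir_phase_candidates(abs(fp_true), abs(fp_true + fh1), fh1)
cands2 = sir_phase_candidates(abs(fp_true), abs(fp_true + fh2), fh2)
best = resolve_phase(cands1, cands2)

print("derivative 1:", [round(math.degrees(c), 1) for c in cands1])
print("derivative 2:", [round(math.degrees(c), 1) for c in cands2])
print("consistent phase:", round(math.degrees(best), 1))
```

With real data the candidate phases from different derivatives never coincide exactly, which is why phasing programs treat the result as a probability distribution over the phase rather than a single value.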

3.1.3 Model building and structure refinement

Once initial phases have been obtained for the X-ray dataset, a preliminary electron density map can be calculated and displayed. The protein structure model is built by manually fitting the atoms in the protein into this electron density using one of the available protein structure graphics programs, e.g., O (Jones et al. 1991). The structure model is completed with alternating cycles of model building in the graphics program, and maximum-likelihood model refinement, bulk-solvent corrections, and anisotropic scaling using one of the protein structure refinement programs, e.g., CNS (Brünger et al. 1998) or Refmac (Murshudov et al. 1997). The quality of the model during the course of structure building and refinement is monitored by the crystallographic R-factor (Rcryst) and the free R-factor (Rfree) (Brünger 1992a). The Rcryst is a measure of how well the model's calculated structure factor amplitudes correspond to the experimental amplitudes. The Rfree is calculated in the same way as the Rcryst, but with the difference that the experimental data used for the calculation are only a fraction of the total data (typically 3-10 %). This set of the data is never included in any of the many refinements of the protein structure, from the initial model to the final structure. This makes the Rfree a much less biased indicator of the quality of the model than the Rcryst.
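As an illustration of how the two quality indicators relate, the Python sketch below computes both from the commonly used definition R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|. It assumes that the calculated amplitudes have already been scaled to the observed ones; the function names and the six amplitude values are hypothetical, not data from this thesis.

```python
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), amplitudes already scaled."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_cryst_and_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                       is_free: Sequence[bool]) -> Tuple[float, float]:
    """Rcryst over the working reflections, Rfree over the set-aside ones."""
    work_obs: List[float] = []
    work_calc: List[float] = []
    free_obs: List[float] = []
    free_calc: List[float] = []
    for o, c, free in zip(f_obs, f_calc, is_free):
        if free:
            free_obs.append(o)
            free_calc.append(c)
        else:
            work_obs.append(o)
            work_calc.append(c)
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)


# Hypothetical amplitudes; the last reflection is flagged as "free",
# i.e. it is assumed never to have been used in refinement.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 50.0]
f_calc = [90.0, 85.0, 55.0, 42.0, 25.0, 40.0]
is_free = [False, False, False, False, False, True]

r_cryst, r_free = r_cryst_and_r_free(f_obs, f_calc, is_free)
print(f"Rcryst = {r_cryst:.3f}, Rfree = {r_free:.3f}")
```

A real dataset contains thousands of reflections and the free set is chosen at random before refinement starts; a single free reflection is used here only to keep the example short.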

3.2 Protein crystallization

One of the bottlenecks today when determining protein structures by X-ray crystallography is to obtain good crystals of the protein. The first protein crystals were grown over 150 years ago, and these were crystals of hemoglobin (McPherson 1999). Since then, protein crystallization has evolved from trial and error experiments into the more rational crystallization experiments that we perform today, which are based on the empirical data that have been gathered by protein crystallographers over the years. Protein crystallization has today become a science in itself, and has great impact in a wide range of areas within life science.

The most frequently used protein crystallization method today is the vapor diffusion technique (Figure 8). This is also the method that has been used to crystallize all the different GH family 12 enzymes for which structures are presented in this thesis. In this method a small volume (1-10 µl) of the purified and homogeneous protein is mixed with an equal volume of a crystallization solution, and positioned as a small droplet on a siliconized cover slip. The crystallization solution usually consists of buffer, salt and a precipitant, usually polyethylene glycol (PEG) or salt. The cover slip with the droplet of protein and crystallization solution is inverted and positioned over a reservoir containing 200-1000 µl of the crystallization solution, and sealed with grease or oil. The higher concentration of the precipitant in the reservoir, with respect to the droplet, drives the system to reach equilibrium by vapor diffusion of water from the droplet to the reservoir. In a successful crystallization experiment the protein reaches supersaturation, and crystals start to form. Crystals (Figure 9) can grow to a size of 0.1-0.5 mm in all directions after some time of incubation.

Figure 8. Three different types of set-ups for crystallization by vapor diffusion: a) hanging drop, b) sitting drop and c) sandwich drop.

Figure 9. Typical crystal of T. reesei Cel12A, space group P21.

There exist many other crystallization methods besides vapor diffusion, e.g., batch crystallization, dialysis, seeding and free interface diffusion. Each of these methods has its advantages for certain types of experiments, but they are all much less used than the vapor diffusion method.

4 RESULTS AND DISCUSSION

In this chapter, the results will be summarized and briefly discussed. These results are described in much more detail in the four papers (I-IV).

4.1 Aim of thesis

Cellulases are used in a wide range of industrial applications. Examples of industries where GH family 12 enzymes can be used are textile, paper and pulp, detergent, and ethanol production. Many of these industrial processes are performed at elevated temperatures and at non-physiological pHs. This makes it of great importance to have detailed knowledge on how to shift the pH and temperature profiles of the enzymes that potentially will be used in these processes, so that these enzymes remain stable and active. The overall goal of this thesis work was to determine the three-dimensional structures of a set of homologous GH family 12 enzymes by X-ray crystallography, and to identify the structural differences in these homologous Cel12A enzymes that could explain some of the differences in their biochemical performance profiles, e.g., pH range of activity and thermal stability. This knowledge can then be used to modify the Cel12A enzymes of interest towards targeted properties in industrial processes.

The work was done in close collaboration with the company Genencor International Inc. The major business area of Genencor is production of industrial enzymes, and cellulases are one class of enzymes that the company is focused on. Genencor's interest in cellulases, and especially in using GH family 12 enzymes in some of their applications, has made it possible to use the structural and biochemical knowledge that we have gathered on this class of enzymes to improve or modify the enzymes for existing or new industrial applications. The close collaboration also means that most of the results that will be presented in this thesis are the combined results from experiments that have been performed both by us at Uppsala University and by several research groups at Genencor.

4.2 The Trichoderma reesei Cel12A structure (Paper I)

The structure of the fungal Cel12A enzyme from T. reesei will be described in detail here. Since the structures of the other three Cel12A homologues solved within this work to a large extent resemble the T. reesei Cel12A structure, only structural features that significantly differ from the T. reesei Cel12A structure will be described for them. The T. reesei Cel12A enzyme is also the enzyme to which the different GH 12 homologues, and variants of these, will be structurally and biochemically compared.

4.2.1 Crystallization and structure determination

Precipitated or crystalline T. reesei Cel12A protein stocks were washed with water prior to dissolution in 20 % (w/v) solution, then concentrated to 15 mg/ml. Crystals were obtained from 200 mM cacodylate buffer (pH 6.0), 200 mM ammonium acetate and 10-30 % (w/w) mono-methyl-ether (mme) polyethylene glycol (PEG) 2000, at 20-24 °C using hanging and sitting drops (McPherson 1982). Large, single, wedge-shaped crystals grew to a maximum size of 1 mm in all directions within 1-2 days. The crystals belong to the monoclinic space group P21 with cell dimensions a = 69.8 Å, b = 71.4 Å, c = 124.8 Å and β = 91.4°, and have a calculated Vm of 2.0 (Matthews 1968) with an estimated six non-crystallographic-symmetry (NCS) related molecules in the asymmetric unit. The variant M154C crystallized isomorphously with the wild type. For the mercury derivative, 1 mM ethylmercury-thio-salicylate (EMTS) was added to the mother liquor, and the M154C crystals were left for one day for the drop to re-equilibrate. Isomorphous crystals were also obtained when ammonium acetate was replaced with 200 mM trimethyl-lead acetate (Me3Pb).

All data sets were collected on an R-axis IIC image-plate system mounted on a Rigaku RU-200 rotating-anode generator. The apo wild-type and M154C data sets, and the M154C-EMTS and M154C-Me3Pb heavy-atom data sets, that were used for phasing were collected at room temperature. Subsequently, a 1.9 Å resolution native data set was collected at 100 K from a single crystal, using 30 % mme PEG 2000 as cryo-protectant. The data sets were processed and scaled with DENZO and SCALEPACK (Otwinowski and Minor 1997). Data collection and processing statistics for the native data set are given in Appendix I.

Three sites for mercury atoms could be readily identified in the isomorphous difference Patterson map for the EMTS-soaked crystal data against the apo M154C mutant data. Initial positions of the heavy atoms were identified using the program RSPS (Knight 2000), and refined with MLPHARE (Otwinowski 1991). Cross-phased difference Fouriers revealed lead sites for crystals grown in the presence of trimethyl-lead. Subsequent heavy-atom refinement using SHARP (La Fortelle et al. 1997) revealed further binding sites for each derivative. The MIR phases obtained from SHARP (FOMcentric = 0.6, FOMacentric = 0.43) produced an interpretable electron density map. The map was improved by exploiting the six-fold NCS of the asymmetric unit. Initial local symmetry operators were determined in O (Jones et al. 1991) by using the positions of the heavy-atom derivatives. The operators were subsequently refined, and the electron densities were averaged, with programs from the RAVE program package (Kleywegt and Jones 1999). Phases were further refined using automated solvent flattening and histogram matching, with a solvent content of 40 %, and the resolution was extended to the resolution limits of the apo mutant data set, using the program DM (Cowtan and Main 1998).

All model building was performed using the program O. The initial model was built using skeletonised density, main-chain and side-chain databases and baton building methods (Jones et al. 1991; Jones and Kjeldgaard 1997). Maximum likelihood model refinement was performed using the program CNS (Brünger et al. 1998). Utilizing the native high-resolution data set required the use of molecular replacement to position the molecular model more accurately in the unit cell of the frozen crystal. A set of 2567 reflections, representing 2.7 % of the total 86691 reflections between 15 and 1.9 Å, was used to monitor the R-free (Brünger 1992a). A summary of refinement and final model statistics, based on the native high-resolution data set, is given in Appendix II.
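As an illustration of how a Matthews coefficient such as the Vm quoted above is obtained, the following minimal Python sketch computes the unit cell volume from the reported P21 cell dimensions and divides it by the total protein mass in the cell. The molecular weight of 25 kDa is an assumption made only for this example (it is not stated in this section), and the function names are hypothetical. The solvent fraction uses the common approximation 1 - 1.23/Vm.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (A^3) of a monoclinic unit cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, n_sym_ops, n_mol_asu, mol_weight_da):
    """Vm (A^3/Da) = V / (Z * MW), with Z = symmetry operators * molecules per ASU."""
    return cell_volume / (n_sym_ops * n_mol_asu * mol_weight_da)

# Cell dimensions quoted in the text for the T. reesei Cel12A P2(1) crystals.
volume = monoclinic_cell_volume(69.8, 71.4, 124.8, 91.4)

# ASSUMPTION: roughly 25 kDa per molecule; the value is not given in this section.
ASSUMED_MW_DA = 25000.0

# P2(1) has two symmetry operators; the text estimates six NCS molecules per ASU.
vm = matthews_coefficient(volume, n_sym_ops=2, n_mol_asu=6,
                          mol_weight_da=ASSUMED_MW_DA)

# Common approximation for the solvent fraction of a protein crystal.
solvent_fraction = 1.0 - 1.23 / vm

print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {solvent_fraction:.0%}")
```

Under the assumed molecular weight, the result is close to the Vm of 2.0 and the solvent content of 40 % given in the text; a different molecular weight would shift both values.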

4.2.2 Protein structure

The fold of the T. reesei Cel12A protein was, as expected, the same as that of the previously solved GH family 12 enzyme from S. lividans, CelB2, and similar to that of the GH family 11 xylanases, i.e., a β-sandwich. T. reesei Cel12A consists of 15 long β-strands that fold into two twisted, largely anti-parallel β-sheets, A and B, which pack on top of one another (Figure 10). The convex β-sheet A consists of six anti-parallel strands, labeled A1-6. Sheet B consists of nine strands, B1-9, and is largely anti-parallel. The β-strands in the two β-sheets are numbered consecutively from 1-6/9 after their order in the sheets, with β-strands A/B1 closest to the proposed non-reducing end of the binding cleft, to the left in Figure 10. There is a single α-helix in the structure that packs against the outer convex surface of β-sheet B. The enzyme is compact, with dimensions of approximately 40 Å x 40 Å x 30 Å. The only two cysteines in the T. reesei Cel12A protein, Cys 4 and Cys 32, form a disulfide bond that bridges β-strands A1 and A2.

Post-translational modifications

The N-terminal glutamine undergoes a cyclization and condensation reaction with the amine group of the N-terminus, to produce a cyclic pyro-glutamate. This is common in fungal extracellular enzymes and often makes the protein resistant to proteolytic degradation. Also, a NAG residue is found covalently attached to Asn 164, which is part of an N-glycosylation amino-acid sequence motif in the protein. The NAG residue stacks with the side-chain of Tyr 124 from the same molecule, and with Asn 91 of an NCS-related molecule in an apparent dimer contact.

Figure 10. Schematic ribbon diagram drawing, a) top and b) side view, of the T. reesei Cel12A crystal structure, color-ramped according to residue number, starting with red at the N-terminus and ending with blue at the C-terminus. The two β-sheets in the structure are labeled A and B. Individual strands are labeled (A1-A6 or B1-B9) according to their positions in the two β-sheets.


Figure 11. Close-up, a) top and b) side view of the substrate-binding cleft of T. reesei Cel12A. Some of the most important residues have their side chains drawn, with carbon atoms in gold, nitrogens blue, oxygens red and sulphur green.

4.2.3 Substrate-binding cleft

The concave surface of β-sheet B forms a large crevice in the molecular surface, perpendicular to the strand direction. This crevice is approximately 35 Å long, 8 Å wide and 15 Å deep, and is the cellulose substrate binding site. In T. reesei Cel12A, we estimate that the cleft has the potential to bind at least six glucose residues, spanning from -4 to +2, using the binding site nomenclature of Davies et al. (1997b).

Substrate binding

Glucosyl binding sites are frequently formed by the exposed surfaces of aromatic side-chains of Trp, Phe and Tyr residues. This is also true for the binding crevice of T. reesei Cel12A, which contains two fully exposed tryptophan rings as well as a pair of exposed tyrosine residues. Three more aromatic side-chains have ring edges exposed in the crevice. The upper edge of one side of the crevice (the top strip of residues in Figure 11) has a clearly defined hydrophobic strip made of side-chains from Trp 7, Trp 22, Val 57 and Phe 202. The edge of the crevice is completed by the side-chains of Met 154, Asn 151 and Ile 130. The side-chains that fill the bottom of the crevice are strikingly different from the hydrophobic strip (Figure 11). The side-chains from strands B1-4 in this region are predominantly polar, and rich in asparagine, threonine and some glutamine residues.

Catalytic site

The crevice contains two glutamate residues, Glu 116 and Glu 200, which are invariant throughout the GH family. We predicted Glu 116 to be the catalytic nucleophile in T. reesei Cel12A, an identification supported by the site-directed mutagenesis studies of Okada et al. (2000). The carboxylate group oxygens of the two glutamates are separated by ~5.4 Å, a distance frequently observed for the nucleophile/acid-base pair involved in a retaining mechanism (McCarter and Withers 1994). The nucleophile is in close proximity to two other residues that are strictly conserved in family 12, Asp 99 and Met 118 (Figures 11 and 12). Together with the two invariant glutamates, the aspartate is the third member of a catalytic trio, similar to those first observed in the GH clan-B enzymes (Keitel et al. 1993; Divne et al. 1994). The acid-base Glu 200 forms only one hydrogen bond, to the side-chain of Asn 95. The side-chain of Ile 130 is suitably positioned at the bottom of the crevice to help form at least part of the +1 product site. Both the -1 and +1 sites are devoid of exposed aromatic side-chains.

Substrate-binding cleft reducing end

In both T. reesei Cel12A and S. lividans CelB2, residues from the cord region, in particular Pro 129/133 - Ile 130/134, are likely to form the bottom of the +2 (or +3) site. The proline ring is sandwiched between two aromatic residues, Trp 120/124 and Tyr 147/Trp 151. The cord is structurally well conserved between T. reesei Cel12A and S. lividans CelB2, and among the six NCS molecules. In Cel12A there are no signs of the conformational flexibility that has been suggested to occur upon substrate binding in T. reesei Xyn II (GH family 11) (Muilu et al. 1998).

4.3 Thermal stability and activity of GH family 12 enzymes (Paper II)

In this study we have measured the stability and activity of several GH family 12 homologues, and of point mutants of these, in an attempt to identify residues important for thermal stability. Significant biochemical diversity was seen among the examined homologues (Table 2). Notably, the enzyme from the fungus Hypocrea schweinitzii (Goedegebuur et al. 2002) differs from the T. reesei Cel12A enzyme at only 14 residues, but is significantly less thermally stable. We have systematically introduced these 14 differences into the T. reesei Cel12A enzyme by site-directed mutagenesis, and examined their effects on the thermal stability of the enzyme.

Secondary str. elem.   A1 B1 B2 A2
T. reesei seq.         1 10 20 30 40
S. lividans            -DTTICEPFGTTTIQGRYVVQNNRWGST----APQCVTATD-----T
S. sp 11AG8            -NQQICDRYGTTTIQDRYVVQNNRWGTS----ATQCINVTG-----N
T. reesei              --QTSCDQWATFTG-NGYTVSNNLWGASAGS-GFGCVTAVSLS-GGA
H. grisea              QIRSLCELYGYWSG-NGYELLNNLWGKDTATSGWQCTYLDGTNNGGI
H. schweinitzii        --QTSCDQYATFSG-NGYIVSNNLWGASAGS-GFGCVTSVSLN-GAA

Secondary str. elem.   A3 B3
T. reesei seq.         50 60 70
S. lividans            GFRVTQADGSAPTNGAPKSYPSVFNGCHYTNCSPGTDLPVRLDTVSA
S. sp 11AG8            GFEITQADGSVPTNGAPKSYPSVYDGCHYGNCAPRTTLPMRISSIGS
T. reesei              SWHAD-WQWSGGQ-NNVKSYQNSQIAI------PQKRTVNSISS
H. grisea              QWSTA-WEWQGAP-DNVKSYPYVGKQI------QRGRKISDINS
H. schweinitzii        SWHAD-WQWSGGQ-NNVKSYQNVQINI------PQKRTVNSIGS

Secondary str. elem.   A5 B5 B6
T. reesei seq.         80 90 100 110 120
S. lividans            APSSISYGFVDG-AVYNASYDIWLDPTARTDG-VNQTEIMIWFNRVG
S. sp 11AG8            APSSVSYRYTGN-GVYNAAYDIWLDPTPRTNG-VNRTEIMIWFNRVG
T. reesei              MPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYG
H. grisea              MRTSVSWTYDRTDIRANVAYDVFTARDPDHPNWGGDYELMIWLARYG
H. schweinitzii        MPTTASWSYSGSDIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYG

Secondary str. elem.   B9 B8 B7 A6
T. reesei seq.         130 140 150 160 170
S. lividans            PIQPIGSPVGTASVGGRTWEVWSGGNGSNDVLSFVAP-SAISGWSFDV
S. sp 11AG8            PVQPIGSPVGTAHVGGRSWEVWTGSNGSNDVISFLAP-SAISSWSFDV
T. reesei              DIGPIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQ-TNTTNYSGDV
H. grisea              GIYPIGTFHSQVNLAGRTWDLWTGYNGNMRVYSFLPPSGDIRDFSCDI
H. schweinitzii        DIGPIGSSQGTVNVGGQTWTLYYGYNGAMQVYSFVAQ-SNTTSYSGDV

Secondary str. elem.   alpha helix B4 A4:1 A4:2
T. reesei seq.         180 190 200 210 218
S. lividans            MDFVRATVA-RGLAENDWYLTSVQAGFEPWQ-NGAGLAVNSFSSTVET
S. sp 11AG8            KDFVDQAVS-HGLATPDWYLTSIQAGFEPWE-GGTGLAVNSFSSAVNA
T. reesei              KNFFNYLRDNKGYNAAGQYVLSYQFGTEPFTGSG-TLNVASWTASIN-
H. grisea              KDFFNYLERNHGYPAREQNLIVYQVGTECFTGGPARFTCRDFRADLW-
H. schweinitzii        KNFFNYLRDNKGYNAGGQYVLSYQFGTEPFTGSG-TLNVASWTASIN-

Figure 12. Structure based sequence alignment of five GH family 12 enzymes with known structure. The secondary structure elements of the proteins are drawn at the top of the alignment. The position of the nucleophile and the acid-base in the sequences is indicated with filled and open arrows, respectively. The aligned protein sequences, with their GenBank or PDB access codes indicated in parentheses, are: Streptomyces lividans CelB2 (U04629, 2NLR); Streptomyces sp. 11AG8 Cel12A (AF233376, 1OA4); Humicola grisea Cel12A (AF435071, yyy); Trichoderma reesei Cel12A (AB003694, 1H8V); Hypocrea schweinitzii Cel12A (AF435068, 1OA3).

4.3.1 Thermal stability

Circular-dichroism (CD) experiments to determine the thermal stability of the Cel12A homologues were performed on an Aviv 62ADS spectrophotometer (Protein Solutions, Lakewood, NJ). Buffer conditions were 0.05 M Bis-Tris propane and 0.05 M ammonium acetate, adjusted to pH 8.0 with acetic acid. The final protein concentration for each experiment was in the range of 10-20 µM. Data were collected in a cell with a 0.1 cm path length. The experiments were performed at 217 nm, the wavelength in the far-UV spectrum with the maximum signal difference for a predominantly β-sheet protein. The temperature was increased from 30 to 90 °C, with data collected every two degrees. The equilibration time at each temperature was 0.1 minutes and data were collected for 4 seconds per sample. The mid-point of the transition (Tm) is an apparent value because the thermal denaturation of the Cel12A proteins studied was not reversible. The apparent Tm for each examined enzyme is listed in Table 2.
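To illustrate what a transition mid-point means for a melting curve sampled every two degrees between 30 and 90 °C, the sketch below estimates an apparent Tm from a synthetic two-state curve by linear interpolation at half of the total signal change. This is an illustration under stated assumptions, not the fitting procedure used in Paper II: the curve, its width and the function name `apparent_tm` are all hypothetical, and the first and last points are simply taken as the folded and unfolded baselines.

```python
import math

def apparent_tm(temps, signal):
    """Estimate the transition mid-point by linear interpolation.

    The signal is normalised between its first and last values (taken as the
    folded and unfolded baselines), and the temperature at which the
    normalised signal crosses 0.5 is returned.
    """
    lo, hi = signal[0], signal[-1]
    frac = [(s - lo) / (hi - lo) for s in signal]
    for i in range(1, len(frac)):
        f0, f1 = frac[i - 1], frac[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("no mid-point crossing found")

# Synthetic two-state melting curve sampled as in the text: 30-90 C, every 2 C.
# TRUE_TM and WIDTH are illustrative values, not measured data.
TRUE_TM, WIDTH = 54.4, 2.0
temps = list(range(30, 91, 2))
signal = [1.0 / (1.0 + math.exp(-(t - TRUE_TM) / WIDTH)) for t in temps]

tm = apparent_tm(temps, signal)
print(f"apparent Tm = {tm:.1f} C")
```

Because the normalisation divides by the total signal change, the same function works whether the raw CD signal rises or falls on unfolding.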

Compared to the Tm of 54.4 °C for the T. reesei Cel12A enzyme, the H. schweinitzii enzyme has a Tm that is 5.2 °C lower, while the Streptomyces sp. 11AG8 Cel12A homologue has a Tm that is 12.6 °C higher (Table 2). The 14 T. reesei Cel12A variants recruited from the H. schweinitzii Cel12A enzyme have Tm changes, compared to the WT enzyme, ranging from -4.0 °C to +2.5 °C for the most stable variant. The most significant changes in stability occur within the first 63 amino acids of the enzyme. The T. reesei Cel12A A35S variant has the largest decrease in stability (-4.0 °C), while mutating the same residue in T. reesei Cel12A to a valine (recruited from the more stable homologue, S. sp. 11AG8 Cel12A) produces the largest increase in stability (+7.7 °C). Interestingly, three substitutions recruited from the less stable H. schweinitzii Cel12A enzyme, G41A and T11S/T16I, cause a Tm increase (of 2.5 and 1.1 °C, respectively) when introduced into T. reesei Cel12A (Table 2).

The Tm variation from least to most stable homologue is 20.9 °C (Table 2). These types of stability differences among homologous proteins are common, and there has long been an interest in identifying the sources of the increased stability in the extremophiles (Jaenicke 2000). However, no generically reliable rules from comparisons of sequences alone or, even, modeled structures have emerged to predict stabilizing changes. Although we find a high level of structural homology between the Cel12A proteins from the mesophile (T. reesei) and the more stable enzyme from the alkalophilic bacterium (S. sp. 11AG8), it is by no means obvious, given the low sequence identity of 28 % (Figure 12), which changes are needed to stabilize the T. reesei protein. However, we were able to use the characterization of a less stable homologue, H. schweinitzii Cel12A, together with examination of the T. reesei Cel12A structure and sequence analysis, to guide our choice of recruitment from the more stable homologue. Our clear identification of a single residue responsible for large differences in stability within a structural family may be unusual.

Table 2. Thermal denaturation data, and relative specific enzyme activity
______________________________________________________________________
GH12 homologue            Variant      ΔTm(a)    Tm (°C)   Activity(b)
______________________________________________________________________
T. reesei Cel12A                         0.0      54.4      1.00
H. schweinitzii Cel12A                  -5.2      49.2      1.08
S. sp. 11AG8 Cel12A cd                  11.3      65.7      3.84
S. sp. 11AG8 Cel12A fl                  12.4      66.8      3.78
T. koningii Cel12A                       4.1      58.5      1.00
G. roseum Cel12C                        -8.5      45.9      0.42
F. javanicum Cel12A                     -0.3      54.1      N.D.

T. reesei Cel12A          A35V           7.7      62.1
                          W7Y           -1.0      53.4
                          T11S/T16I      1.1      55.5
                          A35S          -4.0      50.4
                          S39N           0.5      54.9
                          G41A           2.5      56.9
                          S63V          -0.8      53.6
                          A66N           0.1      54.5
                          S77G           0.1      54.5
                          N91D           0.5      54.9
                          S143T          0.5      54.9
                          T163S          0.3      54.7
                          N167S          0.2      54.6
                          A188G          0.5      54.9
______________________________________________________________________
(a) ΔTm values are relative to T. reesei Cel12A WT.
(b) The specific enzyme activity is expressed as molar specific activity relative to T. reesei Cel12A WT.
N.D. = not determined
______________________________________________________________________
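The relation between the Tm and ΔTm columns of Table 2, and the overall spread in stability among the homologues, can be checked with a few lines of Python. The Tm values are copied from Table 2; the dictionary and function names are introduced here only for illustration.

```python
# Apparent Tm values (C) for the GH family 12 homologues, copied from Table 2.
TM_HOMOLOGUES = {
    "T. reesei Cel12A": 54.4,
    "H. schweinitzii Cel12A": 49.2,
    "S. sp. 11AG8 Cel12A cd": 65.7,
    "S. sp. 11AG8 Cel12A fl": 66.8,
    "T. koningii Cel12A": 58.5,
    "G. roseum Cel12C": 45.9,
    "F. javanicum Cel12A": 54.1,
}
REFERENCE = "T. reesei Cel12A"

def delta_tm(tm_by_enzyme, reference):
    """Return Tm differences relative to the reference enzyme, to 0.1 C."""
    ref = tm_by_enzyme[reference]
    return {name: round(tm - ref, 1) for name, tm in tm_by_enzyme.items()}

deltas = delta_tm(TM_HOMOLOGUES, REFERENCE)

# Spread between the least and the most stable homologue.
tm_span = round(max(TM_HOMOLOGUES.values()) - min(TM_HOMOLOGUES.values()), 1)

for name, d in deltas.items():
    print(f"{name:26s} {d:+5.1f}")
print(f"span = {tm_span} C")
```

The span computed this way is the difference between the full-length S. sp. 11AG8 Cel12A and the G. roseum Cel12C enzyme.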

4.3.2 Relative enzyme activity

To evaluate the specific enzyme activity of the Cel12A homologues, an o-nitrophenyl β-cellobioside (oNPC, Sigma N 4764) hydrolysis assay was used. To the wells of a microtiter plate, 100 µl of 50 mM sodium acetate, pH 5.5, and 20 µl of 25 mg/ml oNPC in assay buffer were added. Once equilibrated, 10 µl of cellulase was added and the plate was incubated at 40 °C for 10 minutes. To stop the reaction, 70 µl of 0.2 M glycine, pH 10.0, was added. The plate was then read in a microtiter plate reader at 410 nm. As a reference, 10 µl of a 0.1 mg/ml solution of T. reesei Cel12A enzyme provided an OD of around 0.3. Extinction coefficients for the other Cel12 homologues were calculated on the basis of their amino-acid compositions.
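A molar extinction coefficient can be estimated from amino-acid composition in several ways; the text does not say which method was used. The sketch below uses one widely used approximation, in which the absorbance at 280 nm is summed over tryptophan, tyrosine and cystine (disulfide) contributions, and then applies the Beer-Lambert law to obtain a molar concentration. The per-residue values, the function names and the short demo peptide are assumptions for illustration only, and the peptide is not one of the Cel12A sequences.

```python
# Per-residue molar absorbance at 280 nm (M^-1 cm^-1). These are commonly used
# literature values and an ASSUMPTION here: the text does not name its method.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125

def extinction_coefficient_280(sequence, n_disulfides=0):
    """Estimate the molar extinction coefficient at 280 nm from composition."""
    seq = sequence.upper()
    return (EPS_TRP * seq.count("W")
            + EPS_TYR * seq.count("Y")
            + EPS_CYSTINE * n_disulfides)

def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), in mol/l."""
    return a280 / (epsilon * path_cm)

# HYPOTHETICAL peptide, for illustration only (not a Cel12A sequence).
demo_seq = "QTSCWYGWC"
eps = extinction_coefficient_280(demo_seq, n_disulfides=1)
conc = molar_concentration(0.5, eps)

print(f"epsilon280 = {eps} M^-1 cm^-1, c = {conc * 1e6:.1f} uM")
```

With molar concentrations in hand, activities measured in the oNPC assay can be expressed per mole of enzyme and then divided by the T. reesei Cel12A WT value, which is how the relative activities in Table 2 are described.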

All examined GH family 12 proteins were active enzymes. There was a nine-fold difference in the specific activities of the homologues on oNPC (Table 2). The T. reesei Cel12A variants with the greatest and least stability (A35V and A35S, respectively) showed ~30 % higher activity than the WT enzyme. The activity of selected Cel12 enzymes as a function of temperature was also determined. The Cel12 enzymes with the greatest thermal stability, S. sp. 11AG8 Cel12A and the T. reesei Cel12A variant A35V, showed a continual increase in activity over the full temperature range from 25 to 60 °C. The less stable homologues, T. reesei Cel12A and H. schweinitzii Cel12A, and the destabilized T. reesei Cel12A variant A35S showed a decrease in activity at the highest temperature, presumably due to thermal inactivation.

4.3.3 Structural features affecting stability

To better understand the structural basis for the large differences in stability among the GH family 12 homologues, we determined the crystal structures of two of these enzymes: the fungal Cel12A from H. schweinitzii and the catalytic domain (cd) of the bacterial Cel12A from S. sp. 11AG8. We also determined the structure of the most stabilized T. reesei Cel12A variant, A35V, which has an increase in Tm of 7.7 °C.

Protein crystallization

The T. reesei Cel12A A35V variant was crystallized under conditions similar to those used for the WT enzyme. Large, single, wedge-shaped crystals grew to a maximum size of 1 mm in all directions within 1-2 days. The crystals belong to the space group P21 with cell dimensions a = 68.3 Å, b = 71.3 Å, c = 119.3 Å and β = 91.5°, and have a calculated Vm of 1.9 (Matthews 1968) with 6 molecules in the asymmetric unit. S. sp. 11AG8 Cel12A crystals were obtained from 10-20 % (w/w) mme PEG 5000, 200 mM sodium cacodylate, pH 5.0-6.0, at 25 °C, using a protein concentration of 15 mg/ml. The crystals belong to space group P212121 with cell dimensions a = 65.1 Å, b = 54.5 Å, c = 62.5 Å, and have a calculated Vm of 2.4 (Matthews 1968) with one molecule in the asymmetric unit. H. schweinitzii Cel12A crystals were obtained from a crystallization agent containing 200 mM cacodylate buffer (pH 6.0), 200 mM ammonium acetate, 10-30 % (w/w) mme PEG 2000 and 1 % isopropyl alcohol, at 20-24 °C, using a protein concentration of 15 mg/ml. Single, larger crystals (0.1 mm in all directions) were obtained after 6 months of incubation. These crystals belong to space group P21 with cell dimensions a = 62.5 Å, b = 77.5 Å, c = 83.4 Å and β = 98.5°, and have a calculated Vm of 2.0 (Matthews 1968) with 4 molecules in the asymmetric unit.

Structure solutions and refinements

The bacterial S. sp. 11AG8 Cel12A catalytic domain (cd) structure was solved by molecular replacement (MR), using X-plor 3.1 (Brünger 1992b), with the bacterial S. lividans CelB catalytic core structure (PDB code 1NLR) as the search model. The structure of H. schweinitzii Cel12A was solved by MR with Amore (Navaza 1994), using all of the homologous WT T. reesei Cel12A structure (residues 1-218) as a search model. The MR search gave one clear solution, with 4 molecules in the asymmetric unit. The initial map was improved by exploiting the four-fold NCS of the asymmetric unit. The changes in unit cell parameters of the T. reesei Cel12A A35V mutant with respect to the WT structure required the use of molecular replacement, with Amore, to solve the structure. Summaries of model refinement statistics are given in Appendix I.

The H. schweinitzii Cel12A structure

The fungal Cel12A enzyme from H. schweinitzii crystallizes with four NCS-related molecules in the asymmetric unit. The complete sets of Cα atoms from the four NCS molecules in the H. schweinitzii Cel12A structure and the six NCS molecules in the T. reesei Cel12A structure can be superimposed with pair-wise root mean square deviations (RMSD) in the range of 0.3-0.5 Å. Some of the biggest main-chain differences between the two structures are found in the two loops corresponding to residues 11-16 and 37-43, connecting β-strands B1 to B2 and A2 to A3 (Figure 13). Most of the 14 differences in the H. schweinitzii Cel12A enzyme compared to the T. reesei enzyme are located on the protein surface and are distributed over the whole molecule (Figure 13). Many of the side-chains point out into the surrounding solution. All of these substitutions (except N91D) are from one neutral amino acid to another, and most have little or no effect on the Tm of the enzyme (ΔTm values in the range 0.1-0.5 °C; Table 2). There is a clustering of substitutions on the first β-strands in the structure, A2-3 and B1-3, where some of the individual substitutions have a large effect on the Tm of the T. reesei Cel12A enzyme. This region also has some of the largest structural differences between the two fungal enzymes. The substitution in this region that has the biggest effect on the Tm is the alanine to serine substitution at residue 35, located on β-strand A2 of the smaller β-sheet A, close to the N-terminus (Figure 13). This substitution causes a reduction of the Tm by 4.0 °C when introduced into T. reesei Cel12A (Table 2). The likely explanation for this reduction in Tm is the introduction of a hydrophilic residue into the hydrophobic environment at the edge of the two β-sheets. This disrupts the hydrophobic interactions with the side chains of the three surrounding hydrophobic residues.

The T. reesei Cel12A A35V structure

The A35V variant crystallizes in the same space group as the WT T. reesei Cel12A enzyme, with similar cell constants. Like the WT enzyme, it crystallizes with six NCS-related protein molecules in the asymmetric unit, making up three pairs of interacting molecules. The biggest differences among the six NCS molecules correspond to loop regions, which take on different conformations in the different NCS molecules, and most of these are affected by the crystal contacts.

The functional group of residue A35 of T. reesei Cel12A points into the core between the interacting β-sheets, where it interacts with a number of spatially adjacent side-chains. Mutating residue 35 does not cause a major conformational change or local rearrangement in the structure. However, the introduction of a valine at this position influences packing. The methyl groups make good van der Waals contacts with the side-chains of the neighboring hydrophobic residues (Figure 14a). In the WT enzyme, the packing of the alanine 35 Cβ methyl group is not as tight as in the A35V variant. The approximately equivalent pairs of Cα atoms from the NCS molecules in the T. reesei Cel12A WT and A35V structures can be superimposed with pair-wise RMSDs in the range of 0.4-0.6 Å.
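The pair-wise RMSD values quoted above summarise how far apart equivalent Cα atoms lie after two molecules have been superimposed. A minimal sketch of the final step is given below. It assumes that the two coordinate sets have already been optimally superimposed and that atoms are listed in equivalent order; the superposition itself (rotation and translation) is not shown. The function name and the three-atom coordinate sets are hypothetical and are not taken from any of the structures.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D points (same units as input).

    Assumes the two sets are already superimposed and listed in equivalent order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# HYPOTHETICAL C-alpha positions (in Angstrom) for two superimposed molecules.
mol_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mol_b = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.5)]

value = rmsd(mol_a, mol_b)
print(f"RMSD = {value:.2f} A")
```

For a set of NCS copies, the same function would be applied to every pair of molecules to obtain a range such as the one reported in the text.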

The Streptomyces sp. 11AG8 Cel12A structure

The cd of the bacterial S. sp. 11AG8 Cel12A enzyme crystallizes with only one molecule in the asymmetric unit. The S. sp. 11AG8 Cel12A cd structure is most similar to that of the Streptomyces lividans GH family 12 enzyme CelB2, with which it shares 72 % sequence identity (Figure 12), and there are no insertions or deletions in the sequences. In the S. sp. 11AG8 Cel12A structure there is no observable density for the residues in the "linker" region beyond Ala 222. The complete sets of Cα atoms from the two structures can be superimposed with an RMSD of 0.49 Å. There are two disulfide bridges in the S. sp. 11AG8 enzyme. The one between Cys 5 and Cys 31, linking β-strands A1 and A2, is conserved throughout the whole of GH family 12 (Paper I). The second disulfide bridge is formed between Cys 64 and Cys 69, and is found in an extended loop between β-strands B3 and A5, which forms part of the substrate-binding cleft of the enzyme. Structural alignment shows that the bacterial GH family 12 enzymes have an insertion in this loop compared to the fungal GH family 12 enzymes (Paper I). This suggests that the purpose of this disulfide bridge may be to stabilize the insertion.

One of the largest main-chain differences in the S. sp. 11AG8 Cel12A structure compared with that of S. lividans CelB2 is at residue V/A34, the site equivalent to the T. reesei Cel12A A35V mutation, which was shown to affect the temperature stability of the T. reesei Cel12A enzyme. The loop corresponding to residues 33-38 has a small rigid-body shift, so that in the CelB2 enzyme it moves closer to the upper sheet (Figure 14b). In CelB2, the Cβ of A34 has hydrophobic contacts with the three equivalent inward-pointing residues from the upper sheet (T11, I13 and V19, with inter-atomic separations in the range 3.6-4.2 Å). In the S. sp. 11AG8 Cel12A structure, the methyl groups of V34 also remain in good van der Waals contact with the equivalent side-chains (separations of 3.5-4.3 Å).

Figure 13. Schematic ribbon diagram drawing, (a) top and (b) side view, of the H. schweinitzii Cel12A crystal structure, color-ramped according to residue number, starting with red at the N-terminus and ending with blue at the C-terminus of the structure. The structures have side-chains drawn for the 14 residues that differ from the T. reesei Cel12A protein sequence.

Figure 14. (a) Interactions and conformational changes close to residue 35 of the fungal GH 12 enzymes from T. reesei (WT and A35V have carbon atoms colored yellow and goldenrod, respectively) and H. schweinitzii (carbons colored gold). Red bubbles indicate contacts in T. reesei A35V Cel12A, blue bubbles in H. schweinitzii Cel12A. (b) Interactions and conformational changes close to residue 34 of the bacterial GH 12 enzymes from S. sp. 11AG8 (carbons colored gold) and S. lividans (carbons colored yellow). Red bubbles indicate contacts in S. lividans CelB2, blue bubbles in S. sp. 11AG8 Cel12A.

4.3.4 Discussion

Examination of the Cel12A structures provides a structural rationalization for the differences in stability between the enzymes. The clear identification of a single residue responsible for large differences in stability within a structural family, as in this study, may be unusual. It is obvious from other studies (Lehmann and Wyss 2001; Lehmann et al. 2002), and from the limited sampling presented in this study, that not all recruitments from more stable homologues are stabilizing. Conversely, not all recruitments from a less stable homologue are destabilizing. In contrast to the T. reesei Cel12A A35S variant, six of the 14 substitutions recruited from the H. schweinitzii Cel12A stabilized the T. reesei Cel12A enzyme. The recruitment of mutations that increase stability from less stable homologues has previously been reported for other protein families (Shaw et al. 1999; Lehmann et al. 2000; Perl et al. 2000).

4.4 The Humicola grisea Cel12A structure, and stabilizing cysteines (Paper III)

In this study one additional GH family 12 homologue, Cel12A from the fungus Humicola grisea, has been biochemically characterized. The secreted H. grisea Cel12A gene product is identical to the GH family 12 gene product reported for Humicola insolens, and it shares only 44 % sequence identity with the T. reesei Cel12A enzyme (Figure 12).

4.4.1 Thermal stability

As for the other GH family 12 homologues, we have characterized the thermal stability of the H. grisea enzyme (Table 3). The H. grisea Cel12A enzyme is much more thermally stable than the T. reesei WT enzyme. It has a Tm (68.7 °C) that is 14.3 °C higher than the Tm (54.4 °C) of the T. reesei enzyme (Table 3). In cases like this, with low homology between the enzymes, systematic recruitment to identify the underlying cause of the variation is difficult or impossible. A statistical approach is then needed to determine the preferential occurrence of certain amino acids at particular positions. The preference data can be gathered from a subset of homologues with natural diversity and available sequences (Shaw et al. 1999).

To identify residues that contribute to the different Tm behaviors of the H. grisea and T. reesei Cel12A enzymes, a sequence alignment of 21 GH family 12 enzymes (Goedegebuur et al. 2002) and a structure alignment of the four homologous Cel12A enzymes with known protein structure (Figure 12) were used. Based on these data, we identified three free cysteine residues (C175, C206 and C216) in the H. grisea Cel12A enzyme as being potentially important for enzyme stability (Figures 12 and 15).

Table 3. Thermal denaturation data and relative specific enzyme activity
______________________________________________________________________
GH12 homologue         Variant              ΔTm(a)   Tm (°C)   Activity(b)
______________________________________________________________________
T. reesei Cel12A       WT                     0.0     54.4      1.00
H. grisea Cel12A(c)    WT                    14.3     68.7      0.12

T. reesei Cel12A(c)    G170C                  2.1     56.5      0.69
                       P201C                  3.9     58.3      0.21
                       V210C                  0.1     54.5      NA
                       G170C/P201C            0.7     55.1      0.38
                       P201C/V210C            0.7     55.1      0.50
                       G170C/P201C/V210C      0.0     54.4      0.14
                       G170C/V210C            ND      ND        0.28

H. grisea Cel12A(d)    C175G                  1.3     70.0      0.65
                       C175S                  0.2     68.9      0.73
                       C206P                 -9.1     59.6      0.91
                       C206S                 -5.4     63.3      1.55
                       C216V                  0.8     69.5      0.69
                       C216S                 -5.6     63.1      0.65
______________________________________________________________________
(a) ΔTm values are relative to: (c) T. reesei Cel12A WT, (d) H. grisea Cel12A WT.
(b) The specific activities given are relative to: (c) T. reesei Cel12A WT, (d) H. grisea Cel12A WT.
______________________________________________________________________

A possible simple explanation for the increased stability of the H. grisea enzyme is that two of these Cys residues form an additional disulfide bond, besides the disulfide bond between Cys residues 6 and 35 (H. grisea Cel12A numbering). The latter disulfide bond is totally conserved within GH family 12 and is known to be very important for stability. However, examination of the T. reesei Cel12A structure made it look unlikely that the three additional Cys residues in H. grisea Cel12A form an extra disulfide bond in the enzyme. It is unusual for a protein to have so many free cysteines unless they have a function, since they can be oxidized or lead to misformation of essential disulfides (e.g., Cys6-35). To determine whether these free cysteines are important for stability, we introduced them into the corresponding positions in the T. reesei Cel12A enzyme, as single, double and triple mutations. We also constructed six H. grisea Cel12A variants in which we exchanged the free cysteines for the corresponding residues in the T. reesei enzyme (C175G, C206P and C216V) and for serine, and determined the thermal stability of these variants (Table 3). The free cysteines in H. grisea Cel12A seem to play a key role in modulating the stability of the enzyme (Table 3). The T. reesei Cel12A cysteine variants recruited from H. grisea Cel12A have Tm changes ranging from 0.1 °C to an increase of 3.9 °C for the most stable variant (P201C). The H. grisea Cel12A variants have Tm changes ranging from a decrease of 9.1 °C for the least stable variant (C206P) to an increase of 1.3 °C for the most stable one (C175G) (Table 3).

4.4.2 Relative enzyme activity

The specific activities of WT and mutant enzymes were determined with o-NPC as substrate (Table 3). The H. grisea enzyme has only 12 % of the activity of the T. reesei enzyme at 40 °C. The activity of the T. reesei enzyme is more sensitive to the mutations than that of the H. grisea enzyme. One H. grisea Cel12A mutation (C206S) actually increases the enzyme activity, to 1.55 times that of the WT (Table 3).

4.4.3 Protein structures

To shed some light on the structural basis for the large thermal stability differences between the two fungal Cel12A enzymes, the crystal structures of the H. grisea Cel12A WT and the T. reesei Cel12A cysteine variant (P201C) were determined. Both structures have the expected overall fold of a GH family 12 enzyme. Statistics on data collection, refinement and the final structure models are given in Appendices I and II.

The H. grisea Cel12A structure

The final model of the H. grisea enzyme contains the complete sequence of 224 amino acids. Equivalent Cα atoms from the H. grisea and the T. reesei Cel12A structures can be superimposed with pair-wise RMSDs in the range of 0.3-0.5 Å. There are no larger insertions or deletions in the H. grisea structure compared with that of T. reesei. Four of the extra six residues in the H. grisea structure are distributed over the whole molecule as single amino acid insertions, mainly in the loops connecting the β-strands. Two of the extra six residues are located at the C-terminus. Some of the biggest main-chain differences between the two structures are found in the loops that have an extra residue in the H. grisea structure. The three free cysteines, C175, C206 and C216, are located on β-strands A6, B4 and A4, respectively (Figure 15). Their side chains point into the core between the two β-sheets and they form extensive interactions with surrounding residues (Figure 16).

The T. reesei Cel12A P201C structure

The T. reesei Cel12A P201C variant crystallizes in space group P3₁, with two NCS molecules in the asymmetric unit, forming a structural dimer. The two NCS molecules block their substrate-binding clefts as in our previous T. reesei Cel12A structures (Papers I and II). Residue 201 in T. reesei Cel12A is located on β-strand B4, a strand on the bigger β-sheet B, one residue from the catalytic acid/base (E200) (Figure 15). Figure 16 shows the environment around residue 201 in the WT and mutant T. reesei Cel12A structures. The introduction of a cysteine does not cause a major conformational change in the variant compared to the WT structure.
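The pair-wise RMSD quoted above for equivalent Cα atoms is the root of the mean squared distance between matched atom pairs after superposition. The following is a minimal sketch of that calculation, assuming the two coordinate sets have already been superimposed and paired; the function name and the coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared Euclidean distance between one pair of equivalent atoms.
        total += math.dist(a, b) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha coordinates (Å), invented for illustration only.
ca_model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_model_2 = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.5)]

if __name__ == "__main__":
    print(round(rmsd(ca_model_1, ca_model_2), 2))
```

In practice the superposition step (finding the rotation and translation that minimize this quantity) is done by the structure-comparison software; the sketch covers only the final distance statistic.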

Figure 15. Schematic ribbon diagram drawing, (a) top and (b) side view, of the H. grisea Cel12A crystal structure. The structure has side chains drawn for the three free cysteine residues (C175, C206 and C216), and the two catalytic residues (E120 and E205).

Figure 16. Interactions and conformational changes close to residues 206 and 201 of the H. grisea Cel12A structure, and the T. reesei WT and P201C Cel12A structures, respectively. The H. grisea Cel12A structure has carbon atoms colored goldenrod and the T. reesei WT and P201C Cel12A structures have carbon atoms colored yellow and orange. The blue bubbles indicate contacts to the free cysteine residues in H. grisea Cel12A.

Introducing the free cysteines from H. grisea Cel12A into equivalent positions in T. reesei Cel12A results in an increase of the Tm by 0.1-3.9 °C (Table 3). The T. reesei P201C structure shows that the mutation causes only small local changes in the protein structure. One reason for the increased Tm of the P201C variant is the filling of a small cavity by the cysteine side chain, and the resulting set of van der Waals interactions (Figure 16). The change in Tm, however, is smaller in magnitude than that caused by the C206P mutation in the H. grisea enzyme. Not all of the interactions present in the H. grisea enzyme are made in the T. reesei P201C enzyme. In particular, the Q34-C206 interaction that exists in the H. grisea enzyme is missing in the variant. The effect of the P201C mutation on the solvent accessibility of the interacting residues is very small.

The free cysteine residue in H. grisea Cel12A that has the least effect on the Tm is C175. If this cysteine is mutated to a glycine or a serine, the Tm of the enzyme is increased by 1.3 °C and 0.2 °C, respectively, compared to the WT, whereas the enzyme activity is decreased for both variants (C175G/S) (Table 3). Cysteine residue 175 is located on β-strand A6, on the smaller β-sheet A that creates the convex outer surface of the enzyme, close to the only α-helix in the structure (Figure 15). The side chain of the residue points into the core between the two β-sheets, where it has a set of van der Waals interactions with the side chains of six adjacent residues.

The free cysteine at residue 206 has the largest effect on the Tm of the H. grisea enzyme. This residue sits next to the catalytic acid/base (Glu 205) on β-strand B4 (Figure 15b). The side chain points into the β-sheet core, where it has an extensive network of interactions with the side chains of eight adjacent residues (Figure 16). Many of the interactions with Sγ are polar in nature, in particular the contact with the indole Nε1 of W52, indicative of a short S-HN hydrogen bond (3.3 Å). Mutating this residue in H. grisea Cel12A to a proline, or a serine, causes a reduction in the Tm of 9.1 °C and 5.4 °C, respectively, compared to the WT enzyme. Although this residue is positioned next to the acid/base, the enzyme activity is only modestly decreased for the proline mutation, and markedly increased for the serine mutation (relative specific activities of 91 % and 155 %, respectively, compared to WT) (Table 3).

The third free cysteine is located at residue 216 on β-strand A4 in β-sheet A, the sheet that creates the outer convex surface of the enzyme. This cysteine residue has its side chain pointing into the core region below the in the structure, where it interacts with the side chains of five adjacent residues. These are van der Waals contacts, so mutating this residue to a valine causes little change in the Tm (an increase of 0.8 °C). The introduction of the more polar serine residue, with no possibility for hydrogen bonding, has a large effect on the Tm (a decrease of 5.6 °C) (Table 3). The enzyme activity is decreased for both mutations (relative specific activities of 69 % and 65 % for C216V and C216S, respectively, compared to WT) (Table 3).

4.4.4 Discussion

The effects of the single mutations in the H. grisea and the T. reesei Cel12A enzymes do not explain the total difference in thermal stability between them. When the mutations are combined, the obvious next step, little or no additivity is seen, and multiple variants of the T. reesei enzyme still do not approach the H. grisea Cel12A stability. Interestingly, we see that the free Cys residues in H. grisea Cel12A are not necessarily the most stabilizing amino acids to have in these positions. Replacing Cys 175 with Gly, or Cys 216 with Val, in H. grisea Cel12A leads to an increase in the Tm of 1.3 °C and 0.8 °C, respectively, compared to the WT enzyme (Table 3).
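The lack of additivity can be made explicit by comparing the observed ΔTm of each combined T. reesei variant with the sum of the ΔTm values of its constituent single mutations. The sketch below does this with the values from Table 3; the simple additive model and the function names are illustrative assumptions rather than an analysis carried out in the thesis.

```python
# Delta Tm values (°C, relative to T. reesei Cel12A WT) transcribed from Table 3.
SINGLE = {"G170C": 2.1, "P201C": 3.9, "V210C": 0.1}
COMBINED = {
    "G170C/P201C": 0.7,
    "P201C/V210C": 0.7,
    "G170C/P201C/V210C": 0.0,
}


def additive_expectation(variant):
    """Sum the single-mutant delta Tm values of the substitutions in a combined variant."""
    return round(sum(SINGLE[m] for m in variant.split("/")), 1)


def additivity_gap(variant):
    """Observed delta Tm minus the additive expectation (negative = less than additive)."""
    return round(COMBINED[variant] - additive_expectation(variant), 1)


if __name__ == "__main__":
    for name in COMBINED:
        print(name, additive_expectation(name), COMBINED[name], additivity_gap(name))
```

For every combined variant the observed ΔTm falls well below the additive expectation, which is the behaviour summarized in the text as little or no additivity.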

4.5 H. grisea Cel12A complex structures (Paper IV)

Only one ligand-bound GH 12 structure has previously been reported, the crystal structure of the catalytic core domain of Streptomyces lividans CelB2 (Figure 16), in complex with 2-deoxy-2-fluoro-cellotrioside (Sulzenbacher et al. 1999). In this study, we present four new ligand-bound GH 12 structures: H. grisea Cel12A in complex with cellobiose (G2), cellotetraose (G4), cellopentaose (G5) and thio-linked cellotetraose (G2SG2).

4.5.1 Overall protein structures

Statistics on data collection, refinement and the final complex models are given in Appendices I and II. As was previously predicted for the apo H. grisea Cel12A crystal structure (Paper III), the substrate-binding cleft of the enzyme is not blocked by other protein molecules in the crystal. This made it possible to soak a ligand into the binding site of the enzyme molecules in the crystal. The four H. grisea Cel12A complex crystals all have the same space group (P4₃2₁2), and similar cell constants to the apo crystal.

The final models of the H. grisea Cel12A G2, G4, G5 and G2SG2 complexes each contain one protein molecule with all 224 amino acids of the enzyme. There are no major changes in the main-chain conformations in the complexes, compared to the apo structure. The largest number of changes occurs in the side chains of amino acids lining the binding cleft, especially in the complex structures where the ligand spans the active site (G5 and G2SG2). The separation between the carboxylate groups of the catalytic nucleophile and the acid/base glutamyl residues varies between 5.8 and 6.6 Å in the four different complex structures.

4.5.2 Oligosaccharide complexes

The ligand complexes were obtained by equilibrating apo H. grisea Cel12A crystals for 36 hours in drops consisting of 30 % (w/w) mme PEG 2000 and 20 mM of the ligands, over a reservoir solution containing 30 % (w/w) mme PEG 2000. The different enzyme-ligand interactions are summarized in Table 4. The H. grisea Cel12A G2, G5 and G2SG2 complexes show well-defined electron density in the +1 and +2 subsites of the enzyme. All complexes show the same directionality; they have their reducing end pointing towards the right in Figures 17 and 18.

Figure 17. The H. grisea Cel12A crystal structure in complex with a modeled cellohexaose ligand (consisting of the –4 to –2 glucans from the cellotetraose structure, and the –1 to +2 glucans from the cellopentaose structure), spanning from binding site –4 to +2, colored in gold. The structure has side chains drawn for the two catalytic residues Glu 120 and Glu 205, colored in gold.

The cellobiose complex shows density for a ligand only in the product site (Figure 18d). The cellotetraose chain in the H. grisea Cel12A G4 complex structure is bound in the binding cleft spanning sites –4 to –1. The electron density for the cellotetraose molecule is well defined for the glucosyl residues occupying binding sites –4 to –2, but is less well defined for the glucosyl unit occupying binding site –1 (Figure 18b). The cellopentaose chain and the thio-linked cellotetraose chain in the H. grisea Cel12A G5 and G2SG2 complex structures are bound spanning sites –2 to +2 (Figure 18c). In these structures the electron densities for the two glucosyl residues occupying binding sites +1 and +2 are well defined, but are less well defined for the two glucosyl residues occupying binding sites –1 and –2 (Figure 18c). There is no detectable electron density in the binding cleft for the fifth glucosyl residue of the cellopentaose molecule.
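The subsite occupancies described above, and the way the cellohexaose model of Figure 17 is assembled from two complexes, can be summarized in a small lookup table. This is only an illustrative sketch; the dictionary and function names are assumptions introduced here, and the occupancies are those stated in the text.

```python
# Subsites occupied by each ligand in the H. grisea Cel12A complexes, as
# described in the text. There is no subsite 0: cleavage occurs between
# subsites -1 and +1.
OCCUPIED = {
    "G2": [+1, +2],
    "G4": [-4, -3, -2, -1],
    "G5": [-2, -1, +1, +2],
    "G2SG2": [-2, -1, +1, +2],
}


def spans_active_site(ligand):
    """True if the ligand occupies both subsite -1 and subsite +1."""
    return -1 in OCCUPIED[ligand] and +1 in OCCUPIED[ligand]


def composite_model(minus_source="G4", plus_source="G5"):
    """Combine subsites -4..-2 from one complex with -1..+2 from another."""
    minus_part = [(s, minus_source) for s in OCCUPIED[minus_source] if s <= -2]
    plus_part = [(s, plus_source) for s in OCCUPIED[plus_source] if s >= -1]
    return sorted(minus_part + plus_part)


if __name__ == "__main__":
    print([name for name in OCCUPIED if spans_active_site(name)])
    print(composite_model())
```

With the default arguments the composite covers six subsites, –4 to +2, which corresponds to the modeled cellohexaose.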

Figure 18. Electron density maps for the H. grisea Cel12A-cellotetraose (b), cellopentaose (c), and cellobiose (d) complexes. The electron density for the cellohexaose complex (a) is a combination of that from the cellotetraose complex (–4 to –2 glucans), and the cellopentaose complex (–1 to +2 glucans). The maps shown are maximum-likelihood σA-weighted 2|Fobs|–|Fcalc| maps, contoured at 1σ. For clarity the electron density has been cut around the cellooligomers using a masking radius of 1.5 Å. The catalytic nucleophile (Glu 120) and the acid/base (Glu 205) are shown for reference.

Conformation of glucosyl residues

The glucosyl residues in binding sites –4 and –2 of the H. grisea Cel12A complex structures both have a ⁴E elongated chair conformation (Figure 19a). However, in binding sites –3, +1 and +2, the glucosyl residues are in the full ⁴C₁ chair conformation. The glucosyl residues in sites –4 and –3 are only present in the cellotetraose complex structure. The conformation of the glucosyl residue in binding site –1 that best fits the electron density in the two complex structures (G5 and G2SG2) where the glucan chain spans the binding site is a ¹S₃ skew-boat conformation (Figure 19b). In the G4 structure, where the glucan chain ends in the –1 site, the glucosyl residue in this position has a full ⁴C₁ chair conformation (Figure 19a). If one looks down the binding cleft from the entrance at site –4 towards the exit at site +2, the cellulose chain makes a right-handed twist of approximately 55°.

4.5.3 Protein carbohydrate interactions

Binding sites –4, –3 and –2

There is one direct hydrogen bond between the glucosyl in binding site –4 and the protein, from the glucosyl O6 hydroxyl to the amide group of Asn 114 in H. grisea Cel12A (Figure 19a). There is also a hydrogen bond between the amide group of Asn 114 and the O3 hydroxyl on the glucosyl in binding site –3. This glucosyl has two more hydrogen bonds to the protein, from the O2 hydroxyl to the hydroxyl group on Tyr 66, and from the O6 hydroxyl to the hydroxyl group on Tyr 9. There is only one protein-carbohydrate hydrogen bond formed in binding site –2, and that is between the glucosyl O2 hydroxyl and the Oδ1 of Asn 22. The glucosyl ring in site –2 has hydrophobic stacking interactions with the indole ring of Trp 24.

Binding site –1, the active site

The nucleophile and acid/base catalysts, Glu 120 and Glu 205, are positioned on opposite sides of the linkage between sites –1 and +1 (Figure 17). The distance between the anomeric C1 carbon of the glucosyl in site –1 and the Oε2 of the nucleophile Glu 120 is 3.2 Å in the G4 complex structure, where the glucosyl is in a full chair conformation. In the G5 and G2SG2 complex structures, where the glucosyl has adopted a ¹S₃ skew-boat conformation and the ligand spans the active site, it is 3.5 Å. There is a hydrogen bond of 2.6 Å between the O2 hydroxyl on the –1 glucosyl and the Oε1 of Glu 120. The O2 hydroxyl on the –1 glucosyl also interacts with the Oδ1 of Asn 155, with a distance of 3.0 Å. The distance between the anomeric C1 carbon and the Oε2 of the acid/base Glu 205 is 4.0 Å in the G4 complex structure and 3.2 Å in the G5 and G2SG2 complex structures. The Oε2 of Glu 205 has close contacts to the O6 and O5 atoms of the glucosyl in the –1 site, 2.6 Å and 3.1 Å, respectively, in the G5 and G2SG2 complex structures, and 2.9 Å and 3.0 Å, respectively, in the G4 complex structure. There is only one detectable water-mediated hydrogen bond between the protein and the glucosyl in site –1, and it is formed between the O3 hydroxyl and the O atom of Trp 115.

Figure 19. Protein-carbohydrate interactions in the binding cleft of H. grisea Cel12A, in sites –4 to –1 of the G4 complex (a), and –2 to +2 of the G5 complex (b). Side chains are shown for residues important in ligand binding. Possible hydrogen bonds (distances <3.4 Å) between the carbohydrate and the protein are drawn as a series of magenta spheres. The protein Cα backbone is drawn as a goldenrod coil.

Binding sites +1 and +2

There is also one hydrogen bond of 2.5 Å from the Oε2 of Glu 205 to the O4 oxygen on the glucosyl in site +1 in those complex structures (G2, G5, and G2SG2) with glucosyl residues in the product sites. The +1 glucosyl has a hydrogen bond of 2.6 Å between the O3 hydroxyl and the Oε1 of Glu 205, and one of 2.7 Å between the O2 hydroxyl and the carboxyl of Tyr 132. The carboxyl oxygen of Tyr 132 also has a short contact distance of 2.7 Å to the O4 hydroxyl of the +2 glucosyl. The O6 hydroxyl group of the +2 glucosyl has a short contact distance of 2.7 Å to the N group on Tyr 132.
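The distance criterion used for possible hydrogen bonds in Figure 19 (<3.4 Å) can be applied directly to the contacts quoted in the text for sites –1, +1 and +2. The sketch below is an illustration under stated assumptions: the list holds only distances given in the text (using the G5 and G2SG2 value for the C1 contact), the names are hypothetical, and a purely geometric cutoff ignores bond angles and hydrogen positions.

```python
H_BOND_CUTOFF = 3.4  # Å, the criterion used in the Figure 19 caption

# (subsite, sugar atom, protein atom, distance in Å), as quoted in the text.
CONTACTS = [
    (-1, "O2", "Glu120 Oε1", 2.6),
    (-1, "O2", "Asn155 Oδ1", 3.0),
    (-1, "C1", "Glu120 Oε2", 3.5),  # anomeric carbon to nucleophile, not a hydrogen bond
    (+1, "O4", "Glu205 Oε2", 2.5),
    (+1, "O3", "Glu205 Oε1", 2.6),
    (+1, "O2", "Tyr132 O", 2.7),
    (+2, "O4", "Tyr132 O", 2.7),
    (+2, "O6", "Tyr132 N", 2.7),
]


def possible_hydrogen_bonds(contacts, cutoff=H_BOND_CUTOFF):
    """Keep contacts from polar (O or N) sugar atoms that are shorter than the cutoff."""
    return [c for c in contacts if c[1][0] in "ON" and c[3] < cutoff]


def count_by_subsite(contacts):
    """Count contacts per subsite."""
    counts = {}
    for subsite, _, _, _ in contacts:
        counts[subsite] = counts.get(subsite, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_subsite(possible_hydrogen_bonds(CONTACTS)))
```

The C1 to Oε2 contact is excluded both because carbon is not a hydrogen-bond partner and because 3.5 Å exceeds the cutoff; it is included in the list only to show that the filter removes it.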

4.5.4 Transglycosylation

The ligand structure obtained after soaking the H. grisea Cel12A crystals with cellotetraose was a complete surprise. The tetraose molecule that was bound in sites –4 to –1 appeared to have a β-1,3-glycosidic bond in the middle, linking the two cellobiose moieties. The quality of the electron density indicates full occupancy for the ligand, strongly suggesting that it is not the result of overlapping ligands at partial occupancy, but that we actually have a mixed β-1,3-1,4-tetraose bound to the enzyme. ¹H NMR analysis of the starting cellotetraose sample showed the exclusive presence of β-1,4-glycosidic bonds and no detectable β-1,3-glycosidic links in the sample (less than 1/1000 of the glycosidic bonds). The only reasonable explanation for the β-1,3-glycosidic linkage that we find between the glucose units in sites –3 and –2 is that it has formed by transglycosylation during the ligand soaking.

Figure 20. The electron density map for the H. grisea Cel12A-cellotetraose complex with the β-1,3-linkage indicated in the structure. The map shown is a maximum-likelihood σA-weighted 2|Fobs|–|Fcalc| map, contoured at 1σ. For clarity the electron density has been cut around the cellotetraose ligand using a masking radius of 1.5 Å.

We have no structure of a covalently bound intermediate, but if we model the intermediate structure of S. lividans CelB2 (Sulzenbacher et al. 1999) on our active site, the distances to the C1 anomeric carbon are 3.1 and 4.0 Å for the O4 and O3 hydroxyl groups on the glucosyl in site +1, respectively. The preferred binding of cellobiose in the product site with O4 closest to the reaction center suggests that β-1,4-transglycosylation would occur much more frequently than the formation of a β-1,3-linkage. With cellobiose as acceptor, β-1,4-transglycosylation would only have the effect of reforming the substrate, but it is possible that cellotetraose could also be an acceptor, resulting in the formation of longer cello-oligosaccharides as transglycosylation products. The fact that we observe the mixed-link tetraose bound to the enzyme provides in itself very little information about the reaction rate for β-1,3-transglycosylation, relative to hydrolysis and β-1,4-transglycosylation. The formation of a ligand with a β-1,3-linkage may be a very rare event but still yield a sufficient quantity of this species during the long incubation time (36 h) of the crystals in the cellotetraose ligand. The mixed-link tetraose fits remarkably well in the enzyme. In sites –1 and –2 the binding is very similar to the cellopentaose complex. Whereas a regular β-1,4-glucan chain would extend upwards from the active site, out of the cleft, the β-1,3-linkage allows the glucan to extend along the bottom of the cleft. One can hypothesize that the fit of mixed-link β-1,3-1,4-glucan chains in H. grisea Cel12A is no coincidence. Rather, the substrate-binding cleft has evolved to allow specific binding and cleavage of mixed-link β-1,3-1,4-glucan chains. In a mixed-link β-1,3-1,4-glucan chain, this would allow the enzyme to perform cuts in β-1,4 linkages two linkages away from a β-1,3 linkage. We also think that the enzyme's preference for binding mixed-link β-1,3-1,4-glucans causes this species of the ligand to accumulate in the crystal, even though its formation by a transglycosylation reaction in the enzyme is a rare event.

5 CONCLUDING REMARKS

The goals of the thesis have all to some extent been fulfilled. We have solved the three-dimensional structures of four homologous GH family 12 enzymes, three fungal enzymes (Humicola grisea Cel12A, Hypocrea schweinitzii Cel12A, Trichoderma reesei Cel12A), and one bacterial (Streptomyces sp. 11AG8 Cel12A). We have used the structural and biochemical information that we have gathered from these, and other GH family 12 homologues, to produce a wide range of variants of the different homologous Cel12A enzymes. These variants have been biochemically characterized, and we have thereby identified positions responsible for some of the biochemical differences among the homologous enzymes. The three-dimensional structures of two T. reesei Cel12A variants, where the mutations have had a significant impact on the stability or the activity of the enzyme, have also been determined. Finally, we have solved four ligand complex structures of the WT H. grisea Cel12A enzyme, which has allowed us to characterize interactions between the substrate and the protein in the binding cleft of the enzyme.

Attempts to determine general principles of protein stability, e.g., from studies of thermophilic proteins, have generally been difficult. Usually increased stability is a result of the accumulation of many small, subtle changes in the protein structure. Our results suggest that the cumulative role of these changes is an under-appreciated factor in protein stability. In the future, various recombinations of individual changes will enable a more thorough characterization and quantification of the role of co-operativity in protein stability.

Our structural, stability and activity studies of closely related GH family 12 enzymes, and their variants, have provided insight on how specific residues contribute to protein thermal stability and enzyme activity in this family. This knowledge can in the future serve as a structural toolbox, when one wants to modify GH family 12 enzymes with specific targeted properties and features, by introducing subtle changes in structural components in these enzymes. And this can be utilized to develop new industrial products, or fine tune enzymes in already existing applications.

6 ACKNOWLEDGEMENTS

I want to thank everyone who has contributed to this thesis, and all those that have made these years at the joint structural group at Uppsala University and Swedish University of Agricultural Sciences so enjoyable! I am especially grateful to: Alwyn Jones for accepting me as a PhD student and continuous support during all these years, though it has taken several more years than the usual five to complete my thesis. Jerry for introducing me to the world of carbohydrate-degrading enzymes, and supporting me with your deep knowledge about this class of enzymes when I have written my manuscripts and thesis. Tex for your deep knowledge of protein crystallization, and that you always have been willing to share this with the rest of us, for your help crystallizing some of my proteins, and proofreading all of my manuscripts and thesis. Alex for helping me to solve my first structure, within the two weeks that Alwyn bravely promised Genencor Inc. that we would solve the structure. Gunnar for all the support that I have gotten from you these last couple of years, and especially for your help with refining some of my structures. Sherry for all the scientific discussions we have had during the years, and for introducing me to the world of chemotaxis and membrane proteins during my first years at the department. Lars and Torsten for always being willing to listen and give advice about any crystallographic problem that I had. My present colleagues at the department Emma, Patrik, Eva-Lena, Ulrika, Wimal, Anna, Isabella, Fredrik, Mark, Martin S., Seved, Margareta, Ulla, Karin W, David, Inger, Janos and Hans for all scientific and non scientific discussions that we have had during our synchrotron trips, at the lunch table, or whenever I have had a problem that I needed advice on or help with. 
My former colleagues: Jill, Neel, Hans, Nina, John, Deva, Rams, Karin K, Tove, Cicci, Martin, Jinyu, Elin, Ines and Lotta for all the help I previously have got, and the discussions I have had with you. Erling, Christer and Remco for keeping all computers and all other technical equipment running at the joint structural group. Erling for all the discussions we have had about house-building details over a mug of morning coffee. Ingrid and Elleonor for your support with all administrative details such as salaries and travel expense reports. My close collaborators at Genencor Inc. in San Francisco: Colin, Andy, Pete, Ed, Laurie, Mae, Shan, Tracie, Rick and Carol for all the support I have got from you during the last four years of fruitful collaboration. My collaborators in Marc Claeyssens' group in Ghent: Wim, Tom and Kathleen for all the discussions we have had on cellulase enzymology. Finally, I want to give thousands of roses to my family and relatives for all the understanding and support that I have received from all of you during these years. The financial support from Genencor Inc. has been greatly appreciated.

7 REFERENCES

Armand, S., Drouillard, S., Schulein, M., Henrissat, B., and Driguez, H. 1997. A bifunctionalized fluorogenic tetrasaccharide as a substrate to study cellulases. J Biol. Chem. 272: 2709-2713. Bailey, M.J., Siika-aho, M., Valkeajarvi, A., and Penttila, M.E. 1993. Hydrolytic properties of two cellulases of Trichoderma reesei expressed in yeast. Biotechn. and Appl. Biochem. 17: 65-76. Barr, B.K., Hsieh, Y.-L., Ganem, B., and Wilson, D.B. 1996. Identification of Two Functionally Different Classes of Exocellulases. Biochemistry 35: 586-592. Bayer, E.A., Chanzy, H., Lamed, R., and Shoham, Y. 1998. Cellulose, cellulases and cellulosomes. Curr. Opin. Struct. Biol. 8: 548-557. Béguin, P., and Aubert, J.P. 1994. The biological degradation of cellulose. FEMS Microbiol Rev 13: 25-58. Bhikhabhai, R., Johansson, G., and Pettersson, G. 1984. Isolation of cellulolytic enzymes from Trichoderma reesei QM 9414. J Appl Biochem 6: 336-345. Bisset, J. 1984. A revision of the genus Trichoderma. Canadian J. of Botany 62: 924- 931. Blackwell, J., Kolpak, F., and Gardner, K. 1978. The structure of cellulose I and II. Tappi J. 61: 17-72. Boisset, C., Fraschini, C., Schulein, M., Henrissat, B., and Chanzy, H. 2000. Imaging the enzymatic digestion of bacterial cellulose ribbons reveals the endo character of the cellobiohydrolase Cel6A from Humicola insolens and its mode of synergy with cellobiohydrolase Cel7A. Appl. Environ. Microbiol. 66: 1444-1452. Bourbonnais, R., and Paice, M.G. 1990. Oxidation of non-phenolic substrates. An expanded role for laccase in lignin biodegradation. FEBS Lett 267: 99-102. Brünger, A.T. 1992a. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355: 472-475. Brünger, A.T. 1992b. X-PLOR Version 3.1: A System for X-ray Crystallography and NMR, 3.1 ed. Yale University Press, New Haven, CT, USA. 
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. 1998. Crystallography & NMR system (CNS): a new software suite for macromolecular structure determination. Acta Crystallog. D54: 905-921.

58 Byrne, K.A., Lehnert, S.A., Johnson, S.E., and Moore, S.S. 1999. Isolation of a cDNA encoding a putative cellulase in the red claw crayfish Cherax quadricarinatus. Gene 239: 317-324. Coughlan, M. 1985. Cellulases: production, properties and applications. Biochem. Soc. Trans. 13: 405-406. Coutinho, P.M., and Henrissat, B. 1999. Carbohydrate-active enzymes: an integrated database approach. In Recent Advances in Carbohydrate Bioengineering. (eds. H. J. Gilbert, G. Davies, B. Henrissat, and B. Svensson), pp. 3-12. The Royal Society of Chemistry, Cambridge. Cowtan, K., and Main, P. 1998. Miscellaneous algorithms for density modification. Acta Crystallogr. D54: 487-493. Davies, G., and Henrissat, B. 1995. Structures and mechanisms of glycosyl hydrolases. Structure 3: 853-859. Davies, G.J., Wilson, K.S., and Henrissat, B. 1997a. Nomenclature for sugar binding subsites in glycosyl hydrolases. Biochem. J. 321: 557-559. Davies, G.J., Wilson, K.S., and Henrissat, B. 1997b. Nomenclature for sugar-binding subsites in glycosyl hydrolases [letter]. Biochem. J. 321: 557-559. Divne, C., Ståhlberg, J., Reinikainen, T., Ruohonen, L., Pettersson, G., Knowles, J.K., Teeri, T.T., and Jones, T.A. 1994. The three-dimensional crystal structure of the catalytic core of cellobiohydrolase I from Trichoderma reesei. Science 265: 524- 528. Divne, C., Ståhlberg, J., Teeri, T.T., and Jones, T.A. 1998. High-resolution crystal structures reveal how a cellulose chain is bound in the 50 Å long tunnel of cellobiohydrolase I from Trichoderma reesei. J. Mol. Biol. 275: 309-325. Engh, R.A., and Huber, R. 1991. Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A47: 392-400. Fägerstam, L., Håkansson, U., Pettersson, G., and Andersson, L. 1977. Purification of three different cellulolutic enzymes from Trichoderma viride QM 9414 on a large scale. In Proceedings of Bioconversion Symposium, Feb 21-23. (ed. T. Gohose), pp. 165-178. 
Indian Institute of Technology, New Delhi. Fägerstam, L.G., and Pettersson, L.G. 1980. The 1,4-beta-glucan cellobiohydrolases of Trichoderma reesei QM 9414. A new type of cellulolytic synergism. FEBS Letters 119: 97-100. Fowler, T., and Brown, R.D., Jr. 1992. The bgl1 gene encoding extracellular beta- glucosidase from Trichoderma reesei is required for rapid induction of the cellulase complex. Mol. Microbiol. 6: 3225-3235. Gaboriaud, C., Bissery, V., Benchetrit, T., and Mornon, J.P. 1987. Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett. 224: 149-155. Goedegebuur, F., Fowler, T., Phillips, J., Van Der Kley, P., Van Solingen, P., Dankmeyer, L., and Power, S.D. 2002. Cloning and relational analysis of 15

59 novel fungal endoglucanases from family 12 glycosyl hydrolase. Curr. Genet. 41: 89-98. Gottschalk, G. 1988. Cellulose degradation and the carbon cycle. In Biochemistry of and Genetics of Cellulose Degradation. (ed. J. Aubert), pp. 3-8. Academic Press, London. Håkansson, U., Fägerstam, L., Pettersson, G., and Andersson, L. 1978. Purification and characterization of a low molecular weight 1,4-beta-glucan glucanohydrolase from the cellulolytic fungus Trichoderma viride QM 9414. Biochim. Biophys. Acta 524: 385-392. Harrison, M.J., Nouwens, A.S., Jardine, D.R., Zachara, N.E., Gooley, A.A., Nevalainen, H., and Packer, N.H. 1998. Modified glycosylation of cellobiohydrolase I from a high cellulase-producing mutant strain of Trichoderma reesei. Eur. J. Biochem. 256: 119-127. Hayn, M., Klinger, R., and Esterbauer, H. 1993. Isolation and partial characterization of a low molecular weight endoglucanase from Trichoderma reesei. In Trichoderma Reesei Cellulases and Other Hydrolases. (eds. P. Suominen, and T. Reinikainen), pp. 153-158. Foundation for Biotechnical and Industrial Fermentation Research, Helsinki, Finland. Henrissat, B. 1998. Glycosidase families. Biochem. Soc. Trans. 26: 153-156. Henrissat, B., Claeyssens, M., Tomme, P., Lemesle, L., and Mornon, J.P. 1989. Cellulase families revealed by hydrophobic cluster analysis. Gene 81: 83-95. Henrissat, B., and Davies, G. 1997. Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol. 7: 637-644. Henrissat, B., Driguez, H., Viet, C., and SchÜlein, M. 1985. Synergism of cellulases from Trikoderma reesei in the degradation of cellulose. Bio/Technology 3: 722- 726. Hon, D. 1994. Cellulose: a random walk along its historical path. Cellulose 1: 1-25. Ilmen, M., Saloheimo, A., Onnela, M.-L., and Penttila, M.E. 1997. Regulation of Cellulase Gene Expression in the Filamentous Fungus Trichoderma reesei. Appl.Environ.Microbiol. 63: 1298-1306. 
Imai, T., Boisset, C., Samejima, M., Igarashi, K., and Sugiyama, J. 1998. Unidirectional processive action of cellobiohydrolase Cel7A on Valonia cellulose microcrystals. FEBS Lett. 432: 113-116. Irwin, D.C., Spezio, M., Walker, L.P., and Wilson, D.B. 1993. Activity studies of eigth purified cellulases: specificity, synergism, and binding domain effects. Biotechnol. and Bioeng. 42: 1002-1013. Jaenicke, R. 2000. Stability and stabilization of globular proteins in solution. J. Biotechnol. 79: 193-203. Jones, T.A., and Kjeldgaard, M.O. 1997. Electron-density map interpretation. Methods Enzymol. 277: 173-208.

Jones, T.A., Zou, J.-Y., Cowan, S.W., and Kjeldgaard, M. 1991. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A47: 110-119.
Karlsson, J., Saloheimo, M., Siika-Aho, M., Tenkanen, M., Penttila, M., and Tjerneld, F. 2001. Homologous expression and characterization of Cel61A (EG IV) of Trichoderma reesei. Eur. J. Biochem. 268: 6498-6507.
Karlsson, J., Siika-aho, M., Tenkanen, M., and Tjerneld, F. 2002. Enzymatic properties of the low molecular mass endoglucanases Cel12A (EG III) and Cel45A (EG V) of Trichoderma reesei. J. Biotechnol. 99: 63-78.
Keitel, T., Simon, O., Borriss, R., and Heinemann, U. 1993. Molecular and active-site structure of a Bacillus 1,3-1,4-beta-glucanase. Proc. Natl. Acad. Sci. 90: 5287-5291.
Kleywegt, G.J., and Jones, T.A. 1996. Phi/Psi-chology: Ramachandran revisited. Structure 4: 1395-1400.
Kleywegt, G.J., and Jones, T.A. 1997. Detecting folding motifs and similarities in protein structures. Methods Enzymol. 277: 525-545.
Kleywegt, G.J., Zou, J.Y., Divne, C., Davies, G.J., Sinning, I., Ståhlberg, J., Reinikainen, T., Srisodsuk, M., Teeri, T.T., and Jones, T.A. 1997. The crystal structure of the catalytic core domain of endoglucanase I from Trichoderma reesei at 3.6 Å resolution, and a comparison with related enzymes. J. Mol. Biol. 272: 383-397.
Kleywegt, G.J., and Jones, T.A. 1999. Software for handling macromolecular envelopes. Acta Crystallogr. D55: 941-944.
Knight, S.D. 2000. RSPS version 4.0: a semi-interactive vector-search program for solving heavy-atom derivatives. Acta Crystallogr. D56: 42-47.
Koivula, A., Kinnari, T., Harjunpaa, V., Ruohonen, L., Teleman, A., Drakenberg, T., Rouvinen, J., Jones, T.A., and Teeri, T.T. 1998a. Tryptophan 272: an essential determinant of crystalline cellulose degradation by Trichoderma reesei cellobiohydrolase Cel6A. FEBS Lett. 429: 341-346.
Koivula, A., Linder, M., and Teeri, T. 1998b. Structure-function relationship in Trichoderma cellulolytic enzymes. In Trichoderma and Gliocladium. (eds. G. Harman, and C. Kubicek), pp. 3-23. Taylor and Francis Ltd, London.
Koshland, D.E. 1953. Stereochemistry and the mechanism of enzymatic reactions. Biol. Rev. 28: 416-436.
Kraulis, P.J., Clore, G.M., Nilges, M., Jones, T.A., Pettersson, G., Knowles, J., and Gronenborn, A.M. 1989. Determination of the three-dimensional solution structure of the C-terminal domain of cellobiohydrolase I from Trichoderma reesei. A study using nuclear magnetic resonance and hybrid distance geometry-dynamical simulated annealing. Biochemistry 28: 7241-7257.

Kubicek, C., and Penttilä, M. 1998. Regulation of production of plant polysaccharide degrading enzymes by Trichoderma. In Trichoderma and Gliocladium. (eds. G. Harman, and C. Kubicek), pp. 49-72. Taylor and Francis Ltd, London.
Kubicek, C.P., Messner, R., Gruber, F., Mandels, M., and Kubicek-Pranz, E.M. 1993. Triggering of cellulase biosynthesis by cellulose in Trichoderma reesei. Involvement of a constitutive, sophorose-inducible, glucose-inhibited beta-diglucoside permease. J. Biol. Chem. 268: 19364-19368.
Kuhls, K., Lieckfeldt, E., Samuels, G.J., Kovacs, W., Meyer, W., Petrini, O., Gams, W., Borner, T., and Kubicek, C.P. 1996. Molecular evidence that the asexual industrial fungus Trichoderma reesei is a clonal derivative of the ascomycete Hypocrea jecorina. Proc. Natl. Acad. Sci. 93: 7755-7760.
La Fortelle, E.d., Irwin, J.J., and Bricogne, G. 1997. SHARP: A Maximum-Likelihood Heavy-Atom Parameter Refinement and Phasing Program for the MIR and MAD Methods. In Crystallographic Computing.
Lehmann, M., Loch, C., Middendorf, A., Studer, D., Lassen, S.F., Pasamontes, L., van Loon, A.P., and Wyss, M. 2002. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng. 15: 403-411.
Lehmann, M., Pasamontes, L., Lassen, S.F., and Wyss, M. 2000. The consensus concept for thermostability engineering of proteins. Biochim. Biophys. Acta 1543: 408-415.
Lehmann, M., and Wyss, M. 2001. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr. Opin. Biotechnol. 12: 371-375.
Linder, M., and Teeri, T.T. 1996. The cellulose-binding domain of the major cellobiohydrolase of Trichoderma reesei exhibits true reversibility and a high exchange rate on crystalline cellulose. Proc. Natl. Acad. Sci. 93: 12251-12255.
Maheshwari, R., Bharadwaj, G., and Bhat, M.K. 2000. Thermophilic fungi: their physiology and enzymes. Microbiol. Mol. Biol. Rev. 64: 461-488.
Matthews, B.W. 1968. Solvent content of protein crystals. J. Mol. Biol. 33: 491-497.
Mattinen, M.-L., Linder, M., Teleman, A., and Annila, A. 1997. Interaction between cellohexaose and cellulose binding domains from Trichoderma reesei cellulases. FEBS Lett. 407: 291-296.
Mattinen, M.L., Linder, M., Drakenberg, T., and Annila, A. 1998. Solution structure of the cellulose-binding domain of endoglucanase I from Trichoderma reesei and its interaction with cello-oligosaccharides. Eur. J. Biochem. 256: 279-286.
McCarter, J.D., and Withers, S.G. 1994. Mechanisms of enzymatic glycoside hydrolysis. Curr. Opin. Struct. Biol. 4: 885-892.
McPherson, A.J. 1982. Preparation and Analysis of Protein Crystals. John Wiley and Sons, New York.
McPherson, A.J. 1999. Crystallization of biological macromolecules. Cold Spring Harbor Laboratory Press, New York.

Medve, J., Karlsson, J., Lee, D., and Tjerneld, F. 1998. Hydrolysis of microcrystalline cellulose by cellobiohydrolase I and endoglucanase II from Trichoderma reesei: adsorption, sugar production pattern, and synergism of the enzymes. Biotechnol. Bioeng. 59: 621-634.
Meyer, W., Morawetz, R., Börner, T., and Kubicek, C. 1992. The use of DNA-fingerprint analysis in the classification of some species of the Trichoderma aggregate. Curr. Genet. 21: 27-30.
Mohr, H., and Schopfer, P. 1995. Plant physiology. Springer Verlag, Berlin.
Muilu, J., Törrönen, A., Perakyla, M., and Rouvinen, J. 1998. Functional conformational changes of endo-1,4-xylanase II from Trichoderma reesei: a molecular dynamics study. Proteins 31: 434-444.
Murshudov, G.N., Vagin, A.A., and Dodson, E.J. 1997. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D53: 240-255.
Navaza, J. 1994. AMoRe: an automated package for molecular replacement. Acta Crystallogr. A50: 157-163.
Nidetzky, B., Steiner, W., Hayn, M., and Claeyssens, M. 1994. Cellulose hydrolysis by the cellulases from Trichoderma reesei: a new model for synergistic interaction. Biochem. J. 298: 705-710.
Nieduszynski, I.A., and Preston, R.D. 1970. Crystallite size in natural cellulose. Nature 225: 273-274.
Nutt, A., Sild, V., Pettersson, G., and Johansson, G. 1998. Progress curves: A mean for functional classification of cellulases. Eur. J. Biochem. 258: 200-206.
Okada, H., Mori, K., Tada, K., Nogawa, M., and Morikawa, Y. 2000. Identification of active site carboxylic residues in Trichoderma reesei endoglucanase Cel12A by site-directed mutagenesis. J. Mol. B: 249-255.
Okada, H., Tada, K., Sekiya, T., Yokoyama, K., Takahashi, A., Tohda, H., Kumagai, H., and Morikawa, Y. 1998. Molecular characterization and heterologous expression of the gene encoding a low-molecular-mass endoglucanase from Trichoderma reesei QM9414. Appl. Environ. Microbiol. 64: 555-563.
Otwinowski, Z. 1991. Maximum likelihood refinement of heavy atom parameters. In Isomorphous Replacement and Anomalous Scattering. (eds. P. Evans, and A. Leslie), pp. 80-85. SERC Daresbury Laboratory, UK.
Otwinowski, Z., and Minor, W. 1997. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276: 307-326.
Penttila, M., Lehtovaara, P., Nevalainen, H., Bhikhabhai, R., and Knowles, J. 1986. Homology between cellulase genes of Trichoderma reesei: complete nucleotide sequence of the endoglucanase I gene. Gene 45: 253-263.
Perl, D., Mueller, U., Heinemann, U., and Schmid, F.X. 2000. Two exposed amino acid residues confer thermostability on a cold shock protein. Nat. Struct. Biol. 7: 380-383.

Reese, E., Levinson, H., Downing, M., and White, L. 1950. Quartermaster culture collection. Farlowia 4: 45-86.
Reese, E.T. 1976. History of the cellulase program at the U.S. Army Natick Development Center. Biotechnol. Bioeng. Symp.: 9-20.
Reinikainen, T., Ruohonen, L., Nevanen, T., Laaksonen, L., Kraulis, P., Jones, T.A., Knowles, J.K., and Teeri, T.T. 1992. Investigation of the function of mutated cellulose-binding domains of Trichoderma reesei cellobiohydrolase I. Proteins 14: 475-482.
Reverbel-Leroy, C., Pages, S., Belaich, A., Belaich, J.P., and Tardif, C. 1997. The Processive Endocellulase CelF, a Major Component of the Clostridium cellulolyticum Cellulosome: Purification and Characterization of the Recombinant Form. J. Bacteriol. 179: 46-52.
Rossmann, M., and Blow, D.M. 1962. The detection of sub-units within the crystallographic asymmetric unit. Acta Crystallogr. 15: 24-31.
Rouvinen, J., Bergfors, T., Teeri, T., Knowles, J.K., and Jones, T.A. 1990. Three-dimensional structure of cellobiohydrolase II from Trichoderma reesei. Science 249: 380-386.
Saloheimo, A., Henrissat, B., Hoffren, A.M., Teleman, O., and Penttila, M. 1994. A novel, small endoglucanase gene, egl5, from Trichoderma reesei isolated by expression in yeast. Mol. Microbiol. 13: 219-228.
Saloheimo, M., Lehtovaara, P., Penttilä, M., Teeri, T.T., Ståhlberg, J., Johansson, G., Pettersson, G., Claeyssens, M., Tomme, P., and Knowles, J.K.C. 1988. EGIII, a new endoglucanase from Trichoderma reesei: the characterization of both gene and enzyme. Gene 63: 11-21.
Saloheimo, M., Nakari-Setala, T., Tenkanen, M., and Penttila, M. 1997. cDNA cloning of a Trichoderma reesei cellulase and demonstration of endoglucanase activity by expression in yeast. Eur. J. Biochem. 249: 584-591.
Shaw, A., Bott, R., and Day, A.G. 1999. Protein engineering of alpha-amylase for low pH performance. Curr. Opin. Biotechnol. 10: 349-352.
Shoemaker, S.P., and Brown, R.D., Jr. 1978. Characterization of endo-1,4-beta-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523: 147-161.
Shoemaker, S.P., Watt, K., Tsitovsky, G., and Cox, R. 1983. Characterisation and properties of cellulases purified from Trichoderma reesei strain L27. Bio/Technology 1: 687-690.
Simmons, E. 1977. Classification of some cellulase producing Trichoderma species. In 2nd International Mycological Congress. (eds. H.E. Bigelow, and E.G. Simmons), p. 618. University of South Florida, Tampa.
Sinnott, M.L. 1990. Catalytic mechanisms of enzymic glycosyl transfer. Chem. Rev. 90: 1171-1202.

Sjöström, E. 1993. Wood Chemistry: Fundamentals and Applications, 2nd ed. Academic Press Inc., London.
Sprey, B., and Bochem, H.P. 1992. Effect of endoglucanase and cellobiohydrolase from Trichoderma reesei on cellulose microfibril structure. FEMS Microbiol. Lett. 97: 113-118.
Sprey, B., and Ülker, A. 1992. Isolation and properties of a low molecular mass endoglucanase from Trichoderma reesei. FEMS Microbiol. Lett. 92: 253-257.
Srisodsuk, M., Reinikainen, T., Penttilä, M., and Teeri, T.T. 1993. Role of the interdomain linker peptide of Trichoderma reesei cellobiohydrolase I in its interaction with crystalline cellulose. J. Biol. Chem. 268: 20756-20761.
Ståhlberg, J. 1991. Functional organization of cellulases from Trichoderma reesei. Doctoral thesis. Acta Universitatis Upsaliensis. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science 344. 45 pp. Uppsala. ISBN 91-554-2800-2. Uppsala University.
Ståhlberg, J., Johansson, G., and Pettersson, G. 1988. A binding-site-deficient, catalytically active, core protein of endoglucanase III from the culture filtrate of Trichoderma reesei. Eur. J. Biochem. 173: 179-183.
Sternberg, D., and Mandels, G.R. 1980. Regulation of the cellulolytic system in Trichoderma reesei by sophorose: induction of cellulase and repression of beta-glucosidase. J. Bacteriol. 144: 1197-1199.
Sulzenbacher, G., Mackenzie, L.F., Wilson, K.S., Withers, S.G., Dupont, C., and Davies, G.J. 1999. The crystal structure of a 2-fluorocellotriosyl complex of the Streptomyces lividans endoglucanase CelB2 at 1.2 Å resolution. Biochemistry 38: 4826-4833.
Teeri, T.T., Lehtovaara, P., Kauppinen, S., Salovuori, I., and Knowles, J. 1987. Homologous domains in Trichoderma reesei cellulolytic enzymes: gene sequence and expression of cellobiohydrolase II. Gene 51: 43-52.
Tomme, P., Warren, R.A., and Gilkes, N.R. 1995. Cellulose hydrolysis by bacteria and fungi. Adv. Microb. Physiol. 37: 1-81.
Ülker, A., and Sprey, B. 1990. Characterization of an unglycosylated low molecular weight 1,4-beta-glucan-glucanohydrolase of Trichoderma reesei. FEMS Microbiol. Lett. 69: 215-219.
Vaheri, M., Leisola, M., and Kauppinen, V. 1979. Transglycosylation products of the cellulase system of Trichoderma reesei. Biotechnol. Lett. 1: 41-46.
van Solingen, P., Meijer, D., van der Kleij, W.A., Barnett, C., Bolle, R., Power, S.D., and Jones, B.E. 2001. Cloning and expression of an endocellulase gene from a novel streptomycete isolated from an East African soda lake. Extremophiles 5: 333-341.
Ward, M., Wu, S., Dauberman, J., Weiss, G., Larenas, E., Bower, B., Rey, M., Clarkson, K., and Bott, R. 1993. Cloning, Sequence and Preliminary Structural Analysis of a Small, High pI Endoglucanase (EGIII) from Trichoderma reesei. In The Tricell 93 symposium. (eds. P. Suominen, and T. Reinikainen), pp. 153-158. Foundation for Biotechnical and Industrial Fermentation Research, Espoo, Finland.
Watanabe, H., Noda, H., Tokuda, G., and Lo, N. 1998. A cellulase gene of termite origin. Nature 394: 330-331.
Wey, T.T., Hseu, T.H., and Huang, L. 1994. Molecular cloning and sequence analysis of the cellobiohydrolase I gene from Trichoderma koningii G-39. Curr. Microbiol. 28: 31-39.
Williamson, R.E., Burn, J.E., and Hocart, C.H. 2002. Towards the mechanism of cellulose synthesis. Trends Plant Sci. 7: 461-467.
Wolfenden, R., Lu, X., and Young, G. 1998. Spontaneous hydrolysis of glycosides. J. Am. Chem. Soc. 120: 6814-6815.
Xu, B., Hellman, U., Ersson, B., and Janson, J.C. 2000. Purification, characterization and amino-acid sequence analysis of a thermostable, low molecular mass endo-beta-1,4-glucanase from blue mussel, Mytilus edulis. Eur. J. Biochem. 267: 4970-4977.
Yuan, S., Wu, Y., and Cosgrove, D.J. 2001. A fungal endoglucanase with plant cell wall extension activity. Plant Physiol. 127: 324-333.

Appendix I

Data collection and processing statistics.

[Table. Columns (data sets): T. reesei Cel12A WT; T. reesei Cel12A A35V; S. sp. 11AG8 Cel12A WT; H. schweinitzii Cel12A WT; H. grisea Cel12A WT; T. reesei Cel12A P201C; H. grisea Cel12A G2; H. grisea Cel12A G5; H. grisea Cel12A G4; H. grisea Cel12A G2SG2. Rows: Collected (home source or synchrotron beamline); Detector; Wavelength (Å); Oscillation range; Space group; Cell parameters (Å); Resolution range (Å); Resolution range, outer shell (Å); No. of observed reflections; No. of unique reflections; Average multiplicity; Completeness (%); R_merge (%); I/σ(I). The individual values cannot be assigned to their columns in this copy.]

Numbers in parentheses are for the highest resolution bins.

$R_{merge} = \sum_{hkl} \sum_{i} |I_i - \langle I \rangle| / \sum_{hkl} \sum_{i} I_i$

Appendix II

Structure refinement and final model statistics.

[Table. Columns (structures): T. reesei Cel12A WT; T. reesei Cel12A A35V; S. sp. 11AG8 Cel12A WT; H. schweinitzii Cel12A WT; H. grisea Cel12A WT; T. reesei Cel12A P201C; H. grisea Cel12A G2; H. grisea Cel12A G5; H. grisea Cel12A G4; H. grisea Cel12A G2SG2. Rows: PDB access codes (1H8V, 1OA2, 1OA3 and 1OA4 are among those listed); Resolution used in refinement (Å); Reflections in working set and test set; R factor and R free (%); Protein molecules in AU; Residues in protein; Protein atoms; Waters; Ligand atoms; N-glycosylation (NAG) residues and atoms; Residues with double conformations; Average B factor, overall, protein and water (Å²); RMSD bond lengths (Å) (a); RMSD bond angles (º) (a); RMSD NCS, Cα and all atoms (Å); Average RMSD B on bonded atoms (Å²); Ramachandran outliers (%) (b). The individual values cannot be assigned to their columns in this copy.]

Values were calculated with O (Jones et al. 1991; Jones and Kjeldgaard 1997), CNS (Brünger et al. 1998), MOLEMAN (Kleywegt 1996), LSQMAN (Kleywegt and Jones 1997) and Refmac 5.0 (Murshudov et al. 1997).

(a) From Engh and Huber (1991).

(b) According to the stringent boundary definition of Kleywegt and Jones (1996).


Acta Universitatis Upsaliensis Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to October 1993, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science”.)

Distributor: Uppsala University Library, Box 510, SE-751 20 Uppsala, Sweden

ISSN 1104-232X ISBN 91-554-5562-X