ABSTRACT
CONWAY, JONATHAN MICHAEL. In Vitro and In Vivo Analyses of the Role of Multi- Domain Glycoside Hydrolases from Extremely Thermophilic Caldicellulosiruptor Species in the Degradation of Plant Biomass. (Under the direction of Dr. Robert Kelly).
Caldicellulosiruptor species are anaerobic bacteria that are the most thermophilic
(Topt = 70-75°C) microorganisms currently known to degrade lignocellulosic biomass, making them of interest for the conversion of plant biomass to industrial products. The genomes of 12 Caldicellulosiruptor species have been sequenced and genomic, transcriptomic, proteomic, and phenotypic analysis has revealed genes of interest for evaluating and manipulating the native attachment and degradation mechanisms of
Caldicellulosiruptor species. Of particular interest are the large (~160-240 kDa), multi- domain Carbohydrate Active enZymes (CAZymes) produced by Caldicellulosiruptor species that containing multiple catalytic glycoside hydrolase (GH) and carbohydrate binding module
(CBM) domains. These multi-domain enzymes represent an alternative paradigm of lignocellulosic plant biomass degradation to cellulosomal organisms, such as Clostridium thermocellum, or free enzyme producers, such as the fungus Trichoderma reesei.
The multi-domain enzymes from Caldicellulosiruptor species can be divided into two broad categories: those freely secreted from the cell and those containing surface layer homology (SLH) domains, which associate these enzymes with the bacterial surface layer (S- layer). Proteins from both categories were produced in full-length and truncated recombinant versions in Escherichia coli and Caldicellulosiruptor bescii and biochemically characterized in vitro. Analysis of the free enzymes from the glucan degradation locus (GDL), the primary genomic site of cellulases in Caldicellulosiruptor species, showed that while CelA
(Athe_1867 in C. bescii) is responsible for the primary cellulase activity, it is enhanced about two-fold by synergistic action with additional enzymes from the GDL. Analysis of truncations of S-layer localized Calkro_0111, the largest CAZyme from any
Caldicellulosiruptor species, revealed the contribution of individual non-catalytic domains to the activity of its GH16 and GH55 domains. In particular, a newly identified actin crosslinking-like (ACL) domain, adjacent to the GH16, is necessary for its activity. In further analysis, an enzyme from Caldicellulosiruptor sp. strain Wai35.B1 containing three catalytic
GH domains, the first enzyme identified to date containing three catalytic GH domains, was characterized focusing on its individual catalytic domains. Protein crystal structures for individual GH and CBM domains were determined from several of these CAZymes: GH5 and CBM28 domains from Csac_0678, GH16 and ACL domains from Calkro_0111, and
GH10 CBM3 and GH48 domains from Wai35_2053. These domains were also characterized biochemically. These domain truncations are singular folded and catalytic entities such that the Caldicellulosiruptor multi-domain GHs truly are comprised of individually folded domains strung along the single polypeptide chain.
To complement these in vitro analyses, genetic manipulations in C. bescii were performed, using newly developed genetic techniques and strains, to engineer plant biomass degradation and evaluate the role of CAZymes in vivo. S-layer associated xylanase,
Calkro_0402, was inserted into C. bescii and the resulting strain showed improved ability to degrade xylan substrates, the first example of engineering improved degradation in
Caldicellulosiruptor. The six GH enzymes of the glucan degradation locus (GDL) in C. bescii were also analyzed by constructing eight knockout strains with different deletion combinations. Solubilization of crystalline cellulose and complex biomass substrates by these strains reveal these enzymes are responsible for essentially all of the biotic substrate solubilization performed by C. bescii. And, while the relative importance of individual enzymes from the GDL is biomass dependent, the synergistic action of these enzymes underlies this bacterium’s broad appetite for plant biomass.
Taken together, the in vitro and in vivo analyses of the multi-domain glycoside hydrolases presented here advances the fundamental understanding of how these enzymes work biochemically and the roles these enzymes play in plant biomass recruitment and degradation by Caldicellulosiruptor species. This work will enable future efforts to engineer strains and biomass processing schemes for optimized plant biomass degradation looking towards the production of bio-fuels and bio-chemicals from lignocellulosic feedstocks.
© Copyright 2017 Jonathan Michael Conway
All Rights Reserved In Vitro and In Vivo Analyses of the Role of Multi-Domain Glycoside Hydrolases from Extremely Thermophilic Caldicellulosiruptor Species in the Degradation of Plant Biomass
by Jonathan Michael Conway
A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Chemical and Biomolecular Engineering
Raleigh, North Carolina
2017
APPROVED BY:
______Dr. Robert Kelly Dr. Chase Beisel Committee Chair
______Dr. Amy Grunden Dr. Lucian Lucia Biotechnology Minor Representative
DEDICATION
For my grandparents– the teacher, the pastor, the engineer, and the cook.
I see parts of each of you in myself.
I love you all very much.
ii
BIOGRAPHY
Jonathan Conway was raised in Lancaster County, Pennsylvania. He attended the
University of Notre Dame in South Bend, Indiana for his undergraduate degree in chemical engineering. During his undergraduate career, he participated in several undergraduate research experiences with various professors at Notre Dame. After his junior and senior years at Notre Dame he had the opportunity to intern in the Upstream Manufacturing Science and Technology department at MedImmune, Inc. in Frederick, Maryland. His experiences with undergraduate research at Notre Dame and industrial experience at MedImmune inspired him to pursue a graduate degree in chemical engineering. He began his graduate studies under the direction of Dr. Robert Kelly in the fall of 2011. After completion of his graduate studies in May 2017, he will pursue a career in academia doing research in biological applications of chemical engineering.
iii
ACKNOWLEDGMENTS
First and foremost, I would like to acknowledge my academic advisor, Dr. Robert
Kelly. His guidance, assistance, and leadership have been instrumental in my development and success in graduate school. His belief and trust in me and my abilities has propelled me to achieve many of my research and professional successes. I would also like to acknowledge
Dr. Sara Blumer-Schuette for her mentorship in the early days of my graduate career. So much of scientific research relies on solid fundamental techniques, and the fundamentals I rely on often were taught to me by Sara. All of the members of the Kelly lab group over my years as a member have been amazing friends and colleagues and I appreciate your encouragement, help, and friendship both inside and outside the lab. The undergraduates that have worked with me have also been an integral part of my graduate career. I appreciate all of your trust in my instruction, hard work on my research projects, enthusiasm, and friendship.
I am also grateful for the collegial environment in the chemical engineering department and larger biotechnology community at NC State University. Working alongside the faculty, staff, graduate students, and undergraduates of our department and this university has been a pleasure. Beyond the NC State community, many members of the Department of
Energy BioEnergy Science Center (BESC) have been instrumental in my scientific success.
Of this group, I would especially like to thank the members of Dr. Mike Adams’s lab at the
University of Georgia and Vlad Lunin and Markus Alahuhta at the National Renewable
Energy Laboratory.
iv
I would finally, and most importantly, like to acknowledge my family. My parents,
Michael and Christine Conway, fostered my love of learning from an early age and have supported me in all of my endeavors. Their support and guidance have helped to make me the person I am today. My sister, Caitlin, is always there for me and I appreciate her friendship and support immensely. I am sorry for constantly boring you all with my latest trials and tribulations in the lab. Despite lacking most of the prerequisite training in chemistry, biology, and/or engineering, your willingness to listen has helped me think through my scientific experiments many times. Thank you for your unwavering support and encouragement. Without it, achieving this milestone would not have been possible.
v
TABLE OF CONTENTS
LIST OF TABLES ...... xi LIST OF FIGURES ...... xii CHAPTER 1 Lignocellulosic Biomass Deconstruction by the Extremely Thermophilic Genus Caldicellulosiruptor ...... 1 Abstract ...... 2 Introduction ...... 3 Lignocellulosic Biomass ...... 3 Processing Lignocellulose into Biofuels ...... 5 Carbohydrate Active Enzymes (CAZymes) ...... 7 Glycoside Hydrolases (GHs) ...... 9 Carbohydrate Binding Modules (CBMs) ...... 11 Polysaccharide Lyases (PLs) ...... 13 Carbohydrate Esterases (CEs)...... 15 Immunoglobulin-like Superfamily...... 16 Surface Layer Homology (SLH) Domains ...... 18 Others (Adhesin, Expansin-like, and Swollenin) ...... 18 Biomass Deconstruction by Cellulolytic Microorganisms ...... 19 Biomass Degradation by the Genus Caldicellulosiruptor ...... 22 The Genus Caldicellulosiruptor ...... 22 The Caldicellulosiruptor Pan-CAZome ...... 25 Notable Caldicellulosiruptor CAZymes ...... 26 Linker Regions in Caldicellulosiruptor CAZymes...... 28 The Caldicellulosiruptor Surface Layer ...... 30 FN3 containing Caldicellulosiruptor proteins ...... 31 Comparison to Clostridium thermocellum ...... 32 Conclusions ...... 34 Future Trends ...... 35
vi
Metagenomics - Identifying Novel CAZymes ...... 35 Caldicellulosiruptor bescii Genetic System ...... 36 Metabolic Engineering of a CBP Thermophile ...... 37 Web Resources ...... 39 Acknowledgments ...... 39 References ...... 49 CHAPTER 2 Multi-Domain, Surface Layer Associated Glycoside Hydrolases Contribute to Plant Polysaccharide Degradation by Caldicellulosiruptor species ...... 77 Abstract ...... 78 Introduction ...... 79 Materials and Methods ...... 81 Bacterial Strains, Plasmids and Reagents ...... 81 Growth Media and Culture Conditions ...... 82 Expression of Recombinant Proteins in E. coli ...... 83 Purification of Recombinant Protein and Truncation Mutants ...... 84 Optimal pH and Temperature Determination ...... 85 Analysis of Oligosaccharide Products from Enzymatic Hydrolysis ...... 86 Confocal and Epifluorescence Microscopy ...... 86 Genetic Manipulation of C. bescii ...... 88 Hydrolysis of Hemicellulose Substrates ...... 89 Analysis of Solubilization in Culture Supernatants ...... 90 Results ...... 90 S-layer Homology (SLH) Domain Proteins in Caldicellulosiruptor Species ...... 90 C. kronotskyensis Produces an S-layer and SLH Enzymes Calkro_0111 and Calkro_0402 are Localized on the C. kronotskyensis Cell Surface ...... 92 Laminarinase Calkro_0111 Displays Endo- and Exo-glucanase Activity in Separate Catalytic Domains ...... 93 Biochemical Characterization of Calkro_0402 and Relatedness To Other CBM22/GH10/CBM9 Enzymes ...... 95
vii
Transcriptomic Analysis of SLH Domain Proteins in C. bescii and C. kronotskyensis . 96 Manipulation of the C. bescii S-layer by the Insertion (Knock On) of the Gene Encoding Calkro_0402 ...... 97 C. bescii Strain RKCB103 Xylan Solubilization ...... 98 Discussion ...... 100 Acknowledgements ...... 105 References ...... 117 CHAPTER 3 Functional analysis of the Glucan Degradation Locus (GDL) in Caldicellulosiruptor bescii reveals roles of component glycoside hydrolases in plant biomass deconstruction ...... 135 Abstract ...... 136 Introduction ...... 137 Experimental Procedures ...... 140 Bacterial Strains, Plasmids and Reagents ...... 140 Media and Culture Conditions ...... 142 Cloning and Purification of β-glucosidase Athe_0458 ...... 143 Construction of Genetic Knockout Strains of C. bescii ...... 144 Genomic Sequencing of Modified C. bescii Strains ...... 146 Secretome Analysis ...... 146 Construction of Enzyme Expression Strains of C. bescii ...... 147 Protein Expression and Purification from C. bescii Strains ...... 147 Solubilization of Lignocellulosic Biomass Substrates ...... 148 Biochemical Characterization of GDL Glycoside Hydrolases ...... 149 Results ...... 151 The C. bescii Glucan Degradation Locus ...... 151 Repetitive Nature of the GDL ...... 155 In vivo Analysis of GDL Enzymes through Targeted Gene Deletions ...... 157 Cellulose and Complex Plant Biomass solubilization by GDL GH Knockout Strains 159 Switchgrass Solubilization by Mixtures of GDL Knockout Strains ...... 163
viii
Expression and Purification of Glucan Degradation Locus Cellulases ...... 164 Biochemical Characterization of Athe_1860 ...... 166 In Vitro Analysis of Glucan Degradation Locus Enzyme Cocktails ...... 167 Discussion ...... 170 Acknowledgements ...... 178 References ...... 189 CHAPTER 4 Structural and Biochemical Characterization of Glycoside Hydrolase Catalytic Domains and Carbohydrate Binding Modules from Caldicellulosiruptor Biomass Degradation Enzymes ...... 203 Abstract ...... 204 Introduction ...... 205 Experimental Procedures ...... 208 Bacterial Strains, Plasmids and Reagents ...... 208 Media and Culture Conditions ...... 209 Recombinant Protein Production in E. coli...... 210 Purification of Proteins for Biochemical Characterization ...... 211 Construction of C. bescii Enzyme Expression Strains...... 212 Protein Expression and Purification from C. bescii Strains ...... 213 Purification of Proteins for Cystallography ...... 214 Crystallization ...... 214 Data Collection and Processing ...... 215 Structure Solution and Refinement ...... 216 Biochemical Characterization of Glycoside Hydrolases ...... 216 Results ...... 217 Calkro_0111 GH16 and Fascin Domain Containing Proteins ...... 217 Calkro_0111 GH16 and Fascin Domain Structures...... 218 Biochemical Analysis of the Calkro_0111 GH16-Fascin Protein ...... 219 Wai35_2053 GH10, CBM3, and GH48 Domain Structures...... 220 Biochemical Analysis of Wai35_2053 and Constituent Domains ...... 222
ix
Csac_0678 GH5 and CBM28 Structures ...... 223 Biochemical Analysis of Csac_0678 and Athe_0594 ...... 225 Discussion ...... 227 Acknowledgements ...... 232 References ...... 242
x
LIST OF TABLES
Table 1.1 - Caldicellulosiruptor Genomes ...... 44 Table 1.2 - Surface Layer Homology domain containing Caldicellulosiruptor proteins ...... 45 Table 1.3 - Fibronectin type III-like domain containing Caldicellulosiruptor proteins ...... 47 Table 1.4 - C. thermocellum and C. kronotskyensis extracellular CAZy domain inventory comparison ...... 48 Table 2.1 - Primers used in this study ...... 116 Table 3.1 - Characterized GDL Enzymes ...... 187 Table 3.2 - Specific Activity of Athe_1860 ...... 188 Table 4.1 - Proteins identified by blastp using Calkro_0111 GH16-Fascin sequence as the query ...... 240 Table 4.2 - GH16 domain-containing proteins identified by blastp using the Calkro_0111 Fascin domain sequence as the query ...... 241
xi
LIST OF FIGURES
Figure 1.1 - Biomass structure and degradation by selected Caldicellulosiruptor enzymes . 40 Figure 1.2 - The three biomass degradation mechanisms used by cellulolytic microorganisms ...... 41 Figure 1.3 - Caldicellulosiruptor extracellular pan-CAZome ...... 42 Figure 1.4 - Caldicellulosiruptor intracellular pan-CAZome ...... 43 Figure 2.1 - Catalytic CAZyme SLH Domain Proteins from the genus Caldicellulosiruptor ...... 107 Figure 2.2 - TEM and Immunofluorescence Microscopy of C. kronotskyensis ...... 108 Figure 2.3 - Biochemical Characterization of Calkro_0111 ...... 109 Figure 2.4 - Biochemical Characterization of Calkro_0402 ...... 110 Figure 2.5 - Transcriptional response of genes encoding SLH domain proteins from C. bescii and C. kronotskyensis...... 111 Figure 2.6 - Construction of Calkro_0402 knock-in C. bescii strain RKCB103 ...... 112 Figure 2.7 - Increased xylan solubilization and xylan attachment by Calkro_0402 knock-in C. bescii strain RKCB103 ...... 114 Figure 2.8 - Time course washed oat spelt xylan solubilization ...... 115 Figure 3.1 – The Glucan Degradation Locus in C. bescii ...... 179 Figure 3.2 – Dot plot showing the repeated DNA sequence of the C. bescii GDL ...... 180 Figure 3.3 – GDL knockout strains of C. bescii ...... 181 Figure 3.4 – Biotic Solubilization of Plant Biomass Substrates by GDL knockout strains . 182 Figure 3.5 – Switchgrass solubilization by mixed cultures of GDL knockout strains...... 183 Figure 3.6 – Recombinant expression of GDL Enzymes is C. bescii ...... 184 Figure 3.7 – Enzyme cocktails mimicking C. bescii wildtype and GDL knockout strains . 185 Figure 3.8 – Enzyme cocktails demonstrating the synergy between GDL GH enzymes. ... 186 Figure 4.1 - Selected multi-domain enzymes from Caldicellulosiruptor species ...... 233 Figure 4.2 - Crystal strucutres for Calkro_0111 GH16 and Fascin-like domain ...... 234 Figure 4.3 - Biochemical data on the Calkro_0111 GH16-Fascin protein ...... 235
xii
Figure 4.4 - Crystal structures for Wai35_2053 domains ...... 236 Figure 4.5 - Biochemical data on Wai35_2053 and its domains ...... 237 Figure 4.6 - Crystal structures for Csac_0678 domains ...... 238 Figure 4.7 - Biochemical data on Csac_0678 and Athe_0594 proteins ...... 239
xiii
CHAPTER 1
Lignocellulosic Biomass Deconstruction by the Extremely Thermophilic Genus Caldicellulosiruptor
Jonathan M. Conway, Jeffrey V. Zurawski, Laura L. Lee, Sara E. Blumer-Schuette and Robert M. Kelly
Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695-7905
Published in: Thermophilic Microorganisms (Li, F. ed.), Caister Academic Press, Norfolk, UK. pp 91-120 (2015)
1
Abstract
Extremely thermophilic bacteria from the genus Caldicellulosiruptor produce a variety of large multi-domain proteins for the attachment to and degradation of lignocellulosic biomass. The currently sequenced genomes from the genus
Caldicellulosiruptor encode 144 homologous groups of Carbohydrate Active enZymes
(CAZymes) containing 128 glycoside hydrolase (GH), 58 carbohydrate binding module
(CBM), 9 polysaccharide lyase (PL), and 11 carbohydrate esterase (CE) domains.
Extracellular CAZymes from these species have a distinct multi-domain architecture, with multiple catalytic and binding domains in the same protein. This promotes synergy between catalytic domains and enhances the ability of these bacteria to degrade biomass. A number of these extracellular multi-domain proteins are also cell surface associated via surface layer homology (SLH) domains. These SLH domain containing proteins are thought to help these species attach to the biomass substrate, while catalytic domains in the same proteins degrade it. Because of their highly effective biomass degradation strategy, Caldicellulosiruptor species are of particular interest for development into consolidated bioprocessing microorganisms for the processing of lignocellulosic biomass to biofuels.
2
Introduction
Lignocellulosic Biomass
Lignocellulosic biomass is abundant, inexpensive, and renewable, making it of interest as a feedstock for liquid biofuel production (1). Lignocellulosic biomass can be obtained from a variety of sources, including agricultural wastes (corn, small grain, and sugarcane residues), industrial forestry product wastes (logging residues, tree thinning, sawdust, and paper pulp), and dedicated biofuel crops (switchgrass, miscanthus, sorghum, and poplar). Two recent studies project the total United States production capacity of lignocellulosic biomass priced less than $60 per dry ton to reach 550-600 million dry tons annually by the year 2020, without significant impact to the environment or food supply
(2,3). This is projected to increase to about 770 million dry tons of production by the year
2030, supported largely by an annual 120 dry ton production increase in dedicated biofuel crops over those ten years (3). Similar analysis applies globally to regions where significant lignocellulose reserves are available.
Lignocellulosic biomass dry weight is comprised generally of about 35-50% cellulose, 20-35% hemicellulose, and 10-30% lignin (4,5). The composition of biomass varies significantly from species to species and within different tissues of a single species (6).
Figure 1.1A shows the general makeup of the polysaccharides, cellulose and hemicellulose, in biomass. Cellulose is a homologous polymer comprised of long chains of β-1,4 linked glucose molecules. These long chains are able to pack parallel to one another into ordered arrays 3-5nm thick, called crystalline microfibrils, which assemble into the main structural element of plant cell walls (7). The packing of these crystalline regions is sufficiently
3
compact to preclude the penetration of enzymes and even small molecules, water in particular. However, irregularities in the packing of the polysaccharide chains create amorphous regions, pits, micropores, and capillaries, making the cellulose chains accessible to small molecules and, in some cases, cellulases, enzymes that break down cellulose (8).
Hemicellulose, on the other hand, is a highly variable heterologous polymer of pentoses (e.g., xylan, arabinose) and hexoses (e.g., mannose, glucose, galactose), integrated into a β-1,4- or mixed β-1,3-1,4- linked backbone of repeating disaccharide units, which are then substituted with various side chain sugars (9). Hemicellulose composition varies across different species, for example, hardwoods contain mainly xylans, while softwoods contain mainly mannans, and grasses contain primarily arabinoxylan (5). This diverse composition necessitates an array of hemicellulases to recognize and cleave these linkages to completely degrade hemicellulose. Hemicellulosic glycans, particularly xyloglucan, are thought to non- covalently tether cellulose microfibrils in the cell wall, thereby increasing wall strength (9).
The non-carbohydrate fraction of plant biomass is lignin, a hydrophobic polymer synthesized by the oxidative coupling of the monomers p-hydroxyphenyl (H-), guaiacyl (G-), and syringyl (S-) lignin (10). Lignin attaches covalently to hemicellulose, filling in gaps between hemicellulose and cellulose to increase the strength and rigidity of fibrous plant tissue, and its hydrophobicity improves water and solute conduction through plant vascular tissue (10,11). These characteristics of lignin interfere with the access of enzymes to the polysaccharides of biomass and enable lignin to adsorb enzymes, impeding their activity
(12).
4
Processing Lignocellulose into Biofuels
The main challenge in utilizing lignocellulosic biomass as an energy source is a deficiency of low-cost technologies for effectively accessing polysaccharides within the plant cell wall and breaking them down into fermentable sugars (8,13). Traditional approaches to bioprocessing of lignocellulosic biomass involve three process steps: pretreatment, enzymatic hydrolysis, and fermentation of released sugars (14). Thermal and/or chemical processing of raw biomass is done during pretreatment, with the goals of disrupting the plant cell wall and removing and separating lignin, and sometimes hemicellulose, from the substrate (15).
Cellulase and hemicellulase cocktails are then added to hydrolyze the biomass polysaccharides to mono- and oligo-saccharides, which are fermented by a microorganism to a desired product, typically ethanol. Many variations of these steps have been developed, either to keep hydrolysis and fermentation separate, or combine them in strategies called simultaneous saccharification and fermentation (SSF), where enzyme cocktails are mixed with ethanologens (e.g., yeast) to achieve biomass conversion (14,16). However, conditions within the SSF reaction must be amenable to the microorganism, as well as the enzyme mixtures, which is often far from the specific optimum conditions of either entity.
Ideally, a low-cost processing strategy would combine lignocellulose degradation and fermentation of the released sugars to fuel in a single process step performed by a single microbe, a strategy referred to as Consolidated BioProcessing (CBP) (8). Currently, no naturally occurring microorganism performs the substrate utilization and product formation steps to the extent needed for CBP-based production of liquid fuels. However, efforts are underway to develop CBP microorganisms using two separate engineering strategies:
5
improving product yield from cellulolytic microorganisms, or adding cellulases to non- cellulolytic microorganisms with high product yield to enable substrate utilization (13,17).
Both of these strategies require better understanding of the intrinsic mechanisms that cellulolytic microorganisms use to efficiently attack and break down cellulose. This will help to identify the most promising candidate microorganisms and cellulases for CBP development.
To date, many mesophilic recombinant microorganisms have been developed as potential CBP microorganisms (18,19), but none can navigate the high variability of lignocellulosic feedstocks and have the robustness necessary to function under biologically unfavorable process conditions (20). Because of their ability to degrade cellulose at high temperatures and to utilize unpretreated biomass feedstocks, a number of cellulolytic, extremely thermophilic microorganisms show promise in overcoming the process barriers to
CBP. The high temperatures at which extreme thermophiles grow optimally (i.e., 70˚C or above) reduce the energy input for process cooling, facilitate volatile product recovery, and significantly limit the risk of microbial contamination by mesophilic methanogen and sulfate- reducing soil bacteria, likely to be present in unpretreated biomass feedstocks (21). In addition, the inventory of enzymes produced by these organisms provides a rich reservoir of highly thermostable, and often novel, enzymes for biomass processing.
Cellulolytic, thermophilic bacteria are rare, with many species clustered in the genera
Caldicellulosiruptor, Clostridium, Geobacillus, Acidothermus, and Thermoanaerobacter.
However, Calidcellulosiruptor is the only known genus with species that can fully deconstruct lignocellulose at temperatures above 70°C (20). A number of non-cellulolytic
6
thermophiles that produce carbohydrate active enzymes related to biomass conversion are also of interest for developing CBP microorganisms, including Thermotoga maritima,
Pyrococcus furiosus, Sulfolobus solfataricus, and Dictyoglomus thermophilum (22,23).
Currently, genetic and metabolic engineering of cellulolytic thermophiles, now in the early stages, could enable fine-tuning the enzyme inventory and improving product yields from these organisms, looking towards CBP as the ultimate goal (24-26). Environmental sampling and metagenomic approaches are also of interest to identify as yet undiscovered microorganisms and enzymes that could play a role in an engineered CBP biomass conversion system.
Carbohydrate Active Enzymes (CAZymes)
Microorganisms that degrade biomass substrates produce a large variety of extracellular cellulases, hemicellulases, and non-catalytic binding proteins for interacting with, and liberating sugars from, insoluble biomass substrates. These proteins are designated
Carbohydrate Active enZymes (CAZymes), and have a modular architecture that may contain a single domain or multiple domains from various enzyme classes or binding motifs.
These catalytic and binding domains that comprise CAZymes are classified into families according to amino acid sequence similarity, and cataloged in a database maintained at www.cazy.org (27,28). The CAZy database is ever expanding by grouping proteins into new module families and incorporating putative CAZymes from an increasing number of sequenced genomes. The CAZy database is curated to include biochemical characterization
7
and x-ray crystallographic structural information obtained for members of the various families (28).
Of the enzyme module classes cataloged in the CAZy database, the most abundant for biomass degradation are the catalytic glycoside hydrolases (GHs), which hydrolyze glycosidic bonds between various linked sugars, and the non-catalytic carbohydrate binding modules (CBMs), which bind sugars both soluble and insoluble. The CAZy database also includes other carbohydrate active domain classifications including: polysaccharide lyases
(PLs), carbohydrate esterases (CEs), glycosyltransferases (GTs), and a newly designated
‘Auxiliary Activities’ (AAs) classification (28). PLs in the context of biomass degradation are typically involved in the degradation of pectin. CEs are involved in the removal of ester linked moieties (e.g., acetyl or methoxy groups) attached to the sugars in biomass. GTs are enzymes that form glycosidic bonds, rather than degrade them. While they are produced by cellulolytic microbes, GTs, because they are not involved in biomass deconstruction, will not be discussed here. The AAs class contains a variety of modules that do not fit into the other
CAZy classes, but assist GHs, PLs, and CEs in accessing the polysaccharides of plant cell walls. Currently classified members of the AAs are predominantly from lignolytic and cellulolytic fungi, and the role of these modules in bacteria is currently not well characterized
(29). As such, the AAs class will not be discussed further here.
Currently, there are 133 GH, 69 CBM, 23 PL, and 16 CE families listed in the CAZy database. However, the constant curation of the database has deleted or reclassified some of the families as biochemical characterization data become available. For example, the family
GH 61 was assigned to the GHs based on weak glucanase activity of one member. But as
8
more members were characterized, the protein family has been reassigned as Auxiliary
Activity family 9, rendering GH 61 obsolete. Due to these types of changes, there are currently 126 GH, 66 CBM, 22 PL, and 15 CE families with classified members. A number of proteins that are known, or predicted, to have a carbohydrate active domain, but are not closely related to existing protein sequences, remain as “non-classified sequences” or “nc”, meaning they have not been incorporated into a numbered family classification. As sequencing of novel organisms continues and new protein sequences are introduced into the database, these are grouped into new families when sufficient similarity is identified among non-classified sequences. In addition to the GH family designations, some families have been organized into 14 clans (GH-A through –N), based on tertiary protein structure similarities between GHs of different families (30). In the same way, selected CBM families are organized into 7 ‘fold’ families, based on tertiary structure similarities (31,32). In the following sections, the role of each of these CAZy classified modules will be examined in relation to biomass attachment and deconstruction. A number of additional modules and proteins, also associated with CAZymes, will also be discussed.
Glycoside Hydrolases (GHs)
The enormous diversity of saccharides in plant biomass necessitates the complementary action of a variety of GHs to dismantle them. Nearly all GHs exhibit specificity based on two structural elements of the carbohydrates they target: the configuration of the cleaved bond (α or β) and the ring size of the released sugar molecule
(30). GHs hydrolyze glycosidic bonds, in almost all cases, by acid-base chemistry using two
9
carboxylic acid moiety containing amino acids in the active site, most often glutamic acid or aspartic acid (30,33). Glycosidic bond hydrolysis proceeds by either an inverting mechanism, where the anomeric carbon is inverted by direct displacement of the leaving group with water, or a retaining mechanism where the anomeric carbon orientation is retained by double displacement through a glycosyl-enzyme intermediate (1,33). The transition state of GH-catalyzed hydrolysis reactions involve the distortion of the pyranose ring into a puckered conformation to allow nucleophilic attack and bond hydrolysis (34).
These transition states have been examined for a number of GHs through various crystallographic and computational methods (34-37).
GHs that break down polysaccharides are also classified as endohydrolases or exohydrolases, based on their ability to cleave glycosidic bonds either in the middle, or at the end of polysaccharide chains, respectively (38). In the case of cellulose hydrolysis, endo-β-
1,4-gluconases cut the amorphous regions of cellulose to generate new cellulose chain ends, from which exo-β-1,4-glucanases work to remove small oligosaccharides, mainly cellobiose, resulting in the classification of many exo-β-1,4-glucanases enzymes as cellobiohydrolases.
Exohydrolases can further be divided into two classes, based on their preference for acting at the reducing or non-reducing end of the polysaccharide chains they cleave (39). Several GHs have been identified that display both endo- and exo- modes of action, so characterization based on enzymatic mechanism and structural properties result in more specific classifications (1,38). Once soluble sugars are liberated from polysaccharides (e.g., cellodextrins from cellulose), another set of GHs, such as β-glucosidases or cellobiose phosphorylases, break these soluble carbohydrates into monomers for metabolism (8,40,41).
10
The complex structure and diversity of sugars and bonds in biomass illustrate the need for a high degree of synergy between many GHs to efficiently hydrolyze the substrate.
GHs that hydrolyze the β-1,4-glycosidic bonds found in cellulose and cellodextrins are most prevalent in families GH5, GH6, GH7, GH8, GH9, GH12, GH44, GH45, GH48, GH74, and
GH124 (8). Most xylan-degrading enzymes are concentrated in GH5, GH8, GH10, and
GH11, but bi-functional GHs have been identified in GH7, GH16, GH43, and GH62, families that are typically associated with the degradation of other polysaccharides (42). The
α-1,4-glucosidic bonds, found in polysaccharides related to starch, are typically degraded by members of GH13, GH57, GH70, and GH77 (43,44). Many other GH activities are relevant to biomass degradation, including: α-galactosidases in GH27 and GH36; β-mannanases in
GH5 and GH26; β-mannosidases in GH2; α-arabinofuranosidases in GH43, GH51, GH54, and GH62; and α-glucuronidases in GH67 (27,45-48). Representative crystal structures have now been solved for 100 of the existing 126 GH families (49).
Carbohydrate Binding Modules (CBMs)
CBMs are often components of catalytic proteins, which also contain catalytic GH,
PL, or CE domains, or components of dedicated binding proteins that facilitate the attachment of microorganisms to biomass substrates. CBM domains enhance the function of the catalytic domains to which they are linked by increasing the proximity of the catalytic unit to the substrate, targeting specific areas of the substrate, and disrupting substrate interactions to enhance degradation (31). When CBMs are appended to or removed from catalytic protein units, GHs linked to these CBMs can have increased hydrolytic efficiency
11
(50,51). This suggests that the CBM units increase the effective enzyme concentration at the substrate surface, thereby increasing efficiency. Other experimental data suggest that CBMs play a role in targeting GHs to specific areas of the substrate that differ in susceptibility to hydrolysis (52,53). Microorganisms that grow on biomass attack a diversity of sugars.
Congruency between CBM domain binding and GH domain catalytic specificity within the same protein is essential for efficient degradation by targeting particular sugars within the diverse plant cell wall (32). There is also significant evidence that CBMs play a role in disrupting or penetrating the structure of polysaccharides, enhancing the ability of the catalytic subunit to degrade the substrate (54-56). One model of cellulose microfibril disruption postulates CBM adsorption at a micro-defect in the cellulose chain packing, followed by penetration of the CBM and water molecules into the mircofibril, disrupts the hydrogen bonding between stacked cellulose chains, loosening individual chains for enzymatic degradation (55). To better understand the roles of CBMs, it is important to understand the types of sugars that they bind.
CBM binding is classified into types, based on the recognized ligands: Type A surface binding, Type B glycan chain binding, and Type C soluble sugar binding (31). Type
A CBMs have planar binding sites, which reflect the flat surface of crystalline substrates, such as cellulose or chitin (32). Type B CBMs include the most CAZy classified CBM families, and bind to glycan chains via a cleft or groove binding site (31). The cleft or grove site limits the specificity of these CBMs to polysaccharide chains, for example, binding the free cellulose chains in amorphous microfibril regions, while no binding affinity is seen to crystalline regions of cellulose (57). Similarly, the steric restrictions of the Type C CBM
12
binding site only allow binding to soluble mono-, di-, or tri-saccharides (32). Aromatic amino acids seem to play a universal role in carbohydrate binding in all CBM types, especially tryptophan and tyrosine, which stack against, in Type A CBMs, or sandwich, in
Type B and C CBMs, the pyranose ring of carbohydrate targets (31,58). In addition, site- directed mutations have shown that hydrogen bonds between CBM polar groups and sugar targets are key to CBM affinity in Type B and C CBMs, but only play a minor role in Type A
CBMs (31,59).
Polysaccharide Lyases (PLs)
PLs cleave uronic acid-containing polysaccharides and occur in a diversity of genomes from bacteria to higher eukaryotes. However, they are rarer in these genomes than
GHs, presumably because only a small portion of polysaccharides contain uronic acids (60).
The substrates for PLs can be divided generally into three groups, based on the uronic acid they contain at the PL cleavage site. Galacturonic acid containing polysaccharides including pectin and pectate; glucuronic/iduronic acid containing polysaccharides including heparin, chondroitin, and hyaluronan, among others; and mannuronate/guluronate containing polysaccharides including alginate are all degraded by PLs (61). The first of these PL degraded groups is only one that is significant in the context of biomass utilization, because, pectin and pectate are significant components in the primary cell walls of all dicots and some monocots (62). Glucuronic acid containing glucuronoxylan and glucuronoarabinoxylan are significant components of the cell walls of dicots and grasses respectively (9). However, the
13
single glucuronic acid moiety branching on the xylan backbone in these cases is not catalytically removed by PLs, but rather by GH4 or GH67 α-glucuronidases (47,48).
Pectin consists of three different polysaccharide motifs: 1) homogalacturonan (HGA), the main backbone of α-(1,4)-linked D-galacturonic acid, 2) rhamnogalacturonan I (RGI), which has a [2)-L-rhamnose-α-(1,4)-D-galacturonic acid-α-(1,] linked backbone that is highly branched from rhamnose with L-arabinose and D-galactose, and 3) rhamnogalacturonan II (RGII), which is a highly variable and complex branched region of the α-(1,4)-linked D-galacturonic acid backbone (62). Microorganisms that degrade plant biomass typically contain only PLs from the pectin/pectate lyase families PL1, PL2, PL3,
PL9, PL10, and PL22 and from the rhamnogalacturonan lyase families PL4 and PL11 (61).
Uronic acid-containing polysaccharides are cleaved by PLs via a β-elimination mechanism that generates a new reducing chain end and a hexenuronic acid moiety at the non-reducing chain end after cleavage (60). This occurs without the involvement of water, in contrast to the GH mechanism that does incorporate water to maintain an -OH group on the non-reducing chain end (63). Of the 22 PL families in the CAZy database, all but one, PL17, have a solved structure for a representative member and these structures group into eight
‘fold’ types (60). In pectate lyases with solved crystal structures, an Arg or Lys residue acts as the catalytic base (64,65). For the PL3 pectate lyase from Caldicellulosiruptor bescii, it was found that in digestion reactions of unpretreated switchgrass, where the PL3 was mixed with C. bescii cellobiohydrolase A (CelA), lower CelA loading was needed to achieve the same glucan conversion accomplished by higher loadings of CelA alone (65). This suggests that PLs play an important role in making areas of unpretreated biomass substrates more
14
readily accessible to GHs of other biomass degrading CAZymes, thereby leading to more complete substrate utilization.
Carbohydrate Esterases (CEs)
As a protection against microbial attack, many plant cell wall polysaccharides, particularly hemicelluloses and pectin, are highly substituted with a variety of side chains, including acetyl, feruloyl, and methoxy groups, linked to the sugar backbone by ester linkages (66). Esterification with phenolic acids, such as ferulic acid, can further react to form cross-linkages between adjacent cell wall polymers, which improve the mechanical strength of the plant cell wall, although less frequent than polysaccharide esterification with acetic acid (67). CEs remove these ester substitutions, allowing GHs and PLs access to the polysaccharide backbone for degradation. The synergistic relationship between CEs and
GHs or PLs has been observed frequently (68,69); for example, in corn stover degradation, the removal of acetyl substituting groups via the addition of an acetyl xylan esterase from CE family 5 improved enzymatic conversion of xylan by an endoxylanase from GH 11 (70). CE domains can also be appended to GH or PL catalytic domains, resulting in bi-functional activity of a single enzyme, as observed in the CE1-GH11 containing XynS20E enzyme from
Neocallimastix patriciarum (71).
CEs can be grouped based on the substituted moiety that they remove: cinnamoyl esterases, including feruloyl esterases, are found in family CE1, although many have not yet been assigned a CE family (CEnc); acetylesterases, including acetylxylan esterases, belong to families CE1, CE2, CE3, CE4, CE5, CE6, CE7 and CE16; and, pectin methylesterases are in
15
family CE8 (27,66,72,73). The active center of most CEs is based on a catalytic serine in the consensus sequence (Gly-x-Ser-x-Gly), which is part of a catalytic triad of Ser-His-Asp/Glu or catalytic dyad of Ser-His alone (67). In contrast, the catalytic center of pectin methylesterases center around a catalytic Asp residue stabilized by Arg (73). The reaction mechanism for CEs is similar to that of lipases involved in the breakdown of triglyceride fats, where the substrate binds to the active serine, an alcohol is released, and an acyl-enzyme intermediate remains; subsequently, attack of a nucleophile like water and resolution results in product and the free enzyme (74,75).
Immunoglobulin-like Superfamily
Two additional domain types, fibronectin type III (FN3) and bacterial immunoglobulin-like (BIg), both of which belong to the superfamily of immunoglobulin-like
(Ig-like) proteins in Pfam clan CL0159, have been identified as members of the multi-domain architecture of a number of Caldicellulosiruptor proteins (76,77). Members of this immunoglobulin-like superfamily typically only have 10-30% sequence identity, but share a similar structure of two face-to-face β-sheets with a hydrophobic core (76,78). FN3 domains are common components of eukaryotic proteins, but within bacterial species they have been identified almost exclusively in extracellular glycohydrolases (79,80). There remains some disagreement as to how to classify FN3 domains and what their role is in bacterial enzymes.
Some studies have characterized these domains as FN3 domains and “X domains” (because of their unknown function) and have tried to differentiate phylogenetic groups of X domains from FN3 domains (81). Work with the cellulosomal cellobiohydrolase CbhA from C.
16
thermocellum, containing domains from the N-terminus: CBM4-BIg-GH9-FN3-FN3-CBM3-
Dockerin, showed increased hydrolysis of sugar from acid-swollen cellulose by protein truncations containing one or both FN3 domains and “exfoliation” of cellulose chains by the
FN3 domains alone (80). Work by others to solve the crystal structures and characterize the same FN3 domains from C. thermocellum CbhA suggest that these domains play a role in improving the thermostability of the enzyme, and do not play a role in disrupting the substrate (82). Other proposed roles for FN3 domains include promoting “mechanical elasticity” of the enzyme, functioning as a CBM, and serving as a “spacer” to maintain the optimal distance between the other components of multi-domain enzymes (78,81).
Bacterial immunoglobulin-like domains appear to be involved in cell-to-cell adhesion and extracellular glycohydrolysis, but a definitive role of these domains remains elusive (83).
The structure of the BIg domain from C. thermocellum CbhA has been solved (84,85). The domain shows very close association with, and is required for, the catalytic ability of the
GH9, suggesting a role in the structural stabilization of the catalytic domain (84). Similarly, an Ig-like domain in a thermostable esterase from Thermotoga maritima was found to play a role in the multimerization of the catalytic domain into a hexameric structure for increased activity and thermostability (86). Overall, very little characterization and few structures have been solved for members of the FN3 and BIg families, but the limited work that has been done suggest they play a role in both enzyme stabilization and substrate binding, both of which are essential for robust, thermostable, biomass-degrading enzymes.
17
Surface Layer Homology (SLH) Domains
SLH domains are highly conserved, 50-60 amino acid sequences that are on the N- or
C-terminus of proteins, and attach non-covalently to the cell surface within the surface layer
(S-layer), a planar array of surface protein (87). S-layers are found in almost every taxonomic group of bacteria and archaea, and in most species are predominantly comprised of a single S-layer protein (SLP) that assembles into a protein lattice at the cell surface with oblique, square, or hexagonal symmetry (87). This structured protein array may serve many functions, including as a molecular sieve or protective coat, or an attachment site for extracellular enzymes and molecular moieties involved in cell adhesion or surface recognition (88,89). Proteins, other than the SLP, that contain SLH domains can be tethered to the cell surface within this array, while the remainder of the protein is free to serve its function (90). CAZymes containing SLH domains can mediate the attachment of microorganisms to biomass substrates, by attaching to both the cell via SLH domains and to the substrate via CBM domains; they can also contribute to biomass deconstruction at the cell surface with catalytic domains (91-93). In gram positive bacteria, such as
Caldicellulosiruptor and Clostridium species, SLH domains attach to the 15-30 nm thick peptidoglycan cell wall and its associated secondary cell wall polymers (e.g., teichoic and teichuronic acids) (88).
Others (Adhesin, Expansin-like, and Swollenin)
Expansins are proteins found in plants that participate in cell enlargement by loosening cell wall polysaccharides without breaking them, allowing plant cell walls to
18
“creep” and expand, which facilitates plant growth (94-96). Expansin-like gene sequences appear in the genomes of a small number of phylogenetically diverse bacteria and, from experimental evidence, display similar plant cell wall loosening characteristics to plant expansins (97,98). Expansin-like proteins also include a set of fungal proteins, originally identified in Trichoderma reesei, called swollenins based on their ability to swell cotton fibers without degrading them (99,100). These non-hydrolytic proteins are of particular interest as a biomass pretreatment in biofuels processes for their role in cellulose amorphogenesis, initial disruption of the crystalline microfibral structure prior to hydrolysis
(55). Work is being done to better quantify their effect on cellulose and their ability to enhance the action of CAZymes (101). Recently, Caldicellulosiruptor species have been shown to express two classes of novel adhesin proteins that are enriched in the crystalline cellulose bound fraction of cultures analyzed by proteomics analysis (91). These adhesin proteins have no homology outside the Caldicellulosiruptor genus and are located downstream of a type IV pilus locus, suggesting association with type IV pili to attach to biomass substrates. This novel group of adhesin proteins have no hydrolytic activity, but may play a role in cellulose amorphogenesis, similar to expansin-like proteins (Blumer-
Schuette et al., unpublished).
Biomass Deconstruction by Cellulolytic Microorganisms
Cellulolytic microorganisms use three different degradation mechanisms, shown in
Figure 1.2, to coordinate their cellulase and hemicellulase inventory to break down biomass: free enzymes, cellulosome multi-enzyme complexes, and large multi-domain enzymes (102).
19
Cellulolytic species use at least one, but often more than one, of these degradation mechanisms to effectively degrade lignocellulosic biomass. Aerobic cellulolytic fungi and bacteria typically produce large amounts of individual extracellular free enzymes, but nearly all cellulolytic species use free enzymes to some extent in their biomass degradation strategy
(102,103). These free enzymes almost always contain a substrate-binding CBM domain attached to a catalytic GH domain by a small flexible linker peptide (104). While free enzymes are effective at degrading crystalline cellulose, multi-enzyme cellulosome complexes or large multi-domain enzymes provide the opportunity for coordination of multiple catalytic domains for a more synergistic degradation strategy.
A number of anaerobic bacteria (e.g., Clostridium spp. and Ruminococcus spp.) and fungi (e.g., Chytridomycetes) produce large multi-enzyme complexes on the cell surface called cellulosomes, the most studied of which is from Clostridium thermocellum (105). The cellulosome from C. thermocellum displays superior cellulose degradation activity to free enzyme cocktails and degrades biomass by a mechanism distinct from free enzymes in which the cellulosome physically separates cellulose microfibrils from the larger particle to improve digestion (106). Cellulosomes are constructed around a large primary scaffoldin protein that acts as a scaffold for the enzyme units that degrade biomass. An example of a C. thermocellum cellulosome constructed around the main primary scaffoldin protein, CipA, is shown in Figure 1.2. The primary scaffoldin protein contains many cohesin subunits and typically one CBM domain, which mediates attachment of the large cellulosome complex to the substrate. Enzymes containing GH, CBM, and terminal dockerin modules anchor to cohesin domains on the primary scaffoldin by high affinity, calcium-dependent, type-I
20
cohesin-dockerin interactions (102). The primary scaffoldin attaches to an anchoring scaffoldin protein via a type-II cohesin-dockerin interaction that is divergent from the type-I primary cohesin-dockerin interaction between the scaffoldin and enzyme subunit (105). The anchoring scaffoldin contains terminal surface layer homology (SLH) domains that facilitate attachment of the entire cellulosome complex to the cell surface (105).
While the paradigm describing cellulosome assembly is modeled after the C. thermocellum version, further investigation into the cellulosomes of C. thermocellum and other species reveal a rich array of assembly motifs to increase the number of enzyme attachments or incorporate additional components to the cellulosome complex (102,105).
For example, in C. thermocellum, five cell surface attached scaffoldins, and one scaffoldin that is not cell associated, have been identified; these vary in assembly structures from attaching directly to a single enzyme subunit to attaching 7 primary scaffoldins harboring up to 63 enzymes (107). In contrast, Acetivibrio cellulolyticus produces a “three-tiered cellulosome”, comprised of 83 enzymes attached in a complex to the cell surface via three interacting scaffoldins, one of which also contains a catalytic GH9 module (105).
Cellulosomes provide a number of advantages to the microorganisms that produce them, including increased enzymatic synergism enhanced by favorable assembly and spacing of enzymatic and binding components, and proximity of the cell to the substrate for monitoring the microenvironment and uptake of released sugars for metabolism (1).
Other anaerobic bacteria, particularly members of the genus Caldicellulosiruptor, do not coordinate enzymes into large multi-protein cellulosome complexes, but rather use large multi-domain proteins, which can contain multiple catalytic and binding domains, to
21
efficiently degrade biomass (91). These multi-domain proteins are characterized by their complex architecture of multiple GH, PL, CE, CBM, and/or other accessory domains in a single peptide. Several of these multi-domain enzymes contain two distinct GH domains, creating a multi-functional protein, where it is possible for substrate intermediates to be mobile between the two GH module active sites resulting in increased synergism (108,109).
The inclusion of SLH domains at either the N- or C-terminus of some of these proteins allows for attachment of the protein within the S-layer of the cell, while CBM domains in the same protein can mediate attachment to biomass to hold the cell in close proximity to the substrate (91). These complex multi-domain proteins have been shown to be more active than the sum of the constituent domains as separate proteins, suggesting this multi-domain enzyme strategy for degrading biomass provides an advantage to the microbes that produce them (51,108,110).
Biomass Degradation by the Genus Caldicellulosiruptor
The Genus Caldicellulosiruptor
Caldicellulosiruptor species are extremely thermophilic, gram positive, anaerobic, and asporogenous bacteria that are capable of breaking down a wide range of polysaccharides (111). Currently, the closed genomes of eight Caldicellulosiruptor species, shown in Table 1.1, are available in the National Center for Biotechnology Information
(NCBI) database (www.ncbi.nlm.nih.gov). In addition, two Caldicellulosiruptor genomes,
C. acetigenus and Caldicellulosiruptor sp. F32, currently have permanent draft sequences and are listed as assemblies in the NCBI database. Other genomes that are currently
22
unrelated include those for Caldicellulosiruptor species strain Rt8.B8 and strain Wai35.B1.
These species are globally distributed through North America, Iceland, Russia, and New
Zealand, however, while the genomes of species from similar locations share highly syntenous blocks, cellulolytic capacity does not correlate with the isolation location (91).
Caldicellulosiruptor species have optimum growth temperatures between 70°C and
78°C, and certain species represent the most thermophilic, cellulolytic bacteria known (111).
One of the most extensively studied species in the genus is C. saccharolyticus, which has a large inventory of GHs in its genome that are differentially transcribed to adapt to varying substrates. These GHs metabolize a variety of monosaccharides, both pentose and hexose sugars, simultaneously without regulation by carbon catabolite repression (108,112,113).
Another well studied species, C. bescii, has been shown to grow on industrially relevant loadings of unpretreated biomass and is able to digest up to 85% of insoluble unpretreated switchgrass, while simultaneously deconstructing lignin, cellulose, and hemicellulose
(114,115). Caldicellulosiruptor species ferment the simple sugars liberated from the biomass substrate to molecular hydrogen, lactate, acetate, and a small amount of alcohols. A recent review of Caldicellulosiruptor metabolism discusses both the intermediary metabolism of these species for generating energy and reducing equivalents, as well as the metabolism for fuel production (116). These characteristics make Caldicellulosiruptor species attractive candidates for development into consolidated bioprocessing (CBP) microorganisms.
Comparison of the eight closed Caldicellulosiruptor genomes revealed that the core genome, genes contained in the genomes of all eight species, includes 1543 ORFs, but is non-cellulolytic (91). The pan genome, the collection of all genes in all species (4009
23
ORFs), was found to be open, indicating that sequencing of new species is expected to reveal novel genes not yet identified as part of the pan genome (91). Within the genus, cellulolytic capability varies greatly with C. bescii, C. kronotskyensis, C. obsidiansis, C. lactoaceticus, and C. saccharolyticus having higher cellulolytic ability than the other species with completed genomes (91). Comparison of these five highly cellulolytic species reveals three extracellular GH proteins present in all five genomes: GH9-CBM3-CBM3-CBM3-GH48
(often referred to as CelA), GH74-CBM3-CBM3-CBM3-GH48, and GH9-CBM3-CBM3-
CBM3-GH5. These three CAZymes are implicated as being important for crystalline cellulose degradation, and in particular, the presence of GH48 and CBM3 domains appear to be essential for crystalline cellulose degradation, because the weakly cellulolytic species do contain GH5 and/or GH9 enzymes, but no GH48 domains (91,111). In addition, proteomic analyses of C. bescii and C. obsidiansis identified the cellulolytic inventory of multi- functional, multi-domain enzymes as primarily containing GH5, GH9, GH10, GH43, GH44,
GH48, and GH74 domains, as well as highly conserved CBM3 domains (117). The many multi-domain enzymes, found in the genus Caldicellulosiruptor, that contain multiple catalytic and binding domains in a single protein are thought to have evolved by domain shuffling (118,119). These Caldicellulosiruptor multi-domain enzymes are often co-located in areas of the genome, including one termed the “glucan degradation locus” (GDL), which contains many of the CBM3-containing enzymes, and a separate xylanase locus (91,120).
24
The Caldicellulosiruptor Pan-CAZome
The nine Caldicellulosiruptor species genomes listed in Table 1.1 encode for over
600 individual CAZymes that can be binned into 144 groups of protein homologs in the pan-
CAZome. These 144 proteins contain 128 GH domains representing 44 families and one non-classified sequence, 58 CBM domains representing 17 families, 11 CE domains representing 7 families, and 9 PL domains representing 4 families and 2 non-classified sequences. To separate these 144 proteins into those involved in extracellular biomass processing and those involved in intracellular sugar catabolism, SignalP4.1 (input settings: gram-positive bacteria, sensitive D-cutoff=0.42, no TM regions) was used to predict signal peptides at the protein N-terminus, which is indicative of protein secretion (121). Using this method, 60 of the 144 CAZymes were identified as extracellular, and the remaining 84 were identified as intracellular. A graphical representation of the domain structure of each of the proteins found in the Caldicellulosiruptor extracellular pan-CAZome is shown in Figure 1.3.
Similarly, the intracellular pan-CAZome is shown in Figure 1.4. These figures show the hierarchical structure of the pan-CAZome, where the 7 extracellular and 26 intracellular core
CAZymes, found in all nine species, are shown at the top, with the boxes of increasing size encompassing all the proteins found in five, three, two, and one or more species, respectively. Put another way, the proteins found inside the box for three or more species, but outside the box for five or more species, have homologous proteins in three or four of the nine Caldicellulosiruptor species.
Taking a closer look at the extracellular pan-CAZome reveals the multi-domain enzyme strategy Caldicellulosiruptor species use to degrade biomass. These extracellular
25
proteins contain between 1 and 9 domains, with the average protein having 2.8 domains. In addition, 13 of the 60 extracellular proteins contain more than one catalytic domain from the
GH, PL, or CE families, which allows for bi-functional enzymes and an increased level of catalytic synergism from individual catalytic proteins. This multi-domain extracellular
CAZyme profile is in stark contrast to the CAZyme profile of the intracellular proteins, where all but two of the 84 proteins contain only one CAZy domain. This is not particularly surprising, as intracellular CAZymes are involved in the catabolism of mono- and oligo- saccharides that have already been liberated from the recalcitrant biomass substrate and transported into the cell.
Notable Caldicellulosiruptor CAZymes
A recent review (120) catalogs the biochemically characterized CAZymes from
Caldicellulosiruptor species. The role that a selected number of these proteins have in degrading a biomass substrate are shown in Figure 1.1B, including: β-1,4-glucanase (CelA)
GH9-CBM3-CBM3-CBM3-GH48 (122-124), β-1,4-glucanase and β-1,4-xylanase GH5-
CBM28-SLH (91), β-1,4-xylanase CBM22-CBM22-GH10 (108,125,126), β-1,4-mannanase
CBM27-CBM27-CBM35-GH26 (127,128), α-L-arabinofuranosidase GH43-CBM22-GH43-
CBM6 (108,119), and pectin lyase PL3 (65). Also shown are two CE proteins and one PL protein whose properties have not been biochemically characterized, but are inferred from
CAZy family classifications. These include the CE4 removing acetyl groups, a CE1-CBM13 protein disrupting the ferulic acid cross-linkage between xylan chains, and PL9-CBM3-
CBM3 degrading the α-1,4-linked D-galacturonic acid backbone of pectin.
26
One of the most noteworthy and novel enzymes produced by the five highly cellulolytic Caldicellulosiruptor species is CelA (GH9-CBM3-CBM3-CBM3-GH48). This enzyme was first characterized from C. bescii (Athe_1867) as having a temperature optimum of 85°C and pH optimum of 5.0-6.0 on Avicel, a crystalline cellulose substrate, with additional weak activity on oat spelt xylan (122). Proteomics work on the secretome of C. bescii revealed that CelA is the dominant cellulase produced by the species (117). More recent work with various truncations of CelA have shown that the GH9 domain possesses the main catalytic activity on cellulose substrates, and the GH48 domain works synergistically with the GH9 to increase sugar release (124). CelA has also been demonstrated to degrade the biomass substrates corn stover and switchgrass; while it appears to be inhibited by cellobiose, when a β-D-glucosidase was added with CelA, 100% glucan conversion from
Avicel was achieved after 7 days at 75°C (123). Additional work to elucidate the degradation mechanism of CelA showed that, unlike many cellulases which work from the ends of crystalline cellulose microfibrals, CelA burrows in the middle of the cellulose microfibrals creating cavities approximately 15-20 nm wide and 15-30 nm long (123). This burrowing mechanism promotes the formation of new cellulose fiber chain ends that could be synergistically digested by other cellulases produced by Caldicellulosiruptor species. CelA is among the largest, most thermostable, and most active enzymes yet identified for the degradation of crystalline cellulose.
27
Linker Regions in Caldicellulosiruptor CAZymes
CelA is located in the Caldicellulosiruptor “glucan degrading locus” (GDL), along with a number of other CBM3-rich CAZymes (91). Another feature of the genes in the GDL is that the linkers between the domains of these multi-domain proteins are highly Pro/Thr- rich (108). Linkers have been a noted feature in a number of CAZymes, often with high Gly,
Ser, Pro, and Thr content, and are thought to be unfolded regions of disorder to allow flexibility between the folded ordered CAZy domains (129). To probe the extracellular
Caldicellulosiruptor CAZyme for these flexible linker sequences, the extracellular CAZymes from C. kronotskyensis, the species containing the most overall CAZymes, were submitted to the SEG segmentation algorithm, an online predictor of low complexity disordered regions
(130). SEG was used to identify the low complexity regions from these proteins, and further analysis was used to evaluate these regions for high Pro/Thr content. In C. kronotskyensis, all of the CAZymes located in the GDL from Calkro_0850 (CelA) to Calkro_0864 contain highly Pro/Thr-rich linkers, and, furthermore, these were the only CAZymes in the genome to exhibit this characteristic. Another potential linker motif was identified in Calkro_0402 in the region between the SLH domains and the reminder of the protein, where the SEG predicted region of disorder is highly Gln/Thr-rich. Flexible linkers that are Asn-rich or Gly- rich, instead of the typical Pro/Thr-rich, have been identified in CAZymes from the ruminal fungus, Neocallimastix patriciarum (71,131). CAZyme linker length and flexibility has been probed by x-ray scattering, identifying a number of conformations, including those where the linker is extended, suggesting these regions are highly flexible (132,133). This characteristic of the linkers promotes synergy between the various linked domains, as each
28
domain has the flexibility of movement to perform its role, binding or enzymatic, in the larger multi-domain protein.
Another feature of these Pro/Thr-rich linkers is that that they are thought to be the sites of glycosylation of these Caldicellulosiruptor proteins. The Ser and Thr residues in linker regions of CAZymes are often O-glycosylated, which has been primarily studied in the
CAZymes of Trichoderma reesei (129,134). Native purified CelA (Athe_1867) from C. bescii appears to be larger than the predicted molecular weight by gel electrophoresis (194 kDa predicted, 230 kDa observed), suggesting that the protein in glycosylated (122,124).
Glycosylation has also been implicated for other CAZymes from the GDL: Athe_1857
(GH10-CBM3-CBM3-GH48), Athe_1865 (GH9-CBM3-CBM3-CBM3-GH5), and
Athe_1859 (GH5-CBM3-CBM3-GH44), in a study of cell-free enzymes from C. bescii
(135). In addition, glycosylation has been observed for components of the cellulosome in C. thermocellum and other cellulosomal microbes (103,136). This glycosylation may be a method to protect against proteases, a potential problem in the low complexity sequences of linker regions between domains in Caldicellulosiruptor enzymes (137). Alternatively, glycosylation may play a role in the dynamic binding to carbohydrate substrates and processive action of CAZymes, as was demonstrated by molecular dynamic simulation of
Trichoderma reesei cellobiohydrolases Cel6A (GH6-CBM1) and Cel7A (GH7-CBM1)
(129). Glycosylation was also found to be an important factor in the rigidity of the linker in
Cel45 from Humicola insolens, where a mutated linker with less glycosylation was found to be more rigid by x-ray scattering (132).
29
The Caldicellulosiruptor Surface Layer
A number of the multi-domain CAZymes from the genus Caldicellulosiruptor also contain S-layer Homology (SLH) domains to localize these proteins at the cell surface within the S-layer. This unique group of multi-domain proteins includes both catalytic proteins containing GH5, GH10, GH16, GH43, GH55, GH87, GHnc, PL9, and PL11 domains and non-catalytic proteins containing CBM and immunoglobulin-like domains (91). A list of the
Caldicellulosiruptor SLH domain containing proteins is shown in Table 1.2 and includes protein loci, amino acid length, and domain structure. These SLH multi-domain proteins are present in all nine sequenced Caldicellulosiruptor species and a number of these proteins are unique to a single species. Because these proteins are localized to the cell surface, the interface between the cell and substrate, it is believed that these SLH multi-domain proteins play a major role in the attachment to and degradation of lignocellulosic biomass by
Caldicellulosiruptor species.
As stated previously, the S-layer in most microorganisms is made up of a single S- layer protein (SLP). Proteomic and transcriptomic experiments implicate Csac_2451, from
C. saccharolyticus, and its homologs through the other Caldicellulosiruptor species, as the main SLP (91,111,138,139). Nearly all of the SLH proteins in Table 1.2 are extracellular by the SignalP4.1 prediction, supporting the idea that these proteins are localized outside the cell at the cell surface. Those that do not have predicted signal peptides are truncated at the N- terminus, unlike homologs in the same group, which likely explains the lack of a predicted signal peptide. The SLH domains of these proteins are either at the N-terminus and C- terminus, with only one group of homologs having SLH domains between two CBM
30
domains. The location of these domains at either terminus suggests that these proteins are tethered to the cell by the SLH domain containing end and then allowed to swing out away from the cell to interact with biomass with the remaining domains.
Csac_0678 (GH5-CBM28-SLH-SLH-SLH) is the only catalytic SLH protein to be conserved through all known Caldicellulosiruptor species. Csac_0678 was found to be optimally active at 75°C and pH 5.0 on a variety of substrates containing β-1,3/4-glucan, β-
1,4-xylan, and β-1,4-glucan linkages including cellulose (91). Other Caldicellulosiruptor
SLH proteins are thought to play a role in helping the cell attach and tether itself to the biomass substrate. Csac_2722 is one such protein that has been found to bind Avicel, a crystalline cellulose substrate, and has been identified in the S-layer and cell membrane fraction of C. saccharolyticus grown on switchgrass (91). The proteins grouped as ‘other’ in
Table 1.2 do not contain currently identified GH, PL, CE, or CBM domains, but a number of these proteins have been identified by proteomics as Avicel-bound (91). These proteins presumably contain unidentified CAZy domains that facilitate their attachment to biomass.
SLH proteins not involved in biomass attachment and degradation may play a role in the S- layer by helping to space the other SLH containing CAZymes, giving them adequate area to perform their enzymatic or binding roles.
FN3 containing Caldicellulosiruptor proteins
As previously discussed, little is known about the role of fibronectin type III (FN3)- like domains found in bacteria. A number of Caldicellulosiruptor proteins are predicted to contain these FN3 domains and are listed by homologous group in Table 1.3. These proteins
31
are particularly large, ranging from 750 to 3027 amino acids in length. Of these, Athe_0014 is the largest protein found in the genomes of all currently sequenced Caldicellulosiruptor species. All of these proteins are predicted to be extracellular by SignalP4.1, except two
GH3-FN3 groups, which are the smallest FN3-containing proteins. Three of these protein groups also contain SLH domains for localization at the surface of the bacterial cell.
At one time it was thought that in bacteria, FN3 domains were only found in extracellular glycoside hydrolase enzymes (79). However, four homologous groups are found in the Caldicellulosiruptor genus that contain no identified catalytic domains and, furthermore, no domains other than the predicted FN3 domains. An individual FN3 domain is only approximately 80 amino acids in length, leaving open the possibility that other, as yet unidentified, domains could be present in these proteins. Four of the homologous protein groups, found in Table 1.3, do contain catalytic glycoside hydrolase domains. For these proteins, the FN3 domain may contribute to catalytic ability, binding, thermostability, or domain spacing. However, the definitive role that these FN3 domains play in biomass degradation by the genus Caldicellulosiruptor, and more broadly in bacteria, remains unknown.
Comparison to Clostridium thermocellum
As mentioned, one of the best studied cellulolytic thermophilic microorganisms other than the Caldicellulosiruptor species is C. thermocellum, which primarily uses a cellulosomal mechanism to coordinate its inventory of CAZymes to degrade biomass. Both
Caldicellulosiruptor species and C. thermocellum are of particular interest to engineer as
32
metabolic engineering platforms for consolidated bioprocessing (CBP) of lignocellulosic biomass to biofuels. Table 1.4 compares the extracellular catalytic CAZyme domain inventory of C. thermocellum to that of Caldicellulosiruptor kronotskyensis, the
Caldicellulosiruptor species with the largest CAZyme inventory. Each catalytic domain found in multi-domain proteins with multiple catalytic domains is cataloged separately. A number of C. thermocellum strains have sequenced genomes, but most work has focused on two of these strains: C. thermocellum ATCC 27405 and C. thermocellum DSM 1313. All extracellular CAZymes from the ATCC 27405 strain are found in DSM 1313 and only 3 proteins (containing GH11, GH43, and GH18) are unique to DSM 1313. Because of these very few differences between the two strains, they were collapsed into a single CAZyme domain profile for comparison to C. kronotskyensis in Table 1.4. Again, extracellular proteins for both C. thermocellum and C. kronotskyensis were predicted by identifying signal peptides using SignalP4.1 (121).
One of the major differences between the catalytic domain inventory of these microbes shown in Table 1.4 is that C. thermocellum has almost double the catalytic
CAZymes domains of C. kronotskyensis (70 vs. 38). However the number of domain families is fairly consistent between the organisms: 17 vs. 15 GH families, 4 vs. 3 PL families, 7 vs. 2 CE families for C. thermocellum vs. C. kronotskyensis, respectively.
Interestingly, almost all (61 out of 70) of the catalytic CAZyme domains from C. thermocellum are associated with a dockerin domain that would mediate their attachment to the cellulosome. The remaining 9 catalytic CAZyme domains, which are each on separate proteins, are likely free-enzymes. Some of these C. thermocellum free enzymes have high
33
similarity to C. kronotskyensis CAZymes, particularly Cthe_0040 (GH9-CBM3-CBM3) and
Cthe_0071 (GH48-CBM3), which mirror the structure of the CBM3-rich cellulases of the
Caldicellulosiruptor glucan degradation locus (GDL), and Cthe_2809 (SLH-SLH-SLH-
CBM54-GH16-CBM4-CBM4-CBM4-CBM4), which has high similarity to Calkro_0072. In addition to the catalytic CAZy domain containing proteins, C. thermocellum contains 9 extracellular proteins that contain only CBM domains compared to 3 exclusively CBM- containing proteins in C. kronotskyensis. These proteins are presumably involved in biomass attachment.
To compare the cellulosomal and multi-domain mechanisms that C. thermocellum and C. kronotskyensis, respectively, use to degrade biomass, the number of CAZy domains
(GH, PL, CE, CBM) per extracellular protein was calculated. C. thermocellum contains 136
CAZy domains in 72 total extracellular proteins, while C. kronotskyensis contains 103 CAZy domains in 34 extracellular proteins. C. kronotskyensis has about one more CAZy domain per extracellular protein (3.0 domains/protein) than C. thermocellum (1.8 domains/protein) on average. This confirms the multi-domain strategy that C. kronotskyensis uses to degrade biomass and highlights one of the major differences between these cellulolytic thermophiles.
Conclusions
Members of the genus Caldicellulosiruptor are the most thermophilic, cellulolytic bacteria known. Caldicellulosiruptor species accomplish degradation of unpretreated lignocellulosic biomass by using a collection of unique large multi-domain CAZymes. The multi-domain architecture of these proteins promotes synergy between the various catalytic
34
and binding domains within a single protein to efficiently degrade biomass. An important subset of the CAZymes found in the genus Caldicellulosiruptor includes SLH domains to localize these proteins to the cell surface. These SLH proteins also contain CBM domains for mediating attachment of the cell to the biomass substrate, while various catalytic GH, PL, and CE domains in the same protein digest it. Additionally, a number of accessory domains, such as FN3, are a part of a number of Caldicellulosiruptor CAZymes. The definitive functions of these domains are largely unknown in the context of biomass degradation, and remain an area of ongoing investigation. The Caldicellulosiruptor multi-domain mechanism for biomass degradation is a distinct alternative to the cellulosome mechanism of other cellulolytic thermophiles, such as Clostridium thermocellum. The ability of
Caldicellulosiruptor to degrade unpretreated biomass, using a unique inventory of multi- domain enzymes, at temperatures above 70°C, makes these species leading candidates for development into CBP for the conversion of biomass to biofuels.
Future Trends
Metagenomics - Identifying Novel CAZymes
Identification of novel proteins related to biomass degradation is becoming more possible with the decreasing cost of DNA sequencing and the development of metagenomics.
This sequence-based analysis takes advantage of the great diversity in microbial communities, particularly of microorganisms that would not be culturable using standard microbiological techniques (140,141). A standard metagenomic experiment would take an environmental sample, potentially enrich the natural microbiota in a particular subset of
35
microbes by culturing under selective condition (e.g., high temperature for thermophiles, media with crystalline cellulose for cellulolytic microbes), isolate DNA from the microbial community in the sample, and sequence all of the isolated DNA for analysis (140). This technique has been used to identify novel CAZymes from the human gut microbiota (142), decaying wood (143), buffalo rumen (144), and German grassland soil samples (145). Once a CAZyme gene of interest is identified, it can be cloned for biochemical characterization, as was done in the German grassland soil study to identify a novel CBM9-GH9 cellulase optimally active at 45-50°C and two novel GH11 xylanases optimally active at 60°C (145).
Novel enzymes of particular interest can then be inserted into the genome of a CBP microorganism to improve lignocellulosic biomass degradation.
Caldicellulosiruptor bescii Genetic System
Ultimately, the development of a CBP microorganism depends on the ability to genetically manipulate a platform organism to adjust the cellulolytic enzyme inventory and metabolic pathways for optimized biomass degradation and fuel molecule production. A genetic system for Caldicellulosiruptor bescii has only recently been developed, and work to make it tractable and extend genetic tools into other Caldicellulosiruptor species is continuing (25,146-151). This genetic system in C. bescii is constructed around a uracil auxotroph ΔpyrFA (Cbes_1377) disruption mutant. The gene pyrF encodes for orotidine monophosphate decarboxylase, an enzyme essential for uracil biosynthesis. This allows for selection of uracil auxotroph mutants because uracil prototrophs with functioning PyrF will ultimately convert 5-fluoroorotic acid (5-FOA) into the toxic product fluorodeoxyuridine
36
(148). The uracil auxotroph mutant can then be complemented with a shuttle vector containing the pyrF gene under the control of the 30S ribosomal subunit (Cbes_2105) regulatory region (148). This ΔpyrFA strain was used to successfully knockout the gene
Cbes_1918 encoding lactate dehydrogenase (ldh) to metabolically engineer C. bescii to divert metabolic flux away from lactate and increase acetate and hydrogen production (25).
Metabolic Engineering of a CBP Thermophile
Even with a tractable genetic system, the genetic manipulations necessary to engineering a highly productive consolidating bioprocessing thermophile are non-trivial.
Genetic manipulations will be needed to engineer both the front-end CAZymes inventory to promote complete degradation of the biomass substrate, and the back-end metabolism to divert flux to products of interest (i.e., alcohols or other fuel molecules). This front-end engineering will likely involve knocking genes into the CBP species. As seen in the
Caldicellulosiruptor genus, a number of novel and potentially very productive biomass degrading enzymes are found only in a single species and these would need to be added together into the engineered strain. Likewise, metagenomic studies in cellulosic environments will no doubt reveal novel thermophilic CAZymes from a diversity of species that could complement the Caldicellulosiruptor pan-CAZome in an engineered strain. The two main challenges of the back-end metabolic engineering are a relative dearth of precise information about the metabolism of these cellulolytic thermophiles and an absence of a drop-in thermophilic liquid fuel production pathway. While we have a significant understanding of the metabolism of Caldicellulosiruptor species, there are still holes in our
37
knowledge that are actively being pursued. A more precise understanding of
Caldicellulosiruptor metabolism is necessary to target gene knockouts to direct metabolic flux away from byproducts to products of interest. While small amounts of alcohols are produced by Caldicellulosiruptor species, it is unclear at this early stage whether metabolism can be significantly diverted to fuels using only the Caldicellulosiruptor metabolism. In the case that it is not, another metabolic engineering strategy is to add all of the components of an alcohol production pathway from another source into the CBP host. However, currently such a thermophilic pathway does not exist at the growth temperatures of
Caldicellulosiruptor species (>70°C). Work continues to develop our understanding and manipulate both the front-end biomass degradation and back-end metabolic flux of the
Caldicellulosiruptor genus.
38
Web Resources
Carbohydrate-Active enZYmes Database www.cazy.org
National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/
NCBI Conserved Domains Search http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
Pfam 27.0 Database http://pfam.sanger.ac.uk/
SignalP4.1 Server http://www.cbs.dtu.dk/services/SignalP/
SEG Prediction of Low Complexity Regions http://mendel.imp.ac.at/METHODS/seg.server.html
Acknowledgments
This work was funded in part by the BioEnergy Science Center, a U.S. Department of
Energy Bioenergy Research Center supported by the Office of Biological and Environmental
Research in the DOE Office of Science.
39
Figure 1.1 - Biomass structure and degradation by selected Caldicellulosiruptor enzymes (A) Cellulose microfibrils of β-1,4-linked D-glucose contain both crystalline (left and right) and amorphous regions (center). Various forms of hemicellulose, xylan (top right, bottom center), mannan (bottom left), and xyloglucan (top left), form a sheath around and crosslink the cellulose microfibrals. Pectin homogalacturonan (top left and bottom center) and branched rhamnogalacturonan I (bottom right) also contribute to the sheathing of the cellulose microfibrals. (B) Selected Caldicellulosiruptor enzymes are shown degrading these biomass polysaccharides.
40
Figure 1.2 - The three biomass degradation mechanisms used by cellulolytic microorganisms Free enzymes contain either a single catalytic domain or one linked to a Carbohydrate Binding Module (CBM) domain. Free enzymes are secreted and disconnected from the cell. The cellulosome structure is shown structured around Clostridium thermocellum scaffoldin CipA, but a variety of other multi-enzyme cellulosome structures exist. Enzyme subunits attach to the scaffoldin by Type I cohesin-dockerin interactions. The scaffoldin attaches to an anchoring scaffoldin by a Type II cohesin-dockerin interaction. The anchoring scaffoldin is attached at the cell surface within the cell’s surface layer (S-layer) by Surface Layer Homology (SLH) domains. Multi-domain proteins contain multiple catalytic and binding domains within the same protein. Some of these multi-domain proteins are anchored to the cell surface within the S-layer by SLH domains. Microorganisms may use one or more of these mechanisms to effectively degrade biomass.
41
Figure 1.3 - Caldicellulosiruptor extracellular pan-CAZome Core extracellular CAZymes are contained in the genomes of all nine sequenced Caldicellulosiruptor species. The additional boxes encompass extracellular CAZymes contained in the genomes of five or more, three or more, two or more, and one or more species, respectively. CAZy domain family numbers are shown for Glycoside Hydrolase (GH), Carbohydrate Binding Module (CBM), Polysaccharide Lyase (PL), and Carbohydrate Esterase (CE) domains. Surface Layer Homology (SLH) domains are also shown, but additional predicted domains are omitted. Extracellular proteins were predicted to contain a signal peptide using SignalP4.1.
42
Figure 1.4 - Caldicellulosiruptor intracellular pan-CAZome Core intracellular CAZymes are contained in the genomes of all nine sequenced Caldicellulosiruptor species. The additional boxes encompass intracellular CAZymes contained in the genomes of five or more, three or more, two or more, and one or more species, respectively. CAZy domain family numbers are shown for Glycoside Hydrolase (GH), Carbohydrate Binding Module (CBM), Polysaccharide Lyase (PL), and Carbohydrate Esterase (CE) domains. The intracellular pan-CAZome contains no Surface Layer Homology (SLH) domain CAZymes. Additional predicted domains are omitted. Intracellular proteins are classified in this category if they are not predicted to contain a signal peptide using SignalP4.1.
43
Table 1.1 - Caldicellulosiruptor Genomes
Accession Genome Species nov. Genome Species Number Size (Mb) Announcement Announcement C. acetigenus ASM42172v1 2.59 (152) N/A C. bescii CP001393 2.93 (153) (154) C. CP002219 2.77 (155) (156) hydrothermalis C. kristjanssonii CP002326 2.8 (157) (156) C. CP002330 2.84 (155) (156) kronotskyensis C. lactoaceticus CP003001 2.62 (158) (156) C. obsidiansis CP002164 2.53 (159) (160) C. owensensis CP002216 2.43 (161) (156) C. CP000679 2.97 (162) (112) saccharolyticus
44
Table 1.2 - Surface Layer Homology domain containing Caldicellulosiruptor proteins
# AA Locus Domains Athe_0594, Calkro_2036, Calhy_2064, H557DRAFT_00803, Csac_0678, Calkr_2007, 755-756 Calla_0352 GH5-CBM28⁰-SLH-SLH-SLH Calow_0459⁰, COB47_0546⁰ 567-568 Calkro_0072, Calhy_0060, COB47_0076 1732 SLH-SLH-SLH-CBM54-GH16-CBM4-CBM4-CBM6- 526 Csac_2549* CBM4-CBM4-CBM4 H557DRAFT_02137 1411 1672-1675 Calkro_0402, Calow_1924
CBM22-CBM22-CBM22-GH10-CBM9-CBM9- GH Calkr_2245^ 2159 (CE15^)-SLH-SLH Calla_0206 1593
Calkro_0111 2435 SLH-SLH-SLH-CBM54-FN3-GH16-FLD-FN3-FN3- FLD-FN3-FN3-GH55-CBM32-CBM32 Calkro_0121* 2229
Calhy_1629 1440 SLH-SLH-SLH-GH43-CBM54
Calhy_2383 2007 CBM35-GH87-FN3-SLH-SLH-SLH
Csac_2722 2593 CBM4-Big4-Coh-CBM4-Big4-GHnc-SLH-SLH-SLH
Calkro_0550, Calow_1771 1789-1790 Big4-PL11-SLH-SLH PL Calow_2109 1711 Big3-Big3-Big3-PL9-SLH-SLH-SLH Calkr_1989*† 326 CBM20-SLH-SLH-CBM20 1097
Calla_0367, Calow_0484, COB47_0564 1088 only Calla_2176 CBM27-CBM27-CBM27-SLH-SLH-SLH
COB47_0167*† 886 CBM CBM Calkr_2289 1495 SLH-SLH-SLH-CBM54 Calla_0162* 480 Athe_0438, Calkro_2198, Calhy_2227, Calkr_2224, Calla_0227, Calow_0272, 1156-1157 SLH-SLH-Big5 COB47_0390, Csac_0207, H557DRAFT_02082 Athe_1943, Calkro_0749, Calhy_0832, Calkr_0758, Calla_1578, Calow_1666, 275-277 SLH-SLH COB47_1753, Csac_1009, H557DRAFT_02335
Athe_2303, Calkro_0333, Calhy_0455, Calla_0125, Calkr_2323, COB47_2069, 1013-1026
Other Csac_2451, H557DRAFT_01166 SLH
Calow_1997 972
Athe_2341, Calkro_0210, Calhy_0185, Calkr_2370, 483-484 SLH-SLH Calla_0076, Calow_2070, COB47_2114, Csac_0591, H557DRAFT_01346
45
Table 1.2 Continued
Athe_2342, Calkro_0209, Calhy_0184, Calkr_2371, Calla_0075, Calow_2071, COB47_2115, 1003-1010 SLH Csac_0590, H557DRAFT_01347
Athe_1839, Calkro_0875, Calkr_0834, Calla_1498, 575-576 SLH-SLH-SLH Calow_1583, COB47_1651, Csac_2381, H557DRAFT_00696 Athe_2295, Calkro_0339, Calhy_0464, Calkr_2316, Calla_0133, COB47_2061, Csac_2445, 1074-1075 SLH-SLH H557DRAFT_01171 Calkr_2463, Calla_2324, Calow_0034, 1774-1779 COB47_0063 SLH-SLH-SLH-Big1-Tglut Athe_0077 1710
Calhy_0047† 1626 Athe_0608, Calkro_2018, Calhy_2046, Calow_0482, 547 SLH-SLH-SLH
COB47_0562 Calkro_0360, Calkr_2450 793-798
Other SLH-SLH-SLH Calow_1967† 683
Calkro_0197, Calkr_2380 1052 SLH-SLH
Calkro_0155, Calow_2111 549 MG2-SLH-SLH-SLH
Calkro_0090, Calhy_0073 664 SLH-SLH-SLH Athe_0012 3027 SLH-SLH-FN3-vWFA-SH3-RHSrep-RHScore Calkro_0014 2994
Calkr_2514, Calla_2376 643 SLH-SLH-SLH Calkr_0022 254 SLH-SLH-SLH Calla_2302* 144 Athe_2223 591 SLH COB47_1200*† 477
Calkr_2453 1153 SLH-Big5
H557DRAFT_02031† 128 SLH-SLH * fragment, † no predicted signal peptide, ^ Calkr_2245 contains an additional CE15 domain, ⁰ Calow_0459 and COB47_0546 have a truncated CBM28 domain
GH: Glycoside Hydrolase, PL: Polysaccharide Lyase, CE: Carbohydrate Esterase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), Big: Bacterial Immunoglobulin (CL0159), FLD: Fascin-like domain (CL0066), Coh: Cohesin-like (CL14606), Tglut: Transglutaminase-like superfamily (pfam01841), MG2: Macroglobulin2 (pfam01835), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696)
46
Table 1.3 - Fibronectin type III-like domain containing Caldicellulosiruptor proteins Locus # AA SignalP Domains Athe_1944, Calhy_0831, Calkro_0748, COB47_1754, 1201- Calow_1667, 1207 Y FN3-FN3 Csac_1008, H557DRAFT_02334 Calla_1579, Calkr_0757 798 Athe_1945, Calhy_0830, Calkro_0747, COB47_1755, 1265 Y FN3-FN3 Calow_1668, Csac_1007, H557DRAFT_02333 Athe_0012 3027 Y SLH-SLH-FN3-vWFA-SH3-RHSrep-RHScore Calkro_0014 2994 Calkro_0111 2435 SLH-SLH-SLH-CBM54-FN3-GH16-FLD-FN3-FN3-FLD- Y Calkro_0121* 2229 FN3-FN3-GH55-CBM32-CBM32 Calkro_0153 983 Y FN3-FN3 Calkro_0113 1240 Y CBM32-CBM32-FN3-GH55-CBM32 Calhy_2383 2007 Y CBM35-GH87-FN3-SLH-SLH-SLH Calow_2113 983 Y FN3 Athe_2354, Calhy_0170, Calkr_2396, Calkro_0181, Calla_0066, 770- N GH3-FN3 COB47_2124, 771 Calow_2083, Csac_0586, H557DRAFT_01368 Csac_1102 750 N GH3-FN3 * fragment GH: Glycoside Hydrolase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), FLD: Fascin-like domain (CL0066), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696)
47
Table 1.4 - C. thermocellum and C. kronotskyensis extracellular CAZy domain inventory comparison Extracellular C. thermocellum C. kronotskyensis CAZy Domains 1 GH3 0 8 GH5 4 1 GH8 0 15 GH9 2 4 GH10 4 2 GH11 1 0 GH13 3 2 GH16 3 3 GH18 0 1 GH23 1
3 GH26 0 2 GH30 0 GH 0 GH39 0 6 GH43 1 2 GH44 1 2 GH48 3 1 GH53 1 0 GH55 2 1 GH74 1 0 GH81 2 0 GH87 1 1 GH94 0 0 GH124 0 2 PL1 1
1 PL3 1
PL 1 PL9 0 1 PL11 2 3 CE1 0 1 CE2 0
1 CE3 0
2 CE4 3 CE 1 CE8 1 1 CE12 0 1 CEnc 0 70 Total 38
48
References
1. Maki, M., Leung, K. T., and Qin, W. (2009) The prospects of cellulase-producing
bacteria for the bioconversion of lignocellulosic biomass. Int J Biol Sci 5, 500-516
2. National Research Council. (2009) Liquid Transportation Fuels from Coal and
Biomass: Technological Status, Costs, and Environmental Impacts, The National
Academies Press, Washington, DC
3. U.S. Department of Energy. (2011) U.S. Billion-Ton Update: Biomass Supply for a
Bioenergy and Bioproducts Industry, ORNL/TM-2011/224, Oak Ridge National
Laboratory, Oak Ridge, TN
4. Sun, Y., and Cheng, J. (2002) Hydrolysis of lignocellulosic materials for ethanol
production: a review. Bioresour Technol 83, 1-11
5. Pauly, M., and Keegstra, K. (2008) Cell-wall carbohydrates and their modification as
a resource for biofuels. Plant J 54, 559-568
6. Pauly, M., and Keegstra, K. (2010) Plant cell wall polymers as precursors for
biofuels. Curr Opin Plant Biol 13, 304-311
7. Cosgrove, D. J. (2005) Growth of the plant cell wall. Nat Rev Mol Cell Biol 6, 850-
861
49
8. Lynd, L. R., Weimer, P. J., van Zyl, W. H., and Pretorius, I. S. (2002) Microbial
cellulose utilization: fundamentals and biotechnology. Microbiol Mol Biol Rev 66,
506-577
9. Scheller, H. V., and Ulvskov, P. (2010) Hemicelluloses. Annu Rev Plant Biol 61, 263-
289
10. Vanholme, R., Morreel, K., Ralph, J., and Boerjan, W. (2008) Lignin engineering.
Curr Opin Plant Biol 11, 278-285
11. Grabber, J. (2005) How Do Lignin Composition, Structure, and Cross-Linking Affect
Degradability? A Review of Cell Wall Model Studies. Crop Sci 45, 820-831
12. Ju, X., Engelhard, M., and Zhang, X. (2013) An advanced understanding of the
specific effects of xylan and surface lignin contents on enzymatic hydrolysis of
lignocellulosic biomass. Bioresour Technol 132, 137-145
13. Lynd, L. R., van Zyl, W. H., McBride, J. E., and Laser, M. (2005) Consolidated
bioprocessing of cellulosic biomass: an update. Curr Opin Biotechnol 16, 577-583
14. Margeot, A., Hahn-Hagerdal, B., Edlund, M., Slade, R., and Monot, F. (2009) New
improvements for lignocellulosic ethanol. Curr Opin Biotechnol 20, 372-380
15. Wu, X., McLaren, J., Madl, R., and Wang, D. (2010) Biofuels from Lignocellulosic
Biomass. in Sustainable Biotechnology (Singh, O. V., and Harvey, S. P. eds.),
Springer Netherlands. pp 19-41
50
16. Ishola, M. M., Jahandideh, A., Haidarian, B., Brandberg, T., and Taherzadeh, M. J.
(2013) Simultaneous saccharification, filtration and fermentation (SSFF): a novel
method for bioethanol production from lignocellulosic biomass. Bioresour Technol
133, 68-73
17. Olson, D. G., McBride, J. E., Shaw, A. J., and Lynd, L. R. (2012) Recent progress in
consolidated bioprocessing. Curr Opin Biotechnol 23, 396-405
18. Yamada, R., Hasunuma, T., and Kondo, A. (2013) Endowing non-cellulolytic
microorganisms with cellulolytic activity aiming for consolidated bioprocessing.
Biotechnol Adv
19. You, C., Zhang, X. Z., Sathitsuksanoh, N., Lynd, L. R., and Zhang, Y. H. (2012)
Enhanced Microbial Utilization of Recalcitrant Cellulose by an Ex Vivo Cellulosome-
Microbe Complex. Appl Environ Microbiol 78, 1437-1444
20. Blumer-Schuette, S. E., Kataeva, I., Westpheling, J., Adams, M. W., and Kelly, R. M.
(2008) Extremely thermophilic microorganisms for biomass conversion: status and
prospects. Curr Opin Biotechnol 19, 210-217
21. Barnard, D., Casanueva, A., Tuffin, M., and Cowan, D. (2010) Extremophiles in
biofuel synthesis. Environ Technol 31, 871-888
51
22. VanFossen, A. L., Lewis, D. L., Nichols, J. D., and Kelly, R. M. (2008)
Polysaccharide degradation and synthesis by extremely thermophilic anaerobes. Ann
N Y Acad Sci 1125, 322-337
23. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,
Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)
Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448
24. Cai, Y., Lai, C., Li, S., Liang, Z., Zhu, M., Liang, S., and Wang, J. (2011) Disruption
of lactate dehydrogenase through homologous recombination to improve bioethanol
production in Thermoanaerobacterium aotearoense. Enzyme Microbial. Tech. 48,
155-161
25. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic
engineering of Caldicellulosiruptor bescii yields increased hydrogen production from
lignocellulosic biomass. Biotechnol. Biofuels 6, 85
26. Tripathi, S. A., Olson, D. G., Argyros, D. A., Miller, B. B., Barrett, T. F., Murphy, D.
M., McCool, J. D., Warner, A. K., Rajgarhia, V. B., Lynd, L. R., Hogsett, D. A., and
Caiazza, N. C. (2010) Development of pyrF-based genetic system for targeted gene
deletion in Clostridium thermocellum and creation of a pta mutant. Appl Environ
Microbiol 76, 6591-6599
52
27. Cantarel, B. L., Coutinho, P. M., Rancurel, C., Bernard, T., Lombard, V., and
Henrissat, B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert
resource for Glycogenomics. Nucleic Acids Res 37, D233-238
28. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M., and Henrissat, B.
(2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids
Res. 42, D490-495
29. Levasseur, A., Drula, E., Lombard, V., Coutinho, P. M., and Henrissat, B. (2013)
Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary
redox enzymes. Biotechnol Biofuels 6, 41
30. Naumoff, D. G. (2011) Hierarchical classification of glycoside hydrolases.
Biochemistry (Mosc) 76, 622-635
31. Boraston, A. B., Bolam, D. N., Gilbert, H. J., and Davies, G. J. (2004) Carbohydrate-
binding modules: fine-tuning polysaccharide recognition. Biochem J 382, 769-781
32. Guillen, D., Sanchez, S., and Rodriguez-Sanoja, R. (2010) Carbohydrate-binding
domains: multiplicity of biological roles. Appl Microbiol Biotechnol 85, 1241-1249
33. McCarter, J. D., and Withers, S. G. (1994) Mechanisms of enzymatic glycoside
hydrolysis. Curr Opin Struct Biol 4, 885-892
34. Vocadlo, D. J., and Davies, G. J. (2008) Mechanistic insights into glycosidase
chemistry. Curr Opin Chem Biol 12, 539-555
53
35. Davies, G. J., Planas, A., and Rovira, C. (2011) Conformational Analyses of the
Reaction Coordinate of Glycosidases. Accounts Chem. Res. 45, 308-316
36. Knott, B. C., Haddad Momeni, M., Crowley, M. F., Mackenzie, L. F., Götz, A. W.,
Sandgren, M., Withers, S. G., Ståhlberg, J., and Beckham, G. T. (2013) The
Mechanism of Cellulose Hydrolysis by a Two-Step, Retaining Cellobiohydrolase
Elucidated by Structural and Transition Path Sampling Studies. J Am Chem Soc 136,
321-329
37. Mayes, H. B., Broadbelt, L. J., and Beckham, G. T. (2013) How Sugars Pucker:
Electronic Structure Calculations Map the Kinetic Landscape of Five Biologically
Paramount Monosaccharides and Their Implications for Enzymatic Catalysis. J Am
Chem Soc
38. Bayer, E. A., Chanzy, H., Lamed, R., and Shoham, Y. (1998) Cellulose, cellulases
and cellulosomes. Curr Opin Struct Biol 8, 548-557
39. Barr, B. K., Hsieh, Y.-L., Ganem, B., and Wilson, D. B. (1996) Identification of Two
Functionally Different Classes of Exocellulases. Biochemistry 35, 586-592
40. Bianchetti, C. M., Elsen, N. L., Fox, B. G., and Phillips, G. N., Jr. (2011) Structure of
cellobiose phosphorylase from Clostridium thermocellum in complex with phosphate.
Acta Crystallogr Sect F Struct Biol Cryst Commun 67, 1345-1349
54
41. Hong, M.-R., Kim, Y.-S., Park, C.-S., Lee, J.-K., Kim, Y.-S., and Oh, D.-K. (2009)
Characterization of a recombinant β-glucosidase from the thermophilic bacterium
Caldicellulosiruptor saccharolyticus. J. Biosci. Bioeng. 108, 36-40
42. Pollet, A., Delcour, J. A., and Courtin, C. M. (2010) Structural determinants of the
substrate specificities of xylanases from different glycoside hydrolase families. Crit
Rev Biotechnol 30, 176-191
43. Janeček, Š., and Blesák, K. (2011) Sequence-Structural Features and Evolutionary
Relationships of Family GH57 α-Amylases and Their Putative α-Amylase-Like
Homologues. The Protein Journal 30, 429-435
44. MacGregor, E. A., Janeček, Š., and Svensson, B. (2001) Relationship of sequence and
structure to specificity in the α-amylase family of enzymes. BBA - Protein Struct M
1546, 1-20
45. Gilbert, H. J., Stålbrand, H., and Brumer, H. (2008) How the walls come crumbling
down: recent structural biochemistry of plant polysaccharide degradation. Curr Opin
Plant Biol 11, 338-348
46. Sørensen, H., Jørgensen, C., Hansen, C., Jørgensen, C., Pedersen, S., and Meyer, A.
(2006) A novel GH43 α-l-arabinofuranosidase from Humicola insolens: mode of
action and synergy with GH51 α-l-arabinofuranosidases on wheat arabinoxylan. Appl
Microbiol Biotechnol 73, 850-861
55
47. Ryabova, O., Vršanská, M., Kaneko, S., van Zyl, W. H., and Biely, P. (2009) A novel
family of hemicellulolytic α-glucuronidase. FEBS Lett 583, 1457-1462
48. Xu, F. (2011) Enzymatic Degradation of Lignocellulosic Biomass. in Biocatalysis for
Green Chemistry and Chemical Process Development, John Wiley & Sons, Inc. pp
361-390
49. Fushinobu, S., Alves, V. D., and Coutinho, P. M. (2013) Multiple rewards from a
treasure trove of novel glycoside hydrolase and polysaccharide lyase structures: new
folds, mechanistic details, and evolutionary relationships. Curr Opin Struct Biol 23,
652-659
50. Mahadevan, S. A., Wi, S. G., Lee, D.-S., and Bae, H.-J. (2008) Site-directed
mutagenesis and CBM engineering of Cel5A (Thermotoga maritima). FEMS
Microbiol. Lett. 287, 205-211
51. Park, J. I., Kent, M. S., Datta, S., Holmes, B. M., Huang, Z., Simmons, B. A., Sale, K.
L., and Sapra, R. (2011) Enzymatic hydrolysis of cellulose by the cellobiohydrolase
domain of CelB from the hyperthermophilic bacterium Caldicellulosiruptor
saccharolyticus. Bioresour Technol 102, 5988-5994
52. Boraston, A. B., Kwan, E., Chiu, P., Warren, R. A., and Kilburn, D. G. (2003)
Recognition and hydrolysis of noncrystalline cellulose. J Biol Chem 278, 6120-6127
56
53. McLean, B. W., Boraston, A. B., Brouwer, D., Sanaie, N., Fyfe, C. A., Warren, R. A.,
Kilburn, D. G., and Haynes, C. A. (2002) Carbohydrate-binding modules recognize
fine substructures of cellulose. J Biol Chem 277, 50245-50254
54. Din, N., Gilkes, N. R., Tekant, B., Miller, R. C., Warren, R. A. J., and Kilburn, D. G.
(1991) Non-hydrolytic Disruption of Cellulose Gibres by the Binding Domain of a
Bacterial Cellulase. Nat Biotech 9, 1096-1099
55. Arantes, V., and Saddler, J. N. (2010) Access to cellulose limits the efficiency of
enzymatic hydrolysis: the role of amorphogenesis. Biotechnol Biofuels 3, 4
56. Reyes-Ortiz, V., Heins, R. A., Cheng, G., Kim, E. Y., Vernon, B. C., Elandt, R. B.,
Adams, P. D., Sale, K. L., Hadi, M. Z., Simmons, B. A., Kent, M. S., and Tullman-
Ercek, D. (2013) Addition of a carbohydrate-binding module enhances cellulase
penetration into cellulose substrates. Biotechnol Biofuels 6, 93
57. Gourlay, K., Arantes, V., and Saddler, J. N. (2012) Use of substructure-specific
carbohydrate binding modules to track changes in cellulose accessibility and surface
morphology during the amorphogenesis step of enzymatic hydrolysis. Biotechnol
Biofuels 5, 51
58. Lehtio, J., Sugiyama, J., Gustavsson, M., Fransson, L., Linder, M., and Teeri, T. T.
(2003) The binding specificity and affinity determinants of family 1 and family 3
cellulose binding modules. Proc Natl Acad Sci U S A 100, 484-489
57
59. Pell, G., Williamson, M. P., Walters, C., Du, H., Gilbert, H. J., and Bolam, D. N.
(2003) Importance of hydrophobic and polar residues in ligand binding in the family
15 carbohydrate-binding module from Cellvibrio japonicus Xyn10C. Biochemistry
42, 9316-9323
60. Lombard, V., Bernard, T., Rancurel, C., Brumer, H., Coutinho, P. M., and Henrissat,
B. (2010) A hierarchical classification of polysaccharide lyases for glycogenomics.
Biochem J 432, 437-444
61. Garron, M. L., and Cygler, M. (2010) Structural and mechanistic classification of
uronic acid-containing polysaccharide lyases. Glycobiology 20, 1547-1573
62. Yadav, S., Yadav, P. K., Yadav, D., and Yadav, K. D. S. (2009) Pectin lyase: A
review. Process Biochem 44, 1-10
63. Yip, V. L. Y., and Withers, S. G. (2004) Nature's many mechanisms for the
degradation of oligosaccharides. Org Biomol Chem 2, 2707-2713
64. Yip, V. L. Y., and Withers, S. G. (2006) Breakdown of oligosaccharides by the
process of elimination. Curr Opin Chem Biol 10, 147-155
65. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.
E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor
bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.
Crystallogr. 69, 534-539
58
66. Williamson, G., Kroon, P. A., and Faulds, C. B. (1998) Hairy plant polysaccharides: a
close shave with microbial esterases. Microbiology 144 ( Pt 8), 2011-2023
67. Biely, P. (2012) Microbial carbohydrate esterases deacetylating plant
polysaccharides. Biotechnol Adv 30, 1575-1588
68. Biely, P., Mackenzie, C. R., Puls, J., and Schneider, H. (1986) Cooperativity of
Esterases and Xylanases in the Enzymatic Degradation of Acetyl Xylan. Bio-
Technology 4, 731-733
69. Faulds, C. B., and Williamson, G. (1995) Release of ferulic acid from wheat bran by
a ferulic acid esterase (FAE-III) from Aspergillus niger. Appl Microbiol Biotechnol
43, 1082-1087
70. Selig, M., Adney, W., Himmel, M., and Decker, S. (2009) The impact of cell wall
acetylation on corn stover hydrolysis by cellulolytic and xylanolytic enzymes.
Cellulose 16, 711-722
71. Pai, C.-K., Wu, Z.-Y., Chen, M.-J., Zeng, Y.-F., Chen, J.-W., Duan, C.-H., Li, M.-L.,
and Liu, J.-R. (2010) Molecular cloning and characterization of a bifunctional
xylanolytic enzyme from Neocallimastix patriciarum. Appl Microbiol Biotechnol 85,
1451-1462
72. Brink, J., and Vries, R. (2011) Fungal enzyme sets for plant polysaccharide
degradation. Appl Microbiol Biotechnol 91, 1477-1492
59
73. Jolie, R. P., Duvetter, T., Van Loey, A. M., and Hendrickx, M. E. (2010) Pectin
methylesterase and its proteinaceous inhibitor: a review. Carbohyd. Res. 345, 2583-
2595
74. Aurilia, V., Parracino, A., and D'Auria, S. (2008) Microbial carbohydrate esterases in
cold adapted environments. Gene 410, 234-240
75. Martínez-Martínez, I., Montoro-García, S., Lozada-Ramírez, J. D., Sánchez-Ferrer,
Á., and García-Carmona, F. (2007) A colorimetric assay for the determination of
acetyl xylan esterase or cephalosporin C acetyl esterase activities using 7-amino
cephalosporanic acid, cephalosporin C, or acetylated xylan as substrate. Anal.
Biochem. 369, 210-217
76. Chothia, C., and Jones, E. Y. (1997) The molecular structure of cell adhesion
molecules. Annu Rev Biochem 66, 823-862
77. Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang,
N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L.,
Eddy, S. R., Bateman, A., and Finn, R. D. (2012) The Pfam protein families database.
Nucleic Acids Res. 40, D290-301
78. Campbell, I. D., and Spitzfaden, C. (1994) Building proteins with fibronectin type III
modules. Structure 2, 333-337
60
79. Little, E., Bork, P., and Doolittle, R. F. (1994) Tracing the spread of fibronectin type
III domains in bacterial glycohydrolases. J Mol Evol 39, 631-643
80. Kataeva, I. A., Seidel, R. D., 3rd, Shah, A., West, L. T., Li, X. L., and Ljungdahl, L.
G. (2002) The fibronectin type 3-like repeat from the Clostridium thermocellum
cellobiohydrolase CbhA promotes hydrolysis of cellulose by modifying its surface.
Appl Environ Microbiol 68, 4292-4300
81. Devillard, E., Goodheart, D. B., Karnati, S. K., Bayer, E. A., Lamed, R., Miron, J.,
Nelson, K. E., and Morrison, M. (2004) Ruminococcus albus 8 mutants defective in
cellulose degradation are deficient in two processive endocellulases, Cel48A and
Cel9B, both of which possess a novel modular architecture. J Bacteriol 186, 136-145
82. Brunecky, R., Alahuhta, M., Bomble, Y. J., Xu, Q., Baker, J. O., Ding, S. Y.,
Himmel, M. E., and Lunin, V. V. (2012) Structure and function of the Clostridium
thermocellum cellobiohydrolase A X1-module repeat: enhancement through
stabilization of the CbhA complex. Acta Crystallogr. D Biol. Crystallogr. 68, 292-
299
83. Fraser, J. S., Yu, Z., Maxwell, K. L., and Davidson, A. R. (2006) Ig-like domains on
bacteriophages: a tale of promiscuity and deceit. J Mol Biol 359, 496-507
84. Schubot, F. D., Kataeva, I. A., Chang, J., Shah, A. K., Ljungdahl, L. G., Rose, J. P.,
and Wang, B. C. (2004) Structural basis for the exocellulase activity of the
61
cellobiohydrolase CbhA from Clostridium thermocellum. Biochemistry 43, 1163-
1170
85. Alahuhta, M., Xu, Q., Bomble, Y. J., Brunecky, R., Adney, W. S., Ding, S. Y.,
Himmel, M. E., and Lunin, V. V. (2010) The unique binding mode of cellulosomal
CBM4 from Clostridium thermocellum cellobiohydrolase A. J Mol Biol 402, 374-387
86. Levisson, M., Sun, L., Hendriks, S., Swinkels, P., Akveld, T., Bultema, J. B.,
Barendregt, A., van den Heuvel, R. H., Dijkstra, B. W., van der Oost, J., and Kengen,
S. W. (2009) Crystal structure and biochemical properties of a novel thermostable
esterase containing an immunoglobulin-like domain. J Mol Biol 385, 949-962
87. Sara, M., and Sleytr, U. B. (2000) S-Layer proteins. J. Bacteriol. 182, 859-868
88. Scott, J. R., and Barnett, T. C. (2006) Surface proteins of gram-positive bacteria and
how they get there. Annu Rev Microbiol 60, 397-423
89. Pum, D., Toca-Herrera, J. L., and Sleytr, U. B. (2013) S-layer protein self-assembly.
Int J Mol Sci 14, 2484-2501
90. Engelhardt, H., and Peters, J. (1998) Structural research on surface layers: a focus on
stability, surface layer homology domains, and surface layer-cell wall interactions. J
Struct Biol 124, 276-302
91. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,
Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,
62
Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,
R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal
determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.
Bacteriol. 194, 4015-4028
92. Kosugi, A., Murashima, K., Tamaru, Y., and Doi, R. H. (2002) Cell-surface-
anchoring role of N-terminal surface layer homology domains of Clostridium
cellulovorans EngE. J Bacteriol 184, 884-888
93. Ali, M. K., Kimura, T., Sakka, K., and Ohmiya, K. (2001) The multidomain xylanase
Xyn10B as a cellulose-binding protein in Clostridium stercorarium. FEMS
Microbiol. Lett. 198, 79-83
94. Cosgrove, D. J., Li, L. C., Cho, H.-T., Hoffmann-Benning, S., Moore, R. C., and
Blecker, D. (2002) The Growing World of Expansins. Plant Cell Physiol. 43, 1436-
1444
95. Cosgrove, D. J. (2000) Loosening of plant cell walls by expansins. Nature 407, 321-
326
96. Sampedro, J., and Cosgrove, D. J. (2005) The expansin superfamily. Genome Biol 6,
242
97. Georgelis, N., Nikolaidis, N., and Cosgrove, D. J. (2014) Biochemical analysis of
expansin-like proteins from microbes. Carbohyd. Polym. 100, 17-23
63
98. Kerff, F., Amoroso, A., Herman, R., Sauvage, E., Petrella, S., Filee, P., Charlier, P.,
Joris, B., Tabuchi, A., Nikolaidis, N., and Cosgrove, D. J. (2008) Crystal structure
and activity of Bacillus subtilis YoaJ (EXLX1), a bacterial expansin that promotes
root colonization. Proc Natl Acad Sci U S A 105, 16876-16881
99. Saloheimo, M., Paloheimo, M., Hakola, S., Pere, J., Swanson, B., Nyyssönen, E.,
Bhatia, A., Ward, M., and Penttilä, M. (2002) Swollenin, a Trichoderma reesei
protein with sequence similarity to the plant expansins, exhibits disruption activity on
cellulosic materials. Eur. J. Biochem. 269, 4202-4211
100. Kang, K., Wang, S., Lai, G., Liu, G., and Xing, M. (2013) Characterization of a novel
swollenin from Penicillium oxalicum in facilitating enzymatic saccharification of
cellulose. BMC Biotechnol 13, 42
101. Jager, G., Girfoglio, M., Dollo, F., Rinaldi, R., Bongard, H., Commandeur, U.,
Fischer, R., Spiess, A. C., and Buchs, J. (2011) How recombinant swollenin from
Kluyveromyces lactis affects cellulosic substrates and accelerates their hydrolysis.
Biotechnol Biofuels 4, 33
102. Vodovnik, M., and Marinsek-Logar, R. (2010) Cellulosomes - Promising
Supramolecular Machines of Anaerobic Cellulolytic Microorganisms. Acta Chim Slov
57, 767-774
64
103. Mazzoli, R., Lamberti, C., and Pessione, E. (2012) Engineering new metabolic
capabilities in bacteria: lessons from recombinant cellulolytic strategies. Trends
Biotechnol 30, 111-119
104. Wilson, D. B. (2011) Microbial diversity of cellulose hydrolysis. Curr Opin
Microbiol 14, 259-263
105. Bayer, E. A., Lamed, R., White, B. A., and Flint, H. J. (2008) From cellulosomes to
cellulosomics. Chem Rec 8, 364-377
106. Resch, M. G., Donohoe, B. S., Baker, J. O., Decker, S. R., Bayer, E. A., Beckham, G.
T., and Himmel, M. E. (2013) Fungal cellulases and complexed cellulosomal
enzymes exhibit synergistic mechanisms in cellulose deconstruction. Energy Environ.
Sci. 6, 1858-1867
107. Fontes, C. M., and Gilbert, H. J. (2010) Cellulosomes: highly efficient nanomachines
designed to deconstruct plant cell wall complex carbohydrates. Annu. Rev. Biochem.
79, 655-681
108. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside
hydrolase inventory drives plant polysaccharide deconstruction by the extremely
thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.
108, 1559-1569
65
109. Khandeparker, R., and Numan, M. T. (2008) Bifunctional xylanases and their
potential use in biotechnology. J Ind Microbiol Biotechnol 35, 635-644
110. Su, X., Mackie, R. I., and Cann, I. K. (2012) Biochemical and mutational analyses of
a multidomain cellulase/mannanase from Caldicellulosiruptor bescii. Appl. Environ.
Microbiol. 78, 2230-2240
111. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,
microbiological, and glycoside hydrolase diversities within the extremely
thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.
Microbiol. 76, 8084-8092
112. van de Werken, H. J., Verhaart, M. R., VanFossen, A. L., Willquist, K., Lewis, D. L.,
Nichols, J. D., Goorissen, H. P., Mongodin, E. F., Nelson, K. E., van Niel, E. W.,
Stams, A. J., Ward, D. E., de Vos, W. M., van der Oost, J., Kelly, R. M., and Kengen,
S. W. (2008) Hydrogenomics of the extremely thermophilic bacterium
Caldicellulosiruptor saccharolyticus. Appl. Environ. Microbiol. 74, 6720-6729
113. VanFossen, A. L., Verhaart, M. R., Kengen, S. M., and Kelly, R. M. (2009)
Carbohydrate utilization patterns for the extremely thermophilic bacterium
Caldicellulosiruptor saccharolyticus reveal broad growth substrate preferences. Appl.
Environ. Microbiol. 75, 7718-7724
114. Basen, M., Rhaesa, A. M., Kataeva, I., Prybol, C. J., Scott, I. M., Poole, F. L., and
Adams, M. W. W. (2014) Degradation of high loads of crystalline cellulose and of
66
unpretreated plant biomass by the thermophilic bacterium Caldicellulosiruptor bescii.
Bioresour Technol 152, 384-392
115. Kataeva, I., Foston, M. B., Yang, S.-J., Pattathil, S., Biswal, A. K., Poole Ii, F. L.,
Basen, M., Rhaesa, A. M., Thomas, T. P., Azadi, P., Olman, V., Saffold, T. D.,
Mohler, K. E., Lewis, D. L., Doeppke, C., Zeng, Y., Tschaplinski, T. J., York, W. S.,
Davis, M., Mohnen, D., Xu, Y., Ragauskas, A. J., Ding, S.-Y., Kelly, R. M., Hahn,
M. G., and Adams, M. W. W. (2013) Carbohydrate and lignin are simultaneously
solubilized from unpretreated switchgrass by microbial action at high temperature.
Energy Environ. Sci. 6, 2186-2195
116. Zurawski, J. V., Blumer-Schuette, S. E., Conway, J. M., and Kelly, R. M. (2014) The
Extremely Thermophilic Genus Caldicellulosiruptor: Physiological and Genomic
Characteristics for Complex Carbohydrate Conversion to Molecular Hydrogen. in
Microbial BioEnergy: Hydrogen Production, Advances in Photosynthesis and
Respiration (De Philippis, R., and Zannoni, D. eds.), Springer Science. pp 177-195
117. Lochner, A., Giannone, R. J., Rodriguez, M., Jr., Shah, M. B., Mielenz, J. R., Keller,
M., Antranikian, G., Graham, D. E., and Hettich, R. L. (2011) Use of label-free
quantitative proteomics to distinguish the secreted cellulolytic systems of
Caldicellulosiruptor bescii and Caldicellulosiruptor obsidiansis. Appl. Environ.
Microbiol. 77, 4042-4054
67
118. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and
Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from
the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,
333-340
119. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,
H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic
bacteria. FEMS Microbiol. Ecol. 28, 99-110
120. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,
Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)
Thermophilic lignocellulose deconstruction. FEMS Microbiology Reviews
121. Petersen, T. N., Brunak, S., von Heijne, G., and Nielsen, H. (2011) SignalP 4.0:
discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786
122. Zverlov, V., Mahr, S., Riedel, K., and Bronnenmeier, K. (1998) Properties and gene
structure of a bifunctional cellulolytic enzyme (CelA) from the extreme thermophile
'Anaerocellum thermophilum' with separate glycosyl hydrolase family 9 and 48
catalytic domains. Microbiology 144 ( Pt 2), 457-465
123. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,
Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and
Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion
Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516
68
124. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and
biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase
by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,
e84172
125. Morris, D. D., Gibbs, M. D., Ford, M., Thomas, J., and Bergquist, P. L. (1999)
Family 10 and 11 xylanase genes from Caldicellulosiruptor sp. strain Rt69B.1.
Extremophiles 3, 103-111
126. Dwivedi, P. P., Gibbs, M. D., Saul, D. J., and Bergquist, P. L. (1996) Cloning,
sequencing and overexpression in Escherichia coli of a xylanase gene, xynA from the
thermophilic bacterium Rt8B.4 genus Caldicellulosiruptor. Appl Microbiol
Biotechnol 45, 86-93
127. Gibbs, M. D., Elinder, A. U., Reeves, R. A., and Bergquist, P. L. (1996) Sequencing,
cloning and expression of a β-1,4-mannanase gene, manA, from the extremely
thermophilic anaerobic bacterium, Caldicellulosiruptor Rt8B.4. FEMS Microbiology
Letters 141, 37-43
128. Sunna, A. (2010) Modular organisation and functional analysis of dissected modular
β-mannanase CsMan26 from Caldicellulosiruptor Rt8B.4. Appl Microbiol Biotechnol
86, 189-200
129. Payne, C. M., Resch, M. G., Chen, L., Crowley, M. F., Himmel, M. E., Taylor, L. E.,
2nd, Sandgren, M., Stahlberg, J., Stals, I., Tan, Z., and Beckham, G. T. (2013)
69
Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically
bind to cellulose. Proc. Natl. Acad. Sci. U.S.A. 110, 14646-14651
130. Wootton, J. C. (1994) Non-globular domains in protein sequences: Automated
segmentation using complexity measures. Comput. Chem. 18, 269-285
131. Denman, S., Xue, G. P., and Patel, B. (1996) Characterization of a Neocallimastix
patriciarum cellulase cDNA (celA) homologous to Trichoderma reesei
cellobiohydrolase II. Appl Environ Microbiol 62, 1889-1896
132. Receveur, V., Czjzek, M., Schülein, M., Panine, P., and Henrissat, B. (2002)
Dimension, Shape, and Conformational Flexibility of a Two Domain Fungal
Cellulase in Solution Probed by Small Angle X-ray Scattering. J. Biol. Chem. 277,
40887-40892
133. von Ossowski, I., Eaton, J. T., Czjzek, M., Perkins, S. J., Frandsen, T. P., Schulein,
M., Panine, P., Henrissat, B., and Receveur-Brechot, V. (2005) Protein disorder:
conformational distribution of the flexible linker in a chimeric double cellulase.
Biophys J 88, 2823-2832
134. Stals, I., Sandra, K., Geysens, S., Contreras, R., Van Beeumen, J., and Claeyssens, M.
(2004) Factors influencing glycosylation of Trichoderma reesei cellulases. I:
Postsecretorial changes of the O- and N-glycosylation pattern of Cel7A. Glycobiology
14, 713-724
70
135. Kanafusa-Shinkai, S., Wakayama, J. i., Tsukamoto, K., Hayashi, N., Miyazaki, Y.,
Ohmori, H., Tajima, K., and Yokoyama, H. (2013) Degradation of microcrystalline
cellulose and non-pretreated plant biomass by a cell-free extracellular
cellulase/hemicellulase system from the extreme thermophilic bacterium
Caldicellulosiruptor bescii. J. Biosci. Bioeng. 115, 64-70
136. Sabathe, F., and Soucaille, P. (2003) Characterization of the CipA scaffolding protein
and in vivo production of a minicellulosome in Clostridium acetobutylicum. J
Bacteriol 185, 1092-1096
137. Upreti, R. K., Kumar, M., and Shankar, V. (2003) Bacterial glycoproteins: functions,
biosynthesis and applications. Proteomics 3, 363-379
138. Lochner, A., Giannone, R. J., Keller, M., Antranikian, G., Graham, D. E., and
Hettich, R. L. (2011) Label-free quantitative proteomics for the extremely
thermophilic bacterium Caldicellulosiruptor obsidiansis reveal distinct abundance
patterns upon growth on cellobiose, crystalline cellulose, and switchgrass. J.
Proteome Res. 10, 5302-5314
139. Dam, P., Kataeva, I., Yang, S. J., Zhou, F., Yin, Y., Chou, W., Poole, F. L., 2nd,
Westpheling, J., Hettich, R., Giannone, R., Lewis, D. L., Kelly, R., Gilbert, H. J.,
Henrissat, B., Xu, Y., and Adams, M. W. (2011) Insights into plant biomass
conversion from the genome of the anaerobic thermophilic bacterium
Caldicellulosiruptor bescii DSM 6725. Nucleic Acids Res. 39, 3240-3254
71
140. Handelsman, J. (2004) Metagenomics: application of genomics to uncultured
microorganisms. Microbiol Mol Biol Rev 68, 669-685
141. Sleator, R. D., Shortall, C., and Hill, C. (2008) Metagenomics. Lett Appl Microbiol
47, 361-366
142. El Kaoutari, A., Armougom, F., Gordon, J. I., Raoult, D., and Henrissat, B. (2013)
The abundance and variety of carbohydrate-active enzymes in the human gut
microbiota. Nat Rev Microbiol 11, 497-504
143. Li, L. L., Taghavi, S., McCorkle, S. M., Zhang, Y. B., Blewitt, M. G., Brunecky, R.,
Adney, W. S., Himmel, M. E., Brumm, P., Drinkwater, C., Mead, D. A., Tringe, S.
G., and Lelie, D. (2011) Bioprospecting metagenomics of decaying wood: mining for
new glycoside hydrolases. Biotechnol Biofuels 4, 23
144. Duan, C. J., Xian, L., Zhao, G. C., Feng, Y., Pang, H., Bai, X. L., Tang, J. L., Ma, Q.
S., and Feng, J. X. (2009) Isolation and partial characterization of novel genes
encoding acidic cellulases from metagenomes of buffalo rumens. J Appl Microbiol
107, 245-256
145. Nacke, H., Engelhaupt, M., Brady, S., Fischer, C., Tautzt, J., and Daniel, R. (2012)
Identification and characterization of novel cellulolytic and hemicellulolytic genes
and enzymes derived from German grassland soil metagenomes. Biotechnol Lett 34,
663-675
72
146. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier
to DNA transformation in Caldicellulosiruptor species results in efficient marker
replacement. Biotechnol. Biofuels 6, 82
147. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)
Improved growth media and culture techniques for genetic analysis and assessment of
biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,
41-49
148. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable
replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic
methodologies to other members of this genus. PLoS One 8, e62881
149. Chung, D., Farkas, J., and Westpheling, J. (2013) Detection of a novel active
transposable element in Caldicellulosiruptor hydrothermalis and a new search for
elements in this genus. J Ind Microbiol Biotechnol 40, 517-521
150. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)
Methylation by a unique alpha-class N4-cytosine methyltransferase is required for
DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844
151. Chung, D. H., Huddleston, J. R., Farkas, J., and Westpheling, J. (2011) Identification
and characterization of CbeI, a novel thermostable restriction enzyme from
Caldicellulosiruptor bescii DSM 6725 and a member of a new subfamily of HaeIII-
like enzymes. J Ind Microbiol Biotechnol 38, 1867-1877
73
152. Onyenwoke, R. U., Lee, Y.-J., Dabrowski, S., Ahring, B. K., and Wiegel, J. (2006)
Reclassification of Thermoanaerobium acetigenum as Caldicellulosiruptor
acetigenus comb. nov. and emendation of the genus description. Int J Syst Evol
Microbiol 56, 1391-1395
153. Yang, S. J., Kataeva, I., Wiegel, J., Yin, Y., Dam, P., Xu, Y., Westpheling, J., and
Adams, M. W. (2010) Classification of 'Anaerocellum thermophilum' strain DSM
6725 as Caldicellulosiruptor bescii sp. nov. Int J Syst Evol Microbiol 60, 2011-2015
154. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.
C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and
Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and
cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,
3760-3761
155. Miroshnichenko, M. L., Kublanov, I. V., Kostrikina, N. A., Tourova, T. P.,
Kolganova, T. V., Birkeland, N.-K., and Bonch-Osmolovskaya, E. A. (2008)
Caldicellulosiruptor kronotskyensis sp. nov. and Caldicellulosiruptor hydrothermalis
sp. nov., two extremely thermophilic, cellulolytic, anaerobic bacteria from
Kamchatka thermal springs. Int J Syst Evol Microbiol 58, 1492-1496
156. Blumer-Schuette, S. E., Ozdemir, I., Mistry, D., Lucas, S., Lapidus, A., Cheng, J.-F.,
Goodwin, L. A., Pitluck, S., Land, M. L., Hauser, L. J., Woyke, T., Mikhailova, N.,
Pati, A., Kyrpides, N. C., Ivanova, N., Detter, J. C., Walston-Davenport, K., Han, S.,
74
Adams, M. W. W., and Kelly, R. M. (2011) Complete genome sequences for the
anaerobic, extremely thermophilic plant biomass-degrading bacteria
Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor kristjanssonii,
Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor owensensis, and
Caldicellulosiruptor lactoaceticus. J. Bacteriol. 193, 1483-1484
157. Bredholt, S., Sonne-Hansen, J., Nielsen, P., Mathrani, I. M., and Ahring, B. K. (1999)
Caldicellulosiruptor kristjanssonii sp. nov., a cellulolytic, extremely thermophilic,
anaerobic bacterium. Int J Syst Bacteriol 49 Pt 3, 991-996
158. Mladenovska, Z., Mathrani, I., and Ahring, B. (1995) Isolation and characterization
of Caldicellulosiruptor lactoaceticus sp. nov., an extremely thermophilic, cellulolytic,
anaerobic bacterium. Arch. Microbiol. 163, 223-230
159. Hamilton-Brehm, S. D., Mosher, J. J., Vishnivetskaya, T., Podar, M., Carroll, S.,
Allman, S., Phelps, T. J., Keller, M., and Elkins, J. G. (2010) Caldicellulosiruptor
obsidiansis sp. nov., an Anaerobic, Extremely Thermophilic, Cellulolytic Bacterium
Isolated from Obsidian Pool, Yellowstone National Park. Appl Environ Microbiol 76,
1014-1020
160. Elkins, J. G., Lochner, A., Hamilton-Brehm, S. D., Davenport, K. W., Podar, M.,
Brown, S. D., Land, M. L., Hauser, L. J., Klingeman, D. M., Raman, B., Goodwin, L.
A., Tapia, R., Meincke, L. J., Detter, J. C., Bruce, D. C., Han, C. S., Palumbo, A. V.,
Cottingham, R. W., Keller, M., and Graham, D. E. (2010) Complete genome
75
sequence of the cellulolytic thermophile Caldicellulosiruptor obsidiansis OB47T. J.
Bacteriol. 192, 6099-6100
161. Huang, C. Y., Patel, B. K., Mah, R. A., and Baresi, L. (1998) Caldicellulosiruptor
owensensis sp. nov., an anaerobic, extremely thermophilic, xylanolytic bacterium. Int
J Syst Bacteriol 48 Pt 1, 91-97
162. Rainey, F. A., Donnison, A. M., Janssen, P. H., Saul, D., Rodrigo, A., Bergquist, P.
L., Daniel, R. M., Stackebrandt, E., and Morgan, H. W. (1994) Description of
Caldicellulosiruptor saccharolyticus gen. nov., sp. nov: An obligately anaerobic,
extremely thermophilic, cellulolytic bacterium. FEMS Microbiol Lett 120, 263-266
76
CHAPTER 2
Multi-Domain, Surface Layer Associated Glycoside Hydrolases Contribute to Plant Polysaccharide Degradation by Caldicellulosiruptor species
Jonathan M. Conway, William S. Pierce, Jaycee H. Le, George W. Harper, John H. Wright, Allyson L. Tucker, Jeffrey V. Zurawski, Laura L. Lee, Sara E. Blumer-Schuette, and Robert M. Kelly
Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
Published in: J. Biol. Chem. 291, 6732-6747 (March 2016)
77
Abstract
The genome of the extremely thermophilic bacterium, Caldicellulosiruptor kronotskyensis, encodes 19 S-layer homology (SLH) domain-containing proteins, the most in any Caldicellulosiruptor species genome sequenced to date. These SLH proteins include five glycoside hydrolases (GHs) and one polysaccharide lyase, the genes for which were transcribed at high levels during growth on plant biomass. The largest GH identified so far in this genus, Calkro_0111 (2,435 amino acids), is completely unique to C. kronotskyensis and contains SLH domains. Calkro_0111 was produced recombinantly in Escherichia coli as two pieces, containing the GH16 and GH55 domains, respectively, as well as putative binding and spacer domains. These displayed endo- and exo-glucanase activity on the β-1,3-1,6- glucan laminarin. A series of additional truncation mutants of Calkro_0111 revealed the essential architectural features required for catalytic function. Calkro_0402, another of the
SLH-domain GHs in C. kronotskyensis, when produced in E. coli, was active on a variety of xylans and β-glucans. Unlike Calkro_0111, Calkro_0402 is highly conserved in the genus
Caldicellulosiruptor and among other biomass degrading Firmicutes, but missing from
Caldicellulosiruptor bescii. As such, the gene encoding Calkro_0402 was inserted into the C. bescii genome, creating a mutant strain with its S-layer extensively decorated with
Calkro_0402. This strain consequently degraded xylans more extensively than wild-type C. bescii. The results here provide new insights into the architecture and role of SLH domain
GHs and demonstrate that hemicellulose degradation can be enhanced through a non-native
SLH domain GHs engineered into the genomes of Caldicellulosiruptor species.
78
Introduction
Many bacteria (1,2) and archaea (3) produce a two-dimensional, para-crystalline array of protein that covers the outside of the cell, referred to as a surface layer (S-layer). In bacteria, the majority of S-layer proteins are non-covalently associated with the bacterial cell surface via specialized domains at their amino- or carboxy-terminus, typically from one of three distinct, but analogous, domain categories: Surface Layer Homology (SLH) (4,5), Cell
Wall Binding repeat 2 (CWB2) (6), or Noncovalent Attachment Domain (NCAD) (7,8). In archaea, S-layer proteins are most often attached to the cell membrane. S-layer proteins can be anchored to the membrane via a C-terminal transmembrane helix domain (3,9), or covalently attached to cell membrane lipid via an archaeosortase, as shown in Haloferax volcanii (10,11). In most microorganisms, the S-layer is comprised entirely of one or two proteins (SLPs), which self- assemble. In addition to these SLPs, some Firmicutes produce a family of proteins that also contain SLH, CWB2, or NCAD domains, and as such, are predicted to be associated with the S-layer (12). SLPs and S-layer associated proteins can also contain domains with specialized functions allowing the S-layer to play a role in adherence and biofilm formation (13-16), biogenesis of the cell envelope (17), cell division
(18), motility (19,20), and, of interest here, plant cell wall degradation (21,22).
Many lignocellulose-degrading bacteria utilize SLH domain-containing proteins to display plant biomass degrading Carbohydrate Active enZYmes (CAZymes) (23) on the cell surface. In cellulosomal bacteria, SLH domain proteins attached to the cell surface interact with the scaffoldin protein of the multi-enzyme cellulosome complex to anchor it to the cell surface (24-26). In many non-cellulosomal, biomass-degrading bacteria, multi-domain
79
CAZymes containing SLH proteins are implicated in lignocellulose deconstruction. About
5% of the 20,710 predicted SLH domain proteins, represented in 1889 genomes from the
Integrated Microbial Genomes database (27), contain at least one identifiable CAZyme- related component: glycoside hydrolase (GH), polysaccharide lyase (PL), carbohydrate esterase (CE), or carbohydrate binding module (CBM).
Multi-domain CAZymes containing SLH domains have been characterized from
Caldanaerobius polysaccharolyticus (28,29), Bacillus sp. (30), Paenibacillus sp. (31-36),
Clostridium sp. (37-44), Thermoanaerobacterium sp. (45-51), and Caldicellulosiruptor sp.
(21,52). Across these species, the number of identifiable CAZyme domain-containing SLH proteins and total predicted SLH proteins vary significantly. Examples include: Paenibacillus sp. strain JDR-2 (23 CAZyme domain containing of 78 total SLH proteins), Clostridium thermocellum ATCC 27405 (4 of 25), Thermoanaerobacterium saccharolyticum strain
JW/SL-YS485 DSM 8691 (7 of 12), Caldicellulosiruptor bescii (1 of 12), and
Caldicellulosiruptor kronotskyensis (6 of 19).
The genus Caldicellulosiruptor is comprised of extremely thermophilic, gram positive, anaerobic bacteria that are able to attach to, and degrade, the variety of polysaccharides found in lignocellulosic biomass. Currently, the genomes of 12 members of the genus have been sequenced, representing species isolated from globally distributed terrestrial hot springs (53-57). Caldicellulosiruptor species produce many extracellular
CAZymes, including some which have SLH domains (58). The enzyme inventory produced by individual species, and thus the capacity to degrade plant biomass polysaccharides, varies significantly across the genus (59,60). C. kronotskyensis has the largest inventory of
80
CAZymes of any Caldicellulosiruptor species, with 31 CAZymes compared to 20 CAZymes in C. bescii, which is to date the most studied member of the genus (61).
During growth on complex carbohydrates, Caldicellulosiruptor species physically associate with the substrate. This attachment is mediated in part by non-catalytic cellulose binding proteins, called tāpirins, that were recently identified and characterized (62). In addition, proteins anchored to the cell surface within the S-layer via SLH domains are also believed to play a role in the attachment of Caldicellulosiruptor species to insoluble plant biomass substrates (21,63). Two such SLH domain proteins from C. saccharolyticus,
Csac_0678 and Csac_2722, were shown to bind to insoluble substrates, and could be identified in the S-layer protein cell fraction, implying a role in cell-substrate attachment
(21). However, more information is needed to understand the role of SLH domain proteins in plant biomass deconstruction, especially how their function relates to other CAZymes produced for this purpose. In this study, the localization, biochemical characteristics and physiological role of two SLH domain enzymes from C. kronotskyensis, xylanase
Calkro_0402 and laminarinase Calkro_0111, were examined from this perspective.
Materials and Methods
Bacterial Strains, Plasmids and Reagents
Cloning and expression of recombinant proteins used various E. coli strains:
NovaBlue GigaSingles™ (EMD Millipore), Rosetta™ 2(DE3) Singles™ (EMD Millipore),
NEB 10-beta electrocompetent E. coli (New England Biolabs), and Arctic Express (DE3)RIL
81
E. coli (Agilent Technologies). Axenic strains of Caldicellulosiruptor bescii and
Caldicellulosiruptor kronotskyensis were obtained from the Leibniz Institute DSMZ-German
Collection of Microorganisms and Cell Cultures. Caldicellulosiruptor bescii strain
JWCB018 (64) and non-replicating vector pDCW121 (65) were obtained from J.
Westpheling (University of Georgia, Athens, Georgia). Genes of interest were amplified by polymerase chain reaction (PCR) and cloned using the pET46 Ek/LIC vector kit (EMD
Millipore) for protein expression. All other vectors were constructed using one-step isothermal assembly of overlapping dsDNA, as described previously (66). Plasmids were isolated using Qiaprep miniprep kits (Qiagen) and plasmid sequences confirmed by sequencing (Genewiz). Oligonucleotide primer sequences used are listed in Table 1.
Carbohydrates and biomasses used included: laminarin from Laminaria digitata (Sigma
L9634), xylan from birchwood (Sigma X0502), xylan from oat spelts (Sigma X0627), and dilute acid pretreated Populus trichocarpa X Populus deltoides (National Renewable Energy
Laboratory, Golden, CO).
Growth Media and Culture Conditions
All E. coli cultures were maintained in Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or LB medium with 1.5% (w/v) agar plates supplemented with 50 μg/ml carbenicillin (Corning cellgro), 34 μg/ml chloramphenicol
(Sigma), 20 μg/ml gentamycin (Chem-Impex International Inc.), 50 μg/ml streptomycin
(Sigma), and/or 50 μg/ml apramycin (Fisher Scientific), as appropriate. Protein expression was performed in either Terrific Broth (TB) medium (containing 12 g/L tryptone, 24 g/L
82
yeast extract, 4 mL/L glycerol, 2.31 g/L KH2PO4, and 12.54 g/L K2HPO4) or ZYM-5052 medium (67) modified to contain 0.625% glycerol, 0.0625% glucose, and 0.25% lactose.
Caldicellulosiruptor strains were grown anaerobically at 70°C in modified DSM640 medium (DSMZ [http://www.dsmz.de]), containing 0.9 g/L NH4Cl, 0.9 g/L NaCl, 0.4 g/L
MgCl2 ● 6H2O, 0.75 g/L KH2PO4, 1.5 g/L K2HPO4, 1 g/L yeast extract, 1mL/L trace element solution SL-10, 5 g/L cellobiose, 0.5 mg/L resazurin, and 0.75 g/L L-Cysteine-HCl ● H20).
Caldicellulosiruptor genomic DNA (gDNA) for cloning and vector construction was extracted, as described previously (68). For genetic manipulations of C. bescii, strains were grown anaerobically at 70°C in low osmolality defined (LOD) or low osmolality complex
(LOC) media (69). C. bescii was plated embedded in LOD medium with 1.5% agar and grown at 70°C in anaerobic tanks with N2 headspace. For solubilization experiments and immunofluorescence microscopy, Caldicellulosiruptor strains were grown on a modified defined version of DSMZ671 medium (671d medium), described previously (70), with 5 g/L of the specific biomass substrate as the carbon source. For growth of all uracil auxotroph mutants, growth medium was supplemented with 40 mM uracil.
Expression of Recombinant Proteins in E. coli
All Calkro_0111 Truncation Mutants (TMs) and Calkro_0402 TM1 were cloned into the pET46 Ek/LIC vector maintained in NovaBlue E. coli. Sequence confirmed plasmids were transformed into Rosetta 2(DE3) E. coli for protein expression. Proteins were expressed in modified ZYM-5052 auto-induction medium at 37°C 250 rpm. Cells were harvested 18-24 hours after induction by centrifugation at 6000 x g for 10 minutes. For the
83
expression of Calkro_0402, the gene was cloned into the pET46 Ek/LIC vector without the predicted signal peptide, as determined by the SignalP 4.1 Server (71), and maintained in
NEB 10-beta E. coli. The sequence confirmed plasmid was transformed into Arctic Express
(DE3)RIL E. coli for protein expression. Expression was performed in TB media by growing cells at 30°C and 250 rpm until optical density at 600 nm (OD600) reached 1.0 followed by induction at 13°C with 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside). Cells were harvested 16-20 hours later by centrifugation at 6000 x g for 10 minutes. All cell pellets were frozen at -20°C prior to purification.
Purification of Recombinant Protein and Truncation Mutants
Cell pellets were re-suspended in 5 ml of 20 mM sodium phosphate pH 7.4, 0.5 M
NaCl, 20 mM imidazole, 0.1% (v/v) Nonidet P40, and 1 mg/ml lysozyme per gram of wet cell pellet. Cells were lysed in a French Press at 16,000 psi. Cell lysates were heat treated at
65°C for 20 min (except for the full length Calkro_0402 protein product which was not heat treated) to remove heat labile proteins from E. coli. The cell lysate was then centrifuged at
25,000 x g for 20-60 minutes and the supernatant was passed through a 0.22 µm filter to prepare cell free cell extract. Proteins were purified using 1 mL or 5 mL HisTrap HP Ni
Sepharose (GE Life Sciences) immobilized metal affinity chromatography (IMAC) columns, operated according to the manufacturer instructions. For full length Calkro_0402 and
Calkro_0111 TM8, size exclusion chromatography (SEC) was used after IMAC on a HiLoad
26/600 Superdex 200 pg column (GE Life Sciences) in 50 mM sodium phosphate, 150 mM
NaCl, pH 7.2 buffer. All chromatography steps were performed on a Biologic DuoFlow
84
FPLC (BioRad). Purity of the recombinant proteins was evaluated by SDS-PAGE using
NuPAGE Novex 4-12% Bis-Tris protein gels (Life Technologies) or 4-15% Mini-PROTEAN
TGX gels (Bio-Rad) with Benchmark protein ladder (Life Technologies), and staining by
GelCode Blue Safe Protein Stain (Thermo Scientific). The concentration of purified proteins was determined using the bicinchoninic acid assay (Pierce, Thermo Fisher Scientific).
Optimal pH and Temperature Determination
The pH optimum for each recombinant enzyme was determined at 70°C in buffers at pH 2.5-10 (50 mM citrate buffer pH 2.5 and 3.5; 50 mM sodium acetate buffer pH 4.5, 5, 5.5, and 6; 50 mM sodium phosphate pH 6.5, 7, and 8; 50 mM sodium bicarbonate pH 9.2 and
10). The temperature optimum was determined at the optimal pH between 30 and 100°C.
Enzyme reactions and no enzyme/substrate controls were prepared in triplicate with 100 µL of 1% substrate and incubated in PCR multi-well plates in a thermocycler (Eppendorf).
Activity was assessed by measuring reducing sugar released using a 3,5-dinitrosalicylic acid
(DNS) assay modified from (72,73) and adapted to a 96-well microplate format (74,75). The
DNS reagent used contained 1.6% (w/v) sodium hydroxide, 30% (w/v) sodium potassium tartrate, and 0.1% (w/v) 3,5-dinitrosalicilic acid. Briefly, 30 µL of sample was mixed with 60
µL of DNS reagent and incubated in a 96-well PCR plate in a thermocycler (Eppendorf) using the following program: 95°C for 5 min, 48°C for 1 min, hold at 20°C. Then the DNS reaction was diluted with distilled water and absorbance measured at 540 nm. Glucose or xylose was used as standards for glucooligosaccharide or xylooligosaccharide release.
85
Analysis of Oligosaccharide Products from Enzymatic Hydrolysis
To analyze oligosaccharides released by the various enzymes, enzyme was added to
500 µL of 1% substrate in buffer at the enzyme’s optimum pH and incubated between 0 and
240 minutes at 70°C at 800 rpm in a mixing thermocycler (Eppendorf). Reactions without substrate and/or enzyme were also assessed. After incubation, tubes were spun at 16,000 x g for 5 minutes at 4°C and the supernatant applied to 0.5 mL 10K MWCO protein concentrators (Pierce, Thermo Fisher Scientific). The protein concentrator eluent was diluted
10-fold with distilled water and analyzed by high performance liquid chromatography
(HPLC) (Alliance e2695 separations module; Waters) with refractive index (model 2414;
Waters) and photodiode array (model 2998; Waters) detectors. Columns were either the KS-
801 (xylooligosaccharides) or KS-802 (glucooligosaccharides) (Shodex) operated with water as the mobile phase, 0.6 mL/min and 80°C. Laminarioligosaccharides (L2-L6) (Seikagaku
Corporation), individual xylooligosaccharides (X2-X6) (Megazyme), glucose (Sigma), and xylose (Acros Organics) were used as standards.
Confocal and Epifluorescence Microscopy
All centrifugation steps for preparation of labeled cells were carried out at 6,000 x g for 10 min. Cells harvested from a 50 mL culture in 671d medium were washed one time with sterile 1X Phosphate Buffered Saline (PBS) and cells were fixed in 1X PBS containing
4% formaldehyde (methanol free, Fisher) for 30 min at room temperature with gentle shaking, followed by washing three times with 50 mL of PBS. Cells were re-suspended in 3 mL antibody blocking solution containing 5% (v/v) normal goat serum (Immunoreagents),
86
1% (w/v) bovine serum albumin (protease and nuclease free, Fisher), and 0.05% (v/v)
Tween-20 (Fisher) in PBS and incubated on a Boeckle rocker for 60 min at room temperature. After this blocking step, cells were harvested by centrifugation and re- suspended in antibody blocking solution containing 100 µg/mL of primary antibody.
Polyclonal antibodies were raised against Calkro_0111 TM7 or Calkro_0402 TM1 in chickens and supplied as total IgY with pre-immune total IgY controls by GeneTel
Laboratories (Madison, Wisconsin). Cells were incubated with the primary antibody for 18 hours at 4°C on a Boeckle rocker. After this incubation, cells were washed three times with 1 mL PBS containing 1% (w/v) bovine serum albumin and 0.05% (v/v) Tween-20 and re- suspended in antibody blocking solution containing a 1:400 dilution of goat anti-chicken
DyLight488 conjugated secondary antibody (Immunoreagents) and incubated on a Boeckle rocker for 60 minutes at room temperature. Cells were harvested and washed three times with PBS, re-suspended in 100 µL of 3.6 µM DAPI in PBS and incubated at 4°C for 18 hours. Following this incubation, cells were washed three times, re-suspended with PBS, and vacuum filtered onto 0.22 µm polycarbonate hydrophilic isopore membrane filters (EDM
Millipore) and mounted in 15 µL of SlowFade Diamond anti-fade mountant (Life
Technologies). For Calkro_0111 and Calkro_0402 localization in C. kronotskyensis, slides were imaged on a Zeiss LSM 710 confocal workstation using a 63x oil immersion plan- apochromat (NA 1.4) objective (NC State University Cellular and Molecular Imaging
Facility, Raleigh, NC). Image processing was performed using Zeiss Zen Blue software.
Epifluorescence imaging and cell counting was performed using a Nikon eclipse 50i microscope with a Plan Fluor 100x (NA 1.3) oil emersion objective and Nikon DS-Fi1
87
camera. For routine cell counting, cells were stained with acridine orange, as previously described (76). For immunofluorescence images of C. bescii genetic mutant strains, cells were labeled and slides were prepared as described above for confocal imaging. Images were captured as a z-stack of images through the sample with 0.33 µm steps controlled by the
ES10ZE Focus controller (Prior Scientific) and processed into a single focused image using the Nikon Elements software extended depth of field processing methods.
Transmission electron microscopy was performed at the Laboratory for Advanced
Electron and Light Optical Methods (College of Veterinary Science, North Carolina State
University).
Genetic Manipulation of C. bescii
To prepare competent cells, a 500 mL culture of C. bescii strain JWCB018 was grown to an optical density at 680 nm of 0.06-0.07 in LOD medium supplemented with 1X
19 amino acid solution (77). The culture was cooled to room temperature and cells were harvested by centrifugation. All centrifugation steps were carried out at 6,000 x g for 10 min.
The cell pellet was washed three times with 10% sucrose and re-suspended with 10% sucrose to a total volume of 100-120 µL. Fifty µL of competent cells were mixed with 1 µg of pJMC009 plasmid DNA at room temperature and electroporated using 1 mm gap cuvettes
(USA Scientific) and a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) operated at 2.25 kV, 600 Ω, and 25 µF. Immediately following electroporation, cells were transferred to 10 mL of pre-warmed LOC medium and incubated at 70°C for one hour. After this incubation, the culture was cooled to room temperature, and cells were harvested by
88
centrifugation, transferred to 100 mL of pre-warmed LOD medium lacking uracil, and incubated at 70°C for 2-4 days to select for first crossover transformants. This culture was passaged and then plated on LOD medium lacking uracil to select individual colonies.
Integration of the pJMC009 vector was confirmed by PCR and a successful first crossover mutant strain was plated on LOD medium supplemented with 40 mM uracil and 4 mM 5- fluoroorotic acid (5-FOA) to select for second crossover mutants. Isolated colonies were screened by PCR and successful second crossovers were identified. A successful second crossover was plated on LOD medium supplemented with uracil and individual colonies were isolated. This plating and isolation of colonies was repeated two times to obtain the final strain and ensure its purity. All PCR screening steps were performed using genomic
DNA isolated using the ZymoBead™ genomic DNA kit (Zymo Research).
Hydrolysis of Hemicellulose Substrates
Washed biomass substrates (oat spelt xylan, birchwood xylan) were prepared by washing 1 g substrate per 100 mL distilled water overnight at 70°C to remove soluble sugars, centrifuged at 6,000 x g for 10 min and dried at 70°C. Caldicellulosiruptor strains were passaged on modified 671d medium with respective biomass substrates 3 to 4 times.
Solubilization cultures were prepared in triplicate as 50 mL 671d medium cultures with 5 g/L substrate, inoculated to 1 x 106 cells/mL and grown at 70°C for 45.5 hours. Residual substrate was harvested by centrifugation at 6000 x g for 10 min and drying at 70°C until constant mass. The extent of solubilization was determined from the difference in mass between the biomass used to prepare each culture and the residual remaining after harvest.
89
For time course analysis of oat spelt xylan solubilization, cultures were prepared as above except cultures were sampled by removing 2 mL of culture and centrifuging it at 6,000 x g for 10 min. The supernatant was stored at -20°C prior to HPLC analysis.
Analysis of Solubilization in Culture Supernatants
Supernatant samples were analyzed for xylose content, as described previously (70).
Briefly, the samples were brought to 4% (w/w) sulfuric acid and autoclaved for 1 hour on the liquid cycle. Samples were cooled to room temperature and spun at 18,000 x g for 5 minutes and supernatant analyzed by HPLC as described above except using an Aminex-87H (300 mm by 7.8 mm; Bio-Rad) column with a mobile phase of 5 mM sulfuric acid at 0.6 mL/min and 60°C.
C. kronotskyensis and C. bescii RNA microarray - Transcriptomic data, which was previously acquired (70) and deposited in the NCBI Gene Expression Omnibus (GEO) database (Accession GSE68810), was re-analyzed with respect to the SLH domain containing proteins from C. bescii and C. kronotskyensis.
Results
S-layer Homology (SLH) Domain Proteins in Caldicellulosiruptor Species
The genomes of the twelve, sequenced Caldicellulosiruptor species collectively encode 34 different groups of SLH domain-containing proteins (Supplementary Table S-1).
About half of the SLH domain-containing protein groups in these genomes have no additional identifiable domains other than SLH domains, highlighting the limited
90
understanding of this group of proteins. Caldicellulosiruptor species produce between 10 (C. saccharolyticus and C. acetigenus) and 19 (C. kronotskyensis) SLH domain proteins.
Eight of the 34 SLH domain-containing protein groups contain identifiable catalytic
GH or PL CAZyme domains (Figure 2.1). Csac_0678 (GH5), which has been previously characterized (21), has homologs in all 12 sequenced Caldicellulosiruptor species. Three of the 12 Caldicellulosiruptor species (C. owensensis, C. obsidiansis, and C. sp. strain Rt8.B8) only produce a truncated homolog of Csac_0678 (see Table S-1). Homologs of Calkro_0402
(GH10) are found in seven sequenced Caldicellulosiruptor species, with 65-83% amino acid sequence identity. One ortholog of Calkro_0402 (76% identity) has been characterized from
Caldicellulosiruptor sp. Rt69.B1 (52), which does not have a fully sequenced genome.
Calkro_0402 and its Caldicellulosiruptor sp. homologs belong to a larger group of GH10 enzymes with similar domain architecture, both with and without C-terminal SLH domains, that have been characterized from several xylan-degrading microbes (28,31,33,34,36,37,39-
46,48-51,78).
The other six groups of catalytic SLH domain proteins (Figure 2.1) have not been characterized from Caldicellulosiruptor species. Calkro_0111 (GH16/GH55) is the largest of all of these at 2453 amino acids, and is also the largest CAZyme of any kind identified so far in the genus Caldicellulosiruptor. Calkro_0121 is a second truncated paralog of
Calkro_0111, which has 60% amino acid identity to Calkro_0111 and identical domain architecture, but is lacking the final CBM32 domain. The arrangement of the domains in
Calkro_0111 and Calkro_0121 is entirely unique to C. kronotskyensis, and no homologs of the full protein (Calkro_0111) can be identified in any other sequenced microorganism.
91
Calkro_0072 (GH16) has homologs in seven Caldicellulosiruptor species and is very similar to the β-1,3-glucanase Lic16A characterized from Clostridium thermocellum (38). C. hydrothermalis produces two catalytic SLH domain proteins that are unique to this species, putative xylanase Calhy_1629 (GH43) and putative α-glucanase Calhy_2383 (GH87).
Calkro_0550 (PL11) from C. kronotskyensis has one additional homolog in
Caldicellulosiruptor owensensis, and is a putative rhamnogalacturonan lyase. C. owensensis also produces one other putative pectate lyase SLH domain protein, Calow_2109 (PL9), which is unique to this species.
These catalytic SLH domain proteins encompass a range of enzymatic activities localized in the S-layer, presumably used to break down plant polysaccharide components that are in proximity of the cell surface. C. kronotskyensis produces six CAZyme SLH domain proteins, the most of any species; while C. bescii produces the fewest, only the
Csac_0678 homolog conserved in all Caldicellulosiruptor species. These SLH-localized
CAZymes accompany a host of other CAZyme domains found in extracellular free enzymes produced by Caldicellulosiruptor sp. for plant biomass degradation (58).
C. kronotskyensis Produces an S-layer and SLH Enzymes Calkro_0111 and
Calkro_0402 are Localized on the C. kronotskyensis Cell Surface
Microscopy (TEM) of C. kronotskyensis growing on dilute acid treated poplar (P. trichocarpa x P. deltoides) shows the cell membrane, peptidoglycan layer, and S-layer of C. kronotskyensis cells (Figure 2. 2A), with the cells appearing to attach to granules of the substrate mediated by appendages at the cell surface. To determine the localization of the
92
SLH CAZymes Calkro_0402 and Calkro_0111, polyclonal antibodies were raised against portions of these proteins and used for immunofluorescence microscopy in C. kronotskyensis.
Figures 2.2 B-C and D-E show confocal microscopy of C. kronotskyensis labeled with anti-
Calkro_0111 and anti-Calkro_0402 antibodies, respectively. Controls labeled with pre- immune total IgY antibodies show minimal labeling (data not shown). In each case, these enzymes are localized on the cell surface within the S-layer, as would be predicted by the presence of SLH domains. Because of the high similarity between Calkro_0111 and
Calkro_0121, the antibodies likely are labeling both of these proteins in Figure 2.2 B-C.
Laminarinase Calkro_0111 Displays Endo- and Exo-glucanase Activity in Separate
Catalytic Domains
Calkro_0111 contains fifteen predicted domains, including the two catalytic units:
GH16 and GH55 (Figure 2.3A). Because the full enzyme could not be produced recombinantly in E. coli, the enzyme was split into several truncation mutants (TMs) for characterization. Two of these TMs were used to characterize the enzymatic activity: TM1, containing the GH16 domain and covering the first half of Calkro_0111, and TM8, containing the GH55 domain and covering the second half of Calkro_0111. For both TM1 and TM8 the optimal pH was 5, while the optimal temperature was 75°C (Figure 2.3B).
Analysis of the oligosaccharides released from laminarin by these two halves of Calkro_0111 showed that the GH55 domain of TM8 generates primarily glucose and a small amount of laminaribiose from laminarin, while the GH16 domain of TM1 released laminooligosaccharides of size L3 (laminaritriose) and greater, which slowly accumulated
93
with time (Figure 2.3C). Taken together, this suggests that the GH16 domain is an endoglucanase, while the GH55 domain is an exoglucanase. Considering the full-length enzyme, coordination between the GH16 and GH55 domains is expected, such that the GH16 cuts new chain ends for the GH55 to digest, thereby liberating glucose. This synergism is not uncommon in freely secreted Caldicellulosiruptor enzymes with two catalytic domains
(76,79,80), but synergism had yet to be demonstrated in any CAZyme SLH domain protein.
This is primarily due to the fact that very few SLH domain proteins have two catalytic
CAZyme domains, further highlighting the uniqueness of Calkro_0111.
To further explore the domain arrangement and the contribution of the non-catalytic domains to activity, a range of TMs (Figure 2.3A) were produced in E. coli and analyzed.
Activity of these TMs at 70°C on laminarin was evaluated at pH 5, the enzyme optimum pH, and pH 7, the growth optimum pH for Caldicellulosiruptor (Figure 2.3D). TM3 and TM4 had the highest activity at both pHs tested, while TM1 was slightly less active. TM2, which is truncated between the GH16 and the first actin cross linking-like (ACL) domain, was significantly less active. TM5, which contains the GH16 domain alone, and TM6 and TM7, which contain the ACL and Fibronectin type-III domains (FN3) and not the GH16, all displayed little to no activity (Figure 2.3D). TM4, which contains the GH16 and first ACL domain, represents the smallest portion of the GH16 side of the enzyme that remains fully active under these conditions. This suggests that the ACL domain plays an important role in
GH16 function. The ACL domain could stabilize the GH16 domain, a role that has been shown before for accessory domains in other GH multi-domain enzymes. For example, two
FN3 domains (also previously called X1 modules) of C. thermocellum cellobiohydrolase A
94
(CbhA) increased the thermostability of the adjacent GH9 domain (81). For the Calkro_0111
GH55 domain, removal of the ACL and FN3 domains from TM8 to TM9 and then removal of the two CBM32 domains from TM9 to TM10 resulted in successive reductions in activity
(Figure 2.3D). This suggests that both the ACL-FN3-FN3 grouping and two CBM32 domains play an important role in the activity of the Calkro_0111 GH55.
Biochemical Characterization of Calkro_0402 and Relatedness To Other
CBM22/GH10/CBM9 Enzymes
Calkro_0402 belongs to a large group of homologous CBM22, GH10, and CBM9 containing proteins, from a variety of bacteria, primarily Firmicutes. An unrooted neighbor- joining phylogenetic tree constructed from the full amino acid sequences of top blastp hits to
Calkro_0402 and similar xylanases that have been previously characterized shows the relatedness of xylanase enzymes from this group (Figure 2.4A).
Calkro_0402 contains three CBM22, one catalytic GH10, two CBM9 domains, and one Cadherin-like domain, in addition to C-terminal SLH domain repeats (Figure 2.4B). Full length Calkro_0402 was produced recombinantly in E. coli (Figure 2.4C) and the optimal pH and temperature for activity on birchwood xylan were found to be pH 5.5 and 80°C, respectively (Figure 2.4D). Calkro_0402 releases primarily xylotriose and xylobiose from birchwood xylan, but also a small amount of xylose (Figure 2.4E). This type of activity is consistent with other xylanase homologs that have been previously characterized (28,31).
Activity of Calkro_0402 was also detected on oat spelt xylan, wheat arabinoxylan, pachyman
95
1,3-β-glucan, barley β-glucan, and lichenan, in addition to birchwood xylan, all at pH 5.5 and
70°C (data not shown).
Transcriptomic Analysis of SLH Domain Proteins in C. bescii and C. kronotskyensis
Transcriptomic analysis was performed on C. kronotskyensis and C. bescii, the
Caldicellulosiruptor species with the most and least catalytic CAZyme SLH domain proteins, respectively, when these species were grown on Avicel (crystalline cellulose) and the more heterogeneous switchgrass. All CAZyme SLH domain proteins from both species, except
Calkro_0111 and Calkro_0121, were up-regulated on switchgrass. This transcriptional response to switchgrass suggests these genes respond to components of switchgrass, like hemicellulose polysaccharides, not found in crystalline cellulose Avicel. The transcriptional level of these genes on switchgrass is similar to the free enzymes in the glucan degradation locus (GDL), known to be a genomic feature of cellulolytic Caldicellulosiruptor sp.
(59,61,70). Calkro_0333 and Athe_2303 are very highly transcribed under both conditions.
These proteins are homologs of Csac_2451, which was previously identified as the main SLP
(59,60). A number of other non-catalytic SLH domain proteins are transcribed at moderate to high levels, but because domains other than SLH domains are not predicted, the role of these proteins is unknown.
96
Manipulation of the C. bescii S-layer by the Insertion (Knock On) of the Gene Encoding
Calkro_0402
Because Calkro_0402 is very highly transcribed in C. kronotskyensis (Figure 2.5) and utilized by a variety of xylan degrading bacteria (Figure 2.4A), Calkro_0402 was
‘knocked-in’ to genetically tractable C. bescii to examine its potential contribution to plant biomass degradation in vivo. C. bescii is a good genetic background for testing SLH proteins in vivo, because it produces only 12 SLH domain proteins in total and does not produce a homolog of Calkro_0402. Calkro_0402 was inserted into C. bescii uracil auxotroph strain
JWCB018 using pyrF complementation and 5-FOA counter-selection. The knock-in construct contained Calkro_0402, including the native signal peptide, and the native predicted terminator sequence from C. kronotskyensis under the control of the slp promoter from the main SLP (Athe_2303) in C. bescii (Figure 2.6 A and B). As shown in Figure 2.5, the main SLP (Athe_2303) is very highly and constitutively transcribed. Thus, its promoter should direct very high levels of transcription of Calkro_0402 in the knock-in construct. The knock-in was targeted at the ΔCbeI (Athe_2438) deletion locus in C. bescii strain JWCB018, because genetic manipulation in this area of the genome was successful previously (64). The
PCR amplicon for the knock-in C. bescii strain RKCB103 compared to strain JWCB018 is shown in Figure 2.6C.
Using this Calkro_0402 knock-in C. bescii strain RKCB103, immunofluorescence microscopy was performed to investigate the expression of Calkro_0402 within the S-layer of C. bescii. For comparison, immunofluorescence microscopy was also performed on parent strain C. bescii strain JWCB018 and C. kronotskyensis, in addition to C. bescii strain
97
RKCB103 (Figure 2.6 D-F). C. bescii strain JWCB018, which does not have Calkro_0402, had minimal labeling with the anti-Calkro_0402 antibodies (Figure 2.6D), while C. bescii strain RKCB103 was extensively labeled on the cell surface (Figure 2.6F). Labeling of the native expression of Calkro_0402, in C. kronotskyensis is also shown (Figure 2.6E). Based on transcription levels of Calkro_0402 and the Athe_2303 SLP (Figure 2.5), the
Calkro_0402 transcript in C. bescii strain RKCB103 should be >10-fold higher, using the slp promoter, compared to levels observed in wild-type C. kronotskyensis. This clearly relates to an increase in Calkro_0402 protein expression in C. bescii strain RKCB103 compared to native expression in C. kronotskyensis, as seen via the immunofluorescent labeling. Also of particular note is that the native signal peptide from C. kronotskyensis on Calkro_0402 appeared to function in C. bescii to route the protein for secretion. The C. bescii produced
Calkro_0402 also localized to the surface of the C. bescii strain RKCB103 cells, and in
Figure 2.6F appears to localize primarily at the poles of the cell. The organization of SLH proteins on the cell surface is generally thought to be coordinated in bacteria that produce multiple SLH proteins, but the mechanisms by which this occurs has not been determined
(82). This is the first example of manipulation of the Caldicellulosiruptor S-layer through genetic modification.
C. bescii Strain RKCB103 Xylan Solubilization
To understand the role of Calkro_0402 in vivo, the ability of C. bescii strains
RKCB103 and JWCB018 to solubilize plant biomass in culture was examined. Figure 2.7A shows the solubilization of oat spelt and birchwood xylans. Strain RKCB103 is able to
98
solubilize significantly more washed oat spelt xylan than strain JWCB018; the xylose equivalents from the soluble xylooligosaccharides measured in the supernatant are more than double for strain RKCB103 than strain JWCB018 (Figure 2.7B). For both the washed and unwashed birch xylan substrates, strain RKCB103 performs slightly better and released more xylose equivalents to the supernatant than strain JWCB018. Other biomass substrates tested
(diluted acid pretreated Populus trichocarpa x Populus deltoides, dilute acid pretreated switchgrass, and unpretreated Panicum virgatum switchgrass) showed no significant difference in solubilization and also showed < 30 µg/mL xylose equivalents in the culture supernatant (data not shown). This suggests that on these lignocellulosic substrates C. bescii strain JWCB018 produced enough xylanase activity to remove xylan at the rate at which it was exposed from the complex polysaccharide matrix in the substrate, and cellulose solubilization or occlusion of the polysaccharides by lignins are likely limiting the overall solubilization. However, in the substrates with high xylan content, strain RKCB103 with the addition of Calkro_0402 outperformed strain JWCB018.
Immunofluorescence microscopy (Figure 2.6F) shows Calkro_0402 is localized on the surface of strain RKCB103 cells. While the enzyme is tethered to the cell surface, the GH and CBM domains of Calkro_0402 interact with the xylan substrate. This suggests that
Calkro_0402 plays a role in mediating cell attachment to xylan substrates. Cultures of strain
JWCB018 and strain RKCB103 (Figure 2.7, C and D, respectively) grown on washed birchwood xylan support this role for Calkro_0402. These cultures contained approximately the same number of cells as determined by cell counts (data not shown), but the supernatant in Figure 2.7D for strain RKCB103 with Calkro_0402 is relatively clear, as the cells appear
99
to be attached to the insoluble xylan at the bottom of the culture bottle. The strain JWCB018 cultures (Figure 2.7C) were more turbid, suggesting the cells did not as readily attached to the biomass substrate.
To understand the role of Calkro_0402 in improved xylan solubilization, a time course experiment was performed on washed oat spelt xylan. Figure 2.8A shows the planktonic cell density for strains JWCB018 and RKCB103 over the duration of the culture.
The cell densities of the two strains track each other closely. When monitoring xylose release, roughly 9 to 12 hours post-inoculation, the xylose hydrolyzed from xylooligosaccarides by strain RKCB103 was measurable above the baseline abiotic control, whereas, strain JWCB018 takes between 16 and 22 hours post-inoculation (Figure 2.8B). It is important to note that the strains were consuming xylose as they degrade it and these values represent the xylose from excess xylooligosacharides released by the enzymatic action of the cells that has not been consumed for growth. As with the closed bottle total solubilization experiment, strain RKCB103 has about double the xylose equivalents in the supernatant samples as strain JWCB018 at 45.5 hours, the point at which the cultures were harvested.
Discussion
Catalytic CAZyme-containing SLH domain proteins are produced by a variety of lignocellulose-degrading bacteria (21,22,45). The genomes of twelve Caldicellulosiruptor species encode eight different groups of these (Figure 2.1), a subset of the 34 groups of
Caldicellulosiruptor SLH domain proteins (Table S-1). These proteins are one of several
100
mechanisms by which Caldicellulosiruptor species effect plant biomass degradation
(21,59,61-63).
Calkro_0111 from C. kronotskyensis is the largest glycoside hydrolase from all of the biomass degradation enzymes in the Caldicellulosiruptor genus, free or cell-associated, and is entirely unique to C. kronotskyensis. The GH16 domain of Calkro_0111 is an endoglucanase, while the GH55 is an exoglucanase (Figure 2.3). Thus, the two GH domains likely work synergistically, as has been shown for other Caldicellulosiruptor enzymes wihout
SLH domains but with two catalytic domains (76,79,80). Various truncation mutants of
Calkro_0111 (Figure 2.3D) were examined revealing that the non-catalytic ACL domain adjacent to the GH16 domain is necessary for full activity. Furthermore, for the GH55 domain of Calkro_0111, both the ACL-FN3-FN3 domain grouping and two CBM32 domains improved the activity of the GH55 domain.
The very large multi-domain structure of Calkro_0111 is characteristic of many
Caldicellulosiruptor polysaccharide-degrading enzymes, where this architecture is thought to have arisen from domain shuffling (83,84). Specifically for Calkro_0111, the N-terminus
GH16 portion is similar to the N-terminus of Calkro_0072 (45% aa identity) and the C- terminus GH55 portion is similar to Calkro_0113 (58% aa identity). These protein segments may have combined by domain shuffling to form Calkro_0111. However, the ACL-FN3-
FN3-ACL-FN3-FN3 motif in the middle of Calkro_0111, and in homolog Calkro_0121, is completely unique to these two genes. These ACL domains belong to the same protein superfamily (CL22458 RICIN Superfamily) as Fascins, which play a role in bundling actin filaments for cell adhesion and migration in Drosophila and vertebrate species (85). Plants
101
and brown algae, such as Laminaria sp., produce actin and actin bundling proteins that are distinct from Fascins, for a variety of cellular roles (86,87). The ACL domains of
Calkro_0111 may play a role in attachment of this enzyme to actin found in algal or plant cells, similar to the role of a CBM for carbohydrate binding, to help localize the enzyme to laminarin-containing substrates.
Many GH16- and GH55-containing enzymes are active on β-1,3-glucans.
Calkro_0111 activity was detected on a model substrate, laminarin (β-1,3-1,6-glucan), albeit at modest levels such that this is not likely to be its natural substrate. In fact, β-1,3-glucans, similar to laminarin, are produced in plant reproductive and wound tissue (callose) (88,89), exopolysaccharide from bacteria (curdlan) (90), algae-like Laminaria digitata (laminarin), and lichen-like Certraria islandica (lichenan). In addition to this unique substrate preference of Calkro_0111, its genomic neighborhood is also unique, containing other extracellular biomass-degrading enzymes that have no homologs within other species in the genus. These include GH16/GH55 Calkro_0121, GH55 Calkro_0113, and GH81 Calkro_0114. This locus also encodes two ABC transporters, one of which is homologous to a putative glucooligosaccharide transporter in C. saccharolyticus (91). This entire genomic region is transcribed at relatively low levels when C. kronotskyensis is grown on Avicel or switchgrass. Taken together, this genomic locus, including Calkro_0111, likely degrades unique β-1,3-glucans from lichens, algae, or other microorganisms that may be more abundant in the natural environment of Kamchatka, Russia, from which C. kronotskyensis was isolated.
102
The large size of the gene encoding Calkro_0111 demonstrates that this unique multi- domain arrangement is beneficial to C. kronotskyensis for it to be maintained and even duplicated as Calkro_0121. More broadly in the genus Caldicellulosiruptor, the unique multi-domain architecture of enzymes, both with and without SLH domains, has likely evolved to appropriately space the CAZyme domains for increased synergy and activity.
Specifically for SLH domain proteins, domain spacing must also accommodate the tethering of the SLH domains at one end of the protein to the cell surface, while it is attaching to, and degrading, the plant biomass substrate with the other. The large size of these catalytic SLH proteins may reflect the length needed to swing out away from the cell to reach biomass substrates to which they can attach. While domain spacing and arrangement appears to be a critical factor in the activity of Calkro_0111, no pattern for domain arrangement through all of the catalytic Caldicellulosiruptor SLH proteins is readily apparent, based on amino acid sequence. Lacking structural data, which is difficult to obtain for these large multi-domain proteins, the orientation of each domain relative to another cannot be determined precisely. It does seem likely, however, that each protein has evolved its domain spacing to enable its unique combination of binding and catalytic domains to function properly while being tethered to the cell.
While Calkro_0111 and Calkro_0121 are transcribed at low levels when grown on plant biomass substrates, all of the other SLH GH and PL enzymes from C. kronotskyensis and C. bescii are up-regulated when these species are grown on switchgrass compared to
Avicel (Figure 2.5), including the xylanase Calkro_0402. This suggests an important role in plant biomass degradation. Calkro_0402 belongs to a large group of homologous
103
CBM22/GH10/CBM9 xylanases (Figure 2.4A). While Calkro_0402 releases primarily xylobiose and xylotriose from birchwood xylan (Figure 2.4E), similar to previously characterized homologs (28,31), the optimal temperature of 80°C for Calkro_0402 (Figure
2.4D) makes it one of the most thermophilic versions of this family of xylanase enzymes characterized to date. As is predicted by the presence of SLH domains, Calkro_0402 is localized to the cell surface of C. kronotskyensis (Figure 2.2 D-E), which is a common feature of many homologs of Calkro_0402, including some shown in Figure 2.4A associated to the cell by means other than SLH domains. The Calkro_0402 homolog from Thermotoga maritima (and presumably homologs in other Thermotoga sp.) is cell-associated by an amino-terminal hydrophobic peptide anchor and not SLH domains (92). Furthermore, the
Clostridium clariflavum xylanase lacks SLH domains, but has a dockerin domain on its C- terminus, which would allow this enzyme to associate via protein-protein interactions into a cellulosome multi-enzyme complex that are also typically anchored to the cell surface via
SLH domains (61). The fact that these homologs without SLH domains can remain associated with the cell implies that there is particular utility for having this type of
CBM22/GH10/CBM9 xylanase associated with the cell.
Using genetic manipulation of C. bescii to knock-in Calkro_0402, the role of this
SLH protein in vivo for substrate attachment (Figure 2.7 C,D) and degradation (Figure 2.7
A,B and Figure 2.8 A,B) could be examined. Wild-type C. bescii has previously been shown to attach to xylan substrates (63). Our observations suggest that modification of the C. bescii
S-layer to contain SLH domain xylanase Calkro_0402 likely improves this attachment
(Figure 2.7 C,D). Truncation mutants of Xyn10B from Clostridium stercorarium and
104
Xyn10A from Clostridium josui showed that the CBM9 and CBM22 domains mediate the attachment of these enzymes to xylan and other substrates (42,43). While these two xylanases are phylogenetically distant from Calkro_0402 (Figure 2.4A), the role of these
CBM22 and CBM9 domains is likely very similar in Calkro_0402 in the functional role of degrading xylan substrates. The attachment of the Calkro_0402 knock-in in strain RKCB103 is also likely a significant factor in improved solubilization of xylans (Fig 7A). Both C. bescii and C. kronotskyensis produce at least five other extracellular xylanases targeted for the secretome with GH10, GH11, or GH43 domains. If the cell is attached to xylan by
Calkro_0402, these enzymes are secreted from the cell proximate to their substrate, likely making them more effective in the degradation process as well.
Very few of the diverse catalytic CAZyme-containing SLH domain proteins have been characterized, but their role in biomass deconstruction clearly merits closer examination. There are more than 20,000 SLH domain-containing proteins predicted in over
1,800 microbial genomes. These unusual proteins on the surface of bacterial cells contain a variety of functional protein domains, localized at the interface between the cell and its environment. While a full understanding of bacterial S-layer proteins and their varied and important roles at the cell surface is not yet available, the results reported here provide new insights into the role of SLH GHs as agents of plant polysaccharide degradation.
Acknowledgements
This work was supported by the BioEnergy Science Center (BESC), a U.S.
Department of Energy Bioenergy Research Center supported by the Office of Biological and
105
Environmental Research in the DOE Office of Science. LL Lee acknowledges support from an NIH Biotechnology Traineeship (NIH T32 GM008776-11) and NSF Graduate Fellowship, and JM Conway acknowledges support from a US DoEd GAANN Fellowship
(P200A100004-12). We would also like to acknowledge the North Carolina State University
Cellular and Molecular Imaging Facility and Dr. Eva Johannes for her assistance in operating the Zeiss LSM 710 confocal microscope.
106
Figure 2.1 - Catalytic CAZyme SLH Domain Proteins from the genus Caldicellulosiruptor Domain arrangement of these multi-domain proteins is drawn to scale based on prediction from the NCBI’s conserved domain database (93) and Pfam protein families database (94). Representative genes for each group containing more than one member are shown on the left. The number of Caldicellulosiruptor species’ genomes from the 12 completely sequenced to date that contain a member from each group is shown to the right. Domain abbreviations are as follows: GH: glycoside hydrolase, CBM: carbohydrate binding module, PL: polysaccharide lyase, BIg: Bacterial Immunoglobulin-like (CL0159), SLH: surface layer homology (pfam00395), FN3: fibronectin type III (pfam00041), ACL: actin cross linking- like RICIN superfamily (CL22458), Cadherin-like (pfam12733).
107
Figure 2.2 - TEM and Immunofluorescence Microscopy of C. kronotskyensis Transmission electron microscopy of C. kronotskyensis grown on dilute acid pretreated P. trichocarpa x P. deltoides showing the cytoplasmic membrane (cm), peptidoglycan (pg), and S-layer (s) (A). Confocal microscopy of C. kronotskyensis grown on laminarin labeled with total IgY antibodies raised against Calkro_0111 TM7 (B-C). Confocal microscopy of C. kronotskyensis grown on xylooligosaccharides labeled with total IgY antibodies raised against Cakro_0402 TM1 (C-D). The cell counterstain (DAPI) is shown in magenta and secondary antibody labeling (DyLight488 goat anti-chicken) is shown in green. Scale bars are 100 nm (A), 4 µm (B), 1 µm (C), 4 µm (D), 1 µm (E).
108
Figure 2.3 - Biochemical Characterization of Calkro_0111 Domain organization for full length Calkro_0111 and truncation mutants (TMs) produced in E. coli used for characterization (A). Domain abbreviations are as follows: SLH: surface layer homology, CBM: carbohydrate binding module, GH: glycoside hydrolase, FN3: fibronectin type III (pfam00041), and ACL: actin cross linking-like RICIN superfamily (CL22458). (B) Optimum pH at 70°C and optimum temperature at the optimum pH were determined for Calkro_0111 TM1 (0.9 mg/mL incubated 5 hours) and TM8 (0.005 mg/mL incubated 30 min) on the substrate laminarin. Error bars represent the standard deviation from the mean (n=3). (C) Oligosaccharides released from laminarin by TM 1 and TM8. HPLC chromatograms for each enzyme reaction time are aligned with respect to retention time and graphed together. The location of the peaks for standards glucose (G), laminaribiose (L2), laminaritriose (L3), laminaritetraose (L4), laminaripentaose (L5), and laminarihexaose (L6) are shown. (D) Truncation mutant activity as measured by the DNS reducing sugar assay. TMs 1-5 and 7 (4.2 µM) were incubated for 5 hours, TM6 (12.3 µM) was incubated for 10 hours, and TMs 8-10 (0.2 µM) were incubated for 30 min with 1% (w/v) laminarin at 70°C and pH 5 or pH 7. Activity is calculated in µmol reducing sugar released / hour / nmol protein to compare activity for the proteins of different mass. Error bars represent the standard deviation from the mean (n=3).
109
Figure 2.4 - Biochemical Characterization of Calkro_0402 MEGA v6 (95) was used to align, using MUSCLE (96), full protein amino acid sequences and construct an unrooted neighbor-joining phylogenetic tree of close blastp homologs to Calkro_0402 and previously characterized CBM22/GH10/CBM9 containing enzymes (A). Species and protein accession number are shown for each enzyme branch. (B) Domain organization for Calkro_0402 and truncation mutant 1 (TM1), which was used as the antigen for antibody production. Domain abbreviations are as follows: SLH: surface layer homology, CBM: carbohydrate binding module, GH: glycoside hydrolase, Cadherin-like: pfam12733. (C) Purified full length Calkro_0402 (approximately 180 kDa) produced in E. coli and purified by immobilized metal and size exclusion chromatography. (D) Optimum pH at 70°C and the optimum temperature at this optimum pH were determined for recombinant Calkro_0402 (0.02 mg/mL incubated 30 min). Error bars represent the standard deviation from the mean (n=3). (E) Oligosaccharides released from birchwood xylan by Calkro_0402. HPLC chromatograms for each enzyme reaction time are aligned with respect to retention time and graphed together. The location of the peaks for standards xylose (X1), xylobiose (X2), xylotriose (X3), xylotetraose (X4), xylopentaose (X5), and xylohexaose (X6) are shown.
110
Figure 2.5 - Transcriptional response of genes encoding SLH domain proteins from C. bescii and C. kronotskyensis The log squared mean (LSM) transcriptomic level of the 19 SLH domain proteins from C. kronotskyensis and 12 SLH domain proteins from C. bescii when each species is grown on crystalline cellulose (Avicel) and switchgrass (SWG). An LSM value of 0 represents average transcript abundance (black). Genes transcribed at levels higher than average have positive LSM values (red), while genes transcribed at levels lower than average have negative LSM values (green). Differential transcription is shown as Avicel minus SWG with negative values (orange) up-regulated on SWG relative to Avicel. Positive values (blue) are up- regulated on Avicel relative to SWG. Analysis is based on whole genome oligonucleotide microarray experiments deposited in the NCBI GEO database with accession GSE68810 (70). Domain abbreviations are as follows: GH: Glycoside Hydrolase, PL: Polysaccharide Lyase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), Big: Bacterial Immunoglobulin-like (CL0159), ACL: actin cross linking-like RICIN superfamily (CL22458), Tglut: Transglutaminase-like superfamily (pfam01841), MG2: Macroglobulin2 (pfam01835), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696). * Calkro_0121 is truncated after the first CBM32.
111
Figure 2.6 - Construction of Calkro_0402 knock-in C. bescii strain RKCB103 Calkro_0402 knock-in vector pJMC009 homologous recombination with the chromosome of parent C. bescii strain JWCB018. pJMC009 was constructed using the pDCW121 (65) vector backbone, which contains the pSC101 origin, repA, and apramycin resistance marker for maintenance in E. coli, as well as, C. bescii wildtype pyrF under the control of the Athe_2105 promoter for selection of uracil prototrophy and 5-FOA counter selection (A). Crossover regions 1kb in length 5’ and 3’ to the ΔCbeI locus (ΔAthe_2438) in strain JWCB018 were used to target Calkro_0402 for insertion via homologous recombination to that locus. The Calkro_0402 insertion construct included Calkro_0402 and an additional 81 bp downstream of the gene to include its predicted transcriptional terminator element amplified from C. kronotskyensis genomic DNA. High, constitutive expression of Calkro_0402 was driven by the 200 bp promoter element upstream of the C. bescii main SLP gene (Athe_2303). (B) The Calkro_0402 construct is inserted in the chromosome of RKCB103 after first crossover vector integration and second crossover 5-FOA counter selection to remove the vector backbone containing pyrF. (C) Using primers outside of the 5’ and 3’ crossover regions (Table 1: ΔAthe_2438 locus F and R), the ΔCbeI locus was amplified and sequenced to verify the Calkro_0402 knock-in. Strain JWCB018 has the expected 2.9 kb product, while strain RKCB103 has the expected 8.2 kb product with the 5.3 kb insertion of the SLP promoter and Calkro_0402 construct. New England Biolabs 1kb ladder was run alongside these PCR products. (D, E, F) Anti-Calkro_0402 immunofluorescence microscopy of C. bescii strain JWCB018, C. kronotskyensis, and C. bescii strain RKCB103 grown on birchwood xylan. (D) C. bescii strain JWCB018, the genetic parent strain, which does not contain Calkro_0402 shows minimal labeling with the anti-Calkro_0402 antibodies. (E) C. kronotskyensis and (F) Calkro_0402 knock-in C. bescii strain RKCB103 are both extensively labeled with antibody. (E,F) Calkro_0402 is shown localized on the cell surface in both C. kronotskyensis and C. bescii strain RKCB103, and appears primarily at the poles of the cells. All cells are labeled with chicken anti-Calkro_0402 total IgY primary antibody and DyLight 488 conjugated goat anti-chicken secondary antibody (green) with DAPI as a counterstain (blue). Scale bars are 2 µm.
112
113
Figure 2.7 - Increased xylan solubilization and xylan attachment by Calkro_0402 knock-in C. bescii strain RKCB103 Total biomass solubilization of washed oat spelt xylan, washed birchwood xylan, and unwashed birchwood xylan after 45.5 hours of growth of C. bescii strain JWCB018 or RKCB103 (A). Xylose measured in the acid hydrolyzed culture supernatant harvested at 45.5 hours (B). Error bars represent the standard deviation from the mean (n=3). (C) C. bescii strain JWCB018 cultures on washed birchwood xylan appear very turbid, and (D) C. bescii strain RKCB103 cultures appear much less turbid, while having the same cell density per mL of culture. C. bescii strain RKCB103 appears to be more readily attached to the insoluble washed birchwood xylan substrate, suggesting that the expression of Calkro_0402 in this strain is playing a role in tethering the cells to the xylan substrate.
114
Figure 2.8 - Time course washed oat spelt xylan solubilization Cell densities of C. bescii strains JWCB018 (blue) and RKCB103 (green) over time while growing on washed oat spelt xylan (A). Xylose measured in acid hydrolyzed supernatant samples for C. bescii strains JWCB018 (blue) and RKCB103 (green) and an abiotic control (brown) (B).
115
Table 2.1 - Primers used in this study Underlined portions are vector specific or overlap regions for isothermal assembly. Primer Sequence Use pET46 Calkro_0111 TM1 F GACGACGACAAGATAAATAAAGCAGGCACA cloning pET46 Calkro_0111 TM1/TM3 R GAGGAGAAGCCCGGTTAATCTGACATATCTGATTC cloning pET46 Calkro_0111 TM2 F GACGACGACAAGATGGTTGAAAAAGGATATTTGATTGG cloning pET46 Calkro_0111 TM2 R GAGGAGAAGCCCGGTTAACTTGGAGGGTTAGGATAAA cloning pET46 Calkro_0111 TM3/TM4 F GACGACGACAAGATGGGTGAATGGAAACTTGT cloning pET46 Calkro_0111 TM4 R GAGGAGAAGCCCGGTTACGGTGTAGAATAGATGATGAA cloning pET46 Calkro_0111 TM5 F GACGACGACAAGATTCCTTCCGTGGGTGAATGGAAACTTGTATGG cloning pET46 Calkro_0111 TM5 R GAGGAGAAGCCCGGTTAACCTTCTCTTTGGTATAC cloning pET46 Calkro_0111 TM6 F GACGACGACAAGATGATTTTAGGATTAGGT cloning pET46 Calkro_0111 TM6 R GAGGAGAAGCCCGGTTACAATTTCTTTTCTAAAGAT cloning pET46 Calkro_0111 TM7 F GACGACGACAAGATCTCTGGAGGCAGGTCTTA cloning pET46 Calkro_0111 TM7 R GAGGAGAAGCCCGGTTAGGTTTCTCCATTTTCATTG cloning pET46 Calkro_0111 TM8 F GACGACGACAAGATGACAGATACAGGTTTGAG cloning pET46 Calkro_0111 TM8 R GAGGAGAAGCCCGGTTACTTGCTCCCATACAC cloning pET46 Calkro_0111 TM9 F GACGACGACAAGATGTCAGAGCCTATATCT cloning pET46 Calkro_0111 TM9 R GAGGAGAAGCCCGGTTACTTGCTCCCATACACTTCA cloning pET46 Calkro_0111 TM10 F GACGACGACAAGATGTTTGGTCCAAATGTCTA cloning pET46 Calkro_0111 TM10 R GAGGAGAAGCCCGGTTAGTTGCAATATTCTGTTAC cloning pET46 Calkro_0402 F GACGACGACAAGATTCAAAGCAGCCAAACTAA cloning pET46 Calkro_0402 R GAGGAGAAGCCCGGTTACTTTTTGAGATTGTTAAGTGCTCTG cloning pET46 Calkro_0402 TM1 F GACGACGACAAGATGTCAGATAGCATAAAG cloning pET46 Calkro_0402 TM1 R GAGGAGAAGCCCGGTTATTGAGCATTTCTTGTAACTG cloning ΔAthe_2438 5' F GAAGTTAGGCTGGTGGTACCCATACACCAAAATGATATAATCTCC pJMC009 ΔAthe_2438 5' R TAAATCCTGTTTATCATGTTGGAAGGCAAATTG pJMC009 Cbescii Pslp (Athe_2303 pJMC009 AACATGATAAACAGGATTTAAAAGAGGCTATG promoter) F Cbescii Pslp (Athe_2303 pJMC009 GCTTTTTCATAACTACTCACCAAACCTC promoter) R Calkro_0402 + Calkro pJMC009 GTGAGTAGTTATGAAAAAGCGACTTATAGC Terminator F Calkro_0402 + Calkro pJMC009 CTACCTAATCGGAAGACTAAAGATTTTCTCC Terminator R ΔAthe_2438 3' F TTAGTCTTCCGATTAGGTAGGTGGTTTTAAC pJMC009 ΔAthe_2438 3' R CACTGAGCGTCAGAGAATTCGGATTTATCGCCTGAGTTG pJMC009 pDCW121 backbone F GAATTCTCTGACGCTCAG pJMC009 pDCW121 backbone R GGTACCACCAGCCTAACTTC pJMC009 ΔAthe_2438 locus F GGCAACCTTCAGTAGGGCTGC PCR ΔAthe_2438 locus R CATTTGATATCAGAATGTTCTCGTTGATT PCR
116
References
1. Sara, M., and Sleytr, U. B. (2000) S-Layer proteins. J. Bacteriol. 182, 859-868
2. Sleytr, U. B., and Beveridge, T. J. (1999) Bacterial S-layers. Trends Microbiol. 7,
253-260
3. Albers, S.-V., and Meyer, B. H. (2011) The archaeal cell envelope. Nat. Rev.
Microbiol. 9, 414-426
4. Kern, J., Wilton, R., Zhang, R., Binkowski, T. A., Joachimiak, A., and Schneewind,
O. (2011) Structure of surface layer homology (SLH) domains from Bacillus
anthracis surface array protein. J. Biol. Chem. 286, 26042-26049
5. Mesnage, S., Fontaine, T., Mignot, T., Delepierre, M., Mock, M., and Fouet, A.
(2000) Bacterial SLH domain proteins are non-covalently anchored to the cell surface
via a conserved mechanism involving wall polysaccharide pyruvylation. EMBO J. 19,
4473-4484
6. Fagan, R. P., Janoir, C., Collignon, A., Mastrantonio, P., Poxton, I. R., and
Fairweather, N. F. (2011) A proposed nomenclature for cell wall proteins of
Clostridium difficile. J. Med. Microbiol. 60, 1225-1228
7. Boot, H. J., Kolen, C. P., and Pouwels, P. H. (1995) Identification, cloning, and
nucleotide sequence of a silent S-layer protein gene of Lactobacillus acidophilus
117
ATCC 4356 which has extensive similarity with the S-layer protein gene of this
species. J. Bacteriol. 177, 7222-7230
8. Johnson, B. R., Hymes, J., Sanozky-Dawes, R., Henriksen, E. D., Barrangou, R., and
Klaenhammer, T. R. (2016) Conserved S-Layer-Associated Proteins Revealed by
Exoproteomic Survey of S-Layer-Forming Lactobacilli. Appl. Environ. Microbiol. 82,
134-145
9. Veith, A., Klingl, A., Zolghadr, B., Lauber, K., Mentele, R., Lottspeich, F., Rachel,
R., Albers, S. V., and Kletzin, A. (2009) Acidianus, Sulfolobus and Metallosphaera
surface layers: structure, composition and gene expression. Mol. Microbiol. 73, 58-72
10. Abdul Halim, M. F., Pfeiffer, F., Zou, J., Frisch, A., Haft, D., Wu, S., Tolić, N.,
Brewer, H., Payne, S. H., Paša-Tolić, L., and Pohlschroder, M. (2013) Haloferax
volcanii archaeosortase is required for motility, mating, and C-terminal processing of
the S-layer glycoprotein. Mol. Microbiol. 88, 1164-1175
11. Abdul Halim, M. F., Karch, K. R., Zhou, Y., Haft, D. H., Garcia, B. A., and
Pohlschroder, M. (2015) Permuting the PGF-CTERM signature motif blocks both
archaeosortase-dependent C-terminal cleavage and prenyl lipid attachment for the
Haloferax volcanii S-layer glycoprotein. J. Bacteriol.
12. Fagan, R. P., and Fairweather, N. F. (2014) Biogenesis and functions of bacterial S-
layers. Nat. Rev. Microbiol. 12, 211-222
118
13. Pham, T. K., Roy, S., Noirel, J., Douglas, I., Wright, P. C., and Stafford, G. P. (2010)
A quantitative proteomic analysis of biofilm adaptation by the periodontal pathogen
Tannerella forsythia. Proteomics 10, 3130-3141
14. Honma, K., Inagaki, S., Okuda, K., Kuramitsu, H. K., and Sharma, A. (2007) Role of
a Tannerella forsythia exopolysaccharide synthesis operon in biofilm development.
Microb. Pathog. 42, 156-166
15. Reynolds, C. B., Emerson, J. E., de la Riva, L., Fagan, R. P., and Fairweather, N. F.
(2011) The Clostridium difficile cell wall protein CwpV is antigenically variable
between strains, but exhibits conserved aggregation-promoting function. PLoS
Pathog. 7, e1002024
16. Sun, Z., Kong, J., Hu, S., Kong, W., Lu, W., and Liu, W. (2013) Characterization of a
S-layer protein from Lactobacillus crispatus K313 and the domains responsible for
binding to cell wall and adherence to collagen. Appl. Microbiol. Biotechnol. 97, 1941-
1952
17. de la Riva, L., Willing, S. E., Tate, E. W., and Fairweather, N. F. (2011) Roles of
cysteine proteases Cwp84 and Cwp13 in biogenesis of the cell wall of Clostridium
difficile. J. Bacteriol. 193, 3276-3285
18. Anderson, V. J., Kern, J. W., McCool, J. W., Schneewind, O., and Missiakas, D.
(2011) The SLH-domain protein BslO is a determinant of Bacillus anthracis chain
length. Mol. Microbiol. 81, 192-205
119
19. Brahamsha, B. (1996) An abundant cell-surface polypeptide is required for swimming
by the nonflagellated marine cyanobacterium Synechococcus. Proc. Natl. Acad. Sci.
U.S.A. 93, 6504-6509
20. McCarren, J., and Brahamsha, B. (2007) SwmB, a 1.12-megadalton protein that is
required for nonflagellar swimming motility in Synechococcus. J. Bacteriol. 189,
1158-1162
21. Ozdemir, I., Blumer-Schuette, S. E., and Kelly, R. M. (2012) S-layer homology
domain proteins Csac_0678 and Csac_2722 are implicated in plant polysaccharide
deconstruction by the extremely thermophilic bacterium Caldicellulosiruptor
saccharolyticus. Appl. Environ. Microbiol. 78, 768-777
22. Egelseer, E., Schocher, I., Sara, M., and Sleytr, U. B. (1995) The S-layer from
Bacillus stearothermophilus DSM 2358 functions as an adhesion site for a high-
molecular-weight amylase. J. Bacteriol. 177, 1444-1451
23. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M., and Henrissat, B.
(2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids
Res. 42, D490-495
24. Lemaire, M., Ohayon, H., Gounon, P., Fujino, T., and Beguin, P. (1995) OlpB, a new
outer layer protein of Clostridium thermocellum, and binding of its S-layer-like
domains to components of the cell envelope. J. Bacteriol. 177, 2451-2459
120
25. Fujino, T., Beguin, P., and Aubert, J. P. (1993) Organization of a Clostridium
thermocellum gene cluster encoding the cellulosomal scaffolding protein CipA and a
protein possibly involved in attachment of the cellulosome to the cell surface. J.
Bacteriol. 175, 1891-1899
26. Fontes, C. M., and Gilbert, H. J. (2010) Cellulosomes: highly efficient nanomachines
designed to deconstruct plant cell wall complex carbohydrates. Annu. Rev. Biochem.
79, 655-681
27. Markowitz, V. M., Chen, I. M., Palaniappan, K., Chu, K., Szeto, E., Pillay, M.,
Ratner, A., Huang, J., Woyke, T., Huntemann, M., Anderson, I., Billis, K., Varghese,
N., Mavromatis, K., Pati, A., Ivanova, N. N., and Kyrpides, N. C. (2014) IMG 4
version of the integrated microbial genomes comparative analysis system. Nucleic
Acids Res. 42, D560-567
28. Han, Y., Agarwal, V., Dodd, D., Kim, J., Bae, B., Mackie, R. I., Nair, S. K., and
Cann, I. K. (2012) Biochemical and structural insights into xylan utilization by the
thermophilic bacterium Caldanaerobius polysaccharolyticus. J. Biol. Chem. 287,
34946-34960
29. Han, Y., Dodd, D., Hespen, C. W., Ohene-Adjei, S., Schroeder, C. M., Mackie, R. I.,
and Cann, I. K. (2010) Comparative analyses of two thermophilic enzymes exhibiting
both beta-1,4 mannosidic and beta-1,4 glucosidic cleavage activities from
Caldanaerobius polysaccharolyticus. J. Bacteriol. 192, 4111-4121
121
30. Lee, S. P., Morikawa, M., Takagi, M., and Imanaka, T. (1994) Cloning of the aapT
gene and characterization of its product, alpha-amylase-pullulanase (AapT), from
thermophilic and alkaliphilic Bacillus sp. strain XAL601. Appl. Environ. Microbiol.
60, 3764-3773
31. Waeonukul, R., Pason, P., Kyu, K. L., Sakka, K., Kosugi, A., Mori, Y., and
Ratanakhanokchai, K. (2009) Cloning, sequencing, and expression of the gene
encoding a multidomain endo-beta-1,4-xylanase from Paenibacillus curdlanolyticus
B-6, and characterization of the recombinant enzyme. J. Microbiol. Biotechnol. 19,
277-285
32. Cheng, Y. M., Hong, T. Y., Liu, C. C., and Meng, M. (2009) Cloning and functional
characterization of a complex endo-beta-1,3-glucanase from Paenibacillus sp. Appl.
Microbiol. Biotechnol. 81, 1051-1061
33. St John, F. J., Rice, J. D., and Preston, J. F. (2006) Paenibacillus sp. strain JDR-2 and
XynA1: a novel system for methylglucuronoxylan utilization. Appl. Environ.
Microbiol. 72, 1496-1506
34. Ito, Y., Tomita, T., Roy, N., Nakano, A., Sugawara-Tomita, N., Watanabe, S., Okai,
N., Abe, N., and Kamio, Y. (2003) Cloning, expression, and cell surface localization
of Paenibacillus sp. strain W-61 xylanase 5, a multidomain xylanase. Appl. Environ.
Microbiol. 69, 6969-6978
122
35. Itoh, T., Sugimoto, I., Hibi, T., Suzuki, F., Matsuo, K., Fujii, Y., Taketo, A., and
Kimoto, H. (2014) Overexpression, purification, and characterization of
Paenibacillus cell surface-expressed chitinase ChiW with two catalytic domains.
Biosci. Biotechnol. Biochem. 78, 624-634
36. St John, F. J., Preston, J. F., and Pozharski, E. (2012) Novel structural features of
xylanase A1 from Paenibacillus sp JDR-2. J. Struct. Biol. 180, 303-311
37. Kim, H., Jung, K. H., and Pack, M. Y. (2000) Molecular characterization of xynX, a
gene encoding a multidomain xylanase with a thermostabilizing domain from
Clostridium thermocellum. Appl. Microbiol. Biotechnol. 54, 521-527
38. Fuchs, K.-P., Zverlov, V. V., Velikodvorskaya, G. A., Lottspeich, F., and Schwarz,
W. H. (2003) Lic16A of Clostridium thermocellum, a non-cellulosomal, highly
complex endo-β-1,3-glucanase bound to the outer cell surface. Microbiology 149,
1021-1031
39. Adelsberger, H., Hertel, C., Glawischnig, E., Zverlov, V. V., and Schwarz, W. H.
(2004) Enzyme system of Clostridium stercorarium for hydrolysis of arabinoxylan:
reconstitution of the in vivo system from recombinant enzymes. Microbiology 150,
2257-2266
40. Feng, J. X., Karita, S., Fujino, E., Fujino, T., Kimura, T., Sakka, K., and Ohmiya, K.
(2000) Cloning, sequencing, and expression of the gene encoding a cell-bound multi-
123
domain xylanase from Clostridium josui, and characterization of the translated
product. Biosci. Biotechnol. Biochem. 64, 2614-2624
41. Ali, M. K., Fukumura, M., Sakano, K., Karita, S., Kimura, T., Sakka, K., and
Ohmiya, K. (1999) Cloning, sequencing, and expression of the gene encoding the
Clostridium stercorarium xylanase C in Escherichia coli. Biosci. Biotechnol.
Biochem. 63, 1596-1604
42. Ali, E., Araki, R., Zhao, G., Sakka, M., Karita, S., Kimura, T., and Sakka, K. (2005)
Functions of family-22 carbohydrate-binding modules in Clostridium josui Xyn10A.
Biosci. Biotechnol. Biochem. 69, 2389-2394
43. Ali, M. K., Hayashi, H., Karita, S., Goto, M., Kimura, T., Sakka, K., and Ohmiya, K.
(2001) Importance of the carbohydrate-binding module of Clostridium stercorarium
Xyn10B to xylan hydrolysis. Biosci. Biotechnol. Biochem. 65, 41-47
44. Ali, M. K., Kimura, T., Sakka, K., and Ohmiya, K. (2001) The multidomain xylanase
Xyn10B as a cellulose-binding protein in Clostridium stercorarium. FEMS
Microbiol. Lett. 198, 79-83
45. Brechtel, E., Matuschek, M., Hellberg, A., Egelseer, E. M., Schmid, R., and Bahl, H.
(1999) Cell wall of Thermoanaerobacterium thermosulfurigenes EM1: isolation of its
components and attachment of the xylanase XynA. Arch. Microbiol. 171, 159-165
124
46. Liu, S. Y., Gherardini, F. C., Matuschek, M., Bahl, H., and Wiegel, J. (1996) Cloning,
sequencing, and expression of the gene encoding a large S-layer-associated
endoxylanase from Thermoanaerobacterium sp. strain JW/SL-YS 485 in Escherichia
coli. J. Bacteriol. 178, 1539-1547
47. Matuschek, M., Burchhardt, G., Sahm, K., and Bahl, H. (1994) Pullulanase of
Thermoanaerobacterium thermosulfurigenes EM1 (Clostridium thermosulfurogenes):
molecular analysis of the gene, composite structure of the enzyme, and a common
model for its attachment to the cell surface. J. Bacteriol. 176, 3295-3302
48. Matuschek, M., Sahm, K., Zibat, A., and Bahl, H. (1996) Characterization of genes
from Thermoanaerobacterium thermosulfurigenes EM1 that encode two glycosyl
hydrolases with conserved S-layer-like domains. Mol. Gen. Genet. 252, 493-496
49. Shao, W. L., Deblois, S., and Wiegel, J. (1995) A high-molecular-weight, cell-
associated xylanase isolated from exponentially growing Thermoanaerobacterium sp.
strain JW/SL-YS485. Appl. Environ. Microbiol. 61, 937-940
50. Huang, X., Li, Z., Du, C., Wang, J., and Li, S. (2015) Improved expression and
characterization of a multidomain xylanase from Thermoanaerobacterium
aotearoense SCUT27 in Bacillus subtilis. J. Agric. Food Chem. 63, 6430-6439
51. Hung, K. S., Liu, S. M., Fang, T. Y., Tzou, W. S., Lin, F. P., Sun, K. H., and Tang, S.
J. (2011) Characterization of a salt-tolerant xylanase from Thermoanaerobacterium
saccharolyticum NTOU1. Biotechnol. Lett. 33, 1441-1447
125
52. Morris, D. D., Gibbs, M. D., Ford, M., Thomas, J., and Bergquist, P. L. (1999)
Family 10 and 11 xylanase genes from Caldicellulosiruptor sp. strain Rt69B.1.
Extremophiles 3, 103-111
53. Blumer-Schuette, S. E., Ozdemir, I., Mistry, D., Lucas, S., Lapidus, A., Cheng, J.-F.,
Goodwin, L. A., Pitluck, S., Land, M. L., Hauser, L. J., Woyke, T., Mikhailova, N.,
Pati, A., Kyrpides, N. C., Ivanova, N., Detter, J. C., Walston-Davenport, K., Han, S.,
Adams, M. W. W., and Kelly, R. M. (2011) Complete genome sequences for the
anaerobic, extremely thermophilic plant biomass-degrading bacteria
Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor kristjanssonii,
Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor owensensis, and
Caldicellulosiruptor lactoaceticus. J. Bacteriol. 193, 1483-1484
54. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,
Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,
Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,
Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,
Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome
Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain
Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3
55. Elkins, J. G., Lochner, A., Hamilton-Brehm, S. D., Davenport, K. W., Podar, M.,
Brown, S. D., Land, M. L., Hauser, L. J., Klingeman, D. M., Raman, B., Goodwin, L.
A., Tapia, R., Meincke, L. J., Detter, J. C., Bruce, D. C., Han, C. S., Palumbo, A. V.,
126
Cottingham, R. W., Keller, M., and Graham, D. E. (2010) Complete genome
sequence of the cellulolytic thermophile Caldicellulosiruptor obsidiansis OB47T. J.
Bacteriol. 192, 6099-6100
56. van de Werken, H. J., Verhaart, M. R., VanFossen, A. L., Willquist, K., Lewis, D. L.,
Nichols, J. D., Goorissen, H. P., Mongodin, E. F., Nelson, K. E., van Niel, E. W.,
Stams, A. J., Ward, D. E., de Vos, W. M., van der Oost, J., Kelly, R. M., and Kengen,
S. W. (2008) Hydrogenomics of the extremely thermophilic bacterium
Caldicellulosiruptor saccharolyticus. Appl. Environ. Microbiol. 74, 6720-6729
57. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.
C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and
Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and
cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,
3760-3761
58. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.
(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus
Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic
Press, Norfolk, UK. pp 91-120
59. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,
Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,
Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,
127
R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal
determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.
Bacteriol. 194, 4015-4028
60. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,
microbiological, and glycoside hydrolase diversities within the extremely
thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.
Microbiol. 76, 8084-8092
61. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,
Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)
Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448
62. Blumer-Schuette, S. E., Alahuhta, M., Conway, J. M., Lee, L. L., Zurawski, J. V.,
Giannone, R. J., Hettich, R. L., Lunin, V. V., Himmel, M. E., and Kelly, R. M. (2015)
Discrete and structurally unique proteins (tāpirins) mediate attachment of extremely
thermophilic Caldicellulosiruptor species to cellulose. J. Biol. Chem. 290, 10645-
10656
63. Dam, P., Kataeva, I., Yang, S. J., Zhou, F., Yin, Y., Chou, W., Poole, F. L., 2nd,
Westpheling, J., Hettich, R., Giannone, R., Lewis, D. L., Kelly, R., Gilbert, H. J.,
Henrissat, B., Xu, Y., and Adams, M. W. (2011) Insights into plant biomass
conversion from the genome of the anaerobic thermophilic bacterium
Caldicellulosiruptor bescii DSM 6725. Nucleic Acids Res. 39, 3240-3254
128
64. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier
to DNA transformation in Caldicellulosiruptor species results in efficient marker
replacement. Biotechnol. Biofuels 6, 82
65. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic
engineering of Caldicellulosiruptor bescii yields increased hydrogen production from
lignocellulosic biomass. Biotechnol. Biofuels 6, 85
66. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods
Enzymol. 498, 349-361
67. Studier, F. W. (2005) Protein production by auto-induction in high density shaking
cultures. Protein Expr. Purif. 41, 207-234
68. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.
(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic
euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894
69. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)
Improved growth media and culture techniques for genetic analysis and assessment of
biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,
41-49
70. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-
Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative
129
analysis of extremely thermophilic Caldicellulosiruptor species reveals common and
differentiating cellular strategies for plant biomass utilization. Appl. Environ.
Microbiol. 81, 7159-7170
71. Petersen, T. N., Brunak, S., von Heijne, G., and Nielsen, H. (2011) SignalP 4.0:
discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786
72. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-
268
73. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing
sugar. Anal. Chem. 31, 426-428
74. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose
assay for endoglucanase activity. Anal. Biochem. 342, 176-178
75. Xiao, Z., Storms, R., and Tsang, A. (2004) Microplate-based filter paper assay to
measure total cellulase activity. Biotechnol. Bioeng. 88, 832-837
76. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside
hydrolase inventory drives plant polysaccharide deconstruction by the extremely
thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.
108, 1559-1569
77. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,
Adams, M. W., and Westpheling, J. (2011) Natural competence in the
130
hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:
construction of markerless deletions of genes encoding the two cytoplasmic
hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238
78. Zhao, Y., Meng, K., Luo, H., Huang, H., Yuan, T., Yang, P., and Yao, B. (2013)
Molecular and biochemical characterization of a new alkaline active multidomain
xylanase from alkaline wastewater sludge. World J. Microbiol. Biotechnol. 29, 327-
334
79. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and
biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase
by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,
e84172
80. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.
O. (2012) Molecular and biochemical analyses of the GH44 Module of
CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium
Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059
81. Brunecky, R., Alahuhta, M., Bomble, Y. J., Xu, Q., Baker, J. O., Ding, S. Y.,
Himmel, M. E., and Lunin, V. V. (2012) Structure and function of the Clostridium
thermocellum cellobiohydrolase A X1-module repeat: enhancement through
stabilization of the CbhA complex. Acta Crystallogr. D Biol. Crystallogr. 68, 292-
299
131
82. Schneewind, O., and Missiakas, D. M. (2012) Protein secretion and surface display in
Gram-positive bacteria. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 367, 1123-1139
83. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,
H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic
bacteria. FEMS Microbiol. Ecol. 28, 99-110
84. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and
Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from
the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,
333-340
85. Kureishy, N., Sapountzi, V., Prag, S., Anilkumar, N., and Adams, J. C. (2002)
Fascins, and their roles in cell structure and function. Bioessays 24, 350-361
86. Thomas, C., Tholl, S., Moes, D., Dieterle, M., Papuga, J., Moreau, F., and Steinmetz,
A. (2009) Actin bundling in plants. Cell Motil. Cytoskeleton 66, 940-957
87. Charrier, B., Le Bail, A., and de Reviers, B. (2012) Plant proteus: brown algal
morphological plasticity and underlying developmental mechanisms. Trends Plant
Sci. 17, 468-477
88. Verma, D., and Hong, Z. (2001) Plant callose synthase complexes. Plant Mol. Biol.
47, 693-701
132
89. Dumas, C., and Knox, R. B. (1983) Callose and determination of pistil viability and
incompatibility. Theor. Appl. Genet. 67, 1-10
90. Harada, T. (1992) The story of research into curdlan and the bacteria producing it.
Trends Glycosci. Glycotechnol. 4, 309-317
91. VanFossen, A. L., Verhaart, M. R., Kengen, S. M., and Kelly, R. M. (2009)
Carbohydrate utilization patterns for the extremely thermophilic bacterium
Caldicellulosiruptor saccharolyticus reveal broad growth substrate preferences. Appl.
Environ. Microbiol. 75, 7718-7724
92. Liebl, W., Winterhalter, C., Baumeister, W., Armbrecht, M., and Valdez, M. (2008)
Xylanase attachment to the cell wall of the hyperthermophilic bacterium Thermotoga
maritima. J. Bacteriol. 190, 1350-1358
93. Marchler-Bauer, A., Derbyshire, M. K., Gonzales, N. R., Lu, S., Chitsaz, F., Geer, L.
Y., Geer, R. C., He, J., Gwadz, M., Hurwitz, D. I., Lanczycki, C. J., Lu, F., Marchler,
G. H., Song, J. S., Thanki, N., Wang, Z., Yamashita, R. A., Zhang, D., Zheng, C., and
Bryant, S. H. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Res.
43, D222-D226
94. Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang,
N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L.,
Eddy, S. R., Bateman, A., and Finn, R. D. (2012) The Pfam protein families database.
Nucleic Acids Res. 40, D290-301
133
95. Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013) MEGA6:
Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725-
2729
96. Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Res. 32, 1792-1797
134
CHAPTER 3
Functional analysis of the Glucan Degradation Locus (GDL) in Caldicellulosiruptor bescii reveals roles of component glycoside hydrolases in plant biomass deconstruction
To be submitted to Molecular Microbiology
135
Abstract
The Glucan Degradation Locus (GDL) in extremely thermophilic Caldicellulosiruptor species encodes multi-domain glycoside hydrolases (GHs) that are essential for deconstruction of plant biomass polysaccharides. The genomes of the most prolific lignocellulose degraders in the genus contain six identifiable GHs in the GDL, all of which have cellulose-binding modules (CBM3) and four of which have catalytic domains (GH9,
GH44, and GH48) related to crystalline cellulose hydrolysis. The genes encoding the six
GHs in the Caldicellulosiruptor bescii GDL were systematically deleted prior to examining the extent to which the resulting mutants could solubilize crystalline cellulose (Avicel) and plant biomasses switchgrass and poplar. Together three of the GHs, Athe_1867, Athe_1859, and Athe_1857, are responsible for about 90% of the ability to degrade crystalline cellulose and these enzymes act jointly to degrade crystalline cellulose in vivo. Additionally, the relative importance of the different GHs in the GDL varies with plant biomass substrates, with Athe_1867 being most important for poplar degradation and Athe_1857 being most important on switchgrass. Mixed cultures containing two different knockout strains revealed that the presence of all the GDL GHs, even when produced from different strains, degrades switchgrass equally as well as wildtype, suggesting the mix of enzymatic activities of the secretome enzymes is the essential factor for achieving the highest degradation. Full-length, glycosylated versions of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 were produced recombinantly in the native host and examined in vitro separately and in cocktails for crystalline cellulose (Avicel) hydrolysis. While Athe_1867 is the most prolific cellulase, its synergistic action with the other GDL GHs improves its degradation of cellulose two-fold.
136
Further, all of the GH enzymes exhibit some degree of synergy with other GDL GHs, with the pair of Athe_1867 and Athe_1859 exhibiting increased activity 3.3-fold over the activity from their separate contributions. These results demonstrate that the GDL GH enzyme inventory in Caldicellulosiruptor species act to effectively drive maximal degradation of plant biomass substrates.
Introduction
Anaerobic, Gram-positive bacteria from the genus Caldicellulosiruptor are the most thermophilic lignocellulosic biomass degraders currently identified (1,2). These species can be isolated from globally distributed terrestrial hot springs and have optimal growth temperatures between 65-75°C. Caldicellulosiruptor species can degrade unpretreated lignocellulosic substrates, driven by a variety of large, multi-domain carbohydrate active enzymes (CAZymes), including glycoside hydrolases (GHs) (3). These large, single- polypeptide enzymes represent an alternative paradigm for plant biomass degradation compared to fungal or cellulosomal plant biomass degraders (1,4).
Across the 12 Caldicellulosiruptor species that have been isolated, genome sequenced and characterized, the ability to degrade crystalline cellulose varies (5). As the main recalcitrant carbohydrate component of the plant cell wall, efficient degradation of crystalline cellulose is essential to plant biomass conversion. Comparative genomic and phenotypic analyses of the species in the genus have identified CAZy domains that are common to the highly cellulolytic species, including carbohydrate binding module (CBM) family 3 and glycoside hydrolase (GH) families 9, 44, and 48 (6). The genes containing these
137
cellulolytic enzymes are concentrated in one genomic locus, which will be referred to as the
Glucan Degradation Locus (GDL) (5). In C. bescii, the most studied member of the genus, this locus is approximately 50 kb of the 2.9 Mb genome and contains six glycoside hydrolase enzymes (Athe_1867, 1866, 1865, 1860, 1859, and 1857), all of which contain repeated cellulose binding modules from CBM family 3 and Pro/Thr-rich linker regions between
CAZyme domains. The repeated CBM3 domains and linkers, as well as repetitive GH5 and
GH48 sequences, are characteristic of the architecture of the GDL in Caldicellulosiruptor species and are hypothesized to be the result of recombination after gene duplication events
(1,7,8). In fact, the large repetitive sequences of this locus led to errors in genome assembly of the GDL in the initial C. bescii genome sequence (9); this was recently corrected by locating the genes Athe_1859-1857 between Athe_1867 and Athe_1866 (10). Transcriptomic and proteomic analyses of highly cellulolytic Caldicellulosiruptor species show that the GDL is highly transcribed and expressed both during growth on crystalline cellulose and switchgrass (3,11,12).
The most extensively studied enzyme from Caldicellulosiruptor species is the first gene encoded in the GDL, i.e., a GH9/GH48 cellulase referred to as CelA (Athe_1867 in C. bescii). CelA is highly active on crystalline cellulose and one of the most highly expressed proteins in the extracellular proteome (12-14). The N-terminal GH9 domain of CelA functions as an endoglucanase, while the C-terminal GH48 domain is a cellobiohydrolase releasing primarily cellobiose from crystalline cellulose (15). Homologs of CelA (seven full and one partial) are found in the genomes of eight Caldicellulosiruptor species, which represent the most highly cellulolytic members of the genus. CelA displays a surface ablasion
138
mechanism common to processive cellulases. CelA also has a unique cavity digging mechanism, allowing it to burrow into cellulose microfibrals, making it one of the most effective cellulases characterized to date (14).
While CelA has been characterized most extensively, the other five enzymes from the
GDL of C. bescii, with the exception of GH74/GH48 Athe_1860, have been characterized to varying extents (16-19). All of the characterized enzymes have an optimal temperature between 85-90°C and an optimal pH between 5 and 6.5. While CelA is particularly well suited for crystalline cellulose degradation, all of the other characterized enzymes from the
GDL are also active on cellulosic substrates. This includes the GH10 domain of Athe_1857, which displays broader substrate specificity, including on β-glucans, than is typical for enzymes with GH10 domains that are typically active only on xylans (19). In addition to the
β-glucan activities encoded in enzymes of GDL, a highly repetitive GH5 domain, common to
Athe_1859, Athe_1866, and Athe_1865, is active on mannan-containing substrates. Taken together, the GH domains of the GDL from families 5, 9, 10, 44, and 48 show activity on a broad collection of polysaccharide substrates that make up complex plant biomass, including
β-glucans, mannans, and xylans. This wide range of activities would suggest that GDL enzymes play a critical role in enabling Caldicellulosiruptor spp. to scavenge growth substrates from a broad array of polysaccharides encountered in their globally diverse thermal environments.
The predicted molecular weight of CelA, inferred from amino acid sequence, is 190 kDa, although the protein has an apparent Mr of 230 kDa as measured in culture supernatants from C. bescii, the result of glycosylation that appears to be related to secretion (20).
139
Similarly, the other GDL enzymes appear at larger molecular masses than predicted by their amino acid sequence alone. The effect of glycosylation and other native post-translational modifications on these enzymes is largely unknown, as the majority of characterized GDL enzymes were produced in other hosts (Escherichia coli or Bacillus megaterium). Only CelA
(Athe_1867), which was purified natively from C. bescii, has been investigated in its native glycosylated state (13,14). Further, the size and repetitive nature of these genes makes producing recombinant full length versions of these enzymes in traditional protein expression host microorganisms difficult, leading to the characterization of only truncation mutants of the largest enzymes from the GDL.
Here, the individual and collective roles of the GHs encoded in the GDL are examined, as these relate to the deconstruction of crystalline and plant biomass by C. bescii.
This study was made possible by the recent corrections in the GDL genome sequence and improvements in C. bescii genetics (10,21) that facilitated gene deletions and overexpression of the native GHs in this bacterium.
Experimental Procedures
Bacterial Strains, Plasmids and Reagents
For vector construction, NEB 5-alpha and 10-beta E. coli (New England Biolabs) were used. Cloning in E. coli was performed in pET28b (Novagen) and protein was expressed in Rosetta 2 (DE3) E. coli (Millipore) were used for the expression of proteins in
E. coli. An axenic strain of C. bescii was obtained from the Leibniz Institute DSMZ-German
Collection of Microorganisms and Cell Cultures. C. bescii strain JWCB005 (22) and non-
140
replicating vector pDCW121 (23) and replicating shuttle vector pDCW89 (24) were obtained from Dr. Janet Westpheling (University of Georgia, Athens, Georgia). Caldicellulosiruptor genomic DNA (gDNA) for cloning, vector construction, and genome sequencing was extracted, as described previously (25). Genes of interest were amplified by polymerase chain reaction (PCR) and inserted into vectors using Gibson assembly (26), using Gibson
Assembly Mastermix (New England Biolabs). The addition of terminators was performed by placing the sequences on primers and using the Q5 site-directed mutagenesis kit (New
England Biolabs) for the insertion. Plasmids were isolated using ZymoResearch plasmid miniprep classic and ZymoPURE midiprep kits (Zymoresearch); sequences were confirmed by Sanger sequencing (Genewiz). Carbohydrates and biomass used in this study are as follows: Avicel PH-101 crystalline cellulose, Cave-in-rock switchgrass (Panicum virgatum
L.) cultivar Cave-in-Rock field grown in Monroe County, IA, obtained from the National
Renewable Energy Laboratory (NREL) was ground using a Wiley mill (Thomas Scientific) and sieved to 40/80 mesh. Poplar from four-year-old Poplar trichocarpa variants GW-9947 and GW-9762 grown in Clatskanie, OR were milled using a 20-mesh screen on a Wiley-mill
(Thomas Scientific) and were obtained as such from Dr. Gerald A. Tuskan (Oak Ridge
National Lab). The growth conditions for these natural variant poplars were described previously (27,28). Carbohydrates used for enzyme activity assays are as follows: low viscosity barley β-glucan (Megazyme), tamarind xyloglucan (Megazyme), wheat arabinoxylan (Megazyme), 1,4-β-D mannan (Megazyme), lichenan (Megazyme), locust bean gum glucomannan (Sigma Aldrich), birch wood xylan (Sigma Aldrich), oat spelt xylan
141
(Sigma Aldrich), beech wood xylan (Sigma Aldrich), ivory nut mannan (Megazyme), and carboxymethyl cellulose (CMC) (Sigma Aldrich).
Media and Culture Conditions
All E. coli cultures were maintained in either Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or LB medium with 1.5% (w/v) agar plates. Media was supplemented with 50 μg/mL kanamycin (Fisher Scientific), 50 μg/mL apramycin
(Fisher Scientific), and/or 34 μg/mL chloramphenicol (Sigma Aldrich), as appropriate.
Athe_0458 was expressed in E. coli using ZYM-5052 auto-induction medium, as described previously (29).
C. bescii strains were grown anaerobically with 20% CO2 / 80% N2 headspace at
70°C without agitation in the following growth media. For routine cultivation, a modified version of DSM516 medium (DSMZ [http://www.dsmz.de]) was used, containing 0.33 g/L
NH4Cl, 0.33 g/L KCl, 0.33 g/L MgCl2 ● 6H2O, 0.14 g/L CaCl2 ● 2H2O, 0.16 mM sodium tungstate, 1mL/L trace element solution SL-10, 1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, 1 g/L sodium bicarbonate, 1mM potassium phosphate, 5 g/L glucose or cellobiose. The medium as prepared above is defined and is designated D516 medium. A complex version of this medium, designated C516 medium, is supplemented with 0.5 g/L yeast extract. To grow C. bescii for competent cell preparation,
LOD medium (30), containing either cellobiose or glucose as the carbon source, was supplemented with 1 x 19 amino acid solution (31). C. bescii was plated by embedding in
LOD medium, D516 medium, or C516 medium with 1.5% agar and grown at 65°C in
142
anaerobic conditions with 2% hydrogen/98% nitrogen. For the growth of all uracil auxotrophic mutants, the medium was supplemented with 40 mM uracil.
For solubilization experiments Caldicellulosiruptor strains were grown on a modified defined version of DSM671 medium, containing 1 g/L NH4Cl, 0.02 g/L NaCl, 0.1 g/L MgCl2
● 6H2O, 0.05 g/L CaCl2 ● 2H2O, 0.06 g/L K2HPO4, 1mL/L trace element solution SL-10,
1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, and 2.6 g/L sodium bicarbonate. Solubilization cultures contained 5 g/L plant biomass substrate as the carbon source and were supplemented with 40 mM uracil to support growth of auxotrophic mutants. Fifty µg/mL kanamycin was added to the solubilization cultures for strains resistant to kanamycin.
The vitamin solution for all media recipes contains the following: 20 mg/L biotin, 20 mg/L folic acid, 100 mg/L Pyridoxine-HCl, 50 mg/L Thiamine-HCl ● 2H2O, 50 mg/L riboflavin, 50 mg/L nicotinic acid, 50 mg/L D-Ca-pantothenate, 50 mg/L Vitamin B12, 50 mg/L p-Aminobenzoic acid, and 50 mg/L Lipoic acid.
Cloning and Purification of β-glucosidase Athe_0458
The gene for Athe_0458 was cloned via Gibson assembly (26) using Gibson
Assembly Mastermix (New England Biolabs) into pET28b with a C-terminal histidine tag and transformed into 5-alpha E. coli. Sequence confirmed plasmid was transformed into
Rosetta 2 (DE3) E. coli and protein was expressed in a 1L ZYM-5052 auto-induction media culture with incubation at 37°C with agitation at 250 rpm. Cells were harvested after 20 hours and frozen at -20°C prior to protein purification. The frozen cell pellet was thawed and
143
re-suspended with 5 mL / g wet cell pellet in 50 mM sodium phosphate, pH 7.4, 500 mM
NaCl, 20 mM imidazole containing 1 mg / mL lysozyme (ThermoFisher Scientific). Cells were lysed using a French press at 16,000 psi cell pressure. The lysate was heat treated at
65°C for 20 minutes to remove heat labile E. coli proteins and then centrifuged at 18,000 x g for 30 minutes and 0.22 µm filtered. The cell extract was loaded on a 5 mL HisTrap HP nickel-Sepharose (GE Healthcare) column, operated according to the manufacturer’s instructions using a Biologic DuoFlow FPLC (Bio-Rad). Fractions containing Athe_0458 were combined, concentrated, and buffer exchanged using a 10,000 MWCO Vivaspin PES concentrator.
Construction of Genetic Knockout Strains of C. bescii
The following knockout vectors were constructed via Gibson assembly in the pDCW121 vector backbone (23) using 1kb flanking regions to the desired gene deletion location: pJMC021 for RKCB120, RKCB121, and RKCB123; pJMC033 for RKCB124; pJMC034 for RKCB125 and RKCB127; pJMC069 for RKCB130; and pJMC070 for
RKCB132. In all of these vectors, a 200bp upstream of the Athe_2303 S-layer protein, designated the Pslp promoter, and the CbHTK kanamycin resistance gene were placed between the 1kb flanking regions. Using this vector design the genes knocked out in each strain were replaced by this PslpCbHTK cassette to allow for direct selection of second crossovers using kanamycin, as described previously (21).
To prepare competent cells, a 500 mL culture of C. bescii strain JWCB018 was grown to an optical density at 680 nm of 0.06-0.08 in LOD medium supplemented with 1 x
144
19 amino acid solution. The culture was cooled to room temperature and then cells were harvested. Following this, the cell pellet was washed three times in 10% sucrose solution. All centrifugation steps were carried out at 6000 x g for 10 min. The final cell pellet was re- suspended with 10% sucrose to a total volume of 100-120 µL. Cells (50 µL) were mixed with 1 µg of plasmid DNA at room temperature and electroporated using 1 mm gap cuvettes
(USA Scientific) and a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) using the following conditions: 2.0 kV, 200 Ω, and 25 µF. Immediately following electroporation, cells were transferred to 10 mL of pre-warmed complex medium and incubated at 70°C. At 15 minutes and one hour post-electroporation, 1-5 mL of the recovery culture was transferred into selective medium supplemented with 50-100 µg/mL kanamycin.
Cultures were incubated at 70°C for 1-4 days, until growth was observed. This culture was passaged on liquid selective medium once, and then plated on selective medium lacking uracil to select individual colonies. For integrating vectors, a successful first crossover mutant strain was plated on medium supplemented with 40 mM uracil, 4 mM 5-fluoroorotic acid (5-FOA), and 50 µg/mL kanamycin to select for second crossover mutants. Strains were screened by PCR and successful second crossover strains were plated on medium supplemented with uracil and kanamycin two additional times to ensure purity of the strains.
All PCR screening of modified strains were performed on genomic DNA, isolated using the
ZymoBead™ genomic DNA kit (Zymo Research).
145
Genomic Sequencing of Modified C. bescii Strains
Assembled draft genomes sequences were generated by the US Department of
Engery’s Joint Genome Institute (JGI) (32) for C. bescii DSMZ 6725 (wildtype resequencing), JWCB005, RKCB120, RKCB121, RKCB123, RKCB124, RKCB125,
RKCB127, RKCB130, and RKCB132. Sequencing was performed on the Pacific
Biosciences (PacBio) RS / RS II platform with SMRTbell libraries (33).
Secretome Analysis
C. bescii strains were grown in 300 mL D516 medium cultures containing cellobiose as the substrate at 70°C. Growth was monitored by measuring optical density at 600 nm
(OD600) and cultures were harvested during early stationary phase (1-3 x 108 cells/mL) by rapidly cooling the culture in a dry ice bath and centrifuging at 8000 x g for 10 min. The culture supernatant was reserved and stored at 4°C. Cells were re-suspended, transferred to a microcentrifuge tube and centrifuged at 21,000 x g for 1 min. The supernatant was removed and cells were stored at -80°C prior to RNA extraction.
The culture supernatant was centrifuged at 18,000 x g for 20 min to remove any residual cell debris and concentrated and buffer exchanged into 50 mM sodium acetate buffer pH 5.5 containing 10 mM calcium chloride and 100 mM sodium chloride using 50,000
MWCO 20 mL Vivaspin concentrators to a final volume of 5 mL. Total protein concentrations were determined using the BCA assay (Thermo Fisher Scientific). A Mini-
PROTEAN TXG stain free 4-15% gel (Bio-Rad) was run using these supernatant samples with equal total protein loading across all strains. Glycoprotein visualization was performed
146
by staining the gel with the Pierce Glycoprotein Staining Kit (ThermoFisher Scientific). Gel imaging was performed using the Syngene G:BOX gel imaging system.
Construction of Enzyme Expression Strains of C. bescii
Vector pJMC046 was constructed from pSBS4 through several manipulations of
Gibson assembly and site directed mutagenesis. The vector is selected in both E. coli and C. bescii using 50 µg/mL kanamycin. Purified vectors were methylated using recombinantly expressed M.CbeI methylase, as described previously (34). Following methylation, the vector was purified by extracting once with phenol:chloroform:isoamylalcohol (24:25:1) and performing ethanol precipitation. The final plasmid was re-suspended in distilled water. For protein expression strains, electroporations were carried out as described for genetic knockout strains using C. bescii DSMZ wildtype as the parent strain. After growth under selective conditions, these strains were plated once on C516 medium containing glucose and grown in the same liquid medium to obtain the final strains. To verify replicating vector strains, plasmid DNA was isolated from C. bescii using the ZymoResearch plasmid miniprep classic kit was transformed into 5α E. coli. Plasmid DNA was then isolated from colonies of
E. coli and restriction mapped to verify plasmid integrity.
Protein Expression and Purification from C. bescii Strains
Protein expression strains of C. bescii were grown overnight in 500 mL C516 medium containing glucose and 50 µg/mL kanamycin at 70°C and inoculated into a Sartorius
Stedim BioStat Cplus 20L bioreactor containing 17 L of C516 medium with glucose and 50
147
µg/mL kanamycin. The bioreactor was operated at 70°C with agitation at 200 rpm, sparging with 1 L/min 20% CO2/80% N2 gas mix; pH was controlled with NaOH addition at pH 7.00.
The reactor was operated for 22 hours, then cooled to 20-30°C and harvested into a carboy.
The fermentation broth was fractionated using the Millipore Pellicon Mini Tangential Flow
Filter using a Pellicon 2 mini 0.22 µm GVPP 0.1 m2 filter (Millipore) to separate cells from the extracellular protein fraction. This extracellular protein fraction was concentrated as the retentate fraction from the TFF system using a Pellicon 2 mini BioMax 50 kDa cutoff PES
0.1 m2 filter (Millipore) and buffer exchanged into 20 mM sodium phosphate 500 mM sodium chloride buffer pH 7.4. The fermentation broth from 17 L was concentrated to a final volume of 300-800 mL. The final material was sterile-filtered prior to loading it onto a 5 mL
HisTrap HP nickel-Sepharose (GE Healthcare) column, operated according to the manufacturer’s instructions using a Biologic DuoFlow FPLC (Bio-Rad). Protein elution fractions were concentrated and buffer exchanged using 20 mL 50,000 MWCO Vivaspin concentrators. Purity of the proteins was evaluated by SDS-PAGE using Mini-PROTEAN
TGX Stain-Free 4-15% gels (Bio-Rad) with Benchmark protein ladder (Thermo-Fisher).
Protein concentrations were determined using the Pierce BCA protein assay (Pierce, Thermo-
Fisher). Glycoprotein visualization was performed by using the Pierce Glycoprotein Staining
Kit (ThermoFisher Scientific).
Solubilization of Lignocellulosic Biomass Substrates
Prior to starting the solubilization experiment, Caldicellulosiruptor strains were passaged on modified defined DSM671 medium with each of the biomass substrates 3 times
148
to ensure growth on the biomass substrate as the sole carbon source. Fifty mL solubilization cultures were prepared in triplicate for each substrate loaded at 5 g/L and strain in 125 mL serum bottles. Caldicellulosiruptor strains were inoculated into these cultures at a density of
1 x 106 cells/mL. For mixtures of two strains in the same culture, each strain was added at a density of 5 x 105 cells/mL. After growth for 7 days at 70°C, cultures were harvested by centrifugation at 6000 x g for 10 min and washed 3 times with 70°C sterile water. The residual substrate was dried at 70°C until constant mass. The percent solubilization was determined from the difference in mass between the biomass used to prepare each culture and the residual remaining after harvest and drying.
Biochemical Characterization of GDL Glycoside Hydrolases
The activity of Athe_1860 was measured by incubating 1 mg/mL enzyme with 1% polysaccharide substrate at 70°C for 4 hours in 50 mM sodium acetate buffer pH 5.5, containing 10 mM calcium chloride and 100 mM sodium chloride. The pH optimum was determined at 70°C in buffers at pH 3.5-10 (50 mM citrate buffer pH 3.5; 50 mM sodium acetate buffer pH 4.5, 5, 5.5, and 6; 50 mM sodium phosphate pH 6.5, 7, and 8; 50 mM sodium bicarbonate pH 9.2 and 10). The temperature optimum was determined between
55°C and 100°C. Activity was measured using the 3,5-dinitrosalicylic acid (DNS) reducing sugar assay (35,36) adapted to a 96 well format (37). The DNS-reagent used in this study contained 1.6% (w/v) sodium hydroxide, 30% (w/v) sodium potassium tartrate, and 0.1%
(w/v) 3,5-dinitrosalicylic acid. Enzyme reactions were mixed with two volumes of DNS reagent and heated in a thermocycler at 95°C for 5 min, 48°C for 1 min, followed by holding
149
at 20°C. Thirty-six µL of this reaction was mixed with 160 µL distilled water in a 96-well plate and read at 540 nm. Mannose, glucose, and xylose were standards were run alongside the samples to convert absorbance readings to reducing sugar equivalents, as appropriate.
Enzyme cocktail experiments were performed in 50 mM sodium acetate buffer pH
5.5 containing 10 mM calcium chloride and 100 mM sodium chloride with 5 g/L Avicel. To approximate the C. bescii secretome, a mix containing a 25:2:10:1 mass ratio of Athe_1867 :
Athe_1859 : Athe_1857 : Athe_1860 was used based on densitometry of the protein gel for the C. bescii secretome. For a 1mL assay, 100 µg of GDL GH protein was loaded with and without 1µg β-glucosidase Athe_0458. This corresponded to 65.79 µg Athe_1867, 5.26 µg
Athe_1859, 26.32 µg Athe_1857, and 2.63 µg Athe_1860 loading as the wildtype mixture.
To approximate the knockout strains, the enzyme or enzymes knocked out of strains
RKCB120, RKCB121, RKCB124, RKCB125, RKCB130, and RKCB132 were removed from the wildtype enzyme mix and replaced with an equivalent volume of buffer. To analyze the potential contribution of overexpression of enzymes downstream of the knockout location and insertion of the Pslp CbHTK cassette, assays were also prepared with the enzyme mixes of
RKCB120, RKCB130, and RKCB132 with 1.5 x and 3 x the amount of Athe_1859 and/or
Athe_1857, the genes downstream of the deletion sites in these strains. These enzyme assays were incubated statically, to approximate the solubilization culture conditions, at 70°C for 24 hours. To further evaluate the enzyme loading, assays were also conducted with 100 µg, 10
µg, and 1 µg GDL GHs with and without 1 µg β-glucosidase incubated statically for seven days at 70°C. To evaluate the synergism of the GDL enzymes, assays were prepared with a constant 0.5 µM enzyme loading and all possible combinations of 1, 2, 3, and 4 enzymes
150
loaded in equal molar ratios, with and without 1 µg/mL β-glucosidase. This assay was incubated at 70°C with agitation at 500rpm in an Eppendorf thermomixer for 24 h. At the conclusion of all assay incubations, samples were boiled for 15 min to inactivate the enzymes. Cellobiose and glucose release was quantified using an Aminex-87H column
(300mm by 7.8 mm, Bio-Rad) operated with a 5 mM H2SO4 mobile phase at 60°C and 0.6 mL/min. Specific activity units (U) are µmol reducing sugar equivalents released per min for
DNS analyzed samples and µmol glucose equivalents released per min for HPLC analyzed samples. Solubilization is calculated as a percentage of the anhydrous glucan content of the
Avicel loaded in the assay.
Results
The C. bescii Glucan Degradation Locus
The Glucan Degradation Locus (GDL) is highly conserved in the seven most highly cellulolytic Caldicellulosiruptor species characterized to date, including C. bescii. Figure
3.1 shows the C. bescii GDL along with transcriptomic data collected for C. bescii grown on crystalline cellulose (Avicel) and switchgrass, and proteomics data on the extracellular protein fraction of C. bescii grown on six different growth substrates from simple sugars to complex plant biomass. C. bescii has six glucan-degrading enzymes in this locus, which each contain two catalytic GH domains, shown as boxes containing the GH family number in
Figure 3.1A. All of these enzymes contain either two or three cellulose binding modules from CBM family 3. While these GHs represent the core degradation ability of the GDL,
151
other proteins and enzymes encoded in the locus likely contribute to C. bescii’s plant biomass degradation ability.
Upstream of Athe_1867 (CelA) are previously characterized cellulose binding proteins, termed tāpirins, Athe_1871 and Athe_1870 (38). The location of these cellulose- binding proteins near highly cellulolytic enzymes is likely not coincidental. The tāpirins putatively play an important role in cell attachment and anchoring to cellulose, which, in turn, aid in the localization of the secretome, including the GDL enzymes in proximity to the substrate. Located downstream of the GHs in the GDL are three polysaccharide lyase encoding genes: Athe_1855-1853. Athe_1854 has been previously characterized and a crystal structure for the PL3 domain has been solved (39,40). These genes have also previously been knocked out of genetically tractable C. bescii strain JWCB005 and the resulting strain JWCB010 not surprisingly showed decreased ability to grow on pectin-based substrates (41). These three PL genes are not found in the GDLs of all the seven highly cellulolytic species. For example, the GDL of C. saccharolyticus is missing all three PL genes, while the GDL of C. obsidiansis is missing the PL9 and PL11, and in C. krontoskyensis the GDL is missing only the PL9.
In between the GH enzymes Athe_1865 and Athe_1860, are four genes that are highly conserved in all seven highly cellulolytic Caldicellulosiruptor species. Athe_1864 and
Athe_1863 belong to glycosyl transferase protein families that are involved in O- mannosylation of proteins. Protein O-mannosyltransferases (PMTs) are found in both prokaryotes and eukaryotes, and typically have several hydrophobic transmembrane domains
(42). Because of their location in the cytoplasmic membrane in bacteria, glycosylation is
152
thought to be coordinated with secretion (43). This was seen previously for Athe_1867 in C. bescii where extracellular protein was glycosylated, but intracellularly expressed protein lacking the signal peptide was not glycosylated (20). O-mannosylation of various proteins from Gram-positive Mycobacterium species was found on Thr residues particularly in Pro,
Thr, Ala -rich sequences (44,45). Glycosylation patterns for the GDL GHs are not known, but highly Pro/Thr- rich linker sequences are prevalent in these proteins and may be the site of glycosylation.
Athe_1862 encodes a homolog to KWG Leptospira repeat proteins, so named because of their high abundance in Leptospira interrogans proteins, although its function is unknown. Athe_1861 is annotated as a peptidyl-prolyl cis-trans isomerase (PPIase), which are involved in the interconversion of the cis and trans forms of proline during protein maturation. Several types of PPIases are found in bacteria, but Athe_1861 has some sequence similarity to PrsA PPIases, which are thought to be anchored between the cell membrane and peptidoglycan and help to re-fold proteins as they are exported (46,47). This enzyme may play a role in helping in the export and proper extracellular folding of the GDL enzymes, particularly for their Pro/Thr-rich linkers.
To better understand the transcriptional and translational patterns of the GDL during growth on cellulose and plant biomass, previously reported transcriptomic (3) and proteomic data (48) were re-examined in C. bescii (see Figure 3.1B). The GDL GH-containing genes are very highly transcribed during growth on Avicel and switchgrass (3). For C. bescii growth on switchgrass, Athe_1867, Athe_1857, Athe_1865, and Athe_1860 are among the top 15 transcripts (top 0.5%) for the 2780 total genes in the genome. One notable exception
153
to this is GH5/GH44 Athe_1859, which is transcribed at approximately 6-fold lower levels than CelA (Athe_1867). Athe_1860 is separated from the other GDL GHs by the four genes
Athe_1864-1861, which are transcribed at more modest levels. Nevertheless, Athe_1860 is transcribed at comparable levels to the other GH-containing genes. This suggests that although Athe_1860 belongs to a separate operon, there may be coordinated regulation with the other GDL GHs.
Normalized matched ion intensity measurements of the secretome proteins were also analyzed with respect to the GDL from C. bescii grown on various substrates, including soluble sugars (glucose, cellobiose, xylose), biomass polymers (crystalline cellulose Avicel and xylan), and a complex biomass (switchgrass) (48). Normalization of assembled proteins was done using peptide ion intensity values, as described previously (49). These data reflect the transcriptomics, in that the GHs of the GDL are among the most abundant proteins in the extracellular protein fraction. For example, of the 470 proteins detected on Avicel crystalline cellulose and of the 330 detected on switchgrass, all of the GH-containing enzymes of the
GDL are in the top 10% most abundant of these proteins. Based on these proteomics data, the
GH48-containing enzymes Athe_1867, Athe_1859, and Athe_1860 are more abundant by approximately 4-8 fold than their non-GH48 counterparts Athe_1859, Athe_1866, and
Athe_1865. The tāpirin proteins (Athe_1870 and Athe_1871), as well as the polysaccharide lyases (Athe_1855-1853), are also measured in all of these samples at levels about 2-6 fold lower than the lowest GDL GH.
The structure of the operons in the GDL is of interest when considering the transcriptomic and proteomic profiles of the very large GH enzyme genes (3.9-5.7 kb) and
154
the accessory genes present here. To this end, TransTermHP (50) predictions of rho- independent terminators in this locus were investigated. Predicted terminator sequences are shown in Figure 3.1C along with the genes they follow and the TransTermHP combined score, which is related to the probability of a terminator of equal or better quality arising in a random sequence taking into account the GC content of the species. The scores generated by
TransTermHP range from 0 for poor quality terminator predictions to 100 for high quality terminator predictions. In the GDL, predicted terminators with score 100 appear after
Athe_1871, Athe_1868, Athe_1867, Athe_1857, and Athe_1860. Terminators are also predicted after CAZymes Athe_1859, Athe_1865, and Athe_1854, albeit with slightly lower scores (73, 77, and 82 respectively). No terminator is predicted between Athe_1866 and
Athe_1865, or between peptidyl-prolyl cis-trans isomerase Athe_1861 and GH74/GH48 enzyme Athe_1860. These terminator predictions would suggest all of the GH enzymes in the GDL are in separate operons, except Athe_1865 and Athe_1866 that appear to be in a single operon. The locations of these predicted terminators in the GDL are in good agreement with the transcriptomic and proteomic data.
Repetitive Nature of the GDL
The GDL is characterized by its repetitive CBM3, GH5, and GH48 domains, not only at the amino acid level but also at the nucleotide level. Recently, re-sequencing of the C. bescii DSMZ 6725 genome using a PacBio system revealed that the GDL was incorrectly assembled in the originally published sequence. In the C. bescii genome, Athe_1859-1857 is, in fact, between Athe_1867 and Athe_1866. This error occurred because the GH48
155
sequences in Athe_1867, Athe_1857, and Athe_1860 are identical and the short reads of 454 sequencing technology were not able to read through these > 1 kb repeats. Even manual curation of the sequence via Sanger sequencing lead to mistakes that misassembled this locus. To visualize the repetitiveness of the nucleotide sequence of the GDL in C. bescii, a dot plot shown in Figure 3.2 was constructed using EMBOSS dottup
(http://www.bioinformatics.nl/cgi-bin/emboss/dottup) with a sequence word length of 50 bp to compare the recently revised C. bescii sequence to itself. In Figure 3.2, black pixels indicate where a 50 bp sequence at a position on the x-axis is an exact match to 50 bp at a position on the y-axis of the sequence. The diagonal line from the bottom left to top right represents the direct match of the sequence to itself, but the large number of additional lines indicate many other areas where nucleotide sequences 50 bp or larger are repeated. Many of these areas correspond to the repeated CBM3 domains. To better visualize the repeated
GH48 domains (1.8 kb) and GH5 domains (1 kb), these areas have been shaded orange and green, respectively. Of particular note is the 5.7 kb repeated region stretching from the
CBM3s and GH48 of Athe_1867 through the GH5 and CBM3 domains of Athe_1859, which is repeated in the CBM3s, GH48, GH5, and CBM3s of Athe_1857 and Athe_1866, and partially repeated in the CBM3s and GH48 of Athe_1860. This extended large repeat is the reason the genome was originally misassembled. It is important to consider these extended repeats in C. bescii to avoid issues with many standard biomolecular laboratory procedures, including PCR, sequencing, and targeted gene deletions. To this point, strain JWCB029 of C. bescii, in which CelA was deleted, showed reduced ability to grow on crystalline cellulose as well as some lignocellulosic substrates, suggesting its important role in the degradation of
156
cellulosic substrates (51). Due to the repetitive nature of the GDL and the now corrected sequence, it is not clear whether this strain is in fact a deletion of only Athe_1867 CelA or a larger deletion that included Athe_1867, Athe_1859, and Athe_1857. The reverse screening primer used in this study binds in the repeated GH5 domain highlighted in green in Figure
3.2. So, while CelA was certainly deleted in strain JWCB029 (51), it cannot be concluded that only CelA and not a larger portion encompassing multiple enzymes was deleted. This study showed a severe growth phenotype on crystalline cellulose. But without knowing precisely which genes are deleted from the GDL, it is unclear that the absence of only CelA was responsible for this phenotype.
In vivo Analysis of GDL Enzymes through Targeted Gene Deletions
To further probe the role of CelA and the other GH-containing enzymes of the GDL, a series of targeted gene deletions were constructed in C. bescii strain JWCB005, which was also recently sequenced (10). To enable clear selection of modified strains and prevent reversion to the parent strain during counter-selection on 5-FOA, a codon-optimized, highly thermostable kanamycin resistance gene (CbHTK) (21), driven by the promoter (Pslp) from S- layer protein (Athe_2303), was used to replace the genes that were deleted in these strains.
The scheme for the GDL in the knockout strains constructed using these methods are shown in Figure 3.3A. PCR screening was used as an initial verification of the strains, but the repeated regions of the GDL cause primer binding in several locations leading to several
PCR products for each reaction. As a final verification of these strains, the whole genomes were re-sequenced using PacBio sequencing to obtain long reads allowing for the assembly
157
through the repeated areas of the GDL. All of the genome sequences from these strains verify that the knockout strains are missing the genes targeted for deletion and that these are replaced by the Pslp CbHTK construct. This method of replacing the gene with a strong promoter and kanamycin resistance gene enabled genetic manipulation in this complex area of the genome. But one consequence of this method of manipulation is that genes in the same operon as the gene which was deleted could be more highly transcribed and expressed than in the wildtype or JWCB005 parent strain. As seen in Figure 3.1, the GHs in the GDL are transcribed and expressed at high levels on all substrates tested, so that their increased production should not have a significant impact on phenotype, although this issue needs further examination. Additionally, based on the terminator predictions in the GDL, all of the
GDL GHs are in separate operons except for Athe_1866 and Athe_1865. Based on these predictions, all of the PslpCbHTK insertions would be immediately followed by the native terminator for the deleted genes, except in the case of strains RKCB123 and RKCB127 where the terminator falls after the GH5 domain remaining from Athe_1865. This would suggest overexpression of genes downstream is less likely to occur, as it should be mitigated by the terminators in the GDL.
To assess the extracellular proteins produced by these strains and the expression of the enzymes of the GDL, cultures of C. bescii wildtype, parent strain JWCB005, and all of the GDL knockout strains RKCB120-RKCB132 were grown on cellobiose. The supernatant from these cultures was concentrated and separated by SDS-PAGE (see Figure 3.3B). The predicted location of the enzymes of the GDL are shown on the left, based on the genes knockout strains and proteomics performed on C. bescii extracellular proteins previously
158
(52). Athe_1867 CelA is clearly the predominant band in both the C. bescii WT and strain
JWCB005 samples, suggesting it is the most highly expressed. The enzymes deleted in the various RKCB knockout strains were not detected. For example, strains RKCB120,
RKCB130, RKCB121, RKCB123 are all lacking CelA, and the expected band corresponding to Athe_1867 is clearly absent in these lanes. Strain RKCB123, which is missing all but
Athe_1860 and the GH5 of Athe_1865, shows no large extracellular proteins besides the band for Athe_1860 at approximately 240 kDa.
Cellulose and Complex Plant Biomass solubilization by GDL GH Knockout Strains
The extent to which Caldicellulosiruptor species solubilize different complex plant biomass substrates is a key measure of the extracellular enzyme inventory of these species and their function as consolidated biomass processing microorganisms. To this end, C. bescii wildtype, genetic parent strain JWCB005, and GDL enzyme knockout strains RKCB120-132 were grown on crystalline cellulose (Avicel), switchgrass, and two lines of natural variant poplar to assess their ability to solubilize them. The percent solubilization of these substrates was determined at 70°C in static cultures with defined media, where the plant biomass substrate was the sole carbon source. These data are shown in Figure 3.4 expressed as the biotic solubilization percent for each strain, which is directly related to the secretome enzyme inventory of each strain and ability to degrade and grow on the substrate. Biotic solubilization does not include the thermal contribution to solubilization in an abiotic control, which were 2.1% for Avicel, 19.6% for switchgrass, 10.6% for poplar GW-9947, and 10.7% for poplar GW-9762. In these experiments C. bescii wildtype and parent strain JWCB005
159
performed nearly identically, showing the modifications to strain JWCB005 that make it genetically tractable do not affect its ability to degrade these substrates. C. bescii strains
RKCB120-132 have various genes from the GDL knocked out and replaced with the
PslpCbHTK cassette. It is important to note that while it could be possible that the strong promoter of this insertion might drive higher expression of the genes in the locus, particularly downstream, any potential increase in enzyme expression would be expected to increase solubilization. Thus, decreases in plant biomass solubilization by a given strain are a demonstrable effect of the deletion of the gene(s) replaced by the PslpCbHTK cassette.
Additionally, the predicted terminators after Athe_1867, Athe_1859, Athe_1857, Athe_1865, and Athe_1860 would prevent the PslpCbHTK cassette from driving higher expression of genes downstream in all of the knockout strains.
On Avicel (Figure 3.4A), strains RKCB120, RKCB130, RKCB121, and RKCB123, which respectively have knockouts of one, two, three, and five genes starting at CelA
Athe_1867 and continuing downstream, solubilize Avicel significantly less than the wildtype. Lacking CelA alone in strain RKCB120 causes only about a 33% reduction in the biotic solubilization of the wildtype. The double knockout of Athe_1867 and Athe_1859 has a similar reduction (20%) in solubilization. Strain RKCB121, which is lacking Athe_1867,
Athe_1859 and Athe_1857, has a much more severe growth phenotype with a 90% reduction in biotic solubilization, suggesting these three enzymes are the primary cellulases for C. bescii. The individual deletion of Athe_1857 in strain RKCB125 shows no difference from the wildtype and individual deletion of Athe_1859 in strain RKCB132 solubilizes cellulose as well or slightly better than the wildtype. It is interesting that the three strains lacking only
160
an individual member of these three main cellulases (Athe_1867, Athe_1859, or Athe_1857) have at worst a 33% reduction in cellulose solubilization (RKCB120). However, the combined deletion of all three genes in RKCB121 reduces cellulose degradation 90%. This would suggest the action of Athe_1867, Athe_1859, and Athe_1857 is synergistic or coordinated in some way. Additionally, strains RKCB124 and RKCB127 showed no significant reduction in biomass solubilization compared to the wildtype, suggesting the other three GDL GHs contribute little to the degradation of crystalline cellulose.
On switchgrass (Figure 3.4B), a complex herbaceous biomass substrate containing various hemicelluloses in addition to cellulose, a different set of GDL GH enzymes appear to be most important for degradation of this substrate. Here, strain RKCB120 lacking CelA, only has a 20% reduction in the total solubilization of the wildtype. Successive deletions of more genes downstream of CelA in strains RKCB130, RKCB121, and RKCB123 further reduce the switchgrass solubilization. Unlike on Avicel, strain RKCB125, which is lacking only Athe_1857, shows a significant reduction in biotic solubilization. While this
GH10/GH48 enzyme is active on glucan substrates (19), GH10s are typically xylanases, and the importance of this enzyme on a complex substrate containing xylan is not surprising.
Athe_1866 and Athe_1865 contribute modestly to switchgrass solubilization, as their deletion from RKCB121 to RKCB123 and from RKCB125 to RKCB127 results in small reductions in solubilization. Additionally, strain RKCB124, with Athe_1860 deleted, has only a very slight reduction in solubilization from the wildtype.
Two natural variant lines of Poplar trichocarpa (GW-9947 and GW-9762) with reduced lignin content were used to assess biotic solubilization of a complex hardwood
161
substrate (Figure 3.4C&D). Poplar GW-9947 contains 22.7% lignin with 1.7 S/G ratio and
GW-9762 contains 21.7% lignin with 1.5 S/G ratio. These were the lowest lignin variants, and most easily solubilized poplar in a previous solubilization study of five natural variant poplar lines by three cellulolytic Caldicellulosiruptor species, including C. bescii (53). In a different study of sugar release during enzymatic treatment, poplar GW-9947 ranked as the second best and GW-9762 ranked as sixth best for combined glucan and xylan release out of twenty two poplar variants (54). Here, the biotic solubilization of poplar GW-9762 was
11.7% for C. bescii wildtype with GW-9947 being about a third higher at 15.8%. Unlike on
Avicel and switchgrass, Athe_1867 CelA appears to play a very critical role in poplar degradation. Strain RKCB120, with only CelA deleted, has a biotic solubilization of 4.9% on poplar GW-9947, a reduction of 70% from wildtype and on GW-9762 this strain has almost all biotic solubilization eliminated (1.6% biotic solubilization, an 86% reduction). The knockouts of successively larger portions of the GDL (RKCB130, RKCB121, and
RKCB123) perform equally poorly on the more recalcitrant poplar GW-9762. On the less recalcitrant GW-9947, these additional knockouts do increase solubilization slightly from strain RKCB120. Similar to on switchgrass, strain RKCB125, lacking Athe_1857, is able to perform about half the biotic solubilization of the wildtype. The deletion of Athe_1866 and
Athe_1865 from strain RKCB125 to RKCB127 does not change poplar solubilization values, suggesting these enzymes do not play a critical role in poplar degradation. Additionally, the deletion of Athe_1860 in strain RKCB124 does not reduce poplar solubilization for either natural variant.
162
Switchgrass Solubilization by Mixtures of GDL Knockout Strains
To investigate whether the decreases in biotic solubilization are a result of a decreased secretome enzyme inventory, switchgrass solubilization cultures using a mixture of two knockout strains with differing enzyme knockouts were performed. For these experiments strains RKCB120, RKCB121, RKCB127, and RKCB130 and switchgrass were selected because these strains have biotic switchgrass solubilizations at a variety of intermediate levels (15.3%, 4.6%, 7.7%, and 11.8%, respectively) between the maximum
(18.8%) of the wildtype and minimum (0.0%) of the abiotic control. All possible combinations of two strains from this group were performed, and the biotic solubilization values for the mixed cultures are shown alongside the biotic solubilization values of the two individual strains when cultured separately in Figure 3.5. At the bottom of Figure 3.5, the six GDL GH enzymes are listed and those present in the specific strains are indicated. For the mixed cultures, the enzymes produced by one of the two strains are marked once and enzymes produced by both of the two strains in the mixtures are marked twice. Two of the combinations, 130+127 and 127+120, restore the full complement of GDL GH enzymes to the culture between the two combined strains. In the 130+127 combination, all of the enzymes are produced by one of the two strains, except Athe_1860, which is produced by both. The combination of 127+120 is identical, except Athe_1859 is also produced by both strains. In both of these mixed culture cases, restoring the full complement of GDL GH enzymes in the culture, even if they are produced by different strains, restores the biotic solubilization to levels similar to the wildtype. Similarly, the 127+121 mix, while not restoring the full GDL GH complement, restores a mixture similar to strain RKCB125 which
163
is only lacking Athe_1857. The mixture of 127+121 has a biotic solubilization of 7.3%, while strain RKCB125 individually was 9.5%. These mixtures also produce cases where the strains each produce the same set of enzymes and then one of the two strains in the mixture produces an additional one or two enzymes, as in the mixtures of 130+120, 130+121, and
120+121. For example in the mix of 130+120, strain RKCB130 duplicates the enzymes produced by RKCB120, RKCB120 additionally produces Athe_1859, and neither produces
Athe_1867. In this case the mixture solubilizes 15.9%, which is the same as the better solubilizing strain, RKCB120 alone (15.3%). A similar situation occurs for the mixtures of
130+121 and 120+121, where solubilization is close to that of the better strain alone,
RKCB130 and RKCB120, respectively. Taken together, data from these mixed cultures show that the GDL can be separated between two different strains of the same species and perform as well as if the genes are all present in the same species. This also shows that the most important factor in solubilization is the cocktail of extracellular enzymes produces in the culture, independent of the source organism producing the individual enzymes.
Expression and Purification of Glucan Degradation Locus Cellulases
Based on previous expression of proteins from a version of vector pATHE02 native to C. bescii (20) and selection of a version of this plasmid in wildtype C. bescii with the
CbHTK kanamycin resistance gene (21), a new vector pJMC046 was constructed for the expression of proteins with a C-terminal histidine tag in C. bescii using kanamycin selection to maintain the vector. A vector map for pJMC046 is shown in Figure 3.6A. This vector includes the pSC101 origin for replication in E. coli for vector construction. The CbHTK
164
gene driven by the PS30 promoter from the Athe_2105 30S ribosomal subunit is functional for selection of kanamycin resistance in both C. bescii and E. coli. This allows for the removal of the AAC apramycin resistance gene for selection of the vector in E. coli used in previous iterations of this replicating vector. The cloning site containing a C-terminal histidine tag is shown in Figure 3.6B, which is driven by the Pslp promoter from Athe_2303 and is followed by a stem loop sequence taken from after the gene Calkro_0402 in C. kronotskyensis for rho- independent termination. The genes encoding the main GDL cellulases, Athe_1867,
Athe_1859, Athe_1857, and Athe_1860, were separately inserted into this expression vector and transformed into wildtype C. bescii, maintained with kanamycin selection on rich growth medium.
Because the full gene, including the sequence coding for the native signal peptide, was cloned into pJMC046, these GDL GHs were secreted from C.bescii and found extracellularly. Large cultures (17 liters) of C. bescii strains containing the overexpression vectors for each of the four GDL GHs were grown for 22 h, following which the culture supernatant was processed using a tangential flow filtration system to remove the cells and cell debris and then concentrate and buffer exchange proteins larger than 50 kDa. The protein of interest expressed off the replicating vector was then purified from other extracellular proteins by immobilized metal affinity chromatography by utilizing the C-terminal histidine tag. The purified Athe_1867, Athe_1859, Athe_1857, and Athe_1860 produced in C. bescii are shown in Figure 3.6C. While versions of these proteins, except Athe_1860, have previously been characterized, only Athe_1867 CelA has been characterized from a version purified from a 600 L culture of C. bescii (14). By using a histidine tag, the impracticality of
165
native purification from very large culture volumes is avoided. Additionally, expression in the native host means that these versions of the four enzymes should be natively glycosylated by C. bescii, which is confirmed in Figure 3.6D.
Biochemical Characterization of Athe_1860
All of the GDL GH enzymes produced by C. bescii have been characterized in some detail, except for Athe_1860, which has not been characterized. Table 3.1 summarizes the activities of the GDL GH enzymes determined previously (13-19). Here, Athe_1860 was characterized, which at 1904 amino acids, it is the largest GH enzyme produced by C. bescii.
Proteins of this size are problematic to produce recombinantly in heterologous hosts, but this was done here in the native host. Athe_1860 was incubated overnight with a variety of polysaccharide substrates and reducing sugar release as measured by the DNS reducing sugar assay. This screening assay revealed activity on glucan substrates: barley β-glucan, lichenan,
CMC, and Avicel; xylan substrates: birchwood xylan, oat spelt xylan, beechwood xylan, wheat arabinoxylan; and mannan substrates: ivory nut mannan, 1,4 beta-D-mannan; glucomannan from locust bean; and tamarind xyloglucan. Specific activities on these substrates are shown in Table 3.2, where enzyme units are µmol reducing sugar released/min. Athe_1860 has relatively low activity on Avicel compared to the other cellulase enzymes found in the GDL, but like most other enzymes in the GDL, Athe_1860 was active on a broad range of xylans, mannans, and glucans.
166
In Vitro Analysis of Glucan Degradation Locus Enzyme Cocktails
Using versions of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 recombinantly produced in C. bescii, an enzyme cocktail was constructed to mimic the enzyme mix found in the wildtype secretome. The ratio on a mass basis of 25 : 2 : 10 : 1 for Athe_1867 :
Athe_1859 : Athe_1857 : Athe_1860 was determined by densitometry on the wildtype secretome sample. This mixture is shown in Figure 3.7A alongside the C. bescii wildtype secretome sample. Using this wildtype mixture at a loading of 100 µg/mL, enzyme mixes corresponding to knockout strains RKCB120, RKCB132, RKCB130, RKCB121, RKCB124, and RKCB125 were constructed (Figure 3.7B) by removing the enzymes knocked out in these strains from the enzyme mixture. To investigate what the potential effect in vitro of increased expression of genes downstream of the PslpCbHTK cassette insertion, the enzymes downstream of the deletion locus were increased 1.5- (data not shown) and 3-fold (see mixes
120+, 120++, 130+, and 132+ in Figure 3.7B) in addition to removing the enzymes knocked out of these strains. The same concentrations of each of the enzymes in the wildtype mix were also used individually to show the activity of the individual enzymes. The activity of these enzyme cocktails were evaluated on Avicel, with and without C. bescii β-glucosidase
Athe_0458 (Figure 3.7A) loaded at 1 µg/mL, without shaking to mimic the solubilization culture conditions. The close homolog of β-glucosidase Athe_0458 in C. saccharolyticus has previously been characterized (55). Here, the goal of using the β-glucosidase from C. bescii, which does not have a signal peptide and is predicted to be intracellular, is to relieve product inhibition of the enzymes by cellobiose. This strategy was used previously by using addition of Thermotoga maritima β-glucosidase when characterizing Athe_1867 (14). Product
167
inhibition would be minimal in culture with Caldicellulosiruptor species as the bacterium, when it is growing, consumes cellobiose and other soluble sugar products of the GDL enzymes as they are produced, preventing their build up extracellularly.
The cellobiose and glucose released by the various enzyme cocktails recapitulating the wildtype and various knockout strains is shown in Figure 3.7C, without the addition of
β-glucosidase. Specific activity values for these mixtures are shown in Figure 3.7D. For the wildtype enzyme mixture, about 550 µg/mL of cellobiose and glucose were released from
Avicel in 24 hours. This corresponds to a specific activity of 0.0219 U/mg total protein. The enzyme mixes corresponding to strains 124 (lacking Athe_1860) and 125 (lacking
Athe_1857) perform almost identically to the wildtype mix, although these have slightly higher calculated specific activity (0.0227 and 0.0291 U/mg total protein, respectively) because less enzyme is added. Mix 120 is missing Athe_1867, mix 132 is missing
Athe_1859, and mix 130 is missing both. Interestingly, the removal of the relatively small amount of Athe_1859 (5% of the total enzyme mass) from the wildtype mix to 132, results in a loss of 40% of the sugar release. The 120 and 130 mixes release even less sugar. However, mix 120 has the highest specific activity of this group at 0.0150 U/mg total protein. Adding additional Athe_1859 and/or Athe_1857 to mixes 120, 132, and 130 to make mixes 120+,
120++, 130+, and 132+, shows no significant improvement in released sugar, despite the increased total enzyme. Thus specific activity decreases from the 120 to 120+, 120 to 120++,
130 to 130+, and 132 to 132+ mixes. The 121 mix, which has only a very low loading of
Athe_1860 releases no measurable sugar on the timescale of this assay.
168
Figure 3.7E shows cellobiose and glucose released with the addition of 1 µg/mL β- glucosidase to the same enzyme mixes. The trends comparing different enzyme mixes here are the same as without the β-glucosidase addition, but the β-glucosidase significantly increases the total glucose equivalents released, and thus the specific activities shown in
Figure 3.7D are increased. The soluble sugar measurements, both with and without β- glucosidase, are converted to the % glucan solubilization (Figure 3.7F), based on the 5 mg/mL loading of Avicel in the assay, assumed to have the molecular weight of anhydrous glucan (162 g/mol). The addition of β-glucosidase increases the solubilization significantly
(between about 1.5- and 2.5-fold) compared to the assays without the addition of β- glucosidase.
One additional important observation from these enzyme cocktail experiments is that they demonstrates synergism between the enzymes in the GDL. This is most evident for the wildtype mix, which releases more soluble sugar than the sum of that of the constituent enzymes (shown at the right in Figure 3.7C/E). For the assay without β-glucosidase the wildtype releases 566 µg glucose equivalents/mL while 400 µg glucose equivalents/mL would be expected to be released if the release of Athe_1867, Athe_1859, Athe_1857, and
Athe_1860 were purely additive and not synergistic. This demonstrates that some degree of synergism is present in the activity of these enzymes.
To evaluate the synergistic action of these four GDL GHs, the activities of the individual enzymes and equimolar mixtures of all combinations of 2, 3, and 4 of these enzymes were assessed on Avicel with a fixed protein concentration of 0.5 µM. Figure 3.8A shows the percentage of the mixes A through P that are made up of each enzyme. These
169
assays were performed without the addition of β-glucosidase Athe_0458 to assess the synergism of the GDL enzymes only. The glucose and cellobiose release for enzyme mixes
A-P are shown in Figure 3.8B. The degree of synergism for these mixtures was calculated as the ratio of the activity of the mixture to the sum of the activities expected for the individual constituent enzymes. This degree of synergism is shown in Figure 3.8C. For enzymes that are not synergistic, the activities of the individual components would be purely additive giving a degree of synergism of 1.0. Here, synergy is clearly evident between all combinations of these GDL enzymes, except possibly the pairwise combination of
Athe_1857 and Athe_1860 in mix J with a degree of synergism of 1.2. The pair of
Athe_1867 and Athe_1859 exhibits the highest degree of synergy of any pair of enzymes at
3.3. These assays were performed with the addition of 1µg/mL β-glucosidase (data not shown). With the β-glucosidase, total sugar release was increased, similar to the increased activity shown with the addition of β-glucosidase for the enzyme cocktails in Figure 3.7E.
Even with the addition of β-glucosidase the degree of synergism values for the enzyme mixes without β-glucosidase, shown in Figure 3.8C, are not changed.
Discussion
From a genomics perspective, the presence of the GH9, GH44, and GH48 enzymes in the genome of Caldicellulosiruptor species is the major differentiating factor between cellulolytic and non-cellulolytic phenotypes in these extremely thermophilic bacteria (5,6).
These particular GH families are all found in the GDL of Caldicellulosiruptor species. The
GDL encodes six enzymes in C. bescii containing four GH5, two GH9, one GH10, one
170
GH44, three GH48, and one GH74 domains. Genomic loci with similar sets of GH domains are also found in many other cellulolytic bacterial species including Clostridium cellulolyticum (two GH5s, GH8, five GH9s, PL11, GH48) (56,57) and Acidothermus cellulolyticus (two GH5s, GH12, GH6, GH48, and GH74) (58). However, other bacteria do not have a concentrated locus of cellulase genes, and instead these genes are scattered throughout the genome (59). Nevertheless, all cellulolytic bacteria produce complementary
GHs from the major cellulase families: GH5, GH6, GH8, GH9, GH12, GH44, GH45, GH48, and GH124 (60).
While there are many bacteria that produce cellulases, the multi-domain structure of the cellulase enzymes in the GDL is unique to Caldicellulosiruptor species. The architecture of these genes is highly similar with GH domains on the N-terminus and C-terminus and two or three cellulose binding CBM3 domains in between. This multi-domain architecture combined with the highly repetitive nature on the nucleotide sequence level (Figure 3.2) suggests this locus arose by domain shuffling (7,8). This is further supported when comparing the GDLs across Caldicellulosiruptor species (1). For example, the GDL of C. saccharolyticus is missing the repeated CBM3-GH48 GH5-CBM3 found between
Athe_1867-Athe_1859 and Athe_1857-Athe_1866. Instead C. saccharolyticus has one enzyme, which has been characterized and termed CelB (61), that is made up of the GH10 domain homologous to Athe_1857 and the C-terminal GH5 domain homologous to
Athe_1866. It is unclear whether this difference is the result of gene duplication and recombination to expand the GDL repertoire from C. saccharolyticus to C. bescii, or if the repeated domains in C. bescii were lost at some point by C. saccharolyticus. The GDLs of
171
recently sequenced Caldicellulosiruptor species (62) contain GH12 domains among the typical GH5, GH9, GH44, GH48 domains, also likely the result of domain shuffling. The repeated domain shuffling of the GDL sequence in different species represents the evolution of the cellulolytic function in Caldicellulosiruptor, and offers a suite of multi-domain enzymes with a mixture of different activities and domains for degrading plant polysaccharide substrates.
These unique multi-domain enzymes are not only encoded in the genomes of
Caldicellulosiruptor species, they are also highly transcribed and produced (Figure 3.1,
(3,12,48)) under most growth conditions. Based on proteomics data (48), the six GHs from the GDL make up approximately 20% of the secretome protein produced by C. bescii. The large proportion of the secretome that is comprised of the GDL can be seen in the protein gels shown in Figure 3.3B. Because of the large size of these enzymes, the cell would expend significant energy producing these very large proteins (1414-1904 amino acid in length) at their relatively high concentrations. In addition, these large enzymes are glycosylated (Figure 3.3C and Figure 3.6D). Because of their location as part of the GDL, glycosyl transferases Athe_1864 and Athe_1863, likely play a role in this glycosylation, which is believed to occur during secretion (20). Glycosylation of these large multi-domain enzymes may play a role in protecting their Pro/Thr rich linker regions from proteolytic cleavage (63), or aid in their binding and interaction with substrates, as was demonstrated for
Trichoderma reesei cellobiohydrolases (64). While it has been shown that the secretome of
C. bescii (65) and GDL enzymes or their domains (14,17,19) are highly effective at degrading crystalline cellulose and plant biomass substrates, here, using recently improved
172
genetic tools for C. bescii (21) and a corrected genome (10), their roles in vivo could be examined.
Various knockout strains with different combinations of GDL enzyme deletions
(Figure 3.3A) were constructed, and the relative impact of these deletions on the ability to degrade Avicel, switchgrass, and poplar reveals that the relative importance of the GDL enzymes is biomass dependent. In the case of poplar solubilization, Athe_1867 CelA appears to be the most critical, as knockout strain RKCB120 lacking only CelA performs very little biotic solubilization (Figure 3.4C/D). While on switchgrass (Figure 3.4B), the deletion of
Athe_1857 in strain RKCB125 reduces the biotic solubilization by 50%, more than any other single gene knockout. On Avicel, no one enzyme appears to be critical, as the single gene knockout strains degrade over 65% as well as the wildtype, but the combined deletion of
Athe_1867, Athe_1859, and Athe_1857 abolishes nearly 90% of the ability to degrade crystalline cellulose. Previously, an attempt was made to knockout Athe_1867 from C. bescii to demonstrate the criticality of CelA (51). However, upon closer examination of the repetitive genome and the phenotypes reported for strain JWCB029, this strain is likely a
Athe_1867, Athe_1859, Athe_1857 deletion mutant similar to strain RKCB121. This strain demonstrates that it is the combination of Athe_1867, Athe_1859, and Athe_1857 that is critical for cellulose degradation, and further, the criticality of three enzymes suggests that their action is synergistic. For the biomasses tested, the deletion of Athe_1866 and
Athe_1865 (from RKCB121 to RKCB123, or RKCB125 to RKCB127) had only a minor effects. The deletion of Athe_1860 in strain RKCB124 showed no measurable effect on
Avicel or poplar solubilization and only a slight effect for switchgrass. These observations
173
indicate these three genes are not essential for degrading these biomass substrates. The GH5s and GH74 domains encoded in these genes have significant activity on mannans, and thus a substrate not tested here, such as pine or spruce (66) that contains higher mannan levels, might require the use of these enzymes.
The solubilization experiments with the individual knockout strains show the enzymes required for degradation vary with the substrate. Using mixed cultures containing pairwise combinations of four knockout strains (Figure 3.5), solubilization was found to be dependent on the mix of enzymes produced in the culture and independent of the strain that produces them. The two mixture cultures (130+127 and 127+120) both collectively contained all six GDL enzymes; the combination in the culture achieved solubilization similar to the wild type. Because it appears that the most important factor is the extracellular enzyme mixture, strategies based on co-culture of different species producing different enzymes, or addition of industrially produced enzymes to alter the enzyme mixture in the culture, could be feasible. Not considered here, however, is the contribution of the attachment of the cells to the substrate. Caldicellulosiruptor species are known to attach to biomass substrates while they degrade them (38,67), and this could significantly enhance the action of the enzymes by releasing them in close proximity to their substrates. If this is an important factor in how these enzymes function, addition of exogenously produced enzyme may not improve solubilization significantly.
To further investigate the synergy and biomass degradation ability of the GDL enzymes, they were produced in C. bescii for analysis in vitro. By combining a replicating shuttle vector (24) and the antibiotic resistance marker for C. bescii (21) to create plasmid
174
pJMC046 (Figure 3.6A/B), expression of GDL enzymes, with histidine tags to facilitate their purification, is possible in wildtype C. bescii. This strategy offers the advantage that the proteins are glycosylated in their native host (Figure 3.6D). This expression system facilitated the characterization of Athe_1860, the largest GH enzyme produced by C. bescii, which had not been previously characterized. Athe_1860 has similar pH and temperature optima to other GDL GHs (Table 3.1), and is active on a broad variety of glucan, mannan, xylan, glucomannan, and xyloglucan substrates (Table 3.2). This is similar to a thermophilic
GH74 protein from Thermotoga maritima which is most active on barley β-glucan, but also shows activity on xyloglucan, glucomannan, and CMC (68). While Athe_1860 is active on
Avicel, its activity is very slight compared to the Athe_1867.
The cellulases Athe_1867, Athe_1859, and Athe_1857 important for cellulose degradation, as indicated by strain RKCB121, were also produced using the C. bescii expression system. Based on these four enzymes, a synthetic mixture of GDL enzymes was made to recapitulate the wildtype GDL mix with protein ratios determined by densitometry of the wildtype secretome (Figure 3.7A). The activity of this “wildtype” mixture was tested alongside mixtures lacking different enzymes mimicking the GDL enzyme knockout strains.
These results generally mirror the in vivo solubilzation data, with mixes lacking Athe_1860 or Athe_1857 (124 and 125, respectively) performing nearly identically to the wildtype just as the corresponding strains did in vivo. Mix 121 lacking the three main cellulases exhibited minimal activity. The mixes lacking CelA (120, 130, and 132), however, performed significantly worse in vitro than their corresponding knockout strains did in vivo. This could be explained by the insertion of the PslpCbHTK cassette at the CelA deletion causing
175
increased expression of other GDL GHs. However, testing increased enzyme addition of
Athe_1859 and Athe_1857, the enzymes downstream of the CelA deletion and PslpCbHTK insertion, in enzyme mixes 120+, 120++, 130+, and 132+, did not significantly increase the measured activity in vitro. In addition, increased expression of the downstream GDL GHs is unlikely due to the presence of terminators after the majority of GDL GH genes, including
CelA (Figure 3.1C). It is possible that other compensatory effects are occurring in vivo to make up for the loss of CelA, possibly from elsewhere in the genome. Alternatively, CelA at about 65% the mass loading of the synthetic wildtype mixture, while it appears to match the actual wildtype secretome by protein densitometry (Figure 3.7A), may constitute too high a proportion to mimic the in vivo conditions appropriately in vitro.
By adding β-glucosidase (Athe_0458) to the enzyme mixtures, cellobiose released by the GDL enzymes is readily converted to glucose (Figure 3.7E). This relieves end-product inhibition of the GDL enzymes, which was previously demonstrated for Athe_1867 CelA
(14). This end product inhibition also appears to be a factor for Athe_1859, Athe_1857, and
Athe_1860, as their activity is improved with the addition of the β-glucosidase (Figure
3.7D). In vivo, β-glucosidase Athe_0458 is predicted to be intracellular, but its effect in these in vitro experiments would mimic C. bescii cells consuming cellobiose from the extracellular environment to relieve this product inhibition and pulling product through these enzymatic degradation reactions.
The enzyme mixtures recapitulating the GDL enzyme knockout strains in vitro demonstrate the synergism of these four enzymes, as the wildtype mixture performs better than any of the constituent enzymes individually. To further investigate this synergism,
176
experiments were performed with all possible equimolar mixes of these GHs (Figure 3.8).
Using the glucose equivalents released, a degree of synergism was calculated for each mixture, revealing the major synergistic interactions between these GDL GHs. In particular,
Athe_1867 and Athe_1859 are the most synergistic pair of enzymes with the mixture, performing 3.3 times better than the sum of the expected activity of the individual enzymes.
The next highest degree of synergism values are also for the other two pairs with Athe_1859.
While Athe_1859 lacks the GH48 exoglucanase of Athe_1867, Athe_1857, and Athe_1860, the endoglucanase activity of its GH44 likely is the key to its synergistic value, creating more glucose chain ends for the GH48 exoglucanases to degrade.
Using these in vitro and in vivo analyses of the glucan degradation locus in C. bescii, we show, for the first time, the nuanced roles of these large multi-domain GHs in the degradation of plant biomass substrates. While CelA has been highly studied and is a prolific cellulase, its combined action with the other GHs of the GDL is key to cellulose and plant biomass degradation. We demonstrate that the essentiality of individual C. bescii GDL GHs depends on the plant biomass substrate. Using new protein expression tools, we are also able to produce these large enzymes in their native host and demonstrate their synergistic action.
Ultimately, these findings advance how we understand and plan to deploy these enzymes responsible for the cellulolytic phenotype in Calidcellulosiruptor species.
177
Acknowledgements
This work was funded in part by the BioEnergy Science Center, a U.S. Department of
Energy Bioenergy Research Center supported by the Office of Biological and Environmental
Research in the DOE Office of Science.
178
Figure 3.1 – The Glucan Degradation Locus in C. bescii Genes comprising the Glucan Degradation Locus (GDL) (A). GH domains are shown as boxes containing the GH family number. CBM domains are shown as circles containing the CBM family number. PL domains are shown as hexagons containing the PL family number. (B) Transcriptomic and proteomic data for the transcription and expression of GDL genes in C. bescii when grown on various substrates. Proteomic samples contain only the extracellular secretome of C. bescii. Values are shown as log squared mean (LSM). Gene locus tag and annotation are shown to the left. Proteomics nd = not detected. (C) Predicted terminator sequences in the GDL of C. bescii as predicted by TransTermHP (50).
179
Figure 3.2 – Dot plot showing the repeated DNA sequence of the C. bescii GDL The sequence of the C. bescii GDL was used to construct a dot plot using EMBOSS dottup (http://www.bioinformatics.nl/cgi-bin/emboss/dottup). A sequence word length of 50 bp was used to construct the plot. Sequences of 50 bp at a position on the x-axis that match a 50 bp sequence at a position on the y-axis results in a black pixel. The diagonal line from bottom left to top right is the direct match of the sequence to itself. All other markings represent repeated DNA sequences. The repeated GH48 domains of the GDL are shaded in orange. The repeated GH5 domains of the GDL are shaded in green.
180
Figure 3.3 – GDL knockout strains of C. bescii GDL layout for the eight knockout strains generated from C. bescii strain JWCB005 (A). The location of the PslpCbHTK cassette insertion at the gene deletion location is shown. (B) SDS-PAGE gel run on secretome samples from C. bescii wildtype, parent strain JWCB005, and knockout strains RKCB120-132. Lanes are loaded with equal protein mass. The expected location of the GDL enzymes is shown on the left. (C) Glycoprotein stain of the SDS-PAGE gel from (B)
181
Figure 3.4 – Biotic Solubilization of Plant Biomass Substrates by GDL knockout strains Biotic solubilization values for Avicel crystalline cellulose (A), Cave-in-Rock Switchgrass (B), Poplar natural variant GW-9947 (C), Poplar natural variant GW-9762 (D). Biotic solubilization is calculated by subtracting the solubilization in the thermal abiotic control to reflect only the biotic contribution to solubilization. The abiotic thermal solubilization values are as follows: 2.1% for Avicel, 19.6% for switchgrass, 10.6% for poplar GW-9947, and 10.7% for poplar GW-9762. Error bars represent the standard deviation (n=3).
182
Figure 3.5 – Switchgrass solubilization by mixed cultures of GDL knockout strains Biotic solubilization values on switchgrass for the GDL knockout strains (RKCB120, RKCB121, RKCB127, and RKCB130) alone and when combined in a mixed culture inoculated with an equal proportion of two strains. Three replicates of C. bescii containing the full GDL (C. bescii wildtype and C. bescii strain JWCB005) are shown at the left for comparison. The genes contained in each strain are marked in the table with a X. Genes marked with a XX for a mixed culture represents enzymes produced by each of the two strains in the mixture. Error bars represent the standard deviation (n=3).
183
Figure 3.6 – Recombinant expression of GDL Enzymes is C. bescii Vector pJMC046 combines the replicating shuttle vector for C. bescii with the CbHTK gene, allowing for slection in C. bescii (A). The native C. bescii pATHE02 plasmid backbone is marked with a blue hashed half circle, which includes the proposed origin of replication for C. bescii. The pSC101 origin is used for replication in E. coli. The CbHTK kanamycin resistance gene is driven by the 200 bp C. bescii PS30 promoter for selection of kanamycin resistance in both E. coli and C. bescii. The cloning site is marked with a dashed box. Expression of the cloned gene is driven by the Pslp promoter and the cloning site is followed by the Calkro_0402 terminator from C. kronotskyensis. (B) Sequence of the cloning site in pJMC046. The 200 bp Pslp promoter is marked in blue. The start and stop codons surrounding the histidine tag are marked with red boxes. The Calkro_0402 terminator is marked in yellow. (C) SDS-PAGE gel on the C. bescii wildtype concentrated supernatant representing the secretome and purified Athe_1867, Athe_1859, Athe_1857, and Athe_1860 produced using pJMC046 in C. bescii. (D) glycoprotein stain of the SDS-PAGE gel from (C)
184
Figure 3.7 – Enzyme cocktails mimicking C. bescii wildtype and GDL knockout strains SDS-PAGE gel on C. bescii wildtype secretome (lane 1), and synthetic C. bescii wiltype enzyme cocktail (lane 2) comprised of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 in a 25:2:10:1 mass ratio (A). SDS-PAGE gel of purified recombinant β-glucosidase Athe_0458 (lane 3). (B) Mass loading of the enzyme cocktails tested mimicking the C. bescii wildtype and various GDL knockout strains. Cells shaded in green in cocktails 120+, 120++, 130+, and 132+ represent increases of enzyme loading three time that of the wildtype cocktail. (C) Cellobiose and glucose released from Avicel by the various enzyme cocktails as measured by HPLC. (D) Calculated specific activities with and without the addition of β- glucosidase Athe_0458. (E) Cellobiose and glucose released from Avicel by the various enzyme cocktails with the addition of 1µg β-glucosidase as measured by HPLC. (F) Calculated glucan solubilization based on the 5 mg/mL loading of Avicel with (green) and without (orange) the addition of 1µg β-glucosidase. All error bars represent the standard deviation (n=3).
185
Figure 3.8 – Enzyme cocktails demonstrating the synergy between GDL GH enzymes. Domain layout of the four GDL GHs and their molar ratio in enzyme cocktails A-O (A). Sample P represents a no enzyme control. A total loading of 0.5µM enzyme was used with 5 mg/mL Avicel loading. (B) Cellobiose and glucose released from Avicel by enzyme cocktails A-O and no enzyme control P as measured by HPLC. Error bars represent the standard deviation (n=3). (C) Equation for the degree of synergy and the calculated degree of synergy for the enzyme mixtures E-O.
186
Table 3.1 - Characterized GDL Enzymes
Amino Predicted Observed Temperature pH Specific Activity Gene Locus Domains Acid MW MW Optimum Substrates Reference Optimum ◊ length (kDa) (kDa) (°C)
GH9- Avicel: 23-55 CBM3- glucans (Avicel, U/µmol, barley Athe_1867 CBM3- 1759 195 230 85 5-6.5 barley beta-glucan, (13-15) beta-glucan: CBM3- CMC); xylan 100510 U/µmol GH48
glucans (PASC, CMC, Avicel, lichenan); mannans GH5- (guar gum, 1,4-β- CBM3- Avicel: 5.3 Athe_1859 1294 142 170 85 5 mannan); (17) CBM3- U/µmol glucomannan GH44 (KGM); xylan (birchwood); xyloglucan xylan (beech wood); arabinoxylan (wheat); glucans Avicel: 3.4 (barley β-glucan, U/µmol, GH10- lichenin, CMC, beechwood CBM3- PASC, Avicel, filter Athe_1857* 1478 165 200 85-90 6.5 xylan: 39000 (19) CBM3- paper); U/µmol, barley GH48 glucomannan beta-glucan (konjac flour); 3400 U/µmol arabinans (arabic gum, debranched arabinan); pectin
GH5- N-terminal GH5: CBM3- Mannans; C- Athe_1866^ CBM3- 1414 156 195 N/A 5.5-6.5 N/A (17,18) terminal GH5: CBM3- glucans (PASC), GH5
glucans (Avicel, filter paper, PASC, lichenin); mannans GH9- (locust bean gum, Avicel: 10.15 CBM3- guar gum); U/µmol, locust Athe_1865 CBM3- 1369 151 180 85-90 5.5-6.5 (16,18) glucomannan bean gum 1691 CBM3- (konjac); U/mg GH5 arabinoxylan (wheat); xylan (oat spelt) glucans (barley β- glucan, CMC, Avicel, lichenan); mannans (ivory nut, GH74- 1,4-β-mannan); CBM3- Avicel: 0.7 Athe_1860 1904 209 240 85 5.5-6 glucomannan (locust this study CBM3- U/µmol bean); xylans (birch GH48 wood, beech wood, oat spelt, arabinoxylan); xyloglucan * GH10 only was characterized - GH48 domain is 100% identical to Athe_1867 ^ Athe_1866 N-terminal GH5 is 100% identical to Athe_1859 GH5 ◊ enzyme activity U = µmol reducing sugar / min
187
Table 3.2 - Specific Activity of Athe_1860 U = µmol reducing sugar / min Error represents the standard deviation (n=4)
Substrate Linkage Specific Activity (U/µmol)
barley β-glucan β-1,3/4-Glucan 15.6 ± 2.2 tamarind xyloglucan β-1,4-Glucan/α-1,6-Xylose sidechain 8.8 ± 0.6 wheat arabinoxylan β-1,4-Xylan/α-1,2/3-Arabinose sidechain 8.5 ± 0.6 1,4-β-D-mannan β-1,4-Mannan 8.3 ± 0.5 lichenan β-1,3/4-Glucan 8.3 ± 0.3 locust bean glucomannan β-1,4-Glucan/Mannan 7.5 ± 0.5 birch wood xylan β-1,4-Xylan 6.7 ± 0.4 beech wood xylan β-1,4-Xylan 6.3 ± 0.4 oat spelt xylan β-1,4-Xylan 5.0 ± 0.3 ivory nut mannan β-1,4-Mannan 3.8 ± 0.2 CMC β-1,4-Glucan 1.5 ± 0.1 Avicel β-1,4-Glucan 0.70 ± 0.02
188
References
1. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,
Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)
Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448
2. Yang, S.-J., Kataeva, I., Hamilton-Brehm, S. D., Engle, N. L., Tschaplinski, T. J.,
Doeppke, C., Davis, M., Westpheling, J., and Adams, M. W. W. (2009) Efficient
Degradation of Lignocellulosic Plant Biomass, without Pretreatment, by the
Thermophilic Anaerobe “Anaerocellum thermophilum” DSM 6725. Appl. Environ.
Microbiol. 75, 4762-4769
3. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-
Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative
analysis of extremely thermophilic Caldicellulosiruptor species reveals common and
differentiating cellular strategies for plant biomass utilization. Appl. Environ.
Microbiol. 81, 7159-7170
4. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.
(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus
Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic
Press, Norfolk, UK. pp 91-120
5. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,
Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,
189
Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,
R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal
determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.
Bacteriol. 194, 4015-4028
6. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,
microbiological, and glycoside hydrolase diversities within the extremely
thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.
Microbiol. 76, 8084-8092
7. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,
H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic
bacteria. FEMS Microbiol. Ecol. 28, 99-110
8. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and
Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from
the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,
333-340
9. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.
C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and
Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and
cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,
3760-3761
190
10. Williams-Rhaesa, A. M., Poole, F. L., 2nd, Dinsmore, J., Lipscomb, G. L.,
Rubinstein, G. M., Scott, I. M., Conway, J. M., Lee, L. L., Khatibi, P. A., Kelly, R.
M., and Adams, M. W. W. (2017) Genome Stability in Engineered Strains of the
Extremely Thermophilic, Lignocellulose-Degrading Bacterium Caldicellulosiruptor
bescii. Appl. Environ. Microbiol. Submitted February 21, 2017
11. Lochner, A., Giannone, R. J., Keller, M., Antranikian, G., Graham, D. E., and
Hettich, R. L. (2011) Label-free quantitative proteomics for the extremely
thermophilic bacterium Caldicellulosiruptor obsidiansis reveal distinct abundance
patterns upon growth on cellobiose, crystalline cellulose, and switchgrass. J.
Proteome Res. 10, 5302-5314
12. Lochner, A., Giannone, R. J., Rodriguez, M., Jr., Shah, M. B., Mielenz, J. R., Keller,
M., Antranikian, G., Graham, D. E., and Hettich, R. L. (2011) Use of label-free
quantitative proteomics to distinguish the secreted cellulolytic systems of
Caldicellulosiruptor bescii and Caldicellulosiruptor obsidiansis. Appl. Environ.
Microbiol. 77, 4042-4054
13. Zverlov, V., Mahr, S., Riedel, K., and Bronnenmeier, K. (1998) Properties and gene
structure of a bifunctional cellulolytic enzyme (CelA) from the extreme thermophile
'Anaerocellum thermophilum' with separate glycosyl hydrolase family 9 and 48
catalytic domains. Microbiology 144 ( Pt 2), 457-465
191
14. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,
Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and
Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion
Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516
15. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and
biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase
by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,
e84172
16. Su, X., Mackie, R. I., and Cann, I. K. (2012) Biochemical and mutational analyses of
a multidomain cellulase/mannanase from Caldicellulosiruptor bescii. Appl. Environ.
Microbiol. 78, 2230-2240
17. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.
O. (2012) Molecular and biochemical analyses of the GH44 Module of
CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium
Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059
18. Wang, R., Gong, L., Xue, X., Qin, X., Ma, R., Luo, H., Zhang, Y., Yao, B., and Su,
X. (2016) Identification of the C-Terminal GH5 Domain from CbCel9B/Man5A as
the First Glycoside Hydrolase with Thermal Activation Property from a Multimodular
Bifunctional Enzyme. PLoS One 11, e0156802
192
19. Xue, X., Wang, R., Tu, T., Shi, P., Ma, R., Luo, H., Yao, B., and Su, X. (2015) The
N-Terminal GH10 Domain of a Multimodular Protein from Caldicellulosiruptor
bescii Is a Versatile Xylanase/beta-Glucanase That Can Degrade Crystalline
Cellulose. Appl Environ Microbiol 81, 3823-3833
20. Chung, D., Young, J., Bomble, Y. J., Vander Wall, T. A., Groom, J., Himmel, M. E.,
and Westpheling, J. (2015) Homologous expression of the Caldicellulosiruptor bescii
CelA reveals that the extracellular protein is glycosylated. PLoS One 10, e0119508
21. Lipscomb, G. L., Conway, J. M., Blumer-Schuette, S. E., Kelly, R. M., and Adams,
M. W. W. (2016) Highly Thermostable Kanamycin Resistance Marker Expands the
Toolkit for Genetic Manipulation of Caldicellulosiruptor bescii. Appl. Environ.
Microbiol. 82, 4421-4428
22. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier
to DNA transformation in Caldicellulosiruptor species results in efficient marker
replacement. Biotechnol Biofuels 6, 82
23. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic
engineering of Caldicellulosiruptor bescii yields increased hydrogen production from
lignocellulosic biomass. Biotechnol. Biofuels 6, 85
24. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable
replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic
methodologies to other members of this genus. PLoS One 8, e62881
193
25. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.
(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic
euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894
26. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods
Enzymol. 498, 349-361
27. Evans, L. M., Slavov, G. T., Rodgers-Melnick, E., Martin, J., Ranjan, P., Muchero,
W., Brunner, A. M., Schackwitz, W., Gunter, L., Chen, J. G., Tuskan, G. A., and
DiFazio, S. P. (2014) Population genomics of Populus trichocarpa identifies
signatures of selection and adaptive trait associations. Nat. Genet. 46, 1089-1096
28. Muchero, W., Guo, J., DiFazio, S. P., Chen, J. G., Ranjan, P., Slavov, G. T., Gunter,
L. E., Jawdy, S., Bryan, A. C., Sykes, R., Ziebell, A., Klapste, J., Porth, I., Skyba, O.,
Unda, F., El-Kassaby, Y. A., Douglas, C. J., Mansfield, S. D., Martin, J., Schackwitz,
W., Evans, L. M., Czarnecki, O., and Tuskan, G. A. (2015) High-resolution genetic
mapping of allelic variants associated with cell wall chemistry in Populus. BMC
Genomics 16, 24
29. Studier, F. W. (2005) Protein production by auto-induction in high density shaking
cultures. Protein Expr. Purif. 41, 207-234
30. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)
Improved growth media and culture techniques for genetic analysis and assessment of
194
biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,
41-49
31. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,
Adams, M. W., and Westpheling, J. (2011) Natural competence in the
hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:
construction of markerless deletions of genes encoding the two cytoplasmic
hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238
32. Nordberg, H., Cantor, M., Dusheyko, S., Hua, S., Poliakov, A., Shabalov, I.,
Smirnova, T., Grigoriev, I. V., and Dubchak, I. (2014) The genome portal of the
Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 42,
D26-31
33. Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D.,
Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F.,
Cicero, R., Clark, S., Dalal, R., Dewinter, A., Dixon, J., Foquet, M., Gaertner, A.,
Hardenbol, P., Heiner, C., Hester, K., Holden, D., Kearns, G., Kong, X., Kuse, R.,
Lacroix, Y., Lin, S., Lundquist, P., Ma, C., Marks, P., Maxham, M., Murphy, D.,
Park, I., Pham, T., Phillips, M., Roy, J., Sebra, R., Shen, G., Sorenson, J., Tomaney,
A., Travers, K., Trulson, M., Vieceli, J., Wegener, J., Wu, D., Yang, A., Zaccarin, D.,
Zhao, P., Zhong, F., Korlach, J., and Turner, S. (2009) Real-time DNA sequencing
from single polymerase molecules. Science 323, 133-138
195
34. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)
Methylation by a unique alpha-class N4-cytosine methyltransferase is required for
DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844
35. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing
sugar. Anal. Chem. 31, 426-428
36. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-
268
37. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose
assay for endoglucanase activity. Anal. Biochem. 342, 176-178
38. Blumer-Schuette, S. E., Alahuhta, M., Conway, J. M., Lee, L. L., Zurawski, J. V.,
Giannone, R. J., Hettich, R. L., Lunin, V. V., Himmel, M. E., and Kelly, R. M. (2015)
Discrete and structurally unique proteins (tāpirins) mediate attachment of extremely
thermophilic Caldicellulosiruptor species to cellulose. J. Biol. Chem. 290, 10645-
10656
39. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.
E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor
bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.
Crystallogr. 69, 534-539
196
40. Alahuhta, M., Taylor, L. E., 2nd, Brunecky, R., Sammond, D. W., Michener, W.,
Adams, M. W., Himmel, M. E., Bomble, Y. J., and Lunin, V. (2015) The catalytic
mechanism and unique low pH optimum of Caldicellulosiruptor bescii family 3
pectate lyase. Acta. Crystallogr. D Biol. Crystallogr. 71, 1946-1954
41. Chung, D., Pattathil, S., Biswal, A. K., Hahn, M. G., Mohnen, D., and Westpheling, J.
(2014) Deletion of a gene cluster encoding pectin degrading enzymes in
Caldicellulosiruptor bescii reveals an important role for pectin in plant biomass
recalcitrance. Biotechnol. Biofuels 7
42. Lommel, M., and Strahl, S. (2009) Protein O-mannosylation: conserved from bacteria
to humans. Glycobiology 19, 816-828
43. Iwashkiw, J. A., Vozza, N. F., Kinsella, R. L., and Feldman, M. F. (2013) Pour some
sugar on it: the expanding world of bacterial protein O-linked glycosylation. Mol.
Microbiol. 89, 14-28
44. Dobos, K. M., Khoo, K. H., Swiderek, K. M., Brennan, P. J., and Belisle, J. T. (1996)
Definition of the full extent of glycosylation of the 45-kilodalton glycoprotein of
Mycobacterium tuberculosis. J Bacteriol 178, 2498-2506
45. Michell, S. L., Whelan, A. O., Wheeler, P. R., Panico, M., Easton, R. L., Etienne, A.
T., Haslam, S. M., Dell, A., Morris, H. R., Reason, A. J., Herrmann, J. L., Young, D.
B., and Hewinson, R. G. (2003) The MPB83 antigen from Mycobacterium bovis
197
contains O-linked mannose and (1-->3)-mannobiose moieties. J Biol Chem 278,
16423-16432
46. Wiemels, R. E., Cech, S. M., Meyer, N. M., Burke, C. A., Weiss, A., Parks, A. R.,
Shaw, L. N., and Carroll, R. K. (2017) An Intracellular Peptidyl-Prolyl cis/trans
Isomerase Is Required for Folding and Activity of the Staphylococcus aureus
Secreted Virulence Factor Nuclease. J Bacteriol 199
47. Veeraraghavan, S., Nall, B. T., and Fink, A. L. (1997) Effect of Prolyl Isomerase on
the Folding Reactions of Staphylococcal Nuclease. Biochemistry 36, 15134-15139
48. Poudel, S., Giannone, R. J., Basen, M., Poole, F. L., 2nd, Kelly, R. M., Adams, M. W.
W., and Hettich, R. L. (2017) Characterizing the range and specificity of substrate-
specific extracellular enzymes in Caldicellulosiruptor bescii. in Society for Industrial
Microbiology and Biotechnology Annual Meeting, Denver, CO
49. Abraham, P. E., Yin, H., Borland, A. M., Weighill, D., Lim, S. D., De Paoli, H. C.,
Engle, N., Jones, P. C., Agh, R., Weston, D. J., Wullschleger, S. D., Tschaplinski, T.,
Jacobson, D., Cushman, J. C., Hettich, R. L., Tuskan, G. A., and Yang, X. (2016)
Transcript, protein and metabolite temporal dynamics in the CAM plant Agave. Nat.
Plants 2, 16178
50. Kingsford, C. L., Ayanbule, K., and Salzberg, S. L. (2007) Rapid, accurate,
computational discovery of Rho-independent transcription terminators illuminates
their relationship to DNA uptake. Genome Biol. 8, R22
198
51. Young, J., Chung, D., Bomble, Y. J., Himmel, M. E., and Westpheling, J. (2014)
Deletion of Caldicellulosiruptor bescii CelA reveals its crucial role in the
deconstruction of lignocellulosic biomass. Biotechnol. Biofuels 7, 142
52. Yokoyama, H., Yamashita, T., Morioka, R., and Ohmori, H. (2014) Extracellular
Secretion of Noncatalytic Plant Cell Wall-Binding Proteins by the Cellulolytic
Thermophile Caldicellulosiruptor bescii. J. Bacteriol. 196, 3784-3792
53. Zurawski, J. V. (2016) Comparative analysis of extremely thermophilic
Caldicellulosiruptor species reveals common and differentiating cellular strategies for
plant biomass utilization. in Microbiological and Engineering Studies of Extremely
Thermophilic Caldicellulosiruptor Species for Lignocellulose Deconstruction and
Conversion, North Carolina State University, Robert M. Kelly, advisor. pp 40-97
54. Bhagia, S., Muchero, W., Kumar, R., Tuskan, G. A., and Wyman, C. E. (2016)
Natural genetic variability reduces recalcitrance in poplar. Biotechnol. Biofuels 9, 106
55. Hong, M.-R., Kim, Y.-S., Park, C.-S., Lee, J.-K., Kim, Y.-S., and Oh, D.-K. (2009)
Characterization of a recombinant β-glucosidase from the thermophilic bacterium
Caldicellulosiruptor saccharolyticus. J. Biosci. Bioeng. 108, 36-40
56. Belaich, J. P., Tardif, C., Belaich, A., and Gaudin, C. (1997) The cellulolytic system
of Clostridium cellulolyticum. J Biotechnol 57, 3-14
199
57. Desvaux, M. (2005) Clostridium cellulolyticum: model organism of mesophilic
cellulolytic clostridia. FEMS Microbiol Rev 29, 741-764
58. Ding, S.-Y., Adney, W. S., Vinzant, T. B., Decker, S. R., Baker, J. O., Thomas, S. R.,
and Himmel, M. E. (2003) Glycoside Hydrolase Gene Cluster of Acidothermus
cellulolyticus. in Applications of Enzymes to Lignocellulosics, American Chemical
Society. pp 332-360
59. Guglielmi, G., and Beguin, P. (1998) Cellulase and hemicellulase genes of
Clostridium thermocellum from five independent collections contain few overlaps
and are widely scattered across the chromosome. FEMS Microbiol Lett 161, 209-215
60. Bayer, E. A., Shoham, Y., and Lamed, R. (2013) Lignocellulose-Decomposing
Bacteria and Their Enzyme Systems. in The Prokaryotes: Prokaryotic Physiology
and Biochemistry (Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E., and
Thompson, F. eds.), Springer Berlin Heidelberg, Berlin, Heidelberg. pp 215-266
61. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside
hydrolase inventory drives plant polysaccharide deconstruction by the extremely
thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.
108, 1559-1569
62. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,
Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,
Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,
200
Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,
Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome
Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain
Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3
63. Upreti, R. K., Kumar, M., and Shankar, V. (2003) Bacterial glycoproteins: functions,
biosynthesis and applications. Proteomics 3, 363-379
64. Payne, C. M., Resch, M. G., Chen, L., Crowley, M. F., Himmel, M. E., Taylor, L. E.,
2nd, Sandgren, M., Stahlberg, J., Stals, I., Tan, Z., and Beckham, G. T. (2013)
Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically
bind to cellulose. Proc. Natl. Acad. Sci. U.S.A. 110, 14646-14651
65. Kanafusa-Shinkai, S., Wakayama, J. i., Tsukamoto, K., Hayashi, N., Miyazaki, Y.,
Ohmori, H., Tajima, K., and Yokoyama, H. (2013) Degradation of microcrystalline
cellulose and non-pretreated plant biomass by a cell-free extracellular
cellulase/hemicellulase system from the extreme thermophilic bacterium
Caldicellulosiruptor bescii. J. Biosci. Bioeng. 115, 64-70
66. Girio, F. M., Fonseca, C., Carvalheiro, F., Duarte, L. C., Marques, S., and Bogel-
Lukasik, R. (2010) Hemicelluloses for fuel ethanol: A review. Bioresour. Technol.
101, 4775-4800
67. Conway, J. M., Pierce, W. S., Le, J. H., Harper, G. W., Wright, J. H., Tucker, A. L.,
Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M. (2016)
201
Multidomain, Surface Layer-associated Glycoside Hydrolases Contribute to Plant
Polysaccharide Degradation by Caldicellulosiruptor Species. J. Biol. Chem. 291,
6732-6747
68. Chhabra, S. R., and Kelly, R. M. (2002) Biochemical characterization of Thermotoga
maritima endoglucanase Cel74 with and without a carbohydrate binding module
(CBM). FEBS Lett. 531, 375-380
202
CHAPTER 4
Structural and Biochemical Characterization of Glycoside Hydrolase Catalytic Domains and Carbohydrate Binding Modules from Caldicellulosiruptor Biomass Degradation Enzymes
203
Abstract
Caldicellulosiruptor species produce very large, multi-domain, extracellular enzymes that contain multiple catalytic glycoside hydrolase and carbohydrate binding domains within a single polypeptide. Using these multi-domain, multi-functional enzymes
Caldicellulosiruptor species are able to deconstruct plant biomass polysaccharides, including crystalline cellulose. This strategy for plant biomass degradation is distinct from bacteria that produce cellulosomes or free enzyme producing fungi. To examine the relationship between the structure and function of individual units from these proteins, the constituent domains from three Caldicellulosiruptor multi-domain enzymes were characterized biochemically in conjunction with their three-dimensional crystal structures. The protein domains investigated here include: a GH16 laminarinase (Calkro_0111), a GH5 β-glucanase (Csac_0678), and a
GH10/GH12/GH48 xylanase/β-glucanase (Wai35_2053). The Calkro_0111 GH16 domain was not active in the absence of an adjacent fascin-like domain. The structures for both of these domains were solved and significantly higher activity on the branched substrate laminarin than unbranched β-glucans suggests that the fascin domain of this protein functions to orient the catalytic GH16. A CBM28 adjacent to a GH5 in Csac_0678 (a highly conserved
SLH-domain protein among Caldicellulosiruptor species, Athe_0594 in Caldicellulosiruptor bescii) appears to play a similar role in enhancing the activity of the glycoside hydrolase domain. Additionally, the crystal structure of Csac_0678 was used to determine the catalytic acid/base (Glu189) and nucleophilic (Glu286) residues, in the GH5 of Athe_0594. The crystal structures of 3 of the 4 domains in Wai35_2053 were also determined. These domains, like those solved from Calkro_0111 and Csac_0678, are individually folded and
204
catalytic entities strung along a single polypeptide chain. Unlike some other multi-domain enzymes from Caldicellulosiruptor species, the activity of the domains of Wai35_2053 do not appear to act in a synergistic way, with each domain having its highest activity on a substrate with slightly different linkages and the overall activity of Wai35_2053 being similar to the sum of the activity of its parts. Taken together, by elucidating the relationship between the structure and function of the individual domains of the multi-domain enzymes of
Caldicellulosiruptor species, we not only understand how these domains and enzymes work on a fundamental level, but can begin to imagine leveraging this understanding to rationally design enzymes for improved synergy and substrate degradation.
Introduction
Extremely thermophilic bacteria from the genus Caldicellulosiruptor are particularly adept at degrading plant biomass polysaccharides and are the highest temperature cellulose degrading microorganisms currently identified. The large extracellular carbohydrate active enzymes (CAZymes) that perform this degradation are almost exclusively multi-domain, with multiple catalytic glycoside hydrolase (GH) and/or polysaccharide lyase (PL) domains and several carbohydrate binding module (CBM) domains in the same protein (1). The presence of multiple catalytic domains within the same protein offers the opportunity for synergy between modules with different biochemical functions, such as endo- and exo- acting, hemicellulase or cellulase. While CAZymes are produced by all plant biomass degrading microorganisms, the prevalence of multi-domain, multi-functional CAZymes in the genomes of Caldicellulosiruptor species is particularly unique (2).
205
Many of these extracellular multi-domain enzymes from Caldicellulosiruptor species have been studied biochemically such that a better understanding of how these enzymes function is emerging. Figure 4.1 shows selected extracellular enzymes that have been biochemically characterized from C. bescii, C. kronotskyensis, C. saccharolyticus, three highly cellulolytic species that are among the most highly studied species of the genus (3-6).
The extracellular enzymes of Caldicellulosiruptor species can be divided into two broad groups shown in Figure 4.1: those enzymes that are free from the cell, and those that contain surface layer homology (SLH) domains that associate proteins to the bacterial cell surface.
There are 8 groups of catalytic GH- or PL-containing S-layer associated enzymes through all of the Caldicellulosiruptor species, of which three (Csac_0678, Calkro_0111 and
Calkro_402) have been biochemically characterized. These proteins have been identified on the cell surface of their native species, attached within the S-layer by their SLH domains
(5,7). This location is believed to play a role in the attachment of Caldicellulosiruptor species to their plant biomass substrates. It may also be advantageous to the cell to degrade plant biomass and release sugars for uptake in close proximity to the cell. Calkro_0111 is the largest glycoside hydrolase-containing enzyme (2,435 amino acids) produced by any
Caldicellulosiruptor species, and is completely unique to C. kronotskyensis. It contains two catalytic domains (GH16 and GH55) as well as a number of CBM and accessory domains
(Fascin-like and Fibronectin type III) that appear to play a role in the function of the catalytic domains (5). Unlike Calkro_0111, Csac_0678 and its homologs are the smallest GH containing S-layer proteins produced by Caldicellulosiruptor species, and the GH5 domain of this protein appears to be conserved in all Caldicellulosiruptor species. The activity and
206
thermostability of the Csac_0678 GH5 appear to be enhanced by the CBM28 domain adjacent to it (8).
While S-layer enzymes are an important part of how Caldicellulosiruptor species degrade plant biomass, the majority of Caldicellulosiruptor extracellular enzymes are free from the cell. Several enzymes in the secretome have been characterized (see Figure 4.1).
Athe_1867 (CelA) is one of the most effective cellulase enzymes known (9). Athe_1867 is found in the glucan degradation locus (GDL), a locus of extracellular enzymes that contain
CBM3 domains, in all cellulolytic Caldicellulosiruptor species (4). From the GDL,
Athe_1857, Athe_1857, and Csac_1078 have all been characterized previously (6,10,11).
Downstream of the GDL in many Caldicellulosiruptor species is a locus of pectin-degrading enzymes; PL3 containing Athe_1854 has been characterized in detail from this locus (12,13).
Expressing these large Caldicellulosiruptor enzymes recombinantly to facilitate characterization has been problematic until recently, given the recent advancement of genetic tools for C. bescii (14,15). Thus, many of the enzymes characterized from
Caldicellulosiruptor were examined as truncation mutants of their constituent domains for characterization. Often the constituent domains, when taken apart, do not have the same activity as the full protein, as was the case for Csac_1078 (CelB) (6). This reveals the importance of the multi-domain structure of Caldicellulosiruptor enzymes for biomass degradation.
Biochemical characterization is important for understanding what substrates specific domains act on and which non-catalytic domains may play a role in this activity. However, this biochemical data falls short of elucidating the physical way that the domains are
207
arranged and how catalytic and binding domains function through their interactions with substrates, other domains, or other proteins. Information of this nature is best determined through the identification of protein structures. Because obtaining protein crystals is required for solving a protein structure by X-ray diffraction, the multi-domain Calidcellulosiruptor enzymes, with their heterogeneous domains and flexible disordered inter-domain linkers, are ill-suited for this technique in their native state. Thus, these proteins must be broken up into individual domains to produce crystals and obtain their structural information. Only six crystal structures from Caldicellulosiruptor CAZymes have been deposited to date in the protein data bank (www.rcsb.org) (16). These include: C. bescii Athe_1854 pectate lyase
PL3 (PDB ID: 3T9G) (13,17), C. sp. F32 β-1,3-1,4-glucanase GH5 (PDB ID: 4X0V), C. bescii Athe_0185 xylanase GH10 (PDB ID: 4L4O) (18), C. bescii Athe_1867 endoglucanase
GH9 (PDB ID: 4DOD) and cellobiohydrolase GH48 (PDB ID: 4EL8) (9), C. strain Rt8.B4 mannanase Man26 CBM27 (PDB ID: 1PMJ) (19).
Here, we present the crystal structures for seven CAZy domains from three enzymes of three different Caldicellulosiruptor species. In addition, we have characterized these domains biochemically to elucidate the relationship between the structure of the multi- domain proteins and their function.
Experimental Procedures
Bacterial Strains, Plasmids and Reagents
NEB 5-alpha E. coli (New England Biolabs) was used for the construction of vectors used in this study. Proteins for expression in E. coli were cloned in pET46 using Gibson
208
Assembly (20) using Gibson Assembly Mastermix (New England Biolabs). The sequences of some vectors were modified after cloning to add or remove a small number of codons for crystallography constructs or to introduce mutations in the active site of enzymes using the
NEB Q5 site directed mutagenesis kit (New England Biolabs). Protein expression was performed from Rosetta 2 (DE3) E. coli (Novagen). C. bescii was obtained from the Leibniz
Institute DSMZ-German Collection of Microorganisms and Cell Cultures. Replicating shuttle vector pDCW89 (15) was obtained from Dr. Janet Westpheling (University of
Georgia, Athens, Georgia) and modified by adding a cloning site and kanamycin resistance gene to produce vector pJMC046 for protein production in C. bescii. Caldicellulosiruptor genomic DNA (gDNA) for cloning was extracted, as described previously (21).
ZymoResearch plasmid miniprep classic and ZymoPURE midiprep kits (Zymoresearch) were used for isolating plasmid DNA, and sequences were confirmed by Sanger sequencing
(Genewiz). Carbohydrates used in this study are as follows: birch wood xylan (Sigma
Aldrich), oat spelt xylan (Sigma Aldrich), beech wood xylan (Sigma Aldrich), CM- pachyman (Megazyme), CM-curdlan (Megazyme), low viscosity barley β-glucan
(Megazyme), lichenan (Megazyme), laminarin (Sigma Aldrich), locust bean gum glucomannan (Sigma Aldrich), carboxymethyl cellulose (CMC) (Sigma Aldrich).
Media and Culture Conditions
E. coli cultures were grown in either liquid Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or on plates of LB medium with 1.5% (w/v) agar. LB media was supplemented, as appropriate, with 50 μg/mL kanamycin (Fisher Scientific), 50
209
μg/mL carbenicillin (Fisher Scientific), and/or 34 μg/mL chloramphenicol (Sigma Aldrich).
Protein expression in E. coli was performed in ZYM-5052 auto-induction medium, as described previously (22).
C. bescii strains were grown anaerobically with 20% CO2 / 80% N2 headspace at
70°C without agitation in a modified version of DSM516 medium, designated CG516.
CG516 media contains 0.33 g/L NH4Cl, 0.33 g/L KCl, 0.33 g/L MgCl2 ● 6H2O, 0.14 g/L
CaCl2 ● 2H2O, 0.16 mM sodium tungstate, 1mL/L trace element solution SL-10, 1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, 1 g/L sodium bicarbonate, 1mM potassium phosphate, 5 g/L glucose, and 0.5 g/L yeast extract. The vitamin solution for this media contains the following: 20 mg/L biotin, 20 mg/L folic acid,
100 mg/L Pyridoxine-HCl, 50 mg/L Thiamine-HCl ● 2H2O, 50 mg/L riboflavin, 50 mg/L nicotinic acid, 50 mg/L D-Ca-pantothenate, 50 mg/L Vitamin B12, 50 mg/L p-Aminobenzoic acid, and 50 mg/L Lipoic acid. To grow C. bescii competent cells, LOD medium (23) was used containing 5 g/L glucose and 1 x 19 amino acid solution (24). C. bescii was plated by embedding in CG516 medium with 1.5% agar and grown at 65°C with 2% hydrogen/98% nitrogen headspace in anaerobic conditions. To maintain the replicating vector for protein expression in C. bescii, media was supplemented with 50 μg/mL kanamycin (Fisher
Scientific).
Recombinant Protein Production in E. coli
Proteins were expressed in 1L ZYM-5052 auto-induction media (22) and incubated at
37°C with agitation at 250 rpm. Cells were harvested after 20 hours by centrifugation at 6000
210
x g for 10 minutes. The resulting cell pellets were frozen at -20°C prior to protein purification.
Purification of Proteins for Biochemical Characterization
To re-suspend the cell pellet, 5 mL / g wet cell pellet in 50 mM sodium phosphate, pH 7.4, 500 mM NaCl, 20 mM imidazole containing 1 mg / mL lysozyme (ThermoFisher
Scientific) was used. Cells were lysed using a French press at 16,000 psi cell pressure. Heat treated of the cell lysate at 65°C for 20 minutes was used to remove heat labile E. coli proteins. The heat-treated samples were then centrifuged at 18,000 x g for 30 minutes and
0.22 µm filtered. The cell extract was purified using 1 or 5 mL HisTrap HP nickel-Sepharose
(GE Healthcare) columns, operated according to the manufacturer’s instructions. All FPLC steps were performed using a Biologic DuoFlow FPLC (Bio-Rad). Proteins purified by size exclusion were purified using a HiPrep 16/600 Sephacryl S-200 HR column (GE Healthcare) in 50 mM sodium acetate pH5.5, 150 mM NaCl, 10 mM CaCl2 buffer. Purified proteins were concentrated, and buffer exchanged into 50 mM sodium acetate pH5.5, 100 mM NaCl, 10 mM CaCl2 buffer using 10,000 or 50,000 MWCO Vivaspin PES concentrators. Protein purification was assessed using Mini-PROTEAN TXG stain free 4-15% SDS-PAGE gels
(Bio-Rad). Glycoprotein staining was performed using the Pierce Glycoprotein Staining Kit
(ThermoFisher Scientific). All gel imaging was performed using the Syngene G:BOX gel imaging system. The Pierce BCA protein assay (ThermoFisher Scientific) was used to determine protein concentrations.
211
Construction of C. bescii Enzyme Expression Strains
Vector pJMC046 for the expression of proteins in C. besci was constructed from pSBS4 (14) by adding a protein expression site containing: the 200 bp Pslp promoter from
Athe_2303 to drive expression, a C-terminal His tag coding sequence, and the terminator from Calkro_0402. Genes for Wai35_2053 and Athe_0594 GH5-CBM28 were inserted in this vector and selected in both E. coli and C. bescii using 50 µg/mL kanamycin. Vectors produced in E. coli were methylated using M.CbeI methylase, as described previously (25) prior to transformation into C. bescii.
Competent cells were prepared by growing a 500 mL culture of C. bescii to an optical density at 680 nm of 0.06-0.08. This culture was cooled to room temperature and then cells were harvested. All centrifugation steps during the preparation of competent cells are carried out at 6000 x g for 10 minutes. Following this initial harvest, the cell pellet was washed three times in 10% sucrose solution. The cell pellet after the final wash was re-suspended to a total volume of 100-120 µL with 10% sucrose. Cells (50 µL) were mixed with 1-2 µg of plasmid
DNA and transferred to 1 mm gap cuvettes (USA Scientific). Cells were electroporated using a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) at the following conditions: 2.0 kV, 200 Ω, and 25 µF. Immediately following electroporation, cells are re- suspended in 1mL pre-warmed DG516 medium, transferred to a 10 mL pre-warmed culture, and incubated at 70°C. At 15 minutes and one hour post-electroporation, 1-5 mL was transferred into selective medium containing 50 µg/mL kanamycin. These selective media cultures were incubated at 70°C, until growth was observed, typically 2 days. After growth under selective conditions, these strains were plated once on CG516 medium containing
212
kanamycin. To verify strains contained the replicating plasmid, plasmid DNA was isolated from C. bescii using the ZymoResearch plasmid miniprep classic kit was transformed into 5α
E. coli. Plasmid DNA was then isolated from colonies of E. coli using the same kit and restriction mapped to verify the plasmid was intact from the C. bescii strain.
Protein Expression and Purification from C. bescii Strains
Protein expression strains of C. bescii were grown overnight in 500 mL CG516 medium with 50 µg/mL kanamycin at 70°C. This culture served as the inoculum for a
Sartorius Stedim BioStat Cplus 20L bioreactor containing 17 L of CG516 medium with 50
µg/mL kanamycin. The bioreactor was operated for 22 hours at 70°C with sparging of 1
L/min 20% CO2/80% N2 gas mix and agitation at 200 rpm. The pH of the bioreactor was controlled with NaOH addition at pH 7.00. To harvest the reactor, it was cooled to 20-30°C and the fermentation broth was transferred to a carboy. This broth was fractionated using the
Millipore Pellicon Mini Tangential Flow Filter using a Pellicon 2 mini 0.22 µm GVPP 0.1 m2 filter (Millipore). The permeate of this filter was retained as the extracellular protein fraction.
This extracellular protein fraction was concentrated using the TFF system and a Pellicon 2 mini BioMax 10 or 50 kDa cutoff PES 0.1 m2 filter (Millipore). Here the retentate of the filter contained the large extracellular proteins produced by C. bescii, including the protein of interest. This filtering setup was also used to buffer exchange the protein into 20 mM sodium phosphate 500 mM sodium chloride buffer pH 7.4. When the fermentation broth was concentrated to a volume of 300-800 mL, the material was sterile-filtered and loaded onto a 5 mL HisTrap HP nickel-Sepharose (GE Healthcare) column, and purified as described above.
213
Purification of Proteins for Cystallography
The frozen cell pellets were thawed at room temperature with equal volume of buffer
A (50 mM Tris pH 7.5, 100 mM NaCl and 10 mM imidazole) and lysed with lysozyme and sonication. One mg/mL lysozyme (Hampton Research, Aliso Viejo, CA), 1.0 U/mL Pierce
Universal Nuclease (Thermo Scientific, Rockford, IL) and EDTA-free protease inhibitor
(Thermo Scientific, Rockford, IL) according to manufacturer instructions were added in the lysis mixture and incubated for 30 minutes at room temperature with occasional vortexing.
Sonication was done at room temperature for two minutes using a Branson 5510 water bath sonicator (Branson Ultrasonics Corporation, Danbury, CT). Cell debris was removed by centrifugation at 15,000 x g for 15 min. The supernatant was loaded into a 5 mL HisTrap FF crude column (GE Healthcare, Piscataway, NJ) using an Akta FPLC system (GE Healthcare,
Piscataway, NJ) with buffer A. After loading and washing the unbound proteins from the column, protein samples were eluted using 100% of Buffer B (50 mM Tris pH 7.5, 100 mM
NaCl and 250 mM imidazole). Final purification was performed by size-exclusion chromatography using a HiLoad Superdex 75 (16/60) column (GE Healthcare, Piscataway,
New Jersey, USA) in buffer C (20 mM Tris pH 7.5 and 100 mM NaCl).
Crystallization
Crystals were initially obtained with sitting drop vapor diffusion using a 96-well plate with screens (Crustal Screen HT, PEG ion HT and Grid Salt HT) from Hampton Research
(Aliso Viejo, CA). Fifty µL of well solution was added to the reservoir and drops were made with 0.2 µL of well solution and 0.2 µL of protein solution using a Phoenix crystallization
214
robot (Art Robbins Instruments, Sunnyvale, CA). The crystals were grown in in the following conditions for each protein at 20°C: Csac_0678 GH5 in 1.0 M sodium malonate pH 7.0; Csac_0678 CBM28 in 0.1 M HEPES pH 7-8 and 64-74% (v/v) 2-methyl-2,4- pentanediol; Calkro_0111 GH16 in 0.1 M Sodium citrate tribasic dihydrate pH 5.6, 20%
(v/v) 2-Propanol and 20% (w/v) Polyethylene glycol 4,000; Calkro_0111 Fascin in 0.9-1.7 M ammonium sulfate and 0.1 M sodium acetate pH 4-5; Calkro_0111 GH16+Fascin in 20% w/v Polyethylene glycol 3,350 and 0.2 M Ammonium iodide; Wai35_2053 GH10 in Tris hydrochloride pH 8-9 and 1.5-2.1 M ammonium sulfate; Wai35_2053 CBM3 in 8% (v/v) tacsimate pH 4-5 and 14-25% polyethylene glycol 3350; and Wai35_2053 GH48 in 0.1 M citric acid pH 5-6, 14-25% (w/v) polyethylene glycol, 20% (v/v) 2-propanol and 50 mM tri- sodium citrate. The protein solutions used contained the following concentrations of protein in 20 mM Tris pH 7.5 and 100 mM NaCl supplemented with various ligands as noted: 18.5 mg/mL Csac_0678 GH5 supplemented with 10mM CaCl2; 20.7 mg/mL Csac_0678 CBM28;
12.4 mg/mL Calkro_0111 GH16 supplemented with 20mM cellobiose; 36 mg/mL
Calkro_0111 Fascin; 15 mg/mL Calkro_0111 GH16+Fascin supplemented with 5 mM cellobiose; 26 mg/mL Wai35_2053 GH10; 28 mg/mL Wai35_2053 CBM3; and 9 mg/mL
Wai35_2053 GH48 supplemented with 20mM cellobiose.
Data Collection and Processing
Crystals were flash frozen in a nitrogen gas stream at 100 K before home source data collection using an in-house Bruker X8 MicroStar X-Ray generator with Helios mirrors and
215
Bruker Platinum 135 CCD detector. Data were indexed and processed with the Bruker Suite of programs version 2014.9 (Bruker AXS, Madison, WI).
Structure Solution and Refinement
Intensities were converted into structure factors and 5% of the reflections were flagged for Rfree calculations using programs F2MTZ, Truncate, CAD and Unique from the
CCP4 package of programs (26). The program MOLREP (27) was used for molecular replacement using the following structures as the search model for each protein: PDB ID
1G01 for Csac_0678 GH5; PDB ID 3ACF for Csac_0678 CBM28; PDB ID 3ATG for
Calkro_0111 GH16; PDB ID 2YUG for Calkro_0111 Fascin; the previously determined
Calkro_0111 GH16 and Calkro_0111 Fascin structures for Calkro_0111 GH16+Fascin; PDB
ID 5AY7 for Wai35_2053 GH10; PDB ID 2L8A for Wai35_2053 CBM3; and PDB ID 4EL8 for Wai35_2053 GH48. Models for molecular replacement were created with ProtMod server
(http://ffas.burnham.org/protmod-cgi/protModHome.pl) using SCWRL (28) after sequence alignment against PDB using FFAS (29-31). Refinement and manual correction was performed using REFMAC5 (32) version 5.8.0155 and Coot (33) version 0.8.6.
Biochemical Characterization of Glycoside Hydrolases
Activity assays were performed by incubating a defined concentration of enzyme with a 1% (w/v) substrate solution in 100µL volume in a PCR plate thermocycler with heated lid (Eppendorf). All enzyme assays were performed in 50 mM sodium acetate buffer pH 5.5, containing 10 mM calcium chloride and 100 mM sodium chloride. Activity was measured
216
using the 3,5-dinitrosalicylic acid (DNS) reducing sugar assay (34,35), which was adapted to a 96 well format (36). The DNS-reagent used here contained 1.6% (w/v) sodium hydroxide,
30% (w/v) sodium potassium tartrate, and 0.1% (w/v) 3,5-dinitrosalicylic acid. Briefly, 25
µL of enzyme reaction was mixed with 50 µL DNS reagent and heated in a thermocycler at
95°C for 5 min, 48°C for 1 min, followed by holding at 20°C. To measure color development, 36 µL of DNS reaction was mixed with 160 µL distilled water in a 96-well plate and read at 540 nm. Glucose, and xylose were run as standards to calculate a curve to convert absorbance readings to µmol of reducing sugar.
Results
Calkro_0111 GH16 and Fascin Domain Containing Proteins
Calkro_0111 is the largest glycoside hydrolase (2,435 amino acids) produced by any sequenced Caldicellulosiruptor species, and based on searches of genome databases, it appears to be entirely unique to C. kronotskyensis. No proteins covering more than half of
Calkro_0111 result when the full sequence is used as a query for a blastp search, except for
Calkro_0121, a paralog of Calkro_0111 in C. kronotskyensis. Calkro_0111 contains two GH domains, GH16 and GH55, which act as an endoglucanase and exoglucanase, respectively, on the β-1,3/1,6-glucan laminarin (5). Like many Caldicellulosiruptor multi-functional enzymes, these endo- and exo- acting domains presumably work together, with the GH16 domain making new chain ends for the GH55 to degrade. This previous characterization work also showed that a Fascin-like domain (previously referred to as an actin crosslinking- like domain) adjacent to the GH16 was required for its activity. While there are no proteins
217
with the full multi-domain structure of Calkro_0111, several proteins consist of GH16-Facin domain pairs. Table 4.1 shows the closest blastp results when the Calkro_0111 GH16-Fascin sequence is used as a query. These proteins are almost exclusively from Paenibacillus species. Additionally, all of the results are only approximately 410 amino acid proteins comprised of only the GH16 and Fascin domains, with PaecuDRAFT_0918 being the only exception. PaecuDRAFT_0918 contains a CBM6 and CBM32 domain in addition to the
GH16 and Fascin, but this domain structure is not similar to the structure of Calkro_0111. To probe for the presence of Fascin domains in more distantly related GH16 sequences, a list of all genes containing GH16 domains (pfam00722) was obtained from the JGI Integrated
Microbial Genomes & Microbiomes database (37), and this list was searched using blastp with the Calkro_0111 Fascin domain sequence as the query. These results are shown in
Table 4.2. Here, the closest blastp matches consist only of the approximately 410 amino acids GH16 and Fascin domain proteins. Interestingly, two proteins (STAUR_1723 and
COCOR_01711) have the Fascin domain on the N-terminus of the GH16, as opposed to on the C-terminus as seen for all the other blastp matches and in Calkro_0111 itself. These results demonstrate that, while the GH16-Fascin structure is common to many GH16 enzymes, the larger multi-domain structure of Calkro_0111 remains unique.
Calkro_0111 GH16 and Fascin Domain Structures
The crystal structure of the GH16 domain from Calkro_0111 and the adjacent Fascin domain were solved individually (Figure 4.2A and Figure 4.2B/C, respectively). The GH16 structure was refined to a resolution of 1.7Å with a R/Rfree of 0.174/0.230 and the Fascin
218
structure was refined to a resolution of 1.5Å with a R/Rfree of 0.158/0.208. A structure was also determined for the full Calkro_0111 GH16-Fascin protein (Figure 4.2D), which was refined to a resolution of 2.35Å with a R/Rfree of 0.208/0.262. The GH16 structure resembles previously solved structures for GH16 endo-β-glucanases (PDB ID: 4DFS, for example), and has a cleft through which the β-glucan substrate fits to reach the active site for cleavage. A previously solved GH16 structure for an endo-β-galactosides from Clostridium perfringens also contains a smaller C-terminal domain similar to the Fascin domain of
Calkro_0111 (38). It was speculated to be involved in binding to polysaccharides, but it was not determined whether this smaller domain was required for the activity of the GH16 in this
Clostridium perfringens enzyme.
Biochemical Analysis of the Calkro_0111 GH16-Fascin Protein
The Calkro_0111 GH16-Fascin construct was produced in E. coli and purified
(Figure 4.3A). Truncation mutants of Calkro_0111 were previously characterized (5), and the only substrate where activity of the Calkro_0111 GH16 could be detected was the β-1,3-
1,6-glucan laminarin. The current GH16-Fascin construct includes 5 additional amino acids
(TVPSV) on the N-terminus of the TM4 construct characterized in this previous work. Based on the structure of the Calkro_0111 GH16-Fascin, the Fascin domain may play a role in orienting laminarin in the active site of the GH16. This could be accomplished by binding the
β-1,6-branching of laminarin to orient the β-1,3-glucan in the active site for cleavage. To examine this hypothesis, additional polysaccharides with β-1,3-glucan linkages that lack the
β-1,6-branching of laminarin were used as substrates for the Calkro_0111 GH16-Fascin
219
protein. These included: β-1,3/4-glucans lichenan and barley β-glucan, and β-1,3-glucans
CM-pachyman and CM-curdlan. The pachyman and curdlan polysaccharides used in this study are carboxymethylated to improve their solubility. Specific activity was determined on these substrates using the DNS reducing sugar assay (Figure 4.3B), and activity was detected on all of the substrates. However, the specific activity of Calkro_0111 GH16-Fascin on laminarin (0.175 µmol reducing sugar released / min / mg) was 40 times higher than the activity measured on CM-curdlan, 85 times higher than the activity measured on barley β- glucan and lichenan, and 140 times higher than the activity measured on CM-pachyman. This limited activity on β-1,3-glucans without β-1,6-branching suggests that the Calkro_0111
GH16-ACL is specifically designed for activity on laminarin-like branched substrates. It also implicates the Fascin domain as helping to orient the GH16 for improved activity on laminarin. Further, this role in improving activity of an adjacent GH domain is a classic role for CBM domains in CAZymes. This suggests the Fascin domain of Calkro_0111 may represent a new family of CBM, but further evidence of sugar binding is necessary to make this conclusion.
Wai35_2053 GH10, CBM3, and GH48 Domain Structures
Recently, the genome of Caldicellulosiruptor sp. strain Wai35.B1 (39) was sequenced, and a novel enzyme, referred to here as Wai35_2053, predicted to contain three catalytic GH domains, was identified. This is the first enzyme identified to date that has three catalytic domains within a single CAZyme protein, and also the first GH12 domain identified in a Caldicellulosiruptor species. In the genome of C. sp. Wai35.B1, Wai35_2053 (NCBI
220
accession WP_045175321) is found in the glucan degradation locus (GDL), the primary locus of multi-domain enzymes responsible for cellulose degradation in the
Caldicellulosiruptor species. Wai35_2053 has four domains: GH10-CBM3-GH12-GH48, which is similar to Athe_1857 (Figure 4.1) from the C. bescii GDL. In fact, the GH10-
CBM3 domains of Wai35_2053 are 88% identical to the corresponding domains in
Athe_1857, with most of the differences coming in the Pro/Thr rich linker between the two domains. Additionally, the GH48 domain of Wai35_2053 is 98% identical to the GH48 of
Athe_1857, which coincidentally is identical to the GH48 domain of Athe_1867. The GH12 domain of Wai35_2053 has no homology to Athe_1857 or other previously identified
Caldicellulosiruptor proteins.
The domains of Wai35_2053 were individually cloned and produced in E. coli for crystallization efforts. Crystal structures were solved for three of the four domains: GH10,
CBM3, and GH48 (Figure 4.4 A/B, C/D, and E, respectively). The GH10 structure was refined to a resolution of 1.9 Å with a R/Rfree of 0.180/0.234; the CBM3 structure was refined to a resolution of 2.0 Å with a R/Rfree of 0.180/0.245; and, the GH48 was refined to a resolution of 1.9 Å with a R/Rfree of 0.167/0.239. These three crystal structures represent very typical GH10, CBM3, and GH48 domains, respectively. The GH10 structure of
Wai35_2053 resembles that of intracellular C. bescii Athe_0185 xylanase GH10 (PDB ID:
4L4O) (18). Additionally the GH48 domain of Wai35_2053 is similar to the previously solved structure for the GH48 of Athe_1867 (PDB ID: 4EL8) (9), which has 98% sequence identity to the Wai35_2053 GH48.
221
Biochemical Analysis of Wai35_2053 and Constituent Domains
To investigate the activity of Wai35_2053, the full length protein was produced in C. bescii (Figure 4.5A) using newly developed protein expression techniques relying on a high temperature kanamycin resistance marker for C. bescii (14). This figure also shows the individual catalytic domains of Wai35_2053 produced in E. coli for characterization. C. bescii is known to glycosylate its large extracellular proteins, such as Athe_1867 CelA (40), so the glycosylation of Wai35_2053 produced in C. bescii was investigated. Staining an
SDS-PAGE gel containing Wai35_2053 for glycoprotein shows that it is glycosylated
(Figure 4.5B). Glycosylation of Wai35_2053 is also evidenced by the 185 kDa apparent molecular mass of Wai35_2053 on the SDS-PAGE gel, while its predicted size based on sequence alone is 170 kDa.
The activity of Wai35_2053 was screened in an overnight assay on several polysaccharide substrates (data not shown) and substrates where activity was detected were used to assay the activity of the full protein and the three catalytic domains in more detail.
Activity was measured using the DNS reducing sugar assay and activity is calculated on a molar basis to compare across the proteins of different molecular weight (Figure 4.5C/D).
Wai35_2053 is very highly active on xylan substrates, and this xylanase activity appears to be the result of the GH10 domain, based on the similar specific activity of the GH10 domain alone to the full protein on birch wood, oat spelt, and beech wood xylans. This activity for the GH10 domain of Wai35_2053 is similar to that of the GH10 of Athe_1857 (11), which is a close homolog. The GH12 and GH48 had detectable activity on these xylan substrates, but at a level 2 orders of magnitude less than the GH10. This slight activity on xylan has
222
previously been identified for the GH48 of Athe_1867 (9). GH12 domains are typically xyloglucanases or β-glucanases, but some do have activity on xylan substrates (41).
Significant β-1,3-glucanase activity was detected on barley β-glucan for the GH10 and GH12 domains, which each have a specific activity of 250 µmol reducing sugar released / min /
µmol. The activity of the full protein is equal to the sum of these two domains, at approximately 500 µmol reducing sugar released / min / µmol. The GH48 also shows low activity on barley β-glucan. Wai35_2053 and its domains also showed activity on β-1,3/4- glucan lichenan, β-1,4-glucan carboxymethyl cellulose, and β-1,4-mannan locust bean glucomannan (Figure 4.5D), and these activities appear to primarily reside in the GH48 domain. Also of interest is that the activity on locust bean glucomannan appears to only be related to the C-teriminal GH12 and GH48 domains of Wai35_2503, as the GH10 has no activity on this substrate. These results, with the full enzyme having similar activity to the sum of its individual catalytic domains, suggest that minimal synergy is present between the three catalytic domains of Wai35_2053.
Csac_0678 GH5 and CBM28 Structures
Csac_0678 consists of a GH5 domain, CBM28, and repeated SLH domains that anchor this protein to the cell surface within the S-layer, as described previously (7). The
GH5 of Csac_0678 is conserved in all sequenced Caldicellulosiruptor species, but certain species are lacking the CBM28 and SLH domains. The homolog of the Csac_0678 GH5 in C. sp. Rt8.B8 consists of only the single GH5 domain without the CBM28 or SLH domains. The two American isolates, C. owensensis and C. obsidiansis, have homologs that lack the
223
CBM28 domain between their GH5 and SLH domains. Here, the three-dimensional structure of the Csac_0678 GH5 (Figure 4.6 A/B) and CBM28 (Figure 4.6 C/D) were determined.
The structure of Csac_0678 GH5 was refined to a resolution of 1.5 Å with a R/R-free of
0.085/0.115. The Csac_0678 GH5 displays a typical (βα)8-TIM barrel fold (42). This structure has been deposited into the protein data bank (PDB; www.rcsb.org) with entry code
5ECU. The Csac_0678 CBM28 was refined to a resolution of 2.62Å with a R/R-free of
0.260/0.344, with further refinement in progress.
Pair-wise secondary-structure matching by the PDBefold program (43) found 68 unique structural matches for Csac_0678 GH5 from the protein data bank with at least 70% secondary structure similarity. The most similar match was alkaline cellulase K from
Bacillus sp. KSM-635 (44), with a secondary structure similarity of 83%, sequence similarity of 63%, and Cα root-mean-square deviation of 0.727 Å2, suggesting highly similar backbones between the two proteins. Further comparison with other GH5 structures revealed that the Csac_0678 GH5 and Bacillus GH5 differ from the other GH5s by having an enlarged loop near the active site (Figure 4.6 B). However, Csac_0678 GH5 does not have other extended features of Bacillus GH5 (44). The catalytic residues of Csac_0678 GH5 (Glu188 and Glu285) are in similar orientation compared to Bacillus GH5. The residues near the active sites are identical except for leucine instead of histidine at position 151. Histidine at this position is a special feature of the Bacillus GH5 (44).
The structures of only two CBM28 domains have been solved to date. One of these is from a β-glucanase, C. josui Cel5A (45), which has a domain structure very similar to
Csac_0678 with a GH5, CBM28, and SLH domains. However, unlike Csac_0678, C. josui
224
Cel5A has an additional CBM17 domain between its GH5 and CBM28 domains. The
CBM17 and CBM28 families are very similar, but have binding sites for glucan chains that are in opposite directions. The CBM28 of C. josui binds to cellooligosaccharides and is hypothesized to be involved in the orientation of glucan chains towards the active site of its
GH5 (45). The CBM28 of Csac_0678 presumably plays a similar role by binding to glucan chains along its face and orienting the polysaccharide towards the active site of the GH5.
Proteins produced containing both the GH5 and CBM28 domains together did not form crystals for analysis. This is the first time the GH5 and CBM28 domain from the same protein have both been crystallized.
Biochemical Analysis of Csac_0678 and Athe_0594
The biochemical properties and substrates of Csac_0678 were established previously
(4). Additionally, work with the C. bescii homolog (Athe_0594) has shown that the CBM28 improves the activity and thermosstability of the GH5 domain (8). Here, Csac_0678 was compared to Athe_0594 and the active site of Athe_0594 GH5 was investigated based on the predicted active site of the Csac_0678 GH5 crystal structure (Figure 4.6A).
Csac_0678 and Athe_0594 are close homologs, but have a relatively low percent identity (65%) over the whole protein because of divergence in the sequences of their C- terminal SLH domains. Characterizing Athe_0594 from C. bescii is of interest because tools for manipulating C. bescii genetically were recently developed (14,46), while these tools do not exist currently for C. saccharolyticus. An alignment of Csac_0678 and Athe_0594 are shown in Figure 4.7A. Arrows in this figure mark the domain boundaries between the GH5,
225
CBM28 and SLH domains. As seen in this figure, the GH5 domains have very similar sequences (81% sequence identity), the CBM28 domains are highly similar (68% sequence identity), and the SLH domains are the most dissimilar (40% sequence identity) part of these proteins. Because the GH5 sequences are highly similar the active site residues predicted by the Csac_0678 structure (Glu188 and Glu285) (Figure 4.6A) align to the active site of
Athe_0594 (Glu189 and Glu286), which are marked in Figure 4.7A.
To investigate their activities, truncations of Csac_0678 and Athe_0594 containing only the GH5 and CBM28 domain without SLH domains were produced in E. coli. In addition, using newly developed protein expression techniques in C. bescii, an identical truncation of Athe_0594 was produced in C. bescii. Figure 4.7B shows an SDS-PAGE gel of these purified proteins. Based on the previously identified glycosylation of extracellular proteins in C. bescii (40), including Wai35_2053, the Athe_0594 GH5-CBM28 protein produced in C. bescii was stained to detect glycoprotein. Athe_0594 GH5-CBM28 was not found to be glycosylated (data not shown). To investigate the active site of the Athe_0594
GH5, the predicted active site acid/base residue and nucleophilic residue (Glu189 and
Glu286, respectively) were mutated to Ala individually and in combination. This resulted in three mutants of Athe_0594 GH5: E189A, E286A, and E189AE286A, which were expressed alongside the native Athe_0594 GH5 in E. coli and purified (Figure 4.7C).
The activity of these proteins was compared on barley β-glucan and CMC, two substrates Csac_0678 was previously identified to be most active on (4). Csac_0678 GH5-
CBM28 and both the E. coli expressed and C. bescii expressed Athe_0594 GH5-CBM28 had nearly identical activities on these two substrates (Figure 4.7D). The activity of the
226
Athe_0594 GH5 alone is also shown in this figure for comparison. The 250-fold decrease in activity between the GH5-CBM28 and GH5 protein, demonstrates the contribution of the
CBM28 to enhancing the activity of the GH5 that was previously described (8). The activities of the Athe_0594 GH5 active site mutants were also evaluated alongside the native
Athe_0594 GH5 protein (Figure 4.E). Either mutation (E189A or E286A) individually and both mutations in combination appear to have the same effect of nearly abolishing activity of the GH5, confirming Glu189 and Glu286 as the active site residues of this GH5 domain.
Based on structural similarity to other GH5 domain Glu189 can be identified as the acid/base residue and Glu286 can be identified as the nucleophilic residue of the active site (47,48).
Thus, by using the Csac_0678 GH5 crystal structure the active site residues of the Athe_0594 were determined and confirmed.
Discussion
Protein crystal structures are a powerful tool for identifying the physical properties of proteins including: the active site residues, the tertiary structure of a protein, and its interaction with substrates, other domains, and other proteins. Here, we present the crystal structures for 7 domains from three proteins produced by Caldicellulosiruptor species. To complement the structural understanding of these proteins, biochemical data was collected for individual domains and full proteins to inform our understanding of how the structure of an individual domain leads to its function and the function of the larger multi-domain protein.
227
The Calkro_0111 GH16-Facin construct represents a portion of a very unique protein from C. kronotskyensis. While no proteins can be identified that have a similar structure to the large and complex Calkro_0111, there are several identifiable proteins that are only comprised of a similar GH16-Facin domain pair (Table 4.1 and Table 4.2) to that found in
Calkro_0111. The GH16-Fascin domain pair was the smallest identifiable truncation around the Calkro_0111 GH16 domain to have activity (5). It does not appear that the proteins in
Tables 4.1 and 4.2 have been characterized, but it would be interesting to determine if their
GH16 domains also do not have activity without the adjacent Fascin domain. Further, based on the structure of the Calkro_0111 GH16-Fascin (Figure 4.2D) and its significantly higher activity on the β-1,6-branched β-1,3-glucan laminarin compared to other unbranched β-1,3- glucans (Figure 4.3B), the Fascin domain may be a new family of CBM. By binding to laminarin, likely near the branch points of the polysaccharide, it may help to orient the substrate in the active site of the GH16 leading to its increased activity. Further analysis is necessary to substantiate the binding of the Fascin domain to laminarin, but without both the structure of these domains and their biochemical activity we would not understand the coordinated function of these domains.
In the case of Wai35_2053, the crystal structures for three domains (GH10, CBM3, and GH48) are reported (Figure 4.4). These structures represent fairly typical structures for members of these CAZyme domain families. What is interesting about these structures, however, is that they are individual pieces of a much larger multi-domain protein. Each of the catalytic domains, GH10, GH48, and GH12, are active enzymes on their own, and the structures for the GH10 and GH48 are representative of these particular GH families in
228
enzymes that are not multi-domain. Thus, the individually folded and catalytic domains of
Wai35_2053 are like individual knots on a polypeptide rope. This is representative of the multi-domain strategy used in may Caldicellulosiruptor proteins (Figure 4.1). One reason for the prevalence of multi-domain enzymes in these species could be to deploy domains that are synergistic together in a single protein so they can work together on a given substrate.
This appears to be the case for previously characterized CelB (Csac_1078), for which the full protein has higher activity than its individual domains or mixtures of domains that are not attached in the same protein (6). It does not appear from the biochemical data presented here
(Figure 4.5) that the activity of Wai35_2053 is as synergistic as other previously characterized Caldicellulosiruptor multi-domain proteins. For most substrates, the activity of the full protein is accounted for by a single domain or the sum of multiple domains. For example, the GH10 xylanase domain has comparable activity on birch wood xylan and beech wood xylan to the full protein, and while the GH12 and GH48 domains have some activity on these substrates the activity of the full length Wai35_2053, which contains all three domains, is not increased significantly above the GH10 alone. On barley β-glucan, the activity of the GH10 and GH12 domains appear to be additive to reach the activity of the full
Wai35_2053. This is what would be expected if these domains were not synergistic, because, on a molar basis, Wai35_2053 contains both a GH10 and GH12 domain in a single
Wai35_2053 molecule.
The GH12 domain of Wai35_2053 is currently in crystallization trials, but no crystals have formed as of yet. While having this domain structure would be useful for informing the full structure of the protein, having three of the four domains will allow for molecular
229
dynamic simulations to reconstruct the structure of this multi-domain protein. Modeling can also be used to determine a suitable GH12 model to stand in for the unsolved structure of the
Wai35_2053 GH12. By using a modeling approach and a reconstructed full Wai35_2053 structure, we would learn how the protein domains of Wai35_2053 are oriented relative to one another. Similar approaches have previously been utilized on Athe_1867 CelA to identify its mechanism and understand the synergistic action of its GH9 and GH48 domains
(9). This type of molecular simulation with Wai35_2053 would help to identify whether the structure of the protein would suggest any synergistic interactions between the domains. The structure of a protein with three catalytic domains, and in particular a GH12 and GH48 as adjacent domains, would also be interesting for understanding how these three domain each access the substrate while remaining attached in a single protein.
As with the structures of Calkro_0111 and Wai35_2053, the crystal structure of
Csac_0678 (Figure 4.6A) helped to inform biochemical experiments on this protein and its homolog Athe_0594. Here, the active site of Athe_0594 was predicted based on the active site determined by the structure of Csac_0678. Active site residues Glu189 (acid/base) and
Glu286 (nucleophile) of Athe_0594 were mutated to Ala individually and in combination in the Athe_0594 GH5. All three of these mutants showed a very significant loss in activity on
CMC and barley β-glucan compared to the native Athe_0594 GH5 domain (Figure 4.7E).
This confirms these residues are the catalytic acid/base and nucleophilic residues of these domains. This demonstrates the powerful ability of structural information to inform the determination of the active site of a protein. Similar analysis to this could be applied to the other structures determined here for Wai35_2053 catalytic domains or the Calkro_0111
230
GH16 to identify their active sites. Additionally, we confirm that the activities of the
Csac_0678 GH5-CBM28 and Athe_0594 GH5-CBM28 proteins are identical on the substrates CMC and β-beta glucan, and the activity of the Athe_0594 GH5 with its adjacent
CBM28 domain is at least two orders of magnitude higher than the GH5 alone. This presents a similar situation to the Calkro_0111 GH16, where an adjacent putative binding domain also enhances the activity of the catalytic GH domain. In addition, the same molecular simulations that would be useful to reconstruct a Wai35_2053 structure from its constituent domains could be used to model the full structure of Csac_0678.
The analysis of these Caldicellulosiruptor multi-domain proteins, by breaking them into their constituent parts for structural analysis, provides a wealth of information about how these enzymes function. Here, we present examples for the use of structural information to inform biochemical experiments and how biochemical data explains features seen in protein structures. The CBM28 domain of Csac_0678 plays and important part in enhancing the activity of its GH5, just as a Fascin-like domain appears to do for the Calkro_0111 GH16.
The structure of the GH16-Fascin protein informed how much higher activity towards the substrate laminarin could be explained. Additionally, the structures and biochemistry of
Wai35_2053 give us a broader picture of how the domains of multi-domain enzymes of
Caldicellulosiruptor species are arranged and function. The full length multi-domain proteins of Caldicellulosiruptor species are too large and heterogeneous to form the crystals necessary for solving their structures by X-ray diffraction, but by taking these proteins apart, we have demonstrated that the individual domains of these proteins truly are individually folded and functional. This observation about the structure of these multi-domain proteins, where
231
domains are like beads strung along a chain, could allow for the rational design of multi- domain enzymes in the future. By gaining a clearer picture of individual domain structures, the biochemical properties of the domains, and how the domains are arranged into multi- domain proteins, designing enzymes with enhanced synergistic properties may be within our reach.
Acknowledgements
This work was funded in part by the BioEnergy Science Center, a U.S. Department of
Energy Bioenergy Research Center supported by the Office of Biological and Environmental
Research in the DOE Office of Science. Protein crystallography was performed at the
National Renewable Energy Laboratory by Vladimir Lunin and Markus Alahuhta. Inci
Ozdemir performed the cloning of the Csac_0678 GH5 domain.
232
Figure 4.1 - Selected multi-domain enzymes from Caldicellulosiruptor species Proteins are grouped into two groups based on whether they are predicted to be surface- associated (by SLH domains), or freely secreted from the cell. The predicted domain arrangement of these proteins are shown to scale. Glycoside Hydrolase (GH) domains are shown as squares, Carbohydrate Binding Modules (CBM) are shown as circles, Polysaccharide Lyases (PL) are shown as diamonds. Accessory domains: Fascin-like (CL0066), Fibronectin Type III (FN3, pfam 00041), and Cadherin-like (pfam 12733) are also shown.
233
Figure 4.2 - Crystal strucutres for Calkro_0111 GH16 and Fascin-like domain (A) GH16 (B/C) Fascin-like domain (C) GH16-Fascin
234
Figure 4.3 - Biochemical data on the Calkro_0111 GH16-Fascin protein (A) Purified Calkro_0111 GH16-Fascin protein (B) Activity on β-1,3-glucan substrates. Error bars represent the standard deviation (n=4).
235
Figure 4.4 - Crystal structures for Wai35_2053 domains (A/B) GH10 (C/D) CBM3 (E) GH48
236
Figure 4.5 - Biochemical data on Wai35_2053 and its domains (A) Domain layout of Wai35_2053 and proteins produced for characterization (B) Glycoprotein staining of Wai35_2053. A positive glycoprotein control and negative non- glycosylated protein control are also shown. (C) Activity on xylans and barley β-glucan (D) Activity on glucomannan, CMC, and lichenan. Error bars represent the standard deviation (n=4).
237
Figure 4.6 - Crystal structures for Csac_0678 domains (A) Csac_0678 GH5. Active site residues Glu188 and Glu285 are shown. (B) Structural comparison of the Csac_0678 GH5 to alkaline cellulase K from Bacillus sp. KSM-635 (PDB ID: 1G01). A unique extended loop (circled) near the active site in both structures differentiates them from other GH5 domains. (C/D) Csac_0678 CBM28.
238
Figure 4.7 - Biochemical data on Csac_0678 and Athe_0594 proteins (A) Alignment of Csac_0678 and Athe_0594. The domain boundaries between different protein domains are marked. The active site residues are marked and labeled using the Athe_0594 sequence numbers (Glu189 and Glu286) (B) Csac_0678 and Athe_0594 GH5- CBM28 proteins produced in E. coli and C. bescii. (C) Athe_0594 GH5 proteins with active site mutations. (D) Activity of Csac_0678/Athe_0594 GH5-CBM28 proteins and Athe_0594 GH5 on CMC and barley β-glucan. (E) Activity of Athe_0594 GH5 with various mutations on CMC and barley β-glucan Error bars represent the standard deviation (n=4).
239
Table 4.1 - Proteins identified by blastp using Calkro_0111 GH16-Fascin sequence as the query
Amino Blastp Blastp Species Locus Tag Acid Domains Coverage Identity Length % %
SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3-FN3- Calkro_0111 2435 - - kronotskyensis 2002 Fascin-FN3-FN3-GH55- CBM32-CBM32
SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3-FN3- Calkro_0121 2229 97 69 kronotskyensis 2002 Fascin-FN3-FN3-GH55- CBM32 Paenibacillus PaecuDRAFT_0918 762 CBM6-CBM32-GH16-Fascin 97 62 curdlanolyticus YK9 Thermoactinomyces JG50_RS0110100 418 GH16-Fascin 97 52 daqus Paenibacillus sp. PTI45_02212 407 GH16-Fascin 97 50 TI45-13ar Paenibacillus sp. HMPREF1207_05298 410 GH16-Fascin 98 49 HGH0039 Paenibacillus PCH01S_RS00565 423 GH16-Fascin 98 49 chitinolyticus Paenibacillus WH45_RS10505 410 GH16-Fascin 95 49 wulumuqiensis
240
Table 4.2 - GH16 domain-containing proteins identified by blastp using the Calkro_0111 Fascin domain sequence as the query
Amino Blastp Blastp Species Locus Tag Acid Domains Coverage Identity Length % %
SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3- Calkro_0111 2435 - - kronotskyensis 2002 FN3-Fascin-FN3-FN3- GH55-CBM32-CBM32
SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3- Calkro_0121 2229 96 63 kronotskyensis 2002 FN3-Fascin-FN3-FN3- GH55-CBM32 Paenibacillus sp. BD3526 Ga0100886 410 GH16-Fascin 97 38 Stigmatella aurantiaca STAUR_1723 445 Fascin-GH16 96 40 DW4/3-1 Coraliomargarita Caka_0989 396 GH16-Fascin 93 33 akajimensis DSM 45221 Archangium gephyra DSM Ga0081691_113251 422 GH16-Fascin 89 41 2261 Corallococcus coralloides COCOR_01711 438 Fascin-GH16 94 36 DSM 2259 Pedobacter saltans DSM Pedsa_2631 431 GH16-Fascin 89 37 12145 Coraliomargarita Caka0071 417 GH16-Fascin 95 34 akajimensis DSM 45221 Chitinophaga pinensis DSM Cpin_5109 409 GH16-Fascin 76 36 2588 Niastella koreensis GR20-10, Niako_0669 406 GH16-Fascin 71 39 DSM 17620 Mucilaginibacter Ga0123631 422 GH16-Fascin 79 33 PAMC26640
241
References
1. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.
(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus
Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic
Press, Norfolk, UK. pp 91-120
2. Bayer, E. A., Shoham, Y., and Lamed, R. (2013) Lignocellulose-Decomposing
Bacteria and Their Enzyme Systems. in The Prokaryotes: Prokaryotic Physiology
and Biochemistry (Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E., and
Thompson, F. eds.), Springer Berlin Heidelberg, Berlin, Heidelberg. pp 215-266
3. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-
Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative
analysis of extremely thermophilic Caldicellulosiruptor species reveals common and
differentiating cellular strategies for plant biomass utilization. Appl. Environ.
Microbiol. 81, 7159-7170
4. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,
Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,
Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,
R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal
determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.
Bacteriol. 194, 4015-4028
5. Conway, J. M., Pierce, W. S., Le, J. H., Harper, G. W., Wright, J. H., Tucker, A. L.,
Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M. (2016)
242
Multidomain, Surface Layer-associated Glycoside Hydrolases Contribute to Plant
Polysaccharide Degradation by Caldicellulosiruptor Species. J. Biol. Chem. 291,
6732-6747
6. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside
hydrolase inventory drives plant polysaccharide deconstruction by the extremely
thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.
108, 1559-1569
7. Ozdemir, I., Blumer-Schuette, S. E., and Kelly, R. M. (2012) S-layer homology
domain proteins Csac_0678 and Csac_2722 are implicated in plant polysaccharide
deconstruction by the extremely thermophilic bacterium Caldicellulosiruptor
saccharolyticus. Appl. Environ. Microbiol. 78, 768-777
8. Velikodvorskaya, G. A., Chekanovskaya, L. A., Lunina, N. A., Sergienko, O. V.,
Lunin, V. G., Dvortsov, I. A., and Zverlov, V. V. (2013) Family 28 carbohydrate-
binding module of the thermostable endo-1,4-beta-glucanase CelD from
Caldicellulosiruptor bescii maximizes enzyme activity and irreversibly binds to
amorphous cellulose. Mol Biol (Mosk) 47, 581-586
9. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,
Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and
Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion
Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516
10. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.
O. (2012) Molecular and biochemical analyses of the GH44 Module of
243
CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium
Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059
11. Xue, X., Wang, R., Tu, T., Shi, P., Ma, R., Luo, H., Yao, B., and Su, X. (2015) The
N-Terminal GH10 Domain of a Multimodular Protein from Caldicellulosiruptor
bescii Is a Versatile Xylanase/beta-Glucanase That Can Degrade Crystalline
Cellulose. Appl Environ Microbiol 81, 3823-3833
12. Alahuhta, M., Xu, Q., Bomble, Y. J., Brunecky, R., Adney, W. S., Ding, S. Y.,
Himmel, M. E., and Lunin, V. V. (2010) The unique binding mode of cellulosomal
CBM4 from Clostridium thermocellum cellobiohydrolase A. J Mol Biol 402, 374-387
13. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.
E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor
bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.
Crystallogr. 69, 534-539
14. Lipscomb, G. L., Conway, J. M., Blumer-Schuette, S. E., Kelly, R. M., and Adams,
M. W. W. (2016) Highly Thermostable Kanamycin Resistance Marker Expands the
Toolkit for Genetic Manipulation of Caldicellulosiruptor bescii. Appl. Environ.
Microbiol. 82, 4421-4428
15. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable
replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic
methodologies to other members of this genus. PLoS One 8, e62881
244
16. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H.,
Shindyalov, I. N., and Bourne, P. E. (2000) The Protein Data Bank. Nucleic Acids
Res. 28, 235-242
17. Alahuhta, M., Taylor, L. E., 2nd, Brunecky, R., Sammond, D. W., Michener, W.,
Adams, M. W., Himmel, M. E., Bomble, Y. J., and Lunin, V. (2015) The catalytic
mechanism and unique low pH optimum of Caldicellulosiruptor bescii family 3
pectate lyase. Acta. Crystallogr. D Biol. Crystallogr. 71, 1946-1954
18. Zhang, Y., An, J., Yang, G., Zhang, X., Xie, Y., Chen, L., and Feng, Y. (2016)
Structure features of GH10 xylanase from Caldicellulosiruptor bescii: implication for
its thermophilic adaption and substrate binding preference. Acta Biochim Biophys Sin
(Shanghai) 48, 948-957
19. Roske, Y., Sunna, A., Pfeil, W., and Heinemann, U. (2004) High-resolution crystal
structures of Caldicellulosiruptor strain Rt8B.4 carbohydrate-binding module
CBM27-1 and its complex with mannohexaose. J. Mol. Biol. 340, 543-554
20. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods
Enzymol. 498, 349-361
21. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.
(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic
euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894
22. Studier, F. W. (2005) Protein production by auto-induction in high density shaking
cultures. Protein Expr. Purif. 41, 207-234
245
23. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)
Improved growth media and culture techniques for genetic analysis and assessment of
biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,
41-49
24. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,
Adams, M. W., and Westpheling, J. (2011) Natural competence in the
hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:
construction of markerless deletions of genes encoding the two cytoplasmic
hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238
25. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)
Methylation by a unique alpha-class N4-cytosine methyltransferase is required for
DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844
26. Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R.,
Keegan, R. M., Krissinel, E. B., Leslie, A. G., McCoy, A., McNicholas, S. J.,
Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin,
A., and Wilson, K. S. (2011) Overview of the CCP4 suite and current developments.
Acta Crystallogr D Biol Crystallogr 67, 235-242
27. Vagin, A., and Teplyakov, A. (2010) Molecular replacement with MOLREP. Acta
Crystallogr D Biol Crystallogr 66, 22-25
28. Canutescu, A. A., Shelenkov, A. A., and Dunbrack, R. L., Jr. (2003) A graph-theory
algorithm for rapid protein side-chain prediction. Protein Sci 12, 2001-2014
246
29. Jaroszewski, L., Rychlewski, L., Li, Z., Li, W., and Godzik, A. (2005) FFAS03: a
server for profile--profile sequence alignments. Nucleic Acids Res 33, W284-288
30. Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. (2000) Comparison of
sequence profiles. Strategies for structural predictions using sequence information.
Protein Sci 9, 232-241
31. Jaroszewski, L., Rychlewski, L., and Godzik, A. (2000) Improving the quality of
twilight-zone alignments. Protein Sci 9, 1487-1496
32. Murshudov, G. N., Skubak, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls,
R. A., Winn, M. D., Long, F., and Vagin, A. A. (2011) REFMAC5 for the refinement
of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr 67, 355-
367
33. Emsley, P., Lohkamp, B., Scott, W. G., and Cowtan, K. (2010) Features and
development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501
34. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing
sugar. Anal. Chem. 31, 426-428
35. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-
268
36. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose
assay for endoglucanase activity. Anal. Biochem. 342, 176-178
37. Markowitz, V. M., Chen, I. M., Palaniappan, K., Chu, K., Szeto, E., Pillay, M.,
Ratner, A., Huang, J., Woyke, T., Huntemann, M., Anderson, I., Billis, K., Varghese,
N., Mavromatis, K., Pati, A., Ivanova, N. N., and Kyrpides, N. C. (2014) IMG 4
247
version of the integrated microbial genomes comparative analysis system. Nucleic
Acids Res. 42, D560-567
38. Tempel, W., Liu, Z. J., Horanyi, P. S., Deng, L., Lee, D., Newton, M. G., Rose, J. P.,
Ashida, H., Li, S. C., Li, Y. T., and Wang, B. C. (2005) Three-dimensional structure
of GlcNAcalpha1-4Gal releasing endo-beta-galactosidase from Clostridium
perfringens. Proteins: Struct., Funct., Bioinf. 59, 141-144
39. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,
Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,
Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,
Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,
Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome
Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain
Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3
40. Chung, D., Young, J., Bomble, Y. J., Vander Wall, T. A., Groom, J., Himmel, M. E.,
and Westpheling, J. (2015) Homologous expression of the Caldicellulosiruptor bescii
CelA reveals that the extracellular protein is glycosylated. PLoS One 10, e0119508
41. Habrylo, O., Song, X., Forster, A., Jeltsch, J. M., and Phalip, V. (2012)
Characterization of the four GH12 Endoxylanases from the plant pathogen Fusarium
graminearum. J Microbiol Biotechnol 22, 1118-1126
42. Wierenga, R. K. (2001) The TIM-barrel fold: a versatile framework for efficient
enzymes. FEBS Lett 492, 193-198
248
43. Krissinel, E., and Henrick, K. (2004) Secondary-structure matching (SSM), a new
tool for fast protein structure alignment in three dimensions. Acta Crystallographica
Section D-Biological Crystallography 60, 2256-2268
44. Shirai, T., Ishida, H., Noda, J., Yamane, T., Ozaki, K., Hakamada, Y., and Ito, S.
(2001) Crystal structure of alkaline cellulase K: insight into the alkaline adaptation of
an industrial enzyme. J Mol Biol 310, 1079-1087
45. Tsukimoto, K., Takada, R., Araki, Y., Suzuki, K., Karita, S., Wakagi, T., Shoun, H.,
Watanabe, T., and Fushinobu, S. (2010) Recognition of cellooligosaccharides by a
family 28 carbohydrate-binding module. FEBS Lett 584, 1205-1211
46. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier
to DNA transformation in Caldicellulosiruptor species results in efficient marker
replacement. Biotechnol. Biofuels 6, 82
47. Davies, G. J., Mackenzie, L., Varrot, A., Dauter, M., Brzozowski, A. M., Schulein,
M., and Withers, S. G. (1998) Snapshots along an enzymatic reaction coordinate:
analysis of a retaining beta-glycoside hydrolase. Biochemistry 37, 11707-11713
48. Shirai, T., Ishida, H., Noda, J., Yamane, T., Ozaki, K., Hakamada, Y., and Ito, S.
(2001) Crystal structure of alkaline cellulase K: Insight into the alkaline adaptation of
an industrial enzyme. J Mol Biol 310, 1079-1087
249